Communication over Channels with Symbol Synchronization Errors by Hugues Mercier B.Sc., Universite Laval, 1997 M.Sc., Universite de Montreal, 2003 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in The Faculty of Graduate Studies (Electrical and Computer Engineering) The University of British Columbia March 2008 ©Hugues Mercier, 2008 Abstract Synchronization is a problem of fundamental importance for a wide range of practi- cal communication systems including reading media, multi-user optical channels, syn- chronous digital communication systems, packet-switched communication networks, dis- tributed computing systems, etc. In this thesis I study various aspects of communication over channels with symbol synchronization errors. Symbol synchronization errors are harder to model than erasures or substitution errors caused by additive noise because they introduce uncertainties in timing. Consequently, the capacity of channels subjected to synchronization errors is a very challenging problem, even when considering the simplest channels for which only deletion errors occur. I improve on the best existing lower and upper bounds for the capacity of the deletion channel using convex and stochastic optimization techniques. I also show that simply finding closed-form expressions for the number of subsequences when deleting symbols from a string is computationally prohibitive. Constructing efficient synchronization error-correcting codes is also a challenging task. The main result of the thesis is the design of a new family of codes able to correct several types of synchronization errors. The codes use trellis and modified versions of the Viterbi decoding algorithm, and therefore have very low encoding and decoding complexities. They also have high data rates and work for reasonably noisy channels, which makes them one of the first synchronization-correcting codes that have any chance of being used in practical systems. In the last part of the thesis, I show that a synchronization approach can solve the opportunistic spectrum access problem in cognitive radio, where cognitive users want to communicate in presence of legacy users to whom the bandwidth has been licensed. I also consider the amount of communication required to solve a large class of distributed problems where synchronization errors can occur. More precisely, I study how allowing the parties to solve the problems incorrectly with small probability can reduce the total amount of communication or the number of messages that need to be exchanged. ii Table of Contents Abstract ^ii Contents ^ iii List of Tables vii List of Figures viii Acknowledgement ^ x Dedication ^ xi 1 Introduction 1 1.1 Thesis Outline and Contributions ^ 2 1.1.1^Synchronization Channel Models ^ 3 1.1.2^Capacity of the Discrete Binary Deletion Channel ^ 3 1.1.3^Subsequences, Sequences, and Supersequences 3 1.1.4^Synchronization Error-Correcting Block Codes ^ 4 1.1.5^Nonlinear Synchronization-Correcting Trellis Codes ^ 4 1.1.6^Cognitive Radio and Synchronization Errors ^ 5 1.1.7^Interactive Communication and Synchronization Errors ^ 5 2 Synchronization Channel Models ^ 7 I Deletion Channel Behavior 10 3 Bounds for the Capacity of the Binary Deletion Channel ^ 11 3.1 Introduction ^11 3.2 Preliminaries ^12 iii Table of Contents iv 3.3 3.4 3.5 Upper Bound ^ Approximate Lower Bound ^ 3.4.1^Computation of I' 3.4.2^Estimation of Cm ^ 3.4.3^Discussion ^ Summary 13 15 16 17 18 19 4 Subsequences, Sequences, and Supersequences ^ 20 4.1 Introduction ^ 20 4.2 Preliminaries and Known Results ^ 21 4.3 Closed-Form Formulae for the Number of Subsequences when d < 5 24 4.3.1^Deletion Patterns and Sandwiches 24 4.3.2^An Improved Upper Bound for IDd(u)I ^ 26 4.3.3^Sandwiches Revisited ^ 29 4.3.4^Closed-Form Expressions 30 4.4 An Efficient Algorithm to Compute the Number of Subsequences ^ 38 4.5 An Efficient Algorithm for the Multiplicity of a Subsequence in a String 42 4.6 Summary ^ 42 II Synchronization Error-Correcting Codes ^ 44 5 Variable-Length Synchronization Error-Correcting Codes ^ 45 5.1 Levenshtein's Single-Synchronization Error-Correcting Codes 45 5.2 Repetition Synchronization Error-Correcting Codes ^ 47 5.3 Summary ^ 50 6 Nonlinear Trellis Codes for Symbol Synchronization Errors ^ 52 6.1 Introduction ^ 52 6.2 Preliminaries and Useful Distances ^ 55 6.3 Convolutional Codes and Viterbi Algorithm for Synchronization Errors 60 6.3.1^Correction of Deletion Errors 61 6.3.2^Correction of Insertion Errors ^ 64 6.3.3^Correction of Duplication Errors 67 6.3.4^Correction of Synchronization Errors 71 6.4 Bypasses and Shortcuts^ 79 6.5 Recovering Synchronization Versus Correcting Synchronization Errors 85 Table of Contents^ v 6.6 A Few Simulation Results ^ 88 6.6.1 A Trivial Triple-Deletion Synchronization-Recovering Code . .^88 6.6.2 A Sample Trellis Code for the Deletion Channel ^ 89 6.6.3 Codes for Channels with Segmented Deletion Errors 91 6.7 Maximum likelihood Decoding of Trellis Codes for Deletion Channels^93 6.8 Summary ^ 94 III Applications of Synchronization Models ^ 96 7 Opportunistic Spectrum Access for Cognitive Radio Systems 97 7.1 Introduction ^ 97 7.2 Opportunistic Spectrum Access via Erasure Models ^ 99 7.2.1^Two-switch erasure model with binary symmetric noise ^ 99 7.2.2^One-switch alternative model ^ 101 7.2.3^Coding for the Erasure Approach 103 7.3 Opportunistic Spectrum Access via Synchronization ^ 103 7.4 Summary 107 8 Worst-Case Nonzero-Error Interactive Communication ^ 108 8.1 Introduction ^ 108 8.2 Complexity Models and Known Results ^ 111 8.2.1^Preliminaries ^ 111 8.2.2^Worst-Case Deterministic Model 112 8.2.3^Worst-Case Private Coin Randomized Model ^ 114 8.2.4^Worst-Case Public Coin Randomized Model 115 8.2.5^Worst-Case Amortized Models ^ 115 8.2.6^Worst-Case Distributional Model 116 8.3 Private Coin Randomized Interactive Communication ^ 117 8.4 Public Coin Randomized Interactive Communication 123 8.4.1^One Round of Communication is Optimal 123 8.4.2^Difference Between the Private Coin and Public Coin Models . 125 8.5 Private Coin Randomized Amortized Interactive Communication ^ 128 8.6 Distributional Interactive Communication ^ 129 8.7 Equivalence of All the Models for Balanced and Symmetric Pairs ^ 131 8.8 Summary ^ 134 Table of Contents^ vi 9 Conclusions, Open Problems, and Future Work ^ 135 9.1 Capacity Bounds for Channels with Synchronization Errors ^ 135 9.2 Nonlinear Trellis Codes for Channels with Synchronization Errors . .^136 9.2.1 Code Construction ^ 136 9.2.2 Timing Errors 137 9.2.3 Soft-Information Algorithms ^ 137 9.2.4 Complexity Issues ^ 138 9.2.5 Maximum-Likelihood Decoding 138 9.3 Opportunistic Spectrum Access for Cognitive Radio Systems ^ 139 9.4 Worst-Case Private Coin Randomized Interactive Communication . .^140 9.4.1 One-Way Private Coin Randomized Complexity ^ 140 9.4.2 Deterministic Model Versus Private Coin Randomized Model .^140 Bibliography ^ 142 List of Tables 4.1 Strings with a R3 (3, 3, 0) run distribution and their respective number of 3-subsequences ^ 25 4.2 Number of subsequences of the string u = 0222021 when deleting up to 5 symbols 39 4.3 Multiplicity of the subsequence u = 0010 in the string v = 00010010 . . ^ 43 5.1 Extensions of C5 for two synchronization errors ^ 48 5.2 Double-deletion-correcting codebooks obtained by increasing the length of the runs of Levenshtein's codewords ^ 49 5.3 Codebook size versus average block length for double-deletion-correcting codes ^ 50 6.1 Prefix synchronization distance between the strings u = 01101 and v = 110 ^ 73 6.2 Metrics, prefix survivors, and state survivors for the modified Viterbi decoding algorithm for synchronization errors with the convolutional encoder G(D) = [1 + D2 1+ D + D 2] and the received sequence 01 01 11 11 01 0 ^77 6.3 Hamming distance between the original and final codewords for the trellis encoder of Figure 6.12 with two or three deletion errors ^ 89 9.1 Multiplicity of the subsequence r = 0010 in the string v = 010 ^ 138 9.2 Multiplicity of the subsequence r = 0010 in the string v = 011 ^ 138 vii List of Figures 2.1 Flow chart for the mother binary synchronization channel ^7 2.2 Varying sampling rate between the transmitter and receiver ^9 3.1 Binary deletion channel, case n = 1 ^13 3.2 Binary deletion channel, case n = 2 ^13 3.3 Convergence of the two algorithms to the capacity of the binary deletion channel when pd = 0.3 ^ 15 3.4 Improved bounds for the capacity of the binary deletion channel ^ 18 4.1 Sandwiches for the string u = 1120200130 ^ 25 4.2 Redundant deletion pattern when deleting a sandwich of length 2 and one additional symbol ^ 25 4.3 Redundant deletion patterns when deleting a sandwich of length 2 and three additional symbols 30 4.4 Multiple deletion of a redundant deletion pattern ^ 33 6.1 State diagram for the G(D) = [1 + D 2 1 + D + D2] convolutional encoder ^ 62 6.2 Deletion trellis for the G(D) = [1+D 2 1+ D + D2] convolutional encoder and received sequence 01 11 01 01 0 ^ 62 6.3 Insertion trellis for the G(D) = [1+D2 1+ D + D2] convolutional encoder and received sequence 11 11 10 10 00 01 11 0 ^ 65 6.4 Duplication trellis for the G(D) = [1 + D2 1 + D + D 2] convolutional encoder and received sequence 11 11 11 10 01 11 10 00 11 01 11 ^ 70 6.5 Empty trellis for the G(D) = [1 + D 2 1 + D + D2] convolutional encoder and received sequence 01 01 11 11 01 0 ^ 76 viii List of Figures^ ix 6.6 Trellis sample for the modified Viterbi decoding algorithm for synchroniza- tion errors with the G(D) = [1 + D2 1 + D + D 2 ] convolutional encoder and received sequence 01 01 11 11 01 0 ^ 76 6.7 A (2,2)-encoder graph with input alphabet {0, 1} and output alphabet {0,1,2} ^80 6.8 A simple graph with bypasses and shortcuts ^ 80 6.9 The (4,1,3)-rectangular graph ^83 6.10 The (3,2,4)-rectangular graph ^83 6.11 The (2,3,2)-rectangular graph ^84 6.12 A triple-deletion synchronization-recovering code^ 88 6.13 A long trellis is very robust against loss of synchronization ^90 6.14 A code correcting 1 deletion error every 5 bits 92 6.15 A code correcting 1 deletion error every 9 bits ^92 6.16 A code correcting 2 deletion errors every 10 bits ^ 92 6.17 A bad trellis for maximum-likelihood decoding 94 7.1 Two-switch model from Jafar and Srinivasa ^ 98 7.2 Spectrum availability for a cognitive user at time t 98 7.3 Components of the noisy erasure channel 100 7.4 Noisy erasure channel ^ 101 7.5 One-switch channel transitions ^ 102 7.6 Synchronization errors caused by discrepancies of the available spectrum to the cognitive transmitter and receiver ^ 104 Acknowledgement First, I would like to thank my advisor and mentor Vijay K. Bhargava. From the first day I joined his research group, his guidance, professional counsel, financial assistance, and unfaltering support have been paramount to my success and defy any attempt to be properly acknowledged. I especially express my gratitude for his patience, understanding, and compassion after the difficult birth of my second daughter. I would also like to thank Brian Marcus for organizing coding seminars and for en- couraging me to discuss research problems with his students. Special thanks to Lutz Lampe and Robert Schober from my supervisory committee for their comments and sug- gestions during my departmental examination, which greatly improved the presentation of my dissertation, and to William A. Aiello, Panos Nasiopoulos, and Vahid Tarokh for their presence on my examining committee. I would like to acknowledge the financial support of the Fonds quebecois de la recherche sur la nature et les technologies, without which completing my graduate studies would simply not have been possible. Many thanks to the University of British Columbia for offering me a Graduate Fellowship for the fourth year of my doctoral program. Working in Vijay's research group, I had the privilege to be surrounded by many bright graduate students and researchers from all over the world. From discussing sci- entific problems to learning about their respective cultures and interests, I would like to thank them for making my stay at UBC so enjoyable. They include Gaurav Bansal, Zeljko Blazek, Daniela Djonin, Dejan Djonin, Serkan Dost, Olivier Gervais, Ziaul Hasan, Ja- hangir Hossain, Praveen Kaligineedi, Ashok Karmokar, Majid Khabbazian, Kin-Kwong Leung, Erez Louidor, Zhiwei Mao, Chris Nicola, Piplu Paul, Umesh Phuyal, Anjana Punchihewage, Mamunur Rashid, Poramate Tarasak, and Chandika Wavegedara. Spe- cial thanks to Majid for using his sharp mind on some research questions of mine Above all, I would like to thank my wife and daughters for their unconditional love and for making me a better man. x A Isabelle, Charlotte et Chloe xi Chapter 1 Introduction "The synchronization problem is not a mere engineering detail, but a fundamental communications problem as basic as detection itself." —Solomon W. Golomb, 1961 [39]. Early communication systems designers were preoccupied with symbol and word syn- chronization even more than they were preoccupied with additive noise. Samuel Morse, when designing his famous eponymous code in the 1830s, has allowed three time units of silence for a letter space and six time units of silence for a word space, a very high price to pay to reduce the synchronization errors and facilitate the decoding of messages sent over telegraph lines. Today, synchronization problems remain an integral part of technological systems op- erating in environments affected by uncertainties in timing, or time noise. These include hard drives, semi-conductors, integrated circuits, and synchronous digital communication networks. In reading media, for example hard disks and digital audio tapes, physical dis- turbances (vibrations, tape stretch, ...) may cause the position of the head relative to the media to digress from its expected location, causing timing errors [9, 87]. In modern semiconductor devices like integrated circuits, timing errors can be caused by the vary- ing sampling rates of the system devices [25]. Time noise caused by inaccurate clocking devices can affect most of the synchronous digital communication systems [37], and syn- chronization errors can be caused by the propagation medium, like crosstalk on multi-user optical channels [91] and Doppler shifts for underwater and satellite communications [94]. Time noise introduces insertions and deletions of symbols for the recipient. As a re- sult, systems corrupted by timing errors do not know their exact position when reading or decoding data. Moreover, as the performance of modern communication systems im- 1 Chapter 1. Introduction^ 2 proves, increasingly stringent synchronization constraints are required. Synchronization techniques are expensive. Some, like sending a pilot tone or periodic synchronization pulses, require additional power. Others, like adding a pseudo-noise sequence in the transmitted signal that can be replicated by the receiver, require increased bandwidth, and error-correcting codes robust against timing errors increase the overall latency and complexity of the systems, reduce their throughput, and are very difficult to construct. Synchronization models can also be used for a large number of problems ranging from mobile computing to quantum cryptography. Deletion channels with large alphabets can model packet-switched communication networks like the Internet, where packets are either lost or dropped due to buffer overflow at finite-buffer queues [22, 23, 95]. In this specific scenario deletion errors are transformed into erasures: the packets are numbered so they can be retransmitted when they are deleted. Deletion errors can also be used to model traitor tracing, where attackers remove parts of the embedded fingerprint in a digital object in order to create untraceable pirate copies [88]. In cognitive radio systems, a discrepancy between primary user detection by a cognitive transmitter and a cognitive receiver can be viewed as a loss of synchronization [79]. Insertions and deletions can also occur for a large class of distributed problems involving reconciliation of correlated data such as reconciliation of nucleotide sequences in DNA molecules [30], remote file storage [15], synchronization of mobile data [1], and quantum key distribution [10]. Although great progress has been made in the last few years, we are still far from a unified theory of synchronization. Synchronization errors are poorly modeled by the usual channels such as the binary symmetric channel or the additive white Gaussian noise channel, and new techniques need to be designed. 1.1 Thesis Outline and Contributions My thesis considers various problems related to communication over channels with symbol synchronization errors. The main body of the thesis is divided into three parts and comes after the description of a very general channel with synchronization errors in Chapter 2. The first part focuses on the understanding of deletion channels, which are the simplest nontrivial channels with synchronization errors. Improved bounds for the capacity of the discrete binary deletion channel are presented in Chapter 3, and closed-form expressions and an algorithm for calculating the number of subsequences when deleting symbols from a string in Chapter 4. In the second part of the thesis, I study binary error- correcting codes capable of correcting synchronization errors. Block codes are presented Chapter 1. Introduction^ 3 in Chapter 5, and a new class of trellis codes is described in Chapter 6. In the third part of the thesis, I study two problems that can be modeled using a synchronization approach: opportunistic spectrum access in cognitive radio systems in Chapter 7, and worst-case randomized interactive communication in Chapter 8. Finally, in Chapter 9, conclusions and suggestions for future work are presented. A more detailed outline follows. 1.1.1 Synchronization Channel Models In Chapter 2, I describe the mother communication channel considered throughout the thesis. The channel can be broken down into simpler channels closely related to various types of synchronization errors like symbol deletions, symbol insertions, duplications, and combinations of more than one of them with or without substitution errors. 1.1.2 Capacity of the Discrete Binary Deletion Channel [70] Quantifying the reduction in capacity due the presence of synchronization errors is a very difficult problem. There is no known simple analytical closed-form for the capacity of channels admitting synchronization errors, and no numerical approximation either. In Chapter 3, I consider the deletion channel, where each transmitted bit is deleted with probability pd, and present two computational methods to estimate its capacity. The first approach bounds the mutual information between the input and the output of the channel. It converges from above as the size of the input blocks approaches infinity and leads to the first nontrivial upper bound for the capacity. The second method uses stochastic optimization to maximize the information rate of the channel when the input source is modeled by a Markov process. It converges from below as the order of the optimal Markov chain increases, and as a result I derive an approximate lower bound improving on the existing capacity lower bounds. 1.1.3 Subsequences, Sequences, and Supersequences [71] Having been unsuccessful in my attempt to improve significantly on the capacity bounds for channels admitting synchronization errors, I decided to gain more insight on the difficulties of these channels by studying the varying size of the spheres associated to synchronization codewords. In Chapter 4, I consider the problem of finding the number of subsequences when deleting d symbols from a string. I present a framework to find closed-form expressions for the number of subsequences, and use it to prove the first Chapter 1. Introduction^ 4 explicit formulae for d E {3, 4, 5} and the first formulae for nonbinary alphabets. Since in general the formulae cannot be calculated in polynomial time for large values of d, I also present the first efficient algorithm to compute the number of subsequences for any string, number of deletions, and alphabet size. A closely related problem is to compute the multiplicity of a subsequence in a string. I provide a recursive definition and an efficient algorithm for this problem. 1.1.4 Synchronization Error-Correcting Block Codes [69] In Chapter 5, I present a simple technique to increase the detection and correction ca- pabilities of any block code robust against synchronization errors, and use it to extend a well-known family of single-synchronization error-correcting codes for multiple insertion and deletion errors. The extended codes, although far from optimal, outperform several known block codes for synchronization errors. 1.1.5 Nonlinear Synchronization-Correcting Trellis Codes [68] Chapter 6 presents a comprehensive investigation of nonlinear trellis codes for synchro- nization errors. I show that the Viterbi decoding algorithm can be modified to correct most types of synchronization errors. It can be used for insertion, deletion, and dupli- cation errors, for instance, and as well for more complex scenarios where insertion and deletion errors can occur together. In all cases, I prove that the algorithm is able to find the closest codeword from the received sequence. The computational complexity of the algorithms is comparable to the complexity of the original Viterbi algorithm for insertion, deletion, and duplication errors, but larger for more general synchronization errors. I show that for trellis codes whose underlying graphs have a fixed number of states, there is a fundamental tradeoff between their capacity to recover synchronization and the number of synchronization errors they can correct. A corollary of this result is that the structure of convolutional codes is not appropriate to correct synchronization errors, which explains why none of the previous attempts to use them in this context has been very successful. I present a class of rectangular graphs well suited to correct most types of synchroniza- tion errors when used as encoders. The graphs in conjunction with the adapted versions of the Viterbi decoding algorithm offer several advantages over most coding techniques proposed to protect communication systems against time noise. Encoding information is not difficult and the decoding complexity is reasonable. It is also possible to cor- Chapter 1. Introduction^ 5 rect bursts of synchronization errors and to take advantage of soft-information from the channels. 1.1.6 Cognitive Radio and Synchronization Errors [79] Cognitive radio is an innovative technology proposed to increase spectrum usage by allowing dynamic allocation of the unused spectrum in changing environments. Cognitive users monitor the spectrum and are allowed to use it as long as it does not interfere with the primary users to whom it has been licensed. In Chapter 7, I describe how some well established tools from the fields of error-correcting codes and information theory can be applied to the opportunistic spectrum access problem in cognitive radio. More precisely, I describe simple channel models, compute their capacity, and demonstrate that low-density parity-check codes and synchronization error-correcting codes can be used to solve this problem. 1.1.7 Interactive Communication and Synchronization Errors [73, 72] Synchronization errors also appear in a wide range of distributed problems. Consider the following scenario: a user modifies a large file on his laptop by inserting, deleting, and substituting characters, and wants to update the centralized version of the file located on a remote server. If few changes are made, then the files are highly correlated. Although the simplest way to reconcile the files is to copy the entire file from the laptop to the server, more clever algorithms can significantly reduce the number of bits that need to be transmitted, especially if the user and the server are allowed to interact. Thus, an interesting problem is to study the minimum amount of communication required to reconcile the two versions of the files, with or without interaction, depending on the "distance" between them. This is a special case of "interactive communication". In the interactive communication model, two distant parties, Px and Py, possess respective private but correlated inputs x and y, and Py wants to learn x from Px while minimizing the communication required for the worst possible input pair (x, y). In chapter 8, I analyze four nonzero-error models in this correlated data setting, i.e., models for which Py is allowed to learn x incorrectly with small probability. In the private coin randomized model, both players are allowed to toss coins, and Py must learn x with high probability for every input pair. The public coin randomized model is Chapter 1. Introduction^ 6 similar to this model, but instead of private coins, both players have access to a common source of randomness. The private coin randomized amortized model is also similar to the first model, but the players are allowed to solve several independent instances of the same problem simultaneously instead of sequentially. Finally, the distributional model is deterministic, but Py is allowed to answer incorrectly for a small fraction of the inputs weighted by their probability distribution. I show that the public coin randomized, private coin randomized amortized, and distributional models are equivalent and can be more efficient than the original worst- case deterministic model. Moreover, when the players are not allowed to interact, the difference between the best deterministic and public coin randomized protocols can be arbitrarily large. I prove that one round of communication is nearly optimal for the private coin randomized model. I also show that the deterministic model and all the nonzero-error models are equivalent for a large class of symmetric problems arising from several practical applications, although nonzero-error and randomization allow efficient one-way protocols. 1 — pc — 3 Input bit x Transmit random bit — Pa — Pi — Pt V Transmit x Transmit 7 Erase x Delete x Transmit x Chapter 2 Synchronization Channel Models The mother binary synchronization channel described in Figure 2.1 includes all the types of errors encountered in this thesis. Although synchronization between the sender and the recipient is not guaranteed, it is a memoryless channel in the sense that the noise behaves independently on each input bit. Figure 2.1: Flow chart for the mother binary synchronization channel. 7 Chapter 2. Synchronization Channel Models^ 8 There are three types of synchronization errors: deletions, insertions, and duplica- tions. There are also two types of errors having no detrimental effect on synchronization: substitutions and erasures. The difference between deletions and erasures is that dele- tions leave no trace of their existence. For instance, deleting the middle bits from 0011 results in the output 01, whereas if the bits are erased by the channel the output is 0??1. A duplication transmits a copy of the input bit and is therefore a special case of insertion error. It is assumed that an arbitrarily large number of bits can be inserted between any two input bits. When an input bit x is transmitted over this channel, it suffers from a synchronization error with probability pd + pi + pt . The bit is deleted with probability pd , a random bit is inserted before x with probability pi , and a copy of x is inserted before x with probability pt . Once the bit x is transmitted, it suffers from a substitution error with probability p, and is erased with probability Pe . All the channels used in this thesis, including the binary symmetric and binary erasure channels, can be derived from the mother synchronization channel. The binary symmetric channel transmits a bit correctly with probability 1 — ps and transmits its complement with probability ps , i.e., pd = pt = pi = pe = 0. In the binary deletion channel, each input bit is deleted with probability pd and transmitted correctly with probability 1 — pd . The erasure channel has pe 0 and pd = Pt = pi = ps = 0. The duplication channel and the insertion channel only cause duplication and insertion errors, respectively. The number of insertions or duplications between two input bits is geometrically distributed, although other distributions could be used. For instance, Gallager [37] considered a channel where a bit can only be replaced by two random bits, and Mitzenmacher [75] a channel where each bit is duplicated at most once; both channels have the advantage of being easier to analyze than their counterparts with more general distributions. In the literature, the synchronization channel and the insertion-deletion channel are both used to denote the channel where insertions and deletions are the only errors allowed. The combination of deletion and duplication errors arises naturally for all the practical problems where timing errors can be caused by the varying rates of the sampling devices: faster sampling at the receiver causes duplication errors, whereas slower sampling causes deletions. In Figure 2.2, the transmitted string 1101110010110 is sampled at incorrect times by the receiver and received with two duplication errors and two deletion errors as 1100111100110. Chapter 2. Synchronization Channel Models^ 9 Sampling at the receiver 1^1^0 0 1^1^1^1^0^0^1^1^0 1^1^0^1^1^1^0^0^1^0^1^1^0 Sampling at the transmitter Figure 2.2: Varying sampling rate between the transmitter and receiver. Part I Deletion Channel Behavior 10 Chapter 3 Improved Numerical Bounds for the Capacity of the Binary Deletion Channel 3.1 Introduction The binary deletion channel is a channel for which each input bit is deleted independently with probability pd and transmitted correctly with probability 1 — pd . For example, the codeword 1011100 could be received as 1100. Although it is easier to analyze than its counterparts with insertions and deletions, the deletion channel has memory and is harder to understand than, for instance, the erasure channel. As a result, there is no known simple closed-form expression for its capacity, and no numerical approximation either. Forty years ago, Dobrushin [24] proved the equivalent of the channel coding theorem and its weak converse for channels admitting synchronization errors. Vvedenskaya and Dobrushin [108] used this theoretical framework and tried to derive numerical lower bounds for the capacity of the binary deletion channel, but were limited by the processing power of the computers available at the time. Deletion channels have recently received renewed interest after Diggavi and Grossglauser [22, 23] used them to prove capacity lower bounds for packet-switched communication networks, where packets are either lost or dropped due to buffer overflow at finite-buffer queues. They also proved that for large alphabets, the capacity of the deletion channel is equal to the capacity of the erasure channel. Their framework was extended by Kavcic and Motwani [52] and Drinea and Mitzenmacher [29, 28], who computed improved lower bounds. Drinea and Mitzenmacher 11 Chapter 3. Bounds for the Capacity of the Binary Deletion Channel^12 [27] also derived a lower bound of 1-9P4 , which is the only nontrivial bound for the capacity of the binary deletion channel obtained without the use of simulations. The capacity of the deletion channel with deletion probability pd is upper bounded by the capacity of the erasure channel with erasure probability pd. This bound is almost tight with large input alphabets, but quite weak in the binary case. There is no known other upper bound, either algebraic or obtained by simulation. If nothing else, upper bounds from Ullman [104] and Dolgopolov [26] were also used, but both use assumptions that cannot exactly be applied in the context of the binary deletion channel. For large values of pd, codes with rates exceeding these last two upper bounds were recently constructed [28]. In this chapter, I present two numerical algorithms that asymptotically converge to the capacity of the binary deletion channel. The first algorithm uses convex programming on the probability of the input sequences; it converges when the length of the sequences approaches infinity. The second algorithm uses stochastic optimization on ergodic pro- cesses modeling the input source, and converges when the order of the optimal Markov chain approaches infinity. The fact that the two algorithms respectively converge to the capacity of the deletion channel from above and from below allows to improve on the existing bounds. I present the first nontrivial upper bound for the capacity of the binary deletion channel, as well as an approximate lower bound. 3.2 Preliminaries It is assumed that the readers are familiar with the basic concepts of entropy, conditional entropy, mutual information, and channel capacity; if not there are several comprehensive presentations of these subjects like the one of Cover and Thomas [17]. All the logarithms are in base 2 unless otherwise specified, and 0 log 0 0. Let X = y = 0,11. Xn = yn is the set of all binary strings of length n, X°° the set of all binary strings of infinite length, and y* = {yiy2... yk k > 0 and each yi E y} the set of all binary strings including the empty string € 1 . The random variable Xn having possible outcomes Xn and probability nmax r(x ; y.) .distribution Pxn is also defined. Finally, let In Pxn^n 1 * is the Kleene star. Chapter 3. Bounds for the Capacity of the Binary Deletion Channel^13 3.3 Upper Bound Sending a single bit over a deletion channel is illustrated in Figure 3.1. Here, / 1 = 1 — pd, which corresponds to the capacity of the erasure channel. The difference between erasure 1 — pd 0 ^ 0 Pd Pd 2^1 1 — pd^1 Figure 3.1: Binary deletion channel, case n = 1. of bits and deletion of bits can be seen by sending a block of 2 bits over the deletion channel, as shown in Figure 3.2. One can show that / 1 > /2 for 0 < pd < 1, and this Figure 3.2: Binary deletion channel, case n = 2. ; yn)is different from the erasure channel for which /(X; Y) = I (cn ^This phenomenon occurs because the recipient does not know with certainty the position of the deleted bits. If, for example, 01 is sent and 0 is erased, the recipient knows that 01 or 11 was sent. On the other hand, if 0 is deleted, the recipient must decide between 01, 10, and 11. The longer the input sequence is, the harder it is to correct deletions as opposed to erasures. Furthermore, the mutual information is maximized when the probability of transmitting 00 or 11 is larger than the probability of transmitting 01 or 10. This can be explained by the fact that the number of subsequences when deleting bits from a string of length n depends on the structure of the string 2 . 1 2 2 Calculating the number of subsequences when deleting symbols from a string is an interesting prob- lem in its own right. It is studied in the next chapter. Chapter 3. Bounds for the Capacity of the Binary Deletion Channel^14 The channel coding theorem and its converse for channels admitting synchronization errors can be used to derive a numerical upper bound for the capacity of the deletion channel. Theorem 3.1 (Dobrushin [24]). i(xn• Y*)c = lim In = lim max^ . n—.co Pxn To illustrate the theorem, let us see how many distinguishable sequences of length n can be sent over the channel. Intuitively, the recipient can receive ti 2 H(Y* ) sequences, which can be divided in sets of size 2H('Ixn) corresponding to the different inputs. The number of distinct sets is at most 2̂H2(Y.(*)^21(xn;Y*), corresponding to thep̂oi) =— maximum number of distinguishable codewords. Hence, ol g2 2I(xn ;Y* ) C lim max^ n—>c. Pxn It is not hard to show that In > In+ 1 for n > 1, thus all the /k are upper bounds for the capacity of the deletion channel. Since received sequences from the deletion channel are at most as long as the transmitted sequences, /k can be computed for small values of k and a fixed probability of symbol deletion pd, as done in the first algorithm. Algorithm 1 function upper_bound(n, pd ) 1. For each input sequence x E Xn beginning with a 0, find all the possible output sequences and their occurrence when bits are deleted. Store the results in the two-dimensional array An . 2. Using pd , modify An so that its entries become output probabilities. 3. Using An , maximize the objective function /(Xn; Y*) = H(Y*) — H(Y* Xn) over all the symmetric probability distributions over Xn. 4. Return F. As an example for Step 1, x = 0001 has the subsequences 0001, 000, 001 (3 times), 00 (3 times), 01 (3 times), 0 (3 times), 1, and €. It should be clear that the mutual in- formation is maximized for symmetric probability distributions with respect to 0 and 1. Chapter 3. Bounds for the Capacity of the Binary Deletion Channel^15 Since /(Xn; Y*) is a concave function and the constraints are linear, the optimization problem in Step 3 can be solved using Kelley's cutting-plane algorithm [53]. An inter- esting property of cutting-plane algorithms is that the accuracy of the optimal solution can be specified. Figure 3.3 shows the value of In for pd = 0.3, n = [1, 6] and a 10 -10 accuracy. Let An ''-- In In-1 for n > 2. Since the rate of convergence of In is slow, one can use the fact that °7„ > 'A r=+„ to extrapolate an improved upper bound for C. Figure 3.3 shows ■-.n-I-1 — ,-.n this upper bound for pd = 0.3. Algorithm 1 was executed several times with different values of n and pd to get an upper bound for the capacity of the binary deletion channel; the results are shown in Figure 3.4. The optimization algorithm converges faster with large values of pd, and as a result, input blocks of length up to 6 were used for pd < 0.6, input blocks of length up to 7 for 0.6 < pd < 0.8, and input blocks of length up to 8 for Pd > 0 . 9 . Length of the input blocks 0^1^2^3^4^5^6^7^8^9 0^1^2^3^4^5^6 7^8^9 10 Order of the optimal Markov chain Figure 3.3: Convergence of the two algorithms to the capacity of the binary deletion channel when pd = 0.3. 3.4 Approximate Lower Bound The capacity of the binary deletion channel can also be expressed as in Theorem 3.2. Theorem 3.2 (Dobrushin [24]). C = (1 — pd) max lim IPC39; Yn) E n—oo^72 0.8 0.7 0.6 a,o, 0.5mg 0.4 0.3 0.2 0.7 0.6 0.5 0.4 0.3 0.2 Chapter 3. Bounds for the Capacity of the Binary Deletion Channel^16 where the upper bound is taken over all stationary ergodic processes E modeling the input source and which are Markov chains. Let Em be the set of all Markov chains of order m, I' ''.- lim -1(x°° '" ) , and n—■oo^n Cm -4 (1 — pd) max /'. Theorem 3.2 can be rewritten in terms of the order of the optimalsn, Markov chains, i.e., C = lim Cm .^ (3.1)m-00 Since Em C Em,±1, it follows that C1 < C2 < • • • < C, thus all the Ck are lower bounds for the capacity of the deletion channel. The idea of using (3.1) to approximate the capacity is from Vvedenskaya and Dobrushin [108], and in this section their framework is extended to approximate Cm for small values of m using stochastic optimization. Algorithm 2 function lower_bound(m, pd) 1. Choose an arbitrary symmetric Markov chain E of order m. 2. Using the Markov chain to generate long input sequences, approximate H(Yn )^H(Yn X°° ) I' = lim^lim n—■oo^n^n—^co n 3. If E does not maximize I' with the desired accuracy, then choose another Markov chain and go to Step 2. 4. Return Cm . 3.4.1 Computation of I' For a fixed Markov chain of order m modeling the input source, I' is a limit as the length of the input sequences approaches infinity and is hard to compute with accuracy. We use I' = lim II(Yn) lim 11(YnIXe‘') and approximate each term separately. n—^oo n^n--*co^n To compute H(Yn), the Markov chain is used to generate long binary sequences from which each bit is then randomly deleted with probability pd . The resulting strings are then broken into i small output strings of length n, and the occurrence of all such H(Yn)^strings is stored in a table used to approximate H(Yn). Although^converges slowly,n Chapter 3. Bounds for the Capacity of the Binary Deletion Channel^17 H(Y3 + 1 ) — H(Y 3 ) is constant when j is larger than or equal to m, and this can be used to compute Ern 11(Yn) with a 10' accuracy in reasonable time. The term H(Yri X°°) is harder to compute accurately. The Markov chain is used to generate an input sequence x of length N >> n. One needs to compute the probability of all its subsequences of length n when bits are randomly deleted with probability pd, but the complexity of this task is exponential with respect to N. Instead, we randomly delete bits from x, take the first n bits of the resulting sequence as a possible subsequence and repeat this process k times. H(Yn IX`x) = x) can be estimated using the k subsequences of x. By using the Markov chain to generate a large number s of sequences x i of length N, it is possible to estimate H(Yn X°°) by averaging all the values H(Yn 1X°° = x i ). Again, 11 (37n lx") converges slowly as n increases, but H(Y 1± 1 I — H(Y 1 X°°) converges much faster and is used to speed up the calculations. In order to compute lim 11(Yn lx°°) n--■oo with a relative error of 10 -2 , N, k, s and 1 have to be chosen such that the simulation takes several days on a 3.0GHz Pentium 4 processor. Also, when pd increases, N has to be larger since an increasing fraction of the input bits are deleted. 3.4.2 Estimation of Cm To estimate Cm , the concave objective function I' needs to be maximized over all Markov chains of order m. Since the channel is symmetric it is sufficient to consider symmetric Markov chains, thus for Cm there are 2" variables. As mentioned above, computing I' accurately is prohibitive and as a result deterministic optimization techniques cannot be used. Instead, we use stochastic optimization. More precisely, we roughly approximate /' at each iteration with small values of i, j, k, 1, N and s, and use the multidimensional extension of the Kiefer-Wolfowitz procedure [56]. At each iteration, the gradient can be estimated using 2' evaluations of I'. It is possible for the procedure to converge to Cm in mean square and with probability one at the price of a slow rate of convergence. Once the Kiefer-Wolfowitz procedure gives a good approximation of the optimal Markov chain, it can be used to estimate Cm by computing I' once with a smaller relative error as explained in the previous subsection. Figure 3.3 shows Cm for m = [0, 4] and pd = 0.3. Algorithm 2 was executed for m = [0, 4] and different values of pd , as shown in Figure 3.4; it gives an approximate lower bound for the capacity of the binary deletion channel 0.1 0.20 10.90.6 0.80.4 0.70.3 0.5 1 0° 10 -1 w 4-)„ lo' c4 10 -1 10 -2 10 -4 Chapter 3. Bounds for the Capacity of the Binary Deletion Channel^18 3.4.3 Discussion Using Markov chains of order more than 1 improves on the previous lower bounds. In our simulations, there was no significant improvement when considering Markov chains of order more than 2, and in some cases using order chains of order 3 and 4 slightly reduced the capacity. This is partly due to the slow convergence of the Kiefer-Wolfowitz procedure and to the fact that the error when estimating lim H(Yn ixc°) is biased, causingnn--^oo the algorithm to converge slightly off the maximum. The impact of the bias could be reduced by gradually improving the accuracy of the estimation at each iteration instead of only once when the maximum is reached. Nevertheless, the capacity is certainly much closer to the lower bound than to the upper bound, since Algorithm 2 converges exponentially faster than Algorithm 1. Unfortunately, the estimation bias lower bounds lim H(Ynni xa°) , thus the lower bounds for the capacity are not strictly provable and are n—■oo instead approximations. 0^0.1^0.2^0.3^0.4^0.5^0.6^0.7 ^ 0.8 ^ 0.9 ^1 Deletion probability Pd Figure 3.4: Improved bounds for the capacity of the binary deletion channel. Chapter 3. Bounds for the Capacity of the Binary Deletion Channel^19 3.5 Summary In this chapter, I have proposed two numerical algorithms that converge to the capacity of the binary deletion channel, one from above and one from below. This has allowed me to introduce the first nontrivial upper bound for the capacity and to obtain an approximate lower bound improving on the existing lower bounds. Insertion-deletion error-correcting codes have not been studied as much as their coun- terparts for binary symmetric channels, and the structure of channels admitting synchro- nization errors is not completely understood. For instance, the capacity of the deletion channel can be approached by sending long messages in one block. Although it is debat- able whether or not the capacity can be reached by a practical coding scheme dividing the input sequence into blocks of reasonable length, I show in Chapter 6 that it is pos- sible to design codes for which the additional redundancy required to maintain block synchronization is not prohibitive. Chapter 4 Subsequences, Sequences, and Supersequences 4.1 Introduction Error-correcting codes capable of correcting insertions and deletions of symbols could be useful in several settings where such errors occur, including reading media, com- munication networks, multiuser optical channels, and reconciliation of correlated data. Unfortunately, insertions and deletions introduce synchronization errors that are difficult to analyze. The existing practical codes seem far from capacity and only work when deletion and insertion rates are small [20], or when there is a single error per block [64]. As a special case of a channel with synchronization errors, the deletion channel has each input symbol deleted independently with probability pd and transmitted correctly with probability 1 — pd. For example, the codeword 0012101 could be received as 0211. The deletion channel can model packet-switched communication networks, where packets are either lost or dropped due to buffer overflow at finite buffer queues [21], and is also of interest because it is easier to analyze than its counterparts with insertions and deletions. Designing efficient codes for the deletion channel remains a challenge: several open questions remain even when a single deletion has to be corrected, and very little is known for more than one deletion [98]. One of the reasons why good codes are difficult to construct is that the spheres associated with codewords for the deletion channel do not have the same size, since the number of subsequences when deleting d bits from a string depends on the structure of the string itself. For instance, when two bits are deleted from 0000, the only subsequence is 00, whereas when two bits are deleted from 20 Chapter 4. Subsequences, Sequences, and Supersequences^21 0101, the four 2-bit strings are possible subsequences. In fact, very few partial results are available on how to efficiently compute the number of subsequences when deleting symbols from a string. In this chapter, I study this problem to improve the understanding of deletion-correcting codes. The rest of the article is structured as follows. In Section 4.2, I present the problem and existing results. In Section 4.3, I introduce the concepts of deletion pattern and sandwich and use them to prove closed-form expressions for the number of subsequences of a string with d E {1, 2, 3, 4, 5} deletions; the only existing results are for binary strings and d E {1, 2}. The results presented in Section 4.3 can be extended for larger values of d, although the formula size increases exponentially with d. In Section 4.4, I present a simple recursive definition for the number of subsequences of a string, and use it to construct the first efficient algorithm working for any string, number of deletions, and alphabet size. A similar recursive definition for the multiplicity of a subsequence in a string and its corresponding algorithm are presented in Section 4.5. Finally, Section 4.6 concludes the chapter. 4.2 Preliminaries and Known Results Let E A {0, 1, , Q — 1} be an alphabet with 1E1 > 2 symbols, and let En be the set of all strings of length n over E. We write u u i u2 un to denote a Q-ary string of length n in En and for the length of u. When E = {0, 1}, we write w(u) for the weight of u, i.e., the number of is in u. A run is a substring of identical symbols of maximum length, and r(u) is used for the number of runs in u (we use r when no confusion is possible). A d-subsequence of u is a string of n — d symbols that can be obtained by deleting d symbols from u. We write Dd(u) for the set of d-subsequences of u and 1Dd(u)1 for the number of d-subsequences of u. It is assumed that IDd(u)1 = 0 when d > lul and that D d(U)1 = 1 when d = Jul. The multiplicity of a subsequence u in a string v is denoted by M(u, v), with the assumption that M(€, v) = 1 for any string v. Example 4.1. Consider the string u = 01100. The string has three runs, I.D 2 (u)1 = 5, and D2 (u) = {000, 010, 011, 100, 110}. The subsequence v = 010 appears four times in u, thus M(v, u) = 4. The first thing to remark is that the number of d-subsequences varies widely depending Chapter 4. Subsequences, Sequences, and Supersequences^22 on the string. For example, ID4(000000000000000)1 = 1 and 1D4 (012012012012012)I = 1233. When a single symbol is deleted from u, Levenshtein [60] pointed out that the number of subsequences is equal to the number of runs in u, i.e., 1 ,01(u)1= (4.1) In his survey of single-deletion-correcting codes, Sloane [98] presented a formula for 1D2 (u)1 and E = {0, 1} using the derivatives of u. The first derivative of u is u'-4.-- _1, where ui ED ui±i for 1 < i < n — 1, and the higher-order derivatives of u are defined similarly. Sloane proved that 1D2(u)1 = (7. dd 1 ) — 2w(u') + w(u"). It is not clear how to generalize this result for more than two deletions, and even less so for nonbinary alphabets. Swart and Ferreira [102] proved a different but equivalent formula for d = 2 and binary alphabets, i.e., 1D2 (u)1 = 2—(p2 + 2pq + q 2 — p + — t, (4.2) where p is the number of runs of length 1 in u, q the number of runs longer than 1, and t the number of runs of length 1 not located at one of the ends of u. A trivial upper bound for the number of d-subsequences is given by (4.3), where the first term is the number of strings of length n— d over E and the second term the number of d-tuples of runs from where the symbols can be deleted. IDd (u )1 5 m in (1 E^rd).^ (4. 3) Levenshtein [60] established the following bounds for a string u with r runs, i.e., (r — d + 1) d^< IDd(u)l <^±^1 ) • ^(4.4) The upper bound is tight if and only if all the runs of u are greater than or equal to d. The lower bound is not tight, but Hirschberg and Regnier [47] have established the µ2(n, = E (n — dl) i=0 (4.6) Chapter 4. Subsequences, Sequences, and Supersequences^23 following lower bound, which is tight for the strings 010101 ... and 101010... . d^ — d d(u)1 > E (r 1 . ). i=o (4.5) The maximum number of subsequences for a string of length n is also a problem of interest. Let plE 1(n, d)^max IDd(u)I. In an unpublished report, Calabi [11] (as cited in uEEn [12]) proved that which is much better than the upper bound from (4.4) and matches the lower bound from (4.5) for the strings 010101... and 101010.... Hirschberg and Regnier [47] generalized (4.6) for nonbinary alphabets, i.e., (n, i=0 n — d • ) PIE1-1(d , d — i). (4.7) The value piE l(n, d) is reached for u = acra^, where a is the concatenation of all the symbols of E in any order. Eq. (4.7) was recently used by Levenshtein [62] to study the number of subsequences required to reconstruct the original string. Let Ed(n)^E I Dd (u) I be the expected number of d-subsequences over all strings oflEin uEEn length n. Calabi and Hartnett [12] proved that E2 (n) = 4E1 — 1) + 1 and Hirschberg and Regnier [47] generalized the result for 0 < d < n, i.e., Ed (n) = i =0 — d — 1+i1 (^1 1 — —1E1 ) .^(4.8) Chapter 4. Subsequences, Sequences, and Supersequences^24 4.3 Closed-Form Formulae for the Number of Subsequences when d < 5 In this section, I present a framework that can be used to derive closed-form expres- sions for the number of d-subsequences of any string over any alphabet. I use the framework to prove formulae for d E {1, 2, 3, 4, 5}. Let rk(u) be the number of runs of length k in u (rk is used when no confusion is possible), and let the d-tuple Rd(u) -P= (r(u), ri (u), r2 (u), . . . , rd_i (u)) '' (r, ri , r2, . . . , rd_i) be the run distribution of u. Since subsequences are uniquely determined by the number of symbols deleted from each run, there is no need to distinguish rk from r 1 for k, l > d; it is sufficient to know that there are r — ri — r2 — • • • — rd_i large runs. 4.3.1 Deletion Patterns and Sandwiches Definition 4.2. A deletion pattern (5 of size d is an ordered multiset of d integers such that each i in (5 represents a deletion in the i-th run of u, and such that the multiplicity of i is smaller than or equal to the length of the i-th run. For example, the deletion pattern {1, 1, 2, 4, 4, 4} for the string u = 0001221111 results in the 6-subsequence 0221. Since the number of deletion patterns depends only on the run distribution of a string u, we write Pd(Rd(u)) or Pd(r, ri, r2, ... , rd_ i ) for the set of deletion patterns of size d of a string u whose run distribution is Rd (u), and 1Pd(Rd(u))1 or 1Pd(r, ri, r2, ... , rd_ i )I for the number of deletion patterns. It is assumed that 1P0(Ro(u))1 = 1. Although the run distribution is sufficient to calculate the number of deletion patterns of a string, it is not descriptive enough to derive its number of subsequences. As shown in Table 4.1, the nine strings have the same run distribution, however, each has a different number of 3-subsequences. Hence, one needs to know more about the structure of u in order to determine 1Dd (u)I. For binary strings, any extra information necessary can be obtained from the sequence of the lengths of its runs. On the other hand, this does not suffice for larger alphabets, and information about the actual symbols of each run must further be known. This situation is also depicted in Table 4.1, where all the strings contain three runs of length 1 followed by three runs of length 2. This happens because there are distinct deletion patterns that generate similar subsequences; we call these unnecessary deletion patterns redundant deletion patterns. In order to count the number of d-subsequences of a string, one needs to count how many redundant deletion patterns it contains. To do so, the concept of sandwich is introduced. 2 1^30 Chapter 4. Subsequences, Sequences, and Supersequences^25 Definition 4.3. A sandwich is a sequence of consecutive symbols squeezed between identical symbols, such that its ends are different from the squeezing symbol. The runs of either side of the sandwich are called the squeezing runs, and the length of a sandwich is the number of symbols between the squeezing runs. It is worth noting that when E = {0, 1}, each run not in the beginning nor the end of a string is also a sandwich; the difference appears for nonbinary alphabets. Figure 4.1 shows that the string 1120200130 has two sandwiches of length 1, one sandwich of length 2, and two sandwiches of length 5. The sandwich of length 2 has a squeezing run of length 1 and a squeezing run of length 2. Figure 4.1: Sandwiches for the string u = 1120200130. 1 0 ^1^1^2 0 2 0f ^ 11^1 2 0 2 0 0 >.< 0 ><( x Figure 4.2: Redundant deletion pattern when deleting a sandwich of length 2 and one additional symbol. When a sandwich is deleted from a string, redundant deletion patterns occur if sym- bols are deleted in the run formed by the concatenation of the merging squeezing runs. In Figure 4.2, if the sandwich of length 2 is deleted, then deleting a symbol in any of its Table 4.1: Strings with a R 3 (3, 3, 0) run distribution and their respective number of 3-subsequences. u(1) = 101001100 D3 (u (1) ) = 26 u (2) = 202001100 D3 (u (2>) = 27 u (3) = 202001122 D3 (u (3) ) = 28 u (4) = 201001100 D3 (u (4>) = 29 u (5) = 201001122 D3 (u(5)) = 30 u(6 ) = 102001122 D3 (u (6) ) = 31 u (7) = 212001122 D3 (u (7) ) = 32 u (8) = 021001122 D3 (0)) = 33 u (9) = 012001122 D3(0 ) ) = 34 Chapter 4. Subsequences, Sequences, and Supersequences^26 squeezing runs gives the same 3-subsequence 1120200. Thus, one of the deletion patterns {5, 6, 7} and {6, 7, 8} is redundant. 4.3.2 An Improved Upper Bound for IDd(u)1 The upper bound in (4.4) counts the number of ways to delete d symbols from r runs of length at least d. If u contains smaller runs, then some combinations are invalid, for example deleting three symbols in a run of length 2. Consequently, the upper bound can be improved by considering deletion patterns and sandwiches, as described in Lemma 4.4. Lemma 4.4. Dd(u)I^IPd(Rd(u))1, where the bound is tight if and only if u has no sandwich of length smaller than d. Proof Clearly, each distinct d-subsequence of u corresponds to at least one distinct deletion pattern of size d, which proves the upper bound. We prove both directions of the second statement by contrapositive, starting with the sufficient condition. Suppose that the bound is not tight. It follows that there are two different deletion patterns {a l , a2 , ... , ad} and {bi , b2 , . . . , bd} generating the same subsequence. Since the deletion patterns are different then there is at least one of the pairs (a l , bi ), (a2 , b2 ), ... , (ad , bd) whose elements were not deleted in the same run. Let (ak, bk) be the first such pair. Without loss of generality, it is assumed that pa, the run containing ak, is located before pb, the run containing bk. Since the length of pa after the deletions depends on which deletion pattern is applied, it follows that the only way for the subsequences to be equal is for the run(s) to the right of pa to be deleted completely until a matching symbol is merged. This scenario corresponds to the deletion of a sandwich, which is a contradiction. Finally, for the necessary condition, if u has a sandwich of length smaller than d, then, from the previous subsection, deleting the sandwich and any of its squeezing runs gives the same subsequence, hence the upper bound is not tight. The result follows. ^ When all the runs of u are at least as long as d, then of course u has no short sandwich. Hence, the upper bound of (4.4) cannot be better than the bound of Lemma 4.4. The number of deletion patterns of a string can be evaluated using generating func- tions, as shown next. Chapter 4. Subsequences, Sequences, and Supersequences^27 Lemma 4.5. 1Pd(r,T1, r2, • • • ,rd-i)1 = [z (1]((1 + z)" • (1 + z + z2 ) 1.2^(1 + z + • • • + where [e]f (z) denotes the coefficient of zn of the polynomial f(z). Proof. Let f(z) = (1 + z)r 1 (1 + z + z2 )r2 . . . (1 + z + • • + zd)fd . The i-th factor of f (z) for 1 < i < d enumerates the ways to delete symbols in the runs of length i, and the d-th factor the number of ways to delete symbols in the runs of length greater than or equal to d. All the large runs are treated the same way, since with d deletions, the worst that can happen is for a single run to be deleted completely. The coefficient of the term of degree d in f (z) when expanded gives the number of deletion patterns of size d of u. ^ It is worth mentioning that the d factors of f (z) can be approximated with the first d + 1 terms of their respective Maclaurin series. Multiplying all the factors and taking the coefficient of the term of degree d of the resulting product, it is then possible to obtain closed-form expressions, although they do not reveal how the number of deletion patterns scales with d. A more intuitive method for deriving closed-form formulae for the number of dele- tion patterns is to subtract the invalid patterns from the upper bound of (4.4) using a technique reminiscent of the inclusion-exclusion counting principle. Definition 4.6. Let A = {A 1 , A2, ... } be an ordered multiset of integers. We write lAlk for the number of occurrences of k in A,^E Al,,l ^for the number of elements in A, k and IIAII for the sum of all the elements in A. Finally, let A be the set of all the multisets A representing partitions of k, where 1 < k < d Lemma 4.7. T2, • • • ,rd-Ol^ + d —1) d ) E (-1)1A1+1 (7. d^— !AI —1) H ( rki )d^— IA!^ 1AEA^ 1.< 0 X Chapter 4. Subsequences, Sequences, and Supersequences^30 We write IPi (u, S)I for the number of deletion patterns of size i of the string u from which all the sandwiches in S and a squeezing symbol on each side have been removed, and for which all the merging squeezing runs are considered different. In Figure 4.3, the sandwich of length 2 is again deleted, but this time we are looking for 5-subsequences. Subtracting two symbols to delete the sandwich and one symbol to be deleted in any of its squeezing runs leaves one redundant deletion pattern for each way to delete two symbols in the remaining substring. More precisely, each element in the set {{1, 1}, {1, 2}, {1, 3}, {1, 4}, {1, 5}, {2, 3}, {2, 4}, {2, 5}, {3, 4}, {3, 5}, {4, 5}} generates a redundant deletion pattern, because its union with {5, 6, 7} and {6, 7, 8} generates the same subsequence. Formally, the run distribution of the string is R5 (U) = (8, 6, 2, 0, 0). IX X 2 0 2 0 X [X X 2 0 2 0 0 ><( 0 ><( X iX X tX X 0 2 0 X >.< 0 0 2 0 0 ><( X Figure 4.3: Redundant deletion patterns when deleting a sandwich of length 2 and three additional symbols. Deleting the sandwich gets rid of two runs of length 1, deleting the right squeezing symbol eliminates another run of length 1, and deleting the left squeezing symbol transforms a run of length 2 into a run of length 1. Hence, the run distribution becomes R 2 (u') = (5, 4). From Corollary 4.8, we can conclude that the number of redundant deletion patterns generated by the sandwich is I P2 (1120200130, (2, (2, 0), 1, 2)) I = I P2 (5, 4) I = 11. 4.3.4 Closed-Form Expressions In order to count the number of d-subsequences of a string, one needs to count how many of its deletion patterns are redundant by analyzing all the possible types of sand- wiches and the redundant deletion patterns they introduce. It was shown in the previous Chapter 4. Subsequences, Sequences, and Supersequences^31 subsection that the number of redundant deletion patterns caused by the deletion of a sandwich from a string depends on the run distribution of the sandwich and the length of its squeezing runs. We therefore define the following variables. Definition 4.10. Let s1 be the number of sandwiches of length 1 in the string u. Also, let s91 92^denote the number of sandwiches s of length 1 in u whose run distribution1,ri(s)r2(s)...rt(s) is R1 = (ri (s), r2(s),^, rj (s)), and whose squeezing runs are of length q i and q2 where qi < q2. We write ql and qi to indicate that the length of a squeezing run is at least q i or q2 , respectively. For example, the first sandwich of length 5 in Figure 4.1 is an instance of 4231 , but also an instance of 431, 4 31, 4131, 4231, and 4231 . Since there is only one type of sandwich of length 1, we write s7192 instead of 412 . Also, it should be clear that e^(q1-1)q2Ti92^04142^,(41+ 1)q2 = '1,111 + ulR1^+ • " S1Ri ^ + So l for any q1 > qi . We now prove closed-form expressions for IDd(u)I when d E {1, 2, 3, 4, 5}, starting with a new proof for the case d = 1. Lemma 4.11. The number of subsequences when deleting one symbol from a string u is Proof. When a single symbol is deleted from u, then clearly all the deletion patterns are distinct, thus IDi(u)1 = 17'1(01 — o = r. 0 Lemma 4.12. The number of subsequences when deleting two symbols from a string u is Cr +1) 1,02(u)1^2^ ri^si. Proof. When two symbols are deleted from a string, the only redundant deletion patterns occur for sandwiches of length 1. For each such sandwich, one symbol is required to delete the sandwich, leaving one symbol that can be deleted in either of the squeezing runs. Thus, ID2(u)1 = 173'2 (,r, ri)I — 8 1, and the result follows from Corollary 4.8.^ ^ For binary alphabets, s i is exactly the number of runs of length 1 not located at one of the ends of the string, thus by using^p, r = p + q, and s i = t, it is easy to show that Chapter 4. Subsequences, Sequences, and Supersequences^32 (4.2) is a special case of Lemma 4.12. The fact that the number of d-subsequences of a string also depends on its sandwiches when d > 2 is one reason why efficient block codes capable of correcting more than one deletion error have proved difficult to construct. Lemma 4.13. The number of subsequences when deleting three symbols from a string u is (r + 2)^sr (7, - 3) - siiI(r - 2) - s23.2(r - 1).1D3 (u)1 =^- r2 - rir - s2 -3 Proof. First, each sandwich of length 2 generates a redundant deletion pattern, as shown in Figure 4.2. Second, there are three types of sandwiches of length 1 that generate redundant deletion patterns: a symbol between two squeezing runs of length 1, a symbol between a squeezing run of length 1 and a squeezing run of length at least 2, and a symbol between two squeezing runs of length at least 2. Thus, from the previous subsection, the number of 3-subsequences of a string is 1 1)3( 7-)1 = 1P3(r,ri.,r2)1 - 82 - s1 1 1Pi(r - 3)1 - 8P1Pi(r - 2)1 - 8r1Pi(r - 1)1, and the result follows from Corollary 4.8.^ 111 Unfortunately, the approach used so far cannot be used to compute 1Dd (u)1 for d > 3, because some redundant deletion patterns are deleted more than once. Consider the number of 4-subsequences of the string u = 0100022212. As shown in Figure 4.4, from the first sandwich of length 1, we find that the deletion patterns {1, 2, 4, 5} and {2, 3, 4, 5} generate the same subsequence, as do {1, 2, 5, 6} and {2, 3, 5, 6}. From the second sand- wich of length 1, we find that the deletion patterns {1, 2, 4, 5} and {1, 2, 5, 6} generate the same subsequence, as do {2, 3, 4, 5} and {2, 3, 5, 6}. The four deletion patterns gen- erate the subsequence 000222, and if the redundant deletion patterns for each sandwich are deleted independently, four deletion patterns will be deleted instead of three. It is important to note that this does not occur for consecutive sandwiches of length 1, as in the string 0000202222, since each sandwich is also a squeezing symbol of the other sandwich. Definition 4.14. Let sa_b be the number of consecutive sandwiches of length a and b (in any order) in a string u. As previously, s aqlq, is used to specify the length of the squeezing runs. For example, s 1 _ 1 is the number of consecutive sandwiches of length 1 in Chapter 4. Subsequences, Sequences, and Supersequences^33 IX^0 0 0 2 2 X X 2 t 0 X X 0 0 2 2 X X 2 p8( F3 0 0 0 2 2 2 X X t 0 Ej X 0 0 2 2 2 X X IX X 0 0 0 2 2 X F4,3 2 tX X 0 0 0 2 2 2 Fj. X 1 ° X X 0 0 2 2 X X 2 1 0 X X 0 0 2 2 2 X X Figure 4.4: Multiple deletion of a redundant deletion pattern. u; u = 001011 has one instance of consecutive sandwiches of length 1, and u = 01010202 has three such instances. Lemma 4.15. The number of subsequences when deleting 4 symbols from a string u is 1D4 (u)1 = Cr 3) — r3 — r2 r — r i (r + 1) + ( 1 ) — s3 - 4-2(r — 4)4^ 2^2 -(401+42)(7-. — 3) — (s 12L + s2 2)(r — 2) — 8222,01(r — 1) — s, 1 ( (r 2 ) _ ^r 1ri + 3 _ sr^2^_ ri + 1 —s, j. ( (r — 2 1) Ti + 2 —s212^ 2 r — ri — 1 — sr^ 2 r I — r i 33 ( (r—s i^2) — r i + 1) + (821 ) — si-i. Proof. The number of redundant deletion patterns deleted when considering all the sand- wiches independently is s3 +^!Pi (r — 4)1 + (4,101 + 4) IP]. (r — 3)1 + (sew + sn)1Pi(r — 2 )1 + sroi 1P1 (r — 1)1 + s il l 1P2(r — 3,r 1 — 3 )1 + s12 1P2(r — 2, r 1 — 1)1 + sP1P2(r — 2, r1 — 2)1 + sT2 1P2(r — 1, r1 + 1)1 + s13 1P2(r — 1, ri)1 + sr1P2(7- — 1 , 7-1 — 1 )1. (4.12) Chapter 4. Subsequences, Sequences, and Supersequences^34 Furthermore, as mentioned before the lemma, there is a redundant deletion pattern that should not have been deleted for every pair of nonconsecutive sandwiches of length 1. Putting everything together, we get 1D4 ( 11)1 IP4(r, ri, r2,r3)I - (4.12) + ( s21) - si-i• The result follows from Corollary 4.8. 111 Redundant deletion patterns that are deleted more than once are very similar to what occurred in Lemma 4.7 when counting the number of deletion patterns, hence closed-form expressions for the number of d-subsequences can also be derived using the inclusion-exclusion counting principle. Definition 4.16. Let A = {Ai, A2, ... } be an ordered multiset of integers and A be the set of all the multisets A representing partitions of k < d - 1A1, as presented in Definition 4.6. Moreover, let S be an ordered multiset of sandwiches as introduced in Definition 4.9, and let I SI and ISI S be number of sandwiches in S and the multiplicity of the sandwich s in S, respectively. We use SA for the set of all the ordered multisets of sandwiches whose lengths correspond to the elements of A, i.e., SA {S such that I = I AI and 0 1 = A., for 1 < i < IAI}. Theorem 4.17. The number of subsequences when deleting d symbols from a string u is given by 1Dd(U)I = IPd(Rd(U))1 c,^(1,RI,q1,92) sql.q2^)] ( U) 51 )1 ISIAEA^SESA _ E Howl Da and all the correction terms AA are equal to 0 if every symbol of u belongs to at most one sandwich (including the squeezing runs). Proof. First, recall that the number of d-subsequences can be found by counting the number of redundant deletion patterns that need to be subtracted from IPd(Rd(u))I. Following the discussion from the previous subsection, the number of redundant deletion patterns, if all the sandwiches of u are analyzed independently, is ip(u, (Ai, RA„ q1 , q2 ))1sq,'-q2„,,, • {) i }EA (4.13) Chapter 4. Subsequences, Sequences, and Supersequences^35 Second, consider two sandwiches in u that do not share any symbol, even when con- sidering their squeezing runs. Without loss of generality, we consider two sandwiches made of the m-th and n-th runs of u, respectively. As mentioned above in the anal- ysis of the string in Figure 4.4, when both sandwiches are deleted, the deletion pat- terns {m — 1, m, m, ... ,m, n — 1, n, n, ... , n}, {m — 1, m, m, ... ,m,n,n,... ,n,n + 1}, {m, m, ... , m + 1, n — 1, n, n, ... , n} and {m,m, ... ,m,m + 1,n, n, ... , n, n +1} generate the same subsequence, but four are deleted instead of three. Moreover, it is clear that adding identical deletions in the four deletion patterns will also create the same prob- lem, i.e., four new deletion patterns generating a common subsequence that get deleted instead of three. In fact, the number of deletion patterns that should not have been deleted in (4.13) corresponds to the number of deletion patterns of the string minus the two sandwiches and a squeezing symbol on each side of them. It follows that when con- sidering all the pairs of sandwiches, the number of deletion patterns that should not have been deleted from (4.13) is {A1,A2 }EA E {5[1],s[2] } Es{A1 , A2} IPd-A1 -A2-2(U) {S PI ) S[21 1)1 . a, (4.14) where s[ 1] = (1 11 ', R 1 1J) , q11> , qP) and s[2] = 0 21 , R 1 2[j) , q1 q12] ) are two sandwiches, # s [ 11 and #s [2] are the number of occurrences of s [1] and 8 [2] in u, respectively, and a = #8 [11 . #s [21 if s[1] s[21 and a = (# 52 [1] ) if s [1 ] = s[2]. If two sandwiches in u share some symbols, then subtracting the two sandwiches and a squeezing symbol on each side might not cause the problem occurring in Figure 4.4 and for which (4.14) was derived. This explains why a correction term 0o 1 ,A 2 1 needs to be subtracted to account for sandwiches sharing some symbols. Eq. (4.14) adds some deletion patterns more than once, so that all the ways to delete three sandwiches from u have to considered, and so on. This procedure is again an application of the inclusion-exclusion counting principle, and the result follows when accounting for all the partitions in A. ^ Definition 4.18. Let s i_x_ i be the number of instances of sandwiches of length 1 sep- arated by a single symbol. For instance, s l_x_ 1 = 1 for u = 000102000 and s i_x_ i = 2 for u = 010101. We use Theorem 4.17 to find a closed-form expression for the number of 5-subsequences of a string. The theorem is only moderately useful because it does not describe how to derive the correction terms AA. Since the resulting closed-form formula is quite large Chapter 4. Subsequences, Sequences, and Supersequences^36 and can easily be obtained with Corollary 4.8, only the terms for each member of A are described. The number of redundant deletion patterns when considering all the sandwiches of length 1 to 4 independently, corresponding to A = {1}, A = {2}, A = {3}, and A = {4}, is 11^ 11S4 + s3 ,3 17)i(r - 5 ) 1 + (83,3 + s3,11)17)1(r - 4)1 + (811^r,322\iv^g\I ^ •^3,001 ' '3,11 1- e,3/1r ")1 + (seooi +^)^(r - 2)1 + srooi^(r - 1)1^?,+ 8 ,,01,1 P2(r - 1, + 2)1 + se311P2(r - 1, r 1 + 1)1 + sfo1lP2(r - 1, ri)1 + 4211)2(r - 4, r i - 4)1 + (4101 + 42,2)17)2(r - 3, ri - 2)1 + 4,2173'2 (r - 3, - 3)1 + (40 1 + 42)1P2(r - 2, ri)1 + (s 12-,31 + 42)1P2(r - 2, r i - 1)1 + sB1P2(r - 2,Th - 2 )1+ .5 1 1 17)3(r - 3, ri - 3, r2)1 + 4 2 11)3(r -^- 1,r2 - 1 )1 + s13 1P3(r - 2, r i - 2, r2 +1)1+ sr1 1P3(r -^- 2 ,r2)1 + 42 1P3 (r -^+ 1,r2 - 2)1 + 43 17)3(r - 1, ri,r2)1+ snP3 (r -1 , r1 ,r2 - 1)1 + 4. 3 1P3 (r - 1 , r 1 - 1 , r2 + 2)1 + sf1P3 (r - 1 , r 1 - 1 , r2 + 1)1 + sfI lP3 (r - 1, r 1 - 1, r2)1. (4.15) In (4.15), there is a nonredundant deletion pattern that gets deleted for each nonconsec- utive combination of a sandwich of length 1 and a sandwich of length 2, thus we need to add S1S2 - S1-2 (4.16) deletion patterns; this corresponds to A =^2}, where 0{ 1 ,2} = s 1 _2 . The last case to consider is A = {1,1}. Counting all the pairs of sandwiches of length 1 independently, we need to add (511)^- 6)1 + (811) 1Pi(r - 4)1 + (42)^(r - 2 )12^2 + .5 1 1 s12 1Pi (r - 5 )1 +^(r - 4)1 + s1 242^(r - 3 )1 (4.17) redundant deletion patterns that were included twice in (4.15). The correction term 00, 1 1 is more complicated to derive. First, since (4.17) includes consecutive sandwiches Chapter 4. Subsequences, Sequences, and Supersequences^37 of length 1 for which no deletion pattern was deleted twice in (4.15), we have to subtract S1-1 I (r - 6 )1 + S1- 1/ 11Pi (r — 5)1 + 22^(r — 4 )1 (4.18) deletion patterns that should not have been counted in (4.17). Second, (4.17) also con- siders sandwiches of length 1 separated by a single symbol, and again in this case no redundant deletion pattern is deleted more than once in (4.15). Thus, we need to sub- tract 11^ 'D (7-^C\ t^d2^I'DS l-x-11P1(r — 6 )1 +^ i (r — 4)1 deletion patterns from (4.17). Third, we need to add 11^ 22S l-x-11P1 (r - 5 )1 + Slx-11P1 (r - 4 )1 + Sl-1 1P1 (r - 3 )1 (4.19) (4.20) deletion patterns; this corresponds to redundant deletion patterns that were subtracted twice, once when deleting a sandwich of length 3 and once when deleting two sandwiches of length 1 separated by a single symbol. Consequently, 0{1,1} = (4.18) + (4.19) — (4.20). Lemma 4.19. The number of subsequences when deleting 5 symbols from a string u is 1D5(u)1 = 1P5(r, r17 r2, r3, r4)1 — (4.15) + (4.16) + (4.17) — (4.18) — (4.19) + (4.20). We find the number of 5-subsequences for the string u = 021202112. The run distri- bution of u is (8, 7, 1,0, 0), and the different sandwiches are 8 4 = 1, 4 13 2, 423 1, sZ ioi = s2 = 1, si t = s 1 = 2, and sP x_ i = 1 (all the other variables are equal to 0). It follows that 1D5 (021202112)1 =^7,1,0,0)I — s4 — s331P1( 3 )1 — s331M(4 )1 —4,1o1IP2( 5 , 5 )1 — 4 1 17'3(5, 4 ,1)1+ 82s1 +(s211 )1P1(2 )1 — s1x-11 7'1(2 )1+ si i x-i1P1(3 )1 = 91-1-6-4-10-28+2+2-2+3 = 47. To conclude this section, we mention that the different variables of the closed-form formulae can be computed in a few passes on the string from left to right with a sliding Chapter 4. Subsequences, Sequences, and Supersequences^38 window varying in size. For moderately long codewords, the closed-form expressions are faster than the algorithm presented in the next section. Unfortunately, it is not feasible to extend Theorem 4.17 for large values of d, because the correction terms become increasingly complex and the size of the formulae grows exponentially with d. Consider simply the number of different sandwiches of length d/ 2; this corresponds to the number of unrestricted partitions of d/ 2, which grows exponentially with d. 4.4 An Efficient Algorithm to Compute the Number of Subsequences In this section, I present a simple and efficient algorithm to calculate the number of subsequences when deleting symbols from a string of length n. I take advantage of the common prefixes among the subsequences and express IDd(u)I using a recursion on d and the length of the string. Let Dd, q (u) be the set of d-subsequences of u whose last symbol is q, and 1Dd,q (u)1 the number of such subsequences. For example, D2 ,0 (01120) = {010, 020,110,120}. Theorem 4.20. Let p,q E E. If d j correspond to d > lu i u2^uj 1 and only contain 0 in the last subcell; the cells (i, j) with i = j correspond to d =^and only contain 1 in the last subcell. The rest of the table is filled column per column from left to right using Theorem 4.20. An example of the algorithm for the string u = 0222021 is presented in Table 4.2. Consider, for example, cell (d = 2, u 7 ). Since u7 = 1, we have 1D2(U1U2 • • • U7)I = 1D2,0(,U12/2 . . . 117)1 + 1D2,1(UiU2 . . . U7)1 + I D it n. _ 2,2 (,-1-2 • . . U7)I = 1/31,0(U1U2 . . . U6)1 + ID2(242/2 . . . U6)I ± 'D1,2(4112 ... U6)1 = 1 ± 6 + 3 = 10. Hence, the string has ten 2-subsequences. Lemma 4.21. The space complexity of the algorithm is in 0 (min(d2 I E I log r, d(n — d)IE I log I El)) bits, and the time complexity of the algorithm is in 0(min(nd2 I El log r, n(n — d)dl El log El ). Proof. From (4.3), the number of d-subsequences of u is in 0(min(rd,^thus each element in the table requires 0(min(d log r, (n — d) log IED) bits of storage space. Furthermore, each nontrivial cell (i, j) where 0 < i < j requires the information located in the cells (i, j — 1) and (i — 1, j — 1). Consequently, the table can be computed column per column, only requiring the storage of 0(d) cells. Since each cell has 1E1 + 1 entries, the space complexity is in 0(min(d 2 IEI log r, d(n — d)1Ellog I E I)) bits. The algorithm requires a sum of I E I numbers and lE I copies per cell. Since all the numbers have 0(min(d log r, (n — d) log IE I)) bits and the table has nd cells, the time complexity is in 0(ndlEl min(d log r, (n — d)loglE1)). I=1 If 1E1 and d are constant, then the number of d-subsequences of u can be calculated in 0(n log n) time and using 0(log n) bits of storage space; if lEl is constant and d = an where 0 < a < 1, then the time complexity is in 0(n3) and the space complexity in 0(n2) bits. The algorithm is not efficient when IDd(u)l is small. However, the strings with a small number of subsequences have very few runs, including very large ones. Since the entries in the table do not change after the (d + 1)-th symbol of a long run, we can Chapter 4. Subsequences, Sequences, and Supersequences^41 modify the algorithm accordingly and reduce its time complexity to O(min((n + dr)d1E1 log r, (n + dr)(n — d)1Ellog1E1). Theorem 4.20 and the corresponding dynamic programming algorithm are quite gen- eral and can be modified or extended to obtain other interesting results and simpler proofs of existing results. For example, they can easily be modified to obtain a recurrence rela- tion for the expected number of d-subsequences over all strings of length n. We conclude this section with a simpler proof describing strings whose number of d-subsequences is maximum, which follows directly from the next lemma. Lemma 4.22. Let last(p) be the last position where the symbol p appears in the string u, with last(p) = —1 if p does not appear in u. If last(q) > last(p), thenIDd, q (u)1> IDd ,p (u)I. Proof. We prove the lemma by induction on the cells (i, j) of the table given by the dynamic programming algorithm used to compute 1Dd(u)1. Clearly, the lemma holds for the cells (0, j) and the cells (i, j) where i > j. Assume that 0 < i < j and that the result holds for the cells (i — 1, j — 1) and (i, j — 1); we need to prove that the lemma also holds for the cell (i, j). From the inductive hypothesis and the fact that ^uj )1^ID • • • ui_1)1 for all p ui from Theorem 4.20, the lemma holds for all the corresponding values in cell (i, j). ^Since last(%) = j , the last thing to prove is that 1Di ,uj (u02^ui )1 > 1Di,p (ui • • • ui )1 for every p E E. 1D i ,u,^ui)1 = 1D i (u 1 u2^ui_ i )1 from Theorem 4.20, and 1Di(u02^)1 is greater than or equal to m ax EE (1Di_Lp (n i u2^ui_ i )1), since for each p substring in Di_ 1 ,p(u02^ui_ i ), the deletion of the last symbol gives a substring in ^ ui_ i ).^ ^ From Lemma 4.22, if one wants to maximize the number of descendants of a string, then identical symbols should be as far apart as possible. Corollary 4.23 (Hirschberg and Regnier [47]). Let u be a string of length n. 1Dd (u)1 is maximized for u = o-o- . . . a, where a is the concatenation of all the symbols of E in any order. Chapter 4. Subsequences, Sequences, and Supersequences^42 4.5 An Efficient Algorithm to Compute the Multiplicity of a Subsequence in a String Using a simplified version of the technique used in the previous section, it is possible to give a recursive definition for the multiplicity of a subsequence in a string. Lemma 4.24. 0^ if k > 1; 1 if k = 0; M(u i ...uk,v i ...vi) =^M(u i ...uk,vi • • • vi-i)^if 0 < k < 1 and uk^VI M (Ui . . . uk-1, V1 • • • V1-1) + M Oil . . . uk, Vi • • • V1-1) if 0 < k < 1 and uk = vi . Proof Let u g-'-- u i ... uk and v- v i ... vi. The case k > 1 is trivial and the case k = 0 comes from the definition. For the third case, it is sufficient to see that since uk v i , all the bits u must be taken from the first 1 - 1 bits of v, thus the multiplicity of u in v is equal to the multiplicity of u in v i ... '0_ 1 . For the last case, all the instances of u in v can be divided into two distinct sets: the first set includes the instances for which the last bit is vi , and the second set the instances for which the last bit is vi', where 1' < 1. For all the instances of u in the first set, the first k -1 bits are taken from v i ... 74_ 1 , hence the cardinality of the set is the multiplicity of u1u2 • . • uk-1 in v1v2 • .. '0_1. The instances of u in the second set are equivalent to the third case, and the result follows. ^ Again, simple recursive and dynamic programming algorithms can be derived from Lemma 4.24. Table 4.3 illustrates the dynamic-programming algorithm and shows that the subsequence u = 0010 appears 19 times in the string v = 000100110. An interesting feature of the algorithm is that it calculates M(u i ... uk , , vi ... vie) for k' < k and 1' < 1 on its way to M(u i ... uk , vi ... vi). This will turn out to be very useful when studying maximum-likelihood decoding of synchronization error-correcting codes in Chapter 6. 4.6 Summary In this chapter, I have studied the number of subsequences when deleting d symbols from a string. I have defined the concepts of deletion pattern and sandwich and used them to construct a framework for finding closed-form expressions for the number of Chapter 4. Subsequences, Sequences, and Supersequences^43 Table 4.3: Multiplicity of the subsequence u = 0010 in the string v = 00010010 E^V1 = 0 2/2 = 0 v3 = 0 v4 = 1 V5 = 0 V6 = 0 v7= 1 V8 = 0 E 1 1 1 1 1 1 1 1 1 ili = 0 0 1 2 3 3 4 5 5 6 u2 = 0 0 0 1 3 3 6 10 10 15 u3 = 1 0 0 0 0 3 3 3 13 13 u4 = 0 0 0 0 0 0 3 6 6 19 subsequences. I have presented the first closed-form expressions for nonbinary alphabets and the first closed-form expressions for d E {3, 4, 5}. Unfortunately, due to the size of the closed-form expressions, it is not feasible to extend the framework for very large values of d. Nevertheless, I have proved a simple and efficient algorithm to numerically compute the number of subsequences. The algorithm works for any string, number of deletions and alphabet size. I have also presented a simple algorithm to calculate the multiplicity of a subsequence in a string. Part II Synchronization Error-Correcting Codes 44 Chapter 5 Variable-Length Synchronization Error-Correcting Block Codes In this short chapter, I describe Levenshtein's well-known single-synchronization block codes and present a simple extension for multiple insertion and deletion errors. 5.1 Levenshtein's Single-Synchronization Error-Correcting Codes Error-correcting codes for channels with synchronization errors were first studied by Lev- enshtein [60], who gave bounds on the size of codebooks able to correct a constant number of synchronization errors per block. Levenshtein also explained how to maintain synchro- nization during long transmissions by inserting synchronization bits between blocks, and studied the equivalent of the Z-channel for synchronization errors [59]. Levenshtein [60], using a result from Varshamov and Tenengolts [106], provided a very nice construction and decoding algorithm for single-synchronization error-correcting codes of length n. Consider all the binary codewords of length n such that E ix, 0 (mod n + 1); i=i this set forms a codebook Cn capable of correcting one synchronization error. For ex- ample, C5 = {00000, 10001, 01010 7 11011, 00111,11100}. The minimum distance of the codebooks Cn is 2 (On and 10n-2 1 are always members of Ca), and as such they can detect a substitution error but cannot correct it. Let 1Cn 1 be the size of the codebook 45 Chapter 5. Variable-Length Synchronization Error-Correcting Codes^46 Cn ; the first few terms of the sequence { Cn I} for n > 1 are 1, 2, 2, 4, 6, 10, 16, 30, ... Combining results from Levenshtein and Varshamov [105], it is possible to show that ^ rn_ i l = — 1 N-- 2n ^0(d)2i ‘--, n ICnI "' n—,^ (5.1) where 0(n), Euler's totient function, is the number of positive integers less than or equal to n that are relatively prime to n, and f (n) ,--, g(n) if and only if lim -f-LI /2 = 1. Hence, K;,o g(n) the rate of these codes is asymptotically optimal. Levenshtein's codes were subsequently studied by several researchers. In a survey of single-deletion-correcting codes, Sloane [98] pointed out that several interesting questions remain unanswered, the main ones in my opinion being the possible connection between the maximum size of single-deletion-correcting codebooks and the number of output se- quences from the complemented cycling shift register, as well as the difference between the size of the largest codebooks and the size of the largest linear codebooks. Unfor- tunately, it does not seem that Levenshtein's construction can be generalized to correct multiple synchronization errors in an asymptotically optimal way. Helberg and Ferreira [46] extended Levenstein's construction for multiple insertion and deletion errors, but the loss in rate is significant, they do not provide efficient encoding and decoding algorithms, and their technique does not scale well for large block sizes. Perfect deletion-correcting-codes were also defined by Levenshtein [61]. A e-deletion- correcting code of length n is perfect if the e-subsequences of all the codewords partition the set {0, l}n-e. Result 5.1 (Levenshtein [61]). For every n > 1, the single-deletion-correcting codebook C,„ is perfect. Following Levenshtein, Mahmoodi [65] showed that for every v there exists a perfect 3-deletion-correcting code with codebook of size v. In fact, several papers [8, 65, 66, 35, 113, 92] studied perfect deletion-correcting-codes using design theory, but all the constructions are mostly derived using ad hoc techniques and without efficient encoding or decoding algorithms. odd din Chapter 5. Variable-Length Synchronization Error-Correcting Codes^47 As mentioned in the previous chapters, the balls associated with synchronization codewords do not all have the same size, thus upper bounds for codes correcting syn- chronization errors are much more difficult to derive than upper bounds for codes over the erasure and binary symmetric channels. For instance, the maximal size of codebooks correcting more than 1 deletion of bit is not known, even for very small block lengths [99]. In theory, this question can be answered by computing the size of the largest independent set in the characteristic graph for this problem. Unfortunately, it was proved [44] that if ArP g BPP [84], then for every € > 0, it is impossible to estimate the size of the largest independent set in a graph of n vertices in polynomial time, even within a factor n' from the answer. 5.2 Repetition Synchronization Error-Correcting Codes In this section, I present a simple extension of Levenshtein's single-synchronization codes to correct more than one error by increasing the length of the runs of the original single- synchronization codewords. Lemma 5.2. The total number of runs for {0,1}n is (n -I- 1)2n-1 . Proof This can be easily computed by solving the recursion { 2^if n = 2; an — an-1 + 2n-1 if n > 2. Lemma 5.3. The total number of runs for all the codewords in the codebook C7, is 2n-1 . Proof First, from Result 5.1, the total number of 1-subsequences for all the codewords of Cm is 2'. Second, from (4.1), the number of subsequences when deleting one symbol from a string corresponds to its number of runs. Hence, the number of runs in Cm is 2n-1 . ^ There are many interesting similarities between sequences originating from insertion- deletion error-correcting codes and other famous integer sequences. For instance, let p(n) be the number of partitions of n into distinct parts; the first few terms of the sequence Chapter 5. Variable-Length Synchronization Error-Correcting Codes^48 Table 5.1: Extensions of C5 for two synchronization errors. C5 cg cl cg cg cg 00000 0000000000 000000 000000 000000 0000 10001 1100000011 11000011 11000011 110000011 110000011 code- 01010 0011001100 0011001100 0011001100 0011001100 0011001100 book 11011 1111001111 11100111 11100111 11100111 11100 1111110000 1111000 1111000 1111000 1111000 00111 0000111111 0001111 0001111 0001111 0001111 Cap. 1 sync. 2 sync. 1 sync. 2 del. 2 del. 2 sync. Rate 0.517 0.258 .337 .306 .330 .345 P = {p(n)} for it > 0 are 1, 1, 1, 2, 2, 3, 4, 5, 6, .... Furthermore, we define the sequence n Q = {q(n)}, where q(n) = > p(m). In other words, the sequence Q corresponds to the m=o partial sums of the sequence P, and its first few terms are 1, 2, 3, 5, 7, 10, 14, 19, 25, .... One can see that the number of infinite sequences whose weighted sum is n is p(n), and that the total number of runs for the strings whose weighted sum is it is q(n). Another interesting similarity is that the recurrence from Lemma 5.2 corresponds to the number of states of the briefcase planning domain, a famous problem first introduced by Pednault [85]. If is unclear whether or not this is more than a coincidence. By duplicating each symbol k times, an e-synchronization-correcting code with code- words of length n becomes an (e + k)-synchronization-correcting code with codewords of length n • (k + 1). A more promising approach, instead of duplicating each symbol k times, is to add k bits in each run. Using this technique in Levenshtein's original codes, it follows from (5.1) and Lemma 5.3 that the codebook Cri can be transformed into a codebook of average length • ^• n^(^k ) .-, n + k 2 = n 1 + - 22n Although the resulting codebooks are not necessarily able to correct k + 1 synchroniza- tion errors, they can easily be modified to do so, either by removing codewords and by increasing or decreasing the length of appropriate runs. This simple technique to increase the error-correction capability of Levenshtein's codes is far from optimal, although it is better than several codes recently presented [46, 13]. Interestingly, the resulting codes are variable-length codes, and as a result they can be more efficient against one type of synchronization error (either insertions or deletions) than they are when insertion and deletion errors can occur together. This will be discussed in more detail in Chapter 6. Table 5.2: Double-deletion-correcting codebooks obtained by increasing the length of the runs of Levenshtein's codewords. C2 C3 C4 C5 C6 C7 C8 C9 000 111 0000 110011 00000 0011100 1100011 11111 000000 0001111 0011001100 11100111 1111000 0000000 0001100111 00110001100 001111100 11001110011 111000111 1110011000 1111111 00000000 00110011111 001110011100 00111100011 1100000011 11000111100 11001100110011 1111001111 11111001100 000000000 00000111100 000111100111 001100111001100 0011100011100 00111001100011 00111100000 00111111100 11000000011 11000011111 110011000110011 1110011100111 111001111000 11110011001100 11111000011 111111111 0000000000 00000011111 00000110011100 0000011100011 0001100000111 000110011110011 00011110001100 00110000001100 00110001111000 001100110001111 001100110011001100 00111001100000 0011100111111 001111100110011 001111110000 11000011001111 1100011100000 110001111111 110011001111100 110011110011000 111000000111 1110000011000 11100011110011 1110011001100111 111100011001100 11110011000011 11111000000 111110011111 1111110011100 111111100011 A CD Chapter 5. Variable-Length Synchronization Error-Correcting Codes^50 Table 5.3: Codebook size versus average block length for double-deletion-correcting codes. block length 6 7 7.6 8 9 9.25 10 10.9 11 12 13 13.5 _ 14 increased runs 4 5 8 9 16 30 maximum size 4 5 7 11 16 > 24 ? ? ? HF2002 [46] 3 4 5 6 8 9 11 15 18 An illustration of the extensions proposed above for the codebook C5 is presented in Table 5.1. The double-synchronization-correcting codebook q is obtained by duplicating every bit of C5. Since the reduction in rate from C5 to Q is prohibitive, cig is instead obtained by adding one bit in each run of the codewords of C5. q cannot correct two deletion errors because both 11000011 and 11100111 have the 2-subsequence 110011; this can be solved either by removing one of the two codewords from the codebook, as in cg, or by adding a single bit in the middle run of 11000011, as in C. Neither q nor cg can correct two synchronization errors, because 00110000 can be obtained by deleting two bits from 0011001100 and by inserting two bits in 000000. However, this can be solved by removing two bits from 000000, and the resulting code cig can correct two synchronization errors. It should be noted that the rate of q is one third higher than the rate of C. Double-deletion-correcting codebooks obtained by extending C„, for 2 < n < 9 are included in Table 5.2. One bit was added in each run, and as it was done for cw, the codewords too close from each other were removed from the codebook. Note that codes with higher rates could be found by adding or deleting bits in some of the runs in- stead of discarding entire codewords. In Table 5.3, the size of the resulting codebooks is compared to the maximum size of double-deletion-correcting codebooks [99] and to the double-synchronization-correcting codes of Helberg and Ferreira [46] with compara- ble block lengths. It should be mentioned that the size of the largest double-deletion codebook with codewords of length 11 is at least 24, but the exact value is unknown. The largest double-deletion codebooks with codewords of length greater than 11 are also unknown. 5.3 Summary In this chapter, I have presented a technique to construct reasonably good variable- length (k + 1)-synchronization error-correcting codes. Codebooks are first preprocessed by adding k bits in each run of Levenshtein's single-correcting codewords. The resulting Chapter 5. Variable-Length Synchronization Error-Correcting Codes^51 codebooks cannot correct k + 1 synchronization errors, but due to the good properties of Levenshtein's codes, this can be done either by removing codewords or by varying the length of the problematic runs. The number of codewords in Levenshtein's codebooks increases exponentially with the block length, and as an unfortunate consequence the codebook derivation cannot be done for very large block lengths. The codes constructed in this chapter outperform several codes previously published but are still far from optimal. Although increasing the length of the runs could also improve the correction capability of any code robust against synchronization errors, it seems that designing more efficient multiple-synchronization error-correcting codes will have to be done without extending Levenshtein's single-synchronization codes. This certainly is a challenging task. Chapter 6 Nonlinear Trellis Codes for Symbol Synchronization Errors 6.1 Introduction In modern communication systems, additive noise and synchronization are treated as different problems and overcome using different techniques. This being said, both have the same effect on communication channels, i.e., reducing their capacity. Unfortunately, although it has often been conjectured that error-correcting codes capable of correcting timing errors could improve the overall performance of communication systems, they are extremely challenging, which partly explains why a large collection of synchronization techniques not based on coding were developed and implemented over the years. One of the reasons why coding is difficult is that channels with timing errors have memory, hence the techniques developed for memoryless channels and additive noise can seldom be used. Besides the work pioneered by Levenshtein and discussed in the previous chapter, there are few mathematical tools that can be used for code design, and as a result the vast majority of known codes capable of correcting more than one synchronization error per block have no chance of being of any practical use. For instance, Schulman and Zuckerman [89] showed the existence of "simple, polynomial-time encodable and decodable codes which are asymptotically good for channels allowing insertions, deletions and transpositions" , although their constructions are far from explicit and seem to be only of theoretical interest. They use a concatenated scheme consisting of a Reed-Solomon outer code and an inner code found by a greedy algorithm. Another major challenge of synchronization error-correcting codes is that the boundaries of the blocks might be 52 Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 53 unknown to the receiver, thus symbol and word synchronization must be considered. Several researchers have juggled with ways to use coding in one form or another to protect communication systems against synchronization errors, for instance using synchronizable codes like comma-free codes [18, 41], prefix codes [38], and codes with bounded synchronization delay [40]. Synchronizable codes are designed to recover syn- chronization efficiently and sometimes to correct a few substitution errors, but not to correct synchronization errors per se. An interesting survey of synchronizable codes was written by Cummings [19]. Liu and Mitzenmacher [64] recently modified Levenshtein's single-deletion codes so they can be used in a concatenated fashion. Their construction is very similar to comma-free codes, and the decoding works as long as there is at most one error per block, this limit again coming from the fact that extending Levenshtein's construction for more that one error seems very difficult. A technique to detect and correct synchronization errors is to insert periodic markers in the codewords. Marker codes were first introduced by Sellers [90], who used the marker 001. Sellers's construction can correct one insertion or deletion error between successive markers. Ferreira et al. [34] used markers based on Levenshtein's codes [60] that can recover synchronization if there is at most one substitution or synchronization error per block, where a block contains the bits between consecutive markers plus a marker. Their algorithm can correct synchronization and substitution errors if there is one of them per two blocks. Markers can also be used to avoid synchronization errors from propagating from block to block, which can happen when the decoder does not know the exact block boundaries [14]. In a remarkable paper, Davey and Mackay [20] designed a concatenated scheme using a nonlinear inner code named watermark code and a low-density parity- check outer code over a nonbinary field. The output of the LDPC decoder is mapped into a sparse binary string, which is then added modulo 2 to the a watermark vector known to the transmitter and receiver. The watermark can be seen as a marker uniformly distributed throughout the codeword and is used to recover synchronization. One of the codes they presented has a rate of 0.7 and can correct 30 synchronization errors per block of length 5000. Davey and Mackay's approach was further studied by Ratzer [86], who showed that irregular watermarks gave better performance at very low error rates. A (d, k)-constrained runlength-limited code is a code such that there is at least d zeros and and most k zeros between consecutive ones. Runlength-limited codes, commonly used in magnetic and optical recording systems [48], were also studied for channels affected by synchronization errors. Roth and Siegel [87] designed constrained BCH codes based on the Lee metric [58] able to correct bitshifts as well as insertions and deletions of zeros Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 54 (a bitshift can be seen as the deletion of a zero followed by an insertion of a zero on the other side of a 1). Their framework was extended by Bours [9], who obtained codes capable of correcting a larger number of insertions and deletions of zeros. Similarly, Levenshtein considered codes able to correct insertions and deletions of ones [59]. In these settings, synchronization cannot be completely lost because the receiver knows the boundaries between the runs. In fact, there is a one-to-one correspondence between error patterns from a binary channel with bitshift errors and deletion of zeros, and error patterns from a binary channel where each run is transformed into a run of nonzero bits, i.e., channels with duplication and deletion errors with the additional assumption that no run is completely deleted. This "light synchronization" does not occur if runs can be completely deleted or if random insertions can occur, creating new runs. Another approach is to adapt convolutional codes for synchronization error-correction. In 1961, Gallager [37] suggested to add pseudo-random sequences to the output of con- volutional encoders and to correct synchronization errors using a sequential decoding algorithm. Gallager's main motivation was that codes capable of correcting insertions and deletions of symbols could likely increase the throughput of synchronous systems by reducing the number of required synchronization pulses. One should note that Gallager's work precedes Viterbi's famous decoding algorithm [107]. A practical technique used to facilitate block synchronization of convolutional codes is to invert alternate symbols at the output of the encoder to avoid long blocks of Os and ls, with the assumption that long runs of Os and is are more frequent than long sequences of alternate bits. Baumert, McEliece, and van Tilborg [7] studied which convolutional encoders can output infinitely many consecutive alternate bits, and for the other codes provided upper bounds on the maximum length of such sequences. Using a similar idea, Swart and Ferreira [101] modi- fied convolutional encoders to correct a small number of insertions and deletions of bits by inverting some of the output bits and eliminating large runs of Os and ls. They designed codes of rate 1 ands that can correct one deletion error every 8 and 12 bits, respectively. Mori and Imai [76] used convolutional encoders and a metric based on the Levenshtein distance for the Viterbi decoding algorithm, but they only considered very small insertion and deletion error rates and made the false assumption that synchronization errors can only affect small blocks of bits in the final codewords. Swart, Ferreira, and dos Santos [103] used convolutional codes and parallel Viterbi decoders to correct a small number of insertions and deletions of bits. In this chapter, I present an extensive study of convolutional codes for synchronization errors. I discuss how to modify the Viterbi decoding algorithm to correct several types Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 55 of synchronization errors, and formally prove that the algorithm can find the "closest" codeword from the received sequence, which, surprisingly, had never been done before. The computational complexity of the modified versions of the algorithm depends on the type of errors it is correcting: it is roughly the same as the complexity of the original Viterbi algorithm for insertion and deletion errors, but much larger if insertion and deletion errors can occur together. I also explain how to obtain maximum-likelihood codewords using improved versions of the algorithms and formally prove that this cannot be done without increasing their complexity. Intuitively, it is a good idea to consider convolutional codes to protect communica- tions against time noise, mainly because convolutional encoders can compensate for our lack of understanding of the behavior of synchronization errors. However, I prove that the performance synchronization-correcting codes largely depends on the structure of the underlying encoder graphs and that convolutional codes are fundamentally ill-suited to correct synchronization errors. It explains why the researchers who previously tried this approach obtained mitigated results. I prove that there is a tradeoff between the number of synchronization errors a code can correct and the capacity it has to recover synchro- nization, and present a family of rectangular graphs very robust against synchronization errors. For instance, I present a code with four states that can correct one deletion error every seven bits. It uses an almost trivial rectangular graph but nevertheless surpasses all the codes discussed above, and this convinces me that trellis codes are a promising approach to correct synchronization errors. The rest of the chapter is organized as follows. In Section 6.2, I introduce the distances required to study synchronization error-correcting codes. In Section 6.3, I prove that the Viterbi decoding algorithm can be modified to correct several types of synchronization errors. A family of rectangular graphs well suited to correct synchronization errors when used as encoders is presented in Section 6.4. In Section 6.5, I discuss the tradeoff between recovering synchronization and correcting synchronization errors, and a few simulation results for trellis codes based on simple rectangular graphs are discussed in Section 6.6. In Section 6.7, I discuss ways to obtain maximum-likelihood decoding for the deletion channel, and I conclude the chapter in Section 6.8. 6.2 Preliminaries and Useful Distances In this section, I introduce the notation and all the distances required throughout this chapter. Let u ''' u i u2 ... um be a Q-ary string of length n and I ul the length of u. We Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 56 use e to denote the empty string of length 0, u[k..l] for the substring ukuk +1 uj, and u o v for the concatenation of the strings u and v. A run is a substring of identical symbols of maximum length in u. A d-subsequence of u is a string of n — d symbols that can be obtained by deleting d symbols from u, and a d-supersequence of u is a string of n + d symbols that can be obtained by inserting d symbols in u. A prefix of u is a string that can be obtained by deleting symbols at the end of u. A string v is deletion- reducible to a string u if u is a subsequence of v. A string v is insertion-reducible to u if u supersequence of v, or alternatively if u is deletion-reducible to v. For example, the string u = 00101 has one run of length two followed by three runs of length one, 000 is a 2-subsequence of u, 00110011 is a 3-supersequence of u, 001 is a prefix of u, and 011001 is insertion-reducible to u. Finally, as defined in the previous chapters, a synchronization error is either a symbol deletion or the insertion of a random symbol, and a duplication error is the insertion of a bit similar to the bit preceding it. Definition 6.1. The insertion-deletion distance (or synchronization distance) between two strings u and v, noted ds (u, v), is the smallest number of insertions and deletions of symbols required to change u into v. The minimum insertion-deletion distance (or minimum synchronization distance) of a code is the smallest insertion-deletion distance between any two of its codewords. The synchronization distance has been studied a lot in the literature and is well understood. It should be clear that it is a metric and that a code with minimum insertion- deletion distance 2e + 1 can detect up to 2e insertions or deletion errors and correct e synchronization errors. Definition 6.2. The deletion distance between two strings u and v, noted dd(u, v), is the minimum number of symbols that have to be deleted from the longest string in order to become a subsequence of the shortest string. If the two strings have the same length, then the deletion distance can be calculated by deleting symbols from any of the two strings. For example, dd(10001, 011) = 3 and dd(01110, 11011) = 2. The minimum deletion distance of a code is the smallest deletion distance between any pair of codewords. The insertion distance between two strings u and v, noted di (u, v), is the minimum number of symbols that have to be inserted in the shortest string in order to become a supersequence of the longest string. If the two strings have the same length, then the insertion distance can be calculated by inserting symbols in any of the two strings. The minimum insertion distance of a code is the smallest insertion distance between any two of its codewords. Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 57 The synchronization, deletion and insertion distances have the following properties. 1.dd(u,v) ^ Hul —1v11. di (u,v) ^ l lul —lull. 2. The deletion and insertion distances are metrics, i.e., (a) dd(u,^= di (u,^= 0; (b) dd(u, v) = dd(v, u); di (u, v) = di (v, u); (c) dd(u,^< dd(u, + dd(a, v); di (u, v)^di (u, a) + di (a, v). 3. dd (u, v) = di (u, v) for every pair of strings u and v. 4. ds (u,v) = 2dd(u,v) — 11111 — d,(u,^= 2di (u, v) — l lul — Proof. The properties 1, 2(a) and 2(b) are trivial. We prove 2(c) for the deletion distance. Let lul _< lal < jvl. Since there is a subsequence a' of a that can be obtained by deleting dd(a, v) symbols from v, it is clear that it is possible to obtain a subsequence u' of u by deleting at most dd (u, a') < dd (u, a) symbols from a', and that u' is a subsequence of v. The cases lal < lul < Iv' and lul < Ivl —< lal for the deletion distance and the three cases for the insertion distance are proved similarly. To prove the third property, we prove that dd (u, v) > di (u, v). Suppose, without loss of generality, that lul < IvI, and that u' is a subsequence of u obtained by deleting dd(u, v) symbols from v. It is clear that we get a supersequence v' of u by inserting the dd(u, v) symbols that were deleted from v in the corresponding positions in u. The case dd (u, v) < di (u, v) is done similarly. To prove the last property, we again suppose that lul < Iv' and that u' is a sub- sequence of u obtained by deleting dd(u, v) = k symbols from v. If follows that u' can be changed into u by inserting = I ds (u,v) k + lul — ( 1v1— k) = 2dd (u,v) —11111— lull. Suppose now that d8 (u, v) = k with lul < H. It means that there exists a string u' such that v can be changed into u' with lul — k-(1v 1-1u1) deletions and u' can be changed into u with "I" 2 -lul) 2 insertions.^ ^ ul — ( Ivj — k) symbols in u', thus Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 58 One can note that the deletion and insertion distances are not metrics if the condition on the length of u and v is removed. The reason to compute the distances by deleting symbols from the longest sequence and by inserting symbols in the shortest sequence will become obvious after the next lemma. Lemma 6.3. Let C be a code whose minimum deletion (insertion) distance is e. If a codeword is a subsequence of another codeword, then the code can detect at least e —1 deletion (insertion) errors, otherwise it can detect all the deletion (insertion) error patterns. In both cases, C can correct e — 1 deletion (insertion) errors. Proof We prove the result for deletion errors. The only way for a deletion pattern to remain undetected is if a codeword is transformed into another codeword. This is only possible if there is at least one codeword which is a subsequence of another codeword, and the maximum number of deletions that can be detected in this case is one subtracted from the difference between the length of two such codewords, which is lower bounded by e — 1. For the error-correction capability of the code, suppose that the received sequence u' is obtained by deleting at most e — 1 symbols from a codeword u. Since the minimum deletion distance of the code is e, u' is not a subsequence of any of the codewords whose length is smaller than or equal to the length of u. Furthermore, although u' can be a subsequence of some of the codewords longer than u, at least e symbols must be deleted from them to get u'. Hence, u is the only codeword that can be obtained by inserting at most e — 1 symbols in u', and the code can correct up to e — 1 deletion errors. ^ A direct consequence of the lemma is that when all its codewords have the same length, the minimum deletion distance of a code is exactly half the minimum synchronization distance, and as a result it can correct as many synchronization errors as deletion or insertion errors. At the other end of the spectrum, if for any pair of codewords one is a subsequence of the other, then the synchronization and deletion minimum distances are equal, thus the code can correct roughly half as many synchronization errors as deletion (or insertion) errors. More precisely, if the deletion and synchronization distances are e, then the code can correct e — 1 deletions and [ftli synchronization errors. Consider the code {0000, 00000000, 000000000000, 0000000000000000}. It can correct three deletion or insertion errors, but only one if insertions and deletions can occur together. This discrepancy is caused by the asymmetric nature of insertions and deletions: a received sequence like 00000 cannot be obtained by deleting bits from the codeword 0000 even though both strings are very close to each other. This explains why, in Chapter 5, it was Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 59 more difficult to extend Levenshtein's single-synchronization codes to correct multiple synchronization errors than to correct multiple deletions. Definition 6.4. The prefix deletion distance between two strings u and v, noted prd(u, v), is the minimum number of symbols that have to be deleted from v to obtain a prefix of u. The prefix insertion distance between two strings u and v, noted pr i (u, v), is the minimum number number of symbols that have to be inserted in v to obtain a prefix of u. If v cannot be a prefix of u by inserting bits in it, then pr i (u, v) = oo. Finally, the prefix duplication distance between two strings u and v, noted pr t (u, v), is the minimum number of symbols by which the runs of v must be extended to obtain a prefix of u. If v cannot be a prefix of u by increasing the length of its runs, then prt (u, v) = oo, and it is not hard to show that for binary alphabets, prt (u, v) < oo if and only if both strings begin with the same bit and all the runs of one of the strings are smaller than or equal to the corresponding runs of the other string. For example, prd(010, 111) = 3, prd(111,010) = 2, pr i (0010010, 01) = 1, pr i (111000,01) = oo, prt (000111, 011) = 2, and prt (000111, 1) = oo. If the first k bits of u can be obtained by deleting / bits from v, then for 0 < k' < k the first k— k' bits of u can be obtained by deleting /+k' bits from v. A similar statement holds for insertion and duplication errors, however, when deletion and insertion errors can occur together, more precise prefix distances are required. Definition 6.5. The k-prefix synchronization distance between two strings u and v, noted prsk(u, v) is the minimum number of synchronization errors required to trans- form v into the first k bits of u. It should be clear that the lul-prefix synchronization distance between u and v is also the synchronization distance between u and v. The prefix synchronization distance between strings u and v is the tuple prs (u, v) (pr (u , v) , pr (u , , . . . , prisu l (u, v)) of the k-prefix synchronization distances for all the values of k. The generalized k-prefix synchronization distance between a string u and a codebook C is pr,k(u,C) =6' min{prsk (u, c)}, cEC and the generalized prefix synchronization distance between u and C is prs (u, C) = (prscI (u, C), pr: (u, C),^, prIsul (u, C)). To illustrate the previous definitions, if u = 11101 and C = {1, 011, 00011101}, then prs (u, C) = (1, 0,1, 2, 3, 3). More precisely, deleting one bit from 1 is the fastest way to Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 60 get the first 0 symbols of u, the codeword 1 is the first bit of u, inserting one bit in 1 is the fastest way to obtain the first two bits of u, inserting two bits in 1 is the fastest way to obtain the first three bits of u, one deletion followed by two insertions in 011 is the fastest way to obtain the first four bits of u, and finally the fastest way to change a codeword of C into u is to delete the first three bits from 00011101. The reason to define the prefix distances as above will be clear in the next section when describing how to modify the Viterbi decoding algorithm to correct synchronization errors. 6.3 Convolutional Codes and Viterbi Decoding Algorithm for Synchronization Errors Convolutional codes were first introduced by Elias in 1955 [32] as an alternative to block codes, and a maximum-likelihood decoding for convolutional codes was presented by Viterbi in 1967 [107] and used in several applications of digital communications. It is assumed that the readers are familiar with convolutional codes and the Viterbi decoding algorithm, if not there are several references with detailed presentation of these subjects (e.g. [63, 50]). In this section, I show how the Viterbi decoding algorithm can be modified to correct synchronization errors when convolutional codes are used to encode the information sequences. A (n, k, v) convolutional encoder with encoding rate R = constraint length v, and memory order m that starts and ends in the "all-zero" state so is used. A finite information sequence u = u 1 u2 uk . e, generates the finite encoded sequence v = v1v2 vn.(«±,i), transmitted over a channel subjected to synchronization errors, and received as r = 7- 1 7.2 r1. Since the transmitted codeword and the received sequence do not necessarily have the same length, we use metrics based on the prefix deletion and insertion distances defined in the previous section. This requires to compare branches of the trellis with shifted versions of the received sequence. The decoding process is divided into time units of n bits and starts at time 0. Let b = bo b i bn be the bits of a branch in the decoding trellis, and let m(sa , sb, t) be the branch metric for the branch going from state s a to state sb in the trellis at time t. Among all the paths from state s o at the start of the trellis to state sb at time t, the survivor path is the one with the lowest metric. The survivor metric is the metric of the survivor path to state sb at time t and denoted by M(sb , t), with the initial condition Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 61 that M(so , 0) = O. 6.3.1 Correction of Deletion Errors For deletion errors, we modify the Viterbi decoding algorithm so it chooses the smallest codeword deletion-reducible to the received sequence, i.e., the codeword such that the number of bits that need to get deleted from it to get the received sequence is minimized. This can be done in a recursive manner by considering the following branch metric making use of the prefix deletion distance between the bits of the branch and a shifted substring of the received sequence: m(sa, S b , t) = prd(r[(n • (t — 1) + 1 — M (s , t — 1)) b). (6.1) Using this metric, the algorithm tries to include the bits of the received sequence from left to right in the survivor paths; the bits in the survivor paths that cannot be part of the received sequence correspond to bits that were deleted by the channel The number of bits by which the received sequence has to be shifted when compared to the bits of a branch in the trellis corresponds to the number of bits that have to be deleted in the survivor path leading to the branch to get a prefix of the received sequence. Furthermore, since the length of the codewords is not fixed, the algorithm stops decoding when the entire received sequence is a subsequence of the survivor path ending in state s o , i.e., when M(so, t) + = t • n. This path ending in state s o at time t is the final survivor v. To illustrate how the modified Viterbi algorithm works, consider, for instance, the (2,1,2) nonsystematic feedforward convolutional encoder whose generator matrix is given by G(D) = [1+ D 2 1+ D + D2], and whose state diagram is shown in Figure 6.1. We suppose that the codeword 00 11 10 10 11 00 is transmitted and that the first, tenth, and eleventh bits are deleted by the channel, resulting in the received sequence 01 11 01 01 O. The trellis and the path metrics for this convolutional encoder and received sequence are presented in Figure 6.2. Consider the branch going from state so to state so at time t = 3. Since the survivor metric at state so and time t = 2 is 3, three bits have to be deleted from the path sososo to cover the first bit of the received sequence. Hence, the branch metric is the prefix deletion distance between the received string minus its first bit (11101010) and the bits of the branch (00). Since the next bit in the received sequence that has to be included in Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 62 0/00 1/01 Figure 6.1: State diagram for the G(D) = [1 + D2 1 + D + D2] convolutional encoder. the path is a 1, the two Os of the branch have to be deleted, thus the branch metric is 2 and the total metric for the path s osososo is 5. Consider now the branch going from state s2 to state so at time t = 3. Since the survivor metric at state s 2 and time t = 2 is 2, the branch metric is the prefix deletion distance between the received substring 1101010 and the bits of the branch (11), thus the branch metric is 0, the total metric for the path so s i s2 so is 2, and the branch s 2 so is the survivor at time t = 3. The final survivor in the trellis is the codeword 00 11 10 10 11 00, which means that the algorithm was able to correct the three deletions. Si S2 S3 Figure 6.2: Deletion trellis for the G(D) = [1 + D2 1 + D + D2] convolutional encoder and received sequence 01 11 01 01 0. We now formally prove that the prefix deletion distance can be computed recursively and that the final survivor minimizes the number of deletions required to obtain the Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 63 received sequence. Lemma 6.6. If prd(u, v) = k 1 and prd(vRivl — k1+ 1 )••14,w) = k2 , then Pr d(u, v w) = + k2. Proof. From the assumptions, the first^— k1 symbols of u can be obtained by deleting k1 symbols from v, and the first 1w1 — k2 symbols of u[(1v1 — k 1 + 4.14 can be obtained by deleting k2 symbols from w. It follows that the first Iv' + 1w1 k 1 — k2 symbols of u can be obtained by deleting k 1 + k2 symbols from v o w, thus prd(u, v o w) < k i + k2 . Suppose now that prd(u, v o w) = k' < k1 + k2 . It follows that the first I vi + lwl symbols of u can be obtained either by deleting less than k 1 symbols from v or less than k2 symbols from w. If the former is true, then it is possible to obtain more than the first 1 v — k 1 symbols of u by deleting symbols from v, which contradicts the assumption that prd(u, v)^k 1 . Deleting less than k2 symbols from w leads to a similar contradiction, thus prd (u, v o w) > k i + k2 .^ 111 Theorem 6.7. The final survivor '6 of the modified Viterbi algorithm for deletion errors is the smallest codeword deletion-reducible to the received sequence. Proof From the previous lemma, every survivor metric in the trellis is the prefix deletion distance between the received sequence r and the bits of the corresponding survivor path. Furthermore, since the decoding stops when M(so , t) + in = t • n for the survivor path ending in state so , the received sequence can be obtained by deleting M(so , t) bits from v. We now prove that no codeword smaller than ir is deletion-reducible to r. First, the survivor paths ending at state s o with t • n < M(so , t) + irl are not deletion-reducible to r. Second, halting the algorithm later than when M(so , t) +^= t n for the survivor path ending in state so for the first time can only generate codewords longer than Third, it has to be proved that no discarded path can generate a sequence smaller than and deletion-reducible to r. Suppose the opposite, i.e., that the shortest path deletion- reducible to r, v', must keep one of the nonsurvivor paths at time t, and that the metrics of the nonsurvivor path v'[l..tn] and survivor path p[1..tn] are k + 6. and k, respectively. From the previous lemma, the first to — (k + 8) bits of r can be obtained by deleting k +5 bits from v11..tni, and the last in I — tn+ k + (5 bits of r can be obtained by deleting 1v1 Irl — k —.5 bits from v'[tn+1..lvii]. It follows that the last jr1 — tn+ k bits of r can be obtained by deleting iv'I — in — k bits from v'[tn+1..jv'll, but since the first tn — k bits Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 64 of r can be obtained by deleting k bits from p[1..tn], it means that p[1..n] o vi[tn + 1..111] is deletion-reducible to r, has the same length as v' and only uses survivor paths, which is a contradiction. ^ Since computing prd(a, b) requires 1131 comparisons, the modified Viterbi decoding algorithm for deletion errors has the same computational complexity and storage re- quirements as the original Viterbi decoding algorithm for substitution errors. The main difference is that the bit r i in the received sequence has to be buffered and kept in memory until M(s a , t) + i < t • n for all the survivor paths at time t in the trellis. 6.3.2 Correction of Insertion Errors The Viterbi decoding algorithm can also be modified to correct insertion errors by choos- ing the largest codeword insertion-reducible to the received sequence. Again, this can be done recursively by using the following metric based on the prefix insertion distance: m(sa, sb, t) = Pri (r[(n • (t — 1) + 1+ M(sa, t — b). (6.2) It is not always possible to obtain prefixes of the received sequence by inserting bits in the paths of the trellis, and it should be clear that any path longer than the received sequence cannot be one of its prefixes. Consequently, the main difference between the Viterbi algorithm for insertion and deletion errors is the stopping condition. For insertions, the algorithm can stop when none of the survivor paths in the trellis has a finite metric; the final survivor is the longest survivor path ending in state s o and with a finite metric. An example of the trellis for the G(D) = [1+ D 2 1+ D + D 2] convolutional encoder from Figure 6.1 is shown in Figure 6.3, where the codeword 11 10 10 00 01 11 is trans- mitted over an insertion channel and received as 11 11 10 10 00 01 11 O. Consider, for instance, the branch going from state s 2 to state so at time t = 4. Since M(s 2 , 3) = 2, it means that two bits need to be inserted in the survivor path ending at state so and time t = 3 to obtain the first eight bits of the received sequence. Thus, the branch metric is the prefix insertion distance between the received sequence minus its first eight bits (00 01 11 0) and the bits of the branch (11). This requires the insertion of three zeros, thus the branch metric is 3 and the metric for the path sos1s3s2so is 5. Now examine the branch going from state s o to state so at time t = 4. Since M(so , 3) = 7, the branch metric is the prefix insertion distance between the last two bits of the received sequence (10) and the bits of the branch (11). The distance is infinite because it is not possible Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 65 So s l 82 S3 Figure 6.3: Insertion trellis for the G(D) = [1 + D 2 1 + D + D 2 ] convolutional encoder and received sequence 11 11 10 10 00 01 11 0. to obtain a prefix of 10 by inserting bits in 11, thus the branch s o so is discarded and the survivor branch for state s o at time t = 4 is the branch s 2 so . The algorithm stops after seven time units when all the survivor metrics are infinite; the longest path with a finite metric ending in state s o is the final survivor 11 10 10 00 01 11. Although the path metric for the final survivor is 2, it does not include the insertion located at the very end of the received sequence, hence the three insertions were corrected by the algorithm. As we did for deletion errors, we prove that the prefix insertion distance can be computed recursively and that the final survivor is the largest possible codeword which is a subsequence of r. Lemma 6.8. If pri (u, v) = k 1 and pri (u[(1v1 + k1 + = k 2 , then pri(u, v o = k1 + k2. Proof. If k i < oo and k2 < oo, then from the assumptions, the first 1v1 + k 1 symbols of u can be obtained by inserting k 1 symbols in v, and the first 1w1 + k2 symbols of u[(INTI + k 1 + 4.14 can be obtained by inserting k2 symbols in w. It follows that the first Iv 1 + 1w1 + k 1 + k2 symbols of u can be obtained by inserting k 1 + k2 symbols in vow, thus pr i (u, v o w) < k1 + k2. Suppose now that pr i (u, v o w) = k' < k1 + k2. It follows that the first Ivl + lwl + k' symbols of u can be obtained either by inserting less than k1 symbols in v or less than k2 symbols in w. If the former is true, then it is possible to obtain less than the first 1v + k 1 symbols of u by inserting symbols in v, Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 66 which contradicts the assumption that pr i (u, v)^Inserting less than k2 symbols in w leads to a similar contradiction, thus pr i (u, v o w) > k i + k2 and the lemma holds when ki and k2 are finite. We prove the case when k i^oo and/or k2 = oo by contrapositive. Suppose that pri (u, vow) = PI + k2 < oo. It follows that a finite number of symbols ki can be inserted in v to obtain the first + Pi symbols of u and that the subsequent iw I + 14 symbols of u can be obtained by inserting 4 symbols in w. This contradicts the assumption that u has an infinite prefix insertion distance with v and/or w. The lemma follows. ^ Theorem 6.9. The final survivor i7 of the modified Viterbi algorithm for insertion errors is the longest codeword insertion-reducible to the received sequence. Proof From the previous lemma, every survivor metric in the trellis is the prefix insertion distance between the received sequence r and the bits of the corresponding survivor path. Thus, a survivor path in the trellis is insertion-reducible to the received sequence if and only if its metric is finite. Similar to what we did in Theorem 6.7, we have to prove that no codeword longer than v is insertion-reducible to r. Since the smallest survivor metric at time t +1 cannot be smaller than the smallest survivor metric at time t, the algorithm can stop when all the survivor metrics are infinite. Among all the survivor paths ending in state s o with a finite metric, the one that requires the least number of insertions to be changed into the received sequence is the longest one. The last thing to prove is that no discarded path in the trellis can generate a codeword ending in the all-zero state, with a finite metric, and longer than it' , which we do by contradiction. Suppose that v', the longest path insertion-reducible to r and ending in state s o , must make use of a discarded path at, say, time t. We also assume that the metric of the path v'[1..tn] is k + 6 and that the metric of the corresponding survivor path p[1..tn] at time t is k. Thus, the first tn+k+ 6 bits of r can be obtained by inserting k + 6 bits in v11..tn] and the last — tn — k — 6 bits of r by inserting^— Iv't — k — 6 bits in v'[tn + 1..^It follows that the last Iri — tn — k bits of r can be obtained by inserting Irl —^— k bits in v'[tn + 1..1v/11 (just insert the bits r[tn + 1..tn + 6] in front of v'[tn +1..lvii] and the other^— 1\1 k 6 bits as before. Hence, since the first tn + k bits of r can be obtained by inserting k bits in p[1..tn], the codeword p[1..tn] 0 vi[tn + 1..INT'll has the same length as v' but only uses survivor paths, which is a contradiction. ^ The complexity of the decoding algorithm is harder to calculate for insertions than for deletions because the number of comparisons varies: a branch with n bits and whose Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 67 metric is m < oo requires n + m comparisons. This might seem prohibitive, but toward the end of the decoding process, several states have very high or infinite path metrics and as a result their outgoing branches can be discarded without comparisons. Overall, simulation results indicate that the modified Viterbi algorithm for insertions requires slightly more comparisons than the original version. For example, it can be shown that the trellis with 46 branches in Figure 6.3 requires 94 comparisons. Regarding the memory requirements, the algorithm might have to buffer a large number received bits before a branch metric can be calculated, and the bit r i in the received sequence cannot be discarded until i — M(sa , t) < t • n for all the survivor paths at time t in the trellis. 6.3.3 Correction of Duplication Errors Surprisingly, it seems much more difficult to modify the Viterbi algorithm to correct du- plication errors than to correct insertion errors. A metric based on the prefix duplication distance can be used, but a problem occurs because computing the prefix duplication distance between the received sequence and the paths in the trellis cannot be done re- cursively without memory. Consider the strings u = 0011, v = 0, and w = 1. It can be seen that prt (u, = 0, prt (u[2..4], oo, and prt (u, v o w) = 2, hence prt (u, v o^prt(u, v) +Prt(u[avl + prt(u, v) + It is not possible to increase the run of the string 1 to obtain 011, but the problematic leading 0 could have been inserted in v instead of w. This problem is caused by the constraint on the insertions for duplication errors and formalized in Lemma 6.10. Lemma 6.10. If v = v' 0 a and w = bow' and a b and prt (u, v) = kJ, < oo and a > 1 and u i = a for 114 + k 1 + 1 < i < I vI + k 1 + a and u i a for + k 1 + a + 1 and prt (74(1v1 + k1 + a + 1)..IuI, w]) = k2 < oo, then prt (u, v o w) = k 1 + k2 + a. For all the other cases, if prt (u, = k1 and prt (uKv + k 1 +^w) = k2 , then prt (u, v o^+ k2. Proof. We prove the first case; all the other cases are similar to those proved in Lemma 6.8 and left to the reader. From pr t (u, v)^k 1 , the first I vl + k 1 symbols of u can be Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 68 obtained by adding k i symbols in the runs of v. Furthermore, since vi v i = a and u i = a for Iv' + kl + 1 < i < Ivl + k l + a, the next a symbols of u can be obtained by further increasing the length of the last run of v. Finally, from the assumption that prt (u[(1v1 + k l + a + 1).1111, w]) = k2 , the next Iwi + k2 symbols of u can be obtained by increasing the runs of w, hence the first ivi + iwi + k l + k2 + a symbols of u can be obtained by inserting k i + k2 +a symbols in vow and prt (u,v ow) < k i + k2 + a. Suppose now that prt (u, v o w) = k' < k l + k2 + a. We prove that it is not possible to increase the runs of v while satisfying the lemma without violating one of the assumptions. First, inserting less than k i symbols in v cannot give a prefix of u because it contradicts the fact that prt(u, = k l . Second, if k i < / < kl + a symbols are inserted in v to obtain a prefix of u, then it is not possible to obtain a prefix of u[lvl + / + 1..jul] by increasing the runs of w because the two strings do not begin with the same symbol. Third, if exactly k i + a symbols are added in the runs of v to obtain a prefix of u, then at most k' — k l — a < k2 symbols can be inserted in w, contradicting the assumption that prt (u[(1v1 + k l + a + .14 w]) = k 2 . Finally, since prt (u, v) = kl , it follows that v and u[1..(Ivi + k i )] have the same number of runs. Moreover, since ui v l +k , +a um+ki +a+ 1 it is not possible to obtain a prefix of u by inserting more than k i + a in v, because the two strings do not have the same number of runs. Using Lemma 6.10, the branch metric used for the modified Viterbi algorithm for duplication errors is as follows. Let p be a path ending in state s a at time t — 1 and whose metric is km , and let b be the string of bits for the branch S aSb at time t. If b starts with the symbol a and if the last symbol of p is the symbol b different from a and if rRIPI + + 1)..lril starts with a run of a bs, then m(sa, Sb, t) = a +Prt(rRIPI + km + a + b), otherwise, m(sa ,sb,t) =Prt(r[(IPI + km + 1)..Irl], b)• This definition insures that the metric of a path is the prefix duplication distance between the received sequence and the said path. However, the undesired consequence is that if the Viterbi algorithm is used with this metric and if only one survivor path per state is kept at each step of the decoding process, then it is not guaranteed to output the largest codeword that can be changed into the received sequence by increasing the length of its runs. In fact, in some cases the algorithm might even fail to return a valid Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 69 codeword. Consequently, a more sophisticated approach is needed differing from the modified algorithm for insertion errors in three ways. First, the algorithm must keep in memory more than one survivor path per state. For state sb at time t in the trellis, the algorithm must store the tuple (s a , km) if there is at least one path going from the beginning of the trellis to state sb at time t via state sa at time t — 1 and whose metric is km < oo. Second, the branch metrics must be calculated according to Lemma 6.10 as explained in the previous paragraph. Third, the final survivor ir is the longest path in the trellis ending in the all-zero state and with a finite metric of km < oo and with ri = riiii±k„, for i > + km . A trellis sample for the modified Viterbi decoding algorithm for duplication errors using the convolutional encoder from Figure 6.1 is presented in Figure 6.4, where the codeword 11 01 11 00 11 01 11 is transmitted over a channel with duplication errors and received as 11 11 11 10 01 11 10 00 11 01 11. Consider, for example, the branch going from states s2 to Si at time t = 3, which is an instance of the problematic case from Lemma 6.10. Although the metric for the path sos i s2 is 6 and the prefix dupli- cation distance between 01.4.1] and the bits of the branch is infinite, the three bits r[11..13] could have been inserted in the last run of the branch 8 1 .52 at time t = 2, thus m(s2, s i , 3) = 3 + pr t (r[1441], 00) = 3 + 0 = 3. Now consider the branch going from S i to s 2 at time t = 6. Since there are two paths with a finite metric ending in state s 1 at time t = 5, the algorithm must compute the branch metric twice. For the path with metric 6 ending by the branch s 2 s 1 , m(sa , sb, 6) is infinite, whereas m(sa , sb, 6) = 0 for the path with metric 8 ending by the branch s o s i . This means that the beginning of the best path to state s i at time t = 6 is not the best path to state s 2 at time t = 5. Had only a single survivor path been kept, the algorithm would have failed to find the longest codeword duplication-reducible to the received sequence. When backtracking to find the final survivor path, the algorithm matches the symbols of the received sequence with the symbols of the path from right to left. When it reaches a state with more than one possible survivor, it chooses the one whose metric matches the number of duplication errors not yet covered by the tail end of the final survivor. Going back to the trellis of Figure 6.4, the algorithm backtracks until it reaches state S i at time t = 5 where two survivor paths appear. Since the end of v' perfectly matches the last four bits or r (0111), a path with ten bits must cover the first eighteen bits of r, thus the algorithm must keep backtracking following the survivor whose metric is 8. The final survivor is 11 10 01 01 10 11, thus the algorithm was able to correct all eight duplication errors introduced by the channel. One final remark: if at any point there are two or more S i S2 S3 r = 11^11^11^10^01^11^10^00^11^01^11 Figure 6.4: Duplication trellis for the G(D) =-- [1 + D2 1 + D + D2] convolutional encoder and received sequence 11 11 11 10 01 11 10 00 110111. Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 71 survivors with the same metric and different ancestors, one of them is chosen randomly because they all lead to codewords duplication-reducible to the received sequence. Theorem 6.11. The final survivor "V of the modified Viterbi algorithm for duplication errors is the longest codeword duplication-reducible to the received sequence. Proof Since every path metric in the trellis is the prefix duplication distance between the received sequence r and the bits of the corresponding path, and since only the paths with infinite metrics are discarded, it is clear that the longest codeword duplication-reducible to the received sequence is not discarded by the algorithm. ^ It seems that the complexity of the algorithm for duplication errors is higher than the complexity of the original Viterbi algorithm, but it is not clear how much higher. On one end, it must compute some of the branch metrics several times, but on the other end for several branches no comparison is necessary. For the trellis presented in Figure 6.4, 130 comparisons for 54 branches are required. 6.3.4 Correction of Synchronization Errors When only deletion (or insertion) errors are present, the modified Viterbi algorithm knows that the bits in the trellis that do not match the received sequence were deleted (inserted). This means that at any point during the decoding process, the algorithm knows how much ahead (or behind) the optimal decoded sequence is with respect to the received sequence. This asymmetry also explains why the complexity of the algorithm is the same as the complexity of the original Viterbi algorithm. However, when more than one type of error among insertions, deletions and substitutions occur together, the algorithm does not know if a bit in the trellis that does not match the received sequence has been inserted, deleted or flipped. For instance, although a bit deletion gives a better metric than two insertions, it is possible if the bits were indeed inserted in the codeword that inserting the two bits might lead to a better metric at the end of the decoding process. Consequently, in order for the Viterbi algorithm to find the codeword whose synchronization distance from the received sequence is minimized, it has to keep track of the best way to obtain the first k bits of the received sequence for 0 < k < Irl, which is precisely what the generalized prefix synchronization distance provides. Before describing the algorithm in detail, I first prove that all the required metric calculations can be done recursively and efficiently. Lemma 6.12. Let u, v and w be strings, and let w' be a symbol from the alphabet E. Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 72 I. If Ivl = 0 and 0 < k < lul, then prsk (u, v) = k; 2. if v = w o wt , then pr (s) (u, v) = prs° (u, w) + 1; 3. if v = w o w' and 0 < k < lul and uk w', then prsk (u, v) = min(prsk-1 (u, v) + 1, prsk (u, w) + 1); 4. if v = w o w' and 0 < k < lui and uk = w', then Pr sk (u, v) = Pr sk-i (u, w). Proof. If Iv' = 0, then k symbols must be inserted to get the first k symbols of u, which proves the first case. For the rest of the proof we assume that v = w o w'. jw1 and lw I +1 symbols must respectively be deleted from w and v to get the first zero bits of u, which proves the second case. Suppose now that uk ^ w'. The first k bits of u can be obtained by transforming v into the first k — 1 bits of u followed by inserting uk at the end of the string, or by deleting w' followed by transforming w into the first k bits of u. Thus, prsk (u,v) < min(prsk-1 (u, v) + 1, pr sk (u,w) + 1). Suppose now that pr sk (u, v) < min(pr sk-1 (u,v) + 1, pr sk (u, w) + 1). Since uk w', either w' has to be deleted from v or uk has to be inserted at the end of v. If it is the former, then w can be changed into the first k bits of u with less than pr sk (u, w) insertions and deletions, which is a contradiction. If it is the latter, then v can be changed into the first k —1 bits of u with less than prsk-1 (u,v) insertions and deletions, which is another contradiction, and this proves the third case. For the fourth case, the first k bits of u can be obtained by transforming w into the first k — 1 bits of u while leaving w' unchanged, thus pr sk (u, v) < pr sk -1 (u, w) . Finally, it should be clear that pr sk (u,v) < pr sk-1 (U,W) leads to a contradiction, which completes the proof of the lemma ^ Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 73 Table 6.1: Prefix synchronization distance between the strings u = 01101 and v = 110. Uo = E Ui = 0 U2 = 1 U3 = 1 U4 = 0 u5= 1 Vo = E 0 1 2 3 4 5 v1 = 1 1 2 1 2 3 4 v2 = 1 2 3 2 1 2 3 v3 = 0 3 2 3 2 1 2 As done several times in Chapter 4, the prefix synchronization distance can be calculated using dynamic programming Table 6.1 shows an example for u = 01101 and v = 110. The prefix synchronization distance is the last row of the table, i.e., pr,(01101, 110) = (3, 2, 3, 2, 1, 2). A consequence of Lemma 6.12 is that the k-prefix syn- chronization distances between u and v o w can be calculated using only the symbols of u, the symbols of w, and the k-prefix synchronization distances between u and v, i.e., without knowing precisely what the string v is. More importantly, this is also holds for the generalized prefix synchronization distance, as shown next. Lemma 6.13. Let u be a string, C a codebook with codewords of length n, and C' {c 0 wic E Cl the codebook consisting of all the codewords of C with the sym- bol w E E appended at the end. pr sk (u, C) + 1{if k = 0; prsk (u,C) =^min(prsk (u,C) +1, pr sk-1 (U, C") + 1) if 0 < k < lul and uk w; pr sk-i ( u, c)^ if 0 < k < Jul and uk = w. Proof. Changing a codeword of C into the empty string requires the deletion of n symbols, whereas n + 1 deletions are required for the codewords of C'. This proves the first case. Let 0 < k <^uk w, and let c E C and c' E C' be codewords such that Prsk (u, c)^Prsk (u, C) and prs" (u, c')^pr sk-1(u, C' ) , respectively. It follows that the codeword c 0 w E C' can be changed into the first k symbols of u by deleting w and by changing c into the first k symbols of u with pr sk (u, C) operations, thus prsk (u,C') < prsk (u,C)+ 1.^ (6.3) The codeword c' can also be changed into the first k symbols of u by changing it into first k - 1 symbols of u with pr 3k-1 (u, C') operations followed by the insertion of uk at Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 74 the end of the string, thus pr sk (u, C') _ r< p i:--i (u, co) + 1,^ (6.4) and prsk (u,C1 ) < min(prks (u, C) + 1,pr sk-1 (u,C') + 1) follows from (6.3) and (6.4). Suppose now that prsk (u,C') < min(pr sk (u, C) + 1,prsk-1 (u, C') + 1). It follows that there is a codeword cow E C' with prsk(u, cow) < min(pr /sc(u, C) + 1, prsk-1 (u, C') + 1). If the fastest way to change c o w into the first k bits of u must delete w, then the first k bits of u can be obtained with prsk(u,C) —1 synchronization operations from c, which is a contradiction. On the other end, if the fastest way to change c o w into the first k bits of u must insert the symbol uk, then it is possible to obtain the first k — 1 bits of u from c o w with pr sk -1 (u, C') — 1 synchronization operations, which again is a contradiction. Hence, prsk (u, C') > min(pil(u,C) +1 , pr sk-1 (u, co) + 1) . As for the third case, if c E C is a codeword such that prsk'(u, c) = prsk-1 (u, C), then cow E C' can be transformed into the first bits of u by changing c into the first k — 1 symbols of u while leaving w unchanged, which leads to prsk(u,C') < pr sk-1 (u,C). Assuming that prsk (u,C') < prsk-1 (U, C) leads to a contradiction, and the result follows. 0 Another important property is that the generalized prefix synchronization distance of the union of two codebooks can easily be calculated from the generalized prefix syn- chronization distance of each of them, as shown next. Lemma 6.14. Let u be a string. If C and C' are codebooks with codewords of length n, then prsk(u, C U C') = min(prsk (u, C),prsk (u, C')). Proof. Let c E C and c' E C' be codewords such that pr.'s' (u, c) = pr sk(u,C) and prsk (u, c') = pr,k (u,C9, respectively. Clearly, c E C U C' and c' E C U C', and it follows that prsk (u,C U C') < pr sk(u,C) and prsk (u, C U C') < pr is'(u,C'). Consequently, prsk (u,C U C') < min(pr sk (u,C),pr sk (u, C')). Suppose now that pr!(u, C U C') < min(pr sk (u, C),prsk(u, C')). It follows that there exists c E C U C' with prsk(u, c) < min(prsk(u,C),prsk (u,C9). If c E C, then prsk(u, c) < min(pr sk(u,C),prsk(u,C')) < prsk (u,C), which is a contradiction. A simi- Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 75 lar contradiction appears if c E C', therefore prsk (u,C U C') > min(prsk (u, C), prsk (u, C')). ID The properties of Lemmas 6.13 and 6.14 are exactly what is required for the prefix syn- chronization distance to be used as a metric in yet another modified version of the Viterbi decoding algorithm for synchronization errors. More precisely, the metric m(s a , sb, t) for the branch sasb in the trellis at time t is the generalized prefix synchronization distance between the received sequence and the codebook consisting of the strings for all the paths starting at the beginning of the trellis and ending with that branch, and from Lemma 6.13 this can easily be calculated recursively. The survivor metric M(s a , t) for state sa in the trellis at time t is the generalized prefix synchronization distance between the received sequence and the codebook consisting of the strings of all the paths starting at the begin- ning of the trellis and ending in that state, and this can be calculated recursively using Lemma 6.14. The metric M(so , 0) for the all-zero state at time t = 0 is initialized to M(so, 0 ) =4 Pr s( 11 , E) = ( 0 , 1 , 2 , • • • , irl)• It is much easier to understand how the algorithm works with an example using the G(D) = [1 + D 2 1 + D + D2] convolutional encoder whose state diagram is shown in Figure 6.1. Suppose that the codeword 00 11 01 11 00 is transmitted over a channel with synchronization errors and that two insertions and one deletion corrupt it, resulting in the received sequence r = 01 01 11 11 01 0. Since each metric is a 12-tuple, the empty trellis for the received sequence is shown in Figure 6.5, a sample of the trellis with the branch and state metrics is presented in Figure 6.6, and all the metrics are included in Table 6.2. Consider first Figure 6.6, showing the two paths going from the start of the trellis to state s i at time t = 3. The metric for the all-zero state at time t = 0 is initial- ized as mentioned above and can be used to calculate the two metrics at time t = 1, which in turn are used to calculate the metrics for the states s o and s2 at time t = 2, which are used to find the metrics for the paths sosi and s2s1 at time t = 3. Since there are two merging paths at state s 1 and t = 3, the state metric consists of the minimum value, between both path metrics, of the generalized k-prefix synchronization distances. When the two of them are equal, the algorithm chooses one randomly, thus M(s i , 3) = (6, 5, 4, 3, 4, 3, 4, 5, 6, 7, 8, 7). It is important to understand that there are Irl + 1 survivor paths for each state in the trellis: the best path that can be changed into 0,1,2,3,4,5,6,7,8,9,10,11 00^00 2,1,2,1,2,3,4,5,6,7,8,9 2,3,2,3,2,3,4,5,6,7,8,9 4,3,2,3,2,3,4,5,6,7,6,7 Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 76 r = 01^01^11^11^01^0 so Si 82 S3 Figure 6.5: Empty trellis for the G(D) = [1 + D2 1 + D + D2 ] convolutional encoder and received sequence 01 01 11 11 01 O. Figure 6.6: Trellis sample for the modified Viterbi decoding algorithm for synchronization errors with the G(D) = [1+ D 2 1 + D + D2 ] convolutional encoder and received sequence 01 01 11 11 01 O. Table 6.2: Metrics, prefix survivors, and state survivors for the modified Viterbi decoding algorithm for synchronization errors with the convolutional encoder G(D) = [1 + D2 1 + D + D2 ] and the received sequence 01 01 11 11 01 O. o .5 2^3^4 \- --"/ p ry ^- / .^5 ^rC r rC r 8 rC r9 r C pr 1 0 r, C pr l 1 (r, C)PS\'' ..-'1^. ‘.. •"/ pro^7^J ,.5b^II PiSk..0-'1^1-"Sk"'"'l^' .t. 0^SO^(0,0,-)^(1,1,-)^(2,2,-)^(3,3,-)^(4,4,-)^(5, 5,-)^(6,6,-)^(7,7,-)^(8,8,-)^(9,9,-)^(10,10,-)^(11,11,-) 1^so^(2,0,so)^(1,0,so)^(2,0,so)^(1,O,so)^(2,0,so)^(3,0,so)^(4,0,so)^(5,0,so)^(6J:1,s°)^(7,2,so)^(8,2,so)^(9,8,so) 1^si^(2,0,s0)^(3,1,so)^(2,1,80)^(3,1,so)^(2,1,so)^(3,3,so)^(4,4,so)^(5,5,so)^(6,6,80)^(7,6,so)^(8,7,so)^(9,7,so) 2^so^(4,0,so)^(3,0,s0)^(4,2,so)^(3,2,so)^(4,4,8o)^(5,5,so)^(6,6,so)^(7,7,so)^(8,8,so)^(7,8,so)^(8,8,so)^(7,8,so) 2^Si^(4,0,so)^(3,1,so)^(2,1,so)^(3,3,so)^(2,3,so)^(1,3,so)^(2,4,8o)^(3,5,8o)^(4,6,so)^(5,6,so)^(6,7,so)^(7,7,so) 2^s2^(4,0,si)^(3,0,si)^(2,0,81)^(3,2,81)^(2,2,81)^(3,4,81)^(4,5,81)^(5,6,si)^(6,7,81)^(7,8,si)^(6,8,si)^ (7,8,81) 2^83^(4,0,s1)^(3,0,81)^(4,1,si)^(3,1,si)^(4,3,si)^(3,4,81)^(4,5,81)^(5,6,81)^(6,7,81)^(5,7,81)^(6,7,81)^(7,9,81) 3^so^(6,0,8o)^(5,0,so)^(4,1,82)^(5,2,8o)^(4,3,s2)^(3,3,82)^(2,4,s2)^(3,5,s2)^(4,6,s2)^(5,6,82)^(6,7,s2)^(7,7,82) 3^Si^(6,0,82)^(5,0,s2)^(4,1,8o)^(3,2,82)^(4,4,s2)^(3,3,so)^(4,4,so)^(5,5,8o)^(6,6,so)^(7,6,so)^ (8,9,so)^(7,10,82) 3^s2^(6,0,81)^(5,0,si)^(4,0,81)^(3,2,81)^(2,2,81)^(3,4,si)^(2,5,81)^(3,6,81)^(4,7,81)^(5,8,80^(4,8,81)^(5,8,80 3^s3^(6,0,si)^(5,0,si)^(4,1,81)^(3,1,81)^(4,3,si)^(3,4,81)^(2,5,81)^(3,6,80^(4,7,81)^(3,7,81)^(4,7,s1)^(5,9,si) 4^so^(8,0,so)^(7,0,so)^(6,2,so)^(5,2,so)^(4,3,82)^(3,3,82)^(2,4,s2)^(3,5,82)^(2,6,s2)^(3,6,82)^(4,7,82)^(5,8,8o) 4^Si^(8,0,so)^(7,1,so)^(6,1,so)^(5,2,82)^(4,4,82)^(5,3,so)^(4,4,so)^(3,5,so)^(2,6,so)^(3,6,8o)^(4,7,so)^(5,7,so) 4^s2^(8,0,81)^(7,0,si)^(6,0,81)^(5,2,81)^(4,2,81)^(5,4,s1)^(4,5,si)^(3,6,83)^(4,7,s3)^(3,7,83)^(4,9,s3)^(3,9,83) 4^s3^(8,0,si)^(7,0,81)^(6,1,si)^(5,1,81)^(4,3,si)^(5,4,81)^(4,5,81)^(3,6,s3)^(4,7,83)^(5,7,81)^(4,8,83)^(5,10,83) 5^so^(10,0,so)^(9,0,so)^(8,2,so)^(7,2,so)^(6,4,so)^(5,5,8o)^(4,6,8o)^(5,7,so)^(4,8,so)^(3,8,so)^(4,8,so)^(3,8,so) 5^si^(10,0,so)^(9,1,so)^(8,1,so)^(7,3,so)^(6,3,so)^(5,3,so)^(4,4,so)^(3,5,so)^(2,6,so)^(3,6,so)^(4,9,so)^(5,9,so) 5^s2^(10,0,81)^(9,0,81)^(8,0,81)^(7,2,81)^(6,2,si)^(5,4,si)^(6,5,si)^(5,6,81)^(4,7,81)^(3,8,81)^(2,8,81)^(3,8,s1) 5^83^(10,0,81)^(9,0,81)^(8,1,si)^(7,1,81)^(6,3,81)^(5,4,81)^(6,5,80^(5,6,81)^(4,7,si)^(3,7,si)^(4,9,81)^(3,9,81) 6^so^(12,0,so)^(11,0,so)^(10,2,so)^(9,2,so)^(8,4,so)^(7,5,so)^(6,6,so)^(5,5,82)^(6,8,so)^(5,8,so) ^(4,9,s2)^(5,10,so) 6^Si^(12,0,so)^(11,1,so)^(10,1,80)^(9,3,so)^(8,3,so)^(7,3,so)^(6,4,so)^(5,5,so)^(4,6,so)^(5,9,so)^(4,9,so)^(3,10,s2) 6^s2^(12,0,s1)^(11,0,80^(10,0,si)^(9,2,81)^(8,2,si)^(7,4,si)^(6,5,81)^(5,6,81)^(4,7,si)^(3,8,si)^(2,8,81)^(3,8,s1) 6^83^(12,0,81)^(11,0,81)^(10,1,si)^(9,1,si)^(8,3,si)^(7,4,si)^(6,5,s1)^(5,6,s1)^(4,7,si)^(3,7,si)^(4,9,81)^(3,9,81) 7r- M Ct z 0 Sa) 0 CD 0 cr 0" 0 z 0 Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 78 the first k bits of r for 0 < k < Irk. Furthermore, the k-th survivor path at state s a and time t in the trellis comes from one of the survivor paths at time t — 1 called the prefix survivor. The arrows in Figure 6.6 show the prefix ancestors for the 12 survivor paths at state s 1 and time t = 3. For instance, the survivor state and prefix survivor for the third element of M(s i , 3) are so and 1, respectively. In other words, the survivor path to obtain the first two bits of r at state s 1 and time t = 3 is the concatenation of the survivor path to obtain the first bit of r at state s 1 and time t = 2 with the bits of the branch s o s i (11). The prefix survivors can easily be calculated at the same time as the generalized k-prefix synchronization distances when applying Lemma 6.13. They must be stored by the algorithm, otherwise it won't be possible to backtrack to find the final survivor at the end of the decoding process. For the received sequence 01 01 11 11 01 0, the twelve metrics, prefix survivors, and survivor states for all the states in the trellis are included in Table 6.2. Let C be the set of all codewords generated by the convolutional encoder, and let M(so , t)Hrl] be the last element of the metric tuple M(so , t) corresponding to the number of insertions and deletions required to change the best codeword of length to into the received sequence. Since r can be obtained by inserting Irj symbols in the empty string e, and since the codewords longer than 2r are at least Ir I deletions away from r, it follows that pr isrl (r, C) < Irl, hence the algorithm can stop at time t = 21-1fl 1 in the worst case (in practice the algorithm stops much before that). Once the algorithm stops increasing the length of the codewords, it must backtrack starting from the point where M(so , t)[111] is minimized to find the final survivor. In the example presented in Table 6.2, M(so , 5) [jrl] = 3 is the minimum value. It follows that the algorithm can stop at t = 6, because all the codewords longer than 12 are at least three deletions away from r. The backtracking process is illustrated by boxed metrics in the table. The metric for state so at time t = 5 is (3, 8,80 ), thus the algorithm must backtrack to the ninth survivor at state so and t = 4, and so on so forth. The final survivor is 00 11 01 11 00, and the algorithm was able to correct the three synchronization errors. Theorem 6.15. The final survivor i; of the modified Viterbi algorithm for synchroniza- tion errors is the codeword whose synchronization distance with the received sequence is minimized. Proof. From Lemmas 6.12, 6.13, and 6.14, the survivor path associated to the last element of M(so , t) corresponds to the codeword of length nt whose synchronization distance with the received sequence is minimized. The result follows directly from the fact that the Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 79 algorithm chooses the length of the codeword such that the last element of M(so , t) is minimized.^ ^ It should come as no surprise that the computational complexity of the algorithm is higher than the computational complexity of the original Viterbi decoding algorithm, mostly because the complexity of calculating pr,(u, v) is 2 lul • Since the gener- alized prefix synchronization distance is calculated recursively, it requires 2 • 2k r n operations per state for a (n, k, v) convolutional encoder instead of n • 2k for the original Viterbi algorithm, thus the modified algorithm for synchronization errors is 2 • Ir I times computationally more expensive. The algorithm needs to store Ir + 1 metrics, survivor states and prefix survivor for each state during the decoding process, and as a result the amount of space required by the algorithm for synchronization errors is also approx- imately 3 • 1r I times the space needed by the original algorithm. Finally, the algorithm must store the received sequence for the entire duration of the decoding process. 6.4 Bypasses and Shortcuts Convolutional codes used in conjunction with the decoding algorithms described in the previous section are ill-suited to correct synchronization errors and as a result their performance is mediocre. In this section, I define bypasses, and shortcuts in graphs and show that there is a tradeoff between them. This leads to a family of graphs whose structure is very robust against synchronization errors. These graphs can be used as encoders in place of convolutional codes and can be decoded with the modified versions of the Viterbi algorithm. Definition 6.16. An k-graph is a graph G = (V, E) with the following properties: 1. G is oriented and strongly connected; 2. There are exactly k outgoing edges from each vertex v E V; 3. It is possible to have several edges joining the same vertices (E can be a multiset); 4. Edges (vi , vi ) are possible (cycles of length one). Convolutional codes are a special case of trellis codes where the structure and the bits of the state diagram of the encoder are defined by a convolution. k-graphs can also be used as encoders by assigning information and output symbols to their edges. A Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 80 (k, n)-encoder graph is a k-graph with n output symbols associated to each of its edges. The (k, n)-encoder graphs can also include input symbols, as shown in the (2, 2)-encoder graph of Figure 6.7. 0/11 Figure 6.7: A (2,2)-encoder graph with input alphabet {0,1} and output alphabet {0,1, 2}. Definition 6.17. A graph G has a bypass of length k if there exists two nonidentical paths of the same length k between a pair of vertices of G. We use burin for the length of the smallest bypass of G. A graph G has a shortcut of length k if there exists two paths of different length between a pair of vertices such that k is the difference between the length of the longest path and the length of the shortest path. It is assumed that the length of the shortest path can be zero, thus a cycle of length k is also a shortcut of length k. We use &min to denote the length of the smallest shortcut of G. As an example, the graph in Figure 6.8 has, among others, a shortcut of length one between the vertex s 1 and itself from the cycle of length one, a bypass of length two between the vertices so and s3 from the paths so s i s3 and so s2 s3 , and a shortcut of length 2 between so and s3 from the paths s o s 1 s 1 s3 and so s3 , For this graph, burin = 2 and Smin = 1. Figure 6.8: A simple graph with bypasses and shortcuts. There is a very nice tradeoff between the length of the smallest shortcut and the length of the smallest bypass of encoder graphs, as shown by Theorem 6.18. Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 81 Theorem 6.18. Let G be an k-graph with n vertices. If s min > bmin, then Smin kbmin -i < n. Proof Let so be an arbitrary vertex in an oriented graph G. We dispose all the vertices of G in columns, with s o in column 0, such that a vertex s is put in column i if and only if the length of the smallest path from s o to s is i. For a fixed s o , there is one and only one way to dispose the vertices of G in columns as described since each vertex belongs to exactly one column. It should be clear that there can be no edges going from column i to column i + a for a > 1. Furthermore, if there is an edge (s i , si) joining two vertices si and si in column i, then G has a shortcut of length 1 since there is at least a path of length i and a path of length i + 1 from s o to Similarly, if there is an edge from a vertex s i in column i to a vertex s i_c, in column i - a > 0, then it follows that G has a shortcut of length a + 1. Let G be a k-graph with bmir, b and smin, s and whose vertices are disposed in columns as described two paragraphs ago. First, from the constraint s, no outgoing edge coming from a vertex in the first s - 1 columns can point to a vertex in the same column or in a preceding column, thus all the outgoing edges from vertices in column i for 0 < i < s - 2 go to vertices in column i + 1. Second, from the constraint b and from the fact that each vertex has exactly k outgoing edges, it follows that there are exactly ki vertices in column i for 0 < i < b - 2 and at least kb' vertices in column j for b - 1 < j < s - 1. Indeed, if column j for b - 1 < j < s - 1 contains less than kb-1 vertices, then there is a vertex in column j - b +1 and a vertex in column j with at least two different paths of length b - 1 between them, which means that G has a bypass of length b - 1. A similar argument can be made for the first b - 1 columns. Hence, the total number of vertices in the first s columns is > 1+k+k2 + • • • + kb-2 + (s - b +1) • k b" kb - 1 k - 1 + (s - b) • kb-1 . (6.5) We now prove the theorem by contradiction. Let s • kb-1 > n. We show that there is no possible way to dispose the outgoing edges from column s - 1 and to the satisfy the constraints b and s while adding only n - n' additional vertices in the graph. It is assumed that there are exactly 2 b-1 vertices in columns i for b -1 < j < s -1 (it is not difficult to prove that the argument is also true if those columns contain more vertices). Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 82 Since there is a limited number of vertices that can be put in column i for i > s - 1, we need to point as many outgoing edges as possible from column s - 1 to column 0, as many outgoing edges as possible from column s to column 1, etc. Although it could be decided to wait until column i > s -1 before starting to point outgoing edges towards the first columns of G, this does not help because the outgoing edges from a column cannot point to vertices in more than one of the first s columns without creating a shortcut smaller than s. From the constraint b, there can be at most k vertices in column s - b 1 whose respective set of descendants in column s - 1 are disjoint. Since column 0 has only one vertex, it means that at most k outgoing edges, one from each set, can go from column s - 1 to column O. Otherwise, there would more than one path of length b -1 from a vertex in column s - b +1 to the vertex so in column 0, contradicting the fact that the length of the smallest bypass of G is b. It follows that the remaining kb' - 1 outgoing edges from the vertices of each set in column s - 1 can only point to distinct vertices in column s, hence at least kb" - 1 vertices have to be placed in column s. Using a similar argument for all the subsequent columns of G, it follows that the number of vertices required in column s + i for 0 < i < b - 1 is at least kb' - le, thus the total number of vertices in the last columns is n' ^( kb-i^ko) (kb-i ki)^(kb-1^kb--i) kb -1 = b • kb" Combining (6.5) and (6.6), the total number of vertices in the graph must be at least as large as n, nll > s kb-i > n, which is a contradiction.^ ^ Definition 6.19. Let k > 2, b > 1, and a > 1 be integers, let i\j^be the quotient (integer part) of^and let gi be the Kronecker delta, i.e., { 1 if i 0; 0 if i O. k -1' (6.6) The (k, b, a)-rectangular graph is the k-graph with n a • (b -1+ 4_ 1 ) • kb' vertices numbered from 0 to n - 1, and for which the k outgoing edges from vertex i point to the Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 83 vertices [( i \ kb- 1 + 1) • kb-1+ + j • ki\k6-1(mod b-1-1-4-01 (mod kb")] (mod n) for 0 < j < k.^(6.7) It is not hard to prove that the (k, b, a)-rectangular graphs can be disposed in a • (b — 1 + 4_ 1 ) columns of kb-1 vertices such that all the outgoing edges from ver- tices in column i point to vertices in column i + 1 (mod a • (b —1+ 4_ 1 )). In fact, the graphs contain the sets of edges {(07/3 + kb 1 ) ,^+ kb-1 ,0 + 2 . kb-i) , (0 + 2 kb-1 ,0 + 3 . kb-i) ,... (0 + (a • (b — 1 + 4_1) — 1 ) • kb-1 ,0)} for 0 _< /13 < kb', so they can also be disposed as kb-1 parallel cycles, with additional edges between the cycles. The (4,1,3)-rectangular graph, the (3,2,4)-rectangular graph, and the (2,3,2)-rectangular graph are shown in Figures 6.9, 6.10, and 6.11, respectively. Rectangular graphs have several interesting properties, one of them being that they reach the bound of Theorem 6.18, as shown next. Figure 6.9: The (4,1,3)-rectangular graph. Figure 6.10: The (3,2,4)-rectangular graph. Theorem 6.20. The bound from Theorem 6.18 is tight for every (k, b, a)-rectangular graph. More precisely, buiin, = b and smin = a • (b —1+ Sb_i). Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 84 Figure 6.11: The (2,3,2)-rectangular graph. Proof. From the discussion above, the vertices of a (k, b, a)-rectangular graph can be arranged in a • (b- 1+ 4_ 1 ) columns such that outgoing edges from vertices in one column all point to vertices in the next column, thus the graph has no shortcut of length smaller than a • (b - 1 + 4_ 1 ). Moreover, since the graph has a cycle of length a • (b - 1 + S-b-1)7 it follows that Smin = a • (b - 1 + Ob-i)• Using smin in Theorem 6.18, we can write that sniin •kbmin -1 < Ti a • (b - 1 + Sb_ i ) kbinin -1 < a • (b - 1 + 4_0 • kb-1 bmin^b, thus it remains to be shown that the graph has no bypass of length smaller than b. This certainly is the case for b = 1, so for the rest of the proof we assume that b > 1. Furthermore, since the (k, b, a)-rectangular graph is, loosely speaking, the concatenation of a copies of the (k, b, 1)-rectangular graph, we assume without loss of generality that a = 1. The graph can therefore be disposed in b - 1 columns numbered from 0 to s - 2, with Column i containing the set of vertices {0 + i • kb-1^kb-1 1 + i • k b -1 } whose edges all point to vertices in column i + 1 (mod b - 1). To simplify the proof, we identify the vertices in a column by their value modulo kb-1 , so each column has the set of vertices {0, 1, 2 ... , kb' - 1}. We have to prove that, from any vertex j in column i, there is at most one path of length /3 ending in state / in column i + 13 (mod b - 1) for Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 85 0 < i < b — 1, 0 < j < kb-1 , and 1 < /3 < b. First, if i = 0, then from (6.7) there are ki3 paths of length 0 from j to the vertices {j (mod kb-1 ), + 1 (mod kb-1 ), .^+ kR — 1 (mod kb-1 )} ^ (6.8) in Column 0 (mod b — 1). The paths all end in different vertices if 1 < < b, hence there is no bypass of length smaller than b starting from the first column of the graph. Second, if i > 0 and 1 < /3 < b — i, then from (6.7), there are k Q paths of length 0 from j to the vertices {j (mod kb-1 ), j + ki (mod kb-1 ), j + 2 • k i (mod kb-1 ), . , j +^— 1) • ki (mod kb-1 )} in Column i + 0 (mod b — 1). The paths of length b — i from j in column i > 0 end back in Column 0, and from (6.8), increasing the length of the paths from 0 to 0 + -y will result in 0+7 paths starting from the vertex j in column i and ending in Column -y in the vertices { j (mod kb-1 ), j + 1 (mod kb-1 ),^, j +^— 1 (mod kb-1 ), j + k i (mod kb-1 ), j + k i + 1 (mod kb-1 ),^, j + ki + le7 — 1 (mod kb-1 ), j + (k3 — 1) • ki (mod kb-1 ), j + (k3 — 1)^+ 1 (mod kb-1 ),^, j +^— 1) • ki + — 1 (mod kb-1 ) 1. The paths all end in different vertices when 0 + ry < b, hence there is no bypass of length smaller than b in the graph.^ ^ 6.5 Tradeoff Between Recovering Synchronization and Correcting Synchronization Errors So far, I have proved that the Viterbi decoding algorithm can be used to correct inser- tion, deletion, and duplication errors efficiently, as well as more complex scenarios where Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 86 several types of synchronization errors can occur together. In this section, I explain why convolutional codes are are inefficient to correct synchronization errors and show that the (k, b, a)-rectangular graphs presented in the previous section are more promising, but before that I revisit some of the distances presented in Section 6.2. Convolutional codes and trellis codes in general generate a infinite number of finite codewords of various lengths, but for synchronization errors the length of the codewords is not necessarily preserved by the channels. Furthermore, decoding errors have different consequences whether or not the decoded and original codewords have the same length, so it is important to distinguish between both cases if some insight on the performance of synchronization-correcting codes is to be gained. Definition 6.21. A shortcut decoding error occurs when the final decoded received sequence and the original codeword do not have the same length. A bypass decoding error occurs when a received sequence is decoded incorrectly to a codeword whose length is the same as the length of the original codeword. The minimum bypass deletion distance of C is defined as dmb = u,vEMClunHvi dd(11' v) ' and the minimum shortcut deletion distance is defined as C in =^min dd (u, v). - -^u,vecduiolvl The minimum bypass and shortcut distances can be defined similarly for other types of errors, i.e., imb in and ims in for insertion errors, train and tn.% in for duplication errors, and smb in and emin for synchronization errors. A very useful distance measure to study the performance of convolutional codes is the minimum free distance, which can be modified for synchronization errors. Definition 6.22. The minimum free deletion distance of a code is defined as df,„ = min(dms in, dra b in), and the minimum free insertion, duplication, and synchronization distances are defined the same way and denoted by i free, t free , and sf„,, respectively. The minimum free distances are equivalent to the minimum distances defined in Section 6.2. The minimum bypass and shortcut distances both affect the minimum free distance of a code, but in a different way: a shortcut decoding error causes a loss of synchronization Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 87 between the transmitter and the receiver, whereas a bypass decoding error maintains synchronization but creates a burst of substitution errors. The main result of this chapter is that the robustness of trellis codes against most types of synchronization errors depends on the structure of the codes themselves. Although for clarity purposes the remaining part of the section only considers deletion errors, the analysis and results are also valid for insertion and synchronization errors. Theorem 6.23. For a (k, n)-encoder graph whose smallest bypass is of length bmin and whose smallest cycle is of length Smin, dmi' in < n bmin ; dins in < n smin. Proof. Let b be the output symbols from a cycle of length s rnin in the encoder graph. Clearly, there exist two codewords c=lobor and c'=lor that can be generated by the encoder, and the deletion distance between them is icl = n • s min . Let d and d' be the output symbols for the two paths of a bypass of length b min in the encoder graph. There exists two codewords c=lodor and c'=lod'or that can be generated by the encoder, and the deletion distance between them is at most Id! = = n • bmin. 0 The structure of the underlying graphs of (n, k, v) convolutional codes is terrible because they have a cycle of length one and thus drain < n. It means that no matter how much memory they have, there will not be able to correct all the error patterns consisting of n symbol deletions. Interestingly, the graphs of convolutional codes have a minimum bypass distance of dms in = n+ 1, and as such they reach the bound of Theorem 6.18. This being said, there is no point for the final codeword to have the symbols of the original codeword in the right order if some of them are missing. It is much better to recover synchronization correctly and to correct potential bursts of substitution errors using, for instance, concatenated Reed-Solomon or low-density parity-check codes. Theorem 6.18 means that when the number of states of an encoder is fixed, there is a tradeoff between its capacity to recover synchronization and its capacity to output the bits of the original codeword in the right order (of course some graphs are bad for both). One extreme case includes the convolutional codes, and the other extreme case includes the graphs consisting of one long cycle, like in Figure 6.9, where codewords of correct length but bearing little resemblance to the original codewords can be returned. The (k, b, a)-rectangular graphs are promising for two reasons. First, they achieve Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 88 the bound of Theorem 6.18 for smin > burin — 1, which is required to construct good codes for most types of synchronization errors. The second reason why they are so interesting is that all the edges in one column point to vertices in the next column. As a result, their decoding complexity depends on the number of vertices per column instead of the total number of vertices. Furthermore, their minimum shortcut distance increases as the number of columns increases, whereas their minimum bypass distance increases with the number of vertices per column. Consequently, one can use graphs with a large number of small columns for increased robustness against loss of synchronization without increasing the decoding complexity, and use a concatenated code to correct potential bursts of substitution errors. 6.6 A Few Simulation Results In this section, I present a few very simple trellis encoders and their performance to illustrate the potential of rectangular graphs to correct all types of synchronization errors. 6.6.1 A Trivial Triple-Deletion Synchronization-Recovering Code The first encoder is a (2, 2)-encoder graph over the (2,2,2)-rectangular graph illustrated in Figure 6.12 and analyzed for deletion errors. Its encoding rate is 1, and two output bits are required to end the encoding in state s o when used with input sequences of odd length. The minimum bypass deletion distance of the encoder is one, whereas its minimum shortcut deletion distance is three. The code can therefore correct one deletion error and recover synchronization if less than four deletions occur, whether or not the errors occur in burst. This is as good as it gets for R = 2 binary codes over the (2,2,2)-rectangular graph: it has a cycle of length 4, and it is not possible to select the bits in such a way that the minimum bypass deletion distance is greater than 1. Figure 6.12: A triple-deletion synchronization-recovering code. Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 89 Table 6.3: Hamming distance between the original and final codewords for the trellis encoder of Figure 6.12 with two or three deletion errors. Hamming distance between original and final codewords 2 deletions 3 deletions 0 90.2% 73.6% 1 2 8.3% 19.5% 3 4 1.4% 5.3% 5 6 < 0.1% 1.4% 7 8 < 0.2% 9 10 < 0.01% The encoder was simulated extensively with input sequences of 19 bits, resulting in codewords of 40 bits decoded using the modified Viterbi algorithm for deletion errors presented in this chapter. Table 6.3 shows the Hamming distance between the original and final codewords when two or three deletion errors occur. The result is that if the trellis code is concatenated with a simple outer Reed-Solomon code capable of correcting two substitution errors, the resulting code can correct two deletions with probability 0.985 and three deletions with probability 0.931. This might not seem much, but it is, to the best of my knowledge, the first easily decodable code with a reasonable rate that can correct a burst of three deletion errors. Moreover, the code is based on of the simplest nontrivial rectangular graphs and its decoding complexity is the same as the decoding complexity of a R = 2 convolutional code with a 1-bit register. 6.6.2 A Sample Trellis Code for the Deletion Channel In most practical systems, loss of synchronization can have catastrophic consequences. I now show that rectangular graphs can overcome this problem. Consider, for instance, the (2, 2)-encoder graph over the (2,2,8)-rectangular graph illustrated in Figure 6.12. This code was simulated extensively with random binary input sequences of length 503 and codewords of length 1008. The codewords were transmitted over a deletion channel with bit deletion probability 0.1 and decoded using the Viterbi algorithm for deletion errors. For this set of settings, the probability that the transmitted and decoded codewords do 01^10^10^00^01^00^01^10 01^10^10^11^11^11^01^00 Figure 6.13: A long trellis is very robust against loss of synchronization. 0 Z ■—.. Z CD Do 01 CD ■--, C) 0 a CD ci) oil 0 1 C/) Cr 0 Cf) Z n 0" 1-1 0 0 g . CDcl- 0 0 oms 01 0 ►1 Cl) c.0 0 Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 91 not have the same length is approximately 2 x 10 -5 , and the average Hamming distance between the transmitted and received sequences is 67. This means that using this code of rate -. transforms a channel deleting roughly 10% of the bits into a channel with a 6.7% rate of substitution errors. Again, it must be mentioned that the decoding complexity of the inner code is still the same as that of a R = a convolutional code with a 1-bit register. Using a good Reed-Solomon or low-density parity-check outer code, it is possible to correct almost all the substitution errors, thus to obtain a concatenated system with good performance. On average over all codes, the probability of a shortcut decoding error decreases exponentially with the number of columns in the encoder graph. For instance, over all the binary codes of rate R = z using the (2, 2, 20)-rectangular graph, the probability of a shortcut decoding error is less than 10 -12 . This solves one of the most important problems of synchronization, and the only known codes that somewhat achieve this are the watermark codes introduced by Davey and Mackay [20]. The main challenge of trellis codes over rectangular graphs is therefore not to recover synchronization, but instead to find bit configurations minimizing the number of substitution errors between the transmitted and decoded sequences. 6.6.3 Codes for Channels with Segmented Deletion Errors An interesting problem is to develop codes that can guarantee successful decoding if the synchronization errors or burst of errors are far apart from each other, or if there is a bounded number of errors per segment. Using irregular codes over the (2, 1, k)-rectangular graphs and the modified versions of the Viterbi decoding algorithms for deletion or insertion errors, it is possible to obtain codes that significantly outperform the best known codes in this setting. For instance, the almost trivial encoder shown in Figure 6.14 has rate R = 0.4 and can correct one deletion error (or one insertion error) every five bits, which is much better than the code with rate R = 0.25 from Swart and Ferreira that can correct one deletion error (or one insertion error) every six bits. More- over, the decoding complexity of my code is equivalent to that of a convolutional code with no memory, whereas, Swart and Ferreira's code is much more complex and much harder to decode. Liu and Mitzenmacher [64], using another extension of Levenshtein's single-deletion error-correcting codes, designed a code of rate R = 0.448 that can correct one deletion error (or insertion error) per byte (segment of eight bits). This is a slightly different Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 92 11^001 S01`00^110 Figure 6.14: A code correcting 1 deletion error every 5 bits. 11^11^001 0 AGM— 0 ANEW (41)■1111-- o /4_ 00^00^110 Figure 6.15: A code correcting 1 deletion error every 9 bits. model, since the code can correct a pair of deletions close to each other if the first deletion occurs near the end of a segment and the second deletion near the beginning of the next segment. The irregular code over the (2, 1, 4)-rectangular graph presented in Figure 6.15 has rate R = s 0.444 and can correct one deletion error every nine bits when the modified Viterbi decoding algorithm for deletion is used. This being said, when the distance between consecutive errors is less than nine, the algorithm will return valid codewords, but sometimes codewords corresponding to illegal error patterns, i.e., patterns with errors too close to each other. This can be solved by allowing the Viterbi algorithm to backtrack when an illegal error pattern is found until a codeword corresponding to a valid error pattern can be returned, which significantly improves the accuracy of the decoded codewords. It should be noted that the complexity of the algorithm increases as the average distance between the errors decreases, since it has to backtrack more often. The last code of this section is shown in Figure 6.16. Its rate is R = 0.3 and it can correct a burst of 2 deletion errors (or 2 insertion errors) every 10 bits if the Viterbi algorithm for deletion or insertion errors is used without backtracking. More precisely, decoding will be successful if any substring of 10 bits in a codeword is corrupted by at most 2 deletions 111^111^0001 410 41111111.—^■1111111-- o \i< — 000^000 1110 Figure 6.16: A code correcting 2 deletion errors every 10 bits. 11 00 Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 93 6.7 Regarding Maximum likelihood Decoding of Trellis Codes for Deletion Channels This has already been mentioned several times throughout this thesis, but again one reason explaining why deletion-correcting codes are difficult to derive compared to codes for substitution errors is that the balls associated to the codewords do not have the same size and depend on the structure of the codewords themselves. We saw in Chapter 4 that the problem of finding the number of subsequences of a string when bits are deleted from it is a challenging problem in its own right. This also explains why the modified Viterbi algorithm with the metric used in Section 6.3 does not necessarily find the maximum- likelihood path. Consider for instance that one of the codewords from the codebook C = {101, 001, 0001, 00001} is transmitted over a deletion channel with deletion proba- bility pd and that the received sequence is 01. One might say that 101 and 001 are equally close from 01 since only 1 bit needs to be deleted from both strings to get 01, but it is not true, since to get 01 from 001 the bit can be deleted among the two bits of the first run, whereas from 101 the first bit must be deleted. More precisely, P(011001) = 2p d (1 — pd ) 2 and P(011101) = p(1 — pd) 2 . This is quite different from substitution errors, where all the codewords at equal Hamming distance from a received sequence are equally likely. Even more challenging is the fact that for a deletion channel where each bit is inde- pendently deleted with probability pd and transmitted correctly with probability 1 — pd, the maximum-likelihood codeword and even its length change depending on the prob- ability of bit deletion pd, with longer codewords more likely as the probability of bit deletion increases. Using the codebook C defined above, P(0110001) = 3pd (1 — pd) 2 and P(01100001) = 4p(1 — pd ) 2. Hence, if 0 < pd < -., then 001 is the maximum- likelihood codeword; if 3< pd < i, then 0001 is the maximum-likelihood codeword; finally if pd > 1, then the maximum-likelihood codeword is 00001. A direct conse- quence of this phenomenon is that the maximum-likelihood codeword for a received se- quence which is already a codeword can be a different codeword: if 001 is received, then P(0011001) = (1 — pd) 3 , P(00110001) = 3pd (1 — pd ) 3 , and P(001100001) = 6p2d (1 — pd ) 3 , thus 001 is not the maximum-likelihood codeword for itself if pd > In general, maximum-likelihood decoding of trellis deletion-correcting codes cannot be done using a survivor metric and the Viterbi algorithm in its simplest form. This is formalized in the following theorem. Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 94 Theorem 6.24. For any survivor metric, there exists a trellis such that the modified Viterbi decoding algorithm for deletion errors is not maximum-likelihood decoding. Proof. Consider the encoder corresponding to the simple rectangular trellis shown in Figure 6.17. If the deletion probability is small, the maximum likelihood codeword for the received sequence 0100 is 010100, and for 0101 the maximum-likelihood codeword is 011001. Since the two received sequences start with 010, the state of the Viterbi algorithm at time t =1 is the same for both, and whatever branch is kept as the survivor, at least one of the two final survivors will not be the maximum-likelihood codeword for the corresponding received sequence. ^ 010 100 0■11-- 0■10*- -0 ',, : - - -- 011 001 Figure 6.17: A bad trellis for maximum-likelihood decoding. Under the light of Theorem 6.24, a maximum-likelihood algorithm for trellis codes with deletion errors must have the capability to remember more than one survivor path per state. 6.8 Summary I this chapter, I have presented a new class of binary codes capable of correcting syn- chronization errors. The codes use encoders based on graphs, and although convolutional codes do not offer a good performance, I have shown that the structure of rectangular graphs is well suited to be used on channels affected by synchronization errors. I have formally proved that the Viterbi decoding algorithm can be modified to correct various types of synchronization errors by returning the "closest" codeword from a given received sequence. Correcting one type of synchronization error requires roughly the same complexity as the original Viterbi decoding algorithm, whereas it is not the case when insertions and deletions or synchronization and substitution errors can occur together. I have demonstrated that codes can recover synchronization with arbitrarily high probability by using encoders based on rectangular graphs with a large number of columns, and this without increasing their encoding and decoding complexities. Finally, I have proved that maximum-likelihood decoding for synchronization errors cannot be achieved Chapter 6. Nonlinear Trellis Codes for Symbol Synchronization Errors 95 without increasing the number of survivor states in the appropriate versions of the Viterbi decoding algorithm. Part III Applications of Synchronization Models 96 Chapter 7 A Synchronization Approach to Opportunistic Spectrum Access for Cognitive Radio Systems 7.1 Introduction Cognitive radio is an emerging technology proposed to increase spectrum usage in un- derutilized licensed spectrum bands. A challenging task for cognitive radio systems is to design efficient opportunistic spectrum access schemes. One way for cognitive users to communicate is to use an interference avoidance approach [49], meaning that the cogni- tive transmitter and receiver only use the available spectrum in which they don't detect primary user activity. Two difficulties arise in this scenario. First, the availability of the spectrum to the cognitive users depends on the activity of primary users and is therefore highly dynamic. Second, it is possible that the transmitter and the receiver have con- flicting information about the available spectrum at any given time due to factors like mobility, distance from the primary users, shadowing, etc. The channel models developed and described in this chapter are motivated by the two-switch channel model for dynamic and distributed spectrum allocation originally presented by Jafar and Srinivasa [49]. There, the authors model the dynamic spectrum allocation problem as a channel with side-information [93, 16]. The model, illustrated in Figure 7.1, has a switch for each cognitive user which is open if they detect a primary user. The switch requires that the cognitive users avoid using the channel whenever 'The work of Section 7.2 was done in collaboration with Chris Nicola. 97 Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 98 a primary user is detected. A generalization of the two-switch channel is presented in Figure 7.2, where the available bandwidth B is divided into s subcarriers, each with its own two-switch. Partitioning the bandwidth was proposed after measurements indicated that spectrum bands licensed to primary users were not used uniformly, and the main advantage of this approach is that resources can be allocated to cognitive users without overly interfering with legacy users. In particular, cognitive users must avoid using the subcarrier channels where primary user activity is detected. In this chapter we explore this further by implementing channel models motivated by this scheme. I describe two logical ways in which the cognitive users might approach these channels. In the first approach, the cognitive transmitter always continues with its message, allowing bits to be lost when there are open switches; this can be thought of as an erasure channel. In the second approach, the transmitter stops sending information when it encounters an open switch until the next free (closed switch) subcarrier and then resumes the transmission. This requires that the cognitive receiver attempts to remain synchronized. S Primary Trans.Secondary Trans.^T Secondary \Recv. Channel Figure 7.1: Two-switch model from Jafar and Srinivasa. tll a) 1.> a) tu ci) t) a)a> 4-4 4.4 4.4 1 2 3 4^5 6 7 8 9 10 B Figure 7.2: Spectrum availability for a cognitive user at time t. In both models, the detection capabilities of the transmitter and the receiver are not assumed to be ideal, nor do they necessarily match. We define a probability of detection failure and a probability of false alarm for each cognitive user. This gives us some insight Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 99 into the effect spectrum sensing quality has on channel capacity and system performance. Another motivation to study these models is to learn about their fundamental limits and to introduce error-correcting coding techniques to solve the dynamic spectrum allocation problem, especially synchronization error-correcting codes. We are aware that some of our assumptions are unrealistic, this being said we believe that our analysis remains valid with assumptions that model practical systems more closely. For instance, although we use coding at the bit level, coding at the packet level is also possible in our setting. Furthermore, we demonstrate that changing erasures into synchronization errors can provide some advantages, which is an interesting result in its own right. We begin with the noisy erasure model in Section 7.2 and then extend and improve on it to demonstrate that the switch at the receiver is not ideal in this case. Section 7.3 provides the framework for the synchronization approach, and in Section 7.4 we conclude the chapter. 7.2 Opportunistic Spectrum Access via Erasure Models 7.2.1 Two-switch erasure model with binary symmetric noise Dynamic spectrum allocation by an individual cognitive radio in the presence of a primary user with priority use of the spectrum can be modeled using the two-switch model from Jafar and Srinivasa [49] and seen in Figure 7.1. The two switches model the detection of primary users at the transmitter T, and receiver 'R., and are represented by the random variables Sy and SR . Each variable is a local estimate of the random variable S which is the actual event that a primary user is operating in the range of T and/or 7Z. The transmitter is not only required to avoid transmitting in the presence of a primary user, but we also assume that the primary user signal is much more powerful than the channel noise and will result in an near-total loss of information. This is modeled by changing the bit-error probability such that when S = 0 (no primary user present), the noise process of the channel is that of the BSC channel with probability of error p °, = pBsc, , and when S = 1, then pi! = . The detection of the primary users by T and R. is not assumed to be perfect. There is the possibility that they fail to detect an active primary user or that they falsely detect a primary user (false alarm) when none is active. The probabilities of failure at the E1 S = 1 R Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 100 transmitter and receiver are represented by: fT = Pr(ST sIS = s), f7 = Pr(SR, sIS = s). Using this notation, the probabilities of failed detection are represented by f4- and^, and the probabilities of false alarm by n and A. These probabilities can be thought of as characteristics of the cognitive radios indicating the ability to detect primary user activity. Furthermore, the probability of failing to detect a primary user is directly related to the probability of interfering with it, so it will very likely be tightly controlled by regulating agencies [45]. It will be seen that these values have a direct effect on the capacity of the dynamic spectrum channel available to the cognitive radios in this model. T Transmitter^Channel^Receiver L ^I ^I 1 ^ 1^1^1^I .6, ^> 0 0 >0 E 1 Figure 7.3: Components of the noisy erasure channel The components of the noisy erasure channel are illustrated in Figure 7.3. While the channel can be considered to have two separate states associated with S = 0 and S = 1, we can simplify this into the noisy erasure channel shown in Figure 7.4 and express the values e and a as functions of the probabilities of primary user activity and the detection Jtr Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 101 properties of T and R, , i.e., a = (1 — ps)^+ Ps (1 — fie) and (7.1) E = ( 1^s) 1 2 fT + (1- g)pBsd (1- 4.)+ v)sfiz, (7.2) where Ps Pr(S = 1). Figure 7.4: Noisy erasure channel. For this channel the capacity for uniform binary inputs is given by C = 1 — a + H([a,1— cE]) — HUE, a, 1 — E - ap 1 — a + h(1 — a) — h(e) — h(1 — E - a),^(7.3) where H• is the entropy function and h(x) = — x log(x) This model, however simple, may not be optimal for this approach to the problem. We show below that it is possible for the receiver to further exploit the knowledge of the imperfect detection properties of T and R. 7.2.2 One-switch alternative model The noisy erasure model for the two-switch channel can be considered one of the simplest models for the dynamic spectrum allocation problem in that it is merely a combination of two of the simplest and most well known binary input channels. We note, however, that the receiver may not be exploiting all of the information available from the channel and side-information. The effect of the switch at the receiver means that it assumes an erasure occurs whenever it detects a primary user. This does not consider the possibility that the primary user may have been detected incorrectly by the cognitive receiver. If the primary user detection is perfect at the receiver then there is no advantage to consider the output 0 0,0 1 1,0 X Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 102 of the channel. However, we have imperfect detection of the primary user, and in this case the receiver can do better if it has an accurate knowledge of the probabilities of failed detection fisz and fi9-. Figure 7.5: One-switch channel transitions. We consider a modification of the channel model which is illustrated in Figure 7.5, where there are four possible channel outputs corresponding with the two binary output values for the channel Y and the two states of S. This can be looked at as a one-switch model where there is only a switch at T. It is possible to compute the channel capacity in this case, and to do so we first define the probability of error conditioned on SR, by pe (87z) Pr (X YISR = s7). More precisely, pe(0) = pe (1) = 2P ^ 0 \ 110 + [PBSC f^ Jr)^+ , 0 Polo, 1 ^[PBSC ( 1 — n)^n]po l i• (7.4) (7.5) We define Psis,, '1- Pr (S = sISR, = s/z) , and compute ( 1 — Ps) ( 1 — A) (1 — Ps) (1 — A) + Ps fk' ( 1 — Ps) A (1 — ps) A + Ps ( 1 — with P1 1 0 = 1 — P0 1 0 and P1 1 1 = 1 — P0 1 1 . These can be substituted into the equation of the channel mutual information, i.e., / (X; Y, SR) = H(X) — H (X1Y, SR,) = 1 — H (X ED YisR ) =^- E Pr (SR = sre ) • [h (pe (siz )) + h (1 — (87z))] . Polo Poll Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 103 This gives us the capacity of the channel for a uniform input distribution. In the one- switch model the receiver does not assume that no useful information can be received from the channel when a primary user is detected. This is advantageous to R if it knows that the detection capabilities of both itself and of T are not ideal. If we assume that R has an accurate knowledge of the values for f7 and fis-, it is possible for the decoder to exploit this channel further by considering the output value Y. Without that knowledge the receiver may not be able to effectively exploit the information that might have actually been sent in the case of a false detection. It is worth noting, however, that the capacity gained by considering this model vs. the erasure model is modest unless both the primary user activity is fairly high (i.e., ps is large) and the probabilities of detection error are large. 7.2.3 Coding for the Erasure Approach Although in theory coding could be applied to the models described in the previous section in order to compensate for the BSC noise as the primary user activity, the main problem with the erasure approach is that code rates must reflect the probability of primary user activity. In practice, primary user activity will certainly be time-varying and highly bursty, hence it does not seem that coding can be useful in this setting. 7.3 Opportunistic Spectrum Access via Synchronization In this section, we explore how synchronization error-correcting codes can be used to solve the opportunistic spectrum access problem. As in Section 7.2, we use the same assumptions that allow us to model the problem in an intuitive albeit oversimplistic way: a cognitive transmitter T wants to convey a message to a cognitive receiver R, the available bandwidth B is divided into s subcarriers with its own two-switch, the cognitive users are not allowed to transmit in the subcarriers used by a primary user, and there is no interference between the subcarriers. The novelty of the synchronization approach over the erasure models is that when an open switch is encountered, the cognitive transmitter keeps the bit to be transmitted until a closed switch is encountered instead of dropping the bit and moving to the next one. Consequently, a discrepancy of the available subcarriers between the secondary transmitter and receiver causes a loss of synchronization, hence ›tdeletions 0 0 0 a) t) I ;LI 6 7 8 9 10 1 1 as 2 a) 1 2 5 I a, 3 4 5 yilr Insertion of a random bit i Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 104 ^► 1 2 3^4^5 6 7 8 9 10 :Tti CD CD 1 7/ CD CD 1 ''t 5 `-$ CD ■-7/ CD CD 0 :rti CD CD 0 :Tit CD CD 0 .7/ CD CD 1 :V CD CD 0 B ^ND- Figure 7.6: Synchronization errors caused by discrepancies of the available spectrum to the cognitive transmitter and receiver. Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 105 the opportunistic spectrum access problem can be solved using synchronization error- correcting codes. In order to illustrate our model, suppose that T wants to send the string 1100010. It senses the spectrum and realizes for instance that the third, fourth, and fifth subcarriers are being used by primary users, as shown in Figure 7.6. T then sends a bit on each free subcarrier, and the resulting usage of the bandwidth is 1 1 - - - 0 0 0 1 0. Suppose now that R, senses that the fourth, fifth, ninth, and tenth subcarriers are being used by primary users. If no bit is corrupted by the channel, it follows that the received bit for each subcarrier is 1 1 ? - - 0 0 0 - -, in other words the message 1100010 is received as 11?000. It follows that each primary user detected by R. but not by T will result in a bit deletion. It has to be noted that R. has side-information about the deletions, since they can only be located in the subcarriers detected as being used by primary users. Furthermore, each primary user only detected by T generates a random bit insertion at the receiving side, but unlike deletions there is no side-information about their location. Interestingly, it seems that our study of dynamic spectrum allocation shed new light on channels with synchronization themselves. To argument our statement, consider the case with n = 0, fj- = 0, A = 0, and pBsc = 0. In this scenario, the cognitive users want to communicate over a noise-free channel, and errors only occur when the receiver falsely detects primary user activity on a given subcarrier. From (7.1) and (7.2), a = (1—ps)A +ps and E = 0, and it follows from (7.3) that the capacity of the resulting channel is C = 1 — a = 1 — (1 — ps)A— ps • At first sight, when ps is small, this channel seems quite similar to the deletion channel with probability of bit deletion f. in Chapter 3. After all, when looking for the value of a missing bit and its position from a string of 10 bits, it does not seem to be a disadvantage to receive a subsequence of nine bits as opposed to an binary array of ten million cells with 9 nonzero entries. This being said, we can divide the previous expression by (1 — ps) to get a capacity expression when the channel is free of primary user activity. This gives the capacity of an erasure channel with erasure probability As, i.e., C' =1— f1R) ,, which is much higher than the capacity of the corresponding deletion channel. Hence, Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 106 by using the synchronization approach, the cognitive users lose the advantage provided by knowing the exact positions of the bits in the codewords and as a result must suffer a loss in data rate proportional to the probability that they disagree whether or not a legacy user is active on a given subcarrier. Furthermore, it goes without saying that the synchronization approach would be much more difficult to implement than the others for all the reasons mentioned throughout this thesis. Cognitive users in practical settings will have to deal with very dynamic and bursty primary users, and as such the average primary usage is not a very useful measurement. Also, as mentioned previously, their ability to detect primary users will likely be tightly regulated. This means that despite all its drawbacks, the synchronization approach offers a significant advantage over codes for erasure channels or real-time reconciliation of the available spectrum between the cognitive transmitter and receiver: primary activity has no influence whatsoever on the error rate of the channel. Thus, as long as the discrepancy in the spectrum allocation is not larger than the error-correcting capability of the code, T can transmit on the available subcarriers. If the bandwidth used by primary users changes suddenly, then T adjusts the subcarriers it uses accordingly and does not need to alert 'R, of the changes. In most practical systems affected by time noise, synchronization errors have catastrophic consequences and are usually converted into substitution errors or erasures. This is the case, for instance, for packet-switched communication networks, where a preamble is added to each packet to insure that the receiver knows which packets are dropped on their way to destination. Opportunistic access for cognitive radio networks is, to the best of my knowledge, the first problem where converting erasures into synchronization errors can be beneficial. The models introduced in this chapter are interference-free. We now discuss this assumption and explain why the models remain valid if it is relaxed. First, we assume that there is little interference from the secondary users to the primary users. This is paramount to cognitive radio systems since cognitive users must not interfere with neighboring primary transmissions, probably by using a lower transmit power which has yet to be fixed by regulatory agencies [45]. Bansal, Hossain, and Bhargava [5] studied a closely related model in which the power allocated to the subcarriers decreases as they approach the primary user band. Consequently, one of the techniques proposed for spectrum sensing is energy detection, where the cognitive users measure the energy level in each subcarrier and conclude that those for which the energy level as above a certain threshold are used by primary users. Second, we assume that there is no interference between the subcarriers used by T when the bandwidth is partitioned; this Chapter 7. Opportunistic Spectrum Access for Cognitive Radio Systems 107 can be approached by using orthogonal frequency division multiplexing (OFDM). Third, we assume that there is no interference from the primary users in the subcarriers used by the cognitive users. This is of course unrealistic, especially for cognitive subcarriers located close to the ones used by legacy users. However, the interference to the cognitive users from the primary users can be modeled as additive white Gaussian noise [5]. This is not a problem for synchronization-correcting codes, since and a code capable of correcting insertions and deletions can also correct substitutions of bits (a substitution can be viewed as the deletion of a bit followed by the insertion of the opposite bit at the same position). Good codes capable of correcting a small number of insertion and deletion errors while remaining robust against additive noise could be useful in this setting. 7.4 Summary In this chapter, we have developed simple models for which we can apply the techniques of error-correcting coding, especially synchronization error-correcting codes, as a means of compensating for primary user activity in dynamic spectrum allocation applications. Using simple and well understood channel models has provided us with some tools and insight to develop approaches to design dynamic spectrum systems. It remains to be seen whether our approach can still be useful when more practical assumptions are used. Chapter 8 Worst-Case Nonzero-Error Interactive Communication Applied to Distributed Problems with Synchronization Errors 8.1 Introduction Interactive communication was introduced by Orlitsky [80] and lies at the intersection of information theory and communication complexity. It examines the amount of communi- cation required for one party to convey information to a second party who has correlated information. Two players, an informant Px and a recipient Py, possess private but cor- related inputs x and y, respectively. Py wants to learn his interlocutor's input without error while minimizing the number of bits that need to be transmitted in the worst case. To do so, Px and Py alternately exchange data on a noise-free channel following a de- terministic protocol they have agreed upon initially. Unlike the original communication complexity model [111] (see the book by Kushilevitz and Nisan [57] for an exhaustive survey), the function to be computed is trivial (f (x, y) = x), but the problem is to ex- ploit the correlation between the parties' knowledge for reducing the required amount of communication. The following example illustrates the model. 1 I started working on nonzero-error interactive communication at the end of my Master of Science. Lemmas 8.9, 8.13, 8.17, and part of Theorem 8.29 were proved in my Master's Thesis [67], but I include them in this chapter to present a complete survey of the subject. 108 Chapter 8. Worst-Case Nonzero-Error Interactive Communication^109 Example 8.1. The League Problem [80] A sports league has 2' teams, and the name of each team is a binary string of n bits. Py knows the two teams playing in the championship match, but a blackout during the game prevents him from learning who wins. Px , on the other hand, hears the name of the champion team on the radio but has no idea who the runner-up is. Py wants to learn the identity of the champion team from Px with certainty while minimizing the communication required. If only one round of communication from Px to Py is allowed, then Px has to transmit the n bits of the winning team. Indeed, if fewer than n bits are transmitted, there are two teams for which Px sends the same message; if those two teams happen to play in the championship match, then Py is not able to learn the winner with certainty. However, a substantial gain can be achieved when interaction is allowed. Py sends the position of one of the bits where the names of the two finalists differ, which requires [log nl bits. It is then sufficient for Px to send the bit of the winning team at the required position. This protocol requires [log nl + 1 communication bits, an exponential gain compared to the one-way protocol. Orlitsky showed that even if more than two rounds of communication are allowed, no protocol can solve this problem by exchanging a smaller number of bits in the worst case. In this chapter, I study worst-case nonzero-error interactive communication and com- pare the results with the original worst-case deterministic model. I allow P y to learn Px's input with a probability of error of at most € and study how it can improve the communication, either by reducing the number of bits that need to be exchanged or by reducing the number of rounds of communication. Four nonzero-error models are pre- sented. The first model, worst-case private coin randomized interactive communication, allows Py to learn x with a probability of at least 1 — c for of all the possible input pairs (x, y). The players can also use randomized protocols: each player has a private, independent source of randomness whose output can be used to decide which bits should be transmitted. The second model, worst-case public coin randomized interactive com- munication, also allows Py to learn Px 's input with a probability of at least 1— € for all of the possible input pairs. It also uses randomized protocols, but instead of private coins, both players can use a public (common) random generator. The third model, worst-case private coin randomized amortized interactive communication, allows the players to solve several independent instances of the same problem simultaneously instead of sequentially. The players are again permitted to use private coins, and Py can fail to learn x with a probability of at most 6 for every input pair. The fourth model, worst-case distribu- Chapter 8. Worst-Case Nonzero-Error Interactive Communication^110 tional interactive communication, permits only deterministic protocols, but Py can learn x incorrectly for a fraction of at most € of all the inputs weighted by their probability distribution. I prove that the worst-case public coin randomized, private coin randomized amortized and distributional models are equivalent, and that optimal protocols for the three models do not require interaction between the players. The models can be arbitrarily better than the worst-case deterministic model when a single round of communication from Px to Py is allowed. Interactive communication includes a large class of symmetric problems [83] inherent to several practical applications, including synchronization of mobile data [100], reconcili- ation of sequences of symbols such as nucleotide sequences in DNA molecules [30], remote data storage [15], list decoding with side information [42], and quantum key distribution [10]. I prove that the deterministic and all of the nonzero-error models are equivalent and efficient for symmetric problems, although optimal protocols requiring only one round of communication can be constructed when errors and randomization are allowed. I also show that all of the models are equivalent and inefficient for Cartesian-product pairs. The most challenging model to analyze is the worst-case private coin randomized model. I show that the best one-round protocols for this model are at most three times more expensive than the best randomized or deterministic protocols using an unbounded number of rounds of communication. This is a striking difference from the deterministic model, for which Orlitsky [80] showed that the best one-round protocols can require the transmission of exponentially more bits than the optimal protocols. It is also different from the private coin randomized communication complexity of boolean functions, which exhibits the same phenomenon [112]. Finally, I prove that for several classes of problems, the best deterministic and private coin randomized protocols require asymptotically the same number of communication bits. I conjecture that the deterministic and private coin randomized models are equivalent. The outline of the chapter is as follows. In Section 8.2, I describe the complexity models and present the existing work. The private coin and public coin randomized models are treated in Sections 8.3 and 8.4, respectively. Randomized amortized interac- tive communication is studied in Section 8.5, and the distributional model in Section 8.6. Finally, the results for balanced and symmetric problems are presented in Section 8.7, and a summary of the chapter in Section 8.8. Chapter 8. Worst-Case Nonzero-Error Interactive Communication^111 8.2 Complexity Models and Known Results 8.2.1 Preliminaries The framework for studying interactive communication was introduced by Orlitsky in his seminal paper [80]. Let X and Y be finite sets, and let S C X x Y, the support set of (X, Y), define an interactive communication problem. Two players, Px and Py, possess respectively inputs x E X and y E Y such that (x, y) E S, and they want Py to learn x while minimizing the communication between them (it is not necessary for Px to learn y). It is assumed that the communication between the players is binary. A k-round protocol is a protocol such that for every input, there are at most k — 1 alternations between the data sent by Px and the data sent by Py. Due to the asymmetric nature of the interactive communication model, it is assumed that the last round of communication is always from Px to Py. A one-round protocol is also called one-way, and a protocol requiring more than one round of communication is called two-way. A hypergraph is an ordered pair G = (V, E), where V is the set of vertices and E the set of hyperedges. Each hyperedge is a subset of V. Two distinct vertices v 1 and v2 of a hypergraph are adjacent if there is an edge e E E such that v i E e and v2 E e. A proper coloring of a hypergraph is a partition of V in colors such that no adjacent vertices have the same color. The chromatic number x of a hypergraph is the smallest number of colors for which there exists a proper coloring of G. A convenient way to analyze an interactive communication problem SC X x Y is to use its characteristic hypergraph Gs. The vertices of Gs are the elements of X, and for every y E Y there is a hyperedge E(y) 1 ' {x 1 (x, y) E S}. The number of different hyperedges of Gs is denoted by a. It should be noted that all of the asymptotic bounds presented in this chapter only make sense for the support sets for which x is a function of the size of the inputs. The ambiguity set of an input x E X, defined as a(x) '..' {y E Y 1 (x, y) E S}, is the set of all possible inputs for Py given that Px 's input is x, and the ambiguity of x is ja(x)I. The maximum ambiguity of Px , ay ''' maxx{ I a (x ) I } , is the maximum number ofxe possible elements of Y for any element in X. Note that a(y), I a(y)1 and ay are defined similarly, with the assumption that a"---y > 1, since a support set S with ay = 1 is trivial and does not require communication. Example 8.2. As an illustration of the league problem presented in Example 8.1, the support set L is defined as L = {(t 1 , {t 1 , t2 }) 1 t1 t2 }, where t i , t2 E {0,1}n are teams. The vertices of GI are the teams of the league, and its hyperedges are the possible Chapter 8. Worst-Case Nonzero-Error Interactive Communication^112 match-ups for the championship game. It follows that the maximum ambiguity of Px is the number of possible runner-ups given the champion team (a—x = 2 71 — 1), and the maximum ambiguity of Py is the number of possible champion teams given the two finalists (ay = 2). A support set S C X x Y is a Cartesian-product support set if there exists X' C X and Y' C Y such that S = X' x Y'. A support set S is balanced if a—x = a-3. A symmetric support set is a support such that (x, y) E S if and only if (y, x) E S (it is clear that symmetric support sets are also balanced). Symmetric support sets arise naturally in all of the problems for which the parties' inputs are bounded by a certain "distance", including all of the applications mentioned in Section 8.1. 8.2.2 Worst-Case Deterministic Model The worst-case deterministic model was introduced by Orlitsky [80]. It has the following properties: 1. Both players send information according to a deterministic protocol. Each player sends messages based on its input and the messages previously received. 2. When a player sends a message, its interlocutor knows when it ends, and both players know that the transmission ends when the protocol halts. 3. Py has to learn x without error for every pair (x, y) E S. Let the codeword of (x, y) E S be the concatenation of the messages sent by the players on the input pair (x, y). It can be shown that if the three properties are satisfied, then the set of codewords for all the pairs (x, y) E S is prefix-free. The worst-case deterministic complexity of a support set S, written öc,„(S), is the minimum number of bits the players have to exchange in order for Py to learn x without error for every pair (x, y) E S. We write Ck(S) when the number of rounds of communi- cation is bounded by k; obviously, Ck(S) decreases with k, and 600 (S) = lira Ck(S). We k---■oo write 6.(s) for the number of bits that need to be transmitted if Px knows y in advance. The following results were shown by Orlitsky [80]. A deterministic protocol requires at least Flog a-31 bits of communication, otherwise if I a(y) I = a-3, then there are different input pairs (x i , y) and (x2 , y) for which the communication between the players is the same. Clearly, the bound is tight if Px knows y in advance, and a single round of communication is sufficient. Chapter 8. Worst-Case Nonzero-Error Interactive Communication^113 Result 8.3. 0,9(S) [log Ciy1 = 0* (S). The one-way deterministic complexity is the logarithm of the chromatic number of the underlying hypergraph of the problem, and one-way protocols require at most expo- nentially more bits than protocols allowing interaction. Result 8.4. ac.0(S)^[log 01(S) .] + 1 = Flog Flog x -11 + 1. A remarkable result from Orlitsky is that two rounds of communication are almost optimal for every problem, i.e., 02(S) < 46'09 (S) + 3. This is quite different from the original communication complexity model, where for every k > 0, there is a function whose best k-round protocol requires an almost exponentially higher number of bits than its best (k + 1)-round protocol [31]. Orlitsky [81] showed in a subsequent paper that two rounds of communication are not optimal for worst-case deterministic interactive communication, and Zhang and Xia [114] proved that three rounds are not optimal either. Ahlswede, Cai and Zhang [2] conjectured that four rounds are optimal, but the problem remains open, i.e., whether there is a k such that Ck(S) 6„0 (S)+ o(0„„(S)). Naor, Orlitsky and Shor [77] established an upper bound on the 4-round deterministic complexity. Result 8.5. 04 (S) < log log o- + log ay + 3 log log 63, + 7 < log log x + 2 log aj, + 3 log log ay + 7 < 30,0 (S) + o(0,,(S)). Balanced and symmetric support sets were studied by Orlitsky [83], who proved that the best one-way protocols require at most two times the amount of communication required by optimal protocols, i.e, 01 (S) < 2 00,, ( S ) + 1. Chapter 8. Worst-Case Nonzero-Error Interactive Communication^114 Orlitsky also showed that interaction can reduce the communication required, and that three rounds of communication are optimal. Result 8.6. Let S be a balanced support set. Then, 00„(S) < 0 3 (S) < log 6,33 + 3 log log rtS, + 11 5_ Öcc (S) + o(á„,„(S)). 8.2.3 Worst-Case Private Coin Randomized Model In the first nonzero-error model, Py is allowed to learn x with probability of error e. The players are also allowed to toss coins; Px and Py possess independent finite random strings Tx and ry of arbitrary length, respectively. The communication bits become random variables: the bits sent by Px depend on x, Tx and bits received from Py, and the bits sent by Py depend on y, Ty and bits received from Px . It is therefore possible that for a fixed input pair (x, y), a protocol outputs different results for different values of rx and ry. Let S be a support set and let P be a randomized protocol. P computes S with error E if, for every pair (x, y) E S, the probability that Py answers x on input (x, y) is at least 1 — €. The worst-case communication of a protocol P on input (x, y) is the maximum number of bits communicated for any choice of the random strings Tx and Ty. The worst-case cost of P is the maximum, for all the inputs (x, y), of the worst-case communication of P on (x, y). The c-error worst-case randomized complexity of S, written f?'„„(S), is the minimum worst-case cost of a randomized protocol computing S with error e, for 0 < € < 1. In other words, kof (S) is the number of bits transmitted in the worst case by the best protocol which, for every pair (x, y) E S, allows Py to learn x with a probability of at least 1 — €. We write k(S) when the number of rounds is bounded by k. Also, for the rest of this chapter, € is constant and c(E) is a function of E. In his first paper on interactive communication, Orlitsky [80] briefly investigated a weaker randomized model, considering the average communication over the choices of Tx and ry for the worst input pair. He showed that 911(S), the average-c-error worst-case randomized complexity of S, is at most four times the worst-case deterministic complexity, i.e., 911(S) < 4a,, (S) + 2 loge- . Chapter 8. Worst-Case Nonzero-Error Interactive Communication^115 8.2.4 Worst-Case Public Coin Randomized Model In the randomized model previously defined, each player has its own random generator. Px cannot see ry and vice-versa. In the public coin randomized model, both players can access a common "public" random coin. Formally, both players have a common random string r following a probability distribution H. The communication bits sent by Px depend on x, r and bits received from Py , and those sent by Py depend on y, r and bits received from Px . A public coin randomized protocol can also be viewed as a probability distribution over a family of worst-case deterministic protocols. The c-error worst-case public coin randomized complexity of a support set S, written Rub(s) , is the number of bits transmitted in the worst case by the best public coin protocol, which allows Py to learn x with an error probability bounded by e for every pair (x, y) E S and 0 < c < 1. We write frk'Pub (5) when the number of rounds is bounded by k. 8.2.5 Worst-Case Amortized Models For several models of computation, including interactive communication, simultaneous resolution of several independent instances of a problem can be more efficient than sequen- tial resolution of the instances. This phenomenon is known as the direct-sum problem, and was introduced by Karchmer, Raz, and Wigderson [51] for communication complex- ity of relations as a promising approach to separate the complexity classes NC I. and JVC2 [84]. Let S C X x Y define an interactive communication problem, and let (xi, yi), (x2, Y2), • • • , (x 1 , yi ) be 1 independent instances of S. Px knows (x 1 , x2 , , X i) , Py knows (yl, y2 , , yi ), and the goal is for Py to learn all the x i from Px while mini- mizing the worst-case communication. We write Os oc,(S/) for the simultaneous worst-case deterministic complexity of 1 instances of a support set S. The worst-case determinis- tic amortized complexity of S, written A cc,(S), is a complexity measure representing the average communication per instance, and is given by the expression ;1,,(S) LA- We write Ck(S 1 ) and Ak(S) when the number of rounds is bounded by k. Clearly, 0,„(S/) < / • Es 'o„(S) and Am (S) < 000 (S). Deterministic amortized complexity for inter- active communication was studied by Naor, Orlitsky and Shor [77] and Alon and Orlitsky Chapter 8. Worst-Case Nonzero-Error Interactive Communication^116 [3]. In the former paper, it was proven that the deterministic amortized complexity is equal to the complexity when Px knows y in advance, and that at most four rounds of communication are required to find an optimal protocol. Ahlswede, Cai and Zhang [2] subsequently reduced the number of rounds to three. Result 8.7. A3(S) = A 4 (S) = • • • = Aco (S) = 0*(S) = log ay . Example 8.8. We want to solve the league problem for two seasons, assuming that the results are independent. Py wants to learn the identity of the two champion teams from Px , who knows the two pairs of finalists. Obviously, if both seasons are solved independently, 2( Flog id + 1) communication bits are required. However, by treating the two seasons as one larger problem, it was shown [33] that 02 (L2 ) < Flog n] +6. When the number of teams in the league is large, solving one or two instances requires roughly the same number of communication bits. Moreover, Result 8.7 implies that the deterministic amortized complexity of the league problem is 1 bit per instance. This chapter examines the simultaneous resolution of several instances of interactive communication problems using nonzero-error randomized protocols: Px and Py are al- lowed to toss private coins, and Px must learn (x i , x2 , . . . , x i ) correctly with a probability of at least 1 — e, for 0 < E < 1. We write i oo (S/) for the simultaneous c-error worst- case private coin randomized complexity of / instances of a support set S. The c-error worst-case private coin randomized amortized complexity of S is given by the expression AQQ(S) A / lim —, /t(S /). (8.1) 00 — Again, we write k (S/) and AVS) when the number of rounds is bounded by k. 8.2.6 Worst-Case Distributional Model In all of the complexity models defined so far, any interactive communication problem is completely described by its support set S. In effect, since I consider the communication in the worst case for all the possible input pairs (x, y), the probability distribution over the inputs is irrelevant. In the final nonzero-error model, deterministic protocols are allowed to fail with probability 1 for some pairs (x, y) E S provided they are correct for most of the inputs. Let it be a probability distribution over S (it is assumed that the pairs (x, y) 0 S are not possible) The worst-case (,u,e)-distributional complexity of S, written Chapter 8. Worst-Case Nonzero-Error Interactive Communication^117 AtE(S), is the number of bits transmitted in the worst case by the best deterministic protocol that allows Py to learn x for a fraction of at least 1 — E of the inputs (x, y) E S, weighted by ti, for 0 < € < 1. The players are allowed to use an agreed-upon protocol based on ,u. The only model previously studied in the context of interactive communication and considering a probability distribution iL over the inputs is the zero-error average-case deterministic model. It was studied by Orlitsky [82] and Alon and Orlitsky [4]. Py wants to learn x with certainty using a deterministic protocol, with the goal of minimizing the expected number of bits that need to be transmitted, weighted by il. Orlitsky [82] also studied a slightly different distributional model in which Px and Py want to convey x and y to a third person P with the assumption that (x, y) E S. Several variants were studied including allowing interaction, randomization, and errors. The model has interesting similarities to the Slepian-Wolf problem [96]. 8.3 Private Coin Randomized Interactive Communication: One Round of Communication is Nearly Optimal In this section, I study worst-case private coin randomized interactive communication. I derive two lower bounds and an upper bound and use them to prove that a single round of communication is close to optimal for this model. As mentioned in the introduction, this is quite different from worst-case deterministic interactive communication, and even from the worst-case randomized communication complexity of boolean functions. Notice first that a randomized protocol can simulate a deterministic protocol by ignoring the output of the random generators, thus Rao(S) < 0„o(S). (8.2) An initial lower bound, Lemma 8.9, shows that for any support set, the difference between private coin randomized complexity and one-way deterministic complexity is at most exponential. Lemma 8.9. froo (S) E C2(log al (S)). Chapter 8. Worst-Case Nonzero-Error Interactive Communication^118 Proof. A randomized protocol P for a support set S can be represented by a binary tree. During the execution of P on input (x, y) E S, the players travel down the tree from the root to one of the leaves. At each internal node, one of the players uses its input and random string to decide which sibling should be followed, and conveys the answer to its interlocutor by transmitting it a bit. The output of the protocol is one of the values x' E a(y) 1 of the leaf reached at the end of the execution, and the worst-case cost of P is the height of the tree. The probability of reaching a leaf L is the product of the probabilities over the choice of the random strings encountered alternatively by Px and Py along the root-to-leaf path leading to L. This probability can be rewritten as px , L • py , L , where px ,L and py ,L collect the probabilities pertaining to Px and Py, respectively. To prove the lemma, an optimal private coin randomized protocol is derandomized using a well-known technique (see Lemma 3.8 in [57] for the equivalent result for boolean functions). For each leaf L of the protocol tree, Px sends px ,/, to Py. Each real number is sent using p = — log (1 — €) + f1006 (S) bits, which means that the difference between the exact probability and its rounded value is at most 2 -P. Py then computes pi, = px,L•py,L, for each leaf and derives the probability to output x i for every x i E a(y). Suppose that 0 < E < 2. Since P computes S incorrectly with a probability of at most e, there is a single x i = x that is output with a probability of at least 1— E. The total rounding error is smaller than 2k,o (S) 2—p = 2k(S) 2 -10g (1-E)+1h (s) = 1 _ E 2 because the tree has at most 21 0 (s ) leaves and only the information sent by Px has to be rounded. Hence, the only xi E a(y) with probability greater than a is x. The communication complexity of the derandomized protocol is 61(S) < 2.Fc00(s) p. Thus, froo (S) > log 61 (S) — log fro° (S) — c(E).^(8.3) The result of this lemma follows. 0 Lemma 8.9 can be used to show that the deterministic and private coin randomized models are equivalent when the difference between the one-way deterministic complexity and the two-way deterministic complexity is exponential. 1 Recall that a(y) is the ambiguity set of y. Chapter 8. Worst-Case Nonzero-Error Interactive Communication^119 Corollary 8.10. If 0,„„(S) 2 log 01 (S), then 00o (s)^(s). If Oc,(s) E g(log 01 (S)), then Oco (s) E e(froo (s)). Proof. Combining (8.2) and (8.3), we get log 01 (S) — log 0„(S) — c(E) < it(S) 5_ e,,(s). If 6,0 (s) ti log 01 (S), then asymptotically, Occ (S) — log 6'0,0 (S) — c(e) k(S) < 0„,,(S), thus 0„,,(S) ti .h€,„„(S). The second case is proved in a similar way.^111 Example 8.11. Recall that for the league problem presented in Example 8.1, 0,„,(L) = [log ni +1 and C1 (L) = n. Using Corollary 8.10, it follows that ko (L) log n, hence the worst-case deterministic and private coin randomized models are equivalent for this problem. Lemma 8.9 can also be used to show that the worst-case deterministic and private coin randomized models are equivalent when Py 's ambiguity is constant. Corollary 8.12. If ay E 0(1), then 0,0(s) frco (s). Proof From (8.2) and (8.3), it follows that log O1 (S ) - log k(S) — c(c) < k„0 (S) < 61„„(S). Moreover, 0„,„(S) < log log x + 0(1) < log 01 (S) + 0(1) 2 f (n) g (n) if and only iflimo^= 1 Chapter 8. Worst-Case Nonzero-Error Interactive Communication^120 from the assumption and Results 8.4 and 8.5, thus àcc(S) — log f(,,3 (S) — 0(1) < ko (S) 6'„„(S). 0 The next lemma tightens the bound from Lemma 8.9 for problems with a small discrepancy between 01 (S) and 6,,,„(S). Lemma 8.13. ko(s) log 63; — log^1 E . Proof. Suppose that ko (S) < log ay — log 1 E . By definition, there exists a protocol P for S requiring fewer than log ay — log TIT bits of communication and such that for every pair (x, y) E S, the probability that Py does not learn x correctly is at most E. Let y E Y be an input such that la(y)1 = as--) . For every choice of the random strings rx and ry , fewer than (1 — €) • ay distinct messages can be transmitted; it follows that the protocol makes an error for more than € • a-3 of the x i E a(y). By a simple counting argument, there is at least one element x' E a(y) such that the error probability of P on the input pair (x', y) is more than €, which is a contradiction. ^ The previous bound is very weak for problems for which the difference between the one-way deterministic complexity and two-way deterministic complexity is large. For the league problem, a--3 = 2, and Lemma 8.13 gives it' (L) > 1. On the other hand, the bound is tight for balanced and symmetric pairs, as it will be shown in Section 8.7. The next lemma gives a strong upper bound on the private coin randomized complexity. Lemma 8.14. M ^< 2 [log log x + log + log (-;.1 — 1) + 11 . Proof. Let Gs be the characteristic hypergraph of S, and let x be Px 's input. Recall that Py has an input y defining the ambiguity set (hyperedge) a(y) = {x i ,^, x i } and wants to learn which of the x i is x. Let k^Flog Xl — 1. Players agree on a proper coloring IF of Gs with x colors. We write the color of the vertices s of Gs in binary form: sosi^sk, where si E {0,1}. We consider these strings as polynomials in Z, where Chapter 8. Worst-Case Nonzero-Error Interactive Communication^121 p is a prime number such that 1 ^k (ay 1)] < p < 2 [ 1 E k• (63 —^. Bertrand's postulate [43] guarantees the existence of such a p. In other words, for a vertex s of Gs whose color is Ws , tIf s (t)^(so + sit + • • • + sktk )(mod p). Px randomly chooses m E Z, and sends m and klf x (m) to Py. Py then constructs the set {xi xi E a(y) A^) = kIfx (m)}, randomly chooses an element of Ern and concludes that it is Px 's input. In order to show that the protocol works, one has to prove that the probability that Py answers x is at least 1 — E. First, observe that x E Em , therefore lEird > 1 and Py can only answer incorrectly when 1.En, I > 2. Thus, we get Pr[Py answers x] Pr[Px randomly chooses j E zp A Py guesses correctly] = jeZp 1 P-1 P j=o p-1 Pr[Py answers x I j is chosen] Furthermore, since W is a proper coloring of Gs, all of the vertices of the hyperedge a(y) have a different color, thus the corresponding polynomials W s (t) are different. This difference implies that two such polynomials agree on at most k field elements because their difference, a nonzero polynomial of degree at most k, has at most k roots. Hence, we can deduce that p-1 k • (63; — 1 ) + j=0 which implies that p-1 x—■ 1 >j=0^(k•(ctSj-1)+P) ' Chapter 8. Worst-Case Nonzero-Error Interactive Communication^122 and thus Pr[Py answers x]^1^ ± 1 k•(ify-1) ^+k•(633 -1 ) • 1 1-e 1 Finally, the complexity of the protocol is i?'1(S) < 2 [log pi < 2 [1 + log (- - 1) + log log x + log 6331 . Using Lemma 8.14 with the league problem, we get ill(L) < 2Flog + c(c). More generally, the lemma can be used to prove that the one-way private coin randomized complexity is at most four times that of the worst-case deterministic complexity. Corollary 8.15. < 40,0 (S) + [2 log 1 1 Proof. Using Lemma 8.14 and Results 8.3 and 8.4, we get Ri (S) < 2 [1 + log (1 - 1) + log C1 (S) + 0,0 (S)] < 2 [1 + log (-1 -1) + 0„.0 (S) - 1 + 6',,(S)1 < 40,„,(S) + 2 [log -E-1 1 . Combining Lemma 8.14 with the two lower bounds proved in this section, we can conclude that one round of communication is nearly optimal for the worst-case private Chapter 8. Worst-Case Nonzero-Error Interactive Communication^123 coin randomized model. Corollary 8.16. (S) < 41 (S) + o(irec,(S)). Proof. From Result 8.4 and Lemmas 8.9, 8.13, and 8.14, it follows that Ri (S) < 2 log 611 (S) + 2 log ay + ci (c) < 4.te,o (S) + 2 log it(S) c2(f) < 4k,o (S) + o(froo (S)). 8.4 Public Coin Randomized Interactive Communication In this section, I characterize worst-case public coin randomized interactive communi- cation. First, I prove that one round of communication is optimal for this model; the complexity corresponds to the amortized complexity and to the communication required when Py knows y in advance. Second, I derive an upper bound on the difference between the private coin and public coin randomized models, and use it to improve upon the upper bound for the one-way private coin complexity derived in the previous section. One should notice that the players can use the concatenation of the private strings T x and Ty as a public random string, thus the public coin randomized model is at least as powerful as the private coin randomized model, i.e., ^i i';c,pub(s)^(s). 8.4.1 One Round of Communication is Optimal Lemma 8.17. log 6-33 — log^ < frzub(s) < 1,pub 1 1 Flog(c3 1)1 + [log 1 €1 Proof. Lemma 8.13 can be applied with a public random generator for the lower bound. For the upper bound, recall that Px has an input x and Py has an input y defining the Chapter 8. Worst-Case Nonzero-Error Interactive Communication^124 ambiguity set a(y) = {x i , , x/} = {x (x, y) E S}. It is assumed that the elements of X are coded using [log IX11 bits. Px chooses k random subsets of the bits of his input x and, for each of these subsets, sends the parity of the bits to Py. Py then computes the k corresponding parities for each of the inputs x i E a(y). Each time a parity differs from the corresponding result sent by Px , Py deduces that x i x and discards it. When Py has performed all of the comparisons, it randomly chooses an input among those not discarded and concludes that it is Px 's input. The probability of not discarding a vertex x i is 1 if x i = x and 2otherwise. Let Z be a random variable representing the number of inputs not discarded after k iterations. Since there are a(y) — 1 inputs to discard, it follows that 2k̂ 1E(Z) = 1+ (a(y) — 1)^+ Fc (ay — 1). The probability that Py learns Px 's input correctly is E [1-], and since Z > 0, we use Jensen's inequality to get 1 ^1 Pr[Py answers x] > ^ ^E[Z ^1 +^— 1) . By letting k = ilog(E63 — 1)1 + [log yi , it follows that z is convex for Pr[success] 1 1 +^1^(y —1 )1) 2 flog(irg—i)1 + F log Y1 from which we conclude that fri ,pub(L) < k^flog(a3, — 1)1 + [log E f .1 Example fri ,pub (L ) plexity can 8.18. If we apply Lemma 8.17 to the league problem, we get [log yi E OM, which shows that one-way public coin randomized corn- be arbitrarily better than one-way deterministic complexity. Chapter 8. Worst-Case Nonzero-Error Interactive Communication^125 8.4.2 Difference Between the Private Coin and Public Coin Models The private coin randomized complexity cannot be much worse than the public coin randomized complexity: every public randomized protocol can be transformed into a private randomized protocol whose error probability is slightly larger, and which uses a few more communication bits. Theorem 8.19. For all (5 > 0 and for all e > 0 such that E + S < 1, i?‘1±5 (S) < f?Tb (S) + log log 151 + log 1 + 1. Proof The proof is inspired by a similar result for boolean functions shown by New- man [78]. Let P be a public coin randomized protocol for S whose error is bounded by c and requiring R ub (S) communication bits. We suppose that the random generator r follows a probability distribution tt. Let Z(x, y, r) be a random variable equal to 1 if the answer given by Py following the execution of P on input (x, y) is incorrect (different from x), and equal to 0 if correct. Since P solves S with error at most 6, it follows that rEp E [Z(x, y, r)] < c for every pair (x, y) E S. A new public protocol for S using fewer random bits is designed. Let t be a parameter to be fixed later, and let r 1 , r2 ,^rt be binary strings. The protocol P^defined7-1,7-2,•••rt is as follows: Px and Py randomly choose i between 1 and t using their public coin and run protocol P with the common random string We now show that strings r1 , r2,^, rt exist such that E[Z(x, y, ri )] < E + S for every pair (x, y) E S. We choose the t strings r1, r2,^, rt randomly following the probability distribution it, consider an arbitrary pair (x, y) E S and compute the probability that E[Z(x, y, ri )] > e + S (where i is uniformly distributed). The latter is equivalent to the probability that i E it=1 Z(x,y,ri ) > e + S. Since rE [Z(x, y, r)] < e, Chernoff bound yields [1 t ^ Pr - r,^(5Z(x, ri) - c >^< 2e -282t . 7- 1,-•,rt^t `—"'i=1 Chapter 8. Worst-Case Nonzero-Error Interactive Communication^126 By choosing t = Flo, isil it follows that ^ 2e -252t^2e-262 ^ 2e_21.gls1 2 2-21og elog = 21S1 -21°ge 1 151 when ISI > Thus, for a random choice of r i , , rt , the probability that there exists at least one pair (x, y) E S (there are 1S1 such pairs) such that E[Z(x, y, ri )] > E 8 is smaller than I SI 1 = 1. Consequently, there exists a choice of r i ,^, rt such that for every pair (x, y) E S, the error of protocolPr l is at most E S. The number of random bits used by P7- 1,7-2,•••,rt is [log ti , and in order to transform the public protocol into a private protocol, Px has to randomly choose i between 1 and t and to send it to Py. Moreover, from Lemma 8.17, an optimal one-round public coin randomized protocol exists for S ensuring that Pri,r2,•••,rt is also a one-way protocol. Hence, k±o(s) < .kei ,pub(s) + [log ti[ logs21 ,5111< ki,pub(s) + [log < ki,pub(s) + log log 1S1 + log + 1 . 0 Example 8.20. The previous theorem is applied to the league problem, for which 1S1 = (2 72 ) • (2' — 1), giving KH-5 (L) < k^ 1fub(S) + log log 1S1 + log + 1 < log log 12n(2n — 1)1 + c(e', 8), thus k (L) < log n + c(c). An optimal one-way private coin randomized protocol is thereby yielded for this problem; it improves on the bound given by Lemma 8.14, which is not optimal, and the result from Example 8.11, which does not limit interaction between the players. This example also proves that the bound given by Theorem 8.19 can be reached. 1. Chapter 8. Worst-Case Nonzero-Error Interactive Communication^127 Theorem 8.19 can also be used to tighten the upper bounds on the one-way private coin randomized complexity presented in Section 8.3. Corollary 8.21. For all S > 0 and for all c > 0 such that c + S < 1, 1i?r5 (S) 5_ Flog(ciy — 1)1 + log log 1ST + [log 1 — € 6 1 + log—(52 + 1. Corollary 8.22. < log log o- + log 633 + log log -63-,+c(c). Proof Recall that a is the number of hyperedges of the characteristic hypergraph Gs. Suppose that the hyperedges for all the y E Y are different; if this is not the case, the players agree on an equivalent support set S' C X' x Y', S' C S, such that there is a distinct hyperedge for each y E Y'. It follows that 15'1 < < a ay , and the result follows from Theorem 8.19 and Lemma 8.17.^ ^ If X = Y = {0, 1r, the difference between the private and public coin randomized models is at most an additive term of log n + 0(1) bits. Using Result 8.5, the same thing can be said about the difference between the deterministic and public coin ran- domized models. In fact, the similarity between Result 8.5 and Corollary 8.22 is striking, considering how different the models and the proofs are. Corollary 8.23. l(S)^3.ito (S)+ o(1 (S)). Proof It is well known that a x-colorable u-uniform hypergraph has at most ( ux) hyper- edges, thus S can be represented by a (not necessarily uniform) characteristic hypergraph ay having at most E^< xa3 hyperedges. The result follows from Corollary 8.22 and Lemmas 8.9 and 8.13. ^ Corollary 8.24. < 30„„(S)+ o(eco(S)). Chapter 8. Worst-Case Nonzero-Error Interactive Communication^128 8.5 Private Coin Randomized Amortized Interactive Communication: One Round of Communication is Optimal In this section, I show that one round of communication is optimal for the private coin randomized amortized model; the complexity is equal to the number of bits that need to be transmitted when Px knows y in advance. This is an improvement over the de- terministic amortized model, for which interaction is required in order to minimize the communication. Lemma 8.25. Al(s) = "M(s) = ... = Afoo (s) = log c-1,-;,. Proof. As mentioned in Section 8.2, / independent instances of a support set S C X x Y can be treated as a larger support set S I . Let Gs be the characteristic hypergraph of SI. It is not hard to show that the vertices of Gs are the elements of Xi, and that for each l-tuple (e i ,e2 ,..., e l ) of hyperedges of Gs, e l x e2 x • • • x el is a hyperedge of C. Clearly, the maximum ambiguity of Py for the support set SI is a-3 1 , and the number of different hyperedges of Os is o-/ . It follows from Lemma 8.13 that i?‘ 60o (S/ ) > / • log 63-, — log 1 1 e , and Corollary 8.22 gives k (SI ) < log log o- + / • log ay + log log ay + 2 log / + c(c). The result follows from the definitions of Al (S) and ko (S) given by (8.1). 0 Lemma 8.25 shows that when several instances are solved simultaneously instead of sequentially, there is no advantage to using a public coin over private coins since no interaction is required and the amount of communication is the same. Also, comparing Lemma 8.25 and Corollary 8.21 when 'SI E 0(2cn) (which includes the support sets S C {0, l}n x {0, i}n), the difference between the private coin randomized complexity and the private coin randomized amortized complexity is at most an additive term of log n+ 0(1) bits. This bound is tight for the league problem. The same discrepancy between the Chapter 8. Worst-Case Nonzero-Error Interactive Communication^129 deterministic complexity and the deterministic amortized complexity is implicit in the work of Naor, Orlitsky and Shor [77], using Results 8.5 and 8.7. 8.6 Distributional Interactive Communication: One Round of Communication is Optimal In this section, I study the worst-case distributional model, starting with yet another example using the league problem. Example 8.26. Suppose that the support set L follows a uniform distribution, and without loss of generality, that n, the length of the team names, is an even integer. Px sends two bits to Py: the parity of the first i bits of the champion team's name, followed by the parity of the last fri bits. Py computes the equivalent parities for the two teams playing for the championship and compares the results with the bits received from Px. The protocol will err for the pairs of finalist teams with identical parity bits, and there are ILI • (1 — -',t-, ) such pairs. Hence, n i uniform, 71 (L) < 2. It should be noted that for every interactive communication problem and for all 6 > 0, a probability distribution over the inputs exists for which the distributional complexity is 0, for example, if there is an input pair occurring with probability at least 1 — E. Con- sequently, it is interesting to consider probability distributions maximizing the number of bits that need to be transmitted; even in this case, communication can be more ef- ficient than when no error is allowed. This section established that the distributional complexity with the worst possible probability distribution over the inputs is equal to the public coin randomized complexity, which, again, is equivalent to the complexity when Px knows y in advance. This is hardly a surprise, since the equivalence between the public coin randomized and distributional models was first established by Yao [110] for computational complexity using von-Neumann's Minimax Theorem of game theory [109]. The equality is also verified for communication complexity of boolean functions (see, for example, [57]), and the proof can be applied without modification to partial domains S C X x Y. A simpler proof tailored for interactive communication problems in presented; it improves the original proof in two ways. First, a family of asymptotically "worst" probability distributions over the inputs is described. The distributions can be used with any interactive communication problem and probability of error. Second, an optimal protocol requiring a single round of communication is constructed. Chapter 8. Worst-Case Nonzero-Error Interactive Communication^130 Lemma 8.27. — log ay — log 1 —1 2E < max bi,;;;c(S) < max br (S) < ilog(633 1)1 + [ 1 log^• Proof To prove the lower bound, the following probability distribution //' over S is used: there is a y' E Y with I a(y')I = ay such that Pr[(x, y) = (xi , y')] = -2* for every x i E a(y) . The probability of all the other input pairs (x, y) E S can be any nonzero value. Suppose that bt;?(S) < log a---; - log *-2,, and let P be an optimal protocol for S. For any pair (x, y) E S, less than ay • (1 — 2E) distinct messages can be exchanged between Px and Py. It follows that when Py has input y', the protocol fails on more than 2E • ay input pairs (x i , y'), each pair occurring with probability *. Hence, P fails with probability greater than E over S, which is a contradiction, and log ay - log 1 2E < hic-,ilo'c(S) < mtax To prove the upper bound, the public coin protocol presented in Lemma 8.17 is used. Recall that by receiving the parity of k = — 1)1 + [log 1=--€, 1 random subsets of the bits of x from Px , Py can learn x with probability at least 1— E for every valid input pair. Since the protocol works for any probability distribution over the inputs, it succeeds with probability at least 1 — E when the probability distribution is taken over the inputs and the choice of the subsets. By a simple counting argument, it follows that k subsets of the bits of x exist for which the protocol succeeds for at least a fraction 1 — E of the inputs weighted by their probability distribution. Consequently, to derandomize the protocol presented in Lemma 8.17 for a fixed y and a fixed 6, Px and Py agree on k such subsets and can execute the rest of the protocol as before: Px sends the parity of the k subsets of x to Py, who compares the received bits with the k corresponding parities for each of the inputs x i E a(y). If more than one of the x i is not discarded, Py chooses the one whose input probability given y is the highest. The complexity of the protocol is max nr(S) < itl'Pub (S) < k < Flog(E63 — 1)1 + [log El̂ .E 0 The family of "worst" probability distributions presented in the previous proof is not unique. For example, when the ambiguity of every input y E Y is maximal, i.e., 14)1 = ay, it is not hard to show that the uniform distribution also maximizes the Chapter 8. Worst-Case Nonzero-Error Interactive Communication^131 required number of communication bits. The league problem is an example of such a support set: Py always picks the winner among two teams. 8.7 Equivalence of All the Models for Balanced and Symmetric Pairs All of the worst-case complexity models introduced in this chapter require asymptotically at least log ay bits of communication, and with the exception of the deterministic and private coin randomized models, asymptotically log ay bits suffice. For some support sets like the league problem, the difference between log ay and COO (S) is large, but for other problems, the amortized or nonzero-error models cannot be used to reduce the number of communication bits. Consider the following modification of the league problem: ei- ther Py knows the two teams playing for the league championship, or it doesn't know anything. It is obvious that in the worst case, whether it is the worst input or the worst probability distribution over the inputs, Px has to asymptotically send all of the bits of the champion team's name to Py and it can do so in a single round of communication. For the problems mentioned in the next lemma, all of the worst-case models presented in this chapter are asymptotically equivalent and inefficient: randomization, nonzero- error protocols, solving several instances simultaneously, knowing y in advance and even interaction cannot significantly reduce the communication between Px and Py. Lemma 8.28. If there is ay E Y with la(y)1 = ay = PG or if S is a Cartesian-product support set, then Cl (S) = Flog a-31 . Proof. Result 8.3 gives a lower bound of [log Fyl. If Eij, = IX I , then Px has to describe x completely; it can do so using Flog 1X11 = Flog ay bits. If S is a Cartesian-product support set, then X' C X and Y' C Y exist such that S = X' x Y'. Again, a3 = PC1 and Px can describe x with [log IX'11 = Flog a-3,1 bits of communication. 111 It is also possible to prove that all of the worst-case complexity models are equivalent for balanced and symmetric pairs. Recall that a support set S is balanced when a—x = ay and symmetric when (x, y) E S if and only if (y, x) E S. Theorem 8.29. Let S be a balanced support set. The following models are all equivalent: (i) deterministic, (ii) amortized deterministic, (iii) deterministic when Px knows Py '5 Chapter 8. Worst-Case Nonzero-Error Interactive Communication^132 input, (iv) private coin randomized, (v) public coin randomized, (vi) distributional, (vii) private coin randomized amortized. More precisely, 0*(s) < 63(S) < O*(S) + 0(0*(S)); (8.4) O* (s) - 1 < A3 (S) < O*(S); (8.5) 0*(S) -^< 1(S) < 0* (S) + 0(0* (S)); (8.6) O*(s) - 0(1) < < 0*(S) + 0(1); (8.7) O*(s) - 0(1) < br(S) < C‘* (S) + 0(1); (8.8) O*(s) -^< Ai (S) < O*(S). (8.9) Proof. Inequality (8.4) can be deduced from Results 8.3 and 8.6; Inequality (8.5) from Results 8.3 and 8.7; Inequality (8.6) from Results 8.3 and 8.6, Inequality (8.2) and Lemma 8.13; Inequality (8.7) from Result 8.3 and Lemma 8.17; Inequality (8.8) from Result 8.3 and Lemma 8.27; Inequality (8.9) from Result 8.3 and Lemma 8.25. ^ Unfortunately, Theorem 8.29 means that nonzero-error algorithms cannot signifi- cantly reduce the communication for all of the practical applications mentioned in the introduction. This is a somewhat "negative" result, but only because the best determin- istic protocols are already very efficient. However, nonzero-error models, even the private coin randomized model, allow efficient one-way protocols. For most practical applications of symmetric pairs, the ambiguity of the players increases at least polynomially with the size of the inputs; in this case, a private coin randomized protocol exists that uses a single round of communication and whose communication complexity is nearly optimal. If the ambiguity of the players increases superpolynomially with the size of the inputs, then the best one-way private coin randomized protocol is optimal. Lemma 8.30. Let S be a support set with X = {0,1}n, and let k > 1. If r:11; E e(nk ), then 11(S) < (1+ .k-) k(S) + o(froo (S)). If k E w(nk) for all k > 1, then < k(S) + o(Ito (S)). Chapter 8. Worst-Case Nonzero-Error Interactive Communication^133 Proof. We prove the case 63 E 0(nk ). Corollary 8.21 and the assumption give i?‘1(S) < log 63, + log log I + c(e) < log 63; + log log(IX I • a--y- ) + 0(1) < log 63; + log log(2n • 635) + 0(1) < log 63; + log 71, k + log log 635 + 0(1) < (1 + -1 ) • log 63; + log log 63; + 0(1), and the result follows from Lemma 8.13. The case ay E cu(nk ) is proved similarly.^^ Example 8.31. A large database D stored on a PC has to be synchronized with an updated version D' of the database stored on a PDA. The problem can be viewed as a set reconciliation problem [74], and for this example it is assumed that D and D' are sets of integers in [1, 101, that D' has to be conveyed to the PC, and that D and D' differ in at most 1000 entries, i.e., ID \ D'I + ID' \ DI < 1000. Translated in the interactive communication framework, D and D' can be expressed as binary strings x and y such that x i = 1 (yi = 1) if and only if i E D (i E D'). The correlation between D and D' gives an upper bound on the Hamming distance between x and y, thus the support set moo is symmetric and a—x = ay =^(1000i000). i=0 From Result 8.6, we know that a 3-round deterministic algorithm requiring 11452 bits of communication exists. Nonzero-error algorithms cannot do much better: from Lemma 8.13, even with a public random string, at least 11401 bits need to be exchanged in order for Py to learn x with error at most €, for 0 < e < 2 . From Lemma 8.17, we know that there is a one-way public coin randomized protocol transmitting 11452 bits with a 2 -5° probability of error, and from Corollary 8.21 that there is a one-way private coin randomized protocol transmitting 11576 bits with a 2 -5° probability of error . This chapter assumes that Px and Py have unbounded memory and computing power. For most of the practical problems modeled by symmetric support sets, extensive research was conducted to design algorithms with reasonable trade-offs between the number of rounds, communication complexity, computational complexity, and space complexity. Of course, the requirements depend on the application and the size of the maximum ambiguity of the players. Chapter 8. Worst-Case Nonzero-Error Interactive Communication^134 8.8 Summary In this chapter, I have studied worst-case nonzero-error interactive communication. I have shown that if the players are allowed to use public coins, to answer correctly on a fraction of the inputs or to solve a large number of instances simultaneously, then interaction is not necessary and in some cases the communication required can be significantly reduced. I have also proved that one round of communication is almost optimal if the players are allowed to use private coins, and also that all the models are equivalent for the practical data reconciliation problems that can be modeled by symmetric support sets. Chapter 9 Conclusions, Open Problems, and Future Work In this thesis, various problems related to communication over channels with symbol synchronization errors were presented. First, I have studied the capacity of the binary deletion channel and considered the problem of computing the number of subsequences when symbols are deleted from a string. In the second part of the thesis, I have presented new classes of error-correcting codes robust against synchronization errors. Finally, I have studied two problems using a synchronization approach, namely opportunistic spectrum access in cognitive radio systems and worst-case nonzero-error interactive communica- tion. In this chapter, I present the main interesting open problems related to the topics presented in my thesis as well as suggestions for further research. 9.1 Capacity Bounds for Channels with Synchronization Errors It would be interesting to extend the techniques presented in Chapter 3 to derive numer- ical bounds for the capacity of more general channels admitting insertions and deletions of symbols, and for which only very fragmentary results have been published. Channels allowing deletions and duplications are especially of interest due to their numerous prac- tical applications, and this certainly is an extremely challenging problem. One of the main difficulties to overcome is that an arbitrarily large number of bits can be inserted between two consecutive input bits, resulting in output sequences that can be arbitrarily longer than the input sequences. 135 Chapter 9. Conclusions, Open Problems, and Future Work^136 It would also be interesting to find closed-form expressions and an efficient algorithm to calculate the number of sequences that can be obtained when inserting and deleting symbols from a string by modifying the techniques presented in Chapter 4. 9.2 Nonlinear Trellis Codes for Channels with Synchronization Errors The study of nonlinear trellis synchronization-correcting codes presented in Chapter 6 is a very rich problem. This being said, there are several interesting questions to answer and many difficulties to overcome in order to obtain codes that can be implemented in practical systems. 9.2.1 Code Construction We now have promising graph structures for codes against synchronization errors, as well as efficient decoding algorithms. One of the main remaining difficulties is to allocate the output symbols to the edges in the graphs to obtain codes with good performance. Although it is possible to survey the entire space of codes for simple encoder graphs, it is computationally prohibitive to do so for larger structures. Work on small rectangular graphs showed that there is a significant difference between the average performance over all the symbol configurations and the performance of the best ones, and as such approaches based on randomly assigning symbols to graph edges are bound to fail. Two techniques could be applied. The first one is to find good symbol allocations for small trellises and to assemble them to construct larger ones. Preliminary work indicates that this approach seems a good way to optimize the trellis encoders as the number of columns increases. For trellises with large columns metaheuristic optimization techniques like hybrid evolutionary algorithms [36] could be implemented. Another question to investigate further is the potential propagation of the synchro- nization errors for an arbitrarily long time. For instance, the code presented in Figure 6.12 contains a cycle of four Os, and as a result synchronization is lost if four bits are deleted in a large run of zeros, no matter how far apart the deletions are. More precisely, this ripple effect can potentially occur when the trellis encoder can generate several shifted versions of a substring. It would be interesting to study if codes robust against rip- ple error propagation have a better performance than codes without this property over channels affected by realistic time noise. Chapter 9. Conclusions, Open Problems, and Future Work^137 9.2.2 Timing Errors Synchronization errors come in several flavors. For instance, one can imagine a channel where symbols can be inserted and substituted, but not deleted, or a channel with a lot of deletion errors but very few random symbol insertions. Other examples are channels affected by timing errors and intersymbol interference, like in magnetic recording, where peakshift-errors are common. The techniques presented in this chapter can be extended to correct most types of channels affected by synchronization errors, but one particularly stands out. On the majority of the applications, synchronization errors are caused by varying sampling rates of the system devices, resulting in deletion and duplication errors (no random insertions). Consequently, the most practical problem of this chapter is to construct efficient codes for such channels. Two difficulties need to be overcome. First, the Viterbi decoding algorithm has to be modified, this time to correct deletion and du- plication errors. I am confident that this can be done using a blending of the algorithms for duplication and synchronization errors, but it remains to be seen if the computa- tional complexity of resulting algorithm can be reduced without significantly affecting its performance. Second, efficient bit configurations need to be found. Again, preliminary work makes me think that it will not be a problem, and that efficient configurations for deletion errors will also offer a good performance when used for channels affected by deletion and duplication errors. 9.2.3 Soft-Information Algorithms Another very interesting question is how soft-information can be used to improve the performance of the codes. There is no research on the subject, and to the best of my knowledge this is due to the fact that besides the codes of Davey and Mackay [20], the codes presented in this chapter are the only synchronization-correcting codes that can easily take advantage of soft-information from the channels. This is one more point in their favor, since this could probably lead to interesting coding gains. A simple yet insightful scenario worth studying is to use one-dimensional amplitude modulation, like 2-AM, with inaccurate clocks between the receiver and transmitter. A slower receiver clock creates deletion errors, a faster receiver clock creates duplication er- rors, and an irregular clock a mixture of deletion and duplication errors. This could allow to compare the bit error-rates with the same codes used with hard decisions. Another application that could take advantage of soft-information is the opportunistic spectrum access problem presented in Chapter 7. Chapter 9. Conclusions, Open Problems, and Future Work^138 Table 9.1: Multiplicity of the subsequence r = 0010 in the string v = 010. E Vi = 0 V2 = 1 V3 = 0 E 1 1 1 1 ri = 0 0 1 1 2 r2 = 1 0 0 1 1 r3 = 0 0 0 0 1 r4 = 0 0 0 0 0 Table 9.2: Multiplicity of the subsequence r = 0010 in the string v = 011. E V1 = 0 V2 = 1 V3 = 1 E 1 1 1 1 ri = 0 0 1 1 1 r2 = 1 0 0 1 2 r3 = 0 0 0 0 0 r4 = 0 0 0 0 0 9.2.4 Complexity Issues I have shown in Section 6.3 that the computational complexity of the Viterbi decoding algorithm sharply increases when insertions and / or deletions and / or substitutions can occur together. However, all kinds of savings are possible. For instance, the algorithm does not always need to calculate all the k-prefix synchronization distances, because some are trivial or useless. The complexity can also be significantly reduced by bounding the maximum synchronization offset the algorithm can tolerate. It would be useful to study how these tradeoffs and others affect the performance of the codes, especially for practical implementation purposes One of the disadvantages of trellis encoders using rectangular graphs is that the entire state diagram has to be stored in memory. This is in contrast of convolutional codes, which can be defined with linear sequential circuits. An interesting question is to investi- gate whether it is possible to compress the symbol configurations using simple functions without sacrificing the performance of the codes. The fact that efficient synchroniza- tion codes are intrinsically nonlinear and intrinsically as acyclic as possible makes me conjecture that this is a challenging task. 9.2.5 Maximum-Likelihood Decoding It is not clear how far the final survivor paths for all the variations of the Viterbi algorithm in this chapter are from the maximum-likelihood paths, nor how much additional memory Chapter 9. Conclusions, Open Problems, and Future Work^139 is required to obtain maximum-likelihood decoding algorithms. For deletion errors, all the information required to find the maximum-likelihood codewords can be obtained using a generalized version of the algorithm to compute the multiplicity of a subsequence in a string presented in Section 4.5 (similar algorithms can easily be derived for insertion, duplication and synchronization errors), more precisely from the multiplicity of the first k symbols of r in the potential codewords, for 0 < k < Ir 1. The problem is that although the generalized multiplicity can be calculated recursively on the symbols of the trellis paths, it is not possible to define an order relation for it. Consider again the example from Figure 6.17 presented in Section 6.7. Tables 9.1 and 9.2 contain the multiplicity of r in the two branches coming from the all-zero state at time t = 0. Comparing the last column of both tables, one can see that sometimes the largest value comes from the branch 010, and other times from the branch 011. Hence, if the algorithm wants to find the maximum-likelihood codeword, it can only discard a path if all its last column entries are smaller than or equal to the corresponding entries in one of the other survivor paths at the same state in the decoding process. An interesting problem is therefore to examine, on average and in the worst-case, the number of survivor paths that need to be kept in order to achieve maximum-likelihood decoding. Another problem of interest is to devise an order relation for the generalized multiplicity of the received sequence in the codewords allowing quasi-maximum-likelihood decoding. 9.3 Opportunistic Spectrum Access for Cognitive Radio Systems The framework for the spectrum access problem in cognitive radio systems presented in Chapter 7 can be used for studying more complex channel models as well as implementing other types of error-correcting codes. For instance, it would be interesting to extend these channel models to include AWGN channels and more detailed modeling of the primary user detection. The nature of the error-correcting decoding schemes indicates that an accurate esti- mation of the design parameters of the cognitive radios, especially the accuracy of their primary user detection, will be valuable to the decoder in effectively exploiting the ca- pacity of the channels. It would certainly be useful to understand through simulations how much this knowledge (or lack thereof) affects decoding. Spectrum sensing using energy detection is well suited for the use of soft information; Chapter 9. Conclusions, Open Problems, and Future Work^140 it would be interesting to study how much it can improve the overall performance of the codes, especially for the synchronization approach. It would also be interesting to study possible tradeoffs between the erasure and the synchronization approaches, i.e., coding techniques that can use the side-information provided by the potential positions of the deletion errors to increase transmission rates while remaining reasonably robust against variation in primary user activity. 9.4 Worst-Case Private Coin Randomized Interactive Communication I have not been able to completely characterize the private coin randomized model pre- sented in Chapter 8, and in this section I discuss two open problems. 9.4.1 One-Way Private Coin Randomized Complexity I have proved that one round of communication is almost optimal for private coin ran- domized interactive communication, but is one round optimal? If not, is there a k such that an optimal k-round protocol always exists, i.e., a k such that fivs)< froo (s)+0(froo (s))? The lower bounds for the private coin randomized complexity presented in this chapter do not restrict the number of rounds between Px and Py . Furthermore, it is not clear if existing techniques for proving lower bounds for other models of computation like communication complexity of nonboolean functions can be used to derive stronger lower bounds in the interactive communication setting. As for upper bounds, although it is not obvious how randomized protocols can use interaction to reduce the number of bits that need to be transmitted, the best upper bound, Theorem 8.19, is not always tight. 9.4.2 Deterministic Model Versus Private Coin Randomized Model The private coin randomized model allows nearly optimal one-way protocols, but it does not seem that it can be more efficient than determinism when interaction is allowed. In Chapter 9. Conclusions, Open Problems, and Future Work^141 fact, I conjecture that the worst-case deterministic and worst-case private coin random- ized models are equivalent, i.e., Roo(S) , Coo (s). The first thing to remark is that the two lower bounds for the private coin randomized complexity presented in Section 8.3 are asymptotically equal to Results 8.3 and 8.4 for the deterministic model. I have shown in the same section that the models are equiv- alent when Py 's maximum ambiguity is constant, which includes all of the maximally unbalanced pairs such as the league problem, i.e, pairs with ay E 0(1) and a—x E C2 (1X1)• In Section 8.7, I have proved that the models are equivalent for balanced and symmetric support sets. Thus, in order to resolve the conjecture, one needs to study problems for which the private coin randomized complexity is not tightly bounded by Lemmas 8.9 and 8.13, i.e., problems that are neither balanced nor too unbalanced. Bibliography [1] Sachin Agarwal, David Starobinski, and Ari Trachtenberg. On the scalability of data synchronization protocols for PDAs and mobile devices. IEEE Network Mag- azine, July/Aug. 2002. [2] Rudolf Ahlswede, Ning Cai, and Zhen Zhang. On interactive communication. IEEE Transactions on Information Theory, 43(1):22-37, 1997. [3] Noga Alon and Alon Orlitsky. Repeated communication and Ramsey graphs. IEEE Transactions on Information Theory, 41(5):1276-1289, 1995. [4] Noga Alon and Alon Orlitsky. Source coding and graph entropies. IEEE Transac- tions on Information Theory, 42(5):1329-1339, 1996. [5] Gaurav Bansal, Jahangir Hossain, and Vijay K. Bhargava. Adaptive power load- ing for OFDM-based cognitive radio systems. In Proc. 2007 IEEE International Conference on Communications (ICC), pages 5137-5142, June 2007. [6] Gaurav Bansal, Md. Jahangir Hossain, Praveen Kaligineedi, Hugues Mercier, Chris Nicola, Umesh Phuyal, Md. Mamunur Rashid, Kapila C. Wavegedara, Ziaul Hasan, Majid Khabbazian, and Vijay K. Bhargava. Some research issues in cognitive radio networks. In Proceedings of the 8th IEEE Africon Conference in Africa (AFRICON), Windhoek, Namibia, 26-28 September 2007. Invited Paper. [7] Leonard D. Baumert, Robert J. McEliece, and Henk C. A. van Tilborg. Symbol synchronization in convolutionally coded systems. IEEE Transactions on Informa- tion Theory, 25(3):362-365, 1979. [8] P. A. H. Bours. On the construction of perfect deletion-correcting codes using design theory. Designs, Codes and Cryptography, 6:5-20, 1995. 142 Bibliography^ 143 [9] Patrick A. H. Bours. Construction of fixed-length insertion/deletion correcting runlength-limited codes. IEEE Transactions on Information Theory, 40(6):1841- 1856, 1994. [10] Gilles Brassard and Louis Salvail. Secret-key reconciliation by public discussion. In Advances in Cryptology: Proceedings of Eurocrypt '93, volume 765 of Lectures Notes in Computer Science, pages 410-423. Springer-Verlag, 1994. [11] L. Calabi. On the computation of Levenshtein's distances. Report TM-9-0030, Parke Mathematical Laboratories, Inc., Carlisle, MA, 1967. [12] L. Calabi and W. E. Hartnett. Some general results of coding theory with applica- tions to the study of codes for the correction of synchronization errors. Information and Control, 15(3):235-249, Sep. 1969. [13] Willem A. Clarke and Hendrik C. Ferreira. A new linear, quasi-cyclic multiple insertion/deletion correcting code. In Proc. 2003 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), volume 2, pages 899-902, 2003. [14] Willem A. Clarke and Hendrik C. Ferreira. Coding for insertion/deletion error propagation effects associated with fixed length decoding windows. In Proc. 2004 IEEE International Symposium on Information Theory (ISIT), page 228, 2004. [15] Graham Cormode, Mike Paterson, Siileyman Sahinalp, and Uzi Vishkin. Communi- cation complexity of document exchange. In Proc. eleventh ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 197-206, 2000. [16] Thomas M. Cover and Mung Chiang. Duality between channel capacity and rate distortion with two-sided state information. IEEE Transactions on Information Theory, 48(6):1629-1638, June 2002. [17] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, second edition, 2006. [18] F. H. C. Crick, J. S. Griffith, and L. E. Orgel. Codes without commas. In Proc. National Academy of Sciences in the United States of America, volume 43, pages 416-421, 1957. Bibliography^ 144 [19] L. J. Cummings. Aspects of synchronizable coding. Journal of Combinatorial Mathematics and Combinatorial Computing, 1:67-84, 1987. [20] Matthew C. Davey and David J. C. MacKay. Reliable communication over channels with insertions, deletions and substitutions. IEEE Transactions on Information Theory, 47(2):687-698, 2001. [21] Suhas Diggavi and Matthias Grossglauser. On information transmission over a finite buffer channel. IEEE Transactions on Information Theory, 52(3):1226-1237, Mar. 2006. [22] Suhas N. Diggavi and Matthias Grossglauser. On transmission over deletion chan- nels. In Proc. 39th Annual Allerton Conference on Communication, Control, and Computing, 2001. [23] Suhas N. Diggavi and Matthias Grossglauser. Bounds on the capacity of deletion channels. In Proc. 2002 IEEE International Symposium on Information Theory (ISIT), page 421, 2002. [24] R. L. Dobrushin. Shannon's theorems for channels with synchronization errors. Problems of Information Transmission, 3(4):11-26, 1967. Translated from Problemy Peredachi Informatsii, 3(4):18-36, 1967 (in Russian). [25] L. Dolecek and V. Anantharam. A synchronization technique for array-based LDPC codes in channels with varying sampling rate. In Proc. 2006 IEEE International Symposium on Information Theory (ISIT), pages 2057-2061, 2006. [26] A. S. Dolgopolov. Capacity bounds for a channel with synchronization errors. Prob- lems of Information Transmission, 26(2):111-120, 1990. Translated from Problemy Peredachi Informatsii, 26(2):27-37, 1990 (in Russian). [27] Eleni Drinea and Michael Mitzenmacher. A simple lower bound for the capacity of the deletion channel. Available at http://www.eecs.harvard.eduhnichaelm/ postscripts/deletiontemp.ps. [28] Eleni Drinea and Michael Mitzenmacher. Improved lower bounds for i.i.d. deletion channels. In Proc. 42nd Annual Allerton Conference on Communication, Control, and Computing, 2004. Bibliography^ 145 [29] Eleni Drinea and Michael Mitzenmacher. On lower bounds for the capacity of deletion channels. In Proc. 2004 IEEE International Symposium on Information Theory (ISIT), page 227, 2004. [30] Richard Durbin, Sean Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998. [31] Pavol Duris, Zvi Galil, and Georg Schnitger. Lower bounds on communication com- plexity. In Proc. 16th annual ACM Symposium on Theory of Computing (STOC), pages 81-91, 1984. [32] Peter Elias. Coding for noisy channels. In Proc. IRE Convention Record, volume 43, pages 37-46, 1955. [33] Tomas Feder, Eyal Kushilevitz, Moni Naor, and Noam Nisan. Amortized commu- nication complexity. SIAM Journal on Computing, 24(4):736-750, 1995. [34] H. C. Ferreira, W. A. Clarke, A. S. J. Helberg, K. A. S. Abdel-Gaffar, and A. J. Han Vinck. Insertion/deletion correction with spectral nulls. IEEE Transactions on Information Theory, 43(2):722-732, 1997. [35] R. Fuji-Hara, Y. Miao, J. Wang, and J. Yin. Directed B(K , 1; v) with K = {4, 5} and {4, 6} related to deletion/insertion-correcting codes. Journal of Combinatorial Designs, 9(2):147-156, 2001. [36] Philippe Galinier and Jin-Kao Hao. Hybrid evolutionary algorithms for graph coloring. Journal of Combinatorial Optimization, 3(4):379-397, 1999. [37] Robert G. Gallager. Sequential decoding for binary channels with noise and synchronization errors. Lincoln Lab Group Report 2502, 1961. Available at http://web.mit.edu/gallager/www/pages/pubs.html. [38] E. N. Gilbert. Synchronization of binary messages. IRE Transactions on Informa- tion Theory, 6(4):470-477, 1960. [39] Solomon W. Golomb, J. R. Davey, I. S. Reed, H. L. Van Trees, and Jack J. Stuffier. Synchronization. IEEE Transactions on Communications, pages 481-491, 1963. Bibliography^ 146 [40] Solomon W. Golomb and Basil Gordon. Codes with bounded synchronization delay. Information and Control, 8:355-372, 1965. [41] Solomon W. Golomb, Basil Gordon, and L. R. Welch. Comma-free codes. Canadian Journal of Mathematics, 10(2):202-209, 1958. [42] Venkatesan Guruswami. List decoding with side information. In Proc. 18th IEEE Annual Conference on Computational Complexity (CCC), pages 300-309, 2003. [43] Godfrey H. Hardy and Edward M. Wright. An introduction to the theory of num- bers. Oxford University Press, 1979. [44] Johan Hastad. Clique is hard to approximate within n i-E. In Proc. 37th IEEE Symposium on Foundations of Computer Science (FOGS), pages 627-636, 1996. [45] Simon Haykin. Cognitive radio: Brain-empowered wireless communications. IEEE Journal on Selected Areas in Communications, 23(2):201-220, Feb. 2005. [46] Albertus S. J. Helberg and Hendrik C. Ferreira. On multiple insertion/deletion correcting codes. IEEE Transactions on Information Theory, 48(1):305-308, 2002. [47] Daniel S. Hirschberg and Mireille Regnier. Tight bounds on the number of string subsequences. Journal of Discrete Algorithms, 1(1):123-132, 2000. Available at http://www.ics.uci.edui-dan/pubs/JDA.ps. [48] Kees A. Schouhamer Immink. Coding Techniques for Digital Recorders. Prentice Hall Professional Technical Reference, 1991. [49] Syed Ali Jafar and Sudhir Srinivasa. Capacity limits of cognitive radio with dis- tributed and dynamic spectral activity. IEEE Journal on Selected Areas in Com- munications, 25(3):529-537, Apr. 2007. [50] Rolf Johannesson and Kamil Sh. Zigangirov. Fundamentals of Convolutional Cod- ing. IEEE Press, 1999. [51] Mauricio Karchmer, Ran Raz, and Avi Wigderson. Super-logarithmic depth lower bounds via the direct sum in communication complexity. Computational Complex- ity, 5(3/4):191-204, 1995. Bibliography^ 147 [52] Aleksandar Kavcic and Ravi Motwani. Insertion/deletion channels: Reduced-state lower bounds on channel capacities. In Proc. 2004 IEEE International Symposium on Information Theory (ISIT), page 229, 2004. [53] J. E. Kelley. The cutting-plane method for solving convex programs. SIAM Journal on Applied Mathematics, 8:703-712, 1960. [54] Majid Khabbazian, Hugues Mercier, and Vijay K. Bhargava. Severity analysis and countermeasures for the wormhole attack in wireless ad hoc networks. Submitted, IEEE Transactions on Wireless Communications, May 2007, submission number Paper-TW-May-07-0536. Revised, November 2007. Accepted for publication, Jan- uary 2008. [55] Majid Khabbazian, Hugues Mercier, and Vijay K. Bhargava. Wormhole attack in wireless ad hoc networks: Analysis and countermeasure. In Proceedings of the 2006 IEEE Global Telecommunications Conference (GLOBECOM), San Francisco, USA, 27 November - 1 December 2006. [56] J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. Annals of Mathematical Statistics, 23:462-466, 1952. [57] Eyan Kushilevitz and Noam Nisan. Communication Complexity. Cambridge Uni- versity Press, 1997. [58] C. Y. Lee. Some properties of nonbinary error-correcting codes. IRE Transactions on Information Theory, 4(2):77-82, 1958. [59] Vladimir I. Levenshtein. Binary codes capable of correcting spurious insertions and deletions of ones. Problems of Information Transmission, 1(1):8-17, 1965. Translated from Problemy Peredachi Informatsii, 1(1):12-25, 1965 (in Russian). [60] Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics-Doklady, 10(8):707-710, Feb. 1966. Translated from Doklady Akademii Nauk SSSR, 163(4):845-848, 1965 (in Russian). [61] Vladimir I. Levenshtein. On perfect codes in deletion and insertion metric. Discrete Mathematics and Applications, 2(3):241-258, 1992. Translated from Diskretnaya Matematika, 3(1):3-20, 1991 (in Russian). Bibliography^ 148 [62] Vladimir I. Levenshtein. Efficient reconstruction of sequences. IEEE Transactions on Information Theory, 47(1):2-22, Jan. 2001. [63] Shu Lin and Daniel J. Costello. Error Control Coding. Paerson Prentice Hall, second edition, 2004. [64] Zhenming Liu and Michael Mitzenmacher. Codes for deletion and insertion channels with segmented errors. In Proc. 2007 IEEE International Symposium on Informa- tion Theory (ISIT), 2007. To appear. [65] A. Mahmoodi. Existence of perfect 3-deletion-correcting codes. Designs, Codes and Cryptography, 14:81-87, 1998. [66] Rudolf Mathon and Tran Van Trung. Directed t-packings and directed t-Steiner systems. Designs, Codes and Cryptography, 18:187-198, 1999. [67] Hugues Mercier. Reconciliation et complexite de la communication de donnees correlees. Memoire de maitrise, Departement d'informatique et de recherche operationnelle, Universite de Montreal, Canada, 2003. Reconciliation and Com- munication Complexity of Correlated Data, Master's Thesis, in French. [68] Hugues Mercier and Vijay K. Bhargava. Nonlinear trellis codes for channels with synchronization errors. In Preparation. To be submitted to the IEEE Transactions on Information Theory. [69] Hugues Mercier and Vijay K. Bhargava. A note on variable-length multiple- synchronization-correcting codes. In Preparation. To be submitted to the IEEE Transactions on Information Theory. [70] Hugues Mercier and Vijay K. Bhargava. Improved bounds for the capacity of the binary deletion channel. In Proceedings of the 2006 International Symposium on Information Theory and its Applications (ISITA), pages 423-427, Seoul, Korea, 29 October - 1 November 2006. [71] Hugues Mercier, Majid Khabbazian, and Vijay K. Bhargava. On the number of subsequences when deleting symbols from a string. IEEE Transactions on Informa- tion Theory. Submitted, October 2006, submission number 6-751. Revised, March 2007. Accepted for publication, February 2008. Bibliography^ 149 [72] Hugues Mercier, Pierre McKenzie, and Stefan Wolf. Worst-case nonzero-error in- teractive communication. IEEE Transactions on Information Theory. Submitted, May 2006, submission number 6-302. Revised, March 2007. Accepted for publica- tion, October 2007. [73] Hugues Mercier, Pierre McKenzie, and Stefan Wolf. Worst-case randomized inter- active communication. In Proceedings of the 2005 IEEE International Symposium on Information Theory (ISIT), pages 430-434, Adelaide, Australia, 4-9 September 2005. [74] Yaron Minsky, Ari Trachtenberg, and Richard Zippel. Set reconciliation with nearly optimal communication complexity. IEEE Transactions on Information Theory, 49(9):2213-2218, 2003. [75] Michael Mitzenmacher. Capacity bounds for sticky channels. 2007. Available at http: //www.eecs.harvard.edu/—michaelm/postscripts/stickytemp.pdf . [76] Takuo Mori and Hideki Imai. Viterbi decoding considering synchronization er- rors. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E79-A(9):1324-1329, 1996. [77] Moni Naor, Alon Orlitsky, and Peter Shor. Three results on interactive communi- cation. IEEE Transactions on Information Theory, 39(5):1608-1615, 1993. [78] Ilan Newman. Private vs. common random bits in communication complexity. Information Processing Letters, pages 67-71, 1991. [79] Chris Nicola, Hugues Mercier, and Vijay K. Bhargava. Error-correcting codes for dynamic spectrum allocation in cognitive radio systems. In Proceedings of the 2007 International Symposium on Signals, Systems & Electronics (URSI ISSSE), pages 247-250, Montreal, Canada, 30 July - 2 August 2007. Invited Paper. [80] Alon Orlitsky. Worst-case interactive communication I: Two messages are almost optimal. IEEE Transactions on Information Theory, 36(5):1111-1126, 1990. [81] Alon Orlitsky. Worst-case interactive communication II: Two messages are not optimal. IEEE Transactions on Information Theory, 37(4):995-1005, 1991. [82] Alon Orlitsky. Average-case interactive communication. IEEE Transactions on Information Theory, 38:1534-1547, 1992. Bibliography^ 150 [83] Alon Orlitsky. Interactive communication of balanced distributions and of corre- lated files. SIAM Journal on Discrete Mathematics, 6(4):548-564, 1993. [84] Christos H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994. [85] Edwin P. D. Pednault. Synthesizing plans that contain actions with context- dependent effects. Computational Intelligence, 4(4):356-372,1988. [86] Edward A. Ratzer. Error-correction on non-standard communication channels. PhD thesis, University of Cambridge, 2003. [87] Ron M. Roth and Paul H. Siegel. Lee-metric BCH codes and their application to constrained and partial-response channels. IEEE Transactions on Information Theory, 40(4):1083-1096, 1994. [88] Reihaneh Safavi-Naini and Yejing Wang. Traitor tracing for shortened and cor- rupted fingerprint. In Proc. 2002 ACM Workshop on Digital Rights Management, volume 2696 of Lectures Notes in Computer Science, pages 81-100. Springer-Verlag, 2003. [89] Leonard J. Schulman and David Zuckerman. Asymptotically good codes correct- ing insertions, deletions, and transpositions. IEEE Transactions on Information Theory, 45(7):2552-2557, 1999. [90] F. F. Sellers. Bit loss and gain correction code. IRE Transactions on Information Theory, 8(1):35-38, 1962. [91] Ubolthip Sethakaset and T. Aaron Gulliver. Differential amplitude pulse-position modulation for indoor wireless optical channels. In Proc. 2004 IEEE Global Telecommunications Conference (GLOBECOM), 2004. [92] Nabil Shalaby, Jianmin Wang, and Jianxing Yin. Existence of perfect 4-deletion- correcting codes with length six. Designs, Codes and Cryptography, 27:145-156, 2002. [93] Claude E. Shannon. Channels with side information at the transmitter. IBM Journal of Research and Development, 2(4):289-293, Oct. 1958. [94] B. S. Sharif, J. Neasham, O. R. Hinton, and A. E. Adams. A computationally efficient doppler compensation system for underwater acoustic communications. IEEE Journal Of Oceanic Enginering, 25(1):52-61, 2000. Bibliography^ 151 [95] Amin Shokrollahi. Raptor codes. 2003 IEEE International Symposium on Infor- mation Theory (ISIT), 2003. Recent Results Session, available at http://www. inference.phy.cam.ac.uk/mackay/dfountain/RaptorPaper.pdf. [96] David Slepian and Jack K. Wolf. Noiseless coding of correlated information sources. IEEE Transactions on Information Theory, 19(4):471-480, 1973. [97] Neil J. A. Sloane. The on-line encyclopedia of integer sequences. Published elec- tronically at www.research.att.com/njas/sequences/, 1996-2007. [98] Neil J. A. Sloane. On single-deletion-correcting codes. In K. T. Arasu and A. Seress, editors, Codes and Designs (Ray-Chaudhuri Festschrift), volume 10 of Ohio State University Mathematical Research Institute Publications, pages 273-291. de Gruyter, 2002. [99] Neil J. A. Sloane. Challenge problems: Independent sets in graphs. Published electronically at http://www.research.att.comhjas/doc/graphs.html, 2005. [100] David Starobinski, Ari Trachtenberg, and Sachin Agarwal. Efficient PDA synchro- nization. IEEE Transactions on Mobile Computing, 2(1):40-51, 2003. [101] Theo G. Swart and Hendrik C. Ferreira. Insertion/deletion correcting coding schemes based on convolutional coding. Electronics Letters, 38(16):871-873, 2002. [102] Theo G. Swart and Hendrik C. Ferreira. A note on double insertion/deletion cor- recting codes. IEEE Transactions on Information Theory, 49(1):269-273, Jan. 2003. [103] Theo G. Swart, Hendrik C. Ferreira, and Marco P. F. dos Santos. Using parallel- interconnected Viterbi decoders to correct insertion/deletion errors. In Proc. 7th IEEE Africon Conference in Africa (AFRICON), pages 341-344, 2004. [104] Jeffrey D. Ullman. On the capabilities of codes to correct synchronization errors. IEEE Transactions on Information Theory, 13(1):95-105, 1967. [105] R. R. Varshamov. An arithmetic function applicable to coding theory. Soviet Physics-Doklady, 10:185-187, 1965. Translated from Doklady Akademii Nauk SSSR, 161(3):540-543, 1965 (in Russian). Bibliography^ 152 [106] R. R. Varshamov and G. M. Tenengolts. Codes which correct single asymmetric errors. Automation and Remote Control, 26(2):286-290, 1965. Translated from Avtomatika i Telemekhanika, 26(2):288-292, 1965 (in Russian). [107] Andrew J. Viterbi. Error bounds for convolutional codes and an asymptotically op- timum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260— 269,1967. [108] N. D. Vvedenskaya and R. L. Dobrushin. The computation on a computer of the channel capacity of a line with symbol drop-out. Problems of Information Transmission, 4(3):76-79, 1968. Translated from Problemy Peredachi Informatsii, 4(3):92-95, 1968 (in Russian). [109] Michel Willem. Minimax Theorems. Progress in Nonlinear Differential Equations and Their Applications. Birkhauser, 1996. [110] Andrew Chi-Chih Yao. Probabilistic computations: toward a unified measure of complexity. In Proc. 18th IEEE Symposium on Foundations of Computer Science (FOCS), pages 222-227,1977. [111] Andrew Chi-Chih Yao. Some complexity questions related to distributed com- puting. In Proc. 11th ACM Symposium on Theory of Computing (STOC), pages 209-213,1979. [112] Andrew Chi-Chih Yao. Lower bounds by probabilistic arguments. In Proc. 24th IEEE Symposium on Foundations of Computer Science (FOCS), pages 420-428, 1983. [113] Jianxing Yin. A combinatorial construction for perfect deletion-correcting codes. Designs, Codes and Cryptography, 23:99-110, 2001. [114] Zhen Zhang and Xiang-Gen Xia. Three messages are not optimal in worst case interactive communication. IEEE Transactions on Information Theory, 40(43— 10,1994.