Capacity of Multidimensional Constrained Channels Estimates and Exact Computations by Erez Louidor B.Sc., The Technion Inst. of Technology, 1998 M.Sc., The Technion Inst. of Technology, 2004 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Mathematics) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) May 2010 c Erez Louidor 2010 Abstract This work considers channels for which the input is constrained to be from a given set of D-dimensional arrays over a finite alphabet. Such a set is called a constraint. An encoder for such a channel transforms arbitrary arrays over the alphabet into constrained arrays in a decipherable manner. The rate of the encoder is the ratio of the size of its input to the size of its output. The capacity of the channel or constraint is the highest achievable rate of any encoder for the channel. We compute the exact capacity of two families of multidimensional constraints. We also generalize a known method for obtaining lower bounds on the capacity, for a certain class of 2-dimensional constraints, and improve the best known bounds for a few constraints of this class. Given a binary D-dimensional constraint, a D-dimensional array with entries in {0, 1, } is called “valid”, for the purpose of this abstract, if any “filling” of the ‘’s in the array with ‘0’s and ‘1’s, independently, results in an array that belongs to the constraint. The density of ‘’s in the array is called the insertion rate. The largest achievable insertion rate in arbitrary large arrays is called the maximum insertion rate. An unconstrained encoder for a given insertion rate transforms arbitrary binary arrays into valid arrays having the specified insertion rate. The tradeoff function essentially specifies for a given insertion rate the maximum rate of an unconstrained encoder for that insertion rate. We determine the tradeoff function for a certain family of 1-dimensional constraints. Given a 1-dimensional constraint, one can consider the D-dimensional constraint formed by collecting all the D-dimensional arrays for which the original 1-dimensional constraint is satisfied on every “row” in every “direction”. The sequence of capacities of these D-dimensional generalizations has a limit as D approaches infinity, sometimes called the infinite-dimensional capacity. We partially answer a question of [37], by proving that for a large class of 1-dimensional constraints with maximum insertion rate 0, the infinite dimensional capacity equals 0 as well. ii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Statement of Co-Authorship . . . . . . . . . . . . . . . . . . . . . . . . ix 1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Multidimensional constraints . . . . . . . . . . . . . . . . 2.1 One-dimensional constraints and labeled directed graphs 2.2 Higher dimensional constraints . . . . . . . . . . . . . 2.3 Capacity . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Axial product . . . . . . . . . . . . . . . . . . . . . . . 2.5 Two-dimensional constraints . . . . . . . . . . . . . . . 2.5.1 Horizontal and vertical strips . . . . . . . . . . 2.6 Open questions . . . . . . . . . . . . . . . . . . . . . . 3 Lower bounds on capacity of 2-dimensional symmetric constraints 3.1 Constraints with symmetric edge-constrained strips . . . . . . . 3.2 Constraints with symmetric vertex-constrained strips . . . . . . 3.3 Capacity bounds for axial products of constraints . . . . . . . . 3.4 Heuristics for choosing φ . . . . . . . . . . . . . . . . . . . . . 3.4.1 Using max-entropic probabilites . . . . . . . . . . . . . 3.4.2 General optimization . . . . . . . . . . . . . . . . . . . 3.5 Numerical results for selected constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 6 8 10 13 15 15 17 . . . . . . . 18 18 27 31 33 33 36 38 iii Table of Contents 3.5.1 The constraint RWIM . . 3.5.2 The constraint EVEN⊗2 . 3.5.3 The constraint CHG(b)⊗2 Open questions . . . . . . . . . . . . . . 40 40 40 43 4 Exact computation of capacity . . . . . . . . . . . . . . . . . . . . . 4.1 The capacity of ODD⊗D . . . . . . . . . . . . . . . . . . . . . . 4.2 The capacity of CHG(2)⊗D . . . . . . . . . . . . . . . . . . . . 44 44 45 5 Multi-choice constraints and independence capacity 5.1 Multi-choice constraints . . . . . . . . . . . . . 5.2 Independence capacity . . . . . . . . . . . . . . 5.3 Independence capacity and axial products . . . . 5.4 Independence capacity and limD→∞ cap(S ⊗D ) . 5.5 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 49 51 53 55 61 6 The tradeoff function for binary 1-dimensional constraints 6.1 A brief overview of digital recording . . . . . . . . . . 6.2 Previous work . . . . . . . . . . . . . . . . . . . . . . 6.3 Background and definitions . . . . . . . . . . . . . . . 6.4 Proof of Theorem 12 . . . . . . . . . . . . . . . . . . . 6.5 Proof of Theorem 13 . . . . . . . . . . . . . . . . . . . 6.5.1 Outline of proof . . . . . . . . . . . . . . . . . 6.5.2 Proof of propositions . . . . . . . . . . . . . . 6.6 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 62 64 64 69 71 72 78 96 7 Bounds on capacity using probability . . 7.1 Some correlation inequalities . . . . 7.2 Bounds on capacity using probability 7.2.1 Proof of Proposition 14 . . . 7.2.2 Proof of Lemma 11 . . . . . 7.3 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 . 97 . 99 . 102 . 106 . 108 3.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 iv List of Tables 3.1 3.2 3.3 3.4 Matrix size in the method of [3, 7] for the NAK constraint. Best bounds on capacities of certain constraints. . . . . . . Lower bounds using max-entropic probability heuristic. . . Lower bounds using optimization. . . . . . . . . . . . . . . . . . 39 40 41 42 6.1 Values of |∆(g)| for 1≤g≤2d+6 and d≥2. . . . . . . . . . . . . 90 . . . . . . . . . . . . v List of Figures 1.1 Presentations of 1-dimensional constraints. . . . . . . . . . . . . 2 2.1 Presentations of 2-dimensional constraints. . . . . . . . . . . . . 10 3.1 Paths generating an `α×n-array of S, in Gn and I. . . . . . . . 23 4.1 Example of the graph Gr for D = 2, n = 6. . . . . . . . . . . . . 47 5.1 Proof of Theorem 10. . . . . . . . . . . . . . . . . . . . . . . . . 60 6.1 6.2 Graphs of the tradeoff function for certain RLL constraints. . . . . The graph GbFRLL(d,∞) . . . . . . . . . . . . . . . . . . . . . . . . . The non-trivial component of GbFRLL(d,2d+2) . . . . . . . . . . . . . 68 69 6.3 (V) 72 vi Acknowledgements I am sincerely grateful for the support of my advisor, Prof. Brian Marcus, for being actively involved in my work, being realistically optimistic along the way, and making these abstract concepts a little more tangible by allowing me to discuss them with another. The only downside is that I now have high expectations of the next person I work with. vii Dedicated to my parents: Eti & Adam Louidor. viii Statement of Co-Authorship Chapters 2,3 and 4 were jointly authored by Erez Louidor and Brian Marcus. Erez Louidor and Brian Marcus were responsible for the identification and design of the research program. The research was performed by Erez Louidor and Brian Marcus with the majority of the research done by Erez Louidor. The manuscript was prepared by Erez Louidor and Brian Marcus with the majority of the writing done by Erez Louidor. A version of these chapters appears in [28]. Chapter 5 was jointly authored by Erez Louidor, Brian Marcus and Ronnie Pavlov. Panu Chaichanavong, Brian Marcus and Tze-Lei Poo were responsible for the identification and design of the research program. The research was performed by Panu Chaichanavong, Erez Louidor, Brian Marcus, Ronnie Pavlov and Tze-Lei Poo. The manuscript was prepared by Erez Louidor, Brian Marcus and Ronnie Pavlov with the majority of the writing done by Erez Louidor. A version of this chapter appears as part of [29]. Chapter 6 was authored by Erez Louidor. Brian Marcus was responsible for the identification and design of the research program. The research was performed by Erez Louidor. The manuscript was prepared by Erez Louidor. A version of this chapter appears in [27]. Chapter 7 was authored by Erez Louidor. Erez Louidor and Brian Marcus were responsible for the identification and design of the research program. The research was performed by Erez Louidor. The manuscript was prepared by Erez Louidor. ix Chapter 1 Overview Fix an alphabet Σ and let G be a directed graph whose edges are labeled with symbols in Σ. Each path in G corresponds to a finite word obtained by reading the labels of the edges of the path in sequence. The path is said to generate the corresponding word, and the set of words generated by all finite paths in the graph is called a 1-dimensional constrained system or a 1-dimensional constraint. Such a graph is called a presentation of the constraint. We say that a word satisfies the constraint if it belongs to the constrained system. One-dimensional constraints have found widespread applications in digital storage systems, where they are used to model the set of sequences that can be written reliably to a medium. A central example is the binary run-length-limited constraint, denoted RLL(d, k) for nonnegative integers 0≤d≤k, consisting of all binary sequences in which the number of ‘0’s between consecutive ‘1’s is at least d, and each “run” of ‘0’s (that is a contiguous sub-sequence of ‘0’s) has length at most k. The value of k is allowed to be ∞, in which case there is no restriction on the maximum length of a run of ‘0’s. Another 1-dimensional constraint, often used in practice, is the bounded-charge constraint, denoted CHG(b), for some positive integer b; it consists of all words w1 w2 . . .w` , where Pj `=0, 1, 2, . . . and each wi is either +1 or −1, such that for all 1≤i≤j≤`, | k=i wk |≤b. This constraint is often used to overcome low frequency noise such as fingerprints on compact discs. Other examples of 1-dimensional constraints are the EVEN and ODD constraints, which contain all finite binary sequences in which the number of ‘0’s between consecutive ‘1’s is even and odd, respectively. Presentations for these constraints are given in Figure 1. See [32] for more examples of 1-dimensional constraints and a more detailed explanation of their use in storage systems. In this work, we consider multidimensional constraints of dimension D for some positive integer D. Such a constraint is a set, specified by D edge-labeled directed graphs, of finite-size D-dimensional arrays with entries over some finite alphabet. In Chapter 2 we give a precise definition of what we mean by Ddimensional constraints. As in the 1-dimensional case, we say that an array satisfies the constraint if it belongs to it. Given a 1-dimensional constraint S, one can construct a D-dimensional constraint by collecting all the D-dimensional arrays for which the original 1-dimensional constraint is satisfied on every “row” in every “di1 Chapter 1. Overview 1 0 0 (a) 1 0 0 (b) +1 +1 0 1 −1 +1 +1 ... 2 −1 −1 b −1 (c) 0 0 1 0 ... 0 1 1 d 0 d+1 0 ... 0 k 1 (d) Figure 1.1: Presentations of 1-dimensional constraints: (a) EVEN; (b) ODD; (c) CHG(b); (d) RLL(d, k). rection” along an “axis” of the array. We denote such a D-dimensional constraint by S ⊗D . A well-known 2-dimensional constraint studied in statistical mechanics is the so called “hard-square” constraint. It consists of all finite-size (2-dimensional) binary arrays which do not contain 2 adjacent ‘1’s either horizontally or vertically. Two variations of this constraint are the isolated ‘1’s or “non-attacking-kings” constraint, denoted NAK, and the “read-write-isolated-memory” constraint, denoted RWIM. The former consists of all finite-size binary arrays in which there are no two adjacent ‘1’s either horizontally, vertically, or diagonally, and the latter consists of all finite-size binary arrays in which there are no two adjacent ‘1’s either horizontally, or diagonally. Like their 1-dimensional counterparts, 2-dimensional 2 Chapter 1. Overview constraints play a role in storage systems, where with recent developments, information is written in a true 2-dimensional fashion rather than using essentially 1-dimensional tracks. The RWIM constraint is used to model sequences of states of a binary linear memory in which no two adjacent entries may contain a ‘1’, and in every update, no two adjacent entries are both changed. See [4] and [13] for more details. Let S now be a D-dimensional constraint over an alphabet Σ. For a D-tuple m = (m1 , m2 , . . ., mD ) of positive integers, let Sm or Sm1 ×...×mD denote the set of all m1 ×m2 ×. . .×mD arrays in S, and vol(m) denote the product of the entries (i) (i) of m. We say that a sequence mi = (m1 , . . . , mD ) diverges to infinity, denoted (i) mi → ∞, if (mj )∞ i=1 does for each j. The capacity of S is then defined by log |Smi | , i→∞ vol(mi ) cap(S) = lim (1.1) D where (mi )∞ i=1 is a sequence of D-tuples in (N) diverging to infinity, | · | denotes cardinality, and log = log2 . We show in Chapter 2 that the limit always exists and is independent of the choice of (mi )∞ i=1 . The capacity is a fundamental quantity that has been studied under different names in several disciplines dealing with constrained systems. In symbolic dynamics it is known as the “topological entropy” and in statistical physics it is derived from the “grand partition function”. In the context of coding for storage systems, capacity has a practical role. As already mentioned, in many such systems due to physical constraints, only a subset of binary sequences can be written to the media reliably. This subset is typically modeled as a 1-dimensional constrained system over the binary alphabet {0, 1}. In practice, user information, consisting of an arbitrary sequence of binary digits is encoded into a sequence of the constraint before being written to the media. When reading back the data, the constrained sequence is decoded and the original information is recovered. Typically, an encoder divides its input into fixed size blocks of p digits each, emitting, for each block, a block of q digits, such that when all the output blocks are concatenated the result satisfies the constraint. The ratio p/q is called the rate of the encoder and, naturally, it is desirable that the rate of the encoder be as large as possible. It turns out, that the capacity of a 1-dimensional constraint, is the largest possible rate of any such encoder for the constraint, and hence, is important for the design and evaluation of efficient encoders for digital storage systems. While there is a formula for computing the capacity of a general 1-dimensional constrained system (up-to finding the largest root of a polynomial), no such formula is known for 2- and higher-dimensional constraints. There are only a handful of constraints for which the capacity is nonzero and is known exactly [23, 25, 39]. 3 Chapter 1. Overview Even for the hard-square constraint, the exact capacity is unknown and the problem goes back more than 40 years [12]. Some rigorous evidence for the hardness of computing the capacity of multidimensional constraints is given in [2], where it is shown that there is no algorithm that accepts a 2-dimensional constraint system S as input and determines whether |Sm×n | ≥ 1 for all m, n. In many storage systems, restricting the set of sequences that can be written to the media to be from a constrained system is not enough to ensure the low biterror-rate required in these systems. Accordingly, a conventional error correcting code, or ECC, is used in addition to further reduce the number of errors. Traditionally, arbitrary user information is encoded twice, first by the ECC encoder and then by the constrained system encoder, before it is written to the media. Immink and Wijngaarden [40] proposed a scheme to embed the ECC directly in the constrained system. In this scheme, the user information sequence is encoded into a binary sequence in which certain preset positions are left blank. These positions are denoted by ‘’s and are “unconstrained” in the sense that any way of filling them independently with ‘0’s and ‘1’s would result in a sequence that satisfies the given constrained system. The density of ‘’s in the resulting sequence is called the insertion rate. Next, a systematic ECC is used to compute parity check bits, which are stored directly in these unconstrained positions and the resulting sequence is written to the media. In this manner the written sequence is both a constrained sequence and an ECC codeword. As the error correcting capability of an ECC depends on the number of parity-check bits, it is desirable that the number of unconstrained positions emitted by the encoder, or equivalently the insertion rate, be as large as possible. The maximum insertion rate is the largest insertion rate achievable in arbitrary long sequences. On the other hand as the number of constrained sequences of a given length with a given insertion rate is inversely related to the insertion rate, increasing the insertion rate reduces the overall encoding rate (that is the rate of the combined constrained system and ECC encoders). The tradeoff function [5, 38] quantifies this and provides for a given insertion rate the highest possible rate of any matching encoder. Accordingly, the tradeoff function of a constraint evaluated at 0 equals its capacity and thus can be regarded as a generalization of capacity. In this work we generalize some of these concepts to higher dimensional constraints and non-binary alphabets. In particular, we define independence capacity of a constraint that, roughly speaking, captures the contribution of independence between symbols in arrays of the constraint to its capacity. For the binary alphabet this coincides with the notion of maximum insertion rate. We denote it by capind (S) for a constraint S. For a 1-dimensional constraint S, it turns out that cap(S ⊗1 ) ≥ cap(S ⊗2 ) ≥ . . . ≥ capind (S) and we denote the limit 4 Chapter 1. Overview D limD→∞ cap(S ⊗ ) by cap∞ (S); hence cap∞ (S)≥capind (S). Chaichanavong and Poo observed the curious fact that for all 1-dimensional constraints S, for which we we know cap∞ (S), it turns out to be equal to capind (S), and they ask whether this always holds [37]. Here, we give a partial answer by showing that for a large class of constraints, if capind (S) = 0, than so is cap∞ (S) and the convergence is exponentially fast. The main contributions of this work are summarized below. • Calculated the capacity of CHG(2)⊗D and ODD⊗D , for all D∈N [28]. • Generalized an earlier method for computing lower bounds on a certain class of 2-dimensional constraints, and using the method improved the best bounds on cap(NAK) and cap(RWIM), and gave the first published estimates of cap(EVEN⊗2 ) and cap(CHG(3)⊗2 ) [28]. • Showed that for a large class of constraints S with zero independence capacity, cap(S ⊗D ) → 0 exponentially fast [29, 30]. • Determined the tradeoff function of RLL(d, ∞) and RLL(d, 2d + 2) [27]. • Showed how correlation inequalities can be used to obtain lower bounds on “monotone” 2-dimensional constraints. This work is organized as follows. In Chapter 2 we define multidimensional constraints and related concepts. In Chapter 3 we show a method for obtaining lower bounds on the capacity, for a class of 2-dimensional constraints. In Chapter 4 we calculate the exact capacity of CHG(2)⊗D and ODD⊗D . In Chapter 5 we generalize some of the concepts of [37, 38] to dimensions larger than 1 and nonbinary alphabets and define multi-choice constraints and independence capacity. We show some of their properties and in particular prove the exponential convergence of capind (S) to cap∞ (S) for certain constraints S, discussed above. In Chapter 6 we compute the tradeoff function for RLL(d, 2d + 2) and RLL(d, ∞). Finally, in Chapter 7 we show some probabilistic inequalities that hold for certain 2-dimensional binary constraints. From these, we obtain a lower bound on the capacity. 5 Chapter 2 Multidimensional constraints∗ In this chapter we define multidimensional constraints and related concepts that we use in the rest of this thesis. 2.1 One-dimensional constraints and labeled directed graphs We deal with a finite directed graph G = (V, E), sometimes simply called a graph, with vertices V and edges E. We occasionally refer to the vertices as states and to the edges as transitions. For e∈E we denote by σG (e) and τG (e) the initial and terminal vertices of e in G, respectively. We shall omit the subscript G from σG and τG when the graph is clear from the context. For a sequence (ai )`i=1 and a set A, we abuse notation and write (ai )⊆A to mean that ai ∈A for i = 1, 2, . . ., `. A path of length ` in G is a sequence of ` edges (ei )`i=1 ⊆E, where for i = 1, 2, . . ., `−1, τ (ei )=σ(ei+1 ). The path starts at the vertex σ(e1 ) and ends at the vertex τ (e` ). A cycle in G is a path that starts and ends at the same vertex. Fix a finite alphabet Σ. A directed labeled graph G with labels in Σ is a pair G = (G, L), where G = (V, E) is a directed graph, and L : E → Σ is a labeling of the edges of G with symbols of Σ. The paths and cycles of G are inherited from G and we will sometime use σG and τG to denote σG and τG respectively. For a path (ei )`i=1 of G, we say the path generates the word L(e1 )L(e2 ). . .L(e` ) in Σ∗ (Σ∗ denote the set of all finite (1-dimensional) words over Σ). As mentioned in Chapter 1, a 1-dimensional constraint or 1-dimensional constrained system over Σ is the set of all words generated from finite paths in some labeled graph with labels in Σ. The graph is called a presentation of the constraint. A labeled direct graph is called lossless if for any two of its vertices u and v, all paths starting at u and terminating at v generate distinct words. It is called deterministic if there are no two distinct edges with the same initial vertex and the same label. Every 1-dimensional constraint S has a deterministic, and therefore lossless, ∗ A version of this chapter has been published. Louidor, E. and Marcus, B.H. (2010) Improved Lower Bounds on Capacities of Symmetric 2-Dimensional Constraints using Rayleigh Quotients. IEEE Transactions on Information Theory 56:1624–1639. 6 2.1. One-dimensional constraints and labeled directed graphs presentation [32]. A 1-dimensional constraint over an alphabet Σ is said to have memory m, for some positive integer m, if for every word w of more than m letters over Σ, in which every sub-word of m + 1 consecutive letters satisfies S, it holds that w satisfies S as well, and m is the smallest integer for which this is true. A 1dimensional constraint with memory m, for some integer m, is called a finitetype constraint. Of the examples introduced in Chapter 1, RLL(d, k) is a finitetype constraint—with memory k, for k<∞, and memory d, for k=∞ —whereas EVEN, ODD and CHG(b) for b≥2 are not finite-type constraints. We introduce two 1-dimensional constraints defined by general directed graphs. Let G=(V, E) be a directed graph. The edge constraint defined by G, denoted X(G), is the 1-dimensional constraint over the alphabet E, presented by G = (G, IE ) where IE is the identity map on E. Equivalently, an edge constraint is a constraint that can be presented by a labeled graph in which all the edges have distinct labels. For a graph G=(V, E) with no parallel edges, the vertex-constraint defined by G, denoted Ẋ(G), is the set `=0,1,2,. . ., and for 1≤i<`, ∃ei ∈E s.t. ` (vi )i=1 ⊆V : . σ(ei )=vi , τ (ei )=vi+1 It is not hard to verify that vertex-constraints and edge-constraints are 1dimensional constraints with memory (at most) 1. In fact, the vertex constraints are precisely the finite-type constraints with memory (at most) 1, and it can be shown that edge constraints are characterized as follows. The follower set of a symbol a in a constraint S is defined to be {b : ab ∈ S}; edge constraints are precisely the constraints with memory 1 such that any two follower sets are either disjoint or identical [26, exercise 2.3.4]. A graph G = (V, E) is irreducible if for any pair of vertices u, v∈V there is a path in G starting at u and terminating at v; otherwise it is reducible. A graph G is primitive if it is irreducible and the gcd of the lengths of all cycles of G is 1. These concepts naturally extend to labeled graphs as well. We denote by A(G) the adjacency matrix of G: namely the |V |×|V | matrix where (A(G))i,j is the number of edges in G from i to j, where | · | denotes cardinality. We use 1 in this work to denote a real vector in which each entry is 1 and for two real matrices (or vectors) M ,N of the same size we write M ≤N and M <N if the corresponding inequality holds entry-wise. We say that a graph G is symmetric if A(G) is symmetric. We say that a vertex of a graph is isolated if it has neither outgoing nor incoming edges. We say that a vertex-constraint (resp. edge-constraint) is symmetric if it is defined by a symmetric graph. For a vertex-constraint, this definition is equivalent to requiring that the constraint is closed under reversal of the order of symbols 7 2.2. Higher dimensional constraints in words. Note, that in a symmetric edge-constraint, up to removal of isolated vertices, the (unlabeled) graph defining the constraint is unique. 2.2 Higher dimensional constraints We consider multidimensional arrays of dimension D—a positive integer. We use Z+ to denote the set of nonnegative integers. For a Q D-tuple m = (m1 , . . . , mD ) ∈(Z+ )D we denote by [m] the Cartesian product i 0, . . ., mi −1, and for a finite set A, we call an m1 × m2 × . . . × mD D-dimensional array with entries in A, a D-dimensional array of size m over A. We shall index the entries of such an array by [m]. We use Am and Am1 ×...×mD to denote the set of all DD dimensional arrays of size m over A. We define A∗...∗ = A∗ , where the number of ‘∗’s in the superscript is D, by [ D A∗ = Am , m D as the set of all finite-size D-dimensional arrays with entries in A. Let Γ∈Σ∗ be such an array. Given an integer 1≤i≤D, a row in direction i of Γ is a semi −1 quence of entries of Γ of the form Γ(k1 ,k2 ,...,ki−1 ,j,ki+1 ,...,kD ) j=0 for some integers kl ∈[ml ]; 1≤l≤D, l6=i. In this work, for D = 2, we use the convention that direction 1 is the vertical direction and direction 2 is the horizontal; thus the columns of a 2-dimensional array are its rows in direction 1, and its “traditional rows” are its rows in direction 2. Let A, B be finite sets and L : A → B be a mapD D ping. We extend L to a mapping L : A∗ →B ∗ as follows. For a D-dimensional array Γ∈Am , L(Γ) is the array in B m obtained by applying L to each entry of Γ, that is (L(Γ))j = L((Γ)j ) , j∈[m]. D Additionally, for a subset S⊆A∗ we define L(S) = {L(Γ) : Γ∈S}. We generalize the definition of a constrained system to D dimensions. Let Ḡ = (G1 , G2 , . . ., GD ), be a D-tuple of labeled graphs with the same set of edges E and the same labeling L : E → Σ. The edge e has D pairs of initial and terminal vertices (σGi (e), τGi (e))—one for each graph Gi in Ḡ. We say that an D array Γ∈Σ∗ of size m is generated by Ḡ if there exists an array Γ0 ∈E ∗D of size m, such that for i = 1, 2, . . ., D, every row in direction i of Γ0 is a path in Gi , and L(Γ0 ) = Γ. We call the set of all arrays Γ∈Σ∗D generated by Ḡ, the D-dimensional constrained system or the D-dimensional constraint presented by Ḡ, and denote it by X(Ḡ) (note the difference from the notation used for edge-constraints where 8 2.2. Higher dimensional constraints the argument inside the parenthese is an unlabeled graph). We say that Ḡ is a presentation for X(Ḡ). In [14], 2-dimensional constrained systems are defined by vertex-labeled graphs, with a common set of vertices and a common labeling on the vertices. It can be shown that their definition (generalized to higher dimensions) is equivalent to ours. We find it more convenient to use our definition, since, just as in one dimension, it permits use of parallel edges and often enables a smaller presentation of a given constraint. Figure 2.2 shows presentations for the NAK and RWIM constraints defined in Chapter 1. In these presentations G1 and G2 describe the vertical and horizontal constraints on the edges, respectively. Each edge e = (e)i,j is a 2×2 binary matrix of the form (e)(0,0) (e)(0,1) e= , (e)(1,0) (e)(1,1) and it is labeled by (e)(1,1) , i.e., the labeling of an edge simply picks out the entry in the lower-right corner. For NAK, the edges E = ENAK are the 2 × 2 matrices which satisfy NAK, that is, with at most one 1. Similarly, for RWIM, the edges E = ERWIM are the 2 × 2 matrices which satisfy RWIM, namely, the elements of ENAK together with 0 1 1 0 , and 0 1 1 0 In the figures, each edge is drawn twice—once in G1 and once in G2 —and the matrix identifying it is written next to it. The states are 1 × 2 blocks for G1 and 2 × 1 blocks for G2 ; for an edge e, σG1 (e) = (e)(0,0) (e)(0,1) , and τG1 (e) = (e)(1,0) (e)(1,1) , and σG2 (e) = (e)(0,0) (e)(0,1) , and τG2 (e) = . (e)(1,0) (e)(1,1) It follows that, for both constraints, and any rectangular array Γ0 ∈E m×n with each of its rows a path in G2 and each of its columns a path in G1 , it holds that ! 0 0 L(Γ ) L(Γ ) (i−1,j−1) (i−1,j) Γ0(i,j) = L(Γ0(i,j−1) ) L(Γ0(i,j) ) for i = 1, 2, . . ., m−1, j = 1, 2, . . ., n−1. Therefore, the only 2×2 sub-arrays appearing in the array L(Γ0 ) are elements of E, and it follows that L(Γ0 ) satisfies the corresponding constraint. Similarly, it can be shown that any array satisfying the constraint can be generated by the presentation. 9 2.3. Capacity 01 00 00 00 G1 : 01 00 10 00 10 00 10 01 00 10 00 00 G2 : 01 00 01 00 0 0 00 01 1 0 10 00 (a) 01 01 00 00 01 00 G1 : 01 00 10 00 00 00 00 G2 : 01 10 00 01 00 0 0 00 01 10 01 00 10 10 10 1 0 10 00 10 10 1 1 01 01 (b) Figure 2.1: Presentations of 2-dimensional constraints: (a) NAK constraint; (b) RWIM constraint. 2.3 Capacity In Chapter 1, we introduced the notion of capacity of a D-dimensional constraint. Here we expand on the definition; in particular we show that capacity is welldefined. We first need a generalization of Fekete’s Subadditivity Lemma which will prove useful in subsequent chapters as well. We denote by N the set of positive integers and by R̄ = R ∪ {−∞, +∞} the extended real numbers. We say that a function f : ND →R̄ has a limit L∈R̄ at infinity, denoted limm→∞ f (m) = L, if D for every sequence (mi )∞ i=1 ⊆N with mi → ∞, we have limi→∞ f (mi ) = L. This is equivalent to requiring that for any real > 0 there exists N ∈N such that for all m∈ND with every entry at least N , |f (m)−L|<. We call a function 10 2.3. Capacity f : ND → R̄ entry-wise subadditive if for all m∈ND , f (m)<∞, and for any (m1 , . . ., mD )∈ND , i∈{1, 2, . . ., D} and n∈N, it holds that f (m1 , . . ., mi−1 , mi +n, mi+1 , . . ., mD ) ≤f (m1 , . . ., mD )+ f (m1 , . . ., mi−1 , n, mi+1 , . . ., mD ). (2.1) We call a function f : ND → R̄ entry-wise superadditive if −f is entry-wise subadditive. Lemma 1. Let f : ND → R̄ be a function, then the following statements hold. 1. If f is entry-wise subadditive then f (m) f (m) = inf . m→∞ |[m]| m∈ND |[m]| lim (2.2) 2. If f is entry-wise superadditive then lim m→∞ f (m) f (m) = sup . |[m]| m∈ND |[m]| (2.3) Remark . Note that if f is entry-wise subadditive then for i = 1, 2, . . ., D, for any mi+1 , . . . , mD ∈ N the mapping mi → lim lim mi−1 →∞ mi−2 →∞ f (m1 , . . . , mD ) m1 →∞ m1 ·. . .·mi−1 . . . lim is subadditive; and hence by Part 1 we have the limit 1 f (m1 , . . . , mD ) lim lim . . . lim = mi →∞ mi mi−1 →∞ m1 →∞ m1 ·. . .·mi−1 f (m1 , . . . , mD ) 1 inf lim . . . lim . m1 →∞ m1 ·. . .·mi−1 mi ∈N mi mi−1 →∞ By repeating this argument D times, one has, f (m1 , . . ., mD ) f (m1 , . . ., mD ) = inf . . . inf m1 →∞ mD ∈N m1 ∈N m1 ·. . .·mD m1 ·. . .·mD f (m) f (m) = inf = lim . m→∞ |[m]| m∈ND |[m]| lim . . . lim mD →∞ An analogous result holds for the superadditive case. 11 2.3. Capacity Proof. Part 1. Let f be entry-wise subadditive. It’s easy to check that (2.1) implies (0) (0) (1) (1) that for all (a1 , . . ., aD ), (a1 , . . ., aD )∈ND , k∈N, and i∈{1, . . ., D}, we have (0) (0) (0) (0) (0) (0) (0) f (a1 , . . ., ai−1 , kai , ai+1 , . . ., aD ) ≤ kf (a1 , . . ., aD ) X (0) (1) (0) (1) (x ) (x ) f (a1 +a1 , . . ., aD +aD ) ≤ f (a1 1 , . . ., aD D ), (2.4) (2.5) x∈{0,1}D where x = (x1 , . . ., xD ) in the sum in the RHS of (2.5). D (i) → ∞. Write m(i) = Now, let (m(i) )∞ i=1 ⊆N be a sequence satisfying m (i) (i) (m1 , . . ., mD ), and let n = (n1 , . . ., nD ) be a vector in ND . Since m(i) → ∞ (and D is finite), there Q is an i0 =i0 (n) such that for all i≥i0 , m(i) ≥2n. Set M = max({0}∪{f (x) : x∈ j {1, . . ., nj }}) and let i≥i0 . Note, that by our assumption (i) (i) on f , M 6= ± ∞. For each j = 1, . . ., D, there are (unique) qj , rj ∈N, such that (i) (i) (i) (i) (0) (0) (i) (1) (i) mj = qj nj + rj , and 1≤rj ≤nj . We define αj =βj =rj , αj =qj nj , and (1) βj =nj . Let T = {0, 1}D \{1} (where 1 denotes the vector in ND with every entry equal to 1). Then using (2.5) and (2.4) we have (i) (i) (i) (i) f (q1 n1 +r1 , . . . , qD nD +rD ) f (m(i) ) = (i) |[m ]| |[m(i) ]| P (x1 ) (xD ) x∈{0,1}D f (α1 , . . . , αD ) ≤ |[m(i) ]| xj P (x1 ) (xD ) Q (i) ) f (β , . . . , β 1 x∈{0,1}D j qj D ≤ |[m(i) ]| xj Q (i) P (i) (x ) (x ) Q f (n) j qj + x∈T f (β1 1 , . . . , βD D ) j qj = |[m(i) ]| (i) Y X Y (qj(i) )xj qj M , ≤ f (n) + (2.6) (i) (i) (i) (i) q n +r q n +r j j j j x∈T j j j j where, again, in each sum above x = (x1 , . . ., xD ). Since m(i) → ∞, it follows (i) that limi→∞ qj = ∞ for each j=1, . . ., D. Observe that, as i goes to infinity, the first summand of the RHS of (2.6) converges to f (n)/|[n]|, and each of the other summands converges to 0. Hence, taking the lim sup of both sides of the last inequality, we obtain lim supi→∞ (f (m(i) )/|[m(i) ]|) ≤ f (n)/|[n]|. Since n is arbitrary, this implies lim sup i→∞ f (m(i) ) f (n) ≤ inf . (i) D |[m ]| n∈N |[n]| (2.7) 12 2.4. Axial product On the other hand, clearly, inf n∈ND f (n) f (m(i) ) f (m(i) ) ≤ lim inf ≤ lim sup . (i) i→∞ |[m(i) ]| |[n]| i→∞ |[m ]| Combining this with (2.7) we get limi→∞ (f (m(i) )/|[m(i) ]|) = inf n {f (n)/|[n]|}, and the result follows. Part 2. Easily follows by applying Part 1 to −f . Observe that D-dimensional constraints are closed under taking contiguous sub-arrays, meaning that if an array belongs to the constraint then any of its Ddimensional contiguous sub-arrays also belongs to the constraint. This implies that the mapping m→ log |Sm | for m∈ND , where we define log 0= − ∞, is entry-wise subadditive. Thus by Lemma 1, the limit in (1.1) always exists, is independent of the choice of (mi )∞ i=1 , and satisfies cap(S) = inf m∈ND log |Sm | |[m]| (2.8) For a nonnegative matrix A denote by λ(A) its Perron eigenvalue, that is, its largest real eigenvalue. It is well-known that for a 1-dimensional constraint S presented by a lossless labeled graph G = (G, L), the capacity of S is log λ (A(G)). In particular, for a graph G, it holds that cap(X(G)) = log λ (A(G)). 2.4 Axial product D The axial product of D sets L1 , . . ., LD ⊆Σ∗ , denoted L1 ⊗L2 ⊗. . .⊗LD ⊆ Σ∗ , D is the set of all arrays Γ∈Σ∗ such that for i = 1, 2, . . ., D every row of Γ in direction i belongs to Li . If L1 =L2 =. . .=LD = L we say that the axial-product is isotropic and denote it by L⊗D . Given a presentation Ḡ = ((G1 , L), . . ., (GD , L)) for a D-dimensional constraint S with a common set of D edges E, the set X(G1 )⊗. . .⊗X(GD ) ⊆ E ∗ is a D-dimensional constraint presented by ((G1 , IE ), . . ., (GD , IE )), where IE is the identity map on E. If cap(X(G1 )⊗. . .⊗X(GD ))=cap(S), we say that Ḡ is capacity-preserving. For D=1 any lossless, and therefore deterministic, presentation of S is capacitypreserving, since in a lossless graph with |V | vertices there are at most |V |2 paths generating any given word; for D > 1, the question whether every D-dimensional constraint has a capacity-preserving presentation is open. This is a major open problem in symbolic dynamics, although it is usually formulated in a slightly different manner; see [6], where it is shown that for every D-dimensional constraint S and > 0, there is a presentation Ḡ = ((G1 , L), . . ., (GD , L)) such 13 2.4. Axial product that cap(S)≤cap(X(G1 )⊗. . .⊗X(GD ))<cap(S)+. We show in the next proposition that the answer to this question is positive, if S is an axial product of D one-dimensional constraints. Proposition 1. Let S (1) , S (2) , . . ., S (D) ⊆Σ∗ be D one-dimensional constraints over Σ, and let S = S (1) ⊗. . .⊗S (D) , then S is a D-dimensional constraint over Σ. Moreover, S has a capacity-preserving presentation. Remark . There are D-dimensional constraints which are not axial products of D one-dimensional constraints. For example for D=2, the NAK constraint defined in Chapter 1 is not an axial product. Proof. Let GS (1) , . . ., GS (D) be presentations of S (1) , . . ., S (D) , where, for i = 1, 2, . . ., D, GS (i) = ((Vi , Ei ), Li ). Define the D-tuple of labeled graphs Ḡ = (G1 , . . ., GD ), as follows. Let ( ) D Y E= (e1 , . . ., eD ) ∈ Ei :L1 (e1 )=L2 (e2 )=. . .=LD (eD ) , i=1 and let L : E → Σ be given by L(e1 , . . ., eD ) = L1 (e1 ) ; (e1 , . . ., eD )∈E. For i = 1, 2, . . ., D, the graph Gi is defined by Gi = (Gi , L) with Gi = (Vi , E), where for e = (e1 , . . ., eD )∈E, σGi (e) = σG (i) (ei ) and τGi (e) = τG (i) (ei ). It’s S S easy to verify that Ḡ is a presentation of S (1) ⊗S (2) ⊗. . .⊗S (D) . Assume now that every GS (i) is lossless. We show that in this case Ḡ is capacity preserving. Let X=X(G1 )⊗. . .⊗X(GD ), n be a positive integer, and let n be the D-tuple with every entry equal to n. We extend the mapping L to L : Xn →Sn as described above. Now, fix an array Γ∈Sn , and for i = 1, 2, . . ., D let Γ0(i) ∈(Ei )n 0(i) be an array such that every row in direction i, (Γjk )nk=1 is a path in GS (i) generating the corresponding row (Γjk )nk=1 in Γ. Let Γ0 ∈E n be the array with entries given by 0(1) 0(D) Γ0j = (Γj , . . ., Γj ), j∈[n]D . It follows from the construction of Ḡ that Γ0 ∈Xn , and that L(Γ0 ) = Γ. Moreover, any array ∆∈Xn such that L(∆) = Γ can be constructed in this manner. Now, as each GS (i) is lossless, there are at most |Vi |2 possibilities of choosing each row in direction i of Γ0(i) , and as there are nD−1 such D−1 rows, there are at most |Vi |2n possibilities of choosing each Γ0(i) . It follows that |L−1 ({Γ})| ≤ D Y D−1 |Vi |2n , i=1 14 2.5. Two-dimensional constraints where L−1 ({Γ}) = {Γ0 ∈Xn : L(Γ0 ) = Γ}. Summing the latter inequality over all Γ∈Sn , we obtain D Y D−1 |Xn |≤|Sn | |Vi |2n . i=1 nD , Taking the log, dividing through by and taking the limit as n approaches infinity, we have cap(X)≤cap(S). Clearly, cap(X)≥cap(S), since X is a presentation of S. The result follows. Let S be a 1-dimensional constraint. Since for any m∈ND |(S ⊗D )m | = |(S ⊗(D+1) )(m,1) | it follows from (2.8) that cap(S ⊗D ) is non-increasing in D. 2.5 Two-dimensional constraints We introduce some specialized definitions for 2-dimensional constraints. For a 2-dimensional array Γ∈Σm1 ×m2 , for nonnegative integers m1 , m2 , we denote by Γt its transpose, namely (Γt )(i,j) = (Γ)(j,i) for all (i, j)∈[m1 ]×[m2 ]. For a 2dimensional constraint S over Σ, we use S t to denote the set S t = Γ∈Σ∗∗ : Γt ∈S , Clearly S t is a 2-dimensional constraint with cap(S t ) = cap(S). We shall use the following (MATLAB-like) notation for 2-dimensional arrays. Let A be a set and Γ∈As×t . For integers 0≤s1 ≤s2 <s and 0≤t1 ≤t2 <t, we denote by Γs1 :s2 ,t1 :t2 the sub-array: (Γs1 :s2 ,t1 :t2 )i,j = Γs1 +i,t1 +j ; (i, j)∈[s2 −s1 +1]×[t2 −t1 +1], and by Γs1 :s2 ,∗ (resp. Γ∗,t1 :t2 ) the sub-array Γs1 :s2 ,0:t−1 (resp. Γ0:s−1,t1 :t2 ). We also abbreviate x:x in the subscript by x. We shall use the same notation for onedimensional vectors: for a vector v∈As , vs:t denotes the sub-vector (vs:t )i = vs+i ; i∈[t−s+1]. 2.5.1 Horizontal and vertical strips Let S be a 2-dimensional constraint over Σ, and m be a positive integer. The horizontal (resp. vertical) strip of height (resp. width) m of S, denoted Hm (S) (resp. Vm (S)) is the subset of S given by [ [ Hm (S) = Sm×n (resp. Vm (S) = Sn×m ). n n 15 2.5. Two-dimensional constraints Let S be a 2-dimensional constraint over an alphabet Σ, and consider a horizontal (resp. vertical) strip Hm (S) (resp. Vm (S)) of S for some positive integer m. We regard such a strip as a set of 1-dimensional words over Σm where each m×n (resp. n×m) array in the strip is considered a word of length n over Σm . Below we show that the horizontal and vertical strips of S are 1dimensional constraints over Σm . For this, we need the following definition. Let G = (V, E) be a graph, and let m be a positive integer. Let G×m be the graph given by G×m = (V m , E m ), where for each e = (e1 , . . ., em )∈E m , σG×m (e) = (σG (e1 ), . . ., σG (em )) and τG×m (e) = (τG (e1 ), . . ., τG (em )). For a labeled graph G = (G, L) with G = (V, E) and L : E → Σ, let G ×m be the labeled graph defined by G ×m = (G×m , L×m ), where L×m : E m → Σm is given by L×m (e1 , . . ., em )=(L(e1 ), . . ., L(em )) ; (e1 , . . ., em )∈E m We call G×m (resp., G ×m ) the mth tensor-power of G (resp., G). We can now state the following proposition. Proposition 2. Let S be a 2-dimensional constraint over Σ and let m be a positive integer. Then 1. Hm (S) (resp. Vm (S)) is a 1-dimensional constraint over Σm . 2. Let S = T (V) ⊗T (H) for 1-dimensional constraints T (V) , T (H) over Σ, presented by labeled graphs G (V) , G (H) , respectively. Then the 1-dimensional (H) constraint Hm (S) is presented by the labeled graph Gm defined as the subgraph of the labeled graph (G (H) )×m consisting of only those edges whose label (an m-letter word over Σ) satisfies T (V) . An analogous statement holds (V) for Vm (S), with respect to the graph Gm formed in a similar way from (G (V) )×m . Proof. It suffices to prove this only for horizontal strips Hm (S)—the argument for the vertical strip being analogous. We first prove part 2. It’s easy to verify that the labeled graph (G (H) )×m presents the constraint over Σm , consisting of all m×n (H) arrays of Σ∗∗ such that every row satisfies T (H) . It follows that the subgraph Gm , formed by removing all the edges of (G (H) )×m that are labeled with a word that does not satisfy T (V) , presents the 1-dimensional constraint consisting of all m×n arrays with every row satisfying T (H) and every column satisfying T (V) . This is precisely the constraint Hm (S). We proceed to prove part 1. Let the pair of labeled graphs (G (V) , G (H) ) be a presentation of S, where G (V) = ((V (V) , E), L) and G (H) = ((V (H) , E), L). Define the edge-constraints E (V) = X(V (V) , E) and E (H) = X(V (H) , E). Since (G (V) , G (H) ) is a presentation of S, we have S = L(E (V) ⊗E (H) ), and therefore 16 2.6. Open questions Hm (S) = L(Hm (E (V) ⊗E (H) )). By part 2, Hm (E (V) ⊗E (H) ) is a 1-dimensional (H) constraint, presented by a labeled graph Gm with edges labeled by words in E m . m Replacing each such label e∈E in that graph with L(e), we clearly obtain a presentation of L(Hm (E (V) ⊗E (H) )) = Hm (S). 2.6 Open questions A major problem in the theory of multidimensional constrained systems is the computation of their capacity. As already mentioned before, the problem is essentially solved for 1-dimensional constraints, however, in higher dimensions, only a few methods for estimating the capacity of a general D-dimensional constraint exist. In the light of [2], no “computable” formula for the capacity of such constraints exists; yet, it still may be true that for some specialized sub-classes of constraints, one can compute the capacity exactly. For example, the hard square constraint is a relatively old open problem for which no “closed-form” formula for computing the capacity is currently known. However, such a formula is known for the hexagonallattice version of this constraint (the “hard-hexagon constraint”) [1], which perhaps suggests that a similar formula exists for the capacity of the hard-square constraint. 17 Chapter 3 Lower bounds on capacity of 2-dimensional symmetric constraints∗ A method for computing very good lower-bounds on the capacity of the hardsquare constraint is given in [7] (see also [34, 41]). [3] generalizes the method slightly and also presents a method for obtaining good upper-bounds on the capacity of this constraint. Both the method for obtaining the lower- and the method for obtaining the upper-bounds can be shown to work on any 2-dimensional constraint for which every horizontal or every vertical strip is a symmetric vertex-constraint. In this chapter we show a generalization of the method for obtaining the lowerbounds that gives improved bounds on capacities of such constraints. Moreover, we show how this generalization as well as the method for obtaining the upperbounds may be applied to a larger class of 2-dimensional constraints that includes constraints in which the vertical and horizontal strips are not necessarily finitetype. We illustrate this by computing lower and upper bounds on the capacities of the EVEN⊗2 and CHG(3)⊗2 constraints, and show that 0.4402086447 ≤ cap(EVEN⊗2 ) ≤ 0.4452873312, 0.4222689819 ≤ cap(CHG(3) 3.1 ⊗2 and ) ≤ 0.5328488954. Constraints with symmetric edge-constrained strips In this section we generalize the method presented in [3, 7] to provide improved lower bounds on capacities of 2-dimensional constraints whose horizontal strips are symmetric edge-constraints. Fix an alphabet Σ, and let S be a 2-dimensional constraint over Σ. We say that S has horizontal edge-constrained-strips if for every positive integer m, the ∗ A version of this chapter has been published. Louidor, E. and Marcus, B.H. (2010) Improved Lower Bounds on Capacities of Symmetric 2-Dimensional Constraints using Rayleigh Quotients. IEEE Transactions on Information Theory 56:1624–1639. 18 3.1. Constraints with symmetric edge-constrained strips constraint Hm (S) is an edge-constraint. If, in addition, every horizontal strip is symmetric, we say that S has symmetric horizontal edge-constrained strips. Analogously, using Vm (S), we have the notions of a 2-dimensional constraint with vertical edge-constrained-strips and symmetric vertical edge-constrained-strips Here, we consider constraints of the form S = T ⊗ E, where E=X(GE ) is an edge-constraint defined by the graph GE = (VE , EE ) and T is an arbitrary 1-dimensional constraint over Σ. Then E is presented by GE = (GE , IE ) where IE is the identity map on EE . Let m be a positive integer. By Proposition 2, part 2, Hm (S) is a 1-dimensional constraint presented by a subgraph (H) (H) (H) Gm = (Gm , IE×m ) of GE×m . It follows that Hm (S) = X(Gm ), and so S has horizontal edge-constrained strips. Henceforth, we further assume that it has symmetric horizontal edge-constrained strips; note that symmetry of the graph GE is necessary but not sufficient for this assumption (see Proposition 3 below). For a positive integer m, let Fm = |(VE )m |, and let Hm denote the Fm × Fm (H) adjacency matrix of Gm . Since limn→∞ (log(|Sm×n |)/n) = cap(Hm (S)) for every positive integer m, we have log |Sm×n | mn log |Sm×n | = lim lim m→∞ n→∞ mn cap(Hm (S)) = lim m→∞ m log λ(Hm ) = lim . m→∞ m cap(S) = lim m,n→∞ (3.1) For a matrix M , let M t denote its transpose. Fix a positive integer m. Following [3, 7], since Hm is real and symmetric, we obtain by the min-max principle [16] p λ(Hm )≥ t Hp y ym m m , t ym ym for any Fm ×1 real vector ym 6= 0 and positive integer p. The RHS of the last q inequality is known as a Rayleigh quotient. Choosing ym to be the vector Hm xm , for some positive integer q and Fm × 1 real vector xm such that ym 6= 0, we have p λ(Hm )≥ 2q+p xtm Hm xm 2q xtm Hm xm . (3.2) Thus by (3.1), it follows that 2q+p 1 1 xtm Hm xm . cap(S) ≥ lim sup log 2q t p m→∞ m xm Hm xm (3.3) 19 3.1. Constraints with symmetric edge-constrained strips In [3, 7], each xm is chosen to be the vector 1 with Fm entries. We obtain improved lower bounds in many cases by choosing other sequences of vectors, (xm )∞ We fix integers µ≥0 and α≥1, and let m=1 , as follows. φ : (VE )µ+α → [0, ∞) be a nonnegative function. Our method works for sequences (xmk )∞ k=1 , where mk = µ + kα for positive integers k, and the entries of each xmk are indexed by (VE )mk and given by (xmk )v = k−1 Y φ(viα:iα+µ+α−1 ) ; v∈(VE )mk . (3.4) i=0 For such sequences and a fixed positive integer n, we will show that one can comn x pute Ln , the growth rate of xtmk Hm : k mk n x log xtmk Hm k mk , k→∞ mk Ln = lim and from (3.3) we obtain the lower bound cap(S)≥(L2q+p − L2q )/p. Before doing this for general µ and α, it is instructive to look at the special case: µ = 0 and α = 1. In this case mk =k, and using (3.4) to define xk we obtain (xk )v = k−1 Y v∈(VE )k . φ((v)i ) ; i=0 Let n be a positive integer. For a word w = w1 . . .wn ∈E define its weight, Wφ (w), by Wφ (w)=φ(σ(w1 ))φ(τ (wn )), where w1 , wn are regarded as edges in GE and extend this to arrays Γ∈Sm×n by Wφ (Γ) = m−1 Y Wφ (Γi,∗ ). i=0 (H) Observe that for an array Γ∈Sm×n that is a path in Gm of length n starting at v∈(VE )m and ending at u∈(VE )m , it holds that Wφ (Γ)=(xm )v (xm )u . It follows that X n xtm Hm xm = Wφ (Γ). (3.5) Γ∈Sm×n (V) (V) Now, pick a deterministic presentation, Gn , of Vn (S), where Gn = (V) (V) (V) (V) ((Vn , En ), Ln ), and let Wφ : En →[0, ∞) be the edge weighting defined (V) (V) (V) (V) (V) by Wφ (e) = Wφ (Ln (e)), for e∈En . Let A(Gn , Wφ ) be the |Vn |×|Vn | 20 3.1. Constraints with symmetric edge-constrained strips (V) (V) (V) weighted adjacency matrix of Gn with entries indexed by Vn ×Vn by X Wφ (e) ; i, j∈Vn(V) . = A(Gn(V) , Wφ ) i,j and given (V) e∈En : σ(e)=i,τ (e)=j Then 1t A(Gn(V) , Wφ )m 1 = X Wφ (L(V) n (γ)), γ (V) (V) where the sum is taken over all paths γ in Gn of length m and Ln (γ) denotes (V) the array in Sm×n generated by γ. Since Gn is deterministic it follows that P (V) log Γ∈Sm×n Wφ (Γ) log 1t A(Gn , Wφ )m 1 lim = lim . m→∞ m→∞ m m By (3.5) the RHS is Ln and by Perron-Frobenius theory the LHS is (V) (V) log λ(A(Gn , Wφ )), and thus Ln = log λ(A(Gn , Wφ )). For general µ, α, we proceed similarly. We pick a deterministic presentation, (V) (V) (V) (V) (V) Gn , of the vertical strip Vn (S), with Gn = ((Vn , En ), Ln ), and construct a (V) labeled directed graph I=I(µ, α, n, Gn , GE )=((VI , EI ), LI ), with nonnegative real weights on its edges given by Wφ : EI → [0, ∞). The graph I and weight function Wφ are defined as follows. The set of vertices VI is given by o n VI = (f , v, l) : v ∈ Vn(V) , f , l ∈ (VE )µ , and the function LI : EI → Σα×n labels each edge with an α×n array over Σ. We specify the edges of I by describing the outgoing edges of each of its vertices along with their weights. Let v = (f , v, l) ∈ VI be a vertex of I. The set of outgoing edges of v consists of exactly one edge for every path of length (V) (V) α in Gn starting at v. Let γ = (ei )α−1 i=0 ⊆ En be such a path and let u be its (V) terminating vertex. We regard the word generated by γ in Gn as an array Γ ∈ (V) Σα×n with entries given by (Γ)i,j = Ln (ei ) . Let f = (f0 , . . . , fµ−1 ) and l = j (l0 , . . . , lµ−1 ) and for i = µ, µ+1, . . ., µ+α−1, define fi to be σ(Γi−µ,0 ) and li to be τ (Γi−µ,n−1 ), where Γi−µ,0 and Γi−µ,n−1 are regarded as edges in the graph GE . For such a path γ the corresponding outgoing edge e ∈ EI of v satisfies σ(e) = v, LI (e) = Γ, τ (e) = ((fα , fα+1 , . . . , fα+µ−1 ), u, (lα , lα+1 , . . . , lα+µ−1 )). The weight of e, Wφ (e), is given by Wφ (e) = φ(f0 , . . . , fµ+α−1 )φ(l0 , . . . , lµ+α−1 ). We shall regard the label of a path (ei )`−1 i=0 in I as the `α×n array Γ over Σ resulting 21 3.1. Constraints with symmetric edge-constrained strips from concatenating the labels of the edges of γ in order in the vertical direction, namely Γiα+k,j = (LI (ei ))k,j , for all i∈[`], k∈[α] and j∈[n]. Finally, we define the weighted adjacency matrix of the labeled directed graph I with weights given by Wφ as the |VI | × |VI | nonnegative real matrix A(I, Wφ ) with entries indexed by VI × VI and given by X (A(I, Wφ ))i,j = Wφ (e) ; i, j ∈ VI . e∈EI : σ(e)=i,τ (e)=j Figure 3.1 shows paths generating an `α×n array Γ∈S: the left part of the fig(V) ure shows such a path in Gn and the right part of the figure shows the “correspond(V) ing” path in I. The label of each edge in Gn and I is “overlayed” on top of it. Each row of Γ—a path in GE of length n—is depicted in the figure as a grey “snakelike” curve. In the figure, we denote by si ,ti the states σ(Γi,0 ), τ (Γi,n−1 )∈GE respectively, for i = 0, 1, . . ., `α−1. The following lemma generalizes ideas in [3, 7] and uses the weighted labeled graph I to compute Ln , when (xmk )∞ k=1 is the sequence defined by (3.4). Lemma 2. For a sequence (xmk )∞ k=1 with each xmk given by (3.4), and I = (V) I(µ, α, n, Gn , GE ), n x log xtmk Hm log λ(A(I, Wφ )) k mk = . k→∞ mk α lim Proof. Let (xmk )∞ k=1 be a sequence of vectors with each xmk given by (3.4). We shall show that there are positive real constants c,d (depending on I and φ) such that for all positive integers k, n c · 1t (A(I, Wφ ))k+dµ/αe 1 ≤ xtmk Hm x k mk ≤ d · 1t (A(I, Wφ ))k 1. (3.6) For a positive integer s and vector e = (e0 , . . . , es−1 ) in (EE )s denote by σ(e), τ (e) ∈ (VE )s the vectors with entries given by (σ(e))i = σ(ei ) ; i ∈ {0, 1, . . . , s−1}. (τ (e))j = τ (ei ) Now, fix a positive integer k. Let Γ be an array in Smk ×n . Recall that each entry of Γ is an edge in GE and define the weight of Γ, denoted Wφ (Γ), by Wφ (Γ)= k−1 Y φ(σ(Γiα:iα+µ+α−1,0 ))φ(τ (Γiα:iα+µ+α−1,n−1 )). i=0 22 3.1. Constraints with symmetric edge-constrained strips v`α f` s`α−1 s`α−µ s(`−1)α+1 s(`−1)α v`α s`α−1 .. . .. . .. . .. . l` t`α−1 t`α−µ t(`−1)α+1 t(`−1)α t`α−1 v`α−1 v2α f2 s2α−1 s2α−µ t1 t0 v0 .. . .. . sα−µ t2α−µ tα+1 tα vα f1 sα−1 v1 s0 .. . sα+1 sα v2 s1 .. . l2 t2α−1 .. . .. . .. . .. . s1 s0 l1 tα−1 tα−µ t1 t0 v0 f0 l0 .. . .. . (V) Figure 3.1: Paths generating an `α×n-array of S, in Gn (left) and I (right). 23 3.1. Constraints with symmetric edge-constrained strips For a positive integer `, let P` denote the set of paths of length ` in I. We denote the label of a path γ∈P` by LI (γ). It is easily verified that there exists a path in (V) P` with label Γ ∈ Σ`α×n if and only if there exists a path in Gn of length `α (V) that generates Γ. As Gn is a presentation of Vn , the set of labels of paths in P` is S`α×n . For a finite path γ in I, define its weight, denoted Wφ (γ), as the product of the weights of the edges in the path. Recalling that the entries of xmk are indexed by (VE )mk , we observe that X n = (xmk )σ(Γ∗,0 ) (xmk )τ (Γ∗,n−1 ) xtmk Hm x k mk Γ∈Smk ×n = X Wφ (Γ). Γ∈Smk ×n For an array Γ ∈ Smk ×n , we say that a path γ ∈ Pk matches Γ if it is labeled by the sub-array Γµ:mk −1,∗ and starts at a vertex (f , v, l) ∈ VI with f = σ(Γ0:µ−1,0 ) and l = τ (Γ0:µ−1,n−1 ). It can be verified from the construction of I that if γ matches Γ then Wφ (γ) = Wφ (Γ). (V) Now, since Gn is a presentation of Vn (S), it follows from the construction of I that every Γ ∈ Smk ×n has a path in Pk matching it. Conversely, since for a path γ ∈ Pk all arrays Γ ∈ Smk ×n that it matches have the same sub-array Γµ:mk −1,∗ , it follows that there are at most |Σ|µn arrays in Smk ×n that γ matches. Therefore, X n xtmk Hm x = Wφ (Γ) m k k Γ∈Smk ×n ≤ |Σ|µn X Wφ (γ) γ∈Pk = |Σ|µn 1t (A(I, Wφ ))k 1. This shows the right inequality of (3.6), we now turn to the left. Set k 0 = k+dµ/αe, s = dµ/αeα − µ, and let ψ : Pk0 → Smk ×n be given by ψ(γ) = (LI (γ))s:k0 α−1,∗ ; γ ∈ Pk0 . 24 3.1. Constraints with symmetric edge-constrained strips 0 −1 ⊆ EI , its weight satisfies For a path γ ∈ Pk0 , with γ = (ei )ki=0 Wφ (γ) = 0 −1 kY Wφ (ei ) i=0 dµ/αe−1 = Y Wφ (ei ) Wφ (ψ(γ)) i=0 ≤Φ 2dµ/αe Wφ (ψ(γ)), where we take Φ to be a positive constant satisfying Φ≥ max {φ(v):v∈(VE )µ+α }. (V) Now let Γ be an array in Smk ×n . Since Gn is deterministic, so is I, and thus for every vertex v ∈ VI , the paths in Pk0 starting at v are labeled distinctly. As all paths γ that map to Γ under ψ have the same sub-array (LI (γ))s:k0 α−1,∗ , it follows that there are at most |Σ|sn paths γ ∈ Pk0 starting at v such that ψ(γ) = Γ. Consequently, there are at most |VI ||Σ|sn paths in Pk0 that map to Γ under ψ. Therefore, X 1t (A(I, Wφ ))k+dµ/αe 1 = Wφ (γ) γ∈Pk0 ≤ Φ2dµ/αe X Wφ (ψ(γ)) γ∈Pk0 ≤ Φ 2dµ/αe |VI ||Σ|sn 2dµ/αe sn X Wφ (Γ) Γ∈Smk ×n |VI ||Σ| t n ·xmk Hmk xmk . = Φ Dividing both sides by Φ2dµ/αe |VI ||Σ|sn we obtain the left inequality of (3.6). The claim of the lemma now follows from Perron-Frobenius theory by taking the log of (3.6), dividing it by mk and taking the limit as k approaches infinity. We thus obtain the following lower bound on the capacity of a 2-dimensional constraint. Theorem 1. Let T, E be 1-dimensional constraints over an alphabet Σ, with E an edge constraint defined by a graph GE = (VE , EE ). Set S = T ⊗E and suppose that S has symmetric horizontal edge-constrained strips. Let µ≥0 and α, p, q>0 be integers and φ : (VE )µ+α → [0, ∞) be a nonnegative real function. For a positive integer n, let Gn be a deterministic presentation of Vn (S), and set An,φ = 25 3.1. Constraints with symmetric edge-constrained strips A(I(µ, α, n, Gn , GE ), Wφ ). Then cap(S) ≥ log λ(A2q+p,φ ) − log λ(A2q,φ ) . pα (3.7) Remark 1. In addition to computing lower-bounds, [3] gives a method for computing upper bounds on the capacity of the hard-square constraint. It can be shown that this method can also be applied to all constraints of the form T ⊗E, with E an edge constraint, having symmetric horizontal edge-constrained strips. Remark 2. Theorem 1 can be generalized to apply to 2-dimensional constraints having symmetric horizontal edge-constrained strips, which are not necessarily axial-products. Let S be such a constraint, and for every positive integer m, let (H) (H) (H) Gm = (Vm , Em ) be the symmetric graph, with no isolated vertices, defining (H) (H) (H) Hm (S). Set GE = G1 , VE = V1 and EE = E1 . We claim there exists a (H) (H) mapping fm : Vm →(VE )m such that for every edge e = e0 e1 . . .em−1 ∈Em , with each ei ∈EE , fm (σ(e)) = (σ(e0 ), . . ., σ(em−1 )) and fm (τ (e)) = (τ (e0 ), . . ., τ (em−1 )). (3.8) (H) This mapping is defined as follows. For a vertex v∈Vm , pick an incoming edge (H) e = e0 e1 . . .em−1 ∈Em and define fm (v) as (τ (e0 ), . . ., τ (em−1 )). This mapping (H) is uniquely-defined: indeed if e0 = e00 . . .e0m−1 ∈Em is another incoming edge (H) of v, and g = g0 . . .gm−1 ∈Em is an outgoing edge of v, then clearly, for every i∈[m], both ei gi and e0i gi are paths in GE ; consequently τ (ei ) = τ (e0i ). It is easy to check that fm satisfies the conditions in (3.8). Now, replace the definition of (xmk )v in (3.4) with (xmk )v = k−1 Y φ(uiα:iα+µ+α−1 ) ; u = fmk (v), v∈Vm(H) . k i=0 With this new definition and the aid of (3.8), it can be verified that Lemma 2 and consequently Theorem 1 still hold. Remark 3. Clearly, it is sufficient, for the theorem to hold, that Hm (S) is symmetric for large enough m. We now give a sufficient condition for the constraint S = T ⊗E to have symmetric horizontal edge-constrained strips. For this to happen, we (generally) must have that GE is symmetric. This means that there exists a “matching” between edges, were each edge is matched with an edge in the “reverse” direction. More precisely there is a bijection R : EE → EE such that for all e ∈ EE , 26 3.2. Constraints with symmetric vertex-constrained strips (σ(e), τ (e)) = (τ (R(e)), σ(R(e))) and R(R(e)) = e. We call such a bijection an edge-reversing matching, and we denote by R(GE ) the set of all edge-reversing matchings of GE . Clearly a graph G is symmetric iff it has an edge-reversing matching. Thus T ⊗E has symmetric horizontal edge-constrained strips iff for ev(H) ery m, Gm has an edge-reversing matching. We present a sufficient condition for this to hold. Proposition 3. Let T, E be 1-dimensional constraints over an alphabet Σ, with E an edge constraint defined by a graph GE = (VE , EE ) with R ∈ R(GE ) an edgereversing matching. If for every word e1 . . .em ∈T one has R(e1 ). . .R(em )∈T as well (for all m), then T ⊗E has symmetric horizontal edge-constrained strips. (H) (H) that defines Hm (S). We Proof. Let Gm = (VEm , Em ) be the subgraph of G×m E (H) ×m m m show that Gm is symmetric. Let R : EE → EE be defined by R×m (e1 , . . ., em ) = (R(e1 ), . . ., R(em )). (H) Clearly, R×m is an edge-reversing matching of G×m E . Recall that Em consists of all the edges in EEm that, when regarded as m-letter words over Σ, satisfy T . (H) (H) Therefore, by the assumption, it follows that for all e∈Em , R×m (e)∈Em as (H) well. Consequently, R×m restricted to Em , is an edge-reversing matching of (H) Gm and hence it is symmetric. If GT is a presentation of T and R∈R(GE ), a sufficient condition for the hypothesis of Proposition 3 to hold, which may be easier to check, is the existence of a function f : ET → ET satisfying: 1) LT (f (e)) = R(LT (e)) for all e ∈ ET and 2) for any path e1 e2 of length 2 in GT , the sequence f (e1 )f (e2 ) is also a path in GT . Indeed, if such a function exists then any path 1 2 . . .m in GT generating a word e1 e2 . . .em has a corresponding path f (1 )f (2 ). . .f (m ) generating the word R(e1 )R(e2 ). . .R(em ), and thus the hypothesis of Proposition 3 is fullfiled. In fact, it can be shown that when GT is irreducible, deterministic and has the minimum number of vertices among all deterministic presentations of T , this condition is also necessary for the hypothesis of Propositon 3 to hold (see [26, Section 3.3]). In Section 3.3 we use Proposition 3 to show that the method described in this section can be used to compute lower bounds on CHG(b1 )⊗CHG(b2 ) for any positive integers b1 and b2 . 3.2 Constraints with symmetric vertex-constrained strips In this section we present an analog to Theorem 1 that gives lower bounds on the capacities of constraints for which every horizontal or every vertical strip is a sym27 3.2. Constraints with symmetric vertex-constrained strips metric vertex-constraint. We do this by transforming a 2-dimensional constraint with symmetric vertex-constrained strips to a 2-dimensional constraint with symmetric edge-constrained strips, having the same capacity. Fix an alphabet Σ, and let S be a 2-dimensional constraint over Σ. We say that S has horizontal vertex-constrained strips if for every positive integer m, the constraint Hm (S) is a vertex-constraint. If, in addition, every horizontal strip is symmetric, we say that S has symmetric horizontal vertex-constrained strips. The notions of a 2-dimensional constraint with vertical vertex-constrained strips and symmetric vertical vertex-constrained strips are defined analogously. It turns out that RWIM and NAK do not have horizontal or vertical edgeconstrained strips, and so the method in section 3.1 does not apply directly. We illustrate this only for horizontal strips for S = RWIM. Recall from Chapter 2 that an edge constraint is a constraint of memory 1 such that any two follower sets are either disjoint or identical. We claim that this condition does not hold for Hm (S). To see this, given any m, let w = w0 . . .wm−1 be the all-zeros word of length m and u = u0 . . .um−1 be any other word of length m. Now, the m × 2 arrays w0 w1 .. . w0 w1 .. . , wm−1 wm−1 w0 w1 .. . u0 u1 .. . , wm−1 um−1 u0 u1 .. . w0 w1 .. . , um−1 wm−1 belong to S, yet the m × 2 array u0 u1 .. . u0 u1 .. . , um−1 um−1 does not. Thus, w and u, regarded as m×1 columns, have different but non-disjoint follower sets. Consequently, RWIM does not have horizontal edge-constrained strips. However, it is not hard to show that RWIM and NAK have both symmetric horizontal vertex-constrained strips and symmetric vertical vertex-constrained strips. For instance, for S = RWIM, Hm (S) is the vertex constraint defined by the graph G = (V, E), where V consists of all binary vectors u0 . . . um−1 of length m and E consists of a single edge from u ∈ V to v ∈ V iff for all i, whenever ui = 1, then vi+1 = vi = vi−1 = 0 (with the obvious modification when i = 0 or m − 1). And Vm (S) is the vertex constraint defined by the graph of G0 = (V 0 , E 0 ), where V 0 consists of all binary vectors u0 . . . um−1 of length m which do not contain two 28 3.2. Constraints with symmetric vertex-constrained strips adjacent ‘1’s and E 0 consists of a single edge from u ∈ V to v ∈ V iff for all i, whenever ui = 1, then vi+1 = vi−1 = 0 (again, with the obvious modification when i = 0 or m − 1). Clearly, both G and G0 are symmetric. Now, let S be a 2-dimensional constraint over Σ. For a finite m × n array Γ with m ≥ 1 and n ≥ 2 over Σ its [1 × 2]-higher block recoding or [1 × 2]-recoding is an m × (n − 1) array Γ̃ over Σ1×2 with entries given by Γ̃i,j = (Γi,j Γi,j+1 ) ; i = 0, . . . , m − 1, j = 0, . . . , n − 2. We denote by S [1×2] the set of all [1 × 2]-recodings of arrays in S and refer to it as the [1 × 2]-higher block recoding of S. The [1 × 2]-higher block recoding of a constraint is a constraint. This is stated in the following proposition. Proposition 4. Let S be a 2-dimensional constraint over Σ. Then S [1×2] is a 2dimensional constraint over Σ1×2 . Remark . We may, of course, define, in a similar manner, the [s × t]-higher block recoding of S, for any positive integers s and t, and the [s×t]-higher block recoding of a 2-dimensional constraint S is a 2-dimensional constraint. Proof. Set S 0 = S [1×2] and let (GV , GH ) be a presentation of S with GV = (VV , E, L) and GH = (VH , E, L). We construct labeled graphs GV0 = 0 = (E, E 0 , L0 ) as follows. The set of edges E 0 is defined (VV ×VV , E 0 , L0 ) and GH as E 0 = (e0 e1 )∈E 1×2 : e0 , e1 is a path in GH , and the labeling function L0 : E 0 → Σ1×2 is given by L0 (e0 e1 ) = (L(e0 ) L(e1 )) ; (e0 e1 )∈E 0 . For every edge (e0 e1 )∈E 0 we define σGV0 (e0 e1 ) τGV0 (e0 e1 ) 0 (e0 e1 ) σGH 0 (e0 e1 ) τGH = = = = (σGV (e0 ), σGV (e1 )) (τGV (e0 ), τGV (e1 )) . e0 e1 0 ) is a presentation of S 0 . It’s easy to verify that (GV0 , GH [1×2] Clearly, recoding is an injective mapping, thus |Sm×n | = |Sm×(n−1) | for all positive integers m ≥ 1, n ≥ 2. It follows that cap(S) = cap(S [1×2] ). The next proposition shows that the [1 × 2]-higher block recoding of a constraint with symmetric horizontal vertex-constrained strips has symmetric horizontal edgeconstrained strips. 29 3.2. Constraints with symmetric vertex-constrained strips Proposition 5. Let S be a 2-dimensional constraint with horizontal vertexconstrained strips. 1. S [1×2] has horizontal edge-constrained strips. Moreover, S [1×2] has symmetric horizontal edge-constrained strips iff S has symmetric horizontal vertexconstrained strips. 2. S [1×2] = V1 (S [1×2] )⊗H1 (S [1×2] ). (H) (H) Proof. (1). Let m be a positive integer, Gm = (Vm , Em ) be the graph defining the vertex-constraint Hm (S) and set ∆ = Σ1×2 , where Σ is the alphabet of S. We (H) define a labeling L : Em → ∆m×1 of the edges of Gm . For e∈Em , we regard σ(e) and τ (e) as m×1 arrays over Σ, and define L(e) to be the array in ∆m×1 with entries given by L(e)(i,0) = (σ(e)i,0 τ (e)i,0 ) ; i∈[m]. It’s easily verified that the word generated by every path in the labeled graph (H) (Gm , L) is the [1×2]-higher block recoding of the array formed by concatenat(H) ing the vertices along the path horizontally in sequence. It follows that (Gm , L) is a presentation of Hm (S [1×2] ). Since the labels of the edges in (G(H )m , L) are distinct, we may identify each edge with its label, and it follows that Hm (S [1×2] ) is an edge-constraint. Since the same graph defines both Hm (S) and Hm (S [1×2] ) (the former as a vertex-constraint and the latter as an edge-constraint), it follows that Hm (S) is symmetric iff Hm (S [1×2] ) is. This completes the proof. (2). Clearly, S [1×2] ⊆ V1 (S [1×2] )⊗H1 (S [1×2] ). As for the reverse inclusion, let Γ̃ ∈ V1 (S [1×2] )⊗H1 (S [1×2] ) be an m × n array over Σ1×2 . Since every row of Γ̃ is in H1 (S [1×2] ), every row has a unique 1 × (n + 1) pre-image under the recoding map. Let Γ be the m × (n + 1) array over Σ, whose ith row is the preimage under the recoding map of the ith row of Γ̃, for i = 0, . . . , m − 1. Clearly, Γ̃ is the [1 × 2]-higher block recoding of Γ. Thus, it suffices to show that Γ ∈ S. For i = 0, 1, . . . , n − 1 clearly, the m × 2 array Γ∗,i:i+1 over Σ recodes to the column Γ̃∗,i . By our assumption this column is in V1 (S [1×2] ). Since recoding is injective, Γ∗,i:i+1 must be in S. Since this holds for all i = 0, 1, . . . , n − 1 and since Hm (S) has memory 1, it follows that Γ ∈ S and therefore Γ̃ ∈ S [1×2] . We can now use the method described in Section 3.1 to get lower bounds on 2dimensional constraints with symmetric horizontal vertex-constrained strips. This is stated in the following theorem. 30 3.3. Capacity bounds for axial products of constraints Theorem 2. Let S be a 2-dimensional constraint over an alphabet Σ with symmetric horizontal vertex-constrained strips. Let µ≥0, and α, p, q>0 be integers, GE = (VE , EE ) be the graph defining the vertex-constraint H1 (S) (hence VE ⊆Σ), and φ : (VE )µ+α → [0, ∞) be a nonnegative function. For an integer n≥2, let Gn be a labeled graph obtained from a deterministic presentation for Vn (S) by replacing each edge-label with its [1×2]-higher block recoding. Set Ãn,φ = A(I(µ, α, n−1, Gn , GE ), Wφ ), where I, Wφ , and A(I, Wφ ) are as defined in Section 3.1. Then cap(S) ≥ log λ(Ãp+2q+1,φ ) − log λ(Ã2q+1,φ ) . pα Proof. Let S 0 = S [1×2] . By Proposition 5, S 0 = V1 (S 0 )⊗H1 (S 0 ), and S 0 has horizontal symmetric edge-constrained strips. Since GE has no parallel edges, we may identifiy each edge e∈EE with the pair (σ(e), τ (e)); then, with this identification, H1 (S 0 ) = X(GE ). Also, note that G2q+p+1 and G2q+1 are deterministic presentations for V2q+p (S 0 ) and V2q (S 0 ), respectively. The result follows from Theorem 1 applied to S 0 . 3.3 Capacity bounds for axial products of constraints In this section we show how the method described in Section 3.1 can be applied to axial products of certain 1-dimensional constraints. Let S and T be two 1dimensional constraints over an alphabet Σ. We wish to lower bound the capacity of the 2-dimensional constraint T ⊗S. To this end, we pick a lossless presentation GS = (GS , LS ), with GS = (VS , ES ), for S. We extend the function LS to multidimensional arrays over ES in the manner described in Chapter 2, and for a ∗ set A ⊆ Σ∗ , we denote by L−1 S (A)⊆ES the inverse image of A under this map, namely ∗ L−1 S (A) = {w ∈ ES : LS (w)∈A} . The following proposition shows that we can reduce the problem of calculating the capacity of T ⊗S to that of calculating the capacity of L−1 S (T )⊗X(GS ). Proposition 6. Let S, T be two 1-dimensional constraints and let X(GS ) and L−1 S (T ) be as defined above. Then 1. L−1 S (T ) is a 1-dimensional constraint. 2. cap(T ⊗S) = cap(L−1 S (T )⊗X(GS )). 31 3.3. Capacity bounds for axial products of constraints Proof. 1. Let GT = (VT , ET , LT ) be a presentation of T . We shall construct a presentation F = (VT , EF , LF ) of L−1 S (T ). The set of edges is given by EF = {(eT , eS )∈ET ×ES : LT (eT )=LS (eS )}, and for an edge (eT , eS )∈EF , σF (eT , eS ) = σGT (eT ), τF (eT , eS ) = τGT (eT ) and LF (eT , eS ) = eS . It is easily verified that L−1 S (T ) is presented by F, and therefore it is a 1-dimensional constraint. 2. We set R = T ⊗S, U = L−1 S (T )⊗X(GS ). For an array ∆∈Rm×n , define P∆ = {Γ∈Um×n : LS (Γ) = ∆}, we claim that 1≤|P∆ | ≤ |VS |2m . (3.9) Indeed, it’s easily verified that an array Γ∈ESm×n is in P∆ iff for all i∈[m] the n−1 row (Γi,j )n−1 j=0 is a path in GS that generates (∆i,j )j=0 . Since GS is a lossless presentation of S, for every i∈[m], there is at least one path in GS generating 2 (∆i,j )n−1 j=0 and at most |VS | such paths; the claim follows. Now, clearly for any Γ∈Um×n the array LS (Γ) is in Rm×n . It follows that the sets P∆ , for ∆∈Rm×n form a partition of Um×n , and we have X |Um×n | = |P∆ |. ∆∈Rm×n Therefore, by (3.9), we get |Rm×n | ≤ |Um×n | ≤ |Rm×n | |VS |2m , and it follows from (1.1) that cap(R) = cap(U ). Therefore if L−1 S (T )⊗X(GS ) has symmetric horizontal edge-constrained strips, we can apply the method of Section 3.1 to obtain lower bounds on cap(T ⊗S). In this case, it also follows from Remark 1 of Theorem 1, that the method of [3] for obtaining upper bounds on the capacity of the hard-square constraint, can be used to obtain upper bounds on cap(T ⊗S). Proposition 3 present a sufficient condition for L−1 S (T )⊗X(GS ) to have symmetric horizontal edge-constrained strips. Here we give another stronger sufficient condition involving only the presentation GS . We say that a labeled graph (G, L), with G = (V, E), is symmetric as a labeled graph, if there exists an edge-reversing matching R ∈ R(G) which preserves L, that is L(R(e)) = L(e) for all e ∈ E. We assume now that GS is symmetric as a labeled graph, and that R ∈ R(GS ) is an edge-reversing matching which preserves LS . Since for any positive integer m and e1 . . .em ∈ESm , the label L(e1 ). . .L(em ) = L(R(e1 )) . . . L(R(em )), it follows −1 that e1 . . .em ∈L−1 S (T ) iff R(e1 ). . .R(em )∈LS (T ). Consequently, the hypothesis of Proposition 3 holds and we have the following corollary. 32 3.4. Heuristics for choosing φ Corollary 1. If GS is symmetric as a labeled graph then L−1 S (T )⊗X(GS ) has symmetric horizontal edge-constrained strips. Since the presentation in Figure 1.1a is symmetric as a labeled graph, we can apply the method of Section 3.1 to get lower bounds on the capacity of all constraints T ⊗EVEN for any 1-dimensional constraint T . Let S = CHG(b1 ) and let T = CHG(b2 ) for integers b1 , b2 ≥ 2. Let GS = (GS , LS ), with GS = (VS , ES ), be the presentation given in Figure 1.1c for b = b1 . Evidently, GS is symmetric with exactly one edge-reversing matching, R : ES →ES . Fix a positive integer m and let e = e1 e2 . . .em ∈ESm . Obviously, T is closed under negation of words (i.e., negating each symbol), and we have ⇐⇒ ⇐⇒ ⇐⇒ ⇐⇒ e1 e2 . . .em ∈L−1 S (T ) LS (e1 )LS (e2 ). . .LS (em )∈T (−LS (e1 ))(−LS (e2 )). . .(−LS (em ))∈T LS (R(e1 ))LS (R(e2 )). . .LS (R(em ))∈T R(e1 )R(e2 ). . .R(em )∈L−1 S (T ). Consequently, it follows by Proposition 3 that L−1 S (T )⊗X(GS ) has symmetric horizontal edge-constrained strips and we can apply the method of Section 3.1 to obtain lower bounds on the capacity of CHG(b2 ) ⊗ CHG(b1 ). The reader will note a similarity in the constructions in proofs of Propositions 1 and 6. Indeed, as an alternative approach, one may be able to use the construction in Proposition 1 to obtain bounds on cap(S ⊗ T ): namely, if G1 and G2 are the underlying graphs of a capacity-preserving presentation (G1 , G2 ) of S ⊗ T and X(G1 ) ⊗ X(G2 ) has symmetric horizontal edge-constrained strips. However, the approach given by Proposition 6 seems to be more direct and simpler than the alternative approach. 3.4 Heuristics for choosing φ In this section, we use the notation defined in Section 3.1, and assume that S = T ⊗E is a 2-dimensional constraint with symmetric horizontal edge-constrained strips, where E is an edge constraint. We describe heuristics for choosing the function φ to obtain “good” lower bounds on the capacity of S. 3.4.1 Using max-entropic probabilites Recall that a vertex of a directed graph is isolated if no edges in the graph are (H) connected to it. Note, that since Gm is symmetric, every vertex is either isolated 33 3.4. Heuristics for choosing φ or has both incoming and outgoing edges. We assume here that for every positive (H) integer m, ignoring isolated vertices, Gm is a primitive graph. In this case, the Perron eigenvector of Hm is unique up to multiplication by a scalar. Let rm be the right Perron eigenvector of Hm normalized to be a unit vector in the L2 -norm. Observe, that substituting rm for xm satisfies (3.2) with an equality. This motivates us (H) to choose φ so that the resulting vector xm approximates rm . Since Gm (without its isolated vertices) is irreducible, there is a unique stationary probability measure having maximum entropy on arrays of Hm , namely the max-entropic probability measure on Hm . We denote it here by Pr∗,m . It is given by Pr∗,m (Γ) = (rm )σ(Γ) (rm )τ (Γ) λ(Hm )` . For Γ ∈ Sm×` , for some positive integer `, and where σ(Γ), τ (Γ) ∈ VEm are given by (σ(Γ))i = σ(Γi,0 ) ; i = 0, 1, 2, . . . , m−1. (τ (Γ))i = τ (Γi,`−1 ) Let V(m) be a random variable taking values in VEm with distribution given by Pr(V(m) =v) = Pr∗,m {Γ ∈ Sm×1 : σ(Γ) = v} ; v ∈ VEm . It’s easily verified that Pr(V(m) =v) = ((rm )v )2 . (3.10) Pr(V(m) Thus approximating = v) and taking a square root will give us an approximation for (rm )v . Roughly speaking, Pr(V(m) = v) is the probability of seeing the column of vertices v in the “middle” of an m×` array chosen uniformaly at random from Sm×` , for large `. Fix integers µ ≥ 0, α ≥ 1 as in Section 3.1, and assume now that m = mk = µ + kα, for a positive integer k. For an integer 0≤s<m and vectors u∈VE` , w∈VEr , with lengths satisfying `≤m−s, r≤s, denote (m) (m) by ps (u) and ps (u|w) the probabilities given by (m) p(m) s (u) = Pr(Vs:s+`−1 = u) (m) (m) p(m) s (u|w) = Pr(Vs:s+`−1 = u|Vs−r:s−1 = w). Then by the chain rule for conditional probability we have, for any vector v∈VEm , (m) Pr(V(m) =v) =p0 (v0:µ−1 ) · k−1 Y (m) pµ+iα (vµ+iα:µ+(i+1)α−1 |v0:µ+iα−1 ). i=0 34 3.4. Heuristics for choosing φ A plausible way to approximate Pr(V = v), is by treating V as the outcome of a Markov process. Here we use a Markov process with memory µ, and assume that ps (u|w) can be “well” approximated by ps (u|wr−µ:r−1 ), for vectors u∈VE` , w∈VEr , with r, ` as above, and r≥µ. Using this approximation we get (m) Pr(V(m) =v) ≈p0 (v0:µ−1 ) · k−1 Y (m) pµ+iα (vµ+iα:µ+(i+1)α−1 |viα:µ+iα−1 ). i=0 We hypothesize that for fixed vectors u∈VEα , w∈VEµ , as m gets large, the con(m) ditional probabilities ps (u|w), for 0sm−1, are “approximately equal” to the value when s is in the “middle” of the interval [0, m − 1]. We hypothesize that this holds for “most” of the integers s in that inteval and moreover that this middle value converges as m gets large. Accordingly, we try to approximate the con(m) ditional probability ps (u|w) by the conditional probability found in the “middle” of a “tall” horizontal strip. More precisely, we fix an integer δ ≥ 0, set (m) (ω) ω=2δ+µ+α, and approximate ps (u|w) by pδ+µ (u|w). We also approximate (m) (ω) p0 (w) by p0 (w). This gives us (ω) Pr(V(m) =v) ≈p0 (v0:µ−1 ) · k−1 Y (ω) pδ+µ (vµ+iα:µ+(i+1)α−1 |viα:µ+iα−1 ), i=0 which, by (3.10), implies that q (ω) (rm )v ≈ p0 (v0:µ−1 ) k−1 Y q (ω) · pδ+µ (vµ+iα:µ+(i+1)α−1 |viα:µ+iα−1 ). (3.11) i=0 Fm Set Fm =|VE |m , and denote by rg mk ∈R k the nonnegative real vector with entries mk indexed by VE and given by the RHS of equation (3.11). Let φ : (VE )µ+α → [0, ∞) be given by q (ω) (3.12) φ(u) = pδ+µ (uµ:µ+α−1 |u0:µ−1 ) ; u∈(VE )µ+α , and let xmk ∈ RFmk be the vector with entries indexed by VEmk and defined by (3.4). We obtain q (ω) (rmk )v ≈ (g rmk )v = (xmk )v p0 (v0:µ−1 ) ; v∈(VE )mk . 35 3.4. Heuristics for choosing φ Now for mk ≥ω, if v∈VEmk is not an isolated vertex in Gmk , then clearly, v0:ω−1 is not an isolated vertex in Gω as well. Therefore (rω )v0:ω−1 >0, which implies that (w) (ω) (ω) (ω) p0 (v0:ω−1 )>0 and thus p0 (v0:µ−1 )>0. Let pmin = pmin = min{p0 (w) : (ω) w∈VEµ and p0 (w)>0}. It follows that for all vertices v∈ (VE )mk of Gmk that are not isolated, we have (g rmk )v ≥ √ pmin (xmk )v . ` y Now, for any positive integer ` and (Fmk ×1)-real vector y, the product yt Hm depends only on the values of the entries of y indexed by non-isolated vertices of Gmk . Consequently, we may write t ` ` t ` g pmin xtmk Hm x ≤ rg mk Hmk r mk ≤ xmk Hmk xmk , k mk for all positive integers `. Taking the log, dividing by mk , and taking the limit as k approaches infinity, we obtain t ` ` x g log rg log xtmk Hm mk Hmk r mk k mk = lim , k→∞ k→∞ mk mk lim where by Lemma 2, the limit in the RHS exists. Thus, choosing φ as given by (3.12) and computing the lower bound by the method described in Section 3.1 is equivalent to computing the limit of the lower bound in (3.2), with rf m substituted for xm , as m approaches infinity. If rf m approximates rm well enough, we expect to get good bounds. Note that we may use the heuristic described here even for (H) constraints for which the graphs Gm are not always irreducible. In this case, the geometric multiplicity of the Perron eigenvalue may be larger than 1, and there (ω) may be more than one choice of the vector rω in the computation of pδ+µ (·|·). Regardless of our choice, we will get a nonnegative function φ and a lower bound on the capacity. In Section 3.5 we show numerical results obtained using the heuristic described here for several constraints. 3.4.2 General optimization We may also use general optimization techniques to find functions φ which maximize the lower bound on the capacity. Fix integers µ≥0 and p, q, α>0, and for a positive integer `, set D` = (VE )` . In this subsection, we identify a function φ:Dµ+α →R with a real vector φ∈R|Dµ+α | indexed by Dµ+α ; for each j∈Dµ+α we identify φ(j) with the entry φj . For a positive integer n, let Gn be a deterministic presentation for Vn (S), In = I(µ, α, n, Gn , GE ), and for a function φ : Dµ+α →[0, ∞), set An,φ = A(In , Wφ ). Observe that for a scalar c ∈ [0, ∞), 36 3.4. Heuristics for choosing φ An,cφ = c2 An,φ . It follows that using cφ in place of φ in equation (3.7) of Thereom 1 does not change the lower bound. Consequently, (as φ cannot be the constant 0 function), it’s enough to consider functions φ whose images (of all vectors in (VE )µ+α ) sum to 1. We thus have the following optimization problem. maximize (log λ(A2q+p,φ ) − log λ(A2q,φ )) / (pα) subject to φ ≥ 0, φ · 1 = 1, (3.13) where 0 and 1 denote the real vectors of size |Dµ+α | with every entry equal to 0 and 1 respectively, and for two real vectors of the same size, t, r we write t ≥ r or t>r if the corresponding inequality holds entry-wise. Finding a global solution for a general optimization problem can be hard. We proceed to show that if we replace the constraint φ≥0 with φ>0 in (3.13), thereby changing the feasable set and possibly decreasing the optimal solution, it can be formulated as an instance of a particular class of optimization problems known as “DC optimization” which may be easier to solve. Let d be a positive integer. A real-valued function f : Rd → R is called a DC (difference of convex) function, if it can be written as the difference of two real-valued convex functions on Rd . An optimization problem of the form maximize f (x) subject to x∈X, hi (x) ≤ 0 ; i = 0, 1, . . . , `, where X ⊆ Rd is a convex closed subset of Rd and the functions f, h0 , . . . , h` are DC functions, is called a DC optimization or DC programming problem. See [17] and the references within for an overview of the theory of DC optimization. A nonnegative function f : Rd → [0, ∞) is called log-convex or superconvex, if either f (t)>0 for all t∈Rd and log f is convex in Rd , or f ≡0. A log-convex function is convex, and in [24], it is shown that the class of log-convex functions is closed under addition, multiplication, raising to positive real powers, taking limits, and additionally that for a square matrix A(t) = (ai,j (t)) whose entries are logconvex functions ai,j : Rd → [0, ∞), the function t → λ(A(t)) is log-convex as well. Now, observe, that for a positive integer n, every entery of An,φ is a quadratic form in the entries φ(j), j∈Dµ+α , with nonnegative integer coefficients. Such a function is generally not log-convex. To fix this, we perform the change of variables φ = eψ , where ψ is a real-valued function ψ : Dµ+α → R. Note that by doing so, we added the constraint φ>0. Since every entry of φ is now positive, 37 3.5. Numerical results for selected constraints we may replace the constraint φ · 1 = 1 by the constraint φ(v0 ) = 1 or equivalently ψ(v0 ) = 0, for some fixed v0 ∈Dµ+α . Problem (3.13) with the additional constraint φ>0, can now be rewritten as maximize log λ(A2q+p,eψ ) − log λ(A2q,eψ ) / (pα) (3.14) subject to ψ(v0 ) = 0. Obviously, we may substitue the maximization problem constraint, ψ(v0 ) = 0, above into the objective function, thereby reducing the number of variables by 1; however, this is not relevant for the discussion, so, for simplicity, we do not do so here. Now, for a positive integer n, the entries of the matrix An,eψ are of the form qi,j X eψ(wk,i,j )+ψ(uk,i,j ) , k=1 where qi,j are nonnegative integers, and wk,i,j and uk,i,j are vectors in Dµ+α , for all i, j∈VIn and integers 1≤k≤qi,j . It can be verified that a function of such a form is log-convex in ψ. It follows that the function ψ→λ(An,eψ ) for ψ ∈ R|Dµ+α | is log-convex as well. Therefore either λ(An,eψ )≡0, or λ(An,eψ )>0 for all ψ ∈ R|Dµ+α | . In particular, for ψ≡0 the matrix An,1 is the adjacency matrix of the graph In . Since In is deterministic, and for every nonnegative integer `, the set of labels of its paths of length ` is S`α×n , it follows that (1/α) log λ(An,1 )=cap(Vn (S)) ≥ cap(S) (the latter inequality follows from (2.8)). Hence, if cap(S)>−∞ (or equivalently cap(S)≥0), then λ(An,eψ )>0 for all ψ ∈ R|Dµ+α | and log λ(An,eψ ) is a convex function of ψ. Clearly cap(S)>−∞ iff. Sm×n 6=∅, for all positive integers m, n. We thus obtain the following theorem. Theorem 3. Let S be a constraint such that for all positive integers m, n, Sm×n 6=∅ then Problem (3.14) is a DC optimization problem. 3.5 Numerical results for selected constraints In this section we give numerical lower bounds on the capacity of some 2dimensional constraints obtained using the method presented in the sections above. The constraints considered are NAK, RWIM, EVEN⊗2 , and CHG(3)⊗2 . Table 3.2 summarizes the best lower bounds obtained using our method. For comparison, we provide the best lower bounds that we could obtain using other methods. We also give upper bounds on the capacity of these constraints obtained using the method of [3]. Table 3.3 shows the lower bounds obtained using our max-entropic probability heuristic for choosing φ, described in Section 3.4.1. Table 3.4 shows the 38 3.5. Numerical results for selected constraints lower bounds obtained with our method by trying to solve the optimization problem described in Section 3.4.2. In this, we did not make use of the DC property of the optimization problem; instead, we used a generic sub-optimal optimization algorithm whose results are not guarenteed to be global solutions. Utilizing special algorithms for solving DC optimization problems may give better lower bounds. The rightmost column of each of these tables shows the lower bound calculated for the same values of p and q using the method of [3, 7]. The largest lower-bound obtained for each constraint is marked with a ‘? ’. In the next subsections we give remarks specific to some of these constraints. As can be seen from the results in Table 3.4, using the optimization heuristic with our method gives better lower bounds on the capacity than the method of [3,7]. On the other hand, using optimization typically requires many evaluations of the objective function which results in longer running times compared to the method of [3, 7], for the same values of p and q. Increasing the value of q in the method of [3, 7] usually gives better lower bounds but requires more time and memory to run. It is therefore relevant to check if their method can be used to attain the lower bounds, obtained using the technique described here, with the same computational resources. Our experiments show that this is typically not the case. For example, for the NAK constraint, when µ = 1, α = 1, p = 1 and q = 4, an implementation of the method described here with the optimization heuristic took roughly 70 seconds to run, required operating on matrices of size 144×144 and resulted with the lower bound 0.4250767727. In contrast, our implementation of the method of [3,7] with p = 1, and q = 8, took roughly 1000 seconds to run, required operating on matrices of size 6765×6765 and resulted with the lower bound 0.4250725619. Table 3.1 shows some of the lower bounds obtained using the method of [3, 7] along with the size of the largest square matrix involved in the computation. The numerical results were computed using the eigenvalue routines in Matlab and rounded (down for lower bounds and up for upper-bounds) to 10 decimal places. Given accuracy problems with possibly defective matrices, we verified the results using the technique described in [35, Section IV]. Table 3.1: p 1 2 1 1 2 1 Matrix size in the method of [3, 7] for the NAK constraint. q Lower bound using [3, 7] Largest matrix size 5 0.4248771038 377×377 4 0.4249055702 233×233 6 0.4250215987 987×987 7 0.4250615286 2584×2584 6 0.4250636891 1597×1597 8 0.4250725619 6765×6765 39 3.5. Numerical results for selected constraints 3.5.1 The constraint RWIM Observe that this constraint has both symmetric horizontal and symmetric vertical vertex-constrained strips. Thus, we can apply our method in the vertical as well as the horizontal direction to get lower bounds. Clearly, cap(RWIMt ) = cap(RWIM), so we can obtain additional lower bounds on cap(RWIM) by using our method to get lower bounds on cap(RWIMt ). Some of these bounds are given in Tables 3.3 and 3.4. 3.5.2 The constraint EVEN⊗2 We used the reduction described in Section 3.3 with GEVEN being the presentation of EVEN given in Figure 1.1a, to get lower bounds on the capacity of EVEN⊗2 . Table 3.4 gives the results obtained with our method using the optimization described in Section 3.4.2. We also used the method with the max-entropic probability heuristic of Section 3.4.1 and the results are given in Table 3.3. 3.5.3 The constraint CHG(b)⊗2 For this constraint, the case b=1 is degenerate. Indeed, there are exactly two m×n arrays in CHG(1)⊗2 for all positive integers m and n, and consequently, cap(CHG(1)⊗2 )=0. For b=2, we show in Theorem 5 in Chapter 4 that the capacity is exactly 1/4. For b=3, we used the reduction of Section 3.3 with GCHG(3) being the presentation of CHG(3) given in Figure 1.1c, to get lower bounds on the capacity of CHG(3)⊗2 . Table 3.4 gives the results obtained with our method using the optimization described in Section 3.4.2. We also used the method with the max-entropic probability heuristic of Section 3.4.1 and the results are given in Table 3.3. Table 3.2: Best bounds on capacities of certain constraints. Constraint Previous best New lower bound Upper bound lower bound NAK 0.4250725619?? 0.4250767745 0.4250767997? ??? RWIM 0.5350150 0.5350151497 0.5350428519? ⊗2 EVEN 0.4385027973?? 0.4402086447 0.4452873312? ⊗2 ?? CHG(3) 0.4210209862 0.4222689819 0.5328488954? ? Calculated using the method of [3]. using the method of [3, 7]. ??? Appears in [42]. ?? Calculated 40 3.5. Numerical results for selected constraints Table 3.3: Lower bounds using max-entropic probability heuristic (Section 3.4.1). Constraint NAK RWIM RWIMt EVEN⊗2 CHG(3)⊗2 ? Best δ 3 3 6 3 7 3 3 5 3 3 3 1 3 3 2 0 2 1 1 1 4 1 4 3 5 2 4 2 0 0 3 3 3 3 3 3 3 3 3 0 0 µ 1 1 1 3 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1 1 1 1 1 1 1 1 0 0 α 1 1 1 4 1 1 4 1 4 2 3 1 2 2 2 2 2 2 3 1 3 1 1 1 1 1 1 1 1 1 1 1 3 2 1 3 2 2 3 1 1 p 1 2 1 1 1 2 1 1 2 2 1 3 1 2 2 2 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 2 4 2 3 1 2 1 1 1 q 5 4 5 5 5 6 5 5 6 6 6 6 5 6 6 6 5 6 6 7 4 5 4 5 5 4 5 5 5 4 3 4 3 3 4 3 4 4 4 2 4 Lower bound 0.4250766244 0.4250766446 0.4250767227 0.4250767590 0.4250767617 0.4250767647 0.4250767733 0.4250767744 0.4250767745 0.4250767745 0.5350147968 0.5350148753 0.5350148814 0.5350149069 0.5350149071 0.5350149136 0.5350149271 0.5350149462 0.5350149525 0.5350149707 0.5350145937 0.5350146612 0.5350147212 0.5350147328 0.5350147619 0.5350147969 0.5350148255 0.5350148449 0.5350148814 0.5350148980 0.4383238232 0.4383243738 0.4383632350 0.4383838005 0.4384647082 0.4384906740 0.4385448358 0.4386655840 0.4387455520 0.4188210386 0.4222689819? Using [3, 7] 0.4248771038 0.4249055702 0.4248771038 0.4248771038 0.4248771038 0.4250636891 0.4248771038 0.4248771038 0.4250636891 0.4250636891 0.5235145644 0.5318753627 0.5160533001 0.5337927416 0.5337927416 0.5337927416 0.5160533001 0.5337927416 0.5235145644 0.5280406048 0.5350144722 0.5350149478 0.5350142142 0.5350149478 0.5350149478 0.5350142142 0.5350149478 0.5350149478 0.5350149478 0.5350142142 0.4347423815 0.4367818624 0.4356897662 0.4364303826 0.4371709990 0.4360537982 0.4367818624 0.4371709990 0.4367818624 0.4101473707 0.4197053158 lower bound. 41 3.5. Numerical results for selected constraints Table 3.4: Lower bounds using optimization (Section 3.4.2). Constraint NAK RWIM RWIMt EVEN⊗2 CHG(3)⊗2 ? Best µ 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 0 0 0 0 0 α 1 2 1 2 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 2 2 2 3 1 1 1 1 1 1 1 1 p 2 1 3 1 4 5 6 3 6 2 1 2 3 1 2 3 1 2 2 1 4 3 5 1 1 1 1 2 1 1 2 1 1 2 3 1 2 1 q 4 5 4 3 4 4 4 3 3 5 3 3 3 4 4 4 5 5 6 5 3 4 3 4 5 4 5 4 4 3 3 4 2 2 2 3 3 4 Lower bound 0.4250767692 0.4250767736 0.4250767737 0.4250767739 0.4250767739 0.4250767740 0.4250767741 0.4250767743 0.4250767744 0.4250767745? 0.5350147515 0.5350148675 0.5350149371 0.5350150805 0.5350151001 0.5350151123 0.5350151372 0.5350151410 0.5350151491 0.5350151497? 0.5350151364 0.5350151377 0.5350151392 0.5350151405 0.5350151442 0.5350151465 0.5350151481 0.5350151482 0.5350151483 0.4395381520 0.4397347451 0.4402086447? 0.4189237100 0.4197037681 0.4201450063 0.4210954837 0.4214748454 Using [3, 7] 0.4249055702 0.4248771038 0.4248960814 0.4224650194 0.4249674993 0.4249783192 0.4249995626 0.4244240822 0.4247979797 0.4250294285 0.4832292495 0.5300373650 0.5212673183 0.5037272248 0.5318663054 0.5265953036 0.5160533001 0.5330440001 0.5337927416 0.5160533001 0.5350130576 0.5350146307 0.5350134356 0.5350142142 0.5350149478 0.5350142142 0.5350149478 0.5350144722 0.5350142142 0.4347423815 0.4356897662 0.4367818624 0.4101473707 0.4182017399 0.4176642274 0.4165892023 0.4210209862 0.4197053158 lower bound. 42 3.6. Open questions 3.6 Open questions The authors of [9, 10, 35] show how the methods of [7] and [3] can be extended to get lower and upper bounds on the capacity of constraints with dimension larger than 2 which are “symmetric in all but one direction”. Similarly, it should be a relatively easy exercise to extend our generalization of the method to obtain better lower bounds to the capacities of such constraints. For a D-dimensional constraint S over Σ, one can define a more general notion than capacity called “weighted-capacity” or “pressure” [10, 11]. First, one assigns a positive weight to each symbol in Σ. Then, the weight of a D-dimensional array Γ over Σ, denoted W(Γ) is defined to be the product of the weights of its entries. The weighted-capacity of S is now given by P Γ∈Sm W(Γ) lim . m→∞ |[m]| As before, the limit exists since the numerator is an entry-wise subadditive function. It would be beneficial to extend the method of this chapter to obtain lower bounds on the weighted-capacity of symmetric constraints. Finally, consider the method of [3] for obtaining upper-bounds on the capacity of 2-dimensional symmetric constraints. While the method gives better upperbounds than those obtained by computing the normalized capacity of a horizontal or vertical strip, empirical results suggest that for many constraints the upperbounds approach the capacity slower than their lower-bound counterparts. Hence, a better method for upper bounding the capacity of symmetric (and general) constraints would be useful. 43 Chapter 4 Exact computation of capacity∗ As already stated, finding the exact capacity of D-dimensional constraints, for D>1, is hard, and the list of constraints for which the capacity is known precisely is quite small. In this chapter we add two families of isotropic multidimensional constraints to this list, namely ODD⊗D and CHG(2)⊗D . Some of the results presented here have been extended to other similar constraints in [22]. 4.1 The capacity of ODD⊗D Theorem 4. For all positive integers D, 1 cap ODD⊗D = . 2 Proof. Let S be the D-dimensional constraint ODD⊗D . We first show D cap(S)≥1/2. For an integer n, let Yn ⊆ [2n] be the set of all vectors in [2n]D whose entries sum to an even number, and let Xn be the set of all binary Ddimensional arrays Γ of size 2n×2n×. . .×2n, with entries satisfying (Γ)j = 0 for all j∈Yn . Then the number of zeros between consecutive ‘1’s, in any row of an array in Xn is odd since it must be of the form i−j−1 for some integers i, j— either both odd, or both even. Thus, all such arrays satisfy the constraint S, and D D D since |Xn | = 2(2n) −|Yn | = 2(2n) /2 , we have |S2n×2n×...×2n | ≥ 2(2n) /2 for all positive integers n, which implies cap(S)≥1/2. It remains to show that cap(S)≤1/2. Since cap(ODD⊗D ) is non-increasing in D, it’s enough to show cap(S)≤1/2 for D = 1. Let n be a positive integer. It can be easily verified that any 1-dimensional array Γ∈ODDn with entries indexed by [n], satisfies either Γj = 0 for all even integers j∈[n], or Γj = 0 for all odd integers j∈[n]. It follows that |ODDn |≤2dn/2e +2bn/2c which implies the desired inequality ∗ A version of this chapter has been published. Louidor, E. and Marcus, B.H. (2010) Improved Lower Bounds on Capacities of Symmetric 2-Dimensional Constraints using Rayleigh Quotients. IEEE Transactions on Information Theory 56:1624–1639. 44 4.2. The capacity of CHG(2)⊗D 4.2 The capacity of CHG(2)⊗D Theorem 5. For all positive integers D, 1 cap CHG(2)⊗D = D . 2 Proof. Let S = CHG(2)⊗D . We first show that cap(S)≥1/2D . Let Γ(0) , Γ(1) be the D-dimensional arrays of size 2×2×. . .×2 with entries indexed by {0, 1}D and given by Γ(i) = (−1)i+j·1 ; j ∈ {0, 1}D , j where as usual 1 denotes the D-dimensional vector with every entry equal to 1. Observe that the sum of every row of both of these arrays is zero. Now, let n be a positive integer. For any D-dimensional array of size n×n×. . .×n with entries in {0, 1}, it can be easily verified that replacing every entry equal to 0 with Γ(0) and every entry equal to 1 with Γ(1) results in a D-dimensional array of size D 2n×2n×. . .×2n that satisfies S. It follows that |S2n×2n×...×2n |≥2n for all positive integers n, which implies cap(S)≥1/2D . (0) We now show that cap(S)≤1/2D . For a positive integer n≥2, denote by Nn (1) the set of all even integers in {0, 1, . . ., n−2} and by Nn the set of all odd integers in {0, 1, . . ., n−2}. We shall make use of the following lemma. n−1 Lemma 3. Fix a positive integer n≥2, and let (ai )i=0 ⊆ {+1, −1} be a sequence of length n. Then a0 . . .an−1 ∈CHG(2) if and only if at least one of the following statements hold. (0) 1. For all i∈Nn , ai =−ai+1 . (1) 2. For all i∈Nn , ai =−ai+1 . Proof. We first show the “if” direction. Let (ai )n−1 i=0 ⊆ {+1, −1} be a sequence for which at least one of statements 1,2 of the lemma holds. Then clearly, for any Pj integers 0≤i≤j<n, all the terms in the sum k=i ak , with the possible exception P of the first and last terms, cancel. Therefore | jk=i ak | ≤ |ai | + |aj | = 2 and a0 . . .an−1 ∈CHG(2). n−1 As for the “only if” direction, let (ai )i=0 ⊆ {+1, −1} such that a0 . . .an−1 ∈ CHG(2), and consider the presentation of the CHG constraint given in Figure 1.1c for b = 2 (and vertices {0, 1, 2}). Let (ei )n−1 i=0 be a path in this presentation generating a0 . . .an−1 . It’s easily verified that if σ(ei ) = 1, for some i∈[n−1], then aj = −aj+1 for all integers i≤j≤n−2 such that j≡i (mod 2). Evidently, either σ(e0 ) = 1 and so statement 1 holds, or σ(e1 ) = 1 implying statement 2. 45 4.2. The capacity of CHG(2)⊗D We now return to the claim that cap(S)≤1/2D . Fix a positive integer n≥2. For an integer 1≤i≤D, let e(i) ∈{0, 1}D , be the vector, indexed by {1, 2, . . ., D}, containing 1 in its ith entry and 0 everywhere else and let Ji ⊆[n]D denote the subset of all the vectors indexed by {1, 2, . . ., D} with a 0 in the ith entry. For a n−1 vector j∈Ji , the sequence j + ke(i) k=0 , is a sequence of indices of entries of a row in direction i of a D-dimensional n×n×. . .×n array, and we shall say that it is a sequence in direction i. Let R(n, D) be the set of all such sequences for all integers 1≤i≤D and vectors j∈Ji , and let r∈{0, 1}|R(n,D)| be a binary vector indexed by R(n, D). For the purpose of this proof, let us refer to a sequence (ai )n−1 i=0 ⊆ {+1, −1} as a phase-0 sequence if statement 1 of Lemma 3 holds, and as a phase-1 sequence if statement 2 holds (note that a sequence may be both a D phase 0 and a phase 1 sequence). Also, we denote by X(r)⊆{+1, −1}∗ , the set of all D-dimensional arrays Γ of size n×n×. . .×n for which the row Γ%̄ is a n−1 phase-r%̄ sequence, for all %̄ = j + ke(i) k=0 ∈R(n, D). Then by Lemma 3, we have [ Sn×...×n = X(r). (4.1) r We shall give an upper bound on the size of X(r). For a vector v∈[n]D , denote by ρ(i, v) the unique sequence in direction i in R(n, D) that has v as one of its elements. Let Tr,i : [n]D →ZD be given by: Tr,i (v) = v + e(i) if vi ≡ rρ(i,v) (mod 2) , v − e(i) otherwise v∈[n]D , v = (v1 , . . ., vD ). Next, we define the undirected graph Gr = (V, Er ) (without parallel edges), with vertices given by V = [n]D , and edges given by Er = u u, v∈V and v = Tr,i (u) v: , for some integer 1≤i≤D where u v denotes the undirected edge connecting vertices u, v. It’s easy to D verify that an array Γ∈{+1, −1}∗ of size n×n×. . .×n is in X(r) iff for every edge u v∈Er , it holds that Γu = −Γv . Figure 4.1 shows an example of Gr for D = 2. Let C1 , . . ., C` be the connected components of Gr , and let v(1) , . . ., v(`) be arbitrary vertices such that v(i) ∈Ci , for i = 1, 2, . . ., `. It follows that for every 46 4.2. The capacity of CHG(2)⊗D 0 1 2 3 4 5 0 1 2 3 4 5 1 1 0 1 1 0 1 1 1 0 1 0 Figure 4.1: Example of the graph Gr for D = 2, n = 6. Each entry of r corresponding to a row (column) is written to the right of it (below it). The index of each row (column) is written to its left (above it). vector b = (b1 , . . ., b` )∈{+1, −1}` , there exists at most one array Γ∈X(r) satisfying Γv(i) = bi for all i∈{1, 2, . . ., `}, and consequently, |X(r)|≤2` (in fact, while this is not needed for the proof, such an array Γ∈X(r) does exist, for any choice of b, since each Ci is bipartite; thus |X(r)| = 2` ). Now, let u = (u1 , . . ., uD )∈([n] \ {0, n−1})D be a vertex in the “interior” of Gr . We show that the connected component of Gr containing u has at least 2D vertices. To this end, we match each word w = w1 w2 . . .wD ∈{0, 1}D , with a D sequence of vertices π (w,j) j=0 ⊆V , defined recursively by π (w,j) if j=0 u (w,j−1) π if j > 0 and wj = 0 . = Tr,j (π (w,j−1) ) if j > 0 and wj = 1 It’s easy to verify that since every 1≤ui ≤n−2, the sequence is well-defined and D indeed (π (w,j) )D j=0 ⊆ [n] . Clearly, the sequence is contained entirely in the connected component containing u, and so this component contains the vertex (w,D) (w,D) π (w,D) . Write π (w,D) =(π1 , . . ., πD ). Then for i=1, 2, . . ., D, it holds that (w,D) (w,D) πi =ui if wi =0, and πi =ui ±1 if wi =1. Therefore, for two distinct words 0 w, w0 ∈{0, 1}D , the vertices π (w,D) and π (w ,D) are distinct as well, and consequently there are 2D such vertices. Thus, the connected component of Gr containing u has at least 2D vertices. It follows that there are at most nD /2D connected components of Gr containing a vertex in {1, 2, . . ., n−2}D . There are at most nD − (n−2)D connected components not containing a vertex in {1, 2, . . ., n−2}D and hence the total number of 47 4.2. The capacity of CHG(2)⊗D connected components, `, in Gr satisfies `≤nD /2D + nD − (n−2)D . Hence, D /2D +nD −(n−2)D |X(r)|≤2n . D−1 binary vectors r∈{0, 1}|R(n,D)| , we obtain from (4.1) X |X(r)| |Sn×n×...×n | ≤ Since there are 2Dn r ≤ 2 nD /2D +nD −(n−2)D +DnD−1 D (1/2D +1−(1−2/n)D )+DnD−1 = 2n and the result follows from (1.1). , 48 Chapter 5 Multi-choice constraints and independence capacity∗ In this chapter we generalize some of the concepts appearing in [37, 38] to more than 1 dimension and to non-binary alphabets. In particular we define the notion of independence capacity of a multidimensional constraint which, roughly, is the contribution to the capacity resulting from independence between symbols in arrays of the constraint. For the binary alphabet {0, 1} this concept coincides with the maximum insertion rate defined in [38]. We also show that for a 1-dimensional constraint S of finite-type with 0 independence entropy, cap(S ⊗D ) converges to 0 exponentially fast as D approaches infinity. 5.1 Multi-choice constraints b denote the set of all nonempty subsets of Σ. Let S be a D-dimensional Let Σ b Σ b m for some m∈ND , define constraint over Σ and for a D-dimensional array Γ∈ b b by the set of possible choices of entries in Γ, denoted Φ(Γ), n o b = Γ∈Σm : For all i∈[m], Γi ∈Γ bi . Φ(Γ) b by We define the multi-choice set corresponding to S, denoted S, n o b Σ b ∗D : Φ(Γ) b ⊆S , Sb = Γ∈ and, as for constraints, we use the notation Sbm , for m∈ND , to denote the set b Note that Sb is closed under taking of of D-dimensional arrays of size m in S. contiguous sub-arrays, and so it’s plausible that Sb is a D-dimensional constraint b Unfortunately, we do not know if this is true for general D-dimensional over Σ. constrained systems S. In this work, we are mostly interested in Sb for a constraint S which is an axial product of some D 1-dimensional constraints. In this case, it ∗ A version of this chapter has been submitted for publication. Louidor, E., Marcus B.H. and Pavlov R. (2010) Independence Entropy of Zd Shift Spaces. 49 5.1. Multi-choice constraints turns out that Sb is indeed a constrained system. This is an easy corollary of the next theorem which shows that for a 1-dimensional constraint S, Sb is indeed a 1dimensional constraint. This is shown in [38] for Σ = {0, 1}. For completeness, we state and prove the theorem for general alphabets. We require the following definitions. Let Σ be a finite alphabet. For words x, y∈Σ∗ and integer n we use the conventional notation of |x|, xy, xn to denote, respectively, the length of x, the word formed by concatenating y to the right of x, and the word formed by concatenating x to itself n times. We use ε to denote the empty word. Let S be a 1-dimensional constraint over Σ. For a word x∈S the follower set of x, denoted F(x) = FS (x) is given by FS (x) = {w∈Σ∗ : xw∈S} . If G = ((V, E), L) is a presentation of S, and for v∈V , we denote by F(v) the set of words generated by paths in G starting at v, then FS (x) = ∪v F(v), where the union is taken over all terminal vertices, v, of paths in G generating x. It follows that the number of follower sets of a constraint is finite. In [38] the authors construct a presentation of Sb when Σ = {0, 1}. We generalize their construction here for arbitrary alphabets and denote it by GbFS = ((VFS , EFS ), LFS ). The set of vertices VFS consists of all (finite) intersections of follower sets of words in S: ( VFS = k \ ) FS (wi ) : w1 , . . ., wk ∈S, k=1, 2, . . . . i=1 For a vertex v∈VFS , v= Tk i=1 FS (wi ), δ(v, b a) = b with b and symbol b a∈Σ a⊆v define k \\ FS (wi a) a∈b a i=1 (note that wi a∈S for every a∈b a and i = 1, 2, . . ., k). It’s easy to verify that δ(v, b a) = {w∈Σ∗ : aw∈v for all a∈b a}, and therefore δ(v, b a) does not depend on the choice of k and w1 , . . ., wk . The set EFS is now defined by EFS = {(v, b a, δ(v, b a)) : v∈VFS , b a⊆v} , and for an edge e = (v, b a, δ(v, b a))∈EFS we define σ(e) = v, τ (e) = δ(v, b a) and LFS (e) = b a. We are now ready to prove the aforementioned theorem. Theorem 6. If S is a 1-dimensional constraint over an alphabet Σ, then Sb is a b and GbF is a presentation of S. b 1-dimensional constraint over Σ S 50 5.2. Independence capacity Proof. The theorem readily follows from the following fact, which is easily verified by induction on the length of w. T b ∗ , there is a path Fact 1. For any vertex v = ki=1 FS (wi )∈VFS , and a word w∈Σ b starting at v and generating Tk T w in G if and only if Φ(w)⊆v, in which case the path ends at the vertex i=1 z∈Φ(w) FS (wi z). b ∗ generated by a path of Gb starting at some vertex v, we Now, for a word x b∈Σ b Conversely, since S = F(ε), it follows from have Φ(b x)⊆v⊆S, and therefore x b∈S. the above fact that there is a path generating any word x b∈Sb (starting from F(ε)) b in G. Let S (1) , . . ., S (D) be 1-dimensional constraints over Σ, and set S = d (1) ⊗. . .⊗S (D) , and we have S (1) ⊗. . .⊗S (D) . Then it’s easy to verify that Sb = Sd the following corollary: Corollary 2. If S (1) , . . ., S (D) are 1-dimensional constraints over Σ, and S = S (1) ⊗. . .⊗S (D) , then Sb is a D-dimensional constrained system. 5.2 Independence capacity In this section, we introduce the notion of “independence capacity” of a constraint. Roughly, this is the part of the capacity resulting from inter-symbol independence in elements of the constraint. b Sb of size m, Let S be a D-dimensional constraint over Σ. For an array Γ∈ b the real number log(|Φ(Γ)|)/|[m]| can be thought of as the contribution to the capacity resulting from independence between entries in elements of S as “capb We define the independence capacity as the limit (as m → ∞) tured” by Γ. of the maximumn possible such contribution. Precisely, observe that the mapping o b b b m → log max |Φ(Γ)| : Γ∈Sm for m∈ND is entry-wise subadditive. Using Lemma 1, we define the independence capacity of S, denoted capind (S), by n o b : Γ∈ b Sbm log max |Φ(Γ)| capind (S) = lim m→∞ |[m]| n o b : Γ∈ b Sbm log max |Φ(Γ)| = inf . (5.1) m |[m]| As already mentioned [38] defines the independence capacity of a 1-dimensional constraint, when Σ = {0, 1}, and calls it the “maximum insertion rate”. It is 51 5.2. Independence capacity shown that the maximum insertion rate of a 1-dimensional constraint S can be b when Σ = {0, 1}. The next theorem shows determined from a presentation Gb of S, the generalization of this result to larger alphabets. Let G = ((V, E), L) be a labeled graph. A cycle (ei )`i=1 ⊆E is simple, if the vertices τ (e1 ), . . ., τ (e` ) are distinct. Theorem 7. Let S be a 1-dimensional constraint over an alphabet Σ, and pick any b Then presentation Gb = ((V, E), L) for S. log |Φ(w)| b capind (X) = max : w b is generated by a simple cycle of Gb . |w| b (5.2) Proof. Denote the RHS of (5.2) by ν ∗ . Let w b∗ be a word generated by a simple cycle of Gb such that ν ∗ = (log |Φ(w b∗ )|)/|w b∗ |. Set ` = |w b∗ |. For n∈N, clearly, the word w b∗n ∈Sbn` , and (log |Φ(w b∗n )|)/(n`) = ν ∗ . It follows that ν ∗ ≤(log max{|Φ(w)| b : w∈ b Sbn` })/(n`). Taking the limit as n→∞, we obtain ∗ ν ≤capind (S). To complete the proof we show that capind (S)≤ν ∗ . We first claim b ∗ is a word generated by a (possibly non-simple) cycle of Gb then that if w∈ b Σ ν ∗≥ log |Φ(w)| b . |w| b (5.3) This is easily proved by induction on |w|. b If |w| b = 1, then the cycle generating w b is simple and obviously (5.3) holds. For |w|>1, b let π = (ei )`i=1 be a cycle of Gb generating w. b Obviously (5.3) holds if π is simple. Otherwise, there exist integers 1≤j<k≤` such that τ (ej )=τ (ek ). So both α = ej+1 , . . ., ek and β = b Let x e1 , . . ., ej , ek+1 , . . ., e` are cycles in G. b, yb denote the words generated by α and β respectively. Then using the induction hypothesis on |b x| and |b y |, we get log |Φ(w)| b |b x| log |Φ(b x)| |b y | log |Φ(b y )| = + |w| b |w| b |b x| |w| b |b y| |b x| ∗ |b y| ∗ ν + ν ≤ |w| b |w| b = ν ∗. Now, for n∈N, let zb(n)∈Sbn be a word such that |Φ(b z (n))| = max{|Φ(w)| b : (n) n−1 b b w∈ b Sn }. Let (ei )i=0 be a path in G generating zb(n). Then using [37, Lemma 13] (and removing “0-length” cycles), it may be decomposed as follows. There exist an integer 0≤m≤|V | and 2m integers 0 ≤ s1 ≤t1 < s2 ≤t2 < . . . < sm ≤tm < n 52 5.3. Independence capacity and axial products (n) tk such that for Smeach k = 1, . . ., m, (ei )i=sk is a cycle, and n− Set X = k=1 {sk , . . ., tk }. Using (5.3), we have P k (tk −sk +1)≤|V |. (n) (n) m X log |Φ(L(e(n) ))| log |Φ(b z (n))| X log |Φ(L(esk ). . .L(etk ))| i = + n n n k=1 i∈[n]\X ! (n) (n) m X log |Φ(L(esk ). . .L(et ))| tk − sk + 1 |V | k + ≤ · log |Σ| tk − sk + 1 n n k=1 ≤ ν∗ + |V | log |Σ|. n The result follows by taking the limit as n→∞ of both sides of the last inequality. We next show that the independence capacity of a constraint cannot exceed its capacity. Theorem 8. Let S be a D-dimensional constraint over Σ then capind (S)≤cap(S). Proof. For m∈ND , let zb(m)∈Sbm be a configuration such that |Φ(b z (m))| = max{|Φ(w)| b : w∈ b Sbm }. Since, Φ(b z (m)) ⊆ Sm , we get (1/|[m]|) log |Φ(b z (m))| ≤ (1/|[m]|) log |Sm |. Taking the limit of both sides as m → ∞, we obtain the result. 5.3 Independence capacity and axial products In this section we show a relation between the independence capacity of an axial product of 1-dimensional constraints to the independence capacities of the 1dimensional constraints. A similar relation holds for the (conventional) capacities. The relation is stated in the next theorem. Theorem 9. Let S (1) , . . ., S (D) be 1-dimensional constraints over Σ. Then the following statements hold: 1. capind (S (1) ⊗. . .⊗S (D) )≤ min capind (S (i) ). i 2. cap(S (1) ⊗. . .⊗S (D) )≤ min cap(S (i) ). i 3. If S (1) =. . .=S (D) =S then capind (S ⊗D ) = capind (S). 53 5.3. Independence capacity and axial products Remark 1. [37] proves Part 3 for binary 1-dimensional constraints. The proof uses the existence of a word w∈ b Sb with capind (S) = log(|Φ(w)|)/| b w| b and w bn ∈Sb for all n∈N. The proof given here does not rely on the existence of such a word. Proof. Part 1. Denote by Sb the multi-choice constraint corresponding to (1) (D) S (1) ⊗. . .⊗S (D) . Fix i∈{1, . . ., D}. For k∈N, let mk = (mk , . . ., mk )⊆ND , (j) (i) be the D-tuple with mk =1, for j∈{1, . . ., D}\{i}, and mk =k. Every array d (i) and vice versa. Hence, (log max{|Φ(w)| b : in Sb is essentially a word in S mk k d (i) })/k. By (5.1) it follows that b : w∈ b S w∈ b Sbmk })/|[mk ]| = (log max{|Φ(w)| k (1) (D) (i) capind (S ⊗. . .⊗S )≤capind (S ), and the theorem follows. Part 2. The proof is similar to the proof of Part 1, so we omit it here. Part 3. Let T =S ⊗D . By Part 1, it’s enough to show capind (T )≥capind (S). Let f :N→N be any function satisfying limi→∞ (f (i)/i)=∞. For i∈N, let mi = (i, i, . . ., i, f (i))∈ND be the D-tuple with every entry but the last equal to i, and the last entry equal to f (i). Set `(i)=(D−1)(i−1)+f (i), and let (i) (i) zb(i) =b z0 . . .b z`(i)−1 ∈Sb`(i) be a word such that |Φ(b z (i) )| = max{|Φ(w)| b : w∈ b Sb`(i) }. b (i) ∈Σ b mi by Define the D-dimensional array Γ b (i) = zb(i) , j∈[mi ], Γ j ψ(j) P where ψ : ZD → Z is given by ψ(j1 , . . ., jD ) = k jk . Observe that every row of b (i) is a (contiguous) sub-word of zb(i) ; it follows that Γ b (i) ∈Tbm . Consequently, Γ i b (i) )| ≤ max{|Φ(Γ)| b : Γ∈ b Tbm }. |Φ(Γ i (5.4) b (i) )|. Set X = {j∈Z : (i−1)(D−1)≤j<f (i)}, and We next lower bound log |Φ(Γ −1 for Y ⊆Z denote by ψ (Y ) = {j∈ZD : ψ(j)∈Y }. Then X b (i) )| = b (i) )| log |Φ(Γ log |Φ(Γ j j∈[mi ] X = (i) ψ −1 ({k})∩[mi ] · log Φ(b zk ) k∈[`(i)] ≥ iD−1 X (i) log |Φ(b zk )| k∈X = iD−1 (i) X log |Φ(b zk )| k∈[`(i)] D−1 ≥i − X (i) log |Φ(b zk )| k∈[`(i)]\X (i) log |Φ(b z )| − 2(D−1)(i−1)iD−1 log |Σ|, 54 5.4. Independence capacity and limD→∞ cap(S ⊗D ) where we used the holds that ψ −1 ({k}) ∩ [mi ] = P fact that for k∈X, it D−1 {(j1 , . . ., jD−1 , k− t jt ) : (j1 , . . ., jD−1 )∈[i] }. Combining the last inequality with (5.4), we have b : Γ∈ b Tbm } log max{|Φ(Γ)| iD−1 log |Φ(b z (i) )| − 2(D−1)(i−1)iD−1 log |Σ| i ≥ |[mi ]| |[mi ]| ≥ log |Φ(b z (i) )| 2(D−1)(i−1) log |Σ| − . `(i) f (i) Taking the limit of both sides as i→∞, we obtain capind (T )≥capind (S). 5.4 S ⊗D ) Independence capacity and limD→∞cap cap(S Combining the fact that cap(S ⊗D ) is non-increasing in D with Theorem 8, we have cap(S ⊗1 )≥cap(S ⊗2 )≥. . .≥capind (S). We denote the limit limD→∞ cap(S ⊗D ) by cap∞ (S). Obviously, cap∞ (S)≥capind (S) and for all the constraints for which we know the value of cap∞ , it turns out to be equal to capind . We list these constraints. 1. RLL(d, k) with k≤2d. For this family of constraints, capind turns out to be 0 [38]. In [20], it is shown that cap(∞) = 0 as well. 2. RLL(d, ∞). For this family of constraints, capind turns out to be 1/(d+1) [38]. [36] shows that 2 1 log (D(d+1)) ⊗D cap(RLL(d, ∞) ) = +O . d+1 D(d+1) Additionally, for d = 1, [29] shows that, for sufficiently large D, 1 1 −2D 1 + ·2 ≤ cap(RLL(1, ∞)⊗D ) ≤ + 2o(D) 2−2D . 2 2 2 3. The 3-checkerboard constraint CHK is defined over the alphabet {0, 1, 2}, and consists of all words in which every 2 adjacent symbols are distinct. The independence capacity of this constraint turns out to be 1/2, and [29] shows that cap∞ (CHK) = 1/2 as well. The equality of cap∞ and capind was first noticed empirically by Chaichanavong and Poo in [37], where they ask if this is true for all 1-dimensional constraints. The following theorem gives a partial answer. 55 5.4. Independence capacity and limD→∞ cap(S ⊗D ) Theorem 10. Let S be a 1-dimensional constraint having finite memory m over Σ with capind (S) = 0. Then cap(S ⊗(D+1) ) ≤ m ⊗D cap(S ). m+1 In particular, cap∞ (S)=0. To prove the theorem we need the following definitions on labeled graphs. For a labeled graph G = (V, E, L) and a subset U ⊆V , we call the graph (U, EU , L|EU ) where EU = {e∈E : σ(e)∈U, τ (e)∈U } the subgraph of G induced by U . For two vertices u, v∈V we say that u is reachable from v if there is a path in G that starts G in u and ends in v. We write u ↔ v if u is reachable from v and v is reachable G G G from u. If u ↔ v does not hold we write u ↔ 6 v. The relation ↔ is an equivalence relation on the vertices of G and the equivalence classes are called the irreducible components of G. For an irreducible component of G we shall sometime also call the subgraph of G that it induces an irreducible component. Obviously, a graph G is irreducible iff it has only one irreducible component. A labeled graph G has memory m, for some nonnegative integer m, if all paths of length m generating the same word terminate at the same vertex. Clearly, if G has memory m, then it also has memory n for any n>m. The following proposition relates memory of a constraint to memory of a graph. Proposition 7. If S is a 1-dimensional constraint with finite memory m, then there exists a presentation G of S with memory m. Proof. We construct the “follower-set graph of S”, G = ((V, E), L), as follows. V = {FS (x) : x∈S}, and for a vertex FS (x)∈V and symbol a ∈ FS (x)∩Σ, define δ(FS (x), a) = FS (xa). It’s easy to verify that δ(·, ·) is well-defined. The set of edges E is now given by E = {(u, a, δ(u, a)) : u∈V, a∈u∩Σ}, and for an edge e = (u, a, v)∈E, σ(e) = u, τ (e) = v and L(e) = a. Clearly, G is deterministic. The following fact is easily verified Fact 2. For every vertex FS (x)∈V , a word w is generated from FS (x) in G if and only if w∈FS (x) in which case the path generating it terminates at FS (xw). As FS (ε) = S it follows from this fact that G is a presentation of S. We show that G has memory m. Let w∈S with |w| = m be generated by some path in G starting from a vertex FS (x), then by the above fact, the path terminates at FS (xw). We claim that FS (xw)=FS (w). If x = ε this is obviously true, so assume x 6= ε. Clearly, FS (xw)⊆FS (w). On the other hand, let y∈FS (w). Note that by our assumption, |xwy|>m and every contiguous sub-word, with length m+1, of xwy is a contiguous sub-word of xw or a contiguous sub-word of wy and 56 5.4. Independence capacity and limD→∞ cap(S ⊗D ) therefore satisfies S. Since S has memory m, this implies that xwy satisfies S as well, and hence y∈FS (xw). Thus FS (w) = FS (xw) as claimed. Therefore all paths generating w terminate at FS (w) and it follows that G has memory m. Proof of Theorem 10. Let G = ((V, E), L) be a presentation of S with memory m, and for a word x∈S with |x|≥m, denote by v(x) the terminal state of any path generating x in G. We will need the next two lemmas. The first shows that if capind (S) = 0, knowledge of long enough prefixes and suffixes of a word in S is often sufficient to determine the middle of the word. G Lemma 4. Let x, y∈S be words of length m such that v(x)↔v(y). Then there is at most one word of the form xay, where a∈Σ, in S. Proof. Assume to the contrary that there are two such words xay, xby∈S where 2m+1 a, b∈Σ and a6=b. Therefore there are two paths in G, (ei )2m+1 i=1 , (fi )i=1 ⊆E generm ating xay and xby respectively. Since the paths (ei )m i=1 and (fi )i=1 both generate x, it follows that σ(em+1 ) = σ(fm+1 ) = v(x). Similarly, since both paths (ei )2m+1 i=m+2 and (fi )2m+1 generate y, it follows that τ (e ) = τ (f ) = v(y). There2m+1 2m+1 i=m+2 2m+1 fore the paths (ei )2m+1 i=m+1 , (fi )i=m+1 both start at v(x), both end at v(y), and generate ay and by respectively. Since, by our assumption, v(x) and v(y) are in the same irreducible component, there is a path generating some word z∈Σ∗ starting 2m+1 at v(y) and ending at v(x). Concatenating this path to the end of (ei )i=m+1 and 2m+1 (fi )i=m+1 we obtain two cycles—both starting and ending at v(x)—one generating ayz and the other generating byz. Consequently, for all k∈N and sequence (ci )ki=1 ⊆{a, b}, there is a cycle in G generating the word c1 yzc2 yz. . .ck yz. Let |z| = `, y = y0 . . .ym−1 and z = z0 . . .z`−1 , where yi , zj ∈Σ for i∈[m], j∈[`], b given by yb = {y1 }. . .{ym−1 } and zb = and denote by yb, zb the words over Σ b But this easily im{z1 }. . .{z`−1 }. It follows that for all k∈N, ({a, b}b y zb)k ∈S. plies that capind (S)≥1/(m+`+1), contradicting our assumption. Let CG denote the number of irreducible components of G. The next lemma bounds the number of appearances of words of the form xay in a certain word of G S, where a∈Σ, |x| = |y| = m and v(x) ↔ 6 v(y). Lemma 5. For `∈N and word z∈S`(m+1)+m , let y (0) , . . ., y (`) ∈Σm and a(0) . . ., a(`−1) ∈Σ be defined by z = y (0) a(0) y (1) a(1) . . .y (`−1) a(`−1) y (`) . Then n o G i∈[`] : v(y (i) )6↔v(y (i+1) ) < CG . 57 5.4. Independence capacity and limD→∞ cap(S ⊗D ) Proof. Assume to the contrary that there are 0≤i1 <i2 <. . .<iCG <` such G that v(y (ij ) )6↔v(y (ij +1) ), for every j∈{1, . . ., CG }. Then the sequence (iCG ) (iCG +1) (i ) (i ) 1 2 v(y ), v(y ), . . ., v(y ), v(y ) has CG +1 vertices and therefore contains two belonging to the same irreducible component, say v(y (s) ) and v(y (t) ), for some integers 0≤s<t≤`, with s∈{i1 , . . ., iCG }. Since the word y (s) a(s) y (s+1) ∈S, it follows that v(y (s+1) ) is reachable from v(y (s) ). Similarly, since y (s+1) a(s+1) y (s+2) a(s+2) . . .a(t−1) y (t) ∈S, it holds that v(y (t) ) is reachable from v(y (s+1) ). But since v(y (t) ) is in the same irreducible component as v(y (s) ), G it follows that v(y (s) ) is reachable from v(y (s+1) ) as well. Thus v(y (s) )↔v(y (s+1) ) which contradicts s∈{i1 , . . ., iCG }. We can now prove the theorem. This part of the proof is a generalization of [20, Lemma 3]. Figure 5.1 illustrates the proof. Let ` be a positive integer. Denote by ` ∈ND the D-tuple with every entry equal to `, and let m` ∈ND+1 be given by m` =(`(m+1)+m, `, `, . . . , `). Set T = S ⊗(D+1) . We will give an upper bound on |Tm` |. Let X be the set [`(m+1)+m]\{i(m+1)+m : i∈[`]} and Y the Cartesian product X×[``]. For an array Γ∈Tm` let Γ|Y : Y → Σ denote the mapping given by Γ|Y (x) = Γx . Let BY = {Γ|Y : Γ∈Tm` } denote the set of all such mappings. For a mapping ∆∈BY we define the set Z(∆)⊆Tm` by Z(∆) = {Γ∈Tm` : Γ|Y = ∆}. Clearly, [ Z(∆) = Tm` . (5.5) ∆∈BY Let ∆∈BY , and fix j∈[``]. For i∈[`+1], let y (i,j,∆) ∈Σm be the word given by y (i,j,∆) = ∆(i(m + 1), j)∆(i(m + 1) + 1, j). . .∆(i(m + 1) + m − 1, j) and for Γ∈Z(∆) let w(Γ,j) ∈Σ`(m+1)+m be the word given by w(Γ,j) = Γ(0,j) Γ(1,j) . . .Γ(`(m+1)+m−1,j) . Note that for such Γ, since Γ∈Tm` , w(Γ,j) ∈S, and, since Γ|Y = ∆, we may write w(Γ,j) = y (0,j,∆) a(0,j,Γ) y (1,j,∆) a(1,j,Γ) . . .y (`−1,j,∆) a(`−1,j,Γ) y (`,j,∆) , where a(i,j,Γ) = Γ(i(m+1)+m,j) for G (i,j,∆) (i+1,j,∆) v(y )↔v(y ) then, by Lemma i∈[`]. Now, if i∈[`] such that 4, all Γ∈Z(∆) have the same a(i,j,Γ) . On the other hand, by Lemma 5, since Z(∆) 6= ∅, it holds that |{i∈[`] : G v(y (i,j,∆) )6↔v(y (i+1,j,∆) )}|<CG . It follows that |{w(Γ,j) : Γ∈Z(∆)}|≤|Σ|CG , and consequently Y D |Z(∆)| ≤ |{w(Γ,j) : Γ∈Z(∆)}|≤|Σ|CG ` . (5.6) j∈[``] 58 5.4. Independence capacity and limD→∞ cap(S ⊗D ) Since for any ∆∈BY , and i∈X, the D-dimensional array Λ(∆, i)∈Σ` with entries given by (Λ(∆, i))j = ∆(i, j) is clearly in S ⊗D , it follows that |BY |≤|(S ⊗D )` ||X| . Combining this with (5.6) and (5.5), we have X D |Z(∆)| ≤ |Σ|CG ` |(S ⊗D )` |(`+1)m . |Tm` | = ∆∈BY Taking the logarithm of both sides and dividing by |[m` ]|, we obtain CG log |Σ| log |Tm` | (` + 1)m log |(S ⊗D )` | + ≤ |[m` ]| `(m + 1) + m `D `(m + 1) + m The theorem follows by taking the limit as `→∞. 59 5.4. Independence capacity and limD→∞ cap(S ⊗D ) ` m y (`,j,∆) a(`−1,j,Γ) y (`−1,j,∆) a(1,j,Γ) y (1,j,∆) a(0,j,Γ) y (0,j,∆) j Figure 5.1: Proof of Theorem 10. As an application of Theorem 10, consider the the family of multiple-spaced runlength constraints. Each of these constraints is denoted RLL(d, k, s) with d,k and s nonnegative integers and d ≤ k. It is defined over the alphabet {0, 1} and consists of all words in RLL(d, k) for which the length of each run of ‘0’s delimited by ‘1’s on both ends is a multiple of s. Fix such integers d, k, and s with s ≥ 2, and let S = RLL(d, k, s). It can be verified that each word of Sb has at most two 60 5.5. Open questions letters equal to {0, 1}, and it follows that capind (S) = 0. As the memory of S is (at most) k, by Theorem 10, we have the following corollary. Corollary 3. Let d, k, s be nonnegative integers such that d≤k and s≥2. Then ⊗D cap(RLL(d, k, s) 5.5 )= k k+1 D−1 ·cap(RLL(d, k, s)). Open questions Is it true that cap∞ (S) = capind (S) for every 1-dimensional constraint S? For a 1-dimensional constraint S, what can be said about the rate of convergence of cap(S ⊗D ) to cap∞ (S)? Finally, is Sb a D-dimensional constraint, for every Ddimensional constraint S, when D≥2? 61 Chapter 6 The tradeoff function for binary 1-dimensional constraints∗ This chapter deals with the tradeoff function for 1-dimensional binary constraints. As mentioned in Chapter 1, the tradeoff function for a 1-dimensional constraint evaluated at 0 equals the constraint’s capacity and thus is a more general notion than capacity. The motivation for defining this function comes from the application of 1-dimensional constraints in digital storage systems. We describe the motivation in more detail in the next section. Later on, we give the precise definition of the tradeoff function for a general 1-dimensional binary constraint. The rest of the chapter shows our computation of this function for two families of RLL(d, k) constraints. 6.1 A brief overview of digital recording In digital storage systems, user data is written to the device in the form of a binary sequence. Typically, not every binary sequence may be written reliably to the device and therefore, only a subset of all possible sequences is “allowed” to be written. The set of all the “allowed” sequences is usually modeled as a 1-dimensional binary constraint. In practice, limiting the written sequence to this “allowed” set is often not sufficient to guarantee the required reliability, and an error-correctingcode or ECC is used as well. Consequently, user data (represented as an arbitrary stream of ‘0’s and ‘1’s) is encoded twice before written to media. First the data is encoded to a codeword of an error-correcting-code or ECC and then the resultant codeword is encoded to an “allowed sequence” of some 1-dimensional binary constraint. In this context, the constraint is sometimes called a “modulation code” and the encoding of the ECC codeword to a constrained sequence is known as “modulation encoding”. When reading back the data the process is reversed: the data read from the device is first decoded by the modulation code decoder and then the ECC decoder is used to recover the source data, attempting to correct any errors ∗ A version of this chapter has been submitted for publication. Louidor, E. (2010) The Tradeoff Function for a Class of RLL(d, k) Constraints. 62 6.1. A brief overview of digital recording that may have occurred when the data was read. Roughly speaking, the rate of a modulation encoder is the ratio between the length of its input to the length of its output; it is typically strictly smaller than 1. In storage systems it is desirable that the rate be as high as possible to maximize storage. This scheme suffers from a couple of disadvantages. First, a small number of errors that are present after reading the data from the device may turn into a burst of errors at the output of the modulation decoder, which may overwhelm the ECC decoder. Second, since the modulation code decoder is typically a “hard” decoder—meaning that it outputs “hard” ‘0’s or ‘1’s rather than probabilities or likelihoods—any soft or probabilistic information that might have been available after reading the data from the device is not readily available to the ECC decoder, thereby limiting its error correction capability. In [40], [5], [38] and the references therein, several encoding schemes are proposed to overcome these disadvantages. Here, we focus on one of these schemes, in which the order of the two encodings mentioned above is reversed. The source data is first encoded with a modulation code into a constrained sequence, but instead of using the original constraint, we encode it to a sequence of the multi-choice constraint corresponding to the original constraint, where we use 0, 1, and in place b respectively. The entries containing the of the symbols {0}, {1}, and {0, 1} of Σ, ‘’s are “unconstrained” in the sense that replacing (or “filling”) them with any values in {0, 1} independently would result in a sequence satisfying the constraint. Next, a systematic ECC with a suitable redundancy is applied, placing the redundancy (parity-check) bits in these unconstrained positions. Clearly, this addresses both of the disadvantages listed above. In this scheme, since the error correction capability of the ECC depends on the number of redundancy bits, it is desirable that the number of unconstrained positions be as large as possible. On the other hand, increasing the number of unconstrained positions at the output of the modulation encoder naturally reduces its rate, as no user information is encoded in the unconstrained positions. In [38] the authors study the tradeoff function that defines for a given “density” of unconstrained positions, called the insertion-rate, the maximum overall rate of the encoding; knowing this function is obviously important to the design of efficient digital storage systems employing this scheme. Currently, there are only very few constraints for which the tradeoff function has been computed explicitly. As mentioned in Chapter 1, the RLL(d, k) constraint is widely used in digital storage systems employing optical or magnetic recording. Another constraint used in practice is the maximum transition run or MTR constraint, denoted MTR(j, k) for some nonnegative integers j, k. This constraint consists of all binary sequences in which the length of each run of ‘1’s is at most j and the length of each run of ‘0’s is at most k. More details on these constraints as well as other constraints used in 63 6.2. Previous work practice may be found in [18] and [32]. We give a precise definition of the tradeoff function in Section 6.3. 6.2 Previous work In [5], the tradeoff functions for RLL(0, 1) and RLL(0, 2) are determined. In [37], Poo computed the tradeoff function for RLL(0, 3) for insertion rates between 0 and 1/4, and the tradeoff function for RLL(d, 2d + 1) for any d. In [38] the authors compute the tradeoff function for MTR(2, 2). For completeness, we present these functions in Theorem 11. Lower bounds on the tradeoff function for RLL(0, k) are given in [19] and [21]. In this chapter, we determine the tradeoff functions for two other families of constraints: RLL(d, 2d+2), and RLL(d, ∞). Our results are stated precisely in Theorems 12 and 13. For RLL(d, 2d+2), we find a curious dichotomy in the shape of the tradeoff function between different ranges of values of d. The function is always piecewise linear; yet it consists of 2 linear “segments” for 1≤d≤16 and 3 segments for d≥17. This chapter is organized as follows. In Section 6.3 we define the tradeoff function and related concepts as well as summarize some of its known properties. We also state the previously known tradeoff functions and our new results. In Section 6.4 we show the derivation of the tradeoff function for RLL(d, ∞) and in Section 6.5 we show the derivation of the tradeoff function for RLL(d, 2d+2). 6.3 Background and definitions For the rest of this chapter, fix Σ = {0, 1}. Let S be a 1-dimensional constraint b Φ, Sb and GbF be as defined in Chapter 5. As already stated, in this over Σ and Σ, S chapter we use ‘0’, ‘1’ and ‘’ in place of {0}, {1} and {0, 1}, respectively. So b = {0, 1, }, and Φ(x), for x∈Σ b ∗ , can be thought of as the set of all possible Σ “fillings” of the ‘’s of x with bits. We also sometimes omit the subscript FS from b ∗ , let # (w) denote the number of GbFS to simplify notation. For a word w ∈ Σ ‘’s in w. Observe that log |Φ(w)| = # (w) and hence max{# (w) : w∈Sbm } ; m→∞ m capind (S) = lim b In this so capind (S) is the asymptotic maximum density of ’‘s in words of S. chapter we use the following notation for sequences. For a set T and nonnegative integer n we denote a sequence b1 , . . ., bn of n elements of T by (bi ), and index its elements by {1, 2, . . ., n}. We refer to n as the length of the sequence 64 6.3. Background and definitions and denote it by |(bi )|. We abuse notation and write (bi )⊆T to mean that bi ∈T for all i∈{1, 2, . . ., |(bi )|}. For two sequences (bi ), (di )⊆T we denote by (bi )(di ) the sequence formed by concatenating the sequences (bi ) and (di ), that is the sequence b1 , b2 , . . ., b|(bi )| , d1 , d2 , . . ., d|(di )| . For a nonnegative integer m, the notation (bi )m is used to denote the sequence formed by concatenating (bi ) to itself m times (as usual (bi )0 is the empty sequence). We shall also consider infinite sequences b1 , b2 , . . . with elements in T and denote such sequences by (bi )∞ i=1 . For a sequence (Mi ) of m real nonnegative square matrices all having the same size, we Q write λ((Mi )) to mean λ( i Mi ). Let S be a constraint over Σ. For a positive integer n and a subset I⊆[n] we define M(I, n) = MS (I, n) = |{w0 w1 . . .wn−1 ∈Sbn : ∀i∈[n], wi = ⇔ i∈I}|. Also, for any real number ρ∈[0, 1], define the set Iρ by |Ij | ∞ Iρ = (Ij )j=1 : Ij ⊆[j] for all j, and lim =ρ . j→∞ j Then the tradeoff function of S, fS : [0, 1]→[0, 1]∪{−∞} is given by fS (ρ) = sup lim sup (Ij )∈Iρ j→∞ log MS (Ij , j) . j A 1-dimensional constraint is irreducible if it has an irreducible presentation. A graph (labeled graph) is called trivial if it has exactly one vertex and no edges. Every RLL(d, k) constraint is irreducible. In [38] it is shown that if S is an irreducible finite-type constraint (that has infinitely many words), then GbFS has exactly one non-trivial irreducible component. Here, we denote this comb? , Lb? ). For a subset Q⊆Σ, b let E b?Q denote the subset of ponent by Gb? = (Vb? , E b? consisting of the edges whose label is in Q. We denote by Gb?{0,1} the subE {0,1} b?{0,1} , Lb? | {0,1} ) and by Gb?{} the subgraph of graph of Gb? given by Gb? =(Vb? , E b E ? {} b?{} , Lb? | {} ). We define A{0,1} (S) = A(Gb?{0,1} ) and Gb? given by Gb? =(Vb? , E b E {} ? A{} (S) = A(Gb? ). Let (Mi )⊆{A{0,1} (S), A{} (S)} be a sequence of length b? of Gb? matches (Mi ) if it has length n, and for n. We say that a path (ei )⊆E {} b b every Q 1≤i≤n, L? (ei )= iff Mi =A b(S). Note that for any s, t∈V? , the entry ( i Mi )(s,t) is the number of paths in G? starting at s, ending at t and matching (Mi ). For a finite sequence (Mi )⊆{A{0,1} (S), A{} (S)}, we denote by %((Mi )) the density of A{} (S) in (Mi ), namely 1≤i≤|(Mi )| : Mi =A{} (S) %((Mi )) = . |(Mi )| Let S be a 1-dimensional constraint over Σ. We list the following known facts about the tradeoff function fS . 65 6.3. Background and definitions • For 0≤ρ≤capind (S), fS (ρ)≥0, and for capind (S)<ρ≤1, fS (ρ) = −∞. • fS (0) = cap(S). • fS is decreasing in [0, capind (S)]. Moreover, for any 0≤ρ1 <ρ2 ≤capind (S), fS (ρ1 )−fS (ρ2 )≥ρ2 −ρ1 . • fS is left-continuous in [0, capind (S)]. • If S is irreducible and finite-type then fS is concave and continuous in [0, capind (S)]. Furthermore, for all rational ρ∈[0, 1] log λ((Mi )) , |(Mi )| (Mi ) fS (ρ) = sup (6.1) where the sup is taken over all sequences (Mi )⊆{A{0,1} (S), A{} (S)} with %((Mi ))=ρ. • For integers 0≤d≤k b(k−d)/(d+1)c , b(k+1)/(d+1)c(d+1) 1 . capind (RLL(d, ∞)) = d+1 capind (RLL(d, k)) = and (6.2) (6.3) See [38] for proofs. As mentioned in the introduction, there are a few constraints for which the tradeoff function has been computed explicitly. These are summarized in the next theorem, along with the references to the respective papers. For a finite sequence (xi )⊆R2 of points such that xi = (xi , yi ) and x1 <x2 <. . .<xn , we define L(xi ) : [x1 , xn ]→R to be the function whose graph is the piecewise linear curve connecting these points in sequence; namely, the function that satisfies L(xi ) (x) = yi+1 −yi (x−xi )+yi , xi ≤x≤xi+1 , xi+1 −xi for all 1≤i<n. Theorem 11. Let S (1) = RLL(0, 1), S (2) = RLL(0, 2), S (3) = RLL(0, 3), S (4) = RLL(d, 2d+1), and S (5) = MTR(2, 2). Let fi be the function fS (i) restricted to [0, capind (S (i) )]. Then the following statements hold: 1. f1 =L(0,cap(S (1) )),( 1 ,0) (shown in [5]). 2 66 6.3. Background and definitions 2. f2 =L(0,cap(S (2) )),( 1 , 2 cap(S (1) )),( 2 ,0) (shown in [5]). 3 3 3 3. f3 (ρ)=L(0,cap(S (3) )),( 1 , 3 cap(S (2) )) (ρ), for 0≤ρ≤ 14 (shown in [37]). 4 4 4. f4 =L(0,cap(S (4) )),( 1 ,0) 2(d+1) (shown in [37]). 5. f5 =L(0,cap(S (5) )),( 1 ,0) (shown in [38]). 3 We now state our new results. Theorem 12. Let d be a nonnegative integer and S = RLL(d, ∞). Set 1 , 0 = (capind (S), 0). p1 = (0, cap(S)) , p2 = d+1 Then fS (ρ) = Lp1 ,p2 (ρ), for ρ∈[0, capind (S)]. Theorem 13. Let d be a positive integer and S = RLL(d, 2d + 2). Set 3 log 3 p1 = (0, cap(S)) , p2 = , , 6d+8 6d+8 2 1 2 p3 = , , p4 = , 0 = (capind (S), 0) . 4d+5 4d+5 4d+4 Then the following statements hold: 1. If 1≤d≤16 then fS (ρ) = Lp1 ,p3 ,p4 (ρ), for ρ∈[0, capind (S)]. 2. If 17≤d then fS (ρ) = Lp1 ,p2 ,p3 ,p4 (ρ), for ρ∈[0, capind (S)]. The proofs are given in the next sections. The graphs of the tradeoff functions for RLL(d, ∞) and RLL(d, 2d+2) for ρ∈[0, capind (S)] are sketched in Figure 6.1. We need some properties of nonnegative matrices which we summarize here. If (Mi0 ) is formed by cyclically shifting (Mi ) (that is, there exists an integer o such that for all i, Mi0 =M((i−1+o) mod m)+1 , where for an integer Q j, j mod Qm denotes the unique integer k∈[m] such that k≡j ( mod m)) then i Mi and i Mi0 have the same characteristic polynomial ( [33, 2.15.15]); in particular, λ((Mi0 ))=λ((Mi )). If M and N are two nonnegative square matrices with M ≤N then λ(M )≤λ(N ) 67 6.3. Background and definitions ( [33, 5.7.5]). The support graph of an m×m nonnegative matrix M denoted GM = (VM , EM ) is the directed graph with vertices VM = {1, 2, . . ., m} and edges EM ={(i, j)∈VM ×VM : Mi,j >0}, where for an edge e = (i, j)∈EM , σ(e) = i and τ (e) = j. Such a matrix is called primitive if its support graph is primitive. For a primitive matrix M , the limit limg→∞ (M g )/(λ(M )g ) (where the limit is taken entry-wise), exists and is strictly positive in each entry ( [33, 5.9.7]). fS (ρ) cap(S) ρ 0 1 d+1 (a) fS (ρ) fS (ρ) cap(S) cap(S) 2 1 4d+5 , 4d+5 ρ 0 1 2d+2 (b) 3 , log 3 6d+8 6d+8 2 1 4d+5 , 4d+5 ρ 0 1 2d+2 (c) Figure 6.1: The graphs of fS (ρ) for ρ∈[0, capind (S)] (not to scale): (a) S=RLL(d, ∞); (b) S=RLL(d, 2d+2) and 1≤d≤16; (c) S=RLL(d, 2d+2) and 17≤d. 68 6.4. Proof of Theorem 12 6.4 Proof of Theorem 12 The proof is similar to the proof of [37, Proposition 45]. Set Cd = cap(S), and b GbF =(Vb , E, b L) b be the presentation of Sb defined in Section 6.3. Note that let G= S S is irreducible and of finite-type. We denote by ai the follower set FS (10i ) for 0≤i≤d. It can be verified that Vb = {a0 , a1 , . . ., ad }. The graph Gb is given in Figure 6.2. 0 a0 0 0 0 ... a1 ad 1 Figure 6.2: The graph GbFRLL(d,∞) . Clearly, Gb? = GbFS in this case. Let A = A{0,1} (S), B = A{} (S), and set C=BAd and hd = Lp1 ,p2 . For a sequence (Ni )⊆{A, C} we denote by ε((Ni )) the “expanded” sequence, with elements in {A, B}, formed by substituting the sequence (B)(A)d for every element C in (Ni ). It follows from (6.3) and the rest of the discussion in Section 6.3 that fS (0) = hd (0) and fS (ρ)≥hd (ρ) for ρ∈(0, capind (S)]. Hence it’s enough to show fS (ρ)≤hd (ρ) for ρ∈(0, capind (S)]. Since both fS and hd are continuous in that interval, it’s enough to show the latter inequality for all rational ρ∈(0, capind (S)]. Let ρ be such a rational. By (6.1) it suffices to show that for any sequence (Mi )⊆{A, B} with %((Mi ))=ρ, we have log λ((Mi )) ≤ hd (%((Mi ))). |(Mi )| (6.4) Let (Mi ) be such a sequence. Set m=|(Mi )| and consider the sequence (Xi )=(Mi )2 . Note that in any path of Gb the number of edges between a pair of edges labelled with a ‘’ must be at least d. It follows that if there exist integers 1≤i<j≤2m, with such that Xi =Xj =B, then no paths of Gb match (Xi ) Qj−i−1<d, m 2 or equivalently ( i=1 Mi ) =0. The latter equality implies λ((Mi ))=0, and therefore (6.4) holds. So assume no such integers exist. It can be verified that in this case we may cyclically shift (Mi ) such that (Mi )=ε((Ni )) for some (Ni )⊆{A, C} with N1 =C; such a cyclic shift does not change either side of (6.4). We denote |(Ni )| 69 6.4. Proof of Theorem 12 by n. Now, it’s easy to verify that there is exactly one path matching (B)(A)d and it starts and ends at ad ; hence the only nonzero entry of C is (C)(ad ,ad ) and it is equal to 1. It follows that C k = C, for k = 1, 2, . . . for any nonnegative |Vb |×|Vb | matrix Q. CQC ≤ C 2 Q, (6.5) (6.6) Let s be the number of elements in (Ni ) equal to C; clearly, s=mρ>0. Let (Ni0 )=(C)s (A)n−s ; then by (6.5) and (6.6), we have λ((Mi ))=λ((Ni ))≤λ((Ni0 ))=λ(CAn−s ). We order the entries of (CAn−s ) as follows. For every i, j∈[d] we place the element (CAn−s )(ai ,aj ) in the ith row and jth column. Observe, that using this ordering, the matrix CAn−s is lower triangular with (CAn−s )(i,i) =0 for 0≤i<d. It follows that λ(CAn−s )=(CAn−s )(ad ,ad ) = (An−s )(ad ,ad ) . Let Γ be the set of all paths in Gb starting and terminating in ad and matching (A)n−s , and for a nonnegative integer g define the set ∆(g)⊆S by ∆(g) = w∈Sg : w does not end with 10i , 0≤i≤d−1 . Then it’s not hard to check that ∆(n−s) is precisely the set of words generated by paths in Γ. Since Gb is deterministic, we have that |Γ|=|∆(n−s)|, so λ((Ni0 ))=|∆(n−s)|. Now, observe that for g≥d, any word in Sg−d can be extended to a word in ∆(g) by adding ‘0’s; thus for all g≥d, |Sg−d |≤|∆(g)|≤|Sg |. It follows that log |∆(g)| lim =Cd . g→∞ g On the other hand, note that for any nonnegative integers g1 , g2 and words w∈∆(g1 ), x∈∆(g2 ), the word wx∈∆(g1 + g2 ); it follows that log |∆(·)| is superadditive, and therefore by Lemma 1 we have log |∆(g)| log |∆(g)| = sup . g→∞ g g g≥1 lim In particular, log |∆(n − s)| ≤Cd , n−s 70 6.5. Proof of Theorem 13 and therefore, log λ((Mi )) log λ((Ni0 )) ≤ |(Mi )| m log |∆(n−s)| n−s = n−s m n−s m − s(d + 1) ≤ Cd =Cd m m = Cd (1 − ρ(d + 1)) = hd (ρ). This completes the proof. 6.5 Proof of Theorem 13 In this section we prove Theorem 13. As the proof is rather involved, we show an outline of the proof in Section 6.5.1, relying on several propositions whose proofs we defer to Section 6.5.2. Throughout this section, S, p1 , . . ., p4 are as defined in the statement of the theorem, and we set Cd = cap(RLL(d, 2d+2)). Note that S is b? , Lb? ) denote the unique non-trivial irreducible and of finite-type; let Gb? = (Vb? , E b component of GFS . Then it can be verified that Gb? is the subgraph of Gb induced by Vb? , where Vb? = FS (10i ) : 0≤i≤2d+2 ∪ n o FS (10i ) ∩ FS (10i+d+1 ) : 0≤i≤d−1 ∪ n o i i+d+2 FS (10 ) ∩ FS (10 ) : 0≤i≤d−1 . For the purpose of this proof we use the abbreviations ai = FS (10i ), 0≤i≤2d+2, i i+d+1 ), 0≤i≤d−1, and i i+d+2 ), 0≤i≤d−1. bi = FS (10 ) ∩ FS (10 ci = FS (10 ) ∩ FS (10 The graph Gb? is shown in Figure 6.3. Let A = A{0,1} (S), B = A{} (S) and set C = Ad BAd+1 ; we index the entries of C by Vb?2 . For a sequence of matrices (Mi )⊆{A, C} we denote by ε((Mi )) the sequence with elements in {A, B} formed by substituting the sequence (A)d (B)(A)d+1 for every element C in (Mi ). Let hd : [0, capind (S)]→[0, 1] be given by Lp1 ,p3 ,p4 if 1≤d≤16 hd = . Lp1 ,p2 ,p3 ,p4 if 17≤d 71 6.5. Proof of Theorem 13 0 0 ... c0 cd−1 0 a0 a1 0 0 0 ... 0 ... b0 1 1 bd−1 0 0 0 ad+1 ad 0 ... a2d 1 0 0 0 a2d+1 a2d+2 1 1 Figure 6.3: The non-trivial component of GbFRLL(d,2d+2) . 6.5.1 Outline of proof Let i, j∈Vb? . Since there are only 3 paths in Gb? matching (A)d (B)(A)d+1 : a path starting at vertex a0 and ending at vertex a2d+2 , a path starting at vertex a0 and ending at vertex a0 and a path starting at vertex a1 and ending at vertex a0 , it follows that the entries of C and C 2 are given by 1 if i = a0 and j∈{a0 , a2d+2 } 1 if i = a1 and j = a0 C(i,j) = , i, j∈V, (6.7) 0 otherwise 1 if i∈{a0 , a1 } and j∈{a0 , a2d+2 } , i, j∈V. (6.8) (C 2 )(i,j) = 0 otherwise The following facts easily follow: Fact 3. C 2 ≥ C. Fact 4. For all integers k≥2, C k = C 2 . Fact 5. C 2 = cr, where c and r are the column and row vectors, respectively, of size |Vb? | with entries indexed by Vb? and given by 1 if i ∈ {a0 , a1 } (c)i = 0 otherwise , i∈Vb? . 1 if i ∈ {a0 , a2d+2 } (r)i = 0 otherwise 72 6.5. Proof of Theorem 13 Hence C 2 is a (nonnegative) rank-1 matrix. The following proposition shows how to compute the Perron eigenvalue of such a matrix. Proposition 8. Let M be an m×m real nonnegative matrix of the form M = ab with a and b column and row vectors of size m, respectively. Then the following statements hold: 1. For any real nonnegative m×m matrix N , λ(M N ) = bN a. 2. For any real nonnegative m×m matrices N1 , N2 , λ(M N1 M N2 )=λ(M N1 )λ(M N2 ). Now, observe, that by (6.2), capind (S)=1/(2d+2); hence it follows from the discussion in Section 6.3 that and fS (0)=hd (0), (6.9) fS (1/(2d+2)) ≥ hd (1/(2d+2)). (6.10) So it’s enough to show fS (ρ)=hd (ρ) for all ρ∈(0, capind (S)]. The following proposition characterizes hd . Proposition 9. hd : [0, 1/(2d + 2)] → R is the smallest function satisfying: 1. hd is concave. 2. hd (0)≥Cd . 3. hd (3/(6d + 8))≥ log 3/(6d + 8) 4. hd (2/(4d + 5))≥1/(4d + 5) 5. hd (1/(2d + 2))≥0 We first show that for every ρ∈[0, capind (S)], fS (ρ)≥hd (ρ). Consider the two sequences of matrices (Yi )=ε((C, C, A)) and (Zi )=ε((C, C, A, C, A)). Clearly, 2 , 4d+5 3 %((Zi )) = . 6d+8 %((Yi )) = (6.11) (6.12) 73 6.5. Proof of Theorem 13 Now, let c, r be the vectors defined in Fact 5; by Proposition 8 and Fact 5, λ((Yi )) = λ(C 2 A) = rAc X A(i,j) = 2, = (6.13) i∈{a0 ,a2d+2 }, j∈{a0 ,a1 } and λ((Zi )) = λ(C 2 ACA) = rACAc. {0,1} Observe that for i∈Vb? , the entry (rA)i is the number of paths of length 1 in Gb? that begin in either a0 or a2d+2 and end at i. It follows that rA = ct (where ct is the transpose of c). Using (6.7), we have λ((Zi )) = ct CAc = 2A(a0 ,a0 ) + 2A(a0 ,a1 ) + A(a2d+2 ,a0 ) + A(a2d+2 ,a1 ) = 3. (6.14) By (6.1), for any sequence (Xi )⊆{A, B}, fS (%((Xi )))≥λ((Xi ))/|(Xi )|; hence by (6.11), (6.12), (6.13), and (6.14) above, we get fS (2/(4d+5)) ≥ 1/(4d+5), (6.15) fS (3/(6d+8)) ≥ (log 3)/(6d+8). (6.16) Since fS is concave in [0, capind (S)], equality (6.9), and inequalities (6.10), (6.15), and (6.16) imply that fS (ρ)≥hd (ρ) for all ρ∈[0, capind (S)]. In the remainder of this section we show fS (ρ)≤hd (ρ) for ρ∈(0, capind (S)]. Since both fS and hd are continuous in this interval, it’s enough to show fs (ρ)≤hd (ρ) for all rational ρ∈[0, capind (S)]. Let ρ be such a rational. By (6.1), it’s enough to show that for all finite sequences (Mi )⊆{A, B} with %((Mi ))=ρ, we have log λ((Mi )) ≤ hd (%((Mi ))). (6.17) |(Mi )| Let (Mi ) be such a sequence. Set n=|(Mi )| and consider the sequence (Xi )=(Mi )2 . Note that in any path of Gb? the number of edges between a pair of consecutive edges labelled with a ‘’ must be at least 2d+1. It follows that if there exist nonnegative integers 1≤i<j≤2n, with j−i−1<2d+1, such that Xi =Xj =B, Q then no paths of Gb? match (Xi ) or equivalently ( ni=1 Mi )2 =0. This implies 74 6.5. Proof of Theorem 13 λ((Mi )) = 0, and therefore (6.17) holds. So assume no such integers exist. It can be verified that in this case we may cyclically shift (Mi ) such that (Mi )=ε((Ni )) for some (Ni )⊆{A, C}; such a cyclic shift does not change either side of (6.17). Since we assumed ρ>0, the sequence (Ni ) must have at least one element equal to C. If every element of (Ni ) is equal to C, then ρ = capind (S) and λ((Ni )) = λ(C |(Ni )| ) |(Ni )|/2 = λ(C 2 ) = (rc)|(Ni )|/2 = 1, where c, r are the vectors defined in Fact 5 and we used Proposition 8. Therefore (6.17) holds with equality in this case. So we assume (Ni ) has an element equal to C and an element equal to A. By cyclically shifting (Mi ) and (Ni ), if necessary, we may assume (Ni ) is either of the form or the form (C)s1 (A)g1 (C)s2 (A)g2 . . .(C)sk (A)gk , k≥1, g1 , . . ., gk ≥1, s1 ≥2, and s2 , . . ., sk ≥1, (6.18) (C)(A)g1 (C)(A)g2 . . .(C)(A)gk , k≥1 and g1 , . . ., gk ≥1. (6.19) We claim it’s enough to show that (6.17) holds for (Mi ) = ε((Ni )), where (Ni ) is of the form (6.18). Indeed, assume that (6.17) holds for all sequences (Mi ) = ε((Ni )) such that (Ni )⊆{A, C} is of the form (6.18), and let (Mi )=ε((Ni )) with (Ni ) a sequence of the form (6.19). Pick a positive integer m, and set (Xi )=ε((C)(Ni )m ). Then log λ((Mi )) log λ((Mi )m ) log λ((CAg1 CAg2 . . .CAgk )m ) = = |(Mi )| m|(Mi )| m|(Mi )| g g g 1 2 k log λ(C(CA CA . . .CA )m ) ≤ (6.20) m|(Mi )| log λ((Xi )) |(Xi )| = |(Xi )| m|(Mi )| |(Xi )| ≤ hd (%((Xi ))) (6.21) m|(Mi )| 1+m|(Mi )|%((Mi )) 2d+2+m|(Mi )| = hd 2d+2+m|(Mi )| m|(Mi )| where (6.20) follows from Fact 3 and (6.21) follows from our assumption, as (C)(Ni )m is of the form (6.18). Since hd is continuous, taking the limit of the RHS as m approaches infinity, we get that (6.17) holds for (Mi ). 75 6.5. Proof of Theorem 13 So henceforth, we assume (Ni ) is of the form (6.18). We now further transform (Ni ) by reducing runs of C elements with lengths greater than 2; that is, we (possibly) change (Ni ) to be the sequence 0 0 0 (C)u (C)2 (A)g1 (C)s2 (A)g2 (C)s3 (A)g3 . . .(C)sk (A)gk , P where each s0j = min{2, sj } and u= j (sj −s0j ). We also update (Mi ) so that it still satisfies (Mi )=ε((Ni )). Clearly this does not change the RHS of (6.17) and by Fact 4 the LHS remains the same, as well. Now, the sequence (Ni ) may (1) (2) (m) (j) be rewritten as (Ni )=(C)u (Oi )(Oi ). . .(Oi ), where each (Oi )⊆{A, C} is given by (j) gj,kj (Oi )=(C)2 (A)gj,1 (C)(A)gj,2 (C)(A)gj,3 . . .(C)(A) kj ≥1, and gj,1 , . . ., gj,kj ≥1. , (6.22) Q Q (j) Observe that by Proposition 8 we have: λ(Ni ) = λ( m j=1 i Oi ) = Qm (j) j=1 λ((Oi )). We will use the following two propositions to further transform (Ni ). Proposition 10. For integers k≥2, 1≤i<k, g1 , g2 , . . ., gi−1 , gi+2 , . . ., gk ≥1 and gi , gi+1 ≥2 λ(C 2 Ag1 CAg2 . . . CAgi−1 CAgi CAgi+1 CAgi+2 . . . CAgk ) ≤ λ(C 2 Ag1 CAg2 . . . CAgi−1 CAgi +gi+1 CAgi+2 . . . CAgk ) Proposition 11. For integers k≥2, 1≤i<k, s, g1 , . . ., gi−1 , gi+2 , . . ., gk ≥1 and gi , gi+1 ≥2 λ(C 2 Ag1 CAg2 . . . CAgi−1 CAgi (CA)s CAgi+1 CAgi+2 . . . CAgk ) ≤ λ(C 2 Ag1 CAg2 . . . CAgi−1 CAgi +gi+1 CAgi+2 . . . CAgk C 2 A(CA)s−1 ) (j) For each j=1, 2, . . ., m we transform (Oi ) in turn, resulting in a new se(j) quence (Õi )⊆{A, C}. We first replace occurrences of (contiguous) subsequences of the form (A)g1 (C)(A)g2 , for some g1 , g2 ≥2, in our sequence with (A)g1 +g2 . Each such replacement decreases the number of elements equal to C by 1, does not change the number of elements equal to A, and by Proposition 10 does not decrease the λ of the sequence. We continue to do this until no more occurrences of such sequences exist. Let qj be the number of the replacements we performed. Next, we consider every occurrence of a (contiguous) subsequence of the form (C, A)s (C), for some s∈N, whose two preceding elements and two succeeding elements all equal A. For each such occurrence, in turn, we remove 76 6.5. Proof of Theorem 13 it, and concatenate the sequence (C, C, A)(C, A)s−1 to the end of our current sequence. Note that after each such removal-and-concatenation the number of elements equal to A and the number of elements equal to C do not change, and by Proposition 11 and Part 2 of Proposition 8, the λ of the sequence does not decrease. (j) We denote by (Õi ) the resulting sequence. Then it follows from this discussion (j) (j) that (C)qj (Õi ) and (Oi ) have the same number of elements equal to A and the same number of elements equal to C. It further follows that (j) (j) λ((Õi )) ≥ λ((Oi )), (6.23) and (j) (j,1) (Õi ) = (Ri (j,2) )(Ri (j,wj ) ). . .(Ri (j,k) where wj ∈N, and for each 1≤k≤wj , the sequence (Ri ), (6.24) ) is either of the form (C)2 (A)(C, A)t−1 , for some t≥1, (6.25) (C)2 (A, C)s (A)g (C, A)t , for some s, t≥0, g≥2. (6.26) or of the form P (1) (2) (m) Then Set (Ñi ) = (C)ũ (Õi )(Õi ). . .(Õi ), where ũ=u+ m j=1 qj . %(ε((Ñi ))) = %((Mi )) and |ε((Ñi ))| = |(Mi )|. Additionally, by (6.23), Proposition 8 and Fact 4, we have λ((Ni ))≤λ((Ñi )). Now, for j=1, . . ., m and (j,k) (j,k) k=1, . . ., wj , denote by (Fi ) the sequence ε((Ri )). To finish the proof, we claim, it’s enough to show that for every such j and k, (j,k) log λ((Fi (j,k) |(Fi )| )) (j,k) ≤ hd (%((Fi ))). (6.27) 77 6.5. Proof of Theorem 13 Indeed, assume that this holds. Then, noting that hd (capind (S)) = 0, we have log λ((Mi )) log λ((Ni )) log λ((Ñi )) = ≤ |(Mi )| |(Mi )| |ε((Ñi ))| ! w j m X (j,k) (j,k) X log λ((Fi )) |(Fi )| = (j,k) |(Fi )| |ε((Ñi ))| j=1 k=1 m wj (j,k) |(F )| |ε((C)ũ )| X X (j,k) ≤ hd (capind (S)) + hd (%((Fi ))) i |ε((Ñi ))| j=1 k=1 |ε((Ñi ))| m wj (j,k) ! (6.28) ! )| |ε((C)ũ )| X X (j,k) |(F %((Fi )) i ≤ hd capind (S) + |ε((Ñi ))| j=1 k=1 |ε((Ñi ))| (6.29) = hd (%(ε((Ñi )))) = hd (%((Mi ))), where (6.28) follows from our assumption, and (6.29) follows from the concavity of hd asserted in Proposition 9. So we proceed to show that (6.27) holds for every j=1, . . ., m, k=1, . . ., wj . (j,k) This follows from the next proposition and the fact that each (Ri ) is either of the form (6.25) or (6.26). Proposition 12. The following statements hold. 1. Let t≥1 be an integer and let (Fi )=ε((C)2 (A)(C, A)t−1 ), then log λ((Fi )) ≤hd (%((Fi ))). |(Fi )| 2. Let s, t≥0, g≥2 be integers and let (Fi )=ε((C)2 (A, C)s (A)g (C, A)t ), then log λ((Fi )) ≤hd (%((Fi ))). |(Fi )| The proof is now completed. 6.5.2 Proof of propositions Proof of Proposition 8. Part 1. Consider the matrix M N =a(bN ). Clearly, it has rank at most 1, and therefore the eigenvalue 0 has geometric multiplicity at least 78 6.5. Proof of Theorem 13 m−1; hence m−1 of the eigenvalues are 0. The last eigenvalue must then equal the trace of the matrix, which in this case, is (bN )a. Obviously, it is a largest real eigenvalue. Part 2. Using part 1 we have, λ(M N1 M N2 ) = λ(M (N1 M N2 )) = b(N1 M N2 )a = bN1 abN2 a = λ(M N1 )λ(M N2 ). Proof of Proposition 9. We make use of the following lemma. Lemma 6. For all positive integers d, Cd > Cd+1 . Proof of Lemma 6. It is well known (cf. [18]) that Cd = log λd , where λd >0 is the largest real root of the polynomial Pd (x) given by Pd (x) = x2d+3 − d+2 X xi . i=0 Let γ be any positive root of Pd (x). Choose any x>γ, and write x=(1+δ)γ for δ>0. Then Pd (x) = (1 + δ)2d+3 γ 2d+3 − d+2 X (1 + δ)i γ i i=0 > (1 + δ)2d+3 γ 2d+3 − (1 + δ)2d+3 d+2 X γi i=0 2d+3 = (1 + δ) Pd (γ) = 0. It follows that λd is the only positive root of Pd , and that Pd (x)>0 for all x>λd . Moreover, as Pd is continuous and Pd (0)= − 1<0 it follows that for all 0≤x<λd , Pd (x)<0. Clearly, the above holds for Pd+1 (x) and λd+1 as well; hence to show the claim it’s enough to prove that Pd+1 (λd )>0. Now, since Pd (λd ) = 0, we have 79 6.5. Proof of Theorem 13 λ2d+3 = d Pd+2 i=0 λid and Pd+1 (λd ) = λd2d+5 − = = d+4 X d+3 X λid − i=2 λd+4 d λid i=0 d+3 X λid i=0 − λd − 1. (6.30) Using Pd (λd ) = 0 again, we get −(d−1) 2d+3 λd 3 X = λid > 1+λd . i=−(d−1) λd+4 = λd d Thus, from (6.30), we get Pd+1 (λd )>0 and the claim follows. We now return to the proof of Proposition 9. We first note that for all d≥1 Cd ≤ log(9/8) ⇐⇒ d≥17. (6.31) Indeed it’s a simple matter to verify that for d=17, the LHS holds, and for d=16 it does not; (6.31) then follows by applying Lemma 6. We show that hd is concave. Note that for a sequence (xi )⊆R2 , with |(xi )| = k, xi = (xi , yi ) and x1 <. . .<xk , the function L(xi ) is concave iff the sequence of the slopes of the linear segments is non-increasing, namely, yi − yi−1 yi+1 − yi ≥ , 2≤i≤k−1. xi − xi−1 xi+1 − xi We check this for hd , when d≤16. In this case, hd = Lp1 ,p3 ,p4 = L(0,Cd ),( 1 2 , 1 ),( 2d+2 ,0) 4d+5 4d+5 , so one needs to verify that 1/(4d+5) − Cd 0 − 1/(4d+5) ≥ . 2/(4d+5) 1/(2d+2) − 2/(4d+5) Using simple algebraic manipulations this can be reduced to Cd ≤1, which certainly holds. As for the case d≥17, here, hd = Lp1 ,p2 ,p3 ,p4 = L(0,C log 3 3 2 1 1 d ),( 6d+8 , 6d+8 ),( 4d+5 , 4d+5 ),( 2d+2 ,0) , 80 6.5. Proof of Theorem 13 so one needs to verify the following two inequalities: log 3/(6d+8) − Cd 1/(4d+5) − log 3/(6d+8) ≥ 3/(6d+8) 2/(4d+5) − 3/(6d+8) 1/(4d+5) − log 3/(6d+8) 0 − 1/(4d+5) ≥ . 2/(4d+5) − 3/(6d+8) 1/(2d+2) − 2/(4d+5) (6.32) (6.33) Again, using algebraic manipulations, (6.32) can be reduced to Cd ≤ log(9/8), which holds by (6.31) and our assumption on d, and (6.33) can be reduced to 2≥ log 3, which is obviously true. Next, we verify that hd satisfies the other properties listed in the proposition. Clearly, hd satisfies Properties 2,4,5 with equality, and for d≥17, it satisfies Property 3 with equality as well. It remains to check Property 3 for d≤16, namely that L(0,Cd ),( 2 , 1 ) 4d+5 4d+5 (3/(6d+8)) ≥ log 3 . 6d+8 The latter inequality can be reduced to Cd ≥ log(9/8), which holds by our assumption on d and (6.31). Thus, hd is concave and satisfies Properties 2,3,4 and 5. It’s easy to verify using the definition of hd and concavity that it is the smallest such function. Before we show the proofs of Propositions 10 and 11, we develop some tools for calculating λ((Xi )), where (Xi )⊆{A, C} is a sequence of the form (C)2 (A)g1 (C)(A)g2 . . .(C)(A)gk , for some k≥1 and g1 , . . ., gk ≥1. (6.34) To this end, we define the following sets. For a positive integer g, let ∆(g)⊆Sg be given by w does not begin with ‘0i 1’, 1≤i≤d−1, ∆(g)= w∈Sg : . and does not end with ‘00’ Q For positive integers k and g1 , g2 , . . ., gk , let j ∆(gj ) denote the cartesian product ∆(g1 )×. . .×∆(gj ); define ∆(g1 , . . ., gk ) by Y for 1≤i<k, if wi ends with a ‘0’ ∆(g1 , . . ., gk )= (w1 , . . ., wk )∈ ∆(gj ) : . then wi+1 begins with a ‘0’ j Finally, for symbols a, b∈{0, 1}, and positive integers k,g1 , . . ., gk , define the set ∆a→b (g1 , . . ., gk ) by w1 starts with a ∆a→b (g1 , . . ., gk )= (w1 , . . ., wk ) ∈ ∆(g1 , . . ., gk ) : . and wk ends with b 81 6.5. Proof of Theorem 13 So for example, ∆(1) = {0, 1}, ∆(2) = {01, 10} if d = 1 and ∆(2) = {10} for d ≥ 2, ∆(i)=∅ for 3≤i≤d, and ∆(d + 1) = {0d 1} for d ≥ 2, and so on. Also note that ∆(1, 1, . . ., 1), where the number of 1’s is some integer s, is given by ∆(1, 1, . . ., 1) = {(0, 0, 0, . . ., 0), (1, 0, 0, . . ., 0), (1, 1, 0, . . ., 0), .. . (6.35) (1, 1, 1, . . ., 1)}, and has s+1 elements. The following proposition shows how these sets can be used to compute λ((Xi )), with (Xi ) of the form (6.34). Proposition 13. For all positive integers k, g1 , . . ., gk λ(C 2 Ag1 CAg2 . . .CAgk )=|∆(g1 , . . ., gk )| Proof. For an integer n, let Pn denote the set of all paths of length n in Gb? . For a path γ∈Pn we denote its starting vertex (resp. terminating vertex) by σ(γ) (resp. τ (γ)). Let k and g1 , . . ., gk be positive integers and let c, r be the vectors defined in Fact 5 so that C 2 = cr. By Proposition 8, λ(C 2 Ag1 CAg2 . . .CAgk ) = rAg1 CAg2 . . .CAgk c. Let (Mi ) = ε((A)g1 (C)(A)g2 . . .(C)(A)gk ) and let Γ be the set Γ = γ∈P|(Mi )| : γ matches (Mi ), σ(γ)∈{a0 , a2d+2 }, τ (γ)∈{a0 , a1 } . Then it follows that λ(C 2 Ag1 CAg2 . . .CAgk ) = rAg1 CAg2 . . .CAgk c ! Y =r Mi c i = |Γ|. We will show that |Γ| = |∆(g1 , . . ., gk )| byPexhibiting a bijection between these two sets. For i=1, 2, . . ., k define si = 1 + i−1 j=1 (gi +2d+2), and ti = si +gi −1. Then 1=s1 <t1 <s2 <t2 <. . .<sk <tk =|(Mi )| (si , ti denote the start and end indices of the sequence (A)gi in (Mi )). For a path γ = (e1 , . . ., en )∈Pn let Lb? (γ)=Lb? (e1 )Lb? (e2 ). . .Lb? (en )∈{0, 1, }n , be the word generated by the path, 82 6.5. Proof of Theorem 13 and for 1≤i≤j≤n denote by Lb? (γ)i the symbol Lb? (ei ) and by Lb? (γ)i:j the word Lb? (ei )Lb? (ei+1 ). . .Lb? (ej )∈{0, 1, }∗ . Note that, as Gb? is the only nonb a word w∈{0, 1, }∗ is generated by some path in Gb? trivial component of G, iff for any nonnegative integer m, there exists words z, y∈{0, 1, }m , such that b Next, fix γ∈Γ. Since Mj =B iff j=ti +d+1 for some 1≤i<k, we zwy∈S. b have L? (γ)ti +d+1 = for 1≤i<k, and Lb? (γ)j ∈{0, 1} for all j∈{1, . . ., tk } \ {t1 +d+1, t2 +d+1, . . ., tk−1 +d+1}. Also, observe that in any path δ∈P2d+1 with Lb? (δ)d+1 =, it must hold that Lb? (δ)=0d 0d . It follows that Lb? (γ)ti +1:ti +2d+1 = 0d 0d for all 1≤i<k. We now define φ : Γ → ∆(g1 , . . ., gk ) by φ(γ) = (Lb? (γ)s1 :t1 , Lb? (γ)s2 :t2 , . . ., Lb? (γ)sk :tk ) , γ∈Γ, and claim that it is a bijection. To show this, we need to verify the following statements: 1. φ is well-defined: for all γ∈Γ, φ(γ)∈∆(g1 , . . ., gk ) 2. φ is one-to-one. 3. φ is onto ∆(g1 , . . ., gk ). 1. Let γ∈Γ and 1≤i≤k. Since Lb? (γ)si :ti ∈{0, 1}∗ we have Lb? (γ)si :ti ∈S. Since σ(γ)∈{a0 , a2d+2 }, Lb? (γ) does not begin with ‘0r 1’ for any 1≤r<d which implies Lb? (γ)s1 :t1 does not begin with ‘0r 1’ for any 1≤r<d, as well. For 2≤i≤k, Lb? (γ)si :ti does not begin with ‘0r 1’ for any 1≤r<d, since otherwise Lb? (γ)ti−1 +1:si +r =0d 0d 00r 1 or Lb? (γ)ti−1 +1:si +r =0d 0d 10r 1, and both words b Additionally, for 1≤i≤k−1, Lb? (γ)s :t does not end with ‘00’ are not in S. i i b And, as since otherwise Lb? (γ)ti −1:ti +2d+1 = 000d 0d which is not in S. τ (γ)∈{a0 , a1 }, Lb? (γ) does not end in ‘00’, which implies that Lb? (γ)sk :tk does not end with ‘00’, as well. This shows that for all 1≤i≤k, Lb? (γ)si :ti ∈∆(gi ). Finally, let 1≤i≤k−1, and assume Lb? (γ)si :ti ends with a ‘0’. If Lb? (γ)si+1 :ti+1 begins with a ‘1’, then Lb? (γ)ti :si+1 =00d 0d 01 or Lb? (γ)ti :si+1 =00d 0d 11, and both are not b Therefore, if Lb? (γ)s :t ends with a ‘0’ then Lb? (γ)s :t must begin with a in S. i i i+1 i+1 ‘0’. It follows that φ(γ)∈∆(g1 , . . ., gk ). 2. Let γ1 , γ2 ∈Γ, such that φ(γ1 ) = φ(γ2 ). Clearly, all nonempty paths in G starting at a2d+2 generate a word beginning with ‘1’ and all nonempty paths in Gb? starting at a0 generate a word beginning with ‘0’. By our assumption, Lb? (γ1 ) and Lb? (γ2 ) begin with the same symbol; hence σ(γ1 ) = σ(γ2 ). Now, observe that for any path δ∈Γ, and 2≤i≤k, Lb? (δ)si −1 =Lb? (δ)si , where, 0=1 and 1=0; otherwise, Lb? (δ)ti−1 +1:si =0d 0d 00 or Lb? (δ)ti−1 +1:si =0d 0d 11 and both words are not in 83 6.5. Proof of Theorem 13 b Thus for all 2≤i≤k, S. Lb? (γ1 )ti−1 +1:si −1 = 0d 0d Lb? (γ1 )si = 0d 0d Lb? (γ2 )si = Lb? (γ2 )ti−1 +1:si −1 . It follows that Lb? (γ1 ) = Lb? (γ2 ) and since Gb? is deterministic, we have γ1 =γ2 . Therefore φ is one-to-one. 3. Let j=(j1 , j2 , . . ., jn ) be an n-tuple of positive integers for some n≥1. Consider the set ∆(j1 , . . .jn )=∆(j). For a word w=(w(1) , . . ., w(n) )∈∆(j),with (i) (i) (i) (i) w(i) =w1 w2 . . .wji , wr ∈{0, 1}, 1≤r≤ji , 1≤i≤n, we define the word b ∗ by z(w)∈Σ (2) (3) (n) z(w) = w(1) 0d 0d w1 w(2) 0d 0d w1 w(3) . . .0d 0d w1 w(n) . It is not hard to verify that for all w∈∆(j), any filling of the ‘’s of z(w) with symbols from {0, 1} does not contain the pattern 02d+3 nor any of the patterns 10r 1 b More is true; fix a w∈∆(j) and for an intefor 0≤r<d. It follows that z(w)∈S. ger m, let jm = (y1 , . . ., ym , j1 , . . ., jn , y1 , . . ., ym ) with y1 =y2 =. . .=ym =1, and wm = (x1 , . . ., xm , w(1) , . . ., w(n) , x1 , . . ., xm ) with x1 =x2 =. . .=xm =1. Then, b It thus follows that clearly wm ∈∆(jm ) and by our previous argument z(wm )∈S. z(w) can be extended indefinitely on both sides and is thus generated by a path in Gb? . Now, let w = (w(1) , . . ., w(k) )∈∆(g1 , . . ., gk ), and let γ be a path in Gb? generating z(w). We show that there is a path γ 0 ∈Γ generating z(w). This will conclude the proof, since obviously for such a path γ 0 , φ(γ 0 )=w. If k=g1 =1, then z(w)∈{0, 1} and clearly there is a path γ 0 ∈Γ generating z(w). So we assume k>1 or g>1. In this case, observe that the length of z(w) is at least 2, and exactly one of the last two symbols of z(w) must be a ‘1’. Therefore, τ (γ)∈{a0 , a1 }. Additionally, either z(w) has the prefix ‘1’, or z(w) begins with a ‘0’ and either g1 >1, in which case z(w) has the prefix 0r 1 with r≥d, or g1 =1 and (since k>1), z(w) has the prefix 00d 0d 1. It is easily verified from Figure 6.3, that in every case there is a path β in Gb? generating the corresponding prefix of z(w) with σ(β)∈{a0 , a2d+2 } and τ (β)=a0 . By replacing the initial part of γ generating this prefix with β we obtain a path γ 0 generating z(w) with σ(γ 0 )∈{a0 , a2d+2 } and τ (γ 0 ) = τ (γ)∈{a0 , a1 }. Clearly γ 0 matches (Mi ), and consequently γ 0 ∈Γ. To prove Propositions 10 and 11 we need the following lemma, which shows a relation between ∆(g1 , g2 ) and ∆(g1 + g2 ) for g1 , g2 ≥2. Lemma 7. For all g1 , g2 ≥2, there exists a function T :∆(g1 , g2 )→∆(g1 +g2 ) such that 84 6.5. Proof of Theorem 13 1. T is one-to-one. 2. For any (ax, yb)∈∆(g1 , g2 ), if T (ax, yb)=cwd, where a, b, c, d∈{0, 1} and x, y, w∈{0, 1}∗ , then d=b and c≤a (using the normal order of the integers in {0, 1}). Proof. Let g1 , g2 ≥2. For (x, y)∈∆(g1 , g2 ), we define T (x, y) as follows. 1. If x = w10, y = 0z, where w, z∈{0, 1}∗ and z doesn’t start with 02d+1 , then T (x, y) = xy. 2. If x = w0d+1 10, y = 02d+2 1z, where w, z∈{0, 1}∗ , then T (x, y) = w0d 10d+2 10d+1 1z. 3. If x = w10, y = 02d+2 1z, where w, z∈{0, 1}∗ and w doesn’t end with 0d+1 , then T (x, y) = w0d+2 10d+1 1z. 4. If x = w1, y = 0z, where w, z∈{0, 1}∗ , then T (x, y) = xy. 5. If x = w1, y = 1z, where w, z∈{0, 1}∗ and w doesn’t end with 02d+2 , then T (x, y) = w01z. 6. If x = w02d+2 1, y = 1z, where w, z∈{0, 1}∗ , then T (x, y) = w0d+2 10d 1z. For i = 1, 2, . . ., 6, let Λi denote the subset of pairs (x, y)∈∆(g1 , g2 ) satisfying the conditions of case i above. Then, one can verify that {Λ1 , . . ., Λ6 } is a partition of ∆(g1 , g2 ) and that in each case, T (x, y)∈∆(g1 + g2 ), thus T is welldefined. Moreover, if (ax0 , y 0 b)∈Λi and T (ax0 , y 0 b) = cwd, were a, b, c, d∈{0, 1} and x0 , y 0 , w∈{0, 1}∗ , then it’s easy to verify that d=b, and, unless i=3, then c=a; for i = 3, c≤a. It remains to show that T is one-to-one. First, observe that T restricted to Λi is one-to-one for i = 1, 2, . . ., 6. Next, for (x, y)∈∆(g1 , g2 ), let T (x, y)(1) , T (x, y)(2) ∈{0, 1}∗ be the sub-words of T (x, y) given by T (x, y) = T (x, y)(1) T (x, y)(2) , |T (x, y)(1) | = g1 , |T (x, y)(2) | = g2 , 85 6.5. Proof of Theorem 13 and consider the following table. T (x, y)(1) Conditions (x, y)∈Λ1 v10 (x, y)∈Λ2 v100 (x, y)∈Λ3 , g1 =2 00 (x, y)∈Λ3 , g1 ≥3 v000 (x, y)∈Λ4 v1 d+1 (x, y)∈Λ5 v0 (x, y)∈Λ6 v10d T (x, y)(2) 0w 0w 0w 0w 0w 1w 1w Each entry in the leftmost column describes the conditions on x, y, g1 under which T (x, y)(1) and T (x, y)(2) have the forms written in the corresponding entries under the rightmost two columns; here v, w denote arbitrary words over {0, 1}. For example, the first line claims that if (x, y)∈Λ1 then T (x, y)(1) ends with ‘10’ and T (x, y)(2) begins with a ‘0’. This and the claims corresponding to the other lines can be easily verified from the definition of T . Clearly, for any (x, y)∈Λi and (s, t)∈Λj with i6=j, we have T (x, y)6=T (s, t), and it follows that T is one-toone. We can now prove Propositions 10 and 11. For a vector v=(v1 , v2 , . . ., vk ), with positive integer entries, we use the abbreviation ∆(v) for ∆(v1 , v2 , . . ., vk ), and for a positive integer k we denote by 1k the vector in Zk with every entry equal to 1. Proof of Proposition 10. Let T : ∆(gi , gi+1 ) → ∆( gi +gi+1 ) be a function satisfying the properties listed in Lemma 7. Set g = (g1 , . . ., gk ) and g0 = (g1 , . . ., gi−1 , gi +gi+1 , gi+2 , . . ., gk ) and U :∆(g)→∆(g0 ) be the function defined by U (x1 , . . ., xk ) = (x1 , . . ., xi−1 , T (xi , xi+1 ), xi+2 , . . ., xk ) , (x1 , . . ., xk )∈∆(g). Since T satisfies Property (2) in Lemma 7, it follows that for every z∈∆(g), U (z)∈∆(g0 ); hence U is well-defined. Since T is one-to-one, so is U and therefore |∆(g)|≤|∆(g0 )|. The claim now follows from Proposition 13. Proof of Proposition 11. Let g(1) =(g1 , . . ., gi )∈Zi (2) k−i g =(gi+1 , . . ., gk )∈Z , where every gj >0. Define the block-vectors g = g(1) 1s g(2) g0 = g(1) g(2) . and 86 6.5. Proof of Theorem 13 Finally, let U : ∆(g)→∆(g0 )×∆(1s ) be the function given by U (x1 , . . ., xi , y1 , . . ., ys , xi+1 , . . ., xk ) = (x1 , . . ., xk , y1 , . . ., ys ), for all (x1 , . . ., xi , y1 , . . ., ys , xi+1 , . . ., xk )∈∆(g). Note that for such a vector, if xi ends with a ‘0’, then yj =0, for j = 1, 2, . . ., s, and consequently xi+1 must begin with a ‘0’. Therefore (x1 , . . ., xi , xi+1 , . . ., xk )∈∆(g0 ) and U is well-defined. Since U is obviously one-to-one, it follows that |∆(g)|≤|∆(g0 )||∆(1s )|. Now, set M =C 2 Ag1 CAg2 . . .CAgi (CA)s CAgi+1 . . .CAgk ; then by Propositions 13, 10 and 8, we get λ(M ) = |∆(g)| ≤ |∆(g0 )||∆(1s )| = λ(C 2 Ag1 CAg2 . . .CAgi−1 CAgi CAgi+1 CAgi+2 . . .CAgk )λ(C 2 A(CA)s−1 ) ≤ λ(C 2 Ag1 CAg2 . . .CAgi−1 CAgi +gi+1 CAgi+2 . . .CAgk )λ(C 2 A(CA)s−1 ) = λ(C 2 Ag1 CAg2 . . .CAgi−1 CAgi +gi+1 CAgi+2 . . .CAgk C 2 A(CA)s−1 ). Proof of Proposition 12, part 1. For t=1, it can be easily verified using Proposition 13, that the conclusion holds with equality. So assume t≥2; in this case, setting ρ=%((Fi )) we have, ρ= 1 1 3 t+1 = ≤ = . (t+1)(2d+2)+t 2d+3−1/(t+1) 2d+8/3 6d+8 By Proposition 9, it follows that hd (ρ)≥Lp1 ,p2 (ρ). So, it’s enough to show log λ((Fi )) ≤ Lp1 ,p2 (ρ) |(Fi )| Now, λ((Fi )) = λ(C 2 A(CA)t−1 ), which by Proposition 13 is |∆(1t )|; so by (6.35), λ((Fi ))=t+1. Therefore, we need to show ⇐⇒ ⇐⇒ ⇐⇒ ⇐⇒ log λ((Fi )) |(Fi )| log λ((Fi )) |(Fi )| log(t+1) |(Fi )| log(t+1) |(Fi )| log(t+1) |(Fi )| ≤ Lp1 ,p2 (ρ) ≤ L(0,C log 3 3 d ),( 6d+8 , 6d+8 ) (ρ) Cd − (log 3)/(6d+8) 3 log 3 ≤ ρ− + −3/(6d+8) 6d+8 6d+8 ρ log 3 ≤ Cd 1 − + ρ 3/(6d+8) 3 ρ (t + 1)(log 3)/3 ≤ Cd 1 − + . 3/(6d+8) |Fi | 87 6.5. Proof of Theorem 13 The last inequality holds, since ρ≤3/(6d+8), Cd ≥0, and for any nonnegative integer t, (t+1)(log 3)/3≥ log(t+1). It remains to prove part 2 of Proposition 12. For this, we require the following two lemmas. The first establishes some properties of |∆(·)| and |∆a→b (·)|, for a, b∈{0, 1}. Lemma 8. The following statements hold. 1. |∆(·)| and |∆a→b (·)| for all bits a, b∈{0, 1} obey the RLL(d, 2d + 2) recursion. Namely, for all positive integers g, |∆(g+2d+3)| = d+2 X |∆(g+i)| i=0 |∆a→b (g+2d+3)| = d+2 X (6.36) |∆a→b (g+i)|. i=0 2. |∆(g)|≤|∆(g+1)| for all g≥3. 3. log(|∆(·)|) is “eventually superadditive”: for all g1 , g2 ≥3, |∆(g1 )||∆(g2 )| ≤ |∆(g1 +g2 )| log |∆(g)| log |∆(g)| = sup = Cd g→∞ g g g≥2 4. lim 5. |∆0→0 (g)|=|∆1→1 (g)| for all positive integers g. 6. |∆0→1 (g)|≤2|∆0→0 (g)| for g6=d+1. 7. |∆0→1 (g)|≤ 21 |∆(g)| for all g if d = 1, and for g6=d+1 if d>1. Proof. Part 1. Let V 0 ⊆V be the set of vertices {a0 , . . ., a2d+2 }, and let G 0 be the subgraph of Gb? induced by V 0 (so G 0 is the “conventional” deterministic presentation of RLL(d, 2d+2)). For vertices u, v∈V 0 and positive integer g, let Wu→v (g)∈{0, 1}g denote the set of words that are generated by paths of length g in G 0 , starting at u and terminating at v. Since G 0 is deterministic, clearly, the number of such paths is |Wu→v (g)|. Let g be a positive integer. It’s not hard to verify that 1. ∆0→0 (g) = Wa0 →a1 (g) 2. ∆0→1 (g) = Wa0 →a0 (g) 88 6.5. Proof of Theorem 13 3. ∆1→0 (g) = Wa2d+2 →a1 (g) 4. ∆1→1 (g) = Wa2d+2 →a0 (g) Let A0 =A(G 0 ). It follows that each |∆a→b (g)|, for a, b∈{0, 1}, is equal to a single entry of (A0 )g . Now, the characteristic polynomial of A0 is (cf. [18]) 2d+3 x − d+2 X xi . i=0 Invoking the Cayley-Hamilton Theorem, we get (A0 )g+2d+3 = d+2 X (A0 )g+i . i=0 Thus, every |∆a→b (g)| satisfies the required recursion and therefore also |∆(g)| = P a,b |∆a→b (g)|. Part 2. For d=1, any word int ∆(g) can be extended from the left to a word in ∆(g + 1) and the claim follows. For d≥2, we use induction on g. Table 6.1, which can be easily verified, shows the sets ∆(g) for g=1, 2, . . ., 2d+3 along with |∆(g)| for g=1, 2, . . ., 2d+6 and d≥2. Evidently, |∆(g)|≤|∆(g+1)| for 3≤g≤2d+5. This shows the induction basis. The induction step follows from the recursion relation (6.36). Part 3. Since the claim is symmetric in g1 , g2 , it’s enough to prove it only for g1 ≤g2 . We use induction on g2 . For the basis of the induction we verify the claim for all 3≤g1 ≤g2 ≤2d+5. For 1≤d≤7 we verified the induction basis using a computer, so here we assume d≥8. Consider Table 6.1. If 3≤g1 ≤d, then |∆(g1 )| = 0 and the claim holds trivially. If g1 = d+1, then |∆(g1 )| = 1 and the claim follows from the monotonicity of |∆(g)| for g≥3 shown in part 2. If g1 = g2 = d+2, the claim holds since |∆(d+2)|2 =9≤|∆(2d+4)|=11. If g1 = d+2 and d+3≤g2 ≤2d+1 then |∆(d+2)||∆(g2 )| = 12<|∆(2d+5)|≤|∆(d+2+g2 )|, with the last inequality following from the monotonicity of |∆(g)|, for g≥3. If g1 =d+2 and 2d+2≤g2 ≤2d+5 then |∆(d+2+g2 )| ≥ |∆(3d+4)| = 2d+3 X |∆(i)| i=d+1 = 1+3+(d−1)4+5+8≥45 > |∆(d+2)||∆(2d+5)|≥|∆(d+2)||∆(g2 )|. If d+3≤g1 ≤2d+1 and g1 ≤g2 ≤2d+1 then |∆(g1 )||∆(g2 )|=16=|∆(2d+6)|≤|∆(g1 +g2 )|. 89 6.5. Proof of Theorem 13 g ∆(g) {0, 1} 1 {10} 2 ∅ 3≤g≤d {0d 1} d+1 {0d 10, 0d+1 1, 10d 1} d+2 d+3≤g≤2d+1 {0g−2 10, 0g−1 1, 10g−2 1, 10g−3 10} {02d 10, 0d 10d 1, 02d+1 1, 102d−1 10, 102d 1} 2d+2 {02d+1 10, 0d 10d 10, 02d+2 1, 0d+1 10d 1, 2d+3 0d 10d+1 1, 102d 10, 102d+1 1, 10d 10d 1} {. . .} 2d+4 2d+5 {. . .} 2d+6 {. . .} |∆(g)| 2 1 0 1 3 4 5 8 11 14, if d=2; 13, 21, if d≥3. if d=2; 17, if d=3; 16, if d≥4. Table 6.1: Values of |∆(g)| for 1≤g≤2d+6 and d≥2. If d+3≤g1 ≤2d+1 and 2d+2≤g2 ≤2d+5 then 2d+4 X |∆(g1 +g2 )| ≥ |∆(3d+5)|= |∆(i)| i=d+2 = 3+(d−1)4+5+8+11≥55 > |∆(g1 )||∆(2d+5)|≥|∆(g1 )||∆(g2 )|. Finally, if 2d+2≤g1 ≤g2 ≤2d+5 then, since d≥8 and hence 2d+11≤3d+3, we 90 6.5. Proof of Theorem 13 have |∆(g1 +g2 )| ≥ |∆(4d+4)|= 3d+3 X 2d+11 X |∆(i)|≥ i=2d+1 = 2d+6 X 2d+11 X 2d+11 X 2d+3 X i=2d+7 i=2d+7 j=d+1 |∆(i)|+ i=2d+1 |∆(i)| i=2d+1 |∆(i)| = 57+ |∆(i−j)| 2d+11 X ≥ 57+ (1+3+4 (i−(d + 1) − (d+2))) =197 i=2d+7 > |∆(2d+5)|2 ≥|∆(g1 )||∆(g2 )|. This shows the basis of the induction. As for the induction step, let k≥2d+5 be an integer and assume the claim holds for all 3≤g1 ≤g2 ≤k. We will prove that the claim holds for all 3≤g1 ≤g2 ≤k+1. For g1 =g2 =k+1 we have |∆(k+1)||∆(k+1)| = 2d+3 X 2d+3 X |∆(k+1−i)||∆(k+1−j)| i=d+1 j=d+1 ≤ 2d+3 X 2d+3 X |∆(2k+2−i−j)| = i=d+1 j=d+1 2d+3 X |∆(2k+2 − i)| i=d+1 = |∆(2k+2)|. where the equalities follow from (6.36), and the inequality follows from the induction hypotheses (note that in the first double sum k+1−i≥3 and k+1−j≥3 due to our assumption on k). The case g2 =k+1, 3≤g1 ≤k is handled in a similar manner. Part 4. Let θ : {1, 2, . . .}→[0, ∞) be the function defined by 0 if g≤d+1 θ(g) = , g = 1, 2, . . .. log |∆(g)| otherwise. Then it’s easily verified, using parts 2 and 3, that this function is superadditive for all positive integers g. Hence, by Lemma 1, limg→∞ θ(g)/g exists and satisfies θ(g) θ(g) = sup . g→∞ g g≥1 g lim Since θ is nonnegative, the RHS is equal to supg≥d+2 (θ(g)/g), thus lim g→∞ θ(g) log |∆(g)| θ(g) log |∆(g)| = lim = sup = sup . g→∞ g g g g g≥d+2 g≥d+2 91 6.5. Proof of Theorem 13 Let A0 be the matrix defined in the proof of part 1 above, and let a, b∈{0, 1}. Since A0 is primitive, it holds that limg→∞ ((A0 )g /λ(A0 )g ) exists and is strictly positive in each entry. As ∆a→b (g) is equal to a single entry of (A0 )g , it follows that there exists a positive real constant ca,b , such that |∆a→b (g)| = ca,b . g→∞ λ(A0 )g lim Since |∆(g)|= P a,b |∆a→b (g)|, lim g→∞ this implies that log |∆(g)| = log λ(A0 ) = Cd . g It remains to check that supg≥2 (log |∆(g)|)/g= supg≥d+2 (log |∆(g)|)/g. This holds since for d≥2 and 2≤g<d+2, (log |∆(g)|)/g≤0≤Cd , and for d=1, it can be verified that (log |∆(2)|)/2=1/2<C1 . Part 5. Obviously the claim holds for g=1, so assume g≥2. In this case, let ψ:∆0→0 (g)→∆1→1 (g) be given by ψ(0w0) = 10w , 0w0∈∆0→0 (g), w∈{0, 1}∗ . It’s easy to verify that for any z∈∆0→0 (g), ψ(z)∈∆1→1 (g) and that ψ is one-toone and onto ∆1→1 (g). This shows the claim. Part 6. Note, that for 1≤g≤d, ∆0→1 (g) = ∅, and the claim holds. To show the claim for g≥d+2, we define a map φ : ∆0→1 (g)→∆0→0 (g). For a word x∈∆0→1 (g), φ(x) is defined as follows: 1. If x = w0d+1 1, where w∈{0, 1}∗ , then φ(x) = w0d 10. 2. if x = w0d+1 10d 1, where w∈{0, 1}∗ , then φ(x) = w0d 10d 10. 3. if x = w10d 1, and w does not end with 0d+1 , then φ(x) = w0d 10. For i = 1, 2, 3, let Λi ⊆∆0→1 (g) denote the set of words satisfying the conditions of case i above. Observe, that {Λ1 , Λ2 , Λ3 } is a partition of ∆0→1 (g), and that in each case φ(x)∈∆0→0 (g); thus φ is well-defined. We claim that φ 92 6.5. Proof of Theorem 13 is at most “two-to-one”. More precisely, we claim that there are no 3 distinct words x, y, z∈∆0→1 (g) such that φ(x)=φ(y)=φ(z); otherwise, since, clearly, φ restricted to Λi is one-to-one, each of x, y, z must belong to a different Λi , say x∈∆1 , y∈∆2 and z∈∆3 ; but then φ(y) ends with 10d 10 while φ(z) ends with 0d+1 10. Thus φ is at most two-to-one, and it follows that |∆0→1 (g)|≤2|∆0→0 (g)|. Part 7. Let g6=d+1. Then by parts 5 and 6, |∆(g)| ≥ |∆0→0 (g)| + |∆1→1 (g)| + |∆0→1 (g)| = 2|∆0→0 (g)| + |∆0→1 (g)| ≥ 2|∆0→1 (g)|, and the claim follows. The case d = 1 and g = d+1 = 2, is easily verified. We will use the next lemma in the proof of Proposition 12, part 2. It gives a bound on λ((Ni )), where (Ni )⊆{A, C} is a sequence of the form (6.26). Lemma 9. Let s, t≥0 and g≥2 be integers. Set λ=λ((C)2 (A, C)s (A)g (C, A)t ). If g=d+1 and d>1 then λ=(s + 1)(t + 1), otherwise, λ≤|∆(g)| s + 1 t + 1 (s + 1)(t + 1) + + 4 4 2 . Proof. Let g∈Zs+t+1 be the block vector given by g=(1s | g | 1t ). For a, b∈{0, 1} let Γa→b ⊆∆(g) be given by Γa→b = {(x1 , . . ., xs , y, z1 , . . ., zt )∈∆(g) : y begins with a and ends with b}. Clearly, {Γa,b : a, b∈{0, 1}} is a partition of ∆(g). On the other hand, consider the following identities, when s, t>0, which are easy to verify: |Γ0→0 | = |∆(1s )||∆0→0 (g)| = (s+1)|∆0→0 (g)| |Γ0→1 | = |∆(1s )||∆0→1 (g)||∆(1t )| = (s + 1)(t + 1)|∆0→1 (g)| |Γ1→0 | = |∆1→0 (g)| |Γ1→1 | = |∆1→1 (g)||∆(1t )| = (t+1)|∆1→1 (g)| where we used (6.35). Note that these identities hold even when s = 0 or t = 0. By Proposition 13 it follows that X λ = |∆(g)|= |Γa→b | a,b∈{0,1} = (s+1)|∆0→0 (g)| + (s+1)(t+1)|∆0→1 (g)| (6.37) + |∆1→0 (g)| + (t+1)|∆1→1 (g)|. 93 6.5. Proof of Theorem 13 Now, if g=d+1 and d>1, then ∆(g)={0d 1} and the claim follows from (6.37). It remains to show the case g6=d+1 or d=1. If |∆(g)| = 0, then the claim readily follows from (6.37). Otherwise, rewriting (6.37), we obtain |∆0→0 (g)| |∆1→1 (g)| λ = |∆(g)| (s+1) + (t+1) |∆(g)| |∆(g)| |∆1→0 (g)| |∆0→1 (g)| + + (s+1)(t+1) |∆(g)| |∆(g)| |∆0→0 (g)|+|∆1→0 (g)|/2 ≤|∆(g)| (s+1) |∆(g)| |∆0→1 (g)| |∆1→1 (g)|+|∆1→0 (g)|/2 + (s+1)(t+1) . + (t+1) |∆(g)| |∆(g)| (6.38) Now, by Lemma 8, part 5 we have |∆0→0 (g)| + |∆1→0 (g)|/2 |∆1→1 (g)| + |∆1→0 (g)|/2 = |∆(g)| |∆(g)| 1 |∆0→1 (g)| = 1− . 2 |∆(g)| Substituting this into (6.38) and applying part 7 of Lemma 8, we obtain s+1 t+1 s+1 t+1 |∆0→1 (g)| + + (s+1)(t+1) − − λ≤|∆(g)| 2 2 |∆(g)| 2 2 . s+1 t+1 (s+1)(t+1) ≤ |∆(g)| + + . 4 4 2 This completes the proof. Proof of Proposition 12, part 2. Set ρ = %((Fi )). Then since g≥2, ρ= t+s+2 1 3 = < . (t+s+2)(2d+2)+t+s+g (2d+3)+(g−2)/(t+s+2) 6d+8 Hence, by Proposition 9, it follows that hd (ρ)≥Lp1 ,p2 (ρ). So, it’s enough to show log λ((Fi )) ≤ Lp1 ,p2 (ρ) |(Fi )| 94 6.5. Proof of Theorem 13 Set λ = λ((Fi )). We need to show ⇐⇒ ⇐⇒ ⇐⇒ ⇐⇒ ⇐⇒ log λ ≤ Lp1 ,p2 (ρ) |(Fi )| log λ ≤ L(0,C ),( 3 , log 3 ) (ρ) d 6d+8 6d+8 |(Fi )| 3 log 3 log λ Cd − (log 3)/(6d+8) ρ− + ≤ |(Fi )| −3/(6d+8) 6d+8 6d+8 log λ log 3 ρ − ρ ≤ Cd 1 − |(Fi )| 3 3/(6d+8) log λ − (t + s + 2)(log 3)/3 ≤ Cd ρ (1 − 3/(6d+8) )|(Fi )| log λ − (t + s + 2)(log 3)/3 ≤ Cd g + (t + s − 4)/3 (6.39) Now, if g = d+1, and d>1, then by Lemma 9, the numerator of the LHS of (6.39) is log(s+1)−(s+1)(log 3)/3+ log(t+1)−(t+1)(log 3)/3 which is nonpositive for any nonnegative integers s,t. Since the denominator of the LHS of (6.39) is always positive, it follows that the LHS of (6.39) is nonpositive and (6.39) holds in this case. So we assume that d=1 or g6=d+1. In this case, by Lemma 9, t+1 (s+1)(t+1) log λ−(t+s+2) log3 3 )−(t+s+2) log3 3 log |∆(g)|+ log( s+1 4 + 4 + 2 ≤ g+ t+s−4 g+ t+s−4 3 3 log |∆(g)| − α , = g+β t+1 (s+1)(t+1) where we set α=(t+s+2) log3 3 − log( s+1 ) and β= t+s−4 4 + 4 + 2 3 . Observe that log 3 − log((s + 1)(t + 1)) 3 log 3 log 3 = (t + 1) − log(t + 1) + (s + 1) − log(s + 1) 3 3 α ≥ (t + s + 2) ≥ 0. 95 6.6. Open questions Now, if t + s≥4 then β≥0 and log λ − (t + s + 2) log3 3 log |∆(g)| − α ≤ t+s−4 g+β g+ 3 log |∆(g)| ≤ g ≤ Cd , where the last inequality follows from Lemma 8, part 4. Hence (6.39) holds in this case. Otherwise, if t+s<4 then β<0 and it can be verified that for all such t,s, we have −α≤βC1 . By Lemma 6 this implies that −α ≤ βCd (6.40) Again, by Lemma 8, part 4, we have (log |∆(g)|)/g≤Cd , which implies log |∆(g)| ≤ gCd (6.41) Summing equations (6.41) and (6.40) and dividing by g+β (which is positive), we get log |∆(g)| − α ≤ Cd . g+β Therefore (6.39) holds in this case as well. 6.6 Open questions By (6.2), the independence capacity of RLL(d, k) remains 1/(2d+2) for all 2d + 1≤k≤3d + 1. Is it possible to generalize the derivation in the proof of Theorem 13 to obtain the tradeoff function for RLL(d, k) for this range of d and k? The “reverse-concatenation” encoding scheme described in Section 6.1 is used in practice for certain digital storage systems where the relevant constraint is RLL(0, k). Knowing the tradeoff function for this constraint is thus especially important. Unfortunately, currently, this function is only known exactly when k = 1, 2 [5], and for k = 3 for insertion rates in [0, 1/4] [37]. We so far restricted ourselves to dealing with tradeoff functions of 1dimensional and binary constraints. It remains a task for future work to generalize the definition to higher-dimensional and non-binary constraints. 96 Chapter 7 Bounds on capacity using probability In this chapter we give some probabilistic inequalities that hold for certain 2dimensional binary constraints. Using these, we obtain a lower bound, stated in Theorem 16, on the capacity of certain constraints of the form S (V) ⊗RLL(0, 1), where S (V) is a 1-dimensional constraint over {0, 1}. We suspect that this bound is usually inferior to the bounds we get using the method described in Chapter 3: for example, for the (bit-flipped) hard square constraint, Theorem 16 gives cap(RLL(0, 1)⊗2 )≥0.49445718. . ., whereas with the method of Chapter 3 one gets cap(RLL(0, 1)⊗2 )≥0.58789116. . .. However, the arguments presented here do not seem to require symmetry of either the horizontal or vertical strips of the constraint, and thus may prove to be generalizable to constraints for which the method of Chapter 3 cannot be used. For the rest of this chapter, we assume Σ = {0, 1}. 7.1 Some correlation inequalities The method described in this chapter uses correlation inequalities: specifically the FKG and Holley inequality. We summarize them here. A partially ordered set (Λ, ≤) is called a lattice, if any two elements x, y∈Λ have a smallest upper bound denoted x∨y (i.e. x∨y ≥ x and x∨y ≥ y and for all z∈Λ s.t. z≥x and z≥y, we have z ≥ x∨y) and a greatest lower bound denoted x∧y. A lattice is distributive if it satisfies either of the following two equivalent conditions: x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z) for all x, y, z∈Λ, x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z) for all x, y, z∈Λ. As an example consider the set Λ = Σm×n for some m, n ∈ N. As usual we say that for ∆1 , ∆2 in Λ, ∆1 ≤∆2 if the inequality holds entry-wise. Then ≤ is a partial order on Λ and (Λ, ≤) is a finite distributive lattice. For two elements ∆1 , ∆2 ∈Σm , the arrays ∆1 ∨∆2 (resp. ∆1 ∧∆2 ) is the array formed by taking an entry-wise maximum (resp. minimum) of ∆1 , ∆2 . Every lattice that we use in this 97 7.1. Some correlation inequalities chapter will essentially be such a lattice and in fact, it can be shown that every finite distributive lattice is isomorphic to a sub-lattice of Σm×1 for some m [31, Chapter 14, Theorem 15]. Henceforth, we will regard Σm×n as the lattice (Σm×n , ≤). Let (Λ, ≤) be a finite distributive lattice. A function f : Λ → R is increasing (i.e. non-decreasing) if for all x, y∈Λ with x≤y, f (x)≤f (y). We call a subset A⊆Λ increasing if its indicator function is increasing and a 1-dimensional constraint S⊆Σ∗ , increasing, if Sn is increasing (w.r.t the lattice Σn ) for all n∈N. The notions of a decreasing function, subset and 1-dimensional constraint are defined in an analogous manner. A function f : Λ → R is monotone if it is increasing or decreasing. Now, let µ : P(Λ) → [0, 1] be a probability measure on Λ, where P(Λ) denotes the power set of Λ. For x ∈ Λ and such a measure µ, we abuse notation and write µ(x) to mean µ({x}). For any real function f on Λ we use Eµ (f ) = E(f ) to denote the expectation of f w.r.t µ, namely X Eµ (f ) = f (x)µ(x). x∈Λ The following is known as the Holley inequality. Theorem 14 (Holley inequality). Let (Λ, ≤) be a finite distributive lattice, and µ1 ,µ2 : P(Λ) → [0, 1] be two probability measures on Λ such that µ1 (x ∧ y)µ2 (x ∨ y) ≥ µ1 (x)µ2 (y) For all x, y∈Λ. (7.1) Then for any increasing function f : Λ → R, Eµ1 (f ) ≤ Eµ2 (f ). See [15] for a proof. A simple corollary of this inequality is the following lemma which we use in the next section. Lemma 10. Let W0 , . . ., Wm−1 be m random variables—each taking values in Σ—with two probability distributions µ1 , µ2 such that W0 , . . ., Wm−1 are independent w.r.t both µ1 and µ2 . Assume that for all t∈[m], µ1 (Wt = 1) ≤ µ2 (Wt = 1). Then for any increasing set A⊆Σm µ1 ((W0 , . . ., Wm−1 )∈A) ≤ µ2 ((W0 , . . ., Wm−1 )∈A). Proof. It’s easy to verify that (7.1) holds for µ1 and µ2 . The result follows by applying the Holley inequality to the indicator function of A. 98 7.2. Bounds on capacity using probability We will also make use of the following inequality, known as the FKG inequality ( [8]). A probability measure µ : P(Λ) → [0, 1] on Λ is called log supermodular if for all x, y∈Λ. µ(x ∧ y)µ(x ∨ y) ≥ µ(x)µ(y). (7.2) Theorem 15 (FKG inequality). Let Λ be a finite distributive lattice, and let µ : P(Λ) → [0, 1] be a log supermodular probability measure on Λ. Then for any monotone functions f, g : Λ → R the following statements hold: 1. If f, g are both increasing or both decreasing then they are positively correlated, namely E(f )E(g) ≤ E(f g) 2. If f is increasing and g is decreasing or vice versa then they are negatively correlated, namely E(f )E(g) ≥ E(f g) It is shown in [15] that this inequality is an easy corollary of Theorem 14. As the proof is short, we reproduce it here for completeness. Proof. Observe that it is enough to prove the theorem for the case where both f and g are increasing—the other cases follow from this case by taking f˜ = −f or g̃ = −g, as appropriate. Now, by adding a large enough constant to f , we may assume without loss of generality P that f is strictly positive. For a subset A⊆Λ, set µ1 (A) = µ(A) and µ2 (A) = x∈A (f (x)µ(x))/Eµ (f ). Then it’s easy to check that (7.1) is satisfied and therefore since g is increasing, Eµ1 (g) ≤ Eµ2 (g) P f (x)µ(x)g(x) ⇐⇒ Eµ (g) ≤ x∈Λ Eµ (f ) ⇐⇒ Eµ (g)Eµ (f ) ≤ Eµ (f g). 7.2 Bounds on capacity using probability Our goal in this section is to prove the following lower bound on the capacity of certain 2-dimensional constraints. Theorem 16. Let S = S (V) ⊗ S (H) be a 2-dimensional constraint with S (H) √= RLL(0, 1) and S (V) an increasing 1-dimensional constraint over Σ. Let ϕ = 1+2 5 2 denote the golden mean and set µ∞ (0) = ϕ21+1 and µ∞ (1) = ϕϕ2 +1 . Let G (V) = ((V (V) , E (V) ), L(V) ) be a lossless presentation of S (V) , and W (V) : E (V) → [0, 1] 99 7.2. Bounds on capacity using probability be the edge-weighting function given by W (V) (e) = µ∞ (L(V) (e)). Let A be the |V (V) |×|V (V) | real matrix with entries indexed by V (V) ×V (V) and given by X (A)(i,j) = W (V) (e). e∈E (V) , σ(e)=i,τ (e)=j Then 1 (V) cap(S) ≥ log ϕ − 1 − cap(S ) − log λ(A) , 2 Remark 1. If S = RLL(1, ∞)⊗S (V) with S (V) a decreasing 1-dimensional constraint over Σ, then simultaneously changing ‘0’s to ‘1’s and ‘1’s to ‘0’s in every array of S would result in a constraint that satisfies the requirement of the above theorem and has the same capacity as S. Thus, we can also use the theorem to get a lower bound on cap(S). Proof. For a probability space (Ω, F, µ), and events A, B ⊆ Ω with µ(A) > 0, we denote by µ(·|A) : F → [0, 1] the conditional probability measure given by µ(B|A) = µ(B ∩ A)/µ(A) for all B∈F. For a word w∈Σ∗ of length m, m∈N, we index its symbols by [m] so that w = w0 w1 . . .wm−1 . For m, n∈N, let µm×n : P(Σm×n ) → [0, 1] be the uniform probability measure on Σm×n . Clearly, µm×n satisfies (7.2) with equality. For a subset A⊆Σm×n , let 1A : Σm×n → {0, 1} be the indicator function of A, and for integers i∈[m] and j∈[n], define the sets: (m,n) = {∆∈Σm×n : ∆i,∗ satisfies S (H) }, Ri = Ri (m,n) Cj = Cj R = R(m,n) = {∆∈Σm×n : ∆∗,j satisfies S (V) }, \ (m,n) = Ri , i∈[m] (m,n) Ceven = Ceven = (m,n) \ Cj , and j∈[n], j even (m,n) Codd = Codd = \ (m,n) Cj . j∈[n], j odd Then since S (H) and S (V) are both increasing, it follows that, 1R∩Codd and 1Ceven 100 7.2. Bounds on capacity using probability are increasing, and consequently, by the FKG inequality, µm×n (Sm×n ) = µm×n (R ∩ Codd ∩ Ceven ) = Eµm×n (1R∩Codd ·1Ceven ) ≥ Eµm×n (1R∩Codd )Eµm×n (1Ceven ) = µm×n (R ∩ Codd )µm×n (Ceven ). It follows that log |Sm×n | log(2mn µm×n (Sm×n )) = lim lim m→∞ n→∞ mn mn (m,n)→∞ log (2mn µm×n (Ceven )µm×n (R ∩ Codd )) ≥ lim sup lim sup mn m→∞ n→∞ log µm×n (Ceven ) · 2mn · µm×n (R)µm×n (Codd |R) = lim sup lim sup mn m→∞ n→∞ dn/2e (V) (H) m log 2−m Sm Sn = lim sup lim sup + mn m→∞ n→∞ ! Qbn/2c Ti−1 C2j−1 log i=1 µm×n C2i−1 R ∩ j=1 cap(S) = lim mn = cap(S (H) ) + 1 2 lim sup lim sup m→∞ cap(S (V) ) − 1 + Pbn/2c Ti−1 log µ C C R ∩ m×n 2i−1 2j−1 j=1 i=1 mn n→∞ . (7.3) We claim that µm×n (C2i−1 |R∩ i−1 \ C2j−1 )≥µm×n (C2i−1 |R), (7.4) j=1 for all m, n ∈ N and integer 1≤i≤bn/2c. This follows from the right inequality given in the next Proposition. Proposition 14. For all m, n∈N and i∈[n] \ \ Ci−j ) ≤ µm×n (Ci |R) ≤ µm×n (Ci |R ∩ Ci−j ). µm×n (Ci |R ∩ 1≤j≤i, j odd 1≤j≤i, j even 101 7.2. Bounds on capacity using probability The proof of Proposition 14 is given in Section 7.2.1. Using (7.4) in (7.3) and the well-known fact (c.f. [32]) that cap(S (H) ) = log ϕ, we obtain 1 cap(S (V) ) − 1 + 2 Pbn/2c log µm×n (C2i−1 |R) lim sup lim sup i=1 mn m→∞ n→∞ 1 = log ϕ − (1 − cap(S (V) ))+ 2 Pbn/2c log µm×n (C2i−1 |R) lim sup lim sup i=1 mn m→∞ n→∞ cap(S) ≥ cap(S (H) ) + The theorem now follows from the next lemma whose proof is given in Section 7.2.2 Lemma 11. The following statements hold: Pbn/2c X Y i=1 log µm×n (C2i−1 |R) 1 1. For all m∈N, lim = log µ∞ (wj ). n→∞ n 2 (V) w∈Sm j∈[m] log 2. lim m→∞ 7.2.1 P (V) w∈Sm Q j∈[m] µ∞ (wj ) m = log λ(A). Proof of Proposition 14 Proof. We prove the right inequality since it is the only one we use in the proof of Theorem 16 and the proof of the left inequality is similar. Let m, n∈N and i∈[n]. We again use the FKG inequality. Set A = {i − j : j ∈ N, 1 ≤ j≤i, j is even}, S = [m]×A and let Λ0 = ΣS , be the set of all binary-valued functions with domain S. We think of a function in Λ0 as a binary non-contiguous configuration on the “sites” of S. For such a configuration ∆0 ∈Λ0 and j∈A, we write ∆0∗,j to denote the 1-dimensional word ∆0 (0, j)∆0 (1, j). . .∆0 (m − 1, j). As usual for ∆01 , ∆02 ∈Λ0 we write ∆01 ≤∆02 if ∆01 (j)≤∆02 (j) for all j∈S. Then Λ0 with this ordering is a finite distributive lattice. For a binary array ∆∈Σm×n , denote by π(∆)∈Λ0 the restriction of ∆ to S, namely the configuration ∆0 ∈Λ0 given by ∆0 (j) = ∆j for all j∈S. For a set B⊆Λ0 , let π −1 (B) denote the inverse image of B under π. Set µ = µm×n and let µ0 : P(Λ0 ) → [0, 1] be the push-forward probability measure of µ(·|R), defined by µ0 (B) = µ(π −1 (B)|R). For j∈A, let Cj0 = {∆0 ∈Λ0 : ∆0∗,j ∈ S (V) } T T and set C 0 = j∈A Cj0 . Then π −1 (Cj0 ) = Cj and π −1 (C 0 ) = j∈A Cj . Finally, 102 7.2. Bounds on capacity using probability let f, g : Λ0 → R be the functions given by 1 If ∆0 ∈C 0 . 0 0 f (∆ ) = 1C 0 (∆ ) = , 0 otherwise. and g(∆0 ) = µ(Ci |R ∩ π −1 ({∆0 })), for ∆0 ∈Λ0 . We claim that the following statements hold. • µ0 is log supermodular. • f is increasing. • g is increasing. Assuming that these do hold, then by the FKG inequality, we have X X X g(∆0 )µ0 (∆0 ) f (∆0 )g(∆0 )µ0 (∆0 ) ≥ f (∆0 )µ0 (∆0 ) ∆0 ∈Λ0 ⇐⇒ ∆0 ∈Λ0 X ∆0 ∈Λ0 µ(Ci |R ∩ π −1 ({∆0 }))µ(π −1 ({∆0 })|R) ≥ ∆0 ∈C 0 X ∆0 ∈C 0 ⇐⇒ X X µ(π −1 ({∆0 })|R) µ(Ci |R ∩ π −1 ({∆0 }))µ(π −1 ({∆0 })|R) ∆0 ∈Λ0 µ(Ci ∩ π −1 0 ({∆ })|R) ≥ ∆0 ∈C 0 X X µ(π −1 ({∆0 })|R) ∆0 ∈C 0 ⇐⇒ µ(Ci ∩ µ(Ci ∩ π −1 ({∆0 })|R) ∆0 ∈Λ0 \ Cj |R) ≥ µ( j∈A ⇐⇒ µ(Ci |R ∩ \ \ Cj |R)µ(Ci |R) j∈A Cj ) ≥ µ(Ci |R), j∈A which is the desired inequality. So it remains to show that the above 3 claims hold. We begin by showing that µ0 is log supermodular. Let ∆0 ∈Λ0 . Then µ(π −1 ({∆0 }) ∩ R) µ(R) −mn 2 ∆∈Σm×n : ∀t∈[m]∀j∈A(∆t,j = ∆0 (t, j) and ∆t,∗ ∈S (H) ) = 2−mn |R| n o (H) w∈Sn : ∀j∈A wj = ∆0 (t, j) Y = (7.5) (H) Sn t∈[m] µ0 (∆0 ) = 103 7.2. Bounds on capacity using probability Let j0 < j1 < . . . < j`−1 =i−2 be the elements of A and for symbols a, b ∈ Σ (H) and integer u≥0, define αu (a) = |{w∈Su : wa∈S (H) }|, δ(a, b) = |{x∈Σ : (H) axb∈S (H) }| and ωu (a) = |{w∈Su : aw∈S (H) }|. Since the memory of S (H) = RLL(0, 1) is 1, it readily follows that o n Y δ(ak , ak+1 ) ωn−i+1 (a`−1 ). w∈Sn(H) : ∀k∈[`] wjk = ak = αj0 (a0 ) k∈[`−1] Setting r = n − i + 1, we can rewrite (7.5) as Q`−2 0 (t, j ), ∆0 (t, j δ ∆ ) ωr ∆0 (t, j`−1 ) Y αj0 ∆0 (t, j0 ) k k+1 k=0 µ0 (∆0 ) = . (H) Sn t∈[m] Now, pick ∆01 , ∆02 ∈ Λ0 , and set Γ01 = ∆01 ∧ ∆02 and Γ02 = ∆01 ∨ ∆02 . 2 Y Y −2m µ0 (∆1 )µ0 (∆2 )= Sn(H) αj0 ∆0s (t, j0 ) · t∈[m] s=1 Y 2 Y Y δ ∆0s (t, jk ), ∆0s (t, jk+1 ) · t∈[m] k∈[`−1] s=1 2 Y Y ωr ∆0s (t, j`−1 ) t∈[m] s=1 For all a, b, c, d∈Σ, it’s easy to verify that δ(a, b)δ(c, d) ≤ δ(a ∧ c, b ∧ d)δ(a ∨ c, b ∨ d), and obviously αj0 (a)αj0 (b) = αj0 (a∧b)αj0 (a∨b) and ωr (a)ωr (b) = ωr (a∧b)ωr (a∨b). Thus, 2 Y Y −2m µ0 (∆1 )µ0 (∆2 ) ≤ Sn(H) αj0 Γ0s (t, j0 ) · t∈[m] s=1 Y 2 Y Y δ Γ0s (t, jk ), Γ0s (t, jk+1 ) · t∈[m] k∈[`−1] s=1 2 Y Y ωr Γ0s (t, j`−1 ) t∈[m] s=1 0 = µ (Γ01 )µ0 (Γ02 ). 104 7.2. Bounds on capacity using probability Hence µ0 is log supermodular. The second claim that f is increasing follows immediately from the fact the S (V) is increasing. So it remains to show that g is increasing (this is where the proof of the left inequality of the proposition differs). Let ∆0 ∈Λ0 . Then g(∆0 ) = µ(Ci |R ∩ π −1 (∆0 )) = µ(Ci ∩R∩π −1 (∆0 )) µ(R∩π −1 (∆0 )) n o (H) , ∗,i =w, ∀t∈[m] ∆t,∗ ∈S ∆∈Σm×n : ∆∀t∈[m]∀j∈A∆ =∆0 (t,j) P 2−mn w∈S (V) t,j m = −mn ∆∈Σm×n : ∀t∈[m]∀j∈A(∆t,j = ∆0 (t, j) and ∆t,∗ ∈S (H) ) 2 n o P Q (H) 0 (t, j) v∈S : v = w , ∀j∈A v =∆ (V) n i t j t∈[m] w∈Sm n o = , (7.6) Q (H) v∈Sn : ∀j∈A vj =∆0 (t, j) t∈[m] (H) For t∈[m], let qt = |{v∈Si−1 : ∀j∈A vj =∆0 (t, j)}|; then since S (H) has memory 1, it follows that for all t∈[m], and a∈Σ we have n o v∈Sn(H) : vi = a, ∀j∈A vj =∆0 (t, j) = qt δ(∆0 (t, i−2), a)ωn−i−1 (a), and n o v∈Sn(H) : ∀j∈A vj =∆0 (t, j) = qt ωn−i+1 (∆0 (t, i−2)) Substituting this into (7.6), we obtain g(∆0 ) = X X Y Y δ(∆0 (t, i−2), wt )ωn−i−1 (wt ) = ψ∆0 (t,i−2) (wt ), ωn−i+1 (∆0 (t, i−2)) (V) (V) w∈Sm t∈[m] w∈Sm t∈[m] n−i−1 (b) where ψa (b) = δ(a,b)ω for all a, b ∈ Σ. Observe that ψa (0)+ψa (1) = 1 and ωn−i+1 (a) that ψa (0), ψa (1)∈[0, 1] for all a∈Σ. It follows that g(∆0 ) is the probability that a randomly selected word consisting of m independent random bits, with the tth bit having a probability of ψ∆0 (t,i−2) (1) to be 1, satisfies S (V) . Now, let ∆01 , ∆02 ∈ Λ0 satisfy ∆01 ≤ ∆02 . We claim that for each t∈[m] ψ∆01 (t,i−2) (1)≤ψ∆02 (t,i−2) (1). Indeed, let t∈[m] be such that ∆01 (t, i−2) = 0 and ∆02 (t, i−2) = 1. Then δ(0, 1)ωn−i−1 (1) ωn−i−1 (1) = , ωn−i+1 (0) ωn−i (1) δ(1, 1)ωn−i−1 (1) 2ωn−i−1 (1) ψ∆02 (t,i−2) (1) = = , ωn−i+1 (1) ωn−i+1 (1) ψ∆01 (t,i−2) (1) = and (H) and since ωn−i+1 (1)≤2|Sn−i |=2ωn−i (1) it follows that ψ∆01 (t,i−2) (1) ≤ ψ∆02 (t,i−2) (1). As S (V) is increasing, Lemma 10 implies that g(∆01 )≤g(∆02 ) and therefore g is increasing as claimed. 105 7.2. Bounds on capacity using probability 7.2.2 Proof of Lemma 11 Fix m∈N. Observe that for any n1 , n2 ∈N we have µm×(n1 +n2 +1) (Cn1 |R) = = |Cn1 ∩R|2−mn |R|2−mn n ∆∈Σm×(n1 +n2 +1) : X ∆∗,n1 =w, ∀t∈[m]∆t,∗ ∈S (H) o |R| (V) w∈Sm = o n (H) = w v∈S : v Y n j n1 X (H) (V) w∈Sm j∈[m] = X Y (V) w∈Sm j∈[m] Sn1 +n2 +1 µn1 ,n2 (wj ), (7.7) where we define µn1 ,n2 : Σ → [0, 1] to be the function given by µn1 ,n2 (a) = (H) (H) |{v∈Sn1 +n2 +1 : vn1 = a}|/|Sn1 +n2 +1 | for a∈Σ. Since S (H) is increasing, for (H) any v∈Sn1 +n2 +1 with vn1 = 0, the word v 0 formed from v by changing the n1 ’th symbol to ‘1’ is also in S (H) . It follows that for all n1 , n2 ∈ N, µn1 ,n2 (1)≥ 21 , and thus by (7.7), one has µm×(n1 +n2 +1) (Cn1 |R)≥(µn1 ,n2 (1))m ≥2−m . (7.8) We next show that for all a∈Σ lim (n1 ,n2 )→∞ µn1 ,n2 (a) = µ∞ (a). (7.9) In fact it can be shown that the above limit exists (and equals a different µ∞ ) for all vertex constraints S (H) defined by a primitive graph. Let fn = |(RLL(0, 1))n |. It is well known (c.f. [32]) that fn is the (n + 2) Fibonacchi number and is given by 3 ϕ3 n ϕ̄ fn = ϕn + ϕ̄ , (7.10) 1 + ϕ2 1 + ϕ̄2 P P where ϕ̄ = 1 − ϕ. Since a∈Σ µn1 ,n2 (a) = a∈Σ µ∞ (a) = 1, it’s enough to 106 7.2. Bounds on capacity using probability show that (7.9) holds for a = 1. In this case, clearly, (H) |{v∈Sn1 +n2 +1 : vn1 = 1}| µn1 ,n2 (1) = fn1 +n2 +1 (H) (H) |{x1y ∈ Σn1 +n2 +1 : x∈Sn1 , y∈Sn2 }| fn1 +n2 +1 fn1 fn2 = . fn1 +n2 +1 = By substituting (7.10) into the last equality and taking the limit as (n1 , n2 ) → ∞ it can be verified that (7.9) for a = 1. P holds Q Now, set L = log w∈S (V) j∈[m] µ∞ (wj ). It follows from (7.7) and (7.9) m that lim(n1 ,n2 )→∞ log µm×(n1 +n2 +1) (Cn1 |R) = L. Let >0, and choose N ∈N such that for all n1 , n2 ≥N , | log µm×(n1 +n2 +1) (Cn1 |R) − L| < . For n∈N set An = {j∈[n] : j odd}, An,1 = {j∈An : j < N or j > n − 1 − N } and An,2 = An \An,1 . Then, for all n∈N with n > 2N , we have Pbn/2c i=1 log µm×n (C2i−1 |R) −L ≤ bn/2c P i∈An,1 log µm×n (Ci |R) |An | P i∈An,2 + log µm×n (Ci |R)−L −|An,1 |L . |An | Note that for all i∈An,2 , it holds that i≥N and n − i − 1≥N ; thus | log µm×n (Ci |R)−L| < . Also by (7.8), it follows that for all i∈An,1 , | log µm×n (Ci |R)|≤m. Therefore we obtain Pbn/2c i=1 log µm×n (C2i−1 |R) |An,1 | |An,2 | |An,1 | −L ≤ m+ + L bn/2c |An | |An | |An | ≤ 2N (m + L) + , bn/2c which is less than 2 for large enough n. It follows that Pbn/2c lim n→∞ i=1 X Y log µm×n (C2i−1 |R) = log µ∞ (wj ), bn/2c (V) w∈Sm j∈[m] which obviously implies the first statement of the lemma. 107 7.3. Open questions As for the second statement, for a path in G (V) , define its weight to be the product of the weights of its edges. Denote by Wm the sum of all the weights of paths of length m in G (V) . Since G (V) is a lossless presentation of S (V) we have X Y Wm µ∞ (wj ) ≤ Wm . ≤ (V) 2 |V | (V) w∈Sm j∈[m] On the other hand, Wm = 1t Am 1 and, applying Perron-Frobenius theory, we have limm→∞ logmWm = log(λ(A)). The second statement follows. 7.3 Open questions Can Theorem 16 be generalized to other constraints S (H) ? Can one use similar probabilistic tools to obtain upper bounds on the capacity of axial products of monotone 1-dimensional constraints? Finally, can such tools be used to obtain bounds on the capacity of constraints in more than 2 dimensions? 108 Bibliography [1] R.J Baxter, Hard hexagons: exact solution, Journal of Physics A: Mathematical and General 13 (1980), L61–L70. [2] R. Berger, The undecidability of the domino problem, Memoirs of the American Mathematical Society (1966). [3] N.J. Calkin and H.S. Wilf, The number of independent sets in a grid graph, SIAM Journal of Discrete Math. 11 (1998), no. 1, 54–60. [4] M. Cohn, On the channel capacity of read/write isolated memory, Discrete Applied Mathematics 56 (1995). [5] J.C. de Souza, B.H. Marcus, R. New, and Wilson B.A., Constrained systems with unconstrained positions, IEEE Transactions on Information Theory 48 (2002), 866–879. [6] A. Desai, Subsystem entropy for Zd sofic shifts, Indagationes Mathematicae 17 (2006). [7] K. Engel, On the Fibonacci number of an m×n lattice, Fibonacci Quarterly 28 (1990), 72–78. [8] C.M. Fortuin, P.W. Kasteleyn, and J. Ginibre, Correlation inequalities on some partially ordered sets, Comm. Math. Phys. 22 (1971), no. 2, 89–103. [9] S. Friedland, On the entropy of Zd subshifts of finite type, Linear Algebra and its Applications 252 (1997), no. 1–3, 199–220. [10] S. Friedland, Multi-dimensional capacity, pressure and Hausdorff dimension, Mathematical System Theory in Biology, Communication, Computation and Finance (2003), 183–222. [11] S. Friedland and U.N. Peled, The pressure, densities and first order phase transitions associated with multidimensional SOFT, arXiv:0906.5176v3, http://arxiv.org/abs/0906.5176v3, submitted, 2009. 109 [12] D.S. Gaunt and M.E. Fisher, Hard-sphere lattice gases, I. plane-square lattice, J. Chem. Phys. 43 (1965), 2840–2863. [13] M.J. Golin, X. Yong, Y. Zhang, and L. Sheng, New upper and lower bounds on the channel capacity of read/write isolated memory, Discrete Applied Mathematics 140 (2004). [14] S. Halevy and R. Roth, Parallel constrained coding with application to two-dimensional constraints, IEEE Transactions on Information Theory 48 (2002), 1009–1020. [15] R.H. Holley, Remarks on the FKG inequalities, Comm. Math. Phys. 36 (1974), no. 3, 227–231. [16] A. Horn and C. Johnson, Matrix analysis, Cambridge University Press, 1985. [17] R. Horst and N.V. Thoai, DC programming: overview, Journal of Optimizaton Theory and Applications 103 (1999), no. 1. [18] K.A.S. Immink, Codes for mass data storage, Shannon Foundation Publishers, Eindhoven, 2004. [19] K.A.S. Immink and K. Cai, Simple classes of constrained systems with unconstrained positions that outperform the maxentropic bound, IEEE International Symposium on Information Theory, 2008. [20] H. Ito, A. Kato, Zs. Nagy, and K. Zeger, Zero capacity region of multidimensional run length constraints, The Electronic Journal of Combinatorics 6 (1999). [21] H. Kamabe, Insertion rate and optimization of redundancy of constrained systems with unconstrained positions, Information Theory and Applications Workshop, 2009. [22] H. Kamabe, Lower bounds of capacity of 2D constraints, Information Theory and Applications Workshop, 2010. [23] P.W. Kasteleyn, The statistics of dimers on a lattice, Physica A 27 (1961), 1209–1225. [24] J.F.C. Kingman, A convexity property of positive matrices, The Quarterly Journal of Mathematics. Oxford. Second Series 12 (1963). [25] E.H. Lieb, Residual entropy of square ice, Phys. Rev. 162 (1967), no. 1, 162– 172. 110 [26] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding, Cambridge University Press, 1995 (reprinted 1999). [27] E. Louidor, The tradeoff function for a class of RLL(d, k) constraints, In review (SIAM Journal of Discrete Math). [28] E. Louidor and B.H. Marcus, Improved lower bounds on capacities of symmetric 2-dimensional constraints using Rayleigh quotients, IEEE Transactions on Information Theory 56 (2010), no. 4, 1624–1639. [29] E. Louidor, B.H. Marcus, and R. Pavlov, Independence entropy of Zd shift spaces, Submitted to Acta Applicandae Mathematicae. [30] E. Louidor, T.L. Poo, P. Chaichanavong, and B.H. Marcus, Maximum insertion rate and capacity of multidimensional constraints, IEEE International Symposium on Information Theory, 2008. [31] S. MacLane and G. Birkhoff, Algebra, second ed., Chelsea Pub Co, 1999. [32] B. Marcus, R. Roth, and P. Siegel, Constrained systems and coding for recording channels, Handbook of Coding Theory, Chapter 20, Elsevier Science. [33] M. Marcus and H. Minc, A survey of matrix theory and matrix inequalities, Allyn and Bacon, 1964. [34] N. Markley and M. Paul, Maximal measures and entropy of z ν subshifts of finite type, Classical mechanics and dynamical systems 70 (1981), 135–157. [35] Z. Nagy and K. Zeger, Capacity bounds for the three-dimensional (0,1) runlength limited channel, IEEE Transactions on Information Theory 44 (2000), 1030–1033. [36] E. Ordentlich and R. Roth, Independent sets in regular hypergraphs and multidimensional runlength-limited constraints, SIAM J. Discret. Math. 17 (2004), no. 4, 615–623. [37] T.L. Poo, Optimal code rates for constrained systems with unconstrained positions: An approach to combining error correction codes with modulation codes for digital storage systems, Ph.D. thesis, Department of Electrical Engineering, Stanford, 2005. [38] T.L. Poo, P. Chaichanavong, and B.H. Marcus, Trade-off functions for constrained systems with unconstrained positions, IEEE Transactions on Information Theory 52 (2006), 1425–1449. 111 [39] M. Schwartz and J. Bruck, Constrained codes as networks of relations, IEEE Transactions on Information Theory 54 (2008), 2179–2195. [40] A.J. van Wijngaarden and K.A.S. Immink, Maximum runlength-limited codes with error control capabilities, IEEE J. Select. Areas Commun. 19 (2001), 602–611. [41] K. Weber, On the number of stable sets in an m × n lattice., Rostock. Math. Kolloq. 34 (1988), 28–36. [42] X. Yong and M.J. Golin, New techniques for bounding the channel capacity of read/write isolated memory, Data Compression Conference, 2002. 112
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Capacity of multidimensional constrained channels :...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Capacity of multidimensional constrained channels : estimates and exact computations Louidor, Erez 2010
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Capacity of multidimensional constrained channels : estimates and exact computations |
Creator |
Louidor, Erez |
Publisher | University of British Columbia |
Date Issued | 2010 |
Description | This work considers channels for which the input is constrained to be from a given set of D-dimensional arrays over a finite alphabet. Such a set is called a constraint. An encoder for such a channel transforms arbitrary arrays over the alphabet into constrained arrays in a decipherable manner. The rate of the encoder is the ratio of the size of its input to the size of its output. The capacity of the channel or constraint is the highest achievable rate of any encoder for the channel. We compute the exact capacity of two families of multidimensional constraints. We also generalize a known method for obtaining lower bounds on the capacity, for a certain class of 2-dimensional constraints, and improve the best known bounds for a few constraints of this class. Given a binary D-dimensional constraint, a D-dimensional array with entries in {0,1,⃞} is called "valid", for the purpose of this abstract, if any "filling" of the '⃞'s in the array with '0's and '1's, independently, results in an array that belongs to the constraint. The density of '*'s in the array is called the insertion rate. The largest achievable insertion rate in arbitrary large arrays is called the maximum insertion rate. An unconstrained encoder for a given insertion rate transforms arbitrary binary arrays into valid arrays having the specified insertion rate. The tradeoff function essentially specifies for a given insertion rate the maximum rate of an unconstrained encoder for that insertion rate. We determine the tradeoff function for a certain family of 1-dimensional constraints. Given a 1-dimensional constraint, one can consider the D-dimensional constraint formed by collecting all the D-dimensional arrays for which the original 1-dimensional constraint is satisfied on every "row" in every "direction". The sequence of capacities of these D-dimensional generalizations has a limit as D approaches infinity, sometimes called the infinite-dimensional capacity. We partially answer a question of [37], by proving that for a large class of 1-dimensional constraints with maximum insertion rate 0, the infinite dimensional capacity equals 0 as well. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-05-21 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 3.0 Unported |
DOI | 10.14288/1.0069991 |
URI | http://hdl.handle.net/2429/24888 |
Degree |
Doctor of Philosophy - PhD |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2010-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/3.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2010_fall_louidor_erez.pdf [ 810.17kB ]
- Metadata
- JSON: 24-1.0069991.json
- JSON-LD: 24-1.0069991-ld.json
- RDF/XML (Pretty): 24-1.0069991-rdf.xml
- RDF/JSON: 24-1.0069991-rdf.json
- Turtle: 24-1.0069991-turtle.txt
- N-Triples: 24-1.0069991-rdf-ntriples.txt
- Original Record: 24-1.0069991-source.json
- Full Text
- 24-1.0069991-fulltext.txt
- Citation
- 24-1.0069991.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0069991/manifest