Capacity of Multidimensional Constrained Channels Estimates and Exact Computations by Erez Louidor B.Sc., The Technion Inst. of Technology, 1998 M.Sc., The Technion Inst. of Technology, 2004 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Mathematics) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) May 2010 c© Erez Louidor 2010 Abstract This work considers channels for which the input is constrained to be from a given set ofD-dimensional arrays over a finite alphabet. Such a set is called a constraint. An encoder for such a channel transforms arbitrary arrays over the alphabet into constrained arrays in a decipherable manner. The rate of the encoder is the ratio of the size of its input to the size of its output. The capacity of the channel or con- straint is the highest achievable rate of any encoder for the channel. We compute the exact capacity of two families of multidimensional constraints. We also gen- eralize a known method for obtaining lower bounds on the capacity, for a certain class of 2-dimensional constraints, and improve the best known bounds for a few constraints of this class. Given a binary D-dimensional constraint, a D-dimensional array with entries in {0, 1,} is called “valid”, for the purpose of this abstract, if any “filling” of the ‘’s in the array with ‘0’s and ‘1’s, independently, results in an array that be- longs to the constraint. The density of ‘’s in the array is called the insertion rate. The largest achievable insertion rate in arbitrary large arrays is called the maximum insertion rate. An unconstrained encoder for a given insertion rate transforms arbi- trary binary arrays into valid arrays having the specified insertion rate. The tradeoff function essentially specifies for a given insertion rate the maximum rate of an un- constrained encoder for that insertion rate. We determine the tradeoff function for a certain family of 1-dimensional constraints. Given a 1-dimensional constraint, one can consider the D-dimensional con- straint formed by collecting all the D-dimensional arrays for which the original 1-dimensional constraint is satisfied on every “row” in every “direction”. The se- quence of capacities of these D-dimensional generalizations has a limit as D ap- proaches infinity, sometimes called the infinite-dimensional capacity. We partially answer a question of [37], by proving that for a large class of 1-dimensional con- straints with maximum insertion rate 0, the infinite dimensional capacity equals 0 as well. ii Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Statement of Co-Authorship . . . . . . . . . . . . . . . . . . . . . . . . ix 1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Multidimensional constraints . . . . . . . . . . . . . . . . . . . . . 6 2.1 One-dimensional constraints and labeled directed graphs . . . . . 6 2.2 Higher dimensional constraints . . . . . . . . . . . . . . . . . . 8 2.3 Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.4 Axial product . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.5 Two-dimensional constraints . . . . . . . . . . . . . . . . . . . . 15 2.5.1 Horizontal and vertical strips . . . . . . . . . . . . . . . 15 2.6 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3 Lower bounds on capacity of 2-dimensional symmetric constraints 18 3.1 Constraints with symmetric edge-constrained strips . . . . . . . . 18 3.2 Constraints with symmetric vertex-constrained strips . . . . . . . 27 3.3 Capacity bounds for axial products of constraints . . . . . . . . . 31 3.4 Heuristics for choosing φ . . . . . . . . . . . . . . . . . . . . . . 33 3.4.1 Using max-entropic probabilites . . . . . . . . . . . . . . 33 3.4.2 General optimization . . . . . . . . . . . . . . . . . . . . 36 3.5 Numerical results for selected constraints . . . . . . . . . . . . . 38 iii Table of Contents 3.5.1 The constraint RWIM . . . . . . . . . . . . . . . . . . . 40 3.5.2 The constraint EVEN⊗2 . . . . . . . . . . . . . . . . . . 40 3.5.3 The constraint CHG(b)⊗2 . . . . . . . . . . . . . . . . . 40 3.6 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4 Exact computation of capacity . . . . . . . . . . . . . . . . . . . . . 44 4.1 The capacity of ODD⊗D . . . . . . . . . . . . . . . . . . . . . . 44 4.2 The capacity of CHG(2)⊗D . . . . . . . . . . . . . . . . . . . . 45 5 Multi-choice constraints and independence capacity . . . . . . . . . 49 5.1 Multi-choice constraints . . . . . . . . . . . . . . . . . . . . . . 49 5.2 Independence capacity . . . . . . . . . . . . . . . . . . . . . . . 51 5.3 Independence capacity and axial products . . . . . . . . . . . . . 53 5.4 Independence capacity and limD→∞ cap(S⊗D) . . . . . . . . . . 55 5.5 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 6 The tradeoff function for binary 1-dimensional constraints . . . . . 62 6.1 A brief overview of digital recording . . . . . . . . . . . . . . . 62 6.2 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 6.3 Background and definitions . . . . . . . . . . . . . . . . . . . . 64 6.4 Proof of Theorem 12 . . . . . . . . . . . . . . . . . . . . . . . . 69 6.5 Proof of Theorem 13 . . . . . . . . . . . . . . . . . . . . . . . . 71 6.5.1 Outline of proof . . . . . . . . . . . . . . . . . . . . . . 72 6.5.2 Proof of propositions . . . . . . . . . . . . . . . . . . . 78 6.6 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 7 Bounds on capacity using probability . . . . . . . . . . . . . . . . . 97 7.1 Some correlation inequalities . . . . . . . . . . . . . . . . . . . 97 7.2 Bounds on capacity using probability . . . . . . . . . . . . . . . 99 7.2.1 Proof of Proposition 14 . . . . . . . . . . . . . . . . . . 102 7.2.2 Proof of Lemma 11 . . . . . . . . . . . . . . . . . . . . 106 7.3 Open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 iv List of Tables 3.1 Matrix size in the method of [3, 7] for the NAK constraint. . . . . 39 3.2 Best bounds on capacities of certain constraints. . . . . . . . . . . 40 3.3 Lower bounds using max-entropic probability heuristic. . . . . . . 41 3.4 Lower bounds using optimization. . . . . . . . . . . . . . . . . . 42 6.1 Values of |∆(g)| for 1≤g≤2d+6 and d≥2. . . . . . . . . . . . . 90 v List of Figures 1.1 Presentations of 1-dimensional constraints. . . . . . . . . . . . . 2 2.1 Presentations of 2-dimensional constraints. . . . . . . . . . . . . 10 3.1 Paths generating an `α×n-array of S, in G(V)n and I. . . . . . . . 23 4.1 Example of the graph Gr for D = 2, n = 6. . . . . . . . . . . . . 47 5.1 Proof of Theorem 10. . . . . . . . . . . . . . . . . . . . . . . . . 60 6.1 Graphs of the tradeoff function for certain RLL constraints. . . . . 68 6.2 The graph ĜFRLL(d,∞) . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.3 The non-trivial component of ĜFRLL(d,2d+2) . . . . . . . . . . . . . 72 vi Acknowledgements I am sincerely grateful for the support of my advisor, Prof. Brian Marcus, for being actively involved in my work, being realistically optimistic along the way, and making these abstract concepts a little more tangible by allowing me to discuss them with another. The only downside is that I now have high expectations of the next person I work with. vii Dedicated to my parents: Eti & Adam Louidor. viii Statement of Co-Authorship Chapters 2,3 and 4 were jointly authored by Erez Louidor and Brian Marcus. Erez Louidor and Brian Marcus were responsible for the identification and design of the research program. The research was performed by Erez Louidor and Brian Marcus with the majority of the research done by Erez Louidor. The manuscript was prepared by Erez Louidor and Brian Marcus with the majority of the writing done by Erez Louidor. A version of these chapters appears in [28]. Chapter 5 was jointly authored by Erez Louidor, Brian Marcus and Ronnie Pavlov. Panu Chaichanavong, Brian Marcus and Tze-Lei Poo were responsible for the identification and design of the research program. The research was performed by Panu Chaichanavong, Erez Louidor, Brian Marcus, Ronnie Pavlov and Tze-Lei Poo. The manuscript was prepared by Erez Louidor, Brian Marcus and Ronnie Pavlov with the majority of the writing done by Erez Louidor. A version of this chapter appears as part of [29]. Chapter 6 was authored by Erez Louidor. Brian Marcus was responsible for the identification and design of the research program. The research was performed by Erez Louidor. The manuscript was prepared by Erez Louidor. A version of this chapter appears in [27]. Chapter 7 was authored by Erez Louidor. Erez Louidor and Brian Marcus were responsible for the identification and design of the research program. The research was performed by Erez Louidor. The manuscript was prepared by Erez Louidor. ix Chapter 1 Overview Fix an alphabet Σ and let G be a directed graph whose edges are labeled with sym- bols in Σ. Each path in G corresponds to a finite word obtained by reading the labels of the edges of the path in sequence. The path is said to generate the corresponding word, and the set of words generated by all finite paths in the graph is called a 1-dimensional constrained system or a 1-dimensional constraint. Such a graph is called a presentation of the constraint. We say that a word satisfies the constraint if it belongs to the constrained system. One-dimensional constraints have found widespread applications in digital storage systems, where they are used to model the set of sequences that can be written reliably to a medium. A central exam- ple is the binary run-length-limited constraint, denoted RLL(d, k) for nonnegative integers 0≤d≤k, consisting of all binary sequences in which the number of ‘0’s between consecutive ‘1’s is at least d, and each “run” of ‘0’s (that is a contiguous sub-sequence of ‘0’s) has length at most k. The value of k is allowed to be∞, in which case there is no restriction on the maximum length of a run of ‘0’s. Another 1-dimensional constraint, often used in practice, is the bounded-charge constraint, denoted CHG(b), for some positive integer b; it consists of all words w1w2. . .w`, where `=0, 1, 2, . . . and each wi is either +1 or −1, such that for all 1≤i≤j≤`, |∑jk=iwk|≤b. This constraint is often used to overcome low frequency noise such as fingerprints on compact discs. Other examples of 1-dimensional constraints are the EVEN and ODD constraints, which contain all finite binary sequences in which the number of ‘0’s between consecutive ‘1’s is even and odd, respectively. Presentations for these constraints are given in Figure 1. See [32] for more exam- ples of 1-dimensional constraints and a more detailed explanation of their use in storage systems. In this work, we consider multidimensional constraints of dimension D for some positive integerD. Such a constraint is a set, specified byD edge-labeled di- rected graphs, of finite-size D-dimensional arrays with entries over some finite alphabet. In Chapter 2 we give a precise definition of what we mean by D- dimensional constraints. As in the 1-dimensional case, we say that an array sat- isfies the constraint if it belongs to it. Given a 1-dimensional constraint S, one can construct aD-dimensional constraint by collecting all theD-dimensional arrays for which the original 1-dimensional constraint is satisfied on every “row” in every “di- 1 Chapter 1. Overview 0 0 1 (a) 0 0 1 (b) 0 1 2 . . . b +1 +1 +1 +1 −1−1−1−1 (c) 0 1 . . . d d+1 . . . k 0 0 0 0 0 0 1 1 1 (d) Figure 1.1: Presentations of 1-dimensional constraints: (a) EVEN; (b) ODD; (c) CHG(b); (d) RLL(d, k). rection” along an “axis” of the array. We denote such a D-dimensional constraint by S⊗D. A well-known 2-dimensional constraint studied in statistical mechanics is the so called “hard-square” constraint. It consists of all finite-size (2-dimensional) binary arrays which do not contain 2 adjacent ‘1’s either horizontally or vertically. Two variations of this constraint are the isolated ‘1’s or “non-attacking-kings” con- straint, denoted NAK, and the “read-write-isolated-memory” constraint, denoted RWIM. The former consists of all finite-size binary arrays in which there are no two adjacent ‘1’s either horizontally, vertically, or diagonally, and the latter con- sists of all finite-size binary arrays in which there are no two adjacent ‘1’s either horizontally, or diagonally. Like their 1-dimensional counterparts, 2-dimensional 2 Chapter 1. Overview constraints play a role in storage systems, where with recent developments, in- formation is written in a true 2-dimensional fashion rather than using essentially 1-dimensional tracks. The RWIM constraint is used to model sequences of states of a binary linear memory in which no two adjacent entries may contain a ‘1’, and in every update, no two adjacent entries are both changed. See [4] and [13] for more details. Let S now be a D-dimensional constraint over an alphabet Σ. For a D-tuple m = (m1,m2, . . .,mD) of positive integers, let Sm or Sm1×...×mD denote the set of all m1×m2×. . .×mD arrays in S, and vol(m) denote the product of the entries of m. We say that a sequence mi = (m (i) 1 , . . . ,m (i) D ) diverges to infinity, denoted mi →∞, if (m(i)j )∞i=1 does for each j. The capacity of S is then defined by cap(S) = lim i→∞ log |Smi | vol(mi) , (1.1) where (mi)∞i=1 is a sequence of D-tuples in (N)D diverging to infinity, | · | de- notes cardinality, and log = log2. We show in Chapter 2 that the limit always exists and is independent of the choice of (mi)∞i=1. The capacity is a fundamental quantity that has been studied under different names in several disciplines dealing with constrained systems. In symbolic dynamics it is known as the “topological entropy” and in statistical physics it is derived from the “grand partition function”. In the context of coding for storage systems, capacity has a practical role. As al- ready mentioned, in many such systems due to physical constraints, only a subset of binary sequences can be written to the media reliably. This subset is typically modeled as a 1-dimensional constrained system over the binary alphabet {0, 1}. In practice, user information, consisting of an arbitrary sequence of binary digits is encoded into a sequence of the constraint before being written to the media. When reading back the data, the constrained sequence is decoded and the original infor- mation is recovered. Typically, an encoder divides its input into fixed size blocks of p digits each, emitting, for each block, a block of q digits, such that when all the output blocks are concatenated the result satisfies the constraint. The ratio p/q is called the rate of the encoder and, naturally, it is desirable that the rate of the encoder be as large as possible. It turns out, that the capacity of a 1-dimensional constraint, is the largest possible rate of any such encoder for the constraint, and hence, is important for the design and evaluation of efficient encoders for digital storage systems. While there is a formula for computing the capacity of a general 1-dimensional constrained system (up-to finding the largest root of a polynomial), no such formula is known for 2- and higher-dimensional constraints. There are only a handful of constraints for which the capacity is nonzero and is known exactly [23, 25, 39]. 3 Chapter 1. Overview Even for the hard-square constraint, the exact capacity is unknown and the problem goes back more than 40 years [12]. Some rigorous evidence for the hardness of computing the capacity of multidimensional constraints is given in [2], where it is shown that there is no algorithm that accepts a 2-dimensional constraint system S as input and determines whether |Sm×n| ≥ 1 for all m,n. In many storage systems, restricting the set of sequences that can be written to the media to be from a constrained system is not enough to ensure the low bit- error-rate required in these systems. Accordingly, a conventional error correcting code, or ECC, is used in addition to further reduce the number of errors. Tradi- tionally, arbitrary user information is encoded twice, first by the ECC encoder and then by the constrained system encoder, before it is written to the media. Immink and Wijngaarden [40] proposed a scheme to embed the ECC directly in the con- strained system. In this scheme, the user information sequence is encoded into a binary sequence in which certain preset positions are left blank. These posi- tions are denoted by ‘’s and are “unconstrained” in the sense that any way of filling them independently with ‘0’s and ‘1’s would result in a sequence that satis- fies the given constrained system. The density of ‘’s in the resulting sequence is called the insertion rate. Next, a systematic ECC is used to compute parity check bits, which are stored directly in these unconstrained positions and the resulting sequence is written to the media. In this manner the written sequence is both a constrained sequence and an ECC codeword. As the error correcting capability of an ECC depends on the number of parity-check bits, it is desirable that the num- ber of unconstrained positions emitted by the encoder, or equivalently the insertion rate, be as large as possible. The maximum insertion rate is the largest insertion rate achievable in arbitrary long sequences. On the other hand as the number of constrained sequences of a given length with a given insertion rate is inversely re- lated to the insertion rate, increasing the insertion rate reduces the overall encoding rate (that is the rate of the combined constrained system and ECC encoders). The tradeoff function [5, 38] quantifies this and provides for a given insertion rate the highest possible rate of any matching encoder. Accordingly, the tradeoff function of a constraint evaluated at 0 equals its capacity and thus can be regarded as a generalization of capacity. In this work we generalize some of these concepts to higher dimensional con- straints and non-binary alphabets. In particular, we define independence capac- ity of a constraint that, roughly speaking, captures the contribution of indepen- dence between symbols in arrays of the constraint to its capacity. For the binary alphabet this coincides with the notion of maximum insertion rate. We denote it by capind(S) for a constraint S. For a 1-dimensional constraint S, it turns out that cap(S⊗1) ≥ cap(S⊗2) ≥ . . . ≥ capind(S) and we denote the limit 4 Chapter 1. Overview limD→∞ cap(S⊗ D ) by cap∞(S); hence cap∞(S)≥capind(S). Chaichanavong and Poo observed the curious fact that for all 1-dimensional constraints S, for which we we know cap∞(S), it turns out to be equal to capind(S), and they ask whether this always holds [37]. Here, we give a partial answer by showing that for a large class of constraints, if capind(S) = 0, than so is cap∞(S) and the convergence is exponentially fast. The main contributions of this work are summarized below. • Calculated the capacity of CHG(2)⊗D and ODD⊗D, for all D∈N [28]. • Generalized an earlier method for computing lower bounds on a certain class of 2-dimensional constraints, and using the method improved the best bounds on cap(NAK) and cap(RWIM), and gave the first published esti- mates of cap(EVEN⊗2) and cap(CHG(3)⊗2) [28]. • Showed that for a large class of constraints S with zero independence capac- ity, cap(S⊗D)→ 0 exponentially fast [29, 30]. • Determined the tradeoff function of RLL(d,∞) and RLL(d, 2d+ 2) [27]. • Showed how correlation inequalities can be used to obtain lower bounds on “monotone” 2-dimensional constraints. This work is organized as follows. In Chapter 2 we define multidimensional constraints and related concepts. In Chapter 3 we show a method for obtaining lower bounds on the capacity, for a class of 2-dimensional constraints. In Chap- ter 4 we calculate the exact capacity of CHG(2)⊗D and ODD⊗D. In Chapter 5 we generalize some of the concepts of [37, 38] to dimensions larger than 1 and non- binary alphabets and define multi-choice constraints and independence capacity. We show some of their properties and in particular prove the exponential conver- gence of capind(S) to cap∞(S) for certain constraints S, discussed above. In Chapter 6 we compute the tradeoff function for RLL(d, 2d + 2) and RLL(d,∞). Finally, in Chapter 7 we show some probabilistic inequalities that hold for certain 2-dimensional binary constraints. From these, we obtain a lower bound on the capacity. 5 Chapter 2 Multidimensional constraints∗ In this chapter we define multidimensional constraints and related concepts that we use in the rest of this thesis. 2.1 One-dimensional constraints and labeled directed graphs We deal with a finite directed graph G = (V,E), sometimes simply called a graph, with vertices V and edges E. We occasionally refer to the vertices as states and to the edges as transitions. For e∈E we denote by σG(e) and τG(e) the initial and terminal vertices of e inG, respectively. We shall omit the subscriptG from σG and τG when the graph is clear from the context. For a sequence (ai)`i=1 and a set A, we abuse notation and write (ai)⊆A to mean that ai∈A for i = 1, 2, . . ., `. A path of length ` in G is a sequence of ` edges (ei)`i=1⊆E, where for i = 1, 2, . . ., `−1, τ(ei)=σ(ei+1). The path starts at the vertex σ(e1) and ends at the vertex τ(e`). A cycle in G is a path that starts and ends at the same vertex. Fix a finite alphabet Σ. A directed labeled graph G with labels in Σ is a pair G = (G,L), where G = (V,E) is a directed graph, and L : E → Σ is a labeling of the edges of G with symbols of Σ. The paths and cycles of G are inherited from G and we will sometime use σG and τG to denote σG and τG respectively. For a path (ei)`i=1 of G, we say the path generates the word L(e1)L(e2). . .L(e`) in Σ∗ (Σ∗ denote the set of all finite (1-dimensional) words over Σ). As mentioned in Chapter 1, a 1-dimensional constraint or 1-dimensional con- strained system over Σ is the set of all words generated from finite paths in some labeled graph with labels in Σ. The graph is called a presentation of the constraint. A labeled direct graph is called lossless if for any two of its vertices u and v, all paths starting at u and terminating at v generate distinct words. It is called deter- ministic if there are no two distinct edges with the same initial vertex and the same label. Every 1-dimensional constraint S has a deterministic, and therefore lossless, ∗A version of this chapter has been published. Louidor, E. and Marcus, B.H. (2010) Improved Lower Bounds on Capacities of Symmetric 2-Dimensional Constraints using Rayleigh Quotients. IEEE Transactions on Information Theory 56:1624–1639. 6 2.1. One-dimensional constraints and labeled directed graphs presentation [32]. A 1-dimensional constraint over an alphabet Σ is said to have memory m, for some positive integer m, if for every word w of more than m letters over Σ, in which every sub-word of m + 1 consecutive letters satisfies S, it holds that w satisfies S as well, and m is the smallest integer for which this is true. A 1- dimensional constraint with memory m, for some integer m, is called a finite- type constraint. Of the examples introduced in Chapter 1, RLL(d, k) is a finite- type constraint—with memory k, for k<∞, and memory d, for k=∞—whereas EVEN, ODD and CHG(b) for b≥2 are not finite-type constraints. We introduce two 1-dimensional constraints defined by general directed graphs. Let G=(V,E) be a directed graph. The edge constraint defined by G, denoted X(G), is the 1-dimensional constraint over the alphabet E, presented by G = (G, IE) where IE is the identity map on E. Equivalently, an edge constraint is a constraint that can be presented by a labeled graph in which all the edges have distinct labels. For a graph G=(V,E) with no parallel edges, the vertex-constraint defined by G, denoted Ẋ(G), is the set{ (vi) ` i=1⊆V : `=0,1,2,. . ., and for 1≤i<`, ∃ei∈E s.t. σ(ei)=vi, τ(ei)=vi+1 } . It is not hard to verify that vertex-constraints and edge-constraints are 1- dimensional constraints with memory (at most) 1. In fact, the vertex constraints are precisely the finite-type constraints with memory (at most) 1, and it can be shown that edge constraints are characterized as follows. The follower set of a symbol a in a constraint S is defined to be {b : ab ∈ S}; edge constraints are precisely the constraints with memory 1 such that any two follower sets are either disjoint or identical [26, exercise 2.3.4]. A graph G = (V,E) is irreducible if for any pair of vertices u, v∈V there is a path in G starting at u and terminating at v; otherwise it is reducible. A graph G is primitive if it is irreducible and the gcd of the lengths of all cycles of G is 1. These concepts naturally extend to labeled graphs as well. We denote by A(G) the adjacency matrix ofG: namely the |V |×|V |matrix where (A(G))i,j is the number of edges in G from i to j, where | · | denotes cardinality. We use 1 in this work to denote a real vector in which each entry is 1 and for two real matrices (or vectors) M ,N of the same size we write M≤N and M<N if the corresponding inequality holds entry-wise. We say that a graph G is symmetric if A(G) is symmetric. We say that a vertex of a graph is isolated if it has neither outgoing nor incoming edges. We say that a vertex-constraint (resp. edge-constraint) is symmetric if it is defined by a symmetric graph. For a vertex-constraint, this definition is equivalent to requiring that the constraint is closed under reversal of the order of symbols 7 2.2. Higher dimensional constraints in words. Note, that in a symmetric edge-constraint, up to removal of isolated vertices, the (unlabeled) graph defining the constraint is unique. 2.2 Higher dimensional constraints We consider multidimensional arrays of dimension D—a positive integer. We use Z+ to denote the set of nonnegative integers. For a D-tuple m = (m1, . . . ,mD)∈(Z+)D we denote by [m] the Cartesian product ∏ i 0, . . .,mi−1, and for a finite set A, we call an m1 ×m2 × . . .×mD D-dimensional array with entries in A, a D-dimensional array of size m over A. We shall index the entries of such an array by [m]. We use Am and Am1×...×mD to denote the set of all D- dimensional arrays of size m over A. We define A∗...∗ = A∗D , where the number of ‘∗’s in the superscript is D, by A∗D = ⋃ m Am, as the set of all finite-size D-dimensional arrays with entries in A. Let Γ∈Σ∗D be such an array. Given an integer 1≤i≤D, a row in direction i of Γ is a se- quence of entries of Γ of the form ( Γ(k1,k2,...,ki−1,j,ki+1,...,kD) )mi−1 j=0 for some in- tegers kl∈[ml]; 1≤l≤D, l 6=i. In this work, for D = 2, we use the convention that direction 1 is the vertical direction and direction 2 is the horizontal; thus the columns of a 2-dimensional array are its rows in direction 1, and its “traditional rows” are its rows in direction 2. Let A,B be finite sets and L : A → B be a map- ping. We extend L to a mapping L : A∗D→B∗D as follows. For a D-dimensional array Γ∈Am, L(Γ) is the array in Bm obtained by applying L to each entry of Γ, that is (L(Γ))j = L((Γ)j) , j∈[m]. Additionally, for a subset S⊆A∗D we define L(S) = {L(Γ) : Γ∈S}. We generalize the definition of a constrained system to D dimensions. Let Ḡ = (G1,G2, . . .,GD), be a D-tuple of labeled graphs with the same set of edges E and the same labeling L : E → Σ. The edge e has D pairs of initial and terminal vertices (σGi(e), τGi(e))—one for each graph Gi in Ḡ. We say that an array Γ∈Σ∗D of size m is generated by Ḡ if there exists an array Γ′∈E∗D of size m, such that for i = 1, 2, . . .,D, every row in direction i of Γ′ is a path in Gi, and L(Γ′) = Γ. We call the set of all arrays Γ∈Σ∗D generated by Ḡ, theD-dimensional constrained system or the D-dimensional constraint presented by Ḡ, and denote it by X(Ḡ) (note the difference from the notation used for edge-constraints where 8 2.2. Higher dimensional constraints the argument inside the parenthese is an unlabeled graph). We say that Ḡ is a presentation for X(Ḡ). In [14], 2-dimensional constrained systems are defined by vertex-labeled graphs, with a common set of vertices and a common labeling on the vertices. It can be shown that their definition (generalized to higher dimensions) is equiva- lent to ours. We find it more convenient to use our definition, since, just as in one dimension, it permits use of parallel edges and often enables a smaller presentation of a given constraint. Figure 2.2 shows presentations for the NAK and RWIM constraints defined in Chapter 1. In these presentations G1 and G2 describe the vertical and horizontal constraints on the edges, respectively. Each edge e = (e)i,j is a 2×2 binary matrix of the form e = ( (e)(0,0) (e)(0,1) (e)(1,0) (e)(1,1) ) , and it is labeled by (e)(1,1), i.e., the labeling of an edge simply picks out the entry in the lower-right corner. For NAK, the edges E = ENAK are the 2 × 2 matrices which satisfy NAK, that is, with at most one 1. Similarly, for RWIM, the edges E = ERWIM are the 2× 2 matrices which satisfy RWIM, namely, the elements of ENAK together with ( 1 0 1 0 ) and ( 0 1 0 1 ) , In the figures, each edge is drawn twice—once in G1 and once in G2—and the matrix identifying it is written next to it. The states are 1 × 2 blocks for G1 and 2× 1 blocks for G2; for an edge e, σG1(e) = (e)(0,0)(e)(0,1) , and τG1(e) = (e)(1,0)(e)(1,1) , and σG2(e) = (e)(0,0) (e)(1,0) , and τG2(e) = (e)(0,1) (e)(1,1) . It follows that, for both constraints, and any rectangular array Γ′∈Em×n with each of its rows a path in G2 and each of its columns a path in G1, it holds that Γ′(i,j) = ( L(Γ′(i−1,j−1)) L(Γ′(i−1,j)) L(Γ′(i,j−1)) L(Γ′(i,j)) ) for i = 1, 2, . . .,m−1, j = 1, 2, . . ., n−1. Therefore, the only 2×2 sub-arrays appearing in the array L(Γ′) are elements of E, and it follows that L(Γ′) satisfies the corresponding constraint. Similarly, it can be shown that any array satisfying the constraint can be generated by the presentation. 9 2.3. Capacity G1: 01 00 10 G2 : 01 00 10 ( 01 00 ) (0000) ( 00 01 ) ( 00 10 ) ( 10 00 ) ( 00 10 ) (0000) ( 00 01 ) ( 01 00 ) ( 10 00 ) (a) G1: 01 00 10 G2 : 01 00 10 1 1 ( 01 00 )(0101) (0000) ( 00 01 ) ( 00 10 ) ( 10 00 ) ( 10 10 ) ( 00 10 ) (0000) ( 00 01 ) ( 01 00 ) ( 10 00 )( 10 10 ) ( 01 01 ) (b) Figure 2.1: Presentations of 2-dimensional constraints: (a) NAK constraint; (b) RWIM constraint. 2.3 Capacity In Chapter 1, we introduced the notion of capacity of a D-dimensional constraint. Here we expand on the definition; in particular we show that capacity is well- defined. We first need a generalization of Fekete’s Subadditivity Lemma which will prove useful in subsequent chapters as well. We denote byN the set of positive integers and by R̄ = R ∪ {−∞,+∞} the extended real numbers. We say that a function f : ND→R̄ has a limit L∈R̄ at infinity, denoted limm→∞ f(m) = L, if for every sequence (mi)∞i=1⊆ND with mi → ∞, we have limi→∞ f(mi) = L. This is equivalent to requiring that for any real > 0 there exists N∈N such that for all m∈ND with every entry at least N, |f(m)−L|<. We call a function 10 2.3. Capacity f : ND → R̄ entry-wise subadditive if for all m∈ND, f(m)<∞, and for any (m1, . . .,mD)∈ND, i∈{1, 2, . . .,D} and n∈N, it holds that f(m1, . . .,mi−1,mi+n,mi+1, . . .,mD) ≤f(m1, . . .,mD)+ f(m1, . . .,mi−1, n,mi+1, . . .,mD). (2.1) We call a function f : ND → R̄ entry-wise superadditive if −f is entry-wise subadditive. Lemma 1. Let f : ND → R̄ be a function, then the following statements hold. 1. If f is entry-wise subadditive then lim m→∞ f(m) |[m]| = infm∈ND f(m) |[m]| . (2.2) 2. If f is entry-wise superadditive then lim m→∞ f(m) |[m]| = supm∈ND f(m) |[m]| . (2.3) Remark . Note that if f is entry-wise subadditive then for i = 1, 2, . . .,D, for any mi+1, . . . ,mD ∈ N the mapping mi → lim mi−1→∞ lim mi−2→∞ . . . lim m1→∞ f(m1, . . . ,mD) m1·. . .·mi−1 is subadditive; and hence by Part 1 we have the limit lim mi→∞ ( 1 mi lim mi−1→∞ . . . lim m1→∞ f(m1, . . . ,mD) m1·. . .·mi−1 ) = inf mi∈N ( 1 mi lim mi−1→∞ . . . lim m1→∞ f(m1, . . . ,mD) m1·. . .·mi−1 ) . By repeating this argument D times, one has, lim mD→∞ . . . lim m1→∞ f(m1, . . .,mD) m1·. . .·mD = infmD∈N . . . infm1∈N f(m1, . . .,mD) m1·. . .·mD = inf m∈ND f(m) |[m]| = limm→∞ f(m) |[m]| . An analogous result holds for the superadditive case. 11 2.3. Capacity Proof. Part 1. Let f be entry-wise subadditive. It’s easy to check that (2.1) implies that for all (a(0)1 , . . ., a (0) D ), (a (1) 1 , . . ., a (1) D )∈ND, k∈N, and i∈{1, . . .,D}, we have f(a (0) 1 , . . ., a (0) i−1, ka (0) i , a (0) i+1, . . ., a (0) D ) ≤ kf(a(0)1 , . . ., a(0)D ) (2.4) f(a (0) 1 +a (1) 1 , . . ., a (0) D +a (1) D ) ≤ ∑ x∈{0,1}D f(a (x1) 1 , . . ., a (xD) D ), (2.5) where x = (x1, . . ., xD) in the sum in the RHS of (2.5). Now, let (m(i))∞i=1⊆ND be a sequence satisfying m(i) → ∞. Write m(i) = (m (i) 1 , . . .,m (i) D ), and let n = (n1, . . ., nD) be a vector in N D. Since m(i) → ∞ (and D is finite), there is an i0=i0(n) such that for all i≥i0, m(i)≥2n. Set M = max({0}∪{f(x) : x∈∏j{1, . . ., nj}}) and let i≥i0. Note, that by our assumption on f , M 6= ±∞. For each j = 1, . . .,D, there are (unique) q(i)j , r(i)j ∈N, such that m (i) j = q (i) j nj + r (i) j , and 1≤r(i)j ≤nj . We define α(0)j =β(0)j =r(i)j , α(1)j =q(i)j nj , and β (1) j =nj . Let T = {0, 1}D\{1} (where 1 denotes the vector in ND with every entry equal to 1). Then using (2.5) and (2.4) we have f(m(i)) |[m(i)]| = f(q (i) 1 n1+r (i) 1 , . . . , q (i) D nD+r (i) D ) |[m(i)]| ≤ ∑ x∈{0,1}D f(α (x1) 1 , . . . , α (xD) D ) |[m(i)]| ≤ ∑ x∈{0,1}D ( f(β (x1) 1 , . . . , β (xD) D ) ∏ j ( q (i) j )xj) |[m(i)]| = f(n) ∏ j q (i) j + ∑ x∈T ( f(β (x1) 1 , . . . , β (xD) D ) ∏ j ( q (i) j )xj) |[m(i)]| ≤ f(n) ∏ j q (i) j q (i) j nj+r (i) j + ∑ x∈T M∏ j (q (i) j ) xj q (i) j nj+r (i) j , (2.6) where, again, in each sum above x = (x1, . . ., xD). Since m(i) → ∞, it follows that limi→∞ q (i) j = ∞ for each j=1, . . .,D. Observe that, as i goes to infinity, the first summand of the RHS of (2.6) converges to f(n)/|[n]|, and each of the other summands converges to 0. Hence, taking the lim sup of both sides of the last inequality, we obtain lim supi→∞(f(m(i))/|[m(i)]|) ≤ f(n)/|[n]|. Since n is arbitrary, this implies lim sup i→∞ f(m(i)) |[m(i)]| ≤ infn∈ND f(n) |[n]| . (2.7) 12 2.4. Axial product On the other hand, clearly, inf n∈ND f(n) |[n]| ≤ lim infi→∞ f(m(i)) |[m(i)]| ≤ lim supi→∞ f(m(i)) |[m(i)]| . Combining this with (2.7) we get limi→∞(f(m(i))/|[m(i)]|) = infn{f(n)/|[n]|}, and the result follows. Part 2. Easily follows by applying Part 1 to −f . Observe that D-dimensional constraints are closed under taking contiguous sub-arrays, meaning that if an array belongs to the constraint then any of its D- dimensional contiguous sub-arrays also belongs to the constraint. This implies that the mappingm→ log |Sm| form∈ND, where we define log 0=−∞, is entry-wise subadditive. Thus by Lemma 1, the limit in (1.1) always exists, is independent of the choice of (mi)∞i=1, and satisfies cap(S) = inf m∈ND log |Sm| |[m]| (2.8) For a nonnegative matrix A denote by λ(A) its Perron eigenvalue, that is, its largest real eigenvalue. It is well-known that for a 1-dimensional constraint S pre- sented by a lossless labeled graph G = (G,L), the capacity of S is log λ (A(G)). In particular, for a graph G, it holds that cap(X(G)) = log λ (A(G)). 2.4 Axial product The axial product of D sets L1, . . ., LD⊆Σ∗, denoted L1⊗L2⊗. . .⊗LD ⊆ Σ∗D , is the set of all arrays Γ∈Σ∗D such that for i = 1, 2, . . .,D every row of Γ in direction i belongs to Li. If L1=L2=. . .=LD = L we say that the axial-product is isotropic and denote it by L⊗D. Given a presentation Ḡ = ((G1,L), . . ., (GD,L)) for a D-dimensional constraint S with a common set of edges E, the set X(G1)⊗. . .⊗X(GD) ⊆ E∗D is a D-dimensional constraint pre- sented by ((G1, IE), . . ., (GD, IE)), where IE is the identity map on E. If cap(X(G1)⊗. . .⊗X(GD))=cap(S), we say that Ḡ is capacity-preserving. For D=1 any lossless, and therefore deterministic, presentation of S is capacity- preserving, since in a lossless graph with |V | vertices there are at most |V |2 paths generating any given word; for D > 1, the question whether every D-dimensional constraint has a capacity-preserving presentation is open. This is a major open problem in symbolic dynamics, although it is usually formulated in a slightly different manner; see [6], where it is shown that for every D-dimensional con- straint S and > 0, there is a presentation Ḡ = ((G1,L), . . ., (GD,L)) such 13 2.4. Axial product that cap(S)≤cap(X(G1)⊗. . .⊗X(GD))<cap(S)+. We show in the next propo- sition that the answer to this question is positive, if S is an axial product of D one-dimensional constraints. Proposition 1. Let S(1), S(2), . . ., S(D)⊆Σ∗ be D one-dimensional constraints over Σ, and let S = S(1)⊗. . .⊗S(D), then S is a D-dimensional constraint over Σ. Moreover, S has a capacity-preserving presentation. Remark . There are D-dimensional constraints which are not axial products of D one-dimensional constraints. For example for D=2, the NAK constraint defined in Chapter 1 is not an axial product. Proof. Let GS(1) , . . .,GS(D) be presentations of S(1), . . ., S(D), where, for i = 1, 2, . . .,D, GS(i) = ((Vi, Ei),Li). Define the D-tuple of labeled graphs Ḡ = (G1, . . .,GD), as follows. Let E= { (e1, . . ., eD)∈ D∏ i=1 Ei:L1(e1)=L2(e2)=. . .=LD(eD) } , and let L : E → Σ be given by L(e1, . . ., eD) = L1(e1) ; (e1, . . ., eD)∈E. For i = 1, 2, . . .,D, the graph Gi is defined by Gi = (Gi,L) with Gi = (Vi, E), where for e = (e1, . . ., eD)∈E, σGi(e) = σGS(i) (ei) and τGi(e) = τGS(i) (ei). It’s easy to verify that Ḡ is a presentation of S(1)⊗S(2)⊗. . .⊗S(D). Assume now that every GS(i) is lossless. We show that in this case Ḡ is capacity preserving. Let X=X(G1)⊗. . .⊗X(GD), n be a positive integer, and let n be the D-tuple with every entry equal to n. We extend the mapping L to L : Xn→Sn as described above. Now, fix an array Γ∈Sn, and for i = 1, 2, . . .,D let Γ′(i)∈(Ei)n be an array such that every row in direction i, (Γ′(i)jk ) n k=1 is a path in GS(i) generating the corresponding row (Γjk) n k=1 in Γ. Let Γ ′∈En be the array with entries given by Γ′j = (Γ ′(1) j , . . .,Γ ′(D) j ), j∈[n]D. It follows from the construction of Ḡ that Γ′∈Xn, and that L(Γ′) = Γ. Moreover, any array ∆∈Xn such that L(∆) = Γ can be constructed in this manner. Now, as each GS(i) is lossless, there are at most |Vi|2 possibilities of choosing each row in direction i of Γ′(i), and as there are nD−1 such rows, there are at most |Vi|2nD−1 possibilities of choosing each Γ′(i). It follows that |L−1({Γ})| ≤ D∏ i=1 |Vi|2nD−1 , 14 2.5. Two-dimensional constraints where L−1({Γ}) = {Γ′∈Xn : L(Γ′) = Γ}. Summing the latter inequality over all Γ∈Sn, we obtain |Xn|≤|Sn| D∏ i=1 |Vi|2nD−1 . Taking the log, dividing through by nD, and taking the limit as n approaches infin- ity, we have cap(X)≤cap(S). Clearly, cap(X)≥cap(S), sinceX is a presentation of S. The result follows. Let S be a 1-dimensional constraint. Since for any m∈ND |(S⊗D)m| = |(S⊗(D+1))(m,1)| it follows from (2.8) that cap(S⊗D) is non-increasing in D. 2.5 Two-dimensional constraints We introduce some specialized definitions for 2-dimensional constraints. For a 2-dimensional array Γ∈Σm1×m2 , for nonnegative integers m1, m2, we denote by Γt its transpose, namely (Γt)(i,j) = (Γ)(j,i) for all (i, j)∈[m1]×[m2]. For a 2- dimensional constraint S over Σ, we use St to denote the set St = { Γ∈Σ∗∗ : Γt∈S} , Clearly St is a 2-dimensional constraint with cap(St) = cap(S). We shall use the following (MATLAB-like) notation for 2-dimensional arrays. Let A be a set and Γ∈As×t. For integers 0≤s1≤s2<s and 0≤t1≤t2<t, we denote by Γs1:s2,t1:t2 the sub-array: (Γs1:s2,t1:t2)i,j = Γs1+i,t1+j ; (i, j)∈[s2−s1+1]×[t2−t1+1], and by Γs1:s2,∗ (resp. Γ∗,t1:t2) the sub-array Γs1:s2,0:t−1 (resp. Γ0:s−1,t1:t2). We also abbreviate x:x in the subscript by x. We shall use the same notation for one- dimensional vectors: for a vector v∈As, vs:t denotes the sub-vector (vs:t)i = vs+i ; i∈[t−s+1]. 2.5.1 Horizontal and vertical strips Let S be a 2-dimensional constraint over Σ, and m be a positive integer. The horizontal (resp. vertical) strip of height (resp. width) m of S, denoted Hm(S) (resp. Vm(S)) is the subset of S given by Hm(S) = ⋃ n Sm×n (resp. Vm(S) = ⋃ n Sn×m). 15 2.5. Two-dimensional constraints Let S be a 2-dimensional constraint over an alphabet Σ, and consider a hor- izontal (resp. vertical) strip Hm(S) (resp. Vm(S)) of S for some positive inte- ger m. We regard such a strip as a set of 1-dimensional words over Σm where each m×n (resp. n×m) array in the strip is considered a word of length n over Σm. Below we show that the horizontal and vertical strips of S are 1- dimensional constraints over Σm. For this, we need the following definition. Let G = (V,E) be a graph, and let m be a positive integer. Let G×m be the graph given by G×m = (V m, Em), where for each e = (e1, . . ., em)∈Em, σG×m(e) = (σG(e1), . . ., σG(em)) and τG×m(e) = (τG(e1), . . ., τG(em)). For a labeled graph G = (G,L) with G = (V,E) and L : E → Σ, let G×m be the labeled graph defined by G×m = (G×m,L×m), where L×m : Em → Σm is given by L×m(e1, . . ., em)=(L(e1), . . .,L(em)) ; (e1, . . ., em)∈Em We callG×m (resp., G×m) themth tensor-power ofG (resp., G). We can now state the following proposition. Proposition 2. Let S be a 2-dimensional constraint over Σ and let m be a positive integer. Then 1. Hm(S) (resp. Vm(S)) is a 1-dimensional constraint over Σm. 2. Let S = T (V)⊗T (H) for 1-dimensional constraints T (V), T (H) over Σ, pre- sented by labeled graphs G(V), G(H), respectively. Then the 1-dimensional constraintHm(S) is presented by the labeled graph G(H)m defined as the sub- graph of the labeled graph (G(H))×m consisting of only those edges whose label (anm-letter word over Σ) satisfies T (V). An analogous statement holds for Vm(S), with respect to the graph G(V)m formed in a similar way from (G(V))×m. Proof. It suffices to prove this only for horizontal stripsHm(S)—the argument for the vertical strip being analogous. We first prove part 2. It’s easy to verify that the labeled graph (G(H))×m presents the constraint over Σm, consisting of all m×n arrays of Σ∗∗ such that every row satisfies T (H). It follows that the subgraph G(H)m , formed by removing all the edges of (G(H))×m that are labeled with a word that does not satisfy T (V), presents the 1-dimensional constraint consisting of all m×n arrays with every row satisfying T (H) and every column satisfying T (V). This is precisely the constraintHm(S). We proceed to prove part 1. Let the pair of labeled graphs (G(V),G(H)) be a presentation of S, where G(V) = ((V (V), E),L) and G(H) = ((V (H), E),L). Define the edge-constraints E(V) = X(V (V), E) and E(H) = X(V (H), E). Since (G(V),G(H)) is a presentation of S, we have S = L(E(V)⊗E(H)), and therefore 16 2.6. Open questions Hm(S) = L(Hm(E(V)⊗E(H))). By part 2, Hm(E(V)⊗E(H)) is a 1-dimensional constraint, presented by a labeled graph G(H)m with edges labeled by words in Em. Replacing each such label e∈Em in that graph with L(e), we clearly obtain a presentation of L(Hm(E(V)⊗E(H))) = Hm(S). 2.6 Open questions A major problem in the theory of multidimensional constrained systems is the com- putation of their capacity. As already mentioned before, the problem is essentially solved for 1-dimensional constraints, however, in higher dimensions, only a few methods for estimating the capacity of a general D-dimensional constraint exist. In the light of [2], no “computable” formula for the capacity of such constraints exists; yet, it still may be true that for some specialized sub-classes of constraints, one can compute the capacity exactly. For example, the hard square constraint is a relatively old open problem for which no “closed-form” formula for computing the capacity is currently known. However, such a formula is known for the hexagonal- lattice version of this constraint (the “hard-hexagon constraint”) [1], which perhaps suggests that a similar formula exists for the capacity of the hard-square constraint. 17 Chapter 3 Lower bounds on capacity of 2-dimensional symmetric constraints∗ A method for computing very good lower-bounds on the capacity of the hard- square constraint is given in [7] (see also [34, 41]). [3] generalizes the method slightly and also presents a method for obtaining good upper-bounds on the capac- ity of this constraint. Both the method for obtaining the lower- and the method for obtaining the upper-bounds can be shown to work on any 2-dimensional constraint for which every horizontal or every vertical strip is a symmetric vertex-constraint. In this chapter we show a generalization of the method for obtaining the lower- bounds that gives improved bounds on capacities of such constraints. Moreover, we show how this generalization as well as the method for obtaining the upper- bounds may be applied to a larger class of 2-dimensional constraints that includes constraints in which the vertical and horizontal strips are not necessarily finite- type. We illustrate this by computing lower and upper bounds on the capacities of the EVEN⊗2 and CHG(3)⊗2 constraints, and show that 0.4402086447 ≤ cap(EVEN⊗2) ≤ 0.4452873312, and 0.4222689819 ≤ cap(CHG(3)⊗2) ≤ 0.5328488954. 3.1 Constraints with symmetric edge-constrained strips In this section we generalize the method presented in [3, 7] to provide improved lower bounds on capacities of 2-dimensional constraints whose horizontal strips are symmetric edge-constraints. Fix an alphabet Σ, and let S be a 2-dimensional constraint over Σ. We say that S has horizontal edge-constrained-strips if for every positive integer m, the ∗A version of this chapter has been published. Louidor, E. and Marcus, B.H. (2010) Improved Lower Bounds on Capacities of Symmetric 2-Dimensional Constraints using Rayleigh Quotients. IEEE Transactions on Information Theory 56:1624–1639. 18 3.1. Constraints with symmetric edge-constrained strips constraint Hm(S) is an edge-constraint. If, in addition, every horizontal strip is symmetric, we say that S has symmetric horizontal edge-constrained strips. Anal- ogously, using Vm(S), we have the notions of a 2-dimensional constraint with vertical edge-constrained-strips and symmetric vertical edge-constrained-strips Here, we consider constraints of the form S = T ⊗ E , where E=X(GE) is an edge-constraint defined by the graph GE = (VE , EE) and T is an arbi- trary 1-dimensional constraint over Σ. Then E is presented by GE = (GE , IE) where IE is the identity map on EE . Let m be a positive integer. By Propo- sition 2, part 2, Hm(S) is a 1-dimensional constraint presented by a subgraph G(H)m = (G(H)m , I×mE ) of G×mE . It follows that Hm(S) = X(G(H)m ), and so S has horizontal edge-constrained strips. Henceforth, we further assume that it has sym- metric horizontal edge-constrained strips; note that symmetry of the graph GE is necessary but not sufficient for this assumption (see Proposition 3 below). For a positive integer m, let Fm = |(VE)m|, and let Hm denote the Fm × Fm adjacency matrix of G(H)m . Since limn→∞(log(|Sm×n|)/n) = cap(Hm(S)) for every positive integer m, we have cap(S) = lim m,n→∞ log |Sm×n| mn = lim m→∞ limn→∞ log |Sm×n| mn = lim m→∞ cap(Hm(S)) m = lim m→∞ log λ(Hm) m . (3.1) For a matrix M , let Mt denote its transpose. Fix a positive integer m. Following [3, 7], since Hm is real and symmetric, we obtain by the min-max principle [16] λ(Hpm) ≥ ytmH p mym ytmym , for any Fm×1 real vector ym 6= 0 and positive integer p. The RHS of the last inequality is known as a Rayleigh quotient. Choosing ym to be the vector H q mxm, for some positive integer q and Fm × 1 real vector xm such that ym 6= 0, we have λ(Hpm) ≥ xtmH 2q+p m xm xtmH 2q m xm . (3.2) Thus by (3.1), it follows that cap(S) ≥ 1 p lim sup m→∞ 1 m log xtmH 2q+p m xm xtmH 2q m xm . (3.3) 19 3.1. Constraints with symmetric edge-constrained strips In [3, 7], each xm is chosen to be the vector 1 with Fm entries. We obtain improved lower bounds in many cases by choosing other sequences of vectors, (xm)∞m=1, as follows. We fix integers µ≥0 and α≥1, and let φ : (VE)µ+α → [0,∞) be a nonnegative function. Our method works for se- quences (xmk) ∞ k=1, where mk = µ + kα for positive integers k, and the entries of each xmk are indexed by (VE) mk and given by (xmk)v= k−1∏ i=0 φ(viα:iα+µ+α−1) ; v∈(VE)mk . (3.4) For such sequences and a fixed positive integer n, we will show that one can com- pute Ln, the growth rate of xtmkH n mk xmk : Ln = lim k→∞ logxtmkH n mk xmk mk , and from (3.3) we obtain the lower bound cap(S)≥(L2q+p − L2q)/p. Before doing this for general µ and α, it is instructive to look at the special case: µ = 0 and α = 1. In this case mk=k, and using (3.4) to define xk we obtain (xk)v = k−1∏ i=0 φ((v)i) ; v∈(VE)k. Let n be a positive integer. For a word w = w1. . .wn∈E define its weight,Wφ(w), by Wφ(w)=φ(σ(w1))φ(τ(wn)), where w1, wn are regarded as edges in GE and extend this to arrays Γ∈Sm×n by Wφ(Γ) = m−1∏ i=0 Wφ(Γi,∗). Observe that for an array Γ∈Sm×n that is a path in G(H)m of length n starting at v∈(VE)m and ending at u∈(VE)m, it holds thatWφ(Γ)=(xm)v(xm)u. It follows that xtmH n mxm= ∑ Γ∈Sm×n Wφ(Γ). (3.5) Now, pick a deterministic presentation, G(V)n , of Vn(S), where G(V)n = ((V (V) n , E (V) n ),L(V)n ), and let Wφ : E(V)n →[0,∞) be the edge weighting defined by Wφ(e) = Wφ(L(V)n (e)), for e∈E(V)n . Let A(G(V)n ,Wφ) be the |V (V)n |×|V (V)n | 20 3.1. Constraints with symmetric edge-constrained strips weighted adjacency matrix of G(V)n with entries indexed by V (V)n ×V (V)n and given by ( A(G(V)n ,Wφ) ) i,j = ∑ e∈E(V)n : σ(e)=i,τ(e)=j Wφ(e) ; i, j∈V (V)n . Then 1tA(G(V)n ,Wφ)m1 = ∑ γ Wφ(L(V)n (γ)), where the sum is taken over all paths γ in G(V)n of length m and L(V)n (γ) denotes the array in Sm×n generated by γ. Since G(V)n is deterministic it follows that lim m→∞ log 1tA(G(V)n ,Wφ)m1 m = lim m→∞ log ∑ Γ∈Sm×nWφ(Γ) m . By (3.5) the RHS is Ln and by Perron-Frobenius theory the LHS is log λ(A(G(V)n ,Wφ)), and thus Ln = log λ(A(G(V)n ,Wφ)). For general µ, α, we proceed similarly. We pick a deterministic presentation, G(V)n , of the vertical strip Vn(S), with G(V)n = ((V (V)n , E(V)n ),L(V)n ), and construct a labeled directed graph I=I(µ, α, n,G(V)n , GE)=((VI , EI),LI), with nonnegative real weights on its edges given by Wφ : EI → [0,∞). The graph I and weight functionWφ are defined as follows. The set of vertices VI is given by VI = { (f , v, l) : v ∈ V (V)n , f , l ∈ (VE)µ } , and the function LI : EI → Σα×n labels each edge with an α×n array over Σ. We specify the edges of I by describing the outgoing edges of each of its vertices along with their weights. Let v = (f , v, l) ∈ VI be a vertex of I. The set of outgoing edges of v consists of exactly one edge for every path of length α in G(V)n starting at v. Let γ = (ei)α−1i=0 ⊆ E(V)n be such a path and let u be its terminating vertex. We regard the word generated by γ in G(V)n as an array Γ ∈ Σα×n with entries given by (Γ)i,j = ( L(V)n (ei) ) j . Let f = (f0, . . . , fµ−1) and l = (l0, . . . , lµ−1) and for i = µ, µ+1, . . ., µ+α−1, define fi to be σ(Γi−µ,0) and li to be τ(Γi−µ,n−1), where Γi−µ,0 and Γi−µ,n−1 are regarded as edges in the graphGE . For such a path γ the corresponding outgoing edge e ∈ EI of v satisfies σ(e) = v, LI(e) = Γ, τ(e) = ((fα, fα+1, . . . , fα+µ−1), u, (lα, lα+1, . . . , lα+µ−1)). The weight of e, Wφ(e), is given by Wφ(e) = φ(f0, . . . , fµ+α−1)φ(l0, . . . , lµ+α−1). We shall regard the label of a path (ei)`−1i=0 in I as the `α×n array Γ over Σ resulting 21 3.1. Constraints with symmetric edge-constrained strips from concatenating the labels of the edges of γ in order in the vertical direction, namely Γiα+k,j = (LI(ei))k,j , for all i∈[`], k∈[α] and j∈[n]. Finally, we define the weighted adjacency matrix of the labeled directed graph I with weights given byWφ as the |VI | × |VI | nonnegative real matrix A(I,Wφ) with entries indexed by VI × VI and given by (A(I,Wφ))i,j = ∑ e∈EI : σ(e)=i,τ(e)=j Wφ(e) ; i, j ∈ VI . Figure 3.1 shows paths generating an `α×n array Γ∈S: the left part of the fig- ure shows such a path in G(V)n and the right part of the figure shows the “correspond- ing” path in I. The label of each edge in G(V)n and I is “overlayed” on top of it. Each row of Γ—a path inGE of length n—is depicted in the figure as a grey “snake- like” curve. In the figure, we denote by si,ti the states σ(Γi,0), τ(Γi,n−1)∈GE respectively, for i = 0, 1, . . ., `α−1. The following lemma generalizes ideas in [3, 7] and uses the weighted labeled graph I to compute Ln, when (xmk)∞k=1 is the sequence defined by (3.4). Lemma 2. For a sequence (xmk) ∞ k=1 with each xmk given by (3.4), and I = I(µ, α, n,G(V)n , GE), lim k→∞ logxtmkH n mk xmk mk = log λ(A(I,Wφ)) α . Proof. Let (xmk) ∞ k=1 be a sequence of vectors with each xmk given by (3.4). We shall show that there are positive real constants c,d (depending on I and φ) such that for all positive integers k, c · 1t (A(I,Wφ))k+dµ/αe 1 ≤ xtmkHnmkxmk ≤ d · 1t (A(I,Wφ))k 1. (3.6) For a positive integer s and vector e = (e0, . . . , es−1) in (EE)s denote by σ(e), τ(e) ∈ (VE)s the vectors with entries given by (σ(e))i = σ(ei) (τ(e))j = τ(ei) ; i ∈ {0, 1, . . . , s−1}. Now, fix a positive integer k. Let Γ be an array in Smk×n. Recall that each entry of Γ is an edge in GE and define the weight of Γ, denotedWφ(Γ), by Wφ(Γ)= k−1∏ i=0 φ(σ(Γiα:iα+µ+α−1,0))φ(τ(Γiα:iα+µ+α−1,n−1)). 22 3.1. Constraints with symmetric edge-constrained strips s0 t0 v0 s1 t1 v1 s`α−1 t`α−1 v`α−1 v2 v`α f0 l0 v0 ... ... f1 l1 vα ... ... ... ... f2 l2 v2α ... ... ... ... f` l` v`α ... ... ... ... s0 t0 s1 t1 sα−µ tα−µ sα−1 tα−1 sα tα sα+1 tα+1 s2α−µ t2α−µ s2α−1 t2α−1 s(`−1)α t(`−1)α s(`−1)α+1 t(`−1)α+1 s`α−µ t`α−µ s`α−1 t`α−1 Figure 3.1: Paths generating an `α×n-array of S, in G(V)n (left) and I (right). 23 3.1. Constraints with symmetric edge-constrained strips For a positive integer `, let P` denote the set of paths of length ` in I. We denote the label of a path γ∈P` by LI(γ). It is easily verified that there exists a path in P` with label Γ ∈ Σ`α×n if and only if there exists a path in G(V)n of length `α that generates Γ. As G(V)n is a presentation of Vn, the set of labels of paths in P` is S`α×n. For a finite path γ in I, define its weight, denotedWφ(γ), as the product of the weights of the edges in the path. Recalling that the entries of xmk are indexed by (VE)mk , we observe that xtmkH n mk xmk = ∑ Γ∈Smk×n (xmk)σ(Γ∗,0) (xmk)τ(Γ∗,n−1) = ∑ Γ∈Smk×n Wφ(Γ). For an array Γ ∈ Smk×n, we say that a path γ ∈ Pk matches Γ if it is labeled by the sub-array Γµ:mk−1,∗ and starts at a vertex (f , v, l) ∈ VI with f = σ(Γ0:µ−1,0) and l = τ(Γ0:µ−1,n−1). It can be verified from the construction of I that if γ matches Γ thenWφ(γ) =Wφ(Γ). Now, since G(V)n is a presentation of Vn(S), it follows from the construction of I that every Γ ∈ Smk×n has a path in Pk matching it. Conversely, since for a path γ ∈ Pk all arrays Γ ∈ Smk×n that it matches have the same sub-array Γµ:mk−1,∗, it follows that there are at most |Σ|µn arrays in Smk×n that γ matches. Therefore, xtmkH n mk xmk = ∑ Γ∈Smk×n Wφ(Γ) ≤ |Σ|µn ∑ γ∈Pk Wφ(γ) = |Σ|µn1t (A(I,Wφ))k 1. This shows the right inequality of (3.6), we now turn to the left. Set k′ = k+dµ/αe, s = dµ/αeα− µ, and let ψ : Pk′ → Smk×n be given by ψ(γ) = (LI(γ))s:k′α−1,∗ ; γ ∈ Pk′ . 24 3.1. Constraints with symmetric edge-constrained strips For a path γ ∈ Pk′ , with γ = (ei)k ′−1 i=0 ⊆ EI , its weight satisfies Wφ(γ) = k′−1∏ i=0 Wφ(ei) = dµ/αe−1∏ i=0 Wφ(ei) Wφ(ψ(γ)) ≤ Φ2dµ/αeWφ(ψ(γ)), where we take Φ to be a positive constant satisfying Φ≥max {φ(v):v∈(VE)µ+α}. Now let Γ be an array in Smk×n. Since G(V)n is deterministic, so is I, and thus for every vertex v ∈ VI , the paths in P ′k starting at v are labeled distinctly. As all paths γ that map to Γ under ψ have the same sub-array (LI(γ))s:k′α−1,∗, it follows that there are at most |Σ|sn paths γ ∈ P ′k starting at v such that ψ(γ) = Γ. Consequently, there are at most |VI ||Σ|sn paths in Pk′ that map to Γ under ψ. Therefore, 1t (A(I,Wφ))k+dµ/αe 1 = ∑ γ∈Pk′ Wφ(γ) ≤ Φ2dµ/αe ∑ γ∈Pk′ Wφ(ψ(γ)) ≤ Φ2dµ/αe|VI ||Σ|sn ∑ Γ∈Smk×n Wφ(Γ) = Φ2dµ/αe|VI ||Σ|sn ·xtmkHnmkxmk . Dividing both sides by Φ2dµ/αe|VI ||Σ|sn we obtain the left inequality of (3.6). The claim of the lemma now follows from Perron-Frobenius theory by taking the log of (3.6), dividing it by mk and taking the limit as k approaches infinity. We thus obtain the following lower bound on the capacity of a 2-dimensional constraint. Theorem 1. Let T, E be 1-dimensional constraints over an alphabet Σ, with E an edge constraint defined by a graph GE = (VE , EE). Set S = T⊗E and suppose that S has symmetric horizontal edge-constrained strips. Let µ≥0 and α, p, q>0 be integers and φ : (VE)µ+α → [0,∞) be a nonnegative real function. For a positive integer n, let Gn be a deterministic presentation of Vn(S), and set An,φ = 25 3.1. Constraints with symmetric edge-constrained strips A(I(µ, α, n,Gn, GE),Wφ). Then cap(S) ≥ log λ(A2q+p,φ)− log λ(A2q,φ) pα . (3.7) Remark 1. In addition to computing lower-bounds, [3] gives a method for comput- ing upper bounds on the capacity of the hard-square constraint. It can be shown that this method can also be applied to all constraints of the form T⊗E , with E an edge constraint, having symmetric horizontal edge-constrained strips. Remark 2. Theorem 1 can be generalized to apply to 2-dimensional constraints having symmetric horizontal edge-constrained strips, which are not necessarily axial-products. Let S be such a constraint, and for every positive integer m, let G (H) m = (V (H) m , E (H) m ) be the symmetric graph, with no isolated vertices, defining Hm(S). Set GE = G(H)1 , VE = V (H)1 and EE = E(H)1 . We claim there exists a mapping fm : V (H) m →(VE)m such that for every edge e = e0e1. . .em−1∈E(H)m , with each ei∈EE , fm(σ(e)) = (σ(e0), . . ., σ(em−1)) and fm(τ(e)) = (τ(e0), . . ., τ(em−1)). (3.8) This mapping is defined as follows. For a vertex v∈V (H)m , pick an incoming edge e = e0e1. . .em−1∈E(H)m and define fm(v) as (τ(e0), . . ., τ(em−1)). This mapping is uniquely-defined: indeed if e′ = e′0. . .e′m−1∈E(H)m is another incoming edge of v, and g = g0. . .gm−1∈E(H)m is an outgoing edge of v, then clearly, for every i∈[m], both eigi and e′igi are paths in GE ; consequently τ(ei) = τ(e′i). It is easy to check that fm satisfies the conditions in (3.8). Now, replace the definition of (xmk)v in (3.4) with (xmk)v= k−1∏ i=0 φ(uiα:iα+µ+α−1) ; u = fmk(v),v∈V (H)mk . With this new definition and the aid of (3.8), it can be verified that Lemma 2 and consequently Theorem 1 still hold. Remark 3. Clearly, it is sufficient, for the theorem to hold, thatHm(S) is symmet- ric for large enough m. We now give a sufficient condition for the constraint S = T⊗E to have symmetric horizontal edge-constrained strips. For this to happen, we (generally) must have that GE is symmetric. This means that there exists a “matching” be- tween edges, were each edge is matched with an edge in the “reverse” direc- tion. More precisely there is a bijection R : EE → EE such that for all e ∈ EE , 26 3.2. Constraints with symmetric vertex-constrained strips (σ(e), τ(e)) = (τ(R(e)), σ(R(e))) and R(R(e)) = e. We call such a bijection an edge-reversing matching, and we denote byR(GE) the set of all edge-reversing matchings of GE . Clearly a graph G is symmetric iff it has an edge-reversing matching. Thus T⊗E has symmetric horizontal edge-constrained strips iff for ev- ery m, G(H)m has an edge-reversing matching. We present a sufficient condition for this to hold. Proposition 3. Let T, E be 1-dimensional constraints over an alphabet Σ, with E an edge constraint defined by a graph GE = (VE , EE) with R ∈ R(GE) an edge- reversing matching. If for every word e1. . .em∈T one has R(e1). . .R(em)∈T as well (for all m), then T⊗E has symmetric horizontal edge-constrained strips. Proof. Let G(H)m = (V mE , E (H) m ) be the subgraph of G×mE that definesHm(S). We show that G(H)m is symmetric. Let R×m : EmE → EmE be defined by R×m(e1, . . ., em) = (R(e1), . . ., R(em)). Clearly, R×m is an edge-reversing matching of G×mE . Recall that E (H) m consists of all the edges in EmE that, when regarded as m-letter words over Σ, satisfy T . Therefore, by the assumption, it follows that for all e∈E(H)m , R×m(e)∈E(H)m as well. Consequently, R×m restricted to E(H)m , is an edge-reversing matching of G (H) m and hence it is symmetric. If GT is a presentation of T and R∈R(GE), a sufficient condition for the hy- pothesis of Proposition 3 to hold, which may be easier to check, is the existence of a function f : ET → ET satisfying: 1) LT (f(e)) = R(LT (e)) for all e ∈ ET and 2) for any path e1e2 of length 2 in GT , the sequence f(e1)f(e2) is also a path in GT . Indeed, if such a function exists then any path 12. . .m in GT generating a word e1e2. . .em has a corresponding path f(1)f(2). . .f(m) generating the word R(e1)R(e2). . .R(em), and thus the hypothesis of Proposition 3 is fullfiled. In fact, it can be shown that when GT is irreducible, deterministic and has the minimum number of vertices among all deterministic presentations of T , this condition is also necessary for the hypothesis of Propositon 3 to hold (see [26, Section 3.3]). In Section 3.3 we use Proposition 3 to show that the method described in this section can be used to compute lower bounds on CHG(b1)⊗CHG(b2) for any positive integers b1 and b2. 3.2 Constraints with symmetric vertex-constrained strips In this section we present an analog to Theorem 1 that gives lower bounds on the capacities of constraints for which every horizontal or every vertical strip is a sym- 27 3.2. Constraints with symmetric vertex-constrained strips metric vertex-constraint. We do this by transforming a 2-dimensional constraint with symmetric vertex-constrained strips to a 2-dimensional constraint with sym- metric edge-constrained strips, having the same capacity. Fix an alphabet Σ, and let S be a 2-dimensional constraint over Σ. We say that S has horizontal vertex-constrained strips if for every positive integer m, the constraint Hm(S) is a vertex-constraint. If, in addition, every horizontal strip is symmetric, we say that S has symmetric horizontal vertex-constrained strips. The notions of a 2-dimensional constraint with vertical vertex-constrained strips and symmetric vertical vertex-constrained strips are defined analogously. It turns out that RWIM and NAK do not have horizontal or vertical edge- constrained strips, and so the method in section 3.1 does not apply directly. We illustrate this only for horizontal strips for S = RWIM. Recall from Chapter 2 that an edge constraint is a constraint of memory 1 such that any two follower sets are either disjoint or identical. We claim that this condition does not hold for Hm(S). To see this, given any m, let w = w0. . .wm−1 be the all-zeros word of length m and u = u0. . .um−1 be any other word of length m. Now, the m× 2 arrays w0 w0 w1 w1 ... ... wm−1 wm−1 , w0 u0 w1 u1 ... ... wm−1 um−1 , u0 w0 u1 w1 ... ... um−1 wm−1 , belong to S, yet the m× 2 array u0 u0 u1 u1 ... ... um−1 um−1 , does not. Thus,w and u, regarded asm×1 columns, have different but non-disjoint follower sets. Consequently, RWIM does not have horizontal edge-constrained strips. However, it is not hard to show that RWIM and NAK have both symmet- ric horizontal vertex-constrained strips and symmetric vertical vertex-constrained strips. For instance, for S = RWIM,Hm(S) is the vertex constraint defined by the graph G = (V,E), where V consists of all binary vectors u0 . . . um−1 of length m and E consists of a single edge from u ∈ V to v ∈ V iff for all i, whenever ui = 1, then vi+1 = vi = vi−1 = 0 (with the obvious modification when i = 0 or m− 1). And Vm(S) is the vertex constraint defined by the graph of G′ = (V ′, E′), where V ′ consists of all binary vectors u0 . . . um−1 of length m which do not contain two 28 3.2. Constraints with symmetric vertex-constrained strips adjacent ‘1’s and E′ consists of a single edge from u ∈ V to v ∈ V iff for all i, whenever ui = 1, then vi+1 = vi−1 = 0 (again, with the obvious modification when i = 0 or m− 1). Clearly, both G and G′ are symmetric. Now, let S be a 2-dimensional constraint over Σ. For a finite m × n array Γ with m ≥ 1 and n ≥ 2 over Σ its [1× 2]-higher block recoding or [1× 2]-recoding is an m× (n− 1) array Γ̃ over Σ1×2 with entries given by Γ̃i,j = (Γi,j Γi,j+1) ; i = 0, . . . ,m− 1, j = 0, . . . , n− 2. We denote by S[1×2] the set of all [1 × 2]-recodings of arrays in S and refer to it as the [1 × 2]-higher block recoding of S. The [1 × 2]-higher block recoding of a constraint is a constraint. This is stated in the following proposition. Proposition 4. Let S be a 2-dimensional constraint over Σ. Then S[1×2] is a 2- dimensional constraint over Σ1×2. Remark . We may, of course, define, in a similar manner, the [s × t]-higher block recoding of S, for any positive integers s and t, and the [s×t]-higher block recoding of a 2-dimensional constraint S is a 2-dimensional constraint. Proof. Set S′ = S[1×2] and let (GV ,GH) be a presentation of S with GV = (VV , E,L) and GH = (VH, E,L). We construct labeled graphs G′V = (VV×VV , E′,L′) and G′H = (E,E′,L′) as follows. The set of edges E′ is defined as E′ = { (e0 e1)∈E1×2 : e0, e1 is a path in GH } , and the labeling function L′ : E′ → Σ1×2 is given by L′(e0 e1) = (L(e0) L(e1)) ; (e0 e1)∈E′. For every edge (e0 e1)∈E′ we define σG′V (e0 e1) = (σGV (e0), σGV (e1)) τG′V (e0 e1) = (τGV (e0), τGV (e1)) σG′H(e0 e1) = e0 τG′H(e0 e1) = e1 . It’s easy to verify that (G′V ,G′H) is a presentation of S′. Clearly, recoding is an injective mapping, thus |Sm×n| = |S[1×2]m×(n−1)| for all positive integers m ≥ 1, n ≥ 2. It follows that cap(S) = cap(S[1×2]). The next proposition shows that the [1 × 2]-higher block recoding of a constraint with symmetric horizontal vertex-constrained strips has symmetric horizontal edge- constrained strips. 29 3.2. Constraints with symmetric vertex-constrained strips Proposition 5. Let S be a 2-dimensional constraint with horizontal vertex- constrained strips. 1. S[1×2] has horizontal edge-constrained strips. Moreover, S[1×2] has symmet- ric horizontal edge-constrained strips iff S has symmetric horizontal vertex- constrained strips. 2. S[1×2] = V1(S[1×2])⊗H1(S[1×2]). Proof. (1). Let m be a positive integer, G(H)m = (V (H) m , Em) be the graph defining the vertex-constraintHm(S) and set ∆ = Σ1×2, where Σ is the alphabet of S. We define a labeling L : Em → ∆m×1 of the edges of G(H)m . For e∈Em, we regard σ(e) and τ(e) as m×1 arrays over Σ, and define L(e) to be the array in ∆m×1 with entries given by L(e)(i,0) = (σ(e)i,0 τ(e)i,0) ; i∈[m]. It’s easily verified that the word generated by every path in the labeled graph (G (H) m ,L) is the [1×2]-higher block recoding of the array formed by concatenat- ing the vertices along the path horizontally in sequence. It follows that (G(H)m ,L) is a presentation of Hm(S[1×2]). Since the labels of the edges in (G(H)m,L) are distinct, we may identify each edge with its label, and it follows that Hm(S[1×2]) is an edge-constraint. Since the same graph defines both Hm(S) and Hm(S[1×2]) (the former as a vertex-constraint and the latter as an edge-constraint), it follows thatHm(S) is symmetric iffHm(S[1×2]) is. This completes the proof. (2). Clearly, S[1×2] ⊆ V1(S[1×2])⊗H1(S[1×2]). As for the reverse inclusion, let Γ̃ ∈ V1(S[1×2])⊗H1(S[1×2]) be an m × n array over Σ1×2. Since every row of Γ̃ is in H1(S[1×2]), every row has a unique 1 × (n + 1) pre-image under the recoding map. Let Γ be the m × (n + 1) array over Σ, whose ith row is the pre- image under the recoding map of the ith row of Γ̃, for i = 0, . . . ,m−1. Clearly, Γ̃ is the [1× 2]-higher block recoding of Γ. Thus, it suffices to show that Γ ∈ S. For i = 0, 1, . . . , n− 1 clearly, the m× 2 array Γ∗,i:i+1 over Σ recodes to the column Γ̃∗,i. By our assumption this column is in V1(S[1×2]). Since recoding is injective, Γ∗,i:i+1 must be in S. Since this holds for all i = 0, 1, . . . , n− 1 and sinceHm(S) has memory 1, it follows that Γ ∈ S and therefore Γ̃ ∈ S[1×2]. We can now use the method described in Section 3.1 to get lower bounds on 2- dimensional constraints with symmetric horizontal vertex-constrained strips. This is stated in the following theorem. 30 3.3. Capacity bounds for axial products of constraints Theorem 2. Let S be a 2-dimensional constraint over an alphabet Σ with sym- metric horizontal vertex-constrained strips. Let µ≥0, and α, p, q>0 be inte- gers, GE = (VE , EE) be the graph defining the vertex-constraint H1(S) (hence VE⊆Σ), and φ : (VE)µ+α → [0,∞) be a nonnegative function. For an inte- ger n≥2, let Gn be a labeled graph obtained from a deterministic presentation for Vn(S) by replacing each edge-label with its [1×2]-higher block recoding. Set Ãn,φ = A(I(µ, α, n−1,Gn, GE),Wφ), where I, Wφ, and A(I,Wφ) are as de- fined in Section 3.1. Then cap(S) ≥ log λ(Ãp+2q+1,φ)− log λ(Ã2q+1,φ) pα . Proof. Let S′ = S[1×2]. By Proposition 5, S′ = V1(S′)⊗H1(S′), and S′ has hori- zontal symmetric edge-constrained strips. Since GE has no parallel edges, we may identifiy each edge e∈EE with the pair (σ(e), τ(e)); then, with this identification, H1(S′) = X(GE). Also, note that G2q+p+1 and G2q+1 are deterministic presenta- tions for V2q+p(S′) and V2q(S′), respectively. The result follows from Theorem 1 applied to S′. 3.3 Capacity bounds for axial products of constraints In this section we show how the method described in Section 3.1 can be applied to axial products of certain 1-dimensional constraints. Let S and T be two 1- dimensional constraints over an alphabet Σ. We wish to lower bound the capacity of the 2-dimensional constraint T⊗S. To this end, we pick a lossless presentation GS = (GS ,LS), with GS = (VS , ES), for S. We extend the function LS to multidimensional arrays over ES in the manner described in Chapter 2, and for a set A ⊆ Σ∗, we denote by L−1S (A)⊆E∗S the inverse image of A under this map, namely L−1S (A) = {w ∈ E∗S : LS(w)∈A} . The following proposition shows that we can reduce the problem of calculating the capacity of T⊗S to that of calculating the capacity of L−1S (T )⊗X(GS). Proposition 6. Let S, T be two 1-dimensional constraints and let X(GS) and L−1S (T ) be as defined above. Then 1. L−1S (T ) is a 1-dimensional constraint. 2. cap(T⊗S) = cap(L−1S (T )⊗X(GS)). 31 3.3. Capacity bounds for axial products of constraints Proof. 1. Let GT = (VT , ET ,LT ) be a presentation of T . We shall construct a presentation F = (VT , EF ,LF ) of L−1S (T ). The set of edges is given by EF = {(eT , eS)∈ET×ES : LT (eT )=LS(eS)}, and for an edge (eT , eS)∈EF , σF (eT , eS) = σGT (eT ), τF (eT , eS) = τGT (eT ) and LF (eT , eS) = eS . It is easily verified that L−1S (T ) is presented by F , and therefore it is a 1-dimensional con- straint. 2. We set R = T⊗S, U = L−1S (T )⊗X(GS). For an array ∆∈Rm×n, define P∆ = {Γ∈Um×n : LS(Γ) = ∆}, we claim that 1≤|P∆| ≤ |VS |2m. (3.9) Indeed, it’s easily verified that an array Γ∈Em×nS is in P∆ iff for all i∈[m] the row (Γi,j)n−1j=0 is a path in GS that generates (∆i,j)n−1j=0 . Since GS is a lossless presentation of S, for every i∈[m], there is at least one path in GS generating (∆i,j) n−1 j=0 and at most |VS |2 such paths; the claim follows. Now, clearly for any Γ∈Um×n the array LS(Γ) is in Rm×n. It follows that the sets P∆, for ∆∈Rm×n form a partition of Um×n, and we have |Um×n| = ∑ ∆∈Rm×n |P∆|. Therefore, by (3.9), we get |Rm×n| ≤ |Um×n| ≤ |Rm×n| |VS |2m, and it follows from (1.1) that cap(R) = cap(U). Therefore if L−1S (T )⊗X(GS) has symmetric horizontal edge-constrained strips, we can apply the method of Section 3.1 to obtain lower bounds on cap(T⊗S). In this case, it also follows from Remark 1 of Theorem 1, that the method of [3] for obtaining upper bounds on the capacity of the hard-square constraint, can be used to obtain upper bounds on cap(T⊗S). Proposition 3 present a sufficient condition for L−1S (T )⊗X(GS) to have symmetric horizon- tal edge-constrained strips. Here we give another stronger sufficient condition involving only the presentation GS . We say that a labeled graph (G,L), with G = (V,E), is symmetric as a labeled graph, if there exists an edge-reversing matching R ∈ R(G) which preserves L, that is L(R(e)) = L(e) for all e ∈ E. We assume now that GS is symmetric as a labeled graph, and that R ∈ R(GS) is an edge-reversing matching which preserves LS . Since for any positive integer m and e1. . .em∈EmS , the label L(e1). . .L(em) = L(R(e1)) . . .L(R(em)), it follows that e1. . .em∈L−1S (T ) iff R(e1). . .R(em)∈L−1S (T ). Consequently, the hypothesis of Proposition 3 holds and we have the following corollary. 32 3.4. Heuristics for choosing φ Corollary 1. If GS is symmetric as a labeled graph then L−1S (T )⊗X(GS) has symmetric horizontal edge-constrained strips. Since the presentation in Figure 1.1a is symmetric as a labeled graph, we can apply the method of Section 3.1 to get lower bounds on the capacity of all con- straints T⊗EVEN for any 1-dimensional constraint T . Let S = CHG(b1) and let T = CHG(b2) for integers b1, b2 ≥ 2. Let GS = (GS ,LS), with GS = (VS , ES), be the presentation given in Figure 1.1c for b = b1. Evidently, GS is symmetric with exactly one edge-reversing matching, R : ES→ES . Fix a positive integer m and let e = e1e2. . .em∈EmS . Obviously, T is closed under negation of words (i.e., negating each symbol), and we have e1e2. . .em∈L−1S (T ) ⇐⇒ LS(e1)LS(e2). . .LS(em)∈T ⇐⇒ (−LS(e1))(−LS(e2)). . .(−LS(em))∈T ⇐⇒ LS(R(e1))LS(R(e2)). . .LS(R(em))∈T ⇐⇒ R(e1)R(e2). . .R(em)∈L−1S (T ). Consequently, it follows by Proposition 3 thatL−1S (T )⊗X(GS) has symmetric hor- izontal edge-constrained strips and we can apply the method of Section 3.1 to ob- tain lower bounds on the capacity of CHG(b2)⊗ CHG(b1). The reader will note a similarity in the constructions in proofs of Propositions 1 and 6. Indeed, as an alternative approach, one may be able to use the construction in Proposition 1 to obtain bounds on cap(S ⊗ T ): namely, if G1 and G2 are the underlying graphs of a capacity-preserving presentation (G1,G2) of S ⊗ T and X(G1) ⊗ X(G2) has symmetric horizontal edge-constrained strips. However, the approach given by Proposition 6 seems to be more direct and simpler than the alternative approach. 3.4 Heuristics for choosing φ In this section, we use the notation defined in Section 3.1, and assume that S = T⊗E is a 2-dimensional constraint with symmetric horizontal edge-constrained strips, where E is an edge constraint. We describe heuristics for choosing the func- tion φ to obtain “good” lower bounds on the capacity of S. 3.4.1 Using max-entropic probabilites Recall that a vertex of a directed graph is isolated if no edges in the graph are connected to it. Note, that since G(H)m is symmetric, every vertex is either isolated 33 3.4. Heuristics for choosing φ or has both incoming and outgoing edges. We assume here that for every positive integer m, ignoring isolated vertices, G(H)m is a primitive graph. In this case, the Perron eigenvector of Hm is unique up to multiplication by a scalar. Let rm be the right Perron eigenvector of Hm normalized to be a unit vector in the L2-norm. Ob- serve, that substituting rm for xm satisfies (3.2) with an equality. This motivates us to choose φ so that the resulting vector xm approximates rm. Since G (H) m (without its isolated vertices) is irreducible, there is a unique stationary probability measure having maximum entropy on arrays of Hm, namely the max-entropic probability measure onHm. We denote it here by Pr∗,m. It is given by Pr∗,m(Γ) = (rm)σ(Γ) (rm)τ(Γ) λ(Hm)` . For Γ ∈ Sm×`, for some positive integer `, and where σ(Γ), τ(Γ) ∈ V mE are given by (σ(Γ))i = σ(Γi,0) (τ(Γ))i = τ(Γi,`−1) ; i = 0, 1, 2, . . . ,m−1. Let V(m) be a random variable taking values in V mE with distribution given by Pr(V(m)=v) = Pr∗,m {Γ ∈ Sm×1 : σ(Γ) = v} ; v ∈ V mE . It’s easily verified that Pr(V(m)=v) = ((rm)v) 2 . (3.10) Thus approximating Pr(V(m) = v) and taking a square root will give us an approximation for (rm)v. Roughly speaking, Pr(V(m) = v) is the probability of seeing the column of vertices v in the “middle” of anm×` array chosen uniformaly at random from Sm×`, for large `. Fix integers µ ≥ 0, α ≥ 1 as in Section 3.1, and assume now that m = mk = µ + kα, for a positive integer k. For an integer 0≤s<m and vectors u∈V `E ,w∈V rE , with lengths satisfying `≤m−s, r≤s, denote by p(m)s (u) and p (m) s (u|w) the probabilities given by p(m)s (u) = Pr(V (m) s:s+`−1 = u) p(m)s (u|w) = Pr(V(m)s:s+`−1 = u|V(m)s−r:s−1 = w). Then by the chain rule for conditional probability we have, for any vector v∈V mE , Pr(V(m)=v) =p (m) 0 (v0:µ−1) · k−1∏ i=0 p (m) µ+iα(vµ+iα:µ+(i+1)α−1|v0:µ+iα−1). 34 3.4. Heuristics for choosing φ A plausible way to approximate Pr(V = v), is by treating V as the outcome of a Markov process. Here we use a Markov process with memory µ, and as- sume that ps(u|w) can be “well” approximated by ps(u|wr−µ:r−1), for vectors u∈V `E ,w∈V rE , with r, ` as above, and r≥µ. Using this approximation we get Pr(V(m)=v) ≈p(m)0 (v0:µ−1) · k−1∏ i=0 p (m) µ+iα(vµ+iα:µ+(i+1)α−1|viα:µ+iα−1). We hypothesize that for fixed vectors u∈V αE ,w∈V µE , as m gets large, the con- ditional probabilities p(m)s (u|w), for 0sm−1, are “approximately equal” to the value when s is in the “middle” of the interval [0,m− 1]. We hypothesize that this holds for “most” of the integers s in that inteval and moreover that this middle value converges as m gets large. Accordingly, we try to approximate the con- ditional probability p(m)s (u|w) by the conditional probability found in the “mid- dle” of a “tall” horizontal strip. More precisely, we fix an integer δ ≥ 0, set ω=2δ+µ+α, and approximate p(m)s (u|w) by p(ω)δ+µ(u|w). We also approximate p (m) 0 (w) by p (ω) 0 (w). This gives us Pr(V(m)=v) ≈p(ω)0 (v0:µ−1) · k−1∏ i=0 p (ω) δ+µ(vµ+iα:µ+(i+1)α−1|viα:µ+iα−1), which, by (3.10), implies that (rm)v ≈ √ p (ω) 0 (v0:µ−1) · k−1∏ i=0 √ p (ω) δ+µ(vµ+iα:µ+(i+1)α−1|viα:µ+iα−1). (3.11) Set Fm=|VE |m, and denote by r̃mk∈RFmk the nonnegative real vector with entries indexed by V mkE and given by the RHS of equation (3.11). Let φ : (VE) µ+α → [0,∞) be given by φ(u) = √ p (ω) δ+µ(uµ:µ+α−1|u0:µ−1) ; u∈(VE)µ+α, (3.12) and let xmk ∈ RFmk be the vector with entries indexed by V mkE and defined by (3.4). We obtain (rmk)v ≈ (r̃mk)v = (xmk)v √ p (ω) 0 (v0:µ−1) ; v∈(VE)mk . 35 3.4. Heuristics for choosing φ Now for mk≥ω, if v∈V mkE is not an isolated vertex in Gmk , then clearly, v0:ω−1 is not an isolated vertex in Gω as well. Therefore (rω)v0:ω−1>0, which implies that p (w) 0 (v0:ω−1)>0 and thus p (ω) 0 (v0:µ−1)>0. Let pmin = p (ω) min = min{p(ω)0 (w) : w∈V µE and p(ω)0 (w)>0}. It follows that for all vertices v∈ (VE)mk of Gmk that are not isolated, we have (r̃mk)v ≥ √ pmin (xmk)v . Now, for any positive integer ` and (Fmk×1)-real vector y, the product ytH`my depends only on the values of the entries of y indexed by non-isolated vertices of Gmk . Consequently, we may write pminx t mk H`mkxmk ≤ r̃mktH`mk r̃mk ≤ xtmkH`mkxmk , for all positive integers `. Taking the log, dividing by mk, and taking the limit as k approaches infinity, we obtain lim k→∞ log r̃mk tH`mk r̃mk mk = lim k→∞ logxtmkH ` mk xmk mk , where by Lemma 2, the limit in the RHS exists. Thus, choosing φ as given by (3.12) and computing the lower bound by the method described in Section 3.1 is equiv- alent to computing the limit of the lower bound in (3.2), with r̃m substituted for xm, as m approaches infinity. If r̃m approximates rm well enough, we expect to get good bounds. Note that we may use the heuristic described here even for constraints for which the graphs G(H)m are not always irreducible. In this case, the geometric multiplicity of the Perron eigenvalue may be larger than 1, and there may be more than one choice of the vector rω in the computation of p (ω) δ+µ(·|·). Re- gardless of our choice, we will get a nonnegative function φ and a lower bound on the capacity. In Section 3.5 we show numerical results obtained using the heuristic described here for several constraints. 3.4.2 General optimization We may also use general optimization techniques to find functions φ which max- imize the lower bound on the capacity. Fix integers µ≥0 and p, q, α>0, and for a positive integer `, set D` = (VE)`. In this subsection, we identify a function φ:Dµ+α→R with a real vector φ∈R|Dµ+α| indexed by Dµ+α; for each j∈Dµ+α we identify φ(j) with the entry φj. For a positive integer n, let Gn be a de- terministic presentation for Vn(S), In = I(µ, α, n,Gn, GE), and for a function φ : Dµ+α→[0,∞), set An,φ = A(In,Wφ). Observe that for a scalar c ∈ [0,∞), 36 3.4. Heuristics for choosing φ An,cφ = c 2An,φ. It follows that using cφ in place of φ in equation (3.7) of Thereom 1 does not change the lower bound. Consequently, (as φ cannot be the constant 0 function), it’s enough to consider functions φ whose images (of all vec- tors in (VE)µ+α) sum to 1. We thus have the following optimization problem. maximize (log λ(A2q+p,φ)− log λ(A2q,φ)) / (pα) subject to φ ≥ 0, φ · 1 = 1, (3.13) where 0 and 1 denote the real vectors of size |Dµ+α| with every entry equal to 0 and 1 respectively, and for two real vectors of the same size, t, r we write t ≥ r or t>r if the corresponding inequality holds entry-wise. Finding a global solution for a general optimization problem can be hard. We proceed to show that if we replace the constraint φ≥0 with φ>0 in (3.13), thereby changing the feasable set and possibly decreasing the optimal solution, it can be formulated as an instance of a particular class of optimization problems known as “DC optimization” which may be easier to solve. Let d be a positive integer. A real-valued function f : Rd → R is called a DC (difference of convex) function, if it can be written as the difference of two real-valued convex functions on Rd. An optimization problem of the form maximize f(x) subject to x∈X, hi(x) ≤ 0 ; i = 0, 1, . . . , `, where X ⊆ Rd is a convex closed subset of Rd and the functions f, h0, . . . , h` are DC functions, is called a DC optimization or DC programming problem. See [17] and the references within for an overview of the theory of DC optimization. A nonnegative function f : Rd → [0,∞) is called log-convex or superconvex, if either f(t)>0 for all t∈Rd and log f is convex in Rd, or f≡0. A log-convex function is convex, and in [24], it is shown that the class of log-convex functions is closed under addition, multiplication, raising to positive real powers, taking limits, and additionally that for a square matrix A(t) = (ai,j(t)) whose entries are log- convex functions ai,j : Rd → [0,∞), the function t → λ(A(t)) is log-convex as well. Now, observe, that for a positive integer n, every entery of An,φ is a quadratic form in the entries φ(j), j∈Dµ+α, with nonnegative integer coefficients. Such a function is generally not log-convex. To fix this, we perform the change of vari- ables φ = eψ, where ψ is a real-valued function ψ : Dµ+α → R. Note that by doing so, we added the constraint φ>0. Since every entry of φ is now positive, 37 3.5. Numerical results for selected constraints we may replace the constraint φ · 1 = 1 by the constraint φ(v0) = 1 or equiva- lently ψ(v0) = 0, for some fixed v0∈Dµ+α. Problem (3.13) with the additional constraint φ>0, can now be rewritten as maximize ( log λ(A2q+p,eψ)− log λ(A2q,eψ) ) / (pα) subject to ψ(v0) = 0. (3.14) Obviously, we may substitue the maximization problem constraint, ψ(v0) = 0, above into the objective function, thereby reducing the number of variables by 1; however, this is not relevant for the discussion, so, for simplicity, we do not do so here. Now, for a positive integer n, the entries of the matrix An,eψ are of the form qi,j∑ k=1 eψ(wk,i,j)+ψ(uk,i,j), where qi,j are nonnegative integers, and wk,i,j and uk,i,j are vectors in Dµ+α, for all i, j∈VIn and integers 1≤k≤qi,j . It can be verified that a function of such a form is log-convex in ψ. It follows that the function ψ→λ(An,eψ) for ψ ∈ R|Dµ+α| is log-convex as well. Therefore either λ(An,eψ)≡0, or λ(An,eψ)>0 for all ψ ∈ R|Dµ+α|. In particular, forψ≡0 the matrixAn,1 is the adjacency matrix of the graph In. Since In is deterministic, and for every nonnegative integer `, the set of labels of its paths of length ` is S`α×n, it follows that (1/α) log λ(An,1)=cap(Vn(S)) ≥ cap(S) (the latter inequality follows from (2.8)). Hence, if cap(S)>−∞ (or equiv- alently cap(S)≥0), then λ(An,eψ)>0 for all ψ ∈ R|Dµ+α| and log λ(An,eψ) is a convex function of ψ. Clearly cap(S)>−∞ iff. Sm×n 6=∅, for all positive integers m,n. We thus obtain the following theorem. Theorem 3. Let S be a constraint such that for all positive integersm,n, Sm×n 6=∅ then Problem (3.14) is a DC optimization problem. 3.5 Numerical results for selected constraints In this section we give numerical lower bounds on the capacity of some 2- dimensional constraints obtained using the method presented in the sections above. The constraints considered are NAK, RWIM, EVEN⊗2, and CHG(3)⊗2. Ta- ble 3.2 summarizes the best lower bounds obtained using our method. For compar- ison, we provide the best lower bounds that we could obtain using other methods. We also give upper bounds on the capacity of these constraints obtained using the method of [3]. Table 3.3 shows the lower bounds obtained using our max-entropic probability heuristic for choosing φ, described in Section 3.4.1. Table 3.4 shows the 38 3.5. Numerical results for selected constraints lower bounds obtained with our method by trying to solve the optimization prob- lem described in Section 3.4.2. In this, we did not make use of the DC property of the optimization problem; instead, we used a generic sub-optimal optimization algorithm whose results are not guarenteed to be global solutions. Utilizing special algorithms for solving DC optimization problems may give better lower bounds. The rightmost column of each of these tables shows the lower bound calculated for the same values of p and q using the method of [3, 7]. The largest lower-bound obtained for each constraint is marked with a ‘?’. In the next subsections we give remarks specific to some of these constraints. As can be seen from the results in Table 3.4, using the optimization heuristic with our method gives better lower bounds on the capacity than the method of [3,7]. On the other hand, using optimization typically requires many evaluations of the objective function which results in longer running times compared to the method of [3, 7], for the same values of p and q. Increasing the value of q in the method of [3, 7] usually gives better lower bounds but requires more time and memory to run. It is therefore relevant to check if their method can be used to attain the lower bounds, obtained using the technique described here, with the same computational resources. Our experiments show that this is typically not the case. For example, for the NAK constraint, when µ = 1, α = 1, p = 1 and q = 4, an implementation of the method described here with the optimization heuristic took roughly 70 sec- onds to run, required operating on matrices of size 144×144 and resulted with the lower bound 0.4250767727. In contrast, our implementation of the method of [3,7] with p = 1, and q = 8, took roughly 1000 seconds to run, required operating on matrices of size 6765×6765 and resulted with the lower bound 0.4250725619. Ta- ble 3.1 shows some of the lower bounds obtained using the method of [3, 7] along with the size of the largest square matrix involved in the computation. The numerical results were computed using the eigenvalue routines in Mat- lab and rounded (down for lower bounds and up for upper-bounds) to 10 decimal places. Given accuracy problems with possibly defective matrices, we verified the results using the technique described in [35, Section IV]. Table 3.1: Matrix size in the method of [3, 7] for the NAK constraint. p q Lower bound using [3, 7] Largest matrix size 1 5 0.4248771038 377×377 2 4 0.4249055702 233×233 1 6 0.4250215987 987×987 1 7 0.4250615286 2584×2584 2 6 0.4250636891 1597×1597 1 8 0.4250725619 6765×6765 39 3.5. Numerical results for selected constraints 3.5.1 The constraint RWIM Observe that this constraint has both symmetric horizontal and symmetric verti- cal vertex-constrained strips. Thus, we can apply our method in the vertical as well as the horizontal direction to get lower bounds. Clearly, cap(RWIMt) = cap(RWIM), so we can obtain additional lower bounds on cap(RWIM) by using our method to get lower bounds on cap(RWIMt). Some of these bounds are given in Tables 3.3 and 3.4. 3.5.2 The constraint EVEN⊗2 We used the reduction described in Section 3.3 with GEVEN being the presentation of EVEN given in Figure 1.1a, to get lower bounds on the capacity of EVEN⊗2. Table 3.4 gives the results obtained with our method using the optimization de- scribed in Section 3.4.2. We also used the method with the max-entropic probabil- ity heuristic of Section 3.4.1 and the results are given in Table 3.3. 3.5.3 The constraint CHG(b)⊗2 For this constraint, the case b=1 is degenerate. Indeed, there are exactly two m×n arrays in CHG(1)⊗2 for all positive integers m and n, and consequently, cap(CHG(1)⊗2)=0. For b=2, we show in Theorem 5 in Chapter 4 that the ca- pacity is exactly 1/4. For b=3, we used the reduction of Section 3.3 with GCHG(3) being the presentation of CHG(3) given in Figure 1.1c, to get lower bounds on the capacity of CHG(3)⊗2. Table 3.4 gives the results obtained with our method using the optimization described in Section 3.4.2. We also used the method with the max-entropic probability heuristic of Section 3.4.1 and the results are given in Table 3.3. Table 3.2: Best bounds on capacities of certain constraints. Constraint Previous best lower bound New lower bound Upper bound NAK 0.4250725619?? 0.4250767745 0.4250767997? RWIM 0.5350150??? 0.5350151497 0.5350428519? EVEN⊗2 0.4385027973?? 0.4402086447 0.4452873312? CHG(3)⊗2 0.4210209862?? 0.4222689819 0.5328488954? ?Calculated using the method of [3]. ??Calculated using the method of [3, 7]. ???Appears in [42]. 40 3.5. Numerical results for selected constraints Table 3.3: Lower bounds using max-entropic probability heuristic (Section 3.4.1). Constraint δ µ α p q Lower bound Using [3, 7] NAK 3 1 1 1 5 0.4250766244 0.4248771038 3 1 1 2 4 0.4250766446 0.4249055702 6 1 1 1 5 0.4250767227 0.4248771038 3 3 4 1 5 0.4250767590 0.4248771038 7 1 1 1 5 0.4250767617 0.4248771038 3 1 1 2 6 0.4250767647 0.4250636891 3 1 4 1 5 0.4250767733 0.4248771038 5 1 1 1 5 0.4250767744 0.4248771038 3 1 4 2 6 0.4250767745 0.4250636891 3 1 2 2 6 0.4250767745 0.4250636891 RWIM 3 1 3 1 6 0.5350147968 0.5235145644 1 1 1 3 6 0.5350148753 0.5318753627 3 2 2 1 5 0.5350148814 0.5160533001 3 1 2 2 6 0.5350149069 0.5337927416 2 1 2 2 6 0.5350149071 0.5337927416 0 1 2 2 6 0.5350149136 0.5337927416 2 2 2 1 5 0.5350149271 0.5160533001 1 1 2 2 6 0.5350149462 0.5337927416 1 1 3 1 6 0.5350149525 0.5235145644 1 1 1 1 7 0.5350149707 0.5280406048 RWIMt 4 1 3 2 4 0.5350145937 0.5350144722 1 1 1 1 5 0.5350146612 0.5350149478 4 2 1 1 4 0.5350147212 0.5350142142 3 1 1 1 5 0.5350147328 0.5350149478 5 1 1 1 5 0.5350147619 0.5350149478 2 2 1 1 4 0.5350147969 0.5350142142 4 1 1 1 5 0.5350148255 0.5350149478 2 1 1 1 5 0.5350148449 0.5350149478 0 1 1 1 5 0.5350148814 0.5350149478 0 2 1 1 4 0.5350148980 0.5350142142 EVEN⊗2 3 2 1 1 3 0.4383238232 0.4347423815 3 1 1 1 4 0.4383243738 0.4367818624 3 1 3 2 3 0.4383632350 0.4356897662 3 1 2 4 3 0.4383838005 0.4364303826 3 1 1 2 4 0.4384647082 0.4371709990 3 1 3 3 3 0.4384906740 0.4360537982 3 1 2 1 4 0.4385448358 0.4367818624 3 1 2 2 4 0.4386655840 0.4371709990 3 1 3 1 4 0.4387455520 0.4367818624 CHG(3)⊗2 0 0 1 1 2 0.4188210386 0.4101473707 0 0 1 1 4 0.4222689819? 0.4197053158 ?Best lower bound. 41 3.5. Numerical results for selected constraints Table 3.4: Lower bounds using optimization (Section 3.4.2). Constraint µ α p q Lower bound Using [3, 7] NAK 2 1 2 4 0.4250767692 0.4249055702 1 2 1 5 0.4250767736 0.4248771038 1 1 3 4 0.4250767737 0.4248960814 1 2 1 3 0.4250767739 0.4224650194 1 1 4 4 0.4250767739 0.4249674993 1 1 5 4 0.4250767740 0.4249783192 1 1 6 4 0.4250767741 0.4249995626 1 2 3 3 0.4250767743 0.4244240822 1 2 6 3 0.4250767744 0.4247979797 1 2 2 5 0.4250767745? 0.4250294285 RWIM 1 1 1 3 0.5350147515 0.4832292495 1 1 2 3 0.5350148675 0.5300373650 1 1 3 3 0.5350149371 0.5212673183 1 1 1 4 0.5350150805 0.5037272248 1 1 2 4 0.5350151001 0.5318663054 1 1 3 4 0.5350151123 0.5265953036 1 1 1 5 0.5350151372 0.5160533001 1 1 2 5 0.5350151410 0.5330440001 1 1 2 6 0.5350151491 0.5337927416 1 2 1 5 0.5350151497? 0.5160533001 RWIMt 1 2 4 3 0.5350151364 0.5350130576 1 2 3 4 0.5350151377 0.5350146307 1 2 5 3 0.5350151392 0.5350134356 2 1 1 4 0.5350151405 0.5350142142 1 1 1 5 0.5350151442 0.5350149478 1 2 1 4 0.5350151465 0.5350142142 1 2 1 5 0.5350151481 0.5350149478 1 2 2 4 0.5350151482 0.5350144722 1 3 1 4 0.5350151483 0.5350142142 EVEN⊗2 1 1 1 3 0.4395381520 0.4347423815 1 1 2 3 0.4397347451 0.4356897662 1 1 1 4 0.4402086447? 0.4367818624 CHG(3)⊗2 0 1 1 2 0.4189237100 0.4101473707 0 1 2 2 0.4197037681 0.4182017399 0 1 3 2 0.4201450063 0.4176642274 0 1 1 3 0.4210954837 0.4165892023 0 1 2 3 0.4214748454 0.4210209862 1 4 0.4197053158 ?Best lower bound. 42 3.6. Open questions 3.6 Open questions The authors of [9, 10, 35] show how the methods of [7] and [3] can be extended to get lower and upper bounds on the capacity of constraints with dimension larger than 2 which are “symmetric in all but one direction”. Similarly, it should be a relatively easy exercise to extend our generalization of the method to obtain better lower bounds to the capacities of such constraints. For aD-dimensional constraint S over Σ, one can define a more general notion than capacity called “weighted-capacity” or “pressure” [10, 11]. First, one assigns a positive weight to each symbol in Σ. Then, the weight of a D-dimensional array Γ over Σ, denotedW(Γ) is defined to be the product of the weights of its entries. The weighted-capacity of S is now given by lim m→∞ ∑ Γ∈SmW(Γ) |[m]| . As before, the limit exists since the numerator is an entry-wise subadditive func- tion. It would be beneficial to extend the method of this chapter to obtain lower bounds on the weighted-capacity of symmetric constraints. Finally, consider the method of [3] for obtaining upper-bounds on the capacity of 2-dimensional symmetric constraints. While the method gives better upper- bounds than those obtained by computing the normalized capacity of a horizon- tal or vertical strip, empirical results suggest that for many constraints the upper- bounds approach the capacity slower than their lower-bound counterparts. Hence, a better method for upper bounding the capacity of symmetric (and general) con- straints would be useful. 43 Chapter 4 Exact computation of capacity∗ As already stated, finding the exact capacity of D-dimensional constraints, for D>1, is hard, and the list of constraints for which the capacity is known precisely is quite small. In this chapter we add two families of isotropic multidimensional constraints to this list, namely ODD⊗D and CHG(2)⊗D. Some of the results pre- sented here have been extended to other similar constraints in [22]. 4.1 The capacity of ODD⊗D Theorem 4. For all positive integers D, cap ( ODD⊗D ) = 1 2 . Proof. Let S be the D-dimensional constraint ODD⊗D. We first show cap(S)≥1/2. For an integer n, let Yn ⊆ [2n]D be the set of all vectors in [2n]D whose entries sum to an even number, and let Xn be the set of all binary D- dimensional arrays Γ of size 2n×2n×. . .×2n, with entries satisfying (Γ)j = 0 for all j∈Yn. Then the number of zeros between consecutive ‘1’s, in any row of an array in Xn is odd since it must be of the form i−j−1 for some integers i, j— either both odd, or both even. Thus, all such arrays satisfy the constraint S, and since |Xn| = 2(2n)D−|Yn| = 2(2n)D/2, we have |S2n×2n×...×2n| ≥ 2(2n)D/2 for all positive integers n, which implies cap(S)≥1/2. It remains to show that cap(S)≤1/2. Since cap(ODD⊗D) is non-increasing in D, it’s enough to show cap(S)≤1/2 for D = 1. Let n be a positive integer. It can be easily verified that any 1-dimensional array Γ∈ODDn with entries indexed by [n], satisfies either Γj = 0 for all even integers j∈[n], or Γj = 0 for all odd integers j∈[n]. It follows that |ODDn|≤2dn/2e+2bn/2c which implies the desired inequality ∗A version of this chapter has been published. Louidor, E. and Marcus, B.H. (2010) Improved Lower Bounds on Capacities of Symmetric 2-Dimensional Constraints using Rayleigh Quotients. IEEE Transactions on Information Theory 56:1624–1639. 44 4.2. The capacity of CHG(2)⊗D 4.2 The capacity of CHG(2)⊗D Theorem 5. For all positive integers D, cap ( CHG(2)⊗D ) = 1 2D . Proof. Let S = CHG(2)⊗D. We first show that cap(S)≥1/2D. Let Γ(0),Γ(1) be theD-dimensional arrays of size 2×2×. . .×2 with entries indexed by {0, 1}D and given by ( Γ(i) ) j = (−1)i+j·1 ; j ∈ {0, 1}D , where as usual 1 denotes the D-dimensional vector with every entry equal to 1. Observe that the sum of every row of both of these arrays is zero. Now, let n be a positive integer. For any D-dimensional array of size n×n×. . .×n with en- tries in {0, 1}, it can be easily verified that replacing every entry equal to 0 with Γ(0) and every entry equal to 1 with Γ(1) results in a D-dimensional array of size 2n×2n×. . .×2n that satisfies S. It follows that |S2n×2n×...×2n|≥2nD for all posi- tive integers n, which implies cap(S)≥1/2D. We now show that cap(S)≤1/2D. For a positive integer n≥2, denote by N (0)n the set of all even integers in {0, 1, . . ., n−2} and byN (1)n the set of all odd integers in {0, 1, . . ., n−2}. We shall make use of the following lemma. Lemma 3. Fix a positive integer n≥2, and let (ai)n−1i=0 ⊆{+1,−1} be a sequence of length n. Then a0. . .an−1∈CHG(2) if and only if at least one of the following statements hold. 1. For all i∈N (0)n , ai=−ai+1. 2. For all i∈N (1)n , ai=−ai+1. Proof. We first show the “if” direction. Let (ai)n−1i=0 ⊆ {+1,−1} be a sequence for which at least one of statements 1,2 of the lemma holds. Then clearly, for any integers 0≤i≤j<n, all the terms in the sum ∑jk=i ak, with the possible exception of the first and last terms, cancel. Therefore |∑jk=i ak| ≤ |ai| + |aj | = 2 and a0. . .an−1∈CHG(2). As for the “only if” direction, let (ai)n−1i=0 ⊆{+1,−1} such that a0. . .an−1 ∈ CHG(2), and consider the presentation of the CHG constraint given in Figure 1.1c for b = 2 (and vertices {0, 1, 2}). Let (ei)n−1i=0 be a path in this presentation gen- erating a0. . .an−1. It’s easily verified that if σ(ei) = 1, for some i∈[n−1], then aj = −aj+1 for all integers i≤j≤n−2 such that j≡i (mod 2). Evidently, either σ(e0) = 1 and so statement 1 holds, or σ(e1) = 1 implying statement 2. 45 4.2. The capacity of CHG(2)⊗D We now return to the claim that cap(S)≤1/2D. Fix a positive integer n≥2. For an integer 1≤i≤D, let e(i)∈{0, 1}D, be the vector, indexed by {1, 2, . . .,D}, containing 1 in its ith entry and 0 everywhere else and let Ji⊆[n]D denote the subset of all the vectors indexed by {1, 2, . . .,D} with a 0 in the ith entry. For a vector j∈Ji, the sequence ( j+ ke(i) )n−1 k=0 , is a sequence of indices of entries of a row in direction i of a D-dimensional n×n×. . .×n array, and we shall say that it is a sequence in direction i. Let R(n,D) be the set of all such sequences for all integers 1≤i≤D and vectors j∈Ji, and let r∈{0, 1}|R(n,D)| be a binary vector indexed by R(n,D). For the purpose of this proof, let us refer to a sequence (ai) n−1 i=0 ⊆ {+1,−1} as a phase-0 sequence if statement 1 of Lemma 3 holds, and as a phase-1 sequence if statement 2 holds (note that a sequence may be both a phase 0 and a phase 1 sequence). Also, we denote by X(r)⊆{+1,−1}∗D , the set of all D-dimensional arrays Γ of size n×n×. . .×n for which the row Γ%̄ is a phase-r%̄ sequence, for all %̄ = ( j+ ke(i) )n−1 k=0 ∈R(n,D). Then by Lemma 3, we have Sn×...×n = ⋃ r X(r). (4.1) We shall give an upper bound on the size of X(r). For a vector v∈[n]D, denote by ρ(i,v) the unique sequence in direction i in R(n,D) that has v as one of its elements. Let Tr,i : [n]D→ZD be given by: Tr,i(v) = { v + e(i) if vi ≡ rρ(i,v) (mod 2) v − e(i) otherwise , v∈[n]D, v = (v1, . . ., vD). Next, we define the undirected graph Gr = (V,Er) (without parallel edges), with vertices given by V = [n]D, and edges given by Er = { u v : u,v∈V and v = Tr,i(u) for some integer 1≤i≤D } , where u v denotes the undirected edge connecting vertices u,v. It’s easy to verify that an array Γ∈{+1,−1}∗D of size n×n×. . .×n is in X(r) iff for every edge u v∈Er, it holds that Γu = −Γv. Figure 4.1 shows an example of Gr for D = 2. Let C1, . . ., C` be the connected components of Gr, and let v(1), . . .,v(`) be arbitrary vertices such that v(i)∈Ci, for i = 1, 2, . . ., `. It follows that for every 46 4.2. The capacity of CHG(2)⊗D 0 1 1 1 2 0 3 1 4 1 5 0 0 1 1 1 2 1 3 0 4 1 5 0 Figure 4.1: Example of the graph Gr for D = 2, n = 6. Each entry of r corre- sponding to a row (column) is written to the right of it (below it). The index of each row (column) is written to its left (above it). vector b = (b1, . . ., b`)∈{+1,−1}`, there exists at most one array Γ∈X(r) sat- isfying Γv(i) = bi for all i∈{1, 2, . . ., `}, and consequently, |X(r)|≤2` (in fact, while this is not needed for the proof, such an array Γ∈X(r) does exist, for any choice of b, since each Ci is bipartite; thus |X(r)| = 2`). Now, let u = (u1, . . ., uD)∈([n] \ {0, n−1})D be a vertex in the “interior” of Gr. We show that the connected component of Gr containing u has at least 2D vertices. To this end, we match each word w = w1w2. . .wD∈{0, 1}D, with a sequence of vertices ( pi(w,j) )D j=0 ⊆V , defined recursively by pi(w,j) = u if j=0 pi(w,j−1) if j > 0 and wj = 0 Tr,j(pi (w,j−1)) if j > 0 and wj = 1 . It’s easy to verify that since every 1≤ui≤n−2, the sequence is well-defined and indeed (pi(w,j))Dj=0 ⊆ [n]D. Clearly, the sequence is contained entirely in the connected component containing u, and so this component contains the vertex pi(w,D). Write pi(w,D)=(pi(w,D)1 , . . ., pi (w,D) D ). Then for i=1, 2, . . .,D, it holds that pi (w,D) i =ui if wi=0, and pi (w,D) i =ui±1 if wi=1. Therefore, for two distinct words w,w′∈{0, 1}D, the vertices pi(w,D) and pi(w′,D) are distinct as well, and conse- quently there are 2D such vertices. Thus, the connected component of Gr contain- ing u has at least 2D vertices. It follows that there are at most nD/2D connected components ofGr containing a vertex in {1, 2, . . ., n−2}D. There are at most nD − (n−2)D connected compo- nents not containing a vertex in {1, 2, . . ., n−2}D and hence the total number of 47 4.2. The capacity of CHG(2)⊗D connected components, `, in Gr satisfies `≤nD/2D + nD − (n−2)D. Hence, |X(r)|≤2nD/2D+nD−(n−2)D . Since there are 2Dn D−1 binary vectors r∈{0, 1}|R(n,D)|, we obtain from (4.1) |Sn×n×...×n| ≤ ∑ r |X(r)| ≤ 2nD/2D+nD−(n−2)D+DnD−1 = 2n D(1/2D+1−(1−2/n)D)+DnD−1 , and the result follows from (1.1). 48 Chapter 5 Multi-choice constraints and independence capacity∗ In this chapter we generalize some of the concepts appearing in [37, 38] to more than 1 dimension and to non-binary alphabets. In particular we define the notion of independence capacity of a multidimensional constraint which, roughly, is the contribution to the capacity resulting from independence between symbols in ar- rays of the constraint. For the binary alphabet {0, 1} this concept coincides with the maximum insertion rate defined in [38]. We also show that for a 1-dimensional constraint S of finite-type with 0 independence entropy, cap(S⊗D) converges to 0 exponentially fast as D approaches infinity. 5.1 Multi-choice constraints Let Σ̂ denote the set of all nonempty subsets of Σ. Let S be a D-dimensional constraint over Σ and for a D-dimensional array Γ̂∈Σ̂m for some m∈ND, define the set of possible choices of entries in Γ̂, denoted Φ(Γ̂), by Φ(Γ̂) = { Γ∈Σm : For all i∈[m], Γi∈Γ̂i } . We define the multi-choice set corresponding to S, denoted Ŝ, by Ŝ = { Γ̂∈Σ̂∗D : Φ(Γ̂) ⊆ S } , and, as for constraints, we use the notation Ŝm, for m∈ND, to denote the set of D-dimensional arrays of size m in Ŝ. Note that Ŝ is closed under taking of contiguous sub-arrays, and so it’s plausible that Ŝ is a D-dimensional constraint over Σ̂. Unfortunately, we do not know if this is true for general D-dimensional constrained systems S. In this work, we are mostly interested in Ŝ for a constraint S which is an axial product of some D 1-dimensional constraints. In this case, it ∗A version of this chapter has been submitted for publication. Louidor, E., Marcus B.H. and Pavlov R. (2010) Independence Entropy of Zd Shift Spaces. 49 5.1. Multi-choice constraints turns out that Ŝ is indeed a constrained system. This is an easy corollary of the next theorem which shows that for a 1-dimensional constraint S, Ŝ is indeed a 1- dimensional constraint. This is shown in [38] for Σ = {0, 1}. For completeness, we state and prove the theorem for general alphabets. We require the following definitions. Let Σ be a finite alphabet. For words x, y∈Σ∗ and integer n we use the conventional notation of |x|, xy, xn to denote, respectively, the length of x, the word formed by concatenating y to the right of x, and the word formed by concatenating x to itself n times. We use ε to denote the empty word. Let S be a 1-dimensional constraint over Σ. For a word x∈S the follower set of x, denoted F(x) = FS(x) is given by FS(x) = {w∈Σ∗ : xw∈S} . If G = ((V,E),L) is a presentation of S, and for v∈V , we denote by F(v) the set of words generated by paths in G starting at v, then FS(x) = ∪vF(v), where the union is taken over all terminal vertices, v, of paths in G generating x. It follows that the number of follower sets of a constraint is finite. In [38] the authors construct a presentation of Ŝ when Σ = {0, 1}. We generalize their construction here for arbitrary alphabets and denote it by ĜFS = ((VFS , EFS ),LFS ). The set of vertices VFS consists of all (finite) intersections of follower sets of words in S: VFS = { k⋂ i=1 FS(wi) : w1, . . ., wk∈S, k=1, 2, . . . } . For a vertex v∈VFS , v= ⋂k i=1FS(wi), and symbol â∈Σ̂ with â⊆v define δ(v, â) = ⋂ a∈â k⋂ i=1 FS(wia) (note that wia∈S for every a∈â and i = 1, 2, . . ., k). It’s easy to verify that δ(v, â) = {w∈Σ∗ : aw∈v for all a∈â}, and therefore δ(v, â) does not depend on the choice of k and w1, . . ., wk. The set EFS is now defined by EFS = {(v, â, δ(v, â)) : v∈VFS , â⊆v} , and for an edge e = (v, â, δ(v, â))∈EFS we define σ(e) = v, τ(e) = δ(v, â) and LFS (e) = â. We are now ready to prove the aforementioned theorem. Theorem 6. If S is a 1-dimensional constraint over an alphabet Σ, then Ŝ is a 1-dimensional constraint over Σ̂ and ĜFS is a presentation of Ŝ. 50 5.2. Independence capacity Proof. The theorem readily follows from the following fact, which is easily veri- fied by induction on the length of w. Fact 1. For any vertex v = ⋂k i=1FS(wi)∈VFS , and a word w∈Σ̂∗, there is a path starting at v and generating w in Ĝ if and only if Φ(w)⊆v, in which case the path ends at the vertex ⋂k i=1 ⋂ z∈Φ(w)FS(wiz). Now, for a word x̂∈Σ̂∗ generated by a path of Ĝ starting at some vertex v, we have Φ(x̂)⊆v⊆S, and therefore x̂∈Ŝ. Conversely, since S = F(ε), it follows from the above fact that there is a path generating any word x̂∈Ŝ (starting from F(ε)) in Ĝ. Let S(1), . . ., S(D) be 1-dimensional constraints over Σ, and set S = S(1)⊗. . .⊗S(D). Then it’s easy to verify that Ŝ = Ŝ(1)⊗. . .⊗Ŝ(D), and we have the following corollary: Corollary 2. If S(1), . . ., S(D) are 1-dimensional constraints over Σ, and S = S(1)⊗. . .⊗S(D), then Ŝ is a D-dimensional constrained system. 5.2 Independence capacity In this section, we introduce the notion of “independence capacity” of a constraint. Roughly, this is the part of the capacity resulting from inter-symbol independence in elements of the constraint. Let S be a D-dimensional constraint over Σ. For an array Γ̂∈Ŝ of size m, the real number log(|Φ(Γ̂)|)/|[m]| can be thought of as the contribution to the capacity resulting from independence between entries in elements of S as “cap- tured” by Γ̂. We define the independence capacity as the limit (as m → ∞) of the maximum possible such contribution. Precisely, observe that the mapping m → log max { |Φ(Γ̂)| : Γ̂∈Ŝm } for m∈ND is entry-wise subadditive. Using Lemma 1, we define the independence capacity of S, denoted capind(S), by capind(S) = limm→∞ log max { |Φ(Γ̂)| : Γ̂∈Ŝm } |[m]| = inf m log max { |Φ(Γ̂)| : Γ̂∈Ŝm } |[m]| . (5.1) As already mentioned [38] defines the independence capacity of a 1-dimensional constraint, when Σ = {0, 1}, and calls it the “maximum insertion rate”. It is 51 5.2. Independence capacity shown that the maximum insertion rate of a 1-dimensional constraint S can be determined from a presentation Ĝ of Ŝ, when Σ = {0, 1}. The next theorem shows the generalization of this result to larger alphabets. Let G = ((V,E),L) be a labeled graph. A cycle (ei)`i=1⊆E is simple, if the vertices τ(e1), . . ., τ(e`) are distinct. Theorem 7. Let S be a 1-dimensional constraint over an alphabet Σ, and pick any presentation Ĝ = ((V,E),L) for Ŝ. Then capind(X) = max { log |Φ(ŵ)| |ŵ| : ŵ is generated by a simple cycle of Ĝ } . (5.2) Proof. Denote the RHS of (5.2) by ν∗. Let ŵ∗ be a word generated by a simple cycle of Ĝ such that ν∗ = (log |Φ(ŵ∗)|)/|ŵ∗|. Set ` = |ŵ∗|. For n∈N, clearly, the word ŵn∗∈Ŝn`, and (log |Φ(ŵn∗ )|)/(n`) = ν∗. It follows that ν∗≤(log max{|Φ(ŵ)| : ŵ∈Ŝn`})/(n`). Taking the limit as n→∞, we obtain ν∗≤capind(S). To complete the proof we show that capind(S)≤ν∗. We first claim that if ŵ∈Σ̂∗ is a word generated by a (possibly non-simple) cycle of Ĝ then ν∗≥ log |Φ(ŵ)||ŵ| . (5.3) This is easily proved by induction on |ŵ|. If |ŵ| = 1, then the cycle generating ŵ is simple and obviously (5.3) holds. For |ŵ|>1, let pi = (ei)`i=1 be a cycle of Ĝ generating ŵ. Obviously (5.3) holds if pi is simple. Otherwise, there exist integers 1≤j<k≤` such that τ(ej)=τ(ek). So both α = ej+1, . . ., ek and β = e1, . . ., ej , ek+1, . . ., e` are cycles in Ĝ. Let x̂, ŷ denote the words generated by α and β respectively. Then using the induction hypothesis on |x̂| and |ŷ|, we get log |Φ(ŵ)| |ŵ| = |x̂| |ŵ| log |Φ(x̂)| |x̂| + |ŷ| |ŵ| log |Φ(ŷ)| |ŷ| ≤ |x̂||ŵ|ν ∗ + |ŷ| |ŵ|ν ∗ = ν∗. Now, for n∈N, let ẑ(n)∈Ŝn be a word such that |Φ(ẑ(n))| = max{|Φ(ŵ)| : ŵ∈Ŝn}. Let (e(n)i )n−1i=0 be a path in Ĝ generating ẑ(n). Then using [37, Lemma 13] (and removing “0-length” cycles), it may be decomposed as follows. There exist an integer 0≤m≤|V | and 2m integers 0 ≤ s1≤t1 < s2≤t2 < . . . < sm≤tm < n 52 5.3. Independence capacity and axial products such that for each k = 1, . . .,m, (e(n)i ) tk i=sk is a cycle, and n−∑k(tk−sk+1)≤|V |. Set X = ⋃m k=1{sk, . . ., tk}. Using (5.3), we have log |Φ(ẑ(n))| n = m∑ k=1 log |Φ(L(e(n)sk ). . .L(e(n)tk ))| n + ∑ i∈[n]\X log |Φ(L(e(n)i ))| n ≤ m∑ k=1 ( log |Φ(L(e(n)sk ). . .L(e(n)tk ))| tk − sk + 1 · tk − sk + 1 n ) + |V | n log |Σ| ≤ ν∗ + |V | n log |Σ|. The result follows by taking the limit as n→∞ of both sides of the last inequality. We next show that the independence capacity of a constraint cannot exceed its capacity. Theorem 8. Let S be aD-dimensional constraint over Σ then capind(S)≤cap(S). Proof. For m∈ND, let ẑ(m)∈Ŝm be a configuration such that |Φ(ẑ(m))| = max{|Φ(ŵ)| : ŵ∈Ŝm}. Since, Φ(ẑ(m)) ⊆ Sm, we get (1/|[m]|) log |Φ(ẑ(m))| ≤ (1/|[m]|) log |Sm|. Taking the limit of both sides as m→∞, we obtain the result. 5.3 Independence capacity and axial products In this section we show a relation between the independence capacity of an ax- ial product of 1-dimensional constraints to the independence capacities of the 1- dimensional constraints. A similar relation holds for the (conventional) capacities. The relation is stated in the next theorem. Theorem 9. Let S(1), . . ., S(D) be 1-dimensional constraints over Σ. Then the following statements hold: 1. capind(S (1)⊗. . .⊗S(D))≤min i capind(S (i)). 2. cap(S(1)⊗. . .⊗S(D))≤min i cap(S(i)). 3. If S(1)=. . .=S(D)=S then capind(S ⊗D) = capind(S). 53 5.3. Independence capacity and axial products Remark 1. [37] proves Part 3 for binary 1-dimensional constraints. The proof uses the existence of a word ŵ∈Ŝ with capind(S) = log(|Φ(ŵ)|)/|ŵ| and ŵn∈Ŝ for all n∈N. The proof given here does not rely on the existence of such a word. Proof. Part 1. Denote by Ŝ the multi-choice constraint corresponding to S(1)⊗. . .⊗S(D). Fix i∈{1, . . .,D}. For k∈N, let mk = (m(1)k , . . .,m(D)k )⊆ND, be the D-tuple with m(j)k =1, for j∈{1, . . .,D}\{i}, and m(i)k =k. Every array in Ŝmk is essentially a word in Ŝ(i)k and vice versa. Hence, (log max{|Φ(ŵ)| : ŵ∈Ŝmk})/|[mk]| = (log max{|Φ(ŵ)| : ŵ∈Ŝ(i)k})/k. By (5.1) it follows that capind(S (1)⊗. . .⊗S(D))≤capind(S(i)), and the theorem follows. Part 2. The proof is similar to the proof of Part 1, so we omit it here. Part 3. Let T=S⊗D. By Part 1, it’s enough to show capind(T )≥capind(S). Let f :N→N be any function satisfying limi→∞(f(i)/i)=∞. For i∈N, let mi = (i, i, . . ., i, f(i))∈ND be the D-tuple with every entry but the last equal to i, and the last entry equal to f(i). Set `(i)=(D−1)(i−1)+f(i), and let ẑ(i)=ẑ (i) 0 . . .ẑ (i) `(i)−1∈Ŝ`(i) be a word such that |Φ(ẑ(i))| = max{|Φ(ŵ)| : ŵ∈Ŝ`(i)}. Define the D-dimensional array Γ̂(i)∈Σ̂mi by Γ̂ (i) j = ẑ (i) ψ(j), j∈[mi], where ψ : ZD → Z is given by ψ(j1, . . ., jD) = ∑ k jk. Observe that every row of Γ̂(i) is a (contiguous) sub-word of ẑ(i); it follows that Γ̂(i)∈T̂mi . Consequently, |Φ(Γ̂(i))| ≤ max{|Φ(Γ̂)| : Γ̂∈T̂mi}. (5.4) We next lower bound log |Φ(Γ̂(i))|. Set X = {j∈Z : (i−1)(D−1)≤j<f(i)}, and for Y⊆Z denote by ψ−1(Y ) = {j∈ZD : ψ(j)∈Y }. Then log |Φ(Γ̂(i))| = ∑ j∈[mi] log |Φ(Γ̂(i)j )| = ∑ k∈[`(i)] (∣∣ψ−1({k})∩[mi]∣∣ · log ∣∣∣Φ(ẑ(i)k )∣∣∣) ≥ iD−1 ∑ k∈X log |Φ(ẑ(i)k )| = iD−1 ∑ k∈[`(i)] log |Φ(ẑ(i)k )| − ∑ k∈[`(i)]\X log |Φ(ẑ(i)k )| ≥ iD−1 log |Φ(ẑ(i))| − 2(D−1)(i−1)iD−1 log |Σ|, 54 5.4. Independence capacity and limD→∞ cap(S⊗D) where we used the fact that for k∈X , it holds that ψ−1({k}) ∩ [mi] = {(j1, . . ., jD−1, k− ∑ t jt) : (j1, . . ., jD−1)∈[i]D−1}. Combining the last inequal- ity with (5.4), we have log max{|Φ(Γ̂)| : Γ̂∈T̂mi} |[mi]| ≥ iD−1 log |Φ(ẑ(i))| − 2(D−1)(i−1)iD−1 log |Σ| |[mi]| ≥ log |Φ(ẑ (i))| `(i) − 2(D−1)(i−1) log |Σ| f(i) . Taking the limit of both sides as i→∞, we obtain capind(T )≥capind(S). 5.4 Independence capacity and limD→∞cap(S⊗D) Combining the fact that cap(S⊗D) is non-increasing inDwith Theorem 8, we have cap(S⊗1)≥cap(S⊗2)≥. . .≥capind(S). We denote the limit limD→∞ cap(S⊗D) by cap∞(S). Obviously, cap∞(S)≥capind(S) and for all the constraints for which we know the value of cap∞, it turns out to be equal to capind. We list these constraints. 1. RLL(d, k) with k≤2d. For this family of constraints, capind turns out to be 0 [38]. In [20], it is shown that cap(∞) = 0 as well. 2. RLL(d,∞). For this family of constraints, capind turns out to be 1/(d+1) [38]. [36] shows that cap(RLL(d,∞)⊗D) = 1 d+1 +O ( log2(D(d+1)) D(d+1) ) . Additionally, for d = 1, [29] shows that, for sufficiently large D, 1 2 + 1 2 · 2−2D ≤ cap(RLL(1,∞)⊗D) ≤ 1 2 + 2o(D)2−2D. 3. The 3-checkerboard constraint CHK is defined over the alphabet {0, 1, 2}, and consists of all words in which every 2 adjacent symbols are distinct. The independence capacity of this constraint turns out to be 1/2, and [29] shows that cap∞(CHK) = 1/2 as well. The equality of cap∞ and capind was first noticed empirically by Chaichanavong and Poo in [37], where they ask if this is true for all 1-dimensional constraints. The following theorem gives a partial answer. 55 5.4. Independence capacity and limD→∞ cap(S⊗D) Theorem 10. Let S be a 1-dimensional constraint having finite memory m over Σ with capind(S) = 0. Then cap(S⊗(D+1)) ≤ m m+1 cap(S ⊗D ). In particular, cap∞(S)=0. To prove the theorem we need the following definitions on labeled graphs. For a labeled graph G = (V,E,L) and a subset U⊆V , we call the graph (U,EU ,L|EU ) where EU = {e∈E : σ(e)∈U, τ(e)∈U} the subgraph of G induced by U . For two vertices u, v∈V we say that u is reachable from v if there is a path in G that starts in u and ends in v. We write u G↔ v if u is reachable from v and v is reachable from u. If u G↔ v does not hold we write u 6 G↔ v. The relation G↔ is an equivalence relation on the vertices of G and the equivalence classes are called the irreducible components of G. For an irreducible component of G we shall sometime also call the subgraph of G that it induces an irreducible component. Obviously, a graph G is irreducible iff it has only one irreducible component. A labeled graph G has memory m, for some nonnegative integer m, if all paths of length m generating the same word terminate at the same vertex. Clearly, if G has memory m, then it also has memory n for any n>m. The following proposition relates memory of a constraint to memory of a graph. Proposition 7. If S is a 1-dimensional constraint with finite memory m, then there exists a presentation G of S with memory m. Proof. We construct the “follower-set graph of S”, G = ((V,E),L), as follows. V = {FS(x) : x∈S}, and for a vertex FS(x)∈V and symbol a ∈ FS(x)∩Σ, define δ(FS(x), a) = FS(xa). It’s easy to verify that δ(·, ·) is well-defined. The set of edges E is now given by E = {(u, a, δ(u, a)) : u∈V, a∈u∩Σ}, and for an edge e = (u, a, v)∈E, σ(e) = u, τ(e) = v and L(e) = a. Clearly, G is deterministic. The following fact is easily verified Fact 2. For every vertex FS(x)∈V , a word w is generated from FS(x) in G if and only if w∈FS(x) in which case the path generating it terminates at FS(xw). As FS(ε) = S it follows from this fact that G is a presentation of S. We show that G has memory m. Let w∈S with |w| = m be generated by some path in G starting from a vertex FS(x), then by the above fact, the path terminates at FS(xw). We claim that FS(xw)=FS(w). If x = ε this is obviously true, so assume x 6= ε. Clearly, FS(xw)⊆FS(w). On the other hand, let y∈FS(w). Note that by our assumption, |xwy|>m and every contiguous sub-word, with length m+1, of xwy is a contiguous sub-word of xw or a contiguous sub-word of wy and 56 5.4. Independence capacity and limD→∞ cap(S⊗D) therefore satisfies S. Since S has memory m, this implies that xwy satisfies S as well, and hence y∈FS(xw). Thus FS(w) = FS(xw) as claimed. Therefore all paths generating w terminate at FS(w) and it follows that G has memory m. Proof of Theorem 10. Let G = ((V,E),L) be a presentation of S with memory m, and for a word x∈S with |x|≥m, denote by v(x) the terminal state of any path generating x in G. We will need the next two lemmas. The first shows that if capind(S) = 0, knowledge of long enough prefixes and suffixes of a word in S is often sufficient to determine the middle of the word. Lemma 4. Let x, y∈S be words of length m such that v(x) G↔v(y). Then there is at most one word of the form xay, where a∈Σ, in S. Proof. Assume to the contrary that there are two such words xay, xby∈S where a, b∈Σ and a6=b. Therefore there are two paths in G, (ei)2m+1i=1 , (fi)2m+1i=1 ⊆E gener- ating xay and xby respectively. Since the paths (ei)mi=1 and (fi) m i=1 both generate x, it follows that σ(em+1) = σ(fm+1) = v(x). Similarly, since both paths (ei)2m+1i=m+2 and (fi)2m+1i=m+2 generate y, it follows that τ(e2m+1) = τ(f2m+1) = v(y). There- fore the paths (ei)2m+1i=m+1, (fi) 2m+1 i=m+1 both start at v(x), both end at v(y), and gen- erate ay and by respectively. Since, by our assumption, v(x) and v(y) are in the same irreducible component, there is a path generating some word z∈Σ∗ starting at v(y) and ending at v(x). Concatenating this path to the end of (ei)2m+1i=m+1 and (fi) 2m+1 i=m+1 we obtain two cycles—both starting and ending at v(x)—one generat- ing ayz and the other generating byz. Consequently, for all k∈N and sequence (ci) k i=1⊆{a, b}, there is a cycle in G generating the word c1yzc2yz. . .ckyz. Let |z| = `, y = y0. . .ym−1 and z = z0. . .z`−1, where yi, zj∈Σ for i∈[m], j∈[`], and denote by ŷ, ẑ the words over Σ̂ given by ŷ = {y1}. . .{ym−1} and ẑ = {z1}. . .{z`−1}. It follows that for all k∈N, ({a, b}ŷẑ)k∈Ŝ. But this easily im- plies that capind(S)≥1/(m+`+1), contradicting our assumption. Let CG denote the number of irreducible components of G. The next lemma bounds the number of appearances of words of the form xay in a certain word of S, where a∈Σ, |x| = |y| = m and v(x) 6 G↔ v(y). Lemma 5. For `∈N and word z∈S`(m+1)+m, let y(0), . . ., y(`)∈Σm and a(0). . ., a(`−1)∈Σ be defined by z = y(0)a(0)y(1)a(1). . .y(`−1)a(`−1)y(`). Then ∣∣∣{i∈[`] : v(y(i))6 G↔v(y(i+1))}∣∣∣ < CG . 57 5.4. Independence capacity and limD→∞ cap(S⊗D) Proof. Assume to the contrary that there are 0≤i1<i2<. . .<iCG<` such that v(y(ij)) 6 G↔v(y(ij+1)), for every j∈{1, . . ., CG}. Then the sequence v(y(i1)), v(y(i2)), . . ., v(y(iCG )), v(y(iCG+1)) has CG+1 vertices and therefore contains two belonging to the same irreducible component, say v(y(s)) and v(y(t)), for some integers 0≤s<t≤`, with s∈{i1, . . ., iCG}. Since the word y(s)a(s)y(s+1)∈S, it follows that v(y(s+1)) is reachable from v(y(s)). Similarly, since y(s+1)a(s+1)y(s+2)a(s+2). . .a(t−1)y(t)∈S, it holds that v(y(t)) is reachable from v(y(s+1)). But since v(y(t)) is in the same irreducible component as v(y(s)), it follows that v(y(s)) is reachable from v(y(s+1)) as well. Thus v(y(s)) G↔v(y(s+1)) which contradicts s∈{i1, . . ., iCG}. We can now prove the theorem. This part of the proof is a generalization of [20, Lemma 3]. Figure 5.1 illustrates the proof. Let ` be a positive integer. Denote by `∈ND the D-tuple with every entry equal to `, and let m`∈ND+1 be given by m`=(`(m+1)+m, `, `, . . . , `). Set T = S⊗(D+1). We will give an upper bound on |Tm` |. Let X be the set [`(m+1)+m]\{i(m+1)+m : i∈[`]} and Y the Cartesian productX×[̀ ]. For an array Γ∈Tm` let Γ|Y : Y → Σ denote the mapping given by Γ|Y(x) = Γx. Let BY = {Γ|Y : Γ∈Tm`} denote the set of all such mappings. For a mapping ∆∈BY we define the setZ(∆)⊆Tm` byZ(∆) = {Γ∈Tm` : Γ|Y = ∆}. Clearly, ⋃ ∆∈BY Z(∆) = Tm` . (5.5) Let ∆∈BY , and fix j∈[̀ ]. For i∈[`+1], let y(i,j,∆)∈Σm be the word given by y(i,j,∆) = ∆(i(m+ 1), j)∆(i(m+ 1) + 1, j). . .∆(i(m+ 1) +m− 1, j) and for Γ∈Z(∆) let w(Γ,j)∈Σ`(m+1)+m be the word given by w(Γ,j) = Γ(0,j)Γ(1,j). . .Γ(`(m+1)+m−1,j). Note that for such Γ, since Γ∈Tm` , w(Γ,j)∈S, and, since Γ|Y = ∆, we may write w(Γ,j) = y(0,j,∆)a(0,j,Γ)y(1,j,∆)a(1,j,Γ). . .y(`−1,j,∆)a(`−1,j,Γ)y(`,j,∆), where a(i,j,Γ) = Γ(i(m+1)+m,j) for i∈[`]. Now, if i∈[`] such that v(y(i,j,∆)) G↔v(y(i+1,j,∆)) then, by Lemma 4, all Γ∈Z(∆) have the same a(i,j,Γ). On the other hand, by Lemma 5, since Z(∆) 6= ∅, it holds that |{i∈[`] : v(y(i,j,∆)) 6 G↔v(y(i+1,j,∆))}|<CG . It follows that |{w(Γ,j) : Γ∈Z(∆)}|≤|Σ|CG , and consequently |Z(∆)| ≤ ∏ j∈[̀ ] |{w(Γ,j) : Γ∈Z(∆)}|≤|Σ|CG`D . (5.6) 58 5.4. Independence capacity and limD→∞ cap(S⊗D) Since for any ∆∈BY , and i∈X , theD-dimensional array Λ(∆, i)∈Σ` with entries given by (Λ(∆, i))j = ∆(i, j) is clearly in S⊗D, it follows that |BY |≤|(S⊗D)`||X|. Combining this with (5.6) and (5.5), we have |Tm` | = ∑ ∆∈BY |Z(∆)| ≤ |Σ|CG`D |(S⊗D)`|(`+1)m. Taking the logarithm of both sides and dividing by |[m`]|, we obtain log |Tm` | |[m`]| ≤ (`+ 1)m `(m+ 1) +m log |(S⊗D)`| `D + CG log |Σ| `(m+ 1) +m The theorem follows by taking the limit as `→∞. 59 5.4. Independence capacity and limD→∞ cap(S⊗D) y(0,j,∆) y(1,j,∆) y(`−1,j,∆) y(`,j,∆) a(0,j,Γ) a(1,j,Γ) a(`−1,j,Γ) m ` j Figure 5.1: Proof of Theorem 10. As an application of Theorem 10, consider the the family of multiple-spaced runlength constraints. Each of these constraints is denoted RLL(d, k, s) with d,k and s nonnegative integers and d ≤ k. It is defined over the alphabet {0, 1} and consists of all words in RLL(d, k) for which the length of each run of ‘0’s delimited by ‘1’s on both ends is a multiple of s. Fix such integers d, k, and s with s ≥ 2, and let S = RLL(d, k, s). It can be verified that each word of Ŝ has at most two 60 5.5. Open questions letters equal to {0, 1}, and it follows that capind(S) = 0. As the memory of S is (at most) k, by Theorem 10, we have the following corollary. Corollary 3. Let d, k, s be nonnegative integers such that d≤k and s≥2. Then cap(RLL(d, k, s)⊗D) = ( k k + 1 )D−1 ·cap(RLL(d, k, s)). 5.5 Open questions Is it true that cap∞(S) = capind(S) for every 1-dimensional constraint S? For a 1-dimensional constraint S, what can be said about the rate of convergence of cap(S⊗D) to cap∞(S)? Finally, is Ŝ a D-dimensional constraint, for every D- dimensional constraint S, when D≥2? 61 Chapter 6 The tradeoff function for binary 1-dimensional constraints∗ This chapter deals with the tradeoff function for 1-dimensional binary constraints. As mentioned in Chapter 1, the tradeoff function for a 1-dimensional constraint evaluated at 0 equals the constraint’s capacity and thus is a more general notion than capacity. The motivation for defining this function comes from the applica- tion of 1-dimensional constraints in digital storage systems. We describe the mo- tivation in more detail in the next section. Later on, we give the precise definition of the tradeoff function for a general 1-dimensional binary constraint. The rest of the chapter shows our computation of this function for two families of RLL(d, k) constraints. 6.1 A brief overview of digital recording In digital storage systems, user data is written to the device in the form of a bi- nary sequence. Typically, not every binary sequence may be written reliably to the device and therefore, only a subset of all possible sequences is “allowed” to be writ- ten. The set of all the “allowed” sequences is usually modeled as a 1-dimensional binary constraint. In practice, limiting the written sequence to this “allowed” set is often not sufficient to guarantee the required reliability, and an error-correcting- code or ECC is used as well. Consequently, user data (represented as an arbitrary stream of ‘0’s and ‘1’s) is encoded twice before written to media. First the data is encoded to a codeword of an error-correcting-code or ECC and then the resultant codeword is encoded to an “allowed sequence” of some 1-dimensional binary con- straint. In this context, the constraint is sometimes called a “modulation code” and the encoding of the ECC codeword to a constrained sequence is known as “mod- ulation encoding”. When reading back the data the process is reversed: the data read from the device is first decoded by the modulation code decoder and then the ECC decoder is used to recover the source data, attempting to correct any errors ∗A version of this chapter has been submitted for publication. Louidor, E. (2010) The Tradeoff Function for a Class of RLL(d, k) Constraints. 62 6.1. A brief overview of digital recording that may have occurred when the data was read. Roughly speaking, the rate of a modulation encoder is the ratio between the length of its input to the length of its output; it is typically strictly smaller than 1. In storage systems it is desirable that the rate be as high as possible to maximize storage. This scheme suffers from a couple of disadvantages. First, a small number of errors that are present after reading the data from the device may turn into a burst of errors at the output of the modulation decoder, which may overwhelm the ECC decoder. Second, since the modulation code decoder is typically a “hard” decoder—meaning that it outputs “hard” ‘0’s or ‘1’s rather than probabilities or likelihoods—any soft or probabilistic information that might have been available after reading the data from the device is not readily available to the ECC decoder, thereby limiting its error correction capability. In [40], [5], [38] and the references therein, several encoding schemes are pro- posed to overcome these disadvantages. Here, we focus on one of these schemes, in which the order of the two encodings mentioned above is reversed. The source data is first encoded with a modulation code into a constrained sequence, but instead of using the original constraint, we encode it to a sequence of the multi-choice con- straint corresponding to the original constraint, where we use 0, 1, and in place of the symbols {0}, {1}, and {0, 1} of Σ̂, respectively. The entries containing the ‘’s are “unconstrained” in the sense that replacing (or “filling”) them with any values in {0, 1} independently would result in a sequence satisfying the constraint. Next, a systematic ECC with a suitable redundancy is applied, placing the redun- dancy (parity-check) bits in these unconstrained positions. Clearly, this addresses both of the disadvantages listed above. In this scheme, since the error correction capability of the ECC depends on the number of redundancy bits, it is desirable that the number of unconstrained positions be as large as possible. On the other hand, increasing the number of unconstrained positions at the output of the modulation encoder naturally reduces its rate, as no user information is encoded in the unconstrained positions. In [38] the authors study the tradeoff function that defines for a given “density” of un- constrained positions, called the insertion-rate, the maximum overall rate of the encoding; knowing this function is obviously important to the design of efficient digital storage systems employing this scheme. Currently, there are only very few constraints for which the tradeoff function has been computed explicitly. As mentioned in Chapter 1, the RLL(d, k) constraint is widely used in digital storage systems employing optical or magnetic recording. Another constraint used in practice is the maximum transition run or MTR constraint, denoted MTR(j, k) for some nonnegative integers j, k. This constraint consists of all binary sequences in which the length of each run of ‘1’s is at most j and the length of each run of ‘0’s is at most k. More details on these constraints as well as other constraints used in 63 6.2. Previous work practice may be found in [18] and [32]. We give a precise definition of the tradeoff function in Section 6.3. 6.2 Previous work In [5], the tradeoff functions for RLL(0, 1) and RLL(0, 2) are determined. In [37], Poo computed the tradeoff function for RLL(0, 3) for insertion rates between 0 and 1/4, and the tradeoff function for RLL(d, 2d + 1) for any d. In [38] the authors compute the tradeoff function for MTR(2, 2). For completeness, we present these functions in Theorem 11. Lower bounds on the tradeoff function for RLL(0, k) are given in [19] and [21]. In this chapter, we determine the tradeoff functions for two other families of constraints: RLL(d, 2d+2), and RLL(d,∞). Our results are stated precisely in Theorems 12 and 13. For RLL(d, 2d+2), we find a curious dichotomy in the shape of the tradeoff function between different ranges of values of d. The function is always piecewise linear; yet it consists of 2 linear “segments” for 1≤d≤16 and 3 segments for d≥17. This chapter is organized as follows. In Section 6.3 we define the tradeoff function and related concepts as well as summarize some of its known properties. We also state the previously known tradeoff functions and our new results. In Section 6.4 we show the derivation of the tradeoff function for RLL(d,∞) and in Section 6.5 we show the derivation of the tradeoff function for RLL(d, 2d+2). 6.3 Background and definitions For the rest of this chapter, fix Σ = {0, 1}. Let S be a 1-dimensional constraint over Σ and Σ̂, Φ, Ŝ and ĜFS be as defined in Chapter 5. As already stated, in this chapter we use ‘0’, ‘1’ and ‘’ in place of {0}, {1} and {0, 1}, respectively. So Σ̂ = {0, 1,}, and Φ(x), for x∈Σ̂∗, can be thought of as the set of all possible “fillings” of the ‘’s of x with bits. We also sometimes omit the subscript FS from ĜFS to simplify notation. For a word w ∈ Σ̂∗, let #(w) denote the number of ‘’s in w. Observe that log |Φ(w)| = #(w) and hence capind(S) = limm→∞ max{#(w) : w∈Ŝm} m ; so capind(S) is the asymptotic maximum density of ’‘s in words of Ŝ. In this chapter we use the following notation for sequences. For a set T and nonnega- tive integer n we denote a sequence b1, . . ., bn of n elements of T by (bi), and index its elements by {1, 2, . . ., n}. We refer to n as the length of the sequence 64 6.3. Background and definitions and denote it by |(bi)|. We abuse notation and write (bi)⊆T to mean that bi∈T for all i∈{1, 2, . . ., |(bi)|}. For two sequences (bi), (di)⊆T we denote by (bi)(di) the sequence formed by concatenating the sequences (bi) and (di), that is the se- quence b1, b2, . . ., b|(bi)|, d1, d2, . . ., d|(di)|. For a nonnegative integer m, the nota- tion (bi)m is used to denote the sequence formed by concatenating (bi) to itself m times (as usual (bi)0 is the empty sequence). We shall also consider infinite se- quences b1, b2, . . . with elements in T and denote such sequences by (bi)∞i=1. For a sequence (Mi) of m real nonnegative square matrices all having the same size, we write λ((Mi)) to mean λ( ∏ iMi). Let S be a constraint over Σ. For a positive integer n and a subset I⊆[n] we defineM(I, n) =MS(I, n) = |{w0w1. . .wn−1∈Ŝn : ∀i∈[n], wi= ⇔ i∈I}|. Also, for any real number ρ∈[0, 1], define the set Iρ by Iρ = { (Ij) ∞ j=1 : Ij⊆[j] for all j, and lim j→∞ |Ij | j =ρ } . Then the tradeoff function of S, fS : [0, 1]→[0, 1]∪{−∞} is given by fS(ρ) = sup (Ij)∈Iρ lim sup j→∞ logMS(Ij , j) j . A 1-dimensional constraint is irreducible if it has an irreducible presenta- tion. A graph (labeled graph) is called trivial if it has exactly one vertex and no edges. Every RLL(d, k) constraint is irreducible. In [38] it is shown that if S is an irreducible finite-type constraint (that has infinitely many words), then ĜFS has exactly one non-trivial irreducible component. Here, we denote this com- ponent by Ĝ? = (V̂?, Ê?, L̂?). For a subset Q⊆Σ̂, let ÊQ? denote the subset of Ê? consisting of the edges whose label is in Q. We denote by Ĝ{0,1}? the sub- graph of Ĝ? given by Ĝ{0,1}? =(V̂?, Ê{0,1}? , L̂?|Ê{0,1}? ) and by Ĝ {} ? the subgraph of Ĝ? given by Ĝ{}? =(V̂?, Ê{}? , L̂?|Ê{}? ). We define A {0,1}(S) = A(Ĝ{0,1}? ) and A{}(S) = A(Ĝ{}? ). Let (Mi)⊆{A{0,1}(S), A{}(S)} be a sequence of length n. We say that a path (ei)⊆Ê? of Ĝ? matches (Mi) if it has length n, and for every 1≤i≤n, L̂?(ei)= iff Mi=A{}(S). Note that for any s, t∈V̂?, the entry ( ∏ iMi)(s,t) is the number of paths in Ĝ? starting at s, ending at t and matching (Mi). For a finite sequence (Mi)⊆{A{0,1}(S), A{}(S)}, we denote by %((Mi)) the density of A{}(S) in (Mi), namely %((Mi)) = ∣∣{1≤i≤|(Mi)| : Mi=A{}(S)}∣∣ |(Mi)| . Let S be a 1-dimensional constraint over Σ. We list the following known facts about the tradeoff function fS . 65 6.3. Background and definitions • For 0≤ρ≤capind(S), fS(ρ)≥0, and for capind(S)<ρ≤1, fS(ρ) = −∞. • fS(0) = cap(S). • fS is decreasing in [0, capind(S)]. Moreover, for any 0≤ρ1<ρ2≤capind(S), fS(ρ1)−fS(ρ2)≥ρ2−ρ1. • fS is left-continuous in [0, capind(S)]. • If S is irreducible and finite-type then fS is concave and continuous in [0, capind(S)]. Furthermore, for all rational ρ∈[0, 1] fS(ρ) = sup (Mi) log λ((Mi)) |(Mi)| , (6.1) where the sup is taken over all sequences (Mi)⊆{A{0,1}(S), A{}(S)}with %((Mi))=ρ. • For integers 0≤d≤k capind(RLL(d, k)) = b(k−d)/(d+1)c b(k+1)/(d+1)c(d+1) , (6.2) and capind(RLL(d,∞)) = 1 d+1 . (6.3) See [38] for proofs. As mentioned in the introduction, there are a few constraints for which the tradeoff function has been computed explicitly. These are summarized in the next theorem, along with the references to the respective papers. For a finite se- quence (xi)⊆R2 of points such that xi = (xi, yi) and x1<x2<. . .<xn, we define L(xi) : [x1, xn]→R to be the function whose graph is the piecewise linear curve connecting these points in sequence; namely, the function that satisfies L(xi)(x) = yi+1−yi xi+1−xi (x−xi)+yi , xi≤x≤xi+1, for all 1≤i<n. Theorem 11. Let S(1) = RLL(0, 1), S(2) = RLL(0, 2), S(3) = RLL(0, 3), S(4) = RLL(d, 2d+1), and S(5) = MTR(2, 2). Let fi be the function fS(i) re- stricted to [0, capind(S (i))]. Then the following statements hold: 1. f1=L(0,cap(S(1))),( 1 2 ,0) (shown in [5]). 66 6.3. Background and definitions 2. f2=L(0,cap(S(2))),( 1 3 , 2 3 cap(S(1))),( 2 3 ,0) (shown in [5]). 3. f3(ρ)=L(0,cap(S(3))),( 1 4 , 3 4 cap(S(2)))(ρ), for 0≤ρ≤14 (shown in [37]). 4. f4=L(0,cap(S(4))),( 1 2(d+1) ,0) (shown in [37]). 5. f5=L(0,cap(S(5))),( 1 3 ,0) (shown in [38]). We now state our new results. Theorem 12. Let d be a nonnegative integer and S = RLL(d,∞). Set p1 = (0, cap(S)) , p2 = ( 1 d+1 , 0 ) = (capind(S), 0). Then fS(ρ) = Lp1,p2(ρ), for ρ∈[0, capind(S)]. Theorem 13. Let d be a positive integer and S = RLL(d, 2d+ 2). Set p1 = (0, cap(S)) , p2 = ( 3 6d+8 , log 3 6d+8 ) , p3 = ( 2 4d+5 , 1 4d+5 ) , p4 = ( 2 4d+4 , 0 ) = (capind(S), 0) . Then the following statements hold: 1. If 1≤d≤16 then fS(ρ) = Lp1,p3,p4(ρ), for ρ∈[0, capind(S)]. 2. If 17≤d then fS(ρ) = Lp1,p2,p3,p4(ρ), for ρ∈[0, capind(S)]. The proofs are given in the next sections. The graphs of the tradeoff functions for RLL(d,∞) and RLL(d, 2d+2) for ρ∈[0, capind(S)] are sketched in Figure 6.1. We need some properties of nonnegative matrices which we summarize here. If (M ′i) is formed by cyclically shifting (Mi) (that is, there exists an integer o such that for all i,M ′i=M((i−1+o) modm)+1, where for an integer j, jmodm denotes the unique integer k∈[m] such that k≡j ( mod m)) then ∏iMi and ∏iM ′i have the same characteristic polynomial ( [33, 2.15.15]); in particular, λ((M ′i))=λ((Mi)). If M and N are two nonnegative square matrices with M≤N then λ(M)≤λ(N) 67 6.3. Background and definitions ( [33, 5.7.5]). The support graph of an m×m nonnegative matrix M denoted GM = (VM , EM ) is the directed graph with vertices VM = {1, 2, . . .,m} and edges EM={(i, j)∈VM×VM : Mi,j>0}, where for an edge e = (i, j)∈EM , σ(e) = i and τ(e) = j. Such a matrix is called primitive if its support graph is primitive. For a primitive matrix M , the limit limg→∞(Mg)/(λ(M)g) (where the limit is taken entry-wise), exists and is strictly positive in each entry ( [33, 5.9.7]). fS(ρ) ρ 0 cap(S) 1 d+1 (a) fS(ρ) ρ 0 cap(S) ( 2 4d+5 , 1 4d+5 ) 1 2d+2 (b) fS(ρ) ρ 0 cap(S) ( 3 6d+8 , log 3 6d+8 )( 2 4d+5 , 1 4d+5 ) 1 2d+2 (c) Figure 6.1: The graphs of fS(ρ) for ρ∈[0, capind(S)] (not to scale): (a) S=RLL(d,∞); (b) S=RLL(d, 2d+2) and 1≤d≤16; (c) S=RLL(d, 2d+2) and 17≤d. 68 6.4. Proof of Theorem 12 6.4 Proof of Theorem 12 The proof is similar to the proof of [37, Proposition 45]. Set Cd = cap(S), and let Ĝ=ĜFS=(V̂ , Ê, L̂) be the presentation of Ŝ defined in Section 6.3. Note that S is irreducible and of finite-type. We denote by ai the follower set FS(10i) for 0≤i≤d. It can be verified that V̂ = {a0, a1, . . ., ad}. The graph Ĝ is given in Figure 6.2. a0 a1 . . . ad 0 0 0 0 1 Figure 6.2: The graph ĜFRLL(d,∞) . Clearly, Ĝ? = ĜFS in this case. Let A = A{0,1}(S), B = A{}(S), and set C=BAd and hd = Lp1,p2 . For a sequence (Ni)⊆{A,C} we denote by ε((Ni)) the “expanded” sequence, with elements in {A,B}, formed by substituting the sequence (B)(A)d for every element C in (Ni). It follows from (6.3) and the rest of the discussion in Section 6.3 that fS(0) = hd(0) and fS(ρ)≥hd(ρ) for ρ∈(0, capind(S)]. Hence it’s enough to show fS(ρ)≤hd(ρ) for ρ∈(0, capind(S)]. Since both fS and hd are continuous in that interval, it’s enough to show the latter inequality for all rational ρ∈(0, capind(S)]. Let ρ be such a rational. By (6.1) it suffices to show that for any sequence (Mi)⊆{A,B} with %((Mi))=ρ, we have log λ((Mi)) |(Mi)| ≤ hd(%((Mi))). (6.4) Let (Mi) be such a sequence. Set m=|(Mi)| and consider the sequence (Xi)=(Mi) 2. Note that in any path of Ĝ the number of edges between a pair of edges labelled with a ‘’ must be at least d. It follows that if there exist integers 1≤i<j≤2m, with j−i−1<d, such thatXi=Xj=B, then no paths of Ĝ match (Xi) or equivalently ( ∏m i=1Mi) 2=0. The latter equality implies λ((Mi))=0, and there- fore (6.4) holds. So assume no such integers exist. It can be verified that in this case we may cyclically shift (Mi) such that (Mi)=ε((Ni)) for some (Ni)⊆{A,C}with N1=C; such a cyclic shift does not change either side of (6.4). We denote |(Ni)| 69 6.4. Proof of Theorem 12 by n. Now, it’s easy to verify that there is exactly one path matching (B)(A)d and it starts and ends at ad; hence the only nonzero entry of C is (C)(ad,ad) and it is equal to 1. It follows that Ck = C, for k = 1, 2, . . . (6.5) CQC ≤ C2Q, for any nonnegative |V̂ |×|V̂ | matrix Q. (6.6) Let s be the number of elements in (Ni) equal to C; clearly, s=mρ>0. Let (N ′i)=(C) s(A)n−s; then by (6.5) and (6.6), we have λ((Mi))=λ((Ni))≤λ((N ′i))=λ(CAn−s). We order the entries of (CAn−s) as follows. For every i, j∈[d] we place the element (CAn−s)(ai,aj) in the ith row and jth column. Observe, that using this ordering, the matrix CAn−s is lower triangular with (CAn−s)(i,i)=0 for 0≤i<d. It follows that λ(CAn−s)=(CAn−s)(ad,ad) = (A n−s)(ad,ad). Let Γ be the set of all paths in Ĝ starting and terminating in ad and matching (A)n−s, and for a nonnegative integer g define the set ∆(g)⊆S by ∆(g) = { w∈Sg : w does not end with 10i, 0≤i≤d−1 } . Then it’s not hard to check that ∆(n−s) is precisely the set of words gener- ated by paths in Γ. Since Ĝ is deterministic, we have that |Γ|=|∆(n−s)|, so λ((N ′i))=|∆(n−s)|. Now, observe that for g≥d, any word in Sg−d can be ex- tended to a word in ∆(g) by adding ‘0’s; thus for all g≥d, |Sg−d|≤|∆(g)|≤|Sg|. It follows that lim g→∞ log |∆(g)| g =Cd. On the other hand, note that for any nonnegative integers g1, g2 and words w∈∆(g1), x∈∆(g2), the word wx∈∆(g1 + g2); it follows that log |∆(·)| is su- peradditive, and therefore by Lemma 1 we have lim g→∞ log |∆(g)| g = sup g≥1 log |∆(g)| g . In particular, log |∆(n− s)| n− s ≤Cd, 70 6.5. Proof of Theorem 13 and therefore, log λ((Mi)) |(Mi)| ≤ log λ((N ′i)) m = log |∆(n−s)| n−s n−s m ≤ Cdn−s m =Cdm− s(d+ 1) m = Cd (1− ρ(d+ 1)) = hd(ρ). This completes the proof. 6.5 Proof of Theorem 13 In this section we prove Theorem 13. As the proof is rather involved, we show an outline of the proof in Section 6.5.1, relying on several propositions whose proofs we defer to Section 6.5.2. Throughout this section, S, p1, . . .,p4 are as defined in the statement of the theorem, and we set Cd = cap(RLL(d, 2d+2)). Note that S is irreducible and of finite-type; let Ĝ? = (V̂?, Ê?, L̂?) denote the unique non-trivial component of ĜFS . Then it can be verified that Ĝ? is the subgraph of Ĝ induced by V̂?, where V̂? = {FS(10i) : 0≤i≤2d+2}∪{ FS(10i) ∩ FS(10i+d+1) : 0≤i≤d−1 } ∪{ FS(10i) ∩ FS(10i+d+2) : 0≤i≤d−1 } . For the purpose of this proof we use the abbreviations ai = FS(10i), 0≤i≤2d+2, bi = FS(10i) ∩ FS(10i+d+1), 0≤i≤d−1, and ci = FS(10i) ∩ FS(10i+d+2), 0≤i≤d−1. The graph Ĝ? is shown in Figure 6.3. Let A = A{0,1}(S), B = A{}(S) and set C = AdBAd+1; we index the entries of C by V̂ 2? . For a sequence of matrices (Mi)⊆{A,C} we denote by ε((Mi)) the sequence with elements in {A,B} formed by substituting the sequence (A)d(B)(A)d+1 for every element C in (Mi). Let hd : [0, capind(S)]→[0, 1] be given by hd = { Lp1,p3,p4 if 1≤d≤16 Lp1,p2,p3,p4 if 17≤d . 71 6.5. Proof of Theorem 13 a0 a1 . . . ad ad+1 . . . a2d a2d+1 a2d+2 b0 . . . bd−1 c0 . . . cd−1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11111 Figure 6.3: The non-trivial component of ĜFRLL(d,2d+2) . 6.5.1 Outline of proof Let i, j∈V̂?. Since there are only 3 paths in Ĝ? matching (A)d(B)(A)d+1: a path starting at vertex a0 and ending at vertex a2d+2, a path starting at vertex a0 and ending at vertex a0 and a path starting at vertex a1 and ending at vertex a0, it follows that the entries of C and C2 are given by C(i,j) = 1 if i = a0 and j∈{a0, a2d+2} 1 if i = a1 and j = a0 0 otherwise , i, j∈V, (6.7) (C2)(i,j) = { 1 if i∈{a0, a1} and j∈{a0, a2d+2} 0 otherwise , i, j∈V. (6.8) The following facts easily follow: Fact 3. C2 ≥ C. Fact 4. For all integers k≥2, Ck = C2. Fact 5. C2 = cr, where c and r are the column and row vectors, respectively, of size |V̂?| with entries indexed by V̂? and given by (c)i = { 1 if i ∈ {a0, a1} 0 otherwise (r)i = { 1 if i ∈ {a0, a2d+2} 0 otherwise , i∈V̂?. 72 6.5. Proof of Theorem 13 Hence C2 is a (nonnegative) rank-1 matrix. The following proposition shows how to compute the Perron eigenvalue of such a matrix. Proposition 8. Let M be an m×m real nonnegative matrix of the form M = ab with a and b column and row vectors of size m, respectively. Then the following statements hold: 1. For any real nonnegative m×m matrix N , λ(MN) = bNa. 2. For any real nonnegative m×m matrices N1, N2, λ(MN1MN2)=λ(MN1)λ(MN2). Now, observe, that by (6.2), capind(S)=1/(2d+2); hence it follows from the discussion in Section 6.3 that fS(0)=hd(0), (6.9) and fS(1/(2d+2)) ≥ hd(1/(2d+2)). (6.10) So it’s enough to show fS(ρ)=hd(ρ) for all ρ∈(0, capind(S)]. The following proposition characterizes hd. Proposition 9. hd : [0, 1/(2d+ 2)]→ R is the smallest function satisfying: 1. hd is concave. 2. hd(0)≥Cd. 3. hd(3/(6d+ 8))≥ log 3/(6d+ 8) 4. hd(2/(4d+ 5))≥1/(4d+ 5) 5. hd(1/(2d+ 2))≥0 We first show that for every ρ∈[0, capind(S)], fS(ρ)≥hd(ρ). Consider the two sequences of matrices (Yi)=ε((C,C,A)) and (Zi)=ε((C,C,A,C,A)). Clearly, %((Yi)) = 2 4d+5 , (6.11) %((Zi)) = 3 6d+8 . (6.12) 73 6.5. Proof of Theorem 13 Now, let c, r be the vectors defined in Fact 5; by Proposition 8 and Fact 5, λ((Yi)) = λ(C 2A) = rAc = ∑ i∈{a0,a2d+2}, j∈{a0,a1} A(i,j) = 2, (6.13) and λ((Zi)) = λ(C 2ACA) = rACAc. Observe that for i∈V̂?, the entry (rA)i is the number of paths of length 1 in Ĝ{0,1}? that begin in either a0 or a2d+2 and end at i. It follows that rA = ct (where ct is the transpose of c). Using (6.7), we have λ((Zi)) = c tCAc = 2A(a0,a0) + 2A(a0,a1) +A(a2d+2,a0) +A(a2d+2,a1) = 3. (6.14) By (6.1), for any sequence (Xi)⊆{A,B}, fS(%((Xi)))≥λ((Xi))/|(Xi)|; hence by (6.11), (6.12), (6.13), and (6.14) above, we get fS(2/(4d+5)) ≥ 1/(4d+5), (6.15) fS(3/(6d+8)) ≥ (log 3)/(6d+8). (6.16) Since fS is concave in [0, capind(S)], equality (6.9), and inequalities (6.10), (6.15), and (6.16) imply that fS(ρ)≥hd(ρ) for all ρ∈[0, capind(S)]. In the remainder of this section we show fS(ρ)≤hd(ρ) for ρ∈(0, capind(S)]. Since both fS and hd are continuous in this interval, it’s enough to show fs(ρ)≤hd(ρ) for all rational ρ∈[0, capind(S)]. Let ρ be such a rational. By (6.1), it’s enough to show that for all finite sequences (Mi)⊆{A,B} with %((Mi))=ρ, we have log λ((Mi)) |(Mi)| ≤ hd(%((Mi))). (6.17) Let (Mi) be such a sequence. Set n=|(Mi)| and consider the sequence (Xi)=(Mi) 2. Note that in any path of Ĝ? the number of edges between a pair of consecutive edges labelled with a ‘’ must be at least 2d+1. It follows that if there exist nonnegative integers 1≤i<j≤2n, with j−i−1<2d+1, such thatXi=Xj=B, then no paths of Ĝ? match (Xi) or equivalently ( ∏n i=1Mi) 2=0. This implies 74 6.5. Proof of Theorem 13 λ((Mi)) = 0, and therefore (6.17) holds. So assume no such integers exist. It can be verified that in this case we may cyclically shift (Mi) such that (Mi)=ε((Ni)) for some (Ni)⊆{A,C}; such a cyclic shift does not change either side of (6.17). Since we assumed ρ>0, the sequence (Ni) must have at least one element equal to C. If every element of (Ni) is equal to C, then ρ = capind(S) and λ((Ni)) = λ(C |(Ni)|) = ( λ(C2) )|(Ni)|/2 = (rc)|(Ni)|/2 = 1, where c, r are the vectors defined in Fact 5 and we used Proposition 8. There- fore (6.17) holds with equality in this case. So we assume (Ni) has an element equal to C and an element equal to A. By cyclically shifting (Mi) and (Ni), if necessary, we may assume (Ni) is either of the form (C)s1(A)g1(C)s2(A)g2 . . .(C)sk(A)gk , k≥1, g1, . . ., gk≥1, s1≥2, and s2, . . ., sk≥1, (6.18) or the form (C)(A)g1(C)(A)g2 . . .(C)(A)gk , k≥1 and g1, . . ., gk≥1. (6.19) We claim it’s enough to show that (6.17) holds for (Mi) = ε((Ni)), where (Ni) is of the form (6.18). Indeed, assume that (6.17) holds for all sequences (Mi) = ε((Ni)) such that (Ni)⊆{A,C} is of the form (6.18), and let (Mi)=ε((Ni)) with (Ni) a sequence of the form (6.19). Pick a positive integer m, and set (Xi)=ε((C)(Ni) m). Then log λ((Mi)) |(Mi)| = log λ((Mi) m) m|(Mi)| = log λ((CAg1CAg2 . . .CAgk)m) m|(Mi)| ≤ log λ(C(CA g1CAg2 . . .CAgk)m) m|(Mi)| (6.20) = log λ((Xi)) |(Xi)| |(Xi)| m|(Mi)| ≤ hd(%((Xi))) |(Xi)| m|(Mi)| (6.21) = hd ( 1+m|(Mi)|%((Mi)) 2d+2+m|(Mi)| ) 2d+2+m|(Mi)| m|(Mi)| where (6.20) follows from Fact 3 and (6.21) follows from our assumption, as (C)(Ni) m is of the form (6.18). Since hd is continuous, taking the limit of the RHS as m approaches infinity, we get that (6.17) holds for (Mi). 75 6.5. Proof of Theorem 13 So henceforth, we assume (Ni) is of the form (6.18). We now further trans- form (Ni) by reducing runs of C elements with lengths greater than 2; that is, we (possibly) change (Ni) to be the sequence (C)u(C)2(A)g1(C)s ′ 2(A)g2(C)s ′ 3(A)g3 . . .(C)s ′ k(A)gk , where each s′j= min{2, sj} and u= ∑ j(sj−s′j). We also update (Mi) so that it still satisfies (Mi)=ε((Ni)). Clearly this does not change the RHS of (6.17) and by Fact 4 the LHS remains the same, as well. Now, the sequence (Ni) may be rewritten as (Ni)=(C)u(O (1) i )(O (2) i ). . .(O (m) i ), where each (O (j) i )⊆{A,C} is given by (O (j) i )=(C) 2(A)gj,1(C)(A)gj,2(C)(A)gj,3 . . .(C)(A) gj,kj , kj≥1, and gj,1, . . ., gj,kj≥1. (6.22) Observe that by Proposition 8 we have: λ(Ni) = λ( ∏m j=1 ∏ iO (j) i ) =∏m j=1 λ((O (j) i )). We will use the following two propositions to further transform (Ni). Proposition 10. For integers k≥2, 1≤i<k, g1, g2, . . ., gi−1, gi+2, . . ., gk≥1 and gi, gi+1≥2 λ(C2Ag1CAg2 . . . CAgi−1CAgiCAgi+1CAgi+2 . . . CAgk) ≤ λ(C2Ag1CAg2 . . . CAgi−1CAgi+gi+1CAgi+2 . . . CAgk) Proposition 11. For integers k≥2, 1≤i<k, s, g1, . . ., gi−1, gi+2, . . ., gk≥1 and gi, gi+1≥2 λ(C2Ag1CAg2 . . . CAgi−1CAgi(CA)sCAgi+1CAgi+2 . . . CAgk) ≤ λ(C2Ag1CAg2 . . . CAgi−1CAgi+gi+1CAgi+2 . . . CAgkC2A(CA)s−1) For each j=1, 2, . . .,m we transform (O(j)i ) in turn, resulting in a new se- quence (Õ(j)i )⊆{A,C}. We first replace occurrences of (contiguous) subse- quences of the form (A)g1(C)(A)g2 , for some g1, g2≥2, in our sequence with (A)g1+g2 . Each such replacement decreases the number of elements equal to C by 1, does not change the number of elements equal to A, and by Proposition 10 does not decrease the λ of the sequence. We continue to do this until no more occurrences of such sequences exist. Let qj be the number of the replacements we performed. Next, we consider every occurrence of a (contiguous) subsequence of the form (C,A)s(C), for some s∈N, whose two preceding elements and two succeeding elements all equal A. For each such occurrence, in turn, we remove 76 6.5. Proof of Theorem 13 it, and concatenate the sequence (C,C,A)(C,A)s−1 to the end of our current se- quence. Note that after each such removal-and-concatenation the number of ele- ments equal to A and the number of elements equal to C do not change, and by Proposition 11 and Part 2 of Proposition 8, the λ of the sequence does not decrease. We denote by (Õ(j)i ) the resulting sequence. Then it follows from this discussion that (C)qj (Õ(j)i ) and (O (j) i ) have the same number of elements equal to A and the same number of elements equal to C. It further follows that λ((Õ (j) i )) ≥ λ((O(j)i )), (6.23) and (Õ (j) i ) = (R (j,1) i )(R (j,2) i ). . .(R (j,wj) i ), (6.24) where wj∈N, and for each 1≤k≤wj , the sequence (R(j,k)i ) is either of the form (C)2(A)(C,A)t−1 , for some t≥1, (6.25) or of the form (C)2(A,C)s(A)g(C,A)t , for some s, t≥0, g≥2. (6.26) Set (Ñi) = (C)ũ(Õ (1) i )(Õ (2) i ). . .(Õ (m) i ), where ũ=u+ ∑m j=1 qj . Then %(ε((Ñi))) = %((Mi)) and |ε((Ñi))| = |(Mi)|. Additionally, by (6.23), Proposition 8 and Fact 4, we have λ((Ni))≤λ((Ñi)). Now, for j=1, . . .,m and k=1, . . ., wj , denote by (F (j,k) i ) the sequence ε((R (j,k) i )). To finish the proof, we claim, it’s enough to show that for every such j and k, log λ((F (j,k) i )) |(F (j,k)i )| ≤ hd(%((F (j,k)i ))). (6.27) 77 6.5. Proof of Theorem 13 Indeed, assume that this holds. Then, noting that hd(capind(S)) = 0, we have log λ((Mi)) |(Mi)| = log λ((Ni)) |(Mi)| ≤ log λ((Ñi)) |ε((Ñi))| = m∑ j=1 wj∑ k=1 ( log λ((F (j,k) i )) |(F (j,k)i )| |(F (j,k)i )| |ε((Ñi))| ) ≤ hd(capind(S)) |ε((C)ũ)| |ε((Ñi))| + m∑ j=1 wj∑ k=1 ( hd(%((F (j,k) i ))) |(F (j,k)i )| |ε((Ñi))| ) (6.28) ≤ hd capind(S) |ε((C)ũ)||ε((Ñi))| + m∑ j=1 wj∑ k=1 ( %((F (j,k) i )) |(F (j,k)i )| |ε((Ñi))| ) (6.29) = hd(%(ε((Ñi)))) = hd(%((Mi))), where (6.28) follows from our assumption, and (6.29) follows from the concavity of hd asserted in Proposition 9. So we proceed to show that (6.27) holds for every j=1, . . .,m, k=1, . . ., wj . This follows from the next proposition and the fact that each (R(j,k)i ) is either of the form (6.25) or (6.26). Proposition 12. The following statements hold. 1. Let t≥1 be an integer and let (Fi)=ε((C)2(A)(C,A)t−1), then log λ((Fi)) |(Fi)| ≤hd(%((Fi))). 2. Let s, t≥0, g≥2 be integers and let (Fi)=ε((C)2(A,C)s(A)g(C,A)t), then log λ((Fi)) |(Fi)| ≤hd(%((Fi))). The proof is now completed. 6.5.2 Proof of propositions Proof of Proposition 8. Part 1. Consider the matrix MN=a(bN). Clearly, it has rank at most 1, and therefore the eigenvalue 0 has geometric multiplicity at least 78 6.5. Proof of Theorem 13 m−1; hence m−1 of the eigenvalues are 0. The last eigenvalue must then equal the trace of the matrix, which in this case, is (bN)a. Obviously, it is a largest real eigenvalue. Part 2. Using part 1 we have, λ(MN1MN2) = λ(M(N1MN2)) = b(N1MN2)a = bN1abN2a = λ(MN1)λ(MN2). Proof of Proposition 9. We make use of the following lemma. Lemma 6. For all positive integers d, Cd > Cd+1. Proof of Lemma 6. It is well known (cf. [18]) that Cd= log λd, where λd>0 is the largest real root of the polynomial Pd(x) given by Pd(x) = x 2d+3 − d+2∑ i=0 xi. Let γ be any positive root of Pd(x). Choose any x>γ, and write x=(1+δ)γ for δ>0. Then Pd(x) = (1 + δ) 2d+3γ2d+3 − d+2∑ i=0 (1 + δ)iγi > (1 + δ)2d+3γ2d+3 − (1 + δ)2d+3 d+2∑ i=0 γi = (1 + δ)2d+3Pd(γ) = 0. It follows that λd is the only positive root of Pd, and that Pd(x)>0 for all x>λd. Moreover, as Pd is continuous and Pd(0)=− 1<0 it follows that for all 0≤x<λd, Pd(x)<0. Clearly, the above holds for Pd+1(x) and λd+1 as well; hence to show the claim it’s enough to prove that Pd+1(λd)>0. Now, since Pd(λd) = 0, we have 79 6.5. Proof of Theorem 13 λ2d+3d = ∑d+2 i=0 λ i d and Pd+1(λd) = λ 2d+5 d − d+3∑ i=0 λid = d+4∑ i=2 λid − d+3∑ i=0 λid = λd+4d − λd − 1. (6.30) Using Pd(λd) = 0 again, we get λd+4d = λ −(d−1) d λ 2d+3 d = 3∑ i=−(d−1) λid > 1+λd. Thus, from (6.30), we get Pd+1(λd)>0 and the claim follows. We now return to the proof of Proposition 9. We first note that for all d≥1 Cd≤ log(9/8)⇐⇒ d≥17. (6.31) Indeed it’s a simple matter to verify that for d=17, the LHS holds, and for d=16 it does not; (6.31) then follows by applying Lemma 6. We show that hd is concave. Note that for a sequence (xi)⊆R2, with |(xi)| = k, xi = (xi, yi) and x1<. . .<xk, the function L(xi) is concave iff the sequence of the slopes of the linear segments is non-increasing, namely, yi − yi−1 xi − xi−1≥ yi+1 − yi xi+1 − xi , 2≤i≤k−1. We check this for hd, when d≤16. In this case, hd = Lp1,p3,p4 = L(0,Cd),( 24d+5 , 14d+5 ),( 12d+2 ,0), so one needs to verify that 1/(4d+5)− Cd 2/(4d+5) ≥ 0− 1/(4d+5) 1/(2d+2)− 2/(4d+5) . Using simple algebraic manipulations this can be reduced to Cd≤1, which certainly holds. As for the case d≥17, here, hd = Lp1,p2,p3,p4 = L (0,Cd),( 36d+8 , log 36d+8 ),( 24d+5 , 14d+5 ),( 12d+2 ,0) , 80 6.5. Proof of Theorem 13 so one needs to verify the following two inequalities: log 3/(6d+8)− Cd 3/(6d+8) ≥ 1/(4d+5)− log 3/(6d+8) 2/(4d+5)− 3/(6d+8) (6.32) 1/(4d+5)− log 3/(6d+8) 2/(4d+5)− 3/(6d+8) ≥ 0− 1/(4d+5) 1/(2d+2)− 2/(4d+5) . (6.33) Again, using algebraic manipulations, (6.32) can be reduced to Cd≤ log(9/8), which holds by (6.31) and our assumption on d, and (6.33) can be reduced to 2≥ log 3, which is obviously true. Next, we verify that hd satisfies the other prop- erties listed in the proposition. Clearly, hd satisfies Properties 2,4,5 with equality, and for d≥17, it satisfies Property 3 with equality as well. It remains to check Property 3 for d≤16, namely that L(0,Cd),( 24d+5 , 14d+5 ) (3/(6d+8))≥ log 3 6d+8 . The latter inequality can be reduced to Cd≥ log(9/8), which holds by our assump- tion on d and (6.31). Thus, hd is concave and satisfies Properties 2,3,4 and 5. It’s easy to verify using the definition of hd and concavity that it is the smallest such function. Before we show the proofs of Propositions 10 and 11, we develop some tools for calculating λ((Xi)), where (Xi)⊆{A,C} is a sequence of the form (C)2(A)g1(C)(A)g2 . . .(C)(A)gk , for some k≥1 and g1, . . ., gk≥1. (6.34) To this end, we define the following sets. For a positive integer g, let ∆(g)⊆Sg be given by ∆(g)= { w∈Sg : w does not begin with ‘0 i1’, 1≤i≤d−1, and does not end with ‘00’ } . For positive integers k and g1, g2, . . ., gk, let ∏ j ∆(gj) denote the cartesian product ∆(g1)×. . .×∆(gj); define ∆(g1, . . ., gk) by ∆(g1, . . ., gk)= (w1, . . ., wk)∈∏ j ∆(gj) : for 1≤i<k, if wi ends with a ‘0’ then wi+1 begins with a ‘0’ . Finally, for symbols a, b∈{0, 1}, and positive integers k,g1, . . ., gk, define the set ∆a→b(g1, . . ., gk) by ∆a→b(g1, . . ., gk)= { (w1, . . ., wk) ∈ ∆(g1, . . ., gk) : w1 starts with aand wk ends with b } . 81 6.5. Proof of Theorem 13 So for example, ∆(1) = {0, 1}, ∆(2) = {01, 10} if d = 1 and ∆(2) = {10} for d ≥ 2, ∆(i)=∅ for 3≤i≤d, and ∆(d+ 1) = {0d1} for d ≥ 2, and so on. Also note that ∆(1, 1, . . ., 1), where the number of 1’s is some integer s, is given by ∆(1, 1, . . ., 1) = {(0, 0, 0, . . ., 0), (1, 0, 0, . . ., 0), (1, 1, 0, . . ., 0), ... (1, 1, 1, . . ., 1)}, (6.35) and has s+1 elements. The following proposition shows how these sets can be used to compute λ((Xi)), with (Xi) of the form (6.34). Proposition 13. For all positive integers k, g1, . . ., gk λ(C2Ag1CAg2 . . .CAgk)=|∆(g1, . . ., gk)| Proof. For an integer n, let Pn denote the set of all paths of length n in Ĝ?. For a path γ∈Pn we denote its starting vertex (resp. terminating vertex) by σ(γ) (resp. τ(γ)). Let k and g1, . . ., gk be positive integers and let c, r be the vectors defined in Fact 5 so that C2 = cr. By Proposition 8, λ(C2Ag1CAg2 . . .CAgk) = rAg1CAg2 . . .CAgkc. Let (Mi) = ε((A)g1(C)(A)g2 . . .(C)(A)gk) and let Γ be the set Γ = { γ∈P|(Mi)| : γ matches (Mi), σ(γ)∈{a0, a2d+2}, τ(γ)∈{a0, a1} } . Then it follows that λ(C2Ag1CAg2 . . .CAgk) = rAg1CAg2 . . .CAgkc = r (∏ i Mi ) c = |Γ|. We will show that |Γ| = |∆(g1, . . ., gk)| by exhibiting a bijection between these two sets. For i=1, 2, . . ., k define si = 1 + ∑i−1 j=1(gi+2d+2), and ti = si+gi−1. Then 1=s1<t1<s2<t2<. . .<sk<tk=|(Mi)| (si, ti denote the start and end in- dices of the sequence (A)gi in (Mi)). For a path γ = (e1, . . ., en)∈Pn let L̂?(γ)=L̂?(e1)L̂?(e2). . .L̂?(en)∈{0, 1,}n, be the word generated by the path, 82 6.5. Proof of Theorem 13 and for 1≤i≤j≤n denote by L̂?(γ)i the symbol L̂?(ei) and by L̂?(γ)i:j the word L̂?(ei)L̂?(ei+1). . .L̂?(ej)∈{0, 1,}∗. Note that, as Ĝ? is the only non- trivial component of Ĝ, a word w∈{0, 1,}∗ is generated by some path in Ĝ? iff for any nonnegative integer m, there exists words z, y∈{0, 1,}m, such that zwy∈Ŝ. Next, fix γ∈Γ. Since Mj=B iff j=ti+d+1 for some 1≤i<k, we have L̂?(γ)ti+d+1= for 1≤i<k, and L̂?(γ)j∈{0, 1} for all j∈{1, . . ., tk} \ {t1+d+1, t2+d+1, . . ., tk−1+d+1}. Also, observe that in any path δ∈P2d+1 with L̂?(δ)d+1=, it must hold that L̂?(δ)=0d0d. It follows that L̂?(γ)ti+1:ti+2d+1 = 0d0d for all 1≤i<k. We now define φ : Γ→ ∆(g1, . . ., gk) by φ(γ) = (L̂?(γ)s1:t1 , L̂?(γ)s2:t2 , . . ., L̂?(γ)sk:tk) , γ∈Γ, and claim that it is a bijection. To show this, we need to verify the following statements: 1. φ is well-defined: for all γ∈Γ, φ(γ)∈∆(g1, . . ., gk) 2. φ is one-to-one. 3. φ is onto ∆(g1, . . ., gk). 1. Let γ∈Γ and 1≤i≤k. Since L̂?(γ)si:ti∈{0, 1}∗ we have L̂?(γ)si:ti∈S. Since σ(γ)∈{a0, a2d+2}, L̂?(γ) does not begin with ‘0r1’ for any 1≤r<d which implies L̂?(γ)s1:t1 does not begin with ‘0r1’ for any 1≤r<d, as well. For 2≤i≤k, L̂?(γ)si:ti does not begin with ‘0r1’ for any 1≤r<d, since otherwise L̂?(γ)ti−1+1:si+r=0d0d00r1 or L̂?(γ)ti−1+1:si+r=0d0d10r1, and both words are not in Ŝ. Additionally, for 1≤i≤k−1, L̂?(γ)si:ti does not end with ‘00’ since otherwise L̂?(γ)ti−1:ti+2d+1 = 000d0d which is not in Ŝ. And, as τ(γ)∈{a0, a1}, L̂?(γ) does not end in ‘00’, which implies that L̂?(γ)sk:tk does not end with ‘00’, as well. This shows that for all 1≤i≤k, L̂?(γ)si:ti∈∆(gi). Finally, let 1≤i≤k−1, and assume L̂?(γ)si:ti ends with a ‘0’. If L̂?(γ)si+1:ti+1 begins with a ‘1’, then L̂?(γ)ti:si+1=00d0d01 or L̂?(γ)ti:si+1=00d0d11, and both are not in Ŝ. Therefore, if L̂?(γ)si:ti ends with a ‘0’ then L̂?(γ)si+1:ti+1 must begin with a ‘0’. It follows that φ(γ)∈∆(g1, . . ., gk). 2. Let γ1, γ2∈Γ, such that φ(γ1) = φ(γ2). Clearly, all nonempty paths in G starting at a2d+2 generate a word beginning with ‘1’ and all nonempty paths in Ĝ? starting at a0 generate a word beginning with ‘0’. By our assumption, L̂?(γ1) and L̂?(γ2) begin with the same symbol; hence σ(γ1) = σ(γ2). Now, observe that for any path δ∈Γ, and 2≤i≤k, L̂?(δ)si−1=L̂?(δ)si , where, 0=1 and 1=0; otherwise, L̂?(δ)ti−1+1:si=0d0d00 or L̂?(δ)ti−1+1:si=0d0d11 and both words are not in 83 6.5. Proof of Theorem 13 Ŝ. Thus for all 2≤i≤k, L̂?(γ1)ti−1+1:si−1 = 0d0dL̂?(γ1)si = 0d0dL̂?(γ2)si = L̂?(γ2)ti−1+1:si−1. It follows that L̂?(γ1) = L̂?(γ2) and since Ĝ? is deterministic, we have γ1=γ2. Therefore φ is one-to-one. 3. Let j=(j1, j2, . . ., jn) be an n-tuple of positive integers for some n≥1. Consider the set ∆(j1, . . .jn)=∆(j). For a word w=(w(1), . . ., w(n))∈∆(j),with w(i)=w (i) 1 w (i) 2 . . .w (i) ji , w(i)r ∈{0, 1}, 1≤r≤ji, 1≤i≤n, we define the word z(w)∈Σ̂∗ by z(w) = w(1)0d0dw(2)1 w(2)0d0dw (3) 1 w (3). . .0d0dw(n)1 w(n). It is not hard to verify that for allw∈∆(j), any filling of the ‘’s of z(w) with sym- bols from {0, 1} does not contain the pattern 02d+3 nor any of the patterns 10r1 for 0≤r<d. It follows that z(w)∈Ŝ. More is true; fix a w∈∆(j) and for an inte- ger m, let jm = (y1, . . ., ym, j1, . . ., jn, y1, . . ., ym) with y1=y2=. . .=ym=1, and wm = (x1, . . ., xm, w (1), . . ., w(n), x1, . . ., xm) with x1=x2=. . .=xm=1. Then, clearly wm∈∆(jm) and by our previous argument z(wm)∈Ŝ. It thus follows that z(w) can be extended indefinitely on both sides and is thus generated by a path in Ĝ?. Now, let w = (w(1), . . ., w(k))∈∆(g1, . . ., gk), and let γ be a path in Ĝ? gen- erating z(w). We show that there is a path γ′∈Γ generating z(w). This will con- clude the proof, since obviously for such a path γ′, φ(γ′)=w. If k=g1=1, then z(w)∈{0, 1} and clearly there is a path γ′∈Γ generating z(w). So we assume k>1 or g>1. In this case, observe that the length of z(w) is at least 2, and exactly one of the last two symbols of z(w) must be a ‘1’. Therefore, τ(γ)∈{a0, a1}. Additionally, either z(w) has the prefix ‘1’, or z(w) begins with a ‘0’ and either g1>1, in which case z(w) has the prefix 0r1 with r≥d, or g1=1 and (since k>1), z(w) has the prefix 00d0d1. It is easily verified from Figure 6.3, that in ev- ery case there is a path β in Ĝ? generating the corresponding prefix of z(w) with σ(β)∈{a0, a2d+2} and τ(β)=a0. By replacing the initial part of γ generating this prefix with β we obtain a path γ′ generating z(w) with σ(γ′)∈{a0, a2d+2} and τ(γ′) = τ(γ)∈{a0, a1}. Clearly γ′ matches (Mi), and consequently γ′∈Γ. To prove Propositions 10 and 11 we need the following lemma, which shows a relation between ∆(g1, g2) and ∆(g1 + g2) for g1, g2≥2. Lemma 7. For all g1, g2≥2, there exists a function T :∆(g1, g2)→∆(g1+g2) such that 84 6.5. Proof of Theorem 13 1. T is one-to-one. 2. For any (ax, yb)∈∆(g1, g2), if T (ax, yb)=cwd, where a, b, c, d∈{0, 1} and x, y, w∈{0, 1}∗, then d=b and c≤a (using the normal order of the integers in {0, 1}). Proof. Let g1, g2≥2. For (x, y)∈∆(g1, g2), we define T (x, y) as follows. 1. If x = w10, y = 0z, where w, z∈{0, 1}∗ and z doesn’t start with 02d+1, then T (x, y) = xy. 2. If x = w0d+110, y = 02d+21z, where w, z∈{0, 1}∗, then T (x, y) = w0d10d+210d+11z. 3. If x = w10, y = 02d+21z, where w, z∈{0, 1}∗ and w doesn’t end with 0d+1, then T (x, y) = w0d+210d+11z. 4. If x = w1, y = 0z, where w, z∈{0, 1}∗, then T (x, y) = xy. 5. If x = w1, y = 1z, where w, z∈{0, 1}∗ and w doesn’t end with 02d+2, then T (x, y) = w01z. 6. If x = w02d+21, y = 1z, where w, z∈{0, 1}∗, then T (x, y) = w0d+210d1z. For i = 1, 2, . . ., 6, let Λi denote the subset of pairs (x, y)∈∆(g1, g2) satisfying the conditions of case i above. Then, one can verify that {Λ1, . . .,Λ6} is a par- tition of ∆(g1, g2) and that in each case, T (x, y)∈∆(g1 + g2), thus T is well- defined. Moreover, if (ax′, y′b)∈Λi and T (ax′, y′b) = cwd, were a, b, c, d∈{0, 1} and x′, y′, w∈{0, 1}∗, then it’s easy to verify that d=b, and, unless i=3, then c=a; for i = 3, c≤a. It remains to show that T is one-to-one. First, observe that T restricted to Λi is one-to-one for i = 1, 2, . . ., 6. Next, for (x, y)∈∆(g1, g2), let T (x, y)(1), T (x, y)(2)∈{0, 1}∗ be the sub-words of T (x, y) given by T (x, y) = T (x, y)(1)T (x, y)(2) , |T (x, y)(1)| = g1, |T (x, y)(2)| = g2, 85 6.5. Proof of Theorem 13 and consider the following table. Conditions T (x, y)(1) T (x, y)(2) (x, y)∈Λ1 v10 0w (x, y)∈Λ2 v100 0w (x, y)∈Λ3, g1=2 00 0w (x, y)∈Λ3, g1≥3 v000 0w (x, y)∈Λ4 v1 0w (x, y)∈Λ5 v0d+1 1w (x, y)∈Λ6 v10d 1w Each entry in the leftmost column describes the conditions on x, y, g1 under which T (x, y)(1) and T (x, y)(2) have the forms written in the corresponding entries un- der the rightmost two columns; here v, w denote arbitrary words over {0, 1}. For example, the first line claims that if (x, y)∈Λ1 then T (x, y)(1) ends with ‘10’ and T (x, y)(2) begins with a ‘0’. This and the claims corresponding to the other lines can be easily verified from the definition of T . Clearly, for any (x, y)∈Λi and (s, t)∈Λj with i6=j, we have T (x, y)6=T (s, t), and it follows that T is one-to- one. We can now prove Propositions 10 and 11. For a vector v=(v1, v2, . . ., vk), with positive integer entries, we use the abbreviation ∆(v) for ∆(v1, v2, . . ., vk), and for a positive integer k we denote by 1k the vector in Zk with every entry equal to 1. Proof of Proposition 10. Let T : ∆(gi, gi+1) → ∆(gi+gi+1) be a function sat- isfying the properties listed in Lemma 7. Set g = (g1, . . ., gk) and g′ = (g1, . . ., gi−1, gi+gi+1, gi+2, . . ., gk) and U :∆(g)→∆(g′) be the function defined by U(x1, . . ., xk) = (x1, . . ., xi−1, T (xi, xi+1), xi+2, . . ., xk) , (x1, . . ., xk)∈∆(g). Since T satisfies Property (2) in Lemma 7, it follows that for every z∈∆(g), U(z)∈∆(g′); henceU is well-defined. Since T is one-to-one, so isU and therefore |∆(g)|≤|∆(g′)|. The claim now follows from Proposition 13. Proof of Proposition 11. Let g(1)=(g1, . . ., gi)∈Zi and g(2)=(gi+1, . . ., gk)∈Zk−i, where every gj>0. Define the block-vectors g = ( g(1) 1s g (2) ) g′ = ( g(1) g(2) ) . 86 6.5. Proof of Theorem 13 Finally, let U : ∆(g)→∆(g′)×∆(1s) be the function given by U(x1, . . ., xi, y1, . . ., ys, xi+1, . . ., xk) = (x1, . . ., xk, y1, . . ., ys), for all (x1, . . ., xi, y1, . . ., ys, xi+1, . . ., xk)∈∆(g). Note that for such a vector, if xi ends with a ‘0’, then yj=0, for j = 1, 2, . . ., s, and consequently xi+1 must be- gin with a ‘0’. Therefore (x1, . . ., xi, xi+1, . . ., xk)∈∆(g′) and U is well-defined. Since U is obviously one-to-one, it follows that |∆(g)|≤|∆(g′)||∆(1s)|. Now, set M=C2Ag1CAg2 . . .CAgi(CA)sCAgi+1 . . .CAgk ; then by Propositions 13, 10 and 8, we get λ(M) = |∆(g)| ≤ |∆(g′)||∆(1s)| = λ(C2Ag1CAg2 . . .CAgi−1CAgiCAgi+1CAgi+2 . . .CAgk)λ(C2A(CA)s−1) ≤ λ(C2Ag1CAg2 . . .CAgi−1CAgi+gi+1CAgi+2 . . .CAgk)λ(C2A(CA)s−1) = λ(C2Ag1CAg2 . . .CAgi−1CAgi+gi+1CAgi+2 . . .CAgkC2A(CA)s−1). Proof of Proposition 12, part 1. For t=1, it can be easily verified using Proposi- tion 13, that the conclusion holds with equality. So assume t≥2; in this case, setting ρ=%((Fi)) we have, ρ= t+ 1 (t+1)(2d+2)+t = 1 2d+3−1/(t+1)≤ 1 2d+8/3 = 3 6d+8 . By Proposition 9, it follows that hd(ρ)≥Lp1,p2(ρ). So, it’s enough to show log λ((Fi)) |(Fi)| ≤ Lp1,p2(ρ) Now, λ((Fi)) = λ(C2A(CA)t−1), which by Proposition 13 is |∆(1t)|; so by (6.35), λ((Fi))=t+1. Therefore, we need to show log λ((Fi)) |(Fi)| ≤ Lp1,p2(ρ) ⇐⇒ log λ((Fi))|(Fi)| ≤ L(0,Cd),( 36d+8 , log 36d+8 )(ρ) ⇐⇒ log(t+1)|(Fi)| ≤ Cd − (log 3)/(6d+8) −3/(6d+8) ( ρ− 3 6d+8 ) + log 3 6d+8 ⇐⇒ log(t+1)|(Fi)| ≤ Cd ( 1− ρ 3/(6d+8) ) + log 3 3 ρ ⇐⇒ log(t+1)|(Fi)| ≤ Cd ( 1− ρ 3/(6d+8) ) + (t+ 1)(log 3)/3 |Fi| . 87 6.5. Proof of Theorem 13 The last inequality holds, since ρ≤3/(6d+8), Cd≥0, and for any nonnegative inte- ger t, (t+1)(log 3)/3≥ log(t+1). It remains to prove part 2 of Proposition 12. For this, we require the following two lemmas. The first establishes some properties of |∆(·)| and |∆a→b(·)|, for a, b∈{0, 1}. Lemma 8. The following statements hold. 1. |∆(·)| and |∆a→b(·)| for all bits a, b∈{0, 1} obey the RLL(d, 2d+ 2) recur- sion. Namely, for all positive integers g, |∆(g+2d+3)| = d+2∑ i=0 |∆(g+i)| |∆a→b(g+2d+3)| = d+2∑ i=0 |∆a→b(g+i)|. (6.36) 2. |∆(g)|≤|∆(g+1)| for all g≥3. 3. log(|∆(·)|) is “eventually superadditive”: for all g1, g2≥3, |∆(g1)||∆(g2)| ≤ |∆(g1+g2)| 4. lim g→∞ log |∆(g)| g = sup g≥2 log |∆(g)| g = Cd 5. |∆0→0(g)|=|∆1→1(g)| for all positive integers g. 6. |∆0→1(g)|≤2|∆0→0(g)| for g 6=d+1. 7. |∆0→1(g)|≤12 |∆(g)| for all g if d = 1, and for g 6=d+1 if d>1. Proof. Part 1. Let V ′⊆V be the set of vertices {a0, . . ., a2d+2}, and let G′ be the subgraph of Ĝ? induced by V ′ (so G′ is the “conventional” deterministic pre- sentation of RLL(d, 2d+2)). For vertices u, v∈V ′ and positive integer g, let Wu→v(g)∈{0, 1}g denote the set of words that are generated by paths of length g in G′, starting at u and terminating at v. Since G′ is deterministic, clearly, the number of such paths is |Wu→v(g)|. Let g be a positive integer. It’s not hard to verify that 1. ∆0→0(g) =Wa0→a1(g) 2. ∆0→1(g) =Wa0→a0(g) 88 6.5. Proof of Theorem 13 3. ∆1→0(g) =Wa2d+2→a1(g) 4. ∆1→1(g) =Wa2d+2→a0(g) Let A′=A(G′). It follows that each |∆a→b(g)|, for a, b∈{0, 1}, is equal to a single entry of (A′)g. Now, the characteristic polynomial of A′ is (cf. [18]) x2d+3 − d+2∑ i=0 xi. Invoking the Cayley-Hamilton Theorem, we get (A′)g+2d+3 = d+2∑ i=0 (A′)g+i. Thus, every |∆a→b(g)| satisfies the required recursion and therefore also |∆(g)| =∑ a,b |∆a→b(g)|. Part 2. For d=1, any word int ∆(g) can be extended from the left to a word in ∆(g+ 1) and the claim follows. For d≥2, we use induction on g. Table 6.1, which can be easily verified, shows the sets ∆(g) for g=1, 2, . . ., 2d+3 along with |∆(g)| for g=1, 2, . . ., 2d+6 and d≥2. Evidently, |∆(g)|≤|∆(g+1)| for 3≤g≤2d+5. This shows the induction basis. The induction step follows from the recursion relation (6.36). Part 3. Since the claim is symmetric in g1, g2, it’s enough to prove it only for g1≤g2. We use induction on g2. For the basis of the induction we verify the claim for all 3≤g1≤g2≤2d+5. For 1≤d≤7 we verified the induction basis using a computer, so here we assume d≥8. Consider Table 6.1. If 3≤g1≤d, then |∆(g1)| = 0 and the claim holds trivially. If g1 = d+1, then |∆(g1)| = 1 and the claim follows from the monotonicity of |∆(g)| for g≥3 shown in part 2. If g1 = g2 = d+2, the claim holds since |∆(d+2)|2=9≤|∆(2d+4)|=11. If g1 = d+2 and d+3≤g2≤2d+1 then |∆(d+2)||∆(g2)| = 12<|∆(2d+5)|≤|∆(d+2+g2)|, with the last inequality following from the monotonicity of |∆(g)|, for g≥3. If g1=d+2 and 2d+2≤g2≤2d+5 then |∆(d+2+g2)| ≥ |∆(3d+4)| = 2d+3∑ i=d+1 |∆(i)| = 1+3+(d−1)4+5+8≥45 > |∆(d+2)||∆(2d+5)|≥|∆(d+2)||∆(g2)|. If d+3≤g1≤2d+1 and g1≤g2≤2d+1 then |∆(g1)||∆(g2)|=16=|∆(2d+6)|≤|∆(g1+g2)|. 89 6.5. Proof of Theorem 13 g ∆(g) |∆(g)| 1 {0, 1} 2 2 {10} 1 3≤g≤d ∅ 0 d+1 {0d1} 1 d+2 {0d10, 0d+11, 10d1} 3 d+3≤g≤2d+1 {0g−210, 0g−11, 10g−21, 10g−310} 4 2d+2 {02d10, 0d10d1, 02d+11, 102d−110, 102d1} 5 2d+3 {02d+110, 0d10d10, 02d+21, 0d+110d1, 0d10d+11, 102d10, 102d+11, 10d10d1} 8 2d+4 {. . .} 11 2d+5 {. . .} 14, if d=2; 13, if d≥3. 2d+6 {. . .} 21, if d=2; 17, if d=3; 16, if d≥4. Table 6.1: Values of |∆(g)| for 1≤g≤2d+6 and d≥2. If d+3≤g1≤2d+1 and 2d+2≤g2≤2d+5 then |∆(g1+g2)| ≥ |∆(3d+5)|= 2d+4∑ i=d+2 |∆(i)| = 3+(d−1)4+5+8+11≥55 > |∆(g1)||∆(2d+5)|≥|∆(g1)||∆(g2)|. Finally, if 2d+2≤g1≤g2≤2d+5 then, since d≥8 and hence 2d+11≤3d+3, we 90 6.5. Proof of Theorem 13 have |∆(g1+g2)| ≥ |∆(4d+4)|= 3d+3∑ i=2d+1 |∆(i)|≥ 2d+11∑ i=2d+1 |∆(i)| = 2d+6∑ i=2d+1 |∆(i)|+ 2d+11∑ i=2d+7 |∆(i)| = 57+ 2d+11∑ i=2d+7 2d+3∑ j=d+1 |∆(i−j)| ≥ 57+ 2d+11∑ i=2d+7 (1+3+4 (i−(d+ 1)− (d+2))) =197 > |∆(2d+5)|2≥|∆(g1)||∆(g2)|. This shows the basis of the induction. As for the induction step, let k≥2d+5 be an integer and assume the claim holds for all 3≤g1≤g2≤k. We will prove that the claim holds for all 3≤g1≤g2≤k+1. For g1=g2=k+1 we have |∆(k+1)||∆(k+1)| = 2d+3∑ i=d+1 2d+3∑ j=d+1 |∆(k+1−i)||∆(k+1−j)| ≤ 2d+3∑ i=d+1 2d+3∑ j=d+1 |∆(2k+2−i−j)| = 2d+3∑ i=d+1 |∆(2k+2− i)| = |∆(2k+2)|. where the equalities follow from (6.36), and the inequality follows from the induc- tion hypotheses (note that in the first double sum k+1−i≥3 and k+1−j≥3 due to our assumption on k). The case g2=k+1, 3≤g1≤k is handled in a similar manner. Part 4. Let θ : {1, 2, . . .}→[0,∞) be the function defined by θ(g) = { 0 if g≤d+1 log |∆(g)| otherwise. , g = 1, 2, . . .. Then it’s easily verified, using parts 2 and 3, that this function is superadditive for all positive integers g. Hence, by Lemma 1, limg→∞ θ(g)/g exists and satisfies lim g→∞ θ(g) g = sup g≥1 θ(g) g . Since θ is nonnegative, the RHS is equal to supg≥d+2(θ(g)/g), thus lim g→∞ θ(g) g = lim g→∞ log |∆(g)| g = sup g≥d+2 θ(g) g = sup g≥d+2 log |∆(g)| g . 91 6.5. Proof of Theorem 13 Let A′ be the matrix defined in the proof of part 1 above, and let a, b∈{0, 1}. Since A′ is primitive, it holds that limg→∞((A′)g/λ(A′)g) exists and is strictly positive in each entry. As ∆a→b(g) is equal to a single entry of (A′)g, it follows that there exists a positive real constant ca,b, such that lim g→∞ |∆a→b(g)| λ(A′)g = ca,b. Since |∆(g)|=∑a,b |∆a→b(g)|, this implies that lim g→∞ log |∆(g)| g = log λ(A′) = Cd. It remains to check that supg≥2(log |∆(g)|)/g= supg≥d+2(log |∆(g)|)/g. This holds since for d≥2 and 2≤g<d+2, (log |∆(g)|)/g≤0≤Cd, and for d=1, it can be verified that (log |∆(2)|)/2=1/2<C1. Part 5. Obviously the claim holds for g=1, so assume g≥2. In this case, let ψ:∆0→0(g)→∆1→1(g) be given by ψ(0w0) = 10w , 0w0∈∆0→0(g), w∈{0, 1}∗. It’s easy to verify that for any z∈∆0→0(g), ψ(z)∈∆1→1(g) and that ψ is one-to- one and onto ∆1→1(g). This shows the claim. Part 6. Note, that for 1≤g≤d, ∆0→1(g) = ∅, and the claim holds. To show the claim for g≥d+2, we define a map φ : ∆0→1(g)→∆0→0(g). For a word x∈∆0→1(g), φ(x) is defined as follows: 1. If x = w0d+11, where w∈{0, 1}∗, then φ(x) = w0d10. 2. if x = w0d+110d1, where w∈{0, 1}∗, then φ(x) = w0d10d10. 3. if x = w10d1, and w does not end with 0d+1, then φ(x) = w0d10. For i = 1, 2, 3, let Λi⊆∆0→1(g) denote the set of words satisfying the con- ditions of case i above. Observe, that {Λ1,Λ2,Λ3} is a partition of ∆0→1(g), and that in each case φ(x)∈∆0→0(g); thus φ is well-defined. We claim that φ 92 6.5. Proof of Theorem 13 is at most “two-to-one”. More precisely, we claim that there are no 3 distinct words x, y, z∈∆0→1(g) such that φ(x)=φ(y)=φ(z); otherwise, since, clearly, φ restricted to Λi is one-to-one, each of x, y, z must belong to a different Λi, say x∈∆1, y∈∆2 and z∈∆3; but then φ(y) ends with 10d10 while φ(z) ends with 0d+110. Thus φ is at most two-to-one, and it follows that |∆0→1(g)|≤2|∆0→0(g)|. Part 7. Let g 6=d+1. Then by parts 5 and 6, |∆(g)| ≥ |∆0→0(g)|+ |∆1→1(g)|+ |∆0→1(g)| = 2|∆0→0(g)|+ |∆0→1(g)| ≥ 2|∆0→1(g)|, and the claim follows. The case d = 1 and g = d+1 = 2, is easily verified. We will use the next lemma in the proof of Proposition 12, part 2. It gives a bound on λ((Ni)), where (Ni)⊆{A,C} is a sequence of the form (6.26). Lemma 9. Let s, t≥0 and g≥2 be integers. Set λ=λ((C)2(A,C)s(A)g(C,A)t). If g=d+1 and d>1 then λ=(s+ 1)(t+ 1), otherwise, λ≤|∆(g)| ( s+ 1 4 + t+ 1 4 + (s+ 1)(t+ 1) 2 ) . Proof. Let g∈Zs+t+1 be the block vector given by g=(1s | g |1t). For a, b∈{0, 1} let Γa→b⊆∆(g) be given by Γa→b = {(x1, . . ., xs, y, z1, . . ., zt)∈∆(g) : y begins with a and ends with b}. Clearly, {Γa,b : a, b∈{0, 1}} is a partition of ∆(g). On the other hand, consider the following identities, when s, t>0, which are easy to verify: |Γ0→0| = |∆(1s)||∆0→0(g)| = (s+1)|∆0→0(g)| |Γ0→1| = |∆(1s)||∆0→1(g)||∆(1t)| = (s+ 1)(t+ 1)|∆0→1(g)| |Γ1→0| = |∆1→0(g)| |Γ1→1| = |∆1→1(g)||∆(1t)| = (t+1)|∆1→1(g)| where we used (6.35). Note that these identities hold even when s = 0 or t = 0. By Proposition 13 it follows that λ = |∆(g)|= ∑ a,b∈{0,1} |Γa→b| = (s+1)|∆0→0(g)|+ (s+1)(t+1)|∆0→1(g)| + |∆1→0(g)|+ (t+1)|∆1→1(g)|. (6.37) 93 6.5. Proof of Theorem 13 Now, if g=d+1 and d>1, then ∆(g)={0d1} and the claim follows from (6.37). It remains to show the case g 6=d+1 or d=1. If |∆(g)| = 0, then the claim readily follows from (6.37). Otherwise, rewriting (6.37), we obtain λ = |∆(g)| ( (s+1) |∆0→0(g)| |∆(g)| + (t+1) |∆1→1(g)| |∆(g)| + |∆1→0(g)| |∆(g)| + (s+1)(t+1) |∆0→1(g)| |∆(g)| ) ≤|∆(g)| ( (s+1) |∆0→0(g)|+|∆1→0(g)|/2 |∆(g)| + (t+1) |∆1→1(g)|+|∆1→0(g)|/2 |∆(g)| + (s+1)(t+1) |∆0→1(g)| |∆(g)| ) . (6.38) Now, by Lemma 8, part 5 we have |∆0→0(g)|+ |∆1→0(g)|/2 |∆(g)| = |∆1→1(g)|+ |∆1→0(g)|/2 |∆(g)| = 1 2 ( 1− |∆0→1(g)||∆(g)| ) . Substituting this into (6.38) and applying part 7 of Lemma 8, we obtain λ≤|∆(g)| ( s+1 2 + t+1 2 + |∆0→1(g)| |∆(g)| ( (s+1)(t+1)− s+1 2 − t+1 2 )) ≤ |∆(g)| ( s+1 4 + t+1 4 + (s+1)(t+1) 2 ) . . This completes the proof. Proof of Proposition 12, part 2. Set ρ = %((Fi)). Then since g≥2, ρ = t+ s+ 2 (t+s+2)(2d+2)+t+s+g = 1 (2d+3)+(g−2)/(t+s+2) < 3 6d+8 . Hence, by Proposition 9, it follows that hd(ρ)≥Lp1,p2(ρ). So, it’s enough to show log λ((Fi)) |(Fi)| ≤ Lp1,p2(ρ) 94 6.5. Proof of Theorem 13 Set λ = λ((Fi)). We need to show log λ |(Fi)| ≤ Lp1,p2(ρ) ⇐⇒ log λ|(Fi)| ≤ L(0,Cd),( 36d+8 , log 36d+8 )(ρ) ⇐⇒ log λ|(Fi)| ≤ Cd − (log 3)/(6d+8) −3/(6d+8) ( ρ− 3 6d+8 ) + log 3 6d+8 ⇐⇒ log λ|(Fi)| − log 3 3 ρ ≤ Cd ( 1− ρ 3/(6d+8) ) ⇐⇒ log λ− (t+ s+ 2)(log 3)/3 (1− ρ3/(6d+8))|(Fi)| ≤ Cd ⇐⇒ log λ− (t+ s+ 2)(log 3)/3 g + (t+ s− 4)/3 ≤ Cd (6.39) Now, if g = d+1, and d>1, then by Lemma 9, the numerator of the LHS of (6.39) is log(s+1)−(s+1)(log 3)/3+ log(t+1)−(t+1)(log 3)/3 which is non- positive for any nonnegative integers s,t. Since the denominator of the LHS of (6.39) is always positive, it follows that the LHS of (6.39) is nonpositive and (6.39) holds in this case. So we assume that d=1 or g 6=d+1. In this case, by Lemma 9, log λ−(t+s+2) log 33 g+ t+s−43 ≤ log |∆(g)|+ log( s+1 4 + t+1 4 + (s+1)(t+1) 2 )−(t+s+2) log 33 g+ t+s−43 = log |∆(g)| − α g + β , where we set α=(t+s+2) log 33 − log( s+14 + t+14 + (s+1)(t+1)2 ) and β= t+s−43 . Ob- serve that α ≥ (t+ s+ 2)log 3 3 − log((s+ 1)(t+ 1)) = (t+ 1) log 3 3 − log(t+ 1) + (s+ 1)log 3 3 − log(s+ 1) ≥ 0. 95 6.6. Open questions Now, if t+ s≥4 then β≥0 and log λ− (t+ s+ 2) log 33 g + t+s−43 ≤ log |∆(g)| − α g + β ≤ log |∆(g)| g ≤ Cd, where the last inequality follows from Lemma 8, part 4. Hence (6.39) holds in this case. Otherwise, if t+s<4 then β<0 and it can be verified that for all such t,s, we have −α≤βC1. By Lemma 6 this implies that −α ≤ βCd (6.40) Again, by Lemma 8, part 4, we have (log |∆(g)|)/g≤Cd, which implies log |∆(g)| ≤ gCd (6.41) Summing equations (6.41) and (6.40) and dividing by g+β (which is positive), we get log |∆(g)| − α g + β ≤ Cd. Therefore (6.39) holds in this case as well. 6.6 Open questions By (6.2), the independence capacity of RLL(d, k) remains 1/(2d+2) for all 2d + 1≤k≤3d+ 1. Is it possible to generalize the derivation in the proof of Theorem 13 to obtain the tradeoff function for RLL(d, k) for this range of d and k? The “reverse-concatenation” encoding scheme described in Section 6.1 is used in practice for certain digital storage systems where the relevant constraint is RLL(0, k). Knowing the tradeoff function for this constraint is thus especially important. Unfortunately, currently, this function is only known exactly when k = 1, 2 [5], and for k = 3 for insertion rates in [0, 1/4] [37]. We so far restricted ourselves to dealing with tradeoff functions of 1- dimensional and binary constraints. It remains a task for future work to generalize the definition to higher-dimensional and non-binary constraints. 96 Chapter 7 Bounds on capacity using probability In this chapter we give some probabilistic inequalities that hold for certain 2- dimensional binary constraints. Using these, we obtain a lower bound, stated in Theorem 16, on the capacity of certain constraints of the form S(V)⊗RLL(0, 1), where S(V) is a 1-dimensional constraint over {0, 1}. We suspect that this bound is usually inferior to the bounds we get using the method described in Chap- ter 3: for example, for the (bit-flipped) hard square constraint, Theorem 16 gives cap(RLL(0, 1)⊗2)≥0.49445718. . ., whereas with the method of Chapter 3 one gets cap(RLL(0, 1)⊗2)≥0.58789116. . .. However, the arguments presented here do not seem to require symmetry of either the horizontal or vertical strips of the constraint, and thus may prove to be generalizable to constraints for which the method of Chapter 3 cannot be used. For the rest of this chapter, we assume Σ = {0, 1}. 7.1 Some correlation inequalities The method described in this chapter uses correlation inequalities: specifically the FKG and Holley inequality. We summarize them here. A partially ordered set (Λ,≤) is called a lattice, if any two elements x, y∈Λ have a smallest upper bound denoted x∨y (i.e. x∨y ≥ x and x∨y ≥ y and for all z∈Λ s.t. z≥x and z≥y, we have z ≥ x∨y) and a greatest lower bound denoted x∧y. A lattice is distributive if it satisfies either of the following two equivalent conditions: x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z) for all x, y, z∈Λ, x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z) for all x, y, z∈Λ. As an example consider the set Λ = Σm×n for some m,n ∈ N. As usual we say that for ∆1,∆2 in Λ, ∆1≤∆2 if the inequality holds entry-wise. Then ≤ is a partial order on Λ and (Λ,≤) is a finite distributive lattice. For two elements ∆1,∆2∈Σm, the arrays ∆1∨∆2 (resp. ∆1∧∆2) is the array formed by taking an entry-wise maximum (resp. minimum) of ∆1,∆2. Every lattice that we use in this 97 7.1. Some correlation inequalities chapter will essentially be such a lattice and in fact, it can be shown that every finite distributive lattice is isomorphic to a sub-lattice of Σm×1 for some m [31, Chapter 14, Theorem 15]. Henceforth, we will regard Σm×n as the lattice (Σm×n, ≤). Let (Λ,≤) be a finite distributive lattice. A function f : Λ → R is increasing (i.e. non-decreasing) if for all x, y∈Λ with x≤y, f(x)≤f(y). We call a subset A⊆Λ increasing if its indicator function is increasing and a 1-dimensional con- straint S⊆Σ∗, increasing, if Sn is increasing (w.r.t the lattice Σn) for all n∈N. The notions of a decreasing function, subset and 1-dimensional constraint are defined in an analogous manner. A function f : Λ → R is monotone if it is increasing or decreasing. Now, let µ : P(Λ)→ [0, 1] be a probability measure on Λ, where P(Λ) denotes the power set of Λ. For x ∈ Λ and such a measure µ, we abuse notation and write µ(x) to mean µ({x}). For any real function f on Λ we use Eµ(f) = E(f) to denote the expectation of f w.r.t µ, namely Eµ(f) = ∑ x∈Λ f(x)µ(x). The following is known as the Holley inequality. Theorem 14 (Holley inequality). Let (Λ,≤) be a finite distributive lattice, and µ1,µ2 : P(Λ)→ [0, 1] be two probability measures on Λ such that µ1(x ∧ y)µ2(x ∨ y) ≥ µ1(x)µ2(y) For all x, y∈Λ. (7.1) Then for any increasing function f : Λ→ R, Eµ1(f) ≤ Eµ2(f). See [15] for a proof. A simple corollary of this inequality is the following lemma which we use in the next section. Lemma 10. Let W0, . . .,Wm−1 be m random variables—each taking values in Σ—with two probability distributions µ1, µ2 such thatW0, . . .,Wm−1 are indepen- dent w.r.t both µ1 and µ2. Assume that for all t∈[m], µ1(Wt = 1) ≤ µ2(Wt = 1). Then for any increasing set A⊆Σm µ1((W0, . . .,Wm−1)∈A) ≤ µ2((W0, . . .,Wm−1)∈A). Proof. It’s easy to verify that (7.1) holds for µ1 and µ2. The result follows by applying the Holley inequality to the indicator function of A. 98 7.2. Bounds on capacity using probability We will also make use of the following inequality, known as the FKG inequality ( [8]). A probability measure µ : P(Λ)→ [0, 1] on Λ is called log supermodular if for all x, y∈Λ. µ(x ∧ y)µ(x ∨ y) ≥ µ(x)µ(y). (7.2) Theorem 15 (FKG inequality). Let Λ be a finite distributive lattice, and let µ : P(Λ) → [0, 1] be a log supermodular probability measure on Λ. Then for any monotone functions f, g : Λ→ R the following statements hold: 1. If f, g are both increasing or both decreasing then they are positively corre- lated, namely E(f)E(g) ≤ E(fg) 2. If f is increasing and g is decreasing or vice versa then they are negatively correlated, namely E(f)E(g) ≥ E(fg) It is shown in [15] that this inequality is an easy corollary of Theorem 14. As the proof is short, we reproduce it here for completeness. Proof. Observe that it is enough to prove the theorem for the case where both f and g are increasing—the other cases follow from this case by taking f̃ = −f or g̃ = −g, as appropriate. Now, by adding a large enough constant to f , we may assume without loss of generality that f is strictly positive. For a subset A⊆Λ, set µ1(A) = µ(A) and µ2(A) = ∑ x∈A(f(x)µ(x))/Eµ(f). Then it’s easy to check that (7.1) is satisfied and therefore since g is increasing, Eµ1(g) ≤ Eµ2(g) ⇐⇒ Eµ(g) ≤ ∑ x∈Λ f(x)µ(x)g(x) Eµ(f) ⇐⇒ Eµ(g)Eµ(f) ≤ Eµ(fg). 7.2 Bounds on capacity using probability Our goal in this section is to prove the following lower bound on the capacity of certain 2-dimensional constraints. Theorem 16. Let S = S(V) ⊗ S(H) be a 2-dimensional constraint with S(H) = RLL(0, 1) and S(V) an increasing 1-dimensional constraint over Σ. Let ϕ = 1+ √ 5 2 denote the golden mean and set µ∞(0) = 1ϕ2+1 and µ∞(1) = ϕ2 ϕ2+1 . Let G(V) = ((V (V), E(V)),L(V)) be a lossless presentation of S(V), andW(V) : E(V) → [0, 1] 99 7.2. Bounds on capacity using probability be the edge-weighting function given by W(V)(e) = µ∞(L(V)(e)). Let A be the |V (V)|×|V (V)| real matrix with entries indexed by V (V)×V (V) and given by (A)(i,j) = ∑ e∈E(V), σ(e)=i,τ(e)=j W(V)(e). Then cap(S) ≥ logϕ− 1 2 ( 1− cap(S(V))− log λ(A) ) , Remark 1. If S = RLL(1,∞)⊗S(V) with S(V) a decreasing 1-dimensional con- straint over Σ, then simultaneously changing ‘0’s to ‘1’s and ‘1’s to ‘0’s in every array of S would result in a constraint that satisfies the requirement of the above theorem and has the same capacity as S. Thus, we can also use the theorem to get a lower bound on cap(S). Proof. For a probability space (Ω,F, µ), and events A,B ⊆ Ω with µ(A) > 0, we denote by µ(·|A) : F → [0, 1] the conditional probability measure given by µ(B|A) = µ(B ∩ A)/µ(A) for all B∈F. For a word w∈Σ∗ of length m, m∈N, we index its symbols by [m] so that w = w0w1. . .wm−1. For m,n∈N, let µm×n : P(Σm×n) → [0, 1] be the uniform probability measure on Σm×n. Clearly, µm×n satisfies (7.2) with equality. For a subset A⊆Σm×n, let 1A : Σm×n → {0, 1} be the indicator function of A, and for integers i∈[m] and j∈[n], define the sets: Ri = R (m,n) i = {∆∈Σm×n : ∆i,∗ satisfies S(H)}, Cj = C (m,n) j = {∆∈Σm×n : ∆∗,j satisfies S(V)}, R = R(m,n) = ⋂ i∈[m] R (m,n) i , Ceven = C (m,n) even = ⋂ j∈[n], j even C (m,n) j , and Codd = C (m,n) odd = ⋂ j∈[n], j odd C (m,n) j . Then since S(H) and S(V) are both increasing, it follows that, 1R∩Codd and 1Ceven 100 7.2. Bounds on capacity using probability are increasing, and consequently, by the FKG inequality, µm×n(Sm×n) = µm×n(R ∩ Codd ∩ Ceven) = Eµm×n(1R∩Codd ·1Ceven) ≥ Eµm×n(1R∩Codd)Eµm×n(1Ceven) = µm×n(R ∩ Codd)µm×n(Ceven). It follows that cap(S) = lim (m,n)→∞ log |Sm×n| mn = lim m→∞ limn→∞ log(2mnµm×n(Sm×n)) mn ≥ lim sup m→∞ lim sup n→∞ log (2mnµm×n(Ceven)µm×n(R ∩ Codd)) mn = lim sup m→∞ lim sup n→∞ log ( µm×n(Ceven) · 2mn · µm×n(R)µm×n(Codd|R) ) mn = lim sup m→∞ lim sup n→∞ ( log (( 2−m ∣∣∣S(V)m ∣∣∣)dn/2e ∣∣∣S(H)n ∣∣∣m) mn + log ∏bn/2c i=1 µm×n ( C2i−1 ∣∣R ∩⋂i−1j=1C2j−1) mn ) = cap(S(H)) + 1 2 ( cap(S(V))− 1 ) + lim sup m→∞ lim sup n→∞ ∑bn/2c i=1 logµm×n ( C2i−1 ∣∣∣R ∩⋂i−1j=1C2j−1) mn . (7.3) We claim that µm×n(C2i−1|R∩ i−1⋂ j=1 C2j−1)≥µm×n(C2i−1|R), (7.4) for all m,n ∈ N and integer 1≤i≤bn/2c. This follows from the right inequality given in the next Proposition. Proposition 14. For all m,n∈N and i∈[n] µm×n(Ci|R ∩ ⋂ 1≤j≤i, j odd Ci−j) ≤ µm×n(Ci|R) ≤ µm×n(Ci|R ∩ ⋂ 1≤j≤i, j even Ci−j). 101 7.2. Bounds on capacity using probability The proof of Proposition 14 is given in Section 7.2.1. Using (7.4) in (7.3) and the well-known fact (c.f. [32]) that cap(S(H)) = logϕ, we obtain cap(S) ≥ cap(S(H)) + 1 2 ( cap(S(V))− 1 ) + lim sup m→∞ lim sup n→∞ ∑bn/2c i=1 logµm×n(C2i−1|R) mn = logϕ− 1 2 (1− cap(S(V)))+ lim sup m→∞ lim sup n→∞ ∑bn/2c i=1 logµm×n(C2i−1|R) mn The theorem now follows from the next lemma whose proof is given in Sec- tion 7.2.2 Lemma 11. The following statements hold: 1. For allm∈N, lim n→∞ ∑bn/2c i=1 logµm×n(C2i−1|R) n = 1 2 log ∑ w∈S(V)m ∏ j∈[m] µ∞(wj). 2. lim m→∞ log ∑ w∈S(V)m ∏ j∈[m] µ∞(wj) m = log λ(A). 7.2.1 Proof of Proposition 14 Proof. We prove the right inequality since it is the only one we use in the proof of Theorem 16 and the proof of the left inequality is similar. Let m,n∈N and i∈[n]. We again use the FKG inequality. Set A = {i − j : j ∈ N, 1 ≤ j≤i, j is even}, S = [m]×A and let Λ′ = ΣS , be the set of all binary-valued functions with domain S. We think of a function in Λ′ as a binary non-contiguous configuration on the “sites” of S. For such a configuration ∆′∈Λ′ and j∈A, we write ∆′∗,j to denote the 1-dimensional word ∆′(0, j)∆′(1, j). . .∆′(m− 1, j). As usual for ∆′1,∆′2∈Λ′ we write ∆′1≤∆′2 if ∆′1(j)≤∆′2(j) for all j∈S. Then Λ′ with this ordering is a finite distributive lattice. For a binary array ∆∈Σm×n, denote by pi(∆)∈Λ′ the restriction of ∆ to S, namely the configuration ∆′∈Λ′ given by ∆′(j) = ∆j for all j∈S. For a set B⊆Λ′, let pi−1(B) denote the inverse image of B under pi. Set µ = µm×n and let µ′ : P(Λ′) → [0, 1] be the push-forward probability measure of µ(·|R), defined by µ′(B) = µ(pi−1(B)|R). For j∈A, let C ′j = {∆′∈Λ′ : ∆′∗,j ∈ S(V)} and set C ′ = ⋂ j∈AC ′ j . Then pi −1(C ′j) = Cj and pi −1(C ′) = ⋂ j∈ACj . Finally, 102 7.2. Bounds on capacity using probability let f, g : Λ′ → R be the functions given by f(∆′) = 1C′(∆′) = { 1 If ∆′∈C ′. 0 otherwise. , and g(∆′) = µ(Ci|R ∩ pi−1({∆′})), for ∆′∈Λ′. We claim that the following statements hold. • µ′ is log supermodular. • f is increasing. • g is increasing. Assuming that these do hold, then by the FKG inequality, we have∑ ∆′∈Λ′ ( f(∆′)g(∆′)µ′(∆′) ) ≥ ∑ ∆′∈Λ′ ( f(∆′)µ′(∆′) ) ∑ ∆′∈Λ′ ( g(∆′)µ′(∆′) ) ⇐⇒ ∑ ∆′∈C′ ( µ(Ci|R ∩ pi−1({∆′}))µ(pi−1({∆′})|R) ) ≥∑ ∆′∈C′ µ(pi−1({∆′})|R) ∑ ∆′∈Λ′ ( µ(Ci|R ∩ pi−1({∆′}))µ(pi−1({∆′})|R) ) ⇐⇒ ∑ ∆′∈C′ µ(Ci ∩ pi−1({∆′})|R) ≥∑ ∆′∈C′ µ(pi−1({∆′})|R) ∑ ∆′∈Λ′ µ(Ci ∩ pi−1({∆′})|R) ⇐⇒ µ(Ci ∩ ⋂ j∈A Cj |R) ≥ µ( ⋂ j∈A Cj |R)µ(Ci|R) ⇐⇒ µ(Ci|R ∩ ⋂ j∈A Cj) ≥ µ(Ci|R), which is the desired inequality. So it remains to show that the above 3 claims hold. We begin by showing that µ′ is log supermodular. Let ∆′∈Λ′. Then µ′(∆′) = µ(pi−1({∆′}) ∩R) µ(R) = 2−mn ∣∣{∆∈Σm×n : ∀t∈[m]∀j∈A(∆t,j = ∆′(t, j) and ∆t,∗∈S(H))}∣∣ 2−mn|R| = ∏ t∈[m] ∣∣∣{w∈S(H)n : ∀j∈A wj = ∆′(t, j)}∣∣∣∣∣∣S(H)n ∣∣∣ (7.5) 103 7.2. Bounds on capacity using probability Let j0 < j1 < . . . < j`−1=i−2 be the elements of A and for symbols a, b ∈ Σ and integer u≥0, define αu(a) = |{w∈S(H)u : wa∈S(H)}|, δ(a, b) = |{x∈Σ : axb∈S(H)}| and ωu(a) = |{w∈S(H)u : aw∈S(H)}|. Since the memory of S(H) = RLL(0, 1) is 1, it readily follows that∣∣∣{w∈S(H)n : ∀k∈[`] wjk = ak}∣∣∣ = αj0(a0) ∏ k∈[`−1] δ(ak, ak+1) ωn−i+1(a`−1). Setting r = n− i+ 1, we can rewrite (7.5) as µ′(∆′) = ∏ t∈[m] αj0 ( ∆′(t, j0) ) (∏`−2 k=0 δ ( ∆′(t, jk),∆′(t, jk+1) )) ωr ( ∆′(t, j`−1) )∣∣∣S(H)n ∣∣∣ . Now, pick ∆′1,∆′2 ∈ Λ′, and set Γ′1 = ∆′1 ∧∆′2 and Γ′2 = ∆′1 ∨∆′2. µ′(∆1)µ′(∆2)= ∣∣∣S(H)n ∣∣∣−2m ∏ t∈[m] 2∏ s=1 αj0 ( ∆′s(t, j0) ) · ∏ t∈[m] ∏ k∈[`−1] 2∏ s=1 δ ( ∆′s(t, jk),∆ ′ s(t, jk+1) ) · ∏ t∈[m] 2∏ s=1 ωr ( ∆′s(t, j`−1) ) For all a, b, c, d∈Σ, it’s easy to verify that δ(a, b)δ(c, d) ≤ δ(a ∧ c, b ∧ d)δ(a ∨ c, b ∨ d), and obviously αj0(a)αj0(b) = αj0(a∧b)αj0(a∨b) and ωr(a)ωr(b) = ωr(a∧b)ωr(a∨b). Thus, µ′(∆1)µ′(∆2) ≤ ∣∣∣S(H)n ∣∣∣−2m ∏ t∈[m] 2∏ s=1 αj0 ( Γ′s(t, j0) ) · ∏ t∈[m] ∏ k∈[`−1] 2∏ s=1 δ ( Γ′s(t, jk),Γ ′ s(t, jk+1) ) · ∏ t∈[m] 2∏ s=1 ωr ( Γ′s(t, j`−1) ) = µ′(Γ′1)µ ′(Γ′2). 104 7.2. Bounds on capacity using probability Hence µ′ is log supermodular. The second claim that f is increasing follows immediately from the fact the S(V) is increasing. So it remains to show that g is increasing (this is where the proof of the left inequality of the proposition differs). Let ∆′∈Λ′. Then g(∆′) = µ(Ci|R ∩ pi−1(∆′)) = µ(Ci∩R∩pi −1(∆′)) µ(R∩pi−1(∆′)) = 2−mn ∑ w∈S(V)m ∣∣∣{∆∈Σm×n : ∆∗,i=w,∀t∈[m] ∆t,∗∈S(H),∀t∈[m]∀j∈A∆t,j=∆′(t,j) }∣∣∣ 2−mn ∣∣{∆∈Σm×n : ∀t∈[m]∀j∈A(∆t,j = ∆′(t, j) and ∆t,∗∈S(H))}∣∣ = ∑ w∈S(V)m ∏ t∈[m] ∣∣∣{v∈S(H)n : vi = wt, ∀j∈Avj=∆′(t, j)}∣∣∣∏ t∈[m] ∣∣∣{v∈S(H)n : ∀j∈A vj=∆′(t, j)}∣∣∣ , (7.6) For t∈[m], let qt = |{v∈S(H)i−1 : ∀j∈A vj=∆′(t, j)}|; then since S(H) has memory 1, it follows that for all t∈[m], and a∈Σ we have∣∣∣{v∈S(H)n : vi = a, ∀j∈Avj=∆′(t, j)}∣∣∣ = qtδ(∆′(t, i−2), a)ωn−i−1(a), and∣∣∣{v∈S(H)n : ∀j∈Avj=∆′(t, j)}∣∣∣ = qtωn−i+1(∆′(t, i−2)) Substituting this into (7.6), we obtain g(∆′) = ∑ w∈S(V)m ∏ t∈[m] δ(∆′(t, i−2), wt)ωn−i−1(wt) ωn−i+1(∆′(t, i−2)) = ∑ w∈S(V)m ∏ t∈[m] ψ∆′(t,i−2)(wt), where ψa(b) = δ(a,b)ωn−i−1(b) ωn−i+1(a) for all a, b ∈ Σ. Observe that ψa(0)+ψa(1) = 1 and that ψa(0), ψa(1)∈[0, 1] for all a∈Σ. It follows that g(∆′) is the probability that a randomly selected word consisting of m independent random bits, with the tth bit having a probability of ψ∆′(t,i−2)(1) to be 1, satisfies S(V). Now, let ∆′1,∆′2 ∈ Λ′ satisfy ∆′1 ≤ ∆′2. We claim that for each t∈[m] ψ∆′1(t,i−2)(1)≤ψ∆′2(t,i−2)(1). Indeed, let t∈[m] be such that ∆′1(t, i−2) = 0 and ∆′2(t, i−2) = 1. Then ψ∆′1(t,i−2)(1) = δ(0, 1)ωn−i−1(1) ωn−i+1(0) = ωn−i−1(1) ωn−i(1) , and ψ∆′2(t,i−2)(1) = δ(1, 1)ωn−i−1(1) ωn−i+1(1) = 2ωn−i−1(1) ωn−i+1(1) , and since ωn−i+1(1)≤2|S(H)n−i|=2ωn−i(1) it follows that ψ∆′1(t,i−2)(1) ≤ ψ∆′2(t,i−2)(1). As S (V) is increasing, Lemma 10 implies that g(∆′1)≤g(∆′2) and therefore g is increasing as claimed. 105 7.2. Bounds on capacity using probability 7.2.2 Proof of Lemma 11 Fix m∈N. Observe that for any n1, n2∈N we have µm×(n1+n2+1)(Cn1 |R) = |Cn1∩R|2−mn |R|2−mn = ∑ w∈S(V)m ∣∣∣{∆∈Σm×(n1+n2+1) : ∆∗,n1=w,∀t∈[m]∆t,∗∈S(H) }∣∣∣ |R| = ∑ w∈S(V)m ∏ j∈[m] ∣∣∣{v∈S(H)n : vn1 = wj}∣∣∣∣∣∣S(H)n1+n2+1∣∣∣ = ∑ w∈S(V)m ∏ j∈[m] µn1,n2(wj), (7.7) where we define µn1,n2 : Σ → [0, 1] to be the function given by µn1,n2(a) = |{v∈S(H)n1+n2+1 : vn1 = a}|/|S (H) n1+n2+1 | for a∈Σ. Since S(H) is increasing, for any v∈S(H)n1+n2+1 with vn1 = 0, the word v′ formed from v by changing the n1’th symbol to ‘1’ is also in S(H). It follows that for all n1, n2 ∈ N, µn1,n2(1)≥12 , and thus by (7.7), one has µm×(n1+n2+1)(Cn1 |R)≥(µn1,n2(1))m≥2−m. (7.8) We next show that for all a∈Σ lim (n1,n2)→∞ µn1,n2(a) = µ∞(a). (7.9) In fact it can be shown that the above limit exists (and equals a different µ∞) for all vertex constraints S(H) defined by a primitive graph. Let fn = |(RLL(0, 1))n|. It is well known (c.f. [32]) that fn is the (n + 2) Fibonacchi number and is given by fn = ϕ n ϕ 3 1 + ϕ2 + ϕ̄n ϕ̄3 1 + ϕ̄2 , (7.10) where ϕ̄ = 1 − ϕ. Since ∑a∈Σ µn1,n2(a) = ∑a∈Σ µ∞(a) = 1, it’s enough to 106 7.2. Bounds on capacity using probability show that (7.9) holds for a = 1. In this case, clearly, µn1,n2(1) = |{v∈S(H)n1+n2+1 : vn1 = 1}| fn1+n2+1 = |{x1y ∈ Σn1+n2+1 : x∈S(H)n1 , y∈S(H)n2 }| fn1+n2+1 = fn1fn2 fn1+n2+1 . By substituting (7.10) into the last equality and taking the limit as (n1, n2) → ∞ it can be verified that (7.9) holds for a = 1. Now, set L = log ∑ w∈S(V)m ∏ j∈[m] µ∞(wj). It follows from (7.7) and (7.9) that lim(n1,n2)→∞ logµm×(n1+n2+1)(Cn1 |R) = L. Let >0, and choose N∈N such that for all n1, n2≥N , | logµm×(n1+n2+1)(Cn1 |R) − L| < . For n∈N set An = {j∈[n] : j odd}, An,1 = {j∈An : j < N or j > n− 1−N} and An,2 = An\An,1. Then, for all n∈N with n > 2N , we have∣∣∣∣∣ ∑bn/2c i=1 logµm×n(C2i−1|R) bn/2c − L ∣∣∣∣∣ ≤ ∣∣∣∣∣ ∑ i∈An,1 logµm×n(Ci|R) |An| ∣∣∣∣∣+∣∣∣∣∣∣ ∑ i∈An,2 ( logµm×n(Ci|R)−L ) −|An,1|L |An| ∣∣∣∣∣∣ . Note that for all i∈An,2, it holds that i≥N and n − i − 1≥N ; thus | logµm×n(Ci|R)−L| < . Also by (7.8), it follows that for all i∈An,1, | logµm×n(Ci|R)|≤m. Therefore we obtain∣∣∣∣∣ ∑bn/2c i=1 logµm×n(C2i−1|R) bn/2c − L ∣∣∣∣∣ ≤ |An,1||An| m+ |An,2||An| + |An,1||An| L ≤ 2N(m+ L)bn/2c + , which is less than 2 for large enough n. It follows that lim n→∞ ∑bn/2c i=1 logµm×n(C2i−1|R) bn/2c = log ∑ w∈S(V)m ∏ j∈[m] µ∞(wj), which obviously implies the first statement of the lemma. 107 7.3. Open questions As for the second statement, for a path in G(V), define its weight to be the product of the weights of its edges. Denote byWm the sum of all the weights of paths of length m in G(V). Since G(V) is a lossless presentation of S(V) we have Wm |V (V)|2 ≤ ∑ w∈S(V)m ∏ j∈[m] µ∞(wj) ≤ Wm. On the other hand,Wm = 1tAm1 and, applying Perron-Frobenius theory, we have limm→∞ logWmm = log(λ(A)). The second statement follows. 7.3 Open questions Can Theorem 16 be generalized to other constraints S(H)? Can one use simi- lar probabilistic tools to obtain upper bounds on the capacity of axial products of monotone 1-dimensional constraints? Finally, can such tools be used to obtain bounds on the capacity of constraints in more than 2 dimensions? 108 Bibliography [1] R.J Baxter, Hard hexagons: exact solution, Journal of Physics A: Mathemat- ical and General 13 (1980), L61–L70. [2] R. Berger, The undecidability of the domino problem, Memoirs of the Amer- ican Mathematical Society (1966). [3] N.J. Calkin and H.S. Wilf, The number of independent sets in a grid graph, SIAM Journal of Discrete Math. 11 (1998), no. 1, 54–60. [4] M. Cohn, On the channel capacity of read/write isolated memory, Discrete Applied Mathematics 56 (1995). [5] J.C. de Souza, B.H. Marcus, R. New, and Wilson B.A., Constrained systems with unconstrained positions, IEEE Transactions on Information Theory 48 (2002), 866–879. [6] A. Desai, Subsystem entropy for Zd sofic shifts, Indagationes Mathematicae 17 (2006). [7] K. Engel, On the Fibonacci number of an m×n lattice, Fibonacci Quarterly 28 (1990), 72–78. [8] C.M. Fortuin, P.W. Kasteleyn, and J. Ginibre, Correlation inequalities on some partially ordered sets, Comm. Math. Phys. 22 (1971), no. 2, 89–103. [9] S. Friedland, On the entropy of Zd subshifts of finite type, Linear Algebra and its Applications 252 (1997), no. 1–3, 199–220. [10] S. Friedland, Multi-dimensional capacity, pressure and Hausdorff dimension, Mathematical System Theory in Biology, Communication, Computation and Finance (2003), 183–222. [11] S. Friedland and U.N. Peled, The pressure, densities and first order phase transitions associated with multidimensional SOFT, arXiv:0906.5176v3, http://arxiv.org/abs/0906.5176v3, submitted, 2009. 109 [12] D.S. Gaunt and M.E. Fisher, Hard-sphere lattice gases, I. plane-square lat- tice, J. Chem. Phys. 43 (1965), 2840–2863. [13] M.J. Golin, X. Yong, Y. Zhang, and L. Sheng, New upper and lower bounds on the channel capacity of read/write isolated memory, Discrete Applied Mathematics 140 (2004). [14] S. Halevy and R. Roth, Parallel constrained coding with application to two-dimensional constraints, IEEE Transactions on Information Theory 48 (2002), 1009–1020. [15] R.H. Holley, Remarks on the FKG inequalities, Comm. Math. Phys. 36 (1974), no. 3, 227–231. [16] A. Horn and C. Johnson, Matrix analysis, Cambridge University Press, 1985. [17] R. Horst and N.V. Thoai, DC programming: overview, Journal of Optimiza- ton Theory and Applications 103 (1999), no. 1. [18] K.A.S. Immink, Codes for mass data storage, Shannon Foundation Publish- ers, Eindhoven, 2004. [19] K.A.S. Immink and K. Cai, Simple classes of constrained systems with uncon- strained positions that outperform the maxentropic bound, IEEE International Symposium on Information Theory, 2008. [20] H. Ito, A. Kato, Zs. Nagy, and K. Zeger, Zero capacity region of multidi- mensional run length constraints, The Electronic Journal of Combinatorics 6 (1999). [21] H. Kamabe, Insertion rate and optimization of redundancy of constrained systems with unconstrained positions, Information Theory and Applications Workshop, 2009. [22] H. Kamabe, Lower bounds of capacity of 2D constraints, Information Theory and Applications Workshop, 2010. [23] P.W. Kasteleyn, The statistics of dimers on a lattice, Physica A 27 (1961), 1209–1225. [24] J.F.C. Kingman, A convexity property of positive matrices, The Quarterly Journal of Mathematics. Oxford. Second Series 12 (1963). [25] E.H. Lieb, Residual entropy of square ice, Phys. Rev. 162 (1967), no. 1, 162– 172. 110 [26] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding, Cambridge University Press, 1995 (reprinted 1999). [27] E. Louidor, The tradeoff function for a class of RLL(d, k) constraints, In review (SIAM Journal of Discrete Math). [28] E. Louidor and B.H. Marcus, Improved lower bounds on capacities of sym- metric 2-dimensional constraints using Rayleigh quotients, IEEE Transac- tions on Information Theory 56 (2010), no. 4, 1624–1639. [29] E. Louidor, B.H. Marcus, and R. Pavlov, Independence entropy of Zd shift spaces, Submitted to Acta Applicandae Mathematicae. [30] E. Louidor, T.L. Poo, P. Chaichanavong, and B.H. Marcus, Maximum inser- tion rate and capacity of multidimensional constraints, IEEE International Symposium on Information Theory, 2008. [31] S. MacLane and G. Birkhoff, Algebra, second ed., Chelsea Pub Co, 1999. [32] B. Marcus, R. Roth, and P. Siegel, Constrained systems and coding for recording channels, Handbook of Coding Theory, Chapter 20, Elsevier Sci- ence. [33] M. Marcus and H. Minc, A survey of matrix theory and matrix inequalities, Allyn and Bacon, 1964. [34] N. Markley and M. Paul, Maximal measures and entropy of zν subshifts of finite type, Classical mechanics and dynamical systems 70 (1981), 135–157. [35] Z. Nagy and K. Zeger, Capacity bounds for the three-dimensional (0,1) run- length limited channel, IEEE Transactions on Information Theory 44 (2000), 1030–1033. [36] E. Ordentlich and R. Roth, Independent sets in regular hypergraphs and multidimensional runlength-limited constraints, SIAM J. Discret. Math. 17 (2004), no. 4, 615–623. [37] T.L. Poo, Optimal code rates for constrained systems with unconstrained po- sitions: An approach to combining error correction codes with modulation codes for digital storage systems, Ph.D. thesis, Department of Electrical En- gineering, Stanford, 2005. [38] T.L. Poo, P. Chaichanavong, and B.H. Marcus, Trade-off functions for con- strained systems with unconstrained positions, IEEE Transactions on Infor- mation Theory 52 (2006), 1425–1449. 111 [39] M. Schwartz and J. Bruck, Constrained codes as networks of relations, IEEE Transactions on Information Theory 54 (2008), 2179–2195. [40] A.J. van Wijngaarden and K.A.S. Immink, Maximum runlength-limited codes with error control capabilities, IEEE J. Select. Areas Commun. 19 (2001), 602–611. [41] K. Weber, On the number of stable sets in an m× n lattice., Rostock. Math. Kolloq. 34 (1988), 28–36. [42] X. Yong and M.J. Golin, New techniques for bounding the channel capacity of read/write isolated memory, Data Compression Conference, 2002. 112
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Capacity of multidimensional constrained channels :...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Capacity of multidimensional constrained channels : estimates and exact computations Louidor, Erez 2010
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Capacity of multidimensional constrained channels : estimates and exact computations |
Creator |
Louidor, Erez |
Publisher | University of British Columbia |
Date Issued | 2010 |
Description | This work considers channels for which the input is constrained to be from a given set of D-dimensional arrays over a finite alphabet. Such a set is called a constraint. An encoder for such a channel transforms arbitrary arrays over the alphabet into constrained arrays in a decipherable manner. The rate of the encoder is the ratio of the size of its input to the size of its output. The capacity of the channel or constraint is the highest achievable rate of any encoder for the channel. We compute the exact capacity of two families of multidimensional constraints. We also generalize a known method for obtaining lower bounds on the capacity, for a certain class of 2-dimensional constraints, and improve the best known bounds for a few constraints of this class. Given a binary D-dimensional constraint, a D-dimensional array with entries in {0,1,⃞} is called "valid", for the purpose of this abstract, if any "filling" of the '⃞'s in the array with '0's and '1's, independently, results in an array that belongs to the constraint. The density of '*'s in the array is called the insertion rate. The largest achievable insertion rate in arbitrary large arrays is called the maximum insertion rate. An unconstrained encoder for a given insertion rate transforms arbitrary binary arrays into valid arrays having the specified insertion rate. The tradeoff function essentially specifies for a given insertion rate the maximum rate of an unconstrained encoder for that insertion rate. We determine the tradeoff function for a certain family of 1-dimensional constraints. Given a 1-dimensional constraint, one can consider the D-dimensional constraint formed by collecting all the D-dimensional arrays for which the original 1-dimensional constraint is satisfied on every "row" in every "direction". The sequence of capacities of these D-dimensional generalizations has a limit as D approaches infinity, sometimes called the infinite-dimensional capacity. We partially answer a question of [37], by proving that for a large class of 1-dimensional constraints with maximum insertion rate 0, the infinite dimensional capacity equals 0 as well. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-05-21 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 3.0 Unported |
DOI | 10.14288/1.0069991 |
URI | http://hdl.handle.net/2429/24888 |
Degree |
Doctor of Philosophy - PhD |
Program |
Mathematics |
Affiliation |
Science, Faculty of Mathematics, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2010-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/3.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2010_fall_louidor_erez.pdf [ 810.17kB ]
- Metadata
- JSON: 24-1.0069991.json
- JSON-LD: 24-1.0069991-ld.json
- RDF/XML (Pretty): 24-1.0069991-rdf.xml
- RDF/JSON: 24-1.0069991-rdf.json
- Turtle: 24-1.0069991-turtle.txt
- N-Triples: 24-1.0069991-rdf-ntriples.txt
- Original Record: 24-1.0069991-source.json
- Full Text
- 24-1.0069991-fulltext.txt
- Citation
- 24-1.0069991.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0069991/manifest