Open Collections

UBC Theses and Dissertations

Structure and randomness in arithmetic settings Wong, Erick Bryce 2012-12-31

Download

Media
[if-you-see-this-DO-NOT-CLICK]
ubc_2012_fall_wong_erick.pdf [ 926.3kB ]
[if-you-see-this-DO-NOT-CLICK]
Metadata
JSON: 1.0072975.json
JSON-LD: 1.0072975+ld.json
RDF/XML (Pretty): 1.0072975.xml
RDF/JSON: 1.0072975+rdf.json
Turtle: 1.0072975+rdf-turtle.txt
N-Triples: 1.0072975+rdf-ntriples.txt
Original Record: 1.0072975 +original-record.json
Full Text
1.0072975.txt
Citation
1.0072975.ris

Full Text

`Structure and Randomness in Arithmetic Settings by Erick Bryce Wong  B.Sc., Simon Fraser University, 1994 M.Sc., Simon Fraser University, 1997  A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Mathematics)  THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) August 2012 c Erick Bryce Wong 2012  Abstract We study questions in three arithmetic settings, each of which carries aspects of random-like behaviour. In the setting of arithmetic functions, we establish mild conditions under which the tuple of multiplicative functions [f1 , f2 , . . . , fd ], evaluated at d consecutive integers n + 1, . . . , n + d, closely approximates points in Rd for a positive proportion of n; we obtain a further generalization which allows these functions to be composed with various arithmetic progressions. Secondly, we examine the eigenvalues of random integer matrices, showing that most matrices have no rational eigenvalues; we also identify the precise distributions of both real and rational eigenvalues in the 2 × 2 case. Finally, we consider the set S(k) of numbers represented by the quadratic form x2 +ky 2 , showing that it contains infinitely many strings of five consecutive integers under many choices of k; we also characterize exactly which numbers can appear as the difference of two consecutive values in S(k).  ii  Preface A version of Chapter 2 was published as: • Wong, E. B. (2008) Simultaneous approximation of reals by values of arithmetic functions. Anatomy of Integers, volume 46 of CRM Proc. Lecture Notes, pp. 289–297. Amer. Math. Soc., Providence, RI. [86] Chapter 3 is based on two published papers: • Martin, G. and Wong, E. (2008) The number of 2 × 2 integer matrices having a prescribed integer eigenvalue. Algebra Number Theory, 2(8):979–1000. [55] • Martin, G. and Wong, E. B. (2009) Almost all integer matrices have no integer eigenvalues. Amer. Math. Monthly, 116(7):588–597. [54] The earliest manuscripts were drafted by Dr. Martin, based on my original research; thereafter, we shared equal responsibility for both writing and subsequent research. In addition, a version of Proposition 3.6 in Section 3.2 “Basic bounds on singularity” was published as Lemma 1.1 in: • Pataki, G., Tural M. and Wong, E. B. (2010) Basis reduction and the complexity of branch-and-bound. In Proceedings of the TwentyFirst Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1254–1261, Philadelphia, PA. SIAM. [66] My primary contribution to this paper was the formulation and proof of this lemma. A version of this proof was submitted to SIAM J. Computing in an extended form of the above paper. iii  Table of Contents Abstract  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  ii  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  iii  Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . .  iv  Preface  List of Tables  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii  List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . .  ix  Dedication  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  x  1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1  1.1  Setting the stage . . . . . . . . . . . . . . . . . . . . . . . . .  1  1.2  Modelling the primes  . . . . . . . . . . . . . . . . . . . . . .  2  1.3  Other arithmetic structures . . . . . . . . . . . . . . . . . . .  4  1.4  Density and the primes  . . . . . . . . . . . . . . . . . . . . .  5  1.5  Notation and outline  . . . . . . . . . . . . . . . . . . . . . .  7  1.6  Arithmetic functions . . . . . . . . . . . . . . . . . . . . . . .  8  1.7  Distribution of an arithmetic function . . . . . . . . . . . . .  9  1.8  Approximation by arithmetic functions  . . . . . . . . . . . .  11  1.9  Comparing arithmetic progressions . . . . . . . . . . . . . . .  12  1.10 Probability in number theory . . . . . . . . . . . . . . . . . .  13  1.11 Random matrices and eigenvalues  . . . . . . . . . . . . . . .  15  . . . . . . . . . . . . . . . . . . . .  16  1.12 Random integer matrices  iv  Table of Contents 1.13 Singularity and computation  . . . . . . . . . . . . . . . . . .  18  1.14 Rationality versus reality . . . . . . . . . . . . . . . . . . . .  19  1.15 Sums of two squares . . . . . . . . . . . . . . . . . . . . . . .  20  1.16 Patterns in sums of two squares  . . . . . . . . . . . . . . . .  22  1.17 Patterns in quadratic forms . . . . . . . . . . . . . . . . . . .  24  2 Approximation by Arithmetic Functions . . . . . . . . . . .  26  2.1  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . . .  26  2.2  Main theorem  . . . . . . . . . . . . . . . . . . . . . . . . . .  29  2.3  Generalization to linear functions  . . . . . . . . . . . . . . .  36  3 Eigenvalues of Random Integer Matrices . . . . . . . . . . .  40  3.1  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . . .  40  3.2  Basic bounds on singularity . . . . . . . . . . . . . . . . . . .  45  3.3  Eigenvalues and determinants  . . . . . . . . . . . . . . . . .  48  3.4  Proof of main results for n ≥ 2 . . . . . . . . . . . . . . . . .  51  3.5  Preliminaries for 2 × 2 matrices  . . . . . . . . . . . . . . . .  53  3.6  Enumeration theorems for integer eigenvalues . . . . . . . . .  56  3.7  Proof of key proposition for enumeration  . . . . . . . . . . .  62  3.8  Distribution of real eigenvalues . . . . . . . . . . . . . . . . .  68  3.9  The derivative of the distribution  . . . . . . . . . . . . . . .  72  . . . . . . . . . . . . . . . . . . . . . . . . .  75  4 Patterns in x2 + ky 2 . . . . . . . . . . . . . . . . . . . . . . . . .  76  3.10 Further remarks  4.1  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . . .  76  4.2  Local feasibility of runs . . . . . . . . . . . . . . . . . . . . .  78  4.3  Construction for k = 2  . . . . . . . . . . . . . . . . . . . . .  81  4.4  Construction for k > 2  . . . . . . . . . . . . . . . . . . . . .  82  4.5  How many values of k?  . . . . . . . . . . . . . . . . . . . . .  84  4.6  Preliminaries for gaps . . . . . . . . . . . . . . . . . . . . . .  86  4.7  Proof of existence of gaps . . . . . . . . . . . . . . . . . . . .  88  5 Conclusion 5.1  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  91  Arithmetic functions . . . . . . . . . . . . . . . . . . . . . . .  91 v  Table of Contents 5.2  Random matrices  . . . . . . . . . . . . . . . . . . . . . . . .  92  5.3  Patterns in quadratic forms . . . . . . . . . . . . . . . . . . .  94  Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  96  vi  List of Tables 4.1  Distribution of x2 + ky 2 modulo 8 . . . . . . . . . . . . . . .  79  vii  List of Figures 3.1 3.2  Distribution of real and rational eigenvalues . . . . . . . . . . √ √ Cases 0 ≤ δ ≤ 1, 1 ≤ δ ≤ 2, and 2 ≤ δ ≤ 2 . . . . . . . . .  45 72  viii  Acknowledgements My immediate thanks go to my supervisor Greg Martin, for his careful readings of many drafts, and helpful responses to my panicked e-mails. Through many different turns he has been a wise mentor, an inspiring teacher, an affable colleague, and a tireless supporter. I also thank my committee members, Izabella Laba and J´ozsef Solymosi, and my examiners, Igor Shparlinski, Mark Greenstreet, Michael Bennett and Gordon Semenoff, for the valuable contribution of their time and insights. It has been an unusually long journey from my Master’s degree to this point, shaped by countless individuals. I am truly grateful to all, though space permits me to mention but a few. I thank my family for their ever-present love and support. I feel quite fortunate to have always had the freedom to pursue my interests, however esoteric (let me also thank my nephew Paris for sharing in those interests). I recognize the generous assistance of the Natural Sciences and Engineering Research Council of Canada for a portion of this work. I am grateful for the support of the MS/MRI Research Group, especially Andrew Riddehough for his personal encouragement. I owe many thanks to all the staff of the mathematics department, particularly Lee Yupitun for her boundless knowledge of administrative vagaries. I am fortunate to have met people here who fuelled my mathematical curiosity in memorable ways; I am doubly fortunate to count them among my friends. In order of appearance: Vishaal Kapoor, Mike Tsiang, Florina Halasan, Nishant Chandgotia, Vasu Tewari and Paul Pollack. Finally, I wish to thank my wife Eva Koo for companionship, support and encouragement through good times and bad; for believing in me when I could not; for putting some structure in a random life. ix  Dedication To my father, who almost made it to 90 years. Happy Birthday, Dad.  x  Chapter 1  Introduction 1.1  Setting the stage  Within mathematics, the field of number theory is notorious for its ability to generate problems which are eminently plausible and well-supported by evidence, and yet for which a rigorous proof appears far beyond the reach of current technology. We would like to begin this chapter by attempting to convey to the reader why such a disparity should exist. One often finds in analytic number theory that the basic statistics of a particular object of interest can be understood quite well. At the same time, such an object seems to possess almost no tractable structure which serves to distinguish it from any other random process with the same statistics. Roughly speaking, so many of the things we are interested in behave as though they are random even though they are decidedly not so (for a concrete analogy, the digits of π seem to vary randomly, but the 763rd digit of π will always be 9 no matter how often one repeats the experiment). While it is entirely reasonable to make predictions about such objects, and these predictions are easily seen to be true for the vast majority of their random counterparts, this lack of structure leaves us with no easy way to certify that our particular object is not one of the unlucky few. A prime example to illustrate this dilemma is given by the familiar sequence of prime numbers: 2, 3, 5, 7, 11, 13, 17, 19, and so on. The primes are in some sense defined by their lack of structure, and it is indeed a nontrivial task to predict the spacings between each prime number and the next. Yet the statistics of this sequence have been well-understood for more than a century. If we use the conventional notation π(n) to denote the number of prime numbers between 1 and n, then the Prime Number Theorem (a 1  1.2. Modelling the primes triumph of late 19th century mathematics) states that π(n) is asymptotic to n/ log n, that is lim π(n)  n→∞  n = 1. log n  As n gets progressively larger, n/ log n becomes a better approximation to π(n), in the sense of relative error. Thus, a number chosen at random from the interval [1, n] bears a roughly 1 in log n chance of being prime (here, as in the entirety of this thesis, log n denotes the natural logarithm of n).  1.2  Modelling the primes  The preceding observation gives rise naturally to the Cram´er model, which has been rather successful at making quantitative predictions about primes (though far from perfectly so: see [28, 52]). We describe a slightly simplified version of this model: fix a number n > 2, and choose a random subset P (n) of the numbers {1, 2, . . . , n} by selecting each element independently with probability 1/ log n, so that P (n) has n/ log n elements on average. The Cram´er model predicts that if a given pattern is highly likely to occur within P (n), then the same pattern should also occur with similar frequency in the primes up to n. The advantage of such a simple random model lies in the ease with which one can analyze the properties of P (n). The simplest type of pattern one could reasonably apply this model to is that of twin primes: pairs of primes like 71 and 73 which differ by exactly 2. In other words, we wish to understand how often it is that both m and m+2 are simultaneously prime. In the random model, the probability that m and m + 2 are both included in P (n) is exactly 1/ log2 n for any 1 ≤ m ≤ n − 2, and so the expected number of “twins” in P (n) is (n − 2)/ log2 n, or roughly n/ log2 n. According to the Cram´er model, we should expect the true number of twin primes up to n to be roughly n/ log2 n. This prediction turns out to be quite good, with one significant caveat: notice that P (n) also contains consecutive pairs of the form {m, m + 1} with the same frequency n/ log2 n. On the other hand, this pattern is nigh impossible for primes, because primes are always odd (with the sole exception 2  1.2. Modelling the primes of the prime 2). The primes do possess a structure after all, at least in a negative sense: their lack of divisibility by 2 makes the pattern {m, m + 1} impossible in the primes (aside from {2, 3}), however common it is in the random model. The same reasoning has a more subtle influence on the pattern {m, m + 2}: the frequency should be better than the random model predicts, because if m is prime then m + 2 is certainly odd, and this makes it more likely to be prime than some random number which may equally be odd or even. This simple idea of examining a problem through the lens of “even” and “odd” (more generally, the remainders modulo a particular prime p or prime power pr ) is common in number theory, and is referred to as a local argument. By adjusting the na¨ıve estimate n/ log2 n with a certain correction factor for each prime p (called a local factor), and combining all such factors into one constant, Hardy and Littlewood (in 1923, long before much computational evidence was available) obtained an asymptotic conjecture (called a heuristic) for the number of twin primes up to n: #{m ≤ n : m, m + 2 are prime} ∼ C2 ·  n , log2 n  (1.1)  where C2 has the approximate value 1.32032363. The Hardy–Littlewood conjecture for twin primes has shown strikingly accurate agreement1 with computational evidence. In [63], Nicely computed the exact number of twin primes up to 1014 , and found the prediction to match to within one part in 2 million! (Incidentally, it was this computation which led to the discovery of the infamous “FDIV” division bug in the Intel Pentium processor.) However, we are very far from understanding how to prove that this estimate remains accurate for all larger values of n. In fact, it remains an open problem whether there are infinitely many twin primes, let alone that they occur with the exact frequency predicted by Hardy and Littlewood. 1  To achieve this striking accuracy, one should first replace n/ log2 n by the almost n identical integral 2 (log t)−2 dt; this relates to the simplification we referred to earlier.  3  1.3. Other arithmetic structures  1.3  Other arithmetic structures  The twin primes problem of the previous section can be viewed as a special case of the study of the gaps between consecutive primes. If we denote the sequence of primes in increasing order by p1 , p2 , p3 , . . ., we may restate the twin primes conjecture as “how often is pn+1 −pn = 2?”. More generally, one could naturally ask what is known about the spacing dn = pn+1 − pn ? (The desire to understand gaps between primes was the motivation for Cram´er introducing his random model.) The Prime Number Theorem can easily be used to deduce that, on average, dn has size roughly equal to log n. While the state of our knowledge is insufficient to conclude that dn = 2 infinitely often, it is a remarkable achievement of Goldston, Pintz and Yıldırım [26] that dn can be arbitrarily small compared to its average value: for any > 0, there are infinitely many n for which dn < log n. One might also ask for longer patterns than twin primes. Note that it is unreasonable to expect {m, m + 2, m + 4} to all be prime for local reasons modulo 3 (there is a sole exception at m = 3). But there is no obvious obstruction that would prevent {m, m + 6, m + 12} from all being prime simultaneously. Here again, Hardy and Littlewood have a more general conjecture which encompasses all such patterns, and it has been generalized further by various authors [5, 34, 71]. These all reflect a broad principle: if there is no simple reason why a given pattern of primes should not occur, then it should occur infinitely often, with an asymptotically predictable frequency. However, given our inability to prove this for a mere pattern of length 2, there is little hope of establishing such a general claim. However, there is one other type of pattern for which great success has been achieved, particularly in recent years. Notice that the previous pattern {m, m + 6, m + 12} consists of three equally-spaced values in arithmetic progression. If we relax the demand for a particular spacing and allow for any pattern of the form {m, m + d, m + 2d} with some integer d ≥ 1, then the question becomes: do the primes contain infinitely many arithmetic progressions of length 3? We will shortly address this question and its analogues to longer arith4  1.4. Density and the primes metic progressions, but first we take a brief excursion to describe some closely related and beautiful work from the field of additive combinatorics.  1.4  Density and the primes  Given a set A ⊆ N of natural numbers, one often describes how wellpopulated A is, relative to N, by the quantity δ := lim  x→∞  #{n ≤ x : n ∈ A} . x  We say that A has asymptotic density δ, provided this limit exists (if it does not converge, one can still define lower and upper densities by taking the lim inf or lim sup). For example, the set of all even numbers has asymptotic density 21 , and this could be loosely interpreted as the “probability” that a natural number, chosen at random, is even. However, some care is warranted in this interpretation, as we discuss further in Section 1.10. Roth proved in 1953 [68] that every set A having a positive density δ > 0 must contain infinitely many arithmetic progressions of length 3. So long as A is “dense”, this pattern — which we abbreviate as 3-AP — is inherently unavoidable, no matter how one tries to arrange the elements of A to prevent it from recurring. We shall not describe Roth’s elegant proof in complete detail, but we feel compelled to give a loose account of its clever use of a dichotomy between randomness and structure. If A is distributed “randomly” enough (in a sense that can be made precise in terms of discrete Fourier transforms), then it can be shown that A contains just as many 3-APs as a typical set whose elements are chosen at random with probability δ. How many is that? In this truly random model, every given subset of size 3 occurs with probability δ 3 > 0, so a random set will typically contain many 3-APs. On the other hand, if A is not “random” in this specific sense, Roth showed that there must be some structure within A that allows one to pass to a new set A having two key properties: A contains fewer 3-APs than A, and the density of A is appreciably greater than δ. 5  1.4. Density and the primes Suppose now that A doesn’t contain any 3-APs. Then it does not fall into the “random” case, and so we may replace it by A which has higher density. Since A also lacks 3-APs, it too can be replaced by A , and so on. Eventually we arrive at a set whose density is so high (namely, δ > 23 ) that it cannot avoid any pattern of length 3, a contradiction. Szemer´edi later extended Roth’s theorem to arithmetic progressions of any length k, through an extraordinarily intricate argument that introduced entirely new notions of structure and randomness [80]. Since then, much sharper versions of both Roth’s and Szemer´edi’s theorems have been found: the current best results are due to Sanders [70] and Gowers [27]. Let us now return to the question from the end of the previous section. Recall that the Prime Number Theorem predicts the density of primes up to x to be about 1/ log x, so the asymptotic density of the primes is 0. Unfortunately, Roth’s theorem does not apply to sets as sparse as the primes. There aren’t enough primes to justify the existence of 3-APs on the sole basis that it is unavoidable for any set with the same density (although the refinement by Sanders comes breathtakingly close to doing so). Nevertheless, the primes do contain infinitely many 3-APs (with the same frequency as predicted by random models); this was proven by van der Corput [84] in 1939! Essentially, one can show that the primes are distributed sufficiently nicely that they already fall into the “random” case of Roth’s proof: the necessary estimates were present two years earlier in Vinogradov’s pioneering work on exponential sums [85]. It took many more decades to establish the existence of longer k-APs in the primes. In part this is because the precise sense of “random” that is needed to enumerate k-APs did not begin to emerge until the seminal work of Host and Kra [40], and Gowers [27]. Building on these insights and much more, Green and Tao [29] proved a landmark achievement: the primes contain arbitrarily long arithmetic progressions. More recently they have shown, together with Ziegler [30–33], that such progressions occur with the  6  1.5. Notation and outline precise frequency predicted by random models: for any k ≥ 2, #{k-APs within primes up to n} ∼  Dk n2 , · 2(k − 1) logk n  (1.2)  where the constant Dk accounts for all local factors (the other terms arising purely from the na¨ıve model).  1.5  Notation and outline  With the necessary background in place, we now go into some detail about the specific questions considered in this dissertation, which cross through all of the themes of the previous sections. Starting in Section 1.6, we discuss arithmetic functions, which constitute the main focus of Chapter 2. This discussion continues until Section 1.11, where we turn to the topic of random matrices from Chapter 3. From Section 1.15 onward, we discuss the combinatorial patterns central to Chapter 4. Let us take a moment to introduce some common notation that will appear throughout this work. We use the standard convention that N refers to the natural numbers beginning with 1, not 0. Most Roman-lettered variables, unless otherwise specified, are assumed to take values in N, especially n, m, k, d; we shall reserve p to indicate a prime value. The variable  merits some attention:  it most often refers to a (small) positive real number, but we use  in some  parts for a quantity that is ±1 (the distinction will be obvious in context). If g(n) is a positive function, the Vinogradov notation f (n)  g(n) indi-  cates that there is a constant C > 0 such that |f (n)| ≤ Cg(n) regardless of the choice of n. This naturally generalizes to functions of multiple variables, and we may use a subscript to denote that the implied constant may depend on some of these variables. For example, it is well-known that log n for any  n  > 0.  When working with error terms it is convenient to use the Landau notation O(g(n)) to abbreviate a quantity that is if said quantity is  g(n), and likewise O (g(n))  g(n). We also use o(g(n)) to denote a quantity which  is asymptotically much smaller than g(n); that is, f (n) = o(g(n)) if and 7  1.6. Arithmetic functions only if limn→∞ f (n)/g(n) = 0. In this case, a subscript as in o (g(n)) allows the rate of convergence of this ratio to depend non-uniformly on . The notation d | n means that d divides n (or, n is divisible by d). When p is prime, pr  n more specifically means that pr exactly divides n,  so that pr | n but pr+1 n. The notation (m, n) is often used for the GCD or greatest common (positive) divisor of m and n. We say that m and n are relatively prime or coprime if (m, n) = 1. At times we may use (·, ·) to instead denote an ordered pair; again, the context should make this clear. When a set is written in comprehension notation such as {n ∈ N : n ≤ x}, we will often prepend # to denote the cardinality, as seen in (1.1) or (1.2).  1.6  Arithmetic functions  The study of arithmetic functions is of central interest in analytic number theory. Strictly speaking, any real-valued function f : N → R (or, more generally, a complex-valued function) defined on the natural numbers could be called an “arithmetic function”, but in actual usage the term carries a connotation that the value of f (n) has some relation to the divisibility properties of n. The most commonly studied class of arithmetic functions are the multiplicative functions, which are defined to satisfy f (1) = 1 and f (mn) = f (m)f (n),  whenever (m, n) = 1.  (1.3)  (The condition f (1) = 1 is nearly redundant, and could be deduced from (1.3) if we ignore the constant function f ≡ 0.) It is not hard to see that if the function f is multiplicative then it is uniquely determined by the values f (pr ) taken at all prime powers pr . It therefore suffices to specify a given multiplicative function only on prime powers, and allow the definition to extend to all natural numbers via (1.3). We give below a brief list of special functions, being already of natural interest, which also enjoy this multiplicative property. • The Euler totient function φ(n) counts the number of integers between 1 and n that are relatively prime to n (also the number of 8  1.7. Distribution of an arithmetic function invertible elements in the ring Z/nZ). This is multiplicative, with φ(pr ) = pr−1 (p − 1) for every r ≥ 1. • The divisor function τ (n) counts the number of positive divisors of n (including n itself). This is multiplicative, with τ (pr ) = r + 1. • The sum-of-divisors function σ(n) is the sum of all positive divisors of n. This is multiplicative, with σ(pr ) = 1 + p + p2 + · · · + pr =  pr+1 −1 p−1 .  The above three functions are among the most well-studied arithmetic functions in number theory, yet much about them is still unknown. To give a sobering example, mathematicians in ancient Greece were interested in socalled “perfect” numbers such as 6 = 1 + 2 + 3 and 28 = 1 + 2 + 4 + 7 + 14, which are exactly equal to the sum of their respective divisors (not counting the original number itself). In modern notation, the number n is perfect if and only if σ(n)/n = 2. The simple question “does there exist an odd number that is perfect?” remains unsolved despite being posed over two millennia ago (it may thus stake a reasonable claim as the world’s oldest unsolved problem). Note that the function σ(n)/n is itself multiplicative (more generally, this is true of any product or quotient of multiplicative functions), and like most such functions, its value depends quite crucially on the prime factors of n. As witnessed by the previous sections, we certainly do not have perfect knowledge of the primes, and multiplicative functions likewise remain an area of intense study. While the previous paragraph highlights the difficulty of understanding when σ(n)/n is exactly equal to 2, there has been much more success in establishing the approximate behaviour of σ(n)/n: how often does it fall between, say, 1.999 and 2.001?  1.7  Distribution of an arithmetic function  The previous question was settled by Davenport [15], who showed that for any fixed real numbers 1 ≤ s < t, the set of positive integers n for which s < σ(n)/n < t has a positive asymptotic density δ(s, t) > 0. (Note that 9  1.7. Distribution of an arithmetic function σ(n) is always larger than n, so there is no reason to consider values below 1.) Furthermore, this density varies continuously with s and t, so the density associated to any single value of σ(n)/n is zero. An analogous result was established earlier by Schoenberg [73] for the multiplicative function n/φ(n). The two functions are quite similar in that they both increase as the number of prime factors of n grows, with smaller primes having significantly greater effect. A celebrated theorem of Erd˝os and Wintner [19] shows that both results fall under a general phenomenon common to many positive-valued multiplicative functions f (n): under mild convergence conditions on the logarithm2 g(n) := log f (n), there is a continuous function F (t) such that, for any t ∈ R+ , the asymptotic density of the set {n : f (n) ≤ t} exists and is equal to F (t). We reproduce these conditions below, which are in fact both necessary and sufficient (in each case the summation index p is restricted to prime values):  |g(p)|>1  1 , p  |g(p)|≤1  g(p) , and p  |g(p)|≤1  g 2 (p) converge; p  g(p)=0  1 diverges. p (1.4)  The function F (t) closely recalls the (cumulative) distribution function of a random variable in probability, and it too is called the distribution function of f (n). However, we remind the reader that there is no well-defined random variable that concretely represents all values of f (n); rather, we are taking the limiting distribution of f (n) taken over n ≤ x as x → ∞. The reader familiar with measure-theoretic probability will recognize this as a weak limit of measures. While one can write down an explicit formula for the distribution of σ(n)/n in terms of its characteristic function or Fourier transform (first established in a later paper of Schoenberg [74]), such an expression does not lend itself to high-precision computation. It was only very recently that Kobayashi in his doctoral thesis [45] computed F (2) to four decimal places of accuracy, showing that the density 1 − F (2) of the abundant numbers 2  The resulting function g(n) is, not surprisingly, called an additive function, since it satisfies g(mn) = g(m) + g(n) for (m, n) = 1.  10  1.8. Approximation by arithmetic functions {n : σ(n)/n > 2} is approximately 0.2476.  1.8  Approximation by arithmetic functions  The Erd˝ os–Wintner theorem can be interpreted as saying that many positive multiplicative functions f (n) can be modelled (subject to the above caveat) by a continuous random variable. Furthermore, for the two cases considered by Davenport and Schoenberg, this distribution function is strictly increasing, so that f (n) approximates, within any desired accuracy , every real number s ≥ 1 for a positive frequency of n (F (s + ) − F (s − ), to be exact). In other words, the set of values {f (n) : n ∈ N} is topologically dense in [1, ∞), and in a rather strong sense (every point has a large number of approximants). Let us now consider a stronger question that may be reminiscent of the twin primes problem: since n and n + 1 share no common prime factors, it is reasonable to believe that f (n) and f (n + 1) should behave “independently” in some sense. Is it possible for f (n) to approximate one number s1 while f (n + 1) simultaneously approximates a different number s2 ? Given that the values of f (n) are dense in [1, ∞), is it true that the pairs of values (f (n), f (n + 1)) are dense in [1, ∞) × [1, ∞)? Schinzel and Wang [72] proved a result in this general direction for the specific function φ(n): for any dimension d ≥ 1, the vector of ratios φ(n + 1) φ(n + 2) φ(n + d) , ,..., φ(n) φ(n) φ(n)  (1.5)  can approximate any point in (0, ∞)d arbitrarily well. We remark that while n and n + k may have some prime factors in common, their GCD cannot exceed k, so it is fair to expect φ(n) and φ(n + k) to retain some weaker level of independence (when n is large and k is fixed). This is, however, true in a slightly weaker sense than previously: the set of n for which (1.5) approximates a given point to within  (in every coordinate) might have  asymptotic density 0. Shortly afterward, Erd˝ os and Schinzel [20] established a wide generaliza11  1.9. Comparing arithmetic progressions tion for an arbitrary multiplicative f (n) with certain mild hypotheses: for any dimension d ≥ 1 there is a constant cd such that the vectors (f (n + 1), f (n + 2), . . . , f (n + d))  (1.6)  are dense in [cd , ∞)d . Note that our initial speculation that c2 = 1, in the classical case f (n) = σ(n)/n, is actually false: it is impossible for f (n) and f (n + 1) to both be very close to 1, since one of n or n + 1 must be even, and for any even number m, f (m) ≥ 32 . So it is genuinely necessary to prescribe a different cd for each dimension: the influence of local factors is felt here once more. Moreover, the Erd˝ os–Schinzel proof shows that the set of n which approximate any given point in [cd , ∞)d (within some fixed tolerance ) has a positive asymptotic lower density. The hypotheses on f (n) are similar to the Erd˝ os–Wintner conditions, but they do not necessarily imply that f has a distribution function, so it would be unreasonable to expect the aforementioned set to have a precise (nonzero) asymptotic density. However, as an approximation property this is arguably just as strong: we still have that a positive proportion of n ≤ x are good approximants, and for some values of x this proportion may be higher still.  1.9  Comparing arithmetic progressions  A related problem was more recently considered in a brief note of Newman [62], which is motivated by a simple observation: while φ(n)/n is wellmodelled by a random variable (whose average value works out to exactly 6/π 2 ), the function φ(6n) contains an inherent bias: it is easy to verify that if m = 6n, then φ(m)/m ≤ 31 , which is considerably smaller than average. Therefore, we might expect that the inequality φ(6n) > φ(6n + 1) should occur very rarely3 , and rarer still for φ(30n) > φ(30n + 1). Indeed, the former does not occur until n ≈ 4.1 × 1055 , and the smallest solution to the 3 In fact, one can show that the average value of φ(m)/m for m = 6n + 1 is 9/π 2 (much higher than average), which should lower our expectations even further.  12  1.10. Probability in number theory latter, explicitly computed by Martin [53], is 1116 digits long! Nevertheless, Newman showed that such unusual reversals do occur at infinitely many n, and more generally this is true for any inequality of the form φ(an + b) > φ(cn + d) where a, b, c, d are fixed integers satisfying ad − bc = 0. (This latter constraint is necessary to prevent cn + d from being a scalar multiple of an + b, which would force them to have almost exactly the same prime factors: for instance, it is not hard to see that φ(n) > φ(2n) can never occur.) In Chapter 2, we will show that Newman’s result may be subsumed into a more general version of the Erd˝os–Schinzel theorem that applies to such arithmetic progressions. Specifically, if a1 n+b1 , a2 n+b2 , . . . , ad n+bd are any choice of non-constant linear functions (none of which is a scalar multiple of another), then the vectors (f (a1 n + b1 ), f (a2 n + b2 ), . . . , f (ad n + bd ))  (1.7)  are dense in [c, ∞)d for some constant c (which depends on the specific choice of linear functions and not just d), in the same strong sense as before (there are many are good approximants to each point). Our result is more general in one additional sense: each component of the vector may have its own individual multiplicative function in place of f (provided these functions all satisfy similar conditions). As a consequence of our theorem, not only does the inequality φ(6n) > φ(6n + 1) occur with at least some positive (albeit very close to 0) frequency, but this statement remains true even if we specify that the ratio φ(6n)/φ(6n + 1) should agree with π to 100 decimal places!  1.10  Probability in number theory  It seems appropriate at this point to elaborate on the loose interpretation of probability which recurred in Sections 1.4 and 1.7. Many results in this area can be paraphrased in probabilistic language: already we have witnessed the terms “probability”, “average value” and “distribution function”. Certainly, 13  1.10. Probability in number theory the concept of a “positive integer chosen at random” has a natural intuitive appeal, perhaps even more so than that of a real number chosen randomly from the interval [0, 1]. But while the latter concept is readily formalized as a legitimate well-defined random variable, the former is at odds with the measure-theoretic foundations of probability theory. There is no uniform measure that can be placed on a countably infinite set like the naturals (although asymptotic density is uniform, it is only finitely additive). Ultimately, any (non-trivial) statement that refers to “random integers” is actually describing a limiting process (where at each step of the limit, the probability space remains finite and well-defined): density as a limit of finite probabilities, average value as a limit of finite averages, and distribution function as a (weak) limit of discrete distributions. But this interpretation, while appealing, tends to obscure the very real error terms underlying the convergence to such limits. To give a simple concrete example, the (limiting) probability that a number is even is 21 , but the true proportion of even numbers in {1, . . . , n} is  1 2  + O( n1 ), the implied constant depending on whether n itself is odd or  even. Similarly, the probability that a number is divisible by 3 is  1 3  + O( n1 ).  The events that a random integer is “divisible by 2” and “divisible by 3” may be independent in the sense of limits (that is, 21 · 13 = 16 ), but this ignores the error terms that appear along the way. When working with large numbers of such not-quite-independent events, the accumulation of these error terms can be very significant. For instance, as mentioned in [28], one cannot use this simple approach to accurately √ count the number of primes between n and n (by excluding all numbers √ divisible by some prime p ≤ n). Nevertheless, Brun [12] found clever ways to handle these errors to show that the number of twin primes, while still poorly understood, cannot be vastly higher than predicted by Hardy and Littlewood’s heuristic (1.1). These techniques gave rise to the modern field of sieve theory. Having discussed some of the inherent complications of thinking about integers in probabilistic terms, we feel that it is justified by the fact that many beautiful theorems could not be described so elegantly in any other 14  1.11. Random matrices and eigenvalues form. We give two such examples below (the latter is a celebrated result of Erd˝ os and Kac [17]; together with the Erd˝os–Wintner theorem, these form the cornerstone of probabilistic number theory). • The probability that two random integers are relatively prime is 6/π 2 . • The number of prime factors of a random n is normally distributed with mean and variance both equal to log log n.  1.11  Random matrices and eigenvalues  Let us now turn to a different type of random structure (still arithmetical in nature), for which it is possible to determine the distribution with a much greater precision than most of our previous examples. We first recall some terminology from undergraduate linear algebra: given a square matrix A, we say that A has an eigenvalue λ ∈ C if there exists a nonzero vector x (called an eigenvector) such that Ax = λx. Any given matrix possesses only finitely many eigenvalues: to be specific, if A is an n × n matrix, it has exactly n eigenvalues given by the roots of a certain nth degree polynomial (possibly with some repetitions). These values yield significant information about the structure of A, permitting numerous applications in diverse areas including quantum physics, statistics and epidemiology. Perhaps the most prominent usage of all is the PageRank algorithm pioneered by Google Inc. [11]: if A is suitably built to represent all links between web pages, then the principal eigenvector of A (one that is associated with the largest eigenvalue) yields a direct measure of each page’s relative importance. This establishes the set of eigenvalues of a matrix (colourfully known as its spectrum) as a central object of study: just as prime factors are intrinsic to integers, so too are eigenvalues intrinsic4 to matrices. Given a large, or even infinite, family of matrices, we are naturally led to wonder about the statistical distribution of eigenvalues within. In keeping with the arithmetic 4  The prefix “eigen-” derives from a Germanic word meaning “self” or “innate”.  15  1.12. Random integer matrices theme of this thesis, we study in Chapter 3 the (initially vague) question: what can one say about the eigenvalues of a random matrix of integers? Before we expand on this question in the next section, we would be remiss not to mention some classical work on random matrices, much of which has been devoted to continuous (and often symmetric) random models originating from quantum physics. Even in number theory, one of the most frequently discussed matrix models is based not on integers but on complex numbers: it is a probability distribution defined on n×n Hermitian matrices5 known as the Gaussian unitary ensemble (GUE). The significance of this model to number theory is that the eigenvalues of random GUE matrices, when n is very large, appear to have spacing that is statistically identical to that of the zeros of the Riemann zeta function; this surprising connection was first made at a (now-famous) chance meeting between Montgomery [58] and physicist Freeman Dyson. (Though it does not occupy a major role in this thesis, the Riemann zeta function may well be the most important object in all of analytic number theory; a greater understanding of its zeros would answer a wide range of outstanding questions about the distribution of primes — see [9] for a nice summary.) A beautiful result of Edelman [16] is also worth mentioning here: if we populate an n × n matrix with independent entries randomly drawn from a standard normal distribution, the probability that it is diagonalizable over R (meaning essentially6 that its eigenvalues are all real) is exactly 2−n(n−1)/4 .  1.12  Random integer matrices  Our own investigation (jointly with Greg Martin) begins with a question similar to Edelman’s, which was posed by Hetzel, Liew, and Morrison in [37]: what is the probability that a random n × n integer matrix is diagonalizable over the field of rational numbers? As we have previously seen, this question does not refer to any single 5  Hermitian matrices are a complex-valued analogue of real symmetric matrices. These are not precisely equivalent, should an eigenvalue be repeated; however, this complication occurs with probability 0 so we may safely ignore the distinction. 6  16  1.12. Random integer matrices distribution of integer matrices; it must be understood as a limiting process — see the introduction of [37] for a lively discussion justifying the use of the term “probability” for this purpose. Just as before, we will restrict the integers to a finite set: for any k ≥ 1, we can certainly populate an n × n matrix M (k) with entries chosen independently at random from the integers between −k and k. The previous question then becomes: as k → ∞, what happens to the probability that M (k) is diagonalizable over Q? We expect that this is equal to the probability that every eigenvalue of M (k) is rational, and indeed Hetzel et al. rigorously justified this equivalence (the two probabilities are not exactly equal but they do have the same limit as k → ∞). Computational experiments led them to conjecture that both probabilities converge to 0 (except in the case n = 1, since 1 × 1 matrices are already diagonal in a vacuous sense). We confirm in Theorem 3.1 that this is indeed true. In fact we prove a stronger result: the probability that M (k) has any rational eigenvalues tends to 0. In other words, a typical matrix does not have any rational eigenvalues, which generalizes earlier results of [47] and [37] for the case n = 2. Quantitatively, we show that this probability is On, (k −1+ ) for any > 0, which certainly goes to 0 for k large (we could take, say,  = 12 ).  In answering the question of Hetzel et al., we are fixing the matrix size n and allowing the size of the entries k to grow. One might equally ask the converse problem: what happens if we instead fix the size of the entries, but allow the matrix to grow? This has been extensively studied in the more general formulation that the entries are chosen from any fixed probability distribution X. The case where X is a standard normal distribution was studied by Ginibre [24], who proved that as n → ∞ the eigenvalues become uniformly distributed — in a limiting sense — throughout the unit disk of √ radius n (this random matrix model, now called the Ginibre ensemble, was also the setting of Edelman’s result). Girko [25] showed that this limit occurs for a wider class of distributions X, and proposed that it should be universal, meaning that any random model of this type should converge to this same distribution (provided X has the same mean and variance). Quite recently, a series of papers by Tao and Vu [82, 83] have successfully 17  1.13. Singularity and computation established Girko’s circular law in full generality (in fact, their most recent result holds for a slightly wider class of random matrices than originally conjectured).  1.13  Singularity and computation  A closely related, and perhaps simpler, variation of the eigenvalue problem is to find the probability that the random matrix M (k) is singular (equivalently, that it has an eigenvalue of 0). In fact, our proof of Theorem 3.1 relies strongly on obtaining a quantitative upper bound of the form On, (k −2+ ) for this singularity probability. Whereas quantum mechanics was a key motivation for the study of Gaussian models, questions about discrete distributions arise more naturally in the field of theoretical computer science. A well-known problem in combinatorial optimization is integer programming: given a matrix A and a vector b, one wishes to find an integer vector x such that Ax ≤ b (here the notation a ≤ b means that ai ≤ bi for every index i). Were it not for the restriction to integer values, this could be done efficiently by standard techniques for linear programming. With this restriction, the problem is classically known to be NP-complete: in a very precise sense, it is just as hard to solve as several thousand other such problems, widely regarded as intractable. While it is justifiably hard to solve integer programs in generality, the typical case is quite different (say, if A is randomly chosen as before). Our bounds for the singularity probability were adapted by Pataki, Tural and this author [66] to show that most instances can be solved very easily, essentially using a na¨ıve algorithm known as branch-and-bound. This may seem counterintuitive, but it is not uncommon: many NP-complete problems (not all, though; see [49]) are known to have good typical-case behaviour with very few difficult instances. In contrast, the famous problem of factoring an integer into primes is believed to be quite the opposite, with very few easy cases7 (it is still unknown whether factoring is NP-complete). 7  Seeing that the security of Internet commerce currently relies on this fact, we sincerely hope this belief is not overturned in the immediate future!  18  1.14. Rationality versus reality  1.14  Rationality versus reality  For the special case of 2 × 2 random matrices, we are able to give a very sharp answer to Hetzel et al.’s original question: Pr(M (k) is diagonalizable over Q) =  C log k 1 , +O k k  where C ≈ 0.55873957 is an explicit constant. In fact, we devote a significant part of Chapter 3 to calculating the exact limiting distribution of rational eigenvalues of M (k) in the case n = 2. Note that we use the term “distribution” in a broader sense than distribution functions: the concept is actually much closer to that of a probability density function, except that it has total area 2 rather than 1, since a matrix has two eigenvalues. To describe this distribution, it is convenient to consider the eigenvalues of the scaled-down matrix k1 M (k) (which are simply the eigenvalues of M (k), divided by k). If one plots a histogram of the eigenvalues of k1 M (k), restricting to only those that are rational, these histograms will converge as k → ∞ to the distribution depicted by the solid line in Figure 3.1, which is defined explicitly as a piecewise smooth function.8 On the other hand, this limiting process k1 M (k) converges (in the weak sense) to a random matrix with real entries uniformly distributed on the interval [−1, 1]. For such a continuous random matrices, there is no obvious structure to the set of rational eigenvalues: any small perturbation of a matrix with a rational eigenvalue could easily shift it to some nearby real value.9 One might therefore ask if the limit distribution of rational eigenvalues could be explained simply in terms of the real eigenvalue distribution in the continuous case? To answer this question, we also compute the real eigenvalue distribution for n = 2: this is a genuine distribution, though it is also equal to the limiting 8 By contrast, the distribution of σ(n)/n is known to be singular, having infinitely many points of non-differentiability [18]. 9 It is not clear whether one can formalize this intuition by placing a natural uniform measure on the restriction to rational eigenvalues.  19  1.15. Sums of two squares distribution for the real eigenvalues of k1 M (k). It is depicted by the dashed line in Figure 3.1, which is noticeably different from the previous one. One obvious difference is the fact that the rational distribution is unimodal, while the real distribution is bimodal. In particular, it is more common for M (k) to have a rational eigenvalue near 0 than one near 3k/4; yet the reverse is true for real eigenvalues! This curious disparity suggests that there is some arithmetic structure of  1 k M (k)  that is lost upon passing to the continuous limit. It would not  be fair to attribute such structure to local effects, since the graphs do not display any obvious dependence on the value of k (or on the eigenvalue itself) modulo small primes. However, a powerful result of Katznelson [43] may shed some light on the nature of this structure, as we discuss further in Section 3.10. Indeed, Katznelson’s result could be used, with some effort, to recover the rational eigenvalue distribution of M (k) for n = 2, or any larger value of n.  1.15  Sums of two squares  Having explored genuinely random phenomena in the previous four sections, we now turn our attention to a third and final arithmetic setting, which bears many similarities to the primes (and poses many of the same challenges). Let us write S for the set of integers which can be written as x2 + y 2 , the sum of two square integers. For concreteness we list the first sixteen terms of S below: S = {0, 1, 2, 4, 5, 8, 9, 10, 13, 16, 17, 18, 20, 25, 26, 29, . . .}. It is apparent from this list that not all natural numbers belong to S. Indeed, since x2 is always 0 or 1 modulo 4, no number of the form 4m + 3 will appear. This local argument succinctly explains why 3 and 7 are not in S, but not why 21 is absent (this could be ruled out by a similar process modulo 9, but we will shortly give a more comprehensive reason). Despite the seemingly arbitrary construction of S, it actually enjoys a 20  1.15. Sums of two squares rather rich history, owing to its surprisingly deep structure with roots at the heart of classical number theory. The third-century Greek mathematician Diophantus of Alexandria was the first to record the identity (a2 + b2 )(c2 + d2 ) = (ac − bd)2 + (ad + bc)2 ,  (1.8)  which establishes that S is closed under multiplication. However, the critical observation indicating the true nature of S was made by Pierre de Fermat10 in 1640: every prime of the form 4n + 1 belongs to S. A rigorous proof of Fermat’s claim did not appear until a century later by Euler (crediting Fermat for the necessary technique of infinite descent), who used it to characterize members of S by their prime decomposition: the natural number n belongs to S if and only if, in the prime factorization of n, every prime of the form 4m + 3 appears to an even power. For instance, 74 · 17 · 232 belongs to S (since both 7 and 23 have even exponents), while 74 · 17 · 23 does not. So even though S is constructed through addition, it nonetheless carries a highly multiplicative structure that is not entirely unlike the primes.11 Indeed, Landau [48] used this very structure to establish a theorem analogous to the Prime Number Theorem, giving the precise density of S: #{n ≤ x : n ∈ S} ∼ C √  x , log x  (1.9)  where the constant C ≈ 0.76422365 is largely composed of local factors as in the Hardy–Littlewood conjecture. From this we see that the asymptotic √ density of S is equal to 0. The quantitative density O(1/ log x) situates S (on a logarithmic scale) exactly halfway between the positive density O(1) and the density of primes O(1/ log x). 10 Fermat and Diophantus share another connection: Fermat penned his so-called “Last Theorem” (famously proven by Wiles) in his copy of Diophantus’s Arithmetica. 11 One could, somewhat facetiously, describe the primes as those numbers n for which every prime less than n appears to a power less than 1.  21  1.16. Patterns in sums of two squares  1.16  Patterns in sums of two squares  We have observed some commonalities between S and the primes: they are roughly comparable in density, and their similar multiplicative structure suggests that a Cram´er-like model would be successful in formulating predictions about S (tempered by the local factors that arise from Euler’s characterization). It is natural to ask very similar questions about S that were raised about primes in Sections 1.2 and 1.3 (most of which are widely considered hopeless). However, there is one obvious advantage enjoyed by S, which affords further progress on these combinatorial questions. Not only does S have an exclusive12 multiplicative structure, it also has an inclusive, additive structure: by definition, we can show that n ∈ S simply by finding two integers x and y such that x2 + y 2 = n. There is no comparable way to construct a prime out of arbitrary integers.13 As a straightforward example, consider the “twin primes” problem for S: it is obvious that n2 and n2 + 1 both belong to S for any n, so S contains infinitely many patterns of type {m, m+1}. Even the longer pattern {m, m + 1, m + 2} is easy to find: given any three consecutive numbers n − 1, n, n + 1 belonging to S with n > 1 (for instance n = 9), it is also true that {n2 − 1, n2 , n2 + 1} ⊂ S. The first inclusion follows from the factorization (n − 1)(n + 1) and Diophantus’s formula (1.8); the other two are obvious, as noted before. We will further discuss such runs of consecutive elements in Section 1.17, but for S there is no hope of extending this to runs of length 4: every fourth number has the form 4m + 3, which is excluded. Let us turn now to the question of gaps (spacings between consecutive terms of S). We have just witnessed infinitely many gaps of size 1 in S, so 12  That is, we can prove certain integers do not belong to S based solely on, say, the power of 3 dividing them. A similar process of exclusion clearly holds for primes. 13 While “prime-generating” formulae do exist, many are trivial curiosities; even the most interesting one, a polynomial constructed by Jones et al. [41], is not practical here.  22  1.16. Patterns in sums of two squares we know that gaps can be arbitrarily small, and often. Conversely, it is not hard to see that gaps in S can be arbitrarily large: by Landau’s asymptotic √ (1.9), the average size of the nth gap is roughly 1.3 log n. This leaves one existence question: does every d ∈ N occur as a gap somewhere in S? Spiro and Hensley [79] answered this, showing that every gap size does in fact occur, and infinitely often. Note that we have constructed only  √  n instances of the pattern {m, m+  1} up to n; however, random models predict that there should be about n/ log n (ignoring the multiplicative constant). For the pattern {m, m + 1, m + 2}, the above construction gives only log log n occurrences; a more efficient construction could improve this to n1/4 , but this is still a far cry from the expected n/(log n)3/2 . Much stronger results on S were obtained by Hooley [39], who has studied the combinatorics of S extensively. Using analytic techniques (cleverly drawing from sieve theory) he showed that the pattern {m, m + 1} occurs at least  n/ log n times up to n. Since a similar upper bound  n/ log n  is also known by standard sieve methods, this establishes the correct order of magnitude for the frequency of twins {m, m + 1} in S, if not a precise asymptotic (with a known constant). Hooley also showed in [38] that the pattern {m, m + a, m + b} occurs infinitely often in S, for any a and b. So far, there seems to be greater hope of establishing patterns in S than in the prime case. It is ironic then, that for arithmetic progressions the primes have enjoyed far more success than S. For instance, the existence of arbitrarily long k-APs in S was not known until it became an immediate consequence of the Green–Tao theorem.14 The expected asymptotic for the quantity of 3-APs in S remains (to our best knowledge) unproven, unlike van der Corput’s theorem for primes. In general, precise asymptotics for S seem to be quite difficult — very recently, though, Matthiesen [56] has given a complete solution to a modified form of the k-APs problem (we discuss this modification in the conclusion). 14  Green and Tao actually showed that a set containing any positive proportion of primes contains long progressions, and S contains about half of all primes.  23  1.17. Patterns in quadratic forms  1.17  Patterns in quadratic forms  We can more generally view S within a wider context: for k ≥ 1, let S(k) be the set of numbers of the form x2 +ky 2 , so that S is equivalent to S(1). This extension is not entirely arbitrary, in part due to an identity generalizing (1.8), found by the 7th century Indian mathematician Brahmagupta: (a2 + kb2 )(c2 + kd2 ) = (ac − kbd)2 + k(ad + bc)2 .  (1.10)  In Chapter 4 we devote our attention wholly to S(k), but we pause briefly to mention a truly fascinating related area of number theory. The theory of binary quadratic forms such as x2 + ky 2 and more generally ax2 + bxy + cy 2 blossomed in the late 18th century under Euler, Lagrange and Legendre, culminating in Gauss’s 1801 masterwork Disquisitiones Arithmeticae [22]. While not all such forms possess a self-multiplication property15 such as (1.10), Gauss uncovered a beautiful group structure which interconnects them all. Much more recently, Bhargava [7] has shed entirely new light on Gauss’s composition laws, and applied these profound insights to solve some outstanding problems in algebraic number theory! Let us return to the specific topic of S(k). The density of S(k) is very similar to that of S, having an asymptotic formula of the same form as (1.9), albeit with a different constant — here, the constant is more intricate, containing additional arithmetic factors (related to the work mentioned in the previous paragraph). This asymptotic was established by Bernays (then a student of Landau) in his 1912 doctoral thesis [6], and was considerably sharpened recently by Blomer and Granville [8]. The local considerations for S(k) are rather dependent on the value of k. Let us discuss how this affects the two problems of runs and gaps from the previous section. While S(1) does not contain more than 3 consecutive integers, S(2) contains {0, 1, 2, 3, 4}, as well as {96, 97, 98, 99, 100}. On the other hand, one can show that S(k) never admits any run of length 6; in this sense, such quintuples or 5-runs are of maximal interest for the S(k) 15 However, a surprising consequence of Gauss’s work is that the product of any three numbers of the form ax2 + bxy + cy 2 can be expressed in the same form!  24  1.17. Patterns in quadratic forms setting.16 This raises the natural question: for which values of k does S(k) admit (infinitely many) consecutive quintuples?. We give a partial answer to this question, showing that S(2) does contain infinitely many 5-runs, which answers a question of Moshe Rosenfeld (personal communication). More generally, we can show that S(k) has the same property for several small values of k, as well as for an infinite sequence of distinct values k. However, this technique does not apply to all values of k: we do not know whether S(34) has infinitely many 5-runs (heuristics suggest that it should). For the analogous question of which gaps appear in S(k), the only local obstruction occurs when k is divisible by 4 (in which case S(k) only contains numbers of the form 0 or 1 modulo 4). For such k, it is impossible for S(k) to contain gaps of size 2 or any number of the form 4d + 2. In Section 4.7 we fully generalize Spiro and Hensley’s result to each S(k): aside from this one class of exceptions, every d ∈ N appears infinitely often as a gap in S(k).  16 However, the form xy takes all possible values. While this example is rather trivial, it shows that local factors do not preclude, say, 30!(x2 + y 2 ) + xy from having long runs.  25  Chapter 2  Simultaneous Approximation by Arithmetic Functions 2.1  Introduction  In this chapter we study the multidimensional approximation of points using values taken by arithmetic functions on arithmetic progressions, establishing generalizations of some theorems of Erd˝os and Schinzel [20], as well as Newman’s result from Section 1.9. While it is multiplicative functions that comprise our main interest, these theorems are more naturally formulated in terms of the logarithms of such functions, which are additive. We will adopt the (non-standard) notation  ·   x, x := 1,  from [20], which is defined by: if |x| ≤ 1; if |x| > 1.  We first give the formal statement of the Erd˝os–Schinzel theorem. Define F to be the class of all additive functions f : N → R satisfying the following: Condition 2.1. f (n) is a real-valued additive function such that (a)  p  f (p) 2 /p is convergent;  (b) There exists a real c1 = c1 (f ) such that, for any integer M > 0, the set of values {f (N ) : (N, M ) = 1} is dense in (c1 , ∞); (c) The partial sums of  p  f (p) /p are bounded (note that the summand  is not necessarily positive).  26  2.1. Introduction Erd˝ os and Schinzel proved [20] that the following approximation theorem holds for any f ∈ F: Theorem 2.2 (Erd˝ os–Schinzel). For any h ∈ N there exists ch such that, > 0 and every sequence of h numbers α1 , α2 , . . . , αh ≥ ch , the set  for any  of natural numbers n which satisfy the simultaneous inequalities |f (n + i) − αi | <  (i = 1, 2, . . . , h)  (2.1)  has positive lower density. They furthermore showed [20, p. 482] that any additive f for which Theorem 2.2 holds must also satisfy Condition 2.1, so that their result is best possible. It is worth comparing these hypotheses to the Erd˝os–Wintner conditions which determine the existence of a distribution function for f (n). Those conditions, from (1.4), may be rephrased17 in the present notation as •  p  f (p) 2 /p is convergent;  •  p  f (p) /p converges.  Relative to the above, we see that Condition 2.1 relaxes the requirement that  p  f (p) /p converge to merely having bounded partial sums. At the  same time, it adds a new requirement that the values of f remain dense after excluding certain congruence classes. One fairly common situation, which is sufficient to imply this denseness requirement, is that f (p) ≥ 0, f (p) → 0 and  p f (p)  diverges (or that this is true on some subsequence of primes).  In such case we may explicitly take c1 = 0. We note that if f is a positive multiplicative function for which log f ∈ F, then the conclusion of Theorem 2.2 holds equally well for f . The class F includes a wide range of such logarithmic functions, notably log σ(n) − log n and log n − log φ(n) (we may take c1 = 0 here, for the reason mentioned above). The theorem can therefore be used to approximate any tuple of sufficiently large reals by consecutive values of σ(n)/n, or sufficiently small 17 We have suppressed the divergence requirement from (1.4), which is not needed for existence but only for continuity of the distribution function.  27  2.1. Introduction positive reals by φ(n)/n. Theorem 2.2 also readily implies a similar approximation result for consecutive values of f (n + 1) − f (n), and it is easy to see that this yields results for σ(n + 1)/σ(n) and φ(n + 1)/φ(n); thus, Theorem 2.2 is a powerful generalization of earlier results by Schinzel and Wang [72]. We will return to this “relative” formulation in Section 2.3. Shao had previously given [77] a generalization of Schinzel and Wang’s results for φ and σ. However, this result placed additional assumptions on the growth rate of f (pr ) as p → ∞ for each r ≥ 1. By contrast, Theorem 2.2 needs only bounds on f (p), and it obtains a positive proportion of n rather than just an infinite set. As a cute example, any multiplicative function f with f (p) = 1 for all primes p, and each of f (p2 ) = 3 and f (p2 ) =  1 2  for  infinitely many p necessarily satisfies Condition 2.1, and therefore has the approximation property (2.1) for many values of n. (One could freely replace 3 and  1 2  with any reals α and β such that  log α log β  is a negative irrational.)  Our main theorem is a generalization of Theorem 2.2 to the case of multiple arbitrarily chosen functions from F. The method of proof is essentially the same as that in [20], for which we give a self-contained exposition. Theorem 2.3. Let f1 , f2 , . . . , fh ∈ F. Then there exists a constant c such that for any sequence of h numbers α1 , α2 , . . . , αh ≥ c, and  > 0, the set of  natural numbers n satisfying the simultaneous inequalities |fi (n + i) − αi | <  (i = 1, 2, . . . , h)  (2.2)  has positive lower density. In Section 2.3, we generalize this result further: rather than consider the values of f at the consecutive integers n + 1, n + 2, . . . , n + h, we may take (essentially) arbitrary linear functions in n. The technique is entirely elementary, and for expository purposes we choose to generalize a slightly different theorem from [20]; the application of the same technique to Theorem 2.3 is comparably straightforward. It is natural to ask how the density of n satisfying (2.2), or the smallest such n, might vary as  → 0. While estimates for these could be obtained 28  2.2. Main theorem from our construction, they are exponential in 1/ , and the parameters depend on specific convergence properties of f (p), which makes it difficult to give a reasonable bound that is uniform over F. However, Alkan, Ford and Zaharescu [2] have recently proved a very sharp result along these lines, after imposing some reasonable regularity constraints on f (pr ). They obtain an analogue of Theorem 2.3 (and its generalization to linear forms, Corollary 2.11) for infinitely many n, where  may be replaced by n−c for some  positive constant c, depending only on h and the regularity parameters. This gives a growth rate of just n  2.2  (1/ )1/c , on some sequence of  → 0.  Main theorem  Let us first outline the basic strategy for Theorem 2.3: by restricting n to be divisible by h!, we have that n + i is necessarily divisible by i (for 1 ≤ i ≤ h), but we retain the freedom to choose finitely many (small) prime divisors for each (n + i)/i independently. Condition 2.1b then allows us to approximate each target αi by the contributions from these small prime divisors, by further restricting n to a particular congruence class. We then use Conditions 2.1a and 2.1c to show that the effect of the large primes on f (n) is harmlessly small for a positive proportion of n within this class. We make use of the following facts, which are well-known or elementary: Proposition 2.4. (a) The number of positive integers n ≤ x which satisfy n ≡ r (mod Q) and d | n is x/dQ + O(1), uniformly over all naturals d, Q, r where (d, Q) = 1. (b) The number of positive integers n ≤ x with at most two prime factors is O(x log log x/ log x) = o(x). We will also use a topological observation, essentially a restatement of the simple fact that a bounded subset of Rh is totally bounded. Note that the symbol to the  ·  ·  ∞  below refers to the usual supremum norm; it is unrelated  notation appearing in Condition 2.1. 29  2.2. Main theorem Proposition 2.5. Let h ∈ N and let {sn }n>0 be a bounded sequence in Rh . Then for any η > 0, there exists a finite index set K ⊂ N, such that for any n ∈ N there exists k ∈ K such that k ≤ n and sn − sk Proof. Let M > 0 be an upper bound for sn  ∞  ∞  < η.  (uniform over all n). We  can cover the set S = {sn : n ∈ N} with finitely many open h-dimensional cubes B1 , B2 , . . . , Bm of side length η, such that Bi ∩ S = ∅ for each i = 1, 2, . . . , m (we need at most (1 + 2M/η)h such cubes). We may then take K = {k(i) : 1 ≤ i ≤ m}, where k(i) is defined as the least k for which sk ∈ Bi . For any n ∈ N, we certainly have sn ∈ Bi for some i, whence n ≥ k(i) and sn − sk(i)  ∞  < η since sn and sk(i) lie in the same cube.  Lastly, we require the following lemma, which appears as part of equation (21) in [20]. There is a minor flaw in the argument used there, so we include an independent proof for the sake of completeness. Lemma 2.6. There is an absolute constant C > 0 such that for any x > 0,  √  x<p≤x x/p<q<p  1 < C, pq  where the sums range over p, q prime. Proof. We may assume without loss of generality that x > 4. Switching the order of summation, the above sum is equal to  √  x<q<x  1 q  q<p≤x  ≤ √  x<p≤x  1 p  1 p  + √ q≤ x 2  + √ q≤ x  1 q  1 q  x/q<p≤x  x/q<p≤x  1 p  1 . p  (2.3)  Recalling Mertens’ Theorem ([4, Lemma 4.10]) that for x ≥ 2,  p≤x  1 = log log x + B + O(log−1 x), p  30  2.2. Main theorem we see that the first term of (2.3) is equal to log2 2 + O(log−1 x) = O(1), so it suffices to bound the second term. The inner sum is  x/q<p≤x  1 x x = log log x − log log + O log−1 p q q log q log x  = − log 1 − uniformly for 2 ≤ q ≤  √  √ q≤ x  + O(log−1 x)  log q log x  x, so the second term of (2.3) is  1 q  x/q<p≤x  1 p  1 log x  √ q≤ x  log q , q  which is O(1) by another Mertens estimate ([4, Lemma 4.8]). Remark. The argument for the above lemma in [20] used an estimate of the form  y<p≤z  1/p  log log z − log log y, but this bound does not hold  uniformly without an error term such as O(log−1 y) above to account for very short intervals (y, z] containing primes. Proof of Theorem 2.3. We take c = max{c1 (fi ) + fi (i)}hi=1 , where c1 (fi ) is defined as in Condition 2.1b. Let α1 , α2 , . . . , αh ≥ c and  > 0 be fixed.  By iterative application of Condition 2.1b, we can find pairwise co-prime integers N1 , N2 , . . . , Nh > 1 such that (Ni , h!) = 1 |fi (Ni ) − αi + fi (i)| < Let N =  h i=1 Ni  2  (i = 1, 2, . . . , h), (i = 1, 2, . . . , h).  (2.4)  and let k1 > h be the largest prime factor of N . For any  k > k1 , we can define Pk :=  p,  Qk := h!2 Pk N 2 .  h<p≤k p N  31  2.2. Main theorem By the Chinese remainder theorem, the system of congruences n ≡ 0 (mod h!2 Pk ),  n ≡ −i + Ni (mod Ni2 )  (i = 1, 2, . . . , h)  (2.5)  has a unique solution n ≡ n0 (mod Qk ). Let Ak denote this congruence class. For any n ∈ Ak , n + i is congruent to i mod i2 , and to Ni mod Ni2 ; hence the quantity (n + i)/iNi is an integer relatively prime to iNi . Note furthermore that n + i is non-zero mod p for any p ≤ h with (p, i) = 1, for any p | Pk , and for any p | Nj with j = i. It follows that (n + i)/iNi is not divisible by any prime p ≤ k, and so fi (pr ).  fi (n + i) − fi (iNi ) =  (2.6)  pr n+i p>k  We wish to choose k so that this quantity has magnitude at most /2 for a positive proportion of n ∈ Ak . Then by (2.4) we will have |fi (n + i) − αi | < as desired. We define the following parameters (here C is the constant in Lemma 2.6): √ µ := min(1, / 96Ch),  √ η := / 48h.  (2.7)  Let P denote the set of primes p such that |fi (p)| ≤ µ for i = 1, 2, . . . , h. For convenience we use the notation P(k) for the subset {p ∈ P : p > k}. By Condition 2.1a, the sum converge. As  p 1/p  2  p∈P /  1/p ≤ µ−2  h i=1  p  fi (p) 2 /p must  also converges, there is an integer k2 ≥ k1 such that  p>k2 , p∈P /  1 + p  p>k2  1 1 . < 2 p 3h  (2.8)  For any k > k2 > h, let Nk denote the subset of Ak consisting all n ∈ Ak with the following property: (n + 1)(n + 2) · · · (n + h) is not divisible by any prime p > k outside of P, nor is it divisible by p2 for any prime p > k (here and below, Pk , Qk and Ak remain as constructed above). We estimate the counting function Nk (x) = #{n ≤ x : n ∈ Nk }. The  32  2.2. Main theorem counting function of Ak is clearly Ak (x) = x/Qk + O(1). If n ≤ x and n ∈ Ak \ Nk , then n + i (for some 1 ≤ i ≤ h) must be divisible by some √ prime p ∈ / P with k < p ≤ x + i, or by p2 for some k < p ≤ x + i. For any fixed p and i, Proposition 2.4a gives exactly x/pQk + O(1) values of n ≤ x where p | n + i, and x/p2 Qk + O(1) values of n ≤ x where p2 | n + i. Thus h  Ak (x) − Nk (x) ≤ i=1  ≤  hx Qk  k<p≤x+i p∈P /  p>k, p∈P /  x + O(1) pQk 1 + p  p>k  1 p2  x  + √ k<p≤ x+i  +O  h p≤x+h  p2 Qk ≤  + O(1)  x + o(x). 3Qk  So Nk (x) ≥ 32 x/Qk + o(x); in other words, Nk has lower density at least  2 3  relative to the progression Ak . For any n ∈ Nk , the integer (n + i)/iNi is a squarefree product of primes p with p > k and p ∈ P. Thus equation (2.6) restricted to n ∈ Nk reduces to fi (n + i) − fi (iNi ) =  fi (p).  (2.9)  p|n+i p∈P(k)  We will show that for x sufficiently large, we can choose k = k(x) so that h  (fi (n + i) − fi (iNi ))2 <  E(x) = n∈Nk , n≤x i=1  and thus there are at most which |fi (n + i) − fi (iNi )| ≥  1 3 x/Qk 2  2  x + ok (x), 12 Qk  (2.10)  + ok (x) values of n ∈ Nk up to x for  for some i = 1, 2, . . . , h. By the previously  estimated density of Nk there are at least 31 x/Qk + ok (x) values of n ≤ x satisfying inequality (2.2). This will be sufficient to prove Theorem 2.3, provided that we have a uniform upper bound on k (hence also on Qk ) independent of x: we are free to vary k with x provided that we restrict the choice of k(x) to within a finite set of possible values, so that the (lower) density of Nk is bounded below by a positive quantity.  33  2.2. Main theorem By Condition 2.1a, there exists an integer k3 ≥ k2 such that h i=1 p∈P(k3 )  fi (p)2 ≤ p  h i=1  p>k3 |fi (p)|<µ  2 fi (p)2 < . p 24  We have by Condition 2.1c that the partial sums of bounded, and since  p∈P /  the partial sums of  (2.11)  p  fi (p) /p are  fi (p) /p converges absolutely by Condition 2.1a, fi (p) /p are also bounded. We denote these  p∈P  partial sums by Sk,i , that is Sk,i := p∈P, p≤k  fi (p) = p  p∈P, p≤k  fi (p) . p  Applying Proposition 2.5 to the sequence of h-tuples {(Sk,i )hi=1 }k>k3 , we see that there exists a finite index set K ⊂ P(k3 ) such that for any x > k3 there exists k = k(x) ∈ K such that k ≤ x + h and for all i = 1, 2, . . . , h, S  x+h ,i  − Sk,i = k<p≤x+h p∈P  fi (p) < η. p  (2.12)  As K is finite, we have k = k(x) ≤ max K uniformly for x > k3 , as required by the previous discussion. We now estimate the sum in (2.10) for this choice of k. By (2.9) we have h  h  2  (fi (n + i) − fi (iNi ))2 =  E(x) = n∈Nk i=1 n≤x  fi (p) i=1 n∈Nk n≤x  p|n+i p∈P(k)  h  fi (p)2 + 2  ≤ i=1  n∈Ak n≤x+h−i  p|n+i p∈P(k)  fi (p)fi (q) . pq|n+i p,q∈P(k) p>q  34  2.2. Main theorem Changing the order of summation and applying Proposition 2.4a gives h  x + O(h) + 2 pQk  fi (p)2  E(x) ≤ i=1  pq≤x+h p,q∈P(k)  p≤x+h p∈P(k) h  x = Qk  fi (p)fi (q)  i=1 p≤x+h p∈P(k)  h  fi (p)2 +2 p  i=1 pq≤x+h p,q∈P(k) p>q  h  x + O(h) pqQk  fi (p)fi (q) pq  h 2  +O  hµ2 .  hµ + i=1 p≤x+h  i=1 pq≤x+h  We apply (2.11) to the first double sum and Proposition 2.4b to the error term to obtain E(x) <  x Qk  2  24  + 2h max i≤h  pq≤x+h p,q∈P(k) p>q  fi (p)fi (q) pq  + o(x).  To prove (2.10) it suffices to show that for each i = 1, 2, . . . , h, 2 pq≤x+h p,q∈P(k) p>q  2 fi (p)fi (q) < . pq 24h  (2.13)  By equation (2.12) we have for any 1 ≤ i ≤ h, 2 q<p≤x+h p,q∈P(k)  fi (p)fi (q) ≤ pq  p≤x+h p∈P(k)  fi (p) p  2  < η2 =  2  48h  .  35  2.3. Generalization to linear functions Comparing the sum on the left-hand side to the sum in (2.13), we have 2 pq≤x+h p,q∈P(k) p>q  2 fi (p)fi (q) < +2 pq 48h  2  =  48h 2  <  48h  pq>x+h p,q∈P(k) q<p≤x+h  µ2 pq  + 2µ2 √  x+h<p≤x+h (x+h)/p<q<p q∈P(k) p∈P(k)  + 2µ2 C =  1 pq  2  24h  by Lemma 2.6. This completes the proof of Theorem 2.3.  2.3  Generalization to linear functions  As we noted in the introduction, one can use Theorem 2.2 to simultaneously approximate arbitrary reals by the differences f (n + i) − f (n + i − 1) rather than the absolute values f (n + i). Note that by considering these relative differences we no longer need any lower bound on the targets αi . Erd˝ os and Schinzel prove in [20] that this formulation actually holds under a significantly weaker hypothesis than Theorem 2.2, specifically: Theorem 2.7 (Erd˝ os–Schinzel). Let f (n) be an additive function satisfying Conditions 2.1a and 2.1b. Then, for any given sequence of h real numbers α1 , α2 , . . . , αh and  > 0, the set of natural numbers n satisfying the  simultaneous inequalities |f (n + i) − f (n + i − 1) − αi | <  (i = 1, 2, . . . , h)  (2.14)  has positive lower density. For instance, any additive function f with f (p) = log−1 p satisfies Theorem 2.7 (again taking c1 (f ) = 0 in Condition 2.1b) but not Condition 2.1c. The elimination of Condition 2.1c relies on a delicate cancellation between values of f , so their proof does not readily adapt to multiple functions fi 36  2.3. Generalization to linear functions satisfying this weaker condition. However, it can be generalized in a different direction: we may replace the consecutive integers n, n + 1, . . . , n + h, at which f is evaluated, by h + 1 (non-constant) linear functions of n, say a0 n + b0 , a1 n + b1 , . . . , ah n + bh  (ai ∈ N, bi ∈ Z).  (2.15)  We need to impose one condition on the coefficients ai and bi , which is that no two forms ai n + bi and aj n + bj may divide one another as polynomials. This requirement is natural as we cannot expect, say, φ(3n+6) and φ(2n+4) to approximate independent reals (indeed, the function φ(3n + 6)/φ(2n + 4) takes only four distinct values). Thus we add the constraint that ai bj = aj bi  (0 ≤ i < j ≤ h),  (2.16)  whereupon the following theorem holds: Theorem 2.8. Let h ∈ N and let (ai n + bi )hi=0 be linear forms satisfying (2.15) and (2.16). Then Theorem 2.7 holds with (2.14) replaced by |f (ai n + bi ) − f (ai−1 n + bi−1 ) − αi | <  (i = 1, 2, . . . , h).  (2.17)  We will see that Theorem 2.8 actually follows by an elementary argument from Theorem 2.7. We first show that Theorem 2.7 still holds when n is restricted to an arbitrary arithmetic progression. We remark that Erd˝os and Schinzel’s techniques were strong enough to show positive relative density even in the case that n is restricted to the set of primes! We make use of the following observation which is immediate upon inspection of the proof of Theorem 6 [20, p. 477]. There is a minor flaw in the construction which is immaterial to our proposition, and is easily repaired by replacing (h + 1)! with (h + 1)!2 , similar to our equation (2.5). Proposition 2.9. The conclusion of Theorem 2.7 holds when n is restricted to the congruence class n ≡ 1 (mod (h + 1)!). We note that this specific congruence class, just as in the proof of Theorem 2.3, is not essential to the construction and was used only as a means of 37  2.3. Generalization to linear functions simplifying the bookkeeping needed in the argument. Indeed, the preceding proposition readily generalizes to the following lemma. Lemma 2.10. Let f (n) be an additive function satisfying Conditions 2.1a and 2.1b. Then for any a ∈ N and b ∈ Z, the set of n ≡ b (mod a) for which n satisfies (2.14) also has positive lower density. Proof. Without loss of generality, we may assume that b ≥ a, so that a | (h + b)!. Now, let h = h + b − 1 and apply Theorem 2.7 to the h -tuple (α1 , α2 , . . . , αh ), where αi+b−1 = αi and we are free to choose αi arbitrarily for i < b. By Proposition 2.9, any n satisfying (2.14) for the tuple (αi ) is congruent to 1 mod (h + b)!, hence n = n + b − 1 satisfies (2.14) for the requisite h-tuple (α1 , . . . , αh ) as well as the modular constraint. Our strategy for proving Theorem 2.8 is simple: we first transform the problem into a specific instance of Lemma 2.10, and then we correct for the discrepancy introduced by the transformation. Proof of Theorem 2.8. Let A be the lowest common multiple of a0 , a1 , . . . , ah . By making a substitution of the form n → n + k, we may assume that each bi > 0. We multiply each linear form ai n + bi by an appropriate integer to obtain the form An + Bi with Bi > 0. Condition (2.16) implies that the natural numbers B0 , . . . , Bh are distinct. By reordering the ai and bi in condition (2.17) and making corresponding adjustments to the values α1 , α2 , . . . , αh and , we may assume that the linear forms (ai n + bi )hi=0 are ordered so that B0 < B1 < · · · < Bh . It follows from Lemma 2.10, using h = Bh and the congruence class n ≡ 0 (mod A), that for any h-tuple (α1 , . . . , αh ), the set of n such that |f (An + Bi ) − f (An + Bi−1 ) − αi | <  (i = 1, 2, . . . , h)  (2.18)  has positive lower density. This is clear when Bi−1 and Bi are consecutive integers; if ∆Bi = Bi −Bi−1 > 1, then we are free to assign arbitrary targets to the values of f (An + Bi−1 + j) − f (An + Bi−1 + j − 1) for 1 ≤ j ≤ ∆Bi , provided that the targets sum to αi , while replacing  by /∆Bi . 38  2.3. Generalization to linear functions It remains to account for the discrepancy between f (An+Bi ) and f (ai n+ bi ). Since An + Bi = Mi (ai n + bi ) for some integer Mi | A, and f is additive, the value of f (An + Bi ) − f (ai n + bi ) is precisely determined by the p-adic valuations vp (An + Bi ) and vp (ai n + bi ) for just the finitely many primes p dividing Mi . In particular, if we restrict n to a suitable congruence class so that vp (An) > vp (Bi ) for each prime p dividing A and all 0 ≤ i ≤ h, then vp (An + Bi ) takes the constant value vp (Bi ) and likewise vp (ai n + bi ) takes the constant value vp (bi ), so that f (An + Bi ) − f (ai n + bi ) = f (Bi ) − f (bi ). A second appeal to Lemma 2.10 allows us to do just that, and so Theorem 2.8 holds by taking αi = αi + f (Bi ) − f (bi ) − f (Bi−1 ) + f (bi−1 ) in (2.18). The method used to prove Theorem 2.8 is easily adapted to generalize Theorem 2.3, using congruence (2.5) in place of Proposition 2.9, to obtain the corollary below. Note that the constant c now also depends on the coefficients (ai , bi ). Corollary 2.11. Let f1 , f2 , . . . , fh ∈ F, and let (ai n + bi )hi=1 be linear forms satisfying (2.15) and (2.16) (ignoring a0 and b0 ). Then Theorem 2.3 holds with (2.2) replaced by |fi (ai n + bi ) − αi | <  (i = 1, 2, . . . , h).  (2.19)  39  Chapter 3  Eigenvalues of Random Integer Matrices 3.1  Introduction  In this chapter we conduct our investigation of the eigenvalues of random integer matrices, in response to the question posed by Hetzel, Liew and Morrison [37]: what is the probability that a random n × n integer matrix is diagonalizable over the field of rational numbers? Since there is no uniform probability distribution on Z, let us clarify the meaning of this question. For k ≥ 1, let Ik := {−k, −k + 1, . . . , k − 1, k} be the set of integers of height at most k. Since |Ik | = 2k + 1 is finite, we can construct a random n×n matrix M where each entry is chosen independently and uniformly at random from Ik . The probability that M has any given property (such as being diagonalizable over Q) is then a function of the parameter k. Our aim is to determine how this function behaves as k → ∞; in particular, does it approach a limiting value as k → ∞? For any given integers n ≥ 2 and k ≥ 1, the set of random n × n matrices with entries in Ik forms a well-defined (finite) probability space; it will be convenient to compute probabilities just by counting matrices, so we introduce some notation for certain sets of matrices. Let Mn (k) be the set of all n × n matrices whose entries are in Ik ; we are choosing 2  matrices uniformly from Mn (k), a set with cardinality (2k + 1)n . The probability that a random matrix in Mn (k) satisfies a particular property is given simply by the number of matrices in Mn (k) having that property, 2  divided by (2k + 1)n .  40  3.1. Introduction For a given integer λ, let Mλn (k) denote the set of all matrices in Mn (k) that have λ as an eigenvalue. In particular, M0n (k) is the subset of singular matrices in Mn (k). Likewise, we denote the set of matrices in Mn (k) λ λ∈Z Mn (k).  having at least one integer eigenvalue by MZn (k) =  The prob-  ability that a random matrix in Mn (k) has an integer eigenvalue is thus 2  |MZn (k)|/(2k + 1)n . Our first result in this chapter affirms and strengthens the conjecture made by Hetzel et al.: for any n ≥ 2, the probability that a random n × n integer matrix has at least one integer eigenvalue tends to 0 as k → ∞. We give a quantitative upper bound on this probability: Theorem 3.1. Given any integer n ≥ 2 and any real number ε > 0, the probability that a randomly chosen matrix in Mn (k) has any integer eigenvalues is  n,ε  1/k 1−ε . Consequently, the probability that a randomly chosen  matrix in Mn (k) is diagonalizable over Q is  n,ε  1/k 1−ε .  Given an integer matrix M ∈ Mn (k), a necessary (and nearly sufficient) condition for it to be diagonalizable over Q is that all of its eigenvalues be rational. Moreover, since the characteristic polynomial det(λI −M ) is monic with integer coefficients, the familiar “rational roots theorem” [64, §4.3] implies that every rational eigenvalue of M must be an integer. Hence any integer matrix that is diagonalizable over the rationals must be contained in MZn (k), and so the second assertion of the theorem follows immediately from the first. The special case n = 2 of Theorem 3.1 was previously obtained in [37] and also earlier by Kowalsky [47]. Unravelling the  -notation, the theorem  states that there exists a constant C, depending on n and ε, such that |MZn (k)|/|Mn (k)| ≤ C/k 1−ε for all k ≥ 1. Note that |Mn (k)|  2  n  k n (with  the implied constant being highly dependent on n), and so the theorem also gives an upper bound for the number of matrices in Mn (k) having at least one integer eigenvalue, that is |MZn (k)|  n,ε  2 −1+ε  kn  .  The key tool used to establish Theorem 3.1 is the following related esti41  3.1. Introduction mate for |M0n (k)|, the number of singular matrices in Mn (k): Lemma 3.2. Given any integer n ≥ 2 and any real number ε > 0, the probability that a random matrix in Mn (k) is singular is other words, |M0n (k)|  n,ε  2 −2+ε  kn  n,ε  1/k 2−ε . In  .  This bound is far from optimal in general (as we discuss in Section 3.10), but it is essentially best possible when n = 2: in the latter part of this chapter, we conduct a detailed investigation of the distribution of eigenvalues for the 2 × 2 case, in particular obtaining the precise asymptotic |M02 (k)| = 96 2 k log k π2  + O(k 2 ). More generally, we determine the asymptotic incidence  of all integer eigenvalues of M2 (k), as given by the following theorem: Theorem 3.3. Define the function V : [−2, 2] → R by V (−δ) = V (δ) and  V (δ) =     4 − 2δ − δ 2 + δ 2 log(1 + δ) − 2(1 − δ) log(1 − δ),     1 + log 2,   4 − 2δ − δ 2 + δ 2 log(δ + 1) + 2(δ − 1) log(δ − 1),     δ 2 − 2δ − (δ 2 − 2δ + 2) log(δ − 1)  if 0 ≤ δ < 1, if δ = 1, √ if 1 < δ ≤ 2, √ if 2 < δ ≤ 2. (3.1)  Then for any integer λ between −2k and 2k, |Mλ2 (k)| =  24V (λ/k) 2 k log k + O(k 2 ), π2  (3.2)  where the implied constant is absolute. If |λ| > 2k then Mλ2 (k) is empty. We remark that the function V (δ) is continuous and, with the exception of the points of infinite slope at δ = ±1, differentiable everywhere (even at δ = ±2, if we further define V (δ) ≡ 0 for |δ| > 2). Technically, equation (3.2) is not an asymptotic formula when λ is extremely close to ±2k, because then the value of V (λ/k) can have order of magnitude 1/ log k or smaller, making the “main term” no bigger than the error term. However, it truly is an asymptotic formula for |λ| < 2k − ψ(k)k/(log k)1/3 , where ψ(k) is any function tending to infinity (the exponent 1/3 arises because V (δ) approaches 0 cubically as δ tends to 2 from below). 42  3.1. Introduction By summing (3.2) over all possible values of λ, we obtain an asymptotic formula for |Mλ2 (k)|, which gives a sharp quantitative answer to the original question, in the special case n = 2. We defer the details of this calculation to Section 3.6. √ √ Corollary 3.4. Let C = 7 2 + 4 + 3 log( 2 + 1) /3π 2 ≈ 0.55873957. The probability that a random matrix in M2 (k) has integer eigenvalues is asymptotically equal to C(log k)/k. More precisely, |MZ2 (k)| = 16Ck 3 log k + O(k 3 ).  (3.3)  If M ∈ M2 (k) has eigenvalue λ, then the scaled matrix k −1 M has eigenvalue λ/k, which is the argument of V that appears on the right-hand side of (3.2). A natural probabilistic interpretation of Theorem 3.3 is that for large k, the rational eigenvalues of k −1 M are distributed according to the density given by V . Note that the entries of k −1 M are sampled from a discrete, evenly-spaced subset of [−1, 1]. As k → ∞ the random variable describing each entry converges weakly to the uniform distribution on the interval [−1, 1]. Let M2 ([−1, 1]) be the probability space of all 2 × 2 matrices whose entries are independent random variables drawn from this distribution. One might ask whether the distribution given by Theorem 3.3 merely approximates the continuous distribution of (real) eigenvalues in M2 ([−1, 1]); the answer, perhaps surprisingly, is no, as we establish by also computing this latter distribution explicitly. Theorem 3.5. If M is chosen uniformly from M2 ([−1, 1]), the expected number of eigenvalues of M in the interval [s, t] is  t s W (δ) dδ,  where W (δ)  43  3.1. Introduction is the density function given by W (−δ) = W (δ) and    (80 + 20δ + 90δ 2 + 52δ 3 − 107δ 4 )/(144(1 + δ))       − (5 − 7δ + 8δ 2 )(1 − δ) log(1 − δ)/12       − δ(1 − δ 2 ) log(1 + δ)/4,      δ(20 + 10δ − 12δ 2 − 3δ 3 )/(16(1 + δ))   W (δ) = + (3δ − 1)(δ − 1) log(δ − 1)/4      + δ(δ 2 − 1) log(δ + 1)/4,       δ(δ − 2)(2 − 6δ + 3δ 2 )/(16(δ − 1))       − (δ − 1)3 log(δ − 1)/4,     0,  if 0 ≤ δ ≤ 1,  if 1 ≤ δ ≤  if  √  2,  √ 2 ≤ δ ≤ 2,  if δ ≥ 2. (3.4)  Just like V (δ), the function W (δ) is continuous and differentiable everywhere, again with the exception of two points of infinite slope at δ = ±1. (The value W (1) =  15 32  makes the function continuous there, although the  value of a density function at any one point is irrelevant to the distribution.) It also shares the same cubic decay as δ tends to 2 from below. But there are also striking qualitative differences between the functions V and W . In Figure 3.1 we plot these on the same axes, normalized18 so that the area under each curve is 2. In the case of M2 (k), this normalization corresponds to conditioning on having integer eigenvalues: that is, scaling V by the probability C(log k)/k from Corollary 3.4. For M2 ([−1, 1]) we are conditioning on having real eigenvalues, an event which occurs with probability  49 72  (this can  be obtained by integrating W (δ), analogously to the proof of Corollary 3.4, or by a direct integral computation which was done in [37]). Notice that the distribution W (δ) is bimodal, having maxima at δ ≈ ±0.75030751. Thus, a random matrix in M2 ([−1, 1]) is more likely to have an eigenvalue of near  3 4  than one near 0. That the opposite is true for  V (δ) shows that the eigenvalue distribution of MZ2 (k) is not purely the 18  We normalize to area 2 since a 2 × 2 matrix has two eigenvalues. One could view this as the sum of respective probability densities for the upper and lower eigenvalues.  44  3.2. Basic bounds on singularity  Figure 3.1: Distribution of real and rational eigenvalues for n = 2 result of magnitude considerations, but also encodes some of the arithmetic structure of the integers up to k. It is true, however, that if we sample the real eigenvalues of matrices in M2 (k), then these do converge to the exact distribution given by W (δ): eigenvalues vary continuously with the matrix coefficients, so this sampling limit behaves much like a Riemann sum. We may thus view V (δ) and W (δ) as the distributions of integer and real eigenvalues of M2 (k) for large k.  3.2  Basic bounds on singularity  It is not hard to show a simpler version of Lemma 3.2, with a singularity bound that is qualitatively weaker but more explicit and also uniform in n: Proposition 3.6. The probability that a random matrix in Mn (k) is singular is at most 1/2k. To see this, we need a lemma which bounds the size of intersection between a linear subspace and a discrete hypercube in Rn . With minor modifications this previously appeared as problem B4 on the 67th William Lowell Putnam Mathematical Competition [44]. The elegant proof below is due to Catalin Zara. 45  3.2. Basic bounds on singularity Lemma 3.7. Let V be a d-dimensional subspace of Rn , and S be any finite set of m real numbers. Then |S n ∩ V | ≤ md . Proof. Form a matrix A whose rows are the points in S n ∩ V . This has rank at most d, so we can find d columns of A which span all other columns: each row of A can be uniquely determined by the d entries corresponding to those columns. Since |S| = m, there are at most md distinct choices for those entries, hence at most md distinct rows. Proof of Proposition 3.6. Let M be randomly chosen from Mn (k), and let Ei be the event that the ith row of M is contained in the span of the previous i − 1 rows (E1 is just the probability that the first row is zero). If M is singular, then at least one of E1 , . . . , En must occur, so the probability that M is singular is at most (2k +1)i−1  n i=1 Pr(Ei ).  By Lemma 3.7 there are at most  possible choices for row i that are spanned by the preceding rows,  so Pr(Ei ) ≤ (2k +1)i−1−n . The result then follows by bounding with the geometric series  ∞ j=1 (2k  n i=1 Pr(Ei )  + 1)−j = 1/2k.  Remark. Proposition 3.6 is used with minor alterations in [66], where it was beneficial for the application to have explicit constants that do not grow with n. Unfortunately, the decay rate of O(1/k) is not quite strong enough to yield an “almost all” result for the rational eigenvalue problem (we need a bound of o(1/k) to get any qualitative analogue of Theorem 3.1). In order to achieve stronger decay with respect to k, we are willing to sacrifice some uniformity — indeed, our method yields an implicit constant that grows very rapidly with n (like a factorial). We record without proof a well-known bound (see for instance [59, p. 56]) for the number-of-divisors function τ (n) that will be useful here and in later sections. Although more explicit bounds are available, the one below is quite sufficient for our purposes. Proposition 3.8. For any real δ > 0 and nonzero integer n, τ (n)  δ  |n|δ .  We first prove Lemma 3.2 in the easiest case n = 2, but in a slightly more general form that will be useful for induction to larger n. We wish to 46  3.2. Basic bounds on singularity show that the probability that a random 2 × 2 matrix is singular is equally small (  ε  k −2+ε , albeit with a different implied constant) even if we choose  the entries from different arithmetic progressions (each having the same cardinality as Ik ). Lemma 3.9. Fix positive real numbers α, B, and ε. Let k be a positive integer, and let L1 (x), L2 (x), L3 (x), and L4 (x) be nonconstant linear polynomials whose coefficients are integers at most Bk α in absolute value. Then the number of solutions to the equation L1 (a)L2 (b) = L3 (c)L4 (d) with all of a, b, c, and d in Ik is  α,B,ε  (3.5)  k 2+ε .  Proof. First we consider the solutions for which both sides of equation (3.5) equal 0. In this case, at least two of the linear factors L1 (a), L2 (b), L3 (c), and L4 (d) equal 0. If, for example, L1 (a) = 0 and L3 (c) = 0 (the other cases are exactly the same), this completely determines the values of a and c; since there are 2k + 1 choices for each of b and d, the total number of solutions for which both sides of equation (3.5) equal 0 is  k2 .  Otherwise, fix any values for c and d for which the right-hand side of equation (3.5) is nonzero, a total of at most (2k+1)2  k 2 choices. Then the  right-hand side is some nonzero integer that is at most (k · Bk α + Bk α )2 ≤ 4B 2 k 2+2α in absolute value, and L1 (a) must be a divisor of that integer. By Proposition 3.8 with δ = ε/(2 + 2α), the right-hand side of equation (3.5) has only  α,ε  (4B 2 k 2+2α )ε/(2+2α)  α,B,ε  k ε divisors which may serve  as candidates for L1 (a); each of these completely determines a value for a (which might not even be an integer). Then the possible values for L2 (b) and hence b are determined as well. We conclude that there are a total of α,B,ε  k 2+ε solutions to equation (3.5) as claimed.  Remark. It is not important that the Li be linear: the above proof works essentially without change for any four nonconstant polynomials of bounded degree. We will not need such a generalization, however, as the determinant of a matrix depends only linearly on each matrix element. 47  3.3. Eigenvalues and determinants Remark. For our application, both α and B will depend only on the single parameter n, so the dependence  α,B,ε  may be rephrased as  n,ε ,  absorbing  the explicit references to α and B.  3.3  Eigenvalues and determinants  We previously noted that if M ∈ Mn (k) has a rational eigenvalue λ then λ must be an integer. Furthermore, λ could be as high as nk (if we take M to be the n × n matrix with all entries equal to k), or as low as −nk (if we take M to be the negation of the previous example). In fact it is not possible for λ to be any larger (even when it is complex), and an analogous bound holds for the continuous case as shown by the lemma below. Lemma 3.10. Any eigenvalue of a matrix in Mn (k) is bounded in absolute value by nk. Any eigenvalue of a matrix in Mn ([−1, 1]) is bounded in absolute value by n. Proof. We invoke Gershgorin’s “circle theorem” [23], a standard result in spectral theory: let M = (mij ) be an n × n matrix, and let D(z, r) denote the disk of radius r around the complex number z. Then Gershgorin’s theorem says that all of the eigenvalues of M must lie in the union of the disks |m1j | , D m22 ,  D m11 , 1≤j≤n j=1  |m2j | , . . . , D mnn , 1≤j≤n j=2  |mnj | . 1≤j≤n j=n  In particular, if all entries of M are bounded in absolute value by B, then all eigenvalues are bounded in absolute value by nB. Remark. The examples preceding the lemma show that the bound is tight; indeed, one could also prove this using Perron–Frobenius theory for the spectral radius of M , which provides some intuition for the fact that the all-k’s matrix is maximal. The next ingredient is a curious determinantal identity, which was classically known but presently seems to have fallen out of common knowledge. 48  3.3. Eigenvalues and determinants Before we state this identity, it will be very helpful to define some preliminary notation. For the remainder of this section, we will adhere to the convention that capital letters denote matrices, boldface lowercase letters denote column vectors, and regular lowercase letters denote scalars. Let In denote the n × n identity matrix, whose jth column we denote by ej (the jth standard basis vector). Let M be an n × n matrix, and let mj denote its jth column and mij its ijth entry. Note that M ej = mj by the definition of matrix multiplication. We caution the reader that, in the definitions below, aij does not denote the entries of A, nor those of aj . Let aij denote the ijth cofactor of M , that is, the determinant of the (n − 1) × (n − 1) matrix obtained from M by deleting its ith row and jth column. Let A = Adj(M ) denote the adjugate matrix of M , that is, the matrix whose ijth entry is (−1)i+j aji . It is a standard consequence of Laplace’s determinant expansion [36, §4.III]19 that M A = (det M )In . Finally, let aj denote the jth column of A. Note that M aj = (det M )ej , since both sides are the jth column of (det M )In . Lemma 3.11. Fix an integer n ≥ 3. Let M be an n × n matrix with cofactors aij . Also let Z denote the (n − 2) × (n − 2) matrix obtained from M by deleting the first two rows and first two columns, so that     m11  m12  *   M =  m21  m22   * , Z  ∗  ∗  (3.6)  where ∗ represents irrelevant entries. Then a11 a22 −a12 a21 = (det M )(det Z). It is important to note that when det Z = 0, the cofactor a11 is a linear polynomial of m22 whose leading coefficient is det Z, and whose constant term depends only on Z and the starred entries; conversely, the cofactor a22 is a linear polynomial of m11 with leading coefficient det Z (similar correspondences hold for the pair a12 and a21 ). For example, when n = 3 the determinant of the 1 × 1 matrix Z is simply the lower-right entry m33 of M ; 19 Our definition of cofactor differs slightly from that in [36], which includes the (−1)i+j term. We prefer this for simplicity as we work mostly with aij and not the adjugate A.  49  3.3. Eigenvalues and determinants the identity in question thus translates to (m11 m33 − m13 m31 )(m22 m33 − m23 m32 ) − (m12 m33 − m13 m32 )(m21 m33 − m23 m31 ) = m33 det M. (3.7) For any fixed dimension n, the assertion of Lemma 3.11 is just a polynomial identity like the above, which is verifiable by direct computation; however, a proof that works for general n requires a bit of cunning. Proof. Define an n × n matrix   a11   B = a1 a2 e3 · · · en =  −a12  −a21  0    a22  0   .  ∗  In−2  ∗  Since B is in lower-triangular block form, its determinant is easy to evaluate: det B = det  a11  −a21  −a12  a22  · det In−2 = a11 a22 − a12 a21 .  Moreover, M B = M a1 M a2 M e3 · · · M en   = (det M )e1 (det M )e2 m3 · · · mn =   det M  0  0  det M  0  0  ∗     ∗ . Z  Since M B is in upper-triangular block form, its determinant det(M B) = (det M )2 (det Z) is also easy to evaluate. Using the identity det M · det B = det(M B), we conclude that (det M )(a11 a22 − a12 a21 ) = (det M )2 (det Z). Both sides of this last identity are polynomial functions of the n2 variables mij representing the entries of M . The factor det M on both sides is a 50  3.4. Proof of main results for n ≥ 2 nonzero polynomial, and hence it can be canceled to obtain (det M )(det Z) = a11 a22 − a12 a21 as desired. Remark. This proof generalizes readily to an analogous identity for larger minors of the adjugate matrix A. Indeed, Muir’s classic treatise on determinants [60, Ch. VI, §175] includes this general form of Lemma 3.11 in a chapter wholly devoted to compound determinants (that is, determinants of matrices whose elements are themselves determinants). The same result can also be found in Scott’s reference of equally old vintage [75, p. 62], which was made freely available online by the Cornell University Library Historical Math collection.20  3.4  Proof of main results for n ≥ 2  We are now ready to prove the singularity bound of Lemma 3.2. We proceed by an induction on n, which requires n = 2 and n = 3 as base cases. In the arguments that follow, let M denote a matrix in Mn (k) with entries mij . Base case n = 2: The determinant of  m11 m12 m21 m22  m11 m22 = m12 m21 . By Lemma 3.9, there are  ε  is 0 precisely when  k 2+ε  solutions to this  equation with the variables m11 , m12 , m21 , and m22 all in Ik . This immediately shows that |M02 (k)|  ε  k 2+ε as claimed. Since 1/|M2 (k)|  k −4 ,  we see that the probability of a randomly chosen matrix from M2 (k) being singular is |M02 (k)|/|M2 (k)|  ε  k −2+ε .  Base case n = 3: We first estimate the number of matrices in M03 (k) whose lower right-hand entry m33 is nonzero. Fix the five entries in the last row and last column of M , with m33 = 0; there are a total of 2k(2k+1)4  k5  possibilities. Using the earlier identity (3.7), we see that if det M = 0 then 20  http://ebooks.library.cornell.edu/cgi/t/text/text-idx?c=math;idno= 01670002  51  3.4. Proof of main results for n ≥ 2 we must have (m11 m33 − m13 m31 )(m22 m33 − m23 m32 ) = (m12 m33 − m13 m32 )(m21 m33 − m23 m31 ). This equation is of the form L1 (m11 )L2 (m22 ) = L3 (m12 )L4 (m21 ), where the Li are nonconstant linear polynomials whose coefficients are at most k 2 in absolute value. (Note that we have used the fact that m33 = 0 in asserting that the Li are nonconstant.) Applying Lemma 3.9 with α = 2, we see that there are  ε  k 2+ε solutions to this equation with m11 , m12 , m21 , and m22  all in Ik . This shows that there are  ε  k 7+ε matrices in M03 (k) whose lower  right-hand entry m33 is nonzero. If any of the entries in the last row of M is nonzero, then we can swap two columns of M , if necessary, to bring that entry into the lower righthand position; any matrix so formed corresponds to at most three distinct matrices in M03 (k), and so there are still  ε  k 7+ε matrices in M03 (k) that  have any nonzero entry in the last row. Finally, any matrix whose last row consists of all zeros is certainly in M03 (k), but there are only (2k + 1)6  k6  such matrices. We conclude that the total number of matrices in M03 (k) is ε  k 7+ε , so that the probability of a randomly chosen matrix from M3 (k)  being singular is |M03 (k)|/|M3 (k)|  ε  k −2+ε as claimed.  Inductive step for n ≥ 4: Write a matrix M ∈ Mn (k) in the form (3.6). Some such matrices will have det Z = 0; however, by the induction hypothesis for n − 2, the probability that this occurs is  n,ε  k −2+ε (inde-  pendent of the entries outside Z), which is an allowably small probability. Otherwise, fix values in Ik for the n2 − 4 entries other than m11 , m12 , m21 , and m22 such that det Z = 0. It suffices to show that, conditioned on any such fixed values, the probability that M is singular is  n,ε  k −2+ε , as  we let the remaining variables m11 , m12 , m21 , and m22 range over Ik , . By Lemma 3.11, we see that det M = 0 is equivalent to a11 a22 = a12 a21 . Recall that the cofactor a11 is a linear polynomial of m22 with leading coefficient det Z, while a22 is a linear polynomial of m11 with leading coefficient det Z (and similarly for the pair a12 and a21 ). Moreover, the coefficients 52  3.5. Preliminaries for 2 × 2 matrices of these polynomials are at most (n − 1)! k n−1 in magnitude: by Laplace expansion, each coefficient is the sum of at most (n − 1)! products, each composed of at most n − 1 entries of M . We may thus apply Lemma 3.9 with α = n − 1 and B = (n − 1)! to see that the (conditional) probability of a11 a22 = a12 a21 is  n,ε  k −2+ε , as desired.  We now have everything we need to prove Theorem 3.1: given any integer n ≥ 2 and any real number ε > 0, the probability that a randomly chosen matrix in Mn (k) has an integer eigenvalue is  n,ε  1/k 1−ε . By Lemma 3.10,  any such eigenvalue λ is at most nk in absolute value. For each individual λ, we observe that if M ∈ Mn (k) has eigenvalue λ, then M − λIn is a singular matrix with integer entries bounded in size by k + |λ| ≤ (n + 1)k. Thus, every matrix in Mλn (k) is also contained in the set M + λIn : M ∈ M0n (n + 1)k By Lemma 3.2, the cardinality of this set is 2 k n −2+ε  n,ε  . ((n + 1)k)n  2 −2+ε  n,ε  for any fixed λ. Summing over all integer values of λ from −nk to  nk (admittedly, some matrices are counted multiple times, but the upper bound thus obtained is still valid), we conclude that the total number of matrices in MZn (k) is  n,ε  2 −1+ε  kn  . In other words, the probability that a  matrix in Mn (k) has an integer eigenvalue is |MZn (k)|/|Mn (k)|  n,ε  1/k 1−ε ,  as desired.  3.5  Preliminaries for 2 × 2 matrices  Having established Theorem 3.1 and Lemma 3.2 for general n, we can give considerably sharper results in the simplest case n = 2. The remainder of this chapter is devoted to the proofs of Theorem 3.3 and Corollary 3.4 for integer eigenvalues, and also Theorem 3.5 for real eigenvalues. The 2 × 2 case is particularly nice: since the trace of an integer matrix is itself an integer, it follows that if one eigenvalue is an integer then both are. Consequently, there is no significant distinction between belonging to 53  3.5. Preliminaries for 2 × 2 matrices MZ2 (k) and being diagonalizable over Q (if M has rational eigenvalues but is not rationally diagonalizable, then its eigenvalues must be repeated, and we confirm in Lemma 3.14 that this occurs very rarely). We begin with some elementary observations for 2 × 2 matrices (both integer-valued and real-valued) that will simplify our computations of the functions V (δ) and W (δ). The key to the precise enumeration of Mλ2 (k) is the simple structure of singular integer matrices: Lemma 3.12. For any singular matrix M ∈ M02 (k), either at least two entries of M equal zero, or else there exist nonzero integers a, b, c, d with (a, b) = 1 such that M=  ac  bc  ad bd  .  (3.8)  Moreover, this representation of M is unique up to replacing each of a, b, c, d by its negative. Proof. If one of the entries of M equals zero, then a second one must equal zero as well for the determinant to vanish. Otherwise, given M=  m11 m12 m21 m22  with none of the mij equal to zero, define c = (m11 , m12 ), and set a = m11 /c and b = m12 /c, so that (a, b) = 1. Since M is singular, the second row of m must be a multiple of the first row—that is, there exists a real number d such that m21 = ad and m22 = bd. Since a and b are relatively prime, moreover, d must in fact be an integer. This shows that every such matrix admits a representation of type (3.8). If there is another representation M=  ac  bc  ad  bd  ,  then (a , b ) = 1 implies (a c , b c ) = |c |, which shows that |c | = c; the equalities |a | = |a|, |b | = |b|, and |d | = |d| follow quickly. 54  3.5. Preliminaries for 2 × 2 matrices For a 2 × 2 matrix M =  ac bd  , we define disc M = (tr M )2 − 4 det M =  (a − d)2 + 4bc, where tr M is the trace a + d and det M is the determinant ad − bc. It is easily seen that disc M is the discriminant of the characteristic polynomial of M . We record the following elementary facts, which will be useful in the proofs of Lemma 3.14 and Proposition 3.20. Lemma 3.13. Let M be a 2 × 2 matrix with real entries. (a) M has repeated eigenvalues if and only if disc M = 0. (b) M has real eigenvalues if and only if disc M ≥ 0. (c) det M < 0 if and only if M has two real eigenvalues of opposite sign. (d) If det M > 0 and disc M ≥ 0, then the eigenvalues of M have the same sign as tr M . Proof. Let λ1 , λ2 denote the eigenvalues of M , so that tr M = λ1 + λ2 , det M = λ1 λ2 , and disc M = (λ1 − λ2 )2 , each of which is real. Parts (a), (b) and (d) follow immediately from these observations, and part (c) from the fact that λ2 = λ1 if λ1 , λ2 are complex. The next lemma gives a bound for the probability of a matrix having repeated eigenvalues. It is natural to expect this probability to approach 0 as k increases, and indeed such a result was obtained in [37] for matrices of arbitrary size, with an upper bound of On (1/k). We give a simple proof of a stronger bound for the 2 × 2 case, as well as an analogous qualitative statement for real matrices which will be helpful in the proof of Theorem 3.5. Lemma 3.14. The number of matrices in M2 (k) with a repeated eigenvalue is  ε  k 2+ε for every ε > 0. The probability that a random matrix in  M2 ([−1, 1]) has a repeated eigenvalue or is singular is 0. Proof. By Lemma 3.13(a), the 2 × 2 matrix and only if 4bc = −(a −  d)2 .  ab cd  has a double eigenvalue if  For matrices in M2 ([−1, 1]) this is easily seen  to be a zero-probability event, as is the event that det M = ad − bc = 0. 55  3.6. Enumeration theorems for integer eigenvalues For matrices in M2 (k), we enumerate how many can satisfy 4bc = −(a − d)2 .  This follows the same argument as Lemma 3.9: if a = d then one of b  or c must be 0, yielding at most  k 2 solutions. Otherwise, this there are  O(k 2 ) choices for a and d that make (a − d) nonzero, each yielding choices for b, c. Combining the two yields  3.6  ε  ε  kε  k 2+ε solutions in total.  Enumeration theorems for integer eigenvalues  Let µ(n) be the M¨ obius function, characterized by the identity  d|n   1, if n = 1, µ(d) = 0, if n > 1.  The well-known Dirichlet series identity 1/ζ(s) = for  (3.9) ∞ −s n=1 µ(n)n  is valid  s > 1 (see [59, Corollary 1.10], for example). In particular, 6 µ(n) = 2, 2 n π  and we can estimate the tail of this series (using |µ(n)| ≤ 1) to obtain the quantitative estimate  d≤k  µ(d) 6 1 = 2 +O d2 π k  .  (3.10)  Lemma 3.15. For nonzero integers a, b and parameters k, λ, define the function Nk,λ (a, b) = #{(c, d) ∈ Z2 , c = 0, d = 0 : |ac + λ|, |bc|, |ad|, |bd + λ| ≤ k}. (3.11) Then Mλ2 (k) = 4  Nk,λ (dα, dβ) + O(k 2 ),  µ(d) d≤k  1≤α<β≤k/d  where the implied constant is independent of λ and k. Proof. Fix an integer 0 ≤ λ ≤ 2k, and let M ∈ Mλ2 (k), so that M − λI is 56  3.6. Enumeration theorems for integer eigenvalues singular. By Lemma 3.12, either at least two entries of M − λI equal zero, or else M − λI has exactly two representations of the form (3.8). In the former case, there are 2k + 1 choices for each of the two potentially nonzero entries, hence O(k 2 ) such matrices in total (even taking into account the several different choices of which two entries are nonzero). In the latter case, there are exactly two corresponding quadruples a, b, c, d of integers as in Lemma 3.12. Taking into account that each entry of M must be at most k in absolute value, we deduce that Mλ2 (k) =  1 2  # (a, b, c, d) ∈ Z4 : a, b, c, d = 0, (a, b) = 1, |ac + λ|, |bc|, |ad|, |bd + λ| ≤ k + O(k 2 )  =  Nk,λ (a, b) + O(k 2 ).  1 2 1≤|a|,|b|≤k (a,b)=1  Because of the symmetries Nk,λ (a, b) = Nk,λ (−a, b) = Nk,λ (a, −b) = Nk,λ (−a, −b), we have Nk,λ (a, b) + O(k 2 ).  Mλ2 (k) = 2 1≤a,b≤k (a,b)=1  The only term in the sum where a = b is the term a = b = 1, and for all other terms we can invoke the additional symmetry Nk,λ (a, b) = Nk,λ (b, a), seen to be valid by switching the roles of c and d in the definition (3.11) of Nk,λ (a, b). We obtain Mλ2 (k) = 4  Nk,λ (a, b) + 2Nk,λ (1, 1) + O(k 2 ) 1≤a<b≤k (a,b)=1  Nk,λ (a, b) + O(k 2 ),  =4 1≤a<b≤k (a,b)=1  where the last step used the fact that Nk,λ (1, 1) ≤ #{(c, d) ∈ Z2 : |c|, |d| ≤ k}  k2 . Using the characteristic property of the M¨obius function (3.9), we can 57  3.6. Enumeration theorems for integer eigenvalues write the last expression as Mλ2 (k) = 4  µ(d) + O(k 2 )  Nk,λ (a, b) 1≤a<b≤k  =4  d|(a,b)  Nk,λ (a, b) + O(k 2 )  µ(d) d≤k  =4  1≤a<b≤k d|a, d|b  Nk,λ (dα, dβ) + O(k 2 ).  µ(d) d≤k  1≤α<β≤k/d  Lemma 3.16. Let k and λ be integers with 0 ≤ λ ≤ 2k, and let x and y be integers with 1 ≤ x ≤ y ≤ k. Then Nk,λ (x, y) = k 2 C  λ k ; x, y  D  λ k ; x, y  +O  k y  ,  where C(δ; x, y) = max 0, min D(δ; x, y) = min  1−δ y  1−δ x  + y1 , y2  ,  + x1 , y2 .  (3.12)  Proof. We have Nk,λ (x, y) = #{(c, d) ∈ Z2 , c = 0, d = 0 : |xc + λ|, |yc|, |xd|, |yd + λ| ≤ k} = #{c ∈ Z, c = 0 : −k ≤ xc + λ ≤ k, −k ≤ yc ≤ k} × #{d ∈ Z, d = 0 : −k ≤ xd ≤ k, −k ≤ yd + λ ≤ k}.  58  3.6. Enumeration theorems for integer eigenvalues Since x and y are positive, we can rewrite this product as Nk,λ (x, y) = #{c ∈ Z, c = 0 : (−k − λ)/x ≤ c ≤ (k − λ)/x, −k/y ≤ c ≤ k/y} × #{d ∈ Z, d = 0 : −k/x ≤ d ≤ k/x, (−k − λ)/y ≤ d ≤ (k − λ)/y} = # c ∈ Z, c = 0 : −k/y ≤ c ≤ min{(k − λ)/x, k/y}  (3.13)  × # d ∈ Z, d = 0 : max{−k/x, (−k − λ)/y} ≤ d ≤ (k − λ)/y , where we have used λ ≥ 0 and x ≤ y to simplify the inequalities slightly. The first factor on the right-hand side of equation (3.13) is min{(k − λ)/x, k/y} − (−k/y) + O(1) = k min (1 − λk )/x + 1/y, 2/y + O(1) if this expression is positive, and 0 otherwise; it is thus precisely kC( λk ; x, y)+ O(1). Similarly, the second factor on the right-hand side of (3.13) is (k−λ)/y−max{−k/x, (−k−λ)/y}+O(1) = k min{(1− λk )/y+1/x, 2/y +O(1) (note that this expression is always positive under the hypotheses of the lemma), which is simply kD( λk ; x, y) + O(1). Multiplying these two factors yields Nk,λ (x, y) = k 2 C  λ k ; x, y  D  λ k ; x, y  + k · O(C  λ k ; x, y  +D  λ k ; x, y  ) + O(1).  The lemma follows upon noting that both C( λk ; x, y) and D( λk ; x, y) are 1/y by definition, so the second summand becomes simply O( ky ), and the O(1) term may be subsumed into O( ky ) since y ≤ k. We have already used the trivial estimate 1 = (U − L) + O(1), L≤α<U  provided 0 < L < U . We will also use, without further comment, the  59  3.6. Enumeration theorems for integer eigenvalues estimates L≤α<U  1 U 1 , = log + O α L L  with its particular case  1≤α<U  and L≤α<U  1 = log U + O(1), α  1 1 1 1 = − +O . α2 L U L2  These estimates (also valid for 0 < L < U ) follow readily from comparison to the integrals  U L  1/x dx and  U L  1/x2 dx.  Most of the technical work in proving Theorem 3.3 lies in establishing an estimate for a sum of the form  1≤α<β  C(δ; α, β)D(δ; α, β) for a fixed β.  The following proposition provides an asymptotic formula for this sum; we defer the proof until the next section. Assuming this proposition, though, we can complete the proof of Theorem 3.3, as well as Corollary 3.4. Proposition 3.17. Let β ≥ 1 and 0 ≤ δ ≤ 2 be real numbers, and let C and D be the functions defined in equation (3.12). Then C(δ; α, β)D(δ; α, β) = 1≤α<β  V (δ) 1 + log β +O , β β2  where V (δ) was defined in equation (3.1). Proof of Theorem 3.3 assuming Proposition 3.17. The functions C and D defined in equation (3.12) are homogeneous of degree −1 in the variables x and y, so that Lemma 3.16 implies Nk,λ (dα, dβ) =  k2 C d2  λ k ; α, β  D  λ k ; α, β  +O  k dβ ).  60  3.6. Enumeration theorems for integer eigenvalues Inserting this formula into the conclusion of Lemma 3.15 yields Mλ2 (k) = 4k 2 d≤k  µ(d) d2  C  λ k ; α, β  D  λ k ; α, β  1≤α<β≤k/d  +O d≤k 1≤α<β≤k/d  k dβ  + O(k 2 ).  We bound the first error term by summing over 1 ≤ α < β to obtain  d≤k 1≤α<β≤k/d  k ≤ dβ  d≤k 1<β≤k/d  k < d  d≤k  k2 d2  k2 ,  so that we have the estimate Mλ2 (k) = 4k 2 d≤k  µ(d) d2  C  λ k ; α, β  D  λ k ; α, β  + O(k 2 ).  (3.14)  1≤α<β≤k/d  We now apply Proposition 3.17 to obtain Mλ2 (k) = 4k 2 d≤k  = 4k 2 d≤k  µ(d) d2  1≤β≤k/d  + O(k 2 )  µ(d) k V (λ/k) log + O(1) + O(1) + O(k 2 ) 2 d d  = 4k 2 V (λ/k) log k d≤k  = 4k 2 V (λ/k) log k =  V (λ/k) 1 + log β +O β β2  µ(d) +O d2  6 1 +O 2 π k  d≤k  log d d2  + O(k 2 )  + O(1) + O(k 2 )  24V (λ/k) 2 k log k + O(k 2 ), π2  where we have used equation (3.10) and the convergence of  1/n2 and  (log n)/n2 (so the partial sums are uniformly bounded). Proof of Corollary 3.4 from Theorem 3.3. For any M ∈ M2 (k), if one eigenvalue is an integer then they both are (since the trace of M is an integer).  61  3.7. Proof of key proposition for enumeration Thus if we add up the cardinalities of all of the Mλ2 (k), we get twice the cardinality of MZ2 (k), except that matrices with repeated eigenvalues only get counted once. However, the number of such matrices is  ε  k 2+ε by  Lemma 3.14. Therefore |Mλ2 (k)| + Oε (k 2+ε )  2|MZ2 (k)| = λ∈Z  = =  −2k≤λ≤2k 24k 3 log k  π2  24V (λ/k) 2 k log k + O(k 2 ) + Oε (k 2+ε ) π2  −2k≤λ≤2k  V (λ/k) + O(k 3 ). k  The sum is a Riemann sum of a function of bounded variation, so this becomes 2|MZ2 (k)| =  24k 3 log k π2  2  V (δ) dδ + O −2  1 k  + O(k 3 ).  The corollary then follows from the straightforward computation of the in√ √ √ 2 tegral −2 V (δ) dδ = 49 (7 2 + 4 + 3 log( 2 + 1)), noting that log( 2 − 1) = √ − log( 2 + 1).  3.7  Proof of key proposition for enumeration  It remains to prove Proposition 3.17. Recalling that the functions C and D defined in (3.12) are formed by combinations of minima and maxima, we need to separate our arguments into several cases depending on the range of δ. The following lemma addresses a sum that occurs in two such cases √ (0 < δ < 1 and 1 < δ < 2). Note that the formula for V (δ) contains terms like log(δ − 1), so we need to exercise some caution near δ = 1. Lemma 3.18. Let β ≥ 1 and 0 ≤ δ ≤  √  2 be real numbers, with δ = 1.  62  3.7. Proof of key proposition for enumeration Then 1−δ 1 + α β  max{1,|1−δ|β}≤α<(1+δ)−1 β  =  2 β  2 β  1 1 + log β . − |1 − δ| − (1 − δ) log |1 − δ 2 | + O 1+δ β2  Proof. Suppose first that |1 − δ|β ≥ 1. Then the sum in question is 2(1 − δ) β  |1−δ|β≤α<(1+δ)−1 β  2 1 + α β2  1 |1−δ|β≤α<(1+δ)−1 β  β/(1 + δ) 1 2(1 − δ) log +O β |1 − δ|β |1 − δ|β β 2 + 2 − |1 − δ|β + O(1) β 1+δ 2 1 1 = − |1 − δ| − (1 − δ) log |1 − δ 2 | + O 2 , β 1+δ β  =  which establishes the lemma in this case. On the other hand, if |1 − δ|β < 1 then the sum in question is 2(1 − δ) β  1≤α<(1+δ)−1 β  1 2 + 2 α β  1 1≤α<(1+δ)−1 β  2(1 − δ) β 2 β log + O(1) + 2 + O(1) β 1+δ β 1+δ 2 1 1 |1 − δ| log β = − (1 − δ) log(1 + δ) + O 2 + . β 1+δ β β  =  We subtract 2(|1 − δ| + (1 − δ) log |1 − δ|)/β from the main term and com-  63  3.7. Proof of key proposition for enumeration pensate in the error term to obtain 2(1 − δ) β  1≤α<(1+δ)−1 β  1 2 + 2 α β  1 1≤α<(1+δ)−1 β  1 − |1 − δ| − (1 − δ) log |1 − δ 2 | 1+δ |1 − δ| log β |1 − δ| + |1 − δ| log |1 − δ|−1 ) 1 + +O 2 + β β β 2 1 = − |1 − δ| − (1 − δ) log |1 − δ 2 | β 1+δ 1 + log β |1 − δ| log |1 − δ|−1 ) + , +O β2 β  =  2 β  since we are working with the assumption that |1 − δ| < 1/β. Because the function t log t−1 is increasing on the interval (0, 1/e) and bounded on the interval (0, 1], we have |1 − δ| log |1 − δ|−1 < (1/β) log β if β > e and |1 − δ| log |1 − δ|−1  1/β if 1 ≤ β ≤ e. In either case, the last error  1  term can be simplified to O((1 + log β)/β 2 ), which establishes the lemma in second case. Proof of Proposition 3.17. We consider separately the four cases corresponding to the different parts of the definition (3.1) of V (δ). For brevity we will suppress the dependence on δ, α and β from the notation for C and D. • Case 1: 0 ≤ δ < 1. In this case we have 0 < 1 − δ < (1 + δ)−1 ≤ 1 and  C=   2,  β  1−δ α  if α ≤ (1 − δ)β, + β1 ,  and D =  if (1 − δ)β ≤ α   2,  β  1−δ β  if α ≤ (1 + δ)−1 β, + α1 ,  if (1 + δ)−1 β ≤ α.  Therefore CD = 1≤α<β  1≤α<(1−δ)β  2 2 · + β β  max{1,(1−δ)β}≤α<(1+δ)−1 β  + (1+δ)−1 β≤α<β  1−δ 1 + α β  1−δ 1 + α β  2 β  1−δ 1 + . (3.15) β α  64  3.7. Proof of key proposition for enumeration (The first sum might be empty, but this does not invalidate the argument that follows.) The first sum is simply 4 β2  1= 1≤α<(1−δ)β  4 4(1 − δ) 1 (1 − δ)β + O(1) = +O 2 . β2 β β  By Lemma 3.18, the second sum is  max{1,(1−δ)β}≤α<(1+δ)−1 β  =  2 β  1−δ 1 + α β  2 β  1 1 + log β + δ − 1 − (1 − δ) log(1 − δ 2 ) + O , 1+δ β2  while the third sum is  (1+δ)−1 β≤α<β  1−δ 1 + α β  1−δ 1 + β α  = (1 − δ) (1+δ)−1 β≤α<β  +  1−δ β2  1 α2  +  δ 2 − 2δ + 2 β  (1+δ)−1 β≤α<β  1 α  1 (1+δ)−1 β≤α<β  1 1 1+δ − +O 2 β β β 2 δ − 2δ + 2 β 1 + log +O −1 β (1 + δ) β β 1−δ β + β− + O(1) β2 1+δ 1 2 1 = 2 − δ2 − + (δ 2 − 2δ + 2) log(1 + δ) + O 2 . β 1+δ β (3.16)  = (1 − δ)  65  3.7. Proof of key proposition for enumeration This case of the proposition then follows from (3.15) upon noting that 4−4δ+  2 2 +2δ−2−2(1−δ) log(1−δ 2 )+2−δ 2 − +(δ 2 −2δ+2) log(1+δ) 1+δ 1+δ = 4 − 2δ − δ 2 + δ 2 log(1 + δ) − 2(1 − δ) log(1 − δ).  • Case 2: δ = 1. In this case we have  2/β, if α ≤ β/2, 1 C= and D = 1/α, if β/2 ≤ α. β Therefore CD = 1≤α<β  1≤α<β/2  1 2 · + β β  β/2≤α<β  1 1 · β α  2 β 1 β 1 + O(1) + log +O 2 β 2 β β/2 β 1 + log 2 1 = +O 2 , β β  =  as desired.  √ • Case 3: 1 < δ ≤ 2. In this case we have   0, 2, if α ≤ (δ − 1)β, C= and D = β  1−δ + 1 , if (δ − 1)β ≤ α  1−δ + 1 , α β β α  if α ≤ (δ + 1)−1 β, if (δ + 1)−1 β ≤ α.  Therefore CD = 1≤α<β  max{1,(δ−1)β}≤α<(δ+1)−1 β  + (δ+1)−1 β≤α<β  1−δ 1 + α β 1−δ 1 + α β  2 β 1−δ 1 + . (3.17) β α  (We note that (δ − 1)β ≤ (δ + 1)−1 β for δ between 1 and β we might have 1 > (δ +  1)−1 β,  √  2. For very small  in which case the first sum is empty, but  66  3.7. Proof of key proposition for enumeration that does not invalidate the argument that follows.) By Lemma 3.18, the first sum is  max{1,(δ−1)β}≤α<(1+δ)−1 β  =  2 β  1−δ 1 + α β  2 β  1 1 + log β , + 1 − δ − (1 − δ) log(δ 2 − 1) + O 1+δ β2  while the second sum has already been evaluated in equation (3.16) above. This case of the proposition then follows from (3.17) by noting that 2 2 + 2 − 2δ − 2(1 − δ) log(δ 2 − 1) + 2 − δ 2 − + (δ 2 − 2δ + 2) log(δ + 1) 1+δ δ+1 = 4 − 2δ − δ 2 + δ 2 log(δ + 1) + 2(δ − 1) log(δ − 1). • Case 4:  √  2 < δ ≤ 2. Just as in Case 3, we have   0, C=  1−δ + 1 , α β  if α ≤ (δ − 1)β, if (δ − 1)β ≤ α  and D =   2,  β  1−δ β  if α ≤ (δ + 1)−1 β, + α1 ,  if (δ + 1)−1 β ≤ α.  However, the inequality (δ−1)β ≤ α automatically implies that (δ+1)−1 β ≤ √ α when δ ≥ 2. Therefore CD = 1≤α<β  (δ−1)β≤α<β  1 1−δ + α β  1−δ 1 + . β α  (In this case we will not need to use the precise lower bound max{1, (δ −  67  3.8. Distribution of real eigenvalues 1)β} ≤ α for the summation over α.) This yields CD = (1 − δ) 1≤α<β  (δ−1)β≤α<β  +  1−δ β2  1 α2  +  δ 2 − 2δ + 2 β  (δ−1)β≤α<β  1 α  1 (δ−1)β≤α<β  1 1 1 − +O (δ − 1)β β (δ − 1)2 β 2 δ 2 − 2δ + 2 β 1 + log +O β (δ − 1)β (δ − 1)β 1−δ + β − (δ − 1)β + O(1) β2  = (1 − δ)  =  1 2 1 δ − 2δ − (δ 2 − 2δ + 2) log(δ − 1) + O 2 , β β  where the error terms have been simplified since δ−1 is bounded away from 0. This concludes the proof of the key proposition which yields Theorem 3.3.  3.8  Distribution of real eigenvalues  In proving Theorem 3.5, it will be convenient to define the odd function z  − log |t| dt = z(1 − log |z|),  G(z) =  (3.18)  0  whose relevance is demonstrated by the following lemma. Lemma 3.19. If B and C are independent random variables uniformly distributed on [−1, 1], then the product BC has the distribution function FBC (z) = Pr(BC < z) = 21 (1 + G(z)) for z ∈ [−1, 1]. (Of course FBC (z) = 0 for any z < −1, and FBC (z) = 1 for any z > 1.) Proof. Note that |B| and |C| are uniformly distributed on [0, 1]. For 0 ≤ z ≤ 1, we easily check that Pr(|BC| < z) =  1 0 min{1, z/s} ds  = G(z).  Thus |BC| is distributed on [0, 1] with density f|BC| (z) = − log z, and by  68  3.8. Distribution of real eigenvalues symmetry BC has density fBC (z) = − 12 log |z| on [−1, 1]. The lemma follows upon computing FBC (z) =  z −1 fBC (s) ds.  It will also be helpful to define the following functions, which are symmetric in x and y: ν1 (x, y) = 1/2 + G(xy)/2 + G((x − y)2 /4),  (3.19)  ν2 (x, y) = 1/2 − G(xy)/2,    ν1 (x, y),     ν (x, y), 2 ν(x, y) =   1 + G((x − y)2 /4),     0,  (3.20) if xy < 1 and x + y < 0, if xy < 1 and x + y > 0,  (3.21)  if xy > 1 and x + y < 0, otherwise.  To prove Theorem 3.5, we first consider the distribution function FW (δ) =  W (t) dt t<δ  associated to the density W (δ). For a random matrix M in M2 ([−1, 1]) and a real number δ, we will derive an expression for the expected number of real eigenvalues of M falling below δ, then differentiate it to obtain W (δ). It is clear that W (−δ) = W (δ) since the set M2 ([−1, 1]) is closed under negation, so it suffices to compute W (δ) for δ ∈ [0, 2]. It turns out that our calculations for FW will be somewhat simplified by considering FW (−δ) rather than FW (δ). Proposition 3.20. We have FW (−δ) =  1 4  1+δ  1+δ  −1+δ  −1+δ  ν(x, y) dx dy  for all 0 ≤ δ ≤ 2, where ν is defined in equation (3.21). Proof. We denote the entries of M by A, B, C, D; by assumption these are independent random variables uniformly distributed in [−1, 1]. Let δ be 69  3.8. Distribution of real eigenvalues fixed in the range [0, 2], and consider the shifted matrix M = M +δI, which we write as M =  X B C  Y  ,  where X, Y range independently and uniformly in [−1 + δ, 1 + δ] and B, C are as before. Clearly the eigenvalues of M lying below −δ correspond to the negative (real) eigenvalues of M . By Lemma 3.14, we are free to exclude the null set where M is singular or has repeated eigenvalues. Outside of this null set, M has exactly one negative eigenvalue if and only if det M = XY − BC < 0, by Lemma 3.13(c). Likewise by Lemma 3.13(d), M has exactly two negative eigenvalues if and only if XY − BC > 0,  X +Y <0  and  disc M = (X − Y )2 + 4BC > 0.  We thus have: FW (−δ) = Pr(BC > XY )+2 Pr X+Y < 0 and −  (X − Y )2 < BC < XY . 4  By conditioning on the values of X and Y , we may express this probability as the average value FW (−δ) =  1 4  1+δ  1+δ  −1+δ  −1+δ  ρ(x, y) dx dy,  where for fixed x and y, ρ(x, y) = Pr(BC > xy) + 2 Pr(x + y < 0 and −(x − y)2 /4 < BC < xy) = Pr(BC > xy) + 2 Pr(−(x − y)2 /4 < BC < xy)1{x + y < 0} (3.22) (here 1{·} denotes the indicator function of the indicated relation). To complete the proof it suffices to show that ρ equals the function ν defined 70  3.8. Distribution of real eigenvalues in equation (3.21). The probabilities appearing in equation (3.22) are effectively given by Lemma 3.19. However, there is some case-checking involved in applying this lemma, since the value of, say, Pr(BC > xy) = 1 − FBC (xy) depends on whether xy < −1, −1 ≤ xy ≤ 1, or xy < −1. We make some observations to reduce the number of cases we need to examine. Note that (x − y)2 /4 is bounded between 0 and 1 for any x, y ∈ [−1 + δ, 1 + δ], so that −(x − y)2 /4 always lies in the interval [−1, 1] prescribed by Lemma 3.19. From the identity (x + y)2 − (x − y)2 = 4xy we see also that xy ≥ −(x − y)2 /4. Thus xy is never lower than −1, and we need only consider whether xy > 1 (in which case FBC (xy) = 1). We therefore have Pr(BC > xy) = 1 − FBC (xy) = 21 (1 − G(xy))1{xy < 1} and 2 Pr(−(x − y)2 /4 < BC < xy) = 2FBC (xy) − 2FBC (−(x − y)2 /4) = 1{xy > 1} + G(xy)1{xy < 1} + G((x − y)2 /4). Inserting these two evaluations into the formula (3.22), we obtain ρ(x, y) = 12 (1 − G(xy))1{xy < 1} + 1{xy > 1} + G(xy)1{xy < 1} + G  (x − y)2 4  1{x + y < 0}.  It can be verified that this last expression is indeed equal to the right-hand side of the definition (3.21) of ν, which establishes the proposition. d Since W (δ) = W (−δ) = − dδ FW (−δ), to finish the proof of Theorem 3.5 d it suffices to show that − dδ FW (−δ) equals the expression in formula (3.4).  71  3.9. The derivative of the distribution  0  0 0 Ν2  Ν2  Ν2  Ν1  Figure 3.2: The three cases 0 ≤ δ ≤ 1, and 1 ≤ δ ≤  3.9  √  2, and  √  2≤δ≤2  The derivative of the distribution  Proposition 3.20 expresses FW (δ) as an integral of a function ν (which is independent of δ) over the square Sδ := [−1+δ, 1+δ]2 ⊂ R2 . Since the region d Sδ varies continuously with δ, the derivative − dδ FW (−δ) can be computed  by an appropriate line integral around the boundary of Sδ . Indeed, by the fundamental theorem of calculus, we have −  d 1 d FW (−δ) = − dδ 4 dδ =−  1 4  1+δ  1+δ  −1+δ 1+δ  −1+δ  ν(x, y) dx dy 1+δ  ν(1 + δ, y) dy − −1+δ 1+δ  1+δ  ν(x, 1 + δ) dx −  + −1+δ  =  1 2  ν(−1 + δ, y) dy −1+δ  1+δ  ν(x, −1 + δ) dx − −1+δ  1 2  ν(x, −1 + δ) dx  −1+δ 1+δ  ν(x, 1 + δ) dx,  (3.23)  −1+δ  where we have used the symmetry ν(x, y) = ν(y, x) to reduce the integral to just the top and bottom edges of Sδ (where y = 1 + δ and y = −1 + δ, respectively). The evaluation of (3.23) divides into three cases depending on the behaviour of the indicator functions 1{x + y < 0} and 1{xy < 1} on the boundary of Sδ . Figure 3.2 depicts the possible intersections of Sδ with the line x + y = 0 and the hyperbola xy = 1. • Case 1: 0 ≤ δ ≤ 1. For this range of δ, the line x + y = 0 intersects 72  3.9. The derivative of the distribution the bottom edge of Sδ at x = 1 − δ, while the hyperbola xy = 1 intersects the top edge at x = (1 + δ)−1 . Thus by the definition of ν, equation (3.23) becomes −  1−δ  d 1 FW (−δ) = dδ 2  ν1 (x, −1 + δ) dx −1+δ 1+δ  (1+δ)−1  ν2 (x, −1 + δ) dx −  +  ν2 (x, 1 + δ) dx . −1+δ  1−δ  The following elementary antiderivative (easily obtained by substitution and integration by parts) hold for any fixed nonzero real number y using the definitions (3.18), (3.19), and (3.20) of G, ν1 , and ν2 : ν1 (x, y) dx = 21 x + 18 x2 y(3 − 2 log |xy|) +  1 36 (x  − y)3 (5 − 6 log |(x − y)/2|),  ν2 (x, y) dx = 12 x − 18 x2 y(3 − 2 log |xy|).  (3.24)  Therefore in this case −  d 1 FW (−δ) = dδ 2  1 2x  + 18 x2 (−1 + δ)(3 − 2 log |x(−1 + δ)|) +  =  1 36 (x  + 1 − δ)3 (5 − 6 log |(x + 1 − δ)/2|)  +  1 2x  − 18 x2 (−1 + δ)(3 − 2 log |x(−1 + δ)|)  −  1 2x  − 18 x2 (1 + δ)(3 − 2 log |x(1 + δ)|) 90δ 2  52δ 3  80 + 20δ + + − 144(1 + δ) δ(1 − δ 2 ) − log(1 + δ) 4  107δ 4  −  (5 − 7δ  1−δ x=−1+δ  1+δ  x=1−δ (1+δ)−1  x=−1+δ + 8δ 2 )(1  12  − δ)  log(1 − δ)  (after some algebraic simplification), which verifies the first case of Theorem 3.5. (The integrands really are continuous, despite the presence of terms that could be log 0, because the function G is continuous at 0; hence evaluating the integrals by antiderivatives is valid.)  73  3.9. The derivative of the distribution • Case 2: 1 ≤ δ ≤  √  2. Now, the line x + y = 0 does not intersect Sδ ,  while the hyperbola xy = 1 intersects the top edge at x = (1 + δ)−1 . Thus by the definition of ν and the antiderivative (3.24) of ν2 , equation (3.23) becomes  =  (1+δ)−1  1+δ  d 1 − FW (−δ) = dδ 2  ν2 (x, −1 + δ) dx −  ν2 (x, 1 + δ) dx −1+δ  −1+δ  1+δ  1 2  1 2x  − 18 x2 (−1 + δ)(3 − 2 log |x(−1 + δ)|) x=−1+δ (1+δ)−1  −  1 2x  − 18 x2 (1 + δ)(3 − 2 log |x(1 + δ)|) x=−1+δ  =  12δ 2  3δ 3 )  δ(20 + 10δ − − (3δ − 1)(δ − 1) + log(δ − 1) 16(1 + δ) 4 δ(δ 2 − 1) + log(δ + 1), 4  which verifies the second case of Theorem 3.5. √ • Case 3: 2 < δ ≤ 2. As before, the line x + y = 0 does not intersect Sδ , while the hyperbola xy = 1 intersects the bottom edge at x = (δ − 1)−1 . Thus by the definition of ν and the antiderivative (3.24) of ν2 , equation (3.23) becomes d 1 − FW (−δ) = dδ 2 1 = 2 =  (−1+δ)−1  ν2 (x, −1 + δ) dx −1+δ (−1+δ)−1 1 2x  −  1 2 8 x (1  + δ)(3 − 2 log |x(1 + δ)|)  δ(δ − 2)(2 − 6δ + 16(δ − 1)  x=−1+δ  3δ 2 )  −  (δ − 4  1)3  log(δ − 1),  which verifies the third case of Theorem 3.5. Since the last case of Theorem 3.5 is a consequence of Lemma 3.10, the proof of the theorem is complete. Remark. One could also use the same method to extract the individual distributions of the upper and lower eigenvalues of M : for instance, eliminating 74  3.10. Further remarks the factor of 2 from equation (3.22) would yield an expression for the distribution of the lower eigenvalue of M .  3.10  Further remarks  Theorem 3.3 can also be obtained from a powerful result of Y. R. Katznelson [43], which gives an asymptotic formula for the number of singular integer matrices contained in the dilate k · B of any convex body B ⊆ R4 (here we are embedding the 2×2 matrices into R4 in the obvious way).21 In this more general setting, the count remains proportional to k 2 log k, with a constant determined by integrating B with respect to a certain measure. This measure is fairly unusual: it is singular, being supported only on the hypersurface of R4 consisting of singular matrices. The explicit evaluation of this measure is roughly analogous to our case-by-case considerations in Section 3.7, modulo the significant complications of carrying error terms in our case. In some sense, this measure must encode the structural difference between rational eigenvalues and real eigenvalues. We expect that real eigenvalues are likewise governed by a measure that is absolutely continuous. Katznelson’s result for larger n shows that the true proportion of singular matrices in Mn (k) decays as (log k)/k n , with a constant depending on n that is precisely computable (at least in principle). This is a considerable sharpening of our Lemma 3.2 for n ≥ 3, and it is very close to the obvious lower bound of order 1/k n obtained by counting matrices that have a row of all zeros. From this one can deduce a sharper form of Theorem 3.1, that (log k)/k n−1 is the correct order of magnitude for the probability of having at least one integer eigenvalue (see [78] for further results along this line). This probability is also an upper bound for the probability of diagonalizability over Q, but when n > 2 we do not know the precise decay rate of the 2 −n)/2  latter. We note that an obvious lower bound of order 1/k (n  is given by  counting upper-triangular matrices having distinct diagonal entries, and it seems reasonable to conjecture that the true order of magnitude only differs from this bound by some power (possibly dependent on n) of log k. 21  We thank Martin Widmer for directing our attention to this paper.  75  Chapter 4  Patterns in x2 + ky 2 4.1  Introduction  In this chapter we investigate the existence of certain combinatorial patterns in the set S(k) := {x2 + ky 2 : x, y ∈ Z} (we will confine our attention to the positive definite case k ≥ 1). Let us define the two primary patterns of interest, each of which addresses a basic natural question that one might ask of any integer sequence: “how many consecutive integers can it have?” and “how far apart can two successive elements be?” For d ∈ N, we define a run of length d (or d-run) to be any string of d consecutive numbers {m, m + 1, . . . , m + d − 1}. When further specificity is required, we call this the d-run starting at m. Should a given run be entirely contained in another set S ⊆ N, we say that S contains this run (or some intuitively similar turn of phrase). By a gap of size d (or d-gap) we mean an ordered pair (m, m + d), which we also think of as starting at m. We will say that S contains the gap (m, m + d) when S contains m and m + d but no other integer between m and m + d. (When d > 1, this is equivalent to the complement N \ S having the (d − 1)-run starting at m + 1.) Our primary concern in this chapter shall be the qualitative question of whether S(k) contains infinitely many d-runs or d-gaps for a given d. We simply wish to know whether a locally feasible pattern must occur infinitely often in S(k), with comparatively less regard for the much harder problem of quantifying the precise frequency of occurrence (we discuss some quantitative aspects in the conclusion). In the case of S(1), which is the classical sums of two squares, both runs and gaps are fairly well understood. With regard to runs, one sees easily (by local considerations modulo 4) that S(1) contains no run of length 4. 76  4.1. Introduction Conversely, it is well-known (see, for example, [14]) that S(1) has infinitely many runs of length 3, which is made apparent by the relations: 4n4 + 4n2 = (2n2 )2 + (2n)2 , 4n4 + 4n2 + 1 = (2n2 + 1)2 + 02 , 4n4 + 4n2 + 2 = (2n2 + 1)2 + 12 . Sharper quantitative results are known, but only for short runs: Hooley [39] proved that the number of 2-runs in S(1) up to x satisfies the lower bound #{n ≤ x : {n, n + 1} ⊂ S}  x . log x  (4.1)  By standard sieve methods (for instance, Selberg’s sieve as used in [14]), there is an upper bound of exactly the same order of magnitude, meaning that Hooley’s bound is best possible, aside from a constant factor. In general, the length of runs appearing in any S(k) is uniformly bounded: we will see in the next section that S(k) never contains any run of length 6 or higher, which follows from a simple local argument. However, it is possible to find runs of length 5 in certain sets S(k), and our main interest here lies in this extremal case: which sets S(k) admit infinitely many 5-runs? M. Rosenfeld (personal communication) previously observed that S(2) has many such runs, the first of which begin at 0, 96, 800, 2400, 3200 and 3648. This led him to conjecture that there are infinitely many 5-runs in S(2), and we affirm this with the first main result of this chapter: Theorem 4.1. There are infinitely many numbers m for which {18m2 , 18m2 + 1, 18m2 + 2, 18m2 + 3, 18m2 + 4} ⊂ S(2). The proof of Theorem 4.1 uses a direct construction, and the same technique can be used (with slight modifications) with certain other values of k, such as k = 46. However, we will see in Section 4.2 that there is a positive density of k for which S(k) “should” contain 5-runs (in that there are no local obstructions prohibiting so); our method, unfortunately, only works 77  4.2. Local feasibility of runs for k in a set of asymptotic density 0. (The precise conditions are somewhat tedious, and we defer their discussion to Section 4.3.) However, it does allow us to establish the following general theorem about larger values of k: Theorem 4.2. There are infinitely many k > 2 for which S(k) contains infinitely many 5-runs. The question of gaps in S(1) was answered by Spiro and Hensley [79], who proved the universal existence of gaps: for any d ∈ N, S(1) contains infinitely many gaps of size d. Just as in our theorems for runs, their proof is explicitly constructive; we greatly generalize this construction to all sets S(k), proving universal existence with a single type of exception: Theorem 4.3. Fix any k, d ∈ N subject to the restriction that either 4 k or d ≡ 2 (mod 4). Then there exists an explicitly computable quadratic polynomial P (n) such that S(k) has the gap (P (n), P (n) + d) for all n. Note that the restriction in the hypotheses is entirely necessary: if 4 | k, then S(k) consists solely of integers congruent to 0 or 1 modulo 4, so the difference between two elements of S(k) is never of the form 4m + 2.  4.2  Local feasibility of runs  We first examine all local obstructions to the existence of d-runs in S(k), in particular obtaining necessary conditions for S(k) to admit 5-runs. The underlying principle here is to consider the set of values attained by the quadratic form x2 + ky 2 for x, y in the p-adic integers Zp for some p; more directly, we are considering the range of x2 + ky 2 modulo pr for every prime p and each r ≥ 1. If there is any prime p for which this set does not contain d consecutive values, then clearly S(k) does not have any d-runs. Let us first dispense with the 2-adic case, which already reveals a uniform upper bound on d, as shown by the proposition below. Proposition 4.4. The length of any run in S(k) is at most 5. Moreover, if S(k) achieves this length, then k ≡ 2 (mod 4). 78  4.2. Local feasibility of runs k (mod 8) 0, 4 1 2 3, 7 5 6  x2 + ky 2 (mod 8) 0, 1, 4 0, 1, 2, 4, 5 0, 1, 2, 3, 4, 6 7, 0, 1, 3, 4 0, 1, 4, 5, 6 6, 7, 0, 1, 2, 4  Run length 2 3 5 3 3 5  Table 4.1: Distribution of x2 + ky 2 modulo 8 Proof. This follows by straightforward (but tedious) calculation of all possible residues of x2 + ky 2 modulo 8. Table 4.1 summarizes the possible values of x2 +ky 2 depending on the congruence class of k modulo 8, and the longest resulting string of residues. We have ordered the residues so as to make the maximum number of consecutive values more apparent. We know from the introduction that the upper bound of the proposition is tight; for instance S(2), does contain 5-runs. This motivates us to try to characterize all k for which S(k) has any 5-runs. Let us complete our analysis of local obstructions with an eye toward this central question. We first recall the well-known structure of p-adic squares [76, Ch. II, §3.3]: Proposition 4.5. If p is odd and p x, then x is a square modulo pr if and only if x is a square mod p. If p = 2, r ≥ 3 and x is odd, then x is a square modulo 2r if and only if x ≡ 1 (mod 8). In other words (after factoring out powers of p from x), the “squareness” of x modulo some high prime power pr depends only on its quadratic character modulo a fixed prime power (either p, or 23 ). The next fact shows that for most p, S(k) covers all congruence classes mod p. It follows from a well-known counting argument [65, Theorem 5.14]: the sets {x2 : x ∈ Z/pZ} and {b − ky 2 : y ∈ Z/pZ} are simply too large not to intersect. Proposition 4.6. If p k, then the equation x2 + ky 2 ≡ b (mod p) has a solution for any fixed b ∈ Z/pZ. 79  4.2. Local feasibility of runs We will also need a result of Buell and Hudson [13] on runs of consecutive quadratic residues. Proposition 4.7. For any d, every sufficiently large prime p has d consecutive quadratic residues (also d consecutive nonresidues). In particular, for d = 5 this holds for all p ≥ 197. Theorem 4.8. The existence of 5-runs in S(k) is locally feasible if and only if k ≡ 2 (mod 4) and none of the following primes divide k: 3, 5, 7, 11, 13, 19, 29, 31, 37, 53, 149. Proof. For 2-adic feasibility, we look to Proposition 4.4, which clearly shows k ≡ 2 (mod 4) is necessary; since this bounds the power of 2 dividing k, Table 4.1 essentially also shows 2-adic sufficiency.22 Let us now consider p-adic feasibility for some odd p  k. By Proposi-  tion 4.6, x2 + ky 2 attains all possible residues modulo p, giving an “infinite” run in Z/pZ. Not all of these necessarily lift to arbitrary p-adic values, but certainly any nonzero residue will (Proposition 4.5), and 0 itself is certainly attained. This yields the run {−p + 1, . . . , p − 1} of length 2p − 1 ≥ 5; such primes pose no obstruction to local feasibility. It remains to consider odd primes p which do divide k. Since x2 + ky 2 reduces to x2 modulo p, a necessary condition for p-adic feasibility is that there are 5 consecutive squares in Z/pZ. We claim it is also sufficient (note that p > 5 in such case): this is obvious by Proposition 4.5 if the 5 consecutive squares are nonzero, but even when one of the squares is 0, the same proposition allows us to lift the other 4 squares so that they remain consecutive modulo pr for all r. By Proposition 4.7, certainly all p ≥ 197 have 5 consecutive squares (since we count 0 as a square but it is not a quadratic residue, our requirement is weaker). The theorem follows by explicitly computing which primes p < 197 fail to have 5 consecutive squares mod p. These are precisely the primes listed in the statement of the theorem. 22  To be sure, one could extend the table to modulo 16, but nothing new is revealed.  80  4.3. Construction for k = 2 As an immediate consequence, the set of k for which S(k) locally admits 5-runs has a positive density of about 8%. Corollary 4.9. The set of k ∈ N satisfying the conditions of Theorem 4.8 has density equal to 215 · 36 · 5 ≈ 0.0804969. 11 · 19 · 29 · 31 · 53 · 149 Proof. This is just 41 φ(n)/n, where n is the product of the listed primes.  4.3  Construction for k = 2  Clearly k = 2 is the first integer to satisfy the conditions of Theorem 4.8, and we observed in the first section that S(2) contains several 5-runs. Let us now give the simple construction of infinitely many quintuples in S(2). Proof of Theorem 4.1. Three of the stated inclusions are immediate for any m, from the easy identities 18m2 = (4m)2 + 2(m)2 18m2 + 1 = (1)2 + 2(3m)2 18m2 + 4 = (2)2 + 2(3m)2 . For the other two terms, we will show that m may be chosen simultaneously equal to 9a2 + 4a and 3b2 + b, whence m satisfies the additional identities 18m2 + 2 = (4m + 2a)2 + 2(m − 4a − 1)2 , 18m2 + 3 = (4m + 2b − 1)2 + 2(m − 4b − 1)2 . Let us verify that the Diophantine equation 9a2 + 4a = 3b2 + b has infinitely many solutions (it has at least the basic solution m = a = b = 0). Making the substitution c = 18a + 4, d = 6b + 1 yields a Pell-type equation with modular restrictions: c2 − 3d2 = 13,  c ≡ 4 (mod 18),  d ≡ 1 (mod 6).  (4.2) 81  4.4. Construction for k > 2 This has an infinite family of solutions with the initial solution (c0 , d0 ) = (4, 1) and recurrence (cn+1 , dn+1 ) = (2cn + 3dn , cn + 2dn ). One easily verifies that the subsequence (c6n , d6n ) satisfies the necessary congruences (4.2). √ Remark. Since the ring Z[ −2] is a unique factorization domain, the existence of rational values x, y with n = x2 + 2y 2 is enough to conclude that n ∈ S(2). Thus, the above construction actually produces runs in S(2) whenever 18m2 is an integer, regardless of the integrality of c and d. It turns out that this relaxation is satisified for the denser subsequence (c2n , d2n ), which yields the run at 800; in contrast, the first run given by (c6n , d6n ) occurs at the much larger 1224111751200.  4.4  Construction for k > 2  We now generalize the construction of Theorem 4.1 to certain other values of k. We will see that under certain conditions it is possible to take a single 5-run in S(k), and generate from it an infinite sequence of 5-runs. Note that S(2) is unique in that it is the only S(k) containing the run {0, 1, 2, 3, 4}, which one might call the “generator” of Theorem 4.1 since it corresponds to (c0 , d0 ). While the strategy in this section also works for S(2), it cannot be used with the run {0, 1, 2, 3, 4} and thereby yields a less efficient construction. We first require an elementary proposition regarding generalized Pell equations. It is certainly well-known in various equivalent forms; the version below will be convenient to us. Proposition 4.10. Let A, B, C, D be fixed integers such that BD > 0 is non-square. If the system m=  x2 + A y2 + C = B D  (4.3)  has at least one integer solution (m, x, y) with (x, y) = (0, 0), then it has infinitely many. Proof. Let (m0 , x0 , y0 ) be an initial solution to (4.3). Rearranging the right82  4.4. Construction for k > 2 hand equality and substituting z = By yields the Pell equation z 2 − BDx2 = B(AD − BC).  (4.4)  √ √ Define N := BD and let a + b N be a fundamental unit in Z[ N ]. Then (4.4) has an infinite family of integer solutions (zn , xn ) given by zn xn  =  a Nb b  a  n  z0 x0  ,  where z0 = By0 . Let M denote the 2 × 2 matrix above. The terms of this sequence are necessarily distinct because (z0 , x0 ) = (0, 0) and the eigenvalues √ a ± b N of M are not roots of unity. If r is the order of M in the finite group SL2 (Z/BZ), then the subsequence (zrn , xrn ) is congruent to (z0 , x0 ) modulo B. Thus, xrn ≡ x0 (mod B) and zrn ≡ z0 ≡ 0 (mod B), which ensures that both yrn = zrn /B and mrn = (x2rn + A)/B are integers. We can now give a sufficient condition for the existence of infinitely many 5-runs within a particular S(k). Theorem 4.11. Let k > 2 be fixed, and suppose there exists m > 0 such that km2 + 2 and km2 + 3 belong to S(k), hence km2 + 2 = a2 + kb2 and km2 + 3 = c2 + kd2 for some a, b, c, d ∈ Z. If (m − b)(m − d) is not a square, then S(k) contains infinitely many 5-runs. Proof. Just as for Theorem 4.1, the inclusion {km2 , km2 +1, km2 +4} ⊂ S(k) is obvious, so it suffices to produce infinitely many m for which {km2 + 2, km2 + 3} ⊂ S(k). We make the substitutions B = m − b and D = m − d, whereupon the conditions km2 + 2 = a2 + kb2 and km2 + 3 = c2 + kd2 are equivalent to m=  a2 + (kB 2 − 2) c2 + (kD2 − 3) = . 2Bk 2Dk  (4.5)  In light of Proposition 4.10, we may hold B and D fixed and obtain infinitely many solutions to (4.5) in m, a, c, provided that the initial solution 83  4.5. How many values of k? satisfies (a, c) = (0, 0) and BD > 0 (we have assumed by hypothesis that BD is not a square, and so neither is 4BDk 2 ). But any solution to (4.5) for k > 2 satisfies a2 ≡ 2 (mod k), which means a = 0. Furthermore, we must have b = m or else a2 = 2; we must have b ≤ m or else a2 = 2 + k(m2 − b2 ) < 2 − k < 0. Thus B > 0 and likewise D > 0 by an identical argument. The condition on (m−b)(m−d) in Theorem 4.11 can easily be weakened by observing that we may freely substitute b ← −b or d ← −d to modify the quantity (m − b)(m − d). The conclusion can only fail if all such values that result are simultaneously squares. We conjecture that such a coincidence cannot occur (in which case Theorem 4.11 would hold without this additional restriction), but we are unable to prove this. In any case, the previous observation immediately gives the following. Corollary 4.12. The conclusion of Theorem 4.11 also holds if any of the following quantities is not a square: (m − b)(m − d), m2 − b2 , or m2 − d2 .  4.5  How many values of k?  The preceding corollary allows us to rapidly establish the existence of infinitely many 5-runs in S(k) for a specific value of k, by finding a single one of the appropriate form. We have used this to verify, for 91 different values of k ≤ 5000, the existence of infinitely many 5-runs — we have yet to find any generator that fails to satisfy the non-squareness conditions, which are presumably unnecessary. However, there is a rather strict condition on k implicit in this method: in order for km2 + 2 and km2 + 3 to belong to S(k), both 2 and 3 must be squares modulo k, which limits which primes p can divide k. Combining this observation with Theorem 4.8, we see by quadratic reciprocity that the following condition is necessary: Condition 4.13. k is equal to 2 times a product of (not necessarily distinct) primes p ≡ ±1 (mod 24).  84  4.5. How many values of k? Our computations suggest that this is the only restriction: the 91 values of k mentioned above are exactly those values of k ≤ 5000 which satisfy Condition 4.13. Even so, the set of such k still has asymptotic density 0, in contrast to the positive density given by Corollary 4.9. The method of Wirsing and Odoni (for example, [21, Proposition 4]) can be used to make this quantitatively precise: the number of k ≤ x satisfying this condition is asymptotic to x/(log x)3/4 times an explicit constant. While S(34) does contain 5-runs (and computations suggest that they occur with approximately the expected frequency), it does not satisfy Condition 4.13, so we have no proof that it has infinitely many. It is natural to ask if there are infinitely many k for which the construction necessarily works: this is answered by Theorem 4.2, which we now prove. Conveniently, the proof uses Proposition 4.10 again; however, the Pellian construction inevitably gives a sequence of k that is far sparser than x/(log x)3/4 . Proof of Theorem 4.2. We apply Theorem 4.11, which requires that we find families of solutions to km2 + 2 = a2 + kb2 and km2 + 3 = c2 + kd2 . One such solution is given for k = 2 by 2(20)2 + 2 = 802 = 282 + 2(3)2 2(20)2 + 3 = 803 = 92 + 2(19)2 . Now, observe that holding (m, b, d) = (20, 3, 19) fixed and solving for k gives k=  a2 − 2 c2 − 3 = 2 . 2 2 m −b m − d2  (4.6)  We now apply Proposition 4.10 to obtain an infinite sequence of tuples (k, a, c) satisfying (4.6), starting with (2, 28, 9) and noting that m2 − b2 = 17·23 and m2 −d2 = 3·13 satisfy the conditions on B and D. Since m, b and d are held constant and (m−b)(m−d) = 17, the conditions of Theorem 4.11 are satisfied across this entire sequence. The resulting values of k are clearly distinct, since a and c are determined up to sign by k.  85  4.6. Preliminaries for gaps  4.6  Preliminaries for gaps  We now turn to the question of gaps in S(k), with the eventual goal of proving Theorem 4.3. The general strategy for the proof is simple: we first construct a polynomial P0 (y) such that {P0 (y), P0 (y) + d} ⊂ S(k) for any y. We then choose an arithmetic progression for y so that P0 (y) + j ∈ / S(k) for 0 < j < d for any y in this progression; passing to this progression immediately yields the desired polynomial P0 (an + b) in the conclusion of the theorem. While this basic outline is no different from that used by Spiro and Hensley [79], for general k there are some subtleties in choosing P0 that did not appear there. We will require several propositions which should be considered wellknown (the first was essentially due to Euler in his characterization of sums of two squares). We provide the simple proofs here. Proposition 4.14. If a prime p satisfies  −k p  = −1, and p  n, then  n∈ / S(k). Proof. This follows immediately from the fact that x2 + ky 2 ≡ 0 (mod p) has only the trivial solution x ≡ y ≡ 0 (mod p) (else −k would have the square root xy −1 modulo p). Proposition 4.15. If q1 < q2 < · · · < qr are primes and q0 = −1, then for any choice of  0, 1, 2, . . . , r  ∈ {±1}, there are infinitely many odd primes  p which satisfy the constraints  qi p  Proof. Recall that the constraint  = −1 p  i  for each 0 ≤ i ≤ r.  =  0  uniquely determines p as either  1 or 3 mod 4. For each odd qi , quadratic reciprocity then gives p in any of 0 ).  1 2 (qi − 1)  qi p  =  i  for  congruence classes mod 4qi (depending on the choice of  Similarly, if q1 = 2, we easily check that the equation  a (unique) solution modulo 8, for any choice of  0  q1 p  =  1.  By Chinese  and  remaindering, we obtain a congruence class for p modulo 4  i qi ,  i  admits  in which  all constraints are simultaneously satisfied. The proposition now follows from Dirichlet’s theorem. 86  4.6. Preliminaries for gaps Lemma 4.16. Let A, B be positive integers such that AB is not a square. There are infinitely many primes p such that  −A p  = −1 and  −B p  = 1.  Proof. We assume without loss of generality that A and B are squarefree, as A and An2 have identical Legendre symbols, apart from the finitely many primes dividing n. Since AB is non-square (and positive), there exists some prime q dividing exactly one of A and B. If q | A, we may use Proposition 4.15 to choose p such that q p −1 p  −1 p  = 1,  q p  = −1, and  = 1 for all other primes q dividing AB. Likewise, if q | B, we choose =  q p  = −1, and  q p  = 1 for all other q dividing AB.  Remark. This lemma is a special case of the more general principle that, if A1 , A2 , . . . , Ar are multiplicatively independent integers (in the sense that no non-empty subset has a product equal to a square), then for any assignment of ±1 values to the Legendre symbols  Ai p  , there are infinitely many primes  p which satisfy that assignment (in fact the set of such p has density exactly 2−r relative to the set of all primes). Proposition 4.17. Let f (x) = ax2 + bx + c be a quadratic with integer coefficients and discriminant ∆ = b2 − 4ac. Suppose p is an odd prime such that p a and  ∆ p  = 1. Then there exists a number m such that p  f (m).  Proof. Under the given assumptions, the quadratic formula readily yields √ a solution to f (m) ≡ p (mod p), namely m ≡ (−b + ∆)/2a (mod p). Furthermore, since ∆ is nonzero mod p, the derivative f (m) = 2am + b is non-vanishing, and so Hensel’s lemma ensures that we can lift m to a solution of f (m) ≡ p (mod p2 ), whence p exactly divides f (m). Proposition 4.18. Let be P (n) be a nonconstant polynomial of degree r, and B > 0 a fixed integer. Then the asymptotic density of n for which P (n) is B-friable (has no prime factors exceeding B) is 0. Proof. By an elementary counting argument, the number of distinct Bfriable values up to y is  (log y)B . For any n ≤ x, P (n) is bounded 87  4.7. Proof of existence of gaps by Cxr (for some constant C) and takes each value at most r times, so there are at most  r,P  r(r log x + C)B  r,P  (log x)B values of n ≤ x where P (n)  is B-friable. Since B is fixed (one could even allow B to grow slowly with x), this set has asymptotic density 0.  4.7  Proof of existence of gaps  Proof of Theorem 4.3. Let P0 (y) be the quadratic polynomial x2 + ky 2 , where x is a linear function of y with integer coefficients to be chosen shortly. Clearly, we have P0 (y) ∈ S(k) for any y. To ensure that P0 (y) + d ∈ S(k), we write P0 (y) + d = (x + δ)2 + k(y − η)2 , where δ and η are constants to be chosen below. This is equivalent to d = 2δx + δ 2 − 2kηy + kη 2 , or x=  d − δ 2 − kη 2 kη + y. 2δ δ  (4.7)  For a given k and d, we will fix δ and restrict η so that the above expression, viewed as a linear function of y, has integer coefficients. This can be done in several cases as follows: • If d is odd, we may simply choose δ = 1 and η = 0 (so x is constant). • If d is even and k is odd, we choose δ = 1 and restrict η to be odd. • If d ≡ 0 (mod 4), we choose δ = 2 and η even. • If d ≡ k ≡ 2 (mod 4), we choose δ = 2 and η odd. Note that these cases cover all possible choices of k and d satisfying the hypotheses (there is a small amount of overlap between the cases, which is not important). For any η subject to the above restrictions, we now choose x as determined by (4.7) (which is linear in y), so that x2 + ky 2 + d ∈ S(k) for any y. It remains to choose a specific η subject to the above restrictions, as well as an arithmetic progression for y that induces a genuine gap of size d at x2 + ky 2 : we need x2 + ky 2 + j ∈ / S(k) for every 0 < j < d.  88  4.7. Proof of existence of gaps In the first case that d is odd, we have already chosen η = 0. For all other cases, we fix η to be any integer (subject to the parity constraints above) such that δ 2 + kη 2 has a prime factor q which exceeds 2d. The existence of such an η is guaranteed by Proposition 4.18, as very few choices of δ 2 + kη 2 are 2d-friable. To exclude x2 + ky 2 + j from S(k) for a given j, it suffices by Proposition 4.14 to exhibit a prime pj such that  −k pj  x2 + ky 2 + j  = −1 and pj  for some appropriate choice of y. Recall that x and y are not independently chosen but related by (4.7), which we abbreviate as x = Cy + D by taking C :=  kη , δ  D :=  d−ξ , 2δ  where ξ := δ 2 + kη 2 .  Then x2 + ky 2 + j, as a function of y, is the quadratic P0 (y) = (C 2 + k)y 2 + 2CDy + (D2 + j). In order to apply Proposition 4.17, we must verify that the discriminant of P0 is a nonzero square mod pj (we also need pj to be coprime to C 2 + k, but since C and k are fixed, there are only finitely many values of pj to exclude). Specifically, we need C 2 D2 − (C 2 + k)(D2 + j) pj  =  −kD2 − j(C 2 + k) pj  = 1.  (4.8)  By Lemma 4.16, there are infinitely many odd primes pj satisfying both −k pj  = −1 and (4.8), provided that the product k 2 D2 + jk(C 2 + k) is not  a square. Noting that C 2 + k = kξ/δ 2 , this is equivalent, up to a square factor of k 2 /4δ 2 , to ∆j := (d − ξ)2 + 4ξj.  (4.9)  We claim that ∆j is not square for any value of 0 < j < d. Note that this quantity is unavoidably square at the endpoints ∆0 = (d − ξ)2 and ∆d = (d + ξ)2 , by virtue of the construction. When d is odd, the fact that ∆j is not square is immediate: we chose δ = 1 and η = 0, so ξ = 1 and there are no even squares between (d − 1)2 and (d + 1)2 .  89  4.7. Proof of existence of gaps When d is even, recall that we chose η so that ξ = δ 2 + kη 2 has a large prime factor q > 2d. In this case ξ is now larger than d, so it is more natural to write ∆j as (ξ − d)2 + 4ξj. We now show that (ξ − d)2 + 4ξj = A2 admits no integer solutions for 0 < j < d, or equivalently with ξ − d < A < ξ + d. Since q | ξ, any such solution must have A2 ≡ d2 (mod q), whence A − ξ ≡ A ≡ ±d (mod q). But d and −d are necessarily the smallest representatives of their respective congruence classes mod q (because q > 2d), so we must have |A − ξ| ≥ d. Putting this all together, for each 0 < j < d we may now choose a −k = −1 so that pj P0 (mj ) + j for some mj . pj 2 By choosing y ≡ mj (mod pj ) simultaneously for all 0 < j < d, we obtain a congruence class for y modulo j p2j such that P0 (y) + j ∈ / S(k), so that  distinct prime pj satisfying  (P0 (y), P0 (y) + d) is indeed a gap of size d. Remark. The above proof yields a quantitative lower bound of  √ k,d  x gaps  of size d in S(k) up to x, where the implied constant decays rather rapidly with d. In the case of S(1) we can do much better: Hooley’s quantitative lower bounds for the number of pairs {m, m + 1} in S(1) can be generalized to yield a lower bound of  d  x/ log x for the number of {m, m + d} pairs in  S(1). While not every such pair is necessarily a gap, Selberg’s sieve shows for any 0 < c < d, the triple {m, m+c, m+d} appears only  c,d  x/(log x)3/2  times. Since there are only Od (1) possible choices for c, this shows that the majority of pairs {m, m + d} have no other elements in between, and are genuine gaps.  90  Chapter 5  Conclusion 5.1  Arithmetic functions  We studied a broad class of arithmetic functions f (n) in Chapter 2, showing that (with some restrictions) for any ε > 0 and any choices for α1 , α2 , . . . , αh , the approximations |fi (ai n + bi ) − αi | <  (i = 1, 2, . . . , h)  (5.1)  occur simultaneously for a positive proportion of values n. This generalizes the Erd˝ os–Schinzel theorem in two directions: one may freely compose the functions with different arithmetic progressions, and also freely choose a function at each coordinate 1 ≤ i ≤ h. Let us offer an interpretation of this latter aspect that seems intuitively attractive. To paraphrase the earlier results, a single value of n may aim at several targets simultaneously, using the values of a fixed arithmetic function; moreover, each individual function (such as σ(n)/n or φ(n)/n) gives rise to a large set of such approximants n. What our Corollary 2.11 adds is that the sets arising from different functions also have large intersection: they cannot conspire to avoid one another — unless condition (2.16) is violated, making them too interdependent. We might say they are distributed “randomly” enough that their intersections behave (up to an extremely large local factor) like those of random sets having similar size. We suspect that there is not much room for further generalization if our goal is to retain a positive proportion of approximants n. However, we have already described the contemporary work of Alkan, Ford and Zaharescu which passes up this requirement in exchange for a much more effective 91  5.2. Random matrices approximation (one whose accuracy improves nicely with n). More recently, Luca, Mej´ıa Huguet and Nicolae [51] have taken such results in a novel direction: they prove a result analogous to Schinzel and Wang’s (1.5), except where φ(n) is composed with the Fibonacci sequence Fn . In other words, the following sequence of points is also dense in the positive orthant of Rd : φ(Fn+1 ) φ(Fn+2 ) φ(Fn+d ) , ,..., φ(Fn ) φ(Fn ) φ(Fn )  .  Their proof makes use of the fact that Fibonacci numbers, although exponentially sparse, have a rich divisor structure that enables the sort of constructions we saw in the proof of Theorem 2.3; the same techniques work for similar sequences like 2n − 1 (but not 2n , which is very poor in divisors). Many difficult problems on arithmetic functions can be solidly answered when restricted to these sequences. For instance, while no one has entirely ruled out odd perfect numbers, Luca [50] has shown that no Fibonacci number is perfect, and Pollack [67] proved that there are no perfect numbers composed of the same digit repeatedly (excluding the trivial example 6). Seeing that approximation properties are robust against composing with such exotic sequences rather than linear functions, we think it would be interesting to consider compositions with more natural (but less charming) sequences, such as higher-degree polynomials.  5.2  Random matrices  In Chapter 3, we studied questions regarding the number of n × n integer matrices having a particular eigenvalue. In particular, we established upper bounds for the singularity probability in general, as well as the exact distribution of real and rational eigenvalues for n = 2. The latter investigation revealed some rather unexpected differences between the two, and the location of the mode(s) strikes the author as the most surprising aspect worthy of investigation: for instance, might further modes appear as n increases, and is there a natural explanation for their appearance? 92  5.2. Random matrices Our singularity bounds have been considerably improved by Shparlinski [78], who obtained stronger bounds, more generally for matrices shifted by adding a constant matrix (also more generally, having a prescribed determinant rather than just 0). As mentioned in the last section of said chapter, the precise number of integer matrices which are diagonalizable over Q, or have all eigenvalues simultaneously rational, remains open for any n ≥ 3. We know of only one result specific to such matrices, in the special case of symmetric (0, 1)-matrices, also corresponding to integral graphs. These were considered by Ahmadi et al. [1] and shown to occur with exponentially small probability as n increases. Also, Maze [57] has given very precise distributional results on the fine structure23 of random integer matrices. Just as in the eigenvalue problem, there is a wealth of results on the singularity probability in the converse case where we fix the size of entries k (thus fixing the distribution of each entry) and let n increase. Koml´os [46] first showed that large random matrices are rarely singular, regardless of the distribution; that is, as n → ∞, the singularity probability shrinks to 0 for any fixed k. In fact, it decreases exponentially quickly, as first shown by Kahn, Koml´ os and Szemer´edi [42] for (0, 1)-matrices, and in much greater generality by Rudelson and Vershynin [69]. Recently, Bourgain, Vu and Wood [10] have proved an impressive general bound for the singularity probability, one that incorporates both the polynomial decay observed in k (for fixed n), and the exponential decay seen in n (when k fixed). For n sufficiently large their bound, in the case of the matrices Mn (k), is essentially (2k)−n/2 . It would be interesting to explore how explicit such a bound could be made, which might yield a far more effective version of Proposition 3.6. The author’s recent efforts in this area are concerned with obtaining explicit lower bounds for the magnitude of the determinant of a typical random matrix (the determinant may be viewed as a crude measure of how far away a matrix is from being singular; once again, there is a rich history of bounding determinants away from zero in various other contexts, for instance [69, 81]). 23 However, this concerns the Hermite normal form (a division-free version of row echelon form) rather than the spectrum.  93  5.3. Patterns in quadratic forms  5.3  Patterns in quadratic forms  We have studied the patterns of runs and gaps in the set S(k) = {x2 + ky 2 : x, y ∈ Z}. We gave a qualitatively complete answer to the existence of gaps for all S(k), but significant quantitative questions remain. For instance, √ the largest gap in S(k) up to x is certainly at least log x, and heuristically should be not much larger (a power of log x). However, the best known upper bound for this quantity is O(x1/4 ), and it comes from an obvious argument: [59, p. 43] describes this state of the art as “somewhat embarrassing”.24 Even qualitatively, much remains unknown about runs, with S(34) being the first of many examples for which the existence of infinitely many 5-runs is open. One avenue of pursuit considered by the author has been to find new polynomial identities which could take the place of the trivial inclusions in the proofs of Theorem 4.1 and Theorem 4.11, thus extending the reach of this method beyond current limitations. For more general binary quadratic forms, we noted previously that they may (locally) admit extremely long runs, for instance the (positive definite) form xy+k!(x2 +y 2 ) for large k. Such forms are certainly worth investigating and are likely amenable to the methods of Theorem 4.3, but clearly new ideas are needed to tackle runs of any significant length. It seems unlikely that analytic methods can prevail for runs of such length, either: even for 3-runs, we know of no result remotely comparable in strength to (4.1). Considering the spectacular success of Green, Tao and Ziegler in enumerating arithmetic progressions of primes, one naturally expects to obtain better quantitative results for APs. Matthiesen [56] has very recently used those methods to obtain an asymptotic formula for the number of -APs in S(k), vastly generalizing earlier work of Heath-Brown [35]. The catch is that in both results, the elements of S(k) are weighted with multiplicity: that is, since there are 4 solutions to x2 + y 2 = 1, and 4 solutions to x2 + y 2 = 2, the progression {0, 1, 2} counts as appearing 16 times in S(1). Such an interpretation is fairly natural, and it does not detract from the 24  Upper bounds for prime gaps are also very far from predictions, but they may take pride in actually progressing: the current best is due to Baker, Harman and Pintz [3].  94  5.3. Patterns in quadratic forms beauty of Matthiesen’s sharp result. However, the multiplicity rk (n) of a given n ∈ S(k) could be as small as 2, or as large as exp(ck log n/ log log n), depending on n. This makes for a rather wide variation in the number of times each arithmetic progression gets counted, so it is difficult to make any deductions about the number of distinct progressions this way. The primary focus of the author’s current research is to obtain sharper results for this unweighted problem in S(k). Such precision is typically more accessible in the weighted case considered by Heath-Brown and Matthiesen.25 Even though rk (n) fluctuates highly with n, its sum  x n=1 rk (n)  is under-  stood with reasonable accuracy (at least for k fixed), certainly better than the error term in Landau’s unweighted count (1.9). Still, one might hope to obtain estimates akin to Hooley’s (4.1), establishing at least the correct order of magnitude which is not currently known from existing results. A result which dates back to de la Vall´ee Poussin’s original 1897 proof of the Prime Number Theorem [61, p. 72] is that S(k) contains a specific positive fraction of all primes. The Green–Tao theorem therefore shows that the number of r-APs in S(k) up to x is at least k,d  x2 /(log x)d . It is plausible that a careful unweighting of Matthiesen’s  estimate might do better (by some positive power of log x). However, we expect that the powerful Green–Tao machinery can be used to improve this all the way to  k,d  x2 /(log x)d/2 , as predicted by random  heuristics (and matching the upper bounds given by sieves). In fact, it should even be possible to get lower bounds for some higher-order arithmetic structures, such as pairs of arithmetic progressions {m, m + d, m + 2d} ∪ {m + 1, m + d + 1, m + 2d + 1}: Zhou [87] has established this for primes, on assumption of the twin primes conjecture (and unconditionally for almostprimes having only 1 or 2 prime factors).  25  If the reader will forgive the anthropomorphism, it is as though n ∈ S(k) wants to be counted with weight rk (n). The primes also want to be counted with the von Mangoldt weight Λ(n), but this varies very smoothly, making it a simple matter to remove.  95  Bibliography [1] O. Ahmadi, N. Alon, I. F. Blake, and I. E. Shparlinski. Graphs with integral spectrum. Linear Algebra Appl., 430(1):547–552, 2009. [2] Emre Alkan, Kevin Ford, and Alexandru Zaharescu. Diophantine approximation with arithmetic functions. I. Trans. Amer. Math. Soc., 361(5):2263–2275, 2009. [3] R. C. Baker, G. Harman, and J. Pintz. The difference between consecutive primes. II. Proc. London Math. Soc. (3), 83(3):532–562, 2001. [4] Paul T. Bateman and Harold G. Diamond. Analytic number theory. World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2004. [5] Paul T. Bateman and Roger A. Horn. A heuristic asymptotic formula concerning the distribution of prime numbers. Math. Comp., 16:363– 367, 1962. ¨ [6] P. Bernays. Uber die Darstellung von positiven, ganzen Zahlen durch die primitiven, bin¨ aren quadratischen Formen einer nicht-quadratischen Diskriminante.  PhD thesis, Georg-August-Universit¨at, G¨ottingen,  1912. [7] Manjul Bhargava.  Higher composition laws. I. A new view on  Gauss composition, and quadratic generalizations. Ann. of Math. (2), 159(1):217–250, 2004. [8] Valentin Blomer and Andrew Granville. Estimates for representation numbers of quadratic forms. Duke Math. J., 135(2):261–302, 2006.  96  Bibliography [9] Peter Borwein, Stephen Choi, Brendan Rooney, and Andrea Weirathmueller, editors. The Riemann Hypothesis: A Resource for the Afficionado and Virtuoso Alike. CMS Books in Mathematics/Ouvrages de Math´ematiques de la SMC. Springer, New York, 2008. [10] Jean Bourgain, Van H. Vu, and Philip Matchett Wood. On the singularity probability of discrete random matrices. J. Funct. Anal., 258(2):559– 603, 2010. [11] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Seventh International World-Wide Web Conference (WWW 1998), 1998. 1 1 1 1 1 1 1 1 1 1 [12] V. Brun. La s´erie 15 + 71 + 11 + 13 + 17 + 19 + 29 + 31 + 41 + 43 + 59 + 61 +· · ·  o` u les d´enominateurs sont “nombres premiers jumeaux” est convergente ou finie. Darboux Bull. (2) 43, 100-104:124–128, 1919. [13] D. A. Buell and R. H. Hudson. On runs of consecutive quadratic residues and quadratic nonresidues. BIT, 24(2):243–247, 1984. [14] Todd Cochrane and Robert E. Dressler. Consecutive triples of sums of two squares. Arch. Math. (Basel), 49(4):301–304, 1987. ¨ [15] Harold Davenport. Uber numeri abundantes. Sitzungsber. Preuß. Akad. Wiss., Phys.-Math. Kl., 1933(26-29):830–837, 1933. [16] Alan Edelman. The probability that a random real Gaussian matrix has k real eigenvalues, related distributions, and the circular law. J. Multivariate Anal., 60(2):203–232, 1997. [17] P. Erd˝ os and M. Kac. The Gaussian law of errors in the theory of additive number theoretic functions. Amer. J. Math., 62:738–742, 1940. [18] Paul Erd˝ os. On the smoothness of the asymptotic distribution of additive arithmetical functions. Amer. J. Math., 61:722–725, 1939. [19] Paul Erd˝ os and Aurel Wintner. Additive arithmetical functions and statistical independence. Amer. J. Math., 61:713–721, 1939. 97  Bibliography [20] P. Erd˝ os and A. Schinzel. Distributions of the values of some arithmetical functions. Acta Arith., 6:473–485, 1960/1961. [21] Steven Finch, Greg Martin, and Pascal Sebah. Roots of unity and nullity modulo n. Proc. Amer. Math. Soc., 138(8):2729–2743, 2010. [22] Carl Friedrich Gauss. Disquisitiones arithmeticae. Springer-Verlag, New York, 1986. Translated and with a preface by Arthur A. Clarke, Revised by William C. Waterhouse, Cornelius Greither and A. W. Grootendorst and with a preface by Waterhouse. ¨ [23] S. Gershgorin. Uber die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk SSSR, Otd. Mat. Estest. Nauk, VII. Ser. No., 6:749–754, 1931. [24] Jean Ginibre. Statistical ensembles of complex, quaternion, and real matrices. J. Mathematical Phys., 6:440–449, 1965. [25] V. L. Girko.  The circular law.  Teor. Veroyatnost. i Primenen.,  29(4):669–679, 1984. [26] Daniel A. Goldston, J´anos Pintz, and Cem Y. Yıldırım. Primes in tuples. I. Ann. of Math. (2), 170(2):819–862, 2009. [27] W. T. Gowers. A new proof of Szemer´edi’s theorem. Geom. Funct. Anal., 11(3):465–588, 2001. [28] Andrew Granville. Harald Cram´er and the distribution of prime numbers. Scand. Actuar. J., 1:12–28, 1995. Harald Cram´er Symposium (Stockholm, 1993). [29] Ben Green and Terence Tao. The primes contain arbitrarily long arithmetic progressions. Ann. of Math. (2), 167(2):481–547, 2008. [30] Ben Green and Terence Tao. Linear equations in primes. Ann. of Math. (2), 171(3):1753–1850, 2010. [31] Ben Green and Terence Tao. The M¨obius function is strongly orthogonal to nilsequences. Ann. of Math. (2), 175(2):541–566, 2012. 98  Bibliography [32] Ben Green and Terence Tao. The quantitative behaviour of polynomial orbits on nilmanifolds. Ann. of Math. (2), 175(2):465–540, 2012. [33] Ben Green, Terence Tao, and Tamar Ziegler. An inverse theorem for the Gowers U s+1 [N ]-norm. Ann. of Math. (2). To appear. Preprint available at arXiv:1009.3998v2. [34] G. H. Hardy and J. E. Littlewood. Some problems of ‘Partitio numerorum’; III: On the expression of a number as a sum of primes. Acta Math., 44(1):1–70, 1923. [35] D. R. Heath-Brown. Linear relations amongst sums of two squares. In Number theory and algebraic geometry, volume 303 of London Math. Soc. Lecture Note Ser., pages 133–176. Cambridge Univ. Press, Cambridge, 2003. [36] Jim Hefferon. Linear Algebra (online book). http://joshua.smcvt. edu/linearalgebra/, accessed May 17, 2012. [37] Andrew J. Hetzel, Jay S. Liew, and Kent E. Morrison. The probability that a matrix of integers is diagonalizable. Amer. Math. Monthly, 114(6):491–499, 2007. [38] C. Hooley. On the intervals between numbers that are sums of two squares. II. J. Number Theory, 5:215–217, 1973. [39] Christopher Hooley. On the intervals between numbers that are sums of two squares. III. J. Reine Angew. Math., 267:207–218, 1974. [40] Bernard Host and Bryna Kra. Nonconventional ergodic averages and nilmanifolds. Ann. of Math. (2), 161(1):397–488, 2005. [41] James P. Jones, Daihachiro Sato, Hideo Wada, and Douglas Wiens. Diophantine representation of the set of prime numbers. Amer. Math. Monthly, 83(6):449–464, 1976. [42] J. Kahn, J. Koml´ os, and E. Szemer´edi. On the probability that a random ±1-matrix is singular. J. Amer. Math. Soc., 8(1):223–240, 1995. 99  Bibliography [43] Y. R. Katznelson. Singular matrices and a uniform bound for congruence groups of SLn (Z). Duke Math. J., 69(1):121–136, 1993. [44] K. Kedlaya and L. Ng. Solutions to the 67th William Lowell Putnam Mathematical Competition.  http://amc.maa.org/a-activities/  a7-problems/putnam/-pdf/2006s.pdf (online resource), accessed May 23, 2012. [45] Mitsuo Kobayashi. On the Density of Abundant Numbers. PhD thesis, Dartmouth College, 2010. [46] J. Koml´ os. On the determinant of random matrices. Studia Sci. Math. Hungar., 3:387–399, 1968. [47] Hans-Joachim Kowalsky.  Ganzzahlige Matrizen mit ganzzahligen  Eigenwerten. Abh. Braunschweig. Wiss. Ges., 34:15–32, 1982. ¨ [48] E. Landau. Uber die Einteilung der positiven ganzen Zahlen in vier Klassen nach der Mindestzahl der zu ihrer additiven Zusammensetzung erforderlichen Quadrate. Arch. der Math. u. Phys. (3), 13:305–312, 1908. [49] Leonid A. Levin. Average case complete problems. SIAM J. Comput., 15(1):285–286, 1986. [50] Florian Luca. Perfect Fibonacci and Lucas numbers. Rend. Circ. Mat. Palermo (2), 49(2):313–318, 2000. [51] Florian Luca, V. Janitzio Mej´ıa Huguet, and Florin Nicolae. On the Euler function of Fibonacci numbers. J. Integer Seq., 12(6):Article 09.6.6, 15pp., 2009. [52] Helmut Maier. Primes in short intervals. Michigan Math. J., 32(2):221– 225, 1985. [53] Greg Martin. The smallest solution of φ(30n+1) < φ(30n) is . . .. Amer. Math. Monthly, 106(5):449–451, 1999.  100  Bibliography [54] Greg Martin and Erick Wong. The number of 2 × 2 integer matrices having a prescribed integer eigenvalue. Algebra Number Theory, 2(8):979–1000, 2008. [55] Greg Martin and Erick B. Wong. Almost all integer matrices have no integer eigenvalues. Amer. Math. Monthly, 116(7):588–597, 2009. [56] L. Matthiesen. Linear correlations amongst numbers represented by positive definite binary quadratic forms.  Acta Arith.  To appear.  Preprint available at arXiv:1106.4690. [57] G´erard Maze. Natural density distribution of Hermite normal forms of integer matrices. J. Number Theory, 131(12):2398–2408, 2011. [58] H. L. Montgomery. The pair correlation of zeros of the zeta function. In Analytic number theory (Proc. Sympos. Pure Math., Vol. XXIV, St. Louis Univ., St. Louis, Mo., 1972), pages 181–193. Amer. Math. Soc., Providence, R.I., 1973. [59] Hugh L. Montgomery and Robert C. Vaughan. Multiplicative number theory. I. Classical theory, volume 97 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2007. [60] T. Muir. A treatise on the theory of determinants. Revised and enlarged by William H. Metzler. Dover Publications Inc., New York, 1960. [61] Wladyslaw Narkiewicz.  The development of prime number theory.  Springer Monographs in Mathematics. Springer-Verlag, Berlin, 2000. From Euclid to Hardy and Littlewood. [62] D. J. Newman. Euler’s φ function on arithmetic progressions. Amer. Math. Monthly, 104(3):256–257, 1997. [63] Thomas R. Nicely. Enumeration to 1014 of the twin primes and Brun’s constant. Virginia J. Sci., 46(3):195–204, 1995. [64] Ivan Niven. Numbers: rational and irrational, volume 1 of New Mathematical Library. Random House, New York, 1961. 101  Bibliography [65] Ivan Niven, Herbert S. Zuckerman, and Hugh L. Montgomery. An introduction to the theory of numbers. John Wiley & Sons Inc., New York, fifth edition, 1991. [66] G´ abor Pataki, Mustafa Tural, and Erick B. Wong. Basis reduction and the complexity of branch-and-bound. In Proceedings of the TwentyFirst Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1254–1261, Philadelphia, PA, 2010. SIAM. [67] Paul Pollack. Perfect numbers with identical digits. Integers, 11A:A18, 11pp., 2011. [68] K. F. Roth. On certain sets of integers. J. London Math. Soc., 28:104– 109, 1953. [69] Mark Rudelson and Roman Vershynin. The Littlewood-Offord problem and invertibility of random matrices. Adv. Math., 218(2):600–633, 2008. [70] Tom Sanders. On Roth’s theorem on progressions. Ann. of Math. (2), 174(1):619–636, 2011. [71] A. Schinzel and W. Sierpi´ nski. Sur certaines hypoth`eses concernant les nombres premiers. Acta Arith. 4 (1958), 185–208; erratum, 5:259, 1958. [72] A. Schinzel and Y. Wang. A note on some properties of the functions ϕ(n), σ(n) and θ(n). Ann. Polon. Math., 4:201–213, 1958. ¨ [73] I. Schoenberg. Uber die asymptotische Verteilung reeller Zahlen mod 1. M. Z., 28:171–199, 1928. [74] I. J. Schoenberg. On asymptotic distributions of arithmetical functions. Trans. Amer. Math. Soc., 39(2):315–330, 1936. [75] Robert Forsyth Scott. The theory of determinants and their applications. Revised by G. B. Mathews. Cambridge University Press, Cambridge, 1904.  102  Bibliography [76] J.-P. Serre. A course in arithmetic. Springer-Verlag, New York, 1973. Translated from the French, Graduate Texts in Mathematics, No. 7. [77] Pin-Tsung Shao. On the distribution of the values of a class of arithmetical functions. Bull. Acad. Pol. Sci., Cl. III, 4:569–572, 1956. [78] I. E. Shparlinski. Some counting questions for matrices with restricted entries. Linear Algebra Appl., 432(1):155–160, 2010. [79] Claudia A. Spiro and Douglas A. Hensley. Problems and Solutions: Solutions of Advanced Problems: 6539. Amer. Math. Monthly, 97(3):253– 254, 1990. [80] E. Szemer´edi. On sets of integers containing no k elements in arithmetic progression. Acta Arith., 27:199–245, 1975. Collection of articles in memory of Juri˘ı Vladimiroviˇc Linnik. [81] Terence Tao and Van Vu. On random ±1 matrices: singularity and determinant. Random Structures Algorithms, 28(1):1–23, 2006. [82] Terence Tao and Van Vu. Random matrices: the circular law. Commun. Contemp. Math., 10(2):261–307, 2008. [83] Terence Tao and Van Vu. Random matrices: universality of ESDs and the circular law. Ann. Probab., 38(5):2023–2065, 2010. With an appendix by Manjunath Krishnapur. [84] J. G. van der Corput.  ¨ Uber Summen von Primzahlen und  Primzahlquadraten. Math. Ann., 116(1):1–50, 1939. [85] I.M. Vinogradov. Some theorems concerning the theory of primes. Rec. Math. Moscou, n. Ser., 2:179–195, 1937. [86] Erick B. Wong. Simultaneous approximation of reals by values of arithmetic functions. In Anatomy of Integers, volume 46 of CRM Proc. Lecture Notes, pages 289–297. Amer. Math. Soc., Providence, RI, 2008. [87] Binbin Zhou. The Chen primes contain arbitrarily long arithmetic progressions. Acta Arith., 138(4):301–315, 2009. 103  `

Cite

Citation Scheme:

Citations by CSL (citeproc-js)

Usage Statistics

Country Views Downloads
United States 38 3
China 20 7
Germany 6 30
Turkey 6 0
Canada 4 0
India 3 2
Sweden 1 0
Iran 1 0
Israel 1 2
Mauritius 1 0
City Views Downloads
Shenzhen 15 5
Unknown 15 38
Boardman 10 0
Ashburn 9 0
Ankara 5 0
Tianjin 4 2
San Francisco 4 2
Calgary 3 0
Newington 2 0
San Mateo 2 0
Springfield 2 1
San Jose 2 0
Port Louis 1 0

 {[{ mDataHeader[type] }]} {[{ month[type] }]} {[{ tData[type] }]}
Download Stats

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
```                        ```
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
```
```
Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
`http://iiif.library.ubc.ca/presentation/dsp.24.1-0072975/manifest`