UBC Theses and Dissertations


A multidimensional Szemerédi's theorem in the primes Titichetrakun, Tatchai 2016


Full Text


A Multidimensional Szemerédi's Theorem in the Primes

by

Tatchai Titichetrakun

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Mathematics)

The University of British Columbia
(Vancouver)

July 2016

© Tatchai Titichetrakun, 2016

Abstract

In this thesis, we investigate topics related to the Green-Tao theorem on arithmetic progressions in primes in higher dimensions. Our main tool is the pseudorandom measure majorizing the primes defined in [51], concentrated on almost primes. In Chapter 2, we combine the sieve technique used in constructing the pseudorandom measure (in this case, the Goldston-Yildirim sum and almost primes) with the circle method of Birch to study the number of almost prime solutions of Diophantine systems (with some rank conditions). Our rank condition is similar to the integer case, due to the heuristic that almost primes are pseudorandom. In Chapter 3, we investigate the generalization of Green-Tao's theorem to higher dimensions in the case of corner configurations. We apply the transference principle of Green-Tao (with the hyperplane separation technique of Gowers) in this setting. This problem is also related to the densification trick in [16]. In Chapter 4, we extend the result of Chapter 3 to obtain the full multidimensional analogue of Green-Tao's theorem, using the hypergraph regularity method by directly proving a version of the hypergraph removal lemma for weighted hypergraphs. The method is to run an energy increment on a parametric weight system of measures, rather than on a single measure space, to overcome the presence of intermediate weights. This is in contrast to [110], [68], where the authors investigate the problem using a measure supported on primes and infinite linear forms conditions, relying on the Gowers Inverse Norms Conjecture.

Preface

This thesis is a combination of three manuscripts: Chapter 2 is based on [75], which is joint work with A. Magyar.
Chapter 3 is based on [74], which is joint work with A. Magyar. Chapter 4 is based on [18], which is joint work with B. Cook and A. Magyar.

Table of Contents

Abstract
Preface
Table of Contents
List of Figures
Notations
Acknowledgments
Dedication
1 Introduction: Szemerédi's and Green-Tao's Theorem
  1.1 Szemerédi's Theorem
    1.1.1 Gowers Uniformity Norms
    1.1.2 Box Norm
    1.1.3 Bohr Sets
    1.1.4 Density Increment Method
    1.1.5 Energy Increment Method
  1.2 Green-Tao's Theorem
    1.2.1 Green-Tao's Theorem
  1.3 Szemerédi's Regularity Lemma
    1.3.1 Graph Regularity
    1.3.2 Hypergraph Removal Lemma
2 Goldston-Yildirim's Sieve and Almost Prime Solutions to Diophantine Equations
  2.1 Backgrounds and Some Classical Results
    2.1.1 Basic Prime Number Estimates
    2.1.2 Sieve Problems
  2.2 A Pseudorandom Measure Majorizing the Primes
    2.2.1 The W-Trick
    2.2.2 Pseudorandomness Conditions
  2.3 Birch-Davenport's Circle Method
    2.3.1 Set Up
    2.3.2 The Circle Method
  2.4 Almost Prime Solutions to Diophantine Equations
    2.4.1 Local Factors of Integral Forms
    2.4.2 Proof of Theorem 2.4.3
    2.4.3 Sums of Multiplicative Functions
    2.4.4 Proof of the Main Theorem
  2.5 Concluding Remarks
3 Corners in Dense Subsets of Primes via a Transference Principle
  3.1 Hypergraph Setting and Weighted Hypergraph System
  3.2 Weighted Box Norm and Weighted Generalized von-Neumann's Inequality
  3.3 Dual Function Estimates
  3.4 Transference Principle
  3.5 Relative Hypergraph Removal Lemma
  3.6 Proof of the Main Result
    3.6.1 From Z_N to Z
    3.6.2 Proof of the Main Theorem
  3.7 Further Remarks: Conlon-Fox-Zhao's Densification Trick
4 Weighted Simplices Removal Lemma and Multidimensional Szemerédi's Theorem in the Primes
  4.1 Introduction
    4.1.1 Parametric Weight System
    4.1.2 Energy Increment in weighted setting
  4.2 Weighted Hypergraph System
  4.3 Parametric Weight Systems: Extensions, Stability, and Symmetrization
    4.3.1 Extension of Parametric Weight System, Stability and Symmetrization
    4.3.2 Symmetrization of Parametric Weight System
  4.4 Regularity Lemma for Parametric Weight Hypergraph
    4.4.1 A Koopman-von Neumann Type Decomposition for Parametric Weight System
    4.4.2 Regularity Lemma
  4.5 Counting Lemma
    4.5.1 Proof of the Counting Lemma
  4.6 Proof of Weighted Simplices Removal Lemma
  4.7 Proof of the Main Theorem
  4.8 Concluding Remarks
    4.8.1 Inverse Gowers Norm Theorem and Infinite Linear Forms Condition
Bibliography

List of Figures

Figure 3.1 Weighted Hypergraph System

Notations

Index Notations

• Write $x = (x_1, \dots, x_d)$, $y = (y_1, \dots, y_d)$ as vectors in dimension $d$. Write $\omega = (\omega(1), \dots, \omega(d)) \in \{0,1\}^d$, and for each such $\omega$, let $P_\omega : \mathbb{Z}_N^{2d} \to \mathbb{Z}_N^d$ be the projection defined by
$$P_\omega(x, y) = u = (u_1, \dots, u_d), \qquad u_j = \begin{cases} x_j & \text{if } \omega_j = 0, \\ y_j & \text{if } \omega_j = 1. \end{cases}$$
• For each $I \subseteq [d]$, $x_I = (x_i)_{i \in I}$. We may write $x$ for $x_{[d]}$. $\omega_I$ means an element of $\{0,1\}^{|I|}$. Similarly we may write $\omega$ for $\omega_{[d]}$. We also define $P_{\omega_I}(x_I, y_I)$ in the same way.
• $\omega|_I$ denotes $\omega$ restricted to the index set $I$.
• For finite sets $X_j$, $j \in [d]$, and $I \subseteq [d]$, write $X_I := \prod_{j \in I} X_j$ and
$$P_{\omega_I}(X_I, Y_I) = \prod_{i \in I} Z_i, \qquad Z_i = \begin{cases} X_i, & \omega_I(i) = 0, \\ Y_i, & \omega_I(i) = 1. \end{cases}$$
• If we want to fix some positions, we can write, for example, $\omega_{(0,[2,d])}$ for an element of $\{0,1\}^d$ whose first position is 0.
• For each $\omega$, define $y_1(\omega) \in \{0,1\}^d$ by
$$(y_1(\omega))_i = \begin{cases} 0 & \text{if } \omega_i = 0, \\ y_i & \text{if } \omega_i = 1, \end{cases} \qquad 1 \le i \le d.$$
$y_0(\omega) \in \{0,1\}^d$ is defined similarly.

Set Notations

$[N] := \{1, 2, \dots, N\}$, $[M, N] := \{M, M+1, \dots, M+N\}$.
$\mathbb{P}$ denotes the set of primes. $\mathbb{P}_N, \mathbb{P}_{[N]} := \mathbb{P} \cap [N]$.
For any finite set $X$ and $f : X \to \mathbb{R}$ or $\mathbb{C}$, and for any measure $\mu$ on $X$,
$$\mathbb{E}_{x \in X} f(x) := \frac{1}{|X|} \sum_{x \in X} f(x), \qquad \mathbb{E}_\mu f(x),\ \int_X f \, d\mu := \frac{1}{|X|} \sum_{x \in X} f(x)\mu(x).$$

Hypergraph Setting: Suppose we have a $(d+1)$-partite hypergraph with vertex sets $V_1, \dots, V_{d+1}$. For $e \subseteq [d+1]$, we may write $V_e := \prod_{j \in e} V_j$. Let $\pi_e : V_{[d+1]} \to V_e$ be the natural projection.
We write $\mathcal{A}_e = \{\pi_e^{-1}(F) : F \subseteq V_e\}$, regarded as a collection of subsets of $V_{[d+1]}$.

Other Notations

– Linear characters: for $\theta \in \mathbb{R}/\mathbb{Z}$, $e(\theta) := e^{2\pi i \theta}$.
– $\exp(x) = e^x$.
– Unless otherwise specified, the error term $o(1)$ means a quantity that goes to 0 as $N \to \infty$ (or $N, W \to \infty$ in the $W$-trick, see Section 2.2).
– $f(x) \approx g(x)$ means there are absolute constants $c, C$ such that $c f(x) < g(x) < C f(x)$.
– $\mathbb{Z}_N$, $\mathbb{Z}/N\mathbb{Z}$, $\mathbb{Z}/N$ all mean the additive group of integers (mod $N$).
– $k$-AP means an arithmetic progression of length $k$. It is nontrivial if the common difference is not zero.
– Multiplicative difference: $\Delta_h f(x) := f(x) f(x+h)$. Additive difference: $\Delta^+_h f(x) := f(x+h) - f(x)$.

Acknowledgments

I would like to thank Prof. Ákos Magyar, my thesis advisor, who shares the enthusiasm for the subject and is always patient with me, for his unconditional support and trust, without any judgment. He, with his collaboration with Prof. Neil Lyall, also introduced me to the field of additive combinatorics a long time ago. I thank the professors in the Harmonic Analysis group: Prof. Izabella Łaba and Prof. Malabika Pramanik, who have been role models of great mathematicians and always provide kind support; Prof. József Solymosi, who became my co-advisor when Ákos moved to Georgia, who taught me discrete math, always gives me advice and pulls me out of my comfort zone; Prof. Brian Marcus, whose entropy and information theory class I audited, and who shared his passion for the subject and his tremendous care for students; and Prof. Greg Martin, whose analytic number theory class I took, and who showed me what it means to be a teacher (yes, I had to work like a dog in his class). I also thank all the people in the Harmonic Analysis group who shared their passion for the subject: Robert Fraser, Marc Carnovale, Kyle Hambrook, Kevin Henriot, Ben Krause, Dimitrios Karslidis. Having met all of these great people is like a dream. I also thank the Math Department at UBC, which gave me the opportunity to work here and gave me some bucks to eat, and the people at the math offices and UBC, who are always helpful.
I have also had opportunities to meet many great and inspiring people at conferences, even though I am usually shy: Prof. Nikos Frantzikinakis, Prof. Terry Tao, Prof. Ben Green, Voraphan Chandee, Yufei Zhao and many more inspiring people. I also have to thank all the people in my field who did great stuff. Finally, I thank the people who read and reviewed this manuscript, in particular Prof. Tamar Ziegler, Prof. Malabika Pramanik and Prof. Ákos Magyar. This thesis is not perfect and their comments are invaluable for my development. I would like to thank the friends I met in the math department, in particular Wordsobe Mun and family, Atsushi Kanazawa, Justin Scarfy and Nishant Chandgotia. Also, Tianhan Lu, Mimi Brown, Zoe Lam, Phoebe Li. I thank my Thai friends I met in Vancouver who shared a good time as starving people here: Wisarn Yenjaichon, Patcharapa Sangsuttarat, Petcharatana Bhuanantanondh, Siriphon Siriangkanakul, Tipp & Jim Placzek, Wassamon Kunamornpong, Suvaporn Phasuk. Finally, I could not have succeeded without my Thai friends and teachers: Prof. Zen Harper, Prof. Pacchara Chaisuriya, Prof. Nattapan Kittisin, Prof. Wicharn Lewkeeratiyuttakul, Prof. Aram Tangboonduangjit, Prof. Vichian Laohakosol, Prof. Nittiya Pabhapote, Prof. Songkiat Sumetkijakan, Prof. Julian Paulter, Mr. Surachai Boonrueng, Mr. Suchart Somsuk, Miss Pavinee Hantanun, Smith Iampiboonvatana, Wittaya Siricheepchaiyan, Preecha Rittiwattanadech, Kiratad Neerasen, Teerayuth Chuedee, Krisda Boonthose, Anucha Sanrawang, Nattakarn Numpanviwat, Kuntalee Chaisee, Piyashat Sripratak, Preechaya Sanyatit, Raywat Thanatkithiran, Nopporn Thamrongrat, Athipat Thamrongthanyaluk and many other important people that I don't have space to mention here.

I learned many things since I moved to Vancouver, including sumo and shogi. I thank my family for their unconditional support. Diligent Mayonnaise at Fujiya. Killer Saikyo miso yaki black cod which is mind-blowing. Ten musu, chicken teriyaki, salted fish and Japanese dish at Fujiya.
Thai vegetables from Asia Market. BBQ unagi at Ajisai. Korean drinks and Korean items from Kim Market. Chinese items from T & T and Henlong supermarket. Dim-sum, BBQ meat and Chinese cuisine in Vancouver. Beaty Museum where I learned to forage and do gyotaku. Fee-bee song of the black-capped chickadees. Ukiyoe, mingei and sumie at Nikkei National Museum.

Dedication

To papa and mama.

Chapter 1
Introduction: Szemerédi's and Green-Tao's Theorem

On occasions a mathematician will have an insight that is ahead of the time in the sense that the insight is not fully expressible in the mathematical theory and language developed at the moment. For example the Poincaré Recurrence Theorem as first stated and proved by Poincaré was strictly not meaningful. What was needed was the language of Lebesgue measure which came later.
- Walter Gottschalk

The main area of mathematics related to this thesis is additive combinatorics. Problems in additive combinatorics usually ask to count or estimate additive structures in sets. This field has some of its origins in additive number theory, which has interested people since ancient times. There are many classical problems from this field, for example: what is the number of solutions of a given Diophantine equation? Can every even integer $n \ge 4$ be written as a sum of two primes? Various methods, such as the circle method first developed by Hardy and Littlewood and sieve theory, are used to attack these problems. We may also ask about properties of set addition, e.g. if $A \subseteq \mathbb{Z}^+$, what is the size of $A + A$? When is it small and when is it large? Studying problems like this usually involves two very different operations: addition and multiplication. This sometimes makes problems in this area hard, even with current technology. Analogous problems may be asked in abstract settings, e.g. if we let $A$ be a subset of an arbitrary group.
Problems in additive combinatorics nowadays can be very abstract and usually involve areas of mathematics other than pure number theory or combinatorics. Since we usually ask to estimate the size of sets, tools from analysis (e.g. harmonic analysis) can be readily adapted to our finite setting.

A problem posed by Erdős asks whether a set $A \subseteq \mathbb{Z}^+$ with $\sum_{a \in A} 1/a = \infty$ must contain a (nontrivial) $k$-term arithmetic progression. This motivates the study of additive structures in large subsets of $\mathbb{Z}$, where what we mean by large is itself a question. The famous results in this direction are Szemerédi's Theorem and Green-Tao's theorem, discussed in the next few sections. This direction of research brings additive combinatorics in touch with other areas of mathematics such as ergodic theory and Lie theory.

Green-Tao's theorem generalizes Szemerédi's Theorem to the case of primes. One of the main techniques in proving these kinds of theorems is to decompose an arbitrary set into a structured part and a uniform part (that does not correlate with the structures). What should be the notions of structure and uniformity? How do we measure them? These are already hard and interesting questions. Green and Tao managed to develop a tool to measure uniformity of the primes that is sufficient to deduce results about arithmetic progressions in them. In this thesis, we prove some results motivated by their work. In Chapter 2 we use the Goldston-Yildirim sieve used in the original proof of Green-Tao's theorem [50], combined with the circle method of Birch [11], to study the number of almost prime solutions of Diophantine equations (with some rank conditions). In the next two chapters we prove analogues of Green-Tao's Theorem in higher dimensions, $\mathbb{Z}^d$.

1.1 Szemerédi's Theorem

One of the most important theorems in additive combinatorics is Szemerédi's Theorem.
Informally, Szemerédi's theorem states that the set of integers contains so many arithmetic progressions (or affine copies of a finite set in $\mathbb{Z}^d$) that any subset with positive density (see Definition 1.1.2) must contain many of them. The point is that there is no assumption on the set $A$ (other than its size): it could be either purely random or supplied with some explicit structure, and we can show that it contains arithmetic progressions for a different reason in each case. In general, we try to decompose arbitrary sets into structured and random (or pseudorandom) parts, where the techniques are already interesting by themselves.

There are many approaches to Szemerédi's Theorem. Basically, we are not going to find arithmetic progressions but we will count them.

1. The first, combinatorial approach due to Szemerédi [99] introduces Szemerédi's regularity lemma, a theorem describing the structure of a large graph. The lemma became important in combinatorics and computer science. We will talk more about the regularity lemma in Section 1.3.

2. There is a combinatorial Fourier analytic approach due to Gowers [36], [37], where he introduces the uniformity norms (of various degrees determined by the complexity of structures) to measure uniformity or randomness of functions. A function which correlates with some kind of structure would be large in the uniformity norm of the appropriate degree. The harder inverse theorem asks for the converse: if a function is large in a uniformity norm, what kind of structures can the function correlate with? An inverse theorem would make the uniformity norms an effective tool in studying structures in sets. This is a generalization of the argument of Roth [91] for 3-term arithmetic progressions. Roth observed that if a set $A \subset \mathbb{Z}_N$ with density $\alpha > 0$ has the Fourier coefficient bound $\|\widehat{1_A - \alpha}\|_\infty$ sufficiently small (i.e. it does not correlate with linear phase functions) then $A$ contains many three-term arithmetic progressions, comparable to a random set. Otherwise $A$ has structure in the sense that it has increased relative density in some long arithmetic progression. Then we can do a density increment (see Section 1.1.4) to obtain a structured subset of $A$ which contains arithmetic progressions.

One may want to generalize this idea to four-term arithmetic progressions using exponentials of quadratic polynomials (footnote 1). However this turns out not to be the case (see Example 1.1.12); in particular, quadratic functions on multidimensional arithmetic progressions may not correlate with quadratic polynomial phases (hence quadratic polynomial phases are not sufficient to describe structural objects in this case). Gowers exploits tools from additive number theory such as the Freiman-Ruzsa theorem (structure of sets with small sumsets in terms of multidimensional arithmetic progressions) and the Balog-Szemerédi-Gowers theorem (sets with many additive quadruples have large subsets satisfying the conditions of Freiman's theorem) to deal with objects like multidimensional quadratic phase functions. Gowers managed to obtain a local version of the inverse $U^k$ norm theorem on $\mathbb{Z}_N$, meaning that he obtained correlations with many polynomial phase functions, each on an arithmetic progression. These works also inspired many later works, such as the study of the global inverse theorem [59]. The full global inverse theorem, as a direct generalization of Roth's argument, was obtained later in [60], where obstructions are actually described in terms of nilsequences (footnote 2) (but not with a good bound, and the proof is very long).

3. The next approach to Szemerédi's Theorem is the ergodic theory approach initiated by Furstenberg and Katznelson (e.g. [33], [31]), where they transfer Szemerédi-type problems (via the Furstenberg correspondence principle, first formulated in [26]) to the study of multiple recurrence in a probability measure preserving system (footnote 3).
For example, to prove Szemerédi's Theorem, one can study
$$\frac{1}{N} \sum_{n=1}^{N} f_1(T^n x) f_2(T^{2n} x) \cdots f_k(T^{kn} x). \qquad (1.1.1)$$
Here $f_i \in L^\infty(X)$ and $T : X \to X$ is measure preserving. This average is referred to as a multiple recurrence average. To understand the limiting behavior (as $N \to \infty$) of (1.1.1), the key idea is to understand the characteristic factor $Z$, which is an invariant sub-$\sigma$-algebra such that if $\mathbb{E}(f_i | Z) = 0$ for some $i$ then the limit of (1.1.1) is 0 in $L^2$ norm. Hence an explicit description of characteristic factors is a useful tool to prove results on multiple recurrence like (1.1.1). The question of finding the characteristic factor for a given multiple recurrence is a delicate one. Host and Kra [65] and independently Ziegler [115] were able to give a nice description of the characteristic factor of (1.1.1) for any $k$ in terms of nilrotations on nilmanifolds (this could be considered as a generalization of abelian rotations on $S^1$, the Kronecker factor) and Host-Kra seminorms (an analogue of Gowers norms); see e.g. Appendix A in [8] for a brief introduction to these objects. Nilsequences [7] play a role as the obstruction to uniformity similar to linear exponentials in the case of Roth's theorem. This motivated parallel work in additive combinatorics. The motivation for why nilpotent groups arise is that in a $k$-step nilsystem, the first $k$ terms of a geometric progression determine the rest, see [71], Section 6.4. Ergodic theory is the method that can attack the most general kinds of patterns in this kind of problem.

Footnote 1: Indeed, we have a quadratic obstruction for four-term arithmetic progressions: $x^2 - 3(x+d)^2 + 3(x+2d)^2 - (x+3d)^2 = 0$. This is the only obstruction.
Footnote 2: These are technical objects and we will not define them here, but we discuss a bit about them in the next paragraph.
Footnote 3: This means a set $X$ together with a Borel $\sigma$-algebra $\mathcal{B}$, a Borel probability measure $\mu$ and a measure preserving transformation $T : X \to X$, meaning $\mu(T^{-1}A) = \mu(A)$ for all $A \in \mathcal{B}$.
Let us state a theorem of Furstenberg-Katznelson which is equivalent, via the correspondence principle, to the multidimensional Szemerédi's theorem.

Theorem 1.1.1 (Furstenberg-Katznelson [31]). Let $(X, \mathcal{B}, \mu)$ be a probability measure space and let $T : X \to X$ be a measure preserving transformation on $(X, \mathcal{B}, \mu)$. Let $A \in \mathcal{B}$, $\mu(A) > 0$, $k \in \mathbb{Z}^+$. Then there exist $B \subseteq A$ with $\mu(B) > 0$ and $n \ge 1$ such that
$$T^n B,\ T^{2n} B,\ \dots,\ T^{(k-1)n} B \subseteq A,$$
i.e.
$$\mu\Big(\bigcap_{j=0}^{k-1} T^{-jn} A\Big) > 0.$$

Tao gave a finitary ergodic argument [101] where he discretized the ergodic argument and introduced the $UAP$-norm, which is a counterpart of the uniformity norm.

Roth's theorem also follows from studying eigenvalues (spectra) of graphs using a Cheeger-type discrepancy bound, see [98]. Roughly speaking, the second largest eigenvalue of the adjacency matrix is an analogue of the second largest Fourier coefficient of a dense set. Generalizing this theory to spectra of hypergraphs seems challenging. In the ergodic setting, though, there is a description of the characteristic factor of (1.1.1) with $k = 4$ in terms of generalized eigenfunctions, see e.g. [116].

4. Finally, there are hypergraph regularity approaches due to Vojta Rödl, B. Nagle, M. Schacht, J. Skokan [82], [83], [85] that generalize the argument in [97], and also another hypergraph approach due to Gowers [35], [36]. These stronger hypergraph regularity lemmas allow us to deduce the multidimensional Szemerédi's theorem. This exploits ideas from ergodic theory (energy increment) and conditional expectations on sigma-algebras to obtain the required decomposition. There is also a stronger functional version of the hypergraph approach due to Tao [104], which is the version we generalize to prove the main result in Chapter 4.

Next we formally state Szemerédi's theorem.

Definition 1.1.2. Let $A \subseteq \mathbb{N}$. We say $A$ has positive upper density if $\limsup_{N \to \infty} \frac{|A \cap [N]|}{N} > 0$, i.e. there are $\delta > 0$ and $N_j \nearrow \infty$ such that $\frac{|A \cap [N_j]|}{N_j} \ge \delta$ for all $j$.
We say that $A$ has positive upper Banach density if there are $\delta > 0$ and a sequence of intervals $I_j \subseteq \mathbb{N}$, $|I_j| \nearrow \infty$, such that $\frac{|A \cap I_j|}{|I_j|} \ge \delta$ for all $j$. A similar notion may be defined on $\mathbb{Z}$ or $\mathbb{Z}^d$ using Cartesian products of intervals.

Now we state two versions of Szemerédi's Theorem in dimension 1.

Theorem 1.1.3 (Szemerédi's Theorem; equivalent forms).
1. (Infinite Version) If $A \subseteq \mathbb{N}$ has positive upper Banach density then for all $k \in \mathbb{Z}^+$, there exist $x, t$ with $t \ne 0$ such that $P := \{x, x+t, \dots, x+(k-1)t\} \subseteq A$.
2. (Finite Version) Let $\delta > 0$ and $k \in \mathbb{Z}^+$. Then there is an $N(k, \delta)$ such that if $N \ge N(k, \delta)$ then any $A \subseteq [1, N]$ with $|A| \ge \delta N$ contains some $P = \{x, x+t, \dots, x+(k-1)t\}$ with $t \ne 0$.

The point is that $N(k, \delta)$ is independent of $A$. It is easy to check that these two statements are equivalent. We may also replace $P$ with $F' := x + tF$ for any finite set $F$, where $F' = \{x + tf : f \in F\}$ is an affine copy of $F$.

Now we state the functional version of Szemerédi's Theorem. Note that it is more convenient to work in a more structured setting like a group, e.g. $\mathbb{Z}_{N'}$ (where $N'$ is a prime much bigger than $N$), and there is a standard argument that deduces the result on $[N]$ from that, as we will do.

Theorem 1.1.4. Let $f : \mathbb{Z}_N \to [0, 1]$ with $\mathbb{E}_{x \in \mathbb{Z}_N} f(x) \ge \delta$. Then there is a positive constant $c(k, \delta)$ such that
$$\mathbb{E}_{x, t \in \mathbb{Z}_N} f(x) f(x+t) \cdots f(x+(k-1)t) \ge c(k, \delta) - o_\delta(1). \qquad (1.1.2)$$

Theorem 1.1.4 implies the finite version by taking $f$ to be the characteristic function of a set $A$; by an averaging argument of Varnavides [114], the conclusion of the finite version of Szemerédi's theorem can in fact be strengthened to give at least $c(\delta', k) N^2$ such progressions. Conversely, the set version also implies the functional version, by considering sets of the form $\{x : f(x) \ge \delta/2\}$ together with a simple averaging argument.

Theorem 1.1.5 (Varnavides's Theorem [114]). The conclusion of Theorem 1.1.3 (finite version) may be strengthened to conclude that $A$ contains at least $c(\delta, k) N^2$ $k$-APs (i.e. $k$-term arithmetic progressions).

Proof [114]: Fix $L \ge N(k, \delta/2)$, so that any set $A \subseteq \{1, \dots, L\}$ with density $\ge \delta/2$ must contain a nontrivial $k$-AP. The theorem follows from the following observations:

– We work in $\mathbb{Z}/N$. Consider a long $L$-term arithmetic progression $S_{a,d} = \{a + d, \dots, a + Ld\} \subseteq \mathbb{Z}/N$ where $a, d \in \mathbb{Z}_N$, $d \ne 0$, $L \le N$. If $S_{a,d}$ intersects $A$ in at least $\delta L/2$ elements then $S_{a,d} \cap A$ contains a nontrivial $k$-AP.
– Consider all $L$-APs with a fixed common difference $d \ne 0$. Varying $a$, we have $\sum_a |S_{a,d} \cap A| = L|A| \ge \delta L N$. Hence $|S_{a,d} \cap A| \ge (\delta/2) L$ for at least $(\delta/2) N$ values of $a$. Now vary $d$: then $|S_{a,d} \cap A| \ge (\delta/2) L$ for at least $(\delta/2) N(N-1)$ values of $(a, d)$. Each of these $L$-APs contains a nontrivial $k$-AP.
– We account for possible repetitions of the arithmetic progressions counted. Observe that any nontrivial $k$-AP is contained in at most $L(L-1)$ $L$-APs (the point is that, for each $k$-AP, the indices of its first two terms within the $L$-AP determine the $L$-AP it is in). Hence the number of $k$-APs in $A$ is at least
$$\frac{\frac{\delta}{2} N(N-1)}{L(L-1)} \gtrsim \frac{\delta}{2} \frac{N^2}{L^2} = c'(k, \delta) N^2.$$

One may take e.g. (when $k = 3$) $L = \lceil \exp(C \delta^{-1} \log(1/\delta)^4) \rceil$ according to Bloom [9].

Finally, let us remark on a more general version of Szemerédi's Theorem called the Density Hales-Jewett Theorem. This theorem implies Szemerédi's theorem in finite groups.

Theorem 1.1.6 (Density Hales-Jewett's Theorem [29]). Let $F \subseteq \mathbb{Z}$ and $0 < \delta < 1$. If $N \ge N(|F|, \delta)$ then any $A \subseteq F^d$ with $|A| \ge \delta |F|^d$ contains a set of the form $\{a + tr : t \in F\}$ with $a \in \mathbb{Z}^d$, $r \in \mathbb{Z}^d$, $r \ne 0$.

Remark 1.1.7 (Quantitative Bound in Szemerédi's Theorem). We have the following equivalent statements of Szemerédi's Theorem.

$0 \le f \le 1$, $\mathbb{E}_{x \in \mathbb{Z}/N} f(x) \ge \delta \Rightarrow \mathbb{E}_{x, r \in \mathbb{Z}/N} f(x) f(x+r) \cdots f(x+(k-1)r) \ge c_1(k, \delta)$.
($N$ in terms of $\delta$) $A \subseteq [N]$, $|A| \ge \delta N$, $N \ge c_2(k, \delta) \Rightarrow A$ contains some $k$-term arithmetic progression.
($\delta$ in terms of $N$) $A \subseteq [N]$, $|A| > r_k(N) \Rightarrow A$ contains a nontrivial $k$-term arithmetic progression, where $r_k(N)$ denotes the largest size of a subset of $[N]$ containing no nontrivial $k$-term arithmetic progression.

Example: The original bound obtained by Roth [91] is given by $c_2(3, \delta) > \exp(\exp(c \delta^{-1}))$.
This is equivalent to $r_3(N) \le C \frac{N}{\log\log N}$.

We have the following records:
$$N 2^{-\sqrt{8 \log N}} \ \text{(O'Bryant [79])} \le r_3(N) \le C \frac{(\log\log N)^4}{\log N} N \ \text{(Bloom [9])},$$
$$r_4(N) \le C \frac{N}{\exp(C \sqrt{\log\log N})} \ \text{(Green-Tao [57])},$$
$$r_4(N) \le \frac{N}{\log^c N} \ \text{for some small constant } c > 0 \ \text{(Green-Tao [58])}.$$

For $k > 4$ we have the following bounds due to Gowers [39]:
$$c_1(k, \delta) > 2^{-2^{1/\delta^{c_k}}}, \qquad c_k = 2^{2^{k+9}},$$
$$c_2(k, \delta) > \exp(\exp((1/\delta)^{c_k})),$$
$$C N \exp\Big(-n 2^{(n-1)/2} (\log N)^{1/n} + \frac{1}{2n} \log\log N\Big) \ \text{[79]} \le r_k(N) \le C \frac{N}{(\log\log N)^{2^{-2^{k+9}}}},$$
where in the lower bound $n = \lceil \log_2 k \rceil$.

Improving the quantitative bounds in Szemerédi's Theorem, even in the case $k = 3$, is an interesting research problem. In earlier work, Bourgain [12] obtained an upper bound on $r_3(N)$ by analyzing the structure of Bohr sets (see Section 1.1.3) and doing a density increment on Bohr sets. Bloom [9] obtains his bound by analyzing combinatorial properties of the large spectrum (the same idea as in the recent work on the cap set problem (footnote 4) [3] on the size of subsets of $\mathbb{F}_p^n$ not containing three-term arithmetic progressions, which is $C 3^n / n^{1+\varepsilon}$). The lower bound on $r_3(N)$ was first obtained by Behrend [5] using the idea that there are no three-term arithmetic progressions on a sphere. There has been no substantial improvement and it might be the optimal shape. This turns out to be the case for Roth's theorem in four variables [93]. Much less is known quantitatively in higher dimensional cases.

1.1.1 Gowers Uniformity Norms

Definition 1.1.8 ($U^d$-norm and inner product). Let $G$ be a finite abelian group. Define (footnote 5) the $U^d$ (generalized) inner product of $2^d$ functions $f_\omega$, $\omega \in \{0,1\}^d$, by
$$\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d} = \mathbb{E}_{x, h_1, \dots, h_d \in G} \prod_{\omega \in \{0,1\}^d} \mathcal{C}^{|\omega|} f_\omega(x + \omega_1 h_1 + \cdots + \omega_d h_d).$$
In particular, when $f_\omega = f$ for all $\omega$, we have the following definition of the Gowers uniformity norm:
$$\|f\|_{U^d}^{2^d} = \mathbb{E}_{x, h_1, \dots, h_d \in G} \prod_{\omega \in \{0,1\}^d} \mathcal{C}^{|\omega|} f(x + \omega_1 h_1 + \cdots + \omega_d h_d),$$
where $\mathcal{C} f := \overline{f}$ is the complex conjugate and $|\omega| = \omega_1 + \cdots + \omega_d$.

Lemma 1.1.9 (Basic properties of Gowers uniformity norms, see e.g. [109]).
1. If $f$ is a bounded function then (footnote 6) $\|f\|_{U^d} \le \|f\|_{2^d/(d+1)}$.
2. Gowers-Cauchy-Schwarz inequality: $|\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d}| \le \prod_{\omega \in \{0,1\}^d} \|f_\omega\|_{U^d}$.
In particular, $\|\cdot\|_{U^d}$ is indeed a norm.
3. $\|f\|_{U^d} \le \|f\|_{U^{d+1}}$.

Footnote 4: See [108] for a very recent improvement on this problem via the polynomial method.
Footnote 5: We don't actually need conjugates as we are working with real valued functions; these conjugates may be neglected later.
Footnote 6: This is sharp, as mentioned in [66]; an easy application of Hölder's inequality gives a simpler, looser bound.

The following recurrence relation is often useful when working with higher uniformity norms:
$$\|f\|_{U^d}^{2^d} = \mathbb{E}_h \|f \, T^h f\|_{U^{d-1}}^{2^{d-1}}. \qquad (1.1.3)$$

Example 1.1.10. [116] Suppose $f : \mathbb{Z}_N \to \{1, -1\}$ is a random function with mean 0. Then, by the law of large numbers, $\|f\|_{U^k} = o(1)$ with high probability.

Remark 1.1.11. An analogue of the Gowers uniformity norms, called Host-Kra seminorms, can in fact be defined on general measure spaces [65], where they are in general seminorms. They are norms exactly when they are defined on a nilmanifold.

When working with the uniformity norms, we are interested in proving an inverse theorem: finding a set of structured functions $\mathcal{F}_d$ such that if $f$ has a large uniformity norm of degree $d$ then $f$ correlates with some element of $\mathcal{F}_d$; in other words, if $f$ has small uniformity norm of degree $d$ then $f$ does not correlate with any structure in $\mathcal{F}_d$. From property (3) of Lemma 1.1.9, we see that functions with small higher order Gowers norm will correlate with fewer structures, i.e. $\mathcal{F}_d \subseteq \mathcal{F}_{d+1}$. We will prove analogous properties of uniformity norms on more general weighted hypergraphs.

For example, as in the proof of Roth's theorem, a function with large $U^2$ norm will, due to the relation $\|f\|_{U^2} = \|\hat{f}\|_4$, correlate with some linear exponential. For higher order uniformity, however, we have the following example.

Example 1.1.12. [[39], section 4.] Consider the two-dimensional arithmetic progression in $\mathbb{Z}_N$, $N$ prime: $P = \{x_1 + K x_2 : -K/10 \le x_1, x_2 \le K/10\}$ with $K = \lfloor \sqrt{N} \rfloor$, and $f(x_1 + K x_2) = e((x_1^2 + x_2^2)/N) 1_P$. The function $\phi(x_1 + K x_2) = x_1^2 + x_2^2$ is a quadratic function on $P$ in the sense that
$$\Delta^+_{h_1} \Delta^+_{h_2} \Delta^+_{h_3} \phi(x) = 0 \quad \forall x, h_1, h_2, h_3 \in P,$$
and we can show that $\|f\|_{U^3} \gg 1$.
On the other hand, a calculation shows that $|\langle f, e(\psi)\rangle| = O(N^{-c})$ for any quadratic phase $\psi(x) = r x^2/N + s x/N + t$ with $r, s \in \mathbb{Z}_N$, $t \in \mathbb{R}/\mathbb{Z}$. Hence exponentials of quadratic polynomials of this form are not the only quadratic obstructions to uniformity in the $\|\cdot\|_{U^3}$ norm.

1.1.2 Box Norm

The box norm is a more abstract version of the Gowers uniformity norms defined on hypergraphs, i.e. on a function $f : X_1 \times \cdots \times X_d \to \mathbb{R}$.

Definition 1.1.13 (Box norms). Let $X_1, \dots, X_d$ be finite sets and $f : X_1 \times \cdots \times X_d \to \mathbb{R}$. Define the box norm of order $d$ by

$$\|f\|_{\square^d}^{2^d} = \mathbb{E}_{x, y \in X_1\times\cdots\times X_d} \prod_{\omega\in\{0,1\}^d} f(P_\omega(x,y)),$$

where $P_\omega(x,y)$ is the point whose $j$-th coordinate is $x_j$ if $\omega_j = 0$ and $y_j$ if $\omega_j = 1$. For $e \subseteq \{1,\dots,d\}$, we can define $\|f\|_{\square^e}$ for $f : V_e \to \mathbb{R}$ in the same way.

Example 1.1.14. $\|f\|_{\square^2}^4 = \mathbb{E}_{x,x'\in X,\; y,y'\in Y} f(x,y)f(x',y)f(x,y')f(x',y')$.

On a hypergraph system $(J = [d+1], (V_j)_{j\in J}, H)$⁷, we put sigma-algebras $\mathcal{B}_e$ on each $V_e = \prod_{j\in e} V_j$. We can also think of $\mathcal{B}_e$ as the σ-algebra $\mathcal{A}_e$ on $V_J$, where $\mathcal{A}_e = \pi_e^{-1}(\mathcal{B}_e)$ for the projection $\pi_e : V_J \to V_e$.

Lemma 1.1.15 ([52], Gowers uniform functions are orthogonal to lower order sets; in other words, there are no correlations between them). Given a hypergraph system $(J = [d+1], (V_j)_{j\in J}, H)$, let $e \in H$, $|e| = d'$, $f : V_e \to \mathbb{R}$.
1. $\|fg\|_{\square^e} \le \|f\|_{\square^e}$ when $g : V_e \to \mathbb{R}$, $|g| \le 1$, is independent of the $x_j$-variable for some $j \in e$.
2. $\mathbb{E}_{x_e\in V_e} f(x_e) \prod_{e' \subsetneq e} 1_{E_{e'}}(x_{e'}) \le \|f\|_{\square^e}$.
3. Suppose there exist sub-algebras $\mathcal{B}_{e'} \subseteq \mathcal{A}_{e'}$ with $\mathrm{compl}(\mathcal{B}_{e'}) \le M$ for any $e' \subseteq J$ with $|e'| < d$. Then for any $e \in H$ and any $E'_e \in \bigvee_{e' \subsetneq e} \mathcal{B}_{e'}$,
$$\mathbb{E}_{x\in V_J} 1_{E'_e}(x) f(\pi_e(x)) = O_M(\|f\|_{\square^e}).$$

Proof. The first statement follows by expanding the LHS using the definition of the box norm. The second statement follows by iterated application of the first statement; see the proof of Theorem 3.5.5 (on weighted hypergraphs) for details. The last statement follows from the second statement and the triangle inequality, together with the fact that $E'_e$ is a union of $O_M(1)$ atoms of $\bigvee_{e'\subsetneq e} \mathcal{B}_{e'}$.

Definition 1.1.16. Given a hypergraph system $(J, V_J, H)$ and a σ-algebra $\mathcal{B}$ on $V_J$, let $e \in H_d$. Write $\partial e = \{f \subseteq e : |f| = |e| - 1\}$.
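Define the $e$-discrepancy $\Delta_e(E_e|\mathcal{B})$ of the set $E_e \in \mathcal{B}_e$ with respect to $\mathcal{B}$ by⁸

$$\Delta_e(E_e|\mathcal{B}) := \sup_{E_f \in \mathcal{A}_f \;\forall f\in\partial e} \left| \int_{V_J} \bigl(1_{E_e} - \mathbb{E}(1_{E_e}|\mathcal{B})\bigr) \prod_{f\in\partial e} 1_{E_f}\, d\mu \right|. \tag{1.1.4}$$

⁷ See Definition 4.1.4. Basically, $J$ is an index set, $V_J := (V_j)_{j\in J}$ and $H \subseteq \prod_{j\in J} V_j$ is a hypergraph. See section 1.1.5 for the relevant notions on σ-algebras.
⁸ We could replace the product over $f \in \partial e$ by a product over $f \subsetneq e$, or replace $1_{E_e} - \mathbb{E}(1_{E_e}|\mathcal{B})$ by a bounded function, with the same proof.

Note that the largeness of $\Delta_e(E_e|\mathcal{B})$ implies the largeness of $\|1_{E_e} - \mathbb{E}(1_{E_e}|\mathcal{B})\|_{\square^e}$. To see this, write $F = 1_{E_e} - \mathbb{E}(1_{E_e}|\mathcal{B})$. Let $x'_j \in X_j$ for $1 \le j \le d$ be fixed and $y_1,\dots,y_d \in [0,1]$. For $1 \le i \le d$, define

$$E_i = \Bigl\{ (x_{[d]\setminus\{i\}}) : \prod_{\omega_{[i,d]}} F\bigl(x_1,\dots,x_{i-1}, x'_i, P_{\omega_{[i,d]}}(x_{[i,d]})\bigr) \ge y_i \Bigr\} \in \mathcal{B}_{[d]\setminus\{i\}}.$$

Then by Fubini's Theorem,

$$\Delta_e(E_e|\mathcal{B}) \ge \int_{V_J} F(x_1,\dots,x_d)\, 1_{E_1}\cdots 1_{E_d}\, d\mu$$
$$= \int_0^1\!\!\cdots\!\int_0^1 \int_{V_J} F(x_1,\dots,x_d)\, 1_{E_1}\cdots 1_{E_d}\, d\mu\, dy_1\dots dy_d$$
$$= \int_{V_J} \int_0^1\!\!\cdots\!\int_0^1 F(x_1,\dots,x_d)\, 1_{E_1}\cdots 1_{E_d}\, dy_1\dots dy_d\, d\mu$$
$$= \int_{V_J} \prod_{\omega_d} F\bigl(\omega_{[d]}(x_{[d]})\bigr)\, d\mu.$$

Taking the average over $x'_1,\dots,x'_d$, we get $\Delta_e(E_e|\mathcal{B}) \ge \|F\|_{\square^d}^{2^d}$. Combining this with Lemma 1.1.15, we conclude the following relationship between the two quantities (in particular, largeness of one implies largeness of the other):

$$\|1_{E_e} - \mathbb{E}(1_{E_e}|\mathcal{B})\|_{\square^e} \;\ge\; \Delta_e(E_e|\mathcal{B}) \;\ge\; \|1_{E_e} - \mathbb{E}(1_{E_e}|\mathcal{B})\|_{\square^e}^{2^d}. \tag{1.1.5}$$

1.1.3 Bohr Sets

Bohr sets can be regarded as an analogue of subspaces in the integer setting, on which we can run a density increment. Given a finite abelian group $G$, it is not hard to construct a non-degenerate symmetric bilinear form $(x,y) \mapsto x\cdot y$ from $G \times G$ to $\mathbb{R}/\mathbb{Z}$ (see [109], Lem. 4.3). For example, if $G = \mathbb{Z}_N$, we can take $x \cdot y = xy/N$. For each $r \in \hat G$, the dual group of $G$, we can define the character $e_r(x) := e(r\cdot x)$. We can identify $\hat G$ with the set of characters on $G$ by taking $r \mapsto e_r$. We can define the Fourier transform of $f$,

$$\hat f : \hat G \to \mathbb{C}, \qquad \hat f(r) = \mathbb{E}_{x\in G} f(x)\, e(r\cdot x).$$

We can think of the Fourier transform as a measurement of the correlation between $f$ and the character $e_r$.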
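As a small numerical illustration of this definition on $G = \mathbb{Z}_N$ (a sketch added for concreteness, not taken from the thesis; the helper names `e`, `fourier` and `u2_fourth_power` and the test set $A = \{0,1,3\} \subseteq \mathbb{Z}_7$ are assumptions made for the example), the following Python code computes $\hat f$ with the normalization above and checks, for a real-valued $f$, the identity $\|f\|_{U^2}^4 = \sum_r |\hat f(r)|^4$ behind the relation $\|f\|_{U^2} = \|\hat f\|_4$ used earlier.

```python
import cmath

def e(t):
    """The standard character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

def fourier(f):
    """Normalized Fourier transform on Z_N: fhat(r) = E_x f(x) e(r*x/N)."""
    N = len(f)
    return [sum(f[x] * e((r * x % N) / N) for x in range(N)) / N
            for r in range(N)]

def u2_fourth_power(f):
    """||f||_{U^2}^4 for a real-valued f, computed directly from the definition."""
    N = len(f)
    total = 0.0
    for x in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (f[x] * f[(x + h1) % N]
                          * f[(x + h2) % N] * f[(x + h1 + h2) % N])
    return total / N**3

# Hypothetical test data: the balanced function 1_A - alpha of A = {0, 1, 3} in Z_7.
N = 7
A = {0, 1, 3}
alpha = len(A) / N
f = [(1.0 if x in A else 0.0) - alpha for x in range(N)]

fhat = fourier(f)
l4_fourth_power = sum(abs(c) ** 4 for c in fhat)   # sum_r |fhat(r)|^4
l2_square = sum(abs(c) ** 2 for c in fhat)         # sum_r |fhat(r)|^2
mean_square = sum(v * v for v in f) / N            # E_x |f(x)|^2
```

The two quantities `u2_fourth_power(f)` and `l4_fourth_power` agree up to floating-point error, and `l2_square` matches `mean_square` (Plancherel). The triple loop costs $O(N^3)$ operations, so this is only meant for very small $N$.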
It is natural to put the uniform measure on $G$ for $f : G \to \mathbb{C}$ and the counting measure on $\hat G$ for $\hat f : \hat G \to \mathbb{C}$, i.e.

$$\|f\|_p^p = \mathbb{E}_{x\in G} |f(x)|^p, \qquad \|\hat f\|_p^p = \sum_{r\in\hat G} |\hat f(r)|^p.$$

We state the following basic properties, which follow from direct calculation.

Lemma 1.1.17.
1. (Orthogonality) $\mathbb{E}_{x\in G}\, e(r\cdot x) = 0$ if $r \ne 0$, and $= 1$ if $r = 0$.
2. $\langle f, g\rangle = \langle \hat f, \hat g\rangle$. In particular $\|\hat f\|_2 = \|f\|_2$.
3. $\widehat{f * g} = \hat f\, \hat g$.
4. $\|f\|_{U^2} = \|\hat f\|_4$.

Note that no analogue of property (4) for $U^k$, $k \ge 3$, is known. It would be very useful if such a formula were found.

Definition 1.1.18 (Bohr set). Let $\rho > 0$ and $S \subseteq \hat G$. Let $\|\cdot\|$ denote the distance to the nearest integer. Define⁹

$$B(S,\rho) := \{x \in G : \|r\cdot x\| < \rho \ \ \forall r \in S\}.$$

$S$ is referred to as the frequency set of $B(S,\rho)$, $|S|$ is called the dimension of $B(S,\rho)$, and $\rho$ is the width of $B(S,\rho)$.

Bohr sets are an analogue of subspaces¹⁰ in $\mathbb{Z}$, $\mathbb{Z}_N$ or more general groups. For example, a Bohr set is nearly closed under addition (if it is regular, see [12]). Indeed it can be considered as an approximate subgroup [48]. It can also be thought of as a metric ball of radius $\rho$ and dimension $|S|$. An important structural theorem about Bohr sets in $\mathbb{Z}_N$ is that they look like multidimensional arithmetic progressions. This is Bogolyubov's lemma combined with the geometry of numbers.

Definition 1.1.19 (Multidimensional arithmetic progression). A multidimensional arithmetic progression of dimension $K$ with basis $x_1,\dots,x_K$ is a subset of $\mathbb{Z}_N$ or $\mathbb{Z}$ of the form

$$\Bigl\{ \sum_{i=1}^{K} l_i x_i \ ;\ |l_i| \le m_i \Bigr\}.$$

Lemma 1.1.20 (see [109]). Let $S$ be a nonempty subset of $\hat{\mathbb{Z}}_N$ and $\rho > 0$. Then
– $B(S,\rho)$ contains an arithmetic progression of size at least $\rho N^{1/|S|}$ centered at 0.
– $B(S,\rho)$ contains a proper multidimensional arithmetic progression of dimension $|S|$ and size at least $(\rho/|S|)^{|S|} N$.

1.1.4 Density Increment Method

Here we describe the density increment method, sometimes called the $L^\infty$-increment method [105]. We have to find a good notion of Structure (the set of structured objects in the set or space we are considering) and of $\|\cdot\|_S$ (a uniformity norm with respect to Structure).
Let $S \in$ Structure and let $\delta_S(A)$ denote the density of $A$ on $S$. We want to have the following dichotomy:
– (generalized von Neumann¹¹) $\|1_A - \delta_S(A)\|_S < c(\alpha)$ ⇒ $A$ contains the required configurations (the structure of $A$ can be described in terms of $\|\cdot\|_S$).
– (density increment) $\|1_A - \delta_S(A)\|_S > c(\alpha)$ ⇒ we can find $S' \in$ Structure with $\omega(S') \le \omega(S) + 1$, where $\omega$ is the notion of complexity¹² of structured sets, and
$$\delta_{S'}(A) > \delta_S(A) + c(\alpha)$$
for some positive constant $c(\alpha)$ depending only on $\alpha$.

⁹ Equivalently, we may define $B(S,\rho) := \{x \in G : |1 - e(r\cdot x)| < \rho \ \forall r \in S\}$, using $|e(t) - 1| \le 2|\sin(\pi t)| \le 2\pi \|t\|_{\mathbb{R}/\mathbb{Z}}$.
¹⁰ If $G$ is a vector space then a Bohr set is indeed a subspace, the annihilator of $S$.
¹¹ See [31], Lemma 3.1.

$\|\cdot\|_S$ should be strong enough (i.e. not too many objects have small $\|\cdot\|_S$ norm) to prove the generalized von Neumann theorem, but weak enough (not too many objects have large $\|\cdot\|_S$ norm) to obtain the density increment. To run the density increment method, suppose we cannot find the configurations in the set; then we can find $S'$ on which $A$ has increased density. However, this cannot continue forever as the density cannot exceed 1. So eventually we arrive at the other case of the dichotomy and we must be able to find the required configuration in the set. We give some examples of density increment dichotomies below.

Example 1.1.21 ([46], Lemma 2.4). (Roth's theorem in finite fields.) Let $A \subseteq \mathbb{F}_p^n$. If there is $t \ne 0$ such that $\hat 1_A(t) \ge \alpha^2/2$, then there is a subspace of codimension 1 such that $A$ has density at least $\alpha + \alpha^2/4$ on some translate of it. In this case we may take Structure to be the set of subspaces of $\mathbb{F}_p^n$ and $\|f\|_S$ to be $\|\hat f\|_\infty$.

Example 1.1.22 (Roth's Theorem). Suppose $\langle 1_A - \alpha, e(xr/N)\rangle \gg \delta$; this means $1_A - \alpha$ has a linear bias in some direction.
We can use an equidistribution property¹³ of $rx/N \pmod 1$ to partition $[N]$ (up to a small error) into long arithmetic progressions of length, say, $N^t$, $t < 1$, on each of which $rx/N$ is almost constant $\pmod 1$. Then our set will have increased density $\alpha + c(\alpha)$ on one of these progressions. We could also run the density increment on Bohr sets as in [12].

Example 1.1.23. (Rectangles, corners) Finding the correct notion of structure and uniformity norm can be tricky. We may expect the $\|\cdot\|_{\square^2}$ norm to control the number of rectangles. Assume $\|1_A - \alpha\|_{\square^2}$ is small and we want to show that $A$ contains roughly $\alpha^4 N^4$ rectangles. Expand

$$\|1_A - \alpha\|_{\square^2}^4 = \mathbb{E}_{x,x'\in X,\, y,y'\in Y}\, 1_A(x,y)1_A(x',y)1_A(x,y')1_A(x',y') - \alpha A_3(x,x',y,y') + \alpha^2 A_2(x,x',y,y') - \alpha^3 A_1(x,x',y,y') + \alpha^4.$$

The first term is the number of rectangles (divided by $N^4$). $A_1$ is a sum of 4 terms of the form $\mathbb{E}_{x,x',y,y'} 1_A(x,y') = \alpha$. $A_2$ is a sum of 6 terms such as $\mathbb{E}_{x,y,x',y'} 1_A(x,y)1_A(x,y') = \mathbb{E}_x(\mathbb{E}_y 1_A(x,y))^2 \ge (\mathbb{E}_{x,y} 1_A(x,y))^2 = \alpha^2$ by the Cauchy-Schwarz inequality. However, we want to estimate the four terms in $A_3$, like $\mathbb{E}_{x,y,x',y'} 1_A(x,y)1_A(x',y')1_A(x',y)$ or $\mathbb{E}_{x,y,x',y'} 1_A(x,y)1_A(x',y)1_A(x,y')$, to be $\le \alpha^3$, but there is no reason for this to be true in general, apart from an insufficient trivial bound of $O(\alpha^2)$.

¹² So $S'$ in some sense is not too small compared to $S$. For example, if the $S$ are subspaces of a given vector space, $\omega(S)$ could be the codimension of $S$.
¹³ E.g. Dirichlet's diophantine approximation theorem.

Indeed, in Shkredov's proof [96] of an exponential bound for corners in dimension 2, he puts some uniformity conditions on the structured objects $E_1 \times E_2$ (with a method to uniformize a general product set $E_1 \times E_2$) in such a way that the density increment argument can be run.
This is still open in higher dimensions or in general.

1.1.5 Energy Increment Method

The energy increment method, which first appeared in the context of graph theory in the proof of Szemerédi's Graph Regularity Lemma [99], gives a dichotomy argument analogous to the density increment method. However, this method uses the machinery of ergodic theory (factors) to read the structure in terms of σ-algebras, which is more flexible thanks to the machinery of $L^2$ spaces. For example, it is used in showing the existence of arithmetic progressions in the primes [52], where no density increment proof of the result is available. Indeed, there are many results in density Ramsey theory where only an ergodic proof is known. We also use the energy increment method in Chapter 4.

Definition 1.1.24 (Factor). Let $X$ be a finite set. A σ-algebra on $X$ can be given by a (unique) partition of $X$ into atoms. A sigma-algebra $\mathcal{B}$ on $X$ is sometimes called a factor. We say that a factor $\mathcal{B}'$ is finer than $\mathcal{B}$ (or $\mathcal{B}$ is coarser than $\mathcal{B}'$) if each atom of $\mathcal{B}$ can be written as a union of atoms of $\mathcal{B}'$. We say that $f$ is measurable with respect to $\mathcal{B}$ if $f$ is constant on each atom of $\mathcal{B}$. Hence $L^2(\mathcal{B}) \subseteq L^2(\mathcal{B}')$ when $\mathcal{B}'$ is finer than $\mathcal{B}$.

Define $\mathrm{compl}(\mathcal{B})$, the complexity of $\mathcal{B}$, to be the smallest number of elements of $\mathcal{B}$ that can be used to generate $\mathcal{B}$. Note that the number of atoms is at most $2^{\mathrm{compl}(\mathcal{B})}$.

Finally define the join $\mathcal{B} \vee \mathcal{B}'$ to be the least common refinement of $\mathcal{B}$ and $\mathcal{B}'$, that is, the sigma-algebra whose atoms are given by the intersections $A \cap A'$ where $A$ is an atom of $\mathcal{B}$ and $A'$ is an atom of $\mathcal{B}'$.

If we are working on $X \times Y$ with $\mathcal{B}_1$ a σ-algebra on $X$ and $\mathcal{B}_2$ a σ-algebra on $Y$, then $\mathcal{B}_1 \vee \mathcal{B}_2 = \{B_1 \times B_2 \,;\, B_1 \in \mathcal{B}_1, B_2 \in \mathcal{B}_2\}$ is a factor on $X \times Y$. We may also regard $\mathcal{B}_1, \mathcal{B}_2$ as factors on $X \times Y$ in the trivial way, and then $\mathcal{B}_1 \vee \mathcal{B}_2 = \{B_1 \cap B_2 \,;\, B_1 \in \mathcal{B}_1, B_2 \in \mathcal{B}_2\}$.

Definition 1.1.25 (Conditional Expectation).
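For a function $f : X \to \mathbb{C}$ we define the conditional expectation to be the function $\mathbb{E}(f|\mathcal{B}) : X \to \mathbb{C}$ given by

$$\mathbb{E}(f|\mathcal{B})(x) := \frac{1}{|\mathcal{B}(x)|} \sum_{y\in\mathcal{B}(x)} f(y),$$

where $\mathcal{B}(x)$ is the atom containing $x$.

We see that $\mathbb{E}(f|\mathcal{B})$ is constant on each atom of $\mathcal{B}$. We can view $\mathbb{E}(f|\mathcal{B})$ as the version of $f$ that can be realized by $\mathcal{B}$, in other words the information about $f$ that is captured by $\mathcal{B}$. Indeed, the conditional expectation $f \mapsto \mathbb{E}(f|\mathcal{B})$ is the orthogonal projection of $L^2(X)$ onto $L^2(\mathcal{B})$, where $L^2(\mathcal{B})$ is the space of functions in $L^2(X)$ which are $\mathcal{B}$-measurable. This follows from the identity

$$\langle f - \mathbb{E}(f|\mathcal{B}), \mathbb{E}(f|\mathcal{B})\rangle = 0.$$

Here $\langle f, g\rangle = \mathbb{E}_{x\in X} f(x) g(x)$. Hence $f - \mathbb{E}(f|\mathcal{B})$ is orthogonal to all structure given by $\mathcal{B}$. Similarly, if $\mathcal{B} \subseteq \mathcal{B}'$ then $\mathbb{E}(f|\mathcal{B})$ is the orthogonal projection of $\mathbb{E}(f|\mathcal{B}')$ onto $L^2(\mathcal{B})$. Indeed, we have

$$\langle \mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B}), \mathbb{E}(f|\mathcal{B})\rangle = 0.$$

Next, we define the term energy, from which the name "energy increment method" (or "energy boosting argument" in computer science) comes.

Definition 1.1.26 (Energy). Let $f : X \to \mathbb{C}$ and let $\mathcal{B}$ be a factor. The energy of $f$ with respect to $\mathcal{B}$ is the squared $L^2$-norm $\|\mathbb{E}(f|\mathcal{B})\|_2^2$.

Energy increment method: Decompose $f : X \to \mathbb{C}$ as

$$f = f_1 + f_2 + f_3, \tag{1.1.6}$$

where $f_1 = \mathbb{E}(f|\mathcal{B})$ is the structured part, described in terms of a sigma-algebra, and $f_2 = f - \mathbb{E}(f|\mathcal{B})$ is the pseudorandom part, which is orthogonal to the structure given by $\mathcal{B}$. We usually refer to such a decomposition as a Koopman-von Neumann decomposition. Sometimes we will also have $f_3$, an error term which is small in size or, say, in $L^2$ norm.

Example 1.1.27. Let $\mathcal{B}_{triv} = \{\emptyset, X\}$ be the trivial sigma-algebra. Then for any $x \in X$, $\mathbb{E}(f|\mathcal{B}_{triv})(x) = \frac{1}{|X|}\sum_{x\in X} f(x) =: \alpha$ is a constant function. If $f = 1_A$ for a dense, truly random set $A \subseteq X$, then we expect $\|1_A - \alpha\|_{\square^d}$ to be small, and we obtain the decomposition

$$1_A = \alpha + (1_A - \alpha).$$

Example 1.1.28. [47] Let $f : \mathbb{F}_p^n \to [-1,1]$ and $S := \mathrm{Spec}_\eta(f) := \{r : |\hat f(r)| \ge \eta\}$. Hence $|S| \le \eta^{-2}$ (by the Plancherel theorem). Let $H = S^\perp$ be the annihilator of $S$ and $\mu_H := 1_H/\mathbb{E} 1_H$ be the Haar measure on $H$. Let $\mathcal{B}$ be the factor generated by the linear functions $r(x) = r^T x$, $r \in S$.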
We calculate

$$f_1(x) := f * \mu_H(x) = \frac{1}{|H|} \sum_{h\in H+x} f(h) = \mathbb{E}(f|\mathcal{B})(x),$$
$$f_2 := f - f * \mu_H, \qquad |\hat f_2(r)| = |\hat f(r)|\,|1 - \hat\mu_H(r)| \le 2\eta.$$

The last inequality follows from the fact that $\hat\mu_H \in [0,1]$ and $r \in S \Rightarrow \hat\mu_H(r) = 1$.

Suppose we have a decomposition as in (1.1.6). We then plug it into an average like (1.1.2) to prove that the average is bounded below by a positive constant. With such a decomposition, proving the estimate for general $f$ is essentially reduced to proving it for $f_1$. Such a statement for $f_1$ is sometimes called a counting lemma. In some contexts, like the Green-Tao Theorem, the counting lemma is just the ordinary Szemerédi theorem. The method that gets rid of $f_2, f_3$ and allows us to work only with $f_1$ is called a transference principle. The term $f_2$ behaves randomly and is expected to cancel out in the average (1.1.2), so that any term involving $f_2$ gives only a small error (this is called a generalized von Neumann theorem). The remaining terms involving $f_3$ are also expected to contribute only a small error; this can be trickier to show.

Dichotomy for energy increment: If $\|f - \mathbb{E}(f|\mathcal{B})\| > \delta$ is large, then $f$ correlates with some structure not yet captured by $\mathcal{B}$, and we can find a finer factor $\mathcal{B}'$ which incorporates this structure, with increased energy

$$\|\mathbb{E}(f|\mathcal{B}')\|_2^2 \ge \|\mathbb{E}(f|\mathcal{B})\|_2^2 + c(\delta). \tag{1.1.7}$$

Here $c(\delta)$ is a fixed constant. Since the energy cannot exceed one, this process must stop and we must arrive at a factor $\mathcal{B}$ such that $\|f - \mathbb{E}(f|\mathcal{B})\|$ is small.

1.2 Green-Tao's Theorem

The motivation for Green-Tao's theorem is that the primes should behave randomly (this is basically still open, e.g. the prime tuples conjecture), hence they should contain many arithmetic progressions, like a random set.

1.2.1 Green-Tao's Theorem

Theorem 1.2.1.
[52] Let $f : \mathbb{Z}_N \to \mathbb{R}_{\ge 0}$ with $f(x) \le \nu(x)$ for some pseudorandom measure $\nu$ (satisfying some pseudorandomness conditions with parameter $M$, say) and $\mathbb{E}_{x\in\mathbb{Z}_N} f(x) \ge \delta$. Then

$$\mathbb{E}_{x,y\in\mathbb{Z}_N} f(x) f(x+y) \cdots f(x+(k-1)y) \ge c(k,\delta) - o_{k,\delta,M}(1),$$

where $c(k,\delta)$ is the same positive constant as in Szemerédi's Theorem (Theorem 1.1.2).

Note that if we could prove this for fixed $k, y$ then the $k$-prime tuple conjecture would follow. We show below (as in the case of Szemerédi's theorem) that this theorem implies the following.

Theorem 1.2.2. If $A \subseteq \mathcal{P}$ has positive relative upper density, i.e.

$$\limsup_{N\to\infty} \frac{|A \cap \mathcal{P}_N|}{|\mathcal{P}_N|} > 0,$$

then $A$ contains infinitely many $k$-term arithmetic progressions.

Remark 1.2.3. A way to think of Theorem 1.2.1 in its relation to Theorem 1.2.2 is that we take $f$ to be (the characteristic function of) a dense subset of the primes and $\nu$ to be (a normalized function supported on) a set of almost primes in which the primes have positive density. If the support of $\nu$ is sparse then $\nu$ will be unbounded. The point is that this set of almost primes has some pseudorandomness property, which says something like "$\nu$ and 1 behave similarly", that will allow us to prove the theorem.

Theorem 1.2.1 ⇒ Theorem 1.2.2. Suppose $A \subseteq \mathcal{P}$ has positive upper density $\alpha$. Then there is some¹⁴ $b$ depending on $N$ such that $\frac{1}{N}\sum_{n\le N} 1_A(n)\Lambda_b(n) > c\alpha$.

Let $[M, \delta N]$ be in the support of the Green-Tao measure $\nu$ (see Definition 2.2.3). Consider $\delta_k = 2^{-k}$; then we have a partition

$$[M, \delta N] = [M, \delta_k\delta N] \cup \bigcup_{m=0}^{k-1} [2^m\delta_k\delta N,\, 2^{m+1}\delta_k\delta N],$$

so that there is some $j$ such that $\frac{1}{N}\sum_{n\in[2^j\delta_k\delta N,\, 2^{j+1}\delta_k\delta N]} 1_A(n)\Lambda_b(n) > c_{\alpha,k}$. Define

$$f(n) = \begin{cases} c_k\, 1_A(n)\Lambda_b(n) & \text{if } 2^j\delta_k\delta N \le n \le 2^{j+1}\delta_k\delta N, \\ 0 & \text{otherwise.} \end{cases}$$

Then there is a constant $c_{\alpha,k} > 0$ such that

$$\mathbb{E}_{n\in[1,\, 2^{j+1}\delta_k\delta N]} f(n) > c_{\alpha,k}$$

for some $j = j(N)$ and some arbitrarily large $N$. We can verify that $f(n) \le \nu(n)$ (see Chapter 2 for the definition of $\nu$ and the verification).
Let $B = \{x \in [1, 2^{j+1}\delta_k\delta N] : f(x) \ge c_{\alpha,k}/2\}$; then $|B| \ge \frac{c'_{\alpha,k}}{2} N$, so by Theorem 1.2.1, $B$ contains $c_2(\alpha,k)N^2$ arithmetic progressions $Q = \{x, x+y, \dots, x+(k-1)y\}$ such that $f(x)\cdots f(x+(k-1)y) \ge (c'_{\alpha,k}/2)^k$, and so

$$\frac{1}{N^2}\sum_{x,y\in\mathbb{Z}_N} f(x)\cdots f(x+(k-1)y) \ge \frac{1}{N^2}\sum_{x\in A,\, y\in\mathbb{Z}_N} f(x)\cdots f(x+(k-1)y) \ge c_2(\alpha,k)\left(\frac{c'_{\alpha,k}}{2}\right)^k,$$

which is greater than 0. The contribution of trivial arithmetic progressions (i.e. $y = 0$) is $O(\log^k N / N) = o(1)$. Also, if $x, x+y, \dots, x+(k-1)y \in [2^j\delta_k\delta N, 2^{j+1}\delta_k\delta N] \subseteq \mathbb{Z}_N$ is an arithmetic progression in $\mathbb{Z}_N$, then it is a genuine arithmetic progression in $\mathbb{Z}$ ($x_j + x_{j+2} - 2x_{j+1} \equiv 0 \pmod N \Rightarrow x_j + x_{j+2} - 2x_{j+1} = 0$ for all $j$).

¹⁴ See the definition of $\Lambda_b$ in Chapter 2; for now just think of it as the von Mangoldt function on primes.

In the original proof of Green-Tao's Theorem [51], they decompose a function $f$ majorized by $\nu$ using the dual function (also referred to as a generalized character) $\mathcal{D}f$ defined by

$$\langle f, \mathcal{D}f\rangle = \|f\|_{U^d}^{2^d}. \tag{1.2.1}$$

For example, consider $\|\cdot\|_{\square^3}$ (we consider the box norm instead; the Gowers norm on $\mathbb{Z}_N$ can be put in a similar form by a change of variables). We have

$$\mathcal{D}f(x,y,z) = \mathbb{E}_{x',y',z'}\, f(x,y,z')f(x,y',z)f(x',y,z)f(x',y',z)f(x',y,z')f(x,y',z')f(x',y',z').$$

Indeed, a function with large $U^k$ norm correlates with $\mathcal{D}f$ just by definition. The obvious structure of $\mathcal{D}f$ is that it is composed of functions of lower complexity¹⁵ of the form $F(x,y)G(y,z)H(z,x)$, and we will use this obvious structure to decompose functions in the regularity lemma using the machinery of sigma-algebras. It is much harder to see what these objects really are; they are actually given by nilsequences. We don't need this in soft inverse arguments as in [52], but we need it if we want to give asymptotics for linear equations in primes [54].

A function $g$ with $\|g\|^*_{U^d} = O(1)$ is called an anti-uniform function, which will be used to show a uniform distribution property: if $f$ is uniform then $\langle f, g\rangle$ will be small.
This can be regarded as a generalization of uniformity in Roth's theorem, where $g$ is taken to be a linear exponential. Hence anti-uniform functions can be used to measure the degree of structure in a function. In [52], they prove that these dual functions satisfy the dual function estimate, for $f_i$ bounded by $\nu$,

$$\|P(\mathcal{D}f_1, \dots, \mathcal{D}f_K)\|^*_{U^k} = O_{K,d,P}(1), \tag{1.2.2}$$

where $P$ is a polynomial in $K$ variables of degree $d$; $K, d$ can be arbitrarily large. The correlation condition is applied here in place of the infinite linear forms condition, which was not available at that time. See section 3.3.

This estimate allows one to prove uniform distribution of $\nu$ with respect to these dual functions (Prop. 6.2 in [52]). The duals of the Gowers uniformity norms are not algebra norms in general, but they are majorized by the BAC-norm (defined in [37]), which is a norm satisfying some algebraic properties used in proving the transference principle in [37], and also in Chapter 3 of this thesis to prove a transference principle. For example, the dual norm $\|\cdot\|^*_{U^2}$ is not an algebra norm but is majorized by an algebra norm¹⁶. Indeed, when $f$ is a function on $\mathbb{Z}_N$, we have

$$\|f\|^*_{U^2} = \|\hat f\|_{4/3} \le \|\hat f\|_1.$$

¹⁵ Here meaning that they depend on fewer variables, or are constructed from functions which depend on fewer variables. This relates to the notion of relatively independent joining, which is used in an explicit construction of Host-Kra factors, the characteristic factors for multiple recurrence in (1.1.1). See e.g. Appendix B of [8] or section 7 in [71] for expositions.
¹⁶ Meaning $\|fg\| \le \|f\|\,\|g\|$.

Here $\|\hat f\|_1$ is the Wiener norm, which is an algebra norm by Young's inequality. However, it would not be easy to show that $\|\widehat{\mathcal{D}f}\|_1 = O(1)$ when $0 \le f \le \nu$. A more general and systematic study of dual norms and dual functions in this direction is undertaken in [66].

For the next step, Green-Tao employed the notions of factor and conditional expectation from ergodic theory.
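The factor and conditional expectation machinery referred to here (Definitions 1.1.24 to 1.1.26) can be made concrete on a finite set. The following Python sketch is an illustration only, under the assumption that a factor is stored as a list of disjoint atoms; the helper names `cond_exp`, `energy`, `join` and `inner`, and the toy set $A$, are made up for the example and do not come from [51] or [52]. It computes $\mathbb{E}(f|\mathcal{B})$, checks the orthogonality identity, and checks that passing to a finer factor increases the energy by exactly $\|\mathbb{E}(f|\mathcal{B}') - \mathbb{E}(f|\mathcal{B})\|_2^2$.

```python
from fractions import Fraction

def cond_exp(f, atoms):
    """E(f|B): replace f by its average on each atom of the factor B."""
    out = {}
    for atom in atoms:
        avg = sum(Fraction(f[x]) for x in atom) / len(atom)
        for x in atom:
            out[x] = avg
    return out

def inner(g, h):
    """<g, h> = E_x g(x) h(x) for real-valued functions on the same finite set."""
    return sum(g[x] * h[x] for x in g) / len(g)

def energy(f, atoms):
    """Energy of f with respect to B, i.e. ||E(f|B)||_2^2."""
    g = cond_exp(f, atoms)
    return inner(g, g)

def join(atoms1, atoms2):
    """Atoms of the join B v B': all nonempty pairwise intersections."""
    return [a & b for a in atoms1 for b in atoms2 if a & b]

# Toy data (hypothetical): X = {0,...,7}, f = indicator of A = {0, 1, 2, 5}.
X = range(8)
A = {0, 1, 2, 5}
f = {x: Fraction(1 if x in A else 0) for x in X}

coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]          # a factor B
other = [{0, 1, 4, 5}, {2, 3, 6, 7}]           # a second factor
fine = join(coarse, other)                     # B' = B v other, finer than B

f_coarse = cond_exp(f, coarse)
f_fine = cond_exp(f, fine)
residual = {x: f[x] - f_coarse[x] for x in X}          # f - E(f|B)
increment = {x: f_fine[x] - f_coarse[x] for x in X}    # E(f|B') - E(f|B)
```

Here `inner(residual, f_coarse)` is exactly 0, and `energy(f, fine) - energy(f, coarse)` equals `inner(increment, increment)`, which is the Pythagorean identity behind the energy increment (1.1.7). Exact rational arithmetic is used so that these identities hold without rounding error.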
Green and Tao use the dual functions to define sigma-algebras, and use the energy increment (if there is a correlation, find dual functions $\mathcal{D}F$ which correlate with $f$ and use them to refine the factor $\mathcal{B}$ with increased energy) to prove estimates like (1.1.7) or (1.1.6) for $0 \le f \le \nu$. The sets generated by these σ-algebras of $\mathcal{D}F$ are referred to in Green-Tao's paper [51] as generalized Bohr sets.

Now we state a transference principle, which was later simplified in [37] and is independently known in the language of computer science as the dense model theorem [87]. The following version is taken from [88].

Theorem 1.2.4 (Transference Principle). Let $\nu$ be a pseudorandom measure and $0 \le f \le \nu$. Suppose $\|\nu - 1\|_{U^k} \le \varepsilon' := \exp(-(1/\varepsilon)^{O(1)})$. Then there exist $f_1, f_2$ with $f = f_1 + f_2$, $0 \le f_1 \le 2$, $\|f_2\|_{U^k} \le \varepsilon$. Furthermore,

$$\mathbb{E}_{x,r} f(x)f(x+r)\cdots f(x+(k-1)r) = \mathbb{E}_{x,r} f_1(x)f_1(x+r)\cdots f_1(x+(k-1)r) + O(\|f_2\|_{U^k})$$

with $O(\|f_2\|_{U^k}) = O(\varepsilon)$.

We will prove a variant of this transference principle in Chapter 3, where we adapt the method in [37]. The quantitative bound $\exp(-(1/\varepsilon)^C)$ comes from the following fact¹⁷ (the explicit bound is not important unless we are trying to extract a quantitative bound in the application):

Fact 1.2.5. (e.g. [16]) There is a polynomial $P(x) = p_d x^d + p_{d-1}x^{d-1} + \cdots + p_0$ such that $|P(x) - x_+| \le \varepsilon/8$ for all $x \in [-2/\varepsilon, 2/\varepsilon]$ and such that

$$|p_d|(2/\varepsilon)^d + \cdots + |p_1|(2/\varepsilon) + |p_0| \sim \exp((1/\varepsilon)^C).$$

¹⁷ If we don't need an explicit quantitative bound in the transference principle then we don't need this fact in the proof of the transference principle. See e.g. Appendix B of [] or [] for expositions.

Now we demonstrate how to apply the transference principle to give a quantitative bound for Green-Tao's Theorem, assuming Szemerédi's Theorem as a black box.

Claim 1.2.6. $\|\nu - 1\|_{U^k} = O(1/\omega) + O(\log\omega/\sqrt{\log R})$.

Proof of Claim. Recall that by the linear forms condition (see section 2.2.2),

$$\|\nu - 1\|_{U^k}^{2^k} = \sum_{A\subseteq\{0,1\}^k} (-1)^{|A|}(1 + o(1)) = o(1).$$

Hence the error term $o(1)$ comes from the error in the following linear forms estimate:

$$\left(\frac{C_\chi\,\phi(W)}{W\log R}\right)^{m} \mathbb{E}_{x\in B}\, \Lambda_{\chi,R}(\theta_1(x))^2 \cdots \Lambda_{\chi,R}(\theta_M(x))^2 = 1 + o(1).$$

Using the estimate obtained in section 2.2, the $o(1)$ term is given by

$$O\!\left(\frac{1}{\omega}\right) + O\!\left(\frac{\log\omega}{\sqrt{\log R}}\right).$$

Here $R$ is a small power of $N$.

Choose $\omega, N$ large enough so that $\|\nu - 1\|_{U^k} \le \varepsilon' = \exp(-(1/\varepsilon)^{O(1)})$. Choose the $\varepsilon$ in the error term $O(\varepsilon)$ (using the constant in Remark 1.1.7) so that

$$\varepsilon < \exp\bigl(-C\exp((1/\delta)^{C_k})\bigr) < C_2(k,\delta).$$

Hence $(1/\varepsilon)^C \ge \exp(\exp((1/\delta)^{C_k}))$. As we chose $\omega \ge \exp((1/\varepsilon)^C)$ and $\log N \ge (\log\omega)\exp((1/\varepsilon)^C)$, we choose $N \ge \exp(\exp(\exp((1/\varepsilon)^C)))$, hence we have the bound

$$N \ge \exp(\exp(\exp(\exp(\exp((1/\delta)^{C_k}))))).$$

Note that there are other variants of the transference principle. A natural question to ask is which properties of the weight $\nu$ are required to obtain a transference principle, or what the natural conditions on $\nu$ would be, as investigated¹⁸ in [16] in the context of Green-Tao's Theorem. This can open up wider applications of transference principles. In the case of 3-term arithmetic progressions in the primes, this question was first investigated by Green [45], who approximates $f$ by a bounded function $g$. In this case, $\nu$ is required to satisfy a restriction estimate and a Fourier decay property. A variant of the transference principle in this case is obtained in [63], where they approximate $f$ by a function $g$ which is no longer assumed to be bounded but has bounded $L^2$-norm. The price one pays is that $\nu$ needs to satisfy some further properties, such as a correlation estimate and an estimate on its $L^2$-norm. This has applications in obtaining a better quantitative bound for Roth's theorem in the primes. Naslund [78] obtains a transference principle for $l^k$-bounded functions $g$ with somewhat stronger assumptions on $\nu$ than [63]. See e.g. [80] for an exposition.

¹⁸ With correlation conditions in the definition of pseudorandom sets, we expect a relative Szemerédi Theorem to hold for pseudorandom sets of density $N^{-o(1)}$.
In [16], they remove the correlation conditions and obtain the results for pseudorandom subsets of density $N^{-c_k}$.

1.3 Szemerédi's Regularity Lemma

The idea of regularity in a graph is equidistribution of the edge density. The regularity lemma is a kind of structure theorem. It says that, up to a small error, we can describe any dense graph by some structure (a partition of the vertex set into parts of about the same size), and apart from that information the graph behaves randomly. For basic properties and applications of this lemma, see the survey [69].

1.3.1 Graph Regularity

Definition 1.3.1. A bipartite graph $G(A,B)$ is $\varepsilon$-regular if for all $A' \subseteq A$, $B' \subseteq B$ with $|A'| \ge \varepsilon|A|$, $|B'| \ge \varepsilon|B|$,

$$\left| \frac{|E(A',B')|}{|A'||B'|} - \frac{|E(A,B)|}{|A||B|} \right| \le \varepsilon. \tag{1.3.1}$$

Here we don't assume the regularity condition for a pair involving a small set. Equation (1.3.1) can be rewritten as

$$|E(A,B) \cap (A'\times B')| = \frac{|A'\times B'|}{|A\times B|}\,|E(A,B)| + O(\varepsilon|A\times B|). \tag{1.3.2}$$

This statement would be trivial for small $A'$ or $B'$. This form is used in the functional version [103].

Theorem 1.3.2 (Basic properties of regular pairs; most degrees into a large set are large [69]). Let $(A,B)$ be an $\varepsilon$-regular pair with density $\delta$ and let $Y \subseteq B$ with $|Y| \ge \varepsilon|B|$. Then

$$\bigl|\{x \in A : \deg(x,Y) < (\delta-\varepsilon)|Y|\}\bigr| < \varepsilon|A|.$$

Proof. Let $X = \{x \in A : \deg(x,Y) < (\delta-\varepsilon)|Y|\}$; we expect $X$ not to be too large. Trivially, $|E(X,Y)| < (\delta-\varepsilon)|X||Y|$. If $|X| \ge \varepsilon|A|$ then this contradicts regularity.

Remark 1.3.3. Let us mention briefly a relation between the notion of graph regularity and the box norm. Suppose $\|1_G - \mathbb{E}(1_G|\mathcal{B}_X \vee \mathcal{B}_Y)\|_{\square^2} \le \eta$ with $\mathcal{B}_X = X_1 \cup \cdots \cup X_m$, $\mathcal{B}_Y = Y_1 \cup \cdots \cup Y_n$. For $(x,y) \in X_i \times Y_j$ we have $\mathbb{E}(1_G|\mathcal{B}_X \vee \mathcal{B}_Y)(x,y) = \frac{|G\cap(X_i\times Y_j)|}{|X_i||Y_j|} =: \delta_{ij}$. By the definition of the box norm, for any functions $U(x), V(y)$ bounded by 1 we have

$$\mathbb{E}_{x,y}\bigl(1_G(x,y) - \mathbb{E}(1_G|\mathcal{B}_X \vee \mathcal{B}_Y)(x,y)\bigr)U(x)V(y) \le \eta.$$

Write $\alpha_i := |X_i|/|X|$, $\beta_j := |Y_j|/|Y|$, let $U_i := \{U(x) : x \in X_i\}$, $V_j := \{V(y) : y \in Y_j\}$, and $\gamma_{ij} := \bigl||G\cap(U_i\times V_j)| - \delta_{ij}|U_i||V_j|\bigr| \le \eta$.
This implies, for example,

$$\sum_{i,j\,:\,\gamma_{ij} > \sqrt{\eta}} \alpha_i\beta_j \le \sqrt{\eta}.$$

Hence we have many $X_i, Y_j$ for which $G|_{X_i\times Y_j}$ is $\eta$-regular.

Theorem 1.3.4 (Graph regularity lemma [99]). For every $\varepsilon > 0$ there is $K(\varepsilon)$, independent of $n$, with the following property: any graph $G_n$ with $n \ge n_0(\varepsilon)$ vertices can be partitioned into vertex classes $V_0, V_1, \dots, V_K$ such that $|V_0| \le n/K$, $|V_i| = |V_j|$ for $1 \le i,j \le K$, and all but $\varepsilon K^2$ pairs $(i,j)$ give an $\varepsilon$-regular $G(V_i,V_j)$.

The proof proceeds via the energy increment method on a partition of the vertices, see e.g. Theorem 9.4.1 in [1]. Inspecting the proof, we would need $n_0(\varepsilon)$ to be a tower of height $\varepsilon^{-5}$ to obtain an $\varepsilon$-regular partition with $K = \varepsilon^{-1}$. It was shown by Gowers [40] that this tower type bound is necessary. Consequently, applications of the regularity method come with terrible bounds. If we want a better quantitative bound, we need to avoid applying the regularity lemma.

Now we illustrate a well known application of the regularity lemma.

Theorem 1.3.5 (Triangle removal lemma; Ruzsa-Szemerédi [92]). For every $c > 0$ there is $\varepsilon(c) > 0$ with the following property: if $G_n$ is the union of $cn^2$ edge-disjoint triangles then it must actually contain at least $\varepsilon(c)n^3$ triangles, where $\varepsilon \to 0$ as $c \to 0$.

Remark 1.3.6. Trivially, the number of triangles is at least $cn^2$, but this theorem says that the graph contains many more, especially when $n$ is large; in fact, more by an order of magnitude.

Thm. 1.3.4 ⇒ Thm. 1.3.5. Take a random 3-partition of the vertices to obtain vertex sets $W_1, W_2, W_3$. By losing a positive fraction of the $cn^2$ triangles, we can assume that $G_n$ is tripartite. Choose $\varepsilon$ and apply the regularity lemma (Theorem 1.3.4) to our graph; we obtain a regular partition with $K$ classes of vertices.
– Delete the edges between non-regular pairs; the number of edges deleted is less than $\varepsilon K^2 (n/K)^2 = \varepsilon n^2$.
– We will apply the graph regularity lemma to pairs with density at least $\delta$.
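Delete the edges between pairs with density $\le \delta$; then we have deleted fewer than $K^2\delta(n/K)^2 = \delta n^2$ edges.

Choose $\varepsilon, \delta$ much smaller than $c$, hence we still have $c'n^2$ triangles in our graph. We obtain a simplified graph with the following property: if there is an edge between $V_i$ and $V_j$ then $G(V_i,V_j)$ is $\varepsilon$-regular and $d(V_i,V_j) \ge \delta$. Now we claim that the number of triangles is at least $c(\varepsilon)n^3$.

Consider vertex sets $V_1 \subseteq W_1$, $V_2 \subseteq W_2$, $V_3 \subseteq W_3$ where each pair $(V_i,V_j)$ is regular with density $\ge \delta$. Applying Theorem 1.3.2 to $V_2$, at least $(1-2\varepsilon)|V_2|$ vertices have degree $\ge (\delta-\varepsilon)|V_1|$ into $V_1$ and degree $\ge (\delta-\varepsilon)|V_3|$ into $V_3$. Pick one such $v$ and assume $\delta - \varepsilon \ge \varepsilon$. From the definition of regular pairs one has

$$|E(N(v)|_{V_1}, N(v)|_{V_3})| \ge (\delta-\varepsilon)\,|N(v)|_{V_1}|\,|N(v)|_{V_3}| \ge (\delta-\varepsilon)\bigl((\delta-\varepsilon)n/K\bigr)^2 \ge \varepsilon^3 (n/K)^2.$$

This is a lower bound for the number of triangles containing $v$. Since the number of such $v$ is at least $(1-2\varepsilon)|V_2|$, choosing $\delta = 2\varepsilon$ we see that the number of triangles is at least $(1-2\varepsilon)\varepsilon^3 n^3/K(\varepsilon)^3$.

Finally we state a functional version of the graph regularity lemma [103]. We will prove an analogue of this lemma in the weighted hypergraph setting in the main text.

Theorem 1.3.7 (Functional graph regularity lemma [103]). Suppose $f : V_1 \times V_2 \to [0,1]$ is measurable with respect to $\mathcal{B}_{1,max}, \mathcal{B}_{2,max}$ and $\varepsilon > 0$. Let $F = F_\varepsilon : \mathbb{N} \to \mathbb{N}$ be an arbitrary increasing function. Then there exist $M = O_{F,\varepsilon}(1)$ and sigma-algebras $\mathcal{B}_i \subseteq \mathcal{B}'_i \subseteq \mathcal{B}_{i,max}$ on $V_i$ which give the following decomposition of $f$:
1. $\mathbb{E}(f|\mathcal{B}_1 \vee \mathcal{B}_2)$, with $\mathrm{compl}(\mathcal{B}_1), \mathrm{compl}(\mathcal{B}_2) \le M$.
2. $\|\mathbb{E}(f|\mathcal{B}'_1 \vee \mathcal{B}'_2) - \mathbb{E}(f|\mathcal{B}_1 \vee \mathcal{B}_2)\|_2 \le \varepsilon$.
3. $\|f - \mathbb{E}(f|\mathcal{B}'_1 \vee \mathcal{B}'_2)\|_{\square^2} \le 1/F(M)$.

The functional version (Theorem 1.3.7) implies the graph version (Theorem 1.3.4): Write $E$ for the set of edges between $V_1$ and $V_2$. Apply Theorem 1.3.7 with $\varepsilon$ replaced by $\varepsilon^{3/2}$. By equation (1.1.5), the last condition in Theorem 1.3.7 may be translated to

$$\bigl|\mathbb{E}\bigl[(f - \mathbb{E}(f|\mathcal{B}'_1 \vee \mathcal{B}'_2))1_{A_1\times A_2}\bigr]\bigr| \le 1/F(M) \qquad \forall A_1 \in \mathcal{B}'_1,\ A_2 \in \mathcal{B}'_2.$$

Let $J = \lceil 2^M/\varepsilon \rceil$, which is a large number, and assume $|V_1|, |V_2| > J$, where each of $\mathcal{B}_1, \mathcal{B}_2$ contains at most $2^M$ atoms.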
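Subdivide each of these atoms into sets of size $\left\lfloor \frac{|V_i|}{(1+O(\varepsilon))J} \right\rfloor$, with possibly a remaining set of size $O(|V_i|/J)$ (error term) in each atom. Collecting all the error terms into a set $V_{i,0}$, we obtain a decomposition

$$V_i = V_{i,0} \cup V_{i,1} \cup \cdots \cup V_{i,J}$$

with $|V_{i,0}| = O(\varepsilon|V_i|) + O(2^M|V_i|/J) = O(\varepsilon|V_i|)$. Our goal is to show that $(V_{1,j}, V_{2,k})$ is a regular pair for almost all $j, k \ge 1$. Consider the induced bipartite graph $(V_{1,j_1}, V_{2,j_2}, E \cap (V_{1,j_1}\times V_{2,j_2}))$. Let $A_1 \in \mathcal{B}_1$, $A_2 \in \mathcal{B}_2$ be atoms. We want to show

$$|E \cap (A_1\times A_2)| = \frac{|E\cap(V_{1,j_1}\times V_{2,j_2})|}{|V_{1,j_1}\times V_{2,j_2}|}\,|A_1\times A_2| + O(\varepsilon|V_{1,j_1}||V_{2,j_2}|). \tag{1.3.3}$$

For this, it suffices to find $d$ independent of $A_1, A_2$ such that

$$|E\cap(A_1\times A_2)| = d\,|A_1\times A_2| + O(\varepsilon|V_{1,j_1}||V_{2,j_2}|). \tag{1.3.4}$$

(To see this, take $A_1 = V_{1,j_1}$, $A_2 = V_{2,j_2}$ in (1.3.4) and substitute in (1.3.3).) Now $A_1 \times A_2$ is contained in an atom of $\mathcal{B}_1 \vee \mathcal{B}_2$, so one may take $d = \mathbb{E}(1_E|\mathcal{B}_1\vee\mathcal{B}_2)$. This is equivalent to

$$\mathbb{E}_{(x,y)\in V_1\times V_2}\bigl(1_E - \mathbb{E}(1_E|\mathcal{B}_1\vee\mathcal{B}_2)\bigr)1_{A_1\times A_2} = O\!\left(\frac{\varepsilon|V_{1,j_1}||V_{2,j_2}|}{|V_1||V_2|}\right) = O(\varepsilon/J^2).$$

Now by our assumption (the conclusion of the functional graph regularity lemma),

$$\mathbb{E}\bigl[(1_E - \mathbb{E}(1_E|\mathcal{B}'_1\vee\mathcal{B}'_2))1_{A_1\times A_2}\bigr] = O(1/F(M)).$$

Take $F(M) := 2^{2M}/\varepsilon^3$; since $J = O_\varepsilon(1)$, one has $O(1/F(M)) = O(\varepsilon^3/2^{2M}) = O(\varepsilon/J^2) = O_\varepsilon(1)$. Hence it suffices to show

$$\mathbb{E}\Bigl(\bigl|\mathbb{E}(1_E|\mathcal{B}'_1\vee\mathcal{B}'_2) - \mathbb{E}(1_E|\mathcal{B}_1\vee\mathcal{B}_2)\bigr|\,1_{V_{1,j_1}\times V_{2,j_2}}\Bigr) = O(\varepsilon/J^2). \tag{1.3.5}$$

By the Cauchy-Schwarz inequality, it suffices to show

$$\mathbb{E}\Bigl(\bigl|\mathbb{E}(1_E|\mathcal{B}'_1\vee\mathcal{B}'_2) - \mathbb{E}(1_E|\mathcal{B}_1\vee\mathcal{B}_2)\bigr|^2\,1_{V_{1,j_1}\times V_{2,j_2}}\Bigr) = O(\varepsilon^2/J^2). \tag{1.3.6}$$

We have from our assumption that

$$\mathbb{E}\bigl|\mathbb{E}(1_E|\mathcal{B}'_1\vee\mathcal{B}'_2) - \mathbb{E}(1_E|\mathcal{B}_1\vee\mathcal{B}_2)\bigr|^2 = O(\varepsilon^3). \tag{1.3.7}$$

So $(\varepsilon^2/J^2)\,|\{(j_1,j_2) : \text{(1.3.6) fails}\}| \le \varepsilon^3$. Hence (1.3.6) fails for at most $O(\varepsilon J^2)$ pairs $(j_1,j_2)$.

We can use the functional graph regularity lemma to prove the functional triangle removal lemma stated below. This theorem says that we can clean up a graph with a small number of triangles, in a low complexity manner, to make it triangle free.

Theorem 1.3.8 (Triangle Removal Lemma [100]).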
Let $(X,\mu_X), (Y,\mu_Y), (Z,\mu_Z)$ be probability spaces. Suppose $f_1 : X\times Y \to [0,1]$, $f_2 : Y\times Z \to [0,1]$, $f_3 : X\times Z \to [0,1]$ are measurable functions. Let $\varepsilon > 0$. Suppose

$$\Lambda_3(f_1,f_2,f_3) := \int_X\int_Y\int_Z f_1(x,y)f_2(y,z)f_3(x,z)\,d\mu_X d\mu_Y d\mu_Z \le \varepsilon.$$

Then we can find measurable functions $\tilde f_1 : X\times Y \to [0,1]$, $\tilde f_2 : Y\times Z \to [0,1]$, $\tilde f_3 : X\times Z \to [0,1]$ such that $\|f_i - \tilde f_i\|_1 = o_{\varepsilon\to 0}(1)$ for $i = 1,2,3$ and such that $\tilde f_1\tilde f_2\tilde f_3$ vanishes entirely; in particular $\Lambda_3(\tilde f_1,\tilde f_2,\tilde f_3) = 0$.

Finally, let us remark that there is also an arithmetic regularity lemma for functions $f : [N] \to [0,1]$, proved in [50] in terms of nilsequences. An application of this lemma in [50] is a proof of the Bergelson-Host-Kra conjecture [7]: if $A \subseteq [N]$ has density $\alpha$ and $\varepsilon > 0$, then there exist $\gg_{\alpha,\varepsilon} N$ choices of $h$ for which $A$ contains at least $(\alpha^4 - \varepsilon)N$ 4-APs. The case of 3-APs is proved by Green [49], and it is shown by Ruzsa in an appendix of [7] that the statement is false for 5-APs.

1.3.2 Hypergraph Removal Lemma

Generalizing graph regularity to hypergraph regularity is not trivial. A strong version of the hypergraph regularity lemma due to V. Rödl, B. Nagle, M. Schacht, J. Skokan [82, 83, 84, 85] and Gowers [35, 36] allows one to prove the hypergraph removal lemma and to deduce the multidimensional Szemerédi theorem from it. The version we will use later is a stronger functional version due to Tao [104]. It turns out that by considering a projection, we can deduce the general multidimensional Szemerédi theorem from the corner case (corresponding to a $d$-uniform hypergraph). This will not work for primes, as we don't know whether the projection of a prime point is still prime or not. Fortunately, we can use general simplices in the prime case, which allows us to apply the linear forms conditions. In graph theoretical terms, as stated in [16]¹⁹, the linear forms conditions say that our hypergraph has the asymptotically correct count for any 2-blow-up of its subgraphs.

¹⁹ Actually a special case of the linear forms conditions.

Recall that a non-degenerate corner is a configuration of the form

$$\{(x_1,\dots,x_d),\ (x_1+s, x_2,\dots,x_d),\ \dots,\ (x_1,x_2,\dots,x_d+s)\}$$

with $s \ne 0$. We state the corner theorem.

Theorem 1.3.9. If $A \subseteq \mathbb{Z}^d$ has positive upper density then $A$ contains a non-degenerate corner.

The corner theorem can be proved via the hypergraph removal lemma on a $(d+1)$-partite $d$-uniform hypergraph. This was first observed by Solymosi in the case $d = 2$ ([97]). First note that in a $(d+1)$-partite $d$-uniform hypergraph with vertex sets $X_1,\dots,X_{d+1}$, a simplex is a set of $d+1$ hyperedges of size $d$ of the form $\{(x_i)_{i\in[d+1]\setminus\{j\}}\}_{1\le j\le d+1}$.

Lemma 1.3.10 (Hypergraph Removal Lemma [36]). For any $\varepsilon > 0$, there exists $\delta = \delta(\varepsilon) > 0$, where $\delta \to 0$ as $\varepsilon \to 0$, with the following property: Let $H$ be a $(d+1)$-partite $d$-uniform hypergraph with vertex sets $X_1,\dots,X_{d+1}$, $|X_i| = N_i$, with sufficiently large $N_i$. Suppose $H$ contains $\le \delta\prod_{i=1}^{d+1} N_i$ simplices. Then for each $i \le d+1$, one can remove at most $\varepsilon\prod_{j\ne i} N_j$ hyperedges of $H$ from $\prod_{j\ne i} X_j$ in such a way that after the removals, one is left with a hypergraph which is simplex-free.

Proof of Theorem 1.3.9 via Lemma 1.3.10. Let $A \subseteq [N]^d$, $|A| \ge \alpha N^d$, and consider the corresponding hypergraph $G_A$ on $(\mathbb{Z}/N)^{d+1}$ (see section 3.1 for the construction; all weights are 1 in our case here), where we put a $d$-hyperedge $(y_{i_1},\dots,y_{i_d})$ iff the corresponding $d$ hyperplanes intersect at a point in $A$. Then we see that each simplex in $G_A$ corresponds to a corner.

A corner may degenerate to a single point of $A$, but this can happen for only $|A| = o(N^{d+1})$ of these simplices. Apply the hypergraph removal lemma with $\varepsilon = \alpha N^{-1}d^{-1} \to 0$ as $N \to \infty$. Suppose $G_A$ contains $\le \delta(\varepsilon)N^{d+1}$ corners; then for sufficiently large $N$, $\varepsilon < \alpha/(2d)$. So if $A$ does not contain a non-degenerate corner then, by the Hypergraph Removal Lemma, we would be able to remove fewer than $\alpha N^d$ $d$-hyperedges to make the hypergraph simplex-free, but this is impossible as $|A| \ge \alpha N^d$.

Remark 1.3.11.
We can ensure that the constant $s$ in the corner can be chosen to be positive by the following trick due to Ben Green [36]. If we choose a random point $(x, y)$ from $[-N, N]^2$, then since $A$ has upper density $\alpha$, we have $\mathbb{P}((x, y) \in A \cap [-N, N]^d) \geq c\alpha$ for some $c > 0$ and infinitely many $N$. If we select a (fixed) point $(a, b)$ at random, let $B = A \cap ((a, b) - A)$. We have $|A \cap ((a, b) - A)| = 1_A * 1_A(a, b)$ and [footnote 20]
\[ \frac{1}{N^2}\sum_{(a,b)} 1_A * 1_A(a, b)\,1_{[-N,N]^d} = \frac{1}{N^2}\Big(\sum_{(a,b)} 1_A(a, b)\Big)^2 1_{[-N,N]^d} \geq c^2\alpha^2. \]
Hence if we replace $A$ by $B = A \cap ((a, b) - A)$ then $B = (a, b) - B$, so $B$ is symmetric about the point $(a, b)/2$, and $B$ still has positive density for some $(a, b)$.

Corner Theorem ⇒ Multidimensional Szemerédi's Theorem [36]. Suppose $A \subseteq \mathbb{Z}^r$ has positive upper density. Consider the nontrivial case $F \subseteq \mathbb{Z}^r$, $|F| = k + 1 \geq r + 1$. By the remark above one may consider $F$ which is symmetric about some point. Choose a point $z$ such that $F - z$ has one point at the origin and $A - z$ still has positive upper density. Also we may assume that $\mathrm{span}\{F\} = \mathbb{Z}^r$: suppose $\mathrm{span}\{F\} \subseteq V$, a vector space of dimension $r - 1$. Let $e_r$ be a vector outside $V$; then $F \cup \{e_r\} \subseteq (V \cup A) \times \mathbb{Z} \subseteq \mathbb{Z}^r$, which has positive upper density. Thus we may add vectors to $F$ so that $\mathrm{span}(F) = \mathbb{Z}^r$ without affecting the question.

Let $\{e_1, \dots, e_k\}$ be the standard basis of $\mathbb{R}^k$, and define a linear map $\Phi$ that maps $\{0, e_1, \dots, e_k\}$ bijectively onto $F$. Now suppose $\mathrm{span}\{\Phi(e_1), \dots, \Phi(e_r)\} = \mathbb{Z}^r$. We can find infinitely many positive integers $M = M(N) \to \infty$ as $N \to \infty$ such that $\Phi^{-1}(A - z) \times [M - 1]^{k-r}$ has positive upper density $\eta$, for some $\eta = \eta(\alpha, F) > 0$, and is mapped into $A - z$; so we can find a large $M$ such that at least $\eta M^k$ points of $[M - 1]^k$ are mapped into $A - z$ by $\Phi$.

Now we may apply the corner theorem to conclude that $\Phi^{-1}(A - z) \times [M - 1]^{k-r}$ contains a corner $w + c\{0, e_1, e_2, \dots, e_k\}$, $c > 0$. 
So there is an affine image $z + \Phi(w + c\{0, e_1, e_2, \dots, e_k\})$ of $F$ contained in $A$.

Footnote 20: If we choose $(x, y)$, $(a, b)$ independently, we can think of this as
\[ \mathbb{P}((x, y) \in B) = \mathbb{P}((x, y) \in A)\,\mathbb{P}((x, y) - (a, b) \in -A \mid (x, y) \in A) = \mathbb{P}((x, y) \in A)\,\mathbb{P}((a, b) \in (x, y) - A \mid (x, y) \in A) \geq c^2\alpha^2. \]

Chapter 2

Goldston-Yildirim's Sieve and Almost Prime Solutions to Diophantine Equations

In this chapter we prove the main result, Theorem 2.4.2, in Section 2.4. We develop some background, motivation and necessary tools in section 2.1- Backgrounds and Some Classical Results

2.1.1 Basic Prime Number Estimates

Some results on sums over primes may be derived from the Prime Number Theorem, using partial summation to convert a sum involving an arithmetic function into an integral (see Appendix A in [77]). Given $N > 1$, the set of almost primes $\mathcal{P}_\varepsilon[N]$ is defined to be the set of positive integers up to $N$ which have only large prime factors, bigger than $N^\varepsilon$. Note that each integer in $\mathcal{P}_\varepsilon[N]$ can have at most $\lfloor 1/\varepsilon \rfloor$ prime factors.

Theorem 2.1.1 (Partial Summation Formula [77]). Let $A(x) = \sum_{1 \leq n \leq x} a_n$. Then
\[ \sum_{n=1}^{N} a_n f(n) = \int_0^N f(x)\,dA(x) = \int_{1^-}^N f(x)\,dA(x). \]

We collect some well-known facts from analytic number theory (see e.g. Chapter 2 in [77]).

– Standard bound on the number of divisors:
\[ d(n) \leq \exp\Big(\frac{\log n}{\log\log n}\Big) = O(n^\varepsilon) \quad \forall \varepsilon > 0. \tag{2.1.1} \]
– The set of primes is dense in the set of almost primes [footnote 1]:
\[ \frac{|\mathcal{P} \cap [N]|}{|\mathcal{P}_\varepsilon \cap [N]|} \gg \varepsilon. \tag{2.1.2} \]
– The Prime Number Theorem:
\[ \pi(x) = (1 + o(1))\frac{x}{\log x}. \tag{2.1.3} \]
– The Prime Number Theorem, equivalent forms:
\[ \sum_{p < x} \log p = x + o(x), \qquad \prod_{p \leq \omega} p = e^{\omega(1 + o(1))}. \tag{2.1.4} \]
–
\[ W = \prod_{p \leq \omega} p \;\Rightarrow\; W/\phi(W) \approx \log\omega. \tag{2.1.5} \]
–
\[ \sum_{p < x} \frac{1}{p} = \log\log(10 + x) + O(1), \quad x > 0. \tag{2.1.6} \]
–
\[ \sum_{p < x} \frac{\log^K p}{p} \ll_K \log^K(10 + x), \quad K > 0,\ x > 0. \tag{2.1.7} \]
– If $\Re(s) > 1$ and $s = 1 + o(1)$ then [footnote 2]
\[ \prod_p (1 - p^{-s}) = (1 + o(1))(s - 1). \tag{2.1.8} \]

2.1.2 Sieve Problems

We briefly describe what the sieve method is, for motivation only, as this section is not required in the main text. We follow closely [107] and [25] in this subsection. 
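The estimates collected in Section 2.1.1 are easy to sanity-check numerically. The following is a minimal Python sketch and is not part of the thesis: the helper names `primes_up_to`, `mertens_gap` and `w_over_phi` are hypothetical, the cutoffs are arbitrary, and the sums run over $p \leq x$ and are compared with $\log\log x$ rather than $\log\log(10 + x)$, which makes no difference for large $x$. It compares the sum in (2.1.6) with $\log\log x$, and the ratio $W/\phi(W)$ in (2.1.5) with $\log\omega$.

```python
import math

def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            # cross out the multiples of i, starting from i*i
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, is_prime in enumerate(flags) if is_prime]

def mertens_gap(x):
    """sum_{p <= x} 1/p - log log x; (2.1.6) says this stays bounded."""
    return sum(1.0 / p for p in primes_up_to(x)) - math.log(math.log(x))

def w_over_phi(omega):
    """W/phi(W) for W = prod_{p <= omega} p, i.e. prod_{p <= omega} p/(p-1)."""
    ratio = 1.0
    for p in primes_up_to(omega):
        ratio *= p / (p - 1)
    return ratio

# Assumed, arbitrary cutoffs for the illustration.
for x in (10**2, 10**3, 10**4):
    print(x, round(mertens_gap(x), 4), round(w_over_phi(x) / math.log(x), 4))
```

By Mertens' theorems the first printed quantity tends to a constant (about 0.26) and the second to $e^\gamma \approx 1.78$, which is consistent with the $O(1)$ term in (2.1.6) and with (2.1.5) read up to a constant factor.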
Let $P = \prod_{p \leq z} p$ be a squarefree integer, and let $\mathcal{D} = \{d \mid P : d \leq D\}$ be a set of divisors of $P$. Let $a_n$ be a finitely supported sequence of nonnegative reals [footnote 3]. For each prime $p$, let $E_p$ be a subset of the integers, and for each $d \in \mathbb{Z}^+$ set $E_d := \bigcap_{p \mid d} E_p$ (with $E_1 = \mathbb{Z}$). Define $X_d = \sum_{n \in E_d} a_n$. We wish to estimate, that is, to find upper and lower bounds for,
\[ \sum_n a_n 1_{n \notin \bigcup_{p \mid P} E_p}. \tag{2.1.9} \]

Footnote 1: We actually don't need this estimate; it is included just for motivation.

Footnote 2: In Tao's elementary approach with a smooth compactly supported function $\chi$, which we employ here, we only need (2.1.8) from the theory of the Riemann zeta function. In the previous approach [51], or in the more technical approach in [110], more information about the Riemann zeta function or its zero-free region is still needed.

Footnote 3: E.g. $a_n = 1_{[1,x)}(n)$.

By the Inclusion-Exclusion Principle, we can write the sum in (2.1.9) as $\sum_{d \mid P} \mu(d)X_d$. However, it turns out that the number of terms in the sum is too large, causing the error terms to accumulate too quickly. One can do better by truncating the sum, working in such a way that only $X_d$ for $d \in \mathcal{D}$ are known or exploited. This is a linear programming problem, and one can restate the problem using the so-called linear programming duality.

Problem 2.1.2 (Sieve Problem). Define a (normalized) upper bound sieve to be a function $\nu^+ : \mathbb{Z} \to \mathbb{R}$ of the form $\nu^+ = \sum_{d \in \mathcal{D}} \lambda_d^+ 1_{E_d}$ for some $\lambda_d^+ \in \mathbb{R}$, such that
\[ \nu^+(n) \geq 1_{n \notin \bigcup_{p \mid P} E_p}(n). \tag{2.1.10} \]
Then the supremum of (2.1.9), subject to the condition that only $X_d$, $d \in \mathcal{D}$, are known, equals the infimum of
\[ \sum_{d \in \mathcal{D}} \lambda_d^+ X_d, \tag{2.1.11} \]
where the infimum is over sequences $(\lambda_d^+)$ that constitute an upper bound sieve. Usually $X_d$ will be of the form $g(d)X + r_d$, where $g$ is a multiplicative function with $0 \leq g \leq 1$, $X$ is a quantity independent of $d$, and $r_d$ is negligible when $d$ is restricted to a small range $\mathcal{D}$. Hence one is to minimize
\[ \sum_{d \in \mathcal{D}} \lambda_d^+ g(d). \]
Observe that a sequence $(\lambda_d^+)$ such that $\lambda_1^+ \geq 1$ and $\sum_{d \mid n} \lambda_d^+ \geq 0$ for all $n \mid P$ will form an upper bound sieve. Such a sequence is called a sequence of upper bound sieve coefficients. 
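To make the set-up concrete, here is a small Python sketch for the classical case $E_p = p\mathbb{Z}$ and $a_n = 1_{[1,x]}(n)$, so that $X_d = \lfloor x/d \rfloor$. It is an illustration only and not taken from the thesis: the function names are hypothetical, and the truncation used (keeping $\lambda_d^+ = \mu(d)$ only for $d$ with at most two prime factors, as in Brun's pure sieve) is our choice in place of the size restriction $d \leq D$. The sketch evaluates the full inclusion-exclusion sum $\sum_{d \mid P} \mu(d)X_d$, checks the coefficient condition $\lambda_1^+ \geq 1$, $\sum_{d \mid n} \lambda_d^+ \geq 0$ by brute force, and compares the truncated sum (2.1.11) with the exact count (2.1.9).

```python
from itertools import combinations
from math import prod

def truncated_mobius(primes, max_factors):
    """Return {d: mu(d)} for squarefree d | P with at most max_factors prime factors."""
    coeffs = {}
    for r in range(min(max_factors, len(primes)) + 1):
        for combo in combinations(primes, r):
            coeffs[prod(combo)] = (-1) ** r   # prod(()) == 1 gives d = 1
    return coeffs

def is_upper_bound_sieve(coeffs, primes):
    """Check lambda_1 >= 1 and sum_{d | n} lambda_d >= 0 for every n | P."""
    if coeffs.get(1, 0) < 1:
        return False
    for r in range(len(primes) + 1):
        for combo in combinations(primes, r):
            n = prod(combo)
            if sum(lam for d, lam in coeffs.items() if n % d == 0) < 0:
                return False
    return True

def sieve_sum(coeffs, x):
    """sum_d lambda_d X_d with X_d = floor(x/d) = #{n <= x : d | n}."""
    return sum(lam * (x // d) for d, lam in coeffs.items())

def unsifted_count(primes, x):
    """#{n <= x : no p in primes divides n}, i.e. the sum (2.1.9)."""
    return sum(1 for n in range(1, x + 1) if all(n % p for p in primes))

primes = [2, 3, 5, 7, 11, 13]                  # assumed example, P = 30030
x = 1000
full = truncated_mobius(primes, len(primes))   # lambda_d = mu(d) for all d | P
brun = truncated_mobius(primes, 2)             # only d with <= 2 prime factors

print(unsifted_count(primes, x), sieve_sum(full, x), sieve_sum(brun, x))
print(is_upper_bound_sieve(brun, primes),
      is_upper_bound_sieve(truncated_mobius(primes, 1), primes))
```

The first two printed numbers agree (inclusion-exclusion is exact), the third is an upper bound, and truncating at an odd number of prime factors fails the coefficient test, in line with the Bonferroni inequalities.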
Analogous problems for lower bound sieves may be stated similarly.

An important kind of upper bound sieve is Selberg's upper bound sieve, developed by Selberg in the 1940s; see e.g. [15]. The idea of Selberg's upper bound sieve comes from observing that if $P$ is any squarefree number and $(\rho_d)_{d \mid P}$ are arbitrary real numbers with $\rho_1 = 1$, then $\big(\sum_{d \mid P} \rho_d 1_{E_d}\big)^2$ is an upper bound sieve, as it is nonnegative everywhere and equal to 1 outside $\bigcup_{p \mid P} E_p$. Equivalently, the sequence
\[ \lambda_d^+ = \sum_{\substack{d_1, d_2 \\ \mathrm{lcm}[d_1, d_2] = d}} \rho_{d_1}\rho_{d_2}, \quad \text{where } d \mid P, \tag{2.1.12} \]
is a sequence of upper bound sieve coefficients. Set $D = R^2$ and assume that $\rho_d$ is supported on $\{d \mid P : d \leq R\}$, so that $\lambda_d^+$ is supported on $\mathcal{D} = \{d \mid P : d \leq D\}$. The key advantage of Selberg's upper bound sieve is that $(\rho_d)_{1 \leq d \leq R}$ are arbitrary real numbers, so the sieve problem reduces to the problem of optimizing a quadratic form. It turns out that Selberg's sieve usually already gives good results, compared with the harder problem of optimizing general upper bound sieve coefficients.

We study the following choice of $\rho_d$. The optimal Selberg weight is given, roughly, by [footnote 4] $\mu(d)\log(R/d)$. We will use the following variant obtained by Tao:
\[ \rho_d := \mu(d)\chi\Big(\frac{\log d}{\log R}\Big), \tag{2.1.13} \]
where $\chi$ is some smooth compactly supported function.

Finally we state an important lemma in sieve theory. This lemma may be used to derive formulas for the number of solutions of various diophantine equations, or to count patterns in almost primes; e.g. one can prove the analogue of the Hardy-Littlewood prime tuples conjecture for almost primes. See [107] for details. One limitation of the sieve method is that it yields asymptotics only for systems of linear equations in which the number of equations is bounded in terms of a set of parameters, in contrast to more modern results involving prime numbers [52].

Theorem 2.1.3 (The fundamental lemma of sieve theory ([107], Cor. 19)). Let $z = D^{1/s}$ for some $D, s > 1$. Suppose $g$ is a multiplicative function and $\kappa > 1$ (called the sieve dimension) such that $g$ obeys the bounds
\[ g(p) \leq \frac{\kappa}{p} + O_\kappa\Big(\frac{1}{p^2}\Big), \qquad g(p) \leq 1 - c_\kappa, \]
for some small constant $c_\kappa$. 
In particular, for $2 < \omega < z$,
\[ V(\omega) = \prod_{p < \omega}(1 - g(p)) \lesssim_\kappa \Big(\frac{\log z}{\log\omega}\Big)^\kappa V(z). \]
Let $E_p$ be a set of integers and $(a_n)_{n \in \mathbb{Z}}$ a finitely supported sequence of non-negative reals such that
\[ \sum_{n \in E_d} a_n = Xg(d) + r_d, \quad X > 0,\ r_d \in \mathbb{R},\ E_d = \bigcap_{p \mid d} E_p, \]
for all squarefree $d \leq D$. Then
\[ \sum_{n \notin \bigcup_{p \leq z} E_p} a_n = (1 + O_\kappa(e^{-s}))\,XV(z) + O\Big(\sum_{d \leq D;\ \mu(d)^2 = 1} |r_d|\Big). \]

Footnote 4: This is the form used by Goldston-Yildirim in their works on small gaps in the primes.

2.2 A Pseudorandom Measure Majorizing the Primes

In this subsection we construct a pseudorandom measure $\nu$ similar to that used in the original proof of the Green-Tao theorem. This subsection is mostly for expository purposes.

2.2.1 The W-Trick

The primes have the obvious structure that they can only live in some residue classes (for example, apart from 2 and 3, no prime lies in the residue classes 2 or 3 (mod 6)), i.e. they are not uniformly distributed. Let [footnote 5] $W := \prod_{p \leq \omega} p \approx e^\omega$, and consider
\[ \mathcal{P}_{W,b}[N] := \{1 \leq n \leq N : Wn + b \in \mathcal{P}\}. \]
We can see from the Prime Number Theorem in arithmetic progressions that $\mathcal{P}_{W,b}$ is uniformly distributed among residue classes (mod $p$), for all $p \leq \omega$. Using the correspondence
\[ A \subseteq \mathcal{P}[N] \;\leftrightarrow\; A'' := A \cap \mathcal{P}_{W,b}[N/W], \]
one is able to get rid of the local factors arising from small primes $p \leq \omega$, while the primes are much more uniformly [footnote 6] distributed on large residue classes.

There are two viewpoints one could take on $W$ (see Theorem 2.2.1 below). First, we can think of $W$ as a function of $x$ which goes to infinity sufficiently slowly. The error term in this case is of the form $o_{x \to \infty}(1)$. Alternatively, we could think of $W$ as a fixed, sufficiently large constant, and the error term would then be of the form $o_{w \to \infty, x \to \infty}(1)$. The latter viewpoint is important in calculating explicit bounds. To make this precise, we state the following theorem from [106].

Theorem 2.2.1 (Overspill Principle [footnote 7] [106]). Let $F(w, x) : \mathbb{Z}^+ \times \mathbb{R} \to \mathbb{R}$. Then the following are equivalent:

1. For a fixed $\varepsilon > 0$ there exists $w_\varepsilon$ such that for each fixed $w \geq w_\varepsilon$,
\[ |F(w, x)| \leq \varepsilon + o_{w, x \to \infty}(1) = o_{w \to \infty, x \to \infty}(1). \]
2. $F(w, x) = o_{x \to \infty}(1)$ whenever $w = w(x) \in \mathbb{Z}$ grows to infinity sufficiently slowly (in the sense that there is a function $w_0(x) : \mathbb{R}^+ \to \mathbb{Z}^+$, defined in the proof, such that $w(x) \leq w_0(x)$).

Footnote 5: Indeed $\omega$ cannot be bigger than $\log N$.

Footnote 6: Up to an error term $o_{\omega \to \infty}(1)$.

Footnote 7: The name comes from non-standard analysis. We write out explicitly the parameters in $o(1)$ in the statement.

Proof. Assume (1). Then for each natural number $n$ and $w \leq n$ we can find $x_n$ such that
\[ |F(w, x)| \leq \frac{2}{n} \]
for all $x \geq x_n$ and $n \geq w \geq w_{1/n}$. Define the function $w_0 : \mathbb{R}^+ \to \mathbb{N}$ by
\[ w_0(x) := \begin{cases} n, & \text{where } n \text{ is the largest integer such that } x_n \leq x, \\ 1, & \text{if } x < x_1. \end{cases} \]
Without loss of generality, we may choose $x_n$ to be increasing in $n$. Define
\[ w(x) = \begin{cases} w_{1/n}, & \text{if } x_n \leq x < x_{n+1}, \\ 1, & \text{if } x < x_1. \end{cases} \]
Hence $w(x) \leq w_0(x)$. Also $F(w, x) = o(1)$.

Conversely, assume (2) holds but (1) fails. Then there is an $\varepsilon > 0$ such that for any positive integer $n$ there is $w_n \geq n$ such that $|F(w_n, x_n)| > \varepsilon$ for arbitrarily large $x_n$. Letting $w_n$ go to infinity, we can find a sequence $\{x_n\}$ going to infinity such that $|F(w_n, x_n)| \geq \varepsilon$ for all $n$. Since $w(x)$ goes to infinity, increasing $x_n$ as necessary, we can ensure that $w(x) \geq w_n$ for all $x \geq x_n$ and all $n$. We see that $|F(w, x)| \geq \varepsilon$ at $x = x_n$ for all $n$, which contradicts (2).

2.2.2 Pseudorandomness Conditions

In this section we construct a pseudorandom measure $\nu$ majorizing the Mangoldt function $\Lambda$, which is concentrated on primes. We prove certain correlation conditions which are a bit more general than the ones obtained in [51], [52] (see also the exposition [17]); however, the proof is essentially the same [footnote 8].

We define the following modified Mangoldt function corresponding to the W-trick. Let $W = \prod_{p \leq \omega} p$, $(b, W) = 1$. 
DefineΛ˜b(n) =φ(W )W log(Wn+ b) if Wn+ b is prime0 otherwise.(2.2.1)The factor φ(W )/W is for normalized purpose; we have 1/N∑n≤N Λ˜b(n) = 1+o(1).We constructa pseudorandom measure, by making use of the Goldston-Yildirim division sum [42]ΛR(n) =∑d|n,d≤Rµ(d) log(R/d).8A stronger analogue pseudorandom condition for ν is obtained in [110] in particular they need the polynomial to stay in thelength about N so the range average over t has to be of the form No(1). This is a nontrivial bound on the size of t.31Note that if N ≥ n ≥ R is a prime or more generally if n has no prime factor ≤ R, then ΛR(n) =logR & Λ˜(n) ; choosing R = Nη for some η > 0. We state below two technical results which showthat the function ΛR(n) is concentrated on the set of almost primes, i.e. numbers havong only largeprime factors. As we don’t need these facts for our main results, we omit the proofs.Theorem 2.2.2. [81] Let N c0 < R ≤ √N/q(logN)−C and q be a prime,q = Rβ, β < c0 wherec0, C are suitably chosen. Then ∑N<n≤2Nq|nΛR(n)2  βq∑N<n≤2NΛR(n)2In particular, if P (Nη) =∏p≤Nη p then∑N<n≤2Ngcd(n,P (Nη))>1ΛR(n)2  βq∑N<n≤2NΛR(n)2In [52], the following variant of ΛR is introduced,Λχ,R(n) =∑d|nµ(d)χ(log dlogR) (2.2.2)The point is to replace log+(R/d) with a smooth approximation χ(log dlogR) where χ is a smooth com-pact supported, bounded function. Then instead of using the contour integral, one can apply Fouriertransform. Then we can truncate the integrals over bounded interval obtaining error terms o(1) dueto smoothness of χ and rapid decay of its Fourier transform. We will follow this more elementaryapproach as opposed to that of [42] based on evaluating certain contour integrals.Definition 2.2.3 (Green-Tao measure). Let χ : R → [0, 1] supported on [−1, 1], χ(0) = 1, Cχ =∫ 10 |χ′(t)|2dt which we may assume to be 1. Let R = Nk−12−k−5 , k ≥ 3. Let ε > 0 be a smallconstant. Define νb : ZN → R+νb(n) =φ(W )WΛχ,R(Wn+b)2Cχ logRif εN ≤ n ≤ 2εN1 Otherwise(2.2.3)Remark 2.2.4. 
Note that $\nu_b = \nu_b^{(N)}$ depends on $N$, but for simplicity we will not write the superscript $N$. Sometimes we also drop the subscript $b$, as all our estimates are independent of it.

We summarize important properties of $\nu$:

– In general $\nu$ could become unbounded as $N \to \infty$. However, by the so-called linear forms condition (see Definition 2.2.5 below),
\[ \mathbb{E}_x \nu(x) = 1 + o(1). \]
– We have
\[ \tilde\Lambda(n) \lesssim_k \nu(n). \]
To see this [footnote 9], we need only check $n$ for which $Wn + b$ is prime. Then $\Lambda_{\chi,R}(Wn + b) = \log R = k^{-1}2^{-k-5}\log N$. Now assume $N$ is sufficiently large (and $\omega$ is sufficiently slowly growing or constant); then
\[ k^{-1}2^{-k-5}\log N \geq k^{-1}2^{-k-6}\log(WN + b). \]
Then
\[ k^{-1}2^{-k-6}\frac{\phi(W)}{W}\log(Wn + b) \leq k^{-1}2^{-k-6}\frac{\phi(W)}{W}\log(WN + b) \leq \frac{\phi(W)}{W}\log R = \nu(n). \]
– If $Wn + b$ is prime with $n$ in, say, $[\varepsilon_1 N, \varepsilon_2 N]$, then $\nu(n) \approx_W \log N$. Here we may choose $W$ to be a (large) constant.

Now we describe the pseudorandomness conditions we will need later. The first one roughly says that if the linear forms $L_i$ are not rational multiples of each other, then the events that the $L_i(x) + b_i$ are almost primes are independent.

Definition 2.2.5 (Linear Forms Condition). Let $m_0, t_0 \in \mathbb{N}$ be parameters. We say that $\nu$ satisfies the $(m_0, t_0)$-linear forms condition if the following holds for any $m \leq m_0$, $t \leq t_0$. Let $\{a_{ij}\}_{1 \leq i \leq m,\ 1 \leq j \leq t}$ be integers and $b_i \in \mathbb{Z}_N$, and let $L_i : \mathbb{Z}_N^t \to \mathbb{Z}_N$, $L_i(x) = \sum_{1 \leq j \leq t} a_{ij}x_j + b_i$ for $1 \leq i \leq m$, be $m$ (affine) linear forms such that each $L_i$ is nonzero and they are pairwise linearly independent over the rationals. Then
\[ \mathbb{E}_{x \in \mathbb{Z}_N^t}\prod_{1 \leq i \leq m}\nu(L_i(x) + b_i) = 1 + o_{N \to \infty, m_0, t_0}(1). \tag{2.2.4} \]
This is a very general phenomenon. Two important special cases are given below.

Example 2.2.6. $\|\nu - 1\|_{U^d}^{2^d} = o(1)$, where $\|\cdot\|_{U^d}$ is the uniformity norm discussed in Section 1.1.1.

Example 2.2.7. 
If f ≤ ν then consider the dual function defined in (1.2.1) associated with ‖ · ‖U2norm then|D2f(x)| ≤ Ea,bν(x+ a)ν(x+ b)ν(x+ a+ b) = 1 + o(1)This is referred as bounded dual condition10, saying that the structural component (used in [51]) inthe decomposition is bounded.9W-trick is essential here, we don’t expect this to holds for prime itself with pseudorandom ν as primes are not uniformlydistributed among residue classes.10 This is the only application of linear form conditions with nonzero constant terms in the original Green-Tao’s proof[51].33Now we state the so-called correlation condition which control some kind of mild correlation of ΛRby functions τ . τ itself may not be bounded but it has bounded moments. The proof of linear formscondition and correlations for ν are not much different, this may be harder for primes. Roughlyspeaking we have for h 6= 0,Exν(x)ν(x+ h) ≤ τ(h) ≈ exp(∑p>ω,p|h1/√h).Note that if h has a large number of divisors then τ(h) can be arbitrarily large. As opposed, there is astrong kind of correlation that we cannot control, i.e. higher moment of ν,Exν(x)2 ∼ logN →∞.Definition 2.2.8 (Correlation Condition). 11 We say that a measure ν satisfies (m0,m1, ...,ml2)−correlation condition if there is a function τ : ZN → R+ such that1. E(τ(x)m : x ∈ ZN ) = Om(1) for any m ∈ Z+2. Suppose– φi, ψ(k) : ZtN → ZN (1 ≤ i ≤ l1, 1 ≤ k ≤ l2, l1 + l2 ≤ m0) are all pairwise linearlyindependent linear forms over Q.– For each 1 ≤ g ≤ l2, 1 ≤ j < j′ ≤ mg we have agj 6= 0, and a(g)j ψ(g)(x)+h(g)j , a(g)j′ ψ(g)(x)+h(g)j′ are different (affine) linear forms.then, we haveEx∈ZdNl1∏k=1ν(φk(x))l2∏k=1mk∏j=1ν(a(k)j ψ(k)(x)+h(k)j ) ≤l2∏k=1∑1≤j<j′≤mkτ(W (a(k)j′ h(k)j −a(k)j h(k)j′ )+(a(k)j′ −a(k)j )b)(2.2.5)where W =∏p≤ω p.Lemma 2.2.9. Let B ⊆ ZdN be a box of length ≥ R10M where M = m0m1 . . 
.ml2 , R = Nk−12−k−5thenEx∈Bm0∏k=1Λχ,R(Wφk(x) + b)2l2∏k=1mk∏j=1Λχ,R(W · (a(k)j ψ(k)(x) + h(k)j ) + b)2 (2.2.6)= (1 +O(logω√logR)) exp(Om(1/ω))(W logRφ(W ))Ml2∏k=1∏p|∆k(1 +OM (p−1/2))11This lemma is first invented in [51] to prove dual function estimates with K arbitrarily large.34where∆k :=∏1≤j<j′≤mk(W · (a(k)j′ h(k)j − a(k)j h(k)j′ ) + (a(k)j′ − a(k)j )b)andM = m0 +m1 + · · ·+mj[M ] =m0⋃j=1Ij ∪mj⋃j=1Imj , Ij = {j} for j ≤ m0, Imj = (Mj−1,Mj ]ψ(k)i :=φi if i ∈ Ij , j ≤ m0ψ(k) if i ∈ Ik, k > m0Now we verify this lemma as in [51] or [17]. Writeθi(x) =Wφi(x) + b if i ∈ Ij , j ≤ m0W (a(k)i ψ(k)i + h(k)i ) + b if i ∈ Imk ,mk > m0.Expand LHS of (2.2.6)Ex∈Bk∏m0∏i∈ImΛχ,R(θi(x))2 (2.2.7)=∑a,b∈NM( M∏i=1µ(ai)µ(bi)χ(log ailogR)χ(log bilogR))Ex∈B( M∏i=11ai,bi|θi(x)(x))Observe that only the last term depends on x. Recall D = lcm[a1, b1, . . . , aM , bM ] ≤ R2M and Bhas each side of length ≥ R10M , we can approximateEx∈B( M∏i=11ai,bi|θi(x)(x))= Ex∈ZtD( M∏i=11ai,bi|θi(x)(x))+O(R−8M )To see this, consider a slightly smaller box B′ whose the lengths of all its dimensions are all divisibleby D and the length of each side of B′ differs from the length of the corresponding side of B byO(R2M ). The total error in 2.2.7 when changing average in B to average over B′ (which is the sameas the average in ZtD) is O(log2M RR6M).For X ⊆ [M ], define the local factorωX(p) = Ex∈Ztp∏i∈X1θi(x)≡0 (mod p),35ωX = Ex∈ZtD∏i∈X1ai,bi|θi(x), D = lcm[a1, b1, . . . , aM , bM ].Let D = p1 . . . ph. Using the Chinese Remainder Theorem, we rewrite the system of equationsθi(x) ≡ 0 (mod ai), θi(x) ≡ 0 (mod bi), 1 ≤ i ≤ masθi(x) ≡ 0 (mod pj), 1 ≤ i ≤ m, 1 ≤ j ≤ h.HenceωX =∏pωX(p).We have the following local factor estimate that will be used later.Lemma 2.2.10. [Local factor estimate]1. ω∅(p) = 1.2. p ≤ ω,X 6= ∅ ⇒ ωX(p) = 0.3. |X| = 1⇒ ωX(p) = 1.4. Suppose p > ω and X ⊆ Ik, |X| > 1. If p|∆k and |X| = 2 then ωX(p) = p−1. If p|∆k and|X| > 2 then ωX(p) ≤ p−1. If p - ∆k then ωX(p) = 0.5. 
If p > ω and ∃k1 6= k2 such that X ∩ Ik1 , X ∩ Ik2 is nonempty then ωX(p) ≤ p−2.Proof. (1) is trivial. (2) follows from the fact that if p ≤ ω, j ∈ X then W · (a(k)j ψ(k)j + h(k)j ) + b ≡b 6= 0 (mod p). To see (3), if p > ω,X ⊆ Ik, |X| = 1,say X = {j}, then we can writeωX(p) = Ex∈Ztp1W ·(a(k)j ψ(k)a(k)j +h(k)j )+b≡0 (mod p)= p−1To verify (4) and (5), assume |X| > 1 and j, j′ ∈ X sincep|W · (a(k)j ψ(k)j + h(k)j ) + b, p|W · (a(k)j′ ψ(k)j′ + h(k)j′ ) + bThen p|W · (a(k)j h(k)j − a(k)j h(k)j′ ) + (a(k)j′ − a(k)j )b so p|∆k. Hence p - ∆k ⇒ ωX(p) = 0.Now assume p|∆k, then the condition that p|θi(x)∀i ∈ X and |X| = 2 could be reduced to p|θj(x)for some j ∈ X if p > W, p > ai∀i ∈ X , which is true if ω is chosen sufficiently large. HenceωX(p) = Ex∈Ztp∏i∈X1W ·(a(k)i ψ(k)a(k)i +h(k)i )+b≡0 (mod p)= Ex∈Ztp1W ·(a(k)j ψ(k)a(k)j +h(k)j )+b≡0 (mod p)= p−136If |X| > 2 then we can crudely bound this by ω{j,j′} for some j, j′ chosen so that p divide the factorin ∆ corresponding to j, j′.Now we verify (5). Assume j ∈ X ∩ Ik1 , j′ ∈ X ∩ Ik2 . For i = 1, 2, writea(ki)j ψ(ki)j (x) =t∑s=1Lki,sxsas p -W , our condition becomest∑s=1Lki,sxs = −W−1b−W−1h(ki)ji (mod p), i = 1, 2.By our assumption that (Lk1,s)1≤s≤t, (Lk2,s)1≤s≤t are not rational multiple of each other, we claimthat they are also linearly independent over Zp. Assume indirectly that Lk1,s = rLk2,s (mod p) forsome rational r. Thenai/bi = rci/di (mod p).Hence for each 1 ≤ i ≤ t, λ = (aidi)(bici)−1 (mod p), i.e. a1b1bici ≡ b1c1aidi (mod p). But ifω is sufficiently large so that |ai|, |bi|, |ci|, |di| ≤ ω1/42 . then |a1d1bici − b1c1aidi| ≤ |a1d1bici| +|b1c1aidi| < ω ≤ p. Hence a1d1bici = b1c1aidi ∀1 ≤ i ≤ t. This is a contradiction. 
Hence theset of solutions of θj(x) = θk(x) ≡ 0 (mod p) is contained in the intersection of two skew-affinesubspaces of Ztp and hence has cardinality ≤ pt−2.Now let ψ be the inverse Fourier transform of exχ(x) i.e.χ(x) =∫Rψ(t)e−x(1+it)dtSince exχ(x) is smooth and has compact support, ψ is smooth and rapidly decays. In particular, forany A > 0, |ψ(t)| = OA((1 + t)−A). Let I = [−√logR,√logR] thenχ(log clogR) =∫Ic− 1+itlogRψ(t)dt+O(c−1/ logR log−AR) (2.2.8)for any A > 0. Observe that χ( log clogR) = O(c− 1logR ), henceM∏j=1χ(log ajlogR)χ(log bjlogR) =∫I· · ·∫IM∏j=1ψ(xj)ψ(yj)a1+ixjlogRj b1+ixjlogRjdxjdyj +OA(log−ARM∏j=1(ajbj)−1/ logR)Substitute this into (2.2.7). The error term can be shown to be o(1) for large enough A. Indeed, using37that Ex∈Ztp1aj ,bj |θj(x)∀j is 1 if aj , bj = 1∀j and ≤ 1/p otherwise. The error term is given by∑ai,bi∈NsquarefreeEx∈ZtD1aj ,bj |θj(x)∀jOA(log−ARM∏j=1(ajbj)−1/ logR).A (logR)−A∏p∑ai,bi∈{1,p}[Ex∈Ztp1aj ,bj |θj(x) ∀j] M∏j=1(ajbj)−1/ logR≤ (logR)−A∏p(1 + p−1∑ai,bi∈{1,p}(a1b1...aMbM )−1/ logR)= (logR)−A∏p[1 + p−1((p−1/ logR + 1)2M − 1)]≤ (logR)−A∏p(1− p−1−1/ logR)2M(apply (2.1.8)) = (logR)−Aζ(1 + 1/ logR)2M = O(logR)−2M−A = o(1)where A > 0 can be chosen arbitrarily large. Denotex′j =1 + ixjlogR, y′j =1 + iyjlogR. 
(2.2.9)The main term in (2.2.7) becomes∫I· · ·∫I∑a,b∈NM∏p(Ex∈ZtpM∏i=11ai,bi|θi(x)(x)) M∏j=1µ(xj)µ(yj)ax′jj by′jjM∏j=1ψ(xj)ψ(yj)dxjdyjwhere∑a,b∈NM∏p(Ex∈ZtpM∏i=11ai,bi|θi(x)(x)) M∏j=1µ(xj)µ(yj)ax′jj by′jj=∏p∑a,b∈{1,p}M(Ex∈ZtpM∏i=11ai,bi|θi(x)(x)) M∏j=1µ(xj)µ(yj)ax′jj by′jj:=∏pEpis the Euler product, where the Euler factor isEp =∑a,b∈{1,p}M(Ex∈ZtpM∏i=11ai,bi|θi(x)(x)) M∏j=1µ(xj)µ(yj)ax′jj by′jj=∑I,J⊆[M ](−1)|I|+|J |ωI∪J(p)p∑j∈I x′j+∑j∈J y′j38Define a more convenient Euler’s factorE′p =m∏j=1(p1+x′j − 1)(p1+y′j − 1)p(p1+x′j+y′j )=M∏j=1(1− p−1−x′j )(1− p−1−y′j )1− p−1−x′j−y′jFrom (2.1.8), we have ∏pE′p =(1 + o(1))logM RM∏j=1(1 + ix′j)(1 + iy′j)2 + ix′j + iy′j.Define Fp = Ep/E′p we haveLemma 2.2.11.∏pEp =∏pFp∏pE′p =∏pFp1 + o(1)logM RM∏j=1(1 + ixj)(1 + iyj)2 + i(xj + yj)Next, we use that12∫R∫R(1 + ixj)(1 + iyj)2 + i(xj + yj)ψ(xj)ψ(yj)dxjdyj =∫χ′(t)2dt := 1.The main term becomes∫I· · ·∫I∏pFpM∏j=1(1 + ixj)(1 + iyj)2 + i(xj + yj)ψ(xj)ψ(yj)dxjdyj(1 + o(1)) log−M R= (1 + o(1))∏pFp log−M R( ∫I∫I(1 + ixj)(1 + iyj)2 + i(xj + yj)ψ(xj)ψ(yj)dxjdyj)M= (1 + o(1))∏pFp log−M R( ∫R∫R(1 + ixj)(1 + iyj)2 + i(xj + yj)ψ(xj)ψ(yj)dxjdyj + o(1))M= (1 + o(1)) log−M R∏pFpTo find asymptotic of∏p Fp, we apply the local factor estimate.Lemma 2.2.12 (Euler factor estimate for Linear forms condition). We haveEp = (1 +O(p−2))E′p12This follows from the identity12 + ix+ iy=∫ ∞0e−(1+ix)te−(1+iy)tdt.39and moreover13,∏p≤ωFp = exp(Om(1/ω))(1+O(logω√logR))(W/φ(W ))M = (W/φ(W ))M (1+OM (1/ω)1+O(logω√logR)).Proof. Recall notation x′j , y′j defined in (2.2.9).Ep =∑I,J⊆[M ](−1)|I|+|J |ωI∪J(p)p∑j∈I x′j+∑j∈J y′j= ω∅(p)−M∑j=1(1px′j+1py′j− 1p1+x′j+y′j)+∑I,J⊆[M ],|I∪J |≥2OM (p−2)p∑j∈I x′j+∑j∈J y′j= 1−M∑j=1(px′j + py′j − 1p1+x′j+y′j)+OM (p−2)Now recall |x′j |, |y′j | = O((logR)−1/2) and E′p =∏Mj=1(p1+x′j−1)(p1+y′j−1)p1+x′j+y′j−1. 
We computeFp = Ep/E′p =(1−M∑j=1px′j + py′j − 1p1+x′j+y′j) M∏j=1(1− p−1−x′j−y′j )p−1(1− p−1−x′j )−1(1− p1−y′j )−1 +O(p−2)= 1 +OM (p−2).Now ∏p>ω(1 +OM (p−2)) = exp(OM (∑p>ωp−2)) = exp(OM (1/ω)).For p < ω,Ep = 1, whereas if |zj | = O(log−1/2R) then 1 − p−1−zj = 1 − p−1 exp(−zj log p) =1− p−1(1 +O(|zj | log p)) = (1− p−1)(1 +O( |zj | log pp )), applying this with zj = xj , yj , xj + yj , wehave∏p≤ωE′−1p =∏p≤ωM∏j=11− p−1−x′j−y′j(1− p−1−x′j )(1− p−1−y′j )=∏p≤ωM∏j=11(1− 1/p)(1 +O(|zj | log pp ))=∏p≤ω[( pp− 1)M(1 +O(log pp log1/2R))]The Lemma follows by recalling that∏p≤ω(p/p− 1)M = (W/φ(W ))Mand∏p≤ω(1 +O(log pp log1/2R)) = exp(1√logRO(∑p≤ωlog pp)) = exp(O(logω√logR)).13Here the term (W/φ(W ))M comes from primes p ≤ ω.40Lemma 2.2.13 (Euler factor estimate for the correlation conditions).Ep = (1 +O(p−2))E′p when p - ∆k for all k.Ep = (1 +O(p−1/2))E′p when p|∆k for some k.∏pFp = exp(OM (1/ω))(W/φ(W ))M (1 +O(logω√logR))∏p|∆1...∆k(1 +O(p−1/2))Proof. The first statement is similar to the first part of Lemma 2.2.12. Assume p|∆k, using x′j , y′j =o(1), and applying the local factor estimate (Lemma 2.2.10), we haveEp = 1 +O(1/p)∑I,J⊆[M ],I∪J 6=∅(−1)|I|+|J |p∑i∈I x′j+∑j∈J y′j= 1 +O(p−1/2)E′p =M∏j=1(p1+x′j − 1)(p1+y′j − 1)p(p1+x′j+y′j − 1)=M∏j=1(1− p−1−x′j )(1− p−1−y′j )1− p−1−x′j−y′j=M∏j=1(1− p−1−x′j )(1− p−1−y′j )(1 + p−1−x′j−y′j +O(p−3/2))= 1 +O(p−1/2)Now define τ = τM : ZN → R≥0, τ(0) := exp(CM logNlog logN ) so ‖ν‖M∞ ≤ τ(0). Define τ(n) =OM (n)∏p|n(1 +O(p−1/2))OM (1). One estimates∏p|∆k(1 +OM (p−1/2)) =∏1≤i<j≤mk(∏p|W ·[(a(k)j′ hkj−a(k)j hkj′ )+(akj′−a(k)j )b](1 +OM (p−1/2))≤∏1≤i<j≤mk∏p|W ·[(a(k)j′ hkj−a(k)j hkj′ )+(akj′−a(k)j )b](1 + p−1/2)OM (1)≤ OM (1)∑1≤i<j≤M∏p|W ·[(a(k)j′ hkj−a(k)j hkj′ )+(akj′−a(k)j )b](1 + p−1/2)OM (1)≤∑1≤i<j≤Mτ(W · ((a(k)j′ hkj − a(k)j hkj′) + (akj′ − a(k)j )b).Now we verifyEx∈ZN τ(x)q = Oq(1)41Since 1/√p→ 0, p→∞ then (1 +O(p−1/2))OM (q) ≤ 1 + p−1/4. 
for all but finitely many p.E0<|n|≤N (∏p|m(1 +O(p−1/2)))OM (q) ≤ OM,q(1)E0≤n≤N∏p|n(1 + p−1/4) ≤ OM,q(1)E0≤n≤N (∑d|nd−1/4)= OM , q(1)N∑d=1d−5/4 = OM,q(1).Lemma 2.2.14. ν satisfies the pseudorandomness conditions (2.2.4), (2.2.5).Proof. We follow the argument in [51]. First by clearing the denominator (inZN , N is prime), we mayassume the linear forms have integer coefficients with coefficient bounds from k to k(k!) < (k + 1)!.Choose ω sufficiently large so that (k + 1)! <√ω2 . To prove (2.2.5), we just use the trivial boundν(x) ≤ 1 + φ(W )W ΛR(θi(x)) and apply Lemma 2.2.9.To deal with the two-part definition of ν and to get the asymptotic of the form 1+o(1), letQ = Q(N)chosen to be a small power of N such that N/Q ≥ R10M , we subdivide ZtN into Qt roughly equalsized boxesBu1,...,ut = {x ∈ ZtN : xj ∈ [bujNQc, b(uj+1)NQc]}, where u1, . . . , ut ∈ ZQ.Then |Bu1,...,ut | = (N/Q+O(1))t = (N/Q)t(1 +O(Q/N)). ThenEx∈ZtN ν(φ1(x)) . . . ν(φm(x)) =1Qt(N/Q)t∑(u1,...,ut)∈ZtQ∑x∈Bu1,...,utν(φ1(x)) . . . ν(φm(x))= (1 + o(1))E(u1,...,ut)∈ZtQEx∈Bu1,...,utν(φ1(x)) . . . ν(φm(x))We say that a box Bu1,...,ut is nice if φi(Bu1,...,ut) ⊆ [εkN, 2εkN ] ∀i. Since N/Q > R10M , applyingLemma 2.2.12 (in particular, Lemma 2.2.9), we obtainEx∈Bu1,...,ut niceν(ψ1(x)) . . . ν(ψm(x)) = 1 +OM (1/ω)(we could replace each ν with either 1 or φ(W ) logRW Λ2R).Now we claim that the proportion of the number of boxes that are not nice is O(1/Q). SupposeBu1,...,ut is not nice then there is a linear formψ and x,y ∈ Bu1,...,ut such thatψ(x) ∈ [εN, 2εN ], ψ(y) /∈[εN, 2εN ]. Hence by continuity, either a = 1 or 2, one hasaεN =t∑j=1LjbNujQc+ b+O(N/Q).42Hencet∑j=1Ljuj = aεQ+bQN+O(1) (mod Q).For any choice of u1, . . . , ut−1, the number of choices of ut is O(1). Hence the number of non-niceboxes is O(Qt−1). If Bu1,...,ut is not nice, then using the trivial bound ν(x) ≤ 1 + φ(W )W ΛR(θi(x))2,we obtainEx∈Bu1,...,ut not nice = exp(1/o(ω))(O(1) + o(1)).HenceEx∈ZtN ν(θ1(x)) . . . ν(θM (x)) = Eu∈ZtQ[Ex∈Bu niceν(θ1(x)) . 
. . ν(θM (x)) + Ex∈Bu not niceν(θ1(x)) . . . ν(θM (x))]=1Qt[Qt(1 +O(1/ω) +O(logω/√logR)) +O(Qt−1) exp(OM (1/ω))(O(1) + o(1))]= 1 +O(1/ω) +O(logω/√logR) +O(1/Q(N)).Here R is small power of N , Q(N) is a power of N . O(1/Q) may be neglected.The correlation condition for ν can be verified in the same way from Lemma 2.2.13 (in particularLemma 2.2.9) and the definition and properties of τ .2.3 Birch-Davenport’s Circle MethodThe main objective of this section is to prove the Theorem 2.3.4 below whose proof is a simpleadaptation of arguments of Birch [11]. We may skip some details that are the same as in [11].2.3.1 Set UpDenote ‖x‖ the distance of x to the closest integer.Let F = (F1, . . . , Fr) be a system of r homogeneous forms of degree k on Zd. We are particularlyinterested in the case k ≥ 2. Let VF = {x : F(x) = 0} ⊆ Cd, we define V ∗F ⊆ VF , the singularvariety of F , to be the set of points such that JacF , the Jacobian of F , drops rank.Let us introduce the notationRN (v) = #{x ∈ [N ]d : F(x) = v}.RN (M, s;v) := |{x ∈ [N ]d; x ≡ s (mod M), F(x) = v}|.For a family of integral forms F = (F1, . . . , Fr) in variables xi ∈ Zd, write xi = (xi1, . . . , xid) and43Fi(xi1, . . . , xid) =∑0≤j1+···+jd≤kcij1,...,jd(xi1)j1 . . . (xid)jd (2.3.1)We may write it as a symmetric form:Fi(xi1, . . . , xid) =∑0≤i1,...,il≤daii1,...,ikxii1 . . . xiikwith k!aii1,...,ik ∈ Z. Define a symmetric integral d−linear form on xi, . . . ,xk; xi = (xi1, . . . xid):Φi(x1, . . . ,xk) = k!∑0≤i1,...,il≤daii1,...,ikx1i1 . . . xkik(2.3.2)Thenk!Fi(x) = Φi(x, . . . ,x). (2.3.3)For α ∈ [0, 1]r, define the exponential sumSN (M, s, α) :=∑x∈Zde2piiα·F((Mx+s))φN (Mx+ s) (2.3.4)where φN is the characteristic function of [0, N ]d. Recall the notation ∆+h f(x) = f(x + h) − f(x)thenk!∆+hFi(x) = kΦi(x, . . . ,x,h) +Rk−2(x,h) (2.3.5)where deg(Rk−2) (in x) ≤ k − 2. Hencek!∆+hk−1 . . .∆+h1Fi(x) = k!Φi(x,hk−1 . . . ,h1) +R0(h) (2.3.6)HereΦi(x,h1, . . . ,hk−1) =∑1≤j,i1,...,ik−1≤daij,i1,...,id−1xjh1i1 . . . 
hk−1ik−1 :=∑1≤j≤dxjΨij(h1, . . . ,hk−1).(2.3.7)Definition 2.3.1 (Rank of F ; defined by Birch [11]). 14 Let F = (F1, . . . , Fr) be a system of rhomogeneous form of degree k. The Rank of F is the codimension of the singular variety V ∗F , the setof points z ∈ Cd where the Jacobian ∂F/∂z drops rank.We will writeK :=codim(V ∗F )2k−114There is also a notion of Schmidt rank [94]. Let F be a single homogeneous polynomial, the Schmidt rank is defined tobe the smallest integers h such that we can find homogeneous forms T1, . . . , Th, R1, . . . , Rh of positive degree such that Q =T1R1 + · · ·+ ThRh. Note that if Q =∑hi=1 TiRi then V∇Q ≤ 2h.44Example 2.3.2. If F (x) = Ax · x is a quadratic form i.e. A is an integral symmetric matrix, then∇F (x) = Ax, V ∗F = Ker(A), codimV ∗F = rank(A).2.3.2 The Circle MethodTheorem 2.3.3 (Birch’s Theorem). Let F be a system of r homogeneous integral forms of degree kin d variables. Suppose K > (k − 1)r(r + 1), N > 1, thenRN (v) = Nd−krσ(v)J(N−kv) +O(Nd−kr−ε)for some ε > 0, whereσ(v) =∏p primeσp(v)σp(v) = limr→∞ p−r(d−1)#{x ∈ (Z/p)d : F(x) = v (mod pr)}J(u) = JF (u) is the singular integral defined in (2.3.31).Recall from [11] that the singular series has a positive lower bound independent of u if we can findnonsingular solutions (mod p) for every p. The singular integral J(u) ≥ c(δ) > 0 independently ofN , provided that the equation F(x) = u has a nonsingular real point in the cube [δ, 1− δ]d, see [94]Section 9 and [11] Section 6.Theorem 2.3.4. Let F = (F1, . . . , Fr) be a family of integral forms of degree k ≥ 2 satisfying therank conditionRank(F) > r(r + 1)(k − 1)2k−1 (2.3.8)and for given M ∈ N and s ∈ Zd, recall thatRN (M, s;v) := |{x ∈ [N ]d; x ≡ s (mod M), F(x) = v}|. (2.3.9)Then there exists a constant δ′ = δ′(k, r) > 0 such that the following holds.(i) If 0 < η ≤ 14r2(r+1)(r+2)k2then for every 1 ≤M ≤ N η1+η and s ∈ Zd one has the asymptoticRN (M, s;v) = Nd−rkM−d J(N−kv)∏pσp(M, s,v) + O(Nd−rk−δ′M−d). 
(2.3.10)(ii) Moreover ifRank (F) > (r(r + 1)(k − 1) + rk)2k (2.3.11)45then the asymptotic formula (2.3.10) holds for η ≤ 14r(r+2)k .In the remaining if this subsecton, we describe the proof of Theorem 2.3.4.Proof of Theorem 2.3.4Recall that ‖x‖ is the distance of x to the closet integer. The first lemma is an exponential sumestimate analogous to Lemma 2.1 in [11].Lemma 2.3.5. Let 1 ≤M < N and s ∈ Zd. Then|(N/M)−dSN (M, s, α)|2k−1 . (N/M)−kd∑h1,...,hk−1∈[−N/M,N/M ]dd∏j=1min (N/M, ‖Mkxjα ·Ψj(h1, . . . ,hk−1)‖)where the ith component of the multi-linear form Ψj = (Ψij)ri=1 is given byΨij(x,h1, . . . ,hk−1) = k!∑1≤j1,...,jk−1≤daij,j1,...,jk−1h1j1 . . .hk−1jk−1 .Proof. We will invoke the following simple inequality: Let I be an interval of length at most N/Mand β ∈ R then|∑x∈Ie2piiβx| ≤ min{N/M, ‖β‖−1} (2.3.12)WriteF (Mx+ s) = MkF (x) +GM,s(x), deg(Gd,s) < k (2.3.13)and note that|N−d∑x∈[N ]df(x)|2 = N−d|N−d∑x,hf(x)f(x+ h)|Applying this k − 1 times, we have|(N/M)−dSN (M, s, α)|2k−1 =∣∣∣∣(N/M)−d ∑x∈Zde2piiα·F((Mx+s))φN (Mx+ s)∣∣∣∣2k−1≤ (N/M)−(k−1)d∑h1,...hk−1∣∣∣∣(N/M)−d∑xe2piiα·∆+hk−1 ...∆+h1F(Mx+s)∆hk−1 . . .∆h1φ(Mx+ s)∣∣∣∣By (2.3.13) and (2.3.6), we calculate∆+hk−1 . . .∆+h1F(Mx+ s) = Mk∆+hk−1 . . .∆+h1F(x) = Mkd∑j=1xjΨj(h1, . . . ,hk−1)46and one may verify that ∆hk−1 . . .∆h1φN (Mx + s) = 0 unless h1, . . . ,hk−1 ∈ [−N/M,N/M ]d.Hence|(N/M)dSN (M, s, α)|2k−1 ≤ (N/M)−kd∑h1,...hk−1∈[−N/M,N/M ]d∣∣∣∣ d∏j=1(∑xje2piiMk∑ri=1 αixjΨij(h1,...,hk−1))∣∣∣∣The result then follows from (2.3.12).In the next step we will use the above lemma to divide S1 into major arc and minor arc. The argumentfollows directly as in [11]. We sketch the argument below.Given η, γ > 0, define the following setsR((N/M)η, (N/M)−γ ;α):=∣∣{(h1, . . . ,hk−1) : hi ∈ [−(N/M)η, (N/M)η]d; ‖Mkα ·Ψj(h1, . . . ,hk−1)‖ ≤ (N/M)−γ ∀1 ≤ j ≤ d}∣∣.Now for fixed h2, . . . ,hk−1, consider the following map from Rd to Rdh→ (Mkα ·Ψ1(h,h2, . . . ,hk−1), . . . ,Mkα ·Ψd(h,h2, . . . 
,hk−1)).Define the following symmetric convex bodyBQ,K = BQ,K,h2,...,hk−1 = {(x,y) : x ∈ [−Q,Q]d, |yj−Mkα·Ψj(x,h2, . . . ,hk−1)| ≤ K−1 ∀1 ≤ j ≤ d}We have R((N/M)η, (N/M)−γ ;α) =∣∣{(x,y) ∈ Z2d : (x,y) ∈ B(N/M)η ,(N/M)−γ}∣∣. Now we statethe following fact which says that this set is essentially d−dimensional object.Lemma 2.3.6 (Davenport [21] Lemma 3.3, [22] Chapter 12). Let L > 1 then|Z2d ∩BL−1N/M,L−1(N/M)−1 | & L−d|Z2d ∩BN/M,(N/M)−1 |Applying this lemma repeatedly in hk−1, . . . ,h1 respectively (with other hi fixed at a time) withL = (N/M)1−θ, 0 < θ < 1, one obtainsR((N/M)θ, (N/M)−k+(k−1)θ, α) & (N/M)−(k−1)d(1−θ)R(N/M, (N/M)−1, α) (2.3.14)Now subdividing [−12 , 12 ]d into small cubes∏dj=1[ijN/M ,ij+1N/M ] of size 1/(N/M). Observe that if twopoints (Mkα ·Ψ1(h1, . . . ,hk−1), . . . ,Mkα ·Ψd(h1, . . . ,hk−1)),(Mkα ·Ψ1(h,h2, . . . ,hk−1), . . . ,Mkα ·Ψd(g,h2, . . . ,hk−1))47are in the same cube, then one has‖Mkα ·Ψj(h− g,h2, . . . ,hk−1)‖ ≤ 1N/M, 1 ≤ j ≤ dHence the number ofh1, . . . ,hk−1 such that (Mkα·Ψ1(h1, . . . ,hk−1), . . . ,Mkα·Ψd(h1, . . . ,hk−1))are in a given cube is bounded above by R(N/M, (N/M)−1;α) for every cube. Hence∑h1,...hk∣∣∣∣ d∏j=1(e2piiMkxjα·Φj(h1,...,hk−1))∣∣∣∣ . ∑h1,...hk∣∣∣∣ d∏j=1min{N/M, 1‖α ·Ψj(h1, . . . ,hk−1)‖}∣∣∣∣(2.3.15). R(N/M, (N/M)−1;α)(∑1≤i≤N/2M(1/i))d. (log(N/M))dR((N/M), (N/M)−1;α) (2.3.16)Combining Lemma 2.3.5, (2.3.14) and (2.3.15), one obtains that for 0 < θ < 1,((N/M)−d|SN (M, s, α)|)2k−1 . (N/M)−(k−1)dθ logd(N/M)× (2.3.17)×∣∣∣∣{h1, . . . ,hk−1 ∈ [−(N/M)θ, (N/M)θ]d : ‖Mkα ·Ψj(h1, . . . ,hk−1)‖ ≤ (N/M)−k+(k−1)θ ∀1 ≤ j ≤ d}∣∣∣∣Now consider the inequality (2.3.17), suppose ∃h1, . . . ,hk−1 such that the matrix (Ψij)1≤i≤r,1≤j≤dhas rank r, i.e. there is a non vanishing r × r minor, which we may assume to be (Ψij)1≤i≤r1≤j≤r. Let qdenote the absolute value of the determinant of (Ψij)1≤i≤r1≤j≤r, since the degree of Ψij is k − 1, we have1 ≤ q ≤ (N/M)r(k−1)θ. Then‖qMkα ·Ψj(h1, . . . 
,hk−1)‖ ≤ q(N/M)−k+(k−1)θ ≤ (N/M)−k+(r+1)(k−1)θThen as in [11], we can find integers a1, . . . , ar such that for i = 1, . . . , r|qMkαi − ai| ≤ (N/M)−k+(k−1)rθNow we divide the torus Tr = (R/Z)r into “Major arcs” and “Minor arcs”. The major arcs is definedasM(θ) =⋃1≤q≤(N/M)(k−1)rθ⋃(a,q)=1Ma,q(θ).Here (a, q) = gcd(a1, . . . , ar, q) andMa,q(θ) := {α ∈ [0, 1]r; |Mkαi − ai/q| ≤ q−1(N/M)−k+(k−1)rθ, ∀1 ≤ i ≤ r}.48The minor arcs m(θ) is defined to be Tr\M(θ). The name comes from the fact that even thoughminor arcs contributes most of the arcs on the “circle”, the contribution to the integral is small fromminor arc which we will verify now.It is easy to see that Ψij(z, . . . , z) = (k− 1)!∂jFi(z). Define ∆ := {(z, · · · , z) : z ∈ Cd} ⊆ Cd(k−1)which is isomorphic to Cd. Assume the first r − 1 columns of (Ψij)1≤i≤r1≤j≤dare linearly independent,let WΦ ⊆ C(k−1)d be the locus of points satisfying the equations saying that the remaining d− r + 1containing these r column is zero. Hence ∆ ∩ WΦ = V ∗F and hence codim(∆) + codimWΦ ≥(k − 1)d− dim(V ∗F ) where codim(∆) = (k − 2)d, socodim(WΦ) ≥ codim(V ∗F ).Now if α /∈M(θ) then we estimate the size of the set on the RHS of (2.3.17) by|Zd(k−1) ∩ [−(N/M)θ, (N/M)θ]d(k−1) ∩WΦ|,which is |(N/M)−θZ)d(k−1) ∩ [−1, 1]d(k−1) ∩ WΦ| by homogeneousity. We estimate this by thenumber of radius ρ = c(N/M)−θ needed to cover [−1, 1]d(k−1) ∩WΦ. We use the following lemma.Lemma 2.3.7 ([34], Chapter 7). Let W ⊆ Cm be a homogeneous algebraic set of topological dimen-sion l and 0 < ρ < 1. Then W ∩ [−1, 1]m can be covered by cρ−l balls of radius ρ.We obtain the minor arc estimateLemma 2.3.8 ([11], Lemma 3.3). If {Mkα} /∈M(θ) then for every τ > 0,|SN (M, s, α)| τ (N/M)d−Kθ+τ (2.3.18)Proof. This is similar to [11]. Suppose Mkα /∈M(θ) then by (2.3.17), we obtain|(N/M)−dSN (M, s, α)|2k−1 . (N/M)−d(k−1)θ∣∣[(−N/M)θ, (N/M)θ]d(k−1) ∩WΦ ∩ Zd(k−1)∣∣ logd(N/M). (N/M)−d(k−1)θ(N/M)θdim(WΦ)+τ= (N/M)−θcodim(V∗F )+τas required.Lemma 2.3.9. 
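Let $0 < \theta, \epsilon < 1$ and let $0 < \eta \le \epsilon r(1-k^{-1})\theta$. Suppose $M \le N^{\frac{\eta}{1+\eta}}$. Then for $\alpha \notin \mathfrak{M}(\theta)$ one has, uniformly in $s \in \mathbb{Z}^d$,

\[ |S_N(M,s,\alpha)| \lesssim_\tau (N/M)^{d-\frac{K}{1+\epsilon}\theta+\tau}, \qquad \forall \tau > 0. \tag{2.3.19} \]

Proof. If $M^k\alpha \in \mathfrak{M}_{a,q}(\theta) \pmod 1$ then there are $q \le (N/M)^{r(k-1)\theta}$ and $a_i \in \mathbb{Z}$ such that $(a_i,q)=1$ and $|M^k\alpha_i - a_i/q| \le q^{-1}(N/M)^{-k+(k-1)r\theta}$. Hence

\[ |\alpha_i - a_i'/q_1| \le q_1^{-1}(N/M)^{-k+(k-1)r\theta} \]

for some $q_1 \le M^k(N/M)^{(k-1)r\theta}$ and $(a_i',q_1)=1$.

Now since $M \le N^{\frac{\eta}{1+\eta}}$, we have $M \le (N/M)^{\eta}$, hence $q_1 \le (N/M)^{k\eta+r(k-1)\theta} \le (N/M)^{(1+\epsilon)r(k-1)\theta}$, where the last step uses $k\eta \le \epsilon r(k-1)\theta$. This implies $\alpha \in \mathfrak{M}((1+\epsilon)\theta)$. By the contrapositive and Lemma 2.3.8, one has (2.3.19).

The first application of Lemma 2.3.9 is the following estimate for Gauss sums.

Lemma 2.3.10 (Gauss sum estimate). Let $q \in \mathbb{N}$, $a \in \mathbb{Z}^r$ with $(a,q)=1$, and $s \in \mathbb{Z}^d$. Define the Gauss sum

\[ S_{a,q}(M,s) := \sum_{x \in \mathbb{Z}_q^d} e^{2\pi i \frac{a \cdot F(Mx+s)}{q}}. \tag{2.3.20} \]

Then if $1 \le M < q^{\epsilon/k}$, one has the following estimate:

\[ |S_{a,q}(M,s)| \lesssim_\tau q^{d-\frac{K}{(1+\epsilon)r(k-1)}+\tau} \qquad (\forall \tau > 0). \tag{2.3.21} \]

In particular, taking $M = 1$, $\epsilon \to 0$ in (2.3.21), one has

\[ S_{a,q}(1,s) = S_{a,q}(1,0) \lesssim_\tau q^{d-\frac{K}{r(k-1)}+\tau}. \]

Proof. Note that

\[ S_{a,q}(M,s) = S_{Mq}(M,s,a/q). \]

Also, if $r(k-1)\theta < 1$ then for all $1 \le q' \le q^{r(k-1)\theta} < q$ and $(a',q')=1$,

\[ \left|\frac{a}{q} - \frac{a'}{q'}\right| \ge \frac{1}{qq'} > \frac{1}{q'}\,q^{-k+r(k-1)\theta}. \]

This implies $\frac{a}{q} \notin \mathfrak{M}(\theta)$. Since $M < q^{\epsilon/k}$, choose $\theta$ so that $r(k-1)\theta < 1$; since $M \le (N/M)^{r(k-1)\theta}$, one has

\[ M < q^{\epsilon/k} < (N/M)^{\epsilon r(1-k^{-1})\theta}. \]

Hence $M < (N/M)^{\eta}$ where $\eta := \epsilon r(1-k^{-1})\theta$, which is the assumption of Lemma 2.3.9. Applying this lemma, we have for each $\tau > 0$,

\[ |S_{a,q}(M,s)| \lesssim_\tau q^{d-\frac{K}{1+\epsilon}\theta+\tau} \lesssim_\tau q^{d-\frac{K}{(1+\epsilon)r(k-1)}+\tau}. \]

Now we will show that the minor arcs contribute little to the integral; this holds if $\operatorname{codim}(V_F^*)$ is large enough, which we will assume.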
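We will apply Lemma 4.4 in [11] (see Lemma 2.3.11 below) to our situation. We make the following assumption:

\[ K := \frac{\operatorname{codim}(V_F^*)}{2^{k-1}} > (1+\epsilon)\,r(r+1)(k-1). \tag{2.3.22} \]

Then we can choose small positive numbers $\delta, \theta_0$ satisfying the conditions

\[ \delta + 2r(r+2)\theta_0 < 1, \tag{2.3.23} \]

\[ 2\delta\theta_0^{-1} < K(1+\epsilon)^{-1} - r(r+1)(k-1). \tag{2.3.24} \]

If $\theta$ is small enough so that $k > 2r(k-1)\theta$ then, as in [11], we can verify the following facts from the definition of the major arcs.

– If $a/q \ne a'/q'$ then $\mathfrak{M}_{a,q}$ and $\mathfrak{M}_{a',q'}$ are disjoint (from condition (2.3.23)).
– $|\mathfrak{M}(\theta)| \le (N/M)^{-rk+r(r+1)(k-1)\theta}$.

We choose $\theta = \theta_T < \theta_{T-1} < \cdots < \theta_0$ and write

\[ \mathfrak{M}(\theta)^C = \mathfrak{M}(\theta_0)^C \cup \bigcup_{i=0}^{T-1} \mathfrak{M}(\theta_i) \setminus \mathfrak{M}(\theta_{i+1}). \]

We use the two facts above and the size of $S_N(M,s,\alpha)$ to bound the integral as in [11]. The bound on $S_N(M,s,\alpha)$ and (2.3.24) show that the integral over $\mathfrak{M}(\theta_0)$ is $O((N/M)^{d-kr-\delta})$. We can also show that the integral over $\mathfrak{M}(\theta_i) \setminus \mathfrak{M}(\theta_{i+1})$ is $O((N/M)^{d-rk-\frac{3}{2}\delta})$, say, using the bound on $|\mathfrak{M}(\theta_i)|$ and the size of $S_N(M,s,\alpha)$ on $\mathfrak{M}(\theta_{i+1})$. Choose $|\theta_{i+1}-\theta_i|$ not too big and $T \ll_\delta 1$; see [11] Lemma 4.3. We obtain

Lemma 2.3.11 ([11], Lemma 4.4). Let $\delta, \theta_0$ satisfy (2.3.23)-(2.3.24) and $0 < \eta \le (1-k^{-1})\theta_0$. Then for $1 \le M \le N^{\frac{\eta}{1+\eta}}$ and $s \in \mathbb{Z}^d$, one has

\[ \int_{\alpha \notin \mathfrak{M}(\theta_0)} |S_N(M,s,\alpha)|\,d\alpha \lesssim_\delta (N/M)^{d-kr-\delta}. \]

Additionally, suppose

\[ \eta < \frac{\delta}{kr}, \tag{2.3.25} \]

which is equivalent to $-\delta + (kr+\delta)\left(1-\frac{1}{1+\eta}\right) < 0$. Then, using $M \le N^{\frac{\eta}{1+\eta}}$ as in the above lemma,

\[ (N/M)^{d-kr-\delta} = N^{d-kr}M^{-d}N^{-\delta}M^{kr+\delta} \le N^{d-kr}M^{-d}N^{-\delta+(kr+\delta)\eta(1+\eta)^{-1}} \le N^{d-rk-\delta'}M^{-d} \]

for some $\delta' > 0$. Hence under the assumptions of Lemma 2.3.11 and (2.3.25), one has

\[ R_N(M,s,v) = \int_{S^1} e^{-2\pi i\alpha\cdot v}S_N(M,s;\alpha)\,d\alpha = \int_{\mathfrak{M}'(\theta_0)} e^{-2\pi i\alpha\cdot v}S_N(M,s;\alpha)\,d\alpha + O(N^{d-rk-\delta'}M^{-d}) \tag{2.3.26} \]

for any set $\mathfrak{M}'(\theta_0) \supseteq \mathfrak{M}(\theta_0)$. Define $\mathfrak{M}'(\theta_0)$ as follows:

\[ \mathfrak{M}'(\theta_0) := \bigcup_{1 \le q \le (N/M)^{r(k-1)\theta_0}}\ \bigcup_{(a,q)=1} \mathfrak{M}'_{a,q}(\theta_0), \tag{2.3.27} \]

\[ \mathfrak{M}'_{a,q}(\theta_0) := \{\alpha \in [0,1]^r : |\alpha_i - a_i/q| \le (N/M)^{-k+r(k-1)\theta_0},\ 1 \le i \le r\}. \tag{2.3.28} \]

Now for given $\alpha \in \mathfrak{M}'_{a,q}(\theta_0)$, we can write

\[ \alpha = a/q + \beta, \qquad |\beta|_\infty \le (N/M)^{-k+r(k-1)\theta_0}. \]

Then if $\alpha \in \mathfrak{M}'_{a,q}(\theta_0)$ we can estimate $S_N(M,s;\alpha)$ in terms of the singular series and the singular integral.

Lemma 2.3.12. Let $0 < \eta \le \frac{1}{2}$, $M \le N^{\frac{\eta}{1+\eta}}$, $s \in \mathbb{Z}^d$.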
Then for α ∈M′a,q(θ0), we haveSN (M, s;α) = NdM−dq−dSa,q(M, s)I(Nkβ) +O(Nd−1+2η+r(k−1)θ0M−d) (2.3.29)whereI(γ) :=∫Rde2piiγ·F(y)1[0,1]d(y)dyProof. Write x := qy + z with z ∈ [0, q)d. We haveSN (M, s, ;α) =∑z∈Zdqe2piia·F(Mz+s)q∑y∈Zde2piiβ·F(qMy+Mz+s)1[0,N ]d(qMy +Mz+ s) (2.3.30)Now for t ∈ [0, 1]d, using that the arguments in the functions are bounded by . N and recall thatq ≤ (N/M)r(k−1)θ0 ,M ≤ (N/M)η, we have|e(β · F(s+Mz+ qM(y + t)))− e(β · F(s+Mz+ qMy))|. |β · (F(s+Mz+ qMy + qMt)−F(s+Mz+ qMy))|∞.k |β|∞Nk−1qM≤ (N/M)−k+(k−1)rθ0Nk−1(N/M)r(k−1)θ0(N/M)η≤ N−k+(k−1)rθ0+(k−1)+r(k−1)θ0+η = N−1+2(k−1)rθo+η52where we used N/M ≤ N . Now observe thatφN (s+Mz+ qMy) 6= φ(s+Mz+ qMy + qMt)⇐⇒ y ∈ E :=([0, N/qM ]d − (s+Mz)/qM)∆([0, N/qM ]d − (s+Mz)/qM − t).Since we can write E as a union of d boxes with one side has length O(1), the number of y for whichφN (s+Mz+ qMy) 6= φN (s+Mz+ qMy + qMt) is bounded above by |E| . (N/qM)d−1.Hence we can replace the inner sum with the integral, with the error∣∣∣∣ ∫y∈Rde2piiβ·F(qMy+Mz+s)1[0,N ]d(qMy +Mz+ s)dy −∑y∈Zde2piiβ·F(qMy+Mz+s)1[0,N ]d(qMy +Mz+ s)∣∣∣∣=∣∣∣∣ ∑y∈Zd∫t∈[0,1]d(e(β · F(s+Mz+ qMy + qMt))φN (s+Mz+ qMy + qMt)− e(β · F(s+Mz+ qMy)φN (s+Mz+ qMy))dt∣∣∣∣.∑y∈E1 +∑y/∈E∣∣∣∣ ∫t∈[0,1]d(e(β · F(s+Mz+ qMy + qMt)− e(β · F(s+Mz+ qMy))dtφN (s+Mz+Mqy)∣∣∣∣. (N/qM)d−1 + (N/qM)dN−1+2r(k−1)θ0+η = O((N/qM)dN−1+2r(k−1)θ0+η).By a change of variables N−1(qMy +Mz+ s) 7→ y, we have∫y∈Rre2piiβ·F(qMy+Mz+s)1[0,N ]r(qMy +Mz+ s)dy = NdM−dq−dI(Nkβ)Substituting the estimate in (2.3.30) and summing over z ∈ Zdq , we have (2.3.29).The Singular IntegralLet µ ∈ Rr and Φ > 0. Recall I(γ) := ∫Rd e2piiγ·F(y)1[0,1]d(y)dy, writeJ(µ; Φ) =∫|γ|∞≤ΦI(γ)e−2piiγ·µdγJ(µ) := limΦ→∞J(µ; Φ) (2.3.31)Lemma 2.3.13 ([11], Lemma 5.2, Lemma 5.3, section 6). J(µ) exists, continuous and uniformlybounded by ∫Rr|I(γ)|dγ <∞53Furthermore if F(Mx + s) = v has a nonsingular real solutions in [δ, 1 − δ]d then J(N−kv) ≥c(δ) > 0.Proof. (Sketch): Let B be a box of size less than 1 and λ ∈ Rr. 
DefineI(B, λ) =∫Be2piiλ·F(y)dy = (N/M)−d∫(N/M)Be2pii(N/M)−kλ·F(y)dyThen the claim in Lemma 2.3.13 follows from the bound|I(B, λ)| . Cε(1 + |λ|∞)−K(k−1)r+ε (2.3.32)To see this bound, assume |λ|∞ > 1 in the I(B, λ). Let α = (N/M)−kλ and choose θ so that|λ|∞ = (N/M)r(k−1)θ. Thus, we have α ∈ M′0,1(θ) is on the edge and α /∈ M(θ′) ∀θ′ < θ. Applythe bound on minor arc of θ′ and use the fact that α ∈ M0,1(θ) with (2.3.29) to estimate the sum bythe integral. Note that r(k − 1)θ ≤ 1 if N/M ≥ |λ|2/k∞ . Since α /∈M(θ′), we have|SN (M, s, α)| . (N/M)d+ε((N/M)k|α|∞)−K(k−1)r . (N/M)d+ε|λ|−K(k−1)r∞and as in the proof of Lemma 2.3.12,|(N/M)dI(B, λ)− SN (M, s, α)| . (|α|∞(N/M)k−1)(N/M)d + (N/M)d−1 . ( |λ|∞N/M)(N/M)d.Choosing large enough N that is N/M ≥ |λ|1+K(k−1)r∞ so that |λ|∞N/M ≤ |λ|− K(k−1)r∞ , we obtain|I(B, λ)| .ε |λ|− K(k−1)r+ε∞To see that J(u) is positive, first consider the contribution of singular points; cover [0, 1]d with boxesof sidelength δ. cover C = V ∗F ∩ [0, 1]d with. δ−dim(V∗F ) cubes of size δ.We can show using (2.3.32)that the contribution from these cubes is . δd−k−dim(V ∗F ) which is negligible if d > k + dim(V ∗F )which can be verified. Now we consider only non-singular points y. Let B = [0, 1]d\C. Take localcoordinate ur+1, . . . , ud such that the Jacobian|JacF | =∣∣∣∣∂(f1, . . . , fr, ur+1, . . . , ud)∂(x1, . . . xd)∣∣∣∣54is a nonzero. Following the calculations in [11] Lemma 6.3 (skipping some technical details),we haveJ(u) =∫F(y)=u,y∈Bφ(y)dσ(y)where dσF = dSF/|JacF |. Here dSF is the surface measure of F(y) = u.Let Bδ be the closed ball, suppose y0 ∈ (0, 1)d,the interior of [0, 1]d, y0 + Bδ ⊆ [0, 1]d is suchthat F(y0) = u, |JacF (y0)| 6= 0. Then |JacF (y)| ≥ c(δ,y0) > 0 for all y ∈ y0 + Bδ and{F(y) = u} ∩ {y0}+Bδ is a d− r dimensional surface with positive measure. 
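Hence

\[ J(u) \ge \int_{\{F(y)=u\}\cap(y_0+B_\delta)} \frac{dS_F(y)}{|\mathrm{Jac}_F(y)|} \ge \eta(F,y_0) > 0. \]

To get a bound independent of $y_0$, we need such a non-singular $y_0 \in [\delta,1-\delta]^d$ (a closed set in the interior).

The Singular Series

We define the singular series

\[ G(M,s;v) := \sum_{q=1}^{\infty} q^{-d} \sum_{(a,q)=1} e^{-2\pi i\frac{a\cdot v}{q}}\,S_{a,q}(M,s). \tag{2.3.33} \]

By the assumption (2.3.24) we have

\[ \frac{2\delta}{r(k-1)\theta_0} < \frac{K}{(1+\epsilon)r(k-1)} - r - 1. \]

Then using the Gauss sum estimate (2.3.21), the fact that there are at most $q^r$ choices of $a$ for each $q$, and recalling $M \le N^{\frac{\eta}{1+\eta}}$, one has

\[ \sum_{q \ge (N/M)^{r(k-1)\theta_0}}\ \sum_{(a,q)=1} q^{-d}|S_{a,q}(M,s)| \lesssim_\tau \sum_{q \ge (N/M)^{r(k-1)\theta_0}} q^{-1-\frac{2\delta}{r(k-1)\theta_0}+\tau} \lesssim_\tau (N/M)^{-2\delta+\tau} \lesssim_\tau N^{-2\delta+\delta\eta+\tau} \lesssim N^{-\delta}. \tag{2.3.34} \]

Hence the infinite sum defining $G(M,s;v)$ is absolutely convergent.

Finally we analyze the singular series, expressing it in terms of the density of solutions in $\mathbb{Z}_{p^l}$.

Theorem 2.3.14 (Singular Series). Consider the singular series

\[ G(M,s;v) := \sum_{q=1}^{\infty}\sum_{(a,q)=1} q^{-d}e^{-2\pi i\frac{a\cdot v}{q}}\,S_{a,q}(M,s), \qquad S_{a,q}(M,s) := \sum_{x\in\mathbb{Z}_q^d} e^{2\pi i\frac{a\cdot F(Mx+s)}{q}}. \tag{2.3.35} \]

We have

\[ G(M,s,v) = \prod_{p\ \mathrm{prime}} \sigma_p(M,s,v), \qquad \text{where } \sigma_p(M,s;v) = \lim_{l\to\infty}\sigma_p^{(l)}(M,s;v) \]

and

\[ \sigma_p^{(l)}(M,s,v) = p^{-l(d-r)}\,|\{x \in \mathbb{Z}_{p^l}^d : F(Mx+s) \equiv v \pmod{p^l}\}|. \]

The infinite product converges absolutely and uniformly in $v$, and the singular series is positive if $\sigma_p$ is positive for all $p$. A sufficient condition is that $F(Mx+s) = v$ has a non-singular solution $\pmod p$ for every $p$.

Proof. (Sketch) Since the summand

\[ q^{-d}e^{-2\pi i\frac{a\cdot v}{q}}\,S_{a,q}(M,s) \]

is multiplicative in $q$ (by multiplicativity of the Gauss sum), we can formally write

\[ G(M,s,v) = \prod_{p\ \mathrm{prime}} \sigma_p(M,s,v), \]

where

\[ \sigma_p(M,s,v) = \sum_{m=0}^{\infty} p^{-md}\sum_{(a,p^m)=1} e^{-2\pi i\frac{a\cdot v}{p^m}}\,S_{a,p^m}(M,s). \]

If $p$ is sufficiently large, we may apply the Gauss sum estimate (2.3.21) together with assumption (2.3.22) to obtain

\[ \sigma_p(M,s,v) = 1 + \sum_{m=1}^{\infty} O\!\left(p^{\left(-1-\frac{1}{(1+\varepsilon)r(k+1)}+\tau\right)m}\right) = 1 + O(p^{-\delta'}) \tag{2.3.36} \]

for some $\delta' > 1$, and hence the product is absolutely and uniformly convergent in $v$.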
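The two descriptions of the local factor used in Theorem 2.3.14 and its proof (a sum of normalized Gauss sums over moduli $p^m$, and a normalized count of solutions modulo $p^l$) can be compared numerically at a finite level $l$. The following is a minimal sketch, not part of the original argument: the form $F(y) = y_1^2 + y_2^2$ (so $r = 1$, $d = 2$) and the function names are illustrative assumptions, and everything is computed by brute force.

```python
import cmath
from itertools import product
from math import gcd


def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def F(y):
    # Hypothetical example form (r = 1, k = 2, d = 2); not taken from the thesis.
    return y[0] ** 2 + y[1] ** 2


def gauss_sum(a, q, M, s, d=2):
    """S_{a,q}(M, s) = sum over x in (Z/q)^d of e(a * F(Mx + s) / q), for r = 1."""
    return sum(
        e(a * F([M * xi + si for xi, si in zip(x, s)]) / q)
        for x in product(range(q), repeat=d)
    )


def sigma_via_sums(p, l, M, s, v, d=2):
    """Truncated local factor: sum over m <= l of p^{-md} * sum_a e(-a v / p^m) S_{a,p^m}."""
    total = 0
    for m in range(l + 1):
        q = p ** m
        for a in range(q):
            if gcd(a, q) == 1:  # for m = 0 this keeps the single term a = 0, q = 1
                total += q ** (-d) * e(-a * v / q) * gauss_sum(a, q, M, s, d)
    return total


def sigma_via_count(p, l, M, s, v, d=2, r=1):
    """p^{-l(d-r)} * #{x in (Z/p^l)^d : F(Mx + s) = v mod p^l}."""
    q = p ** l
    count = sum(
        1
        for x in product(range(q), repeat=d)
        if (F([M * xi + si for xi, si in zip(x, s)]) - v) % q == 0
    )
    return count / p ** (l * (d - r))
```

For small parameters such as `p = 3`, `l = 2` the two functions should agree up to floating-point error, which reflects the orthogonality of additive characters modulo $p^l$.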
Finally, we do a routine calculation as in [11]:

\[ \sigma_p^{(l)}(M,s;v) = \sum_{m=0}^{l} p^{-md}\sum_{(a,p^m)=1} e^{-2\pi i\frac{a\cdot v}{p^m}}\,S_{a,p^m}(M,s) = p^{-l(d-r)}\,|\{x\in\mathbb{Z}_{p^l}^d : F(Mx+s) \equiv v \pmod{p^l}\}|. \tag{2.3.37} \]

To verify positivity, note that by the bound (2.3.36) we have

\[ \Big|\prod_{p\ge P}\sigma_p(M,s,v) - 1\Big| \lesssim \sum_{p\ge P}|\sigma_p(M,s,v)-1| \le P^{-\delta''}, \qquad \delta'' > 0. \]

Next we check the positivity of $\sigma_p(M,s,v)$ for small $p$ when there is a nonsingular solution $\pmod p$. Arguing as in [11], or as in the proof of Lemma 2.4.5 below, shows that this is indeed the case.

Now we can prove the main theorem of this section.

Proof of Theorem 2.3.4. First we claim that if $0 < \epsilon \le 1$ satisfies $\epsilon < \frac{K}{r(r+1)(k-1)} - 1$ and $\eta > 0$ is such that

\[ \eta < \frac{1}{4r(r+2)k}\,\min\left\{\epsilon,\ \frac{K-(1+\epsilon)r(r+1)(k-1)}{rk(1+\epsilon)}\right\}, \tag{2.3.38} \]

then (2.3.10) holds for $1 \le M \le N^{\frac{\eta}{1+\eta}}$ and $s \in \mathbb{Z}^d$.

If we have this claim, since $K > r(r+1)(k-1)$ we have $\epsilon > \frac{K}{r(r+1)(k-1)} - 1 \ge \frac{1}{r(r+1)(k-1)}$, then

\[ \frac{K-(1+\epsilon)r(r+1)(k-1)}{rk(1+\epsilon)} \ge \frac{1}{rk}. \]

Hence choosing $\epsilon$ slightly larger than $\frac{1}{r(r+1)(k-1)}$, one has $\eta \le \frac{1}{4r^2(r+1)(r+2)k^2}$ by (2.3.38). Now under assumption (2.3.11), that is $K > 2r(r+1)(k-1) + 2rk$, we take $\epsilon$ slightly larger than 1; then $\eta \le \frac{1}{4r(r+2)k}$. So we now only need to verify the claim.

Set the parameters $\theta_0$ and $\delta$ as

\[ \theta_0 := \frac{1}{2r(r+2)k+1}, \qquad \delta := \frac{\theta_0}{2}\,\min\left\{1,\ \frac{K}{1+\epsilon} - r(r+1)(k-1)\right\}. \]

Then $\theta_0, \delta$ satisfy (2.3.23), (2.3.24). Set $\eta$ as in (2.3.38) above. We have

\[ \eta < (1-k^{-1})\theta_0 \quad\text{and}\quad \eta < \delta k^{-1}r^{-1}. \]

Hence the condition of Lemma 2.3.11 and the condition (2.3.25) are satisfied.
Also note that2r(k − 1)θ0 + η ≤ k − 1k(r + 2)+14r(r + 2)k≤ 1r + 2(1− 34k) <13.Hence|M′(θ0)| ≤ (N/M)(r+1)r(k−1)θ0−rk+rη ≤ N−rk+2/3 (2.3.39)By (2.3.26) we have (for some δ′ = δ′(r, k) > 0) ,RN (M, s,v) =∑q≤(N/M)r(k−1)θ0∑a,(a,q)=1∫M′(a,q)e−2piiα·vSN (M, s;α)dα+O(Nd−rk−δ′M−d)57By Lemma 2.3.12 and the size of major arc (2.3.39), this becomesNdM−d( ∑q≤(N/M)r(k−1)θ0q−de−2piia·vq Sa,q(M, s)∫|βi|≤(N/M)−k+r(k−1)θ0e−2piiβ·vI(Nkβ)dβ+O(N rk−13+2r(k−1)θ0+η))Rescaling β := Nkβ, this becomesNd−rkM−d( ∑q≤(N/M)r(k−1)θ0q−de−2piia·vq Sa,q(M, s)J(N−kv;Mk(N/M)r(k−1)θ0) +O(N−δ′))Applying (2.3.31) and (2.3.34) we haveRN (d, s,v) = Nd−rkM−dG(d, s;v)J(N−kv) +O(Nd−rk−δ′M−d) (2.3.40)as required.2.4 Almost Prime Solutions to Diophantine EquationsFor 0 < ε < 1 and N ≥ 1 let Pε[N ] denote the set of natural numbers m ≤ N such that each primedivisor of m is at least N ε. Note that each m ∈ Pε[N ] at most b1/εc prime factors. We call sets ofthe form Pε[N ] “almost prime”. For given v ∈ Zd, letMεF [N ] := |{x ∈ Pε[N ]d; F(x) = v}|,denote the number of almost prime solutions x ∈ [1, N ]d to the system F(x) = v. Let Udpt denotethe multiplicative group of reduced residue classes (mod pt). Let M(pt,v) represents the numberof solutions to the equation F(x) = v in Udpt .For each prime p, define the local densityσ∗p(v) := limt→∞(pt)rM(pt,v)φ(pt)d(2.4.1)provided the limit exists. As almost primes are concentrated in reduced residue classes, the generallocal to global principle suggests thatMεF [N ] ≈ε Nd−kr (log N)−dJ(N−kv)∏pσ∗p(v), (2.4.2)as N →∞ where J(u) is the singular integral. Our main result in this section is the following.Theorem 2.4.1. Let F = (F1, . . . , Fr) be a system of r integral forms of degree k ≥ 2 in d variablessuch that58Rank(F) > r(r + 1)(k − 1)2k−1. (2.4.3)Then there exists a constant ε = ε(n, k) > 0 such thatMεF [N ] ≥ cd,k Nd−kr (log N)−dJ(N−kv)∏pσ∗p(v). 
(2.4.4)

Moreover, if $F(x) = v$ has a nonsingular solution in $U_p$, the $p$-adic integer units, for all primes $p$, then $\prod_p \sigma_p^*(v) > 0$.

The key to proving Theorem 2.4.1 is to study a weighted sum over the solutions, with weights that are concentrated on numbers having few prime factors. Such weights were mentioned in Section 2.2; we recall them here. For given $0 < \eta < 1$, let $R := N^{\eta}$ and let $\chi$ be a smooth compactly supported function. Define

\[ \Lambda_R(m) := \sum_{d \mid m} \mu(d)\,\chi\!\left(\frac{\log d}{\log R}\right). \]

We will also employ the "$W$-trick" to bypass the contribution of small primes in our initial asymptotic formulas. Let $\omega = \omega_F > 1$ be a fixed positive integer depending only on the system $F$ and let $W := \prod_{p\le\omega} p$, the product of primes up to $\omega$. Note that if $x \in P_\varepsilon[N]^d$ then $p \mid x_i$ implies $p \ge N^{\varepsilon} > \omega_F$ for sufficiently large $N$, hence $(x_i,W) = 1$ for each $1 \le i \le d$. We will write $(x,W) = 1$ in this case. Under the conditions of Theorem 2.4.1 our key estimates are the following.

Theorem 2.4.2. Let $F = (F_1,\dots,F_r)$ be a system of $r$ integral forms of degree $k \ge 2$ in $d$ variables satisfying the rank condition (2.3.8). Let $0 < \eta < \frac{1}{4r^2(r+1)(r+2)k(k+1)}$, $R = N^{\eta}$ and $W = \prod_{p\le\omega} p$. Then one has

\[ \sum_{\substack{x\in[N]^d \\ (x,W)=1,\ F(x)=v}} \Lambda_R^2(x_1x_2\cdots x_d) = \frac{N^{d-rk}}{(\log R)^d}\,J(N^{-k}v)\prod_{p\mid W}\sigma_p^*(v)\,\bigl(1+o_{N,W\to\infty}(1)\bigr). \tag{2.4.5} \]

Moreover, for given $0 < \varepsilon < \eta$,

\[ \sum_{\substack{x\in[N]^d,\ x\notin P_\varepsilon[N]^d \\ F(x)=v}} \Lambda_R^2(x_1x_2\cdots x_d) \lesssim \frac{\varepsilon}{\eta}\,\frac{N^{d-rk}}{(\log R)^d}\,J(N^{-k}v)\prod_{p\mid W}\sigma_p^*(v). \tag{2.4.6} \]

In the proof of Theorem 2.4.2 we will use the asymptotic for the number of integer solutions $x \in [N]^d$ to $F(x) = v$ subject to the congruence condition $x \equiv s \pmod M$, where $M$ is a small modulus bounded by a sufficiently small power of $N$. This is summarized in the main theorem of Section 2.3.

Local Factors of Integral Forms

The following proposition summarizes the properties of the Euler factors we will need.
Recall the density of solutions in the $p$-adic numbers,

\[ \sigma_p(M,s,v) = \lim_{l\to\infty}\sigma_p^{(l)}(M,s,v), \]

where

\[ \sigma_p^{(l)}(M,s,v) = p^{-l(d-r)}\,|\{x\in\mathbb{Z}_{p^l}^d : F(Mx+s) \equiv v \pmod{p^l}\}|. \]

Define the Euler factor that will appear in the asymptotic of the sum in (2.4.5):

\[ \gamma_p(v) := \frac{p^{-d}}{\sigma_p(v)} \sum_{\substack{s\in\mathbb{Z}_p^d \\ F(s)\equiv v \pmod p}} 1_{p\mid s_1\cdots s_d}\,\sigma_p(p,s,v). \]

The key property we will need is

Proposition 2.4.3. If $F$ is a family of $r$ integral forms of degree $k$ such that $\operatorname{rank}(F) > r(r+1)(k-1)2^k$, then for all sufficiently large primes $p > \omega_F$ we have

\[ \gamma_p(v) = \frac{d}{p} + O(p^{-2}). \tag{2.4.7} \]

We start with a simple observation on the local densities of solutions.

Lemma 2.4.4. Let $M, W$ be square-free numbers such that $(M,W) = 1$ and let $p$ be a prime. If $(p,MW) = 1$ then

\[ \sigma_p(MW,t,v) = \sigma_p(v). \tag{2.4.8} \]

If $p \mid M$ and $t \equiv s \pmod M$ then one has

\[ \sigma_p(MW,t,v) = \sigma_p(p,t,v) = \sigma_p(p,s,v). \tag{2.4.9} \]

The analogous statement holds when we interchange $M$ and $W$.

Proof. For any $l \in \mathbb{Z}^+$, since $(p^l,MW) = 1$, the transformation $x \mapsto MWx + t$ is a bijection on $\mathbb{Z}_{p^l}^d$. Hence

\[ |\{x\in\mathbb{Z}_{p^l}^d : F(x)\equiv v \pmod{p^l}\}| = |\{x\in\mathbb{Z}_{p^l}^d : F(MWx+t)\equiv v \pmod{p^l}\}|, \]

which gives (2.4.8). Next assume $p \mid M$. Then $M = pM'$ with $(p,M'W) = 1$, so one may write $MWx + t = p(M'Wx) + t$, and $x \mapsto M'Wx$ is a bijection on $\mathbb{Z}_{p^l}^d$. Hence

\[ |\{x\in\mathbb{Z}_{p^l}^d : F(MWx+t)\equiv v \pmod{p^l}\}| = |\{x\in\mathbb{Z}_{p^l}^d : F(px+t)\equiv v \pmod{p^l}\}|, \]

which establishes the first equality in (2.4.9). To see the second equality of (2.4.9) we write $py + t = p(y+u) + s$, where $y \mapsto y + u$ is a bijection on $\mathbb{Z}_{p^l}^d$.

Recall that if the local factor $\sigma_p(p,s,v)$ does not vanish then $F(s) \equiv v \pmod p$. We call a point $s \in \mathbb{Z}_p^d$ non-singular if the Jacobian $\mathrm{Jac}_F(s)$ has full rank $r$ over $\mathbb{Z}_p$. We show that under this rank condition, it is easy to calculate $\sigma_p^{(l)}(p,s,v)$ explicitly.

Lemma 2.4.5. Let $s$ be a non-singular solution to $F(s) \equiv v \pmod p$. Then for all $l$,

\[ \sigma_p^{(l)}(p,s,v) = p^r. \]

Proof. We do induction on $l$. For $l = 1$, we have $F(px+s) \equiv F(s) \equiv v \pmod p$ for all $x \in \mathbb{Z}_p^d$, so every one of the $p^d$ residues $x$ is counted. Hence $\sigma_p^{(1)}(p,s,v) = p^{-(d-r)}p^d = p^r$.
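Before the induction continues, the statement of Lemma 2.4.5 can be sanity-checked by brute force on a small example. The sketch below is an illustration under stated assumptions, not part of the proof: the quadratic form $F(y) = y_1^2 + y_2^2 + y_3^2$ (so $r = 1$, $d = 3$), the prime $p = 3$, the point $s = (1,0,0)$ and the helper name `local_density` are all hypothetical choices. The point $s$ is non-singular modulo 3 because the gradient $2s$ does not vanish there.

```python
from fractions import Fraction
from itertools import product


def local_density(forms, p, l, s, v):
    """Brute-force sigma_p^{(l)}(p, s, v) = p^{-l(d-r)} * #{x in (Z/p^l)^d : F(px + s) = v mod p^l}.

    `forms` is a list of r integer-valued functions of a length-d list y.
    """
    d, r = len(s), len(forms)
    q = p ** l
    count = 0
    for x in product(range(q), repeat=d):
        y = [p * xi + si for xi, si in zip(x, s)]
        if all((f(y) - vi) % q == 0 for f, vi in zip(forms, v)):
            count += 1
    return Fraction(count, p ** (l * (d - r)))


# Hypothetical example: one quadratic form in three variables, p = 3.
forms = [lambda y: y[0] ** 2 + y[1] ** 2 + y[2] ** 2]
p, s, v = 3, (1, 0, 0), (1,)

# Densities at levels l = 1, 2, 3; Lemma 2.4.5 predicts p^r = 3 at every level.
densities = [local_density(forms, p, l, s, v) for l in (1, 2, 3)]
```

Exact rational arithmetic via `Fraction` avoids any rounding in the normalization. The search space has $p^{ld}$ points, so this kind of check is only feasible for very small $p$, $l$ and $d$.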
For l = 2, we count x ∈ Zdp2 satisfyingF(px+ s) ≡ F(s) + pJacF (s) · x ≡ v (mod p2).Since F(s)− v = pu for some u ∈ Zdp, this isJacF (s) · x ≡ −u (mod p).Since JacF (s) has full rank, the above equation has pn−r solutions in Zdp and p2n−r solutions in Zdp2 .Hence σ(2)p (p, s,v) = pr.For l ≥ 3, we showσ(l)p (p, s,v) = σ(l−1)p (p, s,v)Now if x ≡ y (mod pl−1) then px+s ≡ px+y (mod pl) then F(px+s) ≡ F(py+s) (mod pl).For given y ∈ Zdpl−1 , we can uniquely write y = pl−2u+ z with z ∈ Zdpl−2 and u ∈ Zdp. ThenF(py + s) ≡ F(pl−1u+ pz+ s) ≡ F(pz+ s) + pl−1JacF (s) · u (mod pl) (2.4.10)Hence F(py + s) ≡ v (mod pl) impliesF(pz+ s) ≡ v (mod pl−1). (2.4.11)The number of such z ∈ Zdpl−2 is p−d × p(l−1)(d−r)σ(l−1)p (p, s,v). For a given z satisfying (2.4.10),61write F(pz+ s) = pl−1b+ v, then (2.4.10) holds if and only ifJacF (s) · u ≡ −b (mod p) (2.4.12)Since JacF (s) has full rank r over Zdp, the number of solutions of (2.4.12) is pd−r. Since the decom-position y = pl−2u+ z is unique it follows thatσ(l)p (p, s,v) = p−l(d−r) × |{x ∈ Zdpl ;F(px+ s) ≡ 0 (mod pl)}|= p−l(d−r)pd × |{x ∈ Zdpl−1 : F(px+ s) ≡ 0 (mod pl)}|= p−l(d−r)pdp−dp(l−1)(d−r)σ(l−1)p (p, s,v)pd−r= σ(l−1)p (p, s,v)as required.For singular values we can only obtain an upper bound for σp(p, s,v). If s = v = 0, we haveF(px) = pkF(x) ≡ 0 (mod pl) which has≈ p(l−k)(d−r)+kd solutions inZdpland hence σ(l)p (p, s,v) ≈pkr.Lemma 2.4.6. Let F be a family of r integral linear forms of degree k, assume the rank conditioncodim(V ∗F ) ≥ r(r + 1)(k − 1)2k + 1 (2.4.13)then uniformly in l ∈ N and s ∈ Zdp, one hasσ(l)p (p, s,v) . pr2k (2.4.14)Proof. By (2.3.37), we haveσ(l)p (p, s,v) =l∑m=0∑b∈Zrpm(bi,p)=1∃ip−mde2piib·F(px+s)pm Sb,pm(p, s)here Sb,pm is the exponential sum defined in (2.3.20). Ifm > rk then the Gauss sum estimate (Lemma2.3.10) applied with  = 1/r and K = codim(V ∗F )/2k−1 gives∑m>rk∑b∈Zrpm(bi,p)=1∃ip−mn|Sb,pm(p, s)| .∑m>rkpmrp− mK(r+1)(k−1) +τ .∑m>rkp−mτ/2 . 1for sufficiently small τ = τ(r, k) := K(r+1)(k−1) − r > 0 (i.e. 
sufficiently small V ∗F ). Here we apply62the condition (2.4.13); here τ is the constant such thatcodim(V ∗F ) = (r − τ)(r + 1)(k − 1)2k−1Now using the trivial bound p−md|Sb,pm(p, s)| ≤ 1. We haveσ(l)p (p, s,v) =∑m≤rk∑b∈Zrpm(bi,p)=1∃ip−mde2piib·F(px+s)pm Sb,pm(p, s) +O(1) ≤∑m≤rkpmr . pr2kThis proves (2.4.14).2.4.2 Proof of Theorem 2.4.3Since σp(v)−1 = 1 + O(p−2) for all sufficiently large prime p, it suffices to show that (2.4.7) holdsfor σp(v)γp(v). Now use Lemma 2.4.5 to writeσp(v)γp(v) = p−d ∑F(s)≡v (mod p)1p|s1...sdσp(p, s,v)= p−d∑F(s)≡v (mod p)1p|s1...sdσp(p, s,v)= p−d∑F(s)≡0 (mod s)s non-singular1p|s1...sdσp(p, s,v) + p−d ∑F(s)≡0 (mod s)s singular1p|s1...sdσp(p, s,v)= p−d+r∑F(s)≡0 (mod p)1p|s1...sd − p−d+r∑F(s)≡0 (mod p)s singular1p|s1...sd+ p−d∑F(s)≡0 (mod p)s singular1p|s1...sdσp(p.s,v):= γ1p(v) + γ2p(v) + γ3p(v)Now we need the following facts from algebraic geometry on the singular variety when considerreduced (mod p).– [95] The codimension of singular variety does not change when the equation defining the va-riety are considered (mod p). Let V ∗F (p) denote the locus of singular points s ∈ Zdp of the(mod p)−reduced singular variety VF (p) = {s ∈ Zdp : F(s) = v} thencodim(V ∗F (p)) = codim(V∗F )63for all but finitely many primes p.– ([61], Prop. 12.1) The number of points over Zp on a homogeneous algebraic set V is boundedabove by its degree times pdimV .From these two facts, one has|V ∗F (p)| . pd−codim(V∗F )where the implicit constant may depend on n, k, r. For sufficiently large p > ω. Also we state somemore facts from algebraic geometry which we use later.Lemma 2.4.7 ([19] Cor 4.). If F is a system of r forms then for any subspace MJ of codimension |J |one hasrank(F|MJ ) ≥ rank(F)− r|J |Lemma 2.4.8 ([20], Prop 4.). 
Let v ∈ Frp and S = F−1(v) where F : Fdp → Frp is a homogeneouspolynomial map of degree k then‖1S − p−r‖Uk ≤ (k − 1)2−kdp2−k(r−codim(V ∗F ))In particular|p−d|S| − p−r| = ‖1S − p−r‖U1 ≤ ‖1S − p−r‖Uk .k,d p2−k(r−codim(V ∗F )) (2.4.15)Apply lemma 2.4.6 one has for i = 2, 3,|γip(v)| . p−d+r2k∑F(s)≡0 (mod p)s singular1p|s1...sd . p−d+r2kpd−codim(V∗F ) . pr2k−r(r+1)(k−1)2k−1−1 . p−2For each J ⊆ [1, d] define the coordinate subspace MJ = {s = (s1, . . . , sd) ∈ Zdp : sj = 0 ∀j ∈ J}.by inclusion-exclusion principle, one hasγ1p(v) = p−d+rd∑j=1(−1)j−1∑|J |=j∑s∈MJ1F(s)=v (2.4.16)From Lemma 2.4.7 and our assumption on the rank of the system F , one has that for 2 ≤ |J | ≤ r+1,rank(F|MJ )−r ≥ r(r+1)(k−1)2k−r(r+2) = r[(r+1)(k−1)2k−(r+2)]−r ≥ r2k (2.4.17)Applying Lemma 2.4.8 to the system F restricted to the subspace Mj ∼= Zd−jp , then one obtains64p−(d−j)+r|{s ∈MJ ;F(s) = v}| = 1 +Ok,d(p−rank(F|MJ )−r2k+r) = 1 +O(p−ε′) (2.4.18)for some ε′ ≥ 0. Hence from (2.4.16), we haveσ1p(v) = p−d+rd∑k=1∑s∈M{k}1F(s)=v + p−d+rr+1∑j=2(−1)j−1∑|J |=j∑s∈Mj1F(s)=v+ p−d+rd∑j=r+2(−1)j−1∑|J |=j∑s∈Mj1F(s)=vFor the second term corresponding to 2 ≤ j ≤ r+ 1, by (2.4.18), the total sum contributes O(p−j) =O(p−2). For the third term we use the the trivial fact |Mj | = pd−j ≤ pd−r−2 and hence the third termalso contributes O(p−2). Henceσ1p(v) = p−d+rd∑k=1∑s∈M{k}1F(s)=v +O(p−2) =dp+O(p−2)as required.2.4.3 Sums of Multiplicative FunctionsLet (b,W ) = 1 and furthermore let us assume the conditions of Theorem 2.4.2 holds. DefineSN,W,b(v) :=∑x≡b (mod W )F(x)=vΛ2R(x1x2 . . . xd)1[0,N ]d(x) (2.4.19)and for a prime q > ω,SN,W,b,q(v) :=∑x≡b (mod W )F(x)=v1q|x1...xdΛ2R(x1x2 . . . 
xd)1[0,N ]d(x) (2.4.20)First we show that these sums could be written in terms of Euler factors and Goldston-Yidirim sums.The proof invokes Theorem 2.3.4.Lemma 2.4.9.SW,b(N) = Nd−krJ(N−kv)W−dGW,b(v)∑′D;(D,W )=1hD(R)γD(v) +O(Nd−rk−δ) (2.4.21)65SW,q,b(N) = Nd−krJ(N−kv)W−dGW,b(v)∑′D;(D,W )=1hD(R)γ[D,q](v) +O(Nd−rk−δ) (2.4.22)whereGW,b(v) :=∏p|Wσp(p,b,v)∏p-Wσp(v) (2.4.23)γD(v) := D−d ∑s∈ZdDF(s)≡v (mod D)1D|s1...sd∏p|Dσp(p, s,v)σp(v)(2.4.24)hD(R) :=∑[d1,d2]=Dµ(d1)µ(d2)χ(log d1logR)χ(log d2logR) (2.4.25)Proof. By definition (2.4.19),SW,b(N) : =∑x≡b (mod W )F(x)=vΛ2R(x1x2 · · ·xd)1[0,N ]d(x)=∑x≡b (mod W )F(x)=v1[0,N ]d(x)∑′d1,d2[d1,d2]|x1···xdµ(d1)µ(d2)χ(log d1logR)χ(log d2logR)=∑′D∑d1,d2[d1,d2]=Dµ(d1)µ(d2)χ(log d1logR)χ(log d2logR)∑x≡b (mod W )F(x)=v1D|x1···xd1[0,N ]d(x)(2.4.26)Since (b,W ) = 1, the inner sum of the last line of (2.4.26) is zero unless (D,W ) = 1 which we as-sume from now on. The condition x ≡ b (mod W ) andD|x1 . . . xd depends only on x (mod DW )thus one may write∑x≡b (mod W )F(x)=v1D|x1···xd1[0,N ]d(x) =∑t∈ZdDW ,t≡b (mod W )F(t)≡v (mod DW )1D|t1...td∑x≡t (mod DW )F(x)=v1[0,N ]d(x)(2.4.27)66Since D ≤ R2 ≤ N η1+η , apply Theorem 2.3.4,we can write (2.4.27) as∑t∈ZdDW ,t≡b (mod W )F(t)≡v (mod DW )1D|t1...td(Nd−kr(DW )−nJ(N−kv)∏pσp(DW, t,v) +O(Nd−kr−δ′D−d))(2.4.28)First we estimate the contribution of error term in (2.4.28) to the sum SW,b(N). By the standardnumber of divisors estimate, we have that the number of pairs d1, d2 such that [d1, d2] = D is .τ Dτfor any τ > 0. Also since χ is supported on x ≤ 1, the sum in D is restricted to D ≤ R2. Thecontribution of error term is given by.τ Nd−kr−δ′W d∑D≤R2Dτ . Nd−kr−δ′W dR2N τ . Nd−kr−δ′/2 (2.4.29)for a sufficiently small τ . (here we may think of W as a fixed large constant.)Now to calculate the main term using the Chinese Remainder Theorem. For each t ∈ ZdDW satisfyingt ≡ b (mod W ) , there is a unique s ∈ ZdD such that t ≡ s (mod D). 
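The Chinese Remainder Theorem step just stated can be illustrated coordinate by coordinate. The following minimal sketch assumes coprime moduli $D$ and $W$; the function names and the sample values are hypothetical. It lists the residues $t$ modulo $DW$ with $t \equiv b \pmod W$ and records their reductions modulo $D$, which should run through every class of $\mathbb{Z}_D$ exactly once.

```python
from math import gcd


def lifts(D, W, b):
    """All t in Z/(DW) with t = b (mod W); requires gcd(D, W) = 1."""
    if gcd(D, W) != 1:
        raise ValueError("D and W must be coprime")
    return [t for t in range(D * W) if t % W == b % W]


def residues_mod_D(D, W, b):
    """Reductions mod D of the lifts above, sorted for easy comparison."""
    return sorted(t % D for t in lifts(D, W, b))
```

For example, with `D = 15`, `W = 14`, `b = 3` there are 15 lifts and their reductions modulo 15 should form a complete residue system, so each $t$ corresponds to a unique $s$ and conversely. In $d$ variables the same correspondence applies in each coordinate.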
Hence suppose F(b) ≡v (mod D) then F(t) ≡ v (mod DW ) is equivalent to F(s) ≡ v (mod W ). Hence, applyingLemma 2.4.4, we have∑t∈ZdDW ,t≡b (mod W )F(t)≡v (mod DW )1D|t1...td∏p|Wσp(p,b,v)∏p|Dσp(p, s,v)∏p-DWσp(v)=∏p|Wσp(p,b,v)∏p-Wσp(v)∑s∈ZdDF(s)≡v (mod D)1D|s1...sd∏p|Dσp(p, s,v)σp(v)(2.4.30)Hence (2.4.19) follows from (2.4.26)-(2.4.30).Now to show (2.4.20), we do the same calculation with D is replaced by Dq = [D, q] in (2.4.26).Hence (2.4.27)-(2.4.30) remain valid with D replaced by [D, q]. Now we have to calculate (asymp-totically) the sumSW (f, γ) :=∑′D:(D,W )=1γD(v)hD(R) (2.4.31)This could be done by sieve methods, as in [42], [106]. We will follow the approach in [106] and thenadapt it to give the asymptotic for the sumSW,q(f, γ) :=∑′D:(D,W )=1γ[D,q](v)hD(R) (2.4.32)which will be needed in the concentration estimate (2.4.5).67Lemma 2.4.10 ([106], Proposition 10). Let γD(v) be a multiplicative function (in D) satisfying theestimate (2.4.7). Let χ(x) = f(x) = (1− x)10d+ thenSW (f, γ) = (φ(W )WlogR)−d∫ ∞0f (d)(x)2xd−1(d− 1)!dx+ oω→∞(1) (2.4.33)Furthermore, for a prime q > ω,SW,q(f, γ) =dq(φ(W )WlogR)−d∫ ∞0(f (d)(x)−f (d)(x+ log qlogR))2xd−1(d− 1)!dx+oω→∞(1) (2.4.34)Proof. Since f(x) = (1 − x)10d+ hence exf(x) is compactly supported and 10d − 1 continuouslydifferentiable. Denoted fˆ(t) the Fourier transform of exf(x). Hence|fˆ(t)| . 
(1 + |t|)−10dRecall the formulad− 1logR f(log dlogR) =∫Rd−itlogR fˆ(t)dt.Substitute this into (2.4.25), swapping the sum and integral due to rapid decay of fˆ1,fˆ2, one hashD(R) =∫R∫R∑d1,d2[d1,d2]=Dµ(d1)µ(d2)d− 1+it1logR1 d− 1+it2logR2 fˆ(t1)fˆ(t2)dt1dt2 =∫R∫RgD(t1, t2)fˆ(t1)fˆ(t2)dt1dt2.(2.4.35)The function gD(t2, t2)γD(v) is multiplicative inD and by rapid decay of gD(t1, t2) in t1, t2, one has∑′D:(D,W )=1gD(t1, t2)γD(v) =∏p>ω(1 + gp(t1, t2)γp(v)).Substitute this into (2.4.31) givesSW (f, γ) =∫R∫R∏p>ω(1− γp(v)p1+it1logR− γp(v)p1+it2logR+γp(v)p2+it1+it2logR)fˆ(t1)fˆ(t2)dt1dt2 (2.4.36)Using the asymptote of γp(v) (2.4.7), one has the following estimate via Taylor’s series of log(1 + ),log∣∣∣∣1− γp(v)p1+it1logR− γp(v)p1+it2logR+γp(v)p2+it1+it2logR∣∣∣∣ ≤ 3dp−1− 1logR +O(p−2)68Now by the well-known asymptotic15∑pp−1− 1logR = log logR+O(1)Hence the integrand in (2.4.36) is bounded byC(logR)3d(1 + |t1|)−10d(1 + |t2|)−10dIntegrating over |t2| >√logR,∫|t2|>√logR∫R(logR)3d(1+|t1|)−10d(1+|t2|)−10ddt2 = O(log−dR∫|t2|>√logR(1+|t2|)−2ddt2) = O(log−dR)The same holds for |t1| >√logR. HenceSW (f, γ) =∫|t1|≤√logR∫|t2|≤√logR∏p>ω(1− γp(v)p1+it1logR− γp(v)p1+it2logR+γp(v)p2+it1+it2logR)fˆ(t1)fˆ(t2)dt1dt2+O(log−dR)(2.4.37)For <s > 1, defineζW (s) :=∏p>ω(1− p−s)−1 = ζ(s)∏p≤ω(1− p−s)Apply (2.4.7) (asymptote for γp(v)), we have that∏p>ω(1− γp(v)p1+it1logR− γp(v)p1+it2logR+γp(v)p2+it1+it2logR)=∏p>ω(1− dp1+s1− dp1+s2+dp2+s1+s2+O(p−2))=∏p>ω(1− dp1+s1− dp1+s2+dp2+s1+s2)(1 +O(p−2))=∏p>ω(1− dp1+s1− dp1+s2+dp2+s1+s2)∏p>ω(1 +O(p−2))=ζW (1 + s1 + s2)dζW (1 + s1)dζW (1 + s2)d(1 + oω→∞(1)) (2.4.38)where s1 = 1 + 1+it1logR , s2 = 1 +1+it2logR . On the range |t1|, |t2| ≤√logR, we have s = 1 +O( 1√logR).15This can be seen by taking log of the following equation obtained from the simple pole with residue 1 at 1 of the Riemann’sZeta Function. 
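\[
\prod_p\Big(1 - \frac{1}{p^{1+\frac{1}{\log R}}}\Big) = \frac{1}{\zeta\big(1+\frac{1}{\log R}\big)} = \frac{1}{\log R + O(1)}.
\]

For each fixed $\omega$, letting $N$ and hence $R$ go to infinity, we have
\[
\prod_{p\le\omega}(1-p^{-s}) = \prod_{p\le\omega}(1-p^{-1}) + o(1) = \frac{\varphi(W)}{W} + o(1).
\]
Hence, using that $\zeta(s) = (s-1)^{-1} + O(1)$,
\[
\begin{aligned}
\zeta_W(s) = \prod_{p\le\omega}(1-p^{-s})\,\zeta(s) &= \Big(\frac{\varphi(W)}{W} + o(1)\Big)\Big(\frac{1}{s-1} + O(1)\Big)\\
&= \frac{1}{s-1}\frac{\varphi(W)}{W} + o\Big(\frac{1}{s-1}\Big) + O\Big(\frac{\varphi(W)}{W}\Big) + o(1)\\
&= \frac{1}{s-1}\frac{\varphi(W)}{W}\big(1+o_{\omega\to\infty}(1)\big).
\end{aligned}
\]
Substituting this into (2.4.38) and (2.4.37) gives
\[
\begin{aligned}
S_W(f,\gamma) &= \Big(\frac{\varphi(W)}{W}\log R\Big)^{-d}\int_{|t_1|,|t_2|\le\sqrt{\log R}} \frac{(1+it_1)^d(1+it_2)^d}{(2+it_1+it_2)^d}\big(1+o_{\omega\to\infty}(1)\big)\hat f(t_1)\hat f(t_2)\,dt_1dt_2 + O(\log^{-d}R)\\
&= \Big(\frac{\varphi(W)}{W}\log R\Big)^{-d}\int_{|t_1|,|t_2|\le\sqrt{\log R}} \frac{(1+it_1)^d(1+it_2)^d}{(2+it_1+it_2)^d}\hat f(t_1)\hat f(t_2)\,dt_1dt_2\,\big(1+o_{\omega\to\infty}(1)\big)\\
&= \Big(\frac{\varphi(W)}{W}\log R\Big)^{-d}\int_{\mathbb{R}}\int_{\mathbb{R}} \frac{(1+it_1)^d(1+it_2)^d}{(2+it_1+it_2)^d}\hat f(t_1)\hat f(t_2)\,dt_1dt_2\,\big(1+o_{\omega\to\infty}(1)\big)\\
&= \Big(\frac{\varphi(W)}{W}\log R\Big)^{-d}\int_{\mathbb{R}}\int_{\mathbb{R}} \frac{(1+it_1)^d(1+it_2)^d}{(2+it_1+it_2)^d}\hat f(t_1)\hat f(t_2)\,dt_1dt_2 + o_{\omega\to\infty}(1).
\end{aligned} \tag{2.4.39}
\]
Here, in the third line, we use that $\hat f$ decays rapidly, so that extending the integral to $\mathbb{R}$ causes an error term $o(1)$.

Now recall the value of the Gamma function at positive integers $k$:
\[
(k-1)! = \Gamma(k) = \int_0^\infty e^{-x}x^{k-1}\,dx.
\]
Since $e^{-z}z^{k-1}$ is analytic and decays as $\Re z \to \infty$ in the right half-plane, we may shift the contour from $[0,\infty)$ to the ray $(s+it)[0,\infty)$ with $s>0$, which starts at the origin and lies in the right half-plane. That is, for $y = (s+it)x$,
\[
(k-1)! = \int_0^\infty e^{-y}y^{k-1}\,dy,
\]
that is,
\[
(s+it)^{-k} = \int_0^\infty e^{-x(s+it)}\frac{x^{k-1}}{(k-1)!}\,dx. \tag{2.4.40}
\]
In our case, we have
\[
(2+it_1+it_2)^{-d} = \int_0^\infty e^{-x(2+it_1+it_2)}\frac{x^{d-1}}{(d-1)!}\,dx. \tag{2.4.41}
\]
Also recall the Fourier inversion formula
\[
f(x) = \int_{\mathbb{R}} e^{-(1+it)x}\hat f(t)\,dt.
\]
Differentiating $d$ times, one gets
\[
f^{(d)}(x) = (-1)^d\int_{\mathbb{R}} e^{-x(1+it)}(1+it)^d\hat f(t)\,dt. \tag{2.4.42}
\]
Hence, inserting (2.4.41) into (2.4.39), interchanging the order of integration, and applying (2.4.42) to each of the $t_1$ and $t_2$ integrals, one can write (2.4.39) as
\[
S_W(f,\gamma) = \Big(\frac{\varphi(W)}{W}\log R\Big)^{-d}\int_0^\infty f^{(d)}(x)^2\frac{x^{d-1}}{(d-1)!}\,dx + o_{\omega\to\infty}(1).
\]
This shows (2.4.33).

Now we modify the above arguments to show (2.4.34). 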
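Before doing so, the identity (2.4.40), on which the passage from (2.4.39) to (2.4.33) rests, can be sanity-checked numerically. The following is only an illustrative sketch and not part of the original argument: the function name `gamma_side`, the truncation point `upper` and the step count `steps` are choices made here for illustration. It approximates the right side of (2.4.40) by a composite Simpson rule on a truncated interval and compares it with $(s+it)^{-k}$.

```python
import cmath
from math import factorial


def gamma_side(s, t, k, upper=40.0, steps=40000):
    """Approximate the integral on the right of (2.4.40) by Simpson's rule.

    The integral over [0, infinity) is truncated at `upper`; this is an
    assumption that is harmless when exp(-s * upper) is negligible.
    `steps` must be even.
    """
    z = complex(s, t)

    def integrand(x):
        return cmath.exp(-x * z) * x ** (k - 1) / factorial(k - 1)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        # Simpson weights: 4 at odd nodes, 2 at even interior nodes.
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


if __name__ == "__main__":
    s, t, k = 2.0, 1.5, 3
    print(gamma_side(s, t, k), complex(s, t) ** (-k))
```

The case $s = 2$, $t = t_1 + t_2$, $k = d$ corresponds to (2.4.41).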
Fix a prime q > ω, we haveSW,q(f, γ) :=∫R∫R∑′D:(D,W )=1γ[D,q](v)gD(t1, t2)dt1dt2 (2.4.43)Now we separate the inner sum in D into cases q - D and q|D,∑′D:(D,W )=1γ[D,q](v)gD(t1, t2) = γq(v)(1 + g1(t1, t2))∑′D,q-D,(D,W )=1gD(t1, t2)γD(v)= γq(v)(1 + gq(t1, t2))∏p>ω,p6=q(1− γp(v)p1+it1logR− γp(v)p1+it2logR+γp(v)p2+it1+it2logR)=γq(v)(1 + gq(t1, t2))1 + gq(t1, t2)γq(v)∏p>ω(1− γp(v)p1+it1logR− γp(v)p1+it2logR+γp(v)p2+it1+it2logR)Hence this differs from the previous case in the sense that we have the additional factor γq(v)(1+gq(t1,t2))1+gq(t1,t2)γq(v) .Since we assume q > ω we haveγq(v) =dq(1 + o(1))71Using this estimate, we haveγq(v)(1 + gq(t1, t2))1 + gq(t1, t2)γq(v)=dq(1− q−1+it1logR )(1− q−1+it2logR )(1 + o(1))Hence we have analogue of (2.4.39),SW,q(f, γ) =(φ(W )WlogR)−d∫R∫R(1 + it1)d(1 + it2)d(2 + it1 + it2)d× (1− e−(1+it1) log qlogR )(1− e−(1+it2) log qlogR )fˆ(t1)fˆ(t2)dt1dt2 + oω→∞(1) (2.4.44)Applying (2.4.41), we write (2.4.39) as∫ ∞0(∫R(e−x(1+it) − e−(x+ log qlogR )(1+it))(1 + it)dfˆ(t)dt)2 xd−1(d− 1)!dxApplying (2.4.42), this becomes∫ ∞0(f (d)(x)− f (d)(x+ log qlogR))2xd−1(d− 1)!dx+ oω→∞(1)as required.2.4.4 Proof of the Main TheoremProof of Theorem 2.4.2. Let η ≤ η(r,k)2(1+η(r,k)) where η(r, k) is as in the assumption of Theorem 2.4.2.Then by (2.4.21) and (2.4.30), one has∑x∈[N ]d(x,W )=1, F(x)=vΛ2R(x1x2 · · ·xd) =∑b∈ZdW(b,W )=1SN,W,b(v)= cd(f)Nd−krJ(N−kv)(logR)−dφ(W )−d(1 + oω→∞(1))∑b∈ZdW(b,W )=1GW,b(v) +O(Nd−kr−δ′)(2.4.45)By Lemma 2.4.4 and the Chinese Remainder Theorem,φ(W )−d∑b∈ZdW(b,W )=1GW,b(v) =∏p|W(φ(p)−d∑b∈Zdp(b,p)=1σp(p, b,v))∏p-Wσp(v) (2.4.46)Recall that σp(v) = 1 +O(p−2), hence∏p-W σp(v) = 1 + oω→∞(1). For a fixed l ∈ N and primes72p ≤ ω, one has16φ(p)−dp−l(d−r)∑b∈Zdp,(b,p)=1F(b)≡v (mod p)|{x ∈ Zdpl ;F(px+ b) = v|}|= φ(p)−dp−l(d−r)pd|{y ∈ Zdpl ; (y, p) = 1,F(y) = v}|=prdφ(pl)dM(pl;v) (2.4.47)where M(pl;v) is the number of solutions to F(y) ≡ v (mod pl) in the reduced residue classy ∈ Zdpl, (y, p) = 1. 
Taking limit l→∞, one hasφ(p)−d∑b∈Zdp,(b,p)=1F(b)≡v (mod p)σp(p,b;v) = σ∗p(v)and by (2.4.46), one hasφ(W )−d∑(b,W )=1F(b)≡v (mod W )GW,b(v) =∏p|Wσ∗p(v)(1 + oω→∞(1)) = G∗(v)(1 + oω→∞(1))This proves (2.4.5).Now we prove (2.4.6). Note that to estimate the sum over x ∈ [N ]d\P[N ]d under the restriction(x,W ) = 1, we only need to sum over x = (x1, . . . , xd) for which q|x1 . . . xd for which q|x1 . . . xdfor some prime ω < q ≤ N . Hence∑x∈[N ]d\P(N)d(x,W )=1,F(x)=vΛ2R(x1 . . . xd) ≤∑ω<q≤N∑(x,W )=1F(x=v)1q|x1...xdΛ2R(x1 . . . xd)1[0,N ]d(x) =∑ω<q≤NSW,q,b(N)Now choose f(x) = (1 − x)10d+ then recall an − bn = (a − b)(an−1 + an−2b + · · · + bn−1), weobserve directly that for 0 ≤ x, τ ≤ 1, we have|f (d)(x)− f (d)(x+ τ)| ≤ τ |f (d+1)(x)| (2.4.48)Then by estimates (2.4.22), (2.4.34), and (2.4.48),∑ω<q≤NSW,q,b(N)16By pd to one correspondence between {px+ b : x ∈ Zdpl , (b, p) = 1} and {y ∈ Zdpl : (y, p) = 1}.73≤∑ω<q≤Ndq(φ(W )WlogR)−d∫ ∞0(f (d)(x)− f (d)(x+ log qlogR))2xd−1(d− 1)!dx+ oω→∞(1) +O(Nd−kr−δ)≤ dq(log qlogR)2cd+1(f)Nd−kr(logR)−dG∗(N,v)(1 + oω→∞(1)) +O(Nd−kr−δ)Write ′ = /η so N  = R′ where R = Nη. Then using dyadic decomposition and the PrimeNumner Theorem, one can bound the sum over primes q, ω < q ≤ R′ ,∑ω<q≤R′q−1(log q)2 =∑ω≤2j<Rε′∑2j−1<q≤2jq−1(log q)2.∑ω≤2j<Rε′(2jj+ oω→∞(1))j22j−1≤ (2 + oω→∞(1))∑j≤′ logRlog 2j ≤ 2(′)2as required. We will choose  > 0 to ensure that (2.4.5) dominates (2.4.6). for that we need to comparec′d+1(f) with cd(f) defined in the Theorem. Here f(x) = (1− x)10d+ so f (d)(x) = αd(1− x)9d+ andf (d+1)(x) = 9dαd(1− x)9d−1+ with αd = (10d)!/(9d)!. By the beta function identity,∫ 10(1− x)axbdx = a!b!(a+ b+ 1)!we havec′d+1(f) < 16d2cd(f)Hence if32d3(/η)2 ≤ 12(2.4.49)then for sufficiently large N,ω.∑x∈P(N)F(x)=vΛ2R(x1 . . . xd)1[0,N ]d(x) ≥ cdNd−kr(logR)−dG∗(N,v) (2.4.50)for some positive constant cd = cd(f) > 0.Finally if x ∈ P[N ]d then each coordinate xi could have at most 1 prime factors. HenceΛR(x1 . . . 
xd) ≤ the number of squarefree divisors of $x_1 \cdots x_d$, which is at most $2^{d/\varepsilon}$.

Thus by (2.4.50), the number of solutions to $F(x) = v$ with $x \in P[N]^d$ satisfies
\[
M_F(N) \ge c(d,k,r)\,N^{d-kr}(\log N)^{-d}\,G^*(N,v),
\]
where $c(d,k,r) := c_d\,2^{-2d/\varepsilon}$ for some $\varepsilon = \varepsilon(d,k,r) > 0$. In fact we may choose $\varepsilon := (4d)^{-3/2}\eta(r,k)$ to satisfy (2.4.49), with $\eta(r,k) = (8r^2(r+1)(r+2)k(k+1))^{-1}$ (chosen from the conditions in Theorem 2.3.4). This proves the theorem.

Concluding Remarks

Some ideas from additive combinatorics are also used along this line, where solutions are restricted to special sets like the primes or almost primes. For example, the result of Bourgain-Gamburd-Sarnak [13] on almost primes uses the idea of affine sieves. Their results differ from ours in that they give better bounds on the rank but apply only to certain classes of equations with a high degree of symmetry, which is a non-generic situation. Cook-Magyar prove an analogous result [19] on prime solutions to diophantine systems whose rank is tower-exponential with respect to the degree.

The first application of the inverse Gowers norm theorem to number theory is the study of the asymptotic number of prime solutions of systems of linear equations of finite complexity by Green-Tao [52]. Basically, what is shown in [52] is that $\|\tilde\Lambda - 1\|_{U^k}$ is small, where $\tilde\Lambda$ is the W-tricked von Mangoldt function. To show this, one needs an explicit form of all structures that make the $U^k$ norm large, in order to apply the results of [53], [54]. One obtains a decomposition of the von Mangoldt function: $\tilde\Lambda = \Lambda^\sharp + \Lambda^\flat$. Here $\Lambda^\sharp = \sum_{d \mid n,\ d \le R} \mu(d)\log(n/d)$, with $R$ a small power of $N$, contributes to the main term (major arc). Any term involving $\Lambda^\flat$ (the sum over $d > R$) is a small error term (minor arc) that does not correlate with $U^k$-obstructions. This follows from the smallness of $\|\mu\|_{U^s}$. Recently, the so-called nilpotent circle method, where nilsequences play the role of the linear phase, has also been used to find the asymptotic of the average of $f(L_1(u,v)) \cdots f(L_k(u,v))$ for some classes of arithmetic functions $f$, where the $L_j$ are binary linear forms. 
Some arithmetic functions that are orthogonal to polynomial nilsequences are studied in [76]. In [64] Frantzikinakis and Host prove a structure theorem for bounded multiplicative functions using higher order Fourier analysis, with some new applications in number theory and Ramsey theory.

Chapter 3

Corners in Dense Subsets of Primes via a Transference Principle

Recall that a set $A \subseteq \mathbb{P}^d$ has upper relative density $\alpha$ if
\[
\limsup_{N\to\infty} \frac{|A \cap \mathbb{P}_N^d|}{|\mathbb{P}_N^d|} = \alpha.
\]
Let us state our main result in this section.

Theorem 3.0.1. Let $A \subset (\mathbb{P}_N)^d$ be of positive relative upper density $\alpha > 0$. Then $A$ contains at least $C(\alpha)\,\frac{N^{d+1}}{(\log N)^{2d}}$ corners for some (computable) constant $C(\alpha)$.

This is not the most general known result, and there are other modern approaches to this problem, such as the densification trick [16]. We demonstrate how the original approach of Green-Tao [51] can be applied to attack this problem. This is the first result in the direction of extending the theorem of Green and Tao to the multidimensional setting. The key ingredient is to translate the problem to the setting of a hypergraph system and to prove an appropriate version of the so-called "correlation conditions" of Green and Tao.

In higher dimensions, the direct product of primes $\mathbb{P}^d$ is not a random subset of $\mathbb{Z}^d$, and one reason is the correlation coming from the direct product structure. For example, if we want to count corners $\{(a,b), (a+d,b), (a,b+d)\}$ in $\mathbb{P}^2$ and suppose $(a+d,b), (a,b+d) \in \mathbb{P}^2$, then the remaining vertex $(a,b)$ must also be in $\mathbb{P}^2$. Thus the probability that all three vertices are in $\mathbb{P}^2$ (or in the direct product of the almost primes) is not $(\log N)^{-6}$, as one would expect, but roughly $(\log N)^{-4}$. Due to this correlation, the obvious generalization of $\nu$, the $d$-fold tensor product $\nu \otimes \cdots \otimes \nu(x_1,\ldots,x_d) = \nu(x_1)\nu(x_2)\cdots\nu(x_d)$, does not behave pseudorandomly on its support $A^d$, where $A$ is the support of $\nu$. 
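The collapse from six coordinate conditions to four can be observed directly on a small range. The sketch below is an illustration added here, not part of the original argument, and the helper names `prime_table` and `count_corners` are hypothetical. It counts corners with $d > 0$ in $\mathbb{P}^2 \cap [1,n]^2$ in two ways: by testing the three vertices, and by testing only the four distinct coordinates $a$, $b$, $a+d$, $b+d$.

```python
def prime_table(n):
    """Sieve of Eratosthenes: is_prime[m] is True exactly when m is prime (0 <= m <= n)."""
    is_prime = [False, False] + [True] * (n - 1)
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return is_prime


def count_corners(n):
    """Count corners {(a,b), (a+d,b), (a,b+d)} with d > 0 inside [1,n]^2.

    Returns a pair: the count obtained by testing the three vertices for
    membership in P^2, and the count obtained from the four coordinates only.
    """
    is_prime = prime_table(n)

    def in_p2(x, y):
        return is_prime[x] and is_prime[y]

    by_vertices = 0
    by_coordinates = 0
    for a in range(1, n + 1):
        for b in range(1, n + 1):
            for d in range(1, n - max(a, b) + 1):
                if in_p2(a, b) and in_p2(a + d, b) and in_p2(a, b + d):
                    by_vertices += 1
                if is_prime[a] and is_prime[b] and is_prime[a + d] and is_prime[b + d]:
                    by_coordinates += 1
    return by_vertices, by_coordinates


if __name__ == "__main__":
    print(count_corners(100))
```

The two counts always agree, since membership of the three vertices in $\mathbb{P}^2$ involves only the four primes $a$, $b$, $a+d$, $b+d$; this is the source of the exponent $-4$ rather than $-6$.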
For example, for corners in $\mathbb{P}^2$, if we calculate
\[
\mathbb{E}_{a,b,d}\,(\nu\otimes\nu)(a,b)\,(\nu\otimes\nu)(a+d,b)\,(\nu\otimes\nu)(a,b+d) = \mathbb{E}_{a,b,d}\,\nu(a)^2\nu(b)^2\nu(a+d)\nu(b+d),
\]
we have to deal with higher moments of $\nu$, over which we have no control. This happens exactly when there is a correlation, that is, when there are points $P_1, P_2$ and a projection $\pi_i$ to a coordinate axis such that $\pi_i(P_1) = \pi_i(P_2)$. Such a correlation does not occur in the case of Gaussian primes: $a+ib$ and $c+id$ being Gaussian primes does not imply that $a+id$ is a Gaussian prime (but there is a milder correlation with the conjugate).

Our approach is to transfer our problem to a corresponding problem on hypergraphs, to get rid of the strong correlation coming from the direct product structure. This approach was partly used already in [102], where one reduces the problem to that of proving a hypergraph removal lemma for weighted uniform hypergraphs. Then we use an appropriate form of the so-called transference principle [37], [87] to remove the weights and apply the removal lemmas for "un-weighted" hypergraphs obtained in [36], [82], [104]. An interesting feature is that in our situation the so-called dual function estimates [50] are naturally handled by the linear forms conditions alone.

In our weighted setting, this method allows us to distribute the weights in such a way that we avoid dealing with higher moments of the Green-Tao measure $\nu$. We will define the notion of independent (pseudorandom) weight systems on hypergraphs, which will be used to count prime configurations. The reason that we cannot handle more general constellations is that we do not have a suitable regularity or removal lemma for general weight systems on non-uniform hypergraphs that would allow us to carry out transference arguments. We will apply different methods to overcome this difficulty in the next chapter.

3.1 Hypergraph Setting and Weighted Hypergraph System.

First let us parameterize the affine copies of a corner as follows.

Definition 3.1.1. 
A non-degenerate corner is given by the following set of $d+1$ points ($d$-tuples) in $\mathbb{Z}^d$ (or $\mathbb{Z}_N^d$):
\[
\{(x_1,\ldots,x_d),\ (x_1+s,x_2,\ldots,x_d),\ \ldots,\ (x_1,\ldots,x_{d-1},x_d+s)\}, \quad s \ne 0,
\]
or equivalently,
\[
\Big\{(x_1,\ldots,x_d),\ \Big(z-\sum_{\substack{1\le j\le d\\ j\ne 1}} x_j,\ x_2,\ldots,x_d\Big),\ \Big(x_1,\ z-\sum_{\substack{1\le j\le d\\ j\ne 2}} x_j,\ x_3,\ldots,x_d\Big),\ \ldots,\ \Big(x_1,\ldots,x_{d-1},\ z-\sum_{\substack{1\le j\le d\\ j\ne d}} x_j\Big)\Big\}
\]
with $z \ne \sum_{1\le i\le d} x_i$.

Now to a given set $A \subseteq \mathbb{Z}_N^d$, we assign a $(d+1)$-partite hypergraph $G_A$ as follows. Let $X_1 = \cdots = X_{d+1} := \mathbb{Z}_N$ be the vertex sets. For $1 \le j \le d$, let an element $a \in X_j$ represent the hyperplane $x_j = a$, and let an element $a \in X_{d+1}$ represent the hyperplane $a = x_1 + \cdots + x_d$. We join $d$ vertices (which represent $d$ hyperplanes) by a hyperedge if these $d$ hyperplanes intersect in a single point of $A$. Then a simplex in $G_A$ corresponds to a corner in $A$. Note that this includes the trivial corners, which consist of a single point; their number is negligible in order of magnitude.

For each $I \subseteq [d+1]$ let $E(I)$ denote the set of hyperedges whose elements come exactly from the vertex sets $V_i$, $i \in I$. In order to count corners in $A$, we place weights on some of these hyperedges that represent the coordinates of the corner. To be more precise, we define the weights on 1-edges:
\[
\nu_j(a) = \nu(a),\ a \in X_j,\ j \le d, \qquad \nu_{d+1}(a) = 1,\ a \in X_{d+1},
\]
and on $d$-hyperedges:
\[
\nu_I(a) = \nu\Big(a_{d+1} - \sum_{j\in I\setminus\{d+1\}} a_j\Big),\ a \in E(I),\ |I| = d,\ d+1 \in I, \qquad \nu_{[1,d]}(a) = 1,\ a \in E([1,d]).
\]
In particular, the weights are 1 or of the form $\nu_I(L_I(x_I))$, where all the linear forms $\{L_I(x_I)\}$ are pairwise linearly independent. This is an example of what we will call an independent weight system. The parametrization of a corner is a special case of the general parametrization (4.2.11) in the next chapter, but let us describe it explicitly here. 
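The correspondence between simplices of $G_A$ and corners in $A$ can be checked by brute force for small $N$. The following sketch is an illustration added here under simplifying assumptions (the unweighted hypergraph over $\mathbb{Z}_N$, with hypothetical helper names `is_simplex`, `simplices` and `corners`); it is not the construction used in the proofs. A tuple $(a_1,\ldots,a_{d+1})$ spans a simplex when every face of size $d$ is a hyperedge, that is, when the point where the corresponding $d$ hyperplanes meet lies in $A$.

```python
from itertools import product


def is_simplex(a, A, N, d):
    """Check that all d+1 faces of the vertex tuple a = (a_1, ..., a_{d+1}) are edges of G_A."""
    base, z = a[:d], a[d]
    # Face [d]: the hyperplanes x_j = a_j (j <= d) meet in the point (a_1, ..., a_d).
    if base not in A:
        return False
    for j in range(d):
        # Face [d+1] \ {j}: the j-th coordinate is forced by x_1 + ... + x_d = z (mod N).
        others = sum(base) - base[j]
        point = base[:j] + ((z - others) % N,) + base[j + 1:]
        if point not in A:
            return False
    return True


def simplices(A, N, d):
    """List all simplices of G_A by exhaustive search over Z_N^{d+1}."""
    A = set(A)
    return [a for a in product(range(N), repeat=d + 1) if is_simplex(a, A, N, d)]


def corners(A, N, d):
    """List pairs (x, s) with x and every x + s*e_j in A; s = 0 gives the trivial corners."""
    A = set(A)
    found = []
    for x in A:
        for s in range(N):
            if all(x[:j] + ((x[j] + s) % N,) + x[j + 1:] in A for j in range(d)):
                found.append((x, s))
    return found


if __name__ == "__main__":
    A = [(0, 0), (1, 0), (0, 1), (2, 3), (4, 3), (2, 0)]
    print(len(simplices(A, 5, 2)), len(corners(A, 5, 2)))
```

A simplex $(a_1,\ldots,a_{d+1})$ corresponds to the corner with base point $(a_1,\ldots,a_d)$ and side $s = a_{d+1} - \sum_i a_i \pmod N$, so the two lists have the same length.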
An important special feature here is that every linear form depends on either 1 or $d$ variables.

[Figure 3.1: Weighted hypergraph system. The figure shows the vertex sets $V_1, V_2, V_3, V_4$ and the weights for a corner in $\mathbb{Z}^3$ on a 4-partite 3-regular hypergraph: $\nu(x_4-x_1-x_2)$ on $(x_1,x_2,x_4)$, $\nu(x_4-x_2-x_3)$ on $(x_2,x_3,x_4)$, $\nu(x_4-x_1-x_3)$ on $(x_1,x_3,x_4)$, and $\nu(x_1), \nu(x_2), \nu(x_3)$ on $x_1, x_2, x_3$ respectively. Each edge $e$ carries a measure space $(V_e, \mu_e(x_e))$ with $V_e = \prod_{j\in e} V_j$ and $x_e = (x_i)_{i\in e}$. In general, on each edge $e$ of a hypergraph we attach the weight $\prod \nu(L_e)$, where the product is taken over all linear forms depending exactly on $x_e$. In our case, these linear forms will be pairwise linearly independent.]

Definition 3.1.2 (Independent weight system). An independent weight system is a family of weights on the edges of a $(d+1)$-partite hypergraph such that for any $I \subseteq [d+1]$, $|I| \le d$, the weight $\nu_I(x_I)$ is either 1 or of the form $\prod_{j=1}^{K(I)} \nu(L_I^j(x_I))$, where all distinct linear forms $\{L_I^j\}_{1\le j\le K(I)}$ are pairwise linearly independent; moreover, the form $L_I^j$ depends exactly on the variables $x_I = (x_j)_{j\in I}$.

In fact, for a weight system that arises from parametrizing affine copies of configurations in $\mathbb{Z}^d$, it is easy to see from the construction that for any $I \subseteq [d+1]$, $|I| = d$, all distinct linear forms $\{L_J^k\}_{J\subseteq I,\ 1\le k\le K(J)}$ are pairwise linearly independent.

Now for each $I = [d+1]\setminus\{j\}$, $1 \le j \le d$, let
\[
f_I = 1_A\Big(x_1,\ldots,x_{j-1},\ x_{d+1} - \sum_{\substack{1\le i\le d\\ i\ne j}} x_i,\ x_{j+1},\ldots,x_d\Big)\cdot\nu_I,
\]
and for $I = [d]$, let $f_I = 1_A(x_1,\ldots,x_d)$. The coordinates of a corner contained in $\mathbb{P}^d$ are given by $2d$ prime numbers, and recall that $\nu(p) \approx \log N$ if $p$ is a prime in $[\varepsilon_1 N, \varepsilon_2 N]$ (in the residue class $b \pmod W$). We define a multi-linear form
\[
\Lambda := \Lambda_{d+1}(f_I, |I| = d) := \mathbb{E}_{x_{[d+1]}} \prod_{|I|=d} f_I \prod_{i=1}^d \nu(x_i) = N^{-d-1}\sum_{\substack{p_i\in A,\ 1\le i\le 2d\\ (p_i)_{1\le i\le 2d}\ \text{constitutes a corner}}}\ \prod_{i=1}^{2d}\nu(p_i) \approx \frac{\log^{2d}N}{N^{d+1}}\,|\text{number of corners in } A|.
\]
Hence $\Lambda$ can be used to estimate the number of corners. Indeed, if $\Lambda \ge C_1$ then
\[
\text{number of corners in } A \ge C_2\,\frac{N^{d+1}}{\log^{2d}N}.
\]
We define measure spaces associated to our system of measures as follows. 
For $1 \le i \le d$, let $(X_i, \mu_{X_i}) = (\mathbb{Z}_N, \nu)$, where $\nu$ is the Green-Tao measure, and let $\mu_{X_{d+1}}$ be the normalized counting measure on $X_{d+1} = \mathbb{Z}_N$. With this notation one may write
\[
\Lambda := \Lambda_{d+1}(f_I, |I| = d) = \int_{X_1}\cdots\int_{X_{d+1}} \prod_{|I|=d} f_I\ d\mu_{X_1}\cdots d\mu_{X_{d+1}}.
\]
We define a measure on $X_I$, $I \subseteq [d+1]$, $|I| = d$, associated to our weight system by
\[
\int_{X_I} f\,d\mu_{X_I} := \mathbb{E}_{x_I}\, f\cdot\prod_{J\subseteq I,\ |J|<d} \nu_J(x_J),
\]
also on $X_{[d+1]}$ by
\[
\int_{X_{[d+1]}} f\,d\mu_{X_{[d+1]}} := \mathbb{E}_{x_{[d+1]}}\, f\cdot\prod_{I\subseteq[d+1],\ |I|<d} \nu_I(x_I),
\]
and the associated multi-linear form by
\[
\Lambda := \Lambda(f_I, |I| = d) := \int_{X_{[d+1]}} \prod_{|I|=d} f_I\ d\mu_{X_{[d+1]}}. \tag{3.1.1}
\]

Remark 3.1.3. For general configurations, we will use the same weighted hypergraph to count the prime configurations, but we will not attach weights to the functions, and so the measure space is constructed a bit differently. For the corner case in this chapter, we will apply the transference principle; hence we will attach the weights of size $d$ to the functions, and only the weights of size 1 are left on the hypergraph. This strategy will not work in general, in particular if there is an intermediate weight of size $d'$, $1 < d' < d$: we do not have an appropriate version of hypergraph removal for the transference principle, and if we attach that weight to the function, then we do not have control on the size of the dual functions.

3.2 Weighted Box Norm and Weighted Generalized von-Neumann's Inequality

In this section we describe the weighted version of the Gowers uniformity norm on a $(d+1)$-partite hypergraph (box norm) and the so-called Gowers inner product associated to the hypergraph $G_A$ endowed with a weight system $\{\nu_I\}_{I\subseteq[d+1],\ |I|\le d}$. We describe the analogues, for the weighted box norm, of the properties known in the unweighted case, namely the Gowers-Cauchy-Schwarz inequality and the generalized von Neumann inequality. Here we recall the index notations stated at the start of Chapter 1.

Definition 3.2.1. For each $1 \le j \le d$, let $X_j, Y_j$ be finite sets (in this thesis, we will take $X_j = Y_j := \mathbb{Z}_N$) with a weight system $\nu$ on $X_{[d]} \times Y_{[d]}$. 
For f : X[d] → R, define‖f‖2ddµ :=∫X[d]×Y[d]∏ω[d]f(Pω[d](x[d],y[d]))dµX[d]×Y[d]:= Ex[d]Ey[d]∏ω[d]f(Pω[d](x[d],y[d]))×∏|I|<d∏ωIνI(PωI (xI ,yI))and define the corresponding Gowers’s inner product of 2d functions,〈fω, ω ∈ {0, 1}d〉dµ:=∫X[d]×Y[d]∏ω[d]fω[d](Pω[d](x[d],y[d]))dµX[d]×Y[d]:= Ex[d]Ey[d]∏ω[d]fω[d](Pω[d](x[d],y[d]))∏|I|<d∏ωIνI(PωI (xI ,yI))So〈f, ω ∈ {0, 1}d〉dµ = ‖f‖2ddµ .80For each e ∈ H, we may define have a measure space on Ve with weighted from all edges f ⊆ e. Wecan define the box norm ‖f‖µe for f : Ve → R as well. If e is clear from the context, we may writethis as ‖f‖d′µ where |e| = d′.Remark 3.2.2. To prove weighted Cauchy-Schwartz’s inequality (Theorem 3.2.3) or Generalized vonNeumann’s theorem (Theorem 3.2.5) below. We will apply Cauchy-Schwartz’s inequality and linearforms conditions. The way we apply linear forms condition we will only consider the set of variablesthey depend on, and if they are different, linear forms condition is applicable. This will be how weapply linear forms conditions here.Theorem 3.2.3 (Gowers-Cauchy-Schwartz’s Inequality).|〈fω;ω ∈ {0, 1}d〉dµ| ≤∏ω[d]∥∥fω∥∥dµ .Proof. We will use Cauchy-Schwartz’s inequality and linear form condition. Write〈fω;ω ∈ {0, 1}d〉dµ= Ex[2,d],y[2,d][( ∏|I|<d,1/∈I∏ωIνI(PωI (xI ,yI)))1/2(Ex1ν(x1)∏ω[2,d]fω(0,[2,d])(x1, Pω[2,d](x[2,d],y[2,d]))∏|I|<d−1,1/∈Iν{1}∪I(x1, PωI (xI ,yI)))×( ∏|I|<d,1/∈I∏ωIνI(PωI (xI ,yI)))1/2(Ey1ν(y1)∏ω[2,d]fω(1,[2,d])(y1, Pω[2,d](x[2,d],y[2,d]))∏|I|<d−1,1/∈Iν{1}∪I(y1, PωI (xI ,yI)))]Applying the Cauchy Schwartz inequality in the x[2,d],y[2,d] variables, one has|〈fω;ω ∈ {0, 1}d〉dµ |2 ≤ A ·Bhere,A = Ex[2,d],y[2,d][ ∏|I|<d,1/∈I∏ωIνI(PωI (xI ,yI))×(Ex1,y1ν(x1)ν(y1)∏ω[2,d]fω(0,[2,d])(x1, Pω[2,d](x[2,d],y[2,d]))fω(0,[2,d])(y1, Pω[2,d](x[2,d],y[2,d]))×∏|I|<d−1,1/∈I∏ωIν{1}∪I(x1, PωI (xI ,yI))ν{1}∪I(y1, PωI (xI ,yI)))]=〈f (0)ω (Pω(x[d],y[d]))〉dµ81where f (0)ω˜ = f(0,ω˜∩[2,d]) for any ω˜[1,d]. 
And,B = Ex[2,d],y[2,d][ ∏|I|<d,1/∈I∏ωIνI(PωI (xI ,yI))×(Ex1,y1ν(x1)ν(y1)∏ω[2,d]fω(1,[2,d])(x1, Pω[2,d](x[2,d],y[2,d]))fω(1,[2,d])(y1, Pω[2,d](x[2,d],y[2,d]))×∏|I|<d,1/∈I∏ωIν{1}∪I(x1, PωI (xI ,yI))ν{1}∪I(y1, PωI (xI ,yI)))]=〈f (1)ω (Pω(x[d],y[d]))〉dµwhere f (1)ω˜ = f(1,ω˜∩[2,d])for any ω˜[1,d].In the same way, we apply Cauchy-Schwartz’s inequality in (x[3,d],y[3,d]) variables to end up with|〈fω;ω ∈ {0, 1}d〉dµ |4 ≤∏ω[0,1]〈fω[1,2]ω ;ω ∈ {0, 1}d〉dµContinue applying Cauchy-Schwartz’s inequality consecutively in (x[4,d],y[4,d]), ..., (x[d,d],y[d,d]) vari-ables, we end up with|〈fω;ω ∈ {0, 1}d〉dµ |2d ≤∏ω[d]〈fω, ..., fω〉dµ , fω = fω≤∏ω[d]∥∥fω∥∥2ddµCorollary 3.2.4. ‖·‖dµ is a norm for N is sufficiently large.Proof. First we show nonnegativity. By the linear forms condition, ‖1‖ν = 1 + o(1). Hence by theGowers-Cauchy-Schwartz inequality, we have ‖f‖dµ & |〈f, 1, ..., 1〉dµ | ≥ 0 for all sufficiently largeN . Now‖f + g‖dµ = 〈f + g, ..., f + g〉dµ =∑ω∈{0,1}d〈hω1 , ..., hωd〉dµ , hω =f , ω = 0g , ω = 1≤∑ω∈{0,1}d‖hω1‖dµ ... ‖hωd‖dµ= (‖f‖dµ + ‖g‖dµ)2dAlso it follows directly from the definition that ‖λf‖2ddµ = λ2d ‖f‖2ddµ . Since the norm are nonnega-82tive, we have ‖λf‖dµ = |λ| ‖f‖dµ .Now we will prove Generalized von Neumann inequalities. The generalized von-Neumann inequalitysays that the average Λ := Λd+1,µ(fI , I ⊆ [d+ 1], |I| = d), see (3.1.1), is controlled by the weightedbox norm. We show this inequality in the general settings of an independent weight system.Theorem 3.2.5 (Weighted generalized von-Neumann inequality for corner). Let I ⊆ [d + 1], |I| =d, fI : XI → [0, 1] bounded by 1. Write fej = f[d+1]\{j}. Let ν be an independent system of measureon X[d+1] that satisfies linear form conditions.1 then|Λd+1,µ(fe1 , ..., fed+1)| . min{‖fe1‖µe1 , ..., ‖fed+1‖µed+1 } (3.2.1)Proof of Weighted Generalized von Neumann. LetH′ = {f ∈ H; |f | < d}, and write the left side of(3.2.1) asΛd+1,µ = Ex∈VJ∏e∈Hdfe(xe)νe(xe)∏f∈H′νf (xf ).Write ej := [d + 1]\{j}, 1 ≤ j ≤ d + 1 for the faces. 
The idea is to apply the Cauchy-Schwartz in-equality successively in the x1, x2, . . . , xd variables to eliminate the functions2 and weights (fe1 , νe1), . . . , (fed , νed),using the linear forms condition at each step, leaving fed+1 on RHS.Write E := Λd+1,µ. To eliminate fe1 , νe1 we have|E| ≤ Ex2,...,xd+1νe1(xe1)∏1/∈f∈H′νf (xf )∣∣Ex1 ∏j 6=2fej (xj)∏1∈f∈H′νf (xf )∣∣.By the linear forms condition Ex2,...,xd+1νe1(xe1)∏1/∈f∈H′ νf (xf ) = 1 + oN→∞(1), thus by theCauchy-Schwartz inequalityE2 . Ex2,...,xd+1νe1(xe1)∏1/∈f∈H′νf (xf ) Ex1,y1∏j 6=2fejνej (x1, xej\{1})fejνej (y1, xej\{1}) (3.2.2)×∏1∈f∈H′νf (y1, xf\{1}) νf (x1, xf\{1})This eliminates fe1 , νe1 and doubles the variable x1 to the pair of variables (x1, y1) and also doubledeach factor of the form Ge(xe) (which is either fe(xe) or νe(xe), for e ∈ H) depending on the x1variable. To keep track of these changes as we continue with the rest of that variables, let us introduce1This lemma indeed directly applicable to corner system where fI ≤ νI by reinterpreting the weight attached to fI as weightattached to XI .2Indeed, f is bounded by 1 an ν is positive so fe can be trivially eliminated but they are also naturally the same way as νe83some notations. Let g ⊆ [d] and for a function Ge(xe) defineG∗e(xe∩g, ye∩g, xe\g) :=∏ωe∈{0,1}e∩gGe(ωe(xe∩g, ye∩g), xe\g). (3.2.3)We claim that after applying the Cauchy-Schwartz inequality in the x1, . . . , xi variables we have3with g = [i]E2i . Ex[i],y[i],xJ\[i]∏j≤iν∗ej (x[i]∩ej , y[i]∩ej , xej\[i])∏j>if∗ejν∗ej (x[i]∩ej , y[i]∩ej , xej\[i]) (3.2.4)×∏f∈H′ν∗f (xf∩[i], yf∩[i], xf\[i]).For i = 1 this can be seen from (3.2.2). Note that the linear forms appearing in any of these factors arepairwise linearly independent. 
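Assuming it holds for $i$, we separate the factors independent of the $x_{i+1}$ variable, eliminate $f_{e_{i+1}}$, and apply the Cauchy-Schwarz inequality. This doubles the variable $x_{i+1}$ to the pair $(x_{i+1}, y_{i+1})$ and turns each factor $G_e^*(x_{e\cap[i]}, y_{e\cap[i]}, x_{e\setminus[i]})$ depending on it into the factor
\[
G_e^*(x_{e\cap[i+1]}, y_{e\cap[i+1]}, x_{e\setminus[i+1]}),
\]
thus the formula holds for $i+1$. After finishing this process we have
\[
E^{2^d} \lesssim \mathbb{E}_{x_{[d]},y_{[d]}} \prod_{\omega\in\{0,1\}^d} f_{e_{d+1}}(\omega(x_{[d]},y_{[d]}))\,\nu_{e_{d+1}}(\omega(x_{[d]},y_{[d]})) \prod_{f\subseteq[d],\ f\ne e_0}\ \prod_{\omega_f\in\{0,1\}^f} \nu_f(\omega_f(x_f,y_f))\ \mathcal{W}(x_{[d]},y_{[d]}),
\]
where
\[
\mathcal{W}(x_{[d]},y_{[d]}) = \mathbb{E}_{x_{d+1}} \prod_{d+1\in e\in\mathcal{H}}\ \prod_{\omega_e\in\{0,1\}^{e\cap[d]}} \nu_e(\omega_e(x_{e\cap[d]}, y_{e\cap[d]}, x_{e\setminus[d]})).
\]
Thus to prove (3.2.1), it is enough to show that
\[
\mathbb{E}_{x_{[d]},y_{[d]}} \prod_{f\subseteq[d]}\ \prod_{\omega_f\in\{0,1\}^f} \nu_f(\omega_f(x_f,y_f))\ |\mathcal{W}(x_{[d]},y_{[d]}) - 1| = o_{N\to\infty}(1).
\]
This can be done with one more application of the Cauchy-Schwarz inequality in the $x_{d+1}$ variable, leading to 4 terms involving the "big" weight functions $\mathcal{W}$ and $\mathcal{W}^2$. Each term is, however, $1 + o_{N\to\infty}(1)$ by the linear forms condition, as the underlying linear forms are pairwise linearly independent: the forms $L_f(\omega_f(x_f,y_f))$ are pairwise independent for $f \subseteq [d]$, and depend on a different set of variables than the forms $L_e(\omega_e(x_{e\cap[d]}, y_{e\cap[d]}, x_{e\setminus[d]}))$ for $e \nsubseteq [d]$ defining the weight function $\mathcal{W}$. The new forms appearing in $\mathcal{W}^2$ are copies of the forms in $\mathcal{W}$ with the $x_{d+1}$ variable replaced by a new variable $y_{d+1}$; hence they are independent of each other and of the rest of the forms. This proves the proposition for $f_{e_{d+1}}$, and we can prove it for the other $f_{e_j}$ in the same way.

Footnote 3: Note that for $j \le i$ we have $[i]\cap e_j = [i]\setminus\{j\}$, and for $j > i$ we have $[i]\cap e_j = [i]$.

3.3 Dual Function Estimates

Definition 3.3.1 (Dual Function). For $f, g : \mathbb{Z}_N^d \to \mathbb{R}$ define the weighted inner product
\[
\langle f, g\rangle_\mu := \int_{X_{[d]}} f\cdot g\ d\mu_{X_{[d]}} = \mathbb{E}_{x\in\mathbb{Z}_N^d}\, f(x)g(x)\prod_{|I|<d}\nu_I(x_I).
\]
Define the dual function of $f$ by
\[
\mathcal{D}f = \mathcal{D}_\mu f := \mathbb{E}_{y\in\mathbb{Z}_N^d} \prod_{\omega\ne 0} f(P_\omega(x,y)) \prod_{|I|<d}\ \prod_{\omega_I\ne 0} \nu_I(P_{\omega_I}(x_I,y_I)).
\]
So
\[
\|f\|_{\Box^d_\mu}^{2^d} = \mathbb{E}_{x\in\mathbb{Z}_N^d}\, f(x)\Big[\mathbb{E}_{y\in\mathbb{Z}_N^d} \prod_{\omega\ne 0} f(P_\omega(x,y)) \prod_{|I|<d}\ \prod_{\omega_I\ne 0}\nu_I(P_{\omega_I}(x_I,y_I))\Big]\prod_{|I|<d}\nu_I(x_I) = \langle f, \mathcal{D}f\rangle_\mu.
\]
In this section we prove the dual function estimate in our hypergraph system. 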
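The identity $\|f\|^{2^d} = \langle f, \mathcal{D}f\rangle$ can be verified exactly in the simplest situation. The sketch below is an illustration added here and makes two simplifying assumptions that are not those of the thesis: all weights are trivial ($\nu_I \equiv 1$) and $d = 2$, so that $P_\omega(x,y)$ picks $x_i$ when $\omega_i = 0$ and $y_i$ when $\omega_i = 1$. The function names `box_norm_power`, `dual` and `inner` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def box_norm_power(f, N):
    """Fourth power of the unweighted box norm of f on Z_N x Z_N (d = 2)."""
    total = sum(
        f[x1][x2] * f[y1][x2] * f[x1][y2] * f[y1][y2]
        for x1, x2, y1, y2 in product(range(N), repeat=4)
    )
    return Fraction(total, N ** 4)


def dual(f, N):
    """Unweighted dual function: the average over y of the three factors with omega != 0."""
    return [
        [
            Fraction(
                sum(f[y1][x2] * f[x1][y2] * f[y1][y2]
                    for y1 in range(N) for y2 in range(N)),
                N ** 2,
            )
            for x2 in range(N)
        ]
        for x1 in range(N)
    ]


def inner(f, g, N):
    """Normalized inner product E_x f(x) g(x) on Z_N x Z_N."""
    total = sum(f[x1][x2] * g[x1][x2] for x1 in range(N) for x2 in range(N))
    return Fraction(total) / N ** 2


if __name__ == "__main__":
    f = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
    print(box_norm_power(f, 3), inner(f, dual(f, 3), 3))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.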
We uses product ofthese dual functions as an uniformity obstruction in soft inverse theorem arguments as in [51], [36].In [51], they allow K to be arbitrary large4 and employ the correlation condition to avoid the infinitelinear forms conditions which was not available at that time (this is the only place where they usedcorrelation condition in [51]). We only needs linear form condition here as the parameter K could bearbitrarily finite but fixed constant, depending on α. This makes the number of linear form conditionsinvolved depends on α.Theorem 3.3.2. For allK ≤ K(α) any independent measure system and any fixed J ⊆ [d+1], |J | =d, let F1, ..., FK : XJ → R, Fj(xJ) ≤ νJ(xJ) be given functions. Then for each 1 ≤ K ≤ K(α) wehave that ∥∥ K∏j=1DFj∥∥∗dµ= Oα(1)Proof. Denote by I the subsets of a fixed set J ⊆ [d+ 1], |J | = d. First, for each 1 ≤ j ≤ K, writeDFj(x) = Eyj∈ZdN∏ω 6=0Fj(Pω(x,yj))∏|I|<d∏ωI 6=0νI(PωI (xI ,yjI))4In [50], K is the number of iterations in energy increment which is 22K/ε.85Now assume ‖f‖dµ ≤ 1 then〈f,K∏j=1DFj〉µ= Ex∈ZdN f(x)K∏j=1DFj(x)∏|I|<dνI(xI)= Ex∈ZdN f(x)Ey1,...yK∈ZdNK∏j=1(∏ω 6=0Fj(Pω(x,yj))∏|I|<d[ ∏ωI 6=0νI(PωI (xI ,yjI))]νI(xI))We will compare this to the box norm to exploit the fact that ‖f‖dν ≤ 1. To compare this to theGowers’s inner product, let us introduce the following change of variables:For a fixed y ∈ ZdN , write yj 7→ yj + y for 1 ≤ j ≤ K then our expression takes the form〈f,K∏j=1DFj〉µ= Ey1,...,yKExf(x)K∏j=1[ ∏ω 6=0Fj(Pω(x,y + yj))∏|I|<d∏ωI 6=0[νI(PωI (xI ,yjI + yI))]νI(xI)]Since ZdN is cyclic. This is equal to the averageEy1,...,yK∈ZdNEx,y∈ZdN f(x)K∏j=1[ ∏ω 6=0Fj(Pω(x,y + yj))∏|I|<d∏ωI 6=0[νI(PωI (xI ,yjI + yI))]νI(xI)]For ω ∈ {0, 1}d, Y = (y1, . . . , yk) ∈ (ZdN )k. 
We will define functions Gω,Y (x) : ZdN → R such that〈f,K∏j=1DFj〉µ= Ey1,..,yK〈Gω,Y ;ω ∈ {0, 1}d〉dµTo do this, let G0(x) := f(x) and for each ω˜ 6= 0, Y , defineGω˜,Y (x) :=K∏j=1[Fj(x+ yj1(ω˜))( ∏|I|<dνI((x+ yj1(ω˜))∣∣I)) 12d−|I|]×∏|I|<dνI(xI)− 12d−|I|Hence for ω˜ 6= 0Gω˜,Y (Pω˜(x,y)) =K∏j=1[Fj(Pω˜(x,y+yj))( ∏|I|<dνI((Pω˜((x,y+yj)∣∣I) 12d−|I|]×∏|I|<dνI(Pω˜(x,y)∣∣I)− 12d−|I|Remark 3.3.3. For each I ⊆ [d] and fixed ωI , the number of ω[d] such that ω[d]|I = ωI is 2d−|I| andPω(x,y)|I = PωI (xI ,yI)⇐⇒ ω|I = ωI86Hence〈Gω,Y ;ω ∈ {0, 1}d〉dµ= Ex,y∈ZdN∏ωGω,Y (Pω(x,y))×∏|I|<d∏ωIνI(PωI (xI ,yI))= Ex,y∈ZdN∏ω[ K∏j=1[Fj(Pω(x,y + yj))(∏|I|<dνI(Pω((x,y) + yj1(ω))∣∣I)12d−|I|]×∏|I|<dνI(Pω(xI ,yI)|I)−12d−|I|]×∏|I|<d∏ωIνI(PωI (xI ,yI))= Ex,y∈ZdN f(x)[ K∏j=1∏ω 6=0Fj(Pω(x,y + yj))]×∏|I|<d[( K∏j=1∏ωI 6=0νI(PωI (xI ,yjI + yI)))νI(xI)]Hence we have〈f,K∏j=1DFj〉µ = Ey1,..,yK〈Gω,Y ;ω ∈ {0, 1}d〉dµThen by Gowers-Cauchy-Schwartz’s and arithmetic-geometric mean inequality, we have∣∣〈f, K∏j=1DFj〉µ∣∣ ≤ Ey1,...,yK ‖f‖dµ ∏ω 6=0∥∥Gω,Y ∥∥dµ . Ey1,...,yK(1 + ∑ω[d] 6=0∥∥Gω,Y ∥∥2ddµ )Hence to prove the dual function estimate, it is enough to show thatEy1,...,yK∥∥Gω˜,Y ∥∥2ddµ = O(1)For any fixed ω˜ 6= 0. NowEy1,...,yK∥∥Gω˜,Y ∥∥2ddµ = Ey1,...,yKEx,y∏ωGω˜,Y (Pω(x,y))∏|I|<d∏ωIνI(PωI (xI ,yI))≤ Ey1,...,yKEx,y∏ωK∏j=1[ν[d](Pω(x,y) + yj1(ω˜))∏|I|<dνI((Pω(x,y) + yj1(ω˜))∣∣I)12d−|I|×∏|I|<dνI(Pω(x,y)∣∣I)− 12d−|I|]×∏|I|<d∏ωIνI(PωI (xI ,yI))= Ey1,...,yKEx,yK∏j=1[∏ων[d](Pω((x,y) + yj1(ω˜)∣∣I))∏|I|<d∏ωIνI(PωI (xI ,yI) + yj1(ω˜)∣∣I)]by remark 3.3.3 above. As the linear forms appearing in the above expression are pairwise linearlyindependent this is Oα(1) (in fact 1 + oα(1)) by the linear forms condition as required.873.4 Transference PrincipleIn this section, we will modify the transference principle in [37](see Theorem 3.4.6). We will workon the set on which our functions have bounded dual, treating the contributions of the remaining setas error terms.We will work on functions f : XI → R, dominated by νI . WLOG I = [d]. 
Let 〈 · 〉 be any innerproduct on F := {f : X[d] → R} written as 〈f, g〉 =∫f · g dµ for some measure µ on X[d]. In thissection we will need the explicit description of the set Ω(T ) that the dual function is bounded by Tand for this we will need correlation condition (in particular, Proposition 3.4.3 below. This is the onlyplace in this thesis that we will need correlation condition).We will do this on the set on which our functions have bounded dual, and treat the contributionsof the remaining set as error terms. We will need the explicit description of the set Ω(T ) that thedual function is bounded by T using the correlation condition (In particular, that this set is of lowercomplexity, missing one of the variable). In general, T will depend on  and T → ∞ as  → 0 butwhen we apply removal lemma we will choose  to be some small number depending on α and henceif α is fixed then we can regard  as a fixed small constant and T as a fixed large constant.Definition 3.4.1. For each T > 1 we have th set Ω(T ) and define the following setsF := {f : X[d] → R}FT := {f ∈ F : supp(f) ⊆ Ω(T )}ST := {f ∈ FT : |f | ≤ ν[d](x[d]) + 2}We define the following (basic anti-correlation) norm on FT‖f‖BAC := ‖f‖BAC,µ = maxg∈ST| 〈f,Dg〉µ | ≥ ‖f‖2dµ .If f ∈ ST , this norm measures how much the functions on FT correlate with function on ST and ourweighted box norm is denominated by this norm. This will allow us to work on BAC−norm (ratherthan the box norm) which has some useful algebraic properties. RecallDf = Ey∏ω 6=0f(Pω(x,y))∏|I|<d∏ωI 6=0νI(PωI (xI ,yI))Also recall that LI(x) are linear forms appeared in the definition of νI . Write hωI = LI(x)|0(ωI)hence using correlation condition (Definition 2.2.8), we have|Df | ≤∏∅6=J⊆[d]∑(ωI1,ωI2)∈TJτ(W · (aωI1hωI1 − aωI2hωI2 ) + (aωI1 − aωI2 )b) (3.4.1)88where for each J ( [d], J 6= ∅TJ := {(ωI1 , ωI2), ωI1 , ωI2 6= 0, 1(ωI1) = 1(ωI2) = J : ∃c ∈ Q, LI1(y1(ωI1 )) = cLI2(y1(ωI2 ))}where aωIj ∈ Z. 
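Define
\[
\Omega_J(T) = \Big\{x_{[d]} : \sum_{(\omega_{I_1},\omega_{I_2})\in T_J} \tau\big(W\cdot(a_{\omega_{I_1}}h_{\omega_{I_1}} - a_{\omega_{I_2}}h_{\omega_{I_2}}) + (a_{\omega_{I_1}} - a_{\omega_{I_2}})b\big) \le T^{1/2^d}\Big\}, \tag{3.4.2}
\]
\[
\Omega(T) = \bigcap_{\emptyset\ne J\subsetneq[d]} \Omega_J(T). \tag{3.4.3}
\]
So $\mathcal{D}f$ is bounded by $T$ on $\Omega(T)$ for any fixed $T > 1$. Indeed, when $T$ is large, the set $\Omega(T)$ is nonempty. The explicit description of $\Omega(T)$ is only used in the proof of property 1 in Proposition 3.4.3 below.

Example 3.4.2. For corners in dimension 2, let $f : X_1 \times X_2 \to \mathbb{R}$. We have
\[
\begin{aligned}
\mathcal{D}f(x_1,x_2) &= \mathbb{E}_{y_1,y_2}\, f(x_1,y_2)f(y_1,x_2)f(y_1,y_2)\nu(y_1)\nu(y_2)\\
&\le \mathbb{E}_{y_1,y_2}\, \nu(x_2-y_1)\nu(y_2-x_1)\nu(y_2-y_1)\nu(y_1)\nu(y_2)\\
&\le \tau(W\cdot x_1)\,\tau(W\cdot x_2).
\end{aligned}
\]
Then define $\Omega_1(T) := \{(x_1,x_2) : \tau(W\cdot x_2) \le T^{1/2}\}$ and $\Omega_2(T) := \{(x_1,x_2) : \tau(W\cdot x_1) \le T^{1/2}\}$. Let $\Omega_{\{1,2\}}(T) := \Omega_1(T)\cap\Omega_2(T)$; then $\mathcal{D}f$ is bounded by $T$ on $\Omega_1(T)\cap\Omega_2(T)$.

We have the following basic properties of this norm.

Proposition 3.4.3.
1. $g \in \mathcal{F}_T \Rightarrow \mathcal{D}g \in \mathcal{F}_T$.
2. $\|\cdot\|_{BAC}$ is a norm on $\mathcal{F}_T$ and can be extended to a seminorm on $\mathcal{F}$. Furthermore, we have $\|f\|_{BAC} = \|f\cdot 1_{\Omega(T)}\|_{BAC}$ for $f \in \mathcal{F}$.
3. $\mathrm{Span}\{\mathcal{D}g : g \in S_T\} = \mathcal{F}_T$.
4. $\|f\|^*_{BAC} = \inf\{\sum_{i=1}^k |\lambda_i| : f = \sum_{i=1}^k \lambda_i\mathcal{D}g_i;\ g_i \in S_T\}$ for $f \in \mathcal{F}_T$. Hence the dual of the BAC norm measures how efficiently one can write $f$ as a linear combination of the $\mathcal{D}g_i$, $g_i \in S_T$.

Remark 3.4.4. If $f \notin \mathcal{F}_T$ then $\mathrm{supp}(f) \nsubseteq \Omega(T)$, so $f$ is not of the form $\sum_{i=1}^k \lambda_i\mathcal{D}g_i$ with $g_i \in \mathcal{F}_T$, as the right-hand side vanishes outside $\Omega(T)$.

Proof.
1. Suppose $(\tilde x_1,\ldots,\tilde x_d) \in \Omega(T)^C$. Then there is a $J \subsetneq [d]$ such that $K_J(\tilde x_{[d]\setminus J}) > T$, where $K_J$ is the function in the definition of $\Omega_J(T)$. Let $g \in \mathcal{F}_T$; then $g(\tilde x_{[d]\setminus J}, x_J) = 0$ for all $x_J \in X_J$. So
\[
\mathcal{D}g(\tilde x_{[d]\setminus J}, x_J) = g(\tilde x_{[d]\setminus J}, x_J)\,E(x) = 0
\]
for some function $E$, so $\mathcal{D}g \in \mathcal{F}_T$.

2. It follows directly from the definition that $\|f+g\|_{BAC} \le \|f\|_{BAC} + \|g\|_{BAC}$ and $\|\lambda f\|_{BAC} = |\lambda|\,\|f\|_{BAC}$ for any $\lambda \in \mathbb{R}$. Now suppose $f \in \mathcal{F}_T$ is not identically zero; we need to show that $\|f\|_{BAC} \ne 0$. Since $f$ is defined on a finite set, we have $\|f\|_\infty < \infty$. Let $g = \gamma f$, where $\gamma > 0$ is a constant such that $\|g\|_\infty < 2$. Then $g \in S_T$ and $\langle f, \mathcal{D}g\rangle_\mu = \langle f, \mathcal{D}\gamma f\rangle_\mu = \gamma^{2^d-1}\langle f, \mathcal{D}f\rangle_\mu > 0$, so $\|f\|_{BAC} > 0$.
Now, since $\mathrm{supp}(\mathcal{D}g) \subseteq \Omega(T)$, we have for any $f \in \mathcal{F}$
\[
\|f\|_{BAC} = \sup_{g\in S_T} |\langle f, \mathcal{D}g\rangle_\mu| = \sup_{g\in S_T} |\langle f\cdot 1_{\Omega(T)}, \mathcal{D}g\rangle_\mu| = \|f\cdot 1_{\Omega(T)}\|_{BAC}.
\]

3. 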
If there is an f ∈ FT , f is not identically zero and f /∈ span{Dg : g ∈ ST } So f ∈ span{Dg :g ∈ ST }⊥ then 〈f,Dg〉 = 0 for all g ∈ ST . So ‖f‖BAC = 0 which is a contradiction.4. This is a standard argument. Define ‖f‖D = inf{∑ki=1 |λi| : f =∑ki=1 λiDgi, gi ∈ ST }whichcan be easily verified to be a norm on FT . Now let φ, f ∈ FT , f =∑ki=1 λiDgi, gi ∈ ST , then| 〈φ, f〉 | =k∑i=1|λi|| 〈φ,Dgi〉 | ≤ ‖φ‖BACk∑i=1|λi| ≤ ‖φ‖BAC ‖f‖Dso‖f‖∗BAC ≤ ‖f‖DNext for all g ∈ ST , we have ‖Dg‖D ≤ 1 then‖f‖BAC = supg∈ST| 〈f,Dg〉 | ≤ sup‖h‖D≤1| 〈f, h〉 | = ‖f‖∗Dso ‖f‖BAC ≤ ‖f‖∗D i.e. ‖f‖∗BAC ≥ ‖f‖D. So ‖f‖∗BAC = ‖f‖D.Now let us prove the following lemma whose proof relies on the dual function estimate. From herewe consider our inner product 〈 · 〉µ and the norm ‖ · ‖µ . This argument also works for any normfor which one has the dual function estimate.Lemma 3.4.5. Let φ ∈ FT be such that ‖φ‖∗BAC ≤ C and η > 0. Let φ+ := max{0, φ}. Then thereis a polynomial P (u) = amum + ...+ a1u+ a0 such that1. ‖P (φ)− φ+‖∞ ≤ η2. ‖P (φ)‖∗dµ ≤ ρ(C, T, η)whereρ(C, T, η) := 2 inf RP (C)90where the infimum is taken over polynomials P such that ‖P − φ+‖∞ ≤ η on [−CT,CT ] andRP (x) =m∑j=0C(j)|aj |xj , where C(m) is the constant in the dual function estimateProof. First, recall that if (x1, ..., xd) ∈ supp(Dgi) ⊆ Ω(T ) then|Dg(x1, ..., xd)| ≤ TNow suppose ‖φ‖∗BAC ≤ C then there exist g1, .., gk ∈ ST and λ1, ..., λk such that φ =∑ki=1 λiDgiand∑1≤i≤k |λi| ≤ C. Hence applying the boundedness of the duals,|φ(x1, ..., xd)| ≤ (k∑i=1|λi|)( max1≤i≤k|Dgi(x1, .., xd)|) ≤ CTHence the Range of φ= φ(Ω(T )) ⊆ [−CT,CT ]. 
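Then by the Weierstrass approximation theorem, there is a polynomial $P$ (which may depend on $C, T, \eta$) such that $R_P(C) \le \rho$ and
\[ |P(u) - u_+| \le \eta \quad \text{for all } |u| \le CT, \]
and so $\|P(\phi) - \phi_+\|_\infty \le \eta$, which gives (1).
Now using the dual function estimate, we have
\[ \|\phi^m\|^*_{\Box^d_\mu} = \Big\|\Big(\sum_{1\le i\le k} \lambda_i \mathcal{D}g_i\Big)^m\Big\|^*_{\Box^d_\mu} \le \sum_{1\le i_1,\ldots,i_m\le k} |\lambda_{i_1}\cdots\lambda_{i_m}|\,\|\mathcal{D}g_{i_1}\cdots\mathcal{D}g_{i_m}\|^*_{\Box^d_\mu} \le C(m) \sum_{1\le i_1,\ldots,i_m\le k} |\lambda_{i_1}\cdots\lambda_{i_m}| = C(m)\Big(\sum_{1\le i\le k}|\lambda_i|\Big)^m \le C(m)\,C^m. \]
Hence $\|P(\phi)\|^*_{\Box^d_\mu} \le \sum_{j=0}^m |a_j|\,C(j)\,C^j = R_P(C) \le \rho(C,T,\eta)$.

Now we are ready to prove the transference principle.

Theorem 3.4.6 (Transference Principle). Suppose $\nu$ gives an independent weight system. Let $f \in \mathcal{F}$ with $0 \le f(x_{[d]}) \le \nu_{[d]}(x_{[d]})$, and let $\eta > 0$. Suppose $N \ge N(\eta, T)$ is large enough. Then there are functions $g, h$ on $X_1 \times \cdots \times X_d$ such that
\[ f = g + h \text{ on } \Omega(T), \qquad 0 \le g \le 2 \text{ on } \Omega(T), \qquad \|h\cdot 1_{\Omega(T)}\|_{\Box^d_\mu} \le \eta. \]
Since $h\cdot 1_{\Omega(T)} = f\cdot 1_{\Omega(T)} - g\cdot 1_{\Omega(T)}$, we have $-2 \le h\cdot 1_{\Omega(T)} \le \nu$, so $|h\cdot 1_{\Omega(T)}| \le \nu + 2$ and $h\cdot 1_{\Omega(T)} \in S_T$. Hence, by the definition of the BAC-norm, a bound $\|h\cdot 1_{\Omega(T)}\|_{BAC} \le \eta$ gives
\[ \eta \ge \|h\cdot 1_{\Omega(T)}\|_{BAC} \ge \langle h\cdot 1_{\Omega(T)}, \mathcal{D}(h\cdot 1_{\Omega(T)})\rangle_\mu = \|h\cdot 1_{\Omega(T)}\|^{2^d}_{\Box^d_\mu}, \]
that is $\|h\cdot 1_{\Omega(T)}\|_{\Box^d_\mu} \le \eta^{1/2^d}$. Therefore, to prove Theorem 3.4.6 it suffices to show $\|h\cdot 1_{\Omega(T)}\|_{BAC} \le \eta$ (without loss of generality, replacing $\eta^{1/2^d}$ with $\eta$). Here the BAC-norm is taken with respect to $\langle\cdot,\cdot\rangle_\mu$.

The following lemma will be used in the next proof.

Lemma 3.4.7 (Finite dimensional Hahn-Banach Theorem). Let $X = \mathbb{R}^d$ be a normed space and $f \in X$ with $\|f\| > 1$. Then there is a vector $\phi \in \mathbb{R}^d$ such that $\langle f, \phi\rangle > 1$ and $|\langle g, \phi\rangle| \le 1$ whenever $\|g\| \le 1$ (i.e. $\|\phi\|^* \le 1$).

The following lemma is an easy corollary of the Hahn-Banach theorem, obtained by considering, for given $c_1, c_2 > 0$ and two norms $\|\cdot\|_1, \|\cdot\|_2$, the norm $\|f\| = \inf\{c_1\|f_1\|_1 + c_2\|f_2\|_2 : f = f_1 + f_2\}$, whose dual is $\|\phi\|^* = \max\{c_1^{-1}\|\phi\|_1^*, c_2^{-1}\|\phi\|_2^*\}$.

Lemma 3.4.8 ([37], Cor. 3.2). Let $K_1, K_2$ be closed convex subsets of $\mathbb{R}^d$, each containing $0$, and suppose $f \in \mathbb{R}^d$ cannot be written as a sum $f_1 + f_2$ with $f_1 \in c_1K_1$, $f_2 \in c_2K_2$, where $c_1 > 0$, $c_2 > 0$.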
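Then there is a linear functional $\phi$ such that $\langle f, \phi\rangle > 1$, $\langle f_1, \phi\rangle \le c_1^{-1}$ for all $f_1 \in K_1$, and $\langle f_2, \phi\rangle \le c_2^{-1}$ for all $f_2 \in K_2$.

Proof of Theorem 3.4.6: Define
\[ K_1 := \{g \in \mathcal{F} : 0 \le g \le 2 \text{ on } \Omega(T)\}, \qquad K_2 := \{h \in \mathcal{F} : \|h\|_{BAC} \le \eta\}. \]
It is clear that $K_1, K_2$ are convex (also $0 \in K_1$ and $0 \in \operatorname{Int}(K_2)$). Assume that $f \notin K_1 + K_2$ on $\Omega(T)$. Then by Lemma 3.4.8 there exists $\phi \in \mathcal{F}$ such that
\[ \langle \phi, f\cdot 1_{\Omega(T)}\rangle_\mu > 1, \qquad \langle \phi, g\rangle_\mu \le 1\ \ \forall g \in K_1, \qquad \langle \phi, h\rangle_\mu \le 1\ \ \forall h \in K_2. \]
First, we claim that $\phi \in \mathcal{F}_T$. To see this, suppose $g$ is a function with $\operatorname{supp}(g) \subseteq \Omega(T)^C$ (i.e. $g \equiv 0$ on $\Omega(T)$, so $g \in K_1$). Since $g \in K_1$ we have $\langle \phi, g\rangle_\mu \le 1$, but $g$ can be chosen arbitrarily on $\Omega(T)^C$, so we must have $\phi|_{\Omega(T)^C} \equiv 0$ and hence $\phi \in \mathcal{F}_T$. Now let
\[ g(x_{[d]}) = \begin{cases} 2 & \text{if } \phi(x_{[d]}) \ge 0, \\ 0 & \text{otherwise.} \end{cases} \]
Then $g \in K_1$ and
\[ \langle \phi, g\rangle_\mu = \langle \phi_+, 2\rangle_\mu = 2\langle \phi_+, 1\rangle_\mu \le 1 \ \Rightarrow\ \langle \phi_+, 1\rangle_\mu \le \tfrac{1}{2}. \]
Now we want to show that $\|\phi\|^*_{BAC} \le \eta^{-1}$. Let $h' \in \mathcal{F}_T$ with $\|h'\|_{BAC} \le 1$. Then $\|h'\cdot 1_{\Omega(T)}\|_{BAC} = \|h'\|_{BAC} \le 1$, so $\eta h' \in K_2$ and hence $\langle \phi, \eta h'\rangle_\mu \le 1$. Therefore
\[ \langle \phi, h'\rangle_\mu \le \eta^{-1} \quad \forall h' \in \mathcal{F}_T,\ \|h'\|_{BAC} \le 1, \]
so $\|\phi\|^*_{BAC} \le \eta^{-1}$, as $\|\cdot\|_{BAC}$ is a norm on $\mathcal{F}_T$.

Now we want to invoke the positivity of $\phi_+$. By Lemma 3.4.5 (applied with $C = \eta^{-1}$ and approximation parameter $1/8$), there is a polynomial $P$ such that
\[ \|P(\phi) - \phi_+\|_\infty \le \tfrac{1}{8} \quad\text{and}\quad \|P(\phi)\|^*_{\Box^d_\mu} \le \rho(\eta^{-1}, T, \tfrac{1}{8}). \]
Then $\langle P(\phi), 1\rangle_\mu \le \langle P(\phi) - \phi_+, 1\rangle_\mu + \langle \phi_+, 1\rangle_\mu \le \tfrac{1}{8} + \tfrac{1}{2}$. Also, from the definition of the weighted box norm and the linear forms condition, we have
\[ \|\nu_{[d]}(x_{[d]}) - 1\|^{2^d}_{\Box^d_\mu} = o_{N\to\infty}(1), \]
so if $N \ge N(T, \eta)$ then
\[ \langle P(\phi), \nu\rangle_\mu = \langle P(\phi), 1\rangle_\mu + \langle P(\phi), \nu - 1\rangle_\mu \le \tfrac{1}{2} + \tfrac{1}{8} + \|P(\phi)\|^*_{\Box^d_\mu}\|\nu - 1\|_{\Box^d_\mu} \le \tfrac{1}{2} + \tfrac{1}{4} = \tfrac{3}{4}, \]
\[ |\langle \nu, \phi_+\rangle_\mu| \le |\langle \nu, \phi_+ - P(\phi)\rangle_\mu| + |\langle \nu, P(\phi)\rangle_\mu| \le \|\phi_+ - P(\phi)\|_\infty \langle \nu, 1\rangle_\mu + \tfrac{3}{4} \le \tfrac{1}{8}(1 + o(1)) + \tfrac{3}{4}. \]
By the positivity of $\phi_+$ we have $\langle f, \phi_+\rangle_\mu \le \langle \nu, \phi_+\rangle_\mu$. Hence, for $N$ sufficiently large,
\[ \langle f\cdot 1_{\Omega(T)}, \phi\rangle_\mu = \langle f, \phi\rangle_\mu \le \langle f, \phi_+\rangle_\mu \le \langle \nu, \phi_+\rangle_\mu \le \tfrac{3}{4} + \tfrac{1}{8} + o(1) < 1, \]
which is a contradiction. Hence $f \in K_1 + K_2$ on $\Omega(T)$.

Now we can rephrase Theorem 3.4.6 as follows:

Theorem 3.4.9 (Transference Principle). Suppose $\nu$ is an independent weight system. Let $f \in \mathcal{F}$ with $0 \le f \le \nu$, let $0 < \eta < 1$ and let $T > 1$ be as in Definition 3.4.1. Then there exist $f_1, f_2, f_3 \in \mathcal{F}$ such that
1. $f = f_1 + f_2 + f_3$,
2. $0 \le f_1 \le 2$, $\operatorname{supp}(f_1) \subseteq \Omega(T)$,
3. $\|f_2\|_{\Box^d_\mu} \le \eta$, $\operatorname{supp}(f_2) \subseteq \Omega(T)$,
4. $0 \le f_3 \le \nu$, $\operatorname{supp}(f_3) \subseteq \Omega(T)^C$, $\|f_3\|_{L^1_\mu} \lesssim \frac{1}{T}$.

Proof. Let $g, h$ be as in Theorem 3.4.6. Take $f_1 = g\cdot 1_{\Omega(T)}$ and $f_2 = h\cdot 1_{\Omega(T)}$; then $f\cdot 1_{\Omega(T)} = f_1 + f_2$. Let $f_3 = f\cdot 1_{\Omega(T)^C}$. Now, by the linear forms condition,
\[ \|f_3\|_{L^1_\mu} \le \frac{1}{T}\,E_{x_{[d+1]\setminus\{d\}}}\, f \cdot \prod_{I \subseteq [d+1]\setminus\{d\},\ |I| < d} \nu_I(x_I)\cdot |\mathcal{D}f| \qquad (\text{since } |\mathcal{D}f| > T \text{ on } \Omega(T)^C) \]
\[ \le \frac{1}{T}\,E_{x_{[d+1]\setminus\{d\}}} E_{y_{[d+1]\setminus\{d\}}} \prod_{I\subseteq[d+1]\setminus\{d\}} \nu_I(L_I(x_I)) \prod_{I\subseteq[d+1]\setminus\{d\}}\ \prod_{\omega_I \ne 0} \nu_I(P_{\omega_I}(x_I, y_I)) \lesssim \frac{1}{T}. \]

3.5 Relative Hypergraph Removal Lemma

First let us recall the statement of the ordinary functional hypergraph removal lemma [104]. Recall the definition of $\Lambda$ in equation (3.1.1).

Theorem 3.5.1. Let $(X_1, \mu_{X_1}), \ldots, (X_{d+1}, \mu_{X_{d+1}})$ be probability measure spaces and let $f^{(i)} : X_I \to [0,1]$, $I = [d+1]\setminus\{i\}$. Let $\varepsilon > 0$ and suppose $|\Lambda_{d+1}(f^{(1)}, \ldots, f^{(d)}, f^{(d+1)})| \le \varepsilon$. Then for $1 \le i \le d+1$ there exist
\[ E_i \subseteq X_{[d+1]\setminus\{i\}} \]
such that $\prod_{1\le j\le d+1} 1_{E_j} \equiv 0$ and, for $1 \le i \le d+1$,
\[ \int_{X_1}\cdots\int_{X_{d+1}} f^{(i)}\cdot 1_{E_i^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}} \le \delta(\varepsilon), \]
where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.

Remark 3.5.2. In fact the paper [104] proves this theorem only for the counting measure (with the notion of $e$-discrepancy in place of the box norm). But the proof also works for any finite measure that has a direct product structure (with the notion of weighted box norm), as the energy increment in [104] or [18] runs through in the same way. See also [100] for the case of probability measures in $d = 2, 3$. However, we do not know how to generalize this argument to an arbitrary measure on the product space. If we could prove this theorem for any measure $\mu_{X_1\times\cdots\times X_d}$, then we would be able to prove the multidimensional Green-Tao theorem.

The proof of this removal lemma relies on the following regularity lemma.

Theorem 3.5.3 (Szemerédi's Regularity Lemma (Tao) [104]). Let $(X_{[d+1]}, \mu)$ be a weighted hypergraph system with pseudorandom weight attached only on each edge of size 1.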
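Let $f : X_{[d+1]} \to [0,1]$ be measurable, let $\tau > 0$, and let $F : \mathbb{N} \to \mathbb{N}$ be an arbitrary increasing function (possibly depending on $\tau$). Then there are an integer $M = O_{F,\tau}(1)$ and factors $\mathcal{B}_I$ ($I \subseteq [d+1]$, $|I| = d$) on $X_I$ of complexity at most $M$ such that $f = f_1 + f_2 + f_3$, where
– $f_1 = E(f \mid \bigvee_{I\subseteq[d+1],\,|I|=d} \mathcal{B}_I)$.
– $\|f_2\|_{L^2_\mu} \le \tau$.
– $\|f_3\|_{\Box^d_\mu} \le F(M)^{-1}$.
– $f_1,\ f_1 + f_2 \in [0,1]$.

Remark 3.5.4. A consequence of this theorem that we will use later is the following: since $f_1$ is constant on each atom of $\bigvee_{|I|=d} \mathcal{B}_I$, we can decompose $f_1$ as a finite sum, with $O_M(1)$ terms, of lower complexity functions, i.e. a finite sum of products $\prod_{i=1}^{d+1} J_i$ where $J_i$ is a function of the variables $x_{[d+1]\setminus\{i\}}$ and takes values in $[0,1]$.

Theorem 3.5.5 (Weighted Simplex-Removal Lemma). Suppose $f^{(i)}(x_{[d+1]\setminus\{i\}}) \le \nu_{[d+1]\setminus\{i\}}(x_{[d+1]\setminus\{i\}})$. Let $\varepsilon > 0$ and suppose $|\Lambda| \le \varepsilon$. Then there exist $E_i \subseteq \prod_{j\in[d+1]\setminus\{i\}} X_j$ such that for $1 \le i \le d+1$,
– $\prod_{i\in[d+1]} 1_{E_i} \equiv 0$,
– $\int_{X_1}\cdots\int_{X_{d+1}} f^{(i)} 1_{E_i^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}} = E_{x_{[d+1]\setminus\{i\}}} 1_{E_i^C}\, f^{(i)}(x_{[d+1]\setminus\{i\}}) \prod_{J \subsetneq [d+1]\setminus\{i\}} \nu_J(x_J) \le \delta(\varepsilon)$,
where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.

Proof. Using the transference principle (Theorem 3.4.9) for $1 \le i \le d+1$, write $f^{(i)} = g^{(i)} + h^{(i)} + k^{(i)}$, where
1. $f^{(i)} = g^{(i)} + h^{(i)} + k^{(i)}$,
2. $0 \le g^{(i)} \le 2$, $\operatorname{supp}(g^{(i)}) \subseteq \Omega^{(i)}(T)$,
3. $\|h^{(i)}\|_{\Box^d_\mu} \le \eta$, $\operatorname{supp}(h^{(i)}) \subseteq \Omega^{(i)}(T)$,
4. $k^{(i)} = f^{(i)}\cdot 1_{(\Omega^{(i)}(T))^C}$,
and where
\[ \Omega^{(i)}(T) = \{x_{[d+1]\setminus\{i\}} : |\mathcal{D}f^{(i)}| \le T\}, \qquad 1 \le i \le d+1. \]
Step 1: We will show that if $T \ge T(\varepsilon)$ is sufficiently large then⁵
\[ \Lambda_{d+1,\mu}(g^{(1)} + h^{(1)}, \ldots, g^{(d+1)} + h^{(d+1)}) = \Lambda_{d+1,\mu}(f^{(1)} - k^{(1)}, \ldots, f^{(d+1)} - k^{(d+1)}) \lesssim \varepsilon. \]
⁵ $\Lambda_{d+1}$ is indeed defined on our weighted measures.
Proof of Step 1: Expanding by multilinearity, the left-hand side can be written as a sum over $I \subseteq [d+1]$ of the terms
\[ \Lambda_{d+1,I,\mu}(e^{(1)}, \ldots, e^{(d)}, e^{(d+1)}), \qquad e^{(i)} = \begin{cases} -k^{(i)} & \text{if } i \in I, \\ f^{(i)} & \text{if } i \notin I. \end{cases} \]
If $I = \emptyset$ then $\Lambda_{d+1,\mu}(f^{(1)}, \ldots, f^{(d)}, f^{(d+1)}) \le \varepsilon$ by assumption.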
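Suppose $I = \{i_1, \ldots, i_r\} \ne \emptyset$. Then, fixing any $i_1 \in I$,
\[ |\Lambda_{d+1,I,\mu}(e^{(1)}, \ldots, e^{(d)}, e^{(d+1)})| = \Big|\int_{X_1}\cdots\int_{X_{d+1}} f^{(1)}\cdots f^{(d+1)}\cdot\prod_{i\in I} 1_{(\Omega^{(i)})^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}}\Big| \]
\[ \le E_{x_{[d+1]}} \prod_{I\subseteq[d+1],\,|I|\le d} \nu_I(x_I)\,1_{(\Omega^{(i_1)})^C} \]
\[ \le \frac{1}{T}\,E_{x_{[d+1]}} E_{y_{[d+1]\setminus\{i_1\}}} \prod_{I\subseteq[d+1],\,|I|\le d} \nu_I(x_I) \prod_{\substack{\omega_I\ne 0 \\ I\subseteq[d+1]\setminus\{i_1\}}} \nu_I(P_{\omega_I}(x_I, y_I)) \lesssim \frac{1}{T} \le \varepsilon \]
by the linear forms condition.

Step 2: We will show that $\Lambda_{d+1,\mu}(g^{(1)}, \ldots, g^{(d+1)}) \lesssim \varepsilon$ if $\eta \le \eta(\varepsilon)$ and $N \ge N(\varepsilon, \eta)$.
Proof of Step 2: Write $g^{(i)} = g^{(i)} + h^{(i)} - h^{(i)} = f^{(i)}\cdot 1_{\Omega^{(i)}(T)} - h^{(i)}$. Then we have
\[ 0 \le f^{(i)}\cdot 1_{\Omega^{(i)}(T)} \le \nu_{[d+1]\setminus\{i\}}, \qquad \|h^{(i)}\|_{\Box^d_\mu} \le \eta, \]
so by the weighted von Neumann inequality and Step 1 we have
\[ |\Lambda_{d+1,\mu}(g^{(1)}, \ldots, g^{(d+1)})| = \Big|\Lambda_{d+1,\mu}(g^{(1)} + h^{(1)}, \ldots, g^{(d+1)} + h^{(d+1)}) - \sum_{\exists i:\ e^{(i)} = h^{(i)}} \Lambda_{d+1,\mu}(e^{(1)}, \ldots, e^{(d)}, e^{(d+1)})\Big| \lesssim \varepsilon + \eta + o_{N\to\infty}(1) \lesssim \varepsilon, \]
where the sum runs over the choices $e^{(i)} \in \{g^{(i)}, h^{(i)}\}$ with at least one $e^{(i)} = h^{(i)}$, provided $1/T \le \varepsilon$, $\eta \le \varepsilon$ and $N \ge N(\varepsilon)$. This completes the proof of Step 2.

Now since $0 \le g^{(i)} \le 2$, the ordinary hypergraph removal lemma (Theorem 3.5.1), applied after normalizing, gives sets $F_i \subseteq X_{[d+1]\setminus\{i\}}$ such that $\prod_{1\le k\le d+1} 1_{F_k} \equiv 0$ and
\[ \int_{X_1}\cdots\int_{X_{d+1}} g^{(i)}\cdot 1_{F_i^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}} \lesssim \delta(\varepsilon), \]
so
\[ \int_{X_1}\cdots\int_{X_{d+1}} f^{(i)}\cdot 1_{F_i^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}} \lesssim \delta(\varepsilon) + \underbrace{\int_{X_1}\cdots\int_{X_{d+1}} h^{(i)}\cdot 1_{F_i^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}}}_{(A)} + \underbrace{\int_{X_1}\cdots\int_{X_{d+1}} f^{(i)}\cdot 1_{(\Omega^{(i)}(T))^C}\,1_{F_i^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}}}_{(B)}. \]
Now for our purpose it suffices to show $(A), (B) \lesssim \varepsilon$.

Estimate for (A) (error from the uniform function): By the assumption on the complexity of the $\sigma$-algebras, the function $1_{F_i^C}$ can be written as a sum of $O_M(1)$ functions of the form $\prod_{j\in[d+1]\setminus\{i\}} u_j^{(i)}$, where $u_j^{(i)}$ is a $[0,1]$-valued function of $x_{[d+1]\setminus\{i,j\}}$. We can estimate the term coming from each product $\prod_{j\in[d+1]\setminus\{i\}} u_j^{(i)}$ individually. We apply the Cauchy-Schwarz inequality $d$ times to estimate the expression (A) (here we assume $i < d$; the case $i = d$ is the same)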
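\[ \Big(\int_{X_1}\cdots\int_{X_d}\int_{X_{d+1}} h^{(i)}\cdot 1_{F_i^C}\,d\mu_{X_1}\cdots d\mu_{X_d}\,d\mu_{X_{d+1}}\Big)^{2^d} \]
\[ \lesssim \Big[\Big(\int_{X_1}\cdots\int_{X_d}\Big(\int_{X_{d+1}} h^{(i)} \prod_{\substack{1\le j\le d \\ j\ne i}} u_j^{(i)}\,d\mu_{X_{d+1}}\Big)\,u_{d+1}^{(i)}\,d\mu_{X_1}\cdots d\mu_{X_d}\Big)^2\Big]^{2^{d-1}} \]
\[ \le \Big[\int_{X_1}\cdots\int_{X_d}\Big(\int_{X_{d+1}} h^{(i)} \prod_{\substack{1\le j\le d \\ j\ne i}} u_j^{(i)}\,d\mu_{X_{d+1}}\Big)^2 d\mu_{X_1}\cdots d\mu_{X_d} \times \int_{X_1}\cdots\int_{X_d} (u_{d+1}^{(i)})^2\,d\mu_{X_1}\cdots d\mu_{X_d}\Big]^{2^{d-1}} \]
\[ \lesssim \Big[\int_{X_1}\cdots\int_{X_d}\int_{X_{d+1}}\int_{Y_{d+1}} h^{(i)}(x_{[d]\setminus\{i\}}, x_{d+1})\,h^{(i)}(x_{[d]\setminus\{i\}}, y_{d+1}) \prod_{\substack{1\le j\le d \\ j\ne i}} u_j^{(i)}(x_{[d]\setminus\{i\}}, x_{d+1})\,u_j^{(i)}(x_{[d]\setminus\{i\}}, y_{d+1})\,d\mu_{X_1}\cdots d\mu_{X_{d+1}}\,d\mu_{Y_{d+1}}\Big]^{2^{d-1}}. \]
We continue applying the Cauchy-Schwarz inequality in this way. After $d$ applications, the positive functions $u_j^{(i)}$ have all disappeared and the expression is bounded by $\|h^{(i)}\|^{2^d}_{\Box^d_\mu} \le \varepsilon$.

Estimate for (B): Next we estimate the expression in (B):
\[ \Big|\int_{X_1}\cdots\int_{X_{d+1}} f^{(i)}\cdot 1_{(\Omega^{(i)}(T))^C}\cdot 1_{F_i^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}}\Big| \le \int_{X_1}\cdots\int_{X_{d+1}} \nu_{[d+1]\setminus\{i\}}\cdot 1_{(\Omega^{(i)}(T))^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}} \]
\[ \le \frac{1}{T}\,E_{x_{[d+1]}} E_{y_{[d+1]\setminus\{i\}}}\,\nu_{[d+1]\setminus\{i\}}(x_{[d+1]\setminus\{i\}}) \prod_{|I|\le d} \nu_I(x_I) \prod_{\substack{\omega_I\ne 0 \\ I\subseteq[d+1]\setminus\{i\}}} \nu_I(P_{\omega_I}(x_I, y_I)) \lesssim \frac{1}{T}, \]
by the linear forms condition. Hence if we choose $T$ sufficiently large then
\[ \int_{X_1}\cdots\int_{X_{d+1}} f^{(i)}\cdot 1_{F_i^C}\,d\mu_{X_1}\cdots d\mu_{X_{d+1}} \lesssim \delta(\varepsilon). \]

3.6 Proof of the Main Result

3.6.1 From $\mathbb{Z}_N$ to $\mathbb{Z}$

First, recall that $\nu_{\varepsilon_1,\varepsilon_2}(n) \approx \frac{\phi(W)}{W}\log N$ for $\varepsilon_1 N \le n \le \varepsilon_2 N$, $\varepsilon_1, \varepsilon_2 \in (0,1]$, for a sufficiently large prime $N$ in the residue class $b \pmod W$. Also, Lemma 4.7.1 allows us to work in $(\mathbb{Z}/N')^d$ for some big prime $N'$. By the pigeonhole principle (see Lemma 4.7.2 in Chapter 4) we may choose $b \in (\mathbb{Z}/W)^d$ and small $\varepsilon_1, \varepsilon_2 > 0$ in the definition of $\nu$ and $A'$ such that
\[ |A'| := \big|\{n \in [1, N/W]^d : Wn + b \in A\} \cap [\varepsilon_1 N', \varepsilon_2 N']^d\big| \ge \alpha\,\frac{\varepsilon_2^d}{2}\,\frac{(N')^d\,W^d}{(\log N')^d\,\phi(W)^d}. \]
Here we choose $N'$ so that $\varepsilon_2 N' = \frac{N}{W}(1 + o_{N,W\to\infty}(1))$.

3.6.2 Proof of the Main Theorem

To prove the theorem, suppose on the contrary that $A'$ contains fewer than $\varepsilon\,\frac{N'^{d+1}}{(\log N')^{2d}}$ corners ($\varepsilon = c(\alpha)$). Then
\[ \Lambda_{d+1,\mu}(f^{(1)}, \ldots, f^{(d+1)}) = (N')^{-(d+1)} \sum_{x_{[d+1]}} \prod_{1\le i\le d} 1_{A'}\Big(x_1, \ldots, x_{i-1},\ x_{d+1} - \sum_{\substack{1\le j\le d \\ j\ne i}} x_j,\ x_{i+1}, \ldots, x_d\Big)\,\nu_I\,1_{A'}(x_1, \ldots, x_d)\cdot\nu(x_1)\cdots\nu(x_d) \]
\[ \le \frac{1}{N'^{d+1}} \sum_{\substack{p_i \in A',\ 1\le i\le 2d, \\ \text{constituting a corner}}}\ \prod_{1\le k\le d} 1_{A'}(p_1, \ldots, p_{k-1}, p_{d+k}, p_{k+1}, \ldots, p_d)\,1_A(p_1, \ldots, p_d)\,\nu(p_1)\cdots\nu(p_{2d}) \]
\[ \lesssim \frac{1}{N'^{d+1}}\Big(\frac{\phi(W)\log N'}{W}\Big)^{2d} \times (\text{the number of corners in } A') \le \varepsilon. \]
Now assume that $\Lambda_{d+1,\mu}(f^{(1)}, \ldots, f^{(d)}, f^{(d+1)}) \lesssim \varepsilon$. Then by the relative hypergraph removal lemma there exist
\[ E_i \subseteq X_{[d+1]\setminus\{i\}} := \tilde{X}_i, \qquad 1 \le i \le d+1, \]
such that
\[ \prod_{1\le i\le d+1} 1_{E_i} \equiv 0, \qquad \int_{\tilde{X}_i} f^{(i)}\,1_{E_i^C}\,d\mu_{\tilde{X}_i} \lesssim \delta(\varepsilon), \]
where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$. Let $A' = A \cap [\delta_1 N, \delta_2 N]^d$, $z = \sum_{1\le j\le d} x_j$, and $g_{A'} := g\cdot 1_{A'}$ for any function $g$. Then
\[ \tilde{\Lambda} := N'^{-d} \sum_{(x_1,\ldots,x_d)\in A'} f^{(1)}_{A'}(x_2, \ldots, x_d, z)\,f^{(2)}_{A'}(x_1, x_3, \ldots, x_d, z)\cdots f^{(d)}_{A'}(x_1, x_2, \ldots, x_{d-1}, z)\,f^{(d+1)}_{A'}(x_1, \ldots, x_d) \]
\[ \ge N'^{-d} \sum_{(x_1,\ldots,x_d)\in A'} \nu(x_1)\cdots\nu(x_d) \gtrsim (N')^{-d}\Big(\frac{\phi(W)}{W}\log N'\Big)^d\cdot\alpha\cdot\frac{(N'W)^d}{(\phi(W)\log N')^d} = \alpha \]
for arbitrarily large $N'$. Now
\[ \tilde{\Lambda} = E_{x_{[d]}}\big(f^{(1)}_{A'} 1_{E_1} + f^{(1)}_{A'} 1_{E_1^C}\big)\cdots\big(f^{(d+1)}_{A'} 1_{E_{d+1}} + f^{(d+1)}_{A'} 1_{E_{d+1}^C}\big). \]
By assumption we have $E_{x_{[d]}} f^{(1)}_{A'}\cdot 1_{E_1}\cdots f^{(d+1)}_{A'}\cdot 1_{E_{d+1}} = 0$, so we just need to estimate each of the other terms individually.
Consider $E_{x_{[d]}} f^{(1)}_{A'}\cdot 1_{E_1^C}\,f^{(2)}_{A'}\cdot 1_{E_2^\pm}\cdots f^{(d+1)}_{A'}\cdot 1_{E_{d+1}^\pm}$, where $F^\pm$ can be either $F$ or $F^C$ for any set $F$. Since
\[ 0 \le f^{(j)}_{A'}\,1_{E_j^\pm} \le \nu(x_j) \ \ (2 \le j \le d) \quad\text{and}\quad 0 \le f^{(d+1)}_{A} \le 1, \]
we have
\[ E_{x_{[d]}} f^{(1)}_{A'}\cdot 1_{E_1^C}\,f^{(2)}_{A'}\cdot 1_{E_2^\pm}\cdots f^{(d+1)}_{A'}\cdot 1_{E_{d+1}^\pm} \le E_{x_{[d]}} f^{(1)}_{A'}\cdot 1_{E_1^C}\,\nu(x_2)\cdots\nu(x_d) = \int_{\tilde{X}_1} f^{(1)}\cdot 1_{E_1^C}\,d\mu_{X_2}\cdots d\mu_{X_{d+1}} \lesssim \delta(\varepsilon). \]
In the same way, we have for any $1 \le i \le d+1$,
\[ E_{x_{[d]}} f^{(i)}_{A'}\cdot 1_{E_i^C} \prod_{1\le j\le d+1,\ j\ne i}\big(f^{(j)}\cdot 1_{E_j^\pm}\big) \lesssim \delta(\varepsilon). \]
So if $N' > N(\alpha)$ then
\[ E_{x_{[d]}} f^{(1)}_{A'}(x_2, \ldots, x_d, z)\,f^{(2)}_{A'}(x_1, x_3, \ldots, x_d, z)\cdots f^{(d)}_{A'}(x_1, \ldots, x_{d-1}, z)\,f^{(d+1)}_{A'}(x_1, \ldots, x_d) \lesssim \delta(\varepsilon) = o(\alpha). \]
This is a contradiction. Hence there are $\gtrsim \varepsilon\,\frac{N'^{d+1}}{(\log N')^{2d}}$ corners in $A$. Note that the number of degenerate corners is at most $O\big(\frac{N'^d}{(\log N')^d}\big)$, as a corner is degenerate (and then degenerates into a single point) iff $z = \sum_{1\le j\le d} x_j$.

3.7 Further Remarks: Conlon-Fox-Zhao's Densification Trick

One important feature that makes the transference principle work is that we are working on a $d$-regular hypergraph; this is the same as in [52], [16].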
To prove a more general version of the theorem, we have to consider a more general version of the hypergraph removal lemma; this will be discussed in the next chapter.

A natural question about our method is whether the correlation condition is needed in our proof. In the setting of Gaussian primes, the correlation condition was needed to deal with the fact that if $p$ is a Gaussian prime then its conjugate $\bar{p}$ is also a Gaussian prime. Later, Conlon-Fox-Zhao [16] developed the "densification technique" to prove a relative Szemerédi theorem like Theorem 1.2.1, but with $\nu$ satisfying only certain linear forms conditions. The densification trick allows one to replace a sparse edge with a dense edge. No correlation conditions or bounded dual conditions are assumed, as this technique allows most factors in the correlation to be bounded. The question of simplifying the pseudorandomness conditions is an interesting and active research question. Gowers [37] asked if $\|\nu - 1\|_{U^s} = o(1)$ for some large $s = s(k)$ would allow us to deduce a relative Szemerédi theorem for $k$-term arithmetic progressions.

Definition 3.7.1 ($H_r$-linear forms condition). Consider a weighted hypergraph system $(J, V_J, H_r)$ with weight system $\nu = \{\nu_e\}_{e\in H_r}$. We say that $\nu$ satisfies the $H_r$-linear forms condition if
\[ E_{x,y\in V_J} \prod_{e\in H_r} \prod_{\omega_e} \nu_e\big(P_{\omega_e}(x|_e, y|_e)\big)^{n_{e,\omega}} = 1 + o(1) \]
for every choice of exponents $n_{e,\omega} \in \{0,1\}$.
Hence this linear forms condition is about counting the 2-blow-up of $H_r$ and any subgraph of this blow-up.
This linear forms condition is the same as the one used in [102] (Def. 2.8). However, in [102] one also assumes the correlation condition and the bounded dual condition.

Theorem 3.7.2. [16] Suppose $S \subseteq \mathbb{Z}_N$ and $\nu = \frac{N}{|S|}1_S$ satisfies
\[ E_{x,x',y,y',z,z'}\ \nu(x)\nu(x')\nu(z-x)\nu(z'-x)\nu(z-x')\nu(z'-x')\,\nu(y)\nu(y')\nu(z-y)\nu(z'-y)\nu(z-y')\nu(z'-y') = 1 + o(1), \]
and that the same estimate holds when any of the factors $\nu$ above are replaced by 1. Then a corner-free subset of $S \times S$ has size $o(|S|^2)$.

The densification trick allows them to prove the counting lemma with only such a linear forms condition.
This has applications in simplifying many technical difficulties in previous results, and one can obtain better quantitative results such as primes with narrow polynomial progressions [112]: $a + P_1(r), a + P_2(r), \ldots, a + P_k(r)$ with $r \le \log^L N$. We want to transfer the count in the (pseudorandom) sparse setting to the dense setting.

We may try to transfer to the prime case some theorems of this kind, similar to corners, that we know in the integer case (for example, those we may model using a $d$-regular hypergraph). An interesting problem would be to find an analogue of Shkredov's result [96], that is, to obtain an exponential bound for corners in dense subsets of $\mathbb{P}_{[N]}^2$.

Chapter 4
Weighted Simplices Removal Lemma and Multidimensional Szemerédi's Theorem in the Primes

4.1 Introduction

The main objective of this chapter is to prove the following generalization of the main result of Chapter 3.

Theorem 4.1.1. If $A$ is a subset of $\mathbb{P}^d$ of positive upper relative density, then $A$ contains infinitely many non-trivial affine copies of any finite set $F \subseteq \mathbb{Z}^d$.

Note that it is enough to show that the set $A$ contains at least one non-trivial affine copy of $F$, as deleting the points of such a copy from $A$ does not affect its relative density. Also, replacing the set $F$ by $F' = F \cup (-F)$, one can require that the dilation parameter $t$ is positive.

By lifting the problem to a higher number of dimensions, it is easy to see that one can assume that $F$ forms the vertices of a $d$-dimensional simplex¹ (which will be important, as the linear forms appearing in the parametrization are then pairwise linearly independent). Indeed, let $F = \{0, x_1, \ldots, x_k\}$, choose a set of $k$ linearly independent vectors $\{y_1, \ldots, y_k\} \subseteq \mathbb{Z}^k$, and define the set $\Delta := \{0, (x_1, y_1), \ldots, (x_k, y_k), z_{k+1}, \ldots, z_{k+d}\} \subseteq \mathbb{Z}^{k+d}$ such that the vectors of $\Delta\setminus\{0\}$ form a basis of $\mathbb{R}^{k+d}$.
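The lifting construction just described can be checked on a small example. The sketch below is only an illustration under assumed choices that the text does not fix: the $y_i$ are taken to be the standard basis vectors of $\mathbb{Z}^k$, the completing vectors $z$ are taken to be $(e_j, 0)$, and the helper names `rank` and `lift_to_simplex` are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of an integer matrix, computed exactly by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


def lift_to_simplex(F):
    """Lift F = {0, x_1, ..., x_k} in Z^d to a set Delta in Z^(d+k).

    Assumed (illustrative) choices: y_i is the i-th standard basis vector
    of Z^k, and the completing vectors are (e_j, 0) for j = 1, ..., d.
    """
    d = len(F[0])
    xs = [tuple(p) for p in F if any(p)]          # the nonzero points x_1..x_k
    k = len(xs)
    lifted = [x + tuple(int(j == i) for j in range(k)) for i, x in enumerate(xs)]
    completion = [tuple(int(j == i) for j in range(d)) + (0,) * k for i in range(d)]
    return [(0,) * (d + k)] + lifted + completion


F = [(0, 0), (1, 0), (2, 0), (1, 1)]              # d = 2, k = 3, not a simplex
delta = lift_to_simplex(F)
d = len(F[0])
projection = {v[:d] for v in delta}               # image under pi: R^d x R^k -> R^d

print(delta)
print(rank(delta[1:]), set(F) <= projection)
```

With these choices the nonzero vectors of `delta` form a basis of $\mathbb{R}^{d+k}$, and projecting onto the first $d$ coordinates recovers a set containing $F$.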
If the set $A' = A \times \mathbb{P}^k$ contains an affine copy of $\Delta$, then clearly $A$ contains an affine copy of the set $\pi(\Delta) \supseteq F$, where $\pi : \mathbb{R}^d \times \mathbb{R}^k \to \mathbb{R}^d$ is the natural orthogonal projection.

¹ Unlike the integer case, it is not enough to lift to a corner, as we do not know whether the projection used there projects prime corners to prime points or not.

In the case when $\Delta = \{v_0, v_1, \ldots, v_d\} \subseteq \mathbb{Z}^d$ is a $d$-dimensional simplex, i.e. $v_1 - v_0, v_2 - v_0, \ldots, v_d - v_0$ are linearly independent, we prove a quantitative version of Theorem 4.1.1. To formulate it we define the quantity
\[ l(\Delta) := \sum_{i=1}^d |\pi_i(\Delta)|, \tag{4.1.1} \]
$\pi_i : \mathbb{R}^d \to \mathbb{R}$ being the orthogonal projection to the $i$-th coordinate axis.

Theorem 4.1.2. Let $\alpha > 0$ and let $\Delta \subseteq \mathbb{Z}^d$ be a $d$-dimensional simplex. There exists a constant $c(\alpha, \Delta) > 0$ such that for any $N > 1$ and any set $A \subseteq \mathbb{P}_N^d$ such that $|A| \ge \alpha\,|\mathbb{P}_N|^d$, the set $A$ contains at least $c(\alpha, \Delta)\,N^{d+1}(\log N)^{-l(\Delta)}$ affine copies of the simplex $\Delta$.

The lower bound matches the bound from the heuristic that the primes behave like a random subset of $\mathbb{Z}^d$ with density $1/\log^d N$ in $[1, N]^d$: there are $\approx N^{d+1}$ affine copies $x + t\Delta$ of $\Delta$ in $[1, N]^d$, and for a fixed $i$ the probability that all the $i$-th coordinates of an affine copy of $\Delta$ are primes is roughly $(\log N)^{-|\pi_i(\Delta)|}$. Thus if the prime tuples behave randomly, the probability that an affine copy of $\Delta$ lies in $\mathbb{P}^d$ is about $(\log N)^{-l(\Delta)}$.

Note that in Theorem 4.1.2 we do not require the copies of $\Delta$ to be non-trivial; thus, without loss of generality, $N$ can be assumed to be sufficiently large with respect to $\alpha$ and $\Delta$. It is clear that Theorem 4.1.2 implies Theorem 4.1.1, as the number of trivial copies of $\Delta$ in $A$ (i.e. the ones with $t = 0$) is at most $N^d(\log N)^{-d}$.

In the contrapositive, Theorem 4.1.2 states that if a set $A \subseteq \mathbb{P}_N^d$ contains at most $\delta N^{d+1}(\log N)^{-l(\Delta)}$ affine copies of $\Delta$, then its relative density is at most $\varepsilon$, where $\varepsilon = \varepsilon(\delta)$ is a quantity such that $\varepsilon(\delta) \to 0$ as $\delta \to 0$.
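To make the exponent $l(\Delta)$ in (4.1.1) concrete, the following sketch computes it for two small simplices and counts prime affine copies by brute force. It is an illustration only: the function names are hypothetical, and for simplicity only positive dilations $1 \le t \le N$ are counted, which is an assumption of this sketch rather than the convention of Theorem 4.1.2.

```python
from itertools import product


def l_value(simplex):
    """l(Delta) = sum over coordinates i of |pi_i(Delta)|, as in (4.1.1)."""
    d = len(simplex[0])
    return sum(len({v[i] for v in simplex}) for i in range(d))


def is_prime(n):
    """Trial division; adequate for the small ranges used here."""
    if n < 2:
        return False
    return all(n % p for p in range(2, int(n ** 0.5) + 1))


def count_prime_copies(simplex, N):
    """Count pairs (x, t), x in [1, N]^d and 1 <= t <= N, such that every
    vertex x + t*v has all of its coordinates prime and at most N."""
    d = len(simplex[0])
    count = 0
    for x in product(range(1, N + 1), repeat=d):
        for t in range(1, N + 1):
            if all(
                x[i] + t * v[i] <= N and is_prime(x[i] + t * v[i])
                for v in simplex
                for i in range(d)
            ):
                count += 1
    return count


corner = [(0, 0), (1, 0), (0, 1)]          # each coordinate takes 2 values
generic = [(0, 0), (1, 2), (2, 1)]         # each coordinate takes 3 values

print(l_value(corner), l_value(generic))
print(count_prime_copies(corner, 5))
```

For the corner in dimension $d$ each coordinate projection has two points, so $l(\Delta) = 2d$, which agrees with the exponent $(\log N')^{2d}$ appearing in the corner count of Chapter 3; for a simplex whose vertices have pairwise distinct coordinates one gets $l(\Delta) = d(d+1)$.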
As in a number of similar results on prime configurations [51], [102], [45], by identifying $[1, N]$ with $\mathbb{Z}/N\mathbb{Z}$ it is easy to show that Theorem 4.1.2 follows from

Theorem 4.1.3. Let $\Delta = \{v_0, \ldots, v_d\} \subseteq \mathbb{Z}^d$ be a $d$-dimensional simplex and let $\delta > 0$. Let $N$ be a large prime and let $A \subseteq \mathbb{Z}_N^d$ be such that
\[ E_{x\in\mathbb{Z}_N^d,\ t\in\mathbb{Z}_N}\Big(\prod_{i=0}^d 1_A(x + tv_i)\Big)\,w(x + t\Delta) \le \delta. \tag{4.1.2} \]
Then there exists $\varepsilon = \varepsilon(\delta)$ such that
\[ E_{x\in\mathbb{Z}_N^d}\,1_A(x)\,w(x) \le \varepsilon(\delta) + o_{N,W\to\infty;\,\Delta}(1). \]
Moreover $\varepsilon(\delta) \to 0$ as $\delta \to 0$.

The objective of this chapter is to prove this theorem. As in Chapter 3, our result would follow if we could prove the following version of the simplex removal lemma (i.e. Theorem 4.1.7 below). Notice that the conclusion of the lemma does not hold in the same measure space but in a new one, which is a small perturbation of the original measure space.

We define a weighted hypergraph system as in Section 3.1 of Chapter 3, but now we will carry out an energy increment, so we will put $\sigma$-algebras on our hypergraph. A hypergraph system can be considered as an analogue of a measure preserving system in ergodic theory.

We will use the construction of a weighted hypergraph associated to a set $A \subseteq \mathbb{Z}_N^d$ and a simplex $\Delta = \{v_0, \ldots, v_d\}$ given in the case of Gaussian primes [102].

Definition 4.1.4 (Hypergraph System). Let $J = \{0, 1, \ldots, d\}$, let $\mathcal{H} := \{e : e \subseteq J\}$ be the set of all possible hyperedges, and for a set $e \in \mathcal{H}$ let $V_e = \mathbb{Z}_N^e = \prod_{j\in e}\mathbb{Z}_N$. Identify $V_e$ with the subspace of elements $x = (x_0, \ldots, x_d) \in V_J$ such that $x_j = 0$ for all $j \notin e$, and let $\pi_e : V_J \to V_e$ denote the natural projection. For $e = \{j\}$ we write $V_j := V_{\{j\}}$, and for a given $H \subseteq \mathcal{H}$ we call the triple $(J, V_J, H)$ a hypergraph system.
For each positive integer $j$ denote $\mathcal{H}_j := \{e \in \mathcal{H} : |e| = j\}$. For $e \in \mathcal{H}$, write $x_e = (x_j)_{j\in e}$.
For a given $e \subseteq J$ and a collection of sets (edges) on $V_e$, define $\mathcal{A}_e = \{\pi_e^{-1}(F) : F \subseteq V_e\}$, considered as the corresponding sets on $V_J$.

Remark 4.1.5.
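For convenience, we identify $V_e$ with the subset of $V_J$ consisting of the points whose coordinates in $J\setminus e$ are allowed to take all possible values (i.e. there are no restrictions on $J\setminus e$). Hence we work on a single ambient space.

Remark 4.1.6. We can think of a point $x_e$, $e \in \mathcal{H}_d$, as a $d$-simplex with vertices $\{x_j : j \in e\}$. A set $G_e \subseteq V_e$ may then be viewed as a $d$-regular $d$-partite hypergraph with vertex sets $V_j$ ($j \in e$). Similarly, a point $x \in V_J$ represents a $(d+1)$-simplex with $d$-faces $x_e$, $e \in \mathcal{H}_d$.

Theorem 4.1.7 (Weighted Simplex Removal Lemma). Let $\{\nu_e\}_{e\subseteq J}$, $\{\mu_e\}_{e\subseteq J}$ be a system of weights and measures associated to a well-defined, pairwise linearly independent and symmetric family of linear forms $\mathcal{L}$ (as defined in (4.2.6)). Let $E_e \in \mathcal{A}_e$ and $g_e : V_e \to [0,1]$ be given for each $e \in \mathcal{H}_d$. Then for a given $\delta > 0$ there exists an $\varepsilon = \varepsilon(\delta) > 0$ such that the following holds: If
\[ E_{x\in V_J} \prod_{e\in\mathcal{H}_d} 1_{E_e}(x)\,\mu_J(x) \le \delta \tag{4.1.3} \]
then there exists a well-defined, symmetric family of linear forms $\tilde{\mathcal{L}} = \{\tilde{L}^k_e : e \in \mathcal{H}_d,\ 1 \le k \le d\}$ such that the associated system of weights and measures $\{\tilde{\nu}_e\}_{e\subseteq J}$, $\{\tilde{\mu}_e\}_{e\subseteq J}$ satisfies
\[ E_{x\in V_J} \prod_{e\in\mathcal{H}_d} 1_{E_e}(x)\,\tilde{\mu}_J(x) = E_{x\in V_J} \prod_{e\in\mathcal{H}_d} 1_{E_e}(x)\,\mu_J(x) + o_{N,W\to\infty}(1) \tag{4.1.4} \]
and, for all $e \in \mathcal{H}_d$,
\[ E_{x\in V_e}\,g_e(x)\,\tilde{\mu}_e(x) = E_{x\in V_e}\,g_e(x)\,\mu_e(x) + o_{N,W\to\infty}(1). \tag{4.1.5} \]
In addition there exist sets $E'_e \in \mathcal{A}_e$ such that
\[ \bigcap_{e\in\mathcal{H}_d}(E_e \cap E'_e) = \emptyset \tag{4.1.6} \]
and, for all $e \in \mathcal{H}_d$,
\[ E_{x\in V_e}\,1_{E_e\setminus E'_e}(x)\,\tilde{\mu}_e(x) \le \varepsilon(\delta) + o_{N,W\to\infty}(1); \tag{4.1.7} \]
moreover
\[ \varepsilon(\delta) \to 0 \quad\text{as } \delta \to 0. \tag{4.1.8} \]
Roughly speaking, if $\mu(\bigcap_{e\in\mathcal{H}_d} E_e) < \delta$ is small, then we can modify each $E_e$ slightly, by an amount of magnitude $\varepsilon(\delta)$, to obtain $E'_e$ such that $\bigcap_{e\in\mathcal{H}_d}(E_e \cap E'_e) = \emptyset$, i.e. $\bigcap_{e\in\mathcal{H}_d} E_e \subseteq \bigcup_{e\in\mathcal{H}_d} E_e\setminus E'_e$, meaning that $\bigcap_{e\in\mathcal{H}_d} E_e$ is covered by sets of small measure.

4.1.1 Parametric Weight System

Recall the weighted version of the $\Box_\mu$ norm.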
Let $f : X \times Y \to \mathbb{R}$ and $\nu_{12}(x_1, x_2) = \nu(x_1, x_2)$. Then
\[ \|f\|^4_{\Box^2_\mu} = E_{p\in X\times Y} E_{x\in X\times Y}\ f(x_1,x_2)\,f(p_1,x_2)\,f(x_1,p_2)\,f(p_1,p_2) \times \nu(x_1,x_2)\nu(p_1,x_2)\nu(x_1,p_2)\nu(p_1,p_2)\,\nu(x_1)\nu(x_2)\nu(p_1)\nu(p_2). \]
If it were possible, we would like to proceed as in the previous chapter with the transference principle, writing the weight appearing above in the form of a direct product of measures, say
\[ d\mu(x)\,d\mu(p) \]
with, say, $d\mu(x) = \nu(x_1,x_2)\nu(x_1)\nu(x_2)$. However, this is not possible due to cross terms like $\nu(p_1,x_2)$ and $\nu(x_1,p_2)$. More importantly, in higher order box norms we would have weights of intermediate order, which prevents us from working on a regular hypergraph and from applying the transference principle as in Chapter 3.

The way we get around these difficulties is to reprove the removal lemma in our setting, without using the transference principle. To accomplish this, we need to introduce the parametric weight system. We can write
\[ \|f\|^4_{\Box^2_\mu} = E_{p\in X\times Y} E_{x\in X\times Y}\ f(x_1,x_2)\,f(p_1,x_2)\,f(x_1,p_2)\,f(p_1,p_2) \times \big[\nu(x_1,x_2)\nu(p_1,x_2)\nu(x_1,p_2)\nu(x_1)\nu(x_2)\big]\,\big[\nu(p_1,p_2)\nu(p_1)\nu(p_2)\big] \]
\[ = \int_{X\times Y}\int_{X\times Y} f(x_1,x_2)\,f(p_1,x_2)\,f(x_1,p_2)\,f(p_1,p_2)\,d\mu_p(x_1,x_2)\,d\mu(p). \]
Here, for each fixed $p$, $d\mu_p(x_1,x_2) = \nu(x_1,x_2)\nu(p_1,x_2)\nu(x_1,p_2)\nu(x_1)\nu(x_2)$ is a measure in the variables $x_1, x_2$ depending on $p$, which we regard as a parameter. Later, we will regard $\mu_p$ as a parametric extension of $\mu$, in the sense that if we consider the linear forms defining the measures as linear forms in the $p$ and $x$ variables, then the linear forms defining $\mu_p$ are the same as those defining $\mu$.

Example 4.1.8. Consider the measure space
\[ \big(X_1\times X_2,\ d\mu(x_1,x_2) = \nu(x_1 + 2x_2)\,\nu(x_2)\big). \]
Then
\[ \|f\|^4_{\Box^2_\mu} = E_{p\in X_1\times X_2} E_{x\in X_1\times X_2}\ f(x_1,x_2)\,f(p_1,x_2)\,f(x_1,p_2)\,f(p_1,p_2) \times \nu(x_1+2x_2)\nu(p_1+2x_2)\nu(x_1+2p_2)\nu(x_1)\nu(x_2)\,\nu(p_1+2p_2)\nu(p_1)\nu(p_2). \]
Working with a single $\mu_p$ would be hard, as the linear forms condition may not apply. But if we average $d\mu_p$ over all parameters $p$, then the linear forms condition does apply. In this regard, $\mu_p$ itself may not be pseudorandom, but upon averaging there should be many $\mu_p$ that are pseudorandom.
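The regrouping of the weights into a parametric measure $\mu_p$ and a measure $\mu$ on the parameter space can be verified numerically on toy data. The sketch below is only an illustration: the sets are small index ranges, the weights are random non-negative integers stored in separate arrays `nu12`, `nu1`, `nu2` (the text writes $\nu$ for each of them), and the function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product

random.seed(0)
n1, n2 = 3, 4                                   # |X| and |Y| (toy sizes)
f = [[random.randint(-2, 2) for _ in range(n2)] for _ in range(n1)]
nu12 = [[random.randint(0, 3) for _ in range(n2)] for _ in range(n1)]
nu1 = [random.randint(0, 3) for _ in range(n1)]
nu2 = [random.randint(0, 3) for _ in range(n2)]


def mu(p1, p2):
    """Weight of the parameter p = (p1, p2)."""
    return nu12[p1][p2] * nu1[p1] * nu2[p2]


def mu_p(x1, x2, p1, p2):
    """Parametric weight of x = (x1, x2) for a fixed parameter p."""
    return nu12[x1][x2] * nu12[p1][x2] * nu12[x1][p2] * nu1[x1] * nu2[x2]


def box4_direct():
    """Fourth power of the weighted box norm, with all eight weights inline."""
    total = 0
    for x1, p1 in product(range(n1), repeat=2):
        for x2, p2 in product(range(n2), repeat=2):
            total += (f[x1][x2] * f[p1][x2] * f[x1][p2] * f[p1][p2]
                      * nu12[x1][x2] * nu12[p1][x2] * nu12[x1][p2] * nu12[p1][p2]
                      * nu1[x1] * nu2[x2] * nu1[p1] * nu2[p2])
    return Fraction(total, (n1 * n2) ** 2)


def box4_parametric():
    """The same quantity written as an average over p of integrals against mu_p."""
    outer = 0
    for p1, p2 in product(range(n1), range(n2)):
        inner = sum(f[x1][x2] * f[p1][x2] * f[x1][p2] * f[p1][p2] * mu_p(x1, x2, p1, p2)
                    for x1, x2 in product(range(n1), range(n2)))
        outer += inner * mu(p1, p2)
    return Fraction(outer, (n1 * n2) ** 2)


print(box4_direct() == box4_parametric())
```

Exact rational arithmetic is used so that the two expressions can be compared for equality; the point of the sketch is only that the cross terms $\nu(p_1,x_2)$, $\nu(x_1,p_2)$ are absorbed into $\mu_p$, so for each fixed $p$ the inner sum is an integral against a single measure in $x$.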
We will also prove that most $\mu_p$ are only small perturbations of $\mu$ and still share many properties with $\mu$.

4.1.2 Energy Increment in the Weighted Setting

Assume that there is an edge $e$, say $e = (1,2)$, such that the graph $G_e = \pi_e(E_e)$ is not $\varepsilon$-regular. This means
\[ \|F\|_{\Box_{\mu_e}} \ge \varepsilon, \tag{4.1.9} \]
where $F : V_1\times V_2 \to \mathbb{R}$, $F = 1_{G_e} - \mu_e(G_e)1_{V_e}$. In view of the definition of the weighted box norm, we may write
\[ \|F\|^4_{\Box_{\mu_e}} = \int_{V_e}\int_{V_e} F(x)\,u^1_q(x_1)\,u^2_q(x_2)\,\nu_e(x_1,q_2)\,\nu_e(q_1,x_2)\,d\mu_e(x)\,d\mu_e(q) \ge \varepsilon^4 \tag{4.1.10} \]
where $x = (x_1,x_2)$, $q = (q_1,q_2)$, and we set $u^1_q(x_1) = F(x_1,q_2)$ and $u^2_q(x_2) = F(q_1,x_2)F(q_1,q_2)$. If one defines the measures $\mu_{q,e}$, depending on the parameter $q$, by
\[ \mu_{q,e}(x) := \nu_e(x_1,q_2)\,\nu_e(q_1,x_2)\,\mu_e(x), \]
then the inner expression in (4.1.10) can be viewed as the inner product
\[ \Gamma(q) := \langle F, u^1_q\cdot u^2_q\rangle_{\mu_{q,e}} = \int_{V_e} F(x)\,u^1_q(x_1)\,u^2_q(x_2)\,d\mu_{q,e}(x) \tag{4.1.11} \]
on the Hilbert space $L^2(V_e, \mu_{q,e})$. Thus (4.1.10) translates to $E_{q\in V_e}\Gamma(q)\mu_e(q) \ge \varepsilon^4$, while using the linear forms condition it is easy to see that $E_{q\in V_e}\Gamma(q)^2\mu_e(q) \lesssim 1$. Thus, by averaging²,
\[ \Gamma(q) \gtrsim \varepsilon^4 \quad\text{for } q \in \Omega, \tag{4.1.12} \]
for a set $\Omega \subseteq V_e$ of measure $\mu_e(\Omega) \gtrsim \varepsilon^8$. As the functions $u^i_q$ are bounded, without loss of generality, using Fubini's Theorem³, we may assume that they are indicator functions of sets $U^i_q \subseteq V_i$. Let $\mathcal{B}_q = \mathcal{B}^1_q \vee \mathcal{B}^2_q$ denote the $\sigma$-algebra on $V_e$ generated by the sets $\pi_i^{-1}(U^i_q)$ ($i = 1, 2$), and let $E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_q)$ be the conditional expectation of $1_{G_e}$ with respect to this $\sigma$-algebra and the measure $\mu_{q,e}$. Then, as $u^1_q u^2_q$ is measurable with respect to $\mathcal{B}_q$, we have
\[ \langle 1_{G_e} - E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_q),\ u^1_q u^2_q\rangle_{\mu_{q,e}} = 0. \]
This together with (4.1.11) and (4.1.12) implies, for $q \in \Omega$,
\[ \langle E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_q) - E_{\mu_e}(1_{G_e}|\mathcal{B}_0),\ u^1_q u^2_q\rangle_{\mu_{q,e}} \gtrsim \varepsilon^4, \]
where $\mathcal{B}_0 = \{V_e, \emptyset\}$ is the trivial $\sigma$-algebra and $E_{\mu_e}(1_{G_e}|\mathcal{B}_0) = \mu_e(G_e)1_{V_e}$. Then by the Cauchy-Schwarz inequality we have
\[ \|E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_q) - E_{\mu_e}(1_{G_e}|\mathcal{B}_0)\|^2_{L^2(\mu_{q,e})} \gtrsim \varepsilon^8. \tag{4.1.13} \]
Notice that the conditional expectations above are on different measure spaces.
To overcome this "discrepancy", using the linear forms condition we can show that for a given $B \subseteq V_e$ one has
\[ E_{q\in V_e}\,|\mu_{q,e}(B) - \mu_e(B)|^2\,\mu_e(q) = o_{N,W\to\infty}(1). \]
This in turn implies that for almost every⁴ $q$,
\[ \|E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_0) - E_{\mu_e}(1_{G_e}|\mathcal{B}_0)\|_{L^2(\mu_{q,e})} = o_{N,W\to\infty}(1) \tag{4.1.14} \]
and
\[ \|E_{\mu_e}(1_{G_e}|\mathcal{B}_0)\|_{L^2(\mu_e)} = \|E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_0)\|_{L^2(\mu_{q,e})} + o_{N,W\to\infty}(1). \tag{4.1.15} \]
² See e.g. arguments in the proof of Lemma e.g. calculations after (4.4.13).
⁴ We will put a measure on the parametric space.

By (4.1.14) and the triangle inequality, we have
\[ \|E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_q) - E_{\mu_e}(1_{G_e}|\mathcal{B}_0)\|_{L^2(\mu_{q,e})} = \|E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_q) - E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_0)\|_{L^2(\mu_{q,e})} + o_{N,W\to\infty}(1). \]
Now by the Pythagorean theorem one obtains the "energy increment"
\[ \|E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_q) - E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_0)\|^2_{L^2(\mu_{q,e})} = \|E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_q)\|^2_{L^2(\mu_{q,e})} - \|E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_0)\|^2_{L^2(\mu_{q,e})} \gtrsim \varepsilon^8. \tag{4.1.16} \]
Together, (4.1.13), (4.1.15) and (4.1.16) give that for almost every $q \in \Omega$,
\[ \|E_{\mu_{q,e}}(1_{G_e}|\mathcal{B}_q)\|^2_{L^2(\mu_{q,e})} \ge \|E_{\mu_e}(1_{G_e}|\mathcal{B}_0)\|^2_{L^2(\mu_e)} + c\,\varepsilon^8. \tag{4.1.17} \]
If $F : V \to \mathbb{R}$ is a function and $(V, \mathcal{B}, \mu)$ is a measure space, recall that the quantity $\|E_\mu(F|\mathcal{B})\|^2_{L^2(\mu)}$ is referred to as the "energy" of the function $F$ with respect to the measure space $(V, \mathcal{B}, \mu)$. So (4.1.17) says that if $G_e$ is not $\varepsilon$-uniform with respect to the initial measure space $(V_e, \mathcal{B}_0, \mu_e)$, then its energy increases by a fixed amount when passing to the measure spaces $(V_e, \mathcal{B}_{q,e}, \mu_{q,e})$ for (almost) every $q \in \Omega$. One can iterate this argument to arrive at a family of measure spaces $(V_e, \mathcal{B}_{q,e}, \mu_{q,e})_{e\in\mathcal{H}_d,\,q\in\Omega}$ such that the atoms $G_{q,e} \in \mathcal{B}_{q,e}$ become sufficiently uniform, thus obtaining a parametric version of the so-called Koopman-von Neumann decomposition. This can be further iterated to eventually obtain a regularity lemma.

Remark 4.1.9. The number of linear forms defining the measures $\mu_{q,e}$ increases at each step of the iteration, causing the linear forms condition to be used at a level depending eventually on the relative density of the set $A$ and not just on the dimension $d$.
(This can be made independent of $\alpha$ in dimension 1 by the arguments of Conlon-Fox-Zhao [16].)

4.2 Weighted Hypergraph System

For a finite set $S \subseteq \mathbb{Z}^d$ we attach the weight
\[ w(S) := \prod_{i=1}^d \prod_{y\in\pi_i(S)} \nu(y) \tag{4.2.1} \]
where $\pi_i(S)$ is the canonical projection of $S$ to the $i$-th coordinate axis. If $S = \{x\}$ we write $w(x) := w(\{x\}) = \prod_{i=1}^d \nu(x_i)$. As in the previous discussion, the weight $\nu$ is used to count configurations with prime coordinates. If $Wx + b \in \mathbb{P}_N^d$ (and $x \in [\varepsilon_1 N, \varepsilon_2 N]^d$), then
\[ w(x) \approx_{d,W} (\log N)^d. \tag{4.2.2} \]
The implicit constant depends only on $d$ and $W$; we will choose $W$ sufficiently large but independent of $N$.
In particular, for $\Delta \subseteq [\varepsilon_1 N, \varepsilon_2 N]^d$ such that $W\Delta + b \subseteq A \subseteq \mathbb{P}_N^d$ one has
\[ w(\Delta) \approx (\log N)^{l(\Delta)}. \tag{4.2.3} \]
Recall the definition of a hypergraph system (Definition 4.1.4). For a given set $A \subseteq \mathbb{Z}_N^d$ and for $e = J\setminus\{j\}$, let
\[ E_e = \Big\{x \in V_J : \sum_{i=0}^d x_i(v_i - v_j) \in A\Big\}. \tag{4.2.4} \]
Note that $E_e \in \mathcal{A}_e$ for the collection $\mathcal{A}_e$ defined in Definition 4.1.4, as the expression in (4.2.4) is independent of the coordinate $x_j$. A point $x \in E_e \subseteq V_e$ represents a vertex, lying in $A$, of an affine copy of the simplex. A point $x \in \bigcap_{e\in\mathcal{H}_d} E_e$ represents an affine copy of the simplex in $A$.

Definition 4.2.1 (Weighted system). We now define a family of functions $\nu_e : V_J \to \mathbb{R}_+$, $\mu_e : V_J \to \mathbb{R}_+$. Let $e \in \mathcal{H}_d$, $e = J\setminus\{j\}$, and $1 \le k \le d$. Write each vertex as
\[ v_j = (v_j^1, v_j^2, \ldots, v_j^d), \]
where $v_i^k$ denotes the $k$-th coordinate of the vector $v_i$. Define
\[ L^k_e(x) = \sum_{i=0}^d x_i(v_i^k - v_j^k). \tag{4.2.5} \]
We partition the family of forms
\[ \mathcal{L} := \{L^k_e : |e| = d,\ 1 \le k \le d\} := \bigcup_{f\in\mathcal{H}} \mathcal{L}_f \tag{4.2.6} \]
according to which coordinates they depend on. Here we write
– $\mathcal{L}_f$ for the set of all linear forms (in the variables $x \in V_J$) that depend exactly on the variables $x_f$.
– $\mathcal{L}_{\le f}$ for the set of linear forms (in the variables $x \in V_J$) depending only on $x_g$, $g \subseteq f$.
For this we define the support of a linear form $L(x) = \sum_{k=0}^d a_k x_k$ as $\operatorname{supp}(L) = \{k : a_k \ne 0\}$. For a given $e \subseteq J$, define
\[ \nu_e(x) = \prod_{L\in\mathcal{L},\ \operatorname{supp}(L) = e} \nu(L(x)), \qquad \mu_e(x) = \prod_{L\in\mathcal{L},\ \operatorname{supp}(L)\subseteq e} \nu(L(x)), \tag{4.2.7} \]
with the convention that $\nu_e \equiv 1$ if $\{L : \operatorname{supp}(L) = e\} = \emptyset$.

Note that if $\Delta = \{v_0, \ldots, v_d\}$ is in general position, that is if $v_i^k \ne v_j^k$ for all $i \ne j$ and all $k$, then $\operatorname{supp}(L^k_e) = e$ for all $e \in \mathcal{H}_d$, hence
\[ \mu_e(x) = \nu_e(x) = \prod_{k=1}^d \nu(L^k_e(x)). \]

Remark 4.2.2. As mentioned before, it is sometimes more convenient to think of $\mu_e$ as a measure on $V_J$ (rather than $V_e$) in the obvious way. In general, for $x \in V_J$ we have $\mu_e(x) = \prod_{f\subseteq e} \nu_f(x)$ and also $\mu_e(x) = \mu_e(\pi_e(x))$, that is, $\mu_e$ is constant along the fibers of the projection $\pi_e$; hence we can think of $\mu_e$ as a function on $V_e$ as well. We will refer to the functions $\nu_e$ and $\mu_e$ as weights and measures respectively. To emphasize this point of view we will often use the integral notation and write
\[ \int_{V_J} F(x)\,d\mu_e(x) := E_{x\in V_J} F(x)\mu_e(x), \quad\text{and}\quad \int_{V_e} F_e(x)\,d\mu_e(x) := E_{x\in V_e} F_e(x)\mu_e(x), \]
for functions $F : V_J \to \mathbb{R}$ and $F_e : V_e \to \mathbb{R}$. Thus we can think of $\mu_e$ as a measure on $V_J$ or on the subspace $V_e$; the exact interpretation will be clear from the context. Note that for $F_e : V_e \to [-1,1]$,
\[ \int_{V_J} F_e(\pi_e(x))\,d\mu_e(x) = \int_{V_e} F_e(x)\,d\mu_e(x), \]
and it follows easily from the linear forms condition (see Lemma 4.2.3 below) that
\[ \int_{V_J} F_e(\pi_e(x))\,d\mu_J(x) = \int_{V_e} F_e(x)\,d\mu_e(x) + o_{N,W\to\infty}(1). \]
We prove in Lemma 4.2.3 below that the measures $\mu_e$ and $\mu_J$ are essentially probability measures and in fact essentially the same measure. This supports the idea of identifying functions or sets on $V_e$ with functions or sets on $V_J$ in the obvious way: we consider sets $G_e \subseteq V_e$ as sets $G_e = \pi_e^{-1}(G_e) \subseteq V_J$, changing their measure only by a negligible amount:
\[ \mu_J(G_e) = \mu_e(G_e) + o(1). \tag{4.2.8} \]
The proof is a prototype of the arguments that are based on the linear forms condition. Here, as in the previous chapter, we do not need to analyze the structure of the linear forms; we only need to check that the linear forms depend on different sets of variables and hence are linearly independent.

Note that for any $|e| = d$ the forms $\{L^k_e : 1 \le k \le d\}$ are linearly independent, which may be inspected from (4.2.5).
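The linear independence of the forms $L^k_e$ and the role of general position can be inspected directly from the coefficients in (4.2.5). The sketch below does this for two triangles in $\mathbb{Z}^2$; it is an illustration only, and the helper names (`det`, `forms_for_edge`, `support`, `edge_forms_independent`) are hypothetical.

```python
def det(m):
    """Integer determinant by Laplace expansion (fine for small d)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def forms_for_edge(simplex, j):
    """Coefficient vectors of L^k_e for e = J minus {j}, as in (4.2.5):
    the coefficient of x_i in L^k_e is v_i^k - v_j^k."""
    d = len(simplex) - 1
    return [[simplex[i][k] - simplex[j][k] for i in range(d + 1)] for k in range(d)]


def support(coeffs):
    """Indices of the variables on which a linear form actually depends."""
    return {i for i, a in enumerate(coeffs) if a != 0}


def edge_forms_independent(simplex, j):
    """The d forms attached to e = J minus {j} are independent iff the d x d
    matrix obtained by dropping the (zero) column j is nonsingular."""
    rows = [[a for i, a in enumerate(form) if i != j]
            for form in forms_for_edge(simplex, j)]
    return det(rows) != 0


generic = [(0, 0), (1, 2), (2, 1)]     # general position: v_i^k != v_j^k
corner = [(0, 0), (1, 0), (0, 1)]      # not in general position

for name, simplex in (("generic", generic), ("corner", corner)):
    for j in range(3):
        supports = [sorted(support(form)) for form in forms_for_edge(simplex, j)]
        print(name, j, supports, edge_forms_independent(simplex, j))
```

For the triangle in general position every form has support exactly $e$, so $\mu_e = \nu_e$; for the corner some forms have smaller support (for instance $L^1_e(x) = x_1$ when $j = 0$), which is how weights of intermediate order arise, while the $d$ forms attached to each edge remain linearly independent in both cases.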
Alternatively, since each linear form of this set represents a coordinate of a vertex, consider the normal vectors to the faces of the simplex (choosing one vector for each face), called $\{n_1, \ldots, n_{d+1}\}$. Any $d$ vectors chosen from this set are linearly independent. Recall that we can parametrize affine copies of a simplex in $\mathbb{Z}_N^d$ by $\mathbb{Z}_N^{d+1}$: a point $(x_1, \ldots, x_{d+1}) \in \mathbb{Z}_N^{d+1}$ represents an affine copy of the simplex where the equations of the $d+1$ planes that constitute the simplex are $p \cdot n_i = x_i$, $1 \le i \le d+1$. Each vertex $p = (p_1, \ldots, p_d)$ of this affine simplex is obtained by solving
\[ p \cdot m_1 = y_1, \quad \ldots, \quad p \cdot m_d = y_d, \]
where $m_1, \ldots, m_d$ are $d$ vectors chosen from $\{n_1, \ldots, n_{d+1}\}$ and $y_1, \ldots, y_d$ are the corresponding entries of $\{x_1, \ldots, x_{d+1}\}$. This means we solve $Ap = (y_1, \ldots, y_d)$ for some invertible matrix $A$, hence $p = A^{-1}(y_1, \ldots, y_d)$. So each $p_i$ is represented by a linear form in the variables $y_1, \ldots, y_d$, and since $A$ is invertible, these $d$ linear forms are in fact linearly independent.

Lemma 4.2.3. For all $e \in \mathcal{H}$ we have that
\[ \mu_e(V_e) = 1 + o(1); \tag{4.2.9} \]
moreover, if $g : V_e \to [-1,1]$,
\[ \mathbb{E}_{x_e \in V_e}\, g(x_e) \mu_e(x_e) = \mathbb{E}_{x \in V_J}\, g(\pi_e(x)) \mu_J(x) + o(1), \]
or equivalently
\[ \int_{V_e} g\, d\mu_e = \int_{V_J} (g \circ \pi_e)\, d\mu_J + o(1). \tag{4.2.10} \]

Proof. Note that the linear forms appearing on the right side of
\[ \mu_e(V_e) = \mathbb{E}_{x \in V_e} \prod_{\mathrm{supp}(L) \subseteq e} \nu(L(x)) \]
are pairwise linearly independent, and as they are supported on $e$ they remain pairwise independent when restricted to $V_e$. Thus (4.2.9) follows from the linear forms condition.

To show (4.2.10), let $e' = J \setminus e$ and write $x = (x_e, x_{e'})$ with $x_e = \pi_e(x)$, $x_{e'} = \pi_{e'}(x)$. Then
\[ E := \mathbb{E}_{x \in V_J} (g \circ \pi_e)(x) \mu_J(x) - \mathbb{E}_{x_e \in V_e}\, g(x_e) \mu_e(x_e) = \mathbb{E}_{x_e \in V_e}\, g(x_e) \mu_e(x_e)\, \mathbb{E}_{x_{e'} \in V_{e'}} (w(x_e, x_{e'}) - 1), \]
where $w(x_e, x_{e'}) = \prod_{f \not\subseteq e} \nu_f(x_{e \cap f}, x_{e' \cap f})$.

Now we consider $E^2$ to get rid of $g$ (using $|g| \le 1$). By (4.2.9) we have $\mu_e(V_e) \lesssim 1$, and applying the Cauchy-Schwarz inequality in the $x_e$ variables,
\[ |E|^2 = \Bigl| \mathbb{E}_{x_e \in V_e}\, g(x_e) \mu_e(x_e)^{1/2} \times \mathbb{E}_{x_{e'} \in V_{e'}}\, \mu_e(x_e)^{1/2} (w(x_e, x_{e'}) - 1) \Bigr|^2 \lesssim \mathbb{E}_{x_e \in V_e}\, \mathbb{E}_{x_{e'}, y_{e'} \in V_{e'}} (w(x_e, x_{e'}) - 1)(w(x_e, y_{e'}) - 1) \mu_e(x_e). \]
The right hand side of this expression is a combination of four terms, and (4.2.10) follows from the fact that each term is $1 + o(1)$. Indeed, the linear forms appearing in the definition of the function $\mu_e(x_e)$ depend only on the variables $x_j$ for $j \in e$ and are pairwise linearly independent. All linear forms involved in $w(x_e, x_{e'})$ depend also on some of the variables $x_j$, $j \in e'$, while the ones in $w(x_e, y_{e'})$ depend on some of the variables $y_j$, $j \in e'$; hence these forms depend on different sets of variables. Thus the forms appearing in the expression $\mu_e(x_e) w(x_e, x_{e'}) w(x_e, y_{e'})$ are pairwise linearly independent and (4.2.10) follows from the linear forms condition:
\[ |E|^2 \lesssim (1 + o(1)) - (1 + o(1)) - (1 + o(1)) + (1 + o(1)) = o(1). \]
Note that the estimate is independent of the function $g$.

Counting Prime Simplices.

To see how to use the weighted hypergraph $\{\nu_e\}_{e \in \mathcal{H}}$ to count prime simplices we follow [102] to parametrize affine copies of $\Delta$. Define the map $\Phi : \mathbb{Z}_N^{d+1} \to \mathbb{Z}_N^{d+1}$ by
\[ \Phi(x) = \Bigl( \sum_{i=0}^{d} x_i v_i,\ -\sum_{i=0}^{d} x_i \Bigr) := (y, t). \tag{4.2.11} \]
By (4.2.4) and (4.2.11), $x \in E_e$ for $e = J \setminus \{j\}$ if and only if $y + t v_j \in A$; thus $x \in \bigcap_{e \in \mathcal{H}_d} E_e$ exactly when $y + t\Delta \subseteq A$. Since $\{v_1 - v_0, \ldots, v_d - v_0\}$ is a linearly independent family of vectors, $\Phi$ is one to one. Hence this gives a parametrization of all affine copies of $\Delta$ contained in $A$ (mod $N$). Also, for $e = J \setminus \{j\}$,
\[ L_e^k(x) = \sum_{i=0}^{d} x_i (v_i^k - v_j^k) = \pi_k(y + t v_j), \tag{4.2.12} \]
where $\pi_k$ is the orthogonal projection to the $k$-th coordinate axis. This implies that
\[ \mu_e(x) = \prod_{\mathrm{supp}(L) \subseteq e} \nu(L(x)) = \prod_{k=1}^{d} \nu(L_e^k(x)) = w(y + t v_j), \tag{4.2.13} \]
and also
\[ \mu_J(x) = \prod_{L \in \mathcal{L}} \nu(L(x)) = w(y + t\Delta). \tag{4.2.14} \]
In particular, $\mu_J\bigl(\bigcap_{e \in \mathcal{H}_d} E_e\bigr)$ counts the number of prime affine copies of $\Delta$.

Next we observe that our linear forms satisfy some useful properties which we will refer to later.

Theorem 4.2.4 (Properties of a family of linear forms). Consider the family of linear forms $\mathcal{L} = \{ L_e^k ;\ e \in \mathcal{H}_d,\ 1 \le k \le d \}$ associated with the hypergraph.
Our system of linear forms satisfies the following properties.
– If $e = J \setminus \{j\}$ and $e' = J \setminus \{j'\}$, then $\mathrm{supp}(L_{e'}^k) \subseteq e$ if and only if $v_j^k = v_{j'}^k$ (i.e. the $k$-th coordinates of $v_j$ and $v_{j'}$ are the same). This is equivalent to $L_{e'}^k = L_e^k$. We call such a family $\mathcal{L}$ well-defined.
– For a given $e \in \mathcal{H}_d$, the forms $\mathcal{L}_e = \{ L_e^k,\ 1 \le k \le d \}$ are linearly independent, and any two distinct forms of the family $\mathcal{L}$ are linearly independent. We will refer to such families of forms as being pairwise linearly independent.
– Let $M = \{ x \in V_J : x_0 + \ldots + x_d = 0 \}$. Then for any $x \in M$, from the definition of the linear forms, we have $L_e^k(x) = L_{e'}^k(x)$ for all $e, e' \in \mathcal{H}_d$ and all $k$. We call a family of linear forms $\mathcal{L}$ satisfying this property symmetric.^5

Example 4.2.5 (Corners in $\mathbb{Z}^2$). Take $v_0 = (0,0)$, $v_1 = (1,0)$, $v_2 = (0,1)$. Then $\Phi(x_0, x_1, x_2) = (x_1, x_2, -x_0 - x_1 - x_2)$, so $y = (x_1, x_2)$, $t = -x_0 - x_1 - x_2$, and
\[ y + t v_0 = (x_1, x_2) + (-x_0 - x_1 - x_2)(0,0) = (x_1, x_2), \]
\[ y + t v_1 = (x_1, x_2) + (-x_0 - x_1 - x_2)(1,0) = (-x_0 - x_2,\ x_2), \]
\[ y + t v_2 = (x_1, x_2) + (-x_0 - x_1 - x_2)(0,1) = (x_1,\ -x_0 - x_1). \]
– The linear forms are $x_1$, $x_2$, $-x_0 - x_2$, $-x_0 - x_1$.
– By (4.2.12), the linear forms associated with $(0,1)$ are $L^1_{(0,1)}(x_0, x_1) = x_1$, $L^2_{(0,1)}(x_0, x_1) = -x_0 - x_1$. The linear forms associated with $(0,2)$ are $L^1_{(0,2)}(x_0, x_2) = -x_0 - x_2$, $L^2_{(0,2)}(x_0, x_2) = x_2$. The linear forms associated with $(1,2)$ are $L^1_{(1,2)}(x_1, x_2) = x_1$, $L^2_{(1,2)}(x_1, x_2) = x_2$.
– Example of the symmetric property: let $(x_0, x_1, x_2) \in M = \{ (x_0, x_1, x_2) : x_0 + x_1 + x_2 = 0 \}$, so that $x_1 = -x_0 - x_2$. Then
\[ L^1_{(0,1)}(x_0, x_1) = x_1 = -x_0 - x_2 = L^1_{(0,2)}(x_0, x_2), \]
\[ L^2_{(0,1)}(x_0, x_1) = -x_0 - x_1 = -x_0 - (-x_2 - x_0) = x_2 = L^2_{(0,2)}(x_0, x_2). \]

^5 The set $M$ corresponds to the degenerate copies $x + t\Delta$ with $t = 0$, in which the simplex degenerates to a single point. Given $f \in \mathcal{H}$ with a set of linear forms $\mathcal{L}_f$, there is a symmetrization process that produces a symmetric system of linear forms defined on all hyperedges, such that the linear forms depending only on the variables in $f$ are exactly those of $\mathcal{L}_f$.
See the section "Parametric Weight Systems: Extensions, Stability, and Symmetrization".

Definition 4.3.1 (Weight systems and associated families of measures depending on parameters). Let
\[ \mathcal{L}_q := (L_1(q,x), \ldots, L_s(q,x)) \]
be a family of linear forms with integer coefficients depending on the parameters $q \in \mathbb{Z}^R$ and the variables $x \in \mathbb{Z}^D$. We call the family pairwise linearly independent if no two forms in the family are rational multiples of each other (considered as forms over $q$ and $x$). If $N$ is a sufficiently large prime with respect to the coefficients of the linear forms $L_i(q,x)$, then the forms remain pairwise linearly independent when considered as forms over $Z \times V$, $Z = \mathbb{Z}_N^R$, $V = \mathbb{Z}_N^D$. We refer to the set $Z = \mathbb{Z}_N^R$ as the parameter space of the family $\mathcal{L}_q$. We call the family of parametric forms $\mathcal{L}_q$ well-defined if all forms $L_i(q,x)$ depend on some of the $x$-variables and there is a measure on $Z$ of the form
\[ \int_Z g(q)\, d\psi(q) = \mathbb{E}_{q \in Z}\, g(q) \psi(q), \qquad \psi(q) = \prod_{i=1}^{t} \nu(Y_i(q)), \tag{4.3.1} \]
for a family of pairwise linearly independent linear forms $Y_i$ defined over $Z$.

If $V = V_J$ then we define an associated system of weights $\{\nu_{q,e}\}_{q \in Z, e \in \mathcal{H}}$ and measures $\{\mu_{q,e}\}_{q \in Z, e \in \mathcal{H}}$ as follows. For a form $L_k(q,x) = \sum_i b_i q_i + \sum_j a_j x_j$ define its $x$-support as $\mathrm{supp}_x(L) = \{ j \in J ;\ a_j \neq 0 \}$. For $e \subseteq J$ and $q \in Z$, let
\[ \nu_{q,e}(x) := \prod_{L \in \mathcal{L}_q,\ \mathrm{supp}_x(L) = e} \nu(L(q,x)), \qquad \mu_{q,e}(x) := \prod_{L \in \mathcal{L}_q,\ \mathrm{supp}_x(L) \subseteq e} \nu(L(q,x)). \tag{4.3.2} \]
We use the convention that $\nu_{q,e} \equiv 1$ if there is no form $L \in \mathcal{L}_q$ such that $\mathrm{supp}_x(L) = e$. Note that the partition of the family $\mathcal{L}_q$ according to $x$-support is independent of the parameters $q$; thus for a given $e \in \mathcal{H}$,
\[ \mu_{q,e}(x) = \prod_{f \subseteq e} \nu_{q,f}(x) \quad \text{for all } q \in Z. \]
A crucial observation is that many of the properties of the measure system $\{\mu_e\}$ still hold for well-defined measure systems $\{\mu_{q,f}\}$ for almost every value of the parameter $q \in Z$. In order to formulate such statements, we give the following definition.

Definition 4.3.2.
We say that the family $\mathcal{L}_q$ has complexity at most $K$ if the dimension of the space $Z$, the number of linear forms $L_j(q,x)$, $Y_l(q)$, and the magnitude of their coefficients are all bounded by $K$.

Remark 4.3.3. The error terms in applications of the linear forms condition will depend on the quantity $K$.

We have the following analogue of Lemma 4.2.3 for parametric weight systems.

Lemma 4.3.4. Let $\{\mu_{q,e}\}_{e \in \mathcal{H}, q \in Z}$ be a well-defined parametric measure system of complexity at most $K$. For every $e \in \mathcal{H}$ there is a set $\mathcal{E}_e \subseteq Z$ such that $\psi(\mathcal{E}_e) = o_K(1)$, and for every $q \notin \mathcal{E}_e$,
\[ \mu_{q,e}(V_e) = 1 + o_K(1). \tag{4.3.3} \]
Moreover, for every $e \in \mathcal{H}$ there is a set $\mathcal{E}_e \subseteq Z$ of measure $\psi(\mathcal{E}_e) = o(1)$ such that the following holds. For any function $g : Z \times V_e \to [-1,1]$ and for every $q \notin \mathcal{E}_e$ one has the estimate
\[ \int_{V_e} g(q, x_e)\, d\mu_{q,e}(x_e) = \int_{V_J} g(q, \pi_e(x))\, d\mu_{q,J}(x) + o_K(1). \tag{4.3.4} \]

Proof. To prove (4.3.3), consider the quantity
\[ \Lambda_e := \int_Z |\mu_{q,e}(V_e) - 1|^2\, d\psi(q) = \mathbb{E}_{q \in Z}\, \mathbb{E}_{x_e, y_e} \Bigl( \prod_{\mathrm{supp}_x(L) \subseteq e} \nu(L(q, x_e)) - 1 \Bigr) \Bigl( \prod_{\mathrm{supp}_x(L) \subseteq e} \nu(L(q, y_e)) - 1 \Bigr)\, d\psi(q). \]
The above expression is a combination of four terms, and the family of linear forms
\[ \{ Y_k(q),\ L_i(q, x_e),\ L_j(q, y_e) \} \]
is pairwise linearly independent in the $(q, x_e, y_e)$ variables by our assumption on the linear forms. Applying the linear forms condition gives that each term is $1 + o_K(1)$, so $\Lambda_e = o_K(1)$ and (4.3.3) follows.

Now let $e' = J \setminus e$, write $x = (x_e, x_{e'})$ and, arguing as in Lemma 4.2.3, consider the difference in (4.3.4),
\[ \Lambda(q, e, g) := \Bigl| \int_{V_J} g(q, \pi_e(x))\, d\mu_{q,J}(x) - \int_{V_e} g(q, x_e)\, d\mu_{q,e}(x_e) \Bigr| \]
\[ = \bigl| \mathbb{E}_{x \in V_J}\, g(q, \pi_e(x)) \mu_{q,J}(x) - \mathbb{E}_{x_e \in V_e}\, g(q, x_e) \mu_{q,e}(x_e) \bigr| \]
\[ = \bigl| \mathbb{E}_{x_e \in V_e}\, g(q, x_e) \mu_{q,e}(x_e)\, \mathbb{E}_{x_{e'} \in V_{e'}} (w_q(x_e, x_{e'}) - 1) \bigr| \]
\[ \le \mathbb{E}_{x_e \in V_e}\, \mu_{q,e}(x_e)\, \bigl| \mathbb{E}_{x_{e'} \in V_{e'}} (w_q(x_e, x_{e'}) - 1) \bigr|, \]
where $w_q(x_e, x_{e'}) = \prod_{f \not\subseteq e} \nu_{q,f}(x_{e \cap f}, x_{e' \cap f})$.

Notice that the right hand side of the above inequality is independent of the function $g$; if we denote it by $\Lambda(q, e)$, then (4.3.4) (for almost every $q$) would follow from the estimate $\mathbb{E}_{q \in Z}\, \Lambda(q, e)\, d\psi(q) = o_K(1)$.
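The passage from a small $\psi$-average to an estimate valid outside a small exceptional set is not spelled out above; one standard way to do it (an assumption about the intended argument, not a quotation of it) is Markov's inequality: if $\Lambda \ge 0$ and $\int_Z \Lambda\, d\psi \le \varepsilon$, then $\psi(\{\Lambda > \sqrt{\varepsilon}\}) \le \sqrt{\varepsilon}$. A minimal sketch on a toy parameter space, with hypothetical values and weights:

```python
from fractions import Fraction

def weighted_mean(values, weights):
    """E_q values(q) * weights(q), the normalized average used for integrals d(psi)."""
    return sum(v * w for v, w in zip(values, weights)) / len(values)

def exceptional_set(values, weights, threshold):
    """Indices q with values[q] > threshold, and the psi-measure of that set."""
    bad = [q for q in range(len(values)) if values[q] > threshold]
    mass = sum(weights[q] for q in bad) / len(values)
    return bad, mass

# Hypothetical toy data: Lambda(q) >= 0 on 8 parameters, with weights psi(q) >= 0.
Lam = [Fraction(0), Fraction(0), Fraction(1, 100), Fraction(1, 2),
       Fraction(0), Fraction(1, 50), Fraction(0), Fraction(0)]
psi = [Fraction(1), Fraction(2), Fraction(1), Fraction(1, 2),
       Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1)]

mean = weighted_mean(Lam, psi)
threshold = Fraction(1, 10)
bad, mass = exceptional_set(Lam, psi, threshold)
# Markov's inequality: psi({Lambda > threshold}) <= mean / threshold.
markov_bound = mean / threshold
```

The bound `mass <= markov_bound` holds for any nonnegative values and weights, which is all that is used when an averaged estimate is converted into a statement for almost every $q$.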
By the linear forms condition, $\mathbb{E}_{q, x_e}\, d\psi(q)\, d\mu_{q,e}(x_e) = 1 + o_K(1) \le 2$ for $N$ sufficiently large with respect to $K$. Then by the Cauchy-Schwarz inequality one has
\[ \bigl( \mathbb{E}_{q \in Z}\, \Lambda(q, e)\, d\psi(q) \bigr)^2 \lesssim \mathbb{E}_{q \in Z}\, \mathbb{E}_{x_e \in V_e}\, \mu_{q,e}(x_e)\, \bigl| \mathbb{E}_{x_{e'} \in V_{e'}} (w_q(x_e, x_{e'}) - 1) \bigr|^2\, d\psi(q) \]
\[ \lesssim \mathbb{E}_{q \in Z,\ x_e \in V_e}\, \mathbb{E}_{x_{e'}, y_{e'} \in V_{e'}} (w_q(x_e, x_{e'}) - 1)(w_q(x_e, y_{e'}) - 1)\, d\mu_{q,e}(x_e)\, d\psi(q). \]
This is a combination of four terms, and each term again is $1 + o_K(1)$: the linear forms defining $\psi$ depend only on the variables $q$, while the ones defining $\mu_{q,e}$ depend also on the $x_e$ variables. On the other hand, all linear forms appearing in the weight functions $w_q(x_e, x_{e'})$ (respectively, $w_q(x_e, y_{e'})$) depend on the $x_{e'}$ (respectively, $y_{e'}$) variables as well. Thus the family of all linear forms in the above expressions is pairwise linearly independent in the $(q, x_e, x_{e'}, y_{e'})$ variables. The result follows from the linear forms condition.

4.3.1 Extension of Parametric Weight System, Stability and Symmetrization.

The energy increment argument involves iterations. In our setting, each iteration, because of our averaging argument, produces a new parametric system of measures which is an extension of the original measure system (in the sense that the weights in the definition of the original measure are included in the definition of the new measure). We will prove that most of these extensions are just small perturbations of the original measure and still share many important properties with it.

Definition 4.3.5 (Parametric Extension). Let
\[ \mathcal{L}^1_{q_1} = \{ L_1^1(q_1, x), \ldots, L_1^{s_1}(q_1, x) \}, \qquad \mathcal{L}^2_{q_2} = \{ L_2^1(q_2, x), \ldots, L_2^{s_2}(q_2, x) \} \]
be two pairwise linearly independent families of linear forms defined on the parameter spaces $Z_1 = \mathbb{Z}_N^{k_1}$ and $Z_2 = \mathbb{Z}_N^{k_2}$. Let $\psi_1$ and $\psi_2$ be measures on $Z_1$ and $Z_2$ defined by the families of linear forms $\{ Y_1^1(q_1), \ldots, Y_{s_1}^1(q_1) \}$ and $\{ Y_1^2(q_2), \ldots, Y_{s_2}^2(q_2) \}$.
We say that the family $\mathcal{L}^2_{q_2}$ is an extension of the family $\mathcal{L}^1_{q_1}$ if $Z_1 \le Z_2$ ($Z_1$ may be empty) and the following holds: the family of forms $L_2^i(q_2, x)$, $Y_j^2(q_2)$ which depend only on the variables $q_1 = \pi(q_2)$ is exactly the family of forms $L_1^i(q_1, x)$, $Y_j^1(q_1)$, where $\pi : Z_2 \to Z_1$ is the natural orthogonal projection.
If $V = V_J$, let $\mu^1 := \{\mu_{q_1,e}\}_{q_1 \in Z_1, e \in \mathcal{H}}$ and $\mu^2 := \{\mu_{q_2,f}\}_{q_2 \in Z_2, f \in \mathcal{H}}$ be the associated measure systems as defined in (4.3.2). We then say that the measure system $\mu^2$ is an extension of the system $\mu^1$.

Remark 4.3.6. Writing $Z_2 = Z_1 \times Z$, $Z = \mathbb{Z}_N^r$ and $q_2 = (q_1, q)$, we have
\[ \psi_2(q_1, q) = \psi_1(q_1) \cdot \varphi(q_1, q), \tag{4.3.5} \]
where $\varphi(q_1, q) = \prod_{i=1}^{t} \nu(Y_i(q_1, q))$. The linear forms $Y_i(q_1, q)$ defining $\varphi(q_1, q)$ depend on some of the variables of $q = (q_i)_{1 \le i \le k}$ and are pairwise linearly independent. Similarly, one may write for any $e \in \mathcal{H}$
\[ \mu^2_{(q_1, q), e}(x_e) = \mu^1_{q_1, e}(x_e)\, w_e(q_1, q, x_e), \tag{4.3.6} \]
where the linear forms $L_2^j(q_1, q, x_e)$ defining the function $w_e(q_1, q, x_e)$ depend on (some of) the variables $q$ as well as on all of the variables $x_e$.

In the next lemma we prove the stability property of a parametric extension: most extensions $\mu_{q_2}$ of $\mu_{q_1}$ are just small perturbations of $\mu_{q_1}$, by quantities that are independent of $q_2$.

Lemma 4.3.7 (Stability property of measure). Let $\{\mu_f\}_{f \in \mathcal{H}}$ be a well-defined measure system, and let $\{\mu_{q,f}\}_{q \in Z, f \in \mathcal{H}}$ be a well-defined parametric extension of $\{\mu_f\}_{f \in \mathcal{H}}$ of complexity at most $K$. Then for any $f \in \mathcal{H}$ and for any function $g : V_f \to [-1,1]$ there is a set $\mathcal{E}_{g,f} \subseteq Z$ of measure $\psi(\mathcal{E}_{g,f}) = o_K(1)$, so that for all $q \notin \mathcal{E}_{g,f}$
\[ \int_{V_f} g\, d\mu_{q,f} - \int_{V_f} g\, d\mu_f = o_K(1). \tag{4.3.7} \]
Similarly, if $\{\mu_{q_1,f}\}_{f \in \mathcal{H}, q_1 \in Z_1}$ is a well-defined parametric system and $\{\mu_{q_2,f}\}_{f \in \mathcal{H}, q_2 \in Z_2}$ is an extension of complexity at most $K_2$, then to any function $g : Z_1 \times V_f \to [-1,1]$ there exists a set $\mathcal{E}_{g,f} \subseteq Z_2$ of measure $\psi_2(\mathcal{E}_{g,f}) = o_{K_2}(1)$, such that for all $q_2 = (q_1, q) \notin \mathcal{E}_{g,f}$
\[ \int_{V_f} g(q_1, x)\, d\mu_{q_2,f}(x) - \int_{V_f} g(q_1, x)\, d\mu_{q_1,f}(x) = o_{K_2}(1). \tag{4.3.8} \]

Proof.
As $\mu_{q,f} = \mu_f(x_f)\, w_f(q, x_f)$, the left side of (4.3.7) may be written as
\[ \Lambda_{f,g}(q) := \int_{V_f} g(x) (w_f(q, x) - 1)\, d\mu_f(x). \]
Consider the average over $q$,
\[ \Lambda_{f,g} := \int_Z |\Lambda_{f,g}(q)|^2\, d\psi(q), \]
which is non-negative. By expanding, we have
\[ \Lambda_{f,g} = \int_Z \int_{V_f} \int_{V_f} (w_f(q, x) - 1)(w_f(q, y) - 1)\, g(x) g(y)\, d\mu_f(x)\, d\mu_f(y)\, d\psi(q) \le \int_{V_f} \int_{V_f} \Bigl| \int_Z (w_f(q, x) - 1)(w_f(q, y) - 1)\, d\psi(q) \Bigr|\, d\mu_f(x)\, d\mu_f(y). \]
Now the Cauchy-Schwarz inequality in the $V_f \times V_f$ variables and (4.2.9) give
\[ |\Lambda_{f,g}|^2 \lesssim \int_{V_f} \int_{V_f} \int_Z \int_Z (w_f(q, x) - 1)(w_f(q, y) - 1) (w_f(p, x) - 1)(w_f(p, y) - 1)\, d\mu_f(x)\, d\mu_f(y)\, d\psi(q)\, d\psi(p). \]
This last expression is a combination of 16 terms, where each term is $1 + o_K(1)$ by the linear forms condition, and their total contribution is $o(1)$. Indeed, the linear forms which can appear in any of these terms are $Y_{i_1}(q)$, $Y_{i_2}(p)$, $L_{i_3}(x)$, $L_{i_4}(y)$, $L_{i_5}(q, x)$, $L_{i_6}(q, y)$, $L_{i_7}(p, x)$, $L_{i_8}(p, y)$. Note that the last four types depend on both sets of variables (for example, $L_i(q, x)$ depends both on $q \in Z$ and on $x \in V_f$), and hence the family of these forms is pairwise linearly independent in the $(q, p, x, y)$ variables. This proves (4.3.7).

The proof of (4.3.8) is essentially the same. Set
\[ \Lambda_{f,g}(q_2) := \int_{V_f} g(q_1, x)\, d\mu_{q_2,f}(x) - \int_{V_f} g(q_1, x)\, d\mu_{q_1,f}(x) \]
and
\[ \Lambda_{f,g} := \int_{Z_2} |\Lambda_{f,g}(q_2)|^2\, d\psi_2(q_2), \]
where we write $Z_2 = Z_1 \times Z$, $Z = \mathbb{Z}_N^k$, and $q_2 = (q_1, q)$ for $q_2 \in Z_2$. By (4.3.5) we estimate as above
\[ \Lambda_{f,g} \lesssim \int_{V_f} \int_{V_f} \int_{Z_1} d\psi_1(q_1)\, d\mu_{q_1,f}(x)\, d\mu_{q_1,f}(y)\, \bigl| \mathbb{E}_{q \in Z} (w_f(q_1, q, x) - 1)(w_f(q_1, q, y) - 1)\, \varphi(q_1, q) \bigr|. \]
The linear forms condition gives
\[ \int_{V_f} \int_{V_f} \int_{Z_1} d\psi_1(q_1)\, d\mu_{q_1,f}(x)\, d\mu_{q_1,f}(y) = 1 + o_{K_2}(1), \]
so by the Cauchy-Schwarz inequality we have
\[ |\Lambda_{f,g}|^2 \lesssim \int_{V_f} \int_{V_f} \int_{Z_1} \mathbb{E}_{p, q \in Z} (w_f(q_1, q, x) - 1)(w_f(q_1, q, y) - 1)(w_f(q_1, p, x) - 1)(w_f(q_1, p, y) - 1)\, \varphi(q_1, q) \varphi(q_1, p)\, d\psi_1(q_1)\, d\mu_{q_1,f}(x)\, d\mu_{q_1,f}(y). \]
Now any linear form $L_f^i(q_1, q, x)$ depends both on the variables $q$ and $x$.
Thus again the right side is a combination of 16 terms, each being $1 + o_{K_2}(1)$ by the linear forms condition, as all the linear forms involved in any of these expressions are pairwise linearly independent in the $(x, y, q_1, q, p)$ variables.

Next we prove the stability property of the box norm with respect to an extension. Let $g : Z_1 \times V_e \to \mathbb{R}$ be a function and let $e \in \mathcal{H}$, $|e| = d'$. For a given $q_1 \in Z_1$ recall the box norm of $g_{q_1}(x) = g(q_1, x)$:
\[ \| g_{q_1} \|^{2^{d'}}_{\mu_{q_1,e}} = \mathbb{E}_{p, x \in V_e} \prod_{\omega_e \in \{0,1\}^e} g(q_1, \omega_e(p, x)) \prod_{f \subseteq e} \prod_{\omega_f \in \{0,1\}^f} \nu_{q_1,f}(\omega_f(p_f, x_f)), \tag{4.3.9} \]
where $x_f = \pi_f(x)$, $p_f = \pi_f(p)$, with $\pi_f : V_e \to V_f$ the natural projection. Here we have a parametric family of linear forms $\mathcal{L}_{q_1,e}$ on $Z_1 \times V_e$ as in Definition 4.3.1. The inner product on the right side of (4.3.9) is defined by the parametric family of forms (in the $(p, x)$-variables)
\[ \tilde{\mathcal{L}}_{q_1,e} = \bigcup_{f \subseteq e} \{ L(q_1, \omega_f(p_f, x_f)) ;\ L \in \mathcal{L}_{q_1},\ \mathrm{supp}_x(L) = f,\ \omega_f \in \{0,1\}^f \}. \tag{4.3.10} \]

Claim. $\tilde{\mathcal{L}}_{q_1}$ is a pairwise linearly independent family of forms defined over $Z_1 \times V$ ($V = V_e \times V_e$).

Proof of Claim. Suppose we had
\[ L'(q_1, \omega'_{f'}(p_{f'}, x_{f'})) = \lambda L(q_1, \omega_f(p_f, x_f)). \tag{4.3.11} \]
Restricting both forms to the subspace $\{ (p, x) \in V_f \times V_f : p = x \}$ would imply that $L'(q_1, x_{f'}) = \lambda L(q_1, x_f)$ and hence $f' = \mathrm{supp}_x(L') = \mathrm{supp}_x(L) = f$. Then, as $L$ and $L'$ depend exactly on the variables $x_j$, $j \in f$, for equation (4.3.11) to hold we must have $\omega'_f = \omega_f$ and $L = L'$.

Lemma 4.3.8 (Stability Property of Box Norm). Let $\{\nu_{q_1,f}\}_{f \in \mathcal{H}, q_1 \in Z_1}$ be a parametric weight system with a well-defined extension $\{\nu_{q_2,f}\}_{f \in \mathcal{H}, q_2 \in Z_2}$ of complexity at most $K_2$. Then to any $e \in \mathcal{H}$ and to any function $g : Z_1 \times V_e \to [-1,1]$ there exists a set $\mathcal{E} = \mathcal{E}(g, e) \subseteq Z_2$ of measure $\psi_2(\mathcal{E}) = o_{K_2}(1)$ such that for all $q_2 = (q_1, p) \notin \mathcal{E}$
\[ \| g_{q_1} \|_{\mu_{q_2,e}} = \| g_{q_1} \|_{\mu_{q_1,e}} + o_{K_2}(1). \tag{4.3.12} \]

Proof. Let
\[ G_{q_1}(p, x) := \prod_{\omega_e \in \{0,1\}^e} g(q_1, \omega_e(p, x)), \tag{4.3.13} \]
and let $\{\tilde{\mu}_{q_1,e}\}_{q_1 \in Z_1}$ denote the associated system of measures on $Z_1 \times V_e$. Then for given $q_1 \in Z_1$ (according to the definition in equation (4.3.9)), we can write
\[ \| g_{q_1} \|^{2^{d'}}_{\mu_{q_1,e}} = \mathbb{E}_{p, x \in V_e}\, G_{q_1}(p, x)\, \tilde{\mu}_{q_1,e}(p, x). \]
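(4.3.14)

Now, if $\mathcal{L}_{q_2}$ is a well-defined parametric extension of $\mathcal{L}_{q_1}$, then (4.3.10) yields a well-defined parametric extension $\tilde{\mathcal{L}}_{q_2}$ of the family $\tilde{\mathcal{L}}_{q_1}$. Then by Lemma 4.3.7, and the simple observation that $|a^{2^{d'}} - b^{2^{d'}}| \le \varepsilon$ implies^6 $|a - b| \le \varepsilon^{2^{-d'}}$ for $a, b \ge 0$, we are done.

^6 Indeed, $|a^{2^{d'-1}} - b^{2^{d'-1}}|^2 \le |a^{2^{d'-1}} - b^{2^{d'-1}}|\, |a^{2^{d'-1}} + b^{2^{d'-1}}| = |a^{2^{d'}} - b^{2^{d'}}| \le \varepsilon$. Then we may argue by induction.

Finally, we prove the stability of the conditional expectation, both with respect to the $L^2$ norm and the box norm. Recall that for a measure space $(V, \mathcal{B}, \mu)$ and $g : V \to \mathbb{R}$, we defined
\[ \mathbb{E}_\mu(g | \mathcal{B})(x) = \frac{1}{\mu(B(x))} \int_{B(x)} g(y)\, d\mu(y) := \frac{1}{\mu(B(x))}\, \mathbb{E}_{y \in V}\, 1_{B(x)}(y)\, g(y)\, \mu(y), \]
where $B(x) \in \mathcal{B}$ is the atom containing $x$. If $\mu(B(x)) = 0$ then we set $\mathbb{E}_\mu(g | \mathcal{B})(x) = 1$.

Lemma 4.3.9. Let $(\mu_{q_1,f})_{q_1 \in Z_1, f \in \mathcal{H}}$ be a well-defined parametric measure system with a well-defined extension $(\mu_{q_2,f})_{q_2 \in Z_2, f \in \mathcal{H}}$ of complexity at most $K_2$. For $q_1 \in Z_1$ and $e \in \mathcal{H}$, let $\mathcal{B}_{q_1,e}$ be a $\sigma$-algebra on $V_e$ such that $\mathrm{compl}(\mathcal{B}_{q_1,e}) \le M$ for some fixed number $M$. For any function $g_{q_1} : Z_1 \times V_e \to [-1,1]$ there exists a set $\mathcal{E} = \mathcal{E}(\mathcal{B}_{q_1,e}, g) \subseteq Z_2$ of measure $\psi_2(\mathcal{E}) = o_{M,K_2}(1)$ such that for any $q_2 = (q_1, q) \notin \mathcal{E}$ we have
1.
\[ \bigl\| \mathbb{E}_{\mu_{q_2,e}}(g_{q_1} | \mathcal{B}_{q_1,e}) - \mathbb{E}_{\mu_{q_1,e}}(g_{q_1} | \mathcal{B}_{q_1,e}) \bigr\|^2_{L^2(\mu_{q_2,e})} = o_{M,K_2}(1), \tag{4.3.15} \]
2.
\[ \bigl\| \mathbb{E}_{\mu_{q_2,e}}(g_{q_1} | \mathcal{B}_{q_1,e}) \bigr\|^2_{L^2(\mu_{q_2,e})} = \bigl\| \mathbb{E}_{\mu_{q_1,e}}(g_{q_1} | \mathcal{B}_{q_1,e}) \bigr\|^2_{L^2(\mu_{q_1,e})} + o_{M,K_2}(1). \tag{4.3.16} \]

Proof. Let $m = 2^M$ and enumerate the atoms of $\mathcal{B}_{q_1,e}$ as $B_{q_1}^1, \ldots, B_{q_1}^m$, allowing some of them to be empty. For a fixed $1 \le i \le m$ define the functions
\[ b_i(q_1, x) := 1_{B_{q_1}^i}(x) = \begin{cases} 1 & \text{if } x \in B_{q_1}^i, \\ 0 & \text{otherwise,} \end{cases} \]
and for $q_2 = (q_1, q) \in Z_2$ define the quantities
\[ \mu_i(q_2, g) := \int_{V_e} g(q_1, x)\, b_i(q_1, x)\, d\mu_{q_2,e}(x), \qquad \mu_i(q_2) := \mu_i(q_2, 1) = \mu_{q_2,e}(B_{q_1}^i), \]
\[ \mu_i(q_1, g) := \int_{V_e} g(q_1, x)\, b_i(q_1, x)\, d\mu_{q_1,e}(x), \qquad \mu_i(q_1) := \mu_i(q_1, 1) = \mu_{q_1,e}(B_{q_1}^i). \]
By Lemma 4.3.7 we have that
\[ \mu_i(q_2, g) = \mu_i(q_1, g) + o_{K_2}(1), \qquad \mu_i(q_2) = \mu_i(q_1) + o_{K_2}(1) \tag{4.3.17} \]
for all $q_2 \notin \mathcal{E}_i$, where $\mathcal{E}_i \subseteq Z_2$ is a set of $\psi_2$-measure $o_{K_2}(1)$. Let $\mathcal{E} = \bigcup_{i=1}^{m} \mathcal{E}_i$; then $\psi_2(\mathcal{E}) = o_{K_2,M}(1)$.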
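To make the conditional expectation and the quantities $\mu_i(\cdot, g)$, $\mu_i(\cdot)$ concrete, here is a minimal sketch on a finite set with a weight function. It is an illustration under simplifying assumptions (a single fixed measure, a hypothetical six-point space and partition), not the parametric setting of the lemma; the function name is illustrative.

```python
from fractions import Fraction

def conditional_expectation(g, mu, atoms):
    """E_mu(g | B)(x) for every point x, where `atoms` partitions range(len(g)).

    Uses the normalization of the text: mu(B) = E_y 1_B(y) mu(y), and the
    convention that the value is 1 on atoms of mu-measure zero.
    """
    n = len(g)
    out = [None] * n
    for atom in atoms:
        mass = Fraction(sum(mu[y] for y in atom), n)
        integral = Fraction(sum(g[y] * mu[y] for y in atom), n)
        value = integral / mass if mass != 0 else Fraction(1)
        for x in atom:
            out[x] = value
    return out

# Hypothetical data: g takes values in [-1, 1], mu is a nonnegative weight.
g = [Fraction(1), Fraction(-1), Fraction(1, 2), Fraction(0), Fraction(1), Fraction(1)]
mu = [1, 3, 0, 0, 2, 2]
atoms = [[0, 1], [2, 3], [4, 5]]

cond = conditional_expectation(g, mu, atoms)
# Integrals against mu agree, since zero-mass atoms contribute nothing.
integral_g = Fraction(sum(gx * m for gx, m in zip(g, mu)), len(g))
integral_cond = Fraction(sum(cx * m for cx, m in zip(cond, mu)), len(g))
```

On each atom the value is the ratio $\mu_i(g)/\mu_i$, which is the quantity compared across the two measures $\mu_{q_1,e}$ and $\mu_{q_2,e}$ in the proof.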
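By definition, the left hand side of (4.3.15) takes the form
\[ \sum_{i=1}^{m} \Bigl( \frac{\mu_i(q_2, g)}{\mu_i(q_2)} - \frac{\mu_i(q_1, g)}{\mu_i(q_1)} \Bigr)^2 \mu_i(q_2), \tag{4.3.18} \]
with the convention that if $\mu_i(q_1) = 0$ or $\mu_i(q_2) = 0$ then $\mu_i(q_1, g)/\mu_i(q_1) := 1$ or $\mu_i(q_2, g)/\mu_i(q_2) := 1$, respectively.
If $q_2 = (q_1, q) \notin \mathcal{E}$ then by (4.3.17)
\[ \varepsilon = \varepsilon(N) := \sum_{i=1}^{m} \bigl( |\mu_i(q_2, g) - \mu_i(q_1, g)| + |\mu_i(q_2) - \mu_i(q_1)| \bigr) = o_{K_2,M}(1). \tag{4.3.19} \]
We split the sum over $i$ in (4.3.18) into two parts:
– If $\mu_i(q_1) \le 2\varepsilon^{1/4}$ then $\mu_i(q_2) \le 3\varepsilon^{1/4}$ by (4.3.17), and we have the trivial bound
\[ \Bigl( \frac{\mu_i(q_2, g)}{\mu_i(q_2)} - \frac{\mu_i(q_1, g)}{\mu_i(q_1)} \Bigr)^2 \le 2^2. \]
Hence the total contribution of such terms is bounded by $12 m \varepsilon^{1/4} = o_{K_2,M}(1)$.
– If $\mu_i(q_1) \ge 2\varepsilon^{1/4}$ then $\mu_i(q_2) \ge \varepsilon^{1/4}$, and we have the estimate
\[ \Bigl| \frac{\mu_i(q_2, g)}{\mu_i(q_2)} - \frac{\mu_i(q_1, g)}{\mu_i(q_1)} \Bigr| = \Bigl| \frac{(\mu_i(q_2, g) - \mu_i(q_1, g))\, \mu_i(q_1) - \mu_i(q_1, g)\, (\mu_i(q_2) - \mu_i(q_1))}{\mu_i(q_1)\, \mu_i(q_2)} \Bigr| \le \frac{\varepsilon \cdot 2 + 2\varepsilon}{\mu_i(q_1)\, \mu_i(q_2)} \le \frac{4\varepsilon}{2\varepsilon^{1/2}} = o_{K_2,M}(1). \]
This proves (4.3.15). The proof of (4.3.16) proceeds the same way; here one needs to estimate the quantity
\[ \sum_{i=1}^{m} \Bigl| \frac{\mu_i(q_2, g)^2}{\mu_i(q_2)} - \frac{\mu_i(q_1, g)^2}{\mu_i(q_1)} \Bigr| = \sum_{i=1}^{m} \Bigl| \Bigl( \frac{\mu_i(q_2, g)}{\mu_i(q_2)} \Bigr)^2 \mu_i(q_2) - \Bigl( \frac{\mu_i(q_1, g)}{\mu_i(q_1)} \Bigr)^2 \mu_i(q_1) \Bigr|. \tag{4.3.20} \]
– If $\mu_i(q_1) \le 2\varepsilon^{1/4}$ then $\mu_i(q_2) \le 3\varepsilon^{1/4}$ for $q_2 = (q_1, q) \notin \mathcal{E}$. Since we have the trivial bounds $(\mu_i(q_j, g)/\mu_i(q_j))^2 \le 1$ for $j = 1, 2$, the contribution of such terms to the right side of (4.3.20) is trivially estimated by $5 m \varepsilon^{1/4} = o_{M,K_2}(1)$.
– If $\mu_i(q_1) \ge 2\varepsilon^{1/4}$ then $\mu_i(q_2) \ge \varepsilon^{1/4}$. Using
\[ \Bigl| \Bigl( \frac{\mu_i(q_2, g)}{\mu_i(q_2)} \Bigr)^2 \mu_i(q_2) - \Bigl( \frac{\mu_i(q_1, g)}{\mu_i(q_1)} \Bigr)^2 \mu_i(q_1) \Bigr| = \Bigl| \frac{(\mu_i(q_2, g)^2 - \mu_i(q_1, g)^2)\, \mu_i(q_1) - \mu_i(q_1, g)^2\, (\mu_i(q_2) - \mu_i(q_1))}{\mu_i(q_1)\, \mu_i(q_2)} \Bigr| \]
and proceeding as in the proof of (4.3.15), using $|\mu_i(q_2, g)^2 - \mu_i(q_1, g)^2| \le 2 |\mu_i(q_1, g) - \mu_i(q_2, g)|$, we find that these remaining terms are bounded by $8\varepsilon^{1/2}$, and (4.3.16) follows.

Finally, we need an analogue of the above result when the $\| \cdot \|_{L^2(\mu_{q,e})}$ norm is replaced by the more complicated $\| \cdot \|_{\mu_{q,e}}$ norms.

Lemma 4.3.10. Let $\{\nu_{q_2,f}\}_{q_2 \in Z_2, f \in \mathcal{H}}$ be a well-defined extension of the parametric weight system $\{\nu_{q_1,f}\}_{q_1 \in Z_1, f \in \mathcal{H}}$, of complexity at most $K_2$.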
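For $q_1 \in Z_1$ and $e \in \mathcal{H}$, let $\mathcal{B}_{q_1,e}$ be a $\sigma$-algebra of complexity at most $M$, for some fixed constant $M > 0$. Then
\[ \bigl\| \mathbb{E}_{\mu_{q_2,e}}(g_{q_1} | \mathcal{B}_{q_1,e}) - \mathbb{E}_{\mu_{q_1,e}}(g_{q_1} | \mathcal{B}_{q_1,e}) \bigr\|_{\mu_{q_2,e}} = o_{M,K_2}(1), \tag{4.3.21} \]
for all $q_2 = (q_1, q) \notin \mathcal{E}$, where $\mathcal{E} = \mathcal{E}(g, \mathcal{B}) \subseteq Z_2$ is a set of measure $\psi_2(\mathcal{E}) = o_{M,K_2}(1)$.

Proof. First we show that for any family of sets $A = (A_{q_1})_{q_1 \in Z_1}$, $A_{q_1} \subseteq V_e$, there is a set $\mathcal{E}_1 = \mathcal{E}_1(g, A)$ of measure $\psi_2(\mathcal{E}_1) = o_{K_2}(1)$ such that for all $q_2 = (q_1, q) \notin \mathcal{E}_1$ we have
\[ \| 1_{A_{q_1}} \|^{2^{|e|}}_{\mu_{q_2,e}} \le \mu_{q_2,e}(A_{q_1}) + o_{K_2}(1). \tag{4.3.22} \]
To see this, first note that for $q_2 = (q_1, q) \in Z_2$ one has
\[ \| 1_{A_{q_1}} \|^{2^{|e|}}_{\mu_{q_2,e}} \le \mathbb{E}_{x, p \in V_e}\, 1_{A_{q_1}}(x)\, \mu_{q_2,e}(x) \prod_{f \subseteq e} \prod_{\omega_f \neq 0} \nu_{q_2,f}(\omega_f(p_f, x_f)) = \mu_{q_2,e}(A_{q_1}) + E(q_2), \]
with
\[ E(q_2) \le \mathbb{E}_{x \in V_e}\, \mu_{q_2,e}(x)\, \bigl| \mathbb{E}_{p \in V_e} (w(q_2, p, x) - 1) \bigr|, \]
where
\[ w(q_2, p, x) = \prod_{f \subseteq e} \prod_{\omega_f \neq 0} \nu_{q_2,f}(\omega_f(p_f, x_f)). \]
Arguing as in the proof of Lemma 4.3.7, we see that
\[ \mathbb{E}_{q_2 \in Z_2}\, \mathbb{E}_{x, p, p' \in V_e}\, \psi_2(q_2)\, d\mu_{q_2,e}(x)\, (w(q_2, p, x) - 1)(w(q_2, p', x) - 1) = o_{M,K_2}(1), \]
and (4.3.22) follows.

Now let $\{B_{q_1}^i\}_{i=1}^{m}$ ($m = 2^M$) be the atoms of $\mathcal{B}_{q_1,e}$ and define the quantities $\mu_i(q_2, g)$, $\mu_i(q_2)$, $\mu_i(q_1, g)$, $\mu_i(q_1)$ as in Lemma 4.3.9. Using the facts that $\mu_i(q_2, g) = \mu_i(q_1, g) + o_{K_2}(1)$ and $\mu_i(q_2) = \mu_i(q_1) + o_{K_2}(1)$ outside a set of measure $o_{M,K_2}(1)$, and arguing as in Lemma 4.3.9, we obtain
\[ \Bigl| \frac{\mu_i(q_2, g)}{\mu_i(q_2)} - \frac{\mu_i(q_1, g)}{\mu_i(q_1)} \Bigr| = o_{M,K_2}(1). \tag{4.3.23} \]
The expression in (4.3.21) is then estimated as
\[ \Bigl\| \sum_{i=1}^{m} \Bigl( \frac{\mu_i(q_2, g)}{\mu_i(q_2)} - \frac{\mu_i(q_1, g)}{\mu_i(q_1)} \Bigr) 1_{B_{q_1}^i} \Bigr\|_{\mu_{q_2,e}} \le \sum_{i=1}^{m} \Bigl| \frac{\mu_i(q_2, g)}{\mu_i(q_2)} - \frac{\mu_i(q_1, g)}{\mu_i(q_1)} \Bigr|\, \bigl\| 1_{B_{q_1}^i} \bigr\|_{\mu_{q_2,e}} \lesssim \sum_{i=1}^{m} \Bigl| \frac{\mu_i(q_2, g)}{\mu_i(q_2)} - \frac{\mu_i(q_1, g)}{\mu_i(q_1)} \Bigr|\, \mu_{q_2,e}(B_{q_1}^i)^{2^{-|e|}} + o_{M,K_2}(1), \]
for $q_2 = (q_1, q) \notin \mathcal{E}_1$, where $\mathcal{E}_1 = \mathcal{E}_1(\mathcal{B}_{q_1,e}, g)$ is a set of measure $o_{M,K_2}(1)$. Now
\[ \sum_{i=1}^{m} \mu_{q_2,e}(B_{q_1}^i) = \mu_{q_2,e}(V_e) = 1 + o_{K_2}(1). \]
In particular $\mu_{q_2,e}(B_{q_1}^i)^{2^{-|e|}} = O_{M,K_2}(1)$, and it follows from (4.3.23) that the expression in (4.3.21) is $o_{M,K_2}(1)$. This completes the proof.

4.3.2 Symmetrization of Parametric Weight System

In this subsection we prove the symmetry property of the parametric weight system.

Definition 4.3.11 (Symmetric parametric weight system).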
For each $e \in \mathcal{H}$, let
\[ \mathcal{L}_{q,e} = \{ L_e^1(q, x), \ldots, L_e^s(q, x) \} \]
be a pairwise linearly independent family of linear forms in the $x$ variables defined on $V = V_J$, depending on parameters $q \in Z$, such that $\mathrm{supp}_x(L_e^j) \subseteq e$. We denote $\mathcal{L}_q = \bigcup_{f \in \mathcal{H}} \mathcal{L}_{q,f} = \bigcup_{e \in \mathcal{H}_d} \mathcal{L}_{q,e}$.
We say that the family of forms $\mathcal{L}_q$ is symmetric if $L_e^j(q, x) = L_{e'}^j(q, x)$ for all $q \in Z$, $x \in M = \{ x : x_0 + \cdots + x_d = 0 \}$, $e, e' \in \mathcal{H}_d$ and $1 \le j \le s$.
For any given $e \in \mathcal{H}$ we call $\mathcal{L}_q$ the symmetrization of the family $\mathcal{L}_{q,e}$. The justification of this latter definition is the content of Theorem 4.3.13 below.

Remark 4.3.12. Recall that our initial family of forms defined in (4.2.5) has this property.

Theorem 4.3.13. Fix $e \in \mathcal{H}$, $|e| \le d$, and a pairwise linearly independent family $\mathcal{L}_{q,e}$. There is a unique symmetric family of linear forms $\mathcal{L}_q$ as defined above such that $\mathcal{L}_{q,e} = \{ L \in \mathcal{L}_q ;\ \mathrm{supp}_x(L) \subseteq e \}$. Moreover, $\mathcal{L}_q$ is pairwise linearly independent.

Proof. First assume $|e| = d$ and suppose we have a family of pairwise linearly independent linear forms $\mathcal{L}_{q,e}$. Let $M = \{ x : x_0 + \cdots + x_d = 0 \}$ (which is isomorphic to $V_e$ for any $|e| = d$) and let $\phi_e : V_e \to M$ be the inverse of the projection $\pi_e$ restricted to $M$. Now for any $e' \in \mathcal{H}_d$, $q \in Z$, $x \in V_J$, define
\[ L_{e'}^j(q, x) := L_e^j(q, \phi_{e'} \circ \pi_{e'}(x)). \tag{4.3.24} \]
Note that $\phi_{e'} \circ \pi_{e'}(x)$ depends only on the coordinates $x_{e'}$, and that $\phi_{e'} \circ \pi_{e'}$ is the identity map on $M$. Hence $\mathrm{supp}_x L_{e'}^j \subseteq e'$. Now if $x \in M$ then $x = \phi_{e'} \circ \pi_{e'}(x)$, hence $L_{e'}^j(q, x) = L_e^j(q, x)$; this shows the symmetry of $\mathcal{L}_q$.
Clearly $\mathcal{L}_{q,e} \subseteq \{ L \in \mathcal{L}_q ;\ \mathrm{supp}_x(L) \subseteq e \}$. Next, we show $\mathcal{L}_{q,e} \supseteq \{ L \in \mathcal{L}_q ;\ \mathrm{supp}_x(L) \subseteq e \}$. Suppose $\mathrm{supp}_x L_{e'}^j \subseteq e$ (so that $L_{e'}^j(q, \phi_e \circ \pi_e(x))$ is defined). Then for all $q \in Z$, $x \in V_J$, by the symmetry property,
\[ L_{e'}^j(q, x) = L_{e'}^j(q, \phi_e \circ \pi_e(x)) = L_e^j(q, \phi_e \circ \pi_e(x)) = L_e^j(q, x). \tag{4.3.25} \]
Finally, we verify that $\mathcal{L}_q$ is a pairwise linearly independent family by considering the sets of variables the forms depend on.
Since all forms in $\mathcal{L}_q$ are constructed via (4.3.24), any two of them are either both of the form $L_e^j(q, x_e)$ for the same $e$ or depend on different sets of variables; hence they must be pairwise linearly independent.
Now suppose $|f| < d$ and we have a family of linear forms $\mathcal{L}_{q,f}$. We choose $|e| = d$ with $f \subseteq e$ and consider $\mathcal{L}_{q,f}$ as a family of forms on $V_e$. This is independent of the choice of $e$ since (by well-definedness of the system) if $f \subseteq e'$ as well then $L_e^j = L_{e'}^j$ for all $1 \le j \le s(f)$, and we can do the symmetrization as above.

4.4 Regularity Lemma for Parametric Weight Hypergraph

In this section we prove a decomposition theorem for the functions on our hypergraph, exploiting the machinery of parametric weight systems developed above.

4.4.1 A Koopman-von Neumann Type Decomposition for Parametric Weight System

Let $e \subseteq J$ and let $\mathcal{B}_f$ be a $\sigma$-algebra on $V_f$ for $f \in \partial e$, where $\partial e = \{ f \subseteq e ;\ |f| = |e| - 1 \}$ denotes the boundary of the edge $e$. Let $\mathcal{B} := \bigvee_{f \in \partial e} \mathcal{B}_f$ be the $\sigma$-algebra generated by the sets $\pi_{ef}^{-1}(\mathcal{B}_f)$, where $\pi_{ef} : V_e \to V_f$ is the canonical projection. The atoms of $\mathcal{B}$ are the sets $G = \bigcap_{f \in \partial e} \pi_{ef}^{-1}(G_f)$ with $G_f$ being an atom of $\mathcal{B}_f$. We may interpret $G$ as the collection of simplices $x \in V_e$ whose faces $x_f$ are in $G_f$ for all $f \in \partial e$.

The first lemma says the following. Suppose there is a large "bad" set $\Omega$ of parameters $q$ for which the set $G_{q,e}$ is not sufficiently uniform with respect to the $\sigma$-algebra $\bigvee_{f \in \partial e} \mathcal{B}_{q,f}$. Then the energy of the set (with respect to the $\sigma$-algebra) increases by a fixed amount when passing to a well-defined extension, consisting of $\sigma$-algebras $\{\mathcal{B}_{q',f}\}$ whose complexity increases by at most 1 and measures $\{\mu_{q',e}\}$ of complexity $O(K)$. This holds for all $q' = (q, p)$ in a set $\Omega'$ of positive measure.

Lemma 4.4.1 (Large Box Norm implies Structure and Energy Increment). For given $e \subseteq J$, $|e| = d'$, let $\{\mu_{q,f}\}_{q \in Z, f \subseteq e}$ be a well-defined family of measures of complexity at most $K$.
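For $q \in Z$ let $G_{q,e} \subseteq V_e$, and for each $f \in \partial e$ let $\mathcal{B}_{q,f}$ be a $\sigma$-algebra on $V_f$.
Assume that for all $q \in \Omega$, where $\Omega \subseteq Z$ is a set of measure $\psi(\Omega) \ge c_0 > 0$, we have
\[ \Bigl\| 1_{G_{q,e}} - \mathbb{E}_{\mu_{q,e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Bigr) \Bigr\|^{2^{d'}}_{\mu_{q,e}} \ge \eta \tag{4.4.1} \]
for some $\eta > 0$.
Then for $N, W$ sufficiently large with respect to the parameters $c_0, \eta$, there exists a well-defined extension $\{\mu_{q',f}\}_{q' \in Z', f \subseteq e}$ of the system $\{\mu_{q,f}\}$ of complexity $K' = O(K)$, and a set $\Omega' \subseteq \Omega \times V_e \subseteq Z' = Z \times V_e$, such that all of the following hold.
1. (positive measure of parameters) We have
\[ \psi'(\Omega') \ge 2^{-4} c_0^2 \eta^2, \tag{4.4.2} \]
where $\psi'$ is the measure on the parameter space $Z'$.
2. (complexity control) For all $q' = (q, p) \in Z'$ and $f \in \partial e$ there is a $\sigma$-algebra $\mathcal{B}_{q',f} \supseteq \mathcal{B}_{q,f}$ of complexity
\[ \mathrm{compl}(\mathcal{B}_{q',f}) \le \mathrm{compl}(\mathcal{B}_{q,f}) + 1. \tag{4.4.3} \]
3. (energy increment) For all $q' = (q, p) \in \Omega'$ one has
\[ \Bigl\| \mathbb{E}_{\mu_{q',e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Bigr) \Bigr\|^2_{L^2(\mu_{q',e})} \ge \Bigl\| \mathbb{E}_{\mu_{q,e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Bigr) \Bigr\|^2_{L^2(\mu_{q,e})} + 2^{-2d-5} \eta^2. \tag{4.4.4} \]
4. (probability measure)
\[ \mu_{q',e}(V_e) \le 2. \tag{4.4.5} \]

Proof. Let
\[ g_q := 1_{G_{q,e}} - \mathbb{E}_{\mu_{q,e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Bigr). \tag{4.4.6} \]
Then by the definition in equation (4.3.9) we have for each $q \in \Omega$,
\[ \| g_q \|^{2^{d'}}_{\mu_{q,e}} = \int_{V_e} \Bigl\langle g_q, \prod_{f \in \partial e} u_{q,p,f} \Bigr\rangle_{\mu_{(q,p),e}}\, d\mu_{q,e}(p) \ge \eta, \tag{4.4.7} \]
where $u_{q,p,f} : V_e \to [-1,1]$ are functions in the $x$-variables and $d\mu_{q,e}(p)$ collects the terms without $x$ variables. Here $\{\mu_{(q,p),e}\}_{(q,p) \in Z'}$ is the family of measures defined by
\[ \mu_{(q,p),e}(x) = \prod_{f \subseteq e} \prod_{\omega_f \in \{0,1\}^f,\ \omega_f \neq 0} \nu_{q,f}(\omega_f(p_f, x_f)). \]
As explained after (4.3.2), the measures $\mu_{(q,p),e}$ are defined by a pairwise independent family of forms $\mathcal{L}_{(q,p),e}$ depending on the parameters $(q, p) \in Z \times V_e$, which is a well-defined extension of the family $\mathcal{L}_{q,e}$ defining the measures $\mu_{q,e}$. It is clear from (4.4.7) that the measure $\psi'$ on $Z'$ has the form $\psi'(q, p) = \mu_{q,e}(p) \psi(q)$, where $\psi(q)$ is the product of the terms without $p$ variables.
For $q' = (q, p)$, let
\[ \Gamma(q, p) := \Bigl\langle g_q, \prod_{f \in \partial e} u_{q,p,f} \Bigr\rangle_{\mu_{(q,p),e}}. \tag{4.4.8} \]
First, we show that there is a set $\Omega'_1 \subseteq \Omega \times V_e$ of measure
\[ \psi'(\Omega'_1) \ge 2^{-3} c_0^2 \eta^2, \tag{4.4.9} \]
such that for every $(q, p) \in \Omega'_1$ one has
\[ \Gamma(q, p) \ge \frac{\eta}{4}. \]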
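(4.4.10)

Indeed, by Lemma 4.3.4 we have $\mu_{q,e}(V_e) = 1 + o_K(1) \le 2$ for $q \notin \mathcal{E}_1$, where $\mathcal{E}_1 \subseteq \Omega$ is a set of measure $\psi(\mathcal{E}_1) = o_K(1)$. Thus for $q \in \Omega \setminus \mathcal{E}_1 = \Omega_1$ we have by (4.4.7) that
\[ \int_{V_e} 1_{\{\Gamma(q,p) \ge \eta/4\}}\, \Gamma(q, p)\, d\mu_{q,e}(p) \ge \eta - \int_{V_e} 1_{\{\Gamma(q,p) < \eta/4\}}\, \Gamma(q, p)\, d\mu_{q,e}(p) \ge \eta - \frac{\eta}{4}(1 + o(1)) \ge \frac{\eta}{2}. \tag{4.4.11} \]
Now we use this fact to bound the $L^2$-moment of $\Gamma(q, p)$. By (4.4.7) and (4.4.8) we have
\[ \Gamma(q, p) = \int_{V_e} g_q(x) \Bigl( \prod_{f \in \partial e} u_{q,p,f} \Bigr) w_{q,p}(x)\, d\mu_{q,e}(x). \]
The function $w_{q,p}(x)$ is the product of weight functions of the form $\nu(L(q, p, x))$ depending on both $p$ and $x$. Thus, using the bounds $|g_q| \le 1$, $|u_{q,p,f}| \le 1$, one has
\[ \int_Z \int_{V_e} |\Gamma(q, p)|^2\, d\mu_{q,e}(p)\, d\psi(q) \le \int_Z \int_{V_e} \int_{V_e} \int_{V_e} w_{q,p}(x)\, w_{q,p}(x')\, d\mu_{q,e}(x)\, d\mu_{q,e}(x')\, d\mu_{q,e}(p)\, d\psi(q) = 1 + o_K(1) \le 2 \tag{4.4.12} \]
by the linear forms condition, as the factors in the product depend on different sets of variables. Let $\Omega'_1 := \{ (q, p) \in \Omega_1 \times V_e ;\ \Gamma(q, p) \ge \eta/4 \}$. Then by (4.4.11) integrated over $\Omega_1$, (4.4.12), and the Cauchy-Schwarz inequality,
\[ \frac{c_0^2 \eta^2}{4} \le \frac{\psi(\Omega_1)^2 \eta^2}{4} \le \Bigl( \int_{\Omega'_1} \Gamma(q, p)\, d\mu_{q,e}(p)\, d\psi(q) \Bigr)^2 \le \int_{\Omega'_1} \Gamma(q, p)^2\, d\mu_{q,e}(p)\, d\psi(q) \cdot \psi'(\Omega'_1) \le 2 \psi'(\Omega'_1). \]
This shows $\psi'(\Omega'_1) \ge 2^{-3} c_0^2 \eta^2$ as claimed.

Since $|u_{q',f}| \le 1$, decomposing each function $u_{q',f}$ into its positive and negative parts in (4.4.7) yields that
\[ \int_{V_e} \Bigl\langle g_q, \prod_{f \in \partial e} v_{q',f} \Bigr\rangle_{\mu_{q',e}}\, d\mu_{q,e}(p) \ge 2^{-d'} \eta \ge 2^{-d} \eta, \]
i.e.
\[ \Bigl\langle g_q, \prod_{f \in \partial e} v_{q',f} \Bigr\rangle_{\mu_{q',e}} \ge 2^{-d'} \eta \ge 2^{-d} \eta \tag{4.4.13} \]
for some $q'$ and functions^7 $v_{q',f} : V_f \to [0,1]$. We have now obtained correlation with structured functions, and we will pass from structured functions to sets.

^7 For each fixed $f$, take $v_{q',f}(x) = \max u^+_{q',f}$, where $u^+_{q',f}$ denotes the nonnegative terms in the decomposition and the maximum is taken over these nonnegative terms.

For a given $f \in \partial e$ and some $0 \le t_f \le 1$, let
\[ U_{q',t_f} := \{ x_f \in V_f : v_{q',f}(x_f) \ge t_f \} \]
be the level set of the function $v_{q',f}$. That is, for each fixed $x_f$,
\[ F(t_f) := 1_{U_{q',t_f}}(x_f) = \begin{cases} 1 & \text{if } t_f \le v_{q',f}(x_f), \\ 0 & \text{if } t_f > v_{q',f}(x_f). \end{cases} \]
Then $v_{q',f}(x_f) = \int_0^1 F(t_f)\, dt_f = \int_0^1 1_{U_{q',t_f}}(x_f)\, dt_f$, and for each term in (4.4.13) we have, by swapping the integrals,
\[ \int_0^1 \cdots \int_0^1 \Bigl\langle g_q, \prod_{f \in \partial e} 1_{U_{q',t_f}} \Bigr\rangle_{\mu_{q',e}}\, dt \ge 2^{-d} \eta, \]
where $t = (t_f)_{f \in \partial e}$.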
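The level-set step can be illustrated numerically. The sketch below is a simplified toy model, not the thesis construction: two functions $v_1, v_2$ with values in $\{0, 1/n, \ldots, 1\}$ live on the same four-point set (in the text the $v_{q',f}$ live on different faces), the measure is uniform, and the integral over $t \in [0,1]$ is replaced by an exact average over the thresholds $k/n$. All data are hypothetical.

```python
from fractions import Fraction
from itertools import product

def level_set_indicator(v, t):
    """Indicator of the level set {x : v(x) >= t}."""
    return [1 if vx >= t else 0 for vx in v]

def correlation(g, a, b):
    """Normalized sum E_x g(x) a(x) b(x) on a finite set with uniform weight."""
    return sum(gx * ax * bx for gx, ax, bx in zip(g, a, b)) / len(g)

n = 4
# For v with values in {0, 1/n, ..., 1}: v(x) = (1/n) * #{k in 1..n : v(x) >= k/n}.
thresholds = [Fraction(k, n) for k in range(1, n + 1)]

v1 = [Fraction(0), Fraction(1, 4), Fraction(3, 4), Fraction(1)]
v2 = [Fraction(1, 2), Fraction(1), Fraction(0), Fraction(1, 4)]
g = [Fraction(1), Fraction(-1, 2), Fraction(1, 3), Fraction(1)]

lhs = correlation(g, v1, v2)
terms = {
    (s, t): correlation(g, level_set_indicator(v1, s), level_set_indicator(v2, t))
    for s, t in product(thresholds, repeat=2)
}
average = sum(terms.values()) / len(terms)
best = max(terms, key=terms.get)   # pigeonhole: some threshold pair does at least as well
```

Since the average over thresholds reproduces the original correlation exactly, at least one pair of level sets correlates with `g` at least as strongly as the pair of functions did, which is the pigeonhole step used next.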
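By the pigeonhole principle the integrand must be at least $2^{-d} \eta$ for some value of the parameter $t$.

Fix such a $t = (t_f) \in [0,1]^d$ and write $U_{q',f}$ for $U_{q',t_f}$ for simplicity of notation. For $q' = (q, p) \in \Omega'_1$, define $\mathcal{B}_{q',f}$ to be the $\sigma$-algebra generated by $\mathcal{B}_{q,f}$ and the set $U_{q',f}$. For $q' \notin \Omega'_1$, set $\mathcal{B}_{q',f} = \mathcal{B}_{q,f}$.

The function $\prod_{f \in \partial e} 1_{U_{q',f}}$ is constant on the atoms of the $\sigma$-algebra $\bigvee_{f \in \partial e} \mathcal{B}_{q',f}$, and therefore for $q' \in \Omega'_1$
\[ \Bigl\langle 1_{G_{q,e}} - \mathbb{E}_{\mu_{q',e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Bigr),\ \prod_{f \in \partial e} 1_{U_{q',f}} \Bigr\rangle_{\mu_{q',e}} = 0. \]
Hence, by (4.4.6) and (4.4.13) it follows that
\[ \Bigl\langle \mathbb{E}_{\mu_{q',e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Bigr) - \mathbb{E}_{\mu_{q,e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Bigr),\ \prod_{f \in \partial e} 1_{U_{q',f}} \Bigr\rangle_{\mu_{q',e}} \ge 2^{-d} \eta. \tag{4.4.14} \]
By Lemma 4.3.4 there is a set $\mathcal{E}_1 \subseteq Z'$ such that $\psi'(\mathcal{E}_1) = o_K(1)$ and
\[ \Bigl\| \prod_{f \in \partial e} 1_{U_{q',f}} \Bigr\|_{L^2(\mu_{q',e})} \le \mu_{q',e}(V_e)^{1/2} = 1 + o_K(1) \le 2 \]
for $q' \in \Omega'_1 \setminus \mathcal{E}_1 =: \Omega'_2$. Applying the Cauchy-Schwarz inequality to (4.4.14), we get
\[ \Bigl\| \mathbb{E}_{\mu_{q',e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Bigr) - \mathbb{E}_{\mu_{q,e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Bigr) \Bigr\|_{L^2(\mu_{q',e})} \ge 2^{-d-1} \eta \]
for $q' \in \Omega'_2$. By (4.3.15) in Lemma 4.3.9 there is an exceptional set $\mathcal{E}_2 \subseteq Z'$ of measure $\psi'(\mathcal{E}_2) = o_{K,M}(1)$ such that for $q' = (q, p) \in \Omega'_3 := \Omega'_2 \setminus \mathcal{E}_2$ we have
\[ \Bigl\| \mathbb{E}_{\mu_{q',e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Bigr) - \mathbb{E}_{\mu_{q',e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Bigr) \Bigr\|_{L^2(\mu_{q',e})} \ge 2^{-d-1} \eta - o_{K,M}(1) \ge 2^{-d-2} \eta. \tag{4.4.15} \]
Since $\mathcal{B}_{q,f} \subseteq \mathcal{B}_{q',f}$ for $q' = (q, p)$, by Pythagoras's theorem (4.4.15) is equivalent to
\[ \Bigl\| \mathbb{E}_{\mu_{q',e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Bigr) \Bigr\|^2_{L^2(\mu_{q',e})} - \Bigl\| \mathbb{E}_{\mu_{q',e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Bigr) \Bigr\|^2_{L^2(\mu_{q',e})} \ge 2^{-2d-4} \eta^2. \tag{4.4.16} \]
Finally, by (4.3.16) in Lemma 4.3.9 there is a set $\mathcal{E}_3 \subseteq Z'$ of measure $\psi'(\mathcal{E}_3) = o_{K,M}(1)$ such that for $q' \in \Omega'_4 := \Omega'_3 \setminus \mathcal{E}_3$ we have (for $N, W$ sufficiently large)
\[ \Bigl\| \mathbb{E}_{\mu_{q',e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Bigr) \Bigr\|^2_{L^2(\mu_{q',e})} - \Bigl\| \mathbb{E}_{\mu_{q,e}}\Bigl( 1_{G_{q,e}} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Bigr) \Bigr\|^2_{L^2(\mu_{q,e})} \ge 2^{-2d-5} \eta^2. \tag{4.4.17} \]
This proves the lemma, choosing $\Omega' = \Omega'_4$.

Now for any given $e \in \mathcal{H}$, we shall prove a Koopman-von Neumann type decomposition for $1_{G_e}$ for any $G_e \in \mathcal{B}_{q,e}$. This will be done via an iteration argument, using repeated applications of Lemma 4.4.1 and the boundedness of the total energy of the hypergraph system.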
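The total energy of the family $\{\mathcal{B}_{q,e}\}_{e \in \mathcal{H}_{d'}}$ with respect to a family of lower order σ-algebras $\{\mathcal{B}_{q,f}\}_{f \in \mathcal{H}_{d'-1}}$ and a family of measures $\{\mu_{q,e}\}_{e \in \mathcal{H}_{d'}}$ is the quantity
\[
E_{d'}\big( \{\mathcal{B}_{q,f}\}_{f \in \mathcal{H}_{d'-1}} \big) = \sum_{e \in \mathcal{H}_{d'},\ G_e \in \mathcal{B}_e} \Big\| \mathbb{E}_{\mu_e}\Big( 1_{G_e} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big) \Big\|^2_{L^2(\mu_{q,e})} \ \le\ 2 \binom{d}{d'}\, 2^{2^{M_{d'}}}, \tag{4.4.18}
\]
and the total energy of the hypergraph system is
\[
E\big( \{\mathcal{B}_{q,f}\}_{f \in \mathcal{H}_{d'-1}} \big) := \sum_{1 \le d' \le d} E_{d'}\big( \{\mathcal{B}_{q,f}\}_{f \in \mathcal{H}_{d'-1}} \big). \tag{4.4.19}
\]
Assuming the measures $\mu_e$ are normalized, i.e. $\mu_e(V_e) = 1 + o(1) \le 2$, a crude universal upper bound for $E$ is $2^{d+1}\, 2^{2^{M}} = O_M(1)$, where $M$ is the complexity of the σ-algebras $\bigvee_{f \in \mathcal{H}} \mathcal{B}_{q,f}$.

Lemma 4.4.2 (Koopman-von Neumann decomposition for Parametric Weight System). Let $\{\mu_{q,f}\}_{q \in Z, f \in \mathcal{H}}$ be a well-defined, symmetric family of measures of complexity at most $K$. Let $1 \le d' \le d$, and let $\{\mathcal{B}_{q,e}\}_{q \in Z, e \in \mathcal{H}_{d'}}$ and $\{\mathcal{B}_{q,f}\}_{q \in Z, f \in \mathcal{H}_{d'-1}}$ be families of σ-algebras of complexity at most $M_{d'}$ and $M_{d'-1}$, respectively. Finally let $\Omega \subseteq Z$ with $\psi(\Omega) \ge c_0 > 0$, and let $\delta > 0$ be a constant. (In the iteration process, the quantities in these assumptions are obtained from the previous step of the iteration.)

Then for $N, W$ sufficiently large with respect to the constants $\delta, c_0, M_{d'}, M_{d'-1}$ and $K$, and with $Z' = Z \times V$, there exists a well-defined, symmetric extension $\{\mu_{q',f}\}_{q' \in Z', f \in \mathcal{H}}$ of the system $\{\mu_{q,f}\}_{q \in Z, f \in \mathcal{H}}$ of complexity at most $K' = O_{M_{d'},K,\delta}(1)$ and a family of σ-algebras $\{\mathcal{B}_{q',f}\}_{q' \in Z', f \in \mathcal{H}_{d'-1}}$ such that the following hold.

1. For all $q' = (q,p) \in Z'$ and $f \in \mathcal{H}_{d'-1}$ we have
\[
\mathcal{B}_{q,f} \subseteq \mathcal{B}_{q',f}, \qquad \operatorname{compl}(\mathcal{B}_{q',f}) \le \operatorname{compl}(\mathcal{B}_{q,f}) + O_{M_{d'},\delta}(1). \tag{4.4.20}
\]
2. There exists a set $\Omega' \subseteq \Omega \times V \subseteq Z'$ of measure $\psi'(\Omega') \ge c(c_0, \delta, M_{d'}) > 0$ such that for all $q' = (q,p) \in \Omega'$, all $e \in \mathcal{H}_{d'}$ and all $G_{q,e} \in \mathcal{B}_{q,e}$ one has
\[
\Big\| 1_{G_{q,e}} - \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Big) \Big\|_{\square_{\mu_{q',e}}} \ \le\ \delta, \tag{4.4.21}
\]
and the stability property
\[
\Big\| \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big) \Big\|^2_{L^2(\mu_{q',e})} = \Big\| \mathbb{E}_{\mu_{q,e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big) \Big\|^2_{L^2(\mu_{q,e})} + o_{M_{d'},K,\delta}(1). \tag{4.4.22}
\]

Proof. Initially set $Z' = Z$; then (4.4.20) and (4.4.22) trivially hold for $q' = q$.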
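If there is a set $\Omega_1 \subseteq \Omega$ of measure $\psi(\Omega_1) \ge c_0/2$ such that inequality (4.4.21) holds for all $q \in \Omega_1$ and $G_{q,e} \in \mathcal{B}_{q,e}$, then the conclusions of the lemma hold for the initial system of measures and σ-algebras
\[
\big( \{\mu_{q,f}\}_{q \in Z, f \in \mathcal{H}},\ \{\mathcal{B}_{q,e}\}_{q \in Z, e \in \mathcal{H}_{d'}},\ \{\mathcal{B}_{q,f}\}_{q \in Z, f \in \mathcal{H}_{d'-1}} \big)
\]
and the set $\Omega_1$.

Otherwise, there is a set $\Omega_2 \subseteq \Omega$ of measure $\psi(\Omega_2) \ge c_0/2$ such that for each $q \in \Omega_2$ there is an $e \in \mathcal{H}_{d'}$ and a set $G_{q,e} \in \mathcal{B}_{q,e}$ for which inequality (4.4.21) fails. Fix one such $\Omega_2$. By pigeonholing, up to a factor of $\binom{d}{d'}$, we may assume that there is a single $e \in \mathcal{H}_{d'}$ for which (4.4.21) fails for all $q$. Then by Lemma 4.4.1, with $\eta := \delta^{2^{d'}}$, there is a well-defined extension $\{\mu_{q',f}\}_{q' \in Z', f \subseteq e}$, a family of σ-algebras $\{\mathcal{B}_{q',f}\}_{q' \in Z', f \in \partial e}$ and a set $\Omega' \subseteq \Omega_2$ of positive measure for which (4.4.2)-(4.4.4) hold.

Let $\{\mu_{q',f}\}_{q' \in Z', f \in \mathcal{H}}$ be the symmetrization of the system $\{\mu_{q',f}\}_{q' \in Z', f \subseteq e}$ as described in Section 4.3.2, and set $\mathcal{B}_{q',f} := \mathcal{B}_{q,f}$ for $q' \notin \Omega'$, or for $f \in \mathcal{H}_{d'-1}$ with $f \nsubseteq e$, or for $f \in \mathcal{H}_{d'}$. By Lemma 4.3.9 and Lemma 4.4.1 one may remove a set $\mathcal{E}$ of measure $\psi'(\mathcal{E}) = o_{M_{d'},K}(1)$ such that for all $q' \in \Omega' \setminus \mathcal{E}$, (4.4.20) and (4.4.22) hold for the extended system
\[
\big( \{\mu_{q',f}\}_{q' \in Z', f \in \mathcal{H}},\ \{\mathcal{B}_{q',e}\}_{q' \in Z', e \in \mathcal{H}_{d'}},\ \{\mathcal{B}_{q',f}\}_{q' \in Z', f \in \mathcal{H}_{d'-1}} \big),
\]
whose total energy is at least $2^{-2d-5}\delta^{2^{d'}}$ larger than that of the initial system.

Based on the above argument we perform the following iteration. Let $\{\mu_{q',f}\}_{q' \in Z', f \in \mathcal{H}}$ be a well-defined, symmetric extension of the initial system $\{\mu_{q,f}\}_{q \in Z, f \in \mathcal{H}}$. Let $\{\mathcal{B}_{q',f}\}_{q' \in Z', f \in \mathcal{H}_{d'-1}}$ be a family of σ-algebras and let $\Omega' \subseteq \Omega \times V' \subseteq Z'$ be a set for which (4.4.20) and (4.4.22) hold. If there is a set $\Omega_1' \subseteq \Omega'$ of measure $\psi'(\Omega_1') \ge \psi'(\Omega')/2$ such that inequality (4.4.21) holds for all $q' \in \Omega_1'$, $e \in \mathcal{H}_{d'}$ and $G_{q,e} \in \mathcal{B}_{q,e}$, then the system
\[
\big( \{\mu_{q',f}\}_{q' \in Z', f \in \mathcal{H}},\ \{\mathcal{B}_{q',e}\}_{q' \in Z', e \in \mathcal{H}_{d'}},\ \{\mathcal{B}_{q',f}\}_{q' \in Z', f \in \mathcal{H}_{d'-1}} \big)
\]
together with the set $\Omega_1'$ satisfies the conclusions of the lemma.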
(Note that the family of σ-algebras $\mathcal{B}_{q',e}$ is unchanged.)

Otherwise there is a well-defined, symmetric extension $\{\mu_{q'',f}\}_{q'' \in Z'', f \in \mathcal{H}}$ together with a family of σ-algebras $\{\mathcal{B}_{q'',f}\}_{q'' \in Z'', f \in \mathcal{H}_{d'-1}}$ and a set $\Omega'' \subseteq \Omega' \times \mathbb{Z}_N^{d'}$ such that for all $q'' \in \Omega''$ inequalities (4.4.20) and (4.4.22) hold, and the total energy of the system $(\mu_{q'',e}, \mathcal{B}_{q'',e}, \mathcal{B}_{q'',f})$ is at least $2^{-2d-6}\delta^{2^{d'}}$ larger than that of the system $(\mu_{q',e}, \mathcal{B}_{q',e}, \mathcal{B}_{q',f})$. Set $Z' := Z''$, $\mu_{q',e} := \mu_{q'',e}$ and $\mathcal{B}_{q',f} := \mathcal{B}_{q'',f}$, then return to the previous step.

As (4.4.18) is bounded by an absolute constant (since $\mathcal{B}_{q,e}$, $e \in \mathcal{H}_d$, is never changed in the iteration), the iteration process must stop in $O_{M_{d'},\delta}(1)$ steps, and the system obtained from the last step satisfies (4.4.20)-(4.4.22).

4.4.2 Regularity Lemma

The shortcoming of Lemma 4.4.2 is that the complexity of the σ-algebras $\mathcal{B}_{q,f}$ might be very large with respect to the parameter $\delta$, which measures the uniformity of the graphs $G_{q,e}$. Hence it is not a good tool for describing the structure. This issue can be taken care of with an iteration process using Lemma 4.4.2 repeatedly, along the lines of [104]. In the weighted setting we have to pass to a new system of weights and measures at each iteration, and we have to exploit the stability properties of well-defined extensions to show that the iteration process terminates.

In the first step, we prove a Preliminary Regularity Lemma, which regularizes only the graphs on $\mathcal{H}_{d'}$ for a fixed $d'$, as in Lemma 4.4.2. Then we prove a full regularity lemma, which simultaneously regularizes all elements of the hypergraph.

Remark 4.4.3. In our energy increment process (last subsection), $\mathcal{B}_{q,e}$ is not changed in any step of the iteration. So it is acceptable to write the system as, e.g., $(\mu_{q',f}, \mathcal{B}_{q,e}, \mathcal{B}_{q',f})$.

Remark 4.4.4. In the regularity lemma below, to obtain extreme uniformity such as (4.4.28), we need a pair of σ-algebras $\mathcal{B}_{q,f}, \mathcal{B}'_{q,f}$ which are close in $L^2$ norm in our decomposition.
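$\mathcal{B}'_{q,f}$ itself cannot play the role of the structured part, since it has complexity $O_{M_{d'-1},\delta}(1) = O_{M_{d'-1},F}(1)$ due to the choice of $\delta$, over which we have no control because $F$ can be chosen to grow arbitrarily fast.

The lemma below, as a regularity lemma, is more widely applicable than Lemma 4.4.2, as the uniformity of the hypergraphs $G_{q,e}$ with respect to the (fine) σ-algebras $\mathcal{B}'_{q,f}$ can be chosen to be arbitrarily small with respect to the complexity of the (coarse) σ-algebras $\mathcal{B}_{q,f}$, while the approximations $\mathbb{E}_{\mu_{q,e}}(1_{G_{q,e}} \mid \bigvee \mathcal{B}'_{q,f})$ and $\mathbb{E}_{\mu_{q,e}}(1_{G_{q,e}} \mid \bigvee \mathcal{B}_{q,f})$ stay very close in $L^2(\mu_{q,e})$. First, we regularize the hyperedges in a given $\mathcal{H}_{d'}$ for some $1 \le d' \le d$.

Lemma 4.4.5 (Preliminary regularity lemma). Let $1 \le d' \le d$ and let $M_{d'} > 0$ be a constant. Let $\{\mu_{q,f}\}_{q \in Z, f \in \mathcal{H}}$ be a well-defined, symmetric family of measures of complexity at most $K$, and let $\{\mathcal{B}_{q,e}\}_{q \in Z, e \in \mathcal{H}_{d'}}$ be a family of σ-algebras on $V_e$ so that for all $q \in Z$, $e \in \mathcal{H}_{d'}$,
\[
\operatorname{compl}(\mathcal{B}_{q,e}) \le M_{d'}. \tag{4.4.23}
\]
Let $\varepsilon > 0$, let $F : \mathbb{R}_+ \to \mathbb{R}_+$ be a non-negative, increasing function, possibly depending on $\varepsilon$, and let $\Omega \subseteq Z$ be a set of measure $\psi(\Omega) \ge c_0 > 0$.

If $N, W$ are sufficiently large with respect to the parameters $\varepsilon, c_0, M_{d'}, K$ and $F$, then there exist a well-defined, symmetric extension $\{\mu_{\bar q,f}\}_{\bar q \in \bar Z, f \in \mathcal{H}}$ of complexity at most $O_{K,M_{d'},F,\varepsilon}(1)$, families of σ-algebras $\mathcal{B}_{\bar q,f} \subseteq \mathcal{B}'_{\bar q,f}$ defined for $\bar q \in \bar Z$, $f \in \mathcal{H}_{d'-1}$, and a set $\bar\Omega \subseteq \bar Z$ such that the following hold.

1. We have $\bar\Omega \subseteq \Omega \times V \subseteq \bar Z = Z \times V$, where $V = \mathbb{Z}_N^k$ is of dimension $k = O_{M_{d'},F,\varepsilon}(1)$. Moreover
\[
\psi(\bar\Omega) \ge c(c_0, F, M_{d'}, \varepsilon) > 0. \tag{4.4.24}
\]
2. There is a constant $M_{d'-1}$ such that
\[
F(M_{d'}) \le M_{d'-1} = O_{M_{d'},F,\varepsilon}(1), \tag{4.4.25}
\]
and for all $\bar q \in \bar Z$ and $f \in \mathcal{H}_{d'-1}$ we have
\[
\operatorname{compl}(\mathcal{B}'_{\bar q,f}) \le M_{d'-1}. \tag{4.4.26}
\]
3. For all $\bar q = (q,p) \in \bar\Omega$, $e \in \mathcal{H}_{d'}$ and $G_{q,e} \in \mathcal{B}_{q,e}$, we have
\[
\Big\| \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{\bar q,f} \Big) - \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{\bar q,f} \Big) \Big\|_{L^2(\mu_{\bar q,e})} \ \le\ \varepsilon \tag{4.4.27}
\]
and
\[
\Big\| 1_{G_{q,e}} - \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{\bar q,f} \Big) \Big\|_{\square_{\mu_{\bar q,e}}} \ \le\ \frac{1}{F(M_{d'-1})}. \tag{4.4.28}
\]

Proof.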
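Let $\{\mu_{q',f}\}_{q' \in Z', f \in \mathcal{H}}$ be a well-defined, symmetric extension of the initial system $\{\mu_{q,f}\}$, defined on a parameter space $Z' = Z \times V'$, of complexity at most $K'$. We start by putting the trivial σ-algebra $\mathcal{B}_{q',f} = \{\emptyset, V_J\}$ on each $f \in \mathcal{H}_{d'-1}$. Set
\[
M_{d'-1} := \max\Big\{ F(M_{d'}),\ \sup_{f \in \partial \mathcal{H}_d} \operatorname{compl}(\mathcal{B}_{q',f}) \Big\} = O_{K,M_{d'},F}(1), \qquad \delta := \frac{1}{F(M_{d'-1})}. \tag{4.4.29}
\]
The point here is that $M_{d'-1}$ (and later $\bar M_{d'-1}$) is $O_{K,M_{d'},F}(1)$. Set $\mathcal{B}_{q',e} := \mathcal{B}_{q,e}$ for $q' = (q,p) \in Z'$, $e \in \mathcal{H}_{d'}$, and apply Lemma 4.4.2 to the system $(\mu_{q',e}, \mathcal{B}_{q',e}, \mathcal{B}_{q',f})$ with $\delta = F(M_{d'-1})^{-1}$. This generates a well-defined, symmetric extension $\{\mu_{\bar q,f}\}_{\bar q \in \bar Z, f \in \mathcal{H}}$, a family of σ-algebras $\{\mathcal{B}'_{\bar q,f}\}_{\bar q \in \bar Z, f \in \mathcal{H}_{d'-1}}$ and a set $\bar\Omega \subseteq \bar Z$ satisfying the conclusions of that lemma.

The new system $(\mu_{\bar q,f}, \mathcal{B}_{\bar q,e}, \mathcal{B}'_{\bar q,f})$ satisfies (4.4.24)-(4.4.26) and (4.4.28). Note that the parameters $K'$, $M_{d'-1}$ are of magnitude $O_{K,M_{d'},F,\varepsilon}(1)$. Set $\mathcal{B}_{\bar q,f} := \mathcal{B}_{q',f}$ for $\bar q = (q',p) \in \bar Z$, $f \in \mathcal{H}_{d'-1}$.

To ensure the $L^2$-closeness property, we run the energy increment argument. There are two possibilities.

– Case 1: There exists a set $\bar\Omega_1 \subseteq \bar\Omega$ of measure $\psi(\bar\Omega_1) \ge \psi(\bar\Omega)/2$ such that (4.4.27) holds for all $\bar q \in \bar\Omega_1$. In this case the conclusions of the lemma hold for the system $(\mu_{\bar q,e}, \mathcal{B}_{\bar q,e}, \mathcal{B}'_{\bar q,f})$ and the set $\bar\Omega_1$.

– Case 2: For every $\bar\Omega_1 \subseteq \bar\Omega$ with $\psi(\bar\Omega_1) \ge \psi(\bar\Omega)/2$, (4.4.27) fails for some $\bar q \in \bar\Omega_1$. Let $\bar\Omega_2 := \{\bar q \in \bar\Omega :\ \text{(4.4.27) fails}\}$. Then $\bar\Omega_2 \subseteq \bar\Omega$ is of measure $\psi(\bar\Omega_2) \ge \tfrac{1}{2}\psi(\bar\Omega)$.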
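Now, thanks to the stability condition (4.4.22) and the fact that $\mathcal{B}_{q',f} = \mathcal{B}_{\bar q,f} \subseteq \mathcal{B}'_{\bar q,f}$, we have for $\bar q \in \bar\Omega_2$, $q' = \pi'(\bar q)$ and $q = \pi(\bar q)$, where $\pi : \bar Z \to Z$ and $\pi' : \bar Z \to Z'$ are the projections, that
\[
\begin{aligned}
E_{d'}(\mathcal{B}'_{\bar q,f}) - E_{d'}(\mathcal{B}_{\bar q,f})
&= \sum_{e, G_{q,e}} \Big\| \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{\bar q,f} \Big) \Big\|^2_{L^2(\mu_{\bar q,e})} - \sum_{e, G_{q,e}} \Big\| \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Big) \Big\|^2_{L^2(\mu_{q',e})} \\
&\ge \sum_{e, G_{q,e}} \Big( \Big\| \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{\bar q,f} \Big) \Big\|^2_{L^2(\mu_{\bar q,e})} - \Big\| \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Big) \Big\|^2_{L^2(\mu_{\bar q,e})} \Big) - o_{M_{d'},K',F}(1) \\
&= \sum_{e, G_{q,e}} \Big\| \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{\bar q,f} \Big) - \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{\bar q,f} \Big) \Big\|^2_{L^2(\mu_{\bar q,e})} - o_{M_{d'},K',F}(1) \\
&\ge \varepsilon^2 - o_{M_{d'},K',F}(1),
\end{aligned} \tag{4.4.30}
\]
where the summation is taken over all $e \in \mathcal{H}_{d'}$ and $G_{q,e} \in \mathcal{B}_{q,e}$.

Thus, for sufficiently large $N, W$, we have for all $\bar q = (q,p) \in \bar\Omega_2$ that the total energy of the system $(\mu_{\bar q,f}, \mathcal{B}_{\bar q,e}, \mathcal{B}'_{\bar q,f})$ is at least $\varepsilon^2/2$ larger than that of the system $(\mu_{q',f}, \mathcal{B}_{q',e}, \mathcal{B}_{q',f})$. In this case, set $Z' := \bar Z$, $\Omega' := \bar\Omega_3$, $\mu_{q',f} := \mu_{\bar q,f}$ and $\mathcal{B}_{q',f} := \mathcal{B}'_{\bar q,f}$, and repeat the above argument.

The iteration process must stop in at most $\varepsilon^{-2}\, 2^{2^{M_{d'}}+1}\, 2^{d+1} = O_{M_{d'},\varepsilon}(1)$ steps, generating a system $(\mu_{\bar q,f}, \mathcal{B}_{\bar q,e}, \mathcal{B}'_{\bar q,f})$ which satisfies the conclusions of the lemma.

In order to obtain a counting lemma and a removal lemma starting from a given measure system $\{\mu_{q,e}\}$ and σ-algebras $\{\mathcal{B}_{q,e}\}$, we need to regularize the elements of the σ-algebras $\mathcal{B}_{q,e}$ for all $e \in \mathcal{H}$ with respect to the lower order σ-algebras $\bigvee_{f \in \partial e} \mathcal{B}_{q,f}$. This is done by applying Lemma 4.4.5 inductively, and provides the final form of the regularity lemma we need. Let us call a function $F : \mathbb{R}_+ \to \mathbb{R}_+$ a growth function if it is continuous, increasing, and satisfies $F(x) \ge 1 + x$ for $x \ge 0$ (see footnote 8).

Theorem 4.4.6 (Full regularity lemma). Let $1 \le d' \le d$ and let $M_{d'} > 0$ be a constant. Let $\{\mu_{q,f}\}_{q \in Z, f \in \mathcal{H}}$ be a well-defined, symmetric family of measures of complexity at most $K$, and let $\{\mathcal{B}_{q,e}\}_{q \in Z, e \in \mathcal{H}_{d'}}$ be a family of σ-algebras on $V_e$ so that for all $q \in Z$, $e \in \mathcal{H}_{d'}$,
\[
\operatorname{compl}(\mathcal{B}_{q,e}) \le M_{d'}.
\]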
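(4.4.31)

Let $F : \mathbb{R}_+ \to \mathbb{R}_+$ be a growth function, and let $\Omega \subseteq Z$ be a set of measure $\psi(\Omega) \ge c_0 > 0$.

If $N, W$ are sufficiently large with respect to the parameters $c_0, M_{d'}, K$ and $F$, then there exist a well-defined, symmetric extension $\{\mu_{\bar q,f}\}_{\bar q \in \bar Z, f \in \mathcal{H}}$ of complexity at most $O_{K,M_{d'},F}(1)$ on a parametric space $\bar Z$, families of σ-algebras $\mathcal{B}_{\bar q,f} \subseteq \mathcal{B}'_{\bar q,f}$ defined for $\bar q \in \bar Z$, $f \in \mathcal{H}_{d'-1}$, and a set $\bar\Omega \subseteq \bar Z$ such that the following hold.

1. We have $\bar\Omega \subseteq \Omega \times V \subseteq \bar Z = Z \times V$, where $V = \mathbb{Z}_N^k$ is of dimension $k = O_{M_{d'},F}(1)$. Moreover
\[
\psi(\bar\Omega) \ge c(c_0, F, M_{d'}) > 0. \tag{4.4.32}
\]
2. There exist numbers
\[
M_{d'} < F(M_{d'}) \le M_{d'-1} < F(M_{d'-1}) \le \cdots \le M_1 < F(M_1) \le M_0 = O_{M_{d'},F}(1) \tag{4.4.33}
\]
such that for all $1 \le j < d'$, $f \in \mathcal{H}_j$ and $\bar q \in \bar Z$,
\[
\operatorname{compl}(\mathcal{B}'_{\bar q,f}) \le M_j. \tag{4.4.34}
\]
(Footnote 8: The condition $F(x) \ge 1 + x$ just ensures that $M_d, \dots, M_0$ in (4.4.33) is a strictly increasing sequence of integers.)

3. For all $1 \le j \le d'$, $e \in \mathcal{H}_j$, $\bar q = (q,p) \in \bar\Omega$ and $G_{\bar q,e} \in \mathcal{B}_{\bar q,e}$ (with $\mathcal{B}_{\bar q,e} := \mathcal{B}_{q,e}$ if $j = d'$), one has
\[
\Big\| \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{\bar q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{\bar q,f} \Big) - \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{\bar q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{\bar q,f} \Big) \Big\|_{L^2(\mu_{\bar q,e})} \ \le\ \frac{1}{F(M_j)} \tag{4.4.35}
\]
and
\[
\Big\| 1_{G_{\bar q,e}} - \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{\bar q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{\bar q,f} \Big) \Big\|_{\square_{\mu_{\bar q,e}}} \ \le\ \frac{1}{F(M_1)}. \tag{4.4.36}
\]

Proof. We proceed by induction on $d'$. If $d' = 1$ the statement follows from the Preliminary Regularity Lemma 4.4.5 with $\varepsilon = \frac{1}{F(M_1)}$, so assume that $d' \ge 2$ and that the theorem holds for all $j \le d' - 1$. Apply Lemma 4.4.5 on $\mathcal{H}_{d'}$ with a very fast growing growth function $F^* \ge F$ (to be specified later; see footnote 9) and with $\varepsilon = \frac{1}{2F^*(M_{d'})}$. This gives a well-defined, symmetric extension $\{\mu_{q',f}\}_{f \in \mathcal{H}}$ and a family of σ-algebras $\mathcal{B}_{q',f} \subseteq \mathcal{B}'_{q',f}$, $f \in \mathcal{H}_{d'-1}$, defined on a parameter space $Z' = Z \times V$, such that (recall the definition of $M_{d'-1}$ in (4.4.29))
\[
F(M_{d'}) \le F^*(M_{d'}) \le M_{d'-1} \le O_{K,M_{d'},F^*}(1), \tag{4.4.37}
\]
\[
\Big\| \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q',f} \Big) - \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Big) \Big\|_{L^2(\mu_{q',e})} \ \le\ \frac{1}{2F^*(M_{d'})} \tag{4.4.38}
\]
and
\[
\Big\| 1_{G_{q',e}} - \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q',f} \Big) \Big\|_{\square_{\mu_{q',e}}} \ \le\ \frac{1}{F^*(M_{d'-1})} \tag{4.4.39}
\]
hold for all $q' = (q,p) \in \Omega'$, $e \in \mathcal{H}_{d'}$ and $G_{q',e} \in \mathcal{B}_{q',e} = \mathcal{B}_{q,e}$, where $\Omega' \subseteq \Omega \times V \subseteq Z'$ is a set of measure $\psi'(\Omega') \ge c(c_0, F, M_{d'}) > 0$.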
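Next, apply the induction hypothesis to the system $\{\mu_{q',f}\}_{q' \in Z', f \in \mathcal{H}}$, $\{\mathcal{B}_{q',f}\}_{q' \in Z', f \in \mathcal{H}_{d'-1}}$, with the constant $M_{d'-1}$, the growth function $F$ and the set $\Omega'$. One obtains an extension $\{\mu_{\bar q,f}\}_{\bar q \in \bar Z, f \in \mathcal{H}}$ and families of σ-algebras $\{\mathcal{B}_{\bar q,f} \subseteq \mathcal{B}'_{\bar q,f}\}_{\bar q \in \bar Z, f \in \mathcal{H}_j}$ such that (4.4.34)-(4.4.36) hold for $j < d' - 1$, with constants
\[
M_{d'-1} < F(M_{d'-1}) \le \cdots \le M_1 < F(M_1) = O_{M_{d'-1},F}(1). \tag{4.4.40}
\]
(Footnote 9: $F^*$ will be chosen depending on $F$ and growing much faster than $F$, so that $F(M_{d'}) < F^*(M_{d'}) < M_0 = \tfrac{1}{2}F^*(M_{d'-1})$; thus $F^*$ controls the size of $M_0$ in terms of $M_d$ and $F$. $F^*$ is also used in the inductive argument to conclude that $F(M_d) < M_{d-1}$. We apply the preliminary regularity lemma with $F^*$.)

For $\bar q = (q',p) \in \bar Z$ and $f \in \mathcal{H}_{d'-1}$ set
\[
\mathcal{B}_{\bar q,f} := \mathcal{B}_{q',f} \quad \text{and} \quad \mathcal{B}'_{\bar q,f} := \mathcal{B}'_{q',f}. \tag{4.4.41}
\]
We show that inequalities (4.4.35) and (4.4.36) hold for $j = d'$. Indeed, by the stability property (4.3.16), one has
\[
\begin{aligned}
&\Big\| \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q',f} \Big) - \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Big) \Big\|_{L^2(\mu_{\bar q,e})} \\
&\qquad = \Big\| \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q',f} \Big) - \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q',f} \Big) \Big\|_{L^2(\mu_{q',e})} + o_{K,M_{d'},F,F^*}(1) \\
&\qquad \le \frac{1}{2F^*(M_{d'})} + o_{K,M_{d'},F,F^*}(1)
\end{aligned} \tag{4.4.42}
\]
for all $\bar q = (q',p) \in \bar\Omega \setminus \mathcal{E}_1$, $e \in \mathcal{H}_{d'}$ and $G_{q',e} \in \mathcal{B}_{q',e}$. Here $\mathcal{E}_1 \subseteq \bar\Omega$ is a set of measure $\psi(\mathcal{E}_1) = o_{K,M_{d'},F,F^*}(1)$.

Similarly, using the stability properties (4.3.12) and (4.3.21) of the box norms (and also (4.4.41)), we have
\[
\begin{aligned}
\Big\| 1_{G_{q',e}} - \mathbb{E}_{\mu_{\bar q,e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q',f} \Big) \Big\|_{\square_{\mu_{\bar q,e}}}
&= \Big\| 1_{G_{q',e}} - \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q',f} \Big) \Big\|_{\square_{\mu_{\bar q,e}}} + o_{K,M_{d'},F,F^*}(1) \\
&= \Big\| 1_{G_{q',e}} - \mathbb{E}_{\mu_{q',e}}\Big( 1_{G_{q',e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q',f} \Big) \Big\|_{\square_{\mu_{q',e}}} + o_{K,M_{d'},F,F^*}(1) \\
&\le \frac{1}{2F^*(M_{d'-1})} + o_{K,M_{d'},F,F^*}(1)
\end{aligned} \tag{4.4.43}
\]
for all $\bar q = (q',p) \in \bar\Omega \setminus \mathcal{E}_2$, $e \in \mathcal{H}_{d'}$ and $G_{q',e} \in \mathcal{B}_{q',e} = \mathcal{B}_{q,e}$, where $\mathcal{E}_2 \subseteq \bar\Omega$ is a set of measure $\psi(\mathcal{E}_2) = o_{K,M_{d'},F,F^*}(1)$.

Recall that $F(M_1) = O_{M_{d'-1},F}(1)$. Now we link $M_{d'}$ with (4.4.40).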
Choose (modify) the function $F^* = F^*_{M_{d'},M_{d'-1},F}$ so that it grows fast enough that

– $F^*(M_{d'}) < M_{d'-1}$;

– $F(M_1) < c(M_{d'}, F) < \dfrac{F^*(M_{d'-1})}{2}$.

Then we have from (4.4.37) that
\[
M_{d'} < F(M_{d'}) \le F^*(M_{d'}) \le M_{d'-1} < F(M_{d'-1}) \le \cdots \le M_1 < F(M_1) \le M_0 := \tfrac{1}{2} F^*(M_{d'-1}) = O_{M_{d'},F}(1). \tag{4.4.44}
\]
Assuming $N, W$ are sufficiently large with respect to $M_{d'}$ and $K$, inequalities (4.4.35) and (4.4.36) for $j = d'$ and $\bar q \in \bar\Omega \setminus (\mathcal{E}_1 \cup \mathcal{E}_2)$ follow from (4.4.38), (4.4.39) and (4.4.44). The remaining conclusions (4.4.32), (4.4.33), (4.4.34) of the theorem are clear from the construction.

4.5 Counting Lemma

In this section we formulate a so-called counting lemma and show how it implies Theorem 4.1.7. Our arguments closely follow those in [106] and are straightforward adaptations of them to the weighted setting.

For $e \in \mathcal{H}_d$ let $G_e \subseteq V_e$ be a hypergraph, and let $\mathcal{B}_e = \{G_e, G_e^C, \emptyset, V_e\}$ be the σ-algebra generated by $G_e$. Let $\{\nu_e\}_{e \in \mathcal{H}}$ and $\{\mu_e\}_{e \in \mathcal{H}}$ be the weights and measures associated to a well-defined, symmetric family of forms $\mathcal{L} = \{L_e^k :\ e \in \mathcal{H}_d,\ 1 \le k \le d\}$. Take $M_d > 0$, let $F : \mathbb{R}_+ \to \mathbb{R}_+$ be a growth function (to be determined later), and apply Theorem 4.4.6 with $d' = d$ to obtain a well-defined, symmetric parametric extension $\{\mu_{q,f}\}_{q \in Z, f \in \mathcal{H}}$ together with σ-algebras $\mathcal{B}_{q,e} \subseteq \mathcal{B}'_{q,e}$ and a set $\Omega \subseteq Z$ such that (4.4.32)-(4.4.36) hold (see footnote 10). Note that the complexity of the system, as well as of the σ-algebras, is $O_{M_d,F}(1)$. We consider the system of measures $\mu_{q,f}$ and σ-algebras $\mathcal{B}_{q,f}, \mathcal{B}'_{q,f}$, $f \in \mathcal{H}$, as fixed for the rest of this section.

It will be convenient to define all our σ-algebras on the same space $V_J$ and eventually replace the ensemble of measures $\{\mu_{q,e}\}_{e \in \mathcal{H}}$ with the single measure $\mu_q := \mu_{q,J} = \prod_{f \in \mathcal{H}} \nu_{q,f}$. Thanks to the stability conditions (4.3.3)-(4.3.4) this can be done at essentially no cost. Indeed, for any $e \in \mathcal{H}$ there is an exceptional set $\mathcal{E}_e \subseteq \Omega$ of measure $\psi(\mathcal{E}_e) = o_{M_d,F}(1)$ such that for any family of sets $G_{q,e} \subseteq V_e$ we have
\[
\mu_q(\pi_e^{-1}(G_{q,e})) = \mu_{q,e}(G_{q,e}) + o_{M_d,F}(1) \tag{4.5.1}
\]
uniformly for $q \in \Omega \setminus \mathcal{E}_e$.
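Let $\mathcal{E} = \bigcup_{e \in \mathcal{H}} \mathcal{E}_e$ and $\Omega' := \Omega \setminus \mathcal{E}$. Then (4.5.1) means that for any set $A_{q,e} \in \mathcal{A}_e$ one has $\mu_q(A_{q,e}) = \mu_{q,e}(\pi_e(A_{q,e})) + o_{M_d,F}(1)$ uniformly for $q \in \Omega'$. We will write
\[
\mu_{q,e}(A_{q,e}) = \mu_{q,e}(\pi_e(A_{q,e}))
\]
for simplicity of notation.

Define the σ-algebras $\mathcal{B}_{q,e} := \pi_e^{-1}(\mathcal{B}_{q,e})$ and $\mathcal{B}'_{q,e} := \pi_e^{-1}(\mathcal{B}'_{q,e})$ on $V_J$, and note that $\mathcal{B}_{q,e} = \mathcal{B}_e$ for $e \in \mathcal{H}_d$, as the initial σ-algebras $\mathcal{B}_e$ are not altered in Theorem 4.4.6. Let $\mathcal{B}_q := \bigvee_{e \in \mathcal{H}} \mathcal{B}_{q,e}$ be the σ-algebra generated by the algebras $\mathcal{B}_{q,e}$, and define the σ-algebra $\mathcal{B}'_q$ similarly. The atoms of $\mathcal{B}_q$ are of the form $A_q = \bigcap_{e \in \mathcal{H}} A_{q,e}$, where $A_{q,e}$ is an atom of $\mathcal{B}_{q,e}$. In particular, if $E_e \in \mathcal{B}_e$ then $\bigcap_{e \in \mathcal{H}_d} E_e$ is a union of atoms of $\mathcal{B}_q$ (see footnote 11).

(Footnote 10: The family $\{\nu_e\}$ can be considered as a parametric family of weights in a trivial way, setting $Z = \Omega = \{0\}$ and $\psi(0) = 1$.)

Basically, the counting lemma is based on the decomposition
\[
1_{A_{q,e}} = \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big) + b_{q,e} + c_{q,e},
\]
where $b_{q,e}$ is small in $L^2$ norm and $c_{q,e}$ is small in box norm. It says that for most atoms, when we calculate the measure $\mu_q(A_q) = \mu_q(\bigcap_{f \in \mathcal{H}} A_{q,f})$, we have
\[
\mu_q\Big( \bigcap_{f \in \mathcal{H}} A_{q,f} \Big) = \prod_{e \in \mathcal{H}} \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big) + \text{small error} = \prod_{e \in \mathcal{H}} \frac{\mu_{q,e}\big( A_{q,e} \cap \bigcap_{f \in \partial e} A_{q,f} \big)}{\mu_{q,e}\big( \bigcap_{f \in \partial e} A_{q,f} \big)} + \text{small error}.
\]
That is, the measure of most atoms can be approximated by the product of their relative densities with respect to the atoms of one order lower in $\bigvee_{f \in \partial e} \mathcal{B}_{q,f}$, which comes from the main term of the decomposition. The terms $b_{q,e}, c_{q,e}$ only contribute small error terms. To get rid of $c_{q,e}$, we only need the usual generalized von Neumann inequality argument. A bit more work is needed to get rid of $b_{q,e}$.

A consequence of the counting lemma is that the measure of an atom of interest is bounded below by a positive constant depending only on the initial data $F$ and $M_d$.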
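If, as in Theorem 4.1.7, one assumes that the measure of $\bigcap_{e \in \mathcal{H}_d} E_e$ is sufficiently small, then it cannot contain most of the atoms (which will be called regular atoms). Thus, after removing the exceptional atoms from the sets $E_e$, the intersection of the remaining sets becomes empty, leading to a proof of Theorem 4.1.7.

To make this heuristic precise, let us start by defining the relative density $\delta_{q,e}(A \mid B) := \mu_{q,e}(A \cap B)/\mu_{q,e}(B)$ for $A, B \in \mathcal{B}_{q,e}$, with the convention that $\delta_{q,e}(A \mid B) := 1$ if $\mu_{q,e}(B) = 0$.

Definition 4.5.1. Let $A_q = \bigcap_{e \in \mathcal{H}} A_{q,e}$ be an atom of $\mathcal{B}_q$. We say that the atom $A_q$ is regular if the following hold for all $e \in \mathcal{H}_j$, $1 \le j \le d$.

1. For all atoms $A_{q,e}$ the relative density is not too small (see footnote 12):
\[
\delta_{q,e}\Big( A_{q,e} \,\Big|\, \bigcap_{f \in \partial e} A'_{q,f} \Big) \ \ge\ \frac{1}{\log F(M_j)}. \tag{4.5.2}
\]
2. It satisfies a regularity condition (see footnote 13):
\[
\int_{V_e} \Big| \mathbb{E}_{\mu_q}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q,f} \Big) - \mathbb{E}_{\mu_q}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big) \Big|^2 \prod_{f \subsetneq e} 1_{A'_{q,f}}\, d\mu_{q,e} \ \le\ \frac{1}{F(M_j)} \int_{V_e} \prod_{f \subsetneq e} 1_{A'_{q,f}}\, d\mu_{q,e}. \tag{4.5.3}
\]

(Footnote 11: Indeed, $\bigvee_{e \in \mathcal{H}_d} \mathcal{B}_{q,e} \subseteq \bigvee_{f \in \mathcal{H}} \mathcal{B}_{q,f}$.)

(Footnote 12: Don't take the log function too seriously.)

(Footnote 13: As mentioned in Chapter 1, this is related to the box norm. The notion of regular atoms has some relation to (hyper-)graph regularity.)

This roughly means that all atoms $A_{q,e}$ are both somewhat large and regular on the intersection of the lower order atoms $A'_{q,f}$, $f \in \partial e$. Note that if $|e| = 1$ then $\partial e = \emptyset$; by convention we define $\bigcap_{f \in \partial e} A'_{q,f} = V_J$, and the left side of (4.5.2) becomes $\mu_{q,e}(A_{q,e})$.

Now we state the counting lemma.

Proposition 4.5.2 (Counting lemma). There is a set $\mathcal{E} \subseteq \Omega$ of measure $\psi(\mathcal{E}) = o_{N,W \to \infty; M_d,F}(1)$ such that if $q \in \Omega \setminus \mathcal{E}$ and if $A_q = \bigcap_{e \in \mathcal{H}} A_{q,e} \in \bigvee_{e \in \mathcal{H}} \mathcal{B}_{q,e}$ is a regular atom, then
\[
\mu_q(A_q) = (1 + o_{M_d \to \infty}(1)) \prod_{e \in \mathcal{H}} \delta_{q,e}\Big( A_{q,e} \,\Big|\, \bigcap_{f \in \partial e} A_{q,f} \Big) + O_{M_1}\Big( \frac{1}{F(M_1)} \Big) + o_{N,W \to \infty; M_d,F}(1). \tag{4.5.4}
\]

An important corollary of the counting lemma is that each of the regular atoms is not too small in measure and the total measure of all irregular atoms is small, provided that $F$ is sufficiently fast growing, of exponential type.

Lemma 4.5.3 (Regular atoms). For $F(M) \ge 2^{2^{2^{M+3}}}$ and sufficiently large $M_d$,

1.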
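(Total measure of irregular atoms is small) For each $A_{q,e} \in \mathcal{B}_{q,e}$, define the set $B_{q,e,A_{q,e}}$ to be the union of all sets of the form $\bigcap_{f \subsetneq e} A'_{q,f}$ for which (4.5.2) or (4.5.3) fails. Note that if an atom $A_q = \bigcap_{e \in \mathcal{H}} A_{q,e}$ is irregular then $A_q \subseteq A_{q,e} \cap B_{q,e,A_{q,e}}$ for some $e \in \mathcal{H}$. Then for $q \notin \mathcal{E}_1$, where $\mathcal{E}_1 \subseteq \Omega$ is a set of measure $\psi(\mathcal{E}_1) = o_{M_d,F}(1)$, we have
\[
\mu_q(A_{q,e} \cap B_{q,e,A_{q,e}}) \ \lesssim\ \frac{1}{\log F(M_j)}. \tag{4.5.5}
\]
2. (A regular atom is large) For $q \in \Omega$ and a regular atom $A_q = \bigcap_{f \in \mathcal{H}} A_{q,f}$,
\[
\mu_q(A_q) \ \ge\ \frac{1}{F(M_1)} > 0. \tag{4.5.6}
\]

Proof. First we show (4.5.5). Note that the measure $\mu_q$ can be replaced by the measure $\mu_{q,e}$, as they differ by a negligible quantity on sets which belong to $\mathcal{A}_e$. We first estimate the contribution to the left side of (4.5.5) of those sets $\bigcap_{f \subsetneq e} A_{q,f}$ for which (4.5.2) fails. This quantity is bounded by
\[
\sum_{\{A_{q,f}\}_{f \subsetneq e},\ \text{(4.5.2) fails}} \mu_{q,e}\Big( A_{q,e} \cap \bigcap_{f \in \partial e} A_{q,f} \Big) \ \lesssim_d\ \frac{1}{\log F(M_j)} \sum_{\{A_{q,f}\}_{f \in \partial e}} \mu_{q,e}\Big( \bigcap_{f \in \partial e} A_{q,f} \Big) \ \le\ \frac{1}{\log F(M_j)}\, \mu_{q,e}(V_e) \ \lesssim\ \frac{1}{\log F(M_j)},
\]
as the summation is taken over the disjoint atoms of the σ-algebra $\bigvee_{f \in \partial e} \mathcal{B}_{q,f}$.

Similarly, one estimates the total contribution of the disjoint atoms $\bigcap_{f \subsetneq e} A_{q,f}$ for which (4.5.3) fails as follows:
\[
\begin{aligned}
\sum_{\{A_{q,f}\}_{f \subsetneq e},\ \text{(4.5.3) fails}} \mu_{q,e}\Big( \bigcap_{f \subsetneq e} A_{q,f} \Big)
&\le F(M_j) \sum_{\{A_{q,f}\}_{f \subsetneq e},\ \text{(4.5.3) fails}} \int_{V_e} \Big| \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q,f} \Big) - \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big) \Big|^2 \prod_{f \subsetneq e} 1_{A_{q,f}}\, d\mu_{q,e} \\
&\le F(M_j) \int_{V_e} \Big| \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q,f} \Big) - \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big) \Big|^2 d\mu_{q,e} \\
&\le F(M_j)\, \frac{1}{F(M_j)^2} = \frac{1}{F(M_j)}.
\end{aligned}
\]
Since the sets $A_{q,e} \cap B_{q,e,A_{q,e}}$ contain all irregular atoms, and for given $e \in \mathcal{H}_j$ the number of all atoms of the σ-algebra $\mathcal{B}_{q,e}$ is at most $2^{2^{M_j}}$, one estimates the total measure of all irregular atoms as
\[
\sum_{j=1}^{d} \sum_{e \in \mathcal{H}_j} \sum_{A_{q,e} \in \mathcal{B}_{q,e}} \mu_q(A_{q,e} \cap B_{q,e,A_{q,e}}) \ \le\ \sum_{j=1}^{d} \binom{d}{j} 2^{2^{M_j}} \frac{1}{\log F(M_j)} \ \le\ \sum_{j=1}^{d} \frac{2^{2^{M_j}+d}}{\log F(M_j)} \ \le\ \frac{1}{\sqrt{\log F(M_d)}} \ \le\ 2^{-2^{M_d}}. \tag{4.5.7}
\]
Here the last two inequalities follow if we choose $M_d$ sufficiently large and $F$ sufficiently fast growing: choose $M_d$ so that $d\, 2^d\, 2^{2^{M_j}} \le 2^{2^{M_j+1}}$ and $2^{2^{2^{M_j+3}}} \ge e^{2^{2^{M_j+2}}}$ for all $j$ ($M_d \ge 1$ should suffice here).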
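Now choose
\[
F(M) \ge 2^{2^{2^{M+3}}}. \tag{4.5.8}
\]
Then
\[
\sum_{j=1}^{d} \frac{2^{2^{M_j}+d}}{\log F(M_j)} \ \le\ \sum_{j=1}^{d} \frac{2^{2^{M_j}+d}}{\sqrt{2^{2^{M_j+2}}}}\, \frac{1}{\sqrt{\log F(M_j)}} \ =\ \sum_{j=1}^{d} \frac{2^{2^{M_j}+d}}{2^{2^{M_j+1}}}\, \frac{1}{\sqrt{\log F(M_j)}} \ \le\ \sum_{j=1}^{d} \frac{1}{d}\, \frac{1}{\sqrt{\log F(M_j)}} \ \le\ \frac{1}{\sqrt{\log F(M_d)}},
\]
so (4.5.7) follows.

Now we use the counting lemma to show (4.5.6). Indeed, by (4.5.2) and (4.5.4), we have for $q \in \Omega$ and a regular atom $A_q = \bigcap_{f \in \mathcal{H}} A_{q,f}$ that
\[
\begin{aligned}
\mu_q(A_q) &\ge \prod_{j \le d} \prod_{e \in \mathcal{H}_j} \frac{1}{F(M_j)^{1/10}} - O_{d,M_1}\Big( \frac{1}{F(M_1)} \Big) + o_{M_d,F}(1) \\
&\ge \frac{1}{F(M_1)^{1/10}}\, \frac{1}{M_1^{c(d)}} - O_{d,M_1}\Big( \frac{1}{F(M_1)} \Big) + o_{M_d,F}(1) \ \ge\ \frac{1}{F(M_1)} \ \ge\ \frac{1}{F^*(M_{d'-1})} = c(M_d, F) > 0,
\end{aligned} \tag{4.5.9}
\]
as long as $F$ is sufficiently rapidly growing and $M_d$ is sufficiently large with respect to $d$; here we apply (4.4.44) and (4.4.29).

4.5.1 Proof of the Counting Lemma

We will in fact prove a stronger version of the counting lemma, for hypergraph bundles, of which Proposition 4.5.2 is a special case. The reason is that when we try to eliminate the error term $b_{q,e}$ we apply the Cauchy-Schwarz inequality to lower order graphs. This doubles vertices, and the resulting configuration can be described as a lower order hypergraph bundle, allowing us to apply the induction hypothesis from the statement of the counting lemma for hypergraph bundles.

Definition 4.5.4 (Weighted hypergraph bundles over $\mathcal{H}$). Let $K$ be a finite set together with a map $\pi : K \to J$, called the projection map of the bundle to the index set $J$. Let $\mathcal{G}_K$ be the set of edges $g \subseteq K$ such that $\pi$ is injective on $g$ and $\pi(g) \in \mathcal{H}$. For any $g \in \mathcal{G}_K$, write
\[
V_g := V_{\pi(g)} = \prod_{k \in g} V_{\pi(k)},
\]
and define the weights and measures $\nu_{q,g}, \mu_{q,g} : V_g \to \mathbb{R}_+$ as
\[
\nu_{q,g}(x_g) := \nu_{q,\pi(g)}(x_g), \qquad \mu_{q,g}(x_g) = \prod_{g' \subseteq g} \nu_{q,g'}(x_{g'}).
\]
The total measure $\mu_{q,K}$ on $V_K$ is given by
\[
\mu_{q,K}(x) = \prod_{g \in \mathcal{G}_K} \nu_{q,g}(x_g).
\]
A hypergraph $\mathcal{G} \subseteq \mathcal{G}_K$ which is closed in the sense that $\partial g \subseteq \mathcal{G}$ for every $g \in \mathcal{G}$, together with the spaces $V_g$ and the weight functions $\nu_{q,g}$ for $g \in \mathcal{G}$, is called a weighted hypergraph bundle over $\mathcal{H}$. The quantity $d' = \sup_{g \in \mathcal{G}} |g|$ is called the order of $\mathcal{G}$.

Remark 4.5.5. The underlying linear forms defining the weight system $\{\nu_{q,g}\}_{q \in Z, g \in \mathcal{G}_K}$,
\[
\bar L_g(q, x_g) = L_{\pi(g)}(q, x_g), \qquad \operatorname{supp}_x(L_{\pi(g)}) = \pi(g),
\]
are pairwise linearly independent.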
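Indeed, if $g \ne g'$ they depend on different sets of variables, and for a fixed set of variables they are the same as the forms $L(q, x_g)$. What happens is that we sample a number of variables from each space $V_j$ and evaluate the forms $L(q,x)$ in the new variables. For example, if we have $x_1, x_{1'} \in V_1$ and $x_2, x_{2'} \in V_2$, then to the edge $(1,2) \in \mathcal{H}$ there correspond the edges $(1,2)$, $(1,2')$, $(1',2)$ and $(1',2')$ in $\mathcal{G}$, and to every linear form $L(q, x_1, x_2)$ there also correspond the forms $L(q, x_1, x_{2'})$, $L(q, x_{1'}, x_2)$ and $L(q, x_{1'}, x_{2'})$ defining the weights on the appropriate edges.

Proposition 4.5.6 (Generalized counting lemma). Let $\mathcal{G} \subseteq \mathcal{G}_K$ be a closed hypergraph bundle over $\mathcal{H}$ with projection map $\pi : K \to J$, and let $d' := \sup_{g \in \mathcal{G}} |g|$ be the order of $\mathcal{G}$. Then, for $F$ growing sufficiently rapidly with respect to $d$ and $K$, there exists a set $\mathcal{E} \subseteq \Omega$ of measure $\psi(\mathcal{E}) = o_{N \to \infty; M_d,K,F}(1)$ such that for $q \in \Omega \setminus \mathcal{E}$ we have
\[
\int_{V_K} \prod_{g \in \mathcal{G}} 1_{A_{q,\pi(g)}}(x_g)\, d\mu_{q,K}(x) = (1 + o_{M_d \to \infty, K}(1)) \prod_{g \in \mathcal{G}} \delta_{q,\pi(g)}\Big( A_{q,\pi(g)} \,\Big|\, \bigcap_{f \in \partial \pi(g)} A_{q,f} \Big) + O_{K,M_1}\Big( \frac{1}{F(M_1)} \Big) + o_{N \to \infty, K, M_d}(1). \tag{4.5.10}
\]
Note that Proposition 4.5.2 is the special case when $\mathcal{G} = \mathcal{H}$ and $\pi$ is the identity map.

Proof. We use a double induction. First we induct on $d'$, the order of $\mathcal{G}$ (note that $d' \le d$). Then, fixing $K$ and $\pi$, we induct on the number of edges $r := |\{g \in \mathcal{G} :\ |g| = d'\}|$.

To start, assume that $d' = r = 1$, so that $\mathcal{G} = \{k\}$ and $j = \pi(k) \in J$. The left hand side of (4.5.10) becomes
\[
\int_{V_k} 1_{A_{q,j}}(x_k)\, d\mu_{q,k}(x_k) = \int_{V_j} 1_{A_{q,j}}(x_j)\, d\mu_{q,j}(x_j) = \mu_{q,j}(A_{q,j}) = \delta_{q,j}\Big( A_{q,j} \,\Big|\, \bigcap_{f \in \partial j} A_{q,f} \Big).
\]
Let $\{A_{q,e}\}_{e \in \mathcal{H}}$ be a regular collection of atoms for $q \in \Omega$, and define the functions $b_{q,e}, c_{q,e} : V_e \to \mathbb{R}$ for $e \in \mathcal{H}$ by
\[
b_{q,e} := \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q,f} \Big) - \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big), \tag{4.5.11}
\]
\[
c_{q,e} := 1_{A_{q,e}} - \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_{q,e}} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}'_{q,f} \Big), \tag{4.5.12}
\]
and introduce the shorthand notation
\[
\delta_{q,e} = \delta_{q,e}\Big( A_{q,e} \,\Big|\, \bigcap_{f \in \partial e} A_{q,f} \Big).
\]
Note that if $x \in A_{q,e} \cap \bigcap_{f \in \partial e} A_{q,f}$ then
\[
\delta_{q,e} = \mathbb{E}_{\mu_{q,e}}\Big( 1_{A_e} \,\Big|\, \bigvee_{f \in \partial e} \mathcal{B}_{q,f} \Big)(x_e), \tag{4.5.13}
\]
and thus one has the decomposition
\[
1_{A_{q,e}}(x_e) = \delta_{q,e} + b_{q,e}(x_e) + c_{q,e}(x_e) \tag{4.5.14}
\]
on the set $A_{q,e} \cap \bigcap_{f \in \partial e} A_{q,f}$.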
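To apply induction on $r$, let $g_0 \in \mathcal{G}$ be such that $|g_0| = d'$ and use (4.5.14) to write
\[
\prod_{g \in \mathcal{G}} 1_{A_{q,\pi(g)}}(x_g) = \big( \delta_{q,\pi(g_0)} + b_{q,\pi(g_0)}(x_{g_0}) + c_{q,\pi(g_0)}(x_{g_0}) \big) \prod_{g \in \mathcal{G} \setminus \{g_0\}} 1_{A_{q,\pi(g)}}(x_g).
\]
Consider the contribution of the terms separately.

Step 1: Main term. We have
\[
\int_{V_K} \prod_{g \in \mathcal{G}} 1_{A_{q,\pi(g)}}(x_g)\, d\mu_{q,K}(x) = \int_{V_K} \big( \delta_{q,\pi(g_0)} + b_{q,\pi(g_0)}(x_{g_0}) + c_{q,\pi(g_0)}(x_{g_0}) \big) \prod_{g \in \mathcal{G} \setminus \{g_0\}} 1_{A_{q,\pi(g)}}(x_g)\, d\mu_{q,K}(x) = M_q + E_q^1 + E_q^2. \tag{4.5.15}
\]
For the main term $M_q$, by the second induction hypothesis we have
\[
M_q = \delta_{q,\pi(g_0)} \int_{V_K} \prod_{g \in \mathcal{G} \setminus \{g_0\}} 1_{A_{q,\pi(g)}}(x_g)\, d\mu_{q,K}(x) = \delta_{q,\pi(g_0)}\, (1 + o_{M_d \to \infty}(1)) \prod_{g \in \mathcal{G} \setminus \{g_0\}} \delta_{q,\pi(g)} + O_{K,M_1}\Big( \frac{1}{F(M_1)} \Big) + o_{N,W \to \infty; K, M_d}(1),
\]
and hence $M_q$ agrees with the right side of (4.5.10).

Step 2: We eliminate the error term $c_{q,e}$ by the generalized von Neumann theorem argument. We have
\[
\begin{aligned}
E_q^2 &= \int_{V_K} c_{q,\pi(g_0)}(x_{g_0}) \prod_{g \in \mathcal{G} \setminus \{g_0\}} 1_{A_{q,\pi(g)}}(x_g)\, d\mu_q(x) = \mathbb{E}_{x \in V_K}\, (c_{q,\pi(g_0)} \nu_{q,g_0})(x_{g_0}) \prod_{g \in \mathcal{G} \setminus \{g_0\}} (1_{A_{q,\pi(g)}} \nu_{q,g})(x_g) \\
&= \mathbb{E}_{x \in V_K} \prod_{g \in \mathcal{G},\ |g| = d'} f_{q,g}(x_g) \prod_{g' \in \mathcal{G},\ |g'| < d'} \nu_{q,g'}(x_{g'}),
\end{aligned} \tag{4.5.16}
\]
where $f_{q,g_0} := c_{q,\pi(g_0)} \nu_{q,g_0}$ and $f_{q,g} := h_{q,g} \nu_{q,g}$ for $g \in \mathcal{G}$, $g \ne g_0$, $|g| = d'$, for a function $h_{q,g}$ of magnitude at most 1. Thus we have $|f_{q,g}| \le \nu_{q,g}$ for all $g \in \mathcal{G}$ with $|g| = d'$. Applying the Cauchy-Schwarz inequality $d'$ times successively in the variables $x_j$, $j \in g_0$, as in the proof of the generalized von Neumann theorem (Theorem 3.2.5), to clear all functions $f_{q,g}(x_g)$, $g \ne g_0$, each of which does not depend on at least one of these variables, we obtain
\[
|E_q^2|^{2^{d'}} \ \lesssim\ \big\| c_{q,\pi(g_0)} \big\|^{2^{d'}}_{\square_{\nu_{q,g_0}}} + \mathbb{E}_{x_{g_0}, y_{g_0}} \big| W_q(x_{g_0}, y_{g_0}) - 1 \big| \prod_{h \subseteq g_0}\ \prod_{\omega \in \{0,1\}^h} \nu_{q,h}(\omega_h(x_h, y_h)), \tag{4.5.17}
\]
where $K' := K \setminus g_0$ and
\[
W_q(x_{g_0}, y_{g_0}) = \mathbb{E}_{x \in V_{K'}} \prod_{g \in \mathcal{G},\ g \nsubseteq g_0}\ \prod_{\omega_{g \cap g_0} \in \{0,1\}^{g \cap g_0}} \nu_{q,g}\big( \omega_{g \cap g_0}(x_{g \cap g_0}, y_{g \cap g_0}),\ x_{g \setminus g_0} \big).
\]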
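(4.5.18)

Note that the first term on the right hand side of (4.5.17) is $O(F(M_1)^{-2^{d'}})$ by (4.4.36) and (4.5.12). To estimate the second term of (4.5.17) we apply the Cauchy-Schwarz inequality one more time in the variables $x_{g_0}, y_{g_0}$ to see that it is $o_{N,W \to \infty; M_d,K,F}(1)$ for $q \notin \mathcal{E}_1$, where $\mathcal{E}_1$ is a set of measure $o_{N,W \to \infty; M_d,K,F}(1)$, using the fact that the underlying linear forms are pairwise linearly independent in the variables $(q, x_{g_0}, y_{g_0}, x_{K'})$.

Step 3: We estimate the error term $E_q^1$, defined as
\[
E_q^1 = \int_{V_K} b_{q,\pi(g_0)}(x_{g_0}) \prod_{g \in \mathcal{G} \setminus \{g_0\}} 1_{A_{q,\pi(g)}}(x_g)\, d\mu_q(x).
\]
To apply the induction hypothesis, we take absolute values and discard all factors $1_{A_{q,\pi(g)}}(x_g)$ for $|g| = d'$, $g \ne g_0$ (this is harmless due to the smallness of the $L^2$-norm of $b_{q,g}$). One estimates
\[
|E_q^1| \ \le\ \int_{V_{g_0}} \Big( |b_{q,\pi(g_0)}(x_{g_0})| \prod_{g \subsetneq g_0} 1_{A_{q,\pi(g)}}(x_g) \Big) \times \Big( \mathbb{E}_{x_{K'}} \prod_{g \in \mathcal{G}',\ |g| < d'} (1_{A_{q,\pi(g)}} \nu_{q,g})(x_g) \prod_{h \in \mathcal{G}',\ |h| = d'} \nu_{q,h}(x_h) \Big)\, d\mu_{q,g_0}(x_{g_0}),
\]
where $\mathcal{G}' = \{g \in \mathcal{G} :\ g \nsubseteq g_0\}$, and recall that $K' = K \setminus g_0$. Write $A(x_{g_0})$ for the expression in the first parenthesis and $B(x_{g_0})$ for the expression in the second parenthesis. Thus we have
\[
|E_q^1| \ \le\ \int_{V_{g_0}} A(x_{g_0})\, B(x_{g_0})\, d\mu_{q,g_0}(x_{g_0}),
\]
and by the Cauchy-Schwarz inequality we get
\[
|E_q^1|^2 \ \lesssim\ \Big( \int_{V_{g_0}} A(x_{g_0})^2\, d\mu_{q,g_0}(x_{g_0}) \Big) \Big( \int_{V_{g_0}} B(x_{g_0})^2\, d\mu_{q,g_0}(x_{g_0}) \Big). \tag{4.5.19}
\]
Since $\nu_{q,g_0}(V_{g_0}) = 1 + o_{M_d,K,F}(1)$ outside a set $\mathcal{E}_2 \subseteq \Omega$ of measure $\psi(\mathcal{E}_2) = o_{M_d,K,F}(1)$, the first factor on the right side of (4.5.19) is estimated by
\[
\mathbb{E}_{x_{g_0} \in V_{g_0}}\, b_{q,\pi(g_0)}(x_{g_0})^2 \prod_{g \subsetneq g_0} 1_{A_{q,\pi(g)}}(x_g) \prod_{g \subseteq g_0} \nu_{q,\pi(g)}(x_g). \tag{4.5.20}
\]
Let $f_0 = \pi(g_0)$. Since $\pi : g_0 \to f_0$ is injective and $V_{g_0} = V_{f_0}$, we may write the expression in (4.5.20), by re-indexing the variables $x_g$ to $x_f$, $f = \pi(g)$ for $g \subseteq g_0$, as
\[
\int_{V_{f_0}} b_{q,f_0}(x_{f_0})^2 \prod_{f \subsetneq f_0} 1_{A_{q,f}}(x_f)\, d\mu_{q,f_0}(x_{f_0}) \ \lesssim\ \frac{1}{F(M_{d'})} \int_{V_{f_0}} \prod_{f \subsetneq f_0} 1_{A_{q,f}}(x_f)\, d\mu_{q,f_0}(x_{f_0}), \tag{4.5.21}
\]
where the inequality follows from assumption (4.5.3) on regular atoms. By the induction hypothesis we further estimate the right side of (4.5.21) as
\[
\frac{1}{F(M_{d'})}\, (1 + o_{M_d \to \infty}(1)) \prod_{f \subsetneq f_0} \delta_{q,f} + O_{M_d}\Big( \frac{1}{F(M_1)} \Big) + o_{N,W \to \infty; M_d,K,F}(1).
\]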
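(4.5.22)

The second factor in (4.5.19) may be expressed in terms of a hypergraph bundle $\tilde K$ over $K$, using the construction given in [106]. Let $\tilde K = K \oplus_{g_0} K$ be the set $K \times \{0,1\}$ with the elements $(k,0)$ and $(k,1)$ identified for $k \in g_0$. Let $\phi : \tilde K \to K$ be the natural projection, and let $\pi \circ \phi : \tilde K \to J$ be the associated map down to $J$. Recall that $\mathcal{G} \subseteq \mathcal{G}_K$ is a closed subhypergraph.

Let $\mathcal{G}_0 = \{g \in \mathcal{G} :\ g \subseteq g_0\}$ and $\mathcal{G}' = \{g \in \mathcal{G} :\ g \nsubseteq g_0,\ |g| < d'\}$, and define the hypergraph bundle $\tilde{\mathcal{G}}$ on $\tilde K$ to consist of the edges $g \times \{0\}$ and $g \times \{1\}$ for $g \in \mathcal{G}_0 \cup \mathcal{G}'$, the two edges being identified for $g \in \mathcal{G}_0$. Define the following weights on $V_{\tilde K}$:
\[
\tilde\nu_{q,g \times \{i\}}(x_{g \times \{i\}}) := \nu_{q,g}(x_{g \times \{i\}}) \tag{4.5.23}
\]
for $q \in Z$, $g \in \mathcal{G}_K$, $i = 0,1$ (i.e. for all edges $\tilde g \in \mathcal{G}_{\tilde K}$), and let $\tilde\mu_{q,g \times \{i\}}$ be the associated family of measures. Then for the second factor appearing in (4.5.19) we have
\[
\begin{aligned}
\int_{V_{g_0}} B(x_{g_0})^2\, d\mu_{q,g_0}(x_{g_0})
&= \int_{V_{g_0}} \Big[ \prod_{g \in g_0} 1_{A_{q,\pi(g)}}(x_g) \Big] \Big[ \mathbb{E}_{x \in V_{K \setminus g_0}} \prod_{g \in \mathcal{G} \setminus \{g_0\}} (1_{A_{q,\pi(g)}} \nu_{q,g})(x_g) \prod_{h \nsubseteq g_0,\ |h| = d'} \nu_{q,h}(x_h) \Big]^2 d\mu_{q,g_0}(x_{g_0}) \\
&= \int_{V_{\tilde K}} \prod_{\tilde g \in \tilde{\mathcal{G}}} 1_{A_{q,\pi \circ \phi(\tilde g)}}(x_{\tilde g})\, d\tilde\mu_{q,\tilde K}(x_{\tilde K}).
\end{aligned} \tag{4.5.24}
\]
Indeed, when expanding the square of the inner sum in (4.5.24) we double all points in $K \setminus g_0$, so we eventually sum over $x_{\tilde K} \in V_{\tilde K}$, and we also double all edges $g \in \tilde{\mathcal{G}}$ to obtain the edges $g \times \{0\}$, $g \times \{1\}$. As for the weights, the procedure doubles all weights $\nu_{q,g}(x_g)$ for $g \nsubseteq g_0$, $g \in \mathcal{G}_K$, to obtain the weights $\nu_{q,g}(x_{g \times \{i\}})$ for $i = 0,1$, while it leaves the weights $\nu_{q,g}(x_g)$ for $g \subseteq g_0$ unchanged. The order of $\tilde{\mathcal{G}}$ is less than $d'$, thus by the first induction hypothesis we have
\[
\begin{aligned}
\int_{V_{\tilde K}} \prod_{\tilde g \in \tilde{\mathcal{G}}} 1_{A_{q,\pi \circ \phi(\tilde g)}}(x_{\tilde g})\, d\tilde\mu_{q,\tilde K}(x_{\tilde K})
&= (1 + o_{M_d \to \infty}(1)) \prod_{\tilde g \in \tilde{\mathcal{G}}} \delta_{q,\pi \circ \phi(\tilde g)} + O_{K,M_1}\Big( \frac{1}{F(M_1)} \Big) + o_{N,W \to \infty; M_d,K,F}(1) \\
&= (1 + o_{M_d \to \infty}(1)) \prod_{g \in \mathcal{G}_0} \delta_{q,\pi(g)} \prod_{g \in \mathcal{G}'} \delta^2_{q,\pi(g)} + O_{K,M_1}\Big( \frac{1}{F(M_1)} \Big) + o_{N,W \to \infty; M_d,K,F}(1)
\end{aligned} \tag{4.5.25}
\]
for $q \notin \mathcal{E}_{\tilde K,\phi}$, where $\mathcal{E}_{\tilde K,\phi} \subseteq \Omega$ is a set of measure $\psi(\mathcal{E}_{\tilde K,\phi}) = o_{N,W \to \infty; M_d,K,F}(1)$. Note that there are only $O_K(1)$ choices for the set $\tilde K$ and the projection map $\phi : \tilde K \to K$. Thus, taking the union of all possible exceptional sets $\mathcal{E}_{\tilde K,\phi}$, we have that (4.5.25) holds for $q \notin \mathcal{E}'_K$, where $\mathcal{E}'_K$ is a set of measure $\psi(\mathcal{E}'_K) = o_{N,W \to \infty; M_d,K,F}(1)$.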
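The combinatorics of the doubling construction can be made concrete on a small example. The sketch below is illustrative only: it tracks vertices and edges but no weights or measures, the function name `double_over` and the encoding of the identified vertices as `(k, 0)` are assumptions made here, and the edge list passed in plays the role of $\mathcal{G}_0 \cup \mathcal{G}'$.

```python
def double_over(K, g0, edges):
    """Glue two copies of K along g0 and lift the given edges to the glued set.

    Vertices of g0 are shared between the copies (encoded as (k, 0));
    every other vertex k appears twice, as (k, 0) and (k, 1).
    """
    g0 = frozenset(g0)

    def lift(k, i):
        return (k, 0) if k in g0 else (k, i)

    vertices = {lift(k, i) for k in K for i in (0, 1)}
    # Each edge g yields g x {0} and g x {1}; the two coincide exactly when g lies in g0.
    lifted = {frozenset(lift(k, i) for k in g) for g in edges for i in (0, 1)}
    return vertices, lifted


# Toy example: K = {1, 2, 3}, g0 = {1, 2}.
edges_inside_g0 = [{1}, {2}]           # edges contained in g0: one lifted copy each
edges_outside_g0 = [{3}, {1, 3}, {2, 3}]  # edges not contained in g0: two copies each
vertices, lifted = double_over({1, 2, 3}, {1, 2}, edges_inside_g0 + edges_outside_g0)
```

Here the number of lifted edges is `len(edges_inside_g0) + 2 * len(edges_outside_g0)`, which mirrors the exponents in (4.5.25): each density attached to an edge inside $g_0$ appears once, and each density attached to an edge outside $g_0$ appears squared.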
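Combining the bounds (4.5.22) and (4.5.25) we obtain the error estimate
\[
|E_q^1|^2 = (o_{F,M_d \to \infty}(1)) \prod_{g \in \mathcal{G}} \delta^2_{q,\pi(g)} + O_{K,M_1}\Big( \frac{1}{F(M_1)} \Big) + o_{N,W \to \infty; M_d,K,F}(1)
\]
outside a set $\mathcal{E}'_K$ of measure $o_{N,W \to \infty; M_d,K,F}(1)$. This closes the induction and the proposition follows.

4.6 Proof of Weighted Simplices Removal Lemma

Proof of Theorem 4.1.7. Let $\delta > 0$, $E_e \in \mathcal{A}_e$ and $g_e : V_e \to [0,1]$ for $e \in \mathcal{H}_d$ be given. Let $\mathcal{E}_1 \subseteq \Omega$ be a set of measure $\psi(\mathcal{E}_1) = o_{M_d,F}(1)$ so that (4.5.1), (4.5.7) and (4.5.9) hold for $q \in \Omega \setminus \mathcal{E}_1$. Also, by (4.3.8), conditions (4.1.4)-(4.1.5) hold for
\[
\tilde\mu_J := \mu_{q,J} \quad \text{and} \quad \tilde\mu_e := \mu_{q,e} \quad (e \in \mathcal{H}_d), \tag{4.6.1}
\]
for $q \notin \mathcal{E}_2$, where $\mathcal{E}_2 \subseteq \Omega$ is a set of measure $\psi(\mathcal{E}_2) = o_{M_d,F}(1)$.

Now fix $q \notin \mathcal{E}_1 \cup \mathcal{E}_2$ and define $\tilde\mu_J$ and $\tilde\mu_e$ for $e \in \mathcal{H}_d$ as in (4.6.1). We claim that this system of measures satisfies the conclusions of the theorem. By construction the system is symmetric, so it remains to construct the sets $E'_e$ and show that (4.1.6)-(4.1.8) hold. For given $e \in \mathcal{H}_d$ define the sets
\[
E'_{q,e} = V_J \setminus \Big( B_{q,e,E_e} \cup \bigcup_{f \subsetneq e,\ A_{q,f}} \big( A_{q,f} \cap B_{q,f,A_{q,f}} \big) \Big), \tag{4.6.2}
\]
where $A_{q,f}$ ranges over the atoms of $\mathcal{B}_{q,f}$. Hence $E'_{q,e}$ should not contain a bad atom inside $E_e$ (excluding $E_e$ itself). As we have $\mathcal{B}_{q,e} = \mathcal{B}_e$, which is generated by the single set $E_e$, if $\bigcap_{e \in \mathcal{H}_d} E_e$ contains an atom $A_q = \bigcap_{f \in \mathcal{H}} A_{q,f}$ then $A_{q,e} = E_e$ for $e \in \mathcal{H}_d$. If such an atom $A_q$ were regular, then by (4.1.3) and (4.5.9) its measure would satisfy
\[
\frac{1}{F^*(M_{d-1})} \ \le\ \tilde\mu_J\Big( \bigcap_{e \in \mathcal{H}_d} E_e \Big) = \mu_J\Big( \bigcap_{e \in \mathcal{H}_d} E_e \Big) + o_{M_d,F}(1) \ <\ 2\delta.
\]
Choosing $M_d$ to be the largest positive integer such that
\[
F^*(M_{d-1}) \le (2\delta)^{-1}, \tag{4.6.3}
\]
we see that $\bigcap_{e \in \mathcal{H}_d} E_e$ can contain only irregular atoms.

Also, from (4.6.2) and (4.5.7) we have
\[
\tilde\mu_J(E_e \setminus E'_{q,e}) = \tilde\mu_J\Big( \bigcup_{f \subseteq e,\ A_{q,f}} \big( A_{q,f} \cap B_{q,f,A_{q,f}} \big) \Big) \ \le\ 2^{-2^{M_d}}. \tag{4.6.4}
\]
Moreover, all irregular atoms $A_q = \bigcap_{f \in \mathcal{H}} A_{q,f} \subseteq \bigcap_{e \in \mathcal{H}_d} E_e$ are contained in one of the sets $E_e \setminus E'_{q,e}$, thus
\[
\bigcap_{e \in \mathcal{H}_d} E_e \ \subseteq\ \bigcup_{e \in \mathcal{H}_d} (E_e \setminus E'_{q,e}), \quad \text{so} \quad \bigcap_{e \in \mathcal{H}_d} (E_e \cap E'_{q,e}) = \emptyset.
\]
Finally, choosing $\varepsilon := 2^{-2^{M_d}}$, (4.1.7) holds by (4.6.4). Moreover $\delta \to 0$ implies $M_d \to \infty$ and hence $\varepsilon \to 0$, showing the validity of (4.1.8). This proves Theorem 4.1.7.

4.7 Proof of the Main Theorem

Proof (Theorem 4.1.7 implies Theorem 4.1.3).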
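By assumption (4.1.2) in Theorem 4.1.3 and by (4.2.14),
\[
\mathbb{E}_{x\in V_J} \prod_{e\in H_d} 1_{E_e}(x)\, \mu_J(x) \le \delta.
\]
For a given $e' \in H_d$ define the function $g_{e'} : V_{e'} \to [0,1]$ as follows. Let $\phi_{e'} : V_{e'} \to M$ be the inverse of the projection map $\pi_{e'} : V_J \to V_{e'}$ restricted to $M$, and for $y \in V_{e'}$ let
\[
g_{e'}(y) := \prod_{e\in H_d} 1_{E_e}(\phi_{e'}(y)).
\]
Applying Theorem 4.1.7 to the system of weights $\{\nu_e\}$ and functions $\{g_e\}$ gives a system of measures $\tilde\mu_e$ and sets $E'_e \in A_e$ satisfying (4.1.4)-(4.1.8). By (4.2.11) we have that $x \in M \cap \bigcap_{e\in H_d} E_e$ if and only if $\Phi(x) = (y, 0)$ with $y \in A$. Moreover, in that case $w(y) = \mu_e(x)$ for all $e \in H_d$ by (4.2.13), thus for any given $e' \in H_d$,
\[
\begin{aligned}
\mathbb{E}_{y\in \mathbb{Z}_N^d} 1_A(y) w(y) &= \mathbb{E}_{x\in M} \prod_{e\in H_d} 1_{E_e}(x)\, \mu_{e'}(x) = \mathbb{E}_{z\in V_{e'}} g_{e'}(z)\, \mu_{e'}(z) \\
&= \mathbb{E}_{z\in V_{e'}} g_{e'}(z)\, \tilde\mu_{e'}(z) + o_{N,W\to\infty}(1) \\
&= \mathbb{E}_{x\in M} \prod_{e\in H_d} 1_{E_e}(x)\, \tilde\mu_{e'}(x) + o_{N,W\to\infty}(1).
\end{aligned}
\]
By (4.1.6), $\prod_{e\in H_d} 1_{E_e} \le \sum_{e\in H_d} 1_{E_e \setminus E'_e}$. Then the symmetry of the measures $\tilde\mu_e$ (i.e. the fact that $\tilde\mu_e(x) = \tilde\mu_{e'}(x)$ for $x \in M$), (4.1.7) and the fact that $1_{E_e \setminus E'_e}$ is constant on the fibers $\pi_e^{-1}(x)$ imply
\[
\mathbb{E}_{x\in M} \prod_{e\in H_d} 1_{E_e}(x)\, \tilde\mu_{e'}(x) \le \sum_{e\in H_d} \mathbb{E}_{x\in M} 1_{E_e \setminus E'_e}(x)\, \tilde\mu_{e'}(x).
\]
Changing the sum over $M$ to a sum over $V_e$, we obtain
\[
\sum_{e\in H_d} \mathbb{E}_{x\in M} 1_{E_e \setminus E'_e}(x)\, \tilde\mu_{e'}(x) = \sum_{e\in H_d} \mathbb{E}_{x\in V_e} 1_{E_e \setminus E'_e}(x)\, \tilde\mu_e(x) \le (d+1)\, \varepsilon(\delta) + o_{N,W\to\infty}(1).
\]
Choosing $N, W$ sufficiently large with respect to $\delta$ gives
\[
\mathbb{E}_{y\in \mathbb{Z}_N^d} 1_A(y) w(y) \le \varepsilon'(\delta),
\]
with, say, $\varepsilon'(\delta) := (d+2)\, \varepsilon(\delta)$.

First, let us identify $[1,N]^d$ with $\mathbb{Z}_N^d$ and recall that constellations in $\mathbb{Z}_N^d$ defined by the simplex $\Delta$ which are contained in a box $B \subseteq [1,N]^d$ of size $\varepsilon N$ are in fact genuine constellations contained in $B$. Note that we can assume that the simplex $\Delta$ is primitive, in the sense that $t\Delta \not\subseteq \mathbb{Z}^d$ for any $0 < t < 1$, as any simplex is a dilate of a primitive one. For any simplex $\Delta \subseteq \mathbb{Z}^d$ there exists a constant $\tau(\Delta) > 0$ depending only on $\Delta$ such that the following holds.

Lemma 4.7.1 ($\mathbb{Z}_N$ to $\mathbb{Z}$). Let $\Delta \subseteq \mathbb{Z}^d$ be a primitive simplex. Then there is a constant $0 < \varepsilon < \tau(\Delta)$ so that the following holds. Let $N$ be sufficiently large, and let $B = I^d$ be a box of size $\varepsilon N$ contained in $[1,N]^d \simeq \mathbb{Z}_N^d$.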
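If there exist $x \in \mathbb{Z}_N^d$ and $1 \le t < N$ such that $x \in B$ and $x + t\Delta \subseteq B$ as a subset of $\mathbb{Z}_N^d$, then either $x + t\Delta \subseteq B$ or $x + (t-N)\Delta \subseteq B$, also as a subset of $\mathbb{Z}^d$.

Proof. Consider $\Delta = \{e_1, \dots, e_d\}$ as an element of $\mathbb{Z}^{d^2}$, where $e_i \in \mathbb{Z}^d$. Define
\[
\tau(\Delta) = \inf_{m \notin \{0, e\},\ x \in [0, e]} |m - x|_\infty .
\]
Let $0 < \varepsilon < \tau(\Delta)$. We may assume that the simplex is primitive. By our assumption, there are $x \in [1,N]^d$ and $t \in [1, N-1]$ such that $x + t e_j \in B + N\mathbb{Z}^d$ for all $1 \le j \le d$. Hence for each $j$ there is $m_j \in \mathbb{Z}^d$ such that $|t e_j - N m_j|_\infty \le \varepsilon N$, i.e. $|(t/N)\Delta - m|_\infty \le \varepsilon$, where $m = (m_1, \dots, m_d) \in \mathbb{Z}^{d^2}$. Since $0 < t/N < 1$ and $\varepsilon < \tau(\Delta)$ we have $m = 0$ or $m = \Delta$. If $m = 0$ then $|te|_\infty \le \varepsilon N$. Since $x \in B$ we have $x + te \subseteq B \subseteq \mathbb{Z}^d$. Similarly, if $m = e$ then $x + (t-N)e \subseteq B \subseteq \mathbb{Z}^d$.

Lemma 4.7.2 (Pigeonhole Principle for W-trick). (Footnote 14: If we allow $W$ to grow with $N$ then the choice of $b$ will depend on $N$; $b = b(N)$.) Let $A_1 := \{n \in [1, N/W]^d;\ Wn + b \in A\}$ and $A' = A_1 \cap [\varepsilon_1 N', \varepsilon_2 N']^d$. By the Prime Number Theorem there is a prime $N'$ so that $\varepsilon_2 N' = N_1(1 + o_{N_1\to\infty}(1))$. We will work on $\mathbb{Z}/N'$. We can choose $b \in \mathbb{Z}^d$ and $\varepsilon_1, \varepsilon_2$ in the definition of $\nu$ so that
\[
|A_1 \cap [\varepsilon_1 N', \varepsilon_2 N']^d| \ge \frac{\alpha\, \varepsilon_2^d}{2}\, \frac{(N')^d\, W^d}{(\log N')^d\, \phi(W)^d}.
\]

Proof. Let $N, W$ be sufficiently large positive integers and assume that $|A| \ge \alpha\, |P_N|^d$ for a set $A \subseteq P_N^d$. By the pigeonhole principle choose $b = (b_j)_{1\le j\le d}$ so that $b_j$ is relatively prime to $W$ for each $j$, and
\[
|A \cap ((W\mathbb{Z})^d + b)| \ge \alpha\, \frac{N^d}{(\log N)^d\, \phi(W)^d}, \tag{4.7.1}
\]
where $\phi$ is the Euler totient function. Set $N_1 := N/W$ and $A_1 := \{n \in [1, N_1]^d;\ Wn + b \in A\}$. Choose $\varepsilon_2 > 0$ so that $2\varepsilon_2 < \tau(\Delta)$. Hence
\[
\alpha\, \frac{W^d N_1^d}{\phi(W)^d \log^d(W N_1)} = \alpha\, \frac{\varepsilon_2^d\, W^d (N')^d}{\phi(W)^d \log^d(\varepsilon_2 W N')} + o_{N,W\to\infty}(1) = \alpha\, \varepsilon_2^d\, \frac{(N')^d\, W^d}{(\log N')^d\, \phi(W)^d} + o_{N,W\to\infty}(1),
\]
where we used that we choose $W\varepsilon_2 = O(1)$. We have from (4.7.1) that
\[
|A_1 \cap [1, \varepsilon_2 N']^d| \ge \frac{\alpha\, \varepsilon_2^d}{2}\, \frac{(N')^d\, W^d}{(\log N')^d\, \phi(W)^d}. \tag{4.7.2}
\]
By Dirichlet's theorem on primes in arithmetic progressions, the number of $n \in [1, N']^d \setminus [\varepsilon_1 N', N']^d$ for which $Wn + b \in P^d$ is
\[
O\Big( \frac{\varepsilon_1 W N'}{\phi(W) \log(\varepsilon_1 W N')} \times \frac{(WN')^{d-1}}{\phi(W)^{d-1} (\log WN')^{d-1}} \Big) = O\Big( \varepsilon_1\, \frac{(N')^d\, W^d}{(\log N')^d\, \phi(W)^d} \Big),
\]
thus (4.7.2) holds for the set $A' := A_1 \cap [\varepsilon_1 N', \varepsilon_2 N']^d$ as well, if $\varepsilon_1 \le c_d\, \varepsilon_2^d\, \alpha$ for a small enough constant $c_d > 0$.

Proof (Theorem 4.1.3 implies Theorem 4.1.2). If $x \in A'$ then $\varepsilon_1 N' \le x_i \le \varepsilon_2 N'$ and $W x_i + b_i \in P$ for $1 \le i \le d$, thus by the definition of the Green-Tao measure $\nu_b : [1, N'] \to \mathbb{R}_+$, we have
\[
w(x) = \prod_{i=1}^d \nu_{b_i}(x_i) = c_d \Big( \frac{\phi(W) \log N}{W} \Big)^d, \tag{4.7.3}
\]
as $\log N' - \log N \approx \log\big(\frac{1}{\varepsilon_2 W}\big) = O(1)$, assuming $N$ is sufficiently large with respect to $W$. Thus
\[
\mathbb{E}_{x\in \mathbb{Z}_{N'}^d} 1_{A'}(x) w(x) = c_d\, \frac{|A'|}{(N')^d} \Big( \frac{\phi(W) \log N}{W} \Big)^d \ge c'_d\, \varepsilon_2^d\, \alpha \tag{4.7.4}
\]
for some constants $c_d, c'_d > 0$. Applying the contrapositive of Theorem 4.1.3 to the set $A'$ with $\varepsilon := c_d\, \varepsilon_2^d\, \alpha$ gives
\[
\mathbb{E}_{x\in \mathbb{Z}_{N'}^d,\ t\in \mathbb{Z}_{N'}} \Big( \prod_{j=0}^d 1_{A'}(x + t v_j) \Big) w(x + t\Delta) \ge \delta \tag{4.7.5}
\]
with a constant $\delta = \delta(\alpha, \Delta) > 0$ depending only on $\alpha$ and the simplex $\Delta = \{v_0, \dots, v_d\}$, and $\alpha \to 0$ if $\delta \to 0$. Hence in our case $\delta$ is bounded below by a positive constant. Now we transfer (4.7.5) to a statement about the number of prime simplices in $A'$. As in (4.7.3),
\[
w(x + t\Delta) \le C_d \Big( \frac{\phi(W) \log N}{W} \Big)^{l(\Delta)}, \tag{4.7.6}
\]
since all coordinates of $x + t\Delta$ are primes bigger than $R$. Thus the number of copies $\Delta' = x + t\Delta$ which are contained in $A'$ as a subset of $\mathbb{Z}_{N'}^d$ is at least $c\, N^{d+1} (\log N)^{-l(\Delta)}$, for some constant $c = c(\alpha, \Delta, W) > 0$ depending only on the initial data $\alpha$, $\Delta$ and the number $W$. Since $A' \subseteq [\varepsilon_1 N', \varepsilon_2 N']^d$, by Lemma 4.7.1 at least half of the simplices $\Delta'$ are contained in $A'$ as a subset of $\mathbb{Z}^d$, and then the simplices $\Delta'' := W\Delta' + b$ are contained in $A$.

Now choose $W = W(\alpha, \Delta)$ large enough so that Theorem 4.1.3 holds for all sufficiently large $N$; then $A$ contains at least $c'(\alpha, \Delta)\, N^{d+1} (\log N)^{-l(\Delta)}$ similar copies of $\Delta$ for some constant $c'(\alpha, \Delta) > 0$ depending only on $\alpha$ and the simplex $\Delta$.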
This proves Theorem 4.1.2.

4.8 Concluding Remarks

In this chapter, we obtain a more general version of the weighted hypergraph removal lemma. Our analysis is a kind of averaging argument. A more detailed analysis of these measure systems may be an interesting problem, given many recent developments (e.g. \cite{TZ3}) in the theory of uniformity norms, say, in $\mathbb{Z}^d$.

As we have seen, Szemerédi-type problems in higher dimensions are quite interesting. Our method does give an explicit bound on the number of prime configurations, but the bound is terrible (of tower type) due to the application of the regularity lemma. (A tower-type bound is necessary for the regularity lemma, as demonstrated in [40], but it may not be necessary in the removal lemma or in the multidimensional Szemerédi theorem; see e.g. [24].) Also, with the weight $\nu$ we might obtain a narrow progressions result similar to [113], but we do not know how to model such problems on a graph. Another interesting question would be to prove a polynomial progression result in this setting, or to find asymptotics for linear equations in primes [52] in higher dimensions. Interesting phenomena could arise from the correlations of points. There are other interesting modern approaches to the multidimensional Szemerédi theorem in the primes by Tao-Ziegler [111] and Fox-Zhao [68]. Both rely on a more advanced tool, the inverse Gowers norm theorem, which currently cannot give any quantitative bound.

4.8.1 Inverse Gowers Norm Theorem and Infinite Linear Forms Condition

The very first application of the full inverse norm conjecture was to find the asymptotic number of prime solutions to a system of linear equations of finite complexity [52]. Basically, finite complexity means that no two linear parts of the system are multiples of each other. The complexity basically measures how many times one has to apply the Cauchy-Schwarz inequality to obtain the generalized von Neumann theorem.
The notion of complexity is further discussed in [41].

Another number theoretical application is that we can now define a weight with more general linear forms conditions. Define a new weight
\[
\nu'_{b,W}(n) := \frac{\phi(W)}{W} (\log n)\, 1_{P'}(n). \tag{4.8.1}
\]
Then, by a result of Green-Tao, we have the following infinite linear forms condition. Due to the technical restrictions of the sieve method, we cannot get the infinite linear forms condition for the weight $\nu$ in [51] or [52].

Theorem 4.8.1 ([102], Thm. 5.1). Let $(\psi_1, \dots, \psi_t) : \mathbb{Z}^d \to \mathbb{Z}^t$ be a system of linear forms (hence $\psi_i(0) = 0$). Let $K \subseteq [\lfloor -N/W \rfloor, \lfloor N/W \rfloor]^d$ be a convex body and let $b_1, \dots, b_t$ be coprime to $W$. Then (recall that $\nu \ge 0$)
\[
\sum_{n \in K \cap \mathbb{Z}^d} \prod_{j \in [t]} \nu'_{b_j,W}(\psi_j(n)) = \#\{n \in K \cap \mathbb{Z}^d : \psi_j(n) > 0\ \forall j\} + o((N/W)^d). \tag{4.8.2}
\]

This condition is used in [112] and [68] to give different proofs of the main result in this chapter. In [110], the authors prove an analogue, in the weighted setting, of the correspondence principle in ergodic theory, which allows them to deduce this theorem from the analogous theorem in the integer case. They construct a $\mathbb{Z}^d$-system $(X, B, \mu, (T_h)_{h\in \mathbb{Z}^d})$ and have to consider measures of the form $\mu(T_{h_1}(A) \cap \cdots \cap T_{h_k}(A))$ to show the result. Here $k$ can be arbitrarily large.

The proof in [68] uses a sampling argument, and their method also gives a polynomial progression version of this theorem if we assume the Bateman-Horn conjecture [4] on the asymptotic number of prime points in a given set of polynomials.

Finally, we state a nontrivial case of the Inverse Gowers Norms conjecture [59].

Theorem 4.8.2 ($U^3$-inverse theorem). Let $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ and let $H$ be the Heisenberg group. Let $N > 2$ be a prime, let $0 < \eta < \frac{1}{2}$ and let $f : \mathbb{Z}_N \to \mathbb{C}$ be bounded by 1. Suppose $\|f\|_{U^3} \ge \eta$. Then for some positive integer $m \le \eta^{-C}$ there are an $N$-th root $g \in H^m$ (i.e. $g^N \in \Gamma$) and a continuous 1-bounded function $F$ on $H^m/\Gamma^m$ with Lipschitz constant at most $\exp(\eta^{-C})$ such that
\[
\mathbb{E}_{n\in \mathbb{Z}_N} f(n) F(g^n \Gamma^m) \ge \exp(-\eta^{-C}). \tag{4.8.3}
\]
The constant in (4.8.3) relies on Freiman's theorem on sumsets, where the current best bound is due to Sanders [89].
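To make the hypothesis $\|f\|_{U^3} \ge \eta$ concrete, here is a minimal brute-force sketch in Python. It assumes the standard definition $\|f\|_{U^3}^8 = \mathbb{E}_{x,h_1,h_2,h_3 \in \mathbb{Z}_N} \prod_{\omega \in \{0,1\}^3} C^{|\omega|} f(x + \omega \cdot h)$, where $C$ denotes complex conjugation; this definition is not restated in the present section, and the names `gowers_u3_norm` and `phase` are hypothetical helpers chosen for illustration only. The cost is of order $N^4$, so it is only usable for very small $N$.

```python
import cmath
from itertools import product


def gowers_u3_norm(f, N):
    """Brute-force U^3 norm of f, given as a list of N complex values on Z_N."""
    total = 0j
    for x, h1, h2, h3 in product(range(N), repeat=4):
        term = 1 + 0j
        for w1, w2, w3 in product((0, 1), repeat=3):
            value = f[(x + w1 * h1 + w2 * h2 + w3 * h3) % N]
            # Conjugate the factor when an odd number of shifts is used.
            if (w1 + w2 + w3) % 2 == 1:
                value = value.conjugate()
            term *= value
        total += term
    average = total / N**4
    # The average is a non-negative real number up to rounding error.
    return max(average.real, 0.0) ** (1 / 8)


def phase(N, degree):
    """The polynomial phase n -> exp(2*pi*i*n^degree / N) on Z_N."""
    return [cmath.exp(2j * cmath.pi * pow(n, degree, N) / N) for n in range(N)]


N = 7
u3_quadratic = gowers_u3_norm(phase(N, 2), N)
u3_cubic = gowers_u3_norm(phase(N, 3), N)
print(u3_quadratic, u3_cubic)
```

In this sketch the quadratic phase has $U^3$ norm exactly 1, since third-order differences annihilate quadratics; it is a simple example of the structured functions with which the inverse theorem predicts correlation. The cubic phase has strictly smaller $U^3$ norm: for prime $N > 3$ a direct computation gives $\|f\|_{U^3}^8 = (2N-1)/N^2$.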
It is also mentioned (e.g. in [48]) that the proof of the general $U^k$ version of this theorem does not explain the role of nilsequences very conceptually, and that the bound is terrible (due to the use of ultrafilter arguments). It is also mentioned in [48] that it may be possible to use a smaller class of nilsequences, such as eigenfunctions of the Laplacian on free nilpotent Lie groups. A link to approximate subgroups of $\mathbb{Z}$ may be an interesting new approach to the inverse $U^3$ theorem.

Bibliography

[1] N. ALON, J.H. SPENCER, The Probabilistic Method, Wiley Series in Discrete Mathematics and Optimization, Fourth edition, 2016.
[2] A. BALOG, Linear equations in primes, Mathematika 39.2 (1992), 367-378.
[3] M. BATEMAN, N. KATZ, New bounds on cap sets, J. Amer. Math. Soc. 25 (2012), 585-613.
[4] P.T. BATEMAN, R.A. HORN, A heuristic asymptotic formula concerning the distribution of prime numbers, Math. Comp. 16 (1962), 363-367.
[5] F. BEHREND, On sequences of integers containing no arithmetic progression, Proc. Nat. Acad. Sci. U.S.A. 32 (1946), 331-332.
[6] V. BERGELSON, A. LEIBMAN, Polynomial extension of Van der Waerden's and Szemerédi's theorems, J. Amer. Math. Soc. 9 (1996), no. 3, 725-753.
[7] V. BERGELSON, B. HOST, B. KRA, Multiple recurrence and nilsequences, Inventiones Math. 160, 2 (2005), 261-303.
[8] V. BERGELSON, Combinatorial and Diophantine Applications of Ergodic Theory, with appendix by A. Leibman, appendix B by A. Quas and M. Wierdl, Handbook of Dynamical Systems Volume 1, Part B, 2006, 745-869.
[9] T. BLOOM, A quantitative improvement for Roth's theorem in arithmetic progression, preprint.
[10] T. BLOOM, G. JONES, A sum product theorem in function fields, Int. Math. Res. Not. IMRN, rnt125, June 2013.
[11] B. BIRCH, Forms in many variables, Proc. Roy. Soc. London Ser. A 265 (1962), 245-263.
[12] J. BOURGAIN, On triples in arithmetic progression, Geom. Funct. Anal. 9(5) (1999), 968-984.
[13] J. BOURGAIN, A. GAMBURD, P. SARNAK, Affine linear sieve, expander, and sum-product, Invent. Math. 179 (2010), 559-644.
[14] J. BRÜDERN, J. DIETMANN, J. LIU, T. WOOLEY, A Birch-Goldbach theorem, Archiv der Mathematik 94(1) (2010), 53-58.
[15] A.C. COJOCARU, M.R. MURTY, An introduction to sieve methods and their applications, London Mathematical Society Student Texts 66, Cambridge University Press (2005), pp. 113-134.
[16] D. CONLON, J. FOX, Y. ZHAO, A relative Szemerédi theorem, Geom. Funct. Anal. 25(3) (2015), 733-762.
[17] D. CONLON, J. FOX, Y. ZHAO, The Green-Tao Theorem: An Exposition, EMS Surveys in Mathematical Sciences 1(2) (2014), 249-282.
[18] B. COOK, A. MAGYAR AND T. TITICHETRAKUN, A Multidimensional Szemerédi's theorem in the primes. Preprint.
[19] B. COOK, A. MAGYAR, Diophantine equations in the primes, Invent. Math. 198/3 (2014), 701-737.
[20] B. COOK, A. MAGYAR, On restricted arithmetic progressions over finite fields, Online J. Anal. Comb. 7 (2012), 1-10.
[21] H. DAVENPORT, Cubic forms in thirty two variables, Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, Vol. 251, No. 993 (Mar. 12, 1959), 193-232.
[22] H. DAVENPORT, Analytic Methods for Diophantine equations and Diophantine inequalities, Cambridge University Press, 2005.
[23] W. DUKE, Z. RUDNICK, P. SARNAK, Density of integer points on affine homogeneous varieties, Duke Math. J. 71.1 (1993), 143-179.
[24] J. FOX, A new proof of graph removal lemma, Annals of Math. 174.1 (2011), 561-579.
[25] J. FRIEDLANDER, H. IWANIEC, Opera de Cribro, Colloquium Publications, American Mathematical Society (2010).
[26] H. FURSTENBERG, Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J. Analyse Math. 31 (1977), 204-256.
[27] H. FURSTENBERG, Y. KATZNELSON, An ergodic Szemerédi theorem for commuting transformations, J. d'Analyse Math. 34 (1978), 275-291.
[28] H. FURSTENBERG, Y. KATZNELSON, An ergodic Szemerédi theorem for IP-systems and combinatorial theory, J. d'Analyse Math. 45 (1985), 117-168.
[29] H. FURSTENBERG, Y. KATZNELSON, A density version of the Hales-Jewett theorem, J. d'Analyse Math. 57 (1991), 64-119.
[30] H. FURSTENBERG, Y. KATZNELSON, B. WEISS, Ergodic theory and configurations in set of positive upper density, Mathematics of Ramsey Theory, Algorithms Combin., no. 5 (1990), 184-198.
[31] H. FURSTENBERG, Y. KATZNELSON, D. ORNSTEIN, The ergodic theoretical proof of Szemerédi's theorem, Bull. Amer. Math. Soc. (N.S.) 7 (1982), no. 3, 527-552.
[32] H. FURSTENBERG, Y. KATZNELSON, B. WEISS, Ergodic theory and configurations in set of positive upper density, Mathematics of Ramsey Theory, Algorithms Combin., no. 5 (1990), 184-198.
[33] H. FURSTENBERG, Y. KATZNELSON, An ergodic Szemerédi theorem for commuting transformations, J. Analyse Math. 34 (1978), 275-291.
[34] W. HUREWICZ, H. WALLMAN, Dimension Theory, Princeton, N.J., Princeton University Press (1951).
[35] W.T. GOWERS, Quasirandomness, Counting and Regularity for 3-Uniform Hypergraphs, Combin. Probab. Comput. 15 (2006), no. 1-2, 143-184.
[36] W.T. GOWERS, Hypergraph regularity and the multidimensional Szemerédi theorem, Annals of Math. 166/3 (2007), 897-946.
[37] W.T. GOWERS, Decompositions, approximate structure, transference, and the Hahn-Banach theorem, Bull. London Math. Soc. 42 (4) (2010), 573-606.
[38] W.T. GOWERS, A New Proof of Szemerédi's Theorem for Arithmetic Progression of length four, Geom. Funct. Anal. 8(3) (1998), 529-551.
[39] W.T. GOWERS, A New Proof of Szemerédi's Theorem, Geom. Funct. Anal. 11(3) (2001), 465-558.
[40] W.T. GOWERS, Lower bounds of tower-type for Szemerédi's uniformity lemma, Geom. Funct. Anal. 7(2) (1997).
[41] W.T. GOWERS, J. WOLF, The true complexity of a system of linear equations, Proc. London Math. Soc. 100 (3) (2010), no. 1, 155-176.
[42] D. GOLDSTON, J. PINTZ, C. YILDIRIM, Primes in tuples I, Ann. of Math. 170 (2009), 819-862.
[43] D. GOLDSTON, C. YILDIRIM, Higher correlations of divisor sums related to primes I: triple correlations, Integers: Electronic Journal of Combinatorial Number Theory 3 (2003), 1-66.
[44] R. GRAHAM, Recent trend in Ergodic Ramsey Theory, Discrete Math., no. 136 (1994), 119-227.
[45] B. GREEN, Roth's Theorem in the Primes, Annals of Math. 161.2 (2005), 1609-1636.
[46] B. GREEN, Finite Field Model in Additive Combinatorics. In Bridget S. Webb, editor, Surveys in combinatorics 2005, pages 1-27. Cambridge Univ. Press, Cambridge, 2005.
[47] B. GREEN, Montréal notes on quadratic Fourier analysis. In Additive combinatorics, pages 69-102. Amer. Math. Soc., Providence, RI, 2007.
[48] B. GREEN, Approximate Algebraic Structure, Proceedings of ICM 2014.
[49] B. GREEN, A Szemerédi-type regularity lemma in abelian groups, with applications, Geom. Funct. Anal. 15 (2005), no. 2, 340-376.
[50] B. GREEN AND T. TAO, An arithmetic regularity lemma, an associated counting lemma, and applications, An Irregular Mind: Szemerédi is 70, Bolyai Society Mathematical Studies, 335-342.
[51] B. GREEN AND T. TAO, The Primes contain arbitrarily long arithmetic progressions, Annals of Math. (2) 167.2 (2008), 481-547.
[52] B. GREEN AND T. TAO, Linear equations in primes, Annals of Math. (2) 171.3 (2010), 1753-1850.
[53] B. GREEN AND T. TAO, The quantitative behaviour of polynomial orbits on nilmanifolds, Annals of Math. (2) 175 (2012), 465-540.
[54] B. GREEN AND T. TAO, The Möbius Function is Strongly Orthogonal to Nilsequences, Annals of Math. (2) 175 (2012), 541-566.
[55] B. GREEN AND T. TAO, Restriction theory of Selberg's sieve with applications, Journal de Théorie des Nombres 18 (2006), 147-182.
[56] B. GREEN AND T. TAO, New bounds for Szemerédi's theorem, Ia: Progressions of length 4 in finite field geometries revisited. Preprint.
[57] B. GREEN AND T. TAO, New bounds for Szemerédi's Theorem, II: A new bound for $r_4(N)$. Analytic number theory: essays in honour of Klaus Roth, W. W. L. Chen, W. T. Gowers, H. Halberstam, W. M. Schmidt, R. C. Vaughan, eds, Cambridge University Press, 2009, 180-204.
[58] B. GREEN AND T. TAO, New bounds for Szemerédi's Theorem, III: A polylog bound for $r_4(N)$. In preparation.
[59] B. GREEN AND T. TAO, An inverse theorem for the Gowers $U^3$-norm, with applications, Proc. Edinburgh Math. Soc. 51, no. 1, 71-153.
[60] B. GREEN, T. TAO AND T. ZIEGLER, An inverse theorem for Gowers $U^{s+1}[N]$-norm, Annals of Math. (2) 176 (2) (2012), 1231-1372.
[61] S. GHORPADE AND G. LACHAUD, Étale cohomology, Lefschetz theorems and number of points on singular varieties over finite fields, Moscow Math. J. (2) (2002), 589-631.
[62] R. HARTSHORNE, Algebraic geometry, Graduate Texts in Mathematics Vol. 52, Springer (1977).
[63] H. HELFGOTT, A. DE ROTON, Improving Roth's Theorem in the Primes, Int. Math. Res. Not. IMRN (4) (2011), 767-783.
[64] B. HOST, N. FRANTZIKINAKIS, Higher order Fourier analysis of multiplicative functions and applications, preprint.
[65] B. HOST AND B. KRA, Non conventional ergodic averages and nilmanifolds, Annals of Math. 161 (2005), 398-488.
[66] B. HOST AND B. KRA, A point of view in Gowers uniformity norms, New York J. Math. 18 (2012), 213-248.
[67] L.K. HUA, Additive theory of prime numbers, Translations of Mathematical Monographs Vol. 13, Am. Math. Soc., Providence (1965).
[68] J. FOX, Y. ZHAO, A short proof of multidimensional Szemerédi's theorem in the primes, American Journal of Mathematics 137 (2015), 1139-1145.
[69] J. KOMLÓS, M. SIMONOVITS, Szemerédi's regularity lemma and its applications in graph theory, Combinatorics, Paul Erdős is eighty, Vol. 2 (Keszthely, 1993), 295-352, Bolyai Soc. Math. Stud., 2, János Bolyai Math. Soc., Budapest, 1996.
[70] Y. KOHAYAKAWA, T. ŁUCZAK, V. RÖDL, Arithmetic progressions of length three in subsets of a random set, Acta Arith. 75 (1996), no. 2, 133-163.
[71] B. KRA, Ergodic methods in additive combinatorics. In Additive combinatorics, pages 69-102. Amer. Math. Soc., Providence, RI, 2007.
[72] J. LENZ, D. MUBAYI, The poset of hypergraph quasirandomness, Random Structures & Algorithms, Volume 46, Issue 4, July 2015, 762-800.
[73] J. LIU, Integral points on quadrics with prime coordinates, Monats. Math. 164.4 (2011), 439-465.
[74] A. MAGYAR AND T. TITICHETRAKUN, Corners in dense subset of $P^d$. Preprint.
[75] A. MAGYAR AND T. TITICHETRAKUN, Almost prime solutions to diophantine equation of high rank. Preprint.
[76] L. MATTHIESEN, Generalized Fourier coefficients of multiplicative functions. Preprint.
[77] H.L. MONTGOMERY AND R.C. VAUGHAN, Multiplicative number theory I. Classical theory, Cambridge studies in advanced mathematics, 97, 2006.
[78] E. NASLUND, On improving Roth's Theorem in the primes, Mathematika 61 (2015), 49-62.
[79] K. O'BRYANT, Sets of integers that do not contain long arithmetic progressions, The Electronic Journal of Combinatorics, Volume 18, Issue 1 (2011).
[80] S. PRENDIVILLE, Four variants of the Fourier-analytic transference principle. Preprint available at http://arxiv.org/abs/1509.09200.
[81] J. PINTZ, Are there arbitrarily long arithmetic progressions in the sequence of twin primes? An irregular mind, Volume 21 of the series Bolyai Society Mathematical Studies, 525-559.
[82] B. NAGLE, V. RÖDL, M. SCHACHT, The counting lemma for regular k-uniform hypergraphs, Random Structures and Algorithms 28 (2) (2006), 113-179.
[83] V. RÖDL, J. SKOKAN, Regularity lemma for uniform hypergraphs, Random Structures and Algorithms, 2004, vol. 25, no. 1, 1-42.
[84] V. RÖDL, M. SCHACHT, Applications of the regularity lemma for uniform hypergraphs, Random Structures and Algorithms, 2006, vol. 28, no. 2, 180-194.
[85] V. RÖDL, M. SCHACHT, Regular partitions of hypergraphs, Regularity Lemma, Combin. Prob. Comput. 16 (2007), 833-885.
[86] V. RÖDL, M. SCHACHT, Regular partitions of hypergraphs, Counting Lemma, Combin. Prob. Comput. 16 (2007), 887-901.
[87] O. REINGOLD, L. TREVISAN, M. TULSIANI, S. VADHAN, Dense subsets of pseudorandom sets, Electronic Colloquium on Computational Complexity, Report TR08-045 (2008).
[88] O. REINGOLD, L. TREVISAN, M. TULSIANI, S. VADHAN, New proofs of the Green-Tao-Ziegler dense model theorem: an exposition, arXiv:0806.0381.
[89] T. SANDERS, On the Bogolyubov-Ruzsa lemma, Anal. PDE 5 (2) (2012), 627-655.
[90] T. SANDERS, The structure theory of set additions revisited, Bull. Amer. Math. Soc. 50 (2013), 93-127.
[91] K.F. ROTH, On certain sets of integers, J. London Math. Society 28 (1953), 104-109.
[92] I. RUZSA, E. SZEMERÉDI, Triple systems with no six points carrying three triangles, Colloq. Math. Soc. J. Bolyai 18 (1978), 939-945.
[93] T. SCHOEN, I.D. SHKREDOV, Roth's theorem in many variables, Israel Journal of Mathematics, January 2014, Volume 199, Issue 1, pp. 287-308.
[94] W. SCHMIDT, The density of integral points on homogeneous varieties, Acta Math. 154.3 (1985), 243-296.
[95] G. SHIMURA, Reduction of algebraic varieties with respect to a discrete valuation of the basic field, Amer. J. Math. (1955), 134-176.
[96] I.D. SHKREDOV, On one problem of Gowers, Izv. RAN. Ser. Mat. 70.2 (2006), 179-221.
[97] J. SOLYMOSI, Note on a generalization of Roth theorem, Combinatorial Number Theory and Additive Group Theory, Part of the series Advanced Courses in Mathematics - CRM Barcelona, pp. 299-314.
[98] J. SOLYMOSI, Incidences and the spectra of graphs, Discrete and Computational Geometry, Algorithms Combin. 25 (2003), 825-827.
[99] E. SZEMERÉDI, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 299-345.
[100] T. TAO, The ergodic and combinatorial approaches to Szemerédi's theorem, Centre de Recherches Mathématiques CRM Proceedings and Lecture Notes 43 (2007), 145-193.
[101] T. TAO, A quantitative ergodic theory proof of Szemerédi's theorem, Electron. J. Combin. 13 (2006), 1, No. 99, 1-49.
[102] T. TAO, The Gaussian primes contain arbitrarily shaped constellations, J. Analyse Math. 99/1 (2006), 109-176.
[103] T. TAO, Szemerédi's regularity lemma revisited, Contrib. Discrete Math. 1 (2006), pp. 8-28.
[104] T. TAO, A variant of the hypergraph removal lemma, Journal of Combinatorial Theory, Series A 113.7 (2006), 1257-1280.
[105] T. TAO, Higher order Fourier analysis, American Mathematical Society, Graduate Studies in Mathematics Volume 142 (2012).
[106] T. TAO, The prime tuples conjecture, sieve theory, and the work of Goldston-Pintz-Yildirim, Motohashi-Pintz, and Zhang (2013). Retrieved on July 16, 2015. Available at https://terrytao.wordpress.com/2013/06/03/the-prime-tuples-conjecture-sieve-theory-and-the-work-of-goldston-pintz-yildirim-motohashi-pintz-and-zhang/
[107] T. TAO, 254A, Notes 4: Some sieve theory (2015). Retrieved on July 16, 2015. Available at https://terrytao.wordpress.com/2015/01/21/254a-notes-4-some-sieve-theory/
[108] T. TAO, A symmetric formulation of the Croot-Lev-Pach-Ellenberg-Gijswijt capset bound (2016). Retrieved on May 24, 2016. Available at https://terrytao.wordpress.com/2016/05/18/a-symmetric-formulation-of-the-croot-lev-pach-ellenberg-gijswijt-capset-bound/
[109] T. TAO, V. VU, Additive Combinatorics, Cambridge University Press, 2006.
[110] T. TAO AND T. ZIEGLER, The primes contain arbitrarily long polynomial progressions, Acta Math. 201 (2008), 213-305.
[111] T. TAO AND T. ZIEGLER, A multi-dimensional Szemerédi theorem for the primes via a correspondence principle, Israel Journal of Mathematics 207 (1) (2015), 203-228.
[112] T. TAO AND T. ZIEGLER, Narrow gaps in the primes, "Analytic Number Theory" in honour of Helmut Maier's 60th birthday, to appear.
[113] T. TAO AND T. ZIEGLER, Concatenation theorems for the Gowers uniformity norm and application. Preprint available at http://arxiv.org/abs/1603.07815
[114] P. VARNAVIDES, On Certain Sets of Positive Density, J. London Math. Soc. 39 (1959), 358-360.
[115] T. ZIEGLER, Universal characteristic factors and Furstenberg average, J. Amer. Math. Soc. 20 (2007), 53-97.
[116] T. ZIEGLER, Linear equations in primes and dynamics of nilmanifolds, Proceedings of ICM 2014.

