UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Applications of machine learning for solving complex quantum problems Vargas-Hernández, Rodrigo Alejandro 2018

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata


24-ubc_2019_february_vargas-hernandez_rodrigo.pdf [ 3.71MB ]
JSON: 24-1.0375836.json
JSON-LD: 24-1.0375836-ld.json
RDF/XML (Pretty): 24-1.0375836-rdf.xml
RDF/JSON: 24-1.0375836-rdf.json
Turtle: 24-1.0375836-turtle.txt
N-Triples: 24-1.0375836-rdf-ntriples.txt
Original Record: 24-1.0375836-source.json
Full Text

Full Text

Applications of machine learning for solving complexquantum problemsbyRodrigo Alejandro Vargas–Herna´ndezB.Sc. Chemistry, Universidad Nacional Auto´noma de Me´xico, 2012A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Chemistry)The University of British Columbia(Vancouver)December 2018c© Rodrigo Alejandro Vargas–Herna´ndez, 2018The following individuals certify that they have read, and recommended to the Fac-ulty of Graduate and Postdoctoral Studies for acceptance, a dissertation entitled:APPLICATIONS OF MACHINE LEARNING FOR SOLVING COMPLEXQUANTUM PROBLEMSsubmitted in partial fulfillment of the requirementfor by Rodrigo A. Vargas-Herna´ndezthe degree of DOCTOR OF PHILOSOPHYin CHEMISTRYExamining Committee:PROFESSOR ROMAN V. KREMS, CHEMISTRYSupervisorPROFESSOR EDWARD R. GRANT, CHEMISTRYSupervisory Committee MemberPROFESSOR MARK THACHUK, CHEMISTRYSupervisory Committee MemberPROFESSOR JOERG ROTTLER, PHYSICSUniversity ExaminerPROFESSOR PIERRE KENNEPOHL, CHEMISTRYUniversity ExamineriiAbstractThis thesis illustrates the use of machine learning algorithms and exact numeri-cal methods to study quantum observables for different systems. The first part ofthis thesis depicts how to construct accurate potential energy surfaces (PESs) usingsupervised learning algorithms such as Gaussian Process (GP) regression. PESshave a leading part in quantum chemistry since they are used to study chemicalreaction dynamics. Constructing the PES from quantum reactive scattering cal-culations, as the reaction probability, is known as the inverse scattering problem.Here, we illustrate a possible solution to the inverse scattering problem with a two-tiered GP model one GP model interpolates the PES and the second in Bayesianoptimization (BO) algorithm. The end result is an accurate PES constructed witha GP with fewer points than with standard methods previously used for PES. BOis an optimization algorithm for black-box functions that use GP regression as anapproximation of the interrogative function. We applied BO to find the optimal pa-rameters of hybrid-density functionals. Quantum observables can differ betweenphases of matter. GP models with kernel combinations can extrapolate quantumobservables such as the polaron dispersion energy between different phases anddiscover phases of matter. The same algorithm can predict quantum observableswhere standard numerical techniques lack convergence.In the second half of the dissertation, we studied the evolution of quantumwalks in various graphs with Hamiltonians permitting particle number changes.We showed that particle number-changing interactions accelerate quantum walksfor any of the graph considered. Quantum simulators to study many-body physicsis an active research field. We proposed the use of magnetic atoms trapped inoptical lattices to experimentally mimic Bose-Hubbard type models by preparingatoms in different Zeeman states.iiiLay SummaryIn this dissertation, we used machine learning algorithms to reduce the compu-tational resources needed to predict quantum observables such as the electronicenergy of molecules, reaction probabilities, and energy dispersions. Next, we il-lustrate that machine learning algorithms, trained with data from a single phase,can extrapolate quantum observables and predict new phases of matter. We also in-troduced an optimization algorithm for black-box functions to solve problems likethe inverse scattering problem or the optimization of hybrid-density functionalsused in quantum chemistry. Additionally, we illustrate that the spread of quantumparticles used in quantum computing, like in quantum search algorithms, can beenhanced if quantum particles are allowed to create/annihilate different particles.Last, we proposed an experimental simulator to study many-body quantum physicsusing Zeeman excitations of magnetic atoms trapped in an optical lattice.ivPrefacePart of the material in Chapter 2 was published in the article: A. Kamath, R. A.Vargas-Herna´ndez, R. V. Krems, T. Carrington Jr., and S. Manzhos, Neural Net-works vs Gaussian Process Regression for Representing Potential Energy Surfaces:a Comparative Study of Fit Quality and Vibrational Spectrum Accuracy, J. Chem.Phys. 148, 241702 (2017). The author did the GP models calculations to constructthe PESs of the results published in the article.Part of the material in Chapter 3 was accepted for publication in New Journalof Physics, arXiv:1711.06376 (2018): R. A. Vargas-Herna´ndez, Y. Guan, D. H.Zhang, and R. V. Krems, Bayesian optimization for the inverse scattering problemin quantum reaction dynamics. This project was done in collaboration with theresearch group of professor D. H. Zhang at Dalian Institute of Chemical Physics(DICP), in Dalian China. The project was identified and designed by Roman V.Krems and the author. The author developed the Bayesian optimization algorithmand the code that combines BO with the DICP’s library, which computes the reac-tion probabilities and the GP models for the PESs. The author did all the calcula-tions and the data analysis.Part of the material in Chapter 4 was accepted for publication in PhysicalReview Letters, arXiv:1803.08195 (2018): R. A. Vargas-Herna´ndez, J. Sous, M.Bercui, and R. V. Krems, Extrapolating quantum observables with machine learn-ing: inferring multiple phase transitions from properties of a single phase. Theproject was identified and designed by the author and Roman V. Krems. The po-laron dispersions data was provided by J. Sous and M. Bercui. The author devel-oped the code to construct the kernel combinations based on Refs. [58, 60] andcarried all the calculations presented in the article.vThe material in Chapter 5 and Appendix B was published in the article: R. A.Vargas-Herna´ndez, and R. V. Krems, Quantum walks assisted by particle numberfluctuations, Phys. Rev. A 98, 022107 (2018). The project was identified anddesigned by the author under the guidance of Roman V. Krems. The author de-veloped the code, performed the calculations, the data analysis, and the analyticalderivations.The material in Chapter 6 was published in the article: R. A. Vargas-Herna´ndez,and R. V. Krems, Engineering extended Hubbard models with Zeeman excitationsof ultracold Dy atoms, J. Phys. B 49, 235501 (2016). The project was identifiedand designed by Roman V. Krems and the author. The author developed the code,performed the calculations, and the data analysis.The material in Appendix A is unpublished work by the author. The projectwas identified and designed by the author. The code that computes the HOMOenergy with density functional theory was provided by R. Ghassemizadeh fromthe University of Freiburg. The manuscript and further calculations are still inprogress.Results presented in Chapters 2-4 and Appendix A were done using scikit-learn’s Gaussian Process Regression library [137].viTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxivAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvi. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xxviii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Machine learning to study quantum systems . . . . . . . . . . . . 21.2 Quantum walks . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.3 Quantum simulators of condensed matter . . . . . . . . . . . . . 81.4 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.2 Gaussian Processes: prediction . . . . . . . . . . . . . . . . . . . 12vii2.3 Gaussian Processes: training . . . . . . . . . . . . . . . . . . . . 132.4 Kernel functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 162.4.1 Constant kernel . . . . . . . . . . . . . . . . . . . . . . . 172.4.2 Square exponential kernel . . . . . . . . . . . . . . . . . 172.4.3 Matern kernel . . . . . . . . . . . . . . . . . . . . . . . . 172.4.4 Rational Quadratic kernel . . . . . . . . . . . . . . . . . 192.4.5 Periodic kernel . . . . . . . . . . . . . . . . . . . . . . . 212.4.6 Linear kernel . . . . . . . . . . . . . . . . . . . . . . . . 222.5 Potential Energy Surfaces . . . . . . . . . . . . . . . . . . . . . . 232.5.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 252.5.2 Gaussian Processe regression vs Neural Networks . . . . . 252.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 Bayesian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 313.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313.2 Acquisition function . . . . . . . . . . . . . . . . . . . . . . . . 343.2.1 Probability of Improvement . . . . . . . . . . . . . . . . 343.2.2 Expected Improvement . . . . . . . . . . . . . . . . . . . 363.2.3 Upper Confidence Bound . . . . . . . . . . . . . . . . . . 373.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.3.1 Fitting known quantum dynamics results . . . . . . . . . 413.3.2 Fitting without known quantum dynamics results . . . . . 453.3.3 The inverse scattering problem . . . . . . . . . . . . . . . 473.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504 Extrapolation of quantum observables . . . . . . . . . . . . . . . . . 534.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554.2 Combination of kernels . . . . . . . . . . . . . . . . . . . . . . . 564.2.1 Kernel combination to predict a single phase transition . . 584.2.2 Kernel combination to predict multiple phase transitions . 624.3 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 664.3.1 Bayesian information criterion . . . . . . . . . . . . . . . 674.3.2 Greedy search for model construction . . . . . . . . . . . 68viii4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 684.4.1 Extrapolating quantum observables to unconverged regimes694.4.2 SSH-BM polaron phase diagram . . . . . . . . . . . . . . 704.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735 Accelerating quantum walks . . . . . . . . . . . . . . . . . . . . . . 775.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775.2 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795.2.1 Ideal 1D lattices . . . . . . . . . . . . . . . . . . . . . . 815.2.2 Disordered 1D lattices . . . . . . . . . . . . . . . . . . . 855.2.3 Binary trees . . . . . . . . . . . . . . . . . . . . . . . . . 885.2.4 Glued binary trees . . . . . . . . . . . . . . . . . . . . . 915.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 946 Quantum simulators with highly magnetic atoms . . . . . . . . . . . 976.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 976.2 Lattice Hamiltonian with Zeeman excitations . . . . . . . . . . . 996.3 Engineering lattice models . . . . . . . . . . . . . . . . . . . . . 1046.3.1 t−V model . . . . . . . . . . . . . . . . . . . . . . . . . 1046.3.2 t−V model with Dy atoms . . . . . . . . . . . . . . . . . 1066.3.3 Particle number-non-conserving interactions . . . . . . . 1106.3.4 Anderson localization of Zeeman excitations . . . . . . . 1126.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1137 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1167.1 Summary of the thesis . . . . . . . . . . . . . . . . . . . . . . . 1167.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122A Bayesian optimization for hybrid-density functionals . . . . . . . . . 145B Schrieffer–Wolff transformation . . . . . . . . . . . . . . . . . . . . 149ixC Magnetic dipole–dipole interaction . . . . . . . . . . . . . . . . . . . 152C.0.1 Matrix element . . . . . . . . . . . . . . . . . . . . . . . 155xList of TablesTable 2.1 RMSE (test errors computed on 120,000 points) of the PES ob-tained with the NNs for different training points (Npts). Thenumber in the parenthesis are the number of neurons used inthat particular NN. 〈NN〉10 is the average RMSE for 10 NNswith different sets of Npts. The values are in cm−1. . . . . . . . 27Table 2.2 RMSE (test errors computed on 120,000 points) of the PES ob-tained with GP regression. All GPs use the SE kernel function.Npts is the number of training points. 〈GP〉10 is the averageRMSE for 10 GP models with different sets of Npts. The valuesare in cm−1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 27xiList of FiguresFigure 2.1 Interpolation using a GP model trained with 7 points and anexponential squared kernel. The black dashed line is f (x) =x10 +x2+10sin(32 x). The solid blue line is the prediction of theGP model, Equation 2.5. The grey shaded area is the standarddeviation of the predicted mean of the GP model, Equation 2.4.The blue square symbols are the training data. . . . . . . . . . 14Figure 2.2 a) The square exponential function with different length scales.b) Prediction of a GP model with the SE kernel using differentlength scale values. . . . . . . . . . . . . . . . . . . . . . . . 18Figure 2.3 a,c) The Matern kernel function for different length scales andν = 32 (a) or ν =52 (c). b,d) Prediction of a GP model with theMAT kernel using different length scale values and ν = 32 (b)or ν = 52 (d). . . . . . . . . . . . . . . . . . . . . . . . . . . 19Figure 2.4 a) The rational quadratic function with different ` and α val-ues. b) Prediction of a GP model with the RQ kernel usingdifferent values for ` and α . . . . . . . . . . . . . . . . . . . 20Figure 2.5 a) The exponential sine squared function with different lengthscales and different periodicities. b) Prediction of a GP modelwith the PER kernel using different values for ` and p = 8.11. 21Figure 2.6 Prediction of a GP with the LIN kernel using different σ andpolynomial degree. . . . . . . . . . . . . . . . . . . . . . . . 22xiiFigure 2.7 H2 molecule’s energy for different interatomic distances pre-dicted with a GP model trained with 6 points and the SE kernel.The exact PES (dashed black line) is from reference [23]. . . . 24Figure 2.8 Distribution of energy values of the PES dataset. Figure fromreference [102] . . . . . . . . . . . . . . . . . . . . . . . . . 26Figure 2.9 Absolute errors in transition frequencies computed on PESsfitted to 625 points with different methods (NN: top half, GP:bottom half) vs the spectrum computed on the reference PES.Figure from reference [102] . . . . . . . . . . . . . . . . . . 29Figure 3.1 Example of the BO algorithm over three iterations. Blue mark-ers are GP’s the training data, blue curves are GP’s predic-tion and black dashed line is the black-box function f (x) =x10 + x2 + 10sin(32 x). The yellow solid curves are the acquisi-tion function, and H are the maximum of the acquisition func-tion at each iteration, xn+1. . . . . . . . . . . . . . . . . . . 35Figure 3.2 Illustration of the PI acquisition during BO. We also show thePI function for different values of ε . . . . . . . . . . . . . . . 36Figure 3.3 Illustration of the EI acquisition during BO. . . . . . . . . . . 37Figure 3.4 Illustration of the UCB acquisition during BO. We also showhow the UCB function for different values of κ . . . . . . . . 38Figure 3.5 a) The reaction probability for the H2 + H → H + H2 reac-tion as a function of the collision energy. The black solidcurve–accurate calculations from [23]. The dashed curves–calculations based on GPs PESs obtained with 22+1 ab initiopoints, where the 22 initial points (solid blue curve) are fixedand the extra point is randomly sampled in the configurationspace. b) The error of the reaction probabilities as a functionof the location extra point added to the original 22 points. . . . 42xiiiFigure 3.6 a) The reaction probability for the H2 + H→ H + H2 reactionas a function of the collision energy. The black solid curve –accurate calculations from Ref. [23] based on the surface con-structed with 8701 ab initio points. The dashed curves – calcu-lations based on the GP PES obtained with 22 ab initio points(blue); 23 points (orange), 30 points (green) and 37 points (in-set). The RMSE of the results with 37 points is 0.009. b) TheGP model of the PES for the H3 reaction system constructedwith 30 ab initio points. This surface yields the quantum dy-namics results shown by the green curve in the upper panel. . 44Figure 3.7 Convergence of the RMSE of the probabilities for H2 + H→H + H2 reaction with the number of BO iterations as a functionof κ in UCB. . . . . . . . . . . . . . . . . . . . . . . . . . . 45Figure 3.8 The reaction probability for the OH + H2→ H + H2O reactionas a function of the collision energy. The black solid curve– accurate calculations from Ref. [42] based on the surfaceconstructed with ∼17000 ab initio points. The dashed curves– calculations based on the GP PES obtained with 200 ab initiopoints (blue); 280 points (orange) and 290 points (green). TheRMSE of the 290-point result is 0.0076. . . . . . . . . . . . . 46Figure 3.9 The reaction probabilities for the H2 + H → H + H2 reactionas functions of the collision energy. The black solid curve –accurate calculations from Ref. [23]. The dashed curves – theresults of iterative calculations maximizing the difference be-tween the reaction probabilities in subsequent iterations. Theblack curve is not used for these calculations. The inset showsthe agreement between the reaction probabilities (red symbols)based on the GP approach after 48 iterations (total of 70 ab ini-tio points) and the exact results. . . . . . . . . . . . . . . . . 47xivFigure 3.10 a) The reaction probabilities for the modified H2 + H → H+ H2 reaction as functions of the collision energy. The blackdot-dashed curve is obtained by a modification of the previ-ous results (black solid curve) involving a translation along theenergy axis. The ML models are trained to obtain the PESthat would describe the new reaction probabilities. The greendashed curve is a results of such training after 30 iterations,which produces a surface constructed with 52 ab initio points.b) Comparison of the original PES (blue) with the new PES(red) found by the BO algorithm. The new PES yields the re-action probabilities described by the green dashed curve in theupper panel. The RMSE of the results shown by the greendashed curve is 0.016 ev. . . . . . . . . . . . . . . . . . . . 49Figure 4.1 Schematic diagram of a quantum system with three phases. . . 55Figure 4.2 The upper panel is the RMSE of GPs’ prediction as a func-tion of λSSH . The markers illustrate the values of λSSH usedto train the GPs. We consider two sets of training values,(a) 0.4 ≤ λSSH ≥ 1.3 and (b) 1.6 ≤ λSSH ≥ 2.5. The lowerpanels illustrate the GPs prediction (solid colour curves) andthe accurate energy dispersion (dashed colour curves). Weuse kMAT × kLIN as the kernel function for both GP models.ω/t = 3 is held fixed. . . . . . . . . . . . . . . . . . . . . . . 60Figure 4.3 The upper panel is the RMSE of GPs’ prediction as a func-tion of λSSH . The markers illustrate the values of λSSH used totrain the GPs. We consider two sets of training data (a) 100points distributed from 0.2 ≤ λSSH ≤ 0.5 and (b) 125 pointsdistributed in 0.2 ≤ λSSH ≤ 0.6. The lower panels illustratesthe GPs prediction (solid colour curves) and the accurate en-ergy dispersion (dashed colour curves). We use kMAT × kLINas the kernel function for both GP models. For all the energydispersions display here, we consider ω/t = 3 and β = 0. . . 61xvFigure 4.4 For both panels we illustrate the change in KGS as a functionof λSSH . The blue curves is the KGS from the GP’s predictionof the energy dispersion. The black dashed curve is the correctKGS for each λSSH . The markers are the values of λSSH that weuse to train each GPs. The upper panel shows 0.2 ≤ λSSH ≥0.5, wile the lower panel shows 0.2 ≤ λSSH ≥ 0.6. We usekMAT × kLIN as the kernel function for both GPs. For all thecalculations ω/t = 3 and β = 0. . . . . . . . . . . . . . . . . 62Figure 4.5 KGS for the mixed model 4.3. The black dashed curves are thecalculations from Ref. [87]. The color map is the prediction ofthe GP models with the fully optimized kernels. The modelsare trained by the dispersion of the polarons at the locationsrepresented by the black dots. The different kernels were con-sidered, kMAT , kRQ and kRBF . . . . . . . . . . . . . . . . . . . 63Figure 4.6 KGS for the mixed model 4.3. The black dashed curves are thecalculations from Ref. [87]. The color map is the prediction ofthe GP models with the fully optimized kernels. The modelsare trained by the dispersion of the polarons at the locationsrepresented by the black dots. The different kernels consideredhere are all the possible pairwise addition, Equation 4.6, of twosimple kernels, kMAT , kRQ and kRBF . . . . . . . . . . . . . . . 65Figure 4.7 The momentum of the polaron ground state for the mixed model4.3. The black dashed curves are the calculations from Ref.[87]. The color map is the prediction of the GP models with thefully optimized kernels. The models are trained by the disper-sion of the polarons at the locations represented by the blackdots. The different kernels considered here are all the possiblepairwise amutiplication, Equation 4.7, of two simple kernels,kMAT , kRQ and kRBF . . . . . . . . . . . . . . . . . . . . . . . 66Figure 4.8 Schematic diagram of the kernel construction method employedto develop a GP model with extrapolation power. At each itera-tion, the kernel with the highest Bayesian information criterion(Equation 4.10) is selected. . . . . . . . . . . . . . . . . . . . 69xviFigure 4.9 Left panel: Holstein polaron dispersion predicted with a GPmodel with kRQ, black dashed curves, and kRQ× kRQ× kRBF +kLIN , blue dashes curves. The red solid curves are polaron dis-persions computed with the momentum average approxima-tion method [22]. Right panel: The change in the RMSE asa function of depth in the kernel combination search guidedby the BIC. We plot the RMSE between GP predicted polarondispersion and the exact polaron dispersion at different valuesof ω . The grey area is the range in ω considered for trainingthe GP. For both figures, the value of g = 1.5 is fixed. . . . . 70Figure 4.10 KGS for the mixed model (4.3) as a function of β/α for λSSH =2α2/th¯ω . The dotted curves are the quantum calculations fromRef. [87]. The color map is the prediction of the GP models.Each panel illustrates the improvement of the predicted phasediagram. The panels correspond to the optimized kernels GPL-0 (left), GPL-1 (right), GPL-2 (centre), where “GPL-X” de-notes the optimal kernel obtained after X depth levels in thealgorithm depicted in Figure 4.8. The models are trained bythe dispersion of the polarons at the locations represented bythe white dots. . . . . . . . . . . . . . . . . . . . . . . . . . 72Figure 4.11 KGS for the mixed model (4.3) as a function of β/α for λSSH =2α2/th¯ω . The dotted curves are the quantum calculations fromRef. [87]. The color map is the prediction of the GP models.Each panel illustrates the improvement of the predicted phasediagram. The panels correspond to the optimized kernels GPL-0 (left), GPL-1 (right), GPL-2 (centre), where “GPL-X” de-notes the optimal kernel obtained after X depth levels in thealgorithm depicted in Figure 4.8. The models are trained bythe dispersion of the polarons at the locations represented bythe white dots. . . . . . . . . . . . . . . . . . . . . . . . . . 73xviiFigure 4.12 KGS for the mixed model (4.3) as a function of β/α and λSSH =2α2/th¯ω . The black dashed curves are the calculations fromRef. [87]. The color map is the prediction of the GP modelswith the fully optimized kernels. The models are trained bythe dispersion of the polarons at two different locations repre-sented by the white dots. Left column panels are the predictedphase diagram with GPL-0 for both training data sets and rightcolumn panels are with GPL-2. . . . . . . . . . . . . . . . . 74Figure 5.1 Time dependence of the standard deviation (in units of lat-tice constant) of the wave packet for a particle initially placedin a single site of a one-dimensional ideal lattice. The solidblack curves represent the ballistic expansion governed by theHamiltonian (5.1) with v = 0 and Vˆnc = 0. Upper panel: Theoscillating curves show the size of the wave packets governedby the Hamiltonian (5.1) with v = 0, γ = 0, ∆/t = 1, t = 1and two values of ∆ε: ∆ε = 10/t (blue) and ∆ε = 20/t (red).Lower panel: The oscillating curves show the size of the wavepackets governed by the Hamiltonian (5.1) with v=±1, ∆/t =1, t = 1 and ∆ε = 20/t. The insets show the average number ofparticles 〈n〉 as a function of time for the corresponding Hamil-tonian parameters. Notice that for ∆ε = 20/t, 〈n〉 stays below1.2 at all times. Figure from reference [185]. . . . . . . . . . 82xviiiFigure 5.2 Time dependence of the standard deviation (in units of lat-tice constant) of the wave packet for a particle initially placedin a single site of a one-dimensional ideal lattice. The solidblack curves represent the ballistic expansion governed by theHamiltonian (5.1) with v = 0 and Vˆnc = 0. The oscillatingcurves show the size of the wave packets governed by theHamiltonian (5.1) with v= 0, γ = 0, ∆/t = 1, t = 1, ∆ε = 20/tand a Hilbert space with different number of particles, 1 par-ticle (black), 1 and 3 particles (red) and 1,3 and 5 (blue). Theupper inset shows the logarithm of the particle probability dis-tributions in a disordered 1D lattice with 19 sites. The resultsare averaged over 50 realizations of disorder and are time-independent with w = 10/t. And the lower inset depicts theaverage number of particles 〈n〉 as a function of time for thethree different wave packets. Figure from reference [185]. . . 84Figure 5.3 Time dependence of the standard deviation (in units of latticeconstant) of the wave packet for a particle initially placed in asingle site of a one-dimensional ideal lattice. The solid blackline represents the ballistic expansion governed by the Hamil-tonian (5.1) with v = 0, ∆ = 0 and γ = 0. For the dotted redcurve ∆ = 0 and γ = t while for the blue solid curve ∆ = tand γ = 0. For all of these calculations, ∆ε = 20/t. The in-set shows the average number of particles 〈n〉 as a function oftime for ∆= 0 and γ = t (dotted red curve) and ∆= t and γ = 0(blue solid curve). Figure from reference [185]. . . . . . . . 85xixFigure 5.4 Upper panel: The logarithm of the particle probability dis-tributions in a disordered 1D lattice with 41 sites: diamonds– ∆/t = 1, squares – ∆/t = 1/2, circles – ∆/t = 1/10. Theresults are averaged over 100 realizations of disorder and aretime-independent. The dashed line is an exponential fit to the∆/t = 1/10 results. Lower panel: the long-time limit of theIPR defined in Eq. (5.3) averaged over 100 instances of disor-der as a function of the disorder strength w: solid line – ∆= 0,dashed line – ∆/t = 1. The inset shows the IPR averagedover 100 realizations of disorder for two disorder strengthsw = {5/t,10/t} as functions of time: the solid black curves– ∆ = 0; the dotted and dot-dashed curves – ∆ = t. Figurefrom reference [185]. . . . . . . . . . . . . . . . . . . . . . 87Figure 5.5 Schematic diagram of an ideal binary tree with depth-5 (G 5). 88Figure 5.6 Upper panel: The growth of the wave packet for a single par-ticle placed at the root of the tree (G 5): black line – ∆ = 0;the oscillating curves – ∆/t = 1 with ∆ε = 10/t (blue) and∆ε = 20/t (red). The inset shows the probability of reachingthe last node (in layer 5) of the binary tree as a function oftime. The curves are color-coded in the same ways as in themain plot. Lower panel: The L1 norm between the probabilitydistribution of the wave packet and the uniform distribution asa function of time. The uniform distribution is defined by thevalue 1/dim(G 5) for each node. The solid black curve is for∆ = 0. The oscillating curves are for ∆ = t with ∆ε = 10/t(blue) or ∆ε = 20/t (red). For all curves in both figures γ = 0.Figure from reference [185]. . . . . . . . . . . . . . . . . . . 89Figure 5.7 The squares represent the stationary distribution piS(ν) for aquantum walk (QW) on the G 5 tree with ∆ = 0. The circlesand diamonds are the the stationary distributions piS(ν) forquantum walks with ∆/t = 1 and ∆ε = 10/t and ∆ε = 20/t,respectively. Figure from reference [185]. . . . . . . . . . . . 92xxFigure 5.8 Schematic diagram of an ideal glued binary trees with depth-4(GBT −4). . . . . . . . . . . . . . . . . . . . . . . . . . . 92Figure 5.9 Average particle probability distribution in a disordered GBT 4graph: upper panel − ∆/t = 0, middle panel − ∆ε = 10/t andlower panel − ∆ε = 20/t. For all panels we consider 100 real-izations of disorder with a strength of w = 5/t and γ = 0. Thewave packet for a particle is initially placed in the head nodeof a GBT 4 graph. Figure from reference [185]. . . . . . . . 94Figure 5.10 Average particle probability distribution in a disordered GBT −4 graph with strength w = 5/t. The wave packet for a singleparticle is initially placed in the head node of a GBT − 4graph shown in Figure 5.8. The upper panel displays the prob-ability (5.14) summed over all nodes of layer j = 3; the lowerpanel shows p j(t) for the bottom node j = 6. Insets: The parti-cle probability distributions for an ideal GBT −4 graph with∆ε = 10/t (blue dot-dashed) and ∆ε = 20/t (red dashed). Thebroken curves show the results obtained with ∆/t = 1 and thefull black lines – ∆/t = 0. Figure from reference [185]. . . . 96Figure 6.1 Upper panel: Zeeman levels of a Dy(5I) atom in the lowest-energy spin-orbit state characterized by J = 8 in a magneticfield B = B0zˆ. Lower panel: the solid curve – difference of theenergy gaps (εM=2− εM=1)− (εM=1− εM=0); the dot-dashedcurve – difference of the energy gaps (εM=2−εM=1)−(εM=0−εM=−1). The horizontal dashed line shows the magnitude ofthe matrix element ti,i+1 in Eq. (6.21) for Dy atoms with |g〉=|J = 8,M = 0〉 and |e〉 = |J = 8,M = 1〉 in an optical latticewith a = 266 nm. Figure from reference [184]. . . . . . . . . 103xxiFigure 6.2 The magnitudes of the coupling constants ti j (upper panel) andthe ratio ti j/vi j (lower panel) with j = i± 1 for the Zeemanstates of Dy corresponding to |g〉 ⇒ |JM〉 and |e〉 ⇒ |JM′〉.The calculations are for the magnetic field B = B0 (0.1xˆ+ zˆ)with B0 = 100 G. The Zeeman states in this magnetic field re-tain 96% of the eigenstates of J2 and J z. Figure from reference[184]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107Figure 6.3 The magnitude of the coupling constant vi j with j = i± 1 forthe Zeeman states of Dy corresponding to |g〉 ⇒ |JM〉 and|e〉 ⇒ |JM′〉. The calculations are for the magnetic field B =B0 (0.1xˆ+ zˆ) with B0 = 100 G. The Zeeman states in this mag-netic field retain 96% of the eigenstates of J2 and J z. Figurefrom reference [184]. . . . . . . . . . . . . . . . . . . . . . 108Figure 6.4 The ratio vi,i+1/ti,i+1 for the Zeeman states of Dy correspond-ing to |g〉 = |J = 8,M = −7〉 and |e〉 = a|J = 8,M = −8〉+b|J = 8,M′〉: full circles – a =√3/4,b =√1/4; open circles– a =√3/5,b =√2/5. The calculations are for the magneticfield B = B0 (0.1xˆ+ zˆ) with B0 = 100 G. The Zeeman states inthis magnetic field retain 96% of the eigenstates of J2 and J z.Figure from reference [184]. . . . . . . . . . . . . . . . . . . 109Figure 6.5 The magnetic field dependence of the quantities ti,i+1 (squares)and vi,i+1 (circles) defined in Eqs. (6.24) and (6.25) for twodifferent pairs of the Zeeman state of Dy(J = 8) atoms: thefull symbols – the results for |g〉= |J = 8,M =−8〉 and |e〉=|J = 8,M =−7〉; the open symbols – the results for |g〉= |J =8,M = 0〉 and |e〉 = |J = 8,M = +1〉. The magnetic field isgiven by B = B0 (0.1xˆ+ zˆ). The Zeeman states in such a mag-netic field retain 96% of the eigenstates of J2 and J z. Figurefrom reference [184]. . . . . . . . . . . . . . . . . . . . . . 110xxiiFigure 6.6 Anderson localization of the |J = 8,M = 0〉 → |J = 8,M =+1〉 excitation in a one-dimensional array of Dy atoms on anoptical lattice with a = 266 nm and 20 % of the lattice sitesempty. The upper panel shows the probability distribution forthe atoms in the corresponding site to be in the excited state att = 2 seconds formed by a single excitation placed at t = 0 inthe middle of a lattice with 1000 sites. The lower panel showsthe width of the excitation probability distribution as a functionof time. Figure from reference [184]. . . . . . . . . . . . . . 115Figure A.1 The iterations of BO to find the minimizer of L for the H2molecule. The black vertical line is the next proposed point byBO. The blue markers are the training data used to constructthe acquisition function. The blue-solid curve is the mean ofthe GP model and the grey shaded area is σ(·). . . . . . . . . 148xxiiiGlossaryThis glossary uses the handy acroynym package to automatically maintain theglossary. It uses the package’s printonlyused option to include only thoseacronyms explicitly referenced in the LATEX source.α(·) acquisition functionBO Bayesian optimizationBIC Bayesian information criterionDFT Density functional theoryEI Expected ImprovementGP Gaussian ProcessHOMO highest occupied molecular orbitalk(·, ·) kernel functionθ kernel parametersKRR kernel Ridge regressionLHS Latin hypercube samplingLR long-rangeL loss functionML Machine learningxxivNN Neural networkPES potential energy surfacePI Probability of improvementQW quantum walkRL Reinforcement learningSR short-rangeD training dataUCB Upper confidence boundxxvAcknowledgmentsForemost, I would like to express my sincere gratitude to my supervisor ProfessorRoman V. Krems for giving me the opportunity to do a Ph.D. in his research group.I would also like to thank him for all his patients, devotion and endless help tomake this stage of my life a grateful experience. I would like to thank MarinaLitinskaya and Evgeny A Shapiro for all the help they grant me during the firstyears of my grad studies. I am thankful for the great research group I got to workwith, P. Xiang, J. Cui, J. Sous, T. Chattaraj, and J. Cantin; thank you very muchfor all those great physics and non-physics discussions during these years. I wouldlike to thank R. Chen for all the great machine learning discussions we had.I am grateful for the incredible experiences I had during my multiple visits tothe research group of Professor T. Schaetz in Freiburg Germany, these trips wouldnot had happened without the help of the IRTG-coco program. I would also like tothank my German friends, F. Hakelberg, S. Fuchs and P. Weckesser for all the greatmemories together. I would like to thank Professor D. H. Zhang and his studentY. Guan who made my visit to the Dalian Institute of Chemical Physics a pleasantexperience.Vancouver became my second home and I was lucky to make new friends. E.Castillo and J. Casanova for all their help during my first years. B. Loosley forthe beer/biking moments. K. Lovering and A. Bain for all those cups of coffee.P. L. Esquinas and A. Lee for all those quantum assignments, S. Cuen-Rochin forthe basketball moments, M. Vashista, T. Matzat, H. Neurert and R. Garcia-Rojas-Villegas. Thank you all the great memories that we shared during my Ph.D. Aspecial thanks goes to Montse Rueda for helping me be a better human, bike-beers-books.xxviLike any other immigrant, leaving home behind is one of the hardest chal-lenges because of the people we stop seeing. I am thankful for all the great Mexi-can friends who supported me and my family during my Ph.D. Romina Arguellesand her family, Mariana Dominguez, Paula Balbi, Izaskun Diaz, Alejandro Evaris,Pablo and Diego Mugu¨erza, Arturo Sanchez, Raymundo Marcial, Emiliano Mar-tinez, Pablo Bulit, Rodrigo Corte´s, Salvador Portocarrero and Adrian Va´zquez. Iwould like to thank my best friend Perla Ordaz for all the non-little things she hasdone for me. I am thankful to Professor Jose Carlos Flores who taught me mathin high-school and Professor Carlos Amador-Bedolla for introducing the to thebeautiful world of quantum mechanics.I would like to thank my aunt Julieta Mariaca, my aunts Martha E. MedinaGuerrero, Guadalupe Guerrero and their family, and my uncle Walter Louis andhis family; thank you all for your help and love to me and my family.I am grateful I meet Mriga Das at the beginning of my Ph.D. Mi amor, thankyou very much for all the joy and love you bring to my life. There is no doubt thisride was easier thanks to you.I was lucky to do my Ph.D. while living with my sister Ana Jimena Vargas-Herna´ndez; thank you for all your help, love, patience and pancakes during thistime; every day it felt like home.Last but not least, I would like to thank my parents, Ana Lilia Herna´ndez-Cuevas and Alejandro Vargas-Cruz for all their love that goes even beyond life andall their support. I would also like to thank them for teaching me never to give upand always try to make this world fairer. Since I was a kid my mother taught methat everything in life is explained with the laws of math and physics. My dad alsotaught me that chemistry, on its own, is a wonderful science that can make youunderstand the wonders of nature. I would like to thank my parents for making mepassionate about knowledge.xxviiTo my mother and her eternal fight for a fairer world.To my father, for teaching me to always fight for my dreams.In memory of my father.¡Hasta la victoria!– Ernesto GuevaraxxviiiChapter 1IntroductionIf I have seen further it is by standing on the shoulders of Giants.— Sir Isaac Newton (1675)The field of quantum physics has been theoretically challenging due to dimen-sionality of the Hilbert space required to accurately describe quantum systems.Major breakthroughs in the field of modern science have been achieved due togrowth in computational power in recent years. Nevertheless, there are still manyinteresting quantum systems that cannot be fully simulated with the current com-puting capacity and thus different methodologies have to be designed. In 1982 R.Feynman proposed the idea to simulate quantum systems analogously [64]. Thisnew research field is known as quantum simulators and its only goal is to usefully controllable quantum systems to improve the human knowledge of quantumphysics [30].As practical as it seems, quantum simulators cannot mimic all quantum sys-tems and the hunger for novel numerical tools to study quantum physics is stillpresent [30]. Recently the use of machine learning algorithms has gained momen-tum in the fields of theoretical chemistry and physics [8, 33, 35, 71]. In this thesismachine learning algorithms are being used to study quantum physics in a widerange of problems; from the description of the electronic energy as a function ofthe nuclei position to the discovery of new quantum phases of matter. In this chap-ter, a brief introduction and motivation to each of the research projects that wereconducted during my Ph.D. are presented. The last part of the current chapter sum-1marizes each of the chapters dedicated to the research projects that this thesis iscomprised of.1.1 Machine learning to study quantum systemsPhysics is devoted to studying observables quantities like position, momentum orenergy, to explain how nature behaves. The state of a system, S, determines themeasured quantity, o, that is produced by a physical observable F . This processcan be described by,F (S) = o. (1.1)For example, given the distance and the masses of two bodies, one can computethe gravitational potential energy,U(x1,x2,m1,m2) =−G m1 m2r1,2 , (1.2)where r1,2 is the distance between the two particles, and mi is the mass of eachparticle. G is the gravitational constant. One can observe that Equation 1.2 issimilar to Equation 1.1, where S = [x1,x2,m1,m2] andF (·)→U(·).Things are more complicated in the quantum world since quantum observablesare described by operators, Oˆ [161]. In quantum mechanics, an operator acts on aquantum state |ψ〉 and produces a scalar, λ , and another quantum state,Oˆ |ψ〉 → λ |φ〉 . (1.3)A specific group of operators is the Hermitian operators whose eigenvalues are real,λi ∈ IR. All the quantum observables that are studied in this thesis are described byHermitian operators. When a Hermitian operator acts on its eigenvector, the outputvector is the same,Oˆ |ψi〉= λi |ψi〉 . (1.4)where |ψi〉 is an eigenvector of operator Oˆ and λi is the eigenvalue associated2with |ψi〉. Furthermore, the dot product between two eigenvectors is equal to aKronecker delta,〈ψi|ψ j〉= δi j ={1 if i = j,0 if i 6= j}, (1.5)i.e. they are orthogonal. Using Equation 1.4 one can rewrite any Hermitian opera-tor using its spectral decomposition,Oˆ =∑iλi |ψi〉〈ψi| , (1.6)where λi are the eigenvalues and |ψi〉 the eigenvectors of the operator Oˆ . Each|ψi〉〈ψi| term in Equation 1.6 is a projector or measurement operators in the eigenspaceof Oˆ , Pˆi = |ψi〉〈ψi|. Furthermore, any quantum state |ψ〉 can be rewritten in termsof the eigenstates of Oˆ ,|ψ〉=∑i|ψi〉〈ψi|ψ〉 , (1.7)where each 〈ψi|ψ〉 term is an inner product between the quantum states |ψi〉 and|ψ〉. Using Equations 1.6 and 1.7, one can compute the expectation value of aquantum observable as,〈ψ|Oˆ |ψ〉 = ∑iλi 〈ψ|ψi〉〈ψi|ψ〉= ∑iλi[∑j〈ψ|ψ j〉〈ψ j|]|ψi〉〈ψi|[∑l〈ψl| 〈ψ|ψl〉]= ∑iλi| 〈ψ|ψi〉 |2 (1.8)where | 〈ψ|ψi〉 |2 = 〈ψi|ψ〉〈ψ|ψi〉 is the probability of measuring λi. Furthermore,Equation 1.8 is the statistical definition of the expectation value for a discrete ran-dom variable X ,E[X ] =∑ixi pi. (1.9)3The ground state of Oˆ , denoted as |ψGS〉, is the eigenvector with the lowesteigenvalue, λ0. Using Equation 1.8 and |ψ〉 → |ψGS〉 one can measure λ0,〈ψGS|Oˆ |ψGS〉= λ0. (1.10)Additionally, operators can have internal parameters; for example, the Hamiltonianof a particle in a one-dimensional box depends on the length of the box, L, andthe mass of the particle, m. Here we collectively described the parameters of anoperator by θ¯ . In the case of a particle in a one-dimensional box, θ¯ = [L,m].One of the central ideas in this dissertation, Equation 1.11, is the use of Ma-chine learning (ML) algorithms to learn the relation between λ0 and θ¯ ,〈ψGS|Oˆ(θ¯) |ψGS〉= λ0 → F (θ¯), (1.11)where F (·) is the ML algorithm that mimics the computation of λ0. If F (·) iscapable to learn 〈ψGS|Oˆ(θ¯) |ψGS〉 accurately, one could reduce the computationalresources needed to calculate λ0 for different values of θ¯ .In theoretical chemistry, computing the electronic ground state energy of an en-semble of electrons and nuclei in different fixed positions can be mapped to Equa-tion 1.11. Each of the possible spatial configurations of the nuclei is described by aunique θ¯ = [x1, · · · ,xm], where xi is the position of each nucleus. The map betweenthe nuclei positions and the electronic ground state energy, F (·), is known in thefield of quantum chemistry as the potential energy surface (PES) [133]. As it isstated above, inferring F (·) is crucial to reduce the computational power neededto study quantum systems. This leads to the first question addressed in this the-sis: Can ML be used to interpolate quantum observables accurately to constructPESs?For some quantum systems, the computation of λ0 with standard numericalmethods lacked convergence due to the size of the Hilbert space required to de-scribe the quantum system. Furthermore, without the ability to compute λ0 in theentire space of θ¯ , one can not construct the complete phase diagram of a quantumsystem [160]. This raises the second question: Can ML be used to extrapolateobservables to learn phase diagrams?4Quantum observables can also depend on other quantum observables, Equa-tion 1.12, making the computations more demanding.λ0 = 〈φGS|Oˆ1(θ¯ ,〈ψGS|Oˆ2(θ¯ ′) |ψGS〉) |φGS〉 . (1.12)Using Equation 1.11, one can rewrite Equation 1.12 as,〈φGS|Oˆ(θ¯ ,F (θ¯ ′)) |φGS〉 = 〈φGS|Oˆ(θ¯ ,λ ′0) |φGS〉 , (1.13)where θ ′ are the parameters of Oˆ2 and λ ′0 it the lowest eigenvalue of Oˆ2(θ ′). F (·)computes the quantum observable of Oˆ2 given the values of θ ′. The parameters ofOˆ are described by θ¯ .Many types of quantum problems can be described by Equation 1.12; for ex-ample, to compute any quantum dynamical observable one first needs to fit F (·)to reduce the complexity of the problem. From Equation 1.13 one can observe thecausality relation between λ0 andF (·) isF (·)→ λ0, so one can raise the follow-ing question: Can F (·) be learned from λ0?. This question amounts in the fieldof quantum physics to the inverse scattering problem. Due to the complexity ofthe inverse scattering problem, one could rephrase the above question as the thirdquestion of this dissertation: Can the inverse scattering problem be solved usingML?1.2 Quantum walksThe goal of ML is designing robust algorithms capable of learning almost anyphenomenon given some data. In order to do so, ML algorithms are trained byminimizing a loss function which sometimes is not a simple task. Furthermore,the optimization of ML algorithms is computationally demanding. Some classi-cal algorithms when they are mapped into the quantum framework became moreefficient [194]. For example, the spread of quantum walks in some regimes issignifically faster than random walks, making them more efficient for search typealgorithms. Quantum walks are the quantum counterpart of random walks, oneof the building blocks used in various classical algorithms as statistical models ofreal-world processes, and they are also used in the field of quantum computing5[46, 89, 142, 187].There are two types of quantum walks, discrete and continuous quantum walks.The first type resembles the classical view of a random walk by tossing a quantumcoin. The quantum coin describes the transition probabilities from heads (|+〉)to tails (|−〉) or vice-versa. One of the main differences between quantum andrandom walks is the possibility for a quantum walk to be in a superposition ofheads and tails,|s〉= c1 |+〉 + c2 |−〉 , (1.14)where cis are the complex amplitudes for each coin state. The time evolution of adiscrete quantum walk is done by sequentially applying the coin operator,|s(n)〉= Uˆn |s(n = 0)〉=[N∏n=11√2(1 11 −1)]|s(n = 0)〉 , (1.15)where Uˆ , in this case, is the Hadamard operator and |s(n = 0)〉 the initial state.Tossing a quantum coin one time for an initial state in heads, |s(n = 0)〉 = |+〉,leads to a new state that is in an equal superposition of both coin states,|s(n = 1)〉= Uˆ |+〉= 1√2(1 11 −1)(10)=|+〉 + |−〉√2. (1.16)In order to simulate the movement of the walk in a one-dimensional chain, weintroduce the shift operator, Sˆ. If the quantum coins lands heads, the quantum walkmoves to the right and if it lands tails, it moves to the left,Sˆ = |+〉〈+|⊗∑n|n+1〉〈n| + |−〉〈−|⊗∑n|n−1〉〈n| , (1.17)here the state |n〉 indicates the location of the quantum walk in a one-dimensionalchain. After applying the Hadamard operator on the initial state once, we obtain a6new quantum state with equal probability in the adjacent siteSˆ⊗ Hˆ[ |+〉⊗ |n = 0〉] = Sˆ[ |+〉⊗ |n = 0〉 + |−〉⊗ |n = 0〉√2]=|+〉⊗ |n = 1〉 + |−〉⊗ |n =−1〉√2, (1.18)for this example the coin operator is Uˆ = Sˆ⊗ Hˆ and the initial state considered isa quantum walk located in the middle of a 1D chain with heads. As it is statedabove, the quantum walk has moved with equal probability to the adjacent sites.The time evolution of continuous quantum walks is,|ψ(t)〉= Uˆ(t) |ψ(t = 0)〉 (1.19)where Uˆ(t) is the evolution operator and it is defined asUˆ(t) = e−ih¯Hˆt , (1.20)where Hˆ is the Hamiltonian operator. For discrete quantum walks the evolutionoperator is the Hadamard operator, Equation 1.15.One of the most common continuous quantum walks is described by the Hamil-tonian of a one-dimensional chain with hopping only between adjacent sites,Hˆ =−γ[ |n+1〉〈n|+ |n−1〉〈n|], (1.21)where γ is the transition probability. Expanding the evolution operator in a Taylorseries we find that a quantum walk, initially located in the centre of a 1D chain,moves to the adjacent sites with equal probability,|ψ(t)〉 = [1− ih¯ Hˆt+ · · ·] |n = 0〉= |n = 0〉+ ih¯γt[|n = 1〉+ |n =−1〉]+ · · · . (1.22)It must be pointed out that the formulation of continuous quantum walks is inde-pendent of coin states, meaning that the movement of the walk does not dependon tossing a quantum coin. In continuous quantum walks, the Hamiltonian itself is7the graph where the quantum walk will move [4, 63], and the transition probabili-ties, described by the Hamiltonian, are the vertices between different nodes as it isdescribed in Equation 1.21.One of the key differences between random and quantum walks is the spread ofthe walk over time. While the spread over time for random walks scales as O(√t),the spread of quantum walks scales linearly, O(t), also known as ballistic expan-sion [142]. This unique property has made quantum walks extremely valuable inquantum algorithms [45, 47, 171]. In this thesis we also raise the following ques-tion: Can the spread over time be enhanced for quantum walks beyond the ballisticlimit?1.3 Quantum simulators of condensed matterR. Feynman proposed a revolutionary idea. He suggested that one could mimicquantum phenomena with a fully controlled simulator [64]. This simple idea hadhuge experimental consequences, since, in order to achieve quantum simulators,humans have to be able to fully control quantum systems [30]. Over the lastdecades, various candidates like, ion traps [21, 166], ultracold atoms and moleculestrapped in optical lattices [115], have been proposed to mimic the interactions (orphysics) of the solid-state/condensed matter. For example, in 1998 D. Jaksch et al.[97] demonstrated that the dynamics of an ultracold dilute gas of bosonic atoms inan optical lattice can mimic the Bose-Hubbard model,Hˆ =−t ∑〈i, j〉cˆ†i cˆ j +U2 ∑inˆi(nˆi−1)−µ∑inˆi, (1.23)where cˆ†i and cˆi are the creation and annihilation operators, nˆi is the number of par-ticles operator in the site i, and 〈i, j〉 restricts the sum only to adjacent sites. Thefirst term of the Bose-Hubbard model represents the hopping of particles betweendifferent lattice sites. The second term describes the interaction between two parti-cles at the same lattice site. The last term is known as the chemical potential and isrelated to the total number of particles in the system. For ultracold atoms trappedin an optical lattice, the parameters of Equation 1.23 depend on the lattice depththat is produced by the laser used to trap the atoms [97].8In 2009 M. Ortner et al. proposed the use of polar molecules to simulate theBose-Hubbard model [135]. This was experimentally realized in 2013 by B. Yan etal. [201]. Trapping polar molecules in optical lattice is arguably one of the great-est experimental achievements in modern physics; however, the filling of opticallattices with current experimental setups is not high, currently limited to ∼ 25%[131].Polar molecules trapped in different sites of the optical lattice interact by dipole-dipole interaction. The constants of the Bose-Hubbard model simulated with polarmolecules can be tuned by an external DC electric field [73]. Since the trap depthof the optical lattice has to be large enough to confine polar molecules, the quan-tum particles that simulate Bose-Hubbard model are Frenkel excitons, which in thecase of polar molecules are rotational excitations [73, 84, 86, 110, 121, 128, 138,198, 201].On the other hand, ultracold atoms can be trapped in optical lattices with nearlyuniform filling [74, 195]. In 2005 Greismaier et al. demonstrated that atoms withhigh magnetic dipole were trapped in optical lattices [75]. This raises the follow-ing question, Can magnetic atoms trap in an optical lattice be used as quantumsimulators for Bose-Hubbard type models?1.4 Thesis outlineThe work presented in this thesis is divided into two parts. In the first part, Chapters2-4, illustrates the use of ML to study quantum physics. In the second part ofthe thesis, Chapters 5 and 6, we study quantum walks and quantum simulators ofultracold atoms with large magnetic dipole.Chapter 2 contains the introduction to one of the most versatile supervisedlearning algorithms, GP regression. In this chapter, we also explain why GPsare a more accurate interpolation algorithm than neural networks (NNs) for lowdimensional problems. We interpolate the electronic energy for different spatialconfigurations of the formaldehyde molecule using GP regression.Chapter 3 introduces the novel idea of combining different ML algorithmswith the purpose of solving the inverse scattering problem. It also introduces theBayesian optimization (BO) algorithm to optimize functions without computing9gradients. The combination of BO and GPs can construct accurate PESs for quan-tum dynamics observables like the reaction probability for the H+H2 and OH+H2reactions.Chapter 4 explores the hypothesis that by extrapolating quantum observablesone can discover new phases of matter. Illustrating that GPs with a combinationof simple kernels can extrapolate quantum observables and predict the existenceof new phases of matter. This chapter studies the evolution of the polaron dis-persion as a function of Hamiltonian parameters and shows that the change in theground state momenta leads to a new phase of matter. Additionally, using the samealgorithm one can accurately extrapolate quantum observables where traditionalnumerical methods lack convergence.Chapter 5 demonstrates the possibility to enhance the spread of quantum walksfor various graphs by allowing the number of walks to be a non-conserved quantity.Illustrating that spread of a quantum walk with number-changing interactions issupra-linear at short times, O(tn) where n > 1. We also show that for disorderedgraphs the spread of a quantum walk is larger when number-changing interactionsare considered in the Hamiltonian.Chapter 6 is dedicated to an experimental proposal to study extended Bose-Hubbard models using highly magnetic atoms trapped in optical lattices. The pro-posal presented uses Zeeman excitations to tune the parameters to construct variousBose-Hubbard type models.10Chapter 2Gaussian ProcessesSometimes people have to create out of pure necessity.— Ben ShewryMachine learning (ML) field is divided into three major areas; supervised, un-supervised, and reinforcement learning [20, 132, 181]. Each of these three fieldsstudies a particular task. In the case of supervised learning, the goal is to find thenumerical mapping between an input xi and an output yi. The numerical mappingcan be viewed as, yi = f (xi). When the output value yi is discrete, the problemis known as classification. On the other hand, when yi is continuous is known asinterpolation. This chapter describes one of the most common supervised learningalgorithms Gaussian Process (GP) regression [147].2.1 IntroductionAs in any other supervised learning algorithm, the goal of GP regression is toinfer an unknown function f (·) by observing some input, X, and output data, y.We denote D as the training dataset that contains both X and y. One of the fewdifferences between GP regression and other supervised learning algorithms, likeNeural networks (NNs), is that GPs infer a distribution over functions given thetraining data p( f |X,y).Gaussian Processes (GPs) are a probabilistic method and assume that eachtraining point is a random variable and they have a joint probability distribution11p( f (x1), · · · , f (xN)). As its name suggests, GPs assume that the joint distributionis Gaussian and has a mean µ(x) and covariance K(x,x′). The matrix elementsof the covariance are defined as Ki j = k(xi,x j) where k(·, ·) is a positive definedkernel function. The kernel function plays a key role as it describes the similarityrelation between two points. A GP is denoted as,f (x)∼ GP(µ(x),K(x,x′)) (2.1)In the following sections, we describe the training and prediction of GP regression.2.2 Gaussian Processes: predictionAs it was mentioned above, a GP is the collection of random variables f (xi) thatfollow a joint distribution. The joint Gaussian distribution between D , x∗, andf (x∗) is of the form,(f (X)f (x∗))∼N((µµ∗),(K K∗K>∗ K∗∗))(2.2)where K is the covariance matrix of the training data X , K∗ is the covariance matrixbetween X and x∗, and K∗∗ is the covariance with respect to itself.The probability distribution over f (x∗) can be inferred by computing the condi-tional distribution given the training data, p( f∗|x∗,D). Conditioning on the trainingdata is the same as selecting the distribution of functions that agree with observeddata points y. The mean and covariance matrix of the conditional distribution are,µ(x∗) = K(x∗,X)>K(X ,X)−1y (2.3)σ∗ = K(x∗,x∗)−K(x∗,X)>K(X ,X)−1K(X ,x∗) (2.4)where σ∗ is the predicted variance for x∗. The mean of the conditional distribution,equation 2.3, can be rewritten as,µ(x∗) = ∑id(x∗,xi)yi =∑iαik(x∗,xi) (2.5)12where d=K(x∗,X)>K(X ,X)−1 and α =K(X ,X)−1y. The mean of the conditionaldistribution is a linear combination of the training data, y, or a linear combinationof the kernel function between the training points and x∗. Function d(·, ·) can beunderstood as a distance function. It is important to state that the accuracy of theprediction with GPs directly depends on the size of the training data N and thekernel function k(·, ·). In Section 2.4 we illustrate the impact that different kernelson GPs’ prediction.In the GPs’ framework, the predicted variance or standard deviation of x∗ rep-resents the uncertainty of the model, Equation 2.4. The uncertainty can be usedto sample different regions of the function space to search the location of the min-imum or maximum of a particular function. This is the idea behind a class ofML algorithms known as Bayesian optimization (BO), which will be discussed inChapter 3.An example of GP regression is illustrated in Figure 2.1 where f (x) = x10 +x2 + 10sin(32 x). We use the exponential squared kernel and 7 training points. Inthe following section, we describe the most common procedure to train GPs.2.3 Gaussian Processes: trainingThe goal of any supervised learning algorithm is to infer a function f (·), as ac-curately as possible, given some example data. In order to quantify the accuracyof a model we define a loss function, L , for example, the difference betweenthe prediction yi and the real value yˆi (training points) such as L ≈ ∑Ni (yi− yˆi)2or L ≈ ∑Ni |yi− yˆi|. The parameters of the model w and L are interconnected.To illustrate this idea let us assume that f (·) is a simple linear regression model,f (x) = a+bx. The loss function for such model is,L =N∑i(yi− yˆi)2 =N∑i(a+bxi− yˆi)2. (2.6)From the previous equation, we can observe that the value ofL depends on a andb. It can be argued that when L is large f (xi) 6≈ yˆi. On the other hand whenf (xi) ≈ yˆi the value of L will tend to zero. Using a loss function to tune theparameters of f (·) is known as the training stage in ML. It must be mentioned135.0 2.5 0.0 2.5 5.0x2010010203040f(x)f(x)ObservationsPrediction95% confidence intervalFigure 2.1: Interpolation using a GP model trained with 7 points and an ex-ponential squared kernel. The black dashed line is f (x) = x10 + x2 +10sin(32 x). The solid blue line is the prediction of the GP model, Equa-tion 2.5. The grey shaded area is the standard deviation of the predictedmean of the GP model, Equation 2.4. The blue square symbols are thetraining data.that replicating the training data could also mean that the model “memorized” thetraining data. This common problem in ML is known as overfitting.GPs models can also be trained using a loss function. GP models are non-parametric models, therefore, the dimensionality of the loss function depends onthe number of the parameters of the kernel function. Using a loss function todetermine the optimal value for the kernel parameters for non-parametric modelsis computationally expensive and is prone to overfitting [132, 147]. However, itis possible to train GP methods without a loss function. The most common wayto train GPs is by finding the kernel parameters (θ ) that maximize the marginal14likelihood function,θˆ = argmaxθp(y|X ,θ ,Mi) (2.7)where p(y|X ,θ ,Mi) is the marginal likelihood for a given model or kernel functionMi. Finding the value of θ where p(y|X ,θ ,Mi) is maximized is known as type IImaximum likelihood (ML-II) [132, 147]. The marginal likelihood or evidence isdefined as,p(y|X ,θ ,Mi) =∫p(y|X , f,Mi) p(f|θ ,Mi)df. (2.8)In the case of GPs, the marginal likelihood has a closed form, Equation 2.9.Finding the kernel parameters that maximize the marginal likelihood can be doneby maximizing the logarithm of the marginal likelihood w.r.t to θ ,log p(y|X ,θ ,Mi) =−12y>K−1y− 12log |K|− N2log2pi (2.9)where the first term is known as the data-fit term, the second term reflects thecomplexity of the model Mi, and the third term is a constant that depends on thenumber of training points, N. The value of log p(y|X ,θ ,Mi) mainly depends onthe data-fit and complexity terms. For instance, for a θ that memorizes the train-ing data the value of log |K| is large, while y>K−1y will be small. The tradeoffbetween the data-fit and the complexity term is key for the optimal value of thekernel parameters.Standard gradient-based optimization algorithm is the most common methodto find the optimal value of the kernel parameters. The logarithm of the marginallikelihood has an analytic form, therefore, it is possible to compute the change oflog p(y|X ,θ ,Mi) as a function of the kernel parameters, ∂ log p(y|X ,θ ,Mi)∂θi . For mostcases, the function log p(y|X ,θ ,Mi) is not a convex function; thus there is a possi-bility that the optimization algorithm gets trapped in one of the local maxima [147].For GP regression, the value of θ does depend on the training data y, and havingdifferent training data sets could lead to different values for the kernel parameters.In the following section, we explain the use of GPs as an interpolation tool to fit15multidimensional functions. In Chapter 3, we illustrate that two-tiered GPs mod-els are capable of solving the so-called inverse scattering problem. In Chapter 4,we also illustrate the use of GP regression to predict beyond the training data todiscover phases of matter.2.4 Kernel functionsIn the previous sections, we explained how a GP model is trained and also how itcan make predictions. We also assumed that GPs need a kernel function in order toconstruct the covariance matrices K and K∗. In this section, we introduce variouskernels that are used for training GPs and illustrate how prediction with GPs candrastically change depending on which kernel is used. As mentioned previously,the kernel function should describe the similarity between two points. In kernelregression, two points that are similar, under some metric, should have a similaroutput value yi, predicted by Equation 2.5.The kernel function k(·, ·) must be a positive definite function; this restrictioncomes from the requirement to invert the covariance matrix during training andprediction see, Equations 2.5 and 2.9. The covariance matrix must be a symmetricmatrix which forces the kernel function to be symmetric too, k(x,x′) = k(x′,x).The Cholesky factorization is the most common algorithm to invert matrices withO(N3) complexity where N is the number of training points [147].All of the kernels that are used in this thesis are stationary kernels except for thelinear kernel. Any stationary kernel can be rewritten in terms of the Mahalanobisdistance,r2(x,x′) = (x−x′)>Λ(x−x′) =d∑i`i(xi− x′i)2 (2.10)where the Λ is a diagonal matrix with `i being a particular length-scale for eachdimension. The Mahalanobis distance reduces to the square of the Euclidian dis-tance when all `i = 1, r2(x,x′) = (x−x′)>(x−x′). Furthermore, when all `i havethe same value, the kernel function is an isotropic kernel. In the following sections,we explain some of the most common kernels that are used in GP models.162.4.1 Constant kernelThe constant kernel, arguably the most simple kernel, assumes that the similarityrelation between two points is constant,kC(r) = ` (2.11)The kernel parameter ` can be optimized by maximizing the logarithm of themarginal likelihood. kC(·) is a one-time differentiable function.2.4.2 Square exponential kernelThe square exponential (SE) kernel, also known as the radial basis function (RBF)kernel, is probably the most used kernel in GP regression. The SE kernel is astationary kernel since it depends on the difference between two points,kSE(r) = exp(−r2) . (2.12)The kernel parameters of the SE kernel are the matrix elements of the Λ matrix.Each `i defines the characteristic length-scale for each dimension. Depending onthe user, the SE kernel can be isotropic, all `i have the same value, or anisotropic,each `i has a different value. As it was mentioned in the previous section, the valuesof Λi are optimized by maximizing the logarithm of the marginal likelihood. Thetraining of an isotropic SE kernel is faster since the total number of parameters inthe kernel is one, while anisotropic SE kernels have d parameters, d is the dimen-sion of x. The SE kernel is infinitely differentiable. Figure 2.2 illustrates the squareexponential kernel and GPs’ prediction with SE kernel as a function of different `s.2.4.3 Matern kernelThe Matern (MAT) kernels are probably the second most used kernel for GPs,kMAT (r) =21−νΓ(ν)(√2νr`)νKν(√2νr`)(2.13)173 2 1 0 1 2 3x x′ SE(x,x′)Square exponential kernel= 0.1=0.5=1.0=1.5(a)4 2 0 2 4x100102030f(x)f(x)= 0.883=1.134=1.253Observations(b)Figure 2.2: a) The square exponential function with different length scales.b) Prediction of a GP model with the SE kernel using different lengthscale values.where ν and ` are both positive constants, Γ(·) is the gamma function, and Kν is amodified Bessel function. In the limit of ν → ∞ the Matern kernel reduces to theSE kernel. The two most common Matern kernels are for ν = 32 and ν =52 ,kν= 32 (r) =(1+√3r)exp(−√3r)(2.14)kν= 52 (r) =(1+√5r+53r2)exp(−√5r)(2.15)when ν = 32 the Matern kernel is one-time differentiable, while for ν =52 theMatern kernel is twice differentiable. Both kernels, Equations 2.14 and 2.15, areanisotropic kernels if r is computed with the Mahalanobis distance. Figure 2.3 il-lustrates the Matern function and GPs’ prediction with Matern kernel for different`s and ν = 32 or ν =52 . Figure 2.3 illustrates the MAT kernel and GPs’ predictionwith these kernels for different values of the kernel parameters.183 2 1 0 1 2 3x x′ 2(x,x′ )Matern kernel, = 32=0.1=0.5=1.0=1.5(a)4 2 0 2 4x100102030f(x)= 32f(x)= 0.736=1.213=1.638Observations(b)3 2 1 0 1 2 3x x′ 2(x,x′ )Matern kernel, = 52=0.1=0.5=1.0=1.5(c)4 2 0 2 4x100102030f(x)= 52f(x)= 0.692=1.14=1.539Observations(d)Figure 2.3: a,c) The Matern kernel function for different length scales andν = 32 (a) or ν =52 (c). b,d) Prediction of a GP model with the MATkernel using different length scale values and ν = 32 (b) or ν =52 (d).2.4.4 Rational Quadratic kernelThe Rational quadratic kernel (RQ) is also a stationary kernel,kRQ(r) =(1+r22α`2)−α(2.16)where α and ` are the kernel parameters. In the case where α → ∞ the RQ kernelis identical to the SE kernel. r is computed with the Euclidian distance. Figure 2.4illustrates the RQ kernel and different GPs’ prediction for various α and ` values.193 2 1 0 1 2 3x x′ RQ(x,x′ )Rational quadratic kernel=0.1, = 0.2=0.1, = 1.0=1.0, = 0.2=1.0, = 1.0(a)4 2 0 2 4x402002040f(x)=101.061f(x)= 0.931=1.536=2.532Observations(b)Figure 2.4: a) The rational quadratic function with different ` and α values.b) Prediction of a GP model with the RQ kernel using different valuesfor ` and α .202.4.5 Periodic kernelNone of the previously mentioned kernels are capable of describing periodic func-tions. Periodicity can be described by trigonometric function like cos(x), sin(x), cos2(x)or sin2(x). Since any kernel function must be a semi-positive define function,cos2(x) and sin2(x) are the only capable trigonometric functions that can be usedas kernel functions. The periodic (PER) kernel has the form,kPER(r) = exp−2sin2(pirp)`2 (2.17)where p and ` are the kernel parameters. p describes the intrinsic periodicity inthe data and ` is the length-scale parameter. Figure 2.5 shows the exponential sinesquared function for different ` and p values, and also how GPs’ prediction changesfor different ` and p. PER(x,x′ )p=0.5=0.1 =0.5 =1.0p=12 1 0 1 2x x′ PER(x,x′ )p=52 1 0 1 2x x′p=10(a)4 2 0 2 4x3020100102030f(x)p=8.11f(x)= 0.585=0.965=1.591Observations(b)Figure 2.5: a) The exponential sine squared function with different lengthscales and different periodicities. b) Prediction of a GP model with thePER kernel using different values for ` and p = Linear kernelThe linear kernel (LIN) is a non-stationary kernel and is also known as the dotproduct kernel,kLIN(x,x′) = x> x′+σ =d∑ixix′i+σ (2.18)where σ is the offset value of the kernel. If σ = 0 the linear kernel is considered tobe homogenous. A more general form of the linear kernel is, kLIN(x,x′) = x>Λ x′,where the matrix Λ is a diagonal matrix with unique length-scale parameters foreach dimension in x. The linear kernel is the base of the polynomial kernel,kPOL(x,x′) =(kLIN(x,x′))p=(x> x′+σ)p= σ +(d∑ixix′i)p(2.19)where p is the polynomial degree. Figure 2.6 shows the prediction of GP regressionwith the LIN kernel with different polynomial degrees.5 0 5100102030f(x)p=1f(x)= 0.779=1.0=1.1055 0 5x100102030f(x)p=25 0 5x100102030f(x)p=3Figure 2.6: Prediction of a GP with the LIN kernel using different σ andpolynomial degree.22In the following section, we illustrate the use of GP models to interpolate mul-tidimensional functions. We compare the accuracy of GP regression with NNs, themost well-known ML algorithm for supervised learning.2.5 Potential Energy SurfacesOne of the greatest challenges in computational chemistry is describing the changeof the electronic ground state energy, Eelec, as a function of the position of theatoms. For each Eelec we need to solve the Schro¨dinger equation where the Hamil-tonian (Helec) describes a system of an ensemble of N electrons and M nuclei withfix positions. This Hamiltonian is known as the electronic Hamiltonian,Helec =−N∑i=112∇2i −N∑i=1M∑A=1ZAriA+N∑i=1N∑j>i1ri j, (2.20)where the first term is the kinetic energy of electrons, the second term is theCoulomb interaction between electrons and nuclei, and the last term is the electron-electron interaction. ZA is the atomic number of nucleus A. The eigenvalues ofHelec parametrically depend on the positions of the nuclei because of the electron-nucleus interaction, the riA terms in Equation 2.20.Finding the solutions of Helec is still one of the most challenging problemsin quantum chemistry. In 1998, the Nobel committee for chemistry laureated W.Kohn and J. A. Pople for their contributions in the field of quantum chemistry. W.Kohn is the father of Density functional theory (DFT), which is one of the mostknown methodologies for solving Equation 2.20, and J. A. Pople is the pioneer inthe development of various computational methods in quantum chemistry. Bothscientists dedicated their research to find the solutions of the electronic Hamilto-nian. It is out of the scope of this thesis to explain or propose new methodologiesdedicated to computing the eigenvalues and eigenvectors of the electronic Hamil-tonian. Instead, we propose the use of machine learning to reduce the overallcomputational resources needed to study problems in quantum chemistry.As it is stated above, the electronic energy depends on the positions of thenuclei. For instance, two hydrogen atoms have different energies for different in-teratomic distances; quantum chemists call this function a potential energy sur-23face (PES). Figure 2.7 illustrates the prediction of the hydrogen molecule’s energyfor different interatomic distances using GP regression.A more general definition of a PES is the function that describes the electronicenergy of a system for different spatial configuration of the atoms,Eelect = f (R0, · · · ,RM) (2.21)where Ri is the position of the nucleus i. The function f (·) can be any quantumchemistry method used to evaluate the electronic energy for a given configurationof atoms, Eelec ∼ 〈Helec〉. Trying to infer f (·) from some available data is thedefinition of a supervised learning problem.0.5 1.5 3.0 4.0 5.550510E(eV)ExactPredictionObservations95% confidence interval0.5 1.5 3.0 4.0 5.5RH H (Bohr)50510E(eV)0.5 1.5 3.0 4.0 5.5RH H (Bohr)50510Figure 2.7: H2 molecule’s energy for different interatomic distances pre-dicted with a GP model trained with 6 points and the SE kernel. Theexact PES (dashed black line) is from reference [23].Using ML or any other regression model to infer f (·) is not a novel idea[133, 165, 183]. In 2006 Manzhos et al. used NNs to fit PESs [122, 124]. Theuse of GP regression to construct PESs was introduced by A. P. Barto´k et al. inRef. [14] to study bulk crystals where the PESs are used in the calculation of var-24ious properties at high temperatures. Cui et al. interpolated the PES of the N4molecule [52] using GP regression. However, ML has rapidly evolved in recentyears with the creation of new algorithms that could reduce the number of trainingpoints needed to make more accurate fits for various PESs [13, 15, 78, 169].The prediction of bond-breaking energies, unimolecular reactions, vibrationalspectra, and reaction rates are a few of the observables that depend upon the accu-racy of PES fits [133, 165, 183]. Being able to interpolate the energy of a systemfor different molecular geometries can also give synthetic chemists informationsuch as reaction mechanisms or transition states [186]. Generally speaking, PESsplay a crucial role in chemistry.In the following section, we present the results obtained using NNs and GPregression to predict the PES for the H2CO molecule. We also discuss the impactthat different kernels have on GP regression and the number of neurons in NNs.Additionally, we summarize the impact the training points have on the test errorfor both algorithms.2.5.1 Results2.5.2 Gaussian Processe regression vs Neural NetworksNNs and GP regression are two of the most known supervised learning algorithms[132, 159]. Over the course of this section, we summarize the results obtainedwhen NNs and GP regression both are used to fit the PES of the H2CO molecule.We also discuss the importance of the training data size for both algorithms andexplain under which circumstances which algorithm is more accurate for interpo-lating PESs. The results presented in this section are published in reference [102].The PES of Formaldehyde is a 6D function where the six coordinates are,• CO→ distance between the Carbon and Oxygen atoms.• CH1 → distance between the Carbon atom and one of the Hydrogen atoms.• CH2 → distance between the Carbon atom and the other Hydrogen atoms.• ∠OCH1 → angle between the Carbon, Hydrogen and Oxygen atoms.25• ∠OCH2→ angle between the Carbon, second Hydrogen and Oxygen atoms.• ∠H2OCH1 → dihedral angle between the OCH planes.We consider a data set of 120,000 energy points for the H2CO molecule ob-tained from reference [36]. Figure 2.8 illustrates the distribution over the energyvalues of the complete data set. Each training data set used to train every NN or GPmodel is sampled using the Latin hypercube sampling (LHS) to ensure the pointsare efficiently spread [126].0 20 40 60 80 100 120 140 160 180 0 660 1320 1980 2640 3300 3960 4620 5280 5940 6600 7260 7920 8580 9240 9900 10560 11220 11880 12540 13200 13860 14520 15180 15840 16500 count	  energy,	  cm-­‐1	  Figure 2.8: Distribution of energy values of the PES dataset. Figure fromreference [102]NNs are a powerful and complex supervised learning algorithm [132]. Weconsider one of the simplest NNs architectures, single layer NN with sigmoid ac-tivation functions. Even in the limit of a single layer of neurons, NNs can be usedas an interpolation algorithm. We study the effect that the number of neurons hasin the accuracy of single layer NNs. Each NN is trained using 200 epochs withthe Levenberg-Marquardt algorithm. All the results that are reported here were ob-tained using MATLAB’s Neural Network ToolboxTM. The data used to train NNsis scaled to [0,1]. The results obtained using NNs are summarized in Table 2.1.GPs are a robust supervised learning algorithm; we test their accuracy by con-sidering different kernel functions and various number of training points. The op-timization of the kernel parameters is carried out by maximizing the logarithm of26Table 2.1: RMSE (test errors computed on 120,000 points) of the PES ob-tained with the NNs for different training points (Npts). The number inthe parenthesis are the number of neurons used in that particular NN.〈NN〉10 is the average RMSE for 10 NNs with different sets of Npts. Thevalues are in cm−1.Npts NN 〈NN〉10313 198.00(20) 103.93(30) 87.77(40) 119.11(20) 53.97(30) 43.90(40)625 21.12(50) 12.91(75) 12.03(100) 13.36(50) 7.52(75) 6.53(100)1250 9.29(70) 5.74(100) 4.38(150) 5.74(70) 3.36(100) 2.54(150)2500 4.59(100) 2.43(150) 1.12(250) 2.27(100) 1.23(150) 0.86(250)Table 2.2: RMSE (test errors computed on 120,000 points) of the PES ob-tained with GP regression. All GPs use the SE kernel function. Npts isthe number of training points. 〈GP〉10 is the average RMSE for 10 GPmodels with different sets of Npts. The values are in cm−1.Npts GP 〈GP〉10313 29.09 17.18625 5.98 3.871250 2.17 1.132500 1.08 0.62the marginal likelihood, Section 2.3. All GP models are constructed with scikit-learn’s Guassian Process Regression library [137]. Interpolation results using GPsare summarized in Table 2.2.As discussed, using different kernel functions leads to different accuracies inthe predictions of GPs. For example, the RMSE of a GP model with the MAT(ν =52) kernel and 2000 training points is 43.63 cm−1, while with the same numberof training points but using the RQ kernel the RMSE is 49.87 cm−1. The RMSEfor both GPs is computed over 30,000 points. The kernel function is not the onlyvariable that GPs’ accuracy is sensitive to, different representations of the data alsomake GP regression more or less accurate. For instance, if we train a GP modelwith inverse of bond-lengths, like CO, CH1 and CH2, the RMSE decreases to 29.95cm−1 using the MAT(ν = 52) kernel and 2000 training points.NNs are a parametric model, meaning that the number of parameters is fixed,unlike GP regression. We consider a single layer NN, which only single hyper-27parameter is the number of neurons. By increasing the number of neurons we areable to decrease the RMSE, Table 2.1. NNs become a more robust interpolationalgorithm when the number of neurons increases which means more parameters.However, we also notice that a large number of neurons does not necessarily reducethe RMSE. For example, a NN trained with 625 points but 75 and 100 neurons havesimilar RMSEs. When the number of parameters for NN is large more trainingpoints are needed for better training a NN. The total number of parameters of asingle layer NN is (d+1)∗nH , where d is the dimensionality of the problem andnH is the number of neurons. A well-known problem in NNs is that for a fixednumber parameters, more training data could lead to a better optimization whichmeans a lower RMSE. We must remember that the loss function to train NNs isan m−dimensional function where m is the number of parameters, whereas thedimensionality of the cost function for GPs is the number of kernel parameters.Consequently, GP models are often easier to train than NNs.One of many applications of fitting PESs is the ability to predict the vibrationalspectrum of molecules. We compute the error of the predicted vibrational spec-trum using the NNs and GPs trained to interpolate the PES of the H2CO molecule.The vibrational spectrum is determined using space-fixed Cartesian kinetic energyoperator and Gaussian basis functions (SFGB) [123]. Figure 2.9 shows that theGPs fit clearly outperforms the NNs fit. The RMSE between the accurate vibra-tional frequencies and the ones computed using a GPs fit is 0.05 cm−1, while NNs’RMSE is 0.30 cm−1, when both models are trained with 625 points. GP models arenot only more accurate than NNs in the interpolation of PESs but also in predict-ing vibrational spectra. GP regression also offers the advantage of requiring fewertraining points than NNs.2.6 SummaryWe have shown that it is possible to fit PESs with supervised learning. Furthermore,we compared two of the most common, novel regression tools, NNs and GP mod-els, to interpolate the energy for a spatial arrangement of atoms. We demonstratedthat GP models are a more accurate interpolation tool over NNs for low dimen-sional systems. Additionally, we explored the impact that the number of training28	  0.25	  0.00	  0.25	  0.50	  0.75	  1.00	   2.00	   3.00	   4.00	   5.00	   6.00	  |error|,	  cm-­‐1	  transi8on	  energy,	  103	  cm-­‐1	  1	  NN	   <10	  NN>	  	  1	  GP	   <10	  GP>	  Figure 2.9: Absolute errors in transition frequencies computed on PESs fittedto 625 points with different methods (NN: top half, GP: bottom half) vsthe spectrum computed on the reference PES. Figure from reference[102]points has on the prediction’s accuracy for both methods, NNs and GP regression.Vibrational spectra computed using both regression methods were compared as asecond test to determine which method is more accurate with GPs still outperform-ing NNs.The computational time required to evaluate a single point (x∗) depends onthe architecture of the models; for example, a single layer NNs’s prediction timescales as O(nHd + nH) where d is the dimensionality of the problem and nH arethe number of neurons. For GPs, because of the K−1 term in Equation 2.5 theprediction scales as O(N3) where N is the number of training points. However,inversion of the covariance matrix K−1 just needs to be done once since it does notdepend on x∗. In the limit of few training points, GPs’ prediction is faster than NNsonly if the number of neurons needed for accurate prediction is large.One of the unanswered questions in quantum chemistry is, which regressionmodel needs the least number of energy points to make accurate predictions? Forsome systems, each energy point used to train a regression model may requirehigh computational resources and the resources needed to have a large number ofpoints is a problem. We argue that for low dimensional systems GPs are accurate29interpolation algorithms that require less training points, Tables 2.1 and 2.2.The prediction accuracy of single layer NNs depends on the number of neurons.In Table 2.1 we showed that for a fixed number of training points the RMSE of asingle layer NN decreases as nH increase. NNs with a large number of neuronsrequires more training points in order to become accurate; for example the NN’sRMSE with nN = 100 and train with 625 and 1250 points changes from 12.03 cm−1to 5.74 cm−1. We must remember that the dimensionality of the loss function usedto train single layer NNs directly depends on the number of neurons, dim(LNN) =nHd + nH . The minimizer of LNN represents the best parameters of, for the NN.The dimensionality of the negative log-marginal likelihood, GPs’ loss function,depends only on the number of parameters in the kernel. If we consider an isotropickernel the dim(LGP) = 1, whereas if we consider an isotropic kernel dim(LGP) =d. Given these points, GP models can be trained more accurately partly becausethe low dimensionality of their loss function which makes the minimization moreefficiently.The accuracy of fitted PESs can also be evaluated by computing some physicalobservables that depend on the PES, for instance, the vibrational spectrum of amolecule. We compute the absolute difference between the accurate vibrationalspectrum of Formaldehyde and the one predicted using a PES trained with NNsor GPs. Figure 2.9 illustrates that the vibrational spectrum predicted using a GPstrained with only 625 points and the SE kernel is more accurate than with a NNwith 100 neurons.In recent years, ML algorithms have shown the potential to tackle high-dimensioncomplex problems like playing the Go game, self-driving cars, among others, mak-ing it a popular/novel tool in many scientific areas like physics and chemistry. Inthis chapter, we discussed how using 625 energy points and two of the most simpleML algorithms, it is possible to reduce the computational time to study many-bodyphysics like the prediction of the vibrational spectrum of a molecule. It shouldbe noted that the architecture of the NNs used in our research is not the NN thatdefeated Lee Sedol and future research should be done on how deep-NNs [72],more than a single layer of neurons, can help chemists to construct the PESs formolecules or proteins where GPs cannot be used.30Chapter 3Bayesian OptimizationTo those who do not know mathematics it is difficult to get across areal feeling as to the beauty, the deepest beauty, of nature. If you wantto learn about nature, to appreciate nature, it is necessary tounderstand the language that she speaks in.— Richard P. FeynmanThe optimization of machine learning algorithms is one of the most importantresearch areas since the parameters of each supervised learning algorithm needsto be trained [177]. However, the same tools can be used to optimize physical-chemistry problems. Over the course of this chapter, we introduce one of the mostnovel ML algorithms to optimize black-box functions known as BO. Using BO andGPs, we present an innovative algorithm that allows us to fit PESs with accuratequantum dynamics observables. For the H + H2 → H2 + H system we show thata GP model trained with 30 points is enough to predict accurately the reactionprobability for this system. We also illustrate that the same algorithm works forhigher dimensional systems like OH + H2 → H2O + H, where a GP model trained290 points fits a PES that again predicts accurate reaction probabilities.3.1 IntroductionIn Chapter 2, we introduced two of the most common supervised learning algo-rithms, NNs and GP models. We compared the accuracy of both methods by inter-polating the PES of the H2CO molecule when different number of training points31were used. We conclude that GP models need fewer training points in order to ac-curately interpolate the PES of the H2CO molecule. Also, we discussed that for lowdimensional problems, GP models are more accurate than NNs. Furthermore, inthis chapter, we exemplify how GP models can also be used to solve optimizationproblems.Here we present again the mean (µ∗) and standard deviation (σ∗) of GPs usedin the prediction,µ(x∗) =n∑id(x∗,xi)yi =n∑iαik(x∗,xi) (3.1)σ(x∗) = K(x∗,x∗)−K(x∗,X)>K(X ,X)−1K(X ,x∗) (3.2)where K(·, ·) is the covariance matrix with matrix elements Ki j = k(xi,xi) and k(·, ·)is the kernel function. The values of the kernel parameters are optimized by maxi-mizing the log-marginal likelihood function, Section 2.3.The goal of an optimization problem is to find the best solution among allpotential solutions. In the field of ML, the optimization problem is associatedwith the search for the values of the parameters of a model that better describedthe problem, e.g. minimizing a loss function [132, 177]. Synthetic chemistry alsohas optimization problems, for example varying the reaction conditions to increasepercent yield. The optimization of a function is also a supervised learning problem,but instead of finding the best global representation of function f (·), the goal is tofind the x where f (·) is minimum,x∗ = arg minxf (x). (3.3)The most common optimization algorithm for continuous functions is gradientdescent (GD) [132]. GD algorithm is designed to minimize a function iteratively bydisplacing the current point in the direction of the negative gradient of the function,xn+1 = xn−η∇ f (xn) (3.4)where the parameter η is known as the learning rate. η is also related in the trade-off between exploitation and exploration and plays a key role in the convergence32of the algorithm [132]. For example, when η is small GD is exploiting xn+1 ≈ xn;where as for η  0 is related to exploration. GD is one of the first optimizationalgorithms used to train NNs [158].GD has been a widely successful optimization algorithm. However, not everyfunction can be optimized using GD. For example, there is no analytic functionthat describes the relation between the percent yield given some experimental con-ditions for a chemical reaction, therefore one can not use GD to increase the percentyield. There are many other problems that are described by non-analytic functionsor black-box functions, where evaluations are point-wise. BO is designed to tacklethe optimization of black-box functions where gradients are not available. For ob-vious reasons, trying to find the minimum of f (·) by randomly sampling is not thesmartest strategy, since it may take a large number of evaluations from f (·) beforefinding the minimum. BO tries to infer the location of the minimum of a black-boxfunction by proposing a smarter iterative sampling scheme. In the case of GD weassume that the gradient gives us the information of where to sample the next pointin order to get closer to the minimum. Considering that black-box functions do nothave a gradient, it is necessary to propose a metric that quantifies the informationalgain as a function of the space. The core of BO relays in two components,1. F (·)→ model that mimics the black-box function.2. α(·)→ acquisition function that quantifies the information gain for a givenpoint.To mimic the unknown function f (·) we can use any supervised learning al-gorithm, like NNs. However, if F (·) is not capable to learn at every iteration, wemay waist some of the evaluations because of the lack robustness of the model. GPmodels are a great candidate forF (·) due to the accuracy and robustness to inter-polate any continuous function. Additionally, the ability of GP models to quantifythe prediction’s uncertainty σ(x)without the need of extra data is what makes themthe strongest candidate for BO.Figure 3.1 illustrates how BO works to find the minimum of f (·) without usinggradients. The maximum of the acquisition function is the query point where theblack-box function is evaluated next, f (xn+1), and at each iteration we add the33new point xn+1 to the training data and retrain the GP model. Algorithm 1 isthe pseudocode of BO [25, 170]. In Section 3.2 we explain different acquisitionfunctions that are used in BO.Algorithm 1 Bayesian optimizationInput: Acquisition function α(·), black-box function f (·), data set D .1: for n = 1,2, . . . , do2: Optimize the acquisition function,xn+1 = arg maxxα(x,D)3: Evaluate f (xn+1).4: Augment data Dn+1 = {Dn,(xn+1, f (xn+1))}.5: Update model.3.2 Acquisition functionBO is an optimization algorithm designed for problems where gradients are notavailable. As it was mention above, the acquisition function is designed to repre-sent which point in the space has the most information. By iteratively evaluatingthe black-box function where the acquisition function is maximum we learn a morecertain representation of f (·) where the minimum could be. There are many dif-ferent acquisition functions, here we cover the three most used,1. Probability of improvement (PI)2. Expected Improvement (EI)3. Upper confidence bound (UCB)3.2.1 Probability of ImprovementIn 1964 H. Kushner proposed as an acquisition function to maximize the probabil-ity of improvement, the probability when f (x)> τ [112]. H. Kushner showed thatif f (x) is Gaussian distributed, P( f (x)> τ) can be written as,αPI(x;τ) := P( f (x)> τ) =Φ(µ(x)− τσ(x))(3.5)3410520f(x)f(x)ObservationsPrediction5 0 5010(x)aAcqusition function10520f(x)5 0 5x01025(x)b5 0 5xcFigure 3.1: Example of the BO algorithm over three iterations. Blue markersare GP’s the training data, blue curves are GP’s prediction and blackdashed line is the black-box function f (x) = x10 + x2 + 10sin(32 x). Theyellow solid curves are the acquisition function, andH are the maximumof the acquisition function at each iteration, xn+1.where Φ(·) is the normal cumulative distribution function, µ(·) and σ(·) are thepredicted mean and standard deviation of a GP model trained with the data set Dn,and τ is the target value. Since the goal of BO is to find τ we can approximateit with the best known value in the set Dn, for example τ = maxi=1:Nyi. If yn+1 isgreater than the current value of τ , we update τ . PI is know as a greedy acquisitionfunction, however if we relax the value of τ by adding a constant, ε , we can makeexploratory moves. For example, in Figure 3.2 we illustrate how the maximum ofαPI(·) changes for different values of ε .35100102030f(x)f(x) Observations GP4 2 0 2 4x0.00.51.0PI(x) = 0 = 1 = 2Figure 3.2: Illustration of the PI acquisition during BO. We also show the PIfunction for different values of ε .3.2.2 Expected ImprovementThe expected improvement is one of the most known acquisition functions. Theimprovement is defined as the difference between the predicted point and the bestknown point (τ),I(x;τ) = max{0,µ(x)− τ} (3.6)and the expected improvement is defined as,E[I(x;τ)] =∫I(x)p(I(x)|τ,µ(x),σ(x))dx (3.7)where p(I(x)|τ,µ(x),σ(x)) is the probability distribution over the improvement. Ifwe consider that p(I(x)|τ,µ(x),σ(x)) is Gaussian distributed with a mean µ(x)−36τ , the expectation integral has a closed-form expression [25, 100],αEI(x;τ) ={0 if σ(x) = 0(µ(x)− τ)Φ(z(x;τ))+σ(x)φ(z(x;τ)) if σ(x)> 0 (3.8)where z(x;τ) = µ(x)−τσ(x) . µ(x) and σ(x) are the mean and standard deviation of aGP. Φ(·) is the normal cumulative distribution and φ(·) is the normal probabilitydistribution. Figure 3.3 illustrates the EI function.100102030f(x)f(x) Observations GP4 2 0 2 4x0510EI(x)Figure 3.3: Illustration of the EI acquisition during BO.3.2.3 Upper Confidence BoundThe last acquisition function that we present in this dissertation is the upper confi-dence bound (UCB) function,αUCB(x) = µ(x)+κσ(x) (3.9)37where µ(x) and σ(x) are the predicted mean and standard deviation of a GP model.The constant κ plays a key role since it describes the trade-off between explorationand exploitation. When κ is small the acquisition function relies more on the meanof the GP which is associated with exploitation, opposite to when κ  0 whichmakes αEI(·) ≈ σ(·) and we explore the input space. Figure 3.4 illustrates howthe κ parameter changes the sampling schedule of a black-box function. Therehave been many proposals on how to change the value of κ as a function of thenumber of iterations; for example, set κ to a large value at the beginning for moreexploration and reduce its value at the end for exploitation [25, 170, 178]. The UCBfunction can also be used with other ML algorithms that are capable of computingthe prediction’s uncertainty σ(·), like bayesian-NN [134, 172].100102030f(x)f(x) Observations GP4 2 0 2 4x2010010UCB(x) = 0 = 0.5 = 1Figure 3.4: Illustration of the UCB acquisition during BO. We also show howthe UCB function for different values of κIn the following section we present the results for fitting PES through the errorof quantum dynamic calculations. We also show that by combining GP regression38and BO one can solve the inverse scattering problem.3.3 ResultsThe main idea of Chapter 2 is to illustrate that PESs can be represented by a super-vised learning algorithm like NNs or GPs. We conclude that GPs are more accurateregression models when the training data is few and the dimensionality of the sys-tem is low; for instance, the RMSE of a GP and NN trained with 1250 points is1.13 cm−1 and 2.54 cm−1 restively, Tables 2.1 and 2.2. We also compare the vi-brational spectra of a PES fit with GP regression and NN with respect to the realspectrum, and our results demonstrate that when a PES is fitted using a GP modelthe vibrational spectra is also more accurate [102].PESs have a more profound meaning in the field of quantum molecular dynam-ics. PESs are used to reduce the computational complexity of quantum dynamicscalculations. PESs are also used to understand reaction mechanisms or to studytransition states. To obtain the PES yielding the most accurate description of theexperimental dynamical results, one could start with a rough PES based on a smallnumber of ab initio calculations and systematically improve the surface by one ofthe following two iterative procedures:(i)(ii)(iii)A(i)(ii)(iii)Bwhere (i) is computing the potential energy for a wide range of relative atomiccoordinates by an ab initio quantum chemistry method; (ii) fitting an analyticalrepresentation of the PES; (iii) integrating the Schro¨dinger equation for the motionof the atomic nuclei on this PES.In the first approach (cycle A), one would compute and add more ab initiopoints to the PES at each iteration, thus placing an emphasis on the parts of theconfiguration space most relevant for the dynamics. In the second approach (cycleB), one could attempt to solve the inverse scattering problem by first computingthe global PES and then modifying the analytical fit of the PES through an iterativeprocedure [61, 92, 127]. However, there are two problems that make these iterativeapproaches unfeasible in the application to quantum reaction dynamics. Firstly,39step (ii) above, i.e. finding the best analytical representation of the PES for multi-atom systems is extremely laborious, almost always requiring manual work [24,51, 79, 91, 133]. Secondly, step (iii) takes minutes to hours of computation time,which severely limits the number of loops in any of the optimization cycles above.Here we propose a more efficient optimization cycle,(i)(iii)Cwhere step (ii) can be easily eliminated by fitting PESs using GP models. Cycle Ccan be implemented for low dimensional reaction systems by means of a two-tieredGP regression. Where the first GP is used to fit PESs, and the second GP is usedto optimize the location and magnitude of the energy points to produce the PESwhich gives the best description of the observable.We consider two chemical reactions:H+H2→ H2+H (3.10)OH+H2→ H2O+H (3.11)and for all the results display in the following sections. We compute the reactionprobabilities using the time-dependent wave packet dynamics approach describedin Ref. [41, 55, 146, 190], explicitly accounting for all degrees of freedom. The ba-sis sets of the reaction dynamics calculations are chosen to ensure full convergenceof the dynamical results.Both of these chemical reactions have been studied before [42, 180]. For reac-tion (1), Su et al [180] computed the reaction probabilities with the 3D PES fromRef. [23], constructed using an analytical fit to 8701 ab initio energy points. Forreaction (2), Chen et al [42] computed the reaction probabilities with the 6D PESconstructed using NN fits to ∼17 000 ab initio calculations.In the following sections we highlight how it is possible to optimize cycle Cwhen the quantum dynamics results are and are not known, and how to overcomethe inaccuracy of the quantum chemistry calculations to get a better fit of any PESwith experimental data.403.3.1 Fitting known quantum dynamics resultsIn this section, we explain the algorithm used to optimize cycle C when the reactionprobabilities are known. First we randomly select a small number of energy points(n = 20 for the 3D surface and n = 280 for the 6D surface) from the original PESs[23, 42] and construct an approximate PES with a GP, Equation 3.1. We denote thisGP model of the surface by G (n). When n is small any regression model is likelyto be highly inaccurate and the quantum dynamics calculation with this surface isalso expected to produce highly inaccurate results, Figure 3.5.Given G (n), we then ask the following question: if one ab initio point is addedto the original sample of few points, where in the configuration space should it beadded to result in the maximum improvement of the quantum dynamics results?In principle, this question could be answered without ML by a series of quantumdynamics calculations based on G (n+ 1) with the added point moved around theconfiguration space. However, such an approach would be completely unfeasibleas it would require about 10d dynamical calculations for each added ab initio point,where d is the dimensionality of the configuration space. We also need to quan-tify the improvement gained by adding a single point, which can be done with anutility function f (·). Care must be taken since there are many ways to quantifythe improvement e.g. the RMSE function between a G (n)’s reaction probabilityand the exact value from either from a calculation with the full surface or from anexperiment.We seek the utility function’s minimum, arg min f (·), since it describes whichpoint shapes the PES in the most accurate manner. Finding the minimum of f (·)could not have been done with out using current ML algorithms like BO. Fig-ure 3.5a illustrates the change in the reaction probability as a function of a singlepoint added to a fix set of points. Depending on which point is considered inG (n+1) the improvement in the reaction probability changes, Figure 3.5b.As we state above, BO requires a model that mimics the utility function f (·).We denoted asF , the GP model that learns the utility function and is also used toconstruct the acquisition function that is needed in BO. This GP model is trained byapproximately 15× d quantum dynamics calculations. We use as utility function410.5 1.0 1.5 2.0Translational Energy (eV) ProbabilityAccurateInital(a)r10.2 0.4 0.6 0.8 1.0 1.2r20. 3.5: a) The reaction probability for the H2 + H → H + H2 reactionas a function of the collision energy. The black solid curve–accuratecalculations from [23]. The dashed curves–calculations based on GPsPESs obtained with 22+ 1 ab initio points, where the 22 initial points(solid blue curve) are fixed and the extra point is randomly sampled inthe configuration space. b) The error of the reaction probabilities as afunction of the location extra point added to the original 22 points.the root mean square error (RMSE),ε(xn+1) =√∑i(z(ei)− zˆ(ei))2 (3.12)where z(ei) is the value of the reaction probability at different collision energie eiwith a surface by G (n+1). xn+1 is the position of the added point and zˆ(·) is theaccurate reaction probability. For reaction (1) the values of zˆ(·) are reported inRef. [180] and for reaction (2) in Ref. [42]. Figure 3.5b displays the change in theRMSE for different locations of the added point to G (n+1).For both systems, reactions (1) and (2), we use the UCB acquisition function,Equation 3.9, with κ = 0.005. After a fixed number of iterations in the BO al-gorithm, 35 quantum dynamics calculations for H3 and 60 calculations for OH3,we choose the point where the utility function is minimum, and added to the nset. With this procedure, the minimization of F (·) for each value of n requiresabout 35 quantum dynamics calculations for reaction (1) and 60 calculations for42reaction (2). We carried iteratively the same algorithm until we converge to a PESthat has an accurate reaction probability. Sequentially adding the points that makesthe highest improvement corresponds to a class of ‘greedy’ reinforcement learningstrategies in ML [181].Figure 3.6a illustrates the performance of this algorithm in search of the bestPES for reaction (1). As can be seen, the starting model G (·) of the PES basedon 22 ab initio points produces highly inaccurate results, but the BO scheme con-verges to the correct PES after only 8 iterations. Accurate results for the reactionprobabilities (green dashed line) can be achieved with a GP model trained withonly 30 ab initio points. Figure 3.6b shows the model G (·) of the PES obtainedwith n = 30 ab initio points, illustrating that Equation 3.1 produce a physical sur-face. Figure 3.7 illustrates the dependence of BO as a function of the value of κ inαUCB(·).430.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00Translational Energy (eV) ProbabilityExact222330(a)R11. (eV)43210(b)Figure 3.6: a) The reaction probability for the H2 + H → H + H2 reactionas a function of the collision energy. The black solid curve – accuratecalculations from Ref. [23] based on the surface constructed with 8701ab initio points. The dashed curves – calculations based on the GPPES obtained with 22 ab initio points (blue); 23 points (orange), 30points (green) and 37 points (inset). The RMSE of the results with 37points is 0.009. b) The GP model of the PES for the H3 reaction systemconstructed with 30 ab initio points. This surface yields the quantumdynamics results shown by the green curve in the upper panel.4422 24 26 28 30 32 34 36Number of energy points0. = 0.01 = 0.005 = 0.001Figure 3.7: Convergence of the RMSE of the probabilities for H2 + H→ H +H2 reaction with the number of BO iterations as a function of κ in UCB.Figure 3.8 illustrates the performance of this algorithm for reaction (2). As thedimensionality of the configuration space increases, so does the number of pointsrequired to represent accurately the PES. Nevertheless, accurate results for the re-action probabilities (green dashed line) are obtained with 290 ab initio points, muchsmaller than the set of ∼ 17,000 points used in previous work [42] to construct thePES with a NN fit. The RMSE of the reaction probabilities thus obtained is 0.0076.Note that, as any supervised learning technique, this algorithm is guaranteed to be-come more accurate when trained with more ab initio points.3.3.2 Fitting without known quantum dynamics resultsIn the previous section we assume that Equation 3.12 accurately describes the im-provement when accurate dynamics results are known. However, for all systems,the quantum dynamics results may not be known before hand. Therefore, we pro-pose to use an utility function that quantifies the maximum improvement betweenthe observables computed with G (n+1) and G (n) at each iteration. This is justifiedby the observation that G (n→ ∞) must produce the best surface so the maximumimprovement of the surface at each iteration is achieved when xn+1 corresponds tothe maximum of Equation 3.12, where zˆ(·) are the results from G (n). The min-450.1 0.2 0.3 0.4 0.5 0.6Translational Energy (eV) ProbabilityExact280281290Figure 3.8: The reaction probability for the OH + H2→ H + H2O reaction asa function of the collision energy. The black solid curve – accurate cal-culations from Ref. [42] based on the surface constructed with ∼17000ab initio points. The dashed curves – calculations based on the GP PESobtained with 200 ab initio points (blue); 280 points (orange) and 290points (green). The RMSE of the 290-point result is 0.0076.imum of Equation 3.12 indicates that both G (n+ 1) and G (n) produce the samereaction probabilities.To illustrate the validity of this assumption, we show in Figure 3.9 a seriesof computations as functions of n, showing the convergence of the iterative cal-culations to the accurate results (black solid curve). Figure 3.9 shows that theoptimization loop converges to the accurate PES after 48 iterations. We emphasizethat the accurate results (black curve) were not used in any way in this calculation.Care must be taken since many of the G (n+ 1) could lead to unphysical reactiveprobabilities. To avoid this issue we penalized the utility function by decreasing itsvalue, preventing the BO algorithm from sample points from that space’s region.460.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00Translational Energy (eV) ProbabilityExact2260626570Figure 3.9: The reaction probabilities for the H2 + H→ H + H2 reaction asfunctions of the collision energy. The black solid curve – accurate calcu-lations from Ref. [23]. The dashed curves – the results of iterative cal-culations maximizing the difference between the reaction probabilitiesin subsequent iterations. The black curve is not used for these calcula-tions. The inset shows the agreement between the reaction probabilities(red symbols) based on the GP approach after 48 iterations (total of 70ab initio points) and the exact results.3.3.3 The inverse scattering problemIntegrating of Schro¨dinger’s equation for the motion of the atomic nuclei on anyPES is computationally demanding, step (iii); thus the computational resourcesneeded in quantum dynamic calculations increases as a function of the system’ssize. To overcome this problem, there is a proposal to infer the shape of a PESusing experimental quantum dynamical observables. This scheme is known as theinverse scattering problem; however, due to its complexity, this problem has notbeen fully solved for large systems. Unfortunately, it is impossible to compute thepotential energy in step (i) without errors and any theoretical predictions of observ-47ables are subject to uncertainties stemming from the errors of quantum chemistrycalculations. These errors become more significant for heavier atoms and are oftenunknown. Therefore, it would be ideal to develop an approach that either bypassesquantum chemistry calculations or corrects the errors of the ab initio calculations.This could be achieved by deriving the empirical PES from the experimental data[50, 61, 92, 127].Here, we extend the previous sections to construct a PES that, when used inquantum scattering calculations, reproduces an arbitrary set of observables. Wefirst modify the exact scattering results of Figure 3.6 by shifting along the energyaxis and randomly modulating the black curve. This produces an arbitrary energydependence of the reaction probabilities shown by the dot-dashed curve in Fig-ure 3.10a. The goal is to construct a PES that reproduces these arbitrarily chosenreaction probabilities. Note that the dot-dashed curve extends the interval of en-ergies, where the reaction probability is zero, which means that the PES for thisreaction must have a higher reaction barrier and cannot be reproduced with theoriginal PES for H3.We assume that a small ensemble of energy points is known from some (notnecessarily accurate) quantum chemistry calculation, denoted by E◦i . As before,this ensemble serves as a starting point to fit a PES with a G (n). However, in orderto allow for the improvement of the PES and to overcome the inaccuracy of thequantum chemistry calculations we vary E◦i by the amount of ε(x), Ei = E◦i +ε(x).Intuitively the variable ε depends on the atomic coordinates because depending onthe space’s region we many need to increase or decrease the energy to change theshape of the PES.In the previous sectionsF (·) only depended on xn+1, because we assume thatE◦i = Ei. A more robust approximation is to learn the dependence of the utilityfunction for both variables, ε and x. Since ε increases the dimensionality of theutility function by one, we need more data so that BO’sF (·) accurately describesthe minimum of the utility function for xn+1 and the value of ε . Figure 3.10a showsthat this algorithm converges to the arbitrarily modified reaction probabilities after32 iterations, producing a PES depicted in Figure 3.10b.480.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0Translational Energy (eV) ProbabilityExactAltered222352(a)R11. (eV)43210(b)Figure 3.10: a) The reaction probabilities for the modified H2 + H → H +H2 reaction as functions of the collision energy. The black dot-dashedcurve is obtained by a modification of the previous results (black solidcurve) involving a translation along the energy axis. The ML modelsare trained to obtain the PES that would describe the new reactionprobabilities. The green dashed curve is a results of such training after30 iterations, which produces a surface constructed with 52 ab initiopoints. b) Comparison of the original PES (blue) with the new PES(red) found by the BO algorithm. The new PES yields the reactionprobabilities described by the green dashed curve in the upper panel.The RMSE of the results shown by the green dashed curve is 0.016 ev.493.4 SummaryThe results we present in this chapter can be summarized as,• GPs are a robust regression algorithm that can also be used to minimize/-maximize black-box functions.• BO is a powerful algorithm that without relying on gradients is capable ofminimize/maximize functions like Equation 3.12.• Accurate quantum reactive scattering results can be obtained with PES basedon a very small number of ab initio points,– 30 for reaction: H + H2 → H2 + H– 290 for reaction: OH + H2 → H + H2O• Accurate quantum reactive scattering results can be obtained with PES basedon 70 points for reaction H + H2 → H2 + H when the quantum dynamicsresults are not known.• Our approach is robust enough that it can bypass the inaccuracy of the quan-tum chemistry calculations by also learning to correct Ei.• Most of the energy points that are used to fit PES are unnecessary for accu-rate quantum dynamics calculations like the reaction probabilities.• We introduce a two-tiered GP model, which gives the reaction probabilitiesas explicit functions of the position of an added ab initio point.As it was mentioned, evaluating the energy with accurate ab initio quantumchemistry methods and quantum dynamical observables are both computationallydemanding. The results we present indicate that ML reduces the total computa-tion of similar problems by using interpolation methods that required less trainingpoints (Chapter 2), and better search algorithms like BO. Incorporating ML in thequantum dynamics calculations is a research problem that should be considered,since it could reduce the computational resources needed to evaluate quantum dy-namics observables for large systems where is currently impossible.50GPs are not well suited regression algorithms for high dimensional problems,therefore the two-tiered GPs regression method we proposed has some flaws to fitthe PES for systems with large number of atoms. However the same strategy canbe used with other Bayesian optimization algorithms like the ones proposed by R.P. Adams et al. suggested in Ref. [172], PHOENICS in Ref. [83] or using randomforest (RF) [94].The optimization of cycle C using GP models and BO show that the total num-ber of points needed to fit a PES with accurate quantum dynamics observables canbe reduced. However, the total number of points used by BO to minimized the util-ity function could be reduced by studying different acquisition functions or changethe value of κ as a function of the iteration number [25, 170, 178].The optimization algorithm for cycle C is greedy since at each iteration we onlyconsider points to train G (n) with data that represent the minimum of the utilityfunction. However, we did not explore the impact of a non-greedy policy in boththe accuracy and the number of points required for a G (n) with accurate reactionprobabilities. A non-greedy policy, know as ε-greedy [181], allow us to make newmoves that could lead to a better over all strategy that could reduce the numberpoints or achieve even more accurate quantum dynamics observables.One of the key elements in the scheme we proposed is the utility function. f (·)describes the quality of the points, e.g. Equation 3.12 computes the difference inthe complete range of the collision energy. However we could benefit from theintrinsic correlations that multiple utility functions could have, e.g partition thecollision energy to have multiple error functions as utility functions. This newstrategy can be tackled using multi-task BO [182] and could cut down the numberof quantum dynamics calculations. Also, multiple quantum dynamics observablescould be used to fit a single PES instead of a single one which is the strategy wepresent.We must emphasize that the most important concept of this chapter is the abilityto optimize black-box functions with the current ML tools without using gradients.There are many open problems in chemistry and physics that can be formulatedin terms of the optimization of black-box functions, e.g. improve the percentageyield in chemical reactions [152], or in cold atoms physics tuning the experimentalparameters to enhance the lifetime of quantum particles. It must be remembered,51however, that while using BO the utility function should capture the problem in themost robust manner so that ML can help us solve new problems.52Chapter 4Extrapolation of quantumobservablesThe underlying physical laws necessary for the mathematical theoryof a large part of physics and the whole of chemistry are thuscompletely known, and the difficulty is only that the exact applicationof these laws leads to equations much too complicated to be soluble.It therefore becomes desirable that approximate practical methods ofapplying quantum mechanics should be developed.— Paul A. M. DiracIn the field of many-body physics quantum observables like ground state ener-gies, particle correlations, particle densities, to mention few, are key to understandthe underline physics of a phenomenon. The computation of quantum observablescan be understood as the prediction of a scalar black box function where the goalis to infer such function given some training data,〈Oˆ〉 ∼ f (x) (4.1)where f (·) represents the numerical or experimental method used to evaluate thequantum observable of operator Oˆ. In the case of electronic ground state ener-gies, the operator Oˆ is the electronic Hamiltonian, Equation 2.20, and x is all thedistances between the nuclei in the molecule. For a condensed matter system, xis the value of the parameters of the Hamiltonian; for example, for the Hubbard53model x can be the values of the onsite energy, hopping amplitude, to mention few,Chapter 6.Using ML to infer quantum observables has been proved to be a novel ap-proach to study many-body physics. For example, J. Carrasquilla et al. used NNsto characterize many-body quantum states from different phases of matter for var-ious spin Hamiltonians [35]. The quantum observable used to train the NNs is adiscrete variable that labels the phase of matter, i.e. ferromagnetic or antiferro-magnetic. As L.-F. Arsenault et al. stated in Ref.[8], the use of ML algorithmsshould be focused to solve true quantum many-body problems. Here the authorspredict the self-energy Σ(ω) or the local Green function G(ω) for the Hubbardmodel with second-neighbour hopping for a 3D cubic lattice using kernel Ridgeregression (KRR) method [132]. The phase diagram of this many-body systemhas one transition, from the metallic to the Mott insulator phase. For each phase,a different KRR method was used to predict the quantum observable. The phasetransition as a function of the Hamiltonian parameters is also learned by a classifi-cation algorithm, decision forest [8].Predicting discrete or continues quantum observables is a daily task for com-putational physicists; naturally, the use of ML algorithms has reduced the com-putational effort needed to study many-body physics. However, when the goal isto discover new physics like unknown phases of matter or the value of quantumobservables where the current experimental/numerical tools cannot be assed, morenovel and powerful ML algorithms have to be developed. Over the course of thischapter, we present the idea of applying ML algorithms like GP models to extrap-olate quantum observables to discover new phases of matter.This chapter exemplifies the power of GP models with non-single kernels byextrapolating the energy dispersion, denoted as E(·), of a single particle dressed bybosons in an infinite 1D lattice. We also illustrate a non-bias manner to constructmore robust kernels that can interpolate between different phases of matter and alsoextrapolate where training data is not used.544.1 IntroductionIt is common in quantum physics to encounter a problem described by a Hamilto-nian like,Hˆ = Hˆ0+αHˆ1+β Hˆ2 (4.2)whose eigenspectrum can be computed/measured in certain limits of α and β e.g.α = 0 or α  β , but not at arbitrary values of α and β . For such problems, itis necessary to interpolate the quantum properties of a system between the knownlimits, if there are more than one like we did in Chapters 2 and 3. If only one limitis accessible, one must extrapolate from this limit, which we will exemplify bellow,Section 4.2. Both the interpolation and extrapolation become exceedingly complexif the system properties undergo sharp transitions at some values of α and/or β .Sharp transitions separate the phases of the Hamiltonian (4.2), as shown schemat-ically in Figure 4.1. Since the properties of the quantum system vary drasticallyin the different phases [160], an extrapolation of quantum properties across phasetransition lines is generally considered unfeasible [160]. We consider the phasediagrams of polaron Hamiltonians, some of which have three phases as depicted inFigure 4.1, and show that the sharp transitions in these diagrams can be identifiedby machine learning models trained with data from only one of the phases.Phase	IPhase	IIPhase	IIIFigure 4.1: Schematic diagram of a quantum system with three phases.To illustrate the possibility to extrapolate quantum observable using ML, weconsider a generalized polaron model describing an electron in a one-dimensional55lattice with N→ ∞ sites coupled to a phonon field:Hˆ =∑kεkc†kck +∑qωqb†qbq+Ve−ph, (4.3)where ck and bq are the annihilation operators for the electron with momentum kand phonons with momentum q, εk = 2t cos(k) and ωq =ω = const are the electronand phonon dispersions, and Ve−ph is the electron - phonon coupling. We chooseVe−ph to be a combination of two qualitatively different terms Ve−ph = αH1+βH2,whereH1 =∑k,q2i√N[sin(k+q)− sin(k)]c†k+qck(b†−q+bq)(4.4)describes the Su-Schrieffer-Haeger (SSH) [125] electron - phonon coupling, andH2 =∑k,q2i√Nsin(q)c†k+qck(b†−q+bq)(4.5)is the breathing-mode model [113]. The eigenstates of the model (4.3) are polaronsthat are known to exhibit two sharp transitions as the ratio α/β increases from zeroto a large value [87]. At α = 0, the model (4.3) describes breathing-mode polaron,which have no sharp transitions [69]. At β = 0, the model (4.3) describes SSHpolarons, which exhibit one sharp transition in the polaron phase diagram [125].At these transitions, the ground state momentum of the polaron changes abruptly,as shown in Figure 4.3 and in the black curve in Figure 4.4 for the SSH polaron.In the following section, we first illustrate how a GP model with a combinationof simple kernels became a more robust supervised learning model. Furthermore,we explain how with GP regression it is possible to extrapolate quantum observ-ables of any quantum system like the polaron Hamiltonian that can lead to thediscovery of new phases of matter.4.2 Combination of kernelsThe efficiency of GP models depend on the kernel function and the size of thetraining data set [147]. In the limit of a large number of training points, any GP56model with a stationary kernel produces accurate results. However, there is norestriction to use more than one kernel to construct the covariance matrices, thuswe could ask what is there to gain by combining kernels?. In this section, weinvestigate the premise of using GPs with more than one kernel.For many problems, there might be a suitable kernel form that describes thesystem more accurately, for example, the periodic kernel for recurrent data. How-ever, to custom made a kernel function for every single problem is not trivial. Oneof the restrictions is that the kernel function used in GP models must be a positive-defined function [147]. It is possible to construct more robust kernels that satisfiedthe positive-defined restriction using two simple operations [58–60],kα + kβ = kα(·, ·)+ kβ (·, ·) (4.6)kα × kβ = kα(·, ·)× kβ (·, ·) (4.7)where ki(·, ·) is any of the simple kernels introduced in Chapter 2 .To illustrate the power of combining kernels, we go back to fitting the PES forH2CO [102], Chapter 2. For a GP model with kMAT the RMSE is 29.95 cm−1,while using kMAT × kRQ the RMSE is 10.97 cm−1; both GP models were trainedwith the same 2,000 points, and tested on a set of 30,000 points. By including anew kernel we reduced the RMSE by a factor of two, however, we must considerthat a GP model with kMAT ×kRQ has more parameters than a GP model with kMAT ,making it a more robust supervised model.The core of GP models is the kernel function which must capture the similar-ity relation between two points. So far we have shown that GPs with single andmultiple kernel functions are accurate interpolation models, but can GPs work forextrapolation?. Extrapolation is defined as the ability to predict beyond the train-ing data range. In principle, if we could design or propose a kernel function thatcaptures the correlation of the data in a robust manner GPs would be capable of ex-trapolating observables. For example, a GP model with a periodic kernel is capableof extrapolating if there is some intrinsic periodicity in the data.As we already state in Chapter 2, there are two types of kernels, stationary andnon-stationary. In the case of the stationary kernels, it indisputable that they arenot suited for extrapolation since the kernel function for two distant points should57be zero. Consequently, if the distance between the training data and the new point,x∗, is large, the prediction with GPs is zero, as can be seen in Figure 2.2a. On thecontrary, a non-stationary kernel, like the linear kernel Equation 2.18, allows themean of the predicted distribution to be non-zero, as is illustrated in Figure 2.6.In the following sections, we illustrate the possibility of using multiple kernelsto extrapolate quantum observables to predict phase transitions.4.2.1 Kernel combination to predict a single phase transitionD. Duvenaud et al. proposed that the multiplication of any stationary kernel withthe linear kernel, kLIN , leads to a GP model with a non-zero mean that can be usedfor extrapolation [58, 60]. Here we illustrate that this simple kernel i.e. kMAT ×kLIN , is robust enough to make extrapolation predictions and to envision changesin the shape of the energy dispersion of the SSH polaron.The first case we study is the energy dispersion of the polaron, when β/αis held fixed to β/α = 1.0. Using Equation 4.1 we defined the lowest polaroneigenenergy or polaron dispersion as,E(K,α,β ,ω) = 〈Hˆ (K,α,β ,ω)〉 (4.8)where K is the total momentum, α and β are the polaron constants and ω isthe phonon frequency. E(·) depends on the parameters of the Hamiltoninan 5.1,K,α,β and ω . We denote value of the K where the polaron dispersion is minimumas,KGS = arg minKE(K,α,β ,ω). (4.9)The first case we consider is to extrapolate the polaron energy for β/α ≥ 1.0,where KGS = 0 [87, 113]. We used GP regression with kMAT × kLIN to illustratethat if is possible to predict E(·) for λSSH > 1.3 while only being trained from0 < λSSH < 1.3, Figure 4.2. We defined λSSH = 2α2/th¯ω as the effective SSHcoupling strength.Predicting energy dispersion with roughly the same shape is still challenging;however, the energy dispersions of the SSH model, for β = 0, suffer a change of58shape as a function of α . Due to the change in the polaron dispersion, the groundstate momentum gets shifted from KGS = 0 to KGS > 0. Figure 4.3 illustrates thatin the case for ω/t = 3 the change in KGS happens right at λSSH = 0.6, denoted asλ ∗SSH . Indicating that the SSH polaron undergoes a phase transition [125].We use GP models to extrapolate E(·) for λSSH > λ ∗SSH using the same kernel,kMAT × kLIN , to exhibit that simple ML algorithm can predict phase transitionswithout knowing there exists such phases of matter. Here, λ ∗SSH denotes the valuewhere the shift in KGS happens. The shape of the energy dispersion before andafter λ ∗SSH changes, Figure 4.3; nevertheless, GP model trained with only valuesof λSSH before the transition (λSSH < λ ∗SSH) are capable to learn the intrinsic trendin E(·) that shifts the ground state momenta, Figure 4.2. The training data andthe kernel are equally important in GP regression. When GP models have moreinformation like an extra λSSH point, they became more accurate as it is displayedin Figure 4.4 where two different GP models were considered. The first GP modelwas trained from 0.2≤ λSSH ≤ 0.5 and the second from 0.2≤ λSSH ≤ 0.6. All theGP models are trained with only 250 total points. To quantify the accuracy of theGPs prediction we computed the RMSE between the predicted energy dispersionand the accurate energy dispersion [16, 87, 174, 175]. In Figure 4.2 we can observethat GP models can make accurate extrapolation for at least |λSSH − λˆSSH | ≈ 1,where λˆSSH is the greatest/smallest training value.In the following section, we consider alternative kernel combinations to predictthe complete phase diagram of the polaron model.590.4 0.8 1.3 1.6 2.0 2.5SSH0.00.10.2RMSE(t)abkMAT× kLIN0.0 0.5 1.0E(K)/taSSH=1.4SSH=1.50.0 0.5 1.0aSSH=1.6SSH=1.70.0 0.5 1.0K/E(K)/taSSH=1.8SSH=1.90.0 0.5 1.0K/aSSH=2.0SSH=2.10.0 0.5 1.0bSSH=0.8SSH=0.90.0 0.5 1.0bSSH=1.0SSH=1.10.0 0.5 1.0K/bSSH=1.2SSH=1.30.0 0.5 1.0K/bSSH=1.4SSH=1.5Figure 4.2: The upper panel is the RMSE of GPs’ prediction as a functionof λSSH . The markers illustrate the values of λSSH used to train theGPs. We consider two sets of training values, (a) 0.4≤ λSSH ≥ 1.3 and(b) 1.6 ≤ λSSH ≥ 2.5. The lower panels illustrate the GPs prediction(solid colour curves) and the accurate energy dispersion (dashed colourcurves). We use kMAT ×kLIN as the kernel function for both GP models.ω/t = 3 is held fixed.600.2 0.5 1.0 1.5 2.0 2.5SSH0123RMSE(t)abkMAT× kLIN0.0 0.5 1.0E(K)/tbSSH=0.7SSH=0.80.0 0.5 1.0bSSH=0.9SSH=1.00.0 0.5 1.0K/E(K)/tbSSH=1.1SSH=1.20.0 0.5 1.0K/bSSH=1.3SSH=1.40.0 0.5 1.0aSSH=0.7SSH=0.80.0 0.5 1.0aSSH=0.9SSH=1.00.0 0.5 1.0K/aSSH=1.1SSH=1.20.0 0.5 1.0K/aSSH=1.3SSH=1.4Figure 4.3: The upper panel is the RMSE of GPs’ prediction as a function ofλSSH . The markers illustrate the values of λSSH used to train the GPs.We consider two sets of training data (a) 100 points distributed from0.2≤ λSSH ≤ 0.5 and (b) 125 points distributed in 0.2≤ λSSH ≤ 0.6. Thelower panels illustrates the GPs prediction (solid colour curves) and theaccurate energy dispersion (dashed colour curves). We use kMAT × kLINas the kernel function for both GP models. For all the energy dispersionsdisplay here, we consider ω/t = 3 and β = 0.610.2 0.4 0.6 0.8 1.0 1.2 GS0.2 0.4 0.6 0.8 1.0 1.2 1.4SSH0. GSFigure 4.4: For both panels we illustrate the change in KGS as a function ofλSSH . The blue curves is the KGS from the GP’s prediction of the energydispersion. The black dashed curve is the correct KGS for each λSSH .The markers are the values of λSSH that we use to train each GPs. Theupper panel shows 0.2≤ λSSH ≥ 0.5, wile the lower panel shows 0.2≤λSSH ≥ 0.6. We use kMAT × kLIN as the kernel function for both GPs.For all the calculations ω/t = 3 and β = Kernel combination to predict multiple phase transitionsThe complete phase diagram of the polaron model has three different phases [87],as is sketched in Figure 4.1. Each phase represents the value of KGS as function ofthe Hamiltonian parameters,• Phase I→ KGS = pi• Phase II→ 0< KGS < pi• Phase III→ KGS = 0In this section, we exploit the power of ML to learn the complete phase dia-gram of the polaron model when few values of α and β are used as training data.62The first problem we consider is to determine the phase transitions by training a GPmodel with a few data points from phase I and III. Figure 4.5 illustrates the KGS ofthe predicted E(·) with a GP model trained with different simple kernels. As it canbe observed, the only kernel that is capable to predict the existence of phase II is theMatern kernel, however the transition lines are only predicted quantitatively closethe the training data regime, Figure 4.5. The interpolation of the energy disper-sions inside phase II is challenging due to the difference in the physical propertiesbetween the other two known phases.012SSHMAT0.00.51.0K GS/4 2 0 2/012SSHRQ4 2 0 2/RBF0.00.51.0K GS/Figure 4.5: KGS for the mixed model 4.3. The black dashed curves are thecalculations from Ref. [87]. The color map is the prediction of the GPmodels with the fully optimized kernels. The models are trained by thedispersion of the polarons at the locations represented by the black dots.The different kernels were considered, kMAT , kRQ and kRBF .As we showed in the previous section, Section 4.2.1, to increase the learningcapacity of a GP model we can use any possible combination of simple kernels63using Equations 4.6 and 4.7. The kernel parameters of any complex kernel wereoptimized by maximizing the log marginal likelihood. Here we evaluated all thepossible combinations of two simple kernels. Each kernel combination proposedwas trained with the same data points, 1250 points distributed from 2.0 < λSSH ≤2.5 for phases I and III, and−4≤ β/α ≤−3 (phase I) and 1.0≤ β/α ≤ 2.0 (phaseIII). More over, for each value of α and β we used 25 points from 0 < K < pi .The phase diagram predicted with each of all the possible combination of twokernels is shown in Figures 4.6 and 4.7. Figure 4.6 displays the predicted phasediagram when two kernels are combined by the addition operation, Equation 4.6.The predicted phase diagrams in Figure 4.7 were computed by multiplying twokernels, Equation 4.7.As discussed in Chapter 2, ML models with a large number of parameters arenot always more capable of learning. This can be observed in Figures 4.6 and4.7, where increasing the number of kernels does not make the GPs regressionprediction of E(·) more accurate. For example, a GP with kRQ + kRQ is still notcapable of predicting phase II where 0 < KGS < pi . On the other hand, if the rightcombination of two kernels is chosen, e.g. kMAT + kRBF ,kRBF + kRBF , kMAT × kRBFor kMAT × kLIN , GP predicted phase diagram improves significantly,With current computational power training GPs models with a more complexcombination of kernels is doable; however, there are two major issues. The firstproblem is the number of kernel combinations that are required to fully describe asystem, and the second obstacle is the need for test data to evaluate the accuracy ofeach kernel combination. In the following section, we address the lack of test datato construct more a robust kernel by reformulating the kernel combination probleminto a Reinforcement learning (RL) problem where each taken action is towardsconstructing the ‘best’ kernel.64012SSHRBF+RBF012SSHRBF+RQ012SSHRBF+MAT4 2 0 2/012SSHRBF+LINRQ+RBFRQ+RQRQ+MAT4 2 0 2/RQ+LINMAT+RBF0.00.51.0K GS/MAT+RQ0.00.51.0K GS/MAT+MAT0.00.51.0K GS/4 2 0 2/MAT+LIN0.00.51.0K GS/Figure 4.6: KGS for the mixed model 4.3. The black dashed curves are thecalculations from Ref. [87]. The color map is the prediction of theGP models with the fully optimized kernels. The models are trained bythe dispersion of the polarons at the locations represented by the blackdots. The different kernels considered here are all the possible pairwiseaddition, Equation 4.6, of two simple kernels, kMAT , kRQ and kRBF .65012SSHRBFxRBF012SSHRBFxRQ012SSHRBFxMAT4 2 0 2/012SSHRBFxLINRQxRBFRQxRQRQxMAT4 2 0 2/RQxLINMATxRBF0.00.51.0K GS/MATxRQ0.00.51.0K GS/MATxMAT0.00.51.0K GS/4 2 0 2/MATxLIN0.00.51.0K GS/Figure 4.7: The momentum of the polaron ground state for the mixed model4.3. The black dashed curves are the calculations from Ref. [87]. Thecolor map is the prediction of the GP models with the fully optimizedkernels. The models are trained by the dispersion of the polarons at thelocations represented by the black dots. The different kernels consid-ered here are all the possible pairwise amutiplication, Equation 4.7, oftwo simple kernels, kMAT , kRQ and kRBF .4.3 Model selectionAs we stated in the previous section, we could define a GP with a complex kernelstructure by combining simple kernels. However, given two GP models with totallydifferent kernels, how do we select the ‘best’ kernel form?. Under the supervisedlearning framework, once can simply evaluate the test error for each GP model andchoose the one with the lowest test error. In order to the compute test error, extra setof data is required, raising the question: How can the most accurate combination66of kernels be elected with the lack of test points?. This section presents that takingthe advantage that GPs are probabilistic methods, it is possible to select the mostaccurate model without test points.Under the framework of Bayesian statistics, it is possible to compute how prob-able is the data given a modelMi, p(D |Mi). In the case of GPs, p(D |Mi) is usedto optimize the kernel parameters as we explained in Section 2.3 and can be evalu-ated analytically. For illustrative purposes let us defined a first GP with kMAT +kRQ,and a second GP with kMAT +kMAT . For each model, we compute the marginal like-lihood p(y|X ,θ ,Mi), where Mi is one of the two proposed kernels and θ are thekernel parameters. Once both GP models are trained we can select the GP modelwith the highest marginal likelihood, since it resembles the relationship betweenthe data and the model. However, it may not be fair to compare marginal likeli-hoods for kernels with different number of parameters. We must remember that amodel with a larger number of parameters tends to memorize the data easily whichcould lead to a higher marginal likelihood [20, 132]. It is possible to penalize themarginal likelihood to compare models with different number of parameters morefairly using the Bayesian information criterion (BIC) [58, 132, 168].4.3.1 Bayesian information criterionThe training stage of a GP with a fixed kernel form is also known in the Bayesianstatistics framework as model selection, where each different kernels parameters’value is a unique model. The most appropriate model is the one with the highestmarginal likelihood. In the case of GPs, the parameters of the kernel are contin-uous variables and the optimization can be carried out numerically as shown inSection 2.3. Unfortunately, we can not use the exact same procedure to searchfor the ‘best’ kernel combination since the combination of kernels is not a con-tinuous variable that can be numerically optimized. To overcome this problem,we select different kernel combinations using the Bayesian information criterion[58, 132, 168],BIC(Mi) = log p(y|X , θˆ ,Mi)− 12 |Mi| logN (4.10)67where |Mi| is the number of parameters of kernelMi, and N is the number of train-ing points. The term −12 |Mi| logN is a penalizing term for kernels with a largernumber of parameters. The log p(y|X ,θ ∗,Mi) is the logarithm of the marginal like-lihood for an optimized kernel. Using the BIC we can compare different kernelswithout the need of test data, however, there can be more than one good combina-tion of simple kernels that could accurately represent the structure of the trainingdata. With the computational power available today, it may seem possible to cre-ate and train GP models with as many combinations of kernels as can be imagined.Unfortunately, the combination of kernels is a non-tractable combinatorial problemand this makes the problem almost impossible.4.3.2 Greedy search for model constructionTo overcome the combinatorial problem of possible kernel combinations, we use agreedy algorithm to narrow the search space for the most suitable combination ofkernels [58, 60]. As we mentioned in Chapter 3, greedy strategies are well knownin the RL literature [181]. We use BIC as our selection criteria to find the ‘best’combination of kernels.The first step in searching for a more robust kernel is to train different GPmodels with different simple kernels. We denoted simple kernels as all the kernellisted in Section 2.4. We select the kernel with the highest BIC, and denoted ask0(·, ·). In the following step, we use k0(·, ·) as a base kernel and we combine itwith all ki(·, ·). We combine the two kernels by multiplying them, k0(·, ·) × ki(·, ·),or adding them, k0(·, ·) + ki(·, ·). For all the possible combinations we computethe BIC and select the kernel with the heights BIC which is denoted as k1(·, ·). Wecarried the same procedure iteratively where the base kernel is k`(·, ·), and k`+1(·, ·)is the kernel combination that has the highest BIC. Figure 4.8 illustrates the greedysearch algorithm explained above.4.4 ResultsIn this section, we illustrate the use of GP models with complex kernels that areconstructed by an iterative maximization of the BIC explained in the previous sec-tion. We illustrate the possibility to extrapolate quantum observables to regimes68No kernelRBF RQ MAT LINMAT × LIN· · ·MAT + RBF · · · MAT × RQMAT × LIN + RQ· · ·MAT × LIN + RBFMAT × LIN × RBF · · · MAT × LIN + LIN MAT × LIN + MAT· · ·· · · · · ·Figure 4.8: Schematic diagram of the kernel construction method employedto develop a GP model with extrapolation power. At each iteration, thekernel with the highest Bayesian information criterion (Equation 4.10)is selected.in the spaces where traditional numerical algorithms have a lack of convergence.Furthermore, we also trained a GP model with data only form phases I and III, asexplained in Section 4.2.2 and predicted the complete phase diagram of the polaronmodel.4.4.1 Extrapolating quantum observables to unconverged regimesThis section illustrates how kernel combinations can predict quantum observableswhere standard numerical methods fail to converge due to the Hilbert space di-mensionality. We consider the Holstein polaron model, where the Ve−ph term inEquation 4.3 is,H2 =g√N∑k,qc†k+qck(b†−q+bq). (4.11)Many methods have been developed to study dressed particles like the Holsteinpolaron, e.g. quantum Monte Carlo based methods, variational exact diagonal-ization [22], momentum average approximation [16], to mention few. All thesemethods demand higher computational resources when the value of ω decreases690.00 0.25 0.50 0.75 1.00K/ = 1.000.00 0.25 0.50 0.75 1.00K/  = 1.10(a)1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75× kRQkRQ× kRQ× kRBF+ kDP(b)Figure 4.9: Left panel: Holstein polaron dispersion predicted with a GPmodel with kRQ, black dashed curves, and kRQ×kRQ×kRBF +kLIN , bluedashes curves. The red solid curves are polaron dispersions computedwith the momentum average approximation method [22]. Right panel:The change in the RMSE as a function of depth in the kernel combina-tion search guided by the BIC. We plot the RMSE between GP predictedpolaron dispersion and the exact polaron dispersion at different valuesof ω . The grey area is the range in ω considered for training the GP. Forboth figures, the value of g = 1.5 is fixed.since the increase of the Hilbert space dimension. As a proof-of-principle, wepresent that GP models with kernel combination can also extrapolate E(·) in theregime where ω is small. The phonon frequency range considered for trainingwas, 2.0 < ω < 3.0. We used the BIC to search for the most appropriate kernelcombination to extrapolate E(·) in the low phonon frequency regime, Figure 4.8.Figure 4.9b illustrates that at each iteration of the BIC search, the learning of a GPmodel has a better representation of E(·) for ω < 2.0. We must point out that thesingle kernel with the highest BIC, kRQ, is capable to extrapolate E(·) qualitatively.On the other hand, a GP model with kRQ× kRQ× kRBF + kLIN yields to accurateresults even when ω ≈ 1.0, Figure 4.9a.4.4.2 SSH-BM polaron phase diagramIn this section, we present predicted phase diagram of the SSH-BM polaron with aGP model where the kernel is constructed by the BIC, as it is depicted in Figure 4.8,70For clarity, here, we use the notation “GPL-X” for the kernel with the highestBIC obtained after X depth levels of the algorithm depicted in Figure 4.8 of themain manuscript. We considered two different sets of training data, the first onewhere the training data is only from phase III, same as in Figures 4.6 and 4.7,and the second from phases I and II. In the first case, Figure 4.10 illustrates thechange in the prediction of the polaron phase diagram depth levels of the algorithmdepicted in Figure 4.8. As it can be noticed the first kernel GPL-0, kMAT , doespredict the change in KGS from zero to pi qualitatively. At the third level GPL-3is (kMAT + kRBF)× kLIN where the phase diagram is predicted quantitatively. Westate that any GPL-X predicts E(K,α,β ) in the complete range of the Hamiltonianparameters, then we search for KGS which we used to construct the phase diagram.Here, we also test if GP models are capable to predict quantum observablesin-between phases. Predicting quantum observables in between different quantumphases is also a challenging problem. The training data used for this example issampled from phases I and III. Again, the kernel form, GPL-X , is proposed bythe algorithm described in Figure 4.8. GPL-4, in Figure 4.11, is (kMAT × kLIN +kRBF)× kLIN and accurately predicts the entire SSH-BM polaron phase diagram.GPs are a non-parametric method, meaning that the data is one of the keycomponents to train GPs. Here, we illustrate that even with kernel combinations,different data lead to a different combination of kernels. With only consideringdata from phase III, as in Figure 4.10, and used Section 4.3.2 algorithm to predictthe polaron phase diagram. We consider two different training sets, one were thetraining points are closer to the transition line and one where the points are furtheraway. Each set has the same amount of points, 900, but different values of α andβ . For each pair of values of α and β we consider 20 points distributed 0<K < pi .The predicted phase diagram with a GP trained with both set of points is illustratedin Figure 4.12.For both training sets, the first selected kernel is kMAT , Figure 4.12. A GPmodel with a single kernel trained with data closer to the phase transition is capableto qualitatively predict the existence of phase II where KGS > 0. Additionally,neither of both kernels can predict the existence of phase III, KGS = 0. The rightcolumn panels of Figure 4.12 are the predicted phase diagrams with GPL-2 forboth training sets. kMAT + kRBF + kLIN is the GPL-2 for where the training data is714 3 2 1 0 1 2/ GS/4 3 2 1 0 1 2/ GS/4 3 2 1 0 1 2/ GS/Figure 4.10: KGS for the mixed model (4.3) as a function of β/α for λSSH =2α2/th¯ω . The dotted curves are the quantum calculations from Ref.[87]. The color map is the prediction of the GP models. Each panelillustrates the improvement of the predicted phase diagram. The panelscorrespond to the optimized kernels GPL-0 (left), GPL-1 (right), GPL-2 (centre), where “GPL-X” denotes the optimal kernel obtained afterX depth levels in the algorithm depicted in Figure 4.8. The models aretrained by the dispersion of the polarons at the locations representedby the white dots.the furthest from the first phase transitions, while GPL-2 is (kMAT + kRBF)kLIN forthe data closer to the transition. Both kernels are capable to predict the existenceof phase I where KGS = pi . The left column of Figure 4.12 shows that the two setsof training data contain different physical information.724 3 2 1 0 1 2/ GS/4 3 2 1 0 1 2/ GS/4 3 2 1 0 1 2/ GS/4 3 2 1 0 1 2/ GS/Figure 4.11: KGS for the mixed model (4.3) as a function of β/α for λSSH =2α2/th¯ω . The dotted curves are the quantum calculations from Ref.[87]. The color map is the prediction of the GP models. Each panelillustrates the improvement of the predicted phase diagram. The panelscorrespond to the optimized kernels GPL-0 (left), GPL-1 (right), GPL-2 (centre), where “GPL-X” denotes the optimal kernel obtained afterX depth levels in the algorithm depicted in Figure 4.8. The models aretrained by the dispersion of the polarons at the locations representedby the white dots.4.5 DiscussionPredicting phase transitions using ML learning algorithms is currently a ‘hot’ re-search topic in the field of many-body physics. One of the most known works wasproposed the use of NNs with multiple layers to learn how to classify differentspin configurations in order to characterize the ferromagnetic or antiferromagneticphase by J. Carrasquilla et al. in Ref [35]. Another influential work was proposedin 2015 by L.-F. Arsenault et al. where it is showed that ML algorithms can alsoclassified different phases of matter and interpolate quantum observables such as73012SSHL-0 L- GS/4 2 0 2/012SSH4 2 0 2/ GS/Figure 4.12: KGS for the mixed model (4.3) as a function of β/α and λSSH =2α2/th¯ω . The black dashed curves are the calculations from Ref. [87].The color map is the prediction of the GP models with the fully opti-mized kernels. The models are trained by the dispersion of the po-larons at two different locations represented by the white dots. Leftcolumn panels are the predicted phase diagram with GPL-0 for bothtraining data sets and right column panels are with GPL-2.the quasi particle weight in the Hubbard model [8]. The interpolation of the quan-tum observables was done using KRR. Identifying different phases of matter inthe Ising model can also be done with unsupervised learning as L. Wang showedin Ref. [188].The approach proposed over this chapter illustrates how GP models are capa-ble of robust interpolation and extrapolation of quantum observables to constructphase diagrams. We also state that the learning capacity of GP models can be en-hanced using multiple kernels. We highlight the impact that different combinationof kernels have in the prediction of GP models. It is important to emphasize thatcombining a large number of kernels does not ensure a more accurate model, asmany of the kernels predicted incorrect phase diagrams in Figures 4.5 to 4.7Over the course of this chapter, we illustrate that a hand-crafted kernel com-74bination, like kMAT × kLIN , can predict quantum observables beyond training dataunder some regimes. This kernel form is robust enough to even predict the changeof form in the energy dispersion of the SSH polaron model as a function of λSSHor α . The prediction of GP models intrinsically depend on the number of trainingpoints since they are non-parametric methods; Figures 4.2, 4.3 and 4.4 display theincrease in the accuracy in the GPs extrapolation when more training data is used.The prediction of E(·) in the low phonon frequency regime is computationally de-manding as a result of the Hilbert space dimensionality required. Section 4.4.1presents that quantum observables can be accurately learned where traditional nu-meric methods have a lack of convergence.Constructing the most appropriate combination of kernels to study differentsystems is a complex problem on its own, and a lack of test data increases thedifficulty of the problem. We examine the use of BIC to reduce the combinato-rial space of kernels; specifically, we used greedy selections. However, differentstrategies can be used to construct the most accurate combination of kernels. Forexample, using auto-econders we can reformulate the search of the ‘best’ kernelto a continuous space as R. Go´mez-Bombarelli et al. did for molecules in Ref.[71]. The advantage of searching for the most accurate kernel combination in acontinuous space is the possibility to use optimization algorithms as BO discussedin Chapter 3.The BIC is the selecting criteria used to pick the most appropriate kernel com-bination to overcome the lack of test data for comparing the robustness of eachmodel. Further work should be carried out to study the impact of different met-rics, like the Akaike information criterion [132], for constructing different kernelcombinations,AIC(Mi) = log p(y|X , θˆ ,Mi)−|Mi|, (4.12)where the first term is the logarithm of the marginal likelihood for an optimizedkernel and |Mi| is the number of kernel parameters. It is important to mention thatthere can be more that one single kernel that can accurately predict phase diagram,as it is shown in Figures 4.6 and 4.7.75As we discussed in Chapter 2, when non-optimal kernel parameters are usedfor prediction, GP models have a lack of accuracy. The algorithm introduced inSection 4.3.1 sequentially constructs more robust kernels using the BIC. The BICdepends on the optimal values of θ , meaning that it is possible to select a wrongkernel if the kernel optimization is not converged. GP models with extremely com-plex kernels cannot be used since the optimization of such kernels is challengingand could lead to inaccurate predictions.The prediction of phase diagrams is one of the key problems in condensed-matter physics, however, we can use ML to push the boundaries of a physicalproblem and try to infer new phases of a diagram without knowing its existence.With the rise of more complex ML algorithms like deep-NN [72], bayesian-NN[134], GPs of GPs (deep-GPs) [53] or even with different kernel functions [191,192], it may be possible to discover new quantum systems or study the quantumworld in new regimes.76Chapter 5Accelerating quantum walksIf you are receptive and humble,mathematics will lead you by the hand.— Paul A. M. DiracThis chapter presents a study for the spreading of a quantum particle placedin a single site of a lattice or binary tree with the Hamiltonian permitting particlenumber changes. We show that particle number-changing interactions can accel-erate the spreading beyond the ballistic expansion limit by inducing off-resonantRabi oscillations between states of different numbers of particles. We considerthe effect of perturbative number-changing couplings on Anderson localization inone-dimensional disordered lattices and show that they lead to a decrease of lo-calization. The effect of these couplings is shown to be larger at larger disorderstrength, which is a consequence of the disorder-induced broadening of the parti-cle dispersion bands. Results presented in this chapter are published in Ref. [185]5.1 IntroductionAn important class of quantum computing algorithms is based on quantum walks(QWs) [89, 187], the quantum analog of random walks [46]. Random walks on lat-tices and graphs are powerful mathematical objects that can be used as algorithmictools for a variety of problems, including optimization, search and classification.The efficiency of many such algorithms is determined by ‘hitting’ and ‘mixing’77times, quantifying how long it takes random walks to explore the underlying graphs[4]. Depending on the Hamiltonian, quantum walks can be accelerated by dynam-ical interferences and have potential to offer polynomial or, for some problems,exponential, computation speedup [63]. The role of interferences in QWs is per-haps best exemplified by the ballistic expansion of a quantum particle with time Tin a periodic lattice [45, 47, 89, 171], leading to the ∝ T growth of the probabilitydistribution, compared to the ∝√T expansion of the classical random walk. QWshave been proven to offer the√N speed-up of spatial search over N items arrangedin a d-dimensional lattice, with d > 4 [44].With recent advances in the experiments on controlling atoms [19, 57, 66, 103,143, 184], molecules [32, 111, 176], photons [9] and arrays of superconductingqubits [17, 31, 80, 98, 136], it has become possible to engineer lattice Hamiltoni-ans. This, combined with the importance of the speed of QWs for the quantumcomputing algorithms and for the study of the fundamental limits of the velocityof quantum correlation propagations [117, 149], raises the question if and how lat-tice or graph Hamiltonians can be engineered to accelerate quantum walks. Theeffect of Hamiltonian engineering on quantum walks has been studied in manydifferent contexts. For example, Giraud et al. [70] showed that Anderson local-ization impeding quantum walks in disordered systems can be mitigated by addinghopping terms, which provide shortcuts in circular graphs. QWs can also be ac-celerated by coupling a Hamiltonian system to an external bath. While the generalbelief is that particle-environment interactions destroy the coherence of quantumwalks leading to transport suppression in ordered systems, multiple recent studiesshowed that interactions with certain non-Markovian baths provide new pathwaysfor interferences [130, 144, 148]. The range of particle hopping and particle inter-actions are also known to determine the speed of quantum information propagation[39, 54, 149, 153].Engineering many-particle (as opposed to a single particle) quantum walks isbecoming an important research goal [139]. As shown by Childs et al., QWs ona sparse graph can be used to efficiently simulate any quantum circuit [43] andinteracting quantum walks are capable of universal quantum computation [48].Quantum walks of interacting pairs can be used to determine if graphs are iso-morphic [67]. Particle correlations can be exploited to change the directionality of78quantum walks [28]. Quantum walks of interacting particles can be used to realizequantum Hash schemes [116]. Two-body or multi-particle correlations have beenshown to affect quantum walks of few- and many-particle systems in interestingways [18, 28, 38, 39, 68, 108, 109, 145, 154, 155, 157, 162, 179, 189]. These stud-ies consider particle correlations arising either as a consequence of direct density -density interactions or particle quantum statistics.This chapter presents an alternative mechanism for accelerating quantum walks,namely quantum walks in a dynamical system governed by a Hamiltonian allow-ing particle number changes. Such Hamiltonians can be engineered with quasi-particles, such as excitons [34, 84, 128, 201], or with ultracold atoms trapped inoptical lattices and immersed in a condensate [40, 99]. They are also of signifi-cant experimental and theoretical interest due to the relation to the topologicallyprotected states and their possible use in quantum computing [106]. The presentwork shows that the particle-number-changing interactions lead to Rabi oscilla-tions, which significantly accelerate the spreading of quantum wave packets inideal lattices and binary trees. Also considers the effect of such terms on Andersonlocalization and show that they lead to decrease of the inverse participation ratioin disordered systems. This work also shows that the effect of number-changinginteractions on the participation ratio becomes stronger with increasing disorderstrength.5.2 ModelsConsider the quantum dynamics governed by the following lattice Hamiltonian:Hˆ = ∑iωicˆ†i cˆi+ t ∑〈i, j〉cˆ†j cˆi+ v∑〈i, j〉c†i cic†jc j +Vˆnc, (5.1)whereVˆnc = ∆∑〈i, j〉(cˆ†i cˆ†j + cˆicˆ j)+ γ∑i(cˆ†i + cˆi), (5.2)cˆi is the operator that removes the particle from site i, the quantities ωi, t, ∆ and vare the Hamiltonian parameters, and the angular brackets indicate that the hopping79and interactions are only permitted between nearest neighbour sites. The on-siteenergy ωi is defined as ωi = ∆ε+ εi, where ∆ε is a constant and εi is varied in thecalculations for lattices with on-site disorder (more details below and in Chapter 6).The term Vˆnc couples different particle-number states.Model (5.1) is a special case of the full Hamiltonian for the Frenkel excitonsin an ensemble of coupled two-level systems [2]. At ∆ = 0, γ = 0 and v = 0, thisHamiltonian reduces to the tight-binding model. At ∆ = 0 and γ 6= 0, the modeldescribes the quantum annealer setup of D-wave [80], where currents in interactingsuperconducting qubits are mapped onto spin states.This work considers the few-particle limit of Hamiltonian (5.1) and calculatesthe dynamics of quantum walks by diagonalizing the Hamiltonian and constructingthe full time evolution operator from the complete set of the corresponding eigen-vectors, as was done, for example, in Ref. [199]. In order to describe properlythe dynamics governed by the models with ∆ 6= 0 and/or γ 6= 0, the Hilbert spacemust include multiple particle-number states. The Hilbert spaces is truncated andonly includes one and three particles for the case ∆ 6= 0,γ = 0. When ∆= 0,γ 6= 0,the Hilbert space includes the vacuum state (zero particles), one, two, and threeparticles. As discussed below, this chapter considers the Hamiltonian parameters,for which the multiple-particle states have high energy. Since the energy of suchstates increases with the number of particles and the couplings can only change thenumber of particles by one or two, the contribution of such states decreases withthe number of particles. Previous calculations done, verified that for a lattice with19 sites that including the states of five particles does not change the results for theHamiltonian parameters considered here, Figure 5.2.The on-site energy ∆ε + εi determines the energy separation between stateswith different numbers of particles. Throughout this work, we consider the limit∆,γ ∆ε . For ideal lattices, εi = 0. For disordered lattices, εi is drawn from a uni-form distribution of random numbers. In this limit, the state corresponding to oneparticle at zero time becomes weakly dressed with higher particle-number states.The effect of the dressing can be accounted for by the Schrieffer-Wolf transforma-tion [167], which in first order leads to the appearance of next-nearest-neighbourhopping terms, as shown in Appendix B. Including higher order terms resultingfrom the transformation induces longer-range hopping. The particle can effec-80tively hop by undergoing virtual transitions to higher particle-number states andback. Note that in models with γ 6= 0, the particle can also hop by virtual transi-tions to the vacuum state (the state of no particles) and back.5.2.1 Ideal 1D latticesThe first considered case is the well-studied problem of ballistic spreading in anideal one-dimensional (1D) lattice. At ∆ = 0 and v = 0, a particle placed in anindividual lattice site expands as shown by the solid black line in Figure 5.1. Thisspreading is much faster than the expansion of the area covered by the classicalrandom walk, illustrated in Figure 5.1 by the dotted curve. Figure 5.1 shows thatthe quantum dynamics of a single particle initially placed in a single lattice site isdrastically different from both the random walk result and the ballistic spreadingwhen governed by the model (5.1) with ∆ 6= 0. In particular, the width of the wavepacket oscillates at short times, approaching the ballistic-expansion-like behaviourat long times. These calculations are performed for the 1D lattice with N = 41lattice sites with open boundary conditions. As can be seen from Figure 5.1, theeffect of the boundaries is not important until time reaches ≈ 11 t−1.In order to understand the origin of the oscillations, we plot in the insets ofFigure 5.1 the average number of particles 〈n〉 as a function of time. It can be seenthat 〈n〉 oscillates with the same period as the wave packet size. Thus concludethat the oscillations observed in Figure 5.1 are due to off-resonant Rabi floppingbetween the state of one particle and the states of multiple particles induced byVˆnc. Figure 5.1 shows that these coherent oscillations accelerate quantum walksbeyond the ballistic limit. Note that 〈n〉 in Figure 5.1 is an average of one andthree particles. For ∆ε/t = 20, 〈n〉 < 1.2, which illustrates that the three-particlesubspace remains largely unpopulated at all times.810 2 4 6 8 10 12time (in units of t 1)051015r(in units of a)ballistic expansionclassical random walk=10=200 2 =10=20n0 1 2 3 4time (in units of t 1)01234567r(in units of a)v= tv= tballistic expansion0 1 tv= tnFigure 5.1: Time dependence of the standard deviation (in units of latticeconstant) of the wave packet for a particle initially placed in a singlesite of a one-dimensional ideal lattice. The solid black curves representthe ballistic expansion governed by the Hamiltonian (5.1) with v= 0 andVˆnc = 0. Upper panel: The oscillating curves show the size of the wavepackets governed by the Hamiltonian (5.1) with v = 0, γ = 0, ∆/t = 1,t = 1 and two values of ∆ε: ∆ε = 10/t (blue) and ∆ε = 20/t (red).Lower panel: The oscillating curves show the size of the wave pack-ets governed by the Hamiltonian (5.1) with v =±1, ∆/t = 1, t = 1 and∆ε = 20/t. The insets show the average number of particles 〈n〉 as afunction of time for the corresponding Hamiltonian parameters. No-tice that for ∆ε = 20/t, 〈n〉 stays below 1.2 at all times. Figure fromreference [185].82Since the c†i c†j term generates pairs of particles in adjacent sites, it is importantto consider the role of inter-site interactions v. Such interactions appear in extendedHubbard models, leading to non-trivial properties of the lattice systems [135, 164,184] and inducing correlations in quantum walks [39]. Here, they are transient asthe mutliple-particle subspaces are populated only virtually. The inset of Figure 5.1illustrates that repulsive interactions stabilize the oscillations at long times, whilethe short-time dynamics appears to be largely unaffected by the density-densityinteractions.The ∆ 6= 0 term couples the subspaces with the odd number of particles. Thus,the state of a single particle is coupled to a state of three-particles, but not to thestate of two particles or the vacuum state. By contrast, the γ 6= 0 term couplessubspaces differing in the number of particles by one. To illustrate the effect ofsuch couplings on the dynamics of quantum walks, we compare two models: (i)∆= 0,γ = t and (ii) ∆= t,γ = 0. The results shown in Figure 5.3 illustrate that thecouplings in case (i) have a much stronger effect, leading to larger amplitudes ofthe oscillations and the persistence of the oscillations for much longer time.830 1 2 3 4 5 6time (in units of t 1)01234567r(in units of a)0.0 0.5 5 0 5 10Site index10 410 310 210 1100 ln| i|211 and 3 1,3 and 5 Figure 5.2: Time dependence of the standard deviation (in units of latticeconstant) of the wave packet for a particle initially placed in a singlesite of a one-dimensional ideal lattice. The solid black curves representthe ballistic expansion governed by the Hamiltonian (5.1) with v = 0and Vˆnc = 0. The oscillating curves show the size of the wave packetsgoverned by the Hamiltonian (5.1) with v = 0, γ = 0, ∆/t = 1, t = 1,∆ε = 20/t and a Hilbert space with different number of particles, 1 par-ticle (black), 1 and 3 particles (red) and 1,3 and 5 (blue). The upperinset shows the logarithm of the particle probability distributions in adisordered 1D lattice with 19 sites. The results are averaged over 50 re-alizations of disorder and are time-independent with w = 10/t. And thelower inset depicts the average number of particles 〈n〉 as a function oftime for the three different wave packets. Figure from reference [185].840 2 4 6 8 10 12time (in units of t 1)051015r(in units of a)ballistic expansion=0, =t= t, = 00 2 5.3: Time dependence of the standard deviation (in units of latticeconstant) of the wave packet for a particle initially placed in a singlesite of a one-dimensional ideal lattice. The solid black line representsthe ballistic expansion governed by the Hamiltonian (5.1) with v = 0,∆ = 0 and γ = 0. For the dotted red curve ∆ = 0 and γ = t while forthe blue solid curve ∆ = t and γ = 0. For all of these calculations,∆ε = 20/t. The inset shows the average number of particles 〈n〉 as afunction of time for ∆ = 0 and γ = t (dotted red curve) and ∆ = t andγ = 0 (blue solid curve). Figure from reference [185].5.2.2 Disordered 1D latticesWe next consider disordered 1D lattices. The disorder is generated by randomizingthe on-site energy εi by drawing the random values from a uniform distribution[w/2,w/2], where w quantifies the strength of disorder. Non-interacting particlesare exponentially localized in 1D disordered systems [7]. Our goal is to explorethe role of the ∆ 6= 0 interactions on the localization.In all of the disordered models we consider γ = 0, ∆/t ≤ 1 and ∆ε/t = 20.Notice that for the ideal lattice with ∆/t = 1 illustrated in Figure 1, this valueof ∆ε ensures that the average number of particles 〈n〉 < 1.2 at all times. Thethree-particle sub-space is thus far off-resonant and contributes to the dynamicsperturbatively.85Upper panel of Figure 5.4 shows the average lattice population distributionsillustrating the localization. To obtain these distributions, we place a particle in asingle lattice site, propagate the wave packet to long time and average the resultingprobability distribution over 100 random instances of disorder. We have verifiedthat this number of disorder realizations ensures converged results. The averagingremoves the time-dependence in the long-time limit. The results show that the termVˆnc induces non-exponential wings of the distribution, which rise with the magni-tude of ∆. To illustrate the quantitative contribution of these wings, we computethe inverse participation ratio (IPR) defined asI(t) =∑i( |ψi(t)|2∑i |ψi(t)|2)2, (5.3)where |ψi(t)|2 is the probability of the population of lattice site i at time t. Thevalue of the IPR ranges from 1/N for the state completely delocalized over thelattice with N sites to 1 for the state localized in a single lattice site. We findthat the couplings with ∆/t = 1 decrease the IPR, indicating decrease of local-ization. Surprisingly, the effect of these couplings increases with increasing dis-order strength. This phenomenon is reminiscent of noise-induced delocalization[49, 130, 140, 148]. Here, the variation of on-site energy due to disorder bringsthe energy of the different particle-number states for random lattice sites closer to-gether, thereby enhancing the effect of the couplings induced by Vˆnc. With increas-ing disorder strength w, the probability of the different number states becomingcloser in energy increases, leading to more and stronger high-order hopping terms,thereby decreasing localization more significantly. Note that this result appliesonly in the limit ∆ ∆ε , i.e. in the limit where the number-changing interactionsare much weaker than the energy separation between the number subspaces.8620 10 0 10 20Site index10 610 510 410 310 210 1100ln|i|21 3 5 7 10 12 15 20w(in units of t) t0.0 0.8 1.5time (in units of t 1) 5.4: Upper panel: The logarithm of the particle probability distribu-tions in a disordered 1D lattice with 41 sites: diamonds – ∆/t = 1,squares – ∆/t = 1/2, circles – ∆/t = 1/10. The results are averagedover 100 realizations of disorder and are time-independent. The dashedline is an exponential fit to the ∆/t = 1/10 results. Lower panel: thelong-time limit of the IPR defined in Eq. (5.3) averaged over 100 in-stances of disorder as a function of the disorder strength w: solid line –∆ = 0, dashed line – ∆/t = 1. The inset shows the IPR averaged over100 realizations of disorder for two disorder strengths w = {5/t,10/t}as functions of time: the solid black curves – ∆ = 0; the dotted anddot-dashed curves – ∆= t. Figure from reference [185].875.2.3 Binary treesFigure 5.5: Schematic diagram of an ideal binary tree with depth-5 (G 5).This sections considers quantum walks in binary trees. A binary tree is charac-terized by the number of layers and the connectivity of the lattice sites. Here, weconsider the binary tree G 5 with five layers schematically depicted in Figure 5.5.The model (5.1) adapted to binary trees becomesH =2g−1∑i=1(∆ε+ εi)cˆ†i cˆi+2g−1∑i=12g−1∑j=1ti j(cˆ†i cˆ j + cˆ†j cˆi)+2g−1∑i=12g−1∑j=1∆(cˆ†i cˆ†j + cˆ jcˆi)(5.4)γ = 0 is set for all binary tree calculations. Each node of the binary tree is connectedto three nodes: its father, the left child 2i and the right child 2i+1, soH =2g−1∑i=1(∆ε+ εi)cˆ†i cˆi+ t2g−1∑i=1(cˆ†i cˆ2i+ cˆ†i cˆ2i+1+h.c.)+Vˆnc,tree (5.5)whereVˆnc,tree = ∆2g−1∑i=1(cˆ†i cˆ†2i+ cˆ†i cˆ†2i+1+ cˆicˆ2i+ cˆicˆ2i+1+h.c.)(5.6)The spread of the quantum wave packets in such trees can be described by,σ(t) =√〈ν2〉−〈ν〉2 =√2g−1∑ν=1ν2 pν(t)− (ν pν(t))2 (5.7)Quantum walks started at the root of a graph and comparison between the dynamics880.0 0.5 1.0 1.5time (in units of t 1)02468r(in units of a)=10=200.0 1.0| n(t)|20.0 0.5 1.0 1.5 2.0time (in units of t 1)||P(t)u||1 =10=20Figure 5.6: Upper panel: The growth of the wave packet for a single particleplaced at the root of the tree (G 5): black line – ∆ = 0; the oscillatingcurves – ∆/t = 1 with ∆ε = 10/t (blue) and ∆ε = 20/t (red). The insetshows the probability of reaching the last node (in layer 5) of the binarytree as a function of time. The curves are color-coded in the same waysas in the main plot. Lower panel: The L1 norm between the probabilitydistribution of the wave packet and the uniform distribution as a functionof time. The uniform distribution is defined by the value 1/dim(G 5) foreach node. The solid black curve is for ∆ = 0. The oscillating curvesare for ∆ = t with ∆ε = 10/t (blue) or ∆ε = 20/t (red). For all curvesin both figures γ = 0. Figure from reference [185].89of models with ∆= 0 and ∆ 6= 0 is carried.Upper panel of Figure 5.6 shows that the couplings Vˆnc,tree accelerate quantumwalks on the tree. To quantify the effect of ∆ 6= 0 on quantum walks, we computethe mixing time defined asMε = min{T |∀ t ≥ T : ‖P(t)−pi‖1 ≤ ε} (5.8)where P(t) is the probability distribution at time t, pi is a distribution that the quan-tum system is expected to approach, and ‖·‖1 is the L1 norm.Two distributions pi are considered, the uniform distribution piU and the station-ary distribution piS. The uniform distribution is characterized by the same value ofprobability for each node. The stationary distribution is defined by the followingvalues of the probability for node νpiS(ν) = limT→∞p¯ν(T ), (5.9)where p¯ν(T ) is the time average of the probability of populating node ν ,p¯ν(T ) =1TT−1∑t=0pν(t) =1TT−1∑t=0|〈ν |Ψ(t)〉|2 (5.10)=1TT−1∑t=0〈ν |Ψ(t)〉〈Ψ(t)|ν〉=1TT−1∑t=0{∑λe−iEλ t〈ν |λ 〉〈λ |Ψ(0)〉}p¯ν(T ) ={∑λ ′eiEλ ′ t〈Ψ(0)|λ ′〉〈λ ′|ν〉}(5.11)90Here, h¯ = 1 is set. Defining 〈λ |Ψ(0)〉 as cλn0 ,p¯ν(T ) =1TT−1∑t=0{∑λe−iEλ tcλn0〈ν |λ 〉}{∑λ ′eiEλ ′ tcλ′∗n0 〈λ ′|ν〉}=1TT−1∑t=0∑λ|cλn0 |2 |〈ν |λ 〉|2+1TT−1∑t=0∑λ ,λ ′(cλn0cλ ′∗n0 ei(Eλ ′−Eλ )t〈ν |λ 〉〈λ ′|ν〉). (5.12)In the limit of long time T → ∞ the imaginary part of p¯ν(T ) tends to zero. Thefactor 1T in the real part of p¯ν(T ) cancels because ∑T−1t=0 ei(Eλ−Eλ )t = T . We canthus rewrite piS(ν) aspiS(ν) = ∑λ|cλn0 |2 |〈ν |λ 〉|2. (5.13)From Eq. (5.13) it can be observed that piS(ν) depends on the initial condition(|Ψ(t = 0)〉).Lower panel of Figure 5.6 illustrates the effect of the couplings Vˆnc,tree on thespeed of approaching the uniform distribution piu(ν) and Figure 5.7 the effect of thecouplings Vˆnc,tree on the stationary distribution piS(ν). The approach to the uniformdistribution is accelerated by the Vˆnc,tree terms at short times. As can be seen fromFigure 5.6, the couplings Vˆnc,tree enhance the stationary distribution, illustrating thatthe graph is explored more efficiently by the dynamics with the Vˆnc,tree couplings.5.2.4 Glued binary treesIf two binary trees of Figure 5.5 are joined together as shown in Figure 5.8, oneobtains a glued binary tree. Transport through glued binary trees represents animportant class of problems [96, 104]. Of particular interest is the probability oftransfer from the head node to the bottom node in disordered glued trees. Studies ofsuch processes have been used to understand the consequences of quantum local-ization for the application of quantum walks for quantum computing and quantumcommunication algorithms [96, 104].To study quantum walks in a glued binary tree, model (5.4) was used but with917 15 23 30Node index0.0150.0200.0250.030S()0 10 20 30Node index0.000.120.25 S( )Figure 5.7: The squares represent the stationary distribution piS(ν) for a QWon the G 5 tree with ∆= 0. The circles and diamonds are the the station-ary distributions piS(ν) for quantum walks with ∆/t = 1 and ∆ε = 10/tand ∆ε = 20/t, respectively. Figure from reference [185].Figure 5.8: Schematic diagram of an ideal glued binary trees with depth-4(GBT −4).an adapted summation index to the tree shown in Figure 5.8. Figures 5.10 and 5.9illustrate the effect of the particle-number fluctuations on quantum walks througha glued binary tree. The results shown in the insets of 5.10 are for a single particle92placed at zero time in the head node of an ideal glued tree depicted in Figure 5.8.The upper panel of 5.10 shows the probabilityp j(t) =2d∑j=0|〈 j|ψ(t)〉|2 (5.14)summed over all nodes of depth level j = 3. The lower panel of Figure 5.10 isthe probability of particle density transfer between the two ends of the glued tree.Interestingly, while the ∆ 6= 0 interactions affect the population of the j = 3 level,it can observed a small effect of these interactions on the head-to-bottom transferof the particle for times < 10 t−1 (see the inset of the lower panel of Figure 5.10).In contrast, the same interactions have a much stronger effect on the head-to-bottom transfer through a disordered tree. As can be seen from Figure 5.10, the ∆ 6=0 interactions accelerate the efficiency of particle transfer through the disorderedtree, especially at short times by inducing oscillations as in the case of an open-ended binary tree discussed above. Figure 5.9 shows that these oscillations surviveaveraging over 100 disorder realizations. The disorder strength was set to w = 5/tfor these calculations.While the methodology used here limits the size of the glued tree to sevenlevels, results indicate that the localization of quantum particles in disordered gluedtrees must be affected by the couplings between particle number subspaces. Itwould be interesting to see if the head-to-bottom transfer remains insensitive tothese interactions and how the localization length is affected by such interactionsin larger trees. To treat such problems, it is necessary to develop approximatecomputation techniques for few-particle systems in glued trees.9301021Site index0.|i(t)|201021Site index0.|i(t)|20 1 2 3 4 5time (in units of t 1)01021Site index0.|i(t)|2Figure 5.9: Average particle probability distribution in a disordered GBT 4graph: upper panel − ∆/t = 0, middle panel − ∆ε = 10/t and lowerpanel − ∆ε = 20/t. For all panels we consider 100 realizations of dis-order with a strength of w = 5/t and γ = 0. The wave packet for aparticle is initially placed in the head node of a GBT 4 graph. Figurefrom reference [185].5.3 ConclusionIn this work, coherent quantum dynamics governed by the lattice Hamiltonianswith number non-conserving interactions in the few-body limit was considered.As it is illustrated, the couplings between particle-number subspaces, even if muchsmaller than the energy separation between these subspaces, accelerate the dynam-ics of quantum walks in ideal lattices and binary trees and increase the localizationlength in disordered lattices. Effectively, these couplings provide new degrees offreedom, increasing the range of hopping due to virtual excitations and/or tran-sient elimination of a single particle due to coupling to the vacuum state. As isshowed, the number-changing interactions decrease the mixing and hitting timesfor quantum walks on binary trees.94Results show that the inverse participation ratio in disordered one-dimensionallattices decreases in the presence of number-changing interactions, signalling de-crease of localization. This effect increases with increasing disorder strength, lead-ing to larger changes of the inverse participation ratio in lattices with stronger on-site disorder. This is a direct consequence of the disorder-induced broadening ofthe particle energy bands. This broadening brings different particle number sub-spaces closer in energy, increasing the effect of the number-changing couplingsand, consequently, the effective range of particle hopping.Engineering lattice Hamiltonians to accelerate quantum dynamics has been ofmuch recent interest due to potential applications in quantum computing and thestudy of the fundamental limits of the speed of correlation propagations in quantummany-body systems. Also of much interest is the localization dynamics of particleswith long-range hopping in disordered lattices and graphs. This work illustratesthat models of the type (5.1) can be used to study the effect of hopping range onAnderson localization and quantum walks spreading faster than ballistic expansion.While non-interacting particles are known to be always localized in disordered1D lattices, there is a localization - diffusion transition in 3D lattices [156]. Resultsindicate that the number-changing interactions must affect this transition. It wouldbe interesting in future work to explore the quantitative effect of such interactionson the localization transition in 3D disordered lattices. It would also be interestingto explore the effect of such interactions on localization in 2D lattices. While non-interacting particles with short-range hopping are known to be always localizedin 2D disordered latices, particle interactions may lead to delocalization. Sincethe ∆ 6= 0 terms considered here create pairs of interacting particles in adjacentsites, these interactions may have non-trivial consequences on the localization indisordered 2D lattices.950 2 4 6 8 10time (in units of t 1) 3(t)=10=20=00 5 10time (in units of t 1) 3(t)0 2 4 6 8 10time (in units of t 1) 6(t)=10=20=00 5 10time (in units of t 1)0.00.5p 6(t)Figure 5.10: Average particle probability distribution in a disorderedGBT − 4 graph with strength w = 5/t. The wave packet for a sin-gle particle is initially placed in the head node of a GBT − 4 graphshown in Figure 5.8. The upper panel displays the probability (5.14)summed over all nodes of layer j = 3; the lower panel shows p j(t) forthe bottom node j = 6. Insets: The particle probability distributionsfor an ideal GBT − 4 graph with ∆ε = 10/t (blue dot-dashed) and∆ε = 20/t (red dashed). The broken curves show the results obtainedwith ∆/t = 1 and the full black lines – ∆/t = 0. Figure from reference[185].96Chapter 6Quantum simulators with highlymagnetic atomsNature isn’t classical, dammit, and if you want to make a simulationof nature, you’d better make it quantum mechanical, and by golly it’sa wonderful problem, because it doesn’t look so easy.– Richard P. FeynmanWe show that Zeeman excitations of ultracold Dy atoms trapped in an opticallattice can be used to engineer extended Hubbard models with tunable inter-siteand particle number-non-conserving interactions. We show that the ratio of thehopping amplitude and inter-site interactions in these lattice models can be tunedin a wide range by transferring the atoms to different Zeeman states. We proposeto use the resulting controllable models for the study of the effects of direct particleinteractions and particle number-non-conserving terms on Anderson localization.Results presented in this chapter are published in Ref. [184].6.1 IntroductionThere is currently growing interest in engineering lattice Hamiltonians with ul-tracold atoms and molecules [115]. Of particular interest are extended Hubbardmodels, which include interactions between particles in different lattice sites. Suchmodels exhibit rich physics and have been used to explain the role of long-range97interactions in the context of superfluid - Mott insulator transitions [74], antiferro-magnetism [27, 101], high-Tc superconductivity [56], twisted superfluidity [173],supersolids [141], self-trapping of bipolarons [175]. Extended Hubbard models arevery difficult to solve numerically, especially for two- and three-dimensional lat-tices. Hence, the need to build experiments, where a many-body quantum systemis described by an extended Hubbard model, whose parameters (in particular, theratio of the hopping amplitude and the inter-site interaction energy) can be tunedby varying external fields, and where the particle densities can be imaged prefer-ably with single site resolution. Tuning the parameters of the model, one could usesuch experiments to map out the phase diagrams.There are many proposals for realizing lattice models, including extended Hub-bard models [11, 29, 129, 135, 164, 202], with ultracold atoms or molecules trappedin optical lattices. However, if ultracold atoms or molecules are used as probeparticles of such models, the inter-site interactions are usually very weak. There-fore, the measurements of the phase diagrams require extremely low temperaturesand extremely long coherence times, which are often difficult to achieve in cur-rent experiments. A more promising approach is to trap ultracold molecules inan optical lattice in a Mott insulator phase (with one molecule per site) and userotational excitations of trapped molecules as probe particles of lattice models[73, 84, 86, 110, 121, 128, 138, 198, 201]. Such excitations can be transferredbetween molecules in different sites due to dipole - dipole interactions. The dy-namics of the excitations as well as their interactions can be controlled by externaldc electric and/or microwave fields, leading to lattice models with tunable parame-ters. Experiments using excitations as probe particles of lattice models can toleratemuch higher temperatures of atomic or molecular motion. However, it is currentlynot possible to create an optical lattice filled uniformly with molecules. On theother hand, ultracold atoms can be trapped in optical lattices with nearly uniformfilling [74, 195]. Thus, it would be desirable to engineer extended Hubbard modelswith internal excitations of atoms (instead of molecules) trapped in a Mott insulatorphase.A series of experiments have recently demonstrated the cooling of highly mag-netic Cr [75], Dy [119, 120], and Er [5, 65] atoms to quantum degeneracy. Suchatoms interact via long-range magnetic dipole interactions and one can envision en-98gineering the same lattice models with magnetic atoms as with ultracold molecules.However, the internal level structure of magnetic atoms is more complex than therotational structure of molecules and the nature of magnetic dipole interactions isdifferent from that of electric dipole interactions. Motivated by the experiments onmagnetic atoms and the work with ultracold molecules, we explore here the possi-bility of engineering extended Hubbard models with internal Zeeman excitations ofultracold magnetic atoms, such as Dy, trapped in a Mott insulator phase. Exploit-ing the unique nature of magnetic dipole interactions, we show that, for Zeemanexcitations, the ratio of the hopping amplitude and inter-site interaction energy inthe resulting lattice models can be tuned in a wide range by transferring the atomsto different Zeeman states. We discuss the advantages of using Zeeman excitationsof magnetic atoms over rotational excitations of ultracold molecules. In particular,we show that the hopping of the Zeeman excitations in the lattice is insensitive tothe magnitude of the magnetic field, which makes the coherent dynamics of exci-tations robust to field fluctuations. We show that Zeeman excitations in a dilutedlattice of Dy atoms undergo Anderson localization over time scales less than onesecond and propose the models derived here for the study of the role of interactionsand particle number fluctuations on Anderson localization.6.2 Lattice Hamiltonian with Zeeman excitationsWe consider an ensemble of open-shell atoms with non-zero electron spin (S) andorbital angular momentum (L) trapped in an optical lattice in the presence of anexternal DC magnetic field. We assume that the atoms fill the lattice uniformlywith one atom per lattice site and that the atoms are not allowed to tunnel betweendifferent lattice sites. Thus, the atoms are separated by a large distance (≥ 260nm) equal to half the wavelength of the trapping field. At such separations, thedominant interaction between the atoms in sites i and j is the magnetic dipole -dipole interaction Vˆi j. For simplicity, we assume that the atoms are arranged in aone-dimensional array along the z-axis of the space-fixed coordinate frame. In thiscase,Vˆi j =αr3i j{12[Jˆi,+Jˆ j,−+ Jˆi,−Jˆ j,+]−2Jˆi,zJˆ j,z} . (6.1)99where Jˆz and Jˆ± are the z-component and the raising/lowering operators of the totalangular momentum J = L + S, acting on the space of the eigenstates |JM〉 of J2and Jˆz, and α is the fine structure constant. Equation 6.1 is derived in Appendix C.The full Hamiltonian of the many-atom system isHˆ =∑i{ALi ·Si+µB(Li+2Si) ·B}+ 12∑i ∑j 6=iVˆi j (6.2)where A is the constant of the spin-orbit interaction, µB is the Bohr magneton andB is the vector of an external magnetic field.We assume that all atoms are initially prepared in the Zeeman state |g〉 and asmall number of atoms is then transferred to another Zeeman state |e〉. Note thatthe state |e〉 can be lower or higher in energy than the state |g〉. Following theapproach described in Refs. [2] (see also [196]), we derive the second-quantizedHamiltonian describing the Zeeman transitions in this system:Hˆex = vg+∑i∑e′{εe′− εg+∑j 6=i[〈e′i|〈g j|Vˆi j|e′i〉|g j〉−〈gi|〈g j|Vˆi j|gi〉|g j〉]}cˆ†i,e′ cˆi,e′(6.3)+ ∑i, j 6=i∑e′,e′′〈gi|〈e′j|Vˆi j|e′′i 〉|g j〉cˆ†i,e′′ cˆ j,e′+ ∑i, j 6=i∑e′,e′′(1−δe′,e′′)〈e′i|〈g j|Vˆi j|e′′i 〉|g j〉cˆ†i,e′ cˆi,e′′(6.4)+12 ∑i, j 6=i ∑e′,e′′ ∑f ′, f ′′[δe′,e′′δ f ′, f ′′〈gi|〈g j|Vˆi j|gi〉|g j〉+ 〈e′i|〈 f ′j|Vˆi j|e′′i 〉| f ′′j 〉−2δ f ′, f ′′〈e′i|〈g j|Vˆi j|e′′i 〉|g j〉]cˆ†i,e′ cˆi,e′′ cˆ†j, f ′ cˆ j, f ′′(6.5)+ ∑i, j 6=i∑e′[〈gi|〈g j|Vˆi j|gi〉|e′j〉cˆ j,e′+ 〈gi|〈e′j|Vˆi j|gi〉|g j〉cˆ†j,e′](6.6)+12 ∑i, j 6=i ∑e′,e′′[〈gi|〈g j|Vˆi j|e′i〉|e′′j 〉cˆi,e′ cˆ j,e′′+ 〈e′i|〈e′′j |Vˆi j|gi〉|g j〉cˆ†i,e′ cˆ†j,e′′](6.7)+12 ∑i, j 6=i ∑e′,e′′, f ′[〈e′i|〈g j|Vˆi j|e′′i 〉| f ′j〉−δe′,e′′〈gi|〈g j|Vˆi j|gi〉| f ′j〉] cˆ†i,e′ cˆi,e′′ cˆ j, f ′(6.8)+12 ∑i, j 6=i ∑e′,e′′, f ′[〈e′i|〈 f ′j|Vˆi j|e′′i 〉|g j〉−δe′,e′′〈gi|〈 f ′j|Vˆi j|gi〉|g j〉] cˆ†i,e′ cˆi,e′′ cˆ†j, f ′(6.9)100wherevg = Nεg+12∑i ∑j 6=iV ggi j , (6.10)N is the number of atoms, εg and εe′ are the energies of the atomic states |g〉 and|e′〉, andV ggi j = 〈gi|〈g j|Vˆi j|gi〉|g j〉. (6.11)Here, we assume that the Zeeman states e′,e′′, f ′, f ′′ 6= g and use the operators cˆ†i,e′and cˆi,e′ defined by cˆ†i,e′ |g j〉= δi j|e′j〉 and cˆi,e′ |e′j〉= δi j|g j〉. For the purposes of thiswork, it is convenient to rewrite this complex Hamiltonian asHˆex = vg+∑i(∆εeg+di)cˆ†i cˆi+∑i∑j 6=iti jcˆ†j cˆi+ (6.12)12∑i ∑j 6=ivi jc†i cic†jc j + (6.13)12∑i ∑j 6=iti j(cˆ†i cˆ†j + cˆicˆ j)+∑i∑j 6=isi j(cˆ†i + cˆi)+∑i∑j 6=ipi j(cˆ†i + cˆi)cˆ†j cˆ j (6.14)+H (e′ 6= e,g) (6.15)where the operators cˆ†i and cˆi are defined by cˆ†i |g j〉 = δi j|e j〉 and cˆi|e j〉 = δi j|g j〉,∆εeg is the energy separation between the states |e〉 and |g〉, and the parameters ofthe Hamiltonian aredi =∑j 6=idi j, (6.16)di j ={V gei j −V ggi j}, (6.17)vi j =V eei j +Vggi j −2V egi j , (6.18)101V egi j =Vgei j = 〈gi|〈e j|Vˆi j|gi〉|e j〉 (6.19)V eei j = 〈ei|〈e j|Vˆi j|ei〉|e j〉 (6.20)ti j = 〈gi|〈e j|Vˆi j|ei〉|g j〉 (6.21)si j = 〈ei|〈g j|Vˆi j|gi〉|g j〉 (6.22)andpi j = 〈ei|〈g j|Vˆi j|ei〉|e j〉−〈ei|〈g j|Vˆi j|gi〉|g j〉. (6.23)The terms (6.12), (6.13) and (6.14) are a part of the full Hamiltonian that de-scribes the Zeeman transitions only within the four-state subspace |a〉|b〉 with both|a〉 and |b〉 being either |g〉 or |e〉. If the energy gap for the |g〉→ |e〉 transition werefar detuned from all other energy gaps in the Zeeman level spectrum, it would besufficient to consider the part of the Hamiltonian given by Eqs. (6.12), (6.13) and(6.14). It is important to note that for highly magnetic atoms it may be necessary toconsider Zeeman states outside of this subspace. Figure 1a shows that the Zeemanstates of a Dy atom in the ground electronic state form a ladder of nearly equidis-tant levels at weak magnetic fields. This pattern of energy levels is characteristic ofhighly magnetic atoms with zero or negligible hyperfine structure. This pattern ofenergy levels allows for transitions to states outside of the subspace spanned by |g〉or |e〉. For example, two atoms in the |g〉 state may interact to produce two Zeemanstates with energies just above and just below that of |g〉. Such interactions are in-duced by the matrix elements in Eq. (6.7). The full Hamiltonian must also includethe terms that describe the interactions of two atoms in states e′,e′′ 6= g to produceatoms in other states f ′, f ′′ 6= g,e. Since the majority of atoms are in a particularstate |g〉, we assume that such interactions are unlikely and neglect them.Various lattice models can be engineered by controlling the magnitude of thedifferent matrix elements of the magnetic dipole interaction entering Eqs. (6.3) -(6.9).102-0.5-0.4-0.3-0.2-0.1 0 0.1 0.2 0.3 0.4 0.5 0  200  400  600  800  1000Energy (cm-1 )Magnetic Field B0 (G)J = 8∆ε0∆ε1∆ε-1 0 20 40 60 80 100 120 0  50  100  150  200  250  300  350  400Energy (Hz)Magnetic Field B0 (G)∆ε1 - ∆ε-1∆ε1 - ∆ε0ti,i+1 Figure 6.1: Upper panel: Zeeman levels of a Dy(5I) atom in the lowest-energy spin-orbit state characterized by J = 8 in a magnetic fieldB = B0zˆ. Lower panel: the solid curve – difference of the energygaps (εM=2− εM=1)− (εM=1 − εM=0); the dot-dashed curve – differ-ence of the energy gaps (εM=2−εM=1)−(εM=0−εM=−1). The horizon-tal dashed line shows the magnitude of the matrix element ti,i+1 in Eq.(6.21) for Dy atoms with |g〉= |J = 8,M = 0〉 and |e〉= |J = 8,M = 1〉in an optical lattice with a = 266 nm. Figure from reference [184].1036.3 Engineering lattice modelsIn this section we show (i) how to simplify the lattice Hamiltonian presented inSection II by applying magnetic fields; and (ii) how to tune the relative magnitudesof the parameters of the resulting lattice models by transferring atoms into differentstates. We illustrate the tunable range of the parameters by calculating the modelparameters for the specific example of Dy atoms in an optical lattice.6.3.1 t−V modelEqs. (6.12) and (6.13) represents a t−V model [37], an extended single band, Hub-bard model for hard-core bosons [76, 135, 141]. This model can be studied withthe Zeeman excitations if the effect of the terms (6.14) and (6.15) are suppressed.As we show below, this can be achieved by applying a finite magnetic field andintroducing a small admixture of different M-states into the eigenstates |JM〉.Eqs. (6.3) - (6.9) and (6.12) – (6.14) can be separated into terms that conservethe number of excitations (Eqs. 6.3 – 6.5, 6.12 and 6.13) as well as particle number-non-conserving terms (Eqs. 6.6 – 6.9 and 6.14). If the Zeeman states form a ladderof equidistant states, the particle number-non-conserving terms can be further sep-arated into energy-conserving (some terms in Eq. 6.7) and energy-non-conservingterms (Eqs. 6.6 – 6.9, 6.14). The effect of the energy-non-conserving terms canbe eliminated by applying a finite magnetic field such that the energy differencebetween the Zeeman levels is significantly larger than the magnitude of the matrixelements appearing in Eqs. (6.6) – (6.9) and (6.14).In order to eliminate the effect of all terms in Eq. (6.15), it is necessary tomake the energy gap for the |g〉 → |e〉 transition unique, i.e. different from theenergy gaps in the Zeeman spectrum just below and just above the states |g〉 and|e〉. This can be achieved by applying a magnetic field strong enough to shiftthe Zeeman levels due to couplings between different total angular momentumstates. As illustrated in the lower panel of Figure 6.1, these couplings introducea differential in the energy gaps between different Zeeman states. To illustratethis, we plot in Figure 6.1 the of the energy gaps between the states correlatingwith the states |J = 8,M = −1〉 and |J = 8,M = 0〉; states |J = 8,M = 0〉 and|J = 8,M =+1〉 and states |J = 8,M =+1〉 and |J = 8,M =+2〉, as functions of104B0. As Figure 6.1 shows, the magnetic field with B0 ≈ 200− 300 G produces thedifferential of the energy gaps equal to the matrix elements ti,i+1 for Dy atoms onan optical lattice with a = 266 nm. At fields with B0 > 300 G, the difference in theenergy gaps becomes much larger than any of the matrix elements in Eq. (6.15) sothe Hamiltonian (6.12) – (6.15) reduces to the t−V model.The parameters of the t −V model can be tuned by transferring atoms intodifferent Zeeman states. If the |g〉 and |e〉 states are the Zeeman states |g〉= |JM〉and |e〉= |JM′〉, the matrix elements (6.11) and (6.21) of the operator (6.1) can bewritten as follows:di j =Vgei j −V ggi j =2αr3i j(M2−M′M) (6.24)andti j =α2r3i j[ai+bj−δiM′,M+1δjM′,M−1+ai−bj+δiM′,M−1δjM′,M+1], (6.25)withai± = [J(J+1)−M(M±1)]1/2 (6.26)b j± =[J(J+1)−M′(M′±1)]1/2 (6.27)The interaction between the Zeeman excitations (6.18) can be written asvi j =−[(V egi j −V ggi j )+(V egi j −V eei j )]=−2αr3i j(M−M′)2 (6.28)These equations show that the diagonal matrix elements V ggi j and Vegi j , and hencedi j and vi j are non-zero, provided both M 6= 0 and M′ 6= 0. This is different fromthe case of the electric dipole - dipole interaction between molecules [85]. Theelectric dipole interaction must couple states of the opposite parity. Therefore, if|g〉 and |e〉 are the eigenstates of a molecular Hamiltonian in the absence of electricfields, the matrix elements di j and vi j of the electric dipole - dipole interactionvanish. These interactions can be induced in an ensemble of polar molecules by105applying an external electric field that mixes the rotational states with differentparity [85, 197]In contrast, the matrix elements of the magnetic dipole - dipole interaction(6.24) and (6.25) should not be expected to vary significantly with an externalmagnetic field. This will be illustrated and discussed in the following section,using the example of Dy atoms on an optical lattice. As follows from Eqs. (6.24)and (6.25), the relative weights of the two couplings can be tuned by choosingdifferent Zeeman states |JM〉 as the |g〉 and |e〉 states. Notice, for example, thatfor the particular case of |g〉 being the state |J,M = 0〉, the magnitudes of di j, andconsequently di, vanish.6.3.2 t−V model with Dy atomsWe illustrate the range of controllability of the parameters of the t −V modelsusing an example of Dy atoms in an optical lattice. The absolute magnitudes ofdi j, ti j and vi j increase with J as the square of the magnetic moment. The groundelectronic state of Dy is characterized by the total angular momentum J = 8 soDy atoms have a large magnetic moment (10 Bohr magnetons) and a manifold ofZeeman states displayed in Figure 6.1. The Zeeman structure of Dy allows for thepossibility of using the state |M = 0〉 as the |g〉 state, leading to the value di j = 0.If the states for the Zeeman excitations in an ensemble of Dy atoms are chosento be well-defined angular momentum states |g〉= |JM〉 and |e〉= |JM′〉, Eq. (6.25)shows that ti j = 0 unless |M−M′| = 1. Eq. (6.28) shows that the interaction vi jis ∝ (M−M′)2 so it is independent of M and M′, if |M−M′| = 1. However, theparameter ti j is sensitive to the magnitudes of M and M′. This is illustrated inthe upper panel of Figure 6.2. The ratio ti j/vi j can thus be tuned by transferringatoms into the Zeeman states with different M, as illustrated in the lower panel ofFigure 6.2. Notice that the ratio ti j/vi j is always negative, which means that theinteractions between the excitations are always effectively attractive. The largestmagnitude of the ratio ti j/vi j ≈−18 can be achieved when the atoms are preparedin the Zeeman state with M= 0 and excited to the Zeeman state with M=+1, whilethe smallest magnitude of the ratio ti j/vi j ≈ −4 can be achieved by preparing theatoms in the maximally stretched state |J = 8,M =−8〉 or |J = 8,M =+8〉.106-8-7-6-5-4-3-2-1 0 1 2 3 4 5 6 7 8-8 -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8M'M 5 10 15 20 25 30 35 40| t ij |  (in units of Hz)-8-7-6-5-4-3-2-1 0 1 2 3 4 5 6 7 8-8 -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8M'M-18-16-14-12-10-8-6-4Ratio  tij / vij Figure 6.2: The magnitudes of the coupling constants ti j (upper panel) andthe ratio ti j/vi j (lower panel) with j = i± 1 for the Zeeman states ofDy corresponding to |g〉 ⇒ |JM〉 and |e〉 ⇒ |JM′〉. The calculations arefor the magnetic field B = B0 (0.1xˆ+ zˆ) with B0 = 100 G. The Zeemanstates in this magnetic field retain 96% of the eigenstates of J2 and J z.Figure from reference [184].107As illustrated in Figure 6.3 the absolute magnitude of vi j can be tuned if theatoms are prepared in coherent superpositions of states with different M. Considerfor example the superpositions |g〉 = α|JM〉+ β |J,M + δ 〉 and |e〉 = α ′|JM′〉+β ′|J,M′+ δ ′〉. For the parameter ti j to be non-zero, either |M−M′| or |M−M′+δ − δ ′| must be 1. However, there is no such restriction on the matrix elementsdetermining the magnitude of vi j. As follows from Eq. (6.28), the magnitudeof vi j is expected to increase with increasing the difference between the angu-lar momentum projections of the states participating in the excitation. This isgraphically illustrated in Figure 6.3, showing that the magnitude of vi j can reach600 Hz, if M −M′ = 16. This suggests that the ratio ti j/vi j can be tuned bypreparing the atoms in the coherent superpositions of the following kind: |g〉 =α|JM〉+ β |J,M+ δ 〉 and |e〉 = α ′|JM+ 1〉+ β ′|J,M+ δ ′〉, Figure 6.4. The pa-rameters ti j and vi j for these states are both non-zero and the magnitude of vi j canbe modified by varying the value of |δ −δ ′|.-8-7-6-5-4-3-2-1 0 1 2 3 4 5 6 7 8-8 -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8M'M 0 100 200 300 400 500 600|V ij |  (in units of Hz)Figure 6.3: The magnitude of the coupling constant vi j with j = i±1 for theZeeman states of Dy corresponding to |g〉⇒ |JM〉 and |e〉⇒ |JM′〉. Thecalculations are for the magnetic field B = B0 (0.1xˆ+ zˆ) with B0 = 100G. The Zeeman states in this magnetic field retain 96% of the eigenstatesof J2 and J z. Figure from reference [184].108-14-12-10-8-6-4-2 0 2-6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8Ratio  vi,i+1 / ti,i+1M'(√3/4,√1/4)(√3/5,√2/5)Figure 6.4: The ratio vi,i+1/ti,i+1 for the Zeeman states of Dy correspondingto |g〉= |J = 8,M =−7〉 and |e〉= a|J = 8,M =−8〉+b|J = 8,M′〉: fullcircles – a =√3/4,b =√1/4; open circles – a =√3/5,b =√2/5.The calculations are for the magnetic field B = B0 (0.1xˆ+ zˆ) with B0 =100 G. The Zeeman states in this magnetic field retain 96% of the eigen-states of J2 and J z. Figure from reference [184].The interaction of atoms with a magnetic field couples states with different to-tal angular momenta J, which may - in principle - modify the atomic states |g〉and |e〉, and, consequently, the lattice model parameters. It is important to exam-ine the effect of an external magnetic field on the lattice model parameters. Todo this we diagonalized the full Hamiltonian of the Dy atom in a magnetic fieldB = B0 (0.1xˆ+ zˆ) and used the eigenstates to evaluate the model parameters inEqs. (6.12) – (6.13). Since the states of different J in the Dy atom are separated bylarge energy gaps (> 1000 cm−1) due to the spin-orbit interaction, the eigenstatesof Dy in a magnetic field are nearly identical to the angular momentum states |JM〉.Figure 6.4 shows the nearest-neighbour coupling parameters ti,i+1 and vi,i+1 for aone-dimensional array of Dy atoms on an optical lattice with the lattice site sepa-ration a = 266 nm computed for two pairs of Zeeman states at different magneticfields. The results shown in Figure 6.5 illustrate that the Hamiltonian parameters109do not change with the magnetic field in the interval of field strengths between zeroand 5000 G. This is important because it shows that the magnetic field can be usedto separate the Zeeman states in order to create isolated two-level systems or tunedto the limit of vanishing field where the terms in Eq. (6.14) become important,without affecting the parameters of excitation interactions.-5 0 5 10 15 20 25 30 35 40 0  200  400  600  800  1000Energy (Hz)Magnetic Field B0 (G)ttvFigure 6.5: The magnetic field dependence of the quantities ti,i+1 (squares)and vi,i+1 (circles) defined in Eqs. (6.24) and (6.25) for two differentpairs of the Zeeman state of Dy(J = 8) atoms: the full symbols – theresults for |g〉 = |J = 8,M = −8〉 and |e〉 = |J = 8,M = −7〉; the opensymbols – the results for |g〉= |J = 8,M = 0〉 and |e〉= |J = 8,M =+1〉.The magnetic field is given by B = B0 (0.1xˆ+ zˆ). The Zeeman states insuch a magnetic field retain 96% of the eigenstates of J2 and J z. Figurefrom reference [184].6.3.3 Particle number-non-conserving interactionsIn the limit of weak magnetic fields, as ∆εeg → 0, the energy separation betweendifferent particle number states of the model (6.12) decreases to the minimum ofdi. As follows from Eq. (6.24), this parameter can be eliminated if the groundstate |g〉 is chosen to be |J,M = 0〉. At weak magnetic fields, the particle number-110non-conserving terms (6.14) as well as the terms in Eq. (6.15) must be included inthe Hamiltonian. Number non-conserving interactions may mediate effective long-range hopping (for example, a particle can move in a lattice by virtual transitions tothe three-particle subspace and back). As such, these interactions may have non-trivial effects on the dynamics of quantum walks and localization in disorderedlattices. Such interactions arise in the context of excitons in molecular crystals[3, 12]. However, they are usually negligibly small and difficult to investigate. Asshown below, number non-conserving interactions can be made significant in thesystem considered here.We first note that if the array of atoms is arranged along the magnetic fielddirection, the matrix elements of the operator (6.1) that determine the parameters si jand pi j in Eq. (6.14) vanish. This simplifies the resulting lattice models to includeonly the first of the particle number-non-conserving terms in Eq. (6.14). If desired,the terms si j and pi j can be tuned to finite values if the magnetic field direction ischanged or the atoms are prepared in coherent superpositions of different M-states.For example, if |g〉 = |J,M = 0〉 and |e〉 = α|J = 8,M = 0〉+ β |J = 8,M = 1〉,all of ti j, si j and pi j become non-zero. Here, we assume that the magnetic field isdirected along the atomic array and that si j = 0 and pi j = 0.Care must be taken when considering the limit ∆εeg→ 0. In this limit, multipleZeeman states become degenerate and it may be necessary to consider interbandcouplings determined by Eq. (6.15). This may be useful if complicated models,including multiple excitations of different kind, are desired. Note, however, that if|g〉 and |e〉 are states with well-defined M and M′, a two-atom state |M〉|M′〉 canonly be coupled to the same state, the state |M′〉|M〉 or a state |M±1〉|M′∓1〉. Thematrix elements of the dipole - dipole interaction 〈M,M′|Vˆi j|M±1,M′∓1〉 changethe number and type of excitations in the atomic ensemble. These processes canbe eliminated if the state |g〉 is chosen to be |J,M =±J〉. In this case, the effectivelattice model describing the dynamics of |g〉 → |e〉 excitations is111Hˆex = vg+∑i(∆εeg+di)cˆ†i cˆi+∑i∑j 6=iti jcˆ†j cˆi+12∑i ∑j 6=ivi jc†i cic†jc j +12∑i ∑j 6=iti j(cˆ†i cˆ†j + cˆicˆ j)(6.29)It is important to note that this model is valid as long as ∆εeg (which is deter-mined by the magnitude of the magnetic field) is significantly larger than ti j. Inthis limit, the effect of the number-non-conserving terms is perturbative, i.e. a sin-gle excitation remains predominantly in the single-particle subspace, undergoingvirtual transitions to the three-particle subspace. If the energy gap ∆εeg is so smallthat the interactions (6.29) lead to the creation of multiple particles, other terms inEq. (6.7) must be included, making the Hamiltonian more complex.If the effects of the interactions vi j are to be removed, one can choose the states|g〉 = |J,M = 0〉 and |e〉 = |J,M = 1〉. In this case, |ti j|  |vi j| (see Figure 4).However, the lattice model for these excitations is also affected by terms in Eq.(6.7), which lead to leaking of the |e〉-state populations to other Zeeman states ofhigher energy. These terms lead to the spontaneous creation of atoms in Zeemanstates above and below the energy of the state with M = 0, as well as the inverseprocess. The Zeeman state populations must eventually return to states e and g, asthe total number of the Zeeman states is finite and small. These terms thus serve asan additional source of particle number-non-conserving interactions that generateatoms in state e.6.3.4 Anderson localization of Zeeman excitationsUntil now, we assumed that the atoms populate the optical lattice uniformly. Ifthe lattice is populated partially (which is more often the case in experiments), theempty lattice sites serve as impurities that can scatter the Zeeman excitations. Sincethe distribution of empty sites is random, the Zeeman excitations thus propagate ina randomly diluted lattice. Tuning the models as described above suggests an in-teresting opportunity to explore the role of direct particle interactions and numbernon-conserving forces on Anderson localization in disordered lattices [7, 62]. In112addition, the ability to design optical lattices with various dimensionalities and ge-ometries can be used to verify the scaling hypothesis of Anderson localization [1]as well as Anderson localization of particles with long-range hopping in variousgeometries [118]. Here, we explore if the parameters of the models based on Zee-man excitations of Dy are significant enough to allow Anderson localization overexperimentally feasible time- and length-scales.We consider an isolated Zeeman excitation in a one-dimensional array of 1000Dy atoms trapped in an optical lattice with a = 266 nm containing 20 % of emptylattice sites. We use the parameters correpsonding to the |J = 8,M = 0〉 → |J =8,M =+1〉 excitation and compute the dynamics of quantum walk for the Zeemanexcitation placed at t = 0 on a single atom in the middle of the lattice. The wavepacket of the excitations is propagated by computing the time-evolution operator,as described in detail in Ref. [200]. The results of each dynamical propagationare averaged over 100 disorder realizations (random distributions of empty latticesites).The results shown in Figure 6.6 illustrate that the Zeeman excitation formsan exponentially localized spatial distribution within one second. The width ofthe distribution characterized as the length L containing 90 % of the excitationprobability exhibits a short-time oscillation which is likely an effect of coherentback scattering and approaches the value of ∼ 20 lattice sites in the limit of longtime. These results can be directly mapped onto the results describing Andersonlocalization for rotational excitations in an ensemble of polar molecules [200] andthe electronic excitations in an ensemble of Rydberg atoms [150].6.4 ConclusionIn this work, we consider Zeeman excitations in an ensemble of highly magneticatoms (such as Dy) trapped in an optical lattice, with one atom per lattice site.The Zeeman excitations can travel in the lattice due to energy transfer between theatoms. The most important results of this work can be summarized as follows:• We show that superpositions of the Zeeman excitations can be used to sim-ulate the t −V model (the single-band, extended Bose-Hubbard model forhard-core bosons). The parameters of the model (most importantly, the ra-113tio of the hopping amplitude and the inter-site interaction energy) can betuned by preparing the atoms in different Zeeman states. For an ensemble ofDy atoms on an optical lattice with a = 266 nm, we show that the inter-siteinteraction can be engineered to be as large as 600 Hz.• We illustrate that the parameters of the model (hopping amplitudes and inter-site interactions) are insensitive to the magnetic field. This has two signifi-cant consequences. First, an external magnetic field can be used to uncouplethe electron degrees of freedom from nuclear spins, thereby removing com-plications associated with the hyperfine structure of atoms and the degenera-cies of the Zeeman states. Second, an external magnetic field can be usedto separate the Zeeman states, leading to suppression of energy- and particlenumber-non-conserving terms.• We show that the same Hamiltonian can be used to simulate a lattice modelwith significant c†i c†j terms, leading to particle number interactions. Theseinteractions mediate effective interactions modifying the hopping of particlesand can be used to produce entangled pairs [88].• Since the lattice with randomly distributed empty sites leads to a quan-tum percolation model for the Zeeman excitations, we propose to apply themodels derived here for the study of Anderson localization induced by off-diagonal disorder. In particular, our results suggest the possibility of study-ing the role of inter-site interactions and particle number fluctuations onquantum localization in diluted lattices. We show that for an optical latticewith a = 266 nm partially populated with Dy atoms, Anderson localizationof excitations placed on individual atoms occurs over timescales less than asecond.114−500 −250 0 250 500Site Index10−1510−1010−5100Probability0.0 0.5 1.0 1.5 2.0time (s)510152025L(a)Figure 6.6: Anderson localization of the |J = 8,M = 0〉 → |J = 8,M = +1〉excitation in a one-dimensional array of Dy atoms on an optical latticewith a = 266 nm and 20 % of the lattice sites empty. The upper panelshows the probability distribution for the atoms in the corresponding siteto be in the excited state at t = 2 seconds formed by a single excitationplaced at t = 0 in the middle of a lattice with 1000 sites. The lower panelshows the width of the excitation probability distribution as a functionof time. Figure from reference [184].115Chapter 7ConclusionOur intelligence is what makes us human,and AI is an extension of that quality.— Yann LeCunThe journey to understand quantum phenomena has been challenging. In thelast decades, knowledge about the quantum world has gained a huge boost both bybuilding quantum simulators and by the growth in the computational power. Forexample, the field of theoretical physics has been able to study larger systems as thecomputational power has grown over the years. Furthermore, quantum simulatorshave also improved in recent years due to progress in the precision and power ofelectronics used in the experiments. This thesis has explored new theoretical andnumerical possibilities to study novel quantum phenomena; for example, usingmachine learning algorithms that allow the discovery of new phases of matter or apossibility to accelerate the transport of quantum particles. This dissertation alsoproposed an experimental scheme to study many-body physics.7.1 Summary of the thesisThe first part of the dissertation illustrated that quantum observables, like the elec-tronic energy of a molecule or the polaron dispersion energy, can be learned usingsupervised learning algorithms. The second part of the thesis introduced a novelapproach to accelerate quantum walks in various graphs and the possibility to sim-116ulate extended Bose-Hubbard models using magnetic atoms trapped in an opticallattice.All the ML algorithms that were introduced in this dissertation are supervisedlearning methods, where the only goal is to learn the map from an input to anoutput,yi =F (xi), (7.1)where xi is the input and yi the output. For example, in the case of PESs xi is theposition of the nuclei and yi is the electronic energy. In the case of the polaronHamiltonians, xi is the parameters of the Hamiltonian and yi is the dispersion en-ergy. Here it was showed that given some training data,F (·) can be approximatedusing GP models, where the only assumption made is the analytic form of the ker-nel function. Chapter 2 compared the accuracy of GP models with another famoussupervised learning algorithm, Neural networks. As it is pointed out in Chapter 2,when the dimensionality of xi is low, GP models are a more accurate interpolationalgorithm.As was illustrated in this thesis, the main message of Chapter 3 is the use ofGPmodels as an optimization algorithm. In quantum physics, there are numerousproblems that can be formulated as an optimization problem. The goal for anyoptimization algorithm is to find the minimizer of a function,x∗ = arg minxf (x), (7.2)where the minimum of function f (·) is at x∗. For example, the optimization ofexperimental setups to increase the trapping time of an ion in a surface-electrodetrap [21, 166]. A great variety of optimization algorithms are gradient based meth-ods. However, in quantum physics the function that describes the phenomena isnot analytic and ∇ f (x) can not be evaluated. On the other hand, the ability of GPmodels to accurately approximate f (x) given some training data is what makesthem a suitable tool for BO. Also, GP models can compute the uncertainty of theprediction, something simple NNs can not do. The uncertainty is key in BO sincewe can use it to explore the volume space of f (·) to find x∗.117GPs are a non-parametric supervised method and the choice of the kernel func-tion is fundamental. For interpolation problems, GP models can predict accuratelyusing distance-based kernel functions as e−(xi−x j)2 . Chapter 4 introduced an algo-rithm that finds more robust kernels by combining simple kernel functions, chal-lenging the idea that one can quantitatively and qualitatively predict quantum ob-servables beyond the original observation range. Furthermore, it is demonstratedthat by extrapolating continuous quantum observables, like the polaron dispersion,one can discover new phases of matter.The second part of this dissertation presented the possibility to enhance the dif-fusion of quantum walks and an experimental proposal to simulate Bose-Hubbardtype models. Quantum walks are used in different algorithms due to their abilityto spread faster than a random walk. Major efforts have been devoted to enhanc-ing the linear time dependence of quantum walks diffusion, O(t). One possibilityis to allow quantum walks to interact with an external environment [130, 148].However, we decided to tackle the same problem by allowing the quantum walksto create/annihilate more walks in a graph. The possibility of making the numberof initial particles be a non-conservative quantity made the quantum walk spreadfaster than the ballistic expansion at short times, O(tn) where n > 1. We alsoshowed that in the presence of disorder a quantum walk is less localized if parti-cle number-changing interactions are considered. These conclusions are importantsince the diffusion and localization of quantum walks is key in quantum computing[47, 89, 171] and quantum transport [49, 140].The Bose-Hubbard Hamiltonian was proposed to describe the electrons in asolid state in 1963 [77, 93]. It is practically impossible to computationally sim-ulate quantum systems with a large number of particles; therefore, the ability toexperimentally study them is key. Chapter 6 presented an experimental proposal tostudy Bose-Hubbard type Hamiltonians using magnetic atoms trapped in an opticallattice. This Illustrates the possibility to tune various Bose-Hubbard type Hamil-tonians by experimentally preparing magnetic atoms in different Zeeman states.Additionally, it was explained in Chapter 6, in the limit of low external magneticfield particle a Bose-Hubbard model with number-changing interactions can be de-scried.1187.2 Future workThis section discusses some of the future possible research directions given theresults presented in this dissertation.ML has recently caught the attention in the fields of theoretical chemistry andphysics. However, in the field of theoretical chemistry, the first paper where a NNwas used to construct PESs was published in 2006 [124]. Furthermore, if one wereto construct the PES of a big polymer using GP models is practically impossibledue to the computational resources needed to invert the covariance matrix. As itis mentioned, one of the limitations of GPs is the scalability for large dimensionalsystems. Nevertheless, there have been new proposals in the ML field to improvethe scalability of GPs models for large dimensional systems [59, 192]. The com-putational bottleneck for GPs’ prediction is the inversion of the covariance matrix,which scales asO(N3), where N is the number of training points. In principle, GPsare dimensionally independent; however, the volume space increases as a functionof the dimensions and so does the training data required to fill the volume. Thus,the need for robust kernel functions to learn the intrinsic similarity of the data it iscrucial.Another salient point introduced in this dissertation is the optimization of black-box type functions. Black-box type functions do not have an analytic form and onlyoutput a value given a specific input. Optimization of black-box type functions isan active research field in computer science and could also have a great impact inexperimental physics. Chapter 3 explains how BO works to find the minimizer ofblack-box functions. The experimental challenges in ultracold physics can benefitby the use of BO type algorithms to search for better parameters that could in-crease, for example, the lifetime of quantum particles. In a current collaborationwith Prof. Tobias Schaetz, from the University of Freiburg, we are exploring theuse of BO to increase the trapping time of ions in surface-electrode traps by simu-lating the lifetime of an ion in a trap with different voltages. The assumption madehere is that there exists a function f (v) that maps a set of voltages, v, applied to thetrap to the trapping time of an ion. Thus, one can use BO to search for v∗ that willincrease the trapping time of an ion.In the field of chemistry, BO already has had a great impact. Recently, BO was119introduced in the field of experimental chemistry to improve the yield of chemistryreactions [82, 152]. Moreover, as explained in Appendix A, BO can optimizehybrid-exchange functionals for a benchmark of molecules. Most of the hybrid-exchange functionals are optimized by computing the square error between thepredicted property of a molecule and the accurate one,L =√1MM∑i=1(EDFT − Eˆ)2, (7.3)where M is the total number of molecular properties, EDFT is the predicted propertyusing density functional theory (DFT) and Eˆ is the accurate molecular property.These types of functions cannot be minimized with gradient descent based meth-ods since the gradient of a molecular property given different values of exchange-correlation functional parameters, a0,ax and ac do not exist. However, the er-ror function L does depend on the values of a0,ax, and ac [105, 151], L =f (a0,ax,ac). Thus, the minimizer of L is the values of a0,ax, and ac that bestreproduce the molecular properties.Chapter 4 introduced how GP models are capable of extrapolating continuousvariables if the right kernel function is selected. This has a great implication sinceone can study quantum problems where the observables cannot be accessed due toconvergence problems or lack of computational power. For example, it is illustratedthat GP models can extrapolate the Holstein polaron dispersion for low phonon fre-quencies, where traditional condensed matter methods have a lack of convergencedue to the dimensionality of the Hilbert space. This is not the only possibility, onecan think of extrapolating quantum observables over time 〈Oˆ(t)〉. For example,F. Ha¨se et al. [81] used a deepNN to interpolate for different times the excitationenergy transfer properties for different Fenna-Matthews-Olson (FMO) pigment-protein complex with eight chlorophyll pigments. Along these lines, one could useGP models with a combination of kernels to extrapolate the same time-dependentobservables and reduce the computational cost, since the time-dependent calcula-tions for longer time periods are computationally demanding.The constructing procedure to combine simple kernels proposed by D. K. Du-venaud et al. in Ref. [60] is based on the maximization of the BIC. The BIC120allows the comparison of different kernel combinations without the need for testdata, which makes it a powerful tool to extrapolate quantum observables. BIC isunbiased to models with a large number of parameters since it also penalizes ker-nels with a larger number of parameters to avoid memorization and enhance thelearning. One of the problems with the methodology presented in Chapter 4 is thelack of criteria to stop the search for the optimal kernel combination. In theory,one can run the algorithm and create complex kernel forms with a large number ofparameters but this does not guarantee a better extrapolation prediction. An inter-esting research direction is to understand why BIC prefers certain types of kernelcombinations. This will allow the use of GPs to extrapolate quantum observableswith a more physical intuition.As Jeff Hawkins said: The key to artificial intelligence has always been therepresentation. The most remarkable point of this dissertation is the possibility forscientists to break free from the old mathematical analytics and study more com-plex or new problems, whose representation can be better captured and describedwith machine learning algorithms.121Bibliography[1] E. Abrahams, P. W. Anderson, D. C. Licciardello, and T. V. Ramakrishnan.Scaling theory of localization: Absence of quantum diffusion in twodimensions. Phys. Rev. Lett., 42:673–676, Mar 1979.doi:10.1103/PhysRevLett.42.673. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.42.673. → page 113[2] V. M. Agranovich. Excitations in Organic Solids. Oxford: OxfordUniversity Press, 2009. → pages 80, 100[3] V. M. Agranovich and D. M. Basko. Frenkel excitons beyond theHeitler-London approximation. The Journal of Chemical Physics, 112(18):8156–8162, 2000. → page 111[4] D. Aharonov, A. Ambainis, J. Kempe, and U. Vazirani. Quantum walks ongraphs. In Proceedings of the Thirty-third Annual ACM Symposium onTheory of Computing, STOC ’01, pages 50–59, New York, NY, USA,2001. ACM. ISBN 1-58113-349-9. → pages 8, 78[5] K. Aikawa, A. Frisch, M. Mark, S. Baier, A. Rietzler, R. Grimm, andF. Ferlaino. Bose-Einstein condensation of erbium. Phys. Rev. Lett., 108:210401, May 2012. → page 98[6] Y. Akinaga and S. Ten-no. Range-separation by the Yukawa potential inlong-range corrected density functional theory with Gaussian-type basisfunctions. Chemical Physics Letters, 462(4):348 – 351, 2008. ISSN0009-2614. URLhttp://www.sciencedirect.com/science/article/pii/S0009261408010609. →page 147[7] P. W. Anderson. Absence of diffusion in certain random lattices. Phys.Rev., 109:1492–1505, Mar 1958. → pages 85, 112122[8] L.-F. Arsenault, O. Anatole von Lilienfeld, and A. J. Millis. Machinelearning for many-body physics: efficient solution of dynamical mean-fieldtheory. ArXiv e-prints, June 2015. → pages 1, 54, 74[9] A. Aspuru-Guzik and P. Walther. Photonic quantum simulators. NaturePhysics, 8:285 EP –, Apr 2012. URL http://dx.doi.org/10.1038/nphys2253.Review Article. → page 78[10] R. Baer and D. Neuhauser. Density functional theory with correctlong-range asymptotic behavior. Phys. Rev. Lett., 94:043002, Feb 2005.doi:10.1103/PhysRevLett.94.043002. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.94.043002. → page 147[11] S. Baier, M. J. Mark, D. Petter, K. Aikawa, L. Chomaz, Z. Cai, M. Baranov,P. Zoller, and F. Ferlaino. Extended Bose-Hubbard models with ultracoldmagnetic atoms. Science, 352(6282):201–205, 2016. → page 98[12] L. D. Bakalis and J. Knoester. Optical properties of one-dimensionalexciton systems: Beyond the Heitler-London approximation. The Journalof Chemical Physics, 106(17):6964–6976, 1997. → page 111[13] A. P. Barto´k and G. Csa´nyi. Gaussian approximation potentials: A brieftutorial introduction. International Journal of Quantum Chemistry, 115(16):1051–1057, 2015. doi:10.1002/qua.24927. URLhttps://onlinelibrary.wiley.com/doi/abs/10.1002/qua.24927. → page 25[14] A. P. Barto´k, M. C. Payne, R. Kondor, and G. Csa´nyi. Gaussianapproximation potentials: The accuracy of quantum mechanics, without theelectrons. Phys. Rev. Lett., 104:136403, Apr 2010.doi:10.1103/PhysRevLett.104.136403. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.104.136403. → page 24[15] J. Behler. Neural network potential-energy surfaces in chemistry: a tool forlarge-scale simulations. Phys. Chem. Chem. Phys., 13:17930–17955, 2011.doi:10.1039/C1CP21668F. URL http://dx.doi.org/10.1039/C1CP21668F.→ page 25[16] M. Berciu. Green’s function of a dressed particle. Phys. Rev. Lett., 97:036402, Jul 2006. doi:10.1103/PhysRevLett.97.036402. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.97.036402. → pages 59, 69[17] A. J. Berkley, M. W. Johnson, P. Bunyk, R. Harris, J. Johansson, T. Lanting,E. Ladizinsky, E. Tolkacheva, M. H. S. Amin, and G. Rose. A scalable123readout system for a superconducting adiabatic quantum optimizationsystem. Superconductor Science and Technology, 23(10):105014, 2010.URL http://stacks.iop.org/0953-2048/23/i=10/a=105014. → page 78[18] S. D. Berry and J. B. Wang. Two-particle quantum walks: Entanglementand graph isomorphism testing. Phys. Rev. A, 83:042317, Apr 2011.doi:10.1103/PhysRevA.83.042317. URLhttps://link.aps.org/doi/10.1103/PhysRevA.83.042317. → page 79[19] J. Billy, V. Josse, Z. Zuo, A. Bernard, B. Hambrecht, P. Lugan, D. Cle´ment,L. Sanchez-Palencia, and A. Bouyer, Philippeand Aspect. Directobservation of Anderson localization of matter waves in a controlleddisorder. Nature, 453:891 EP –, Jun 2008. URLhttp://dx.doi.org/10.1038/nature07000. → page 78[20] C. M. Bishop. Pattern Recognition and Machine Learning (InformationScience and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006. ISBN0387310738. → pages 11, 67[21] R. Blatt and C. F. Roos. Quantum simulations with trapped ions. NaturePhysics, 8:277 EP –, Apr 2012. URL http://dx.doi.org/10.1038/nphys2252.Review Article. → pages 8, 117[22] J. Bonc˘a, T. Katras˘nik, and S. A. Trugman. Mobile bipolaron. Phys. Rev.Lett., 84:3153–3156, Apr 2000. doi:10.1103/PhysRevLett.84.3153. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.84.3153. → pagesxvii, 69, 70[23] A. I. Boothroyd, W. J. Keogh, P. G. Martin, and M. R. Peterson. A refinedH3 potential energy surface. The Journal of Chemical Physics, 104(18):7139–7152, 1996. → pages xiii, xiv, 24, 40, 41, 42, 44, 47[24] B. J. Braams and J. M. Bowman. Permutationally invariant potentialenergy surfaces in high dimensionality. International Reviews in PhysicalChemistry, 28(4):577–606, 2009. → page 40[25] E. Brochu, V. M. Cora, and N. de Freitas. A Tutorial on BayesianOptimization of Expensive Cost Functions, with Application to ActiveUser Modeling and Hierarchical Reinforcement Learning. ArXiv e-prints,Dec. 2010. → pages 34, 37, 38, 51[26] F. Brockherde, L. Vogt, L. Li, M. E. Tuckerman, K. Burke, and K.-R.Mu¨ller. Bypassing the Kohn-Sham equations with machine learning.124Nature Communications, 8(1):872, 2017. ISSN 2041-1723.doi:10.1038/s41467-017-00839-3. URLhttps://doi.org/10.1038/s41467-017-00839-3. → page 146[27] P. Bruno. Absence of spontaneous magnetic order at nonzero temperaturein one- and two-dimensional Heisenberg and XY systems with long-rangeinteractions. Phys. Rev. Lett., 87:137203, Sep 2001. → page 98[28] A. Buarque and W. Dias. Unidirectional quantum walk of two correlatedparticles: Manipulating bound-pair and unbound wave-packet components.Physics Letters A, 381(37):3173 – 3177, 2017. ISSN 0375-9601.doi:https://doi.org/10.1016/j.physleta.2017.08.016. URLhttp://www.sciencedirect.com/science/article/pii/S0375960117307417. →page 79[29] A. Bu¨hler and H. P. Bu¨chler. Supersolid phase in atomic gases withmagnetic dipole interaction. Phys. Rev. A, 84:023607, Aug 2011. → page98[30] I. Buluta and F. Nori. Quantum simulators. Science, 326(5949):108–111,2009. ISSN 0036-8075. doi:10.1126/science.1177838. URLhttp://science.sciencemag.org/content/326/5949/108. → pages 1, 8[31] P. I. Bunyk, E. M. Hoskinson, M. W. Johnson, E. Tolkacheva, F. Altomare,A. J. Berkley, R. Harris, J. P. Hilton, T. Lanting, A. J. Przybysz, andJ. Whittaker. Architectural considerations in the design of asuperconducting quantum annealing processor. IEEE Transactions onApplied Superconductivity, 24(4):1–10, Aug 2014. ISSN 1051-8223.doi:10.1109/TASC.2014.2318294. → page 78[32] S. Cantrill. Water behind walls. Nature Chemistry, 3:753 EP –, Sep 2011.URL http://dx.doi.org/10.1038/nchem.1168. → page 78[33] G. Carleo and M. Troyer. Solving the quantum many-body problem withartificial neural networks. Science, 355(6325):602–606, 2017. ISSN0036-8075. doi:10.1126/science.aag2302. URLhttp://science.sciencemag.org/content/355/6325/602. → page 1[34] L. D. Carr, D. DeMille, R. V. Krems, and J. Ye. Cold and ultracoldmolecules: science, technology and applications. New Journal of Physics,11(5):055049, 2009. URLhttp://stacks.iop.org/1367-2630/11/i=5/a=055049. → page 79125[35] J. Carrasquilla and R. G. Melko. Machine learning phases of matter.Nature Physics, 13:431 EP –, Feb 2017. URLhttp://dx.doi.org/10.1038/nphys4035. → pages 1, 54, 73[36] S. Carter, N. C. Handy, and J. Demaison. The rotational levels of theground vibrational state of formaldehyde. Molecular Physics, 90(5):729–738, 1997. → page 26[37] M. A. Cazalilla, R. Citro, T. Giamarchi, E. Orignac, and M. Rigol. Onedimensional bosons: From condensed matter systems to ultracold gases.Rev. Mod. Phys., 83:1405–1466, 2011. → page 104[38] C. M. Chandrashekar and T. Busch. Quantum walk on distinguishablenon-interacting many-particles and indistinguishable two-particle.Quantum Information Processing, 11(5):1287–1299, Oct 2012. ISSN1573-1332. doi:10.1007/s11128-012-0387-6. URLhttps://doi.org/10.1007/s11128-012-0387-6. → page 79[39] T. Chattaraj and R. V. Krems. Effects of long-range hopping andinteractions on quantum walks in ordered and disordered lattices. Phys.Rev. A, 94:023601, Aug 2016. doi:10.1103/PhysRevA.94.023601. URLhttps://link.aps.org/doi/10.1103/PhysRevA.94.023601. → pages 78, 79, 83[40] R. H. Chaviguri, T. Comparin, V. S. Bagnato, and M. A. Caracanhas. Phasetransition of ultracold atoms immersed in a Bose-Einstein-condensatevortex lattice. Phys. Rev. A, 95:053639, May 2017.doi:10.1103/PhysRevA.95.053639. URLhttps://link.aps.org/doi/10.1103/PhysRevA.95.053639. → page 79[41] L. Che, Z. Ren, X. Wang, W. Dong, D. Dai, X. Wang, D. H. Zhang,X. Yang, L. Sheng, G. Li, H.-J. Werner, F. Lique, and M. H. Alexander.Breakdown of the Born-Oppenheimer approximation in the F+ o-D2 → DF+ D reaction. Science, 317(5841):1061–1064, 2007. → page 40[42] J. Chen, X. Xu, X. Xu, and D. H. Zhang. A global potential energy surfacefor the H2 + OH→ H2O + H reaction using neural networks. The Journalof Chemical Physics, 138(15):154301, 2013. → pagesxiv, 40, 41, 42, 45, 46[43] A. M. Childs. Universal computation by quantum walk. Phys. Rev. Lett.,102:180501, May 2009. doi:10.1103/PhysRevLett.102.180501. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.102.180501. → page 78126[44] A. M. Childs and J. Goldstone. Spatial search and the Dirac equation.Phys. Rev. A, 70:042312, Oct 2004. doi:10.1103/PhysRevA.70.042312.URL https://link.aps.org/doi/10.1103/PhysRevA.70.042312. → page 78[45] A. M. Childs and J. Goldstone. Spatial search by quantum walk. Phys. Rev.A, 70:022314, Aug 2004. doi:10.1103/PhysRevA.70.022314. URLhttps://link.aps.org/doi/10.1103/PhysRevA.70.022314. → pages 8, 78[46] A. M. Childs, E. Farhi, and S. Gutmann. An example of the differencebetween quantum and classical random walks. Quant. Info. Proc., 1(1):35–43, Apr 2002. → pages 6, 77[47] A. M. Childs, R. Cleve, E. Deotto, E. Farhi, S. Gutmann, and D. A.Spielman. Exponential algorithmic speedup by a quantum walk. InProceedings of the Thirty-fifth Annual ACM Symposium on Theory ofComputing, STOC ’03, pages 59–68. ACM, 2003. ISBN 1-58113-674-9.→ pages 8, 78, 118[48] A. M. Childs, D. Gosset, and Z. Webb. Universal computation bymultiparticle quantum walk. Science, 339(6121):791–794, 2013. ISSN0036-8075. doi:10.1126/science.1229957. URLhttp://science.sciencemag.org/content/339/6121/791. → page 78[49] A. W. Chin, A. Datta, F. Caruso, S. F. Huelga, and M. B. Plenio.Noise-assisted energy transfer in quantum networks and light-harvestingcomplexes. New Journal of Physics, 12(6):065002, 2010. URLhttp://stacks.iop.org/1367-2630/12/i=6/a=065002. → pages 86, 118[50] Y.-S. Cho and R. J. L. Roy. Full empirical potential curves for the X1Σ+and A1Π states of CH+ from a direct-potential-fit analysis. The Journal ofChemical Physics, 144(2):024311, 2016. → page 48[51] M. A. Collins. Molecular potential-energy surfaces for chemical reactiondynamics. Theoretical Chemistry Accounts, 108(6):313–324, Dec 2002. →page 40[52] J. Cui and R. V. Krems. Efficient non-parametric fitting of potential energysurfaces for polyatomic molecules with Gaussian processes. Journal ofPhysics B: Atomic, Molecular and Optical Physics, 49(22):224001, 2016.→ page 25[53] A. Damianou and N. Lawrence. Deep Gaussian processes. In Proceedingsof the Sixteenth International Conference on Artificial Intelligence and127Statistics, volume 31 of Proceedings of Machine Learning Research, pages207–215. PMLR, 2013. → page 76[54] F. A. B. F. de Moura, A. V. Malyshev, M. L. Lyra, V. A. Malyshev, andF. Domı´nguez-Adame. Localization properties of a one-dimensionaltight-binding model with nonrandom long-range intersite interactions.Phys. Rev. B, 71:174203, May 2005. doi:10.1103/PhysRevB.71.174203.URL https://link.aps.org/doi/10.1103/PhysRevB.71.174203. → page 78[55] W. Dong, C. Xiao, T. Wang, D. Dai, X. Yang, and D. H. Zhang.Transition-state spectroscopy of partial wave resonances in the F + HDreaction. Science, 327(5972):1501–1502, 2010. → page 40[56] S. Doniach and M. Inui. Long-range Coulomb interactions and the onset ofsuperconductivity in the high-tc materials. Phys. Rev. B, 41:6668–6678,Apr 1990. → page 98[57] W. Du¨r, R. Raussendorf, V. M. Kendon, and H.-J. Briegel. Quantum walksin optical lattices. Phys. Rev. A, 66:052319, Nov 2002.doi:10.1103/PhysRevA.66.052319. URLhttps://link.aps.org/doi/10.1103/PhysRevA.66.052319. → page 78[58] D. Duvenaud. Automatic Model Construction with Gaussian Processes.PhD thesis, Computational and Biological Learning Laboratory, Universityof Cambridge, 2014. → pages v, 57, 58, 67, 68[59] D. Duvenaud, H. Nickisch, and C. E. Rasmussen. Additive Gaussianprocesses. In Advances in Neural Information Processing Systems 24,pages 226–234, 2011. → page 119[60] D. Duvenaud, J. R. Lloyd, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani.Structure discovery in nonparametric regression through compositionalkernel search. In Proceedings of the 30th International Conference onMachine Learning, pages 1166–1174, 2013. → pages v, 57, 58, 68, 120[61] A. Ernesti and J. M. Hutson. Non-additive intermolecular forces from thespectroscopy of van der waals trimers: A comparison of Ar2-HF andAr2−HCl, including H/D isotope effects. The Journal of Chemical Physics,106(15):6288–6301, 1997. → pages 39, 48[62] F. Evers and A. D. Mirlin. Anderson transitions. Rev. Mod. Phys., 80:1355–1417, Oct 2008. → page 112128[63] E. Farhi and S. Gutmann. Quantum computation and decision trees. Phys.Rev. A, 58:915–928, Aug 1998. → pages 8, 78[64] R. P. Feynman. Simulating physics with computers. International Journalof Theoretical Physics, 21(6):467–488, Jun 1982. ISSN 1572-9575.doi:10.1007/BF02650179. URL https://doi.org/10.1007/BF02650179. →pages 1, 8[65] A. Frisch, K. Aikawa, M. Mark, A. Rietzler, J. Schindler, Z. E., R. Grimm,and F. Ferlaino. Narrow-line magneto-optical trap for erbium. Phys. Rev.A, 85:051401, May 2012. → page 98[66] T. Fukuhara, P. Schauß, M. Endres, S. Hild, M. Cheneau, I. Bloch, andC. Gross. Microscopic observation of magnon bound states and theirdynamics. Nature, 502:76 EP –, Sep 2013. URLhttp://dx.doi.org/10.1038/nature12541. → page 78[67] J. K. Gamble, M. Friesen, D. Zhou, R. Joynt, and S. N. Coppersmith.Two-particle quantum walks applied to the graph isomorphism problem.Phys. Rev. A, 81:052313, May 2010. doi:10.1103/PhysRevA.81.052313.URL https://link.aps.org/doi/10.1103/PhysRevA.81.052313. → page 78[68] B. T. Gard, R. M. Cross, P. M. Anisimov, H. Lee, and J. P. Dowling.Quantum random walks with multiphoton interference and high-ordercorrelation functions. J. Opt. Soc. Am. B, 30(6):1538–1545, Jun 2013.doi:10.1364/JOSAB.30.001538. URLhttp://josab.osa.org/abstract.cfm?URI=josab-30-6-1538. → page 79[69] B. Gerlach and H. Lo¨wen. Analytical properties of polaron systems or: Dopolaronic phase transitions exist or not? Rev. Mod. Phys., 63:63–90, Jan1991. doi:10.1103/RevModPhys.63.63. URLhttps://link.aps.org/doi/10.1103/RevModPhys.63.63. → page 56[70] O. Giraud, B. Georgeot, and D. L. Shepelyansky. Quantum computing ofdelocalization in small-world networks. Phys. Rev. E, 72:036203, Sep2005. doi:10.1103/PhysRevE.72.036203. URLhttps://link.aps.org/doi/10.1103/PhysRevE.72.036203. → page 78[71] R. Go´mez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Herna´ndez-Lobato,B. Sa´nchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel,R. P. Adams, and A. Aspuru-Guzik. Automatic chemical design using adata-driven continuous representation of molecules. ACS Central Science,4(2):268–276, 2018. → pages 1, 75129[72] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press,2016. http://www.deeplearningbook.org. → pages 30, 76[73] A. V. Gorshkov, S. R. Manmana, G. Chen, J. Ye, E. Demler, M. D. Lukin,and A. M. Rey. Tunable superfluidity and quantum magnetism withultracold polar molecules. Phys. Rev. Lett., 107:115301, Sep 2011. →pages 9, 98[74] M. Greiner, O. Mandel, T. Esslinger, T. W. Ha¨nsch, and I. Bloch. Quantumphase transition from a superfluid to a Mott insulator in a gas of ultracoldatoms. Nature, 415:39 EP –, Jan 2002. URLhttp://dx.doi.org/10.1038/415039a. Article. → pages 9, 98[75] A. Griesmaier, J. Werner, S. Hensler, J. Stuhler, and T. Pfau. Bose-Einsteincondensation of chromium. Phys. Rev. Lett., 94:160401, Apr 2005.doi:10.1103/PhysRevLett.94.160401. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.94.160401. → pages 9, 98[76] S.-J. Gu, S.-S. Deng, Y.-Q. Li, and H.-Q. Lin. Entanglement and quantumphase transition in the extended Hubbard model. Phys. Rev. Lett., 93:086402, Aug 2004. doi:10.1103/PhysRevLett.93.086402. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.93.086402. → page 104[77] M. C. Gutzwiller. Effect of correlation on the ferromagnetism of transitionmetals. Phys. Rev. Lett., 10:159–162, Mar 1963.doi:10.1103/PhysRevLett.10.159. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.10.159. → page 118[78] C. M. Handley and J. Behler. Next generation interatomic potentials forcondensed systems. The European Physical Journal B, 87(7):152, Jul2014. ISSN 1434-6036. doi:10.1140/epjb/e2014-50070-0. URLhttps://doi.org/10.1140/epjb/e2014-50070-0. → page 25[79] C. M. Handley and P. L. A. Popelier. Potential energy surfaces fitted byartificial neural networks. The Journal of Physical Chemistry A, 114(10):3371–3383, 2010. → page 40[80] R. Harris, M. W. Johnson, T. Lanting, A. J. Berkley, J. Johansson, P. Bunyk,E. Tolkacheva, E. Ladizinsky, N. Ladizinsky, T. Oh, F. Cioata, I. Perminov,P. Spear, C. Enderud, C. Rich, S. Uchaikin, M. C. Thom, E. M. Chapple,J. Wang, B. Wilson, M. H. S. Amin, N. Dickson, K. Karimi, B. Macready,C. J. S. Truncik, and G. Rose. Experimental investigation of an eight-qubitunit cell in a superconducting optimization processor. Phys. Rev. B, 82:130024511, Jul 2010. doi:10.1103/PhysRevB.82.024511. URLhttps://link.aps.org/doi/10.1103/PhysRevB.82.024511. → pages 78, 80[81] F. Ha¨se, S. Valleau, E. Pyzer-Knapp, and A. Aspuru-Guzik. Machinelearning exciton dynamics. Chem. Sci., 7:5139–5147, 2016.doi:10.1039/C5SC04786B. URL http://dx.doi.org/10.1039/C5SC04786B.→ page 120[82] F. Ha¨se, L. M. Roch, and A. Aspuru-Guzik. Chimera: enabling hierarchybased multi-objective optimization for self-driving laboratories. Chem.Sci., pages –, 2018. doi:10.1039/C8SC02239A. URLhttp://dx.doi.org/10.1039/C8SC02239A. → page 120[83] F. Ha¨se, L. M. Roch, C. Kreisbeck, and A. Aspuru-Guzik. PHOENICS: Auniversal deep Bayesian optimizer. ArXiv e-prints, Jan. 2018. → page 51[84] K. R. A. Hazzard, B. Gadway, M. Foss-Feig, B. Yan, S. A. Moses, J. P.Covey, N. Y. Yao, M. D. Lukin, J. Ye, D. S. Jin, and A. M. Rey.Many-body dynamics of dipolar molecules in an optical lattice. Phys. Rev.Lett., 113:195302, Nov 2014. doi:10.1103/PhysRevLett.113.195302. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.113.195302. → pages9, 79, 98[85] F. Herrera and R. V. Krems. Tunable Holstein model with cold polarmolecules. Phys. Rev. A, 84:051401(R), Nov 2011. → pages 105, 106[86] F. Herrera, M. Litinskaya, and R. V. Krems. Tunable disorder in a crystal ofcold polar molecules. Phys. Rev. A, 82:033428, Sep 2010. → pages 9, 98[87] F. Herrera, K. W. Madison, R. V. Krems, and M. Berciu. Investigatingpolaron transitions with polar molecules. Phys. Rev. Lett., 110:223002,May 2013. doi:10.1103/PhysRevLett.110.223002. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.110.223002. → pagesxvi, xvii, xviii, 56, 58, 59, 62, 63, 65, 66, 72, 73, 74[88] F. Herrera, Y. Cao, S. Kais, and K. B. Whaley. Infrared-dressedentanglement of cold open-shell polar molecules for universal matchgatequantum computing. New Journal of Physics, 16(7):075001, 2014. → page114[89] A. P. Hines and P. C. E. Stamp. Quantum walks, quantum gates, andquantum computers. Phys. Rev. A, 75:062321, Jun 2007. → pages6, 77, 78, 118131[90] P. Hohenberg and W. Kohn. Inhomogeneous electron gas. Phys. Rev., 136:B864–B871, Nov 1964. doi:10.1103/PhysRev.136.B864. URLhttps://link.aps.org/doi/10.1103/PhysRev.136.B864. → page 145[91] T. Hollebeek, T.-S. Ho, and H. Rabitz. Constructing multidimensionalmolecular potential energy surfaces from ab initio data. Annual Review ofPhysical Chemistry, 50(1):537–570, 1999. → page 40[92] J. M. M. Howson and J. M. Hutson. Morphing the He-OCS intermolecularpotential. The Journal of Chemical Physics, 115(11):5059–5065, 2001. →pages 39, 48[93] J. Hubbard. Electron correlations in narrow energy bands. Proceedings ofthe Royal Society of London A: Mathematical, Physical and EngineeringSciences, 276(1365):238–257, 1963. ISSN 0080-4630.doi:10.1098/rspa.1963.0204. URLhttp://rspa.royalsocietypublishing.org/content/276/1365/238. → page 118[94] F. Hutter, H. H. Hoos, and K. Leyton-Brown. Sequential model-basedoptimization for general algorithm configuration. In C. A. C. Coello, editor,Learning and Intelligent Optimization, pages 507–523, Berlin, Heidelberg,2011. Springer Berlin Heidelberg. ISBN 978-3-642-25566-3. → page 51[95] H. Iikura, T. Tsuneda, T. Yanai, and K. Hirao. A long-range correctionscheme for generalized-gradient-approximation exchange functionals. TheJournal of Chemical Physics, 115(8):3540–3544, 2001.doi:10.1063/1.1383587. URL https://doi.org/10.1063/1.1383587. → page147[96] S. R. Jackson, T. J. Khoo, and F. W. Strauch. Quantum walks on trees withdisorder: Decay, diffusion, and localization. Phys. Rev. A, 86:022335, Aug2012. doi:10.1103/PhysRevA.86.022335. URLhttps://link.aps.org/doi/10.1103/PhysRevA.86.022335. → page 91[97] D. Jaksch, C. Bruder, J. I. Cirac, C. W. Gardiner, and P. Zoller. Coldbosonic atoms in optical lattices. Phys. Rev. Lett., 81:3108–3111, Oct1998. doi:10.1103/PhysRevLett.81.3108. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.81.3108. → page 8[98] M. W. Johnson, P. Bunyk, F. Maibaum, E. Tolkacheva, A. J. Berkley, E. M.Chapple, R. Harris, J. Johansson, T. Lanting, I. Perminov, E. Ladizinsky,T. Oh, and G. Rose. A scalable control system for a superconductingadiabatic quantum optimization processor. Superconductor Science and132Technology, 23(6):065004, 2010. URLhttp://stacks.iop.org/0953-2048/23/i=6/a=065004. → page 78[99] T. H. Johnson, Y. Yuan, W. Bao, S. R. Clark, C. Foot, and D. Jaksch.Hubbard model for atomic impurities bound by the vortex lattice of arotating Bose-Einstein condensate. Phys. Rev. Lett., 116:240402, Jun 2016.doi:10.1103/PhysRevLett.116.240402. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.116.240402. → page 79[100] D. R. Jones. A taxonomy of global optimization methods based onresponse surfaces. Journal of Global Optimization, 21(4):345–383, Dec2001. → page 37[101] G. S. Joyce. Absence of ferromagnetism or antiferromagnetism in theisotropic Heisenberg model with long-range interactions. Journal ofPhysics C: Solid State Physics, 2(8):1531, 1969. → page 98[102] A. Kamath, R. A. Vargas-Herna´ndez, R. V. Krems, J. T. Carrington, andS. Manzhos. Neural networks vs Gaussian process regression forrepresenting potential energy surfaces: A comparative study of fit qualityand vibrational spectrum accuracy. The Journal of Chemical Physics, 148(24):241702, 2018. → pages xiii, 25, 26, 29, 39, 57[103] M. Karski, L. Fo¨rster, J.-M. Choi, A. Steffen, W. Alt, D. Meschede, andA. Widera. Quantum walk in position space with single optically trappedatoms. Science, 325(5937):174–177, 2009. ISSN 0036-8075.doi:10.1126/science.1174436. URLhttp://science.sciencemag.org/content/325/5937/174. → page 78[104] J. P. Keating, N. Linden, J. C. F. Matthews, and A. Winter. Localizationand its consequences for quantum walk algorithms and quantumcommunication. Phys. Rev. A, 76:012315, Jul 2007.doi:10.1103/PhysRevA.76.012315. URLhttps://link.aps.org/doi/10.1103/PhysRevA.76.012315. → page 91[105] K. Kim and K. D. Jordan. Comparison of density functional and MP2calculations on the water monomer and dimer. The Journal of PhysicalChemistry, 98(40):10089–10094, 1994. doi:10.1021/j100091a024. URLhttps://doi.org/10.1021/j100091a024. → page 120[106] A. Kitaev. Anyons in an exactly solved model and beyond. Annals ofPhysics, 321(1):2 – 111, 2006. ISSN 0003-4916.doi:https://doi.org/10.1016/j.aop.2005.10.005. URL133http://www.sciencedirect.com/science/article/pii/S0003491605002381.January Special Issue. → page 79[107] W. Koch and M. Holthausen. A chemist’s guide to density functionaltheory. Wiley-VCH, 2000. ISBN 9783527299188. URLhttps://books.google.es/books?id=rCApAAAAYAAJ. → page 145[108] P. L. Krapivsky, J. M. Luck, and K. Mallick. Interacting quantum walkers:two-body bosonic and fermionic bound states. Journal of Physics A:Mathematical and Theoretical, 48(47):475301, 2015. URLhttp://stacks.iop.org/1751-8121/48/i=47/a=475301. → page 79[109] P. L. Krapivsky, J. M. Luck, and K. Mallick. Quantum centipedes:collective dynamics of interacting quantum walkers. Journal of Physics A:Mathematical and Theoretical, 49(33):335303, 2016. URLhttp://stacks.iop.org/1751-8121/49/i=33/a=335303. → page 79[110] K. A. Kuns, A. M. Rey, and A. V. Gorshkov. d-wave superfluidity inoptical lattices of ultracold polar molecules. Phys. Rev. A, 84:063639, Dec2011. → pages 9, 98[111] K. Kurotobi and Y. Murata. A single molecule of water encapsulated infullerene C60. Science, 333(6042):613–616, 2011. ISSN 0036-8075.doi:10.1126/science.1206376. URLhttp://science.sciencemag.org/content/333/6042/613. → page 78[112] H. Kushner. A new method of locating the maximum point of an arbitrarymultipeak curve in the presence of noise. J. Basic Eng., 86(1):97–106,1964. → page 34[113] B. Lau, M. Berciu, and G. A. Sawatzky. Single-polaron properties of theone-dimensional breathing-mode Hamiltonian. Phys. Rev. B, 76:174305,Nov 2007. doi:10.1103/PhysRevB.76.174305. URLhttps://link.aps.org/doi/10.1103/PhysRevB.76.174305. → pages 56, 58[114] T. Leininger, H. Stoll, H.-J. Werner, and A. Savin. Combining long-rangeconfiguration interaction with short-range density functionals. ChemicalPhysics Letters, 275(3):151 – 160, 1997. ISSN 0009-2614.doi:https://doi.org/10.1016/S0009-2614(97)00758-6. URLhttp://www.sciencedirect.com/science/article/pii/S0009261497007586. →page 147134[115] M. Lewenstein, A. Sanpera, V. Ahufinger, B. Damski, A. Sen(De), andU. Sen. Ultracold atomic gases in optical lattices: mimicking condensedmatter physics and beyond. Advances in Physics, 56(2):243–379, 2007. →pages 8, 97[116] D. Li, J. Zhang, F.-Z. Guo, W. Huang, Q.-Y. Wen, and H. Chen.Discrete-time interacting quantum walks and quantum hash schemes.Quantum Information Processing, 12(3):1501–1513, Mar 2013. ISSN1573-1332. doi:10.1007/s11128-012-0421-8. URLhttps://doi.org/10.1007/s11128-012-0421-8. → page 79[117] E. H. Lieb and D. W. Robinson. The finite group velocity of quantum spinsystems. Comm. Math. Phys., 28(3):251–257, 1972. URLhttps://projecteuclid.org:443/euclid.cmp/1103858407. → page 78[118] D. E. Logan and P. G. Wolynes. Localizability and dephasing of dipolarexcitons in topologically disordered systems. The Journal of ChemicalPhysics, 87(12):7199–7207, 1987. → page 113[119] M. Lu, N. Q. Burdick, S. H. Youn, and B. L. Lev. Strongly dipolarBose-Einstein condensate of dysprosium. Phys. Rev. Lett., 107:190401, Oct2011. → page 98[120] M. Lu, N. Q. Burdick, and B. L. Lev. Quantum degenerate dipolar Fermigas. Phys. Rev. Lett., 108:215301, May 2012. → page 98[121] S. R. Manmana, E. M. Stoudenmire, K. R. A. Hazzard, A. M. Rey, andA. V. Gorshkov. Topological phases in ultracold polar-molecule quantummagnets. Phys. Rev. B, 87:081106, Feb 2013. → pages 9, 98[122] S. Manzhos and J. T. Carrington. Using neural networks to representpotential surfaces as sums of products. The Journal of Chemical Physics,125(19):194105, 2006. → page 24[123] S. Manzhos and T. Carrington. Using an internal coordinate Gaussian basisand a space-fixed cartesian coordinate kinetic energy operator to compute avibrational spectrum with rectangular collocation. The Journal of ChemicalPhysics, 145(22):224110, 2016. → page 28[124] S. Manzhos, X. Wang, R. Dawes, and T. Carrington. A nestedmolecule-independent neural network approach for high-quality potentialfits. The Journal of Physical Chemistry A, 110(16):5295–5304, 2006. →pages 24, 119135[125] D. J. J. Marchand, G. De Filippis, V. Cataudella, M. Berciu, N. Nagaosa,N. V. Prokof’ev, A. S. Mishchenko, and P. C. E. Stamp. Sharp transition forsingle polarons in the one-dimensional Su-Schrieffer-Heeger model. Phys.Rev. Lett., 105:266605, Dec 2010. doi:10.1103/PhysRevLett.105.266605.URL https://link.aps.org/doi/10.1103/PhysRevLett.105.266605. → pages56, 59[126] M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of threemethods for selecting values of input variables in the analysis of outputfrom a computer code. Technometrics, 21(2):239–245, 1979. → page 26[127] M. Meuwly and J. M. Hutson. Morphing ab initio potentials: A systematicstudy of Ne-HF. The Journal of Chemical Physics, 110(17):8338–8347,1999. → pages 39, 48[128] A. Micheli, G. K. Brennen, and P. Zoller. A toolbox for lattice-spin modelswith polar molecules. Nature Physics, 2:341 EP –, Apr 2006. URLhttp://dx.doi.org/10.1038/nphys287. Article. → pages 9, 79, 98[129] R. Micnas, J. Ranninger, and S. Robaszkiewicz. Superconductivity innarrow-band systems with local nonretarded attractive interactions. Rev.Mod. Phys., 62:113–171, Jan 1990. doi:10.1103/RevModPhys.62.113.URL https://link.aps.org/doi/10.1103/RevModPhys.62.113. → page 98[130] M. Mohseni, P. Rebentrost, S. Lloyd, and A. Aspuru-Guzik.Environment-assisted quantum walks in photosynthetic energy transfer.The Journal of Chemical Physics, 129(17):174106, 2008.doi:10.1063/1.3002335. URL https://doi.org/10.1063/1.3002335. → pages78, 86, 118[131] S. A. Moses, J. P. Covey, M. T. Miecnikowski, B. Yan, B. Gadway, J. Ye,and D. S. Jin. Creation of a low-entropy quantum gas of polar molecules inan optical lattice. Science, 350(6261):659–662, 2015. ISSN 0036-8075.doi:10.1126/science.aac6400. URLhttp://science.sciencemag.org/content/350/6261/659. → page 9[132] K. P. Murphy. Machine Learning: A Probabilistic Perspective. The MITpress, 2012. ISBN 9780262018029. → pages11, 14, 15, 25, 26, 32, 33, 54, 67, 75[133] J. N. Murrell, S. Carter, S. C. Farantos, P. Huxley, and A. J. C. Varandas.Molecular Potential Energy Functions. Wiley, New York, 1984. → pages4, 24, 25, 40136[134] R. M. Neal. Bayesian Learning for Neural Networks. Springer, New York,1996. → pages 38, 76[135] M. Ortner, A. Micheli, G. Pupillo, and P. Zoller. Quantum simulations ofextended Hubbard models with dipolar crystals. New Journal of Physics,11(5):055045, 2009. → pages 9, 83, 98, 104[136] G. Parisi and B. Seoane. Liquid-glass transition in equilibrium. Phys. Rev.E, 89:022309, Feb 2014. doi:10.1103/PhysRevE.89.022309. URLhttps://link.aps.org/doi/10.1103/PhysRevE.89.022309. → page 78[137] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel,M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas,A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay.Scikit-learn: Machine learning in Python. Journal of Machine LearningResearch, 12:2825–2830, 2011. → pages vi, 27[138] J. Pe´rez-Rı´os, F. Herrera, and R. V. Krems. External field control ofcollective spin excitations in an optical lattice of 2Σ molecules. NewJournal of Physics, 12(10):103007, 2010. → pages 9, 98[139] A. Peruzzo, M. Lobino, J. C. F. Matthews, N. Matsuda, A. Politi,K. Poulios, X.-Q. Zhou, Y. Lahini, N. Ismail, K. Wo¨rhoff, Y. Bromberg,Y. Silberberg, M. G. Thompson, and J. L. OBrien. Quantum walks ofcorrelated photons. Science, 329(5998):1500–1503, 2010. ISSN0036-8075. doi:10.1126/science.1193515. URLhttp://science.sciencemag.org/content/329/5998/1500. → page 78[140] M. B. Plenio and S. F. Huelga. Dephasing-assisted transport: quantumnetworks and biomolecules. New Journal of Physics, 10(11):113019, 2008.URL http://stacks.iop.org/1367-2630/10/i=11/a=113019. → pages 86, 118[141] L. Pollet, J. D. Picon, H. P. Bu¨chler, and M. Troyer. Supersolid phase withcold polar molecules on a triangular lattice. Phys. Rev. Lett., 104:125302,Mar 2010. → pages 98, 104[142] R. Portugal. Quantum Walks and Search Algorithms. Quantum Science andTechnology. Springer New York, 2013. ISBN 9781461463368. URLhttps://books.google.ca/books?id=n5up9BYK1XwC. → pages 6, 8[143] P. M. Preiss, R. Ma, M. E. Tai, A. Lukin, M. Rispoli, P. Zupancic,Y. Lahini, R. Islam, and M. Greiner. Strongly correlated quantum walks inoptical lattices. Science, 347(6227):1229–1233, 2015. ISSN 0036-8075.137doi:10.1126/science.1260364. URLhttp://science.sciencemag.org/content/347/6227/1229. → page 78[144] N. V. Prokof’ev and P. C. E. Stamp. Decoherence and quantum walks:Anomalous diffusion and ballistic tails. Phys. Rev. A, 74:020102, Aug2006. doi:10.1103/PhysRevA.74.020102. URLhttps://link.aps.org/doi/10.1103/PhysRevA.74.020102. → page 78[145] P. Pudleiner and A. Mielke. Interacting bosons in two-dimensional flatband systems. The European Physical Journal B, 88(8):207, Aug 2015.ISSN 1434-6036. doi:10.1140/epjb/e2015-60371-3. URLhttps://doi.org/10.1140/epjb/e2015-60371-3. → page 79[146] M. Qiu, Z. Ren, L. Che, D. Dai, S. A. Harich, X. Wang, X. Yang, C. Xu,D. Xie, M. Gustafsson, R. T. Skodje, Z. Sun, and D. H. Zhang. Observationof feshbach resonances in the F + H2→ HF + H reaction. Science, 311(5766):1440–1443, 2006. → page 40[147] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for MachineLearning. The MIT press, 2006. ISBN 026218253X. → pages11, 14, 15, 16, 56, 57[148] P. Rebentrost, M. Mohseni, I. Kassal, S. Lloyd, and A. Aspuru-Guzik.Environment-assisted quantum transport. New Journal of Physics, 11(3):033003, 2009. URL http://stacks.iop.org/1367-2630/11/i=3/a=033003. →pages 78, 86, 118[149] P. Richerme, Z.-X. Gong, A. Lee, C. Senko, J. Smith, M. Foss-Feig,S. Michalakis, A. V. Gorshkov, and C. Monroe. Non-local propagation ofcorrelations in quantum systems with long-range interactions. Nature, 511:198 EP –, Jul 2014. URL http://dx.doi.org/10.1038/nature13450. → page78[150] F. Robicheaux and N. M. Gill. Effect of random positions for coherentdipole transport. Phys. Rev. A, 89:053429, May 2014. → page 113[151] L. M. Roch and K. K. Baldridge. General optimization procedure towardsthe design of a new family of minimal parameter spin-component-scaleddouble-hybrid density functional theory. Phys. Chem. Chem. Phys., 19:26191–26200, 2017. doi:10.1039/C7CP04125J. URLhttp://dx.doi.org/10.1039/C7CP04125J. → page 120138[152] L. M. Roch, F. Ha¨se, C. Kreisbeck, T. Tamayo-Mendoza, L. P. E. Yunker,J. E. Hein, and A. Aspuru-Guzik. Chemos: An orchestration software todemocratize autonomous discovery. ChemRxiv e-prints, march 2018,3. →pages 51, 120[153] A. Rodrı´guez, V. A. Malyshev, G. Sierra, M. A. Martı´n-Delgado,J. Rodrı´guez-Laguna, and F. Domı´nguez-Adame. Anderson transition inlow-dimensional disordered systems driven by long-range nonrandomhopping. Phys. Rev. Lett., 90:027404, Jan 2003.doi:10.1103/PhysRevLett.90.027404. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.90.027404. → page 78[154] J. P. J. Rodriguez, Z. J. Li, and J. B. Wang. Discord and entanglement oftwo-particle quantum walk on cycle graphs. Quantum InformationProcessing, 14(1):119–133, Jan 2015. ISSN 1573-1332.doi:10.1007/s11128-014-0859-y. URLhttps://doi.org/10.1007/s11128-014-0859-y. → page 79[155] P. P. Rohde, A. Schreiber, M. Sˇtefanˇa´k, I. Jex, and C. Silberhorn.Multi-walker discrete time quantum walks on arbitrary graphs, theirproperties and their photonic implementation. New Journal of Physics, 13(1):013001, 2011. URL http://stacks.iop.org/1367-2630/13/i=1/a=013001.→ page 79[156] L. J. Root and J. L. Skinner. Localization phase diagram for theenergetically and substitutionally disordered anderson/quantum percolationmodel. The Journal of Chemical Physics, 89(5):3279–3284, 1988.doi:10.1063/1.454933. URL https://doi.org/10.1063/1.454933. → page 95[157] K. Rudinger, J. K. Gamble, M. Wellons, E. Bach, M. Friesen, R. Joynt, andS. N. Coppersmith. Noninteracting multiparticle quantum random walksapplied to the graph isomorphism problem for strongly regular graphs.Phys. Rev. A, 86:022334, Aug 2012. doi:10.1103/PhysRevA.86.022334.URL https://link.aps.org/doi/10.1103/PhysRevA.86.022334. → page 79[158] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learningrepresentations by back-propagating errors. Nature, 323:533, 1986. →page 33[159] M. Rupp. Machine learning for quantum mechanics in a nutshell.International Journal of Quantum Chemistry, 115(16):1058–1073, 2015.doi:10.1002/qua.24954. URLhttps://onlinelibrary.wiley.com/doi/abs/10.1002/qua.24954. → page 25139[160] S. Sachdev. Quantum Phase Transitions. Cambridge University Press, 2edition, 2011. → pages 4, 55[161] J. Sakurai and S. Tuan. Modern Quantum Mechanics. Addison-Wesley,1994. ISBN 9780201537307. URLhttps://books.google.ca/books?id=NoI-AQAACAAJ. → page 2[162] L. Sansoni, F. Sciarrino, G. Vallone, P. Mataloni, A. Crespi, R. Ramponi,and R. Osellame. Two-particle Bosonic-Fermionic quantum walk viaintegrated photonics. Phys. Rev. Lett., 108:010502, Jan 2012.doi:10.1103/PhysRevLett.108.010502. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.108.010502. → page 79[163] A. Savin. Beyond the Kohn-Sham Determinant, pages 129–153. WorldScientific, Singapore, 1995. doi:10.1142/9789812830586 0004. URLhttps://www.worldscientific.com/doi/abs/10.1142/9789812830586 0004.→ page 147[164] V. W. Scarola and S. Das Sarma. Quantum phases of the extendedBose-Hubbard Hamiltonian: Possibility of a supersolid state of cold atomsin optical lattices. Phys. Rev. Lett., 95:033003, Jul 2005.doi:10.1103/PhysRevLett.95.033003. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.95.033003. → pages 83, 98[165] G. C. Schatz. The analytical representation of electronic potential-energysurfaces. Rev. Mod. Phys., 61:669–688, 1989. → pages 24, 25[166] C. Schneider, D. Porras, and T. Schaetz. Experimental quantumsimulations of many-body physics with trapped ions. Reports on Progressin Physics, 75(2):024401, 2012. URLhttp://stacks.iop.org/0034-4885/75/i=2/a=024401. → pages 8, 117[167] J. R. Schrieffer and P. A. Wolff. Relation between the Anderson and KondoHamiltonians. Phys. Rev., 149:491–492, Sep 1966.doi:10.1103/PhysRev.149.491. URLhttps://link.aps.org/doi/10.1103/PhysRev.149.491. → page 80[168] G. Schwarz. Estimating the dimension of a model. Ann. Statist., 6(2):461–464, 03 1978. doi:10.1214/aos/1176344136. URLhttps://doi.org/10.1214/aos/1176344136. → page 67[169] A. Seko, A. Takahashi, and I. Tanaka. Sparse representation for a potentialenergy surface. Phys. Rev. B, 90:024101, Jul 2014.140doi:10.1103/PhysRevB.90.024101. URLhttps://link.aps.org/doi/10.1103/PhysRevB.90.024101. → page 25[170] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas. Takingthe human out of the loop: A review of bayesian optimization. Proceedingsof the IEEE, 104(1):148–175, Jan 2016. → pages 34, 38, 51[171] N. Shenvi, J. Kempe, and K. B. Whaley. Quantum random-walk searchalgorithm. Phys. Rev. A, 67:052307, May 2003. → pages 8, 78, 118[172] J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish, N. Sundaram,M. M. A. Patwary, P. Prabhat, and R. P. Adams. Scalable bayesianoptimization using deep neural networks. In Proceedings of the 32NdInternational Conference on International Conference on MachineLearning - Volume 37, ICML’15, pages 2171–2180, 2015. → pages 38, 51[173] P. Soltan-Panahi, D.-S. Lu¨hmann, J. Struck, P. Windpassinger, andK. Sengstock. Quantum phase transition to unconventional multi-orbitalsuperfluidity in optical lattices. Nature Physics, 8:71 EP –, Oct 2011. →page 98[174] J. Sous, M. Berciu, and R. V. Krems. Bipolarons bound by repulsivephonon-mediated interactions. Phys. Rev. A, 96:063619, Dec 2017.doi:10.1103/PhysRevA.96.063619. URLhttps://link.aps.org/doi/10.1103/PhysRevA.96.063619. → page 59[175] J. Sous, M. Chakraborty, C. P. J. Adolphs, R. V. Krems, and M. Berciu.Phonon-mediated repulsion, sharp transitions and (quasi)self-trapping inthe extended Peierls-Hubbard model. Scientific Reports, 7(1):1169, 2017.→ pages 59, 98[176] T. Sowin´ski, O. Dutta, P. Hauke, L. Tagliacozzo, and M. Lewenstein.Dipolar molecules in optical lattices. Phys. Rev. Lett., 108:115301, Mar2012. doi:10.1103/PhysRevLett.108.115301. URLhttps://link.aps.org/doi/10.1103/PhysRevLett.108.115301. → page 78[177] S. Sra, S. Nowozin, and S. J. Wright. Optimization for Machine Learning.The MIT Press, 2011. → pages 31, 32[178] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger.Information-theoretic regret bounds for Gaussian process optimization inthe bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012. → pages 38, 51141[179] M. Sˇtefanˇa´k, S. M. Barnett, B. Kollr, T. Kiss, and I. Jex. Directionalcorrelations in quantum walks with two particles. New Journal of Physics,13(3):033029, 2011. URLhttp://stacks.iop.org/1367-2630/13/i=3/a=033029. → page 79[180] N. Q. Su, J. Chen, Z. Sun, D. H. Zhang, and X. Xu. H + H2 quantumdynamics using potential energy surfaces based on the XYG3 type ofdoubly hybrid density functionals: Validation of the density functionals.The Journal of Chemical Physics, 142(8):084107, 2015. → pages 40, 42[181] R. S. Sutton and A. G. Barto. Reinforcement Learning, An Introduction.MIT Press, 1 edition, 2016. → pages 11, 43, 51, 68[182] K. Swersky, J. Snoek, and R. P. Adams. Multi-task bayesian optimization.In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q.Weinberger, editors, Advances in Neural Information Processing Systems26, pages 2004–2012. Curran Associates, Inc., 2013. URLhttp://papers.nips.cc/paper/5086-multi-task-bayesian-optimization.pdf. →page 51[183] D. G. Truhlar, R. Steckler, and M. S. Gordon. Potential energy surfaces forpolyatomic reaction dynamics. Chemical Reviews, 87(1):217–236, 1987.→ pages 24, 25[184] R. A. Vargas-Hera´ndez and R. V. Krems. Engineering extended Hubbardmodels with Zeeman excitations of ultracold Dy atoms. Journal of PhysicsB: Atomic, Molecular and Optical Physics, 49(23):235501, 2016. URLhttp://stacks.iop.org/0953-4075/49/i=23/a=235501. → pagesxxi, xxii, xxiii, 78, 83, 97, 103, 107, 108, 109, 110, 115[185] R. A. Vargas-Herna´ndez and R. V. Krems. Quantum walks assisted byparticle-number fluctuations. Phys. Rev. A, 98:022107, Aug 2018.doi:10.1103/PhysRevA.98.022107. URLhttps://link.aps.org/doi/10.1103/PhysRevA.98.022107. → pagesxviii, xix, xx, xxi, 77, 82, 84, 85, 87, 89, 92, 94, 96[186] A. Va´zquez-Sa´nchez and J. G. A´vila-Za´rraga. A formal synthesis of (+/-)parvifoline by an aromatic Cope rearrangement of atrans-1-aryl-2-ethenylcyclobutanecarbonitrile. Tetrahedron Letters, 58(10):981 – 984, 2017. → page 25[187] S. E. Venegas-Andraca. Quantum walks: a comprehensive review. Quant.Info. Proc., 11(5):1015–1106, Oct 2012. → pages 6, 77142[188] L. Wang. Discovering phase transitions with unsupervised learning. Phys.Rev. B, 94:195105, Nov 2016. doi:10.1103/PhysRevB.94.195105. URLhttps://link.aps.org/doi/10.1103/PhysRevB.94.195105. → page 74[189] Q. Wang and Z.-J. Li. Repelling, binding, and oscillating of two-particlediscrete-time quantum walks. Annals of Physics, 373:1 – 9, 2016. ISSN0003-4916. doi:https://doi.org/10.1016/j.aop.2016.06.015. URLhttp://www.sciencedirect.com/science/article/pii/S0003491616300914. →page 79[190] T. Wang, J. Chen, T. Yang, C. Xiao, Z. Sun, L. Huang, D. Dai, X. Yang,and D. H. Zhang. Dynamical resonances accessible only by reagentvibrational excitation in the F + HD→HF + D reaction. Science, 342(6165):1499–1502, 2013. → page 40[191] A. Wilson and R. Adams. Gaussian process kernels for pattern discoveryand extrapolation. In Proceedings of the 30th International Conference onMachine Learning, volume 28 of Proceedings of Machine LearningResearch, pages 1067–1075. PMLR, 2013. → page 76[192] A. G. Wilson, E. Gilboa, A. Nehorai, and J. P. Cunningham. Fast kernellearning for multidimensional pattern extrapolation. In Advances in NeuralInformation Processing Systems 27, pages 3626–3634. Curran Associates,Inc., 2014. → pages 76, 119[193] R. Winkle. Quasi-degenerate perturbation theory. In R. Winkler, editor,Spin–Orbit Coupling Effects in Two-Dimensional Electron and HoleSystems, Springer Tracts in Modern Physics, chapter B, page 201.Springer-Verlag Berlin Heidelberg, Springer, Berlin, Heidelberg,, 2003. →page 149[194] P. Wittek. Quantum Machine Learning. Elsevier, Boston, 2014. ISBN978-0-12-800953-6.doi:https://doi.org/10.1016/B978-0-12-800953-6.00015-3. URLhttp://www.sciencedirect.com/science/article/pii/B9780128009536000153.→ page 5[195] P. Wu¨rtz, T. Langen, T. Gericke, A. Koglbauer, and H. Ott. Experimentaldemonstration of single-site addressability in a two-dimensional opticallattice. Phys. Rev. Lett., 103:080404, Aug 2009. → pages 9, 98[196] P. Xiang. Quantum control of dynamics of quasiparticles in periodic anddisordered lattice potentials. PhD thesis, University of British Columbia,1432014. URL https://open.library.ubc.ca/collections/24/items/1.0165942.Text. → page 100[197] P. Xiang, M. Litinskaya, and R. V. Krems. Tunable exciton interactions inoptical lattices with polar molecules. Phys. Rev. A, 85:061401(R), Jun2012. → page 106[198] P. Xiang, M. Litinskaya, E. A. Shapiro, and R. V. Krems. Non-adiabaticcontrol of quantum energy transfer in ordered and disordered arrays. NewJournal of Physics, 15(6):063015, 2013. → pages 9, 98[199] T. Xu and R. V. Krems. Quantum walk and Anderson localization ofrotational excitations in disordered ensembles of polar molecules. NewJournal of Physics, 17(6):065014, 2015. URLhttp://stacks.iop.org/1367-2630/17/i=6/a=065014. → page 80[200] T. Xu and R. V. Krems. Quantum walk and Anderson localization ofrotational excitations in disordered ensembles of polar molecules. NewJournal of Physics, 17(6):065014, 2015. → page 113[201] B. Yan, S. A. Moses, B. Gadway, J. P. Covey, K. R. A. Hazzard, A. M. Rey,D. S. Jin, and J. Ye. Observation of dipolar spin-exchange interactions withlattice-confined polar molecules. Nature, 501:521 EP –, Sep 2013. URLhttp://dx.doi.org/10.1038/nature12483. → pages 9, 79, 98[202] J. Zaanen. Quantum phase transitions in cuprates: stripes andantiferromagnetic supersolids. Physica C: Superconductivity, 317-318:217– 229, 1999. → page 98[203] R. Zare. Angular momentum: understanding spatial aspects in chemistryand physics. George Fisher Baker non-resident lectureship in chemistry atCornell University. Wiley, 1988. ISBN 9780471858928. URLhttps://books.google.ca/books?id=kDQ31MBZxioC. → pages 152, 156144Appendix ABayesian optimization forhybrid-density functionalsElectronic structure methods developed by quantum chemists are designed to eval-uate the ground state energy, EGS, of a given ensemble of electrons and fixed posi-tion nuclei,EGS[Vˆ ,N]= minΨ→N[〈Ψ| Hˆ |Ψ〉] , (A.1)where N is the number of electrons, Vˆ is the potential energy that electrons feel bythe fixed position nuclei and Ψ is the N−electrons wave function. The Hamilto-nian, Hˆ, describe the total energy Hˆ = Tˆ +Uˆ +Vˆ , where first and second terms arethe kinetic energy operator and electron-electron interaction potential. The min-imization of EGS is still an active research problem due to the complexity of theproblem.In 1964 a new methodology, known as Density functional theory (DFT), wasintroduced to evaluate Equation A.1 [90]. DFT assumes that for N−electrons pro-duced by Vˆ there is a single electronic density, n(r), [107]. Given Vˆ and n(r) onecan compute EGS as,EGS[Vˆ ,N]= minn→N[F [n(r)]+∫n(r)v(r)dr3], (A.2)145where F [n(r)] is the universal functional and the second term is the nuclear at-traction potential energy function for a electron density n(r) given fixed positionnuclei, also denoted as VNe[n(r)]. More condensed notation can be used to rewriteEGS[Vˆ ,N],EGS[Vˆ ,N]= F [n(r)]+VNe[n(r)] = T [n(r)]+Vee[n(r)]+VNe[n(r)], (A.3)where T [·] is the kinetic energy functional and Vee[·] is the electron-electroninteraction functional. The form of F [n(r)] is still unknown and is still an openresearch problem; recently F [·] was constructed using ML [26]. The functionalVee[·] plays a key role in DFT since it described the electron-electron interaction,also denoted as EXC[n(r)]. Over the last decades, DFT researchers have proposedmore accurate descriptors of EXC[n(r)]; one of the most successful ones is,EXC = a0 EHFX +a1 EPBEX +a2 EPBEC , (A.4)where EHFX is the Hartree-Fock exact exchange functional, EPBEX is the PBE ex-change functional and EPBEC is the PBE correlation functional. This type of func-tionals are known as hybrid exchange-correlation functionals, the trade off betweenthe three terms in hybrid EXC is described by the ai constants. a¯ denotes collec-tively the parameters of EXC It is out of the scope of this dissertation to give a morecomplete explanation about DFT.From Equations A.3 and A.4, one can observe that the accuracy in DFT’s pre-diction directly depend on the values of a¯. In most of the cases, the values of a¯ areset by minimizing an error-type function, where EGS is compared with an accuratevalue, EˆGS, for a given set of molecules or atoms.For illustrative purposes, this chapter shows that DFT parameters can be tunedusing BO. A similar type of exchange-correlation functional is considered here,range-separated exchange functionals where Vee is segmented in short and long-146range interactions [6, 10, 95, 114, 163],Vee ∝1ri, j= SR+LR =ω(γ,ri, j)ri, j+1−ω(γ,ri, j)ri, j(A.5)where ri, j is the distance between two electrons, ω is a function that depends on ri, jand a parameter, denoted by γ . The idea behind the ω function is to switch fromshort-range (SR) to long-range (LR) interactions to achieve a better description.Given an analytical form for ω , the only free parameter in EXC is γ . In the case ofrange-separated exchange functionals, γ is tuned by minimizing the a loss function,L (γ|M ) =|M |∑i∣∣εHOMOi (γ)+ IPi∣∣ (A.6)where the summation is over different molecules, εHOMOi (γ) is the energy of thehighest occupied molecular orbital (HOMO) for molecule i computed with a givenvalue of γ and IPi is the ionization potential of molecule i. |M | is the total numberof molecules or atoms considered. Equation A.6 is an optimization problem underthe eyes of ML,γ∗ = arg minγL (γ|M ), (A.7)where γ∗ is the minimizer ofL that best describes the IP of eachMi. Equation A.6cannot be minimized using gradient descent methods; however, as we explained inChapter 3, low dimensional black-box type functions can be optimized using BO.The example considered here is the optimization of γ for hydrogen molecule whichIP is 16.2 eV. The value of the optimal γˆ is 1.2 for an LCY −PBE functional. Forillustrative purposes a UCB acquisition function was used, Equation 3.9, to findthe minimizer of L for the hydrogen molecule. Figure A.1 illustrates the firstfourth iterations of BO. The first two points were sampled randomly and at thefifth iteration γ = 1.24.1470.0 0.5 1.0 1.5 2.0 2.5γ2010010203040L(a) N = 20.0 0.5 1.0 1.5 2.0 2.5γ2010010203040L(b) N = 30.0 0.5 1.0 1.5 2.0 2.5γ2010010203040L(c) N = 40.0 0.5 1.0 1.5 2.0 2.5γ2010010203040L(d) N = 5Figure A.1: The iterations of BO to find the minimizer of L for the H2molecule. The black vertical line is the next proposed point by BO.The blue markers are the training data used to construct the acquisitionfunction. The blue-solid curve is the mean of the GP model and thegrey shaded area is σ(·).148Appendix BSchrieffer–Wolff transformationHere, we show that the particle-changing interactions in Eq. (5.2), if perturbative,modify the range of particle hopping. In particular, we show that, to leading order,the couplings (5.2) lead to next-nearest-neighbour hopping.It is clear that the Hamiltonian (5.1) is not block diagonal in the site repre-sentation basis due to couplings in Eq. (5.2). Using the Schrieffer–Wolff (SW)transformation, it is possible to block diagonalize this Hamiltonian. We follow thenotation in Ref. [193]. The total Hamiltonian is defined as Hˆ = Hˆ0+Hˆ ′, whereHˆ0 contains all the operators that commute with the particle number operator, andHˆ ′ = Vˆnc. The SW transformation assumes that the transformed HamiltonianH˜ = e−SHˆ eS (B.1)can be written asH˜ = H(0)+H(1)+H(2)+ · · · , (B.2)with the different terms defined by the following matrix elements:H(0)mm′ = H0mm′ (B.3)H(1)mm′ = H′mm′ (B.4)H(2)mm′ =12 ∑`[1Em−E` +1E ′m−E`]H ′m`H′`m′ . (B.5)149Here, the indices m and m′ refer to any one-particle state, while ` is an index of athree-particle state. H0 is the Hamiltonian (5.1) with Vˆnc = 0. All matrix elementsof H(1) are zero: H(1)mm′ = 0. The first correction to H˜ appears in H(2) whose matrixelements depend on Vˆnc. Here we only consider the case of a 1D lattice with γ = 0,∆ 6= 0, and nearest-neighbour interactions. The matrix elements H(2)mm′ depend onthe matrix elements of the ∆–dependent term in Vˆnc,H ′m` = 〈m|∆∑i(cˆ†i cˆ†i±1+ cˆicˆ±1)|abc〉= ∆∑i〈m|cˆicˆi±1|abc〉= ∆ [δm,c (δb±1,a+δa±1,b)+δm,b (δc±1,a+δa±1,c)+δm,a (δc±1,b+δb±1,c)] (B.6)where a, b and c are the lattice indices of the three particles. In the case of an idealsystem, Eq. (B.5) can be rewritten as,H(2)mm′ = −12∆ε ∑`H′m`H′`m′ (B.7)where the summation ∑` is over all possible combinations of lattice indices forthree particles. Inserting Eq. (B.6) into Eq. (B.5) for both H ′m` and H′`m′ we obtain,H(2)mm′ = −12∆ε ∑`∆2 [δm,c (δb±1,a+δa±1,b)+δm,b (δc±1,a+δa±1,c)+δm,a (δc±1,b+δb±1,c)]×[δc,m′ (δa,b±1+δb,a±1)+δb,m′ (δa,c±1+δc,a±1)+δa,m′ (δb,c±1+δc,b±1)](B.8)The diagonal elements of H(2) areH(2)mm =−∆2∆εm∗, (B.9)where m∗ is the total number of nearest-neighbour lattice sites without considering150the site m. The value of m∗ can be computed as,m∗ =m−1∑i=0m−1∑j=i+1δi, j±1+N∑i=m+1N∑j=i+1δi, j±1, (B.10)where the first summation is over all pairs of lattice site interactions for i< m, andthe second summation is for i>m. For example, when m is any of a 1D lattice siteedges, m∗ = N− 2, where N is the total number of lattice sites. In the case whenm 6= m′,H(2)mm′ =−3∆22∆ε[δm,m′±2+δm±2,m′], (B.11)which leads to next-nearest-neighbour hopping with t ′ = − 3∆22∆ε . Combining Eqs.(B.9) and (B.11) we see that the SW transformation, to first order, leads to thefollowing one-particle Hamiltonian:H˜ = ∑iω ′i cˆ†i cˆi+∑it(cˆ†i±1cˆi)+ t ′(cˆ†i±2cˆi), (B.12)where ω ′i = ∆ε− ∆2∆ε m∗ and t ′ =− 3∆22∆ε .151Appendix CMagnetic dipole–dipoleinteractionThe potential energy between two dipoles d1 and d2 isVˆdd =Cd4pid1 ·d2−3(d1 · r)(d2 · r)r312, (C.1)where di is any of the dipoles, and r is a unit vector in the direction of r12. r12 isthe distance between the two dipoles. As it is described in Chapter 6, the magneticdipole-dipole interaction between two atoms can be used to tune various Hubbardmodels. Here we derive the matrix elements relation to compute of the magneticdipole-dipole interaction between atoms with different Zeeman states.In spherical tensor algebra, the dot product is defined as [203]a ·b = −√3[a(1)⊗b(1)](0)0= −√3 [−a+b−+a0b0−a−b+] , (C.2)where[a(1)⊗b(1)](0)0 is the tensor product between vectors a and b, and a± =∓ 1√2(ax± iay) and a0 = az. A more general expression of a tensor product is,[T (1)⊗T (1)](k)q= ∑m〈1m,1q−m|kq〉T (1,m)T (1,q−m), (C.3)152where 〈1m,1q−m|kq〉 are the Clebsch-Gordan (CG) coefficients, and T (k,q) arespherical tensors of rank k. Any vector is a first rank spherical tensor, T (1), wherek = 1. Equations C.2 and C.3 are identical when k = 0 and q = 0, and we candeduce the value of the CG coefficients.The second term of Vˆdd , Equation C.1, can be rewritten using the followingidentity,(a ·b)(c ·d) = 3[[a(1)⊗b(1)](0)0⊗[c(1)⊗d(1)](0)0](0)0=13(a · c)(b ·d)+ 12(a×b)(c×d)+2∑q=−2(−1)2−q[a(1)⊗ c(1)](2)q[b(1)⊗d(1)](2)q. (C.4)Combining Equations C.3 and C.4 we can rewrite Vˆdd as,Vˆdd =Cd4pi2∑p=−2(−1)2−q[r(1)⊗ r(1)](2)p[d(1)1 ⊗d(1)2](2)−p, (C.5)where the first term[r(1)⊗ r(1)](2)p is a second rank spherical tensor, T (2)(r), andthe second term is the tensor product between the two dipoles.Let us describe the term[r(1)⊗ r(1)](2)p when p = 0 and using Equation C.3 wefind that,[r(1)⊗ r(1)](2)0= ∑p1,p2〈1 p1,1 p2|20〉T 1p1(r)1p2(r) (C.6)= 〈1 1,1 −1|20〉T 11 (r)1−1(r)+〈1 −1,1 1|20〉T 1−1(r)11(r)+ 〈1 0,1 0|20〉T 10 (r)10(r)=− 1√6(x2+ y2)+√23z2, (C.7)where we use T 1±1 = a±1 and T10 = a0, previously defined. We observe that Equa-tion C.7 is proportional to the spherical harmonic Y 20 =√1516pi2z2−x2−y2r2 . Multiply-ing Equation C.7 by√6 we found that[r(1)⊗ r(1)](2)0 is the C20 modified spherical153harmonic,[r(1)⊗ r(1)](2)0=−√6T 2p (C) =−√6r3C2p(θ ,φ), (C.8)where Ckq =√4pi2k+1Ykq . Here we list some of the the spherical harmonics for k = 2,m Y 2m C2m0 (5/16pi)1/2(3cos2θ −1) 12(3cos2θ −1)±1 ∓(15/8pi)1/2cosθsinθe±iφ ∓(3/2)1/2cosθsinθe±iφ±2 (15/32pi)1/2sin2θe±2iφ (3/8)1/2sin2θe±2iφUsing Equations C.5 and C.8, Vˆdd isVˆdd = −√6Cd4pi2∑p=−2(−1)2−pT 2p (C)T 2−p(d1,d2)= −√6Cd4pi2∑p=−2(−1)2−pC2p(θ ,φ)r3 ∑p1,p2〈1 p1,1 p2|2− p〉T 1p1(d1)T 1p2(d2) (C.9)which fully expanded is,Vˆdd = − Cd4pir3{√62(3cos2θ −1)(〈1 1,1 −1|2 0〉T 11 (d1)T 1−1(d2)+ 〈1 −1,1 1|2 0〉T 1−1(d1)T 11 (d2)+〈1 0,1 0|2 0〉T 10 (d1)T 10 (d2))}p=0− Cd4pir3{√6(3/2)1/2cosθsinθe±iφ(〈1 −1,1 0|2 −1〉T 1−1(d1)T 10 (d2)+〈1 0,1 −1|2 −1〉T 10 (d1)T 1−1(d2))}p=1+Cd4pir3{√6(3/2)1/2cosθsinθe±iφ(〈1 1,1 0|2 1〉T 11 (d1)T 10 (d2)+〈1 0,1 1|2 1〉T 10 (d1)T 11 (d2))}p=−1− Cd4pir3{√6(3/8)1/2sin2θe±2iφ(〈1 −1,1 −1|2 −2〉T 1−1(d1)T 1−1(d2))}p=2− Cd4pir3{√6(3/8)1/2sin2θe±2iφ(〈1 1,1 1|2 2〉T 11 (d1)T 11 (d2))}p=−2, (C.10)154when θ = 90◦, all the terms except when p = 0 vanish leading to a z−axis parallelto the internuclear vector that connects both dipoles. Under this assumption, thedipole-dipole equation is,Vˆdd = − Cd4pir3{2√6(〈1 1,1 −1|2 0〉T 11 (d1)T 1−1(d2)+ 〈1 −1,1 1|2 0〉T 1−1(d1)T 11 (d2)+〈1 0,1 0|2 0〉T 10 (d1)T 10 (d2))}= − Cd4pir3{2√6(√23T 10 (d1)T10 (d2)+√16T 1−1(d1)T11 (d2)+√16T 11 (d1)T1−1(d2))}= − Cd4pir3{2T 10 (d1)T10 (d2)+T1−1(d1)T11 (d2)+T11 (d1)T1−1(d2)}. (C.11)C.0.1 Matrix elementThe total magnetic dipole operator looks like dJ = µ0gLL+ µ0gSS where µ0gLLis the magnetic dipole moment from the electron orbital angular momentum andµ0gsS is the magnetic dipole moment from the electron spin angular momentum.Here we describe how to compute the matrix elements of Vˆdd for two magneticdipoles in the total angular momentum basis set, |ψi〉= |ηi;JiMi〉, where, the totalangular momentum is J = L+S. η denotes any further quantum number requiredto characterize the state,〈ψ ′i∣∣〈ψ ′j ∣∣ Vˆdd ∣∣ψi〉∣∣ψ j〉 = 〈η ′i ;J′i M′i ∣∣〈η ′j;J′jM′j ∣∣ Vˆdd ∣∣ηi;JiMi〉∣∣η j;J jM j〉= − Cd4pir3{〈η ′i ;J′i M′i |〈η ′j;J′jM′j| T 11 (d1)T 1−1(d2) |ηi;JiMi〉|η j;J jM j〉+〈η ′i ;J′i M′i |〈η ′j;J′jM′j| T 1−1(d1)T 11 (d2) |ηi;JiMi〉η j;J jM j〉}−2 Cd4pir3〈η ′i ;J′i M′i |〈η ′j;J′jM′j| T 10 (d1)T 10 (d2) |ηi;JiMi〉|η j;J jM j〉(C.12)155The computation of 〈ψ ′i |〈ψ ′j| Vˆdd |ψi〉|ψ j〉 only depends on the values of〈J′i M′i∣∣T 1p1(d)∣∣JiMi〉because,〈η ′i ;J′i M′i∣∣〈η ′j;J′jM′j ∣∣ T 10 (d1)T 10 (d2) ∣∣ηi;JiMi〉∣∣η j;J jM j〉= 〈η ′i ;J′i M′i ∣∣T 10 (d1)∣∣ηi;JiMi〉×〈η ′j;J′jM′j ∣∣T 10 (d2)∣∣η j;J jM j〉 (C.13)where 〈η ′j;J′jM′j|T 10 (d2)|η j;J jM j〉 can be computed using the Wigner-Eckart theo-rem [203],〈η ′;J′ M′∣∣T kp (A)∣∣η ;J M〉= (−1)J′−M′(J′ k J−M′ p M)〈η ′;J′∥∥T k(A)∥∥η ;J〉 (C.14)In the case of magnetic dipoles, T k(di) has no physical dependence on thequantum number η , and the reduced density matrix〈η ′;J′∥∥T k(A)∥∥η ;J〉 is equalto δη ′,ηδJ′,J [J(J+1)(2J+1)]1/2. The matrix dipole element is,〈J′ M′|T 1p1(d)|J M〉 = (−1)J′−M′(J′ 1 J−M′ p1 M)〈J′‖T k(d)‖J〉= (−1)J′−M′(J′ 1 J−M′ p1 M)[J(J+1)(2J+1)]1/2 δJ′,J= (−1)−(M′+M+1)δJ′,J(2J+1)−1/2〈J′ −M′;1 p1|J −M〉 [J(J+1)(2J+1)]1/2(C.15)from the above equation, we can deduce the first selection rule, the magneticdipole-dipole interaction only couples states with the same J. To derive therest of the selection rules we first present the value of the 3 j symbols of the CGcoefficients, (J 1 J−M 0 M)= (−1)J−M M[J(J+1)(2J+1)]1/2. (C.16)Using the above equation we can fully compute 〈J′ M′|T 1p (d)|J M〉 = (−1)J′−M′ .156The first case we consider is for p = 0,〈η ′i ;J′i M′i∣∣T 10 (d1)∣∣ηi;JiMi〉 = (−1)J′i−M′i(J′i 1 Ji−M′i 0 Mi)〈Ji∥∥T 1(d1)∥∥Ji〉δη ′i ,ηi= (−1)Ji−M′i(Ji 1 Ji−Mi 0 Mi)[Ji(Ji+1)(2Ji+1)]1/2= (−1)2J′i−2M′i M[Ji(Ji+1)(2Ji+1)]1/2 [Ji(Ji+1)(2Ji+1)]1/2= Mδη ′i ,ηi (C.17)therefore,〈η ′i ;J′i M′i |〈η ′j;J′jM′j| T 10 (d1)T 10 (d2) |ηi;JiMi〉|η j;J jM j〉= Mi×M jδη ′i ,ηiδη ′j,η j (C.18)The other two terms left in equation C.12 have the same form,〈η ′i ;J′i M′i |〈η ′j;J′jM′j| T 1±1(d1)T 1∓1(d2) |ηi;JiMi〉|η j;J jM j〉= 〈η ′i ;J′i M′i |T 1±1(d1)|ηi;JiMi〉×〈η ′j;J′jM′j|T 1∓1(d2)|η j;J jM j〉(C.19)and its value depends on the 3j symbol when j2 = 1 and m2 =±1,(J 1 J−M∓1 ±1 M)= ±(−1)J−M[(J∓M)(J±M+1)2J(J+1)(2J+1)]1/2= ±(−1)J−M[J(J+1)−M(M±1)2J(J+1)(2J+1)]1/2(C.20)Inserting Equation C.20 into any of the rand-hand side terms in Equation C.19157we obtain,〈η ′i ;J′i M′i∣∣T 1±1(d1)∣∣ηi;JiMi〉 = (−1)J′i−M′i(J′i 1 Ji−M′i ±1 Mi)〈Ji∥∥T 1(d1)∥∥Ji〉δη ′i ,ηi= (−1)Ji−M′i(Ji 1 Ji−M′i ±1 Mi)[Ji(Ji+1)(2Ji+1)]1/2= ±(−1)Ji−Mi+1(−1)Ji−Mi[Ji(Ji+1)−Mi(Mi±1)2Ji(Ji+1)(2Ji+1)]1/2[Ji(Ji+1)(2Ji+1)]1/2= ∓ 1√2[Ji(Ji+1)−Mi(Mi±1)]1/2 δM′i ,Mi±1 (C.21)〈η ′i ;J′i M′i |〈η ′j;J′jM′j| T 11 (d1)T 1−1(d2) |ηi;JiMi〉|η j;J jM j〉=−12[Ji(Ji+1)−Mi(Mi±1)]1/2× [J j(J j +1)−M j(M j∓1)]1/2(C.22)where the second selection rule appears, the magnetic dipole-dipole interactioncouples states with |M−M′|= 1.Inserting Equations C.18 and C.22 into Equation C.12, we obtaine,〈Vˆdd〉= − Cd4pir3{−12[Ji(Ji+1)−Mi(Mi+1)]1/2 [J j(J j +1)−M j(M j−1)]1/2}δM′i ,Mi+1δM′jM j−1− Cd4pir3{−12[Ji(Ji+1)−Mi(Mi−1)]1/2 [J j(J j +1)−M j(M j +1)]1/2}δM′i ,Mi−1δM′jM j+1−2 Cd4pir3{Mi×M j}δM′i ,MiδM′jM j (C.23)Equation C.23 can be rewritten into a more compact form by remembering that[J(J+1)−M(M±1)]1/2 looks like the eigenvalue of the raising and lowering op-erator Jˆ±, and M is the eigenvalue of the Jˆzi ,Vˆdd =Cd4pir3{12[Jˆi,−Jˆ j,++ Jˆi,+Jˆ j,−]−2Jˆi,zJˆ j,z} (C.24)158


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items