Search for the production of Higgs bosons in association with top quarks and decaying into bottom quark pairs with the ATLAS detector

by

Alexander Held

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Physics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

September 2019

© Alexander Held, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Search for the production of Higgs bosons in association with top quarks and decaying into bottom quark pairs with the ATLAS detector

submitted by Alexander Held in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Physics

Examining Committee:

Oliver Stelzer-Chilton, Physics & Astronomy (Supervisor)
Colin Gay, Physics & Astronomy (Supervisory Committee Member)
Joanna Karczmarek, Physics & Astronomy (University Examiner)
Roman Krems, Chemistry (University Examiner)

Additional Supervisory Committee Members:

Gary Hinshaw, Physics & Astronomy (Supervisory Committee Member)
David Morrissey, TRIUMF (Supervisory Committee Member)

Abstract

The Standard Model of particle physics (SM) describes mass generation of fundamental particles via the Brout-Englert-Higgs mechanism. It predicts Yukawa interactions between the Higgs boson and fermions, with interaction strengths proportional to the fermion masses. The largest Yukawa coupling is that of the top quark, and its value has implications in particle physics and cosmology. As the SM is not a complete theory of nature, detailed measurements of its predictions are a mandatory step towards improving the understanding of nature.

This dissertation presents a search for Higgs boson production in association with a top quark pair, a process directly sensitive to the top quark Yukawa coupling.
The search uses 36.1 fb⁻¹ of data at √s = 13 TeV, collected with the ATLAS detector at the Large Hadron Collider (LHC) in 2015 and 2016. It is designed for Higgs boson decays to bottom quarks, and decays of the top quark pair resulting in final states with one or two electrons or muons. The discrimination between the signal Higgs boson production process and background processes, dominated by the production of top quark pairs, is performed with multivariate analysis techniques. The matrix element method is used and optimized for this search. Possible machine learning extensions of the method are investigated to help overcome its large computational demand. The obtained ratio of the measured cross-section for the signal Higgs boson production process to the prediction of the SM is µ = 0.84 +0.64/−0.61. The expected sensitivity of an extension of the search, using 139.0 fb⁻¹ of data collected between 2015 and 2018, is 3.3σ. Data collected between 2016 and 2018 is also used in a measurement of the ATLAS muon trigger system efficiency.

A statistical combination of searches for Higgs bosons produced in association with top quark pairs is performed, including the search for Higgs boson decays to bottom quarks and additional final states. The combination results in the observation of this Higgs boson production process with an observed significance of 5.4σ, compared to an expected sensitivity of 5.5σ. It experimentally establishes top quark Yukawa interactions in the SM.

Lay summary

The Standard Model of particle physics (SM) describes fundamental particles and their interactions. One of these particles is the Higgs boson, which participates in the mechanism that grants masses to fundamental particles. Without this mechanism, the universe would be completely different. The large role played by the Higgs boson in the SM motivates detailed studies of its behavior.
Measuring the Higgs boson properties has become possible only in recent years, thanks to a particle collider called the Large Hadron Collider (LHC). This dissertation describes a measurement of the interaction between Higgs bosons and the heaviest known fundamental particles, called top quarks. Producing a Higgs boson at the LHC is rare, and only one out of every hundred Higgs bosons is produced together with top quarks. The measurement of this rare process experimentally establishes a new type of fundamental interaction with Higgs bosons and top quarks, as predicted by the SM.

Preface

This dissertation is based on data collected with the ATLAS experiment. The operation of this experiment is made possible by an international collaboration of thousands of scientists. Many aspects of data collection and processing are shared among the collaboration, and performed by the members on top of the physics analyses they perform. The simulation of collision events relies on software and efforts from both within and outside the ATLAS collaboration. Analyses performed with the ATLAS experiment use software jointly developed by many members of the collaboration, and rely heavily on the computing resources provided.

All text in this dissertation is written by me, and no text is taken directly from previous public or internal documentation. The ATLAS collaboration releases final results in peer-reviewed journals, and preliminary results that do not undergo an external review process. Both types are reviewed within the collaboration. Figures labeled with "ATLAS" or "ATLAS Simulation" are taken from peer-reviewed publications. The label "ATLAS Preliminary" denotes preliminary public results approved by the collaboration. Figures labeled with "ATLAS work in progress" or "ATLAS Simulation work in progress" are prepared by me and not approved by the collaboration. They use data from the ATLAS detector or official samples of events simulated with ATLAS software.
Figures not prepared by me have their reference indicated in the caption.

My primary work is summarized in chapters 6, 7, 9, 10, and 11. Detailed contributions and acknowledgements are listed below. As a member of the ATLAS collaboration, I contributed to two aspects of the common shared efforts enabling the detector operation and analysis of results. Within the muon trigger group, I measured the efficiency of the ATLAS muon trigger system for data collected between 2015 and 2018. The results of this work are summarized in chapter 10. I am also part of the core development and support team of a software package for statistical inference, used for many analyses in the ATLAS collaboration. This package is called TRExFitter, and is only available to the collaboration. It is used for the results presented in chapters 6 and 9.

Chapters 6 and 7 present a search for the tt̄H(bb̄) process with 36.1 fb⁻¹ of data. This search was performed by a group of around 70 people, and I was part of a small core team of analyzers for the single-lepton channel. The search has been published as

ATLAS Collaboration, Search for the standard model Higgs boson produced in association with top quarks and decaying into a bb̄ pair in pp collisions at √s = 13 TeV with the ATLAS detector, Phys. Rev. D 97 (2018) 072016, arXiv: 1712.08895 [hep-ex].

I performed extensive studies of the fit model in the single-lepton channel in close collaboration with other analyzers, particularly T. Calvet. This includes contributions towards the definition of the nominal tt̄ model and the systematic uncertainties assigned to it, as well as the derivation of systematic uncertainties affecting several small background processes. Independent implementations of the processing of samples and subsequent fit in the single-lepton channel by G. Aad, T. Calvet, and me were used to validate the fit results. I performed detailed studies of the post-fit modeling of data to validate the fit model.
Many of the histograms used to produce the supporting figures accompanying the publication as auxiliary material were produced by me. I developed and optimized the matrix element method (MEM) used in the search. I studied its combination with other multivariate analysis techniques in collaboration with several other analyzers, particularly D. Mori. C. David wrote the software allowing access to variables related to the MEM with a common analysis framework used in the tt̄H(bb̄) group. The work documented in chapter 7 was performed by me, with the exception of the transfer function, which was derived by D. Mori. The description of the MEM in the publication was written by me. I additionally described the method in internal ATLAS documentation, where I also summarized the effect of systematic uncertainties on the modeling of tt̄ production. The MEM implementation was further documented by me in deliverable 1.4 of the AMVA4NewPhysics innovative training network, funded by the European Union's Horizon 2020 research and innovation program. The deliverable was jointly prepared with other members of this network.

Chapter 8 describes the statistical combination of tt̄H analyses performed with the ATLAS detector. The evidence for the tt̄H process has been published as

ATLAS Collaboration, Evidence for the associated production of the Higgs boson and a top quark pair with the ATLAS detector, Phys. Rev. D 97 (2018) 072003, arXiv: 1712.08891 [hep-ex].

For this combination of results, I studied the effect of removing nuisance parameters describing small systematic uncertainties in the tt̄H(bb̄) search. This decreases the fitting time without significantly affecting the results. The subsequent observation of the tt̄H process has been published in reference [3].

Chapter 9 presents a sensitivity study of a tt̄H(bb̄) search with 139.0 fb⁻¹ of data. The study was performed within the ATLAS tt̄H(bb̄) group and in collaboration with other analyzers.
Within the group, I was responsible for the statistical analysis in the single-lepton channel, and was a core analyzer in this channel. In this role, I worked on the definition of the fit model and implemented it in TRExFitter. I studied the modeling of data, contributed towards defining the event selection and region definitions, and compared two available b-tagging algorithms. All results of the statistical analysis in the chapter were obtained by me. The analysis of the dataset used in the chapter is ongoing, and will be submitted for publication upon completion.

The muon trigger efficiency measurement documented in chapter 10 was performed in collaboration with S. Rettie. The analysis was designed in collaboration with O. Stelzer-Chilton and J. Jovićević, with additional guidance from the ATLAS muon trigger group. I processed the required samples of data and simulated events with software developed within the ATLAS collaboration, and wrote and maintained a framework to process these samples into the histograms needed for the trigger efficiency computation. The tables in section 10.5 were obtained with code written by S. Rettie. The measurements were described in documentation internal to the ATLAS collaboration, written by S. Rettie and me. A publication summarizing the ATLAS muon trigger performance during Run-2 of the Large Hadron Collider (LHC) is being prepared, and will include the measurements described in the chapter. I am part of the team of editors for this document.

Chapter 11 summarizes two methods of approximating differential cross-sections. The foam approach is based on an idea of T. Carli, and was developed with his guidance. I wrote the framework to test this method, and studied its behavior. The results documented in the chapter for the foam performance are obtained by me. I co-supervised two students together with T. Carli, who studied extensions of the project. T.
Sandell optimized the implementation and investigated various methods of improving the performance of the foam. G. Van Goffrier extended the study to more complex final states. The neural network approach was implemented following an idea developed during a stay at the statistics department of EPFL in Lausanne. I developed the framework used for this approach and studied its performance. I supervised two students who contributed to the development of the method. J. Bamber studied optimizations of the neural network architecture and parameters, and contributed to the development of the framework. C. Lewis performed the training for the six-particle final state. All required samples of simulated events were generated by me, and I implemented the differential cross-section calculation for all final states investigated throughout the project. The performance of the neural network approximation for six-particle final states is also described in internal ATLAS documentation I wrote.

Table of contents

Abstract
Lay summary
Preface
Table of contents
List of tables
List of figures
Acronyms
Acknowledgements
1 Introduction
2 The Standard Model of particle physics
  2.1 Particles in the Standard Model
  2.2 Quantum chromodynamics
    2.2.1 Running coupling
    2.2.2 Parton distribution functions
    2.2.3 Parton-parton scattering
  2.3 Electroweak theory
    2.3.1 Lagrangian density
    2.3.2 Electroweak symmetry breaking
    2.3.3 Lagrangian density after electroweak symmetry breaking
  2.4 Success and limitations of the Standard Model
    2.4.1 Open questions
  2.5 Implications for physics at the Large Hadron Collider
    2.5.1 Top quark
    2.5.2 Higgs boson
    2.5.3 Yukawa couplings and the special role of tt̄H
3 The Large Hadron Collider and the ATLAS experiment
  3.1 The Large Hadron Collider
    3.1.1 Accelerator chain
    3.1.2 Luminosity and pile-up
    3.1.3 Dataset
  3.2 The ATLAS detector
    3.2.1 Coordinate system
    3.2.2 Detector overview
    3.2.3 Inner Detector
    3.2.4 Calorimeters
    3.2.5 Muon spectrometer
    3.2.6 Trigger and data acquisition
    3.2.7 Data quality requirements and available data for analyses
    3.2.8 Simulation of ATLAS
4 Object reconstruction
  4.1 Reconstruction overview
  4.2 Tracks, vertices and energy clusters
    4.2.1 Tracks
    4.2.2 Vertices
    4.2.3 Energy clusters
  4.3 Leptons
    4.3.1 Muons
    4.3.2 Electrons
    4.3.3 Tau leptons
  4.4 Jets and flavor tagging
    4.4.1 Jets
    4.4.2 Flavor tagging
  4.5 Missing transverse energy
  4.6 Overlap removal
5 Statistical methods
  5.1 Statistical modeling
    5.1.1 Random variables
    5.1.2 Common distributions
    5.1.3 Likelihood function
  5.2 Statistical inference
    5.2.1 Parameter estimation
    5.2.2 Hypothesis testing
    5.2.3 Median significances and the Asimov dataset
  5.3 Multivariate techniques
    5.3.1 Boosted decision trees
    5.3.2 Neural networks
6 Search for Higgs boson production in association with a top quark pair and decaying into a bottom quark pair
  6.1 Analysis overview
  6.2 Event selection
    6.2.1 Dataset
    6.2.2 Object definitions
    6.2.3 Definition of the single-lepton and dilepton channels
  6.3 Modeling
    6.3.1 tt̄H signal
    6.3.2 tt̄ + jets background
    6.3.3 Other backgrounds
    6.3.4 Inclusive modeling of data
  6.4 Event categorization
    6.4.1 Region definitions
    6.4.2 Region composition and signal contributions
  6.5 Multivariate analysis techniques
    6.5.1 Reconstruction BDT
    6.5.2 Likelihood discriminant
    6.5.3 Matrix element method
    6.5.4 Classification BDT
  6.6 Systematic uncertainties
    6.6.1 Nuisance parameter details
    6.6.2 Experimental uncertainties
    6.6.3 Signal and background modeling
    6.6.4 Summary of systematic uncertainty sources
  6.7 Statistical analysis and results
    6.7.1 Fit model details and expected performance
    6.7.2 Fit to data
    6.7.3 Dominant nuisance parameters and sources of uncertainty
    6.7.4 Validation studies
    6.7.5 Observed significance and upper limits
    6.7.6 Summary distribution of events
7 The matrix element method for tt̄H(bb̄)
  7.1 The matrix element method
    7.1.1 Parton level
    7.1.2 Reconstructed objects
  7.2 General approach for tt̄H(bb̄)
    7.2.1 Permutations
    7.2.2 Transfer function
    7.2.3 Remaining degrees of freedom
    7.2.4 Likelihoods and discriminant
    7.2.5 Systematic uncertainties
  7.3 Technical implementation
    7.3.1 Integration
    7.3.2 Matrix elements
    7.3.3 Parton distribution functions
    7.3.4 Transfer function
  7.4 Results
    7.4.1 Results for the SR_1^{≥6j} signal region
    7.4.2 Modeling in validation regions
    7.4.3 Comparison to other methods
  7.5 System reconstruction with the matrix element method
    7.5.1 Assignment efficiency
    7.5.2 Object reconstruction
8 Observation of Yukawa interactions with third generation quarks
  8.1 Evidence for tt̄H
    8.1.1 Analyses entering the combination
    8.1.2 Results
  8.2 Observation of tt̄H
    8.2.1 Analyses entering the combination
    8.2.2 Results
    8.2.3 Top quark Yukawa coupling
  8.3 Observation of H → bb̄
9 Search for tt̄H(bb̄) with 139 fb⁻¹ of data
  9.1 Event selection
    9.1.1 Dataset
    9.1.2 Object definitions
    9.1.3 Definition of the single-lepton channel
  9.2 Modeling
    9.2.1 tt̄H signal
    9.2.2 tt̄ + jets background
    9.2.3 Other backgrounds
    9.2.4 Inclusive modeling of data
  9.3 Event categorization
    9.3.1 Region definitions
    9.3.2 Region composition and signal contributions
  9.4 Multivariate analysis techniques
  9.5 Systematic uncertainties
    9.5.1 Experimental uncertainties
    9.5.2 Signal and background modeling
    9.5.3 Summary of systematic uncertainty sources
  9.6 Statistical analysis and results
    9.6.1 Expected sensitivity
    9.6.2 Dominant nuisance parameters and sources of uncertainty
10 Muon trigger efficiency measurement
  10.1 Analysis method
  10.2 Event selection and categorization
    10.2.1 Dataset
    10.2.2 Object definitions
    10.2.3 Definition of the tt̄ and W+jets channels
  10.3 Modeling
    10.3.1 Comparison with data
  10.4 Systematic uncertainties
  10.5 Results
    10.5.1 Trigger efficiencies
    10.5.2 Scale factors and impact of systematic uncertainties
11 Differential cross-section approximation
  11.1 Overview
    11.1.1 Fully differential cross-sections
    11.1.2 Challenges
  11.2 Final state with two particles
    11.2.1 Approximation method
    11.2.2 Implementation
    11.2.3 Results
  11.3 Final state with six particles
. . . . . . . . . . . . . . . . . . . . . . . . . . 15811.3.1 General considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16011.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16111.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16211.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16512 Conclusions and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169xiiTable of contentsAppendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181A Additional material related to the t t¯H(bb¯)analysis with 36.1 fb−1 . . . . . . . . . . . 181A.1 Categorization for the dilepton channel . . . . . . . . . . . . . . . . . . . . . . 181A.2 Signal region modeling for the dilepton channel . . . . . . . . . . . . . . . . . 181A.3 Correlation between nuisance parameters . . . . . . . . . . . . . . . . . . . . . 183B Additional material related to the t t¯H(bb¯)analysis with 139.0 fb−1 . . . . . . . . . . . 186B.1 Correlation between nuisance parameters . . . . . . . . . . . . . . . . . . . . . 186C Additional material related to differential cross-section approximation . . . . . . . . 187C.1 Differential cross-section for a 2→ 2 process . . . . . . . . . . . . . . . . . . . 187xiiiList of tablesTable 2.1 Fermion fields with associated charges. Q is the electric charge, T 3L the weakisospin, and Y the weak hypercharge. . . . . . . . . . . . . . . . . . . . . . . . . . . . 9Table 2.2 Branching ratios for the decay of t t¯ [10]. . . . . . . . . . . . . . . . . . . . . . . . . . 17Table 2.3 Branching ratios for the decay of the Higgs boson. The other category containsexperimentally challenging final states [34]. . . . . . . . 
Table 4.1  Operating points of the MV2c10 algorithm, with corresponding b-jet identification efficiencies and rejection factors for c-jets and light jets [75].
Table 5.1  Probabilities P for a Gaussian distributed observable x to fall within n standard deviations of the mean µ in one experiment, and the average number of experiment repetitions needed for one observation to fall outside of this range.
Table 6.1  Definition of the tt̄ + jets components used in the analysis. Additional particle jets are those not originating from a top quark or W boson decay.
Table 6.2  Systematic uncertainty sources affecting the modeling of tt̄ + jets. The left column shows the individual sources. Additional details regarding the sources are given in the central column. The column on the right lists the tt̄ components the sources act on, and whether the effect is correlated between the components. Additional details are provided in section 6.6.3 [1].
Table 6.3  List of the systematic uncertainties affecting the analysis. The type N indicates uncertainties changing the normalization of the affected process; uncertainties with type S+N can change both shape and normalization. The number of components per source is listed in the third column [1].
Table 6.4  Expected signal strength measurement in fits to an Asimov dataset. The SR^≥6j_1 region plays an important role in the overall sensitivity of the analysis.
Table 6.5  Contributions to the signal strength uncertainty, grouped by sources. The total statistical uncertainty includes effects from the k(tt̄+≥1b) and k(tt̄+≥1c) normalization factors, while the intrinsic statistical uncertainty does not. The background model statistical uncertainty includes effects from statistical uncertainties in nominal MC samples and the data-driven fake lepton estimate in the single-lepton channel [1].
Table 8.1  Summary of the signal strength µ_tt̄H and the observed and expected significance measured in the individual analyses used to establish evidence for the tt̄H process, as well as the combination of all analyses. No events are observed in the analysis targeting H → ZZ* → 4l, hence the 68% confidence level upper limit is reported for the signal strength [2].
Table 8.2  Observed and expected significance for H → bb̄ decays. The results are reported separately per production mode, and for the statistical combination of all channels [38].
Table 9.1  Systematic uncertainty sources affecting the modeling of tt̄ + jets. The left column shows the individual sources, while the central column describes how the effect is evaluated. The column on the right lists the tt̄ components the sources act on, and whether the effect is correlated between the components.
Table 9.2  List of the systematic uncertainties affecting the analysis. The type N indicates uncertainties changing the normalization of the affected process; uncertainties with type S+N can change both shape and normalization. The number of components per source is listed in the third column.
Table 9.3  Contributions to the signal strength uncertainty, grouped by sources. The total statistical uncertainty includes effects from the k(tt̄+≥1b) and k(tt̄+≥1c) normalization factors, while the intrinsic statistical uncertainty does not.
Table 10.1  Summary of trigger efficiencies for data taken between 2016 and 2018, and for simulation. The results are reported separately for the barrel and end-cap regions, and split by channel. Only absolute statistical uncertainties are included.
Table 10.2  Summary of SFs for the years 2016–2018. The results are reported separately for the barrel and end-cap regions, and split by channel. The total absolute uncertainties for the SFs are shown, and their split into statistical and systematic components is also included.
Table 10.3  Average size of the relative systematic uncertainty on the SF per source, calculated as the arithmetic mean over the three years 2016–2018 of data-taking. The uncertainties are reported separately per detector region and channel in the measurement.
Table 11.1  Mean relative error obtained when only considering events above a threshold of their normalized differential cross-section. The requirement is listed in the left column, while the central column specifies the fraction of the full test dataset satisfying this requirement. The mean relative error obtained in the dataset with the requirement applied is listed in the right column.

List of figures

Figure 2.1  The particle content of the SM, adapted from reference [11] and reference [12], using information from reference [10].
Figure 2.2  QCD interactions: coupling of quarks and gluons (left), three- and four-point gluon self-interactions (center and right, respectively). The interaction strength is parameterized by the coupling constant g_S.
Figure 2.3  Momentum distributions for partons within a proton for Q = 100 GeV, using data from the CT10 PDF set. The first two generations of quarks, including contributions for valence quarks u_V and d_V, as well as gluons are shown. Uncertainties are not drawn.
Figure 2.4  Higgs potential, visualized as a function of the complex field φ0. The unbroken case with µ² > 0 is shown on the left, with a ground state ϕ1 = ϕ2 = 0. The potential for µ² < 0 is shown on the right, where the circle of global minima is drawn with a dashed red line. Rotational symmetry is spontaneously broken when a ground state along this circle is chosen.
Figure 2.5  Three-point interactions between the Higgs boson and W⁺W⁻ bosons (left), and between the Higgs boson and two Z bosons (right).
Figure 2.6  Higgs boson coupling to fermions; the coupling strength is proportional to the fermion mass m_f.
Figure 2.7  Summary of ATLAS measurements of total cross-sections of various SM processes, compared to predictions of the SM [25].
Figure 2.8  Loop corrections to the Higgs boson mass via three-point couplings to fermions f, vector bosons V, self-interactions, and a new massive particle X. Contributions from quartic interactions are not shown.
Figure 2.9  Exemplary Feynman diagrams for the gluon–gluon fusion (top left), vector boson fusion (top right), VH (bottom left), and tt̄H (bottom right) processes.
Figure 2.10  Dominant processes for Higgs boson production with associated cross-sections in proton–proton collisions, shown as a function of COM energy. Bands indicate theoretical uncertainties in the cross-section calculation [34].
Figure 3.1  The CERN accelerator complex relevant for proton–proton collisions in the LHC. Gray arrowheads indicate the proton path. BOOSTER refers to the Proton Synchrotron Booster, PS is the Proton Synchrotron, and SPS is the Super Proton Synchrotron. The figure is adapted from reference [49].
Figure 3.2  Distribution of the mean number of interactions per bunch crossing in data recorded by the ATLAS experiment at √s = 13 TeV [51].
Figure 3.3  The ATLAS coordinate system, with ATLAS located at the origin. The x axis points towards the center of the LHC ring (indicated as a dotted line), and the y axis up towards the surface. Beams propagate along the z axis. The azimuthal angle φ and polar angle θ are also shown for an arbitrary point P.
Figure 3.4  The complete ATLAS detector in cutaway view [45].
Figure 3.5  Cutaway view of the ATLAS ID. The IBL is missing in this visualization [45].
Figure 3.6  Cutaway view of the ATLAS calorimeter system surrounding the ID and solenoid magnet [45]. The term LAr refers to liquid argon as active material.
Figure 3.7  Cutaway view of the ATLAS MS [45].
Figure 3.8  Schematic of one quarter of a cross-section through the ATLAS detector [53].
Figure 3.9  Total integrated luminosity collected by the LHC (green), recorded by the ATLAS detector (yellow), and used for physics analyses (blue), shown as a function of time [51].
Figure 4.1  Schematic of fundamental particles interacting with the ATLAS detector, adapted from reference [59]. It shows a section of the x–y plane.
Figure 5.1  Relation between significance Z and p-value.
Figure 5.2  Distribution of the test statistic t_NP under hypotheses H1 and H0, including p-values calculated from an observation t_obs indicated in the shaded areas.
Figure 5.3  Distribution of the test statistic t_µ; the p-value can be obtained via the integral prescription in equation (5.14).
Figure 5.4  Exemplary architecture of a fully connected feedforward neural network with three inputs (drawn as blue circles), two hidden layers (with associated nodes drawn in green), and one output (drawn in purple). Information flows along the lines connecting nodes.
Figure 6.1  Exemplary Feynman diagram for the tt̄H(bb̄) topology, with one or two light charged leptons (l) in the final state. The different columns listed for the decay products of the W bosons correspond to the alternative topologies considered in the analysis.
Figure 6.2  Exemplary Feynman diagram for the tt̄ + bb̄ background process.
Figure 6.3  Relative fraction of tt̄+≥1b sub-components predicted by the POWHEG+PYTHIA 8 and SHERPA4F samples. The uncertainties for both predictions are also shown, including the sources discussed in section 6.6.3 [1].
Figure 6.4  Expected distribution of the number of jets per event in the single-lepton channel, compared to data. The uncertainties shown include all sources of systematic uncertainty described in section 6.6, with the exception of the free-floating normalization factors for the tt̄+≥1b and tt̄+≥1c processes. The tt̄H distribution normalized to the total background is overlaid as a dashed red line.
Figure 6.5  Expected distribution of the number of b-tagged jets per event at the four operating points (very tight, tight, medium, loose) in the single-lepton channel, compared to data. The uncertainties shown include all sources of systematic uncertainty described in section 6.6, with the exception of the free-floating normalization factors for the tt̄+≥1b and tt̄+≥1c processes. The tt̄H signal is shown both in the stacked histogram, contributing in red, as well as a dashed red line drawn on top of the stacked histogram.
Figure 6.6  Definition of resolved analysis regions with exactly five jets in the single-lepton channel. The vertical axis shows the b-tagging requirements for the first two jets in each event, while the horizontal axis shows the requirement for the third and fourth jet. Jets are ordered by decreasing tightness of the operating point they satisfy [1].
Figure 6.7  Definition of resolved analysis regions with six or more jets in the single-lepton channel. The vertical axis shows the b-tagging requirements for the first two jets in each event, while the horizontal axis shows the requirement for the third and fourth jet. Jets are ordered by decreasing tightness of the operating point they satisfy [1].
Figure 6.8  Composition of background processes in the single-lepton regions. Each pie chart shows the relative contributions per process and region, with the processes defined in section 6.3 [1].
Figure 6.9  Signal contributions per analysis region in the single-lepton channel, evaluated using the expected number of tt̄H events (S) and background events (B) per region. The solid black line, corresponding to the left vertical axis, shows S/B. The dashed red line, corresponding to the right vertical axis, shows S/√B [1].
Figure 6.10  Measurement of the signal strength µ_tt̄H when fitting the model to data. The two-µ fit is performed by fitting both dilepton and single-lepton channels, with two separate signal strength parameters affecting them. The nominal fit result, listed in the last row, is obtained by using a single signal strength parameter [1].
Figure 6.11  Overview of the yields in all single-lepton regions pre-fit (top) and post-fit (bottom). The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal is shown both in the stacked histogram, contributing in red, as well as a dashed red line drawn on top of the stacked histogram. It is normalized to the SM prediction pre-fit, and to the best-fit signal strength value reported in equation (6.1) post-fit [1].
Figure 6.12  Overview of the yields in all dilepton regions pre-fit (top) and post-fit (bottom). The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal is shown both in the stacked histogram, contributing in red, as well as a dashed red line drawn on top of the stacked histogram. It is normalized to the SM prediction pre-fit, and to the best-fit signal strength value reported in equation (6.1) post-fit [1].
Figure 6.13  Comparison between data and the model for the control regions CR^5j_tt̄+≥1c (top) and CR^≥6j_tt̄+≥1c (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction pre-fit, and to the best-fit signal strength value reported in equation (6.1) post-fit. Events with H^had_T < 200 GeV or H^had_T > 650 GeV are included in the leftmost and rightmost bins of the CR^5j_tt̄+≥1c distributions, respectively. Similarly, events with H^had_T < 200 GeV or H^had_T > 1000 GeV are also included in the outermost bins of the CR^≥6j_tt̄+≥1c distributions [1].
Figure 6.14  Comparison between data and the model for the signal regions SR^5j_1 (top), SR^5j_2 (middle), and SR^boosted (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction pre-fit, and to the best-fit signal strength value reported in equation (6.1) post-fit. The tt̄H distribution normalized to the total background is overlaid as a dashed red line [1].
Figure 6.15  Comparison between data and the model for the signal regions SR^≥6j_1 (top), SR^≥6j_2 (middle), and SR^≥6j_3 (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction pre-fit, and to the best-fit signal strength value reported in equation (6.1) post-fit. The tt̄H distribution normalized to the total background is overlaid as a dashed red line [1].
Figure 6.16  The 20 dominant nuisance parameters in the fit, ranked according to their impact on the signal strength. The empty rectangles correspond to the pre-fit impact, while the filled rectangles show the post-fit impact per nuisance parameter. The upper axis shows the impact ∆µ. The pull (θ̂ − θ0)/∆θ of each nuisance parameter is shown as black points, with the vertical black lines visualizing the post-fit nuisance parameter uncertainty ∆θ̂ [1].
Figure 6.17  95% confidence level (CL) upper limits on the signal strength µ_tt̄H, derived in a combined fit to single-lepton and dilepton channels with two independent signal strength parameters (two-µ fit), as well as a fit with a single signal strength (combined fit) [1].
Figure 6.18  Post-fit yields of signal (S), total background (B), and observed data, shown as a function of log10(S/B). Contributions of the signal, when scaled to its best-fit signal strength value, are drawn in red, while contributions with the signal strength set to its value excluded at the 95% confidence level are drawn in orange. The lower panel shows the difference of observed data and various fit models to the total background taken from the nominal fit [1].
Figure 7.1  Topologies for tt̄H production. The three diagrams on the left are initiated by gluons and are considered in the MEM calculation, while the quark–antiquark topology in the diagram on the right is neglected.
Figure 7.2  Transfer function components for b- and light jets, shown as a function of the energy E_q of the quark a jet is associated to. The distributions for light jets, described by a double Gaussian as W_light, are shown as solid lines for different jet energies. The corresponding distributions W_b for b-jets are shown as dashed lines, and are described by Crystal Ball functions.
Figure 7.3  Distribution of the tt̄H(bb̄) signal and tt̄+bb̄ background processes in the SR^≥6j_1 region as a function of the logarithms of the signal and background likelihoods, L_S and L_B. Both processes are normalized to have unit integral. The left- and rightmost bins of the distributions include all events with likelihoods smaller or larger than the edge of these bins, respectively.
Figure 7.4  Distribution of the tt̄H(bb̄) signal and tt̄+bb̄ background processes in the SR^≥6j_1 region, both normalized to have unit integral. The left figure shows the distributions as a function of the MEM_D1 likelihood ratio, while the right figure shows the transformed version of this variable.
Figure 7.5  Comparison between data and the model for the logarithm of the signal likelihood (top), the logarithm of the background likelihood (middle), and the transformed MEM_D1 discriminant (bottom). The figures on the left show the pre-fit model. The post-fit model on the right is obtained from the fit described in section 6.7.2. The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction pre-fit, and to the best-fit signal strength value reported in equation (6.1) post-fit. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. The left- and rightmost bins of the likelihood distributions include all events with likelihoods smaller or larger than the edge of these bins, respectively.
Figure 7.6  Comparison between data and the model for the logarithm of the signal likelihood (top), the logarithm of the background likelihood (middle), and the transformed MEM_D1 discriminant (bottom) in the SR^≥6j_2 region (left) and the SR^≥6j_3 region (right). The uncertainty bands only include sources related to tt̄H and tt̄ modeling, with the exception of the tt̄+≥1b sub-component uncertainty derived from SHERPA4F. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. The left- and rightmost bins of the likelihood distributions include all events with likelihoods smaller or larger than the edge of these bins, respectively.
Figure 7.7  Comparison between data and the model for the logarithm of the signal likelihood (top left), the logarithm of the background likelihood (top right), and the transformed MEM_D1 discriminant (bottom) in the CR^≥6j_tt̄+≥1c region. The uncertainty bands only include sources related to tt̄H and tt̄ modeling, with the exception of the tt̄+≥1b sub-component uncertainty derived from SHERPA4F. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. The left- and rightmost bins of the likelihood distributions include all events with likelihoods smaller or larger than the edge of these bins, respectively.
Figure 7.8  Assignment efficiency of jets to quarks in the permutation with the largest tt̄H likelihood, evaluated with a sample of tt̄H events. The rows correspond to the quark each jet is matched to, while the columns describe the true jet origin. Jets may be truth-matched to multiple quarks.
Figure 7.9  Reconstructed invariant mass of the bb̄ system produced in association with the top quark pair. The figure on the left shows the invariant mass of the two jets assigned to the b quarks from the Higgs boson decay in the permutation with the highest tt̄H likelihood; this quantity is interpreted as the reconstructed Higgs boson mass. The figure on the right shows the bb̄ system assigned to b quarks that do not originate from top quark decays in the tt̄+bb̄ topology, using the permutation with the highest tt̄+bb̄ likelihood. Distributions of the tt̄H signal are shown as dashed red lines, while the tt̄ background is drawn as a solid blue line. All distributions are normalized to unit integral. Only statistical uncertainties are visualized in the figure.
Figure 8.1  Results of the tt̄H cross-section measurement, divided by the SM prediction, in the statistical combination. The results per analysis topology are obtained from a fit with four independent cross-section parameters. Statistical and systematic uncertainties are shown in yellow and blue, respectively. The SM prediction is shown in red, with the associated uncertainty indicated as a gray band. No events are observed in the H → ZZ* → 4l analysis, and the 68% confidence level upper limit is reported [3].
Figure 9.1  Expected distribution of the number of jets per event, compared to data. The uncertainties shown include all sources of systematic uncertainty described in section 9.5, with the exception of the free-floating normalization factors for the tt̄+≥1b and tt̄+≥1c processes. The tt̄H distribution normalized to the total background is overlaid as a dashed red line.
Figure 9.2  Expected distribution of the number of b-tagged jets per event at the four operating points (very tight, tight, medium, loose), compared to data. The uncertainties shown include all sources of systematic uncertainty described in section 9.5, with the exception of the free-floating normalization factors for the tt̄+≥1b and tt̄+≥1c processes. The tt̄H signal is shown both in the stacked histogram, contributing in red, as well as a dashed red line drawn on top of the stacked histogram. Data is not shown in bins where the tt̄H signal is expected to contribute more than 5% to the yield, indicated by a gray hashed area.
Figure 9.3  Composition of background processes per region. Each pie chart shows the relative contributions per process, with the processes defined in section 9.2.
Figure 9.4  Signal contributions per region, calculated with the expected number of tt̄H events (S) and background events (B) per region. The histograms show S/√B, with blue bars for control regions and red bars indicating signal regions. S/B is also listed for each region.
Figure 9.5  Overview of the yields in all regions pre-fit (top) and post-fit (bottom). The uncertainty bands include all sources of systematic uncertainty described in section 9.5. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal is shown both in the stacked histogram, contributing in red, as well as a dashed red line drawn on top of the stacked histogram. It is normalized to the SM prediction. Data is only compared to the pre-fit model, and not shown in bins where the tt̄H signal is expected to contribute more than 5% to the yield, indicated by a gray hashed area.
Figure 9.6  Comparison between data and the model for the control regions CR^5j (top) and CR^≥6j (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 9.5. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. Events with H^had_T < 200 GeV or H^had_T > 800 GeV are included in the leftmost and rightmost bins of the CR^5j distributions, respectively. Similarly, events with H^had_T < 250 GeV or H^had_T > 1150 GeV are also included in the outermost bins of the CR^≥6j distributions. Data is only compared to the pre-fit model.
Figure 9.7  Comparison between data and the model for the signal regions SR^5j_1 (top), SR^5j_2 (middle), and SR^boosted (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 9.5. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. Data is only compared to the pre-fit model, and not shown in bins where the tt̄H signal is expected to contribute more than 5% to the yield, indicated by a gray hashed area.
Figure 9.8  Comparison between data and the model for the signal regions SR^≥6j_1 (top) and SR^≥6j_2 (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 9.5. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. Data is only compared to the pre-fit model, and not shown in bins where the tt̄H signal is expected to contribute more than 5% to the yield, indicated by a gray hashed area.
Figure 9.9  The 20 dominant nuisance parameters in the fit, ranked according to their impact on the signal strength. The empty rectangles correspond to the pre-fit impact, while the filled rectangles show the post-fit impact per nuisance parameter. The upper axis shows the impact ∆µ. The pull (θ̂ − θ0)/∆θ of each nuisance parameter is shown as black points, with the vertical black lines visualizing the post-fit nuisance parameter uncertainty ∆θ̂. MG5 refers to samples generated with MG5_AMC@NLO+PYTHIA 8.
Figure 10.1  Expected distribution of the muon transverse momentum (left) and E^miss_T (right) in the tt̄ channel, compared to data. An overall normalization factor is applied to simulation to match data, with an effect smaller than 1%. Only statistical uncertainties are shown for the expected distribution, drawn with dashed lines. The E^miss_T > 200 GeV requirement is not applied in the figures showing the E^miss_T distributions.
Figure 10.2  Expected distribution of the muon transverse momentum (left) and E^miss_T (right) in the W+jets channel, compared to data. An overall normalization factor is applied to simulation to match data, with an effect around 10%. Only statistical uncertainties are shown for the expected distribution, drawn with dashed lines. The E^miss_T > 200 GeV requirement is not applied in the figures showing the E^miss_T distributions.
Figure 10.3  Muon trigger efficiencies and SFs in the barrel region, measured in the tt̄ (left) and W+jets (right) channels, for data recorded in 2016 (top), 2017 (middle), and 2018 (bottom). The upper parts of the figures show the trigger efficiencies for data in black, and for simulation as a hashed green area. The lower parts show the SF, given by the ratio of the efficiency measured in data to simulation. The efficiencies are shown as a function of the reconstructed muon transverse momentum, and the resulting efficiencies and SFs from a fit to muons with p_T > 100 GeV are also listed in the figure. Only statistical uncertainties are included.
Figure 10.4  Muon trigger efficiencies and SFs in the end-cap regions, measured in the tt̄ (left) and W+jets (right) channels, for data recorded in 2016 (top), 2017 (middle), and 2018 (bottom). The upper parts of the figures show the trigger efficiencies for data in black, and for simulation as a hashed green area. The lower parts show the SF, given by the ratio of the efficiency measured in data to simulation. The efficiencies are shown as a function of the reconstructed muon transverse momentum, and the resulting efficiencies and SFs from a fit to muons with p_T > 100 GeV are also listed in the figure. Only statistical uncertainties are included.
Figure 11.1  Exemplary Feynman diagram for gluon-initiated tt̄ production.
Figure 11.2  Prediction for the differential cross-section of tt̄ production by a foam with 50 000 cells, drawn as dashed blue lines. The distributions are shown as a function of the four variables used to parameterize the fully differential cross-section, and compared to a reference set of events generated with MG5_AMC@NLO. The reference distribution is shown in green; statistical uncertainties are indicated with hashed gray lines.
Figure 11.3: Exemplary Feynman diagram for gluon-initiated $t\bar{t}$ production, with subsequent decays into a final state with six particles.

Figure 11.4: Distribution of one million test events as a function of the logarithm of their normalized differential cross-section, $\log_{10}(1/\sigma_{t\bar{t}}\, d\sigma_{t\bar{t}})$, and the corresponding neural network prediction $\log_{10}(1/\sigma_{t\bar{t}}\, d\hat{\sigma}_{t\bar{t}})$. The fraction of events per bin is indicated by the color of each bin, drawn with a logarithmic scale. The diagonal gray line indicates where all events would be located for $d\hat{\sigma}_{t\bar{t}} = d\sigma_{t\bar{t}}$. The test events are distributed like the differential cross-section; fewer events exist in phase space regions with small differential cross-sections. The neural network prediction for a very small fraction of events with small differential cross-sections significantly underestimates their differential cross-sections. The error is largest for such events due to the training set distribution and loss function choice for the neural network.

Figure 11.5: Distribution of one million test events. The green histogram shows the distribution of events as a function of the logarithm of their normalized differential cross-section, $\log_{10}(1/\sigma_{t\bar{t}}\, d\sigma_{t\bar{t}})$. The dashed blue line shows the distribution as a function of the corresponding network prediction, $\log_{10}(1/\sigma_{t\bar{t}}\, d\hat{\sigma}_{t\bar{t}})$. The network prediction significantly underestimates the differential cross-section for a small fraction of events with small differential cross-sections. This is due to the choice of training set distribution and loss function of the neural network.

Figure 11.6: Distribution of the relative error $(d\sigma_{t\bar{t}} - d\hat{\sigma}_{t\bar{t}})/d\sigma_{t\bar{t}}$ for one million test events. The mean relative error is the average absolute value of the relative error across all events.
Figure A.1: Definition of analysis regions with exactly three jets in the dilepton channel. The vertical axis shows the b-tagging requirements for the first two jets in each event, while the horizontal axis shows the requirement for the third jet. Jets are ordered by decreasing tightness of the operating point they satisfy [1].

Figure A.2: Definition of analysis regions with four or more jets in the dilepton channel. The vertical axis shows the b-tagging requirements for the first two jets in each event, while the horizontal axis shows the requirement for the third and fourth jet. Jets are ordered by decreasing tightness of the operating point they satisfy [1].

Figure A.3: Composition of background processes in the dilepton regions. Each pie chart shows the relative contributions per process and region, with the processes defined in section 6.3 [1].

Figure A.4: Signal contributions per analysis region in the single-lepton channel. The solid black line, corresponding to the left vertical axis, shows $S/B$. The dashed red line, corresponding to the right vertical axis, shows $S/\sqrt{B}$. $S$ is the number of $t\bar{t}H$ events per region, and $B$ the number of expected background events [1].

Figure A.5: Comparison between data and the model for the signal regions $\text{SR}_1^{\geq 4j}$ (top), $\text{SR}_2^{\geq 4j}$ (middle) and $\text{SR}_3^{\geq 4j}$ (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to $k(t\bar{t}+{\geq}1b)$ and $k(t\bar{t}+{\geq}1c)$ is included pre-fit. The $t\bar{t}H$ signal shown in red in the stacked histogram is normalized to the SM prediction pre-fit, and the best-fit signal strength value reported in equation (6.1) post-fit. The $t\bar{t}H$ distribution normalized to the total background is overlaid as a dashed red line [1].
Figure A.6: Correlations between the most highly ranked nuisance parameters and the signal strength, determined by the nominal fit to data described in section 6.7.2. All values are in % [1].

Figure B.1: Correlations between nuisance parameters and signal strength, determined by a fit to an Asimov dataset. Parameters are only included if they have a correlation of at least 25% with at least one other parameter. All values are in %.

Acronyms

3F      three-flavor
4F      four-flavor
5F      five-flavor
BDT     boosted decision tree
BSM     beyond the Standard Model
CKM     Cabibbo–Kobayashi–Maskawa
COM     center-of-mass
CP      charge conjugation and parity symmetry
CPU     central processing unit
CSC     cathode strip chamber
DGLAP   Dokshitzer–Gribov–Lipatov–Altarelli–Parisi
DM      dark matter
EW      electroweak
EWSB    electroweak symmetry breaking
FSR     final state radiation
GPU     graphics processing unit
HL-LHC  High Luminosity Large Hadron Collider
HLT     high-level trigger
IBL     insertable B-layer
ID      inner detector
ISR     initial state radiation
L1      Level-1
LHC     Large Hadron Collider
LHD     likelihood discriminant
LO      leading order
MC      Monte Carlo
MDT     monitored drift tube
MEM     matrix element method
MPI     multi-parton interaction
MS      muon spectrometer
NLO     next-to-leading order
NNLL    next-to-next-to-leading logarithmic
NNLO    next-to-next-to-leading order
PDF     parton distribution function
PMNS    Pontecorvo–Maki–Nakagawa–Sakata
PS      parton shower
QCD     quantum chromodynamics
QED     quantum electrodynamics
ReLU    rectified linear unit
RoI     region of interest
RPC     resistive plate chamber
SCT     semiconductor tracker
SF      scale factor
SM      Standard Model of particle physics
TGC     thin gap chamber
TMVA    Toolkit for Multivariate Data Analysis with ROOT
TRT     transition radiation tracker
UE      underlying event
VEV     vacuum expectation value

Acknowledgements

This dissertation would not have been possible without many people who I had the pleasure to meet and work with during the last five years.
Thank you for making this an amazing experience.

To my supervisor Oliver, thank you for your continuous guidance, for always being available to help, for your advice, and for providing me the freedom to focus on projects I am interested in. I have learned a lot from you, and I am happy to have had you as my supervisor. Thank you to Tancredi for sharing your deep understanding of ATLAS and particle physics. Your experience, insight and advice were invaluable.

Thank you to the members of the AMVA4NewPhysics network, which shaped the second half of my PhD. You taught me lots about machine learning, and encouraged me to follow new ideas. It was a great experience to have been part of this network. Thank you to Tommaso for all your work to make it happen, and the groups at CP3 and EPFL for hosting me.

To the $t\bar{t}H(b\bar{b})$ group, thank you for providing such a great environment to work in at all times. I am especially grateful to Jelena. Thank you for teaching me most of what I know about the analysis, patiently answering lots of questions, and for the continuous advice and feedback. Thank you to everyone who made the use of the matrix element method possible: Bernd for lots of advice, Dan for the transfer function derivation, and Claire for the interface to the common analysis code. Thanks to everyone else in the group, particularly Georges, Michele and Thomas. It was great to work alongside you on this analysis.

My time working towards a PhD was split between Vancouver and CERN, and it was wonderful to work in both places. Thank you to the UBC and TRIUMF groups for providing such an enjoyable atmosphere. Thanks to everyone in the UBC lab, both previous and current members, especially to Sébastien writing in parallel, and to Tal at CERN.
A special thank you to Felix and Wojtek for running Flashy, which was an invaluable resource.

To everyone who read earlier drafts of this document, thank you for all the useful feedback you provided.

Last but not least, to my family and friends: thank you for all your support during the last five years. It was a privilege to have been given the chance to pursue this journey, and I thoroughly enjoyed it.

This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 675440 AMVA4NewPhysics.

1. Introduction

The Standard Model of particle physics (SM) describes interactions between elementary particles. It is a highly successful theory of nature, and in agreement with decades of experimental tests. One of the central ingredients to the SM is the Brout-Englert-Higgs mechanism, which explains how fermions and the weak gauge bosons acquire mass. This mechanism predicts the existence of a Higgs boson, which was observed in 2012 by both the ATLAS and CMS collaborations at the Large Hadron Collider (LHC). The Higgs boson observation marks the beginning of a new era of experimental particle physics, which studies the details of electroweak symmetry breaking (EWSB).

Despite all its success, the SM has several shortcomings and does not constitute a complete theory of nature. A rich experimental program is dedicated to the search for phenomena predicted by theories that extend the SM. An alternative way forward is precise measurements of the theoretical predictions. Detailed studies of the interactions between the Higgs boson and other elementary particles are necessary to further establish the SM validity. Such studies are possible at the LHC.
A possible deviation from the SM predictions can help guide the way towards a more complete theory.

This dissertation presents a measurement of the interaction between Higgs bosons and top quarks. The strength of the Higgs boson interaction with other elementary particles increases with their masses. The top quark, being the heaviest elementary particle in the SM, is the fermion that interacts most strongly with the Higgs boson. This interaction is studied directly in a process called $t\bar{t}H$, using proton–proton collisions at the LHC that were recorded by the ATLAS experiment. The $t\bar{t}H$ process describes the production of a pair of top quarks and a Higgs boson. A specific topology is the focus of this dissertation, where the Higgs boson decays to a pair of bottom quarks. This process is called $t\bar{t}H(b\bar{b})$. A measurement of the $t\bar{t}H$ process can be interpreted in terms of the top quark Yukawa coupling, which has important implications in both particle physics and cosmology.

The dissertation is organized as follows. Chapter 2 introduces the SM, with a focus on the process of EWSB and implications for the experimental analyses performed. The LHC and the ATLAS experiment are described in chapter 3. This is followed by a description of how the ATLAS detector reconstructs physics objects, provided in chapter 4. Chapter 5 introduces the statistical methodology used to interpret results, and includes the description of multivariate analysis techniques employed. A search for the $t\bar{t}H(b\bar{b})$ process is presented in detail in chapter 6. It uses 36.1 fb$^{-1}$ of data collected by the ATLAS experiment in 2015 and 2016. The $t\bar{t}H(b\bar{b})$ search uses a multivariate analysis technique called the matrix element method (MEM), and its implementation for the search is described in detail in chapter 7.
Chapter 8 presents the statistical combination of the $t\bar{t}H(b\bar{b})$ search with other measurements performed with the ATLAS detector, leading to the observation of both the $t\bar{t}H$ process and the Higgs boson decay to bottom quark pairs by the ATLAS collaboration in 2018. Between 2015 and 2018, the ATLAS experiment collected a proton–proton collision dataset of 139.0 fb$^{-1}$. The expected sensitivity of a $t\bar{t}H(b\bar{b})$ analysis with this dataset is studied in chapter 9. Chapter 10 presents a measurement of the efficiency of the ATLAS muon trigger system. This system is essential to the success of many physics analyses studying events where muons are produced, including the $t\bar{t}H(b\bar{b})$ process. The studies summarized in chapter 11 investigate the feasibility of using machine learning techniques to help reduce the large computational requirements of MEM calculations. A conclusion to the dissertation is provided in chapter 12. The appendices A and B provide additional material related to the $t\bar{t}H(b\bar{b})$ searches with 36.1 fb$^{-1}$ and 139.0 fb$^{-1}$ respectively, and appendix C includes additional material related to chapter 11.

This dissertation uses natural units, where the speed of light in vacuum $c$ and the reduced Planck constant $\hbar$ are both set to unity, and electric charge is given in units of the electric charge of the positron.

2. The Standard Model of particle physics

The SM [4–7] describes interactions between fundamental particles via three out of the four known fundamental forces: electromagnetic, weak, and strong force. The strong force is governed by the theory of quantum chromodynamics (QCD), while the electromagnetic and weak forces are unified in the electroweak (EW) theory. Gravitational interactions are not accounted for by the SM, and described by the theory of general relativity.

The SM is a non-abelian gauge theory, invariant under the gauge group

$$G = SU(3)_C \times SU(2)_L \times U(1)_Y.$$
(2.1)

Its Lagrangian density is given by

$$\mathcal{L}_{\text{SM}} = \mathcal{L}_{\text{QCD}} + \mathcal{L}_{\text{EW}}, \quad (2.2)$$

with the two components described in sections 2.2 and 2.3.

The SM is a Lorentz invariant quantum field theory, reconciling the laws of quantum mechanics and special relativity. This summary is based on reference [8], with some input from reference [9] and experimental measurements summarized in reference [10]. The Einstein summation convention is used in this chapter, repeated indices are summed over.

The chapter starts with an overview of the particle content in the SM in section 2.1, followed by brief descriptions of QCD in section 2.2 and the EW theory in section 2.3. Section 2.4 highlights the success and limitations of the SM. The chapter closes with predictions of the SM for physics at the LHC relevant to this dissertation in section 2.5.

2.1 Particles in the Standard Model

The particle content of the SM is shown in figure 2.1. Matter is made up of fermions, which are particles with half integer spin. Particles with integer spin are called bosons.

The fundamental fermions can be split into two classes, quarks and leptons. Quarks carry a fractional electric charge $Q$ and a color charge. There are six flavors of quarks, split into three different generations. Each generation contains an up-type and a down-type quark, where the types are named after the quarks in the first generation: up and down. The corresponding quarks of the second and third generations are called charm and strange, and top and bottom, respectively. Up-type quarks carry electric charge $Q = 2/3$, while down-type quarks have $Q = -1/3$. Besides these six quarks, there are six antiquarks with the same masses and opposite charges. Quarks can only be observed in bound states, called hadrons, due to color confinement. Hadrons consisting of three quarks are called baryons, and they are also fermions. Quark–antiquark pairs form mesons, which are bosons. The second
class of fundamental fermions are the leptons, again split into three generations: electrons, muons, and tau leptons. Each generation contains a charged lepton with electric charge $Q = -1$ and the corresponding neutrino of the same flavor.

Figure 2.1: The particle content of the SM, adapted from reference [11] and reference [12], using information from reference [10].

There are five different types of elementary bosons in the SM. Four types of vector bosons with spin-1 act as gauge bosons, mediating forces between fermions. The scalar spin-0 Higgs boson is a consequence of EWSB, described in section 2.3.2.

Throughout this document, the term electron is generally used to refer to both of the charged first generation leptons. The term positron is only used when a distinction is necessary. Similarly, the distinction between quarks and antiquarks of a specific flavor is only made when explicitly required.

2.2 Quantum chromodynamics

Strong interactions are described by the non-abelian $SU(3)$ gauge theory QCD. Two types of fields are contained in this theory, quark fields $q_r^\alpha$, and gauge fields $G^i$, called the gluons.

Figure 2.2: QCD interactions: coupling of quarks and gluons (left), three- and four-point gluon self-interactions (center and right, respectively). The interaction strength is parameterized by the coupling constant $g_S$.

The flavor index $r$
takes values $r = u, d, c, s, t, b$, the color index is $\alpha = \text{red}, \text{green}, \text{blue}$. There are eight Hermitian fields $G^i = G^{i\dagger}$, $i = 1, \dots, 8$, corresponding to the eight generators of $SU(3)$. The gauge covariant derivative, introduced to keep the quark kinetic terms gauge invariant under local $SU(3)$ transformations, is given by

$$D_\mu{}^\beta{}_\alpha = \partial_\mu \delta^\beta_\alpha + \frac{i g_s}{\sqrt{2}}\, G_\mu{}^\beta{}_\alpha. \quad (2.3)$$

It couples quarks to the gluon field, with an interaction strength parameterized by $g_s$. The expression $\delta^\beta_\alpha$ is the Kronecker delta, it is unity for $\alpha = \beta$ and zero otherwise. The gluon field in matrix notation is

$$G^\beta{}_\alpha = \sum_{i=1}^{8} \frac{G^i \lambda^i_{\alpha\beta}}{\sqrt{2}}, \quad (2.4)$$

where $\alpha, \beta$ are matrix indices. The Gell-Mann matrices $\lambda^i$ form a representation of the Lie algebra of $SU(3)$ and satisfy $[\lambda^i, \lambda^j] = 2i f_{ijk} \lambda^k$, defining the antisymmetric structure constants $f_{ijk}$. Their normalization is chosen such that $\text{Tr}[\lambda^i \lambda^j] = 2\delta^{ij}$. The gluon field strength tensor is

$$G^i_{\mu\nu} = \partial_\mu G^i_\nu - \partial_\nu G^i_\mu - g_s f_{ijk} G^j_\mu G^k_\nu. \quad (2.5)$$

Putting everything together, QCD is described by the Lagrangian density

$$\mathcal{L}_{\text{QCD}} = -\frac{1}{4} G^i_{\mu\nu} G^{i\,\mu\nu} + \sum_r \bar{q}^\alpha_r\, i\gamma^\mu D_\mu{}^\beta{}_\alpha\, q_{r\beta}, \quad (2.6)$$

where $\gamma^\mu$ are the Dirac matrices. The first term leads to three- and four-point gluon self-interactions, while the quark kinetic term results in quark–gluon coupling. These interactions are depicted in figure 2.2.

Additional quark mass terms $-\sum_r m_r \bar{q}^\alpha_r q_{r\alpha}$ could be added to the Lagrangian in pure QCD. In the context of the SM, they are instead generated during EWSB, as described in section 2.3. The Lagrangian 2.6 also allows for the addition of a charge conjugation and parity symmetry (CP) violating term, scaled by a free parameter $\theta_{\text{QCD}}$. It is experimentally known to be small.

2.2.1 Running coupling

Calculations in QCD involve contributions from processes happening at different orders of $g_S$. The lowest order of coupling at which a process can take place is called leading order (LO), the next highest
order next-to-leading order (NLO), followed by next-to-next-to-leading order (NNLO). Contributions from higher order processes can be partially absorbed into an effective coupling $\alpha_s(\mu_R^2) = g_s^2(\mu_R^2)/4\pi$ with renormalization scale $\mu_R$. This scale is commonly chosen to be a characteristic energy scale for a given process, such as the momentum transfer $Q$ in a collision. Quark loop corrections to the gluon propagator decrease the effective coupling $\alpha_s(\mu_R^2)$ at long distances (equivalent to small energy scale $\mu_R$), while gluon loops increase it. The effect from the gluon loops is dominant. As a result, the strong force becomes weaker at high $\mu_R$; this phenomenon is called asymptotic freedom. At low energies, QCD is non-perturbative.

The dependence of the effective coupling on the energy scale is expressed by the renormalization group equation. At one loop level, the effective coupling scales like $\alpha_s(Q^2) \propto \ln(Q^2/\Lambda^2)^{-1}$, where $\Lambda$ is a reference energy scale at which the strong coupling becomes large.

2.2.2 Parton distribution functions

The parton model describes the constituents of hadrons as point-like particles. Parton distribution functions (PDFs) describe the probability density (as defined in section 5.1) for a specific parton to carry a momentum fraction $x$ of a proton. These distributions, expressed as $f(x, Q^2)$, depend on the momentum transfer $Q^2$. The evolution of the PDFs with energy scale $Q^2$ is given by the Dokshitzer–Gribov–Lipatov–Altarelli–Parisi (DGLAP) equations [13–15].

Figure 2.3 shows an example of the momentum distributions $x f(x, Q^2)$ of partons in protons. The data is taken from the CT10 PDF set [16] interfaced via LHAPDF [17], with $Q = 100$ GeV. Protons contain two valence up and one down quark, which carry significant momentum fractions as visible in the figure. The contributions from sea quarks decrease at higher $x$.

2.2.3 Parton-parton scattering

The cross-sections for processes at hadron colliders can be factorized into two contributions.
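This two-step structure, parton densities convolved with a partonic cross-section, can be sketched with a toy numerical convolution. The functions below are invented placeholders (not a real PDF set or matrix element); only the shape of the calculation carries over:

```python
import numpy as np

def toy_pdf(x):
    """Toy parton density: rises at small x, vanishes as x -> 1 (not a real PDF)."""
    return x**-0.5 * (1.0 - x)**3

def toy_partonic_xsec(shat):
    """Toy partonic cross-section, falling with the squared partonic energy shat."""
    return 1.0 / (1.0 + shat)

def toy_hadronic_xsec(s, n=200_000, seed=1):
    """Monte Carlo estimate of the double convolution over x1, x2 in (0, 1)."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(1e-4, 1.0, n)
    x2 = rng.uniform(1e-4, 1.0, n)
    # Integrand of the factorized cross-section for one parton combination:
    # f(x1) * f(x2) * sigma_hat(x1 * x2 * s)
    integrand = toy_pdf(x1) * toy_pdf(x2) * toy_partonic_xsec(x1 * x2 * s)
    return integrand.mean()  # the integration volume is (close to) the unit square

sigma = toy_hadronic_xsec(s=100.0)
```

Replacing the toy functions with a real PDF set (e.g. accessed via LHAPDF) and a perturbatively calculated partonic cross-section, and summing over parton combinations, gives the physical calculation; the Monte Carlo sampling over momentum fractions is the same idea.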
PDFs describe the colliding partons $i, j$ within the colliding hadrons $H_1, H_2$. The subsequent hard scattering of the partons, described by a cross-section $\sigma_{ij \to F}$, can typically be calculated perturbatively. The total cross-section for producing a final state $F$, along with any unobserved $X$, from colliding hadrons $H_1, H_2$ is given by the factorization theorem [18], and can be written as

$$\sigma_{H_1, H_2 \to F+X} = \sum_{i,j} \int f_i\!\left(x_1, \mu_F^2\right) f_j\!\left(x_2, \mu_F^2\right) \sigma_{ij \to F}\, dx_1\, dx_2. \quad (2.7)$$

The energy scale $\mu_F$ is called factorization scale, typically chosen to correspond to a momentum transfer or characteristic momentum of the process. It may also affect the hard scatter cross-section $\sigma_{ij \to F}$ through radiation from the initial partons $i, j$.

2.3 Electroweak theory

The EW theory provides the unified description of electrodynamic and weak interactions. It is a non-abelian gauge theory, based on the $SU(2)_L \times U(1)_Y$ gauge group. In contrast to QCD, it is chiral and spontaneously broken via the process of EWSB. An unbroken $U(1)_Q$ symmetry remains after EWSB, which describes quantum electrodynamics (QED).

Figure 2.3: Momentum distributions for partons within a proton for $Q = 100$ GeV, using data from the CT10 PDF set. The first two generations of quarks, including contributions for valence quarks $u_V$ and $d_V$, as well as gluons are shown. Uncertainties are not drawn.

The $SU(2)_L$ part acts on the flavor of left-chiral fermions, indicated by the subscript $L$. It introduces three gauge bosons $W^i$, $i = 1, 2, 3$ with associated coupling $g$. The $U(1)_Y$ group describes a gauge boson $B$ with coupling $g'$, and different interactions with left- and right-chiral fermions. Its subscript $Y$ refers to the weak hypercharge, a conserved quantity under $U(1)_Y$ transformations. It is defined as

$$Y = Q - T^3_L,$$
It is defined asY =Q−T 3L , (2.8)where Q is the electric charge after EWSB, and T 3L the weak isospin, which is conserved in weakinteractions.After EWSB, aU (1)Q symmetry remains. It describes the photon, a linear combination ofW3 andB . The massiveW ± bosons interact with left-chiral fermions via charged current interactions and canchange their flavor. Massive Z bosons mediate neutral current interactions, and couple to both left-and right-chiral fermions. Flavor changing neutral currents are highly suppressed.2.3.1 Lagrangian densityThe EW Lagrangian density is given byL EW =L gauge+L f +L φ+L Yuk . (2.9)72. The Standard ModelThe first term, L gauge, contains the gauge boson kinetic terms and self-interactions. Covariantfermion kinetic terms are described byL f . The Higgs sector behavior is captured byL φ. Yukawainteractions, which generate fermion masses, are contained in the last term, L Yuk. All terms aredescribed individually in this section.Gauge termThe field strength tensors for the SU (2)L andU (1)Y gauge bosons are given byW iµν = ∂µW iν −∂νW iµ− g²i j kW jµW kν ,Bµν = ∂µBν−∂νBµ.(2.10)The definition ofW iµν is similar to the SU (3)C QCD gluon tensor in equation (2.5), but now there arethree tensorsW iµν, i = 1,2,3, corresponding to the three generators of SU (2). The Levi-Civita tensor²i j k with i , j ,k = 1,2,3 is totally antisymmetric, with ²123 = 1.The gauge term in the Lagrangian density isL gauge =−14W iµνWiµν− 14BµνBµν. (2.11)The kinetic term for theW i bosons introduces three- and four-point gauge boson self interactions,while the B gauge boson does not self-interact.Fermion sectorThe fermion sector is split into left-chiral SU (2)L doublets and right-chiral singlets. For each fermionfield f 0mL = q0mL ,`0mL and f 0mR = u0mR ,d0mR ,e0mR , the subscripts L andR refer to the left- and right-chiralnature of the field, respectively. The subscriptm refers to the generation. 
Up-type quarks are written as $u_m$ with $m = 1, 2, 3$, $d_m$ are the down-type quarks, and the charged leptons are $e_m = e, \mu, \tau$. The superscript 0 indicates that these fields are weak interaction eigenstates. The color index $\alpha$ carried by the quarks, which transforms under QCD, is not explicitly written. A summary of the fermion fields in the SM, along with their associated charges, is provided in table 2.1.

The gauge covariant derivatives for left-chiral fermion doublets $f^0_{mL}$ introduce couplings to the three $SU(2)_L$ gauge bosons, written as a column vector $\vec{W} = (W^1\ W^2\ W^3)^T$, with coupling strength $g$, and to the $U(1)_Y$ gauge boson $B$ with a coupling proportional to weak hypercharge $Y_f$ and coupling strength $g'$. The corresponding weak hypercharge per fermion is listed in table 2.1. The row vector $\vec{\tau} = (\sigma^1\ \sigma^2\ \sigma^3)$ denotes the Pauli matrices, which satisfy $[\sigma^i, \sigma^j] = 2i \epsilon_{ijk} \sigma^k$. The covariant derivatives for left-chiral fermion doublets $f^0_{mL}$ are

$$D_\mu f^0_{mL} = \left( \partial_\mu + \frac{ig}{2}\, \vec{\tau} \cdot \vec{W}_\mu + i g' Y_f B_\mu \right) f^0_{mL}. \quad (2.12)$$

The gauge covariant derivatives for right-chiral fermions $f^0_{mR}$ only couple them to the $B$ gauge boson:

$$D_\mu f^0_{mR} = \left( \partial_\mu + i g' Y_f B_\mu \right) f^0_{mR}. \quad (2.13)$$

Table 2.1: Fermion fields with associated charges. $Q$ is the electric charge, $T^3_L$ the weak isospin, and $Y$ the weak hypercharge.

    fields                                      $Q$            $T^3_L$         $Y$
    $q^0_{mL} = (u^0_m,\ d^0_m)^T_L$            $2/3$, $-1/3$  $1/2$, $-1/2$   $1/6$
    $u^0_{mR}$                                  $2/3$          $0$             $2/3$
    $d^0_{mR}$                                  $-1/3$         $0$             $-1/3$
    $\ell^0_{mL} = (\nu^0_m,\ e^0_m)^T_L$       $0$, $-1$      $1/2$, $-1/2$   $-1/2$
    $e^0_{mR}$                                  $-1$           $0$             $-1$

The fermion part of the EW Lagrangian density contains the kinetic terms for all fermion fields for all three generations:

$$\mathcal{L}_f = \sum_{m=1}^{3} \left( \bar{q}^0_{mL}\, i\gamma^\mu D_\mu q^0_{mL} + \bar{u}^0_{mR}\, i\gamma^\mu D_\mu u^0_{mR} + \bar{d}^0_{mR}\, i\gamma^\mu D_\mu d^0_{mR} + \bar{\ell}^0_{mL}\, i\gamma^\mu D_\mu \ell^0_{mL} + \bar{e}^0_{mR}\, i\gamma^\mu D_\mu e^0_{mR} \right). \quad (2.14)$$

Higgs sector

The Higgs sector introduces two complex scalar Higgs fields

$$\phi = \begin{pmatrix} \phi^+ \\ \phi^0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} \varphi_1 + i\varphi_2 \\ \varphi_3 + i\varphi_4 \end{pmatrix}, \quad (2.15)$$

transforming as a doublet under $SU(2)_L$ with weak hypercharge $Y_H = 1/2$. The gauge covariant derivative acting on this doublet is given by

$$D_\mu \phi = \left( \partial_\mu + \frac{ig}{2}\, \vec{\tau} \cdot \vec{W}_\mu + i g' Y_H B_\mu \right) \phi, \quad (2.16)$$

consistent with the behavior on left-chiral fermion doublets.
This covariant derivative introduces three- and four-point interactions between the gauge bosons $W^i, B$ and the Higgs field in $\mathcal{L}_\phi$.

The Lagrangian density of the Higgs sector is given by

$$\mathcal{L}_\phi = (D_\mu \phi)^\dagger D^\mu \phi - V(\phi) \quad (2.17)$$

with a potential term

$$V(\phi) = \mu^2 \phi^\dagger \phi + \lambda \left(\phi^\dagger \phi\right)^2. \quad (2.18)$$

The first term determines the Higgs boson mass after EWSB, and the second term describes quartic Higgs-field self-interactions. A requirement of $\lambda > 0$ guarantees that there is a lower bound on the potential.

Yukawa term

The last term in the EW Lagrangian density describes Yukawa interactions. It has the form

$$\mathcal{L}_{\text{Yuk}} = -\sum_{m,n=1}^{3} \left[ \Gamma^u_{mn}\, \bar{q}^0_{mL} \tilde{\phi}\, u^0_{nR} + \Gamma^d_{mn}\, \bar{q}^0_{mL} \phi\, d^0_{nR} + \Gamma^e_{mn}\, \bar{\ell}^0_{mL} \phi\, e^0_{nR} \right] + \text{h.c.}, \quad (2.19)$$

with a Hermitian conjugate term given by h.c.. The field $\tilde{\phi}$ is a conjugate of the Higgs doublet $\phi$, defined as $\tilde{\phi} = i\sigma^2 \phi^\dagger = \left(\phi^{0\dagger}\ {-\phi^-}\right)^T$. The $3 \times 3$ matrices $\Gamma^u, \Gamma^d, \Gamma^e$ determine the Yukawa couplings and fermion masses after EWSB.

2.3.2 Electroweak symmetry breaking

During EWSB, the scalar $\phi^0$ field acquires a vacuum expectation value and the $SU(2)_L \times U(1)_Y$ gauge group is broken down to $U(1)_Q$. In the SM, this takes place via the Brout-Englert-Higgs mechanism [19–24]. This mechanism generates masses for the gauge bosons $W^\pm$ and $Z$, and fermion masses via Yukawa couplings. Out of the four degrees of freedom in the complex doublet $\phi$, three get assigned to the $W^\pm$ and $Z$, and the last degree of freedom is the scalar Higgs boson.

The vacuum expectation value (VEV) of the scalar doublet $\phi$ is given by its lowest energy state $\langle 0|\phi|0\rangle$, which minimizes the potential defined in equation (2.18). For $\mu^2 > 0$, this minimum is located at

$$\phi^\dagger \phi = 0, \quad (2.20)$$

where all real scalar fields $\varphi_i$ have zero VEV, since $\phi^\dagger \phi = \frac{1}{2} \sum_{i=1}^{4} \varphi_i^2$. This case is visualized in figure 2.4 on the left, projected onto the complex field $\phi^0$. There is a single stable global minimum at the origin, preserving the $SU(2)_L \times U(1)_Y$ symmetry.

When $\mu^2 < 0$, a hypersphere of degenerate stable global minima is located at

$$\phi^\dagger \phi = -\frac{\mu^2}{2\lambda} \equiv \frac{\nu^2}{2} > 0, \quad (2.21)$$

which defines $\nu \equiv \sqrt{-\mu^2/\lambda}$.
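The location of this minimum can be verified numerically. The values of $\mu^2$ and $\lambda$ below are arbitrary illustrative choices (with $\mu^2 < 0$ and $\lambda > 0$), not fitted SM parameters:

```python
import numpy as np

# Potential written in terms of r = phi^dagger phi: V(r) = mu2 * r + lam * r**2.
mu2, lam = -1.0, 0.5   # arbitrary illustrative values with mu2 < 0, lam > 0

r = np.linspace(0.0, 4.0, 100_001)
V = mu2 * r + lam * r**2

r_min = r[np.argmin(V)]          # numerical location of the minimum
r_expected = -mu2 / (2.0 * lam)  # analytic result: phi^dagger phi = nu^2 / 2
```

With these values the numerical minimum sits at $\phi^\dagger\phi = 1$, matching $-\mu^2/(2\lambda)$, and the potential there is negative, i.e. below the symmetric point at the origin.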
The choice of a specific ground state breaks the degeneracy, spontaneously breaking rotational symmetry. Figure 2.4 shows this configuration on the right, again projected onto the complex field $\phi^0$. The degenerate minima are located along a circle, drawn as a dotted red line. A suitable axis rotation aligns the real component of $\phi^0$ with the ground state chosen, such that $\langle 0|\varphi_3|0\rangle = \nu$ and $\langle 0|\varphi_1|0\rangle = \langle 0|\varphi_2|0\rangle = \langle 0|\varphi_4|0\rangle = 0$. The ground state in these coordinates is given by

$$\langle 0|\phi|0\rangle \equiv v = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ \nu \end{pmatrix}, \quad (2.22)$$

with $v^\dagger v = \nu^2/2$. Only the neutral field component $\phi^0$ receives a non-zero VEV, resulting in a $U(1)_Q$ symmetry after EWSB. Written in unitary gauge, the expansion around this minimum is given by

$$\phi \to \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ \nu + H \end{pmatrix} \quad (2.23)$$

with a Hermitian scalar field $H$ representing the Higgs boson.

Figure 2.4: Higgs potential, visualized as a function of the complex field $\phi^0$. The unbroken case with $\mu^2 > 0$ is shown on the left, with a ground state $\varphi_1 = \varphi_2 = 0$. The potential for $\mu^2 < 0$ is shown on the right, where the circle of global minima is drawn with a dashed red line. Rotational symmetry is spontaneously broken when a ground state along this circle is chosen.

2.3.3 Lagrangian density after electroweak symmetry breaking

The Lagrangian density in equation (2.9) can be rewritten after EWSB in unitary gauge to study the structure of the SM.

Gauge and Higgs sectors

It is convenient to define linear combinations of the $SU(2)_L \times U(1)_Y$ gauge bosons as

$$W^\pm_\mu = \frac{W^1_\mu \mp i W^2_\mu}{\sqrt{2}}, \quad (2.24)$$

and

$$Z_\mu = -\sin\theta_W B_\mu + \cos\theta_W W^3_\mu, \quad (2.25)$$

where $\sin\theta_W = g'/g_Z$ and $\cos\theta_W = g/g_Z$ with $g_Z = \sqrt{g^2 + g'^2}$. This defines the weak angle $\theta_W \equiv \arctan g'/g$. The remaining linear combination of the $B_\mu$ and $W^3_\mu$ gauge bosons is the photon, given by

$$A_\mu = \cos\theta_W B_\mu + \sin\theta_W W^3_\mu. \quad (2.26)$$

The Lagrangian density for the Higgs sector after EWSB is obtained by inserting the expression (2.23) into equation (2.17), using the definitions from equation (2.24) and equation (2.25).
It is given by

$$\mathcal{L}_\phi = \frac{g^2 \nu^2}{4} W^+_\mu W^{-\,\mu} \left(1 + \frac{H}{\nu}\right)^2 + \frac{1}{2} \frac{g_Z^2 \nu^2}{4} Z_\mu Z^\mu \left(1 + \frac{H}{\nu}\right)^2 + \frac{1}{2} (\partial_\mu H)(\partial^\mu H) - V(H). \quad (2.27)$$

This Lagrangian density describes two charged gauge bosons $W^\pm$ and a neutral boson $Z$, with tree-level masses

$$M_W = \frac{g\nu}{2}, \qquad M_Z = \frac{g_Z \nu}{2}, \quad (2.28)$$

while the photon remains massless. The masses of the $Z$ boson and the $W^\pm$ bosons are related via $M_W/M_Z = \cos\theta_W$, alternatively written as $\sin^2\theta_W = 1 - (M_W/M_Z)^2$.

Figure 2.5: Three-point interactions between Higgs boson and $W^+W^-$ bosons (left), between Higgs boson and two $Z$ bosons (right).

The Lagrangian density in equation (2.27) introduces three- and four-point interactions between the $W^\pm$ and $Z$ bosons with the Higgs boson $H$. The corresponding interaction strengths are proportional to the square of the respective gauge boson masses. The three-point interactions $HW^+W^-$ and $HZZ$ are shown in figure 2.5, with interaction strengths proportional to the squares of the $W^\pm$ and $Z$ boson masses. Three- and four-point Higgs boson self-interactions are included in the potential term, which is given in unitary gauge by

$$V(H) = -\frac{\mu^4}{4\lambda} - \mu^2 H^2 + \lambda\nu H^3 + \frac{\lambda}{4} H^4. \quad (2.29)$$

The Higgs boson tree-level mass appears in the second term as

$$M_H = \sqrt{-2\mu^2} = \sqrt{2\lambda}\, \nu, \quad (2.30)$$

and is related to its quartic coupling.

The gauge interactions from $\mathcal{L}_{\text{gauge}}$ in equation (2.11) result in three- and four-point interactions of the gauge bosons after EWSB.

Quantum electrodynamics

The theory of QED is described by the Lagrangian density

$$\mathcal{L}_{\text{QED}} = \bar{\psi}(i\gamma^\mu \partial_\mu - m)\psi - q A_\mu \bar{\psi}\gamma^\mu\psi - \frac{1}{4} F_{\mu\nu} F^{\mu\nu}, \quad (2.31)$$

for a fermion field $\psi$ with mass $m$ and electric charge $q$, and field strength tensor $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$. When comparing the fermion Lagrangian density $\mathcal{L}_f$ after EWSB to $\mathcal{L}_{\text{QED}}$, the positron electric charge can be identified as

$$e = g \sin\theta_W. \quad (2.32)$$

It is fully determined by the $SU(2)_L$ and $U(1)_Y$ coupling constants.

Fermions

Fermion interactions with the Higgs field are obtained by inserting the expression (2.23) into equation (2.19).
The resulting mass matrices $M^u_{mn} = \Gamma^u_{mn} \nu / \sqrt{2}$ and Yukawa couplings to the Higgs field $\frac{1}{\nu} M^u_{mn}$ need to be diagonalized to obtain the fermion mass eigenstates. The Lagrangian density describing fermion mass eigenstates $\psi_f$, where $f = u_m, d_m, e_m$ ($m = 1, 2, 3$ is the fermion generation), is
\[
\mathcal{L}_\psi = \sum_{f = u, d, e} \bar{\psi}_f \left[ i \gamma^\mu \partial_\mu - m_f \left( 1 + \frac{H}{\nu} \right) \right] \psi_f, \tag{2.33}
\]
with $\psi_f = \psi_{fL} + \psi_{fR}$. The kinetic term for the fermions originates from equation (2.14). Fermions couple to the Higgs field with a coupling strength proportional to their masses, visualized in figure 2.6.

Figure 2.6: Higgs boson coupling to fermions; the coupling strength is proportional to the fermion mass $m_f$.

The Yukawa coupling strength of fermions is defined as
\[
y_f = \frac{\sqrt{2}\, m_f}{\nu}. \tag{2.34}
\]
The relation between the fermion mass eigenstates and weak eigenstates is given by the $3 \times 3$ unitary Cabibbo–Kobayashi–Maskawa (CKM) matrix $V_{\text{CKM}}$,
\[
\begin{pmatrix} d_0 \\ s_0 \\ b_0 \end{pmatrix} = V_{\text{CKM}} \begin{pmatrix} d \\ s \\ b \end{pmatrix} = \begin{pmatrix} V_{ud} & V_{us} & V_{ub} \\ V_{cd} & V_{cs} & V_{cb} \\ V_{td} & V_{ts} & V_{tb} \end{pmatrix} \begin{pmatrix} d \\ s \\ b \end{pmatrix}, \tag{2.35}
\]
with $V^\dagger_{\text{CKM}} V_{\text{CKM}} = I_3$, the $3 \times 3$ identity matrix. The off-diagonal elements cause flavor-changing weak charged-current interactions of quarks, with transition probabilities proportional to the squared magnitude of the matrix elements, $|V_{mn}|^2$.

2.4 Success and limitations of the Standard Model

The SM has 19 free parameters:

• nine fermion Yukawa couplings for the three charged leptons and six quarks,
• three coupling constants for gauge interaction strengths: $g_S$, $g$, $g'$ (alternatively parameterized via the strong coupling $\alpha_s$, the QED coupling $\alpha$, and the Fermi coupling constant $G_F$),
• two parameters describing the Higgs potential: $\nu$, $m_H$,
• four parameters of the CKM matrix (three mixing angles and one CP-violating phase),
• one strong CP phase $\theta_{CP}$, which would lead to CP violation in strong interactions and is usually taken as zero.
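Equation (2.34) fixes each Yukawa coupling once the corresponding fermion mass is known. A short numerical sketch, using approximate fermion masses as assumed inputs (not quoted in the text), shows why the top quark Yukawa coupling is close to unity while all other couplings are much smaller:

```python
import math

v = 246.22  # EW scale in GeV (assumed input)

# Approximate fermion masses in GeV (assumed inputs, not from the text)
masses = {"top": 172.8, "bottom": 4.18, "tau": 1.777, "muon": 0.1057}

# Equation (2.34): y_f = sqrt(2) * m_f / v
yukawas = {f: math.sqrt(2) * m / v for f, m in masses.items()}
for f, y in yukawas.items():
    print(f"y_{f} = {y:.4f}")
# The top quark stands out: y_top is approximately 1,
# while e.g. y_bottom is only about 0.024.
```

The hierarchy spans several orders of magnitude between the top quark and the lighter fermions, which is one of the unexplained patterns discussed in section 2.4.1.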
Figure 2.7: Summary of ATLAS measurements of total production cross-sections of various SM processes at $\sqrt{s} = 7$, 8, and 13 TeV, compared to predictions of the SM (status: July 2018) [25].

The SM can be extended with seven additional parameters to describe neutrino oscillations and mass generation via couplings to the Higgs field (for Dirac neutrinos). The Pontecorvo–Maki–Nakagawa–Sakata (PMNS) matrix relates the neutrino mass eigenstates to weak eigenstates, parameterized with four parameters. On top of this, three new Yukawa couplings are needed, bringing the total number of free parameters in the SM to 26. Alternative mechanisms for neutrino mass generation exist, and the experimental determination of the mechanism responsible for neutrino masses in nature is an active area of research.

Using just these 19 degrees of freedom, the predictions of the SM are in agreement with decades of experimental tests. It is a highly successful theory of nature. Figure 2.7 shows cross-section measurements for a range of processes, measured with the ATLAS detector. They are compared to SM predictions, obtained from calculations performed at least at NLO. The measurements are in great agreement with these predictions, for processes with cross-sections varying over many orders of magnitude.

2.4.1 Open questions

Despite all this success, the SM is not the final theory of particle physics. A number of outstanding issues are listed in this section [8, 9].
Dark matter

When assuming that the mass of galaxies is mostly comprised of luminous stars, the tangential velocity $v$ of stars with mass $m$, located at a radius $r$ from the center of the galaxy, is given by
\[
\frac{m v^2}{r} \approx \frac{G m}{r^2} M(r), \tag{2.36}
\]
with gravitational constant $G$. The mass contained within radius $r$ is given by $M(r)$. For galaxies like the Milky Way, most of the luminous mass is in the center. The observed distribution $M(r)$ decreases more slowly with radius than expected, indicating that a significant amount of non-luminous dark matter (DM) contributes gravitationally.

Gravitational lensing results from the bullet cluster [26], measuring the deflection of light due to the mass in this galaxy cluster, provide additional support for the existence of DM. Furthermore, when fitting the cosmological standard model $\Lambda$CDM to the cosmic microwave background power spectrum, the density of baryonic matter makes up only roughly one sixth of the total matter density in the universe [27]. Alternative explanations of these results, which propose modified versions of gravity, are increasingly challenged by the precision of the experimental observations. A detailed historical perspective is provided in reference [28].

Many theories of beyond the Standard Model (BSM) physics predict candidates for DM, and a broad range of experiments is searching for them. Once DM particles are observed and confirmed, their description will require an appropriately modified new SM.

Baryon asymmetry

The baryon asymmetry in the universe describes the observed excess of baryonic matter compared to antimatter. Its origin is unclear, but the SM might require additional sources of CP violation to explain the visible universe today.

Neutrinos

The discovery of neutrino oscillations between different flavors established their non-zero masses, and current data can be described assuming neutrino mixing between all three flavors [10]. Several open questions remain to be answered about the nature of neutrinos.
Neutrinos may be Dirac or Majorana fermions, the latter implying that they are equivalent to their antiparticles and can violate lepton number conservation. A Majorana nature could be confirmed by the observation of neutrinoless double $\beta$ decays ($nn \to p p e^- e^-$). The absolute scale of neutrino masses and the mechanism by which they are generated are currently unknown. Neutrino masses may be acquired via Yukawa couplings to the Higgs field, which introduces sterile right-chiral neutrinos that do not couple to the $W^\pm$ and $Z$ bosons. A range of other proposed mechanisms exists.

Figure 2.8: Loop corrections to the Higgs boson mass via three-point couplings to fermions $f$, vector bosons $V$, self-interactions, and a new massive particle $X$. Contributions from quartic interactions are not shown.

Vacuum stability and top quark Yukawa coupling

The evolution of the effective Higgs self-coupling $\lambda$ is given by the renormalization group equation. It predicts that large values of the top quark Yukawa coupling $y_t$ drive the self-coupling to negative values at high energies. This results in the appearance of a new EW vacuum, with a lower potential than the vacuum after EWSB, which is described in section 2.3.2. For very large values of $y_t$, the lifetime of the current vacuum is smaller than the lifetime of the universe. Data indicate that the current EW vacuum in the SM is meta-stable, with a lifetime larger than the lifetime of the universe [29]. A precise measurement of $y_t$ is an important check of the SM validity in the cosmological context, and can answer the question whether any BSM phenomena are needed for consistency below the Planck scale, which is the energy scale at which gravitational effects are expected to play a significant role [30].

Theory considerations

The SM is unable to provide explanations for why it takes the form described in this chapter.
There are three generations of fermions, with the second and third generations playing seemingly no large role in nature. It is unclear why the fermion masses, free parameters for which the SM makes no prediction, vary over many orders of magnitude. The CP-violating strong phase $\theta_{QCD}$ is known to be very small, with no explanation for why this is the case.

The hierarchy problem arises when embedding the SM into another, more complete, theory of nature. Such a theory is characterized by an energy scale $\Lambda$ above the EW scale $\nu \approx 246$ GeV. If the SM is valid up to this scale, then the Higgs boson mass receives loop-level corrections to the tree-level expression from equation (2.30). The contributions from three-point interactions are visualized in figure 2.8. In the SM, they are dominated by the top quark in the fermion loop. These corrections to the mass are proportional to $\Lambda$, the scale at which the integrals over the momenta of the particles contributing to the loops are cut off. When taking this scale to be the Planck scale $\approx 10^{19}$ GeV, the Higgs boson mass should receive corrections that make it many orders of magnitude larger than its measured value. These corrections could be cancelled by additional couplings of the Higgs field to a new field $X$. The coupling strength of such interactions needs to be fine-tuned to precisely cancel out the contributions from SM couplings. An increasingly severe fine-tuning is needed for larger values of $\Lambda$. According to the naturalness paradigm, such large cancellations between free parameters of a theory should not occur [31].

Table 2.2: Branching ratios for the decay of $t\bar{t}$ [10].

Final state      Branching ratio
all-hadronic     45.7%
single-lepton    43.8%
dilepton         10.5%

2.5 Implications for physics at the Large Hadron Collider

This section describes the physics of top quarks, Higgs bosons, and their interplay, in collisions at the LHC.

2.5.1 Top quark

With a mass of around 173 GeV, the top quark $t$ is the heaviest elementary particle of the SM [10].
This allows for decays into $W$ bosons and $b$ quarks, which happen before the top quark can form hadrons. Decays into other down-type quarks are in principle possible but, due to the structure of the CKM matrix with $|V_{tb}| \approx 1$, extremely rare and negligible in practice. With its large mass, the top quark has a Yukawa coupling value of approximately unity.

Top quarks are predominantly produced in pairs, as a $t\bar{t}$ system, at hadron colliders. Table 2.2 shows the branching ratios of the $t\bar{t}$ system into three different final states. The all-hadronic final state contains decays of both $W$ bosons into a quark–antiquark pair each. This is the dominant decay mode. A slightly smaller fraction of events decays into the single-lepton final state, where one of the $W$ bosons decays into a quark–antiquark pair, while the other one decays into a charged lepton and a neutrino. The dilepton final state describes events where both $W$ bosons decay leptonically. Table 2.2 includes tau leptons in the definition of leptons. In the context of physics analyses, leptonic decays often instead refer to decays to light charged leptons only. Light charged leptons are electrons and muons, including those from decays of tau leptons. Decays of tau leptons to hadronic final states are treated differently experimentally, as described in section 4.3.3.

2.5.2 Higgs boson

The detailed investigation of the EWSB process is one of the major physics goals of the LHC. In 2012, both the ATLAS and CMS collaborations published their independent discoveries of a new boson [32, 33], consistent with a SM Higgs boson. Subsequent studies have confirmed the SM nature of the observed particle, and are probing its properties in detail. With the central role played by the Higgs boson in the SM, detailed studies are both an important check of the validity of the SM, and can at the same time reveal possible hints of BSM physics phenomena.
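The $t\bar{t}$ branching ratios quoted in table 2.2 follow directly from the $W$ boson decay fractions, since the two $W$ bosons decay independently. A minimal numerical cross-check, assuming a hadronic $W$ branching ratio of $\mathrm{BR}(W \to q\bar{q}') \approx 0.675$ (an approximate input not quoted in the text):

```python
# Assumed input: hadronic W branching ratio (approximate value)
br_w_had = 0.675
br_w_lep = 1.0 - br_w_had  # leptonic (e, mu, tau) branching ratio

# The two W bosons in the ttbar system decay independently
br_all_hadronic = br_w_had ** 2
br_single_lepton = 2 * br_w_had * br_w_lep
br_dilepton = br_w_lep ** 2

print(f"all-hadronic:  {br_all_hadronic:.1%}")   # close to the 45.7% in table 2.2
print(f"single-lepton: {br_single_lepton:.1%}")  # close to 43.8%
print(f"dilepton:      {br_dilepton:.1%}")       # close to 10.5%
assert abs(br_all_hadronic + br_single_lepton + br_dilepton - 1.0) < 1e-12
```

Small differences with respect to table 2.2 stem from the precise value used for the $W$ decay fractions.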
Figure 2.9: Exemplary Feynman diagrams for the gluon–gluon fusion (top left), vector boson fusion (top right), $VH$ (bottom left), and $t\bar{t}H$ (bottom right) processes.

Figure 2.10: Dominant processes for Higgs boson production with associated cross-sections in proton–proton collisions for $M(H) = 125$ GeV, shown as a function of COM energy. Bands indicate theoretical uncertainties in the cross-section calculation [34].

Higgs boson production

There are four major Higgs boson production modes accessible in proton–proton collisions at the LHC [34]. Figure 2.9 shows exemplary LO Feynman diagrams for these four modes, while the respective cross-sections as a function of the center-of-mass (COM) energy are presented in figure 2.10 for a Higgs boson mass of 125 GeV.

The loop-induced gluon–gluon fusion is the dominant production mode. Due to its large Yukawa coupling, the largest contributions to the loop in the SM come from virtual top quarks. The contributions from lighter quarks $q$ are suppressed proportionally to the squares of their masses $m_q$, by factors $m_q^2 / m_t^2$ for top quark mass $m_t$. While this production mode has a large cross-section, it lacks distinctive objects produced in the final state alongside the Higgs boson, which would help identify events produced via this production mode experimentally.

The production mode with the second largest cross-section is vector boson fusion, sometimes also more precisely called weak boson fusion. It is initiated by the scattering of two quarks or antiquarks via exchange of a $W^\pm$ or $Z$ boson. The weak boson radiates off a Higgs boson.
Experimentally, this production mode is characterized by two high-momentum jets (see section 4.4), emitted at small angles from the colliding protons.

Higgs boson production in association with a $W^\pm$ or $Z$ vector boson, $VH$, is the production mode with the third highest cross-section. It is characterized by the Drell–Yan production of an off-shell weak boson, which radiates off a Higgs boson. The weak boson in the final state can be used experimentally to identify events originating from this process.

The associated production of Higgs bosons with a top quark pair, $t\bar{t}H$, is the focus of this dissertation. In this process, the Higgs boson is radiated off a top quark pair. This top quark pair can be produced from either gluon–gluon or quark–antiquark interactions. The relevance of this process is discussed in section 2.5.3. In order to concisely specify a topology where the Higgs boson is produced via $t\bar{t}H$ and decays to a bottom quark pair, the term $t\bar{t}H(b\bar{b})$ will be used in this dissertation. Chapter 6 and chapter 9 describe analyses of this topology.

Additional Higgs boson production modes remain experimentally challenging. Higgs boson production in association with a bottom quark pair, $b\bar{b}H$, has a cross-section comparable to that of $t\bar{t}H$ at a COM energy of $\sqrt{s} = 13$ TeV. The bottom quark pair signature is not as easily identifiable experimentally as the top quark pair signature of $t\bar{t}H$, thus making the isolation of $b\bar{b}H$ from background processes difficult. Higgs boson production in association with a single top quark, $tH$, is sensitive to the sign of the top quark Yukawa coupling and affected by destructive interference between dominant processes contributing to its production. The cross-section for this process is an order of magnitude smaller than that of $t\bar{t}H$, making the process more difficult to observe. Lastly, double Higgs boson production is sensitive to the Higgs boson cubic self-coupling.
Due to its very low cross-section in the SM, the observation of this production mode requires future runs of the LHC, in order to collect a large amount of integrated luminosity.

Higgs boson decay

The Higgs boson decays into a wealth of experimentally accessible final states. Its branching ratios in the SM for a mass of 125 GeV are shown in table 2.3.

Table 2.3: Branching ratios for the decay of the Higgs boson. The other category contains experimentally challenging final states [34].

Final state         Branching ratio
$b\bar{b}$          58.2%
$WW^*$              21.4%
$\tau\tau$          6.3%
$ZZ^*$              2.6%
$\gamma\gamma$      0.2%
other               11.3%

The $b\bar{b}$ final state is the most common. Decays into gauge vector bosons are suppressed, since one of the bosons must be produced off-shell. The decays into a di-photon final state $\gamma\gamma$ are loop-induced; this loop is dominated by contributions from virtual top quarks in the SM. Higgs boson decays into $ZZ^*$ and subsequently four charged leptons, as well as decays into $\gamma\gamma$, are comparatively rare. Due to their clean experimental final-state signature, these decay modes nevertheless made the dominant contributions to the Higgs boson discovery.

The other category includes decays into pairs of gluons (with a branching ratio of 8.2%), charm quarks, $Z\gamma$, muons, and other processes with very small cross-sections. These are challenging to measure experimentally. While the branching ratio for the decay to a pair of muons is only 0.02%, a search performed by the ATLAS collaboration with 79.8 fb$^{-1}$ of data excludes values larger than roughly twice this prediction at the 95% confidence level, assuming the SM production cross-section [35]. An ATLAS search for the $Z\gamma$ final state, which has a branching ratio of 0.15%, excludes values larger than roughly seven times the SM prediction using 36.1 fb$^{-1}$ of data [36].

2.5.3 Yukawa couplings and the special role of $t\bar{t}H$

The $t\bar{t}H$ process assumes a special role in the SM, as it allows for the direct tree-level measurement of the top quark Yukawa coupling $y_t$.
The relevance of this parameter is highlighted in section 2.4.1. When assuming that the SM is correct, $y_t$ can be obtained in various ways. It is related to the top quark mass via equation (2.34). Furthermore, it can be measured from the gluon–gluon fusion Higgs boson production and the loop-induced Higgs boson decay to di-photon final states, since the top quark contribution to the loop dominates. The measurement from these loop-induced processes relies on the assumption that no BSM particles contribute to the loops. In a combined fit to Run-2 data of up to 79.8 fb$^{-1}$ recorded by the ATLAS experiment, the uncertainty on $y_t$ is around 10%, with a central value consistent with the SM prediction [37]. In this fit, the coupling strengths of the Higgs boson to weak gauge bosons, third generation quarks, tau leptons, and muons are all measured simultaneously. This fit assumes no BSM particles coupling to the Higgs boson.

A direct measurement of $y_t$ via $t\bar{t}H$ is an important confirmation of the Brout–Englert–Higgs mechanism and of the SM validity. It tests a fundamental type of interaction in the SM, the Yukawa interaction between the Higgs boson and fermions. The ATLAS $t\bar{t}H(b\bar{b})$ analysis presented in chapter 6 contributed to the observation of both the $t\bar{t}H$ process [3] and the Higgs boson decay to bottom quark pairs [38]. These processes were independently observed by the CMS collaboration [39, 40], and both CMS and ATLAS observed the Higgs boson coupling to $\tau$ leptons in 2017 and 2018, respectively [41, 42]. These results establish the observation of Yukawa interactions. The observation was achieved by analyzing the interactions of Higgs bosons with third generation fermions. The measurement of the interactions with the first and second generation fermions remains a challenge for the future [43].
3 The Large Hadron Collider and the ATLAS experiment

The LHC [44] is a hadron accelerator, delivering proton–proton collisions and also supporting collisions of protons with heavy ions or just heavy ions. The decay products of the interactions taking place are recorded by a range of experiments. The four major experiments include two general-purpose experiments, ATLAS [45] and CMS [46]. LHCb [47] specializes in physics with $b$-hadrons, and ALICE [48] in heavy ion collisions.

This chapter describes the experimental facilities relevant to the work presented in this dissertation. The LHC is briefly introduced in section 3.1, and relevant details about the ATLAS experiment are given in section 3.2.

3.1 The Large Hadron Collider

The LHC is located at the border between Switzerland and France, close to the city of Geneva [44]. It forms a part of the CERN accelerator complex, and serves as the final stage in a chain of accelerators, designed to provide collisions between proton beams at COM energies of $\sqrt{s} = 14$ TeV with instantaneous luminosities of $L = 10^{34}$ cm$^{-2}$s$^{-1}$. The LHC is installed in a 26.7 km long tunnel, which lies between 45 m and 170 m below the surface. Beams of charged particles are accelerated in opposite directions within two rings in a vacuum system, using electromagnetic fields in radio frequency cavities operating at 400 MHz. Superconducting magnets deflect the beams via the Lorentz force, keeping them on track within the rings. The beams are focused and collided at four points, with the four experiments ATLAS, CMS, LHCb and ALICE situated at these collision points. Besides proton–proton collisions, the LHC also supports collisions of heavy ions, as well as heavy ions and protons. As those operation modes are not relevant for this dissertation, they are not discussed further.

3.1.1 Accelerator chain

Before proton beams are circulated in the LHC, a range of other accelerators gradually accelerates the beams to increasingly higher energies.
The CERN accelerator complex is pictured in figure 3.1.

Figure 3.1: The CERN accelerator complex relevant for proton–proton collisions in the LHC. Gray arrowheads indicate the proton path. BOOSTER refers to the Proton Synchrotron Booster, PS is the Proton Synchrotron, and SPS is the Super Proton Synchrotron. The figure is adapted from reference [49].

Hydrogen gas is used as a source of protons, with their electrons stripped off by an electric field. The protons are initially accelerated in the linear accelerator Linac2 to energies of 50 MeV. Subsequently, the Proton Synchrotron Booster accelerates them to 1.4 GeV, followed by further acceleration in the Proton Synchrotron to 25 GeV. The last step before entering the LHC is additional acceleration in the Super Proton Synchrotron to energies of 450 GeV. It takes a minimum of approximately 16 minutes to fill the LHC with pre-accelerated proton bunches. Within around 20 minutes after the filling, the LHC accelerates the beams to their target collision energies.

3.1.2 Luminosity and pile-up

The number of events produced by the LHC is a function of the instantaneous luminosity $L$ delivered by the machine over time $t$,
\[
N_{\text{events}} = \sigma_{\text{event}} \int L \, dt = \sigma_{\text{event}} \mathcal{L}, \tag{3.1}
\]
and is proportional to the relevant cross-section $\sigma_{\text{event}}$ for producing such events. The time integral over the instantaneous luminosity is called the integrated luminosity $\mathcal{L}$. The instantaneous luminosity is given by [9]
\[
L = \frac{f n_1 n_2}{4 \pi \sigma_x \sigma_y} F. \tag{3.2}
\]
With the nominal LHC spacing between proton bunches of 25 ns, the collision frequency is $f = 40$ MHz. The number of protons per bunch is $n_1$ and $n_2$ for the two beams, with up to $10^{11}$ protons per bunch. Not all bunches are filled with protons in practice. The bunches have root mean square extensions $\sigma_x$ and $\sigma_y$ in the two directions perpendicular to the beam propagation direction.
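A rough numerical sketch of equation (3.2), with illustrative nominal parameters as assumed inputs (bunch intensities of $1.1 \times 10^{11}$ protons, transverse beam sizes of 16 µm, and a geometric factor $F \approx 0.85$, none of which are quoted in the text), reproduces an instantaneous luminosity of the design order $10^{34}$ cm$^{-2}$s$^{-1}$:

```python
import math

# Assumed nominal LHC parameters (illustrative values, not from the text)
f = 40e6                   # collision frequency in Hz (25 ns bunch spacing)
n1 = n2 = 1.1e11           # protons per bunch
sigma_x = sigma_y = 16e-4  # transverse RMS beam size in cm (16 micrometers)
F = 0.85                   # geometric reduction factor from the crossing angle

# Equation (3.2)
L = f * n1 * n2 / (4 * math.pi * sigma_x * sigma_y) * F
print(f"L = {L:.2e} cm^-2 s^-1")  # of order 1e34
```

The result is sensitive mostly to the bunch intensity (entering quadratically) and the transverse beam sizes at the interaction point.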
Collisions at the LHC are not exactly head-on, and the factor $F$ contains a description of the geometric effects due to the crossing angle between the beams at the interaction point.

Due to the large number of protons per bunch at the LHC, each bunch crossing usually results in more than one hard scattering interaction. Interactions besides the interaction of interest are called in-time pile-up. Out-of-time pile-up is caused by proton–proton interactions taking place in neighboring bunch crossings around the crossing of interest, which can affect the measurement, since the readout times for detector systems can be longer than the time between two bunches. The number of interactions per bunch crossing is Poisson-distributed, with a mean $\mu$ proportional to the product of the total inelastic proton–proton cross-section $\sigma_{\text{inel.}}$ and the instantaneous luminosity [50],
\[
\mu = \frac{L \sigma_{\text{inel.}}}{f}. \tag{3.3}
\]

3.1.3 Dataset

The data-taking period with LHC collisions of protons at energies of $\sqrt{s} = 7$ TeV and $\sqrt{s} = 8$ TeV is called Run-1. Subsequently, Run-2 lasted from 2015–2018, with collision energies of $\sqrt{s} = 13$ TeV. The dataset collected in Run-2 is analyzed in this dissertation.

Figure 3.2: Distribution of the mean number of interactions per bunch crossing in data recorded by the ATLAS experiment at $\sqrt{s} = 13$ TeV, corresponding to a total integrated luminosity of 146.9 fb$^{-1}$; the average values are $\langle\mu\rangle = 13.4$ (2015), 25.1 (2016), 37.8 (2017), and 36.1 (2018), with $\langle\mu\rangle = 33.7$ for the total dataset [51].

The mean number of interactions per bunch crossing is shown in figure 3.2 for the data recorded by the ATLAS detector in Run-2 of the LHC. Different colors show the distribution per data-taking year, with the 2015 contribution in yellow, 2016 in orange, 2017 in purple and 2018 in green. The blue distribution corresponds to the total dataset integrated over all four data-taking years.
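The scale of these pile-up values follows directly from equation (3.3). A minimal estimate, assuming an instantaneous luminosity of $10^{34}$ cm$^{-2}$s$^{-1}$ and an inelastic proton–proton cross-section of roughly 80 mb at $\sqrt{s} = 13$ TeV (both assumed inputs, not quoted in the text), lands in the range of $\langle\mu\rangle$ values seen in figure 3.2:

```python
# Assumed inputs (illustrative values, not from the text)
L = 1.0e34           # instantaneous luminosity in cm^-2 s^-1
sigma_inel = 80e-27  # inelastic pp cross-section: 80 mb in cm^2
f = 40e6             # bunch crossing frequency in Hz

mu = L * sigma_inel / f  # equation (3.3)
print(f"mean interactions per crossing: mu = {mu:.0f}")  # prints 20
```

Doubling the instantaneous luminosity, as achieved towards the end of Run-2, doubles the mean pile-up accordingly.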
The average number of interactions per bunch crossing per year is shown in the legend as $\langle\mu\rangle$. This quantity, and hence the pile-up, increases over the course of Run-2. The increase is caused by the increased instantaneous luminosity provided by the LHC, which reaches a plateau in 2017 and 2018.

Figure 3.3: The ATLAS coordinate system, with ATLAS located at the origin. The $x$ axis points towards the center of the LHC ring (indicated as a dotted line), and the $y$ axis up towards the surface. Beams propagate along the $z$ axis. The azimuthal angle $\phi$ and polar angle $\theta$ are also shown for an arbitrary point $P$.

3.2 The ATLAS detector

The ATLAS detector is a general-purpose detector at the LHC [45]. A broad range of high energy particle physics analyses are conducted with ATLAS, including measurements of SM properties and searches for hints of BSM physics. In order to meet the requirements of these analyses, the ATLAS detector is designed with fast electronics, high granularity, and good object reconstruction efficiency and resolution.

3.2.1 Coordinate system

The right-handed coordinate system used to describe ATLAS is visualized in figure 3.3. Its origin is located at the nominal beam interaction point. The $z$-axis is aligned with the beam axis, while the $x$–$y$ plane is perpendicular to it. The $x$-axis points towards the center of the LHC ring, which is indicated in the figure as a dotted line. The $y$-axis is directed up towards the surface. Transverse quantities are defined in the $x$–$y$ plane. The transverse momentum $p_T$ of an object is the momentum component in the $x$–$y$ plane, $p_T = \sqrt{p_x^2 + p_y^2}$. The azimuthal angle $\phi$ is measured around the beam axis, while the polar angle $\theta$ is measured from the beam axis. A common alternative parametrization of the polar angle is given by the pseudorapidity $\eta = -\ln[\tan(\theta/2)]$, which approaches infinity as the polar angle decreases to zero. Differences in pseudorapidity, $\Delta\eta$, are Lorentz invariant under boosts along the beam axis.
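The invariance of $\Delta\eta$ under longitudinal boosts can be verified numerically. The sketch below (with arbitrary example momenta as assumed inputs) computes the pseudorapidity of two massless particles before and after a boost along the $z$ axis, and checks that $\Delta\eta$ is unchanged even though the individual $\eta$ values shift:

```python
import math

def eta_massless(pt, pz):
    """Pseudorapidity of a massless particle from transverse and longitudinal momentum."""
    return math.asinh(pz / pt)

def boost_z(e, pz, beta):
    """Boost energy and longitudinal momentum along the z axis with velocity beta."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (e - beta * pz), gamma * (pz - beta * e)

# Two massless example particles, given as (pt, eta) -- arbitrary assumed values
particles = [(50.0, 1.0), (30.0, -0.5)]
beta = 0.6  # boost velocity along z

etas_before, etas_after = [], []
for pt, eta in particles:
    pz = pt * math.sinh(eta)
    e = pt * math.cosh(eta)  # massless: E = |p|
    _, pz_boosted = boost_z(e, pz, beta)
    etas_before.append(eta)
    etas_after.append(eta_massless(pt, pz_boosted))

d_eta_before = etas_before[0] - etas_before[1]
d_eta_after = etas_after[0] - etas_after[1]
assert abs(d_eta_before - d_eta_after) < 1e-9  # delta eta is boost-invariant
# Each individual eta shifts by the same amount, -atanh(beta), for massless particles.
```

For massive particles this exact invariance holds for the rapidity $y$ rather than $\eta$, which motivates the definitions that follow.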
The rapidity $y = \frac{1}{2} \ln\left[\frac{E + p_z}{E - p_z}\right]$, typically used for massive objects, is equivalent to the pseudorapidity in the limit of negligible object mass, $m \ll E$. A common measure of the distance between objects is defined as $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$. This separation can also be defined using rapidity instead of pseudorapidity, as $\Delta R_y = \sqrt{\Delta y^2 + \Delta\phi^2}$.

Figure 3.4: The complete ATLAS detector in cutaway view [45].

3.2.2 Detector overview

A schematic of the ATLAS detector is shown in figure 3.4. The full detector is 44 m long and 25 m high, roughly rotationally symmetric around the LHC beam pipe, and has a weight of around 7000 metric tons. It is centered around the interaction point of the colliding LHC beams, and symmetric under reflection across the $z = 0$ plane. The central barrel region of the detector consists of multiple detector subsystems arranged as concentric cylinders. A disk-shaped end-cap is located at each side of the barrel.

The inner detector (ID) is located at the core of ATLAS, and embedded within a 2 T axial magnetic field provided by a solenoid magnet. It provides tracking of charged particles within the central $|\eta| < 2.5$ region. Electromagnetic and hadronic calorimeters surround the inner detector. The outermost parts of ATLAS are formed by the muon spectrometer (MS), which is immersed in a magnetic field provided by toroid magnets in the barrel and end-caps.

3.2.3 Inner Detector

The ATLAS ID system is designed to provide precision tracking information for charged particles up to $|\eta| < 2.5$. It is located within a 2 T magnetic field generated by a solenoid magnet with a 2.5 m diameter and a length of 5.3 m in the $z$ direction. This axial field is parallel to the $z$-axis, bending the tracks of charged particles in the $\phi$ direction. The deflection allows for measurements of the momentum and charge of these particles. The precision information provided by the ID is also used to reconstruct the primary vertex, the measured location at which the hard parton scattering of interest in a given event
took place.

Figure 3.5: Cutaway view of the ATLAS ID. The IBL is missing in this visualization [45].

Figure 3.5 illustrates the ID layout. Closest to the beam pipe is the pixel detector, followed by the semiconductor tracker (SCT) and the transition radiation tracker (TRT). Precision tracking information within $|\eta| < 2.5$ is provided by the pixels and the SCT. They are arranged as concentric cylinders centered around the beam pipe in the barrel, and as disks in both end-caps.

Pixel detectors

The innermost subsystem of the ID consists of silicon pixel detectors. Charged particles passing through them are detected via the electron–hole pairs they create in the semiconductors. There are four layers in the pixel system. The innermost layer, called the insertable B-layer (IBL), was installed between Run-1 and Run-2 of the LHC [52]. It has the highest granularity, with a pixel size of 50 µm in the $\phi$ direction and 250 µm in the $z$ direction. This layer is located at a radius $r = 33$ mm from the beam pipe center, and covers the $|\eta| < 3.0$ region. The pixels in the remaining three layers have sizes of 50 µm in the $\phi$ direction and 400 µm in the $z$ direction. In total, the pixel detector system contains around 86 million pixels. The expected hit resolution in the IBL is 8 µm in the $\phi$ direction and 40 µm in the $z$ direction. The remaining three pixel layers have a decreased resolution of 10 µm in $\phi$ and 115 µm in the $z$ direction.

Semiconductor tracker

The SCT consists of silicon strip detectors, with strips of size 80 µm × 12 cm. It is made up of four layers in the barrel, and two disks in each end-cap. In each barrel layer, one set of strips is parallel to the $z$ axis, and a second set of strips is rotated by a stereo angle of 40 mrad with respect to the first set. The end-cap disks have one set of strips in the radial direction perpendicular to the $z$ axis, with another set again rotated by 40 mrad. The resolution in the SCT is 17 µm in the $\phi$ direction, and 580 µm in the $z$ direction.
The SCT contains around 6 million readout channels.

Transition radiation tracker

The outermost part of the ID is the TRT. It consists of around 300 000 straw tubes with a diameter of 4 mm in the region $|\eta| < 2.0$. In the barrel region, the straws are 144 cm long and parallel to the $z$ axis. The straws in the end-caps are 37 cm long, and positioned radially, perpendicular to the $z$ axis. A wire runs through the center of the straws, and they are filled with gas. The wire is held at a potential difference with respect to the tube walls. Charged particles passing through the tubes ionize the gas, and the resulting electrons drift to the anode wire. The resolution in the $\phi$ direction per straw is 130 µm. Besides its tracking capabilities, the TRT is also used for particle identification. The straws are interleaved with polypropylene, and transition radiation is emitted at the material boundaries. Electrons can be distinguished from charged pions due to the larger amount of transition radiation they leave behind.

3.2.4 Calorimeters

The solenoid magnet containing the ATLAS ID is surrounded by a calorimeter system. Electromagnetic and hadronic calorimeters provide energy measurements of particles passing through them, covering the range up to $|\eta| < 4.9$. The system is designed to absorb the energy of most SM particles originating from the collision, with the exception of muons and neutrinos. Figure 3.6 shows the ATLAS calorimeter system.

Electromagnetic calorimeter

The electromagnetic calorimeter consists of a barrel and two end-cap components, located in separate cryostats and covering the range $|\eta| < 3.2$. It is a sampling calorimeter, consisting of alternating layers of lead absorber plates and liquid argon as an active medium. Electromagnetic showers develop mostly in the absorbing material, and are measured in the active medium.
Lead plates and electrodes are arranged in an accordion shape, with a potential difference applied between them to collect charges from ionization left by particles passing through the calorimeter. The granularity varies across the calorimeter, and it is highest for |η| < 2.5, to match the region where precision tracking information from the ID is available. The highest granularity is 0.025 × 0.025 in η × φ. In total, the electromagnetic calorimeter has around 170 000 readout channels.

Figure 3.6: Cutaway view of the ATLAS calorimeter system surrounding the ID and solenoid magnet [45]. The term LAr refers to liquid argon as active material.

The electromagnetic calorimeter provides energy measurements of electrons and photons. Electrons at the LHC lose most of their energy via bremsstrahlung and subsequent electron–positron pair production. For photons, electron–positron pair production is the dominant energy loss process. The radiation length X0, which is a function of the atomic number and mass number of the material through which the particles are passing, characterizes the distance scale over which the energy losses take place. One radiation length is the average distance over which bremsstrahlung reduces the electron energy by a factor of 1/e, and corresponds to roughly 7/9 of the mean free path for photon-induced pair production [9]. The electromagnetic calorimeter has a thickness of more than 22 X0 in the barrel, and more than 24 X0 in the end-caps, to provide good containment of electromagnetic showers originating from bremsstrahlung and pair production.

Hadronic calorimeters

The hadronic calorimeter system consists of three components, with a total of around 19 000 readout channels. The tile calorimeter covers the central region |η| < 1.7, with a barrel in the region |η| < 1.0 and two extended barrels covering 0.8 < |η| < 1.7. It is a sampling calorimeter made up of steel as an absorption material and scintillating plastic tiles as active material.
The scintillating tiles are read out via photomultiplier tubes. The tile calorimeter has three layers, with the highest granularity of 0.1 × 0.1 in η × φ in the first two layers.

The end-cap hadronic calorimeters are located directly outside the electromagnetic calorimeters. Copper plates are used as an absorbing material, while the active medium is liquid argon. They cover the range 1.5 < |η| < 3.2, with the highest granularity of 0.1 × 0.1 in η × φ in the region |η| < 2.5.

Figure 3.7: Cutaway view of the ATLAS MS [45].

The forward calorimeter covers the range 3.2 < |η| < 4.9 and is made up of three layers. The first layer uses copper as absorbing material, optimized for electromagnetic measurements. The remaining layers use tungsten for hadronic measurements. Tubes, aligned parallel to the z direction, are located within the absorbing material. They contain small gaps of less than 1 mm, filled with liquid argon as active material, and a rod in the center.

Charged hadrons lose energy via ionization of the surrounding material, and both charged and neutral hadrons also lose energy by undergoing strong interactions with nuclei in the surrounding medium. The nuclear interaction length λI describes the mean distance between hadronic interactions for relativistic hadrons. The hadronic calorimeter has a thickness of around 10 λI, and therefore contains the majority of the energy of hadronic showers. These showers also contain electromagnetic components, for example from the decay of neutral pions into two photons.

3.2.5 Muon spectrometer

The MS forms the outermost layer of ATLAS. Four different detector systems make up the MS, with more than one million readout channels in total. A cutaway view is shown in figure 3.7. The MS is embedded in a magnetic field produced by a system of three superconducting toroidal magnets, one in the barrel, and one in each end-cap.
The magnets provide an average magnetic field of around 0.5 T across the MS, pointing in the φ direction and therefore generally perpendicular to the muon propagation direction. This field deflects muons passing through the MS in the η (or equivalently z) direction, allowing for a measurement of their momenta.

Figure 3.8: Schematic of one quarter of a cross-section through the ATLAS detector [53].

Figure 3.8 shows a schematic view of a cross-section of a quarter of the detector. The outermost chambers in the barrel region are located at a radius of around 10 m from the beam pipe, and the outermost end-cap disks at |z| ≈ 21.5 m. The muon system is not completely symmetric under rotation in the φ direction due to gaps needed for detector services and support structure (feet).

Muon trigger chambers

Two detector systems allowing for fast readout are used to make initial trigger decisions (the trigger system is described in section 3.2.6). Different experimental conditions in the barrel and end-cap regions motivate the use of two detector technologies: a higher granularity is needed in the end-caps to match the momentum resolution in the barrel, and the radiation levels in the end-caps are higher.

Three layers of resistive plate chambers (RPCs) are used in the barrel region |η| < 1.05. They consist of parallel plates with high resistivity held at a potential difference, with a gas mixture in the gap between them. Muons ionize the gas, and the resulting charges are collected on the plates.
Besides trigger information, the RPCs provide η and φ measurements, with a resolution of around 10 mm in both the z direction and the plane tangential to the φ direction.

The end-caps use thin gap chambers (TGCs) in the region 1.05 < |η| < 2.4, which deal with the increased rate requirements due to non-collision background processes. The chambers of these detectors are formed by graphite-coated cathodes, filled with a gas mixture, and contain multiple wires separated by 1.8 mm. The TGCs also provide a measurement of the φ coordinate, with a resolution of around 5 mm.

Precision muon tracking chambers

Two additional detector systems provide high position resolution and precision tracking information. These systems are slower, and only read out after an initial trigger decision has been made.

A system of monitored drift tubes (MDTs) covers the range |η| < 2.7. It is made of aluminum drift tubes with a diameter of 3 cm, filled with a gas mixture. The drift tubes contain a wire in the center, which is held at a potential difference with the tube. Muons passing through the tubes ionize the gas, and the resulting electrons are collected at the central wire. The electron drift time can reach up to 700 ns, and the length of the signal pulse indicates how far from the wire the muon passed through the tube. The drift tubes are aligned tangentially to the φ direction to achieve high position accuracy in the z direction. MDT chambers consist of three to eight layers of drift tubes, achieving an average position resolution of 35 µm per chamber in the z direction. The measurement in the φ direction is obtained from the remaining MS systems.

The forward region 2.0 < |η| < 2.7 contains cathode strip chambers (CSCs). The CSCs are multi-wire proportional chambers like the TGCs. They are used for their high rate capability and time resolution of 4 ns. The anode wires are arranged in the radial direction, with a spacing of 2.5 mm.
On one end, the cathodes form strips perpendicular to the wires, providing the precision measurement in the radial direction. The strips on the cathodes at the other end are parallel to the wires, and provide a measurement in the φ direction. The position resolution is 40 µm in the radial direction and 5 mm in φ.

3.2.6 Trigger and data acquisition

It is impossible to record all collision events in ATLAS at the 40 MHz collision rate provided by the LHC. The ATLAS trigger and data acquisition system reduces this input down to a rate of around 1 kHz of events, which are recorded and kept for subsequent physics analyses [54, 55]. It consists of two components, the hardware-based Level-1 (L1) trigger and a software-based high-level trigger (HLT). Events accepted at the fast initial L1 step are passed on to the HLT, which runs more precise algorithms and makes a final trigger decision. Pre-scale factors N can be applied to triggers, so that only 1 in N events passing a trigger selection is accepted. The use of such factors limits the trigger rate, making it possible to trigger on objects with less stringent requirements, which are produced more often by the LHC. Triggers used for physics analyses are typically not pre-scaled. A wide range of triggers, summarized in a trigger menu, is used during ATLAS data-taking. They include triggers for events with one or multiple characteristic objects (such as charged leptons with high transverse momenta), combinations of different object types, or triggers for particular event topologies.

Level-1

The L1 trigger step searches for events with high transverse momentum objects, including charged leptons, photons, jets, and missing transverse momentum (see chapter 4). It uses data from the calorimeters and MS, and defines coarse regions of interest (RoIs) in which the relevant objects are located, for further processing at the subsequent steps. The L1 trigger reduces the event rate to around 100 kHz, with a 2.5 µs maximum latency.
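The overall rate reduction and the effect of a pre-scale factor can be illustrated with a short sketch, using the approximate rates quoted above (illustrative code only, not part of the ATLAS trigger software; the function name is hypothetical):

```python
# Illustrative only: trigger rate reduction and pre-scaling, using the
# approximate rates quoted in the text (40 MHz -> 100 kHz -> 1 kHz).

def prescale_accept(event_index: int, n: int) -> bool:
    """Pre-scale factor N: accept only every N-th event passing the trigger."""
    return event_index % n == 0

collision_rate = 40e6  # LHC collision rate (Hz)
l1_rate = 100e3        # rate after the L1 trigger (Hz)
hlt_rate = 1e3         # rate recorded after the HLT (Hz)

l1_rejection = collision_rate / l1_rate        # factor of 400 at L1
hlt_rejection = l1_rate / hlt_rate             # factor of 100 at the HLT
overall_rejection = collision_rate / hlt_rate  # factor of 40 000 overall

# A pre-scale factor of N = 100 keeps 1 in 100 events passing the trigger.
n_accepted = sum(prescale_accept(i, 100) for i in range(1_000_000))
```

With these numbers, only one collision event in 40 000 is ultimately kept for analysis, which is why triggers with looser requirements can only be afforded at a pre-scaled rate.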
The minimum duration between two consecutive events accepted by the L1 trigger is limited, as is the number of events accepted over a given period of time. Events passing this trigger step are buffered for further processing in the HLT.

High-level trigger

After the incoming event rate has been reduced by the L1 trigger system, the software-based HLT system can run more precise and computationally expensive reconstruction algorithms to refine the trigger decision. These reconstruction algorithms are generally run in the RoIs defined at the L1 step. Events passing the HLT criteria are transferred to local storage at the ATLAS site, and then sent to CERN's computing center for storage and subsequent processing. In 2015, the average HLT processing time for events at the highest instantaneous luminosity reached was around 200 ms.

Muon trigger system

Given its relevance for the trigger efficiency analyses presented in chapter 10 of this dissertation, the muon trigger system is described in more detail in this section. This system is designed to identify events containing muons over a large spectrum of transverse momentum, with high efficiencies and moderate trigger rates.

The L1 step uses only inputs from the fast muon trigger chambers, the RPCs and TGCs. The system requires a hit coincidence in the trigger chambers, which points back to the beam interaction region. A rough estimate of the muon candidate transverse momentum is obtained at this step by comparing its track to one expected from a muon with infinite transverse momentum. The L1 step defines RoIs of size 0.1 × 0.1 in η × φ, which are passed to the HLT for further processing. Limited mostly due to the effect of detector geometry on the muon trigger chamber distribution, the L1 trigger covers around 80% of the barrel region and 99% of the end-caps.

The trigger decision is refined at the HLT step by incorporating higher resolution and precision tracking information from the MDT and CSC detector systems.
A coincidence of generally 2–3 hits in multiple detector layers is required. This step has an efficiency of close to 100% with respect to L1, and reduces the trigger rate by a factor of roughly 100.

The HLT step consists of two stages: an initial fast reconstruction step is followed by precision muon reconstruction. Fast track reconstruction is performed in the RoIs defined at L1, using only information from the MS, with measurements from the MDTs and CSCs. A refined track fit is subsequently performed by combining measurements from the ID and the MS. If candidates pass this stage, they enter the precision HLT step.

High resolution muon reconstruction takes place in the precision step, with inputs from both the ID and MS. Track candidates in the MS are extrapolated back and combined with ID information (called outside-in). A second approach, which starts with ID information and then extrapolates and combines it with the MS (inside-out), recovers inefficiencies at low muon transverse momentum.

Following this step, optional requirements can be applied. The use of muon isolation cuts helps distinguish between prompt and non-prompt muons. Prompt muons originate from the initial hard scatter process taking place, while non-prompt muons can arise from the decay of charged hadrons. The muon isolation requirement rejects events in which the scalar sum of the transverse momenta of all tracks in a cone around the muon candidate is large. This sum is expected to be small for prompt muons. Applying muon isolation requirements makes it possible to lower the muon trigger transverse momentum threshold, while maintaining a reasonable trigger rate.

Additional HLT algorithms exist besides the baseline strategy described. There are triggers that use only MS information and do not require information from the ID. In order to circumvent the trigger efficiency loss in the L1 step, the full-scan approach does not rely on L1 RoIs, but searches the entire MS for additional muons.
While this is very computationally expensive, it provides a high trigger efficiency. It is suitable for finding additional muons for multi-muon triggers, when one muon has already been found with the baseline approach.

3.2.7 Data quality requirements and available data for analyses

Collision events occurring within ATLAS can only be used in physics analyses if they passed a trigger and were recorded. Additional requirements are made to ensure high quality data. These requirements include that all detector subsystems were operational, and that the beams provided by the LHC were stable when the event was recorded. Each event must furthermore contain a vertex (see section 4.2.2) with at least two associated tracks with transverse momenta pT > 400 MeV.

Figure 3.9 shows the total integrated luminosity during Run-2 as a function of time. The green histogram corresponds to the total amount of data delivered by the LHC, amounting to 156 fb−1. Around 94% of this was recorded by the ATLAS detector, a total of 147 fb−1, shown in yellow. Additional requirements on the quality of the reconstructed data further reduce the integrated luminosity available for physics analyses, and a total of 139.0 fb−1 remains to be analyzed in the full Run-2 ATLAS dataset. This contribution is shown in blue.

3.2.8 Simulation of ATLAS

The simulation of the ATLAS detector is based on the GEANT4 software toolkit [56] and implemented within the ATLAS simulation framework [57]. It models the interactions of stable particles with the detector, and the resulting signals received by the detector. The analog detector signals are digitized and combined with pile-up effects, simulating both in-time and out-of-time pile-up. Trigger algorithms are simulated, and the events are further reconstructed with the same algorithms used for data, described in chapter 4. An alternative faster simulation method, called AFII, is also used.
An alternative faster simulation method, called AFII, is also used.It uses a parameterized calorimeter response to electromagnetic and hadronic showers, instead ofsimulating it in detail [58].34Month in YearJan '15Jul '15Jan '16Jul '16Jan '17Jul '17Jan '18Jul '18-1fbTotal Integrated Luminosity 020406080100120140160ATLASPreliminaryLHC DeliveredATLAS RecordedGood for Physics = 13 TeVs-1 fbDelivered: 156-1 fbRecorded: 147-1 fbPhysics: 1392/19 calibrationFigure 3.9: Total integrated luminosity collected by the LHC (green), recorded by the ATLAS detector(yellow) and used for physics analyses (blue), shown as a function of time [51].354. Object reconstructionThis chapter provides an overview of the reconstruction of physics objects with the ATLAS describedin section 3.2, with a focus on the objects of relevance to this dissertation. Basic components used inthe identification of particles, tracks and vertices, are described in section 4.2. Section 4.3 outlinesthe reconstruction of charged leptons, with a focus onmuons and electrons. The approach used toreconstruct jets is described in section 4.4. This section also includes a description of flavor taggingalgorithms, which are employed to identify jets containing b-hadrons. The definition of missingtransverse energy is given in section 4.5. The section ends with a description of the procedure used toresolve the overlap between the various physics objects reconstructed. This is described in section 4.6.4.1 Reconstruction overviewA schematic depiction of different fundamental particles interacting with the ATLAS detector isshown in figure 4.1 [59]. Muons pass through the whole detector, and are reconstructed mainlyFigure 4.1: Schematic of fundamental particles interacting with the ATLAS detector, adapted fromreference [59]. It shows a section of the x–y plane.364. Object reconstructionfrom their interactions with the ATLAS ID and MS system. 
Since they do not carry electric charge, photons do not interact with the ID, and form a collimated shower in the electromagnetic calorimeter. Electrons cause the same kind of electromagnetic shower, but additionally interact with the ID. Electrically charged protons also interact with the ID, and then deposit their energy in the hadronic calorimeter, in a shower that is less confined than electromagnetic showers. Neutrons cause the same hadronic shower, but do not interact with the ID due to their lack of electric charge. The ATLAS detector is unable to detect neutrinos directly, but their presence may be inferred by considering momentum conservation. Missing transverse energy, described in section 4.5, can be caused by neutrinos in the SM.

4.2 Tracks, vertices and energy clusters

Charged particles passing through the ATLAS ID leave behind tracks. Primary vertices are located at proton–proton interaction points. The energy deposited by particles in the ATLAS calorimeter system is grouped into clusters. Tracks, vertices, and calorimeter energy clusters are inputs to the reconstruction of other physics objects discussed in this chapter.

4.2.1 Tracks

The reconstructed trajectories of charged particles passing through the ATLAS detector are called tracks. This section provides an overview of track reconstruction with ATLAS. More detailed descriptions of the algorithms used and their performance are provided in references [60, 61].

Sensor measurements above a threshold in the pixel and SCT detectors are grouped into clusters. The resulting three-dimensional measurements of the cluster positions are called space points.
A cluster is typically composed of multiple contributing pixels, and the intersection point of a charged particle with the detector layer is obtained by combining the information of all pixels in the cluster. Clusters can contain charge deposits from multiple particles, and they may also be used in the reconstruction of multiple tracks.

The baseline inside-out tracking algorithm starts by defining sets of three space points, used as seeds for tracks. These seeds are combined with additional space points compatible with the preliminary track trajectory estimated from the three space points, thereby forming track candidates. This is done using a Kalman filter, which iteratively updates the best track candidate estimate as more space points are added. A score for each track candidate is calculated, with higher scores assigned to candidates more likely to represent the trajectories of charged particles. An ambiguity solver then considers track candidates in order of decreasing score. It limits the number of clusters shared by different tracks that are not identified as compatible with having originated from multiple particles. Further quality criteria are applied in this step, and track candidates failing to pass the ambiguity solver are rejected. These criteria include that at least seven clusters are assigned to a track, and that it contains at most two holes. Holes are defined as intersections of a track candidate with a detector element that do not contain a matching cluster, even though one would be expected from the trajectory of a particle following the track candidate. Tracks are then extended into the TRT.

An additional outside-in algorithm starts with track segments reconstructed in the TRT, and extends them towards the inside of the detector, adding information from the silicon detectors.

4.2.2 Vertices

Vertices are locations of particle interactions, and are identified via the tracks pointing away from them.
Of particular relevance for measurements of particle kinematics are vertices at the points where proton–proton interactions took place. Their reconstruction in ATLAS is briefly described here. Additional details are provided in reference [62].

Vertex reconstruction consists of two steps. In the first step, reconstructed tracks are associated to candidate vertices; this is called vertex finding. During the subsequent vertex fitting step, the vertex position is reconstructed.

The vertex finding starts with a seed position for a candidate vertex. The optimal vertex candidate position is then updated by an iterative fitting procedure, using the tracks found previously. Each track is assigned a weight, which describes its compatibility with the candidate vertex position. Both track weights and vertex candidate position are updated throughout the iterative fit. All tracks that are incompatible with the vertex after the last iteration are not assigned to the vertex, and are instead used in the determination of subsequent vertices. New iterations of the vertex finding algorithm are then performed on the tracks not yet assigned to any vertex; this is repeated until no new vertex can be found. The primary vertex of a collision event is defined as the vertex with the largest sum of squared transverse momenta of tracks associated to it. While other definitions exist, this definition is used in the results presented in this dissertation.

4.2.3 Energy clusters

Energy deposits in individual cells of the ATLAS calorimeter system are clustered together, forming three-dimensional topological cell clusters, also called topo-clusters. The procedure employed by ATLAS is described in detail in reference [63].

Topo-clusters are built starting from seeds. These are located at cells with high signal significance, which is given by the ratio of the signal to the expected average noise in each specific cell. Neighboring cells are iteratively added to the cluster if they have sufficient signal significance.
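A minimal sketch of this seed-and-grow procedure, reduced to one dimension with illustrative significance thresholds (the thresholds and the three-dimensional neighbor definition actually used by ATLAS are specified in reference [63]):

```python
# Minimal 1-D sketch of topo-cluster formation (illustrative only; the real
# ATLAS algorithm works on three-dimensional calorimeter cells).

def grow_cluster(signals, noise, seed_thr=4.0, grow_thr=2.0):
    """Seed clusters at high-significance cells and add significant neighbors."""
    significance = [abs(s) / noise for s in signals]
    clusters = []
    assigned = set()
    # consider potential seed cells in order of decreasing significance
    seeds = sorted(range(len(signals)), key=lambda i: -significance[i])
    for seed in seeds:
        if significance[seed] < seed_thr or seed in assigned:
            continue
        cluster = {seed}
        frontier = [seed]
        while frontier:  # add neighboring cells with sufficient significance
            cell = frontier.pop()
            for nb in (cell - 1, cell + 1):
                if 0 <= nb < len(signals) and nb not in cluster and nb not in assigned:
                    if significance[nb] >= grow_thr:
                        cluster.add(nb)
                        frontier.append(nb)
        assigned |= cluster
        clusters.append(sorted(cluster))
    return clusters

# one shower spread over neighboring cells, surrounded by noise-level cells
print(grow_cluster([0.5, 0.1, 9.0, 3.0, 2.5, 0.2], noise=1.0))  # [[2, 3, 4]]
```

Only the cells around the high-significance seed end up in the cluster; the isolated low-significance cells are suppressed as noise.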
This process continues until no significant cells are left to be added to the clusters. Cells with insignificant amounts of signal are not included in the clusters, thereby suppressing noise. A topo-cluster does not necessarily contain all of the energy deposited by a particle, especially in hadronic showers, which are less confined than electromagnetic showers. It may contain the full shower or only a fraction of it, or even calorimeter responses to multiple particles.

The ATLAS calorimeters are non-compensating, meaning that the signal of a hadron with the same energy as an electron or photon is smaller. Signals are measured using the electromagnetic energy scale, which correctly measures energies deposited by electrons and photons. Topo-clusters can be calibrated to correct for the non-compensating nature of the calorimeters and additional effects, such as energy losses in inactive material. They can be interpreted as massless pseudo-particles, characterized by their energy and location in the η and φ coordinates.

4.3 Leptons

This section describes the reconstruction of charged leptons with the ATLAS detector. The focus lies on electrons and muons, which are most relevant to the work presented in this dissertation. Tau leptons are only briefly described, as the tt̄H(bb̄) analyses in chapter 6 and chapter 9 contain a veto for events with hadronically decaying tau leptons, to ensure that the events analyzed do not overlap with other searches for tt̄H.
Due to the special treatment of tau leptons, and the inability of the ATLAS detector to detect neutrinos directly, the term lepton is used in this dissertation to describe a reconstructed light charged lepton (electron or muon).

4.3.1 Muons

Muons are reconstructed mostly using information from the ATLAS ID and MS detector systems. Details about the relevant algorithms and their performance are given in reference [64].

Reconstruction

The first step of muon reconstruction takes place independently in both the ID and the MS. Tracks in the ID are reconstructed as described in section 4.2.1.

In the MS, segments of tracks are formed by combining nearby hits, which follow the trajectory expected from a muon, and are consistent with having originated from the proton–proton interaction point. Segments from different detector layers are combined, first using segments from the middle of the MS as seeds, and then extending the seeds also to the inner and outer layers. At least two segments are generally required to form a track candidate, and segments can be used in multiple track candidates. A χ2-based fit to the hits associated to each track candidate is then performed to determine whether a track candidate is accepted as a track.

Multiple different muon types exist, depending on the detector information used to reconstruct them. Combined muons are built from a combined track fit in the ID and MS. Most muons are found with an outside-in approach, which extrapolates the track from the MS to the ID. An inside-out algorithm is also used. Extrapolated muons are based only on a track in the MS, which is required to be consistent with having originated from the interaction point. These muons extend the muon acceptance in the region 2.5 < |η| < 2.7, which is not covered by the ID. Segment-tagged muons are built from tracks in the ID and extrapolated to match at least one track segment in the MS. This recovers muons crossing only one MS layer.
Calorimeter-tagged muons use energy deposits in the calorimeter compatible with having been deposited by a minimum-ionizing particle. These must be matched to an ID track. This muon type recovers inefficiencies in regions of gaps in the MS.

Identification

During muon identification, further quality requirements are applied to muon candidates to suppress non-muon backgrounds (mostly originating from decays of charged hadrons) and to ensure a good muon momentum measurement. Non-prompt muons from decays of charged hadrons can lead to incompatible momentum measurements in the ID and MS, and the comparison of individual and combined fits helps reject such muons.

The muon identification criteria are collected in so-called operating points. Of relevance to this dissertation are the loose, medium, and high-pT operating points.

The medium operating point is optimized to minimize systematic uncertainties related to muon reconstruction and calibration. It uses combined and extrapolated muon tracks. For combined muons, at least three hits in at least two layers of the MDT are required. The requirement is relaxed in the central |η| < 0.1 region due to a gap in the MS. Extrapolated muons need to have hits in at least three MDT or CSC layers. A consistency check of the momentum measurement for combined muons in the ID and MS is also applied to reject non-prompt muons. The reconstruction efficiency for muons with transverse momentum above 20 GeV when using this operating point is 96%.

The loose operating point includes all muons passing the medium operating point requirements, but extends the acceptance further. It is designed for analyses searching for Higgs boson decays to four charged leptons, and increases the reconstruction efficiency to 98% for muons with transverse momentum above 20 GeV.
This operating point includes segment- and calorimeter-tagged muons in the |η| < 0.1 region.

The high-pT operating point is optimized for the momentum resolution of tracks with transverse momenta above 100 GeV. It uses combined muons fulfilling the medium operating point requirements. On top of this, tracks need to have at least three hits in at least three MS detector layers. This improves the transverse momentum resolution of muons with very high momenta above 1.5 TeV by around 30%, while reducing the reconstruction efficiency by roughly 20%.

Isolation

Prompt muons are typically produced spatially well separated from other particles. Muon isolation, which refers to detector activity present around a muon candidate, can therefore be used to reject non-prompt muons. The decay of objects at high momenta, resulting in collimated decay products which can include muons, presents one possible exception. Two measures of isolation are used, one track-based and one based on calorimeter information. The track-based isolation is calculated as the scalar sum of transverse momenta of all tracks with pT > 1 GeV within a cone around the muon, excluding the momentum of the muon track itself. The cone size depends on the muon transverse momentum pT and decreases for higher momentum muons: ∆R = min(10 GeV/pT, 0.3). The calorimeter-based isolation is obtained by summing the transverse energies of topo-clusters in a cone of size ∆R = 0.2 around the muon, again excluding the energy deposit from the muon itself, and also correcting for effects from pile-up. Isolation criteria are defined via the ratio of track-based or calorimeter-based isolation to the muon transverse momentum.
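As an illustration, the track-based isolation variable just defined can be computed as follows (a sketch, not the ATLAS implementation; the simple dictionary representation of muons and tracks is an assumption made for the example):

```python
# Illustrative sketch of track-based muon isolation: a pT-dependent cone
# size and the scalar pT sum of nearby tracks, divided by the muon pT.
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance ∆R between two directions, with φ wrap-around."""
    dphi = math.atan2(math.sin(phi1 - phi2), math.cos(phi1 - phi2))
    return math.hypot(eta1 - eta2, dphi)

def track_isolation(muon, tracks):
    """Scalar sum of track pT (GeV) in a cone of size min(10 GeV/pT, 0.3)."""
    cone = min(10.0 / muon["pt"], 0.3)
    return sum(
        t["pt"]
        for t in tracks
        if t["pt"] > 1.0  # only tracks with pT > 1 GeV are considered
        and t is not muon["track"]  # exclude the muon track itself
        and delta_r(muon["eta"], muon["phi"], t["eta"], t["phi"]) < cone
    )

# a 50 GeV muon: cone size min(10/50, 0.3) = 0.2
mu_track = {"pt": 50.0, "eta": 0.1, "phi": 0.0}
muon = {"pt": 50.0, "eta": 0.1, "phi": 0.0, "track": mu_track}
tracks = [mu_track,
          {"pt": 3.0, "eta": 0.15, "phi": 0.05},  # inside the cone
          {"pt": 8.0, "eta": 1.0, "phi": 1.0},    # outside the cone
          {"pt": 0.5, "eta": 0.1, "phi": 0.02}]   # below the pT threshold
iso = track_isolation(muon, tracks) / muon["pt"]  # relative isolation
```

Only the 3 GeV track inside the cone contributes, giving a relative isolation of 3/50 = 0.06; the operating points discussed below place requirements on ratios of this kind.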
The isolation operating points used in this dissertation are loose, gradient, FCTight and FCTTO. The loose, gradient, and FCTight (which stands for "fixed cut tight") operating points use both track- and calorimeter-based isolation; the FCTTO (which stands for "fixed cut tight track-only") operating point only uses track-based isolation. The loose operating point provides a constant 99% efficiency for muons across η and pT, while the gradient operating point is at least 90% efficient at muon transverse momenta of 25 GeV and 99% efficient at 60 GeV. The FCTight and FCTTO operating points are designed to be robust to pile-up effects.

Corrections

In order to match data more accurately, corrections to the muon momentum scale and resolution are applied to simulated muons. An additional correction is applied to resolve differences in muon selection efficiencies between data and simulation. These selection efficiencies originate from the association of tracks to vertices, muon identification, and muon isolation. They also include trigger efficiencies when muon triggers are used. All of these corrections are derived from studies making use of the clean decays of Z bosons and J/ψ mesons to muon pairs. The trigger efficiency corrections are described in more detail in chapter 10, where they are derived for additional event topologies.

4.3.2 Electrons

Electrons are reconstructed using the tracks they leave in the ID, and their energy deposited in the electromagnetic calorimeter. A detailed overview of reconstruction algorithms and their performance can be found in reference [65].

Reconstruction

Electrons lose their energy predominantly due to bremsstrahlung and subsequent electron–positron pair production from emitted photons. The energy from an electron is typically deposited within a single cluster in the electromagnetic calorimeter.
Interactions of electrons with detector material can already happen before they enter the calorimeter, resulting in radiated photons converting into electron–positron pairs (photon conversions). This can result in multiple tracks being reconstructed in the ID, all originating from the same electron, and all pointing to the same cluster in the calorimeter. Electrons are reconstructed in the |η| < 2.47 region of ATLAS, not including the overlap region between the barrel and end-caps.

Electron reconstruction starts by finding energy clusters in the electromagnetic calorimeter with transverse energies above 2.5 GeV. This procedure is more than 99% efficient for electrons with transverse momenta above 15 GeV. The track reconstruction in the ID, described in section 4.2.1, is more than 98% efficient for electron transverse momenta above 10 GeV. Tracks are electron candidates if they geometrically match an energy cluster, have at least four hits in silicon layers of the ID, and are not associated to a vertex that has been identified as originating from a photon conversion [66]. An algorithm selects the track most likely originating from the primary electron in case multiple tracks match the criteria. The energy cluster size is then extended and calibrated to accurately represent the original electron energy. The electron position in the η and φ coordinates is obtained from the track matching the cluster.

Identification

Likelihood-based identification algorithms are applied to electron candidates in order to ensure high quality electrons for analysis and to suppress non-prompt electron backgrounds, which include photon conversions, jets, and non-prompt electrons from heavy-flavor quark decays.
The likelihood is calculated from a range of measurements, including the shower shape in the electromagnetic calorimeter, the energy deposited in the hadronic calorimeter, details about the track matched to the electron and the matching itself, as well as the transition radiation in the TRT to discriminate against pions. The probability distribution functions of the inputs to the algorithm are derived from simulation, and corrected to accurately model data. Electron identification proceeds by requiring that a likelihood-based discriminant, which increases in value for more electron-like candidates, has at least a specific minimum value, which depends on the operating point used.

The electron identification operating points relevant to the work presented in this dissertation are tight, medium, and LooseAndBLayer. The tight operating point is 80% efficient at identifying prompt electrons with transverse energies of 40 GeV, while the medium operating point is 88% efficient. The LooseAndBLayer operating point is a variation of the 93% efficient loose operating point. It includes a requirement of a hit in the innermost pixel layer, which the tight operating point does as well. Further tracking requirements are applied for all operating points. These include at least two hits across all pixel layers and at least seven hits in pixel and SCT layers combined.

Isolation

Like muons, prompt electrons are generally expected to be spatially separated from other particles, and correspondingly the detector activity associated to them is expected to be isolated. Electron isolation is calculated similarly to muon isolation, with track-based and calorimeter-based isolation variables. Track-based isolation is calculated from the scalar sum of transverse momenta of tracks with pT > 1 GeV around the electron. The track matched to the electron is not included, and neither are nearby tracks likely having originated from photon conversions.
The cone has a variable size depending on the electron transverse momentum $p_T^e$, with $\Delta R = \min\left(10\,\text{GeV}/p_T^e,\, 0.2\right)$. The calorimeter-based isolation is given by the sum over transverse energies of topo-clusters in a cone of size ∆R = 0.2 around the electron, with the electron energy deposit at the core removed and additional corrections for pile-up applied. Isolation criteria are defined via the ratio of track-based or calorimeter-based isolation to the electron transverse momentum.

The isolation operating points used in this dissertation are loose and gradient. These operating points use both track- and calorimeter-based isolation. The loose operating point is 98% efficient, while the gradient operating point is 90% and 99% efficient for electrons with transverse momenta of 25 GeV and 60 GeV, respectively.

Corrections

A calibration of the electron energy scale and resolution is applied to electrons in data and simulation [67]. It is derived from samples with Z boson decays to electron pairs, and the corrections are validated in samples with J/ψ meson decays to electron pairs, as well as Z boson decays to electron pairs with an additional photon. Electron selection efficiency corrections are derived using the clean decays of Z bosons and J/ψ mesons to electron pairs. The selection efficiency in simulation is corrected for effects due to electron reconstruction, identification, and isolation. When electron triggers are used, the acceptance efficiency due to these triggers is also corrected.

4.3.3 Tau leptons

Tau leptons decaying into final states with electrons or muons are reconstructed as these lighter leptons. Separate techniques are used for tau leptons decaying into hadronic final states, also called hadronic tau leptons [68]. Their reconstruction starts from jets, which are described in section 4.4.1. Additional criteria on the associated tracks help distinguish tau lepton candidates from jets. The tau lepton energy is calibrated depending on its transverse momentum.
Boosted decision trees (BDTs) are used to identify taus decaying into hadronic final states. Different operating points exist for the tau lepton identification, including the medium operating point used for the $t\bar{t}H(b\bar{b})$ analyses presented in chapter 6 and chapter 9.

4.4 Jets and flavor tagging

Jets consist of showers of hadrons originating from partons produced in proton–proton collisions. The showers develop from color-charged particles produced in the hard scattering interaction in collisions at the LHC. Clustering algorithms group together the energy deposits from the shower in the ATLAS calorimeter system into jets. Flavor tagging algorithms are used to distinguish between jets containing hadrons with quarks of different flavors. This section describes jets at ATLAS and the flavor tagging algorithm used for the results in this dissertation to identify jets containing b-hadrons.

4.4.1 Jets

The jet definition relevant for the analyses in this dissertation employs energy deposits in the ATLAS calorimeter system, clustered together to form jets. This section provides an overview of jets in these analyses. Details about reconstruction and calibration of jets are given in reference [69].

Formation from energy clusters

Jet reconstruction starts with the three-dimensional topological cell clusters described in section 4.2.3, which are clustered together into jets. The clustering is performed with the anti-kt algorithm [70], implemented in the FASTJET package [71]. In the anti-kt algorithm, the distance measure between two objects i, j (characterized by their transverse momenta pT, rapidities y, and azimuthal angles φ) is defined as

\[ d_{ij} = \min\left(p_{T,i}^{-2},\, p_{T,j}^{-2}\right) \frac{\left(y_i - y_j\right)^2 + \left(\phi_i - \phi_j\right)^2}{R^2}. \tag{4.1} \]

The parameter R is variable, and set to R = 0.4 for the results presented in this dissertation. After calculating the distances between all objects, they are combined together in order of increasing distance.
If $d_{ij} > p_{T,i}^{-2}$ for all remaining objects j, the object i is called a jet and not used anymore in the clustering of the remaining objects.

The distance between two objects with low momentum is large compared to the distance between objects with low and high momentum. High momentum objects therefore cluster together with low momentum objects in their vicinity, forming a conical jet with radius R, as long as no other high momentum objects are nearby. Low momentum objects do not modify the shape of the jet, making the algorithm infrared safe; it is furthermore also collinear safe (and therefore unaffected by collinear gluon emissions).

Calibration and selection

The topological clusters from which a jet is built are calibrated to the electromagnetic scale, and correctly measure the energy deposits from electromagnetically interacting particles. In the first calibration step, the jet four-momentum is scaled such that the jet points to the primary vertex, while keeping the same energy. After this, contributions to the jet energy from pile-up are removed. The jet energy and direction are then calibrated to match the behavior derived from simulation. This corrects the jet energy scale and resolves reconstruction biases as a function of η, which occur due to changes in calorimeter granularity. Further corrections are derived by also including tracking and ATLAS MS information, which improve the jet energy resolution. The final step consists of residual corrections to jets in data, accounting for differences between data and simulation. It is derived using well-measured reference objects, such as photons or Z bosons.

After these calibration steps, quality criteria are applied to jets [72]. Events with jets failing to meet these criteria are not considered for further analysis. The effect of pile-up is mitigated by employing an algorithm called the jet vertex tagger [73].
It rejects jets where a significant fraction of the transverse momentum of tracks assigned to the jet is not associated to the primary vertex.

4.4.2 Flavor tagging

The identification of jets containing hadrons of a specific flavor is called flavor tagging. It is used to distinguish between the kinds of partons a given jet originated from. This section focuses on the description of a so-called b-tagging algorithm, which is designed to identify jets containing b-hadrons. Such jets are called b-jets, and originate from bottom quarks produced in the initial proton–proton scattering. Their identification is crucial for the identification of processes like $t\bar{t}H(b\bar{b})$, where many bottom quarks are expected to be produced, and subsequently many b-tagged jets are expected in a reconstructed event. In contrast to b-jets, the so-called light jets originate from first-generation quarks and s quarks, while c-jets originate from charm quarks.

This section provides an overview of the b-tagging algorithm used for the work presented in this dissertation. More details are provided in reference [74]. The performance of the algorithm version relevant for the $t\bar{t}H(b\bar{b})$ analysis in chapter 6 is summarized in reference [75]. The $t\bar{t}H(b\bar{b})$ analysis in chapter 9 and the muon trigger efficiency measurement in chapter 10 use an updated algorithm version, which is described in reference [76].

Algorithm overview

The most important ingredients to the b-tagging algorithm are jets, tracks reconstructed in the ID, and the primary vertex. Tracks are associated to jets based on their separation ∆R, within a cone of varying size depending on the jet transverse momentum.

The identification of b-hadrons in jets makes use of their long lifetime and high mass. With a lifetime around 1.5 ps, a b-hadron with a transverse momentum of 50 GeV travels around 4.5 mm in the transverse direction before it decays. Tracks associated to the hadron can be identified by their large impact parameters.
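The quoted transverse flight distance follows from $L_{xy} = (p_T/m)\, c\tau$, and can be cross-checked with a short calculation. This is a sketch for illustration only; the b-hadron mass of about 5.3 GeV used below is an assumption (roughly the B meson mass), not a value taken from the analysis.

```python
# Transverse flight distance of a b-hadron: L_xy = (pT / m) * c * tau.
# The mass value is an illustrative assumption (~B meson mass).
c = 2.998e8    # speed of light in m/s
tau = 1.5e-12  # b-hadron lifetime in s
pt = 50.0      # transverse momentum in GeV
mass = 5.3     # b-hadron mass in GeV (assumed)

d_mm = (pt / mass) * c * tau * 1e3  # transverse flight distance in mm
print(round(d_mm, 1))  # roughly 4 mm, consistent with the quoted ~4.5 mm
```

The result is within the millimeter scale resolvable by the impact parameter measurements described next.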
The transverse impact parameter d0 is the closest distance between the primary vertex and the track in the transverse direction. The longitudinal impact parameter z0 measures this closest distance in the z direction. Tracks significantly displaced from the primary vertex can be used to reconstruct a secondary vertex where the b-hadron decay takes place. A dedicated algorithm aims at reconstructing the b-hadron decay chain [77].

The BDT-based MV2 algorithm combines the information from various other algorithms, using impact parameter information, the reconstruction of a secondary vertex, and the b-hadron decay chain. It is designed to correctly identify b-jets. The MV2c10 algorithm used for the results in this dissertation is trained with b-jets as signal, and a mixture of 93% light jets and 7% c-jets as background. The jets used for the training are taken from a sample of simulated $t\bar{t}$ events.

Operating points

The MV2c10 algorithm is used with a range of operating points, achieving different efficiencies and rejection rates for b-jets and other jets, respectively. The b-tagging efficiency is the rate at which true b-jets are correctly identified as such. For a given efficiency, the algorithm performance is quantified by the rejection of other jets, such as c-jets and light jets. A rejection factor r means that one in r jets will mistakenly be tagged as a b-jet. Larger b-jet efficiencies result in lower rejection of other jets.

Four different operating points are defined for the MV2c10 algorithm, corresponding to b-jet efficiencies of 60%, 70%, 77%, and 85%. These operating points are referred to as very tight, tight, medium, and loose, respectively. By applying the algorithm to every jet in an event, each jet can be classified into one of five classes. When a jet satisfies any of the four operating points, it is called a b-jet. If it fails to satisfy the loose operating point, it is instead classified as untagged.
The tightest operating point satisfied by a jet is used to refer to it, since it also satisfies all operating points with higher efficiency by design.

Table 4.1: Operating points of the MV2c10 algorithm, with corresponding b-jet identification efficiencies and rejection factors for c-jets and light jets [75].

operating point | b-jet efficiency | c-jet rejection factor | light jet rejection factor
very tight | 60% | 34 | 1538
tight | 70% | 12 | 381
medium | 77% | 6 | 134
loose | 85% | 3 | 33

An overview of the operating points and their performance is shown in table 4.1, taken from [75]. The performance is evaluated using a sample of simulated $t\bar{t}$ events.

Calibration

The performance of the MV2c10 algorithm is evaluated using various event topologies enriched in b-jets, c-jets, and light jets. Using these measurements, scale factors for the b-jet tagging efficiency and the c-jet and light jet mis-tag rates are derived. These scale factors are applied to simulation to match the performance measured in data, and depend on the true jet flavor.

4.5 Missing transverse energy

Conservation of four-momentum implies that the vector sum of the momenta of all objects produced in a collision at the LHC is equal to the sum over the momenta of the colliding partons. As the LHC collides protons head-on, the transverse momentum of the system containing all objects produced in a collision should vanish. Not all of the objects in this system are always detected by ATLAS; neutrinos leave the detector unseen. The resulting momentum imbalance is restored by adding the so-called missing transverse energy to the system. It can be quantified by an energy, denoted by $E_T^{\text{miss}}$, and an associated azimuthal angle. Adding this contribution to the system of all visible particles will balance the total vector sum to have vanishing transverse momentum.

The missing transverse energy in an event is calculated as the negative of the vector sum of the transverse momenta of all reconstructed, calibrated objects [78].
For the results in this dissertation, this sum includes electrons, muons, and jets. An additional term is added to the sum to account for energy deposits not associated to any of these reconstructed objects. This term is built from charged particle tracks in the ID, which are assigned to the primary vertex, but not to any reconstructed objects. Overlap between the different physics objects is removed in the calculation of $E_T^{\text{miss}}$, in order to avoid double counting of contributions.

4.6 Overlap removal

After the reconstruction of the various objects described in this chapter, an overlap removal procedure is used to avoid double counting detector responses in the reconstruction of multiple objects. Such double counting can happen for example when an electron showers and deposits energy in the electromagnetic calorimeter, and this deposit is also reconstructed as a jet.

The overlap removal employed for the results in this dissertation is briefly described here. If there is a jet candidate within ∆Ry = 0.2 of an electron candidate, the closest jet to this electron is removed. In case there is another jet left within ∆Ry = 0.4 of the electron candidate after this step, the electron candidate is removed as well. Muon candidates are required to not be within ∆Ry = 0.4 of a jet candidate, and are removed from the event otherwise. An exception to this treatment is used if the jet candidate has two or fewer tracks associated to it, in which case the jet candidate is removed and the muon candidate is kept instead. This accounts for muons losing a significant amount of energy in the calorimeter. Candidates for tau leptons decaying into hadronic final states are rejected if they are within ∆Ry = 0.2 of an electron or muon.

5. Statistical methods

This chapter describes several statistical techniques used to interpret measurements at the LHC within a frequentist approach.
The notion of probability in this approach refers to the relative frequency of an outcome of a repeatable experiment. In contrast, Bayesian statistics includes prior subjective knowledge to express probability density functions for parameters. The chapter starts with a description of the basic ingredients needed for statistical inference in section 5.1, followed by details regarding inference techniques in section 5.2. A brief introduction to two multivariate analysis techniques, BDTs and neural networks, is included in section 5.3. The first two sections in this chapter follow the overview of statistical techniques relevant to high energy physics provided in reference [10]. A summary of the procedures used for the Higgs boson discovery can be found in reference [79]. More information about a broad range of multivariate analysis techniques can be found in references [80, 81].

5.1 Statistical modeling

This section introduces the basic ingredients necessary for statistical inference.

5.1.1 Random variables

A random variable is the outcome of a repeatable experiment. This outcome of an experiment is denoted as an observation x. Depending on the experiment, observations take on either discrete or continuous values. The continuous case is assumed in the following. Using the probability density function f(x; α), the probability for an observation to lie in the range between x and x + dx is given by f(x; α) dx. The probability density function may depend on one or more additional parameters, denoted by α. It is normalized to unity, such that the probability for an observation to take on any allowed value x is exactly one. The probability for an observation to take on a value x ≤ b is given by the cumulative distribution function

\[ F(b) = \int_{-\infty}^{b} f(x;\alpha)\, \mathrm{d}x. \tag{5.1} \]

For any function u(x) of a random variable x, its expected value is given by

\[ E\left[u(x)\right] = \int_{-\infty}^{\infty} u(x)\, f(x;\alpha)\, \mathrm{d}x. \tag{5.2} \]

The mean of a probability density function is given by µ = E[x], and the variance by σ² = E[x²] − µ². The square root of the variance is the standard deviation σ.

If x and y are two random variables, then f(x, y; α) is called the joint probability distribution function. The marginal probability density function of x is then given by

\[ f_1(x;\alpha) = \int_{-\infty}^{\infty} f(x,y;\alpha)\, \mathrm{d}y, \tag{5.3} \]

obtained via marginalization over y. The covariance of x and y is defined as

\[ \mathrm{cov}\left[x,y\right] = E\left[\left(x-\mu_x\right)\left(y-\mu_y\right)\right] = E\left[xy\right] - \mu_x \mu_y, \tag{5.4} \]

with µx and µy being the means of x and y, respectively. For x = y, cov[x, x] = σx², with σx being the standard deviation of x. The correlation between two variables x and y is given by cov[x, y]/(σx σy).

5.1.2 Common distributions

A few commonly used probability density functions are described in this section.

Poisson distribution

The Poisson distribution

\[ \mathrm{Poisson}(n;\nu) = \frac{\nu^n e^{-\nu}}{n!} \tag{5.5} \]

gives the probability to observe n events occurring independently in an interval, where the expected rate of events per interval is ν > 0. For this distribution, σ² = ν.

Gaussian distribution

The Gaussian or normal distribution N is given by

\[ N(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \tag{5.6} \]

with mean µ and variance σ². The probabilities P(x ∈ [µ − nσ, µ + nσ]) for an observation x to be within the range [µ − nσ, µ + nσ] in one experiment are given in table 5.1. The third column describes how many times on average an experiment would have to be repeated for an observation to fall outside of the range [µ − nσ, µ + nσ].

Table 5.1: Probabilities P for a Gaussian distributed observable x to fall within n standard deviations of the mean µ in one experiment, and the average number of experiment repetitions needed for one observation to fall outside of this range.

n | P | 1/(1 − P)
1 | 0.683 | 3.15
2 | 0.954 | 22.0
3 | 0.997 | 370
4 | 1 − 6·10⁻⁵ | 15 800
5 | 1 − 6·10⁻⁷ | 1 740 000
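The entries of table 5.1 follow directly from the Gaussian cumulative distribution function, via P = erf(n/√2). A minimal sketch to reproduce them with the Python standard library (an illustration, not code from the analysis):

```python
from math import erf, sqrt

def coverage(n):
    """Probability for a Gaussian observable to lie within n standard deviations of the mean."""
    return erf(n / sqrt(2.0))

for n in range(1, 6):
    p = coverage(n)
    # columns: n, P, average repetitions 1/(1 - P)
    print(n, round(p, 3), round(1.0 / (1.0 - p), 1))
```

The printed values match the table above to the quoted precision.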
χ² distribution

For n independent Gaussian random variables $x_1, x_2, \ldots, x_n$, the variable $z = \sum_{i=1}^{n} (x_i - \mu_i)^2/\sigma_i^2$ is distributed like a χ² probability distribution function with n degrees of freedom. It is written as χ²(n) and given for z ≥ 0 as

\[ f_{\chi^2}(z;n) = \frac{z^{n/2-1}\, e^{-z/2}}{2^{n/2}\, \Gamma\!\left(\frac{n}{2}\right)}. \tag{5.7} \]

The gamma function is Γ(n) = (n − 1)! for integer n > 0. For n = 1 degree of freedom, $f_{\chi^2}(z;1) = e^{-z/2}/\sqrt{2\pi z}$.

5.1.3 Likelihood function

The expression L(α) = P(x|α) defines the likelihood function for a hypothesis α, given an observation x. It specifies the probability to obtain an observation x under a specific hypothesis. This hypothesis α is usually specified by a parameter of interest µ, as well as nuisance parameters θ, with α = (µ, θ). A typical choice for the parameter of interest is a signal strength µ = σ_obs/σ_SM, given by the ratio of a measured cross-section to the prediction from the SM. Nuisance parameters θ encode additional degrees of freedom in the likelihood, representing systematic uncertainties.

For an experiment measuring event counts across N different bins i, with the expected counts under hypothesis α given by νi(α), the likelihood of an observation x characterized by event counts xi per bin is given by a product of Poisson terms,

\[ L(\alpha) = P(x|\alpha) = \prod_{i=1}^{N} \mathrm{Poisson}\left(x_i; \nu_i(\alpha)\right). \tag{5.8} \]

Systematic uncertainties

There are typically many different sources of systematic uncertainty which affect the expected counts in a bin νi(µ, θ). The parameters θ describe these effects and can increase the uncertainties on the parameter of interest µ. To reduce the impact of these uncertainties, statistically independent subsidiary measurements with data y can be used to build a joint model expressing the total likelihood for observations x and y, given all parameters µ, θ. This joint model is

\[ L(\mu,\theta) = P(x|\mu,\theta)\, P(y|\theta). \tag{5.9} \]

The subsidiary measurements usually do not depend on the parameter of interest µ. In many practical applications, the subsidiary measurement is approximated by a model.
Gaussian distributions N are a common choice for this. Consider a subsidiary observation yi, used to constrain a nuisance parameter θi. Given an estimator for this nuisance parameter $\hat{\theta}_i$ (which can be obtained by finding the parameter value maximizing the likelihood, described in section 5.2.1), and its standard deviation $\hat{\sigma}_{\theta_i}$, the subsidiary measurement can be approximated as

\[ P(y_i|\theta_i) \to N\!\left(\hat{\theta}_i;\, \theta_i,\, \hat{\sigma}_{\theta_i}\right). \tag{5.10} \]

Nuisance parameters are often re-defined for convenience, such that $\hat{\theta} \equiv \theta_0 = 0$ and $\hat{\sigma}_\theta \equiv \Delta\theta = 1$.

5.2 Statistical inference

Depending on the scientific question examined, a range of different inference methods exist to gain insights from measured data. This section provides an overview of techniques relevant to the work in this dissertation.

5.2.1 Parameter estimation

An estimate of any parameter αi can be obtained via the method of maximum likelihood, by solving

\[ \frac{\partial P(x|\alpha)}{\partial \alpha_i} = 0. \tag{5.11} \]

The estimators solving this set of equations are given by $\hat{\alpha}$ and are called maximum likelihood estimators. They are asymptotically unbiased: their expected value agrees with the true parameter value, $E[\hat{\alpha}_i] = \alpha_i$. An estimate for the covariance matrix $V_{ij} = \mathrm{cov}[\hat{\alpha}_i, \hat{\alpha}_j]$ is obtained from

\[ \left(\hat{V}^{-1}\right)_{ij} = -\left.\frac{\partial^2 \ln P(x|\alpha)}{\partial \alpha_i\, \partial \alpha_j}\right|_{\hat{\alpha}}. \tag{5.12} \]

The estimate for the variance of a parameter αi is given by $\hat{V}_{ii}$.

So-called conditional maximum likelihood estimators are obtained when maximizing the likelihood for a given value of one of the parameters. The parameter values $\hat{\hat{\theta}}_\mu$ maximize the likelihood L(α) with α = (µ, θ) for a given setting of µ.

Solutions to equation (5.11) and equation (5.12) are typically calculated numerically; the MINUIT software [82, 83] is used for the applications in this dissertation.

5.2.2 Hypothesis testing

In a hypothesis test, two different hypotheses H0, H1 are compared with each other to determine whether the null hypothesis H0 can be rejected in favor of the alternative H1.
In a typical use case, the hypotheses are distinguished by a signal strength µ = σ_obs/σ_SM, which is the ratio of a measured cross-section to the prediction from the SM. The null hypothesis specifies a signal strength µ = 0, while the alternative hypothesis predicts a signal strength consistent with the SM, µ = 1. The rejection of H0 is required to claim discovery of the signal process affected by µ. As stated by the Neyman-Pearson lemma [84], the likelihood ratio

\[ \lambda_{\mathrm{NP}}(x) = \frac{f(x|H_1)}{f(x|H_0)} \tag{5.13} \]

maximizes the statistical power to reject H0 in favor of H1. A scalar function of the data, such as the likelihood ratio described in equation (5.13), is called a test statistic t(x). While λNP(x) is an optimal test statistic, it can only be used if the probability density functions f(x|Hi) can be evaluated. When this is not possible, common alternatives include the use of BDTs and neural networks (see section 5.3), and the matrix element method described in detail in chapter 7.

Figure 5.1: Relation between significance Z and p-value.

Given the probability density function f(t|H0), and assuming that larger values of t indicate increased discrepancy with H0, the p-value

\[ p = \int_{t_{\mathrm{obs}}}^{\infty} f(t|H_0)\, \mathrm{d}t \tag{5.14} \]

quantifies the level of discrepancy between the observed test statistic t_obs, calculated from measured data, and the expectation when assuming that H0 is true. When H0 is true, the p-value will be uniformly distributed in the interval [0, 1]. The p-value can be converted into the significance Z via

\[ Z = \Phi^{-1}(1-p), \tag{5.15} \]

where $\Phi(x) = \int_{-\infty}^{x} N(y;0,1)\, \mathrm{d}y$ is the Gaussian cumulative distribution function and Φ⁻¹ is its inverse. The result of a hypothesis test comparing a hypothesis including a new particle and a null hypothesis without this particle present is called evidence in high energy physics if Z ≥ 3, and observation of this particle for Z ≥ 5.
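The conversion in equation (5.15) can be sketched with the standard normal distribution from the Python standard library (a minimal illustration, not the implementation used in the analysis):

```python
from statistics import NormalDist

def significance(p):
    """Convert a p-value into a significance Z via the inverse Gaussian CDF."""
    return NormalDist().inv_cdf(1.0 - p)

def p_value(z):
    """Convert a significance Z back into a p-value."""
    return 1.0 - NormalDist().cdf(z)

print(round(p_value(3.0), 5))  # "evidence" threshold, p of about 0.00135
print(round(p_value(5.0), 9))  # "observation" threshold, p of about 2.9e-7
```

The two conversions are exact inverses of each other, so thresholds can be quoted interchangeably in terms of p or Z.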
A threshold of p = 0.05 is usually applied when performing a test to reject a hypothesis containing a new signal process in favor of a background-only hypothesis. The relation between significance and p-value is visualized in figure 5.1. In the presence of nuisance parameters, the p-value generally depends on those.

The CLS method

Instead of working with the p-value directly, a common alternative for the derivation of limits is defined by the CLS method [85]. Let pµ be the p-value derived under a hypothesis specifying a signal with strength µ that is being tested. It represents the probability to obtain a result equally or less compatible with the signal hypothesis than the observed one. Let p0 be the p-value describing the probability to obtain a result equally or less compatible with a background-only (no signal, µ = 0) hypothesis. The CLS method modifies pµ to determine whether the signal hypothesis may be rejected:

\[ \mathrm{CL}_S(\mu) = \frac{p_\mu}{1 - p_0}. \tag{5.16} \]

Figure 5.2: Distribution of the test statistic tNP under hypotheses H1 and H0, including p-values calculated from an observation tobs indicated in the shaded areas.

In experiments with little sensitivity, the distributions of test statistics under the signal and background-only hypotheses may overlap significantly. If the observed data fluctuates downwards compared to the expectation from the background-only hypothesis, the upper limit derived on µ may be very low. For large p0, the resulting value for CLS(µ) penalizes pµ to mitigate this effect. Models to which the test is not sensitive are therefore not excluded.

Figure 5.2 shows an example with a test statistic defined as $t_{\mathrm{NP}} = -2\ln\left[f(x|H_1)/f(x|H_0)\right]$. The test statistic distribution under hypothesis H0, shown as f(tNP|H0), is concentrated at higher values of tNP than the distribution under hypothesis H1. The p-value p1 calculated to reject hypothesis H1 in this example is around 2%.
When calculating CLS, it gets penalized by the large value of p0 and increases by roughly a factor of 3. It is not possible to reject H1 at the 95% confidence level in this case, as the sensitivity of the measurement is insufficient.

Profile likelihood

The profile likelihood ratio is a test statistic defined as

\[ \lambda_\mu = \frac{L\left(\mu, \hat{\hat{\theta}}_\mu\right)}{L\left(\hat{\mu}, \hat{\theta}\right)} \tag{5.17} \]

in order to remove the dependence on the nuisance parameters θ. The parameters $\hat{\mu}$, $\hat{\theta}$ are maximum likelihood estimators, while $\hat{\hat{\theta}}_\mu$ is the conditional maximum likelihood estimator for a given µ. For convenience, the test statistic

\[ t_\mu = -2\ln\left[\lambda_\mu\right], \tag{5.18} \]

defined as a function of the profile likelihood ratio, is commonly used. Increasing values of tµ correspond to larger discrepancies of the observed data with the hypothesis parameter setting µ. Wilks' theorem [86] states that in the limit of sufficiently large data samples, and observations generated with a signal strength parameter µ′, tµ is distributed like a χ² distribution with as many degrees of freedom as dimensions in µ for µ′ = µ. An example with one degree of freedom is shown in figure 5.3. When testing other settings µ′ ≠ µ, the test statistic tµ follows a non-central χ² distribution.

Figure 5.3: Distribution of the test statistic tµ; the p-value can be obtained via the integral prescription in equation (5.14).

Discovery test statistic

The test statistic

\[ q_0 = \begin{cases} t_0 = -2\ln\left[\lambda_0\right] & \hat{\mu} \ge 0, \\ 0 & \hat{\mu} < 0 \end{cases} \tag{5.19} \]

is used to test for the discovery of a new signal by rejecting the µ = 0 hypothesis. This assumes µ ≥ 0, and a data fluctuation resulting in $\hat{\mu} < 0$ is not interpreted as evidence for a signal. Instead, only increasing values of $\hat{\mu} > 0$ result in increasingly large values of q0 and thereby an increased incompatibility with the null hypothesis µ = 0. The discovery significance Z0 is given by [87]

\[ Z_0 = \sqrt{q_0}. \tag{5.20} \]

Test statistic for upper limits

For testing an upper limit on a signal strength parameter, the test statistic

\[ \tilde{q}_\mu = \begin{cases} \tilde{t}_\mu = -2\ln\left[\tilde{\lambda}_\mu\right] & \hat{\mu} \le \mu, \\ 0 & \hat{\mu} > \mu \end{cases} \tag{5.21} \]

is defined.
The ratio $\tilde{\lambda}_\mu$ is equivalent to $\lambda_\mu$ for $\hat{\mu} \ge 0$. For $\hat{\mu} < 0$, it is given by $\tilde{\lambda}_\mu = L(\mu, \hat{\hat{\theta}}_\mu)/L(0, \hat{\hat{\theta}}_0)$. Fluctuations in a measurement resulting in $\hat{\mu} > \mu$ are not regarded as making the observed data less compatible with the hypothesized signal.

5.2.3 Median significances and the Asimov dataset

It can be computationally very expensive to build the probability density functions f(tµ|Hµ) for test statistics. This procedure relies on repeatedly generating datasets x distributed according to each hypothesis Hµ that needs to be tested, and evaluating the test statistic for each of them. When evaluating the performance of an experiment, the median discovery significance for a given signal process can be evaluated by constructing the distribution f(q0|µ = 0), and evaluating the median test statistic q̄0 across many different simulated datasets distributed under the signal hypothesis. The median significance with which a background-only hypothesis is expected to be excluded is obtained by calculating the p-value of this median q̄0 via equation (5.14) and converting it to a significance. Similarly, expected median upper limits can be evaluated after constructing f(q̃µ|µ = 0), by finding the value of µ for which the median p-value is 0.05. The value obtained is the median upper limit on µ at the 95% confidence level.

Asimov dataset

Analytic approximations for the distributions of the test statistics mentioned above exist in the large sample limit [87]. The method to obtain them uses the Asimov dataset, which is defined in such a way that the estimators for all parameters α = (µ, θ) obtained on this dataset correspond to their true values. Let µ′ be the signal strength parameter used in the generation of the Asimov dataset, and θ the nuisance parameters. The profile likelihood ratio evaluated on this dataset is given by

\[ \lambda_{\mu,A} = \frac{L_A\left(\mu, \hat{\hat{\theta}}\right)}{L_A\left(\mu',\theta\right)}, \tag{5.22} \]

where LA is the likelihood of the Asimov dataset.
With a test statistic defined as $q_{\mu,A} = -2\ln\left[\lambda_{\mu,A}\right]$, the variance of $\hat{\mu}$ can be calculated:

\[ \sigma_A^2 = \frac{\left(\mu-\mu'\right)^2}{q_{\mu,A}}. \tag{5.23} \]

In a test for discovery, µ = 0 to exclude a background-only hypothesis. When calculating the median exclusion significance for hypothesis µ under the assumption that no signal exists, µ′ = 0.

The Asimov dataset generated with signal strength parameter µ′ can furthermore be used to estimate median significances. Assuming a signal strength µ′, the discovery significance is given by $\mathrm{med}(Z_0|\mu') = \sqrt{q_{0,A}}$. The median exclusion significance for a signal strength µ, assuming a true signal strength µ′ = 0, is $\mathrm{med}(Z_\mu|\mu'=0) = \sqrt{\tilde{q}_{\mu,A}}$.

5.3 Multivariate techniques

This section provides a brief introduction to two multivariate techniques relevant to this dissertation, BDTs and neural networks. The description is based on references [80, 81].

5.3.1 Boosted decision trees

BDTs are a common multivariate technique in high energy physics. They are used in many places, for example in the identification of objects from collision remnants in the detector, and as discriminants to distinguish between events originating from various processes in physics analyses.

Decision trees

Let $\vec{X} = (\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N)$ be a set of data, where each data point i is described by a set of features $\vec{x}_i = (x_i^1, x_i^2, \ldots, x_i^M)$, as well as a label (or set of labels) $z_i$. The feature set $\vec{x}_i$ corresponds to observable information, such as kinematics of collision remnants reconstructed in a detector. The label $z_i$ is not observable, but here it is assumed that it is possible to generate a simulated set of events $(\vec{x}_i, z_i)$. Discrete labels are used to separate events into different classes. In the context of high energy physics, each event may for example be assigned a label of $z_i = 0$ if no Higgs boson is produced in the final state of the event, and a label $z_i = 1$ otherwise.
Continuous labels can also be used.

Decision trees provide an approximate model $\hat{z}_i = \hat{f}(\vec{x}_i)$ to assign a label to the set of features of each data point $i = 1 \ldots N$. They partition the feature space into hyperrectangles, with a label $\hat{z}_i$ assigned to each hyperrectangle. Only binary partitioning, which is most commonly used, is considered in the following; decision trees can then be visualized as binary trees. At the root of the tree, the full feature space has not been partitioned yet. The space is then recursively split into hyperrectangles, by applying cuts along hyperplanes, until a stopping criterion has been reached. Each split decision is indicated by a node; the terminal nodes are also called leaves. Typical stopping criteria are a maximum tree depth and a minimum number of simulated events remaining in a node during construction of the tree.

Constructing decision trees

The construction of decision trees proceeds by recursively determining and applying the best possible next cut, according to some metric. Each cut splits a hyperrectangle into two. For classification problems with discrete labels $k = 1, 2, \ldots, K$, the Gini index is given by
\[ G = \sum_{k=1}^{K} p_k (1 - p_k) , \qquad (5.24) \]
where $p_k$ is the fraction of events in class $k$ at a given node. The index is bounded by $\max G = 1 - \frac{1}{K}$, reached if an equal fraction of events from all classes is present at the node, and $\min G = 0$, reached if only events from one class are present. Cut decisions are taken such that the sum of the Gini indices of the child nodes, weighted by the relative number of events contained in them, is minimized.

Boosting

Boosting is a powerful method to extend the performance of decision trees for classification problems. It relies on constructing a weak classifier, which performs just slightly better than pure guessing, and iteratively applying this classifier to modified versions of the data.
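The split criterion of equation (5.24) can be illustrated with a minimal single-feature cut scan; the helper names below are hypothetical and chosen for readability:

```python
def gini(labels):
    """Gini index G = sum_k p_k (1 - p_k) for the class labels at a node,
    equation (5.24)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(p * (1.0 - p)
               for p in (labels.count(k) / n for k in set(labels)))

def best_split(values, labels):
    """Scan all cut positions on a single feature and return the cut that
    minimizes the Gini indices of the two children, weighted by the
    fraction of events falling into each child."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    zs = [labels[i] for i in order]
    n = len(xs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        score = (i / n) * gini(zs[:i]) + ((n - i) / n) * gini(zs[i:])
        if score < best_score:
            best_cut, best_score = 0.5 * (xs[i - 1] + xs[i]), score
    return best_cut, best_score
```

A full tree builder would apply `best_split` recursively across all features until one of the stopping criteria mentioned above is met.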
After each step, events that were misclassified by the latest classifier receive larger weights, while correctly classified events receive smaller weights. The final model is then obtained via a weighted average of the individual classifiers.

Figure 5.4: Exemplary architecture of a fully connected feedforward neural network with three inputs (drawn as blue circles), two hidden layers (with associated nodes drawn in green), and one output (drawn in purple). Information flows along the lines connecting nodes.

5.3.2 Neural networks

The term neural network is used for a wide range of machine learning methods. The focus in this section is on fully connected feedforward artificial neural networks; this architecture is used in chapter 11. Such a network has the goal of approximating a function $f(\vec{x})$. The term neural network originates from its use as a model for the human brain, describing neurons connected via synapses.

A neural network consists of layers of nodes. Figure 5.4 visualizes an example. The nodes of the input layer, drawn as blue circles, correspond to the features $\vec{x}$ provided to the neural network, with one feature per node. Nodes in all remaining layers calculate a derived feature from the combination of all nodes in the respective previous layer. The inputs to each node are visualized via lines in the figure. Each node performs a linear combination of its inputs, weighted by a vector $\vec{\alpha}$, and also allows for the addition of a bias term $\alpha_0$. This bias term can be thought of as an additional node in the previous layer with a constant value. The output of a node $y_j$ is given by
\[ y_j = \sigma(\vec{\alpha}_j \cdot \vec{r} + \alpha_0^j) , \qquad (5.25) \]
where $\sigma(v)$ is called an activation function. The values of $\vec{\alpha}_j$ and $\alpha_0^j$ are learned during the training of the neural network. The vector $\vec{r}$ contains the outputs of all nodes in the previous layer.
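A minimal sketch of equation (5.25) and of a full forward pass through the architecture of figure 5.4; the function names are illustrative:

```python
def node_output(alpha, bias, r, activation):
    """Single node, equation (5.25): y = sigma(alpha . r + alpha0)."""
    return activation(sum(a * x for a, x in zip(alpha, r)) + bias)

def forward(layers, x, activation):
    """Fully connected feedforward pass as in figure 5.4: each layer is a
    list of (alpha, bias) pairs, and the outputs of one layer feed every
    node of the next."""
    r = x
    for layer in layers:
        r = [node_output(alpha, bias, r, activation) for alpha, bias in layer]
    return r
```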
When calculating the output of a node in the first hidden layer, it is equivalent to the network inputs, $\vec{r} = \vec{x}$. The term feedforward refers to the flow of information from the input nodes forward towards the output of the network. The network is fully connected, as each node receives inputs from all nodes in the previous layer.

The hidden layers receive their name since their associated node values are not observed; they only act as intermediate steps in the calculation of the network output. The output of the network is given in the final layer, the output layer. It may have more than one node, for example in a network designed for a classification problem. When using the network in a regression problem, one output node is typically used.

Loss function and training

The weights $\theta$ in the network are learned during a training procedure, which minimizes a loss function $L_\theta$. These weights contain $\vec{\alpha}_j, \alpha_0^j$, and can also contain additional parameters affecting $L_\theta$. The loss function measures the performance of the network. The regression task studied in section 11.3 uses the mean absolute error between network output and true function value as loss function, defined as
\[ L_\theta = \frac{1}{N} \sum_{i=1}^{N} \left| f(\vec{x}_i) - g_\theta(\vec{x}_i) \right| . \qquad (5.26) \]
The network output is given by $g_\theta(\vec{x}_i)$; it depends on the set of weights $\theta$. This loss function is evaluated by considering a set of $N$ events, calculating the absolute error in the network output for each event, and averaging the results. The minimization of the loss function $L_\theta$ can be performed with gradient descent methods. During this iterative process, updates to the network weights $\theta$ are calculated via the chain rule of calculus, minimizing the value of $L_\theta$ evaluated with a set of training events.

Activation functions

For $\sigma(v) = v$, the neural network is a linear model of its inputs.
The use of other activation functions introduces non-linearity to the neural network, and thus allows the network to describe non-linear functions of its inputs. Two types of activation functions are used in this dissertation, called the rectified linear unit (ReLU) and softplus. The ReLU function is defined by
\[ \sigma(v) = \begin{cases} 0 & v < 0, \\ v & v \geq 0. \end{cases} \qquad (5.27) \]
Its output is zero for negative inputs, and equal to its input otherwise. The softplus function is
\[ \sigma(v) = \log\left(e^v + 1\right) , \qquad (5.28) \]
with strictly positive output. It approaches the ReLU function for both very small and very large input values.

Architecture and hyperparameters

The performance of a neural network depends not only on learning the weights $\theta$, but also on parameters of the model that are not learned. These include the architecture of the model, specified by the number of hidden layers and nodes per layer, and the activation functions used. The set of inputs to the network can be tuned, as can the choice of algorithm performing the loss function minimization, and the hyperparameters associated with this algorithm.

6. Search for Higgs boson production in association with a top quark pair and decaying into a bottom quark pair

This chapter summarizes a search for $t\bar{t}H$ production with Higgs boson decays to bottom quark pairs with the ATLAS detector, performed with 36.1 fb$^{-1}$ of data collected in 2015 and 2016 at $\sqrt{s} = 13$ TeV during Run-2 of the LHC. The result of the search was published in 2018, measuring a signal strength of $\mu_{t\bar{t}H} = 0.84^{+0.64}_{-0.61}$ [1]. This signal strength is defined as the ratio of the measured cross-section to the cross-section predicted by the SM, $\mu_{t\bar{t}H} = \sigma_{t\bar{t}H}^{\mathrm{obs}} / \sigma_{t\bar{t}H}^{\mathrm{SM}}$.

A measurement of the Higgs boson production process in association with top quark pairs, $t\bar{t}H$, provides a direct probe of the Yukawa sector of the SM. This production process is sensitive to the top quark Yukawa coupling $y_t$, a parameter with implications exceeding particle physics, as discussed in section 2.4.1.
Section 2.5.3 describes the role of the $t\bar{t}H$ process in determining this coupling in more detail. The determination of $y_t$ via loop-induced couplings of gluons or photons to the Higgs boson relies on assumptions about BSM particles contributing to these loops in Higgs boson production and decay. In contrast, the tree-level measurement of $y_t$ via the rarer $t\bar{t}H$ process does not rely on such assumptions. The cross-section of $t\bar{t}H$ is proportional to the square of the top quark Yukawa coupling. A measurement of the $t\bar{t}H$ cross-section can therefore be interpreted in terms of $y_t$.

A range of $t\bar{t}H$ analyses were conducted prior to the search described in this chapter. During Run-1 of the LHC, both the ATLAS and CMS collaborations performed dedicated searches for $t\bar{t}H$ with three different Higgs boson decay topologies. Decays to bottom quark pairs, photon pairs, and final states with multiple charged electrons or muons (via Higgs boson decays to weak gauge bosons and tau leptons) were analyzed. An ATLAS analysis of the $t\bar{t}H(b\bar{b})$ topology using 20.3 fb$^{-1}$ of Run-1 data measured a signal strength of $\mu_{t\bar{t}H} = 1.5 \pm 1.1$ [88]. The combination of the various $t\bar{t}H$ final states analyzed with Run-1 data from ATLAS and CMS resulted in a $t\bar{t}H$ signal strength measurement of $\mu_{t\bar{t}H} = 2.3^{+0.7}_{-0.6}$ [89].

This chapter is organized as follows. Section 6.1 provides a brief overview of the analysis approach and its challenges. The definitions of the objects used in the analysis and the basic event selection are described in section 6.2. The expected kinematic distributions of events produced via the $t\bar{t}H$ process and background processes are obtained from simulation and a data-driven technique; details are provided in section 6.3. Section 6.4 describes the categorization of events into different regions. The multivariate analysis techniques employed to distinguish between the $t\bar{t}H$ signal and background processes are presented in section 6.5.
Systematic uncertainties affecting the $t\bar{t}H(b\bar{b})$ search are listed in section 6.6. Lastly, section 6.7 presents the results of the statistical analysis.

Figure 6.1: Exemplary Feynman diagram for the $t\bar{t}H(b\bar{b})$ topology, with one or two light charged leptons ($l$) in the final state. The different columns listed for the decay products of the $W$ bosons correspond to the alternative topologies considered in the analysis.

6.1 Analysis overview

The analysis presented in this chapter targets the $t\bar{t}H(b\bar{b})$ topology, with one or two light charged leptons (electrons or muons) originating from the $W$ bosons produced in the top quark pair decay. Figure 6.1 shows one of the Feynman diagrams for this topology. Two channels are considered in the analysis, with events assigned to them depending on the number of reconstructed light charged leptons. Events with one reconstructed light charged lepton are analyzed in the single-lepton channel, while those with two light charged leptons are contained in the dilepton channel. Within both of these channels, several regions are defined depending on the number of reconstructed jets and on b-tagging information. The single-lepton channel also contains a dedicated region targeting the decay of top quarks with high transverse momentum. Given that $W$ bosons decay hadronically with a branching ratio of roughly 2/3 [10], the single-lepton channel contains more events than the dilepton channel.

The signal extraction is performed via a combined profile likelihood fit to 19 non-overlapping regions across both channels. In 10 of these regions, a small fraction of signal events (less than 1.5% of the total number of events) is expected, and they mostly serve to constrain systematic uncertainties associated with background modeling.
The remaining 9 regions have larger contributions from the $t\bar{t}H(b\bar{b})$ signal, and a variety of multivariate techniques are employed there to discriminate between $t\bar{t}H(b\bar{b})$ and other processes.

While the analysis is designed for $H \to b\bar{b}$ decays, all SM Higgs boson decay modes are considered as signal. The contributions from other decay modes are small; they make up around 1–4% of the total signal in the most sensitive signal regions.

Figure 6.2: Exemplary Feynman diagram for the $t\bar{t} + b\bar{b}$ background process.

The presence of at least one light charged lepton in the final state allows for an efficient way to select events for the analysis, using electron and muon triggers. Requiring at least one light charged lepton also suppresses background contributions from QCD multi-jet production. A dedicated analysis for $t\bar{t}H(b\bar{b})$ with a fully hadronic final state was performed by ATLAS in Run-1 of the LHC [90]. The ATLAS analysis of Run-2 data in this final state is ongoing; its design differs significantly from the analysis presented in this chapter in order to deal with the multi-jet background.

Main background: $t\bar{t} + b\bar{b}$

The largest experimental challenge in the analysis arises from the modeling of top quark pair production with additional b-jets, called $t\bar{t} + \geq 1b$. A subset of $t\bar{t} + \geq 1b$, the $t\bar{t} + b\bar{b}$ background, arises from top quark pair production with an additionally emitted gluon splitting into a bottom quark pair. An exemplary Feynman diagram for this signature is shown in figure 6.2. The correct description of this process is difficult, and the large uncertainties associated with the predicted distribution of $t\bar{t} + b\bar{b}$ events in the analysis limit the overall sensitivity.

Signal–background discrimination

Further challenges arise from the similarity of the $t\bar{t}H(b\bar{b})$ signal to other background processes. In particular, the final states of $t\bar{t} + b\bar{b}$ and $t\bar{t}H(b\bar{b})$ contain the same partons.
A successful discrimination between these two processes relies on small differences in kinematic distributions.

The invariant mass distribution of the $b\bar{b}$ system from the decay of the Higgs boson is sharply peaked around the Higgs boson mass, while the invariant mass of the system produced from an emitted gluon follows a broader distribution. Reconstruction of the Higgs boson invariant mass suffers from combinatorial ambiguity, since additional b-jets from the top quark decays are present in every event. The jets originating from the Higgs boson decay may also not have been b-tagged, or not have been reconstructed at all. Additional jets may have been b-tagged mistakenly. Due to the large number of objects expected in the final state, the efficiency to reconstruct and identify all of them correctly is low. Even in cases where all objects are correctly identified, the finite ATLAS detector resolution considerably widens the invariant mass peak expected from the b-jet system from the Higgs boson decay.

Additional small differences between $t\bar{t} + b\bar{b}$ and $t\bar{t}H(b\bar{b})$ are expected, but diluted by detector effects. They are also affected by combinatorial ambiguity; in order to take advantage of some of these effects, jets need to be matched to partons. The angular distribution of the $b\bar{b}$ system originating from a Higgs boson decay, evaluated in the Higgs boson rest frame, differs from the corresponding distribution for a gluon. This difference is due to the spin-0 nature of the SM Higgs boson, compared to the spin-1 gluon.

The analysis uses a range of multivariate techniques to perform system reconstruction and discrimination of $t\bar{t}H$ from the background processes present. These are described in section 6.5.

6.2 Event selection

This section summarizes the requirements for events to be considered in the analysis. The analyzed dataset is briefly described in section 6.2.1. Section 6.2.2 lists additional details about the object definitions used.
If no further details are given, the object definitions follow the description in chapter 4. Lastly, section 6.2.3 specifies how the single-lepton and dilepton channels are defined, and which events they contain.

6.2.1 Dataset

Events considered in this analysis are taken from proton–proton collisions at $\sqrt{s} = 13$ TeV, delivered in 2015 and 2016 by the LHC, and recorded by the ATLAS detector. All events are required to fulfill the quality criteria listed in section 3.2.7. The dataset corresponds to an integrated luminosity of 36.1 ± 0.8 fb$^{-1}$; the uncertainty is derived with a method similar to reference [50]. The mean number of interactions per bunch crossing in this dataset is 24, with a distribution ranging from around 8 to 45 interactions. Figure 3.2 shows these distributions, but also includes data recorded by ATLAS that does not fulfill the quality criteria for physics analyses.

6.2.2 Object definitions

Electrons are required to have $p_T > 10$ GeV and be reconstructed within $|\eta| < 2.47$. They are removed if they fall into the transition region between the calorimeter barrel and end-cap, located at $1.37 < |\eta| < 1.52$. Electrons need to satisfy the LooseAndBLayer identification operating point, and the loose isolation operating point.

Muons have the same transverse momentum requirement of $p_T > 10$ GeV, and need to be located within $|\eta| < 2.5$. They have to satisfy the loose identification operating point, as well as the loose isolation operating point.

Jets are required to have $p_T > 25$ GeV and be located within $|\eta| < 2.5$ after their calibration. The boosted region within the single-lepton channel uses an additional jet definition, so-called large-R jets. It targets the decay of objects with high momenta, where the decay products are collimated and not resolved as individual jets. The selected standard jets are re-clustered [91] into large-R jets using the anti-$k_t$ algorithm with $R = 1.0$.
Only large-R jets with $p_T > 50$ GeV are considered.

The overlap removal procedure described in section 4.6 is applied to these objects, using the standard $R = 0.4$ jets, not the large-R jets. After the overlap removal, the lepton requirements are tightened further. Electrons need to pass the tight identification operating point, while muons need to fulfill the medium identification operating point. Both also need to pass the requirements of the respective gradient isolation operating point.

6.2.3 Definition of the single-lepton and dilepton channels

All events for this analysis were recorded with single light lepton (electron and muon) triggers. These triggers are highly efficient above their thresholds, and events are required to pass either a trigger with a lower transverse momentum threshold and a lepton isolation requirement, or a trigger with a higher threshold and no isolation requirement. For muons, the transverse momentum thresholds of the triggers with isolation requirements are 20 GeV and 26 GeV for data recorded in 2015 and 2016, respectively. Without the isolation requirement, the threshold is 50 GeV. The lowest electron trigger thresholds are 24 GeV and 26 GeV with isolation requirements. Additional electron triggers with thresholds of 60 GeV and 120 GeV (for data recorded in 2015) or 140 GeV (for 2016), using increasingly relaxed identification criteria, are also used. Events considered in this analysis are required to pass any of these triggers. They also need to contain a reconstructed lepton with $p_T > 27$ GeV, which matches the lepton reconstructed by the trigger, defined by proximity in $\Delta R$.

Single-lepton channel

The single-lepton channel selects events with at least five jets and exactly one reconstructed light lepton; no other light leptons with $p_T > 10$ GeV may be present. Events with more than one hadronic tau lepton are removed.
This requirement avoids selecting events used in other searches for $t\bar{t}H$ with different Higgs boson final states.

The boosted region within the single-lepton channel targets events with at least one top quark produced at high transverse momentum. Higgs boson candidates are defined as large-R jets with $p_T > 200$ GeV which contain at least two jets, of which at least two need to be b-tagged at the loose operating point. Top quark candidates are formed by large-R jets with $p_T > 250$ GeV, containing at least two jets, out of which exactly one passes the loose b-tagging operating point. Events in the boosted region contain at least one Higgs boson candidate, at least one top quark candidate, as well as an additional jet b-tagged at the loose operating point. The b-tagging requirements are less stringent than in the resolved regions to retain a sufficient $t\bar{t}H$ selection efficiency.

Events which do not fall into the boosted region are instead considered for the remaining regions of the single-lepton channel, called the resolved regions. Among the five required jets, these events need to have at least two jets passing the very tight b-tagging operating point, or at least three passing the medium operating point.

Dilepton channel

The dilepton channel requires events to have two reconstructed light leptons with opposite electric charge. For events with two electrons, the lepton with the lower transverse momentum needs to satisfy $p_T > 15$ GeV, while the threshold is $p_T > 10$ GeV for events with at least one muon. If both leptons have the same flavor, their invariant mass is required to be above 15 GeV and outside the 83–99 GeV range. The latter requirement suppresses events originating from Z boson decays. At least three jets are required in the dilepton channel, and at least two of them need to be b-tagged at the medium operating point.
Events with at least one hadronic tau lepton are removed.

6.3 Modeling

The expected contributions to the analysis from various processes are modeled mostly with Monte Carlo (MC) simulation; the contribution of fake and non-prompt leptons in the single-lepton channel is estimated from data. The GEANT4-based full simulation of the ATLAS detector is used for the majority of the MC samples. Some samples used to build templates for estimating systematic uncertainties are instead simulated with the AFII method. The simulation of ATLAS is described in reference [57]. Pile-up interactions are simulated with PYTHIA 8.186 [92], and simulated events are reweighted to correspond to the pile-up profile in data. EVTGEN 1.2.0 [93] is used to decay b- and c-hadrons for all samples except those produced with SHERPA [94]. The top quark mass in all simulations is set to 172.5 GeV.

6.3.1 $t\bar{t}H$ signal

Samples for the expected $t\bar{t}H$ signal distributions are produced with the MADGRAPH5_AMC@NLO [95] generator in version 2.3.2, performing the matrix element calculation at NLO in QCD. The term MG5_AMC@NLO will be used in the following to refer to this event generator. The parton distribution functions are provided by the NNPDF3.0NLO [96] set. Both the renormalization and factorization scales are set to $\mu_R = \mu_F = 0.5\, H_T$, with the $H_T$ variable defined as the sum of the transverse masses $\sqrt{p_T^2 + m^2}$ of all final state particles per event. Parton showering and hadronization are performed by PYTHIA 8.210 [97], with free model parameters set to the A14 tune [98]. The Higgs boson mass in the simulation is set to 125 GeV, and its branching ratios are calculated with HDECAY [34, 99]. The production cross-section for $t\bar{t}H$ is $507^{+35}_{-50}$ fb [34], calculated at NLO accuracy in QCD and including NLO EW corrections.

6.3.2 $t\bar{t}$ + jets background

Top quark pair production, the $t\bar{t}$ process, is the dominant background in this analysis.
It is modeled with the POWHEG-BOX v2 event generator [100–103] at NLO, using the NNPDF3.0NLO PDF set in the five-flavor (5F) scheme. This event generator will be referred to as POWHEG. The setup is tuned to describe data in a more inclusive phase space than the one used by the analysis in this chapter [104].

Table 6.1: Definition of the $t\bar{t}$ + jets components used in the analysis. Additional particle jets are those not originating from a top quark or $W$ boson decay.

$t\bar{t}$ + jets component | definition
$t\bar{t} + \geq 1b$ | $\geq 1$ additional particle jets matched to $\geq 1$ b-hadrons
$t\bar{t} + b$ | one additional particle jet matched to one b-hadron
$t\bar{t} + b\bar{b}$ | two additional particle jets matched to one b-hadron each
$t\bar{t} + B$ | one additional particle jet matched to two or more b-hadrons
$t\bar{t} + \geq 3b$ | remaining $t\bar{t} + \geq 1b$ events, excluding $t\bar{t} + b$ (MPI/FSR)
$t\bar{t} + b$ (MPI/FSR) | additional particle jet from MPI and FSR
$t\bar{t} + \geq 1c$ | not $t\bar{t} + \geq 1b$, and $\geq 1$ additional particle jets matched to $\geq 1$ c-hadrons
$t\bar{t}$ + light | neither $t\bar{t} + \geq 1b$ nor $t\bar{t} + \geq 1c$

The renormalization and factorization scales are set to the transverse mass $\sqrt{p_T^2 + m^2}$ of the top quark, evaluated in the reference frame where the $t\bar{t}$ center of mass is at rest. PYTHIA 8.210 with the same A14 tune is used for parton showering and hadronization. The production cross-section for $t\bar{t}$ is $832^{+46}_{-51}$ pb. It is evaluated with Top++2.0 [105] at NNLO in QCD, including next-to-next-to-leading logarithmic (NNLL) corrections [106–109].

$t\bar{t}$ + jets classification into components

The $t\bar{t}$ + jets background is split into multiple components. In order to perform the split, so-called particle jets are built with the anti-$k_t$ algorithm with $R = 0.4$, using stable particles (with mean lifetimes $\tau > 3 \cdot 10^{-11}$ s) from the MC simulation as input. The number of b- and c-hadrons within $\Delta R < 0.4$ is then counted for each particle jet.
An event is classified as $t\bar{t} + \geq 1b$ if it has at least one particle jet containing at least one b-hadron, and this jet does not originate from a top quark or $W$ boson decay. All remaining events are classified as $t\bar{t} + \geq 1c$ if they have at least one particle jet containing at least one c-hadron, and this jet does not originate from a $W$ boson decay. The rest of the events are labeled $t\bar{t}$ + light; the name refers to the smaller masses of the u, d and s quarks compared to the c and b quarks.

The $t\bar{t} + \geq 1b$ component is split further into sub-components. For the following definitions, again only particle jets not originating from a top quark or $W$ boson decay are considered. Events containing a single particle jet with exactly one b-hadron matched to it are labeled as $t\bar{t} + b$. If they contain exactly two particle jets with exactly one b-hadron matched to each, they are labeled as $t\bar{t} + b\bar{b}$. Events with a single particle jet containing two or more b-hadrons are called $t\bar{t} + B$. The remaining events are categorized as $t\bar{t} + \geq 3b$. A special category exists for events containing b-jets originating from multi-parton interactions (MPI) and final state radiation (FSR), the latter being gluon radiation from top quark decay products. This affects 10% of the events, and such events are categorized as $t\bar{t} + b$ (MPI/FSR) instead. Table 6.1 summarizes the different $t\bar{t}$ + jets components used in the analysis.

Figure 6.3: Relative fractions of the $t\bar{t} + \geq 1b$ sub-components predicted by the POWHEG+PYTHIA 8 and SHERPA4F samples. The uncertainties for both predictions are also shown, including the sources discussed in section 6.6.3 [1].

Reweighting

A MC sample for $t\bar{t} + b\bar{b}$ is produced with SHERPA+OPENLOOPS [94, 110, 111], describing the production of the two additional b-jets at NLO precision.
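The component assignment of table 6.1 can be sketched as follows. The dictionary-based event representation and the simplified handling of the MPI/FSR category are assumptions made here for illustration; the actual classification in the analysis operates on the full MC event record:

```python
def classify_ttbar_jets(additional_jets):
    """Assign a ttbar event to one of the components of table 6.1. Each
    additional particle jet (not from a top quark or W boson decay) is a
    dict with the number of matched b- and c-hadrons; the 'mpi_fsr' flag
    is a simplification standing in for the full MPI/FSR origin check."""
    b_jets = [j for j in additional_jets if j["n_b"] >= 1]
    c_jets = [j for j in additional_jets if j["n_b"] == 0 and j["n_c"] >= 1]
    if b_jets:
        if all(j.get("mpi_fsr", False) for j in b_jets):
            return "tt+b (MPI/FSR)"
        if len(b_jets) == 1:
            return "tt+b" if b_jets[0]["n_b"] == 1 else "tt+B"
        if len(b_jets) == 2 and all(j["n_b"] == 1 for j in b_jets):
            return "tt+bb"
        return "tt+>=3b"
    if c_jets:
        return "tt+>=1c"
    return "tt+light"
```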
SHERPA is used in version 2.1.1, with the CT10 PDF set [16, 112] in the four-flavor (4F) scheme. This sample will be referred to as SHERPA4F. As it describes the two additional b-jets at NLO and takes into account the b quark mass, it is the most precise theoretical prediction for the $t\bar{t} + \geq 1b$ process available to the analysis. In the nominal POWHEG+PYTHIA 8 $t\bar{t}$ sample, additional b-jets come from the parton shower (PS).

The relative contributions from the various $t\bar{t} + \geq 1b$ sub-components in the POWHEG+PYTHIA 8 sample are reweighted to match the distribution of the SHERPA4F sample. The $t\bar{t} + b$ (MPI/FSR) component, which is not included in the SHERPA4F prediction, is unaffected by this reweighting. Figure 6.3 shows the fraction of events in the relevant $t\bar{t} + \geq 1b$ sub-components for both samples. The uncertainty for the POWHEG+PYTHIA 8 prediction is obtained from the modeling uncertainties discussed in section 6.6.3, excluding the SHERPA4F-related uncertainties. The uncertainty for the SHERPA4F prediction originates from the sources affecting this prediction directly, as discussed in section 6.6.3.

6.3.3 Other backgrounds

The remaining background processes have a smaller impact on the analysis. All of the processes described in the following, with the exception of $t\bar{t}V$, are collectively referred to as non-$t\bar{t}$. This group of non-$t\bar{t}$ processes contributes 4–15% across the regions considered.

Additional simulated backgrounds

The simulation of weak vector boson production with additional jets, $V$+jets, is performed with SHERPA 2.2.1 and the NNPDF3.0NNLO PDF set. It allows for the generation of up to two additional partons at NLO, and four at LO [113–115].
$Z$+jets events containing b- or c-jets are weighted by a factor of 1.3 to match the distribution of data within a control region around the Z boson mass window. SHERPA 2.2.1 is also used to simulate diboson (two weak vector bosons) production [116].

The production of top quark pairs with additional weak vector bosons, $t\bar{t}W$ and $t\bar{t}Z$, is modeled at NLO with MG5_AMC@NLO and the NNPDF3.0NLO PDF set. For these $t\bar{t}V$ samples, PYTHIA 8.210 with the A14 tune is used for parton showering and hadronization. This setup corresponds to the treatment of $t\bar{t}H$.

Five different samples are generated for single top quark production. They describe s- and t-channel single top quark production, as well as the $Wt$, $tZ$ and $tWZ$ topologies. Three of these processes are modeled at NLO with POWHEG-BOX v1 and the CT10 PDF set: s- and t-channel production, as well as $Wt$. PYTHIA 6.428 [117] with parameters set to the Perugia 2012 tune [118] is used for all three samples. The t-channel sample is produced in the 4F scheme. The diagram removal scheme [119] is used to treat the overlap of the $t\bar{t}$ and $Wt$ topologies. The $tZ$ process is instead generated at LO with MG5_AMC@NLO and PYTHIA 6. The $tWZ$ process is also generated with MG5_AMC@NLO, but at NLO and using PYTHIA 8.

Two additional rare processes involving multiple top quarks are the production of $t\bar{t}t\bar{t}$ and $t\bar{t}WW$. Both processes are generated with MG5_AMC@NLO at LO and use PYTHIA 8 for parton showering and hadronization.

Additional backgrounds originate from Higgs boson production mechanisms other than $t\bar{t}H$. The gluon–gluon fusion, vector boson fusion and $VH$ production mechanisms result in topologies very different from $t\bar{t}H$, and are all negligible in the analysis. The rare Higgs boson production with a single top quark is modeled with two samples. MG5_AMC@NLO with the CTEQ6L1 [120] PDF set and HERWIG++ [121] for parton showering and hadronization is used for the production with an additional $W$ boson in the final state, $tWH$.
A 4F scheme simulation is used for the $tHqb$ final state, where $q$ stands for any quark lighter than the bottom quark. The sample for this process is obtained at LO using MG5_AMC@NLO with the CT10 PDF set and PYTHIA 8.

Fake and non-prompt leptons

Photons or jets which are misidentified as light leptons are called fake leptons. Both fake and non-prompt leptons will be referred to as fake leptons in the following. In the dilepton channel of the analysis, this background is estimated from simulation in a control region where two leptons with the same electric charge are required. The dominant contribution to the fake lepton background in this channel arises from single-lepton $t\bar{t}$ events, where one of the two reconstructed leptons is fake and can therefore have the same charge as the real lepton. The fake lepton estimate is normalized to data.

Figure 6.4: Expected distribution of the number of jets per event in the single-lepton channel, compared to data. The uncertainties shown include all sources of systematic uncertainty described in section 6.6, with the exception of the free-floating normalization factors for the $t\bar{t} + \geq 1b$ and $t\bar{t} + \geq 1c$ processes. The $t\bar{t}H$ distribution normalized to the total background is overlaid as a dashed red line.

The single-lepton channel makes use of the matrix method [122] to estimate the fake lepton background with a data-driven technique. This method defines a control region with relaxed lepton requirements, composed of events with real and fake leptons.
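The core of the method can be sketched in its textbook single-lepton form, inverting the two-by-two system that relates loose and tight event counts to the real- and fake-lepton yields. This is a simplification under stated assumptions (global efficiencies instead of the kinematics-dependent weights used in the analysis):

```python
def matrix_method_fakes(n_loose, n_tight, eff_real, eff_fake):
    """Textbook single-lepton matrix method. Given the event counts in a
    loose (relaxed) and a tight (nominal) lepton selection, and the
    efficiencies for real and fake leptons to pass the tight cuts, solve
        n_loose = N_real + N_fake
        n_tight = eff_real * N_real + eff_fake * N_fake
    for the fake-lepton yield in the tight selection."""
    n_fake_loose = (eff_real * n_loose - n_tight) / (eff_real - eff_fake)
    return eff_fake * n_fake_loose
```

In practice the efficiencies are measured as functions of the lepton kinematics, so the subtraction is applied per event via weights rather than on global counts.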
The fractions of events in this control region which also satisfy the nominal analysis requirements are estimated for both real and fake leptons. Events taken from data in the control region are then assigned weights, depending on the lepton kinematics and the measured fractions. These weighted events provide the estimate of fake leptons in the nominal analysis regions. The estimate is statistically consistent with zero events in the three most sensitive single-lepton signal regions (SR$^{\geq 6j}_1$, SR$^{\geq 6j}_2$, SR$^{5j}_1$, see section 6.4), and the contribution from fake leptons is neglected there.

6.3.4 Inclusive modeling of data

The model described in this section can be compared to data in an inclusive region, containing all events entering the single-lepton channel. Figure 6.4 shows the distribution of the number of jets per event, where the model is in good agreement with data. With this inclusive selection, the background is dominated by $t\bar{t}$ + light processes. The uncertainty shown in the figure includes statistical uncertainties and all sources from section 6.6. Uncertainties related to the free-floating normalization factors for the $t\bar{t} + \geq 1b$ and $t\bar{t} + \geq 1c$ processes are not included.

The number of b-tagged jets at the four operating points very tight, tight, medium and loose is shown in figure 6.5. The model is in agreement with data for all operating points, and the $t\bar{t} + \geq 1b$ process dominates the events with many b-tagged jets.
Figure 6.5: Expected distribution of the number of b-tagged jets per event at the four operating points (very tight, tight, medium, loose) in the single-lepton channel, compared to data. The uncertainties shown include all sources of systematic uncertainty described in section 6.6, with the exception of the free-floating normalization factors for the tt̄+≥1b and tt̄+≥1c processes. The tt̄H signal is shown both in the stacked histogram, contributing in red, as well as a dashed red line drawn on top of the stacked histogram.

6.4 Event categorization

Events in both the single-lepton and dilepton channels are divided into multiple exclusive regions. These regions are defined via the number of jets present per event, as well as the number of b-tagged jets at the four operating points. The expected composition of events varies across the regions defined. Regions with large contributions from the tt̄H process are called signal regions. The remaining regions are called control regions, and they serve to constrain backgrounds and systematic uncertainties in the analysis. Both signal and control regions enter the fit to measure the tt̄H signal strength.
The simultaneous use of four calibrated b-tagging operating points is an improvement compared to the previous Run-1 tt̄H(bb̄) ATLAS analysis [88].

6.4.1 Region definitions

The boosted region in the single-lepton channel is defined as described in section 6.2.3. It will also be referred to as SRboosted. All remaining events in the single-lepton channel are split depending on their jet multiplicity; separate regions are constructed for events with exactly five, and six or more jets. For every event, jets are then considered in decreasing order of tightness of the b-tagging operating points they satisfy. The operating points are assigned numerical values 1–5, with 1 representing a jet that is not b-tagged, and 5 a jet tagged at the very tight operating point, as described in section 4.4.2. An event with at least four jets can thus be represented by (b1, b2, b3, b4), with b_i ∈ {1, 2, 3, 4, 5} and b_i ≥ b_{i+1}.

Exactly four b quarks are expected from the tt̄H(bb̄) signal, motivating the creation of signal regions requiring four b-tagged jets. The best signal purity is achieved when using the very tight operating point. In the single-lepton channel, the SR1^{5j} and SR1^{≥6j} signal regions are therefore defined by requiring the first four jets to be b-tagged at the very tight operating point, and the event to contain exactly five, or at least six jets, respectively. These events can be represented by (5, 5, 5, 5).

All remaining regions in the single-lepton channel are defined by grouping together events with similar b-tagging configurations. The SR2^{5j}, SR2^{≥6j} and SR3^{≥6j} signal regions are obtained by merging together configurations enriched in tt̄+≥2b processes. Control regions enriched in tt̄+b, tt̄+≥1c and tt̄+light collect the rest of the events. Three of these control regions are built with events containing exactly five jets: CR_{tt̄+b}^{5j}, CR_{tt̄+≥1c}^{5j}, and CR_{tt̄+light}^{5j}.
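The (b1, b2, b3, b4) representation used above is a simple sort-and-truncate operation. A minimal sketch (hypothetical function names; the actual categorization lives in the analysis framework):

```python
def btag_signature(jet_op_points):
    """jet_op_points: one code per jet, 1 (not b-tagged) to 5 (very tight).
    Returns (b1, b2, b3, b4) with b_i >= b_{i+1}, as used for region assignment."""
    return tuple(sorted(jet_op_points, reverse=True)[:4])

def is_sr1(jet_op_points):
    # The SR1-type signal regions require the first four jets
    # to be tagged at the very tight operating point: (5, 5, 5, 5).
    return btag_signature(jet_op_points) == (5, 5, 5, 5)
```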
The corresponding control regions for events with at least six jets are CR_{tt̄+b}^{≥6j}, CR_{tt̄+≥1c}^{≥6j}, and CR_{tt̄+light}^{≥6j}.

The exact definitions of the eleven resolved regions in the single-lepton channel are summarized in figure 6.6 and figure 6.7 for the regions with exactly five, and at least six jets, respectively. These figures show the b-tagging requirement placed on the first two jets on the vertical axis, and the requirement on the third and fourth jet on the horizontal axis. Signal regions have the most stringent b-tagging requirements. Control regions enriched in tt̄+light have the loosest b-tagging requirements. Regions enriched in tt̄+≥1c are located between the regions enriched in tt̄+light and the signal regions; this is due to the larger mis-tag rate of c-jets compared to light jets.

Figure 6.6: Definition of resolved analysis regions with exactly five jets in the single-lepton channel. The vertical axis shows the b-tagging requirements for the first two jets in each event, while the horizontal axis shows the requirement for the third and fourth jet. Jets are ordered by decreasing tightness of the operating point they satisfy [1].

Figure 6.7: Definition of resolved analysis regions with six or more jets in the single-lepton channel. The vertical axis shows the b-tagging requirements for the first two jets in each event, while the horizontal axis shows the requirement for the third and fourth jet. Jets are ordered by decreasing tightness of the operating point they satisfy [1].

The treatment for the dilepton channel is similar, resulting in three signal regions and four control regions: SR1^{≥4j}, SR2^{≥4j}, SR3^{≥4j}, and CR_{tt̄+≥1b}^{3j}, CR_{tt̄+light}^{3j}, CR_{tt̄+≥1c}^{≥4j}, CR_{tt̄+light}^{≥4j}. The corresponding figures are shown in appendix section A.1.

6.4.2 Region composition and signal contributions

The background composition of all single-lepton regions is visualized in figure 6.8. Signal regions are dominated by tt̄+≥1b production. The remaining regions vary in their composition, from regions dominated by tt̄+light to regions with substantial tt̄+≥1c or tt̄+≥1b contributions. No region is completely dominated by the tt̄+≥1c background. The relative contribution to the total background from non-tt̄ and tt̄V processes is small compared to the tt̄ background.

Figure 6.8: Composition of background processes in the single-lepton regions. Each pie chart shows the relative contributions per process and region, with the processes defined in section 6.3 [1].

Figure 6.9 visualizes the contribution of the signal tt̄H process to the single-lepton regions. The solid black line, corresponding to the left vertical axis, shows the fraction of expected signal events (S) to the total background (B). This fraction is below 1.5% in the control regions, and surpasses 5% only in SR1^{≥6j}. The dashed red line, corresponding to the right vertical axis, shows S/√B, which is also highest in SR1^{≥6j}.

Figure 6.9: Signal contributions per analysis region in the single-lepton channel, evaluated using the expected number of tt̄H events (S) and background events (B) per region. The solid black line, corresponding to the left vertical axis, shows S/B. The dashed red line, corresponding to the right vertical axis, shows S/√B [1].

The corresponding figures for the dilepton channel are shown in appendix section A.1.

6.5 Multivariate analysis techniques

Multivariate analysis techniques are employed in all signal regions to help isolate the tt̄H signal process from the backgrounds. The approach for the tt̄H(bb̄) analysis has two stages. In the first stage, various methods of system reconstruction are performed. A reconstruction BDT (see also section 5.3.1) matches jets to partons to obtain candidates for top quarks and the Higgs boson in each event. The likelihood discriminant (LHD) considers the kinematics across possible jet–parton assignments, and calculates a discriminant for each event. The MEM provides another discriminant, built from first principles. All three methods approach the system reconstruction slightly differently, and their combination results in a stronger overall discriminant. The output from these methods is combined with additional information in the second stage, the classification BDT. Both the reconstruction and classification BDT are trained with the Toolkit for Multivariate Data Analysis with ROOT (TMVA) package [123].
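The analysis trains its BDTs with TMVA; as a language-agnostic illustration of the boosting idea behind such classifiers, the following self-contained sketch implements textbook AdaBoost with decision stumps. This is a generic stand-in, not TMVA's actual configuration or interface:

```python
import math

def train_stumps(X, y, rounds=5):
    """Toy AdaBoost: X is a list of feature vectors, y holds labels in {-1, +1}.
    Each round fits the best single-feature threshold cut ("stump") under the
    current event weights, then up-weights misclassified events."""
    n = len(X)
    w = [1.0 / n] * n
    stumps = []
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for sign in (1, -1):
                    pred = [sign if x[f] > t else -sign for x in X]
                    err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, sign, pred)
        err, f, t, sign, pred = best
        alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-12))
        stumps.append((alpha, f, t, sign))
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, pred, y)]
        norm = sum(w)
        w = [wi / norm for wi in w]
    return stumps

def bdt_score(stumps, x):
    """Weighted vote of the stumps; positive scores are signal-like."""
    return sum(a * (s if x[f] > t else -s) for a, f, t, s in stumps)
```

Real BDT trainings differ in the boosting variant, tree depth, and regularization, but the structure is the same: weak classifiers are combined into a single continuous discriminant per event.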
6.5.1 Reconstruction BDT

The reconstruction BDT is trained to match jets to the partons from the tt̄ and tt̄H system. It is used in the dilepton and resolved single-lepton regions. Candidates for top quarks, W bosons and Higgs bosons are built by combining jets and leptons, and multiple permutations (jet–parton assignments) are possible per event. The use of b-tagging information reduces the total number of permutations considered. For each permutation, invariant masses of object candidates and combinations of them, as well as angular distances, are calculated. They serve as inputs to the reconstruction BDT. The BDT is trained with simulated tt̄H events to identify correct permutations. Besides the BDT output itself, reconstructed quantities in the permutation with the highest BDT output are also used as input to the classification BDT. Two different versions of the reconstruction BDT are used; for one version all observables related to the Higgs boson are removed. This reduces the reconstruction efficiency, but improves the discriminating power for variables related to the Higgs boson, such as the invariant mass of the Higgs boson candidate. In SR1^{≥6j}, the Higgs boson is correctly reconstructed for 48% and 32% of the simulated tt̄H events when using or not using information related to the Higgs boson in the reconstruction BDT, respectively.

The large-R jets in the boosted region simplify the combinatorial problem, and no reconstruction BDT is used in that region. The Higgs boson candidate in this region contains the two jets from the Higgs boson decay for 47% of the tt̄H events.

6.5.2 Likelihood discriminant

The LHD is calculated from various one-dimensional probability density functions, which describe the signal and background distributions of kinematic variables such as invariant masses and angles. It is used in the resolved single-lepton regions.
The probabilities for an event to be consistent with the signal or background hypotheses, p_sig and p_bkg, are calculated as a product of the one-dimensional probability density functions, and averaged over jet–parton permutations. The permutations are weighted with b-tagging information. Two background processes are considered, tt̄+b and tt̄+≥2b; their likelihoods are added in the calculation of p_bkg, weighted by their expected relative contribution in the simulated tt̄ sample. Different distributions are used in regions with exactly five, or six and more jets. An additional hypothesis is included for events where not all of the jets corresponding to the hadronic decay products of the W boson were reconstructed. The output of the LHD for use in the classification BDT is defined as p_sig/(p_sig + p_bkg). In contrast to the reconstruction BDT, the LHD incorporates information from multiple jet–parton permutations in its output. The LHD however does not account for correlations between the kinematic variables used to build the probability density function templates, while both the reconstruction BDT and the MEM take correlations into account. More information about the LHD method is provided in reference [124].

6.5.3 Matrix element method

The MEM provides a strong discriminant between tt̄H and the tt̄+bb̄ background and is described in detail in chapter 7. In contrast to the LHD, the discriminant is calculated from first principles. Due to its large computational cost, the MEM is only used in the most sensitive signal region of the analysis, SR1^{≥6j}. This choice is made to maximize the separation of tt̄H from the backgrounds in the region where it is most important for the analysis.

6.5.4 Classification BDT

The classification BDT is trained to separate tt̄H from the tt̄ background. It combines a range of inputs to achieve good discrimination.
For every input variable, the expected distribution from simulation is compared to data, and only well-modeled variables are used. Different combinations of input variables are used across the nine signal regions; they are listed in reference [1]. The basic inputs are kinematic variables, such as angles between reconstructed objects and invariant masses of combined objects. Information regarding the b-tagging operating points passed by various jets is also included. Additional ingredients are the three intermediate system reconstruction approaches. Information from the reconstruction BDT is used in the dilepton and the resolved single-lepton regions, such as the output of the reconstruction BDT itself, but also the Higgs boson candidate mass. The LHD is used in the resolved single-lepton regions, and the MEM discriminant enters the classification BDT only in SR1^{≥6j}. The LHD and MEM discriminant are the most powerful inputs to the classification BDT, followed by the reconstruction BDT output.

6.6 Systematic uncertainties

The tt̄H(bb̄) analysis is affected by many sources of systematic uncertainty. All sources can generally affect both the normalization and shape of the distributions on which they act. Exceptions to this are the luminosity uncertainty, as well as cross-section and normalization uncertainties on the various processes considered in the analysis. These sources of uncertainty only affect the normalization of the samples they act on. In the case of normalization uncertainties affecting only a specific process, a variation of the related nuisance parameter can however still result in a shape variation of a distribution containing a sum of processes. A nuisance parameter is introduced for each source of uncertainty.

General notes regarding the treatment of systematic uncertainties in the profile likelihood fit are given in section 6.6.1. This is followed by descriptions of the experimental and modeling uncertainties in section 6.6.2 and section 6.6.3.
A summary of all nuisance parameters considered is provided in section 6.6.4.

6.6.1 Nuisance parameter details

The nuisance parameters are implemented with a Gaussian constraint as explained in section 5.1.3. The interpolation between the two templates defining the ±1σ effect of the systematic variation specified by a nuisance parameter is done with polynomial functions. The extrapolation method beyond this range differs for the normalization and shape components. A linear extrapolation is used for the shape component, while the use of an exponential extrapolation for the normalization component prevents the total yield of samples from becoming negative. The normalization component thus effectively behaves as if the extrapolation were linear, with a log-normal constraint.

Some nuisance parameters are defined by a variation of the nominal configuration in only a single direction. An important example in this analysis is the comparison between the nominal tt̄ simulation and a variation where the MC generator is replaced by an alternative setup. In these cases, the variation is defined as the +1σ effect, and the effect of the variation is symmetrized to obtain the corresponding template for the −1σ effect of the nuisance parameter. When both variations of a nuisance parameter are defined, the templates corresponding to the ±1σ effects are usually symmetrized. An exception to this are cross-section and normalization uncertainties.

Besides the symmetrization, a smoothing procedure is applied to the templates defining the ±1σ effects of nuisance parameters, with the exception of most of the cross-section and normalization uncertainties. This procedure removes the effect of statistical fluctuations in the templates, which would otherwise lead to artificially enlarged constraints in the profile likelihood fit.

Lastly, the effect of a nuisance parameter acting on a specific sample in a given region is removed from the likelihood function if this effect is negligible.
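The response functions described at the start of section 6.6.1 can be sketched as follows. This is a simplified stand-in (hypothetical names): it uses a pure exponential response for the normalization component and a piecewise-linear response for the shape component, whereas the actual implementation additionally interpolates with polynomials for |θ| < 1:

```python
def norm_response(theta, up, down):
    """Multiplicative yield factor for a normalization component. `up` and
    `down` are the +1 and -1 sigma factors (e.g. 1.10 and 0.92); the
    exponential form keeps the yield positive for any theta, behaving like
    a log-normal constraint."""
    if theta >= 0.0:
        return up ** theta
    return down ** (-theta)

def shape_response(theta, delta_up, delta_down):
    """Additive per-bin shift for a shape component, linear on either side of
    theta = 0 and extrapolated linearly beyond +/-1 sigma."""
    if theta >= 0.0:
        return theta * delta_up
    return -theta * delta_down
```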
Removing negligible effects speeds up the profile likelihood fit without changing the results. A normalization effect acting on a sample in a given region is removed from the fit model if it is below 1%. The same threshold is used to drop shape effects; a shape effect of 1% corresponds to the template in any bin changing the normalization of the nominal sample by more than 1% relative to the average effect.

Statistical uncertainties on the model

Statistical uncertainties related to the distributions predicted by the nominal model originate from the finite number of simulated events in the MC samples, and the finite number of events in the data-driven fake lepton estimate for the single-lepton channel. These uncertainties reach 12% in two bins considered in the analysis, while the uncertainties in the majority of the remaining bins are significantly below 10%. The nominal model estimate is treated as a subsidiary measurement, with an uncertainty corresponding to the statistical uncertainty in each bin. One nuisance parameter per bin in the analysis is used to describe these statistical uncertainties. A Gaussian constraint controls these statistical variations in each bin, which is a good approximation of Poisson uncertainties for the relatively small statistical uncertainties in this analysis.

6.6.2 Experimental uncertainties

The relative uncertainty on the integrated luminosity of the dataset used in the analysis is 2.1%, derived with a similar method as in reference [50]. A variation of the pile-up modeling is also included to cover related uncertainties.

Leptons

Lepton-related systematic uncertainties have a very small impact on the analysis. For electrons, these cover effects related to the trigger, reconstruction, identification, and isolation efficiencies. Two additional nuisance parameters cover the calibration of the electron energy scale and resolution, for a total of six nuisance parameters related to electrons. The treatment of muons is similar.
The associated nuisance parameters describe uncertainties related to the muon trigger, the association of tracks to vertices, the muon identification, the identification of low-momentum muons, and the muon isolation. The nuisance parameters for muon systematic uncertainties related to efficiency are split into systematic and statistical components of the effects they describe. Muon scale calibration uncertainties are covered by five nuisance parameters, related to the muon momentum scale and resolution, and additional calibrations used. In total, 15 nuisance parameters related to muons in ATLAS are used. Three more nuisance parameters related to tau leptons are considered and found to be negligible.

Jets

A total of 23 nuisance parameters are used to describe sources of systematic uncertainty related to jets. The basic jet energy scale calibration is covered by a set of eight nuisance parameters. Additional nuisance parameters are used to describe uncertainties related to the calibration dependence on jet flavor, the jet position in η, pile-up, jets not contained within the calorimeter system, and jets with high momentum. This results in 20 nuisance parameters related to the jet energy scale. Two nuisance parameters describe uncertainties related to the jet energy resolution. One more nuisance parameter covers uncertainties related to the jet vertex tagger. Since the events considered in the analysis have many jets, the related uncertainties have a significant impact on the analysis.

Flavor tagging

The efficiency to correctly tag b-jets, and the mis-tag rates for c- and light jets, are measured for all operating points used in the analysis and combined into a global calibration. The related uncertainties take into account correlations between different operating points, and depend on the jet transverse momentum. In the case of the light jet mis-tag rate, the calibration is also dependent on the jet pseudorapidity. Uncertainties regarding the b-tagging efficiency are split into 30 sources.
There are 15 sources describing mis-tag rates for c-jets, and 80 nuisance parameters are used for light jet mis-tag rate uncertainties. An additional nuisance parameter is used for jets from hadronic decays of tau leptons. The b-tagging uncertainty ranges between 2% and 10%, and the c- and light jet mis-tag rates have uncertainties in the ranges 5–20% and 10–50%, respectively.

Missing transverse energy

Uncertainties regarding the energy of objects used to calculate the missing transverse energy are propagated to the measured E_T^{miss}. An additional three nuisance parameters describe uncertainties related to energy deposits that enter the E_T^{miss} calculation, but are not associated with any reconstructed objects.

6.6.3 Signal and background modeling

This section summarizes systematic uncertainties related to the nominal model introduced in section 6.3.

tt̄H signal

The tt̄H cross-section uncertainty is split into two components. The first component contains the QCD scale uncertainties (+5.8%/−9.2%), and the second component the uncertainties related to the PDF and strong coupling (±3.6%) [34]. Three nuisance parameters cover the uncertainties related to the Higgs boson branching ratios into bb̄, WW*, and the remaining final states. The absolute uncertainty of the branching ratio for the dominant bb̄ final state is +1.2%/−1.3%. All five of these components do not have a significant effect on the tt̄H shape, so only their normalization effect is considered. An uncertainty regarding the choice of the PS and hadronization model is implemented by comparing the nominal setup, which uses PYTHIA 8, to a sample using HERWIG++.

tt̄+jets background

A large number of systematic uncertainties is associated with the modeling of the tt̄+jets background. The cross-section uncertainty is 6% [105], implemented as one nuisance parameter and correlated for all tt̄ components. All remaining uncertainties related to tt̄ are implemented separately for the different tt̄ components.
Two nuisance parameters control the normalization of the tt̄+≥1b and tt̄+≥1c components, with no constraint applied. The normalization of these components is free-floating in the fit, and referred to as k(tt̄+≥1b) and k(tt̄+≥1c). All samples used to define additional systematic uncertainties are reweighted such that the fractions of tt̄+≥1b, tt̄+≥1c and tt̄+light processes they contain correspond to the nominal tt̄ sample, and such that the tt̄+≥1b sub-components match the SHERPA4F predictions. Dedicated uncertainties describe the tt̄+≥1b sub-component fractions, and the samples used to derive these uncertainties are not reweighted to match the SHERPA4F prediction.

Multiple alternative simulations for the tt̄ background are generated. A tt̄ sample, which will be referred to in the following as SHERPA5F, is generated at NLO with SHERPA 2.2.1 and OPENLOOPS, using the NNPDF3.0NNLO PDF set. This sample is accurate to NLO for up to one additional parton beyond the tt̄ system, and to LO for up to four more partons. The difference between the nominal POWHEG+PYTHIA 8 and the SHERPA5F samples is used as a systematic uncertainty related to the choice of NLO event generator, and it also varies the PS and hadronization model. Another sample is generated with a configuration similar to the nominal POWHEG+PYTHIA 8 sample, but using HERWIG 7 [125], version 7.0.1, instead for parton showering and hadronization. The difference between the POWHEG+PYTHIA 8 and POWHEG+HERWIG 7 samples is used as nuisance parameters related to the choice of PS and hadronization model. Two alternative POWHEG+PYTHIA 8 samples are compared to the nominal POWHEG+PYTHIA 8 sample to evaluate the uncertainty related to initial state radiation (ISR) and FSR.
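The reweighting of alternative samples to the nominal component fractions described above amounts to one multiplicative weight per tt̄ component. A minimal sketch (hypothetical names, illustrative only):

```python
def component_weights(alt_fractions, nominal_fractions):
    """Per-component event weights that make an alternative tt sample
    reproduce the nominal tt+light / tt+>=1c / tt+>=1b fractions; events of
    component `comp` in the alternative sample each receive weight w[comp]."""
    return {comp: nominal_fractions[comp] / alt_fractions[comp]
            for comp in alt_fractions}
```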
In the alternative samples used for the ISR and FSR variations, the renormalization and factorization scales, a parameter in POWHEG controlling extra radiation, as well as parameters in the A14 tune for the PYTHIA 8 shower are set to different values than in the nominal POWHEG+PYTHIA 8 sample. These three uncertainty sources are implemented with nine nuisance parameters, split between the tt̄ components.

Two additional modeling uncertainties for tt̄+≥1b and tt̄+≥1c are constructed from additional MC samples. The residual difference between the POWHEG+PYTHIA 8 and SHERPA4F samples is used as an uncertainty for the tt̄+≥1b sub-components, with the exception of tt̄+b (MPI/FSR), which is not included in the SHERPA4F calculation. This uncertainty covers differences between the 5F and NLO 4F scheme calculations of tt̄+≥1b and tt̄+bb̄. A dedicated tt̄+cc̄ sample is produced in the three-flavor (3F) scheme with MG5_AMC@NLO at NLO, including the effect of massive c quarks. Parton showering and hadronization are performed with HERWIG++ [126]. The difference between this sample and the POWHEG+PYTHIA 8 prediction is used as an additional uncertainty for the tt̄+≥1c process.

The uncertainties related to the fractions of tt̄+≥1b sub-components predicted by SHERPA4F are evaluated by varying parameters within this simulation. Three of the uncertainties are related to the settings of scales within SHERPA. One more nuisance parameter is used to compare two models for the PS. Two nuisance parameters describe the effect of exchanging the nominal CT10 PDF set for the MSTW2008NLO [127] and NNPDF2.3NLO PDF sets. One nuisance parameter varies the settings of the underlying event (UE) modeling. The UE refers to everything not related to the primary hard scattering process of interest in proton–proton collisions. These seven nuisance parameters are used to build the uncertainty band for the SHERPA4F prediction shown in figure 6.3.
An additional 50% normalization uncertainty is added for the tt̄+≥3b sub-component, which covers the difference between the POWHEG+PYTHIA 8 and SHERPA4F predictions. Lastly, a 50% normalization uncertainty is added for the tt̄+b (MPI/FSR) sub-component, which is not described by the SHERPA4F prediction.

A summary of the nuisance parameters for tt̄ modeling is shown in table 6.2. The modeling of tt̄+light processes is described by three nuisance parameters, which compare POWHEG+PYTHIA 8 to SHERPA5F and POWHEG+HERWIG 7, and also include ISR and FSR variations. One additional nuisance parameter is used for tt̄+≥1c, originating from the comparison to the MG5_AMC@NLO+HERWIG++ sample, for a total of four nuisance parameters affecting tt̄+≥1c modeling. The tt̄+≥1b component is covered by 13 nuisance parameters in total, including the three sources also affecting tt̄+light, the comparison to SHERPA4F, seven variations affecting the SHERPA4F prediction, and finally two normalization uncertainties for tt̄+≥3b and tt̄+b (MPI/FSR).

Table 6.2: Systematic uncertainty sources affecting the modeling of tt̄+jets. The left column shows the individual sources. Additional details regarding the sources are given in the central column. The column on the right lists which tt̄ components the sources act on, and whether the effect is correlated between the components. Additional details are provided in section 6.6.3 [1].

Systematic source            | Description                                                          | tt̄ categories
tt̄ cross-section             | Up or down by 6%                                                     | All, correlated
k(tt̄+≥1b)                    | Free-floating tt̄+≥1b normalization                                   | tt̄+≥1b
k(tt̄+≥1c)                    | Free-floating tt̄+≥1c normalization                                   | tt̄+≥1c
SHERPA5F vs. nominal         | NLO event generator choice                                           | All, uncorrelated
PS & hadronization           | POWHEG+HERWIG 7 vs. POWHEG+PYTHIA 8                                  | All, uncorrelated
ISR and FSR                  | Variations of µR, µF, and additional POWHEG and PYTHIA 8 parameters  | All, uncorrelated
tt̄+≥1b SHERPA4F vs. nominal  | Comparison of tt̄+bb̄ NLO (4F) vs. POWHEG+PYTHIA 8 (5F)                | tt̄+≥1b
tt̄+≥1c 3F vs. 5F scheme      | MG5_AMC@NLO+HERWIG++ vs. POWHEG+PYTHIA 8                             | tt̄+≥1c
tt̄+≥1b scale variations      | Three components                                                     | tt̄+≥1b
tt̄+≥1b shower recoil scheme  | Alternative model scheme                                             | tt̄+≥1b
tt̄+≥1b PDF (MSTW)            | Compare MSTW vs. CT10                                                | tt̄+≥1b
tt̄+≥1b PDF (NNPDF)           | Compare NNPDF vs. CT10                                               | tt̄+≥1b
tt̄+≥1b UE                    | Alternative set of tuned parameters for the underlying event         | tt̄+≥1b
tt̄+≥3b normalization         | Up or down by 50%                                                    | tt̄+≥1b
tt̄+≥1b MPI                   | Up or down by 50%                                                    | tt̄+≥1b

Small backgrounds

A 40% normalization uncertainty for W+jets production is used, derived from parameter variations within the SHERPA simulation. The dominant contribution to this uncertainty comes from variations of the renormalization scale. An additional 30% uncertainty is assigned to events containing b- or c-jets, split into two components. One nuisance parameter is used for events with exactly two such jets, another one for events with three or more. This uncertainty covers differences observed when comparing the SHERPA prediction to a sample generated with MG5_AMC@NLO and PYTHIA 8. An uncertainty of 35% is assigned to the normalization of Z+jets events. It is split into three components; events are treated separately if they have exactly three jets, at least four jets and are in the dilepton channel, or fall into the single-lepton channel. A normalization uncertainty of 50% is used for diboson events [116].

The treatment of the tt̄V cross-section uncertainties is equivalent to the treatment of tt̄H. They are split into components for the PDF and scale uncertainties, and not correlated between tt̄W and tt̄Z. For both tt̄W and tt̄Z, the combined impact of the two components is around ±15% [128].
The comparison between the nominal tt̄V samples and samples produced with SHERPA is used as an additional modeling uncertainty, not correlated between tt̄W and tt̄Z.

One normalization uncertainty each is assigned to the cross-sections for s- and t-channel single top quark production, for Wt production, and for tWZ. The cross-section uncertainty for tZ is split into two components, equivalent to the treatment employed for tt̄H and tt̄V. For the Wt and t-channel production processes, additional uncertainties related to the choice of PS and hadronization model are derived by comparing the nominal POWHEG+PYTHIA 6 samples to alternative samples generated with POWHEG and HERWIG++. Two more uncertainties are derived from POWHEG+PYTHIA 6 samples, where the renormalization and factorization scales, as well as settings in the Perugia shower tune, are all varied; this describes variations in ISR and FSR. An additional uncertainty affecting the Wt sample is derived by comparing the nominal diagram removal scheme to the alternative diagram subtraction scheme [119]. In total, the single top quark processes are described by six cross-section uncertainties and five nuisance parameters related to the modeling of the processes.

Seven more nuisance parameters describe cross-section uncertainties for the remaining background processes with minor contributions. A 50% normalization uncertainty is used for tt̄tt̄. The uncertainties for tt̄WW, tWH and tHqb are split into two components each. These components separately describe QCD scale uncertainties and uncertainties related to the PDFs.

Fake and non-prompt leptons

A 50% uncertainty is used for the normalization of the data-driven fake lepton estimate in the single-lepton channel. It is split into six components for this channel, treating events with electrons and muons separately.
Three types of regions are assigned separate nuisance parameters: the boosted region, resolved regions with exactly five jets, and resolved regions with six or more jets. The fake lepton estimate for the dilepton channel is derived from simulation, and a single 25% normalization uncertainty is assigned to it. This results in a total of seven nuisance parameters related to the fake lepton estimate.

6.6.4 Summary of systematic uncertainty sources

Table 6.3 lists all systematic uncertainties affecting the analysis, grouped by their sources. The type of each source indicates whether the nuisance parameter affects only normalization (type N) or both shape and normalization (type S+N) of the samples on which it acts. Many of the uncertainties are broken down into multiple components; the number of components per source is listed in the last column. The cross-sections for the small backgrounds listed at the end of the table affect tt̄tt̄, tZ, tWZ, tt̄WW, tHjb and WtH.

Table 6.3: List of the systematic uncertainties affecting the analysis. The type N indicates uncertainties changing the normalization of the affected process; uncertainties with type S+N can change both shape and normalization. The number of different components per source is listed in the third column [1].

  Systematic uncertainty                        Type   Components
  Experimental uncertainties
    Luminosity                                  N      1
    Pile-up modeling                            S+N    1
  Physics objects
    Electron                                    S+N    6
    Muon                                        S+N    15
    Taus                                        S+N    3
    Jet energy scale                            S+N    20
    Jet energy resolution                       S+N    2
    Jet vertex tagger                           S+N    1
    E_T^{miss}                                  S+N    3
  b-tagging
    Efficiency                                  S+N    30
    Mis-tag rate (c)                            S+N    15
    Mis-tag rate (light)                        S+N    80
    Mis-tag rate (extrapolation c→τ)            S+N    1
  Modeling uncertainties
  Signal
    tt̄H cross-section                           N      2
    H branching fractions                       N      3
    tt̄H modeling                                S+N    1
  tt̄ background
    tt̄ cross-section                            N      1
    tt̄+≥1c normalization (free-floating)        N      1
    tt̄+≥1b normalization (free-floating)        N      1
    tt̄+light modeling                           S+N    3
    tt̄+≥1c modeling                             S+N    4
    tt̄+≥1b modeling                             S+N    13
  Other backgrounds
    W+jets normalization                        N      3
    Z+jets normalization                        N      3
    Diboson normalization                       N      1
    tt̄W cross-section                           N      2
    tt̄Z cross-section                           N      2
    tt̄W modeling                                S+N    1
    tt̄Z modeling                                S+N    1
    Single top cross-section                    N      6
    Single top modeling                         S+N    5
    Small background cross-sections             N      7
    Fake and non-prompt lepton normalization    N      7

6.7 Statistical analysis and results

This section presents the results of the statistical analysis for the tt̄H(bb̄) search. In the statistical analysis, the profile likelihood ratio introduced in section 5.2.2 is maximized, corresponding to a minimization of tμ. The HISTFACTORY [129] software is used to build the likelihood functions for the model described in section 6.3, including the uncertainties described in section 6.6. HISTFACTORY is based on ROOFIT [130] and used together with ROOSTATS [131] tools. It operates on histograms specifying all relevant distributions needed to build the likelihood functions. The creation of these histograms is performed with the TREXFITTER software. This software is developed for internal use in the ATLAS collaboration. It acts as a steering tool for the statistical analysis, and includes a large range of tools used to study the fit model and fit results in detail.
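Schematically, the binned likelihood built from these histograms is a product of Poisson terms, one per bin, multiplied by Gaussian constraint terms for the nuisance parameters. The sketch below is a toy with a single, linear nuisance-parameter response; the actual HISTFACTORY interpolation schemes and constraint types are more elaborate, and all numbers are invented.

```python
import numpy as np
from scipy.stats import norm, poisson

def nll(mu, thetas, data, signal, background, deltas):
    """Negative log-likelihood of a binned model: Poisson terms per bin
    times unit-Gaussian constraints on the nuisance parameters thetas.
    deltas[b, j] is the relative shift of bin b per unit of theta_j
    (a linear stand-in for the HistFactory interpolation)."""
    nu = mu * signal + background * (1.0 + deltas @ thetas)  # expected yields
    return -poisson.logpmf(data, nu).sum() - norm.logpdf(thetas).sum()

# toy inputs: two bins, one constrained nuisance parameter
data = np.array([105, 98])
signal = np.array([5.0, 5.0])
background = np.array([100.0, 95.0])
deltas = np.array([[0.05], [0.02]])

print(nll(1.0, np.zeros(1), data, signal, background, deltas))
```

Minimizing this function over μ and θ simultaneously is what "profiling" refers to: moving θ away from zero can improve the Poisson terms, but only at the price of the constraint term.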
The minimization of tμ is performed with MINUIT [82], implemented in C++ within the ROOT framework [83]. The uncertainties for unconstrained parameters are determined with the MINOS algorithm, which supports asymmetric uncertainties. It varies a parameter in both directions until tμ changes by one unit, thereby obtaining the parameter uncertainties.

The profile likelihood test statistic tμ is constructed as described in section 5.2.2. Systematic uncertainties are implemented according to the prescription from section 5.1.3, with additional details provided in section 6.6.1. All nuisance parameters θ encoding prior knowledge are scaled such that θ̂ ≡ θ0 = 0 and σ̂_θ ≡ Δθ = 1. Upper limits are calculated with the CLs technique described in section 5.2.2.

6.7.1 Fit model details and expected performance

A total of 19 regions enter the simultaneous fit. The four dilepton control regions, as well as CR^{5j}_{tt̄+b}, CR^{5j}_{tt̄+light}, CR^{≥6j}_{tt̄+b}, and CR^{≥6j}_{tt̄+light}, enter as a single bin each. The distribution of the H_T^{had} variable, which is the scalar sum of jet transverse momenta, is used for the CR^{5j}_{tt̄+≥1c} and CR^{≥6j}_{tt̄+≥1c} regions instead. These distributions, with six and eight bins for CR^{5j}_{tt̄+≥1c} and CR^{≥6j}_{tt̄+≥1c}, respectively, allow for additional control over the tt̄+≥1c background. Distributions of the classification BDT are used in all nine signal regions to help isolate the tt̄H signal. The binning of all distributions is optimized for sensitivity, while keeping the statistical uncertainties related to the model below 20%. This avoids possible bias due to statistical fluctuations in the model. Most distributions entering the fit are shown in section 6.7.2, while several regions of the dilepton channel are included in appendix section A.2.

The expected performance of the analysis can be studied by fitting the model to an Asimov dataset as described in section 5.2.3.
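The MINOS-style interval extraction described above can be illustrated with a toy test statistic: walk away from the best-fit point until tμ rises by one unit, then refine the crossing with a root finder. The asymmetric parabola below is an invented stand-in for a real profile likelihood scan.

```python
from scipy.optimize import brentq

def minos_interval(t_mu, mu_hat, step=0.1, level=1.0):
    """Find the points on either side of the minimum where the test
    statistic t_mu has risen by `level` (one unit for 68% intervals)."""
    def crossing(direction):
        mu = mu_hat
        while t_mu(mu + direction * step) < level:
            mu += direction * step
        lo, hi = sorted((mu, mu + direction * step))
        return brentq(lambda m: t_mu(m) - level, lo, hi)
    return crossing(-1.0), crossing(+1.0)

# toy asymmetric parabola mimicking a profile likelihood scan
t_mu = lambda mu: ((mu - 0.84) / (0.64 if mu > 0.84 else 0.61)) ** 2
down, up = minos_interval(t_mu, 0.84)
print(down, up)  # crossings at 0.84 - 0.61 and 0.84 + 0.64
```

The toy widths +0.64/−0.61 are chosen by construction to match the asymmetric uncertainties of the measurement reported below; in a real fit, t_mu would be evaluated by re-minimizing all other parameters at each scanned point.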
This results in an expected signal strength measurement of μtt̄H = 1.00 +0.61/−0.58, while the free-floating normalization factors for the tt̄+≥1b and tt̄+≥1c backgrounds are expected to be measured as k(tt̄+≥1b) = 1.00 +0.09/−0.08 and k(tt̄+≥1c) = 1.00 +0.20/−0.20. The analysis sensitivity is expected to be dominated by the regions in the single-lepton channel, and strongly affected by SR_1^{≥6j}. When only including the single-lepton channel, the signal strength measurement is expected to be μtt̄H = 1.00 +0.68/−0.65. When performing the same fit, but with SR_1^{≥6j} excluded, the uncertainties increase, resulting in μtt̄H = 1.00 +0.85/−0.84. In contrast to this, a removal of SR_2^{≥6j} results in μtt̄H = 1.00 +0.71/−0.68, while removing SR_1^{5j} results in μtt̄H = 1.00 +0.73/−0.70. Table 6.4 summarizes these configurations.

Table 6.4: Expected signal strength measurement in fits to an Asimov dataset. The SR_1^{≥6j} region plays an important role in the overall sensitivity of the analysis.

  Regions included in fit                       μtt̄H
  single-lepton and dilepton regions            1.00 +0.61/−0.58
  single-lepton regions                         1.00 +0.68/−0.65
  single-lepton regions, without SR_2^{≥6j}     1.00 +0.71/−0.68
  single-lepton regions, without SR_1^{5j}      1.00 +0.73/−0.70
  single-lepton regions, without SR_1^{≥6j}     1.00 +0.85/−0.84

6.7.2 Fit to data

The analysis is optimized for sensitivity using the expected signal and background distributions from Asimov datasets. Data is used during the optimization stage only in signal-depleted regions to guide the definition of the background model and its associated uncertainties. This avoids biases in the analysis design due to the knowledge of the distribution of data in regions where significant signal contributions are expected for a SM tt̄H signal. Fits to data that are sensitive to the signal are performed only after finalizing all decisions regarding the analysis design.

When fitting the model to data, the tt̄H signal strength is measured as

    μtt̄H = 0.84 ± 0.29 (stat.) +0.57/−0.54 (syst.) = 0.84 +0.64/−0.61.
(6.1)

It is compatible with the SM prediction. The statistical uncertainty is evaluated in a second fit. For this fit, all nuisance parameters are set to the values minimizing tμ, the post-fit values θ̂. A fit of only the three free-floating parameters μtt̄H, k(tt̄+≥1b), k(tt̄+≥1c) to data is performed, and the resulting uncertainty Δμtt̄H is interpreted as the statistical uncertainty for the signal strength. The systematic uncertainty component reported in equation (6.1) is obtained by subtracting the statistical component in quadrature from the total uncertainty.

The fit results for the free-floating normalization factors k(tt̄+≥1b) and k(tt̄+≥1c) are

    k(tt̄+≥1b) = 1.24 ± 0.10,
    k(tt̄+≥1c) = 1.63 ± 0.23.    (6.2)

Another fit is performed where two signal strength parameters are used, one scaling tt̄H in the dilepton channel, and another one scaling the single-lepton channel. The results of this so-called two-μ fit, as well as the nominal fit result with only a single signal strength parameter, are summarized in figure 6.10. The fit is used to validate the compatibility of the measurement in both channels. The compatibility between the two signal strengths in the two-μ fit is evaluated with a χ² test with one degree of freedom (corresponding to the additional signal strength parameter in the two-μ fit,

[Figure: best-fit values — dilepton (two-μ fit): −0.24 +1.02/−1.05 (stat. +0.54/−0.52, syst. +0.87/−0.91); single lepton (two-μ fit): 0.95 +0.65/−0.62 (stat. ±0.31, syst. +0.57/−0.54); combined: 0.84 +0.64/−0.61 (stat. ±0.29, syst. +0.57/−0.54)]

Figure 6.10: Measurement of the signal strength μtt̄H when fitting the model to data. The two-μ fit is performed by fitting both dilepton and single-lepton channels, with two separate signal strength parameters affecting them.
The nominal fit result, listed in the last row, is obtained by using a single signal strength parameter [1].

compared to the nominal fit). It compares the negative logarithm of the likelihood for the nominal and two-μ fits at the respective best-fit points. The probability to obtain a difference between the two signal strength parameters as large or larger than the one observed is 19%; the results are not incompatible with each other.

Separate fits to the single-lepton and dilepton channels are also performed. When fitting only data in the single-lepton channel, the signal strength is measured as μtt̄H = 0.67 +0.71/−0.69. The fit to the dilepton channel results in μtt̄H = 0.11 +1.36/−1.41. Both of these individual measurements are smaller than the signal strength extracted from the combined fit, but compatible with it. This effect is caused by large correlations between systematic uncertainties affecting both channels.

Distributions before and after the fit

A summary of all single-lepton regions considered in the fit is shown in figure 6.11. Every region is shown as a single bin in this distribution, even though some regions are considered in the fit with multiple bins. The figure includes both the nominal distribution of the model (called pre-fit), as well as the distribution of the model with all parameters set to their best-fit values (called post-fit). For the pre-fit figure, the normalization factors k(tt̄+≥1b) and k(tt̄+≥1c) are both set to unity, and no uncertainty related to them is used to build the total uncertainty band shown in the figure. The post-fit figure contains the effect of their uncertainties, as determined by the fit. It also takes into account the correlations between all nuisance parameters, and the constraints of nuisance parameters. These constraints arise when the nuisance parameter uncertainty determined in the fit is smaller than the uncertainty originating from the associated constraint. In the figure, the tt̄H signal is shown
contributing in red to the stacked histogram, and separately drawn as a dashed red line on top of the stacked histogram. The signal is normalized to the SM prediction pre-fit, and to the best-fit signal strength post-fit.

In some regions, the predicted yield from the model is smaller than the observed amount of events in data. This deficit is corrected well in the fit, as visible in the post-fit distribution. The total model uncertainties are also decreased in the post-fit distribution compared to the pre-fit distribution. This is due to correlations between nuisance parameters and their constraints.

Figure 6.12 shows the equivalent distribution of all regions pre- and post-fit for the dilepton channel. The pre-fit deficit visible in some regions is well adjusted by the fit.

The distributions of the control regions in the single-lepton channel that enter the fit with more than one bin are shown in figure 6.13. Figure 6.14 shows the resolved signal regions with exactly five jets, as well as the boosted region. The resolved signal regions with six or more jets are visualized in figure 6.15. The corresponding distributions for the dilepton regions are found in appendix section A.2. In figure 6.14 and figure 6.15, the tt̄H distribution normalized to the total background is drawn as a dashed red line.

All distributions of the model are in agreement with data within the associated uncertainties both pre- and post-fit. The post-fit uncertainties are reduced due to the correlations between nuisance parameters, and their constraints. The agreement of the model with data is improved post-fit.

The post-fit modeling of data is also studied with distributions that are not directly used in the fit; data is described well by the post-fit model for the wide range of distributions investigated. The post-fit distribution of the MEM discriminant is shown in section 7.4.1.
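The reduction of the post-fit uncertainty band is a propagation effect: once correlations (in particular anti-correlations) between nuisance parameters are taken into account, the individual yield variations no longer add in plain quadrature. A common linear-propagation sketch, with invented numbers:

```python
import numpy as np

def total_band(variations, corr):
    """Linear propagation of per-parameter yield shifts to a total
    uncertainty: sigma^2 = d^T C d, where d[i] is the yield shift for a
    one-(post-fit-)sigma move of nuisance parameter i and C is the
    post-fit correlation matrix."""
    d = np.asarray(variations, dtype=float)
    return float(np.sqrt(d @ np.asarray(corr, dtype=float) @ d))

shifts = [10.0, 6.0]                        # invented yield shifts per parameter
print(total_band(shifts, np.eye(2)))        # uncorrelated: plain quadrature sum
print(total_band(shifts, [[1.0, -0.8],      # anti-correlation shrinks the band,
                          [-0.8, 1.0]]))    # as in the post-fit figures
```

Constraints enter the same formula through the shifts themselves: a constrained parameter has a post-fit sigma smaller than one, so its yield shift d[i] is reduced before propagation.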
Distributions not directly used in the fit are generally still well described post-fit.

6.7.3 Dominant nuisance parameters and sources of uncertainty

The influence of a given nuisance parameter on the fit result can be evaluated by studying the fit with this nuisance parameter fixed to specific values. The impact Δμ of a nuisance parameter on the signal strength is defined as the shift in the signal strength μtt̄H between the nominal fit and a fit with the nuisance parameter held fixed at θ̂ ± x. The pre-fit impact is obtained by considering x = Δθ = 1. Since nuisance parameters may get constrained during the fit, the post-fit impact of a nuisance parameter, evaluated with x = Δθ̂ ≤ 1, can be smaller than its pre-fit impact.

The 20 dominant nuisance parameters in the fit, ranked according to their impact Δμ on the signal strength, are shown in figure 6.16. The pre-fit impact, where the nuisance parameter is fixed to θ̂ ± Δθ, is shown as empty blue and cyan rectangles. The blue rectangle corresponds to fixing the parameter to θ̂ + Δθ; for the cyan rectangle it is fixed to θ̂ − Δθ. Similarly, the filled blue and cyan rectangles show the post-fit impact, obtained from fits with the nuisance parameter fixed to θ̂ ± Δθ̂. The upper axis on the figure shows the scale of the impact Δμ. The pull of a nuisance parameter is defined by comparing its best-fit point θ̂ to its nominal pre-fit value θ0, and dividing the difference by its pre-fit uncertainty. This pull, given by (θ̂ − θ0)/Δθ, is shown by the black points for the nuisance parameters. The lower axis in the figure shows the corresponding scale; all best-fit points θ̂ are within
Figure 6.11: Overview of the yields in all single-lepton regions pre-fit (top) and post-fit (bottom). The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal is shown both in the stacked histogram, contributing in red, as well as a dashed red line drawn on top of the stacked histogram. It is normalized to the SM prediction pre-fit, and the best-fit signal strength value reported in equation (6.1) post-fit [1].

Figure 6.12: Overview of the yields in all dilepton regions pre-fit (top) and post-fit (bottom). The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal is shown both in the stacked histogram, contributing in red, as well as a dashed red line drawn on top of the stacked histogram. It is normalized to the SM prediction pre-fit, and the best-fit signal strength value reported in equation (6.1) post-fit [1].
Figure 6.13: Comparison between data and the model for the control regions CR^{5j}_{tt̄+≥1c} (top) and CR^{≥6j}_{tt̄+≥1c} (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction pre-fit, and the best-fit signal strength value reported in equation (6.1) post-fit. Events with H_T^{had} < 200 GeV or H_T^{had} > 650 GeV are included in the leftmost and rightmost bins of the CR^{5j}_{tt̄+≥1c} distributions, respectively. Similarly, events with H_T^{had} < 200 GeV or H_T^{had} > 1000 GeV are also included in the outermost bins of the CR^{≥6j}_{tt̄+≥1c} distributions [1].
Figure 6.14: Comparison between data and the model for the signal regions SR_1^{5j} (top), SR_2^{5j} (middle) and SR_boosted (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction pre-fit, and the best-fit signal strength value reported in equation (6.1) post-fit.
The tt̄H distribution normalized to the total background is overlaid as a dashed red line [1].

Figure 6.15: Comparison between data and the model for the signal regions SR_1^{≥6j} (top), SR_2^{≥6j} (middle) and SR_3^{≥6j} (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit.
The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction pre-fit, and the best-fit signal strength value reported in equation (6.1) post-fit. The tt̄H distribution normalized to the total background is overlaid as a dashed red line [1].

Figure 6.16: The 20 dominant nuisance parameters in the fit, ranked according to their impact on the signal strength. The empty rectangles correspond to the pre-fit impact, while the filled rectangles show the post-fit impact per nuisance parameter. The upper axis shows the impact Δμ. The pull (θ̂ − θ0)/Δθ of the nuisance parameter is shown as black points, with the vertical black lines visualizing the post-fit nuisance parameter uncertainty Δθ̂ [1].

the pre-fit uncertainties Δθ. The vertical black lines show the post-fit nuisance parameter uncertainty Δθ̂. Neither pre-fit uncertainty Δθ nor nominal value θ0 are defined for the free-floating normalization factor k(tt̄+≥1b), so they are set to Δθ = θ0 = 1 in this figure. The pre-fit impact of k(tt̄+≥1b) is not drawn, since it is not well-defined.
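The impact computation behind this ranking can be sketched with a toy two-parameter fit: fix the nuisance parameter at θ̂ ± Δθ̂ (here ±1 stands in for the post-fit uncertainty), refit μ alone, and take the shift of the best-fit μ. The correlated quadratic below is an invented stand-in for the real likelihood; its correlation coefficient RHO is what generates a non-zero impact.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

RHO = 0.6  # invented mu-theta correlation; drives the size of the impact

def chi2(mu, theta):
    """Toy correlated quadratic standing in for -2 ln(likelihood)."""
    return (mu - 1.0) ** 2 + theta ** 2 - 2.0 * RHO * (mu - 1.0) * theta

mu_hat, theta_hat = minimize(lambda p: chi2(*p), x0=[0.0, 0.0]).x

def impact(theta_fixed):
    """Refit mu with the nuisance parameter held fixed; the resulting
    shift of the best-fit mu is the impact of that parameter."""
    refit = minimize_scalar(lambda mu: chi2(mu, theta_fixed))
    return refit.x - mu_hat

print(impact(theta_hat + 1.0), impact(theta_hat - 1.0))  # +-RHO for this toy
```

An uncorrelated parameter (RHO = 0) would have zero impact, which is why the ranking in figure 6.16 is effectively a ranking of how strongly each nuisance parameter is entangled with the signal strength.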
Nuisance parameters corresponding to statistical uncertainties in the samples, described in section 6.6.1, are excluded from this figure.

The dominant uncertainty source is related to the modeling of tt̄+≥1b, and described by the comparison of the nominal POWHEG+PYTHIA 8 tt̄ sample to the SHERPA5F prediction. Additional uncertainties related to tt̄+≥1b modeling, also derived by comparing the nominal POWHEG+PYTHIA 8 tt̄ to alternative samples, follow in the ranking. Multiple systematic uncertainties related to the modeling of tt̄H show up, and the modeling of tt̄+≥1c and tt̄+light also plays a role, albeit much decreased compared to tt̄+≥1b. When considering experimental sources, the dominant contributions are related to b-tagging and jet energy resolution. The impact of all uncertainties not included in figure 6.16 is small; when removing them from the fit, the total signal strength uncertainty decreases by 5%. The correlations between the most highly ranked nuisance parameters are shown in appendix section A.3.

Uncertainties grouped by source

Table 6.5 shows contributions to the measured total signal strength uncertainty Δμ, grouped by sources of uncertainty. The contributions are obtained by fixing all nuisance parameters within one group to their best-fit values and repeating the fit, resulting in a reduced uncertainty Δμ′ < Δμ. The impact of a group is defined by subtracting this uncertainty in quadrature from the total uncertainty in the nominal fit, √((Δμ)² − (Δμ′)²). Due to correlations between the nuisance parameters in different groups, the quadrature sum of all sources differs from the total uncertainty. The total statistical uncertainty is evaluated by fixing all nuisance parameters except the free-floating normalization factors k(tt̄+≥1b) and k(tt̄+≥1c), thus including their contribution as well as the contribution from data statistics.
In contrast, the intrinsic statistical uncertainty is obtained from a fit where these normalization factors are also fixed to their post-fit values. The statistical uncertainty in background modeling includes both the effects from statistical uncertainties in the nominal MC samples, as well as the data-driven estimation of the fake lepton background in the single-lepton channel.

The modeling of tt̄+≥1b is the dominant source of uncertainty in the analysis and limits its sensitivity. Another large source of uncertainty is the background model statistical uncertainty due to the finite amount of MC events generated. The experimental sources of uncertainty with the largest impact are related to b-tagging and jet calibration. The modeling of the tt̄H signal and tt̄+≥1c play a smaller role, followed by the remaining sources listed.

6.7.4 Validation studies

In order to test the fit model, a dataset of simulated events is built where the tt̄ sample is replaced by a prediction generated with POWHEG+PYTHIA 6. When fitting the nominal model to this dataset, no significant bias in the signal strength measurement is observed.

Dedicated studies are performed to understand the pulls and constraints observed in the fit. The origin of pulls is investigated by splitting them into multiple components, such that different components act on different regions or samples. Their effect on the signal extraction is also studied by excluding bins enriched in tt̄H signal. In general, the pulls serve to correct predictions of tt̄, without biasing the signal strength measurement.

The constraints observed when fitting data are compatible with the constraints expected from fitting the Asimov dataset, and from fitting the dataset built with the POWHEG+PYTHIA 6 prediction for tt̄.

The modeling of tt̄ has a dominant impact on the analysis. Alternative ways to define the tt̄ model and associated systematic uncertainties have been studied, and they lead to compatible results [132].
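The group impacts reported in table 6.5 follow the quadrature-subtraction prescription from the previous section. A one-line sketch, with illustrative numbers chosen to be consistent with the +0.64 total and +0.46 tt̄+≥1b impact quoted there:

```python
import math

def group_impact(total_unc, reduced_unc):
    """Impact of a group of nuisance parameters: fix the group to its
    best-fit values, refit to obtain the reduced uncertainty, and
    subtract in quadrature from the total uncertainty of the nominal
    fit."""
    return math.sqrt(total_unc ** 2 - reduced_unc ** 2)

# total +0.64 shrinking to ~0.445 with the tt+>=1b modeling group fixed
print(round(group_impact(0.64, 0.445), 2))  # ~0.46
```

Because the groups are correlated through the fit, summing these group impacts in quadrature does not reproduce the total uncertainty, as noted in the text.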
Table 6.5: Contributions to the signal strength uncertainty, grouped by sources. The total statistical uncertainty includes effects from the k(tt̄+≥1b) and k(tt̄+≥1c) normalization factors, while the intrinsic statistical uncertainty does not. The background model statistical uncertainty includes effects from statistical uncertainties in nominal MC samples and the data-driven fake lepton estimate in the single-lepton channel [1].

  Uncertainty source                                        Δμ
  Systematic uncertainties
    tt̄H modeling                                            +0.22  −0.05
    tt̄+≥1b modeling                                         +0.46  −0.46
    tt̄+≥1c modeling                                         +0.09  −0.11
    tt̄+light modeling                                       +0.06  −0.03
    Other background modeling                               +0.08  −0.08
    Background model statistical uncertainty                +0.29  −0.31
    b-tagging efficiency and mis-tag rates                  +0.16  −0.16
    Jet energy scale and resolution                         +0.14  −0.14
    Jet vertex tagger, pile-up modeling                     +0.03  −0.05
    Light lepton (e, μ) identification, isolation, trigger  +0.03  −0.04
    Luminosity                                              +0.03  −0.02
  Total systematic uncertainty                              +0.57  −0.54
  Statistical uncertainties
    tt̄+≥1b normalization                                    +0.09  −0.10
    tt̄+≥1c normalization                                    +0.02  −0.03
    Intrinsic statistical uncertainty                       +0.21  −0.20
  Total statistical uncertainty                             +0.29  −0.29
  Total uncertainty                                         +0.64  −0.61

6.7.5 Observed significance and upper limits

The observed signal strength represents an excess over the SM background of 1.4 standard deviations. The expected sensitivity, evaluated from a dataset of simulated events, is 1.6 standard deviations. The dataset used for this calculation corresponds to the post-fit model, with all nuisance parameters adjusted to their best-fit values.

A signal strength μtt̄H > 2.0 is excluded at the 95% confidence level. Figure 6.17 summarizes 95% confidence level upper limits on the signal strength μtt̄H. The dashed black line shows the expected median limit, corresponding to the median of the distribution of pseudo-experiments generated under a background-only hypothesis with μtt̄H = 0. Green and yellow bands correspond to the ranges containing 68% and 95% of these limits.
The expected limit under a hypothesis including the SM tt̄H signal is drawn as a dashed red line. All expected limits are calculated using the dataset of simulated events corresponding to the post-fit model. The observed limits are shown as solid black lines. The limits for the two-μ fits are derived by simultaneously fitting the single-lepton and dilepton channels, but using two independent signal strength parameters for them.

Figure 6.17: 95% confidence level (CL) upper limits on the signal strength μtt̄H, derived in a combined fit to single-lepton and dilepton channels with two independent signal strength parameters (two-μ fit), as well as a fit with a single signal strength (combined fit) [1].

6.7.6 Summary distribution of events

Figure 6.18 shows the post-fit distribution of tt̄H signal (S) and total background (B) events as a function of log10(S/B), as well as the distribution of data. It is built by evaluating log10(S/B) of every bin in the analysis, and combining the bins with similar log10(S/B) to form this distribution. The evaluation is performed with the SM prediction for the signal strength, μtt̄H. The upper panel shows the total background, as well as the contribution from tt̄H signal with signal strength corresponding to its best-fit value in red, and corresponding to the signal strength excluded at the 95% confidence level in orange. The lower panel of the figure shows the difference between data and post-fit background model taken from the nominal fit, divided by its uncertainty (shown with hashed lines). The result of a background-only fit with a fixed signal strength μtt̄H = 0 is shown as a dashed black line. It underestimates the yields observed in data in bins of high log10(S/B).
The red line corresponds to the nominal fit, including the signal strength at its best-fit value. The orange dashed line represents the signal strength scaled to its value excluded at the 95% confidence level. Events with log10(S/B) < −2.7 are included in the first bin.

Figure 6.18: Post-fit yields of signal (S), total background (B), and observed data, shown as a function of log10(S/B). Contributions of the signal, when scaled to its best-fit signal strength value, are drawn in red, while contributions with the signal strength set to its value excluded at the 95% confidence level are drawn in orange. The lower panel shows the difference of observed data and various fit models to the total background taken from the nominal fit [1].

7. The matrix element method for tt̄H(bb̄)

As discussed in section 5.2.2, the optimal test statistic to distinguish between two hypotheses is given by a likelihood ratio. In the tt̄H(bb̄) search, the two hypotheses are distinguished by the signal strength μtt̄H. The signal hypothesis predicts that both tt̄H and background processes take place and result in a set of reconstructed objects in the ATLAS detector for every event. The background hypothesis with μtt̄H = 0 specifies that no observed events are due to the tt̄H process. Events under both of these hypotheses can be generated with MC methods, but the individual likelihoods cannot be evaluated directly. These likelihoods involve the integration over millions of random variables in the description of parton showering, hadronization, and interactions with the detector.
When factorizing the contributions to the likelihood into parton level kinematics and all remaining effects, the likelihood ratio can be estimated directly with machine learning methods [133]. The MEM provides an approximation for the individual likelihoods. They are calculated from first principles, with sufficient simplifications to make the calculation computationally feasible. This method was first used for a top quark measurement by the D0 Collaboration [134], following an original proposal in reference [135]. It has been used in many analyses since, including the $t\bar{t}H(b\bar{b})$ search performed by the ATLAS collaboration in Run-1 of the LHC [88]. An overview of the MEM from an experimental viewpoint is provided in reference [136], and multiple software implementations exist [137, 138].

This chapter describes an implementation of the MEM for the single-lepton channel of the $t\bar{t}H(b\bar{b})$ search with 36.1 fb$^{-1}$ of Run-2 data from the LHC. Details about this search are provided in chapter 6. The implementation defines a discriminant between $t\bar{t}H$ and the dominant $t\bar{t}+b\bar{b}$ background, and is used as an input to the classification BDT in the SR$_1^{\geq 6j}$ region. As the MEM calculation involves an integral over a function that is computationally expensive, the implementation focuses on providing a good discriminant at a reasonable computational cost.

The chapter starts with a general description of the MEM in section 7.1, followed by more specific details regarding the implementation for $t\bar{t}H(b\bar{b})$ in section 7.2. Technical aspects of the MEM calculation are discussed in section 7.3. The performance and the modeling of data are summarized in section 7.4, and section 7.5 describes additional studies where the method is used to reconstruct the $t\bar{t}H$ and $t\bar{t}+b\bar{b}$ system.

7.1 The matrix element method

The MEM approximates the probability density $f(\vec{X}|\alpha)$ for an observed set of reconstructed objects $\vec{X}$ in an event and hypothesis $\alpha$. Two hypotheses are considered here. In the signal hypothesis, all
events are produced via the signal process $S$. The background hypothesis specifies that events are only produced via background processes $B$. The corresponding likelihoods are $L_S$ and $L_B$, respectively.

If the signal process exists, events should generally be produced via both signal and background processes. The aim of the method described in this chapter is, however, not the test of $\mu_{t\bar{t}H} > 0$ versus $\mu_{t\bar{t}H} = 0$, where both hypotheses include contributions from background processes. Instead, the likelihood ratio $L_S/L_B$ is used as a discriminant to distinguish whether any observed event is more compatible with having been produced by a $t\bar{t}H$ process than with having been produced by a background process.

The full expression for $f(\vec{X}|\alpha)$ involves the description of parton production, their showering and hadronization, and the subsequent detector interactions and the reconstruction into observed objects. The MEM factorizes the contributions into a parton level process, and groups together all remaining contributions. These two components will be described in the following, resulting in an approximate expression for the likelihoods $L_S$ and $L_B$.

7.1.1 Parton level

The probability density is proportional to the differential cross-section $d\sigma$, which can be written as [8]
\[
d\sigma_\alpha\left(p_a, p_b, \vec{Y}\right) = \frac{1}{F}\, (2\pi)^4\, \delta^4\!\left(p_a + p_b - \sum_{i=1}^{N} p_i\right) \left|\mathcal{M}_\alpha\left(p_a, p_b, \vec{Y}\right)\right|^2 d\vec{\Phi}_{\vec{Y}} \tag{7.1}
\]
at the parton level for hypothesis $\alpha$. The squared matrix element, $|\mathcal{M}_\alpha|^2$, describes the transition probability from the interacting initial states $p_a$ and $p_b$ to the $N$ final state partons $p_i$, all characterized by their four-momenta. The flux factor $F$ in the laboratory frame is given by $F = 4\sqrt{(p_a \cdot p_b)^2 - m_a^2 m_b^2}$. The Dirac delta distribution $\delta^4$ enforces conservation of four-momentum. In the following, $p_{\text{net}} = p_a + p_b - \sum_i p_i$ will be used.
The final state configuration at parton level is called $\vec{Y}$, and is described by
\[
d\vec{\Phi}_{\vec{Y}} = \prod_{i=1}^{N} \frac{d^3\vec{p}_i}{(2\pi)^3\, 2E_i}, \tag{7.2}
\]
with three degrees of freedom per on-shell final state parton, and a total of $N$ final state partons.

The initial states $p_a$ and $p_b$ carry unknown momentum fractions $x_1$ and $x_2$ in proton–proton collisions. As described in section 2.2.3, the differential cross-section can be factorized into the hard scatter differential cross-section $d\sigma_\alpha(p_a, p_b, \vec{Y})$, and contributions from PDFs. The resulting differential cross-section,
\[
d\sigma_\alpha\left(\vec{Y}\right) = \sum_{j,k} \int_{x_1, x_2} f_j(x_1)\, f_k(x_2)\, d\sigma_\alpha\left(p_a, p_b, \vec{Y}\right) dx_1\, dx_2, \tag{7.3}
\]
is marginalized over the colliding partons by summing over all flavors $j,k$ and integrating over all allowed momentum fractions $x_1, x_2$. The total cross-section is obtained when also performing the integral over all final state configurations,
\[
\sigma_\alpha = \int_{\vec{Y}} d\sigma_\alpha\left(\vec{Y}\right). \tag{7.4}
\]
The parton-level probability density is given by
\[
f\left(\vec{Y}|\alpha\right) = \frac{1}{\sigma_\alpha}\, d\sigma_\alpha\left(\vec{Y}\right). \tag{7.5}
\]

7.1.2 Reconstructed objects

The correspondence between parton level and reconstructed objects $\vec{X}$ is described by a transfer function and a sum over all possible permutations of objects, $\sum_{\text{perm.}} T(\vec{X}|\vec{Y})$. The permutations describe assignments between partons and reconstructed objects. The joint probability density $f(\vec{X},\vec{Y}|\alpha)$ for an observed set of reconstructed objects $\vec{X}$, parton-level configuration $\vec{Y}$, and hypothesis $\alpha$ is written as
\[
f\left(\vec{X},\vec{Y}|\alpha\right) = \left[\sum_{\text{perm.}} T\left(\vec{X}|\vec{Y}\right)\right] f\left(\vec{Y}|\alpha\right), \tag{7.6}
\]
factorizing into a term describing the parton level, and another term describing all other effects. The corresponding likelihood for the observation $\vec{X}$ to be consistent with hypothesis $\alpha$ is obtained when performing the marginalization over the space of parton level configurations,
\[
L_\alpha = P\left(\vec{X}|\alpha\right) = \frac{1}{\sigma_\alpha} \sum_{\text{perm.}} \int_{\vec{Y}} T\left(\vec{X}|\vec{Y}\right) d\sigma_\alpha\left(\vec{Y}\right). \tag{7.7}
\]
The expression for the likelihood in equation (7.7) can be used to build the likelihood ratio $L_S/L_B$, which in turn provides a discriminant between signal and background events.

7.2 General approach for $t\bar{t}H(b\bar{b})$

The MEM relies on the calculation of likelihoods of the form given in equation (7.7). They are calculated for both the signal $t\bar{t}H(b\bar{b})$ process and the dominant background in the SR$_1^{\geq 6j}$ region, which is $t\bar{t}+b\bar{b}$. Both processes have $N = 8$ final state partons at LO, described by the likelihood
\[
L_\alpha = \frac{1}{\sigma_\alpha} \sum_{\text{perm.}} \sum_{j,k} \int \frac{1}{F}\, T\left(\vec{X}|\vec{Y}\right) f_j(x_1)\, f_k(x_2)\, (2\pi)^4\, \delta^4\!\left(p_{\text{net}}\right) \left|\mathcal{M}_\alpha\right|^2 dx_1\, dx_2 \prod_{i=1}^{8} \frac{d^3\vec{p}_i}{(2\pi)^3\, 2E_i}. \tag{7.8}
\]
This likelihood requires a sum over the type of initial partons colliding, and an integration over their two momentum fractions. It also includes an integration over 24 kinematic degrees of freedom of the final state partons, and four constraints for four-momentum conservation. After resolving the delta distribution, an integration over a 22-dimensional phase space remains. The computation of this integral is not feasible when having to consider many different assignments of reconstructed objects to partons, while having to perform this calculation for millions of events. Several approximations simplify the calculation significantly.

7.2.1 Permutations

The likelihood calculation relies on the description of the relation between partons and reconstructed objects by the transfer function. A unique identification of the parton from which a given reconstructed object originated is generally not possible, and contributions from all possible assignments need to be considered. They are described by the sum $\sum_{\text{perm.}}$ in equation (7.8).

Both the number of partons and the number of reconstructed objects can in general vary across events. Only LO processes are considered here, which guarantees exactly eight final state partons in every event.
All events in the SR$_1^{\geq 6j}$ region have at least six reconstructed jets and exactly one reconstructed light charged lepton. There is no ambiguity in the assignment of the single reconstructed lepton to the parton level lepton. It is possible to uniquely assign one jet to each quark in every permutation, but there may be events where no jet corresponding to a given quark is reconstructed. Such cases can be addressed by integrating over the degrees of freedom describing the jet. This is computationally expensive and not done here. An example is briefly discussed in section 7.5.1.

Events can have more than six reconstructed jets due to higher order corrections to the LO matrix element calculation, pile-up interactions, or imperfect object reconstruction in the detector. For such events, it is in principle possible to sum over contributions from each possible subset of six jets. In each subset, multiple permutations need to be considered. The combinatorial complexity quickly increases for events with more than six jets, and it becomes computationally prohibitive to consider all possible permutations. The computational cost of the likelihood in equation (7.8) scales linearly with the number of permutations considered. Only a limited number of permutations is thus considered in practice.

Jet selection strategy

Four jets need to be assigned to $b$ quarks in every permutation, and a unique set of four jets is chosen for every event. Jets are ordered by decreasing tightness of the $b$-tagging operating point they satisfy. In the SR$_1^{\geq 6j}$ region, at least four jets are guaranteed to pass the very tight operating point. The ambiguity in the ordering between jets satisfying the same operating point is resolved by additionally ordering such jets in decreasing order of their transverse momentum.
The top four jets in this ranking are the selected $b$-jets, which are assigned to $b$ quarks in the permutations considered.

Two more jets need to be selected and assigned to the quarks originating from the decay of the $W$ boson. The invariant mass $m_{jj}$ of every pair of two jets, excluding jets that were already selected as $b$-jets, is calculated. For every pair, the distance $|m_W - m_{jj}|$, with $m_W = 80.4$ GeV, is calculated. The pair that minimizes this distance is selected and assigned to the two quarks from the $W$ boson decay. These two jets are called the selected light jets.

Jet permutations

There is no ambiguity in the assignment of the selected light jets to the quarks originating from the $W$ boson decay. The four selected $b$-jets need to be assigned to the four $b$ quarks. Two are assigned to the decay products of the Higgs boson, and one each is assigned to the $b$ quarks originating from the decay of both top quarks. There are four possible quarks to assign each selected $b$-jet to. The likelihood is unchanged when exchanging the jets assigned to the Higgs boson decay products. This results in $4!/2 = 12$ possible permutations that need to be considered for every event.

When not relying on $b$-tagging information in the jet selection strategy, a selection of six jets could be made, and these jets could be assigned to all six quarks. The likelihood is unchanged when exchanging jets assigned to the Higgs boson decay products, and when exchanging those assigned to the $W$ boson final states. There are thus $6!/(2 \cdot 2) = 180$ permutations for every event in this case. This would slow down the likelihood calculation by more than an order of magnitude; the use of $b$-tagging information is essential to control the computational cost.

Permutations could also be considered only if they satisfy kinematic constraints.
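The light-jet pairing and the counting of the 12 $b$-jet assignments described above can be sketched as follows; this is a minimal illustration, with the four-vector format $(E, p_x, p_y, p_z)$ and the jet labels chosen as assumptions for the example, not taken from the analysis code.

```python
from itertools import combinations, permutations
import math

M_W = 80.4  # GeV, W boson mass used in the light-jet pairing

def inv_mass(p1, p2):
    """Invariant mass of the sum of two four-vectors (E, px, py, pz)."""
    e, px, py, pz = (p1[i] + p2[i] for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def select_light_jets(jets):
    """Among the jets not selected as b-jets, pick the pair whose
    invariant mass is closest to m_W."""
    return min(
        combinations(jets, 2),
        key=lambda pair: abs(M_W - inv_mass(jets[pair[0]], jets[pair[1]])),
    )

def b_assignments(b_jets):
    """Enumerate assignments of four b-jets to (Higgs b, Higgs bbar,
    b from t, bbar from tbar). The Higgs pair is treated as unordered,
    since swapping its two jets leaves the likelihood unchanged,
    giving 4!/2 = 12 distinct assignments."""
    seen = set()
    for h1, h2, b_t, b_tbar in permutations(b_jets, 4):
        key = (frozenset((h1, h2)), b_t, b_tbar)
        if key not in seen:
            seen.add(key)
            yield h1, h2, b_t, b_tbar
```

Enumerating `b_assignments` over four jets yields exactly the 12 permutations quoted in the text.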
When calculating the $t\bar{t}H$ signal likelihood, permutations could be rejected if the invariant mass of the two $b$-jets assigned to the decay of the Higgs boson is very different from the Higgs boson mass. No such kinematic constraints are used in this implementation.

7.2.2 Transfer function

The transfer function is assumed to factorize into components, with one component per reconstructed and selected object,
\[
\sum_{\text{perm.}} T\left(\vec{X}|\vec{Y}\right) = W^{\text{lep}}\left(p^{l,\text{reco}}|p^{l}\right) W^{\nu}\left(p^{\nu,\text{reco}}|p^{\nu}\right) \left[\sum_{\text{perm.}} \prod_{i=1}^{6} W^{\text{jet}}_i\left(p^{\text{jet}_i}|p^{q_i}\right)\right]. \tag{7.9}
\]
It describes multiple assignments of reconstructed objects to partons, with one-to-one correspondences between objects in each permutation. When calculating the likelihood for a given event, specified by a set of reconstructed objects $\vec{X}$, contributions from regions of the parton level phase space are generally suppressed by the transfer function if the parton level kinematics $\vec{Y}$ are very different from the measurement $\vec{X}$. The quantities $p^{l}$, $p^{\nu}$, and $p^{q}$ refer to the lepton, neutrino, and quark four-vectors on parton level, while $p^{l,\text{reco}}$, $p^{\nu,\text{reco}}$, and $p^{\text{jet}}$ are the corresponding quantities for the associated reconstructed objects.

In the case of charged leptons, the measurement of the ATLAS detector $p^{l,\text{reco}}$ agrees well with the parton level lepton kinematics $p^{l}$. The corresponding component of the transfer function is approximated by a delta distribution, $W^{\text{lep}}(p^{l,\text{reco}}|p^{l}) = \delta^3(\vec{p}^{\,l,\text{reco}} - \vec{p}^{\,l})$. Leptons both on parton level and as reconstructed objects are treated as massless.

There is no direct measurement of the neutrino four-momentum $p^{\nu,\text{reco}}$. Its kinematics can be constrained by requiring four-momentum conservation. It can also be related to the $E_T^{\text{miss}}$ measurement of the ATLAS detector, which is not done in the implementation described here. The transfer function component $W^{\nu}(p^{\nu,\text{reco}}|p^{\nu})$ is treated as a uniform distribution.

The detector resolution for jet measurements is worse than for leptons.
The measured jet direction is approximated to correspond exactly to the parton direction, and is described by a delta distribution. The relation between measured jet energy and parton energy is described by a transfer function component. Details about this treatment are provided in section 7.3.4.

7.2.3 Remaining degrees of freedom

With these chosen approximations for the transfer function, the dimensionality of the integral in equation (7.8) can be reduced by resolving all delta distributions. The integral is performed over a 22-dimensional phase space. The lepton kinematics between reconstructed lepton and parton level lepton are assumed to match exactly, reducing the dimensionality by three. The directions of all six jets are also assumed to match exactly between parton and jet, resulting in an additional reduction by two degrees of freedom for each of the six jets. After inserting and resolving these delta distributions in equation (7.8), an integral over a seven-dimensional phase space remains. This integral has to be evaluated for each of the 12 permutations considered in every event.

Further possible simplifications

The integral may be simplified further when considering details of the kinematics in $t\bar{t}H$ and $t\bar{t}+b\bar{b}$ processes. The total width of the Higgs boson is predicted to be $\Gamma_H = 4$ MeV by the SM [10], significantly below the resolution of the ATLAS detector. The $t\bar{t}H$ matrix element significantly suppresses contributions from phase space regions where $m_{bb} - m_H \gg \Gamma_H$. In this expression, $m_{bb}$ is the invariant mass of the two $b$ quarks originating from the Higgs boson decay. Contributions to the likelihood from such regions are negligible.
An additional constraint $\delta(m_{bb} - m_H)$ can be introduced to reduce the degrees of freedom by one when calculating the $t\bar{t}H$ signal likelihood.

The decay widths of $W$ bosons and top quarks are three orders of magnitude larger than that of the Higgs boson, but similar arguments can be applied to reduce the degrees of freedom in the integration further. Particularly interesting is the decay of the $W$ boson into leptons. The charged lepton is assumed to be well measured, and the transverse components of the neutrino can be constrained by requiring four-momentum conservation in each event. One degree of freedom remains for the neutrino when treating it as massless and on-shell. When forcing the $W$ boson to decay exactly on-shell, the resulting quadratic equation can be solved for the neutrino momentum in the $z$ direction.

The two approaches were tested, and neither was found to improve the performance of the method. Consequently, neither of these additional simplifications is adopted for the $t\bar{t}H(b\bar{b})$ analysis. The special treatment of the Higgs boson decay is only possible when calculating the $t\bar{t}H$ likelihood, and thus leads to a different approach than in the calculation of the $t\bar{t}+b\bar{b}$ likelihood. It results in a slight loss in discrimination power of the likelihood ratio. Requiring an on-shell decay of the $W$ boson to leptons reduces the degrees of freedom in the integral required for the likelihood calculation, but also introduces two solutions that both need to be considered separately. This slows down the calculation, and slightly decreases discrimination power when implemented for both the $t\bar{t}H$ and $t\bar{t}+b\bar{b}$ likelihoods. When only using this approximation in one of the likelihoods, but not the other, the discrimination power is reduced more significantly.

7.2.4 Likelihoods and discriminant

For every event, the signal and background likelihoods are calculated according to equation (7.8).
The signal likelihood $L_S$ is calculated using a $t\bar{t}H$ matrix element, while the background likelihood $L_B$ is calculated for the $t\bar{t}+b\bar{b}$ process. In the calculation, several of the constant factors in the expression for the likelihood are not included, and additional constant factors are applied to prevent numerical underflow. The logarithms of the likelihoods, $\log_{10}(L_S)$ and $\log_{10}(L_B)$, visualized in section 7.4, are therefore equal to the logarithm of the likelihood in equation (7.8) only up to a constant shift. The resulting calculated variables can thus not be interpreted as a probability density $P(\vec{X}|\alpha)$ without suitable normalization. This has no impact on the performance of the method.

The MEM discriminant is defined as the logarithm of the likelihood ratio,
\[
\text{MEM}_{D1} = \log_{10}(L_S) - \log_{10}(L_B). \tag{7.10}
\]
It is designed to discriminate between events originating from the $t\bar{t}H$ process, which are characterized by large values of $\text{MEM}_{D1}$, and events from the $t\bar{t}+b\bar{b}$ process. The $t\bar{t}+b\bar{b}$ process dominates the SR$_1^{\geq 6j}$ region, where the discriminant is used, and contributions from other processes are small. While the $\text{MEM}_{D1}$ discriminant is not designed to discriminate against processes other than $t\bar{t}+b\bar{b}$, it still helps discriminate against them.

7.2.5 Systematic uncertainties

No systematic uncertainty inherent to the calculation of $\text{MEM}_{D1}$ is used in the $t\bar{t}H(b\bar{b})$ analysis. The discriminant is a function of event properties, like any other multivariate analysis technique. It is calculated in the same way for both data and simulated events. An incorrect implementation of the likelihood calculation does not invalidate the method, but decreases its performance. Uncertainties covering the difference between the approximate event likelihood calculated via the MEM and the true likelihood are therefore not needed. The method relies on the correct modeling of kinematic features present in data.
All systematic uncertainties described in section 6.6 are therefore propagated through the MEM calculation, as well as through all other multivariate techniques used in the analysis.

7.3 Technical implementation

The likelihood calculation is steered by a framework based on reference [139]. It allows for the implementation of the integrand in equation (7.8) in both CUDA [140] and OPENCL [141]. The calculation can be performed both with central processing units (CPUs) and graphics processing units (GPUs). In both architectures, the integral evaluation is parallelized across many cores. Initial studies for the method were mostly performed with GPUs and CUDA, while the final implementation used OPENCL on CPUs. The choice was motivated by the availability of resources. This implementation calculates both likelihoods on an Intel Xeon E5-2650 v2 CPU in 1.8 s. Around one million events per day can be processed on a cluster with 368 cores, which was used for the majority of the calculations.

This section lists additional details regarding the implementation of the integration, the PDFs, the matrix elements, and the transfer function.

7.3.1 Integration

The phase space integral in equation (7.8) is performed with MC methods, using the VEGAS algorithm [142]. An integral over a volume $V$ can be approximated by sampling the integrand $f(\vec{x})$ with $N$ points drawn from a probability density $g(\vec{x})$,
\[
\int_V f(\vec{x})\, d\vec{x} = \lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \frac{f(\vec{x}_i)}{g(\vec{x}_i)}. \tag{7.11}
\]
The standard deviation of the approximation decreases as $1/\sqrt{N}$, independent of the dimensionality of the integral. The VEGAS algorithm is adaptive and focuses on sampling regions where the integrand is large. It starts out using a uniform probability density $g(\vec{x})$, and the integration volume $V$ is divided into a uniform grid of hypercubes. The size of the hypercubes is then adjusted, resulting in small hypercubes where the integrand is large, and large hypercubes where it is small.
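The estimator in equation (7.11) can be illustrated with a minimal importance-sampling sketch; VEGAS additionally adapts the sampling density $g(\vec{x})$ between rounds, which is omitted in this one-dimensional toy example.

```python
import random

def mc_integrate(f, sample, g, n=100_000, seed=0):
    """Importance-sampling MC estimate of an integral as in eq. (7.11):
    (1/N) * sum f(x_i)/g(x_i), with the x_i drawn from the density g.
    The statistical error falls as 1/sqrt(N), independent of the
    dimensionality of the integral."""
    rng = random.Random(seed)
    points = (sample(rng) for _ in range(n))
    return sum(f(x) / g(x) for x in points) / n

# Toy check: the integral of 3x^2 over [0, 1] is exactly 1,
# sampled here with a uniform density g(x) = 1.
estimate = mc_integrate(lambda x: 3 * x * x, lambda rng: rng.random(), lambda x: 1.0)
```

With a uniform density the estimator reduces to the plain MC average; an adaptive algorithm such as VEGAS instead concentrates the points where the integrand is large.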
The number of integration points per hypercube remains the same, and the hypercube density is equivalent to the probability density $g(\vec{x})$. The algorithm thus focuses the sampling on the regions that contribute most to the total integral.

Phase space details

As discussed in section 7.2.3, seven degrees of freedom remain to be integrated over in the likelihood evaluation. They are chosen to be the energies of the six quarks, as well as the momentum of the neutrino in the $z$ direction. The momentum of the neutrino in the transverse directions is chosen such that the total transverse momentum of the LO parton system vanishes. The directions of the quarks are fixed to the directions of the jets they are assigned to in each permutation. The jet four-momenta are built from their measured energies, as well as their positions in the $\eta$ and $\phi$ coordinates. Their transverse momenta are adjusted such that the invariant mass of selected $b$-jets is 4.7 GeV, corresponding to the $b$ quark mass in the matrix element calculation, and the invariant mass of selected light jets vanishes. The same treatment is employed for the reconstructed lepton, which receives a vanishing invariant mass.

The integration is performed over the six quark energies $E_q$, with the integration volume restricted by a requirement depending on the energy of the jet assigned to the quark, $E_q \in I_t(E^{\text{jet}})$. The interval $I_t$ is centered around the measured jet energy, and depends on the jet type $t$, with a different definition for $b$- and light jets. The definition of $I_t$ is determined from the transfer function described in section 7.3.4. A distinction is made between $b$- and light jets, as a different parametrization is used for the respective transfer function components. The interval is defined as
\[
I_t = \left[E^{\text{jet}} \cdot (1 - r),\ E^{\text{jet}} \cdot (1 + r)\right], \tag{7.12}
\]
with the parameter $r$ defined separately for $b$- and light jets as $r_b$ and $r_l$, respectively.
These parameters are
\[
r_b = a \cdot \left(\frac{3.73}{E^{\text{jet}}} + 0.0736\right), \qquad
r_l = a \cdot \left(\frac{3.19}{E^{\text{jet}}} + 0.0856\right). \tag{7.13}
\]
The setting $a = 5$ is used, which limits the integration volume for faster evaluation of the integral, while not restricting the volume so much as to compromise the performance of the $\text{MEM}_{D1}$ discriminant. The integration volume is furthermore restricted to regions with positive parton energy. Contributions from outside the integration volume considered are suppressed by the transfer function, and thus only make a small contribution to the likelihoods.

For light jets, the intervals contain more than 99% of the quark energy probability density distribution, $W(E_q|E^{\text{jet}})$, for any given jet energy. They contain around 80% for $b$-jets with $E^{\text{jet}} = 25$ GeV, 90% for 50 GeV, and around 95% for $b$-jets with 100 GeV $< E^{\text{jet}} <$ 500 GeV.

The integration range for the neutrino momentum in the $z$ direction is restricted to the interval $p_z^\nu \in [-1\ \text{TeV}, 1\ \text{TeV}]$ in the laboratory frame.

Alternative parameterizations

The choice of the six quark energies as integration variables aligns the peak structure of the integrand, which is induced by the transfer function, with these integration variables. This is desirable for faster convergence of the integration. Additional peaks of the integrand due to the decay of intermediate particles into the final state objects are not aligned with the integration variables. It is possible to perform a change of integration variables to instead align the integration variables with these intermediate resonances. The integral over the energies of the two $b$ quarks originating from the Higgs boson decay can be replaced by an integral over the energy of one decay product and the invariant mass of the Higgs boson. No significant improvement in computation speed or in the power of the $\text{MEM}_{D1}$ discriminant is observed in this case. A similar transformation can be done for the two quarks from the decay of the $W$ boson, again with no improvement in performance observed.
Neither of these alternative integration variable choices is adopted in the final implementation of the method.

Integration process

The integral over the seven degrees of freedom of equation (7.8) is performed with VEGAS, and optimized to provide a good discriminant within a reasonable amount of time. This requirement corresponds to being able to process around a million events per day.

The integrations for the signal and background likelihoods are performed separately, and each of the 12 permutations per likelihood is also integrated separately. The integration procedure consists of multiple phases. Each phase can contain several rounds of integration. In one round of integration, up to 1024 phase space points are evaluated. The VEGAS grid of hypercubes is updated after every round to provide an improved sampling of the integrand in subsequent rounds.

The first phase consists of a single round of integration. The result for the integral obtained from this round of integration is not used any further, but this phase provides a first update to the hypercube grid. Subsequent integration rounds therefore yield better results.

The main integration phase follows, where three rounds of integration are performed. An estimate for the integral is obtained by combining the results from the three rounds. If the estimated uncertainty for the integral is below 1%, the integration process stops. This is the case only for a small fraction of events.

After the main integration phase, the contributions to the signal and background likelihoods from each permutation are compared. The likelihoods are $L_\alpha$ for the two hypotheses, and consist of a sum of contributions $L_\alpha^i$ per permutation $i$, which are all evaluated in separate integrals. If the likelihood of an individual permutation $i$ is less than 1% of the largest likelihood for any permutation for this hypothesis, no further integration is performed for this permutation $i$.
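The 1% pruning rule applied between the main and final integration phases can be sketched as follows; the permutation labels and the container format are illustrative assumptions.

```python
def prune_permutations(contributions, threshold=0.01):
    """Drop permutations whose likelihood contribution is below
    `threshold` times the largest single-permutation contribution;
    only the surviving permutations are refined further in the
    final integration phase."""
    cutoff = threshold * max(contributions.values())
    return {perm: value for perm, value in contributions.items() if value >= cutoff}
```

Since the per-permutation contributions typically span several orders of magnitude, this removes most of the negligible integrals while leaving the total likelihood essentially unchanged.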
The contribution from such a permutation to $L_\alpha$ is small, and precision for this permutation is thus less important. Generally, the individual contributions to the likelihood range over multiple orders of magnitude, and the removal of small contributions speeds up the last integration phase, while not compromising the accuracy of the full likelihood.

The final integration phase consists of up to three more rounds of integration for all permutations that were not pruned in the previous phase. The integration process is stopped early if the estimated uncertainty for the integral is below 1% after any of these rounds.

The total likelihood is then obtained by summing the individual contributions from all 12 permutations for both the signal and background hypotheses.

7.3.2 Matrix elements

The matrix elements for the $t\bar{t}H$ and $t\bar{t}+b\bar{b}$ processes are provided by MADGRAPH5_AMC@NLO [95]. Only LO Feynman diagrams are considered; the extension to NLO is challenging and comes at a significant computational cost [143]. The intermediate top quark pair in every diagram is required to decay to $W$ bosons and bottom quarks. Only single-lepton final states are considered, and the $W^+$ boson in all diagrams is forced to decay to leptons, while the $W^-$ boson decays to quarks. The same matrix elements are used regardless of the charge and flavor of the lepton, which is equivalent to the assumption of lepton universality and invariance under charge conjugation.

Two topologies of diagrams can be distinguished: those initiated by interactions of gluons, and diagrams with a quark–antiquark pair in the initial state. The total cross-section for both processes is dominated by contributions from gluon–gluon diagrams. Figure 7.1 visualizes examples of the $t\bar{t}H$ topologies.
The three diagrams on the left are initiated by gluon–gluon interactions, while the diagram on the right is initiated by a quark–antiquark interaction.

Figure 7.1: Topologies for $t\bar{t}H$ production. The three diagrams on the left are initiated by gluons and are considered in the MEM calculation, while the quark–antiquark topology in the diagram on the right is neglected.

In the MEM calculation, diagrams initiated by quark–antiquark interactions are not included, and only gluon–gluon interactions are considered. This is done for both the $t\bar{t}H$ and $t\bar{t}+b\bar{b}$ matrix elements. The calculation time is reduced by around 30% with this removal of diagrams, without significantly impacting the discrimination power of the likelihood ratio.

The matrix element code provided by MADGRAPH5_AMC@NLO is optimized for performance. Non-physical helicity settings for the particles involved in the interaction are removed from the calculation. They do not contribute to the matrix element, and their removal results in a significant decrease in calculation time.

The renormalization scale is set to
\[
\mu_R = \sqrt{\left(\sum_{i=1}^{8} E_i\right)^2 - \left(\sum_{i=1}^{8} p_{z,i}\right)^2}, \tag{7.14}
\]
where the sum runs over the eight final state partons produced. This choice has a negligible impact. The top quark mass in the calculation is set to 173 GeV, the Higgs boson mass is 125 GeV, the $W$ boson mass is 80.42 GeV, and the bottom quark mass is 4.7 GeV.

Signal hypothesis

The matrix element for the $t\bar{t}H$ signal contains diagrams with an intermediate top quark pair and Higgs boson. The Higgs boson is forced to decay to a pair of bottom quarks.
Diagrams initiated by quark–antiquark interactions are manually removed from the calculation.

Background hypothesis

The matrix element for $t\bar{t}+b\bar{b}$ describes the production of a pair of top quarks and a pair of bottom quarks. Only diagrams initiated by gluon–gluon interactions are considered.

7.3.3 Parton distribution functions

PDFs are obtained from the CT10 PDF set [16], via an interface provided by LHAPDF [17]. The PDF distributions are saved to a grid, parameterized via the momentum fraction $x$ carried by the parton and the momentum transfer $Q^2$. The integrand in the likelihood calculation uses this PDF grid with a linear interpolation, instead of evaluating the PDF via LHAPDF for every phase space point. This speeds up the calculation at a negligible loss of accuracy. The factorization scale choice is equivalent to the renormalization scale choice, listed in equation (7.14). No significant difference is observed when instead using a factorization scale setting of $\mu_F = 173$ GeV.

7.3.4 Transfer function

The implementation of the transfer function used in the likelihood calculation provides the relation between the energies of reconstructed jets observed in the ATLAS detector and the energies of the quarks they are associated to, in the form $W(E^{\text{jet}}|E_q)$. It suppresses contributions to the likelihood from parton level configurations that are very different from the kinematics of the associated reconstructed jets. The transfer function component $W(E^{\text{jet}}|E_q)$ is derived separately for $b$- and light jets, denoted as $W_b(E^{\text{jet}}|E_q)$ and $W_{\text{light}}(E^{\text{jet}}|E_q)$, respectively.

These distributions are derived with a sample of $t\bar{t}$ events generated with POWHEG+PYTHIA 6 [100–103, 117]. Events considered in the derivation need to have at least four jets, out of which at least two are $b$-tagged at the loose operating point. The distributions are subsequently validated with the nominal POWHEG+PYTHIA 8 sample used in the $t\bar{t}H(b\bar{b})$ analysis, which is described in section 6.3.2.
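The grid-based PDF lookup with linear interpolation described in section 7.3.3 can be sketched as a bilinear interpolation in $(x, Q^2)$; the grid layout and function signature are illustrative assumptions, standing in for the actual LHAPDF-derived tables.

```python
import bisect

def interp_pdf(grid_x, grid_q2, values, x, q2):
    """Bilinear interpolation of a tabulated PDF on an (x, Q^2) grid,
    replacing a full LHAPDF evaluation at every phase space point.
    values[i][j] holds the PDF value at (grid_x[i], grid_q2[j])."""
    # Find the lower-left grid cell containing (x, q2), clamped to the grid.
    i = min(max(bisect.bisect_right(grid_x, x) - 1, 0), len(grid_x) - 2)
    j = min(max(bisect.bisect_right(grid_q2, q2) - 1, 0), len(grid_q2) - 2)
    tx = (x - grid_x[i]) / (grid_x[i + 1] - grid_x[i])
    tq = (q2 - grid_q2[j]) / (grid_q2[j + 1] - grid_q2[j])
    return ((1 - tx) * (1 - tq) * values[i][j]
            + tx * (1 - tq) * values[i + 1][j]
            + (1 - tx) * tq * values[i][j + 1]
            + tx * tq * values[i + 1][j + 1])
```

Evaluating the interpolant is a handful of arithmetic operations per point, which is what makes the grid lookup much cheaper than a library call in the innermost integration loop.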
They describe the POWHEG+PYTHIA 8 sample well. Jets are matched to quarks produced in the decay of the top quark pair system by requiring their distance to be $\Delta R < 0.3$. If multiple quarks are within this distance of the jet, the jet is matched to the quark that is closest in $\Delta R$. The components $W_b(E^{\text{jet}}|E_q)$ and $W_{\text{light}}(E^{\text{jet}}|E_q)$ are derived in a fit of the $E_q - E^{\text{jet}}$ distribution, using $b$ quarks or the lighter first and second generation quarks, respectively. Two different functional forms are used to describe $b$- and light jets.

Light jets

Double Gaussian distributions describe the transfer function component for light jets, parameterized as
\[
W_{\text{light}}\left(E^{\text{jet}}, E_q\right) = \frac{1}{\sqrt{2\pi}\,(\sigma_1 + A\sigma_2)} \left[\exp\left(-\frac{\left(E_q - E^{\text{jet}} - \mu_1\right)^2}{2\sigma_1^2}\right) + A \cdot \exp\left(-\frac{\left(E_q - E^{\text{jet}} - \mu_2\right)^2}{2\sigma_2^2}\right)\right], \tag{7.15}
\]
with the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$, $A$ determined by the fit.

The resulting transfer function component $W_{\text{light}}(E^{\text{jet}}, E_q)$ is visualized in figure 7.2 with solid lines, shown as a function of quark energy for various light jet energies. The distribution $W_{\text{light}}$ suppresses the integrand in the MEM likelihood calculation in phase space regions where the quark energy is very different from the measured energy of the jet it is associated to. The width of the distribution increases with jet energy.

$b$-jets

The description for $b$-jets makes use of a crystal ball function [144, 145], which provides better fit results for such jets than the double Gaussian function. The core of the crystal ball distribution is Gaussian, and it includes a tail region described by a power law. The transfer function component is described by
\[
W_b\left(E^{\text{jet}}, E_q\right) = \frac{1}{\sigma(A+B)}
\begin{cases}
\exp\left(-\dfrac{\left(E_q - E^{\text{jet}} - \mu\right)^2}{2\sigma^2}\right), & \dfrac{E_q - E^{\text{jet}} - \mu}{\sigma} < \alpha, \\[3mm]
C\left(D + \dfrac{E_q - E^{\text{jet}} - \mu}{\sigma}\right)^{-n}, & \dfrac{E_q - E^{\text{jet}} - \mu}{\sigma} \geq \alpha,
\end{cases} \tag{7.16}
\]
Figure 7.2: Transfer function components for b- and light jets, shown as a function of the energy E_q of the quark a jet is associated to. The distributions for light jets, described by a double Gaussian as W_light, are shown as solid lines for different jet energies. The corresponding distributions W_b for b-jets are shown as dashed lines, and are described by crystal ball functions.

with A, B, C, D defined as

A = n / (|α| (n − 1)) · exp(−|α|²/2),
B = √(π/2) · (1 + erf(|α|/√2)),
C = (n/|α|)ⁿ · exp(−|α|²/2),
D = n/|α| − |α|.  (7.17)

The error function is defined as erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt. The parameters µ, σ, α, n are determined by the fit.

The fit result for W_b(E_jet, E_q) is shown in figure 7.2 with dashed lines for various b-jet energies. Compared to the double Gaussian used to describe light jets, the power law behavior is visible in the tail for high jet energies. This tail describes jets with a measured energy that underestimates the energy of the associated quark. It is larger for b-jets than for light jets due to the production of muons and neutrinos in the decay of b-hadrons.

7.4 Results

This section summarizes the results obtained with the MEM implementation described in this chapter. The separation of a classifier, such as the MEM_D1 likelihood ratio, is defined by [123]

⟨S²⟩ = (1/2) · Σᵢ₌₁ᴺ (Sᵢ − Bᵢ)² / (Sᵢ + Bᵢ),  (7.18)

evaluated using a distribution with N bins. The normalized signal and background yield per bin is denoted as Sᵢ and Bᵢ, with normalization condition Σᵢ Sᵢ = Σᵢ Bᵢ = 1. For identical distributions of signal and background, the separation is ⟨S²⟩ = 0.
Distributions that do not overlap in any bin have ⟨S²⟩ = 1, which is the maximum possible separation. The separation depends on the choice of binning, and the values listed in this section are calculated using distributions with eight bins.

The MEM_D1 discriminant as defined in equation (7.10) is used in the SR_1^{≥6j} region of the tt̄H(bb̄) analysis as an input to the classification BDT described in section 6.5.4. A sigmoid function is used for most distributions shown in this section to map the discriminant into the interval [0, 1]. The function is (1 + exp(−MEM_D1 − 4))⁻¹, indicated on the relevant figures.

The expected distributions of signal and background processes for all figures in this section are obtained with the model described in section 6.3, unless specified otherwise.

7.4.1 Results for the SR_1^{≥6j} signal region

Figure 7.3 shows the logarithm of the signal and background likelihoods described in section 7.2.4 for the SR_1^{≥6j} region. The normalized distributions of the tt̄H(bb̄) signal and tt̄+bb̄ background are drawn with dashed red and solid blue lines, respectively. The contributions from the signal have on average a larger signal likelihood L_S than the tt̄+bb̄ background. The distributions for the background likelihoods L_B are very similar. The MEM_D1 discriminant for the two processes is visualized in figure 7.4. The left side shows the discriminant directly, while the right side shows the transformed version used in the following. The likelihood ratio L_S/L_B is on average larger for the tt̄H signal than the background, hence the contribution from the signal peaks towards the right of the distribution, while the tt̄+bb̄ background peaks at a lower value of the discriminant. The separation between tt̄H(bb̄) and tt̄+bb̄ is ⟨S²⟩ = 13% for the discriminant and ⟨S²⟩ = 14% for its transformed version. For a signal efficiency of 50%, the tt̄+bb̄ efficiency is 24%. This corresponds to a rejection factor of four when correctly identifying every second tt̄H event.
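The separation values quoted in this section follow equation (7.18); together with the sigmoid mapping above, a minimal sketch (the toy histograms are illustrative, not the analysis distributions):

```python
import math

def separation(sig, bkg):
    """Separation <S^2> of eq. (7.18) for binned yields; inputs are normalized to unit integral."""
    s_tot, b_tot = sum(sig), sum(bkg)
    sig = [s / s_tot for s in sig]
    bkg = [b / b_tot for b in bkg]
    return 0.5 * sum((s - b) ** 2 / (s + b) for s, b in zip(sig, bkg) if s + b > 0)

def transform(mem_d1):
    """Sigmoid mapping of the MEM_D1 discriminant into the interval [0, 1]."""
    return 1.0 / (1.0 + math.exp(-mem_d1 - 4.0))

# identical histograms give <S^2> = 0, fully disjoint histograms give <S^2> = 1
print(separation([1, 2, 3], [1, 2, 3]))  # 0.0
print(separation([1, 0, 0], [0, 0, 1]))  # 1.0
```

The binning dependence mentioned above enters because the sums in equation (7.18) run over whatever bins are chosen for the two histograms.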
When including all background processes, and tt̄H decays into the remaining final states besides H → bb̄, the separation between tt̄H and the total background reduces to 12% for both distributions. The separation achieved by the classification BDT, shown in figure 6.15, reaches ⟨S²⟩ = 20%.

A comparison between data and the model is provided in figure 7.5. It shows the logarithms of the signal and background likelihoods, as well as the transformed MEM_D1 discriminant. The pre-fit model is shown on the left, and the post-fit model, obtained from the combined fit described in section 6.7.2, is on the right. The model is in good agreement with data within its associated uncertainties, both pre- and post-fit.

Figure 7.3: Distribution of the tt̄H(bb̄) signal and tt̄+bb̄ background processes in the SR_1^{≥6j} region as a function of the logarithms of the signal and background likelihoods, L_S and L_B. Both processes are normalized to have unit integral. The left- and rightmost bins of the distributions include all events with likelihoods smaller or larger than the edge of these bins, respectively.

Figure 7.4: Distribution of the tt̄H(bb̄) signal and tt̄+bb̄ background processes in the SR_1^{≥6j} region, both normalized to have unit integral. The left figure shows the distributions as a function of the MEM_D1 likelihood ratio, while the right figure shows the transformed version of this variable.
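Returning briefly to section 7.3.4, the two transfer-function parameterizations of equations (7.15) and (7.16)–(7.17) can be sketched as below. The parameter values in the test calls are illustrative placeholders, not the fitted values from the analysis:

```python
import math

def w_light(dE, mu1, sigma1, mu2, sigma2, A):
    """Double Gaussian of eq. (7.15), with dE = E_q - E_jet."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * (sigma1 + A * sigma2))
    return norm * (math.exp(-((dE - mu1) ** 2) / (2.0 * sigma1 ** 2))
                   + A * math.exp(-((dE - mu2) ** 2) / (2.0 * sigma2 ** 2)))

def w_b(dE, mu, sigma, alpha, n):
    """Crystal ball function of eqs. (7.16)-(7.17), with dE = E_q - E_jet."""
    A = n / (abs(alpha) * (n - 1.0)) * math.exp(-alpha ** 2 / 2.0)
    B = math.sqrt(math.pi / 2.0) * (1.0 + math.erf(abs(alpha) / math.sqrt(2.0)))
    C = (n / abs(alpha)) ** n * math.exp(-alpha ** 2 / 2.0)
    D = n / abs(alpha) - abs(alpha)
    t = (dE - mu) / sigma
    # Gaussian core below alpha, power-law tail at and above alpha
    core = math.exp(-t ** 2 / 2.0) if t < alpha else C * (D + t) ** (-n)
    return core / (sigma * (A + B))
```

With the normalization constants of equation (7.17), both components integrate to unity over dE, and the crystal ball core and tail join continuously at (E_q − E_jet − µ)/σ = α.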
Figure 7.5: Comparison between data and the model for the logarithm of the signal likelihood (top), the logarithm of the background likelihood (middle) and the transformed MEM_D1 discriminant (bottom). The figures on the left show the pre-fit model. The post-fit model on the right is obtained from the fit described in section 6.7.2. The uncertainty bands include all sources of systematic uncertainty described in section 6.6. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit.
The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction pre-fit, and to the best-fit signal strength value reported in equation (6.1) post-fit. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. The left- and rightmost bins of the likelihood distributions include all events with likelihoods smaller or larger than the edge of these bins, respectively.

7.4.2 Modeling in validation regions

The MEM likelihoods are calculated in additional regions to validate the modeling of data. These regions are SR_2^{≥6j}, SR_3^{≥6j} and CR_{tt̄+≥1c}^{≥6j}, corresponding to the remaining regions with at least six jets in the tt̄H(bb̄) analysis that are considered in the combined fit with more than one bin each. The b-jet selection strategy is slightly modified from the description in section 7.2.1. The set of jets satisfying the loose b-tagging operating point is ordered by decreasing transverse momentum, and the top four jets are used as selected b-jets.

The number of MC events in these regions is significantly larger than the number of events in the SR_1^{≥6j} region, resulting in increased computation time needed to process all systematic variations for all events in these regions. For this reason, only the dominant systematic uncertainties are evaluated. They include all modeling uncertainties related to the tt̄H signal and the tt̄ background. An exception is the tt̄+≥1b sub-component uncertainty derived from the comparison of POWHEG+PYTHIA 8 and SHERPA4F, which is not evaluated. Modeling uncertainties related to small backgrounds, as well as experimental uncertainties, are not considered.

Figure 7.6 shows the logarithms of the signal and background likelihoods, as well as the transformed MEM_D1 discriminant, in the two signal regions SR_2^{≥6j} and SR_3^{≥6j}.
All distributions are shown pre-fit, as the post-fit model cannot be evaluated without processing all variations defining systematic uncertainties. The model generally describes data well, but underestimates data in the SR_3^{≥6j} region. The shape of data and the model agrees for all distributions, validating the MEM implementation. Figure 7.7 shows the same distributions for the CR_{tt̄+≥1c}^{≥6j} region. No issues in the modeling of the MEM distributions are observed. The separation between tt̄H and the total background is ⟨S²⟩ = 8% in SR_2^{≥6j}, 6% in SR_3^{≥6j}, and 5% in CR_{tt̄+≥1c}^{≥6j}.

7.4.3 Comparison to other methods

The MEM implementation provides a strong discriminant between tt̄H and tt̄+bb̄ in the SR_1^{≥6j} signal region. In the tt̄H(bb̄) analysis, it is combined with the additional multivariate techniques described in section 6.5. The three methods (reconstruction BDT, LHD, and MEM) are complementary.

Compared to the LHD and reconstruction BDT approaches, the MEM does not rely on having sufficiently many simulated events available to derive the LHD template distributions and for BDT training. The transfer function in this implementation is derived from a MC sample. It can instead be approximated by considering the detector resolution, which can be estimated without the need for dedicated MC samples for the processes considered in a specific analysis.

The results of the MEM as implemented here depend on MC samples only through the transfer function. The LHD templates and BDT training can bias these methods towards features specific to the samples used to build the templates and for training. When using these methods, it is therefore important to validate that all relevant features are also present in data.

Both the reconstruction BDT and the MEM use correlations between objects, which the LHD does not. The LHD and MEM both combine information from multiple permutations (assignments of jets to partons) into the resulting discriminants.

Figure 7.6: Comparison between data and the model for the logarithm of the signal likelihood (top), the logarithm of the background likelihood (middle) and the transformed MEM_D1 discriminant (bottom) in the SR_2^{≥6j} region (left) and the SR_3^{≥6j} region (right). The uncertainty bands only include sources related to tt̄H and tt̄ modeling, with the exception of the tt̄+≥1b sub-component uncertainty derived from SHERPA4F. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. The left- and rightmost bins of the likelihood distributions include all events with likelihoods smaller or larger than the edge of these bins, respectively.

Figure 7.7: Comparison between data and the model for the logarithm of the signal likelihood (top left), the logarithm of the background likelihood (top right) and the transformed MEM_D1 discriminant (bottom) in the CR_{tt̄+≥1c}^{≥6j} region. The uncertainty bands only include sources related to tt̄H and tt̄ modeling, with the exception of the tt̄+≥1b sub-component uncertainty derived from SHERPA4F. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. The left- and rightmost bins of the likelihood distributions include all events with likelihoods smaller or larger than the edge of these bins, respectively.
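Jet-to-parton permutations of the kind combined by the LHD and the MEM can be enumerated as follows. This is not the thesis implementation; the parton labels follow the nomenclature of section 7.5.1, and the halving via a canonical ordering of the two Higgs-candidate jets assumes the likelihood is symmetric under that exchange:

```python
from itertools import permutations

# Parton slots of the LO ttH single-lepton topology (section 7.5.1 nomenclature).
SLOTS = ["b1_from_H", "b2_from_H", "b_from_tlep", "b_from_thad", "q1_from_W", "q2_from_W"]

def jet_assignments(jet_indices):
    """Yield all assignments of six jets to the six parton slots.

    Assignments that only swap the two Higgs-candidate jets are dropped,
    assuming the likelihood is symmetric under that exchange.
    """
    for perm in permutations(jet_indices):
        if perm[0] < perm[1]:  # canonical ordering of the two Higgs-candidate jets
            yield dict(zip(SLOTS, perm))

n = sum(1 for _ in jet_assignments(range(6)))
print(n)  # 360, i.e. 6! / 2 assignments
```

The number of permutations is what drives the computational cost of the MEM, since each assignment requires its own phase space integration.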
Additional permutations are considered in the case of the LHD, which also includes hypotheses beyond those considered in this MEM implementation. Adding further hypotheses to the MEM is conceptually straightforward, but computationally expensive. Increasing the dimensionality of the integral in the likelihoods to account for objects that were not reconstructed results in significantly increased integration times.

7.5 System reconstruction with the matrix element method

The MEM likelihoods are evaluated per jet permutation before being summed together. The assignment of jets to quarks in the permutation with the highest likelihood can be considered to study properties of the reconstructed tt̄H and tt̄+bb̄ systems. The results summarized in this section are obtained in a region defined by requiring at least six jets, out of which at least four are b-tagged at the tight operating point. Events for the tt̄H process are simulated with MADGRAPH5_AMC@NLO [95] and HERWIG++ [121]. The tt̄ background is simulated with POWHEG+PYTHIA 6 [100–103, 117]; the sample used is enriched with contributions from tt̄+≥1b.

7.5.1 Assignment efficiency

The true origin of a jet is defined by matching it to quarks. A jet is matched to any of the six quarks expected from the LO tt̄H system in the single-lepton channel if the jet is within ∆R < 0.3 of the quark. This matching is called truth-matching. Each jet may be truth-matched to multiple quarks.

If a jet in the highest-likelihood permutation is assigned to a quark that it is also truth-matched to, then the assignment by the MEM is declared to be correct. This is done no matter whether the jet is also truth-matched to additional quarks or not. The MEM assignment efficiency can be calculated with these definitions.

The confusion matrix in figure 7.8 summarizes the MEM performance for object assignments. It is evaluated on a sample of tt̄H events, using the tt̄H likelihood.
The rows correspond to the six different quarks needed for the LO tt̄H topology. The columns list the true origin of each jet, determined by truth-matching. The last column contains jets that are not truth-matched to any quark in the LO tt̄H system. Quarks from the decay of the Higgs boson and those from the decay of the W boson are distinguished according to their transverse momenta. The following nomenclature is used:

• b1 from H: b quark from the Higgs boson decay with the highest transverse momentum,
• b2 from H: second b quark from the Higgs boson decay,
• b from tlep: b quark from the top quark decay tlep → Wb → lνb, with the W boson decaying to leptons,
• b from thad: b quark from the top quark decay thad → Wb → qq̄b, with the W boson decaying to quarks,
• q1 from W: quark from the W boson decay with the highest transverse momentum,
• q2 from W: second quark from the W boson decay.

Figure 7.8: Assignment efficiency of jets to quarks in the permutation with the largest tt̄H likelihood, evaluated with a sample of tt̄H events. The rows correspond to the quark each jet is matched to, while the columns describe the true jet origin. Jets may be truth-matched to multiple quarks.

The correct assignments of the selected b-jets to the four b quarks in the tt̄H system are found with an efficiency of slightly over 40%. The chance to assign a selected b-jet to the wrong b quark is around 10–20% per quark. The b-jet selection is very efficient, as the four selected b-jets are truth-matched to the four b quarks for around 85–90% of all events. This selection efficiency is evaluated with figure 7.8.
It is obtained by summing the assignment efficiency of jets truth-matched to b quarks from the tt̄H system and assigned to b quarks in the permutation with the highest tt̄H likelihood. Only a small fraction of events contain jets that are truth-matched to b quarks but assigned to W boson decay products. The assignment efficiency for the quark q2 is significantly smaller than the remaining assignment efficiencies. The majority of jets assigned to q2 are not truth-matched to any of the quarks from the tt̄H system. The jet originating from q2 is frequently not reconstructed, as it fails to satisfy the pT > 25 GeV threshold. Both of the jets truth-matched to the decay products q1 and q2 of the W boson are present in only around 50% of the events. The sum of assignment efficiencies across each row is slightly larger than 100%; this is due to jets that are truth-matched to multiple quarks. Reconstruction efficiencies for the b quarks from top quark decays and the quarks from W boson decays are similar when considering the assignment efficiency for tt̄ events and using the tt̄+bb̄ likelihood.

Discriminant dependence on assignment efficiency

The MEM_D1 discrimination power is largest when all reconstructed objects associated to the LO tt̄H and tt̄+bb̄ systems are present in an event and selected for the jet permutations in the MEM calculation. The separation ⟨S²⟩ increases by 20% when evaluating the discriminant only for events with exactly six jets, instead of events with six or more jets. This is due to the simplified jet selection, which results in increased assignment efficiencies. When considering events with exactly six jets, and also requiring that at least one jet in every event is truth-matched to the quark from the W boson decay with lower transverse momentum, the separation increases by 30%.

The low assignment efficiency for the hadronic decay products of the W boson adversely affects the performance of the MEM_D1 discriminant.
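The ∆R-based truth matching of section 7.5.1 can be sketched as follows, with a minimal (η, φ) representation of jets and quarks assumed for illustration:

```python
import math

def delta_r(a, b):
    """Angular distance between two objects given as (eta, phi) tuples."""
    deta = a[0] - b[0]
    # wrap the azimuthal difference into [-pi, pi)
    dphi = (a[1] - b[1] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(deta, dphi)

def truth_match(jet, quarks, max_dr=0.3):
    """Return indices of all quarks within Delta R < max_dr of the jet.

    A jet may be matched to several quarks, as in section 7.5.1.
    """
    return [i for i, q in enumerate(quarks) if delta_r(jet, q) < max_dr]

quarks = [(0.50, 1.00), (0.55, 1.10), (-2.00, 0.00)]
print(truth_match((0.52, 1.05), quarks))  # [0, 1]: two nearby quarks match
```

The multiple matches returned here are exactly why the assignment efficiencies in figure 7.8 can sum to slightly more than 100% per row.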
An additional topology can be considered, where instead of integrating over the energies of the two quarks resulting from the decay of the W boson, the integral is performed over the W boson directly. This increases the dimensionality of the integral by one, and the assignment of jets to the W boson is no longer needed. The separation obtained in a test where the integration is performed over the three momentum components px, py, pz of the W boson is only ⟨S²⟩ = 1%. The integration for this configuration is significantly slower than for the nominal configuration, and this hypothesis is not used in the final implementation of the MEM for the tt̄H(bb̄) analysis.

7.5.2 Object reconstruction

The two jets assigned to the b quarks from the Higgs boson decay in the permutation with the highest tt̄H likelihood can be combined and interpreted as a reconstructed Higgs boson. Similarly, the two jets assigned to the two b quarks produced in association with the top quark pair in the tt̄+bb̄ process can be combined in the permutation with the highest tt̄+bb̄ likelihood. The invariant mass distributions of the resulting reconstructed objects are visualized in figure 7.9. The Higgs boson mass is shown on the left, and the mass of the bb̄ system in the tt̄+bb̄ case is on the right. The figures show distributions for both the tt̄H signal as a dashed red line and the tt̄ background as a solid blue line, both normalized to unit integral.

The tt̄H process shows a peak in the reconstructed Higgs boson invariant mass, located around the 125 GeV Higgs boson mass used in the simulation. The peak for the tt̄ background is less pronounced. This peak appears also for the tt̄ background since the likelihood of a permutation where the invariant mass of the two jets is very different from the Higgs boson mass is suppressed. The suppression can take place via the transfer function in phase space regions where the invariant mass of the quarks from the Higgs boson decay is close to the Higgs boson mass.
In regions where their invariant mass is very different from the Higgs boson mass, the transfer function may be larger, but the matrix element is smaller. The permutation chosen to calculate the reconstructed Higgs boson mass is the one with the largest likelihood, which is therefore biased towards selecting a configuration where the invariant mass of the jets assigned to the Higgs boson is close to the Higgs boson mass.

The tt̄+bb̄ likelihood is not biased by the Higgs boson propagator. The distribution of the reconstructed mass of the bb̄ system in the permutation with the highest tt̄+bb̄ likelihood is qualitatively different for the tt̄H and tt̄ distributions. The tt̄ distribution is smoothly falling after a threshold effect due to the jet transverse momentum requirement of pT > 25 GeV in the analysis. The distribution of the tt̄H signal peaks at higher values. If the assignment of jets to b quarks originating from top quark decays is performed correctly, the remaining jets assigned to b quarks will on average have a larger invariant mass for tt̄H events than for tt̄ events, since they originate from Higgs boson decays (if correctly selected).

Figure 7.9: Reconstructed invariant mass of the bb̄ system produced in association with the top quark pair. The figure on the left shows the invariant mass of the two jets assigned to the b quarks from the Higgs boson decay in the permutation with the highest tt̄H likelihood; this quantity is interpreted as the reconstructed Higgs boson mass. The figure on the right shows the bb̄ system assigned to b quarks that do not originate from top quark decays in the tt̄+bb̄ topology, using the permutation with the highest tt̄+bb̄ likelihood. Distributions of the tt̄H signal are shown as dashed red lines, the tt̄ background is drawn as a solid blue line. All distributions are normalized to unit integral. Only statistical uncertainties are visualized in the figure.

The reconstructed Higgs boson and bb̄ masses provide discrimination between tt̄H and background processes. Information from such reconstructed variables can be combined with the MEM_D1 discriminant to further help distinguish tt̄H from other processes. The assignment efficiencies from the reconstruction BDT in the tt̄H(bb̄) analysis are slightly higher than those of the MEM, as the BDT is specifically optimized for system reconstruction. The classification BDT thus includes inputs from the reconstruction BDT which describe the kinematics of reconstructed objects. No information about reconstructed objects from the MEM is used in the analysis.

8. Observation of Yukawa interactions with third generation quarks

The tt̄H(bb̄) analysis described in chapter 6 is statistically combined with a range of other analyses, resulting in the observation of Yukawa interactions with third generation quarks by the ATLAS collaboration. This chapter summarizes the evidence [2] and subsequent observation [3] of the tt̄H process in section 8.1 and section 8.2. The observation of H → bb̄ [38] is briefly summarized in section 8.3. These results establish the Yukawa interactions of top and bottom quarks predicted by the SM.
The CMS collaboration also independently observed these interactions [39, 40], and both the CMS and ATLAS collaborations observed Yukawa interactions of tau leptons [41, 42].

The measurements reported in this chapter include the tt̄H signal strength µ_tt̄H, defined as in chapter 6 as the ratio of the measured cross-section to the SM prediction: µ_tt̄H = σ_tt̄H^obs / σ_tt̄H^SM.

8.1 Evidence for tt̄H

Evidence for the tt̄H process is obtained by statistically combining searches with the ATLAS detector for tt̄H in four different final states, which all use 36.1 fb⁻¹ of Run-2 LHC data with √s = 13 TeV [2]. This includes the tt̄H(bb̄) search [1], described in detail in chapter 6. Higgs boson decays to Z bosons and subsequently four light leptons (electrons and muons) [146] are analysed in a targeted search. The remaining final states with multiple leptons are considered separately [2]; these multi-lepton final states originate from Higgs boson decays to Z and W bosons, as well as decays to tau leptons. Lastly, Higgs boson decays to two photons are included [147].

8.1.1 Analyses entering the combination

The tt̄H analysis for the H → ZZ* → 4l topology selects events with four leptons, which form two pairs of leptons, each with the same flavor and opposite charge. The invariant mass of the four-lepton system is required to be in a window around the Higgs boson mass, and additional jet and b-tagging requirements are used to select events corresponding to the tt̄H topology. A total of 0.5 events are expected, out of which 0.4 events are expected to originate from tt̄H production. No events are observed.

The multi-lepton search analyzes tt̄H final states with seven different topologies, defined via different combinations of light leptons and tau leptons decaying to hadrons. Events selected for the analysis need to have various combinations of leptons for the different topologies, while jet and b-tagging requirements are included to select the tt̄H phase space. Dedicated multivariate
analysis methods are used in most regions enriched in tt̄H signal. Additional regions are included in a combined fit to control background contributions.

Table 8.1: Summary of the signal strength µ_tt̄H and the observed and expected significance measured in the individual analyses used to establish evidence for the tt̄H process, as well as the combination of all analyses. No events are observed in the analysis targeting H → ZZ* → 4l, hence the 68% confidence level upper limit is reported for the signal strength [2].

                       µ_tt̄H          Significance
                                      Observed    Expected
  H → bb̄              0.8 ± 0.6       1.4σ        1.6σ
  H → ZZ* → 4l        < 1.9           -           0.6σ
  Multi-lepton        1.6 +0.5/−0.4   4.1σ        2.8σ
  H → γγ              0.6 +0.7/−0.6   0.9σ        1.7σ
  Combination         1.2 ± 0.3       4.2σ        3.8σ

The tt̄H analysis with loop-induced Higgs boson decays to a pair of photons, H → γγ, extracts the tt̄H signal from a fit of the di-photon invariant mass spectrum. The signal contribution results in a peak, located around the Higgs boson mass, over a smooth background distribution. Besides requiring two photons in the event selection, the analysis contains channels for final states of the tt̄ system with zero leptons or at least one lepton. Additional jet and b-tag requirements complete the event selection. Different categories are defined via a BDT, which is trained to identify tt̄H signal events. A combined fit to the di-photon invariant mass spectra in all categories is used to measure the tt̄H signal.

8.1.2 Results

Table 8.1 summarizes the tt̄H signal strength, µ_tt̄H, measured in the four individual analyses, and in the statistical combination of all analyses. It also includes the observed and expected significance of the measurements, compared to the background-only hypothesis where the tt̄H signal is absent.
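Two quick numerical checks on the quoted results can be sketched as follows. A significance of Z σ corresponds to a one-sided Gaussian tail probability p = (1/2) erfc(Z/√2), and the total uncertainty of the combined result of equation (8.1), µ = 1.17 ± 0.19 (stat.) +0.27/−0.23 (syst.), follows approximately by adding the components in quadrature (treating them as independent):

```python
import math

def p_value(z):
    """One-sided Gaussian tail probability for a significance of z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def total_uncertainty(stat, syst):
    """Combine statistical and systematic components in quadrature."""
    return math.hypot(stat, syst)

# the observed 4.2 sigma exclusion corresponds to a p-value of order 1e-5
print(p_value(4.2))
# combined result of eq. (8.1): 1.17 +/- 0.19 (stat.) +0.27/-0.23 (syst.)
print(total_uncertainty(0.19, 0.27))  # upper component, ~0.33
print(total_uncertainty(0.19, 0.23))  # lower component, ~0.30
```

The quadrature sums reproduce the rounded total of ±0.3 listed for the combination in table 8.1.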
Due to the lack of observed events in the H → ZZ* → 4l analysis, the 68% confidence level upper limit on the signal strength is calculated with the CLs method described in section 5.2.2.

The statistical combination of the four analyses results in a signal strength measurement of

µ_tt̄H = 1.17 ± 0.19 (stat.) +0.27/−0.23 (syst.),  (8.1)

dominated by systematic uncertainty. Leading systematic uncertainties in the measurement are the modeling of tt̄ in the tt̄H(bb̄) analysis, the cross-section uncertainty for tt̄H, and the non-prompt and fake lepton estimate in the multi-lepton analysis.

The background-only hypothesis, where µ_tt̄H = 0, is excluded at 4.2σ, with an expected sensitivity of 3.8σ when using the SM prediction for the tt̄H signal. This result establishes evidence for the tt̄H process. The sensitivity is dominated by the multi-lepton channel, followed by contributions of similar size from H → γγ and H → bb̄ final states.

The corresponding cross-section for the best-fit signal strength is measured as σ_tt̄H = 590 +160/−150 fb, in agreement with the SM prediction of σ_tt̄H = 507 +35/−50 fb.

8.2 Observation of tt̄H

The observation of the tt̄H process by the ATLAS collaboration is achieved by combining the searches for multi-lepton and H → bb̄ final states, which were also used to establish evidence for tt̄H, with searches using a larger dataset recorded at √s = 13 TeV for the H → γγ and H → ZZ* → 4l topologies [3]. A combination with tt̄H searches performed with Run-1 LHC data recorded at √s = 7 TeV and √s = 8 TeV is also performed. The top quark Yukawa coupling is extracted in a combined fit of tt̄H searches and additional Higgs boson analyses performed at √s = 13 TeV.

8.2.1 Analyses entering the combination

The H → bb̄ topology is discussed in detail in chapter 6, and the multi-lepton search is briefly described in section 8.1.
Both of these searches use a dataset of 36.1 fb⁻¹ and enter the statistical combination.

The tt̄H analysis targeting the H → ZZ* → 4l topology is updated to use 79.8 fb⁻¹ of data [3]. The approach to this search is similar to the approach for the prior search with 36.1 fb⁻¹, described in section 8.1.1. A BDT is used in one of the regions enriched in signal to further improve the sensitivity. No events are observed, while 1.1 events are expected. The tt̄H process is expected to contribute 0.6 of these expected events.

The analysis for tt̄H with loop-induced Higgs boson decays to di-photon final states is also updated to use 79.8 fb⁻¹ of data [3]. A similar approach is used as for the search with the 36.1 fb⁻¹ dataset described in section 8.1.1. The inclusion of four-momentum information for the reconstructed objects results in an improved BDT performance for this search.

Additional searches for tt̄H enter the combination when also considering data from Run 1 of the LHC. These target the H → bb̄ final state [88, 90], multi-lepton final states [148], and di-photon final states [149] of the Higgs boson.

8.2.2 Results

Figure 8.1 shows the tt̄H cross-section measurement, divided by the SM prediction, which is obtained in a combination of analyses using only √s = 13 TeV data from Run 2 of the LHC. The results per analysis topology are obtained from a combined fit with four individual parameters for the tt̄H cross-section, which independently scale the tt̄H contributions in each topology. The result of the combined fit with only one cross-section parameter is also shown. The 68% confidence level upper limit is reported for the H → ZZ* → 4l analysis, where no events are observed. Statistical and systematic uncertainties affecting the measurements are drawn as yellow and blue rectangles, respectively.

Figure 8.1: Results of the tt̄H cross-section measurement, divided by the SM prediction, in the statistical combination. The results per analysis topology are obtained from a fit with four independent cross-section parameters: tt̄H (ZZ) < 1.77 at 68% CL; tt̄H (γγ) 1.39 +0.48/−0.42 (+0.42/−0.38 stat., +0.23/−0.17 syst.); tt̄H (multi-lepton) 1.56 +0.42/−0.40 (+0.30/−0.29 stat., +0.30/−0.27 syst.); tt̄H (bb̄) 0.79 +0.61/−0.60 (+0.29/−0.28 stat., ±0.53 syst.); combined 1.32 +0.28/−0.26 (±0.18 stat., +0.21/−0.19 syst.). Statistical and systematic uncertainties are shown in yellow and blue, respectively. The SM prediction is shown in red, with the associated uncertainty indicated as a gray band. No events are observed in the H → ZZ* → 4l analysis, and the 68% confidence level upper limit is reported [3].

The total uncertainty is shown as a black bar. The SM prediction is shown in red, with a gray bar indicating uncertainties related to the tt̄H cross-section prediction. These uncertainties related to the tt̄H cross-section are not included in the systematic uncertainty reported for the measurement; this is in contrast to signal strength measurements, where they are included.

The statistical combination of the four searches for tt̄H results in the observation of the tt̄H process. The background-only hypothesis is excluded at 5.8σ, with an expected sensitivity of 4.9σ, when using only Run-2 data with √s = 13 TeV. When also including tt̄H searches performed with the ATLAS detector at √s = 7 TeV and √s = 8 TeV during Run 1 of the LHC, the exclusion is at 6.3σ, with an expected sensitivity of 5.1σ.

The larger dataset and analysis improvements in the search for di-photon final states are the dominant contribution to the increase in sensitivity compared to the tt̄H combination described in section 8.1. The observed significance when only considering this search targeting the H → γγ topology is 4.1σ, with an expected sensitivity of 3.7σ.
Contributions to the combination from H → ZZ* → 4l are small, with an expected sensitivity of 1.2σ but no observed events. The tt̄H(bb̄) analysis has a smaller impact in this combination than in the tt̄H combination establishing evidence for the process.

Statistical and systematic effects have a similar impact on the combined measurement. Contributions to the systematic uncertainty are dominated by tt̄ modeling in the tt̄H(bb̄) analysis, the tt̄H cross-section uncertainty, and the non-prompt and fake lepton estimate in the multi-lepton search.

The tt̄H cross-section is measured at √s = 13 TeV as σtt̄H = 670 ± 90 (stat.) +110/−100 (syst.) fb. It agrees with the SM prediction of σtt̄H = 507 +35/−50 fb.

8.2.3 Top quark Yukawa coupling

An interpretation of ATLAS Higgs boson measurements in terms of Yukawa couplings can be performed by allowing these couplings to vary. The effective coupling κt scales the top quark Yukawa coupling as a multiplicative factor. When combining the tt̄H analyses performed at √s = 13 TeV with other analyses using up to 79.8 fb⁻¹ of Run-2 data from the LHC, the effective coupling is measured as κt = 1.02 +0.11/−0.10 [37]. The value is consistent with the SM, which predicts κt = 1. In this measurement, the effective couplings to weak gauge bosons, bottom quarks, tau leptons, and muons are also extracted simultaneously with κt, and are all compatible with their SM predictions. The measurement assumes no BSM particles coupling to the Higgs boson. The ratio of effective couplings to top quarks and gluons is measured to be κt/κg = 1.10 +0.15/−0.14, also compatible with the SM prediction.

8.3 Observation of H → bb̄

The statistical combination of ATLAS searches for multiple Higgs boson production modes, with Higgs boson decays to bb̄, results in the observation of H → bb̄ [38]. The dominant contribution to this combination comes from the search for VH production of Higgs bosons, with decays H → bb̄.
Both the search with Run-1 data [150] and the search with Run-2 data [38] are included. The tt̄H(bb̄) search [1] described in chapter 6 enters the combination, as does the corresponding Run-1 search [88]. Two searches for vector boson fusion Higgs production with H → bb̄ are included as well, using Run-1 data [151] and Run-2 data [152] of the LHC.

The observed significance for H → bb̄ decays from this combination is 5.4σ, with an expected sensitivity of 5.5σ. Table 8.2 summarizes the observed and expected significance per production mode, and for the combination. The result per production mode includes two searches each, using both Run-1 and Run-2 data. The overall sensitivity is dominated by the VH channel. The tt̄H channel has a small contribution; the leading contribution to the tt̄H significance comes from the Run-2 tt̄H(bb̄) analysis.

With the cross-sections for Higgs boson production processes fixed to the SM prediction, the H → bb̄ signal strength µH→bb̄ measures the ratio of the observed branching ratio to the SM prediction. In the combination of all channels, the signal strength is measured as

    µH→bb̄ = 1.01 ± 0.20 = 1.01 ± 0.12 (stat.) +0.16/−0.15 (syst.),    (8.2)

dominated by systematic uncertainty and consistent with the SM prediction.

Table 8.2: Observed and expected significance for H → bb̄ decays. The results are reported separately per production mode, and for the statistical combination of all channels [38].

                          Significance
  Process                 Observed    Expected
  Vector boson fusion     0.9σ        1.5σ
  tt̄H                     1.9σ        1.9σ
  VH                      5.1σ        4.9σ
  Combination             5.4σ        5.5σ

The effective coupling strength scaling the bottom quark Yukawa coupling is extracted in a combination of analyses, as described in section 8.2.3, as κb = 1.06 +0.19/−0.18 [37]. It is compatible with the SM prediction of κb = 1.
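The split of a total uncertainty into statistical and systematic components, as in equation (8.2), follows from subtraction in quadrature: the statistical component is obtained from a fit with all nuisance parameters fixed, and the systematic component is the quadrature remainder. A minimal numerical sketch, assuming symmetrized uncertainties for simplicity:

```python
from math import sqrt

def syst_component(total: float, stat: float) -> float:
    """Systematic uncertainty as the quadrature complement of the statistical one."""
    return sqrt(total**2 - stat**2)

# mu(H->bb) = 1.01 +- 0.20 with a statistical component of +- 0.12:
# the systematic component is sqrt(0.20**2 - 0.12**2) = 0.16.
assert abs(syst_component(0.20, 0.12) - 0.16) < 1e-12
```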
9 Search for tt̄H(bb̄) with 139 fb⁻¹ of data

With Run 2 of the LHC completed, and 139.0 fb⁻¹ of data recorded by the ATLAS experiment available for physics analyses, this chapter provides an outlook on the sensitivity of a tt̄H(bb̄) analysis with this data. Many of the details of this study follow the description of the analysis published with 36.1 fb⁻¹ [1], which is documented in chapter 6. This chapter highlights important aspects and differences with respect to the treatment described previously.

The analysis presented here is designed to provide a robust baseline configuration for the next iteration of the tt̄H(bb̄) analysis. Region definitions and multivariate analysis techniques are simplified, and only the single-lepton channel is considered. MC samples for physics analyses with the ATLAS experiment are centrally produced for use by the whole collaboration. The modeling and treatment of systematic uncertainties in the study documented in this chapter differs in several regards from the 36.1 fb⁻¹ analysis, mostly due to the availability of MC samples.

The design of the ATLAS search for tt̄H(bb̄) with the full Run-2 dataset is not finalized at the time of writing this chapter. In order not to bias decisions taken regarding the analysis design, data is not shown in bins of any distribution where the tt̄H signal is expected to contribute more than 5% to the bin yield. Bins affected by this blinding are indicated with a gray hashed area. In addition, this chapter does not include any fits to data, and only shows the expected sensitivity obtained from fitting an Asimov dataset.

9.1 Event selection

The event selection for the analysis is very similar to the selection described in section 6.2. Relevant changes in object definitions and in the definition of the single-lepton channel are listed in this section.

9.1.1 Dataset

The analysis uses proton–proton collision events provided by the LHC between 2015 and 2018 at √s = 13 TeV.
They are recorded by the ATLAS detector and required to fulfill the quality criteria from section 3.2.7. The integrated luminosity of this dataset is 139.0 ± 2.4 fb⁻¹. Figure 3.2 shows the mean number of interactions in each bunch crossing per year, including data recorded by ATLAS but not used for physics analyses. The mean value of 34 interactions for this dataset is increased compared to the analysis performed with 36.1 fb⁻¹.

9.1.2 Object definitions

The basic object definitions correspond to those listed in section 6.2.2, with minor changes. Electrons need to satisfy the medium identification operating point, while muons need to satisfy the loose identification operating point. No isolation requirements are applied for either of these light leptons. Jet definitions are unchanged. After applying the overlap removal, the electron and muon identification operating points are tightened to tight and medium, respectively. Electrons also need to satisfy the gradient isolation operating point, while muons need to fulfill the FCTTO isolation operating point.

9.1.3 Definition of the single-lepton channel

Events are recorded with single-electron and single-muon triggers, with thresholds as described in section 6.2.3. The thresholds from 2016 stay the same for 2017 and 2018. An additional muon trigger is added in the barrel region for 2017–2018, with a transverse momentum threshold of 60 GeV.

Events are required to have at least five jets, exactly one reconstructed light lepton, and not more than one hadronic tau lepton. The boosted region definition is unchanged. All remaining events are considered for the resolved regions, provided they contain at least three jets satisfying the tight b-tagging operating point.

9.2 Modeling

The modeling of all processes relevant to this analysis is very similar to the setup described in section 6.3. This section focuses on differences compared to the previous setup. All processes are modeled with MC simulation.
Contributions from fake and non-prompt leptons are negligible and therefore not included. The nominal samples used to describe tt̄ production are simulated with the AFII method, described in reference [58]. Additional events are simulated in the regions of phase space the analysis is targeting. This decreases the sizeable uncertainty related to background model statistical uncertainties that was observed in the tt̄H(bb̄) analysis with 36.1 fb⁻¹, as listed in table 6.5.

9.2.1 tt̄H signal

The tt̄H signal is modeled with POWHEG-BOX v2 [100–102, 153], referred to in the following as POWHEG, at NLO in QCD and using the NNPDF3.0NLO PDF set [96]. Parton showering and hadronization are simulated with PYTHIA 8.230 [97], using the A14 set of tuned parameters [98].

9.2.2 tt̄ + jets background

The dominant background process in this analysis is top quark pair production. It is modeled with the POWHEG generator [103] and an updated version of PYTHIA, 8.230 with the A14 tune. The split of tt̄ into components is performed as described in section 6.3.2. No reweighting to a prediction from a 4F scheme sample is performed.

9.2.3 Other backgrounds

The modeling of V+jets and tt̄V processes is unchanged. Diboson samples use a combination of SHERPA 2.2.1 and 2.2.2 [94, 113–115]. Single top quark production in the s- and t-channel, as well as Wt, is simulated at NLO with POWHEG and the NNPDF3.0NLO PDF set. The t-channel is simulated in the 4F scheme. The tZ and tWZ processes are simulated with MADGRAPH5_AMC@NLO [95], in the following abbreviated as MG5_AMC@NLO. Parton showering and hadronization are performed with PYTHIA 8 for all five of these single top quark processes. The production of four top quarks, tt̄tt̄, is simulated at NLO with MG5_AMC@NLO and PYTHIA 8; the tt̄WW process is not considered separately.
No additional Higgs boson production mechanisms are considered, including the associated production with a single top quark.

9.2.4 Inclusive modeling of data

Figure 9.1 compares the described model to data, showing the distribution of the number of jets per event. It includes all events satisfying the event selection for the single-lepton channel. The uncertainty band drawn as a hashed blue area combines the statistical and systematic uncertainties described in section 9.5, with the exception of uncertainties from the free-floating normalization factors for the tt̄+≥1b and tt̄+≥1c processes. Due to the tightened b-tagging requirements, the contribution from tt̄+light processes is smaller than in figure 6.4, which shows the corresponding distribution for the 36.1 fb⁻¹ analysis.

The agreement with data is worse in figure 9.1 than in the corresponding figure 6.4 for the 36.1 fb⁻¹ analysis. While many small differences contribute to this, a large contribution comes from an improved jet calibration. Differences between the predicted distribution and data are covered by systematic uncertainties, and the normalization difference can be corrected with the free-floating k(tt̄+≥1b) and k(tt̄+≥1c) normalization factors. The background model describes data well in fits to data performed with signal-depleted regions.

Figure 9.2 shows the number of b-tagged jets at the very tight, tight, medium, and loose operating points. The model underestimates data, particularly for events with many b-tagged jets. No events with only two b-tagged jets at the loose operating point are considered in the analysis, since either three b-tagged jets are required at the tight operating point for the resolved selection, or four b-tagged jets at the loose operating point for the boosted selection.

The modeling improves considerably when applying the best-fit normalization factors for tt̄+≥1b and tt̄+≥1c obtained in the previous analysis, which are given in equation (6.2).
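Applying such free-floating normalization factors amounts to rescaling the predicted yields of the affected components before comparing to data. A minimal sketch with purely illustrative yields and factor values (the actual best-fit values are those of equation (6.2), not reproduced here):

```python
# Hypothetical pre-fit yields per ttbar component in one region (events).
yields = {"ttbar+>=1b": 1000.0, "ttbar+>=1c": 600.0, "ttbar+light": 3000.0}

# Illustrative normalization factors; only the heavy-flavour components
# float freely in the fit, all other components keep a factor of 1.
k_factors = {"ttbar+>=1b": 1.3, "ttbar+>=1c": 1.6}

scaled = {proc: n * k_factors.get(proc, 1.0) for proc, n in yields.items()}

assert scaled["ttbar+>=1b"] == 1300.0   # rescaled
assert scaled["ttbar+light"] == 3000.0  # untouched
```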
With this scaling applied, model and data agree well within the uncertainties related to the model, for the distributions of the number of jets and of b-tagged jets.

9.3 Event categorization

Events in the single-lepton channel are divided into five signal regions and two control regions. The region design is simplified compared to the previous configuration described in section 6.4.

Figure 9.1: Expected distribution of the number of jets per event, compared to data. The uncertainties shown include all sources of systematic uncertainty described in section 9.5, with the exception of the free-floating normalization factors for the tt̄+≥1b and tt̄+≥1c processes. The tt̄H distribution normalized to the total background is overlaid as a dashed red line.

9.3.1 Region definitions

The SRboosted signal region remains as defined in section 6.4.1. The resolved regions are divided according to jet multiplicity and b-tagging requirements. Separate regions are formed for events with exactly five jets, and for events with six or more jets. The SR5j_1 and SR≥6j_1 signal regions are unchanged, requiring at least four jets that are b-tagged at the very tight operating point. Two control regions, CR5j and CR≥6j, are defined by requiring exactly three jets b-tagged at the tight operating point, for events with exactly five or with six and more jets, respectively. All remaining events enter two intermediate signal regions. They require at least four jets that are b-tagged at the tight operating point, but fewer than four jets b-tagged at the very tight operating point.
The two resulting signal regions are called SR5j_2 and SR≥6j_2.

9.3.2 Region composition and signal contributions

The composition of background processes in the seven regions is summarized in figure 9.3. The tt̄+≥1b process dominates the contribution to the signal regions. Control regions have a larger amount of tt̄+light, while tt̄+≥1c is not dominant in any region. Contributions from tt̄V and non-tt̄ processes are small, but not negligible.

Signal contributions to the seven regions are summarized in figure 9.4. For each region, the fraction of expected signal events (S) to the total background (B) is listed. The histograms furthermore visualize S/√B; blue bars are used to indicate control regions, while red bars are used in signal regions. The relative contribution of signal is increased in the SR≥6j_1 and SR5j_1 signal regions compared to the 36.1 fb⁻¹ tt̄H(bb̄) analysis, which uses these regions as well. This is caused by the improved b-tagging algorithm. There are also substantial increases in S/√B due to the large increase in the size of the dataset used.

Figure 9.2: Expected distribution of the number of b-tagged jets per event at the four operating points (very tight, tight, medium, loose), compared to data. The uncertainties shown include all sources of systematic uncertainty described in section 9.5, with the exception of the free-floating normalization factors for the tt̄+≥1b and tt̄+≥1c processes. The tt̄H signal is shown both in the stacked histogram, contributing in red, and as a dashed red line drawn on top of the stacked histogram. Data is not shown in bins where the tt̄H signal is expected to contribute more than 5% to the yield, indicated by a gray hashed area.

Figure 9.3: Composition of background processes per region. Each pie chart shows the relative contributions per process, with the processes defined in section 9.2.

9.4 Multivariate analysis techniques

The multivariate analysis approach employed in the signal regions to discriminate between the tt̄H signal and the background processes is very similar to the approach for the previous analysis, described in section 6.5. It consists of two stages, with a reconstruction BDT and LHD employed in the first stage. The second stage consists of the classification BDT. Information from the reconstruction BDT and LHD is included in all resolved signal regions. Variables related to kinematic quantities and b-tagging are also used.

9.5 Systematic uncertainties

The implementation of systematic uncertainties follows the treatment described in section 6.6.1.
This section summarizes changes with respect to the previous analysis.

Figure 9.4: Signal contributions per region, calculated with the expected amount of tt̄H events (S) and background events (B) per region. The histograms show S/√B, with blue bars for control regions and red bars indicating signal regions. S/B is also listed for each region (CR5j: 0.6%, SR5j_2: 3.1%, SR5j_1: 6.1%, CR≥6j: 1.3%, SR≥6j_2: 4.4%, SR≥6j_1: 7.0%, SRboosted: 3.1%).

9.5.1 Experimental uncertainties

A relative uncertainty of 1.7% is assigned to the total size of the dataset used, derived with a method similar to reference [50]. The majority of experimental uncertainties closely correspond to the description in section 6.6.2.

Leptons

Systematic uncertainties related to the electron energy scale calibration are split into two components, while the remaining nuisance parameters associated with leptons are unchanged. This results in a total of seven nuisance parameters related to electrons. The 15 nuisance parameters associated with muons are unchanged. No uncertainties related to tau leptons are considered; they are expected to be negligible given their effect in the 36.1 fb⁻¹ analysis.

Jets

Uncertainties related to the jet energy scale calibration are described by a set of 31 nuisance parameters, split into more components than in the previous analysis. The treatment of nuisance parameters related to the jet energy resolution is also updated, resulting in eight associated nuisance parameters. One more nuisance parameter is related to the jet vertex tagger, resulting in a total of 40 nuisance parameters associated with jets.

Flavor tagging

The b-tagging calibration used in this analysis describes uncertainties related to the b-tagging efficiency, split into 45 sources.
Mis-tag rates for c-jets and light jets are split into 20 sources each, for a total of 85 nuisance parameters related to b-tagging.

Missing transverse energy

The treatment of uncertainties related to the ETmiss calculation is unchanged, with three associated nuisance parameters.

9.5.2 Signal and background modeling

The treatment of systematic uncertainties related to the modeling of processes follows section 6.6.3, with differences mostly due to the availability of MC samples.

tt̄H signal

Uncertainties related to the tt̄H cross-section and the Higgs boson branching ratios are unchanged. A comparison of the nominal POWHEG+PYTHIA 8 sample to a sample produced with MG5_AMC@NLO and PYTHIA 8 is implemented as a nuisance parameter capturing effects due to the event generator choice. The choice of PS and hadronization model is described by a comparison of the nominal sample to one generated with POWHEG and HERWIG 7 [125]. Uncertainties related to ISR and FSR are described by two nuisance parameters. The ISR nuisance parameter compares the nominal configuration to a variation of the renormalization and factorization scales, as well as of the tune used in the PS. The nuisance parameter related to FSR varies the renormalization scale.

tt̄ + jets background

The tt̄ cross-section uncertainty treatment remains the same, as do the free-floating normalization factors k(tt̄+≥1b) and k(tt̄+≥1c). All other tt̄ modeling nuisance parameters are split across the three tt̄ components. Samples used to estimate modeling uncertainties are reweighted to match the tt̄+≥1b, tt̄+≥1c, and tt̄+light fractions predicted by the nominal POWHEG+PYTHIA 8 sample. A tt̄ sample generated with MG5_AMC@NLO and PYTHIA 8 is compared to the nominal sample to estimate the uncertainty related to the choice of NLO event generator. The choice of PS and hadronization model is described with an uncertainty derived from the comparison to a sample produced with POWHEG+HERWIG 7.
Uncertainties related to ISR and FSR modeling are derived with the same method as used for tt̄H, and described with two nuisance parameters per tt̄ component.

Table 9.1: Systematic uncertainty sources affecting the modeling of tt̄ + jets. The left column shows the individual sources, while the central column describes how the effect is evaluated. The column on the right lists which tt̄ components the sources act on, and whether the effect is correlated between the components.

  Systematic source     Description                                   tt̄ categories
  tt̄ cross-section      Up or down by 6%                              All, correlated
  k(tt̄+≥1b)             Free-floating tt̄+≥1b normalization            tt̄+≥1b
  k(tt̄+≥1c)             Free-floating tt̄+≥1c normalization            tt̄+≥1c
  Event generator       MG5_AMC@NLO+PYTHIA 8 vs. POWHEG+PYTHIA 8      All, uncorrelated
  PS & hadronization    POWHEG+HERWIG 7 vs. POWHEG+PYTHIA 8           All, uncorrelated
  ISR                   Variations of µR, µF, and PYTHIA 8 tune       All, uncorrelated
  FSR                   Variation of µR                               All, uncorrelated

A summary of the nuisance parameters for tt̄ is shown in table 9.1. Four sources of uncertainty affect each of the tt̄ components separately.

Small backgrounds

The normalization uncertainties assigned to all small background processes remain the same, with the exception of the normalization of Z+jets events. These events are assigned a single nuisance parameter describing a 35% uncertainty.

For the Wt, s- and t-channel processes, additional samples are generated with POWHEG+HERWIG 7. The comparison of those samples to the nominal POWHEG+PYTHIA 8 samples is used to describe uncertainties related to the choice of PS and hadronization model, with one nuisance parameter for each sample. A comparison between the nominal Wt sample and another one generated with MG5_AMC@NLO and PYTHIA 8 is implemented as a nuisance parameter describing the event generator choice. An additional uncertainty for the Wt sample is derived from the comparison of the nominal diagram removal scheme to the diagram subtraction scheme [119].
This results in a total of five nuisance parameters related to the modeling of single top quark processes.

9.5.3 Summary of systematic uncertainty sources

An overview of the systematic uncertainties in the analysis is provided in table 9.2. Nuisance parameters affecting only the normalization are indicated by type N; those affecting both shape and normalization are of type S+N. The number of components per source is listed in the last column.

9.6 Statistical analysis and results

This section summarizes the expected sensitivity of the analysis to the tt̄H signal. The statistical treatment corresponds to the description from section 6.7. Seven regions enter the simultaneous fit.

Table 9.2: List of the systematic uncertainties affecting the analysis. The type N indicates uncertainties changing the normalization of the affected process; uncertainties of type S+N can change both shape and normalization. The number of different components per source is listed in the third column.

  Systematic uncertainty       Type                 Components
  Experimental uncertainties
    Luminosity                 N                    1
    Pile-up modeling           S+N                  1
  Physics objects
    Electron                   S+N                  7
    Muon                       S+N                  15
    Jet energy scale           S+N                  31
    Jet energy resolution      S+N                  8
    Jet vertex tagger          S+N                  1
    ETmiss                     S+N                  3
  b-tagging
    Efficiency                 S+N                  45
    Mis-tag rate (c)           S+N                  20
    Mis-tag rate (light)       S+N                  20
  Modeling uncertainties
  Signal
    tt̄H cross-section          N                    2
    H branching fractions      N                    3
    tt̄H modeling               S+N                  4
  tt̄ background
    tt̄ cross-section           N                    1
    tt̄+≥1c normalization       free-floating N      1
    tt̄+≥1b normalization       free-floating N      1
    tt̄+light modeling          S+N                  4
    tt̄+≥1c modeling            S+N                  4
    tt̄+≥1b modeling            S+N                  4
  Other backgrounds
    W+jets normalization       N                    3
    Z+jets normalization       N                    1
    Diboson normalization      N                    1
    tt̄W cross-section          N                    2
    tt̄Z cross-section          N                    2
    Single top cross-section   N                    6
    Single top modeling        S+N                  5
    tt̄tt̄ normalization         N                    1

In the two control regions, CR5j and CR≥6j, the distribution of the HThad variable is used. The remaining five signal regions use distributions of the classification BDT.
Statistical uncertainties related to the model are below 5% across all bins.

9.6.1 Expected sensitivity

The expected analysis sensitivity is evaluated in a fit of the model to an Asimov dataset. The resulting signal strength and free-floating normalization factors are

    µtt̄H = 1.00 ± 0.18 (stat.) +0.29/−0.24 (syst.) = 1.00 +0.34/−0.31,
    k(tt̄+≥1b) = 1.00 +0.24/−0.19,
    k(tt̄+≥1c) = 1.00 +0.45/−0.35.    (9.1)

The sensitivity increases substantially compared to the 36.1 fb⁻¹ analysis, where the signal strength expected when fitting the single-lepton channel model to an Asimov dataset was µtt̄H = 1.00 +0.68/−0.65. The increase can only partially be attributed to the larger dataset, given that the 36.1 fb⁻¹ analysis sensitivity was already limited by systematic uncertainties. Despite the larger dataset, the level of expected constraints on nuisance parameters is overall similar, as the selection requirements in the 139.0 fb⁻¹ analysis are tightened. Fewer events enter the fit compared to the 36.1 fb⁻¹ analysis. The uncertainty source with the second largest impact in the 36.1 fb⁻¹ analysis, given by the comparison of the SHERPA4F prediction for tt̄+≥1b to the nominal configuration, does not have an equivalent in this analysis. Additional uncertainties related to tt̄+≥1b modeling derived from the SHERPA4F sample are also not considered, leading to an increase in sensitivity. The improved b-tagging calibration results in more powerful signal regions, which improve the performance further.

The expected significance over the SM background prediction is 3.3σ, thereby surpassing the threshold for evidence for the tt̄H(bb̄) process. This sensitivity is likely to decrease when considering a comparison between a 4F scheme and a 5F scheme prediction for tt̄ as an additional uncertainty. The analysis presented in this chapter is however not optimized for sensitivity, and a dedicated optimization of the region definitions and multivariate analysis techniques can increase the sensitivity further.
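For intuition on how a larger dataset and purer signal regions drive such expected significances: in the simplified case of a single counting experiment, the median expected significance of an Asimov dataset is given by the standard asymptotic formula Z = √(2((s+b) ln(1+s/b) − s)). A minimal sketch with hypothetical yields (the analysis itself uses a full profile-likelihood fit over all bins and nuisance parameters):

```python
from math import log, sqrt

def asimov_significance(s: float, b: float) -> float:
    """Median expected discovery significance for s signal events over b background."""
    return sqrt(2.0 * ((s + b) * log(1.0 + s / b) - s))

# For s much smaller than b this approaches the familiar s / sqrt(b).
z = asimov_significance(50.0, 1000.0)
assert abs(z - 50.0 / sqrt(1000.0)) / z < 0.05
```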
The inclusion of the dilepton channel, which is not considered in this chapter, will also result in a sensitivity increase.

Distributions before and after the fit

Figure 9.5 shows a summary of the yields in all regions considered in the fit. The pre-fit model is shown at the top, with the post-fit model below, obtained from the fit to the Asimov dataset. The comparison between the post-fit model and data is not meaningful, hence data is not included in the corresponding figure. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The expected tt̄H signal is shown in red contributing to the stacked histogram, and also drawn overlaid as a dashed red line. The expected uncertainties decrease post-fit due to correlations between nuisance parameters and their constraints.

Figure 9.5: Overview of the yields in all regions pre-fit (top) and post-fit (bottom). The uncertainty bands include all sources of systematic uncertainty described in section 9.5. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal is shown both in the stacked histogram, contributing in red, and as a dashed red line drawn on top of the stacked histogram. It is normalized to the SM prediction. Data is only compared to the pre-fit model, and not shown in bins where the tt̄H signal is expected to contribute more than 5% to the yield, indicated by a gray hashed area.

Figure 9.6: Comparison between data and the model for the control regions CR5j (top) and CR≥6j (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 9.5. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. Events with HThad < 200 GeV or HThad > 800 GeV are included in the leftmost and rightmost bins of the CR5j distributions, respectively. Similarly, events with HThad < 250 GeV or HThad > 1150 GeV are included in the outermost bins of the CR≥6j distributions. Data is only compared to the pre-fit model.

The HThad distributions in the two control regions which enter the fit, CR5j and CR≥6j, are shown in figure 9.6. Figure 9.7 shows the classification BDT distributions in the SR5j_1, SR5j_2, and SRboosted regions, while figure 9.8 shows the corresponding distributions in the SR≥6j_1 and SR≥6j_2 regions. All distributions are shown with the binning used in the fit.
The tt̄H distribution, normalized to the total background prediction, is overlaid as a dashed red line in the signal region distributions. Data is only included in pre-fit distributions.

Figure 9.7: Comparison between data and the model for the signal regions SR5j_1 (top), SR5j_2 (middle) and SRboosted (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 9.5. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. Data is only compared to the pre-fit model, and not shown in bins where the tt̄H signal is expected to contribute more than 5% to the yield, indicated by a gray hashed area.

Figure 9.8: Comparison between data and the model for the signal regions SR≥6j_1 (top) and SR≥6j_2 (bottom), with pre-fit on the left and post-fit on the right. The uncertainty bands include all sources of systematic uncertainty described in section 9.5. No uncertainty related to k(tt̄+≥1b) and k(tt̄+≥1c) is included pre-fit. The tt̄H signal shown in red in the stacked histogram is normalized to the SM prediction. The tt̄H distribution normalized to the total background is overlaid as a dashed red line. Data is only compared to the pre-fit model, and not shown in bins where the tt̄H signal is expected to contribute more than 5% to the yield, indicated by a gray hashed area.

Figure 9.9: The 20 dominant nuisance parameters in the fit, ranked according to their impact on the signal strength. The empty rectangles correspond to the pre-fit impact, while the filled rectangles show the post-fit impact per nuisance parameter. The upper axis shows the impact ∆µ. The pull (θ̂−θ0)/∆θ of the nuisance parameter is shown as black points, with the vertical black lines visualizing the post-fit nuisance parameter uncertainty ∆θ̂. MG5 refers to samples generated with MG5_AMC@NLO+PYTHIA 8.

9.6.2 Dominant nuisance parameters and sources of uncertainty

All nuisance parameters are ranked according to their impact, as defined in section 6.7.3. The 20 dominant contributions are summarized in figure 9.9. Pre- and post-fit impact are drawn with empty and filled rectangles, respectively. The upper axis shows the impact ∆µ of each nuisance parameter, while the pull is indicated on the lower axis, and drawn with black points.
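The ranking in figure 9.9 orders nuisance parameters by the shift ∆µ they induce in the fitted signal strength when varied by their post-fit uncertainty. A minimal sketch of this ordering; the parameter names and impact values below are illustrative placeholders, not the actual fit results:

```python
# Rank nuisance parameters by their impact on the signal strength mu.
# Each entry: (name, delta_mu_up, delta_mu_down), i.e. the shift in mu when
# the parameter is fixed at +/- one post-fit standard deviation (toy values).
impacts = [
    ("ttbar+>=1b: generator choice", +0.10, -0.09),
    ("k(ttbar+>=1b)", +0.08, -0.07),
    ("ttbar+>=1b: FSR", +0.06, -0.07),
    ("Jet energy scale: NP I", +0.02, -0.02),
]

# A parameter's impact is the larger absolute shift of its two variations.
ranked = sorted(impacts, key=lambda np_: max(abs(np_[1]), abs(np_[2])), reverse=True)
for name, up, down in ranked:
    print(f"{name:32s} +{up:.2f} {down:.2f}")
```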
No pulls are present in this fit to the Asimov dataset by design. Constraints are indicated by the vertical black lines. The central value and pre-fit uncertainty for k(tt̄+≥1b) and k(tt̄+≥1c) are not defined, and set to ∆θ = θ0 = 1 in the figure. MG5 refers to samples generated with MG5_AMC@NLO+PYTHIA 8.

Systematic uncertainties in the analysis are dominated by the modeling of tt̄+≥1b. The largest individual uncertainty in the analysis is related to the choice of the event generator for tt̄+≥1b, defined by comparing the nominal POWHEG+PYTHIA 8 setup to the sample produced with MG5_AMC@NLO+PYTHIA 8. It predicts large shape variations for tt̄+≥1b in the most sensitive signal regions, which have an effect specifically in the bins most enriched in tt̄H signal. This nuisance parameter also predicts shape variations for tt̄+≥1b in the control regions and is constrained by the fit. The corresponding nuisance parameter is the most dominant in the 36.1 fb−1 analysis as well. The k(tt̄+≥1b) normalization factor has the second highest impact, followed by tt̄+≥1b FSR. Both k(tt̄+≥1b) and the tt̄+≥1b FSR nuisance parameter affect the tt̄+≥1b normalization; the tt̄+≥1b FSR nuisance parameter defines a variation of the tt̄+≥1b normalization of around 15% per region, with only a small effect on the shape of the tt̄+≥1b distributions. The fit determines that it is strongly anti-correlated with k(tt̄+≥1b), with a correlation of around −75%. A correlation matrix for this fit is provided in appendix section B.1. The nuisance parameter assigned to the tt̄+≥1b PS and hadronization model similarly affects the tt̄+≥1b normalization, and predicts variations of around 10–25% per region, with slightly larger predicted shape variations. The variations decrease the tt̄+≥1b yield, and this nuisance parameter is strongly correlated with k(tt̄+≥1b) as well.

Uncertainties related to the modeling of tt̄H follow in the ranking, related to the tt̄H cross-section and the choice of event generator. Several additional nuisance parameters related to tt̄H are also included within the 20 most dominant contributions. The impact of tt̄+≥1c and tt̄+light modeling is smaller than the impact of tt̄+≥1b modeling. The dominant experimental uncertainties are related to b-tagging and jet calibration.

Uncertainties grouped by source

Table 9.3 shows contributions to the total predicted uncertainty ∆µ for the tt̄H signal strength, grouped by sources of uncertainty. The method to obtain these results is equivalent to the description in section 6.7.3. The dominant source of uncertainty is the modeling of tt̄+≥1b, with smaller contributions from the other components tt̄+≥1c and tt̄+light. The modeling of tt̄H is also a large source of uncertainty. The statistical uncertainties related to the background model are significantly reduced in impact compared to the 36.1 fb−1 analysis due to the use of additional samples to populate the phase space selected by the analysis with more MC events.

Table 9.3: Contributions to the signal strength uncertainty, grouped by sources.
The total statistical uncertainty includes effects from the k(tt̄+≥1b) and k(tt̄+≥1c) normalization factors, while the intrinsic statistical uncertainty does not.

  Uncertainty source                              ∆µ
  Systematic uncertainties
    tt̄H modeling                                 +0.18  −0.08
    tt̄+≥1b modeling                              +0.19  −0.19
    tt̄+≥1c modeling                              +0.07  −0.07
    tt̄+light modeling                            +0.08  −0.07
    Other background modeling                     +0.07  −0.07
    Experimental uncertainties                    +0.15  −0.14
    Background model statistical uncertainties    +0.04  −0.05
    Total systematic uncertainty                  +0.29  −0.24
  Statistical uncertainties
    tt̄+≥1b normalization                         +0.13  −0.12
    tt̄+≥1c normalization                         +0.05  −0.05
    Intrinsic statistical uncertainty             +0.12  −0.11
    Total statistical uncertainty                 +0.18  −0.18
  Total uncertainty                               +0.34  −0.31

10. Muon trigger efficiency measurement

The ATLAS muon trigger system identifies events containing one or more muons at various thresholds of transverse momentum. It is described in section 3.2.6. Muons are produced by a wide range of processes, and muon triggers are consequently used in many different physics analyses in the ATLAS collaboration. They are an essential ingredient to the tt̄H(bb̄) analyses presented in chapter 6 and chapter 9.

This chapter describes a measurement of the ATLAS muon trigger efficiency for muons with pT > 100 GeV. The measurement is performed in two channels, dominated by contributions from tt̄ and W+jets processes. It is complementary to measurements performed with decays of Z bosons to muon pairs (which use a method similar to reference [53]), where the muons typically have smaller transverse momenta. The efficiency is computed with data recorded in 2016–2018, and compared to the expected efficiency from the simulation of the ATLAS detector. The simulation can then be corrected to match the efficiency measured in data. This correction uses the ratio of the efficiency measured in data to the efficiency measured in simulation; this ratio is called the scale factor (SF).

Section 10.1 outlines the method used for the trigger efficiency measurement in this chapter.
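The scale factor defined above is a simple ratio of two efficiencies. A minimal sketch of its computation, with the statistical uncertainties of the two efficiencies propagated in quadrature (the standard propagation for a ratio of uncorrelated quantities); the function name and input values are illustrative, of the same order as the results reported later in this chapter:

```python
import math

def scale_factor(eff_data, err_data, eff_sim, err_sim):
    """SF = data efficiency / simulated efficiency; the relative statistical
    uncertainties of the two efficiencies are added in quadrature."""
    sf = eff_data / eff_sim
    rel_err = math.hypot(err_data / eff_data, err_sim / eff_sim)
    return sf, sf * rel_err

# Illustrative inputs: data efficiency 68.7% +/- 0.7%, simulation 76.0% +/- 0.2%
sf, err = scale_factor(0.687, 0.007, 0.760, 0.002)
print(f"SF = {sf:.3f} +/- {err:.3f}")  # prints: SF = 0.904 +/- 0.010
```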
The event selection and definitions of the two channels are provided in section 10.2. Section 10.3 describes the samples of simulated events used in this measurement. Systematic uncertainties affecting the measurement are listed in section 10.4, followed by a presentation of the results in section 10.5.

10.1 Analysis method

The efficiency of a trigger is evaluated by selecting a set of events containing objects that the trigger should identify, and then measuring the fraction of events in which a positive trigger decision is made. In order not to bias the measurement, the event selection should not be correlated with the performance of the trigger under investigation. The tag-and-probe method is used to achieve this. Events are selected by a tag trigger, and are required to also contain the object that the probe trigger is designed to identify. The probe trigger efficiency is the fraction of these events in which the probe trigger sends a positive trigger decision.

For the muon trigger efficiency measurements in this chapter, the tag trigger identifies events based on their missing transverse energy, E_T^miss. The muons used to study the trigger performance mostly originate from W boson decays, and are produced together with neutrinos. Events with high-momentum muons are therefore also generally expected to have a significant amount of E_T^miss. The event selection requires events to pass the E_T^miss tag trigger, and to contain exactly one muon.

The probe trigger is a logical OR between two muon triggers. They identify events with muons with pT > 26 GeV and an isolation requirement, or with pT > 50 GeV and no isolation requirement, respectively. These two triggers perform very similarly in the muon transverse momentum range pT > 100 GeV targeted by this measurement, and are therefore combined.

The trigger efficiency is measured separately for each year of data-taking. It varies due to small changes to the trigger algorithms and the trigger chambers in the ATLAS MS implemented throughout Run-2 of the LHC. No muon trigger SFs are applied in the derivation of the trigger efficiencies.

The trigger efficiency can also be measured in tt̄ decays resulting in one muon and one electron. In this case, an electron tag trigger can be used. The trigger efficiencies obtained with this topology are consistent with those shown in section 10.5.

10.2 Event selection and categorization

This section summarizes the selection of events entering the trigger efficiency measurement. Two different channels, one enriched in tt̄ events and another enriched in W+jets events, are used independently to measure the muon trigger efficiency.

10.2.1 Dataset

The measurement uses events from proton–proton collisions at √s = 13 TeV, delivered by the LHC in 2016–2018, and recorded by the ATLAS detector. The events need to fulfill all quality criteria specified in section 3.2.7. The resulting dataset corresponds to an integrated luminosity of 135.7 fb−1. Figure 3.2 shows the distribution of the mean number of interactions per bunch crossing for data recorded during the three years considered in this measurement. It also includes additional data recorded by ATLAS that is not used for physics analyses. Data recorded in 2015 is not considered due to the small size of this dataset, which has an integrated luminosity of only 3.2 fb−1. Trigger efficiency measurements for the 2015 dataset are performed with Z boson decays to muon pairs. The statistical uncertainties for trigger efficiencies measured with the tt̄ and W+jets topologies would be very large for such a small dataset, making the measurement with these topologies less useful in this case.

10.2.2 Object definitions

Muons considered in this analysis must have pT > 25 GeV, and need to be within |η| < 2.5. They are required to satisfy the medium identification operating point, and no requirement is applied regarding their isolation.

Electrons need to have pT > 25 GeV and be reconstructed within |η| < 2.47, but are removed if they fall into the 1.37 < |η| < 1.52 transition region between the calorimeter barrel and end-cap. They need to satisfy the tight identification operating point and the gradient isolation operating point.

Jets are required to have pT > 25 GeV and be located within |η| < 2.5. The overlap removal procedure described in section 4.6 is applied.
10.2.3 Definition of the tt̄ and W+jets channels

Events in this measurement are recorded with E_T^miss triggers. They require a threshold of E_T^miss > 50–60 GeV at L1, and E_T^miss > 100–110 GeV at the HLT stage, across the three years 2016–2018. All selected events are then required to satisfy E_T^miss > 200 GeV. This puts them far above the E_T^miss trigger thresholds, and outside the range where the E_T^miss trigger simulation does not model the behavior in data well.

All events need to contain exactly one muon with transverse momentum above 27 GeV, and no electrons with transverse momentum above 25 GeV.

tt̄ channel

Events in the tt̄ channel need to contain at least four jets. At least one jet has to be b-tagged at the medium operating point. The resulting selection of events is dominated by those originating from tt̄ production.

W+jets channel

The W+jets channel selects mostly events from W+jets production. It requires events to have at most four jets, and no jet that is b-tagged at the medium operating point. This selection guarantees that there is no overlap between the two channels.

10.3 Modeling

The muon trigger efficiency obtained in the simulation of ATLAS is evaluated using samples very similar to those described in section 9.2. The event topology targeted in this trigger efficiency measurement is different from the tt̄H(bb̄) analysis, with relaxed b-tagging and a large amount of E_T^miss required. Consequently, the samples considered and their treatment are slightly different. The tt̄ process is modeled with POWHEG+PYTHIA 8 [97, 100–103], but is not split into multiple components. Both W+jets and Z+jets processes are simulated with SHERPA 2.2.1 [94, 113–115]. Events from the Z+jets process containing b- or c-jets are not scaled by the factor 1.3 used in the tt̄H(bb̄) analysis due to the different phase space region used in this trigger efficiency measurement. The remaining processes have small contributions, and are produced as described in section 9.2.3. These include the production of single top quark, tt̄V, diboson, and tt̄tt̄ final states. No Higgs boson production processes are considered in this measurement, as their contributions to the tt̄ and W+jets channels are negligible.

10.3.1 Comparison with data

The modeling of the muon transverse momentum and E_T^miss, compared to data, is shown in figure 10.1 for the tt̄ channel. The corresponding distributions for the W+jets channel are shown in figure 10.2. These figures only show statistical uncertainties for the prediction. The E_T^miss distributions are obtained by removing the E_T^miss > 200 GeV requirement.

Figure 10.1: Expected distribution of the muon transverse momentum (left) and E_T^miss (right) in the tt̄ channel, compared to data. An overall normalization factor is applied to simulation to match data, with an effect smaller than 1%. Only statistical uncertainties are shown for the expected distribution, drawn with dashed lines. The E_T^miss > 200 GeV requirement is not applied in the figures showing the E_T^miss distributions.

The simulation slightly underestimates data, and an overall normalization factor is applied to the simulation to correct for this effect. In the tt̄ channel, this correction is smaller than 1%. The correction for the W+jets channel is around 10%. Both of these corrections are within the systematic uncertainties affecting the modeling of the simulation in both channels. An overall correction does not influence the trigger efficiency measurement, as the normalization factor cancels out in the ratio of events used to calculate an efficiency.

In the tt̄ channel, the prediction slightly underestimates data for low muon transverse momenta, and overestimates it at high pT. In the W+jets channel, the prediction underestimates data by a constant factor, independent of the muon pT. The E_T^miss distribution in the tt̄ channel shows a slight discrepancy between data and prediction, and a constant offset in the W+jets channel above the E_T^miss > 200 GeV threshold. The bulk of events in data are modeled well by the prediction. Systematic uncertainties are introduced to cover effects from mis-modeling that may affect the measurement; they are described in section 10.4.

Figure 10.2: Expected distribution of the muon transverse momentum (left) and E_T^miss (right) in the W+jets channel, compared to data. An overall normalization factor is applied to simulation to match data, with an effect around 10%. Only statistical uncertainties are shown for the expected distribution, drawn with dashed lines. The E_T^miss > 200 GeV requirement is not applied in the figures showing the E_T^miss distributions.

10.4 Systematic uncertainties

This section summarizes the sources of systematic uncertainty considered in the measurement. Trigger efficiencies and scale factors are measured independently for the nominal configuration and for each systematic variation. The size of each systematic uncertainty is given by the absolute difference between the measurement obtained with the nominal configuration and with the variation, specific to each systematic source. All uncertainties arising from the systematic sources considered are assumed to be uncorrelated, and are added in quadrature to obtain the total systematic uncertainty.

The distinction between the tt̄ and W+jets channels relies on b-tagging. A variation of the b-tagging operating point affects the composition of processes contained in both channels, and is considered as a systematic uncertainty. For the tt̄ channel, the nominal medium b-tagging operating point is replaced by the tight b-tagging operating point to define the systematic variation. In the W+jets channel, the variation from the nominal medium operating point to the loose operating point is used. The variations for both channels tighten the requirements for events to be considered in the analysis.

The E_T^miss > 200 GeV requirement in the measurement is considered as an additional source of uncertainty. Raising the E_T^miss threshold to define a systematic uncertainty is not feasible, as any substantial increase in this threshold limits the number of events remaining for the efficiency measurement. The systematic uncertainty derived in this way would be dominated by statistical fluctuations. Instead, a variation to E_T^miss > 150 GeV is considered as a source of uncertainty. By bringing this threshold closer to the trigger threshold, the variation covers possible mis-modeling of the E_T^miss trigger.

The effect of pile-up on the measurement is considered by raising the jet transverse momentum threshold to pT > 30 GeV, compared to the nominal requirement of pT > 25 GeV. This tightened requirement rejects more pile-up events.

The isolation of the reconstructed muon may affect the trigger efficiency, particularly for triggers that themselves include isolation requirements. In the nominal event selection for the efficiency measurements, no isolation requirements for the reconstructed muons are included. The effect of isolation is evaluated via a systematic uncertainty, calculated by comparing the nominal configuration to two variations. In one variation, the FCTight isolation operating point is applied to the reconstructed muon, while the other variation uses the FCTTO operating point. The systematic uncertainty is defined as the larger of the two variations, compared to the nominal configuration.

Lastly, the effect of muon identification requirements is considered. The associated systematic uncertainty is defined as the difference between the nominal measurement, obtained with the medium identification operating point, and a measurement using the high-pT operating point.

Additional systematic uncertainties related to muon calibration, similar to those described in section 6.6.2, are negligible.

10.5 Results

This section lists the results of the trigger efficiency measurement and the SFs derived from it. All results are reported separately for each of the years of data-taking considered, split between the MS barrel and end-cap regions, and for both the tt̄ and W+jets channels. Trigger efficiencies and SFs are listed in section 10.5.1, only including statistical uncertainties.
Section 10.5.2 describes the SF results including systematic uncertainties.

10.5.1 Trigger efficiencies

Figure 10.3 shows the trigger efficiencies for muons in the barrel region of the MS, derived for the logical OR between the two triggers described in section 10.1. The corresponding results for the end-cap regions are presented in figure 10.4. The figures show the tt̄ channel on the left, and the W+jets results on the right. Each row corresponds to one year of data-taking, with 2016 on top, 2017 in the middle, and 2018 on the bottom. The upper part of each figure shows the trigger efficiency as a function of the reconstructed muon transverse momentum. The measurement in data is reported with black points, along with the statistical uncertainty. The predicted trigger efficiency from simulation is shown as a green hashed area, with the size of the area indicating the statistical uncertainty. In the lower part of the figure, the trigger SF is shown. It is calculated as the ratio of the trigger efficiency measured in data to the measurement in simulation, with a statistical uncertainty given by the sum in quadrature of the statistical uncertainties of the efficiencies entering the calculation. All figures also report the measured efficiencies in data and simulation, as well as the measured SF. These are obtained by fitting a constant term to the binned efficiency and SF distributions, only considering the bins with muon pT > 100 GeV. The efficiencies and SFs show no large dependence on the muon transverse momentum. When including a term proportional to the muon pT in the fit, no significant slopes are found. The uncertainty shown in the figure for the trigger efficiency and SF results is the absolute statistical uncertainty, and does not include sources of systematic uncertainty.

Figure 10.3: Muon trigger efficiencies and SFs in the barrel region, measured in the tt̄ (left) and W+jets (right) channels, for data recorded in 2016 (top), 2017 (middle), and 2018 (bottom). The upper part of the figures shows the trigger efficiencies for data in black, and for simulation as a hashed green area. The lower part shows the SF, given by the ratio of the efficiency measured in data to that in simulation. The efficiencies are shown as a function of the reconstructed muon transverse momentum, and the resulting efficiencies and SFs from a fit to muons with pT > 100 GeV are also listed in the figure. Only statistical uncertainties are included.

Figure 10.4: Muon trigger efficiencies and SFs in the end-cap regions, measured in the tt̄ (left) and W+jets (right) channels, for data recorded in 2016 (top), 2017 (middle), and 2018 (bottom). The upper part of the figures shows the trigger efficiencies for data in black, and for simulation as a hashed green area. The lower part shows the SF, given by the ratio of the efficiency measured in data to that in simulation. The efficiencies are shown as a function of the reconstructed muon transverse momentum, and the resulting efficiencies and SFs from a fit to muons with pT > 100 GeV are also listed in the figure. Only statistical uncertainties are included.

Table 10.1: Summary of trigger efficiencies for data taken between 2016 and 2018, and for simulation. The results are reported separately for the barrel and end-cap regions, and split by channel. Only absolute statistical uncertainties are included.

  Year   Region     Channel   Data efficiency   Simulation efficiency
  2016   Barrel     tt̄        68.7% ± 0.7%      76.0% ± 0.2%
         Barrel     W+jets     68.5% ± 0.4%      75.5% ± 0.4%
         End-caps   tt̄        84.8% ± 0.8%      87.8% ± 0.3%
         End-caps   W+jets     85.5% ± 0.4%      87.1% ± 0.3%
  2017   Barrel     tt̄        66.8% ± 0.6%      76.9% ± 0.3%
         Barrel     W+jets     66.4% ± 0.4%      76.6% ± 0.3%
         End-caps   tt̄        85.3% ± 0.7%      88.4% ± 0.3%
         End-caps   W+jets     84.4% ± 0.3%      87.4% ± 0.3%
  2018   Barrel     tt̄        67.9% ± 0.5%      76.9% ± 0.2%
         Barrel     W+jets     66.8% ± 0.3%      76.9% ± 0.3%
         End-caps   tt̄        84.1% ± 0.6%      87.9% ± 0.3%
         End-caps   W+jets     83.2% ± 0.3%      87.1% ± 0.2%

The measured trigger efficiencies generally agree between the tt̄ and W+jets channels within their statistical uncertainties. They vary across the three years considered due to changes in the ATLAS MS, as well as changes to the trigger algorithms. The trigger efficiencies measured in the barrel region are significantly below the efficiencies in the end-caps. This is mostly caused by the limited coverage of the L1 trigger system in the barrel region, where detector support structure and elevator paths to access the inner parts of ATLAS necessitate gaps in the RPC coverage.
The L1 system covers 99% of the end-caps, and around 80% of the barrel.

The simulation overestimates the trigger efficiency in the RPCs, and consequently the trigger efficiency measured in data is significantly below the efficiency obtained from simulation in the barrel region. In the end-caps, the agreement between data and simulation is better, with SFs closer to unity.

A summary of the efficiencies measured for muons with pT > 100 GeV for the years 2016–2018 and both channels is provided in table 10.1. Only the absolute statistical uncertainties are included in the table. The statistical uncertainties for the trigger efficiency measured in data are smaller for the W+jets channel than for the tt̄ channel, both in the barrel and end-cap regions. The efficiencies in simulation have comparable statistical uncertainties between the channels.

10.5.2 Scale factors and impact of systematic uncertainties

The measured muon trigger SFs for muons with pT > 100 GeV recorded in the years 2016–2018, split between barrel and end-cap regions, and split by channel, are summarized in table 10.2. The results show the total absolute uncertainty for the SFs, as well as its split into statistical and systematic components. The total uncertainty is calculated as the sum in quadrature of these two components.

Table 10.2: Summary of SFs for the years 2016–2018. The results are reported separately for the barrel and end-cap regions, and split by channel. The total absolute uncertainties for the SFs are shown, and their split into statistical and systematic components is also included.

  Year  Region    Channel  SF            (stat.)  (syst.)
  2016  Barrel    tt̄       90.5% ± 1.2%  (0.9%)   (0.8%)
  2016  Barrel    W+jets   90.7% ± 1.1%  (0.8%)   (0.8%)
  2016  End-caps  tt̄       96.7% ± 2.6%  (0.9%)   (2.5%)
  2016  End-caps  W+jets   98.2% ± 0.9%  (0.5%)   (0.7%)
  2017  Barrel    tt̄       86.9% ± 1.0%  (0.8%)   (0.6%)
  2017  Barrel    W+jets   86.8% ± 0.7%  (0.6%)   (0.4%)
  2017  End-caps  tt̄       96.7% ± 1.1%  (0.8%)   (0.7%)
  2017  End-caps  W+jets   96.6% ± 1.1%  (0.5%)   (1.0%)
  2018  Barrel    tt̄       88.3% ± 1.7%  (0.7%)   (1.6%)
  2018  Barrel    W+jets   86.9% ± 1.2%  (0.5%)   (1.1%)
  2018  End-caps  tt̄       95.8% ± 1.7%  (0.7%)   (1.5%)
  2018  End-caps  W+jets   95.5% ± 0.8%  (0.4%)   (0.7%)

All systematic components are added in quadrature to obtain the total systematic uncertainty.

The SFs vary significantly between the three years, caused by differences in the active trigger chambers in the MS. The mis-modeling of the RPC efficiency in simulation needs to be corrected with SFs that deviate more significantly from unity in the barrel region compared to the end-caps. The measurements in the tt̄ and W+jets channels per year and detector region are compatible with each other within their uncertainties. Relative statistical uncertainties in the measurement reach up to 1%, while the systematic uncertainties are generally slightly larger. The measured SFs are in good agreement with those derived from decays of Z bosons to muon pairs, when considering their associated uncertainties.

A detailed look at the systematic uncertainties, split into the components described in section 10.4, is provided in table 10.3. It lists the arithmetic mean of the relative systematic uncertainty on the SF per source, averaged over the three years 2016–2018 of data-taking.

The smallest uncertainty source is the variation of the muon isolation requirement, followed by the variation of the b-tagging operating point used to define the channels. The variation of the ETmiss requirement and the jet pT threshold have a comparable impact, and the largest source of uncertainty is the variation of the muon identification operating point. The impact per source of systematic uncertainty is generally larger in the tt̄ channel than in the W+jets channel.
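The quadrature combination used for the total uncertainties in table 10.2 can be sketched as follows (a minimal illustration, not code from the analysis):

```python
import math

def combine_in_quadrature(*components: float) -> float:
    """Combine uncertainty components in quadrature."""
    return math.sqrt(sum(c**2 for c in components))

# Example: stat. 0.9% and syst. 0.8% give a total of about 1.2%,
# as listed for the 2016 barrel ttbar SF in table 10.2.
total = combine_in_quadrature(0.9, 0.8)
```

The same function applies when the individual systematic components are summed into a total systematic uncertainty.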
A contribution to this larger impact in the tt̄ channel arises from the larger statistical uncertainties in the data efficiency measurement in the tt̄ channel compared to the W+jets channel; a statistical fluctuation in the determination of the SF measured with a systematic variation can increase the impact attributed to that source.

Table 10.3: Average size of the relative systematic uncertainty on the SF per source, calculated as the arithmetic mean over the three years 2016–2018 of data-taking. The uncertainties are reported separately per detector region and channel in the measurement.

  Uncertainty source         Region    tt̄     W+jets
  ETmiss threshold           Barrel    0.5%   0.3%
                             End-caps  0.9%   0.3%
  b-tagging operating point  Barrel    0.2%   0.1%
                             End-caps  0.3%   0.2%
  Jet pT threshold           Barrel    0.7%   0.3%
                             End-caps  0.5%   0.1%
  Muon isolation             Barrel    0.1%   0.1%
                             End-caps  0.1%   0.1%
  Muon identification        Barrel    0.5%   0.6%
                             End-caps  1.0%   0.7%

11 Differential cross-section approximation

The MEM presented in chapter 7 is a powerful analysis technique with large computational demands. The calculation of MEM likelihoods relies on two ingredients: the transfer function and the fully differential cross-section. Equation (7.7) shows the general form of the likelihoods and is repeated here:

\[
\mathcal{L}_\alpha = \frac{1}{\sigma_\alpha} \sum_{\text{perm.}} \int_{\vec{Y}} T(\vec{X}|\vec{Y}) \, \mathrm{d}\sigma_\alpha(\vec{Y}) \,. \tag{11.1}
\]

In this expression, T(X|Y) is the transfer function, X and Y are the kinematics of the reconstructed objects and of the partons, respectively, and α denotes the process of interest. The transfer function is generally fast to evaluate. The evaluation of the differential cross-section dσα/σα is much slower for processes involving the interaction of many particles.

The calculation of the MEM discriminant for the tt̄H(bb̄) analysis described in chapter 6 requires around 10^12 evaluations of dσα. This includes the calculation for all systematic variations needed for the statistical analysis in the SR_1^{≥6j} region.
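The structure of equation (11.1) can be illustrated with a one-dimensional toy Monte Carlo estimate; the Breit-Wigner-shaped density and Gaussian transfer function below are hypothetical stand-ins, not the actual analysis ingredients:

```python
import math
import random

random.seed(1)

def dsigma(y):
    """Toy differential cross-section: a single Breit-Wigner-like resonance."""
    mass, width = 173.0, 10.0
    return 1.0 / ((y - mass) ** 2 + width**2)

def transfer(x, y, res=15.0):
    """Toy Gaussian transfer function T(x|y) for detector smearing."""
    return math.exp(-0.5 * ((x - y) / res) ** 2) / (res * math.sqrt(2.0 * math.pi))

def likelihood(x, y_min=100.0, y_max=250.0, n=50_000):
    """Monte Carlo estimate of L(x) = (1/sigma) * integral of T(x|y) dsigma(y)."""
    ys = [random.uniform(y_min, y_max) for _ in range(n)]
    volume = y_max - y_min
    sigma = volume * sum(dsigma(y) for y in ys) / n  # estimates the total cross-section
    integral = volume * sum(transfer(x, y) * dsigma(y) for y in ys) / n
    return integral / sigma

# The likelihood is largest for detector-level values near the resonance.
```

In the real calculation this integral runs over the full parton-level phase space and is summed over jet-parton assignment permutations, which is what drives the computational cost.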
When the event selection or the kinematics of the selected events change, which happens while the analysis is optimized for sensitivity, the calculation has to be repeated. Consequently, the MEM implementation needs to make sufficient approximations to limit the calculation time to a reasonable level.

This chapter investigates the feasibility of approximating dσα with machine learning methods. A sufficiently precise approximation dσ̂α can replace the exact expression dσα in the likelihood evaluation without significant adverse effects. If such an approximation is possible, it can be optimized for speed and may be used to significantly reduce the computation time needed for MEM calculations.

Section 11.1 provides a general introduction to the method and discusses its challenges. An example for a scattering process with two particles in the final state is provided in section 11.2. Section 11.3 demonstrates the feasibility of the method for a complex scattering process in which six particles are produced. An alternative approach for faster MEM calculations with machine learning techniques is discussed in section 11.4 and compared to the approach studied in this chapter.

11.1 Overview

The expression for a fully differential cross-section of a scattering process at the LHC has the form

\[
\mathrm{d}\sigma_\alpha \propto \delta^4(p_{\mathrm{net}}) \, \mathrm{d}x_1 \, \mathrm{d}x_2 \prod_{i=1}^{N} \frac{\mathrm{d}^3 \vec{p}_i}{(2\pi)^3 \, 2E_i} \,, \tag{11.2}
\]

with momentum fractions x1 and x2 carried by the colliding partons, N final-state particles, and a four-vector p_net corresponding to the difference in four-momentum between all incoming and outgoing particles. While dσα is an expression with 3N+2 degrees of freedom, it is non-zero only when four-momentum is conserved. The phase space in which four-momentum is conserved has 3N−2 degrees of freedom: three degrees of freedom per final-state particle, two degrees of freedom related to the colliding partons, and the four constraints applied.
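These four constraints are cheap to verify for any candidate phase-space point; a minimal sketch with four-vectors stored as (E, px, py, pz) tuples:

```python
def four_momentum_balance(incoming, outgoing):
    """Net four-momentum p_net between incoming and outgoing particles."""
    return tuple(
        sum(p[i] for p in incoming) - sum(p[i] for p in outgoing) for i in range(4)
    )

def is_conserved(incoming, outgoing, tol=1e-6):
    """True if all four components of p_net vanish within the tolerance."""
    return all(abs(c) < tol for c in four_momentum_balance(incoming, outgoing))

# Two massless partons along the beam axis producing two massive particles:
incoming = [(500.0, 0.0, 0.0, 500.0), (500.0, 0.0, 0.0, -500.0)]
outgoing = [(500.0, 100.0, 0.0, 80.0), (500.0, -100.0, 0.0, -80.0)]
```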
As it is computationally inexpensive to check whether four-momentum is conserved in any calculation of interest, the approximation of dσα is implemented in a phase space where four-momentum conservation is enforced.

11.1.1 Fully differential cross-sections

After partial integration over four degrees of freedom, the fully differential cross-section for the scattering of two partons into two final-state particles pi = (Ei, pi) is given by

\[
\mathrm{d}\sigma_{2\to2} = \frac{1}{64 (2\pi)^2 E_B^4} \sum_{j,k} \frac{1}{x_1 x_2 E_1 E_2} f_j(x_1)\, f_k(x_2)\, \left|\mathcal{M}_{2\to2}\right|^2 p_{\mathrm{T}} \left|\vec{p}_1\right| \left|\vec{p}_2\right| \mathrm{d}p_{\mathrm{T}}\, \mathrm{d}\phi_1\, \mathrm{d}\eta_1\, \mathrm{d}\eta_2 \,, \tag{11.3}
\]

with additional details provided in appendix section C.1. The energy of the colliding protons is EB, and pT is the transverse momentum of the final-state particles, which is equal for both particles in this case. The extension to final states involving additional particles is straightforward; a phase space factor d³pi/((2π)³ 2Ei) per final-state particle extends the expression. The production of six final-state particles can be parameterized as

\[
\mathrm{d}\sigma_{2\to6} = \frac{1}{1024 (2\pi)^{14} E_B^4} \sum_{j,k} \frac{1}{x_1 x_2} f_j(x_1)\, f_k(x_2)\, \left|\mathcal{M}_{2\to6}\right|^2 \left[ \prod_{i=1}^{5} \frac{1}{E_i}\, p_{\mathrm{T}i} \left|\vec{p}_i\right| \mathrm{d}p_{\mathrm{T}i}\, \mathrm{d}\phi_i\, \mathrm{d}\eta_i \right] \frac{1}{E_6}\, \mathrm{d}p_{z,6} \,. \tag{11.4}
\]

11.1.2 Challenges

Obtaining a good approximation of a differential cross-section becomes increasingly challenging for final states with many particles. The differential cross-section can generally not be factorized into components that separately describe the dependence on each kinematic variable; an approximation needs to correctly model the correlations between kinematic variables. The expression can vary over many orders of magnitude across phase space, and the matrix element can be highly non-uniform when describing the effect of resonances in the scattering process. A suitable parametrization of the differential cross-section helps mitigate this problem, motivating the use of an expression differential in the invariant masses of internal resonances. An analytical change of variables to achieve this is not always possible [137].

The differential cross-section across the phase space with 3N−2 degrees of freedom vanishes in many regions.
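Whether a configuration lies in such a vanishing region can be read off from the reconstructed momentum fractions; a sketch assuming massless partons collinear with the beams (a standard kinematic relation, not specific to this analysis):

```python
def momentum_fractions(final_state, e_beam):
    """Momentum fractions x1, x2 of the colliding partons, computed from
    final-state four-vectors (E, px, py, pz), assuming massless partons
    collinear with the beams."""
    e_sum = sum(p[0] for p in final_state)
    pz_sum = sum(p[3] for p in final_state)
    return (e_sum + pz_sum) / (2.0 * e_beam), (e_sum - pz_sum) / (2.0 * e_beam)

# A final state that is too energetic for E_B = 6500 GeV beams requires x1 > 1,
# where the PDFs vanish:
x1, x2 = momentum_fractions([(9000.0, 0.0, 0.0, 8000.0)], 6500.0)
assert x1 > 1.0 and 0.0 < x2 < 1.0
```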
These vanishing regions correspond to events in which the colliding partons would require more energy than is provided by the beam of protons, EB. The PDFs vanish in these regions, and the differential cross-section approaches zero in their vicinity, since f(x) → 0 as x → 1. The distribution of such regions with vanishing cross-section depends on the parametrization chosen.

A suitable parameterization choice for the differential cross-section simplifies the approximation task. No matter the parameterization, the expression generally remains a function with many distinct features in its distribution across the phase space, which all need to be captured by a good approximation.

[Figure 11.1: Exemplary Feynman diagram for gluon-initiated tt̄ production.]

11.2 Final state with two particles

This section demonstrates the approximation of the differential cross-section for the production of a two-particle system as a proof of concept. A possible parameterization for such a process is given in equation (11.3). The production of two particles in head-on collisions is invariant under a global rotation in the φ direction, motivating the parametrization as a function of dφ1 to take advantage of this symmetry.

The process studied in this section is the production of a top quark pair, without its subsequent decays. Only gluon-initiated collisions are considered; no significant impact is expected from including tt̄ production via quark–antiquark annihilation. An exemplary Feynman diagram is shown in figure 11.1.

11.2.1 Approximation method

The approximation of dσtt̄ is performed with the Foam algorithm [154]. This algorithm is designed for MC event generation and integration.

The Foam algorithm is defined for an integrand ρ, defined across a hypercube. The algorithm samples the integrand across this hypercube, and then divides the hypercube into two cells. These cells are hyperrectangles, obtained by a split along a hyperplane.
In subsequent steps, the distribution of the integrand across each cell is sampled, and further binary splits of cells are performed until a maximum number of cells is reached. The resulting grid of cells is called a foam. The foam defines an approximation ρ′ of the integrand ρ; this approximation is constant across each cell. An event weight w = ρ′/ρ is defined for each event generated when sampling cells. The implementation used in this study optimizes the binary splits of cells such that the variance of the weights w is minimized.

When identifying ρ = dσtt̄, a foam can be built to obtain an approximation ρ′ = dσ̂tt̄. This foam spans a four-dimensional hypercube, with edges along dpT, dφ1, dη1, and dη2. As the differential cross-section is constant along dφ1, no cell splits are made in this direction. The foam defines a lookup table for the differential cross-section dσtt̄, with a constant approximation dσ̂tt̄ across each cell. Cells are large where dσtt̄ is approximately constant, and small where it varies quickly as a function of the edges of the hypercube. Due to the binary tree structure of the foam, dσ̂tt̄ is fast to evaluate. The quality of the approximation dσ̂tt̄ is limited by the number of cells used in the foam.

11.2.2 Implementation

A foam with 50 000 cells is constructed to describe dσtt̄. The implementation of the function uses the form specified in equation (11.3). The LO matrix element is provided by MadGraph5_aMC@NLO [95], denoted as MG5_aMC@NLO in the following. The CTEQ6L1 [120] PDF set is used, evaluated with the LHAPDF [17] interface. The renormalization and factorization scales are set to the top quark mass in the simulation, which is 172.5 GeV for this study. The beam energy is set to EB = 6.5 TeV. A sample of one million simulated events is produced with MG5_aMC@NLO with equivalent settings.
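A much-simplified one-dimensional sketch of the cell-splitting idea (the actual Foam implementation works in higher dimensions, optimizes the split positions, and minimizes the variance of the event weights):

```python
import random

random.seed(2)

def build_foam(f, lo, hi, max_cells=64, n_samples=200):
    """Minimal 1D sketch of a Foam-like grid: repeatedly bisect the cell in
    which a piecewise-constant approximation of f is estimated to be worst."""
    def cell(a, b):
        xs = [random.uniform(a, b) for _ in range(n_samples)]
        ys = [f(x) for x in xs]
        return {
            "lo": a,
            "hi": b,
            "value": sum(ys) / len(ys),               # constant cell prediction
            "spread": (b - a) * (max(ys) - min(ys)),  # crude badness estimate
        }

    cells = [cell(lo, hi)]
    while len(cells) < max_cells:
        worst = max(cells, key=lambda c: c["spread"])
        cells.remove(worst)
        mid = 0.5 * (worst["lo"] + worst["hi"])
        cells += [cell(worst["lo"], mid), cell(mid, worst["hi"])]
    return sorted(cells, key=lambda c: c["lo"])

def evaluate(cells, x):
    """Piecewise-constant lookup, analogous to evaluating the foam."""
    for c in cells:
        if c["lo"] <= x < c["hi"]:
            return c["value"]
    return cells[-1]["value"]

# Cells cluster where the toy function varies quickly (near the peak at x = 0.5):
foam = build_foam(lambda x: 1.0 / ((x - 0.5) ** 2 + 0.01), 0.0, 1.0)
```

As in the real foam, cells end up small near sharp features and large where the integrand is flat, and evaluation is a fast lookup.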
The MG5_aMC@NLO sample provides the reference for the differential cross-section distributions considered to evaluate the foam performance.

The distribution of the foam in one dimension u is obtained by marginalizing over the three remaining dimensions. This is done by considering all cells that overlap with an interval [u1, u2]. The prediction dσ̂tt̄ in all of these cells is summed, weighted by the overlap between each cell and the interval. This process can be repeated for many intervals to obtain an expression for the normalized differential cross-section in one variable, (1/σtt̄) dσtt̄/du.

11.2.3 Results

Figure 11.2 shows the foam approximation of the normalized differential cross-section as a function of each of the four variables describing the full phase space for this process. The distribution of the foam is shown as a dashed blue line, overlaid on top of a reference distribution obtained with the events generated with MG5_aMC@NLO. The reference distribution is shown in green, with statistical uncertainties indicated as hashed gray lines. No statistical uncertainty related to the approximation by the foam is included.

The foam describes all four variables very well. The transverse momentum distribution is accurately approximated across four orders of magnitude. The distribution of the azimuthal angle φ is completely uniform due to the design of the foam, and therefore accurately reflects the rotational symmetry. Both pseudorapidity distributions of the top quarks are also accurately described.

This method works well for final states with limited degrees of freedom. Its performance depends on the parameterization chosen for the foam, and the dimensionality of the foam corresponds to the number of degrees of freedom in the process. The performance is significantly reduced for processes involving the production of many particles.
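The overlap-weighted projection used to obtain these one-dimensional distributions can be sketched as follows (with a hypothetical cell representation; the actual foam stores its cells in a binary tree):

```python
def marginalize(cells, axis, edges):
    """Project hyperrectangular cells onto one axis: each cell contributes its
    (constant) prediction, weighted by the fractional overlap of the cell with
    every interval [edges[i], edges[i+1]). Here "value" is assumed to already
    include the cell content in the remaining dimensions."""
    hist = [0.0] * (len(edges) - 1)
    for c in cells:
        lo, hi = c["bounds"][axis]
        for i, (a, b) in enumerate(zip(edges[:-1], edges[1:])):
            overlap = max(0.0, min(hi, b) - max(lo, a))
            hist[i] += c["value"] * overlap / (hi - lo)
    return hist

# Two toy 2D cells, projected onto axis 0 with two intervals:
cells = [
    {"bounds": [(0.0, 1.0), (0.0, 1.0)], "value": 2.0},
    {"bounds": [(0.5, 1.5), (0.0, 1.0)], "value": 4.0},
]
print(marginalize(cells, axis=0, edges=[0.0, 1.0, 2.0]))  # → [4.0, 2.0]
```

The second cell straddles the interval boundary, so half of its content lands in each bin.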
Since the construction of foams with hundreds of millions of cells becomes computationally prohibitive, the average number of cell splits per dimension decreases when adding more final-state particles. The following section provides an alternative method for such cases.

[Figure 11.2: Prediction for the differential cross-section of tt̄ production by a foam with 50 000 cells, drawn as dashed blue lines. The distributions are shown as a function of the four variables used to parameterize the fully differential cross-section (pT, φ1, η1, η2), and compared to a reference set of events generated with MG5_aMC@NLO. The reference distribution is shown in green, with statistical uncertainties indicated with hashed gray lines.]

11.3 Final state with six particles

The approximation of differential cross-sections with neural networks (see section 5.3.2) offers several benefits. A neural network can handle any number of input parameters, so it is possible to provide inputs beyond the 3N−2 parameters required to fully describe final states with N particles. This allows the use of all physically motivated variables that strongly affect the differential cross-section, such as the momentum fractions x1, x2 and the invariant masses of internal resonances. A foam built in d > 3N−2 dimensions approximates a function that is non-zero only in specific regions defined by a delta distribution, and is thus not useful in practice.
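Section 11.1.2 noted that the differential cross-section varies over many orders of magnitude; a neural-network surrogate can cope with this by regressing the logarithm of the target. A minimal sketch of the idea on a one-dimensional toy density (pure NumPy; the sharply peaked function is a hypothetical stand-in, not the tt̄ matrix element):

```python
import numpy as np

rng = np.random.default_rng(0)

def dsigma(y):
    """Toy stand-in for a differential cross-section: sharply peaked at 0.3."""
    return np.exp(-8.0 * np.abs(y - 0.3)) + 1e-3

# Regress log(dsigma) with a tiny one-hidden-layer network, trained with
# full-batch gradient descent on a mean-squared-error loss.
x = rng.uniform(0.0, 1.0, size=(2048, 1))
t = np.log(dsigma(x))

w1 = rng.normal(0.0, 1.0, (1, 32)); b1 = np.zeros(32)
w2 = rng.normal(0.0, 0.1, (32, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(2000):
    h = np.tanh(x @ w1 + b1)           # hidden activations
    pred = h @ w2 + b2                 # predicted log(dsigma)
    grad = 2.0 * (pred - t) / len(x)   # gradient of the MSE loss w.r.t. pred
    gw2, gb2 = h.T @ grad, grad.sum(axis=0)
    gh = (grad @ w2.T) * (1.0 - h**2)  # backpropagate through tanh
    gw1, gb1 = x.T @ gh, gh.sum(axis=0)
    w2 -= lr * gw2; b2 -= lr * gb2; w1 -= lr * gw1; b1 -= lr * gb1

def dsigma_hat(y):
    """Fast surrogate: exponentiate the network output."""
    h = np.tanh(np.atleast_2d(y) @ w1 + b1)
    return np.exp(h @ w2 + b2)
```

Exponentiating the network output guarantees a positive prediction, and the log-space fit spreads the approximation error more evenly across the orders of magnitude spanned by the target.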
A neural network with d > 3N−2 input nodes does not suffer from this delta-distribution issue; it can make use of all input variables for the function approximation task, and does not need to learn the relation between input variables. An implementation with neural networks can also take advantage of highly optimized libraries to increase the computational performance.

This section describes the approximation of the fully differential cross-section for tt̄ production, with subsequent decays of the top quarks into W bosons and bottom quarks, and also including the decays of the W bosons. Figure 11.3 visualizes this process.

[Figure 11.3: Exemplary Feynman diagram for gluon-initiated tt̄ production, with subsequent decays (t → W⁺b, t̄ → W⁻b̄, W⁺ → l⁺νl, W⁻ → qq̄) into a final state with six particles.]

There are six final-state particles, for a total of 16 degrees of freedom. Top quark pair production is a common background process for many analyses at the LHC. It represents an important step towards the tt̄+bb̄ process, for which the MEM likelihoods are calculated in the tt̄H(bb̄) analysis as described in chapter 7. This process also includes four resonances, in the form of the two top quarks and the two W bosons. It provides a good benchmark for the performance of neural networks when approximating its differential cross-section dσtt̄.

11.3.1 General considerations

The differential cross-section for the production of six particles is given in equation (11.4). When dividing this expression by the total cross-section,

\[
f(\vec{Y}) = \frac{1}{\sigma_{t\bar{t}}} \, \mathrm{d}\sigma_{t\bar{t}} \,, \tag{11.5}
\]

the resulting distribution describes the probability density for generating a final state with a configuration Y. A set of samples can be generated in the phase space