On Bayesian Baseline Removal and Signal ExtractionMethods in AstronomybyRuihan Henry LiuA THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFBachelor of Science (Honours)inTHE FACULTY OF SCIENCE(Physics and Astronomy)The University of British Columbia(Vancouver)April 2021© Ruihan Henry Liu, 2021The following individuals certify that they have read, and recommend to the Fac-ulty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:On Bayesian Baseline Removal and Signal Extraction Methods in As-tronomysubmitted by Ruihan Henry Liu in partial fulfillment of the requirements for thedegree of Bachelor of Science (Honours) in Physics and Mathematics.Examining Committee:Douglas Scott, Physics and AstronomySupervisorRyley Hill, Physics and AstronomySecond ReaderRob Kiefl, Physics and AstronomyPHYS 449 InstructoriiAbstractSignals in radio astronomy are often accompanied with strong instrumental effects.These ”baseline” instrumental effects may affect the detection of real astrophysicalsignals, such as absorption and emission lines. A recent paper published in Naturehas raised the possibility of life on Venus through a claimed detection of phosphine(PH3) gas absorption lines in the Venusian atmosphere. In this thesis, we utilizeBayesian marginalization methods to perform a re-analysis of the original obser-vations of Venus using data from the James Clerk Maxwell Telescope (JCMT) andthe Atacama Large Millimeter Array (ALMA). We find a signal detection in theJames Clerk Maxwell Telescope data, but no phosphine absorption line feature inthe higher resolution data from the Atacama Large Millimeter Array. We thereforeconclude that the phosphine absorption line is not present as originally claimed. Wefurther expand upon the methods used in this specific implementation, and discussother applications of Bayesian signal extraction methodology.iiiLay SummaryIn a recent paper published in the journal Nature, a group of researchers claimed thedetection of a signal in the atmosphere of Venus which might indicate the existenceof life on that planet. In this thesis, we perform a re-analysis of their findings usingthe original observational data, and conclude that the signal detected is likely a dataartifact rather than a genuine sign of life. We further generalize our methods, anddiscuss the applications to other similar feature detection problems.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Purpose and Outline . . . . . . . . . . . . . . . . . . . . . . . . . 32 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.1 Introduction and Definitions . . . . . . . . . . . . . . . . . . . . 42.2 Bayesian Derivation for Maximum and Marginal Likelihoods . . . 62.2.1 Derivation for the maximum likelihood . . . . . . . . . . 62.2.2 Derivation and Discussion for the marginal likelihood . . 82.3 Discussion of Likelihood Functions . . . . . . . . . . . . . . . . 102.3.1 Gaussian Likelihoods . . . . . . . . . . . . . . . . . . . . 102.4 The Gaussian Sampling Prior . . . . . . . . . . . . . . . . . . . . 122.5 Example: Toy Model . . . . . . . . . . . . . . . . . . . . . . . . 132.6 The Addition of the Line Component . . . . . . . . . . . . . . . . 16v2.6.1 Prior Restrictions During Line Fitting . . . . . . . . . . . 173 Bayesian Line Fitting Applied to Observational Data from the JamesClerk Maxwell Telescope . . . . . . . . . . . . . . . . . . . . . . . . 183.1 Traditional JCMT Data Analysis . . . . . . . . . . . . . . . . . . 183.2 Bayesian Analysis of JCMT Data . . . . . . . . . . . . . . . . . . 213.3 Discussion of JCMT Analysis Results . . . . . . . . . . . . . . . 254 Bayesian Line Fitting Applied to Observational Data from the Ata-cama Large Millimeter Array . . . . . . . . . . . . . . . . . . . . . . 274.1 Traditional ALMA Data Analysis . . . . . . . . . . . . . . . . . 274.2 Bayesian Analysis of ALMA Data . . . . . . . . . . . . . . . . . 294.3 Discussion of ALMA data analysis results . . . . . . . . . . . . . 325 Discussion of the Generalization of Bayesian Line Fitting Methodology 366 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396.1 Summary of Findings . . . . . . . . . . . . . . . . . . . . . . . . 396.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41viList of FiguresFigure 1.1 Supplementary Figure 4 from Greaves et al. [5]. The left panelshows the raw signal of different regions of the disk of Venustaken from ALMA, as well as their baseline fits (dotted lines).The right panel shows the signals after baseline removal. Ob-serve that although the removal of the baseline did flatten theoriginally artifact-dominated signal, the baseline removal pro-cess also contributed significantly to the detection of the ab-sorption line in the blue signal. The authors used the blue sig-nal to make the case for phosphine detection, but a portion ofthat signal comes from the choice of baseline fit . . . . . . . . 2Figure 2.1 Toy model of best-fit and marginal fitting lines. The blue linesrepresent the generated data, the grey represent the set of marginalbaselines and the yellow represent the best-fit baseline on thegenerated data. . . . . . . . . . . . . . . . . . . . . . . . . . 14viiFigure 3.1 Results of data analysis of the JCMT observations followingthe techniques utilised in Greaves et al. [5]. The left three pan-els represent the three-step baseline removal approach used bythe original authors to confirm the signal, with the cyan linerepresenting the background signal, while the darker line rep-resents the baseline fitted and removed in each step. These leftpanels can be compared to Supplementary Figure 1 in Greaveset al. [5] for clarity. The right panel represents the final flat-tened signal after the previously described data processing steps,binned into a histogram to mimic Figure 1 of Greaves et al. [5].Although a weak absorption line is present at around −8 kms−1, this detection does not seem as strong as the one shown inthe original paper. . . . . . . . . . . . . . . . . . . . . . . . . 20Figure 3.2 Plots of 2nd-order, 4th-order and 6th-order polynomial fits onthe combined JCMT Venus observations. The left panels rep-resent the baseline fits with no absorption line, while the rightpanels represent the fits with an added Gaussian modelling lineabsorption, using the methods described in Section 2.6. Thenavy line represents the data, the grey lines represent a sampleof marginalized baselines and the yellow line represents thebest-fit in each panel. . . . . . . . . . . . . . . . . . . . . . . 22Figure 3.3 Computed (unnormalised) probabilities for the Bayesian marginalbest-fit analysis for the line and no line models, for the cases ofa uniform prior and an added L0 prior. Note that given the un-normalized probabilities, it is only relevant here to comparethe probabilities to each other, and to disregard the y-scalingentirely. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24Figure 4.1 Continuum image of Venus as observed by ALMA, averagedover all frequency channels. The angular diameter of Venus atthe centre of the disk measures 15.36′′. . . . . . . . . . . . . 29viiiFigure 4.2 Disk-averaged ALMA narrowband signal centred on the phos-phine absorption line. The top panel shows the original signalas well as a 12th-order polynomial fit on top, while the bottompanel shows the residuals after subtraction of the 12th-orderbaseline from the original signal. . . . . . . . . . . . . . . . 30Figure 4.3 Bayesian marginalization over ALMA data. As was the casewith Figure 3.2, we sample from a Gaussian distribution cen-tered on the best-fit parameters to approximately marginalizeover all polynomial orders. The navy line represents the data,the grey lines represent a sample of marginalized baselines,and the yellow line represents the best-fit in each panel. . . . . 31Figure 4.4 Computed relative probabilities for the Bayesian line analysisconducted on the ALMA observations. As is the case withFigure 3.3, the probabilities are un-normalised and thus therelative probabilities give the true comparison. . . . . . . . . 33ixGlossaryAIC Akaike information criterionALMA Atacama Large Millimeter ArrayCADC Canadian Astronomy Data CentreIID Independent and identically distributedJCMT James Clerk Maxwell TelescopeMAP Maximum a posterioriMLE Maximum likelihood estimateMSE Mean squared errorNLL Negative log-likelihoodxAcknowledgmentsFirst and foremost, I would like to thank my advisor, Douglas Scott, for his supportand advice throughout not only this thesis, but all my undergraduate years. When Ifirst started working with Douglas, I was only a fresh second-year undergraduate,and much of my academic development in the years since can be attributed to hismentorship. Our discussions are always interesting, whether the topic is science orsomething else entirely. I am forever grateful to him for helping me start my pathdown cosmology and astro-statistics.I would also like to thank Ryley Hill, who has also taught me so much over thelast few years. During this thesis and on our past projects, Ryley has supported megreatly, and even helped me with code when I ran into roadblocks. In addition toall that, he also agreed to be the second reader of this thesis. I am thus very gratefulto him for his help over the years.I would also like to thank Professor Rob Kiefl, and the rest of the PHYS 449class, for his support and feedback during the writing and presentation of this the-sis. Writing an undergraduate thesis is a challenging task, but it was made evereasier alongside everyone in this class.This past pandemic year has been difficult for many, so I am extraordinarilygrateful for all my friends who have kept me sane through it all. Although mytime with them have mostly impeded my productivity, including my progress onthis thesis, I could not have asked for better companionship throughout my collegeyears. I would like to especially thank, in no particular order, Kevin, Jordan, Alex,Tiger, Charles, Ryan, Parker and Sam for the bits of fun that you all brought meduring this challenging year.Lastly, I would like to thank my parents and the rest of my family, for theirxiconstant, unwavering support of all my endeavours. They always make me feel athome and welcome, even during the most challenging times. There is so much tosay to them, but here I’ll just condense it to one sentence: Thank you, for every-thing.xiiChapter 1IntroductionIf you torture the data enough, nature will always confess.— Ronald Coase1.1 MotivationRecently, a paper was published in Nature reporting the presence of phosphine gasin the cloud decks of Venus [5], a claim which, if true, could mean that signs oflife have been detected on the planet. This astonishing discovery could have hugeimplications for the future of solar system science and space exploration, and itwas given widespread media coverage. However, further re-analysis of the givendata by other groups revealed what may be flaws in the original analysis, and castinto doubt the results by Greaves et al. [5] [10, 12].One problem with the initial analysis lies in the method of baseline subtractionemployed by the original authors. Spectroscopic radio signals are often contam-inated with large baseline artifacts. The commonly accepted practice of dealingwith them involves subtracting a polynomial baseline fit before searching for linefeatures in the baseline-free data.In the case of the claimed phosphine line detection, the authors utilised a 12th-order polynomial to remove the baseline, this being chosen purely empirically.While such a high-order polynomial can indeed remove the measurement artifacts,it could also overfit the data, and add unintended artefacts to the output signal.1Figure 1.1: Supplementary Figure 4 from Greaves et al. [5]. The left panelshows the raw signal of different regions of the disk of Venus takenfrom ALMA, as well as their baseline fits (dotted lines). The right panelshows the signals after baseline removal. Observe that although the re-moval of the baseline did flatten the originally artifact-dominated signal,the baseline removal process also contributed significantly to the detec-tion of the absorption line in the blue signal. The authors used the bluesignal to make the case for phosphine detection, but a portion of thatsignal comes from the choice of baseline fitFigure 1.1, taken from Greaves et al. [5], clearly shows that the “removal” of abaseline has actually increased the amplitude of the original signal. This leads usto ask if the perceived phosphine line detection is only due to the data processing,specifically the choice of baseline to fit, rather than being a part of the originalsignal.All of this raises a more general problem. How does one systematically fit abaseline to a radio spectrum, while maintaining confidence that the true astrophys-ical signal will be left unaltered? Or, at least, how does one assess the uncertaintieson such a data analysis process? Rather than considering models by “trial and er-2ror”, as in the current papers, I propose a more systematic Bayesian approach tothis problem, applying Bayesian inference models for hypothesis testing of possi-ble line detection in the presence of unknown baselines.A baseline fitting method based on Bayesian inference would not only be use-ful for the specific case of detecting absorption lines on Venus, but could haveapplications in a broad range of other astronomical contexts. If made availableas an easy-to-use package, Bayesian line detection methods could make it mucheasier for astronomers to conclusively extract signals from noisy data with varyingbackgound, and it could help future scientists avoid the potential mistakes whichhave cast this latest Venus discovery into doubt.1.2 Purpose and OutlineThe goal of this project is to develop such a Bayesian signal-extraction and baseline-fitting methodology. We will apply our more rigourous Bayesian method to a re-analysis of the Venus observational data utilised by Greaves et al. [5]. We willconsider a systematic approach to the question of line detection, and we will utilisea Bayesian statistical analysis technique to consider and re-analyze the originaldata, in order to determine the possible existence of an absorption line.In Chapter 2, we will discuss the relevant statistical theory for Bayesian marginal-ization of data. Chapter 3 outlines the analysis methods used by Greaves et al. [5]on the James Clerk Maxwell Telescope (JCMT) data, and improves upon them withour own Bayesian methodology. Chapter 4 discusses the methods and results of ourBayesian signal extraction approach when applied to the higher spatial-resolutiondata from the Atacama Large Millimeter Array (ALMA). Chapter 5 will discussthe potential to generalize our approach to other applications in astronomy, andChapter 6 will conclude by summing up our findings and the potential for futurework.TODO: general references for JCMT and ALMA3Chapter 2TheoryThere are three types of lies – lies, damn lies, and statistics.”— Benjamin DisraeliIn this chapter, we will explore the statistical theory behind the detection ofabsorption lines in an astronomical spectrum. We will specifically consider a for-mal, Bayesian approach to the problem of maximization and marginalization of theprobability.2.1 Introduction and DefinitionsFirst recall the Bayes Theorem:P (H | E) = P (E | H) · P (H)P (E), (2.1)where P (H | E) is the posterior distribution given the hypothesis H and evidenceE. The posterior is obtained via the likelihood, P (E|H)P (E) , which depends on thecurrent evidence, and the prior P (H), defined as the distribution of the hypoth-esis based exclusively on past information. The goal of this section, and of theproject as a whole, is to consider a formulation of the probability based on Bayes’theorem. We will utilise Bayesian statistics to systematically derive a method fordetermining the existence of features (i.e. absorption lines) in baseline signals.4In general terms, the problem we face is simple: we have a set of data D, andtwo sets of models, H1 and H2, and our goal is to find which one of these two setsof models represents a better fit on the data.We first start off with a few definitions. Let I be defined as the prior infor-mation, M be defined as some model for the data, and D be defined as the datathemselves. We consider H1 to be the hypothesis (or model) that there is only abaseline and H2 to be the hypothesis that there is a baseline and an absorption line.Furthermore, since we are considering polynomials as the baseline models in ourapplication to absorption lines on Venus, we consider n to imply a model of a n-th degree polynomial, with n + 1 free parameters. We also consider ~w or ~wn torepresent a vector of size n + 1 (n only specified if we are dealing with multiplevalues) which contains the coefficients of the polynomial model. The absorptionline model H2 also includes another component to model the line itself, but thiswill be further discussed in Section 2.6.Bayesian model comparison then directs our overall goal to be the computationof the odds ratio,O12 =P (H1|D, I)P (H2|D, I) =P (H1|I)P (H2|I)P (D|H1, I)P (D|H2, I) , (2.2)to determine the ratio of probabilities of one model being better than the other. Thenormalization constant (the denominator of Equation 2.1 conveniently drops outhere, so the only relevant component is the numerator of Equation 2.1.We can treat the prior component, P (H1|I)P (H2|I) , as unity, meaning that we haveno innate preference of one model over another. This could change if we wereto expect (or not expect) a line at the position, but it is hard to quantify how totake into account such prior information. Furthermore, we note that although thisgeneral prior is considered unity, there are additional priors, (such as the one in thenext Equation) which may be utilised and will be discussed later in this work. Withregards to the odds ratio, however, we focus on the Bayesian likelihood component,given simply by P (D|H1,I)P (D|H2,I) .To calculate this properly within the Bayesian context, we consider the compu-tation of P (D|M, I), where M is a generic model representing one of H1 or H2.We can first marginalize each model over n, which is the number of parameters in5the polynomial:P (D|M, I) =∑nP (n|M, I)P (D|n,M, I). (2.3)Here P (n|M, I) is the “prior probability” we place on n, the polynomial order.This is the term we consider for limiting the complexity of our model. At the sametime P (D|n,M, I) represents the “likelihood function”, which can be calculatedusing multiple different methods, as discussed in the next section. In later sec-tions we will discuss the viability of Equation 2.3, and some issues that arise whenutilising this equation.2.2 Bayesian Derivation for Maximum and MarginalLikelihoodsThere are two methods of computingP (D|n,M, I) over the parameters ~wn, namelymaximization and marginalization. In this section we will describe how to computefor the two methods, and discuss the differences and similarities between them.2.2.1 Derivation for the maximum likelihoodLet ~w be a vector of size n + 1, representing the polynomial coefficients for thebaseline. In a maximum likelihood scenario, we are looking for the solution ~ˆwMLEwhich successfully maximizes P (D|~w, n,M, I), with ~ˆwMLE being defined as~ˆwMLE = argmax~w{P (D|~w, n,M, I)}, (2.4)WhereP (D|n,M, I) = max~w{P (D|~w, n,M, I)}. (2.5)The successful ~ˆw maximizes the likelihood function, and as a result this type ofestimation is known as a Maximum likelihood estimate (MLE).MLE can be further expanded to the case of Maximum a posteriori (MAP)estimation, which considers an additional prior, through the use of Bayes’ theorem.In the case of MAP estimation, we instead find P (~w|D,n,M, I), which through6Bayes’ theorem can be seen asP (~w|D,n,M, I) = P (D|~w, n,M, I)P (~w|n,M, I)P (D|n,M, I) . (2.6)Note that the additional parameters n,M, and I are carried over from the earlierderivation for continuity. We see that since ~w is implicitly dependent on n, we canalso consider P (~w|n,M, I) to be the same as P (~w, n|M, I), and in special casesof this prior, it can be considered the same as P (n|M, I) in Equation 2.3.In the case of MAP then, the maximal solution ~ˆwMAP is given by~ˆwMAP = argmax~w{P (~w|D,n,M, I)}= argmax~w{P (D|~w, n,M, I)P (~w|n,M, I)P (D|n,M, I)} (2.7)Notice that the denominator does not depend on ~w and can therefore be broughtout of the argmax function.In a slightly loose manner, we then see that because we can refer toP (~w, n|M, I)as similar to P (n|M, I), then the numerator is exactly what we are looking for inEquation 2.3. Then we can consider P (n|M, I)P (D|n,M, I) as:P (n|M, I)P (D|n,M, I) = max~wn{P (D|~wn, n,M, I)P (~wn|n,M, I)}, (2.8)using the approximate definition P (~wn|n,M, I) = P (~wn, n|M, I).In this formulation, we therefore calculate the model probability asP (D|M, I) =∑n(max~wn{P (D|~wn, n,M, I)P (~wn|n,M, I)}). (2.9)We see that each term in the sum is a separate minimization, and hence each termcan be found independently using log-likelihood minimizers.Example: The Aikake information criterionIn order to further develop this MAP concept, note that this formulation of modelcomparison given by the odds ratio in Equation 2.2 and Equation 2.9 can be re-7duced using specific existing forms of model comparison. In particular, the Akaikeinformation criterion (AIC), is an often used reduction of this MAP estimationmethod.The AIC is given byA = 2n− 2 ln(L( ~ˆw)), (2.10)With L( ~ˆw) = P (D|~w, n,M, I) by definition. The model selection based on theAIC then states that the minimal A yields the best model.Observe that the AIC is essentially minimizing the Negative log-likelihood(NLL) of Equation 2.8, with an L0 prior:P (~w|n,M, I) = exp(−n) = exp(−||~w||0). (2.11)We note that the L0 prior of a vector returns simply the number of dimensions ofthat vector, such that ||~wn||0 = n. We then see that−2 log (P (D|~w, n,M, I)P (~w|n,M, I)) = 2n− 2 log(L(~w)) (2.12)and the maximum of Equation 2.8 then yields the minimum of Equation 2.12.The next step of the AIC is a bit different in our formulation. For an AIC prob-lem we consider the model having minimum AIC value to find the best parameternˆ and the best probability P (D|M, I) = P (D|nˆ,M, I) :nˆ = argminn{2n− 2 ln(P (D|~wn, n,M, I))};P (D|M, I) = minn{2n− 2 ln(P (D|~wn, n,M, I))}.(2.13)This is different than our formulation for P (D|M, I) given in Equation 2.3. Ineffect, AIC chooses the best model with two layers of maximization of parame-ters. The next section will discuss using a marginalization of parameters instead ofmaximization for the purposes of model selection.2.2.2 Derivation and Discussion for the marginal likelihoodTo consider the marginal likelihood solution, rather than considering the maximalprobability, we treat ~w as nuisance parameters and instead integrates over them to8find the marginal solution.The marginal likelihood in our case is given byP (D|n,M, I) =∫d~wP (~w|n,M, I)P (D|~w, n,M, I) = L(n). (2.14)Such a solution still relies on the priorP (~w|n,M, I) and likelihoodP (D|~w, n,M, I),but instead of finding the maximal parameters ~ˆw, we integrate over all possiblecombinations of ~w to calculate the probability.It is worth noting that finding the MLE or MAP solutions to the line-fittingproblem are easier compared to marginal likelihoods, since this is a problem ofoptimization rather than integration. In high-dimensional parameter spaces, an op-timization problem can be solved with standard optimization methods such as leastsquares or stochastic gradient descent; however, an integration problem requireseither an analytic integral, or more likely Monte Carlo sampling to integrate andcompute. We thus see that the number of samples must also increase along with thedimensionality of the integral. As a result, a marginalization approach involvingnumerical integration will take longer to achieve.In practice, for the problem of sampling the parameters, sampling over all ofthe parameter space too hard to achieve, due to the large sample space and theunhelpful nature of sampling states that are not relevant to discussion (i.e. it isunnecessary and unhelpful to sample baseline polynomials which are vastly differ-ent from the signal). Instead of marginalizing over the whole sample space then,we can marginalize over a prior distribution centred on the best-fit parameters ~ˆw.Although a prior distribution is not a perfect approximation, it is an accurate repre-sentation of our confidence in the parameters given by the MLE or MAP optimiza-tions. In most cases, a Gaussian confidence prior can be used, and we will furtherdiscuss this in Section 2.4.One potential problem of marginalizing near the best-fit ~ˆw parameters is thepossibility of having multiple similar local maxima in the ~w parameter space. How-ever, we see that for the practical problem of polynomial line fitting, this is not animportant issue. In particular, the best-fit n-th order polynomial to any particularsignal is unique, with the stipulation that the n-th coefficient, an 6= 0. We see thatfor each order n polynomial, when we marginalize we are fixing am = 0 for all9m > n, while allowing ak to marginalize when k ≤ n. When marginalized overthe k dimensional space of a Gaussian centred at the maximal point, we do ideallyobtain the marginal likelihood we are looking for.To fully marginalize over Equation 2.14 then, we add the aforementioned Gaus-sian prior and sample from a distribution centered on ~ˆw with a Monte Carlo inte-gration method. We further note that this Monte Carlo integration is harder toaccomplish as the polynomial order n increases, since the size of the parameterspace to be sampled increases exponentially with the dimension of ~w. This prob-lem is known as the “curse of dimensionality” for sampling problems. Luckily,as we will see in later sections and in the practical applications of the Bayesianmarginals, high-order polynomial functions are unnecessarily complex and nearlyalways yield near zero probabilities. We will thus find that it will be possible to cutoff the polynomial order at a certain point.2.3 Discussion of Likelihood FunctionsRecall that the likelihood function is the term P (D|~w, n,M, I) in Equations 2.4and 2.14. This term was only roughly defined in the previous sections, but here weshall specify the likelihood definition and its applications in our work.Let f(xi|n, ~w), simplified to fi, represent the model’s (with the model givenby n and ~w) prediction at the i-th data point in D (where D has a total of N datapoints). The quantity xi is the assumed x axis of the data, which in our case isvelocity or frequency. We can further formalize this by statingDi = f(xi|n, ~w) + ei, (2.15)where ei is the presumed error. For simplicity’s sake, we will from now on definefi = f(xi|n, ~w) for the rest of this section.2.3.1 Gaussian LikelihoodsIn the most common case, ei is assumed to be Independent and identically dis-tributed (IID), following a Gaussian (normal) distribution centered at µ = 0. Fol-10lowing this assumption, the probability of ei being any value is given byP (ei) =1σi√2piexp(− e2i2σ2i)=1σi√2piexp(−(Di − fi)22σ2i), (2.16)where σi is the variance of the normal. Because each ei is supposedly IID, we haveσi = σ for all i, and we can set the σ parameter based on the standard deviationof Di from fi, to be used as a normalizing constant for the rough magnitude of thedeviation.Because each ei is independent, the combined Gaussian likelihood probabilityis then given by the product of each P (ei):P (D|~w, n,M, I) =N∏i1σ√2piexp(−(Di − fi)22σ2)=(1σ√2pi)Nexp(N∑i−(Di − fi)22σ2).(2.17)The probability distribution given by Equation 2.17 can be easily maximizedto obtain the optimal MLE parameters ~ˆwMLE . This is achieved by minimizing theNLL of the likelihood function. As the likelihood is an exponential, the NLL issimply− logP (D|~w, n,M, I) =N∑i(Di − fi)22σ2+N log σ√2pi. (2.18)We see the second term is constant, wheras the quantity σ in the first term is in-dependent of i and thus can be factored out with no effect on the minimum. As aresult, minimizing the NLL of the Gaussian likelihood is equivalent to minimizingthe minimizer:S =N∑i(Di − fi)2. (2.19)This is simply the least-squares minimizer, which can be minimized using standardoptimization methods found in most optimization packages. As a matter of fact, the11default line-fitting algorithms in scipy and other packages work by minimizingthe least squares of the data. As a result, we see that it is quite easy to compute themaximum likelihood parameters ~ˆwMLE .We further see that the least-squares minimizer in Equation 2.19 is also propor-tional to the Mean squared error (MSE):M =1NN∑i(Di − fi)2 (2.20)Which is another commonly used model evaluation method. In later sections, wewill be using the MSE as well as the likelihood to calculate the uncertainties fordifferent models.The convenience of the Gaussian likelihood and its simple reduction to theleast-squares problem shows us the benefits of utilising Gaussian likelihood in ourmethodology. In the case of marginalization, however, we see that we would stillneed to marginalize over the probabilities given in Equation 2.17, rather than usingthe MSE errors.2.4 The Gaussian Sampling PriorIn Bayesian statistical inference, the prior probability represents the prior knowl-edge we have before current evidence is taken into account. We may add priorsto represent prior information of the model, or to restrict the complexity or rangeof the model in question. In the case of our specific polynomial fitting problem, anumber of priors are utilised at different stages to represent the known informationfor these problems. In this section, we will discuss the relevant priors used.The first prior we utilise refers to a point discussed briefly in Section 2.2.2,which is the Gaussian sampling prior. In computing the marginal likelihood givenby Equation 2.14, it is challenging and unnecessary to integrate over the entirety ofthe n-dimensional parameter space, since the whole sample space is not relevantto the problem. Instead, we compute the Monte Carlo integral over Equation 2.14by sampling over a Gaussian distribution. This is exactly equivalent to setting the12prior in Equation 2.14 toP (~w|n,M, I) = N ( ~ˆw,Σ), (2.21)which is a multivariate normal distribution with multivariate mean ~ˆw and covari-ance matrix Σ.The quantity Σ in Equation 2.21 is the covariance matrix of the parameters,which we approximate as diagonal, so as to make the parameters ~wj independent ofthe others. To estimate Σ, we use the the covariance matrix Σ˜, generated from theleast-squares optimization to find ~ˆw, and set all non-diagonal elements to zero (wechose to set the diagonals to zero because it is easier and more efficient to considerindependent parameters than dependent parameters). Lastly, in order to expand theregion of marginalization beyond the well-constrained covariances, we multiplythe resultant diagnonal covariance matrix by a factor of 10, so as to marginalizeover a larger area.2.5 Example: Toy ModelIn order to further develop the theory, we apply these Bayesian methods to a toymodel in order to better test the concepts of marginal likelihoods and the oddsratios. We generate two sets of artificial baseline data, one quadratic and one cubic,both with additional Gaussian noise. We then attempt to fit 2nd-, 3rd- and 4th-orderpolynomial functions to these generated datasets, using the methods described sofar.To create the fake data, we first randomly generate one cubic and one quadraticpolynomial with random coefficients, to which we add independent Gaussian noiseto simulate noisy observational data. We then first optimize to find the best-fitparameters on each polynomial, which equates to a maximum likelihood estimateof the data from our discussions in Section 2.2.Given the best-fit parameters of the simulated data, we next marginalize overthe polynomial coefficients to compute the marginal likelihood as discussed in Sec-tion 2.2.2. As considered in Section 2.4, rather than marginalizing over the entirepolynomial parameter sample space, we instead marginalize over a Gaussian dis-tribution centered on the best-fit line, with the width of the Gaussian calculated13from the least squares covariance.120100806040n=2MSE = 101.09120100806040n=3MSE = 106.4710.0 7.5 5.0 2.5 0.0 2.5 5.0 7.5 10.012010080604020 n=4MSE = 126.13(a) Quadratic real baseline1008060402002040 n=2MSE = 176.781008060402002040 n=3MSE = 112.1910.0 7.5 5.0 2.5 0.0 2.5 5.0 7.5 10.010075502502550 n=4MSE = 130.59(b) Cubic real baselineFigure 2.1: Toy model of best-fit and marginal fitting lines. The blue linesrepresent the generated data, the grey represent the set of marginal base-lines and the yellow represent the best-fit baseline on the generated data.A visualization of this approach is given in Figure 2.1. As expected, we see thatthe higher-order best-fit models are always able to fit the generated data correctlydue to its additional complexity. However there is also a downside to this higherorder complexity. Due to the larger and more varied sample space of the complexmodel, we see that a sampling through the higher order models yields worse fitsbecause the complex models are more likely to have larger variances from thebest-fit line. This is further confirmed by looking at the averaged MSE for themarginalized fits, and we see that although the best fit of the complex models yieldacceptable results, the marginalized fits favour the simplest model that accuratelyrepresents the given data.We can further confirm this by considering the associated marginal likelihoodsof each model, which can be calculated using the least-squares minimizer. This isconfirmed when we consider the mean squared errors and the associated probabil-14ities from the marginalisation. Recall that from Equation 2.17 and Equation 2.20,the probability of a given function isP (D|~w, n,M, I) ∝ exp(N∑i−(Di − fi)22σ2). (2.22)Marginalizing as in Equation 2.14, we obtainP (D|n,M, I) ∝∫P (~w|n,M, I)P (D|~w, n,M, I) d~w=1kk∑j=1exp(N∑i−(Di − f(xi|n, ~wj))22σ2).(2.23)Again, n refers here to the polynomial order, M to the model (in this case, a poly-nomial), I refers to some arbitrary prior information, and k is the number of sam-ples taken during the Bayesian sampling process. The prior term is omitted fromthe sum, since the prior is realized through the sampling of ~wj from a Gaussianprior distribution N ( ~ˆw,Σ), as discussed in Section 2.4.The odds ratio is therefore given byOn1n2 =P (D|n1, I)P (D|n2, I) ∝∑ki=1 exp(∑Ni−(Di−f(xi|n1, ~wn1k ))22σ2)∑ki=1 exp(∑Ni−(Di−f(xi|n2, ~wn2k ))22σ2) , (2.24)where n1 represents the n1-th order polynomial while n2 represents the n2-th or-der polynomial, and σ again represents the deviation of the data Di from fi, asdescribed in the previous section.Given the MLE results shown in Figure 2.1, we see that the correct model hadthe lowest MLE calculation. This translates to a greater probability through the ex-ponential, which gives a favourable odds ratio via the calculation of Equation 2.24.We were able to further confirm the validity of the correct baseline by generatinga set of 1000 cubic (n = 3) polynomial signals, finding an odds ratio comparisonwith O23 =P (D|n=2)P (D|n=3) < 1 for all sample signals. We also found the same to betrue for O43, with O43 < 1 for all sample signals, which demonstrates the abilityof this method to constrain complexity of the model to some extent.15Going back to the numerical analysis shown in Figure 2.1, we observe exactlythe same phenomenon during line fitting. A model with insufficient complexity isshown to be a terrible fit overal, as is the case with the top plot in Figure 2.1 (b).However, a model with too much complexity will tend to deviate from the best-fitdramatically when marginalized, since the unnecessary complexity adds additionalparameters to the marginalization algorithm. This is observed in the bottom twopanels on Figure 2.1. Furthermore, a lower mean squared error would indicate ahigher probability from Equation 2.17, which weighs in favour through the oddsratio.This toy model example demonstrates the effectiveness of a marginalized Bayesianline-fitting solution as compared to a simpler MLE method. In a MLE formulation,we would simply judge models based on the mean squared error of the best-fit line,ignoring the complexity of the model. However, such a solution is biased towardsmore complex models, since such models often have more free parameters andthus are able to fit over a much larger space. These models also run the risk ofover-fitting the data, and we see that such a system would always favour the morecomplex model over the simpler model.2.6 The Addition of the Line ComponentIn the previous sections, we have dealt with a generalized Bayesian treatment ofthe odds ratio, considering the model M as only an arbitrary polynomial function.However, the point of this project is to model the additional absorption line presentwithin the data alongside the polynomial baseline.To model an absorption line, we consider a Gaussian component at a specifiedlocation ν:g(x|A, ν, σ2L) = A exp(−(x− ν)22σ2L). (2.25)The σL variance parameter defined should not be confused with that defined inEquation 2.16, and is distinguished by the subscript L for line. x in this equationrefers to a continuous version of the xi from Section 3. We see that in this lineapproximation, we have three free parameters (A, σL, and ν). In an AIC model16this can be described by adding 3 to n in the exponential prior in Equation 2.9.This will be fitted along with the baseline fit.To conduct a marginal likelihood analysis, we consider a fully marginalizedmodel,P (D|n,H2, I) =∫∫∫d~w dσL dAP (~w,A, σL|n,M, I)P (D|~w, n,A, σL, H2, I),(2.26)which represents a fully marginalized probability. In words, this considers all pos-sible polynomials of the n-th order along with all possible line amplitudes andwidths, in order to fully consider the possibility of having a line at the specifiedlocation.Although Equation 2.26 is easy to write down, computing it practically is quitechallenging. We note that this is even more difficult than the problem discussedin Section 2.2.2, especially since we do not have a good approximation for thelikelihood P (D|~w, n,A, σL, H2, I). Furthermore, in this case we are even lesssure of possible forms for the prior P (~w,A, σL|n,M, I).One proposed method of calculating Equation 2.26 involves generating ran-dom values of ~w,A, σL within a certain range (or distribution) of maximized val-ues ~ˆw, Aˆ, σˆL, then evaluating the likelihood for each of those randomly sampledvariables. As in the case with marginalizing over polynomials, such a problem maysuffer under the curse of dimensionality, which may lead to incomplete samplingover the whole sample space in higher dimensions.2.6.1 Prior Restrictions During Line FittingTo accommodate our prior knowledge on the line location and width, we includeanother prior during the line-fitting process. In order to ensure that the line detectedis indeed the absorption line we are looking for and not some other line, we mustrestrict the range of the line parameter ν. Similarly, to ensure that the width of thedetected line is not diluted to unphysical scales, we further include restrictions onthe line width parameter σL. Lastly, the amplitude of the line A must be restrictedsuch that A < 0 to represent an absorption line. These restrictions are included asa uniform prior on the parameters ν, σL and A during the line-fitting process, suchthat any fits outside the region are rejected.17Chapter 3Bayesian Line Fitting Applied toObservational Data from theJames Clerk Maxwell TelescopeBeware of bugs in the above code; I have only proved it correct, nottried it. — Donald E. KnuthIn this chapter, we will discuss the data analysis conducted on observationaldata from the James Clerk Maxwell Telescope (JCMT). The James Clerk MaxwellTelescope is a submillimetre-wavelength radio telescope with a single 15m dish,situated on Mauna Kea, Hawaii.3.1 Traditional JCMT Data AnalysisTo consider the analysis of the data from the James Clerk Maxwell Telescope, wefirst reproduced the data reduction described in Greaves et al. [5]. As given in theiroriginal work, Greaves et al. [5] subtracted three baselines of various amplitudesfrom the observed JCMT dataUsing the publicly available observations archived at the Canadian AstronomyData Centre (CADC)1, we were able to download and work with the original data1https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/en/18used by the Greaves et al. team. The original observations, listing Jane Greaves asthe PI, were taken on several mornings in June 2017. Disregarding some failed andexcessively noise data points, we were left with eight independent observations ofVenus. These observations include observation No. 49 on June 9th, observationsNo. 96 and No. 97 on June 11th, observations No. 97 and No. 100 on June12th, observations No. 76 and No. 78 on June 14th, and lastly observation No. 49on June 16th. Following Greaves et al., we omitted observation No. 79 on June10th because of excessive noise. We note that we have omitted a small numberof observations originally utilised in the Greaves et al. paper, due to failed qualitymarkers on the data, as given in the CADC database. However, we stress that ourdata is a significant subset of the observations used by Greaves et al., and it shouldbe noted that we also expect line detection results at the same locations as Greaveset al.Further details of the weather conditions can be found on page 22 of Greaveset al. [5]. After receiving the data, we next computed the variance-weighted av-erage for all datasets across the same set of velocity channels; since baseline sub-traction is linear, there is no reason to conduct the analysis on parts of the dataseparately.Next, we followed the techniques used in Greaves et al. [5] to process andconfigure the data for the possibility of observing a line near the v = 0 km s−1location, referenced to the phosphine line. First, we subtracted a best-fit 4th or-der polynomial baseline from the noisy data. We then followed the techniques inGreaves et al. [5] and applied a 1024-channel wide averaging filter, and subtractedthe remaining quasi-sinusoid signal via the fitting and removal of a 9-th order poly-nomial from the unfiltered data.After the subtraction of these components, Greaves et al. claims the existenceof a further oscillatory signal. We were able to independently verify the existenceof these Fourier modes, and we follow the methodology given by Greaves et al. inbinning the data to 0.55 km s−1 resolution, then finding the two most promoninentFourier modes via a Fourier transform and removing those from the data. Lastly,we bin the data into histograms, and were able to detect a faint line signal near−10km s−1 velocity. We plot the three removed baselines, as well as the final centredsignal in Figure 3.1.19100 50 0 50 100 1500.50.00.51.01.5100 50 0 50 100 1500.40.20.00.20.40.6100 50 0 50 100 150Velocity [km/s]0.100.050.000.050.1040 20 0 20 40Velocity [km/s]0.030.020.010.000.010.02Figure 3.1: Results of data analysis of the JCMT observations following thetechniques utilised in Greaves et al. [5]. The left three panels representthe three-step baseline removal approach used by the original authors toconfirm the signal, with the cyan line representing the background sig-nal, while the darker line represents the baseline fitted and removed ineach step. These left panels can be compared to Supplementary Figure1 in Greaves et al. [5] for clarity. The right panel represents the final flat-tened signal after the previously described data processing steps, binnedinto a histogram to mimic Figure 1 of Greaves et al. [5]. Although aweak absorption line is present at around −8 km s−1, this detectiondoes not seem as strong as the one shown in the original paper.Although we were able to detect a faint absorption line signal within the claimedrange, our signal was not as perfectly centered on v = 0 km s−1 as was the casein Greaves et al. [5]. It seems reasonable to consider that the numerous baselineremoval processes carried out by Greaves et al. [5] seem subjective and arbitrary,with the multi-step process is possibly contributing to the detected signal given inFigure 3.1.20Although the absorption line shown in Figure 3.1 appears to be weakly present,there have to be doubts at the existence of the line, given the somewhat arbitrarybaseline removal process. We also note that given that the line is centered at around−8 km s−1, this line may not be the absorption of the PH3 emission, but haveanother source.3.2 Bayesian Analysis of JCMT DataHaving established our doubts with the methods used by Greaves et al. [5], wenow consider a Bayesian analysis of baseline and absorption line fitting, using themethods given in Chapter 2.Following the combination of the JCMT observations as discussed in Sec-tion 3.1, we next applied the methodology discussed in Section 2.6 to the data.After flattening by a test 4th-order baseline, we fit a Gaussian to the data to obtainan approximate line location from the flattened data.Using this flattened result, we then consider the two models as discussed inChapter 2: one with no absorption line and just a baseline (H1); and one includingan absorption line based on the Gaussian fit (H2). For each order of polynomial,we produce two best-fit models, one where the polynomial itself is the best fit, andone where the polynomial with the Gaussian line feature is considered the best fit.We conduct this fit up to polynomial order 10, which we reason is an upper limitfor the complexity of a polynomial needed to fit the JCMT spectral baseline.The next step involves Bayesian marginalization, and is similar to the processdescribed in Section 2.5. In order to conduct a fit, we sample from a Gaussian dis-tribution centred on the best-fit polynomial parameters, and compute the MSE foreach possible fit. This sampling represents a practical application of Equation 2.26,as discussed in Section 2.6. Since it is impossible to fully integrate over the en-tire sample space, we instead conduct numerical Monte Carlo integration over adistribution near the best-fit.A visualization of the marginalization process is given in Figure 3.2. We showthe fits of the 2nd order, 4th order and 6th order polynomials to the combinedJCMT Venus observations. From visual inspection we can reason that the 2nd-order polynomial underfits the data and the 4th-order polynomial fits the data well,210.50.00.51.01.5n=20.50.00.51.01.5n=4100 50 0 50 100 150Velocity [km/s]10123n=6100 50 0 50 100 150Velocity [km/s]Figure 3.2: Plots of 2nd-order, 4th-order and 6th-order polynomial fits on thecombined JCMT Venus observations. The left panels represent the base-line fits with no absorption line, while the right panels represent the fitswith an added Gaussian modelling line absorption, using the methodsdescribed in Section 2.6. The navy line represents the data, the greylines represent a sample of marginalized baselines and the yellow linerepresents the best-fit in each panel.22even with marginalization. We also see that the 6th-order polynomial represents anexample of a much more complex model, which fits well initially but fails whenmarginalized over the Gaussian sample space.In order to account for the curse-of-dimensionality problem during the sam-pling process, we specifically sampled more finely for the higher-order MonteCarlo integrations, with a cap of 200,000 samples for the highest order polyno-mials. Though ideally even more sampling would produce better results, such asampling would be too computationally intensive, and as we will see in Figure 3.3,the higher order polynomials have a relative probability that trends to zero in anycase.We further make the decision to cut off polynomial orders at n = 10. From Fig-ure 3.3, we see that in the case of higher-order polynomials models, the probabilityfit tends to zero as the massively complex models yield low marginal likelihoods,which do not contribute much to the odds ratio. Having a cutoff like this is effec-tively putting a prior of P (n > 10) = 0 in Equation 2.3; however, such high-ordermodels do not contribute much anyway, so that the difference is negligible. Wefurther note that it would be possible to have an informed prior by either analyzingmore of the JCMT data on other sources to understand the instrument behavior,or by understanding more of the physical processes of the cause of baselines insignals.The resulting set of probabilities, calculated following Equation 2.23, is shownin the top panel of Figure 3.3. We can easily see the peak in relative probability atn = 4, which indicates that a 4th-order baseline is the optimal complexity. We fur-ther see that generally, the model including the absorption line greatly outperformsthe model without the emission line.We next sum up the probabilities from the top panel of Figure 3.3, followingEquation 2.3, using a uniform prior P (n|M, I) = 1 on the polynomial order. Theresulting odds ratio, calculated as the ratio of normalised probabilities given byEquation 2.2, is small, with a final calculated value of O12 = 0.030. This suggeststhat the odds ratio favours the model that contains an absorption line.The relative absorption line model is still true if we were to add a prior pe-nalizing the additional complexity of this model, which we can see in the bottompanel of Figure 3.3. Here we added am L0 prior, the same prior previously shown230.00.20.40.60.81.0Relative ProbabilitiesUniform Prior O12=0.030No LineLine0 2 4 6 8 10Polynomial Order (n)0.000000.000050.000100.000150.000200.000250.000300.00035Relative Probabilitiese ||w||0 Prior O12=0.254Figure 3.3: Computed (unnormalised) probabilities for the Bayesianmarginal best-fit analysis for the line and no line models, for thecases of a uniform prior and an added L0 prior. Note that given theun-normalized probabilities, it is only relevant here to compare theprobabilities to each other, and to disregard the y-scaling entirely.24in our discussion of the AIC. As previously discussed, this L0 prior takes the formof P (n|M, I) = exp−||~w||0, which penalizes additional parameters added to themodel. In the case of the baseline-only and baseline-plus-line models on the JCMTsignal, we see that the baseline-plus-line has three more free parameters, and as aresult is penalized more for the additional complexity of the model. Even with thispenalty though, we still see that the final odds ratio, summed up by Equations 2.3and 2.2, is still less than 1. As seen in the bottom panel of Figure 3.3, the odds ratioin this case is O12 = 0.254. This result is slightly less favourable of the baseline-plus-line model, but still suggests the existence of an absorption line at the givenlocation.3.3 Discussion of JCMT Analysis ResultsThe results of our Bayesian re-analysis seem to indicate that an absorption linefeature does indeed exist close to v = 0 km s−1 in the JCMT observations ofVenus. However, the line feature we detected does not seem to centre exactly atv = 0 km s−1, with the best-fit mean parameter being closer to v = −10 km s−1.The location of this line is consistent with the line we discovered in Section 3.1,in which we reproduced the work of Greaves et al. [5]. The detected feature isalso quite wide, with a best-fit 1-σ width of 4.9 km s−1. Again, this is relativelyconsistent with the wide line feature we detected in Figure 3.1.The reason for this discrepancy is unknown. It is possible that there is a calibra-tion difference between our JCMT data, compared to the calibrations in Greaveset al. [5]. It is also possible that the detected line feature is an entirely differentsignal. The extra width of the line, as well as the negative velocity location doesnot point to the presence of phosphine, and subsequent reanalyses have shownthat the location of this feature can also be interpreted an absorption line of meso-sphereic SO2 distributed across the Venusian atmosphere, which has an absorptionfrequency of 267.54 GHz [8, 12].In order to fully confirm or deny the existence of phosphine on Venus, fur-ther analysis is still required. In particular, it would be ideal to consider a higher-resolution set of observations of Venus, as the single-dish JCMT observations arequite noisy and do not enable the evaluation of signals from parts of Venus’ disk.25Fortunately, Greaves et al. [5] have already conducted follow-up observations us-ing the Atacama Large Millimeter Array, and in the next chapter we will discussthe data analysis of this set of higher-resolution data.26Chapter 4Bayesian Line Fitting Applied toObservational Data from theAtacama Large Millimeter ArrayIf you thought that science was certain - well, that is just an error onyour part. — Richard P. FeynmanIn this chapter, we will discuss and present a re-analysis of the observationaldata taken for Greaves et al. [5] by the Atacama Large Millimeter Array (ALMA).We will discuss these resolved data of the ALMA observations, and further dis-cuss the non-detection of ALMA absorption lines following re-calibration of theoriginal data used in Greaves et al. [5].4.1 Traditional ALMA Data AnalysisThe data from ALMA were based on an observational proposal by Greaves et al.,with proposal code 2018.A.00023.S, and made publicly available on the ALMAScience Archive 1. ALMA observed Venus over two observing sessions on March5th, 2019. We utilised the reduced ALMA data, taken from the online archive, for1https://almascience.nrao.edu/asax/27our analysis in this work.Unlike the single-pixel observations from JCMT, the ALMA observations areof much higher resolution, and resolves the disk of Venus into many spatial ele-ments. ALMA was able to resolve Venus into a disk (although the interferometricnature of ALMA means that the largest scale spatial modes are not observed). Theresolved disk of Venus measured 15.36′′ in diameter on the night of observation,and the resolved area means that it is possible to conduct spectral analyses on sepa-rate regions of the planet. Although we only consider the spectrum of the averageddisk in this work, we note that Greaves et al. [5], as well as other groups [1], haveconducted independent analyses of partial regions of Venus for detecting phosphinegas.The continuum spectrum of Venus as observed by ALMA, along with its sur-rounding sky, is shown in Figure 4.1. In addition to this averaged continuum image,the ALMA data contain both narrowband (high resolution) and wideband (low res-olution) spectral signals of the resolved Venus disk. These spectral signals havealready had the continuum removed during pre-processing, and thus have a meanvalue close to zero.We will be analysing the ALMA narrowband data, which was measured overa range of frequencies centred on the phosphine absorption line at 266.94 GHz(which, by definition, corresponds to a velocity of 0 km s−1 in velocity space).These narrowband data are what yielded the original detection in Greaves et al.[5], and were the major focus of further analysis by both the Greaves et al. teamand other groups.The disk-averaged narrow-band signal is shown in the top panel of Figure 4.2.As in Greaves et al. [5], we overlay a 12th-order polynomial fit on top of the orig-inal signal. A preliminary analysis of the narrowband signal and the 12th-orderbaseline does not show the presence of any absorption line, and this suspicion isfurther confirmed when looking at the bottom panel of Figure 4.2, which displaysthe residuals of the data. Instead, we note that at the supposed location of theabsorption line, the residual is actually above the baseline fit.28306.498626 306.496636 306.494646 306.492656 306.490666Right Ascension-18.6572-18.6552-18.6532-18.6512-18.6492DeclinationFigure 4.1: Continuum image of Venus as observed by ALMA, averaged overall frequency channels. The angular diameter of Venus at the centre ofthe disk measures 15.36′′.4.2 Bayesian Analysis of ALMA DataHaving seen that a preliminary analysis of the ALMA data does not yield any ab-sorption lines, we now perform a more systematic analysis using Bayesian method-ology.Much of the line fitting methodology use for the ALMA analysis is the sameas the Bayesian methods described in the last chapter, and Figure 4.3 shows avisualization of the the 2nd, 3rd and 12th-order polynomial fits, as well as theassociated marginalization. In particular, we see that in the case of the 12th-order290.00200.00150.00100.00050.00000.00050.00100.0015ALMA Narrowband Signal12th Order Baseline60 40 20 0 20 40 60Velocity [km/s]0.00040.00030.00020.00010.00000.00010.00020.0003Signal BaselineFigure 4.2: Disk-averaged ALMA narrowband signal centred on the phos-phine absorption line. The top panel shows the original signal as wellas a 12th-order polynomial fit on top, while the bottom panel shows theresiduals after subtraction of the 12th-order baseline from the originalsignal.300.00250.00200.00150.00100.00050.00000.00050.00100.0015 n=20.00250.00200.00150.00100.00050.00000.00050.00100.0015 n=360 40 20 0 20 40 60Velocity [km/s]0.00250.00200.00150.00100.00050.00000.00050.00100.0015 n=1260 40 20 0 20 40 60Velocity [km/s]Figure 4.3: Bayesian marginalization over ALMA data. As was the case withFigure 3.2, we sample from a Gaussian distribution centered on the best-fit parameters to approximately marginalize over all polynomial orders.The navy line represents the data, the grey lines represent a sample ofmarginalized baselines, and the yellow line represents the best-fit ineach panel.31polynomial fit, there is no absorption line present, since the polynomial sufficientlyexplains all possible signal features.The un-normalised probabilities are again plotted in Figure 4.4; however, theresults this time are very different. We see from the top panel that the un-normalisedprobabilities are almost equal in the case of the uniform prior, but with O12 =1.123, we see that the odds ratio here slightly favours H1, the baseline-only hy-pothesis.This bias towards the baseline-only hypothesis is further strengthened if weare to apply a prior to penalize the additional complexity of H2. As is the casewith the JCMT analysis, we can apply an L0 prior and penalize every additionalparameter used to describe the model. After the application of this prior, we seefrom the bottom panel of Figure 4.4 that the probabilities now heavily favour theno-line hypothesis H1. This is further confirmed with the odds ratio, which is nowO12 = 24.354, far above the threshold needed to decide in favour of H1.4.3 Discussion of ALMA data analysis resultsThe Bayesian analysis of ALMA data seems to conclusively favour the baseline-only hypothesis, H1. Even if we were to disregard the addition of the L0 prior,we see that the uniform prior model shows that the two hypotheses are roughlyequal in describing the data. Given this fact, and the additional complexity of thebaseline-plus-lineH2 model, conventional statistical wisdom would dictate that weshould be in favour of the simpler H1 model to describe our data.This result is further boosted by the addition of the L0 prior to restrict com-plexity. In this case, we see that the odds ratio has been pushed far beyond thethreshold of O12 = 1, and applying such a prior would conclusively favour the H1baseline-only hypothesis. We further draw comparisons between Figure 4.4 andFigure 3.3, since we observe that in the case of the JCMT signal, the line detectionwas strong enough that even with the application of the same L0 prior to restrictcomplexity, the odds ratio in that case still favoured H2.The conclusion of our Bayesian analysis then is that there is no observable linefeature near v = 0 in the narrowband ALMA observations of Venus. This result isconsistent with the result we found in Section 4.1, where we simply subtracted the320.00.20.40.60.81.0Relative ProbabilitiesUniform Prior O12=1.123No LineLine0 2 4 6 8 10 12 14Polynomial Order (n)0.00000.00250.00500.00750.01000.01250.01500.0175Relative Probabilitiese ||w||0 Prior O12=24.354Figure 4.4: Computed relative probabilities for the Bayesian line analysisconducted on the ALMA observations. As is the case with Figure 3.3,the probabilities are un-normalised and thus the relative probabilitiesgive the true comparison.3312th-order best-fit baseline, following the methods of Greaves et al. [5].The apparent discrepancy between our analysis of the ALMA results and thatfound in the original paper may be due to factors outside of our control. TheALMA observations of Venus were re-calibrated in November 2020, following thepublication of the original paper, due to identifications of errors in the original cal-ibration. Subsequent re-analysis by Akins et al. [1] have shown that the ALMAnarrowband signal were significantly changed by the re-calibration process. Un-fortunately, the original data before re-calibration are no longer accessible throughthe ALMA archive, but we refer to Figure 1 of Akins et al. [1] to show the signifi-cant changes of the signal before and after re-calibration. It is thus possible that theclaimed detection by Greaves et al. [5] was simply an incorrectly calibrated signalartifact in the original ALMA results.After the release of the re-calibrated ALMA data, the authors of the originalstudy also performed a re-analysis of the ALMA results [6], following a similarmethodology as in their original paper, Greaves et al. [6] still claim a phosphinedetection in the observations of Venus, although at a lower strength than originallyclaimed. We note that the authors still utilise the same disputed methodology ofa 12th-order polynomial baseline fit, which we have shown to be suspect in ourwork.We thus see that our conclusion is still in tension with the results from Greaveset al. [6], as we characterize the methodology of using a 12th-order polynomialbaseline as doubtful. However, we note that our final conclusion that there is nophosphine line detection in the Venusian atmosphere is also consistent with a num-ber of follow-up papers [1, 4, 8, 10–12]. These results, conducted by a number ofdifferent research groups, reached the same conclusion as us using a number of dif-ferent approaches. Furthermore, our apparent detection of an absorption line in thelow-resolution JCMT results were also corroborated by Lincowski et al. [8], whoclaim that the apparent detection of an absorption line near 266.94 ∼ GHz couldbe explained by a nearby SO2 absorption line. Lincowski et al. [8] further claimsthat the SO2 line was not present in the ALMA observations due to a case of linedilution (i.e. it is extended across Venus’ disk, and diluted by the interferometricALMA data). Although we were unable to independently confirm these results,we note that they offer a possible explanation for the discrepancy of line detection34between the JCMT and ALMA observations.35Chapter 5Discussion of the Generalizationof Bayesian Line FittingMethodologyAll models are wrong, but some are useful. — George E. P. BoxAlthough so far our focus has been the application of this work to the detectionof phosphine absorption in the atmosphere of Venus, we stress that the broadergoal of this project is to develop a generalized Bayesian methodology, that can beapplied to other situations where features are searched for in variable backgrounds.In this section, we will further discuss a number of possible applications of thegeneralized Bayesian line fitting methodology, and consider possible extensions ofthe current project.Abstracted to a higher level, the problem we discussed essentially involves theextraction of a feature of known shape from very noisy data. There are broadapplications for this method, ranging from the detection of expolanets in radialvelocity measurements to the detection of astronomical sources in two-dimensionalimages. In this chapter, we will briefly discuss methods to apply the Bayesianmethodology to a number of these situations.One potential application of a Bayesian statistical approach would be to mea-36sure and fit radial velocity or light curves In order to detect exoplanets.[13] Theradial velocity method for detecting exoplanets depends on measuring a particularstar’s Doppler shifts over a long period of time, in order to measure the relative ra-dial movement of the star using the Doppler effect. We then look for sinusoidal (orperhaps sums of sinusoidal) signals in the relative velocity of the star, which wouldindicate periodic motion caused by the gravity of an orbiting massive exoplanet.Since the basis of this work involves looking for a particularly shaped of signalin velocity space, we see that it is a perfect candidate for a Bayesian analyticalapproach. Looking for features such as eclipses in stellar light curves is a relatedmethod for finding exoplanets.In higher dimensions, Bayesian signal extraction methods can also be used forthe purpose of source detection in two-dimensional data. In noisy images, separat-ing noise from signal is not a trivial problem, and work has been done in the pastto automated methods to separate signal from background [2, 9]. The problem ofsource detection within a noisy background could be converted to the problem offitting a multivariate Gaussian signal over a 2D baseline, or the “background”. Wethus see that in such a setup, a Bayesian approach could marginalize over back-ground fits and calculate potential odds ratios for the existence of sources.Many other applications of Bayesian signal fitting approaches exist for morespecified contexts. In the past, work has been done using similar methods appliedto specific datasets, most notably in the modelling of the EDGES data, which re-lates to cosmological reionization [3, 7]. In this case, the authors of the originalpaper, Bowman et al. [3], claimed that they detected an absorption profile in a sky-averaged spectrum that could potentially indicate the existence of star formation inthe early Universe, but a later reanalysis by Hills et al. [7] demonstrated some prob-lems in the assessment of potential signal detection, particularly regarding freedomin the “baseline” model. As is the case with our application to the problem of phos-phine on Venus, the original findings became highly disputed after the publicationof subsequent work.All this is to say that signal detection issues caused by improper fitting and pro-cessing of baseline and noise is a common phenomenon in astronomy (and likelyin other sub-fields of physics as well). We believe that a systematic Bayesian ap-proach could improve the rigour of astronomical signal detection, and the methods37utilised in this thesis could be expanded to produce a generalized signal detectionapproach for the detection of spectral features and to other applications in the pres-ence of variable baselines.38Chapter 6ConclusionsLiving is worthwhile if one can contribute in some small way to thisendless chain of progress. — Paul A.M. DiracIn this chapter, we will conclude with a summary of our results on the detectionof possible phosphine lines in Venus, and follow up with potential future work onthis topic.6.1 Summary of FindingsIn this thesis, we considered a Bayesian marginalization approach to the problemof feature extraction and baseline fitting. Rather than considering simply the best-fit model as given by the MLE estimates of model parameters, we marginalizedover all model parameters within their distributions and developed a robust methodof model comparison based on the ratio of Bayesian marginalized likelihoods.We next applied our approach to observations of Venus as a follow-up analysisof the claimed detection of a phosphine absorption line by Greaves et al. [5]. Weuse our Bayesian model comparison methodology to compare between the modelof a baseline-only signal and the model of a baseline plus absorption line signal.We first analysed the low-spatial-resolution observations of Venus from the JamesClerk Maxwell Telescope, and found that the odds ratio favoured the existence ofan absorption line. This was even true even after the addition of an additional prior,39which further penalized the absorption line model for the additional complexity.We were thus able to find a weak detection of a line feature near the expectedfrequency.However, our subsequent Bayesian analysis of the high-resolution follow-upobservations from the Atacama Large Millimeter Array produced conflicting re-sults. Our Bayesian marginalized likelihood ratio in this case did not favour themodel with an absorption line. Furthermore, the odds ratio favoured the baseline-only model even more after the application of a prior to penalize complexity. Wethus conclude that we were unable to confirm the claimed phosphine absorptionline in the ALMA data.Our findings are in conflict with the original Venus paper and the subsequentfollow-up paper [5, 6], but this result was consistent with a number of other re-analyses of the ALMA data conducted using other methods [1, 4, 8, 10–12]. Wefurther note that the apparent detection of the line feature in the low-resolutionJCMT data was also corroborated by Lincowski et al. [8] and Villanueva et al. [12],who both claim that this feature may result from a nearby SO2 spectral line.6.2 Future WorkHaving established the validity of our Bayesian methodology to the particular ap-plication of phosphine detection in the Venusian atmosphere, we would like tofurther develop the methods described in this thesis into a comprehensive packagefor use in more general feature-detection applications.As described in Chapter 5, Bayesian baseline removal and feature extractiontechniques have a wide range of applications in astronomy beyond line detection.In the future, we would like to develop the methodology described in this thesis toapply it to other topics. We would also like to expand the signal detection algo-rithms beyond 1D features, for the purpose of source detection in 2D or even 3Dastronomical data sets.40Bibliography[1] A. B. Akins, A. P. Lincowski, V. S. Meadows, and P. G. Steffes.Complications in the ALMA Detection of Phosphine at Venus. , 907(2):L27,Feb. 2021. doi:10.3847/2041-8213/abd56a. → pages 28, 34, 40[2] E. Bertin and S. Arnouts. SExtractor: Software for source extraction. , 117:393–404, June 1996. doi:10.1051/aas:1996164. → page 37[3] J. D. Bowman, A. E. E. Rogers, R. A. Monsalve, T. J. Mozdzen, andN. Mahesh. An absorption profile centred at 78 megahertz in thesky-averaged spectrum. , 555(7694):67–70, Mar. 2018.doi:10.1038/nature25792. → page 37[4] T. Encrenaz, T. K. Greathouse, E. Marcq, T. Widemann, B. Be´zard,T. Fouchet, R. Giles, H. Sagawa, J. Greaves, and C. Sousa-Silva. A stringentupper limit of the PH3 abundance at the cloud top of Venus. , 643:L5, Nov.2020. doi:10.1051/0004-6361/202039559. → pages 34, 40[5] J. S. Greaves, A. M. Richards, W. Bains, P. B. Rimmer, H. Sagawa, D. L.Clements, S. Seager, J. J. Petkowski, C. Sousa-Silva, S. Ranjan, et al.Phosphine gas in the cloud decks of venus. Nature Astronomy, pages 1–10,2020. → pages vii, viii, 1, 2, 3, 18, 19, 20, 21, 25, 26, 27, 28, 34, 39, 40[6] J. S. Greaves, A. M. S. Richards, W. Bains, P. B. Rimmer, D. L. Clements,S. Seager, J. J. Petkowski, C. Sousa-Silva, S. Ranjan, and H. J. Fraser.Re-analysis of Phosphine in Venus’ Clouds. arXiv e-prints, art.arXiv:2011.08176, Nov. 2020. → pages 34, 40[7] R. Hills, G. Kulkarni, P. D. Meerburg, and E. Puchwein. Concerns aboutmodelling of the EDGES data. , 564(7736):E32–E34, Dec. 2018.doi:10.1038/s41586-018-0796-5. → page 37[8] A. P. Lincowski, V. S. Meadows, D. Crisp, A. B. Akins, E. W.Schwieterman, G. N. Arney, M. L. Wong, P. G. Steffes, M. N. Parenteau,41and S. Domagal-Goldman. Claimed Detection of PH3 in the Clouds ofVenus Is Consistent with Mesospheric SO2. , 908(2):L44, Feb. 2021.doi:10.3847/2041-8213/abde47. → pages 25, 34, 40[9] T. P. MacKenzie, D. Scott, and M. Swinbank. SEDEBLEND: a new methodfor deblending spectral energy distributions in confused imaging. , 463(1):10–23, Nov. 2016. doi:10.1093/mnras/stw1890. → page 37[10] I. A. G. Snellen, L. Guzman-Ramirez, M. R. Hogerheijde, A. P. S. Hygate,and F. F. S. van der Tak. Re-analysis of the 267-GHz ALMA observations ofVenus: No statistically significant detection of phosphine. arXiv e-prints,art. arXiv:2010.09761, Oct. 2020. → pages 1, 34, 40[11] M. A. Thompson. The statistical reliability of 267-GHz JCMT observationsof Venus: no significant evidence for phosphine absorption. , 501(1):L18–L22, Jan. 2021. doi:10.1093/mnrasl/slaa187.[12] G. Villanueva, M. Cordiner, P. Irwin, I. de Pater, B. Butler, M. Gurwell,S. Milam, C. Nixon, S. Luszcz-Cook, C. Wilson, V. Kofman, G. Liuzzi,S. Faggi, T. Fauchez, M. Lippi, R. Cosentino, A. Thelen, A. Moullet,P. Hartogh, E. Molter, S. Charnley, G. Arney, A. Mandell, N. Biver, A. Vandaele, K. de Kleer, and R. Kopparapu. No phosphine in the atmosphere ofVenus. arXiv e-prints, art. arXiv:2010.14305, Oct. 2020. → pages1, 25, 34, 40[13] J. T. Wright. Radial Velocities as an Exoplanet Discovery Method, page 4.2018. doi:10.1007/978-3-319-55333-7 4. → page 3742