
UBC Theses and Dissertations



Applying convolutional neural networks to classify fast radio bursts detected by the CHIME telescope
Yadav, Prateek (2020)


Full Text

Applying Convolutional Neural Networks to Classify Fast Radio Bursts Detected by The CHIME Telescope

by

Prateek Yadav

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Science
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Physics)

The University of British Columbia
(Vancouver)

April 2020

© Prateek Yadav, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Applying Convolutional Neural Networks to Classify Fast Radio Bursts Detected by The CHIME Telescope

submitted by Prateek Yadav in partial fulfillment of the requirements for the degree of Master of Science in Physics.

Examining Committee:

Dr. Ingrid H. Stairs, Astronomy
Supervisor

Dr. Gary F. Hinshaw, Astronomy
Supervisory Committee Member

Abstract

The Canadian Hydrogen Intensity Mapping Experiment (CHIME) is a novel radio telescope that is predicted to detect up to several dozen Fast Radio Bursts (FRBs) per day. However, CHIME's FRB detection software pipeline is susceptible to a large number of false positive triggers from terrestrial sources of Radio Frequency Interference (RFI). This thesis describes intensityML, a software pipeline designed to generate waterfall plots and automatically classify radio bursts detected by CHIME without explicit RFI-masking and DM-refinement. The pipeline uses a convolutional neural network based classifier trained exclusively on events detected by CHIME, and the classifier has an accuracy, precision and recall of over 99%. It has also successfully discovered several FRBs, both in real-time and from archival data. The ideas presented in this thesis may play a key role in designing future machine-learning models for FRB classification.

Lay Summary

Fast radio bursts are bright bursts of radio waves that last only a few milliseconds.
These radio bursts originate from outside of our galaxy, but their exact origins remain a mystery. CHIME is a novel radio telescope which aims to detect a large number of these radio bursts in order to better understand their origins. However, radio telescopes are extremely susceptible to picking up radio signals from terrestrial sources, such as airplanes and mobile phones. This thesis presents an automated classifier, which can look at the radio bursts detected by CHIME and tell whether they had a terrestrial or an astrophysical origin.

Preface

The work presented in this thesis was primarily done by me, with contributions from various members of The CHIME/FRB collaboration. All members of the collaboration have contributed to this thesis in one way or another, through instrument and software development or through data acquisition and verification.

The description of The CHIME/FRB system in Section 1.2 has been mostly summarised from previous publications by The CHIME/FRB Collaboration [4][5]. Chapter 2 presents some basic background knowledge for understanding convolutional neural networks, which can be readily found in most deep-learning textbooks such as Goodfellow et al., 2016 [19].

Dr. Shriharsh Tendulkar played a significant role in supervising and providing insights into the development of the plotting routine described in Section 3.2. The scripts were primarily coded by me, with some contributions from Dr. Shriharsh Tendulkar and Mr. Chitrang Patel. These scripts extensively utilise modules from Intensity Analysis Utilities, a library developed by The CHIME/FRB collaboration to analyze intensity data from radio telescopes. The idea of DM-augmentation, described in Subsection 3.2.2, also emerged from discussions within the collaboration.

The classifier described in Section 3.1 was developed and trained independently by me. While most of the events used were labelled by members of the collaboration, they were also double-checked by me. Mr. Charanjot Brar and Mr. Chitrang Patel assisted me with the real-time deployment of intensityML.

Dr. Ingrid Stairs supervised the overall project and assisted with editing this thesis.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Fast Radio Bursts
  1.2 The CHIME/FRB Project
  1.3 Related Work: Use of Machine Learning in FRB Classification
    1.3.1 Hybrid Deep Neural Network
    1.3.2 Transfer Learning on ImageNet Models
  1.4 Thesis Organisation
2 Introduction to Convolutional Neural Networks
  2.1 Feed-Forward Neural-Networks
  2.2 Convolutional Neural Networks
3 Description of intensityML
  3.1 The FRBNet Architecture
  3.2 Generating Waterfall Plots
    3.2.1 Automated Plotting Scripts
    3.2.2 Data Augmentation
4 Results
  4.1 Training
  4.2 Results
5 Discussion
  5.1 Discussion and Future Work
    5.1.1 Discussion
    5.1.2 Future Work
    5.1.3 Science Goals
  5.2 Conclusion
Bibliography

List of Tables

Table 4.1: Accuracy, precision, recall and F1-score computed on the test set.
Table 4.2: L1 SNRs and DMs for events shown in Figures 4.1 and 4.2.

List of Figures

Figure 1.1: (a) The plot on the left shows the waterfall plot for an FRB after correcting for the effects of electromagnetic dispersion. (b) The plot on the right shows the same waterfall plot, but with partial dedispersion to demonstrate the effects of the quadratic dispersive delay in Equation 1.1. The masked frequency channels in both plots are due to interference from the LTE band.

Figure 1.2: CHIME radio telescope located at The Dominion Radio Astrophysical Observatory (DRAO) in Canada. Photo taken by Mark Halpern and used with permission.

Figure 1.3: A schematic of CHIME's signal path. The raw data collected by the four reflectors (red arcs) is transferred to the F- and X-Engines at a rate of 13 Tb/s. The F-Engine consists of FPGA boards to digitise and channelise the data. The X-Engine utilises a GPU cluster for Fast Fourier Transform beam-forming. The CHIME/FRB backend receives the 1024 stationary intensity beams at 1 ms cadence and 16k frequency channels. Image adapted from [4].

Figure 1.4: A high-level overview of the CHIME/FRB software pipeline showing different stages of processing. Image adapted from [4].

Figure 1.5: A schematic diagram showing the hybrid deep neural-network developed by Connor et al. Image adapted from [9]. In the original figure, the authors seem to have incorrectly labelled the operation on the waterfall plot as 1D convolution.

Figure 1.6: Diagram showing an example of a network architecture used by Agarwal et al. Image adapted from [2].

Figure 2.1: Schematic diagram of a fully-connected n-layer neural-network. An input vector x ∈ R^d is transformed to a vector h^(1) ∈ R^m in the first hidden layer. This transformation is repeated n times as shown in Equation 2.1. The final output vector ŷ ∈ R^k is obtained by the transformation shown in Equation 2.3, where ŷ_i represents the probability for class i if a softmax function (Equation 2.4) is used. Image adapted from [19].

Figure 2.2: The figures show how convolution can be used to extract the edges from an input image with a single colour channel. The red pixels in the convolution kernels represent negative numbers, while the blue pixels represent positive numbers.

Figure 2.3: Schematic diagram showing the operation performed by the convolution layer. See Equation 2.8. Image adapted from [19].

Figure 2.4: Diagram shows the typical sequence of operations in a CNN. Image adapted from [19].

Figure 3.1: Architecture of FRBNet. The model takes a single-channel 256×256 pixel image as input. Layer 0 of the model convolves the input with a fixed set of thirteen 7×7 kernels. The resulting thirteen-channel 250×250 image is then down-sampled with 2×2 max-pooling and followed with a non-linear ReLU activation function. The next three layers each perform a 5×5 convolution with a stride of two, followed by a ReLU activation. At the end of Layer 3, the ten-channel image is down-sampled using max-pooling to give a one-dimensional vector of size ten. Finally, a fully-connected layer performs the operation in Equation 2.3 to give an output vector of size 2. Image style adapted from [23].

Figure 3.2: The top left plot shows the waterfall plot of an FRB. The horizontal streaks in this plot are RFI contamination. The remaining plots show the thirteen convolution kernels and their corresponding Layer 0 transformations on the original plot. The kernel on the top right is simply the identity kernel. The remaining kernels show the Prewitt (left) and Sobel (right) kernels embedded in a 7×7 grid. These help enhance the vertical pulse shape while wiping away the horizontal RFI streaks.

Figure 3.3: The top left plot shows the waterfall plot of an RFI event (with no astrophysical pulse present). Similar to Figure 3.2, the remaining plots show the thirteen convolution kernels and their corresponding Layer 0 transformations on the original plot.

Figure 3.4: Some examples of plots generated by the automated script for events that were classified as astrophysical by the CHIME/FRB pipeline. The top five plots are pulses from FRBs and known pulsars that were correctly classified as astrophysical by the pipeline. The pulses are not perfectly vertical as L1's DM search doesn't find the optimal DM value. The horizontal RFI streaks can also be seen on these plots due to the absence of RFI-masking. The bottom five events were also classified as astrophysical by the pipeline, but these events are most likely RFI.

Figure 3.5: The figure shows the difference between the plots generated by the online-waterfaller (left) and intensityML (right) for a scattered FRB. There is no down-sampling performed by the online-waterfaller by default.
intensityML, on the other hand, performs automatic down-sampling, which makes the burst appear narrow in the plot.

Figure 3.6: The waterfall plots of an FRB and its ten DM-augmented counterparts generated by the plotting script. L1 typically finds a sub-optimal DM value, which is why the burst in (a) does not appear vertical.

Figure 3.7: Noise-augmentation for an FRB's waterfall plots. (a) shows how this is performed by taking a weighted sum of the default waterfall plot with the blank-sky plot, where ξ ∼ 0.6. There is also a probability of 0.5 to flip the blank-sky plot along its time-axis before the addition. (b) shows this effect for all of the DM-augmented counterparts from Figure 3.6.

Figure 4.1: Waterfall plots generated by intensityML (left) compared with the ones generated by the online-waterfaller (right) for some of the potential FRB candidates. These were discovered by users with the help of the real-time deployment.

Figure 4.2: Waterfall plots for some of the potential FRB candidates discovered from archival data that were initially misclassified as RFI by users. These were discovered by either visually inspecting plots generated by intensityML or with the help of the classifier.

Figure 5.1: Diagram shows an example of an Inception module [44] for Layer 0 convolution.

Glossary

CHIME  Canadian Hydrogen Intensity Mapping Experiment
CNN  Convolutional Neural Network
DBSCAN  Density-Based Spatial Clustering of Applications with Noise
DM  Dispersion Measure
DRAO  Dominion Radio Astrophysical Observatory
FRB  Fast Radio Burst
GBT  Green Bank Telescope
RFI  Radio Frequency Interference
RRAT  Rotating Radio Transient
SNR  Signal-to-Noise Ratio
SVM  Support Vector Machine

Acknowledgments

I would like to express my gratitude towards my supervisor, Dr. Ingrid H. Stairs, for supervising and supporting my research. I would also like to thank the entire CHIME/FRB collaboration for providing numerous insightful discussions and assisting me with countless roadblocks encountered over the course of this project.

I would also like to extend my gratitude towards the faculty and staff, as well as the greater student community, at The University of British Columbia for their support during my time here.

Finally, I would like to thank my family for their financial and moral support throughout my education.

Chapter 1
Introduction

1.1 Fast Radio Bursts

A Fast Radio Burst (FRB) is a highly dispersed, millisecond-duration radio signal of unknown extra-galactic origin [25][35].

Figure 1.1 shows the intensity as a function of frequency vs. time (also known as the waterfall plot) for an FRB detected by the CHIME telescope [5] (see Section 1.2 for more details on CHIME). As an FRB propagates through space, the effects of electromagnetic dispersion due to the interstellar cold plasma result in a quadratic delay in the time of arrival as a function of frequency [25]. The amount of dispersion can be quantified by calculating the Dispersion Measure (DM) using the cold-plasma dispersion law [24]:

    \mathrm{DM} = \frac{2\pi m_e c\,\Delta t}{e^2\left(f_{\mathrm{low}}^{-2} - f_{\mathrm{high}}^{-2}\right)},        (1.1)

where m_e and e are the mass and charge of an electron respectively, c is the speed of light, and Δt is the difference in the time of arrival between the higher frequency (f_high) and the lower frequency (f_low) of the burst. The DM is also equal to the integral of the electron density (n_e) along the line-of-sight to the FRB [34]:

    \mathrm{DM} = \int_0^d n_e(l)\,dl,        (1.2)

where d is the distance to the FRB. The optimum DM value can be determined by a process known as dedispersion, in which the lower frequency channels are shifted with respect to the higher frequency channels over several trial DM values until the burst lines up vertically [25]. Figure 1.1 shows the waterfall plots of the FRB and the effects of dedispersion.

Figure 1.1: (a) The plot on the left shows the waterfall plot for an FRB after correcting for the effects of electromagnetic dispersion. (b) The plot on the right shows the same waterfall plot, but with partial dedispersion to demonstrate the effects of the quadratic dispersive delay in Equation 1.1. The masked frequency channels in both plots are due to interference from the LTE band.

The first FRB was discovered by Lorimer et al. in 2007 [25] during their search of archival data from the Parkes radio telescope [42]. This FRB came to be known as the Lorimer burst. They found the DM of this burst to be 375 cm⁻³ pc. However, according to the NE2001 model [10], the Milky Way should have contributed only about 25 cm⁻³ pc along this line of sight [25]. This led them to conclude that the burst originated from outside our galaxy.

Since then, around 110 FRBs have been discovered by various radio surveys across the world (http://frbcat.org) [32]. Some FRB sources are also known to repeat, and the first repeating source was identified by Spitler et al. in 2016 [41]. It is estimated that the new generation of telescopes, such as The Canadian Hydrogen Intensity Mapping Experiment (CHIME), may be capable of detecting dozens of new FRBs every day [4].

Despite numerous detections, the source of FRBs still remains a mystery [35]. Some of the popular progenitor theories include compact object mergers, collapse of compact objects, supernova remnants and active galactic nuclei [35]. For example, two neutron stars in a highly eccentric binary orbit may see their orbital separation shrink due to dissipation via gravitational waves [13]. It is predicted that FRBs may be produced when the neutron stars approach their periastron and their magnetospheres interact, before they finally merge into each other [13].

However, repeating FRBs have been observed to have a different burst morphology compared to non-repeating ones, which suggests that their emission mechanism or local environment may be different [6][15]. Moreover, repeating FRBs can only originate from a mechanism that does not involve the destruction of the original source. For example, Metzger et al. developed a theory involving synchrotron maser emission from young magnetars [27] to describe the origins and characteristics of repeating FRBs. However, it is still not clear whether repeating and non-repeating FRBs originate from the same mechanism. Needless to say, an increased number of FRB detections would help better constrain these theoretical models.

1.2 The CHIME/FRB Project

The following section gives a brief overview of the CHIME/FRB software pipeline. For more details, refer to [4][5].

The Canadian Hydrogen Intensity Mapping Experiment (CHIME) was originally designed to measure the baryon acoustic oscillations by mapping neutral hydrogen in the frequency range of 400-800 MHz [28]. The telescope consists of four adjacent 20 m x 100 m semi-cylindrical reflectors oriented in the North-South direction, as shown in Figure 1.2 [4]. Each of these reflectors has 256 dual-polarisation feeds along its axis, resulting in a total of 1024 independent intensity beams and a large field-of-view of about 250 deg² [4][5]. The CHIME/FRB project aims to utilise CHIME's wide bandwidth, high sensitivity, large field-of-view and powerful correlator to detect multiple FRBs per day [4].

Figure 1.2: CHIME radio telescope located at The Dominion Radio Astrophysical Observatory (DRAO) in Canada. Photo taken by Mark Halpern and used with permission.

Figure 1.3 shows a schematic diagram of the telescope signal path. The input from the receiver feeds is processed by the F-Engine and the X-Engine to produce 1024 stationary intensity beams. Each beam has a high sampling rate of 0.983 ms and a frequency resolution of 24.4 kHz, resulting in a total of 16384 frequency channels in the 400-800 MHz range [4]. The data rate into the CHIME/FRB backend is a massive 142 Gb/s [4].

Figure 1.3: A schematic of CHIME's signal path. The raw data collected by the four reflectors (red arcs) is transferred to the F- and X-Engines at a rate of 13 Tb/s. The F-Engine consists of FPGA boards to digitise and channelise the data. The X-Engine utilises a GPU cluster for Fast Fourier Transform beam-forming. The CHIME/FRB backend receives the 1024 stationary intensity beams at 1 ms cadence and 16k frequency channels. Image adapted from [4].

Figure 1.4 shows a schematic diagram of the CHIME/FRB software pipeline. The processing is split into four stages, namely L1, L2, L3 and L4. The L0 stage refers to the pre-processing, including beam-forming, done by the X-Engine correlator. The L1 stage receives data from L0 and utilises a dedicated cluster of 128 compute nodes to perform two key tasks [4]. The first task involves removing terrestrial sources of Radio Frequency Interference (RFI). It is imperative to excise these RFI signals from the intensity data in order to prevent misclassifying them as astrophysical signals. L1 does this by using a specialised algorithm to apply a custom mask to the intensity data in the frequency vs. time space. The second task involves dedispersion and identification of potential burst candidate events, and it is the most computationally expensive part of the pipeline. A highly optimised tree-algorithm performs dedispersion on the data stream and searches for candidate events with DMs up to 13,000 cm⁻³ pc and pulse widths up to 128 ms [5].

Once an event is identified, various techniques, including machine-learning, are used to classify it [4][5].
In L1, a Support Vector Machine (SVM) classifier [11] uses the event's Signal-to-Noise Ratio (SNR) behaviour to give it a score between 0 and 10. This score is called the L1 Grade, and it reflects how likely the event is to be astrophysical. A score closer to 0 suggests that the event was most likely a terrestrial source of RFI, whereas a score closer to 10 suggests that the event was of astrophysical origin. A lightweight description of the event containing key parameters, such as SNR, DM, L1 Grade, detection-time and sky-coordinates, is passed on to the next stage for event identification. Since the CHIME/FRB data rate is so high, the event's baseband (raw voltage data from the antennae) and intensity data are stored temporarily in the L0 and L1 buffers respectively, and are retrieved later if the event is confirmed to be astrophysical further down the pipeline.

Figure 1.4: A high-level overview of the CHIME/FRB software pipeline showing different stages of processing. Image adapted from [4].

The L2 stage utilises the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm [14] to consolidate events that are from the same origin, but detected simultaneously in different beams, into one single event based on their detection-time, DM and sky-coordinates [5]. Once the events from all beams have been collated and grouped, another SVM classifier called the RFI-sifter gives them a score between 0 and 10 depending on their SNR behaviour, beam activity and L1 Grade. The RFI-sifter has an accuracy and recall of about 99% [5]. The events classified as RFI are sent straight to the L4 stage. The events classified as astrophysical have their positions refined and fluxes estimated before being sent to the L3 stage.

The L3 stage checks whether the events are of galactic or extra-galactic origin based on their DMs and refined positions [4]. It also checks whether an event is coming from a source that is already known, such as a pulsar, a known FRB or a Rotating Radio Transient (RRAT), by checking the ATNF pulsar catalogue (www.atnf.csiro.au/people/pulsar/psrcat/) [26], the RRATalog (astro.phys.wvu.edu/rratalog/), the FRB catalogue (www.frbcat.org) [32] and CHIME/FRB's own database of discoveries. Once this is determined, a set of rules in L3 determines what further actions need to be taken by the L4 stage.

L4 performs the actions requested by L3 [4]. These include sending a call-back request to L1 to retrieve the raw intensity data for extra-galactic events. The call-back request may also trigger L0 to write raw voltage data from the antennae by performing a baseband dump. The intensity data that is called back is written to a network-shared archiver and can be used for further offline analysis. L4 also hosts a relational database that stores key parameters of each event sent past the L1 stage.

After L4, several offline analysis routines are monitored and controlled by The CHIME/FRB Master. One such routine is the online-waterfaller, which automatically generates waterfall plots for all called-back events. These plots are displayed on an interactive web-interface where users can modify the plots by manually refining the DM, masking and sub-banding the frequency channels, down-sampling the time samples, etc. With the assistance of these tools, users visually inspect the plots and classify the called-back events as astrophysical or RFI.

However, despite multiple efforts to mitigate RFI events, the CHIME/FRB pipeline is still susceptible to a large number of false positives, i.e. a significant proportion of the called-back events that the pipeline classifies as astrophysical are actually RFI. For example, roughly 135 out of 360 called-back events in the months of October and November 2019 were false positives. This can make manual classification a rather laborious task. To address this, members of the CHIME/FRB collaboration are assigned periodic 5-6 hour-long shifts during which they monitor the waterfall plots of events as they are called back, and classify them as either RFI or astrophysical. Needless to say, image-recognition via machine-learning can play a significant role in minimising the workload in this area.

1.3 Related Work: Use of Machine Learning in FRB Classification

Convolutional Neural Network (CNN) based classifiers have been successfully applied to pulsar searches (see for example [20][47]). More recently, they have also been successfully applied to FRB searches. In 2018, Zhang et al. successfully applied CNNs to detect FRBs directly in the intensity data stream from Breakthrough Listen observations at the Green Bank Telescope (GBT), West Virginia [46]. CNNs have also been applied to search for FRBs from event data, and this section will describe two such applications that are similar to the model presented in this thesis [9][2]. For technical details on CNNs, refer to Chapter 2.

The performance of such models is typically evaluated by four metrics:

1. Accuracy: (no. of correctly classified events) / (total no. of events)
2. Recall: (no. of correctly classified astrophysical events) / (total no. of astrophysical events)
3. Precision: (no. of correctly classified astrophysical events) / (total no. of events classified as astrophysical)
4. F1-score: 2 / (Recall⁻¹ + Precision⁻¹)

1.3.1 Hybrid Deep Neural Network

In 2018, Connor et al. constructed a tree-like hybrid deep neural network, shown in Figure 1.5 [9]. The hybrid network takes in four different features:

1. The dedispersed waterfall plot.
2. The pulse profile, obtained by summing the dedispersed waterfall plot along its frequency axis.
3. A DM vs. time plot, where each row of the two-dimensional plot is the pulse profile at a different DM value.
4. The SNRs of the neighbouring beams that were triggered.

Figure 1.5: A schematic diagram showing the hybrid deep neural-network developed by Connor et al. Image adapted from [9]. In the original figure, the authors seem to have incorrectly labelled the operation on the waterfall plot as 1D convolution.

Three different CNNs independently extract higher-level features from the first three input features, and a fully-connected neural-network extracts higher-level features from the last one. All of these output features are merged into one large fully-connected neural-network which makes a prediction.

Connor et al. trained and tested their model independently on two different datasets. The first dataset consists of events triggered on the CHIME Pathfinder, a precursor to CHIME [3]. The model was trained on 4850 simulated FRBs and an equal number of RFI triggers from the CHIME Pathfinder. The trained model was then tested on several hundred RFI triggers and single pulses from the pulsar B0329+54 and the Crab pulsar (it is common practice to use pulses from pulsars, as their waterfall plots look extremely similar to those from FRBs). The classifier had a recall rate of about 99% on the test set.

The second dataset used by Connor et al. was from the Apertif telescope [29]. The training set consisted of roughly 10,000 RFI candidates, 9,800 simulated FRBs and a couple hundred single pulses from galactic pulsars. The trained model was tested on several hundred single pulses from galactic pulsars and RFI triggers, and the resulting recall rate was about 99.7%.

1.3.2 Transfer Learning on ImageNet Models

In 2019, Agarwal et al. trained several popular deep CNN architectures [2]. These models had previously shown remarkable performance on large public repositories of real-life images, such as ImageNet [12]. Agarwal et al. trained these models on images of waterfall plots and DM vs. time plots independently, using the method of transfer learning.
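Returning briefly to the four evaluation metrics defined at the start of Section 1.3, all of them follow directly from the binary confusion-matrix counts. A self-contained sketch (the function name and the label convention, 1 = astrophysical and 0 = RFI, are my own, not from any of the cited works):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall, precision and F1-score for binary labels,
    where 1 = astrophysical and 0 = RFI."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn)        # fraction of real bursts recovered
    precision = tp / (tp + fp)     # fraction of triggers that are real
    # F1 is the harmonic mean of recall and precision.
    f1 = 2.0 / (1.0 / recall + 1.0 / precision)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}
```

Note that with heavily imbalanced classes (many more RFI triggers than bursts), accuracy alone can look good even for a poor classifier, which is why recall, precision and F1 are reported alongside it.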
In transfer learning, all of the model parameters are initialised to their values from training on the ImageNet dataset, and only the parameters of the last few convolution layers are fitted. Once these individual models were optimised, Agarwal et al. removed each model's classification layer and fused them into a hybrid network, as shown in Figure 1.6.

Agarwal et al. trained the models on RFI and galactic pulsars from the Green Bank Telescope (GBT) [43][37] and the 20 m telescope [18] located in Green Bank, West Virginia. The training set also included simulated FRBs. The training examples were augmented to artificially increase the size of the training set, by flipping the waterfall plots along their time axes and flipping the DM vs. time plots along both axes. The test set consisted of about 13,500 real events, of which roughly half were from galactic pulsars and the other half from RFI. Their top 11 hybrid models had greater than 99% accuracy, recall and F1-score. They also tested their models on 56 real FRB events from ASKAP [39], Parkes [31][33][45][8] and Breakthrough Listen [17][46], and most of their models had close to perfect recall.

Figure 1.6: Diagram showing an example of a network architecture used by Agarwal et al. Image adapted from [2].

1.4 Thesis Organisation

This thesis presents intensityML, a software pipeline which automatically generates waterfall plots for called-back events and classifies them. The classifier has been designed specifically for the CHIME/FRB system, as it has been trained and tested exclusively on real events called back by CHIME.

The second chapter of this thesis presents a brief overview of convolutional neural networks. The third chapter describes the features of intensityML, and the fourth chapter reports its performance.
Finally, the fifth chapter presents a discus-sion on the current results and potential future work.11Chapter 2Introduction to ConvolutionalNeural NetworksThis chapter will provide a very quick and condensed description of feed-forwardneural-networks and convolutional neural networks. For more comprehensive de-tails, refer to sources like [19] and [16].2.1 Feed-Forward Neural-NetworksA neural-network is a commonly used nonlinear statistical model which is oftenused for classification problems. An input example with d features can be repre-sented as a vector x∈Rd . For a classification problem with k target labels, a neural-network maps the input vector x to a target vector yˆ ∈ Rk. The mapping consistsof a series of nonlinear transformations or hidden layers as shown in Equation 2.1.h(1) = g(1)(W(1)x+b(1))h(2) = g(2)(W(2)h(1)+b(2))(2.1)...h(n) = g(n)(W(n)h(n−1)+b(n))12Figure 2.1: Schematic diagram of a fully-connected n-layer neural-network.An input vector x ∈ Rd is transformed to a vector h(1) ∈ Rm in the firsthidden layer. This transformation is repeated n times as shown in Equa-tion 2.1. The final output vector yˆ∈Rk is obtained by the transformationshown in Equation 2.3, where yˆi represents the probability for class i ifa softmax function (Equation 2.4) is used. Image adapted from [19].Each layer multiplies its input vector with a matrix W(i) and adds a bias vectorb(i) to it. The resulting vector is then acted on by an element-wise nonlinear func-tion g(i), which is also known as the activation function. The rectified linear unit(ReLU) is the typical choice for activation functions in most neural-networks [19]:g(z) = max{0, z}. (2.2)The output layer takes the vector h(n) and performs a transformation similar tothe hidden layers:yˆ = f(Wh(n)+b), (2.3)except the nonlinear function f is typically the softmax function [19]:13f (z j) =exp(z j)k∑c=1exp(zc). 
(2.4)

The advantage of using the softmax function is that it allows us to interpret the elements of \hat{y} as the predicted probabilities for each class. Figure 2.1 shows a schematic diagram summarising Equations 2.1 and 2.3.

The parameters of a neural-network (the W's and b's) are a priori unknown. Let \Theta represent all the neural-network parameters. The optimal values of these parameters, \hat{\Theta}, are determined by maximum likelihood estimation:

\hat{\Theta} \in \arg\max_{\Theta} \left\{ \prod_{i=1}^{N} p(y_i \mid x_i, \Theta) \right\},   (2.5)

where y_i is the true label for example x_i, and N is the total number of examples. This can equivalently be expressed as finding the minimiser of the negative log-likelihood:

\hat{\Theta} \in \arg\max_{\Theta} \left\{ \prod_{i=1}^{N} \sum_{c=1}^{k} y_{ic} \hat{y}_{ic} \right\} \in \arg\min_{\Theta} \left\{ -\sum_{i=1}^{N} \sum_{c=1}^{k} y_{ic} \ln(\hat{y}_{ic}) \right\},   (2.6)

where y_{ic} is equal to 1 only if c is the correct class label for example i, and \hat{y}_{ic} is the predicted probability for example i to be of class c. The term being minimised in Equation 2.6 is also known as the cross-entropy loss function [16].

A given dataset is typically split into three parts for training, validation and testing. The training set is used for minimising the loss function iteratively via some variation of the gradient descent algorithm:

\Theta_{t+1} = \Theta_t - \alpha \cdot \nabla_{\Theta_t} L(\Theta_t),   (2.7)

where \Theta_t are the weights at iteration t, \alpha is the step-size and L is the loss function. This process, however, can often lead to a model that overfits the training dataset, since neural-networks are typically over-parameterised. In the context of machine-learning, overfitting refers to a situation where a model is fit too well to the peculiarities of the features in the training set, and generalises poorly to newer datasets. The validation set can be used to gauge this effect by estimating how well the trained model generalises to data that it did not 'see' during training. The generalisation error on the validation set can also be used for fine-tuning the hyper-parameters of the neural-network, such as the number of hidden layers or the size of each layer.
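As a concrete illustration of Equations 2.4 to 2.7, the sketch below fits a single-layer softmax classifier by batch gradient descent on the cross-entropy loss. The data, learning rate and iteration count are illustrative and not taken from the thesis pipeline:

```python
import numpy as np

def softmax(Z):
    # Equation 2.4, applied row-wise with the usual max-shift for stability.
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(Y, Y_hat):
    # Equation 2.6: -sum_i sum_c y_ic ln(y_hat_ic), averaged over examples.
    return -np.mean(np.sum(Y * np.log(Y_hat + 1e-12), axis=1))

def gradient_step(W, b, X, Y, alpha=0.1):
    # One iteration of Equation 2.7 for the model y_hat = softmax(Wx + b).
    Y_hat = softmax(X @ W + b)
    G = (Y_hat - Y) / X.shape[0]   # gradient of the loss w.r.t. the logits
    return W - alpha * X.T @ G, b - alpha * G.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # N = 100 examples, d = 5 features
Y = np.eye(2)[(X[:, 0] > 0).astype(int)]   # one-hot labels, k = 2 classes
W, b = np.zeros((5, 2)), np.zeros(2)
losses = []
for _ in range(200):
    losses.append(cross_entropy(Y, softmax(X @ W + b)))
    W, b = gradient_step(W, b, X, Y)
```

Because the labels here depend only on the sign of the first feature, the loss falls steadily from its initial value of ln 2 ≈ 0.69 as the weights are fitted.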
The validation error can also be used to determine when to stop the gradient descent algorithm, once an optimal value for the validation error has been obtained. Lastly, the test set is used to get an unbiased estimate of the final trained model's performance, and should ideally only be used once to avoid optimisation bias.

2.2 Convolutional Neural Networks

A Convolutional Neural Network (CNN) [22] is a special type of neural-network that generally performs well on image-classification problems [19]. Unlike fully-connected neural-networks, a CNN consists of convolution layers that perform convolution operations instead of matrix multiplications. The purpose of the convolutions is to extract features from an input image. Figure 2.2 shows how convolutions with vertical and horizontal Prewitt operators [36] can be used to extract edges.

For an input two-dimensional grid of pixels I, the output O from the convolution operation performed by a convolution layer looks like:

O_{i,j} = \sum_m \sum_n I_{i+m,\, j+n} K_{m,n} + b,   (2.8)

where K is the two-dimensional convolution kernel, b is a constant bias term and the subscripts denote pixel numbers. Figure 2.3 illustrates this operation. For a two-dimensional image with multiple input colour channels, the convolution operation is performed with a slight modification to Equation 2.8:

O_{i,j,k} = \sum_m \sum_n \sum_p I_{i+m,\, j+n,\, p} K_{k,m,n,p} + b_k,   (2.9)

where the subscripts p and k denote the colour channels of the input and output image respectively.

1 Technically, the operation performed here is cross-correlation, which is similar to convolution. See [19] for more details.

(a) Vertical edge detection with a 3×3 Prewitt operator.
(b) Horizontal edge detection with a 3×3 Prewitt operator.

Figure 2.2: The figures show how convolution can be used to extract the edges from an input image with a single colour channel.
The red pixels in the convolution kernels represent negative numbers, while the blue pixels represent positive numbers.

The convolution operation can also be written as a matrix multiplication:

Z = Wx + b,   (2.10)

where x and Z are the images I and O flattened into one-dimensional vectors respectively, and b is the bias vector. W is a sparse doubly block-circulant matrix, where the number of non-zero elements of each row is equal to the total number of elements in the matrix K. Here, we can see the similarity between the CNN and the fully-connected neural-network from the previous subsection. However, due to the sparsity of W, not all input features interact with each other. This can be advantageous, as it allows the transformation to focus on the local regions of an image. It also greatly reduces the number of parameters in the model. Moreover, the same set of parameters is applied at every position of the input. As a result, CNNs have better time-complexity, memory requirements and statistical efficiency than fully-connected neural-networks.

Figure 2.3: Schematic diagram showing the operation performed by the convolution layer. See Equation 2.8. Image adapted from [19].

Similar to neural-networks, the convolution operation is usually followed by a non-linear activation function applied to every single pixel of the output. This is then typically followed by a pooling function which down-samples the dimensions of the output. This set of processes is applied repeatedly to extract higher-level features from the image while reducing the size of the image. Finally, the output is reshaped into a one-dimensional vector and is fed into a fully-connected neural-network which then classifies it. Figure 2.4 shows the architecture of a typical CNN.

Figure 2.4: Diagram shows the typical sequence of operations in a CNN. Image adapted from [19].
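The convolution → ReLU → pooling sequence of Figure 2.4 can be sketched directly from Equations 2.2 and 2.8. The toy image, the choice of a vertical Prewitt kernel (Figure 2.2a) and the 2×2 pooling window are illustrative:

```python
import numpy as np

def conv2d(I, K, b=0.0):
    # Equation 2.8: O[i,j] = sum_m sum_n I[i+m, j+n] * K[m,n] + b
    kh, kw = K.shape
    oh, ow = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    O = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            O[i, j] = np.sum(I[i:i + kh, j:j + kw] * K) + b
    return O

def relu(Z):
    return np.maximum(0.0, Z)   # Equation 2.2, applied element-wise

def max_pool(I, size=2):
    # Down-sample by taking the maximum over non-overlapping size x size windows.
    h, w = (I.shape[0] // size) * size, (I.shape[1] // size) * size
    return I[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Vertical-edge detection with a 3x3 Prewitt operator, followed by the
# ReLU and pooling stages of a typical CNN layer.
prewitt_v = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=float)
image = np.zeros((8, 8))
image[:, 4:] = 1.0   # a single vertical edge at column 4
features = max_pool(relu(conv2d(image, prewitt_v)))
```

The feature map responds only where the vertical edge sits, mirroring how FRBNet's fixed edge-detection kernels enhance a vertical pulse.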
Note that the convolution kernels are also typically a priori unknown and are fitted during the training phase.

Chapter 3
Description of intensityML

3.1 The FRBNet Architecture

This section describes the architecture of FRBNet, the CNN model used by intensityML for classification. Figure 3.1 shows its architecture.

In Layer 0 of FRBNet, the input image is first convolved with a set of thirteen kernels which were kept fixed during the training phase1. The convolution operation is followed by a max-pooling operation [19], where the image is down-sampled by taking the maximum value over a window of size 2×2 every two pixels. Finally, a ReLU activation function is applied to every pixel.

Figures 3.2 and 3.3 show the thirteen kernels and their corresponding Layer 0 transforms for an FRB and an RFI event respectively. There are twelve Sobel [40][1] and Prewitt [36] kernels of different sizes for vertical edge detection. It can be seen how these kernels help extract the vertical pulse while clearing away the RFI. The 3×3 kernels are typically more sensitive to narrower bursts, whereas the 7×7 kernels are more sensitive to wider ones. A single identity kernel is also used to retain some information from the original plot. Since the kernels are of different sizes, the smaller kernels were padded with zeroes to give larger 7×7 kernels. This allows the convolution operations to be performed simultaneously within the same convolution layer.

1 Similar techniques have been applied in numerous CNN applications. See for example [7] and [38].

Figure 3.1: Architecture of FRBNet. The model takes a single-channel 256×256 pixel image as input. Layer 0 of the model convolves the input with a fixed set of thirteen 7×7 kernels. The resulting thirteen-channel 250×250 image is then down-sampled with 2×2 max-pooling and followed with a non-linear ReLU activation function. The next three layers each perform a 5×5 convolution with a stride of two, followed by a ReLU activation.
At the end of Layer 3, the ten-channel image is down-sampled using max-pooling to give a one-dimensional vector of size ten. Finally, a fully-connected layer performs the operation in Equation 2.3 to give an output vector of size 2. Image style adapted from [23].

Layers 1, 2 and 3 each perform a 5×5 convolution and ReLU activation successively. Instead of applying max-pooling, the convolution operation is performed with a stride of 2. These kernels are not kept frozen and are fitted during the training phase. The purpose of these layers is to extract higher-level features from the Layer 0 outputs. Finally, the extracted information is down-sampled to a one-dimensional vector, which is then transformed to give the classification probabilities.

Figure 3.2: The top left plot shows the waterfall plot of an FRB. The horizontal streaks in this plot are RFI contamination. The remaining plots show the thirteen convolution kernels and their corresponding Layer 0 transformations on the original plot. The kernel on the top right is simply the identity kernel. The remaining kernels show the Prewitt (left) and Sobel (right) kernels embedded in a 7×7 grid. These help enhance the vertical pulse shape while wiping away the horizontal RFI streaks.

Figure 3.3: The top left plot shows the waterfall plot of an RFI event (with no astrophysical pulse present). Similar to Figure 3.2, the remaining plots show the thirteen convolution kernels and their corresponding Layer 0 transformations on the original plot.

The classification probability is converted to a score between 0 and 10. For events detected in multiple beams, the final event score is determined by averaging the scores for each beam's waterfall plot. During the training procedure, each waterfall plot is treated independently and the model is optimised to give the correct score to each waterfall plot.
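A minimal PyTorch sketch of an FRBNet-style model, following the layer sizes described in Figure 3.1. The intermediate channel widths (ten channels throughout Layers 1 to 3) are assumptions, and the thirteen fixed Sobel/Prewitt/identity kernels of the real Layer 0 are stood in for here by a generic frozen 7×7 convolution:

```python
import torch
import torch.nn as nn

class FRBNetSketch(nn.Module):
    """Illustrative FRBNet-style model; not the thesis implementation."""
    def __init__(self):
        super().__init__()
        # Layer 0: thirteen fixed 7x7 kernels, frozen during training.
        self.layer0 = nn.Conv2d(1, 13, kernel_size=7)
        for p in self.layer0.parameters():
            p.requires_grad_(False)
        self.pool = nn.MaxPool2d(2)
        # Layers 1-3: trainable 5x5 convolutions with a stride of two.
        self.layer1 = nn.Conv2d(13, 10, kernel_size=5, stride=2)
        self.layer2 = nn.Conv2d(10, 10, kernel_size=5, stride=2)
        self.layer3 = nn.Conv2d(10, 10, kernel_size=5, stride=2)
        self.fc = nn.Linear(10, 2)                 # two-class output

    def forward(self, x):                          # x: (batch, 1, 256, 256)
        x = torch.relu(self.pool(self.layer0(x)))  # -> (batch, 13, 125, 125)
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer2(x))
        x = torch.relu(self.layer3(x))
        x = x.amax(dim=(2, 3))                     # global max-pool -> (batch, 10)
        return self.fc(x)                          # class logits (Equation 2.3)

scores = FRBNetSketch()(torch.rand(4, 1, 256, 256))
```

With these assumed widths the trainable parameters number well under ten thousand, occupying a few tens of kilobytes, consistent in scale with the roughly 45 KB quoted in Chapter 5.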
The averaging can help minimise errors in the final event classification due to classification errors in the individual beams' waterfall plots.

3.2 Generating Waterfall Plots

3.2.1 Automated Plotting Scripts

The called-back raw intensity data for an event is converted into a 256×256 pixel waterfall plot by intensityML via an automated plotting script which:

1. Dedisperses to the DM value calculated by L1.
2. Trims and down-samples the plot along its time-axis to give a total of 256 time samples.
3. Sub-bands the frequency-axis to give 256 frequency channels.
4. Scales each frequency channel independently by subtracting the median value and dividing by the standard deviation.

Figure 3.4 shows plots for typical examples of astrophysical and RFI events generated from this automated script. All of these events were classified as astrophysical by the CHIME/FRB pipeline. Note that DM-refinement and RFI-masking are not performed on any of these plots, as these are often difficult to automate and can be relatively time-consuming processes.

A key feature of these scripts is that they automatically down-sample plots based on their L1 parameters. L1's dedispersion algorithm finds parameters which are correlated with the burst width, and these are used by intensityML for automatic down-sampling. As a result, bursts which are wide and scattered appear narrow, as shown in Figure 3.5. This property is extremely useful to the FRBNet architecture.

Figure 3.4: Some examples of plots generated by the automated script for events that were classified as astrophysical by the CHIME/FRB pipeline. The top five plots are pulses from FRBs and known pulsars that were correctly classified as astrophysical by the pipeline. The pulses are not perfectly vertical as L1's DM search doesn't find the optimal DM value. The horizontal RFI streaks can also be seen on these plots due to the absence of RFI-masking.
The bottom five events were also classified as astrophysical by the pipeline, but these events are most likely RFI.

3.2.2 Data Augmentation

Data augmentation is a powerful technique that is commonly used when training image classifiers [19]. It involves generating extra modified copies of the training examples by rotating, flipping, translating, cropping, adding noise, etc. This technique significantly improves how well a trained model can generalise to new data. Three types of data augmentation strategy were used for training.

The first of these were the DM-augmented plots, where an event was plotted about ten times at different DM values uniformly sampled within the event's DM-error range. Figure 3.6 shows this for an FRB event. This augmentation can also be performed for RFI events.

The plotting scripts can also be used to generate blank-sky plots, where the time-window of an FRB's waterfall plot is shifted to give a plot with just noise. A weighted sum of this blank-sky plot and the FRB plot is then taken to produce a noisier FRB plot. This technique was also used by Zhang et al. in 2018 [46]. This process will be referred to as noise-augmentation, and Figure 3.7 shows its effect on the plots from Figure 3.6. The summation weight, ξ, was chosen to be a random number close to 0.6. This value was chosen as it gave the right balance between diluting the signal while still keeping it visible. There was also a 50% chance of flipping the blank-sky plot before summation in order to further increase the randomness introduced by this augmentation technique.

Figure 3.5: The figure shows the difference between the plots generated by the online-waterfaller (left) and intensityML (right) for a scattered FRB. There is no down-sampling performed by the online-waterfaller by default. intensityML, on the other hand, performs automatic down-sampling, which makes the burst appear narrow in the plot.

Noise-augmentation was performed only for high-SNR FRBs.
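The noise-augmentation step above can be sketched as follows. The text does not specify which plot the weight ξ multiplies or how much ξ is jittered around 0.6, so the weighting convention and the ±0.05 jitter below are assumptions:

```python
import numpy as np

def noise_augment(frb_plot, blank_sky, rng):
    """Blend an FRB waterfall plot with a blank-sky plot (cf. Figure 3.7)."""
    # 50% chance of flipping the blank-sky plot along its time axis.
    if rng.random() < 0.5:
        blank_sky = blank_sky[:, ::-1]
    xi = 0.6 + rng.uniform(-0.05, 0.05)   # "a random number close to 0.6"
    # Weighted sum; here xi is assumed to weight the blank-sky (noise) term.
    return (1.0 - xi) * frb_plot + xi * blank_sky

rng = np.random.default_rng(0)
augmented = noise_augment(np.ones((256, 256)), np.zeros((256, 256)), rng)
```

Applied to each DM-augmented copy of a bright burst, this yields a further factor-of-two expansion of those training examples with realistically diluted signals.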
For fainter FRBs, the signal may get too diluted by this effect. For pulsars, the blank-sky plots may contain additional pulses if the plotting window is shifted. For RFI events, the augmentation may not give a realistic plot.

The combination of the two aforementioned augmentation techniques can be used to multiply the size of the training set by about an order of magnitude. The size of the training set was further doubled by mirroring each plot along its frequency-axis (similar to Agarwal et al. [2]).

(a) Default waterfall plot.
(b) DM-augmentations of the waterfall plot.

Figure 3.6: The waterfall plots of an FRB and its ten DM-augmented counterparts generated by the plotting script. L1 typically finds a sub-optimal DM value, which is why the burst in (a) does not appear vertical.

(a) Noise-augmentation for an FRB waterfall plot.
(b) Noise-augmentation for all DM-augmented waterfall plots of the FRB.

Figure 3.7: Noise-augmentation for an FRB's waterfall plots. (a) shows how this is performed by taking a weighted sum of the default waterfall plot with the blank-sky plot, where ξ ∼ 0.6. There is also a probability of 0.5 of flipping the blank-sky plot along its time-axis before the addition. (b) shows this effect for all of the DM-augmented counterparts from Figure 3.6.

Chapter 4
Results

4.1 Training

The dataset used for training, validation and testing consisted of events from September 2018 to November 2019. The raw data for these events were processed into waterfall plots using intensityML. Events and plots which appeared ambiguous or corrupted were excluded.

The training set consisted of roughly 3000 called-back events from September 2018 to August 2019. Roughly half of these were from pulsars and FRBs, and the other half were from RFI. After applying augmentations, the astrophysical set consisted of about 57,000 waterfall plots, whereas the RFI set consisted of about 49,000.
Some blank-sky plots were also included in the RFI set to further improve the classifier's sensitivity to faint bursts. The validation set consisted of around 195 astrophysical events (pulsars and FRBs) and 95 RFI events from September 2019. All waterfall plots were standardised such that the brightest and dimmest pixels in each plot had values of 1 and 0 respectively.

PyTorch [30] was used to implement the FRBNet architecture and train it. The model was trained using the Adam optimiser [21], which is a variation of the gradient descent algorithm. The optimal model configuration was selected using the method of early-stopping [19], where the validation error was tracked during the optimisation process and the configuration with the lowest error was selected. The optimal configuration had 100% validation accuracy.

4.2 Results

The test set consisted of roughly 135 RFI events and 223 astrophysical events (pulsars and FRBs) from October and November 2019. Table 4.1 shows the accuracy, precision, recall and F1-score for the test set. The lower performance on the test set when compared to the validation set could be due to optimisation bias towards the latter, as its size is very small.

Table 4.1: Accuracy, precision, recall and F1-score computed on the test set.

Accuracy (%)   Precision (%)   Recall (%)   F1-Score (%)
99.2           99.1            99.6         99.3

The latest model has been deployed for real-time classification of called-back events via The CHIME/FRB Master. This also now allows users to view the plots generated by intensityML alongside the ones generated by the online-waterfaller. In several cases, users have reported that these plots have been significantly more discernible when compared to the default ones produced by the online-waterfaller.
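The metrics in Table 4.1 follow from the test set's confusion-matrix counts. The counts below are hypothetical (the thesis reports only the aggregate percentages), but they are chosen to be consistent with the stated test-set sizes and they round to the figures in Table 4.1:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts,
    treating 'astrophysical' as the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for a test set of 223 astrophysical and 135 RFI events:
# 1 missed astrophysical event and 2 RFI events labelled astrophysical.
acc, prec, rec, f1 = classification_metrics(tp=222, fp=2, fn=1, tn=133)
```

These counts give 99.2% accuracy, 99.1% precision, 99.6% recall and a 99.3% F1-score after rounding.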
Figure 4.1 shows the waterfall plots for a few FRB candidates that were discovered via the real-time deployment of intensityML.

The combination of improved plots and the means to automatically classify them proved to be extremely useful in discovering several potential FRB candidates from archival data that were initially misclassified as RFI by users. Figure 4.2 shows several examples of such candidates that were discovered through the latest model or through one of its preliminary versions. This also helped clear over 70 TB of disk-space from the archiver by deleting old unwanted event data.

Table 4.2 shows the L1 SNRs and DMs for the events shown in Figures 4.1 and 4.2. It can be seen that these events cover a wide range of SNRs and DMs, which suggests that the classifier may be capable of finding bursts over a wide range of parameters.

Table 4.2: L1 SNRs and DMs for events shown in Figures 4.1 and 4.2.

L1 SNR   L1 DM (cm^-3 pc)
12.1     179.5
10.3     1074.0
8.9      105.1
8.9      714.9
9.2      648.6
25.6     308.9
9.4      1184.0
10.6     669.6
9.6      334.8
11.1     975.4
8.7      706.9
12.9     681.0

Figure 4.1: Waterfall plots generated by intensityML (left) compared with the ones generated by the online-waterfaller (right) for some of the potential FRB candidates. These were discovered by users with the help of the real-time deployment.

Figure 4.2: Waterfall plots for some of the potential FRB candidates discovered from archival data that were initially misclassified as RFI by users. These were discovered by either visually inspecting plots generated by intensityML or with the help of the classifier.

Chapter 5
Discussion

5.1 Discussion and Future Work

5.1.1 Discussion

The preliminary analysis in this thesis presents an alternative CNN architecture that can give excellent performance when it comes to FRB searches. The FRBNet architecture presented here is simpler and more lightweight than the ones presented by Connor et al. [9] and Agarwal et al. [2], but exhibits comparable classification performance.
A simpler model with fewer input features and parameters takes significantly less time when it comes to training and classification. It also greatly reduces the memory required to store the parameters of the trained model. For example, the parameters of the models presented by Agarwal et al. have a size of at least 100 MB [2], whereas FRBNet's parameters occupy only about 45 KB. However, it should be stressed that the comparisons presented here are weak, as these models were trained and tested on completely different datasets.

Nevertheless, the ideas from this thesis, combined with the ones presented in the two papers, can provide key insights into building and training even better models in the future. The DM-augmentation technique could become a very powerful tool for generating additional training examples when using waterfall plots dedispersed at sub-optimal DMs. Moreover, building future architectures may also involve experimenting with different types of convolution operations in the very first layer.

Hybrid architectures like the ones presented by Connor et al. [9] can perhaps be optimised by pruning away redundant input features or by replacing the first 2D CNN with an architecture similar to that of FRBNet. Moreover, training deep CNN architectures that were successful on the ImageNet challenge may also not be necessary for this type of classification problem. Since waterfall plots contain much simpler features when compared to images of real-life objects from 1000 classes, using deeper networks to extract very high-level features may not be necessary as Agarwal et al. [2] claim. A relatively shallow network, as presented in this thesis, may be sufficient. Clearly, all of these ideas can be validated when working with even larger datasets as they become available.

5.1.2 Future Work

It should also be stressed that the specific attributes of the classifier are subject to change in the future.
The classifier presented in this thesis has been trained and tested on a relatively small dataset, and as CHIME/FRB's dataset grows, the combination of model architecture and optimisation strategy would most likely evolve too. The distribution of events is also highly dependent on ever-changing factors such as the beam sensitivity, the local RF environment and the performance of the upstream classifiers. Nonetheless, the key ideas presented here will still be very helpful in developing future classifiers.

Besides training and testing on larger datasets, other key future improvements to this work may involve exploring how to further optimise the architecture of FRBNet. For example, it may be worth investigating whether some of the kernels in Layer 0 could be pruned to reduce redundancy in its output channels. Having both Prewitt and Sobel filters, and their reflections, may be unnecessary. It would also be worth exploring the effects of replacing some of the vertical-edge detection kernels with horizontal-edge detection kernels, as this would increase the contrast in the layer's output channels and reduce redundancy.

The classifier may also be sped up further by adopting an architecture similar to that of the Inception module [44] for Layer 0, as shown in Figure 5.1. In this method, instead of padding the convolution kernels, the convolution operations of different sizes are performed by different branches, and the outputs are concatenated.

Figure 5.1: Diagram shows an example of an Inception module [44] for Layer 0 convolution.

However, the downside of this implementation is that it would be more challenging to parallelise.

5.1.3 Science Goals

The trained model's ability to efficiently classify events without explicit RFI-masking and DM-refinement makes it ideal for real-time event classification with the CHIME/FRB pipeline.
This is because the L1 tree-dedispersion algorithm usually finds a sub-optimal DM, and automating the process of finding the optimal DM is often a difficult task, especially in the presence of RFI contamination. Similarly, RFI-masking is also a challenging task to automate. These tasks can also be computationally expensive and can hinder the speed of real-time classification.

Currently, the intensityML plots and real-time classification have not fully replaced the need for manual classification, but instead complement this task. It will be crucial to train and test on much larger datasets before manual classification can be completely phased out. Another key piece of future work would involve training on events with SNR lower than the pipeline's default threshold. This would increase the sensitivity of the classifier further in the low-SNR range, which would eventually help lower the pipeline's SNR threshold and increase the overall FRB detection rate.

For several key scientific goals, it is crucial to miss as few FRBs as possible, and the intensityML plots and classifier would certainly assist with this challenge. An exhaustive catalogue of detected FRBs would improve estimates of CHIME's event detection rates and their variations with DM, sky-coordinates, fluence, morphological structure, etc. This will also help obtain better estimates of the properties of FRBs, such as their repetition rates.

The automatic real-time classifier can also be used to trigger baseband dumps [9]. An event's baseband data provides a higher temporal resolution when compared to the intensity data, which allows studying burst morphology in even greater detail [6]. Moreover, it also contains an event's polarisation information [15]. However, baseband data requires substantially more disk space and takes considerably longer to download. In order to minimise the risk of overloading the system with baseband dumps triggered by false positives, the real-time classifier can be used to efficiently control it.
This technique can also be applied to trigger data dumps from outrigger sites, which can help improve the localisation of FRBs [9].

Presently, the CHIME/FRB pipeline does not perform callbacks for galactic sources, as doing so would result in an overwhelming event-rate. However, an automated classifier can easily fulfill this role with minimal human supervision. This would allow CHIME/FRB to detect a large number of pulses from known galactic pulsars, which could help characterise the properties of individual pulses and provide better estimates of their pulse statistics. The automated classifier may also help CHIME discover new pulsars.

5.2 Conclusion

This thesis presented a detailed description of the intensityML pipeline, designed to generate plots and automatically classify events for the CHIME/FRB system. The CNN classifier used by intensityML has a unique architecture when compared to other classifiers used in this field, and it has been trained exclusively on events collected by CHIME, without using any simulated events, DM-refinement or RFI-masking. The training phase also utilised a novel data-augmentation technique, DM-augmentation. The combination of intensityML's new plotting routine as well as its trained classifier helped discover several FRBs, both in real-time and from archival data. The work presented in this thesis encourages further exploration in refining future CNN architectures designed for FRB classification. Having a large and comprehensive catalogue of FRBs will be a major step in answering the key scientific questions behind the source of FRBs, and automated classifiers will play a key role in accomplishing this task.

Bibliography

[1] Sobel filter kernel of large size. URL https://stackoverflow.com/questions/9567882/sobel-filter-kernel-of-large-size. Accessed: 2019-10-01.

[2] D. Agarwal, K. Aggarwal, S. Burke-Spolaor, D. R. Lorimer, and N. Garver-Daniels. Towards deeper neural networks for Fast Radio Burst detection, 2019.

[3] M. Amiri, K.
Bandura, P. Berger, J. R. Bond, J. F. Cliche, L. Connor, M. Deng, N. Denman, M. Dobbs, R. S. Domagalski, and et al. Limits on the Ultra-bright Fast Radio Burst Population from the CHIME Pathfinder. The Astrophysical Journal, 844(2):161, Aug 2017. ISSN 1538-4357. doi:10.3847/1538-4357/aa713f. URL http://dx.doi.org/10.3847/1538-4357/aa713f.

[4] M. Amiri, K. Bandura, P. Berger, M. Bhardwaj, M. M. Boyce, P. J. Boyle, C. Brar, M. Burhanpurkar, P. Chawla, and et al. The CHIME Fast Radio Burst Project: System Overview. The Astrophysical Journal, 863(1):48, Aug 2018. ISSN 1538-4357. doi:10.3847/1538-4357/aad188. URL http://dx.doi.org/10.3847/1538-4357/aad188.

[5] M. Amiri, K. Bandura, M. Bhardwaj, et al. Observations of Fast Radio Bursts at Frequencies Down to 400 Megahertz. Nature, 566(7743):230–234, Jan 2019. ISSN 1476-4687. doi:10.1038/s41586-018-0867-7. URL http://dx.doi.org/10.1038/s41586-018-0867-7.

[6] B. Andersen, K. Bandura, M. Bhardwaj, P. Boubel, M. Boyce, P. Boyle, C. Brar, T. Cassanelli, P. Chawla, D. Cubranic, et al. CHIME/FRB Discovery of Eight New Repeating Fast Radio Burst Sources. The Astrophysical Journal Letters, 885(1):L24, 2019.

[7] A. Calderón, S. Roa, and J. Victorino. Handwritten digit recognition using convolutional neural networks and Gabor filters. Proc. Int. Congr. Comput. Intell, 2003.

[8] D. J. Champion, E. Petroff, M. Kramer, M. J. Keith, M. Bailes, E. D. Barr, S. D. Bates, N. D. R. Bhat, M. Burgay, S. Burke-Spolaor, and et al. Five new fast radio bursts from the HTRU high-latitude survey at Parkes: first evidence for two-component bursts. Monthly Notices of the Royal Astronomical Society: Letters, 460(1):L30–L34, Apr 2016. ISSN 1745-3933. doi:10.1093/mnrasl/slw069. URL http://dx.doi.org/10.1093/mnrasl/slw069.

[9] L. Connor and J. van Leeuwen. Applying Deep Learning to Fast Radio Burst Classification. The Astronomical Journal, 156(6):256, Nov 2018. ISSN 1538-3881. doi:10.3847/1538-3881/aae649. URL http://dx.doi.org/10.3847/1538-3881/aae649.

[10] J. M. Cordes and T. J. W. Lazio. NE2001.I. A New Model for the Galactic Distribution of Free Electrons and its Fluctuations, 2002.

[11] C. Cortes and V. Vapnik. Support-Vector Networks. Mach. Learn., 20(3):273–297, Sept. 1995. ISSN 0885-6125. doi:10.1023/A:1022627411411. URL https://doi.org/10.1023/A:1022627411411.

[12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.

[13] V. I. Dokuchaev and Y. N. Eroshenko. Recurrent Fast Radio Bursts from Collisions of Neutron Stars in the Evolved Stellar Clusters, 2017.

[14] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, KDD'96, pages 226–231. AAAI Press, 1996. URL http://dl.acm.org/citation.cfm?id=3001460.3001507.

[15] E. Fonseca, B. Andersen, M. Bhardwaj, P. Chawla, D. Good, A. Josephy, V. Kaspi, K. Masui, R. Mckinven, D. Michilli, et al. Nine New Repeating Fast Radio Burst Sources from CHIME/FRB. arXiv preprint arXiv:2001.03595, 2020.

[16] J. Friedman, T. Hastie, and R. Tibshirani. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, 2009. ISBN 978-0387848570.

[17] V. Gajjar, A. P. V. Siemion, D. C. Price, C. J. Law, D. Michilli, J. W. T. Hessels, S. Chatterjee, A. M. Archibald, G. C. Bower, C. Brinkman, and et al. Highest frequency detection of FRB 121102 at 4–8 GHz using the Breakthrough Listen digital backend at the Green Bank Telescope. The Astrophysical Journal, 863(1):2, Aug 2018. ISSN 1538-4357. doi:10.3847/1538-4357/aad005. URL http://dx.doi.org/10.3847/1538-4357/aad005.

[18] G. Golpayegani, D. R. Lorimer, S. W. Ellingson, D. Agarwal, O. Young, F. Ghigo, R. Prestage, K. Rajwade, M. A. McLaughlin, and M. Mingyar. GBTrans: A commensal search for radio pulses with the Green Bank twenty metre telescope. Monthly Notices of the Royal Astronomical Society, Sep 2019. ISSN 1365-2966. doi:10.1093/mnras/stz2424. URL http://dx.doi.org/10.1093/mnras/stz2424.

[19] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[20] P. Guo, F. Duan, P. Wang, Y. Yao, Q. Yin, and X. Xin. Pulsar Candidate Identification with Artificial Intelligence Techniques, 2017.

[21] D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization, 2014.

[22] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput., 1(4):541–551, Dec. 1989. ISSN 0899-7667. doi:10.1162/neco.1989.1.4.541. URL http://dx.doi.org/10.1162/neco.1989.1.4.541.

[23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov 1998. ISSN 1558-2256. doi:10.1109/5.726791.

[24] D. R. Lorimer and M. Kramer. Handbook of Pulsar Astronomy, volume 4. 2004.

[25] D. R. Lorimer, M. Bailes, M. A. McLaughlin, D. J. Narkevic, and F. Crawford. A Bright Millisecond Radio Burst of Extragalactic Origin. Science, 318(5851):777–780, Nov 2007. ISSN 1095-9203. doi:10.1126/science.1147532. URL http://dx.doi.org/10.1126/science.1147532.

[26] R. N. Manchester, G. B. Hobbs, A. Teoh, and M. Hobbs. The Australia Telescope National Facility Pulsar Catalogue. The Astronomical Journal, 129(4):1993–2006, Apr 2005. ISSN 1538-3881. doi:10.1086/428488. URL http://dx.doi.org/10.1086/428488.

[27] B. D. Metzger, B. Margalit, and L. Sironi. Fast radio bursts as synchrotron maser emission from decelerating relativistic blast waves. Monthly Notices of the Royal Astronomical Society, 485(3):4091–4106, Mar 2019. ISSN 1365-2966. doi:10.1093/mnras/stz700. URL http://dx.doi.org/10.1093/mnras/stz700.

[28] L. B. Newburgh, G. E. Addison, M. Amiri, K. Bandura, J. R. Bond, L. Connor, J.-F. Cliche, G. Davis, M. Deng, N. Denman, and et al. Calibrating CHIME: A New Radio Interferometer to Probe Dark Energy. Ground-based and Airborne Telescopes V, Jul 2014. doi:10.1117/12.2056962. URL http://dx.doi.org/10.1117/12.2056962.

[29] T. Oosterloo, M. Verheijen, W. van Cappellen, L. Bakker, G. Heald, and M. Ivashina. Apertif - the focal-plane array system for the WSRT, 2009.

[30] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019.

[31] E. Petroff, M. Bailes, E. D. Barr, B. R. Barsdell, N. D. R. Bhat, F. Bian, S. Burke-Spolaor, M. Caleb, D. Champion, P. Chandra, and et al. A real-time fast radio burst: polarization detection and multiwavelength follow-up. Monthly Notices of the Royal Astronomical Society, 447(1):246–255, Dec 2014. ISSN 0035-8711. doi:10.1093/mnras/stu2419. URL http://dx.doi.org/10.1093/mnras/stu2419.

[32] E. Petroff, E. D. Barr, A. Jameson, E. F. Keane, M. Bailes, M. Kramer, V. Morello, D. Tabbara, and W. van Straten. FRBCAT: The Fast Radio Burst Catalogue. Publications of the Astronomical Society of Australia, 33, 2016. ISSN 1448-6083. doi:10.1017/pasa.2016.35. URL http://dx.doi.org/10.1017/pasa.2016.35.

[33] E. Petroff, S. Burke-Spolaor, E. F. Keane, M. A. McLaughlin, R. Miller, I. Andreoni, M. Bailes, E. D. Barr, S. R. Bernard, S. Bhandari, and et al. A polarized fast radio burst at low galactic latitude. Monthly Notices of the Royal Astronomical Society, May 2017. ISSN 1365-2966. doi:10.1093/mnras/stx1098. URL http://dx.doi.org/10.1093/mnras/stx1098.

[34] E. Petroff, J. W. T. Hessels, and D. R. Lorimer. Fast Radio Bursts. The Astronomy and Astrophysics Review, 27(1), May 2019. ISSN 1432-0754. doi:10.1007/s00159-019-0116-6. URL http://dx.doi.org/10.1007/s00159-019-0116-6.

[35] E. Platts, A. Weltman, A. Walters, S. Tendulkar, J. Gordin, and S. Kandhai. A living theory catalogue for fast radio bursts. Physics Reports, 821:1–27, Aug 2019. ISSN 0370-1573. doi:10.1016/j.physrep.2019.06.003. URL http://dx.doi.org/10.1016/j.physrep.2019.06.003.

[36] J. Prewitt and B. Lipkin. "Object Enhancement and Extraction". In Picture Processing and Psychopictorics. Academic Press, 1970.

[37] K. M. Rajwade, D. Agarwal, D. R. Lorimer, N. M. Pingel, D. J. Pisano, M. Ruzindana, B. Jeffs, K. F. Warnick, D. A. Roshi, and M. A. McLaughlin. A 21 cm pilot survey for pulsars and transients using the Focal L-Band Array for the Green Bank Telescope. Monthly Notices of the Royal Astronomical Society, 489(2):1709–1718, Aug 2019. ISSN 1365-2966. doi:10.1093/mnras/stz2207. URL http://dx.doi.org/10.1093/mnras/stz2207.

[38] S. S. Sarwar, P. Panda, and K. Roy. Gabor filter assisted energy efficient fast learning Convolutional Neural Networks. 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Jul 2017. doi:10.1109/islped.2017.8009202. URL http://dx.doi.org/10.1109/ISLPED.2017.8009202.

[39] R. M. Shannon, J.-P. Macquart, K. W. Bannister, R. D. Ekers, C. W. James, S. Osłowski, H. Qiu, M. Sammons, A. Hotan, M. A. Voronkov, R. J. Beresford, M. Brothers, A. J. Brown, J. D. Bunton, A. Chippendale, C. Haskins, M. Leach, M. Marquarding, D. McConnell, M. Pilawa, E. M. Sadler, E. Troup, J. Tuthill, M. T. Whiting, J. R. Allison, C. S. Anderson, M. E. Bell, J. D. Collier, G. Gürkan, G.
Heald, and C. J. Riseley. Thedispersion–brightness relation for fast radio bursts from a wide-field survey.Nature, 562:386–390, 2018. → page 10[40] I. Sobel. An Isotropic 3x3 Image Gradient Operator. Presentation atStanford A.I. Project 1968, 02 2014. → page 19[41] L. G. Spitler, P. Scholz, J. W. T. Hessels, S. Bogdanov, A. Brazier, F. Camilo,S. Chatterjee, J. M. Cordes, F. Crawford, J. Deneva, and et al. A repeatingfast radio burst. Nature, 531(7593):202–205, Mar 2016. ISSN 1476-4687.doi:10.1038/nature17168. URL http://dx.doi.org/10.1038/nature17168. →page 2[42] L. Staveley-Smith, W. Wilson, T. Bird, M. Disney, R. Ekers, K. Freeman,R. Haynes, M. Sinclair, R. Vaile, R. Webster, et al. The Parkes 21 cmmultibeam receiver. Publications of the Astronomical Society of Australia,13(3):243–248, 1996. → page 2[43] M. P. Surnis, D. Agarwal, D. R. Lorimer, X. Pei, G. Foster, A. Karastergiou,G. Golpayegani, R. J. Maddalena, S. White, W. Armour, and et al.GREENBURST: A commensal Fast Radio Burst search back-end for theGreen Bank Telescope. Publications of the Astronomical Society ofAustralia, 36, 2019. ISSN 1448-6083. doi:10.1017/pasa.2019.26. URLhttp://dx.doi.org/10.1017/pasa.2019.26. → page 10[44] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. InProceedings of the IEEE conference on computer vision and patternrecognition, pages 1–9, 2015. → pages xii, 34, 35[45] D. Thornton, B. Stappers, M. Bailes, B. Barsdell, S. Bates, N. D. R. Bhat,M. Burgay, S. Burke-Spolaor, D. J. Champion, P. Coster, and et al. Apopulation of fast radio bursts at cosmological distances. Science, 341(6141):53–56, Jul 2013. ISSN 1095-9203. doi:10.1126/science.1236789.URL http://dx.doi.org/10.1126/science.1236789. → page 10[46] Y. G. Zhang, V. Gajjar, G. Foster, A. Siemion, J. Cordes, C. Law, andY. Wang. Fast Radio Burst 121102 Pulse Detection and Periodicity: AMachine Learning Approach. 
The Astrophysical Journal, 866(2):149, Oct2018. ISSN 1538-4357. doi:10.3847/1538-4357/aadf31. URLhttp://dx.doi.org/10.3847/1538-4357/aadf31. → pages 8, 10, 2443[47] W. W. Zhu, A. Berndsen, E. C. Madsen, M. Tan, I. H. Stairs, A. Brazier,P. Lazarus, R. Lynch, P. Scholz, K. Stovall, and et al. Searching for pulsarsusing image pattern recognition. The Astrophysical Journal, 781(2):117, Jan2014. ISSN 1538-4357. doi:10.1088/0004-637x/781/2/117. URLhttp://dx.doi.org/10.1088/0004-637X/781/2/117. → page 844