Open Collections

UBC Theses and Dissertations

Machine learning based prediction of repetitive transcranial magnetic stimulation treatment outcome in… Liu, Xiang 2020

MACHINE LEARNING BASED PREDICTION OF REPETITIVE TRANSCRANIAL MAGNETIC STIMULATION TREATMENT OUTCOME IN PATIENTS WITH TREATMENT-RESISTANT DEPRESSION

by

Xiang Liu

B.Sc., The University of British Columbia, 2018

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Experimental Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

July 2020

© Xiang Liu, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, a thesis entitled: Machine Learning Based Prediction of Repetitive Transcranial Magnetic Stimulation Treatment Outcome in Patients with Treatment-Resistant Depression, submitted by Xiang Liu in partial fulfillment of the requirements for the degree of Master of Science in Experimental Medicine.

Examining Committee:
Dr. Fidel Vila-Rodriguez, Assistant Professor, Department of Psychiatry, UBC (Supervisor)
Dr. Z. Jane Wang, Professor, Department of Electrical and Computer Engineering, UBC (Co-supervisor)
Dr. Sara Mostafavi, Assistant Professor, Department of Statistics and Department of Medical Genetics, UBC (Supervisory Committee Member)
Dr. Rebecca Todd, Associate Professor, Department of Psychology, UBC (Additional Examiner)

Abstract

Major depressive disorder (MDD) is a highly prevalent psychiatric disorder that affects millions of people. Repetitive transcranial magnetic stimulation (rTMS) has been recommended as a safe, reliable, non-invasive neurostimulation therapy option for treatment-resistant depression (TRD). The effectiveness of rTMS treatment varies among individuals; thus, predicting responsiveness to rTMS treatment can reduce unnecessary expenses and improve treatment capacity.
In this study, we combined machine learning models with depression rating scales, clinical variables, and demographic data to predict the outcomes and effectiveness of rTMS treatment for TRD patients. Using the clinical data of 356 TRD patients who each received 20 to 30 sessions of rTMS treatment over a 4-6-week period, we examined the predictive value of different depression rating scales and models for various prediction outcomes, at multiple time points. Our optimal baseline models, built with the Elastic Net, achieved area under the curve (AUC) values of 0.634 and 0.735 for treatment response and remission prediction, respectively. In the longitudinal analysis, using baseline data and early treatment outcomes from weeks 1–3, all predictive values improved compared with the baseline models. In addition, predicting the percentage of symptom improvement was also feasible using longitudinal treatment outcomes, achieving coefficients of determination of 0.277 at the end of week 1 and 0.464 at the end of week 3. We found that the use of depression rating subscales, combined with clinical and demographic data, including anxiety severity, employment status, age, gender, and education level, may produce higher accuracy at baseline. In the longitudinal analysis, the total scores of the depression rating scales were the most significant predictors, allowing prediction models to be built using only the total scores, which resulted in high predictive value and interpretability.

This work presented a convenient and economical approach for the prediction of rTMS treatment outcomes in TRD patients, using pre-treatment clinical and demographic data alone, without requiring expensive biomarker data. The predictive value was further enhanced by adding longitudinal treatment outcomes. This approach could plausibly be utilized in clinical practice for individualized treatment selection, leading to better rTMS treatment outcomes for TRD patients.
Lay Summary

Repetitive transcranial magnetic stimulation (rTMS) is currently a first-line option for the treatment of patients with treatment-resistant depression (TRD). Predicting rTMS treatment outcomes for individual patients could help avoid unnecessary expenses and improve rTMS treatment availability. Using a large clinical dataset, this work demonstrated a method for predicting rTMS treatment outcomes, based on machine learning techniques, which requires only affordable, fast, and easily acquired clinical and demographic information. The baseline models could predict the individual outcome of rTMS therapy before any rTMS treatment is performed, saving up to 6 weeks of unnecessary treatment. After 1–3 weeks of rTMS treatment, the longitudinal models could generate more accurate predictions based on early treatment outcomes. This work could plausibly be implemented to assist in the development of personalized antidepressive treatment plans.

Preface

The idea for predicting repetitive transcranial magnetic stimulation (rTMS) treatment outcomes using machine learning was conceived by my supervisor, Dr. Fidel Vila-Rodriguez. This study is a comprehensive extension of Dana Bazazeh's thesis project, which was supervised by Dr. Z. Jane Wang and Dr. Fidel Vila-Rodriguez: Bazazeh, Dana. 2018. "Artificial Neural Network Based Prediction of Treatment Response to Repetitive Transcranial Magnetic Stimulation for Major Depressive Disorder Patients." University of British Columbia. My work improved the performance and applicability of the predictive model by adopting interpretable machine learning techniques, introducing various depression rating scales, and extending the baseline model to a longitudinal model. This thesis is independent, original, unpublished work by me. I designed and implemented all of the necessary code to perform the machine learning analysis and to visualize the results. Dr.
Fidel Vila-Rodriguez shared the research data released from the THREE-D trial and provided overall guidance for this study. Dr. Z. Jane Wang and Dr. Sara Mostafavi contributed to the machine learning analysis and other methodological aspects. The Clinical Research Ethics Board (CREB) approved this study (CREB number H13-02340). In Chapters 2 and 3, figures and tables were re-used, with some modifications, from our manuscript in preparation, of which I am the first author and for which I created these figures and tables.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Chapter 1: Introduction and Background
  1.1 Overview
  1.2 Depression and Antidepressive Treatment
  1.2.1 MDD and TRD
  1.2.2 Depression Diagnosis and Antidepressive Treatment
  1.3 Machine Learning (ML)
  1.3.1 Theory of Learning
  1.3.2 Supervised Learning
  1.4 Research Questions
  1.5 Objective
  1.6 Contribution
  1.7 Related Work
  1.8 Thesis Organization
Chapter 2: Data and Methods
  2.1 THREE-D Study
  2.1.1 Participants
  2.1.2 Study Procedure
  2.2 Dataset
  2.2.1 Treatment Response, Remission, and Symptom Improvement
  2.2.2 Depression Rating Scales
  2.2.3 Demographic and Other Baseline Clinical Data
  2.2.4 Predictor Selection
  2.3 Exploratory Data Analysis
  2.4 ML Analysis
  2.4.1 From Linear Regression to Logistic Regression
  2.4.2 From L1, L2 Regularization to Elastic Net Regularization
  2.4.3 From Decision Tree to Random Forest
  2.4.4 Binary Classification Model
  2.4.5 Regression Model
  2.5 Model Tuning and Performance Evaluation Pipeline
  2.5.1 Receiver Operating Characteristic (ROC) Curve and AUC
  2.5.2 Coefficient of Determination (R² score)
  2.5.3 Permutation Test
Chapter 3: Results
  3.1 Treatment Response Prediction
  3.1.1 HRSD-C-17 Treatment Response Prediction
  3.1.2 IDS-C-30 Treatment Response Prediction
  3.1.3 QIDS-SR-16 Treatment Response Prediction
  3.2 Treatment Remission Prediction
  3.2.1 HRSD-C-17 Treatment Remission Prediction
  3.2.2 IDS-C-30 Treatment Remission Prediction
  3.2.3 QIDS-SR-16 Treatment Remission Prediction
  3.3 Symptom Improvement Prediction
  3.3.1 HRSD-C-17 Symptom Improvement Prediction
  3.3.2 IDS-C-30 Symptom Improvement Prediction
  3.3.3 QIDS-SR-16 Symptom Improvement Prediction
  3.4 Feature Importance Analysis
Chapter 4: Discussion and Conclusion
  4.1 Discussion
  4.1.1 Baseline Models
  4.1.2 Longitudinal Models
  4.1.3 Limitations and Future Research Directions
  4.2 Conclusion
Bibliography
Appendix
  A: Accuracy, sensitivity, and specificity of stage 2 clinical response and remission prediction models
  B: Feature coefficients plots of optimal baseline stage 2 models

List of Tables

Table 1.1: Summary of previous studies. Part 1 lists all of the rTMS treatment-associated outcome-prediction studies, and part 2 includes additional antidepressive treatment outcome prediction studies, performed using clinical data.
Table 2.1: Treatment response and remission definitions for the HRSD-C-17, IDS-C-30, and QIDS-SR-16.
Table 2.2: Summary of the HRSD-C-17 items, with a literature-based subscale for each item.
Table 2.3: Summary of the IDS-C-30 items, with a literature-based subscale for each item. In clinical practice, only 28 out of 30 items would be valid, because only one of items 11 and 12 and one of items 13 and 14 should be answered.
Table 2.4: Summary of the QIDS-SR-16 items, with a literature-based subscale for each item.
Table 2.5: Summary and statistics of demographic, ongoing treatment, and other clinical data, collected at baseline.
Table 2.6: Summary and statistics for the baseline BSI-A items.
Table 2.7: Summary of the number of features used in our model.
Table 2.8: Treatment response and remission rates and the averaged percentage of symptom improvement, as assessed by the three depression rating scales.
Table 2.9: Confusion matrix.

List of Figures

Figure 1.1: Number of ML publications collected in the PubMed database from 2000 to 2019. This bar chart illustrates that ML has had significant impacts on medical research.
Figure 1.2: Supervised learning notations.
This figure demonstrates an abstraction of a single-label supervised learning problem. Features and labels are also referred to as explanatory variables and response variables.
Figure 2.1: The numbers of missing records for the three depression rating scales at each time point.
Figure 2.2: The proportions of responders (upper) and remitters (lower) defined by three different scales. This figure demonstrates that the definitions for treatment response and remission were highly consistent among the three depression rating scales.
Figure 2.3: Scatter plots for three depression rating scales and histograms, showing the percentage of symptom improvement for each depression rating scale. This figure demonstrates that the percentages of symptom improvement were highly consistent across all three scales.
Figure 2.4: Sigmoid function.
Figure 2.5: An example of a classification decision tree, built using our dataset.
Figure 2.6: Proposed framework for our ML analysis. Upper: cross-validation. Lower: training the final model and extracting feature importance.
Figure 3.1: AUC, accuracy, sensitivity, and specificity for treatment response prediction models that were built during the stage 1 analysis. Models were based on LRC and were trained using longitudinal total scores of depression rating scales. All the results were validated through repeated stratified cross-validation.
Figure 3.2: AUC for the clinical response prediction models that were built during the stage 2 analysis. The models were based on LRC/ENC/RFC and were trained using longitudinal individual items (set 1)/subscales (set 2)/total scores (set 3) of depression rating scales, combined with additional baseline clinical and demographic data. The models were validated through repeated stratified cross-validation.
Figure 3.3: AUC, accuracy, sensitivity, and specificity for treatment remission prediction models that were built during the stage 1 analysis. Models were based on LRC and were trained using longitudinal total scores of depression rating scales. All the results were validated through repeated stratified cross-validation.
Figure 3.4: AUC for the clinical remission prediction models that were built during the stage 2 analysis. The models were based on LRC/ENC/RFC and were trained using longitudinal individual items (set 1)/subscales (set 2)/total scores (set 3) of depression rating scales, combined with additional baseline clinical and demographic data. The models were validated through repeated stratified cross-validation.
Figure 3.5: R² score for symptom improvement prediction models that were built during the stage 1 analysis. Models were based on LRR and were trained using longitudinal total scores of depression rating scales. All the results were validated through repeated cross-validation.
Figure 3.6: R² score for symptom improvement prediction models that were built during the stage 2 analysis.
The models were based on LRR/ENR/RFR and were trained using longitudinal individual items (set 1)/subscales (set 2)/total scores (set 3) of depression rating scales, combined with additional baseline clinical and demographic data. The models were validated through repeated cross-validation.
Figure 3.7: Feature coefficients for the stage 1 models (trained using the depression rating scales total scores). Treatment response and remission prediction models were built with LRC. Symptom improvement models were built with LRR.
Figure 3.8: Top 10 features extracted from the Elastic Net models, trained using the longitudinal total scores of depression rating scales and baseline clinical and demographic variables.
Figure 3.9: Top 10 features extracted from the Random Forest models, trained using the longitudinal total scores of depression rating scales and baseline clinical and demographic variables.
List of Abbreviations

ATHF  Antidepressant Treatment History Form
AUC  area under the curve
BSI-A  Brief Symptom Inventory-Anxiety Subscale
DL  deep learning
DLPFC  dorsolateral prefrontal cortex
DSM  Diagnostic and Statistical Manual of Mental Disorders
ECT  electroconvulsive therapy
EEG  electroencephalography
ENC  Elastic Net regularized logistic regression classifier
ENR  Elastic Net regularized linear regression regressor
FDA  US Food and Drug Administration
fMRI  functional magnetic resonance imaging
FPR  false positive rate
HRSD  Hamilton Rating Scale for Depression
HRSD-C-17  17-item Clinician-Administered Hamilton Rating Scale for Depression
IDS-C-30  30-item Clinician-Administered Inventory of Depressive Symptoms
IID  independent and identically distributed
iTBS  intermittent theta burst stimulation
LRC  logistic regression classifier
LRR  linear regression regressor
MDD  major depressive disorder
ML  machine learning
NPV  negative predictive value
OLS  ordinary least squares linear regression
PCA  principal component analysis
PPV  positive predictive value
QIDS-SR-16  16-item Self-Rated Quick Inventory of Depressive Symptoms
R² score  coefficient of determination
RFC  Random Forest classifier
RFR  Random Forest regressor
ROC  receiver operating characteristic
rTMS  repetitive transcranial magnetic stimulation
SD  standard deviation
sMRI  structural magnetic resonance imaging
SVM  support vector machine
TPR  true positive rate
TRD  treatment-resistant depression

Acknowledgements

First and foremost, I would like to express my sincere gratitude to my supervisors, Dr. Fidel Vila-Rodriguez and Dr. Z. Jane Wang, for providing continuous guidance and supervision of my research. It has been my fortune and honour to work with them since the fourth year of my undergraduate program. I would also like to thank my supervisory committee member, Dr.
Sara Mostafavi, for her invaluable input on machine learning and statistics for this thesis. I want to express special thanks to my mentor in the lab, Dr. Ruiyang Ge, for his assistance and guidance on my projects.

I would also like to thank Dr. Renata Menezes, Elizabeth Gregory, Dr. David Long, and all other members of the Non-Invasive Neurostimulation Therapies (NINET) Laboratory and Dr. Z. Jane Wang's research group, for being a lovely and friendly family and offering help in many ways all the time. I would like to thank all faculty members and friends who helped me during my six-year academic journey at the University of British Columbia.

Finally, I wish to acknowledge the great love and sacrifices of my parents and family, who raised me and supported me without reservation. They are the teachers of my life, and I will be forever grateful for their efforts.

Chapter 1: Introduction and Background

1.1 Overview

Depressive disorder impacts millions of people worldwide. Repetitive transcranial magnetic stimulation (rTMS) is an effective, reliable, non-invasive treatment for treatment-resistant depression (TRD). However, rTMS is not covered by insurance in most Canadian provinces, and one conventionally delivered 10-Hz treatment session can last about 30 to 40 minutes (Mendlowitz et al. 2019). The high treatment costs and the shortage of medical resources emphasize the importance of developing precise, personalized treatment plans. In the current age of Big Data, researchers have expressed increasing interest in applying machine learning (ML) techniques to psychiatric studies (Iniesta, Stahl, and McGuffin 2016). Interpretable and robust ML models are important for medical research. Numerous ML models from the statistical learning family, especially linear models and tree-based models, have been widely developed and accepted in psychiatric research (Iniesta, Stahl, and McGuffin 2016).
A clinical dataset with a reasonable sample size is required to train ML models and obtain robust results with the potential to be applied in clinical practice. Performing a large-sample rTMS trial requires considerable effort, and the community has been progressing towards collaboration and the sharing of research data. The THREE-D trial is a multicenter, randomized, non-inferiority clinical trial, providing clinical and demographic information for 414 TRD patients who received 4–6 weeks of rTMS treatment between 2013 and 2016 in three Canadian university hospitals (Blumberger et al. 2018). Our primary goal was to develop a clinical tool that could predict rTMS outcomes in TRD patients using clinical and demographic data that are widely available under current clinical practice. We proposed the following question: can we develop an ML model to predict rTMS treatment outcomes using clinical and demographic data? Using information collected at baseline and during the early stages of rTMS treatment, we hypothesized that depression rating scales and demographic data could predict treatment response, remission, and improvement for individual subjects. We first built logistic regression and linear regression models using the total scores of the 17-item Clinician-Administered Hamilton Rating Scale for Depression (HRSD-C-17), the 30-item Clinician-Administered Inventory of Depressive Symptoms (IDS-C-30), and the 16-item Self-Rated Quick Inventory of Depressive Symptoms (QIDS-SR-16). Then, we examined the predictive values of individual items and extracted subscales from the HRSD-C-17, IDS-C-30, and QIDS-SR-16, combined with additional demographic and clinical data, such as age, gender, stimulation intensity, and the Brief Symptom Inventory-Anxiety Subscale (BSI-A).
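As a rough illustration of what such a baseline classifier looks like in code, the sketch below fits a logistic regression on synthetic stand-in predictors with scikit-learn. This is a minimal sketch, not the thesis implementation; all feature values here are made up, and only the feature names (a scale total score, age, anxiety severity) echo the predictors described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 356  # sample size matching the study cohort

# Synthetic stand-ins for baseline predictors (real features would come
# from the THREE-D data): HRSD-C-17 total score, age, anxiety severity.
X = np.column_stack([
    rng.integers(18, 40, n),   # baseline total score (hypothetical range)
    rng.integers(18, 66, n),   # age
    rng.normal(1.0, 0.5, n),   # BSI-A-like anxiety severity (illustrative)
])
y = rng.integers(0, 2, n)      # 1 = responder, 0 = non-responder (random here)

# Stratified hold-out split, fit, and AUC evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"hold-out AUC: {auc:.3f}")  # near chance on these random labels
```

With real baseline predictors and outcomes in place of the random arrays, the same few lines produce the kind of baseline AUC figures reported in the Abstract.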
We also tested two additional classification models (logistic regression with Elastic Net regularization and the Random Forest classifier) and two additional regression models (linear regression with Elastic Net regularization and the Random Forest regressor) for the binary and continuous prediction tasks, respectively. We paired each depression rating scale with its corresponding clinical outcomes and determined the performance for the prediction of clinical response, remission, and symptom improvement through repeated cross-validation analysis. We performed label permutation tests for all models and, finally, analyzed the relative feature importance.

1.2 Depression and Antidepressive Treatment

Depression is a common mental disorder; the World Health Organization estimates that more than 300 million people experience depression globally (World Health Organization 2017). According to the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5), an essential feature of major depressive disorder (MDD) is a depressed mood or a loss of interest or pleasure in nearly all activities (American Psychiatric Association 2013). Depression can lower patients' quality of life and work productivity and, more seriously, can lead to suicide. Depressive disorder was one of the three leading contributors to years lived with disability in 2017 (James et al. 2018). Depression symptoms vary among individuals, and different categories of depression exist, including MDD, recurrent depressive disorder, bipolar disorder, seasonal affective disorder, and psychotic depression. Depression rating scales, which can be completed by either clinicians or patients, are widely used to diagnose depression and assess the severity of depressive symptoms.
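To make concrete how these scales feed into outcome definitions, the short sketch below computes the percentage of symptom improvement from two total scores and applies the 50%-or-greater response convention described in Section 1.2.2. These are illustrative helper functions, not part of the thesis code; the example scores are hypothetical.

```python
def percent_improvement(baseline_total: float, followup_total: float) -> float:
    """Percentage reduction in a depression rating scale total score."""
    return 100.0 * (baseline_total - followup_total) / baseline_total

def is_responder(baseline_total: float, followup_total: float) -> bool:
    """Treatment response: a 50% or greater reduction in the total score.
    (Remission criteria, by contrast, tend to be scale-specific.)"""
    return percent_improvement(baseline_total, followup_total) >= 50.0

# A hypothetical patient scoring 24 at baseline and 10 after treatment:
print(percent_improvement(24, 10))  # ~58.3% improvement
print(is_responder(24, 10))         # True
```

The continuous quantity (percent improvement) and the binary label (responder or not) computed here correspond to the regression and classification targets used throughout the thesis.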
Professional health providers recommend different antidepressive treatments for different patients, based on the severity of depression, the length of episodes, treatment history, and personal preferences. Typical treatment options include psychotherapy, medications, and brain stimulation therapies (Khan et al. 2012; Gartlehner et al. 2017; Kennedy et al. 2011; Morishita et al. 2014).

1.2.1 MDD and TRD

According to the 2012 Canadian Community Health Survey, 4.7% of Canadians older than 15 years met the criteria for MDD (Statistics Canada 2012). The DSM-5, which is the most authoritative reference for mental disorder diagnoses, lists nine typical symptoms of MDD: experiencing a depressed mood (feeling sad, empty, or hopeless); loss of interest in most daily activities; changes in weight or appetite; insomnia or hypersomnia; observable psychomotor agitation; lack of energy; feelings of guilt and worthlessness; problems with concentration and decision making; and active suicidal thoughts (American Psychiatric Association 2013). MDD is underdiagnosed and undertreated because of limited access to counselling professionals and the social stigma associated with mental health disorders (Weihs and Wert 2011; Epstein et al. 2010; Overton and Medina 2008). The pathophysiology of depression is not yet fully understood (Bowie, Milanovic, and Tran 2019). The biopsychosocial model suggests that MDD is associated with biological, psychological, and social factors (Schotte et al. 2006). The diathesis-stress model proposes that excessive stress from daily life disrupts the balance of diathesis (Flett et al. 1995). Some research has also suggested the existence of high-risk factors, such as childhood abuse (Springer et al. 2003) and genetic factors (Sullivan, Neale, and Kendler 2000).

Although depression is treatable, symptoms do not always improve after treatment.
No standardized criteria exist to define treatment-resistant depression (TRD); in current clinical practice, a TRD patient is generally defined as a patient who has failed to respond to 1–3 adequate antidepressant treatments (Fava 2003). According to existing epidemiological studies, approximately 30% to 60% of MDD subjects experience TRD (Jaffe, Rive, and Denee 2019; Kennedy et al. 2011; Fava 2003).

1.2.2 Depression Diagnosis and Antidepressive Treatment

Depression is generally diagnosed through an in-person interview with a healthcare provider, who assesses the patient's family history, length of depression episodes, symptoms, drug/alcohol use, and daily activities. During the clinical assessment, a mental health examination is performed, using depression rating scales to assess the severity of depression symptoms. Depression rating scales are available in both clinician-administered and self-reported versions. Reductions in a depression rating scale score indicate symptom improvement; typically, a decrease in the total score of 50% or more is defined as a positive treatment response, whereas the definition of treatment remission tends to be scale-specific. General antidepressive treatments include psychotherapy (psychological counselling) and antidepressants (medication), and a combination of multiple treatments may sometimes be more effective than any single treatment (Khan et al. 2012). When general therapies fail, psychiatrists may recommend additional treatment options for TRD and severe MDD, including hospitalization, electroconvulsive therapy (ECT), and repetitive transcranial magnetic stimulation (rTMS). ECT is a conventional and successful antidepressive treatment, performed by passing a brief electric current through the brain to trigger a seizure under anesthesia.
ECT is highly effective, with a response rate of 66.1% and a remission rate of 51.7%, as reported by a meta-analysis of randomized, controlled trials (Ontario Health Quality 2016). However, ECT has also been reported to produce severe side effects, including confusion and short- and long-term memory impairment. In contrast, rTMS is a non-invasive technique for the treatment of various neurological and psychiatric disorders and has been associated with few side effects (Micallef-Trigona 2014). Studies have reported that rTMS might have potential neurocognitive benefits (Guse, Falkai, and Wobrock 2010; O'Connor et al. 2003). rTMS uses electromagnetic induction to induce electric currents in the brain by rapidly changing the magnetic field. Meta-analyses have consistently demonstrated the safety and efficacy of rTMS for TRD treatment (McClintock et al. 2018; Brunoni et al. 2017; Dunner et al. 2014; Micallef-Trigona 2014), although recent evidence suggests lower response and remission rates for rTMS trials compared with those for ECT trials (Ontario Health Quality 2016). The majority of rTMS studies performed in MDD patients have targeted the dorsolateral prefrontal cortex (DLPFC) (Yadollahpour, Hosseini, and Shakeri 2016), a brain region that manages cognitive processes (Koenigs and Grafman 2009). Conventional 10-Hz rTMS, which delivers 3,000 stimulatory pulses in a 37.5-minute session, was approved by the US Food and Drug Administration (FDA) in 2008 (George et al. 2010; O'Reardon et al. 2007). The average cost of one course of treatment is approximately 2,309 CAD (1,844 USD), and one rTMS machine is estimated to be capable of performing approximately seven sessions per day (Mendlowitz et al. 2019). rTMS treatment may differ according to the stimulation site, the orientation of the magnetic field, the number of stimuli delivered, and the frequency, intensity, and duration of stimulation (O'Reardon et al. 2007).
Intermittent theta burst stimulation (iTBS) is a novel, affordable, and efficient rTMS protocol that was approved by the FDA in 2018 (Brooks 2018). Compared with conventional 10-Hz rTMS, iTBS delivers 600 pulses of stimulation as triplet bursts at 50 Hz, with bursts repeated at 5 Hz (every 0.2 seconds), cycling 2 seconds on and 8 seconds off, over approximately 3 minutes. This protocol significantly reduces the length of one treatment session to 15 minutes and increases the equipment capacity to 20 sessions per day. The estimated average cost of one course of iTBS treatment is 1,387 CAD (1,108 USD) (Mendlowitz et al. 2019). In the THREE-D study, researchers compared conventional 10-Hz rTMS with the novel iTBS protocol and found evidence that iTBS was non-inferior to the 10-Hz rTMS protocol for the treatment of depression (Blumberger et al. 2018). The study reported a response rate of 46% for the 10-Hz rTMS protocol, compared with a response rate of 49% for the iTBS protocol. The remission rates were 27% for the 10-Hz rTMS protocol and 32% for the iTBS protocol.

1.3 Machine Learning (ML)
Traditionally, developers explicitly implement an algorithm and manually enter its parameters into a program. In ML, data are fed into models that automatically identify the optimal parameters needed to describe a pattern, and the captured pattern can then be used to generate predictions for unseen datasets. This section briefly reviews basic learning theory and the terminology and concepts associated with ML, before introducing the background for supervised learning.

1.3.1 Theory of Learning
To learn patterns, the data used to develop (train) the model must share some connection with the data to which the model will be applied (tested). More formally, all data are assumed to be derived from an identical distribution and sampled independently, which is referred to as independent and identically distributed (IID) data.
Although the IID assumption is difficult to verify in real-life situations, it is a fundamental foundation of ML. Three typical stages of work are performed during an ML task: training (developing), validation, and testing, which correspond to the training (development) dataset, the validation dataset, and the testing dataset, respectively. However, the terms validation and testing are sometimes reversed, in both academia and industry (Ripley and Hjort 1995). To avoid misunderstandings, the definitions of these terms are reviewed here:
• Training (developing): The stage in which the parameters of the model are adjusted, using the training dataset, to generate the best performance.
• Validation: Verifying whether a performance improvement on the training dataset also yields a performance improvement on the validation dataset.
• Testing: The stage in which the model is examined to determine its performance in a real situation, using the testing dataset, which is an independent and unseen dataset.
ML represents a balance between how well the model works on the training dataset and how well the training performance approximates the testing performance. The validation step uses a subset of the existing training dataset to approximate the testing dataset. Complex models can always fit training data better than simpler models, but complex models can also learn overly specific patterns that are not shared with the testing dataset. Comparatively, simple models are more generalized and are not sensitive to particular training data, but they may perform poorly because they miss important patterns. The definitions of overfitting and underfitting are as follows:
• Overfitting: The model has excellent performance on the training dataset but poor performance on the testing dataset, indicating that the model is too sensitive during the training phase and has learned patterns that are not shared by other data (testing data).
• Underfitting: The model performs poorly on both the training dataset and the testing dataset because it is too simple to learn sufficient patterns shared with other data (testing data).
The balance between overfitting and underfitting is also referred to, in statistics, as the bias-variance trade-off. High bias corresponds to underfitting, whereas high variance corresponds to overfitting. To obtain good performance, we aim to minimize both bias and variance, which is not always possible. ML can be applied widely to many fields, including medical research. Based on records from the US National Library of Medicine and the National Institutes of Health (PubMed 2020), the number of "machine learning" publications indexed by the PubMed database demonstrates the exponential growth of interest in the field (Figure 1.1), which encouraged the use of an ML approach in this study. Many categories of ML methods exist, including supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Our study describes a supervised learning problem.

Figure 1.1: Number of ML publications collected in the PubMed database from 2000 to 2019. This bar chart illustrates that ML has had significant impacts on medical research.

1.3.2 Supervised Learning
Supervised learning represents one of the most successful ML strategies. Supervised learning problems can be divided into two categories: the classification of categorical (discrete) response variables and the regression of numerical (continuous) response variables. In supervised learning, each object is composed of a set of features (input/explanatory variables) and a corresponding set of labels (output/response variables). Supervised learning can be formalized as a feature matrix and a label matrix, as shown in Figure 1.2.
The input is a feature matrix with dimension n × d, and the output matrix consists of labels for each object (shown as a vector in this example because only one label exists for each object). Supervised learning is intended to identify relationships between the input and the output.

             Feature 1   Feature 2   ……   Feature d  |  Label
  Object 1       2           3       ……       2      |  Object 1's Label
  Object 2       1           2       ……       3      |  Object 2's Label
  ……            ……          ……       ……      ……      |  ……
  Object n       0           3       ……       1      |  Object n's Label
                          (input)                    |  (output)

Figure 1.2: Supervised learning notations. This figure demonstrates an abstraction of a single-label supervised learning problem. Features and labels are also referred to as explanatory variables and response variables.

In comparison, unsupervised learning does not include a label for each object. Typical unsupervised learning algorithms include clustering and association rule learning. K-means (Forgy 1965) is a representative unsupervised algorithm: it starts with K random cluster means, reassigns each subject to the closest mean, updates the location of each mean, and repeats this process until the assignments converge.

1.4 Research Questions
Using powerful ML tools and a suitable dataset, we hypothesized that the baseline and early-treatment scores of depression rating scales could predict rTMS treatment outcomes. We designed two binary prediction outcomes, responder/non-responder and remitter/non-remitter, and one continuous prediction outcome, the percentage of symptom improvement, as assessed by the total scores of depression rating scales. We focused on the following research questions:
• Research question 1: Can we build ML models that can predict rTMS treatment outcomes using baseline and longitudinal depression rating scales?
• Research question 2: Would adding additional clinical and demographic data improve the predictive value of the ML models?
• Research question 3: Which predictors contribute the most to the prediction models?
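As a concrete illustration of the n × d setup above, the sketch below builds a toy feature matrix and label vector, fits a minimal supervised learner, and thresholds its output to produce class predictions. The values are synthetic, and the ordinary least-squares fit is used only for illustration; it is not one of the models developed in this thesis.

```python
import numpy as np

# Toy feature matrix: n = 6 objects (rows), d = 3 features (columns).
X = np.array([[2., 3., 2.],
              [1., 2., 3.],
              [0., 3., 1.],
              [4., 1., 0.],
              [3., 0., 1.],
              [5., 1., 2.]])
# One label per object (e.g. responder = 1, non-responder = 0).
y = np.array([1., 1., 1., 0., 0., 0.])

# A minimal supervised learner: least-squares fit of w in X_aug @ w ~ y,
# with a bias column appended to X.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

# Predict by thresholding the fitted linear score at 0.5.
pred = (X_aug @ w >= 0.5).astype(int)
print(pred.tolist())
```

The same shapes (an n × d input matrix and an n-length label vector) carry over directly to the classification and regression models described later.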
1.5 Objective
Our goal was to develop a clinical tool capable of predicting rTMS treatment outcomes. To achieve this goal, we pursued the following objectives.
• Objective 1: To develop logistic/linear regression models, based on the total scores of depression rating scales.
• Objective 2: To develop logistic/linear regression, Elastic Net, and Random Forest models, including additional clinical and demographic features, such as extracted subscale scores and individual items from the depression rating scales, treatment history, ongoing anti-depressive treatment, and the score for an anxiety rating scale.
• Objective 3: To extract feature importance from the prediction models and discover potential predictors.

1.6 Contribution
Our work made the following contributions:
• Proposed a robust, convenient, and feasible approach for the prediction of rTMS treatment outcomes, without requiring any expensive biomarkers, saving up to 6 weeks of unnecessary treatment.
• Provided evidence, using a large dataset, to support previously discovered clinical and demographic predictors.
• Demonstrated the predictive value of three different depression rating scales (including both clinician-administered and self-reported scales) for multiple ML algorithms simultaneously.
• Provided comparable benchmark models for other rTMS treatment outcome prediction studies.

1.7 Related Work
Some studies have developed rTMS outcome prediction models based on clinical and demographic data (Trevizol et al. 2020; Feffer et al. 2018; Kelly et al. 2017; Brakemeier et al. 2008; 2007). A recent study analyzed the predictive value of baseline HRSD-C-17 and BSI-A scores, and other clinical and demographic data, among THREE-D participants (Trevizol et al. 2020), reporting a c-index of 0.687.
They found that current employment status, the severity of baseline depressive and anxiety symptoms, the number of failed antidepressive treatments, and age were associated with treatment remission. Another study suggested that rTMS non-responders could be predicted using improvements at week 2 of treatment, with a sensitivity of 0.857, a specificity of 0.764, a positive predictive value (PPV) of 0.720, and a negative predictive value (NPV) of 0.882 (Feffer et al. 2018). The treatment outcome of the initial rTMS induction could predict the percentage response to reinduction, with a coefficient of determination (R² score) of 0.29 (Kelly et al. 2017). Brakemeier et al. investigated whether 2-week rTMS treatment outcomes could be predicted using baseline data (Brakemeier et al. 2007) and then found that their own work could not be replicated (Brakemeier et al. 2008). Other studies have explored the use of electroencephalography (EEG) features to increase predictive performance (Corlier et al. 2019; Zandvakili et al. 2019; Bailey et al. 2019; 2018; Erguzel et al. 2015; Khodayari-Rostamabad et al. 2011). A recent study found that adding EEG functional connectivity data to 2-week clinical data could increase the area under the curve (AUC) from 0.727 to 0.778 (Corlier et al. 2019). Other EEG studies have reported AUC values ranging from 0.83 to 0.93 (Zandvakili et al. 2019; Bailey et al. 2019; 2018; Erguzel et al. 2015; Khodayari-Rostamabad et al. 2011). Another study, using functional magnetic resonance imaging (fMRI), achieved an accuracy of 0.88–0.93 on an independent replica dataset, based on functional connectivity features and depression subtypes defined by fMRI (Drysdale et al. 2017). However, a recent replication study argued that these depression subtypes should be interpreted with caution, due to a lack of significant evidence (Dinga et al. 2019).
Other studies have investigated the use of ML and clinical data to predict other antidepressive treatment outcomes, including ECT (Martínez-Amorós et al. 2018) and medication (Chekroud et al. 2016; Iniesta et al. 2016). Except for the THREE-D trial, the sample sizes of most rTMS outcome-prediction studies have ranged from 27 to 109. In the literature, AUC, accuracy, balanced accuracy, sensitivity, specificity, NPV, and PPV are the metrics most commonly selected for response and remission prediction, whereas almost all regression models used the coefficient of determination (R² score). ML models have included regression with Lasso and Elastic Net regularization, support vector machines (SVM), Random Forest (RF), and neural networks. The majority of the related studies were validated through cross-validation, including leave-one-out cross-validation (Zandvakili et al. 2019), leave-two-out cross-validation (Khodayari-Rostamabad et al. 2011), five-fold cross-validation (Bailey et al. 2018), and k-fold cross-validation (k = 6, 8, and 10) (Corlier et al. 2019; Erguzel et al. 2015). One study was tested on an independent dataset including 30 subjects (Drysdale et al. 2017). Table 1.1 summarizes the predictors, results, sample sizes, and ML methods described in previous studies.
Part 1: rTMS treatment outcome prediction
• This work. Type: clinical, demographic. Significant predictors: total scores of the depression rating scales; demographic data including age, education level, employment status, and gender; anxiety; treatment history; ongoing anti-depressive treatment. Outcomes: model with baseline data: response AUC = 0.634, remission AUC = 0.735; model with baseline and week 1–3 data: response AUC = 0.840, remission AUC = 0.877, improvement R² score = 0.464. Sample size*: 356 (THREE-D trial). Methods: Logistic Regression, Linear Regression, Elastic Net, Random Forest.
• (Trevizol et al. 2020). Type: clinical, demographic. Significant predictors: baseline depressive and anxiety symptoms, employment, age, treatment history. Outcome: remission c-index = 0.687. Sample size: 388 (THREE-D trial). Method: Logistic Regression.
• (Feffer et al. 2018). Type: clinical. Significant predictor: whether two-week symptom improvement < 20%. Outcome: response accuracy = 0.800, sensitivity = 0.857, specificity = 0.764, PPV = 0.720, NPV = 0.882. Sample size: 101. Method: threshold of 2-week improvement.
• (Kelly et al. 2017). Type: clinical. Significant predictor: response to the first rTMS course. Outcome: improvement after reinduction R² score = 0.29. Sample size: 16. Method: Linear Regression.
• (Brakemeier et al. 2008). Type: clinical, demographic. Significant predictors: length of the episode, treatment history, factors of depression rating scales. Outcomes: 2-week response sensitivity = 0.444, specificity = 0.846; 2-week improvement R² score = -0.017. Sample size: 79 (27/52). Methods: Logistic Regression, Linear Regression.
• (Brakemeier et al. 2007). Type: clinical, demographic. Significant predictors: length of the episode, treatment history, factors of depression rating scales. Outcome: 2-week response sensitivity = 0.867, specificity = 0.964. Sample size: 70 (15/55). Method: Logistic Regression.
• (Corlier et al. 2019). Type: clinical, EEG. Significant predictors: functional connectivity (alpha-SC) during the first rTMS treatment, 2-week depression rating scales. Outcome: response AUC = 0.778, accuracy = 0.792, sensitivity = 0.757, specificity = 0.819. Sample size: 109 (49/60). Method: Elastic Net Logistic Regression.
• (Zandvakili et al. 2019). Type: EEG. Significant predictor: pretreatment alpha coherence. Outcomes: improvement R² = 0.41; response AUC = 0.83, sensitivity = 1.00, specificity = 0.46. Sample size: 29 (13/16). Methods: LASSO regression, SVM.
• (Bailey et al. 2019). Type: EEG. Significant predictor: resting theta connectivity at baseline and week 1 of treatment. Outcome: response sensitivity = 0.84, specificity = 0.89. Sample size: 42 (12/30). Method: linear SVM.
• (Bailey et al. 2018). Type: EEG. Significant predictors: pre-treatment and week 1 frontal-midline theta power and theta connectivity. Outcome: response sensitivity = 0.90, specificity = 0.92, balanced accuracy = 0.91. Sample size: 39 (10/29). Method: linear SVM.
• (Erguzel et al. 2015). Type: EEG. Significant predictor: pre-treatment frontal quantitative EEG cordance. Outcome: response AUC = 0.82–0.92, accuracy = 0.89, sensitivity = 0.93, specificity = 0.89. Sample size: 55 (30/25). Method: Neural Network.
• (Khodayari-Rostamabad et al. 2011). Type: EEG. Significant predictors: pre-treatment quantitative EEG features. Outcome: response specificity = 0.83, sensitivity = 0.78, combined accuracy = 0.80. Sample size: 27 (9/18). Method: mixture of factor analysis.
• (Drysdale et al. 2017). Type: neuroimaging. Significant predictors: fMRI connectivity features and biotype diagnosis. Outcome: response accuracy = 0.88–0.93. Sample size: 30 (replica). Methods: SVM, Latent Dirichlet Allocation, Logistic Regression.

Part 2: Other antidepressive treatment outcome prediction using clinical data
• (Martínez-Amorós et al. 2018). Type: clinical, demographic. Significant predictor: reduction in the total score of a depression rating scale after 2 weeks of treatment. Outcome: ECT remission AUC = 0.77, sensitivity = 0.76, specificity = 0.67, PPV = 0.88, NPV = 0.47. Sample size: 87 (66/11). Method: Logistic Regression.
• (Chekroud et al. 2016). Type: clinical, demographic. Significant predictors: baseline QIDS, HRSD, other clinical data, and demographic data. Outcomes: citalopram remission accuracy = 0.646, AUC = 0.700, sensitivity = 0.628, specificity = 0.662; final QIDS-SR scores: baseline R² = 0.175, week 2 R² = 0.256. Sample size: 1949 (949/1000, remitters). Methods: Elastic Net feature selector, Gradient Boosting Machine.
• (Iniesta et al. 2016). Type: clinical, demographic. Significant predictors: symptoms of depressed mood, reduced interest, decreased activity, indecisiveness, pessimism, and anxiety. Outcomes: escitalopram and nortriptyline remission AUC = 0.72 (0.72 and 0.70 in drug-specific analyses); symptom reduction R² = 0.038 (0.064 and 0.053 in drug-specific analyses). Sample size: 793 (326/467). Method: Elastic Net.

*: Responders/non-responders or remitters/non-remitters are provided in the bracket.

Table 1.1: Summary of previous studies. Part 1 lists all of the rTMS treatment-associated outcome-prediction studies, and Part 2 includes additional antidepressive treatment outcome prediction studies performed using clinical data.

1.8 Thesis Organization
The thesis is organized as follows: Chapter 1 introduces the background for depression, rTMS treatment, and ML. Chapter 1 also describes the research questions, objectives, and related work. Chapter 2 explains the approaches used for data collection and ML analysis. In Chapter 3, the cross-validation results for the ML analysis are summarized, and the feature importances are extracted from the ML models. Chapter 4 discusses the discovered predictors and indicates the highlights and limitations of this study. Conclusions and future directions are also presented in Chapter 4.

Chapter 2: Data and Methods

2.1 THREE-D Study
Between September 2013 and October 2016, a randomized, multicenter, non-inferiority trial was conducted at three Canadian university hospitals, the Centre for Addiction and Mental Health, Toronto Western Hospital, and University of British Columbia Hospital, to determine whether the novel iTBS protocol was non-inferior to the conventional 10-Hz rTMS protocol for TRD treatment (Blumberger et al. 2018). Participants received 4–6 weeks of treatment, and their depression symptoms were assessed weekly during the trial.
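The cross-validation schemes cited above (leave-one-out, leave-two-out, five-fold, and k-fold) all share the same mechanics: partition the subjects into k disjoint folds, hold out one fold for testing, and train on the rest. A minimal sketch with synthetic indices is given below; leave-one-out is simply the special case k = n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 5                      # 20 subjects, 5-fold cross-validation

# Shuffle subject indices once, then partition them into k disjoint folds.
indices = rng.permutation(n)
folds = np.array_split(indices, k)

for i, test_idx in enumerate(folds):
    # Fold i is held out for testing; the remaining folds form the training set.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    assert set(train_idx) & set(test_idx) == set()       # disjoint
    assert set(train_idx) | set(test_idx) == set(range(n))  # exhaustive
    print(f"fold {i}: train n={len(train_idx)}, test n={len(test_idx)}")
```

Each subject appears in exactly one test fold, so the k held-out performance estimates can be averaged into a single cross-validated score.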
The THREE-D study demonstrated that iTBS has "non-inferior effectiveness and similar adverse event profile and acceptability" compared with conventional 10-Hz rTMS (Blumberger et al. 2018).

2.1.1 Participants
A total of 501 participants, aged 18–65 years, who had been diagnosed with single or recurrent MDD episodes according to the Mini-International Neuropsychiatric Interview (Sheehan et al. 1998), were recruited for eligibility assessment. A subject qualified for inclusion in this trial if the following criteria were met:
• The subject's current episode scored 18 or higher on the 17-item Clinician-Administered Hamilton Rating Scale for Depression (HRSD-C-17).
• The subject showed no response to at least one adequate dose of antidepressant treatment, as reflected by an antidepressant treatment history form (ATHF) score of 4 or above, or was unable to tolerate two or more separate antidepressant treatments.
• The subject received an antidepressive regimen that started at least four weeks before the start of the current study.
Subjects were excluded from the trial if they had any condition that might affect rTMS treatment performance or threaten their lives, such as substance abuse, pregnancy, or active suicidal thoughts. For the complete inclusion and exclusion criteria, please refer to the original manuscript describing the THREE-D study (Blumberger et al. 2018). Prior to the trial, 87 of 501 participants were excluded, and 414 subjects entered the trial.

2.1.2 Study Procedure
The THREE-D study was intended to illustrate that the iTBS protocol represents an adequate replacement for the conventional 10-Hz rTMS protocol for TRD treatment. To achieve this goal, all participants were randomly assigned to two groups, each receiving one of these two treatments. Before the treatment, each participant underwent a structural magnetic resonance imaging (sMRI) scan, to assist with locating the left DLPFC using a neuronavigation system.
Both 10-Hz rTMS and iTBS targeted the DLPFC, at 120% of the resting motor threshold. All participants received 20 sessions of treatment during a 4-week period (1 treatment each weekday). Those who achieved a 30% improvement in the HRSD-C-17 score but did not achieve a score of less than 8 (those who scored <8 were regarded as remitters) received 10 additional sessions during the following 2 weeks. Participants who missed treatment sessions received the missing sessions at the end of the study period, but participants who missed four consecutive treatment sessions were withdrawn from the study. The primary outcome, the HRSD-C-17 score, was assessed by trained researchers before treatment and after every five sessions. Three additional secondary outcome measures were documented during this study: the 30-item Clinician-Administered Inventory of Depressive Symptoms (IDS-C-30), the 6-item Brief Symptom Inventory-Anxiety Subscale (BSI-A) (both administered by the same research staff who administered the HRSD-C-17), and the 16-item Self-Rated Quick Inventory of Depressive Symptoms (QIDS-SR-16). These four scales were also evaluated one week, one month, and three months after the last session.

2.2 Dataset
Of the initial 414 participants, 27 subjects discontinued treatment during the THREE-D trial. Additionally, 31 subjects were removed, prior to performing the ML analysis, due to incomplete records for the three depression rating scales. Because the BSI-A measures only anxiety symptoms, it is not considered a depression rating scale, and we included only the baseline BSI-A scores in our analysis. We formed a complete dataset consisting of 356 subjects with scores for the three scales, collected at baseline, at weeks 1, 2, and 3 during treatment, and at the end of treatment. The numbers of missing records for the three depression rating scales at each assessment time point are illustrated in Figure 2.1.
In addition to the three depression rating scales, we obtained pre-treatment BSI-A scores, demographic data, and additional clinical data, such as ATHF results, medication usage, and other ongoing treatments. We established two categorical outcomes (treatment response and remission) and one continuous outcome (post-treatment symptom improvement) for the predictive ML tasks.

Figure 2.1: The numbers of missing records for the three depression rating scales at each time point.

2.2.1 Treatment Response, Remission, and Symptom Improvement
In the THREE-D trial, treatment response and remission were determined at the end of treatment, regardless of whether 4 or 6 weeks of treatment were received. Because the HRSD-C-17, IDS-C-30, and QIDS-SR-16 have been consistently validated clinically and have been demonstrated to overlap (Wu et al. 2010; Drieling, Schärer, and Langosch 2007), we avoided building predictive models using more than one depression rating scale or mixing the response or remission definitions used for different rating scales. The BSI-A is not a comprehensive measurement of depressive symptoms; therefore, we did not define treatment response and remission for the BSI-A. Higher rating scores indicate more severe depression symptoms. The definitions of treatment response and remission are shown in Table 2.1.

Scale       | Response                                                         | Remission
HRSD-C-17   | ≥ 50% improvement from baseline:                                 | final score < 8
IDS-C-30    |   (baseline total score − final total score)                     | final score < 14
QIDS-SR-16  |   / (baseline total score) ≥ 50%  (same rule for all scales)     | final score < 6

Table 2.1: Treatment response and remission definitions for the HRSD-C-17, IDS-C-30, and QIDS-SR-16.

In addition to the binary response and remission predictions, we also developed a continuous prediction target, as a non-responder with 49% improvement is unlikely to be clinically different from a responder with 50% improvement.
Thus, predictions regarding the percentage of symptom improvement, as assessed using the depression rating scale total scores, may generate results that are more clinically meaningful than those of binary classification models. For each subject, the percentage of symptom improvement may differ, as defined by the three scales, and improvements for all three depression rating scales were calculated as follows:

percentage of symptom improvement = (baseline total score − final total score) / (baseline total score)

2.2.2 Depression Rating Scales
Three depression rating scales were available at multiple time points, including the baseline and weekly measurements during the THREE-D trial. Max Hamilton (1960) initially proposed the HRSD-C-17, also known as the HDRS or HAM-D, and currently, more than 20 different versions have been published, containing different numbers of questions. The majority of clinical studies focus on the original 17-item version. However, some studies have argued that the HRSD cannot adequately assess modern depression symptoms, and some items have been shown to have poor interrater and retest reliability (Bagby et al. 2004). Nevertheless, the HRSD has been one of the "gold standard" depression rating scales over the past 60 years. The HRSD-C-17 consists of 8 items that are scored from 0–2 and 9 items that are scored from 0–4. A total score below 8 is considered to represent clinical remission or the lack of depression symptoms, whereas a score above 18 is considered to represent severe or very severe depression symptoms. We also included a subscale for the HRSD-C-17, which divided the 17 items into 4 factors (Shafer 2006), to measure different aspects of depression symptoms. A summary of all HRSD-C-17 data is listed in Table 2.2.
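The improvement formula and the response/remission rules of Table 2.1 translate directly into code. The sketch below uses the cutoffs stated in the text; the function names are my own.

```python
def improvement(baseline_total, final_total):
    """Percentage of symptom improvement, as defined in Section 2.2.1."""
    return (baseline_total - final_total) / baseline_total

# Remission cutoffs from Table 2.1 (final score strictly below the cutoff).
REMISSION_CUTOFF = {"HRSD-C-17": 8, "IDS-C-30": 14, "QIDS-SR-16": 6}

def is_responder(baseline_total, final_total):
    """Response: >= 50% improvement from baseline (same rule for all scales)."""
    return improvement(baseline_total, final_total) >= 0.5

def is_remitter(scale, final_total):
    """Remission: final total score below the scale-specific cutoff."""
    return final_total < REMISSION_CUTOFF[scale]

# Example: an HRSD-C-17 total falling from 24 to 10 is a 58.3% improvement,
# so the subject is a responder but not a remitter (10 is not below 8).
print(round(improvement(24, 10), 3),
      is_responder(24, 10),
      is_remitter("HRSD-C-17", 10))   # -> 0.583 True False
```

This also makes the 49%-vs-50% boundary issue explicit: `is_responder` flips at a single point, whereas `improvement` varies smoothly.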
#  | Item                        | Score Range | Subscale
1  | Depressed Mood              | 0–4         | Depression
2  | Feelings of Guilt           | 0–4         | Depression
3  | Suicide                     | 0–4         | Depression
4  | Insomnia (early in night)   | 0–2         | Insomnia
5  | Insomnia (middle of night)  | 0–2         | Insomnia
6  | Insomnia (early in morning) | 0–2         | Insomnia
7  | Work and Activities         | 0–4         | Depression
8  | Retardation                 | 0–4         | Depression
9  | Agitation                   | 0–4         | Anxiety
10 | Anxiety Psychic             | 0–4         | Anxiety
11 | Anxiety Somatic             | 0–4         | Anxiety
12 | Gastrointestinal            | 0–2         | Somatic
13 | General Somatic Symptoms    | 0–2         | Somatic
14 | Genital Symptoms            | 0–2         | Somatic
15 | Hypochondriasis             | 0–4         | Anxiety
16 | Loss of Weight              | 0–2         | Somatic
17 | Insight                     | 0–2         | Anxiety
   | Total Score                 | 0–52        |

Table 2.2: Summary of the HRSD-C-17 items, with a literature-based subscale for each item.

The Inventory of Depressive Symptomatology (IDS) measures all depression symptom domains specified in the DSM, presenting an alternative that addresses a limitation of the HRSD by assessing more contemporary depression symptoms (Bagby et al. 2004). Although the IDS is designed to measure the severity of depression, it is also sometimes used for depression screening. The IDS offers both self-reported (IDS-SR) and clinician-administered (IDS-C) versions, and both versions rate identical symptoms. The IDS includes 30 items scored from 0–3; however, increases and decreases in both weight and appetite are addressed by separate items, and only one of each should be answered, leaving two blank items. The 28 answered items add up to a maximum total score of 84. A total score below 14 is considered to represent clinical remission or a lack of depression symptoms, whereas a score above 38 is considered to represent severe or very severe depression symptoms. We adopted the principal component analysis (PCA) results published by the authors of the IDS (Rush et al. 1996) to create 3 IDS subscales, and a summary of the IDS-C-30 can be found in Table 2.3.
#     | Item                             | Score Range | Subscale
1     | Sleep Onset Insomnia             | 0–3         | Anxiety/Arousal
2     | Mid-Nocturnal Insomnia           | 0–3         | Anxiety/Arousal
3     | Early Morning Insomnia           | 0–3         | Other
4     | Hypersomnia                      | 0–3         | Other
5     | Mood (Sad)                       | 0–3         | Mood/Cognition
6     | Mood (Irritable)                 | 0–3         | Mood/Cognition
7     | Mood (Anxious)                   | 0–3         | Anxiety/Arousal
8     | Reactivity of Mood               | 0–3         | Mood/Cognition
9     | Mood Variation                   | 0–3         | Anxiety/Arousal
10    | Quality of Mood                  | 0–3         | Mood/Cognition
11/12 | Appetite Increased / Decreased   | 0–3         | Other
13/14 | Weight Increased / Decreased     | 0–3         | Other
15    | Concentration/Decision Making    | 0–3         | Mood/Cognition
16    | Outlook (Self)                   | 0–3         | Mood/Cognition
17    | Outlook (Future)                 | 0–3         | Mood/Cognition
18    | Suicidal Ideation                | 0–3         | Mood/Cognition
19    | Involvement                      | 0–3         | Mood/Cognition
20    | Energy/Fatiguability             | 0–3         | Other
21    | Pleasure/Enjoyment (exclude sex) | 0–3         | Mood/Cognition
22    | Sexual Interest                  | 0–3         | Mood/Cognition
23    | Psychomotor Slowing              | 0–3         | Mood/Cognition
24    | Psychomotor Agitation            | 0–3         | Anxiety/Arousal
25    | Somatic Complaints               | 0–3         | Anxiety/Arousal
26    | Sympathetic Arousal              | 0–3         | Anxiety/Arousal
27    | Panic/Phobic Symptoms            | 0–3         | Anxiety/Arousal
28    | Gastrointestinal                 | 0–3         | Anxiety/Arousal
29    | Interpersonal Sensitivity        | 0–3         | Mood/Cognition
30    | Leaden Paralysis/Physical Energy | 0–3         | Anxiety/Arousal
      | Total Score                      | 0–84        |

Table 2.3: Summary of the IDS-C-30 items, with a literature-based subscale for each item. In clinical practice, only 28 of the 30 items are valid, because only one of items 11 and 12 and one of items 13 and 14 should be answered.

After the introduction of the IDS, researchers developed a shorter and faster version, known as the QIDS (Rush et al. 2003). All 16 items on the QIDS are derived from the IDS, concentrating on the 9 diagnostic symptom domains identified in DSM-IV: sleep, sad mood, appetite/weight change, concentration, self-criticism, suicidal ideation, interest, energy/fatigue, and psychomotor agitation or retardation.
The QIDS is also available in both self-reported and clinician-administered versions. Unlike the HRSD and the IDS, the calculation of the QIDS total score is more complicated, because it is based on the highest score within each of the 9 symptom domains. A total score below 6 is considered to represent clinical remission or a lack of depression symptoms, whereas a total score above 15 is considered to represent severe or very severe depression symptoms. PCA results from the literature were adopted to create two subscales (De Vos et al. 2015). A summary of all QIDS-SR-16 items can be found in Table 2.4.

#  | Item                          | Contribution to Total Score              | Score Range | Subscale
1  | Falling Asleep                | highest of the sleep items (#1–4)        | 0–3         | Somatic/Affective
2  | Sleep During the Night        |                                          |             | Somatic/Affective
3  | Waking Up Too Early           |                                          |             | Somatic/Affective
4  | Sleeping Too Much             |                                          |             | Somatic/Affective
5  | Feeling Sad                   | #5                                       | 0–3         | Somatic/Affective
6  | Decreased Appetite            | highest of the appetite/weight items     | 0–3         | Cognitive/Appetitive
7  | Increased Appetite            |   (#6–9)                                 |             | Cognitive/Appetitive
8  | Decreased Weight              |                                          |             | Cognitive/Appetitive
9  | Increased Weight              |                                          |             | Cognitive/Appetitive
10 | Concentration/Decision Making | #10                                      | 0–3         | Somatic/Affective
11 | View of Myself                | #11                                      | 0–3         | Cognitive/Appetitive
12 | Thoughts of Death or Suicide  | #12                                      | 0–3         | Cognitive/Appetitive
13 | General Interest              | #13                                      | 0–3         | Somatic/Affective
14 | Energy Level                  | #14                                      | 0–3         | Somatic/Affective
15 | Feeling Slowed Down           | highest of the psychomotor items         | 0–3         | Somatic/Affective
16 | Feeling Restless              |   (#15–16)                               |             | Somatic/Affective
   | Total Score                   |                                          | 0–27        |

Table 2.4: Summary of the QIDS-SR-16 items, with a literature-based subscale for each item.
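The QIDS-SR-16 scoring rule described above (taking the highest score within each composite domain) can be sketched as follows; representing the 16 items as a dictionary keyed by item number is my own convention.

```python
def qids_sr16_total(items):
    """Total QIDS-SR-16 score from a dict of the 16 item scores (each 0-3).

    Per Table 2.4: the four sleep items (#1-4), the four appetite/weight
    items (#6-9), and the two psychomotor items (#15-16) each contribute
    only their highest score; the other six items contribute directly,
    giving 9 domain scores and a 0-27 total range.
    """
    sleep = max(items[i] for i in (1, 2, 3, 4))
    appetite_weight = max(items[i] for i in (6, 7, 8, 9))
    psychomotor = max(items[i] for i in (15, 16))
    single_items = sum(items[i] for i in (5, 10, 11, 12, 13, 14))
    return sleep + appetite_weight + psychomotor + single_items

# Worst case: all 16 items at 3 -> 9 domains x 3 = 27, matching the 0-27 range.
worst = {i: 3 for i in range(1, 17)}
print(qids_sr16_total(worst))   # -> 27
```

Note that the maximum total (27) is far below 16 × 3 = 48, precisely because each composite domain contributes only its single highest item.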
2.2.3 Demographic and Other Baseline Clinical Data

In addition to the three depression rating scales, which were collected at multiple time points, we also obtained data for 18 characteristics that were available at baseline: 4 demographic variables, 5 ongoing treatment variables, 3 variables from the ATHF, 5 other clinical variables collected at baseline, and the total score of the BSI-A. These variables were included in our prediction model because they may also be associated with the rTMS outcome. A description of these features is presented in Table 2.5, as the mean and standard deviation (SD) for continuous variables and as the proportion for categorical variables. The rTMS treatment group (10-Hz rTMS or iTBS) was included as one of the 5 other clinical variables.

 #   Item                         Category            Range                      Statistics*
 1   Age                          Demographic         18 - 65                    42.51 +/- 11.42
 2   Gender                       Demographic         0 = male                   40.45%
                                                      1 = female                 59.55%
 3   Education Level              Demographic         1 = grade 6 or less        0.56%
                                                      2 = grade 7 - 12           1.97%
                                                      3 = high school            8.15%
                                                      4 = part college           17.13%
                                                      5 = 2-year college         13.76%
                                                      6 = 4-year college         29.21%
                                                      7 = part graduate school   7.87%
                                                      8 = graduate school        21.01%
 4   Employment                   Demographic         0 = not employed           63.48%
                                                      1 = employed               36.53%
 5   Psychotherapy                Ongoing Treatment   0 = no                     60.11%
                                                      1 = yes                    39.89%
 6   Benzodiazepine               Ongoing Treatment   0 = no                     65.17%
                                                      1 = yes                    34.83%
 7   Antidepressant               Ongoing Treatment   0 = no                     22.47%
                                                      1 = yes                    77.53%
 8   Antidepressant Combination   Ongoing Treatment   0 = no                     79.49%
                                                      1 = yes                    20.51%
 9   Antipsychotic Augmentation   Ongoing Treatment   0 = no                     80.90%
                                                      1 = yes                    19.10%
10   10-Hz rTMS/iTBS              Clinical Variable   0 = 10-Hz rTMS             49.72%
                                                      1 = iTBS                   50.28%
11   Anxiety Comorbidity          Clinical Variable   0 = no                     46.63%
                                                      1 = yes                    53.37%
12   Stimulation Intensity        Clinical Variable   21 - 90                    49.43 +/- 10.80
13   Age of Onset                 Clinical Variable   3 - 55                     20.96 +/- 10.93
14   Months of Episode            Clinical Variable   0 - 216                    23.39 +/- 27.28
15   ATHF Strength                Treatment History   0 - 16                     6.37 +/- 3.38
16   ATHF Trials                  Treatment History   0                          7.30%
                                                      1                          42.98%
                                                      2                          29.49%
                                                      3                          20.22%
17   ATHF High                    Treatment History   0                          0.28%
                                                      1                          3.09%
                                                      2                          4.78%
                                                      3                          34.55%
                                                      4                          49.44%
                                                      5                          7.78%
*: The mean and standard deviation for continuous variables and the percentage for each category of discrete variables are listed.

Table 2.5: Summary and statistics of demographic, ongoing treatment, and other clinical data, collected at baseline.

The Brief Symptom Inventory (BSI) is a 53-item psychological and psychiatric instrument, consisting of 9 subscales that evaluate different symptoms, such as anxiety, depression, and somatization. The BSI-A is the anxiety subscale, consisting of 6 items scored from 0 - 4. Table 2.6 summarizes the BSI-A subscale. Although the BSI-A is not a depression rating scale, it was used to complement the depression symptom assessment. Only the total score for the baseline BSI-A was included in the ML analysis.

 #   Item                    Score Range   Mean +/- SD
 1   Nervousness/Shakiness   0 - 4         2.15 +/- 1.11
 2   Suddenly Scared         0 - 4         1.07 +/- 1.22
 3   Fearful                 0 - 4         1.86 +/- 1.27
 4   Tense/Keyed up          0 - 4         2.44 +/- 1.12
 5   Terror/Panic            0 - 4         1.01 +/- 1.25
 6   Restless                0 - 4         1.43 +/- 1.27
     Total Score             0 - 24        9.96 +/- 5.29
The total score is obtained by summing all individual items.

Table 2.6: Summary and statistics for the baseline BSI-A items.

2.2.4 Predictor Selection

Several related studies have been based on early improvements and total depression scale scores (Corlier et al. 2019; Feffer et al. 2018; Kelly et al. 2017); therefore, total score models using logistic/linear regression were developed during the stage 1 analysis. Other studies discovered predictive value in demographic data, antidepressant treatment history, and item/subscale scores from depression rating scales (Trevizol et al. 2020; Brakemeier et al. 2008; 2007). Additional ML models were included to examine these predictors during the stage 2 analysis.
Based on the literature, we labelled all individual items with different subscales (factors) from the HRSD-C-17, IDS-C-30, and QIDS-SR-16 (Table 2.2, Table 2.3, and Table 2.4, respectively). We compared the predictive performance of the individual items, the subscales created from the literature, and the total score, each combined with demographic data and other baseline clinical data. We tested the following feature sets in our analysis:

• Stage 1:
  • The total scores of the three depression rating scales, collected at multiple time points.
• Stage 2:
  • Set 1: Individual items from the depression rating scale, available at multiple time points, combined with demographic data and other clinical data collected at baseline.
  • Set 2: Subscales from the depression rating scale, available at multiple time points, combined with demographic data and other clinical data collected at baseline.
  • Set 3: Total scores of the depression rating scale, available at multiple time points, combined with demographic data and other clinical data collected at baseline.

Different depression rating scales contain different numbers of items. In the IDS-C-30 and QIDS-SR-16, only one of the two weight items and one of the two appetite items should be answered. Thus, these questions were merged into weight change and appetite change, leaving only 28 items in the IDS-C-30 and 14 items in the QIDS-SR-16. The number of features is presented in Table 2.7; the numbers of features used were the same for both the classification and regression models. At baseline, only baseline data were used to train the ML models, and at week 3, all the data available at week 3 were used to train the ML models.

                   HRSD-C-17   IDS-C-30   QIDS-SR-16
Baseline   Set 1       35          46          32
           Set 2       22          21          20
           Set 3       19          19          19
Week 1     Set 1       52          74          46
           Set 2       26          24          22
           Set 3       20          20          20
Week 2     Set 1       69         102          60
           Set 2       30          27          24
           Set 3       21          21          21
Week 3     Set 1       86         130          74
           Set 2       34          30          26
           Set 3       22          22          22

Table 2.7: Summary of the number of features used in our model.
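The set 1 and set 3 counts in Table 2.7 follow directly from the structure described above: a model at week w uses the scale measurements from baseline through week w (w + 1 time points) plus the 18 baseline characteristics. A small sketch of this arithmetic (our own helper functions, assuming the merged item counts of 17, 28, and 14):

```python
BASELINE_FEATURES = 18          # demographic, treatment, clinical, BSI-A total
ITEMS = {"HRSD-C-17": 17, "IDS-C-30": 28, "QIDS-SR-16": 14}

def n_features_set1(scale, week):
    """Set 1: every individual item at each available time point,
    plus the fixed baseline characteristics."""
    time_points = week + 1      # baseline through week `week`
    return ITEMS[scale] * time_points + BASELINE_FEATURES

def n_features_set3(scale, week):
    """Set 3: one total score per available time point,
    plus the fixed baseline characteristics."""
    return (week + 1) + BASELINE_FEATURES

# Reproduce a few entries of Table 2.7
print(n_features_set1("IDS-C-30", 0))    # baseline: 28 + 18 = 46
print(n_features_set1("HRSD-C-17", 3))   # week 3: 17*4 + 18 = 86
print(n_features_set3("QIDS-SR-16", 2))  # week 2: 3 + 18 = 21
```

The set 2 counts follow the same pattern, with the number of literature-based subscales (4, 3, and 2, respectively) in place of the item counts.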
2.3 Exploratory Data Analysis

Exploratory data analysis (EDA) is a crucial and necessary step for understanding the data before modelling them. NumPy (Oliphant 2006) and Pandas (McKinney 2010) were utilized to perform the statistical analysis, and all figures were created in Python 3.6 with Matplotlib (J. D. Hunter 2007) and Seaborn (Waskom et al. 2014).

The treatment response rates, as assessed by the 3 depression scales, varied from 39.9% to 49.7%, and the remission rates were lower, varying from 24.4% to 30.3%. On average, symptoms improved by 43.7%, 39.1%, and 38.4%, from baseline mean total scores of 23.5, 39.27, and 17.0 to post-treatment mean total scores of 13.2, 24.1, and 10.5, for the HRSD-C-17, IDS-C-30, and QIDS-SR-16, respectively (Table 2.8).

             Response Rate   Remission Rate   Average Percentage of Symptom Improvement
HRSD-C-17        49.7%           30.3%            43.7%
IDS-C-30         39.9%           28.9%            39.1%
QIDS-SR-16       41.9%           24.4%            38.4%

Table 2.8: Treatment response and remission rates and the average percentage of symptom improvement, as assessed by the three depression rating scales.

The scales used in this study have been validated in many previous studies (Rush et al. 1996; 2006; 2003). A Venn diagram (Figure 2.2) was generated to visualize the consistency among the responders and remitters defined by the three depression rating scales. A total of 158 subjects (44.4%) were not defined as responders by any of the three scales, whereas 114 subjects (32.0%) were defined as responders by all three scales. By comparison, 226 subjects (63.5%) were not defined as remitters by any of the three scales, whereas 68 subjects (19.1%) were defined as remitters by all three scales. In summary, 84 subjects (23.6%) showed inconsistent labelling for treatment response, and 62 subjects (17.4%) were inconsistently labelled for treatment remission, depending on the definition used by each scale.

Figure 2.2: The proportions of responders (upper) and remitters (lower) defined by three different scales.
This figure demonstrates that the definitions for treatment response and remission were highly consistent among the three depression rating scales.

To visualize the percentage of clinical improvement, scatter plots were also generated, with three histograms showing the percentage of symptom improvement for each scale (Figure 2.3).

Figure 2.3: Scatter plots for the three depression rating scales, and histograms showing the percentage of symptom improvement for each depression rating scale. This figure demonstrates that the percentages of symptom improvement were highly consistent across all three scales.

2.4 ML Analysis

Python is one of the most popular programming languages for ML, with broad community support. Anaconda is an open-source distribution of Python intended for scientific computing, which includes Scikit-learn (Pedregosa et al. 2011) and many other libraries. We implemented all ML models using Scikit-learn 0.22.2, in Python 3.6, and ran the experiments under Anaconda on Windows 10. In this section, we introduce the technologies and methods used in this study, including logistic regression, linear regression, Elastic Net regularization, and the Random Forest classifier/regressor. We standardized all features to a standard normal distribution (zero mean, unit variance) when training the models.

2.4.1 From Linear Regression to Logistic Regression

Linear regression is a well-known statistical approach, used to model relationships between explanatory variables and response variables. In the field of ML, linear regression represents a model from the supervised learning family that generates outputs as linear combinations of the inputs plus a bias. Assume that we have $n$ pairs $\{x_i : y_i\}$ in our training dataset; if we define the number of features in the input $x$ as $d$, then $x_i \in \mathbb{R}^d$, and the weight can be defined as $w \in \mathbb{R}^d$.
The formula for the $i$-th sample, based on the definition of linear regression, is as follows (Raschka 2015):

$$\hat{y}_i = w^T x_i + b = \sum_{j=1}^{d} w_j x_{ij} + b \qquad (2.1)$$

where $w$ is the coefficient vector and $b$ is the bias (also called the intercept). To estimate how good the fit is, an objective function for a sample size of $n$ is defined, which measures the differences between the predictions and the ground-truth labels (Raschka 2015):

$$\text{Minimize } f(w, b) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \Big( y_i - \Big( \sum_{j=1}^{d} w_j x_{ij} + b \Big) \Big)^2 \qquad (2.2)$$

This formulation is also referred to as ordinary least squares (OLS) linear regression.

Logistic regression is a classical classification model based on the sigmoid function, whose "S" shape (Figure 2.4) maps an input $z \in (-\infty, +\infty)$ to $\sigma(z) \in (0, 1)$. The sigmoid function has a convenient symmetry property: $\sigma(-z) = 1 - \sigma(z)$.

Figure 2.4: Sigmoid function

The sigmoid function is defined as $\sigma(z) = \frac{1}{1 + e^{-z}}$. Suppose the label $y$ is 0 or 1. Combining the sigmoid function with the linear regression $z = w^T x_i + b$, the following equations can be obtained, given $p(y = 1 \mid z) = \sigma(z)$ and $p(y = 0 \mid z) = 1 - \sigma(z)$ (Raschka 2015):

$$p(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-(w^T x_i + b)}} \qquad (2.3)$$

$$p(y_i = 0 \mid x_i) = \frac{e^{-(w^T x_i + b)}}{1 + e^{-(w^T x_i + b)}} = \frac{1}{1 + e^{w^T x_i + b}} \qquad (2.4)$$

$$\ln\left( \frac{p(y_i = 1 \mid x_i)}{p(y_i = 0 \mid x_i)} \right) = w^T x_i + b \qquad (2.5)$$

Also,

$$p(y_i \mid x_i) = \left( \frac{1}{1 + e^{-(w^T x_i + b)}} \right)^{y_i} \left( \frac{1}{1 + e^{w^T x_i + b}} \right)^{1 - y_i} \qquad (2.6)$$

$$p(y_i \mid z_i) = \sigma(z_i)^{y_i} \big( 1 - \sigma(z_i) \big)^{1 - y_i} \qquad (2.8)$$

Given a sample of $n$ subjects, we attempt to identify the optimal parameters $w$ and $b$ that maximize $P(y_1, y_2, y_3, \ldots, y_n \mid x_1, x_2, x_3, \ldots, x_n)$. Under the i.i.d. assumption, $P(y_1, y_2, y_3, \ldots, y_n \mid x_1, x_2, x_3, \ldots, x_n) = \prod_{i=1}^{n} P(y_i \mid x_i)$.
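Before moving on, the probability model above can be illustrated numerically (a sketch with made-up weights, not the fitted model from this thesis): the sigmoid maps any linear score to a valid probability, and the symmetry property guarantees that the two class probabilities sum to one.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps R to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative weights and bias (arbitrary, for demonstration only)
w = [0.8, -0.5]
b = 0.1

def p_positive(x):
    """p(y = 1 | x) under the logistic model of Eq. 2.3."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

x = [1.0, 2.0]
p1 = p_positive(x)   # z = 0.8 - 1.0 + 0.1 = -0.1, so p1 = sigmoid(-0.1)
p0 = 1.0 - p1        # p(y = 0 | x) = sigmoid(0.1), by the symmetry property
```

Here p1 ≈ 0.475, slightly below 0.5 because the linear score is slightly negative.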
Because the log-likelihood is more convenient to maximize, the original expression is converted to its negative log-likelihood form, resulting in the objective function (Raschka 2015):

$$\text{Minimize } f(z_i) = -\log\Big( \prod_{i=1}^{n} p(y_i \mid z_i) \Big) = \sum_{i=1}^{n} \big( -y_i \log(\hat{y}_i) - (1 - y_i) \log(1 - \hat{y}_i) \big) = \sum_{i=1}^{n} \big( -y_i \log(\sigma(z_i)) - (1 - y_i) \log(1 - \sigma(z_i)) \big) \qquad (2.9)$$

2.4.2 From L1 and L2 Regularization to Elastic Net Regularization

As mentioned in the previous section, OLS linear regression minimizes the sum of squared residuals. Adding the squared weights to the end of the objective function forms a new objective function (Raschka 2015):

$$\text{Minimize } f(w, b) = \sum_{i=1}^{n} \Big( y_i - \Big( \sum_{j=1}^{d} w_j x_{ij} + b \Big) \Big)^2 + \alpha \sum_{j=1}^{d} w_j^2 \qquad (2.10)$$

The regularization term $\sum_{j=1}^{d} w_j^2$ can also be written as $\|w\|_2^2$, where $w = \langle w_1, w_2, w_3, \ldots, w_d \rangle$; this is referred to as the L2 norm. This formulation is known as ridge regression (Hoerl and Kennard 1970), and L2 regularization can prevent overfitting by shrinking $w$ toward small values. Adding the absolute values of the weights to the end of the objective function generates the following equation (Raschka 2015):

$$\text{Minimize } f(w, b) = \sum_{i=1}^{n} \Big( y_i - \Big( \sum_{j=1}^{d} w_j x_{ij} + b \Big) \Big)^2 + \beta \sum_{j=1}^{d} |w_j| \qquad (2.11)$$

The regularization term $\sum_{j=1}^{d} |w_j|$ can also be written as $\|w\|_1$, where $w = \langle w_1, w_2, w_3, \ldots, w_d \rangle$. When the L1 norm is applied to linear regression, it is referred to as Lasso regression (Tibshirani 1996). Lasso regression performs variable selection implicitly because it can shrink coefficients to exactly 0. Elastic Net regularization adds both the L1 and L2 regularization terms to the objective function (Zou and Hastie 2005).
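The ridge, lasso, and combined penalties can be compared directly in a few lines (a sketch on an arbitrary weight vector; the α and β values are illustrative only):

```python
def l2_penalty(w, alpha):
    """Ridge term: alpha times the sum of squared weights."""
    return alpha * sum(wj ** 2 for wj in w)

def l1_penalty(w, beta):
    """Lasso term: beta times the sum of absolute weights."""
    return beta * sum(abs(wj) for wj in w)

def elastic_net_penalty(w, alpha, beta):
    """Elastic Net combines both terms."""
    return l2_penalty(w, alpha) + l1_penalty(w, beta)

w = [0.5, -2.0, 0.0, 1.0]
print(l2_penalty(w, 1.0))                # 0.25 + 4 + 0 + 1 = 5.25
print(l1_penalty(w, 1.0))                # 0.5 + 2 + 0 + 1 = 3.5
print(elastic_net_penalty(w, 0.5, 0.5))  # 0.5*5.25 + 0.5*3.5 = 4.375
```

Note how the L2 term punishes the large weight (-2.0) much more heavily than the L1 term does, while neither term penalizes the weight that is exactly zero; this asymmetry underlies the variable-selection behaviour of the lasso.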
By combining the L1 and L2 regularization terms, the objective function for Elastic Net-regularized linear regression is as follows (Raschka 2015):

$$\text{Minimize } f(w, b) = \sum_{i=1}^{n} \Big( y_i - \Big( \sum_{j=1}^{d} w_j x_{ij} + b \Big) \Big)^2 + \alpha \sum_{j=1}^{d} w_j^2 + \beta \sum_{j=1}^{d} |w_j| \qquad (2.12)$$

Similarly, by adding the same regularization terms to logistic regression, the objective function for Elastic Net-regularized logistic regression can be derived.

2.4.3 From Decision Tree to Random Forest

A decision tree is a sequence of nested "if-else" logical conditions on the features, with a strong ability to capture interactions between features and a natural visualization. Fitting a good tree requires identifying the best splitting rules, which determine when and how the inputs should be divided into different sub-categories. Decision trees can handle both numerical and categorical features and can be interpreted through visualization. Here, a specific type of splitting algorithm, the Classification and Regression Tree (CART) (Breiman et al. 1984), is discussed. The authors demonstrated this approach with a clinical application that predicted whether heart attack subjects would survive after one month, by classifying them into low-risk and high-risk groups. They built a simple model based on only three features (age, blood pressure, and sinus tachycardia) and successfully classified 89% of low-risk and 75% of high-risk patients. A decision tree built using our dataset is visualized in Figure 2.5.

Figure 2.5: An example of a classification decision tree, built using our dataset.
The optimization problem for decision trees is defined as follows: iteratively process all features to identify the best split rule $w$ that maximizes the information gain on dataset $S$:

$$\text{Information Gain} = I(S) - \frac{n_1}{n} I(S_1) - \frac{n_2}{n} I(S_2)$$

where $S_1$ and $S_2$ are the subsets split by rule $w$, and $n_1$ and $n_2$ are the corresponding sample sizes of the subsets $S_1$ and $S_2$, respectively. Several options exist for the impurity function $I$, including the Gini index (used in CART) and entropy, which is used in other decision tree algorithms, such as ID3 (Quinlan 1986). The Gini index is defined as follows:

$$I_{\text{Gini}} = \sum_{i=1}^{C} p_i \sum_{k \neq i} p_k = 1 - \sum_{i=1}^{C} p_i^2$$

where $C$ is the number of classes, such that $i \in \{1, 2, 3, \ldots, C\}$, and $p_i$ is the fraction of subjects belonging to class $i$. The Gini index measures the probability of mislabelling a randomly chosen subject if all subjects were labelled randomly according to the class distribution. When all subjects in a subset have the same label, the Gini index reaches its minimum of 0. To fit a decision tree, the optimal splitting rule is obtained greedily at each iteration, until the maximum depth is reached or the samples cannot be separated further.

Decision trees may suffer reduced performance because of overfitting. Decision trees are built from all features of all subjects included in the training dataset. As decision trees grow deeper, they are likely to yield better performance on the training dataset, by effectively memorizing the whole dataset, but are not likely to improve performance on the testing dataset. The straightforward solution is to limit the depth by setting a smaller maximum depth; however, the resulting performance may depend heavily on the chosen value. The Random Forest method addresses this problem very well, reducing overfitting while maintaining a relatively low level of error caused by bias.
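The Gini index and information gain described above can be sketched in a few lines (our own helper functions, illustrating the impurity computation rather than the full CART implementation):

```python
def gini(labels):
    """Gini index: 1 minus the sum of squared class fractions."""
    n = len(labels)
    fractions = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(p ** 2 for p in fractions)

def information_gain(parent, left, right):
    """I(S) - (n1/n) I(S1) - (n2/n) I(S2) for one candidate split."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))

# A pure node has Gini 0; a perfectly separating split has maximal gain.
parent = [1, 1, 1, 0, 0, 0]
print(gini(parent))                                    # 0.5
print(gini([1, 1, 1]))                                 # 0.0
print(information_gain(parent, [1, 1, 1], [0, 0, 0]))  # 0.5
```

A split that mixes the classes, such as [1, 1, 0] versus [1, 0, 0], yields a strictly smaller gain, which is why the greedy search prefers splits that produce purer children.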
A Random Forest is a set of decision trees in which each tree is trained on a different random subset of features and subjects. For example, tree 1 may be trained on five random features from ten random subjects, and tree 2 on a different set of five random features from another ten random subjects. Because of this randomness, the trees are unlikely to be similar, allowing each tree to make independent mistakes, whereas any pattern that is repeatedly observed across multiple trees is considered useful. By averaging the results from each tree, the possibility of overfitting can be reduced, even when using deep trees. Research has shown that the Random Forest algorithm is among the most likely to yield the best performance on small datasets (Fernández-Delgado et al. 2014).

2.4.4 Binary Classification Model

During the stage 1 analysis, a non-regularized logistic regression classifier (LRC) was trained using the total scores of the depression rating scales. During the stage 2 analysis, three different classification models, namely an LRC, an Elastic Net-regularized logistic regression classifier (ENC), and a Random Forest classifier (RFC), were built for the treatment response and remission prediction models using the stage 2 features. Because response and remission, as defined by the HRSD-C-17, IDS-C-30, and QIDS-SR-16, have different degrees of class imbalance, the hyperparameter "class_weight" was set to "balanced" for all classification models. Our repeated cross-validation and permutation tests required heavy computational resources; therefore, "n_jobs" was set to -1 to allow parallel computing. For Elastic Net regularization, the optimal "l1_ratio" was searched in the range [0, 0.25, 0.5, 0.75, 1], which corresponds to different weightings of the L1 and L2 regularization terms. The hyperparameter "C", the inverse of the regularization strength, was also tuned in the range [0.0001, 0.001, 0.01, 0.1, 1, 10].
For the Random Forest classifier, "n_estimators", the number of decision trees in one model, was tuned in the range [5, 25, 50, 100]; "max_depth", the maximum depth of each decision tree, was tuned in the range [2, 4, 8, 16]; "min_samples_split", the minimum number of subjects at split nodes, was tuned in the range [2, 4, 8, 16]; and "min_samples_leaf", the minimum number of subjects at leaf nodes, was tuned in the range [1, 2, 4, 8]. All other hyper-parameters for the ML models were set as follows, and all unmentioned hyper-parameters were left at the package defaults:

• LRC: LogisticRegression(C=1.0, class_weight='balanced', dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=100, multi_class='warn', n_jobs=-1, penalty='none', random_state=None, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False)

• ENC: LogisticRegression(C=1.0, class_weight='balanced', dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=0.5, max_iter=100, multi_class='warn', n_jobs=-1, penalty='elasticnet', random_state=None, solver='saga', tol=0.0001, verbose=0, warm_start=False)

• RFC: RandomForestClassifier(bootstrap=True, class_weight='balanced', criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=-1, oob_score=False, random_state=None, verbose=0, warm_start=False)

2.4.5 Regression Model

During the stage 1 analysis, a non-regularized OLS linear regression regressor (LNR) was trained using the total scores of the depression rating scales. During the stage 2 analysis, three regression models, namely an LNR, an Elastic Net-regularized linear regression regressor (ENR), and a Random Forest regressor (RFR), were trained to predict the percentage of symptom improvement.
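For reference, the search spaces above can be collected into grid dictionaries of the kind passed to a grid search (a sketch; the keys follow the scikit-learn parameter names quoted above):

```python
# Grid-search spaces for the Elastic Net classifier (ENC)
# and the Random Forest classifier (RFC), as listed above.
enc_grid = {
    "l1_ratio": [0, 0.25, 0.5, 0.75, 1],     # L1/L2 mixing weight
    "C": [0.0001, 0.001, 0.01, 0.1, 1, 10],  # inverse regularization strength
}

rfc_grid = {
    "n_estimators": [5, 25, 50, 100],      # trees per forest
    "max_depth": [2, 4, 8, 16],            # maximum tree depth
    "min_samples_split": [2, 4, 8, 16],    # minimum subjects at a split node
    "min_samples_leaf": [1, 2, 4, 8],      # minimum subjects at a leaf node
}

# Number of candidate settings an exhaustive grid search must examine:
n_enc = len(enc_grid["l1_ratio"]) * len(enc_grid["C"])  # 5 * 6 = 30
n_rfc = 1
for values in rfc_grid.values():
    n_rfc *= len(values)                                # 4^4 = 256
```

The forest grid is roughly an order of magnitude larger than the Elastic Net grid, which is part of why the repeated nested cross-validation described below was computationally heavy.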
The hyper-parameter search spaces defined for the ENR and RFR were identical to those defined for the ENC and RFC, respectively, and "alpha" in the ENR plays the same role as "C" in the ENC. All other hyper-parameters were left at the package defaults. The detailed hyper-parameters used for the regression models were as follows:

• LNR: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False)

• ENR: ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.5, max_iter=1000, normalize=False, positive=False, precompute=False, random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

• RFR: RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1, oob_score=False, random_state=None, verbose=0, warm_start=False)

2.5 Model Tuning and Performance Evaluation Pipeline

Cross-validation can be used to estimate how ML models will perform on an unseen, independent dataset. When hyper-parameters are tuned on the whole dataset, cross-validation results are not accurate, due to optimism bias. Nested cross-validation has been suggested as an unbiased approach for estimating model performance on limited data, without requiring an independent testing dataset (Vabalas et al. 2019). To tune our Elastic Net and Random Forest models, we adopted a repeated, nested, 5-fold cross-validation framework, splitting the whole dataset into five folds and then iteratively using 4 folds to train the model and the remaining fold to validate it. For the classification models, a stratified cross-validation approach was adopted, keeping the proportion of each class the same in each fold, which is often better than regular cross-validation for reducing bias and variance (Kohavi 1995).
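The stratified splitting idea can be sketched in pure Python (a simplified illustration without shuffling; the actual analysis used scikit-learn's stratified cross-validation utilities):

```python
def stratified_folds(labels, k=5):
    """Assign each subject index to one of k folds so that every
    fold receives roughly the same proportion of each class."""
    folds = [[] for _ in range(k)]
    for cls in set(labels):
        cls_indices = [i for i, y in enumerate(labels) if y == cls]
        # deal the indices of this class round-robin across the folds
        for position, idx in enumerate(cls_indices):
            folds[position % k].append(idx)
    return folds

# 10 responders (1) and 10 non-responders (0)
labels = [1] * 10 + [0] * 10
folds = stratified_folds(labels, k=5)
for fold in folds:
    members = [labels[i] for i in fold]
    print(members.count(1), members.count(0))  # every fold: 2 2
```

Each fold then serves once as the held-out validation set while the remaining four are used for training and the inner grid search.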
Each fold had roughly the same proportion of subjects from each class. In each iteration, the 4 training folds were used to perform a cross-validated grid search for the optimal hyper-parameters, while the remaining fold was kept as an unseen validation dataset until the validation stage. In the grid search, the best hyper-parameters were identified using the AUC for the classification models and the coefficient of determination ($R^2$ score) for the regression models. The AUC has been recommended as the best single-number metric for a classification algorithm (Bradley 1997) and has been widely used in related studies (Zandvakili et al. 2019; Erguzel et al. 2015; Corlier et al. 2019; Kautzky et al. 2018; Chekroud et al. 2016; Iniesta et al. 2016). The $R^2$ score is also a commonly used evaluation metric in related work (Zandvakili et al. 2019; Iniesta et al. 2016). We repeated this process 100 times, with independent random data partitions, to reduce the variation introduced by the partitioning of the dataset. Together with the AUC, we also extracted the accuracy, sensitivity, and specificity of our classification models. These pipelines are demonstrated in the upper part of Figure 2.6. Finally, we applied our model-building pipeline to the whole dataset, to obtain the final models and extract feature importance.

Figure 2.6: Proposed framework for our ML analysis. Upper: cross-validation. Lower: training the final model and extracting feature importance.

2.5.1 Receiver Operating Characteristic (ROC) Curve and AUC

The confusion matrix is used to evaluate the performance of a binary classification model; its layout is shown in Table 2.9.

                          Ground Truth
                          Positive          Negative
Prediction   Positive     True Positive     False Positive
             Negative     False Negative    True Negative

Table 2.9: Confusion matrix.

The AUC refers to the area under the ROC curve.
The ROC curve summarizes the false positive rate (FPR) and the true positive rate (TPR) at different classification thresholds, based on the predicted probability that a given subject belongs to a specific class. The AUC makes comparisons between classification models possible because it provides a single-number summary of the ROC curve. A perfect model has an AUC of 1, whereas a random guesser would have an AUC near 0.5. An AUC of 0.7 or below is considered low diagnostic accuracy (Swets 1996). In psychiatric practice, a classifier with an AUC of 0.8 or greater is considered useful (McMahon 2014). The formulas for the TPR and FPR are as follows:

$$TPR = \text{Sensitivity} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \qquad (2.13)$$

$$FPR = 1 - \text{Specificity} = \frac{\text{False Positive}}{\text{False Positive} + \text{True Negative}} \qquad (2.14)$$

2.5.2 Coefficient of Determination ($R^2$ Score)

The coefficient of determination ($R^2$ score) is a commonly used metric for measuring how well samples fit a regression model. The $R^2$ score is defined as follows:

$$R^2 = 1 - \frac{\text{Residual Sum of Squares}}{\text{Total Sum of Squares}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (2.15)$$

where $\hat{y}_i$ is the prediction of the regression model, $y_i$ is the ground-truth label of subject $i$, and $\bar{y}$ is the mean of the ground-truth labels. An $R^2$ score can be interpreted as the proportion of variance that can be explained by the features. In most cases, $R^2$ scores range from 0 to 1, but a model that produces worse predictions than the mean may yield a negative value. Higher $R^2$ scores indicate better regression models. However, no threshold exists for a good $R^2$ score in clinical outcomes, because models with low $R^2$ scores may have low precision but still provide information regarding data trends (D. F. Hamilton, Ghert, and Simpson 2015; Poitras et al. 2015).
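These evaluation metrics are straightforward to compute from first principles. The sketch below implements sensitivity, specificity, and the $R^2$ score following Equations 2.13-2.15 (our own pure-Python helpers, equivalent to the standard definitions rather than the scikit-learn implementations used in the analysis):

```python
def sensitivity(tp, fn):
    """TPR (Eq. 2.13): true positives over all actual positives."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TNR: true negatives over all actual negatives (FPR = 1 - TNR, Eq. 2.14)."""
    return tn / (tn + fp)

def r2_score(y_true, y_pred):
    """Coefficient of determination (Eq. 2.15)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

# Confusion matrix with TP = 30, FN = 10, TN = 40, FP = 20
print(sensitivity(30, 10))  # 0.75
print(specificity(40, 20))  # ~0.667

# A regressor that always predicts the mean has an R^2 of exactly 0
y = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y, [2.5] * 4))  # 0.0
print(r2_score(y, y))          # 1.0 (perfect predictions)
```

The mean-prediction baseline makes the negative-$R^2$ case easy to see: any model whose residual sum of squares exceeds the total sum of squares scores below 0.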
2.5.3 Permutation Test

We performed permutation tests to assess the significance of our classification models (Ojala and Garriga 2010). The null distribution for the permutation test was formed from the 5-fold cross-validation results of 1,000 experiments with randomly permuted binary labels. We compared the model performance obtained with the real outcomes against this null distribution and derived a p-value, which measures the likelihood that our real results were obtained by chance.

Chapter 3: Results

3.1 Treatment Response Prediction

During stage 1 of the analysis, the LRC was trained using the total scores of the depression rating scales. The AUC, accuracy, sensitivity, and specificity are shown in Figure 3.1. At baseline, the IDS-C-30 response prediction model achieved modest performance (P < 0.05), whereas the models for the other two depression rating scales were not significant. With the outcomes of 1–3 weeks of rTMS treatment, all the evaluation metrics were further enhanced.

Figure 3.1: AUC, accuracy, sensitivity, and specificity for the treatment response prediction models built during the stage 1 analysis. Models were based on the LRC and were trained using the longitudinal total scores of the depression rating scales. All results were validated through repeated stratified cross-validation.

During the stage 2 analysis, using the individual items, subscales, and total scores of the depression rating scales, combined with additional baseline clinical and demographic features, three different classifiers were trained for each feature set: an LRC, an ENC, and an RFC. Figure 3.2 illustrates the heatmaps of all cross-validated AUC values for the stage 2 models. The corresponding accuracy, sensitivity, and specificity are shown in Appendix A. In the models that included only baseline information, the IDS-C-30 was the only scale that produced significant results in the permutation tests. Adding the treatment outcomes for weeks 1–3 improved the predictive value of all models.
Overall, at baseline, our stage 2 IDS-C-30 model exceeded the stage 1 model; however, with the addition of data from weeks 1–3, the results were reversed, with the stage 1 models outperforming all of the stage 2 models in terms of AUC values.

Figure 3.2: AUC for the clinical response prediction models built during the stage 2 analysis. The models were based on the LRC/ENC/RFC and were trained using the longitudinal individual items (set 1)/subscales (set 2)/total scores (set 3) of the depression rating scales, combined with additional baseline clinical and demographic data. The models were validated through repeated stratified cross-validation.

3.1.1 HRSD-C-17 Treatment Response Prediction

None of our baseline HRSD-C-17 response prediction models was significant at P < 0.05 in the permutation tests. With the addition of early treatment outcome data from weeks 1–3, the stage 1 models (LRC and total scores) yielded better results than the stage 2 models. Combining the baseline total score with the week 1 total score, the LRC model reached an AUC of 0.768 (P < 0.001; accuracy = 0.695, sensitivity = 0.667, specificity = 0.722). Including the total score data for baseline and two weeks of early treatment further enhanced the AUC to 0.787 (P < 0.001; accuracy = 0.718, sensitivity = 0.707, specificity = 0.726). The week 3 LRC model, trained with the total score data for baseline and three weeks of early treatment, achieved the highest AUC of 0.840 (P < 0.001; accuracy = 0.753, sensitivity = 0.750, specificity = 0.743) in HRSD-C-17 response prediction. These results implied that the baseline data were likely not sufficient to build prediction models, but that predictive models could be built after 1–3 weeks of rTMS treatment.

3.1.2 IDS-C-30 Treatment Response Prediction

Using only baseline information, IDS-C-30 treatment response prediction could be achieved. The stage 1 model produced an AUC of 0.602 (P < 0.05; accuracy = 0.611, sensitivity = 0.613, specificity = 0.578).
With the additional features added during the stage 2 analysis, the optimal baseline stage 2 model exceeded the stage 1 model's AUC by 0.032 (P < 0.001; AUC = 0.634, accuracy = 0.602, sensitivity = 0.622, specificity = 0.589). However, with early improvement data, the opposite result was observed: the stage 1 models outperformed the stage 2 models. Using only the baseline and week 1 IDS-C-30 total scores, our LRC models obtained an AUC of 0.713 (P < 0.001; accuracy = 0.665, sensitivity = 0.662, specificity = 0.672) at week 1. The AUC increased to 0.753 (P < 0.001; accuracy = 0.683, sensitivity = 0.661, specificity = 0.711) at week 2, and to 0.814 (P < 0.001; accuracy = 0.723, sensitivity = 0.706, specificity = 0.737) at week 3, with the addition of the week 2 and week 3 IDS-C-30 total scores. These results demonstrated the possibility of building IDS-C-30 treatment response prediction models of modest accuracy using baseline variables alone. The use of additional demographic and clinical data could further improve the predictive value of the baseline ENC compared with the total score LRC model. Given its interpretability and feasibility, we suggest building an IDS-C-30 treatment response model from total scores alone, using the LRC model, for patients who have received 1–3 weeks of early rTMS treatment.

3.1.3 QIDS-SR-16 Treatment Response Prediction

Based on the data collected at baseline, none of the QIDS-SR-16 response prediction models was significant at P < 0.05 in the permutation tests. After 1–3 weeks of treatment, the stage 1 models outperformed the stage 2 models, similar to the results observed for the HRSD-C-17 and IDS-C-30 response prediction models. At week 1, the stage 1 LRC model predicted the QIDS-SR-16 response with an AUC of 0.731 (P < 0.001; accuracy = 0.667, sensitivity = 0.631, specificity = 0.694), using the baseline and week 1 QIDS-SR-16 total scores.
The stage 1 AUC improved to 0.788 (P < 0.001; accuracy = 0.717, sensitivity = 0.690, specificity = 0.738) at week 2, and to 0.830 (P < 0.001; accuracy = 0.753, sensitivity = 0.745, specificity = 0.751) at week 3. These results implied that building QIDS-SR-16 treatment response prediction models was feasible, using total scores in the LRC model, after 1–3 weeks of rTMS treatment.

3.2 Treatment Remission Prediction
During stage 1, the total scores of depression rating scales were used to train LRC models. The AUC, accuracy, sensitivity, and specificity are shown in Figure 3.3. Using only baseline total scores, all three scales could predict treatment remission, with significant AUC values in permutation tests (P < 0.001). Adding 1–3 weeks of treatment outcome data further improved the AUC values for all three depression rating scales.

Figure 3.3: AUC, accuracy, sensitivity, and specificity for treatment remission prediction models that were built during the stage 1 analysis. Models were based on LRC and were trained using longitudinal total scores of depression rating scales. All the results were validated through repeated stratified cross-validation.

During stage 2 of the ML analysis, all of the models were trained again to predict treatment remission, using the same three methods that were used to build the response prediction models. A heatmap of the cross-validation AUC values is presented in Figure 3.4. The corresponding accuracy, sensitivity, and specificity can be found in Appendix A. The baseline stage 2 ENC models for the HRSD-C-17 and IDS-C-30 produced higher AUC values than the stage 1 models (stage 1: HRSD-C-17 AUC = 0.630, IDS-C-30 AUC = 0.703; stage 2: HRSD-C-17 AUC = 0.656, IDS-C-30 AUC = 0.735), though the differences were minor. In the early treatment outcome prediction models, the optimal stage 1 models performed comparably to the stage 2 models, in terms of AUC values.
Although the stage 2 models included more features, the optimal stage 2 models outperformed the stage 1 models by a maximum of only 0.002. We concluded that the stage 1 models generated results comparable to those generated by the optimal stage 2 models in the longitudinal analysis.

Figure 3.4: AUC for the clinical remission prediction models that were built during the stage 2 analysis. The models were based on LRC/ENC/RFC and were trained using longitudinal individual items (set 1)/subscales (set 2)/total scores (set 3) of depression rating scales, combined with additional baseline clinical and demographic data. The models were validated through repeated stratified cross-validation.

3.2.1 HRSD-C-17 Treatment Remission Prediction
When using only baseline data, both the stage 1 and stage 2 models could predict remission, but the optimal stage 2 ENC model (AUC = 0.656, accuracy = 0.609, sensitivity = 0.596, specificity = 0.614) returned a better AUC than our stage 1 model (AUC = 0.630, accuracy = 0.570, sensitivity = 0.649, specificity = 0.539). With the addition of early treatment outcome data, the stage 1 models outperformed the stage 2 models, in terms of AUC values. After week 1 of rTMS treatment, the LRC model predicted remission with an AUC of 0.771 (P < 0.001; accuracy = 0.703, sensitivity = 0.697, specificity = 0.705) using the total HRSD-C-17 scores collected at baseline and at week 1. The performance improved further, to an AUC of 0.800, after the addition of week 2 data (P < 0.001; accuracy = 0.706, sensitivity = 0.710, specificity = 0.710), and to 0.822, after the addition of week 3 data (P < 0.001; accuracy = 0.741, sensitivity = 0.763, specificity = 0.728).
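The P values quoted throughout these results come from label permutation tests. A minimal sketch of such a test on synthetic data, assuming scikit-learn's `permutation_test_score` (this is an illustration of the general technique, not the thesis's actual code):

```python
# Label permutation test: refit the classifier on shuffled labels many times
# and compare the real cross-validated AUC against the null distribution.
# Synthetic, illustrative data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                       # one informative, one noise feature
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

score, perm_scores, pvalue = permutation_test_score(
    LogisticRegression(), X, y,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5),
    n_permutations=200,
    random_state=1,
)
print(f"AUC = {score:.3f}, permutation P = {pvalue:.4f}")
```

The P value is the fraction of permuted-label scores that match or exceed the true score, so its smallest attainable value is 1/(n_permutations + 1); reporting "P < 0.001" implies a correspondingly large number of permutations.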
3.2.2 IDS-C-30 Treatment Remission Prediction
Using baseline data, both our stage 1 and stage 2 models could predict remission, but our stage 2 ENC model (AUC = 0.735, accuracy = 0.680, sensitivity = 0.664, specificity = 0.687) exceeded our stage 1 model (AUC = 0.703, accuracy = 0.643, sensitivity = 0.665, specificity = 0.626). With the addition of early treatment outcome data, the optimal stage 2 models returned higher AUC values than the stage 1 models at weeks 1 and 3, but the differences were small. The stage 1 models were also preferable, because they required fewer features and had higher interpretability. With baseline and week 1 total IDS-C-30 scores, our LRC model predicted remission with an AUC of 0.754 (accuracy = 0.684, sensitivity = 0.675, specificity = 0.684). The performance was further enhanced to an AUC of 0.805, with the addition of week 2 data (accuracy = 0.754, sensitivity = 0.760, specificity = 0.749), and to 0.821, with the addition of week 3 data (accuracy = 0.753, sensitivity = 0.761, specificity = 0.752).

3.2.3 QIDS-SR-16 Treatment Remission Prediction
Using only baseline data, the stage 1 and stage 2 models could predict remission, but the stage 1 model (P < 0.001; AUC = 0.680, accuracy = 0.646, sensitivity = 0.621, specificity = 0.654) had better performance than our optimal stage 2 models (P < 0.001; AUC = 0.670, accuracy = 0.631, sensitivity = 0.597, specificity = 0.642). In the longitudinal analysis, the stage 2 models outperformed the stage 1 models in AUC values at weeks 1 and 3, with increases of 0.001 and 0.002, respectively. Because the improvement observed for the stage 2 models was very small, the stage 1 models were preferred here, due to reduced model complexity and increased interpretability. With both baseline and week 1 total QIDS-SR-16 scores, our LRC model predicted remission with an AUC of 0.795 (P < 0.001; accuracy = 0.710, sensitivity = 0.714, specificity = 0.715).
The performance further improved to an AUC of 0.852, with the addition of week 2 data (P < 0.001; accuracy = 0.750, sensitivity = 0.734, specificity = 0.755), and to 0.877, with the addition of week 3 data (P < 0.001; accuracy = 0.766, sensitivity = 0.784, specificity = 0.752).

3.3 Symptom Improvement Prediction
During stage 1, we used the total scores of the depression scales to train an LRR and extracted the R² scores shown in Figure 3.5. Using only the baseline total score, none of the three scales could effectively predict the percentage of symptom improvement. Adding early treatment outcome data enhanced the R² scores for all three depression rating scales. With longer rTMS treatment outcome data, our model was able to predict the percentage of symptom improvement more accurately.

Figure 3.5: R² score for symptom improvement prediction models that were built during the stage 1 analysis. Models were based on LRR and were trained using longitudinal total scores of depression rating scales. All the results were validated through repeated cross-validation.

During stage 2, we built three regression models: an LRR, an ENR, and an RFR. A heatmap of the R² scores is shown in Figure 3.6. Overall, the baseline prediction models obtained R² scores near 0 and were not significant in permutation tests. With the addition of early treatment outcome data, significant R² scores were obtained for all three depression rating scales. The residual plots of all models were examined visually, and no patterns or correlations were observed between features and residuals.

Figure 3.6: R² score for symptom improvement prediction models that were built during the stage 2 analysis. The models were based on LRR/ENR/RFR and were trained using longitudinal individual items (set 1)/subscales (set 2)/total scores (set 3) of depression rating scales, combined with additional baseline clinical and demographic data.
The models were validated through repeated cross-validation.

3.3.1 HRSD-C-17 Symptom Improvement Prediction
The best R² score using baseline data was 0.015, which was not significant in the permutation test. With the addition of 1–3 weeks of rTMS treatment outcome data, the stage 1 models demonstrated better performance than the stage 2 models, in terms of R² scores. The LRR model R² score increased to 0.277 (P < 0.001) when using both the baseline and week 1 treatment outcomes. Adding additional weeks of treatment outcome data further improved the R² score, to 0.333 (P < 0.001) after week 2 of treatment and to the highest predictive value of 0.464 (P < 0.001) after week 3. These results suggested that an HRSD-C-17 symptom prediction model could be built with high interpretability and reliable performance, using the total scores collected during rTMS treatment.

3.3.2 IDS-C-30 Symptom Improvement Prediction
In the pre-treatment models, the optimal R² score was 0.031 and was not significant in the permutation test. With the addition of early treatment outcome data, the stage 1 models exceeded the stage 2 models, in terms of R² scores. When adding week 1 treatment information, the R² score for the LRR model reached 0.157 (P < 0.001). The R² score improved to 0.253 (P < 0.001), with the addition of week 2 data, and to 0.385 (P < 0.001), with the addition of week 3 data. These results suggested that an IDS-C-30 symptom improvement prediction model could be built using the total scores from at least one week of rTMS treatment.

3.3.3 QIDS-SR-16 Symptom Improvement Prediction
Using only baseline data, all of the R² scores were negative. With the addition of treatment outcome data, collected during 1–3 weeks of rTMS treatment, the stage 1 models returned higher R² scores than the stage 2 models. When adding week 1 treatment information, the LRR model R² score improved to 0.179 (P < 0.001).
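The repeated cross-validated R² evaluation used for these LRR models can be sketched as follows. This is a minimal illustration on synthetic data, assuming scikit-learn; the score distributions are assumptions, not the study's data.

```python
# Sketch of a "stage 1" LRR: linear regression predicting percent symptom
# improvement from baseline and early-treatment total scores, scored by
# cross-validated R^2. Synthetic, illustrative data only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(2)
n = 300
baseline = rng.normal(45, 8, n)
week1 = baseline - rng.normal(4, 3, n)     # total score after 1 week
X = np.column_stack([baseline, week1])

# Final percent improvement: early change plus unexplained variation
pct_improvement = (baseline - week1) / baseline + rng.normal(0, 0.08, n)

cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=2)
r2_scores = cross_val_score(LinearRegression(), X, pct_improvement,
                            scoring="r2", cv=cv)
print(f"mean cross-validated R^2: {r2_scores.mean():.3f}")
```

Note that cross-validated R² can be negative when the model predicts held-out targets worse than their mean, which is how the negative baseline QIDS-SR-16 scores above should be read.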
The R² scores increased to 0.301 (P < 0.001), with the addition of week 2 data, and to 0.396 (P < 0.001), with the addition of week 3 data. These results suggested that a QIDS-SR-16 symptom improvement prediction model could be built with total scores from at least 1 week of rTMS treatment.

3.4 Feature Importance Analysis
In the baseline classification analysis, the stage 2 models, built with ENC and additional baseline clinical and demographic data, demonstrated modest improvement in AUC scores over the stage 1 models. In the longitudinal analysis, the stage 1 models had performance comparable to that of the stage 2 models, which were trained using more features. A heatmap of the relative coefficients of the stage 1 models, which were trained using LRC/LRR and total scores alone, is shown in Figure 3.7. Based on the feature importances of the significant prediction models, a higher pre-treatment total score and a lower during-treatment total score were associated with a higher probability of being a treatment responder. In the treatment remission models, the latest available depression rating scale total score was the most significant predictor, with lower latest depression severity associated with a higher likelihood of being a remitter. The percentage of symptom improvement prediction models demonstrated trends similar to those observed for the treatment response prediction models, because the treatment response is defined using the percentage of symptom improvement.

Figure 3.7: Feature coefficients for the stage 1 models (trained using the depression rating scale total scores). Treatment response and remission prediction models were built with LRC. Symptom improvement models were built with LRR.
Although the stage 1 models achieved excellent performance in many situations, the stage 2 models outperformed the stage 1 models for the baseline IDS-C-30 treatment response prediction model, the baseline HRSD-C-17 treatment remission prediction model, and the baseline IDS-C-30 treatment remission prediction model. The relative coefficient plots for these optimal baseline stage 2 models can be found in Appendix B.

To understand which of the additional clinical and demographic features contributed to the prediction models, the feature importances of the Elastic Net and Random Forest models, which were trained using feature set 3 (longitudinal total scores and baseline clinical and demographic data), are illustrated in Figure 3.8 and Figure 3.9, respectively, as they had relatively better performance than the other stage 2 models. The top 10 features were selected from the Elastic Net based on the absolute relative feature coefficients, and from the Random Forest based on the Gini feature importance. In these models, total scores remained the most significant features, but additional features, such as age, stimulation intensity, anxiety severity (assessed by the BSI-A), treatment history (assessed by the ATHF), and employment, may also contribute to the prediction models.

Figure 3.8: Top 10 features extracted from the Elastic Net models, trained using the longitudinal total scores of depression rating scales and baseline clinical and demographic variables.

Figure 3.9: Top 10 features extracted from the Random Forest models, trained using the longitudinal total scores of depression rating scales and baseline clinical and demographic variables.
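The two importance extractions described above (absolute Elastic Net coefficients and Random Forest Gini importances) can be sketched as follows. This assumes scikit-learn; the feature names and data are illustrative placeholders, not the variables or values used in the thesis.

```python
# Sketch: rank features by elastic-net |coefficient| and by Random Forest
# Gini importance. Synthetic data; feature names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
names = ["baseline_total", "week1_total", "age", "stim_intensity", "bsi_a_total"]
X = rng.normal(size=(250, len(names)))
# Only the two total-score features carry signal here
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=250) > 0).astype(int)

enc = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, max_iter=5000).fit(X, y)
rfc = RandomForestClassifier(n_estimators=200, random_state=3).fit(X, y)

enc_rank = sorted(zip(names, np.abs(enc.coef_[0])), key=lambda t: -t[1])
rfc_rank = sorted(zip(names, rfc.feature_importances_), key=lambda t: -t[1])
print("ENC top features:", [n for n, _ in enc_rank[:2]])
print("RFC top features:", [n for n, _ in rfc_rank[:2]])
```

One caveat worth keeping in mind when reading Figures 3.8–3.9: coefficient magnitudes are only comparable across features when the inputs are on a common scale, whereas Gini importances are scale-free but can be diluted across correlated features.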
Chapter 4: Discussion and Conclusion

4.1 Discussion
In this study, we investigated the predictive value of scores from three depression rating scales (HRSD-C-17, IDS-C-30, and QIDS-SR-16), collected at four time points (baseline, week 1, week 2, and week 3 of rTMS treatment), using three ML tasks (treatment response, remission, and percentage of symptom improvement prediction). We designed a two-stage ML analysis. During stage 1, non-regularized logistic regression and linear regression (LRC/LRR) models were trained using the depression rating scale total scores. During stage 2, three different feature sets (individual items, subscales, or total scores, each combined with additional baseline clinical and demographic data) were examined using three ML models for each feature set (LRC, ENC, and RFC for classification; LRR, ENR, and RFR for regression). The models were validated through repeated cross-validation and were examined using label permutation tests. Finally, the feature importances were extracted to help understand the decision process used by the ML models. Our results suggested that all three depression rating scales could be used to predict rTMS outcomes for the treatment of TRD.

4.1.1 Baseline Models
Predicting the IDS-C-30 treatment response was feasible at baseline. The baseline IDS-C-30 total score was a negative predictor of treatment response. Using only the baseline total IDS-C-30 score, a prediction model could be developed with an AUC of 0.602 (P < 0.05; accuracy = 0.611, sensitivity = 0.613, specificity = 0.578). A pooled analysis, including data from 11 clinical trials, suggested that less severe baseline depression symptoms were associated with a higher probability of being a treatment responder (Fitzgerald et al. 2016). Another study also found that patients with mild or moderate depressive symptoms had significantly better rTMS treatment outcomes than patients with severe depressive symptoms.
The addition of baseline clinical and demographic data further improved the predictive value, as shown by the optimal baseline stage 2 model (trained using ENC and set 2 features), which reported a higher AUC (P < 0.001; AUC = 0.634, accuracy = 0.602, sensitivity = 0.622, specificity = 0.589). The positive predictors in the baseline IDS-C-30 response prediction model were employment status, older age, and female gender, whereas the negative predictors were benzodiazepine use, baseline BSI-A total score, and baseline IDS-C-30 mood/cognition subscale score. None of these predictors had a significant coefficient. In contrast, we were not able to build a significant model capable of predicting the HRSD-C-17 and QIDS-SR-16 responses using only baseline data. One plausible interpretation could be that the IDS-C-30 represents a more comprehensive measurement of depressive symptoms than either the HRSD-C-17 or the QIDS-SR-16.

In the stage 1 remission models, the total scores of any of the three depression rating scales could predict treatment remission using LRC models at baseline (P < 0.001). The baseline total score was a negative predictor of treatment remission. We observed higher predictive AUC values for the stage 2 HRSD-C-17 and IDS-C-30 models, compared with those for the stage 1 models, when adding clinical and demographic features. In the stage 2 HRSD-C-17 and IDS-C-30 remission prediction models, the significant positive baseline remission predictor was employment status, whereas the significant negative baseline remission predictors included the baseline BSI-A score, the baseline HRSD-C-17 depression subscale, and the baseline IDS-C-30 mood/cognition subscale.

Employment was also associated with a higher treatment remission rate in previous antidepressive treatment prediction studies (Viglione et al. 2019; Chiarotti et al. 2017). Using the same THREE-D dataset, a recent study also suggested that employed status was a positive predictor of treatment remission (Trevizol et al.
2020). According to a large-scale analysis of 2,876 adults with MDD, patients with anxious depression were likely to have poorer antidepressant treatment outcomes (Fava et al. 2008). A previous study reported that responders showed lower levels of baseline anxiety compared with non-responders, among 70 depressive patients treated with rTMS (Brakemeier et al. 2007).

We found that predicting treatment improvement using only baseline information was not feasible for any of the depression rating scales. None of the baseline models returned significant R² scores during the permutation tests. The percentage of symptom improvement, which was a continuous variable, contained more information than either treatment response or remission, which were binary indices, necessitating additional information to make predictions.

Some other predictors were also included in the baseline ENC and RFC models. The relationships of these predictors with rTMS treatment outcomes should be interpreted with caution, due to the lack of statistical evidence.

Age has been widely investigated in previous studies, but the associations between age and rTMS treatment outcomes have been inconsistent (Trevizol et al. 2020; Brakemeier et al. 2007; Mosimann et al. 2004; Huang et al. 2004; Schüle et al. 2003). One recent study, using the same THREE-D dataset, suggested that older age was associated with a rapid response trajectory (Kaster et al. 2019).

Female gender was found to be associated with a higher treatment response rate in a recent study (Bailey et al. 2018). However, the gender effect was not consistently observed in other studies (Hasanzadeh, Mohebbi, and Rostami 2019; Peng et al. 2012; Arns et al. 2012; Brakemeier et al. 2007).

The use of benzodiazepines was found to be a negative predictor of treatment response in this model, and a previous study reported that the response rate at week 6 was significantly lower in benzodiazepine users than in non-benzodiazepine users (A. M. Hunter et al. 2019).
Another study using the same THREE-D dataset found that the absence of benzodiazepine use was associated with a rapid rTMS response trajectory (Kaster et al. 2019).

Increased education was found to be associated with better antidepressive treatment outcomes (Chiarotti et al. 2017), although previous rTMS treatment outcome prediction studies did not suggest significant education differences between responders and non-responders (Magnezi et al. 2016; Arns et al. 2012). Another study, utilizing a Random Forest approach, also suggested education as a useful predictor (Kautzky et al. 2018).

4.1.2 Longitudinal Models
Adding the rTMS treatment outcome data after 1–3 weeks of treatment allowed us to build models capable of predicting the treatment response, treatment remission, and symptom improvement, with moderate accuracy, for all three depression rating scales. We found that, for our treatment response and symptom improvement prediction models, the stage 1 models generated higher AUC scores than the stage 2 models. For the remission prediction models, the optimal stage 2 models could only outperform the stage 1 models by a maximum of 0.002 in AUC scores, for all three depression rating scales. We concluded that early treatment outcome data and total scores were sufficient to build LRC/LRR prediction models. Additional demographic and clinical data and the use of individual items or subscales did not significantly improve the predictive value of these models. After extracting the feature coefficients, the baseline total scores were found to be positive predictors of treatment response, whereas the total scores collected during 1–3 weeks of early rTMS treatment were negatively associated with treatment response. Early improvement indicated a higher possibility of being a treatment responder. Experiencing severe depression symptoms during the most recent available evaluation, after 1–3 weeks of rTMS treatment, indicated a lower chance of being a remitter.
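The responder and remitter labels discussed above can be made concrete with a small sketch. The definitions below follow common conventions (response as at least a 50% reduction in a scale's total score; remission as a scale-specific cutoff on the final total score); the numeric cutoffs and the example patient are assumptions for illustration, not necessarily the exact values used in this study.

```python
# Illustrative outcome definitions; thresholds and cutoffs are assumed
# conventions (e.g., HRSD-17 remission <= 7), not quoted from the thesis.
def percent_improvement(baseline, final):
    """Fractional reduction in total score from baseline."""
    return (baseline - final) / baseline

def is_responder(baseline, final, threshold=0.5):
    """Response: improvement of at least `threshold` (default 50%)."""
    return percent_improvement(baseline, final) >= threshold

def is_remitter(final_total, cutoff):
    """Remission: final total score at or below a scale-specific cutoff."""
    return final_total <= cutoff

# Hypothetical patient scored on two scales, showing how the scales can
# disagree: a remitter by an HRSD-17-style cutoff (<= 7) but not by an
# assumed IDS-C-30-style cutoff (<= 13).
hrsd_baseline, hrsd_final = 22, 7
ids_baseline, ids_final = 40, 16
print(is_responder(hrsd_baseline, hrsd_final),
      is_remitter(hrsd_final, 7),
      is_remitter(ids_final, 13))
```

This kind of cross-scale disagreement is exactly the inconsistency quantified in the limitations section, where a sizable fraction of subjects received conflicting response or remission labels across the three scales.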
The coefficients for the treatment improvement models showed a pattern similar to those for the treatment response prediction models. Better early treatment improvement was associated with better post-treatment improvement, which agreed with previous research (Kim et al. 2011). These results were very intuitive, as the final treatment outcome was defined by the depression rating scale total score, and the latest available total score was a reasonable estimate of the final total score. Our results suggested that longitudinal rTMS prediction models should be built utilizing the total scores of depression rating scales.

4.1.3 Limitations and Future Research Directions
Our study had some limitations. First, although our sample size was relatively large compared with other rTMS prediction studies, our results require further validation using an independent dataset (the testing phase in ML), as replicability is essential for the clinical application of ML models. Second, we assumed that all three depression rating scales were consistent for measuring the treatment outcome; however, they measured different aspects of depression symptoms. In our study, we observed that 84 (23.6%) and 62 (17.4%) subjects demonstrated inconsistent response and remission classifications, respectively, across the three scales. One possible solution could be the development of cross-scale definitions for treatment response, remission, and percentage of improvement. Further studies could also investigate the consistency of the three depression rating scales at the level of individual items. Next, the reported relationships of rTMS treatment outcome with age, gender, education, and benzodiazepine use have been inconsistent across previous studies. Future studies could investigate the mechanisms underlying these predictors during rTMS treatment.
Finally, related studies have shown the high predictive value of using electroencephalogram features on small datasets (sample size < 110), suggesting that our results could be further improved through the combination of clinical data and better biomarkers with multimodal ML models.

4.2 Conclusion
In this study, we developed rTMS outcome prediction models using ML approaches. Our results revealed the following observations:

• We were able to build baseline clinical response prediction models using only the baseline total IDS-C-30 score, and the model achieved a modest performance (AUC = 0.602, accuracy = 0.611, sensitivity = 0.613, specificity = 0.578). Using additional clinical and demographic data could further improve the predictive value of the baseline clinical response prediction models (AUC = 0.634, accuracy = 0.602, sensitivity = 0.622, specificity = 0.589).
• We were not able to build baseline treatment response prediction models with the HRSD-C-17 or QIDS-SR-16.
• We could build baseline clinical remission prediction models using the HRSD-C-17, IDS-C-30, and QIDS-SR-16. The addition of clinical and demographic data to the model could improve the predictive values of the HRSD-C-17 and IDS-C-30 models. In these two models, employment status was a significant positive predictor, whereas the baseline total BSI-A score, the baseline HRSD-C-17 depression subscale, and the baseline IDS-C-30 mood/cognition subscale were significant negative predictors.
• We were not able to build baseline symptom improvement prediction models with the HRSD-C-17, IDS-C-30, or QIDS-SR-16.
• For the longitudinal prediction models, all predictive values improved compared with the respective baseline models. We were able to predict rTMS treatment response, treatment remission, and percentage of symptom improvement using the HRSD-C-17, IDS-C-30, or QIDS-SR-16.
• In the longitudinal prediction models, the total scores were the most significant predictors.
Using the total scores with both logistic regression and linear regression yielded the best results in most situations. Patients who demonstrated greater early improvement had a higher chance of being responders and achieving greater symptom improvement. Lower depression rating scale total scores after early treatment were associated with a higher chance of being a treatment remitter.
• Adding clinical and demographic data did not significantly improve the performance of our longitudinal prediction models. We anticipate that our models could be further improved by including additional features, including better clinical variables and biomarkers. Clinical and demographic variables are widely available, easily obtained, and simple to process. Our work could serve as a benchmark and a basis for building more complex ML prediction models.

Bibliography
American Psychiatric Association. 2013. Diagnostic and Statistical Manual of Mental Disorders: DSM-5. Arlington, VA: American Psychiatric Association.
Arns, Martijn, Wilhelmus H. Drinkenburg, Paul B. Fitzgerald, and J. Leon Kenemans. 2012. “Neurophysiological Predictors of Non-Response to RTMS in Depression.” Brain Stimulation 5 (4): 569–76. https://doi.org/10.1016/j.brs.2011.12.003.
Bagby, R. Michael, Andrew G. Ryder, Deborah R. Schuller, and Margarita B. Marshall. 2004. “The Hamilton Depression Rating Scale: Has the Gold Standard Become a Lead Weight?” American Journal of Psychiatry 161 (12): 2163–77. https://doi.org/10.1176/appi.ajp.161.12.2163.
Bailey, N. W., K. E. Hoy, N. C. Rogasch, R. H. Thomson, S. McQueen, D. Elliot, C. M. Sullivan, B. D. Fulcher, Z. J. Daskalakis, and P. B. Fitzgerald. 2018. “Responders to RTMS for Depression Show Increased Fronto-Midline Theta and Theta Connectivity Compared to Non-Responders.” Brain Stimulation 11 (1): 190–203. https://doi.org/10.1016/j.brs.2017.10.015.
Bailey, N. W., K. E. Hoy, N. C. Rogasch, R. H. Thomson, S. McQueen, D. Elliot, C. M. Sullivan, B. D. Fulcher, Z. J.
Daskalakis, and P. B. Fitzgerald. 2019. “Differentiating Responders and Non-Responders to RTMS Treatment for Depression after One Week Using Resting EEG Connectivity Measures.” Journal of Affective Disorders 242 (January): 68–79. https://doi.org/10.1016/j.jad.2018.08.058.
Bazazeh, Dana. 2018. “Artificial Neural Network Based Prediction of Treatment Response to Repetitive Transcranial Magnetic Stimulation for Major Depressive Disorder Patients.” University of British Columbia. https://doi.org/10.14288/1.0375879.
Blumberger, Daniel M., Fidel Vila-Rodriguez, Kevin E. Thorpe, Kfir Feffer, Yoshihiro Noda, Peter Giacobbe, Yuliya Knyahnytska, et al. 2018. “Effectiveness of Theta Burst versus High-Frequency Repetitive Transcranial Magnetic Stimulation in Patients with Depression (THREE-D): A Randomised Non-Inferiority Trial.” The Lancet 391 (10131): 1683–92. https://doi.org/10.1016/S0140-6736(18)30295-2.
Bowie, Christopher R., Melissa Milanovic, and Tanya Tran. 2019. “Pathophysiology of Cognitive Impairment in Depression.” In Neurobiology of Depression, 27–30. Elsevier. https://doi.org/10.1016/b978-0-12-813333-0.00004-4.
Bradley, Andrew P. 1997. “The Use of the Area under the ROC Curve in the Evaluation of Machine Learning Algorithms.” Pattern Recognition 30 (7): 1145–59. https://doi.org/10.1016/S0031-3203(96)00142-2.
Brakemeier, Eva-Lotta, Alexander Luborzewski, Heidi Danker-Hopfe, Norbert Kathmann, and Malek Bajbouj. 2007. “Positive Predictors for Antidepressive Response to Prefrontal Repetitive Transcranial Magnetic Stimulation (RTMS).” Journal of Psychiatric Research 41 (5): 395–403. https://doi.org/10.1016/J.JPSYCHIRES.2006.01.013.
Brakemeier, Eva-Lotta, Gregor Wilbertz, Silke Rodax, Heidi Danker-Hopfe, Bettina Zinka, Peter Zwanzger, Nicola Grossheinrich, et al. 2008. “Patterns of Response to Repetitive Transcranial Magnetic Stimulation (RTMS) in Major Depression: Replication Study in Drug-Free Patients.” Journal of Affective Disorders 108 (1–2): 59–70.
https://doi.org/10.1016/j.jad.2007.09.007.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks.
Brooks, Megan. 2018. “FDA Clears 3-Minute Brain Stimulation Protocol for Depression.” MedScape Medical News. August 22, 2018. https://www.medscape.com/viewarticle/901052.
Brunoni, Andre R., Anna Chaimani, Adriano H. Moffa, Lais B. Razza, Wagner F. Gattaz, Zafiris J. Daskalakis, and Andre F. Carvalho. 2017. “Repetitive Transcranial Magnetic Stimulation for the Acute Treatment of Major Depressive Episodes: A Systematic Review with Network Meta-Analysis.” JAMA Psychiatry 74 (2): 143–52. https://doi.org/10.1001/jamapsychiatry.2016.3644.
Chekroud, Adam Mourad, Ryan Joseph Zotti, Zarrar Shehzad, Ralitza Gueorguieva, Marcia K. Johnson, Madhukar H. Trivedi, Tyrone D. Cannon, John Harrison Krystal, and Philip Robert Corlett. 2016. “Cross-Trial Prediction of Treatment Outcome in Depression: A Machine Learning Approach.” The Lancet Psychiatry 3 (3): 243–50. https://doi.org/10.1016/S2215-0366(15)00471-X.
Chiarotti, F., A. Viglione, A. Giuliani, and I. Branchi. 2017. “Citalopram Amplifies the Influence of Living Conditions on Mood in Depressed Patients Enrolled in the STAR*D Study.” Translational Psychiatry 7 (3): e1066. https://doi.org/10.1038/tp.2017.35.
Corlier, Juliana, Andrew Wilson, Aimee M. Hunter, Nikita Vince-Cruz, David Krantz, Jennifer Levitt, Michael J. Minzenberg, Nathaniel Ginder, Ian A. Cook, and Andrew F. Leuchter. 2019. “Changes in Functional Connectivity Predict Outcome of Repetitive Transcranial Magnetic Stimulation Treatment of Major Depressive Disorder.” Cerebral Cortex 29 (12): 4958–67. https://doi.org/10.1093/cercor/bhz035.
Dinga, Richard, Lianne Schmaal, Brenda W.J.H. Penninx, Marie Jose van Tol, Dick J. Veltman, Laura van Velzen, Maarten Mennes, Nic J.A. van der Wee, and Andre F. Marquand. 2019.
“Evaluating the Evidence for Biotypes of Depression: Methodological Replication and Extension of Drysdale et Al. (2017).” NeuroImage: Clinical 22 (January): 101796. https://doi.org/10.1016/j.nicl.2019.101796.
Drieling, T., L.O. Schärer, and J.M. Langosch. 2007. “The Inventory of Depressive Symptomatology: German Translation and Psychometric Validation.” International Journal of Methods in Psychiatric Research 16 (4): 230–36. https://doi.org/10.1002/mpr.226.
Drysdale, Andrew T., Logan Grosenick, Jonathan Downar, Katharine Dunlop, Farrokh Mansouri, Yue Meng, Robert N. Fetcho, et al. 2017. “Resting-State Connectivity Biomarkers Define Neurophysiological Subtypes of Depression.” Nature Medicine 23 (1): 28–38. https://doi.org/10.1038/nm.4246.
Dunner, David L., Scott T. Aaronson, Harold A. Sackeim, Philip G. Janicak, Linda L. Carpenter, Terrence Boyadjis, David G. Brock, et al. 2014. “A Multisite, Naturalistic, Observational Study of Transcranial Magnetic Stimulation for Patients with Pharmacoresistant Major Depressive Disorder: Durability of Benefit over a 1-Year Follow-up Period.” Journal of Clinical Psychiatry 75 (12): 1394–1401. https://doi.org/10.4088/JCP.13m08977.
Epstein, Ronald M., Paul R. Duberstein, Mitchell D. Feldman, Aaron B. Rochlen, Robert A. Bell, Richard L. Kravitz, Camille Cipri, Jennifer D. Becker, Patricia M. Bamonti, and Debora A. Paterniti. 2010. “‘I Didn’t Know What Was Wrong:’ How People with Undiagnosed Depression Recognize, Name and Explain Their Distress.” Journal of General Internal Medicine 25 (9): 954–61. https://doi.org/10.1007/s11606-010-1367-0.
Erguzel, Turker Tekin, Serhat Ozekes, Selahattin Gultekin, Nevzat Tarhan, Gokben Hizli Sayar, and Ali Bayram. 2015. “Neural Network Based Response Prediction of RTMS in Major Depressive Disorder Using QEEG Cordance.” Psychiatry Investigation 12 (1): 61–65. https://doi.org/10.4306/pi.2015.12.1.61.
Fava, Maurizio. 2003.
“Diagnosis and Definition of Treatment-Resistant Depression.” Biological Psychiatry. Elsevier USA. https://doi.org/10.1016/S0006-3223(03)00231-2. Fava, Maurizio, A. John Rush, Jonathan E. Alpert, G. K. Balasubramani, Stephen R. Wisniewski, Cheryl N. Carmin, Melanie M. Biggs, et al. 2008. “Difference in Treatment Outcome in Outpatients with Anxious versus Nonanxious Depression: A STAR*D Report.” American Journal of Psychiatry 165 (3): 342–51. https://doi.org/10.1176/appi.ajp.2007.06111868. Feffer, Kfir, Hyewon Helen Lee, Farrokh Mansouri, Peter Giacobbe, Fidel Vila-Rodriguez, Sidney H. Kennedy, Zafiris J. Daskalakis, Daniel M. Blumberger, and Jonathan Downar. 2018. “Early Symptom Improvement at 10 Sessions as a Predictor of RTMS Treatment Outcome in Major Depression.” Brain Stimulation 11 (1): 181–89. https://doi.org/10.1016/j.brs.2017.10.010. Fernández-Delgado, Manuel, Eva Cernadas, Senén Barro, Dinani Amorim, and Amorim Fernández-Delgado. 2014. “Do We Need Hundreds of Classifiers to Solve Real World Classification Problems?” Journal of Machine Learning Research. Vol. 15. http://www.mathworks.es/products/neural-network. Fitzgerald, Paul B., Kate E. Hoy, Rodney J. Anderson, and Zafiris J. Daskalakis. 2016. “A STUDY OF THE PATTERN OF RESPONSE TO RTMS TREATMENT IN DEPRESSION.” Depression and Anxiety 33 (8): 746–53. https://doi.org/10.1002/da.22503. Flett, Gordon L., Paul L. Hewitt, Kirk R. Blankstein, and Shawn W. Mosher. 1995. “Perfectionism, Life Events, and Depressive Symptoms: A Test of a Diathesis-Stress Model.” Current Psychology 14 (2): 112–37. https://doi.org/10.1007/BF02686885. 78  Forgy, E. 1965. “Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of Classification.” Biometrics 21 (3): 768–69. Gartlehner, Gerald, Gernot Wagner, Nina Matyas, Viktoria Titscher, Judith Greimel, Linda Lux, Bradley N. Gaynes, Meera Viswanathan, Sheila Patel, and Kathleen N. Lohr. 2017. 
“Pharmacological and Non-Pharmacological Treatments for Major Depressive Disorder: Review of Systematic Reviews.” BMJ Open. BMJ Publishing Group. https://doi.org/10.1136/bmjopen-2016-014912. George, Mark S., Sarah H. Lisanby, David Avery, William M. McDonald, Valerie Durkalski, Martina Pavlicova, Berry Anderson, et al. 2010. “Daily Left Prefrontal Transcranial Magnetic Stimulation Therapy for Major Depressive Disorder: A Sham-Controlled Randomized Trial.” Archives of General Psychiatry 67 (5): 507–16. https://doi.org/10.1001/archgenpsychiatry.2010.46. Guse, Birgit, Peter Falkai, and Thomas Wobrock. 2010. “Cognitive Effects of High-Frequency Repetitive Transcranial Magnetic Stimulation: A Systematic Review.” Journal of Neural Transmission. https://doi.org/10.1007/s00702-009-0333-7. Hamilton, D. F., M. Ghert, and A. H.R.W. Simpson. 2015. “Interpreting Regression Models in Clinical Outcome Studies.” Bone and Joint Research. British Editorial Society of Bone and Joint Surgery. https://doi.org/10.1302/2046-3758.49.2000571. Hamilton, Max. 1960. “A Rating Scale for Depression.” Journal of Neurology, Neurosurgery, and Psychiatry 23 (February): 56–62. https://doi.org/10.1136/jnnp.23.1.56. Hasanzadeh, Fatemeh, Maryam Mohebbi, and Reza Rostami. 2019. “Prediction of RTMS Treatment Response in Major Depressive Disorder Using Machine Learning Techniques and Nonlinear Features of EEG Signal.” Journal of Affective Disorders 256 (September): 132–42. https://doi.org/10.1016/J.JAD.2019.05.070. Hoerl, ARTHUR E., and ROBERT W. Kennard. 1970. “Ridge Regression: Biased Estimation for Nonorthogonal Problems.” Technometrics 12 (1): 55–67. https://www.math.arizona.edu/~hzhang/math574m/Read/RidgeRegressionBiasedEstimationForNonorthogonalProblems.pdf. Huang, Chih Chia, Tung Ping Su, Ian Kai Shan, Kelly Chang, and I. Hua Wei. 2004. “An Open Trial of Daily Left Prefrontal Cortex Repetitive Transcranial Magnetic Stimulation for Treating Medication-Resistant Depression [1].” European Psychiatry. 
No longer published by Elsevier. https://doi.org/10.1016/j.eurpsy.2004.09.023. Hunter, Aimee M., Michael J. Minzenberg, Ian A. Cook, David E. Krantz, Jennifer G. Levitt, Natalie M. Rotstein, Shweta A. Chawla, and Andrew F. Leuchter. 2019. “Concomitant Medication Use and Clinical Outcome of Repetitive Transcranial Magnetic Stimulation (RTMS) Treatment of Major Depressive Disorder.” Brain and Behavior 9 (5): e01275. https://doi.org/10.1002/brb3.1275. Hunter, J D. 2007. “Matplotlib: A 2D Graphics Environment.” Computing in Science & Engineering 9 (3): 90–95. https://doi.org/10.1109/MCSE.2007.55. 79  Iniesta, Raquel, Karim Malki, Wolfgang Maier, Marcella Rietschel, Ole Mors, Joanna Hauser, Neven Henigsberg, et al. 2016. “Combining Clinical Variables to Optimize Prediction of Antidepressant Treatment Outcomes.” Journal of Psychiatric Research 78 (July): 94–102. https://doi.org/10.1016/j.jpsychires.2016.03.016. Iniesta, Raquel, D. Stahl, and P. McGuffin. 2016. “Machine Learning, Statistical Learning and the Future of Biological Research in Psychiatry.” Psychological Medicine. Cambridge University Press. https://doi.org/10.1017/S0033291716001367. Jaffe, Dena H., Benoit Rive, and Tom R. Denee. 2019. “The Humanistic and Economic Burden of Treatment-Resistant Depression in Europe: A Cross-Sectional Study.” BMC Psychiatry 19 (1): 247. https://doi.org/10.1186/s12888-019-2222-4. James, Spencer L., Degu Abate, Kalkidan Hassen Abate, Solomon M. Abay, Cristiana Abbafati, Nooshin Abbasi, Hedayat Abbastabar, et al. 2018. “Global, Regional, and National Incidence, Prevalence, and Years Lived with Disability for 354 Diseases and Injuries for 195 Countries and Territories, 1990-2017: A Systematic Analysis for the Global Burden of Disease Study 2017.” The Lancet 392 (10159): 1789–1858. https://doi.org/10.1016/S0140-6736(18)32279-7. Kaster, Tyler S., Jonathan Downar, Fidel Vila-Rodriguez, Kevin E. Thorpe, Kfir Feffer, Yoshihiro Noda, Peter Giacobbe, et al. 2019. 
“Trajectories of Response to Dorsolateral Prefrontal RTMS in Major Depression: A Three-D Study.” American Journal of Psychiatry 176 (5): 367–75. https://doi.org/10.1176/appi.ajp.2018.18091096. Kautzky, Alexander, Markus Dold, Lucie Bartova, Marie Spies, Thomas Vanicek, Daniel Souery, Stuart Montgomery, et al. 2018. “Refining Prediction in Treatment-Resistant Depression: Results of Machine Learning Analyses in the TRD III Sample.” Journal of Clinical Psychiatry 79 (1). https://doi.org/10.4088/JCP.16m11385. Kelly, Michael S, Albino J Oliveira-Maia, Margo Bernstein, Adam P Stern, Daniel Z Press, Alvaro Pascual-Leone, and Aaron D Boes. 2017. “Initial Response to Transcranial Magnetic Stimulation Treatment for Depression Predicts Subsequent Response.” Journal of Neuropsychiatry and Clinical Neurosciences 29 (2): 179–82. https://doi.org/10.1176/appi.neuropsych.16100181. Kennedy, Sidney H., Peter Giacobbe, Sakina J. Rizvi, Franca M. Placenza, Nishikawa Yasunori, Helen S. Mayberg, and Andres M. Lozano. 2011. “Deep Brain Stimulation for Treatment-Resistant Depression: Follow-up after 3 to 6 Years.” American Journal of Psychiatry 168 (5): 502–10. https://doi.org/10.1176/appi.ajp.2010.10081187. Khan, Arif, James Faucett, Pesach Lichtenberg, Irving Kirsch, and Walter A. Brown. 2012. “A Systematic Review of Comparative Efficacy of Treatments and Controls for Depression.” PLoS ONE 7 (7). https://doi.org/10.1371/journal.pone.0041778. Khodayari-Rostamabad, Ahmad, James P. Reilly, Gary M. Hasey, Hubert De Bruin, and Duncan MacCrimmon. 2011. “Using Pre-Treatment Electroencephalography Data to Predict Response to Transcranial Magnetic Stimulation Therapy for Major Depression.” In Proceedings of the Annual International Conference of the IEEE 80  Engineering in Medicine and Biology Society, EMBS, 6418–21. IEEE. https://doi.org/10.1109/IEMBS.2011.6091584. 
Kim, Jae Min, Seon Young Kim, Robert Stewart, Joon An Yoo, Kyung Yeol Bae, Sung Won Jung, Min Soo Lee, Hyeon Woo Yim, and Tae Youn Jun. 2011. “Improvement within 2 Weeks and Later Treatment Outcomes in Patients with Depressive Disorders: The CRESCEND Study.” Journal of Affective Disorders 129 (1–3): 183–90. https://doi.org/10.1016/j.jad.2010.09.007. Koenigs, Michael, and Jordan Grafman. 2009. “The Functional Neuroanatomy of Depression: Distinct Roles for Ventromedial and Dorsolateral Prefrontal Cortex.” Behavioural Brain Research. NIH Public Access. https://doi.org/10.1016/j.bbr.2009.03.004. Kohavi, Ron. 1995. “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection.” In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, 1137–1143. IJCAI’95. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. Magnezi, Racheli, Emanuel Aminov, Dikla Shmuel, Merav Dreifuss, and Pinhas Dannon. 2016. “Comparison between Neurostimulation Techniques Repetitive Transcranial Magnetic Stimulation vs Electroconvulsive Therapy for the Treatment of Resistant Depression: Patient Preference and Cost-Effectiveness.” Patient Preference and Adherence 10 (August): 1481–87. https://doi.org/10.2147/PPA.S105654. Martínez-Amorós, Erika, Ximena Goldberg, Verònica Gálvez, Aida de Arriba- Arnau, Virginia Soria, José M. Menchón, Diego J. Palao, Mikel Urretavizcaya, and Narcís Cardoner. 2018. “Early Improvement as a Predictor of Final Remission in Major Depressive Disorder: New Insights in Electroconvulsive Therapy.” Journal of Affective Disorders 235 (August): 169–75. https://doi.org/10.1016/J.JAD.2018.03.014. McClintock, Shawn M., Irving M. Reti, Linda L. Carpenter, William M. McDonald, Marc Dubin, Stephan F. Taylor, Ian A. Cook, et al. 2018. “Consensus Recommendations for the Clinical Application of Repetitive Transcranial Magnetic Stimulation (RTMS) in the Treatment of Depression.” Journal of Clinical Psychiatry. 
Physicians Postgraduate Press Inc. https://doi.org/10.4088/JCP.16cs10905. McKinney, Wes. 2010. “Data Structures for Statistical Computing in Python.” In Proceedings of the 9th Python in Science Conference, edited by Stéfan van der Walt and Jarrod Millman, 51–56. McMahon, Francis J. 2014. “Prediction of Treatment Outcomes in Psychiatry-Where Do We Stand?” Dialogues in Clinical Neuroscience 16 (4): 455–64. Mendlowitz, Andrew B., Alaa Shanbour, Jonathan Downar, Fidel Vila-Rodriguez, Zafiris J. Daskalakis, Wanrudee Isaranuwatchai, and Daniel M. Blumberger. 2019. “Implementation of Intermittent Theta Burst Stimulation Compared to Conventional Repetitive Transcranial Magnetic Stimulation in Patients with Treatment Resistant Depression: A Cost Analysis.” PLoS ONE 14 (9): e0222546. 81  https://doi.org/10.1371/journal.pone.0222546. Micallef-Trigona, Beppe. 2014. “Comparing the Effects of Repetitive Transcranial Magnetic Stimulation and Electroconvulsive Therapy in the Treatment of Depression: A Systematic Review and Meta-Analysis.” Edited by Heinz Grunze. Depression Research and Treatment 2014: 135049. https://doi.org/10.1155/2014/135049. Morishita, Takashi, Sarah M. Fayad, Masa aki Higuchi, Kelsey A. Nestor, and Kelly D. Foote. 2014. “Deep Brain Stimulation for Treatment-Resistant Depression: Systematic Review of Clinical Outcomes.” Neurotherapeutics. Springer New York LLC. https://doi.org/10.1007/s13311-014-0282-1. Mosimann, Urs P., Wolfgang Schmitt, Benjamin D. Greenberg, Markus Kosel, René M. Müri, Magdalena Berkhoff, Christian W. Hess, Hans U. Fisch, and Thomas E. Schlaepfer. 2004. “Repetitive Transcranial Magnetic Stimulation: A Putative Add-on Treatment for Major Depression in Elderly Patients.” Psychiatry Research 126 (2): 123–33. https://doi.org/10.1016/j.psychres.2003.10.006. O’Connor, Margaret, Cornelia Brenninkmeyer, Amy Morgan, Kerry Bloomingdale, Mark Thall, Russell Vasile, and Alvaro Pascual Leone. 2003. 
“Relative Effects of Repetitive Transcranial Magnetic Stimulation and Electroconvulsive Therapy on Mood and Memory: A Neurocognitive Risk-Benefit Analysis.” Cognitive and Behavioral Neurology 16 (2): 118–27. https://doi.org/10.1097/00146965-200306000-00005. O’Reardon, John P., H. Brent Solvason, Philip G. Janicak, Shirlene Sampson, Keith E. Isenberg, Ziad Nahas, William M. McDonald, et al. 2007. “Efficacy and Safety of Transcranial Magnetic Stimulation in the Acute Treatment of Major Depression: A Multisite Randomized Controlled Trial.” Biological Psychiatry 62 (11): 1208–16. https://doi.org/10.1016/j.biopsych.2007.01.018. Ojala, Markus, and Gemma C Garriga. 2010. “Permutation Tests for Studying Classifier Performance.” Journal of Machine Learning Research 11: 1833–63. Oliphant, Travis E. 2006. A Guide to NumPy. Vol. 1. Trelgol Publishing USA. Ontario Health Quality. 2016. “Repetitive Transcranial Magnetic Stimulation for Treatment-Resistant Depression: A Systematic Review and Meta-Analysis of Randomized Controlled Trials.” Ontario Health Technology Assessment Series 16 (5): 1–66. https://pubmed.ncbi.nlm.nih.gov/27099642. Overton, Stacy L., and Sondra L. Medina. 2008. “The Stigma of Mental Illness.” Journal of Counseling & Development 86 (2): 143–51. https://doi.org/10.1002/j.1556-6678.2008.tb00491.x. Pedregosa, F, G Varoquaux, A Gramfort, V Michel, B Thirion, O Grisel, M Blondel, et al. 2011. “Scikit-Learn: Machine Learning in {P}ython.” Journal of Machine Learning Research 12: 2825–30. Peng, Hongjun, Huirong Zheng, Lingjiang Li, Jianbin Liu, Yan Zhang, Baoci Shan, Li Zhang, et al. 2012. “High-Frequency RTMS Treatment Increases White Matter FA in the Left Middle Frontal Gyrus in Young Patients with Treatment-Resistant 82  Depression.” Journal of Affective Disorders 136 (3): 249–57. https://doi.org/10.1016/j.jad.2011.12.006. Poitras, S., K. S. Wood, J. Savard, G. F. Dervin, and P. E. Beaule. 2015. 
“Predicting Early Clinical Function after Hip or Knee Arthroplasty.” Bone and Joint Research 4 (9): 145–51. https://doi.org/10.1302/2046-3758.49.2000417. PubMed. 2020. “Search Results for Machine Learning.” 2020. https://www.ncbi.nlm.nih.gov/pubmed/?term=machine+learning. Quinlan, J. R. 1986. “Induction of Decision Trees.” Machine Learning 1 (1): 81–106. https://doi.org/10.1007/bf00116251. Raschka, Sebastian. 2015. Python Machine Learning. Birmingham, UK: Packt Publishing. Ripley, Brian D, and N L Hjort. 1995. Pattern Recognition and Neural Networks. 1st ed. USA: Cambridge University Press. Rush, A. John, Ira H. Bernstein, Madhukar H. Trivedi, Thomas J. Carmody, Stephen Wisniewski, James C. Mundt, Kathy Shores-Wilson, et al. 2006. “An Evaluation of the Quick Inventory Of Depressive Symptomatology and the Hamilton Rating Scale for Depression: A Sequenced Treatment Alternatives to Relieve Depression Trial Report.” Biological Psychiatry 59 (6): 493–501. https://doi.org/10.1016/j.biopsych.2005.08.022. Rush, A. John, C. M. Gullion, M. R. Basco, R. B. Jarrett, and M. H. Trivedi. 1996. “The Inventory of Depressive Symptomatology (IDS): Psychometric Properties.” Psychological Medicine 26 (3): 477–86. https://doi.org/10.1017/s0033291700035558. Rush, A. John, Madhukar H. Trivedi, Hicham M. Ibrahim, Thomas J. Carmody, Bruce Arnow, Daniel N. Klein, John C. Markowitz, et al. 2003. “The 16-Item Quick Inventory of Depressive Symptomatology (QIDS), Clinician Rating (QIDS-C), and Self-Report (QIDS-SR): A Psychometric Evaluation in Patients with Chronic Major Depression.” Biological Psychiatry 54 (5): 573–83. https://doi.org/10.1016/S0006-3223(02)01866-8. Schotte, Chris K.W., Bart Van Den Bossche, Dirk De Doncker, Stephan Claes, and Paul Cosyns. 2006. “A Biopsychosocial Model as a Guide for Psychoeducation and Treatment of Depression.” Depression and Anxiety 23 (5): 312–24. https://doi.org/10.1002/da.20177. 
Schüle, Cornelius, Peter Zwanzger, Thomas Baghai, Patrick Mikhaiel, Heike Thoma, Hans Jürgen Möller, Rainer Rupprecht, and Frank Padberg. 2003. “Effects of Antidepressant Pharmacotherapy after Repetitive Transcranial Magnetic Stimulation in Major Depression: An Open Follow-up Study.” Journal of Psychiatric Research 37 (2): 145–53. https://doi.org/10.1016/S0022-3956(02)00101-2. Shafer, Alan B. 2006. “Meta-Analysis of the Factor Structures of Four Depression Questionnaires: Beck, CES-D, Hamilton, and Zung.” Journal of Clinical Psychology 62 (1): 123–46. https://doi.org/10.1002/jclp.20213. 83  Sheehan, David V., Yves Lecrubier, K. Harnett Sheehan, Patricia Amorim, Juris Janavs, Emmanuelle Weiller, Thierry Hergueta, Roxy Baker, and Geoffrey C. Dunbar. 1998. “The Mini-International Neuropsychiatric Interview (M.I.N.I.): The Development and Validation of a Structured Diagnostic Psychiatric Interview for DSM-IV and ICD-10.” In Journal of Clinical Psychiatry, 59:22–33. Springer, Kristen W., Jennifer Sheridan, Daphne Kuo, and Molly Carnes. 2003. “The Long-Term Health Outcomes of Childhood Abuse: An Overview and a Call to Action.” Journal of General Internal Medicine. Springer. https://doi.org/10.1046/j.1525-1497.2003.20918.x. Statistics Canada. 2012. “Canadian Community Health Survey: Mental Health.” 2012. https://www150.statcan.gc.ca/n1/daily-quotidien/130918/dq130918a-eng.htm. Sullivan, P. F., M. C. Neale, and K. S. Kendler. 2000. “Genetic Epidemiology of Major Depression: Review and Meta-Analysis.” American Journal of Psychiatry. American Psychiatric Publishing. https://doi.org/10.1176/appi.ajp.157.10.1552. Swets, John A. 1996. Signal Detection Theory and ROC Analysis in Psychology and Diagnostics: Collected Papers. New York: Lawrence Erlbaum Associates, Inc. https://doi.org/https://doi.org/10.4324/9781315806167. Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society 58 (1): 267–88. 
Trevizol, Alisson P., Jonathan Downar, Fidel Vila-Rodriguez, Kevin E. Thorpe, Zafiris J. Daskalakis, and Daniel M. Blumberger. 2020. “Predictors of Remission after Repetitive Transcranial Magnetic Stimulation for the Treatment of Major Depressive Disorder: An Analysis from the Randomised Non-Inferiority THREE-D Trial.” EClinicalMedicine 22 (May): 100349. https://doi.org/10.1016/j.eclinm.2020.100349. Vabalas, Andrius, Emma Gowen, Ellen Poliakoff, and Alexander J. Casson. 2019. “Machine Learning Algorithm Validation with a Limited Sample Size.” PLoS ONE 14 (11). https://doi.org/10.1371/journal.pone.0224365. Viglione, Aurelia, Flavia Chiarotti, Silvia Poggini, Alessandro Giuliani, and Igor Branchi. 2019. “Predicting Antidepressant Treatment Outcome Based on Socioeconomic Status and Citalopram Dose.” The Pharmacogenomics Journal 19 (6): 538–46. https://doi.org/10.1038/s41397-019-0080-6. Vos, Stijn De, Klaas J. Wardenaar, Elisabeth H. Bos, Ernst C. Wit, and Peter De Jonge. 2015. “Decomposing the Heterogeneity of Depression at the Person-, Symptom-, and Time-Level: Latent Variable Models versus Multimode Principal Component Analysis.” BMC Medical Research Methodology 15 (1): 88. https://doi.org/10.1186/s12874-015-0080-4. Waskom, Michael, Olga Botvinnik, Paul Hobson, John B Cole, Yaroslav Halchenko, Stephan Hoyer, Alistair Miles, et al. 2014. “Seaborn: V0.5.0 (November 2014).” Zenodo. https://doi.org/10.5281/zenodo.12710. Weihs, Karen, and Jonathan M. Wert. 2011. “A Primary Care Focus on the Treatment of 84  Patients with Major Depressive Disorder.” American Journal of the Medical Sciences. Lippincott Williams and Wilkins. https://doi.org/10.1097/MAJ.0b013e318210ff56. World Health Organization. 2017. “Depression and Other Common Mental Disorders: Global Health Estimates.” https://doi.org/CC BY-NC-SA 3.0 IGO. Wu, Daxing, Huifang Yin, Shujing Xu, Thomas Carmody, and David W. Morris. 2010. 
“Psychometric Properties of the Chinese Version of Inventory for Depressive Symptomatology (IDS): Preliminary Findings.” Asian Journal of Psychiatry 3 (3): 126–29. https://doi.org/10.1016/j.ajp.2010.08.003. Yadollahpour, Ali, Seyed Ahmad Hosseini, and Ahmad Shakeri. 2016. “RTMS for the Treatment of Depression: A Comprehensive Review of Effective Protocols on Right DLPFC.” International Journal of Mental Health and Addiction 14 (4): 539–49. https://doi.org/10.1007/s11469-016-9669-z. Zandvakili, Amin, Noah S. Philip, Stephanie R. Jones, Audrey R. Tyrka, Benjamin D. Greenberg, and Linda L. Carpenter. 2019. “Use of Machine Learning in Predicting Clinical Response to Transcranial Magnetic Stimulation in Comorbid Posttraumatic Stress Disorder and Major Depression: A Resting State Electroencephalography Study.” Journal of Affective Disorders 252 (June): 47–54. https://doi.org/10.1016/j.jad.2019.03.077. Zou, Hui, and Trevor Hastie. 2005. “Regularization and Variable Selection via the Elastic Net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2): 301–20. https://doi.org/10.1111/J.1467-9868.2005.00503.X@10.1111/(ISSN)1467-9868.TOP_SERIES_B_RESEARCH.  85  Appendix Appendix A: Accuracy, sensitivity, and specificity of stage 2 clinical response and remission prediction models The following figure illustrates the accuracy of stage 2 clinical response and remission prediction models. 86  The following figure illustrates the sensitivity of stage 2 clinical response and remission prediction models.    87  The following figure illustrates the specificity of stage 2 clinical response and remission prediction models.  88  Appendix B: Feature coefficients plots of optimal baseline stage 2 models The following figure demonstrates the feature coefficients of the baseline IDS-C-30 treatment response prediction model, which was based on ENC and was trained using the subscales of depression rating scales and baseline clinical and demographic data.         
The following figure demonstrates the feature coefficients of the baseline HRSD-C-17 treatment remission prediction model, which was based on ENC and was trained using the subscales of depression rating scales and baseline clinical and demographic data.

The following figure demonstrates the feature coefficients of the baseline IDS-C-30 treatment remission prediction model, which was based on ENC and was trained using the subscales of depression rating scales and baseline clinical and demographic data.
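The quantities plotted in Appendices A and B can be sketched in outline with scikit-learn. The snippet below is illustrative only, not the thesis pipeline: it uses synthetic stand-ins for the clinical and demographic features, assumes an elastic-net-penalized logistic regression as the "ENC" classifier, and derives accuracy, sensitivity, and specificity from the confusion matrix before inspecting the fitted coefficients.

```python
# Illustrative sketch (not the thesis pipeline): an elastic-net-penalized
# logistic regression ("ENC") fit on synthetic stand-ins for the baseline
# features, followed by the three metrics shown in Appendix A and the
# coefficient inspection shown in Appendix B.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # stand-in for baseline clinical/demographic features
# Stand-in binary outcome (response vs. non-response) driven by two features.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# The elastic-net penalty mixes L1 and L2 shrinkage; l1_ratio sets the mix
# and requires the "saga" solver in scikit-learn.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
clf.fit(X, y)
y_pred = clf.predict(X)

# Accuracy, sensitivity (true-positive rate), and specificity (true-negative
# rate) from the binary confusion matrix.
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

# The L1 component of the penalty drives uninformative coefficients toward
# zero; the signs and magnitudes of the survivors are what the Appendix B
# coefficient plots display.
coef = clf.coef_.ravel()
```

Sensitivity and specificity trade off against each other as the decision threshold moves, which is why the appendix reports all three metrics rather than accuracy alone.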
