A Homotopy-Minimization Method for Parameter Estimation in Differential Equations and Its Application in Unraveling the Reaction Mechanism of the Min System

by

William Christopher Carlquist

BS in Biology, The University of Utah, 2008
BS in Mathematics, The University of Utah, 2008
M.Sc. in Mathematics, The University of British Columbia, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Mathematics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

January 2019

© William Christopher Carlquist 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

A Homotopy-Minimization Method for Parameter Estimation in Differential Equations and Its Application in Unraveling the Reaction Mechanism of the Min System

submitted by William Christopher Carlquist in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics.

Examining Committee:

Eric Cytrynbaum, Mathematics (Supervisor)
Leah Keshet, Mathematics (Supervisory Committee Member)
Colin Macdonald, Mathematics (University Examiner)
Carl Michal, Physics and Astronomy (University Examiner)
David Campbell, Statistics and Actuarial Science, Simon Fraser University (External Examiner)

Additional Supervisory Committee Members:

Daniel Coombs, Mathematics (Supervisory Committee Member)

Abstract

A mathematical model of a dynamical process, often in the form of a system of differential equations, serves to elucidate underlying dynamical structure and behavior of the process that may otherwise remain opaque. However, model parameters are often unknown and may need to be estimated from data for a model to be informative. Numerical-integration-based methods, which estimate parameters in a differential equation model by fitting numerical solutions to data, can demand extensive computation, especially for large stiff systems that require implicit methods for stability. Non-numerical integration methods, which estimate parameters in a differential equation model by fitting solution approximations to data, do not provide an impartial measure of how well a model fits data, a measure required for the testability of a model. In this dissertation, I propose a new method that steps back from a numerical-integration-based method and instead allows an optimal data-fitting numerical solution to emerge as part of an optimization process. This method bypasses the need for implicit solution methods, which can be computationally intensive, seems to be more robust than numerical-integration-based methods, and, interestingly, admits conservation principles and integral representations, which allow me to gauge the accuracy of my optimization.

The Escherichia coli Min system is one of the simplest known biological systems that demonstrates diverse complex dynamic behavior or transduces local interactions into a global signal. Various mathematical models of the Min system show behaviors that are qualitatively similar to dynamic behaviors of the Min system that have been observed in experiments, but no model has been quantitatively compared to time-course data. In this dissertation, I extract time-course data for model fitting from experimental measurements of the Min system and fit established and novel biochemistry-based models to the time-course data using my parameter estimation method for differential equations.
Comparing models to time-course data allows me to make precise distinctions between biochemical assumptions in the various models. My modeling and fitting supports a novel model, which suggests that a regular, ordered, stability-switching mechanism underlies the emergent, dynamic behavior of the Min system.

Lay Summary

In this dissertation, I develop a method to map experimental measurements onto mathematical models that describe how the measured system changes in time and space. This mapping allows me to test whether a mathematical model can explain experimental observations and helps me understand the underlying dynamic structure of a modeled system. After developing and testing my method, I apply it to map experimental measurements of a protein system that demonstrates interesting dynamic behavior onto established and novel mathematical models that describe the protein system's temporal evolution. My modeling and data mapping inform a novel mechanism that may underlie the dynamic behavior of the protein system.

Preface

William Carlquist performed the research in this dissertation, designed and wrote the computer programs for this dissertation, and wrote this dissertation. Eric Cytrynbaum provided research discussion and feedback on the writing. Vassili Ivanov and Kiyoshi Mizuuchi provided the previously published experimental data used in Chapter 4, Appendix F, and Appendix G. The research in this dissertation is original and unpublished.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
  1.1 Parameter Estimation in Differential Equations
  1.2 The Min System
  1.3 Chapter Summaries

2 A Homotopy-Minimization Method
  2.1 Introduction
  2.2 Method Overview
  2.3 Defining a Measure of Data Fitting, ry(p,x)
  2.4 Defining a Measure of Satisfying a Numerical Solution, r∆x(p,x)
    2.4.1 Inclusion of a Smoothing Penalty in r∆x(p,x)
  2.5 A Concrete Example of ry(p,x) and r∆x(p,x) Using a Model of FRAP
  2.6 Extending the Homotopy on Refined Discretization Grids
    2.6.1 Defining a Measure of Interpolated Data Fitting, ryˆ(p,x)
  2.7 Optimization Using Overlapping-Niche Descent
  2.8 Properties of the Homotopy and Inspection of Overlapping-Niche Descent

3 Testing the Homotopy-Minimization Method
  3.1 Introduction
  3.2 A Model for MinD and MinE Interactions by Bonny et al. (2013)
  3.3 Synthetic Data Generation
  3.4 Details of Optimization Using Overlapping-Niche Descent
    3.4.1 Defining ry(p,x), ryˆ(p,x), and r∆x(p,x)
    3.4.2 Domain Restrictions on Parameters and States
    3.4.3 Niches
    3.4.4 Calculating Confidence Intervals
  3.5 Fitting Forms of the Bonny Model to Synthetic Data
    3.5.1 Fitting the Spatially Homogeneous Bonny Model
    3.5.2 Fitting the Traveling Wave Bonny Model
    3.5.3 Fitting the Full Bonny Model
  3.6 Comparing . . . to a Numerical-Integration-Based Method
  3.7 Noisy Data and Incomplete Modeling
  3.8 Overlapping-Niche Descent in Practice
    3.8.1 Convergence
    3.8.2 Consistency with the Conservation Principle and . . .
  3.9 Discussion

4 Fitting Models of the Min System to Time-Course Data
  4.1 Introduction
  4.2 Choosing and Processing Data to Simplify Fitting
    4.2.1 Extracting Spatially Near-Homogeneous Data
  4.3 Fitting Models to the Near-Homogeneous Data
    4.3.1 Modeling and Fitting Brief
    4.3.2 Models in Which MinE Acts Only as an Inhibitor
    4.3.3 Models in Which MinE Acts as Both a Stabilizer and an Inhibitor
    4.3.4 A Stability-Switching Mechanism
    4.3.5 Results Relating to Experimental Observations
  4.4 Details of Optimization Using Overlapping-Niche Descent
    4.4.1 Statistical Model
    4.4.2 Defining ry(p,x), ryˆ(p,x), and r∆x(p,x)
    4.4.3 Domain Restrictions on Parameters and States
    4.4.4 Niches
    4.4.5 Calculating Confidence Intervals
  4.5 Discussion

5 Conclusion
  5.1 Summary of Results
  5.2 Limitations of Overlapping-Niche Descent
  5.3 Extensions of the Homotopy-Minimization Method
  5.4 Limitations in Fitting Models to Spatially Near-Homogeneous Min Data
  5.5 Extensions of Fitting Models to Spatially Near-Homogeneous Min Data

Bibliography

Appendices

A Extensions of the . . . Method Beyond Systems of First Order . . .
  A.1 Extensions to Systems of Higher Order Ordinary Differential Equations
    A.1.1 Defining ry(p,x), ryˆ(p,x), and r∆x(p,x)
  A.2 Extensions to Systems of Partial Differential Equations
    A.2.1 Defining ry(p,x), ryˆ(p,x), and r∆x(p,x)

B Properties of r(p,x;λ)
  B.1 Limiting Behavior of r˘(λ)
  B.2 Continuity of r˘(λ)
  B.3 Differentiability of r˘(λ)
  B.4 Conservation in r˘y(λ)
  B.5 Integral Representations of Limit Values
  B.6 Bounding Normalized Squared Residual Sums

C Overlapping-Niche Descent
  C.1 Defining Overlapping-Niche Descent
  C.2 Defining Descent
    C.2.1 Descent Scaling
    C.2.2 Descent Acceleration
    C.2.3 Descent on Restricted Domains
  C.3 Descent Prolongation

D Computational Complexities
  D.1 Computational Complexity of r(p,x;λ) Descent
    D.1.1 Formulation of r(p,x;λ) for Counting
    D.1.2 Defining Quantities for Counting
    D.1.3 Counting the Computational Complexity of r(p,x;λ) Descent
  D.2 Computational Complexities of Numerical-Integration-Based Methods
    D.2.1 Counting the Computational Complexity of r(q) Descent
    D.2.2 Counting the Computational Complexity of Newton's Method
    D.2.3 Counting the Computational Complexity of Gradient-Based Methods
  D.3 Comparison of Computational Complexities
    D.3.1 Complexity Assumptions for Comparison
    D.3.2 Comparison of Computational Complexities with Assumptions

E Details of Testing the Homotopy-Minimization Method
  E.1 Implementation of Overlapping-Niche Descent
    E.1.1 Generating Random Parameters and State Values
    E.1.2 Parents and Offspring
    E.1.3 Selection and Random Perturbation
    E.1.4 Dykstra's Method
    E.1.5 Initial values, Termination, Prolongation, and Computation
  E.2 Details of SNSD
  E.3 Details Pertaining to the Implementation . . .
    E.3.1 Selection in Overlapping-Niche Descent
    E.3.2 Prolongation in Overlapping-Niche Descent
    E.3.3 Convergence During Accelerated Descent

F Extracting Near-Homogeneous Data
  F.1 Data Information
  F.2 Aligning Data Tracks
  F.3 Preparing Aligned Data for Analysis
    F.3.1 Temporal Partition of Data
    F.3.2 Intensity Flattening
    F.3.3 Scaling Flattened Data
  F.4 Finding Spatially Homogeneous Data
    F.4.1 Spatially Near-Homogeneous Model Reductions
    F.4.2 Finding Spatially Near-Homogeneous Data
    F.4.3 Errors in Spatially Near-Homogeneous Data
    F.4.4 Bounding Persistent and Bulk Densities

G Implementation of Overlapping-Niche Descent for Near-Homogeneous . . .
  G.1 Generating Random Parameter and State Values
  G.2 Parents and Offspring
  G.3 Selection and Random Perturbation
  G.4 Dykstra's Method
  G.5 Initial values, Termination, Prolongation, and Computation

List of Tables

3.1 Values and definitions of parameters and constants in the Bonny model
3.2 Parameter estimates from . . . the spatially homogeneous Bonny model
3.3 Parameter estimates from . . . the traveling wave Bonny model
3.4 Parameter estimates from . . . the full Bonny model
3.5 Values of ry(p˜, x˜) from overlapping-niche descent and SNSD
3.6 Values of r∆x(p˜, x˜) from overlapping-niche descent and SNSD
3.7 Mean time per iteration of descent from overlapping-niche descent and SNSD
3.8 Total descent time from overlapping-niche descent and SNSD
4.1 Parameters from the fit of the Modified Bonny Model
4.2 Parameters from the fit of the Extended Bonny Model
4.3 Removed-reaction fits of the Extended Bonny Model
4.4 Parameters from the fit of the Symmetric Activation Model
4.5 Parameters from the fit of the Asymmetric Activation Model
4.6 Removed-reaction fits of the Asymmetric Activation Model

List of Figures

1.1 Division-site regulation by the Min system
1.2 MinE acting as an inhibitor of MinD membrane binding
2.1 Overlapping-Niche Descent
3.1 Synthetic spatially-homogeneous data
3.2 Synthetic traveling-wave data
3.3 Synthetic traveling-wave-emergence data
3.4 The fit of the spatially homogeneous Bonny model
3.5 Observable-state errors for noisy-data and incomplete-model fits
3.6 Numerical solution errors for noisy-data and incomplete-model fits
3.7 Parameter variation over λ ∈ (0, 1) for noisy-data and incomplete-model fits
3.8 Convergence of r˜(λ) during overlapping-niche descent
3.9 The evolution of r˜y(λ) and r˜∆x(λ) over generations
3.10 Consistency in conservation of r˜(λ), r˜y(λ), and r˜∆x(λ)
3.11 Consistency in the integral representations of limλ→1− r˜y(λ)
3.12 Cumulative integral representations of limit values
4.1 Near-homogeneous MinD and MinE density data
4.2 The Modified Bonny Model
4.3 The fit of the Modified Bonny Model to the near-homogeneous data
4.4 States from the fit of the Modified Bonny Model
4.5 The Extended Bonny Model
4.6 The fit of the Extended Bonny Model to the near-homogeneous data
4.7 States from the fit of the Extended Bonny Model
4.8 The fit of the ωeE,d→de-null Extended Bonny Model
4.9 The Symmetric Activation Model
4.10 The fit of the Symmetric Activation Model to the near-homogeneous data
4.11 States from the fit of the Symmetric Activation Model
4.12 The Asymmetric Activation Model
4.13 The fit of the Asymmetric Activation Model to the near-homogeneous data
4.14 States from the fit of the Asymmetric Activation Model
4.15 Stability of MinD dimers on the supported lipid bilayer
4.16 The stability-switching mechanism
E.1 Selection from offspring types during overlapping-niche descent
E.2 Selection across niches during overlapping-niche descent
E.3 Descent prolongation during overlapping-niche descent
E.4 Convergence behavior of accelerated descent
E.5 A comparison of optimization using accelerated descent
F.1 The 330th data frame as an image
F.2 Alignment preimage
F.3 Relative c(s, t) values
F.4 Alignment preimage and alignment image
F.5 Aligned data
F.6 Mean fluorescence intensities over space during temporal partition P0
F.7 Mean background intensities over time
F.8 Gaussian profile data and best fitting Gaussian functions
F.9 Flattened MinD and MinE fluorescence intensities for image 224
F.10 Mean flattened fluorescence intensities over space during temporal partition P0
F.11 The decomposition of MinD fluorescence intensity
F.12 Bulk MinD and MinE fluorescence intensities
F.13 Relative inhomogeneity values of planar-fit MinD and MinE density data
F.14 Scaled sums of maximal MinD and MinE relative inhomogeneity values
F.15 Relative inhomogeneity values of planar-fit MinD and MinE density data
F.16 Planar-fit MinD density data
F.17 Spatially near-homogeneous MinD and MinE density data profiles
F.18 The spreads of MinD and MinE density data
F.19 Densities within error of spatially near-homogeneous data
F.20 Estimating power laws in errors
F.21 Spatially near-homogeneous data errors and power law approximations
F.22 MinD and MinE pulse-train density data

Acknowledgements

My research supervisor, Eric Cytrynbaum, introduced me to the Min system and collocation methods. Throughout my research, he gave me the freedom to explore while providing meaningful discussion to keep me from getting lost. I sincerely thank him. My supervisory committee members, Leah Edelstein-Keshet and Daniel Coombs, encouraged me, provided helpful discussion, and fostered an enriching MathBio group at UBC. I thank them. Vassili Ivanov and Kiyoshi Mizuuchi kindly provided me with their experimental data, and I thank them. Without the love and support from my family and my better half, Şule, I would not have started or completed my PhD program. I cannot thank them enough.

Dedication

for Şule

Chapter 1

Introduction

A mathematical model of a dynamical process serves to elucidate underlying dynamical structure and behavior of the process that may otherwise remain opaque. Across different fields, notably in Biology, systems of interest are becoming more interrelated, with a commensurate increase in mathematical model complexity. Often, an explicit model for the evolution of a complex dynamical process may not be known or may not exist. Alternatively, from experiment or postulate, one may formulate an implicit model for the evolution of a complex dynamical process, relating the state of the system to the change in the state of the system. Such models fall into one of a variety of forms including difference equations, stochastic processes, and, most ubiquitously, differential equations. A differential equation model of a complex dynamical process, with a large number of states, parameters, and nonlinearities, often admits solutions that are sensitive to changes in parameter values and often contains parameters that are not directly measurable by experiment.
As such, parameter values in a differential equation model of a complex dynamical process are often unknown, and dynamical outcomes of the model are not well characterized.

Fitting solutions of a differential equation model to data determines parameter values for a model. For a good model, fitted parameter values will confer dynamical structure on the model so that the model fits data well. Conversely, the ability of a model to fit data provides testability for a model. In this dissertation, I develop a parameter estimation method for differential equations that accurately estimates model parameter values from data and accurately estimates a model's ability to fit data.

The Escherichia coli Min system is one of the simplest known biological systems that demonstrates diverse complex dynamic behavior or transduces local interactions into a global signal. As such, the Min system is currently one of the most reduced model systems for understanding such behaviors. I apply my parameter estimation method for differential equations to fit established and novel models of the Min system to time-course data. My modeling and fitting reveals a novel mechanism that may underlie the dynamic behavior of the Min system.

1.1 Parameter Estimation in Differential Equations

Various parameter estimation methods exist for differential equations. Most commonly, numerical solutions of differential equations are fit to data, as, generally, closed-form solutions to differential equations are not known or do not exist for fitting. In numerical-integration-based methods, parameter values are iteratively updated to minimize a measure of the difference between numerical solution values and data [11]. Numerical-integration-based methods are precise, in that numerical solution values are directly compared to data, so parameter estimates correspond directly to the numerical solution that fits data best. However, complex systems of differential equations admit a variety of parameter-dependent solution behaviors, and bifurcations separate the numerical solution space into regions with qualitatively different behaviors. Thus, to find parameter values of the optimal data-fitting numerical solution, initial parameter-value estimates must be chosen for a numerical solution with the same qualitative behavior as the optimal data-fitting numerical solution, as local search information is lost across bifurcations. As such, numerical-integration-based methods require extensive parameter search-space probing, which, in practice, is accomplished by combining global optimization methods, such as genetic algorithms, with local numerical-integration-based methods ([49], [78]). However, repeated numerical integration is computationally intensive, especially for large stiff systems of differential equations that require implicit methods for stability, and excessively long computational times in numerical-integration-based methods may surpass tractability.

Non-numerical integration methods for differential equation parameter estimation, such as collocation methods, relax the exactness of using numerical solutions to increase reliability in parameter estimation and to gain computational efficiency. Static collocation methods are the computationally simplest form of collocation method and are used to estimate parameters in systems of differential equations from data with measurements of all model states.
In them, to generate smooth solution proxies, either smooth splines with fewer knots than data points are fit to data, focusing on fitting non-local data behavior [71], or polynomials centered at data points are fit to data, focusing on fitting local data behavior [43]. Then, parameter values in differential equations are estimated by minimizing a measure of satisfying the system of differential equations with the solution proxies as state values.

Dynamic collocation methods extend the idea of static collocation methods to estimate parameters in systems of differential equations with unobserved states by incorporating a dynamic basis representation for each state, a linear combination of basis functions, generally splines. In them, under some smoothing penalty, basis representations of states are fit to data under a fixed set of parameter values. Then, parameter values are re-estimated, either by minimizing a measure of satisfying the system of differential equations with the basis representations as state values [57] or by fitting basis representations of states, treated as implicit functions of model parameters, to data [58], and the process is repeated. Generally, smoothing penalties in data fitting consist of a weighted measure of satisfying the system of differential equations with the basis representations as state values, where a small penalty weight biases fitting towards data and a large penalty weight biases fitting towards a solution of the system of differential equations. For reliable parameter estimation, the penalty weight is chosen relatively large. However, an excessively large penalty weight obscures data fitting. Parameter estimates depend directly on the penalty weight, so the penalty weight is chosen judiciously. Often, the penalty weight is incrementally increased until some stopping criterion is met: when basis representations of states begin to deform after stabilizing [58], when parameter estimates become stable [77] and then begin to destabilize [9], or when a sharp decrease in data fitting accommodates a sharp increase in satisfying the system of differential equations [7]. The penalty weight may also be chosen as the penalty weight that minimizes data-fitting error under some cross-validation criterion, such as model-based complexity [10] or error in satisfying the system of differential equations [79]. Alternatively, with a Bayesian approach, the posterior conditional probability density, given the data, may be generated by assuming some prior probability distribution for the penalty weight in addition to prior probability distributions for model parameters [79]. Or, multiple posterior conditional probability densities may be simultaneously generated under different penalty weights, with exchange, to more robustly estimate the posterior conditional probability density for some large penalty weight [8].

The choice of method for parameter estimation in differential equations depends on the data, the model, and the motivation for parameter estimation. If parameter estimation is motivated by measuring the underlying parameter values of some dynamic process, then a collocation method will often return reliable parameter estimates. However, non-numerical integration methods, such as collocation methods, approximate parameter values of the optimal data-fitting solution through an approximation of the optimal data-fitting solution.
So, even though parameter estimates may be similar to those of the optimal data-fitting numerical solution, parameter estimates from non-numerical integration methods may admit numerical solutions with significantly different behavior than the optimal data-fitting numerical solution, especially in complex systems with sensitivity to parameters. Also, because non-numerical integration methods approximate the optimal data-fitting solution, they do not assess how well the optimal data-fitting numerical solution fits data. To reliably compare different models of the same dynamic process, each model's ability to fit the data must be assessed, requiring the determination of the optimal data-fitting numerical solution. However, as mentioned previously, calculating the optimal data-fitting numerical solution using a numerical-integration-based method can require an excessively long computational time, especially with a large system of differential equations that requires an implicit method for stability.

In this dissertation, I propose a method that extends the idea of collocation methods to allow me to calculate the optimal data-fitting numerical solution and its parameters for a differential equation model. It steps back from a numerical-integration-based method and instead allows the numerical solution to emerge as part of an optimization process. This method bypasses the need to calculate numerical solutions with implicit methods, which can be computationally intensive, seems to be more robust than numerical-integration-based methods, and, interestingly, admits conservation principles and integral representations, which allow me to gauge the accuracy of my optimization.

1.2 The Min System

The Min system, consisting of three proteins, MinC, MinD, and MinE, dynamically orients the site of cell division toward midcell in Escherichia coli. Local interactions of MinD and MinE on the cell membrane drive a recurring, coordinated repositioning of MinD and MinE from cell pole to cell pole [60]. MinC disrupts the aggregation of FtsZ into the Z-ring ([14], [3], [33], [30], [55]), the contractile ring that divides the cell, and localizes to the cell membrane in the presence of MinD ([35], [29], [30], [34], [42]). The pole-to-pole repositioning of MinD shuttles MinC from cell pole to cell pole [29]. Over time, the average concentration of MinC is higher at cell poles than at midcell, leading to greater inhibition of Z-ring formation at cell poles than at midcell and dynamically orienting the site of cell division toward midcell ([80], [55]). Ultimately, cell division at midcell produces viable, symmetric daughter cells. A schematic diagram of division-site regulation by the Min system is shown in Figure 1.1.

Figure 1.1: Division-site regulation by the Min system. Min proteins oscillate from cell pole to cell pole, with minimum concentrations over time at midcell (left). MinC inhibits the formation of the Z-ring, causing the Z-ring to form at midcell (middle). The Z-ring at midcell contracts to divide the cell into two symmetric daughter cells (right).

The Min system demonstrates interesting dynamic behavior in vivo and in vitro. In short cells, MinD and MinE arrange into dynamic protein bands that stochastically switch together from cell pole to cell pole [21].
As cells grow longer, stochastic pole-to-pole switching of MinD and MinE stabilizes into regular pole-to-pole oscillations of MinD and MinE ([60], [23], [21]). In cells devoid of FtsZ, which grow continually, regular pole-to-pole oscillations of MinD and MinE form into stable pole-to-midcell oscillations of MinD and MinE ([60], [23]). However, the oscillatory behavior of MinD and MinE in cells can be altered by changing the expression levels of MinD and MinE. At low expression levels, MinD and MinE oscillate regularly from cell pole to cell pole in short cells and from cell pole to midcell in long cells; at high induction levels, MinD and MinE stochastically switch from cell pole to cell pole in short cells and from cell pole to midcell in long cells [69]. Generally, Escherichia coli cells are rod shaped. In round mutant cells, MinD and MinE oscillate antipodally ([12], [68]), and in branched mutant cells, MinD and MinE oscillate from branch to branch to branch [72]. On supported lipid bilayers in vitro, MinD and MinE arrange into dynamic protein aggregates that oscillate [38] or form into traveling waves ([45], [38], [44], [74], [73]), spiral waves ([45], [38], [44], [74], [73]), dynamic amoeba-like shapes ([38], [73]), snake-like projections [38], mushroom-like shapes [73], and bursts [73]. The oscillatory behavior of MinD and MinE in cells has been reproduced in artificial, rod-shaped, membrane-clad compartments with dimensions on scales that are ten times longer than those in living cells. In compartments with smaller aspect ratios, MinD and MinE oscillate from compartment end to compartment end; in compartments with larger aspect ratios, MinD and MinE oscillate from compartment end to compartment middle [82].

Both in vivo and in vitro, MinD and MinE form dynamic protein arrangements on spatial scales that are thousands of times larger than the spatial scale of an individual MinD or MinE protein, arrangements that are sustained on temporal scales that are much longer than the temporal scale of an individual MinD or MinE interaction on the membrane. Biochemical experiments have elucidated the functional role of the Min system and much of its underlying biochemical basis. However, biochemical experiments only show small-scale snapshots of the reactions that drive dynamic behavior on much larger spatial and temporal scales: crystal structures show stable, static protein configurations, and mutational analyses measure amino acid function with respect to a particular functional assay. Protein visualizations, on the other hand, allow for the observation of dynamic behavior, the collective outcome of local reactions, but provide little insight into the local reactions themselves. As such, the direct connection between local reactions and global, dynamic behavior in the Min system is unclear. Mathematical models can predict dynamic outcomes for a set of reactions, and thus provide a means to connect information about local reactions to global, dynamic behavior in the Min system.

Various mathematical models demonstrate dynamic behavior that is qualitatively similar to experimental observations of the Min system. Most mathematical models have focused on behavior of the Min system in vivo. On domains approximating short rod-shaped cells, agent-based models demonstrate MinD and MinE densities that stochastically switch together from domain pole to domain pole ([21], [4]).
As short rod-shaped domains grow, MinD and MinE densities transition from static to regular pole-to-pole oscillations in deterministic models [75], oscillations that are sustained in both deterministic and stochastic models on mid-sized rod-shaped domains ([26], [48], [41], [64]). As mid-sized rod-shaped domains grow, regular pole-to-pole oscillations of MinD and MinE densities transition into regular pole-to-mid-domain oscillations in deterministic models ([48], [4], [75]), oscillations that are sustained in both deterministic and stochastic models on long rod-shaped domains ([48], [36], [47], [70]). Additionally, deterministic and stochastic models have qualitatively addressed various other aspects of in vivo oscillatory behavior in the Min system: oscillatory behavior in round mutant cells ([37], [20], [22], [4]), oscillatory behavior in branched mutant cells [72], oscillatory behavior in flattened, irregular cells [62], oscillatory behavior in dividing cells ([70], [65], [17], [75]), oscillatory behavior of MinE mutants ([13], [1]), transitions in oscillation waveforms [75], midcell establishment through oscillation ([26], [25], [40], [22]), and the dependence of oscillation period on protein numbers ([26], [36], [70], [40]), cell length ([26], [70]), and temperature ([22], [75]). Several mathematical models have focused on behavior of the Min system in vitro. On domains approximating supported lipid bilayers, MinD and MinE densities form into traveling waves ([45], [54]) and spiral waves ([45], [4]) in deterministic models. Additionally, deterministic and stochastic models have qualitatively addressed MinD and MinE patterning in vitro on geometrically confined membranes [63] and on micropatterned substrates [24].

Most mathematical models of the Min system are based on the biochemistry-based characterization that MinE acts as an inhibitor of MinD membrane binding: cytosolic MinD monomers bind to ATP and form dimers ([34], [81]), which bind to the membrane ([28], [34], [32], [42], [81], [45]); MinE dimers bind to MinD dimers on the membrane ([28], [44]) and stimulate ATPase activity in MinD dimers, causing MinD dimers to separate and dissociate from the membrane ([31], [28], [34], [42]). MinE acting as an inhibitor of MinD membrane binding is depicted in Figure 1.2. Recent experiments have shown, however, that MinE can act to both stabilize and inhibit MinD membrane binding, with MinE stabilizing MinD membrane binding at lower relative concentrations of MinE to MinD and MinE inhibiting MinD membrane binding at higher relative concentrations of MinE to MinD [73]. No mathematical model has accounted for MinE's dual role in MinD membrane binding, and its biological implications remain unknown.

Figure 1.2: MinE acting as an inhibitor of MinD membrane binding. Cytosolic MinD monomers bind to ATP and form dimers, which bind to the membrane (left). MinE dimers bind to MinD dimers on the membrane (center) and stimulate ATPase activity in MinD dimers, causing MinD dimers to separate and dissociate from the membrane (right).
This classic model does not account for the dual stabilizing and inhibitory roles of MinE.

Various quantitative experimental measurements have been used to validate mathematical models of the Min system: pole-to-pole oscillation period in vivo ([47], [70], [13], [5], [1], [22], [4], [75]), distributions of residence times during stochastic pole-to-pole switching ([21], [4]) and regular pole-to-pole oscillations [21] in vivo, and traveling wave velocity and wavelength in vitro [45]. However, no model has been quantitatively compared to time-course data. A large variety of models demonstrate dynamic behavior that is qualitatively similar to experimental observations without accounting for observed biological phenomena such as MinE's dual role in MinD membrane binding. A quantitative comparison of models to time-course data is the next logical step in unraveling how proposed reactions contribute to Min system dynamics.

In this dissertation, I extract time-course data for model fitting from Ivanov and Mizuuchi's in vitro experimental measurements of the Min system [38]. I fit established and novel biochemistry-based models to the time-course data using my parameter estimation method for differential equations. Comparing models to time-course data allows me to make precise distinctions between biochemical assumptions in the various models. My modeling and fitting supports a novel model that accounts for MinE's previously unmodeled dual role as a stabilizer and an inhibitor of MinD membrane binding. It suggests that a regular, ordered, stability-switching mechanism underlies the emergent, dynamic behavior of the Min system.

1.3 Chapter Summaries

• In Chapter 2, I develop a method that allows me to calculate the optimal data-fitting numerical solution and its parameters for a differential equation model without using numerical integration. Additionally, I show that my method admits conservation principles and integral representations that allow me to gauge the accuracy of my optimization.

• In Chapter 3, I test my method using a system of first order ordinary differential equations, a system of second order ordinary differential equations, and a system of partial differential equations. In doing so, I compare the performance of my method to that of an analogous numerical-integration-based method, explore how my method can inform modeling insufficiencies and potential model improvements, and expound how conservation principles and integral representations in my method gauge the accuracy of my optimization in practice.

• In Chapter 4, I briefly summarize extracting time-course data for model fitting from experimental measurements of the Min system. I fit established and novel biochemistry-based models to the time-course data using my method. In doing so, I explore how individual reactions affect a model's ability to describe the time-course data. Based on my results, I interpret a novel mechanism that may underlie the dynamic behavior of the Min system.

• In Chapter 5, I briefly summarize my results from the previous chapters and discuss limitations and extensions of my method and of fitting models of the Min system to time-course data.

Chapter 2

A Homotopy-Minimization Method for Parameter Estimation in Differential Equations

2.1 Introduction

Non-numerical integration methods estimate parameters in a differential equation model by fitting solution approximations to data. Often, non-numerical integration methods, such as collocation methods, will return reliable parameter estimates.
However, because non-numerical integration methods approximate the optimal data-fitting solution, they do not provide an impartial measure of how well a differential equation model fits data, a measure required for the testability of a model. Numerical-integration-based methods estimate parameters in a differential equation model by fitting numerical solutions to data. As such, numerical-integration-based methods directly find the optimal data-fitting numerical solution and its parameters for a differential equation model. However, numerical-integration-based methods can demand extensive computation, especially for large stiff systems of differential equations that require implicit methods for stability. In this chapter, I develop a method that allows me to calculate the optimal data-fitting numerical solution and its parameters for a differential equation model without using numerical integration. In doing so, my method bypasses the need to calculate numerical solutions with implicit methods, which can be computationally intensive. Additionally, in this chapter, I show that my method admits conservation principles and integral representations that allow me to gauge the accuracy of my optimization.

2.2 Method Overview

Here, for simplicity in presentation, I explicate the method for a system of first order ordinary differential equations. I extend the method to systems of higher order ordinary differential equations and systems of partial differential equations in Sections A.1 and A.2.

A first order ordinary differential equation model of some dynamic process in t, with nx states, x1, x2, . . . , xnx, np parameters, p1, p2, . . . , pnp, and ny observable states, y1, y2, . . . , yny, is defined by the system of equations

    F_i\left(t, p_1, \ldots, p_{n_p}, x_1, \ldots, x_{n_x}, \frac{dx_1}{dt}, \ldots, \frac{dx_{n_x}}{dt}\right) = 0,    (2.1a)

    y_j = g_j(p_1, \ldots, p_{n_p}, x_1, \ldots, x_{n_x}),    (2.1b)

where the functions Fi, for i ∈ {1, 2, . . . , nx}, provide a model for the evolution of state values, and the functions gj, for j ∈ {1, 2, . . . , ny}, define observable states. For some observed data values, y1,k, y2,k, . . . , yny,k, measured at times tk, for k ∈ {1, 2, . . . , nt}, I seek the parameters p1, p2, . . . , pnp such that the functions xi(t, p1, p2, . . . , pnp), for i ∈ {1, 2, . . . , nx}, satisfy differential equation system (2.1a) and admit the observable state values that most closely approximate the observed data, in some sense.

Generally, solutions to the system of equations (2.1a) are difficult or impossible to find in closed form. However, using a numerical approximation method (finite difference, finite element, etc.), I can numerically approximate the value of solutions at each time tk, for k ∈ {1, 2, . . . , nt}. Because data may be sampled more sparsely than is required for the desired numerical solution accuracy, I choose the numerical discretization {tk : k ∈ I∆} to be a refinement of {tk : k ∈ {1, 2, . . . , nt}}, where the index set of the numerical discretization, I∆, is a superset of {1, 2, . . . , nt}. In doing so, I index grid points in {tk : k ∈ I∆} that lie between adjacent grid points in {tk : k ∈ {1, 2, . . . , nt}} with fractional indices that reflect their relative location within the discretization.
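As an illustration of this indexing, here is a minimal Python sketch; the function and variable names are my own, and uniform refinement between data times is an assumption for illustration, not part of the method as stated:

```python
import numpy as np

def refine_grid(t_data, n_sub):
    """Insert n_sub - 1 equally spaced points between adjacent data times.

    Data points keep their integer indices 1, 2, ..., nt; inserted points
    receive fractional indices reflecting their relative location.
    """
    t_ref, idx = [], []
    for k in range(len(t_data) - 1):
        for j in range(n_sub):
            frac = j / n_sub
            t_ref.append(t_data[k] + frac * (t_data[k + 1] - t_data[k]))
            idx.append((k + 1) + frac)          # 1-based fractional index
    t_ref.append(t_data[-1])
    idx.append(float(len(t_data)))
    return np.array(t_ref), np.array(idx)

t_grid, I_delta = refine_grid(np.array([0.0, 1.0, 2.0]), 2)
# t_grid  -> [0.0, 0.5, 1.0, 1.5, 2.0]
# I_delta -> [1.0, 1.5, 2.0, 2.5, 3.0]
```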
Once a numerical method is chosen, equation (2.1a) can be formulated into a method-dependent system of equations for the discrete numerical solution values xi,k:

    f_{i,k}(t_1, \ldots, t_{n_t}, p_1, \ldots, p_{n_p}, x_{1,1}, x_{2,1}, \ldots, x_{n_x,1}, x_{1,2}, \ldots, x_{n_x,n_t}) = f_{i,k}(\mathbf{t}, \mathbf{p}, \mathbf{x}) = 0,    (2.2)

for all i ∈ {1, 2, . . . , nx} and for all k ∈ I∆.

Once a measure of the quality of the model's fit to the data is chosen (least-squares, negative log-likelihood, etc.), I can define a functional ry(p,x) with the properties that (i) ry(p,x) ≥ 0, (ii) ry(p,x) = 0 if and only if gj(p, x1,k, . . . , xnx,k) = yj,k for all j ∈ {1, 2, . . . , ny} and for all k ∈ {1, 2, . . . , nt}, and (iii) ry(p1,x1) < ry(p2,x2) implies that (p1,x1) gives a better fit to the data than does (p2,x2). I describe the construction of ry(p,x) using a normalized least-squares measure in Section 2.3. Ultimately, I seek the parameters pˇ, which minimize ry(p,x) subject to the constraints fi,k(t,p,x) = 0 for all i ∈ {1, 2, . . . , nx} and for all k ∈ I∆: the parameters of the numerical solution that fits the data best.

The structure of differential equation solutions can cause numerical-integration-based methods to be inaccurate or inefficient. High dimensional, nonlinear systems of differential equations with many parameters often contain bifurcations, which separate the solution space, and thus the numerical solution space, into regions with different qualitative behavior. Bifurcations effectively disconnect regions within the numerical solution space, obfuscating optimization with a numerical-integration-based method. Also, numerical-integration-based methods require repeated numerical integration, which can be computationally very expensive, especially for differential equations that require implicit methods for stability. To ameliorate inaccuracies from bifurcations and inefficiencies from calculating numerical solution values, rather than searching for pˇ within the numerical solution space of differential equation system (2.1a), I search for pˇ within an extended space of discrete state values, including discrete state values outside of the solution space of system (2.2). To do so, using some measure of satisfying the numerical solution to differential equation system (2.1a), I can define a functional r∆x(p,x) with the properties that (i) r∆x(p,x) ≥ 0, (ii) r∆x(p,x) = 0 if and only if fi,k(t,p,x) = 0 for all i ∈ {1, 2, . . . , nx} and for all k ∈ I∆, and (iii) r∆x(p1,x1) < r∆x(p2,x2) implies that (p1,x1) satisfies the numerical solution method better than (p2,x2) does. I describe the construction of r∆x(p,x) using a normalized least-squares measure in Section 2.4. Then, I combine ry(p,x) and r∆x(p,x), with proportionality value λ ∈ (0, 1), into a single functional,

    \rho(\mathbf{p}, \mathbf{x}; \lambda) = (1 - \lambda)\, r_y(\mathbf{p}, \mathbf{x}) + \lambda\, r_{\Delta x}(\mathbf{p}, \mathbf{x}),    (2.3)

a homotopy between ry(p,x) and r∆x(p,x), a continuous deformation from ry(p,x) to r∆x(p,x). I note, for consistency in scale and units in ρ(p,x;λ), that I define ry(p,x) and r∆x(p,x) on the same scale with the same units. As elaborated in Section 2.8, as λ → 1− the parameters and state values that minimize ρ(p,x;λ) approach the parameters and state values of the optimal data-fitting numerical solution.
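To make the homotopy concrete, a minimal Python sketch of equation (2.3), assuming ry(p,x) and r∆x(p,x) are available as callables with the properties listed above (the names r_y, r_dx, and rho are mine):

```python
def rho(p, x, lam, r_y, r_dx):
    """Homotopy objective of equation (2.3): deforms continuously from the
    data-fitting measure at lam = 0 to the measure of satisfying the
    numerical solution at lam = 1."""
    return (1.0 - lam) * r_y(p, x) + lam * r_dx(p, x)
```

How this objective is minimized jointly over an array of λ values is the subject of Section 2.7.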
For λ = 1, ρ(p,x;λ) attains a minimum value of zero at all points where r∆x(p,x) = 0, the infinite set of numerical solutions that correspond to the infinite set of all parameter combinations. Thus, for λ near 1, an iterative method like gradient descent will converge to a minimum that sits near some numerical solution with parameter values close to the initial set of parameter values. As such, an iterative method like gradient descent will not likely converge to the global minimizer of ρ(p,x;λ) for λ near 1. Instead, I minimize ρ(p,x;λ) over a broad range of λ values, and use the minimization of ρ(p,x;λ) with smaller values of λ to direct the minimization of ρ(p,x;λ) with λ near 1, an idea that is similar in spirit to homotopy methods, which start at a solution of a simpler problem and sequentially step toward the solution of a more difficult problem that is homotopic to the simpler problem [18]. However, rather than minimizing ρ(p,x;λ) sequentially over an array of λ values, I simultaneously minimize ρ(p,x;λ) over an array of λ values, to avoid error propagation from sequential minimization and to avoid the excessive computation of ensuring the global minimum of ρ(p,x;λ) for a smaller value of λ before beginning the minimization of ρ(p,x;λ) for a larger value of λ. I outline my minimization technique in Section 2.7.

For λ ∈ (0, 1), the parameters and state values that minimize ρ(p,x;λ), p˘λ and x˘λ, allow me to define useful functions, ρ˘(λ), r˘y(λ), and r˘∆x(λ), as follows:

    \breve{\rho}(\lambda) = (1 - \lambda)\, \breve{r}_y(\lambda) + \lambda\, \breve{r}_{\Delta x}(\lambda) = (1 - \lambda)\, r_y(\breve{\mathbf{p}}_\lambda, \breve{\mathbf{x}}_\lambda) + \lambda\, r_{\Delta x}(\breve{\mathbf{p}}_\lambda, \breve{\mathbf{x}}_\lambda).    (2.4)

The homotopy-minimum functions, ρ˘(λ), r˘y(λ), and r˘∆x(λ), are useful because they admit conservative quantities, which allow me to gauge the progress and accuracy of my minimization technique. I discuss details in Section 2.8.

2.3 Defining a Measure of Data Fitting, ry(p,x)

As an example and for use later, I define ry(p,x). In doing so, I consider the sum of weighted squared differences as a measure of the difference between the jth observable model state and the corresponding observed data values:

    \sum_{k=1}^{n_t} w_{j,k} \bigl(y_{j,k} - g_j(\mathbf{p}, x_{1,k}, x_{2,k}, \ldots, x_{n_x,k})\bigr)^2 = \sum_{k=1}^{n_t} w_{j,k} \bigl(y_{j,k} - g_j(\mathbf{p}, \mathbf{x}_k)\bigr)^2,    (2.5)

for some data-dependent weights wj,k. To simultaneously measure the difference between all observable model states and all observed data values, I combine the sums of weighted squared differences, for all observable states, j = 1, 2, . . . , ny, into the single functional ry(p,x). In doing so, to combine weighted squared differences on mixed scales with mixed units, I normalize each sum by the sum of weighted squared observed data values. To remove dependence on the number of observable states, I divide the normalized sum by the number of observable states. Thus,

    r_y(\mathbf{p}, \mathbf{x}) = \frac{1}{n_y} \sum_{j=1}^{n_y} \left( \frac{1}{\sum_{k=1}^{n_t} w_{j,k}\, y_{j,k}^2} \sum_{k=1}^{n_t} w_{j,k} \bigl(y_{j,k} - g_j(\mathbf{p}, \mathbf{x}_k)\bigr)^2 \right).    (2.6)

Without normalization, minimizing ρ(p,x;λ) biases fitting toward data on a larger scale to the detriment of fitting data on a smaller scale. Normalization also removes dependence of ry(p,x) on the number of data points. In cases where the data-dependent weights, wj,k, correct for disparities in scale and units, normalization standardizes the scale of ry(p,x). To give the reader a sense of scale: for the homogeneous model, gj = 0 for all j ∈ {1, 2, . . . , ny}, of nontrivial data, ry(p,x) = 1.
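A direct transcription of equation (2.6) into Python, assuming data, weights, and the observation function are supplied in array form (the shapes and names below are my own conventions):

```python
import numpy as np

def r_y(p, x, t_idx, y, w, g):
    """Normalized weighted least-squares measure of data fitting, eq. (2.6).

    y, w  : arrays of shape (ny, nt), observed data and weights
    x     : array of shape (nx, n_grid), discrete state values
    t_idx : integer positions of the data times within the grid
    g     : g(p, x) -> array of shape (ny, nt), the observable states
    """
    gx = g(p, x[:, t_idx])               # observable states at the data times
    sq = w * (y - gx) ** 2               # weighted squared differences
    norm = np.sum(w * y ** 2, axis=1)    # per-state normalization
    return np.mean(np.sum(sq, axis=1) / norm)
```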
2.4 Defining a Measure of Satisfying a Numerical Solution, r∆x(p,x)

As an example and for use later, I define r∆x(p,x). In doing so, I consider systems of first order ordinary differential equations that are linear in the derivatives of xi,

    \frac{dx_i}{dt} = \bar{F}_i(t, p_1, \ldots, p_{n_p}, x_1, \ldots, x_{n_x}),    (2.7)

for i ∈ {1, 2, . . . , nx}. In terms of Fi as defined in equation (2.1a),

    F_i = \frac{dx_i}{dt} - \bar{F}_i(t, p_1, \ldots, p_{n_p}, x_1, \ldots, x_{n_x}),    (2.8)

for i ∈ {1, 2, . . . , nx}. I also consider finite difference numerical methods, of the form

    \Delta x_{i,k} = F_{i,k}\bigl(\{\bar{F}_i(t_k, p_1, \ldots, p_{n_p}, x_{1,k}, x_{2,k}, \ldots, x_{n_x,k}) : k \in I_\Delta\}\bigr) = F_{i,k}(\mathbf{t}, \mathbf{p}, \mathbf{x}),    (2.9)

where xi,k are the numerical solution values, ∆xi,k is some method-dependent finite difference discretization of dxi/dt at time tk, and Fi,k are some method-dependent functions of F̄i at time tk, for all i ∈ {1, 2, . . . , nx} and for all k ∈ I∆. In terms of fi,k as defined in equation (2.2),

    f_{i,k}(\mathbf{t}, \mathbf{p}, \mathbf{x}) = \Delta x_{i,k} - F_{i,k}(\mathbf{t}, \mathbf{p}, \mathbf{x}),    (2.10)

for all i ∈ {1, 2, . . . , nx} and for all k ∈ I∆. For example, with the backward Euler method,

    \Delta x_{i,k} = \begin{cases} 0 & \text{if } k \in \{1\}, \\ \dfrac{x_{i,k} - x_{i,k^-}}{t_k - t_{k^-}} & \text{if } k \in I_\Delta \setminus \{1\}, \end{cases} \qquad F_{i,k}(\mathbf{t}, \mathbf{p}, \mathbf{x}) = \begin{cases} 0 & \text{if } k \in \{1\}, \\ \bar{F}_i(t_k, \mathbf{p}, \mathbf{x}_k) & \text{if } k \in I_\Delta \setminus \{1\}, \end{cases}    (2.11)

where k− is the index below k in I∆, for i ∈ {1, 2, . . . , nx}.

To be consistent in measure, scale, and units with ry(p,x) as defined in equation (2.6), I measure the difference between all ∆xi,k and Fi,k by the mean normalized sum of squared differences,

    r_{\Delta x}(\mathbf{p}, \mathbf{x}) = \frac{1}{n_x} \sum_{i=1}^{n_x} \left( \frac{1}{\sum_{k \in I_\Delta} (\Delta x_{i,k})^2} \sum_{k \in I_\Delta} \bigl(\Delta x_{i,k} - F_{i,k}(\mathbf{t}, \mathbf{p}, \mathbf{x})\bigr)^2 \right).    (2.12)

Normalizing by the sum of squared finite differences allows me to combine squared differences for variables on mixed scales with mixed units. Without normalization, minimizing ρ(p,x;λ) biases fitting toward discretizations in more rapidly changing states to the detriment of fitting discretizations in more slowly changing states. Normalization also removes dependence of r∆x(p,x) on the number of points in the discretization, and dividing by the number of states removes dependence of r∆x(p,x) on the number of states. To give the reader a sense of scale: for the homogeneous model, Fi,k = 0 for all i ∈ {1, 2, . . . , nx} and for all k ∈ I∆, of nonconstant state values, r∆x(p,x) = 1. Thus, r∆x(p,x) is consistent in scale and units with ry(p,x) of equation (2.6). For systems of first order ordinary differential equations that are nonlinear in the derivatives of xi, or for numerical methods other than finite difference methods, r∆x(p,x) should be normalized similarly.
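In the same spirit, a sketch of equation (2.12) under the backward Euler discretization of equation (2.11); F_bar stands in for the model right-hand side F̄, vectorized over grid points (again, the names are illustrative):

```python
import numpy as np

def r_dx(p, x, t, F_bar):
    """Measure of satisfying the backward Euler scheme, eq. (2.12).

    t     : array of shape (n_grid,), discretization times
    x     : array of shape (nx, n_grid), discrete state values
    F_bar : F_bar(t, p, x) -> array of shape (nx, n_grid), the model RHS
    """
    dx = np.diff(x, axis=1) / np.diff(t)   # backward differences, k >= 2
    F = F_bar(t, p, x)[:, 1:]              # RHS at t_k for k >= 2; the k = 1
                                           # terms are zero by eq. (2.11)
    num = np.sum((dx - F) ** 2, axis=1)    # squared residuals per state
    den = np.sum(dx ** 2, axis=1)          # normalization per state
    return np.mean(num / den)              # mean over states
```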
To avoid jagged state values when minimizingρ(p,x;λ) for small values of λ, I incorporate multiplicative smoothing penalties, si(x), intor∆x(p,x):r∆x(p,x) =1nxnx∑i=1 si(x)∑k∈I∆(∆xi,k)2∑k∈I∆(∆xi,k − Fi,k(t,p,x))2 , (2.13a)si(x) = αi + βi(4∑k∈I∆\{1,nt}(xi,k− − 2xi,k + xi,k+)2∑k∈I∆\{1,nt}(xi,k+ − xi,k−)2)γi, (2.13b)where k− and k+ are the indices below and above k in I∆; (xi,k− − 2xi,k + xi,k+)2 is a measureof roughness around xi,k, the squared difference between the forward difference centered at xi,k,xi,k+ − xi,k, and the backward difference centered at xi,k, xi,k − xi,k− ; (xi,k+ − xi,k−)2/4 is anormalization measure of differences centered at xi,k, the squared mean value of the forwarddifference centered at xi,k and the backward difference centered at xi,k; and αi > 0, βi ≥ 0,and γi ≥ 0 are parameters that are chosen to ensure that state values do not jaggedly deviatefrom the dynamic structure of the model and to set the scale of the smoothing penalty. To givea sense of scale, I note that si(x) = 0 with αi = 0, βi = 1, and γi = 1 for xi,k evenly spacedalong a line with nonzero slope, and si(x) ranges from around 10 to 15 with αi = 0, βi = 1,and γi = 1 for xi,k randomly sampled from the standard uniform distribution or the standardnormal distribution with k ∈ {1, 2, . . . , 103}. Thus, as an example, choosing αi = 1, βi = 102,and γi = 2 for all i ∈ {1, 2, . . . , nx} would insignificantly modify r∆x(p,x) when state values areclose to colinear and would strongly penalize r∆x(p,x) when state values are close to random. Inote that, as αi > 0 for all i ∈ {1, 2, . . . , nx}, r∆x(p,x) = 0 if and only if ∆xi,k = Fi,k(t,p,x)for all i ∈ {1, 2, . . . , nx} and all k ∈ I∆. As such, the inclusion of multiplicative smoothingpenalties in r∆x(p,x), as defined in equation (2.13), does not alter the parameters and statevalues that minimize ρ(p,x;λ) as λ→ 1−.142.5. A Concrete Example of ry(p,x) and r∆x(p,x) Using a Model of FRAP2.5 A Concrete Example of ry(p,x) and r∆x(p,x) Using aModel of FRAPI provide a concrete example of ry(p,x) and r∆x(p,x) for a simple model of fluorescencerecovery after photobleaching (FRAP). For fluorescence intensity x1(t), recovery-level parameterp1, timescale parameter p2, and observable state y1,dx1dt=p1 − x1p2, (2.14a)y1 = g1(p1, p2, x1) = x1. (2.14b)I consider some observed data values y1,k measured at times tk, for k ∈ {1, 2, . . . , nt}, anddiscrete state values x1,k on an unrefined discretization grid, k ∈ I∆ = {1, 2, . . . , nt}. Thus, forry(p,x) as defined in equation 2.6 with unitary weights and r∆x(p,x) as defined in equation2.12 using the backward Euler discretization as defined in equation 2.11,ry(p,x) =1∑ntk=1 y21,knt∑k=1(y1,k − x1,k)2, (2.15a)r∆x(p,x) =(nt∑k=2(x1,k − x1,k−1tk − tk−1)2)−1 nt∑k=2(x1,k − x1,k−1tk − tk−1 −p1 − x1,kp2)2. (2.15b)2.6 Extending the Homotopy on Refined Discretization GridsFor ρ(p,x;λ) with small λ, on a refined discretization grid, where I∆ 6= {1, 2, . . . , nt}, deviationsin observable state values from observed data values carry a strong penalty at grid points withindices in {1, 2, . . . , nt} and no penalty at grid points with indices in I∆ \ {1, 2, . . . , nt}. Thus,the state values that minimize ρ(p,x;λ) with small λ may admit observable state values thatvary dramatically across adjacent indices in {1, 2, . . . , nt} and I∆ \ {1, 2, . . . , nt}. As λ increasesfrom 0 to 1, the state values that minimize ρ(p,x;λ) increasingly inherit smooth structurefrom the solution to differential equation system (2.1a). 
2.6 Extending the Homotopy on Refined Discretization Grids

For $\rho(\mathbf{p},\mathbf{x};\lambda)$ with small $\lambda$, on a refined discretization grid, where $I_\Delta \ne \{1, 2, \dots, n_t\}$, deviations in observable state values from observed data values carry a strong penalty at grid points with indices in $\{1, 2, \dots, n_t\}$ and no penalty at grid points with indices in $I_\Delta \setminus \{1, 2, \dots, n_t\}$. Thus, the state values that minimize $\rho(\mathbf{p},\mathbf{x};\lambda)$ with small $\lambda$ may admit observable state values that vary dramatically across adjacent indices in $\{1, 2, \dots, n_t\}$ and $I_\Delta \setminus \{1, 2, \dots, n_t\}$. As $\lambda$ increases from 0 to 1, the state values that minimize $\rho(\mathbf{p},\mathbf{x};\lambda)$ increasingly inherit smooth structure from the solution to differential equation system (2.1a). Smoothness transfers from state values to observable state values through the observation functions, $g_j$. Thus, the state values that minimize $\rho(\mathbf{p},\mathbf{x};\lambda)$ with $\lambda$ near 1 admit observable state values that vary smoothly across adjacent indices in $\{1, 2, \dots, n_t\}$ and $I_\Delta \setminus \{1, 2, \dots, n_t\}$. As such, on a refined discretization grid, the parameters and state values that minimize $\rho(\mathbf{p},\mathbf{x};\lambda)$ for small values of $\lambda$ may not meaningfully guide the minimization of $\rho(\mathbf{p},\mathbf{x};\lambda)$ for larger values of $\lambda$. To address this, I extend $\rho(\mathbf{p},\mathbf{x};\lambda)$ to penalize deviations in observable state values from interpolated data values at grid points with indices in $I_\Delta \setminus \{1, 2, \dots, n_t\}$.

Using some interpolation method, I generate interpolated data, $\hat{y}_{j,k}$ for $j \in \{1, 2, \dots, n_y\}$, at grid points with indices in $I_{\hat{y}} = I_\Delta \setminus \{1, 2, \dots, n_t\}$ from observed data values with indices in $\{1, 2, \dots, n_t\}$. Using some measure of the difference between observable model states and interpolated data values, I can define the functional $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ with the properties that (i) $r_{\hat{y}}(\mathbf{p},\mathbf{x}) \ge 0$, (ii) $r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0$ if and only if $I_\Delta = \{1, 2, \dots, n_t\}$ or $g_j(\mathbf{p}, x_{1,k}, \dots, x_{n_x,k}) = \hat{y}_{j,k}$ for all $j \in \{1, 2, \dots, n_y\}$ and for all $k \in I_{\hat{y}}$, and (iii) $r_{\hat{y}}(\mathbf{p}_1,\mathbf{x}_1) < r_{\hat{y}}(\mathbf{p}_2,\mathbf{x}_2)$ implies that $(\mathbf{p}_1,\mathbf{x}_1)$ gives a better fit to the interpolated data than does $(\mathbf{p}_2,\mathbf{x}_2)$. I describe the construction of $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ under a normalized least-squares measure in Section 2.6.1. I extend $\rho(\mathbf{p},\mathbf{x};\lambda)$ to include $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ with proportionality value $(1-\lambda)^2$:
\[
r(\mathbf{p},\mathbf{x};\lambda) = \rho(\mathbf{p},\mathbf{x};\lambda) + (1-\lambda)^2 r_{\hat{y}}(\mathbf{p},\mathbf{x}) = (1-\lambda)r_y(\mathbf{p},\mathbf{x}) + (1-\lambda)^2 r_{\hat{y}}(\mathbf{p},\mathbf{x}) + \lambda r_{\Delta x}(\mathbf{p},\mathbf{x}), \tag{2.16}
\]
a homotopy between $r_y(\mathbf{p},\mathbf{x}) + r_{\hat{y}}(\mathbf{p},\mathbf{x})$ and $r_{\Delta x}(\mathbf{p},\mathbf{x})$. The proportionality value of $(1-\lambda)^2$ on $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ incrementally decreases the weighting of $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ relative to $r_y(\mathbf{p},\mathbf{x})$ as $\lambda$ increases from 0 to 1 in $r(\mathbf{p},\mathbf{x};\lambda)$. I show in Section B.1 that if $(\check{\mathbf{p}},\check{\mathbf{x}})$ minimizes $\rho(\mathbf{p},\mathbf{x};\lambda)$ as $\lambda \to 1^-$ then $(\check{\mathbf{p}},\check{\mathbf{x}})$ also minimizes $r(\mathbf{p},\mathbf{x};\lambda)$ as $\lambda \to 1^-$. Thus, the inclusion of $(1-\lambda)^2 r_{\hat{y}}(\mathbf{p},\mathbf{x})$ in $\rho(\mathbf{p},\mathbf{x};\lambda)$ penalizes deviations in observable state values from interpolated data values at smaller values of $\lambda$, but does not alter the parameters and state values that minimize $\rho(\mathbf{p},\mathbf{x};\lambda)$ as $\lambda \to 1^-$.

2.6.1 Defining a Measure of Interpolated Data Fitting, $r_{\hat{y}}(\mathbf{p},\mathbf{x})$

As an example and for use later, I define $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ following the form of $r_y(\mathbf{p},\mathbf{x})$ in equation (2.6). To do so, using some interpolation method, I generate interpolated data-dependent weights, $\hat{w}_{j,k}$ for $j \in \{1, 2, \dots, n_y\}$, at grid points with indices in $I_{\hat{y}}$ from data-dependent weights with indices in $\{1, 2, \dots, n_t\}$. To be consistent in measure and scale with $r_y(\mathbf{p},\mathbf{x})$ as defined in equation (2.6), I measure the difference between observable state values and interpolated data values at grid points with indices in $I_{\hat{y}}$ by the mean normalized sum of squared differences:
\[
r_{\hat{y}}(\mathbf{p},\mathbf{x}) = \frac{1}{n_y}\sum_{j=1}^{n_y}\left(\frac{\hat{\sigma}}{\sum_{k\in I_{\hat{y}}}\hat{w}_{j,k}\,\hat{y}_{j,k}^2}\sum_{k\in I_{\hat{y}}}\hat{w}_{j,k}\big(\hat{y}_{j,k} - g_j(\mathbf{p}, x_{1,k}, \dots, x_{n_x,k})\big)^2\right), \tag{2.17}
\]
where $\hat{\sigma} > 0$ is a scaling parameter that is chosen to set the weighting of $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ relative to $r_y(\mathbf{p},\mathbf{x})$ as $\lambda \to 0^+$.
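One concrete way to generate the interpolated data $\hat{y}_{j,k}$ on a refined grid is a not-a-knot cubic spline, as used later in Section 3.4.1; the sketch below uses SciPy's `CubicSpline`, whose default boundary condition is not-a-knot, though any comparable interpolant would serve.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolated_data(t_data, y, t_grid):
    """Interpolate observed data onto refinement-only grid points.

    t_data : (n_t,) data times; t_grid : (n_grid,) refined grid times
    y      : (n_y, n_t) observed data values
    Returns y_hat at grid points in I_yhat (grid points not shared with data).
    """
    mask = ~np.isin(t_grid, t_data)   # indices in I_yhat
    return np.stack([CubicSpline(t_data, yj)(t_grid[mask]) for yj in y])
```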
If $I_\Delta = \{1, 2, \dots, n_t\}$, then I define $r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0$, and $r(\mathbf{p},\mathbf{x};\lambda)$ reduces to $\rho(\mathbf{p},\mathbf{x};\lambda)$.

2.7 Optimization Using Overlapping-Niche Descent

To synergistically minimize $r(\mathbf{p},\mathbf{x};\lambda)$ over an array of $\lambda$ values, I implement overlapping-niche descent, a genetic algorithm directed by gradient-based descent, which is analogous to the dynamics of an evolving ecological population that competes for multiple food sources in a single environment, where an individual that successfully competes for one food source may successfully compete for a similar food source. In overlapping-niche descent, a unique value of $\lambda \in (0,1)$ defines a niche, and a set of $\lambda$ values spanning $(0,1)$ defines the set of niches in the environment. Each niche supports a certain number of individuals, where each individual is represented by a set of parameters and state values. As in other genetic algorithms, individuals reproduce, with crossover and mutation, to generate new individuals and variability within the parameter-state value search space. The likelihood of optimizing a function by random probing decreases with an increasing number of variables, and $r(\mathbf{p},\mathbf{x};\lambda)$ is a functional of many variables. Thus, to accelerate optimization, after reproduction, individuals undergo gradient-based descent. After descent, through selection, each niche sustains the individuals with the lowest values of $r(\mathbf{p},\mathbf{x};\lambda)$. Selection acts across niches, allowing individuals to spread from one niche to others, for a cooperative transfer of information from data to the optimal data-fitting numerical solution during optimization. I discuss details of overlapping-niche descent in Section C. The process of overlapping-niche descent is depicted in Figure 2.1, and a structural sketch follows below.

Figure 2.1: Overlapping-niche descent. The parameters and state values that minimize $r(\mathbf{p},\mathbf{x};\lambda)$ as $\lambda \to 1^-$ are those of the optimal data-fitting numerical solution. Overlapping-niche descent synergistically minimizes $r(\mathbf{p},\mathbf{x};\lambda)$ over a broad range of $\lambda$ values, $\lambda_1, \lambda_2, \dots$, to more robustly minimize $r(\mathbf{p},\mathbf{x};\lambda)$ as $\lambda \to 1^-$. In overlapping-niche descent, each value of $\lambda$ defines a niche. In each niche, $r(\mathbf{p},\mathbf{x};\lambda)$ is locally minimized for a set of initial parameters and state values (descent). From the full set of parameters and state values, the parameters and state values with the lowest values of $r(\mathbf{p},\mathbf{x};\lambda)$ are retained in each niche (selection). Then, a new set of parameters and state values is generated in each niche from the full set of retained parameters and state values (reproduction). After selection and reproduction, descent occurs again. Initial points are shown with triangles, local minima are shown with circles, and newly generated points are shown with squares. Gradation from light gray to black is shown to emphasize the transfer of information across niches during optimization.
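The sketch below lays out one generation of overlapping-niche descent as described above (descent, selection across niches, reproduction). The population container and the `descend` and `crossover_mutate` helpers are hypothetical stand-ins for the details given in Section C; this is a structural outline, not the implementation.

```python
def one_generation(niches, population, r, descend, crossover_mutate, n_keep):
    """One (descent, selection, reproduction) cycle of overlapping-niche descent.

    niches     : list of lambda values defining the niches
    population : list of (p, x) individuals shared across niches
    r          : r(p, x, lam), the homotopy objective of equation (2.16)
    """
    # Descent: locally minimize r(., .; lam) from each individual in each niche.
    descended = [descend(p, x, lam) for lam in niches for (p, x) in population]
    # Selection: each niche retains its best individuals, drawn from the full
    # set, so an individual can spread to similar niches.
    retained = []
    for lam in niches:
        ranked = sorted(descended, key=lambda ind: r(ind[0], ind[1], lam))
        retained.extend(ranked[:n_keep])
    # Reproduction: crossover and mutation regenerate search-space variability.
    offspring = [crossover_mutate(retained) for _ in range(len(population))]
    return retained + offspring
```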
Although similar in name, my overlapping-niche genetic algorithm, which cooperatively optimizes over a range of similar problems, differs from multi-niche genetic algorithms, which optimize a single problem to find multiple modes. Overlapping-niche descent is less similar in name, but more similar in character, to smooth functional tempering [8], a Bayesian, dynamic collocation method. Smooth functional tempering employs parallel MCMC (Markov chain Monte Carlo) chains over a range of derivative-matching penalty weights. During sampling, parallel chains may exchange parameters to more robustly sample posterior probability distributions in chains with large derivative-matching penalty weights. Ultimately, smooth functional tempering approximates the posterior probability distribution of a dynamic collocation basis with some large derivative-matching penalty weight.

2.8 Properties of the Homotopy and Inspection of Overlapping-Niche Descent

For $\lambda \in (0,1)$, the parameters and state values that minimize $r(\mathbf{p},\mathbf{x};\lambda)$, $\breve{\mathbf{p}}_\lambda$ and $\breve{\mathbf{x}}_\lambda$, allow me to define useful functions, $\breve{r}(\lambda)$, $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$, as follows:
\[
\breve{r}(\lambda) = (1-\lambda)\breve{r}_y(\lambda) + (1-\lambda)^2\breve{r}_{\hat{y}}(\lambda) + \lambda\breve{r}_{\Delta x}(\lambda) = (1-\lambda)r_y(\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda) + (1-\lambda)^2 r_{\hat{y}}(\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda) + \lambda r_{\Delta x}(\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda). \tag{2.18}
\]
The homotopy-minimum functions, $\breve{r}(\lambda)$, $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$, are useful because they admit conservative quantities, which allow me to gauge the progress and accuracy of overlapping-niche descent.

From Theorem 1 in Appendix B,
\[
\lim_{\lambda\to 0^+} \breve{\mathbf{p}}_\lambda, \breve{\mathbf{x}}_\lambda = \arg\min\big(r_{\Delta x}(\mathbf{p},\mathbf{x}) : r_y(\mathbf{p},\mathbf{x}) = 0,\; r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0\big), \tag{2.19}
\]
\[
\lim_{\lambda\to 1^-} \breve{\mathbf{p}}_\lambda, \breve{\mathbf{x}}_\lambda = \arg\min\big(r_y(\mathbf{p},\mathbf{x}) : r_{\Delta x}(\mathbf{p},\mathbf{x}) = 0\big). \tag{2.20}
\]
Equation (2.19) states that the parameters and state values that minimize $r(\mathbf{p},\mathbf{x};\lambda)$ as $\lambda \to 0^+$ are those closest to a numerical solution given that observable state values perfectly fit data; equation (2.20) states that the parameters and state values that minimize $r(\mathbf{p},\mathbf{x};\lambda)$ as $\lambda \to 1^-$ are those of the numerical solution that fits observed data best. Thus, $\lim_{\lambda\to 0^+}\breve{r}_{\Delta x}(\lambda)$ is a measure of how well data can satisfy a numerical solution, and $\lim_{\lambda\to 1^-}\breve{r}_y(\lambda)$ is a measure of how well a numerical solution can fit observed data. Generally, I am interested in finding $\lim_{\lambda\to 1^-}\breve{\mathbf{p}}_\lambda$ and $\lim_{\lambda\to 1^-}\breve{\mathbf{x}}_\lambda$. However, $\lim_{\lambda\to 0^+}\breve{\mathbf{p}}_\lambda$ and $\lim_{\lambda\to 0^+}\breve{\mathbf{x}}_\lambda$ are also informative, as they show how badly data fails to be a numerical solution, and may thus provide insight into measurement error, model inadequacies, and potential model improvements.

From Theorem 3 in Appendix B, if $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable at $\lambda \in (0,1)$, then
\[
(1-\lambda)\frac{d\breve{r}_y(\lambda)}{d\lambda} + (1-\lambda)^2\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda} + \lambda\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda} = 0. \tag{2.21}
\]
Thus, changes in $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ with respect to $\lambda$ are coupled. If $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable at all but a finite number of points in $(0,1)$, then from Theorem 4 in Appendix B,
\[
2\int_0^1 \breve{r}(\lambda)\,d\lambda = \int_0^1 \breve{r}_y(\lambda)\,d\lambda + \int_0^1(1-\lambda^2)\,\breve{r}_{\hat{y}}(\lambda)\,d\lambda = \int_0^1 \breve{r}_{\Delta x}(\lambda)\,d\lambda - \int_0^1(1-\lambda)^2\,\breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{2.22}
\]
Equation (2.22) defines a conservation in the coupled changes of $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ across $\lambda \in (0,1)$. In overlapping-niche descent, I minimize $r(\mathbf{p},\mathbf{x};\lambda)$ over an array of $\lambda$ values in $(0,1)$ to find $\tilde{\mathbf{p}}_\lambda$ and $\tilde{\mathbf{x}}_\lambda$, approximations of $\breve{\mathbf{p}}_\lambda$ and $\breve{\mathbf{x}}_\lambda$, which allow me to define the functions $\tilde{r}(\lambda)$, $\tilde{r}_y(\lambda)$, $\tilde{r}_{\hat{y}}(\lambda)$, and $\tilde{r}_{\Delta x}(\lambda)$ such that
\[
\tilde{r}(\lambda) = (1-\lambda)\tilde{r}_y(\lambda) + (1-\lambda)^2\tilde{r}_{\hat{y}}(\lambda) + \lambda\tilde{r}_{\Delta x}(\lambda) = (1-\lambda)r_y(\tilde{\mathbf{p}}_\lambda,\tilde{\mathbf{x}}_\lambda) + (1-\lambda)^2 r_{\hat{y}}(\tilde{\mathbf{p}}_\lambda,\tilde{\mathbf{x}}_\lambda) + \lambda r_{\Delta x}(\tilde{\mathbf{p}}_\lambda,\tilde{\mathbf{x}}_\lambda). \tag{2.23}
\]
I can determine how well $\tilde{r}_y(\lambda)$, $\tilde{r}_{\hat{y}}(\lambda)$, and $\tilde{r}_{\Delta x}(\lambda)$ satisfy conservation in coupled functional changes:
\[
2\int_0^1 \tilde{r}(\lambda)\,d\lambda = \int_0^1 \tilde{r}_y(\lambda)\,d\lambda + \int_0^1(1-\lambda^2)\,\tilde{r}_{\hat{y}}(\lambda)\,d\lambda = \int_0^1 \tilde{r}_{\Delta x}(\lambda)\,d\lambda - \int_0^1(1-\lambda)^2\,\tilde{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{2.24}
\]
A failure to reasonably satisfy equation (2.24) indicates that $\tilde{r}_y(\lambda)$, $\tilde{r}_{\hat{y}}(\lambda)$, and/or $\tilde{r}_{\Delta x}(\lambda)$ differ significantly from $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and/or $\breve{r}_{\Delta x}(\lambda)$, implying that overlapping-niche descent has not been successful or is incomplete.
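In practice, the conservation check of equation (2.24) can be evaluated by quadrature over the computed niche values; the sketch below uses simple trapezoidal integration, which is an assumption of convenience rather than a prescribed quadrature rule.

```python
import numpy as np

def conservation_residuals(lam, r_tot, r_y, r_yhat, r_dx):
    """Relative discrepancies in equation (2.24) from niche-wise minima.

    lam, r_tot, r_y, r_yhat, r_dx : (n_niches,) arrays of lambda_k and the
    corresponding values of r~, r~_y, r~_yhat, and r~_dx at each niche.
    """
    lhs = 2.0 * np.trapz(r_tot, lam)
    mid = np.trapz(r_y, lam) + np.trapz((1.0 - lam ** 2) * r_yhat, lam)
    rhs = np.trapz(r_dx, lam) - np.trapz((1.0 - lam) ** 2 * r_yhat, lam)
    # Both residuals should be small if overlapping-niche descent succeeded.
    return (mid - lhs) / lhs, (rhs - lhs) / lhs
```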
Conservation in coupled functional changes, equation (2.22), relates $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ over a broad range of $\lambda \in (0,1)$. However, values of $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ for $\lambda$ near 0 and $\lambda$ near 1 do not significantly affect the values of the integrals in equation (2.22). Thus, reasonably satisfying equation (2.24) reveals little about the coupled changes of $\tilde{r}_y(\lambda)$, $\tilde{r}_{\hat{y}}(\lambda)$, and $\tilde{r}_{\Delta x}(\lambda)$ for $\lambda$ near 0 and $\lambda$ near 1. From Theorem 5 in Appendix B, if $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable at all but a finite number of points in $(0,1)$, then
\[
\lim_{\lambda\to 0^+}\breve{r}_{\Delta x}(\lambda) = \int_0^1 \frac{1}{\lambda^2}\,\breve{r}_y(\lambda)\,d\lambda + \int_0^1 \frac{1-\lambda^2}{\lambda^2}\,\breve{r}_{\hat{y}}(\lambda)\,d\lambda, \tag{2.25a}
\]
\[
\lim_{\lambda\to 1^-}\breve{r}_y(\lambda) = \int_0^1 \frac{1}{(1-\lambda)^2}\,\breve{r}_{\Delta x}(\lambda)\,d\lambda - \int_0^1 \breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{2.25b}
\]
Equation (2.25) defines integral representations of limit values, with $\lim_{\lambda\to 0^+}\breve{r}_{\Delta x}(\lambda)$ defined entirely in terms of $\breve{r}_y(\lambda)$ and $\breve{r}_{\hat{y}}(\lambda)$, and $\lim_{\lambda\to 1^-}\breve{r}_y(\lambda)$ defined entirely in terms of $\breve{r}_{\hat{y}}(\lambda)$ and $\breve{r}_{\Delta x}(\lambda)$. I can determine how well $\tilde{r}_y(\lambda)$, $\tilde{r}_{\hat{y}}(\lambda)$, and $\tilde{r}_{\Delta x}(\lambda)$ satisfy the integral representations of limit values:
\[
\lim_{\lambda\to 0^+}\tilde{r}_{\Delta x}(\lambda) = \int_0^1 \frac{1}{\lambda^2}\,\tilde{r}_y(\lambda)\,d\lambda + \int_0^1 \frac{1-\lambda^2}{\lambda^2}\,\tilde{r}_{\hat{y}}(\lambda)\,d\lambda, \tag{2.26a}
\]
\[
\lim_{\lambda\to 1^-}\tilde{r}_y(\lambda) = \int_0^1 \frac{1}{(1-\lambda)^2}\,\tilde{r}_{\Delta x}(\lambda)\,d\lambda - \int_0^1 \tilde{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{2.26b}
\]
A failure to reasonably satisfy equation (2.26) indicates that $\tilde{r}_y(\lambda)$, $\tilde{r}_{\hat{y}}(\lambda)$, and/or $\tilde{r}_{\Delta x}(\lambda)$ differ significantly from $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and/or $\breve{r}_{\Delta x}(\lambda)$, implying that overlapping-niche descent has not been successful or is incomplete.

I note that $r(\mathbf{p},\mathbf{x};\lambda)$ reduces to $\rho(\mathbf{p},\mathbf{x};\lambda)$ when $r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0$, and thus, the aforementioned properties of $r(\mathbf{p},\mathbf{x};\lambda)$ apply to $\rho(\mathbf{p},\mathbf{x};\lambda)$ with $r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0$.

Chapter 3
Testing the Homotopy-Minimization Method for Parameter Estimation in Differential Equations

3.1 Introduction

In Chapter 2, I developed a method that allowed me to calculate the optimal data-fitting numerical solution and its parameters for a differential equation model without using numerical integration. Additionally, I showed that my method admits conservation principles and integral representations that allow me to gauge the accuracy of my optimization. In this chapter, I test my method using a system of first order ordinary differential equations, a system of second order ordinary differential equations, and a system of partial differential equations. In doing so, I compare the performance of my method to that of an analogous numerical-integration-based method, explore how my method can inform modeling insufficiencies and potential model improvements, and expound how conservation principles and integral representations in my method gauge the accuracy of my optimization in practice.

As discussed in Section 1.2, the Min system, consisting of three proteins, MinD, MinE, and MinC, regulates the site of cell division in Escherichia coli [15]. Experimentally, MinD and MinE show interesting behaviors, such as emerging pole-to-pole oscillations in cells in vivo [60] and traveling waves and spiral waves on supported lipid bilayers in vitro [45]. The Bonny model [4], which models the interactions of MinD and MinE, admits solutions with behaviors that are qualitatively similar to the dynamic patternings of MinD and MinE that are observed in experiments [4]. I choose the Bonny model to test my method because of its biological relevance, and because in different contexts it manifests as a first order ordinary differential equation system, a second order ordinary differential equation system, and a system of partial differential equations.

3.2 A Model for MinD and MinE Interactions by Bonny et al (2013)

The Bonny model consists of five states, $c_D$, $c_E$, $c_d$, $c_{de}$, and $c_e$, corresponding to concentrations of bulk MinD, bulk MinE, membrane-bound MinD, membrane-bound MinD-MinE complex, and membrane-bound MinE, respectively.
I describe the biology behind the Bonny model and its formulation in detail in Section 4.3.2. I focus on a simplified version of the Bonny model with a large, well-mixed bulk, such that MinD and MinE bulk concentrations, $c_D$ and $c_E$, are constant:
\[
\frac{\partial c_d}{\partial t} = c_D(\omega_D + \omega_{dD}c_d)(c_{max} - c_d - c_{de})/c_{max} - \omega_E c_E c_d - \omega_{ed}c_e c_d + D_d\nabla^2 c_d, \tag{3.1a}
\]
\[
\frac{\partial c_{de}}{\partial t} = \omega_E c_E c_d + \omega_{ed}c_e c_d - (\omega_{de,m} + \omega_{de,c})c_{de} + D_{de}\nabla^2 c_{de}, \tag{3.1b}
\]
\[
\frac{\partial c_e}{\partial t} = \omega_{de,m}c_{de} - \omega_{ed}c_e c_d - \omega_e c_e + D_e\nabla^2 c_e, \tag{3.1c}
\]
with parameters $\omega_D$, $\omega_{dD}$, $\omega_E$, $\omega_{ed}$, $\omega_{de,m}$, $\omega_{de,c}$, $\omega_e$, $c_{max}$, $D_d$, $D_{de}$, and $D_e$ and observable states $\mathrm{MinD} = c_d + c_{de}$ and $\mathrm{MinE} = c_{de} + c_e$. $c_d$, $c_{de}$, $c_e$, MinD, and MinE are measured in µm⁻². Values and definitions of parameters and constants for the Bonny model in an in vitro context are shown in Table 3.1.

           value                  definition
$D_d$      3.00·10⁻¹ µm² s⁻¹      diffusion coefficient of $c_d$
$D_{de}$   3.00·10⁻¹ µm² s⁻¹      diffusion coefficient of $c_{de}$
$D_e$      1.80·10⁰ µm² s⁻¹       diffusion coefficient of $c_e$
$c_D$      4.80·10² µm⁻³          bulk concentration of MinD
$c_E$      7.00·10² µm⁻³          bulk concentration of MinE
$c_{max}$  2.75·10⁴ µm⁻²          maximum value of $c_d + c_{de}$
$\omega_D$    5.00·10⁻⁴ µm s⁻¹    rate of the reaction $c_D \to c_d$
$\omega_E$    1.36·10⁻⁴ µm³ s⁻¹   rate of the reaction $c_E + c_d \to c_{de}$
$\omega_{dD}$  3.18·10⁻³ µm³ s⁻¹  rate of the reaction $c_D + c_d \to 2c_d$
$\omega_{de,c}$  1.60·10⁻¹ s⁻¹    rate of the reaction $c_{de} \to c_D + c_E$
$\omega_{de,m}$  2.52·10⁰ s⁻¹     rate of the reaction $c_{de} \to c_D + c_e$
$\omega_e$    5.00·10⁻¹ s⁻¹       rate of the reaction $c_e \to c_E$
$\omega_{ed}$  4.90·10⁻³ µm² s⁻¹  rate of the reaction $c_d + c_e \to c_{de}$

Table 3.1: Values and definitions of parameters and constants in the Bonny model. Values are taken from the set of in vitro parameters in [4].

In the case of spatial homogeneity, where $\nabla^2 c_d = 0$, $\nabla^2 c_{de} = 0$, and $\nabla^2 c_e = 0$, the Bonny model reduces to a system of first order ordinary differential equations:
\[
\frac{dc_d}{dt} = c_D(\omega_D + \omega_{dD}c_d)(c_{max} - c_d - c_{de})/c_{max} - \omega_E c_E c_d - \omega_{ed}c_e c_d, \tag{3.2a}
\]
\[
\frac{dc_{de}}{dt} = \omega_E c_E c_d + \omega_{ed}c_e c_d - (\omega_{de,m} + \omega_{de,c})c_{de}, \tag{3.2b}
\]
\[
\frac{dc_e}{dt} = \omega_{de,m}c_{de} - \omega_{ed}c_e c_d - \omega_e c_e. \tag{3.2c}
\]
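For reference, the right-hand side of the spatially homogeneous Bonny model (3.2) translates directly into code; the parameter names follow Table 3.1, and packing them into a dictionary is a convenience of this sketch.

```python
import numpy as np

def bonny_homogeneous_rhs(t, c, p):
    """Right-hand side of equations (3.2a)-(3.2c).

    c : (3,) concentrations (c_d, c_de, c_e), in 1/um^2
    p : dict of parameters with the names and units of Table 3.1
    """
    cd, cde, ce = c
    attach = p["cD"] * (p["wD"] + p["wdD"] * cd) * (p["cmax"] - cd - cde) / p["cmax"]
    dcd = attach - p["wE"] * p["cE"] * cd - p["wed"] * ce * cd
    dcde = p["wE"] * p["cE"] * cd + p["wed"] * ce * cd - (p["wdem"] + p["wdec"]) * cde
    dce = p["wdem"] * cde - p["wed"] * ce * cd - p["we"] * ce
    return np.array([dcd, dcde, dce])
```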
The Bonny model admits traveling wave solutions. In the traveling wave coordinate system, $z = x - st$, with spatial location $x$, time $t$, and nonzero traveling wave velocity $s$, where $c_d(x,t) = c_d(z)$, $c_{de}(x,t) = c_{de}(z)$, and $c_e(x,t) = c_e(z)$, the Bonny model (3.1) reduces to a system of second order ordinary differential equations:
\[
\frac{dc_d}{dz} = -\frac{1}{s}\left(c_D(\omega_D + \omega_{dD}c_d)(c_{max} - c_d - c_{de})/c_{max} - \omega_E c_E c_d - \omega_{ed}c_e c_d + D_d\frac{d^2c_d}{dz^2}\right), \tag{3.3a}
\]
\[
\frac{dc_{de}}{dz} = -\frac{1}{s}\left(\omega_E c_E c_d + \omega_{ed}c_e c_d - (\omega_{de,m} + \omega_{de,c})c_{de} + D_{de}\frac{d^2c_{de}}{dz^2}\right), \tag{3.3b}
\]
\[
\frac{dc_e}{dz} = -\frac{1}{s}\left(\omega_{de,m}c_{de} - \omega_{ed}c_e c_d - \omega_e c_e + D_e\frac{d^2c_e}{dz^2}\right). \tag{3.3c}
\]

3.3 Synthetic Data Generation

Instead of fitting a form of the Bonny model to experimental data, I generate synthetic data from a numerical solution of the form of the Bonny model using the parameters in Table 3.1, and fit parameters in the form of the Bonny model to the synthetic data. This allows me to test my method within a controlled context, for a more concrete interpretation of my results.

The spatially homogeneous Bonny model (3.2) admits numerical solutions with oscillating pulses in MinD and MinE concentrations. To generate synthetic spatially-homogeneous data, I numerically solve the spatially homogeneous Bonny model (3.2) with the parameter values from Table 3.1 and small but nonzero initial conditions, $c_d(0) = 5.83$ µm⁻², $c_{de}(0) = 1.34\cdot10^{-1}$ µm⁻², and $c_e(0) = 1.58\cdot10^{-1}$ µm⁻², to introduce some uncertainty in the values of initial conditions when fitting data. In doing so, I use the MATLAB ODE solver ode15s with a relative error tolerance of $10^{-12}$ and an absolute error tolerance of $10^{-12}$. I extract the synthetic spatially-homogeneous data by sampling the numerical solution every 0.5 s and calculating observable-state values. Synthetic spatially-homogeneous data is shown in Figure 3.1.

Figure 3.1: Synthetic spatially-homogeneous data. Data is generated by numerically solving the spatially homogeneous Bonny model (3.2) with the parameters from Table 3.1. Data is shown with points, and dashed lines are shown to emphasize the underlying pulse behavior.

To generate synthetic traveling-wave data, I construct a temporal pulse profile by numerically solving the spatially homogeneous Bonny model (3.2) with the parameter values from Table 3.1 and zero initial conditions. Then, I transform the temporal pulse profile into an initial pulse profile in space, and numerically evolve the pulse profile according to the Bonny model (3.1) with the parameter values from Table 3.1 and periodic boundary conditions. In doing so, I use the method of lines with a symmetric second order finite difference discretization of the Laplacian and RK4 time-stepping, on a grid with 1/8 µm between spatial grid points and $10^{-3}$ s between temporal grid points. Over time, the pulse profile forms into a stable traveling wave profile, with a measured traveling wave velocity of $s = -1.15$ µm s⁻¹. I extract the synthetic traveling-wave data by sampling the stable traveling wave profile every 0.5 µm and calculating observable-state values. Synthetic traveling-wave data is shown in Figure 3.2.

Figure 3.2: Synthetic traveling-wave data. Data is generated by numerically evolving a pulse with the full Bonny model (3.1) and the parameters from Table 3.1 until a stable traveling wave forms. Data is shown with points, and dashed lines are shown to emphasize the underlying traveling wave behavior.

The full Bonny model (3.1) demonstrates traveling wave emergence, the temporal evolution from a pulse profile into a stable traveling wave profile. I extract the synthetic traveling-wave-emergence data by sampling the numerical evolution of a pulse profile, as described above, every 0.5 µm and every 0.5 s for the first 15 s of its numerical evolution and calculating observable-state values. Synthetic traveling-wave-emergence data is shown in Figure 3.3.

Figure 3.3: Synthetic traveling-wave-emergence data. Data is generated by numerically evolving a pulse with the full Bonny model (3.1) and the parameters from Table 3.1 for 15 s. MinD data is shown in (a), and MinE data is shown in (b). Gradation is from black, with a value of 0, to white, with a value of $1.9\cdot10^4$ in (a) and $4.5\cdot10^3$ in (b).
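The synthetic spatially-homogeneous data can be reproduced in outline with any stiff ODE solver. The dissertation uses MATLAB's ode15s; the sketch below substitutes SciPy's BDF method as a rough analogue (an assumption of mine, not an equivalence), reuses `bonny_homogeneous_rhs` from the sketch above, and picks an illustrative 50 s time span, which is not specified here in the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

params = {"cD": 4.80e2, "cE": 7.00e2, "cmax": 2.75e4, "wD": 5.00e-4,
          "wE": 1.36e-4, "wdD": 3.18e-3, "wdec": 1.60e-1, "wdem": 2.52e0,
          "we": 5.00e-1, "wed": 4.90e-3}               # Table 3.1 values

t_data = np.arange(0.0, 50.0 + 0.5, 0.5)               # sample every 0.5 s
c0 = [5.83, 1.34e-1, 1.58e-1]                          # small nonzero initial conditions
sol = solve_ivp(bonny_homogeneous_rhs, (t_data[0], t_data[-1]), c0,
                args=(params,), method="BDF", rtol=1e-12, atol=1e-12,
                t_eval=t_data)
minD = sol.y[0] + sol.y[1]                             # observable MinD = c_d + c_de
minE = sol.y[1] + sol.y[2]                             # observable MinE = c_de + c_e
```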
3.4 Details of Optimization Using Overlapping-Niche Descent

Here, I describe structural components of overlapping-niche descent for forms of the Bonny model. I describe details pertaining to the implementation of overlapping-niche descent in Section E.1.

3.4.1 Defining $r_y(\mathbf{p},\mathbf{x})$, $r_{\hat{y}}(\mathbf{p},\mathbf{x})$, and $r_{\Delta x}(\mathbf{p},\mathbf{x})$

Preliminarily, for consistency with previous notation, I define: $x_1 = c_d$, $x_2 = c_{de}$, and $x_3 = c_e$; $p_1 = D_d$, $p_2 = D_{de}$, $p_3 = D_e$, $p_4 = c_{max}$, $p_5 = \omega_D$, $p_6 = \omega_E$, $p_7 = \omega_{dD}$, $p_8 = \omega_{de,c}$, $p_9 = \omega_{de,m}$, $p_{10} = \omega_e$, and $p_{11} = \omega_{ed}$; $y_1 = \mathrm{MinD}$ and $y_2 = \mathrm{MinE}$; and $g_1 = x_1 + x_2$ and $g_2 = x_2 + x_3$. Thus, $n_x = 3$, $n_p = 11$, and $n_y = 2$.

For the spatially homogeneous Bonny model (3.2), I define $r_y(\mathbf{p},\mathbf{x})$ as in equation (2.6), $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ as in equation (2.17), and $r_{\Delta x}(\mathbf{p},\mathbf{x})$ as in equation (2.13a). In $r_y(\mathbf{p},\mathbf{x})$, I use unitary data weights. In $r_{\hat{y}}(\mathbf{p},\mathbf{x})$, I set $\hat{\sigma} = 1$, and generate interpolated data and interpolated data weights using a piecewise cubic spline with not-a-knot end conditions. I discretize the spatially homogeneous Bonny model (3.2) using the backward Euler method, a method with first order accuracy. Thus, in $r_{\Delta x}(\mathbf{p},\mathbf{x})$,
\[
\Delta x_{i,k} = \begin{cases}0 & \text{if } k \in \{1\}, \\[1mm] \dfrac{x_{i,k} - x_{i,k_-}}{\Delta t} & \text{if } k \in I_\Delta\setminus\{1\},\end{cases}
\qquad
F_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x}) = \begin{cases}0 & \text{if } k \in \{1\}, \\[1mm] \bar{F}_i(\mathbf{p}, x_{1,k}, x_{2,k}, \dots, x_{n_x,k}) & \text{if } k \in I_\Delta\setminus\{1\},\end{cases} \tag{3.4}
\]
where $k_-$ is the index below $k$ in $I_\Delta$, $\Delta t$ is the grid spacing in $\{t_k : k \in I_\Delta\}$, and $\bar{F}_i$ is as defined in equation (2.7). In the smoothing penalties, $s_i(\mathbf{x})$, of $r_{\Delta x}(\mathbf{p},\mathbf{x})$, I set $\alpha_i = 1$, $\beta_i = 10^2$, and $\gamma_i = 2$, for all $i \in \{1, 2, \dots, n_x\}$, to insignificantly modify $r_{\Delta x}(\mathbf{p},\mathbf{x})$ with a smooth set of state values and to strongly penalize $r_{\Delta x}(\mathbf{p},\mathbf{x})$ with a jagged set of state values.

For the traveling wave Bonny model (3.3), I define $r_y(\mathbf{p},\mathbf{x})$ as in equation (A.7a), $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ as in equation (A.7b), and $r_{\Delta x}(\mathbf{p},\mathbf{x})$ as in equation (A.7c), with $t = z$. In $r_y(\mathbf{p},\mathbf{x})$, I use unitary data weights. In $r_{\hat{y}}(\mathbf{p},\mathbf{x})$, I set $\hat{\sigma} = 1$, and generate interpolated data and interpolated data weights using a piecewise cubic spline with not-a-knot end conditions. I discretize the traveling wave Bonny model (3.3) using a central first order finite difference (a finite difference with second order accuracy) for first derivatives and a symmetric second order finite difference (also with second order accuracy) for second derivatives. Thus, in $r_{\Delta x}(\mathbf{p},\mathbf{x})$,
\[
\Delta^1 x_{i,k} = \begin{cases}0 & \text{if } k\in\{1,n_z\}, \\[1mm] \dfrac{x_{i,k_+} - x_{i,k_-}}{2\Delta z} & \text{if } k\in I_\Delta\setminus\{1,n_z\},\end{cases}
\qquad
\Delta^2 x_{i,k} = \begin{cases}0 & \text{if } k\in\{1,n_z\}, \\[1mm] \dfrac{x_{i,k_+} - 2x_{i,k} + x_{i,k_-}}{\Delta z^2} & \text{if } k\in I_\Delta\setminus\{1,n_z\},\end{cases}
\]
\[
F_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x}) = \begin{cases}0 & \text{if } k\in\{1,n_z\}, \\[1mm] \bar{F}_i\big(\mathbf{p}, x_{1,k}, x_{2,k}, \dots, x_{n_x,k}, \Delta^2 x_{i,k}\big) & \text{if } k\in I_\Delta\setminus\{1,n_z\},\end{cases} \tag{3.5}
\]
where $k_-$ and $k_+$ are the indices below and above $k$ in $I_\Delta$, $\Delta z$ is the grid spacing in $\{z_k : k\in I_\Delta\}$, and $\bar{F}_i$ is as defined in equation (A.3). As with the spatially homogeneous Bonny model, I set $\alpha_i = 1$, $\beta_i = 10^2$, and $\gamma_i = 2$ in the smoothing penalties, $s_i(\mathbf{x})$, of $r_{\Delta x}(\mathbf{p},\mathbf{x})$, for all $i \in \{1, 2, \dots, n_x\}$, to insignificantly modify $r_{\Delta x}(\mathbf{p},\mathbf{x})$ with a smooth set of state values and to strongly penalize $r_{\Delta x}(\mathbf{p},\mathbf{x})$ with a jagged set of state values.

For the full Bonny model (3.1), I define $r_y(\mathbf{p},\mathbf{x})$ as in equation (A.14a), $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ as in equation (A.14b), and $r_{\Delta x}(\mathbf{p},\mathbf{x})$ as in equation (A.14c), with $u$ in time $t$ and $v$ in space $s$. In $r_y(\mathbf{p},\mathbf{x})$, I use unitary data weights. In $r_{\hat{y}}(\mathbf{p},\mathbf{x})$, I set $\hat{\sigma} = 1$, and generate interpolated data and interpolated data weights using a two-dimensional piecewise cubic spline with not-a-knot end conditions. I discretize the full Bonny model (3.1) using a Simpson-method first order finite difference in time (a finite difference with fourth order accuracy) and a symmetric second order finite difference in space (a finite difference with second order accuracy). Thus, in $r_{\Delta x}(\mathbf{p},\mathbf{x})$,
\[
\Delta^{1,0} x_{i,k,l} = \begin{cases}0 & \text{if } k\in\{1,n_t\} \text{ or } l\in\{1,n_s\}, \\[1mm] \dfrac{x_{i,k_+,l} - x_{i,k_-,l}}{2\Delta t} & \text{if } (k,l)\in I_{\Delta_t}\setminus\{1,n_t\} \times I_{\Delta_s}\setminus\{1,n_s\},\end{cases}
\]
\[
\Delta^{0,2} x_{i,k,l} = \begin{cases}0 & \text{if } l\in\{1,n_s\}, \\[1mm] \dfrac{x_{i,k,l_+} - 2x_{i,k,l} + x_{i,k,l_-}}{\Delta s^2} & \text{if } l\in I_{\Delta_s}\setminus\{1,n_s\},\end{cases}
\]
\[
F_{i,k,l}(\mathbf{t},\mathbf{p},\mathbf{x}) = \begin{cases}0 & \text{if } k\in\{1,n_t\} \text{ or } l\in\{1,n_s\}, \\[1mm] \displaystyle\sum_{m=-1}^{1} b_m \bar{F}_i\big(\mathbf{p},\mathbf{x}_{k+m,l},\Delta^{0,2} x_{i,k+m,l}\big) & \text{if } (k,l)\in I_{\Delta_t}\setminus\{1,n_t\} \times I_{\Delta_s}\setminus\{1,n_s\},\end{cases} \tag{3.6}
\]
where $k_-$ and $k_+$ are the indices below and above $k$ in $I_{\Delta_t}$, $l_-$ and $l_+$ are the indices below and above $l$ in $I_{\Delta_s}$, $\Delta t$ is the grid spacing in $\{t_k : k \in I_{\Delta_t}\}$, $\Delta s$ is the grid spacing in $\{s_l : l \in I_{\Delta_s}\}$, $b_{-1} = 1/6$, $b_0 = 4/6$, $b_1 = 1/6$, $\bar{F}_i$ is as defined in equation (A.10), and $\mathbf{x}_{k,l} = x_{1,k,l}, x_{2,k,l}, \dots, x_{n_x,k,l}$.
As with the spatially homogeneous Bonny model, I set $\alpha_i = 1$, $\beta_i = 10^2$, and $\gamma_i = 2$ in the smoothing penalties, $s^t_i(\mathbf{x})$ and $s^s_i(\mathbf{x})$, of $r_{\Delta x}(\mathbf{p},\mathbf{x})$, for all $i \in \{1, 2, \dots, n_x\}$, to insignificantly modify $r_{\Delta x}(\mathbf{p},\mathbf{x})$ with a smooth set of state values and to strongly penalize $r_{\Delta x}(\mathbf{p},\mathbf{x})$ with a jagged set of state values.

3.4.2 Domain Restrictions on Parameters and States

Rate parameters, $\omega_D$, $\omega_{dD}$, $\omega_E$, $\omega_{ed}$, $\omega_{de,m}$, $\omega_{de,c}$, and $\omega_e$, and diffusion coefficients, $D_d$, $D_{de}$, and $D_e$, are only biologically relevant if nonnegative. Thus, I restrict rate parameters and diffusion coefficients to nonnegative values:
\[
p \ge 0 \quad \text{for all } p \in \{\omega_D, \omega_{dD}, \omega_E, \omega_{ed}, \omega_{de,m}, \omega_{de,c}, \omega_e, D_d, D_{de}, D_e\}. \tag{3.7}
\]
The parameter $c_{max}$ dictates the maximum concentration of membrane-bound MinD. Thus, I restrict $c_{max}$ to values greater than or equal to the maximum MinD data value, $D_{max}$:
\[
c_{max} \ge D_{max}. \tag{3.8}
\]
For the synthetic spatially-homogeneous data, $D_{max} = 1.88\cdot10^4$ µm⁻²; for the synthetic traveling-wave data, $D_{max} = 1.72\cdot10^4$ µm⁻²; and for the synthetic traveling-wave-emergence data, $D_{max} = 1.88\cdot10^4$ µm⁻². Concentrations $c_d$, $c_{de}$, and $c_e$ are only biologically relevant if nonnegative. Thus, I restrict $c_d$, $c_{de}$, and $c_e$ to nonnegative values:
\[
c_{i,k} \ge 0 \quad \text{for all } i \in \{d, de, e\} \text{ and } k \in I_\Delta, \tag{3.9}
\]
where $c_{d,k}$, $c_{de,k}$, and $c_{e,k}$ are the values of $c_d$, $c_{de}$, and $c_e$ at the $k$th index of the numerical discretization. Details of overlapping-niche descent on restricted domains are described in Section C.2.3.

3.4.3 Niches

I choose 101 values of $\lambda$, $\lambda_k$ for $k = 1, 2, \dots, 101$, to define 101 niches. The bounds (B.68) and (B.69), which state that $\breve{r}_y(\lambda) \le \bar{\varepsilon}$ if $\lambda \le \bar{\varepsilon}/(1+\bar{\varepsilon})$ and $\breve{r}_{\Delta x}(\lambda) \le \bar{\varepsilon}$ if $\lambda \ge 1/(1+\bar{\varepsilon})$ for some tolerance $\bar{\varepsilon}$, provide a meaningful guide for the choice of $\lambda_k$. Thus, based on the bounds (B.68) and (B.69) with chosen $\bar{\varepsilon} = b^0, b^{-1}, \dots, b^{-50}$ and base $b$ such that $b^{-50} = 10^{-6}$, I define $\lambda_k$ for $k = 1, 2, \dots, 101$ such that
\[
\lambda_k = \begin{cases}\dfrac{b^{\,k-51}}{1 + b^{\,k-51}} & \text{if } k \le 51, \\[2mm] \dfrac{1}{1 + b^{\,51-k}} & \text{if } k > 51.\end{cases} \tag{3.10}
\]
My choice of $\lambda_k$ distributes the values of $\lambda_k$ for $k = 1, 2, \dots, 101$ more densely near 0 and 1 and less densely near 0.5. For reference, $\lambda_1 \approx 10^{-6}$, $\lambda_2 \approx 1.3\cdot10^{-6}$, $\lambda_{51} = 0.5$, $\lambda_{52} \approx 0.57$, $\lambda_{100} \approx 1 - 1.3\cdot10^{-6}$, and $\lambda_{101} \approx 1 - 10^{-6}$.
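The niche values of equation (3.10) are straightforward to generate; the sketch below reproduces the 101 values with base $b = 10^{6/50}$, so that $b^{-50} = 10^{-6}$. The two piecewise branches are algebraically the same function; the split simply keeps the evaluation well-conditioned near $\lambda = 1$.

```python
import numpy as np

def niche_lambdas(n=101, eps_min=1e-6):
    """Niche values lambda_k of equation (3.10), dense near 0 and 1."""
    m = (n - 1) // 2                      # 50 tolerance bounds on each side
    b = eps_min ** (-1.0 / m)             # base b with b**(-m) = eps_min
    k = np.arange(1, n + 1)
    lam = np.empty(n)
    lo = k <= m + 1
    lam[lo] = b ** (k[lo] - (m + 1)) / (1.0 + b ** (k[lo] - (m + 1)))
    lam[~lo] = 1.0 / (1.0 + b ** ((m + 1) - k[~lo]))
    return lam
```

As a check, `niche_lambdas()` gives `lam[0]` ≈ $10^{-6}$, `lam[50]` = 0.5, and `lam[100]` ≈ $1 - 10^{-6}$, matching the reference values above.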
3.4.4 Calculating Confidence Intervals

I calculate confidence intervals by bootstrapping, given the complex nonlinear relationship between data noise and parameter noise that would not be adequately captured using a (Taylor-expansion-based) delta method [39]. In doing so, I calculate observable-state residuals,
\[
\tilde{\varepsilon}_{j,k} = y_{j,k} - g_j(\tilde{\mathbf{p}}, \tilde{x}_{1,k}, \dots, \tilde{x}_{n_x,k}), \tag{3.11}
\]
where $\tilde{\mathbf{p}} = \tilde{\mathbf{p}}_{\lambda_{101}}$ and $\tilde{\mathbf{x}} = \tilde{\mathbf{x}}_{\lambda_{101}}$, the parameters and state values that minimize $r(\mathbf{p},\mathbf{x};\lambda_{101})$, and $\tilde{x}_{i,k}$ is the value in $\tilde{\mathbf{x}}$ from the $i$th state and the $k$th grid index, for $i \in \{1, 2, \dots, n_x\}$, $j \in \{1, 2, \dots, n_y\}$, and $k \in \{1, 2, \dots, n_t\}$. By resampling residuals, I generate $n_b = 10^3$ bootstrap data sets:
\[
y^b_{j,k} = g_j(\tilde{\mathbf{p}}, \tilde{x}_{1,k}, \dots, \tilde{x}_{n_x,k}) + \tilde{\varepsilon}_{j,l}, \tag{3.12}
\]
where $l$ is randomly sampled with replacement from $\{1, 2, \dots, n_t\}$, for $j \in \{1, 2, \dots, n_y\}$ and $k \in I_\Delta$. I replace observed data values in $r(\mathbf{p},\mathbf{x};\lambda)$ with bootstrap data values from the $i$th bootstrap data set to construct the functional $r^b_i(\mathbf{p},\mathbf{x};\lambda)$. Globally minimizing $r^b_i(\mathbf{p},\mathbf{x};\lambda)$ using overlapping-niche descent for all $i \in \{1, 2, \dots, n_b\}$ would be computationally prohibitive. Rather, if residuals are not overly large, the optimal parameters and state values of $r^b_i(\mathbf{p},\mathbf{x};\lambda)$ will generally be fairly similar to $\tilde{\mathbf{p}}$ and $\tilde{\mathbf{x}}$. Thus, with $\tilde{\mathbf{p}}$ and $\tilde{\mathbf{x}}$ as initial parameters and state values, I locally optimize $r^b_i(\mathbf{p},\mathbf{x};\lambda_b)$ using accelerated descent, for all $i \in \{1, 2, \dots, n_b\}$, with $\lambda_b$ chosen large enough to weight local optimization towards a numerical solution but not so large that $\mathbf{p}$ and $\mathbf{x}$ are fixed near $\tilde{\mathbf{p}}$ and $\tilde{\mathbf{x}}$. Specifically, I choose
\[
\lambda_b = \arg\min\left\{\left|r_y(\tilde{\mathbf{p}}_\lambda,\tilde{\mathbf{x}}_\lambda) - 10^3\,r_{\Delta x}(\tilde{\mathbf{p}}_\lambda,\tilde{\mathbf{x}}_\lambda)\right| : \lambda \in \{\lambda_1, \lambda_2, \dots, \lambda_{101}\}\right\}. \tag{3.13}
\]
From the $n_b$ local optimizations, I construct a distribution of values for each parameter. From the distribution of values for parameter $p_j$, I compute the 2.5th and 97.5th percentile values, which I translate into the 95% confidence interval for parameter $p_j$, for $j \in \{1, 2, \dots, n_p\}$.
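The residual-resampling step of equations (3.11) and (3.12) is simple to express; the sketch below generates one bootstrap data set. Whether the resampled index $l$ is shared across observable states is not specified here, so the sketch pairs residuals across states, and the NumPy random generator is an implementation assumption.

```python
import numpy as np

def bootstrap_dataset(y_fit, residuals, rng):
    """One bootstrap data set by resampling residuals, equation (3.12).

    y_fit     : (n_y, n_t) fitted observables g_j(p~, x~_k)
    residuals : (n_y, n_t) observable-state residuals, equation (3.11)
    """
    n_t = y_fit.shape[1]
    l = rng.integers(0, n_t, size=n_t)   # indices sampled with replacement
    return y_fit + residuals[:, l]       # residuals paired across observables

rng = np.random.default_rng(0)
# boot_sets = [bootstrap_dataset(y_fit, eps, rng) for _ in range(1000)]
```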
3.5 Fitting Forms of the Bonny Model to Synthetic Data

To ascertain the efficacy of overlapping-niche descent, I fit forms of the Bonny model to synthetic data.

3.5.1 Fitting the Spatially Homogeneous Bonny Model to the Synthetic Spatially-Homogeneous Data

I fit the spatially homogeneous Bonny model (3.2) to the synthetic spatially-homogeneous data using overlapping-niche descent, as described in Section 3.4, on a uniform grid with a grid refinement factor of 1, $n_\Delta n_t^{-1} = 1$ for $n_\Delta$ the number of grid points and $n_t$ the number of data points. I find that $r_y(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 4.36\cdot10^{-4}$, $r_{\Delta x}(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 1.06\cdot10^{-11}$, the mean time per iteration of accelerated descent is $1.46\cdot10^{-4}$ s, and the total accelerated descent time is $7.38\cdot10^{-1}$ minutes. I calculate the total accelerated descent time as the sum of the maximal accelerated descent time in each generation, as I compute accelerated descent in parallel. Observable-state values of $\tilde{\mathbf{x}}_{\lambda_{101}}$, the state values that minimize $r(\mathbf{p},\mathbf{x};\lambda_{101})$, are shown in Figure 3.4.

Figure 3.4: The fit of the spatially homogeneous Bonny model to the synthetic spatially-homogeneous data. Observable-state values are shown with solid lines and data values are shown with points. The spatially homogeneous Bonny model fits the synthetic spatially-homogeneous data fairly well. Fitting errors arise from a relatively coarse numerical discretization.

As is visible in Figure 3.4, the observable-state values of $\tilde{\mathbf{x}}_{\lambda_{101}}$ visibly differ from synthetic data values at some times. This discrepancy stems from a relatively inaccurate method on a relatively coarse grid. On more refined grids, the observable-state values of $\tilde{\mathbf{x}}_{\lambda_{101}}$ fit synthetic data more accurately (shown in Section 3.6) and are visually indistinguishable from the synthetic spatially-homogeneous data (not shown). Parameter estimates from the fit of the spatially homogeneous Bonny model to the synthetic spatially-homogeneous data are shown in Table 3.2.

                true value   estimated value   95% confidence interval       units
$c_{max}$       2.75·10⁴     3.02·10⁴          [2.96·10⁴, 3.09·10⁴]          µm⁻²
$\omega_D$      5.00·10⁻⁴    0.00·10⁰          [0.00·10⁰, 2.56·10⁻³]         µm s⁻¹
$\omega_E$      1.36·10⁻⁴    1.20·10⁻⁴         [1.17·10⁻⁴, 1.22·10⁻⁴]        µm³ s⁻¹
$\omega_{dD}$   3.18·10⁻³    2.99·10⁻³         [2.91·10⁻³, 3.05·10⁻³]        µm³ s⁻¹
$\omega_{de,c}$ 1.60·10⁻¹    9.31·10⁻²         [8.64·10⁻², 9.98·10⁻²]        s⁻¹
$\omega_{de,m}$ 2.52·10⁰     2.59·10⁰          [2.56·10⁰, 2.62·10⁰]          s⁻¹
$\omega_e$      5.00·10⁻¹    5.78·10⁻¹         [5.69·10⁻¹, 5.91·10⁻¹]        s⁻¹
$\omega_{ed}$   4.90·10⁻³    4.41·10⁻³         [4.35·10⁻³, 4.46·10⁻³]        µm² s⁻¹

Table 3.2: Parameter estimates from the fit of the spatially homogeneous Bonny model to the synthetic spatially-homogeneous data. Parameter estimates are generally fairly similar to true parameter values.

3.5.2 Fitting the Traveling Wave Bonny Model to the Synthetic Traveling-Wave Data

I fit the traveling wave Bonny model (3.3) to the synthetic traveling-wave data using overlapping-niche descent, as described in Section 3.4, on a uniform grid with a grid refinement factor of 1, $n_\Delta n_t^{-1} = 1$ for $n_\Delta$ the number of grid points and $n_t$ the number of data points. I find that the observable-state values of $\tilde{\mathbf{x}}_{\lambda_{101}}$ are visually indistinguishable from the synthetic traveling-wave data shown in Figure 3.2, $r_y(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 1.61\cdot10^{-5}$, $r_{\Delta x}(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 6.18\cdot10^{-17}$, the mean time per iteration of accelerated descent is $1.29\cdot10^{-4}$ s, and the total accelerated descent time is $2.69\cdot10^{-1}$ minutes. I note that overlapping-niche descent requires a similar amount of time to fit the traveling wave Bonny model to the synthetic traveling-wave data as it does to fit the spatially homogeneous Bonny model to the synthetic spatially-homogeneous data ($2.69\cdot10^{-1}$ minutes vs. $7.38\cdot10^{-1}$ minutes), even though the traveling wave Bonny model is a boundary value problem and the spatially homogeneous Bonny model is an initial value problem. Parameter estimates from the fit of the traveling wave Bonny model to the synthetic traveling-wave data are shown in Table 3.3.

                true value   estimated value   95% confidence interval       units
$D_d$           3.00·10⁻¹    3.37·10⁻¹         [3.31·10⁻¹, 3.46·10⁻¹]        µm² s⁻¹
$D_{de}$        3.00·10⁻¹    2.42·10⁻¹         [2.30·10⁻¹, 2.53·10⁻¹]        µm² s⁻¹
$D_e$           1.80·10⁰     1.68·10⁰          [1.65·10⁰, 1.71·10⁰]          µm² s⁻¹
$c_{max}$       2.75·10⁴     2.83·10⁴          [2.81·10⁴, 2.84·10⁴]          µm⁻²
$\omega_D$      5.00·10⁻⁴    1.36·10⁻³         [5.82·10⁻⁴, 1.97·10⁻³]        µm s⁻¹
$\omega_E$      1.36·10⁻⁴    1.39·10⁻⁴         [1.38·10⁻⁴, 1.40·10⁻⁴]        µm³ s⁻¹
$\omega_{dD}$   3.18·10⁻³    3.07·10⁻³         [3.06·10⁻³, 3.08·10⁻³]        µm³ s⁻¹
$\omega_{de,c}$ 1.60·10⁻¹    1.72·10⁻¹         [1.69·10⁻¹, 1.76·10⁻¹]        s⁻¹
$\omega_{de,m}$ 2.52·10⁰     2.49·10⁰          [2.48·10⁰, 2.49·10⁰]          s⁻¹
$\omega_e$      5.00·10⁻¹    4.94·10⁻¹         [4.87·10⁻¹, 4.97·10⁻¹]        s⁻¹
$\omega_{ed}$   4.90·10⁻³    4.84·10⁻³         [4.84·10⁻³, 4.86·10⁻³]        µm² s⁻¹

Table 3.3: Parameter estimates from the fit of the traveling wave Bonny model to the synthetic traveling-wave data. Parameter estimates are closer to true parameter values than those shown in Table 3.2.

Rate parameter estimates from fitting the traveling wave Bonny model to the synthetic traveling-wave data (shown in Table 3.3) are generally somewhat more accurate with narrower spreads than rate parameter estimates from fitting the spatially homogeneous Bonny model to the synthetic spatially-homogeneous data (shown in Table 3.2). I suspect this occurs because, on average, the traveling wave Bonny model fits the synthetic traveling-wave data somewhat better than the spatially homogeneous Bonny model fits the synthetic spatially-homogeneous data ($r_y(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 1.61\cdot10^{-5}$ vs. $r_y(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 4.36\cdot10^{-4}$).

3.5.3 Fitting the Full Bonny Model to the Synthetic Traveling-Wave-Emergence Data

I fit the full Bonny model (3.1) to the synthetic traveling-wave-emergence data using overlapping-niche descent, as described in Section 3.4, on a uniform grid with a grid refinement factor of 1, $n_{\Delta_t} n_{\Delta_s} n_t^{-1} n_s^{-1} = 1$ for $n_{\Delta_t}$ and $n_{\Delta_s}$ the number of temporal and spatial grid points and $n_t$ and $n_s$ the number of temporal and spatial data points.
I find that the observable-state values of $\tilde{\mathbf{x}}_{\lambda_{101}}$ are visually indistinguishable from the synthetic traveling-wave-emergence data shown in Figure 3.3, $r_y(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 1.29\cdot10^{-6}$, $r_{\Delta x}(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 1.98\cdot10^{-16}$, the mean time per iteration of accelerated descent is $6.03\cdot10^{-3}$ s, and the total accelerated descent time is 48.0 minutes. Parameter estimates from the fit of the full Bonny model to the synthetic traveling-wave-emergence data are shown in Table 3.4.

                true value   estimated value   95% confidence interval       units
$D_d$           3.00·10⁻¹    2.98·10⁻¹         [2.98·10⁻¹, 2.99·10⁻¹]        µm² s⁻¹
$D_{de}$        3.00·10⁻¹    3.04·10⁻¹         [3.03·10⁻¹, 3.05·10⁻¹]        µm² s⁻¹
$D_e$           1.80·10⁰     1.83·10⁰          [1.83·10⁰, 1.83·10⁰]          µm² s⁻¹
$c_{max}$       2.75·10⁴     2.75·10⁴          [2.75·10⁴, 2.75·10⁴]          µm⁻²
$\omega_D$      5.00·10⁻⁴    0.00·10⁰          [0.00·10⁰, 1.83·10⁻⁵]         µm s⁻¹
$\omega_E$      1.36·10⁻⁴    1.36·10⁻⁴         [1.35·10⁻⁴, 1.36·10⁻⁴]        µm³ s⁻¹
$\omega_{dD}$   3.18·10⁻³    3.18·10⁻³         [3.18·10⁻³, 3.18·10⁻³]        µm³ s⁻¹
$\omega_{de,c}$ 1.60·10⁻¹    1.59·10⁻¹         [1.58·10⁻¹, 1.59·10⁻¹]        s⁻¹
$\omega_{de,m}$ 2.52·10⁰     2.52·10⁰          [2.52·10⁰, 2.52·10⁰]          s⁻¹
$\omega_e$      5.00·10⁻¹    5.00·10⁻¹         [5.00·10⁻¹, 5.01·10⁻¹]        s⁻¹
$\omega_{ed}$   4.90·10⁻³    4.85·10⁻³         [4.85·10⁻³, 4.85·10⁻³]        µm² s⁻¹

Table 3.4: Parameter estimates from the fit of the full Bonny model to the synthetic traveling-wave-emergence data. Parameter estimates are very similar to true parameter values except for the parameter $\omega_D$, which seems to play a small role in the overall dynamics of the Bonny model.

Parameter estimates from fitting the full Bonny model to the synthetic traveling-wave-emergence data (shown in Table 3.4) are generally somewhat more accurate with narrower spreads than parameter estimates from fitting the traveling wave Bonny model to the synthetic traveling-wave data (shown in Table 3.3). I suspect this occurs because, on average, the full Bonny model fits the synthetic traveling-wave-emergence data somewhat better than the traveling wave Bonny model fits the synthetic traveling-wave data ($r_y(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 1.29\cdot10^{-6}$ vs. $r_y(\tilde{\mathbf{p}}_{\lambda_{101}},\tilde{\mathbf{x}}_{\lambda_{101}}) = 1.61\cdot10^{-5}$).

Neither the spatially homogeneous Bonny model, the traveling wave Bonny model, nor the full Bonny model accurately estimates the nonzero value of $\omega_D$ when fitting the respective synthetic data (see Tables 3.2, 3.3, and 3.4). Yet, the spatially homogeneous Bonny model (on grids more refined than when $n_\Delta n_t^{-1} = 1$), the traveling wave Bonny model, and the full Bonny model very accurately fit the respective synthetic data. Thus, it appears that, beyond allowing an initial increase in $c_d$ from a homogeneous initial condition, $\omega_D$ plays very little role in the overall dynamics of the Bonny systems.

As shown for the spatially homogeneous Bonny model in Table 3.2, the traveling wave Bonny model in Table 3.3, and the full Bonny model in Table 3.4, 95% confidence intervals often do not include true parameter values. I suspect this occurs because errors in fitting arise from discretization errors in the numerical methods. As such, residuals are small and are not independent or identically distributed. Thus, when calculating confidence intervals by bootstrapping as described in Section 3.4.4, bootstrap data sets do not significantly differ from synthetic data and errors in bootstrap data sets do not accurately represent discretization errors in the numerical methods. Thus, 95% confidence intervals are often fairly narrow and may not include true parameter values.
3.6 Comparing Overlapping-Niche Descent to a Numerical-Integration-Based Method

I compare the performance of overlapping-niche descent to the performance of a numerical-integration-based parameter optimization method. For a balanced comparison, I construct a variant of overlapping-niche descent that omits $r_{\Delta x}(\mathbf{p},\mathbf{x})$ from the objective function and instead solves the differential equation numerically at each step. It also uses a single niche, so I refer to it as single-niche solution descent (SNSD). SNSD optimizes over parameters and initial conditions to minimize $r_y(\mathbf{p},\mathbf{x})$, with numerical solution values $\mathbf{x}$. Details of SNSD are described in Section E.2, and a minimal sketch of the SNSD objective follows below.

To highlight scenarios in which overlapping-niche descent outperforms SNSD, I compare the performance of overlapping-niche descent to the performance of SNSD on a set of differential equation systems that vary only in the size of the system. I construct the Bonny × n model, a differential equation system consisting of $n$ independent copies of the spatially homogeneous Bonny model (3.2), with $3n$ states, $8n$ parameters, and $2n$ observable states. Accordingly, I generate synthetic spatially-homogeneous × n data using $n$ copies of the synthetic spatially-homogeneous data. I fit the Bonny × n model to the synthetic spatially-homogeneous × n data, for $n = 1, 2, \dots, 5$, using overlapping-niche descent, as described in Section 3.4, and using SNSD. For both, I use the backward Euler scheme on a set of uniform grids with grid refinement factors of 1, 2, 3, and 4, $n_\Delta n_t^{-1} = 1, 2, 3, 4$ for $n_\Delta$ the number of grid points and $n_t$ the number of data points. I construct $r_y(\mathbf{p},\mathbf{x})$, $r_{\hat{y}}(\mathbf{p},\mathbf{x})$, and $r_{\Delta x}(\mathbf{p},\mathbf{x})$ for the Bonny × n model as described in Section 3.4.1 for the spatially homogeneous Bonny model. $r_y(\mathbf{p},\mathbf{x})$ is normalized by the number of observable states. Thus, for $\check{\mathbf{p}}$ and $\check{\mathbf{x}}$, the parameter and numerical solution values that minimize $r_y(\mathbf{p},\mathbf{x})$, the value of $r_y(\check{\mathbf{p}},\check{\mathbf{x}})$ is identical for the Bonny × n model for all $n \in \mathbb{N}^+$. In SNSD, I calculate numerical solutions using the backward Euler method, solve nonlinear systems using Newton's method with an absolute termination tolerance of $10^{-3}$, and solve matrix equations using Gaussian elimination. Ultimately, I calculate $\tilde{\mathbf{p}}$ and $\tilde{\mathbf{x}}$, approximations of $\check{\mathbf{p}}$ and $\check{\mathbf{x}}$. For overlapping-niche descent, $\tilde{\mathbf{p}} = \tilde{\mathbf{p}}_{\lambda_{101}}$ and $\tilde{\mathbf{x}} = \tilde{\mathbf{x}}_{\lambda_{101}}$, the parameters and state values that minimize $r(\mathbf{p},\mathbf{x};\lambda_{101})$. Results are shown in Tables 3.5, 3.6, 3.7, and 3.8.
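For contrast with the homotopy objective, the SNSD objective can be sketched as follows: numerically integrate the model with backward Euler from candidate parameters and initial conditions, then score the trajectory with $r_y$. The helper `backward_euler_solve` is a hypothetical stand-in for the Newton-based solver detailed in Section E.2, and `r_y` is reused from the earlier sketch.

```python
def snsd_objective(p, x0, t, y, w, F_bar, backward_euler_solve, g):
    """SNSD objective: r_y evaluated on a numerical solution.

    Parameters p and initial conditions x0 fully determine the trajectory,
    so optimization searches only over (p, x0), not over state values.
    """
    x = backward_euler_solve(F_bar, t, x0, p)   # implicit solve at each step
    return r_y(y, g(p, x), w)                   # data misfit of equation (2.6)
```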
overlapping-niche descent
            n∆ = nt      n∆ = 2nt     n∆ = 3nt     n∆ = 4nt
Bonny × 1   4.36·10⁻⁴    1.14·10⁻⁴    5.18·10⁻⁵    2.95·10⁻⁵
Bonny × 2   4.38·10⁻⁴    1.14·10⁻⁴    5.18·10⁻⁵    2.95·10⁻⁵
Bonny × 3   4.36·10⁻⁴    1.14·10⁻⁴    5.18·10⁻⁵    2.95·10⁻⁵
Bonny × 4   4.35·10⁻⁴    1.14·10⁻⁴    5.19·10⁻⁵    2.95·10⁻⁵
Bonny × 5   4.31·10⁻⁴    1.14·10⁻⁴    5.19·10⁻⁵    2.95·10⁻⁵

SNSD
            n∆ = nt      n∆ = 2nt     n∆ = 3nt     n∆ = 4nt
Bonny × 1   4.53·10⁻⁴    1.14·10⁻⁴    5.18·10⁻⁵    2.95·10⁻⁵
Bonny × 2   4.53·10⁻⁴    1.14·10⁻⁴    5.18·10⁻⁵    2.95·10⁻⁵
Bonny × 3   4.53·10⁻⁴    1.14·10⁻⁴    5.18·10⁻⁵    2.95·10⁻⁵
Bonny × 4   4.53·10⁻⁴    1.14·10⁻⁴    5.18·10⁻⁵    2.95·10⁻⁵
Bonny × 5   1.37·10⁻¹*   1.14·10⁻⁴    1.37·10⁻¹*   1.37·10⁻¹*

Table 3.5: Values of $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent and SNSD. $\tilde{\mathbf{p}}$ and $\tilde{\mathbf{x}}$ approximate $\check{\mathbf{p}}$ and $\check{\mathbf{x}}$, the parameters and state values of the optimal data-fitting numerical solution, for the fit of the Bonny × n model, a differential equation system consisting of $n$ independent copies of the spatially homogeneous Bonny model (3.2), to the synthetic spatially-homogeneous × n data, which consists of $n$ copies of the synthetic spatially-homogeneous data, for $n \in \{1, 2, \dots, 5\}$. Values marked with an asterisk are emphasized. Values of $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent and SNSD are similar except for $n = 5$ in the Bonny × n model, when SNSD fails to find the optimal data-fitting numerical solution.

overlapping-niche descent
            n∆ = nt      n∆ = 2nt     n∆ = 3nt     n∆ = 4nt
Bonny × 1   1.06·10⁻¹¹   4.82·10⁻¹⁷   4.69·10⁻¹⁷   4.61·10⁻¹⁶
Bonny × 2   1.50·10⁻¹¹   7.63·10⁻¹⁶   1.54·10⁻¹⁷   9.60·10⁻¹⁸
Bonny × 3   1.31·10⁻¹¹   4.40·10⁻¹⁶   8.92·10⁻¹⁸   1.66·10⁻¹⁷
Bonny × 4   2.31·10⁻¹¹   2.78·10⁻¹⁷   2.22·10⁻¹⁵   1.39·10⁻¹⁵
Bonny × 5   5.04·10⁻¹¹   2.77·10⁻¹⁷   8.10·10⁻¹⁷   2.89·10⁻¹⁷

SNSD
            n∆ = nt      n∆ = 2nt     n∆ = 3nt     n∆ = 4nt
Bonny × 1   1.42·10⁻¹²   1.75·10⁻¹²   2.03·10⁻¹²   1.78·10⁻¹²
Bonny × 2   1.42·10⁻¹²   1.75·10⁻¹²   2.03·10⁻¹²   1.78·10⁻¹²
Bonny × 3   1.42·10⁻¹²   1.75·10⁻¹²   2.03·10⁻¹²   1.78·10⁻¹²
Bonny × 4   1.42·10⁻¹²   1.75·10⁻¹²   2.03·10⁻¹²   1.78·10⁻¹²
Bonny × 5   1.14·10⁻¹²   1.75·10⁻¹²   1.62·10⁻¹²   1.42·10⁻¹²

Table 3.6: Values of $r_{\Delta x}(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent and SNSD. Notation is as defined in Table 3.5. Values of $r_{\Delta x}(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD are nonzero because numerical solution values are calculated implicitly using Newton's method in SNSD. Values of $r_{\Delta x}(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent are significantly less than values of $r_{\Delta x}(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD except on unrefined grids, where $n_\Delta = n_t$.

As shown in Table 3.5, $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent is essentially equal to $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD on each grid with $n_\Delta = 2n_t, 3n_t, 4n_t$ and for each Bonny × n model with $n = 1, 2, 3, 4$. $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent is slightly less than $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD on the grid with $n_\Delta = n_t$ and for each Bonny × n model with $n = 1, 2, 3, 4$. However, each integral representation of $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent, as defined in equation (2.26b), is similar to $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD (not shown), and each $r_{\Delta x}(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent is larger than $r_{\Delta x}(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD, as shown in Table 3.6. Thus, I expect larger $\lambda$ in overlapping-niche descent to decrease the value of $r_{\Delta x}(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ and increase the value of $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ to be commensurate with $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD. Values of $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent are essentially equal on each grid with $n_\Delta = 2n_t, 3n_t, 4n_t$ and for each Bonny × n model with $n = 1, 2, 3, 4, 5$. Somewhat similarly, values of $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD are essentially equal on each grid with $n_\Delta = n_t, 2n_t, 3n_t, 4n_t$ and for each Bonny × n model with $n = 1, 2, 3, 4$. However, $r_y(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD is significantly larger for the Bonny × n model with $n = 5$ than for $n = 1, 2, 3, 4$ on the grids with $n_\Delta = n_t$, $n_\Delta = 3n_t$, and $n_\Delta = 4n_t$. Also, as shown in Table 3.6, $r_{\Delta x}(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from overlapping-niche descent is no greater than $r_{\Delta x}(\tilde{\mathbf{p}},\tilde{\mathbf{x}})$ from SNSD on each grid with $n_\Delta = 2n_t, 3n_t, 4n_t$ and for each Bonny × n model with $n = 1, 2, 3, 4, 5$. Therefore, collectively, overlapping-niche descent appears to find $\check{\mathbf{p}}$ and $\check{\mathbf{x}}$, the parameter and numerical solution values that minimize $r_y(\mathbf{p},\mathbf{x})$, on each grid with $n_\Delta = n_t, 2n_t, 3n_t, 4n_t$ and for each Bonny × n model with $n = 1, 2, 3, 4, 5$.
Whereas, SNSDfails to find pˇ and xˇ on each grid with n∆ = nt, n∆ = 3nt, and n∆ = 4nt for the Bonny × nmodel with n = 5. Ultimately, overlapping-niche descent appears to find the optimal data-fittingnumerical solution more robustly than SNSD.A system of differential equations often admits a variety of parameter dependent solutionbehaviors. Bifurcations separate the solution space, and thus the numerical solution space,into regions with qualitatively different behaviors. In numerical-integration-based methods,including SNSD, parameters and initial conditions entirely define numerical solutions. Thus, anumerical-integration-based method can only find the optimal data-fitting numerical solution ifoptimization begins with a set of parameters and initial conditions of a numerical solution withthe same qualitative behavior as the optimal data-fitting numerical solution.As a system of differential equations increases in complexity, the system admits a greatervariety of solution behaviors with more bifurcations, and the likelihood of randomly finding a setof parameters and initial conditions of a numerical solution with the same qualitative behavioras the optimal data-fitting numerical solution decreases. Also, as the number of parametersincreases in a system of differential equations, simply by dimensional scaling, the likelihood ofrandomly finding a set of parameters and initial conditions of a numerical solution with thesame qualitative behavior as the optimal data-fitting numerical solution decreases. Thus, inSNSD, as n increases in the Bonny × n model, the likelihood of finding the parameters andinitial conditions of the optimal data-fitting numerical solution decreases.In overlapping-niche descent, state values, beyond those of initial conditions, directly guideoptimization. Thus, even if random parameters and initial conditions are not those of a numerical373.6. Comparing . . . to a Numerical-Integration-Based Methodsolution with the same qualitative behavior as the optimal data-fitting numerical solution, statevalues orient optimization towards the parameters and state values of a numerical solutionwith the same qualitative behavior as data, positioning the optimization routine to potentiallyfind the parameters and state values of the optimal data-fitting numerical solution. This, Ibelieve, is why overlapping-niche descent is more robust than SNSD for the Bonny × n modelwith n = 5. Accordingly, I expect overlapping-niche descent to be more robust than othernumerical-integration-based methods for the Bonny × n model with n = 5, and for complexmodels in general.overlapping-niche descentn∆ = nt n∆ = 2nt n∆ = 3nt n∆ = 4ntBonny × 1 1.46 · 10−4 2.87 · 10−4 4.54 · 10−4 5.68 · 10−4Bonny × 2 4.18 · 10−4 8.23 · 10−4 1.26 · 10−3 1.63 · 10−4Bonny × 3 9.41 · 10−4 1.84 · 10−3 2.80 · 10−3 3.73 · 10−3Bonny × 4 1.65 · 10−3 3.30 · 10−3 5.03 · 10−3 6.58 · 10−3Bonny × 5 2.70 · 10−3 5.23 · 10−3 7.99 · 10−3 1.05 · 10−2SNSDn∆ = nt n∆ = 2nt n∆ = 3nt n∆ = 4ntBonny × 1 1.39 · 10−3 1.16 · 10−3 1.80 · 10−3 2.50 · 10−3Bonny × 2 1.04 · 10−2 2.09 · 10−2 2.25 · 10−2 3.03 · 10−2Bonny × 3 3.98 · 10−2 7.85 · 10−2 9.47 · 10−2 1.54 · 10−1Bonny × 4 9.72 · 10−2 2.13 · 10−1 3.50 · 10−1 4.48 · 10−1Bonny × 5 2.38 · 10−1 5.00 · 10−1 9.44 · 10−1 1.26Table 3.7: Mean time per iteration of descent from overlapping-niche descent and SNSD. Notationis as defined in Table 3.5. Times are shown in seconds. 
SNSD requires more time for an iterationof descent than overlapping-niche descent, and the difference in required time increases as thesystem of differential equations increases in size.383.7. Noisy Data and Incomplete Modelingoverlapping-niche descentn∆ = nt n∆ = 2nt n∆ = 3nt n∆ = 4ntBonny × 1 7.38 · 10−1 4.35 · 10−1 4.86 · 10−1 6.26 · 10−1Bonny × 2 1.23 1.34 3.24 3.68Bonny × 3 1.03 · 101 2.90 8.58 7.25Bonny × 4 7.55 8.69 1.32 · 101 1.40 · 101Bonny × 5 5.37 1.34 · 101 2.47 · 101 2.09 · 101SNSDn∆ = nt n∆ = 2nt n∆ = 3nt n∆ = 4ntBonny × 1 2.25 1.72 2.22 4.80Bonny × 2 2.56 · 101 5.34 · 101 6.27 · 101 5.70 · 101Bonny × 3 1.41 · 102 2.45 · 102 3.61 · 102 4.33 · 102Bonny × 4 5.32 · 102 8.87 · 102 1.24 · 103 2.99 · 103Bonny × 5 1.20 · 103 2.50 · 103 3.15 · 103 3.86 · 103Table 3.8: Total descent time from overlapping-niche descent and SNSD. Notation is as definedin Table 3.5. Times are shown in minutes. The total accelerated descent time is the sum ofthe maximal accelerated descent time in each generation, as accelerated descent is calculated inparallel. SNSD requires more time for descent than overlapping-niche descent, and the differencein required time increases as the system of differential equations increases in size.As shown in Tables 3.7 and 3.8 for the Bonny × 1 model, computational times, the mean timeper iteration of descent and the total descent time, of SNSD range from about 3 to about 10 timeslarger than corresponding computational times of overlapping-niche descent. As n increases inthe Bonny × n model, differences in computational times between SNSD and overlapping-nichedescent increase. Notably, for the Bonny × 5 model, computational times of SNSD range fromabout 90 to about 220 times larger than corresponding computational times of overlapping-nichedescent. I explore this result in Section D.1, where I count the computational complexity ofoverlapping niche descent and a variety of numerical-integration-based methods, including SNSD.Under relatively general assumptions, I find that overlapping-niche descent outperforms thenumerical-integration-based methods and the difference in performance increases with increasingsystem size, especially with implicit methods and with partial differential equations.3.7 Noisy Data and Incomplete ModelingReal data is often noisy and models of real data are often incomplete. As such, I exploredifferences in the character of fits for a complete model with noisy data and an incomplete modelwith noiseless data. In doing so, I show how parameters and state values from overlapping-nichedescent as λ→ 0+ can inform model shortcomings and potential model improvements. I alsoprovide an example that shows how a parameter from overlapping-niche descent differs overλ ∈ (0, 1) for a fit of a complete model to noisy data and a fit of an incomplete model to noiselessdata.393.7. Noisy Data and Incomplete ModelingTo explore differences in the character of fits for a complete model with noisy data and anincomplete model with noiseless data, I generate noisy data and an incomplete model. I caneasily add noise to synthetic data to generate noisy data. For an instructive example of anincomplete model, I seek an incomplete variant of one of the forms of the Bonny model thatvisibly, but not overly, alters the fit to synthetic data. Removing a reaction term from thespatially homogeneous Bonny model leads to either no visible difference or a dramatic differencein the fit to the synthetic spatially homogeneous data (not shown). 
Thus, I do not constructan incomplete model by removing a reaction term from one of the forms of the Bonny model.Instead, as the synthetic spatially homogeneous data and the synthetic traveling-wave dataare visibly different but not so dissimilar in shape, I generate an incomplete traveling wavemodel by imposing zero diffusion in the traveling wave Bonny model (3.3), Dd = 0 µm2 s−1,Dde = 0 µm2 s−1, and De = 0 µm2 s−1. For corresponding noisy data, I generate noisytraveling-wave data by adding random errors to the synthetic traveling-wave data. For noisytraveling-wave MinD data, I distribute random errors normally with a mean of zero and astandard deviation that is 0.05 times the range of synthetic traveling-wave MinD data. Fornoisy traveling-wave MinE data, I distribute random errors normally with a mean of zero and astandard deviation that is 0.05 times the range of synthetic traveling-wave MinE data. In doingso, I restrict noisy traveling-wave data to non-negative values for physical relevance.Using overlapping-niche descent, as described in Section 3.4, on a uniform grid with a gridrefinement factor of 1, n∆nt−1 = 1 for n∆ the number or grid points and nt the number of datapoints, I fit the traveling wave Bonny model to the noisy traveling-wave data, and I fit theincomplete traveling wave model to the synthetic traveling-wave data. Respectively, I find thatry(p˜λ101 , x˜λ101) = 7.30 · 10−3 and ry(p˜λ101 , x˜λ101) = 3.13 · 10−3, r∆x(p˜λ101 , x˜λ101) = 3.40 · 10−15and r∆x(p˜λ101 , x˜λ101) = 5.84 · 10−15, the mean times per iteration of accelerated descent are1.28 · 10−4 s and 1.26 · 10−4 s, and the total accelerated descent times are 3.27 · 10−1 minutesand 2.70 · 10−1 minutes. Observable-state values of x˜λ101 are shown in Figure 3.5.403.7. Noisy Data and Incomplete Modeling(a) (b)(c) (d)Figure 3.5: Observable-state errors for noisy-data and incomplete-model fits. For the fit of thetraveling wave Bonny model to the noisy traveling-wave data, the observable-state values ofx˜λ101 and data values are shown in (a) and differences in the observable-state values of x˜λ101and data values are shown in (c). For the the fit of the incomplete traveling wave model to thesynthetic traveling-wave data, the observable-state values of x˜λ101 and data values are shown in(b) and differences in the observable-state values of x˜λ101 and data values are shown in (d). In(a) and (b), observable-state values are shown with solid lines and data values are shown withpoints. Observable-state errors appear to be uncorrelated in z for the noisy-data fit and highlycorrelated in z for the incomplete-model fit.As is visible in Figure 3.5, for the traveling wave Bonny model fit to the noisy traveling-wavedata, differences in the observable-state values of x˜λ101 and data values appear to be mostlyuncorrelated in z (with a Durbin-Watson statistic in MinD of 1.39 and a Durbin-Watson statisticin MinE of 2.12, where a Durbin-Watson statistic of 2 indicates no autocorrelation in residualsand a Durbin-Watson statistic closer to 0 indicates a greater positive autocorrelation in residuals),reflecting random error; whereas, for the incomplete traveling wave model fit to the synthetic413.7. Noisy Data and Incomplete Modelingtraveling-wave data, differences in the observable-state values of x˜λ101 and data values appear tobe highly correlated in z (with a Durbin-Watson statistic in MinD of 0.52 and a Durbin-Watsonstatistic in MinE of 0.38), reflecting modeling error. 
Differences between the observable-state values of $\tilde{x}_{\lambda_{101}}$ and data values may indicate the existence of modeling error, but they provide little insight into potential sources of the modeling error. Alternatively, I consider numerical discretizations involving $\tilde{p}_{\lambda_1}$ and $\tilde{x}_{\lambda_1}$, the parameters and state values closest to a numerical solution given that observable-state values (very nearly) match data. The differences between $\Delta_1 x_{i,k}$ and $F_{i,k}(t, p, x)$, as defined in equation (3.5), for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ are the minimal differences between $\Delta_1 x_{i,k}$ and $F_{i,k}(t, p, x)$ imposed by data, as measured by $r_{\Delta x}(p, x)$, for all $i \in \{1, 2, \ldots, n_x\}$ and all $k \in I_\Delta$. Thus, differences between $\Delta_1 x_{i,k}$ and $F_{i,k}(t, p, x)$ for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ may reveal model shortcomings and point to changes in a model that can be made to bring the model closer to data. I plot the differences between $\Delta_1 x_{i,k}$ and $F_{i,k}(t, p, x)$ for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ in Figure 3.6, for the fit of the traveling wave Bonny model to the noisy traveling-wave data and for the fit of the incomplete traveling-wave model to the synthetic traveling-wave data, for the state $x_i = c_{de}$.

Figure 3.6: Numerical solution errors for noisy-data and incomplete-model fits. Values are shown for the state $x_i = c_{de}$. For the fit of the traveling wave Bonny model to the noisy traveling-wave data, $\Delta_1 x_{i,k}$ and $F_{i,k}(t, p, x)$ for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ are shown in (a), and $\Delta_1 x_{i,k} - F_{i,k}(t, p, x)$ for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ is shown in (c). For the fit of the incomplete traveling-wave model to the synthetic traveling-wave data, $\Delta_1 x_{i,k}$ and $F_{i,k}(t, p, x)$ for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ are shown in (b), and $\Delta_1 x_{i,k} - F_{i,k}(t, p, x)$ for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ is shown in (d). In (a) and (b), $\Delta_1 x_{i,k}$ is shown with dashed lines and $F_{i,k}(t, p, x)$ is shown with solid lines. Numerical solution errors appear to be uncorrelated in $z$ for the noisy-data fit and highly correlated in $z$ for the incomplete-model fit.

Paralleling inferences drawn from Figure 3.5, as is visible in Figure 3.6, for the fit of the traveling wave Bonny model to the noisy traveling-wave data, the differences between $\Delta_1 x_{i,k}$ and $F_{i,k}(t, p, x)$ for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ appear to be mostly uncorrelated in $z$ (with a Durbin-Watson statistic of 1.43), reflecting random error; whereas, for the fit of the incomplete traveling-wave model to the synthetic traveling-wave data, the differences between $\Delta_1 x_{i,k}$ and $F_{i,k}(t, p, x)$ for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ appear to be highly correlated in $z$ (with a Durbin-Watson statistic of 0.50), reflecting modeling error. Furthermore, for the fit of the incomplete traveling-wave model to the synthetic traveling-wave data, relatively large differences between $\Delta_1 x_{i,k}$ and $F_{i,k}(t, p, x)$ for $p = \tilde{p}_{\lambda_1}$ and $x = \tilde{x}_{\lambda_1}$ occur near values of $z$ where $\Delta_1 x_{i,k}$ changes rapidly, indicating that the modeling error could be Laplacian-dependent.

I also provide an example that shows how a parameter from overlapping-niche descent differs over $\lambda \in (0, 1)$ between a fit of a complete model to noisy data and a fit of an incomplete model to noiseless data. For the example, I plot values of $\omega_{de,m}$ in $\tilde{p}_\lambda$ for $\lambda \in \{\lambda_1, \lambda_2, \ldots, \lambda_{101}\}$ from the fit of the traveling wave Bonny model to the noisy traveling-wave data and from the fit of the incomplete traveling-wave model to the synthetic traveling-wave data in Figure 3.7.

Figure 3.7: Parameter variation over $\lambda \in (0, 1)$ for noisy-data and incomplete-model fits. Values of $\omega_{de,m}$ in $\tilde{p}_\lambda$ for $\lambda \in \{\lambda_1, \lambda_2, \ldots, \lambda_{101}\}$ from the fit of the traveling wave Bonny model to the noisy traveling-wave data are shown in (a). Values of $\omega_{de,m}$ in $\tilde{p}_\lambda$ for $\lambda \in \{\lambda_1, \lambda_2, \ldots, \lambda_{101}\}$ from the fit of the incomplete traveling-wave model to the synthetic traveling-wave data are shown in (b). I plot against $\lambda(1-\lambda)^{-1}$ on a log scale to distinguish values of $\omega_{de,m}$ in $\tilde{p}_\lambda$ near $\lambda = 0$ and near $\lambda = 1$.
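For concreteness, the $\Delta_1 x_{i,k} - F_{i,k}$ diagnostic of Figure 3.6 can be assembled as below. This is a minimal sketch that assumes $\Delta_1$ denotes a first-order forward difference on a uniform grid and writes the model right-hand side as a generic function f; equation (3.5) gives the precise discretization:

```python
import numpy as np

def discretization_residual(x, t, f, p):
    """Difference between the forward-difference derivative estimate of
    the states and the model right-hand side at the left grid points.

    x: states, shape (n_states, n_grid); t: grid, shape (n_grid,);
    f(t, x, p): right-hand side, returning an array shaped like x.
    """
    delta1_x = np.diff(x, axis=1) / np.diff(t)  # Delta_1 x_{i,k}
    F = f(t, x, p)[:, :-1]                      # F_{i,k}
    return delta1_x - F

# Example: dx/dt = -p x with exact solution values as the "fitted" states,
# so the residual is pure discretization error and shrinks on refinement.
p = 0.7
t = np.linspace(0.0, 5.0, 101)
x = np.exp(-p * t)[None, :]
res = discretization_residual(x, t, lambda t, x, p: -p * x, p)
print(float(np.abs(res).max()))
```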
3.8 Overlapping-Niche Descent in Practice

In Sections 3.5, 3.6, and 3.7, I have simply shown overlapping-niche descent results. Here, I explicate overlapping-niche descent in practice. I continue my discussion of details pertaining to the implementation of overlapping-niche descent in practice in Section E.3. My discussion follows overlapping-niche descent in the fitting of the full Bonny model (3.1) to the synthetic traveling-wave-emergence data on a uniform grid with a grid refinement factor of 1, $n_{\Delta t} n_{\Delta s} n_t^{-1} n_s^{-1} = 1$ for $n_{\Delta t}$ and $n_{\Delta s}$ the numbers of temporal and spatial grid points and $n_t$ and $n_s$ the numbers of temporal and spatial data points.

3.8.1 Convergence

During overlapping-niche descent, I minimize $r(p, x; \lambda)$ over an array of $\lambda$ values in $(0, 1)$ to find $\tilde{p}_\lambda$ and $\tilde{x}_\lambda$, which allows me to define the function $\tilde{r}(\lambda) = (1-\lambda)\tilde{r}_y(\lambda) + \lambda\tilde{r}_{\Delta x}(\lambda) = (1-\lambda)r_y(\tilde{p}_\lambda, \tilde{x}_\lambda) + \lambda r_{\Delta x}(\tilde{p}_\lambda, \tilde{x}_\lambda)$, as defined in equation (2.23). I note that $r_{\hat{y}}(p, x) = 0$ in $r(p, x; \lambda)$ for a grid refinement factor of 1, so the $\tilde{r}_{\hat{y}}(\lambda)$ term in $\tilde{r}(\lambda)$ is omitted here. To illustrate convergence in $\tilde{r}(\lambda)$ over generations of overlapping-niche descent, I plot the relative change in $\tilde{r}(\lambda)$ over each generation of overlapping-niche descent in Figure 3.8.

Figure 3.8: Convergence of $\tilde{r}(\lambda)$ during overlapping-niche descent. The relative change in $\tilde{r}(\lambda)$ over each generation of overlapping-niche descent is shown for niches defined by $\lambda \in \{\lambda_1, \lambda_2, \ldots, \lambda_{101}\}$. Values correspond to $\Delta r_{g,i,1}$, as defined in equation (C.1) of Section C.1, for niche index $i \in \{1, 2, \ldots, 101\}$ and generation $g > 2$. Generally, $\tilde{r}(\lambda)$ converges quickly for smaller $\lambda$ and more slowly for larger $\lambda$.

As is visible in Figure 3.8, $\tilde{r}(\lambda)$ generally converges sequentially in $\lambda$, in order of increasing $\lambda$. Interestingly, as shown in Section E.3, although $\tilde{r}(\lambda)$ converges more readily for smaller $\lambda$, selection in overlapping-niche descent from a niche with a larger value of $\lambda$ contributes to convergence in $\tilde{r}(\lambda)$ at least as much as selection from a niche with a smaller value of $\lambda$. Convergence in $\tilde{r}(\lambda)$ is coupled to convergence in $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$. To illustrate convergence in $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$, I plot the evolution of $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$ over generations of overlapping-niche descent in Figure 3.9.

Figure 3.9: The evolution of $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$ over generations of overlapping-niche descent, for $\lambda \in \{\lambda_1, \lambda_2, \ldots, \lambda_{101}\}$. Values are shown in (a), (b), (c), and (d) for generations 1, 2, 5, and 17 of overlapping-niche descent. I plot against $\lambda(1-\lambda)^{-1}$ on a log scale to distinguish values of $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$ near $\lambda = 0$ and near $\lambda = 1$.

3.8.2 Consistency with the Conservation Principle and Integral Representations

For $\lambda \in (0, 1)$, the parameters and state values that minimize $r(p, x; \lambda)$, $\breve{p}_\lambda$ and $\breve{x}_\lambda$, allow me to define the function $\breve{r}(\lambda) = (1-\lambda)\breve{r}_y(\lambda) + \lambda\breve{r}_{\Delta x}(\lambda) = (1-\lambda)r_y(\breve{p}_\lambda, \breve{x}_\lambda) + \lambda r_{\Delta x}(\breve{p}_\lambda, \breve{x}_\lambda)$, as defined in equation (2.18). I note that $r_{\hat{y}}(p, x) = 0$ in $r(p, x; \lambda)$ for a grid refinement factor of 1, so the $\breve{r}_{\hat{y}}(\lambda)$ term in $\breve{r}(\lambda)$ is omitted here. $\breve{r}(\lambda)$, $\breve{r}_y(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ satisfy a conservation principle of the form
\[
2\int_0^1 \breve{r}(\lambda)\,d\lambda = \int_0^1 \breve{r}_y(\lambda)\,d\lambda = \int_0^1 \breve{r}_{\Delta x}(\lambda)\,d\lambda, \tag{3.14}
\]
as stipulated in equation (2.22). As such, I calculate and compare values of $2\int_0^1 \tilde{r}(\lambda)\,d\lambda$, $\int_0^1 \tilde{r}_y(\lambda)\,d\lambda$, and $\int_0^1 \tilde{r}_{\Delta x}(\lambda)\,d\lambda$ to ensure that $\tilde{r}(\lambda)$, $\tilde{r}_y(\lambda)$, and $\tilde{r}_{\Delta x}(\lambda)$ are consistent with the conservation principle. In doing so, I numerically calculate integral values by integrating piecewise cubic spline interpolants of integrand values with not-a-knot end conditions.
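A minimal sketch of this integral check, using SciPy's CubicSpline (whose default boundary condition is not-a-knot) with placeholder curves standing in for $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Lambda grid and placeholder residual curves; in practice these are the
# minimized residuals recovered by overlapping-niche descent.
lam = np.linspace(0.005, 0.995, 101)
r_y = lam ** 2 / (1.0 + lam)            # grows with lambda
r_dx = (1.0 - lam) ** 2 / (2.0 - lam)   # shrinks with lambda
r = (1.0 - lam) * r_y + lam * r_dx

def integral(values: np.ndarray) -> float:
    # Piecewise cubic spline interpolant with not-a-knot end conditions
    # (SciPy's default bc_type), integrated over the lambda grid.
    spline = CubicSpline(lam, values, bc_type="not-a-knot")
    return float(spline.integrate(lam[0], lam[-1]))

# Conservation principle (3.14): for residuals from exact minimizers these
# three values agree; their disagreement gauges optimization error.
print(2 * integral(r), integral(r_y), integral(r_dx))
```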
I plot values of $2\int_0^1 \tilde{r}(\lambda)\,d\lambda$, $\int_0^1 \tilde{r}_y(\lambda)\,d\lambda$, and $\int_0^1 \tilde{r}_{\Delta x}(\lambda)\,d\lambda$ for each generation of overlapping-niche descent in Figure 3.10.

Figure 3.10: Consistency in conservation of $\tilde{r}(\lambda)$, $\tilde{r}_y(\lambda)$, and $\tilde{r}_{\Delta x}(\lambda)$. Values of $2\int_0^1 \tilde{r}(\lambda)\,d\lambda$, $\int_0^1 \tilde{r}_y(\lambda)\,d\lambda$, and $\int_0^1 \tilde{r}_{\Delta x}(\lambda)\,d\lambda$ are shown for generations 1-17 in (a) and for generations 3-17 (for a more focused view) in (b). Dashed lines are shown to delineate values. $2\int_0^1 \tilde{r}(\lambda)\,d\lambda$, $\int_0^1 \tilde{r}_y(\lambda)\,d\lambda$, and $\int_0^1 \tilde{r}_{\Delta x}(\lambda)\,d\lambda$ converge to similar values over generations.

As is visible in Figure 3.10, $2\int_0^1 \tilde{r}(\lambda)\,d\lambda$, $\int_0^1 \tilde{r}_y(\lambda)\,d\lambda$, and $\int_0^1 \tilde{r}_{\Delta x}(\lambda)\,d\lambda$ converge to similar values over generations of overlapping-niche descent, indicating that, ultimately, $\tilde{r}(\lambda)$, $\tilde{r}_y(\lambda)$, and $\tilde{r}_{\Delta x}(\lambda)$ are fairly consistent with the conservation principle.

$\breve{r}_y(\lambda)$ and $\breve{r}_{\Delta x}(\lambda)$ admit integral representations of limit values:
\[
\lim_{\lambda \to 0^+} \breve{r}_{\Delta x}(\lambda) = \int_0^1 \lambda^{-2}\breve{r}_y(\lambda)\,d\lambda, \tag{3.15}
\]
\[
\lim_{\lambda \to 1^-} \breve{r}_y(\lambda) = \int_0^1 (1-\lambda)^{-2}\breve{r}_{\Delta x}(\lambda)\,d\lambda, \tag{3.16}
\]
as stipulated in equation (2.25). As such, I calculate and compare values of $\tilde{r}_{\Delta x}(\lambda_1)$ with $\int_0^1 \lambda^{-2}\tilde{r}_y(\lambda)\,d\lambda$ and values of $\tilde{r}_y(\lambda_{101})$ with $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ to ensure that $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$ are consistent with the integral representations of limit values. In doing so, I numerically calculate integral values by integrating piecewise cubic spline interpolants of integrand values with not-a-knot end conditions. I find that $\tilde{r}_{\Delta x}(\lambda_1) \approx 2.70 \cdot 10^{-4}$ and $\int_0^1 \lambda^{-2}\tilde{r}_y(\lambda)\,d\lambda \approx 2.72 \cdot 10^{-4}$ for all generations of overlapping-niche descent. I plot values of $\tilde{r}_y(\lambda_{101})$ and $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ for each generation of overlapping-niche descent in Figure 3.11.

Figure 3.11: Consistency in the integral representations of $\lim_{\lambda \to 1^-} \tilde{r}_y(\lambda)$. Values of $\tilde{r}_y(\lambda_{101})$ and $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ are shown for generations 1-17 in (a) and for generations 4-17 (for a more focused view) in (b). Dashed lines are shown to delineate values. $\tilde{r}_y(\lambda_{101})$ and $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ converge to similar values over generations.

As is visible in Figure 3.11, $\tilde{r}_y(\lambda_{101})$ and $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ converge to similar values over generations of overlapping-niche descent. I conclude that $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$ are fairly consistent with the integral representations of limit values.

I explore how $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$ contribute to the integral representations of limit values for different values of $\lambda$. Given the weighting $\lambda^{-2}$, it would appear that $\int_0^1 \lambda^{-2}\tilde{r}_y(\lambda)\,d\lambda$ depends on $\tilde{r}_y(\lambda)$ most heavily for $\lambda$ near 0. However, $\tilde{r}_y(\lambda)$ is smallest for $\lambda$ near 0. Similarly, given the weighting $(1-\lambda)^{-2}$, it would appear that $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ depends on $\tilde{r}_{\Delta x}(\lambda)$ most heavily for $\lambda$ near 1. However, $\tilde{r}_{\Delta x}(\lambda)$ is smallest for $\lambda$ near 1. Thus, the extent to which the integral representations of limit values depend on $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$ for different values of $\lambda$ is unclear. For clarification, I plot the cumulative integrals of $\int_0^1 \lambda^{-2}\tilde{r}_y(\lambda)\,d\lambda$ and $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ in Figure 3.12.

Figure 3.12: Cumulative integral representations of limit values. The cumulative integral of $\int_0^1 \lambda^{-2}\tilde{r}_y(\lambda)\,d\lambda$ is shown in (a). The cumulative integral of $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ is shown in (b). $\int_0^1 \lambda^{-2}\tilde{r}_y(\lambda)\,d\lambda$ and $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ are rooted most heavily in small to intermediate $\lambda$.
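Cumulative integrals like those in Figure 3.12 can be read off the antiderivative of the same spline interpolant; a small sketch with a placeholder integrand:

```python
import numpy as np
from scipy.interpolate import CubicSpline

lam = np.linspace(0.005, 0.995, 101)
r_dx = (1.0 - lam) ** 2 / (2.0 - lam)  # placeholder for r_dx(lambda)

# Spline the weighted integrand and take its antiderivative; evaluated
# along the grid, it gives the cumulative integral of
# (1 - lambda)^(-2) r_dx(lambda), as plotted in Figure 3.12(b).
anti = CubicSpline(lam, r_dx / (1.0 - lam) ** 2).antiderivative()
cumulative = anti(lam) - anti(lam[0])

print(float(cumulative[-1]))  # total; cumulative[k] traces its buildup
```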
As is visible in Figure 3.12, $\int_0^1 \lambda^{-2}\tilde{r}_y(\lambda)\,d\lambda$ depends on $\tilde{r}_y(\lambda)$ most heavily for small $\lambda$. Also, $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ depends on $\tilde{r}_{\Delta x}(\lambda)$ most heavily for small to intermediate $\lambda$ and for $\lambda$ near 1. Interestingly, despite the weighting of $(1-\lambda)^{-2}$, $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ depends on $\tilde{r}_{\Delta x}(\lambda)$ more heavily for small to intermediate $\lambda$ than for $\lambda$ near 1. Thus, $\int_0^1 \lambda^{-2}\tilde{r}_y(\lambda)\,d\lambda$ and $\int_0^1 (1-\lambda)^{-2}\tilde{r}_{\Delta x}(\lambda)\,d\lambda$ are rooted in $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$ for small to intermediate $\lambda$, which, as $\tilde{r}(\lambda)$ converges most readily for small to intermediate $\lambda$, implies that $\tilde{r}_y(\lambda)$ and $\tilde{r}_{\Delta x}(\lambda)$ for small to intermediate $\lambda$ provide a robust basis for the integral representations of limit values.

3.9 Discussion

In this chapter, I tested my method on synthetic data and a system of first-order ordinary differential equations, a system of second-order ordinary differential equations, and a system of partial differential equations. I found that my method accurately identified the optimal data-fitting numerical solution and its parameters in all three contexts. I compared the performance of my method to that of an analogous numerical-integration-based method and found that my method identified the optimal data-fitting numerical solution more robustly than the analogous numerical-integration-based method, while requiring significantly less time to do so. I also explored an example where my method informed modeling insufficiencies and potential model improvements for an incomplete variant of a model. Finally, I showed that my optimization routine converged to values that were consistent with my derived conservation principles and integral representations.

Chapter 4

Fitting Models of the Min System to Time-Course Data

4.1 Introduction

The Escherichia coli Min system is one of the simplest known biological systems that demonstrates diverse complex dynamic behavior or transduces local interactions into a global signal. As such, the Min system is currently one of the most reduced model systems for understanding such behaviors. Various mathematical models of the Min system show behaviors that are qualitatively similar to dynamic behaviors of the Min system that have been observed in experiments, but no model has been quantitatively compared to time-course data. In this chapter, I briefly summarize extracting time-course data for model fitting from experimental measurements of the Min system, and I fit established and novel biochemistry-based models to the time-course data using my method, which I developed in Chapter 2 and tested in Chapter 3. Comparing models to time-course data allows me to make precise distinctions between biochemical assumptions in the various models. My modeling and fitting support a novel model that accounts for MinE's previously unmodeled dual role as a stabilizer and an inhibitor of MinD membrane binding. It suggests that a regular, ordered, stability-switching mechanism underlies the emergent, dynamic behavior of the Min system.

4.2 Choosing and Processing Data to Simplify Fitting

For data fitting, I use in vitro data from the experiments of Ivanov and Mizuuchi and focus on regions that are as close to spatially homogeneous as possible.
In vitro data poses fewer challenges for time-course comparison than in vivo data: in vitro geometry is a simple two-dimensional plane, whereas in vivo geometry is a relatively complex three-dimensional rod shape; in vitro measurements map a two-dimensional process to a two-dimensional image, whereas in vivo measurements map a three-dimensional process to a two-dimensional image; in vitro data is less coarse than in vivo data, as the spatial scale of pattern formation is larger in vitro than in vivo; and in vitro behavior is less susceptible to stochastic effects than in vivo behavior, because in vitro experiments employ significantly higher concentrations of proteins than in vivo experiments.

In the Ivanov and Mizuuchi in vitro experiments, buffer was rapidly flowed atop a supported lipid bilayer to induce spatially uniform concentrations of reaction components in the buffer. On the supported lipid bilayer, densities of MinD and MinE, which were measured using total internal reflection fluorescence (TIRF) microscopy, oscillated near-homogeneously in space before forming into traveling waves [38]. Details of the Ivanov and Mizuuchi experiments are described throughout Appendix F. Deterministic models of the Min system are generally systems of partial differential equations that describe how protein concentrations change in space and time. In Section F.4.1, I show that the global behavior of a near-homogeneous process is described to leading order by a system of ordinary differential equations, the spatially homogeneous reduction of the system's partial differential equation description. For the Min system, the spatially homogeneous reduction is a description of how local reactions change Min protein concentrations in time. As such, fitting an ordinary differential equation model of the Min system to near-homogeneous time-course data provides a direct comparison of the model's reaction-based outcomes and experimental observations. Kindly, Ivanov and Mizuuchi have shared their data with me.

4.2.1 Extracting Spatially Near-Homogeneous Data

The Ivanov and Mizuuchi data requires some preprocessing before near-homogeneous data can be extracted from it. Fluorescence intensities of fluorescently labeled MinD and MinE are not spatially aligned in the Ivanov and Mizuuchi data. I align the Ivanov and Mizuuchi data using the cross-correlation of similarly shaped structures in MinD and MinE fluorescence intensity profiles. Details of data alignment are described in Section F.2. After aligning the Ivanov and Mizuuchi data, I flatten the aligned data to correct for variability in MinD and MinE fluorescence intensities from Gaussian illumination in microscopy. Details of data flattening are described in Section F.3.2. The conversion from MinE fluorescence intensity to MinE density was not directly measured in the Ivanov and Mizuuchi experiments. From bulk concentrations of MinD and MinE and from properties of evanescent waves in TIRF microscopy, I calculate conversions from flattened MinD and MinE fluorescence intensities to MinD and MinE densities. Details of calculating conversions from fluorescence intensities to densities are described in Section F.3.3.
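As an illustration of the alignment step, the sketch below estimates an integer-pixel shift between two images from the peak of their FFT-based cross-correlation. This is my generic illustration, not the thesis implementation, which Section F.2 details:

```python
import numpy as np

def estimate_shift(ref: np.ndarray, img: np.ndarray) -> tuple[int, int]:
    """Return the (row, col) shift that, applied to img with np.roll,
    best aligns img to ref, via the cross-correlation peak."""
    # Mean-subtract so bright backgrounds do not dominate the peak.
    f = np.fft.fft2(ref - ref.mean())
    g = np.fft.fft2(img - img.mean())
    xcorr = np.fft.ifft2(f * np.conj(g)).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Wrap circular indices into the range [-n/2, n/2).
    shifts = []
    for p, s in zip(peak, xcorr.shape):
        p = int(p)
        shifts.append(p - s if p >= s // 2 else p)
    return shifts[0], shifts[1]

# Example: the same structure imaged in two channels, offset by (3, -5).
rng = np.random.default_rng(2)
base = rng.random((128, 128))
shifted = np.roll(base, shift=(3, -5), axis=(0, 1))
print(estimate_shift(base, shifted))  # (-3, 5): rolling by this re-aligns
```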
To extract near-homogeneous data, I first find the MinD and MinE density data that is least inhomogeneous within a disk of 1000 pixels for all times within a spatially near-homogeneous oscillation in the Ivanov and Mizuuchi data. Then, I calculate mean values of MinD and MinE densities within the disk at each time. Details of extracting near-homogeneous data are described in Section F.4.2. The near-homogeneous MinD and MinE density data is shown in Figure 4.1.

Figure 4.1: Near-homogeneous MinD and MinE density data. Data is extracted from measurements made by Ivanov and Mizuuchi [38], in which densities of MinD and MinE oscillate near-homogeneously in space on a supported lipid bilayer.

If I had included error bars in Figure 4.1 representing standard errors of the mean, they would not be visually distinguishable from the data. Details of errors in the near-homogeneous MinD and MinE density data are described in Section F.4.3. Interestingly, as detailed in Section F.4.3, I find that errors in the near-homogeneous MinD and MinE density data are related to values of the near-homogeneous MinD and MinE density data by power laws.

4.3 Fitting Models to the Near-Homogeneous Data

Local reactions somehow coordinate membrane binding and unbinding of the Min proteins in a way that collectively generates the emergent, dynamic, global behavior of the Min system. Fitting a model, in the form of an ordinary differential equation, to the near-homogeneous time-course data provides me with a direct measure of how well the model's reaction-based outcomes describe the near-homogeneous data. As such, fitting models with a variety of proposed Min-system reaction mechanisms to the near-homogeneous data allows me to make precise distinctions between biochemical assumptions in the various models, helping to unravel the specifics of the local Min-system reaction mechanism.

I fit models to the near-homogeneous data using my homotopy-minimization method, which I developed in Chapter 2 and tested in Chapter 3, to find optimal data-fitting numerical solutions for the models. Some quantities related to model parameter values have been measured in experiments. As such, during fitting, I restrict values of parameters using the experimentally measured values, to confine some parameters to biologically realistic values. Details of fitting are described in Section 4.4, and details of parameter restrictions based on experimental measurements are described in Section 4.4.3.

4.3.1 Modeling and Fitting Brief

Among previously published models, the Bonny model [4] has demonstrated the most diverse array of dynamic behaviors that are qualitatively similar to experimental observations of the Min system in vivo and in vitro. To begin my investigation, in Section 4.3.2, I modify the Bonny model to account for the details of the experimental protocol used by Ivanov and Mizuuchi and fit the Modified Bonny Model to the near-homogeneous data. Then, I extend the Bonny model to include new reactions based on experiments, data, and postulate, and fit the Extended Bonny Model to the near-homogeneous data. MinE has generally been thought to act as an inhibitor of MinD membrane binding, and the Modified Bonny Model and the Extended Bonny Model treat MinE as such. Recently, however, it has been shown that MinE can act to both stabilize and inhibit MinD membrane binding [73]. I build on the Extended Bonny Model, in Section 4.3.3, to develop two models that could account for MinE's dual role as a stabilizer and an inhibitor of MinD membrane binding, the Symmetric Activation Model and the Asymmetric Activation Model, and fit them to the near-homogeneous data. Ultimately, I find that my Asymmetric Activation Model fits the near-homogeneous data best, suggesting, as described in Section 4.3.4, that a regular, ordered, stability-switching mechanism underlies the emergent, dynamic behavior of the Min system.
4.3.2 Models in Which MinE Acts Only as an Inhibitor

Biochemical analysis has characterized how MinD and MinE interact with the membrane and each other. In the cytosol, MinD monomers bind to ATP and form dimers ([34], [81]) that sandwich two ATP molecules [76]. MinD dimers bind to the phospholipid membrane ([28], [34], [32], [42], [81], [45]) and cooperatively recruit other MinD dimers to the membrane [42]. MinE dimers bind to MinD dimers on the membrane ([28], [44]), undergoing a conformational change that allows MinE dimers to bind to the membrane [52]. MinE dimers stimulate ATPase activity in bound MinD dimers, causing MinD dimers to separate and dissociate from the membrane ([31], [28], [34], [42]), leaving MinE dimers temporarily bound to the membrane ([27], [66], [73]). Most mathematical models of the Min system, including the Bonny model, are based on a subset of the aforementioned set of reactions.

The Modified Bonny Model

The Bonny model [4] demonstrates an array of dynamic behavior that is qualitatively similar to many experimental observations of the Min system, including stochastic pole-to-pole switching in short cells, regular pole-to-pole oscillations in mid-sized cells, oscillation splitting in growing cells, regular pole-to-midcell oscillations in long cells, end-to-end oscillations in thick cells, and spiral waves on a supported lipid bilayer. The Bonny model consists of three membrane-bound states, $c_d$, $c_{de}$, and $c_e$, corresponding to the membrane-bound concentrations of MinD dimers, MinE dimers bound to MinD dimers, and MinE dimers, respectively. Normally, the Bonny model includes two cytosolic states, $c_D$ and $c_E$, corresponding to the concentrations of MinD and MinE dimers in the cytosol, respectively. Because of the spatially uniform concentrations of reaction components in the buffer of the Ivanov and Mizuuchi experiments, I modify the Bonny model such that $c_D$ and $c_E$ are constant. The Bonny model is a system of partial differential equations. In the case of spatial homogeneity, the Bonny model reduces to a system of ordinary differential equations. As such, for correspondence with the near-homogeneous data, I reduce the Bonny model to a system of ordinary differential equations under spatially homogeneous conditions:
\[
\frac{dc_d}{dt} = (\omega_{D\to d} + \omega^{d}_{D\to d}c_d)(c_{\max} - c_d - c_{de})/c_{\max} - \omega_{E,d\to de}c_d - \omega_{d,e\to de}c_d c_e, \tag{4.1a}
\]
\[
\frac{dc_{de}}{dt} = \omega_{E,d\to de}c_d + \omega_{d,e\to de}c_d c_e - \omega_{de\to D,E}c_{de} - \omega_{de\to D,e}c_{de}, \tag{4.1b}
\]
\[
\frac{dc_e}{dt} = -\omega_{d,e\to de}c_d c_e + \omega_{de\to D,e}c_{de} - \omega_{e\to E}c_e, \tag{4.1c}
\]
where $c_{\max}$ is the saturation concentration of MinD dimers on the membrane and $\omega_{u,v\to x,y}$ denotes the reaction rate of $c_u$ and $c_v$ converting into $c_x$ and $c_y$, for $u, v, x, y \in \{\emptyset, D, E, d, de, e\}$. When $\omega_{u,v\to x,y}$ has a superscript, the superscript indicates facilitation of the reaction by the superscripted species. I note that $\omega_{D\to d}$ and $\omega^{d}_{D\to d}$ have a multiplicative factor of $c_D$ built into them and $\omega_{E,d\to de}$ has a multiplicative factor of $c_E$ built into it. Also, I note that I have changed parameter notation in the Modified Bonny Model for consistency with upcoming models. In terms of the original parameter notation, as used in Chapter 3, $\omega_{D\to d} = \omega_D \cdot c_D$, $\omega^{d}_{D\to d} = \omega_{dD} \cdot c_D$, $\omega_{E,d\to de} = \omega_E \cdot c_E$, $\omega_{de\to D,E} = \omega_{de,c}$, $\omega_{de\to D,e} = \omega_{de,m}$, $\omega_{e\to E} = \omega_e$, and $\omega_{d,e\to de} = \omega_{ed}$. The Modified Bonny Model (4.1) is depicted in Figure 4.2.

Figure 4.2: The Modified Bonny Model. Parameters characterizing reactions are shown in (a). State variables are matched to protein states in (b). Reactions are depicted in (c). In (c), reactants are shown on the left and products are shown on the right of panels. (+) indicates facilitation of a reaction by the indicated species.
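To make the structure of (4.1) concrete, the sketch below integrates the Modified Bonny Model with SciPy, using the fitted parameter values from Table 4.1; the initial state and time span are arbitrary illustrations, not fitted quantities:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Fitted rate constants from Table 4.1; c_D and c_E are constant and
# absorbed into the rates, as noted in the text. Units: densities in
# 1/um^2, times in s.
P = dict(cmax=5.48e3, wDd=6.31e-6, wdDd=2.47e-1, wEdde=6.23e-3,
         wdede=1.23, wdeDE=2.43e-3, wdeDe=7.81e-2, weE=4.54e-2)

def modified_bonny(t, c, p):
    cd, cde, ce = c
    free = (p["cmax"] - cd - cde) / p["cmax"]  # unsaturated fraction
    dcd = ((p["wDd"] + p["wdDd"] * cd) * free
           - p["wEdde"] * cd - p["wdede"] * cd * ce)             # (4.1a)
    dcde = (p["wEdde"] * cd + p["wdede"] * cd * ce
            - (p["wdeDE"] + p["wdeDe"]) * cde)                   # (4.1b)
    dce = -p["wdede"] * cd * ce + p["wdeDe"] * cde - p["weE"] * ce  # (4.1c)
    return [dcd, dcde, dce]

sol = solve_ivp(modified_bonny, (0.0, 600.0), [10.0, 0.0, 0.0],
                args=(P,), method="LSODA")
print(sol.y[:, -1])  # membrane densities c_d, c_de, c_e at t = 600 s
```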
I fit the Modified Bonny Model (4.1) to the near-homogeneous data, as described in Section 4.4. The resulting fit, state values, and parameter values are shown in Figure 4.3, Figure 4.4, and Table 4.1, respectively.

Figure 4.3: The fit of the Modified Bonny Model to the near-homogeneous data. Data is shown with points and model values are shown with lines.

As is visible in Figure 4.3, the Modified Bonny Model admits pulses in MinD and MinE that are qualitatively similar in width and height to the MinD and MinE pulses of the near-homogeneous data. However, the Modified Bonny Model is visibly an incomplete description of the dynamical system underlying the near-homogeneous data. I seek experimentally based alterations of the Modified Bonny Model that allow for a better description of the dynamical system underlying the near-homogeneous data.

Figure 4.4: States from the fit of the Modified Bonny Model to the near-homogeneous data.

parameter        value           95% confidence interval          units
C_d              3.18 · 10^2     [3.18 · 10^2, 3.18 · 10^2]       µm^-2
C_e              2.49 · 10^2     [2.26 · 10^2, 2.49 · 10^2]       µm^-2
c_max            5.48 · 10^3     [5.47 · 10^3, 5.65 · 10^3]       µm^-2
ω_{D→d}          6.31 · 10^-6    [8.48 · 10^-3, 4.67 · 10^-1]     µm^-2 s^-1
ω^d_{D→d}        2.47 · 10^-1    [2.20 · 10^-1, 2.61 · 10^-1]     s^-1
ω_{E,d→de}       6.23 · 10^-3    [5.73 · 10^-3, 6.59 · 10^-3]     s^-1
ω_{d,e→de}       1.23 · 10^0     [1.21 · 10^0, 1.25 · 10^0]       µm^2 s^-1
ω_{de→D,E}       2.43 · 10^-3    [1.55 · 10^-3, 3.10 · 10^-3]     s^-1
ω_{de→D,e}       7.81 · 10^-2    [7.20 · 10^-2, 8.16 · 10^-2]     s^-1
ω_{e→E}          4.54 · 10^-2    [4.03 · 10^-2, 4.96 · 10^-2]     s^-1

Table 4.1: Parameters from the fit of the Modified Bonny Model to the near-homogeneous data. $C_d$ and $C_e$ are fitted data-motivated shifts in observable-state values, data preprocessing parameters described in Section F.4.4 that correspond to the constant concentrations of monomers in the bulk and persistently bound monomers on the membrane, for MinD and MinE respectively. Details of calculating confidence intervals are described in Section 4.4.5.

The Extended Bonny Model

The Bonny model assumes that $c_d$, but not $c_{de}$, recruits $c_D$ to the membrane. In MinD and MinE bursts on a supported lipid bilayer in vitro, increasing the MinE concentration increases both the membrane-binding rate of MinD and the peak membrane density of MinD [73]. Thus, the binding of MinE to MinD on the supported lipid bilayer does not seem to suppress MinD's ability to recruit bulk MinD to the supported lipid bilayer. As such, I extend the Modified Bonny Model to allow $c_{de}$ to recruit $c_D$ to the membrane.

The Bonny model assumes that neither $c_{de}$ nor $c_e$ recruits $c_E$ to bind to $c_d$ on the membrane. In fact, I know of no model that incorporates the facilitated recruitment of cytosolic MinE to bind to free MinD on the membrane. Without facilitated recruitment, a constant concentration of cytosolic MinE binds to free MinD on the membrane at a rate proportional to the concentration of free MinD on the membrane.
As shown in Figure 4.1, MinD decreases from a density of ~10,000 µm^-2 to a density of ~7,000 µm^-2 while MinE increases at a roughly constant rate from a density of ~2,000 µm^-2 to a density of ~5,000 µm^-2. Thus, the MinE recruitment rate does not trail off with decreasing MinD (as seen in the Bonny model in Figure 4.3), which suggests that cytosolic MinE may be recruited to bind to free MinD on the membrane with facilitation. Thus, I extend the Modified Bonny Model, allowing $c_{de}$ and $c_e$ to recruit $c_E$ to bind to $c_d$.

Despite MinE's long-established role in inducing hydrolysis by MinD, in vitro experiments show that MinD rapidly dissociates from the supported lipid bilayer in the absence of MinE [38]. Furthermore, the residence time of MinD on the supported lipid bilayer increases from 11 s to at least 40.71 s as the concentration of MinD increases from 0.275 µM to 1.1 µM [44]. Without information on the mechanism of stabilization, I phenomenologically model the rate "constant" of spontaneous $c_d$ dissociation by a reverse Hill function,
\[
\omega_{d\to D}\,\frac{c_s^{n_s}}{c_s^{n_s} + (c_d + c_{de})^{n_s}}, \tag{4.2}
\]
where $\omega_{d\to D}$ is the basal spontaneous dissociation rate of $c_d$, $c_s$ is the half-max stabilization concentration of $c_d + c_{de}$, and $n_s$ is the Hill coefficient. Thus, $c_d$ decreases from spontaneous $c_d$ dissociation with rate
\[
\frac{\omega_{d\to D}\,c_s^{n_s}\,c_d}{c_s^{n_s} + (c_d + c_{de})^{n_s}}. \tag{4.3}
\]
In the absence of MinE in vitro, buffer flowed atop a MinD-saturated supported lipid bilayer reveals that, initially, MinD spontaneously dissociates from the supported lipid bilayer at a roughly constant rate [73]. Thus, from rate (4.3), for relatively large $c_d$,
\[
\frac{\omega_{d\to D}\,c_s^{n_s}\,c_d}{c_s^{n_s} + c_d^{n_s}} \approx k, \tag{4.4}
\]
for some constant $k$, which occurs only if
\[
c_s \ll c_d \quad \text{and} \quad n_s = 1. \tag{4.5}
\]
Therefore, I extend the Modified Bonny Model, allowing the spontaneous dissociation of $c_d$ at the rate given by (4.2) with $n_s = 1$.

I also consider the possibility of reversible reactions in the Extended Bonny Model. In traveling waves of Min proteins on a supported lipid bilayer in vitro, MinE residence times are at least 1.3 times as long as MinD residence times in all portions of the traveling waves [44]. Thus, I do not consider reactions where MinE spontaneously dissociates from MinD on the supported lipid bilayer. I do consider the other reverse reaction, the reaction of $c_{de}$ splitting into $c_d$ and $c_e$. I extend the Modified Bonny Model (4.1) such that
\[
\begin{aligned}
\frac{dc_d}{dt} ={}& (\omega_{D\to d} + \omega^{d}_{D\to d}c_d + \omega^{de}_{D\to d}c_{de})(c_{\max} - c_{\bar{d}} - c_d - c_{de})/c_{\max} \\
&- (\omega_{E,d\to de} + \omega^{de}_{E,d\to de}c_{de} + \omega^{e}_{E,d\to de}c_e)c_d - \omega_{d,e\to de}c_d c_e + \omega_{de\to d,e}c_{de} \\
&- \omega_{d\to D}c_s c_d/(c_s + c_{\bar{d}} + c_d + c_{de}),
\end{aligned} \tag{4.6a}
\]
\[
\begin{aligned}
\frac{dc_{de}}{dt} ={}& (\omega_{E,d\to de} + \omega^{de}_{E,d\to de}c_{de} + \omega^{e}_{E,d\to de}c_e)c_d + \omega_{d,e\to de}c_d c_e \\
&- \omega_{de\to D,E}c_{de} - \omega_{de\to D,e}c_{de} - \omega_{de\to d,e}c_{de},
\end{aligned} \tag{4.6b}
\]
\[
\frac{dc_e}{dt} = -\omega_{d,e\to de}c_d c_e + \omega_{de\to d,e}c_{de} + \omega_{de\to D,e}c_{de} - \omega_{e\to E}c_e, \tag{4.6c}
\]
where $c_{\max}$ is the saturation concentration of MinD dimers on the membrane, $c_{\bar{d}}$ is the constant concentration of persistently bound MinD dimers on the membrane, an experimental artifact discussed in Section F.4.4, and $\omega_{u,v\to x,y}$ denotes the reaction rate of $c_u$ and $c_v$ converting into $c_x$ and $c_y$, for $u, v, x, y \in \{\emptyset, D, E, d, de, e\}$. When $\omega_{u,v\to x,y}$ has a superscript, the superscript indicates facilitation of the reaction by the superscripted species. I note that $\omega^{z}_{D\to d}$ has a multiplicative factor of $c_D$ built into it for $z \in \{\emptyset, d, de\}$ and $\omega^{z}_{E,d\to de}$ has a multiplicative factor of $c_E$ built into it for $z \in \{\emptyset, de, e\}$. I also note that I do not explicitly include $c_{\bar{d}}$ in the Modified Bonny Model (4.1), as it is absorbed by $c_{\max}$, $\omega_{D\to d}$, and $\omega^{d}_{D\to d}$, but I must include it in the Extended Bonny Model because the structure of equation (4.6a) does not allow it to be rescaled away. The Extended Bonny Model (4.6) is depicted in Figure 4.5.

Figure 4.5: The Extended Bonny Model. Parameters characterizing reactions are shown in (a). A parameter that characterizes a reaction that is not included in the Modified Bonny Model is shown with a bullet (•), and the corresponding reaction is depicted in (c). All reactions from the Modified Bonny Model are included in the Extended Bonny Model. State variables are matched to protein states in (b). In (c), reactants are shown on the left and products are shown on the right of panels. (+) indicates facilitation of a reaction by the indicated species.
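As a quick numerical check of the reasoning behind (4.4) and (4.5), the sketch below evaluates the stabilized dissociation rate (4.3) with $n_s = 1$ and illustrative values of $\omega_{d\to D}$ and $c_s$ (my choices, for demonstration only): for $c_d \gg c_s$ the rate flattens toward the constant $\omega_{d\to D} c_s$.

```python
import numpy as np

def dissociation_rate(cd, cde, w_dD=0.3, cs=30.0, ns=1):
    """Rate (4.3): spontaneous c_d dissociation, stabilized by
    membrane-bound MinD (c_d + c_de). Illustrative w_dD (1/s), cs (1/um^2)."""
    return w_dD * cs**ns * cd / (cs**ns + (cd + cde) ** ns)

cd = np.array([1e2, 1e3, 1e4, 1e5])  # densities well above cs
print(dissociation_rate(cd, cde=0.0))
# Approaches w_dD * cs = 9.0 as c_d grows, matching the roughly constant
# initial dissociation observed in buffer-flow experiments [73].
```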
I fit the Extended Bonny Model (4.6) to the near-homogeneous data, as described in Section 4.4. The resulting fit, state values, and parameter values are shown in Figure 4.6, Figure 4.7, and Table 4.2, respectively.

Figure 4.6: The fit of the Extended Bonny Model to the near-homogeneous data. Data is shown with points and model values are shown with lines. The Extended Bonny Model fits the near-homogeneous data appreciably better than the Modified Bonny Model (compare to Figure 4.3).

Comparing Figures 4.3 and 4.6, the Extended Bonny Model describes the near-homogeneous data visibly better than the Modified Bonny Model. Quantitatively, χ², the weighted sum of squared residuals, from the Modified Bonny Model is 3.53 times larger than χ² from the Extended Bonny Model, and the value of AIC, the Akaike information criterion, from the Modified Bonny Model is 419 units larger than the value of AIC from the Extended Bonny Model. The Akaike information criterion is an information-theoretic measure of a model's ability to fit data that accounts for the number of parameters in the model. For a set of competing models, the model with the minimum AIC value is considered the best model.

Figure 4.7: States from the fit of the Extended Bonny Model to the near-homogeneous data.

parameter          value           95% confidence interval          units
C_d                3.18 · 10^2     [2.74 · 10^2, 3.18 · 10^2]       µm^-2
C_e                0.00 · 10^0     [0.00 · 10^0, 2.52 · 10^1]       µm^-2
c_d̄                7.50 · 10^-8    [0.00 · 10^0, 1.18 · 10^1]       µm^-2
c_max              5.38 · 10^3     [5.32 · 10^3, 5.48 · 10^3]       µm^-2
c_s                2.95 · 10^1     [1.03 · 10^1, 1.89 · 10^2]       µm^-2
ω_{D→d}            3.26 · 10^-2    [0.00 · 10^0, 1.26 · 10^-1]      µm^-2 s^-1
ω^d_{D→d}          4.88 · 10^-1    [4.45 · 10^-1, 5.35 · 10^-1]     s^-1
ω^de_{D→d}         6.48 · 10^-2    [5.65 · 10^-2, 7.69 · 10^-2]     s^-1
ω_{E,d→de}         9.08 · 10^-5    [0.00 · 10^0, 5.96 · 10^-4]      s^-1
ω^de_{E,d→de}      0.00 · 10^0     [0.00 · 10^0, 6.15 · 10^-7]      µm^2 s^-1
ω^e_{E,d→de}       3.68 · 10^-3    [3.32 · 10^-3, 4.37 · 10^-3]     µm^2 s^-1
ω_{d,e→de}         5.17 · 10^-4    [4.59 · 10^-4, 6.02 · 10^-4]     µm^2 s^-1
ω_{d→D}            3.15 · 10^-1    [2.17 · 10^-1, 3.15 · 10^-1]     s^-1
ω_{de→D,E}         1.13 · 10^-1    [1.04 · 10^-1, 1.24 · 10^-1]     s^-1
ω_{de→D,e}         1.75 · 10^-2    [1.54 · 10^-2, 1.89 · 10^-2]     s^-1
ω_{de→d,e}         1.10 · 10^-6    [0.00 · 10^0, 7.72 · 10^-4]      s^-1
ω_{e→E}            6.22 · 10^-3    [5.61 · 10^-3, 6.92 · 10^-3]     s^-1

Table 4.2: Parameters from the fit of the Extended Bonny Model to the near-homogeneous data. $C_d$ and $C_e$ are as described in Table 4.1. Details of calculating confidence intervals are described in Section 4.4.5.
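For reference, the χ² and AIC comparison above is easy to script. A minimal sketch, assuming the Gaussian-likelihood convention in which AIC equals χ² plus a 2k parameter penalty, up to an additive constant shared by all models fit to the same data (the thesis's exact AIC convention is specified in its methods, so treat this form as illustrative):

```python
def aic(chi2: float, n_params: int) -> float:
    # Gaussian likelihood with known error variances: -2 ln L = chi2 + const,
    # so AIC = chi2 + 2k up to a model-independent constant.
    return chi2 + 2 * n_params

# Hypothetical chi2 values for two fits of nested models to the same data.
delta = aic(chi2=3530.0, n_params=10) - aic(chi2=1000.0, n_params=17)
print(delta)  # positive values favor the second (larger) model
```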
To determine how individual reactions affect the Extended Bonny Model's ability to describe the near-homogeneous data, I individually remove non-necessary reactions from the Extended Bonny Model and fit the resulting model to the near-homogeneous data, as described in Section 4.4. Results are shown in Table 4.3.

Null parameter       χ²/χ²_∅         AIC − AIC_∅
ω^d_{D→d}            2.85 · 10^1     1.14 · 10^3
ω^de_{D→d}           1.32 · 10^0     8.53 · 10^1
ω^de_{E,d→de}        1.00 · 10^0     −2.05 · 10^0
ω^e_{E,d→de}         2.62 · 10^0     3.26 · 10^2
ω_{d,e→de}           7.96 · 10^0     6.99 · 10^2
ω_{d→D}              1.10 · 10^0     2.56 · 10^1
ω_{de→D,E}           2.63 · 10^0     3.27 · 10^2
ω_{de→D,e}           1.03 · 10^0     5.89 · 10^0
ω_{de→d,e}           1.00 · 10^0     −4.06 · 10^0

Table 4.3: Removed-reaction fits of the Extended Bonny Model to the near-homogeneous data. A removed reaction is characterized by a null parameter. χ² is the weighted sum of squared residuals, and AIC is the Akaike information criterion. χ²_∅ and AIC_∅ are the values of χ² and AIC from the Extended Bonny Model without a removed reaction. χ²/χ²_∅ and AIC − AIC_∅ measure the effect of removing the reaction characterized by the null parameter from the Extended Bonny Model on its ability to fit the near-homogeneous data; larger values of χ²/χ²_∅ and AIC − AIC_∅ correspond to a larger decrease in fitting ability.

As shown in Table 4.3, the reactions characterized by the parameters $\omega^{d}_{D\to d}$, $\omega^{de}_{D\to d}$, $\omega^{e}_{E,d\to de}$, $\omega_{d,e\to de}$, and $\omega_{de\to D,E}$ each significantly (AIC − AIC_∅ > 50) affect the Extended Bonny Model's ability to describe the near-homogeneous data. Of these, $\omega^{de}_{D\to d}$ and $\omega^{e}_{E,d\to de}$ are not included in the Modified Bonny Model. Notably, much of the Extended Bonny Model's ability to describe the near-homogeneous data better than the Modified Bonny Model comes from the recruitment of $c_E$ to bind to $c_d$ by $c_e$ (through $\omega^{e}_{E,d\to de}$). The fit of the $\omega^{e}_{E,d\to de}$-null Extended Bonny Model to the near-homogeneous data is shown in Figure 4.8.

Figure 4.8: The fit of the $\omega^{e}_{E,d\to de}$-null Extended Bonny Model to the near-homogeneous data. Data is shown with points and model values are shown with lines. Removing the reaction characterized by the parameter $\omega^{e}_{E,d\to de}$ from the Extended Bonny Model significantly reduces its ability to fit the near-homogeneous data (compare to Figure 4.6).

The Extended Bonny Model fits the near-homogeneous data better than the Modified Bonny Model, but it is still an incomplete description of the dynamical system underlying the near-homogeneous data, as can be seen in the visible deviations between data and model, which are larger than the noise in the data. Also, the Extended Bonny Model does not account for experimental observations that MinE can act as both a stabilizer and an inhibitor of MinD membrane binding.
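The removed-reaction analysis of Table 4.3 (and of Table 4.6 below) follows a simple pattern: pin one rate constant to zero, refit all remaining parameters, and record the change in χ² and AIC. A sketch of the bookkeeping, with a hypothetical fit_model function standing in for the homotopy-minimization fit of Section 4.4:

```python
def removed_reaction_table(data, null_params, fit_model):
    """fit_model(data, fixed) is hypothetical: it refits all free
    parameters while holding entries of `fixed` at the given values and
    returns the fitted (chi2, AIC)."""
    chi2_full, aic_full = fit_model(data, fixed={})
    rows = {}
    for name in null_params:
        chi2, aic = fit_model(data, fixed={name: 0.0})  # null one reaction
        rows[name] = (chi2 / chi2_full, aic - aic_full)
    return rows  # {null parameter: (chi2 ratio, AIC difference)}

# Toy demonstration with a fake fitter that suffers when 'w1' is nulled.
def fake_fit(data, fixed):
    chi2 = 100.0 + (50.0 if fixed.get("w1") == 0.0 else 0.0)
    return chi2, chi2 + 2 * (3 - len(fixed))

print(removed_reaction_table(None, ["w1", "w2"], fake_fit))
```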
4.3.3 Models in Which MinE Acts as Both a Stabilizer and an Inhibitor

With buffer and MinE flowed atop a MinD-saturated supported lipid bilayer, initially, MinD dissociates from the supported lipid bilayer more slowly than with buffer alone. Later, as the concentrations of MinD and MinE on the supported lipid bilayer approach each other, MinD dissociates from the supported lipid bilayer more quickly than with buffer alone [73]. Thus, MinE seems to act as both a stabilizer and an inhibitor of MinD membrane binding. Accordingly, the experimentally measured rate of MinD ATPase activity is sigmoidal in the concentration of MinE, showing switch-like behavior [73]. I build on the Extended Bonny Model to develop two models, the Symmetric Activation Model and the Asymmetric Activation Model, based on experiments and postulate, that could account for MinE's dual role as a stabilizer and an inhibitor of MinD membrane binding, and I fit them to the near-homogeneous data.

The Symmetric Activation Model

It has been proposed that MinD and MinE may form a stable complex, with one MinE dimer bound to one subunit of the MinD dimer, and an unstable complex, with one MinE dimer bound to each subunit of the MinD dimer ([73], [54]). This symmetric activation could explain MinE's dual role as a stabilizer and an inhibitor of MinD membrane binding. I build on the Extended Bonny Model (4.6) to develop the Symmetric Activation Model.

My Symmetric Activation Model consists of four supported-lipid-bilayer-bound states, $c_d$, $c_{de}$, $c_{ede}$, and $c_e$, respectively corresponding to the supported-lipid-bilayer-bound concentrations of MinD dimers, MinD dimers bound to one MinE dimer, MinD dimers bound to two MinE dimers, and MinE dimers. For symmetric activation, $c_{de}$ is the stable state and $c_{ede}$ is the unstable state. Thus, I exclude reactions from the Symmetric Activation Model where $c_{de}$ dissociates from the supported lipid bilayer, and I include reactions in the Symmetric Activation Model where $c_{ede}$ dissociates from the supported lipid bilayer. In the Symmetric Activation Model, without reason for restriction, I allow $c_d$, $c_{de}$, and $c_{ede}$ to recruit $c_D$, the concentration of bulk MinD dimers, to the supported lipid bilayer, and I allow $c_{de}$, $c_{ede}$, and $c_e$ to recruit $c_E$, the concentration of bulk MinE dimers, to bind to $c_d$ and $c_{de}$. As in the Extended Bonny Model, I only consider reactions where MinE does not spontaneously dissociate from MinD on the supported lipid bilayer. Thus, in the Symmetric Activation Model, I include forward and reverse bimolecular reactions of $c_d$, $c_{de}$, $c_{ede}$, and $c_e$ with all products in $\{c_d, c_{de}, c_{ede}, c_e\}$. Also, as in the Extended Bonny Model, in the Symmetric Activation Model, I include the spontaneous dissociation of $c_d$ with stabilization by supported-lipid-bilayer-bound MinD dimers. I define the Symmetric Activation Model:
\[
\begin{aligned}
\frac{dc_d}{dt} ={}& (\omega_{D\to d} + \omega^{d}_{D\to d}c_d + \omega^{de}_{D\to d}c_{de} + \omega^{ede}_{D\to d}c_{ede})(c_{\max} - c_{\bar{d}} - c_d - c_{de} - c_{ede})/c_{\max} \\
&- (\omega_{E,d\to de} + \omega^{de}_{E,d\to de}c_{de} + \omega^{ede}_{E,d\to de}c_{ede} + \omega^{e}_{E,d\to de}c_e)c_d \\
&- \omega_{d,e\to de}c_d c_e + \omega_{de\to d,e}c_{de} - \omega_{d,ede\to de,de}c_d c_{ede} + \omega_{de,de\to d,ede}c_{de}^2 \\
&- \omega_{d\to D}c_s c_d/(c_s + c_{\bar{d}} + c_d + c_{de} + c_{ede}),
\end{aligned} \tag{4.7a}
\]
\[
\begin{aligned}
\frac{dc_{de}}{dt} ={}& (\omega_{E,d\to de} + \omega^{de}_{E,d\to de}c_{de} + \omega^{ede}_{E,d\to de}c_{ede} + \omega^{e}_{E,d\to de}c_e)c_d \\
&- (\omega_{E,de\to ede} + \omega^{de}_{E,de\to ede}c_{de} + \omega^{ede}_{E,de\to ede}c_{ede} + \omega^{e}_{E,de\to ede}c_e)c_{de} \\
&+ \omega_{d,e\to de}c_d c_e - \omega_{de\to d,e}c_{de} + 2\omega_{d,ede\to de,de}c_d c_{ede} - 2\omega_{de,de\to d,ede}c_{de}^2 \\
&- \omega_{de,e\to ede}c_{de}c_e + \omega_{ede\to de,e}c_{ede},
\end{aligned} \tag{4.7b}
\]
\[
\begin{aligned}
\frac{dc_{ede}}{dt} ={}& (\omega_{E,de\to ede} + \omega^{de}_{E,de\to ede}c_{de} + \omega^{ede}_{E,de\to ede}c_{ede} + \omega^{e}_{E,de\to ede}c_e)c_{de} \\
&- \omega_{d,ede\to de,de}c_d c_{ede} + \omega_{de,de\to d,ede}c_{de}^2 + \omega_{de,e\to ede}c_{de}c_e - \omega_{ede\to de,e}c_{ede} \\
&- \omega_{ede\to D,e,e}c_{ede} - \omega_{ede\to E,D,e}c_{ede} - \omega_{ede\to E,D,E}c_{ede},
\end{aligned} \tag{4.7c}
\]
\[
\begin{aligned}
\frac{dc_e}{dt} ={}& \omega_{de\to d,e}c_{de} - \omega_{d,e\to de}c_d c_e - \omega_{de,e\to ede}c_{de}c_e \\
&+ \omega_{ede\to de,e}c_{ede} + \omega_{ede\to E,D,e}c_{ede} + 2\omega_{ede\to D,e,e}c_{ede} - \omega_{e\to E}c_e,
\end{aligned} \tag{4.7d}
\]
where $c_{\max}$ is the saturation concentration of MinD dimers on the membrane, $c_{\bar{d}}$ is the constant concentration of persistently bound MinD dimers on the membrane, an experimental artifact discussed in Section F.4.4, and $\omega_{u,v\to x,y}$ denotes the reaction rate of $c_u$ and $c_v$ converting into $c_x$ and $c_y$, for $u, v, x, y \in \{\emptyset, D, E, d, de, ede, e\}$. When $\omega_{u,v\to x,y}$ has a superscript, the superscript indicates facilitation of the reaction by the superscripted species.
I note that $\omega^{z}_{D\to d}$ has a multiplicative factor of $c_D$ built into it for $z \in \{\emptyset, d, de, ede\}$ and $\omega^{z}_{E,d\to de}$ has a multiplicative factor of $c_E$ built into it for $z \in \{\emptyset, de, ede, e\}$. The Symmetric Activation Model is depicted in Figure 4.9.

Figure 4.9: The Symmetric Activation Model. Parameters characterizing reactions are shown in (a). A parameter that characterizes a reaction that is not included in the Extended Bonny Model is shown with a bullet (•), and the corresponding reaction is depicted in (c). A triangle (▲) next to a parameter indicates that the corresponding reaction from the Extended Bonny Model is not included in the Symmetric Activation Model. State variables are matched to protein states in (b). In (c), reactants are shown on the left and products are shown on the right of panels. (+) indicates facilitation of a reaction by the indicated species.

I fit the Symmetric Activation Model (4.7) to the near-homogeneous data, as described in Section 4.4. The resulting fit, state values, and parameter values are shown in Figure 4.10, Figure 4.11, and Table 4.4, respectively.

Figure 4.10: The fit of the Symmetric Activation Model to the near-homogeneous data. Data is shown with points and model values are shown with lines. The Symmetric Activation Model fits the near-homogeneous data moderately better than the Extended Bonny Model (compare to Figure 4.6).

Comparing Figures 4.6 and 4.10, the Symmetric Activation Model describes the near-homogeneous data visibly somewhat better than the Extended Bonny Model. Quantitatively, χ², the weighted sum of squared residuals, from the Extended Bonny Model is 1.89 times larger than χ² from the Symmetric Activation Model, and the value of AIC, the Akaike information criterion, from the Extended Bonny Model is 210 units larger than the value of AIC from the Symmetric Activation Model. Although the Symmetric Activation Model does describe the near-homogeneous data moderately better than the Extended Bonny Model, as discussed below, it does not agree well with experiments; whereas the Asymmetric Activation Model (described below) fits the near-homogeneous data better than the Symmetric Activation Model and agrees well with experiments. Therefore, I do not expound on the Symmetric Activation Model further.

Figure 4.11: States from the fit of the Symmetric Activation Model to the near-homogeneous data.
parameter            value           95% confidence interval          units
C_d                  3.18 · 10^2     [2.90 · 10^2, 3.18 · 10^2]       µm^-2
C_e                  1.91 · 10^2     [1.69 · 10^2, 2.03 · 10^2]       µm^-2
c_d̄                  0.00 · 10^0     [0.00 · 10^0, 7.69 · 10^0]       µm^-2
c_max                5.38 · 10^3     [5.35 · 10^3, 5.44 · 10^3]       µm^-2
c_s                  5.31 · 10^1     [4.11 · 10^1, 6.71 · 10^1]       µm^-2
ω_{D→d}              0.00 · 10^0     [0.00 · 10^0, 2.37 · 10^-1]      µm^-2 s^-1
ω^d_{D→d}            2.62 · 10^-1    [2.47 · 10^-1, 2.68 · 10^-1]     s^-1
ω^de_{D→d}           0.00 · 10^0     [0.00 · 10^0, 5.05 · 10^-2]      s^-1
ω^ede_{D→d}          0.00 · 10^0     [0.00 · 10^0, 1.27 · 10^-2]      s^-1
ω_{E,d→de}           1.01 · 10^-3    [2.49 · 10^-4, 1.29 · 10^-3]     s^-1
ω^de_{E,d→de}        0.00 · 10^0     [0.00 · 10^0, 9.70 · 10^-7]      µm^2 s^-1
ω^e_{E,d→de}         8.17 · 10^-4    [7.77 · 10^-4, 8.61 · 10^-4]     µm^2 s^-1
ω^ede_{E,d→de}       0.00 · 10^0     [0.00 · 10^0, 1.00 · 10^-6]      µm^2 s^-1
ω_{E,de→ede}         0.00 · 10^0     [0.00 · 10^0, 5.13 · 10^-4]      s^-1
ω^de_{E,de→ede}      0.00 · 10^0     [0.00 · 10^0, 2.80 · 10^-6]      µm^2 s^-1
ω^e_{E,de→ede}       8.11 · 10^-2    [8.07 · 10^-2, 8.16 · 10^-2]     µm^2 s^-1
ω^ede_{E,de→ede}     0.00 · 10^0     [0.00 · 10^0, 4.79 · 10^-6]      µm^2 s^-1
ω_{d,e→de}           0.00 · 10^0     [0.00 · 10^0, 9.04 · 10^-6]      µm^2 s^-1
ω_{d,ede→de,de}      1.69 · 10^-4    [1.66 · 10^-4, 1.70 · 10^-4]     µm^2 s^-1
ω_{d→D}              3.15 · 10^-1    [2.93 · 10^-1, 3.15 · 10^-1]     s^-1
ω_{de,de→d,ede}      3.17 · 10^-4    [3.13 · 10^-4, 3.20 · 10^-4]     µm^2 s^-1
ω_{de,e→ede}         6.74 · 10^-3    [6.38 · 10^-3, 7.14 · 10^-3]     µm^2 s^-1
ω_{de→d,e}           0.00 · 10^0     [0.00 · 10^0, 1.55 · 10^-4]      s^-1
ω_{e→E}              1.55 · 10^-2    [1.36 · 10^-2, 1.66 · 10^-2]     s^-1
ω_{ede→D,e,e}        1.67 · 10^-4    [0.00 · 10^0, 5.56 · 10^-4]      s^-1
ω_{ede→E,D,E}        2.29 · 10^-1    [2.26 · 10^-1, 2.31 · 10^-1]     s^-1
ω_{ede→E,D,e}        4.12 · 10^-2    [3.93 · 10^-2, 4.20 · 10^-2]     s^-1
ω_{ede→de,e}         3.08 · 10^-7    [0.00 · 10^0, 9.00 · 10^-4]      s^-1

Table 4.4: Parameters from the fit of the Symmetric Activation Model to the near-homogeneous data. $C_d$ and $C_e$ are as described in Table 4.1. Details of calculating confidence intervals are described in Section 4.4.5.

The Asymmetric Activation Model

Apart from a crystal structure showing two MinE fragments (containing 20 of 88 amino acids) each bound to a subunit of the MinD dimer [52], there is no direct evidence for the Symmetric Activation Model. Contrarily, a MinE dimer bound to one subunit of a MinD dimer stimulates ATPase activity in both subunits of the MinD dimer [53], showing asymmetric rather than symmetric activation of MinD by MinE. Furthermore, with ATPγS, a non-hydrolyzable analogue of ATP, and a high concentration of MinE, the Symmetric Activation Model would predict a ratio of MinD to MinE on a lipid bilayer of 1:2. However, experiments testing precisely this scenario found ratios of MinD to MinE of 1:1 [34] and 3:1 [42]. Additionally, with ATPγS, MinD and MinE dissociate in a 1:1 ratio from the supported lipid bilayer [73], suggesting that, if dissociation occurs predominantly from the unstable MinD-MinE complex, the ratio of MinD to MinE in the unstable MinD-MinE complex is 1:1. Therefore, experimental outcomes support asymmetric rather than symmetric activation of MinD by MinE.

For asymmetric activation of MinD by MinE, MinD and MinE form an unstable complex with one MinE dimer bound to one subunit of the MinD dimer ([52], [53]). There is no direct evidence for the structure of a stable MinD-MinE complex. However, a crystal structure shows a MinE dimer bridging two MinD dimers, with one MinD dimer rotated 90° with respect to the other MinD dimer [52]. Because of the 90° rotation of one MinD dimer with respect to the other MinD dimer, a MinE dimer bridging two MinD dimers has been considered more of an experimental artifact than a biologically relevant state.
I believe, however, that a MinE dimer bridging two MinD dimers may reflect the stable MinD-MinE complex, and the 90° rotation of one MinD dimer with respect to the other MinD dimer may reflect the stabilization mechanism. If the stable configuration of a MinE dimer bridging two MinD dimers requires a 90° rotation of one MinD dimer with respect to the other MinD dimer, then some strain would likely exist within a MinE dimer that bridges two membrane-bound MinD dimers that both have membrane-targeting sequences oriented toward the membrane. I hypothesize that when a MinE dimer binds to a second MinD dimer on the membrane, the imposed strain alters the interaction between the MinE dimer and the first MinD dimer, tempering MinE-stimulated ATPase activity. Thus, I propose that MinD and MinE form a stable complex with one MinE dimer bridging two MinD dimers.

Supporting my stabilization-by-bridging hypothesis, experiments show that removing the dimerization domain of MinE removes switch-like behavior in the stimulation of MinD ATPase activity by MinE: the rate of MinD ATPase activity as a function of WT MinE concentration resembles a Hill equation with a Hill coefficient greater than one, while the rate of MinD ATPase activity as a function of dimerization-domain-deficient MinE concentration resembles a Hill equation with a Hill coefficient of one [73]. Refuting my stabilization-by-bridging hypothesis, dimerization-domain-deficient MinE stabilizes MinD membrane binding in buffer-flow experiments [73]. However, it stabilizes MinD membrane binding less than WT MinE does. Also, dimerization-domain-deficient MinE does not support dynamic pattern formation in vitro, and the rate of MinD ATPase activity is lower with high concentrations of dimerization-domain-deficient MinE than with high concentrations of WT MinE [73]. So, the stabilization of MinD membrane binding could stem from a dimerization-domain-deficient disruption of the regular MinD-MinE interaction, which tempers the stimulation of ATPase activity.

I build on the Extended Bonny Model (4.6) to develop the Asymmetric Activation Model. My Asymmetric Activation Model consists of four supported-lipid-bilayer-bound states, $c_d$, $c_{de}$, $c_{ded}$, and $c_e$, respectively corresponding to the supported-lipid-bilayer-bound concentrations of MinD dimers, MinE dimers with one MinD dimer bound, MinE dimers with two MinD dimers bound, and MinE dimers with no MinD dimers bound. For asymmetric activation, $c_{de}$ is the unstable state and $c_{ded}$ is the stable state. Thus, I include reactions in the Asymmetric Activation Model where $c_{de}$ dissociates from the supported lipid bilayer, and I exclude reactions from the Asymmetric Activation Model where $c_{ded}$ dissociates from the supported lipid bilayer. As stated previously, a MinE dimer bound to one subunit of a MinD dimer is able to cause a conformational change in the other subunit of the MinD dimer [53], and, with ATPγS and a high concentration of MinE, the number of MinE dimers does not exceed the number of MinD dimers on lipid bilayers ([34], [42]). Thus, in the Asymmetric Activation Model, I assume that the binding of a MinE dimer to one subunit of a MinD dimer excludes other MinE dimers from binding to the other subunit of the MinD dimer. As such, without reason for restriction, I allow MinE-exclusion reactions in the Asymmetric Activation Model:
\[
c_E + c_{ded} \to 2c_{de}, \tag{4.8a}
\]
\[
c_e + c_{ded} \to 2c_{de}, \tag{4.8b}
\]
\[
c_{de} + c_{de} \to c_{ded} + c_e, \tag{4.8c}
\]
where $c_E$ is the concentration of bulk MinE dimers.
Also, without reason for restriction, in the Asymmetric Activation Model, I allow $c_d$, $c_{de}$, and $c_{ded}$ to recruit $c_D$, the concentration of bulk MinD dimers, to the supported lipid bilayer, and I allow $c_{de}$, $c_{ded}$, and $c_e$ to recruit $c_E$ to bind to $c_d$ and $c_{ded}$. As in the Extended Bonny Model, I only consider reactions where MinE does not spontaneously dissociate from MinD on the supported lipid bilayer. Thus, in the Asymmetric Activation Model, I include forward and reverse bimolecular reactions of $c_d$, $c_{de}$, $c_{ded}$, and $c_e$ with all products in $\{c_d, c_{de}, c_{ded}, c_e\}$. Also, as in the Extended Bonny Model, in the Asymmetric Activation Model, I include the spontaneous dissociation of $c_d$ with stabilization by supported-lipid-bilayer-bound MinD dimers. I define the Asymmetric Activation Model:
\[
\begin{aligned}
\frac{dc_d}{dt} ={}& (\omega_{D\to d} + \omega^{d}_{D\to d}c_d + \omega^{de}_{D\to d}c_{de} + \omega^{ded}_{D\to d}c_{ded})(c_{\max} - c_{\bar{d}} - c_d - c_{de} - 2c_{ded})/c_{\max} \\
&- (\omega_{E,d\to de} + \omega^{de}_{E,d\to de}c_{de} + \omega^{ded}_{E,d\to de}c_{ded} + \omega^{e}_{E,d\to de}c_e)c_d \\
&- \omega_{d,de\to ded}c_d c_{de} + \omega_{ded\to d,de}c_{ded} - \omega_{d,e\to de}c_d c_e + \omega_{de\to d,e}c_{de} \\
&- \omega_{d\to D}c_s c_d/(c_s + c_{\bar{d}} + c_d + c_{de} + 2c_{ded}),
\end{aligned} \tag{4.9a}
\]
\[
\begin{aligned}
\frac{dc_{de}}{dt} ={}& (\omega_{E,d\to de} + \omega^{de}_{E,d\to de}c_{de} + \omega^{ded}_{E,d\to de}c_{ded} + \omega^{e}_{E,d\to de}c_e)c_d \\
&+ 2(\omega_{E,ded\to de,de} + \omega^{de}_{E,ded\to de,de}c_{de} + \omega^{ded}_{E,ded\to de,de}c_{ded} + \omega^{e}_{E,ded\to de,de}c_e)c_{ded} \\
&- \omega_{d,de\to ded}c_d c_{de} + \omega_{ded\to d,de}c_{ded} + \omega_{d,e\to de}c_d c_e - \omega_{de\to d,e}c_{de} \\
&- 2\omega_{de,de\to ded,e}c_{de}^2 + 2\omega_{ded,e\to de,de}c_{ded}c_e - \omega_{de\to D,E}c_{de} - \omega_{de\to D,e}c_{de},
\end{aligned} \tag{4.9b}
\]
\[
\begin{aligned}
\frac{dc_{ded}}{dt} ={}& -(\omega_{E,ded\to de,de} + \omega^{de}_{E,ded\to de,de}c_{de} + \omega^{ded}_{E,ded\to de,de}c_{ded} + \omega^{e}_{E,ded\to de,de}c_e)c_{ded} \\
&+ \omega_{d,de\to ded}c_d c_{de} - \omega_{ded\to d,de}c_{ded} + \omega_{de,de\to ded,e}c_{de}^2 - \omega_{ded,e\to de,de}c_{ded}c_e,
\end{aligned} \tag{4.9c}
\]
\[
\begin{aligned}
\frac{dc_e}{dt} ={}& \omega_{de\to d,e}c_{de} - \omega_{d,e\to de}c_d c_e + \omega_{de,de\to ded,e}c_{de}^2 - \omega_{ded,e\to de,de}c_{ded}c_e \\
&+ \omega_{de\to D,e}c_{de} - \omega_{e\to E}c_e,
\end{aligned} \tag{4.9d}
\]
where $c_{\max}$ is the saturation concentration of MinD dimers on the membrane, $c_{\bar{d}}$ is the constant concentration of persistently bound MinD dimers on the membrane, an experimental artifact discussed in Section F.4.4, and $\omega_{u,v\to x,y}$ denotes the reaction rate of $c_u$ and $c_v$ converting into $c_x$ and $c_y$, for $u, v, x, y \in \{\emptyset, D, E, d, de, ded, e\}$. When $\omega_{u,v\to x,y}$ has a superscript, the superscript indicates facilitation of the reaction by the superscripted species. I note that $\omega^{z}_{D\to d}$ has a multiplicative factor of $c_D$ built into it for $z \in \{\emptyset, d, de, ded\}$ and $\omega^{z}_{E,d\to de}$ has a multiplicative factor of $c_E$ built into it for $z \in \{\emptyset, de, ded, e\}$. The Asymmetric Activation Model is depicted in Figure 4.12.

Figure 4.12: The Asymmetric Activation Model. Parameters characterizing reactions are shown in (a). A parameter that characterizes a reaction that is not included in the Extended Bonny Model is shown with a bullet (•), and the corresponding reaction is depicted in (c). All reactions from the Extended Bonny Model are included in the Asymmetric Activation Model. State variables are matched to protein states in (b). In (c), reactants are shown on the left and products are shown on the right of panels. (+) indicates facilitation of a reaction by the indicated species.

I fit the Asymmetric Activation Model (4.9) to the near-homogeneous data, as described in Section 4.4. The resulting fit, state values, and parameter values are shown in Figure 4.13, Figure 4.14, and Table 4.5, respectively.
Figure 4.13: The fit of the Asymmetric Activation Model to the near-homogeneous data. Data is shown with points and model values are shown with lines. The Asymmetric Activation Model fits the near-homogeneous data appreciably better than the Symmetric Activation Model (compare to Figure 4.10).

Comparing Figures 4.6, 4.10, and 4.13, the Asymmetric Activation Model describes the near-homogeneous data visibly better than the Extended Bonny Model, even more so than the Symmetric Activation Model. Quantitatively, χ², the weighted sum of squared residuals, from the Extended Bonny Model is 16.1 times larger than χ² from the Asymmetric Activation Model, and the value of AIC, the Akaike information criterion, from the Extended Bonny Model is 919 units larger than the value of AIC from the Asymmetric Activation Model. Comparatively, χ² from the Extended Bonny Model is 1.89 times larger than χ² from the Symmetric Activation Model, and the value of AIC from the Extended Bonny Model is 210 units larger than the value of AIC from the Symmetric Activation Model. As such, the Asymmetric Activation Model describes the near-homogeneous data considerably better than the Symmetric Activation Model.

Figure 4.14: States from the fit of the Asymmetric Activation Model to the near-homogeneous data.

parameter             value           95% confidence interval           units
C_d                   7.47 · 10^1     [6.52 · 10^1, 8.58 · 10^1]        µm^-2
C_e                   6.23 · 10^-1    [0.00 · 10^0, 6.67 · 10^0]        µm^-2
c_d̄                   1.08 · 10^-2    [0.00 · 10^0, 1.42 · 10^1]        µm^-2
c_max                 5.30 · 10^3     [5.28 · 10^3, 5.35 · 10^3]        µm^-2
c_s                   1.15 · 10^2     [8.11 · 10^1, 2.64 · 10^2]        µm^-2
ω_{D→d}               8.77 · 10^-5    [0.00 · 10^0, 9.66 · 10^-2]       µm^-2 s^-1
ω^d_{D→d}             3.73 · 10^-1    [3.58 · 10^-1, 3.96 · 10^-1]      s^-1
ω^de_{D→d}            2.00 · 10^-1    [1.98 · 10^-1, 2.02 · 10^-1]      s^-1
ω^ded_{D→d}           4.20 · 10^-1    [4.05 · 10^-1, 4.29 · 10^-1]      s^-1
ω_{E,d→de}            3.28 · 10^-3    [2.34 · 10^-3, 3.42 · 10^-3]      s^-1
ω^de_{E,d→de}         2.11 · 10^-9    [0.00 · 10^0, 2.41 · 10^-6]       µm^2 s^-1
ω^ded_{E,d→de}        8.50 · 10^-10   [0.00 · 10^0, 8.21 · 10^-7]       µm^2 s^-1
ω^e_{E,d→de}          1.90 · 10^0     [1.88 · 10^0, 1.92 · 10^0]        µm^2 s^-1
ω_{E,ded→de,de}       1.86 · 10^-13   [0.00 · 10^0, 7.57 · 10^-4]       s^-1
ω^de_{E,ded→de,de}    1.45 · 10^-5    [1.35 · 10^-5, 1.50 · 10^-5]      µm^2 s^-1
ω^ded_{E,ded→de,de}   2.10 · 10^-5    [1.97 · 10^-5, 2.11 · 10^-5]      µm^2 s^-1
ω^e_{E,ded→de,de}     1.64 · 10^-1    [1.63 · 10^-1, 1.65 · 10^-1]      µm^2 s^-1
ω_{d,de→ded}          4.86 · 10^-5    [4.75 · 10^-5, 5.00 · 10^-5]      µm^2 s^-1
ω_{d,e→de}            6.16 · 10^-1    [6.11 · 10^-1, 6.22 · 10^-1]      µm^2 s^-1
ω_{d→D}               3.15 · 10^-1    [2.25 · 10^-1, 3.15 · 10^-1]      s^-1
ω_{de,de→ded,e}       6.59 · 10^-4    [6.59 · 10^-4, 6.59 · 10^-4]      µm^2 s^-1
ω_{de→D,E}            2.00 · 10^-1    [1.99 · 10^-1, 2.02 · 10^-1]      s^-1
ω_{de→D,e}            3.40 · 10^-6    [0.00 · 10^0, 5.66 · 10^-4]       s^-1
ω_{de→d,e}            6.64 · 10^-2    [6.59 · 10^-2, 6.68 · 10^-2]      s^-1
ω_{ded,e→de,de}       9.34 · 10^0     [9.34 · 10^0, 9.34 · 10^0]        µm^2 s^-1
ω_{ded→d,de}          1.52 · 10^-13   [0.00 · 10^0, 1.91 · 10^-3]       s^-1
ω_{e→E}               5.39 · 10^-2    [5.34 · 10^-2, 5.45 · 10^-2]      s^-1

Table 4.5: Parameters from the fit of the Asymmetric Activation Model to the near-homogeneous data. $C_d$ and $C_e$ are as described in Table 4.1. Details of calculating confidence intervals are described in Section 4.4.5.

To determine how individual reactions affect the Asymmetric Activation Model's ability to describe the near-homogeneous data, I individually remove non-necessary reactions from the Asymmetric Activation Model and fit the resulting model to the near-homogeneous data, as described in Section 4.4. Results are shown in Table 4.6.
    null parameter         χ²/χ²_∅    AIC − AIC_∅
    ω^{d}_{D→d}            3.80       4.36·10^2
    ω^{de}_{D→d}           5.71       5.82·10^2
    ω^{ded}_{D→d}          1.98       2.30·10^2
    ω^{de}_{E,d→de}        1.00       −4.34·10^0
    ω^{ded}_{E,d→de}       1.05       7.16·10^1
    ω^{e}_{E,d→de}         8.43       7.09·10^2
    ω_{E,ded→de,de}        1.00       −7.20·10^−1
    ω^{de}_{E,ded→de,de}   1.07       2.09·10^1
    ω^{ded}_{E,ded→de,de}  1.12       3.35·10^1
    ω^{e}_{E,ded→de,de}    1.13       3.48·10^1
    ω_{d,de→ded}           4.46       4.93·10^2
    ω_{d,e→de}             1.49       1.23·10^2
    ω_{d→D}                1.04       4.56·10^0
    ω_{de,de→ded,e}        2.11       2.43·10^2
    ω_{de→D,E}             5.81       5.82·10^2
    ω_{de→D,e}             1.04       1.13·10^1
    ω_{de→d,e}             1.05       1.33·10^1
    ω_{ded,e→de,de}        4.71       5.09·10^2
    ω_{ded→d,de}           1.01       2.29·10^0

Table 4.6: Removed-reaction fits of the Asymmetric Activation Model to the near-homogeneous data. A removed reaction is characterized by a null parameter. χ² is the weighted sum of squared residuals, and AIC is the Akaike information criterion. χ²_∅ and AIC_∅ are the values of χ² and AIC from the Asymmetric Activation Model without a removed reaction. χ²/χ²_∅ and AIC − AIC_∅ measure the effect of removing the reaction characterized by the null parameter from the Asymmetric Activation Model on its ability to fit the near-homogeneous data; larger values of χ²/χ²_∅ and AIC − AIC_∅ correspond to a larger decrease in fitting ability.

As shown in Table 4.6, the reactions characterized by the parameters ω^{d}_{D→d}, ω^{de}_{D→d}, ω^{ded}_{D→d}, ω^{e}_{E,d→de}, ω_{d,de→ded}, ω_{d,e→de}, ω_{de,de→ded,e}, ω_{de→D,E}, and ω_{ded,e→de,de} each significantly (AIC − AIC_∅ > 50) affect the Asymmetric Activation Model's ability to describe the near-homogeneous data.

As is visible in Figure 4.14 and elaborated in Figure 4.15, at the front end of the pulse, a large proportion of MinD dimers are in the semistable state, c_d, or in the stable state, c_{ded}, whereas, at the back end of the pulse, a large proportion of MinD dimers are in the unstable state, c_{de}.

Figure 4.15: Stability of MinD dimers on the supported lipid bilayer. The total MinD dimer concentration on the supported lipid bilayer is separated into the concentration of dimers in the stable state, 2c_{ded}, the concentration of dimers in the semistable state, c_d, and the concentration of dimers in the unstable state, c_{de}. Values of c_d, c_{de}, and c_{ded}, as shown in Figure 4.14, come from the fit of the Asymmetric Activation Model to the near-homogeneous data. MinD dimers are predominantly semi-stably and stably bound to the supported lipid bilayer at the front end of the pulse and unstably bound to the supported lipid bilayer at the back end of the pulse.

4.3.4 A Stability-Switching Mechanism Underlying the Dynamic Behavior of the Min System

Through the fit of the Asymmetric Activation Model to the near-homogeneous data, I interpret a stability-switching mechanism that underlies the dynamic behavior of the Min system. My discussion is based on the state values shown in Figure 4.14 and the characterization of significant reactions from Table 4.6.

During the MinD upstroke, bulk MinD binds to the supported lipid bilayer in the protein state d, which is semi-stably bound to the supported lipid bilayer, and recruits more bulk MinD to the supported lipid bilayer through the significant reaction D + d → 2d (characterized by ω^{d}_{D→d}). Bulk MinE binds to supported-lipid-bilayer-bound MinD in the protein state de, which is unstably bound to the supported lipid bilayer. However, through the significant reaction d + de → ded (characterized by ω_{d,de→ded}), a large concentration of d pushes de to react with d to form ded, the protein state that is stably bound to the supported lipid bilayer.
Simultaneously, through the significant reaction d + e → de (characterized by ω_{d,e→de}), a large concentration of d pushes the protein state e to react with d to form de, which a large concentration of d pushes to react with d to form ded. As more bulk MinE binds to supported-lipid-bilayer-bound MinD, the concentration of d decreases and the concentration of de increases. Through the significant reaction de + de → ded + e (characterized by ω_{de,de→ded,e}) and the reactions de → D + e and de → d + e (characterized by ω_{de→D,e} and ω_{de→d,e}), de generates e. Eventually, in catastrophe, a period of destabilization with positive feedback, without d as a substrate, e binds to ded and generates de through the significant reaction ded + e → de + de (characterized by ω_{ded,e→de,de}), de generates more e, which generates more de, and de dissociates from the supported lipid bilayer through the significant reaction de → D + E (characterized by ω_{de→D,E}). Capping catastrophe, e recruits bulk MinE to bind to any remaining d through the significant reaction E + d + e → de + e (characterized by ω^{e}_{E,d→de}), and e dissociates from the supported lipid bilayer through the reaction e → E (characterized by ω_{e→E}). Collectively, a large concentration of d reinforces stability of MinD on the supported lipid bilayer and suppresses the catastrophe switch, e; a decrease in the concentration of d signals the switch from stability to catastrophe, causing rapid MinD dissociation from the supported lipid bilayer. The aforementioned stability-switching mechanism is depicted in Figure 4.16.

Figure 4.16: The stability-switching mechanism. Protein states are shown in (a). The stability-switching mechanism is shown in (b). Relative concentration/effect is shown by size. The semistable protein state d is predominant during pulse upstroke (top). Bulk MinE binds to d to form the unstable protein state de, but a large concentration of d pushes de to react with d to form the stable protein state ded (right). Meanwhile, a large concentration of d inhibits the catastrophe switch, protein state e, by reacting with e to form de. The concentration of d decreases and the switch in stability occurs (bottom). In a feedback loop, de generates e, the catastrophe switch, which binds to ded to form more de (left); Min proteins rapidly dissociate from the supported lipid bilayer in catastrophe.

4.3.5 Results Relating to Experimental Observations

The results from my modeling and fitting could explain some experimental observations. A MinE mutant that stimulates MinD ATPase activity but is deficient in membrane binding, MinE C1 [27], fails to stimulate dynamic pattern formation on a supported lipid bilayer in vitro [44]. Rather, MinD and MinE C1 form a stationary structure on the supported lipid bilayer with MinD and MinE C1 profiles that are similar in shape to MinD and MinE profiles in traveling waves, except that the MinE C1 profile lacks a sharp peak [44]. My results, from fitting both the Extended Bonny Model and the Asymmetric Activation Model to the near-homogeneous data, suggest that supported-lipid-bilayer-bound MinE recruits bulk MinE to bind to supported-lipid-bilayer-bound MinD (ω^{e}_{E,d→de} is significant). As such, the lack of a sharp peak in the MinE C1 profile could follow from the inability of MinE C1 to recruit bulk MinE C1 to bind to supported-lipid-bilayer-bound MinD.
My results, from fitting the Asymmetric Activation Model to the near-homogeneous data, also suggest that supported-lipid-bilayer-bound MinE acts as a catastrophe switch (ω_{ded,e→de,de} is significant). Thus, MinD and MinE C1 forming a stable, stationary structure on the supported lipid bilayer could follow from the lack of a catastrophe switch with the MinE C1 mutant.

MinE consists of three domains, conferring the functions of membrane binding, MinD binding and the stimulation of ATPase activity, and dimerization ([46], [56], [35], [27], [52], [73]). Membrane binding and dimerization play critical roles in MinE's function, but the functional roles of membrane binding and dimerization are somewhat unclear. During pole-to-pole oscillations in vivo, the E ring, a concentrated band of MinE, follows a more diffuse band of MinD from near midcell to cell pole, capping the MinD polar zone ([59], [23], [67]). A disruption in MinE of membrane binding [27] or dimerization [61] inhibits E ring formation, allowing the extension of MinD polar zones. Similarly, a disruption in MinE of membrane binding [44] or dimerization [73] inhibits dynamic pattern formation on a supported lipid bilayer in vitro. It has been thought that membrane binding allows a transition between MinD binding events, permitting a MinE dimer to stimulate ATPase activity in multiple MinD dimers before dissociating from the membrane ([52], [53]). The role of dimerization in MinE function is not understood. My results suggest that the functional roles of membrane binding and dimerization are tightly coupled in a stability-switching mechanism, with dimerization underlying stability and membrane binding underlying catastrophe.

4.4 Details of Optimization Using Overlapping-Niche Descent

Here, I describe structural components of overlapping-niche descent for fits to the near-homogeneous data. I describe details pertaining to the implementation of overlapping-niche descent in Appendix G.

4.4.1 Statistical Model

For near-homogeneous MinD and MinE densities at time t_k, y_{1,k} and y_{2,k}, and corresponding observable model values, g_{1,k} and g_{2,k}, where g_{j,k} = g_j(p, x_{1,k}, …, x_{n_x,k}),

    y_{j,k} = g_{j,k} + ε_{j,k},    (4.10)

with errors ε_{j,k}, for j ∈ {1, 2} and k ∈ {1, 2, …, n_t}. Errors, ε_{j,k}, consist of modeling errors and data errors. As is visible in Figure F.19 of Section F.4.3, errors in the near-homogeneous data are small compared to the ranges of the near-homogeneous data; errors range between roughly 5 µm⁻² and 30 µm⁻², as is visible in Figure F.21, and the data range on the scale of 5·10³ µm⁻². I expect modeling errors to significantly exceed the relatively small errors in the near-homogeneous data. Thus, I expect errors, ε_{j,k}, to consist primarily of modeling errors. Modeling errors are inherently not independent or identically distributed, but without a better a priori distribution, I assume that ε_{j,k} for k ∈ {1, 2, …, n_t} are independent and identically distributed from a normal distribution with a mean of 0, for each j ∈ {1, 2}. The range of near-homogeneous MinD densities is larger than the range of near-homogeneous MinE densities. Thus, to remove bias in fitting from differences in scale, I assume that errors, ε_{j,k}, are proportional to the range of y_{j,k} for k ∈ {1, 2, …, n_t}, ȳ_j, for j ∈ {1, 2}. Therefore, collectively, I assume that

    y_{j,k} = g_{j,k} + ȳ_j ε̄_{j,k},    (4.11)

where ε̄_{j,k} for j ∈ {1, 2} and k ∈ {1, 2, …, n_t} are independent and identically distributed from a normal distribution with a mean of 0 and a variance of σ̄², N(0, σ̄²).
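Before deriving the estimator that this error model induces, a minimal Python sketch of the resulting range-weighted residual measure may be helpful; the array names and shapes are my own illustrative assumptions.

    import numpy as np

    def weighted_ssr(y_data, y_model):
        """Sum of squared residuals weighted by squared data ranges.

        y_data, y_model : arrays of shape (n_y, n_t); rows are the observed
        and model MinD and MinE densities on the common time grid. Errors
        are assumed proportional to the range of each observable, so each
        row is weighted by 1 / range**2, as in the error model above.
        """
        ybar = y_data.max(axis=1) - y_data.min(axis=1)  # per-observable range
        resid = (y_data - y_model) / ybar[:, None]
        return float(np.sum(resid**2))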
Under assumption (4.11), the likelihood of y_{j,k} given g_{j,k} and σ̄², for j ∈ {1, 2} and k ∈ {1, 2, …, n_t}, is

    L(y_{j,k} | g_{j,k}, σ̄² : j ∈ {1, 2}, k ∈ {1, 2, …, n_t}) = Π_{j=1}^{2} Π_{k=1}^{n_t} (2π ȳ_j² σ̄²)^{−1/2} exp(−(y_{j,k} − g_{j,k})²/(2ȳ_j² σ̄²))
                                                              = C̄ exp(−(1/(2σ̄²)) Σ_{j=1}^{2} Σ_{k=1}^{n_t} (y_{j,k} − g_{j,k})²/ȳ_j²),    (4.12)

for constant C̄ > 0. The observable model values g_{j,k} for j ∈ {1, 2} and k ∈ {1, 2, …, n_t} that maximize L(y_{j,k} | g_{j,k}, σ̄² : j ∈ {1, 2}, k ∈ {1, 2, …, n_t}) are those that minimize

    Σ_{j=1}^{2} Σ_{k=1}^{n_t} (y_{j,k} − g_{j,k})²/ȳ_j².    (4.13)

Thus, I measure the difference of observable model values from the near-homogeneous data by the sum of weighted squared residuals in equation (4.13).

4.4.2 Defining r_y(p,x), r_ŷ(p,x), and r_Δx(p,x)

Preliminarily, for consistency with previous notation, I define: x_1 = c_d, x_2 = c_{de}, and x_3 = c_e for the Modified Bonny Model and the Extended Bonny Model; x_1 = c_d, x_2 = c_{de}, x_3 = c_{ede}, and x_4 = c_e for the Symmetric Activation Model; and x_1 = c_d, x_2 = c_{de}, x_3 = c_{ded}, and x_4 = c_e for the Asymmetric Activation Model. I define y_1 = the concentration of MinD monomers (µm⁻²) and y_2 = the concentration of MinE monomers (µm⁻²). For constant bulk and persistent lipid-bilayer-bound MinD and MinE densities, C_d and C_e, as described in Section F.4.4, I define g_1 = 2(x_1 + x_2) + C_d and g_2 = 2(x_2 + x_3) + C_e for the Modified Bonny Model and the Extended Bonny Model; g_1 = 2(x_1 + x_2 + x_3) + C_d and g_2 = 2(x_2 + 2x_3 + x_4) + C_e for the Symmetric Activation Model; and g_1 = 2(x_1 + x_2 + 2x_3) + C_d and g_2 = 2(x_2 + x_3 + x_4) + C_e for the Asymmetric Activation Model. Additionally, for each model, I uniquely identify each parameter with p_1, p_2, …, p_{n_p}.

As described in Section 4.4.1, I measure the difference of observable model values from the near-homogeneous data by the sum of weighted squared residuals in equation (4.13). Thus, I define r_y(p,x) such that

    r_y(p,x) = (Σ_{j=1}^{n_y} Σ_{k=1}^{n_t} ȳ_j⁻² y_{j,k}²)⁻¹ Σ_{j=1}^{n_y} Σ_{k=1}^{n_t} ȳ_j⁻² (y_{j,k} − g_j(p, x_{1,k}, …, x_{n_x,k}))²,    (4.14)

where I normalize by Σ_{j=1}^{n_y} Σ_{k=1}^{n_t} ȳ_j⁻² y_{j,k}² to match the scale of r_y(p,x) in equation (2.6). I use the data grid, with a data point every 3 s, as the numerical discretization grid. Thus, I define r_ŷ(p,x) = 0. I define r_Δx(p,x) as in equation (2.13a) and discretize models using a Simpson's-method finite difference, a finite difference with fourth-order accuracy. Thus, in r_Δx(p,x),

    Δx_{i,k} = 0 if k ∈ {1, n_t};  (x_{i,k⁺} − x_{i,k⁻})/(2Δt) if k ∈ I_Δ \ {1, n_t},
    F_{i,k}(t,p,x) = 0 if k ∈ {1, n_t};  Σ_{m=−1}^{1} b_m F_i(p, x_{1,k+m}, x_{2,k+m}, …, x_{n_x,k+m}) if k ∈ I_Δ \ {1, n_t},    (4.15)

where k⁺ is the index above k in I_Δ, k⁻ is the index below k in I_Δ, Δt = 3 s is the grid spacing in {t_k : k ∈ I_Δ}, b_{−1} = 1/6, b_0 = 4/6, b_1 = 1/6, and F_i is as defined in equation (2.7). In the smoothing penalties, s_i(x), of r_Δx(p,x), I set α_i = 1, β_i = 10², and γ_i = 2, for all i ∈ {1, 2, …, n_x}, to insignificantly modify r_Δx(p,x) for a smooth set of state values and to strongly penalize r_Δx(p,x) for a jagged set of state values.

4.4.3 Domain Restrictions on Parameters and States

As described in Section F.4.4, I restrict the nonnegative constant bulk and persistent lipid-bilayer-bound MinD and MinE densities, C_d and C_e, such that

    0 ≤ C_d ≤ 317.88 µm⁻²,    (4.16a)
    0 ≤ C_e ≤ 249.25 µm⁻².    (4.16b)

The nonnegative concentration of persistent lipid-bilayer-bound MinD dimers is necessarily no larger than half the constant bulk and persistent lipid-bilayer-bound MinD density. Thus, I restrict c_d̄ such that

    0 ≤ c_d̄ ≤ 158.94 µm⁻².    (4.17)
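As an illustration of the discretization in equation (4.15), the following Python sketch assembles the Simpson-type residuals Δx_{i,k} − F_{i,k} on a uniform grid; the right-hand-side callable `F` and the array layout are assumptions of mine for illustration, not the thesis implementation.

    import numpy as np

    def simpson_residuals(x, F, p, dt):
        """Residuals Delta x_{i,k} - F_{i,k} of the Simpson-type scheme (4.15).

        x  : array of shape (n_x, n_t), candidate state values on the grid
        F  : callable F(p, x_k) -> length-n_x array, the model right-hand side
        p  : parameter vector passed through to F
        dt : uniform grid spacing (3 s for the near-homogeneous data)
        """
        n_x, n_t = x.shape
        # right-hand side evaluated at every grid point
        f = np.stack([F(p, x[:, k]) for k in range(n_t)], axis=1)
        # centered difference (x_{k+1} - x_{k-1}) / (2 dt) at interior points
        dxdt = (x[:, 2:] - x[:, :-2]) / (2.0 * dt)
        # Simpson weights b = (1/6, 4/6, 1/6) applied to f at k-1, k, k+1
        favg = (f[:, :-2] + 4.0 * f[:, 1:-1] + f[:, 2:]) / 6.0
        res = np.zeros_like(x)
        res[:, 1:-1] = dxdt - favg  # residuals are defined as 0 at k = 1, n_t
        return res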
The parameter c_{max} dictates the maximum concentration of membrane-bound MinD dimers. Thus, I restrict c_{max} to values greater than or equal to half the maximal near-homogeneous MinD density value, D_{max}/2, where D_{max} = 1.02·10⁴ µm⁻². Additionally, I assume c_{max} is on the scale of D_{max}/2, so I bound c_{max} above by 100·D_{max}/2. Thus, I restrict c_{max} such that

    D_{max}/2 ≤ c_{max} ≤ 100·D_{max}/2.    (4.18)

As per condition (4.5), I assume that c_s ≤ D_{max}/2. Also, equation (4.3) can be undefined if c_s = 0, so I bound c_s below by 1. Thus, I restrict c_s such that

    1 ≤ c_s ≤ D_{max}/2.    (4.19)

Rate parameters, ω^{z}_{u,v→x,y} for u, v, x, y, z ∈ {∅, D, E, d, de, ede, ded, e}, are only biologically relevant if nonnegative. Also, to restrict optimization within ranges of reactions that are not overly fast, I restrict all rate parameters to no more than 10 units. Thus, I restrict rate parameters such that

    0 ≤ p ≤ 10 u_p for all p ∈ {ω^{z}_{u,v→x,y} : u, v, x, y, z ∈ {∅, D, E, d, de, ede, ded, e}},    (4.20)

where u_p is the units of parameter p. In experiments similar to those of Ivanov and Mizuuchi, in the absence of MinE, MinD dimers dissociate from a supported lipid bilayer at a rate of 0.210 s⁻¹ at low concentrations of supported-lipid-bilayer-bound MinD dimers [73]. Thus, for nonzero ω_{d→D}, I restrict ω_{d→D} to within 50% of 0.210 s⁻¹:

    0.105 ≤ ω_{d→D} ≤ 0.315 s⁻¹ if ω_{d→D} ≠ 0.    (4.21)

Additionally, in experiments similar to those of Ivanov and Mizuuchi, MinE-facilitated dissociation rates of MinD dimers from a supported lipid bilayer were found to be 0.14 s⁻¹, 0.17 s⁻¹, and 0.18 s⁻¹ with respective bulk MinE concentrations of 2.5 µM, 3 µM, and 4 µM [73]. Thus, I restrict MinE-facilitated dissociation rates of MinD dimers from the supported lipid bilayer to no less than 50% of 0.14 s⁻¹ and no more than 150% of 0.18 s⁻¹:

    0.07 ≤ ω_{de→D,E} + ω_{de→D,e} ≤ 0.27 s⁻¹,    (4.22a)

for the Modified Bonny Model, the Extended Bonny Model, and the Asymmetric Activation Model, and

    0.07 ≤ ω_{ede→E,D,E} + ω_{ede→E,D,e} + ω_{ede→D,e,e} ≤ 0.27 s⁻¹,    (4.22b)

for the Symmetric Activation Model.

Concentrations c_d, c_{de}, c_{ede}, c_{ded}, and c_e are only biologically relevant if nonnegative. Thus, I restrict c_d, c_{de}, c_{ede}, c_{ded}, and c_e to nonnegative values:

    c_{i,k} ≥ 0 for all i ∈ {d, de, ede, ded, e} and k ∈ I_Δ,    (4.23)

where c_{d,k}, c_{de,k}, c_{ede,k}, c_{ded,k}, and c_{e,k} are the values of c_d, c_{de}, c_{ede}, c_{ded}, and c_e at the kth index of the numerical discretization. Details of overlapping-niche descent on restricted domains are described in Section C.2.3.

4.4.4 Niches

I choose values of λ to define niches as in Section 3.4.3. For convenience, I repeat the discussion from Section 3.4.3 below. I choose 101 values of λ, λ_k for k = 1, 2, …, 101, to define 101 niches. The bounds (B.68) and (B.69), which state that r̆_y(λ) ≤ ε̄ if λ ≤ ε̄/(1 + ε̄) and r̆_Δx(λ) ≤ ε̄ if λ ≥ 1/(1 + ε̄) for some tolerance ε̄, provide a meaningful guide for the choice of λ_k. Thus, based on the bounds (B.68) and (B.69) with chosen ε̄ = b⁰, b⁻¹, …, b⁻⁵⁰ and base b such that b⁻⁵⁰ = 10⁻⁶, I define λ_k for k = 1, 2, …, 101 such that

    λ_k = b^{k−51}/(1 + b^{k−51}) if k ≤ 51;  1/(1 + b^{51−k}) if k > 51.    (4.24)

My choice of λ_k distributes the values of λ_k for k = 1, 2, …, 101 more densely near 0 and 1 and less densely near 0.5. For reference, λ_1 ≈ 10⁻⁶, λ_2 ≈ 1.3·10⁻⁶, λ_51 = 0.5, λ_52 ≈ 0.57, λ_100 ≈ 1 − 1.3·10⁻⁶, and λ_101 ≈ 1 − 10⁻⁶.
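A short Python sketch of this niche construction (the function name and parameters are illustrative):

    import numpy as np

    def niche_lambdas(n=101, eps_min=1e-6):
        """The 101 niche values of lambda from equation (4.24).

        b is chosen so that b**(-50) = eps_min, clustering the lambda_k
        densely near 0 and 1 and sparsely near 0.5.
        """
        b = eps_min ** (-1.0 / 50.0)
        k = np.arange(1, n + 1)
        return np.where(k <= 51,
                        b**(k - 51) / (1.0 + b**(k - 51)),
                        1.0 / (1.0 + b**(51.0 - k)))

    lam = niche_lambdas()
    # lam[0] ~ 1e-6, lam[50] = 0.5, lam[100] ~ 1 - 1e-6, matching the text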
4.4.5 Calculating Confidence Intervals

I calculate confidence intervals as in Section 3.4.4. For convenience, I repeat the discussion from Section 3.4.4 below. I calculate confidence intervals by bootstrapping, given the complex nonlinear relationship between data noise and parameter noise, which would not be adequately captured using a (Taylor-expansion-based) delta method [39]. In doing so, I calculate observable-state residuals,

    ε̃_{j,k} = y_{j,k} − g_j(p̃, x̃_{1,k}, …, x̃_{n_x,k}),    (4.25)

where p̃ = p̃_{λ_101} and x̃ = x̃_{λ_101}, the parameter and state values that minimize r(p,x;λ_101), and x̃_{i,k} is the value in x̃ from the ith state and the kth grid index, for i ∈ {1, 2, …, n_x}, j ∈ {1, 2, …, n_y}, and k ∈ {1, 2, …, n_t}. By resampling residuals, I generate n_b = 10³ bootstrap data sets:

    y_{j,k} = g_j(p̃, x̃_{1,k}, …, x̃_{n_x,k}) + ε̃_{j,l},    (4.26)

where l is randomly sampled with replacement from {1, 2, …, n_t}, for j ∈ {1, 2, …, n_y} and k ∈ I_Δ. I replace observed data values in r(p,x;λ) with bootstrap data values from the ith bootstrap data set to construct the functional r^b_i(p,x;λ). Globally minimizing r^b_i(p,x;λ) using overlapping-niche descent for all i ∈ {1, 2, …, n_b} would be computationally prohibitive. Rather, if residuals are not overly large, the optimal parameters and state values of r^b_i(p,x;λ) will generally be fairly similar to p̃ and x̃. Thus, with p̃ and x̃ as initial parameters and state values, I locally optimize r^b_i(p,x;λ_b) using accelerated descent, for all i ∈ {1, 2, …, n_b}, with λ_b chosen large enough to weight local optimization towards a numerical solution but not so large that p and x are fixed near p̃ and x̃. Specifically, I choose

    λ_b = arg min{ |r_y(p̃_λ, x̃_λ) − 10³ r_Δx(p̃_λ, x̃_λ)| : λ ∈ {λ_1, λ_2, …, λ_101} }.    (4.27)

From the n_b local optimizations, I construct a distribution of values for each parameter. From the distribution of values for parameter p_j, I compute the 2.5th and 97.5th percentile values, which I translate into the 95% confidence interval for parameter p_j, for j ∈ {1, 2, …, n_p}.
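A compact Python sketch of this residual-resampling bootstrap follows; the local optimizer `descend` is a hypothetical stand-in of mine for accelerated descent on r^b_i(p,x;λ_b), and the interfaces are illustrative assumptions.

    import numpy as np

    def bootstrap_intervals(g, p_opt, x_opt, y_data, descend, n_b=1000, seed=0):
        """95% bootstrap confidence intervals by resampling observable-state
        residuals, as in equations (4.25)-(4.26).

        g       : callable g(p, x) -> model observables, shape (n_y, n_t)
        p_opt,
        x_opt   : optimal parameters and states (the lambda_101 fit)
        y_data  : observed data, shape (n_y, n_t)
        descend : callable descend(y_boot, p_init, x_init) -> fitted
                  parameters; a placeholder for accelerated descent
        """
        rng = np.random.default_rng(seed)
        y_fit = g(p_opt, x_opt)
        resid = y_data - y_fit                      # residuals (4.25)
        n_t = y_data.shape[1]
        samples = []
        for _ in range(n_b):
            idx = rng.integers(0, n_t, size=n_t)    # resample with replacement
            y_boot = y_fit + resid[:, idx]          # bootstrap data (4.26)
            samples.append(descend(y_boot, p_opt, x_opt))
        samples = np.asarray(samples)
        # 2.5th and 97.5th percentiles of each parameter's distribution
        return np.percentile(samples, [2.5, 97.5], axis=0)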
4.5 Discussion

In this chapter, I briefly summarized extracting time-course data for model fitting from experimental measurements of the Min system. Then, I fit established and novel biochemistry-based models to the time-course data using my parameter estimation method for differential equations. Comparing models to the time-course data allowed me to make precise distinctions between biochemical assumptions in the various models. My modeling and fitting supported a novel model that accounts for MinE's previously unmodeled dual role as a stabilizer and an inhibitor of MinD membrane binding. It suggests specific biological functions for MinE membrane binding and dimerization, which play critical but somewhat unclear roles in Min system dynamics.

In my supported model, MinD and MinE form an unstable complex on the membrane, where one MinE dimer is bound to one MinD dimer, and a stable complex on the membrane, where one MinE dimer bridges two MinD dimers. Acting as an instability switch, MinE dimers that are unbound to MinD on the membrane bind to stable MinD-MinE complexes to form unstable MinD-MinE complexes. Pushing the system towards stability, MinD dimers that are unbound to MinE on the membrane bind to unstable MinD-MinE complexes to form stable MinD-MinE complexes, and bind to membrane-bound MinE dimers to inhibit the instability switch. As such, the concentration of MinD dimers that are unbound to MinE on the membrane modulates local stability or catastrophe in the aggregation of MinD and MinE on the membrane. MinE only associates with the membrane by binding to membrane-bound MinD, and MinE binding to membrane-bound MinD decreases the concentration of MinD dimers that are unbound to MinE on the membrane, which, when concentrations are sufficiently low, signals a switch from local stability to catastrophe in the aggregation of MinD and MinE on the membrane. Ultimately, my supported model suggests a regular, ordered, stability-switching mechanism that may underlie the emergent, dynamic behavior of the Min system.

To be reliable, a biological signal should be regular and ordered; to be informative, a biological signal should be sensitive to variability in its stimuli. My proposed stability-switching mechanism is regular and ordered, with a switch between stability and catastrophe, which would provide sensitivity to local variation in Min system dynamics. As such, my proposed stability-switching mechanism could provide a reliable signal that fluidly transduces local Min system dynamics into a global cellular signal.

Chapter 5

Conclusion

5.1 Summary of Results

• In Chapter 2, I developed a method that allowed me to calculate the optimal data-fitting numerical solution and its parameters for a differential equation model without using numerical integration. Additionally, I showed that my method admits conservation principles and integral representations that allow me to gauge the accuracy of my optimization.

• In Chapter 3, I tested my method on synthetic data and a system of first order ordinary differential equations, a system of second order ordinary differential equations, and a system of partial differential equations. I found that my method accurately identified the optimal data-fitting numerical solution and its parameters in all three contexts. I compared the performance of my method to that of an analogous numerical-integration-based method, and found that my method identified the optimal data-fitting numerical solution more robustly than the analogous numerical-integration-based method, while requiring significantly less time to do so. I also explored an example where my method informed modeling insufficiencies and potential model improvements for an incomplete variant of a model. Finally, I showed that my optimization routine converged to values that were consistent with my derived conservation principles and integral representations.

• In Chapter 4, I briefly summarized extracting time-course data for model fitting from experimental measurements of the Min system and fit established and novel biochemistry-based models to the time-course data using my method. Comparing the models to time-course data allowed me to make precise distinctions between biochemical assumptions in the various models. My modeling and fitting supported a novel model that accounts for MinE's previously unmodeled dual role as a stabilizer and an inhibitor of MinD membrane binding. It suggests that a regular, ordered, stability-switching mechanism underlies the emergent, dynamic behavior of the Min system.

5.2 Limitations of Overlapping-Niche Descent

I designed my method for global optimization with complex systems of differential equations. When computing descent in parallel on a computer cluster, my method has shown itself to be fairly computationally efficient. However, my method, as specified, could be computationally prohibitive when computing descent in serial on a single computer.
When testing my method,I found that for a relatively complex system of differential equations, the Bonny model withspatially homogeneous conditions, as defined in Equation 3.2, overlapping-niche descent wouldconverge to the optimal data-fitting numerical solution in several generations using only a fewniches. However, with only a few niches, conservation principles and integral representations areinaccurately calculated using numerical integration, and thus are not informative in gaugingthe accuracy and progress of optimization. Alternatively, for more robust optimization whencomputing descent in serial on a single computer, I suggest applying overlapping-niche descentwith many niches but only applying descent to several chosen individuals in each generation.This approach will be slower than applying descent to all individuals in each generationwhen computing descent in parallel on a computer cluster, but selection across niches willstill contribute to the synergistic minimization amongst individuals in different niches, andconservation principles and integral representations will be accurately calculated.5.3 Extensions of the Homotopy-Minimization MethodIn Section 3.7, I demonstrated how parameters and state values from overlapping-niche descent asλ→ 0+ could inform model shortcomings and potential model improvements. My demonstrationwas a proof of concept that was qualitative in nature, requiring model improvements to be inferredsimply by looking at a graph. The premise of my demonstration could be extended to develop asystematic method for informing model shortcomings and potential model improvements. Sucha method would be a valuable tool for refining models.My homotopy-minimization method, with conservation principles, integral representations,and overlapping-niche descent, applies to any differentiable function of the form h(v;λ) =(1− λ)h1(v) + λh2(v), where minh1(v) = 0 and minh2(v) = 0, with variable vector v. Thus,the method naturally extends to a wide class of optimization problems. For example, where h1(v)is a measure of how well state values fit data and h2(v) is a measure of how well state valuessatisfy a system of difference equations, my method naturally extends to find the parametervalues of the solution to the system of difference equations that fits data best. More generally,my homotopy-minimization method naturally extends to any constrained optimization problemthat minimizes h1(v) subject to constraints that can be formulated in a functional, h2(v).5.4 Limitations in Fitting Models to SpatiallyNear-Homogeneous Min DataIn Chapter 4, I fit models to spatially near-homogeneous Min data to compare how well modelscould describe the data. My fitting suggested that the Asymmetric Activation Model, asdiscussed in Section 4.3.3, could describe the near-homogeneous data best. In Section 4.3.3, Iexplored which reactions in the Asymmetric Activation Model were indispensable for describing915.5. Extensions of Fitting Models to Spatially Near-Homogeneous Min Datathe near-homogeneous data and found that quite a few reactions were superfluous. In doing so,I found that the Asymmetric Activation Model would admit numerical solutions that fit dataalmost identically well for a fairly large range of parameter values (not shown). As such, myasymmetric activation model requires refinements in included reactions and parameter estimatesfor biological realism. 
5.4 Limitations in Fitting Models to Spatially Near-Homogeneous Min Data

In Chapter 4, I fit models to spatially near-homogeneous Min data to compare how well models could describe the data. My fitting suggested that the Asymmetric Activation Model, as discussed in Section 4.3.3, could describe the near-homogeneous data best. In Section 4.3.3, I explored which reactions in the Asymmetric Activation Model were indispensable for describing the near-homogeneous data and found that quite a few reactions were superfluous. In doing so, I found that the Asymmetric Activation Model would admit numerical solutions that fit the data almost identically well for a fairly large range of parameter values (not shown). As such, the Asymmetric Activation Model requires refinements in included reactions and parameter estimates for biological realism. Fitting the Asymmetric Activation Model to the near-homogeneous data with pairwise null reactions would allow me to determine how well pairwise reductions of the Asymmetric Activation Model could describe the near-homogeneous data. Such an analysis would allow me to determine the most reduced form of the Asymmetric Activation Model that describes the near-homogeneous data well. As discussed in Section 4.4.3, I was able to confine some parameters to biologically realistic values by using parameter restrictions from experimental measurements. New experimental measurements, such as the dissociation rate of MinD at various concentrations from the supported lipid bilayer in the absence of MinE, will hopefully provide new parameter restrictions, which will allow me to confine parameter estimates to more biologically realistic values in future fitting of the Asymmetric Activation Model to the near-homogeneous data.

5.5 Extensions of Fitting Models to Spatially Near-Homogeneous Min Data

To unravel the local reaction mechanism of the Min system, I focused on fitting ordinary differential equation models to spatially near-homogeneous Min data. My modeling and fitting supported the Asymmetric Activation Model, which suggests that a regular, ordered, stability-switching mechanism underlies the emergent, dynamic behavior of the Min system. However, it is still unclear how local asymmetric activation reactions collectively contribute to dynamic pattern formation in Min protein bands on spatial scales that are much larger than the size of an individual Min protein. To address this, I would extend the Asymmetric Activation Model from a system of ordinary differential equations to a system of partial differential equations that describes how Min protein concentrations evolve in space and time. Then, using my homotopy-minimization method, I would fit the extended model to experimental measurements of traveling-wave Min protein bands on a supported lipid bilayer. Such modeling and fitting could aid in unraveling how the Min system transduces local interactions into a global signal.

My homotopy-minimization method could provide a novel computational assay for site-directed mutagenesis experiments, allowing me to map dynamic function in proteins to specific amino acids. In this homotopy-minimization mapping method, I would alter a single protein residue by site-directed mutagenesis. Then, using reproducible experiments, like those of Ivanov and Mizuuchi, I would measure dynamic protein behavior. Ultimately, I would fit a model to the measurements, like fitting the Asymmetric Activation Model to the near-homogeneous data, and would map changes in parameters from those fitted with the wild-type protein to the amino acid that I mutated. Other site-directed mutagenesis assays measure disruption of a specific protein function and often require deleterious mutations, which dramatically affect protein function. This homotopy-minimization mapping method would simultaneously map effects from a mutation across multiple functional states of a protein and would require mutations that only mildly alter protein function. It could also provide insight into dynamic protein function that is otherwise difficult to assay, such as cooperative binding.
Ultimately, this homotopy-minimization mapping method could provide a useful bridge between modeling and experiments that could contribute to our understanding of the dynamic structure of proteins.

Bibliography

[1] Satya Nanda Vel Arjunan and Masaru Tomita. A new multicompartmental reaction-diffusion modeling method links transient membrane attachment of E. coli MinE to E-ring formation. Systems and Synthetic Biology, 4(1):35–53, 2010.
[2] Daniel Axelrod, Thomas P Burghardt, and Nancy L Thompson. Total internal reflection fluorescence. Annual Review of Biophysics and Bioengineering, 13(1):247–268, 1984.
[3] Erfei Bi and J Lutkenhaus. Cell division inhibitors SulA and MinCD prevent formation of the FtsZ ring. Journal of Bacteriology, 175(4):1118–1125, 1993.
[4] Mike Bonny, Elisabeth Fischer-Friedrich, Martin Loose, Petra Schwille, and Karsten Kruse. Membrane binding of MinE allows for a comprehensive description of Min-protein pattern formation. PLoS Computational Biology, 9(12):e1003347, 2013.
[5] Peter Borowski and Eric N Cytrynbaum. Predictions from a stochastic polymer model for the MinDE protein dynamics in Escherichia coli. Physical Review E, 80(4):041916, 2009.
[6] James P. Boyle and Richard L. Dykstra. A method for finding projections onto the intersection of convex sets in Hilbert spaces. In Richard Dykstra, Tim Robertson, and Farroll T. Wright, editors, Advances in Order Restricted Statistical Inference, pages 28–47, New York, NY, 1986. Springer New York.
[7] D.A. Campbell and O. Chkrebtii. Maximum profile likelihood estimation of differential equation parameters through model based smoothing state estimates. Mathematical Biosciences, 246(2):283–292, 2013.
[8] David Campbell and Russell J. Steele. Smooth functional tempering for nonlinear differential equation models. Statistics and Computing, 22(2):429–443, 2012.
[9] David A. Campbell, Giles Hooker, and Kim B. McAuley. Parameter estimation in differential equation models with constrained states. Journal of Chemometrics, 26(6):322–332, 2012.
[10] Jiguo Cao and James O. Ramsay. Parameter cascades and profiling in functional data analysis. Computational Statistics, 22(3):335–351, 2007.
[11] J.P. Chandler, Doyle E. Hill, and H. Olin Spivey. A program for efficient integration of rate equations and least-squares fitting of chemical reaction data. Computers and Biomedical Research, 5(5):515–534, 1972.
[12] Brian D Corbin, XuanChuan Yu, and William Margolin. Exploring intracellular space: function of the Min system in round-shaped Escherichia coli. The EMBO Journal, 21(8):1998–2008, 2002.
[13] Eric N Cytrynbaum and Brandon DL Marshall. A multistranded polymer model explains MinDE dynamics in E. coli cell division. Biophysical Journal, 93(4):1134–1150, 2007.
[14] PA De Boer, Robin E Crossley, and Lawrence I Rothfield. Roles of MinC and MinD in the site-specific septation block mediated by the MinCDE system of Escherichia coli. Journal of Bacteriology, 174(1):63–70, 1992.
[15] Piet A.J. de Boer, Robin E. Crossley, and Lawrence I. Rothfield. A division inhibitor and a topological specificity factor coded for by the minicell locus determine proper placement of the division septum in E. coli. Cell, 56(4):641–649, 1989.
[16] Frank Deutsch and Hein Hundal. The rate of convergence of Dykstra's cyclic projections algorithm: the polyhedral case. Numerical Functional Analysis and Optimization, 15(5-6):537–565, 1994.
[17] Barbara Di Ventura and Victor Sourjik. Self-organized partitioning of dynamically localized proteins in bacterial cell division. Molecular Systems Biology, 7(1):457, 2011.
[18] Daniel M. Dunlavy and Dianne P. O'Leary. Homotopy optimization methods for global optimization. United States Dept. of Energy, 2005.
[19] Richard L. Dykstra. An algorithm for restricted least squares regression. Journal of the American Statistical Association, 78(384):837–842, 1983.
[20] David Fange and Johan Elf. Noise-induced Min phenotypes in E. coli. PLoS Computational Biology, 2(6):e80, 2006.
[21] Elisabeth Fischer-Friedrich, Giovanni Meacci, Joe Lutkenhaus, Hugues Chaté, and Karsten Kruse. Intra- and intercellular fluctuations in Min-protein dynamics decrease with cell length. Proceedings of the National Academy of Sciences, 107(14):6134–6139, 2010.
[22] Jacob Halatek and Erwin Frey. Highly canalized MinD transfer and MinE sequestration explain the origin of robust MinCDE-protein dynamics. Cell Reports, 1(6):741–752, 2012.
[23] Cynthia A Hale, Hans Meinhardt, and Piet AJ de Boer. Dynamic localization cycle of the cell division regulator MinE in Escherichia coli. The EMBO Journal, 20(7):1563–1572, 2001.
[24] Max Hoffmann and Ulrich S Schwarz. Oscillations of Min-proteins in micropatterned environments: a three-dimensional particle-based stochastic simulation approach. Soft Matter, 10(14):2388–2396, 2014.
[25] Martin Howard and Andrew D Rutenberg. Pattern formation inside bacteria: fluctuations due to the low copy number of proteins. Physical Review Letters, 90(12):128102, 2003.
[26] Martin Howard, Andrew D Rutenberg, and Simon de Vet. Dynamic compartmentalization of bacteria: accurate division in E. coli. Physical Review Letters, 87(27):278102, 2001.
[27] Cheng-Wei Hsieh, Ti-Yu Lin, Hsin-Mei Lai, Chu-Chi Lin, Ting-Sung Hsieh, and Yu-Ling Shih. Direct MinE–membrane interaction contributes to the proper localization of MinDE in E. coli. Molecular Microbiology, 75(2):499–512, 2010.
[28] Zonglin Hu, Edward P Gogol, and Joe Lutkenhaus. Dynamic assembly of MinD on phospholipid vesicles regulated by ATP and MinE. Proceedings of the National Academy of Sciences, 99(10):6761–6766, 2002.
[29] Zonglin Hu and Joe Lutkenhaus. Topological regulation of cell division in Escherichia coli involves rapid pole to pole oscillation of the division inhibitor MinC under the control of MinD and MinE. Molecular Microbiology, 34(1):82–90, 1999.
[30] Zonglin Hu and Joe Lutkenhaus. Analysis of MinC reveals two independent domains involved in interaction with MinD and FtsZ. Journal of Bacteriology, 182(14):3965–3971, 2000.
[31] Zonglin Hu and Joe Lutkenhaus. Topological regulation of cell division in E. coli: spatiotemporal oscillation of MinD requires stimulation of its ATPase by MinE and phospholipid. Molecular Cell, 7(6):1337–1343, 2001.
[32] Zonglin Hu and Joe Lutkenhaus. A conserved sequence at the C-terminus of MinD is required for binding to the membrane and targeting MinC to the septum. Molecular Microbiology, 47(2):345–355, 2003.
[33] Zonglin Hu, Amit Mukherjee, Sebastien Pichoff, and Joe Lutkenhaus. The MinC component of the division site selection system in Escherichia coli interacts with FtsZ to prevent polymerization. Proceedings of the National Academy of Sciences, 96(26):14819–14824, 1999.
[34] Zonglin Hu, Cristian Saez, and Joe Lutkenhaus. Recruitment of MinC, an inhibitor of Z-ring formation, to the membrane in Escherichia coli: role of MinD and MinE. Journal of Bacteriology, 185(1):196–203, 2003.
[35] Jian Huang, Chune Cao, and Joe Lutkenhaus. Interaction between FtsZ and inhibitors of cell division. Journal of Bacteriology, 178(17):5080–5085, 1996.
[36] Kerwyn Casey Huang, Yigal Meir, and Ned S Wingreen. Dynamic structures in Escherichia coli: spontaneous formation of MinE rings and MinD polar zones. Proceedings of the National Academy of Sciences, 100(22):12724–12728, 2003.
[37] Kerwyn Casey Huang and Ned S Wingreen. Min-protein oscillations in round bacteria. Physical Biology, 1(4):229, 2004.
[38] Vassili Ivanov and Kiyoshi Mizuuchi. Multiple modes of interconverting dynamic pattern formation by bacterial cell division proteins. Proceedings of the National Academy of Sciences, 107(18):8071–8078, 2010.
[39] M. Joshi, A. Seidel-Morgenstern, and A. Kremling. Exploiting the bootstrap method for quantifying parameter confidence intervals in dynamical systems. Metabolic Engineering, 8(5):447–455, 2006.
[40] Rex A Kerr, Herbert Levine, Terrence J Sejnowski, and Wouter-Jan Rappel. Division accuracy in a stochastic model of Min oscillations in Escherichia coli. Proceedings of the National Academy of Sciences of the United States of America, 103(2):347–352, 2006.
[41] Karsten Kruse. A dynamic model for determining the middle of Escherichia coli. Biophysical Journal, 82(2):618–627, 2002.
[42] Laura L Lackner, David M Raskin, and Piet AJ de Boer. ATP-dependent interactions between Escherichia coli Min proteins and the phospholipid membrane in vitro. Journal of Bacteriology, 185(3):735–749, 2003.
[43] Hua Liang and Hulin Wu. Parameter estimation for differential equation models using a framework of measurement error in regression models. Journal of the American Statistical Association, 103(484):1570–1583, 2008. PMID: 19956350.
[44] Martin Loose, Elisabeth Fischer-Friedrich, Christoph Herold, Karsten Kruse, and Petra Schwille. Min protein patterns emerge from rapid rebinding and membrane interaction of MinE. Nature Structural & Molecular Biology, 18(5):577–583, 2011.
[45] Martin Loose, Elisabeth Fischer-Friedrich, Jonas Ries, Karsten Kruse, and Petra Schwille. Spatial regulators for bacterial cell division self-organize into surface waves in vitro. Science, 320(5877):789–792, 2008.
[46] Lu-Yan Ma, Glenn King, and Lawrence Rothfield. Mapping the MinE site involved in interaction with the MinD division site selection protein of Escherichia coli. Journal of Bacteriology, 185(16):4948–4955, 2003.
[47] Giovanni Meacci and Karsten Kruse. Min-oscillations in Escherichia coli induced by interactions of membrane-bound proteins. Physical Biology, 2(2):89, 2005.
[48] Hans Meinhardt and Piet AJ de Boer. Pattern formation in Escherichia coli: a model for the pole-to-pole oscillations of Min proteins and the localization of the division site. Proceedings of the National Academy of Sciences, 98(25):14202–14207, 2001.
[49] Hongyu Miao, Carrie Dykes, Lisa M. Demeter, and Hulin Wu. Differential equation modeling of HIV viral fitness experiments: model identification, model selection, and multimodel inference. Biometrics, 65(1):292–300, 2009.
[50] Yurii Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR, 269(3):543–547, 1983.
[51] Brendan O'Donoghue and Emmanuel Candès. Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, 15(3):715–732, 2015.
[52] Kyung-Tae Park, Wei Wu, Kevin P Battaile, Scott Lovell, Todd Holyoak, and Joe Lutkenhaus. The Min oscillator uses MinD-dependent conformational changes in MinE to spatially regulate cytokinesis. Cell, 146(3):396–407, 2011.
[53] Kyung-Tae Park, Wei Wu, Scott Lovell, and Joe Lutkenhaus. Mechanism of the asymmetric activation of the MinD ATPase by MinE. Molecular Microbiology, 85(2):271–281, 2012.
[54] Zdeněk Petrášek and Petra Schwille. Simple membrane-based model of the Min oscillator. New Journal of Physics, 17(4):043023, 2015.
[55] Sebastien Pichoff and Joe Lutkenhaus. Escherichia coli division inhibitor MinCD blocks septation by preventing Z-ring formation. Journal of Bacteriology, 183(22):6630–6635, 2001.
[56] Sébastien Pichoff, Benedikt Vollrath, Christian Touriol, and Jean-Pierre Bouché. Deletion analysis of gene minE which encodes the topological specificity factor of cell division in Escherichia coli. Molecular Microbiology, 18(2):321–329, 1995.
[57] A.A. Poyton, M.S. Varziri, K.B. McAuley, P.J. McLellan, and J.O. Ramsay. Parameter estimation in continuous-time dynamic models using principal differential analysis. Computers and Chemical Engineering, 30(4):698–708, 2006.
[58] J. O. Ramsay, G. Hooker, D. Campbell, and J. Cao. Parameter estimation for differential equations: a generalized smoothing approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(5):741–796, 2007.
[59] David M. Raskin and Piet A.J. de Boer. The MinE ring: an FtsZ-independent cell structure required for selection of the correct division site in E. coli. Cell, 91(5):685–694, 1997.
[60] David M Raskin and Piet AJ de Boer. Rapid pole-to-pole oscillation of a protein required for directing division to the middle of Escherichia coli. Proceedings of the National Academy of Sciences, 96(9):4971–4976, 1999.
[61] S. L. Rowland, X. Fu, M. A. Sayed, Y. Zhang, W. R. Cook, and L. I. Rothfield. Membrane redistribution of the Escherichia coli MinD protein induced by MinE. Journal of Bacteriology, 182(3):613–619, 2000.
[62] Jeff B Schulte, Rene W Zeto, and David Roundy. Theoretical prediction of disrupted Min oscillation in flattened Escherichia coli. PLoS ONE, 10(10):e0139813, 2015.
[63] Jakob Schweizer, Martin Loose, Mike Bonny, Karsten Kruse, Ingolf Mönch, and Petra Schwille. Geometry sensing by self-organized protein patterns. Proceedings of the National Academy of Sciences, 109(38):15283–15288, 2012.
[64] Supratim Sengupta, Julien Derr, Anirban Sain, and Andrew D Rutenberg. Stuttering Min oscillations within E. coli bacteria: a stochastic polymerization model. Physical Biology, 9(5):056003, 2012.
[65] Supratim Sengupta and Andrew Rutenberg. Modeling partitioning of Min proteins between daughter cells after septation in Escherichia coli. Physical Biology, 4(3):145, 2007.
[66] Yu-Ling Shih, Kai-Fa Huang, Hsin-Mei Lai, Jiahn-Haur Liao, Chai-Siah Lee, Chiao-Min Chang, Huey-Ming Mak, Cheng-Wei Hsieh, and Chu-Chi Lin. The N-terminal amphipathic helix of the topological specificity factor MinE is associated with shaping membrane curvature. PLoS ONE, 6(6):e21425, 2011.
[67] Yu-Ling Shih, Xiaoli Fu, Glenn F King, Trung Le, and Lawrence Rothfield. Division site placement in E. coli: mutations that prevent formation of the MinE ring lead to loss of the normal midcell arrest of growth of polar MinD membrane domains. The EMBO Journal, 21(13):3347–3357, 2002.
[68] Yu-Ling Shih, Ikuro Kawagishi, and Lawrence Rothfield. The MreB and Min cytoskeletal-like systems play independent roles in prokaryotic polar differentiation. Molecular Microbiology, 58(4):917–928, 2005.
[69] Oleksii Sliusarenko, Jennifer Heinritz, Thierry Emonet, and Christine Jacobs-Wagner. High-throughput, subpixel precision analysis of bacterial morphogenesis and intracellular spatiotemporal dynamics. Molecular Microbiology, 80(3):612–627, 2011.
[70] Filipe Tostevin and Martin Howard. A stochastic model of Min oscillations in Escherichia coli and Min protein segregation during cell division. Physical Biology, 3(1):1, 2005.
[71] J. M. Varah. A spline least squares method for numerical parameter estimation in differential equations. SIAM Journal on Scientific and Statistical Computing, 3(1):28–46, 1982.
[72] Archana Varma, Kerwyn Casey Huang, and Kevin D Young. The Min system as a general cell geometry detection mechanism: branch lengths in Y-shaped Escherichia coli cells affect Min oscillation patterns and division dynamics. Journal of Bacteriology, 190(6):2106–2117, 2008.
[73] Anthony G Vecchiarelli, Min Li, Michiyo Mizuuchi, Ling Chin Hwang, Yeonee Seol, Keir C Neuman, and Kiyoshi Mizuuchi. Membrane-bound MinDE complex acts as a toggle switch that drives Min oscillation coupled to cytoplasmic depletion of MinD. Proceedings of the National Academy of Sciences, 113(11):E1479–E1488, 2016.
[74] Anthony G Vecchiarelli, Min Li, Michiyo Mizuuchi, and Kiyoshi Mizuuchi. Differential affinities of MinD and MinE to anionic phospholipid influence Min patterning dynamics in vitro. Molecular Microbiology, 93(3):453–463, 2014.
[75] James C Walsh, Christopher N Angstmann, Iain G Duggin, and Paul MG Curmi. Molecular interactions of the Min protein system reproduce spatiotemporal patterning in growing and dividing Escherichia coli cells. PLoS ONE, 10(5):e0128148, 2015.
[76] Wei Wu, Kyung-Tae Park, Todd Holyoak, and Joe Lutkenhaus. Determination of the structure of the MinD–ATP complex reveals the orientation of MinD on the membrane and the relative location of the binding sites for MinE and MinC. Molecular Microbiology, 79(6):1515–1528, 2011.
[77] Xin Qi and Hongyu Zhao. Asymptotic efficiency and finite-sample properties of the generalized profiling estimation of parameters in ordinary differential equations. The Annals of Statistics, 38(1):435–481, 2010.
[78] Hongqi Xue, Hongyu Miao, and Hulin Wu. Sieve estimation of constant and time-varying coefficients in nonlinear ordinary differential equation models by considering both numerical error and measurement error. Annals of Statistics, 38(4):2351–2387, 2010.
[79] Xiaolei Xun, Jiguo Cao, Bani Mallick, Arnab Maity, and Raymond J. Carroll. Parameter estimation of partial differential equation models. Journal of the American Statistical Association, 108(503):1009–1020, 2013.
[80] XuanChuan Yu and William Margolin. FtsZ ring clusters in min and partition mutants: role of both the Min system and the nucleoid in regulating FtsZ ring localization. Molecular Microbiology, 32(2):315–326, 1999.
[81] Huaijin Zhou, Ryan Schulze, Sandra Cox, Cristian Saez, Zonglin Hu, and Joe Lutkenhaus. Analysis of MinD mutations reveals residues required for MinE stimulation of the MinD ATPase and residues required for MinC interaction. Journal of Bacteriology, 187(2):629–638, 2005.
[82] Katja Zieske and Petra Schwille. Reconstitution of self-organizing protein gradients as spatial cues in cell-free systems. eLife, 3:e03949, 2014.
Appendix A

Extensions of the Homotopy-Minimization Method Beyond Systems of First Order Ordinary Differential Equations

Here, I extend the parameter estimation method described for systems of first order ordinary differential equations in Sections 2.2 and 2.6 to systems of higher order ordinary differential equations and systems of partial differential equations.

A.1 Extensions to Systems of Higher Order Ordinary Differential Equations

A higher order ordinary differential equation model of some dynamic process in t, with n_x states, x_1, x_2, …, x_{n_x}, n_p parameters, p_1, p_2, …, p_{n_p}, and n_y observable states, y_1, y_2, …, y_{n_y}, is defined by the system of equations,

    F_i(t, p_1, …, p_{n_p}, x_1, …, x_{n_x}, dx_1/dt, d²x_1/dt², d³x_1/dt³, …, dx_2/dt, d²x_2/dt², …) = 0,    (A.1a)
    y_j = g_j(p_1, p_2, …, p_{n_p}, x_1, x_2, …, x_{n_x}).    (A.1b)

For some observed data values, y_{1,k}, y_{2,k}, …, y_{n_y,k}, measured at t_k, for k ∈ {1, 2, …, n_t}, I extend the method described for first order ordinary differential equation models in Sections 2.2 and 2.6 to find the parameters p_1, p_2, …, p_{n_p} of the numerical solution to the differential equation model with the observable state values that most closely approximate the observed data, in some sense. As with first order ordinary differential equation models, once a numerical method is chosen, equation (A.1a) can be formulated into a method-dependent system of equations for the discrete numerical solution values x_{i,k}:

    f_{i,k}(t_1, …, t_{n_t}, p_1, …, p_{n_p}, x_{1,1}, x_{2,1}, …, x_{n_x,n_t}) = f_{i,k}(t, p, x) = 0,    (A.2)
, nx} and for all k ∈ I∆, and finds theparameters and state values of the optimal data-fitting numerical solution.A.1.1 Defining ry(p,x), ryˆ(p,x), and r∆x(p,x)As with first order ordinary differential equation models, I consider systems of higher orderordinary differential equations that are linear in first derivatives of xi,dxidt= F¯i(t, p1, . . . , pnp , x1, . . . , xnx ,d2x1dt2,d3x1dt3, . . . ,d2x2dt2, . . .), (A.3)for i ∈ {1, 2, . . . , nx}. In terms of Fi as defined in equation (A.1a),Fi =dxidt− F¯i(t, p1, . . . , pnp , x1, . . . , xnx ,d2x1dt2,d3x1dt3, . . . ,d2x2dt2, . . .), (A.4)for i ∈ {1, 2, . . . , nx}. Also, as with first order ordinary differential equation models, I considerfinite difference numerical methods, of the form∆1xi,k =Fi,k( {F¯i(tk,p, x1,k, . . . , xnx,k,∆2x1,k,∆3x1,k, . . . ,∆2x2,k, . . . ) : k ∈ I∆} )=Fi,k(t,p,x), (A.5)where xi,k are numerical solution values, ∆nxi,k is some method-dependent finite differencediscretization of dnxidtn at tk, and Fi,k are some method-dependent functions of F¯i at tk, for all102A.2. Extensions to Systems of Partial Differential Equationsi ∈ {1, 2, . . . , nx} and for all k ∈ I∆. In terms fi,k as defined in equation (A.2),fi,k(t,p,x) = ∆1xi,k − Fi,k(t,p,x). (A.6)Thus, using a normalized least-squares measure, as described for first order ordinary differentialequation models in Sections 2.3, 2.4.1, and 2.6.1,ry(p,x) =1nyny∑j=1(1∑ntk=1wj,ky2j,knt∑k=1wj,k(yj,k − gj(p, x1,k, . . . , xnx,k))2), (A.7a)ryˆ(p,x) =1nyny∑j=1 σˆ∑k∈Iyˆ wˆj,kyˆ2j,k∑k∈Iyˆwˆj,k(yˆj,k − gj(p, x1,k, . . . , xnx,k))2 , (A.7b)r∆x(p,x) =1nxnx∑i=1 si(x)∑k∈I∆(∆1xi,k)2∑k∈I∆(∆1xi,k − Fi,k(t,p,x))2 , (A.7c)si(x) = αi + βi(4∑k∈I∆\{1,nt}(xi,k− − 2xi,k + xi,k+)2∑k∈I∆\{1,nt}(xi,k+ − xi,k−)2)γi, (A.7d)for some data-dependent weights, wj,k, some interpolated data-dependent weights, wˆj,k, scalingparameter σˆ, as discussed in Section 2.6.1, and smoothing-penalty parameters, αi > 0, βi ≥ 0,and γi ≥ 0, as discussed in Section 2.4.1, where k− and k+ are the indices below and above k inI∆.A.2 Extensions to Systems of Partial Differential EquationsFor simplicity in notation, I present a partial differential equation model with two independentvariables. Although, the method naturally extends to models with any finite number ofindependent variables.A partial differential equation model of some dynamic process in u and v, with nx states,x1, x2, . . . , xnx , np parameters, p1, p2, . . . , pnp , and ny observable states, y1, y2, . . . , yny , is definedby the system of equations,Fi(u, v, p1, . . . , pnp , x1, . . . , xnx ,∂x1∂u,∂x1∂v,∂2x1∂2u,∂2x1∂u∂v, . . . ,∂x2∂u,∂x2∂v, . . .)= 0, (A.8a)yj = gj(p1, p2, . . . , pnp , x1, x2, . . . , xnx). (A.8b)For some observed data values, y1,k,l, y2,k,l, . . . , yny ,k,l, measured at (uk, vl), for k ∈ {1, 2, . . . , nu}and l ∈ {1, 2, . . . , nv}, I extend the method described for first order ordinary differential equationmodels in Sections 2.2 and 2.6, to find the parameters p1, p2, . . . , pnp of the numerical solution tothe differential equation model with the observable state values that most closely approximatethe observed data, in some sense. As with first order ordinary differential equation models, once103A.2. Extensions to Systems of Partial Differential Equationsa numerical method is chosen, equation (A.8a) can be formulated into a method-dependentsystem of equations for the discrete numerical solution values xi,k,l:fi,k,l(u1, . . . , unu , v1, . . . , vnv , p1, . . . , pnp , x1,1,1, x2,1,1, . . . 
A.2 Extensions to Systems of Partial Differential Equations

For simplicity in notation, I present a partial differential equation model with two independent variables, although the method naturally extends to models with any finite number of independent variables.

A partial differential equation model of some dynamic process in u and v, with n_x states, x_1, x_2, …, x_{n_x}, n_p parameters, p_1, p_2, …, p_{n_p}, and n_y observable states, y_1, y_2, …, y_{n_y}, is defined by the system of equations,

    F_i(u, v, p_1, …, p_{n_p}, x_1, …, x_{n_x}, ∂x_1/∂u, ∂x_1/∂v, ∂²x_1/∂u², ∂²x_1/∂u∂v, …, ∂x_2/∂u, ∂x_2/∂v, …) = 0,    (A.8a)
    y_j = g_j(p_1, p_2, …, p_{n_p}, x_1, x_2, …, x_{n_x}).    (A.8b)

For some observed data values, y_{1,k,l}, y_{2,k,l}, …, y_{n_y,k,l}, measured at (u_k, v_l), for k ∈ {1, 2, …, n_u} and l ∈ {1, 2, …, n_v}, I extend the method described for first order ordinary differential equation models in Sections 2.2 and 2.6 to find the parameters p_1, p_2, …, p_{n_p} of the numerical solution to the differential equation model with the observable state values that most closely approximate the observed data, in some sense. As with first order ordinary differential equation models, once a numerical method is chosen, equation (A.8a) can be formulated into a method-dependent system of equations for the discrete numerical solution values x_{i,k,l}:

    f_{i,k,l}(u_1, …, u_{n_u}, v_1, …, v_{n_v}, p_1, …, p_{n_p}, x_{1,1,1}, x_{2,1,1}, …, x_{n_x,n_u,n_v}) = f_{i,k,l}(u, v, p, x) = 0,    (A.9)

for all i ∈ {1, 2, …, n_x} and for all (k, l) ∈ I_Δu × I_Δv, the index set of the numerical discretization. Using some interpolation method, I generate interpolated data, ŷ_{j,k,l} for j ∈ {1, 2, …, n_y}, at grid points with indices in I_ŷu × I_ŷv = I_Δu \ {1, 2, …, n_u} × I_Δv \ {1, 2, …, n_v} from observed data values with indices in {1, 2, …, n_u} × {1, 2, …, n_v}. As with first order ordinary differential equation models, I define the functionals r_y(p,x), r_ŷ(p,x), and r_Δx(p,x) with the properties that (i) r_y(p,x) ≥ 0, r_ŷ(p,x) ≥ 0, and r_Δx(p,x) ≥ 0; (ii) r_y(p,x) = 0 if and only if g_j(p, x_{1,k,l}, …, x_{n_x,k,l}) = y_{j,k,l} for all j ∈ {1, 2, …, n_y} and for all (k, l) ∈ {1, 2, …, n_u} × {1, 2, …, n_v}; r_ŷ(p,x) = 0 if and only if I_Δu × I_Δv = {1, 2, …, n_u} × {1, 2, …, n_v} or g_j(p, x_{1,k,l}, …, x_{n_x,k,l}) = ŷ_{j,k,l} for all j ∈ {1, 2, …, n_y} and for all (k, l) ∈ I_ŷu × I_ŷv; and r_Δx(p,x) = 0 if and only if f_{i,k,l}(u,v,p,x) = 0 for all i ∈ {1, 2, …, n_x} and for all (k, l) ∈ I_Δu × I_Δv; and (iii) r_y(p₁,x₁) < r_y(p₂,x₂) implies that (p₁,x₁) gives a better fit to the data than does (p₂,x₂); r_ŷ(p₁,x₁) < r_ŷ(p₂,x₂) implies that (p₁,x₁) gives a better fit to the interpolated data than does (p₂,x₂); and r_Δx(p₁,x₁) < r_Δx(p₂,x₂) implies that (p₁,x₁) satisfies the numerical solution method better than (p₂,x₂) does. Then, I combine r_y(p,x), r_ŷ(p,x), and r_Δx(p,x) into the single functional r(p,x;λ) = (1 − λ)r_y(p,x) + (1 − λ)²r_ŷ(p,x) + λr_Δx(p,x). I describe the construction of r_y(p,x), r_ŷ(p,x), and r_Δx(p,x) using a normalized least-squares measure in Section A.2.1. As with first order ordinary differential equation models, minimizing r(p,x;λ) as λ → 1⁻ is equivalent to minimizing r_y(p,x) subject to the constraints f_{i,k,l}(u,v,p,x) = 0 for all i ∈ {1, 2, …, n_x} and for all (k, l) ∈ I_Δu × I_Δv, and finds the parameters and state values of the optimal data-fitting numerical solution.

A.2.1 Defining r_y(p,x), r_ŷ(p,x), and r_Δx(p,x)

As with first order ordinary differential equation models, I consider systems of partial differential equations that are linear in first derivatives of x_i with respect to u,

    ∂x_i/∂u = F̄_i(u, v, p_1, …, p_{n_p}, x_1, …, x_{n_x}, ∂x_1/∂v, ∂²x_1/∂u², ∂²x_1/∂u∂v, …, ∂x_2/∂v, …),    (A.10)

for i ∈ {1, 2, …, n_x}. In terms of F_i as defined in equation (A.8a),

    F_i = ∂x_i/∂u − F̄_i(u, v, p_1, …, p_{n_p}, x_1, …, x_{n_x}, ∂x_1/∂v, ∂²x_1/∂u², ∂²x_1/∂u∂v, …, ∂x_2/∂v, …),    (A.11)

for i ∈ {1, 2, …, n_x}. Also, as with first order ordinary differential equation models, I consider finite difference numerical methods of the form

    Δ^{1,0}x_{i,k,l} = F_{i,k,l}({F̄_i(u_k, v_l, p, x_{1,k,l}, …, x_{n_x,k,l}, Δ^{0,1}x_{1,k,l}, Δ^{2,0}x_{1,k,l}, …) : (k, l) ∈ I_Δu × I_Δv}) = F_{i,k,l}(u,v,p,x),    (A.12)

where x_{i,k,l} are numerical solution values, Δ^{n,m}x_{i,k,l} is some method-dependent finite difference discretization of ∂^{n+m}x_i/∂uⁿ∂vᵐ at (u_k, v_l), and F_{i,k,l} are some method-dependent functions of F̄_i at (u_k, v_l), for all i ∈ {1, 2, …, n_x} and for all (k, l) ∈ I_Δu × I_Δv. In terms of f_{i,k,l} as defined in equation (A.9),

    f_{i,k,l}(u,v,p,x) = Δ^{1,0}x_{i,k,l} − F_{i,k,l}(u,v,p,x).    (A.13)
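As a concrete illustration of the form (A.10)-(A.13), my own example (not drawn from the text) is a reaction-diffusion equation with u playing the role of time and v the role of space:

\[
\frac{\partial x_1}{\partial u} = p_1\frac{\partial^2 x_1}{\partial v^2} - p_2 x_1,
\]
which is already of the form (A.10) with \(\bar F_1 = p_1\,\partial^2 x_1/\partial v^2 - p_2 x_1\). A forward-in-\(u\), centered-in-\(v\) discretization then gives, for instance,
\[
\Delta^{1,0}x_{1,k,l} = \frac{x_{1,k+1,l} - x_{1,k,l}}{\Delta u}, \qquad
F_{1,k,l} = p_1\,\frac{x_{1,k,l+1} - 2x_{1,k,l} + x_{1,k,l-1}}{\Delta v^2} - p_2 x_{1,k,l},
\]
so that \(f_{1,k,l} = \Delta^{1,0}x_{1,k,l} - F_{1,k,l}\) vanishes exactly when the discretized states satisfy the scheme.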
\begin{align}
r_y(\mathbf{p},\mathbf{x}) &= \frac{1}{n_y}\sum_{j=1}^{n_y}\left(\frac{1}{\sum_{k=1}^{n_u}\sum_{l=1}^{n_v} w_{j,k,l}\, y_{j,k,l}^2}\sum_{k=1}^{n_u}\sum_{l=1}^{n_v} w_{j,k,l}\bigl(y_{j,k,l} - g_j(\mathbf{p}, x_{1,k,l}, \ldots, x_{n_x,k,l})\bigr)^2\right), \tag{A.14a}\\
r_{\hat{y}}(\mathbf{p},\mathbf{x}) &= \frac{1}{n_y}\sum_{j=1}^{n_y}\left(\frac{\hat{\sigma}}{\sum_{k \in I_{\hat{y}u}}\sum_{l \in I_{\hat{y}v}} \hat{w}_{j,k,l}\, \hat{y}_{j,k,l}^2}\sum_{k \in I_{\hat{y}u}}\sum_{l \in I_{\hat{y}v}} \hat{w}_{j,k,l}\bigl(\hat{y}_{j,k,l} - g_j(\mathbf{p}, x_{1,k,l}, \ldots, x_{n_x,k,l})\bigr)^2\right), \tag{A.14b}\\
r_{\Delta x}(\mathbf{p},\mathbf{x}) &= \frac{1}{n_x}\sum_{i=1}^{n_x}\left(\frac{s^u_i(\mathbf{x}) + s^v_i(\mathbf{x})}{2\sum_{k \in I_{\Delta u}}\sum_{l \in I_{\Delta v}} (\Delta_{1,0} x_{i,k,l})^2}\sum_{k \in I_{\Delta u}}\sum_{l \in I_{\Delta v}}\bigl(\Delta_{1,0} x_{i,k,l} - F_{i,k,l}(\mathbf{u},\mathbf{v},\mathbf{p},\mathbf{x})\bigr)^2\right), \tag{A.14c}\\
s^u_i(\mathbf{x}) &= \alpha_i + \beta_i\left(\frac{4\sum_{k \in I_{\Delta u}\setminus\{1,n_u\}}\sum_{l \in I_{\Delta v}} (x_{i,k_-,l} - 2x_{i,k,l} + x_{i,k_+,l})^2}{\sum_{k \in I_{\Delta u}\setminus\{1,n_u\}}\sum_{l \in I_{\Delta v}} (x_{i,k_+,l} - x_{i,k_-,l})^2}\right)^{\gamma_i}, \tag{A.14d}\\
s^v_i(\mathbf{x}) &= \alpha_i + \beta_i\left(\frac{4\sum_{k \in I_{\Delta u}}\sum_{l \in I_{\Delta v}\setminus\{1,n_v\}} (x_{i,k,l_-} - 2x_{i,k,l} + x_{i,k,l_+})^2}{\sum_{k \in I_{\Delta u}}\sum_{l \in I_{\Delta v}\setminus\{1,n_v\}} (x_{i,k,l_+} - x_{i,k,l_-})^2}\right)^{\gamma_i}, \tag{A.14e}
\end{align}
for some data-dependent weights, $w_{j,k,l}$, some interpolated-data-dependent weights, $\hat{w}_{j,k,l}$, scaling parameter $\hat{\sigma}$, as discussed in Section 2.6.1, and smoothing-penalty parameters, $\alpha_i > 0$, $\beta_i \geq 0$, and $\gamma_i \geq 0$, as discussed in Section 2.4.1, where $k_-$ and $k_+$ are the indices below and above $k$ in $I_{\Delta u}$, and $l_-$ and $l_+$ are the indices below and above $l$ in $I_{\Delta v}$.

I note that under the linear indexing $m = (k,l) \in I_{\Delta u} \times I_{\Delta v} = I_\Delta$, with $\mathbf{t} = \mathbf{u} \times \mathbf{v}$ and $n_t = n_u n_v$, $r(\mathbf{p},\mathbf{x};\lambda)$ constructed from the functionals in equation (A.14) is equivalent in form to $r(\mathbf{p},\mathbf{x};\lambda)$ of equation (D.1). Thus, the computational complexity count from Section D.1 applies to descent on $r(\mathbf{p},\mathbf{x};\lambda)$ with PDEs.
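Whether for ODE or PDE models, the homotopy functional is assembled the same way. The sketch below (function and variable names are my own, and the geometric clustering of $\lambda$ values toward $1$ is an illustrative assumption, not the dissertation's prescription) shows the assembly and a niche grid of $\lambda$ values of the kind that overlapping-niche descent minimizes over.

\begin{verbatim}
import numpy as np

def r_total(p, x, lam, r_y, r_yhat, r_dx):
    """The homotopy functional r(p, x; lambda) =
    (1 - lam) r_y + (1 - lam)^2 r_yhat + lam r_dx."""
    return ((1.0 - lam) * r_y(p, x)
            + (1.0 - lam) ** 2 * r_yhat(p, x)
            + lam * r_dx(p, x))

# lambda -> 0+ enforces the data fit; lambda -> 1- enforces the numerical method
lambdas = 1.0 - np.geomspace(0.5, 1e-6, 25)
\end{verbatim}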
Appendix B

Properties of $r(\mathbf{p},\mathbf{x};\lambda)$

Here, I derive properties imposed on $r(\mathbf{p},\mathbf{x};\lambda)$ by minimization. For $\lambda \in (0,1)$, the parameters and state values that minimize $r(\mathbf{p},\mathbf{x};\lambda)$, $\breve{\mathbf{p}}_\lambda$ and $\breve{\mathbf{x}}_\lambda$, allow me to define functions $\breve{r}(\lambda)$, $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$, as follows:
\[
\breve{r}(\lambda) = (1-\lambda)\breve{r}_y(\lambda) + (1-\lambda)^2\breve{r}_{\hat{y}}(\lambda) + \lambda\breve{r}_{\Delta x}(\lambda) = (1-\lambda)r_y(\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda) + (1-\lambda)^2 r_{\hat{y}}(\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda) + \lambda r_{\Delta x}(\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda). \tag{B.1}
\]
For any $\lambda \in (0,1)$, no set of parameters and state values can further minimize $r(\mathbf{p},\mathbf{x};\lambda)$. Thus, for all $\lambda \in (0,1)$ and for all $\varepsilon$ such that $\lambda + \varepsilon \in (0,1)$, evaluating the minimizer at $\lambda+\varepsilon$ in the functional with weight $\lambda$ gives
\[
(1-\lambda)\breve{r}_y(\lambda) + (1-\lambda)^2\breve{r}_{\hat{y}}(\lambda) + \lambda\breve{r}_{\Delta x}(\lambda) \leq (1-\lambda)\breve{r}_y(\lambda+\varepsilon) + (1-\lambda)^2\breve{r}_{\hat{y}}(\lambda+\varepsilon) + \lambda\breve{r}_{\Delta x}(\lambda+\varepsilon), \tag{B.2}
\]
and evaluating the minimizer at $\lambda$ in the functional with weight $\lambda+\varepsilon$ gives
\[
(1-\lambda-\varepsilon)\breve{r}_y(\lambda+\varepsilon) + (1-\lambda-\varepsilon)^2\breve{r}_{\hat{y}}(\lambda+\varepsilon) + (\lambda+\varepsilon)\breve{r}_{\Delta x}(\lambda+\varepsilon) \leq (1-\lambda-\varepsilon)\breve{r}_y(\lambda) + (1-\lambda-\varepsilon)^2\breve{r}_{\hat{y}}(\lambda) + (\lambda+\varepsilon)\breve{r}_{\Delta x}(\lambda). \tag{B.3}
\]
From these relations, I can determine some properties of the imposed structure on $\breve{r}(\lambda)$.

B.1 Limiting Behavior of $\breve{r}(\lambda)$

Theorem 1.
\begin{align*}
\lim_{\lambda\to 0^+}\breve{r}(\lambda) &= \lim_{\lambda\to 0^+}\min r(\mathbf{p},\mathbf{x};\lambda) = 0, & \lim_{\lambda\to 0^+}\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda &= \arg\min\bigl(r_{\Delta x}(\mathbf{p},\mathbf{x}) : r_y(\mathbf{p},\mathbf{x}) = 0,\ r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0\bigr),\\
\lim_{\lambda\to 1^-}\breve{r}(\lambda) &= \lim_{\lambda\to 1^-}\min r(\mathbf{p},\mathbf{x};\lambda) = 0, & \lim_{\lambda\to 1^-}\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda &= \arg\min\bigl(r_y(\mathbf{p},\mathbf{x}) : r_{\Delta x}(\mathbf{p},\mathbf{x}) = 0\bigr).
\end{align*}

Proof. By construction, $\min\bigl(r_y(\mathbf{p},\mathbf{x}) + r_{\hat{y}}(\mathbf{p},\mathbf{x})\bigr) = 0$, where observable state values match observed data and interpolated data. Thus,
\[
\lim_{\lambda\to 0^+}\breve{r}(\lambda) = \lim_{\lambda\to 0^+}\min\bigl((1-\lambda)r_y(\mathbf{p},\mathbf{x}) + (1-\lambda)^2 r_{\hat{y}}(\mathbf{p},\mathbf{x}) + \lambda r_{\Delta x}(\mathbf{p},\mathbf{x})\bigr) = \min\bigl(r_y(\mathbf{p},\mathbf{x}) + r_{\hat{y}}(\mathbf{p},\mathbf{x})\bigr) = 0. \tag{B.4}
\]
Thus,
\[
\lim_{\lambda\to 0^+}\arg\min r(\mathbf{p},\mathbf{x};\lambda) \in \{\mathbf{p},\mathbf{x} : r_y(\mathbf{p},\mathbf{x}) + r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0\} \implies \lim_{\lambda\to 0^+}\arg\min r(\mathbf{p},\mathbf{x};\lambda) \in \{\mathbf{p},\mathbf{x} : r_y(\mathbf{p},\mathbf{x}) = 0,\ r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0\}, \tag{B.5}
\]
which implies that
\[
\lim_{\lambda\to 0^+}\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda = \lim_{\lambda\to 0^+}\arg\min\bigl(\lambda r_{\Delta x}(\mathbf{p},\mathbf{x}) : r_y(\mathbf{p},\mathbf{x}) = 0,\ r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0\bigr) = \arg\min\bigl(r_{\Delta x}(\mathbf{p},\mathbf{x}) : r_y(\mathbf{p},\mathbf{x}) = 0,\ r_{\hat{y}}(\mathbf{p},\mathbf{x}) = 0\bigr). \tag{B.6}
\]
By construction, $\min r_{\Delta x}(\mathbf{p},\mathbf{x}) = 0$, where parameters and state values satisfy the chosen numerical solution method. Thus,
\[
\lim_{\lambda\to 1^-}\breve{r}(\lambda) = \lim_{\lambda\to 1^-}\min\bigl((1-\lambda)r_y(\mathbf{p},\mathbf{x}) + (1-\lambda)^2 r_{\hat{y}}(\mathbf{p},\mathbf{x}) + \lambda r_{\Delta x}(\mathbf{p},\mathbf{x})\bigr) = \min r_{\Delta x}(\mathbf{p},\mathbf{x}) = 0. \tag{B.7}
\]
Thus,
\[
\lim_{\lambda\to 1^-}\arg\min r(\mathbf{p},\mathbf{x};\lambda) \in \{\mathbf{p},\mathbf{x} : r_{\Delta x}(\mathbf{p},\mathbf{x}) = 0\}, \tag{B.8}
\]
which implies that
\[
\lim_{\lambda\to 1^-}\breve{\mathbf{p}}_\lambda,\breve{\mathbf{x}}_\lambda = \lim_{\lambda\to 1^-}\arg\min\bigl(r_y(\mathbf{p},\mathbf{x}) + (1-\lambda)r_{\hat{y}}(\mathbf{p},\mathbf{x}) : r_{\Delta x}(\mathbf{p},\mathbf{x}) = 0\bigr) = \arg\min\bigl(r_y(\mathbf{p},\mathbf{x}) : r_{\Delta x}(\mathbf{p},\mathbf{x}) = 0\bigr). \tag{B.9}
\]

B.2 Continuity of $\breve{r}(\lambda)$

Theorem 2. $\breve{r}(\lambda)$ is continuous for $\lambda \in (0,1)$.

Proof. Expanding $(1-\lambda-\varepsilon)$ and $(1-\lambda-\varepsilon)^2$ in equation (B.3) and collecting the $\varepsilon$ terms gives
\begin{multline}
(1-\lambda)\breve{r}_y(\lambda+\varepsilon) + (1-\lambda)^2\breve{r}_{\hat{y}}(\lambda+\varepsilon) + \lambda\breve{r}_{\Delta x}(\lambda+\varepsilon) \leq (1-\lambda)\breve{r}_y(\lambda) + (1-\lambda)^2\breve{r}_{\hat{y}}(\lambda) + \lambda\breve{r}_{\Delta x}(\lambda) + \varepsilon\bigl(\breve{r}_y(\lambda+\varepsilon) - \breve{r}_y(\lambda)\bigr)\\
+ 2\varepsilon(1-\lambda)\bigl(\breve{r}_{\hat{y}}(\lambda+\varepsilon) - \breve{r}_{\hat{y}}(\lambda)\bigr) + \varepsilon^2\bigl(\breve{r}_{\hat{y}}(\lambda) - \breve{r}_{\hat{y}}(\lambda+\varepsilon)\bigr) + \varepsilon\bigl(\breve{r}_{\Delta x}(\lambda) - \breve{r}_{\Delta x}(\lambda+\varepsilon)\bigr), \tag{B.10}
\end{multline}
which, in conjunction with equation (B.2), sandwiches $(1-\lambda)\breve{r}_y(\lambda+\varepsilon) + (1-\lambda)^2\breve{r}_{\hat{y}}(\lambda+\varepsilon) + \lambda\breve{r}_{\Delta x}(\lambda+\varepsilon)$ between $\breve{r}(\lambda)$ and $\breve{r}(\lambda)$ plus terms of order $\varepsilon$ (B.11), for all $\lambda \in (0,1)$ and for all $\varepsilon$ such that $\lambda+\varepsilon \in (0,1)$. Taking $\varepsilon \to 0$ in this sandwich,
\[
\breve{r}(\lambda) \leq \lim_{\varepsilon\to 0}\breve{r}(\lambda+\varepsilon) \leq \breve{r}(\lambda), \tag{B.12}
\]
which implies that
\[
\lim_{\varepsilon\to 0}\breve{r}(\lambda+\varepsilon) = \breve{r}(\lambda). \tag{B.13}
\]
Therefore, $\breve{r}(\lambda)$ is continuous for $\lambda \in (0,1)$.

B.3 Differentiability of $\breve{r}(\lambda)$

Theorem 3. For $\lambda \in (0,1)$ such that $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable,
\[
(1-\lambda)\frac{d\breve{r}_y(\lambda)}{d\lambda} + (1-\lambda)^2\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda} + \lambda\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda} = 0, \qquad\text{and}\qquad \frac{d\breve{r}(\lambda)}{d\lambda} = \breve{r}_{\Delta x}(\lambda) - \breve{r}_y(\lambda) - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda).
\]

Proof. Rearranging equations (B.2) and (B.3), for all $\lambda \in (0,1)$ and for all $\varepsilon$ such that $\lambda+\varepsilon \in (0,1)$,
\begin{align}
\lambda\bigl(\breve{r}_{\Delta x}(\lambda) - \breve{r}_{\Delta x}(\lambda+\varepsilon)\bigr) &\leq (1-\lambda)\bigl(\breve{r}_y(\lambda+\varepsilon) - \breve{r}_y(\lambda)\bigr) + (1-\lambda)^2\bigl(\breve{r}_{\hat{y}}(\lambda+\varepsilon) - \breve{r}_{\hat{y}}(\lambda)\bigr), \tag{B.14a}\\
(1-\lambda)\bigl(\breve{r}_y(\lambda+\varepsilon) - \breve{r}_y(\lambda)\bigr) + (1-\lambda)^2\bigl(\breve{r}_{\hat{y}}(\lambda+\varepsilon) - \breve{r}_{\hat{y}}(\lambda)\bigr) &\leq \lambda\bigl(\breve{r}_{\Delta x}(\lambda) - \breve{r}_{\Delta x}(\lambda+\varepsilon)\bigr) + \varepsilon\bigl(\breve{r}_y(\lambda+\varepsilon) - \breve{r}_y(\lambda)\bigr) + 2\varepsilon(1-\lambda)\bigl(\breve{r}_{\hat{y}}(\lambda+\varepsilon) - \breve{r}_{\hat{y}}(\lambda)\bigr)\notag\\
&\quad + \varepsilon^2\bigl(\breve{r}_{\hat{y}}(\lambda) - \breve{r}_{\hat{y}}(\lambda+\varepsilon)\bigr) + \varepsilon\bigl(\breve{r}_{\Delta x}(\lambda) - \breve{r}_{\Delta x}(\lambda+\varepsilon)\bigr), \tag{B.14b}
\end{align}
which combine into a two-sided bound (B.15). Dividing (B.15) through by $\varepsilon > 0$ (B.16) and, for $\lambda$ where $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable, and thus also continuous, taking $\varepsilon \to 0^+$ squeezes both sides to the same limit:
\[
-\lambda\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda} \leq (1-\lambda)\frac{d\breve{r}_y(\lambda)}{d\lambda} + (1-\lambda)^2\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda} \leq -\lambda\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda}, \tag{B.17}
\]
which implies that, where $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable,
\[
-\lambda\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda} = (1-\lambda)\frac{d\breve{r}_y(\lambda)}{d\lambda} + (1-\lambda)^2\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}. \tag{B.18}
\]
$\breve{r}(\lambda)$ is differentiable where $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable, with value
\[
\frac{d\breve{r}(\lambda)}{d\lambda} = (1-\lambda)\frac{d\breve{r}_y(\lambda)}{d\lambda} + (1-\lambda)^2\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda} + \lambda\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda} - \breve{r}_y(\lambda) - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda) + \breve{r}_{\Delta x}(\lambda), \tag{B.19}
\]
which, in conjunction with equation (B.18), implies that
\[
\frac{d\breve{r}(\lambda)}{d\lambda} = \breve{r}_{\Delta x}(\lambda) - \breve{r}_y(\lambda) - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda). \tag{B.20}
\]

B.4 Conservation in $\breve{r}_y(\lambda)$

Theorem 4. If $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable at all but a finite number of points in $(0,1)$, then
\[
2\int_0^1 \breve{r}(\lambda)\,d\lambda = \int_0^1 \breve{r}_y(\lambda)\,d\lambda + \int_0^1 (1-\lambda^2)\,\breve{r}_{\hat{y}}(\lambda)\,d\lambda = \int_0^1 \breve{r}_{\Delta x}(\lambda)\,d\lambda - \int_0^1 (1-\lambda)^2\,\breve{r}_{\hat{y}}(\lambda)\,d\lambda.
\]

Proof. On any open interval $(a,b) \subset (0,1)$ over which $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$, and thus $\breve{r}(\lambda)$, are differentiable, equation (B.20), the fundamental theorem of calculus, and continuity of $\breve{r}(\lambda)$ by Theorem 2 give
\[
\breve{r}(b) - \breve{r}(a) = \int_a^b \breve{r}_{\Delta x}(\lambda) - \breve{r}_y(\lambda) - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{B.21}
\]
Assuming that $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable at all but a finite number of points, $\hat{\lambda}_1 < \hat{\lambda}_2 < \cdots < \hat{\lambda}_{\hat{n}}$, summing equation (B.21) over the subintervals $(\varepsilon,\hat{\lambda}_1), (\hat{\lambda}_1,\hat{\lambda}_2), \ldots, (\hat{\lambda}_{\hat{n}}, 1-\varepsilon)$ telescopes, by continuity, to
\[
\lim_{\varepsilon\to 0^+}\bigl(\breve{r}(1-\varepsilon) - \breve{r}(\varepsilon)\bigr) = \lim_{\varepsilon\to 0^+}\int_\varepsilon^{1-\varepsilon} \breve{r}_{\Delta x}(\lambda) - \breve{r}_y(\lambda) - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda)\,d\lambda, \tag{B.22}
\]
which, in conjunction with equations (B.4) and (B.7), implies that
\[
0 = \int_0^1 \breve{r}_{\Delta x}(\lambda) - \breve{r}_y(\lambda) - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda)\,d\lambda \iff \int_0^1 \breve{r}_y(\lambda)\,d\lambda + 2\int_0^1 (1-\lambda)\breve{r}_{\hat{y}}(\lambda)\,d\lambda = \int_0^1 \breve{r}_{\Delta x}(\lambda)\,d\lambda. \tag{B.23}
\]
Equation (B.23) defines the conserved quantity in the minimum deformation from data to the best-fitting numerical solution. Additionally,
\[
\int_0^1 \breve{r}(\lambda)\,d\lambda = \int_0^1 \breve{r}_y(\lambda) + (1-\lambda^2)\breve{r}_{\hat{y}}(\lambda)\,d\lambda + \int_0^1 \lambda\bigl(\breve{r}_{\Delta x}(\lambda) - \breve{r}_y(\lambda) - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda)\bigr)d\lambda. \tag{B.24}
\]
On any open interval $(a,b) \subset (0,1)$ over which $\breve{r}(\lambda)$ is differentiable, equation (B.20), integration by parts, and continuity of $\breve{r}(\lambda)$ give
\[
\int_a^b \lambda\bigl(\breve{r}_{\Delta x}(\lambda) - \breve{r}_y(\lambda) - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda)\bigr)d\lambda = \int_a^b \lambda\frac{d\breve{r}(\lambda)}{d\lambda}d\lambda = b\breve{r}(b) - a\breve{r}(a) - \int_a^b \breve{r}(\lambda)\,d\lambda. \tag{B.25}
\]
Summing equation (B.25) over the subintervals between the non-differentiable points, as above, and using $\lim_{\lambda\to 1^-}\breve{r}(\lambda) = 0$ (B.7),
\[
\int_0^1 \lambda\bigl(\breve{r}_{\Delta x}(\lambda) - \breve{r}_y(\lambda) - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda)\bigr)d\lambda = \lim_{\varepsilon\to 0^+}\Bigl((1-\varepsilon)\breve{r}(1-\varepsilon) - \varepsilon\breve{r}(\varepsilon) - \int_\varepsilon^{1-\varepsilon}\breve{r}(\lambda)\,d\lambda\Bigr) = -\int_0^1 \breve{r}(\lambda)\,d\lambda. \tag{B.26}
\]
Therefore, from equations (B.24) and (B.26),
\[
2\int_0^1 \breve{r}(\lambda)\,d\lambda = \int_0^1 \breve{r}_y(\lambda)\,d\lambda + \int_0^1 (1-\lambda^2)\breve{r}_{\hat{y}}(\lambda)\,d\lambda, \tag{B.27}
\]
which, in conjunction with equation (B.23), implies that
\[
2\int_0^1 \breve{r}(\lambda)\,d\lambda = \int_0^1 \breve{r}_y(\lambda)\,d\lambda + \int_0^1 (1-\lambda^2)\breve{r}_{\hat{y}}(\lambda)\,d\lambda = \int_0^1 \breve{r}_{\Delta x}(\lambda)\,d\lambda - \int_0^1 (1-\lambda)^2\breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{B.28}
\]
I note that, when $\breve{r}_{\hat{y}}(\lambda) = 0$, equation (B.28) implies that
\[
2\int_0^1 \breve{r}(\lambda)\,d\lambda = \int_0^1 \breve{r}_y(\lambda)\,d\lambda = \int_0^1 \breve{r}_{\Delta x}(\lambda)\,d\lambda. \tag{B.29}
\]
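In practice, the conservation identity of Theorem 4 can be checked by quadrature against the minimized functional values returned by overlapping-niche descent, and the size of the residuals gauges the accuracy of the minimization. The sketch below is a diagnostic of this kind; it assumes arrays of $\breve{r}$, $\breve{r}_y$, $\breve{r}_{\hat{y}}$, and $\breve{r}_{\Delta x}$ values sampled on a $\lambda$ grid (all names are illustrative), and it is a sketch, not the dissertation's diagnostic code.

\begin{verbatim}
import numpy as np

def conservation_residuals(lam, r, ry, ryhat, rdx):
    """Quadrature check of equation (B.28):
    2*int r = int ry + int (1 - lam^2) ryhat = int rdx - int (1 - lam)^2 ryhat.
    Returns the two identity residuals; both should be near zero at accurate minima."""
    lhs = 2.0 * np.trapz(r, lam)
    mid = np.trapz(ry + (1.0 - lam ** 2) * ryhat, lam)
    rhs = np.trapz(rdx - (1.0 - lam) ** 2 * ryhat, lam)
    return lhs - mid, lhs - rhs
\end{verbatim}

When $\breve{r}_{\hat{y}}(\lambda) = 0$, the same check reduces to equation (B.29), $2\int\breve{r} = \int\breve{r}_y = \int\breve{r}_{\Delta x}$.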
B.5 Integral Representations of Limit Values

Theorem 5. If $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable at all but a finite number of points in $(0,1)$, then
\[
\lim_{\lambda\to 0^+}\breve{r}_{\Delta x}(\lambda) = \int_0^1 \frac{1}{\lambda^2}\,\breve{r}_y(\lambda)\,d\lambda + \int_0^1 \frac{1-\lambda^2}{\lambda^2}\,\breve{r}_{\hat{y}}(\lambda)\,d\lambda, \qquad
\lim_{\lambda\to 1^-}\breve{r}_y(\lambda) = \int_0^1 \frac{1}{(1-\lambda)^2}\,\breve{r}_{\Delta x}(\lambda)\,d\lambda - \int_0^1 \breve{r}_{\hat{y}}(\lambda)\,d\lambda.
\]

Proof. From equation (B.18), where $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable,
\[
\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda} = -\frac{1-\lambda}{\lambda}\frac{d\breve{r}_y(\lambda)}{d\lambda} - \frac{(1-\lambda)^2}{\lambda}\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}. \tag{B.30}
\]
Assuming that $\breve{r}_{\Delta x}(\lambda)$ is differentiable at all but possibly a finite number of points, $\hat{\lambda}_1 < \hat{\lambda}_2 < \cdots < \hat{\lambda}_{\hat{n}}$, applying the fundamental theorem of calculus on each subinterval between these points (B.31) and using $\lim_{\lambda\to 1^-}\breve{r}_{\Delta x}(\lambda) = 0$ (B.7) gives
\[
\int_0^1 \frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda}\,d\lambda = -\lim_{\lambda\to 0^+}\breve{r}_{\Delta x}(\lambda) - \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_{\Delta x}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr). \tag{B.32}
\]
Assuming that $\breve{r}_y(\lambda)$ is differentiable at all but possibly the same finite set of points, integration by parts on each subinterval, with $\frac{d}{d\lambda}\frac{1-\lambda}{\lambda} = -\frac{1}{\lambda^2}$ and the boundary term $\frac{\varepsilon}{1-\varepsilon}\breve{r}_y(1-\varepsilon) \to 0$, gives
\[
-\int_0^1 \frac{1-\lambda}{\lambda}\frac{d\breve{r}_y(\lambda)}{d\lambda}\,d\lambda = \lim_{\varepsilon\to 0^+}\Bigl(\frac{1-\varepsilon}{\varepsilon}\breve{r}_y(\varepsilon)\Bigr) + \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\frac{1-\lambda}{\lambda}\breve{r}_y(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) - \int_0^1 \frac{1}{\lambda^2}\breve{r}_y(\lambda)\,d\lambda. \tag{B.33}
\]
As $\lim_{\lambda\to 0^+}\breve{r}_y(\lambda) = 0$ (B.4), from L'H\^opital's rule,
\[
\lim_{\varepsilon\to 0^+}\Bigl(\frac{1-\varepsilon}{\varepsilon}\breve{r}_y(\varepsilon)\Bigr) = \lim_{\lambda\to 0^+}\Bigl((1-\lambda)\frac{d\breve{r}_y(\lambda)}{d\lambda} - \breve{r}_y(\lambda)\Bigr) = \lim_{\lambda\to 0^+}\frac{d\breve{r}_y(\lambda)}{d\lambda}. \tag{B.34}
\]
From equation (B.18),
\[
0 = \lim_{\lambda\to 0^+}\Bigl(-\lambda\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda}\Bigr) = \lim_{\lambda\to 0^+}\Bigl((1-\lambda)\frac{d\breve{r}_y(\lambda)}{d\lambda} + (1-\lambda)^2\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}\Bigr) = \lim_{\lambda\to 0^+}\frac{d\breve{r}_y(\lambda)}{d\lambda} + \lim_{\lambda\to 0^+}\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}. \tag{B.35}
\]
By construction, $\breve{r}_y(\lambda) \geq 0$ and $\breve{r}_{\hat{y}}(\lambda) \geq 0$ for all $\lambda \in (0,1)$, and, from equation (B.4), $\lim_{\lambda\to 0^+}\breve{r}_y(\lambda) = 0$ and $\lim_{\lambda\to 0^+}\breve{r}_{\hat{y}}(\lambda) = 0$. Thus, by L'H\^opital's rule,
\begin{align}
0 \leq \lim_{\lambda\to 0^+}\frac{\breve{r}_y(\lambda)}{\lambda} &= \lim_{\lambda\to 0^+}\frac{d\breve{r}_y(\lambda)}{d\lambda}, \tag{B.36a}\\
0 \leq \lim_{\lambda\to 0^+}\frac{\breve{r}_{\hat{y}}(\lambda)}{\lambda} &= \lim_{\lambda\to 0^+}\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}, \tag{B.36b}
\end{align}
which, in conjunction with equation (B.35), implies that
\begin{align}
\lim_{\lambda\to 0^+}\frac{d\breve{r}_y(\lambda)}{d\lambda} &= 0, \tag{B.37a}\\
\lim_{\lambda\to 0^+}\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda} &= 0. \tag{B.37b}
\end{align}
Equations (B.34) and (B.37a) imply that
\[
\lim_{\varepsilon\to 0^+}\Bigl(\frac{1-\varepsilon}{\varepsilon}\breve{r}_y(\varepsilon)\Bigr) = 0, \tag{B.38}
\]
which, in conjunction with equation (B.33), implies that
\[
-\int_0^1 \frac{1-\lambda}{\lambda}\frac{d\breve{r}_y(\lambda)}{d\lambda}\,d\lambda = \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\frac{1-\lambda}{\lambda}\breve{r}_y(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) - \int_0^1 \frac{1}{\lambda^2}\breve{r}_y(\lambda)\,d\lambda. \tag{B.39}
\]
Similarly, assuming that $\breve{r}_{\hat{y}}(\lambda)$ is differentiable at all but possibly the same finite set of points, integration by parts on each subinterval, with $\frac{d}{d\lambda}\frac{(1-\lambda)^2}{\lambda} = -\frac{1-\lambda^2}{\lambda^2}$, gives
\[
-\int_0^1 \frac{(1-\lambda)^2}{\lambda}\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}\,d\lambda = \lim_{\varepsilon\to 0^+}\Bigl(\frac{(1-\varepsilon)^2}{\varepsilon}\breve{r}_{\hat{y}}(\varepsilon)\Bigr) + \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\frac{(1-\lambda)^2}{\lambda}\breve{r}_{\hat{y}}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) - \int_0^1 \frac{1-\lambda^2}{\lambda^2}\breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{B.40}
\]
From L'H\^opital's rule, as $\lim_{\lambda\to 0^+}\breve{r}_{\hat{y}}(\lambda) = 0$ (B.4), and from equation (B.37b),
\[
\lim_{\varepsilon\to 0^+}\Bigl(\frac{(1-\varepsilon)^2}{\varepsilon}\breve{r}_{\hat{y}}(\varepsilon)\Bigr) = \lim_{\lambda\to 0^+}\Bigl((1-\lambda)^2\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda} - 2(1-\lambda)\breve{r}_{\hat{y}}(\lambda)\Bigr) = \lim_{\lambda\to 0^+}\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda} = 0, \tag{B.41}
\]
which, in conjunction with equation (B.40), implies that
\[
-\int_0^1 \frac{(1-\lambda)^2}{\lambda}\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}\,d\lambda = \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\frac{(1-\lambda)^2}{\lambda}\breve{r}_{\hat{y}}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) - \int_0^1 \frac{1-\lambda^2}{\lambda^2}\breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{B.42}
\]
From relation (B.15), for all $\lambda \in (0,1)$ and for all $\varepsilon$ such that $\lambda+\varepsilon \in (0,1)$, the combination $(1-\lambda)\bigl(\breve{r}_y(\lambda+\varepsilon)-\breve{r}_y(\lambda)\bigr) + (1-\lambda)^2\bigl(\breve{r}_{\hat{y}}(\lambda+\varepsilon)-\breve{r}_{\hat{y}}(\lambda)\bigr) + \lambda\bigl(\breve{r}_{\Delta x}(\lambda+\varepsilon)-\breve{r}_{\Delta x}(\lambda)\bigr)$ is bounded below by $0$ and above by terms of order $\varepsilon$ (B.43). Thus, taking $\varepsilon \to 0$ from either side, for all $\lambda \in (0,1)$,
\[
(1-\lambda)\lim_{\varepsilon\to 0^\pm}\bigl(\breve{r}_y(\lambda+\varepsilon)-\breve{r}_y(\lambda)\bigr) + (1-\lambda)^2\lim_{\varepsilon\to 0^\pm}\bigl(\breve{r}_{\hat{y}}(\lambda+\varepsilon)-\breve{r}_{\hat{y}}(\lambda)\bigr) + \lambda\lim_{\varepsilon\to 0^\pm}\bigl(\breve{r}_{\Delta x}(\lambda+\varepsilon)-\breve{r}_{\Delta x}(\lambda)\bigr) = 0, \tag{B.44}
\]
and consequently, subtracting the one-sided limits from each other, for all $\lambda \in (0,1)$,
\[
(1-\lambda)\lim_{\varepsilon\to 0^+}\bigl(\breve{r}_y(\lambda+\varepsilon)-\breve{r}_y(\lambda-\varepsilon)\bigr) + (1-\lambda)^2\lim_{\varepsilon\to 0^+}\bigl(\breve{r}_{\hat{y}}(\lambda+\varepsilon)-\breve{r}_{\hat{y}}(\lambda-\varepsilon)\bigr) + \lambda\lim_{\varepsilon\to 0^+}\bigl(\breve{r}_{\Delta x}(\lambda+\varepsilon)-\breve{r}_{\Delta x}(\lambda-\varepsilon)\bigr) = 0. \tag{B.45}
\]
Assuming that $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable at all but the finite number of points $\hat{\lambda}_1 < \cdots < \hat{\lambda}_{\hat{n}}$, integrating equation (B.30),
\[
\int_0^1 \frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda}\,d\lambda = -\int_0^1 \frac{1-\lambda}{\lambda}\frac{d\breve{r}_y(\lambda)}{d\lambda}\,d\lambda - \int_0^1 \frac{(1-\lambda)^2}{\lambda}\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}\,d\lambda, \tag{B.46}
\]
which, in conjunction with equations (B.32), (B.39), and (B.42), implies that
\[
-\lim_{\lambda\to 0^+}\breve{r}_{\Delta x}(\lambda) - \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_{\Delta x}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) = \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\frac{1-\lambda}{\lambda}\breve{r}_y(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon} + \frac{(1-\lambda)^2}{\lambda}\breve{r}_{\hat{y}}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) - \int_0^1 \frac{1}{\lambda^2}\breve{r}_y(\lambda)\,d\lambda - \int_0^1 \frac{1-\lambda^2}{\lambda^2}\breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{B.47}
\]
From equation (B.45), for $\hat{\lambda} \in (0,1)$,
\[
\frac{1-\hat{\lambda}}{\hat{\lambda}}\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_y(\lambda)\Big|_{\hat{\lambda}-\varepsilon}^{\hat{\lambda}+\varepsilon}\Bigr) + \frac{(1-\hat{\lambda})^2}{\hat{\lambda}}\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_{\hat{y}}(\lambda)\Big|_{\hat{\lambda}-\varepsilon}^{\hat{\lambda}+\varepsilon}\Bigr) = -\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_{\Delta x}(\lambda)\Big|_{\hat{\lambda}-\varepsilon}^{\hat{\lambda}+\varepsilon}\Bigr), \tag{B.48}
\]
which, in conjunction with equation (B.47), implies that
\[
\lim_{\lambda\to 0^+}\breve{r}_{\Delta x}(\lambda) = \int_0^1 \frac{1}{\lambda^2}\breve{r}_y(\lambda)\,d\lambda + \int_0^1 \frac{1-\lambda^2}{\lambda^2}\breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{B.49}
\]
Similarly, from equation (B.18), where $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable,
\[
\frac{d\breve{r}_y(\lambda)}{d\lambda} = -\frac{\lambda}{1-\lambda}\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda} - (1-\lambda)\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}. \tag{B.50}
\]
Assuming that $\breve{r}_y(\lambda)$ is differentiable at all but possibly a finite number of points, applying the fundamental theorem of calculus on each subinterval (B.51) and using $\lim_{\lambda\to 0^+}\breve{r}_y(\lambda) = 0$ (B.4) gives
\[
\int_0^1 \frac{d\breve{r}_y(\lambda)}{d\lambda}\,d\lambda = \lim_{\lambda\to 1^-}\breve{r}_y(\lambda) - \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_y(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr). \tag{B.52}
\]
Assuming that $\breve{r}_{\Delta x}(\lambda)$ is differentiable at all but possibly the same finite set of points, integration by parts on each subinterval, with $\frac{d}{d\lambda}\frac{\lambda}{1-\lambda} = \frac{1}{(1-\lambda)^2}$ and the boundary term $\frac{\varepsilon}{1-\varepsilon}\breve{r}_{\Delta x}(\varepsilon) \to 0$, gives
\[
-\int_0^1 \frac{\lambda}{1-\lambda}\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda}\,d\lambda = -\lim_{\varepsilon\to 0^+}\Bigl(\frac{1-\varepsilon}{\varepsilon}\breve{r}_{\Delta x}(1-\varepsilon)\Bigr) + \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\frac{\lambda}{1-\lambda}\breve{r}_{\Delta x}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) + \int_0^1 \frac{1}{(1-\lambda)^2}\breve{r}_{\Delta x}(\lambda)\,d\lambda. \tag{B.53}
\]
As $\lim_{\lambda\to 1^-}\breve{r}_{\Delta x}(\lambda) = 0$ (B.7), from L'H\^opital's rule,
\[
\lim_{\varepsilon\to 0^+}\Bigl(\frac{1-\varepsilon}{\varepsilon}\breve{r}_{\Delta x}(1-\varepsilon)\Bigr) = \lim_{\lambda\to 1^-}\Bigl(\frac{\lambda}{1-\lambda}\breve{r}_{\Delta x}(\lambda)\Bigr) = \lim_{\lambda\to 1^-}\Bigl(-\lambda\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda} - \breve{r}_{\Delta x}(\lambda)\Bigr) = -\lim_{\lambda\to 1^-}\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda}. \tag{B.54}
\]
From equation (B.18),
\[
-\lim_{\lambda\to 1^-}\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda} = \lim_{\lambda\to 1^-}\Bigl(-\lambda\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda}\Bigr) = \lim_{\lambda\to 1^-}\Bigl((1-\lambda)\frac{d\breve{r}_y(\lambda)}{d\lambda} + (1-\lambda)^2\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}\Bigr) = 0, \tag{B.55}
\]
which, in conjunction with equation (B.54), implies that
\[
\lim_{\varepsilon\to 0^+}\Bigl(\frac{1-\varepsilon}{\varepsilon}\breve{r}_{\Delta x}(1-\varepsilon)\Bigr) = 0, \tag{B.56}
\]
which, in conjunction with equation (B.53), implies that
\[
-\int_0^1 \frac{\lambda}{1-\lambda}\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda}\,d\lambda = \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\frac{\lambda}{1-\lambda}\breve{r}_{\Delta x}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) + \int_0^1 \frac{1}{(1-\lambda)^2}\breve{r}_{\Delta x}(\lambda)\,d\lambda. \tag{B.57}
\]
Assuming that $\breve{r}_{\hat{y}}(\lambda)$ is differentiable at all but possibly the same finite set of points, integration by parts on each subinterval (B.58), with the boundary terms $(1-\varepsilon)\breve{r}_{\hat{y}}(\varepsilon) \to 0$, as $\lim_{\lambda\to 0^+}\breve{r}_{\hat{y}}(\lambda) = 0$ (B.4), and $\varepsilon\,\breve{r}_{\hat{y}}(1-\varepsilon) \to 0$, gives
\[
-\int_0^1 (1-\lambda)\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}\,d\lambda = \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl((1-\lambda)\breve{r}_{\hat{y}}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) - \int_0^1 \breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{B.59}
\]
Assuming that $\breve{r}_y(\lambda)$, $\breve{r}_{\hat{y}}(\lambda)$, and $\breve{r}_{\Delta x}(\lambda)$ are differentiable at all but a finite number of points, integrating equation (B.50),
\[
\int_0^1 \frac{d\breve{r}_y(\lambda)}{d\lambda}\,d\lambda = -\int_0^1 \frac{\lambda}{1-\lambda}\frac{d\breve{r}_{\Delta x}(\lambda)}{d\lambda}\,d\lambda - \int_0^1 (1-\lambda)\frac{d\breve{r}_{\hat{y}}(\lambda)}{d\lambda}\,d\lambda, \tag{B.60}
\]
which, in conjunction with equations (B.52), (B.57), and (B.59), implies that
\[
\lim_{\lambda\to 1^-}\breve{r}_y(\lambda) - \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_y(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) = \sum_{i=1}^{\hat{n}}\lim_{\varepsilon\to 0^+}\Bigl(\frac{\lambda}{1-\lambda}\breve{r}_{\Delta x}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon} + (1-\lambda)\breve{r}_{\hat{y}}(\lambda)\Big|_{\hat{\lambda}_i-\varepsilon}^{\hat{\lambda}_i+\varepsilon}\Bigr) + \int_0^1 \frac{1}{(1-\lambda)^2}\breve{r}_{\Delta x}(\lambda)\,d\lambda - \int_0^1 \breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{B.61}
\]
From equation (B.45), for $\hat{\lambda} \in (0,1)$,
\[
\frac{\hat{\lambda}}{1-\hat{\lambda}}\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_{\Delta x}(\lambda)\Big|_{\hat{\lambda}-\varepsilon}^{\hat{\lambda}+\varepsilon}\Bigr) + (1-\hat{\lambda})\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_{\hat{y}}(\lambda)\Big|_{\hat{\lambda}-\varepsilon}^{\hat{\lambda}+\varepsilon}\Bigr) = -\lim_{\varepsilon\to 0^+}\Bigl(\breve{r}_y(\lambda)\Big|_{\hat{\lambda}-\varepsilon}^{\hat{\lambda}+\varepsilon}\Bigr), \tag{B.62}
\]
which, in conjunction with equation (B.61), implies that
\[
\lim_{\lambda\to 1^-}\breve{r}_y(\lambda) = \int_0^1 \frac{1}{(1-\lambda)^2}\breve{r}_{\Delta x}(\lambda)\,d\lambda - \int_0^1 \breve{r}_{\hat{y}}(\lambda)\,d\lambda. \tag{B.63}
\]

B.6 Bounding Normalized Squared Residual Sums

From equation (B.2), for $\varepsilon \in (0,1)$,
\begin{align}
(1-\lambda)\breve{r}_y(\lambda) &\leq \breve{r}(\lambda) \leq (1-\lambda)\breve{r}_y(\varepsilon) + (1-\lambda)^2\breve{r}_{\hat{y}}(\varepsilon) + \lambda\breve{r}_{\Delta x}(\varepsilon), \tag{B.64a}\\
\lambda\breve{r}_{\Delta x}(\lambda) &\leq \breve{r}(\lambda) \leq (1-\lambda)\breve{r}_y(1-\varepsilon) + (1-\lambda)^2\breve{r}_{\hat{y}}(1-\varepsilon) + \lambda\breve{r}_{\Delta x}(1-\varepsilon), \tag{B.64b}
\end{align}
which implies, in conjunction with equations (B.6) and (B.9), whence $\breve{r}_y(\varepsilon) \to 0$, $\breve{r}_{\hat{y}}(\varepsilon) \to 0$, and $\breve{r}_{\Delta x}(1-\varepsilon) \to 0$ as $\varepsilon \to 0^+$, that
\begin{align}
(1-\lambda)\breve{r}_y(\lambda) &\leq \lambda\lim_{\varepsilon\to 0^+}\breve{r}_{\Delta x}(\varepsilon), \tag{B.65a}\\
\lambda\breve{r}_{\Delta x}(\lambda) &\leq (1-\lambda)\lim_{\varepsilon\to 0^+}\breve{r}_y(1-\varepsilon) + (1-\lambda)^2\lim_{\varepsilon\to 0^+}\breve{r}_{\hat{y}}(1-\varepsilon). \tag{B.65b}
\end{align}
Thus,
\begin{align}
\breve{r}_y(\lambda) &\leq \frac{\lambda}{1-\lambda}\lim_{\varepsilon\to 0^+}\breve{r}_{\Delta x}(\varepsilon), \tag{B.66a}\\
\breve{r}_{\Delta x}(\lambda) &\leq \frac{1-\lambda}{\lambda}\Bigl(\lim_{\varepsilon\to 0^+}\breve{r}_y(1-\varepsilon) + (1-\lambda)\lim_{\varepsilon\to 0^+}\breve{r}_{\hat{y}}(1-\varepsilon)\Bigr) \leq \frac{1-\lambda}{\lambda}\Bigl(\lim_{\varepsilon\to 0^+}\breve{r}_y(1-\varepsilon) + \lim_{\varepsilon\to 0^+}\breve{r}_{\hat{y}}(1-\varepsilon)\Bigr), \tag{B.66b}
\end{align}
and, for reasonable models, with numerical solutions that fit the data at least twice as well as the homogeneous model, where $\lim_{\varepsilon\to 0^+}\breve{r}_y(1-\varepsilon) \leq 1/2$ and $\lim_{\varepsilon\to 0^+}\breve{r}_{\hat{y}}(1-\varepsilon) \leq 1/2$, and with differential equation values that fit discretized data at least as well as the homogeneous model, where $\lim_{\varepsilon\to 0^+}\breve{r}_{\Delta x}(\varepsilon) \leq 1$,
\begin{align}
\breve{r}_y(\lambda) &\leq \frac{\lambda}{1-\lambda}, \tag{B.67a}\\
\breve{r}_{\Delta x}(\lambda) &\leq \frac{1-\lambda}{\lambda}. \tag{B.67b}
\end{align}
Thus, for any data and any reasonable model of the data, to ensure that $\breve{r}_y(\lambda)$ does not exceed some tolerance, $\bar{\varepsilon}$, $\lambda$ should be chosen small enough that
\[
\lambda \leq \frac{\bar{\varepsilon}}{1+\bar{\varepsilon}}, \tag{B.68}
\]
and, to ensure that $\breve{r}_{\Delta x}(\lambda)$ does not exceed some tolerance, $\bar{\varepsilon}$, $\lambda$ should be chosen large enough that
\[
\lambda \geq \frac{1}{1+\bar{\varepsilon}}. \tag{B.69}
\]
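The bounds (B.68) and (B.69) translate directly into a rule for placing niches. The following one-line helper (the function name is my own) returns, for a given tolerance, the largest $\lambda$ guaranteeing $\breve{r}_y(\lambda) \leq \bar{\varepsilon}$ and the smallest $\lambda$ guaranteeing $\breve{r}_{\Delta x}(\lambda) \leq \bar{\varepsilon}$ for any reasonable model.

\begin{verbatim}
def lambda_bounds(eps):
    """Tolerance bounds (B.68) and (B.69): lambda <= eps/(1+eps) keeps r_y below
    eps; lambda >= 1/(1+eps) keeps r_dx below eps, for any reasonable model."""
    return eps / (1.0 + eps), 1.0 / (1.0 + eps)

print(lambda_bounds(1e-3))   # (~0.000999, ~0.999001)
\end{verbatim}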
Appendix C

Overlapping-Niche Descent

Overlapping-niche descent, a genetic algorithm directed by gradient-based descent, synergistically minimizes $r(\mathbf{p},\mathbf{x};\lambda)$ over a broad range of $\lambda$ values. Here, I describe overlapping-niche descent in full.

C.1 Defining Overlapping-Niche Descent

In overlapping-niche descent, I define an environment that contains $n_{\mathrm{niche}}$ niches, each defined by a unique value of $\lambda$, $\lambda_1 < \lambda_2 < \cdots < \lambda_{n_{\mathrm{niche}}} \in (0,1)$. Individuals, points of parameters and state values, inhabit the environment. Initially, in the first generation, I randomly generate the parameters and state values of all individuals. In generation $g$, the $i$th niche sustains $n_{g,i}$ individuals, $(\mathbf{p}_{g,i,j},\mathbf{x}_{g,i,j})$ for $j = 1, 2, \ldots, n_{g,i}$. Starting from $(\mathbf{p}_{g,i,j},\mathbf{x}_{g,i,j})$, I locally minimize $r(\mathbf{p},\mathbf{x};\lambda_i)$ to determine $(\breve{\mathbf{p}}_{g,i,j},\breve{\mathbf{x}}_{g,i,j})$, for $i = 1, 2, \ldots, n_{\mathrm{niche}}$ and $j = 1, 2, \ldots, n_{g,i}$, using the descent method described in Section C.2.

I decompose the number of sustained individuals in each niche, $n_{g,i}$, into the number of sustained parents, $\hat{n}_i$, which remains fixed over generations, and the number of sustained offspring, $\check{n}_{g,i}$, which may change over generations. From $\bigl\{\{\breve{\mathbf{p}}_{g,i,j},\breve{\mathbf{x}}_{g,i,j}\} : i \in \{1,\ldots,n_{\mathrm{niche}}\},\ j \in \{1,\ldots,n_{g,i}\}\bigr\}$, I select the $\hat{n}_i$ individuals with the $\hat{n}_i$ least values of $r(\mathbf{p},\mathbf{x};\lambda_i)$ to occupy the $\hat{n}_i$ parent spaces of the $i$th niche in the $(g+1)$th generation, for $i = 1, 2, \ldots, n_{\mathrm{niche}}$. I allow individuals to occupy parent spaces in multiple niches, to permit cross-niche minimization, but do not allow individuals to occupy multiple parent spaces in the same niche, to maintain diversity in the parameter-state value search space. I enumerate individuals occupying the $\hat{n}_i$ parent spaces of the $i$th niche from $1$ to $\hat{n}_i$, such that $r(\mathbf{p}_{g+1,i,1},\mathbf{x}_{g+1,i,1};\lambda_i) \leq r(\mathbf{p}_{g+1,i,2},\mathbf{x}_{g+1,i,2};\lambda_i) \leq \cdots \leq r(\mathbf{p}_{g+1,i,\hat{n}_i},\mathbf{x}_{g+1,i,\hat{n}_i};\lambda_i)$, for $i = 1, 2, \ldots, n_{\mathrm{niche}}$. Thus, for the $j$th parent space of the $i$th niche, I calculate the relative change in $r(\mathbf{p},\mathbf{x};\lambda)$ over the $(g+1)$th generation:
\[
\Delta r_{g+1,i,j} = \frac{r(\mathbf{p}_{g,i,j},\mathbf{x}_{g,i,j};\lambda_i) - r(\mathbf{p}_{g+1,i,j},\mathbf{x}_{g+1,i,j};\lambda_i)}{r(\mathbf{p}_{g,i,j},\mathbf{x}_{g,i,j};\lambda_i)} \in [0,1]. \tag{C.1}
\]
I decompose the number of sustained offspring in each niche, $\check{n}_{g,i}$, into the number of high-momentum offspring, $\check{n}^m_{g,i}$, the number of cross-niche offspring, $\check{n}^c_{g,i}$, the number of sexual offspring, $\check{n}^s_{g,i}$, and the number of random offspring, $\check{n}^r_{g,i}$. I generate high-momentum offspring to accelerate convergence rates in the descent method for individuals that occupy a parent space in some niche. The individual that occupies the $j$th parent space in the $i$th niche also occupies the $j$th high-momentum offspring space in the $i$th niche. Thus, $\check{n}^m_{g,i} = \hat{n}_i$ for $i = 1, 2, \ldots, n_{\mathrm{niche}}$. I describe the details of high-momentum offspring in Section C.2.2.

An individual occupying a parent space in one niche may be close to a global minimum in another niche. I generate cross-niche offspring to synergistically minimize $r(\mathbf{p},\mathbf{x};\lambda)$ across niches. From the set of individuals not selected from the $k$th niche and occupying a parent space not in the $k$th niche, $\bigl\{\{\breve{\mathbf{p}}_{g,i,j},\breve{\mathbf{x}}_{g,i,j}\} : i \in \{1,\ldots,n_{\mathrm{niche}}\}\setminus\{k\},\ j \in \{1,\ldots,n_{g,i}\}\bigr\} \cap \bigl\{\{\mathbf{p}_{g+1,i,j},\mathbf{x}_{g+1,i,j}\} : i \in \{1,\ldots,n_{\mathrm{niche}}\}\setminus\{k\},\ j \in \{1,\ldots,\hat{n}_i\}\bigr\}$, I randomly select an individual to occupy the $l$th cross-niche offspring space in the $k$th niche, with the probability of selection proportional to an individual's fitness within the $k$th niche, $1/r(\mathbf{p}_{g+1,i,j},\mathbf{x}_{g+1,i,j};\lambda_k)^{q_{\mathrm{fit}}}$, where the parameter $q_{\mathrm{fit}}$ dictates the strength of selection.

I generate sexual offspring to search for global minima beyond the functional basins surrounding local minima. From the set of individuals occupying a parent space in some niche, $\bigl\{\{\mathbf{p}_{g+1,i,j},\mathbf{x}_{g+1,i,j}\} : i \in \{1,\ldots,n_{\mathrm{niche}}\},\ j \in \{1,\ldots,\hat{n}_i\}\bigr\}$, I randomly select two, not necessarily distinct, sexual parents to produce the $l$th sexual offspring in the $k$th niche, with the probability of selection proportional to an individual's fitness within the $k$th niche, $1/r(\mathbf{p}_{g+1,i,j},\mathbf{x}_{g+1,i,j};\lambda_k)^{q_{\mathrm{fit}}}$. To produce the $l$th sexual offspring in the $k$th niche, I randomly combine and perturb parameters and state values from both sexual parents, a process that is isomorphic to chromosomal crossover and mutation in biological sexual reproduction. I treat each parameter and all states like separate chromosomes, independent structural units of information, and implement crossover and mutation in each chromosome-like unit. For each parameter, a sexual offspring inherits a parameter value, with random perturbation, from one of its sexual parents, which I choose randomly with equal probability. For each sexual offspring, I choose a crossover location randomly, from two to the number of elements in the numerical discretization, $I_\Delta$, with equal probability. A sexual offspring inherits all state values, with random perturbations, at grid points preceding its crossover location from one of its sexual parents, which I choose randomly with equal probability, and inherits all remaining state values, with random perturbations, from its other sexual parent. A sketch of this selection and crossover step appears below.
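The following Python sketch illustrates fitness-proportional selection and the crossover-and-mutation step. All names, the multiplicative perturbation form, and the fixed perturbation scale are my own simplifying assumptions; in overlapping-niche descent proper, the perturbation magnitude shrinks with $\Delta r_{g+1,i,j}$, as described in the next paragraph.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng()

def select_parent(pool, r_k, q_fit):
    """Select one individual from `pool` (a list of (p, x) tuples) with
    probability proportional to fitness 1 / r(p, x; lambda_k)^q_fit."""
    fitness = 1.0 / np.asarray(r_k) ** q_fit
    return pool[rng.choice(len(pool), p=fitness / fitness.sum())]

def sexual_offspring(parent_a, parent_b, scale):
    """Crossover and mutation: each parameter from a random parent, state values
    split at one random crossover location along the grid, all perturbed."""
    (p_a, x_a), (p_b, x_b) = parent_a, parent_b
    pick = rng.integers(0, 2, size=p_a.size).astype(bool)
    p = np.where(pick, p_a, p_b) * (1.0 + scale * rng.standard_normal(p_a.size))
    c = rng.integers(1, x_a.shape[-1])          # crossover location on the grid
    x = np.concatenate([x_a[..., :c], x_b[..., c:]], axis=-1)
    x = x * (1.0 + scale * rng.standard_normal(x.shape))
    return p, x
\end{verbatim}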
For effective and efficient global minimization, searches for global minima around local minima should start broad and should narrow as individuals approach global minima. Naturally, as individuals approach global minima, $r(\mathbf{p}_{g+1,i,j},\mathbf{x}_{g+1,i,j};\lambda_k)^{q_{\mathrm{fit}}}$ decreases significantly with decreasing $|\lambda_i - \lambda_k|$, increasing the likelihood of similarity in sexual parent pairings. Concomitantly, as individuals approach global minima, the random perturbations in inherited parameters and state values should decrease. I measure convergence in the $j$th parent space of the $i$th niche by $\Delta r_{g+1,i,j}$ (C.1). As $\Delta r_{g+1,i,j}$ decreases, I decrease the random perturbations in parameters and state values inherited from individual $(\mathbf{p}_{g+1,i,j},\mathbf{x}_{g+1,i,j})$.

Parameters and state values in individuals likely do not initially cover the parameter-state value domain, and they progressively cover less of the domain with successive generations of overlapping-niche descent. I generate random offspring, with random parameters and state values, to continually probe diverse regions of the parameter-state value domain for global minima.

Ultimately, I terminate overlapping-niche descent when the relative change in $r(\mathbf{p},\mathbf{x};\lambda)$ over the $(g+1)$th generation is less than some tolerance, $\varepsilon_{\Delta r}$, in all parent spaces of all niches:
\[
\Delta r_{g+1,i,j} < \varepsilon_{\Delta r}, \quad \forall i \in \{1, 2, \ldots, n_{\mathrm{niche}}\} \text{ and } \forall j \in \{1, 2, \ldots, \hat{n}_i\}. \tag{C.2}
\]

C.2 Defining Descent

$r(\mathbf{p},\mathbf{x};\lambda)$ is a high-dimensional system for models of complex behavior. The likelihood of randomly selecting the parameters and state values of a local minimum of $r(\mathbf{p},\mathbf{x};\lambda)$ decreases with increasing dimensionality, particularly with limited parameter and state value estimates. Thus, I use directed minimization to find local minima of $r(\mathbf{p},\mathbf{x};\lambda)$.

Calculating the gradient of $r(\mathbf{p},\mathbf{x};\lambda)$ requires calculating $n(\mathbf{p}) + n(\mathbf{x})$ partial derivatives, where $n(\mathbf{v})$ denotes the number of elements in the vector $\mathbf{v}$. Thus, an iteration of minimization oriented by the gradient of $r(\mathbf{p},\mathbf{x};\lambda)$ is relatively computationally efficient. Alternatively, $r(\mathbf{p},\mathbf{x};\lambda)$ is minimizable using Newton's method or a variant of Newton's method, such as the Gauss-Newton method. However, beyond calculating partial derivatives, an iteration of Newton's method or a variant of Newton's method requires solving an $(n(\mathbf{p})+n(\mathbf{x})) \times (n(\mathbf{p})+n(\mathbf{x}))$ linear system of equations, which can require from $O\bigl((n(\mathbf{p})+n(\mathbf{x}))^2\bigr)$ operations to solve using an iterative method, such as the generalized minimal residual method, up to $O\bigl((n(\mathbf{p})+n(\mathbf{x}))^3\bigr)$ operations to solve using Gaussian elimination. Thus, the computational time of an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ minimization using Newton's method or a variant of Newton's method increases superlinearly in the number of parameters and state values, whereas the computational time of an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ minimization oriented by its gradient increases linearly in the number of parameters and state values. Thus, in minimizing $r(\mathbf{p},\mathbf{x};\lambda)$, to maintain computational feasibility for complex models with a large number of parameters and state values, I implement minimization oriented by the gradient of $r(\mathbf{p},\mathbf{x};\lambda)$ rather than Newton's method or a variant of Newton's method.
However,individual parameters affect r(p,x;λ) more extensively than individual state values, as a singleparameter spans many differences between differential equation values and finite differencevalues or many differences between observable state values and data values, while a single statevalue spans only several differences between differential equation values and finite differencevalues, several differences between observable state values and data values, and several finitedifference normalization values. Also, as in models with both linear and nonlinear terms,parameters vary in the degree to which they affect differences between differential equationvalues and finite difference values. Thus, partial derivatives of r(p,x;λ) vary dramatically in127C.2. Defining Descentscale, in ways that do not inform distances to local minimum of r(p,x;λ). So, a move downthe gradient of r(p,x;λ), to a lower value of r(p,x;λ), is a leading order move down directionsof affect-dominating variables, with lower order moves down the directions of other variables,requiring many iterations for significant changes in affect-nondominating variables. I seek a moreefficient r(p,x;λ) minimizing direction, one in which directional movement relates to distancefrom a local minimum.For the vector v = (p,x), with nv = n(p) + n(x) elements, the directional derivativeof r(v;λ), in the direction of the Hadamard product between some gradient scaling vectors = (s1, s2, . . . , snv) and the negative gradient of r(v;λ), −s ◦ ∇r(v;λ) = −(s1 · ∂v1r(v;λ),s2 · ∂v2r(v;λ), . . . , snv · ∂vnv r(v;λ)), is given by−nv∑i=1si(∂r(v;λ)∂vi)2.Thus, if s is a vector with positive elements, then r(v;λ) decreases in the direction of −s◦∇r(v;λ),at all points where ∇r(v;λ) 6= 0. As such, positive elements of s can be chosen to orient a moreefficient minimization direction than down the gradient of r(v;λ). Ideally, elements of s shouldbe chosen such that∣∣si · ∂vir(v;λ)∣∣ is the distance along vi to the nearest local minimum ofr(v;λ). Then, a single step in the direction of −s ◦ ∇r(v;λ) would lead directly to the nearestlocal minimum of r(v;λ). In practice, such distances are unknown, but can be approximated. Alocal minimum of a function occurs at a point where all partial derivatives of the function arezero. For improved efficiency over gradient descent, the computation burden of approximatingdistances to a local minimum of r(v;λ) should not exceed the amount of computation requiredfor a commensurate number of gradient descent iterations. Rather than a computationallyexpensive search for the location where all partial derivatives of r(v;λ) are zero, I more coarselyand computationally efficiently approximate the distance along vi to the point where all partialderivatives of r(v;λ) are zero as the distance to the zero of the linearization of ∂vir(v;λ), withall variables constant but vi, equivalent to the one-dimensional Newton-method approximatedistance to a zero of ∂vir(v;λ) along vi:d˜i =∣∣∣∣∂r(v;λ)∂vi∣∣∣∣ ∣∣∣∣∂2r(v;λ)∂v2i∣∣∣∣−1 . (C.3)Near a local minimum of r(v;λ), d˜i fairly accurately estimates the distance to the local minimumalong vi, as r(v;λ) is continuous. Far from a local minimum of r(v;λ), d˜i does not accuratelyestimate the distance to the local minimum along vi, but does inform the scale of the distanceto the local minimum along vi better than the value of ∂vir(v;λ) alone. Near a critical pointof r(v;λ), if ∂vivir(v;λ) > 0, then d˜i is an approximate distance to a local minimum of r(v;λ)along vi. 
For i such that ∂vivir(v;λ) > 0, I calculate the ith element of the gradient scaling128C.2. Defining Descentvector s, si, by equating∣∣si · ∂vir(v;λ)∣∣ and d˜i. Thus,si =∂2r(v;λ)∂v2i−1. (C.4)Near a critical point of r(v;λ), if ∂vivir(v;λ) < 0, then d˜i is an approximate distance to a localmaximum of r(v;λ) along vi. For i such that ∂vivir(v;λ) ≤ 0, I choose si to be the search scalefrom the previous iteration of descent, rather than calculating si by equating∣∣si · ∂vir(v;λ)∣∣and d˜i, which does not reflect the scale of the distance to a local minimum of r(v;λ) along viand could impede descent. For the first iteration of descent, if ∂vivir(v;λ) ≤ 0, I choose thevalue of si to be si,0, some very small value or zero. It may be beneficial to assign a nonzerovalue to si,0, under certain restrictions on variables, as discussed in Section C.2.3. If si = 0 and∂vir(v;λ) 6= 0, then near a local minimum of r(v;λ), descent in other variables should lead to anew set of variable values where ∂vivir(v;λ) > 0, and thus si > 0. If descent is not possible or adescent sequence does not lead to a set of variable values where ∂vivir(v;λ) > 0, then, near alocal minimum of r(v;λ), some crossover and mutation event should eventually produce a setof variable values where ∂vivir(v;λ) > 0, allowing unperturbed local minimization to continue.Collectively, denoting vj and sj as the vectors v and s in the jth iteration of descent, with ithelements vi,j and si,j , I define si,j , for j ≥ 1 and i ∈ {1, 2, . . . , nv}, such thatsi,j =∂2r(vj ;λ)∂v2i,j−1if∂2r(vj ;λ)∂v2i,j> 0si,j−1 if∂2r(vj ;λ)∂v2i,j≤ 0.(C.5)Thus, I orient r(vj ;λ) descent in the direction ofv↓j = −sj ◦ ∇r(vj ;λ) = −(s1,j∂r(vj ;λ)∂v1,j, s2,j∂r(vj ;λ)∂v2,j, . . . , snv ,j∂r(vj ;λ)∂vnv ,j). (C.6)Because of rough distance approximations to a local minimum of r(v;λ), an extra scalingparameter of v↓j , σj , is required to fine-tune descent. Thus, r(v;λ) descent occurs by moving awayfrom the point vj , in the direction of σjv↓j , to the new point vj+σjv↓j . σj must be chosen such thatr(vj+σjv↓j ;λ) < r(vj ;λ), and should be chosen large enough to avoid an excess number of descentiterations. To choose σj , I begin with a value of σj = 1. If r(vj+2v↓j ;λ) < r(vj+v↓j ;λ) < r(vj ;λ),I expand σj , continuing to double σj until r(vj + 2σjv↓j ;λ) ≥ r(vj + σjv↓j ;λ). If, however,r(vj + v↓j ;λ) ≥ r(vj ;λ) or r(vj + 2v↓j ;λ) ≥ r(vj + v↓j ;λ), I contract σj , continuing to half σjuntil r(vj + 2−1σjv↓j ;λ) ≥ r(vj + σjv↓j ;λ) < r(vj ;λ) or until σj contracts below some specifiedtolerance εσ.129C.2. Defining DescentDescent maps vi,j to vdi,j :vdi,j = vi,j − σjsi,j∂r(vj ;λ)∂vi,j. (C.7)Under the variable scalingvˆi,j = s−1/2i,j vi,j ,∀i ∈ {1, 2, . . . , nv}, (C.8)Equation (C.7) becomesvdi,j = s1/2i,j vˆi,j − σjsi,js−1/2i,j∂r(vˆj ;λ)∂vˆi,j= s1/2i,j(vˆi,j − σj ∂r(vˆj ;λ)∂vˆi,j), (C.9)as∂r(vj ;λ)∂vi,j=∂r(vˆj ;λ)∂vˆi,jdvˆi,jdvi,j= s−1/2i,j∂r(vˆj ;λ)∂vˆi,j. (C.10)Thus, a step of descent is equivalent to a step of gradient descent under the variable scaling inequation (C.8).C.2.2 Descent AccelerationWith very little extra computational burden, Nesterov’s method significantly increases theconvergence rate of gradient descent in functional minimization [50]. In Nesterov’s method,movement from the point of values in the jth iteration, along the direction of the change invalues over the jth iteration, generates an intermediary point of values. 
Descent maps $v_{i,j}$ to $v^d_{i,j}$:
\[
v^d_{i,j} = v_{i,j} - \sigma_j s_{i,j}\frac{\partial r(\mathbf{v}_j;\lambda)}{\partial v_{i,j}}. \tag{C.7}
\]
Under the variable scaling
\[
\hat{v}_{i,j} = s_{i,j}^{-1/2}\,v_{i,j}, \quad \forall i \in \{1, 2, \ldots, n_v\}, \tag{C.8}
\]
equation (C.7) becomes
\[
v^d_{i,j} = s_{i,j}^{1/2}\hat{v}_{i,j} - \sigma_j s_{i,j} s_{i,j}^{-1/2}\frac{\partial r(\hat{\mathbf{v}}_j;\lambda)}{\partial \hat{v}_{i,j}} = s_{i,j}^{1/2}\left(\hat{v}_{i,j} - \sigma_j\frac{\partial r(\hat{\mathbf{v}}_j;\lambda)}{\partial \hat{v}_{i,j}}\right), \tag{C.9}
\]
as
\[
\frac{\partial r(\mathbf{v}_j;\lambda)}{\partial v_{i,j}} = \frac{\partial r(\hat{\mathbf{v}}_j;\lambda)}{\partial \hat{v}_{i,j}}\frac{d\hat{v}_{i,j}}{dv_{i,j}} = s_{i,j}^{-1/2}\frac{\partial r(\hat{\mathbf{v}}_j;\lambda)}{\partial \hat{v}_{i,j}}. \tag{C.10}
\]
Thus, a step of descent is equivalent to a step of gradient descent under the variable scaling in equation (C.8).

C.2.2 Descent Acceleration

With very little extra computational burden, Nesterov's method significantly increases the convergence rate of gradient descent in functional minimization [50]. In Nesterov's method, movement from the point of values in the $j$th iteration, along the direction of the change in values over the $j$th iteration, generates an intermediary point of values. Then, movement from the intermediary point of values, down the gradient of the functional, generates the point of values in the $(j+1)$th iteration. Although Nesterov's method converges to a local minimum of a functional, it is not a descent method, as the functional value may increase during some iterations. Restarting Nesterov's method when functional values would increase during an iteration ensures descent during every iteration and accelerates the method's rate of convergence [51]. To minimize $r(\mathbf{v};\lambda)$, I apply Nesterov's method with increasing-functional restart from [51], with variable scaling, strict descent, and termination-tolerance restart modifications:
\begin{align}
\theta_j &= \begin{cases} 1 & \text{if } j = 1 \text{ or } \delta_j = 0 \text{ or } \tau_j = 1,\\ \theta_{j-1}\bigl(-\theta_{j-1} + (\theta_{j-1}^2 + 4)^{1/2}\bigr)/2 & \text{otherwise}, \end{cases} \tag{C.11a}\\
\beta_j &= \begin{cases} 0 & \text{if } j = 1 \text{ or } \delta_j = 0 \text{ or } \tau_j = 1,\\ \theta_{j-1}(1-\theta_{j-1})/(\theta_{j-1}^2 + \theta_j) & \text{otherwise}, \end{cases} \tag{C.11b}\\
\mathbf{v}_j &= \begin{cases} \mathbf{u}_j & \text{if } j = 1 \text{ or } \delta_j = 0 \text{ or } \tau_j = 1,\\ \mathbf{u}_j + \beta_j(\mathbf{u}_j - \mathbf{u}_{j-1}) & \text{otherwise}, \end{cases} \tag{C.11c}\\
\mathbf{u}_{j+1} &= \begin{cases} \mathbf{v}_j + \sigma_j\mathbf{v}^{\downarrow}_j & \text{if } r(\mathbf{v}_j + \sigma_j\mathbf{v}^{\downarrow}_j;\lambda) < r(\mathbf{u}_j;\lambda),\\ \mathbf{u}_j & \text{otherwise}, \end{cases} \tag{C.11d}\\
\delta_{j+1} &= \begin{cases} 1 & \text{if } r(\mathbf{u}_{j+1};\lambda) < r(\mathbf{u}_j;\lambda),\\ 0 & \text{otherwise}, \end{cases} \tag{C.11e}\\
\tau_{j+1} &= \begin{cases} 1 & \text{if } \sigma_j < \varepsilon_\sigma \text{ or } \bigl(r(\mathbf{u}_j;\lambda) - r(\mathbf{u}_{j+1};\lambda)\bigr)/r(\mathbf{u}_j;\lambda) < \varepsilon_r,\\ 0 & \text{otherwise}, \end{cases} \tag{C.11f}
\end{align}
where $\mathbf{u}_j$ is the vector of parameter and state values in iteration $j \geq 1$; $\theta_j$ tempers the iterational increase from $0$ to $1$ in $\beta_j$, the proportion along the iterational change in $\mathbf{u}_j$ that generates the intermediary point $\mathbf{v}_j$; $\delta_j$ is the strict-descent indicator; and $\tau_j$ is the termination-tolerance indicator. Accelerated descent begins with scaled gradient descent and restarts if either strict descent does not occur, $\delta_j = 0$, or a termination tolerance is met, $\tau_j = 1$. A termination tolerance is met if either the fine-tuning scaling parameter, $\sigma_j$, contracts below the specified tolerance, $\varepsilon_\sigma$, or the iterational relative change in $r(\mathbf{u}_j;\lambda)$ falls below the specified tolerance, $\varepsilon_r$. Ultimately, accelerated descent terminates in iteration $j$ if a termination tolerance is met after restart, $\beta_{j-1} = 0$ and $\tau_j = 1$, during an iteration of scaled gradient descent. Alternatively, accelerated descent terminates in iteration $j$ if the number of strict descent iterations reaches some specified maximum number of strict descent iterations, $n_{\max}$, which occurs when $\sum_{i=2}^{j}\delta_i = n_{\max}$.

To maintain momentum in convergence over generations of accelerated descent, an individual that occupies a parent space in a niche begins accelerated descent with $\theta_{j-1}$ and $\mathbf{u}_{j-1}$ from its last iteration of accelerated descent in the previous generation, rather than beginning accelerated descent with a restart. To overcome stagnating convergence, for an individual that occupies a parent space in a niche, I extend momentum from its last iteration of accelerated descent to its last generation of accelerated descent, in a high-momentum offspring. Thus, the individual that occupies the $l$th high-momentum offspring space in the $k$th niche of the $g$th generation begins accelerated descent with $\theta_{j-1}$ from its last iteration of accelerated descent in the $(g-1)$th generation, and begins accelerated descent with $\mathbf{u}_{j-1} = (\mathbf{p}_{g-1,k,l},\mathbf{x}_{g-1,k,l})$, the parameters and state values of the individual that occupies the $l$th parent space in the $k$th niche of the $(g-1)$th generation.
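A sketch of scheme (C.11) follows, with the scaled-gradient line-search step of Section C.2.1 abstracted into a callable `step`. All names are my own, and this is a sketch under those assumptions, not the dissertation's implementation; in particular, the momentum carry-over between generations and the prolongation of Section C.3 are omitted.

\begin{verbatim}
import numpy as np

def accelerated_descent(step, r, u0, n_max=500, eps_r=1e-10):
    """Nesterov-style accelerated descent with restart, after scheme (C.11).
    `step(v)` returns (trial_point, sigma_ok): one scaled-gradient line-search
    step from v, with sigma_ok False when sigma fell below its tolerance."""
    u_prev, u = u0, u0
    theta_prev, restart, descents = 1.0, True, 0
    while descents < n_max:
        if restart:                                   # (C.11a)-(C.11b), restart branch
            theta, beta = 1.0, 0.0
        else:
            theta = theta_prev * (-theta_prev + np.sqrt(theta_prev ** 2 + 4.0)) / 2.0
            beta = theta_prev * (1.0 - theta_prev) / (theta_prev ** 2 + theta)
        v = u + beta * (u - u_prev)                   # intermediary point (C.11c)
        trial, sigma_ok = step(v)
        u_next = trial if sigma_ok and r(trial) < r(u) else u      # (C.11d)
        delta = r(u_next) < r(u)                      # strict-descent indicator (C.11e)
        tau = (not sigma_ok) or (r(u) - r(u_next)) / r(u) < eps_r  # tolerance (C.11f)
        if tau and restart:
            return u_next   # tolerance met during an iteration of scaled gradient descent
        u_prev, u = u, u_next
        theta_prev = theta
        restart = (not delta) or tau
        descents += int(delta)
    return u
\end{verbatim}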
C.2.3 Descent on Restricted Domains

When parameters and states are defined on a restricted domain, accelerated descent trajectories must be confined to the restricted domain. Generally, accelerated descent would restart when $\mathbf{u}_j$ or $\mathbf{v}_j$ would leave the restricted domain, and the gradient scaling vector, $\mathbf{s}_j$, with scaling parameter, $\sigma_j$, would be chosen to ensure that $\mathbf{u}_{j+1}$ remains within the restricted domain. Less onerously, when the restricted domain is a closed convex set, I can simply project accelerated descent trajectories onto the restricted domain.

From the Bourbaki-Cheney-Goldstein inequality, for some point $\mathbf{x}$ and a closed convex set $C$, $\langle \mathbf{x} - P_C(\mathbf{x}), P_C(\mathbf{x}) - \mathbf{y}\rangle \geq 0$ for all $\mathbf{y} \in C$, where
\[
P_C(\mathbf{u}) = \arg\min\{\|\mathbf{v}-\mathbf{u}\| : \mathbf{v} \in C\} \tag{C.12}
\]
and $\langle\cdot,\cdot\rangle$ denotes the standard inner product. Thus, for a functional $f(\mathbf{u})$, $\mathbf{x}_0 \in C$, and $\mathbf{x} = \mathbf{x}_0 - \sigma\nabla f(\mathbf{x}_0)$ for some $\sigma > 0$, if $P_C(\mathbf{x}) \neq \mathbf{x}_0$, then
\begin{align}
\langle \mathbf{x} - P_C(\mathbf{x}), P_C(\mathbf{x}) - \mathbf{x}_0\rangle \geq 0 &\iff \langle \sigma\nabla f(\mathbf{x}_0) + P_C(\mathbf{x}) - \mathbf{x}_0, P_C(\mathbf{x}) - \mathbf{x}_0\rangle \leq 0\notag\\
&\implies \langle \nabla f(\mathbf{x}_0), P_C(\mathbf{x}) - \mathbf{x}_0\rangle \leq -\sigma^{-1}\langle P_C(\mathbf{x}) - \mathbf{x}_0, P_C(\mathbf{x}) - \mathbf{x}_0\rangle < 0\notag\\
&\implies \langle \nabla f(\mathbf{x}_0), P_C(\mathbf{x}_0 - \sigma\nabla f(\mathbf{x}_0)) - \mathbf{x}_0\rangle < 0, \tag{C.13}
\end{align}
and, if $P_C(\mathbf{x}) = \mathbf{x}_0$, then
\[
\langle \mathbf{x} - \mathbf{x}_0, \mathbf{x}_0 - \mathbf{y}\rangle \geq 0\ \forall\mathbf{y}\in C \iff \langle -\sigma\nabla f(\mathbf{x}_0), \mathbf{x}_0 - \mathbf{y}\rangle \geq 0\ \forall\mathbf{y}\in C \iff \langle \nabla f(\mathbf{x}_0), \mathbf{y} - \mathbf{x}_0\rangle \geq 0\ \forall\mathbf{y}\in C. \tag{C.14}
\]
Inequality (C.13) implies that the non-invariant projection of a gradient descent trajectory onto a closed convex set is a strictly decreasing direction of a functional. Inequality (C.14) implies that a functional does not decrease locally in a closed convex set when the projection of a gradient descent trajectory onto the closed convex set is invariant. Therefore, the projection of gradient descent trajectories onto a closed convex set preserves convergence to a local minimum of a functional in the closed convex set.

The projection of a scaled descent trajectory, $\mathbf{v}_j + \sigma_j\mathbf{v}^{\downarrow}_j$, onto a closed convex set is not necessarily a decreasing direction of $r(\mathbf{v};\lambda)$. However, from equation (C.9), scaled descent is equivalent to gradient descent under the variable transformation in equation (C.8). Therefore, to ensure descent, when parameters and states are restricted to a closed convex set, I project scaled descent trajectories onto the restricted domain under the variable transformation in equation (C.8). I note that, if $s_{i,j} = 0$ for some variable, $v_i$, the projection of a scaled descent trajectory onto the restricted domain may not be defined under the variable transformation in equation (C.8); in such cases, assigning $s_{i,0}$ some very small, positive value ensures that $s_{i,j} \neq 0$. Additionally, to confine the accelerated descent intermediary points, $\mathbf{v}_j$, to the restricted domain, I project the intermediary points onto the restricted domain. Accelerated descent restarts upon reaching a termination tolerance; thus, accelerated descent ultimately terminates during an iteration of scaled gradient descent, preserving convergence to a local minimum of $r(\mathbf{v};\lambda)$ on a restricted domain.

Often, a closed convex set $C$ is generated from the intersection of $n_c$ simpler closed convex sets $C_1, C_2, \ldots, C_{n_c}$. In such cases, I employ Dykstra's method [19] to calculate the projection of a point $\mathbf{x}$ onto $C = C_1 \cap C_2 \cap \cdots \cap C_{n_c}$:
\begin{align}
\mathbf{x}^j_i &= \begin{cases} \mathbf{x} & \text{if } i = 0,\\ P_{C_j}(\mathbf{x}^{n_c}_{i-1} - \mathbf{d}^j_{i-1}) & \text{if } i > 0 \text{ and } j = 1,\\ P_{C_j}(\mathbf{x}^{j-1}_i - \mathbf{d}^j_{i-1}) & \text{if } i > 0 \text{ and } j > 1, \end{cases} \tag{C.15a}\\
&\quad\text{where } P_{C_j}(\mathbf{u}) = \arg\min\{\|\mathbf{v}-\mathbf{u}\| : \mathbf{v} \in C_j\}, \text{ and} \tag{C.15b}\\
\mathbf{d}^j_i &= \begin{cases} \mathbf{0} & \text{if } i = 0,\\ \mathbf{x}^j_i - (\mathbf{x}^{n_c}_{i-1} - \mathbf{d}^j_{i-1}) & \text{if } i > 0 \text{ and } j = 1,\\ \mathbf{x}^j_i - (\mathbf{x}^{j-1}_i - \mathbf{d}^j_{i-1}) & \text{if } i > 0 \text{ and } j > 1, \end{cases} \tag{C.15c}
\end{align}
for $i \in \{0, 1, 2, \ldots\}$ and $j \in \{1, 2, \ldots, n_c\}$. $\|\mathbf{x}^j_i - P_C(\mathbf{x})\| \to 0$ as $i \to \infty$, and $\mathbf{x}^j_i$ converges monotonically to $P_C(\mathbf{x})$ in $i$ and $j$ [6]. Commonly, a linear inequality may restrict points to a closed half-space. When all $C_j$ are closed half-spaces, the sequence $\{\mathbf{x}^{n_c}_i\}$ converges linearly to $P_C(\mathbf{x})$ [16]. I terminate Dykstra's method when $|x^{n_c}_{i,k} - x^{n_c}_{i-1,k}| < \varepsilon_c|x^{n_c}_{i-1,k}|$ or $|x^{n_c}_{i,k} - x^{n_c}_{i-1,k}| < \varepsilon_{\bar{c}}$ for all elements $x^{n_c}_{i,k}$ of $\mathbf{x}^{n_c}_i$, given some relative termination tolerance $\varepsilon_c > 0$ and some absolute termination tolerance $\varepsilon_{\bar{c}} > 0$.
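The following is a minimal sketch of Dykstra's method (C.15), given a list of single-set projection callables such as nonnegativity and half-space projections; all names are my own, and the example constraint set is illustrative.

\begin{verbatim}
import numpy as np

def dykstra(x, projs, eps_rel=1e-10, eps_abs=1e-14, max_iter=10000):
    """Project x onto the intersection of closed convex sets, given single-set
    projections `projs`, by Dykstra's method (C.15) with the combined
    relative/absolute termination criterion."""
    x = x.astype(float).copy()
    d = [np.zeros_like(x) for _ in projs]        # correction terms d_i^j
    for _ in range(max_iter):
        x_old = x.copy()
        for j, P in enumerate(projs):
            y = P(x - d[j])
            d[j] = y - (x - d[j])
            x = y
        if np.all(np.abs(x - x_old) < np.maximum(eps_rel * np.abs(x_old), eps_abs)):
            break
    return x

# example: project onto {x >= 0} intersected with the half-space {sum(x) <= 1}
nonneg = lambda z: np.maximum(z, 0.0)
def halfspace(z, a=np.ones(3), b=1.0):           # P_{C}(z) for C = {z : a.z <= b}
    s = a @ z - b
    return z - (max(s, 0.0) / (a @ a)) * a
print(dykstra(np.array([0.8, 0.7, -0.2]), [nonneg, halfspace]))
\end{verbatim}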
C.3 Descent Prolongation

Accelerated descent terminates if the number of strict descent iterations reaches $n_{\max}$. Individuals occupying parent spaces in niches have been descending for multiple generations of overlapping-niche descent. Thus, for $n_{\max}$ not exceedingly large, an individual that occupies an offspring space in some niche may terminate accelerated descent with a higher value of $r(\mathbf{v};\lambda)$ than an individual that occupies a parent space in the same niche, even though the individual that occupies the offspring space could ultimately converge to a lower value of $r(\mathbf{v};\lambda)$ than the individual that occupies the parent space. In preliminary tests with randomly generated parameter and state values, I have found that, following initial, rapid sublinear convergence, accelerated descent trajectories generally converge linearly or superlinearly in the periods between accelerated descent restarts. I prolong accelerated descent for an individual that converges linearly or superlinearly to a value of $r(\mathbf{v};\lambda)$ below the least value of $r(\mathbf{v};\lambda)$ in its niche.

Theorem 6. For a strictly decreasing sequence $(b_k)_{k=0}^{\infty}$ converging linearly or superlinearly to $b$, if
\[
b_{n+2m} < \tilde{b} + \frac{(b_{n+m} - \tilde{b})^2}{b_n - \tilde{b}}, \tag{C.16}
\]
then $b$ may be less than $\tilde{b}$, and if
\[
b_{n+2m} > \tilde{b} + \frac{(b_{n+m} - \tilde{b})^2}{b_n - \tilde{b}}, \tag{C.17}
\]
then $b > \tilde{b}$, for $m \in \mathbb{N}$. Also, if $b > \tilde{b}$ and
\[
\frac{b_{n+m} - b}{b_n - b} < \rho, \tag{C.18}
\]
for some $\rho \in (0, 1/2)$, then there exists some $N \in \mathbb{N}$ such that
\[
b_{n+2m} > \tilde{b} + \frac{(b_{n+m} - \tilde{b})^2}{b_n - \tilde{b}} \tag{C.19}
\]
for all $n > N$.

Proof. For a strictly decreasing sequence $(a_k)_{k=0}^{\infty}$ that converges linearly to $a$,
\[
a_{n+1} - a = \mu(a_n - a), \tag{C.20}
\]
for some $\mu$ with $0 < \mu < 1$. Thus, for $m \in \mathbb{N}$,
\[
a_{n+m} - a = \mu(a_{n+m-1} - a) = \mu^2(a_{n+m-2} - a) = \cdots = \mu^m(a_n - a), \tag{C.21}
\]
from which it follows that
\[
a_{n+2m} - a = \mu^{2m}(a_n - a) = \left(\frac{a_{n+m} - a}{a_n - a}\right)^2(a_n - a) = \frac{(a_{n+m} - a)^2}{a_n - a} \iff a_{n+2m} = a + \frac{(a_{n+m} - a)^2}{a_n - a}. \tag{C.22}
\]
Thus, for a strictly decreasing sequence $(b_k)_{k=0}^{\infty}$,
\[
\tilde{b}_{n+2m} = \tilde{b} + \frac{(b_{n+m} - \tilde{b})^2}{b_n - \tilde{b}} \tag{C.23}
\]
is the linear-convergence estimate of $b_{n+2m}$, with estimate, $\tilde{b}$, of the limit of the sequence, $b$. Differentiating,
\[
\frac{\partial \tilde{b}_{n+2m}}{\partial \tilde{b}} = \left(\frac{b_n - b_{n+m}}{b_n - \tilde{b}}\right)^2 > 0. \tag{C.24}
\]
Thus, if $(b_k)_{k=0}^{\infty}$ is converging linearly to $b < \tilde{b}$, then
\[
b_{n+2m} = b + \frac{(b_{n+m} - b)^2}{b_n - b} < \tilde{b} + \frac{(b_{n+m} - \tilde{b})^2}{b_n - \tilde{b}} = \tilde{b}_{n+2m}, \tag{C.25}
\]
from which it follows that, if $(b_k)_{k=0}^{\infty}$ is converging superlinearly to $b < \tilde{b}$, then $b_{n+2m} < \tilde{b}_{n+2m}$ (C.26); if $(b_k)_{k=0}^{\infty}$ is converging linearly to $b > \tilde{b}$, then $b_{n+2m} > \tilde{b}_{n+2m}$ (C.27), from which it follows that, if $(b_k)_{k=0}^{\infty}$ is converging sublinearly to $b > \tilde{b}$, then $b_{n+2m} > \tilde{b}_{n+2m}$ (C.28).

For $(b_k)_{k=0}^{\infty}$ converging superlinearly to $b > \tilde{b}$, I determine when $b > \tilde{b}_{n+2m}$, which implies that $b_{n+2m} \geq b > \tilde{b}_{n+2m}$. For $\delta = b - \tilde{b}$ and $\varepsilon_n = b_n - b$,
\[
b > \tilde{b}_{n+2m} = \tilde{b} + \frac{(b_{n+m} - \tilde{b})^2}{b_n - \tilde{b}} \iff \varepsilon_{n+m}^2 + 2\delta\varepsilon_{n+m} - \varepsilon_n\delta < 0 \iff \varepsilon_{n+m} \in \bigl(-\delta - \sqrt{\delta^2 + \varepsilon_n\delta},\ -\delta + \sqrt{\delta^2 + \varepsilon_n\delta}\bigr). \tag{C.29}
\]
As $\varepsilon_{n+m} \geq 0$, relation (C.29) holds, for $\varepsilon_n > 0$, if and only if
\[
\varepsilon_{n+m} < -\delta + \sqrt{\delta^2 + \varepsilon_n\delta} \iff \frac{\varepsilon_{n+m}}{\varepsilon_n} < \frac{-\delta + \sqrt{\delta^2 + \varepsilon_n\delta}}{\varepsilon_n}, \tag{C.30}
\]
which necessarily holds for large enough $m$, as $\lim_{m\to\infty}\varepsilon_{n+m} = 0$.
Differentiating the right side of (C.30) with respect to $\varepsilon_n$,
\[
\frac{\partial}{\partial\varepsilon_n}\left(\frac{-\delta+\sqrt{\delta^2+\varepsilon_n\delta}}{\varepsilon_n}\right) = \frac{-(2\delta^2 + \varepsilon_n\delta) + 2\delta\sqrt{\delta^2+\varepsilon_n\delta}}{2\varepsilon_n^2\sqrt{\delta^2+\varepsilon_n\delta}}, \tag{C.31a}
\]
and, as
\[
(2\delta^2 + \varepsilon_n\delta)^2 = 4\delta^4 + 4\varepsilon_n\delta^3 + \varepsilon_n^2\delta^2 > 4\delta^4 + 4\varepsilon_n\delta^3 = \bigl(2\delta\sqrt{\delta^2+\varepsilon_n\delta}\bigr)^2 \implies 2\delta^2 + \varepsilon_n\delta > 2\delta\sqrt{\delta^2+\varepsilon_n\delta} \implies \frac{\partial}{\partial\varepsilon_n}\left(\frac{-\delta+\sqrt{\delta^2+\varepsilon_n\delta}}{\varepsilon_n}\right) < 0. \tag{C.31b}
\]
By L'H\^opital's rule,
\[
\lim_{\varepsilon_n\to 0^+}\frac{-\delta+\sqrt{\delta^2+\varepsilon_n\delta}}{\varepsilon_n} = \lim_{\varepsilon_n\to 0^+}\frac{\delta}{2\sqrt{\delta^2+\varepsilon_n\delta}} = \frac{1}{2}, \tag{C.32}
\]
\[
\lim_{\varepsilon_n\to\infty}\frac{-\delta+\sqrt{\delta^2+\varepsilon_n\delta}}{\varepsilon_n} = \lim_{\varepsilon_n\to\infty}\frac{\delta}{2\sqrt{\delta^2+\varepsilon_n\delta}} = 0, \tag{C.33}
\]
and, with equation (C.31b),
\[
0 < \frac{-\delta+\sqrt{\delta^2+\varepsilon_0\delta}}{\varepsilon_0} < \cdots < \frac{-\delta+\sqrt{\delta^2+\varepsilon_n\delta}}{\varepsilon_n} < \frac{-\delta+\sqrt{\delta^2+\varepsilon_{n+1}\delta}}{\varepsilon_{n+1}} < \cdots < \frac{1}{2}. \tag{C.34}
\]
Thus, for any $\rho \in (0,1/2)$ and $m$ large enough that $\varepsilon_{n+m}/\varepsilon_n < \rho$, there exists some $N \in \mathbb{N}$ such that inequality (C.30) holds for all $n > N$, which implies that $b_{n+2m} > \tilde{b}_{n+2m}$ for all $n > N$.

Example 1. For $m$ such that $3\varepsilon_{n+m} < \varepsilon_n$ and $N$ such that $n > N$ implies that $\varepsilon_n < 3\delta$,
\[
\frac{\varepsilon_{n+m}}{\varepsilon_n} < \frac{1}{3} = \frac{-\delta+\sqrt{\delta^2+3\delta^2}}{3\delta} < \frac{-\delta+\sqrt{\delta^2+\varepsilon_n\delta}}{\varepsilon_n}, \tag{C.35}
\]
for $n > N$, which implies that $b_{n+2m} > \tilde{b}_{n+2m}$ for $n > N$.

Collectively, if $(b_k)_{k=0}^{\infty}$ converges linearly or superlinearly to $b < \tilde{b}$, then $b_{n+2m} < \tilde{b}_{n+2m}$ for all $m, n \in \mathbb{N}$; if $(b_k)_{k=0}^{\infty}$ converges to $b > \tilde{b}$, then, for sufficiently, but not exceedingly, large $m \in \mathbb{N}$, $b_{n+2m} > \tilde{b}_{n+2m}$ for all $n \in \mathbb{N}$ greater than some $N$. Therefore, for $(b_k)_{k=0}^{\infty}$ converging linearly or superlinearly to $b$, if $b_{n+2m} < \tilde{b}_{n+2m}$, then $b$ may be less than $\tilde{b}$, and if $b_{n+2m} > \tilde{b}_{n+2m}$, then $b$ is greater than $\tilde{b}$.

For $\breve{r}_\lambda$, the least value of $r(\mathbf{v};\lambda)$ amongst all individuals inhabiting the environment at the onset of accelerated descent, I prolong accelerated descent if $\lim_{j\to\infty} r(\mathbf{u}_j;\lambda)$ may be less than $\breve{\sigma}\breve{r}_\lambda$, where I choose $\breve{\sigma} \in [0,1]$ to specify the stringency of prolongation. Thus, if the number of strict descent iterations has reached $n_{\max}$ and $r(\mathbf{u}_j;\lambda) > \breve{r}_\lambda$, following Theorem 6, I prolong accelerated descent, up to $\hat{n}_{\mathrm{pro}}$ strict descent iterations, if
\[
r(\mathbf{u}_j;\lambda) < \breve{\sigma}\breve{r}_\lambda + \frac{\bigl(r(\mathbf{u}_{j-m_{\mathrm{pro}}};\lambda) - \breve{\sigma}\breve{r}_\lambda\bigr)^2}{r(\mathbf{u}_{j-2m_{\mathrm{pro}}};\lambda) - \breve{\sigma}\breve{r}_\lambda} \tag{C.36}
\]
or if a restart occurred between iterations $j - 2m_{\mathrm{pro}}$ and $j$, for some sufficiently, but not exceedingly, large chosen value of $m_{\mathrm{pro}}$. If $r(\mathbf{u}_j;\lambda) \leq \breve{r}_\lambda$ and $r(\mathbf{u}_{j-1};\lambda) > \breve{r}_\lambda$, I prolong accelerated descent by $\check{n}_{\mathrm{pro}}$ strict descent iterations, to remove bias from an uneven number of strict descent iterations in selection, when comparing individuals with values of $r(\mathbf{v};\lambda)$ that fall below $\breve{r}_\lambda$.
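The prolongation test (C.36) reduces to a three-point comparison on the descent history. A sketch follows (names are mine); `r_hist` must hold at least $2m_{\mathrm{pro}} + 1$ iterates, and the restart-between-iterations branch is left to the caller.

\begin{verbatim}
def prolong(r_hist, m, r_best, stringency=1.0):
    """Prolongation test (C.36): prolong accelerated descent when the
    linear-convergence extrapolation of r(u_j; lambda) may undershoot the
    stringency-scaled environment best, stringency * r_best."""
    r_j, r_jm, r_j2m = r_hist[-1], r_hist[-1 - m], r_hist[-1 - 2 * m]
    target = stringency * r_best
    return r_j < target + (r_jm - target) ** 2 / (r_j2m - target)
\end{verbatim}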
Appendix D

Computational Complexities

Here, I calculate and compare the computational complexities of $r(\mathbf{p},\mathbf{x};\lambda)$ descent and a variety of numerical-integration-based methods.

D.1 Computational Complexity of $r(\mathbf{p},\mathbf{x};\lambda)$ Descent

D.1.1 Formulation of $r(\mathbf{p},\mathbf{x};\lambda)$ for Counting

Descent is the most computationally intensive portion of overlapping-niche descent. Here, I count the computational complexity required for an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent. I consider $r(\mathbf{p},\mathbf{x};\lambda)$ in the form
\[
r(\mathbf{p},\mathbf{x};\lambda) = \frac{1-\lambda}{n_y}\sum_{j=1}^{n_y}\sum_{k=1}^{n_t} d_{y_{j,k}}\bigl(g_{j,k}(\mathbf{p},\mathbf{x})\bigr) + \frac{(1-\lambda)^2}{n_y}\sum_{j=1}^{n_y}\sum_{k\in I_{\hat{y}}} d_{\hat{y}_{j,k}}\bigl(g_{j,k}(\mathbf{p},\mathbf{x})\bigr) + \frac{\lambda}{n_x}\sum_{i=1}^{n_x} h_i(\mathbf{x})\sum_{k\in I_\Delta} d_{\Delta_{i,k}}\bigl(f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})\bigr), \tag{D.1}
\]
where $g_{j,k}(\mathbf{p},\mathbf{x}) = y_{j,k} - g_j(\mathbf{p}, x_{1,k}, \ldots, x_{n_x,k})$, with difference measures $d_{y_{j,k}}$, $d_{\hat{y}_{j,k}}$, and $d_{\Delta_{i,k}}$, and auxiliary functions $h_i(\mathbf{x})$, which may contain a smoothing penalty and a numerical-method normalization. For example, with $r_y(\mathbf{p},\mathbf{x})$ as defined in Section 2.3, $r_{\hat{y}}(\mathbf{p},\mathbf{x})$ as defined in Section 2.6.1, and $r_{\Delta x}(\mathbf{p},\mathbf{x})$ as defined in Section 2.4.1,
\[
d_{y_{j,k}}(u) = \frac{w_{j,k}}{\sum_{k=1}^{n_t} w_{j,k}\,y_{j,k}^2}\,u^2, \qquad d_{\hat{y}_{j,k}}(u) = \frac{\hat{\sigma}\hat{w}_{j,k}}{\sum_{k\in I_{\hat{y}}}\hat{w}_{j,k}\,\hat{y}_{j,k}^2}\,u^2, \qquad d_{\Delta_{i,k}}(u) = u^2, \qquad h_i(\mathbf{x}) = \frac{s_i(\mathbf{x})}{\sum_{k\in I_\Delta}(\Delta x_{i,k})^2}. \tag{D.2}
\]

D.1.2 Defining Quantities for Counting

Preliminarily, I define quantities for the computational complexity counting of $r(\mathbf{p},\mathbf{x};\lambda)$ descent. I consider computationally simple difference measures, which require $O(1)$ operations to calculate a function value and $O(1)$ operations to calculate a partial derivative value. Thus, calculating
\begin{align}
&d_{y_{j,k}}(u),\ d_{\hat{y}_{j,k}}(u),\ d_{\Delta_{i,k}}(u) &&\text{requires } O(1) \text{ operations},\notag\\
&\partial d_{y_{j,k}}(u),\ \partial d_{\hat{y}_{j,k}}(u),\ \partial d_{\Delta_{i,k}}(u) &&\text{requires } O(1) \text{ operations},\notag\\
&\partial^2 d_{y_{j,k}}(u),\ \partial^2 d_{\hat{y}_{j,k}}(u),\ \partial^2 d_{\Delta_{i,k}}(u) &&\text{requires } O(1) \text{ operations}, \tag{D.3}
\end{align}
for $i \in \{1,\ldots,n_x\}$, $j \in \{1,\ldots,n_y\}$, and $k \in I_\Delta$. I note that $d_{y_{j,k}}(u)$ and $d_{\hat{y}_{j,k}}(u)$ in equation (D.2) are computationally simple after initially calculating and storing $\sum_{k=1}^{n_t} w_{j,k}\,y_{j,k}^2$ and $\sum_{k\in I_{\hat{y}}}\hat{w}_{j,k}\,\hat{y}_{j,k}^2$. On average, $g_{j,k}(\mathbf{p},\mathbf{x})$ requires $n_g$ operations to calculate a function value, $n_{g1}$ operations to calculate a first order partial derivative value, and $n_{g2}$ operations to calculate a second order partial derivative value. Thus, I consider $g_{j,k}(\mathbf{p},\mathbf{x})$ such that calculating
\begin{align}
&g_{j,k}(\mathbf{p},\mathbf{x}) &&\text{requires } O(n_g) \text{ operations},\notag\\
&\partial g_{j,k}(\mathbf{p},\mathbf{x}) &&\text{requires } O(n_{g1}) \text{ operations},\notag\\
&\partial^2 g_{j,k}(\mathbf{p},\mathbf{x}) &&\text{requires } O(n_{g2}) \text{ operations}, \tag{D.4}
\end{align}
for $j \in \{1,\ldots,n_y\}$ and $k \in I_\Delta$. On average, $f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$ requires $n_f$ operations to calculate a function value, $n_{f1}$ operations to calculate a first order partial derivative value, and $n_{f2}$ operations to calculate a second order partial derivative value. Thus, I consider $f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$ such that calculating
\begin{align}
&f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x}) &&\text{requires } O(n_f) \text{ operations},\notag\\
&\partial f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x}) &&\text{requires } O(n_{f1}) \text{ operations},\notag\\
&\partial^2 f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x}) &&\text{requires } O(n_{f2}) \text{ operations}, \tag{D.5}
\end{align}
for $i \in \{1,\ldots,n_x\}$ and $k \in I_\Delta$. Auxiliary functions, $h_i(\mathbf{x})$, modify $\sum_{k\in I_\Delta} d_{\Delta_{i,k}} \circ f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$ and, generally, are computationally simpler than $\sum_{k\in I_\Delta} f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$. Thus, I consider $h_i(\mathbf{x})$ that are no more computationally complex than $\sum_{k\in I_\Delta} f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$, with partial derivatives that are no more computationally complex than the corresponding partial derivatives of $\sum_{k\in I_\Delta} f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$. Calculating $\sum_{k\in I_\Delta} f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$ requires $O(n_f n_\Delta)$ operations, where $n_\Delta$ is the number of elements in $I_\Delta$. $f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$ depends on $x_{l,m}$ for only a small fraction of $k$ in $I_\Delta$, at $k$ in $I_{\Delta m} \subset I_\Delta$. Thus, calculating a first order partial derivative of $\sum_{k\in I_\Delta} f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$ with respect to $x_{l,m}$ requires $O(n_{f1} n_\delta)$ operations, and calculating a second order partial derivative requires $O(n_{f2} n_\delta)$ operations, where $n_\delta$ is the number of elements in $I_{\Delta m}$. Therefore, calculating
\begin{align}
&h_i(\mathbf{x}) &&\text{requires } O(\leq n_f n_\Delta) \text{ operations},\notag\\
&\partial h_i(\mathbf{x}) &&\text{requires } O(\leq n_{f1} n_\delta) \text{ operations},\notag\\
&\partial^2 h_i(\mathbf{x}) &&\text{requires } O(\leq n_{f2} n_\delta) \text{ operations}, \tag{D.6}
\end{align}
for $i \in \{1,\ldots,n_x\}$. On average, discretized differential equation values, $F_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$, require $n_F$ operations to calculate a function value, $n_{F1}$ operations to calculate a first order partial derivative value, and $n_{F2}$ operations to calculate a second order partial derivative value. Thus, I consider $F_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$ such that calculating
\begin{align}
&F_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x}) &&\text{requires } O(n_F) \text{ operations},\notag\\
&\partial F_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x}) &&\text{requires } O(n_{F1}) \text{ operations},\notag\\
&\partial^2 F_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x}) &&\text{requires } O(n_{F2}) \text{ operations}, \tag{D.7}
\end{align}
for $i \in \{1,\ldots,n_x\}$ and $k \in I_\Delta$. In computational complexity counting, I consider $O(n_g)$, $O(n_{g1})$, $O(n_f)$, $O(n_{f1})$, $O(n_{f2})$, $O(n_F)$, $O(n_{F1})$, $O(n_{F2}) \geq O(1)$, and either $n_{g2} = 0$ or $O(n_{g2}) \geq O(1)$.

D.1.3 Counting the Computational Complexity of $r(\mathbf{p},\mathbf{x};\lambda)$ Descent

Theorem 7. An iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent requires
\[
O\bigl(n_\sigma n_\Delta(n_g n_y + n_f n_x)\bigr) + O\bigl(n_\Delta n_p(n_{g1}n_y + n_{f1}n_x + n_{g2}n_y + n_{f2}n_x)\bigr) + O\bigl(n_\Delta n_x(n_{g1}n_y + n_{f1}n_x n_\delta + n_{g2}n_y + n_{f2}n_x n_\delta)\bigr) \tag{D.8}
\]
operations, with $O(n_\sigma)$ line-search test points.
Proof. In each iteration of descent, I calculate values of $r(\mathbf{p},\mathbf{x};\lambda)$. Calculating $r(\mathbf{p},\mathbf{x};\lambda)$, as in equation (D.1), is equivalent in computational complexity to calculating
\begin{align}
&d_{y_{j,k}}\bigl(g_{j,k}(\mathbf{p},\mathbf{x})\bigr) &&\text{for all } (j,k) \in \{1,\ldots,n_y\}\times\{1,\ldots,n_t\},\notag\\
&d_{\hat{y}_{j,k}}\bigl(g_{j,k}(\mathbf{p},\mathbf{x})\bigr) &&\text{for all } (j,k) \in \{1,\ldots,n_y\}\times I_{\hat{y}},\notag\\
&h_i(\mathbf{x}) &&\text{for all } i \in \{1,\ldots,n_x\},\notag\\
&d_{\Delta_{i,k}}\bigl(f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})\bigr) &&\text{for all } (i,k) \in \{1,\ldots,n_x\}\times I_\Delta, \tag{D.9}
\end{align}
which, respectively, require $O(n_g n_y n_t)$, $O(n_g n_y(n_\Delta - n_t))$, $O(\leq n_f n_\Delta n_x)$, and $O(n_f n_x n_\Delta)$ operations to calculate (D.10), as stipulated in equations (D.3), (D.4), (D.5), and (D.6). Thus, in total, calculating $r(\mathbf{p},\mathbf{x};\lambda)$ requires
\[
O(n_g n_y n_\Delta) + O(n_f n_x n_\Delta) = O\bigl(n_\Delta(n_g n_y + n_f n_x)\bigr) \tag{D.11}
\]
operations.

In each iteration of descent, I calculate first order and un-mixed second order partial derivatives of $r(\mathbf{p},\mathbf{x};\lambda)$ with respect to all parameters:
\[
\frac{\partial r(\mathbf{p},\mathbf{x};\lambda)}{\partial p_l} = \frac{1-\lambda}{n_y}\sum_{j=1}^{n_y}\sum_{k=1}^{n_t}\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}}\frac{\partial g_{j,k}(\mathbf{p},\mathbf{x})}{\partial p_l} + \frac{(1-\lambda)^2}{n_y}\sum_{j=1}^{n_y}\sum_{k\in I_{\hat{y}}}\frac{\partial d_{\hat{y}_{j,k}}}{\partial g_{j,k}}\frac{\partial g_{j,k}(\mathbf{p},\mathbf{x})}{\partial p_l} + \frac{\lambda}{n_x}\sum_{i=1}^{n_x} h_i(\mathbf{x})\sum_{k\in I_\Delta}\frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}}\frac{\partial f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})}{\partial p_l}, \tag{D.12}
\]
\begin{multline}
\frac{\partial^2 r(\mathbf{p},\mathbf{x};\lambda)}{\partial p_l^2} = \frac{1-\lambda}{n_y}\sum_{j=1}^{n_y}\sum_{k=1}^{n_t}\left(\frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2}\left(\frac{\partial g_{j,k}}{\partial p_l}\right)^2 + \frac{\partial d_{y_{j,k}}}{\partial g_{j,k}}\frac{\partial^2 g_{j,k}}{\partial p_l^2}\right) + \frac{(1-\lambda)^2}{n_y}\sum_{j=1}^{n_y}\sum_{k\in I_{\hat{y}}}\left(\frac{\partial^2 d_{\hat{y}_{j,k}}}{\partial g_{j,k}^2}\left(\frac{\partial g_{j,k}}{\partial p_l}\right)^2 + \frac{\partial d_{\hat{y}_{j,k}}}{\partial g_{j,k}}\frac{\partial^2 g_{j,k}}{\partial p_l^2}\right)\\
+ \frac{\lambda}{n_x}\sum_{i=1}^{n_x} h_i(\mathbf{x})\sum_{k\in I_\Delta}\left(\frac{\partial^2 d_{\Delta_{i,k}}}{\partial f_{i,k}^2}\left(\frac{\partial f_{i,k}}{\partial p_l}\right)^2 + \frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}}\frac{\partial^2 f_{i,k}}{\partial p_l^2}\right). \tag{D.13}
\end{multline}
Calculating $\partial r(\mathbf{p},\mathbf{x};\lambda)/\partial p_l$, as in equation (D.12), for all $l \in \{1,\ldots,n_p\}$ requires calculating the partial derivative values $\partial d_{y_{j,k}}/\partial g_{j,k}$, $\partial d_{\hat{y}_{j,k}}/\partial g_{j,k}$, and $\partial d_{\Delta_{i,k}}/\partial f_{i,k}$ over their index sets, and $\partial g_{j,k}/\partial p_l$ and $\partial f_{i,k}/\partial p_l$ over their index sets crossed with $\{1,\ldots,n_p\}$ (D.14), which, respectively, require $O(n_y n_t)$, $O(n_y(n_\Delta - n_t))$, $O(n_x n_\Delta)$, $O(n_{g1} n_y n_\Delta n_p)$, and $O(n_{f1} n_x n_\Delta n_p)$ operations to calculate (D.15), as stipulated in equations (D.3), (D.4), and (D.5), where previously calculated values, denoted $I_0$, cost $O(I_0) = 1$. Apart from calculating partial derivative values, calculating $\partial r(\mathbf{p},\mathbf{x};\lambda)/\partial p_l$ for all $l \in \{1,\ldots,n_p\}$ is equivalent in computational complexity to forming the products in equation (D.12) over their index sets (D.16), which requires $O(n_y n_t n_p) + O(n_y(n_\Delta - n_t)n_p) + O(n_x n_\Delta n_p) + O(n_x n_p)$ operations (D.17). Thus, from computational complexity counts (D.15) and (D.17), calculating $\partial r(\mathbf{p},\mathbf{x};\lambda)/\partial p_l$ for all $l \in \{1,\ldots,n_p\}$ requires
\[
O\bigl(n_\Delta n_p(n_{g1}n_y + n_{f1}n_x)\bigr) \tag{D.18}
\]
operations.
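To make the structure behind count (D.18) concrete, the following is a minimal sketch of the parameter-gradient assembly (D.12) for the single-state, single-observable example used earlier, with $g(\mathbf{p},x) = x$, so that the data-fit terms contribute no parameter dependence; `dF_dp` and all other names are my own illustrative assumptions. One pass over the grid caches the residuals $f_{1,k}$ and reuses the $O(1)$ difference-measure derivatives, giving the stated $O\bigl(n_\Delta n_p\,n_{f1}\bigr)$ behavior for this case.

\begin{verbatim}
import numpy as np

def grad_r_p(p, x, t, F, dF_dp, lam, alpha=1.0):
    """Sketch of (D.12) with n_x = n_y = 1 and g(p, x) = x: only the
    method-fit term depends on p.  dF_dp(t, x, p, l) is dF/dp_l."""
    dt = t[1] - t[0]
    dx = (x[1:] - x[:-1]) / dt
    res = dx - F(t[:-1], x[:-1], p)        # residuals f_{1,k}, cached once
    norm = np.sum(dx ** 2)                  # h_1(x) normalization, with s_1 = alpha
    return (lam * (-2.0) * alpha / norm) * np.array(
        [np.sum(res * dF_dp(t[:-1], x[:-1], p, l)) for l in range(len(p))])
\end{verbatim}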
Calculating $\partial^2 r(\mathbf{p},\mathbf{x};\lambda)/\partial p_l^2$, as in equation (D.13), for all $l \in \{1, 2, \dots, n_p\}$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2} &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},\\
&\frac{\partial^2 g_{j,k}}{\partial p_l^2} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_p\},\\
&\frac{\partial^2 d_{\hat{y}_{j,k}}}{\partial g_{j,k}^2} &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times I_{\hat{y}},\\
&\frac{\partial^2 g_{j,k}}{\partial p_l^2} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times I_{\hat{y}} \times \{1,\dots,n_p\},\\
&\frac{\partial^2 d_{\Delta_{i,k}}}{\partial f_{i,k}^2} &&\text{for all } (i,k) \in \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial^2 f_{i,k}}{\partial p_l^2} &&\text{for all } (i,k,l) \in \{1,\dots,n_x\} \times I_\Delta \times \{1,\dots,n_p\},
\end{aligned}
\tag{D.19}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t),\\
O(n_{g_2}) \times n_y n_t n_p &= O(n_{g_2} n_y n_t n_p),\\
O(1) \circ O(I_0) \times n_y (n_\Delta - n_t) &= O\big(n_y (n_\Delta - n_t)\big),\\
O(n_{g_2}) \times n_y (n_\Delta - n_t) n_p &= O\big(n_{g_2} n_y (n_\Delta - n_t) n_p\big),\\
O(1) \circ O(I_0) \times n_x n_\Delta &= O(n_x n_\Delta),\\
O(n_{f_2}) \times n_x n_\Delta n_p &= O(n_{f_2} n_x n_\Delta n_p)
\end{aligned}
\tag{D.20}
\]
operations to calculate, as stipulated in Equations (D.3), (D.4), and (D.5). Apart from calculating partial derivative values, calculating $\partial^2 r(\mathbf{p},\mathbf{x};\lambda)/\partial p_l^2$, as in equation (D.13), for all $l \in \{1, 2, \dots, n_p\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2} \left( \frac{\partial g_{j,k}}{\partial p_l} \right)^{\!2} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_p\},\\
&\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \frac{\partial^2 g_{j,k}}{\partial p_l^2} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_p\},\\
&\frac{\partial^2 d_{\hat{y}_{j,k}}}{\partial g_{j,k}^2} \left( \frac{\partial g_{j,k}}{\partial p_l} \right)^{\!2} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times I_{\hat{y}} \times \{1,\dots,n_p\},\\
&\frac{\partial d_{\hat{y}_{j,k}}}{\partial g_{j,k}} \frac{\partial^2 g_{j,k}}{\partial p_l^2} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times I_{\hat{y}} \times \{1,\dots,n_p\},\\
&\frac{\partial^2 d_{\Delta_{i,k}}}{\partial f_{i,k}^2} \left( \frac{\partial f_{i,k}}{\partial p_l} \right)^{\!2} &&\text{for all } (i,k,l) \in \{1,\dots,n_x\} \times I_\Delta \times \{1,\dots,n_p\},\\
&\frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}} \frac{\partial^2 f_{i,k}}{\partial p_l^2} &&\text{for all } (i,k,l) \in \{1,\dots,n_x\} \times I_\Delta \times \{1,\dots,n_p\},\\
&h_i(\mathbf{x}) \cdot I_0 &&\text{for all } (i,l) \in \{1,\dots,n_x\} \times \{1,\dots,n_p\},
\end{aligned}
\tag{D.21}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_t n_p &= O(n_y n_t n_p),\\
I_{g_2} \cdot O(1) \circ O(I_0) \times n_y n_t n_p &= O(I_{g_2} n_y n_t n_p),\\
O(1) \circ O(I_0) \times n_y (n_\Delta - n_t) n_p &= O\big(n_y (n_\Delta - n_t) n_p\big),\\
O(1) \circ O(I_0) \times n_y (n_\Delta - n_t) n_p &= O\big(n_y (n_\Delta - n_t) n_p\big),\\
O(1) \circ O(I_0) \times n_x n_\Delta n_p &= O(n_x n_\Delta n_p),\\
O(1) \circ O(I_0) \times n_x n_\Delta n_p &= O(n_x n_\Delta n_p),\\
O(1) \circ O(I_0) \times n_x n_p &= O(n_x n_p)
\end{aligned}
\tag{D.22}
\]
operations to calculate, where
\[
I_{g_2} =
\begin{cases}
0 & \text{if } n_{g_2} = 0,\\
1 & \text{if } O(n_{g_2}) \geq 1.
\end{cases}
\tag{D.23}
\]
Thus, from complexity counts (D.20) and (D.22), calculating $\partial^2 r(\mathbf{p},\mathbf{x};\lambda)/\partial p_l^2$ for all $l \in \{1, 2, \dots, n_p\}$ requires
\[
\begin{aligned}
&O(n_y n_t) + O(n_{g_2} n_y n_t n_p) + O\big(n_y (n_\Delta - n_t)\big) + O\big(n_{g_2} n_y (n_\Delta - n_t) n_p\big) + O(n_x n_\Delta) + O(n_{f_2} n_x n_\Delta n_p)\\
&\quad + O(n_y n_t n_p) + O(I_{g_2} n_y n_t n_p) + O\big(n_y (n_\Delta - n_t) n_p\big) + O\big(n_y (n_\Delta - n_t) n_p\big) + O(n_x n_\Delta n_p) + O(n_x n_\Delta n_p) + O(n_x n_p)\\
&= O(n_{g_2} n_y n_\Delta n_p) + O(n_{f_2} n_x n_\Delta n_p) + O(n_y n_\Delta n_p)
= O\big(n_\Delta n_p (n_{g_2} n_y + n_{f_2} n_x)\big)
\end{aligned}
\tag{D.24}
\]
operations. I note that computational complexity count (D.24) holds when $n_{g_2} = 0$, as $n_y \leq n_x$.

In each iteration of descent, I calculate first order and un-mixed second order partial derivatives of $r(\mathbf{p},\mathbf{x};\lambda)$ with respect to all state values:
\[
\begin{aligned}
\frac{\partial r(\mathbf{p},\mathbf{x};\lambda)}{\partial x_{l,m}}
&= \frac{1-\lambda}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} \frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \frac{\partial g_{j,k}}{\partial x_{l,m}}
+ \frac{(1-\lambda)^2}{n_y} \sum_{j=1}^{n_y} \sum_{k \in I_{\hat{y}}} \frac{\partial d_{\hat{y}_{j,k}}}{\partial g_{j,k}} \frac{\partial g_{j,k}}{\partial x_{l,m}}\\
&\quad + \frac{\lambda}{n_x} \sum_{i=1}^{n_x} h_i(\mathbf{x}) \sum_{k \in I_\Delta} \frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}} \frac{\partial f_{i,k}}{\partial x_{l,m}}
+ \frac{\lambda}{n_x} \sum_{i=1}^{n_x} \frac{\partial h_i(\mathbf{x})}{\partial x_{l,m}} \sum_{k \in I_\Delta} d_{\Delta_{i,k}}\big(f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})\big),\\
\frac{\partial^2 r(\mathbf{p},\mathbf{x};\lambda)}{\partial x_{l,m}^2}
&= \frac{1-\lambda}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} \left( \frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2} \left( \frac{\partial g_{j,k}}{\partial x_{l,m}} \right)^{\!2} + \frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \frac{\partial^2 g_{j,k}}{\partial x_{l,m}^2} \right)
+ \frac{(1-\lambda)^2}{n_y} \sum_{j=1}^{n_y} \sum_{k \in I_{\hat{y}}} \left( \frac{\partial^2 d_{\hat{y}_{j,k}}}{\partial g_{j,k}^2} \left( \frac{\partial g_{j,k}}{\partial x_{l,m}} \right)^{\!2} + \frac{\partial d_{\hat{y}_{j,k}}}{\partial g_{j,k}} \frac{\partial^2 g_{j,k}}{\partial x_{l,m}^2} \right)\\
&\quad + \frac{\lambda}{n_x} \sum_{i=1}^{n_x} h_i(\mathbf{x}) \sum_{k \in I_\Delta} \left( \frac{\partial^2 d_{\Delta_{i,k}}}{\partial f_{i,k}^2} \left( \frac{\partial f_{i,k}}{\partial x_{l,m}} \right)^{\!2} + \frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}} \frac{\partial^2 f_{i,k}}{\partial x_{l,m}^2} \right)\\
&\quad + \frac{\lambda}{n_x} \sum_{i=1}^{n_x} \left( 2 \frac{\partial h_i(\mathbf{x})}{\partial x_{l,m}} \sum_{k \in I_\Delta} \frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}} \frac{\partial f_{i,k}}{\partial x_{l,m}} + \frac{\partial^2 h_i(\mathbf{x})}{\partial x_{l,m}^2} \sum_{k \in I_\Delta} d_{\Delta_{i,k}}\big(f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})\big) \right).
\end{aligned}
\tag{D.25}
\]
Observable-state functions, $g_{j,k}$, depend only on the state values at grid index $k$; auxiliary functions, $h_i(\mathbf{x})$, depend only on state values in the $i$th state; and $f_{i,k}(\mathbf{t},\mathbf{p},\mathbf{x})$ depend on $x_{l,m}$ for only a small fraction of $k$ in $I_\Delta$, at $k$ in $I_{\Delta_m} \subset I_\Delta$.
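This locality is easy to exhibit. The following minimal sketch, assuming a one-step (forward-Euler-type) discretization so that each residual couples grid points $k$ and $k+1$, returns $I_{\Delta_m}$ for a given grid index $m$; the function name and indexing are illustrative only.

\begin{verbatim}
def residual_support(m, n_delta):
    """Residual indices k in I_Delta at which a one-step residual
    f_{i,k} = x_{i,k+1} - x_{i,k} - dt_k * F_i(t_k, p, x_{.,k})
    depends on the state value x_{l,m}: k = m (through x_{.,k}) and
    k = m - 1 (through x_{i,k+1}). Residual indices run over
    0, ..., n_delta - 2, so |I_Delta_m| <= 2, independent of
    n_delta."""
    return [k for k in (m - 1, m) if 0 <= k <= n_delta - 2]
\end{verbatim}

With such a stencil $n_\delta \leq 2$, independent of $n_\Delta$, which is what keeps the $O(n_{f_1} n_\delta)$ and $O(n_{f_2} n_\delta)$ partial derivative costs small in the counts below.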
Thus,
\[
\frac{\partial r(\mathbf{p},\mathbf{x};\lambda)}{\partial x_{l,m}}
= \frac{1-\lambda}{n_y} \sum_{j=1}^{n_y} \frac{\partial d_{y_{j,m}}}{\partial g_{j,m}} \frac{\partial g_{j,m}}{\partial x_{l,m}}
+ \frac{(1-\lambda)^2}{n_y} \sum_{j=1}^{n_y} \frac{\partial d_{\hat{y}_{j,m}}}{\partial g_{j,m}} \frac{\partial g_{j,m}}{\partial x_{l,m}}
+ \frac{\lambda}{n_x} \sum_{i=1}^{n_x} h_i(\mathbf{x}) \sum_{k \in I_{\Delta_m}} \frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}} \frac{\partial f_{i,k}}{\partial x_{l,m}}
+ \frac{\lambda}{n_x} \frac{\partial h_l(\mathbf{x})}{\partial x_{l,m}} \sum_{k \in I_\Delta} d_{\Delta_{l,k}}\big(f_{l,k}(\mathbf{t},\mathbf{p},\mathbf{x})\big),
\tag{D.26}
\]
\[
\begin{aligned}
\frac{\partial^2 r(\mathbf{p},\mathbf{x};\lambda)}{\partial x_{l,m}^2}
&= \frac{1-\lambda}{n_y} \sum_{j=1}^{n_y} \left( \frac{\partial^2 d_{y_{j,m}}}{\partial g_{j,m}^2} \left( \frac{\partial g_{j,m}}{\partial x_{l,m}} \right)^{\!2} + \frac{\partial d_{y_{j,m}}}{\partial g_{j,m}} \frac{\partial^2 g_{j,m}}{\partial x_{l,m}^2} \right)
+ \frac{(1-\lambda)^2}{n_y} \sum_{j=1}^{n_y} \left( \frac{\partial^2 d_{\hat{y}_{j,m}}}{\partial g_{j,m}^2} \left( \frac{\partial g_{j,m}}{\partial x_{l,m}} \right)^{\!2} + \frac{\partial d_{\hat{y}_{j,m}}}{\partial g_{j,m}} \frac{\partial^2 g_{j,m}}{\partial x_{l,m}^2} \right)\\
&\quad + \frac{\lambda}{n_x} \sum_{i=1}^{n_x} h_i(\mathbf{x}) \sum_{k \in I_{\Delta_m}} \left( \frac{\partial^2 d_{\Delta_{i,k}}}{\partial f_{i,k}^2} \left( \frac{\partial f_{i,k}}{\partial x_{l,m}} \right)^{\!2} + \frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}} \frac{\partial^2 f_{i,k}}{\partial x_{l,m}^2} \right)\\
&\quad + \frac{\lambda}{n_x} \left( 2 \frac{\partial h_l(\mathbf{x})}{\partial x_{l,m}} \sum_{k \in I_{\Delta_m}} \frac{\partial d_{\Delta_{l,k}}}{\partial f_{l,k}} \frac{\partial f_{l,k}}{\partial x_{l,m}} + \frac{\partial^2 h_l(\mathbf{x})}{\partial x_{l,m}^2} \sum_{k \in I_\Delta} d_{\Delta_{l,k}}\big(f_{l,k}(\mathbf{t},\mathbf{p},\mathbf{x})\big) \right),
\end{aligned}
\tag{D.27}
\]
where the auxiliary-function terms carry the index $l$ because only $h_l(\mathbf{x})$ depends on $x_{l,m}$. Calculating $\partial r(\mathbf{p},\mathbf{x};\lambda)/\partial x_{l,m}$, as in equation (D.26), for all $l \in \{1, 2, \dots, n_x\}$ and all $m$ in $I_\Delta$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial g_{j,m}}{\partial x_{l,m}} &&\text{for all } (j,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial f_{i,k}}{\partial x_{l,m}} &&\text{for all } (i,k,l,m) \in \{1,\dots,n_x\} \times I_{\Delta_m} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial h_l(\mathbf{x})}{\partial x_{l,m}} &&\text{for all } (l,m) \in \{1,\dots,n_x\} \times I_\Delta,
\end{aligned}
\tag{D.28}
\]
which, respectively, require
\[
\begin{aligned}
O(n_{g_1}) \times n_y n_x n_\Delta &= O(n_{g_1} n_y n_x n_\Delta),\\
O(n_{f_1}) \times n_x n_\delta n_x n_\Delta &= O(n_{f_1} n_x^2 n_\delta n_\Delta),\\
O(\leq n_{f_1} n_\delta) \times n_x n_\Delta &= O(\leq n_{f_1} n_\delta n_x n_\Delta)
\end{aligned}
\tag{D.29}
\]
operations to calculate, as stipulated in Equations (D.4), (D.5), and (D.6). Apart from calculating partial derivative values, calculating $\partial r(\mathbf{p},\mathbf{x};\lambda)/\partial x_{l,m}$, as in equation (D.26), for all $l \in \{1, 2, \dots, n_x\}$ and all $m$ in $I_\Delta$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial d_{y_{j,m}}}{\partial g_{j,m}} \frac{\partial g_{j,m}}{\partial x_{l,m}} &&\text{for all } (j,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial d_{\hat{y}_{j,m}}}{\partial g_{j,m}} \frac{\partial g_{j,m}}{\partial x_{l,m}} &&\text{for all } (j,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}} \frac{\partial f_{i,k}}{\partial x_{l,m}} &&\text{for all } (i,k,l,m) \in \{1,\dots,n_x\} \times I_{\Delta_m} \times \{1,\dots,n_x\} \times I_\Delta,\\
&h_i(\mathbf{x}) \cdot I_0 &&\text{for all } (i,l,m) \in \{1,\dots,n_x\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial h_l(\mathbf{x})}{\partial x_{l,m}} \cdot I_0 &&\text{for all } (l,m) \in \{1,\dots,n_x\} \times I_\Delta,
\end{aligned}
\tag{D.30}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_x n_\Delta &= O(n_y n_x n_\Delta),\\
O(1) \circ O(I_0) \times n_y n_x n_\Delta &= O(n_y n_x n_\Delta),\\
O(1) \circ O(I_0) \times n_x n_\delta n_x n_\Delta &= O(n_x^2 n_\delta n_\Delta),\\
O(1) \circ O(I_0) \times n_x n_x n_\Delta &= O(n_x^2 n_\Delta),\\
O(1) \circ O(I_0) \times n_x n_\Delta &= O(n_x n_\Delta)
\end{aligned}
\tag{D.31}
\]
operations to calculate. Thus, from complexity counts (D.29) and (D.31), calculating $\partial r(\mathbf{p},\mathbf{x};\lambda)/\partial x_{l,m}$ for all $l \in \{1, 2, \dots, n_x\}$ and all $m$ in $I_\Delta$ requires
\[
\begin{aligned}
&O(n_{g_1} n_y n_x n_\Delta) + O(n_{f_1} n_x^2 n_\delta n_\Delta) + O(\leq n_{f_1} n_\delta n_x n_\Delta) + O(n_y n_x n_\Delta)\\
&\quad + O(n_y n_x n_\Delta) + O(n_x^2 n_\delta n_\Delta) + O(n_x^2 n_\Delta) + O(n_x n_\Delta)
= O\big(n_\Delta n_x (n_{g_1} n_y + n_{f_1} n_x n_\delta)\big)
\end{aligned}
\tag{D.32}
\]
operations. Calculating $\partial^2 r(\mathbf{p},\mathbf{x};\lambda)/\partial x_{l,m}^2$, as in equation (D.27), for all $l \in \{1, 2, \dots, n_x\}$ and all $m$ in $I_\Delta$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial^2 g_{j,m}}{\partial x_{l,m}^2} &&\text{for all } (j,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial^2 f_{i,k}}{\partial x_{l,m}^2} &&\text{for all } (i,k,l,m) \in \{1,\dots,n_x\} \times I_{\Delta_m} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial^2 h_l(\mathbf{x})}{\partial x_{l,m}^2} &&\text{for all } (l,m) \in \{1,\dots,n_x\} \times I_\Delta,
\end{aligned}
\tag{D.33}
\]
which, respectively, require
\[
\begin{aligned}
O(n_{g_2}) \times n_y n_x n_\Delta &= O(n_{g_2} n_y n_x n_\Delta),\\
O(n_{f_2}) \times n_x n_\delta n_x n_\Delta &= O(n_{f_2} n_x^2 n_\delta n_\Delta),\\
O(\leq n_{f_2} n_\delta) \times n_x n_\Delta &= O(\leq n_{f_2} n_\delta n_x n_\Delta)
\end{aligned}
\tag{D.34}
\]
operations to calculate, as stipulated in Equations (D.4), (D.5), and (D.6).
Apart from calculating partial derivative values, calculating $\partial^2 r(\mathbf{p},\mathbf{x};\lambda)/\partial x_{l,m}^2$, as in equation (D.27), for all $l \in \{1, 2, \dots, n_x\}$ and all $m$ in $I_\Delta$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial^2 d_{y_{j,m}}}{\partial g_{j,m}^2} \left( \frac{\partial g_{j,m}}{\partial x_{l,m}} \right)^{\!2} &&\text{for all } (j,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial d_{y_{j,m}}}{\partial g_{j,m}} \frac{\partial^2 g_{j,m}}{\partial x_{l,m}^2} &&\text{for all } (j,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial^2 d_{\hat{y}_{j,m}}}{\partial g_{j,m}^2} \left( \frac{\partial g_{j,m}}{\partial x_{l,m}} \right)^{\!2} &&\text{for all } (j,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial d_{\hat{y}_{j,m}}}{\partial g_{j,m}} \frac{\partial^2 g_{j,m}}{\partial x_{l,m}^2} &&\text{for all } (j,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial^2 d_{\Delta_{i,k}}}{\partial f_{i,k}^2} \left( \frac{\partial f_{i,k}}{\partial x_{l,m}} \right)^{\!2} &&\text{for all } (i,k,l,m) \in \{1,\dots,n_x\} \times I_{\Delta_m} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial d_{\Delta_{i,k}}}{\partial f_{i,k}} \frac{\partial^2 f_{i,k}}{\partial x_{l,m}^2} &&\text{for all } (i,k,l,m) \in \{1,\dots,n_x\} \times I_{\Delta_m} \times \{1,\dots,n_x\} \times I_\Delta,\\
&h_i(\mathbf{x}) \cdot I_0 &&\text{for all } (i,l,m) \in \{1,\dots,n_x\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial h_l(\mathbf{x})}{\partial x_{l,m}} \cdot I_0 &&\text{for all } (l,m) \in \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial^2 h_l(\mathbf{x})}{\partial x_{l,m}^2} \cdot I_0 &&\text{for all } (l,m) \in \{1,\dots,n_x\} \times I_\Delta,
\end{aligned}
\tag{D.35}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_x n_\Delta &= O(n_y n_x n_\Delta),\\
I_{g_2} \cdot O(1) \circ O(I_0) \times n_y n_x n_\Delta &= O(I_{g_2} n_y n_x n_\Delta),\\
O(1) \circ O(I_0) \times n_y n_x n_\Delta &= O(n_y n_x n_\Delta),\\
I_{g_2} \cdot O(1) \circ O(I_0) \times n_y n_x n_\Delta &= O(I_{g_2} n_y n_x n_\Delta),\\
O(1) \circ O(I_0) \times n_x n_\delta n_x n_\Delta &= O(n_x^2 n_\delta n_\Delta),\\
O(1) \circ O(I_0) \times n_x n_\delta n_x n_\Delta &= O(n_x^2 n_\delta n_\Delta),\\
O(1) \circ O(I_0) \times n_x n_x n_\Delta &= O(n_x^2 n_\Delta),\\
O(1) \circ O(I_0) \times n_x n_\Delta &= O(n_x n_\Delta),\\
O(1) \circ O(I_0) \times n_x n_\Delta &= O(n_x n_\Delta)
\end{aligned}
\tag{D.36}
\]
operations to calculate. Thus, from complexity counts (D.34) and (D.36), calculating $\partial^2 r(\mathbf{p},\mathbf{x};\lambda)/\partial x_{l,m}^2$ for all $l \in \{1, 2, \dots, n_x\}$ and all $m$ in $I_\Delta$ requires
\[
\begin{aligned}
&O(n_{g_2} n_y n_x n_\Delta) + O(n_{f_2} n_x^2 n_\delta n_\Delta) + O(\leq n_{f_2} n_\delta n_x n_\Delta) + O(n_y n_x n_\Delta) + O(I_{g_2} n_y n_x n_\Delta) + O(n_y n_x n_\Delta)\\
&\quad + O(I_{g_2} n_y n_x n_\Delta) + O(n_x^2 n_\delta n_\Delta) + O(n_x^2 n_\delta n_\Delta) + O(n_x^2 n_\Delta) + O(n_x n_\Delta) + O(n_x n_\Delta)\\
&= O(n_{g_2} n_y n_x n_\Delta) + O(n_{f_2} n_x^2 n_\delta n_\Delta) + O(n_y n_x n_\Delta)
= O\big(n_\Delta n_x (n_{g_2} n_y + n_{f_2} n_x n_\delta)\big)
\end{aligned}
\tag{D.37}
\]
operations. I note that computational complexity count (D.37) holds when $n_{g_2} = 0$, as $n_y \leq n_x$.

In each iteration of descent, I generate $O(n_\sigma)$ line-search test points, one for each value of $\sigma_j$. Generating each test point requires an update of all parameters and state values, requiring $O(n_p + n_x n_\Delta)$ operations. I consider $r(\mathbf{p},\mathbf{x};\lambda)$ with fewer parameter values than state values. Thus, $n_p < n_x n_\Delta$, which implies that $O(n_p + n_x n_\Delta) = O(n_x n_\Delta)$. From complexity count (D.11), calculating $r(\mathbf{p},\mathbf{x};\lambda)$ at each line-search test point requires $O\big(n_\Delta(n_g n_y + n_f n_x)\big)$ operations. Thus, updating all parameters and state values at all line-search test points, calculating $r(\mathbf{p},\mathbf{x};\lambda)$ at all line-search test points, calculating first order and un-mixed second order partial derivatives of $r(\mathbf{p},\mathbf{x};\lambda)$ with respect to all parameters, and calculating first order and un-mixed second order partial derivatives of $r(\mathbf{p},\mathbf{x};\lambda)$ with respect to all state values requires
\[
\begin{aligned}
&O(n_\sigma n_x n_\Delta) + O\big(n_\sigma n_\Delta (n_g n_y + n_f n_x)\big) + O\big(n_\Delta n_p (n_{g_1} n_y + n_{f_1} n_x)\big) + O\big(n_\Delta n_p (n_{g_2} n_y + n_{f_2} n_x)\big)\\
&\quad + O\big(n_\Delta n_x (n_{g_1} n_y + n_{f_1} n_x n_\delta)\big) + O\big(n_\Delta n_x (n_{g_2} n_y + n_{f_2} n_x n_\delta)\big)\\
&= O\big(n_\sigma n_\Delta (n_g n_y + n_f n_x)\big) + O\big(n_\Delta n_p (n_{g_1} n_y + n_{f_1} n_x + n_{g_2} n_y + n_{f_2} n_x)\big) + O\big(n_\Delta n_x (n_{g_1} n_y + n_{f_1} n_x n_\delta + n_{g_2} n_y + n_{f_2} n_x n_\delta)\big)
\end{aligned}
\tag{D.38}
\]
operations, with partial derivative complexity counts from (D.18), (D.24), (D.32), and (D.37). Therefore, an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent requires
\[
O\big(n_\sigma n_\Delta (n_g n_y + n_f n_x)\big) + O\big(n_\Delta n_p (n_{g_1} n_y + n_{f_1} n_x + n_{g_2} n_y + n_{f_2} n_x)\big) + O\big(n_\Delta n_x (n_{g_1} n_y + n_{f_1} n_x n_\delta + n_{g_2} n_y + n_{f_2} n_x n_\delta)\big)
\tag{D.39}
\]
operations.

D.2 Computational Complexities of Numerical-Integration-Based Methods

Comparatively, I count the computational complexity required to minimize $r(\mathbf{q})$, where
\[
r(\mathbf{q}) = \frac{1}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} d_{y_{j,k}}\big(g_{j,k}(\mathbf{q},\mathbf{x})\big)
\;:\;
f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0 \text{ for all } i \in \{1, 2, \dots, n_x\} \text{ and for all } k \in I_\Delta,
\tag{D.40}
\]
with $\mathbf{q}$ consisting of $n_q$ elements, $q_1, q_2, \dots, q_{n_q}$, which correspond to the model parameters $p_1, p_2, \dots, p_{n_p}$ and any variable initial conditions and boundary values, and where $g_{j,k}(\mathbf{p},\mathbf{x}) = y_{j,k} - g_j(\mathbf{p}, x_{1,k}, \dots, x_{n_x,k})$ with difference measure $d_{y_{j,k}}$. As in counting the computational complexity required for an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent, I define quantities for counting computational complexities as in Section D.1.2.
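For contrast with $r(\mathbf{p},\mathbf{x};\lambda)$, the following is a minimal Python sketch of evaluating $r(\mathbf{q})$ with an explicit numerical solution method: the constraint $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ is eliminated by marching forward Euler from the initial condition, so $\mathbf{x}$ is a function of $\mathbf{q}$ rather than a free optimization variable. The choices here are illustrative assumptions (forward Euler, identity observations of the first $n_y$ states at the first $n_t$ grid points, squared-difference measures, and a $\mathbf{q}$ that concatenates the model parameters with the initial condition), and \texttt{F} is a hypothetical right-hand-side callable.

\begin{verbatim}
import numpy as np

def r_shooting(q, t, y, F, n_p):
    """Minimal sketch of evaluating r(q) with an explicit method."""
    p, x0 = q[:n_p], q[n_p:]      # assumed layout of q (illustrative)
    n_x, n_delta = x0.size, t.size
    x = np.empty((n_x, n_delta))
    x[:, 0] = x0
    for k in range(n_delta - 1):  # O(n_f n_x n_delta) total
        x[:, k + 1] = x[:, k] + (t[k + 1] - t[k]) * F(t[k], p, x[:, k])
    n_y, n_t = y.shape
    return np.sum((y - x[:n_y, :n_t]) ** 2) / n_y  # O(n_g n_y n_t)
\end{verbatim}

Under these assumptions, one evaluation of $r(\mathbf{q})$ costs $O(n_f n_x n_\Delta + n_g n_y n_t)$ operations, matching count (D.50) in the proof below.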
D.2.1 Counting the Computational Complexity of r(q) Descent

Theorem 8. For an explicit numerical solution method, an iteration of $r(\mathbf{q})$ descent requires
\[
O\big(n_\sigma (n_f n_x n_\Delta + n_g n_y n_t) + n_y n_t n_{g_1} (n_x + n_q)\big)
+ O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big)
+ O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_x^2 n_q)\big)
\tag{D.41}
\]
operations, with $O(n_\sigma)$ line-search test points. For an implicit numerical solution method, an iteration of $r(\mathbf{q})$ descent requires
\[
O\big(n_\sigma n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_\sigma n_g n_y n_t\big)
+ O\big(n_M (n_q + n_\sigma n_N) + n_y n_t n_{g_1} (n_x + n_q)\big)
+ O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big)
+ O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_x^2 n_q)\big)
\tag{D.42}
\]
operations, with $O(n_M)$ operations in solving matrix equations for each of $O(n_N)$ iterations of Newton's method applied to $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$, for all $i \in \{1, 2, \dots, n_x\}$ and $k \in I_\Delta$.

Alternatively, for an explicit numerical solution method, an iteration of $r(\mathbf{q})$ descent with partial derivative approximation by finite difference requires
\[
O\big((n_\sigma + n_q)(n_f n_x n_\Delta + n_g n_y n_t)\big)
\tag{D.43}
\]
operations, and, for an implicit numerical solution method, an iteration of $r(\mathbf{q})$ descent with partial derivative approximation by finite difference requires
\[
O\big((n_\sigma + n_q)\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big)\big)
\tag{D.44}
\]
operations.

Proof. In each iteration of descent, I calculate values of $r(\mathbf{q})$. For an explicit numerical solution method, $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ is an explicit system of equations in $\mathbf{x}$, and solving $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ simply requires evaluating $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})$. Thus, to determine $\mathbf{x}$, solving
\[
f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0 \quad \text{for all } (i,k) \in \{1,\dots,n_x\} \times I_\Delta
\tag{D.45}
\]
requires
\[
O(n_f) \times n_x n_\Delta = O(n_f n_x n_\Delta)
\tag{D.46}
\]
operations, as stipulated in equation (D.5). After calculating $\mathbf{x}$, calculating $r(\mathbf{q})$ requires calculating
\[
\frac{1}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} d_{y_{j,k}}\big(g_{j,k}(\mathbf{q},\mathbf{x})\big),
\tag{D.47}
\]
which is equivalent in computational complexity to calculating
\[
d_{y_{j,k}}\big(g_{j,k}(\mathbf{q},\mathbf{x})\big) \quad \text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\tag{D.48}
\]
which requires
\[
O(1) \circ O(n_g) \times n_y n_t = O(n_g n_y n_t)
\tag{D.49}
\]
operations to calculate, as stipulated in Equations (D.3) and (D.4). Thus, from complexity counts (D.46) and (D.49), calculating $r(\mathbf{q})$ requires
\[
O(n_f n_x n_\Delta) + O(n_g n_y n_t) = O(n_f n_x n_\Delta + n_g n_y n_t)
\tag{D.50}
\]
operations with an explicit numerical solution method.

In each iteration of descent, I calculate first order and un-mixed second order partial derivatives of $r(\mathbf{q})$ with respect to all parameters,
\[
\frac{\partial r(\mathbf{q})}{\partial q_l}
= \frac{1}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} \frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \left( \sum_{m=1}^{n_x} \frac{\partial g_{j,k}}{\partial x_{m,k}} \frac{\partial x_{m,k}}{\partial q_l} + \frac{\partial g_{j,k}}{\partial q_l} \right),
\tag{D.51a}
\]
\[
\begin{aligned}
\frac{\partial^2 r(\mathbf{q})}{\partial q_l^2}
= \frac{1}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} \Bigg[
&\frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2} \left( \sum_{m=1}^{n_x} \frac{\partial g_{j,k}}{\partial x_{m,k}} \frac{\partial x_{m,k}}{\partial q_l} + \frac{\partial g_{j,k}}{\partial q_l} \right)^{\!2}\\
&+ \frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \left( \sum_{m=1}^{n_x} \left( \left( \sum_{n=1}^{n_x} \frac{\partial^2 g_{j,k}}{\partial x_{n,k} \partial x_{m,k}} \frac{\partial x_{n,k}}{\partial q_l} + \frac{\partial^2 g_{j,k}}{\partial q_l \partial x_{m,k}} \right) \frac{\partial x_{m,k}}{\partial q_l} + \frac{\partial g_{j,k}}{\partial x_{m,k}} \frac{\partial^2 x_{m,k}}{\partial q_l^2} \right) + \sum_{m=1}^{n_x} \frac{\partial^2 g_{j,k}}{\partial x_{m,k} \partial q_l} \frac{\partial x_{m,k}}{\partial q_l} + \frac{\partial^2 g_{j,k}}{\partial q_l^2} \right) \Bigg].
\end{aligned}
\tag{D.51b}
\]
Partial derivatives of state values with respect to parameters are generally calculated by numerically solving the sensitivity equations, which are generated by applying the chain rule to the differential equation system.

In the case of an initial value problem, $dx_i/dt = F_i(t, \mathbf{q}, x_1, \dots, x_{n_x})$ for $i \in \{1, 2, \dots, n_x\}$, applying the chain rule to $F_i(t, \mathbf{q}, x_1, \dots, x_{n_x})$ generates the sensitivity equations,
\[
\frac{d}{dt}\!\left(\frac{\partial x_i}{\partial q_l}\right)
= \frac{\partial}{\partial q_l}\!\left(\frac{d x_i}{dt}\right)
= \sum_{j=1}^{n_x} \frac{\partial F_i}{\partial x_j} \frac{\partial x_j}{\partial q_l} + \frac{\partial F_i}{\partial q_l},
\tag{D.52a}
\]
\[
\frac{d}{dt}\!\left(\frac{\partial^2 x_i}{\partial q_l^2}\right)
= \frac{\partial^2}{\partial q_l^2}\!\left(\frac{d x_i}{dt}\right)
= \sum_{j=1}^{n_x} \left[ \left( \sum_{k=1}^{n_x} \frac{\partial^2 F_i}{\partial x_k \partial x_j} \frac{\partial x_k}{\partial q_l} \right) \frac{\partial x_j}{\partial q_l} + 2 \frac{\partial^2 F_i}{\partial x_j \partial q_l} \frac{\partial x_j}{\partial q_l} + \frac{\partial F_i}{\partial x_j} \frac{\partial^2 x_j}{\partial q_l^2} \right] + \frac{\partial^2 F_i}{\partial q_l^2},
\tag{D.52b}
\]
where the two mixed-derivative contributions in (D.52b) have been combined; these are two systems of differential equations, one in $\partial x_i/\partial q_l$ and one in $\partial^2 x_i/\partial q_l^2$ for $i \in \{1, 2, \dots, n_x\}$.
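As a concrete illustration (a hypothetical example, not one of the models fit in this dissertation), for the logistic equation $dx/dt = q_1 x (1 - x/q_2)$, equation (D.52a) reads
\[
\frac{d}{dt}\!\left(\frac{\partial x}{\partial q_1}\right) = q_1\!\left(1 - \frac{2x}{q_2}\right)\frac{\partial x}{\partial q_1} + x\!\left(1 - \frac{x}{q_2}\right),
\qquad
\frac{d}{dt}\!\left(\frac{\partial x}{\partial q_2}\right) = q_1\!\left(1 - \frac{2x}{q_2}\right)\frac{\partial x}{\partial q_2} + \frac{q_1 x^2}{q_2^2},
\]
which is integrated alongside the state to obtain the $\partial x/\partial q_l$ appearing in equation (D.51a).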
From Equations (D.52a) and (D.52b), using the forward Euler method, the simplest explicit numerical method for initial value problems, I can calculate $\partial x_{i,m}/\partial q_l$ and $\partial^2 x_{i,m}/\partial q_l^2$ for $i \in \{1, 2, \dots, n_x\}$ and $m \in I_\Delta$ such that
\[
\frac{\partial x_{i,m+1}}{\partial q_l}
= \frac{\partial x_{i,m}}{\partial q_l} + (t_{m+1} - t_m) \left( \sum_{j=1}^{n_x} \frac{\partial F_{i,m}}{\partial x_{j,m}} \frac{\partial x_{j,m}}{\partial q_l} + \frac{\partial F_{i,m}}{\partial q_l} \right),
\tag{D.53a}
\]
\[
\frac{\partial^2 x_{i,m+1}}{\partial q_l^2}
= \frac{\partial^2 x_{i,m}}{\partial q_l^2} + (t_{m+1} - t_m) \left( \sum_{j=1}^{n_x} \left[ \left( \sum_{k=1}^{n_x} \frac{\partial^2 F_{i,m}}{\partial x_{k,m} \partial x_{j,m}} \frac{\partial x_{k,m}}{\partial q_l} \right) \frac{\partial x_{j,m}}{\partial q_l} + 2 \frac{\partial^2 F_{i,m}}{\partial x_{j,m} \partial q_l} \frac{\partial x_{j,m}}{\partial q_l} + \frac{\partial F_{i,m}}{\partial x_{j,m}} \frac{\partial^2 x_{j,m}}{\partial q_l^2} \right] + \frac{\partial^2 F_{i,m}}{\partial q_l^2} \right),
\tag{D.53b}
\]
where $F_{i,m} = F_i(t_m, \mathbf{q}, x_{1,m}, \dots, x_{n_x,m})$.

Solving system (D.53a) for $i \in \{1, 2, \dots, n_x\}$, $m \in I_\Delta$, and $l \in \{1, 2, \dots, n_q\}$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial F_{i,m}}{\partial x_{j,m}} &&\text{for all } (i,j,m) \in \{1,\dots,n_x\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial F_{i,m}}{\partial q_l} &&\text{for all } (i,m,l) \in \{1,\dots,n_x\} \times I_\Delta \times \{1,\dots,n_q\},
\end{aligned}
\tag{D.54}
\]
which, respectively, require
\[
\begin{aligned}
O(n_{F_1}) \times n_x^2 n_\Delta &= O(n_{F_1} n_x^2 n_\Delta),\\
O(n_{F_1}) \times n_x n_\Delta n_q &= O(n_{F_1} n_x n_\Delta n_q)
\end{aligned}
\tag{D.55}
\]
operations to calculate, as stipulated in equation (D.7). Apart from calculating partial derivative values, solving system (D.53a) for all $i \in \{1, 2, \dots, n_x\}$, $m \in I_\Delta$, and $l \in \{1, 2, \dots, n_q\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial F_{i,m}}{\partial x_{j,m}} \frac{\partial x_{j,m}}{\partial q_l} &&\text{for all } (i,j,m,l) \in \{1,\dots,n_x\} \times \{1,\dots,n_x\} \times I_\Delta \times \{1,\dots,n_q\},\\
&\frac{\partial F_{i,m}}{\partial q_l} &&\text{for all } (i,m,l) \in \{1,\dots,n_x\} \times I_\Delta \times \{1,\dots,n_q\},
\end{aligned}
\tag{D.56}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_x^2 n_\Delta n_q &= O(n_x^2 n_\Delta n_q),\\
O(I_0) \times n_x n_\Delta n_q &= O(n_x n_\Delta n_q)
\end{aligned}
\tag{D.57}
\]
operations to calculate, where $I_0$ indicates values that have been calculated previously and $O(I_0) = 1$. Therefore, from complexity counts (D.55) and (D.57), in total, solving systems (D.53a) for all $i \in \{1, 2, \dots, n_x\}$, $m \in I_\Delta$, and $l \in \{1, 2, \dots, n_q\}$, the first order sensitivity equations for an initial value problem using the forward Euler method, requires
\[
O(n_{F_1} n_x^2 n_\Delta) + O(n_{F_1} n_x n_\Delta n_q) + O(n_x^2 n_\Delta n_q) + O(n_x n_\Delta n_q)
= O\big(n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big)
\tag{D.58}
\]
operations. The forward Euler method applied to an initial value problem is the computationally least expensive explicit numerical solution method. Thus, in general, solving the first order sensitivity equations with an explicit numerical solution method requires
\[
O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big)
\tag{D.59}
\]
operations.
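A minimal sketch of the recursion (D.53a), assuming hypothetical Jacobian callables \texttt{dFdx} and \texttt{dFdq} and an initial condition that does not depend on $\mathbf{q}$ (so the sensitivities start at zero):

\begin{verbatim}
import numpy as np

def euler_with_sensitivities(q, t, x0, F, dFdx, dFdq):
    """Forward-Euler solve with first order sensitivities, per (D.53a).
    Assumes dFdx(t, q, x) -> (n_x, n_x) and dFdq(t, q, x) -> (n_x, n_q);
    both callables are illustrative, not part of the framework."""
    n_x, n_q, n_delta = x0.size, q.size, t.size
    x = np.empty((n_x, n_delta))
    S = np.zeros((n_x, n_q, n_delta))  # S[i, l, m] = dx_{i,m} / dq_l
    x[:, 0] = x0
    for m in range(n_delta - 1):
        dt = t[m + 1] - t[m]
        x[:, m + 1] = x[:, m] + dt * F(t[m], q, x[:, m])
        # (D.53a): the Jacobian-sensitivity contraction costs
        # O(n_x * n_x * n_q) products per step, matching count (D.57).
        S[:, :, m + 1] = S[:, :, m] + dt * (
            dFdx(t[m], q, x[:, m]) @ S[:, :, m] + dFdq(t[m], q, x[:, m]))
    return x, S
\end{verbatim}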
Similarly, solving system (D.53b) for $i \in \{1, 2, \dots, n_x\}$, $m \in I_\Delta$, and $l \in \{1, 2, \dots, n_q\}$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial^2 F_{i,m}}{\partial x_{k,m} \partial x_{j,m}} &&\text{for all } (i,j,k,m) \in \{1,\dots,n_x\}^3 \times I_\Delta,\\
&\frac{\partial^2 F_{i,m}}{\partial x_{j,m} \partial q_l} &&\text{for all } (i,j,m,l) \in \{1,\dots,n_x\}^2 \times I_\Delta \times \{1,\dots,n_q\},\\
&\frac{\partial^2 F_{i,m}}{\partial q_l^2} &&\text{for all } (i,m,l) \in \{1,\dots,n_x\} \times I_\Delta \times \{1,\dots,n_q\},
\end{aligned}
\tag{D.60}
\]
which, respectively, require
\[
\begin{aligned}
O(n_{F_2}) \times n_x^3 n_\Delta &= O(n_{F_2} n_x^3 n_\Delta),\\
O(n_{F_2}) \times n_x^2 n_\Delta n_q &= O(n_{F_2} n_x^2 n_\Delta n_q),\\
O(n_{F_2}) \times n_x n_\Delta n_q &= O(n_{F_2} n_x n_\Delta n_q)
\end{aligned}
\tag{D.61}
\]
operations to calculate, as stipulated in equation (D.7). Apart from calculating partial derivative values, solving systems (D.53b) for all $i \in \{1, 2, \dots, n_x\}$, $m \in I_\Delta$, and $l \in \{1, 2, \dots, n_q\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial^2 F_{i,m}}{\partial x_{k,m} \partial x_{j,m}} \frac{\partial x_{k,m}}{\partial q_l} &&\text{for all } (i,j,k,m,l) \in \{1,\dots,n_x\}^3 \times I_\Delta \times \{1,\dots,n_q\},\\
&I_0 \cdot \frac{\partial x_{j,m}}{\partial q_l} &&\text{for all } (j,m,l) \in \{1,\dots,n_x\} \times I_\Delta \times \{1,\dots,n_q\},\\
&\frac{\partial^2 F_{i,m}}{\partial x_{j,m} \partial q_l} \frac{\partial x_{j,m}}{\partial q_l} &&\text{for all } (i,j,m,l) \in \{1,\dots,n_x\}^2 \times I_\Delta \times \{1,\dots,n_q\},\\
&\frac{\partial F_{i,m}}{\partial x_{j,m}} \frac{\partial^2 x_{j,m}}{\partial q_l^2} &&\text{for all } (i,j,m,l) \in \{1,\dots,n_x\}^2 \times I_\Delta \times \{1,\dots,n_q\},\\
&\frac{\partial^2 F_{i,m}}{\partial q_l^2} &&\text{for all } (i,m,l) \in \{1,\dots,n_x\} \times I_\Delta \times \{1,\dots,n_q\},
\end{aligned}
\tag{D.62}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_x^3 n_\Delta n_q &= O(n_x^3 n_\Delta n_q),\\
O(1) \circ O(I_0) \times n_x n_\Delta n_q &= O(n_x n_\Delta n_q),\\
O(1) \circ O(I_0) \times n_x^2 n_\Delta n_q &= O(n_x^2 n_\Delta n_q),\\
O(1) \circ O(I_0) \times n_x^2 n_\Delta n_q &= O(n_x^2 n_\Delta n_q),\\
O(I_0) \times n_x n_\Delta n_q &= O(n_x n_\Delta n_q)
\end{aligned}
\tag{D.63}
\]
operations to calculate. Therefore, from complexity counts (D.61) and (D.63), in total, solving systems (D.53b) for all $i \in \{1, 2, \dots, n_x\}$, $m \in I_\Delta$, and $l \in \{1, 2, \dots, n_q\}$, the un-mixed second order sensitivity equations for an initial value problem using the forward Euler method, requires
\[
\begin{aligned}
&O(n_{F_2} n_x^3 n_\Delta) + O(n_{F_2} n_x^2 n_\Delta n_q) + O(n_{F_2} n_x n_\Delta n_q) + O(n_x^3 n_\Delta n_q)\\
&\quad + O(n_x n_\Delta n_q) + O(n_x^2 n_\Delta n_q) + O(n_x^2 n_\Delta n_q) + O(n_x n_\Delta n_q)
= O\big(n_x^2 n_\Delta (n_{F_2} n_x + n_{F_2} n_q + n_x n_q)\big)
\end{aligned}
\tag{D.64}
\]
operations. The forward Euler method applied to an initial value problem is the computationally least expensive explicit numerical solution method. Thus, in general, solving the un-mixed second order sensitivity equations with an explicit numerical solution method requires
\[
O\big({\geq}\, n_x^2 n_\Delta (n_{F_2} n_x + n_{F_2} n_q + n_x n_q)\big)
\tag{D.65}
\]
operations.

After solving the first order sensitivity equations, calculating $\partial r(\mathbf{q})/\partial q_l$, of equation (D.51a), for all $l \in \{1, 2, \dots, n_q\}$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial g_{j,k}}{\partial x_{m,k}} &&\text{for all } (j,k,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_x\},\\
&\frac{\partial g_{j,k}}{\partial q_l} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\},\\
&\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\end{aligned}
\tag{D.66}
\]
which, respectively, require
\[
\begin{aligned}
O(n_{g_1}) \times n_y n_t n_x &= O(n_{g_1} n_y n_t n_x),\\
O(n_{g_1}) \times n_y n_t n_q &= O(n_{g_1} n_y n_t n_q),\\
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t)
\end{aligned}
\tag{D.67}
\]
operations to calculate, as stipulated in Equations (D.3) and (D.4). After solving the first order sensitivity equations and calculating partial derivative values, calculating $\partial r(\mathbf{q})/\partial q_l$, of equation (D.51a), for all $l \in \{1, 2, \dots, n_q\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial g_{j,k}}{\partial x_{m,k}} \frac{\partial x_{m,k}}{\partial q_l} &&\text{for all } (j,k,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\},\\
&\frac{\partial g_{j,k}}{\partial q_l} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\},\\
&\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \cdot I_0 &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\end{aligned}
\tag{D.68}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_t n_q n_x &= O(n_y n_t n_q n_x),\\
O(I_0) \times n_y n_t n_q &= O(n_y n_t n_q),\\
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t)
\end{aligned}
\tag{D.69}
\]
operations to calculate. Thus, from complexity counts (D.67) and (D.69), after solving the first order sensitivity equations, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
O(n_{g_1} n_y n_t n_x) + O(n_{g_1} n_y n_t n_q) + O(n_y n_t) + O(n_y n_t n_q n_x) + O(n_y n_t n_q) + O(n_y n_t)
= O\big(n_y n_t (n_{g_1} n_x + n_{g_1} n_q + n_q n_x)\big)
\tag{D.70}
\]
operations. Therefore, from complexity counts (D.59) and (D.70), in total, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_y n_t (n_{g_1} n_x + n_{g_1} n_q + n_q n_x)\big)
= O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_y n_t n_{g_1} (n_x + n_q)\big)
\tag{D.71}
\]
operations with an explicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$.
After solving the un-mixed second order sensitivity equations, calculating $\partial^2 r(\mathbf{q})/\partial q_l^2$, of equation (D.51b), for all $l \in \{1, 2, \dots, n_q\}$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2} &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},\\
&\frac{\partial^2 g_{j,k}}{\partial x_{n,k} \partial x_{m,k}} &&\text{for all } (j,k,m,n) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_x\}^2,\\
&\frac{\partial^2 g_{j,k}}{\partial x_{m,k} \partial q_l} &&\text{for all } (j,k,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\},\\
&\frac{\partial^2 g_{j,k}}{\partial q_l^2} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\},
\end{aligned}
\tag{D.72}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t),\\
O(n_{g_2}) \times n_y n_t n_x^2 &= O(n_{g_2} n_y n_t n_x^2),\\
O(n_{g_2}) \times n_y n_t n_q n_x &= O(n_{g_2} n_y n_t n_q n_x),\\
O(n_{g_2}) \times n_y n_t n_q &= O(n_{g_2} n_y n_t n_q)
\end{aligned}
\tag{D.73}
\]
operations to calculate, as stipulated in Equations (D.3) and (D.4). After solving the un-mixed second order sensitivity equations and calculating partial derivative values, calculating $\partial^2 r(\mathbf{q})/\partial q_l^2$, of equation (D.51b), for all $l \in \{1, 2, \dots, n_q\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2} \cdot I_0^2 &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},\\
&\frac{\partial^2 g_{j,k}}{\partial x_{n,k} \partial x_{m,k}} \frac{\partial x_{n,k}}{\partial q_l} &&\text{for all } (j,k,l,m,n) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\}^2,\\
&\frac{\partial^2 g_{j,k}}{\partial q_l \partial x_{m,k}} &&\text{for all } (j,k,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\},\\
&I_0 \cdot \frac{\partial x_{m,k}}{\partial q_l} &&\text{for all } (k,l,m) \in \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\},\\
&\frac{\partial g_{j,k}}{\partial x_{m,k}} \frac{\partial^2 x_{m,k}}{\partial q_l^2} &&\text{for all } (j,k,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\},\\
&\frac{\partial^2 g_{j,k}}{\partial x_{m,k} \partial q_l} \frac{\partial x_{m,k}}{\partial q_l} &&\text{for all } (j,k,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\},\\
&\frac{\partial^2 g_{j,k}}{\partial q_l^2} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\},\\
&\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \cdot I_0 &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\end{aligned}
\tag{D.74}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t),\\
I_{g_2} \cdot O(1) \circ O(I_0) \times n_y n_t n_q n_x^2 &= O(I_{g_2} n_y n_t n_q n_x^2),\\
I_{g_2} \cdot O(I_0) \times n_y n_t n_q n_x &= O(I_{g_2} n_y n_t n_q n_x),\\
O(1) \circ O(I_0) \times n_t n_q n_x &= O(n_t n_q n_x),\\
O(1) \circ O(I_0) \times n_y n_t n_q n_x &= O(n_y n_t n_q n_x),\\
I_{g_2} \cdot O(1) \circ O(I_0) \times n_y n_t n_q n_x &= O(I_{g_2} n_y n_t n_q n_x),\\
I_{g_2} \cdot O(I_0) \times n_y n_t n_q &= O(I_{g_2} n_y n_t n_q),\\
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t)
\end{aligned}
\tag{D.75}
\]
operations to calculate, where
\[
I_{g_2} =
\begin{cases}
0 & \text{if } n_{g_2} = 0,\\
1 & \text{if } O(n_{g_2}) \geq 1.
\end{cases}
\tag{D.76}
\]
Thus, from complexity counts (D.73) and (D.75), after solving the un-mixed second order sensitivity equations, calculating $\partial^2 r(\mathbf{q})/\partial q_l^2$ for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
\begin{aligned}
&O(n_y n_t) + O(n_{g_2} n_y n_t n_x^2) + O(n_{g_2} n_y n_t n_q n_x) + O(n_{g_2} n_y n_t n_q) + O(n_y n_t)\\
&\quad + O(I_{g_2} n_y n_t n_q n_x^2) + O(I_{g_2} n_y n_t n_q n_x) + O(n_t n_q n_x) + O(n_y n_t n_q n_x) + O(I_{g_2} n_y n_t n_q n_x) + O(I_{g_2} n_y n_t n_q) + O(n_y n_t)\\
&= O\big(n_y n_t (n_{g_2} n_x^2 + n_{g_2} n_q n_x + I_{g_2} n_q n_x^2 + n_q n_x)\big)
\end{aligned}
\tag{D.77}
\]
operations. Therefore, from complexity counts (D.65) and (D.77), in total, calculating $\partial^2 r(\mathbf{q})/\partial q_l^2$ for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
\begin{aligned}
&O\big({\geq}\, n_x^2 n_\Delta (n_{F_2} n_x + n_{F_2} n_q + n_x n_q)\big) + O\big(n_y n_t (n_{g_2} n_x^2 + n_{g_2} n_q n_x + I_{g_2} n_q n_x^2 + n_q n_x)\big)\\
&= O\big({\geq}\, n_x^2 n_\Delta (n_{F_2} n_x + n_{F_2} n_q + n_x n_q)\big) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big)
\end{aligned}
\tag{D.78}
\]
operations with an explicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$.
In each iteration of $r(\mathbf{q})$ descent, I generate $O(n_\sigma)$ line-search test points, one for each value of $\sigma_j$. Generating each test point requires an update of all parameter values, requiring $O(n_q)$ operations. From complexity count (D.50), for an explicit numerical solution method, calculating $r(\mathbf{q})$ at each line-search test point requires $O(n_f n_x n_\Delta + n_g n_y n_t)$ operations. Thus, for an explicit numerical solution method, updating all parameter values and calculating $r(\mathbf{q})$ at each line-search test point requires
\[
O(n_q) + O(n_f n_x n_\Delta + n_g n_y n_t) = O(n_f n_x n_\Delta + n_g n_y n_t)
\tag{D.79}
\]
operations, as I consider $r(\mathbf{q})$ with fewer parameter values than state values, $n_q < n_x n_\Delta$. As such, for an explicit numerical solution method, updating all parameter values and calculating $r(\mathbf{q})$ at all line-search test points, calculating first order partial derivatives of $r(\mathbf{q})$ with respect to all parameters, and calculating un-mixed second order partial derivatives of $r(\mathbf{q})$ with respect to all parameters requires
\[
\begin{aligned}
&O\big(n_\sigma (n_f n_x n_\Delta + n_g n_y n_t)\big) + O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_y n_t n_{g_1} (n_x + n_q)\big)\\
&\quad + O\big({\geq}\, n_x^2 n_\Delta (n_{F_2} n_x + n_{F_2} n_q + n_x n_q)\big) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big)\\
&= O\big(n_\sigma (n_f n_x n_\Delta + n_g n_y n_t) + n_y n_t n_{g_1} (n_x + n_q)\big) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big)\\
&\quad + O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_x^2 n_q)\big)
\end{aligned}
\tag{D.80}
\]
operations, with partial derivative complexity counts from (D.71) and (D.78). Therefore, for an explicit numerical solution method, an iteration of $r(\mathbf{q})$ descent requires
\[
O\big(n_\sigma (n_f n_x n_\Delta + n_g n_y n_t) + n_y n_t n_{g_1} (n_x + n_q)\big)
+ O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big)
+ O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_x^2 n_q)\big)
\tag{D.81}
\]
operations.

For implicit numerical solution methods, $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ is an implicit system of equations in $\mathbf{x}$, and the discretized sensitivity equations are implicit systems of equations in the discretized sensitivity values, $\partial x_{i,m}/\partial q_l$ and $\partial^2 x_{i,m}/\partial q_l^2$. Otherwise, $r(\mathbf{q})$ descent with an implicit numerical solution method is identical to $r(\mathbf{q})$ descent with an explicit numerical solution method. Generally, $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ is a nonlinear system of equations that is solved numerically using Newton's method. Each iteration of Newton's method to solve $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ requires calculating the values of $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})$ and the values of first order partial derivatives of $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})$ with respect to $x_{l,m}$, for all $i \in \{1, 2, \dots, n_x\}$, $k \in I_\Delta$, $l \in \{1, 2, \dots, n_x\}$, and $m \in I_\Delta$. Calculating
\[
f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) \quad \text{for all } (i,k) \in \{1, 2, \dots, n_x\} \times I_\Delta
\tag{D.82}
\]
requires
\[
O(n_f) \times n_x n_\Delta = O(n_f n_x n_\Delta)
\tag{D.83}
\]
operations, as stipulated in equation (D.5). $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})$ depend on $x_{l,m}$ for only a small fraction of $k$ in $I_\Delta$, at $k$ in $I_{\Delta_m} \subset I_\Delta$. Calculating
\[
\frac{\partial f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})}{\partial x_{l,m}} \quad \text{for all } (i,k,l,m) \in \{1,\dots,n_x\} \times I_{\Delta_m} \times \{1,\dots,n_x\} \times I_\Delta
\tag{D.84}
\]
requires
\[
O(n_{f_1}) \times n_x n_\delta n_x n_\Delta = O(n_{f_1} n_x^2 n_\delta n_\Delta)
\tag{D.85}
\]
operations, as stipulated in equation (D.5). Additionally, each iteration of Newton's method to solve $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ requires solving matrix equations, requiring a total of $O(n_M)$ operations. Thus, in conjunction with complexity counts (D.83) and (D.85), each iteration of Newton's method to solve $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ requires
\[
O(n_x n_\Delta n_f) + O(n_{f_1} n_x^2 n_\delta n_\Delta) + O(n_M) = O(n_x n_\Delta n_f + n_{f_1} n_x^2 n_\delta n_\Delta + n_M)
\tag{D.86}
\]
operations. Solving $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ requires $O(n_N)$ iterations of Newton's method. Thus, for an implicit numerical solution method, numerically solving $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ to determine $\mathbf{x}$ requires
\[
O(n_N n_x n_\Delta n_f + n_N n_{f_1} n_x^2 n_\delta n_\Delta + n_N n_M)
\tag{D.87}
\]
operations. From complexity count (D.49), after calculating $\mathbf{x}$, calculating $r(\mathbf{q})$ requires $O(n_g n_y n_t)$ operations. Thus, from complexity counts (D.49) and (D.87), calculating $r(\mathbf{q})$ requires
\[
O(n_N n_x n_\Delta n_f + n_N n_{f_1} n_x^2 n_\delta n_\Delta + n_N n_M) + O(n_g n_y n_t)
= O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big)
\tag{D.88}
\]
operations with an implicit numerical solution method.
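For concreteness, the following is a minimal sketch of the Newton iteration behind a single step of the simplest implicit numerical solution method, backward Euler: solve $0 = x - x_m - (t_{m+1} - t_m)\,F(t_{m+1}, \mathbf{q}, x)$ for the next state. Each iteration evaluates $F$ and its state Jacobian and solves one dense linear system, the $O(n_M)$ matrix solve in the count above; the callables \texttt{F} and \texttt{dFdx} are assumed, not prescribed.

\begin{verbatim}
import numpy as np

def backward_euler_step(tm, tp, q, xm, F, dFdx, n_newton=10, tol=1e-12):
    """One backward-Euler step via Newton's method (a sketch).
    Assumes F(t, q, x) -> (n_x,) and dFdx(t, q, x) -> (n_x, n_x)."""
    dt = tp - tm
    x = xm.copy()                       # initial Newton iterate
    I = np.eye(xm.size)
    for _ in range(n_newton):           # O(n_N) iterations
        resid = x - xm - dt * F(tp, q, x)
        if np.linalg.norm(resid) < tol:
            break
        J = I - dt * dFdx(tp, q, x)     # Jacobian of the residual
        x -= np.linalg.solve(J, resid)  # the O(n_M) matrix solve
    return x
\end{verbatim}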
The sensitivity equations are generated by applying the chain rule to the differential equation system, and are thus linear in the sensitivity values, $\partial x_i/\partial q_l$ and $\partial^2 x_i/\partial q_l^2$. As such, the discretized sensitivity equations are generally linear in the discretized sensitivity values, $\partial x_{i,m}/\partial q_l$ and $\partial^2 x_{i,m}/\partial q_l^2$. Thus, beyond the calculations required to solve the discretized sensitivity equations with an explicit numerical solution method, solving the discretized sensitivity equations with an implicit numerical solution method requires solving matrix equations. Both the first order and the un-mixed second order discretized sensitivity equations with respect to $q_l$ are identical in size to $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$. Thus, calculating matrix equations in solving the first order discretized sensitivity equations with respect to $q_l$ requires a total of $O(n_M)$ operations, and calculating matrix equations in solving the un-mixed second order discretized sensitivity equations with respect to $q_l$ requires a total of $O(n_M)$ operations. As such, calculating matrix equations in solving the first order discretized sensitivity equations with respect to $q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires a total of $O(n_q n_M)$ operations, and calculating matrix equations in solving the un-mixed second order discretized sensitivity equations with respect to $q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires a total of $O(n_q n_M)$ operations. Therefore, in general, in conjunction with complexity count (D.59), solving the first order sensitivity equations with an implicit numerical solution method requires
\[
O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O(n_q n_M)
\tag{D.89}
\]
operations, and, in conjunction with complexity count (D.65), solving the un-mixed second order sensitivity equations with an implicit numerical solution method requires
\[
O\big({\geq}\, n_x^2 n_\Delta (n_{F_2} n_x + n_{F_2} n_q + n_x n_q)\big) + O(n_q n_M)
\tag{D.90}
\]
operations. From complexity count (D.70), after solving the first order sensitivity equations, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires $O\big(n_y n_t (n_{g_1} n_x + n_{g_1} n_q + n_q n_x)\big)$ operations. Therefore, from complexity counts (D.89) and (D.70), in total, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
\begin{aligned}
&O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O(n_q n_M) + O\big(n_y n_t (n_{g_1} n_x + n_{g_1} n_q + n_q n_x)\big)\\
&= O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_q n_M + n_y n_t n_{g_1} (n_x + n_q)\big)
\end{aligned}
\tag{D.91}
\]
operations with an implicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$. From complexity count (D.77), after solving the un-mixed second order sensitivity equations, calculating $\partial^2 r(\mathbf{q})/\partial q_l^2$ for all $l \in \{1, 2, \dots, n_q\}$ requires $O\big(n_y n_t (n_{g_2} n_x^2 + n_{g_2} n_q n_x + I_{g_2} n_q n_x^2 + n_q n_x)\big)$ operations. Therefore, from complexity counts (D.90) and (D.77), in total, calculating $\partial^2 r(\mathbf{q})/\partial q_l^2$ for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
\begin{aligned}
&O\big({\geq}\, n_x^2 n_\Delta (n_{F_2} n_x + n_{F_2} n_q + n_x n_q)\big) + O(n_q n_M) + O\big(n_y n_t (n_{g_2} n_x^2 + n_{g_2} n_q n_x + I_{g_2} n_q n_x^2 + n_q n_x)\big)\\
&= O\big({\geq}\, n_x^2 n_\Delta (n_{F_2} n_x + n_{F_2} n_q + n_x n_q)\big) + O(n_q n_M) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big)
\end{aligned}
\tag{D.92}
\]
operations with an implicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$.

In each iteration of $r(\mathbf{q})$ descent, I generate $O(n_\sigma)$ line-search test points, one for each value of $\sigma_j$. Generating each test point requires an update of all parameter values, requiring $O(n_q)$ operations. From complexity count (D.88), for an implicit numerical solution method, calculating $r(\mathbf{q})$ at each line-search test point requires $O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big)$ operations. Thus, for an implicit numerical solution method, updating all parameter values and calculating $r(\mathbf{q})$ at each line-search test point requires
\[
O(n_q) + O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big)
= O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big)
\tag{D.93}
\]
operations, as I consider $r(\mathbf{q})$ with fewer parameter values than state values, $n_q < n_x n_\Delta$.
As such, for an implicit numerical solution method, updating all parameter values and calculating $r(\mathbf{q})$ at all line-search test points, calculating first order partial derivatives of $r(\mathbf{q})$ with respect to all parameters, and calculating un-mixed second order partial derivatives of $r(\mathbf{q})$ with respect to all parameters requires
\[
\begin{aligned}
&O\big(n_\sigma n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_\sigma n_N n_M + n_\sigma n_g n_y n_t\big)
+ O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_q n_M + n_y n_t n_{g_1} (n_x + n_q)\big)\\
&\quad + O\big({\geq}\, n_x^2 n_\Delta (n_{F_2} n_x + n_{F_2} n_q + n_x n_q)\big) + O(n_q n_M) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big)\\
&= O\big(n_\sigma n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_\sigma n_g n_y n_t\big) + O\big(n_M (n_q + n_\sigma n_N) + n_y n_t n_{g_1} (n_x + n_q)\big)\\
&\quad + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big) + O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_x^2 n_q)\big)
\end{aligned}
\tag{D.94}
\]
operations, with partial derivative complexity counts from complexity counts (D.91) and (D.92). Therefore, for an implicit numerical solution method, an iteration of $r(\mathbf{q})$ descent requires
\[
\begin{aligned}
&O\big(n_\sigma n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_\sigma n_g n_y n_t\big) + O\big(n_M (n_q + n_\sigma n_N) + n_y n_t n_{g_1} (n_x + n_q)\big)\\
&\quad + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x)\big) + O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_x^2 n_q)\big)
\end{aligned}
\tag{D.95}
\]
operations.

Alternatively, I can approximate partial derivatives of $r(\mathbf{q})$ with respect to parameters by finite differences, rather than by solving the sensitivity equations. Most simply,
\[
\frac{\partial r(\mathbf{q})}{\partial q_l} \approx \frac{r(\mathbf{q} + h_l \mathbf{e}_l) - r(\mathbf{q})}{h_l},
\tag{D.96a}
\]
\[
\frac{\partial^2 r(\mathbf{q})}{\partial q_l^2} \approx \frac{r(\mathbf{q} + h_l \mathbf{e}_l) - 2 r(\mathbf{q}) + r(\mathbf{q} - h_l \mathbf{e}_l)}{h_l^2},
\tag{D.96b}
\]
where $\mathbf{e}_l$ is the $l$th standard basis vector and $h_l$ is some small perturbation in parameter $q_l$, as in the sketch below.
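A minimal sketch of (D.96a) and (D.96b), treating $r$ as a black box whose every evaluation carries the full cost of a numerical solve, which is the source of the $(n_\sigma + n_q)$ factor in the counts of this theorem:

\begin{verbatim}
import numpy as np

def fd_first_and_second(r, q, h):
    """Forward differences (D.96a) and central second differences
    (D.96b) for all parameters. r is a callable q -> scalar; h is an
    (n_q,) array of perturbation sizes. Uses 2*n_q + 1 evaluations
    of r in total."""
    n_q = q.size
    r0 = r(q)
    grad = np.empty(n_q)
    diag2 = np.empty(n_q)
    for l in range(n_q):
        e = np.zeros(n_q)
        e[l] = h[l]
        r_plus, r_minus = r(q + e), r(q - e)
        grad[l] = (r_plus - r0) / h[l]                       # (D.96a)
        diag2[l] = (r_plus - 2.0 * r0 + r_minus) / h[l]**2   # (D.96b)
    return grad, diag2
\end{verbatim}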
Approximating $\partial r(\mathbf{q})/\partial q_l$ and $\partial^2 r(\mathbf{q})/\partial q_l^2$, as in Equations (D.96a) and (D.96b), for all $l \in \{1, 2, \dots, n_q\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&r(\mathbf{q} + h_l \mathbf{e}_l) &&\text{for all } l \in \{1, 2, \dots, n_q\},\\
&r(\mathbf{q}),\\
&r(\mathbf{q} - h_l \mathbf{e}_l) &&\text{for all } l \in \{1, 2, \dots, n_q\},
\end{aligned}
\tag{D.97}
\]
which, from complexity count (D.50), for an explicit numerical solution method, respectively, require
\[
\begin{aligned}
O(n_f n_x n_\Delta + n_g n_y n_t) \times n_q &= O\big(n_q (n_f n_x n_\Delta + n_g n_y n_t)\big),\\
O(I_0),\\
O(n_f n_x n_\Delta + n_g n_y n_t) \times n_q &= O\big(n_q (n_f n_x n_\Delta + n_g n_y n_t)\big)
\end{aligned}
\tag{D.98}
\]
operations to calculate, and, from complexity count (D.88), for an implicit numerical solution method, respectively, require
\[
\begin{aligned}
O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big) \times n_q &= O\big(n_N n_x n_\Delta n_q (n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q + n_g n_y n_t n_q\big),\\
O(I_0),\\
O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big) \times n_q &= O\big(n_N n_x n_\Delta n_q (n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q + n_g n_y n_t n_q\big)
\end{aligned}
\tag{D.99}
\]
operations to calculate. Thus, for an explicit numerical solution method, approximating $\partial r(\mathbf{q})/\partial q_l$ and $\partial^2 r(\mathbf{q})/\partial q_l^2$ by finite difference for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
O\big(n_q (n_f n_x n_\Delta + n_g n_y n_t)\big) + O(I_0) + O\big(n_q (n_f n_x n_\Delta + n_g n_y n_t)\big)
= O\big(n_q (n_f n_x n_\Delta + n_g n_y n_t)\big)
\tag{D.100}
\]
operations, and, for an implicit numerical solution method, approximating $\partial r(\mathbf{q})/\partial q_l$ and $\partial^2 r(\mathbf{q})/\partial q_l^2$ by finite difference for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
O\big(n_N n_x n_\Delta n_q (n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q + n_g n_y n_t n_q\big)
\tag{D.101}
\]
operations.

For an explicit numerical solution method, updating all parameter values and calculating $r(\mathbf{q})$ at all of the $O(n_\sigma)$ line-search test points, as calculated for each line-search test point in complexity count (D.79), and approximating first order and un-mixed second order partial derivatives of $r(\mathbf{q})$ with respect to all parameters by finite difference, as calculated in complexity count (D.100), requires
\[
O(n_\sigma n_f n_x n_\Delta + n_\sigma n_g n_y n_t) + O\big(n_q (n_f n_x n_\Delta + n_g n_y n_t)\big)
= O\big((n_\sigma + n_q)(n_f n_x n_\Delta + n_g n_y n_t)\big)
\tag{D.102}
\]
operations. For an implicit numerical solution method, updating all parameter values and calculating $r(\mathbf{q})$ at all of the $O(n_\sigma)$ line-search test points, as calculated for each line-search test point in complexity count (D.93), and approximating first order and un-mixed second order partial derivatives of $r(\mathbf{q})$ with respect to all parameters by finite difference, as calculated in complexity count (D.101), requires
\[
\begin{aligned}
&O\big(n_\sigma n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_\sigma n_N n_M + n_\sigma n_g n_y n_t\big) + O\big(n_N n_x n_\Delta n_q (n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q + n_g n_y n_t n_q\big)\\
&= O\big((n_\sigma + n_q)\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big)\big)
\end{aligned}
\tag{D.103}
\]
operations. Therefore, from complexity count (D.102), for an explicit numerical solution method, an iteration of $r(\mathbf{q})$ descent with partial derivative approximation by finite difference requires
\[
O\big((n_\sigma + n_q)(n_f n_x n_\Delta + n_g n_y n_t)\big)
\tag{D.104}
\]
operations, and, from complexity count (D.103), for an implicit numerical solution method, an iteration of $r(\mathbf{q})$ descent with partial derivative approximation by finite difference requires
\[
O\big((n_\sigma + n_q)\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big)\big)
\tag{D.105}
\]
operations.

D.2.2 Counting the Computational Complexity of Newton's Method to Minimize r(q)

On an unrestricted domain, a local minimum of $r(\mathbf{q})$ occurs where $\partial r(\mathbf{q})/\partial q_l = 0$ for all $l \in \{1, 2, \dots, n_q\}$. Thus, a local minimum of $r(\mathbf{q})$ is calculable by applying Newton's method to find a solution of the system $\partial r(\mathbf{q})/\partial q_l = 0$ for all $l \in \{1, 2, \dots, n_q\}$.

Theorem 9. For an explicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method requires
\[
O\big(n_f n_x n_\Delta + n_y n_t n_{g_1} (n_x + n_q)\big)
+ O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big)
+ O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)
\tag{D.106}
\]
operations. For an implicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method requires
\[
\begin{aligned}
&O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_M (n_q^2 + n_N) + n_y n_t n_{g_1} (n_x + n_q)\big)
+ O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big)\\
&\quad + O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)
\end{aligned}
\tag{D.107}
\]
operations, with $O(n_M)$ operations in solving matrix equations for each of $O(n_N)$ iterations of Newton's method applied to $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$, for all $i \in \{1, 2, \dots, n_x\}$ and $k \in I_\Delta$.

Alternatively, for an explicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method with partial derivative approximation by finite difference requires
\[
O\big(n_q^2 (n_f n_x n_\Delta + n_g n_y n_t)\big)
\tag{D.108}
\]
operations, and, for an implicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method with partial derivative approximation by finite difference requires
\[
O\big(n_N n_x n_\Delta n_q^2 (n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q^2 + n_g n_y n_t n_q^2\big)
\tag{D.109}
\]
operations.
Proof. In each iteration of $r(\mathbf{q})$ minimization by Newton's method, I calculate numerical solution values, $\mathbf{x}$. For an explicit numerical solution method, $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ is an explicit system of equations in $\mathbf{x}$, and solving $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ simply requires evaluating $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})$. Thus, to determine $\mathbf{x}$, solving
\[
f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0 \quad \text{for all } (i,k) \in \{1,\dots,n_x\} \times I_\Delta
\tag{D.110}
\]
requires
\[
O(n_f) \times n_x n_\Delta = O(n_f n_x n_\Delta)
\tag{D.111}
\]
operations, as stipulated in equation (D.5).

In each iteration of $r(\mathbf{q})$ minimization by Newton's method, I calculate first order and second order partial derivatives of $r(\mathbf{q})$ with respect to all parameters,
\[
\frac{\partial r(\mathbf{q})}{\partial q_l}
= \frac{1}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} \frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \left( \sum_{m=1}^{n_x} \frac{\partial g_{j,k}}{\partial x_{m,k}} \frac{\partial x_{m,k}}{\partial q_l} + \frac{\partial g_{j,k}}{\partial q_l} \right),
\tag{D.112a}
\]
\[
\begin{aligned}
\frac{\partial^2 r(\mathbf{q})}{\partial q_m \partial q_l}
= \frac{1}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} \Bigg[
&\frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2} \left( \sum_{s=1}^{n_x} \frac{\partial g_{j,k}}{\partial x_{s,k}} \frac{\partial x_{s,k}}{\partial q_m} + \frac{\partial g_{j,k}}{\partial q_m} \right) \left( \sum_{s=1}^{n_x} \frac{\partial g_{j,k}}{\partial x_{s,k}} \frac{\partial x_{s,k}}{\partial q_l} + \frac{\partial g_{j,k}}{\partial q_l} \right)\\
&+ \frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \left( \sum_{s=1}^{n_x} \left( \left( \sum_{t=1}^{n_x} \frac{\partial^2 g_{j,k}}{\partial x_{t,k} \partial x_{s,k}} \frac{\partial x_{t,k}}{\partial q_m} + \frac{\partial^2 g_{j,k}}{\partial q_m \partial x_{s,k}} \right) \frac{\partial x_{s,k}}{\partial q_l} + \frac{\partial g_{j,k}}{\partial x_{s,k}} \frac{\partial^2 x_{s,k}}{\partial q_m \partial q_l} \right) + \sum_{s=1}^{n_x} \frac{\partial^2 g_{j,k}}{\partial x_{s,k} \partial q_l} \frac{\partial x_{s,k}}{\partial q_m} + \frac{\partial^2 g_{j,k}}{\partial q_m \partial q_l} \right) \Bigg].
\end{aligned}
\tag{D.112b}
\]
Partial derivatives of state values with respect to parameters are generally calculated by numerically solving the sensitivity equations, which are generated by applying the chain rule to the differential equation system.

In the case of an initial value problem, $dx_i/dt = F_i(t, \mathbf{q}, x_1, \dots, x_{n_x})$ for $i \in \{1, 2, \dots, n_x\}$, applying the chain rule to $F_i(t, \mathbf{q}, x_1, \dots, x_{n_x})$ generates the sensitivity equations,
\[
\frac{d}{dt}\!\left(\frac{\partial x_i}{\partial q_l}\right)
= \frac{\partial}{\partial q_l}\!\left(\frac{d x_i}{dt}\right)
= \sum_{j=1}^{n_x} \frac{\partial F_i}{\partial x_j} \frac{\partial x_j}{\partial q_l} + \frac{\partial F_i}{\partial q_l},
\tag{D.113a}
\]
\[
\frac{d}{dt}\!\left(\frac{\partial^2 x_i}{\partial q_m \partial q_l}\right)
= \frac{\partial^2}{\partial q_m \partial q_l}\!\left(\frac{d x_i}{dt}\right)
= \sum_{j=1}^{n_x} \left[ \left( \sum_{k=1}^{n_x} \frac{\partial^2 F_i}{\partial x_k \partial x_j} \frac{\partial x_k}{\partial q_m} + \frac{\partial^2 F_i}{\partial q_m \partial x_j} \right) \frac{\partial x_j}{\partial q_l} + \frac{\partial F_i}{\partial x_j} \frac{\partial^2 x_j}{\partial q_m \partial q_l} \right] + \sum_{j=1}^{n_x} \frac{\partial^2 F_i}{\partial x_j \partial q_l} \frac{\partial x_j}{\partial q_m} + \frac{\partial^2 F_i}{\partial q_m \partial q_l},
\tag{D.113b}
\]
two systems of differential equations, one in $\partial x_i/\partial q_l$ and one in $\partial^2 x_i/\partial q_m \partial q_l$ for $i \in \{1, 2, \dots, n_x\}$. From Equations (D.113a) and (D.113b), using the forward Euler method, the simplest explicit numerical method for initial value problems, I can calculate $\partial x_{i,s}/\partial q_l$ and $\partial^2 x_{i,s}/\partial q_m \partial q_l$ for $i \in \{1, 2, \dots, n_x\}$ and $s \in I_\Delta$ such that
\[
\frac{\partial x_{i,s+1}}{\partial q_l}
= \frac{\partial x_{i,s}}{\partial q_l} + (t_{s+1} - t_s) \left( \sum_{j=1}^{n_x} \frac{\partial F_{i,s}}{\partial x_{j,s}} \frac{\partial x_{j,s}}{\partial q_l} + \frac{\partial F_{i,s}}{\partial q_l} \right),
\tag{D.114a}
\]
\[
\frac{\partial^2 x_{i,s+1}}{\partial q_m \partial q_l}
= \frac{\partial^2 x_{i,s}}{\partial q_m \partial q_l} + (t_{s+1} - t_s) \left( \sum_{j=1}^{n_x} \left[ \left( \sum_{k=1}^{n_x} \frac{\partial^2 F_{i,s}}{\partial x_{k,s} \partial x_{j,s}} \frac{\partial x_{k,s}}{\partial q_m} + \frac{\partial^2 F_{i,s}}{\partial q_m \partial x_{j,s}} \right) \frac{\partial x_{j,s}}{\partial q_l} + \frac{\partial F_{i,s}}{\partial x_{j,s}} \frac{\partial^2 x_{j,s}}{\partial q_m \partial q_l} \right] + \sum_{j=1}^{n_x} \frac{\partial^2 F_{i,s}}{\partial x_{j,s} \partial q_l} \frac{\partial x_{j,s}}{\partial q_m} + \frac{\partial^2 F_{i,s}}{\partial q_m \partial q_l} \right),
\tag{D.114b}
\]
where $F_{i,s} = F_i(t_s, \mathbf{q}, x_{1,s}, \dots, x_{n_x,s})$.

Solving system (D.114a) for $i \in \{1, 2, \dots, n_x\}$, $l \in \{1, 2, \dots, n_q\}$, and $s \in I_\Delta$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial F_{i,s}}{\partial x_{j,s}} &&\text{for all } (i,j,s) \in \{1,\dots,n_x\} \times \{1,\dots,n_x\} \times I_\Delta,\\
&\frac{\partial F_{i,s}}{\partial q_l} &&\text{for all } (i,l,s) \in \{1,\dots,n_x\} \times \{1,\dots,n_q\} \times I_\Delta,
\end{aligned}
\tag{D.115}
\]
which, respectively, require
\[
\begin{aligned}
O(n_{F_1}) \times n_x^2 n_\Delta &= O(n_{F_1} n_x^2 n_\Delta),\\
O(n_{F_1}) \times n_x n_q n_\Delta &= O(n_{F_1} n_x n_q n_\Delta)
\end{aligned}
\tag{D.116}
\]
operations to calculate, as stipulated in equation (D.7). Apart from calculating partial derivative values, solving system (D.114a) for all $i \in \{1, 2, \dots, n_x\}$, $l \in \{1, 2, \dots, n_q\}$, and $s \in I_\Delta$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial F_{i,s}}{\partial x_{j,s}} \frac{\partial x_{j,s}}{\partial q_l} &&\text{for all } (i,j,l,s) \in \{1,\dots,n_x\} \times \{1,\dots,n_x\} \times \{1,\dots,n_q\} \times I_\Delta,\\
&\frac{\partial F_{i,s}}{\partial q_l} &&\text{for all } (i,l,s) \in \{1,\dots,n_x\} \times \{1,\dots,n_q\} \times I_\Delta,
\end{aligned}
\tag{D.117}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_x^2 n_q n_\Delta &= O(n_x^2 n_q n_\Delta),\\
O(I_0) \times n_x n_q n_\Delta &= O(n_x n_q n_\Delta)
\end{aligned}
\tag{D.118}
\]
operations to calculate, where $I_0$ indicates values that have been calculated previously and $O(I_0) = 1$. Therefore, from complexity counts (D.116) and (D.118), in total, solving systems (D.114a) for all $i \in \{1, 2, \dots, n_x\}$, $l \in \{1, 2, \dots, n_q\}$, and $s \in I_\Delta$, the first order sensitivity equations for an initial value problem using the forward Euler method, requires
\[
O(n_{F_1} n_x^2 n_\Delta) + O(n_{F_1} n_x n_q n_\Delta) + O(n_x^2 n_q n_\Delta) + O(n_x n_q n_\Delta)
= O\big(n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big)
\tag{D.119}
\]
operations. The forward Euler method applied to an initial value problem is the computationally least expensive explicit numerical solution method. Thus, in general, solving the first order sensitivity equations with an explicit numerical solution method requires
\[
O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big)
\tag{D.120}
\]
operations.
Similarly, solving system (D.114b) for $i \in \{1, 2, \dots, n_x\}$, $l \in \{1, 2, \dots, n_q\}$, $m \in \{1, 2, \dots, n_q\}$, and $s \in I_\Delta$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial^2 F_{i,s}}{\partial x_{k,s} \partial x_{j,s}} &&\text{for all } (i,j,k,s) \in \{1,\dots,n_x\}^3 \times I_\Delta,\\
&\frac{\partial^2 F_{i,s}}{\partial x_{j,s} \partial q_l} &&\text{for all } (i,j,l,s) \in \{1,\dots,n_x\}^2 \times \{1,\dots,n_q\} \times I_\Delta,\\
&\frac{\partial^2 F_{i,s}}{\partial q_m \partial q_l} &&\text{for all } (i,l,m,s) \in \{1,\dots,n_x\} \times \{1,\dots,n_q\}^2 \times I_\Delta,
\end{aligned}
\tag{D.121}
\]
which, respectively, require
\[
\begin{aligned}
O(n_{F_2}) \times n_x^3 n_\Delta &= O(n_{F_2} n_x^3 n_\Delta),\\
O(n_{F_2}) \times n_x^2 n_q n_\Delta &= O(n_{F_2} n_x^2 n_q n_\Delta),\\
O(n_{F_2}) \times n_x n_q^2 n_\Delta &= O(n_{F_2} n_x n_q^2 n_\Delta)
\end{aligned}
\tag{D.122}
\]
operations to calculate, as stipulated in equation (D.7). Apart from calculating partial derivative values, solving systems (D.114b) for all $i \in \{1, 2, \dots, n_x\}$, $l \in \{1, 2, \dots, n_q\}$, $m \in \{1, 2, \dots, n_q\}$, and $s \in I_\Delta$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial^2 F_{i,s}}{\partial x_{k,s} \partial x_{j,s}} \frac{\partial x_{k,s}}{\partial q_m} &&\text{for all } (i,j,k,m,s) \in \{1,\dots,n_x\}^3 \times \{1,\dots,n_q\} \times I_\Delta,\\
&\frac{\partial^2 F_{i,s}}{\partial q_m \partial x_{j,s}} &&\text{for all } (i,j,m,s) \in \{1,\dots,n_x\}^2 \times \{1,\dots,n_q\} \times I_\Delta,\\
&I_0 \cdot \frac{\partial x_{j,s}}{\partial q_l} &&\text{for all } (j,l,s) \in \{1,\dots,n_x\} \times \{1,\dots,n_q\} \times I_\Delta,\\
&\frac{\partial F_{i,s}}{\partial x_{j,s}} \frac{\partial^2 x_{j,s}}{\partial q_m \partial q_l} &&\text{for all } (i,j,l,m,s) \in \{1,\dots,n_x\}^2 \times \{1,\dots,n_q\}^2 \times I_\Delta,\\
&\frac{\partial^2 F_{i,s}}{\partial x_{j,s} \partial q_l} \frac{\partial x_{j,s}}{\partial q_m} &&\text{for all } (i,j,l,m,s) \in \{1,\dots,n_x\}^2 \times \{1,\dots,n_q\}^2 \times I_\Delta,\\
&\frac{\partial^2 F_{i,s}}{\partial q_m \partial q_l} &&\text{for all } (i,l,m,s) \in \{1,\dots,n_x\} \times \{1,\dots,n_q\}^2 \times I_\Delta,
\end{aligned}
\tag{D.123}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_x^3 n_q n_\Delta &= O(n_x^3 n_q n_\Delta),\\
O(I_0) \times n_x^2 n_q n_\Delta &= O(n_x^2 n_q n_\Delta),\\
O(1) \circ O(I_0) \times n_x n_q n_\Delta &= O(n_x n_q n_\Delta),\\
O(1) \circ O(I_0) \times n_x^2 n_q^2 n_\Delta &= O(n_x^2 n_q^2 n_\Delta),\\
O(1) \circ O(I_0) \times n_x^2 n_q^2 n_\Delta &= O(n_x^2 n_q^2 n_\Delta),\\
O(I_0) \times n_x n_q^2 n_\Delta &= O(n_x n_q^2 n_\Delta)
\end{aligned}
\tag{D.124}
\]
operations to calculate. Therefore, from complexity counts (D.122) and (D.124), in total, solving systems (D.114b) for all $i \in \{1, 2, \dots, n_x\}$, $l \in \{1, 2, \dots, n_q\}$, $m \in \{1, 2, \dots, n_q\}$, and $s \in I_\Delta$, the second order sensitivity equations for an initial value problem using the forward Euler method, requires
\[
\begin{aligned}
&O(n_{F_2} n_x^3 n_\Delta) + O(n_{F_2} n_x^2 n_q n_\Delta) + O(n_{F_2} n_x n_q^2 n_\Delta)
+ O(n_x^3 n_q n_\Delta) + O(n_x^2 n_q n_\Delta) + O(n_x n_q n_\Delta) + O(n_x^2 n_q^2 n_\Delta) + O(n_x^2 n_q^2 n_\Delta) + O(n_x n_q^2 n_\Delta)\\
&= O\big(n_x n_\Delta (n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)
\end{aligned}
\tag{D.125}
\]
operations. The forward Euler method applied to an initial value problem is the computationally least expensive explicit numerical solution method. Thus, in general, solving the second order sensitivity equations with an explicit numerical solution method requires
\[
O\big({\geq}\, n_x n_\Delta (n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)
\tag{D.126}
\]
operations.
After solving the first order sensitivity equations, calculating $\partial r(\mathbf{q})/\partial q_l$, of equation (D.112a), for all $l \in \{1, 2, \dots, n_q\}$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial g_{j,k}}{\partial x_{m,k}} &&\text{for all } (j,k,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_x\},\\
&\frac{\partial g_{j,k}}{\partial q_l} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\},\\
&\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\end{aligned}
\tag{D.127}
\]
which, respectively, require
\[
\begin{aligned}
O(n_{g_1}) \times n_y n_t n_x &= O(n_{g_1} n_y n_t n_x),\\
O(n_{g_1}) \times n_y n_t n_q &= O(n_{g_1} n_y n_t n_q),\\
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t)
\end{aligned}
\tag{D.128}
\]
operations to calculate, as stipulated in Equations (D.3) and (D.4). After solving the first order sensitivity equations and calculating partial derivative values, calculating $\partial r(\mathbf{q})/\partial q_l$, of equation (D.112a), for all $l \in \{1, 2, \dots, n_q\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial g_{j,k}}{\partial x_{m,k}} \frac{\partial x_{m,k}}{\partial q_l} &&\text{for all } (j,k,m,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_x\} \times \{1,\dots,n_q\},\\
&\frac{\partial g_{j,k}}{\partial q_l} &&\text{for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\},\\
&\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \cdot I_0 &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\end{aligned}
\tag{D.129}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_t n_x n_q &= O(n_y n_t n_x n_q),\\
O(I_0) \times n_y n_t n_q &= O(n_y n_t n_q),\\
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t)
\end{aligned}
\tag{D.130}
\]
operations to calculate. Thus, from complexity counts (D.128) and (D.130), after solving the first order sensitivity equations, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
O(n_{g_1} n_y n_t n_x) + O(n_{g_1} n_y n_t n_q) + O(n_y n_t) + O(n_y n_t n_x n_q) + O(n_y n_t n_q) + O(n_y n_t)
= O\big(n_y n_t (n_{g_1} n_x + n_{g_1} n_q + n_x n_q)\big)
\tag{D.131}
\]
operations. Therefore, from complexity counts (D.120) and (D.131), in total, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_y n_t (n_{g_1} n_x + n_{g_1} n_q + n_x n_q)\big)
= O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_y n_t n_{g_1} (n_x + n_q)\big)
\tag{D.132}
\]
operations with an explicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$.

After solving the second order sensitivity equations, calculating $\partial^2 r(\mathbf{q})/\partial q_m \partial q_l$, of equation (D.112b), for all $l \in \{1, 2, \dots, n_q\}$ and $m \in \{1, 2, \dots, n_q\}$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2} &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},\\
&\frac{\partial^2 g_{j,k}}{\partial x_{t,k} \partial x_{s,k}} &&\text{for all } (j,k,s,t) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_x\}^2,\\
&\frac{\partial^2 g_{j,k}}{\partial x_{s,k} \partial q_l} &&\text{for all } (j,k,l,s) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\},\\
&\frac{\partial^2 g_{j,k}}{\partial q_m \partial q_l} &&\text{for all } (j,k,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\}^2,
\end{aligned}
\tag{D.133}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t),\\
O(n_{g_2}) \times n_y n_t n_x^2 &= O(n_{g_2} n_y n_t n_x^2),\\
O(n_{g_2}) \times n_y n_t n_q n_x &= O(n_{g_2} n_y n_t n_q n_x),\\
O(n_{g_2}) \times n_y n_t n_q^2 &= O(n_{g_2} n_y n_t n_q^2)
\end{aligned}
\tag{D.134}
\]
operations to calculate, as stipulated in Equations (D.3) and (D.4).
After solving the second order sensitivity equations and calculating partial derivative values, calculating $\partial^2 r(\mathbf{q})/\partial q_m \partial q_l$, of equation (D.112b), for all $l \in \{1, 2, \dots, n_q\}$ and $m \in \{1, 2, \dots, n_q\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial^2 d_{y_{j,k}}}{\partial g_{j,k}^2} \cdot I_0^2 &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},\\
&\frac{\partial^2 g_{j,k}}{\partial x_{t,k} \partial x_{s,k}} \frac{\partial x_{t,k}}{\partial q_m} &&\text{for all } (j,k,m,s,t) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\}^2,\\
&\frac{\partial^2 g_{j,k}}{\partial q_m \partial x_{s,k}} &&\text{for all } (j,k,m,s) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\},\\
&I_0 \cdot \frac{\partial x_{s,k}}{\partial q_l} &&\text{for all } (k,l,s) \in \{1,\dots,n_t\} \times \{1,\dots,n_q\} \times \{1,\dots,n_x\},\\
&\frac{\partial g_{j,k}}{\partial x_{s,k}} \frac{\partial^2 x_{s,k}}{\partial q_m \partial q_l} &&\text{for all } (j,k,l,m,s) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\}^2 \times \{1,\dots,n_x\},\\
&\frac{\partial^2 g_{j,k}}{\partial x_{s,k} \partial q_l} \frac{\partial x_{s,k}}{\partial q_m} &&\text{for all } (j,k,l,m,s) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\}^2 \times \{1,\dots,n_x\},\\
&\frac{\partial^2 g_{j,k}}{\partial q_m \partial q_l} &&\text{for all } (j,k,l,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\}^2,\\
&\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \cdot I_0 &&\text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\end{aligned}
\tag{D.135}
\]
which, respectively, require
\[
\begin{aligned}
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t),\\
I_{g_2} \cdot O(1) \circ O(I_0) \times n_y n_t n_q n_x^2 &= O(I_{g_2} n_y n_t n_q n_x^2),\\
I_{g_2} \cdot O(I_0) \times n_y n_t n_q n_x &= O(I_{g_2} n_y n_t n_q n_x),\\
O(1) \circ O(I_0) \times n_t n_q n_x &= O(n_t n_q n_x),\\
O(1) \circ O(I_0) \times n_y n_t n_q^2 n_x &= O(n_y n_t n_q^2 n_x),\\
I_{g_2} \cdot O(1) \circ O(I_0) \times n_y n_t n_q^2 n_x &= O(I_{g_2} n_y n_t n_q^2 n_x),\\
I_{g_2} \cdot O(I_0) \times n_y n_t n_q^2 &= O(I_{g_2} n_y n_t n_q^2),\\
O(1) \circ O(I_0) \times n_y n_t &= O(n_y n_t)
\end{aligned}
\tag{D.136}
\]
operations to calculate, where
\[
I_{g_2} =
\begin{cases}
0 & \text{if } n_{g_2} = 0,\\
1 & \text{if } O(n_{g_2}) \geq 1.
\end{cases}
\tag{D.137}
\]
Thus, from complexity counts (D.134) and (D.136), after solving the second order sensitivity equations, calculating $\partial^2 r(\mathbf{q})/\partial q_m \partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ and $m \in \{1, 2, \dots, n_q\}$ requires
\[
\begin{aligned}
&O(n_y n_t) + O(n_{g_2} n_y n_t n_x^2) + O(n_{g_2} n_y n_t n_q n_x) + O(n_{g_2} n_y n_t n_q^2) + O(n_y n_t)\\
&\quad + O(I_{g_2} n_y n_t n_q n_x^2) + O(I_{g_2} n_y n_t n_q n_x) + O(n_t n_q n_x) + O(n_y n_t n_q^2 n_x) + O(I_{g_2} n_y n_t n_q^2 n_x) + O(I_{g_2} n_y n_t n_q^2) + O(n_y n_t)\\
&= O\big(n_y n_t (n_{g_2} n_x^2 + n_{g_2} n_q n_x + n_{g_2} n_q^2 + I_{g_2} n_q n_x^2 + I_{g_2} n_q^2 n_x + n_q^2 n_x)\big)
\end{aligned}
\tag{D.138}
\]
operations. Therefore, from complexity counts (D.126) and (D.138), in total, calculating $\partial^2 r(\mathbf{q})/\partial q_m \partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ and $m \in \{1, 2, \dots, n_q\}$ requires
\[
\begin{aligned}
&O\big({\geq}\, n_x n_\Delta (n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)
+ O\big(n_y n_t (n_{g_2} n_x^2 + n_{g_2} n_q n_x + n_{g_2} n_q^2 + I_{g_2} n_q n_x^2 + I_{g_2} n_q^2 n_x + n_q^2 n_x)\big)\\
&= O\big({\geq}\, n_x n_\Delta (n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big)
\end{aligned}
\tag{D.139}
\]
operations with an explicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$.

In each iteration of $r(\mathbf{q})$ minimization by Newton's method, updating parameter values requires solving an $n_q \times n_q$ matrix equation, which requires $O(n_q^3)$ operations using Gaussian elimination. Thus, for an explicit numerical solution method, calculating first order partial derivatives of $r(\mathbf{q})$ with respect to all parameters, calculating second order partial derivatives of $r(\mathbf{q})$ with respect to all parameters, updating parameter values, and updating numerical solution values requires
\[
\begin{aligned}
&O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_y n_t n_{g_1} (n_x + n_q)\big)
+ O\big({\geq}\, n_x n_\Delta (n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)\\
&\quad + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big) + O(n_q^3) + O(n_f n_x n_\Delta)\\
&= O\big(n_f n_x n_\Delta + n_y n_t n_{g_1} (n_x + n_q)\big) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big)\\
&\quad + O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)
\end{aligned}
\tag{D.140}
\]
operations, as $n_q < n_x n_\Delta$, the number of parameters is less than the number of state values, with partial derivative complexity counts from (D.132) and (D.139) and numerical solution complexity count from (D.111). Therefore, for an explicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method requires
\[
O\big(n_f n_x n_\Delta + n_y n_t n_{g_1} (n_x + n_q)\big)
+ O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big)
+ O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)
\tag{D.141}
\]
operations.
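Schematically, the parameter update itself is only a dense $n_q \times n_q$ solve followed by a re-solve of the discretized system; a minimal sketch, with \texttt{grad} and \texttt{hess} supplied by either the sensitivity equations or finite differences, and \texttt{resolve\_x} a hypothetical callable that re-solves $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ at the new parameters:

\begin{verbatim}
import numpy as np

def newton_parameter_update(q, grad, hess, resolve_x):
    """One parameter update of r(q) minimization by Newton's method
    (a sketch): solve H dq = -grad, an O(n_q**3) dense solve by
    Gaussian elimination, then recompute the numerical solution."""
    dq = np.linalg.solve(hess, -grad)  # O(n_q**3) linear solve
    q_new = q + dq
    return q_new, resolve_x(q_new)
\end{verbatim}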
For implicit numerical solution methods, $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ is an implicit system of equations in $\mathbf{x}$, and the discretized sensitivity equations are implicit systems of equations in the discretized sensitivity values, $\partial x_{i,s}/\partial q_l$ and $\partial^2 x_{i,s}/\partial q_m \partial q_l$. Otherwise, $r(\mathbf{q})$ minimization by Newton's method with an implicit numerical solution method is identical to $r(\mathbf{q})$ minimization by Newton's method with an explicit numerical solution method. Generally, $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ is a nonlinear system of equations that is solved numerically using Newton's method. Each iteration of Newton's method to solve $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ requires calculating the values of $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})$ and the values of first order partial derivatives of $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})$ with respect to $x_{l,m}$, for all $i \in \{1, 2, \dots, n_x\}$, $k \in I_\Delta$, $l \in \{1, 2, \dots, n_x\}$, and $m \in I_\Delta$. Calculating
\[
f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) \quad \text{for all } (i,k) \in \{1, 2, \dots, n_x\} \times I_\Delta
\tag{D.142}
\]
requires
\[
O(n_f) \times n_x n_\Delta = O(n_f n_x n_\Delta)
\tag{D.143}
\]
operations, as stipulated in equation (D.5). $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})$ depend on $x_{l,m}$ for only a small fraction of $k$ in $I_\Delta$, at $k$ in $I_{\Delta_m} \subset I_\Delta$. Calculating
\[
\frac{\partial f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x})}{\partial x_{l,m}} \quad \text{for all } (i,k,l,m) \in \{1,\dots,n_x\} \times I_{\Delta_m} \times \{1,\dots,n_x\} \times I_\Delta
\tag{D.144}
\]
requires
\[
O(n_{f_1}) \times n_x n_\delta n_x n_\Delta = O(n_{f_1} n_x^2 n_\delta n_\Delta)
\tag{D.145}
\]
operations, as stipulated in equation (D.5). Additionally, each iteration of Newton's method to solve $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ requires solving matrix equations, requiring a total of $O(n_M)$ operations. Thus, in conjunction with complexity counts (D.143) and (D.145), each iteration of Newton's method to solve $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ requires
\[
O(n_x n_\Delta n_f) + O(n_{f_1} n_x^2 n_\delta n_\Delta) + O(n_M)
= O\big(n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_M\big)
\tag{D.146}
\]
operations. Solving $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ requires $O(n_N)$ iterations of Newton's method. Thus, for an implicit numerical solution method, numerically solving $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$ to determine $\mathbf{x}$ requires
\[
O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M\big)
\tag{D.147}
\]
operations.

The sensitivity equations are generated by applying the chain rule to the differential equation system, and are thus linear in the sensitivity values, $\partial x_i/\partial q_l$ and $\partial^2 x_i/\partial q_m \partial q_l$. As such, the discretized sensitivity equations are generally linear in the discretized sensitivity values, $\partial x_{i,s}/\partial q_l$ and $\partial^2 x_{i,s}/\partial q_m \partial q_l$. Thus, beyond the calculations required to solve the discretized sensitivity equations with an explicit numerical solution method, solving the discretized sensitivity equations with an implicit numerical solution method requires solving matrix equations. Both the first order and second order discretized sensitivity equations with respect to $q_l$ are identical in size to $f_{i,k}(\mathbf{t},\mathbf{q},\mathbf{x}) = 0$. Thus, calculating matrix equations in solving the first order discretized sensitivity equations with respect to $q_l$ requires a total of $O(n_M)$ operations, and calculating matrix equations in solving the second order discretized sensitivity equations with respect to $q_l$ requires a total of $O(n_M)$ operations. As such, calculating matrix equations in solving the first order discretized sensitivity equations with respect to $q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires a total of $O(n_q n_M)$ operations, and calculating matrix equations in solving the second order discretized sensitivity equations with respect to $q_l$ and $q_m$ for all $l \in \{1, 2, \dots, n_q\}$ and $m \in \{1, 2, \dots, n_q\}$ requires a total of $O(n_q^2 n_M)$ operations. Therefore, in general, in conjunction with complexity count (D.120), solving the first order sensitivity equations with an implicit numerical solution method requires
\[
O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O(n_q n_M)
\tag{D.148}
\]
operations, and, in conjunction with complexity count (D.126), solving the second order sensitivity equations with an implicit numerical solution method requires
\[
O\big({\geq}\, n_x n_\Delta (n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big) + O(n_q^2 n_M)
\tag{D.149}
\]
operations. From complexity count (D.131), after solving the first order sensitivity equations, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires $O\big(n_y n_t (n_{g_1} n_x + n_{g_1} n_q + n_x n_q)\big)$ operations. Therefore, from complexity counts (D.148) and (D.131), in total, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ requires
\[
\begin{aligned}
&O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O(n_q n_M) + O\big(n_y n_t (n_{g_1} n_x + n_{g_1} n_q + n_x n_q)\big)\\
&= O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_q n_M + n_y n_t n_{g_1} (n_x + n_q)\big)
\end{aligned}
\tag{D.150}
\]
operations with an implicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$.
From complexity count (D.138), after solving the second order sensitivity equations, calculating $\partial^2 r(\mathbf{q})/\partial q_m \partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ and $m \in \{1, 2, \dots, n_q\}$ requires $O\big(n_y n_t (n_{g_2} n_x^2 + n_{g_2} n_q n_x + n_{g_2} n_q^2 + I_{g_2} n_q n_x^2 + I_{g_2} n_q^2 n_x + n_q^2 n_x)\big)$ operations. Therefore, from complexity counts (D.149) and (D.138), in total, calculating $\partial^2 r(\mathbf{q})/\partial q_m \partial q_l$ for all $l \in \{1, 2, \dots, n_q\}$ and $m \in \{1, 2, \dots, n_q\}$ requires
\[
\begin{aligned}
&O\big({\geq}\, n_x n_\Delta (n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big) + O(n_q^2 n_M)
+ O\big(n_y n_t (n_{g_2} n_x^2 + n_{g_2} n_q n_x + n_{g_2} n_q^2 + I_{g_2} n_q n_x^2 + I_{g_2} n_q^2 n_x + n_q^2 n_x)\big)\\
&= O\big({\geq}\, n_x n_\Delta (n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big) + O(n_q^2 n_M) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big)
\end{aligned}
\tag{D.151}
\]
operations with an implicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$.

In each iteration of $r(\mathbf{q})$ minimization by Newton's method, updating parameter values requires solving an $n_q \times n_q$ matrix equation, which requires $O(n_q^3)$ operations using Gaussian elimination. Thus, for an implicit numerical solution method, calculating first order partial derivatives of $r(\mathbf{q})$ with respect to all parameters, calculating second order partial derivatives of $r(\mathbf{q})$ with respect to all parameters, updating parameter values, and updating numerical solution values requires
\[
\begin{aligned}
&O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_q n_M + n_y n_t n_{g_1} (n_x + n_q)\big)
+ O\big({\geq}\, n_x n_\Delta (n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)\\
&\quad + O(n_q^2 n_M) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big) + O(n_q^3) + O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M\big)\\
&= O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_M (n_q^2 + n_N) + n_y n_t n_{g_1} (n_x + n_q)\big) + O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big)\\
&\quad + O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)
\end{aligned}
\tag{D.152}
\]
operations, as $n_q < n_x n_\Delta$, the number of parameters is less than the number of state values, with partial derivative complexity counts from (D.150) and (D.151) and numerical solution complexity count from (D.147). Therefore, for an implicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method requires
\[
\begin{aligned}
&O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_M (n_q^2 + n_N) + n_y n_t n_{g_1} (n_x + n_q)\big)
+ O\big(n_y n_t n_{g_2} (n_x^2 + n_q n_x + n_q^2)\big)\\
&\quad + O\big({\geq}\, n_x n_\Delta (n_{F_1} n_x + n_{F_1} n_q + n_{F_2} n_x^2 + n_{F_2} n_x n_q + n_{F_2} n_q^2 + n_x^2 n_q + n_x n_q^2)\big)
\end{aligned}
\tag{D.153}
\]
operations.

Alternatively, I can approximate partial derivatives of $r(\mathbf{q})$ with respect to parameters by finite differences, rather than by solving the sensitivity equations. Most simply,
\[
\frac{\partial r(\mathbf{q})}{\partial q_l} \approx \frac{r(\mathbf{q} + h_l \mathbf{e}_l) - r(\mathbf{q})}{h_l},
\tag{D.154a}
\]
\[
\frac{\partial^2 r(\mathbf{q})}{\partial q_m \partial q_l} \approx \frac{r(\mathbf{q} + h_l \mathbf{e}_l) - r(\mathbf{q}) - r(\mathbf{q} + h_l \mathbf{e}_l - h_m \mathbf{e}_m) + r(\mathbf{q} - h_m \mathbf{e}_m)}{h_m h_l},
\tag{D.154b}
\]
where $\mathbf{e}_l$ is the $l$th standard basis vector and $h_l$ is some small perturbation in parameter $q_l$. After calculating $\mathbf{x}$, calculating $r(\mathbf{q})$ requires calculating
\[
\frac{1}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} d_{y_{j,k}}\big(g_{j,k}(\mathbf{q},\mathbf{x})\big),
\tag{D.155}
\]
which is equivalent in computational complexity to calculating
\[
d_{y_{j,k}}\big(g_{j,k}(\mathbf{q},\mathbf{x})\big) \quad \text{for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\tag{D.156}
\]
which requires
\[
O(1) \circ O(n_g) \times n_y n_t = O(n_g n_y n_t)
\tag{D.157}
\]
operations to calculate, as stipulated in Equations (D.3) and (D.4). Thus, from complexity counts (D.111) and (D.157), calculating $r(\mathbf{q})$ requires
\[
O(n_f n_x n_\Delta) + O(n_g n_y n_t) = O(n_f n_x n_\Delta + n_g n_y n_t)
\tag{D.158}
\]
operations with an explicit numerical solution method, and, from complexity counts (D.147) and (D.157), calculating $r(\mathbf{q})$ requires
\[
O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M\big) + O(n_g n_y n_t)
= O\big(n_N n_x n_\Delta (n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big)
\tag{D.159}
\]
operations with an implicit numerical solution method.
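The stencil (D.154b) is a standard four-point mixed difference; a minimal sketch, with a quick check against a quadratic whose mixed derivative is known:

\begin{verbatim}
import numpy as np

def fd_mixed_second(r, q, h, l, m):
    """Mixed-difference stencil (D.154b) for d^2 r / dq_m dq_l."""
    el = np.zeros(q.size)
    el[l] = h[l]
    em = np.zeros(q.size)
    em[m] = h[m]
    return (r(q + el) - r(q) - r(q + el - em) + r(q - em)) / (h[m] * h[l])

# Quick check on r(q) = q0**2 + q0*q1 + 2*q1**2, whose mixed
# derivative is 1:
#   r = lambda q: q[0]**2 + q[0]*q[1] + 2*q[1]**2
#   fd_mixed_second(r, np.zeros(2), np.array([1e-4, 1e-4]), 0, 1)
# returns approximately 1.0.
\end{verbatim}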
, nq} is equivalent in computational complexity to calculatingr(q + hlel) for all l ∈ {1, 2, . . . , nq},r(q),r(q + hlel − hmem) for all (l,m) ∈ {1, 2, . . . , nq}2 \ {(l,m) : l = m}r(q− hmem) for all m ∈ {1, 2, . . . , nq}, (D.160)which, from complexity count (D.158), for an explicit numerical solution method, respectively,177D.2. Computational Complexities of Numerical-Integration-Based MethodsrequireO(nfnxn∆ + ngnynt)× nq = O(nq(nfnxn∆ + ngnynt)),O(nfnxn∆ + ngnynt),O(nfnxn∆ + ngnynt)× (nqnq − nq) = O(nqnq(nfnxn∆ + ngnynt)),O(nfnxn∆ + ngnynt)× nq = O(nq(nfnxn∆ + ngnynt)), (D.161)operations to calculate, and, from complexity count (D.159), for an implicit numerical solutionmethod, respectively, requireO(nNnxn∆(nf + nf1nxnδ) + nNnM + ngnynt)× nq =O(nNnxn∆nq(nf + nf1nxnδ) + nNnMnq + ngnyntnq),O(nNnxn∆(nf + nf1nxnδ) + nNnM + ngnynt),O(nNnxn∆(nf + nf1nxnδ) + nNnM + ngnynt)× (nqnq − nq) =O(nNnxn∆nqnq(nf + nf1nxnδ) + nNnMnqnq + ngnyntnqnq),O(nNnxn∆(nf + nf1nxnδ) + nNnM + ngnynt)× nq =O(nNnxn∆nq(nf + nf1nxnδ) + nNnMnq + ngnyntnq)(D.162)operations to calculate. Thus, for an explicit numerical solution method, approximating∂r(q)/∂ql and ∂2r(q)/∂qm∂ql by finite difference for all l ∈ {1, 2, . . . , nq} and m ∈ {1, 2, . . . , nq}requiresO(nq(nfnxn∆ + ngnynt))+O(nfnxn∆ + ngnynt)+O(nqnq(nfnxn∆ + ngnynt))+O(nq(nfnxn∆ + ngnynt))=O(nqnq(nfnxn∆ + ngnynt))(D.163)operations, and for an implicit numerical solution method, approximating ∂r(q)/∂ql and∂2r(q)/∂qm∂ql by finite difference for all l ∈ {1, 2, . . . , nq} and m ∈ {1, 2, . . . , nq} requiresO(nNnxn∆nq(nf + nf1nxnδ) + nNnMnq + ngnyntnq)+O(nNnxn∆(nf + nf1nxnδ) + nNnM + ngnynt)+O(nNnxn∆nqnq(nf + nf1nxnδ) + nNnMnqnq + ngnyntnqnq)+O(nNnxn∆nq(nf + nf1nxnδ) + nNnMnq + ngnyntnq)=O(nNnxn∆nqnq(nf + nf1nxnδ) + nNnMnqnq + ngnyntnqnq)(D.164)operations.In each iteration of r(q) minimization by Newton’s method, updating parameter valuesrequires solving an nq × nq matrix equation, which requires O(n3q) operations using Gaussian178D.2. Computational Complexities of Numerical-Integration-Based Methodselimination. Thus, for an explicit numerical solution method, approximating first order andsecond order partial derivatives of r(q) with respect to all parameters by finite difference,updating parameter values, and updating numerical solution values requiresO(nqnq(nfnxn∆ + ngnynt))+O(n3q) +O(nfnxn∆) =O(nqnq(nfnxn∆ + ngnynt))(D.165)operations, as nq < nxn∆, the number of parameters is less than the number of state values,with partial derivative approximation complexity count from (D.163) and numerical solutioncomplexity count from (D.111); and for an implicit numerical solution method, approximatingfirst order and second order partial derivatives of r(q) with respect to all parameters by finitedifference, updating parameter values, and updating numerical solution values requiresO(nNnxn∆nqnq(nf + nf1nxnδ) + nNnMnqnq + ngnyntnqnq)+O(n3q)+O(nNnxn∆(nf + nf1nxnδ) + nNnM)=O(nNnxn∆nqnq(nf + nf1nxnδ) + nNnMnqnq + ngnyntnqnq)(D.166)operations, as nq < nxn∆, the number of parameters is less than the number of state values,with partial derivative approximation complexity count from (D.164) and numerical solutioncomplexity count from (D.147). 
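The finite-difference formulas (D.154a) and (D.154b) are simple to realize directly; the following minimal Python sketch (illustrative only, with a single step size h standing in for the parameter-specific perturbations $h_l$) makes explicit where the $O(n_q^2)$ objective evaluations of counts (D.163) and (D.164) come from.

```python
import numpy as np

def fd_gradient_hessian(r, q, h=1e-6):
    """Approximate the gradient and Hessian of a scalar objective r(q) by
    the one-sided differences (D.154a) and (D.154b): one evaluation at q,
    n_q at q + h e_l, n_q at q - h e_m, and n_q(n_q - 1) at the cross
    points q + h e_l - h e_m."""
    nq = len(q)
    r0 = r(q)
    E = np.eye(nq)
    r_plus = np.array([r(q + h * e) for e in E])
    r_minus = np.array([r(q - h * e) for e in E])
    grad = (r_plus - r0) / h
    hess = np.empty((nq, nq))
    for l in range(nq):
        for m in range(nq):
            # r(q + h e_l - h e_m) reduces to r(q) when l == m
            r_cross = r0 if l == m else r(q + h * E[l] - h * E[m])
            hess[m, l] = (r_plus[l] - r0 - r_cross + r_minus[m]) / h**2
    return grad, hess
```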
Therefore, from complexity count (D.165), for an explicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method with partial derivative approximation by finite difference requires
\[
O\big(n_q n_q(n_f n_x n_\Delta + n_g n_y n_t)\big) \tag{D.167}
\]
operations, and from complexity count (D.166), for an implicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method with partial derivative approximation by finite difference requires
\[
O\big(n_N n_x n_\Delta n_q n_q(n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q n_q + n_g n_y n_t n_q n_q\big) \tag{D.168}
\]
operations.

D.2.3 Counting the Computational Complexity of Gradient-Based Methods to Minimize r(q)

In methods such as the Gauss-Newton method, the Levenberg-Marquardt method, and quasi-Newton methods, rather than generating the Hessian matrix by calculating second order partial derivative values, $\partial^2 r(\mathbf{q})/\partial q_m \partial q_l$ for all $l \in \{1,2,\dots,n_q\}$ and $m \in \{1,2,\dots,n_q\}$, as in Newton's method, the Hessian matrix is approximated using first order partial derivative values, $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1,2,\dots,n_q\}$. Thus, an iteration of a method that approximates the Hessian using $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1,2,\dots,n_q\}$ requires at least as many operations as the number of operations required to calculate $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1,2,\dots,n_q\}$.

Theorem 10. For an explicit numerical solution method, calculating first order partial derivatives of $r(\mathbf{q})$ with respect to all parameters requires
\[
O\big(n_f n_x n_\Delta + n_y n_t n_{g_1}(n_x + n_q)\big) + O\big({\geq}\, n_x n_\Delta(n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) \tag{D.169}
\]
operations, and, for an implicit numerical solution method, calculating first order partial derivatives of $r(\mathbf{q})$ with respect to all parameters requires
\[
O\big(n_N n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_M(n_q + n_N) + n_y n_t n_{g_1}(n_x + n_q)\big) + O\big({\geq}\, n_x n_\Delta(n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) \tag{D.170}
\]
operations, with $O(n_M)$ operations in solving matrix equations for each of $O(n_N)$ iterations of Newton's method applied to $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$, for all $i \in \{1,2,\dots,n_x\}$ and $k \in I_\Delta$.

Alternatively, for an explicit numerical solution method, approximating first order partial derivatives of $r(\mathbf{q})$ with respect to all parameters by finite difference requires
\[
O\big(n_q(n_f n_x n_\Delta + n_g n_y n_t)\big) \tag{D.171}
\]
operations, and, for an implicit numerical solution method, approximating first order partial derivatives of $r(\mathbf{q})$ with respect to all parameters by finite difference requires
\[
O\big(n_N n_x n_\Delta n_q(n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q + n_g n_y n_t n_q\big) \tag{D.172}
\]
operations.

Proof. For an explicit numerical solution method, $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ is an explicit system of equations in $\mathbf{x}$, and solving $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ simply requires evaluating $f_{i,k}(t,\mathbf{q},\mathbf{x})$. Thus, to determine $\mathbf{x}$, solving
\[
f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0 \text{ for all } (i,k) \in \{1,\dots,n_x\} \times I_\Delta \tag{D.173}
\]
requires
\[
O(n_f) \times n_x n_\Delta = O(n_f n_x n_\Delta) \tag{D.174}
\]
operations, as stipulated in equation (D.5).

I calculate first order partial derivatives of $r(\mathbf{q})$ with respect to all parameters,
\[
\frac{\partial r(\mathbf{q})}{\partial q_l} = \frac{1}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} \frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \left( \sum_{m=1}^{n_x} \frac{\partial g_{j,k}}{\partial x_{m,k}} \frac{\partial x_{m,k}}{\partial q_l} + \frac{\partial g_{j,k}}{\partial q_l} \right). \tag{D.175}
\]
Partial derivatives of state values with respect to parameters are generally calculated by numerically solving the sensitivity equations, which are generated by applying the chain rule to the differential equation system.

In the case of an initial value problem, $dx_i/dt = F_i(t,\mathbf{q},x_1,\dots,x_{n_x})$ for $i \in \{1,2,\dots,n_x\}$, applying the chain rule to $F_i(t,\mathbf{q},x_1,\dots,x_{n_x})$ generates the sensitivity equations,
\[
\frac{d}{dt}\left(\frac{\partial x_i}{\partial q_l}\right) = \frac{\partial}{\partial q_l}\left(\frac{dx_i}{dt}\right) = \sum_{j=1}^{n_x} \frac{\partial F_i}{\partial x_j} \frac{\partial x_j}{\partial q_l} + \frac{\partial F_i}{\partial q_l}, \tag{D.176}
\]
a system of differential equations in $\partial x_i/\partial q_l$ for $i \in \{1,2,\dots,n_x\}$.
From Equations (D.176), using the forward Euler method, the simplest explicit numerical method for initial value problems, I can calculate $\partial x_{i,m}/\partial q_l$ for $i \in \{1,2,\dots,n_x\}$ and $m \in I_\Delta$ such that
\[
\frac{\partial x_{i,m+1}}{\partial q_l} = \frac{\partial x_{i,m}}{\partial q_l} + (t_{m+1} - t_m) \left( \sum_{j=1}^{n_x} \frac{\partial F_{i,m}}{\partial x_{j,m}} \frac{\partial x_{j,m}}{\partial q_l} + \frac{\partial F_{i,m}}{\partial q_l} \right), \tag{D.177}
\]
where $F_{i,m} = F_i(t_m,\mathbf{q},x_{1,m},\dots,x_{n_x,m})$. Solving system (D.177) for $i \in \{1,2,\dots,n_x\}$, $l \in \{1,2,\dots,n_q\}$, and $m \in I_\Delta$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial F_{i,m}}{\partial x_{j,m}} \text{ for all } (i,j,m) \in \{1,\dots,n_x\} \times \{1,\dots,n_x\} \times I_\Delta, \\
&\frac{\partial F_{i,m}}{\partial q_l} \text{ for all } (i,l,m) \in \{1,\dots,n_x\} \times \{1,\dots,n_q\} \times I_\Delta,
\end{aligned}
\tag{D.178}
\]
which, respectively, require
\[
\begin{aligned}
&O(n_{F_1}) \times n_x n_x n_\Delta = O(n_{F_1} n_x n_x n_\Delta), \\
&O(n_{F_1}) \times n_x n_q n_\Delta = O(n_{F_1} n_x n_q n_\Delta)
\end{aligned}
\tag{D.179}
\]
operations to calculate, as stipulated in equation (D.7). Apart from calculating partial derivative values, solving system (D.177) for all $i \in \{1,2,\dots,n_x\}$, $l \in \{1,2,\dots,n_q\}$, and $m \in I_\Delta$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial F_{i,m}}{\partial x_{j,m}} \frac{\partial x_{j,m}}{\partial q_l} \text{ for all } (i,j,l,m) \in \{1,\dots,n_x\} \times \{1,\dots,n_x\} \times \{1,\dots,n_q\} \times I_\Delta, \\
&\frac{\partial F_{i,m}}{\partial q_l} \text{ for all } (i,l,m) \in \{1,\dots,n_x\} \times \{1,\dots,n_q\} \times I_\Delta,
\end{aligned}
\tag{D.180}
\]
which, respectively, require
\[
\begin{aligned}
&O(1) \circ O(I_0) \times n_x n_x n_q n_\Delta = O(n_x n_x n_q n_\Delta), \\
&O(I_0) \times n_x n_q n_\Delta = O(n_x n_q n_\Delta)
\end{aligned}
\tag{D.181}
\]
operations to calculate, where $I_0$ indicates values that have been calculated previously and $O(I_0) = 1$. Therefore, from complexity counts (D.179) and (D.181), in total, solving systems (D.177) for all $i \in \{1,2,\dots,n_x\}$, $l \in \{1,2,\dots,n_q\}$, and $m \in I_\Delta$, the first order sensitivity equations for an initial value problem using the forward Euler method, requires
\[
O(n_{F_1} n_x n_x n_\Delta) + O(n_{F_1} n_x n_q n_\Delta) + O(n_x n_x n_q n_\Delta) + O(n_x n_q n_\Delta) = O\big(n_x n_\Delta(n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) \tag{D.182}
\]
operations. The forward Euler method applied to an initial value problem is the computationally least expensive explicit numerical solution method. Thus, in general, solving the first order sensitivity equations with an explicit numerical solution method requires
\[
O\big({\geq}\, n_x n_\Delta(n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) \tag{D.183}
\]
operations.

After calculating $\mathbf{x}$ and solving the first order sensitivity equations, calculating $\partial r(\mathbf{q})/\partial q_l$, of equation (D.175), for all $l \in \{1,2,\dots,n_q\}$ requires calculating the partial derivative values
\[
\begin{aligned}
&\frac{\partial g_{j,k}}{\partial x_{m,k}} \text{ for all } (j,k,m) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_x\}, \\
&\frac{\partial g_{j,k}}{\partial q_l} \text{ for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\}, \\
&\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \text{ for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\end{aligned}
\tag{D.184}
\]
which, respectively, require
\[
\begin{aligned}
&O(n_{g_1}) \times n_y n_t n_x = O(n_{g_1} n_y n_t n_x), \\
&O(n_{g_1}) \times n_y n_t n_q = O(n_{g_1} n_y n_t n_q), \\
&O(1) \circ O(I_0) \times n_y n_t = O(n_y n_t)
\end{aligned}
\tag{D.185}
\]
operations to calculate, as stipulated in Equations (D.3) and (D.4). After calculating $\mathbf{x}$, solving the first order sensitivity equations, and calculating partial derivative values, calculating $\partial r(\mathbf{q})/\partial q_l$, of equation (D.175), for all $l \in \{1,2,\dots,n_q\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&\frac{\partial g_{j,k}}{\partial x_{m,k}} \frac{\partial x_{m,k}}{\partial q_l} \text{ for all } (j,k,m,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_x\} \times \{1,\dots,n_q\}, \\
&\frac{\partial g_{j,k}}{\partial q_l} \text{ for all } (j,k,l) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\} \times \{1,\dots,n_q\}, \\
&\frac{\partial d_{y_{j,k}}}{\partial g_{j,k}} \cdot I_0 \text{ for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\},
\end{aligned}
\tag{D.186}
\]
which, respectively, require
\[
\begin{aligned}
&O(1) \circ O(I_0) \times n_y n_t n_x n_q = O(n_y n_t n_x n_q), \\
&O(I_0) \times n_y n_t n_q = O(n_y n_t n_q), \\
&O(1) \circ O(I_0) \times n_y n_t = O(n_y n_t)
\end{aligned}
\tag{D.187}
\]
operations to calculate. Thus, from complexity counts (D.185) and (D.187), after calculating $\mathbf{x}$ and solving the first order sensitivity equations, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1,2,\dots,n_q\}$ requires
\[
O(n_{g_1} n_y n_t n_x) + O(n_{g_1} n_y n_t n_q) + O(n_y n_t) + O(n_y n_t n_x n_q) + O(n_y n_t n_q) + O(n_y n_t) = O\big(n_y n_t(n_{g_1} n_x + n_{g_1} n_q + n_x n_q)\big) \tag{D.188}
\]
operations. Therefore, from complexity counts (D.174), (D.183), and (D.188), in total, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1,2,\dots,n_q\}$ requires
\[
\begin{aligned}
&O(n_f n_x n_\Delta) + O\big({\geq}\, n_x n_\Delta(n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O\big(n_y n_t(n_{g_1} n_x + n_{g_1} n_q + n_x n_q)\big) \\
&\quad = O\big(n_f n_x n_\Delta + n_y n_t n_{g_1}(n_x + n_q)\big) + O\big({\geq}\, n_x n_\Delta(n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big)
\end{aligned}
\tag{D.189}
\]
operations with an explicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$.

For implicit numerical solution methods, $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ is an implicit system of equations in $\mathbf{x}$ and the discretized sensitivity equations are implicit systems of equations in the discretized sensitivity values, $\partial x_{i,m}/\partial q_l$. Generally, $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ is a nonlinear system of equations that is solved numerically using Newton's method. Each iteration of Newton's method to solve $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ requires calculating the values of $f_{i,k}(t,\mathbf{q},\mathbf{x})$ and the values of first order partial derivatives of $f_{i,k}(t,\mathbf{q},\mathbf{x})$ with respect to $x_{l,m}$, for all $i \in \{1,2,\dots,n_x\}$, $k \in I_\Delta$, $l \in \{1,2,\dots,n_x\}$, and $m \in I_\Delta$. Calculating
\[
f_{i,k}(t,\mathbf{q},\mathbf{x}) \text{ for all } (i,k) \in \{1,2,\dots,n_x\} \times I_\Delta \tag{D.190}
\]
requires
\[
O(n_f) \times n_x n_\Delta = O(n_f n_x n_\Delta) \tag{D.191}
\]
operations, as stipulated in equation (D.5). Each $f_{i,k}(t,\mathbf{q},\mathbf{x})$ depends on $x_{l,m}$ for only a small fraction of $k$ in $I_\Delta$, at $k$ in $I_{\Delta_m} \subset I_\Delta$. Calculating
\[
\frac{\partial f_{i,k}(t,\mathbf{q},\mathbf{x})}{\partial x_{l,m}} \text{ for all } (i,k,l,m) \in \{1,\dots,n_x\} \times I_{\Delta_m} \times \{1,\dots,n_x\} \times I_\Delta \tag{D.192}
\]
requires
\[
O(n_{f_1}) \times n_x n_\delta n_x n_\Delta = O(n_{f_1} n_x n_\delta n_x n_\Delta) \tag{D.193}
\]
operations, as stipulated in equation (D.5). Additionally, each iteration of Newton's method to solve $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ requires solving matrix equations, requiring a total of $O(n_M)$ operations. Thus, in conjunction with complexity counts (D.191) and (D.193), each iteration of Newton's method to solve $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ requires
\[
O(n_x n_\Delta n_f) + O(n_{f_1} n_x n_\delta n_x n_\Delta) + O(n_M) = O\big(n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_M\big) \tag{D.194}
\]
operations. Solving $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ requires $O(n_N)$ iterations of Newton's method. Thus, for an implicit numerical solution method, numerically solving $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ to determine $\mathbf{x}$ requires
\[
O\big(n_N n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_N n_M\big) \tag{D.195}
\]
operations.

The sensitivity equations are generated by applying the chain rule to the differential equation system, and are thus linear in the sensitivity values, $\partial x_i/\partial q_l$. As such, discretized sensitivity equations are generally linear in the discretized sensitivity values, $\partial x_{i,m}/\partial q_l$. Thus, beyond the calculations required to solve the discretized sensitivity equations with an explicit numerical solution method, solving the discretized sensitivity equations with an implicit numerical solution method requires solving matrix equations. First order discretized sensitivity equations with respect to $q_l$ are identical in size to $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$. Thus, calculating matrix equations in solving first order discretized sensitivity equations with respect to $q_l$ requires a total of $O(n_M)$ operations. As such, calculating matrix equations in solving first order discretized sensitivity equations with respect to $q_l$ for all $l \in \{1,2,\dots,n_q\}$ requires a total of $O(n_q n_M)$ operations. Therefore, in general, in conjunction with complexity count (D.183), solving the first order sensitivity equations with an implicit numerical solution method requires
\[
O\big({\geq}\, n_x n_\Delta(n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) + O(n_q n_M) \tag{D.196}
\]
operations.
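As an illustration of the forward Euler sensitivity recursion (D.177) and its cost (D.182), the following minimal Python sketch integrates the state and its first order sensitivities together. It is not code from this dissertation (the computations here were done in MATLAB and C++); the callables F, dF_dx, and dF_dq are hypothetical stand-ins for $F_i$, $\partial F_i/\partial x_j$, and $\partial F_i/\partial q_l$.

```python
import numpy as np

def forward_euler_with_sensitivities(F, dF_dx, dF_dq, x0, t, q):
    """Integrate x' = F(t, q, x) by forward Euler while solving the first
    order sensitivity system (D.177) for S[m, i, l] = dx_{i,m} / dq_l.
    Per step: O(n_F1 n_x (n_x + n_q)) partial-derivative evaluations plus
    O(n_x n_x n_q) products, matching count (D.182)."""
    n_x, n_q, n_dt = len(x0), len(q), len(t)
    x = np.zeros((n_dt, n_x))
    # Sensitivities start at zero here, assuming x0 does not depend on q;
    # if initial values are fitted as parameters, seed the matching entries.
    S = np.zeros((n_dt, n_x, n_q))
    x[0] = x0
    for m in range(n_dt - 1):
        h = t[m + 1] - t[m]
        x[m + 1] = x[m] + h * F(t[m], q, x[m])
        Jx = dF_dx(t[m], q, x[m])      # n_x-by-n_x, dF_i/dx_j
        Jq = dF_dq(t[m], q, x[m])      # n_x-by-n_q, dF_i/dq_l
        S[m + 1] = S[m] + h * (Jx @ S[m] + Jq)
    return x, S
```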
From complexity count (D.188), after calculating $\mathbf{x}$ and solving the first order sensitivity equations, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1,2,\dots,n_q\}$ requires $O(n_y n_t(n_{g_1} n_x + n_{g_1} n_q + n_x n_q))$ operations. Therefore, from complexity counts (D.195), (D.196), and (D.188), in total, calculating $\partial r(\mathbf{q})/\partial q_l$ for all $l \in \{1,2,\dots,n_q\}$ requires
\[
\begin{aligned}
&O\big(n_N n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_N n_M\big) + O\big({\geq}\, n_x n_\Delta(n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big) \\
&\quad + O(n_q n_M) + O\big(n_y n_t(n_{g_1} n_x + n_{g_1} n_q + n_x n_q)\big) \\
&= O\big(n_N n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_M(n_q + n_N) + n_y n_t n_{g_1}(n_x + n_q)\big) \\
&\quad + O\big({\geq}\, n_x n_\Delta(n_{F_1} n_x + n_{F_1} n_q + n_x n_q)\big)
\end{aligned}
\tag{D.197}
\]
operations with an implicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$.

Alternatively, I can approximate partial derivatives of $r(\mathbf{q})$ with respect to parameters by finite differences, rather than by solving the sensitivity equations. Most simply,
\[
\frac{\partial r(\mathbf{q})}{\partial q_l} \approx \frac{r(\mathbf{q} + h_l \mathbf{e}_l) - r(\mathbf{q})}{h_l}, \tag{D.198}
\]
where $\mathbf{e}_l$ is the $l$th standard basis vector and $h_l$ is some small perturbation in parameter $q_l$. After calculating $\mathbf{x}$, calculating $r(\mathbf{q})$ requires calculating
\[
\frac{1}{n_y} \sum_{j=1}^{n_y} \sum_{k=1}^{n_t} d_{y_{j,k}}\big(g_{j,k}(\mathbf{q},\mathbf{x})\big), \tag{D.199}
\]
which is equivalent in computational complexity to calculating
\[
d_{y_{j,k}}\big(g_{j,k}(\mathbf{q},\mathbf{x})\big) \text{ for all } (j,k) \in \{1,\dots,n_y\} \times \{1,\dots,n_t\}, \tag{D.200}
\]
which requires
\[
O(1) \circ O(n_g) \times n_y n_t = O(n_g n_y n_t) \tag{D.201}
\]
operations to calculate, as stipulated in Equations (D.3) and (D.4). Thus, from complexity counts (D.174) and (D.201), calculating $r(\mathbf{q})$ requires
\[
O(n_f n_x n_\Delta) + O(n_g n_y n_t) = O(n_f n_x n_\Delta + n_g n_y n_t) \tag{D.202}
\]
operations with an explicit numerical solution method, and, from complexity counts (D.195) and (D.201), calculating $r(\mathbf{q})$ requires
\[
O\big(n_N n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_N n_M\big) + O(n_g n_y n_t) = O\big(n_N n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big) \tag{D.203}
\]
operations with an implicit numerical solution method.

Approximating $\partial r(\mathbf{q})/\partial q_l$, as in equation (D.198), for all $l \in \{1,2,\dots,n_q\}$ is equivalent in computational complexity to calculating
\[
\begin{aligned}
&r(\mathbf{q} + h_l \mathbf{e}_l) \text{ for all } l \in \{1,2,\dots,n_q\}, \\
&r(\mathbf{q}),
\end{aligned}
\tag{D.204}
\]
which, from complexity count (D.202), for an explicit numerical solution method, respectively, require
\[
\begin{aligned}
&O(n_f n_x n_\Delta + n_g n_y n_t) \times n_q = O\big(n_q(n_f n_x n_\Delta + n_g n_y n_t)\big), \\
&O(n_f n_x n_\Delta + n_g n_y n_t)
\end{aligned}
\tag{D.205}
\]
operations to calculate, and, from complexity count (D.203), for an implicit numerical solution method, respectively, require
\[
\begin{aligned}
&O\big(n_N n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big) \times n_q \\
&\quad = O\big(n_N n_x n_\Delta n_q(n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q + n_g n_y n_t n_q\big), \\
&O\big(n_N n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big)
\end{aligned}
\tag{D.206}
\]
operations to calculate. Thus, for an explicit numerical solution method, approximating $\partial r(\mathbf{q})/\partial q_l$ by finite difference for all $l \in \{1,2,\dots,n_q\}$ requires
\[
O\big(n_q(n_f n_x n_\Delta + n_g n_y n_t)\big) + O(n_f n_x n_\Delta + n_g n_y n_t) = O\big(n_q(n_f n_x n_\Delta + n_g n_y n_t)\big) \tag{D.207}
\]
operations, and for an implicit numerical solution method, approximating $\partial r(\mathbf{q})/\partial q_l$ by finite difference for all $l \in \{1,2,\dots,n_q\}$ requires
\[
\begin{aligned}
&O\big(n_N n_x n_\Delta n_q(n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q + n_g n_y n_t n_q\big) + O\big(n_N n_x n_\Delta(n_f + n_{f_1} n_x n_\delta) + n_N n_M + n_g n_y n_t\big) \\
&\quad = O\big(n_N n_x n_\Delta n_q(n_f + n_{f_1} n_x n_\delta) + n_N n_M n_q + n_g n_y n_t n_q\big)
\end{aligned}
\tag{D.208}
\]
operations.

D.3 Comparison of Computational Complexities

D.3.1 Complexity Assumptions for Comparison

In Theorems 7, 8, 9, and 10, I calculate computational complexities with general $n_g$, $n_{g_1}$, $n_{g_2}$, $n_p$, $n_F$, $n_{F_1}$, $n_{F_2}$, $n_f$, $n_{f_1}$, $n_{f_2}$, $n_\delta$, $n_\sigma$, $n_N$, and $n_M$. To compare computational complexities, I consider more specific $n_g$, $n_{g_1}$, $n_{g_2}$, $n_p$, $n_F$, $n_{F_1}$, $n_{F_2}$, $n_f$, $n_{f_1}$, $n_{f_2}$, $n_\delta$, $n_\sigma$, $n_N$, and $n_M$. Often, observable-state functions, $g_{j,k}(\mathbf{p},\mathbf{x})$, are linear combinations of state values. Thus, I consider
\[
O(n_g) = O(n_x), \quad O(n_{g_1}) = O(1), \quad O(n_{g_2}) = O(0). \tag{D.209}
\]
Often, a discretized differential equation, $F_{i,k}(t,\mathbf{p},\mathbf{x})$, is a sum of nonlinear parameter and discretized state-value combinations, with $O(n_x)$ nonlinear combinations of $O(n_x)$ parameters in interconnected systems, for each $i \in \{1,2,\dots,n_x\}$. Thus, I consider
\[
O(n_p) = O(n_x^2), \quad O(n_F) = O(n_x), \quad O(n_{F_1}) = O(1), \quad O(n_{F_2}) = O(1). \tag{D.210}
\]
Generally, calculating numerical discretizations, $f_{i,k}(t,\mathbf{p},\mathbf{x})$, requires a similar number of operations as calculating discretized differential equation values, $F_{i,k}(t,\mathbf{p},\mathbf{x})$. Thus, I consider
\[
O(n_f) = O(n_x), \quad O(n_{f_1}) = O(1), \quad O(n_{f_2}) = O(1). \tag{D.211}
\]
Generally, the number of elements in $I_{\Delta_m}$ is similar to the order of the differential equation, ordinarily of $O(1)$. Therefore, I consider
\[
O(n_\delta) = O(1). \tag{D.212}
\]
I consider $O(1)$ line-search test points in each iteration of descent and $O(1)$ iterations of Newton's method to solve $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ for all $i \in \{1,2,\dots,n_x\}$ and $k \in I_\Delta$. Thus,
\[
O(n_\sigma) = O(1), \quad O(n_N) = O(1). \tag{D.213}
\]
Often, initial conditions and boundary values are not fixed, and are fitted along with model parameters to data. Thus, for $n_i$ initial points in $I_\Delta$ and $n_b$ boundary points in $I_\Delta$, I consider
\[
O(n_q) = O(n_p + n_x n_i + n_x n_b). \tag{D.214}
\]
For clarity, I note that $n_i = 1$ and $n_b = 0$ for initial value ordinary differential equations and $n_i = 0$ and $n_b = 2$ for boundary value ordinary differential equations.

Calculating $\mathbf{x}^{n+1}$, the $n_x n_\Delta$ state values, $x_{l,m}$ for all $l \in \{1,2,\dots,n_x\}$ and $m \in I_\Delta$, in the $(n+1)$th iteration of Newton's method applied to solving the system $\mathbf{f}(\mathbf{x}) = 0$, $f_{i,k}(t,\mathbf{q},\mathbf{x}) = 0$ for all $i \in \{1,2,\dots,n_x\}$ and $k \in I_\Delta$, requires calculating
\[
J(\mathbf{x}^n)(\mathbf{x}^{n+1} - \mathbf{x}^n) = -\mathbf{f}(\mathbf{x}^n), \tag{D.215}
\]
where $J(\mathbf{x})$ is the $n_x n_\Delta \times n_x n_\Delta$ Jacobian matrix of $\mathbf{f}(\mathbf{x})$. For locally implicit numerical solution methods, such as the backward Euler method for ordinary differential equations, matrix equation (D.215) is separable into $n_\Delta$ submatrix equations, each of size $n_x \times n_x$. For interconnected models, generally, $\partial f_{i,k}(t,\mathbf{q},\mathbf{x})/\partial x_{l,m} \neq 0$ if $m = k$. Thus, for interconnected models, each $n_x \times n_x$ submatrix of $J(\mathbf{x}^n)$ is a full matrix, and solving each of the $n_\Delta$ full submatrix equations requires $O(n_x^3)$ operations using Gaussian elimination. For globally implicit numerical solution methods, matrix equation (D.215) is not separable into smaller matrix equations. $J(\mathbf{x}^n)$ is large and sparse, though, so solving matrix equation (D.215) with an iterative method, such as the generalized minimal residual method, is computationally more efficient than solving matrix equation (D.215) with a direct method, such as Gaussian elimination. $f_{i,k}(t,\mathbf{q},\mathbf{x})$ depends on $x_{l,m}$ for only a small fraction of $k$ in $I_\Delta$, at $k$ in $I_{\Delta_m} \subset I_\Delta$. Thus, for interconnected models, $f_{i,k}(t,\mathbf{q},\mathbf{x})$ depends on $O(n_\delta n_x)$ values of $x_{l,m}$, and $J(\mathbf{x}^n)$ contains $O(n_\Delta n_x \times n_\delta n_x) = O(n_\Delta n_x^2 n_\delta)$ nonzero elements. In each iteration of the generalized minimal residual method, multiplication by $J(\mathbf{x}^n)$ requires $O(n_\Delta n_x^2 n_\delta)$ operations. The generalized minimal residual method may require up to $O(n_\Delta n_x)$ iterations to exactly solve matrix equation (D.215), but will generally converge in significantly fewer iterations. Therefore, for all implicit numerical solution methods, I consider
\[
O(n_M) = O({\geq}\, n_\Delta n_x^3). \tag{D.216}
\]

D.3.2 Comparison of Computational Complexities with Assumptions

Computational Complexity of r(p,x;λ) Descent with Assumptions

From assumptions (D.209), (D.210), (D.211), (D.212), and (D.213) and Theorem 7, an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent requires
\[
O\big(n_\Delta(n_x n_y + n_x n_x)\big) + O\big(n_\Delta n_p(n_y + n_x)\big) + O\big(n_\Delta n_x(n_y + n_x)\big) = O(n_\Delta n_p n_x) = O(n_\Delta n_x^3) \tag{D.217}
\]
operations, as $n_y \leq n_x \leq n_p$.

Computational Complexity of r(q) Descent with Assumptions

From assumptions (D.209), (D.210), (D.211), (D.212), (D.213), (D.214), and (D.216) and Theorem 8, an iteration of $r(\mathbf{q})$ descent requires
\[
\begin{aligned}
&O\big(n_x n_x n_\Delta + n_x n_y n_t + n_y n_t(n_x + n_q)\big) + O\big({\geq}\, n_x n_\Delta(n_x + n_q + n_x n_x + n_x n_q + n_x n_x n_q)\big) \\
&\quad = O({\geq}\, n_\Delta n_q n_x^3) = O\big({\geq}\, n_\Delta(n_x + n_i + n_b) n_x^4\big)
\end{aligned}
\tag{D.218a}
\]
operations with an explicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$; an iteration of $r(\mathbf{q})$ descent requires
\[
\begin{aligned}
&O\big(n_x n_\Delta n_x + n_x n_y n_t\big) + O\big(n_M n_q + n_y n_t(n_x + n_q)\big) + O\big({\geq}\, n_x n_\Delta(n_x + n_q + n_x n_x + n_x n_q + n_x n_x n_q)\big) \\
&\quad = O({\geq}\, n_\Delta n_q n_x^3) + O(n_M n_q) = O\big({\geq}\, n_\Delta(n_x + n_i + n_b) n_x^4\big) + O\big({\geq}\, n_\Delta n_x^4(n_x + n_i + n_b)\big) \\
&\quad = O\big({\geq}\, n_\Delta(n_x + n_i + n_b) n_x^4\big)
\end{aligned}
\tag{D.218b}
\]
operations with an implicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$; an iteration of $r(\mathbf{q})$ descent requires
\[
O\big(n_q(n_x n_x n_\Delta + n_x n_y n_t)\big) = O(n_\Delta n_q n_x^2) = O\big(n_\Delta(n_x + n_i + n_b) n_x^3\big) \tag{D.218c}
\]
operations with an explicit numerical solution method and partial derivative approximation by finite difference, as $n_y \leq n_x$ and $n_t \leq n_\Delta$; and an iteration of $r(\mathbf{q})$ descent requires
\[
\begin{aligned}
&O\big(n_q(n_x n_\Delta n_x + n_M + n_x n_y n_t)\big) = O(n_\Delta n_q n_x^2) + O(n_M n_q) \\
&\quad = O\big(n_\Delta(n_x + n_i + n_b) n_x^3\big) + O\big({\geq}\, n_\Delta n_x^4(n_x + n_i + n_b)\big) = O\big({\geq}\, n_\Delta(n_x + n_i + n_b) n_x^4\big)
\end{aligned}
\tag{D.218d}
\]
operations with an implicit numerical solution method and partial derivative approximation by finite difference, as $n_y \leq n_x$ and $n_t \leq n_\Delta$. Comparing computational counts (D.217) and (D.218), for an explicit numerical solution method, an iteration of $r(\mathbf{q})$ descent requires at least $n_x + n_i + n_b$ times as many operations as an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent, and, for an implicit numerical solution method, an iteration of $r(\mathbf{q})$ descent requires at least $(n_x + n_i + n_b) n_x$ times as many operations as an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent. For ODEs, where $n_i + n_b \leq 2$, $r(\mathbf{p},\mathbf{x};\lambda)$ descent is computationally more efficient than $r(\mathbf{q})$ descent, and the difference in computational efficiency increases with an increasing number of model states, markedly for implicit numerical solution methods. Generally, in PDE models of data, the number of initial points and/or the number of boundary points greatly exceeds the number of states, $n_x \ll n_i + n_b$. Thus, for PDEs, $r(\mathbf{p},\mathbf{x};\lambda)$ descent is computationally far more efficient than $r(\mathbf{q})$ descent, and the difference in computational efficiency grows with an increasing number of data points and/or model states, markedly for implicit numerical solution methods.

Computational Complexity of Newton's Method to Minimize r(q) with Assumptions

From assumptions (D.209), (D.210), (D.211), (D.212), (D.213), (D.214), and (D.216) and Theorem 9, an iteration of $r(\mathbf{q})$ minimization by Newton's method requires
\[
\begin{aligned}
&O\big(n_x n_x n_\Delta + n_y n_t(n_x + n_q)\big) + O\big({\geq}\, n_x n_\Delta(n_x + n_q + n_x n_x + n_x n_q + n_q n_q + n_x n_x n_q + n_x n_q n_q)\big) \\
&\quad = O({\geq}\, n_\Delta n_q^2 n_x^2) = O\big({\geq}\, n_\Delta(n_x + n_i + n_b)^2 n_x^4\big)
\end{aligned}
\tag{D.219a}
\]
operations with an explicit numerical solution method, as $n_y \leq n_x \leq n_q$ and $n_t \leq n_\Delta$; an iteration of $r(\mathbf{q})$ minimization by Newton's method requires
\[
\begin{aligned}
&O\big(n_x n_\Delta n_x + n_M n_q n_q + n_y n_t(n_x + n_q)\big) + O\big({\geq}\, n_x n_\Delta(n_x + n_q + n_x n_x + n_x n_q + n_q n_q + n_x n_x n_q + n_x n_q n_q)\big) \\
&\quad = O({\geq}\, n_\Delta n_q^2 n_x^2) + O(n_M n_q^2) = O\big({\geq}\, n_\Delta(n_x + n_i + n_b)^2 n_x^4\big) + O\big({\geq}\, n_\Delta n_x^5(n_x + n_i + n_b)^2\big) \\
&\quad = O\big({\geq}\, n_\Delta(n_x + n_i + n_b)^2 n_x^5\big)
\end{aligned}
\tag{D.219b}
\]
operations with an implicit numerical solution method, as $n_y \leq n_x \leq n_q$ and $n_t \leq n_\Delta$; an iteration of $r(\mathbf{q})$ minimization by Newton's method requires
\[
O\big(n_q n_q(n_x n_x n_\Delta + n_x n_y n_t)\big) = O(n_\Delta n_q^2 n_x^2) = O\big(n_\Delta(n_x + n_i + n_b)^2 n_x^4\big) \tag{D.219c}
\]
operations with an explicit numerical solution method and partial derivative approximation by finite difference, as $n_y \leq n_x$ and $n_t \leq n_\Delta$; and an iteration of $r(\mathbf{q})$ minimization by Newton's method requires
\[
\begin{aligned}
&O(n_x n_\Delta n_q n_q n_x + n_M n_q n_q + n_x n_y n_t n_q n_q) = O(n_\Delta n_q^2 n_x^2) + O(n_M n_q^2) \\
&\quad = O\big(n_\Delta(n_x + n_i + n_b)^2 n_x^4\big) + O\big({\geq}\, n_\Delta n_x^5(n_x + n_i + n_b)^2\big) = O\big({\geq}\, n_\Delta(n_x + n_i + n_b)^2 n_x^5\big)
\end{aligned}
\tag{D.219d}
\]
operations with an implicit numerical solution method and partial derivative approximation by finite difference, as $n_y \leq n_x$ and $n_t \leq n_\Delta$. Comparing computational counts (D.217) and (D.219), for an explicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method requires at least $(n_x + n_i + n_b)^2 n_x$ times as many operations as an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent, and, for an implicit numerical solution method, an iteration of $r(\mathbf{q})$ minimization by Newton's method requires at least $(n_x + n_i + n_b)^2 n_x^2$ times as many operations as an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent. Newton's method generally converges quadratically. For ODEs, where $n_i + n_b \leq 2$, with few model states, the superior convergence of Newton's method may compensate for its relatively large computational burden, but, with an increasing number of model states, $r(\mathbf{p},\mathbf{x};\lambda)$ descent becomes increasingly more computationally efficient than $r(\mathbf{q})$ minimization by Newton's method, markedly for implicit numerical solution methods. For PDEs, where $n_x \ll n_i + n_b$, $r(\mathbf{p},\mathbf{x};\lambda)$ descent is computationally far more efficient than $r(\mathbf{q})$ minimization by Newton's method, and the difference in computational efficiency grows with an increasing number of data points and/or model states, markedly for implicit numerical solution methods.

Computational Complexity of Gradient-Based Methods to Minimize r(q) with Assumptions

To minimize $r(\mathbf{q})$, gradient-based methods, such as gradient descent, the Gauss-Newton method, the Levenberg-Marquardt method, and quasi-Newton methods, require calculating all first order partial derivatives of $r(\mathbf{q})$.
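For concreteness, a minimal sketch of the Hessian approximation that these methods share is given below, under the simplifying assumption that the objective is a plain sum of squared residuals; the dissertation's $r(\mathbf{q})$ uses general discrepancy functions $d_{y_{j,k}}$, so this is an illustration of the technique, not the method used here. Only first order information is required: the residual Jacobian, assembled for example from first order sensitivities.

```python
import numpy as np

def gauss_newton_step(residuals, jacobian, q, damping=0.0):
    """One Gauss-Newton step for r(q) = sum(rho_i(q)^2), or, with
    damping > 0, a Levenberg-Marquardt step. The Hessian is approximated
    by 2 J^T J from the residual Jacobian alone, so no second order
    sensitivities are needed; the factor 2 cancels in the update."""
    rho = residuals(q)               # length-n residual vector
    J = jacobian(q)                  # n-by-n_q matrix, d rho_i / d q_l
    A = J.T @ J + damping * np.eye(len(q))   # approximate (half) Hessian
    g = J.T @ rho                             # (half) gradient
    return q - np.linalg.solve(A, g)          # n_q-by-n_q solve, O(n_q^3)
```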
From assumptions (D.209), (D.210), (D.211), (D.212), (D.213), (D.214), and (D.216) and Theorem 10, calculating all first order partial derivatives of $r(\mathbf{q})$ requires
\[
O\big(n_x n_x n_\Delta + n_y n_t(n_x + n_q)\big) + O\big({\geq}\, n_x n_\Delta(n_x + n_q + n_x n_q)\big) = O({\geq}\, n_\Delta n_q n_x^2) = O\big({\geq}\, n_\Delta(n_x + n_i + n_b) n_x^3\big) \tag{D.220a}
\]
operations with an explicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$; calculating all first order partial derivatives of $r(\mathbf{q})$ requires
\[
\begin{aligned}
&O\big(n_x n_\Delta n_x + n_M n_q + n_y n_t(n_x + n_q)\big) + O\big({\geq}\, n_x n_\Delta(n_x + n_q + n_x n_q)\big) = O({\geq}\, n_\Delta n_q n_x^2) + O(n_M n_q) \\
&\quad = O\big({\geq}\, n_\Delta(n_x + n_i + n_b) n_x^3\big) + O\big({\geq}\, n_\Delta n_x^4(n_x + n_i + n_b)\big) = O\big({\geq}\, n_\Delta(n_x + n_i + n_b) n_x^4\big)
\end{aligned}
\tag{D.220b}
\]
operations with an implicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$; approximating all first order partial derivatives of $r(\mathbf{q})$ by finite difference requires
\[
O\big(n_q(n_x n_x n_\Delta + n_x n_y n_t)\big) = O(n_\Delta n_q n_x^2) = O\big(n_\Delta(n_x + n_i + n_b) n_x^3\big) \tag{D.220c}
\]
operations with an explicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$; and approximating all first order partial derivatives of $r(\mathbf{q})$ by finite difference requires
\[
\begin{aligned}
&O(n_x n_\Delta n_q n_x + n_M n_q + n_x n_y n_t n_q) = O(n_\Delta n_q n_x^2) + O(n_M n_q) \\
&\quad = O\big(n_\Delta(n_x + n_i + n_b) n_x^3\big) + O\big({\geq}\, n_\Delta n_x^4(n_x + n_i + n_b)\big) = O\big({\geq}\, n_\Delta(n_x + n_i + n_b) n_x^4\big)
\end{aligned}
\tag{D.220d}
\]
operations with an implicit numerical solution method, as $n_y \leq n_x$ and $n_t \leq n_\Delta$. Comparing computational counts (D.217) and (D.220), for an explicit numerical solution method, calculating all first order partial derivatives of $r(\mathbf{q})$ requires at least $n_x + n_i + n_b$ times as many operations as an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent, and, for an implicit numerical solution method, calculating all first order partial derivatives of $r(\mathbf{q})$ requires at least $(n_x + n_i + n_b) n_x$ times as many operations as an iteration of $r(\mathbf{p},\mathbf{x};\lambda)$ descent. In some cases, gradient-based methods that approximate the Hessian using first order partial derivative values, such as the Gauss-Newton method, the Levenberg-Marquardt method, and quasi-Newton methods, may converge somewhat faster than descent. In such cases, for ODEs, where $n_i + n_b \leq 2$, with few model states, the superior convergence of gradient-based methods may compensate for their relatively larger computational burdens, but, with an increasing number of model states, $r(\mathbf{p},\mathbf{x};\lambda)$ descent becomes increasingly more computationally efficient than $r(\mathbf{q})$ minimization by gradient-based methods, markedly for implicit numerical solution methods. For PDEs, where $n_x \ll n_i + n_b$, $r(\mathbf{p},\mathbf{x};\lambda)$ descent is computationally far more efficient than $r(\mathbf{q})$ minimization by gradient-based methods, and the difference in computational efficiency grows with an increasing number of data points and/or model states, markedly for implicit numerical solution methods.

Conclusion

Overall, from assumptions (D.209), (D.210), (D.211), (D.212), (D.213), (D.214), and (D.216) and Theorems 7, 8, 9, and 10, $r(\mathbf{p},\mathbf{x};\lambda)$ descent is computationally more efficient than $r(\mathbf{q})$ minimization. More specifically, for ODEs with few model states, $r(\mathbf{p},\mathbf{x};\lambda)$ descent may be somewhat more computationally efficient than $r(\mathbf{q})$ minimization, and, with an increasing number of model states, $r(\mathbf{p},\mathbf{x};\lambda)$ descent becomes increasingly more computationally efficient than $r(\mathbf{q})$ minimization, markedly for implicit numerical solution methods. Also, for PDEs, $r(\mathbf{p},\mathbf{x};\lambda)$ descent is computationally far more efficient than $r(\mathbf{q})$ minimization, and the difference in computational efficiency grows with an increasing number of data points and/or model states, markedly for implicit numerical solution methods.

Appendix E

Details of Testing the Homotopy-Minimization Method for Parameter Estimation in Differential Equations

E.1 Implementation of Overlapping-Niche Descent for Forms of the Bonny Model

Here, I describe details pertaining to the implementation of overlapping-niche descent for forms of the Bonny model. I describe related structural components of overlapping-niche descent in Section 3.4.

E.1.1 Generating Random Parameters and State Values

Initially in overlapping-niche descent, I randomly generate parameters and state values. Also, as discussed in Section C.1, throughout overlapping-niche descent, I randomly generate parameters and state values in random offspring. Given no prior parameter value estimates, I randomly generate rate parameters over a broad range of scales:
\[
p \sim u_p \cdot 10^{U(-6,1)} \text{ for all } p \in \{\omega_D, \omega_{dD}, \omega_E, \omega_{ed}, \omega_{de,m}, \omega_{de,c}, \omega_e\}, \tag{E.1}
\]
where $U(a,b)$ is the uniform probability distribution over the interval $(a,b)$, and $u_p$ is the units of parameter $p$. I expect $c_{\max}$ to be within one or two orders of magnitude of the maximal MinD data value, $D_{\max}$. Thus, I randomly generate $c_{\max}$ such that
\[
c_{\max} \sim D_{\max} \cdot 10^{U(0,2)}. \tag{E.2}
\]
Naively, I guess that fitted diffusion coefficients are within one or two orders of magnitude of $10\ \mu\text{m}^2\,\text{s}^{-1}$. Thus, I randomly generate diffusion coefficients such that
\[
p \sim 10\ \mu\text{m}^2\,\text{s}^{-1} \cdot 10^{U(-2,2)} \text{ for all } p \in \{D_d, D_{de}, D_e\}. \tag{E.3}
\]
For reference, the parameter values used to generate synthetic data are shown in Table 3.1. I randomly generate state values so that observable state values match observed or interpolated data exactly. Thus, for MinD and MinE observed or interpolated data values, $D$ and $E$, with $c_e$ as a free state, I randomly generate state values such that
\[
c_e \sim U(\max\{0, E - D\}, E), \quad c_{de} = E - c_e, \quad c_d = D - E + c_e. \tag{E.4}
\]

E.1.2 Parents and Offspring

The function of parents and offspring in overlapping-niche descent is described in Section C.1. Accordingly, to the $i$th niche in generation $g$, I allocate one sustained parent, $\hat{n}_i = 1$, one high momentum offspring, $\check{n}^m_{g,i} = 1$, one cross-niche offspring, $\check{n}^c_{g,i} = 1$, and one random offspring, $\check{n}^r_{g,i} = 1$, for each $i \in \{1,2,\dots,101\}$ and each generation of overlapping-niche descent, $g \geq 1$. In the first two generations of overlapping-niche descent, $g \leq 2$, I allocate two sexual offspring to each niche, $\check{n}^s_{g,i} = 2$ for all $i \in \{1,2,\dots,101\}$. After the second generation of overlapping-niche descent, I adaptively change the number of sexual offspring that I allocate to each niche, enlarging less convergent niches and shrinking more convergent niches for greater efficiency in optimization. Specifically, I allocate one sexual offspring to the $i$th niche, and randomly allocate the remaining 101 sexual offspring to the $i$th niche with probability proportional to $\Delta r_{g,i,1}$, the measure of convergence in the (first) parent space of the $i$th niche in generation $g$, as defined in equation (C.1), for each $i \in \{1,2,\dots,101\}$ and $g > 2$.

E.1.3 Selection and Random Perturbation

I choose the natural default value for the selection strength parameter, $q_{\text{fit}} = 1$, as described in Section C.1.
For a sexual offspring that inherits parameter $p$ from individual $(\mathbf{p}_{g,i,j}, \mathbf{x}_{g,i,j})$, I perturb the value of the parental parameter, $\hat{p}$, such that
\[
p \sim \Big(\hat{p} \cdot 10^{N(0,\, \max\{\Delta r_{g,i,j},\, 10^{-2}\}^2)} \,\Big|\, p \geq p_{\min}\Big), \tag{E.5}
\]
where $N(\mu,\sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$, $\Delta r_{g,i,j}$ is the measure of convergence in the $j$th parent space of the $i$th niche in generation $g$, as defined in equation (C.1), and $p_{\min}$ is the restricted lower bound on parameter $p$ as discussed in Section 3.4.2. A standard deviation of $\max\{\Delta r_{g,i,j}, 10^{-2}\}$ ensures some small but significant perturbation in parameter $p$ when $\Delta r_{g,i,j}$ is small. Similarly, for a sexual offspring that inherits state value $x$ from individual $(\mathbf{p}_{g,i,j}, \mathbf{x}_{g,i,j})$, I perturb the parental state value, $\hat{x}$, such that
\[
x \sim \Big(\hat{x} + \hat{x} \cdot N\big(0, \max\{\Delta r_{g,i,j},\, 10^{-2}\}^2\big) \,\Big|\, x \geq 0\Big). \tag{E.6}
\]
Details pertaining to sexual offspring are described in Section C.1.

E.1.4 Dykstra's Method

As discussed in Section C.2.3, during overlapping-niche descent, I project parameters and state values onto a restricted domain using Dykstra's method. For forms of the Bonny model, the restrictions on parameters and state values, linear inequalities (3.7), (3.8), and (3.9), are all mutually independent. Thus, Dykstra's method, as discussed in Section C.2.3, will converge in one projection cycle when projecting parameters and state values onto the domain bounded by (3.7), (3.8), and (3.9). As such, for Dykstra's method, I have no need to define the relative termination tolerance, $\varepsilon_c$, or the absolute termination tolerance, $\varepsilon_{\bar{c}}$.

E.1.5 Initial Values, Termination, Prolongation, and Computation

I choose the initial gradient scaling value $s_{i,0} = 0$, for all $i$ in the indexed set of all parameters and state values. Details pertaining to $s_{i,0}$ are described in Section C.2.1. I choose the maximum number of strict descent iterations to be relatively but not excessively large, $n_{\max} = 10^4$, to ensure sufficient convergence to a local minimum of $r(\mathbf{p},\mathbf{x};\lambda)$ while avoiding overburdensome computation. I choose a very small contraction termination tolerance, $\varepsilon_\sigma = 10^{-30}$, and a very small relative-change termination tolerance, $\varepsilon_r = 10^{-30}$, to continue accelerated descent through $n_{\max}$ strict descent iterations unless local minimization is essentially complete. Details pertaining to $n_{\max}$, $\varepsilon_\sigma$, and $\varepsilon_r$ are described in Section C.2.2. For descent prolongation, I choose: $\breve{\sigma} = 1$, for non-stringent descent prolongation; $m_{\text{pro}} = 10^3$, a factor of 10 less than $n_{\max}$; $\hat{n}_{\text{pro}} = n_{\max} = 10^4$; and $\check{n}_{\text{pro}} = n_{\max} = 10^4$. Details pertaining to $\breve{\sigma}$, $m_{\text{pro}}$, $\hat{n}_{\text{pro}}$, and $\check{n}_{\text{pro}}$ are described in Section C.3. I choose the overlapping-niche descent termination tolerance to be relatively but not exceedingly small, $\varepsilon_{\Delta r} = 10^{-3}$, for reliable convergence in all niches while avoiding an excessive number of overlapping-niche descent generations. Details pertaining to $\varepsilon_{\Delta r}$ are described in Section C.1. I compute genetic algorithm calculations using MATLAB. I compute accelerated descent calculations in parallel using C++ on the Calcul Québec server Guillimin, which uses Intel Xeon X5650 Westmere processors and has 6 cores per CPU.
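To close this section, a minimal Python sketch of the stochastic components described above follows. It is illustrative only (the implementation here is in MATLAB and C++): function names are mine, units are left implicit, and the conditional distributions in (E.5) and (E.6) are realized by rejection sampling, which is one way, though not necessarily the way used here, of drawing from a truncated distribution.

```python
import numpy as np

rng = np.random.default_rng()

def random_rate_parameters(n_rates=7):
    """Rate parameters over a broad range of scales, p ~ 10^U(-6, 1),
    as in equation (E.1); units u_p are left implicit."""
    return 10.0 ** rng.uniform(-6.0, 1.0, size=n_rates)

def random_cmax(D_max):
    """c_max within one or two orders of magnitude above the maximal
    MinD data value, as in equation (E.2)."""
    return D_max * 10.0 ** rng.uniform(0.0, 2.0)

def random_diffusion_coefficients(n_coeffs=3):
    """Diffusion coefficients around 10 um^2/s, as in equation (E.3)."""
    return 10.0 * 10.0 ** rng.uniform(-2.0, 2.0, size=n_coeffs)

def random_states(D, E):
    """State values consistent with data, equation (E.4): c_e is free on
    (max{0, E - D}, E); c_de and c_d then make the observable states
    match the MinD and MinE data exactly."""
    c_e = rng.uniform(np.maximum(0.0, E - D), E)
    return D - E + c_e, E - c_e, c_e      # c_d, c_de, c_e

def perturb_parameter(p_hat, delta_r, p_min=0.0):
    """Log-normal multiplicative perturbation of an inherited parameter,
    equation (E.5), truncated to p >= p_min by resampling; the standard
    deviation floors at 1e-2 so perturbations stay significant."""
    sigma = max(delta_r, 1e-2)
    while True:
        p = p_hat * 10.0 ** rng.normal(0.0, sigma)
        if p >= p_min:
            return p

def perturb_state(x_hat, delta_r):
    """Relative additive perturbation of an inherited state value,
    equation (E.6), truncated to x >= 0 by resampling."""
    sigma = max(delta_r, 1e-2)
    while True:
        x = x_hat * (1.0 + rng.normal(0.0, sigma))
        if x >= 0.0:
            return x
```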
E.2 Details of SNSD

Here, I describe details of SNSD, single-niche solution descent. SNSD optimizes over parameters and initial conditions to minimize $r_y(\mathbf{p},\mathbf{x})$ with numerical solution values $\mathbf{x}$. SNSD follows the genetic algorithm of overlapping-niche descent as described in Section C.1, but consists of a single niche. To be consistent with overlapping-niche descent as described in Section 3.4.3, SNSD sustains 101 parents, 101 high momentum offspring, 303 sexual offspring, and 101 random offspring. SNSD follows the accelerated descent routine of overlapping-niche descent as described in Section C.2, except that state values are determined by numerically solving the differential equation system. Also, state values are implicit functions of parameters and initial conditions. So, partial derivatives of $r_y(\mathbf{p},\mathbf{x})$ with respect to parameters and initial conditions are calculated in part by numerically solving the sensitivity Equations (D.52) to determine partial derivatives of state values with respect to parameters and initial conditions.

E.3 Details Pertaining to the Implementation of Overlapping-Niche Descent in Practice with the Full Bonny Model

Here, I continue the discussion of Section 3.8, to explicate details pertaining to the implementation of overlapping-niche descent in practice. My discussion follows overlapping-niche descent in the fitting of the full Bonny model, as defined in Equation 3.1, to the synthetic traveling-wave-emergence data, as shown in Figure 3.3, on a uniform grid with a grid refinement factor of 1, $n_{\Delta t} n_{\Delta s} n_t^{-1} n_s^{-1} = 1$, for $n_{\Delta t}$ and $n_{\Delta s}$ the number of temporal and spatial grid points and $n_t$ and $n_s$ the number of temporal and spatial data points.

E.3.1 Selection in Overlapping-Niche Descent

As discussed in Section C.1, overlapping-niche descent employs various types of offspring. To illustrate how offspring types contribute to convergence in overlapping-niche descent, I map selection from offspring types in Figure E.1.

Figure E.1: Selection from offspring types during overlapping-niche descent. White indicates that $(\mathbf{p}_{g,i,1},\mathbf{x}_{g,i,1})$, the individual in the (first) parent space of the $i$th niche in generation $g$ as described in Section C.1, was selected from a high momentum offspring in (a), a cross-niche offspring in (b), a sexual offspring in (c), and a random offspring in (d).

As is visible in Figure E.1, in early generations of overlapping-niche descent, sexual offspring and random offspring contribute to convergence for smaller values of $\lambda$ while high momentum offspring and sexual offspring contribute to convergence broadly in $\lambda$, and in subsequent generations of overlapping-niche descent, high momentum offspring and sexual offspring contribute to convergence for larger values of $\lambda$.

To illustrate how cross-niche optimization contributes to convergence in overlapping-niche descent, I map selection from niches in Figure E.2.

Figure E.2: Selection across niches during overlapping-niche descent. The individual in the (first) parent space of the $i$th niche in generation $g$, $(\mathbf{p}_{g,i,1},\mathbf{x}_{g,i,1})$, was selected from the niche shown by grayscale in (a) and was generated from parent(s) in the niche(s) shown by grayscale in (b) and (c). In (b) and (c), a blue value indicates that the selected individual was a random offspring and thus not generated from a parent in any niche.

As is visible in Figure E.2, cross-niche optimization ubiquitously contributes to convergence throughout overlapping-niche descent, and selection or generation from a parent from a niche with a larger value of $\lambda$ contributes to convergence at least as much as selection or generation from a parent from a niche with a smaller value of $\lambda$.
Interestingly, although $\tilde{r}(\lambda)$ converges more readily for smaller $\lambda$, selection or generation from a parent from a niche with a larger value of $\lambda$ contributes to convergence in $\tilde{r}(\lambda)$ for smaller $\lambda$.

E.3.2 Prolongation in Overlapping-Niche Descent

As discussed in Section C.3, I prolong accelerated descent for an individual that appears to be converging to a value of $r(\mathbf{p},\mathbf{x};\lambda)$ below the least value of $r(\mathbf{p},\mathbf{x};\lambda)$ in its niche. To illustrate how descent prolongation contributes to convergence in overlapping-niche descent, I map selection from individuals with prolonged accelerated descent in Figure E.3.

Figure E.3: Descent prolongation during overlapping-niche descent. The number of strict descent iterations is shown for the individual that was selected to occupy the (first) parent space of the $i$th niche in generation $g$, $(\mathbf{p}_{g,i,1},\mathbf{x}_{g,i,1})$. Descent prolongation occurs when the number of strict descent iterations exceeds $n_{\max} = 10^4$.

As is visible in Figure E.3, descent prolongation ubiquitously contributes to convergence throughout overlapping-niche descent.

E.3.3 Convergence During Accelerated Descent

As discussed in Section C.2, overlapping-niche descent includes accelerated descent, a variant of accelerated scaled gradient descent. Generally, accelerated descent follows a trajectory with rapid, sublinear convergence that develops into periods of superlinear convergence, which are punctuated by restarts. The number of strict descent iterations with sublinear convergence, the number of strict descent iterations with superlinear convergence, and convergence rates vary with initial parameters and state values and with $\lambda$. I show convergence plots that exemplify the convergence behavior of accelerated descent in Figure E.4.

Figure E.4: Convergence behavior of accelerated descent. Rapid, sublinear convergence that develops into periods of superlinear convergence is shown in (a), for accelerated descent that begins with a random offspring and converges to $(\tilde{\mathbf{p}}_{\lambda_{22}}, \tilde{\mathbf{x}}_{\lambda_{22}})$. Periods of superlinear convergence that are punctuated by restarts are shown in (b), for accelerated descent that begins with a high momentum offspring and converges to $(\tilde{\mathbf{p}}_{\lambda_8}, \tilde{\mathbf{x}}_{\lambda_8})$. I calculate relative errors as $\big(r(\mathbf{p},\mathbf{x};\lambda) - r(\tilde{\mathbf{p}}_\lambda, \tilde{\mathbf{x}}_\lambda;\lambda)\big)/r(\tilde{\mathbf{p}}_\lambda, \tilde{\mathbf{x}}_\lambda;\lambda)$. Rather than misrepresenting relative errors near $(\tilde{\mathbf{p}}_\lambda, \tilde{\mathbf{x}}_\lambda)$, I omit relative errors in the last $10^3$ strict descent iterations of accelerated descent.

For reference, gradient descent and Nesterov's accelerated gradient method generally converge sublinearly, with respective upper bounds on errors of $O(n^{-1})$ and $O(n^{-2})$ for iteration number $n$ [50], and Newton's method generally converges quadratically.

To demonstrate the importance of scaling in accelerated descent, as discussed in Section C.2.1, I apply accelerated descent without scaling, $s_{i,j} = 1$ for all $i \in \{1,2,\dots,n_v\}$ and $j \geq 1$ in equation (C.5), to the randomly generated individuals from the first generation of overlapping-niche descent and compare results from the optimization to analogous results from the optimization with accelerated descent. In doing so, for a balanced comparison, I only compare results for individuals with $n_{\max} = 10^4$ strict descent iterations during both accelerated descent and accelerated descent without scaling, where $n_{\max}$ is the maximum number of strict descent iterations described in Section C.2.2 and specified in Section E.1.5. Results are shown in Figure E.5.

Figure E.5: A comparison of optimization using accelerated descent and accelerated descent without scaling. Final-to-initial ratios of $r(\mathbf{p},\mathbf{x};\lambda)$ are shown for accelerated descent in (a) and for accelerated descent without scaling in (b). A histogram of values for the ratio of the final value of $r(\mathbf{p},\mathbf{x};\lambda)$ from accelerated descent to the final value of $r(\mathbf{p},\mathbf{x};\lambda)$ from accelerated descent without scaling is shown in (c). In (a), (b), and (c), I show results for the randomly generated individuals in the first generation of overlapping-niche descent with $n_{\max} = 10^4$ strict descent iterations during both accelerated descent and accelerated descent without scaling and omit other results.

As is visible in Figure E.5, uniformly in all niches, accelerated descent dramatically outperforms accelerated descent without scaling. More precisely, in accordance with the histogram in panel (c) of Figure E.5, for the randomly generated individuals in the first generation of overlapping-niche descent (with $n_{\max} = 10^4$ strict descent iterations during both accelerated descent and accelerated descent without scaling), I find a median value for the ratio of the final value of $r(\mathbf{p},\mathbf{x};\lambda)$ from accelerated descent to the final value of $r(\mathbf{p},\mathbf{x};\lambda)$ from accelerated descent without scaling of $6.86 \cdot 10^{-8}$.

Appendix F

Extracting Near-Homogeneous Data

Here, I discuss details of data preprocessing and extracting spatially near-homogeneous time-course data from Ivanov and Mizuuchi's in vitro experimental measurements of the Min system [38].

F.1 Data Information

In the Ivanov and Mizuuchi experiments, in a 25 µm deep flow chamber, a buffer with 1.06 µM fluorescently labeled EGFP-MinD, 1.36 µM fluorescently labeled Alexa647-MinE, and 2.5 mM ATP was flowed at an average rate of 0.5 mm s⁻¹ atop a supported lipid bilayer. On the supported lipid bilayer, densities of MinD and MinE, which were measured using total internal reflection fluorescence (TIRF) microscopy, oscillated near-homogeneously in space then formed into traveling waves. According to the Ivanov and Mizuuchi Supporting Information, rapid buffer flow compensated for local changes in the concentrations of reaction components in the buffer, as the buffer flow rate was three orders of magnitude faster than the typical traveling wave speed of MinD and MinE densities on the supported lipid bilayer, and Taylor dispersion in the laminar flow was estimated to take place on a time scale that was about two orders of magnitude faster than the typical wave period of MinD and MinE densities on the supported lipid bilayer.

I analyze Ivanov and Mizuuchi's raw data, from the file Movie1.stk, using MATLAB, with the function tiffread2 to import data. The data contains 2401 grayscale frames. Each frame is 512 × 512 pixels and is a dual image, with the fluorescence intensity signals of EGFP-MinD on the left and the fluorescence intensity signals of Alexa647-MinE on the right. For visual clarification, an image of the 330th data frame is shown in (a) of Figure F.1 and an enhanced (through the MATLAB function imadjust) image of the 330th data frame is shown in (b) of Figure F.1.

Figure F.1: The 330th data frame as an image (a) and as an enhanced image (b).
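A minimal sketch of this import step in Python follows, assuming the tifffile library can play the role of MATLAB's tiffread2 for the MetaMorph stack; only the file name and frame layout come from the text above, and everything else about the reader is an assumption.

```python
import tifffile  # assumed stand-in for MATLAB's tiffread2

def load_frames(path="Movie1.stk"):
    """Load the raw stack as a (2401, 512, 512) array of grayscale dual
    images: the EGFP-MinD signal occupies the left of each frame and the
    Alexa647-MinE signal the right. The horizontal offset between the two
    signals is determined by cross-correlation in Section F.2, so no split
    is attempted here."""
    frames = tifffile.imread(path)
    assert frames.shape == (2401, 512, 512)
    return frames

frames = load_frames()
m330 = frames[329]   # the 330th data frame (the text uses 1-based indices)
```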
F.2 Aligning Data Tracks

As is visible in (b) of Figure F.1, similarly shaped structures in MinD and MinE fluorescence intensity profiles seem to be aligned in rotation and scale, but not vertically. Also, MinD and MinE fluorescence intensity profiles require horizontal alignment because both signals are merged into a single image.

I translationally align EGFP-MinD fluorescence intensity profiles and Alexa647-MinE fluorescence intensity profiles using structures in the 330th data frame as location markers. I denote the intensity value of the $i$th vertical pixel from the top and the $j$th horizontal pixel from the left in the 330th merged data frame by $m^{330}_{i,j}$, for $i \in \{1,2,\dots,512\}$ and $j \in \{1,2,\dots,512\}$. I select a square alignment preimage sufficiently within the interior of the MinE fluorescence intensity profile (shown in Figure F.2), which contains a unique arrangement of structures that are shaped similarly in both the MinD and MinE fluorescence intensity profiles.

Figure F.2: Alignment preimage [$(p,q) = (65,295)$, $l = 106$].

The alignment preimage has upper-left vertex $(p,q) = (65,295)$ with side length $l = 106$ pixels. I compare the similarity of the alignment preimage in the MinE fluorescence intensity profile with an equally-sized alignment image in the MinD fluorescence intensity profile by normalized cross-correlation. For a square alignment image in the MinD fluorescence intensity profile with upper-left vertex $(s,t)$ and side length $l$, the normalized cross-correlation between the alignment preimage and the alignment image is given by:
\[
c(s,t) = \frac{\displaystyle\sum_{i=0}^{l-1} \sum_{j=0}^{l-1} \big(m^{330}_{p+i,q+j} - \mu_{p,q}\big)\big(m^{330}_{s+i,t+j} - \mu_{s,t}\big)}{\left(\displaystyle\sum_{i=0}^{l-1} \sum_{j=0}^{l-1} \big(m^{330}_{p+i,q+j} - \mu_{p,q}\big)^2\right)^{\!\frac12} \left(\displaystyle\sum_{i=0}^{l-1} \sum_{j=0}^{l-1} \big(m^{330}_{s+i,t+j} - \mu_{s,t}\big)^2\right)^{\!\frac12}}, \tag{F.1}
\]
with alignment preimage and image mean values $\mu_{p,q}$ and $\mu_{s,t}$:
\[
\mu_{p,q} = \frac{1}{l^2} \sum_{i=0}^{l-1} \sum_{j=0}^{l-1} m^{330}_{p+i,q+j}, \tag{F.2a}
\]
\[
\mu_{s,t} = \frac{1}{l^2} \sum_{i=0}^{l-1} \sum_{j=0}^{l-1} m^{330}_{s+i,t+j}. \tag{F.2b}
\]
The MinD fluorescence intensity profile appears to end at the 252nd horizontal pixel. Thus, for side length $l = 106$, square alignment images with upper-left vertices $(s,t) \in \{1,2,\dots,407\} \times \{1,2,\dots,147\}$ are contained entirely within the MinD fluorescence intensity profile. I calculate $c(s,t)$ for all $(s,t) \in \{1,2,\dots,407\} \times \{1,2,\dots,147\}$; relative values are shown in Figure F.3.

Figure F.3: Relative $c(s,t)$ values for $(s,t) \in \{1,2,\dots,407\} \times \{1,2,\dots,147\}$. Values increase with gradation from black to white.

For $(s,t) \in \{1,2,\dots,407\} \times \{1,2,\dots,147\}$, $c(s,t)$ has a maximum value of 0.56 at $(92,43)$. The alignment preimage in the MinE fluorescence intensity profile and the alignment image in the MinD fluorescence intensity profile with upper-left vertex $(s,t) = (92,43)$ are shown in Figure F.4.

Figure F.4: Alignment preimage [$(p,q) = (65,295)$, $l = 106$] and alignment image [$(s,t) = (92,43)$, $l = 106$].

Thus, I align pixel $(65,295)$ in the MinE fluorescence intensity profile with pixel $(92,43)$ in the MinD fluorescence intensity profile, a shift of $(27,-252)$ pixels. I apply the same shift to align all MinD and MinE fluorescence intensity data that has a one-to-one correspondence under the shift. Therefore, I assign value $m^{330}_{i+27,j}$ to aligned MinD fluorescence intensity data element $\ddot{d}^{330}_{i,j}$, and I assign value $m^{330}_{i,j+252}$ to aligned MinE fluorescence intensity data element $\ddot{e}^{330}_{i,j}$, for $i \in \{1,2,\dots,485\}$ and $j \in \{1,2,\dots,252\}$. I align MinD and MinE fluorescence intensity data identically in all data frames.
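A minimal numpy sketch of the normalized cross-correlation (F.1) and the exhaustive search mapped in Figure F.3 follows, using 0-based array indices (so the text's pixel (65, 295) is m[64, 294]); the double loop is illustrative and slow, and FFT-based correlation would be faster, but this follows the equations directly.

```python
import numpy as np

def ncc(m, p, q, s, t, l=106):
    """Normalized cross-correlation c(s, t) of equation (F.1) between the
    preimage block with upper-left vertex (p, q) and the candidate image
    block with upper-left vertex (s, t), both of side length l pixels."""
    A = m[p:p + l, q:q + l].astype(float)
    B = m[s:s + l, t:t + l].astype(float)
    A -= A.mean()          # subtract mu_{p,q}, equation (F.2a)
    B -= B.mean()          # subtract mu_{s,t}, equation (F.2b)
    return (A * B).sum() / np.sqrt((A * A).sum() * (B * B).sum())

def best_alignment(m, p=64, q=294, l=106, s_range=407, t_range=147):
    """Exhaustively score all admissible upper-left vertices, as in Figure
    F.3, and return the maximizer; with the values reported in the text,
    this should recover (s, t) = (91, 42), the 1-based pixel (92, 43)."""
    scores = np.array([[ncc(m, p, q, s, t, l) for t in range(t_range)]
                       for s in range(s_range)])
    s_best, t_best = np.unravel_index(np.argmax(scores), scores.shape)
    return (s_best, t_best), scores[s_best, t_best]
```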
During many data frames, MinD fluorescence intensity in the two right-most vertical pixels is significantly less than fluorescence intensity in neighboring pixels, and MinE fluorescence intensity in the three left-most vertical pixels is significantly more than fluorescence intensity in neighboring pixels. Thus, I remove the three left-most and two right-most vertical pixels from aligned MinD and MinE fluorescence intensity data. Therefore, for data frame $t$, I define aligned MinD and MinE fluorescence intensity data elements $\dot{d}^t_{i,j}$ and $\dot{e}^t_{i,j}$ to be data elements $\ddot{d}^t_{i,j+3} = m^t_{i+27,j+3}$ and $\ddot{e}^t_{i,j+3} = m^t_{i,j+252+3}$, for $i \in \{1,2,\dots,485\}$, $j \in \{1,2,\dots,247\}$, and $t \in \{1,2,\dots,2401\}$. An enhanced image of aligned MinD and MinE fluorescence intensity data is shown in Figure F.5.

Figure F.5: Aligned data from the 330th data frame as an image.

F.3 Preparing Aligned Data for Analysis

F.3.1 Temporal Partition of Data

Aligned MinD and MinE fluorescence intensity data contains the temporal evolution of six pulses, over roughly the entire spatial domain of the data, that slowly develop into persistent traveling wave fronts. I temporally partition aligned MinD and MinE fluorescence intensity data, by temporal index, such that
\[
\begin{aligned}
\{1,2,\dots,2401\} = \bigcup_{k=0}^{10} P_k = {}& \{1,\dots,224\} \cup \{225,\dots,343\} \cup \{344,\dots,546\} \cup \{547,\dots,719\} \\
&\cup \{720,\dots,894\} \cup \{895,\dots,1064\} \cup \{1065,\dots,1259\} \cup \{1260,\dots,1388\} \\
&\cup \{1389,\dots,1539\} \cup \{1540,\dots,1677\} \cup \{1678,\dots,2401\},
\end{aligned}
\]
where $P_0$ contains temporal indices before the first global pulse in MinD and MinE fluorescence intensity; $P_1, P_2, \dots, P_6$ each contain temporal indices of a single, roughly global MinD and MinE fluorescence intensity spike then fall; and $P_7$ contains the temporal indices of the beginning of MinD and MinE fluorescence intensity traveling wave formation, which further develops during the temporal indices of $P_8$ and $P_9$, and persists through the temporal indices of $P_{10}$. I denote aligned MinD and MinE fluorescence intensity data elements $d_{i,j}$ and $e_{i,j}$, in the $l$th ordered index of the $k$th temporal partition, as $d^{k,l}_{i,j}$ and $e^{k,l}_{i,j}$, for $i \in \{1,2,\dots,485\}$, $j \in \{1,2,\dots,247\}$, $k \in \{0,1,\dots,10\}$, and $l \in \{1,2,\dots,n(P_k)\}$, where $n(P_k)$ denotes the cardinality of $P_k$.

F.3.2 Intensity Flattening

Bulk MinD and MinE, in the solution buffer, reach the middle of the flow cell, the site of fluorescence intensity measurements, during temporal partition $P_0$. Before bulk proteins reach the middle of the flow cell, MinD and MinE fluorescence intensities consist entirely of background noise. After bulk proteins reach the middle of the flow cell, MinD and MinE fluorescence intensities consist of background noise, bulk proteins, and some membrane-bound proteins. Mean MinD and MinE fluorescence intensity values over space, for each data frame in $P_0$, are shown in Figure F.6.

Figure F.6: Mean fluorescence intensities over space during temporal partition $P_0$. Intensity values are in camera units (c.u.).

As is visible in Figure F.6, MinD and MinE fluorescence intensities consist entirely of background noise from image 1 to image 56. After image 56, MinD fluorescence intensity increases slightly, then MinD and MinE fluorescence intensities increase dramatically, presumably when bulk proteins reach the middle of the flow cell. Mean MinD and MinE fluorescence intensities over time, from image 1 to image 56, are shown in Figure F.7.

Figure F.7: Mean background intensities over time. MinD fluorescence intensity is shown in (a) and MinE fluorescence intensity is shown in (b).

As is visible in Figure F.7, background noise intensities are roughly uniform in space.
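A minimal sketch of these background statistics follows, assuming the aligned data are stored as (n_frames, 485, 247) arrays; the array and function names are mine, not from the dissertation.

```python
import numpy as np

def background_stats(frames, n_bg=56):
    """Pool all pixels of the first n_bg frames (images 1..56, before bulk
    protein arrives) and return the background mean and standard deviation
    in camera units. `frames` is a (n_frames, 485, 247) array of aligned
    MinD or MinE fluorescence intensity data."""
    bg = frames[:n_bg].astype(float)
    return bg.mean(), bg.std()

# Hypothetical usage, with aligned_d and aligned_e the aligned data arrays:
# mu_bd, sigma_bd = background_stats(aligned_d)
# mu_be, sigma_be = background_stats(aligned_e)
```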
MinD andMinE background fluorescence intensities, in all pixels, from image 1 to 56, have mean valuesµbd = 102.9 c.u. and µbe = 102.8 c.u. and standard deviations σbd = 12.3 c.u. and σbe = 12.2 c.u.,where camera unit is abbreviated as c.u..208F.3. Preparing Aligned Data for AnalysisAfter bulk proteins reach the middle of the flow cell, MinD and MinE fluorescence intensitiesno longer appear roughly uniform in space. According to the Ivanov and Mizuuchi SupportingInformation, “The illumination had a Gaussian shape in the field of view with a measuredhorizontal and vertical half maximum widths of 65× 172 µm at 488 nm ”, where 488 nm refersto the wavelength of MinD fluorescence excitation. The Gaussian spreads of MinD and MinEfluorescent intensities are visibly different. Thus, MinD and MinE fluorescence intensities requireflattening to remove asymmetries in data acquisition. The Gaussian function, in two spatialdimensions x and y, has the formg(x, y;A, x0, y0, kx, ky) = A exp(− (kx(x− x0)2 + ky(y − y0)2)), (F.3)with scaling factor A, center point (x0, y0), and spread characterizing parameters kx and ky.The sum of Gaussian functions with identical center point and spread is a Gaussian functionthat retains the center point and spread. For static illumination sources, the Gaussian centerpoints and spreads of MinD and MinE fluorescence intensity remain constant throughout dataacquisition. During P0, after bulk proteins reach the middle of the flow cell, apart from apparentGaussian profiles, MinD and MinE fluorescence intensities appear to be roughly uniform inspace. Thus, roughly, each MinD and MinE image contains spatially uniform background noiseand a Gaussian intensity profile with static center point and spread. To generate MinD andMinE Gaussian profile data, initially, I subtract mean background intensity values, µbd and µbe,from MinD and MinE fluorescence intensity data, for image 85, when bulk proteins appear tohave saturated the entire spatial domain of the data, through image 224, the final image in P0.Then, for each image from 85 to 224, I normalize translated MinD and MinE data by the meantranslated MinD and MinE value in the image. Finally, I generate MinD and MinE Gaussianprofile data by calculating the mean normalized translated MinD and MinE value, from image 85to 224, at each image pixel. Ultimately, generated MinD and MinE Gaussian data are roughlyGaussian, with mean values of 1. According to the Ivanov and Mizuuchi Supporting Information,the side length of each pixel is 6−1 µm. Thus, for a a horizontal half maximum width of 65 µm,kx = ln(2) · (6 · 65 pixels)−2 ≈ 4.56 · 10−6 pixels−2, and for a vertical half maximum widths of172µm, ky = ln(2) · (6 · 172 pixels)−2 ≈ 6.51 · 10−7 pixels−2. Using the MATLAB functionlsqcurvefit, I determine the parameters A, x0, y0, kx, and ky that minimize the sum of squareddifferences between the Gaussian function and generated MinD and MinE Gaussian profile data,with initial parameter estimates A0 = 1, k0x = ln(2) ·(6 ·65 pixels)−2, k0y = ln(2) ·(6 ·172 pixels)−2,and (x00, y00) = (123, 242), the center of the spatial domain. I compute MinD Gaussian parameters,Ad = 1.43, kdx = 2.49 · 10−5 pixels−2, kdy = 3.27 · 10−6 pixels−2, xd0 = 186.0, and yd0 = 431.5,and MinE Gaussian parameters, Ae = 1.26, kex = 1.50 · 10−5 pixels−2, key = 2.53 · 10−6 pixels−2,xe0 = 196.3, and ye0 = 371.2. MinD and MinE Gaussian profile data and best fitting Gaussianfunctions are shown in Figure F.8.209F.3. 
Figure F.8: Gaussian profile data and best-fitting Gaussian functions. Generated Gaussian data is shown on the left and the best-fitting Gaussian function is shown on the right, for MinD in (a) and MinE in (b).

Using fitted Gaussian parameters, I flatten all aligned MinD and MinE fluorescence intensity data to correct for Gaussian intensity profiles:

    \bar{d}^{k,l}_{i,j} = \frac{A_d}{g(j, i; A_d, x^d_0, y^d_0, k^d_x, k^d_y)} (\dot{d}^{k,l}_{i,j} − µ^d_b),    (F.4)

    \bar{e}^{k,l}_{i,j} = \frac{A_e}{g(j, i; A_e, x^e_0, y^e_0, k^e_x, k^e_y)} (\dot{e}^{k,l}_{i,j} − µ^e_b),    (F.5)

for i ∈ {1, 2, ..., 485}, j ∈ {1, 2, ..., 247}, k ∈ {0, 1, ..., 10}, and l ∈ {1, 2, ..., n(P_k)}. Confirming the assumption, during P_0, after bulk proteins reach the middle of the flow cell, flattened MinD and MinE fluorescence intensities appear to be roughly uniform in space. Flattened MinD and MinE fluorescence intensities are shown for image 224, the final image in P_0, in Figure F.9.

Figure F.9: Flattened MinD and MinE fluorescence intensities for image 224. Flattened MinD fluorescence intensity is shown in (a) and flattened MinE fluorescence intensity is shown in (b).

F.3.3 Scaling Flattened Data

According to the Ivanov and Mizuuchi Supporting Information, by comparing the fluorescence intensity of EGFP-MinD with the fluorescence intensity of ParA-GFP, a protein with no affinity for the lipid bilayer, an EGFP-MinD fluorescence intensity of 7,000 c.u. was found to correspond, approximately, to a surface density of 1.25 · 10^4 µm^{−2}. The Alexa647-MinE fluorescence intensity was not directly calibrated, but was estimated by direct comparison to experiments with similar dynamic outcomes involving MinE-EGFP and nonfluorescent MinD. Peak MinE-EGFP fluorescence intensities were found to be two to four times less than peak fluorescence intensities of EGFP-MinD, and the peak-to-peak ratio of 2.6 was used to normalize Alexa647-MinE data. I calibrate MinD and MinE fluorescence intensities from flattened fluorescence intensity data in temporal partition P_0.

MinD and MinE fluorescence intensities were measured using total internal reflection fluorescence (TIRF) microscopy. In TIRF, a light beam undergoes total internal reflection at the interface of a solution with a solid surface, producing an evanescent wave that excites fluorescent molecules near the solid surface [2]. The evanescent electric field intensity, I, decays with distance, z, from the solid surface:

    I(z) = I_0 e^{−z/d},    (F.6a)

    d = \frac{\lambda_0}{4\pi} (n_1^2 \sin^2\theta − n_2^2)^{−1/2},    (F.6b)

where I_0 is the electric field intensity at z = 0, λ_0 is the light wavelength in a vacuum, n_1 is the refractive index of the solid surface, n_2 is the refractive index of the solution, and θ is the angle of incidence [2]. According to the Ivanov and Mizuuchi Supporting Information, for MinD fluorescence excitation, using a laser with a wavelength of 488 · 10^{−3} µm, the penetration depth of the evanescent wave, d_d, was 128 · 10^{−3} µm. The penetration depth of the evanescent wave was not explicitly characterized for MinE fluorescence excitation. From equation (F.6b) and d_d, for the experiment,

    \frac{1}{4\pi} (n_1^2 \sin^2\theta − n_2^2)^{−1/2} = \frac{128}{488}.    (F.7)

MinE fluorophores were excited using a laser with a wavelength of 633 · 10^{−3} µm. Thus, for MinE fluorescence excitation, the penetration depth of the evanescent wave, d_e, was 633 · 10^{−3} · 128 · 488^{−1} µm ≈ 166 · 10^{−3} µm.
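Since the refractive-index factor in (F.6b) is fixed by equation (F.7), the penetration depth is linear in the excitation wavelength. A minimal sketch of this scaling and of the decay profile (F.6a), with illustrative variable names of my choosing:

    % The penetration depth d in (F.6b) is linear in the wavelength lambda_0,
    % with the refractive-index factor fixed by (F.7); hypothetical names.
    lambda_d = 0.488;  lambda_e = 0.633;    % excitation wavelengths (µm)
    d_d = 0.128;                            % reported MinD penetration depth (µm)
    d_e = lambda_e * d_d / lambda_d;        % MinE penetration depth, ~0.166 µm
    z   = linspace(0, 25, 1000);            % depth into the 25 µm flow cell (µm)
    I_e = exp(-z / d_e);                    % relative evanescent intensity, (F.6a)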
Pixelated MinD and MinE fluorescence intensities, I^d and I^e, are the sums of bulk fluorescence intensities, I_D and I_E, and lipid bilayer-bound fluorescence intensities, I_d and I_e. I_D and I_E are proportional to the number of excited MinD and MinE fluorophores in solution, and I_d and I_e are proportional to the number of excited MinD and MinE fluorophores on the lipid bilayer; for MinD and MinE solution concentrations, c_D and c_E, and lipid bilayer-bound concentrations, c_d and c_e, in the 25 µm deep flow cell,

    I^d = I_D + I_d, where
        I_D = µ_d · a · ε_d · I^d_0 \int_0^{25} c_D e^{−z/d_d} dz
            = µ_d · a · ε_d · I^d_0 · c_D · d_d (1 − 1.5 · 10^{−85})
            ≈ µ_d · a · ε_d · I^d_0 · c_D · d_d,
        I_d = µ_d · a · ε_d · I^d_0 · c_d;    (F.8a)

    I^e = I_E + I_e, where
        I_E = µ_e · a · ε_e · I^e_0 \int_0^{25} c_E e^{−z/d_e} dz
            = µ_e · a · ε_e · I^e_0 · c_E · d_e (1 − 3.9 · 10^{−66})
            ≈ µ_e · a · ε_e · I^e_0 · c_E · d_e,
        I_e = µ_e · a · ε_e · I^e_0 · c_e;    (F.8b)

with MinD and MinE evanescent field intensities at z = 0, I^d_0 and I^e_0, excitation scales, ε_d and ε_e, measurement scales, µ_d and µ_e, and pixel area, a. Thus, for camera unit (c.u.) to concentration (µm^{−2}) conversion factors, α_d = (µ_d · a · ε_d · I^d_0)^{−1} and α_e = (µ_e · a · ε_e · I^e_0)^{−1}, to numerical precision,

    c_D · d_d = α_d I_D,    (F.9a)
    c_d = α_d I_d,    (F.9b)
    c_E · d_e = α_e I_E,    (F.9c)
    c_e = α_e I_e.    (F.9d)

Flowed bulk MinD and MinE protein concentrations are roughly constant in time. During P_0, in all data frames after bulk proteins reach the middle of the flow cell, flattened MinD and MinE fluorescence intensities appear roughly uniform in space, as shown in Figure F.9. Mean flattened MinD and MinE fluorescence intensity values over space, for each data frame in P_0, are shown in Figure F.10.

Figure F.10: Mean flattened fluorescence intensities over space during temporal partition P_0.

As is visible in Figure F.10, when bulk proteins reach the middle of the flow cell, MinD and MinE fluorescence intensities transiently peak and level off; then MinD fluorescence intensity gradually increases in time while MinE fluorescence intensity remains roughly constant in time. With no initial lipid bilayer-bound MinD and MinE and roughly constant bulk MinD and MinE concentrations, just after bulk proteins reach the middle of the flow cell, MinE fluorescence intensity, while roughly constant in time, consists almost entirely of bulk MinE protein fluorophore excitation. During a period with very little lipid bilayer-bound MinE and a roughly constant bulk MinD concentration, models of the evolution of lipid bilayer-bound MinD concentration, as defined in equations (4.1), (4.6), (4.7), and (4.9) of Section 4.3, reduce to the form

    \frac{dc_d}{dt} = (ω_{D→d} + ω_{dD→d} c_d)(c_{max} − c_d)/c_{max} − \frac{ω_{d→D} c_s^{n_s} c_d}{c_s^{n_s} + c_d^{n_s}},    (F.10)

where c_{de}, c_e ≈ 0 and ω_{d→D} = 0 in equation (4.1); c_{de}, c_e ≈ 0 and n_s = 1 in equation (4.6); c_{de}, c_{ede}, c_e ≈ 0 and n_s = 1 in equation (4.7); and c_{de}, c_{ded}, c_e ≈ 0 and n_s = 1 in equation (4.9). For small c_d, where c_d is very small compared to c_{max} and c_s, (c_{max} − c_d)/c_{max} ≈ 1 and c_s^{n_s}/(c_s^{n_s} + c_d^{n_s}) ≈ 1. Thus, models of the evolution of lipid bilayer-bound MinD concentration reduce to the form

    \frac{dc_d}{dt} = β_1 + β_2 c_d,    (F.11)

where β_1 = ω_{D→d} and β_2 = ω_{dD→d} − ω_{d→D}, which has solution

    c_d(t) = −\frac{β_1}{β_2} + \left( c_{d,0} + \frac{β_1}{β_2} \right) e^{β_2 (t − t_0)},    (F.12)

with concentration c_{d,0} at time t = t_0. Converting to camera units, I_d(t) = c_d(t)/α_d, equation (F.12) becomes

    I_d(t) = −\frac{γ_1}{β_2} + \left( I_{d,0} + \frac{γ_1}{β_2} \right) e^{β_2 (t − t_0)},    (F.13)

where γ_1 = β_1/α_d and I_{d,0} = c_{d,0}/α_d.
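As a quick consistency check, the closed form (F.12) can be compared against a direct numerical integration of (F.11); the parameter values below are arbitrary illustrations, not fitted values:

    % A minimal numerical check that (F.12) solves (F.11), using arbitrary
    % illustrative parameter values rather than fitted ones.
    b1 = 3e-2; b2 = -1e-2; cd0 = 0.2; t0 = 0;
    [t, cd] = ode45(@(t, c) b1 + b2 * c, [t0, 200], cd0);
    cd_exact = -b1/b2 + (cd0 + b1/b2) * exp(b2 * (t - t0));
    max(abs(cd - cd_exact))   % small: agreement to integrator tolerance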
Mean flattened MinE fluorescence intensity values over space appear to be roughly constant from image 122 through image 204. After image 204, mean flattened MinE fluorescence intensity values over space increase slightly. Thus, to decompose MinD fluorescence intensity, I^d(t), which I approximate as the mean flattened MinD fluorescence intensity over space, into roughly constant bulk MinD fluorescence intensity, I_D(t), and lipid bilayer-bound MinD fluorescence intensity, I_d(t), I determine the I_d(t)-characterizing parameters, γ_1, β_2, and I_{d,0}, that minimize the variance in MinD bulk fluorescence intensity, I_D(t) = I^d(t) − I_d(t), from image 122 through image 204, using the MATLAB function lsqcurvefit with initial parameter estimates of 10^{−3}. I find that, for t_0 = 122, γ_1 = 3.37 · 10^{−2} c.u. index^{−1}, β_2 = 3.36 · 10^{−2} index^{−1}, and I_{d,0} = 2.24 · 10^{−1} c.u. minimize the variance in I_D(t). The decomposition of I^d(t) into I_D(t) and I_d(t) is shown in Figure F.11.

Figure F.11: The decomposition of MinD fluorescence intensity into bulk fluorescence intensity and lipid bilayer-bound fluorescence intensity.

Bulk MinD fluorescence intensity, I_D(t), and bulk MinE fluorescence intensity, I_E(t), which I approximate as the mean flattened MinE fluorescence intensity over space, from image 122 through image 204, are shown in Figure F.12.

Figure F.12: Bulk MinD and MinE fluorescence intensities. Mean values are shown with dashed lines.

For camera unit to concentration conversion factors, α_d and α_e, from equations (F.9a) and (F.9c),

    α_d = c_D · d_d · I_D^{−1},    (F.14a)
    α_e = c_E · d_e · I_E^{−1}.    (F.14b)

I_D^{−1} and I_E^{−1} have mean values 1.392 · 10^{−2} c.u.^{−1} and 2.796 · 10^{−2} c.u.^{−1} and standard deviations 4.9 · 10^{−4} c.u.^{−1} and 9.8 · 10^{−4} c.u.^{−1}. The flowed buffer contains 1.06 µM = 6.383 · 10^2 µm^{−3} MinD and 1.36 µM = 8.190 · 10^2 µm^{−3} MinE. Thus, α_d and α_e have mean values 1.137 µm^{−2} c.u.^{−1} and 3.802 µm^{−2} c.u.^{−1} and standard deviations 0.040 µm^{−2} c.u.^{−1} and 0.133 µm^{−2} c.u.^{−1}. I approximate α_d and α_e by their mean values, α_d = 1.137 µm^{−2} c.u.^{−1} and α_e = 3.802 µm^{−2} c.u.^{−1}. Comparatively, Ivanov and Mizuuchi measured α_d, approximately, as 1.786 µm^{−2} c.u.^{−1}. I scale flattened MinD and MinE fluorescence intensity data to generate MinD and MinE density data:

    d^{k,l}_{i,j} = α_d · \bar{d}^{k,l}_{i,j},    (F.15)
    e^{k,l}_{i,j} = α_e · \bar{e}^{k,l}_{i,j},    (F.16)

for i ∈ {1, 2, ..., 485}, j ∈ {1, 2, ..., 247}, k ∈ {0, 1, ..., 10}, and l ∈ {1, 2, ..., n(P_k)}.
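Returning to the decomposition of I^d(t) above: it can be reproduced by minimizing the variance of I_D(t) = I^d(t) − I_d(t) over the fitting window. The sketch below substitutes fminsearch for the lsqcurvefit-based fit described in the text, with Imean (a hypothetical name) holding the mean flattened MinD intensity for images 122 through 204:

    % A minimal sketch of the decomposition fit, assuming Imean is the mean
    % flattened MinD fluorescence intensity over space for images 122-204.
    % fminsearch stands in for the lsqcurvefit-based fit described above.
    t  = (122:204)';  t0 = 122;
    Id = @(p, t) -p(1)/p(2) + (p(3) + p(1)/p(2)) .* exp(p(2) * (t - t0));
    % p = [gamma_1, beta_2, I_{d,0}]; minimize the variance of the bulk part
    p  = fminsearch(@(p) var(Imean - Id(p, t)), [1e-3, 1e-3, 1e-3]);
    ID = Imean - Id(p, t);            % roughly constant bulk intensity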
F.4 Finding Spatially Homogeneous Data

F.4.1 Spatially Near-Homogeneous Model Reductions

A partial differential equation description of some dynamic process with m states, u_1, u_2, ..., u_m, in n-dimensional space, x = x_1, x_2, ..., x_n, and time, t, is characterized by the system of equations:

    \frac{\partial u_i(x, t)}{\partial t} = f_i\left(x, t, u_1, \ldots, u_m, \frac{\partial u_1}{\partial x_1}, \ldots, \frac{\partial u_1}{\partial x_n}, \frac{\partial^2 u_1}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 u_1}{\partial x_1 \partial x_n}, \ldots, \frac{\partial u_2}{\partial x_1}, \ldots\right),    (F.17)

for some functions f_i and i ∈ {1, 2, ..., m}. If u_1, u_2, ..., u_m evolve near-homogeneously over some spatial domain, Ω, then

    u_i(x, t) = µ_i(t) + ε g_i(x, t),    (F.18)

with spatially homogeneous state-evolution µ_i(t), small scaling factor ε ≥ 0, and local state-variation g_i(x, t), for x ∈ Ω and i ∈ {1, 2, ..., m}. As ε → 0^+, u_i(x, t) → µ_i(t), and thus all partial derivatives of u_1, u_2, ..., u_m with respect to x_1, x_2, ..., x_n converge to a value of zero. Hence,

    \lim_{ε→0^+} \frac{\partial u_i(x, t)}{\partial t} = \lim_{ε→0^+} f_i\left(x, t, u_1, \ldots, u_m, \frac{\partial u_1}{\partial x_1}, \ldots\right) = f_i(x, t, µ_1, \ldots, µ_m, 0, \ldots, 0) = f^µ_i(t, µ_1, \ldots, µ_m),    (F.19)

for x ∈ Ω and i ∈ {1, 2, ..., m}. For f_i analytic in ε,

    f_i\left(x, t, u_1, \ldots, u_m, \frac{\partial u_1}{\partial x_1}, \ldots\right) = f_i\left(x, t, µ_1 + ε g_1, \ldots, µ_m + ε g_m, ε \frac{\partial g_1}{\partial x_1}, \ldots, ε \frac{\partial g_1}{\partial x_n}, ε \frac{\partial^2 g_1}{\partial x_1 \partial x_1}, \ldots\right) = f^µ_i(t, µ_1, \ldots, µ_m) + O(ε),    (F.20)

for x ∈ Ω and i ∈ {1, 2, ..., m}. Thus,

    \frac{\partial u_i(x, t)}{\partial t} = \frac{dµ_i(t)}{dt} + ε \frac{\partial g_i(x, t)}{\partial t} = f^µ_i(t, µ_1, \ldots, µ_m) + O(ε),    (F.21)

for x ∈ Ω and i ∈ {1, 2, ..., m}. Hence, to leading order,

    \frac{dµ_i(t)}{dt} = f^µ_i(t, µ_1, \ldots, µ_m),    (F.22)

for x ∈ Ω and i ∈ {1, 2, ..., m}. Therefore, the global behavior of a near-homogeneous process is described to leading order by a system of ordinary differential equations, the spatially-homogeneous reduction of the system's partial differential equation description.

F.4.2 Finding Spatially Near-Homogeneous Data

I assume that near-homogeneous MinD and MinE densities at consecutive measurements result from near-homogeneous MinD and MinE density evolutions. MinD and MinE densities are linear combinations of unobserved state densities; I assume that near-homogeneous MinD and MinE densities result from linear combinations of near-homogeneous unobserved state densities. Therefore, as discussed in Section F.4.1, I model sequentially near-homogeneous MinD and MinE densities by spatially-homogeneous partial differential equation model reductions.

Classifying near-homogeneous MinD and MinE densities requires a measure of data homogeneity, or, alternatively, a measure of data inhomogeneity. I measure data relative inhomogeneity, h, by the ratio of the data standard deviation to the data mean, the ratio of data variation to data uniformity. For strictly positive data, h ≥ 0, and h = 0 if and only if the data is constant. Thus, h is a well-defined measure of MinD and MinE density inhomogeneity. For density data c^{k,l}, from the lth data frame of the kth temporal partition, I calculate the relative inhomogeneity of c^{k,l} over D(i, j, r), a disk with center at the middle of the (i, j)th pixel and radius r:

    h_r ∘ c^{k,l}_{i,j} = h(c^{k,l}|_{D(i,j,r)}) = \left(µ_r ∘ c^{k,l}_{i,j}\right)^{−1} \left( \sum_{m=−r}^{r} \sum_{n=−r}^{r} \frac{ρ_{i+m,j+n}}{\pi r^2} \left( c^{k,l}_{i+m,j+n} − µ_r ∘ c^{k,l}_{i,j} \right)^2 \right)^{1/2},    (F.23a)

with mean value of c^{k,l} over D(i, j, r),

    µ_r ∘ c^{k,l}_{i,j} = µ(c^{k,l}|_{D(i,j,r)}) = \sum_{m=−r}^{r} \sum_{n=−r}^{r} \frac{ρ_{i+m,j+n}}{\pi r^2} c^{k,l}_{i+m,j+n},    (F.23b)

where c^{k,l}_{i+m,j+n} is the value of c^{k,l} at the (i+m, j+n)th pixel, and ρ_{i+m,j+n} is the fraction of the (i+m, j+n)th pixel contained within D(i, j, r).

MinD and MinE densities appear to globally vary over areas of hundreds to hundreds of thousands of pixels, with local pixel-to-pixel variation. Local pixel-to-pixel variation increases relative inhomogeneity, obscuring measurements of underlying global density variation. Surfaces are locally well approximated by tangent planes. Thus, I locally fit MinD and MinE densities by planes to approximate global MinD and MinE density surfaces. For density data c^{k,l}, I calculate the plane over a disk D(i, j, r) that fits c^{k,l} best in the least squares sense, and determine the value of the plane at the middle of the (i, j)th pixel:

    p(c^{k,l}|_{D(i,j,r)}) = \hat{β}_x (x − i) + \hat{β}_y (y − j) + \hat{β}_0,    (F.24a)

    (\hat{β}_x, \hat{β}_y, \hat{β}_0) = \arg\min_{(β_x, β_y, β_0)} \sum_{m=−r}^{r} \sum_{n=−r}^{r} ρ_{i+m,j+n} \left( β_x m + β_y n + β_0 − c^{k,l}_{i+m,j+n} \right)^2,    (F.24b)

    g_r ∘ c^{k,l}_{i,j} = g(c^{k,l}|_{D(i,j,r)}) = p(c^{k,l}|_{D(i,j,r)})\big|_{(x,y)=(i,j)} = \hat{β}_0,    (F.24c)

where ρ_{i+m,j+n} is the fraction of the (i+m, j+n)th pixel contained within D(i, j, r); I determine \hat{β}_x, \hat{β}_y, and \hat{β}_0 by solving the weighted normal equations.
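The disk-based operators (F.23) and (F.24) reduce to weighted sums and a small weighted least-squares solve. A minimal MATLAB sketch (a function one might save as disk_stats.m, a hypothetical name) that approximates the exact pixel-fraction weights ρ by a binary disk mask and the πr² normalization by the mask sum (my simplifications):

    % A minimal sketch of the disk operators (F.23)-(F.24). C is a 2-D density
    % frame, (i, j) the disk center, r the radius. The exact pixel-fraction
    % weights rho are approximated by a binary disk mask (a simplification).
    function [hr, gr] = disk_stats(C, i, j, r)
        R = ceil(r);
        [M, N] = ndgrid(-R:R, -R:R);              % row and column offsets
        w = double(M.^2 + N.^2 <= r^2);           % binary stand-in for rho
        patch = C(i-R:i+R, j-R:j+R);
        mu = sum(w(:) .* patch(:)) / sum(w(:));   % disk mean, cf. (F.23b)
        hr = sqrt(sum(w(:) .* (patch(:) - mu).^2) / sum(w(:))) / mu;  % cf. (F.23a)
        sw = sqrt(w(:));                          % weight rows for least squares
        X  = [M(:), N(:), ones(numel(M), 1)];
        beta = (X .* sw) \ (sw .* patch(:));      % weighted least squares, cf. (F.24b)
        gr = beta(3);                             % planar-fit value at (i, j), (F.24c)
    end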
I choose the local density planar-fitting disk radius, r̊, such that D(i, j, r̊) covers 100 square pixels, the lower end of the visible MinD and MinE global density variation scale. I choose the relative inhomogeneity disk radius, r̄, such that D(i, j, r̄) covers 1000 square pixels, an order of magnitude more pixels than D(i, j, r̊) covers. Thus, r̊ = (100/π)^{1/2} pixels ≈ 5.6 pixels and r̄ = (1000/π)^{1/2} pixels ≈ 17.8 pixels. In an experiment similar to that of Ivanov and Mizuuchi, Loose et al. measured the motilities of MinD and MinE in traveling protein waves on the lipid bilayer. They found that motilities varied along the wave, but were well characterized by diffusion in sections of the wave, with maximal MinD and MinE diffusion coefficients of 0.374 ± 0.022 µm^2 s^{−1} and 0.320 ± 0.023 µm^2 s^{−1} [44]. Thus, maximal MinD and MinE root mean squared displacements, over the time between measurements, 3 s, are approximately (4 · 0.374 µm^2 s^{−1} · 3 s)^{1/2} · 6 pixels µm^{−1} ≈ 12.7 pixels and (4 · 0.320 µm^2 s^{−1} · 3 s)^{1/2} · 6 pixels µm^{−1} ≈ 11.8 pixels. Therefore, r̄ is on the scale of MinD and MinE mobilities between measurements. I calculate the relative inhomogeneity of planar-fit MinD and MinE density data:

    h_{r̄} ∘ g_{r̊} ∘ d^{k,l}_{i,j} and h_{r̄} ∘ g_{r̊} ∘ e^{k,l}_{i,j},    (F.25)

for i ∈ {R̄+1, R̄+2, ..., 485−R̄} and j ∈ {R̄+1, R̄+2, ..., 247−R̄}, where R̄ = ⌈r̄ + r̊ − 1/2⌉, the ceiling of r̄ + r̊ − 1/2, and for k ∈ {0, 1, ..., 10} and l ∈ {1, 2, ..., n(P_k)}.

During temporal partition P_0, MinD and MinE densities consist mainly of bulk proteins in the well-mixed, flowed solution buffer. Thus, to determine basal relative inhomogeneity scales, I calculate mean relative inhomogeneity values of planar-fit MinD and MinE density data, centered at mid-data pixel (243, 124), from image 122 through image 204 of temporal partition P_0, the images of bulk fluorescence intensity estimation (as discussed in Section F.3.3); relative inhomogeneity values are shown in Figure F.13. I find mean MinD and MinE relative inhomogeneity values µ^d_h = 0.0837 and µ^e_h = 0.1148, with standard deviations of σ^d_h = 0.0129 and σ^e_h = 0.0188.

Figure F.13: Relative inhomogeneity values of planar-fit MinD and MinE density data in temporal partition P_0, h_{r̄} ∘ g_{r̊} ∘ d^{k,l}_{i,j} and h_{r̄} ∘ g_{r̊} ∘ e^{k,l}_{i,j} for (i, j) = (243, 124), k = 0, and l ∈ {122, 123, ..., 204}. Mean values are shown with dashed lines.

To find planar-fit MinD and MinE density data that is uniformly homogeneous, on basal relative inhomogeneity scales, throughout a temporal partition, I measure the scaled sum of maximal MinD and MinE relative inhomogeneity values in each temporal partition:

    H^k_{i,j} = \frac{\max\{h_{r̄} ∘ g_{r̊} ∘ d^{k,l}_{i,j} : l ∈ \{1, \ldots, n(P_k)\}\}}{µ^d_h} + \frac{\max\{h_{r̄} ∘ g_{r̊} ∘ e^{k,l}_{i,j} : l ∈ \{1, \ldots, n(P_k)\}\}}{µ^e_h},    (F.26)

for i ∈ {R̄+1, R̄+2, ..., 485−R̄}, j ∈ {R̄+1, R̄+2, ..., 247−R̄}, and k ∈ {0, 1, ..., 10}. Values of H^k_{i,j} are shown in Figure F.14 for temporal partitions P_0 (images 122 through 204), P_5, and P_9.

Figure F.14: Scaled sums of maximal MinD and MinE relative inhomogeneity values in temporal partitions P_0 (images 122 through 204), P_5, and P_9. H^k_{i,j} are shown for k = 0 in (a), k = 5 in (b), and k = 9 in (c), for i ∈ {R̄+1, R̄+2, ..., 485−R̄} and j ∈ {R̄+1, R̄+2, ..., 247−R̄}. Values increase with gradation from black, with a value of 1.89, to white, with a value of 24.1.
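Given precomputed relative inhomogeneity values, equation (F.26) is a pointwise maximum over frames followed by scaling. A minimal sketch, assuming hgD and hgE (hypothetical names) are (i, j, l)-indexed arrays of h_{r̄} ∘ g_{r̊} values for one temporal partition and mu_hd, mu_he are the basal means:

    % A minimal sketch of equation (F.26): scaled sums of frame-wise maximal
    % relative inhomogeneity values for one temporal partition.
    Hk = max(hgD, [], 3) / mu_hd + max(hgE, [], 3) / mu_he;
    [Hmin, idx]  = min(Hk(:));            % most uniformly homogeneous location
    [imin, jmin] = ind2sub(size(Hk), idx);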
As is visible in Figure F.14: in temporal partition P_0, consisting mainly of bulk flow, planar-fit MinD and MinE densities are uniformly homogeneous at all spatial locations; in temporal partition P_5, during a MinD and MinE density pulse, planar-fit MinD and MinE densities are uniformly homogeneous at some spatial locations; and in temporal partition P_9, during MinD and MinE density traveling waves, planar-fit MinD and MinE densities are uniformly inhomogeneous at all spatial locations.

The minimum value of H^k_{i,j} occurs at the (436, 204)th pixel of temporal partition P_5, H^5_{436,204} = 1.89. Relative inhomogeneity values of planar-fit MinD and MinE density data, at pixel (436, 204), during temporal partition P_5, are shown in Figure F.15. For comparison, relative inhomogeneity values of planar-fit MinD and MinE density data, at mid-data pixel (243, 124), during temporal partition P_9, are also shown in Figure F.15.

Figure F.15: Relative inhomogeneity values of planar-fit MinD and MinE density data. h_{r̄} ∘ g_{r̊} ∘ d^{k,l}_{i,j} and h_{r̄} ∘ g_{r̊} ∘ e^{k,l}_{i,j} are shown, for (i, j) = (436, 204), k = 5, and l ∈ {1, 2, ..., 170}, in (a), and for (i, j) = (243, 124), k = 9, and l ∈ {1, 2, ..., 138}, in (b).

Comparatively,

    max{h_{r̄} ∘ g_{r̊} ∘ d^{5,l}_{436,204} : l ∈ {1, 2, ..., n(P_5)}} = 0.87 · µ^d_h,
    max{h_{r̄} ∘ g_{r̊} ∘ e^{5,l}_{436,204} : l ∈ {1, 2, ..., n(P_5)}} = 1.02 · µ^e_h,    (F.27a)

    max{h_{r̄} ∘ g_{r̊} ∘ d^{9,l}_{243,124} : l ∈ {1, 2, ..., n(P_9)}} = 5.39 · µ^d_h,
    max{h_{r̄} ∘ g_{r̊} ∘ e^{9,l}_{243,124} : l ∈ {1, 2, ..., n(P_9)}} = 2.59 · µ^e_h.    (F.27b)

Thus, relative inhomogeneity values of planar-fit MinD and MinE density data, centered at pixel (436, 204), during temporal partition P_5, do not significantly exceed basal relative inhomogeneity scales, whereas relative inhomogeneity values of planar-fit MinD and MinE density data, centered at pixel (243, 124), during temporal partition P_9, significantly exceed basal relative inhomogeneity scales. The maximum relative inhomogeneity value of planar-fit MinD density data, centered at pixel (436, 204), during the MinD density pulse upstroke in P_5, occurs at frame 21; the maximum relative inhomogeneity value of planar-fit MinD density data, centered at pixel (243, 124), during temporal partition P_9, occurs at frame 52, when the MinD density traveling wave front passes through D(243, 124, r̄). For comparison, planar-fit MinD density data, over D(436, 204, r̄) at frame 21 of temporal partition P_5 and over D(243, 124, r̄) at frame 52 of temporal partition P_9, are shown in Figure F.16.

Figure F.16: Planar-fit MinD density data. Planar-fit MinD density data over D(436, 204, r̄) at frame 21 of temporal partition P_5 is shown in (a), and planar-fit MinD density data over D(243, 124, r̄) at frame 52 of temporal partition P_9 is shown in (b). Density data is overlaid on top of planar-fit density data with reduced opacity.

I consider planar-fit MinD and MinE density data over D(436, 204, r̄) during P_5 to be near-homogeneous. Thus, I consider MinD and MinE density data over D(436, 204, r̄) during P_5 to represent near-homogeneous processes with local pixel-to-pixel noise, and I approximate spatially homogeneous MinD and MinE density data by mean MinD and MinE density data over D(436, 204, r̄) during P_5,

    d_l = µ_{r̄} ∘ d^{5,l}_{436,204},    (F.28a)
    e_l = µ_{r̄} ∘ e^{5,l}_{436,204},    (F.28b)

for l ∈ {1, 2, ..., n(P_5)}.
Spatially near-homogeneous MinD and MinE density data profiles, d_l and e_l for l ∈ {1, 2, ..., n(P_5)}, are shown in Figure F.17.

Figure F.17: Spatially near-homogeneous MinD and MinE density data profiles, d_l and e_l for l ∈ {1, 2, ..., n(P_5)}.

F.4.3 Errors in Spatially Near-Homogeneous Data

To show the spreads of MinD and MinE density data over D(436, 204, r̄) during P_5, I plot normalized histograms of MinD and MinE density data over D(436, 204, r̄) for each frame of P_5 in Figure F.18.

Figure F.18: The spreads of MinD and MinE density data over D(436, 204, r̄) during P_5. Normalized histograms of density data over D(436, 204, r̄) for each frame of P_5 are shown for MinD in (a) and MinE in (b). Each histogram is normalized by the maximum count in the histogram. Gradation is from white, with a count of 0, to black, with the maximum count in the histogram.

I approximate spatially near-homogeneous density data by mean density data. Thus, I calculate errors in approximating spatially near-homogeneous data by the standard error of the mean; for MinD and MinE density data over D(436, 204, r̄) during P_5,

    d^σ_l = (π r̄^2)^{−1/2} \left( \sum_{m=−r̄}^{r̄} \sum_{n=−r̄}^{r̄} \frac{ρ_{436+m,204+n}}{π r̄^2} \left( d^{5,l}_{436+m,204+n} − µ_{r̄} ∘ d^{5,l}_{436,204} \right)^2 \right)^{1/2},    (F.29a)

    e^σ_l = (π r̄^2)^{−1/2} \left( \sum_{m=−r̄}^{r̄} \sum_{n=−r̄}^{r̄} \frac{ρ_{436+m,204+n}}{π r̄^2} \left( e^{5,l}_{436+m,204+n} − µ_{r̄} ∘ e^{5,l}_{436,204} \right)^2 \right)^{1/2},    (F.29b)

for l ∈ {1, 2, ..., n(P_5)}, where ρ_{i+m,j+n} is the fraction of the (i+m, j+n)th pixel contained within D(i, j, r̄). I show densities within d_l ± d^σ_l and e_l ± e^σ_l in Figure F.19.

Figure F.19: Densities within error of spatially near-homogeneous data. Densities within d_l ± d^σ_l are shown in green, and densities within e_l ± e^σ_l are shown in red, for l ∈ {1, 2, ..., n(P_5)}.

Interestingly, I find that d^σ_l and e^σ_l are related to d_l and e_l by power laws, d^σ_l ≈ k_d d_l^{r_d} and e^σ_l ≈ k_e e_l^{r_e}, for constants k_d, r_d, k_e, and r_e. I fit log(k_d) + r_d log(d_l) to log(d^σ_l) and log(k_e) + r_e log(e_l) to log(e^σ_l), for l ∈ {1, 2, ..., n(P_5)}, using least squares, as shown in Figure F.20, to find k_d = 0.153, r_d = 0.570, k_e = 0.330, and r_e = 0.532.

Figure F.20: Estimating power laws in errors. The least squares fit of log(k_d) + r_d log(d_l) to log(d^σ_l) is shown in (a) and the least squares fit of log(k_e) + r_e log(e_l) to log(e^σ_l) is shown in (b), for l ∈ {1, 2, ..., n(P_5)}. Log-errors are shown with points and fits are shown with lines.

I plot d^σ_l with k_d d_l^{r_d} and e^σ_l with k_e e_l^{r_e}, with fit k_d, r_d, k_e, and r_e, in Figure F.21.

Figure F.21: Spatially near-homogeneous data errors and power law approximations. d^σ_l and k_d d_l^{r_d} are shown in (a), and e^σ_l and k_e e_l^{r_e} are shown in (b), for l ∈ {1, 2, ..., n(P_5)}.
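Each power-law fit amounts to linear least squares in log-log space. A minimal sketch, assuming column vectors d and dsig (hypothetical names) hold d_l and d^σ_l over the frames of P_5:

    % A minimal sketch of the log-log power-law fit for the MinD errors;
    % the MinE fit is identical with e and esig in place of d and dsig.
    pd  = polyfit(log(d), log(dsig), 1);   % linear fit in log-log space
    r_d = pd(1);                           % slope: power-law exponent
    k_d = exp(pd(2));                      % intercept: log(k_d)
    dsig_fit = k_d * d.^r_d;               % power-law approximation of errors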
F.4.4 Bounding Persistent and Bulk Densities

In MinD and MinE density data, densities higher than those of surrounding areas persist over time in some small regions, a phenomenon likely related to the experimental observation that, in the absence of MinE, a fraction of lipid bilayer-bound MinD resists membrane dissociation when washed with buffer, as discussed in the Ivanov and Mizuuchi Supporting Information. MinD and MinE densities consist of bulk densities, persistent lipid bilayer-bound densities, and transient lipid bilayer-bound densities. Thus, I include terms that account for bulk and persistent lipid bilayer-bound densities in mathematical models.

MinD and MinE pulse-train density data, generated by calculating mean values of MinD and MinE density data over D(436, 204, r̄) during temporal partitions P_4, P_5, P_6, and P_7, is shown in Figure F.22.

Figure F.22: MinD and MinE pulse-train density data. Temporal indexing corresponds to frame indexing in temporal partition P_5, as in Figure F.17.

As discussed previously, the data generating the MinD and MinE pulse-train densities is near-homogeneous during P_5. It is also near-homogeneous during P_4, P_6, and P_7, except during MinD pulse upstrokes (with maximum planar-fit MinD relative inhomogeneity values of 2.17 · µ^d_h, 3.10 · µ^d_h, and 5.24 · µ^d_h and maximum planar-fit MinE relative inhomogeneity values of 1.10 · µ^e_h, 1.20 · µ^e_h, and 2.28 · µ^e_h). As is visible in Figure F.22, consecutive MinD and MinE density pulses are similar in dynamic behavior, apart from relatively small differences in peak density values and pulse periods, which may result from spatial asymmetries during pulse upstrokes. Repeatedly in the pulse train, MinD and MinE densities sharply increase during pulse upstrokes, peak, sharply decrease, slowly decay, level off to minimal values, then slightly increase before the next pulse upstroke. I find minimal MinD and MinE density data between successive MinD pulse-train density peaks: the MinD and MinE density data with the least means over 17 consecutive data indices, 1/10 the number of data indices in P_5. Between the first and second, second and third, and third and fourth MinD pulse-train density peaks, minimal MinD and MinE densities are roughly constant, with minimal MinD density means of 277.16 µm^{−2}, 302.79 µm^{−2}, and 317.88 µm^{−2}, minimal MinD density standard deviations of 3.18 µm^{−2}, 4.59 µm^{−2}, and 4.18 µm^{−2}, minimal MinE density means of 240.68 µm^{−2}, 243.20 µm^{−2}, and 249.25 µm^{−2}, and minimal MinE density standard deviations of 6.58 µm^{−2}, 6.99 µm^{−2}, and 7.68 µm^{−2}. Mean minimal MinD and MinE densities increase during the pulse train, but by relatively insignificant amounts on the scales of MinD and MinE density pulses.
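The minimal-density search is a moving-window computation. A minimal sketch for one inter-peak segment, assuming the column vector d (a hypothetical name) holds pulse-train MinD densities between two successive peaks:

    % A minimal sketch of the minimal-density search between two successive
    % MinD pulse-train density peaks.
    w      = 17;                                      % 1/10 of n(P_5)
    mu     = movmean(d, w, 'Endpoints', 'discard');   % means of all full windows
    [m, i] = min(mu);                                 % least windowed mean, start index
    dmin   = d(i : i + w - 1);                        % the minimal 17-frame segment
    sd     = std(dmin);                               % its standard deviation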
Bulk MinD and MinE densities are roughly constant during the pulse train. Given the similarities in dynamic behavior and minimal values of successive MinD and MinE density pulses, persistent and transient lipid bilayer-bound MinD and MinE densities likely attain similar minimal values during successive pulses of the pulse train. Thus, as persistent lipid bilayer-bound densities are inherently temporally non-decreasing, persistent lipid bilayer-bound MinD and MinE densities are likely roughly constant, on the scales of MinD and MinE density pulses, during temporal partition P_5. As such, for spatially near-homogeneous MinD and MinE density data, I model the sums of bulk and persistent lipid bilayer-bound MinD and MinE densities as constants, C_d and C_e. MinD and MinE densities consist of bulk densities, persistent lipid bilayer-bound densities, and transient lipid bilayer-bound densities. Thus, minimal MinD and MinE densities are upper bounds of the sums of bulk and persistent lipid bilayer-bound MinD and MinE densities. As such, I impose upper bounds on C_d and C_e, as the maximum values of minimal MinD and MinE density means, 317.88 µm^{−2} and 249.25 µm^{−2}.

Appendix G

Implementation of Overlapping-Niche Descent for Near-Homogeneous Data Fitting

Here, I describe details pertaining to the implementation of overlapping-niche descent for model fitting to the near-homogeneous data. I describe related structural components of overlapping-niche descent in Section 4.4.

G.1 Generating Random Parameter and State Values

Initially in overlapping-niche descent, I randomly generate parameters and state values. Also, as discussed in Section C.1, throughout overlapping-niche descent, I randomly generate parameters and state values in random offspring. In accordance with bounds on C_d, C_e, and c_d̄ (4.16) and (4.17), I randomly generate values of C_d, C_e, and c_d̄ such that

    C_d ∼ 317.88 · U(0, 1) µm^{−2},    (G.1a)
    C_e ∼ 249.25 · U(0, 1) µm^{−2},    (G.1b)
    c_d̄ ∼ 158.94 · U(0, 1) µm^{−2},    (G.1c)

where U(a, b) is the uniform probability distribution over the interval (a, b). I expect that c_max is within one or two orders of magnitude of half the maximal near-homogeneous MinD density value, D_max/2. Thus, in accordance with bounds on c_max (4.18), I randomly generate c_max such that

    c_max ∼ D_max/2 · 10^{U(0,2)}.    (G.2)

Additionally, I expect that c_s is within one or two orders of magnitude of D_max/2. Thus, in accordance with bounds on c_s (4.19), I randomly generate c_s such that

    c_s ∼ D_max/2 · 10^{U(−2,0)}.    (G.3)

Given no prior parameter value estimates, I randomly generate rate parameters over a broad range of scales:

    p ∼ 10^{U(−9,1)} u_p for all p ∈ {ω^z_{u,v→x,y} : u, v, x, y, z ∈ {∅, D, E, d, de, ede, ded, e}},    (G.4)

where u_p is the units of parameter p.

I choose random state values to match near-homogeneous data exactly. For near-homogeneous MinD and MinE data values, D and E, D̄ = (D − C_d)/2, and Ē = (E − C_e)/2, with c_e as a free state, I generate random state values for the modified Bonny et al. model and the extended Bonny et al. model such that

    c_e ∼ U(max{0, Ē − D̄}, Ē),
    c_de = Ē − c_e,
    c_d = D̄ − Ē + c_e.    (G.5)

With c_ede and c_e as free states, I generate random state values for the symmetric activation model such that

    c_ede, c_e ∼ U({c_ede + c_e ≥ Ē − D̄, 2c_ede + c_e ≤ Ē : c_ede ≥ 0, c_e ≥ 0}),
    c_de = Ē − 2c_ede − c_e,
    c_d = D̄ − Ē + c_ede + c_e,    (G.6)

where U({·}) is the uniform probability distribution over the set {·}. With c_ded and c_e as free states, I generate random state values for the asymmetric activation model such that

    c_ded, c_e ∼ U({c_e − c_ded ≥ Ē − D̄, c_ded + c_e ≤ Ē : c_ded ≥ 0, c_e ≥ 0}),
    c_de = Ē − c_ded − c_e,
    c_d = D̄ − Ē − c_ded + c_e.    (G.7)
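The draws (G.1) through (G.5) can be realized with rand, using the fact that a U(a, b) sample is a + (b − a) · rand. A minimal sketch for one random individual, assuming Dmax, D, E, and the bounds above are in scope (variable names are illustrative):

    % A minimal sketch of the random draws (G.1)-(G.5) for one individual.
    Cd    = 317.88 * rand;            % (G.1a), µm^-2
    Ce    = 249.25 * rand;            % (G.1b), µm^-2
    cmax  = Dmax/2 * 10^(2 * rand);   % (G.2): exponent ~ U(0, 2)
    cs    = Dmax/2 * 10^(-2 * rand);  % (G.3): exponent ~ U(-2, 0)
    omega = 10^(-9 + 10 * rand);      % (G.4): one rate, exponent ~ U(-9, 1)
    Dbar  = (D - Cd) / 2;  Ebar = (E - Ce) / 2;
    lo    = max(0, Ebar - Dbar);      % (G.5): ce ~ U(lo, Ebar)
    ce    = lo + (Ebar - lo) * rand;
    cde   = Ebar - ce;
    cd    = Dbar - Ebar + ce;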
G.2 Parents and Offspring

I choose parents and offspring as in Section E.1.2. For convenience, I repeat the discussion from Section E.1.2 below. The function of parents and offspring in overlapping-niche descent is described in Section C.1. Accordingly, to the ith niche in generation g, I allocate one sustained parent, \hat{n}_i = 1, one high-momentum offspring, \check{n}^m_{g,i} = 1, one cross-niche offspring, \check{n}^c_{g,i} = 1, and one random offspring, \check{n}^r_{g,i} = 1, for each i ∈ {1, 2, ..., 101} and each generation of overlapping-niche descent, g ≥ 1. In the first two generations of overlapping-niche descent, g ≤ 2, I allocate two sexual offspring to each niche, \check{n}^s_{g,i} = 2 for all i ∈ {1, 2, ..., 101}. After the second generation of overlapping-niche descent, I adaptively change the number of sexual offspring that I allocate to each niche, enlarging less convergent niches and shrinking more convergent niches for greater efficiency in optimization. Specifically, I allocate one sexual offspring to the ith niche, and randomly allocate the remaining 101 sexual offspring to the ith niche with probability proportional to ∆r_{g,i,1}, the measure of convergence in the (first) parent space of the ith niche in generation g, as defined in equation (C.1), for each i ∈ {1, 2, ..., 101} and g > 2.

G.3 Selection and Random Perturbation

I choose the natural default value for the selection strength parameter, q_fit = 1, for q_fit as described in Section C