Advances in Meta-algorithmic Software Libraries for Distributed Automated Algorithm Configuration

by

Stephen Edward Andrew Ramage

B.Sc., The University of British Columbia, 2012

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in The Faculty of Graduate and Postdoctoral Studies (Computer Science)

The University of British Columbia (Vancouver)

April 2015

© Stephen Edward Andrew Ramage, 2015

Abstract

A meta-algorithmic procedure is a computer procedure that operates upon another algorithm and its associated design space to produce another algorithm with desirable properties (e.g., faster runtime, better solution quality, etc.; see, e.g., Hoos [2008]). Many meta-algorithmic procedures have runtimes that are dominated by the runtime of the algorithm being operated on. This holds in particular for automatic algorithm configurators, such as ParamILS, SMAC, and GGA, which serve to optimize the design (expressed through user-settable parameters) of an algorithm under certain use cases. Consequently, one can gain improved performance of the meta-algorithm if evaluations of the algorithm under study can be done in parallel. In this thesis, we explore a distributed version of the automatic configurator SMAC, called pSMAC, and the library, AEATK, that it was built upon, which has proved general and versatile enough to support many other meta-algorithmic procedures.

Preface

I participated in the design of pSMAC along with my co-collaborators Frank Hutter, Kevin Leyton-Brown and Holger H. Hoos. For AEATK, I primarily designed and implemented it, but its evolution and development were driven by the requirements of myself and other users in the lab, especially Alexandre Fréchette and Frank Hutter. Some other contributions of note:

1. Daniel Geschwender worked on implementing some features for the MySQL and HAL Target Algorithm Evaluators.
2. Alexandre Fréchette helped implement several features and contributed many bug fixes.
3. Christopher Thornton provided several bug fixes.
4. Frank Hutter provided the initial code that everything grew out of, and most of the abstractions within it are more concrete formalisms of his previous work on SMAC and ParamILS. Additionally, some text in here has been taken from documents we wrote together.
5. Christopher Fawcett provided some bug fixes, as well as guidance based on his experience with HAL.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
1.1 Contributions
1.2 Thesis Organization
2 Background
2.1 Introduction
2.2 Meta-Algorithmic Libraries
2.3 Algorithm Configuration
2.3.1 Existing Approaches
2.3.2 Sequential Model Based Optimization Approaches
3 AEATK
3.1 Introduction
3.1.1 Motivation & Related Work
3.1.2 Design Goals
3.1.3 Algorithm Execution Model
3.1.4 Sample Use Cases for Target Algorithm Execution
3.2 Domain Objects
3.2.1 Parameter Configuration Spaces
3.2.2 ProblemInstance & ProblemInstanceSeedPair
3.2.3 AlgorithmExecutionConfiguration
3.2.4 AlgorithmRunConfiguration
3.2.5 AlgorithmRunResult
3.2.6 Differences Between the Object and Conceptual Models
3.3 Target Algorithm Evaluator API
3.3.1 Execution Options
3.4 Local Execution
3.4.1 Target Algorithm Invocation
3.4.2 Concurrent Execution
3.4.3 Target Algorithm Observation & Termination
3.5 Distributed Execution
3.5.1 Architecture
3.5.2 Distributed vs Local Execution Performance
3.6 Modification & Customization
3.6.1 New Implementations
3.6.2 Existing Implementations
3.6.3 Execution Customization
3.7 Easy Distribution
3.7.1 Command-line Interface
3.7.2 Output
3.8 Conclusions and Limitations
4 SMAC in AEATK
4.1 Introduction
4.2 Algorithm Overview
4.3 AEATK Implementation
4.4 Differences Between MATLAB and Java Implementations
4.5 Experimental Comparison
4.5.1 Computational Environment
4.5.2 Results
4.5.3 Conclusions & Future Work
5 Parallel Automatic Configuration
5.1 Introduction
5.1.1 Parallel Independent Runs
5.1.2 Parallel Automatic Configurator
5.1.3 Parallel Dependent Runs
5.2 pSMAC
5.3 Experiments
5.3.1 Experiment I - Performance Over Time
5.3.2 Results
5.4 Experiment II - CSSC Performance
5.4.1 Experimental Setup
5.4.2 Results
5.5 Future Extensions
Bibliography
A AEATK Decorators
A.1 Debug Decorators
A.2 Functionality Decorators
A.3 Helper Decorators
A.4 Resource Decorators
A.5 Safety Decorators

List of Tables

Table 3.1: System configuration for performance comparison between distributed and local execution.

Table 4.1: Configuration scenarios for MATLAB and Java SMAC implementations. Additionally, all of the multi-instance scenarios used the continuous parameter configuration spaces. For single-instance scenarios involving the SPEAR solver, the configuration space was discretized.

Table 4.2: System configuration for all configuration scenarios.

Table 4.3: Median training performance of MATLAB and Java implementations of SMAC, along with the default performance for single-instance scenarios from Hutter et al. [2011b]. The number of independent runs for each is denoted in the # column. Some runs of MATLAB did not complete successfully. Bold-faced entries represent configurator performance that is significantly better under a Mann-Whitney U test with a significance level of 0.01 (a Bonferroni correction was applied, so the individual p-values are less than 6.667 × 10⁻⁴).

Table 4.4: Median performance of MATLAB and Java implementations of SMAC, along with the default performance for multi-instance scenarios from Hutter et al. [2011b]. The number of independent runs for each is denoted in the # column. Some runs of MATLAB did not complete successfully. Bold-faced entries represent configurator performance that is significantly better under a Mann-Whitney U test with a significance level of 0.01 (a Bonferroni correction for 15 samples was applied, so the individual p-values are less than 6.667 × 10⁻⁴).

Table 5.1: Median training performance of SMAC and pSMAC implementations. For Best of SMACx4, we performed bootstrap sampling of 4 runs from the # column, computed the minimum, and then the median of 1000 samples. For the pSMACx4 run, we grouped runs into batches of 4 for sharing runs, and took # independent samples from this. Bold-faced entries represent configurator performance that is significantly better under a Mann-Whitney U test with a significance level of 0.01 (because of the repeated testing, we applied a Bonferroni correction and adjusted the individual p-values used for significance of the test to less than 6.667 × 10⁻⁴).
Table 5.2: Median training performance of SMAC and pSMAC implementations. For Best of SMACx8, we performed bootstrap sampling of 8 runs from the # column, computed the minimum, and then the median of 1000 samples. For the pSMACx8 run, we grouped runs into batches of 8 for sharing runs, and took # independent samples from this. Bold-faced entries represent configurator performance that is significantly better under a Mann-Whitney U test with a significance level of 0.01 (because of the repeated testing, we applied a Bonferroni correction and adjusted the individual p-values used for significance of the test to less than 6.667 × 10⁻⁴).

List of Figures

Figure 3.1: Goals of Selected Meta-Algorithmic Software Libraries
Figure 3.2: Classes related to the configuration space of a target algorithm
Figure 3.3: Classes that represent Problem Instances
Figure 3.4: Class that represents the physical information needed to execute an algorithm
Figure 3.5: Class that represents all information needed to execute a target algorithm
Figure 3.6: Class that represents the result of a target algorithm execution
Figure 3.7: The TargetAlgorithmEvaluator interface
Figure 3.8: Standard Execution Sequence Diagram
Figure 3.9: UML Diagram for Observer class
Figure 3.10: Sequence diagram outlining how observation interacts between classes
Figure 3.11: UML Diagram describing Callback class
Figure 3.12: Sequence diagram outlining asynchronous callbacks
Figure 3.13: TAE Queue Facade UML
Figure 3.14: TAE Queue Facade Sequence Diagram
Figure 3.15: Components involved in executing a target algorithm
Figure 3.16: Target algorithm execution with runsolver
Figure 3.17: MySQL Target Algorithm Evaluator architecture
Figure 3.18: Sharing of runs between applications
Figure 3.19: Processing time versus cutoff time for the CommandLineTargetAlgorithmEvaluator
Figure 3.20: Processing time versus cutoff time for the MySQLTargetAlgorithmEvaluator
Figure 3.21: Overhead for the CommandLineTargetAlgorithmEvaluator
Figure 3.22: Overhead for the MySQLTargetAlgorithmEvaluator
Figure 3.23: UML for new TargetAlgorithmEvaluator implementations
Figure 3.24: Sequence diagram showing how CRASHED runs can be dealt with
Figure 4.1: Sequence diagram depicting interactions between the main classes in the Java implementation of SMAC
Figure 4.2: Median Performance and Interquartile range for SAPS Single Instance Scenarios
Figure 4.3: Median Performance and Interquartile range for SPEAR Single Instance Scenarios
Figure 4.4: Median Performance and Interquartile range for Multi-Instance Scenarios
Figure 5.1: Median Performance and Interquartile range for SAPS Single Instance Scenarios using 4 cores
Figure 5.2: Median Performance and Interquartile range for SPEAR Single Instance Scenarios using 4 cores
Figure 5.3: Median Performance and Interquartile range for Multi-Instance Scenarios using 4 cores
Figure 5.4: Median Performance and Interquartile range for SAPS Single Instance Scenarios using 8 cores
Figure 5.5: Median Performance and Interquartile range for SPEAR Single Instance Scenarios using 8 cores
Figure 5.6: Median Performance and Interquartile range for Multi-Instance Scenarios using 8 cores
Figure 5.7: Configurator Performance in CSSC 2014 Competition Data
Figure 5.8: Configurator Variance. The final point in the plot, where SMAC's variance sky-rockets, is due to some missing data for SMAC.
Figure 5.9: Core sharing architecture
Figure A.1: Utilization of the MySQLTargetAlgorithmEvaluator over time with a batch size of 32 and a cutoff time of 4 seconds

List of Algorithms

Algorithm 4.1

Acknowledgements

There are many people that helped me get to the end of this process and that I am consequently indebted to. In no particular order, I would like to thank my supervisors, Holger H. Hoos and Kevin Leyton-Brown, and especially my unofficial third supervisor, Frank Hutter. Secondly, my lab mates Chris Fawcett, Alexandre Fréchette, and Chris Thornton. Next, I'd like to thank my instructors and the department as a whole for being supportive. Fourthly, my friends: Elisabeth, Karioka, Marcin, Marek, Pawel, Sarah, and Susan. Lastly, my parents and grandmother.

Chapter 1: Introduction

A meta-algorithmic procedure is an algorithm that uses other algorithms as input. Many meta-algorithmic procedures are focused on optimizing algorithms or a set of algorithms for a specific task. One such class of meta-algorithmic procedures are those that configure or tune an algorithm's parameters to minimize some objective function across a set of inputs or instances (e.g., finding the algorithm that can solve a set of problems with minimized runtime). Procedures for solving this algorithm configuration problem are called automatic configurators; examples include ParamILS, GGA, irace, and SMAC. Automatic configurators have already made significant strides, in some cases improving algorithm runtimes by several orders of magnitude (see, e.g., Hutter et al. [2010a]).

This work extends previous work by describing an advance in automatic configuration in parallel environments, that is, an automatic configurator that works in a distributed and parallel setting, pSMAC.

One model for meta-algorithmic procedures is that of an algorithm that emits a series of requests for runs of an algorithm, perhaps varying some parts or some input, and then operates on the result of the algorithm run to make further decisions. This model indeed captures many meta-algorithmic procedures, not simply automatic configuration.
Another contribution of this thesis is a new toolkit, called AEATK, which is described separately.

1.1 Contributions

The contribution of this thesis is twofold: the first is a toolkit for abstracting algorithm executions; the second is a new automatic configurator that supports distributed configuration.

1.2 Thesis Organization

This thesis is organized as follows. Chapter 2 provides a survey of existing work on meta-algorithmic procedures in general, and the development of (distributed) automatic configurators in particular. Chapter 3 provides a detailed description of the Algorithm Execution and Abstraction Toolkit (AEATK), including its design goals, the abstractions it provides, and the various mechanisms of execution available. Chapter 4 provides an empirical validation of a configurator implemented in AEATK. Finally, Chapter 5 provides an overview of the design of pSMAC.

Chapter 2: Background

2.1 Introduction

Meta-algorithmic procedures (see Nell et al. [2011]) are algorithms that operate on or use other algorithms as part of their procedure. A simple example of a meta-algorithm would be given an instance of the Boolean satisfiability (SAT) problem (see, e.g., Gomes et al. [2008]) and a series of heuristic solvers, each of which may solve it within a given time budget. Provided that there is some variation between which instances can be solved by each heuristic solver, and that none of them simply dominates the remainder, one can construct a simple, more powerful meta-algorithmic solver that runs all of these algorithms in parallel and terminates once any of them successfully solves the instance, returning the model from the successful solver. The above is an example of a parallel algorithm portfolio. Other types of meta-algorithms may speculatively choose the particular solver that they believe has the best chance of solving an instance; this type of algorithm is called an algorithm selector. Yet another type of meta-algorithm automatically "tunes" a heuristic algorithm with the objective of having it perform better (this is an instance of the algorithm configuration problem). In both cases, these meta-algorithms need to interact with the target algorithms, and Section 2.2 outlines existing software libraries that provide this functionality. In Section 2.3, we define the algorithm configuration problem more formally before surveying the literature on this topic.

2.2 Meta-Algorithmic Libraries

Most meta-algorithmic procedures (procedures that operate on algorithms) perform runs of the algorithm under study. A standard and portable approach is to execute the algorithm via the shell or the system() system call. This allows the meta-algorithmic procedure to treat the algorithm that is being used as a black box. The simplest meta-algorithmic procedures use the result of the algorithm's computation directly (e.g., whether a particular instance of the Boolean satisfiability problem is satisfiable or unsatisfiable), while other meta-algorithmic procedures take measurements of the algorithm's performance, such as its runtime.

On the surface, executing target algorithm runs may seem trivial. As a result, designers of meta-algorithms tend to simply develop their own custom solution when they need to interact with target algorithms.
However, there are in fact a number of issues related to algorithm execution that arise for other users or during large-scale experiments, such as:

• Measuring the runtime of an algorithm reliably.
• Terminating an algorithm run reliably.
• Policing algorithms that exceed resource limits (e.g., runtime and/or memory limits).
• Caching and reusing a previous algorithm execution (when possible), which allows for improved utilization of scarce resources. This can be extremely helpful in debugging, by minimizing the time needed to reproduce an error state in a meta-algorithm.
• Dealing with crashed runs and/or erroneous results.
• Executing algorithms in parallel.
• Distributing the execution of algorithms on a cluster.
• Managing the number of concurrent algorithm executions that happen at any given time.

In settings where a number of meta-algorithmic procedures have a similar purpose, the fact that they each execute the algorithm differently can be an annoying liability. For instance, GGA (Ansotegui et al. [2009]), ParamILS (Hutter et al. [2007b]), and SMAC (Hutter et al. [2011b]) all solve the algorithm configuration problem, but each has its own method of interacting with the given target algorithm. In some cases, different executions can even make the results incomparable; for example, GGA executes and measures an algorithm's runtime directly and is unable to determine whether the algorithm terminated early due to an error. For these reasons, there is a need for a well-developed application programming interface (API) for executing target algorithm runs.

There are currently two projects that implement such an API as part of a more ambitious project. The first is the Experiment Design and Administration for Computer Clusters (EDACC) from Balint et al. [2010] and Balint et al. [2011]. Originally conceived as a tool for managing algorithms in the context of SAT solver competitions, it evolved into a general experimental platform. It follows a master/slave architecture, in which the master defines the experiment and the runs needed, and the slaves then execute the target algorithm executions that are requested. To police algorithm resource usage, the runsolver tool is used (Roussel [2011]). EDACC also has support for running solver competitions, up to and including generating various plots regarding the experiments. The standard user interface is very limited in terms of possible experiments, but the API is rich, and some example implementations based on SMAC are available publicly. Unfortunately, this aspect of EDACC seems largely undocumented, and development appears to have stalled.

Nell et al. [2011] developed another meta-algorithmic framework, the High-performance Algorithm Laboratory (HAL), which has similar goals to EDACC. It differs, however, in that it provides a rich structure and set of abstractions in an attempt to be able to describe all meta-algorithmic procedures. Once described, meta-algorithmic procedures for certain problems, such as the algorithm configuration problem, can have many implementations, such as ParamILS or GGA. The web-based interface seems to provide much more flexibility and control over experiments than EDACC's user interface does.
Both EDACC and HAL have the ability to directly interact with a computer cluster management system, such as Torque or Sun Grid Engine, to schedule algorithm executions.

2.3 Algorithm Configuration

One of the most widely studied and useful classes of meta-algorithmic design procedures is that of automated algorithm configurators, which are able to significantly improve algorithm performance, in some cases by several orders of magnitude (see, e.g., Hutter et al. [2010a]).

It is widely believed that a large class of algorithmic problems (e.g., verifying the existence of a solution to a Boolean formula, or determining the shortest path that visits all points on a graph) features worst-case runtimes that, at least with current computer architectures, make them theoretically intractable (Garey and Johnson [1990]). Empirically, however, acceptable performance is often attainable on instances of practical relevance (see, e.g., Leyton-Brown et al. [2014]). For example, while satisfiability problems generated randomly with certain parameters may be very difficult (Johnson [2002]), instances that occur in practice are often computationally feasible, such as those from hardware or software verification (see, e.g., Prasad et al. [2005]).

For NP-hard problems especially, better-than-brute-force performance on specific types of instances can be attained through the use of carefully designed heuristic methods. The performance of these heuristics may rely on implicit assumptions about design choices, such as parameter values, that may either be incorrectly set or may depend on the actual problem instance, or distribution of problem instances, being examined. Additionally, as more of these heuristics are added, they may interact in unexpected and complex ways.

Algorithm configuration techniques (see, e.g., Hoos [2012b]) are able to explore the design space of an algorithm and optimize some objective, such as algorithm runtime or solution quality.

Definition 1. The algorithm configuration problem can be defined as a 4-tuple 〈A, Θ, Π, m〉 (this definition is slightly modified from Hoos [2012b]), where:

• A is a parameterized algorithm;
• Θ = {θ1, θ2, ...} is the possibly infinite set of allowed configurations of A (i.e., complete assignments of values to all parameters of A); each θi = 〈T1, T2, ..., Tk〉 is an ordered k-tuple, where each element Ti has its own domain coming from either a finite set or a non-empty interval over ℝ or ℕ;
• Π = {π1, π2, ..., πn} is a set of problem instances that A will be run on;
• m is an objective function that measures the performance of each θ across the instance set Π for algorithm A. For instance, the objective function might be the mean solution quality.
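Stated as an optimization problem, Definition 1 asks for a configuration that optimizes the objective over the instance set. The restatement below is a sketch added for clarity; the minimization direction is an assumption consistent with the runtime and cost objectives discussed in the rest of this thesis, and m(θ, Π) denotes the aggregate (e.g., mean) performance of A with configuration θ over the instances in Π:

    \theta^{*} \in \operatorname*{arg\,min}_{\theta \in \Theta} \; m(\theta, \Pi)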
2.3.1 Existing Approaches

There are a number of existing approaches to solving the algorithm configuration problem. Hutter et al. [2007b] developed ParamILS, an automatic configurator that uses a local search optimization strategy. The local-search-based approach involves selecting configurations within the one-exchange neighbourhood (under which two configurations are neighbours if they are identical except for one parameter). The configuration space that ParamILS searches also supports the concept of conditional parameters, that is, parameters which are only active or meaningful if another parameter is set to a certain value. ParamILS differs from many other approaches in that it is able to optimize multi-instance scenarios (i.e., many approaches can only optimize an algorithm's performance for a particular instance, not over a set of instances). BasicILS, the simplest conceptual variant of ParamILS, evaluates each configuration on the entire instance distribution. FocusedILS, the variant that typically exhibits much better performance, initially starts with a small number of runs for a challenger (i.e., a new configuration which will be inspected to see if it is better than the best configuration found so far), and then increases this number so long as the empirical estimate of the objective function suggests it is better than the current incumbent (i.e., the best configuration found so far). FocusedILS also occasionally gives the current incumbent more runs, and supports random restarts in the configuration space. Finally, ParamILS introduced the adaptive capping mechanism for runtime optimization, where an algorithm run of a challenger is given a cutoff time which is set adaptively to the time it would take for the challenger to be demonstrably worse than the incumbent (e.g., if κMAX is 5 seconds, but the incumbent has a runtime of 3 seconds on an instance, then the challenger need only be run for 3 + ε seconds, where ε > 0) (Hutter et al. [2009b]).
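The following is a minimal sketch of the adaptive capping computation just described. The class and method names are hypothetical (they are not part of ParamILS or AEATK), and real implementations add further refinements such as bound multipliers; it only illustrates how a challenger's cutoff can be derived from the incumbent's accumulated runtime.

    /**
     * Adaptive capping, roughly in the spirit of ParamILS (Hutter et al. [2009b]):
     * a challenger only needs to run until its accumulated time provably exceeds
     * the incumbent's accumulated time on the same instances.
     */
    public final class AdaptiveCappingSketch {

        /**
         * @param incumbentTimeOnSameInstances total incumbent runtime on the instances
         *                                     the challenger has been (or is being) run on
         * @param challengerTimeSoFar          total challenger runtime accumulated so far
         * @param kappaMax                     the scenario-wide maximum cutoff time
         * @param epsilon                      small positive slack
         * @return cutoff time (in seconds) for the challenger's next run
         */
        public static double nextCutoff(double incumbentTimeOnSameInstances,
                                        double challengerTimeSoFar,
                                        double kappaMax,
                                        double epsilon) {
            double remainingBudget = incumbentTimeOnSameInstances - challengerTimeSoFar + epsilon;
            return Math.max(0.0, Math.min(kappaMax, remainingBudget));
        }

        public static void main(String[] args) {
            // The example from the text: kappaMax = 5s, incumbent needs 3s on the instance,
            // so the challenger is only given 3 + epsilon seconds.
            System.out.println(nextCutoff(3.0, 0.0, 5.0, 0.01)); // prints 3.01
        }
    }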
Another approach for solving the algorithm configuration problem is realised in the irace package by López-Ibáñez et al. [2011]. Their racing approach traces its origins to an earlier method called F-Race (Birattari et al. [2002]). In F-Race, a set of candidate configurations are repeatedly run against each other until a statistical test reveals which configuration is statistically significantly better than the others. This method can only select a configuration from a pre-existing list of configurations. Balaprakash et al. [2007] modified F-Race and created I/F-Race, which runs multiple rounds of F-Race and, between rounds, samples new configurations with parameter settings biased towards the values of the promising configurations in the last round. The reference implementation of the I/F-Race algorithm is called irace (López-Ibáñez et al. [2011]), and includes I/F-Race as well as other variants (such as changing the statistical tests used).

An entirely different approach is the Gender-based Genetic Algorithm (GGA) automatic configurator (Ansotegui et al. [2009]). GGA uses techniques from evolutionary computation to find the most promising configuration. GGA maintains two sets (genders) of configurations; the first is a set of promising configurations, the second is a set of other configurations of unknown performance. Each round, some fixed fraction of the promising configurations is selected to mate with one of the others (i.e., a new parameter setting is created by taking parts of each parent). GGA allows the designer of the algorithm to specify an And/Or tree encoding of the configuration space, which allows different parameters to be grouped together as a unit (genes) so that, when mating, these groups of parameters can be selected together. The authors compared GGA against a non-gender-based genetic algorithm and found that GGA outperformed it. They note that the non-promising configurations allow for a wider variety of genes to be maintained beyond the number that could actually be evaluated. Additionally, this allows good genes to stay around, even if they only occurred in configurations along with some other bad genes. They also evaluated the benefit provided by the grouping of parameters into genes and demonstrated significant improvements. Finally, they compared GGA against ParamILS on three scenarios (Hutter et al. [2007b]) and found that it generally tends to perform better; however, a later, thorough study by Hutter et al. [2011b] found that ParamILS with adaptive capping performed better than GGA on these and other scenarios.

2.3.2 Sequential Model Based Optimization Approaches

Sequential Model-based Optimization (SMBO) techniques, which have a grounding in the statistics literature, are also suitable for solving the algorithm configuration problem.

The prominent EGO (Efficient Global Optimization) algorithm (Jones et al. [1998] and Jones [2001]) lays the groundwork for model-based approaches to automatic algorithm configuration. In EGO, an arbitrary black-box function is optimized by repeatedly fitting a model to the observed data, using that model to select a new input to evaluate, and then sampling the black-box function at that point. The first model is built from data sampled in the same way as by Sacks et al. [1989]. Instead of optimizing the mean predicted value, EGO optimizes a more complex acquisition function that takes both the mean and the uncertainty about the prediction into account. This prevents the procedure from selecting points too close to the best point found so far, while still searching in regions in which we expect to find better inputs with high probability.

One limitation of EGO is that the black-box function being optimized is presumed to be noiseless. Huang et al. [2006] extend EGO to create Sequential Kriging Optimization (SKO). In SKO, response values are treated as samples from a posterior distribution of the true value. To account for this, the acquisition function from Jones [2001] is augmented to support uncertainty in previously sampled points and to allow SKO to potentially resample points to obtain a better estimate as needed.

Bartz-Beielstein et al. [2005] adapted EGO to the optimization of algorithms with their Sequential Parameter Optimization (SPO) method. SPO uses the same acquisition functions as EGO, but a slightly different model, which includes a second-order polynomial fit as well as the standard Gaussian process model. Unlike EGO, their approach is able to deal with random response values through a continual resampling of the best observed points using a doubling strategy, which allows the estimate to converge to the true value over time. Finally, as opposed to fitting the model with each sample point individually, as done by SKO, SPO merges the samples for each point into a better estimate of the objective at that point, and then fits the model on these merged estimates.

Hutter et al. [2009a] directly compared SPO and SKO and their suitability for algorithm configuration. They found that SPO in general outperformed SKO on the algorithms they studied. They also introduced SPO+, which introduced some modifications to the original algorithm. First, they used log-transformed response values, which significantly improved the model's predictive performance. To account for this transformation, they introduced a new acquisition function, log expected improvement. Finally, they noticed that SPO could be misled by a small number of runs on a challenger, and so introduced an invariant from their previous work (Hutter et al. [2009b]), which requires that for a challenger to become an incumbent, it must have at least as many samples as the incumbent.

This work was followed up by Hutter et al. [2010b], who improved upon SPO+ to form TB-SPO.
The first is that TB-SPOallows the procedure to meaningfully execute within a specific CPU time budget, asopposed to a fixed number of algorithm evaluations. The authors noted that runtimeoptimization of algorithms is conceptually different from ordinary expensive black-boxfunction optimization, in that the cost of each measurement can vary significantly, andbecomes cheaper as the optimization procedure makes progress. Therefore, instead oflimiting the number of function evaluations that a procedure may take, it makes more9sense to limit the amount of time it takes. This is further true, because often buildingthe model can take a disproportionate amount of time. The authors report thatin some cases, previous experiments showed that the actual execution of the targetalgorithm only accounted for approximately 3% of the actual CPU time used, while therest dealt with the overhead of building the models. The first change TB-SPO makes isthat it drops the expensive initial design; instead, it interleaves random configurationswith configurations selected by the model as the configuration procedure executes.The second is to alter the intensification strategy to be sensitive to the overhead ofbuilding the model: in TB-SPO many promising configurations are retrieved from themodel (and mixed with randomly generated configurations), and these are executed insequence until they use the same amount of CPU time as the last model build, thusensuring that the algorithm gets to be executed at least 50% of the time. Finally, theauthors replace the noise-free Gaussian process models with computationally cheaperapproximations called projected process models.Further advances to TB-SPO came with Hutter et al. [2011b], which also borrowedseveral ideas from ParamILS and defined the new automatic configurator SMAC.Unlike TB-SPO, SMAC was the first SMBO technique that was directly able toreason about configuration across multiple instances, in a manner similar to ParamILS.SMAC also replaced the previous projected process models with Random Forests(Breiman [2001]). Random Forests allow SMAC to optimize categorical parameters likeParamILS, but also continuous parameters like TB-SPO easily. Instead of treating eachrun of the target algorithm on a given instance as the same design point, the model isgiven both the used configurations and, if available, a vector of features describingeach used instance features. To predict the performance of a configuration, theinstance features are marginalized out of the model. As in previous SMBO approaches,promising configurations are selected from the model via random sampling, but inaddition, SMAC also uses a local search procedure to optimize the acquisition function,afterwards merging the selected configurations and selecting them in order of themost promising acquisition function value. Hutter et al. [2011a] extended SMACto add support for the adaptive capping mechanism from ParamILS, which requiredchanges to the model building procedure (to account for the fact that response valuesnow in some cases represented a lower bound on the performance) as well as to theintensification mechanism.Finally Hutter et al. [2012] extended SMAC to work in a distributed setting, andalso showed that SMAC can yield state of the art performance for solution qualityoptimization. D-SMAC modifies the general SMBO framework used in SMAC such10that the model building step and the evaluations happen in parallel. 
In D-SMAC,many configurations are chosen to be challenged against the current incumbent inparallel. Each configuration undergoes intensification to generate a set of runs ifappropriate, or to terminate the challenge. Runs generated this way are placed in aqueue; periodically, runs from this queue are distributed to workers synchronously, andthe D-SMAC algorithm waits for all to be completed. If the size of the queue dropsbelow a certain threshold, a new model is built and new configurations are generated.The authors demonstrated that for solution quality optimization, D-SMAC generatesthe best configurations and in general, as a function of wall-clock time, dominatedsequential approaches, or approaches based on taking the best from a set of parallelindependent runs of sequential approaches. D-SMAC, however, is largely limited tosolution quality optimization, since it dispatches many runs to workers synchronouslyand then must wait for the final run to finish. A large variation of runtime would leadit to poorly utilizing the available resources, and it was thus not evaluated for runtimeoptimization.11Chapter 3AEATK3.1 IntroductionIn this chapter, we outline the Algorithm Execution & Abstraction Toolkit (AEATK).After providing the motivation, an overview of the design goals, and some exampleapplications, we provide an overview of the AEATK API, with an eye towards howthese design goals are implemented.3.1.1 Motivation & Related WorkIn Section 2.2, we described two existing meta-algorithm libraries, and it is not clearwhy a third is needed. The Gang of Four (Gamma et al. [1994]) design patternsbook makes a distinction between three types of software: applications, toolkits, andframeworks. A toolkit is a set of reusable components that provide useful featuresand functions. They do not “impose a design on the application”, and in generalcannot make assumptions about how their functionality will be used. Alternativelya framework essentially provides a developer with an almost completely ready-madeapplication and requires them to fill in the missing bits. Conceivably this makes it veryquick to develop new applications, but requires the framework designer to “gamble”that their chosen architecture will work for all for all targeted applications. HALand EDACC most closely fall into framework category and our experience using theformer suggests it is very much an all-or-nothing activity. Additionally, our anecdotalexperience developing meta-algorithmic procedures suggested that the architectureintroduced by HAL suffers from several key limitations1:1An examination of the EDACC documentation suggests these would also be true of EDACC.12HAL EDACC AEATKTarget Algorithm Execution X X XDistributed Target Algorithm Execution X X XExperiment Specification & Execution X X ×Experiment Analysis X X ×Management of Solver Competitions × X ×Rich Representation of Meta-algorthmic Concepts X × ×Cluster Management X X ×Easily Modified Execution Architecture × × XEasily Distributed Meta-algorithmic Procedures × × XFigure 3.1: Goals of Selected Meta-Algorithmic Software Libraries1. All meta-algorithmic procedures require the availability of a properly configuredMySQL server.2. Users of a meta-algorithmic procedure would primarily interact with it via HAL’sweb interface.3. Meta-algorithmic procedures are tightly coupled to HAL and its API.Historically AEATK “fell out” of the process of porting SMAC from MATLAB toJava. 
There was a strong desire to ensure that SMAC would be available to anyone who wished to configure an algorithm. While a direct implementation within HAL would allow users to easily manage experiments and facilitate comparisons with comparable meta-algorithmic procedures, this was not viewed as necessary for the simplest of users, and most likely would be overwhelming and discourage uptake. Thus a lightweight abstraction layer was created to allow SMAC to be operated independently and directly via the command line, and it subsequently proved flexible enough to be adapted for other uses.

3.1.2 Design Goals

AEATK primarily evolved organically in response to my own needs, as well as the needs of other users. Nevertheless, there have been several unifying design goals that governed its evolution. If a new feature aligned with these goals, it would be added to AEATK; otherwise it would have to be implemented elsewhere. The central design goals of AEATK are the following:

1. A simple API should exist that allows developers to invoke a target algorithm and obtain a response.
2. It should be trivial to have target algorithm execution take place in a parallel and distributed manner. In particular, users should be shielded as much as possible from having to deal with the standard issues related to concurrent programming in Java, such as thread synchronization, data races, or correct adherence to the Java memory model.
3. Target algorithm execution should be extensible, and policies (e.g., how target algorithm evaluations should be cached, or how errors should be handled) should be customizable, both for meta-algorithm designers (e.g., the developer of a new automatic configurator) and for individual users of the meta-algorithm (e.g., users that would apply the new automatic configurator to a new target algorithm).
4. The meta-algorithmic procedures themselves should be easy to redistribute. In other words, they should be portable and lend themselves to easy execution.

As AEATK is a toolkit and not a framework, using it is not an all-or-nothing proposition. Indeed, while the central work in AEATK has been focused on target algorithm execution, at least one project (Zilla), which has no need for algorithm execution, uses it for the last reason alone. Figure 3.1 outlines AEATK's goals and contrasts them with both HAL and EDACC. One final note is that AEATK is not exclusive of the other approaches; in fact, it can be complementary. The abstractions in AEATK are flexible and cohesive enough to lend themselves to working with both HAL and EDACC. Indeed, a number of meta-algorithmic procedures, including SMAC, can be used in both, provided that the meta-algorithmic procedure is properly encapsulated in Java (i.e., is not just implemented in the main() method) and that a TargetAlgorithmEvaluator (discussed in Section 3.3) is implemented in the other meta-algorithmic library. At the time of this writing, such an implementation exists for HAL, and SMAC is available as a plugin in HAL.

3.1.3 Algorithm Execution Model

At a conceptual level, the model of an algorithm execution in AEATK is that of a function invocation: f : P → R.
In this model, f is the target algorithm under study, P is an ordered tuple 〈π, κ, s, θ〉, and R is an ordered tuple 〈status, satisfiability, runtime, quality, data〉.

The elements of P are:

π: The problem instance on which the algorithm is to be invoked.
κ: The time (greater than or equal to zero) that the algorithm is permitted to execute for.
s: A seed for the pseudo-random number generator used by the algorithm.
θ: The configuration of the algorithm.

The elements of R are:

status: One of {SUCCESS, TIMEOUT, CRASHED, ABORT, KILLED}. SUCCESS indicates that the algorithm completed successfully. TIMEOUT indicates that the algorithm did not complete within the time budget κ. CRASHED indicates that the algorithm exhibited some unexpected failure. ABORT also signifies some failure, but in this case the error is expected to be permanent. (For example, imagine the target algorithm is accessed by directly executing it, and the executable is not found. Users doing a single run of a meta-algorithmic procedure, or others doing large-scale experiments, are more likely to notice premature termination of the procedure than to examine the output of an ongoing procedure. Additionally, these types of errors typically invalidate the experiment anyway, so no benefit is derived from fixing this while the procedure is running.) KILLED indicates that the target algorithm did not complete successfully because the meta-algorithm asked for the run to be cancelled prematurely; later, in Section 3.1.4, we will see examples where this is useful.

satisfiability: One of {SAT, UNSAT}. This is useful when the target algorithm solves a decision problem (for instance, graph colouring or Boolean satisfiability), in which case SAT indicates that the instance specified by π in P has a solution and UNSAT indicates that it does not (e.g., this could indicate that a graph is three-colourable or not three-colourable, respectively). Other problems, such as optimization problems, typically have no use for this parameter.

runtime: How long the algorithm took to execute, in seconds.

quality: A problem-specific measure of how good the run was. For instance, if the target algorithm were a heuristic algorithm for the travelling salesman problem, this field might report the distance it found on a graph.

data: Additional details on the run; for example, it might report the certificate to an NP-hard decision problem that answered SAT for satisfiability above. Other use cases have been debug information about the target algorithm execution. Typically, general meta-algorithmic procedures do not rely on this field, and only specialized use cases exist for it.

Additionally, we make the following assumptions:

1. As f is viewed as a function, repeated invocations of f(P) conceptually should result in identical output. The seed parameter is meant to control the randomization inherent in the algorithm itself. One limitation of the current model is that it is not possible to measure the distribution of runtime due to, say, cache effects or the operating system. It would be fairly easy to add another parameter to the tuple P, a sample index, that would allow multiple invocations of a specific point in the function easily.
2. If status is TIMEOUT for some κ and SUCCESS for κ + ε (where ε > 0), then status should be TIMEOUT for all cutoff times less than κ, and SUCCESS for all cutoff times greater than κ + ε. Additionally, if κ is 0, then status will equal TIMEOUT.
In other words it should not bethe case that for a particular pi this parameter is sometimes ‘SAT’, and othertimes ‘UNSAT’. In the example of graph colouring, a particular graph is eitherthree-colourable or it is not. It does not depend on which graph colouringalgorithm is used, or the configuration that algorithm.AEATK expects the algorithm to satisfy these assumptions, which can facilitateimprovements in the efficiency of handling algorithm execution through the use ofcaching and/or simplified logic. For instance, the first property allows for runs to becached easily (i.e., if we previously have evaluated a particular P we do not need tore-evaluate it, and can use the previous result), and the second allows us to improveon caching or skip execution entirely (if κ = 0). The third property is used for errordetection in the target algorithm as occasionally different configurations can triggerbugs that cause the algorithm to produce incorrect results. Later, we will examinehow this conceptual model is implemented and represented, and in some cases, therepresentation will deviate slightly from this model.163.1.4 Sample Use Cases for Target Algorithm ExecutionFor the purposes of this discussion it will be helpful to have some example use casesin mind, consequently I will mention several use cases here:Online Single Execution In this use case a meta-algorithm makes a decision aboutwhich target algorithm executions it would like to do, gets the result of theexecution, and repeats. SMAC is a simple example which follows this work-flow.SMAC repeatedly executes the target algorithm comparing a new configuration’s(challenger) estimated performance versus that of the best found configuration(incumbent) thus far. After doing a target algorithm evaluation, it must decidewhether to continue evaluating the challenger to get a better estimate of theperformance, try a different challenger instead, or take another sample of theincumbent to improve its empirical estimate of the incumbents performance.Offline Batch Execution In this use case, a meta-algorithm knows a priori exactlywhich invocations of target algorithms it would like to do and will do somecomputation when the results are completed. An example of this is the validationutility included in SMAC, which measures the performance of a configuration ona set of test instances. Another example is determining the runtime virtual bestsolver (VBS) for a set of instances and a set of solvers. This requires evaluatingevery solver on every instance, to determine which solver solved each instance inthe least amount of time.Online Batch Execution In this use case, many invocations of a target algorithmare needed before the meta-algorithm can continue, and (potentially) parallelresources are available to execute them. For example, imagine we have 10 coresand an automatic configurator knows that the best configuration (incumbent)found so far has an average time of 5 seconds over a set of 10 instances; toevaluate this incumbent in the conceptual model of Section 3.1.3 would requireinvoking the algorithm 10 times, each with a κ (cutoff time) of 50 seconds,since any one of the runs could be the longest and they will be executed inparallel. Instead AEATK supports looking at the runs as it executes them, andcapping them dynamically, thus only determining a lower bound on executiontime. 
3.1.4 Sample Use Cases for Target Algorithm Execution

For the purposes of this discussion it will be helpful to have some example use cases in mind; consequently, I will mention several use cases here:

Online Single Execution: In this use case a meta-algorithm makes a decision about which target algorithm executions it would like to do, gets the result of the execution, and repeats. SMAC is a simple example which follows this work-flow. SMAC repeatedly executes the target algorithm, comparing a new configuration's (challenger's) estimated performance versus that of the best configuration (incumbent) found thus far. After doing a target algorithm evaluation, it must decide whether to continue evaluating the challenger to get a better estimate of its performance, try a different challenger instead, or take another sample of the incumbent to improve the empirical estimate of the incumbent's performance.

Offline Batch Execution: In this use case, a meta-algorithm knows a priori exactly which invocations of target algorithms it would like to do, and will do some computation when the results are completed. An example of this is the validation utility included in SMAC, which measures the performance of a configuration on a set of test instances. Another example is determining the runtime virtual best solver (VBS) for a set of instances and a set of solvers. This requires evaluating every solver on every instance, to determine which solver solved each instance in the least amount of time.

Online Batch Execution: In this use case, many invocations of a target algorithm are needed before the meta-algorithm can continue, and (potentially) parallel resources are available to execute them. For example, imagine we have 10 cores and an automatic configurator knows that the best configuration (incumbent) found so far has an average time of 5 seconds over a set of 10 instances; to evaluate this incumbent in the conceptual model of Section 3.1.3 would require invoking the algorithm 10 times, each with a κ (cutoff time) of 50 seconds, since any one of the runs could be the longest and they will be executed in parallel. Instead, AEATK supports looking at the runs as it executes them and capping them dynamically, thus only determining a lower bound on execution time. Another use case of this would again be determining the runtime VBS, where once the first solver solves an instance, the remaining solver executions can be terminated once they are known to take longer.

Online Concurrent Execution: This use case is almost identical to the previous one, except that the meta-algorithm is multi-threaded and these threads do not coordinate individual target algorithm executions amongst themselves. An example use case for this would be parallelizing FocusedILS, by evaluating neighbours in parallel with a set of worker threads.

3.2 Domain Objects

In this section we outline the domain objects used in the AEATK API. These are objects which correspond to the conceptual model outlined in Section 3.1.3 (e.g., the P and R tuples, configurations, instances, etc.). Most of the objects are very simple POJOs with minimal logic, and in most cases they are immutable.

3.2.1 Parameter Configuration Spaces

Figure 3.2: Classes related to the configuration space of a target algorithm

The ParameterConfigurationSpace and ParameterConfiguration classes (shown in Figure 3.2) allow representation of the design space or configuration space of a target algorithm. This space is exactly the space introduced in SMAC (see Hutter et al. [2011b]). Notably, a parameter space of an algorithm can consist of two kinds of parameters: parameters can be categorical or numeric. Additionally, certain parameters can be active or inactive, depending on the value of other parameters. Finally, combinations of parameters can be forbidden.

ParameterConfigurationSpace objects are constructed by supplying a string, either from a file or in memory, containing a description of the space in PCS format (Hutter and Ramage [2013]). ParameterConfigurationSpace objects are immutable (i.e., stateless).

ParameterConfiguration objects can only be obtained from the space, via getRandomConfiguration() or getDefaultConfiguration(), or by supplying a string-encoded version of a configuration to the getConfigurationFromString() method. ParameterConfiguration objects are mutable, but typically are modified only once in their life cycle.
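The fragment below sketches the kind of client code this section describes. The getDefaultConfiguration() and getRandomConfiguration() calls are the ones named above; the construction step, imports, and exact signatures are assumptions made for illustration rather than the literal AEATK API, and the two-line PCS description only approximates the older PCS syntax.

    // Illustrative sketch: construction details and signatures are assumed, not quoted from AEATK.
    public class ParameterSpaceSketch {

        public static void demo() {
            // A tiny design space: one categorical parameter and one integer parameter with a default.
            String pcs = String.join("\n",
                    "sortAlgo {quick, merge, heap} [quick]",
                    "restartLimit [1, 1000] [100]i");

            // Hypothetical factory call; AEATK accepts a PCS description supplied from a file or from memory.
            ParameterConfigurationSpace space = ParameterConfigurationSpace.fromString(pcs);

            // Configurations can only be obtained from the space itself.
            ParameterConfiguration defaultConfig = space.getDefaultConfiguration();
            ParameterConfiguration randomConfig  = space.getRandomConfiguration();

            System.out.println("default: " + defaultConfig);
            System.out.println("random:  " + randomConfig);
        }
    }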
3.2.2 ProblemInstance & ProblemInstanceSeedPair

Figure 3.3: Classes that represent Problem Instances

A ProblemInstance object (shown in Figure 3.3) represents the abstract problem instance that an algorithm will be run on. A ProblemInstanceSeedPair object is a container for a ProblemInstance, along with the seed the target evaluation should use, if it is a randomized algorithm. A ProblemInstance object can also contain an associated set of features. Both of these objects are immutable, are instantiable directly, and are simple objects with no complicated logic. In typical use cases, the instance will represent a file on disk, but nothing in AEATK requires this to be so.

3.2.3 AlgorithmExecutionConfiguration

Figure 3.4: Class that represents the physical information needed to execute an algorithm

AlgorithmExecutionConfiguration objects (Figure 3.4) represent the physical information about executing an algorithm, namely where the algorithm is, what its configuration space is, etc. It is immutable and, again, a simple object that can be constructed directly. The semantics of the individual fields vary depending on exactly how the algorithm will be executed (see Section 3.3). In the simplest use case, the field names correspond to what one would expect. Additionally, a Map object is included that may contain custom key/value pairs that can be used to customize algorithm execution.

3.2.4 AlgorithmRunConfiguration

Figure 3.5: Class that represents all information needed to execute a target algorithm

An AlgorithmRunConfiguration object (Figure 3.5) includes everything necessary to execute a target algorithm through a command line, and is merely a container for the previously defined objects. Its constructor is directly accessible. This corresponds to the ordered tuple P in our conceptual model from Section 3.1.3.

3.2.5 AlgorithmRunResult

Figure 3.6: Class that represents the result of a target algorithm execution

An AlgorithmRunResult object (Figure 3.6) represents the result of executing a target algorithm with the given AlgorithmRunConfiguration. The details of how this is done will be discussed in Section 3.3. This object corresponds to the ordered tuple R in the conceptual model from Section 3.1.3.

3.2.6 Differences Between the Object and Conceptual Models

There are two differences between the conceptual model and the object model. The first is that the function f is encoded with the parameters P. The second is that in the AlgorithmRunResult, the RunResult field combines the status and satisfiability fields of R; this is for historical reasons. Finally, the RunStatus can also take the value RUNNING, which is used only in certain cases, when the meta-algorithm is exposed to the lower bound of runtime performance as discussed in the conceptual model.

3.3 Target Algorithm Evaluator API

Figure 3.7: The TargetAlgorithmEvaluator interface

Most of the value provided by AEATK lies within its ability to execute target algorithms, and specifically the TargetAlgorithmEvaluator interface shown in Figure 3.7. This interface provides a clear separation between the meta-algorithm and the target algorithm, and allows meta-algorithm designers to design their algorithms while ignoring most of the details of actual execution. TargetAlgorithmEvaluator instances are thread-safe, and so client programs can schedule runs from multiple threads.

3.3.1 Execution Options

Depending on the meta-algorithmic procedure, different execution modes may be required; for example, fast concurrent performance can require the use of callback and asynchronous methods, but at the same time we do not want to require users to deal with more complexity than they need.

Standard Execution: By default, the simplest execution option is via the evaluateRun() method, which allows the meta-algorithm designer to supply a list of AlgorithmRunConfiguration objects to be evaluated (the second argument is optional, and its semantics will be covered under Observation below). The method returns when all the runs have been evaluated. This execution method corresponds to the Online Single Execution case in Section 3.1.4. For completeness, a sequence diagram has been included to show this in Figure 3.8.

Figure 3.8: Standard Execution Sequence Diagram
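As a usage illustration of the standard execution path, the sketch below submits a batch of runs and blocks until they finish. The type names and evaluateRun() come from the text; the accessor on the result object and the surrounding setup are assumptions, so this should be read as pseudocode in Java form rather than the exact AEATK signatures.

    // Sketch only: how the evaluator is obtained and the exact result accessors are assumptions.
    import java.util.List;

    public class SynchronousEvaluationSketch {

        /** Evaluate all runs and report the mean measured runtime (Online Single Execution style). */
        public static double meanRuntime(TargetAlgorithmEvaluator tae,
                                         List<AlgorithmRunConfiguration> runConfigurations) {
            // Blocks until every requested run has been evaluated.
            List<AlgorithmRunResult> results = tae.evaluateRun(runConfigurations);

            double total = 0.0;
            for (AlgorithmRunResult result : results) {
                total += result.getRuntime();   // accessor name assumed for illustration
            }
            return results.isEmpty() ? 0.0 : total / results.size();
        }
    }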
Depending on the TargetAlgorithmEvaluator, the currentStatus() method of the user-supplied observer may be invoked, and the implementation may request that a run be terminated by invoking the kill() method. Importantly, the termination mechanism is strictly advisory³, and the TargetAlgorithmEvaluator may or may not honour the request. This provides several benefits, including allowing the TargetAlgorithmEvaluator to ignore the request when inconvenient, and discouraging users from writing code that is prone to data races (e.g., between an algorithm being completed and requesting that it be terminated). The class structure for this is outlined in Figure 3.9, and a sequence diagram outlining the interaction is shown in Figure 3.10. The AlgorithmRunResult objects that the observer sees are snapshots and do not provide live updates.

³Other examples of advisory mechanisms in Java are invoking the method System.gc() for garbage collection, or Thread.yield() to suspend a thread. The Java virtual machine is free to ignore these requests, as these methods have no testable semantics (Bloch [2008]).

Figure 3.9: UML Diagram for Observer class
Figure 3.10: Sequence diagram outlining how observation interacts between classes

Asynchronous Execution  High-performing meta-algorithmic procedures may require the ability to execute runs asynchronously, and the evaluateRunsAsync() method can be used for this purpose. As well as supplying the runs, one supplies a callback object (typically an anonymous implementation of the TargetAlgorithmEvaluatorCallback interface shown in Figure 3.11) that will be notified when the runs have been completed, as in Figure 3.12. Asynchronous execution is not actually guaranteed, and this method may block until resources become available within the TargetAlgorithmEvaluator to execute the request. For instance, the CommandLineTargetAlgorithmEvaluator will block once all CPU cores are in use. This provides a way for meta-algorithms to avoid generating new runs when they cannot be run instantly. This mechanism provides a natural method of controlling the rate at which runs are requested, as the blocking acts like back pressure (see Welsh et al. [2001]) regulating the meta-algorithm.

Figure 3.11: UML Diagram describing Callback class
Figure 3.12: Sequence diagram outlining asynchronous callbacks

Single-Threaded Asynchronous Execution  Conceptually, the preceding mechanism provides a high-performance version of the Online Concurrent Execution example in Section 3.1.4. One limitation, however, is that the callbacks are handled in a different thread than the caller, which can introduce data race and memory visibility issues for the meta-algorithm designer. To minimize this, AEATK also includes a queue facade that allows executions to be done asynchronously and concurrently, but which places the results in a queue (along with, optionally, user-supplied context), to be retrieved by the submitting thread at some later point in time. This allows the meta-algorithm designer to gain most of the benefits of concurrent execution, while avoiding the drawbacks. Figure 3.13 shows the interface, and the sequence diagram in Figure 3.14 illustrates how client code interacts with it.

Figure 3.13: TAE Queue Facade UML
Figure 3.14: TAE Queue Facade Sequence Diagram
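Before turning to concrete implementations, the following minimal sketch illustrates the callback form of asynchronous execution described above; the callback method names (onSuccess()/onFailure()) and accessors are assumptions for illustration (imports omitted), and error handling is deliberately simplistic.

```java
// Hedged sketch: asynchronous evaluation with a callback. The callback method
// names and accessors are illustrative assumptions.
public void evaluateAsynchronously(
        TargetAlgorithmEvaluator tae,
        List<AlgorithmRunConfiguration> runConfigs) {

    tae.evaluateRunsAsync(runConfigs, new TargetAlgorithmEvaluatorCallback() {
        @Override
        public void onSuccess(List<AlgorithmRunResult> runs) {
            // Invoked, possibly on a different thread, once all runs complete.
            for (AlgorithmRunResult run : runs) {
                System.out.println(run.getAlgorithmRunConfiguration()
                        + " finished in " + run.getRuntime() + " s");
            }
        }

        @Override
        public void onFailure(RuntimeException e) {
            // Invoked if the evaluation could not be completed.
            e.printStackTrace();
        }
    });

    // Note: the call above may itself block if the evaluator has no free
    // resources, which acts as back pressure on the meta-algorithm.
}
```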
3.4 Local Execution

The preceding section outlined the TargetAlgorithmEvaluator interface, but did not provide any details regarding the implementations. AEATK includes a number of implementations of this interface, including an implementation that defers execution to HAL. In this section we outline how execution is done locally and discuss several challenges and how they were overcome. The implementation of the TargetAlgorithmEvaluator interface in AEATK that schedules runs locally is the CommandLineTargetAlgorithmEvaluator.

3.4.1 Target Algorithm Invocation

Typically, most executions are done by executing the algorithm through a standard command line interface, which traces its origins to ParamILS (the format is best documented in Hutter and Ramage [2013]). In short, the algorithm is invoked by executing the following in a shell:

<algo> <instance_name> <instance_specific_information> <cutoff_time> <cutoff_length> <seed> -<paramname>='<paramvalue>' -<paramname>='<paramvalue>' ...

Eventually, at some point the algorithm being executed must output the following:

Result of algorithm run: <status>, <runtime>, <runlength>, <quality>, <seed>, <additional run data>

When this is output, the various parameters are parsed, and a corresponding AlgorithmRunResult object is returned to the user. It should be clear how the above parameters and arguments map to the conceptual model in Section 3.1.3, and to the objects outlined in Section 3.2.

In practice, it is rare for the target algorithm to directly implement the interface needed by AEATK. Unlike HAL, which allows the user to instruct it exactly how to interact with the target algorithm, AEATK mandates this calling structure. To allow arbitrary algorithms to be executed, a wrapper is typically employed, which takes the call generated by AEATK and adapts it to the calling structure of the algorithm. Figure 3.15 shows the typical components involved between the meta-algorithm or application and the target algorithm. Due to the difficulty associated with correctly measuring CPU time (see e.g., Chapter 9 of Bryant and O'Hallaron [2002]), these wrappers typically delegate this task to another special-purpose utility, called runsolver (Roussel [2011]), which then invokes the algorithm directly, as shown in Figure 3.16.

Figure 3.15: Components involved in executing a target algorithm

The fact that the CommandLineTargetAlgorithmEvaluator delegates runtime measurement entirely to the wrapper is a design choice that serves several purposes:

1. It ensures that runtime measurements are done consistently across meta-algorithm implementations, not just those using AEATK.

2. Java itself has no portable way of measuring runtimes, although libraries do exist, such as SIGAR.

3. It allows for more control over what exactly is being measured for the algorithm. For example, one use case we have experienced is that, prior to the target algorithm being run, the instance is unzipped from a file and placed in a temporary directory; this can be specifically excluded from the processing time.

Figure 3.16: Target algorithm execution with runsolver

3.4.2 Concurrent Execution

The CommandLineTargetAlgorithmEvaluator may be configured to enable multiple simultaneous executions of a target algorithm. The management of physical CPU resources is left entirely to the discretion of the target algorithm; however, each concurrent run will be assigned an ID that is exposed to it via the environment variable AEATK_CONCURRENT_TASK_ID. The target algorithm may then choose to bind to only a specific subset of available cores (e.g., on a 16-core machine, if the environment variable was set to 4 and the target algorithm is dual-core, then it might limit its CPU affinity to cores 7 & 8).
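As an illustration, a wrapper or target algorithm written in Java might interpret this environment variable roughly as follows; the 1-based numbering convention is an assumption, and the actual pinning would be performed externally (e.g., by taskset or runsolver).

```java
// Illustrative sketch: a dual-core target algorithm (or its wrapper) choosing a
// core range from the concurrent task ID. The 1-based numbering convention is an
// assumption; actual pinning would be done externally (e.g., taskset or runsolver).
public class CoreSelectionSketch {
    public static void main(String[] args) {
        String taskId = System.getenv("AEATK_CONCURRENT_TASK_ID");
        int id = (taskId != null) ? Integer.parseInt(taskId) : 1;
        int coresPerRun = 2;                          // this algorithm uses two cores

        int firstCore = (id - 1) * coresPerRun + 1;   // e.g., task 4 -> core 7
        int lastCore  = firstCore + coresPerRun - 1;  // e.g., task 4 -> core 8

        System.out.println("Task " + id + ": restricting execution to cores "
                + firstCore + "-" + lastCore);
    }
}
```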
As mentioned in the previous section, the runsolverutility is often used to measure and enforce CPU time limits. AEATK includes acustomized build of runsolver that will automatically set the affinity of the executingprocess when the environment variable is detected.3.4.3 Target Algorithm Observation & TerminationOne aspect of the conceptual model from Section 3.1.3 is the notion that func-tion evaluations take time, and it can be useful to obtain a lower bound onruntime performance, as mentioned the examples from Section 3.1.4. TheCommandLineTargetAlgorithmEvaluator maintains a thread pool and occasionallynotifies any supplied observer of runs. To get an update of the current runtime, theCommandLineTargetAlgorithmEvaluator listens for a UDP message on a specific portpassed as an environment variable (AEATK PORT) to the target algorithm when invoked.Again, because runsolver is commonly used to measure and enforce runtime limits,AEATK’s custom version of it automatically does the notification upon detecting theenvironment variable. Figure 3.16 outlines the typical relation between the variouscomponents involved in physically executing a target algorithm, and the UDP message28from runsolver to the CommandLineTargetAlgorithmEvaluator.Termination of a target algorithm had historically been deferred to the wrapper,which typically defers to runsolver to ensure that all processes are terminated.Experimentally, we have found that this mechanism is insufficient. On shared multi-user clusters, where memory usage was strictly enforced externally, we noticed thatoccasionally processes would leak over time, eventually this would culminate in thejob being terminated by the cluster for exceeding its resource limits. This occurreddue to several reasons:1. In several cases, runsolver was susceptible to segfaults, and when this occurred,the wrapper would complete, but the process would continue executing.2. runsolver manages which processes to terminate by keeping track of a tree ofprocesses and their children. With unfortunate timing and the parent processterminating, it is possible that runsolver may not detect a child process beingspawned, if the parent process terminates too quickly.3. The Java call to terminate a subprocess is the Process.destroy() method,which terminates the direct parent, not the process tree (in Figure 3.16, onlythe Wrapper process would be terminated, runsolver and the actual TargetAlgorithm would continue executing. This means that a kill() method whichrelied on this mechanism would inherently leak processes overtime in most cases.After trying several mechanisms (including process groups), our design of AEATKeventually settled on the following mechanism, which was inspired by the mechanismthat our local cluster used to track processes. When AEATK executes, it sets anenvironment variable AEATK EXECUTION UUID to a randomly generated UniversallyUnique Identifier (UUID). When the kill() method is invoked or the parent processcompletes execution, AEATK will find any process with the environment variable andvalue set appropriately and terminate that process. In almost all cases, the environmentvariables are inherited by subprocesses unmolested. This method has anecdotallyproven to be very reliable in terms of preventing process leakage. 
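The idea can be pictured with the following rough sketch (not AEATK's actual code): the parent tags each child with a unique marker in an environment variable at launch, and cleanup later terminates every process still carrying that marker. The wrapper command and cleanup script name here are hypothetical.

```java
// Rough sketch of the UUID-tagging idea, not AEATK's actual implementation.
// The wrapper command and cleanup script name are hypothetical.
import java.util.UUID;

public class ProcessTaggingSketch {
    public static void main(String[] args) throws Exception {
        String uuid = UUID.randomUUID().toString();

        // Launch the wrapper with the marker set; child processes normally
        // inherit the environment unmodified.
        ProcessBuilder pb = new ProcessBuilder("python", "wrapper.py");
        pb.environment().put("AEATK_EXECUTION_UUID", uuid);
        Process wrapper = pb.start();

        // ... later, when the run is killed or the parent exits, delegate to a
        // script that terminates every process whose environment still carries
        // the marker (e.g., by scanning /proc on Linux).
        new ProcessBuilder("bash", "kill-by-env.sh", "AEATK_EXECUTION_UUID", uuid)
                .inheritIO().start().waitFor();
    }
}
```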
The environmentvariable name may change, if the CommandLineTargetAlgorithmEvaluator detectsthat this name is already set in the currently running process, as this implies that weare already running as a child to some other TargetAlgorithmEvaluator.Since Java does not provide an API that allows the inspection ofother processes environment variables, to implement this functionality, theCommandLineTargetAlgorithmEvaluator delegates to a script, which takes the name29of the environment variable and the value as arguments and is responsible for ter-minating all matching processes. In Linux this can be done by traversing the /procfilesystem, and theoretically, the same could be done in Java; however benchmarkingsuggested that it was on the order of 10 to 100 times slower to traverse the file systemin Java than it was to simply call a shell script which performed the same task viafgrep. A portable way to do this on POSIX systems (including BSD and MacOS X)seems to be through the output of ps eww, and a preliminary investigation suggestedthat there exists a way to inspect the environment data of other processes via theWin32 API. Conceivably, AEATK could automatically delegate to the appropriatemethod, but in practice so far, no one has needed anything but Linux support for thismechanism.3.5 Distributed ExecutionOne goal outlined in Section 3.1.2 is support for leveraging distributed re-sources in a straightforward manner. To meet this goal, AEATK providesanother implementation of the TargetAlgorithmEvaluator interface called theMySQLTargetAlgorithmEvaluator4. This section outlines the details of the imple-mentation.3.5.1 ArchitectureThe MySQLTargetAlgorithmEvaluator uses a master-slave architecture, in whichAlgorithmRunConfiguration objects are written to the database by the master, andthen processed on a corresponding slave/worker on another machine typically, but notnecessarily, by its CommandLineTargetAlgorithmEvaluator, as in Figure 3.17.The MySQLTargetAlgorithmEvaluator is designed to be a lightweight and efficientmechanism for performing large amounts of target algorithm evaluations. It is alsodesigned to be conceptually simple to manage, with most user intervention takingplace directly in the table structure. In contrast with HAL, there is no grouping of runsby experiments. Conceptually, the table structure consists of a table that stores dis-tinct AlgorithmExecutionConfiguration and ParameterConfigurationSpace ob-jects, another table that stores the rest of the AlgorithmRunResult object as well as4Technically, this implementation is in a separate plugin as outlined in Section 3.6.1, butthis is largely to avoid the requiring the extra space for database support in all programs.At the code level, the only difference between the CommandLineTargetAlgorithmEvaluator andMySQLTargetAlgorithmEvaluator is that the latter exists in a separate project; both are used andinteracted with the same way.30Figure 3.17: MySQL Target Algorithm Evaluator architectureFigure 3.18: Sharing of runs between applications31indicating the status of whether the run has been completed by a worker, and a thirdtable that provides information about the workers and allow changing of some workerparameters (e.g., how long to stay alive for, how often to poll the database, how manyruns to retrieve from the database at a single time, etc...).In many use cases, applications will share a pool of workers, which allows forimproved efficiency if desired, see Figure 3.18. 
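The essence of a worker can be pictured as a polling loop like the purely conceptual sketch below; every name in it is invented for illustration, and the real worker additionally handles batching, adaptive polling, pushing back unstarted runs, and the worker parameters stored in the database.

```java
// Purely conceptual sketch of a worker's polling loop; all names are invented
// for illustration and do not reflect AEATK's actual classes or schema.
import java.util.List;

abstract class WorkerLoopSketch {
    abstract boolean shutdownRequested();
    abstract List<AlgorithmRunConfiguration> fetchUnassignedRuns(int batchSize);
    abstract void writeResultsBack(List<AlgorithmRunResult> results);

    void run(TargetAlgorithmEvaluator localTae, int batchSize, long pollMillis)
            throws InterruptedException {
        while (!shutdownRequested()) {
            List<AlgorithmRunConfiguration> pending = fetchUnassignedRuns(batchSize);
            if (pending.isEmpty()) {
                Thread.sleep(pollMillis);   // nothing to do; poll again later
                continue;
            }
            // Execution is typically delegated to the CommandLineTargetAlgorithmEvaluator.
            writeResultsBack(localTae.evaluateRun(pending));
        }
    }
}
```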
The database structure very muchfollows the conceptual model outlined in Section 3.1.3; in particular, the table orpool serves simply as a cache of runs, that are resolved by workers over time. Infact, the resolution of jobs by workers is entirely anonymous to the clients of theMySQLTargetAlgorithmEvaluator, and there is no connection between the applicationand the worker executing it. The application merely polls the database periodicallyto check for run status, and the workers poll the database to check for new runs toexecute. Additionally, the workers can automatically adapt the number of runs theypoll from the database in a single trip, if they detect that the runs are very short, andpush back unstarted runs to the database if other workers are idle.It should be noted that to distribute runs requires no changes to the applicationcode, and in some circumstances (discussed more in Section 3.7.1), the only changesneeded for users to turn a locally executing meta-algorithm to a distributed one isto change a single argument on the command line and to schedule workers on theircluster. To run a worker, the user merely executes the mysql-worker script that isprovided with the MySQLTargetAlgorithmEvaluator. To have the worker execute ina distributed way, the user simply schedules the execution of this command on theircluster using the appropriate mechanisms5.One additional benefit concerns the Offline Batch Execution example inSection 3.1.4. The TargetAlgorithmEvalutor interface contains a methodareRunsPersisted() which will return true if the application can exit such that runswill still be processed. This allows for a simplified workflow in larger experimentsthat primarily follow the Offline Batch Execution workflow. For example, if one needsto compute the VBS over 10 different instance sets, instead of having to start andpotentially background 10 different processes, one can execute them serially, eachinserting the necessary runs into the database. Later when the runs are done, theuser can re-execute the processes serially, and as expected by our conceptual model,the runs will be done now and can be processed, as if they were cached. This is the5This is another instance where AEATK operates differently than HAL, in that it does not do anycluster management, and the user is responsible for scheduling the workers. This has not historicallybeen an impediment, as most clusters provide an example script for executing commands and theseare sufficient to distribute the workers.32standard workflow for the smac-validate utility.3.5.2 Distributed vs Local Execution PerformanceAt an API level, the distinction between local and distributed target algorithmexecution is non-existent. However, in terms of actual meta-algorithmic performance,we would certainly expect a difference between local and distributed execution. Wefeel that the two most pertinent parameters of interest are the amount of time thatindividual target algorithm evaluations take (the cutoff time), as well as the numberof runs scheduled at any one time to the Target Algorithm Evaluator (the batch size).A batch size of 1 captures the Online Single Execution use case described in Section3.1.4, in which the application submits runs to the Target Algorithm Evaluator oneby one. 
As the batch size grows, we expect behaviour to more closely follow Online Batch Execution, before finally resembling Offline Batch Execution, where a large number of runs is scheduled in a single batch.

In this experiment, we examined how long 128 runs take to execute for cutoff times from {2^-6, 2^-5, ..., 2^4} seconds and batch sizes from {1, 2, 4, ..., 128}. A sample point consists of a combination of cutoff time and batch size, and we measured the time from immediately before the first evaluateRun() call to when the last call to evaluateRun() returns (corresponding to all the runs being completed). Each combination of settings was measured 20 times, for a total of 1 980 sample points. We compared the time taken by the CommandLineTargetAlgorithmEvaluator detailed in Section 3.4 with that of the MySQLTargetAlgorithmEvaluator detailed in Section 3.5. The CommandLineTargetAlgorithmEvaluator was configured to permit up to 16 runs to be executed concurrently. The MySQLTargetAlgorithmEvaluator used a distinct pool for each sample, and simultaneously had 16 workers scheduled to process the jobs. Finally, for the MySQLTargetAlgorithmEvaluator version, we set the client poll frequency to 0.5 seconds and the worker poll frequency to 1 second. The target algorithm being executed is a Python application that busy-waits until the required cutoff time has passed. Target algorithm runs are limited by wall time rather than system time, because at the smaller scales system time is a relatively coarse measure (see Bryant and O'Hallaron [2002]); furthermore, at these small scales (~15 ms), the interference from polling every 500 ms is expected to be negligible.

Each sample was scheduled on a cluster in shuffled order, and then processed in FIFO order. The same database and file system was used for all points, and there is a potential for interaction effects to occur, but we expect that these effects would be minor and mitigated by the randomized order. This experiment was executed on the META cluster at the University of Freiburg, Germany, whose specifications are outlined in Table 3.1.

Operating System: Ubuntu Server 14.04.1 LTS x86-64
CPU: 2× Intel Xeon Processor E5-2650 v2 (8 cores / CPU)
Memory: 64 GiB
Kernel Version: 3.13.0-24-generic
Java Version: 1.7.0_45
Table 3.1: System configuration for performance comparison between distributed and local execution.

Figure 3.19 on page 34 outlines the processing time versus the cutoff time for a number of different batch sizes. Several aspects of the graph deserve to be mentioned:

1. For the CommandLineTargetAlgorithmEvaluator, the overhead of target algorithm execution seems to be on the order of 2^-4 (about 0.06) seconds, judging by how the processing time levels out for smaller cutoff values.

2. There is negligible performance increase in scheduling more jobs than the available number of cores with the CommandLineTargetAlgorithmEvaluator, judging by the fact that the processing time only decreases up until 16 runs are scheduled concurrently.

Figure 3.19: Processing time versus cutoff time for the CommandLineTargetAlgorithmEvaluator

Figure 3.20 on page 35 shows the outcome of an analogous experiment for the MySQLTargetAlgorithmEvaluator, from which we make the following observations:

1. The overhead of target algorithm execution seems to be on the order of 2^-2 (i.e., 0.25) seconds.

2. Processing time shows a similar improvement as the batch size increases, but the improvements continue past 16, especially for small cutoff times. This is to be expected: with a batch size of 16, when a worker completes a run and polls for the next one, none are waiting, because the client must first poll for completion and then submit the next run. With larger batch sizes, the utilization improvement continues. Additionally, the workers will automatically adapt the number of runs they poll from the database in a single trip, if they detect that the runs are very short.

3. The processing time for the MySQLTargetAlgorithmEvaluator has much more variance than that for the CommandLineTargetAlgorithmEvaluator.

Figure 3.20: Processing time versus cutoff time for the MySQLTargetAlgorithmEvaluator

Another way of looking at the same data is to consider the overhead of processing the runs, defined as Actual Time / Expected Time. In this calculation, we account for the 16 available cores, so for example, assuming the cutoff time is 1 second, the expected time to complete 128 runs with a batch size of 2 is 64 seconds, with a batch size of 16 is 8 seconds, and with a batch size of 64 is also 8 seconds.

Figure 3.21 on page 36 shows the overhead as batch size increases. From this data we notice that, except for the 0.0625 line, which has a lot of variation, the overhead is independent of batch size and is less than 2× for cutoff times ≥ 125 ms.

Figure 3.21: Overhead for the CommandLineTargetAlgorithmEvaluator

The corresponding plot for the MySQLTargetAlgorithmEvaluator is shown in Figure 3.22 on page 37. We notice that in this case the overhead is far greater, exceeding 10× in the slowest cases. Here, batch size does seem to affect the overhead significantly; unexpectedly, up until a size of 16 the overhead seems to consistently increase, and afterwards it decreases as expected.

Figure 3.22: Overhead for the MySQLTargetAlgorithmEvaluator

In general, the results from these experiments suggest that for very short runtimes there is a considerable amount of overhead introduced by the MySQLTargetAlgorithmEvaluator, but this can be mitigated by increasing the batch size.

In practice, better performance than seen in our experiments is achievable, especially when it becomes possible to increase the batch size further (as is fairly common when using the smac-validate utility to submit > 100 000 runs in a single batch). To a limited degree, it is also possible to improve performance by decreasing the polling intervals, but this comes at the cost of increased load on the database. The MySQLTargetAlgorithmEvaluator is not ideal for Online Concurrent workloads with small cutoff times, and another distributed approach that does not use polling mechanisms would be preferred. Anecdotally, however, the fact that all of this is done via polling of the database has provided a considerable advantage to meta-algorithm designers, as this approach provides a transparent and easily modified view of the internal state of the application in a persistent form.

3.6 Modification & Customization

In this section we outline how TargetAlgorithmEvaluator usage can be customized and changed to suit the particular needs of the meta-algorithm designer and the user. In Section 3.6.1, we outline how new TargetAlgorithmEvaluator implementations can be created by users and slotted into existing applications using AEATK.
In Section3.6.2, we provide an overview of other TargetAlgorithmEvaluator implementations,and finally, in Section 3.6.3, we outline other parts of AEATK that provide commonfunctionality needed by all implementations.3.6.1 New ImplementationsThere are a number of reasons why a new implementation of aTargetAlgorithmEvaluator may be necessary:1. It can be useful to debug an algorithm by, for instance, having a specific sequenceof results returned.2. In some cases, the overheads associated with existing implementation may bevery large. For instance, for very short runtimes, it can be more efficient todirectly execute the target algorithm within the same process instead of doing asystem() call6.3. The target algorithm may not lend itself easily to wrapper execution; for instance,it might be a remote process, or something that prompts the user for input.4. To interface with other meta-algorithmic libraries.AEATK supports a plugin style architecture based on the Service Provider Interface(SPI) in Java. To get an instance of a TargetAlgorithmEvaluator in AEATK, oneinvokes the TargetAlgorithmEvaluatorBuilder.getTargetAlgorithmEvaluator()method and supplies a custom name, for instance CLI for the implementation discussedin Section 3.4 or MYSQLDB for the implementation discussed in Section 3.5. SPI is usedto map this name to the specific implementation, which is then loaded dynamically.Figure 3.23 shows the required classes (on the right) needed to create a new imple-mentation. The actual implementation is shown in the first row. The second row6This is only applicable if the target algorithm is written in Java and can be evaluated directly, oris executable in C/C++ using the Java Native Interface (JNI)38Figure 3.23: UML for new TargetAlgorithmEvaluator implementationsis an implementation for an Abstract Factory design pattern (Gamma et al. [1994]),which can instantiate the TargetAlgorithmEvaluator implementation. The third rowallows the user to specify options applicable to the TargetAlgorithmEvaluator im-plementation, for instance in the CommandLineTargetAlgorithmEvaluator an optionexists to set the number of concurrent executions.As will be discussed in Section 3.7.1, this name can be specified in the commandline. This allows someone to implement a new TargetAlgorithmEvaluator andthen leverage existing meta-algorithmic techniques without needing to make anymodifications to source code.Supporting all of the features and mechanisms outlined in Section 3.3 suchas asynchronous execution, observation, and early termination of runs can ap-pear daunting, but AEATK provides mechanisms for simplifying this task. Onemechanism that AEATK provides to simplify the task of implementing newTargetAlgorithmEvaluator is by supplying useful base types which can be ex-tended (others will be discussed in Section 3.6.3). First note that a newTargetAlgorithmEvaluator need only implement one of evaluateRunAsync() andevaluateRun(). If evaluateRunAsync() is implemented, then evaluateRun()can always simply call evaluateRunAsync() and wait on a semaphore that willbe triggered in a callback’s onSuccess() method on completion. Alternatively,if evaluateRun() is implemented, evaluateRunAsync() can simply start an-39other thread and in this thread call evaluateRun()7. 
AEATK provides the ab-stract subtype AbstractSyncTargetAlgorithmEvaluator, which can be extendedand only requires the evaluateRun() method to be implemented, and the ab-stract subtype AbstractAsyncTargetAlgorithmEvaluator, which only requires theevaluateRunAsync() method to be implemented.Additionally note that our conceptual model from Section 3.1.3 allowed us toview a run as a function evaluation, and it should be obvious that meta-algorithmswould generally be agnostic to how long the evaluation took (This would not betrue for instance if the API returned a token, and expected the meta-algorithm topoll this token for updates, in this world the meta-algorithm is very much awareof time.) If the evaluation were instantaneous there would be no reason to notifythe observer, and consequently notifying the observer is optional. Additionally, asmentioned in Section 3.3, the kill() method is advisory and an implementation maychose to implement this method only partially8, if at all. Since implementations mayat their discretion skip observer functionality and kill() method, and can rely onthe library to handle asynchronous requests new implementations can be as simpleas implementing a single procedural method. Consequently, while AEATK supportsa flexible execution framework, users of custom meta-algorithms are not exposed tocomplexity beyond that needed for their purposes.3.6.2 Existing ImplementationsBeyond local execution outlined in Section 3.4 and distributed execution in Section3.5, there are several additional implementations of the TargetAlgorithmEvaluatorinterface available in AEATK.There are a number of TargetAlgorithmEvaluator implementations that areuseful for debugging (names in brackets). For instance, there are implementationsthat return a constant response (CONSTANT), allow the user to specify a predefinedsequence of responses (PRELOADED), or return a randomized response (RANDOM). Anotherimplementation (BLACKHOLE) simply drops asynchronous requests and is useful forunit testing.A number of TargetAlgorithmEvaluator have more general applicability. TheTargetAlgorithmEvaluator (HAL) is used by meta-algorithm implementations when7It should be noted that this solution is generally not scalable to a large number of asynchronousthreads.8An example of a partial implementation, is a TargetAlgorithmEvaluator that cannot actuallyterminate active runs, but can terminate those runs in a batch that have not actually physicallystarted executing.40running from within HAL. Another implementation (ANALYTIC) contains some closed-form analytic test functions in Java, for instance Branins’ function (see e.g., Molga andSmutnicki [2005]). Finally, an implementation exists (IPC) that can communicate withanother process via UDP or TCP/IP; typically, this is used as a way of interacting withother languages, or across the network. Finally another implementation (SURROGATE)allows users to build a model of their target algorithms performance and use the modelin place of the actual algorithm, similar to work done in Eggensperger et al. [2015].3.6.3 Execution CustomizationWhen actually performing target algorithm evaluations, a number of problems canarise. In some cases, these problems are entirely within the purview of the actualimplementation, such as the problem of process leakage described in Section 3.4.3.In other cases, the problems can be fairly general such as evaluations that reportCRASHED sporadically or that are exceeding their requested cutoff time. 
Alternatively,there are features that are useful for any and all TargetAlgorithmEvaluator im-plementations, such as caching, or logging information about runs being processed.Consequently, there are issues and features that are general and apply regardless ofwhich TargetAlgorithmEvaluator implementation or meta-algorithm is being used.It would introduce an unnecessary burden to require that every implementation ofthe TargetAlgorithmEvaluator interface deal with these cases explicitly. At thesame time, we do not want to push the complexity of dealing with these issues ontothe meta-algorithm designer either. Ultimately, we want to deal with this at a layerbetween user-supplied TargetAlgorithmEvaluator implementations, which shouldbe simple to implement, and the meta-algorithm, which should have a simple way ofdoing evaluations.The solution to this problem comes in the form of the Decoratordesign pattern (Gamma et al. [1994]). After retrieving the requestedTargetAlgorithmEvaluator implementation, it is modified by applying a num-ber of decorators to provide additional functionality or properties. One suchdecorator, shown in Figure 3.24, automatically retries runs that are crashed auser defined number of times. As mentioned in Section 3.6.1, invoking theTargetAlgorithmEvaluatorBuilder.getTargetAlgorithmEvaluator() method re-trieves the TargetAlgorithmEvaluator instance; By default this instance is decorated14 times, and within the AEATK API, there are ∼40 decorators available that servedifferent purposes, most of which can be configured in the code and/or by the uservia the command line (see Section 3.7.1).41Figure 3.24: Sequence diagram showing how CRASHED runs can be dealt withThe decorators in AEATK correspond (loosely) to the following categories, a fullreference is available in Appendix A:Debugging These decorators aid in debugging applications. For instance, someallow users to control logging of every evaluation, and will check that theTargetAlgorithmEvaluator is properly shutdown when the program ends.Finally, others check various invariants, such as that the client is not ask-ing for duplicate target algorithm evaluations in the same request or thatthe TargetAlgorithmEvaluator implementation is not misbehaving (e.g., aTargetAlgorithmEvaluator could due to a concurrency bug return the wrongAlgorithmRunResult objects).Functionality These decorators implement some added functionality; for instance,one of the decorators implements the getOutstandingEvaluations() method(instead of requiring the base-type to do it) and another allows transformingthe response values according to any analytic function. Yet another decoratorsimulates delays to allow for reproducibility and debugging (e.g., if a run reportsthat it took 5 seconds to execute, this decorator will wait, if necessary, to ensurethat it took 5 seconds from request to response).Helpers These decorators are viewed as providing generally helpful functionality.One decorator terminates runs that have taken 10 times their cutoff time toexecute. Another decorator retries crashed runs, as described previously. Anotherdecorator will notify observers of the wall time that runs have taken to execute,if the target algorithm evaluator is not aware of the actual runtime (e.g., if thewrapper does not use the modified version of runsolver). 
Another decorator42always calls the observer with the final result (this ensures that if key logic ina meta-algorithm is done in an observer, and the TargetAlgorithmEvaluatordoes not implement observation, it works and appears as if it was a cached runwith observation). Another decorator produces a report of utilization overtimeof the TargetAlgorithmEvaluator.Resource These decorators generally control the flow of execution. Several decoratorsimplement caching (in memory and file based). Another decorator limits thenumber of requests that can go past it at the same time (imagine wanting to do1 000 runs simulating 16 cores with random response values; using this decoratorand the one that simulates delays would achieve this effect). Another decoratorallows runs to be submitted as “low priority” (will only run if nothing elseneeds to). Another decorator ensures that the evaluateRunAsync() methodnever blocks. Finally, another decorator submits requests to several decoratedimplementations that depend on a configurable policy (e.g., one might run analgorithm for 1 second locally before submitting it to the database).Safety These decorators are concerned with ensuring safety and correctness. Dec-orators here may chose to treat crashes as aborts, or if the first run crashes,treat that as an abort (most of the time if the first run crashes, then everyrun crashes, and something is broken). Other decorators check that instancesalways report consistent results for problem instance satisfiability (either spec-ified in a file, or just based on other runs). Another, will log warnings if theoverhead detected in the target algorithm is high. Finally, another will warnif the TargetAlgorithmEvaluator does not seem to be making progress (themost common reason for this to occur is not scheduling workers when using theMySQLTargetAlgorithmEvaluator described in Section 3.5).3.7 Easy DistributionThe final aspect that AEATK assists application developers with is easy distributionof meta-algorithmic procedures. McConnell [2004] notes that once a simple programhas been completed, it takes twice as much work to release it for general consumption(citing Brooks [1995] & Shull et al. [2002]). Admittedly, AEATK (and, to a lesserextent, meta-algorithmic techniques in general) is primarily research focused; however,there is a strong desire for meta-algorithmic techniques to see wider adoption in bothacademia and industry. To do this, meta-algorithmic techniques should be reliable,43robust and easy to use. In previous sections, we have talked about reliability androbustness, and in this section we will discuss two aspects of using AEATK applicationsfrom the perspective of an end user: Supplying input to the meta-algorithm on thecommand-line in Section 3.7.1 and its output in Section Command-line InterfaceIn Sections 3.5 and 3.6.1, we alluded to being able to change theTargetAlgorithmEvaluator fairly easily, and while it is easy to envision howthis could be done in code, it is unclear how a user could simply create a newTargetAlgorithmEvaluator implementation and then not have to change themeta-algorithm implementation. Additionally, in Section 3.6.3, we mentioned ahost of options that can provide additional functionality that may be of use toTargetAlgorithmEvaluator implementers, meta-algorithmic designers, but also toend users of those meta-algorithms. 
Both selection of the TargetAlgorithmEvaluatorand configuration of it and the decorators are controlled through a standardcommand line interface library included with AEATK called JCommander (http://www.jcommander.org).JCommander allows developers to declare a standard Java object and then useannotations to have those fields exposed on the command line. JCommander alsoallows these objects to be composed (in JCommander terminology, they are parameterdelegates). AEATK supplies a series of delegates that can allow for a large amount ofobjects and choices to be specified on the command line as needed. We also modifiedJCommander to provide additional support for reading options files (including defaultfiles located in the user home directory). Finally, it was also extended to providemore flexibility in displaying help to the user. For example, in JCommander optionscan either be hidden or not, but in our modified version different levels of help areavailable to the user of an AEATK application by using the option --help-levelwith values: BASIC, INTERMEDIATE, ADVANCED, or DEVELOPER.In Section 3.5 (page 32), we noted that to changing an existing ap-plication to leverage parallel resources can be done with a single argu-ment, and at this point, we can see how and when this is true. NewTargetAlgorithmEvaluator implementations can specify options (third row of Figure3.23). These options are then read by JCommander as well as the name from theTargetAlgorithmEvaluatorFactory.getName() method. On the command-line, theuser can select this TargetAlgorithmEvaluator by using its name with the --tae pa-rameter, which selects base-type of the TargetAlgorithmEvaluator returned by the44TargetAlgorithmEvaluatorBuilder.getTargetAlgorithmEvaluator() method.3.7.2 OutputWhile the previous section detailed how the user provides input to AEATK applications,AEATK also provides some support for handling output of an application. Fordirect user feedback, most applications rely on logging; AEATK uses SLF4J (http://www.slf4j.org/), a logging library in Java that acts as a facade and front end to anumber of other logging libraries. Similar to how AEATK can dynamically changethe TargetAlgorithmEvaluator an application is using, SLF4J allows the user tocontrol which logging library they use. By default, AEATK also supplies logback(http://logback.qos.ch/) and some parameter delegates to control the logging options(so users can for instance change the log level of an application). It is importantto note that the coupling of SLF4J does not introduce a serious restriction on themeta-algorithm, as SLF4J allows users to switch logging frameworks at runtime.For instance, HAL uses the Apache Commons Logging, and so when an AEATKmeta-algorithm is running inside HAL, SLF4J will redirect log messages to the sameframework instead of logback.3.8 Conclusions and LimitationsIn Section 3.1.2, we outlined the principal design goals that drove the developmentof AEATK. Algorithms should be easy for users to execute – the mechanism forthis was discussed in Section 3.2 and 3.3. Meta-algorithmic designers can evaluatealgorithms with a simple method call, and yet the API also provides much richermethods; however, users only need to deal with the complexity they need. In partbecause of this philosophy, users of meta-algorithms can also easily leverage parallelresources, as discussed in 3.5 and 3.7.1. 
Additionally, the plugin (see Section 3.6.1) anddecorator structure (see Section 3.6.3) of AEATK allows meta-algorithms leveragingAEATK to be versatile and adaptable to users needs. Finally, AEATK provides somesupport for making redistributable applications described in Section 3.7, as a toolkitusers can pick and chose the functionality they need (MySQL and JCommander areoptional), and the coupling to the logging framework, SLF4J, is arguably not likely tobe a serious impediment to future users.The reader might wonder where all the complexity related to algorithm execu-tion went; the answer to this question is that, to a large degree, it has been hiddenfrom both the meta-algorithm designer and the TargetAlgorithmEvaluator imple-45menter. This is aided by design principles such as Inversion of Control, which thecallback and observer structure follow, and leads to more extensible code. Indeed,the simplicity that both groups of AEATK users benefit from, is offset by the com-plexity of implementing some decorators. For example, the decorator which retriesruns if they are CRASHED is surprisingly complex, because of the callback mecha-nism. Additionally, reasoning about decorator order can be surprisingly complex. Forinstance, the waitForOutstandingEvaluations() method may not behave as themeta-algorithm expects, if another decorator transparently reschedules runs (e.g., be-cause they crashed and a decorator is retrying them). Most meta-algorithm designersand TargetAlgorithmEvaluator implementers are not exposed to this, however, asthe ordering is an internal aspect of AEATK, and this issue only arises when newdecorators need to be created that cannot be inserted near the base or at the top ofthe decorator chain.Much of AEATK grew organically driven by porting SMAC to Java, and also byneeds of users on other projects. As a general rule, no idea would be implemented, nomatter how good, unless there was an actual need for it (similar to the principle inextreme programming). The four best ideas that have not been implemented at thispoint are:1. Allowing meta-algorithms to take multiple samples per point as discussed inSection A distributed TargetAlgorithmEvaluator implementation that does not relyon a MySQL server being available.3. Allowing the configuration of a target algorithm to be specified using a Mapinstead of an instance of the ParameterConfiguration class, would decouplethe execution framework from the PCS format.4. Allowing decorators for the TargetAlgorithmEvaluator to be loaded fromplugins9.The first three of these ideas are not particularly onerous to implement, but thefinal one would require some API changes and careful consideration.In general, however, AEATK has been successful, and it has facilitated work thatgave rise to several publications, including in Thornton et al. [2013a] and Hutter9One can achieve this in some cases by creating a new TargetAlgorithmEvaluator as in Section3.6.1, and then supplying an option that loads the actual base type manually, but this is coarse anddoes not give you the ability to select where in the decorator structure it appears46et al. [2013] where AEATK facilitated data collection.The reference implementationsfor detecting parameter importance via ablation analysis Fawcett and Hoos [2013],and functional ANOVA Hutter et al. [2013] were also written in AEATK. 
Finally,beyond forming the basis for the re-implementation of SMAC, efforts are under-way byothers to use AEATK as the reference implementations in SatZilla by Chris Cameronand Alexandre Fre´chette, ParamILS 2.0 by Aymeric Blot and SATFC by AlexandreFre´chette.47Chapter 4SMAC in AEATK4.1 IntroductionSequential Model-based Algorithm Configuration (SMAC) (Hutter et al. [2011b]) isan algorithm which solves the algorithm configuration problem defined in Section2.3. SMAC was original implemented in Matlab; in this chapter, we describe theimplementation of the SMAC algorithm in AEATK & Java (hereafter referred to asthe Java implementation). The Java version is now the reference implementation forSMAC and has been used to configure state-of-the-art algorithms for a broad range ofcombinatorial problems (see e.g., Hoos et al. [2012], Lindauer et al. [2015], Seipp et al.[2015]), and machine learning tasks (see e.g., Domhan et al. [2014], Thornton et al.[2013a], Eggensperger et al. [2013]).4.2 Algorithm OverviewAs discussed in Section 2.3.2, SMAC traces its origins to the EGO algorithm (Joneset al. [1998]), which (after an initial design) alternated between building a model andevaluating the most promising point found from the model. Algorithm 4.1 presentsthe pseudo-code for the SMAC algorithm in more detail (it is adapted from Hutteret al. [2011b] but also includes details from Hutter et al. [2011a]). At the highest level,SMAC uses a single run of the default as an initial design, and then alternates betweenbuilding models (random forests) and evaluating promising configurations selectedfrom it. Promising configurations are selected from the model by performing ten localsearches using a one-exchange neighbourhood1. The results of these local searches are1For continuous parameters, the set of neighbours for a parameter T with value x normalized to(0,1) is sampled 4 times from x′ ∼ N (x, 0.2), with rejection if x′ /∈ [0, 1].48mixed with 10 000 random samples and sorted in order of best acquisition functionvalue2. These 10 010 configurations are there interleaved with randomly sampledconfigurations, and SMAC evaluates these configurations until enough time has passedand it decides to rebuild its model. Algorithm 4.1 shows high level pseudocode forSMAC.4.3 AEATK ImplementationThe SMAC project heavily relies on the AEATK library for target algorithm evaluation,input parsing and output. Existing parameter delegates (see Section 3.7.1) takes careof most of the input validation and typically catch problems with inputs early. Theprincipal functionality of SMAC is distributed amongst five classes:AbstractAlgorithmFramework contains the core functionality of the SMAC algorithm(as presented in Algorithm 4.1). By itself, this class implements the ROARalgorithm specified in Hutter et al. [2011b], where no model is learnt andconfigurations are selected randomly.SequentialModelBasedAutomaticConfigurator is a subtype ofAbstractAlgorithmFramework and contains the logic for building the modeland performing local search.Validator performs validation of the configurations found during the execution ofSMAC, typically after SMAC has executed but also as a stand alone utility.This utility can be used for not just SMAC, but also ParamILS (Hutter et al.[2007b]) and GGA (Ansotegui et al. [2009]).SMACExecutor is the program entry-point and parses command-line arguments.SMACBuilder builds all of the necessary dependencies for running SMAC (e.g., retriev-ing the instances, setting the TargetAlgorithmEvaluator, etc.). 
Both AEATKand SMAC following a coding convention called constructor-based dependencyinjection3. While this in general leads to more portable code, it has the side2Typically exponential expected improvement as defined by Hutter et al. [2009a].3Dependency injection means that a class is given all dependencies it needs by its caller, anddoes not resolve the dependencies internally (e.g., the AbstractAlgorithmFramework is given theTargetAlgorithmEvaluator from outside). Constructor-based means that all dependencies are sup-plied in the constructor, instead of using setter methods to supply them. An advantage of thisapproach is that once objects are constructed they are always in a usable state and have all necessarydependencies.49Figure 4.1: Sequence diagram depicting interactions between the main classes in theJava implementation of SMAC.effect of making the constructors quite large. The SMACBuilder class simplifiesconstruction of the AbstractAlgorithmFramework andSequentialModelBasedAutomaticConfigurator objects.Figure 4.1 shows a sequence diagram which represents how these classes interact.Most of the core functionality depicted in Algorithm 4.1 is contained within the run()and intensify() methods in the AbstractAlgorithmFramework class.504.4 Differences Between MATLAB and JavaImplementationsWhile both implementations follow the algorithm outlined in Algorithm 4.1, there aresome minor differences between them4:1. Following the ideas outlined by Hoos [2012a], the Java version exposes almostevery design parameter on the command line, although most users are shieldedfrom this by AEATKs support for different levels of options as mentioned inSection The Java version supports independently seeding the different pseudo-randomnumber generators of different components (e.g., seeds for building models,random configuration generation, etc.), which is particularly useful for localizeddebugging.3. The MATLAB version has support for many types of models, such as GaussianProcesses, whereas the Java version is coupled very tightly to the Random Forestmodels.4. The Java version supports pre-loading the random forest model with existingdata before running. This can be beneficial if existing data is available, a featurethat was used for instance by Feurer et al. [2015].5. The Java version supports tracking where and when configurations were instan-tiated and how far they progressed, which enables one to answer queries likehow many of the final incumbents were selected via expected improvement, andhow many were selected as a result of random search.6. The Java version is better documented, both at the code level but also for users.It includes a quick start guide, example scenarios and a comprehensive manual.7. The MATLAB and Java versions have different command line interfaces, bututilize the same file formats (similar to ParamILS)5.4The MATLAB and Java version share the exact same random forest code, as the MATLABversion uses the implementation written in Java.5Technically, the feature file format is incompatible, since the Java version requires a header toname all instance features (whereas in MATLAB one can just supply a vector of doubles). Thischange was mandated after discovering instance sets with features sorted in different order.518. If the MATLAB version re-executes a run with a longer cap-time it forgets aboutthe previous execution entirely, and the preceding run does not count towardstime limits. In Java, the preceding run is properly counted.9. 
In contrast to the Matlab version, the Java version can be run easily, withoutthe need for expensive Matlab licenses or time-consuming setup of the MatlabRuntime Environment. This has facilitated the wide-spread use of SMAC.Although now included in AEATK, the Java version of SMAC includes utilitiesfor detecting problems (e.g., missing instances, missing features, syntax errors in files,etc...) with an experiment before they are begun, and can do this across multipleexperiments at the same time. Another included utility allows users to call their targetalgorithm exactly as SMAC would, to help debug issues that may not be obvious whenusers simulate calling their algorithm from the command line/shell.64.5 Experimental ComparisonIn this section we compare the performance of the MATLAB and Java implementationsof SMAC. For reference we use a subset of the scenarios presented in Hutter et al.[2011b] for the SAPS (Hutter et al. [2002]), and SPEAR (Babic´ and Hutter [2007]) SATsolvers, that are contained with the AClib project (http://www.aclib.net/). Additionally,our runs for both the Matlab and the Java version differ in that they use the adaptivecapping mechanism outlined in Hutter et al. [2011a], which is the standard experimentalset-up for runtime optimization. Finally, we included the SAPS-QWH scenariointroduced in Hutter et al. [2009b]. Table 4.1 outlines the parameters for eachconfiguration scenario. For both MATLAB and Java implementations of SMAC, theTuner Time is calculated as the sum of the CPU time of SMAC, plus the greater of0.1 and the reported CPU time of each individual target algorithm run7.4.5.1 Computational EnvironmentThis experiment was performed using the META cluster at the University of Freiburg,Germany. Table 4.2 lists the hardware and software resources of the machines in thiscluster. Multiple runs of each scenario were performed (the actual numbers are inSection 4.5.2). For submission to the Sun Grid Engine clustering platform, runs of both6A common example is when users are relying on features of their shell, for example quotationmarks or the “∼” character expecting it to be turned into their home directory.7Other configurators may calculate this differently.52MATLAB and Java SMAC were aggregated into groups of 16 that would be dispatchedto an entire node. Then, each grouping of 16 runs were executed concurrently, whicheach individual run being assigned a distinct processor affinity (i.e., run n of SMACand its target algorithm executions must all compete for a single core on the machine).4.5.2 ResultsFigures 4.2, 4.3, and 4.4 show the value of the objective function on the training setover time for both configurators. Qualitatively, the graphs look very similar, and whilethere are some noticeable differences, neither configurator tends to dominate the otherbased upon a visual inspection of the graphs.Tables 4.3 and 4.4 outline the final median performance for both configurators.We use the Mann-Whitney U test with a Bonferroni correction of 15 to test fora distributional difference at a significance level of 0.01 and find only 4 scenarioswhere they diverge8. In the cases where there is a statistically significant differencewe note that the difference in medians occurs at the 10s of milliseconds level forIBM-Spear-med, the millisecond level for SWV-Spear-med and SWV-Spear-q095 andat the sub millisecond level for IBM-Spear-q025. At these levels the mechanismused by these scenario to determine runtime are very noisy (see Chapter 9 of Bryantand O’Hallaron [2002]). 
Consequently, we believe that the MATLAB and Javaimplementations of SMAC are comparable.4.5.3 Conclusions & Future WorkIn this chapter, we outlined the re-implementation of the SMAC algorithm from Hutteret al. [2011b] in Java, which now is the reference implementation of SMAC. There areseveral interesting directions in SMAC which one might consider exploring. The firstis to meta-configure SMAC (i.e., tune SMAC’s parameters, many of which were onlyset heuristically), as much of the parameter space is exposed on the command line,and afterwards to conduct an analysis of parameter importance via either the methodof Fawcett and Hoos [2013] or Hutter et al. [2013], to improve SMAC’s performance.Another important question is how much of the improvement SMAC discovers in aconfiguration space is due to the choice of instance selection and its use of randomconfigurations.8For the skeptical reader, if a significance level of 0.05 was desired and no Bonferroni correc-tion is applied, then only the performance on two additional scenarios, SWGCPq075-PAR1-saps andSWGCPq095-PAR1-saps, would be deemed significantly different.53;Figure 4.2: Median Performance and Interquartile range for SAPS Single InstanceScenarios.54Algorithm 4.1: Simplified SMAC AlgorithmR keeps track of all target algorithm runs performed so far and their performances (i.e.,SMBO’s training data {([θ1,x1], o1), . . . , ([θn,xn], on)}), M is SMBO’s model, ~Θnew is a listof promising configurations, and tfit and tselect are the runtimes required to fit the model andselect configurations, respectively.Input : Target algorithm A with parameter configuration space Θ; instanceset Π; cost metric cˆOutput : Optimized (incumbent) parameter configuration, θinc1 [R, θinc] ← Initialize(Θ, Π);2 repeat3 M← learnModel(R);4 ~Θnew ← selectConfigurations(M);5 tintensify ← duration of previous function call6 //intensify() is in-lined here7 i← 08 foreach θnew ∈ ~Θnew do9 //challengeIncumbent() is in-lined here10 R ← ExecuteNewRunForIncumbent(R,θinc)11 //Number of Runs to do per stage12 N ← 1;13 while true do14 S = {(pii, si)}Ni=1 ← NextPISPsForChallenger(N,θnew,θinc)15 foreach (pi, s) ∈ S do16 (pi, s, κ)← ComputeCapTimes((pi, s))17 R ← ExecuteRun(R, θnew,pi ,s ,κ )18 T = {(pii, si)}Mi=1 ← GetPISPSRunForConfig(θinc)V = {(pii, si)}Li=1 ← GetPISPSRunForConfig(θnew)19 oinc = ComputeObjectiveOverPISPS(θinc, V )onew = ComputeObjectiveOverPISPS(θnew, V )20 if onew > oinc then break ;21 else if T = V then θinc ← θnew; break ;22 else N ← 2 ·N ;23 if time spent in this call to this procedure exceeds tintensify and i ≥ 2then break ;24 i← i+ 125 until total time budget for configuration exhausted;26 return θinc; 55Scenario Instances Features Cutoff (s) Tuner Time (s)IBM-Spear-med 1 0 5.0 1800IBM-Spear-q025 1 0 5.0 1800IBM-Spear-q075 1 0 5.0 1800QCPmed-PAR1-saps 1 0 5.0 1800QCPq075-PAR1-saps 1 0 5.0 1800QCPq095-PAR1-saps 1 0 5.0 1800QWHmed-PAR1-saps 1 0 1.0 3600SWGCPmed-PAR1-saps 1 0 5.0 1800SWGCPq075-PAR1-saps 1 0 5.0 1800SWGCPq095-PAR1-saps 1 0 5.0 1800SWV-Spear-med 1 0 5.0 1800SWV-Spear-q075 1 0 5.0 1800SWV-Spear-q095 1 0 5.0 1800QCP-PAR10-saps 955 125 5.0 18000QCP-spear-PAR10 976 125 5.0 18000SWGCP-PAR10-saps 1000 125 5.0 18000SWGCP-spear-PAR10 1000 125 5.0 18000Table 4.1: Configuration scenarios for MATLAB and Java SMAC Implementations.Additionally, all of the multi-instance scenarios used the continuous parameter con-figuration spaces. 
For single instance scenarios involving the SPEAR solver, theconfiguration space was discretized.Operating System Ubuntu Server 14.04.1 LTS x86-64CPU 2×Intel Xeon Processor E5-2650 v2 (8 cores / CPU)Memory 64 GiBKernel Version 3.13.0-24-genericJava Version 1.7.0 45Table 4.2: System configuration for all configuration scenarios56Java MATLABScenario Unit Default Median # Median #IBM-Spear-med [·100 s] 2.300 1.593 160 1.513 160IBM-Spear-q025 [·10−1 s] 5.516 3.693 160 3.695 160QCPmed-PAR1-saps [·10−2 s] 3.396 2.561 160 2.534 160QCPq075-PAR1-saps [·10−1 s] 2.500 0.919 160 0.921 159QCPq095-PAR1-saps [·100 s] 3.934 0.463 160 0.481 160QWHmed-PAR1-saps [·10−2 s] 4.212 2.738 32 2.838 31SWGCPq075-PAR1-saps [·100 s] 3.024 0.110 160 0.113 144SWGCPq095-PAR1-saps [·100 s] 2.687 0.123 160 0.127 160SWV-Spear-med [·10−1 s] 6.631 3.710 160 3.724 160SWV-Spear-q075 [·100 s] 2.644 0.363 160 0.363 159SWV-Spear-q095 [·100 s] 5.000 0.506 160 0.507 160Table 4.3: Median training performance of MATLAB and Java implementations ofSMAC, along with the default performance for single-instance scenarios from Hutteret al. [2011b]. The number of independent runs for each is denoted in the # column.Some runs of MATLAB did not complete successfully. Bold-faced entries representconfigurator performance that is significantly better under a Mann-Whitney U Testwith significance level of 0.01 (a Bonferroni correction was applied so the individualp-values are less than 6.667× 10−4).Java MATLABScenario Unit Default Median # Median #QCP-PAR10-saps [·100 s] 9.367 3.484 32 3.557 32QCP-spear-PAR10 [·100 s] 1.897 0.653 32 0.616 31SWGCP-PAR10-saps [·100 s] 15.094 0.107 32 0.108 32SWGCP-spear-PAR10 [·100 s] 6.987 6.027 32 6.018 31Table 4.4: Median performance of MATLAB and Java implementations of SMAC,along with the default performance for multi-instance scenarios from Hutter et al.[2011b]. The number of independent runs for each is denoted in the # column.Some runs of MATLAB did not complete successfully. Bold-faced entries representconfigurator performance that is significantly better under a Mann-Whitney U Testwith a significance level of 0.01 (a Bonferroni correction for 15 samples was applied sothe individual p-values are less than 6.667× 10−4).57Figure 4.3: Median Performance and Interquartile range for SPEAR Single InstanceScenarios.58Figure 4.4: Median Performance and Interquartile range for Multi-Instance Scenarios59Chapter 5Parallel AutomaticConfiguration5.1 IntroductionThere are many possible ways in which an automatic configurator can leverage parallelresources to achieve improved configuration outcomes. In this chapter, we first examinetwo current approaches and then explore a simple mechanism to increase parallelperformance.5.1.1 Parallel Independent RunsOne method that has been used prominently in practice to leverage parallel resourcesis the k-independent runs protocol outlined in Hutter et al. [2009b]. This has becomea standard protocol for parallelizing algorithm configuration that has been used in avariety of papers (see e.g., Hutter et al. [2007a], Hutter et al. [2010a], Fawcett and Hoos[2013], Hutter et al. [2012], Styles and Hoos [2013]). In this protocol, k independentruns with different seeds are run on parallel computing resources. Once all k runsare completed, the configuration with the best performance on the training set isselected. This approach has several benefits, including that it is easy to schedule thek runs independently in shared cluster environments, and that they need not executeconcurrently. 
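As a concrete illustration of this protocol, the sketch below selects the best of k completed configurator runs from their output files; the directory layout and file names are assumptions made for the example and do not correspond to any particular configurator's output format.

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BestOfKSelector {

    public static void main(String[] args) throws Exception {
        Path experimentDir = Paths.get(args[0]); // e.g., output directory of one scenario
        int k = Integer.parseInt(args[1]);       // number of independent configurator runs

        String bestIncumbent = null;
        double bestCost = Double.POSITIVE_INFINITY;

        for (int run = 0; run < k; run++) {
            Path runDir = experimentDir.resolve("run-" + run);

            // Hypothetical files assumed to be written by each configurator run:
            // the final training-set cost and the incumbent configuration.
            double cost = Double.parseDouble(new String(
                    Files.readAllBytes(runDir.resolve("training-cost.txt")),
                    StandardCharsets.UTF_8).trim());
            String incumbent = new String(
                    Files.readAllBytes(runDir.resolve("incumbent.txt")),
                    StandardCharsets.UTF_8).trim();

            // Lower cost (e.g., mean runtime or a PAR score on the training set) is better.
            if (cost < bestCost) {
                bestCost = cost;
                bestIncumbent = incumbent;
            }
        }

        System.out.println("Best of " + k + " runs has training cost " + bestCost + ":");
        System.out.println(bestIncumbent);
    }
}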
The power of this approach comes from exploiting the randomness inherent in algorithm configurators. Intuitively, one can imagine rolling a set of dice and taking the minimum value; the expected minimum becomes smaller the more dice you roll. This is precisely why the protocol works: it yields the largest benefits for high-variance algorithm configurators (such as ParamILS), but in general it is applicable to any configurator that is sufficiently randomized.

5.1.2 Parallel Automatic Configurator

In this approach, all computational resources are centralized and coordinated by a master thread or process. This master process schedules multiple algorithm runs in parallel and manages their results. Some configurators that leverage this are GGA (Ansotegui et al. [2009]) and D-SMAC (Hutter et al. [2012]). The advantage of this approach is that there is less (if any) redundant computation, and one is better able to leverage parallel resources (e.g., if ten algorithm runs are required for a challenger to beat the incumbent, they can be scheduled in parallel in this approach, but not in the parallel independent runs method outlined in Section 5.1.1). One drawback of this approach is that it has a single point of failure. Another drawback is that these runs are typically difficult to schedule on shared cluster systems, as they must reserve a larger portion of the available resources. Additionally, when the objective is runtime minimization, the runtimes of individual algorithm evaluations tend to exhibit large variation, which can require significant complexity to utilize the cores effectively. D-SMAC, for instance, primarily assumes that all algorithm evaluations take the same amount of time.

5.1.3 Parallel Dependent Runs

In this protocol and its instantiation with SMAC (called pSMAC), the automatic configurator runs are executed in parallel as in Section 5.1.1. However, over time these parallel runs share data. The data being shared is the data used to build the model, namely the parameter configurations and instances being run and the corresponding measured value of the objective function for each of them. The utility of sharing run data between otherwise independent runs depends greatly on the configurator selected. In the case of ParamILS, it would be of limited utility, because it would only allow ParamILS to skip runs that had already been completed by another run. Intuitively, one would expect this to happen very rarely, because each ParamILS run would typically explore different parts of the search space, not to mention that, when optimizing randomized algorithms, the evaluations themselves would likely use different pseudo-random number generator seeds. However, in SMBO methods such as SMAC, it is very easy to exploit this run data by adding it to the data used to build the model. In principle, such an approach could also be leveraged by GGA, by selectively breeding with configurations from other runs.

Conceptually, this approach has several advantages. The first is that each run has access to an amount of data closer to that of a fully parallel automatic configurator, albeit perhaps with less quality and some redundancy. In the case of SMAC, a significant portion of the runs being performed are also random evaluations. Another advantage this kind of protocol enjoys is redundancy, in that a failure of a single run or machine does not necessarily ruin the experiment. Indeed, in some of the best runs of pSMAC reported in Section 5.3, some runs ran out of memory, and yet the configuration experiment as a whole still succeeded. These runs can also be easier to schedule in a shared cluster environment, because individual configurator runs can appear and disappear at different times.
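To make the idea of sharing run data concrete, the following sketch shows one simple way in which otherwise independent configurator runs could exchange their observations through a shared directory; the CSV format, file names and classes shown are assumptions for illustration and are not the actual pSMAC/AEATK implementation (described in Section 5.2).

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class SharedRunData {

    /** One observation of the target algorithm; the field layout is an assumption. */
    public static final class RunData {
        public final String configuration;
        public final String instance;
        public final long seed;
        public final double cost;

        public RunData(String configuration, String instance, long seed, double cost) {
            this.configuration = configuration;
            this.instance = instance;
            this.seed = seed;
            this.cost = cost;
        }
    }

    private final Path sharedDir;
    private final Path myFile;

    public SharedRunData(Path sharedDir, int runId) {
        this.sharedDir = sharedDir;
        this.myFile = sharedDir.resolve("run-" + runId + ".csv");
    }

    /** Append one of our own observations to our private file. */
    public void publish(RunData d) throws IOException {
        String line = d.configuration + ";" + d.instance + ";" + d.seed + ";" + d.cost + "\n";
        Files.write(myFile, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Called periodically: read every other run's file so its observations can feed our model. */
    public List<RunData> readOtherRuns() throws IOException {
        List<RunData> result = new ArrayList<RunData>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sharedDir, "run-*.csv")) {
            for (Path f : files) {
                if (f.getFileName().equals(myFile.getFileName())) {
                    continue; // skip our own observations
                }
                for (String line : Files.readAllLines(f, StandardCharsets.UTF_8)) {
                    String[] p = line.split(";");
                    result.add(new RunData(p[0], p[1],
                            Long.parseLong(p[2]), Double.parseDouble(p[3])));
                }
            }
        }
        return result;
    }
}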
Of course, there are also some obvious disadvantages to this approach. One is that it is very possible that the dependent runs converge to a single configuration (one can imagine that, e.g., parallel runs of EGO (Jones et al. [1998]) would likely converge to the same point and not maintain enough diversity). Another disadvantage compared to the parallel automatic configurator is that the evaluation of individual configurations is still limited to a single processor.

5.2 pSMAC

pSMAC is an extension of SMAC that follows the parallel dependent runs paradigm for parallelization. Each run of pSMAC writes its own measured observations of the target algorithm to disk and reads the other runs' observations. To avoid stressing a shared file system, the other runs' observations are only read periodically. The shared runs are added to the model and can optionally be used as a cache of runs. The number of duplicate algorithm executions (and consequently the benefit of using them as a cache) is typically expected to be very low, for a number of reasons:

1. Different runs of pSMAC will likely use different pseudo-random number generator seeds, which will be treated as different runs.

2. Continuous parameters in the configuration space are unlikely to ever be selected in a way that exactly matches a previous value, as the random forest model is piecewise constant.

3. Parameters that have no significant effect on the model are unlikely to be set to the same value.

Consequently, the mechanism through which we would expect pSMAC to benefit is the ability to build models with more data than would otherwise be available.

5.3 Experiments

5.3.1 Experiment I - Performance Over Time

In this first experiment, we examine pSMAC's performance over time in comparison with SMAC on the same scenarios as in Section 4.5; in fact, the SMAC data and experimental setup are exactly the same (and so are not repeated here). The numbers in this experiment differ from those of the previous chapter, because we compare the best of k independent SMAC runs against the best of k dependent (data-sharing) runs of pSMAC (denoted pSMACxk).

5.3.2 Results

Tables 5.1 and 5.2 show the performance of pSMACx4 and pSMACx8, respectively. In general, there are only a few cases in which pSMAC beats SMAC at a statistically significant level. Interestingly, this occurs more frequently for pSMACx8 than for pSMACx4. If we validate the performance over time, as shown in Figures 5.1, 5.2 and 5.3 for pSMACx4 and Figures 5.4, 5.5 and 5.6 for pSMACx8, we can see some additional patterns. In many scenarios (e.g., QCP-PAR10-SAPS, SWGCP-q075-PAR1-SAPS, SWGCP-q095-PAR1-SAPS), there is a significant period of time during which pSMAC performs significantly better, but, because of the time budget, eventually the k independent runs of SMAC catch up. In other scenarios (e.g., SWV-SPEAR-MED), pSMAC offers no apparent benefit over SMAC, but it also does not hurt.
In some scenarios pSMAC does seem to perform63Best of SMACx4 pSMACx4Scenario Unit Default Median # Median #IBM-Spear-med [·100 s] 2.300 1.593 160 1.455 40IBM-Spear-q025 [·10−1 s] 5.516 3.693 160 3.704 40QCPmed-PAR1-saps [·10−2 s] 3.396 2.561 160 2.522 40QCPq075-PAR1-saps [·10−1 s] 2.500 0.919 160 0.918 40QCPq095-PAR1-saps [·100 s] 3.934 0.463 160 0.437 40SWGCPq075-PAR1-saps [·100 s] 3.024 0.110 160 0.109 40SWGCPq095-PAR1-saps [·100 s] 2.687 0.123 160 0.121 40SWV-Spear-med [·10−1 s] 6.631 3.710 160 3.713 40SWV-Spear-q075 [·100 s] 2.644 0.363 160 0.362 40SWV-Spear-q095 [·100 s] 5.000 0.506 160 0.504 40QCP-PAR10-saps [·100 s] 9.367 3.484 32 3.470 8QCP-spear-PAR10 [·100 s] 1.897 0.653 32 0.606 8QWHmed-PAR1-saps [·10−2 s] 4.212 2.738 32 2.768 8SWGCP-PAR10-saps [·100 s] 15.094 0.107 32 0.104 8SWGCP-spear-PAR10 [·100 s] 6.987 6.027 32 5.595 8Table 5.1: Median training performance of SMAC and pSMAC implementations.For Best of SMACx4, we performed bootstrap samplings of 4 runs from the #column, computed the minimum, and then the medium of 1000 samples. For thepSMACx4 run, we took grouped runs into batches of 4 for sharing runs, and took #independent samples from this. Bold-faced entries represent configurator performancethat is significantly better under a Mann-Whitney U Test with significance level of0.01 (because of the repeated testing we applied a Bonferroni correction and adjustedthe individual p-values used for significance of the test to less than 6.667× 10−4).worse for a significant portion of overall running time, but these are the multi-instancescenarios for pSMACx8 which have only 4 runs, although SWGCP-SPEAR-PAR10 forpSMACx4 also performs worse for a period of time on 8 runs.5.4 Experiment II - CSSC PerformanceThe Configurable SAT Solver Challenge 2014 (Hutter et al. [2014]) is a competitiondesigned to encourage SAT solvers to parameterize their algorithms so that meta-algorithmic techniques can be applied to improve performance on an instance type of64;Figure 5.1: Median Performance and Interquartile range for SAPS Single InstanceScenarios using 4 coresinterest. In this experiment, we compare the original CSSC 2014 data with a run ofpSMAC for each configuration scenario run on the same hardware.5.4.1 Experimental SetupIn the original competition, a number of instance sets were created comprising SATproblems that were grouped in four categories:• Industrial: (BMC, CircuitFuzz, IBM)• Crafted (GI, LABS, Queens)65Figure 5.2: Median Performance and Interquartile range for SPEAR Single InstanceScenarios using 4 cores• Random SAT+UNSAT (3cnf, K3, unif-k5)• Random SAT (3sat1k, 5sat500, and 7sat90)Each competing solver was run on the instance sets to determine the defaultperformance and then was configured using the configurators GGA Ansotegui et al.[2009]), ParamILS (Hutter et al. [2007b]), and SMAC (Hutter et al. [2011b]). ForParamILS, we performed four independent algorithm configuration runs on a discretizedversion of the parameter space. SMAC was used with four independent runs of thecontinuous configuration space, and four independent runs for a discretized version66Figure 5.3: Median Performance and Interquartile range for Multi-Instance Scenariosusing 4 coresof the configuration space. GGA was run once with 4 cores each on the discretizedversion of the space and the continuous version. In the CSSC, the best solver wasdetermined as the one that solved the most instances, with ties broken by the solverwith the least runtime. 
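As a small illustration of this ranking rule, and of the PAR-k scores discussed below, the following sketch scores and orders solver results; the SolverResult structure and the choice to compute the tie-breaking runtime over solved instances only are assumptions made for the example, not the competition's actual evaluation scripts.

import java.util.Comparator;

public class CsscScoring {

    /** Per-solver results for one instance set; the field layout is an assumption. */
    public static final class SolverResult {
        final String name;
        final double[] runtimes; // seconds per instance; values >= cutoff count as timeouts

        public SolverResult(String name, double[] runtimes) {
            this.name = name;
            this.runtimes = runtimes;
        }
    }

    /** PAR-k score: mean runtime with each timeout counted as k times the cutoff. */
    public static double parK(double[] runtimes, double cutoff, int k) {
        double sum = 0.0;
        for (double t : runtimes) {
            sum += (t >= cutoff) ? k * cutoff : t;
        }
        return sum / runtimes.length;
    }

    /** CSSC ranking: most instances solved first; ties broken by least total runtime. */
    public static Comparator<SolverResult> ranking(final double cutoff) {
        return new Comparator<SolverResult>() {
            @Override
            public int compare(SolverResult a, SolverResult b) {
                int solvedA = countSolved(a.runtimes, cutoff);
                int solvedB = countSolved(b.runtimes, cutoff);
                if (solvedA != solvedB) {
                    return Integer.compare(solvedB, solvedA); // more solved ranks earlier
                }
                // Tie-break on runtime over solved instances (a simplifying assumption).
                return Double.compare(totalSolvedRuntime(a.runtimes, cutoff),
                                      totalSolvedRuntime(b.runtimes, cutoff));
            }
        };
    }

    private static int countSolved(double[] runtimes, double cutoff) {
        int solved = 0;
        for (double t : runtimes) {
            if (t < cutoff) solved++;
        }
        return solved;
    }

    private static double totalSolvedRuntime(double[] runtimes, double cutoff) {
        double sum = 0.0;
        for (double t : runtimes) {
            if (t < cutoff) sum += t;
        }
        return sum;
    }
}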
Each algorithm configurator run was given the minimum of172 800 CPU seconds or 172 800 wall-clock seconds to run. Each run of the targetalgorithm was given a limit of 300 CPU seconds. After the configuration process hasbeen completed we perform validation on the same problem instances and seeds thatwere used in the CSSC competition.Using the original competition data from the CSSC competition and running onthe same hardware (see Section 4.2), we compared the original configuration runswith two runs of pSMAC (one on the discretized and the other on the continuousversion of the space). Hutter et al. [2014] reports the number of timeouts for eachsolver and their PAR1 score (i.e., the sample mean performance treating timeouts asthe maximum value), the individual configurators were optimizing the PAR10 score(i.e., the sample mean performance treating timeouts as 10 times the maximum value).Since we want to summarize performance in a single value, we also use PAR10 tocompare all configurators.67Best of SMACx8 pSMACx8Scenario Unit Default Median # Median #IBM-Spear-med [·100 s] 2.300 1.593 160 1.324 20IBM-Spear-q025 [·10−1 s] 5.516 3.693 160 3.709 20QCPmed-PAR1-saps [·10−2 s] 3.396 2.561 160 2.577 20QCPq075-PAR1-saps [·10−1 s] 2.500 0.919 160 0.913 20QCPq095-PAR1-saps [·100 s] 3.934 0.463 160 0.427 20SWGCPq075-PAR1-saps [·100 s] 3.024 0.110 160 0.109 20SWGCPq095-PAR1-saps [·100 s] 2.687 0.123 160 0.118 20SWV-Spear-med [·10−1 s] 6.631 3.710 160 3.712 20SWV-Spear-q075 [·100 s] 2.644 0.363 160 0.362 20SWV-Spear-q095 [·100 s] 5.000 0.506 160 0.504 20QCP-PAR10-saps [·100 s] 9.367 3.484 32 3.491 4QCP-spear-PAR10 [·100 s] 1.897 0.653 32 0.613 4QWHmed-PAR1-saps [·10−2 s] 4.212 2.738 32 2.750 4SWGCP-PAR10-saps [·100 s] 15.094 0.107 32 0.103 4SWGCP-spear-PAR10 [·100 s] 6.987 6.027 32 5.271 4Table 5.2: Median training performance of SMAC and pSMAC implementations.For Best of SMACx8, we performed bootstrap sampling of 8 runs from the #column, computed the minimum, and then the medium of 1000 samples. For thepSMACx8 run, we took grouped runs into batches of 8 for sharing runs, and took #independent samples from this. Bold-faced entries represent configurator performancethat is significantly better under a Mann-Whitney U Test with significance level of0.01 (because of the repeated testing we applied a Bonferroni correction and adjustedthe individual p-values used for significance of the test to less than 6.667× 10−4).5.4.2 ResultsAltogether there are 72 scenarios, on only 66 of which any configurator makes anyprogress. In summary, GGA is the best configurator in 4 scenarios, ParamILS in 15,SMAC in 24 and pSMAC 23.In Figure 5.7, we plot the difference in performance between a configurator’s bestrun on the training set versus the best run of all other configurators. Therefore, forexample, if the PAR10 scores of pSMAC, SMAC, ParamILS and GGA were (1, 5, 10,20) respectively, we would report (+4, -4, -9, -19) for the 4 configurators respectively.The PAR10 score of an incomplete or missing run is defined to be 3000. In a number68;Figure 5.4: Median Performance and Interquartile range for SAPS Single InstanceScenarios using 8 coresof cases, GGA had problems executing on a scenario. Thus, to prevent the graph frombeing overwhelmed by GGA’s performance it has been truncated to only show a lossof -150. Finally, we only plot the 66 scenarios where at least one configurator foundan improving configuration.From Figure 5.7 we can observe the following. 
First, while there is seeminglya large amount of variance among configurators, pSMAC does seem to allow forthe exploration of previously unseen regions of the configuration space. Second, itperformed well in similar scenarios as SMAC: when it performed best, the next bestconfigurator after pSMAC was often SMAC. Areas where pSMAC did poorly tended to69Figure 5.5: Median Performance and Interquartile range for SPEAR Single InstanceScenarios using 8 coresbe dominated by SMAC, and the performance loss of both SMAC variants was neveras severe as that of other configurators. Only for two of 66 scenarios, did pSMAC loseby a PAR10 of more than 50.In Figure 5.8, we show the variation between individual runs of a configurator. Inthis graph, we plot, in increasing order for each scenario, how much variance there isbetween the individual runs’ PAR10 score. In the case of ParamILS, it is the samplestandard deviation across 4 runs, and in the case of (p)SMAC it is the sample standarddeviation across 8 runs (4 on the normal configuration space and 4 on the discretizedconfiguration space).70Figure 5.6: Median Performance and Interquartile range for Multi-Instance Scenariosusing 8 coresFigure 5.7: Configurator Performance in CSSC 2014 Competition Data71Figure 5.8: Configurator Variance . In the plot the final point where SMAC’s variancesky-rockets is due to some missing data for SMAC.The figure suggests that pSMAC has lower variance in its configurator runs thanboth SMAC and ParamILS. In conjunction with what we observed from Figure 5.7,this suggests that pSMAC is both a very competitive configurator and a relativelystable configurator.5.5 Future ExtensionsA fundamental limitation of pSMAC, as opposed to a parallel automatic config-urator, is that individual pSMAC runs can only explore the configuration space asingle evaluation at a time. In the later stages of a challenge, when we have obtainedmany measurements of the incumbents performance, it is possible that a promisingconfiguration might need more than a hundred evaluations, each of which must bedone sequentially.It would not be difficult in practice to fix this behaviour, as it would only require twochanges. The first is that it would require the SMAC algorithm (shown in Algorithm4.1) to request runs be evaluated in parallel and to use dynamic adaptive capping (as72Figure 5.9: Core sharing architectureoutlined in Section 3.1.4). The second is that we would replace the local executionmechanism with the distributed mechanism (described in Section 3.5), but at thesame time, launch a worker locally in parallel. Some synchronization would be neededbetween the local worker and pSMAC to ensure that the worker is not evaluating a runwhile pSMAC is building a model. Consequently, both could be launched within thesame process, and thread synchronization could be used to coordinate between them.When pSMAC submits a run to the database, the worker would start processing jobsfrom the database. When pSMAC is notified that the run it requested is completed, itwould wait for the local worker to finish it’s evaluation, and then control would returnto the pSMAC algorithm. Figure 5.9 illustrates this approach.73BibliographyAnsotegui, C., Sellmann, M., and Tierney, K. (2009). A gender-based geneticalgorithm for the automatic configuration of solvers. In Proc. of CP-09, pages142–157.Babic´, D. and Hutter, F. (2007). Spear theorem prover. Solver description, SATcompetition.Balaprakash, P., Birattari, M., and Stu¨tzle, T. (2007). 
Improvement strategies for theF-Race algorithm: Sampling design and iterative refinement. In Proc. of MH-07,pages 108–122.Balint, A., Gall, D., Kapler, G., Retz, R., Diepold, D., and Gerber, S. (2011).EDACC - an advanced platform for the experiment design, administration andanalysis of empirical algorithms. In Proc. of LION-5.Balint, A., Gall, D., Kappler, G., and Retz, R. (2010). Experiment design andadministration for computer clusters for sat-solvers (edacc), system description.Journal on Satisfiability, Boolean Modeling and Computation (JSAT), 7.Bartz-Beielstein, T., Lasarczyk, C., and Preuss, M. (2005). Sequential parameteroptimization. In cec05, pages 773–780.Birattari, M., Stu¨tzle, T., Paquete, L., and Varrentrapp, K. (2002). A racingalgorithm for configuring metaheuristics. In Langdon, W. B., Cantu´-Paz, E.,Mathias, K., Roy, R., Davis, D., Poli, R., Balakrishnan, K., Honavar, V., Rudolph,G., Wegener, J., Bull, L., Potter, M. A., Schultz, A. C., Miller, J. F., Burke, E., andJonoska, N., editors, GECCO 2002: Proceedings of the Genetic and EvolutionaryComputation Conference, pages 11–18, New York. Morgan Kaufmann Publishers.Bloch, J. (2008). Effective Java (2nd Edition) (The Java Series). Prentice Hall PTR,Upper Saddle River, NJ, USA, 2 edition.Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.Brooks, Jr., F. P. (1995). The Mythical Man-month (Anniversary Ed.).Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.Bryant, R. E. and O’Hallaron, D. R. (2002). Computer Systems: A Programmer’sPerspective. Prentice Hall, USA, 1st edition.74Domhan, T., Springenberg, T., and Hutter, F. (2014). Extrapolating learning curvesof deep neural networks. In ICML 2014 AutoML Workshop.Eggensperger, K., Feurer, M., Hutter, F., Bergstra, J., Snoek, J., Hoos, H., andLeyton-Brown, K. (2013). Towards an empirical foundation for assessing bayesianoptimization of hyperparameters. In NIPS workshop on Bayesian Optimization inTheory and Practice.Eggensperger, K., Hutter, F., Hoos, H., and Leyton-Brown, K. (2015). Efficientbenchmarking of hyperparameter optimizers via surrogates. In Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligence.Fawcett, C. and Hoos, H. H. (2013). Analysing differences between algorithmconfigurations through ablation. In Proceedings of the 10th MetaheuristicsInternational Conference (MIC 2013), pages 123–132.Feurer, M., Springenberg, T., and Hutter, F. (2015). Initializing bayesianhyperparameter optimization via meta-learning. In Proceedings of theTwenty-Ninth AAAI Conference on Artificial Intelligence.Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (1994). Design Patterns:Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, 1edition.Garey, M. R. and Johnson, D. S. (1990). Computers and Intractability; A Guide tothe Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA.Gomes, C. P., Kautz, H., Sabharwal, A., and Selman, B. (2008). Satisfiability solvers.Handbook of Knowledge Representation, 3:89–134.Hoos, H. (2012a). Programming by optimisation. Communications of the ACM,55(2):70–80.Hoos, H., Leyton-Brown, K., Schaub, T., and Schneider, M. (2012). Algorithmconfiguration for portfolio-based parallel sat-solving. In WORKSHOP ONCOMBINING.Hoos, H. H. (2008). Computer-aided design of high-performance algorithms.Technical Report TR-2008-16, University of British Columbia, Department ofComputer Science.Hoos, H. H. (2012b). 
Automated algorithm configuration and parameter tuning.Available at: http://www.cs.ubc.ca/ hoos/Publ/Hoos12b-preprint.pdf.Huang, D., Allen, T. T., Notz, W. I., and Zeng, N. (2006). Global optimization ofstochastic black-box systems via sequential kriging meta-models. Journal of GlobalOptimization, 34(3):441–466.Hutter, F., Babic´, D., Hoos, H. H., and Hu, A. J. (2007a). Boosting verification byautomatic tuning of decision procedures. In Proc. of FMCAD-07, pages 27–34.Hutter, F., Hoos, H. H., and K.Leyton-Brown (2012). Parallel algorithm75configuration. In Proc. of LION-6, pages 55–70.Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2010a). Automated configuration ofmixed integer programming solvers. In Proc. of CPAIOR-10, volume 6140, pages186–202.Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2011a). Bayesian optimization withcensored response data. In NIPS workshop on Bayesian Optimization, SequentialExperimental Design, and Bandits. Published online.Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2011b). Sequential model-basedoptimization for general algorithm configuration. In Proc. of LION-5, pages507–523.Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2013). An efficient approach forassessing parameter importance in bayesian optimization. In NIPS workshop onBayesian Optimization in Theory and Practice.Hutter, F., Hoos, H. H., Leyton-Brown, K., and Murphy, K. P. (2009a). Anexperimental investigation of model-based parameter optimisation: SPO andbeyond. In Proc. of GECCO-09, pages 271–278.Hutter, F., Hoos, H. H., Leyton-Brown, K., and Murphy, K. P. (2010b).Time-bounded sequential parameter optimization. In Proc. of LION-4, volume6073 of LNCS, pages 281–298. Springer Verlag.Hutter, F., Hoos, H. H., Leyton-Brown, K., and Stu¨tzle, T. (2009b). ParamILS: anautomatic algorithm configuration framework. JAIR, 36:267–306.Hutter, F., Hoos, H. H., and Stu¨tzle, T. (2007b). Automatic algorithm configurationbased on local search. In Proc. of AAAI-07, pages 1152–1157.Hutter, F., Lindauer, M., Bayless, S., Hoos, H., and Leyton-Brown, K. (2014).Configurable sat solver challenge 2014. Technical Report 277, Department ofComputer Science, University of Freiburg, Freiburg, Germany.Hutter, F. and Ramage, S. (2013). Manual for SMAC version v2.06.01-master.Department of Computer Science, UBC.Hutter, F., Tompkins, D. A. D., and Hoos, H. H. (2002). Scaling and probabilisticsmoothing: efficient dynamic local search for SAT. In Proc. of CP-02, volume 2470of LNCS, pages 233–248. Springer Verlag.Johnson, D. S. (2002). A theoreticianâĂŹs guide to the experimental analysis ofalgorithms. Data structures, near neighbor searches, and methodology: fifth andsixth DIMACS implementation challenges, 59:215–250.Jones, D. R. (2001). A taxonomy of global optimization methods based on responsesurfaces. Journal of Global Optimization, 21(4):345–383.Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimization ofexpensive black box functions. Journal of Global Optimization, 13:455–492.76Leyton-Brown, K., Hoos, H., Hutter, F., and Xu, L. (2014). Understanding theempirical hardness of np-complete problems. Communications of the Associationfor Computing Machinery (CACM), 57(5):98–107.Lindauer, M., Hoos, H., Hutter, F., and Schaub, T. (2015). Autofolio: Algorithmconfiguration for algorithm selection. In Proceedings of the Twenty-Ninth AAAIWorkshops on Artificial Intelligence.Lo´pez-Iba´nez, M., Dubois-Lacoste, J., Stu¨tzle, T., and Birattari, M. (2011). 
Theirace package: Iterated racing for automatic algorithm configuration. TechnicalReport TR/IRIDIA/2011-004, IRIDIA, Universite´ Libre de Bruxelles, Brussels,Belgium.McConnell, S. (2004). Code Complete, Second Edition. Microsoft Press, Redmond,WA, USA.Molga, M. and Smutnicki, C. (2005). Test functions for optimization needs.Nell, C., Fawcett, C., Hoos, H. H., and Leyton-Brown, K. (2011). Hal: A frameworkfor the automated analysis and design of high-performance algorithms. In Learningand Intelligent Optimization, pages 600–615. Springer.Prasad, M. R., Biere, A., and Gupta, A. (2005). A survey of recent advances inSAT-based formal verification. International Journal on Software Tools forTechnology Transfer, 7(2):156–173.Roussel, O. (2011). Controlling a solver execution with the runsolver tool systemdescription. Journal on Satisfiability, Boolean Modeling and Computation,7:139–144.Sacks, J., Welch, W. J., Welch, T. J., and Wynn, H. P. (1989). Design and analysis ofcomputer experiments. Statistical Science, 4(4):409–423.Seipp, J., Sievers, S., Helmert, M., and Hutter, F. (2015). Automatic configuration ofsequential planning portfolios. In Proceedings of the Twenty-Ninth AAAIConference on Artificial Intelligence.Shull, F., Basili, V., Boehm, B., Brown, A. W., Costa, P., Lindvall, M., Port, D., Rus,I., Tesoriero, R., and Zelkowitz, M. (2002). What we have learned about fightingdefects. In in Proceedings of 8th International Software Metrics Symposium, pages249–258.Styles, J. and Hoos, H. (2013). Ordered racing protocols for automatically configuringalgorithms for scaling performance. In Proceeding of the fifteenth annual conferenceon Genetic and evolutionary computation conference, pages 551–558. ACM.Thornton, C., Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2013a). Auto-WEKA:Combined selection and hyperparameter optimization of classification algorithms.In Proc. of KDD-2013, pages 847–855.Thornton, C., Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2013b). Auto-WEKA:77combined selection and hyperparameter optimization of classification algorithms.In Proceedings of the 19th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, pages 847–855.Welsh, M., Culler, D., and Brewer, E. (2001). Seda: an architecture forwell-conditioned, scalable internet services. In ACM SIGOPS Operating SystemsReview, volume 35, pages 230–243. ACM.78Appendix AAEATK DecoratorsIn this appendix, we outline all of the decorators supplied with AEATK. For eachdecorator we outline the class name and provide a description. One word of cautionis that the names of the decorators have not been maintained nor has a consistentnaming scheme been applied, for consistency in the thesis we omit the words Targe-tAlgorithmEvaluator and Decorator from the name, as the context is implicit. Mostof the decorators listed below are available to any AEATK based application on thecommand-line, and many are enabled by default, a few rare ones are only available tometa-algorithm designers via the API for certain special use cases. The distinctionbetween the type of decorator is at times blurry, never the less these groupings are thegroupings that exist in the code (at the Java package level). 
Finally most decoratorswere created as a result of a real problem, and were added to the official release incase they proved useful later, nevertheless some of these decorators are admittedlyvery esoteric or bizarre.A.1 Debug DecoratorsThese decorators are useful for debugging meta-algorithms orTargetAlgorithmEvaluator implementations.CheckForDuplicateRunConfig This decorator checks to ensure that the meta-algorithm does not supply duplicate AlgorithmRunConfiguration objectswithin a single request, if so an exception is thrown. The motivation for thisdecorator is that as per the conceptual model outlined in Section 3.1.3, thereis no reason to ask for the same point to be evaluated twice within a sin-gle request. Other decorators and TargetAlgorithmEvaluators may use the79AlgorithmRunConfiguration as a key in a Map and may not handle the casewhere a request has duplicates. Instead of throwing an exception, this decoratorcould silently fix the request but the original impetus was a bug in a meta-algorithm and there seems to be no good reason for a meta-algorithm to evermake such a request.EqualTester This decorator wraps two TargetAlgorithmEvaluator implementa-tions and ensures that they both answer the same value.LeakingMemory This decorator leaks a configurable amount of memory for everyrequest by allocating an array.LogEveryObservation This decorator logs a message with the results of everynotification of an observer.LogEvery This decorator logs a message for every AlgorithmRunConfigurationsubmitted and AlgorithmRunResult returned.RunHashCodeVerifying This decorator keeps track of a hash for all runs completedthus far, it is useful if one wants to verify that the runs being executed matchesa predefined sequence.SanityChecking This decorator ensures that the AlgorithmRunResult objects beingreturned and observed correspond to the AlgorithmRunConfiguration objectsbeing submitted.UncleanShutdownDetecting The TargetAlgorithmEvaluator is a resource likea file that must be closed when completed by calling the notifyShutdown()method1. Closing a TargetAlgorithmEvaluator allows it to do necessaryshutdown tasks before the program exits, such as writing everything to thedisk/database, or clean up in other decorators. This decorator uses a shutdownhook, to detect when Java is exiting without the TargetAlgorithmEvaluatorbeing told as this is commonly a bug and has some unfortunate side effects(e.g., if any thread pools are still running, after the main() method returns, theprogram will hang and never exit).1When the code was ported to Java 7, TargetAlgorithmEvaluator was changed to extendAutoCloseable and the close() method was created as an alias for notifyShutdown()80A.2 Functionality DecoratorsThese decorators provide some functionality to the TargetAlgorithmEvaluator.PrePostCommand This decorator schedules shell commands either before themeta-algorithm starts, or when it completes (technically when it callsnotifyShutdown().Portfolio This decorator allows users to treat a series ofAlgorithmExecutionConfiguration objects as one as far as the meta-algorithm is concerned. 
Where this is useful for instance is in trying to configurea portfolio of configurations for an algorithm, if you want your objective beingoptimized to be the minimum of a new candidate configuration, and theobjective of the current configurations in a portfolio, this decorator allows youto do that.OutstandingEvaluations This decorator supplies the implementation of thegetNumberOfOutstandingBatches(), getNumberOfOutstandingRuns(), andwaitForOutstandingEvaluations(). Many decorators internally alter the num-ber of runs being executed at any time, in a way that should be transparentto the calling code. This decorator typically applied near top of the decoratorchain allows these methods to behave properly even if internally the behaviour isdifferent. For example, if a decorator is retrying all CRASHED runs, then there isa race condition between when the run is completed and when it is resubmitted,where the client could potentially erroneously see zero runs outstanding.OutstandingEvaluationWithAccessor This decorator is a subtype of the abovethat provides additional methods that additionally provides methods to accessthe the set of outstanding runs. It is used primarily for debugging. A subtype isused because this decorator must keep track of the actual runs, where as theprevious one can simply use counters.SimulatedDelay This decorator slows down the returning of AlgorithmRunResultto be no faster than the reported runtime. This is primarily used for debuggingwhen one wants to use a test function as an algorithm but the test function itselfevaluates instantly.TerminateAllRunsOnFileDelete This decorator silently invokes the kill()method on all outstanding runs if a file on disk is deleted. It is through81this mechanism that Auto-WEKA can terminate SMAC near instantaneously,as otherwise it would have to wait for SMAC to finish executing the targetalgorithm.Transform This decorator allows the user to transform the returnedAlgorithmRunResult according to some user-defined function. For in-stance if you wanted to treat instances that were SAT differently than UNSAT asfar as optimization is concerned.A.3 Helper DecoratorsThese decorators provide some helpful functionality.CallObserverBeforeCompletion This decorator notifies the observer of the fi-nal result before returning. This essentially simplifies implementing theTargetAlgorithmEvaluator interface, by requiring no semantics for observa-tion, yet allowing the meta-algorithm to have the useful semantic of always beingable to observe the final result. Without this decorator certain work-flows canbe annoying, for instance imagine a user interface which wants to display liveresults of runs. Without this decorator users would have to use two code pathsto update the UI, the first being the observer, the second is when the runs arecomplete. This decorator allows the client code to be simplified by guaranteeingthat the observer always sees the final result.KillCaptimeExceedingRuns This decorator invokes the kill() method automat-ically if the observed runtime exceeds some ratio, for instance 10 times therequested cutoff time. This is useful because occasionally misbehaving wrapperscan otherwise cause a meta-algorithm to become stuck.OutstandingRunLogging This decorator tracks the time all runs are submitted tothe TargetAlgorithmEvaluator and the time they are returned, as well as themeasured runtime and walltime. From this data, one can plot the approximateutilization of the meta-algorithm over time. 
The data is computed by tracking the time at which a run was first determined to be completed (i.e., either when the request completes, or when we first observe it as completed) and subtracting from that the reported runtime and walltime, respectively, to determine when it started (i.e., this approximation puts all of the wall-clock overhead at the start of the run). For example, for the experiment in Section 3.5.2, we can examine how well the MySQLTargetAlgorithmEvaluator was utilized with a batch size of 32 and a cutoff of 4 seconds, as shown in Figure A.1. While the plot is noisy, one can notice, for instance, that over time the batches are completed in less time as the workers adjust: the first batch takes almost 12 seconds, while the last takes less than 10. The utilization is computed in buckets and is approximate; the spike above 16 is due to the polling frequency (e.g., if a worker does two runs between polls, then both runs will appear to take place at the same time).

Figure A.1: Utilization of the MySQLTargetAlgorithmEvaluator over time with a batch size of 32 and a cutoff time of 4 seconds.

RetryCrashedRuns This decorator resubmits runs that report CRASHED, a configurable number of times.

StrictlyIncreasingRuntimes This decorator (which probably should be called StrictlyNonDecreasingRuntimes) ensures that the meta-algorithm only observes an increasing value for the runtime of the algorithm. This is necessary to guard against some race conditions where a run is restarted (e.g., a worker dies and the run is reassigned) while the meta-algorithm simultaneously requests that the run be terminated, and the meta-algorithm then unexpectedly receives a run with a much lower runtime than its invariants expect (e.g., our improved VBS in Section 3.1.4 would want KILLED runs to have a higher runtime than the best completed run).

NotifyTerminationCondition This decorator notifies an object called a TerminationCondition about runs being completed. These objects are used in meta-algorithms to track how much CPU time has been used, and to terminate the meta-algorithm when a limit has been reached.

UseDynamicCappingExclusively This decorator transforms all AlgorithmRunConfiguration objects to have the same cutoff time as the AlgorithmExecutionConfiguration object, and then supplies an observer that terminates the run at the requested cutoff time. This is used to improve cache hit rates, as two runs that differ only in their cutoff time can then be treated as the same run.

WalltimeAsRuntime This decorator causes the meta-algorithm to see a scaled-down version of the wall-time as the runtime of the run when being observed. This is useful if the target algorithm does not support observation of runtime (e.g., if runsolver isn't being used, as discussed in Section 3.4), and it only kicks in after 5 seconds by default.

A.4 Resource Decorators

These decorators help manage computational resources within the TargetAlgorithmEvaluator.

Bounded This decorator limits the number of AlgorithmRunConfiguration objects that are allowed to be submitted to the decorated TargetAlgorithmEvaluator, and blocks the caller of evaluateRunAsync() until all runs have been submitted. This decorator is useful in a number of situations. The first is to allow TargetAlgorithmEvaluator implementations to easily limit the number of target algorithm evaluations in progress at any time.
For instance, the MySQLTargetAlgorithmEvaluator's evaluateRunAsync() method never blocks, because runs can always be written to the database; so if a meta-algorithm wants to limit its progress based on the number of available computational resources, this decorator provides the required functionality. By this decorator's nature, it inherently supports observation and termination of target algorithm evaluations, because it sees completed runs come back in pieces and can choose not to submit runs that the user invoked kill() on. So, even if a TargetAlgorithmEvaluator does not support these features, using this decorator can be a pretty good approximation, performance-wise, if many runs are being submitted.

Caching This decorator uses an in-memory cache for caching duplicate runs submitted in different requests.

FileCache This decorator reads and writes all runs to a directory and uses those runs as a cache. Primarily it is used for debugging, as it allows developers of meta-algorithms to skip redoing the same runs in subsequent executions of their meta-algorithm.

Forking This decorator, based on a configurable policy, may choose to submit the AlgorithmRunConfiguration to a different TargetAlgorithmEvaluator. For instance, to speed up processing of short target algorithm runs with the MySQLTargetAlgorithmEvaluator, this decorator may simultaneously submit the runs to a CommandLineTargetAlgorithmEvaluator, but only run each for 1 second.

NonBlocking This decorator allows every call to evaluateRunAsync() to return immediately. It complements the Bounded decorator above. This decorator is used when a component of your meta-algorithm must always be able to submit its runs in a timely fashion, for instance to avoid a deadlock. This functionality is implemented by putting all submitted runs into a queue and having another thread actually invoke evaluateRunAsync() on the decorated TargetAlgorithmEvaluator.

Preempting This decorator exposes a method that provides another, low-priority TargetAlgorithmEvaluator; AlgorithmRunConfiguration objects submitted to it will be terminated and restarted if other runs are submitted and block. This low-priority support is very heuristic, as the TargetAlgorithmEvaluator API does not natively expose this kind of information. It is based on observation: if the TargetAlgorithmEvaluator reports that low-priority runs have runtimes greater than zero while the normal runs are at zero, then we kill low-priority runs to make room.

RunHistoryCaching This decorator uses another AEATK object, called a RunHistory object, as a cache of runs. In pSMAC (see Section 5.2), the RunHistory object is populated with run data from other runs, and this decorator allows those runs to be added to the cache. It differs from the FileCache decorator in that the FileCache decorator is responsible for managing the cache, whereas here the RunHistory object is.

A.5 Safety Decorators

These decorators exist to ensure safety and correctness.

AbortOnCrash This decorator treats CRASHED runs as ABORT runs and is configurable by the user.

AbortOnFirstRunCrash This decorator checks whether the very first run is a CRASHED run and, if so, changes it to an ABORT. The motivation for this decorator is two-fold: first, meta-algorithms like SMAC behave poorly if the first run is CRASHED, because it is uninformative. The second is that, if the first run is
The second is that most likely if the first run is aCRASHED, every run is likely to be a CRASHED.CrashedSolutionQualityTransforming This decorator makes sure that the re-ported solution quality of CRASHED runs is greater than some value. It guardsagainst wrappers which default the solution quality to 0, which the meta-algorithm will treat as very good.ExitOnFailure This decorator forcibly exits the JVM via a combination of exit()and halt() if an exception is encountered evaluating the AlgorithmRunRequest.The best explanation is the use-case it was designed for, which was adistributed meta-algorithm using the MySQLTargetAlgorithmEvaluatorwhere the worker TargetAlgorithmEvaluator was not theCommandLineTargetAlgorithmEvaluator but a custom one that utilizedJNI to talk to a C target algorithm directly. This target algorithm occasionally86had non-deterministic problems with memory corruption and to make themeta-algorithm more reliable it was decided that the worker process should justexit abruptly, and be restarted.JVMShutdownBlocker This decorator detaches submitted and observedAlgorithmRunConfiguration objects from their caller when the JVM is shut-ting down. It also silently drops requests that are submitted after the shutdownhas started. The primary purpose of this decorator is to avoid generatingspurious or erroneous results to the meta-algorithm that are due to shutdown.For instance if the meta-algorithm was terminated by the user via CTRL+C, theentire process tree will start shutting down. This often means that wrapperswill report CRASHED. Depending on how long it takes the JVM to shutdown,the meta-algorithm may see this and start logging a number of warnings orerrors that are misleading. Consequently when the JVM detects shutdown, thisdecorator basically stops notifying the caller of updates, and stops submittingnew runs.SATConsistency This decorator does an in-memory check that for each probleminstance the satisfiability (i.e., whether the run is SAT or UNSAT) is consistent, asper the conceptual model in Section 3.1.3.SynchronousObserver This decorator simply invokes the user-supplied observerin a synchronized method. This prevents the observer from being invokedsimultaneously, and ensures that there are no memory visibility issues if itis called by multiple threads, and guards against the observer being notifiedby two threads concurrently. This protection isn’t perfect however, as if thesame observer is supplied with multiple sets of runs the user must still provideadditional thread safety guarantees.TimingChecker This decorator logs warnings if the target algorithm wall-timeis substantially higher than the measured runtime, suggesting that there issubstantial unaccounted for overhead that the user may be unaware of.VerifySAT This decorator uses the instance specific information on theProblemInstance object to verify whether the target algorithm correctly de-tected satisfiability of the instance.WarnOnNoWallOrRuntime This decorator logs a warning if none of the runssubmitted have a runtime or walltime greater than 0. The most common87use case where this occurs is when the workers are not started with theMySQLTargetAlgorithmEvaluator.88

