UBC Theses and Dissertations

IR-MetaOCaml: (re)implementing MetaOCaml. Roubinchtein, Evgeny. 2015.


IR-MetaOCaml: (re)implementing MetaOCaml

by

Evgeny Roubinchtein

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Science
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)

The University of British Columbia
(Vancouver)

October 2015

© Evgeny Roubinchtein, 2015

Abstract

Multi-stage programming is a form of metaprogramming that is an extension of ideas and techniques of partial evaluation. This thesis discusses a (re)implementation of the multi-stage programming system MetaOCaml. The system presented here differs from the OCaml implementation by Taha et al. in that it is implemented on top of a modern OCaml compiler. It differs from BER MetaOCaml in that it supports generation of native code in a turn-key fashion. It differs from both systems in that it uses the OCaml intermediate representation to represent the notion of code (to the best of my knowledge, existing systems use abstract syntax trees instead).

Preface

The thesis is original, unpublished, independent work by the author, Evgeny Roubinchtein.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
1 Introduction
2 Background
  2.1 Overview
  2.2 A brief introduction to multi-stage programming
    2.2.1 Partial evaluation
  2.3 Partial evaluation systems
  2.4 Multi-stage programming in MetaOCaml
  2.5 OCaml compiler overview
    2.5.1 OCaml values
    2.5.2 The OCaml toplevel
    2.5.3 Representing code
3 Implementation
  3.1 Run for closed terms
  3.2 Cross-stage persistence
  3.3 Splicing
    3.3.1 Dealing with scope extrusion
4 Evaluation
  4.1 Methodology
  4.2 Performance results
5 Conclusions
  5.1 Summary
  5.2 Retrospective
  5.3 Possible future work
Bibliography

List of Tables

Table 2.1  Translation from λ-calculus to (Meta)OCaml syntax
Table 4.1  Description of the benchmarks used in evaluating the performance of the MetaOCaml implementations
Table 4.2  Ratio of running time of (annotated) benchmarks to the running time of the unannotated program in the corresponding OCaml compiler
Table 4.3  Memory allocation for benchmarks (in words allocated in the minor heap)
Table 4.4  Raw data for BER MetaOCaml, as reported by Core_bench. All numbers are average values per run
Table 4.5  Raw data for my MetaOCaml implementation, as reported by Core_bench. All numbers are average values per run
Table 4.6  Raw data for OCaml v4.02.1 with byte code back-end, as reported by Core_bench. All numbers are average values per run. BER MetaOCaml running times are normalized to the numbers in this table
Table 4.7  Raw data for OCaml v4.02.1 with native code back-end, as reported by Core_bench. All numbers are average values per run. The running times of my MetaOCaml implementation are normalized to the numbers in this table
Table 4.8  Raw data for the original MetaOCaml implementation. All numbers are average values per run
Table 4.9  Raw data for OCaml 3.09 with native code backend. All numbers are average values per run. The running times of the original MetaOCaml implementation are normalized to the numbers in this table

List of Figures

Figure 1.1  Partial evaluation data flow: the language of the annotated and residual program need not be the same
Figure 1.2  Multi-stage programming data flow: the language of the annotated and residual program is the same, and the specialization step may be repeated an arbitrary number of times
Figure 2.1  Partial evaluation data flow
Figure 2.2  Syntax of a toy language for partial evaluation
Figure 2.3  Semantics of a toy language for partial evaluation
Figure 2.4  The free variables function
Figure 2.5  The capture-avoiding substitution function
Figure 2.6  The evaluation of (fix f. (λn. n ? n ∗ f (n−1) : 1)) 3
Figure 2.7  Partial evaluation of (fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 3
Figure 2.8  OCaml compiler overview
Figure 2.9  OCaml compiler frontend detail
Figure 2.10 OCaml bytecode backend detail
Figure 2.11 OCaml native code backend detail
Figure 2.12 A function that returns a function
Figure 2.13 The memory of the closure created for adder
Figure 2.14 The memory layout of a standard OCaml closure
Figure 2.15 The OCaml toplevel
Figure 3.1  The implementation of code for closed terms
Figure 3.2  The implementation of run for closed terms
Figure 3.3  The implementation of code for cross-stage persistence
Figure 3.4  The MetaOCaml closure
Figure 3.5  The implementation of run for cross-stage persistence
Figure 3.6  The implementation of code for splicing
Figure 3.7  Example that illustrates the scope extrusion problem, by Taha
Figure 4.1  The OCaml code to benchmark (hypothetical) procedures named p1 and p2 using Core_bench
Figure 4.2  The OCaml code to collect timing data from Taha et al. MetaOCaml
Figure 4.3  The code for power, fib and their staged counterparts, cpower and cfib

Glossary

AST  abstract syntax tree, an (in general n-ary) tree representing the syntactical structure of the program, (usually) produced by a compiler's parser

Chapter 1
Introduction

When faced with a problem that can be solved by computer, one possible approach is to write a program that solves that problem.
Another approach is to write a program that produces a program that solves the problem. The former approach is commonly known as programming. The latter is (somewhat less commonly) known as meta-programming. Practitioners of meta-programming report being able to produce concise and expressive programs [11] [14]; being able to produce programs that run efficiently with relatively little programmer effort has also been reported [16] [20] [2].¹

Multi-stage programming is a form of meta-programming which builds on ideas from partial evaluation. Figure 1.1 shows the flow of data in a typical partial evaluation system: a program together with a specification of some of its inputs is first subjected to (automated) binding time analysis to decide which of the program's internal variables depend on the given inputs. The binding time analysis phase is followed by partial evaluation proper, which produces a residual program. When the residual program is run, any values whatsoever may be assigned to the subset of the inputs that were not specified during partial evaluation. If S denotes the subset of the original program's inputs that were specified during partial evaluation and U denotes the subset of the program's inputs that were not specified during partial evaluation, then running the residual program while assigning any values whatsoever to the inputs in U is equivalent to running the original program with the values that were given to the partial evaluator for the inputs in S and providing those same, arbitrary, values for the inputs in U.

¹ While hand-coding the program in assembly and optimizing it for a specific architecture might produce a significantly more efficient program, part of the emphasis here is on saving the programmer's effort.

[Figure 1.1: Partial evaluation data flow: the language of the annotated and residual program need not be the same. Diagram: a program and the names of its statically known inputs feed binding time analysis (BTA), which produces an annotated program; the annotated program and the values of the statically known inputs feed specialization, which produces the residual program.]

Figure 1.2 shows the flow of data in a multi-stage programming system: the responsibility for doing binding time analysis is transferred from the machine to the human: the programmer is responsible for annotating the program to describe which values may be immediately calculated from a given set of inputs. The machine then faithfully follows the annotations provided by the programmer to arrive at a residual program.

[Figure 1.2: Multi-stage programming data flow: the language of the annotated and residual program is the same, and the specialization step may be repeated an arbitrary number of times. Diagram: the (meta)programmer supplies the program together with the names and values of statically known inputs to the multi-stage programming system (compiler or interpreter), which produces the annotated program.]

The Lisp macro system may be considered an early example of a multi-stage programming system. Graham [14] popularized meta-programming with Lisp macros, and also observed that "macro definitions are harder to read than ordinary code" [15]. A plausible approach to help programmers debug Lisp macros would be to have the system detect problematic usages of macros. Types are a programming-language device that enables compile-time checking of programs. If it were possible to create a type system for Lisp macros, then compile-time checking of Lisp macros could be implemented. Wand studied the Lisp macro system to determine if any guarantees could be made about the behavior of Lisp macros, and concluded that no type system could be devised for the Lisp macro system [33].

MetaML, invented by Sheard [31], is a typed meta-programming system. The type system detects certain programming errors. Examples of the kinds of errors prohibited by the type system include unbound variables, and using a variable in the wrong stage (before its value has been calculated). Type systems for multi-stage programming detect those errors without running the program, and without needing to re-examine the program during each successive stage of the evaluation. Taha showed that, unlike for the Lisp macro system, a type system for MetaML can be devised [29].

MetaOCaml is a descendant of MetaML, first implemented by Taha [4]. Taha's work was done on OCaml version 3, and was never (to my knowledge) ported to OCaml version 4. BER MetaOCaml is an implementation of MetaOCaml for OCaml version 4 that supports generating byte code (only) in a turn-key fashion. The work presented here is a clean-room re-implementation of MetaOCaml with a focus on native code generation.

The goal of this project was to explore using the OCaml intermediate representation² rather than the AST as the concrete representation of the program that is being manipulated by the meta-programming system. To the best of my knowledge, all existing MetaOCaml implementations manipulate ASTs. Because of how the OCaml compiler is implemented, manipulating ASTs means that the program is type-checked at every stage of evaluation. While not harmful, such type checking is unnecessary in view of the existence of multi-stage programming type systems. We were curious if getting rid of the repeated type checking would positively impact run-time performance. The performance evaluation of the resulting system is found in Chapter 4.

² The intermediate representation used for this project is called Lambda. It is described in more detail in Section 2.5.

A secondary goal of the project is that the implementation should be minimally invasive with respect to the underlying OCaml system it is based on. This goal is shared between the BER MetaOCaml implementation and the work presented here. As I show in Chapter 3, the resulting system is somewhere mid-way between BER MetaOCaml and "classic" MetaOCaml in terms of the changes it makes to the underlying OCaml implementation.

The contribution of this project is demonstrating the feasibility of producing a minimally-invasive implementation of MetaOCaml that uses the OCaml intermediate representation as the concrete representation of the program that is being manipulated.

After giving the necessary background on multi-stage programming and the OCaml compiler, I describe the design and implementation of the multi-stage programming extensions added to the OCaml compiler; a performance evaluation of the resulting system follows, and is followed in turn by a summary and conclusion.

Chapter 2
Background

2.1 Overview

MetaOCaml is a multi-stage programming system based on MetaML. The work presented in this thesis implements MetaOCaml as an extension to the compiler for OCaml version 4.
This chapter introduces the basic notions of multi-stage programming, and gives an overview of the OCaml compiler, with particular focus on the parts of the compiler that are important for the discussion of the implementation in Chapter 3.

2.2 A brief introduction to multi-stage programming

The goal of this section is to introduce readers unfamiliar with multi-stage programming to the basic concepts, and to provide some simple examples of multi-stage programs.

Multi-stage programming is an extension of ideas found in the field of partial evaluation, so, in order to explain multi-stage programming, I start by describing partial evaluation.

2.2.1 Partial evaluation

Partial evaluation may be described informally as running a program when values are known for only some of the program's variables. Naturally, if values for all of the program's variables are known, the program can just be run in the usual fashion to obtain the result. If only some of the values are known, however, then a program along with the known values may be given to a partial evaluator, which will analyze the program and the available inputs and produce a residual program, as illustrated in Figure 2.1. When the values for the variables that were not yet known at the time the partial evaluator ran become known, the residual program may be run with just those extra values to obtain the result.

[Figure 2.1: Partial evaluation data flow. Diagram: the (full) program and the values of statically known inputs feed the partial evaluation system, which produces the residual program; the residual program, given the values of inputs not known until runtime, produces the results of running the program on all inputs.]
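Specialization can be illustrated in plain OCaml (the language used throughout this thesis). The sketch below is my own illustration, not code from the thesis: power plays the role of the full program, and spower plays the role of the partial evaluator, building a residual function once the exponent is known.

```ocaml
(* Illustrative sketch (mine, not from the thesis): hand-specializing power. *)
let rec power n x = if n = 0 then 1 else x * power (n - 1) x

(* spower performs all the recursion on n up front and returns a residual
   function that only multiplies; the residual function never examines n. *)
let rec spower n : int -> int =
  if n = 0 then fun _ -> 1
  else
    let rest = spower (n - 1) in
    fun x -> x * rest x

let power3 = spower 3                    (* the "residual program" for n = 3 *)
let () = assert (power3 5 = power 3 5)   (* both compute 125 *)
```

Running the residual program on the remaining input gives the same answer as running the full program on all inputs, which is exactly the correctness requirement discussed next.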
The correctness criterion for partial evaluation is quite natural: the result obtained should not depend on whether the program was run in the normal fashion, with all the values specified, or whether the program was first run through the partial evaluator with only some of the values known, and then the residual program was run on the remaining values.

To describe the process of partial evaluation in more detail, I will use a simple "toy" language. The syntax of the language is given in Figure 2.2.¹

  x ∈ VARIABLE   n ∈ ℕ   t ∈ TERM   v ∈ VALUE

  t ::= n | x | λx. t | t t | fix x. t | t ? t : t | t + t | t − t | t ∗ t
  v ::= n | λx. t

Figure 2.2: Syntax of a toy language for partial evaluation

The language is typical of functional programming languages inspired by Church's lambda calculus [6]. The intuition is that writing down one of the terms (TERM) creates an expression; expressions may be evaluated to obtain their value. The rules of evaluation appear below, but the intuitive meaning of numbers (n), variables (x), and arithmetic operations on natural numbers (+, −, ∗) is probably clear to anyone familiar with at least one programming language. The conditional expression (t ? t_t : t_f) works like the conditional operator t ? t_t : t_f found in C (and Java). Writing down λx. t creates a function expression. Function expressions may be applied to an argument using t t (the rules by which application is evaluated appear below). Finally, fix x. t creates recursive expressions: if one writes down fix x. t, then, within t, x stands for the entire expression fix x. t, and t t may be used to apply that expression to an argument. A word on precedence is in order: function application has the highest precedence, and in particular the precedence of a function application is higher than that of any of the binary operators.
This means that f 1 + 2 is parsed as (f 1) + 2, rather than as f (1 + 2). As usual, parentheses may be used to override the default precedence.

Conceptually, a result is obtained by rewriting the program according to the rules that appear in Figure 2.3 until no rewrite rules apply.

  t1 → t1′  implies  t1 t2 → t1′ t2
  t → t′  implies  v t → v t′
  (λx. t) v → t[v/x]
  fix x. t → t[(fix x. t)/x]
  t → t′  implies  (t ? t1 : t2) → (t′ ? t1 : t2)
  (n ? t1 : t2) → t1  when n ≠ 0
  (0 ? t1 : t2) → t2
  t1 → t1′  implies  t1 + t2 → t1′ + t2
  t2 → t2′  implies  v + t2 → v + t2′
  n1 + n2 → n  when n = n1 + n2
  (and the analogous three rules each for − and ∗)

Figure 2.3: Semantics of a toy language for partial evaluation

The intuition is as follows. Numbers and abstractions (the standard name for λx. t, which defines an anonymous function) evaluate to themselves. To evaluate an application (the standard name for t t), one evaluates both the term (t1) being applied (the operator) and the argument of the application (t2) (the operand): the operator must evaluate to a λx. t-form (however, it is immaterial what the operand evaluates to); once both the operator and the operand are known, one substitutes the operand for the formal parameter of the λx. t-form ([v/x]) in the body of the λx. t-form, while being careful not to capture variables by accident (more details on the rules of substitution are found below); finally, one evaluates the term that results from the substitution: the value obtained from evaluation is the value of the application. The fix x. t expression allows recursive functions to be written: the intuition is that fix x. t gives a name to the function that is in the process of being defined, so that the name can be used inside the body of the function to allow the definition to "refer to itself". More formally, after writing down fix x. t, whenever x is applied to an argument, the rules of evaluation ensure that an entire new copy of fix x. t will be substituted in place of x (to avoid infinite recursion, one needs to ensure that applying fix x. t to an argument doesn't always result in an application of x to some argument). The intuition for t ? t_t : t_f is that it works like the ternary ? : operator in C (and Java).

¹ More formally, the BNF rule describes the set of syntactically valid terms of the language.

A combination of conditionals and (recursive) function calls enables a wide range of flow control constructs to be expressed in the language. A few illustrations of programs written in the toy language are forthcoming, but, to finish the presentation of the language, capture-avoiding substitution needs to be described.

Capture-avoiding substitution

In an expression like λx. x + y, the variable x is bound, while the variable y is free, and analogously for fix x. x + y. The function FV t produces the set of variables that are free in a term of the toy language. The formal definition of the FV function appears in Figure 2.4. A variable is bound if it is not free.

  FV n = ∅
  FV x = {x}
  FV (λx. t) = FV t \ {x}
  FV (t1 t2) = FV t1 ∪ FV t2
  FV (fix x. t) = FV t \ {x}
  FV (t1 ? t2 : t3) = FV t1 ∪ FV t2 ∪ FV t3
  FV (t1 + t2) = FV t1 ∪ FV t2
  FV (t1 − t2) = FV t1 ∪ FV t2
  FV (t1 ∗ t2) = FV t1 ∪ FV t2

Figure 2.4: The free variables function

The intent of capture-avoiding substitution is that bound variables should stay bound following the substitution and free variables should stay free: this way, the "meaning" of terms is preserved by substitution. To demonstrate an example of a variable capture: if one were to perform textual substitution (λx. y)[y := x] to obtain λx. x, a variable that started out free (namely, y) has been captured following textual substitution.

  [t/x]n = n
  [t/x]y = y    where x ≠ y
  [t/x]x = t
  [t/x](λx. t1) = λx. t1
  [t/x](λy. t1) = λx′. [t/x][x′/y]t1    where x ≠ y and x′ ∉ FV (λy. t1) ∪ FV t ∪ {x}
  [t/x](t1 t2) = ([t/x]t1) ([t/x]t2)
  [t/x](fix x. t1) = fix x. t1
  [t/x](fix y. t1) = fix x′. [t/x][x′/y]t1    where x ≠ y and x′ ∉ FV (fix y. t1) ∪ FV t ∪ {x}
  [t/x](t1 ? t2 : t3) = [t/x]t1 ? [t/x]t2 : [t/x]t3
  [t/x](t1 + t2) = [t/x]t1 + [t/x]t2
  [t/x](t1 − t2) = [t/x]t1 − [t/x]t2
  [t/x](t1 ∗ t2) = [t/x]t1 ∗ [t/x]t2

Figure 2.5: The capture-avoiding substitution function

I now give a few examples to illustrate how familiar functions can be expressed in this "toy" language. The familiar factorial function is

  fix f. (λn. n ? n ∗ f (n−1) : 1).

The function to calculate the nth Fibonacci number is

  fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a

The evaluation of the expression (fix f. (λn. n ? n ∗ f (n−1) : 1)) 3 proceeds as shown in Figure 2.6.

  (fix f. (λn. n ? n ∗ f (n−1) : 1)) 3
  = (λn. n ? n ∗ (fix f. (λn. n ? n ∗ f (n−1) : 1)) (n−1) : 1) 3
  = 3 ? 3 ∗ (fix f. (λn. n ? n ∗ f (n−1) : 1)) (3−1) : 1
  = 3 ∗ ((fix f. (λn. n ? n ∗ f (n−1) : 1)) 2)
  = 3 ∗ ((λn. n ? n ∗ (fix f. (λn. n ? n ∗ f (n−1) : 1)) (n−1) : 1) 2)
  = 3 ∗ (2 ? 2 ∗ (fix f. (λn. n ? n ∗ f (n−1) : 1)) (2−1) : 1)
  = 3 ∗ 2 ∗ ((fix f. (λn. n ? n ∗ f (n−1) : 1)) 1)
  = 3 ∗ 2 ∗ ((λn. n ? n ∗ (fix f. (λn. n ? n ∗ f (n−1) : 1)) (n−1) : 1) 1)
  = 3 ∗ 2 ∗ (1 ? 1 ∗ (fix f. (λn. n ? n ∗ f (n−1) : 1)) (1−1) : 1)
  = 3 ∗ 2 ∗ 1 ∗ ((fix f. (λn. n ? n ∗ f (n−1) : 1)) 0)
  = 3 ∗ 2 ∗ 1 ∗ ((λn. n ? n ∗ (fix f. (λn. n ? n ∗ f (n−1) : 1)) (n−1) : 1) 0)
  = 3 ∗ 2 ∗ 1 ∗ (0 ? 0 ∗ (fix f. (λn. n ? n ∗ f (n−1) : 1)) (0−1) : 1)
  = 3 ∗ 2 ∗ 1 ∗ 1
  = 6

Figure 2.6: The evaluation of (fix f. (λn. n ? n ∗ f (n−1) : 1)) 3

I can now give an example of partial evaluation. If I wish to obtain a function for calculating the nth Fibonacci number with the value of n fixed ahead of time (e.g., n = 3), then the function can be obtained simply by rewriting the application

  (fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 3

until all occurrences of the variable n are eliminated from the expression, as shown in Figure 2.7 (to make the calculations less cluttered, I only substitute for f when the next step of the calculation depends on its definition).

  (fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 3
  = (λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 3
  = λa. λb. ((f (3−1)) b) (a+b)
  = λa. λb. (((fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 2) b) (a+b)
  = λa. λb. (((λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 2) b) (a+b)
  = λa. λb. ((λa. λb. (((fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 1) b) (a+b)) b) (a+b)
  = λa. λb. ((λa. λb. (((λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 1) b) (a+b)) b) (a+b)
  = λa. λb. ((λa. λb. ((λa. λb. ((f (1−1)) b) (a+b)) b) (a+b)) b) (a+b)
  = λa. λb. ((λa. λb. ((λa. λb. ((f 0) b) (a+b)) b) (a+b)) b) (a+b)
  = λa. λb. ((λa. λb. ((λa. λb. ((λa. a) b) (a+b)) b) (a+b)) b) (a+b)

Figure 2.7: Partial evaluation of (fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 3

A noticeable difference between the partial evaluation of (fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 3 and the standard evaluation of (fix f. (λn. n ? n ∗ f (n−1) : 1)) 3 is that, during the evaluation of (fix f. (λn. n ? n ∗ f (n−1) : 1)) 3, the work of calculation (reduction) always happened at the outer-most level of the term (a pattern typical of tail-recursive functions, which are equivalent to while loops). On the other hand, during the partial evaluation of (fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 3, the reduction happened inside the term, in a pattern typical of functions that are not tail recursive, even though, as written, the function fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a is tail-recursive when fully applied.

While the partial evaluation of (fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 3 may demonstrate the process of partial evaluation, it leaves the question of how a partial evaluator program might be implemented somewhat vague.
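The toy language is small enough that an evaluator for it fits in a page of OCaml. The interpreter below is my own illustrative sketch, not code from the thesis; to keep it short, its substitution function only ever substitutes the (closed) values that arise when evaluating closed programs, which sidesteps the capture problem.

```ocaml
(* A minimal interpreter for the toy language of Figure 2.2.
   This is my own illustrative sketch, not code from the thesis. *)
type term =
  | Num of int
  | Var of string
  | Lam of string * term             (* λx. t *)
  | App of term * term               (* t t *)
  | Fix of string * term             (* fix x. t *)
  | If0 of term * term * term        (* t ? t1 : t2  (0 selects t2) *)
  | Bin of char * term * term        (* '+', '-', '*' *)

(* Substitute v for x.  Because we only substitute closed values while
   evaluating closed programs, no renaming is needed here. *)
let rec subst x v = function
  | Num n -> Num n
  | Var y -> if x = y then v else Var y
  | Lam (y, t) -> if x = y then Lam (y, t) else Lam (y, subst x v t)
  | App (t1, t2) -> App (subst x v t1, subst x v t2)
  | Fix (y, t) -> if x = y then Fix (y, t) else Fix (y, subst x v t)
  | If0 (c, t1, t2) -> If0 (subst x v c, subst x v t1, subst x v t2)
  | Bin (op, t1, t2) -> Bin (op, subst x v t1, subst x v t2)

(* eval follows the rewrite rules of Figure 2.3. *)
let rec eval = function
  | Num n -> Num n
  | Lam (x, t) -> Lam (x, t)
  | Var x -> failwith ("unbound variable " ^ x)
  | App (t1, t2) ->
    (match eval t1 with
     | Lam (x, body) -> eval (subst x (eval t2) body)
     | _ -> failwith "applying a non-function")
  | Fix (x, t) as f -> eval (subst x f t)
  | If0 (c, t1, t2) ->
    (match eval c with
     | Num 0 -> eval t2
     | Num _ -> eval t1
     | _ -> failwith "non-numeric condition")
  | Bin (op, t1, t2) ->
    (match eval t1, eval t2, op with
     | Num a, Num b, '+' -> Num (a + b)
     | Num a, Num b, '-' -> Num (a - b)
     | Num a, Num b, '*' -> Num (a * b)
     | _ -> failwith "arithmetic on non-numbers")

(* The factorial example from the text: (fix f. λn. n ? n * f (n-1) : 1) 3 *)
let fact =
  Fix ("f", Lam ("n",
    If0 (Var "n",
         Bin ('*', Var "n", App (Var "f", Bin ('-', Var "n", Num 1))),
         Num 1)))
let () = assert (eval (App (fact, Num 3)) = Num 6)
```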
To understand multi-stage programming, a high-level understanding of how a partial evaluator works is helpful, so I will sketch out the architecture of a partial evaluation system.

2.3 Partial evaluation systems

A typical partial evaluation system [18] can be thought of as operating in two distinct phases: a binding time analysis phase, which annotates the source code to indicate where computation can be done using the statically known inputs, and the specialization phase, which transforms the parts of the program annotated by the binding time analyzer to produce a residual program. The residual program itself can then be subjected to partial evaluation, producing in turn a new residual program, which could then be partially evaluated. As a simple example of this process of "chained" partial evaluations, the residual program obtained in Figure 2.7 could be partially evaluated if the value of the starting term of the Fibonacci sequence was supplied. I have found it easiest to understand multi-stage programming as building on the notion of "chained" partial evaluation.

2.4 Multi-stage programming in MetaOCaml

The MetaOCaml multi-stage programming system does not include an automated binding-time analysis phase: instead, it requires the programmer to perform binding-time analysis manually, and to annotate the source code with staging annotations (which are essentially binding-time annotations). Also, in addition to the usual values (such as integers, strings, etc.), a MetaOCaml program may produce code for a program that contains staging annotations (i.e., asking a MetaOCaml system to evaluate a suitably annotated

  (fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a) 3

will produce a value that looks very much like the final line of Figure 2.7, except, of course, expressed in MetaOCaml, rather than λ-calculus, syntax).

To illustrate, I will show below how to annotate the Fibonacci function from above in MetaOCaml. A translation from the λ-calculus syntax which I have been using so far to (Meta)OCaml syntax is given in Table 2.1.²

Table 2.1: Translation from λ-calculus to (Meta)OCaml syntax

  λ-calculus syntax      OCaml syntax
  x                      x
  n                      n³
  λx. t                  fun x → t⁴
  t1 t2                  t1 t2
  fix f. t               let rec f = t⁵
  t1 ? t2 : t3           if t1 != 0 then t2 else t3
  t1 + t2                t1 + t2
  t1 − t2                t1 − t2
  t1 ∗ t2                t1 ∗ t2

² The translation given in the table is intended to produce OCaml code that is as close to the corresponding λ-calculus syntax as possible (as opposed to producing idiomatic OCaml).
³ Negative integers need to be written as ~-1, ~-2, etc. in OCaml syntax.
⁴ The arrow (→) is typed as a sequence of two characters: ->
⁵ let rec f = t in needs to be used if let rec f = t appears within another term (as opposed to at the outer level of a term), e.g., let rec f = fun x → t . . ., but fun x → let rec f = fun y → . . . in . . .

So, the definition of

  fix f. (λn. n ? n ∗ f (n−1) : 1)

becomes

  let rec f = (fun n → if n != 0 then n ∗ f (n−1) else 1),

and the definition of

  fix f. λn. λa. n ? λb. ((f (n−1)) b) (a+b) : a

becomes

  let rec f = fun n → fun a → if n != 0 then fun b → ((f (n−1)) b) (a+b) else a.

In MetaOCaml, three kinds of annotations are used to express the order of staging (or, equivalently, the result of binding-time analysis). The code brackets .< >. mark a term as code (i.e., not evaluated); the unary operator .~ (escape) is used inside a code block to request that an expression be evaluated, and the result be spliced into a piece of code; and, finally, .! (run) is used to run a piece of code. As a practical matter, just code and escape are sufficient for the annotations most programmers would want to write most of the time.
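The three annotations can be illustrated with a small hypothetical session of my own (not from the thesis; the syntax follows classic MetaOCaml, and these fragments require a MetaOCaml system, not plain OCaml):

```ocaml
(* Hypothetical sketch: brackets delay evaluation, escape splices
   code into code, and run executes the generated code. *)
let c = .<1 + 2>.        (* c : int code; the addition is not yet performed *)
let d = .<.~c * .~c>.    (* code for (1 + 2) * (1 + 2) *)
let v = .! d             (* running d performs the arithmetic: v = 9 *)
```

Erasing the annotations from d leaves the ordinary expression (1 + 2) * (1 + 2), which computes the same value; this is the annotation-erasure property discussed next.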
A key property of the design of the MetaOCaml system is that it is always possible to erase all annotations from a piece of MetaOCaml code and end up with valid code in plain OCaml.

Taha [30] reports that, in his experience, annotating functions as code transformers (i.e., functions that take arguments, some of which may be pieces of code) tends to produce fewer annotations than annotating functions as "pieces of code that happen to contain functions". With that in mind, I start the annotation of the function to calculate the nth Fibonacci number. Here is the original function again:

  let rec f = fun n → fun a → if n != 0 then fun b → ((f (n−1)) b) (a+b) else a

To make the definition of the function legal under the OCaml type system, which requires that both branches of the if conditional produce a result of the same type, I rearrange the definition by "moving out" the fun b →:

  let rec f = fun n → fun a → fun b → if n != 0 then ((f (n−1)) b) (a+b) else a

Borrowing inspiration from Roberts [26] and Felleisen et al. [9], I can contrast the process of writing the regular Fibonacci numbers function with the process of adding staging annotations to it as follows: when writing the regular Fibonacci numbers function, the recursive leap of faith [26] is that the recursive call to the function produces a previous Fibonacci number in the sequence; the job of the programmer is to use operations on numbers (addition, subtraction, multiplication, etc.) to construct an expression for the next Fibonacci number given the recursive leap of faith [9]. On the other hand, when adding multi-stage annotations to the Fibonacci numbers function, the recursive leap of faith is that the call to the function produces the code which calculates the previous Fibonacci number.
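Before any annotations are added, the rearranged definition is ordinary OCaml and can be exercised directly; a small sanity check of my own (not from the thesis):

```ocaml
(* The unannotated Fibonacci function: n counts the remaining steps,
   while a and b hold two consecutive Fibonacci numbers. *)
let rec f n a b = if n <> 0 then f (n - 1) b (a + b) else a

let () = assert (f 0 0 1 = 0)   (* F(0) = 0 *)
let () = assert (f 6 0 1 = 8)   (* F(6) = 8 *)
```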
The job of the multi-stage programmer is to use operations on code (i.e., quoting using meta-brackets and splicing using .~) to produce the code which calculates the next Fibonacci number in the sequence, given the recursive leap of faith.

Both the unannotated and the annotated version of the Fibonacci function take three parameters, but the meanings of the parameters differ. The unannotated version takes a count of how many numbers lie between the parameter named a and the desired Fibonacci number, the value of the second-most-recent Fibonacci number calculated so far, and the value of the most-recent Fibonacci number calculated so far (in that order). In the annotated version, the meaning of the first parameter is unchanged, but the second and third parameters are now pieces of code which calculate the respective Fibonacci numbers. Also, the return values of the two versions of the function differ: the unannotated version of the function returns the desired Fibonacci number; the annotated version returns the code which calculates the desired Fibonacci number.

What annotations need to be added to the function? For the n = 0 branch of the conditional, if the function is given a piece of code that calculates the desired Fibonacci number, it simply returns that piece of code. On the other hand, in the n ≠ 0 case, a holds a piece of code which calculates the (i−1)st Fibonacci number, while b holds a piece of code which calculates the ith Fibonacci number. The function needs to construct and pass to the recursive call a piece of code which calculates the (i+1)st Fibonacci number. A template for the piece of code to be constructed can be written down as .<... + ...>.. The first idea might be to write down simply .<a + b>.; however, that expression is rejected by MetaOCaml. The reason the expression is rejected is that it is nonsense [8]: plus (+) operates on numbers, not on pieces of code.
The MetaOCaml .~ operator takes a piece of code that calculates x, and splices x into the fragment of code that surrounds the application of the operator. The final annotated function, then, is:

let rec f = fun n ->
  fun a ->
    fun b -> if n != 0 then ((f (n-1)) b) .<(.~a + .~b)>.
             else a

The behavior of the annotated function, when fully applied (i.e., when n, a and b have all been supplied), is indistinguishable from the original function. However, the compiler has been directed how to produce a partial function when the function is partially applied to just n (or just n and a).

2.5 OCaml compiler overview

The goal of this project was to re-implement MetaOCaml on top of a modern OCaml compiler, targeting the native code generation interface of the OCaml compiler. Before describing the specifics of the implementation strategies chosen, I describe the structure of the OCaml compiler to help the reader orient herself.

At a high level, the OCaml compiler is structured similarly to other modern compilers, as shown in Figure 2.8: there is a compiler front end which produces the intermediate representation of the program's code, and several back ends which work on the intermediate representation to generate code for various target instruction sets.

The front end that is part of the standard OCaml distribution is responsible for performing lexical analysis and parsing of the OCaml source code as well as for type checking the program, as illustrated in Figure 2.9. A key data structure is the abstract syntax tree (AST), as defined in the file parsing/parsetree.mli.6 The standard OCaml distribution also includes a preprocessor which allows ASTs to be transformed before they are handed off to the type checker.

The OCaml type checker works on ASTs, and produces ASTs that are annotated with typing information. These typed ASTs are defined in the file typing/typedtree.mli.
Type checking is the final stage at which a program can be rejected: once an OCaml program is accepted by the type checker, it will be compiled, producing an executable for a target architecture. Further, the OCaml type system guarantees that a program that is accepted by the type checker will be free from certain kinds of errors at run time.7

6 All paths in this chapter are rooted at the src directory of the standard OCaml distribution.

Figure 2.8: OCaml compiler overview.

Once the AST has been successfully type-checked, the subsequent stages of compilation do not use typing information.8

A key data structure for the intermediate representation is defined in the file bytecomp/lambda.mli. The location of the files lambda.mli and translcore.mli in the source tree notwithstanding, both the byte code back

7 In theory. In practice, the type checker is just a program which could – in theory – have bugs in it. Outside of the somewhat esoteric possibility of bugs in the OCaml type checker, OCaml includes a suite of unsafe operations in the module Obj – including an operation equivalent to the C++ reinterpret_cast (namely, Obj.magic): since (some of) those operations subvert the OCaml type system, if a program uses one of them, then – generally speaking – all bets are off at run time.

8 This "throwing away" of type information is characteristic not only of OCaml but of most compilers designed in the 90s [7].
Some of the compilers developed from 2000 onward retain type information much later in the compilation process [32].

Figure 2.9: OCaml compiler frontend detail.

end and the native code back end work on the data types defined in lambda.mli, so, at this point in the development of the OCaml compiler, it is quite reasonable to think of Lambda as the intermediate representation of the OCaml compiler.

Type-annotated ASTs are translated to the Lambda intermediate representation by the functions declared in the file bytecomp/translcore.mli.

The byte code back end, defined in the file bytecomp/bytegen.mli, works on the lambda representation to produce a stream of byte code instructions (bytecomp/instruct.mli) for the OCaml byte code interpreter, as shown in Figure 2.10. Ultimately, a string of bytes may be emitted either to memory or to a file on disk.

Figure 2.10: OCaml bytecode backend detail.

The native code back end has a few more moving parts than the byte code back end, as shown in Figure 2.11.

Figure 2.11: OCaml native code backend detail.
The intermediate representation is first closure-converted [3], and a bit of data flow analysis is done on it, by the function intro declared in the file asmcomp/closure.mli, producing a representation called µlambda (asmcomp/clambda.mli). The µlambda representation is then converted to the C-- language by asmcomp/cmmgen.mli (incidentally, the C-- language in the OCaml compiler is not related [24] to the C-- language of Simon Peyton Jones [25]). Assembly code for a specific processor (e.g., i386, x86_64, arm, etc.) is generated from the C-- representation via a series of standard code generation operations, such as register allocation, pipeline scheduling, etc. The generated assembly code is then handed off to the system's assembler and linker (the latter produce executables in the system's preferred object file format).

2.5.1 OCaml values

The OCaml run-time system is garbage-collected, so, similarly to other garbage-collected run-time systems [13] [12], OCaml makes a distinction between immediate and pointer values. Broadly speaking, immediate values are values whose run-time representation is small enough that it is practical to pass the value on the C-like run-time stack. Pointer values, on the other hand, point to (i.e., hold the address of) a garbage-collected value on the heap. A language implementer makes a choice about how small is "small enough" and what is "practical" to pass on the run-time stack. Given a value, an implementation needs to be able to distinguish at run time whether the value is immediate or a pointer. To this end, the common implementation strategy is to tag values: the implementer decides that a few bits of a value (usually starting with the least significant bit) are to be used to distinguish pointers from immediate values.
The usual trick of the trade is to observe that every microprocessor platform currently in common use either requires that pointer values be aligned (i.e., hold an address that is a multiple of a power of 2) or strongly encourages aligned pointer access (by providing more efficient access to aligned as opposed to unaligned pointers, the latter being pointer values whose addresses are not a multiple of that power of 2). That observation leads to an implementation decision to use only aligned pointers into the implementation's run-time heap. Since an aligned pointer holds an address that is a multiple of a power of 2, the low bits of such a pointer will always be 0 (exactly how many bits depends on the minimum alignment chosen). This means that, if a value does not have 0 in its low bit(s), it cannot be a pointer, and must therefore be an immediate value.

The choices made by Caml Light (a byte-code-only precursor to OCaml) were described by Leroy [22]. That description is still surprisingly accurate when it comes to describing how values are laid out in the OCaml heap. The choices OCaml makes are actually quite simple: the only immediate values are integers, so only one tag bit is needed. Hence, pointers can (in principle) be aligned on 2-byte boundaries, and on a machine with 32-bit integers, immediate integers can range from −2^30 to 2^30 − 1.

OCaml closures

The implementation of MetaOCaml code relies crucially on the notions of a function's environment and closure, so it is useful to describe those notions in some detail. The material in this section forms the basis for the presentation of the implementation in Section 3.2.

A function's environment maps names of variables which are free (Section 2.2.1) in the function to the values associated with those variables. A closure [21] is a data structure that combines the code of the function with the function's environment.

A closure is a key mechanism for implementing lexical scope in the presence of first-class functions.
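The tagging scheme described above can be observed from within OCaml itself, via the same unsafe Obj module mentioned in an earlier footnote. This is a sketch for illustration only, not part of the thesis's implementation:

```ocaml
(* Illustrating OCaml's value tagging: integers are immediate
   (represented with the low tag bit set), while heap-allocated
   values such as strings are pointers. *)
let () =
  (* Obj.is_int reports whether a value is represented as a
     tagged immediate integer rather than a heap pointer. *)
  assert (Obj.is_int (Obj.repr 42));           (* immediate *)
  assert (not (Obj.is_int (Obj.repr "hi")));   (* heap pointer *)
  (* One bit is lost to the tag, so OCaml's native int has
     Sys.int_size bits: 31 on 32-bit machines, 63 on 64-bit. *)
  assert (max_int = (1 lsl (Sys.int_size - 1)) - 1);
  print_endline "tagging checks passed"
```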
A (type of) value in a programming language is said to be first class if the language supports performing all of the following operations on the value:

• Passing the value as an argument to a function
• Returning the value as the result of a function
• Storing the value in a data structure and assigning the value to variables

To illustrate the need for a closure, I invite the reader to consider the example function that appears in Figure 2.12.9 The function make_adder returns a function which adds n to its argument:

function make_adder(n):
    function adder(x):
        return x + n
    return adder

Figure 2.12: A function that returns a function

Figure 2.13: The memory layout of the closure created for adder: a pointer to the code of adder, followed by (a pointer to) the value n had when make_adder was called.

9 The function is written in a fictional programming language.

Lexical scoping rules demand that the n in the body of adder should refer to the value that was passed as an argument in the call to make_adder. Another way to say the same thing is that the environment associated with adder maps the name n to the value n had at the time make_adder was called. In order to support lexical scoping rules, the compiler needs to "remember" not just the code of the function adder, but also the value that n had when make_adder was called. Because of this need to "remember" not just the code of the function but also its environment, the compiler cannot represent the function adder with just a pointer to the function's code: if it did, it would lose information about adder's environment. One possible solution to that problem is to represent adder with the in-memory structure shown in Figure 2.13. To arrive at that structure, the compiler might go through the following steps:

1. Make a list of all the variables that are free in the function adder. In this example, the only free variable is n.

2. Calculate how much space is needed in order to store information about the adder function.
In the current example, the compiler is only storing a pointer to the code of the function, but more information could be kept.10

3. For each free variable, calculate the offset at which the value the variable refers to will be stored with respect to the beginning of the in-memory structure that is being created. In this example, there is only one variable, named n, so the compiler would "remember" that "variable n can be found at (zero-based) offset 1 in the memory block". In general, more than one variable may be free, and a table mapping names to offsets may need to be built at compile time.

4. Add a parameter which holds a pointer to the function's environment to the list of parameters for the function, and replace references to names in the body of the function with offsets from the environment parameter, according to the table created in the previous step. In the current example, the compiler would arrange for adder to take an extra parameter that holds a pointer to its environment, and, in the body of adder, the name n would be replaced with an offset (in this case, 1) from the environment pointer.

10 For example, the OCaml compiler stores (at least) the number of parameters a function takes in addition to a pointer to the function.

Having completed these steps, the compiler can pass around a pointer to the in-memory structure in Figure 2.13 in lieu of a pointer to just the code of adder. The process of creating this in-memory structure is known as closure conversion [3].

To compile a call to adder under the assumption that the compiler "has in its hands" a pointer to the structure shown in Figure 2.13 (which I will refer to simply as a closure), the compiler proceeds as follows:

1. Retrieves a pointer to the function from the pointer to the closure. In the current example, the pointer to the function is simply the pointer stored at offset zero with respect to the pointer to the closure.

2.
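The closure-conversion steps above can be sketched in OCaml by making the environment explicit. Representing the environment as a record rather than a raw memory block, and the closure as a (code, environment) pair, is my simplification of the layout in Figure 2.13:

```ocaml
(* Closure conversion by hand: represent the closed-over function
   as a pair of (code, environment), mirroring Figure 2.13. *)

(* The environment for adder holds its single free variable, n. *)
type env = { n : int }

(* Step 4: adder takes an explicit environment parameter, and the
   reference to the free variable n becomes a field access. *)
let adder_code (env : env) (x : int) : int = x + env.n

(* make_adder allocates the "closure": code pointer + environment. *)
let make_adder (n : int) : (env -> int -> int) * env =
  (adder_code, { n })

(* Calling a converted closure: fetch the code from the closure,
   then pass the environment along with the ordinary argument. *)
let apply_closure ((code, env) : (env -> int -> int) * env) (x : int) =
  code env x

let () =
  let add5 = make_adder 5 in
  assert (apply_closure add5 37 = 42)
```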
Arranges for the arguments that appear as part of the call to be passed to the function (for example, the arguments may be pushed onto the stack). In addition to passing each of the arguments of the call, passes a pointer to the closure as the value of the environment parameter.

3. Generates a jump to the function's code.

The OCaml compiler uses flat closures [5]. The basic idea is that a single closure block is allocated that holds function information for an entire set of mutually recursive functions, as shown in the example that follows.

Figure 2.14: The memory layout of a standard OCaml closure: a closure header; a function block for each of f, f', and f'' (each recording the function's arity, a code pointer, and, for arity greater than 1, a pointer to a currying wrapper); an infix closure header before each function block after the first; and, finally, a single contiguous block of closure variables shared by all the functions, with each function's environment pointer referring into the block.

The closure shown in Figure 2.14 represents the closure the OCaml compiler would create if it encountered a definition of three mutually recursive functions named f, f', and f''; the function f takes a single argument (its arity is 1); the function f' takes two arguments (its arity is 2); the function f'' takes three arguments. The closure starts with a closure header.11 The header is followed by one or more function blocks (in this case, there are three function blocks: one each for the functions f, f', and f''). The second field of each function block is always the arity of the function for which the block is being allocated.
The contents of the first field depend on the function's arity: for functions of arity 1, the first field holds the pointer to the function's code; for functions of arity greater than 1, the first field holds the pointer to a currying [27]12 wrapper for the original function, and the third field contains the pointer to the function's code. The second and subsequent function blocks are preceded by an infix closure header (all values in the OCaml heap are preceded by a header; closures that appear within a flat-closure block are values, so they need to be preceded by a header). The variables for the entire set of mutually recursive functions are laid out in a single contiguous memory block. When compiling the definitions of the functions used in this example, the steps the OCaml compiler takes to create the closure structure can be described as follows:

1. Make a list of all the mutually recursive functions that appear in a single mutually recursive function definition (in this case, the list contains three functions: f, f', and f'').

2. Make a list of the names of all variables that are free in all the mutually recursive functions (in this case, there are N variable names).13

11 Every value in the OCaml run-time heap has a value header, which indicates at run time what kind of value is stored in the block. The header is also used by the garbage collector to keep track of which values are in use.

12 Currying is the process by which a function of multiple arguments is converted to a function that consumes the first argument and produces a function that consumes the rest of the arguments in the argument list. The function that consumes the rest of the arguments may in turn be curried.

13 Name clashes do not occur because the compiler generates a unique name for each variable; i.e., even if f and f' both take an argument named a, the compiler renames each variable uniquely.
To give each variable a unique name, the compiler keeps a counter which it increments each time a new variable name is encountered. The internal name of a variable is then formed by concatenating the name the programmer wrote with the value of the counter.

3. Go through the list of mutually recursive functions and calculate the size of the memory block needed to hold the function blocks.

4. Go through the list of names of free variables, and calculate the offset of each variable with respect to the start of the closure block. The offset of the first variable is simply the size of the memory area needed to hold the function blocks (calculated in the previous step). Subsequent variables are laid out sequentially in the order in which they appear in the list of names of all free variables. The result of this step is a compile-time table mapping names of variables that are free in the body of the function(s) to offsets in the closure heap block. I will refer to this table as a (compile-time) lexical environment.

5. For each of the mutually recursive functions whose code references either a free variable or another mutually recursive function:

• Introduce a parameter to hold the environment pointer; at function application time, the value of the environment pointer is simply the address of the start of the respective function block.

• For each of the free variables referenced, replace the variable name with an offset from the environment pointer to the location of the variable in the variables block.14

• For each of the mutually recursive functions referenced, replace the name of the function with an offset from the environment pointer to the location of the function in the closure block.15

2.5.2 The OCaml toplevel

In addition to the OCaml compiler, the OCaml distribution includes an interactive toplevel, which operates similarly to the read-evaluate-print loop in Lisp systems [23].
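The renaming scheme described in the footnote can be sketched as follows. The function name gensym and the separator character are mine; the real compiler's naming scheme is more involved:

```ocaml
(* A minimal "gensym": give every source variable a unique internal
   name by appending the value of a global counter, as described
   in the footnote above. *)
let counter = ref 0

let gensym (name : string) : string =
  incr counter;
  name ^ "/" ^ string_of_int !counter

let () =
  let a1 = gensym "a" in   (* the argument a of f  *)
  let a2 = gensym "a" in   (* the argument a of f' *)
  (* The two internal names never clash, even though the
     programmer wrote the same name twice. *)
  assert (a1 <> a2)
```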
For the purposes of implementing a multi-stage programming system, the toplevel is of interest because it is an interactive program that, during its execution, invokes the OCaml compiler to obtain OCaml values. This ability to call the compiler at run time is crucial to being able to implement a multi-stage programming system, as will be seen in Section 3.1. In this section, I describe the operation and implementation of the toplevel as it relates to my MetaOCaml implementation.

14 Because of the way the OCaml compiler lays out the closure block, that offset is always a positive integer.

15 That offset may be either positive or negative; for example, the offset from the start of f'''s environment would be positive if f'' is being accessed but negative if f is being accessed.

Figure 2.15: The OCaml toplevel

When the toplevel is started, it presents the user with a prompt at which OCaml expressions may be entered. An expression entered by the user is terminated by a double semicolon ;;. The expression is then compiled and type-checked almost as it would be by the regular compiler.16 Then the expression is executed to obtain a result, which can be either an exception or a regular value; the result, along with the result's type, is then printed to the user, and the cycle repeats.

16 There is a small number of special hooks in the compiler that are used specifically to support evaluating expressions at the top level.

As discussed in Section 2.5, the OCaml compiler does not pass typing information on to any compilation stages that follow the conversion to the lambda intermediate representation. Yet, as mentioned above, the toplevel prints out to the user both the value to which the expression she entered evaluates and the type of that value.
Clearly, the toplevel cannot obtain the type of the value directly from the compiler backend, so where does the type information come from?

The answer is that the toplevel evaluates the expression the user has entered in two distinct phases, as shown in Figure 2.15: first, it calls the type checker on the parsed expression and saves the type obtained from the type checker; then it translates the expression to the intermediate representation, and compiles and evaluates that intermediate representation to obtain a value. In the toplevel code, the function that compiles and evaluates the lambda intermediate representation is called load_lambda. As discussed in Section 2.5.3, the implementation of MetaOCaml presented in this work uses the lambda intermediate representation to represent code objects. Hence, as described in more detail in Chapter 3, the MetaOCaml implementation presented here uses load_lambda as a run-time interface to the OCaml compiler.

2.5.3 Representing code

As was seen in Section 2.4, the central object of interest for multi-stage programming is code. An implementer of a metaprogramming system on top of OCaml is faced with the question of which of the OCaml program representations code should correspond to. Choosing ASTs as the internal representation of code is an attractive option for several reasons:

1. Since an AST is essentially a parse tree, it is easy to "reconstitute" the code for display to the programmer. Since the ability to interactively explore staging annotations was one of the goals of the MetaML (and MetaOCaml) systems, the ability to display code to the programmer is important.

2. It may be possible to take advantage of the OCaml preprocessor to do some of the code generation.
3. The type checking performed by the OCaml compiler on ASTs may help catch errors in code generation.

On the other hand, using ASTs as the representation for code commits one to running the type checker whenever code is run – even though type systems which would reject potentially unsafe code exist [19].

Avoiding redundant type checking was one of the goals of this project, so I chose to use the Lambda intermediate representation, rather than ASTs, as the representation for code. Since the Lambda intermediate representation is a representation to which fully type-checked code has been translated, this approach holds the promise of reducing the amount of type checking, but it leaves open the issue of how to present generated code to the programmer: while the Lambda language is quite rich, it may not be possible to unambiguously reverse-translate it into an AST.

Since all of my work was done between the Lambda and Cmm intermediate representation levels, these levels are the focus of discussion in Chapter 3.

Chapter 3

Implementation

In this chapter, I describe my re-implementation of the MetaOCaml system. I take a chronological approach to describing the implementation, as I believe this style of presentation makes it easier to understand the design decisions and refinements that were made as the implementation progressed. Since the main object of interest is the native code compiler, the discussion focuses on it.

3.1 Run for closed terms

As a first subgoal of the project, I extended the OCaml compiler to support the code and run operations, with the limitation that the code terms supported needed to be closed (i.e., could not contain free variables).

The compiler operation was extended as follows:

1. Support for parsing the code brackets (.<, >.) and the run operator (.!) was added to the parser (parsing/parser.mly).

2. A new built-in type code was added (in typing/predef.mli).
3. The lambda intermediate representation was extended to include a variant Lcode. Parsing an OCaml expression surrounded by the code brackets (.<, >.) produces the Lcode intermediate representation.

4. In the compiler backend, the lambda intermediate representation of the expression that appeared between the code brackets (.<, >.) is simply serialized (using the standard Marshal module), allocating a block of type string in the OCaml heap, as shown in Figure 3.1.1

Figure 3.1: The implementation of code for closed terms

5. When run is applied to a value serialized by .<, >., the deserialized value becomes the body of a function that takes no arguments2 in the Lambda intermediate representation. Conceptually, run needs to call the load_lambda function in the toplevel (the OCaml toplevel is introduced in Section 2.5.2), passing it the zero-argument function to obtain a value. To allow run to actually call the load_lambda function, I added a C-level primitive to the OCaml run-time which calls a custom wrapper around the load_lambda function in the OCaml toplevel. The custom wrapper deserializes the serialized Lambda term and passes the term to load_lambda. Figure 3.2 summarizes the operation of run.

1 Serializing the code was an early design decision, intended to ensure that the entire structure of the Lambda terms is preserved.

2 Technically, OCaml does not support functions that take no arguments: a zero-argument function is represented in OCaml as a single-argument function whose argument is of the special type unit.
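The serialization step in item 4 relies on the standard Marshal module; the round trip can be sketched as follows. For illustration, a small stand-in variant type is serialized here rather than an actual Lambda term:

```ocaml
(* Marshal turns an arbitrary OCaml value into a string of bytes
   and back; the closed-terms implementation uses the same module
   to store a serialized Lambda term in a heap-allocated string. *)
type toy_ir =
  | Const of int
  | Add of toy_ir * toy_ir   (* a stand-in for real Lambda terms *)

let () =
  let term = Add (Const 1, Const 41) in
  (* Serialize into a string block on the OCaml heap... *)
  let s : string = Marshal.to_string term [] in
  (* ...and later deserialize it, recovering the whole structure. *)
  let term' : toy_ir = Marshal.from_string s 0 in
  assert (term' = term)
```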
The unit type can be thought of as being similar to Java's null.

Figure 3.2: The implementation of run for closed terms

3.2 Cross-stage persistence

After settling on an implementation for run, the next item to tackle was cross-stage persistence. The need for cross-stage persistence can be illustrated with the simple function let make_code a = .<a>.. When the function is called, the desired behavior is that the piece of code returned from the function "remembers" the value of a that was in effect when make_code was called. From the standpoint of the user, the behavior is very similar to the behavior of make_adder in Section 2.5.1: when the function returned by make_adder is called, the function being called "remembers" the value of n that was passed to make_adder. Similarly, when .! is applied to the piece of code returned by make_code, the piece of code to which .! is being applied must "remember" the value of a that was in effect at the time make_code was called.

To support cross-stage persistence, it must be possible for run to execute a piece of code in a way that closely simulates how the code would have been executed if it had not been surrounded by the code brackets (.<, >.). Specifically, the code fragment inside the code brackets must have access to all the variables that are lexically visible at the point at which the code fragment occurs. In order to execute the code fragment in the same lexical environment in which it occurred, run needs to have access to an environment that maps free variables in the code fragment to the values those variables were referring to when the expression that created the code fragment was encountered.
One way to satisfy this requirement is to arrange things so that run has access to a closure created at the time the corresponding code construct is encountered. This is exactly the approach taken by the implementation presented here. Additionally, in order for run to be able to compile the code fragment, run needs to have access to a lexical environment that maps variable names to offsets in the closure block.

The responsibility for acquiring these two pieces of information is split between the implementation of code and the implementation of run. As shown in Figure 3.3, the implementation of code is modified to preserve, in addition to the code fragment itself, the lexical environment in which the code fragment should be compiled.

Figure 3.3: The implementation of code for cross-stage persistence

The code construct allocates a structure which I will refer to as a MetaOCaml closure. A MetaOCaml closure is an extension of a regular OCaml closure.3 The MetaOCaml closure, shown in Figure 3.4, adds two fields to the standard OCaml closure:4

1. The piece of code which contains the free variables whose values appear in the closure (as a Lambda intermediate representation).

2. The lexical environment that maps names of free variables to offsets in the

3 Figure 2.14 shows the memory layout of a standard OCaml closure.

4 Conceptually, the MetaOCaml closure adds two fields.
In practice, the fields are stored as a serialized representation of a standard OCaml record, so the actual number of fields added is 1.

closure.

Figure 3.4: The MetaOCaml closure: a standard OCaml closure (header, code pointer, arity, closure variables) extended with the code fragment (as Lambda IR) and the lexical environment giving the offset of each closure variable.

As shown in Figure 3.5, run passes a pointer to a MetaOCaml closure to a C-level primitive, which, in turn, calls back into a custom wrapper around the load_lambda function in the OCaml toplevel. The custom wrapper uses the Lambda intermediate representation of the code fragment and the lexical environment it has retrieved from the MetaOCaml closure, along with the pointer to the MetaOCaml closure, to construct a custom variant record in the Lambda intermediate representation, which it passes to the load_lambda function in the OCaml toplevel. I extended the compiler backend to recognize the custom variant. The steps the compiler backend takes to process the custom variant can be described as:

Figure 3.5: The implementation of run for cross-stage persistence

1. Perform closure conversion on the code fragment using the lexical environment it has received as part of the variant record. The result of the compilation is code for a function. The function takes zero parameters; it also takes a pointer to the environment in which it expects to be run.
2. Call the function obtained in the previous step, passing it the pointer to the closure contained in the variant record (and ultimately obtained from the C primitive) as the value of the environment parameter, to obtain a result (i.e., a value or an exception).

3.3 Splicing

Splicing is the MetaOCaml mechanism by which pieces of code can be combined to produce other (larger) pieces of code. The basic idea is that the programmer constructs a template for the code she wants to have generated. The template (conceptually) has a few placeholders. The programmer then uses splicing to indicate what pieces of code to "plug into" the locations of the placeholders. The pieces of code being "plugged in" may have been passed as function arguments, or they may have been constructed by preceding code. It may help to think of MetaOCaml as embedding a language for manipulating pieces of code into OCaml. In the tradition of SICP [1], to understand a language one needs to understand what primitives it provides and what means for combining those primitives are available. The only primitive the language provides is code (a code fragment). The means for combining those primitives are quoting (putting the meta-brackets (.<, >.) around a literal code fragment) and splicing (plugging a piece of code into a code fragment).5

5 I am treating run as somewhat tangential to the issue of constructing and manipulating pieces of code: run evaluates a piece of code to produce a value, but the resulting value may or may not be a piece of code.

In Chapter 1, the splicing operator was used to combine the pieces of code that calculate the (n−1)st and (n−2)nd Fibonacci numbers.
It is important to note that not every fragment of code preceded with the escape operator (.~) needs to be evaluated: Taha [30] introduces the notion of a level of a (syntactic) term, which is simply the number of meta-brackets the term is surrounded by minus the number of escape operators that precede the term. The rules of evaluation for MetaOCaml state that only escapes at level 0 need to be evaluated. To motivate this rule, it may help to think of evaluation as proceeding in phases, with subsequent phases being (potentially) separated in time. Only the splices that appear at level 0 are relevant for the evaluation during the current phase: the splices that appear at levels numbered higher than 0 are relevant for evaluation during later phases, which may potentially occur at a later time.6

To summarize, the splicing operator may only occur inside the code brackets (.< and >.). The operator causes the expression that follows it to be evaluated. The result of the evaluation must be a piece of code.7 That piece of code is "plugged in" at the location at which the splicing operator appears. The processing of splices only takes place for splices that appear at level 0: the splices that appear at levels higher than 0 will be evaluated during a later phase of evaluation.

The processing of a fragment of code with splices may be visualized as shown in Figure 3.6. The end goal of the processing is to produce a regular MetaOCaml closure, which can be processed in all the ways code can be processed. The processing proceeds as follows:

1. A code fragment is examined, and a list of the splices it contains is created. References to splices inside the fragment of code are replaced with numbers representing the position of the respective splice in the list of splices.

2. The code for each splice is translated to the µLambda representation in the usual fashion.
The code for the main body of the code is not translated at this time (though its lexical environment is preserved, to be used in the MetaOCaml closure allocated in the next step).

[Footnote 6: The MetaOCaml type system prohibits splices at levels lower than 0.]
[Footnote 7: The MetaOCaml type system ensures that the expression being spliced in is indeed a piece of code.]

Figure 3.6: The implementation of code for splicing. A fragment .<... .~a ... .~z ...>. is scanned for level-0 splices (asmcomp/closure.ml), producing .<... .#1 ... .#26 ...>. and the list of splices (a, ..., z), both in the Lambda IR; the spliced-in fragments are closure-converted (asmcomp/closure.ml) and compiled (asmcomp/cmmgen.ml), yielding the list of splices in the uLambda IR; a MetaOCaml closure for .<... .#1 ... .#26 ...>. is allocated in the OCaml heap together with the values of the splices and a count of splices, and the rebuild primitive is invoked on it.

3. A MetaOCaml closure is allocated for the main body of the code. The closure's size is declared to be large enough to hold the MetaOCaml closure proper, the values of the splices, and a count of the number of splices.

4. The rebuild primitive is called on the extended MetaOCaml closure allocated in the previous step.

The data flow of the rebuild primitive is identical to the operation of the run primitive described in Section 3.2: it constructs a custom variant of the Lambda intermediate representation and passes that variant to load_lambda,8 so I do not duplicate that part of the description here, and instead focus on how the custom variant of the Lambda intermediate representation constructed by the rebuild primitive is processed in the compiler backend.
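The level-0 scan of step 1 (applying the level rule introduced above) can be sketched in plain OCaml. The term type below is a hypothetical mini-AST used purely for illustration; the actual scan in asmcomp/closure.ml works on the Lambda IR.

```ocaml
(* Sketch of the level-0 splice scan: brackets raise the level, escapes
   lower it; only escapes reaching level 0 are replaced by numbered
   markers and collected into the splice list. *)
type term =
  | Const of int
  | Bracket of term              (* .< ... >. *)
  | Escape of term               (* .~ ...    *)
  | SpliceRef of int             (* marker: position in the splice list *)
  | Pair of term * term

let scan_splices (body : term) : term * term list =
  let splices = ref [] in
  let count = ref 0 in
  (* [level] is the number of brackets entered minus the number of
     escapes entered, relative to the body of the outermost bracket. *)
  let rec go level t =
    match t with
    | Escape e when level = 0 ->
        let i = !count in
        incr count;
        splices := !splices @ [ e ];
        SpliceRef i
    | Escape e -> Escape (go (level - 1) e)
    | Bracket e -> Bracket (go (level + 1) e)
    | Pair (a, b) ->
        let a' = go level a in
        let b' = go level b in
        Pair (a', b')
    | (Const _ | SpliceRef _) as leaf -> leaf
  in
  let body' = go 0 body in
  (body', !splices)
```

Note that an escape under a further bracket sits at level 1 and is left untouched, matching the rule that splices at levels higher than 0 belong to a later evaluation phase.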
An important point to keep in mind is that the value of each splice is a MetaOCaml closure, which means that it contains the Lambda intermediate representation of a fragment of code, the vector of values of the variables that are free in that code fragment, and a lexical environment mapping names of free variables to offsets in the values array.

The rebuild primitive operates on the extended MetaOCaml closure shown in Figure 3.6; it produces a regular MetaOCaml closure9 as follows:

1. The lexical environment of the rebuilt MetaOCaml closure is the lexical environment of the MetaOCaml closure corresponding to the main code fragment, augmented with the names of variables that appear in the lexical environment of some splice but not in the lexical environment of the main code fragment.

2. The values vector of the rebuilt MetaOCaml closure is the values vector of the MetaOCaml closure corresponding to the main code fragment, augmented with the values corresponding to the variable names that were added to the lexical environment of the MetaOCaml closure in the previous step.

3. The Lambda intermediate representation of the rebuilt closure is obtained by replacing the ith splice marker in the Lambda intermediate representation of the main fragment with the Lambda intermediate representation found in the ith splice.10

[Footnote 8: It would have been possible to have a single primitive that dispatches on the type of value passed to it; having two primitives is a bit of an overkill, but it does offer a very straightforward implementation.]
[Footnote 9: Figure 3.4 illustrates a regular MetaOCaml closure.]

    let puzzle = .! (.< fun a -> .~((fun x -> .<x>.) ((fun x -> .<a>.) 0)) >.) 5

Figure 3.7: Example that illustrates the scope extrusion problem, by Taha

As a result of this processing, a regular MetaOCaml closure is produced.
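The environment merging performed by rebuild (steps 1 and 2 above) can be sketched in plain OCaml. The mclosure record and rebuild_env function below are hypothetical simplifications: the real code manipulates OCaml closure blocks and offsets, not association lists.

```ocaml
(* A closure modelled as a list of (variable name, value) bindings.
   rebuild_env augments the main closure's environment with any binding
   that occurs in a splice's environment but not in the main one; a name
   already present in the main environment is kept as-is. *)
type 'v mclosure = { env : (string * 'v) list }

let rebuild_env (main : 'v mclosure) (splices : 'v mclosure list) : 'v mclosure =
  let add acc (name, v) =
    if List.mem_assoc name acc then acc else acc @ [ (name, v) ]
  in
  let merged =
    List.fold_left (fun acc s -> List.fold_left add acc s.env) main.env splices
  in
  { env = merged }
```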
This closure can then be processed in all the usual ways code can be processed, giving the user of the system multi-staging abilities.

3.3.1 Dealing with scope extrusion

The scope extrusion problem as it applies to MetaOCaml was described by Taha [30]. Briefly, the scope extrusion problem is the fact that a naive implementation of splicing may allow variables that should be bound (by an abstraction or let-form) to escape the scope of their corresponding binder. Taha's original example illustrating the scope extrusion problem appears in Figure 3.7. As first described in [30], a naive implementation of MetaOCaml will return something like .<a123>. when presented with this example. The peculiar value .<a123>. is the (uniquely-renamed) variable a, which is bound by the fun a ->, but is free in the code fragment that follows the .~. The difficulty arises because, as mentioned in Section 3.3, the MetaML mental model of the order of evaluation demands that the spliced-in values be evaluated before the code fragment into which they are being spliced. The variable a is free in the code fragment, so the expression (fun x -> .<a>.) 0 yields the variable .<a123>.; when the closure environment is built for the function (fun x -> .<x>.), the variable x is bound to the code fragment that contains .<a123>., and when the value bound to the variable a is finally known, a naive implementation does not "realize" that it needs to traverse the structure of the closure created for the function fun a -> to find and replace the now-unbound value of a.

My solution to the scope extrusion problem is based on the observation that, at any point in the code fragment, it is known which variables are bound (static scope discipline ensures that that is the case).

[Footnote 10: The bound variables of each splice are renamed before substitution is performed to avoid accidental variable capture.]
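The bound-variable bookkeeping this observation relies on can be sketched in plain OCaml over a hypothetical mini-lambda AST (the real traversal works on Lambda terms): descend the term carrying the set of binders seen so far, and report the variables that occur free, i.e., the ones that would extrude their scope if the fragment were spliced elsewhere.

```ocaml
(* A minimal free-variable computation by recursive descent, tracking
   the binders encountered so far. *)
type lam =
  | LVar of string
  | LFun of string * lam          (* fun x -> body *)
  | LApp of lam * lam

let free_vars (t : lam) : string list =
  let rec go bound acc = function
    | LVar x ->
        if List.mem x bound || List.mem x acc then acc else acc @ [ x ]
    | LFun (x, body) -> go (x :: bound) acc body
    | LApp (f, a) -> go bound (go bound acc f) a
  in
  go [] [] t
```

In Taha's puzzle, the fragment corresponding to .<a>. has the free variable a (in this model, free_vars (LVar "a") = ["a"]), which is exactly the variable whose value a cover must patch in once it becomes known.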
In other words, the compiler can find out which variables should be bound simply by recursively descending the code term while keeping track of the binders encountered so far and the variables they bind. To handle the scope extrusion problem in my code, I extended the Ulambda language by introducing a Ucover term (the inspiration for my implementation of Ucover, as well as the term "cover" itself, comes from Taha [30]).

The high-level intuition for my implementation of the concept of covers, pioneered by Taha [30], is that I want a construct that obeys the rules of regular lexical scoping (so that it can have access to all the variables that are lexically visible at the point at which it occurs), but which delays actually acting on the pieces of code it covers until the values of the variables that appear to be free in the body of the code being spliced in are known.

The operation of the Ucover term can be described as follows:

1. When a piece of code is compiled, the Lambda tree is traversed, looking for variables bound by functions. Each point in the code where an expression is spliced in is annotated with the list of (names of) variables that have been bound by the function definitions encountered so far. I view the information about "what variables are bound" as a property of the context into which an expression is being spliced rather than as an intrinsic property of the spliced-in expression itself.

2. When the expression being spliced is closure-converted, the lexical environment in which closure conversion happens is "poisoned" by binding each of the variables that appear as an argument to any function (from the previous item) to a special tag (Ufreevar variable_name_here).

3. The closure-converted code is then examined to see which Ufreevars (if any) have been "closed over". For any Ufreevars that have been captured, offsets from the beginning of the closure block are collected.
(This trick is borrowed from stock OCaml's close_functions code, which is what processes a "let rec" definition: that function does something very similar to what I described above to decide whether it actually needs to pass to a function the address of its corresponding closure block.)

4. The list of Ufreevars is used to construct a Lcover that "wraps around" the splicing-in point. The Lcover expression is part of the "main" body of the code (as opposed to being part of the code of the expression being spliced in), so all variables that have been bound by functions are lexically visible to Lcover. Lcover eventually compiles to code that puts the values corresponding to those variables at their intended offsets in the closure's variable block (Ufreevar compiles to code that just allocates a dummy cell in the closure's variable block).

Chapter 4

Evaluation

In this chapter, I present the results of evaluating the performance of my MetaOCaml implementation and other MetaOCaml implementations on a few benchmarks. As mentioned in Section 2.4, a property of the MetaOCaml design is that it is always possible to erase all annotations from a piece of MetaOCaml code to end up with valid plain OCaml code. The role of the annotations is to allow the programmer to influence the order of evaluation. The stated motivation for the design of MetaML [31] (and MetaOCaml) was that, by allowing programmers to control the order of evaluation, programmers will be able to influence the performance of the programs they write. In view of that design goal of annotations as a tool for improving the performance of programs, I consider the ratio of the running time of a staged function (under MetaOCaml) to the running time of the unstaged function (under the matching version of the plain OCaml compiler) to be of primary interest, and raw running time statistics to be of secondary interest.

4.1 Methodology

The measurements for the 4.02.1 series of compilers were collected using the Core_bench library from Jane Street [17].
Suppose I have a procedure (i.e., a function that takes no arguments and does not return any interesting results to its caller) named p. The Core_bench library attempts to find and report the average running time t_avg of the procedure p, which allows the total running time t_total for k runs of p to be calculated simply as t_total = k * t_avg. It finds t_avg by sampling t_total for various values of k, and then fitting a straight line through the data points obtained during sampling. Figure 4.1 shows the OCaml code to benchmark two OCaml procedures named p1 and p2 using Core_bench.

    open Core.Std
    open Core_bench.Std

    let p1 () = ...
    let p2 () = ...

    let main () =
      Command.run (Bench.make_command [
        Bench.Test.create ~name:"p1" p1;
        Bench.Test.create ~name:"p2" p2;
      ])

    let () = main ()

Figure 4.1: The OCaml code to benchmark (hypothetical) procedures named p1 and p2 using Core_bench.

Unfortunately, Core_bench is not available under the original MetaOCaml [4] (because the version of OCaml on which that implementation of MetaOCaml is based is too old), so I used a simple piece of code to run the function for a specific number of iterations and report the average running time. The code appears in Figure 4.2. The number of times to run each individual function was determined in an ad-hoc manner, by trying a few different values until the reported average time appeared to be stable; I report the number of iterations used for each function.

All measurements were performed on a virtual machine running Debian stable (jessie) release 8.2 with a 64-bit Linux kernel, version 3.16.0.
The virtual machine was run under VirtualBox version 5.0.6; the virtual machine had a single CPU allocated to it; the host CPU was an i5-2520M with a (nominal) clock frequency of 2.5 GHz.

Table 4.1 describes the microbenchmarks used to measure the performance of the OCaml systems under evaluation. The power and fib functions are traditional favorites of the partial evaluation and multi-stage programming communities, while peval1 and peval2 were used by Taha [28] as case studies to motivate the usefulness of multi-stage programming techniques.

    let iters = ref 1234

    let rec loop i f =
      if i <= 0 then ()
      else begin f (); loop (i - 1) f end

    let simple_bench f =
      let t0 = Sys.time () in
      loop !iters f;
      let t1 = Sys.time () in
      print_float ((t1 -. t0) /. (float_of_int !iters));
      print_newline ()

    let main f =
      if Array.length Sys.argv > 1
      then iters := int_of_string (Sys.argv.(1));
      simple_bench f

    let () = main (fun () -> ...)

Figure 4.2: The OCaml code to collect timing data under the original (Taha et al.) MetaOCaml.

Benchmark | Description
fib       | The function to calculate Fibonacci numbers.
power     | The function to raise a number to an (integer) power.
cfib      | The staged version of the function to calculate Fibonacci numbers.
cpower    | The staged version of the function to raise a number to an integer power.
peval1    | An interpreter for a simple language with the four arithmetic operations and first-order functions.
peval2    | A staged version of the interpreter for a simple language with the four arithmetic operations and first-order functions.

Table 4.1: Description of the benchmarks used in evaluating the performance of the MetaOCaml implementations.

The code for power, fib, cpower and cfib appears in Figure 4.3.
The code for peval1 and peval2 appears in [28].1

4.2 Performance results

Table 4.2 shows the ratios of running times of staged code in the MetaOCaml implementations being benchmarked to the respective running times of plain OCaml code under the same major and minor version of the compiler and the same backend as the staged code. My MetaOCaml implementation is currently based on OCaml 4.02.1 and comes with a native code backend, so I report the numbers normalized to native code produced by the standard OCaml compiler version 4.02.1; on the other hand, BER MetaOCaml comes with a byte code backend (and is also based on OCaml 4.02.1), so I report numbers for BER normalized to byte code produced by the standard OCaml compiler version 4.02.1; finally, the original MetaOCaml implementation is based on OCaml version 3.09, and comes with both a native code and a byte code backend; however, I am primarily interested in the native code backend, so I only report numbers for the original MetaOCaml normalized to the native code produced by the standard OCaml compiler version 3.09.

Tables 4.4, 4.5, 4.8, 4.6, 4.7, and 4.9 present the "raw" running times for BER MetaOCaml, my implementation of MetaOCaml, the original MetaOCaml, plain OCaml 4.02.1 (the byte code backend), plain OCaml 4.02.1 (the native code backend), and OCaml 3.09 (the native code backend), respectively. As mentioned in Section 4.1, the numbers for implementations based on OCaml version 4.02.1 were collected using Core_bench, while the numbers for the implementations based on OCaml 3.09 were collected using a simple piece of code which only measures the running times.
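The straight-line fit Core_bench performs (Section 4.1) amounts to a least-squares estimate of t_avg from (k, t_total) samples for a line through the origin, t_total = k * t_avg. The following is a minimal sketch of that estimate, not Core_bench's actual implementation:

```ocaml
(* Least-squares slope through the origin: given samples (k, t_total),
   the estimate of t_avg is (sum of k*t) / (sum of k^2). *)
let fit_t_avg (samples : (float * float) list) : float =
  let num = List.fold_left (fun s (k, t) -> s +. (k *. t)) 0.0 samples in
  let den = List.fold_left (fun s (k, _) -> s +. (k *. k)) 0.0 samples in
  num /. den
```

Fitting a slope rather than averaging individual timings helps discount per-batch overheads that do not scale with k.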
Unfortunately, no numbers could be obtained for peval2 under the original MetaOCaml implementation, since the compiled program crashed with the message "Fatal error: exception Ctype.Unify(_, _)"2 (the program does run under the bytecode toplevel in the original MetaOCaml).

[Footnote 1: For consistency, I prefix the names of all staged functions with "c" (for "code"), so peval2 appears as cpeval2.]
[Footnote 2: Regrettably, I am not familiar enough with the original MetaOCaml implementation to be able to effectively debug the cause of the crash.]

Benchmark | BER MetaOCaml | My MetaOCaml | Original MetaOCaml
cpower    | 2850          | 180*10^3     | 6530
cfib      | 45000         | 22*10^6      | 780*10^3

Table 4.2: Ratio of the running time of the (annotated) benchmarks to the running time of the unannotated program in the corresponding OCaml compiler.

    let rec power x n = if n = 0 then 1 else x * power x (n-1)

    let rec fib a b n = if n = 0 then a else fib b (a + b) (n-1)

    let rec sfib a b n = if n = 0 then a else sfib b (.< (.~a) + (.~b) >.) (n-1)
    let cfib = sfib (.<2>.) (.<3>.) 17

    let rec spower x n = if n = 0 then .<1>. else .< (.~x) * (.~(spower x (n-1))) >.
    let cpower = spower (.<2>.) 17

Figure 4.3: The code for power and fib and their staged counterparts, cpower and cfib.

I speculate that the reason staged functions run significantly slower than unstaged versions of the same function is that staged functions allocate significantly more heap memory than their non-staged counterparts. The memory allocation numbers, as reported by Core_bench, are shown in Table 4.3. As can be seen, both BER MetaOCaml and my implementation of MetaOCaml allocate significantly more memory than regular OCaml when running these benchmarks.

I speculate that the reason my implementation allocates significantly more heap storage than BER MetaOCaml is the serialization and deserialization of the Lambda intermediate language. Getting rid of the serialization and deserialization is one obvious avenue for future work.
At a higher level, my measurements suggest that reducing the number of allocations in the run-time heap during a program's run is likely to yield higher performance gains than reducing the number of CPU instructions the program executes while it runs.

Benchmark | BER MetaOCaml | My MetaOCaml | Stock OCaml
cpower    | 18*10^3       | 2.5*10^6     | 0
cfib      | 2600*10^3     | 17.9*10^6    | 0

Table 4.3: Memory allocation for the benchmarks (in words allocated in the minor heap).

Name    | Time      | minor heap       | major heap     | minor → major promotions
cpower  | 1,600µs   | 18*10^3 words    | 2*10^3 words   | 3*10^3 words
cfib    | 230,000µs | 2,700*10^3 words | 400*10^3 words | 400*10^3 words
cpeval2 | 600µs     | 7*10^3 words     | 1*10^3 words   | 1*10^3 words

Table 4.4: Raw data for BER MetaOCaml, as reported by Core_bench. All numbers are average values per run.

Name   | Time      | minor heap     | major heap     | minor → major promotions
cpower | 110,000µs | 2.5*10^6 words | 5*10^3 words   | 3.7*10^3 words
cfib   | 790,000µs | 18*10^6 words  | 840*10^3 words | 770*10^3 words

Table 4.5: Raw data for my MetaOCaml implementation, as reported by Core_bench. All numbers are average values per run.

Name   | Time     | minor heap
power  | 575.46ns |
fib    | 510.08ns |
peval1 | 9,600ns  | 150.00 words

Table 4.6: Raw data for OCaml v4.02.1 with the byte code back-end, as reported by Core_bench. All numbers are average values per run. BER MetaOCaml running times are normalized to the numbers in this table.

Name   | Time    | minor heap
power  | 61ns    |
fib    | 36ns    |
peval1 | 1,800ns | 149 words

Table 4.7: Raw data for OCaml v4.02.1 with the native code back-end, as reported by Core_bench. All numbers are average values per run. The running times of my MetaOCaml implementation are normalized to the numbers in this table.

Name    | Time      | number of iterations
cpower  | 430µs     | 700
cfib    | 31,000µs  | 400
cpeval2 | (crashed) | —

Table 4.8: Raw data for the original MetaOCaml implementation. All numbers are average values per run.

Name   | Time    | number of iterations
power  | 66ns    | 299999999
fib    | 41ns    | 199999999
peval1 | 1,690ns | 29999999

Table 4.9: Raw data for OCaml 3.09 with the native code backend. All numbers are average values per run.
The running times of the original MetaOCaml implementation are normalized to the numbers in this table.

Chapter 5

Conclusions

5.1 Summary

I have presented my re-implementation of MetaOCaml on top of a modern OCaml compiler. My implementation uses the Lambda intermediate representation (as opposed to abstract syntax trees) to represent OCaml code fragments. Cross-stage persistence is achieved by using an extended version of OCaml closures, which combines the values vector of the usual OCaml closure with a code fragment to be compiled and the lexical environment in which to perform the compilation. Splicing is accomplished by directly manipulating the Lambda intermediate representation, followed by concatenation of the values vectors of the OCaml closures which correspond to the main body of code and the code being spliced in. (Implementing splicing by direct manipulation of Lambda terms is helped by the fact that the Lambda language is rather high-level and AST-like. It has been argued that this property makes certain kinds of "standard" optimizations awkward to perform [10], but the AST-like nature of Lambda definitely helped with my implementation of MetaOCaml.) The scope extrusion problem is dealt with by a pre-emptive scan of the lexical structure of each code term.

5.2 Retrospective

I have presented a (re)implementation of MetaOCaml that supports turn-key native code generation on top of a modern OCaml compiler infrastructure. The existing MetaOCaml implementation that supports turn-key generation of native code is not based on a modern OCaml compiler; the existing MetaOCaml implementation that is based on a modern OCaml compiler does not support turn-key generation of native code.

Along with producing a convenient way to compile MetaOCaml to native code, a secondary goal of the project was to evaluate the current version of OCaml as a substrate for implementing MetaOCaml.
In summary, I am happy to report that the current OCaml compiler is a very good environment for implementing MetaOCaml: the total number of lines of code that needed to be changed or added as compared with the standard OCaml compiler is well under 3000. I was able to extend the existing compiler data structures and functions essentially without doing any major re-architecting work on the compiler: outside of exposing the load_lambda interface in toploop/nattoploop.ml, and changing the scope of the module-global reference program_name in the same file, very little of my implementation work can be accurately described as "fixing bugs in the OCaml compiler."

A third goal of the project was to check whether, by removing the redundant type-checking at run time, the overall performance of the MetaOCaml compiler could be improved (as compared with the existing native-code compiler). Unfortunately, no significant improvements in the run time of programs were observed during the performance evaluation. One area potentially worth exploring with respect to run-time performance is whether serializing and deserializing complete Lambda structures is really necessary. It certainly was a convenient and quick implementation strategy, but it may have come at a cost in terms of program performance.

5.3 Possible future work

Possible directions for future work are:

Do not serialize Lambda: A question worth investigating is whether performance can be significantly increased by getting rid of the serialization/deserialization of the Lambda intermediate representation into a string.

Bytecode compiler: During the development of the native code compiler, I implemented run and cross-stage persistence in a byte code compiler.
Unfortunately, the implementation of the byte code version of those features did not use the implementation model which can be summarized as "extend the Lambda language enough to be able to support MetaOCaml; make a minimal wrapper around load_lambda": instead, it relied on a special version of load_lambda. Redoing that implementation so that it does not need a special version of load_lambda would be desirable.

Native code back-end for BER MetaOCaml: BER MetaOCaml is an implementation of MetaOCaml that keeps pace with current developments of the OCaml compiler. Some of the experience gained from the current "clean-room" implementation of a native MetaOCaml code generator may be transferable towards producing a native code generator for BER.

Bibliography

[1] Harold Abelson and Gerald J. Sussman. Structure and Interpretation of Computer Programs. 2nd ed. Cambridge, MA, USA: MIT Press, 1996. ISBN: 0262011530.

[2] Baris Aktemur et al. "Shonan Challenge for Generative Programming: Short Position Paper". In: Proceedings of the ACM SIGPLAN 2013 Workshop on Partial Evaluation and Program Manipulation. PEPM '13. Rome, Italy: ACM, 2013, pp. 147–154. ISBN: 978-1-4503-1842-6.

[3] Andrew W. Appel. Compiling with Continuations. New York, NY, USA: Cambridge University Press, 1992. ISBN: 0-521-41695-7.

[4] Cristiano Calcagno et al. "Implementing Multi-stage Languages Using ASTs, Gensym, and Reflection". In: Proceedings of the 2nd International Conference on Generative Programming and Component Engineering. GPCE '03. Erfurt, Germany: Springer-Verlag New York, Inc., 2003, pp. 57–76. ISBN: 3-540-20102-5. URL: http://dl.acm.org/citation.cfm?id=954186.954190.

[5] Luca Cardelli. The Functional Abstract Machine. Technical Report TR-107. Bell Laboratories, 1983. URL: https://karczmarczuk.users.greyc.fr/matrs/Maitrise/fam.pdf.

[6] A. Church. The Calculi of Lambda-Conversion. Annals of Mathematics Studies. Princeton University Press, 1941.

[7] Joshua Dunfield. Private communication.
2015.

[8] Matthias Felleisen and Daniel P. Friedman. A Little Java, a Few Patterns. Cambridge, MA, USA: Massachusetts Institute of Technology, 1998. ISBN: 0-262-56115-8.

[9] Matthias Felleisen et al. How to Design Programs: An Introduction to Programming and Computing. Cambridge, MA, USA: MIT Press, 2001. ISBN: 0-262-06218-6.

[10] Fabrice Le Fessant. Flambda trunk. Dec. 18, 2014. URL: https://github.com/ocaml/ocaml/pull/132 (visited on 12/18/2014).

[11] Matthew Flatt. Rebuilding Racket's Graphics Layer. June 15, 2015. URL: http://blog.racket-lang.org/2010/12/racket-version-5.html (visited on 12/08/2010).

[12] Free Software Foundation, ed. GNU Emacs Lisp Reference Manual. Section "GNU Emacs Internals", subsection "Object Internals". 2015. URL: http://www.gnu.org/software/emacs/manual/html_node/elisp/Object-Internals.html#Object-Internals.

[13] Free Software Foundation, ed. The Guile Reference Manual. Section "Guile Implementation". 2014. URL: http://www.gnu.org/software/guile/manual/html_node/Data-Representation.html#Data-Representation.

[14] Paul Graham. On LISP: Advanced Techniques for Common LISP. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1993. ISBN: 0130305529.

[15] Paul Graham. Succinctness Is Power. Sept. 15, 2015. URL: http://www.paulgraham.com/power.html (visited on 05/01/2002).

[16] Miguel Guerrero et al. "Implementing DSLs in metaOCaml". In: Companion to the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications. OOPSLA '04. Vancouver, BC, Canada: ACM, 2004, pp. 41–42. ISBN: 1-58113-833-4.

[17] Roshan James. Core_bench: Better Micro-benchmarks through Linear Regression. May 17, 2014. URL: https://blogs.janestreet.com/core_bench-micro-benchmarking-for-ocaml/ (visited on 10/14/2015).

[18] Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. Partial Evaluation and Automatic Program Generation. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1993. ISBN: 0-13-020249-5.

[19] Oleg Kiselyov and Chung-chieh Shan. "The MetaOCaml files".
2010.

[20] Oleg Kiselyov and Walid Taha. "Relating FFTW and Split-radix". In: Proceedings of the First International Conference on Embedded Software and Systems. ICESS '04. Hangzhou, China: Springer-Verlag, 2005, pp. 488–493. ISBN: 3-540-28128-2, 978-3-540-28128-3.

[21] Peter J. Landin. "The mechanical evaluation of expressions". In: The Computer Journal 6.4 (1964), pp. 308–320.

[22] Xavier Leroy. The ZINC Experiment: An Economical Implementation of the ML Language. Technical Report 117. INRIA, 1990.

[23] John McCarthy. "Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I". In: Commun. ACM 3.4 (Apr. 1960), pp. 184–195. ISSN: 0001-0782.

[24] Simon Peyton Jones and Norman Ramsey. C-- FAQ (Web Archive). 2007. URL: http://web.archive.org/web/20080716105923/http://www.cminusminus.org/faq.html.

[25] Simon Peyton Jones and Norman Ramsey. C-- Official Site (Web Archive). 2007. URL: http://web.archive.org/web/20080822062234/http://www.cminusminus.org/.

[26] Eric S. Roberts. Thinking Recursively: With Examples in Java. John Wiley & Sons, 2005. ISBN: 0471701467.

[27] Moses Schönfinkel. "Über die Bausteine der mathematischen Logik". In: Math. Ann. 92.3–4 (1924), pp. 305–316.

[28] Walid Taha. "A gentle introduction to multi-stage programming". In: Domain-specific Program Generation. LNCS. Springer-Verlag, 2004, pp. 30–50.

[29] Walid Taha. "A Sound Reduction Semantics for Untyped CBN Multi-stage Computation. Or, the Theory of MetaML is Non-trivial (Extended Abstract)". In: Proceedings of the 2000 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-based Program Manipulation. PEPM '00. Boston, Massachusetts, USA: ACM, 1999, pp. 34–43. ISBN: 1-58113-201-8.

[30] Walid Mohamed Taha. "Multistage Programming: Its Theory and Applications". AAI9949870. PhD thesis. 1999. ISBN: 0-599-52497-9.

[31] Walid Taha and Tim Sheard. "Multi-stage Programming with Explicit Annotations".
In: Proceedings of the 1997 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation. PEPM '97. Amsterdam, The Netherlands: ACM, 1997, pp. 203–217. ISBN: 0-89791-917-3.

[32] Mads Tofte et al. Programming with Regions in the MLKit (Revised for Version 4.3.0). Tech. rep. IT University of Copenhagen, Denmark, Jan. 2006.

[33] Mitchell Wand. "The Theory of Fexprs is Trivial". In: Lisp Symb. Comput. 10.3 (May 1998), pp. 189–199. ISSN: 0892-4635.

