UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Platform-independent live process migration for edge computing applications Jung, Kumseok 2021

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata

Download

Media
24-ubc_2021_may_jung_kumseok.pdf [ 1.65MB ]
Metadata
JSON: 24-1.0395872.json
JSON-LD: 24-1.0395872-ld.json
RDF/XML (Pretty): 24-1.0395872-rdf.xml
RDF/JSON: 24-1.0395872-rdf.json
Turtle: 24-1.0395872-turtle.txt
N-Triples: 24-1.0395872-rdf-ntriples.txt
Original Record: 24-1.0395872-source.json
Full Text
24-1.0395872-fulltext.txt
Citation
24-1.0395872.ris

Full Text

Platform-independent Live Process Migrationfor Edge Computing ApplicationsbyKumseok JungB.Sc., University of British Columbia, 2014A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of Applied ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Electrical and Computer Engineering)The University of British Columbia(Vancouver)February 2021c© Kumseok Jung, 2021The following individuals certify that they have read, and recommend to the Faculty of Graduateand Postdoctoral Studies for acceptance, the thesis entitled:Platform-independent Live Process Migrationfor Edge Computing Applicationssubmitted by Kumseok Jung in partial fulfillment of the requirements for the degree of Master ofApplied Science in Electrical and Computer Engineering.Examining Committee:Karthik Pattabiraman, Computer EngineeringSupervisorSathish Gopalakrishnan, Computer EngineeringSupervisory Committee MemberAlexandra Fedorova, Computer EngineeringSupervisory Committee MemberiiAbstractThe past decade has witnessed the rise of the Internet of Things (IoT) devices with single-boardcomputers such as the Raspberry Pi. The increased programmability and connectivity allow re-alization of the edge computing paradigm, in which we run on these devices complex distributedapplications that were traditionally run on the cloud.Since IoT devices are subject to resource constraints like available battery power, we need todynamically migrate a running process from one machine to another to prevent losing state. Incloud systems, VM migration techniques based on virtual memory snapshots are used in similarfailure scenarios. However, it is challenging to apply virtualization-based migration techniques inthe IoT domain, due to the differences in processor and platform architecture between IoT devices.In this thesis, we present a platform-independent migration technique, which we call ThingsMi-grate, to address this challenge. Given a program, ThingsMigrate automatically instruments thesource code to expose the hidden states such as closures and continuations. During run-time, the in-strumented program produces on demand a platform-agnostic snapshot of the process, from whichnew code is generated to resume execution. Thus, ThingsMigrate enables process migration with-out any modifications to the underlying virtual machine, providing platform-independence. Usingstandard JavaScript (JS) benchmarks, we demonstrate that it can migrate resource-intensive appli-cations, with average run-time latency overhead of 33% and memory overhead of 78%. ThingsMi-grate supports multiple subsequent migrations without introducing additional overhead over eachsubsequent migration.iiiLay SummaryWhen computers fail for unknown reasons, any tasks running on it are lost and need to be restarted.To prevent such a catastrophe, cloud computer systems periodically take “snapshots” of runningprograms, so that they can be recovered quickly on another computer upon a failure. Since all kindsof “smart” devices and household appliances nowadays have full-fledged computers inside them,we want to apply the same preventive measures as those employed by cloud systems. However, asnapshot of a program can only be copied between devices of the same type, and currently there isno general way to recover snapshots across different devices. In this thesis, we present a techniquethat allows running programs to be saved and recovered across different types of devices. Wedemonstrate our technique on 3 different types of devices to show that it can be applied in practice.ivPrefaceThis thesis is the result of work carried out by myself, under the supervision of my advisor, Dr.Karthik Pattabiraman, and Dr. Julien Gascon-Samson. All chapters of this thesis are based on the2 publications listed below, of which I am an author.A version of this material has been published as [Gascon-Samson et al. ThingsMigrate: Platform-Independent Migration of Stateful JavaScript IoT Applications. 32nd European Conference onObject-Oriented Programming (ECOOP 2018), 18:1-32, 2018]. I was the lead research program-mer, involved in all major areas of concept formation and contributed over 90% of the implementa-tion of the research artifact. I was also responsible for data collection and analysis, and contributedto several sections of the manuscript. Dr. Gascon-Samson led this project, contributing to themajority of the manuscript, leading the discussions and concept formation, and designing the ex-periments. Dr. Pattabiraman was responsible for overseeing the project, providing guidance andediting parts of the manuscript.Another version of this material has been published as [Jung et al. ThingsMigrate: Platform-Independent Migration of Stateful JavaScript IoT Applications. Software: Practice and Expe-rience, 51:117-155, 2020]. I led this work and was involved in all areas of the research, frommanuscript composition, artifact implementation, and data collection and analysis. Dr. Gascon-Samson provided insights into the problems and contributed to several sections of the manuscript.Dr. Pattabiraman provided guidance throughout this work and edited parts of the manuscript.vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.4 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9vi2.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.2 Motivating Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.3 Formal Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162.5 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.1.1 Development Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.2 Technique 1: “TREECOPY” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.2.1 Code Instrumentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.2.2 Snapshotting and Migrating . . . . . . . . . . . . . . . . . . . . . . . . . 303.2.3 Code Restoration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313.2.4 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363.3 Technique 2: “XPLICTGC” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383.4 Technique 3: “LAZYSNAP” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484.1 Experimental Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484.1.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494.1.2 Experiment 1: Code Instrumentation . . . . . . . . . . . . . . . . . . . . . 504.1.3 Experiment 2: Run-time Performance Overhead . . . . . . . . . . . . . . . 524.1.4 Experiment 3: Migration Overhead . . . . . . . . . . . . . . . . . . . . . . 584.1.5 Experiment 4: Multiple Migrations . . . . . . . . . . . . . . . . . . . . . . 60vii4.1.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624.2 Case Study: Motion Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624.2.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634.2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725.2.1 Generalization of the Approach to Other High-level Languages . . . . . . . 735.2.2 Support for I/O Resources . . . . . . . . . . . . . . . . . . . . . . . . . . 735.2.3 Applying Live-migration for Performance and Reliability . . . . . . . . . . 74Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76A Research Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83A.1 ThingsJS and ThingsMigrate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83A.1.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83A.1.2 Implementation as an Open-Source Project . . . . . . . . . . . . . . . . . 84A.2 Web Dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86A.2.1 Interface Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86A.2.2 Main Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88viiiList of TablesTable 1.1 Comparison between JS migration techniques . . . . . . . . . . . . . . . . . . 7Table 2.1 JS abstract syntax tree (AST) Node Reference Table . . . . . . . . . . . . . . . 16Table 4.1 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53Table 4.2 Code Instrumentation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 53ixList of FiguresFigure 2.1 Plain JS Code Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13Figure 2.2 JS Example Closures and Scopes . . . . . . . . . . . . . . . . . . . . . . . . . 14Figure 2.3 Schematic diagram of ThingsMigrate system architecture . . . . . . . . . . . . 17Figure 3.1 Scope Object Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26Figure 3.2 JS Code Example - instrumented with V1 . . . . . . . . . . . . . . . . . . . . 29Figure 3.3 JS Code Example - restored with V1 . . . . . . . . . . . . . . . . . . . . . . . 35Figure 3.4 JS Code Example - instrumented with V2 . . . . . . . . . . . . . . . . . . . . 40Figure 3.5 JS Code Example - instrumented with V3 . . . . . . . . . . . . . . . . . . . . 43Figure 4.1 Code Instrumentation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 50Figure 4.2 Code Instrumentation Results versus Code Size . . . . . . . . . . . . . . . . . 51Figure 4.3 Normalized Execution Time . . . . . . . . . . . . . . . . . . . . . . . . . . . 54Figure 4.4 Memory Usage (mb) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54Figure 4.5 Snapshot / Restoration Time (in seconds) . . . . . . . . . . . . . . . . . . . . 59Figure 4.6 Multi-Hop Migration Analysis (regulator application) . . . . . . . . . . . . . 60Figure 4.7 Snapshot Size over Time (regulator application) . . . . . . . . . . . . . . . . 60Figure 4.8 Case study setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Figure 4.9 CPU Usage over time – Motion Detector component . . . . . . . . . . . . . . 65Figure 4.10 FPS over time – Motion Detector component . . . . . . . . . . . . . . . . . . 66xFigure A.1 High-Level Architecture of ThingsJS . . . . . . . . . . . . . . . . . . . . . . 85Figure A.2 ThingsMigrate Dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87xiGlossaryAST abstract syntax treeIoT Internet of ThingsJS JavaScriptJSON JavaScript Object NotationOS operating systemVM virtual machinexiiAcknowledgmentsFirstly, I would like to thank my advisor Dr. Karthik Pattabiraman for his guidance and supportthroughout this thesis work. He has provided a sandbox for me to play in (metaphorically speak-ing), helping me to explore different ideas and find my own direction. He has inspired me to thinkand grow as a researcher. I thank Dr. Julien Gascon-Samson for his leadership and the valuable dis-cussions that brought this work to fruition. He has played a huge role in delivering this work and Icould not have accomplished it on my own. I also thank my colleagues in the Dependable SystemsLab and the members of the Computer Science Reading Group (CSRG) for the fun discussionsand the feedback on my work.I would like to thank the examining committee, Dr. Sathish Gopalakrishnan and Dr. AlexandraFedorova, for the thought-provoking follow-up questions and the feedback on this thesis.I thank my mother, father, brother, and sister for their unconditional love and support.Finally, I would like to thank Sang Eun, my fiancée, for the moral support.xiiiChapter 1IntroductionOver the past decade, there has been significant adoption of Internet of Things (IoT) devices, withsome estimates putting the number of IoT devices to grow to tens of billions[25, 48] by 2030. IoTdevices embed single-board computers (SBC) such as the Raspberry PI, and thus are equippedwith general-purpose operating system (OS) like Linux, allowing users to easily write programsusing high-level languages such as JavaScript (JS) or Python. As a result, developers other thanthe device manufacturers can write custom software to run on arbitrary IoT devices and deviceusers can install third-party software, giving rise to a flourishing software market. This increasedprogrammability and connectivity brings the edge computing paradigm closer[49, 53], in whichwe run on these devices complex distributed applications that were traditionally run on the cloud.Unlike the cloud servers, IoT devices have additional resource constraints like available batterypower, and we need to dynamically migrate a running process from one machine to another toprevent losing state. However, virtualization-based migration techniques used in the cloud[12,68] cannot be applied easily in the IoT domain, due to the platform differences between the IoTdevices. In this thesis, we present a platform-independent process migration technique, which wecall ThingsMigrate. ThingsMigrate enables the migration of edge computing applications writtenin JS across different IoT devices.11.1 MotivationIoT applications are becoming stateful. The edge computing paradigm offers certain advantagesin terms of latency, bandwidth consumption, and some aspects of security [53]. We envision run-ning on the IoT devices complex applications that were traditionally run on the cloud, which wouldrender better utilization of the resources in the underlying computer network. Consequently, as IoTdevices and applications become more complex, they inherently generate more elements of state(i.e., variables, arrays, objects) during run-time. For instance, in an application that detects motionpatterns in a video stream (Section 4.2), elements of state would include the pixels of the videoframes being inspected, as well as any intermediate results produced as part of the computation.IoT devices are resource-constrained. While general purpose OSes and high-level languagevirtual machine (VM)s may bring programmability to IoT, IoT systems are under different con-straints than traditional PC networks or cloud datacenters. For instance, many IoT devices arebattery powered – such as wearables and drones – meaning that they have a finite timespan fordoing useful work and their compute capacity is influenced by the energy available[38]. Computeresources such as memory and CPU cores are more scarce than cloud machines, and dynamicallyprovisioning them is challenging in IoT[57] due to the heterogeneity of devices and their scatteredownership. Predicting resource utilization is also more difficult than in the cloud because many IoTapplications are interfacing with the external environment, making them highly non-deterministic.Process migration is useful for performance and reliability. Due to the aforementionedconstraints of IoT devices, a great deal of flexibility is required in scheduling the workloads. Inother words, static deployment of IoT applications to devices is insufficient, and hence there isa need for applications to be migratable. For example, an increase in the computational load ora decrease in battery level of a given IoT device might require delay-sensitive applications to bemigrated to a different device. We may also need to migrate processes when there are externalfactors causing device failures (e.g., device gets overheated or physically damaged), or in the case2of cyber-attacks on IoT devices.Memory-based process migration is unsuitable for IoT. In many dependability techniques,process migration is an important building block. At a high-level, it involves capturing the stateof a running process, transferring a snapshot of the process (i.e., an image of the process state),then restoring the process from the snapshot. The same technique can be used to provide highavailability and high resilience to crashes. The most intuitive and well-known process migrationtechnique is the memory snapshot migration. In this technique, we simply copy the memoryregion of the running process to another device that has the same architecture. Unfortunately, suchtechniques are unsuitable in IoT due to the high heterogeneity of devices. Different devices havedifferent memory layouts and instruction set architectures, and hence the snapshot serializationand deserialization steps will have to account for a variety of migration targets. Therefore, to beable to migrate a process from a device to any arbitrary device, the technique needs to be platform-independent.JS is prevalent in IoT. In a heterogeneous software ecosystem like the IoT, high-level lan-guages running on a VM – e.g., JS, Python – offer many advantages such as greater code portabil-ity/reusability and developer productivity, due to the platform-independent semantics and higherlevel of abstraction than low-level languages like C or Assembly. In this thesis, we focus on JS asthe programming language for IoT applications.While JS has enjoyed wide popularity in the Web for a long time, it is now a mature and richgeneral-purpose programming language. JS is one of the most popular languages today (in 2020),and ranks seventh in the TIOBE programming languages index [31]. It has also been ranked asthe top language on popular open-source development communities such as GitHub and StackOverflow for the last eight years.More recently, it has become more prevalent in the IoT domain[17, 23] following the widespreadadoption of Node.js as a server-side language. In fact, the use of JS opens the possibility of sharinga common codebase and data formats (e.g., JavaScript Object Notation (JSON)) across the Web3and the IoT software stacks (e.g., the client-side and server-side portions of end-user applicationsin a Web of Things (WoT)[26] setting could both be written in JS[11, 21, 39]). As many IoTdevices nowadays provide a browser-based interface, it is fair to assume that they integrate a JSVM. In fact, there are efforts being made at either adopting existing JS VMs (e.g., Node.js for IoTdevices[58]), or developing new JS VMs [17, 23, 55, 59] for the IoT.What makes JS particularly attractive in the context of IoT is its single-threaded, asynchronous,and event-driven execution model. Programming concurrent computations (e.g., Actor model) inJS is fairly straightforward using continuation-passing style (CPS) patterns, where a closure ispassed to a function invocation as a callback. The closure carries the execution context for eachchain of function calls, and the programmer does not need to worry about implementing mutex orsemaphores. Furthermore, the asynchronous execution model means that concurrent function callsare interleaved in the JS event queue, eliminating any sleep time and maximizing CPU utilization.The event-driven paradigm maps well to the IoT landscape, as sensors can be expressed as eventemitters and actuators as event listeners.To sum up, the programming model and the broad applicability of JS makes it an appealing tar-get language for IoT[54]. Popular IoT frameworks such as Samsung SmartThings, Node-RED[19],and AWS Greengrass[52] have all chosen JS as one of the main programming languages for user-land applications.1.2 Related WorkMigration of JS Applications. There has been significant prior work in the area of migratingJS programs. However, they focus on migrating web applications between web browsers [8, 37,41, 47], and hence have different constraints and assumptions than in the context of IoT devicese.g., capturing the DOM state and user input. Consequently, some of the techniques [37, 47]require modification to the underlying JS VM to access the hidden application state, making themplatform-dependent.4Imagen [41] migrates web applications across heterogeneous browsers without altering theVM, and address some of the challenges specific to web applications (e.g., the DOM, HTML5media elements). However, their handling of nested closures is limited (Section 3.2.1). Further,Imagen allows the application to be migrated just once, as it restores the scope hierarchy by cre-ating a global dictionary object and the restored code lacks the necessary instrumentation for sub-sequent migration. The restored program thus may exhibit the same behavior after a single hop ofmigration, but its lexical structure is altered, breaking semantic equivalence.Similar to Imagen, Oh et al.[47] propose a migration framework for web applications, withlimited support for capturing closure states. Kwon et. al. [37] further extend their work to providedeeper support for serializing and reconstructing nested closures, though they report a degradationin performance with growing closure depth. Furthermore, they require modifications to the JSVM to access the internal scope hierarchy, which makes their approach less portable and tied to aspecific version of the open-source Webkit browser.FlashFreeze[60], which claims to be inspired by an earlier version of our work[22], demon-strates a more performant process migration also based on code instrumentation. The performanceboost comes mostly from capturing the program state lazily. In the code instrumentation step,they first statically extract the names of the variables captured by a closure, and inject a capturelist generator function for each closure, which they invoke at the time of snapshot to retrieve thecaptured variables. Similar to Imagen[41], FlashFreeze cannot subsequently migrate a programafter restoring it from a snapshot, making it unsuitable for our target application domain where aprogram needs to be migrated multiple times.Table 1.1 summarizes how prior work in the migration of JS programs compares againstThingsMigrate in terms of the features supported by each technique or framework. As shown inthe table, ThingsMigrate is the only work that supports multiple hops of migration (i.e., repeatedmigration), since it preserves the program semantics between migration. ThingsMigrate, Flash-Freeze, and Imagen are platform-independent as they are based on code-instrumentation; though it5should be noted that Imagen is actually implemented by extending Mozilla Rhino[18] – a JS VMwritten in Java – rather than as an integrated JS module. Kwon et. al. [37] requires access to theunderlying JS VM and thus is not platform-independent.Deterministic Replay. As an alternative to capturing and restoring the state of the web appli-cation, deterministic replay techniques can be used to replay an exact sequence of actions leadingto the current state [2, 6, 9, 44, 51]. However, these approaches focus on web browser events, andare hence not applicable to IoT environments. Further, they are not practical for IoT environmentsthat are resource-constrained, as the sequence of events to be captured and replayed grows rapidlyover time [41].VM/Container Migration. There has been many attempts at providing low-level migrationtechniques that directly save and restore the process memory space, and are hence programminglanguage independent [40, 45, 63, 72, 73]. Recent work in process migration in the mobile/edge/-cloud environments [27, 42, 65] also use a memory-snapshot based technique. Such techniquescould be applied for migrating JS programs, but they would require serializing the state of the JSvirtual machine (VM) itself, which can incur significant overheads on IoT devices. Further, asa single VM might host several IoT components, migrating the entire VM would migrate all thecomponents. Most importantly, providing platform-independent migration would not be possiblein diverse IoT environments, as even the same version of a given VM might have different in-memory representations across platforms due to hardware differences. Similar challenges arise inmigrating virtualized OSes across devices [12, 68], illustrating the limitations of migration tech-niques based on memory-snapshots. We avoid the challenges posed by platform differences byleveraging the platform-independent environment provided by the language runtime (in our case,JS), and by employing code instrumentation in the application layer.6ThingsMigrate Imagen[41] Kwon et al.[37] FlashFreeze[60]Application Domain Server Browser Browser ServerDOM Migration X O O XDeeply Nested Closures O X O OMulti-hop Migration O X X XPlatform-independent O O X OX means that the technique does not support a feature, while O means it does.Table 1.1: Comparison between JS migration techniques1.3 Research QuestionsAs we have discussed in the previous sections, existing migration techniques cannot be easilyadopted in edge computing applications. In this thesis, we aim to address the following researchquestions:• RQ1: Can we migrate the state of a running process in a platform-independent way? Thekey to enabling platform-independent migration is to have an abstract representation of thedata and control state of a running process, and a mechanism to transform a running processto and from the abstract process image. We address this research question in Chapter 3 ofthis thesis.• RQ2: What is the overhead of the migration technique? Our approach takes a given userprogram and transforms it into a “migratable” version, without requiring any user involve-ment. The instrumented version includes additional objects injected by our framework, andthus runs with some overhead. We address this research question in Chapter 4 of this thesis.1.4 Publications• Gascon-Samson, J., Jung, K., Goyal, S., Rezaiean-Asel, A., & Pattabiraman, K. (2018).ThingsMigrate: Platform-independent Migration of Stateful JavaScript IoT Applications.In 32nd European Conference on Object-Oriented Programming (ECOOP 2018). SchlossDagstuhl-Leibniz-Zentrum fuer Informatik.7• Jung, K., Gascon-Samson, J., Goyal, S., Rezaiean-Asel, A., & Pattabiraman, K. (2020).ThingsMigrate: Platform-independent Migration of Stateful JavaScript Internet of ThingsApplications. Software – Practice & Experience. 2020; 51:117-155.8Chapter 2Problem DescriptionIn this chapter, we describe the problem of enabling platform-independent process migration. Weoutline the technical challenges of migrating a JS process (Section 2.1) and provide a motivatingexample (Section 2.2) as well as a formal description of the problem (Section 2.3). We provideadditional context to clarify how our technique is applied in a practical system (Section 2.4), andfinally list the assumptions we make (Section 2.5).2.1 ChallengesMigrating a running JS program from one VM to another VM running on a different platformarchitecture poses several challenges when it comes to capturing and reconstructing the state. Tosupport migration, the current state of the running process must be captured. As mentioned inSection 1.1, naively dumping the process memory into a byte array and replicating it on anothermachine would not work in this heterogeneous setting. Instead, we want to capture the abstractlogical state of a running process. More concretely, in the context of JS, this involves capturingthe various objects bound to variables, function definitions, and – most importantly – closures andtheir context hierarchy. Once we capture everything needed to represent the entire process state,we need to serialize it into a platform-agnostic snapshot, and then restore the logical state entirely9from the snapshot. We have identified the following challenges in doing so.(1) Closures. In JS, extracting an object and its properties is easy, using the standard JSONAPI (e.g., JSON.stringify(foo) where foo is an object). However, there are 2 criticallimitations with this approach. The first is in correctly serializing the static definition of a function.The JSON schema [69] does not specify a serialization format for functions, and the onus is onthe user to decide how functions should be serialized. Serializing a function is more involvedthan serializing the name of the function and the code in its body; we also need to serialize thelexical context surrounding the function. For instance, if a function refers to a variable bar thatis outside its body, then we have to ensure that the variable bar exists in the right place when werestore the function so that we do not break the lexical binding of the said variable. The secondchallenge is in capturing the dynamic state of the closures created by function invocations and thelocal variables initialized inside these closures. This dynamic state is implicit and not visible at thecode level. Accessing a closure’s local scope is impossible through the application layer reflectionAPIs, as a function’s local variables are not accessible from outside by definition. In fact, thissemantics is fundamental in JS and is what makes closures particularly useful; users can create aself-contained, stateful function that carries a private execution context. Since the state of closuresis fundamentally hidden and cannot be accessed by an external agent, we need a mechanism toexpose these hidden states and make them available for serialization.(2) Object References. Since JS is untyped and lacks a syntactic representation of object refer-ences (e.g., pointers), we cannot determine statically whether a given variable holds a literal valueor a reference. It is important to distinguish between literal values and references because naivelycopying references into the snapshot would lead to duplication of objects if there are multiple ref-erences pointing to the same object. For example, consider an object foo with a property barstoring a circular reference to itself. If we do not check whether foo.bar points to foo itself, wewould be recursively copying foo into itself and the snapshot would grow infinitely. Therefore,the links between various objects and their equality have to be assessed dynamically and without10having access to the internals of the VM.(3) Event Queue. JS’s execution model is based on an asynchronous event-loop, which re-peatedly dequeues handler functions from the event queue in a FIFO fashion and calls them oneafter another until the event queue is exhausted. Although there are built-in APIs to interact withthe event queue (e.g., process.nextTick), there is no way to inspect the event queue itself.Hence, this hidden state of the event queue and the associated event data needs to be exposed inorder to restore the control-flow state of the program. The types of event handlers that need to becaptured can be broadly categorized into: (1) timer events, (2) network events, and (3) file systemevents. We do not consider the Document Object Model (DOM) and user input events as typicalIoT applications are running on server-side JS VMs, and not in a browser; therefore, they do nothave a DOM.(4) Migration Trigger. As JS is single-threaded, there is no easy way to interrupt the currentexecution (i.e., yield control mid-execution) at arbitrary points in time to perform a migration. Fur-thermore, the call stack is completely invisible to the JS layer and cannot be accessed or modifiedwithout access to the VM’s internals. Therefore, we need to come up with a mechanism to triggerthe migration at certain yield points in the execution of the program.(5) State Reconstruction. Assuming we have a way to capture the state of a program, restoringthe program from a serialized image poses yet another challenge. A snapshot is a static represen-tation of the dynamic, run-time state of a program. Given a static image, we have to restore thedynamic state of a program without the ability to directly manipulate the internal state of a VM –that is, statically through code generation. Restoring the dynamic state through code generation isnon-trivial, as we have to preserve the hierarchy of closures and the implicit references betweenobjects, and without re-executing code that can potentially lead to side effects.(6) Multiple Migrations. As a given program might be migrated arbitrarily many times, weneed to support multiple migrations. For this to work, the migration mechanism must ensure thatthe restored program is semantically equivalent – or at the very least observationally equivalent11– to the program before snapshot. If this condition is not met, then the program behaviour maydeviate over multiple hops of migration, which would be incorrect. To ensure that a restoredprogram is equivalent to the original program, the snapshot and restore procedures have to satisfythe following:Formal Definition. Let P be the set of all program configurations (dynamic state duringrun-time) and S be the set of all serialized process images. We define the snapshot functionsnapshot : P→ S and the restore function restore : S→ P. Then, given a program po ∈ P, we haveto implement snapshot and restore functions in such a way that: restore(snapshot(po))⇔ pr⇔ powhere pr is the restored program. If we cannot satisfy the relation p⇔ restore(snapshot(p)),the restored program may exhibit deviant and incorrect behaviour. In other words, snapshot andrestore should both be bijective functions and thus inverse of each other. The migration techniquemust implement the corresponding snapshot and restore procedures to make sure that a givenprogram can survive and remain functionally equivalent over multiple hops of migration.2.2 Motivating ExampleFigure 2.1 presents an example JS program that we use throughout this paper as a running ex-ample. In lines 1-21, there is a Function Declaration defining makeAccount, which returnsan object with 2 properties – balance and makeTransfer – each referencing an anonymousfunction. The 2 anonymous functions (line 5, 9) access the variable _balance declared in thelocal scope of makeAccount, and continue to have access to it after makeAccount has re-turned; thus, both functions are closures. The closure function balance in line 5 is used asboth getter and setter for the “private” variable _balance. It takes a single argument, adds itto the closed variable _balance if it is a number, and returns the new _balance. If the ar-gument is not a number, it simply returns _balance. The makeTransfer closure in line 9is slightly more complex. It creates a local variable _count and then returns another closurefunction installment (line 11) that captures it. The variable _count is used to keep track121 function makeAccount(name, initial){2 var _balance = initial || 0;3 return {4 name: name,5 balance: function(amount){6 if (typeof amount === ’number’) _balance += amount;7 return _balance;8 },9 makeTransfer: function(account, amount, repeat){10 var _count = 0;11 return function installment(){12 if (_count < repeat){13 _balance −= amount;14 account.balance(amount);15 _count ++;16 }17 console.log(name+’ $’ + _balance + ’, ’ + account.name + ’ $’+ account.balance());18 }19 }20 }21 }22 var alice = makeAccount(’Alice’, 100);23 var bob = makeAccount(’Bob’);24 var transferOut = alice.makeTransfer(bob, 20, 4);25 var transferIn = alice.makeTransfer(bob, −10, 6);26 setInterval(transferOut, 1000);27 setInterval(transferIn, 1500);Figure 2.1: Plain JS Code Exampleof how many times installment was called. installment is used to decrement a givenamount from the closed variable _balance and add the same amount to the given account- it can be invoked up to repeat number of times. In line 22, a new variable alice is declaredand initialized by invoking makeAccount, which returns an object containing the 2 closures(balance and makeTransfer). The hidden variable _balance is initialized to 100. Whilethe function makeAccount has returned, its local variable _balance is still “in-scope” becauseof the 2 closures referencing it. We simply refer to this hidden scope containing the _balanceas alice’s scope from now on. In line 23, a similar object is created and assigned to the vari-able bob, with a _balance of 0. In line 24, alice.makeTransfer is called to create aninstallment function, which can be called up to 4 times, each time transferring an amount of13Figure 2.2: JS Example Closures and Scopes20 from alice’s _balance to bob’s _balance. This installment closure is assigned tothe variable transferOut. Similarly in line 25, a different installment closure is createdand assigned to the variable transferIn. Finally in line 26 and 27, the native setIntervalAPI is used to schedule the transferOut and transferIn function to be called every 1second and 1.5 seconds respectively.Figure 2.2 visually illustrates the scopes of the various closures and their relationship. Thereare two independent copies of variable _count each defined in their own scope, but only onecopy of _balance, which is defined in the parent scope and is hence shared with the two childscopes.2.3 Formal DescriptionThingsMigrate captures the hierarchy of scopes, starting from the global scope, as well the dataelements (variables and functions) contained within each scope. In other words, ThingsMigratecaptures the structure and the values of the different state elements. More formally, we denote thestate of a JS application as S = 〈S,F,V,R〉, where S is the set of scopes, F is the set of functions,14V is the set of variables (i.e., a tuple of 〈name,value〉) and R is the set of relations between scopesand other entities (i.e., a tuple of 〈scope,entity〉).Taking the code snippet shown in Figure 2.1 as an example, and assuming that a snapshot of thestate is taken after 3250ms, there are 2 objects that need to be captured, each representing aliceand bob. Each object contains 2 closures referencing a closed variable _balance. The scopecontaining _balance and its closure functions should be included in the snapshot. Also in theglobal scope are 2 instances of the installment function, each enclosing a variable _count.The installment closures were returned by the makeTransfer function inside alice’sscope. The resulting state of the snapshot object S= 〈S,F,V,R〉 would respectively contain statesS, functions and their definition F (omitted for brevity), variables and their value V , and the set ofrelationships R between each variable/function and its associated scope:S ={σ0,σ1_alice,σ1_bob,σ3_t1,σ3_t2}F ={makeAccount,λ1_alice,λ2_alice,λ1_bob,λ2_bob,installment_t1,installment_t2}V ={(alice,<Object>),(bob,<Object>),(transferOut,installment_t1),(transferIn,installment_t2)},{(_balance_alice,60),(_balance_bob,40),(_count_t1,3),(_count_t2,2)},R ={(σ0,makeAccount),(σ0,alice),(σ0,bob),(σ0,transferOut),(σ0,transferIn)},{(σ1_alice,λ1_alice),(σ1_alice,λ2_alice),(σ1_alice,_balance_alice)},{(σ1_bob,λ1_bob),(σ1_bob,λ2_bob),(σ1_bob,_balance_bob)},{(σ3_t1,installment_t1),(σ3_t1,_count_t1),(σ3_t2,installment_t2),(σ3_t2,_count_t2)}Section 3.2.3 (Code Restoration) describes in more detail our algorithmic approach to gener-ating reconstruction code, and gives an example of the restored code of the same code sample(Figure 2.1) after migration. As can be observed, the same functions, scopes and variables, as wellas their relationships, are in the restored code sample (Figure 3.3).15Table 2.1: JS abstract syntax tree (AST) Node Reference TableAbbreviation Name State-changing ExampleIDNT Identifier × fooOBJEXPR Object Expression × { foo: 1 }FUNCEXPR Function Expression × function(){ }MEMBEXPR Membership Expression × foo.barUNAEXPR Unary Expression × !fooBINEXPR Binary Expression × foo > barLOGICEXPR Logical Expression × foo && barUPDTEXPR Update Expression © foo ++ASSGNEXPR Assignment Expression © foo = 1FUNCDECL Function Declaration © function foo(){ }VARDECL Variable Declaration © var foo = 1CALLEXPR Call Expression 4 foo()NEWEXPR New Expression 4 new Foo()RTNSTMT Return Statement 4 return foo×: Does not change state©: Explicitly updates state4: Implicitly updates state via scope creation/destructionFor ease of discussion, we briefly introduce some terminology relating to the JS AST. Table2.1 lists the AST nodes relevant to ThingsMigrate, and we will use the abbreviations in column 1 torefer to a certain type of expression. Column 3 indicates whether the given expression has an effecton the program state during run-time. Expressions like MEMBEXPR do not alter the state, so werefer to them as no-change expressions. On the other hand, expressions like ASSGNEXPR directlymodify the values of variables, and we refer to them as explicit-change expressions. CALLEXPRand NEWEXPR are more interesting, since they do not directly update any state variable. However,they create new scopes containing new variables and change the structure of the internal scopetree; we refer to these nodes as implicit-change expressions.2.4 System ModelTo describe the overall workflow and provide some context in which our migration techniquewould be used, we first present the underlying system architecture of ThingsMigrate, as illustrated16(1) Instrumentor instruments each JS program initially to make them migratable, (2) Schedulerdetermines the appropriate migration target, and the (3) Migrator coordinates the migration pro-cess by issuing commands over the Publish-Subscribe network. Blue lines refer to the migrationoperations.Figure 2.3: Schematic diagram of ThingsMigrate system architecturein Figure 2.3. The system is derived from the architecture of ThingsJS, a comprehensive IoTmiddleware that we presented as a vision paper in [21]. While ThingsJS proposes migration aspart of an integrated system, it does not specifically address migration challenges. We describethe relevant systems components below. More details about its implementation is provided in theappendix A.1.ThingsMigrate ManagerThe central piece of our architecture, namely the ThingsMigrate Manager, manages the executionof distributed IoT applications across the set of available devices. In our model, all communicationsbetween the components of the system use the topic-based Pub-Sub paradigm (MQTT)[16]. Pub-Sub decouples content producers (publishers) from content consumers (subscribers), providing ahigher-level of abstraction free from the low-level network considerations.Overall, the Manager has three components:(1) Scheduler. This component orchestrates the execution of IoT applictions across all de-vices. For the Scheduler to operate efficiently, developers are encouraged to modularize their IoTapplications into a set of Components, and to follow the best practices of JS (Section 2.5). Taking17into consideration the capabilities of each device, the requirements of the Components, and a setof developer-specified constraints, the Scheduler determines the assignment of each Componentonto a specific device. Upon changing conditions, the Scheduler can decide to dynamically mi-grate some of the Components between devices. In this paper, we assume that the Scheduler isresponsible for initiating a migration; the details of the Scheduler (e.g., scheduling algorithm, etc.)are outside the scope.(2) Instrumentor. This component is in charge of instrumenting the JS source code of the IoTapplications, which is eventually executed by the ThingsMigrate Runtime. The code instrumen-tation procedure is performed once before running an application on ThingsMigrate. Most of theapparatus needed to enable migration is bootstrapped at the code instrumentation stage, and wediscuss the work done by the Instrumentor in depth in the following sections – this is our maincontribution.(3) Migrator. The Migrator is in charge of transparently migrating the running applicationfrom a source device to the destination device. To migrate a given Component (e.g., Componentregulator1 on device 2), the migrator issues a migrate command to the ThingsMigrate Runtimerunning on the source device (Section 2.4), specifying the name of the Component to migrate. Thesource Runtime returns a snapshot back to the Migrator, which then issues a restore command tothe destination Runtime, passing the snapshot with the command.ThingsMigrate RuntimeThe ThingsMigrate Runtime is a thin JS middleware service that runs on each IoT device and man-ages the local execution of all the Components running on the device. It receives the instrumentedsource code of various Components from the Instrumentor and executes them, and awaits migra-tion commands from the Migrator over the Pub-Sub interface. Upon receiving a migrate commandfor a Component (e.g., for Component regulator1 on device 2), the Runtime first captures the stateof the running program through the ThingsMigrate API injected into the augmented program, then18serializes its state (Section 3.2.2) and sends it back to the Migrator over Pub-Sub. Alternatively,when a Runtime (e.g., destination device device1) receives the serialized snapshot of a Compo-nent, it restores the program by generating the appropriate restoration code (Section 3.2.3), whichresumes from the pre-migration state.2.5 AssumptionsWe make five assumptions as follows.ES5 Compliance. To ensure broadest compatibility, we assume that the JS code is compliantwith strict-mode ES5 (ECMAScript 2009) [32], which has been the de facto standard for manyyears. Although more recent versions of JS have been released and are now widely adopted (e.g.,ES6/ECMAScript 2015[33]), not all JS engines fully support the newest features of the language.Supporting programs that use language constructs from newer versions of the ECMAScript stan-dard is not a significant issue, as support for ES6+ can easily be provided by leveraging transpilers(e.g., Babel.js [13]) to convert to ES5.Asynchronous Programming Best Practices. Because JS is event-driven and single-threaded,we assume that developers will avoid blocking the main thread for long periods of time, as thiswould prevent the migration from being scheduled. Note that this assumption is not specific toThingsMigrate – in fact, a long-running operation that never yields control would inhibit any asyn-chronous event (e.g., timers, messages, I/O) from being dispatched. To use ThingsMigrate, de-velopers should follow the best practices and write their code in an event-driven manner, or breaklong-running operations to yield control (e.g., setImmediate) at periodic intervals.Use of Publish/Subscribe. We assume that applications use the MQTT.js Publish/Subscribe(Pub-Sub) API for network communication, instead of the built-in net or http module. SincePub-Sub is the de-facto communication interface in most IoT applications[35], we believe thisassumption is reasonable. That said, our technique is not specific to the Pub-Sub interface, and caneasily be adapted to other mechanisms.19Use of Local File System. We also assume that applications do not perform write operationsto the local file system. The migration of applications writing to local files is challenging. Forinstance, we have to answer the question of whether to migrate the file itself to the target device, sothat when the application resumes, it is writing to the same file. Alternatively if we decide not tomigrate the file, we have to address how to handle consistency of files distributed across differentdevices. In this thesis, we only focus on high-level process migration mechanism, and leave thesedecisions for future work (Section 3.2.4).Migration support in the VM. While migration support could be implemented in the VM,we believe that this is unlikely in the near future given the vast heterogeneity in IoT platforms.Unlike in the web browser space where there are only a handful of dominant players, the IoTlandscape is much more fragmented, with the availability of a wide variety of JS engines (e.g.,[17, 23, 55, 59]). Further, migration support would be required at both ends of the migrationprocess; i.e., at the source device/VM, and at the target device/VM, which may be different fromeach other. Some of the VMs may be closed-source and hence not easily modifiable. That beingsaid, should full or partial support for migration be provided in the VM (e.g., by enabling specialAPIs to access the state of closures), our technique could be adapted accordingly.2.6 SummaryIn this chapter, we provided a comprehensive description of the problem we aim to address in thisthesis, detailing its goals, challenges, and assumptions. Our goal is to enable the live migration of aJS process between Edge devices, in a platform-indepedent way without accessing the underlyingVM. More specifically, we seek an abstract model of a process, representing its logical state,which consists of its data state (heap), the control state (event queue), and the program logic (AST).However, it can be challenging to capture the high-level state of a process, as some states are hiddenfrom the application and are not directly accessible, such as closure scopes, object references, andthe event queue. In Chapter 3, we discuss how we address each of the challenges mentioned in this20chapter.In addition to the problem at hand, we also provided additional context to understand the role ofprocess migration within a system. In particular, we have described the architecture of ThingsJS,an edge computing framework, on which we deploy our migration technique. We further listed theassumptions we make in our approach, such as the use of ES5 (ECMAScript 2009) and the use ofPublish-Subscribe.21Chapter 3ApproachIn this chapter, we present the technical work done to address the question of enabling platform-independent process migration (RQ1). We first provide a high-level overview (Section 3.1) of thesteps taken to enable platform-independent process migration, and then discuss each step in depthin the following sections (Section 3.2 - 3.4).3.1 OverviewWe first present the overall workflow at a high-level, and then discuss the details of each step inthe following sections. The migration process consists of 3 steps:(1) Code Instrumentation: this step is performed prior to running a given user application,and only needs to be done once for each application. The main purpose of this step is to exposethe hidden states in the program by injecting certain ThingsMigrate objects into the appropriatelocations, which will have access to the hidden objects. Once the code has been instrumented,we can then access the internal states of the program through the injected ThingsMigrate objects.During this step, we also inject an event listener that listens for snapshot requests, providing aninterface to interact with the program. (Section 3.2.1)(2) State Extraction and Serialization: this step is triggered by an external entity (i.e., end-22user or ThingsMigrate Manager) through the event listener interface injected during the code in-strumentation step. The instrumented application provides access to all the objects in the program,but these objects are still in their native form inside the VM. To create a snapshot that can be trans-ported, we have to serialize the objects, the relations between them, and the hierarchy of the scopesin which different objects reside. We do this by recursively traversing the tree of ThingsMigrateScope objects (injected in step (1)), serializing each node along the way, and finally producing asnapshot that represents the tree-like logical state of the program. This snapshot is then transportedto another device where the next step happens (Section 3.2.2).(3) Code Reconstruction: given a platform-agnostic snapshot, we generate a new programthat is equivalent to the program at the time the snapshot was taken. The reason we have togenerate a new program is that we cannot directly control the VM to create the implicit closurestates. Instead, we restore the closure scopes by writing immediately invoked functions in the newlygenerated program. We reconstruct the program in such a way that the overall logical structure ofthe program is unchanged, and only the data and control flow states are updated; this is crucial forenabling subsequent migrations. The generated program is then executed as if it were a freshlyinstrumented program (Section 3.2.3).Note that steps (1) and (3) can occur anywhere, and not necessarily on the IoT device. In fact,the initial device (where the application is initially run) can perform all 3 steps and the restored codeitself can be transmitted to another device. However, for efficiency, we perform the instrumentationon a high-performance machine e.g., cloud instance. Code reconstruction is done by the targetdevice on which the application will be restored. This choice is made for pragmatic reasons.3.1.1 Development ExperienceDuring the course of this research, 3 different migration techniques were explored, each offeringa different lesson. The first technique, which we label TREECOPY, is our initial approach; it es-tablishes many of the key features such as the scope tree. We present the baseline workflow in our23discussion of TREECOPY in Section 3.2. The later approaches are modifications of TREECOPYand some of the workflow remains unchanged. In our second approach, XPLICTGC, we attempt toreduce the memory footprint, but we uncover additional challenges in memory management. Fi-nally, in our third technique, LAZYSNAP, we achieve improvements in both memory and executionoverhead. In the following sections, we present each of the steps3.2 Technique 1: “TREECOPY”3.2.1 Code InstrumentationIn the code instrumentation phase, the ThingsMigrate Runtime augments the input JS source file toallow the state to be dynamically captured (challenge 1), corresponding to the formal model definedin Section 2.3. Our code instrumentation approach is inspired by the work by Lo et. al. [41],but differs significantly as Lo et. al. [41] only offers limited support for capturing and restoringcomplex closures. In the example shown in Figure 2.1, Lo et. al. [41] would be able to captureand restore the scope of the two internal installment closures, but would not accurately modelthe relationship between said scopes in the restored output, so that two different instances of themakeAccount scope would be generated rather than one, ending with two distinct _balancevariables after restoration. Each nested scope would then update its own _balance variable,which would be inaccurate.The main aspects of our technique are illustrated in Algorithm 1. The high-level idea behindthis approach is to expose the internals of the logical data structure by injecting additional objects,so that the instrumented program maintains an explicit copy of the data used by the program duringrun-time. In order to fully capture the state of closures, the Instrumentor exposes the hiddenvariables by injecting Scope objects into every function, which will store the variables definedlocally in its scope, and then dynamically constructing a scope tree that mirrors the internal scopehierarchy. An abstract representation of the Scope object is presented in Figure 3.1.241 function instrument(code: String) : String2 ASTroot← codeToTree(code)3 ASTroot← instrumentNode(ASTroot)4 newCode← treeToCode(ASTroot)5 return wrapTemplate(newCode)6 end7 function instrumentNode(node: ASTNode, parentScope: LexicalScope) : void8 switch node.type do9 case FUNCDECL or FUNCEXPR do10 scope← new LexicalScope(parentScope)11 foreach childNode in node.body do12 injection← instrumentNode(childNode, scope)13 if injection then14 node.body.insertAfter(childNode, injection)15 end16 end17 firstLine← new ASTNode("var " + scope.name + " = new Scope(" + parentScope.name + ")")18 node.body.unshift(firstLine)19 injectAfter← new ASTNode(parentScope.name + ".addFunction(" + node.name + ")")20 return injectAfter21 case VARDECL do22 parentScope.addVariable(node.name, node.value)23 injectAfter← new ASTNode(parentScope.name + ".vars." + node.name + " = " + node.name)24 return injectAfter25 case ASSGNEXPR do26 varScope← parentScope.findVariableScope(node.name)27 injectAfter← new ASTNode(varScope.name + ".vars." + node.name + " = " + node.name)28 return injectAfter29 otherwise do/* If node has child nodes, call instrumentNode recursively.*//* Otherwise return. */30 end31 end32 end33 function wrapTemplate(code: String) : String34 return "require(’things-js’).bootstrap(function(){" + code + "})"35 endAlgorithm 1: Code Instrumentation Algorithm (V1)251 Scope {2 creator: Function // the function that created this scope3 params: [ (name, value) ] // tuples of parameters the creator was invoked with4 uid: String // unique id of the scope5 parent: Scope // the parent scope6 vars: [ (name, value), ... ] // local variables defined in the scope7 funcs: [ Function, ... ] // functions created in this scope8 children: [ Scope, ... ] // child scopes9 }Figure 3.1: Scope Object DefinitionLines 1-6 in Algorithm 1 describe the user-facing API for performing code instrumentation.Note that an end-user does not need to explicitly call this function, as the ThingsMigrate Run-time automatically invokes it before running the program given by the user. The instrumentfunction accepts the code string as its only argument, and returns the code string of the instru-mented program. The first step in instrumentation is to construct an Abstract Syntax Tree (AST).We then pass the root node of the AST to the function instrumentNode, which will traversethe AST recursively and update each node as needed. Lines 7-32 illustrate the important aspectsof the instrumentNode procedure, omitting some of the minor details for clarity. The kindof operation performed on a node depends on the type of the AST node. If the node currentlybeing processed is a function (i.e., FUNCDECL or FUNCEXPR), we need to inject a Scope objectbecause a function invocation will implicitly create a new scope, which we need to expose. Forexample, when the makeAccount function in Figure 2.1 is called, a hidden scope will enclosethe variable _balance for that very invocation of the function. In line 10, we first create a newLexicalScope object in order to keep track of the variables and functions being defined in itsscope. It should be noted that the LexicalScope object only tracks the lexical informationduring instrumentation and is different from the Scope object in Figure 3.1, which represents thedynamic scope of a function during run-time. After instantiating a LexicalScope object, thealgorithm proceeds to instrument the function body. We iterate through the nodes in the functionbody (lines 11-16) and invoke instrumentNode on each AST node recursively, some of which26return an injection. The injection is a piece of code that is placed immediately after the expressionthat was instrumented (line 14). After the body of the function is processed, we inject a line of codethat instantiates the appropriate Scope object at the beginning of the function being instrumented(lines 17-18). Finally, since the function being instrumented is itself an object belonging to theparent scope, we create a line of code that will add the function to its parent scope (line 19), andthen return it (line 20) as an injection.Instrumenting a VARDECL node is more straightforward (lines 21-24). We first register thevariable with the LexicalScope object (line 22) so that any other nested nodes referencingthe variable can find its enclosing scope. We then inject a line of code that will update the vari-able’s value in the corresponding Scope object (lines 23-24). When processing an ASSGNEXPR,we first look up the lexical scope in which the variable was declared by traversing the chain ofLexicalScope objects until the variable is found (line 26). Subsequently a line of code is in-jected that will update the variable’s value in the corresponding Scope object. There are severalother types of JS statements that are traversed, but we skip the operational details as they are im-material to the instrumentation technique. Once the instrumentNode called on the root nodereturns, the AST represents the instrumented version of the program (line 3). We convert the ASTback to code (line 4), and then finally return it to the user-space after wrapping the entire applica-tion code in a ThingsMigrate template code (line 5). The template is a small snippet of glue codethat performs bootstrapping work such as importing the ThingsMigrate objects (e.g. Scope) andconnecting to the Pub-Sub service to listen for incoming snapshot commands.Figure 3.2 depicts the instrumented version of the original source code shown in Figure 2.1.The lines of code that are added during the code instrumentation process are shown in bold. As canbe observed, all defined scopes (the global scope, then the scopes corresponding to each functiondefinition) are mirrored through an instance of a ThingsMigrate Scope object (lines 1, 3, 9, 17,and 21). Furthermore, each variable is copied into the Scope at which it is defined (lines 5, 12,19, 24, 27, 36, 38, 40, and 42). Similarly, functions are also registered (lines 8, 16, 20, and 34). To27make it easy to identify the anonymous functions, the Instrumentor simply assigns a unique nameto each anonymous function (lines 8, 16).Instrumenting Timers. Following a similar algorithmic approach as Lo et. al. [41], ThingsMi-grate provides support for saving the state of timer functions, namely setInterval and setTimeout(challenge 3). This is accomplished in the instrumentation phase by replacing the native timer callswith the ThingsMigrate timer API (lines 43, 44 in Figure 3.2), which expose the state of the timerssuch as remaining time until the next invocation of the callback. Consequently, the restored timersresume from the state it was in when serialization took place. For instance, if we took a snapshotof our example program after 3250ms, then upon restoration, the first timer (line 43) will triggerits callback after 750ms and then every second, while the second timer (line 44) will trigger after1250ms and then every 1500ms.Pub-Sub Interfaces. ThingsMigrate provides support for capturing the state of Pub-Sub inter-faces (challenge 3), following our assumptions (Section 2.5) about the usage of Pub-Sub in IoT.Similar to how we handle timers, ThingsMigrate wraps calls to the Pub-Sub interface (MQTTlibrary) at the instrumentation phase, so that upon a migration being requested, the list of eachtopic previously subscribed by the application gets serialized as part of the snapshot. Then, at therestoration phase, prior to resuming the execution, a subscription is transparently reestablished toeach of the previously subscribed topics. To ensure that no publications are lost during the mi-gration, we assume that reliable Pub-Sub is provided by the service [10, 70], so that the latter canretransmit any missed publication sent during the migration.Classes and Prototypes. The JS language does not support classes per se unlike object-oriented languages (e.g., Java). Instead, it provides high-level abstractions that emulate classesby means of prototypal inheritance [14]. ThingsMigrate provides support for serializing JS-likeclasses by serializing each object’s prototype object, so that upon restoring the code, the correctprototype chain can be recreated along with the objects.Cleaning Orphaned Scopes. During the life cycle of a JavaScript application, scopes are281 var σ0 = new Scope();2 function makeAccount(name, initial){3 var σ1 = new Scope(σ0);4 var _balance = initial || 0;5 σ1.vars._balance = _balance;6 return {7 name: name,8 balance: σ1.addFunction(function λ1(amount){9 var σ2 = new Scope(σ1);10 if (typeof amount === ’number’){11 _balance += amount;12 σ1.vars._balance = _balance;13 }14 return _balance;15 }),16 makeTransfer: σ1.addFunction(function λ2(account, amount, repeat){17 var σ3 = new Scope(σ1);18 var _count = 0;19 σ3.vars._count = _count;20 return σ3.addFunction(function installment(){21 var σ4 = new Scope(σ3);22 if (_count < repeat){23 _balance −= amount;24 σ1.vars._balance = _balance;25 account.balance(amount);26 _count ++;27 σ3.vars._count = _count;28 }29 console.log(name+’ $’ + _balance + ’, ’ + account.name + ’ $’+ account.balance());30 })31 })32 }33 }34 σ0.addFunction(makeAccount);35 var alice = makeAccount(’Alice’, 100);36 σ0.vars.alice = alice;37 var bob = makeAccount(’Bob’);38 σ0.vars.bob = bob;39 var transferOut = alice.makeTransfer(bob, 20, 4);40 σ0.vars.transferOut = transferOut;41 var transferIn = alice.makeTransfer(bob, −10, 6);42 σ0.vars.transferIn = transferIn;43 σ0.setInterval(transferOut, 1000);44 σ0.setInterval(transferIn, 1500);Figure 3.2: JS Code Example - instrumented with V129dynamically created, and can sometimes become orphaned. Orphaned scopes are scopes for whichthere are no single remaining references that point to either them or one of their child scopes. Inthe example shown in Figure 2.1, at each timer iteration (line 26), the scope that is created on thefly by the transferOut function (first argument) becomes orphaned and is therefore destroyed,as its serialization will not be required. Therefore, we need to manually destroy the scope objectscorresponding to orphaned scopes, as they can lead to memory leaks otherwise – this problem isexacerbated on multiple migrations (challenge 6).As a novel contribution, ThingsMigrate provides support for automatically destroying orphanedscopes, to support multiple migrations (challenge 6) on the same application without increas-ing the snapshot size and incurring additional overhead in the restored code (i.e., scope explo-sion). In the instrumentation phase, the argument of a RTNSTMT is wrapped in an API call toScope.maybeDestroy (line 16 of Algorithm 1), for the current Scope object. At executiontime, this function will check whether any other Scope or variable depend on this Scope. Ifthere are no dependent objects, then the scope is destroyed, and therefore it will not be included inthe snapshot (Section 3.2.2).3.2.2 Snapshotting and MigratingTo trigger a migration, the component that is being executed receives a migrate command from theMigrator Service through the Pub-Sub interface. Recall that the code instrumentation phase setsup a listener, which initiates the migration (Section 3.2.1).Serializing the State. The migration process first involves serializing the state to the JSONformat. To do so, the scope tree is recursively walked in a top-down approach, from the globalscope. The serialized output includes, for each scope, the variables and parameters, as well asnested scopes and functions. In JS, functions cannot be serialized as-is. Thus, upon encounteringa function when walking the scope tree, the function is assigned a unique ID, and the function’ssource code is added to a table of functions, which is appended at the end of the serialized state.30Note that the serialized output also contains the state for special objects that ThingsMigrate ad-dresses, such as timers and Pub-Sub interfaces.Handling the Stack. We address the challenge of handling the stack (challenge 4) by exploit-ing the asynchronous, event-driven nature of JS. Because JS applications are single-threaded andare event-based, the runtime maintains an event queue. We schedule code migrations as events sothat they get pushed at the end of the event queue and get executed over an empty stack. More pre-cisely, as migration requests are sent through the form of Pub-Sub publications, they are treated asevents and pushed to the event queue. We could also accomplish the same behavior by schedulingthe migration as a timer-based event.Sending the Serialized State. Once the snapshot is generated, it is sent over the Pub-Sub in-terface to the target IoT node, which will regenerate the code considering the state of the snapshot,and resume execution.3.2.3 Code RestorationUpon a given IoT node receiving a snapshot, it needs to reconstruct the original program at theexact state where migration took place (challenge 5). The code restoration process must retainthe original program structure, while reassigning the values of logical constructs holding state,such as variables, parameters and closures, without directly restoring the memory regions - this isimportant for platform independence and portability.Reconstructing Closures and Scopes. As in the code instrumentation phase, closures poseunique challenges when it comes to generating restoration code, as they wrap state elements. Be-cause functions can have return values as functions in JS (e.g., as seen in line 11 in Figure 2.1),there can exist multiple instances of the same function, each with its own scope and maintain-ing different states (i.e., holding different values in closed variables). The code restoration processneeds to instantiate multiple copies of the same function while preserving the hierarchy of the asso-ciated scopes, since closed variable states can not only be held in its enclosing scope but also any-31where in its ancestor scopes. For instance, in Figure 2.1, 2 instances of the installment func-tion are created (i.e., transferOut and transferIn), each capturing the variable _count inits own parent scope created by 2 separate invocations of the makeTransfer function. These 2scopes created by the makeTransfer function capture yet another variable _balance declaredin a common parent scope corresponding to the first invocation of the makeAccount function(bound to the variable alice), but not the second invocation of the same function (bound to thevariable bob). Thus, the reconstructed program should restore accordingly multiple instancesof the installment function and their enclosing Scope objects, multiple instances of themakeTransfer function and their Scopes, and the relationship between the various Scopes.Code Generation Algorithm. A simplified version of the code generation algorithm is shownin Algorithm 2. The restore function in line 1 is the user-facing function taking in a singleargument snapshot and returning the generated code string. At a high-level, the program statecaptured in the snapshot have 2 parts: the scope tree representing the organization of the program’sdata, and the queue of timer events representing the program’s control-flow state. Program restora-tion involves generating code to reconstruct the data state (line 2) and the control-flow state (line3).In a nutshell, the generateCode traverses the serialized representation of the scope tree andrecursively generates code string, starting from the root (global) scope. Given a scope, the firstline of the code generated is the code to initialize the Scope itself (line 7). In contrast to thecode injected during the instrumentation, this restored Scope receives an extra uid argumentto re-assign the same instance ID it had when the snapshot was taken. This ensures that otherobjects dependent on the scope can correctly retrieve the same scope instances. Next, the functions(closures) defined in the scope are injected, again wrapped with the Scope.addFunction call(lines 8-10). Then the local variables of the scope are injected, each initialized with the same valueit had at the time of snapshot (lines 12-19). For some variables during this step, it might happen thatthe scopes they depend on have not been reconstructed yet. An example would be a variable storing321 function restore(snapshot: JSON) : String2 dataCode← generateCode(snapshot.root)3 timerCode← generateTimers(snapshot.timers)4 return wrapTemplate(dataCode + timerCode)5 end6 function generateCode(scope: JSON) : String7 code← "var " + scope.name + " = new Scope(" + scope.parent + ", ’" + scope.uid + "’);"8 foreach func in scope.funcs do9 code← code + scope.name + ".addFunction(" + func.toString() + ");"10 end11 stage2← []12 foreach variable in scope.vars do13 if hasDependentScope(variable) then14 stage2.push(variable)15 else16 code← code + "var " + variable.name + " = " + variable.value + ";"17 code← code + scope.name + "." + variable.name + " = " + variable.name + ";"18 end19 end20 foreach childScope in scope.children do21 code← code + generateCode(childScope)22 end23 foreach variable in stage2 do24 code← code + "var " + variable.name + " = " + variable.value + ";"25 code← code + scope.name + "." + variable.name + " = " + variable.name + ";"26 end27 return "(function (" + names(scope.params) + "){" + code + "})(" + values(scope.params) + ")"28 end29 function generateTimers(timers: Iterable) : String/* skipped for brevity */30 endAlgorithm 2: Code Restoration Algorithma closure function, which was originally created by one of the child scopes. Without generatingthe child scope first, the closure function cannot exist and therefore cannot be bound to the saidvariable. Our approach in TREECOPY addresses these situations by pushing such variables in aqueue, to process them at a later step (i.e., after the generation of the child scopes - lines 20-22).After the first pass through the variables, generateCode function is called recursively on allchild scopes. The last step is to restore the variables that were deferred in the first step (lines 23-26). Finally, before the code generated so far is returned, the entire code is wrapped as a functioninvocation, with the same function signature as the original function that created the scope and33with the arguments it was invoked with (line 27). This step ensures that the closures created in thescopes are referencing the correct arguments. When the recursive generateCode call finallyreturns from its invocation on the root scope, the returned code string can be called to recreate thescope hierarchy.We skip the details of generateTimers as they are quite trivial. For example, if a 1000mssetInterval timer has 750ms left, we first generate a setTimeout call with 750ms and theninvoke the setInterval inside the function passed to setTimeout.Once the code for the scope tree and the timers are available, they are concatenated and thenwrapped with the ThingsMigrate bootstrapping code template in a similar way as in the instrumen-tation phase (line 4). As a result, the restored code contains the same set of injected ThingsMigrateobjects and can be migrated again in the same fashion.Code Restoration Example. Assume that a snapshot was taken after executing the code shownin Figure 2.1 for 3250 milliseconds. Figure 3.3 illustrates the restored code. Note that while thisexample has been derived from the output of a real invocation of the code restoration procedure ofThingsMigrate, some simplifications and adjustments were made for clarity. Also, the names of thevarious entities within this snippet (i.e., variables, functions, scopes), as well as their relationships,correspond to the state example shown in Section 2.3.First we see bob’s scope being created by an invocation of the makeAccount function withthe same arguments it was first called with (lines 4-10). The closed variable _balance is assignedthe value 40, which it had at the time of snapshot after 3250 ms of execution. Following thescope generation, the global variable bob is assigned an object with its properties referencingthe closure functions in bob’s scope that was just restored (lines 11-15). A different scope foralice is created, initializing the variable _balance with the value 60. In addition, there are 2child scopes created, each containing the closure installment for the 2 different invocationsof alice.makeTransfer. In the first installment scope (lines 23-28), the closed variable_count is initialized to 3 (transferOut was called 3 times) and in the second scope (lines341 var σ0 = new Scope();2 function makeAccount(name, initial){ /∗ makeAccount function body ∗/ };3 σ0.addFunction(makeAccount);4 (function makeAccount(name, initial){5 var σ1 = new Scope(σ0, ’bob’);6 σ1.addFunction(function λ1(amount){ /∗ balance (λ1) function body ∗/ });7 σ1.addFunction(function λ2(account, amount, repeat){ /∗ makeTransfer (λ2) function body ∗/})8 var _balance = 40;9 σ1.vars._balance = _balance;10 })(’Bob’);11 var bob = {12 name: ’Bob’,13 balance: σ0.getFunction(’bob.λ1’),14 makeTransfer: σ0.getFunction(’bob.λ2’)15 };16 σ0.vars.bob = bob;17 (function makeAccount(name, initial){18 var σ1 = new Scope(σ0, ’alice’);19 σ1.addFunction(function λ1(amount){ /∗ balance (λ1) function body ∗/ });20 σ1.addFunction(function λ2(account, amount, repeat){ /∗ makeTransfer (λ2) function body ∗/})21 var _balance = 60;22 σ1.vars._balance = _balance;23 (function λ2(account, amount, repeat){24 var σ3 = new Scope(σ1, ’t1’);25 σ3.addFunction(function installment(){ /∗ installment function body ∗/ });26 var _count = 3;27 σ3.vars._count = _count;28 })(bob, 20, 4);29 (function λ2(account, amount, repeat){30 var σ3 = new Scope(σ1, ’t2’);31 σ3.addFunction(function installment(){ /∗ installment function body ∗/ });32 var _count = 2;33 σ3.vars._count = _count;34 })(bob, −10, 6);35 })(’Alice’, 100);36 var alice = {37 name: ’Alice’,38 balance: σ0.getFunction(’alice.λ1’),39 makeTransfer: σ0.getFunction(’alice.λ2’)40 };41 σ0.vars.alice = alice;42 var transferOut = σ0.getFunction(’alice/t1.installment’);43 σ0.vars.transferOut = transferOut;44 var transferIn = σ0.getFunction(’alice/t2.installment’);45 σ0.vars.transferIn = transferIn;46 σ0.setTimeout(function(){47 transferOut();48 σ0.setInterval(transferOut, 1000);49 }, 725);50 σ0.setTimeout(function(){51 transferIn();52 σ0.setInterval(transferIn, 1500);53 }, 1250);Figure 3.3: JS Code Example - restored with V13529-34) it is initialized to 2 (transferIn was called 2 times). The global variable alice isinitialized in a similar manner to bob (lines 36-40). The closures bound to transferOut andtransferIn are retrieved from the reconstructed scope tree (lines 42, 44). Finally, the 2 timerstates are restored to resume execution.Multiple Migrations. ThingsMigrate supports transparent multiple migrations without intro-ducing additional overhead (challenge 6). This is accomplished at the code restoration phase bymaintaining a unique scope tree structure that is accessed by all the generated closures and scopes,and by re-injecting scope definitions (i.e., variables, parameters, nested functions, etc.) across theregenerated code, following an approach derived from Algorithm 1. Further, relevant Pub-Subcode is re-injected to support receiving migrate messages again. In other words, the output of thecode restoration phase is an alternate code segment equivalent to the output of the code instrumen-tation phase, which can hence support further migrations.3.2.4 LimitationsHandling External Libraries. ThingsMigrate does not yet provide full support for importedlibraries (i.e., the require statement). A simple solution would be to directly import the code inthe main JS module itself prior to instrumentation. This approach may be inefficient however, ifthere are multiple levels of nested library imports. Another solution would be for ThingsMigrateto provide a migration interface, and for module developers to implement the interface for eithera more optimized migration of the nested libraries, or for supporting libraries exposing native I/Oresources, such as file system access. Despite this limitation, we find that ThingsMigrate cansupport many third-party libraries as we show in Section 4.2.Scope Explosion. If programs make use of several levels of nested closures, then the resultingsnapshot and restored code can become quite large, due to the phenomenon of scope explosion, inwhich multiple scopes might have to be maintained. However, this problem is symptomatic of badprogramming practices and is not specific to ThingsMigrate, as the JS VM itself will have to retain36a large amount of scope structures in-memory.Redirecting I/O Operations. As mentioned in Section 2.5, ThingsMigrate assumes that allcommunications are done over the Pub-Sub interface. Further, in the current state, ThingsMigratedoes not support file I/O operations, which is non-trivial, as reads and writes must occur where thecorresponding files are located. For instance, assume there is a file on device A which is read byan application on the same device that gets migrated to device B. In order to guarantee consistentreads, we must guarantee (1) the availability of the file on B, or (2) to provide some redirectionmechanism.As JS I/O operations are typically handled through streams, we plan on transparently redirect-ing streams over the Pub-Sub interface (solution 2 above), by wrapping the base JS stream API(similar to wrapping timer-based or Pub-Sub-based APIs). A stream-level solution can support ar-bitrary stream-based I/O operations, such as files, network, and even HTTP requests. Upon deviceA receiving a migration request to migrate a given app to device B, the ThingsMigrate Runtimewill generate a unique ID for each currently active stream, and will setup a transparent forward-ing mechanism over a Pub-Sub bridge (i.e., by creating a topic corresponding to that ID that bothdevices A and B will subscribe to). Then, upon a read operation being requested by the app ondevice B, for a given stream, the request will be transparently forwarded by the Runtime to deviceA, which will perform the read and send back the results to the Runtime on B, who will deliverthem to the stream at the application layer. Likewise, any write operation will simply be forwardedfrom the Runtime on B to the Runtime on A.Nested Timers. Another limitation is the handling of some deeply nested timer-related calls(i.e., setTimeout, setImmediate). Should a snapshot command be received while a timeris in a pending state – i.e., before the callback function is invoked – then the timer gets cleared,the remaining time and the reference to the callback function are serialized, and migration hap-pens normally. However, should the snapshot command be received after the callback functionis invoked, then a race condition occurs between any asynchronous calls made inside the body of37the callback and the snapshot function. Race conditions are sometimes problematic in JS, as theordering of events is unpredictable [5, 43]. For instance, should the JS VM event loop process thesnapshot function before the asynchronous calls, then the resulting snapshot will not contain thescopes created by the asynchronous calls, producing an incorrect snapshot. Handling nested timerswould require that the snapshot function be delayed until all callbacks have been resolved, whichis a non-trivial problem. To address this, we can inject at the instrumentation phase, specific codeinto the function scope that will signal the function’s completion, which would allow us to detectthe resolution of nested asynchronous calls.3.3 Technique 2: “XPLICTGC”We successfully validated our approach with TREECOPY; however, the instrumented (migration-enabled) programs were consuming more than double the memory due to the duplication of state,and were significantly slower because of the additional work done in tracking the changes in pro-gram state. We came up with a new technique (XPLICTGC) to address the following optimiza-tion goals: (1) to reduce the memory footprint, (2) improve run-time performance, (3) reduce thesnapshot and code restoration time to minimize the overall migration latency, and (4) shorten theinstrumentation time.We focus our discussion of XPLICTGC on the code instrumentation phase, as the snapshot andrestoration steps remain mostly the same. In TREECOPY, duplicating the logical program stateincurs a significant overhead both in terms of memory usage and run-time speed. The instrumentedprogram consumes at least twice the amount of memory used by the original program due tokeeping an explicit copy of each variable. The extra code injected for updating the explicit stateconsumes additional CPU cycles doing bookkeeping work; the single-threaded nature of JS makesthis overhead very apparent.Keeping a single copy of the state. To address both of the above issues, the first idea was tofind a way to work with a single copy only. Since we need to be able to access every object (includ-38ing those inside closures) for capturing the program state, we decided to keep the explicit copy, andget rid of the original scope tree. This entails the following modifications to the instrumentationalgorithm introduced in TREECOPY:1. Remove all injected ASSGNEXPR that follow an explicit-change statement (refer to Table2.1).2. Replace all VARDECL with an ASSGNEXPR on the corresponding scope property. For exam-ple, var foo = "bar" declared in scope scope_0 becomes scope_0.vars.foo = "bar".3. In all statements in the scope, replace IDNT of the original variable with the MEMBEXPRof the corresponding scope property. For example, all references to foo are replaced withscope_0.vars.foo.Figure 3.4 shows the instrumented example program using XPLICTGC. The local variable_balance is an explicit property in the vars object of the corresponding Scope (σ1). Simi-larly, the variable _count is bound to σ3.vars. Additionally, all the references to _balanceand _count are replaced with the respective MEMBEXPR. As a result, the instrumented programresembles the original program. In a nutshell, we essentially remove all the free variables from theprogram and turn them into properties of the enclosing Scope object.Preserving the lexical semantics. Because we replace all the IDNT with the correspondingMEMBEXPR, more work needs to be done during the code instrumentation phase. Apart fromperforming an increased number of AST node modifications, the Instrumentor must also preservethe lexical semantics of JS. The first issue is that VARDECL and FUNCDECL are hoisted, whileASSGNEXPR are not. Because hoisted statements are processed before everything else within theexecution context even if they are placed at the end, we cannot naively change a VARDECL toan ASSGNEXPR without inspecting the order of statements first. Next, the lexical scope of eachIDNT needs to be tracked during the instrumentation, since we no longer rely on the native VM for391 var σ0 = new Scope();2 function makeAccount(name, initial){3 var σ1 = new Scope(σ0);4 σ1.vars._balance = initial || 0;5 return {6 name: name,7 balance: σ1.addFunction(function λ1(amount){8 var σ2 = new Scope(σ1);9 if (typeof amount === ’number’){10 σ1.vars._balance += amount;11 }12 return σ1.vars._balance;13 }),14 makeTransfer: σ1.addFunction(function λ2(account, amount, repeat){15 var σ3 = new Scope(σ1);16 σ3.vars._count = 0;17 return σ3.addFunction(function installment(){18 var σ4 = new Scope(σ3);19 if (σ3.vars._count < repeat){20 σ1.vars._balance −= amount;21 account.balance(amount);22 σ3.vars._count ++;23 }24 console.log(name+’ $’ + σ1.vars._balance + ’, ’ + account.name+ ’ $’ + account.balance());25 })26 })27 }28 }29 σ0.addFunction(makeAccount);30 σ0.vars.alice = σ0.vars.makeAccount(’Alice’, 100);31 σ0.vars.bob = σ0.vars.makeAccount(’Bob’);32 σ0.vars.transferOut = σ0.vars.alice.makeTransfer(σ0.vars.bob, 20, 4);33 σ0.vars.transferIn = σ0.vars.alice.makeTransfer(σ0.vars.bob, −10, 6);34 σ0.setInterval(σ0.vars.transferOut, 1000);35 σ0.setInterval(σ0.vars.transferIn, 1500);Figure 3.4: JS Code Example - instrumented with V2resolving the lexical bindings. For example, assuming there is an extra global variable _balancein the running example, the Instrumentor has to track whether a given identifier _balance isreferring to the variable in the global scope or to the local variable inside makeAccount’s scope.Handling Garbage Collection (GC). When we implemented the above approach, we foundthere was a noticeable degradation in terms of memory usage and execution speed. Since we haveattached all the state variables explicitly as properties of scope objects, all the variables created in40the program were reachable from the global scope. This meant that no variables were automaticallygarbage collected, even if they were no longer used by the program. To make objects availablefor garbage collection, XPLICTGC had to dereference variables that are no longer needed by theprogram. Determining which objects can be safely GC’ed is a difficult problem to solve duringrun-time, so we instead perform a Mark-and-Sweep periodically to collect the active scope objects.The scope objects that are not referenced by any of the objects currently in scope are dereferencedand made available for the native GC. To clarify, this Mark-and-Sweep performed by XPLICTGCis not an actual GC, but a “pre-GC” to make objects available for the native JS VM’s GC.Lesson learned. Roughly speaking, what we end up building in this approach is another JSVM written in JS. Instead of relying on the native VM (i.e., Node.js) to handle the lexical bindingof variables, the Instrumentor needs to resolve the binding. In the instrumented program, everyvariable is accessed via properties of scope objects, which is slower than accessing via direct ref-erences. When there are many transient objects, the Mark-and-Sweep is an expensive operation toperform periodically. In XPLICTGC, the performance gain due to lower memory usage is minimal,and the overall performance degrades.3.4 Technique 3: “LAZYSNAP”Although our approach in XPLICTGC was not successful, the experience of building the Mark-and-Sweep mechanism revealed an important insight: Mark-and-Sweep essentially produces asnapshot of the active program state. Another important insight – recurrent in various domainsof optimizations – is to track the state lazily. The end-goal in our migration problem is simplyto be able to capture the program state on-demand; actively mirroring the state is not necessary.Combining the two insights, we identified that a more efficient way to achieve this goal wouldbe to invoke Mark-and-Sweep lazily upon a snapshot command and trace the active scope tree tocapture the state (i.e., lazily tracking the state).Before we discuss further, we briefly describe the Mark-and-sweep GC mechanism used by41Node.js V8 engine. This GC algorithm is the de facto GC used in most modern JS engines. Thehigh-level intuition underlying the Mark-and-Sweep algorithm is to traverse the “links” betweenthe objects in the heap and carve out a graph of reachable objects, starting from the objects directlyaccessible from the root scope. The objects that are not included in the graph are unreachable bythe program and thus can be safely garbage collected. Figure 3 illustrates the algorithm[66].1 function markAndSweep(items: Set, marked: Set) : Set2 foreach item in items do3 if item ∈ marked then4 continue5 else6 marked← marked ∪ {item}7 if item /∈ LITERAL then8 markAndSweep(getScope(item), marked)9 markAndSweep(getProperties(item), marked)10 end11 end12 end13 return marked14 end15 reachable← markAndSweep(rootScope, /0)16 unreachable← heap ⊕ reachable17 foreach object in unreachable do18 delete object19 endAlgorithm 3: Mark-and-Sweep AlgorithmLazily capturing the original scope tree. As per our assumptions, we do not have access tothe JS VM and its GC, so we cannot leverage the native Mark-and-Sweep mechanism. Thus, thechallenge of accessing enclosed variables still remains, if we build our own Mark-and-Sweep. Toexpose the closed variables lazily, we inject a callback function in the variable’s lexical scope,which we can call later to access the variables. To reiterate, instead of injecting code that activelycopies the values of the variables, we inject a callback function, which returns the copy of thevariables when invoked. Figure 3.5 shows the example code transformed using LAZYSNAP.In this approach, the callback function resides in the same function body as the variables itis capturing. The lexical binding of all the variables remain unchanged from the original code,421 var σ0 = new Scope();2 σ0.extract = function(){3 return {4 makeAccount: makeAccount,5 alice: alice,6 bob: bob,7 transfer: transfer8 }9 };10 function makeAccount(name, initial){11 var σ1 = new Scope(σ0);12 σ1.extract = function(){13 return {14 _balance: _balance15 }16 };17 var _balance = initial || 0;18 return {19 name: name,20 balance: σ1.addFunction(function λ1(amount){21 if (typeof amount === ’number’){22 _balance += amount;23 }24 return _balance;25 }),26 makeTransfer: σ1.addFunction(function λ2(account, amount, repeat){27 var σ3 = new Scope(σ1);28 σ3.extract = function(){29 return {30 _count: _count31 }32 };33 var _count = 0;34 return σ3.addFunction(function installment(){35 if (_count < repeat){36 _balance −= amount;37 account.balance(amount);38 _count ++;39 }40 console.log(name+’ $’ + _balance + ’, ’ + account.name + ’ $’ + account.balance())41 })42 })43 }44 }45 var alice = makeAccount(100);46 var bob = makeAccount();47 var transferOut = alice.makeTransfer(bob, 20, 4);48 var transferIn = alice.makeTransfer(bob, −10, 6);49 σ0.setInterval(transferOut, 1000);50 σ0.setInterval(transferIn, 1500);Figure 3.5: JS Code Example - instrumented with V343and the callback function inherits the same lexical scope as the variables. Unless invoked, theinjected callback does not interfere with the normal execution of the program and thus introducesminimal run-time overhead. At any point during the execution, a Scope’s extract callbackcan be invoked to retrieve its enclosed variables. With this modification, we have avoided activelytracking the state, and are able to capture all the variables lazily on demand.Aligning the lexical scope of injected objects for natural GC. At this point, we have ef-fectively made the migration technique work lazily. However, the injected objects created duringrun-time still need to be explicitly dereferenced. The best strategy would be to avoid having todereference objects altogether, and let the native GC collect unused objects. To achieve this, thenext step is to organize the injected code in such a way that the life-cycle of the injected objects isaligned with the actual life-cycle of the enclosing function. That is, if a closure goes out of scopenaturally, its corresponding Scope object should too. Concretely, we update the definition of theScope object so that there is only a one-way reference between a parent and a child Scope. Un-like TREECOPYand XPLICTGC, a parent Scope in LAZYSNAP does not keep references to itschildren, whereas the child Scopes can reach the parent. This aligns with the operational seman-tics of JS – objects in the child scope can refer to variables declared in the parent scope, while theparent cannot access the closure variables in the child scope. In a similar fashion, the child Scopehas access to the parent Scope, but the parent Scope has no direct link to the child Scopes.After this modification, if a child Scope is not reachable from the root scope, they are naturallygarbage collected; ThingsMigrate does not need to manage the life-cycle of objects.On the other hand, without a direct reference from the root Scope, certain descendant Scopesneed to be reachable when capturing a snapshot; we make them reachable via objects currently inscope. A scope is needed only if it is associated with an object in the reachable program context.So on every non-literal object (e.g., objects, functions), we attach a “private” property and bind theassociated Scope object. Then at the time of snapshot, we can traverse the Scope tree startingfrom the root, using our adaptation of Mark-and-Sweep as shown in Algorithm 4.441 function capture(scope: Scope) : dict2 if scope.visited then3 return4 else5 scope.visited← True6 scope.image← { refs, children }7 refs← scope.extract()8 foreach name, object in refs do9 scope.image.refs[name]← serialize(object)10 captureObject(object, scope)11 end12 if scope.parent then13 capture(scope.parent)14 scope.parent.image.children[scope.id]← scope.image15 end16 scope.visited← False17 return scope.image18 end19 end20 function captureObject(object: Object, scope: Scope) : void21 if isNotLiteral(object) then22 if isFunction(object) and object._parentScope != scope then23 capture(object._parentScope)24 foreach property in object.prototype do25 captureObject(property, scope)26 end27 else if isObject(object) and isDefined(object._scope) then28 capture(object._scope)29 end30 foreach property in properties(object) do31 captureObject(property, scope)32 end33 end34 end35 snapshot← capture(rootScope)Algorithm 4: LAZYSNAP Snapshot AlgorithmFinally, we also skip instrumenting a function if we can statically determine that it only createstransient state (e.g., pure functions). That is, if a function does not create any objects and does notreturn an object other than literals, we can safely assume that it does not create any closures andtherefore need not be tracked. Inside such a function, we do not even inject the Scope object, andleave the entire function intact.453.5 ImplementationThingsMigrate is implemented in the form of a JS library that can be included by the application.Its implementation is built over the ThingsJS system, and is part of ThingsJS [21]1. It providesAPIs that can be invoked to perform code instrumentation, snapshotting and code restoration.From a higher-level perspective, our implementation also provides an execution environmentthat replicates the architecture shown in Figure 2.3. More specifically, it provides a runtime envi-ronment that can be run on IoT devices supporting an appropriate VM (e.g., Node.js on RaspberryPIs Models 3 and 0), as well as a Manager component, which is used to transparently instrumentJS programs, launch them on specific IoT nodes (decided by a scheduler), monitor them, and trig-ger a serialization/migration. Internally, our implementation uses the popular esprima library[29] to parse JS code into an AST, and the escodegen [56] to convert back an AST into JS code.We provide more details about the implementation of ThingsMigrate and ThingsJS in the appendixA.1.3.6 SummaryIn this chapter, we discussed in detail how we enable the live-migration of a JS process in aplatform-independent manner, and reported on our experience of developing the technique to pro-vide any insights gained and lessons learned. Our migration technique has 3 stages: 1) codeinstrumentation, 2) snapshotting and migrating, and 3) code reconstruction. During code instru-mentation, we inject ThingsMigrate objects into the user program without altering the programbehaviour, in order to be able to extract the hidden states of the program at a later point in time.Upon issuing an external trigger via the network, a process produces a platform-agnostic JSONsnapshot of its current state. From the JSON snapshot, new code is generated in order to restorethe state of a process and resume its execution.1http://www.github.com/DependableSystemsLab/ThingsJS46Throughout our research, we have developed 3 versions of the technique, each revealing a keyinsight that led to a change in design. In TREECOPY, the first version, we construct a scope tree toactively track the structure and values of the objects used by the process. The first version incurs ahigh overhead, as we essentially keep a “shadow” copy of the heap. In order to reduce the memoryoverhead, we developed XPLICTGC, which tries to use a single scope tree as the source of truth.This version produced unsatisfactory results and led to the decision of using the Mark-and-Sweepalgorithm in LAZYSNAP. In our final attempt, we inject callback functions into each functionscope, which we later call to lazily trace out the scope tree using Mark-and-Sweep. LAZYSNAPproduced practical performance when applied to real benchmark applications.47Chapter 4EvaluationIn this chapter, we present the empirical work done to evaluate our migration technique, addressingRQ2. We first discuss the set of experiments carried out using standard JS benchmarks, in order toassess the performance overhead of our technique (Section 4.1). Additionally, we perform a casestudy on a realistic application to demonstrate the practicality of our migration technique (Section4.2).4.1 Experimental ValidationWe perform 4 sets of experiments to evaluate ThingsMigrate. Experiment 1 (Section 4.1.2) mea-sures the performance of our code instrumentation algorithm against a set of benchmarks. Experi-ment 2 (Section 4.1.3) measures the run-time performance overhead of ThingsMigrate in terms ofexecution time and memory usage. Experiment 3 (Section 4.1.4) measures the migration latencycomprising the snapshotting and code generation time. Finally, Experiment 4 (Section 4.1.5) eval-uates the multi-hop migration capabilities of ThingsMigrate by migrating a benchmark applicationseveral times across different devices.484.1.1 Experimental SetupThingsMigrate provides JS migration between IoT devices, and between devices and the cloud. Toemulate different scenarios, we ran our experiments on two IoT platforms, namely a Raspberry Pimodel 3B (quad-core 1.2 Ghz ARM7, 1 GB memory), and a Raspberry Pi model 0W (single-core1 Ghz ARM6, 512 MB memory), both running the Raspbian Jessie operating system (a DebianLinux variant). We also included a cloud server (Xeon E3-1220 v3, quad-core 3.10Ghz, 32 GBmemory). All nodes were running the Node.js VM version 8, which is ES5 compliant. While wedid not test other VMs due to stability issues or due to incompliance with the ES5 standard, eachof the Node.js VMs we used were compiled for different architectures (armv7, armv6, and x86-64respectively).Despite an extensive search, we did not find publicly-available sets of IoT-specific JS bench-marks to evaluate our system. Prior work ([54]) has built their own IoT-specific JS benchmarks1.We followed a similar approach and built two IoT-specific benchmarks: (1) a factorial application,which computes the factorial of a very large number and uses closures to store the computed digits(i.e., in a very large expanding array), and (2), a regulator application, which models an IoT edgecomponent which receives temperature measurement data from different sensors2 over a Pub-Subinterface, keeps the previous n values for m sensors, and periodically computes an optimal poweradjustment to be sent to an actuator. factorial models a CPU and memory-intensive applica-tion of a finite duration (experiment 2), while regulator models a less intensive (i.e., low CPUand memory usage) application that runs for a long time. Both applications are stateful, and needto preserve state across migrations. Note that the memory usage of the regulator is similar tothe memory usage of typical IoT-specific benchmarks [54].In addition, for experiments 1 and 2, we also used some benchmarks from the ChromiumOctane [24] suite, which were originally designed to stress-test the performance of the V8 JS1The source code is not publicly available, and hence we cannot use them.2We fed the application with pre-determined values, as the computed result itself is not part of the experiment.49(a) Time Taken (Lower is better) (b) Code Size Increase (Lower is better)(with a confidence interval of 95%)Figure 4.1: Code Instrumentation Resultsengine in the Chrome web browser. They run synchronously and hence are not representative ofIoT applications, but we nevertheless use them to assess the universality of our framework, and forperformance testing of ThingsMigrate under intense workload.4.1.2 Experiment 1: Code InstrumentationIn this experiment, we consider all the benchmark programs from the Chromium Octane suitethat do not depend on a web browser (i.e., accessing the DOM or any other in-browser object),as ThingsMigrate migrates IoT applications (i.e., server-side) rather than in-browser applications.We measure the time it takes to instrument the code for these benchmarks3, as well as for ourfactorial and regulator applications. In addition, we compare the size of the instrumentedcode against the size of the uninstrumented (raw) code. The set of benchmarks and the set of resultsof instrumentation using all 3 approaches (TREECOPY, XPLICTGC, LAZYSNAP) are shown inTable 4.1 and Table 4.2.Figure 4.1a shows the time taken to instrument each benchmark, normalized to the time takenby TREECOPY. The main takeaway of this plot is the significant improvement in the performanceof the instrumentation algorithm. As described in Sections 3.3 and 3.4, we inject less code in3Measurements were taken on our cloud server.50(a) Time Taken (Lower is better) (b) Code Size Increase (Lower is better)Figure 4.2: Code Instrumentation Results versus Code SizeXPLICTGC and LAZYSNAP compared to TREECOPY where we inject a follow-up expression forevery ASSGNEXPR, which makes it much faster.Figure 4.2a shows a lin-log (i.e., log linear) plot of intrumentation time against code sizefor all 3 approaches. The instrumentation time for TREECOPY is noticeably larger than thosefor XPLICTGC and LAZYSNAP. Approaches XPLICTGC and LAZYSNAP yield similar results,with LAZYSNAP performing marginally better. We also confirm that the instrumentation timegrows linearly with the code size – i.e., they fit the curve of the form f (x) = ax+ b (the plot forTREECOPY is approximated by the curve f (x)= 2.055×10−3x+56.998, XPLICTGC by the curvef (x) = 0.355× 10−3x+ 0.327, and LAZYSNAP by the curve f (x) = 0.259× 10−3 + 0.404). Us-ing the functions, we can estimate that instrumenting a program of 1MB would take about 263msusing TREECOPY, and about 26ms using LAZYSNAP, which is an order of magnitude faster. Notethat even in our slowest approach (TREECOPY), the instrumentation is performed quite quickly(in under 1 second) for the benchmark applications. Further, code instrumentation is a one-timeprocess for any given program and can always be performed on a machine with higher computecapacity, or can be directly integrated as part of the build process, similar to code minification.The size of the instrumented code relative to the original is shown in figure 4.1b. In TREECOPY,51the median code size increase is around 700%, while in XPLICTGC and LAZYSNAP the mediancode size is around 200%. In Figure 4.2b we provide a lin-log plot of the size increase as a functionof the original code size.4.1.3 Experiment 2: Run-time Performance OverheadIn this experiment, we analyze the performance impact of ThingsMigrate over a set of highlyresource-intensive benchmarks. The goal of this experiment is to model the execution of a resource-intensive task of a finite duration (i.e, eventually returns a result) that would be executed overdifferent IoT devices and the cloud server. For evaluating the 3 techniques, we selected bench-marks navier-stokes and splay from the Octane suite, as they respectively model extremeconditions in terms of CPU usage and memory utilization. Further, we were successful in run-ning these benchmarks on all test devices and across all 3 versions of ThingsMigrate, unlike mostother benchmarks in the suite (even without our instrumentation, most of the benchmarks in theOctane suite were unable to run on the Rapsberry Pi 0 due to its limited capabilities). We alsoused our factorial application. For the LAZYSNAP technique, we were able to additionallyrun richards and raytrace from the Octane suite, as LAZYSNAP addresses the limitationsof the previous versions (e.g., multiple levels of prototypal inheritance).For each benchmark, we measure and compare the time to complete its execution. On our 3target devices (Raspberry Pi 3, Pi 0 and our cloud server), we run (A) the non-instrumented (raw)code, (B) the instrumented code, and (C) the restored code after migration. As the restored code (C)only runs the second half of the program (the snapshot is taken at the mid-point of the execution),we also only consider the second half for (A) and (B) for fairness. In addition, we measure theaverage memory usage of each of the runs. We perform this experiment for all 3 approaches, andcompare the run-time and memory overhead incurred by each of the approaches.Execution Time. Results for the execution time of selected benchmarks running on the cloud(Xeon), Pi3, and Pi0 are shown in Figure 4.3a, 4.3b, and 4.3c respectively. The execution times of52Benchmark Code Time spent AST Node Distribution % of State-changingSize (kb) on GC NO-CHANGE EXPLICIT-CHANGE IMPLICIT-CHANGE EXPRESSIONSnavier-stokes 9.985 0.0% 1428 304 81 21.2%splay 6.573 13.2% 785 142 82 22.2%deltablue 1.5452 1916 325 218 22.1%crypto 39.028 0.1% 6558 1002 592 19.6%box2d 357.169 1.3% 57288 7049 2534 14.3%earley-boyer 159.794 17.0% 11201 1679 2901 29.0%raytrace 24.998 5.6% 3009 348 232 16.2%richards 8.302 0.5% 1122 212 99 21.7%typescript 2.138 297 71 39 27.0%factorial 0.952 0.0% 122 25 16 25.2%regulator 1.855 1.7% 155 46 27 32.0%Table 4.1: BenchmarksBenchmarkTREECOPY XPLICTGC LAZYSNAPSize Time Size Time Time Taken Size Time Time TakenIncrease Taken (ms) Increase Taken (ms) (as % of TREECOPY) Increase Taken (ms) (as % of TREECOPY)navier-stokes 1220% 135.67±4.5 300% 7.4±0.7 5.4% 250% 4.8±0.6 3.5%splay 700% 86.18±2.7 220% 2.8±0.1 3.2% 210% 2.0±0.1 2.4%deltablue 7480% 120.65±0.6 220% 6.8±0.3 5.6% 210% 4.7±0.1 3.9%crypto 710% 194.05±1.6 260% 22.4±0.5 11.6% 230% 14.6±0.2 7.5%box2d 780%x 821.95±3.0 190% 133.8±1.7 16.3% 160% 95.8±1.3 11.7%earley-boyer 360% 301.9±0.8 150% 40.0±0.7 13.2% 150% 34.8±0.6 11.5%raytrace 130% 64.2±0.5 160% 7.4±0.2 11.5% 140% 5.5±0.1 8.5%richards 720% 87.4±0.5 210% 3.1±0.1 3.5% 210% 2.5±0.1 2.8%typescript 490% 31.75±0.2 250% 1.0±0.1 3.1% 290% 0.9±0.1 3.0%factorial 580% 28.21±0.2 200% 0.5±0.1 1.6% 210% 0.4±0.1 1.5%regulator 840% 42.1±0.4 220% 0.9±0.1 2.1% 240% 0.8±0.1 2.0%Average 1270% 220% 7.0% 210% 5.3%(confidence interval of 95%)Table 4.2: Code Instrumentation Results53(a) cloud Server (Xeon) (b) Raspberry Pi 3 (c) Raspberry Pi 0Margins of errors were below 1.5% for most of our results, and up to 6% for some of our resultson the Pi 0, for a confidence interval of 95%.Figure 4.3: Normalized Execution Time(a) cloud Server (Xeon) (b) Raspberry Pi 3 (c) Raspberry Pi 0Margins of errors are not shown, as the results show the averaged memory usage for all runs,averaged over the duration of the experiment.Figure 4.4: Memory Usage (mb)the raw version of the benchmarks, represented by the blue bar, are normalized to 1. The executiontime of the instrumented benchmarks using approaches TREECOPY, XPLICTGC, and LAZYSNAPare shown by the red, yellow, and green bars respectively.The results for the factorial program clearly illustrates the improvement over the 3 ver-sions of ThingsMigrate. When using TREECOPY across all 3 devices, factorial slowed downby a factor of about 52% (averaged across 3 devices). XPLICTGC observes only a marginal im-provement. However, in LAZYSNAP the average overhead is reduced to about 27%, where the54overhead on the cloud is 23%.Running navier-stokes exacerbates the flaws of XPLICTGC, as it performs worse thanTREECOPY across all 3 platforms. The navier-stokes benchmark mostly calls functions thatperform arithmetic operations over several variables and over elements in a large array. Running aJS profiler shows 90.3% of CPU time is spent on project(u, v, p, div), vel_step(u,v, u0, v0, dt), advect(b, d, d0, u, v, dt) functions alone, which are all suchfunctions. In XPLICTGC, all the variables are converted to explicit properties of Scope objects– i.e., IDNT are converted to MEMBEXPRs – and hence it takes longer to access and update eachof the state variables. Furthermore, XPLICTGC’s ad-hoc GC mechanism needs to work diligentlyto explicitly dereference variables that have gone out-of scope. The execution time overhead ofnavier-stokes is brought down to about 3% in LAZYSNAP, as the improved instrumentationalgorithm is able to identify and skip over the functions that do not need to be captured. As a result,only a few functions that generate closure state are instrumented, with the rest of the program beinguntouched.We plot the splay benchmark separately, because it incurred significantly larger overhead inTREECOPY and XPLICTGC. The splay benchmark creates a splay tree spanning thousands ofnodes and rapidly transforms the tree’s structure. Unlike navier-stokes, it instantiates higher-level objects like SplayTree.Node that contain closure states. When instrumented, each ofthe nodes in the splay tree instantiates a ThingsMigrate Scope object, adding to the run-timeoverhead. TREECOPY and XPLICTGC experience a much larger average overhead of 8220%and 3870% respectively, since both approaches involve some form of GC duty (i.e., explicitlydereferencing unused objects via delete). In LAZYSNAP, the overhead is brought down to 67%.The observed difference between the approaches highlight the importance of leveraging the nativeGC.In addition to the 3 benchmarks, we were able to run richards and raytrace in LAZYS-NAP. In contrast to the other benchmarks, both richards and raytrace create higher-level ob-55jects using prototypal inheritance and the new expression (as opposed to object literals). TREECOPYand XPLICTGC were not able to correctly restore the program on a target device, due to their lim-ited support for prototype inheritance. Thus, we report the results for the 2 additional benchmarks,but exclude them from the overall average, as we cannot compare their results across the differentversions of ThingsMigrate.Despite the large slowdown in TREECOPY and XPLICTGC, all 3 approaches were able to cor-rectly migrate the benchmarks. We note however that these benchmarks are synchronous programsthat were specifically designed to stress-test browsers on desktop computers, and do not representtypical asynchronous and event-driven applications that run on IoT devices.We also observe that the performance of the instrumented code (B) and the restored code (C)is the same across all benchmarks – i.e., time taken for restored program is within the confidenceinterval of the time taken for instrumented code and vice versa. As the restored code is semanticallyequivalent to the original code, we obtain the same performance measurements as the instrumented(pre-migration) code. These results indicate that the instrumentation overhead does not accumulateand degrade the performance (i.e., execution time) over subsequent migrations (Section 4.1.4).Unfortunately, we cannot directly compare our results with prior work in terms of executiontime overhead, as Lo et. al. [41] measured such overheads for web applications on desktop com-puters, which do not exhibit the same workload characteristics as our benchmarks, and Kwon et.al. [37] does not report on the execution time overheads at all.Memory Usage. Figures 4.4a, 4.4b, and 4.4c present the memory usage of the benchmarksacross 3 different platforms. Similar to the results in Figure 4.3, the memory usage of the rawbenchmarks (shown in blue) are normalized to 1; the red, yellow, and green bars represent thedifferent versions of ThingsMigrate. For each benchmark, we uniform-randomly sampled thememory usage throughout the execution of the benchmark, and report the average memory usage.We apply uniform random sampling and not periodic sampling to avoid the sampling intervalcoinciding with the garbage collection interval, which may render biased results. Overall, our56results reveal that the instrumented programs use considerably more memory than the originalprogram, ranging from 18% in LAZYSNAP to 940% in XPLICTGC. LAZYSNAP incurs the smallestmemory overhead for all benchmarks and across all platforms.Since ThingsMigrate is a pure application-level solution that does not involve augmenting theJS VM (unlike Kwon et. al. [37]), some memory overhead is unavoidable. For example, everyinjected Scope object takes up additional space in the heap. For each benchmark, the observedmemory overhead exhibits a different trend, which we attribute to the varying logical structureof the benchmarks. For factorial, the memory overhead is reduced over each version ofThingsMigrate, with LAZYSNAP incurring the least memory overhead of 77%. Interestingly,the average memory overhead for navier-stokes is higher by 48% in XPLICTGC than inTREECOPY. In factorial, there are only a few Scope objects while the captured object islarge (i.e., a large array of numbers). In this program, the size of the program state dominates thememory usage. However in navier-stokes, the program state consists of a smaller array anda few literal values, while there are a lot of Scope objects created by transient function calls. Insuch programs, the creation of Scope objects dominates the memory usage. This is why we seehigher memory usage in XPLICTGC even without keeping a replica of the program state. LAZYS-NAP does not incur memory overhead for tracking the program state, since it captures the programstate lazily. Rather, the memory overhead in LAZYSNAP comes from the injected Scope objectsand the associated extract callback functions. The results for splay also illustrates the im-provement over the 3 approaches, with the lowest overhead of 42% in LAZYSNAP. Surprisingly,the overhead introduced by XPLICTGC is larger than TREECOPY on the cloud machine, but noton the other 2 devices. A potential explanation for this is the interaction between the garbage col-lection cycle and the memory sampling code. As we measure the resource usage from within theapplication, the sampling function is also pushed into the JS VM’s event queue. Hence, we knowthat the sampling function can only be invoked when the synchronous part of the benchmark hasfinished. During the synchronous part of splay, thousands of Scope objects are created, con-57suming a lot of memory. On the cloud device, the next synchronous part of splay can be queuedright away, as there is still gigabytes of memory left. In contrast, on the Raspberry Pi devices,the VM invokes garbage collection more aggressively, to clean up the unused Scope objects andmake room for the next function invocation. Therefore, there is a higher chance on the RaspberryPi devices that the memory sampling function is invoked after the garbage collector has freedup memory. The memory logs support this explanation, in which the memory consumption forRaspberry Pis drops at regular interval, while the pattern observed on the cloud is more arbitrary.In addition to the Scope objects injected throughout the user code, there are other ThingsMi-grate objects initialized during the bootstrapping step (Section 3.2.1) that perform logistical workunrelated to the application logic such as maintaining an MQTT Pub-Sub connection. Thus, wefurther break down the overall memory usage into 2 parts: 1 memory used for tracking the pro-gram state, and 2 memory used by helper objects providing the Pub-Sub interface for trigger-ing the migration. Part 1 is the unavoidable overhead in any implementation of ThingsMigrate;Scope objects are injected regardless of how the migration is triggered and how the snapshot istransported. Part 2 is the variable overhead and is mainly due to our choice of the MQTT Pub-Subinfrastructure for triggering the migration process. We therefore subtract the latter from the totalmemory overhead. We measured the overhead of part 2 by executing an “empty” benchmark thatperforms no computation, and found it to be about 19MB for 64-bit devices and 9.5MB for 32-bitdevices. We note that the absolute memory usage for both Raspberry Pi devices are half of thatof the cloud server due to the bitness of the devices; i.e., our cloud server has a 64 bit processor,while the other Pi devices have 32 bit processors. Further, GC is triggered more aggressively onthe Pi devices, as they are more memory-constrained.4.1.4 Experiment 3: Migration OverheadIn this experiment, we report the migration overhead consisting of the time taken to capture andserialize the state, and the time taken to generate restoration code. We ran each benchmark until58(a) cloud Server (Xeon) (b) Raspberry Pi 3 (c) Raspberry Pi 0Margins of error were between 0.5% and 5% for all results, for a confidence interval of 95%.(TC = TREECOPY, XG = XPLICTGC, LS = LAZYSNAP)Figure 4.5: Snapshot / Restoration Time (in seconds)it made 50% progress, took a snapshot, and then generated restoration code from the snapshot.As we noted previously, as the benchmarks were synchronous programs (i.e., do not yield controlto the migration framework), we manually injected code to trigger the migration from within thebenchmark; this also has the effect of migration being triggered deterministically at the desiredpoint in the execution. Each benchmark was captured and restored multiple times on each platform,and the times averaged over the multiple runs are displayed in Figure 4.5.Similar to the results of Experiment 4.1.2, we observe that XPLICTGC and LAZYSNAP bothperform significantly better than TREECOPY, mostly due to the improved implementation of theInstrumentor and Runtime. An interesting observation regarding the navier-stokes bench-mark is that LAZYSNAP takes slightly longer than XPLICTGC because the snapshot step forLAZYSNAP takes more time. In XPLICTGC, the scope tree is explicit and thus always readilyavailable for capture. If there are not many unused scopes in the program, the explicit GC proce-dure takes minimal time and the scope tree can be captured immediately. In LAZYSNAP, the scopetree is captured lazily, so a Mark-and-Sweep procedure is invoked to traverse and collect the activestates, only upon a snapshot command. The results for splay bring out the opposite effect, in59 0 5 10 15 20 25 30 35 0  3  6  9  12  15  18  21  24  27  30 0 5 10 15 20 25 30 35Memory (RAM) (mb)CPU Utilisation (%)Time (minutes)Cloud MemPi3 MemPi0 MemPi0 CPUFigure 4.6: Multi-Hop Migration Analysis (regulator application) 60 65 70 75 80 85 90 95 100 0  3  6  9  12  15  18  21  24  27  30Snapshot Size (kb)Time (minutes)SnapshotFigure 4.7: Snapshot Size over Time (regulator application)which XPLICTGC has to perform a lot of work to dereference the unused scopes before producinga snapshot. In LAZYSNAP, the unused scopes are left for the native GC, and the Mark-and-Sweepprocedure simply captures the active scopes. The migration latency of LAZYSNAP on even themost constrained device (Raspberry Pi 0) is still quite reasonable (77ms for splay).4.1.5 Experiment 4: Multiple MigrationsIn this experiment, we analyze the global behavior and performance of ThingsMigrate over time,when multiple migrations are performed between the edge and the cloud. More precisely, weanalyze the effects of migrating a long-running asynchronous service that is not computationallyexpensive from one device to another. None of the benchmarks used in Experiment 2 fit thisdescription, nor could we find publicly available JS-based IoT benchmarks that satisfy this criteria(Section 4.1.1). Therefore, we developed and used our own benchmark – the regulator applicationthat satisfies this criteria (by design). We first deploy the regulator application on our cloud server,60then we migrate it to the edge devices (i.e., the Raspberry Pi 3 device, after one minute, and thento the Pi 0 device, after one minute). The application is then pushed back to the cloud server. Thiscycle is repeated 10 times (30 migrations over 30 minutes), and the CPU and memory utilizationare measured in each instance.The memory utilization results are shown in Figure 4.6, for the duration of the experiment(30 min). The migration cycles are denoted by a vertical bar (every 1 minute), and an oscillatingvariation pattern can be observed during the time periods for which each device was executingthe regulator application. As can be observed, the memory usage fluctuates, for all devices, butremains overall stable, as each successive code restoration does not consume additional memory(assuming the memory needs of the application do not increase). The step-like appearance ofthe memory curves are explained by the JS garbage collector (GC), which regularly claims smallamounts of memory (i.e., during execution of the regulator – small pikes), and which periodicallyruns a more through collection (bigger drops). However, we also observe that the memory tendsto very slowly increase over time, but this is not due to the multiple migrations – rather, this isan artifact of the experimental data collection process, which logs memory and CPU usage ata frequent interval (every 200ms) and keeps the data in memory. This is supported by Figure4.7, which plots the snapshot size at each successive migration, which remains constant at 83kb.Finally, as in Experiment 2 (Section 4.1.3), the memory usage on the cloud server is higher thanon the Pi devices.The CPU usage is shown on the same Figure (4.6). For simplicity, we show CPU usage resultsonly for one device (i.e., Pi 0, which is the most resource constrained), but the trend is similar onthe others. As can be observed, the CPU usage peaks at about 4%-5% when the Pi 0 device isexecuting the application, and is close to 0% otherwise. The CPU usage during run-time remainsconstant across the different executions. The short spike before migration corresponds to the codereconstruction, and the short spike after migration corresponds to the serialization process, forwhich a small memory surge can also be observed.614.1.6 SummaryOverall, our results demonstrate that ThingsMigrate can enable the cross-platform migration ofIoT JS-based applications with acceptable performance overhead and without any modificationsto the underlying VM. The run-time latency overhead using LAZYSNAP was 1.6% in the bestcase, and around 25% for control-yielding benchmark (factorial). Even for the compute-intensive,thread-blocking benchmarks from the Octane Suite, which are not representative of typical IoTworkloads, ThingsMigrate was able to migrate them despite a higher run-time overhead. Thememory overheads were more significant, ranging from 19% to 252%, though we believe that thisis an acceptable tradeoff, given our approach to provide migration support purely at the application-level through code instrumentation.Starting from the state-replication approach in TREECOPY, we explored the effects of differentoptimization techniques on the performance of the user application. XPLICTGC faired poorly interm of run-time overhead, despite a marginal reduction in memory consumption. This was mostlydue to the explicit dereferencing of variables that have gone out-of-scope. In LAZYSNAP, we cutdown memory consumption to nearly 1/3 of TREECOPY, by not keeping a replica of the internalstate. We also saw a significant reduction in run-time overhead due to capturing the state lazily.Finally, our results show that ThingsMigrate can support multiple-hops of subsequent migrationwithout altering the application semantics, while keeping the CPU and memory usage constantthroughout and not adding any overhead over the migration cycle.4.2 Case Study: Motion DetectorIn this section, we describe our experience with using ThingsMigrate to build a realistic IoT ap-plication for video surveillance by adapting third-party JS components developed for standaloneNode.js applications. There is a strong motivation for processing video streams at the edge [4, 64],as sending live streams to the cloud for processing in real-time might not always be efficient or62Figure 4.8: Case study setuppossible. The components that we adapted were not designed with ThingsMigrate in mind, andas a result, we had to make (minor) modifications to make them work with our system. We alsoevaluate this application using application-specific metrics, rather than CPU/memory usage (unlikeSection 4.1).4.2.1 Experimental SetupWe set up an IoT network with four devices to build a surveillance system. Figure 4.8 shows thesetup. We have used the TREECOPY technique for this case study, as the focus of the experimentwas not on optimal performance but to understand how ThingsMigrate can integrate with a real-world application. The workflow for integrating XPLICTGC and LAZYSNAP remain the same.The application logic is modularized into two components: a video streamer component thatcaptures images from a video source such as a webcam, and a motion detector component thatprocesses the images to detect motion. Unlike the video streamer, which is bound to a singledevice by the peripheral from which it needs to capture video, the motion detector can be run onany device as it performs computations on the image data. We measured the behavior of the systemover a series of migrations of the motion detector across the three systems (Raspberry Pi 3, Pi 0,and the cloud server from Section 4.1).Video-streamer: We used FFmpeg [7], a popular open-source software for handling multi-63media, to capture individual frames from a video stream. For the purpose of the experiment, thecomponent was configured to stream from a video file instead of a peripheral such as a webcam, sothat we have a deterministic and reproducible sequence of frames. To interface with the FFmpegprocess from the JS layer, we adapted a third-party NPM library called fluent-ffmpeg [50],which we use to capture individual frames. We then publish them over the Pub-Sub interface. Thecapture-and-publish routine was written as a single JS function captureFrame that was passedinto a setInterval call with interval set to 200ms (i.e., a rate of 5 frames per second). We usedthe cloud server to serve as a surveillance camera and run the video streamer component.Motion Detector: This component was written entirely in JS without having to interface withany external software. We integrated a third-party NPM module called jimp [46], which providesan API to read Buffer objects (i.e., received from the Pub-Sub overlay) and perform imageprocessing tasks. The component stores binary frame data for the n latest frames.The motion detection logic (i.e., function detectMotion) iterates through the array of im-ages and computes the difference between subsequent frames by calling jimp.diff(). Thebinary difference between the frames is published over the Pub-Sub interface. In addition, if morethan 10% of the pixels are altered, an alert message is also published under a different topic. ThedetectMotion function is passed to a setInterval call with the interval set to 500ms -this is lower than the frame rate of the video streamer (Section 4.2.2 explains why). Since thedetectMotion works by retrospective inspection of past frames, the array of Buffer objectscontaining the image data needs to be migrated. Otherwise, the restored component would need towait for the buffer of past frames to be filled again – thereby skipping the motion detection processfor a given time window, and missing potentially important motion.Although we do not fully support the migration of external libraries (Section 3.2.4), it waspossible to integrate the third-party NPM libraries as the objects they created were native JS objectsand the API calls were limited to stateless operations. For instance, since the Buffer objects arenative objects, they could be easily serialized and migrated. The function call to jimp.diff()64Figure 4.9: CPU Usage over time – Motion Detector componentis a stateless operation, since it does not create any additional scopes and its execution context isdestroyed after it returns. Such stateless operations do not affect the migration process because wedo not need to serialize their scopes.Finally, we collected performance statistics by subscribing to a Pub-Sub topic at which eachThingsMigrate runtime publishes its CPU and memory usage. To monitor and verify that themotion detection was working correctly, we used the ThingsJS web dashboard, which displays theimages by converting the data into a base64 encoded PNG image.4.2.2 ResultsTo automate our migration test in a controlled fashion, we wrote a Node.js script, using theThingsMigrate Migrator to send commands to the IoT devices over the Pub-Sub interface. Wesent a migrate command every 1 minute to the cloud Server, Pi 3, and Pi 0, in that order, andback. We repeated the cycle 3 times.Figure 4.9 shows the CPU usage over time as the application is migrated between the devices.The collected data for CPU and memory usage across the three devices exhibit a similar patternto the regulator component discussed in Section 4.1.4. The CPU usage on a device has a spikeupon receiving a snapshot and just before sending a snapshot, remains high while it is runninga component, and stays near 0 during the idle state. The memory consumption stays within a65Figure 4.10: FPS over time – Motion Detector componentnarrow range, with the garbage collector being triggered more frequently while a device is runninga component, and only occasionally while it is idle.However, on the Raspberry Pi 0’s console, we observed error messages showing that the pro-cess failed at regular intervals. This is because the asynchronous call to detectMotion tookmuch longer than the set interval of 500ms, due to the limited computational capacity of the Pi0,which led to the event queue filling up faster than the JS VM could consume, which eventually ledto overflowing and halting of the program.Figure 4.10 shows the frame rate measured in Frames Per Second (FPS) on each device overtime. The FPS was calculated using the formula 1∆t where ∆t is the time taken to execute thedetectMotion function. The figure shows the FPS dropping below the required FPS of 2over the periods between 120 and 180, 300 and 360, and 480 and 540 seconds, during whichthe Pi 0 was running the motion detector component. We can also observe the detectMotionfunction blocking the thread at 180 seconds and 360 seconds, preventing the migration from beingtriggered. The FPS drops below 2 occasionally during Pi 3’s execution, but for most frames it isable to process within the time interval, compensating for the delay overall.In summary, we were able to successfully migrate a third-party application with minimal mod-ifications between different devices. We also found the frame rate measured in FPS was acceptablein most cases for the application.664.3 DiscussionEquivalency of 3 versions. Since we instrument the original code and transform the program,we technically change the semantics of the program. From an operational semantics perspective,the instrumented program is not strictly equivalent (e.g., alpha-equivalent) to the original. Thatsaid, we use the relaxed notion of observational equivalence, where we claim that 2 programsare equivalent if their observable outputs are indistinguishable. This is a practical and acceptablenotion of equivalence in the context of real-world usage, as evidenced by the use of regression testsover a software life-cycle[15, 28, 67].Based on this reasoning, we verify the safety of our instrumentation by observing the behaviourof the instrumented program and comparing it against the behaviour of the original program usinga set of unit tests and micro-benchmarks. For example, to verify that we do not break the closuresemantics, we test a minimal program that defines a single closure function foo() containinga local variable x initialized with an arbitrary value, then try to print the value of x from the globalscope – a correctly transformed program should print undefined. We use a set of such minimalprograms and more complex programs (e.g., nested closures, multiple instances of closures) totest each of the JS semantics, ranging from simple variable assignment and loops to prototypalinheritance. These tests also help us assess the completeness of our instrumentation, as failedtests reveal the JS features we do not support. Using this rubric, LAZYSNAP is the most completeapproach, followed by TREECOPY, which does not fully support migrating prototypes. As wehave mentioned, XPLICTGC actually breaks the closure semantics when dealing with multiplenested closures. However, we have not pursued to correct XPLICTGC as we have abandoned theapproach after discovering its relatively poor performance.Security Impact. We examine the security impact of transforming a program, since the instru-mented program exposes to ThingsMigrate the closed variables, which otherwise would be hiddento a third-party script running in the same VM. Firstly, we assume that the user (or a group of67trusted users such as a company) has control over the host platform on which ThingsMigrate Run-times run, and that the process snapshot is transfered over the Internet. This assumption applies tomost Web services, as the service providers either own the host machines or rent VMs from trustedparties, and data is transferred between servers over the Internet. Given this assumption, we claimthat the attack surface of ThingsMigrate is fundamentally no larger than an ordinary Web service;it is secure as long as we apply the standard Web security practices of encrypting communicationchannels and protecting the hosts[3, 20, 71].There are 2 attack vectors through which a closed variable can be leaked: 1) snapshot image,and 2) a third-party script. The first attack vector is straightforward – the snapshot contains thevalues of all the variables and it is transferred over the network. Thus, to preserve the confiden-tiality of the snapshot, we must ensure that the communication is secure via end-to-end encryptionbetween the ThingsMigrate Runtimes. While we have not secured the communication channelsin our implementation, it is trivial to do so – because we control both communication endpoints– by leveraging existing solutions such as TLS. As per the second attack vector (via third-partyscript), we first summarize ThingsMigrate’s operation: ThingsMigrate itself is written as a JS pro-gram that runs inside a VM like Node.js, exposes a user interface at a public-facing network portthrough which a user sends code to execute, and instruments and executes a given program uponuser request. At the user interface, we must ensure that the user is trusted and the communicationchannel between the user and ThingsMigrate is secure. While we do not address this aspect inour implementation, it can easily be addressed by using existing Web security solutions such asstandard authentication, OAuth, or PKI (SSL certificates). This provides the bare minimal protec-tion from a technical standpoint, but the system is still vulnerable from authenticated users withmalicious intent – i.e., compromised clients. For example, an honest user’s program could be vul-nerable to a program sent by a malicious user, similarly to a cross-site scripting attack. TREECOPYwas vulnerable to this attack, as the closed variables are directly reachable from a global objectand the user programs were run in the same process space. In LAZYSNAP, this attack is more68difficult, firstly because we do not expose closed variables in the global scope, and secondly be-cause each program is run as a child process of the ThingsMigrate Runtime, and thus is isolatedfrom each other. The only channel a malicious program has to another program is through theThingsMigrate API – by invoking the snapshot function and receiving the snapshot. The cur-rent implementation of ThingsMigrate is vulnerable to this particular attack, but we can defendagainst it by implementing permission control. For example, each program would have an owner,and migration control would be permitted only by the owner. Albeit the lack of implementation,the defense we have described – such as using TLS, authentication, and permission control – arestandard practice in Web-based systems, and hence we claim that ThingsMigrate is on par withordinary Web services in terms of security.Algorithmic Complexity. In terms of algorithmic complexity, we first point out that the inputinstance is the user program and its size is simply the size of the JS code in UTF8. The algorithmcan be broken down into 3 phases: parsing phase, AST modification phase, and code generationphase. The algorithmic complexity of the parsing phase is O(n), as JS’s lexical grammar is Left-to-Right (LR)[33] and the parser is essentially a single-pass stream operator over a text stream.We use the esprima parser [29] whose algorithmic complexity is undocumented, but we assumeO(n) which is typical of LR parsers [1, 36]. After parsing, ThingsMigrate modifies the relevantAST nodes by recursively processing the tree starting from the root. TREECOPY and XPLICTGCwere not context-free, as they had to track each variable name by looking up the surroundingnodes to determine the name’s lexical scope. In the worst case, the algorithmic complexity wouldbe O(n2), as the whole tree could be looked up for each node. However, LAZYSNAP modifies theAST in a nearly context-free manner; it builds the local context of each function expression as ittraverses its body, and then injects a new node as it exits the node. Thus, the modification phasealso completes in O(n). Finally, the code generation phase is the reverse process of parsing, and italso completes in O(n); we use the escodegen library [56]. Overall, the algorithmic complexityof ThingsMigrate’s code instrumentation is therefore O(n). We also observed this experimentally69in Figure 4.1a.4.4 SummaryIn this chapter, we evaluated our migration technique, measuring the time taken to instrumentuser code, the size of the instrumented code, the execution latency and memory overhead, andthe migration overhead. We have shown that the code instrumentation overhead is very small, inthe order of hundreds of milliseconds for megabytes of code. Enabling migration introduces about33% overhead on average in terms of execution time, and about 3% in the best case. The migration-enabled program consumes about 78% more memory on average, and 18% more memory in thebest case. The overhead added by the code instrumentation depends on the program structure, suchas the depth of the nested scopes. Finally, we measure the time taken to migrate a running process,and show that for most of the benchmarks, it takes only tens of milliseconds to capture and restorea process. The worst case migration latency on the cloud server is about 50ms and about 500mson the Raspberry PI.We have also performed a case study on a realistic video processing application, migrating astateful image processing program across multiple devices. Further, we show that our techniquedoes not introduce any additional overhead over multiple hops of migration, unlike any prior workon JS process migration. The ability to migrate repeatedly is important for edge computing appli-cations, where a process might be migrated frequently between different devices.70Chapter 5Conclusion and Future Work5.1 SummaryIn this thesis, we presented ThingsMigrate, an approach that enables platform-independent migra-tion of stateful JS processes across diverse IoT devices. ThingsMigrate transparently instrumentsthe application code before executing it, injecting application-level objects that expose the hiddenstates of a JS program such as local variables captured inside closures. Additional event listenersare injected into the instrumented program to provide an interface for serializing the program stateinto a snapshot during run-time. Given a platform-agnostic representation (i.e., JSON snapshot)of the process state, ThingsMigrate generates code that restores the state of the program and re-sumes execution. Further, ThingsMigrate enables multiple subsequent migrations by formulatingthe snapshot procedure and restoration procedure as inverses of each other, to ensure that the pro-gram before capture and after restoration are semantically equivalent. ThingsMigrate is purely anapplication-level technique relying only on the semantics of the language, and requires neither anymodification to the underlying VM nor the platform.We presented 3 different versions of ThingsMigrate, each with different tradeoffs. In the firstTREECOPY approach, we maintain an explicit replica of the program state by copying any state71variables into the injected Scope objects. In an attempt to reduce the memory footprint, we carryout a more invasive program transformation in XPLICTGC, in which we convert all lexically-scoped variables into explicit properties of Scope objects. Consequently, we face additional dif-ficulties having to explicitly dereference out-of-scope variables, which eventually degrades perfor-mance during run-time. Combining the learning outcomes from TREECOPY and XPLICTGC, wedevise an optimized technique in LAZYSNAP, where we avoid both keeping an explicit replica ofthe state and managing the Scope life-cycle. LAZYSNAP introduces minimal run-time overheadby capturing the state lazily via callbacks, and by aligning its Scope hierarchy with the origi-nal program’s scope hierarchy, thus delegating to the native GC the management of the Scopelife-cycle.We evaluated each version of ThingsMigrate on three different devices (IoT and cloud), us-ing both Chrome Octane benchmarks and custom benchmarks. The results show that ThingsMi-grate can instrument user programs within microseconds on a cloud device, and migrate processes(i.e., capture snapshot and generate restored program) between devices within reasonable timebounds. ThingsMigrate imposes an average of 33% execution time overhead during run-time fornon-blocking IoT applications, which is reasonable, and does not lead to significant slowdown. Fi-nally, we show that ThingsMigrate supports multiple subsequent migrations across heterogeneousdevices without incurring additional overheads.5.2 Future WorkOur migration technique opens up new possibilities for providing performance improvements andfailure tolerance in the edge computing domain. On the other hand, we have uncovered limita-tions in our approach, which leaves room for future research. We suggest some potential researchdirections in the following sections.725.2.1 Generalization of the Approach to Other High-level LanguagesIn this thesis, we have focused on enabling migration of JS (ES5) programs. That said, the keyinsights we discovered in our research are not specific to JS programs and can be applied to otherhigh-level languages. For example, for any language that supports closures, we can extract localclosure variables by injecting Scope objects. Similarly, for any language using a tree-like heap,we can trace the hierarchy of objects by performing Mark-and-Sweep. As an immediate extensionto the work presented, we could start by providing support for the newer ECMA standards – e.g.,ES6 (ECMAScript 2015). We could further apply our technique to languages such as Pythonor Ruby, in order to study the generalizability of our approach. Investigating the challenges indifferent languages may reveal additional insights and would be an interesting research.5.2.2 Support for I/O ResourcesAs mentioned in Section 3.2.4, ThingsMigrate provides limited support for migrating I/O resourcessuch as files and sockets. The challenge in migrating I/O resources is not just a matter of providingthe necessary instrumentation. For example, if a process reads data from a TCP socket, what doesit mean to migrate this process to another machine? The target machine would have a differentidentity in the network and would be receiving a different stream of data through its TCP socket; itmay or may not make sense to resume the process in this case. Similarly, if a process that interactswith a file were to be migrated, the migration target device would need to have the same set of files.Early works on process migration had acknowledged this challenge, and the challenge remains inthe edge computing domain.We have explored a potential solution, which we call OneOS, for addressing the challengesof supporting the migration of I/O resources. OneOS is a distributed middleware that collectivelyprovides a single-system image of the computer network, with a global file system and a globalI/O resource registry. The key intuition behind our approach is to provide distribution transparencyby presenting a thin virtual interface for accessing the I/O resources. A OneOS program interacts73with an I/O resource (e.g., a file) as if it was located on the local host. The OneOS middlewaretransparently intercepts any read/write operations to an I/O resource and redirects it to the actualresource in the network. We have presented an early version of this work at the HotEdge ’19Workshop[34]. We plan to investigate how our migration technique can be expanded with the helpof OneOS.5.2.3 Applying Live-migration for Performance and ReliabilityWe have demonstrated platform-independent migration in this thesis, but we have not exploredhow it can be used for improving the performance and reliability of distributed edge computingapplications. As mentioned in Section 1.1, delay-sensitive applications may need to migrate to amore powerful machine if the current machine’s compute load increases or its battery is runninglow. Under changing conditions, we can dynamically distribute the workload across the networkusing live-migration. In case of arbitrary failures, we can periodically take snapshots of the ap-plications and recover lost processes from the saved snapshot. Our technique enables efficientplatform-independent live-migration at the granularity of a single process, and opens up new pos-sibilities for edge computing applications.One potential research direction is to study how we can dynamically allocate workloads in anetwork, taking into account the migration capability. Existing workload management systems(e.g., Apache Mesos[30], Apache Hadoop YARN[61], Google Borg[62]) assume that once a taskis allocated on a machine, it either runs to completion or fails and restarts. Dynamic reconfigu-ration is costly due to the high cost of VM migration, and thus the task allocator aims to find theoptimal configuration at the beginning. Using ThingsMigrate, we can explore adaptive task allo-cation, where we leverage the efficient migration technique to quickly reallocate tasks, rather thanstatically compute the optimal allocation.Another interesting direction is to apply application checkpointing in stream processing appli-cation in the edge. Existing stream processing applications provide failure tolerance by keeping74an external data storage and writing the application in such a way that it reads from the storage toinitialize its state, and writes to the storage periodically. The application developer is responsiblefor annotating the objects that should be persisted, which can be prone to human error. UsingThingsMigrate, we can explore new ways to write stream processing applications, as it can persistthe entire process including the control flow state, and does not require any involvement from thedeveloper.75Bibliography[1] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers, principles, techniques. Addison wesley, 7(8):9, 1986. → page 69[2] S. Alimadadi, S. Sequeira, A. Mesbah, and K. Pattabiraman. Understanding javascriptevent-based interactions. In Proceedings of the 36th International Conference on SoftwareEngineering, pages 367–377. ACM, 2014. → page 6[3] J. Allen. Cert system and network security practices. In Proceedings of the Fifth NationalColloquium for Information Systems Security Education (NCISSE’01), George MasonUniversity, Fairfax, VA USA, pages 22–24, 2001. → page 68[4] G. Ananthanarayanan, P. Bahl, P. Bodík, K. Chintalapudi, M. Philipose, L. Ravindranath,and S. Sinha. Real-time video analytics: The killer app for edge computing. computer, 50(10):58–67, 2017. → page 62[5] E. Andreasen, L. Gong, A. Møller, M. Pradel, M. Selakovic, K. Sen, and C.-A. Staicu. Asurvey of dynamic analysis and test generation for javascript. ACM Computing Surveys(CSUR), 50(5):66, 2017. → page 38[6] S. Andrica and G. Candea. Warr: A tool for high-fidelity web application record and replay.In Dependable Systems & Networks (DSN), 2011 IEEE/IFIP 41st International Conferenceon, pages 403–410. IEEE, 2011. → page 6[7] F. Bellard. Ffmpeg website. https://www.ffmpeg.org, 2017. URL https://www.ffmpeg.org.→ page 63[8] F. Bellucci, G. Ghiani, F. Paternò, and C. Santoro. Engineering javascript state persistenceof web applications migrating across multiple devices. In Proceedings of the 3rd ACMSIGCHI symposium on Engineering interactive computing systems, pages 105–110. ACM,2011. → page 4[9] B. Burg, R. Bailey, A. J. Ko, and M. D. Ernst. Interactive record/replay for web applicationdebugging. In Proceedings of the 26th annual ACM symposium on User interface softwareand technology, pages 473–484. ACM, 2013. → page 676[10] T. Chang and H. Meling. Byzantine fault-tolerant publish/subscribe: A cloud computinginfrastructure. In Reliable Distributed Systems (SRDS), 2012 IEEE 31st Symposium on,pages 454–456. IEEE, 2012. → page 28[11] I. K. Chaniotis, K.-I. D. Kyriakou, and N. D. Tselikas. Is node. js a viable option forbuilding modern web applications? a performance evaluation study. Computing, 97(10):1023–1044, 2015. → page 4[12] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield.Live migration of virtual machines. In Proceedings of the 2nd Conference on Symposium onNetworked Systems Design & Implementation-Volume 2, pages 273–286. USENIXAssociation, 2005. → pages 1, 6[13] B. Contributors. Babel.js javascript compiler. https://babeljs.io/, 2017. URLhttps://babeljs.io/. → page 19[14] D. Crockford. JavaScript: The Good Parts: The Good Parts. " O’Reilly Media, Inc.", 2008.→ page 28[15] C. Csallner and Y. Smaragdakis. Jcrasher: an automatic robustness tester for java. Software:Practice and Experience, 34(11):1025–1050, 2004. → page 67[16] P. T. Eugster, P. A. Felber, R. Guerraoui, and A.-M. Kermarrec. The many faces ofpublish/subscribe. ACM Comput. Surv., 35(2):114–131, 2003. ISSN 0360-0300.doi:10.1145/857076.857078. → page 17[17] P. Fischer. Intel R© xdk: cross-platform ide for development of mobile apps including gamesand iot: tutorial presentation. Journal of Computing Sciences in Colleges, 32(1):180–180,2016. → pages 3, 4, 20[18] M. Foundation. Rhino - mozilla. https://github.com/mozilla/rhino, 2020. URLhttps://developer.mozilla.org/en-US/docs/Mozilla/Projects/Rhino. → page 6[19] O. Foundation. Node-red website. https://nodered.org/, 2017. URL https://nodered.org/. →page 4[20] S. Garfinkel and G. Spafford. Web security, privacy & commerce. " O’Reilly Media, Inc.",2002. → page 68[21] J. Gascon-Samson, M. Rafiuzzaman, and K. Pattabiraman. Thingsjs: Towards a flexible andself-adaptable middleware for dynamic and heterogeneous iot environments. In Proceedingsof the 4th Workshop on Middleware and Applications for the Internet of Things, M4IoT ’17,pages 11–16, 2017. ISBN 978-1-4503-5170-6. → pages 4, 17, 46, 8377[22] J. Gascon-Samson, K. Jung, S. Goyal, A. Rezaiean-Asel, and K. Pattabiraman.ThingsMigrate: Platform-Independent Migration of Stateful JavaScript IoT Applications. InT. Millstein, editor, 32nd European Conference on Object-Oriented Programming (ECOOP2018), volume 109 of Leibniz International Proceedings in Informatics (LIPIcs), pages18:1–18:33, Dagstuhl, Germany, 2018. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.ISBN 978-3-95977-079-8. doi:10.4230/LIPIcs.ECOOP.2018.18. URLhttp://drops.dagstuhl.de/opus/volltexte/2018/9223. → page 5[23] E. Gavrin, S.-J. Lee, R. Ayrapetyan, and A. Shitov. Ultra lightweight javascript engine forinternet of things. In SPLASH Companion 2015, pages 19–20, New York, NY, USA, 2015.ACM. ISBN 978-1-4503-3722-9. → pages 3, 4, 20[24] Google. Chromium octane benchmark suite. https://github.com/chromium/octane, 2017.URL https://github.com/chromium/octane. → page 49[25] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami. Internet of things (iot): A vision,architectural elements, and future directions. Future generation computer systems, 29(7):1645–1660, 2013. → page 1[26] D. Guinard and V. Trifa. Towards the web of things: Web mashups for embedded devices.In Workshop on Mashups, Enterprise Mashups and Lightweight Composition on the Web(MEM 2009), in proceedings of WWW (International World Wide Web Conferences),Madrid, Spain, volume 15, 2009. → page 4[27] K. Ha, Y. Abe, Z. Chen, W. Hu, B. Amos, P. Pillai, and M. Satyanarayanan. Adaptive vmhandoff across cloudlets. Technical Report CMU-CS-15-113, 2015. → page 6[28] M. J. Harrold, J. A. Jones, T. Li, D. Liang, A. Orso, M. Pennings, S. Sinha, S. A. Spoon, andA. Gujarathi. Regression test selection for java software. In Proceedings of the 16th ACMSIGPLAN Conference on Object-Oriented Programming, Systems, Languages, andApplications, OOPSLA ’01, page 312–326, New York, NY, USA, 2001. Association forComputing Machinery. ISBN 1581133359. doi:10.1145/504282.504305. URLhttps://doi-org.ezproxy.library.ubc.ca/10.1145/504282.504305. → page 67[29] A. Hidayat. esprima: Ecmascript parsing infrastructure for multipurpose analysis.http://esprima.org/, 2017. URL http://esprima.org/. → pages 46, 69[30] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker,and I. Stoica. Mesos: A platform for fine-grained resource sharing in the data center. InNSDI, volume 11, pages 22–22, 2011. → page 74[31] T. Index. 2017 tiobe index. https://www.tiobe.com/tiobe-index/, 2017. → page 3[32] E. International. ECMAScript Language Specification. Ecma International, 5.1 edition,2011. URL http://www.ecma-international.org/ecma-262/5.1/. → page 1978[33] E. International. ECMAScript 2015 Language Specification. Ecma International, 6.0edition, 2015. URL http://www.ecma-international.org/ecma-262/6.0/. → pages 19, 69[34] K. Jung, J. Gascon-Samson, and K. Pattabiraman. Oneos: Iot platform based on {POSIX}and actors. In 2nd {USENIX} Workshop on Hot Topics in Edge Computing (HotEdge 19),2019. → page 74[35] V. Karagiannis, P. Chatzimisios, F. Vazquez-Gallego, and J. Alonso-Zarate. A survey onapplication layer protocols for the internet of things. Transaction on IoT and CloudComputing, 3(1):11–17, 2015. → page 19[36] D. E. Knuth. On the translation of languages from left to right. Information and control, 8(6):607–639, 1965. → page 69[37] J.-w. Kwon and S.-M. Moon. Web application migration with closure reconstruction. InProceedings of the 26th International Conference on World Wide Web, WWW ’17, pages133–142, Geneva, Switzerland, 2017. ISBN 978-1-4503-4913-0. → pages 4, 5, 6, 7, 56, 57[38] A. Levy, B. Campbell, B. Ghena, D. B. Giffin, P. Pannuto, P. Dutta, and P. Levis.Multiprogramming a 64kb computer safely and efficiently. In Proceedings of the 26thSymposium on Operating Systems Principles, pages 234–251. ACM, 2017. → page 2[39] J. Lin and K. El Gebaly. The future of big data is... javascript? IEEE Internet Computing, 20(5):82–88, 2016. → page 4[40] M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny. Checkpoint and migration of unixprocesses in the condor distributed processing system. Technical report, University ofWisconsin-Madison Department of Computer Sciences, 1997. → page 6[41] J. T. K. Lo, E. Wohlstadter, and A. Mesbah. Imagen: Runtime migration of browser sessionsfor javascript web applications. In Proceedings of the 22Nd International Conference onWorld Wide Web, WWW ’13, pages 815–826, New York, NY, USA, 2013. ACM. ISBN978-1-4503-2035-1. → pages 4, 5, 6, 7, 24, 28, 56[42] A. Machen, S. Wang, K. K. Leung, B. J. Ko, and T. Salonidis. Live service migration inmobile edge clouds. IEEE Wireless Communications, 25(1):140–147, 2018. → page 6[43] M. Madsen, O. Lhoták, and F. Tip. A model for reasoning about javascript promises.Proceedings of the ACM on Programming Languages, 1(OOPSLA):86, 2017. → page 38[44] J. W. Mickens, J. Elson, and J. Howell. Mugshot: Deterministic capture and replay forjavascript applications. In NSDI, volume 10, pages 159–174, 2010. → page 6[45] D. S. Milo´icˇic´, F. Douglis, Y. Paindaveine, R. Wheeler, and S. Zhou. Process migration.ACM Comput. Surv., 32(3):241–299, Sept. 2000. ISSN 0360-0300.doi:10.1145/367701.367728. URL http://doi.acm.org/10.1145/367701.367728. → page 679[46] O. Moran. Jimp: Javascript image manipulation program.https://github.com/oliver-moran/jimp, 2017. URL https://github.com/oliver-moran/jimp. →page 64[47] J. Oh, J.-w. Kwon, H. Park, and S.-M. Moon. Migration of web applications with seamlessexecution. In Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference onVirtual Execution Environments, VEE ’15, pages 173–185, New York, NY, USA, 2015.ACM. ISBN 978-1-4503-3450-1. doi:10.1145/2731186.2731197. URLhttp://doi.acm.org/10.1145/2731186.2731197. → pages 4, 5[48] H. N. Saha, A. Mandal, and A. Sinha. Recent trends in the internet of things. In 2017 IEEE7th annual computing and communication workshop and conference (CCWC), pages 1–4.IEEE, 2017. → page 1[49] M. Satyanarayanan. The emergence of edge computing. Computer, 50(1):30–39, Jan 2017.doi:10.1109/MC.2017.9. → page 1[50] S. Schaermeli. Fluent ffmpeg-api for node.js.https://github.com/fluent-ffmpeg/node-fluent-ffmpeg, 2017. URLhttps://github.com/fluent-ffmpeg/node-fluent-ffmpeg. → page 64[51] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs. Jalangi: A selective record-replay anddynamic analysis framework for javascript. In Proceedings of the 2013 9th Joint Meeting onFoundations of Software Engineering, pages 488–498. ACM, 2013. → page 6[52] A. W. Services. Iot greengrass. https://docs.aws.amazon.com/greengrass/, jan 2019. →page 4[53] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu. Edge computing: Vision and challenges. IEEEInternet of Things Journal, 3(5):637–646, 2016. → pages 1, 2[54] D. Sin and D. Shin. Performance and resource analysis on the javascript runtime for iotdevices. In International Conference on Computational Science and Its Applications, pages602–609. Springer, 2016. → pages 4, 49[55] C. Software. mjs. https://github.com/cesanta/mjs, 2017. URLhttps://github.com/cesanta/mjs. → pages 4, 20[56] Y. Suzuki. escodegen: Ecmascript code generator. https://github.com/estools/escodegen,2017. URL https://github.com/estools/escodegen. → pages 46, 69[57] A. Taivalsaari and T. Mikkonen. A roadmap to the programmable world: Softwarechallenges in the iot era. IEEE Software, 34(1):72–80, Jan 2017. ISSN 0740-7459.doi:10.1109/MS.2017.26. → page 2[58] S. Tilkov and S. Vinoski. Node. js: Using javascript to build high-performance networkprograms. IEEE Internet Computing, 14(6):80–83, 2010. → page 480[59] S. Vaarala. Duktape. http://www.duktape.org/, 2017. URL http://www.duktape.org/. →pages 4, 20[60] J. Van der Cruysse, L. Hoste, and W. Van Raemdonck. Flashfreeze: Low-overheadjavascript instrumentation for function serialization. In Proceedings of the 4th ACMSIGPLAN International Workshop on Meta-Programming Techniques and Reflection,META 2019, pages 31–39, New York, NY, USA, 2019. ACM. ISBN 978-1-4503-6985-5.doi:10.1145/3358502.3361268. URL http://doi.acm.org/10.1145/3358502.3361268. →pages 5, 7[61] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves,J. Lowe, H. Shah, S. Seth, et al. Apache hadoop yarn: Yet another resource negotiator. InProceedings of the 4th annual Symposium on Cloud Computing, pages 1–16, 2013. → page74[62] A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes. Large-scalecluster management at google with borg. In Proceedings of the Tenth European Conferenceon Computer Systems, pages 1–17, 2015. → page 74[63] C. Wang, F. Mueller, C. Engelmann, and S. L. Scott. Proactive process-level live migrationin hpc environments. In Proceedings of the 2008 ACM/IEEE conference onSupercomputing, page 43. IEEE Press, 2008. → page 6[64] J. Wang, Z. Feng, Z. Chen, S. George, M. Bala, P. Pillai, S.-W. Yang, andM. Satyanarayanan. Bandwidth-efficient live video analytics for drones via edge computing.In 2018 IEEE/ACM Symposium on Edge Computing (SEC), pages 159–173. IEEE, 2018. →page 62[65] S. Wang, R. Urgaonkar, M. Zafer, T. He, K. Chan, and K. K. Leung. Dynamic servicemigration in mobile edge-clouds. In 2015 IFIP Networking Conference (IFIP Networking),pages 1–9, 2015. → page 6[66] P. R. Wilson. Uniprocessor garbage collection techniques. In International Workshop onMemory Management, pages 1–42. Springer, 1992. → page 42[67] W. E. Wong, J. R. Horgan, S. London, and H. Agrawal. A study of effective regressiontesting in practice. In Proceedings The Eighth International Symposium on SoftwareReliability Engineering, pages 264–274, 1997. → page 67[68] T. Wood, P. J. Shenoy, A. Venkataramani, M. S. Yousif, et al. Black-box and gray-boxstrategies for virtual machine migration. In NSDI, volume 7, pages 17–17, 2007. → pages1, 6[69] A. Wright and H. Andrews. Json schema: A media type for describing json documents. InIETF, Internet-Draft draft-handrews-json-schema-OO. 2017. → page 1081[70] Y. Yoon, V. Muthusamy, and H.-A. Jacobsen. Foundations for highly availablecontent-based publish/subscribe overlays. In Distributed Computing Systems (ICDCS), 201131st International Conference on, pages 800–811. IEEE, 2011. → page 28[71] C. Yue and H. Wang. Characterizing insecure javascript practices on the web. InProceedings of the 18th International Conference on World Wide Web, WWW ’09, page961–970, New York, NY, USA, 2009. Association for Computing Machinery. ISBN9781605584874. doi:10.1145/1526709.1526838. URLhttps://doi.org/10.1145/1526709.1526838. → page 68[72] A. Zarrabi. A generic process migration algorithm. International Journal of Distributed andParallel Systems, 3(5):29, 2012. → page 6[73] S. Zhongyuan, Q. Jianzhong, L. Shukuan, and Z. Qiang. Use pre-record algorithm toimprove process migration efficiency. In 2015 14th International Symposium on DistributedComputing and Applications for Business Engineering and Science (DCABES), pages50–53, Aug 2015. doi:10.1109/DCABES.2015.20. → page 682Appendix AResearch ArtifactsA.1 ThingsJS and ThingsMigrateAs mentioned previously, the implementation of our system (ThingsMigrate) was realized overThingsJS [21], our general-purpose framework for executing high-level edge applications on IoTdevices. In this appendix, we provide a brief summary of ThingsJS below, and its relationshipwith ThingsMigrate. We then give some details on our open-source implementation of ThingsMi-grate/ThingsJS.A.1.1 ArchitectureThe high-level architecture of the ThingsJS Framework consists of several distributed componentsand is presented in Figure A.1. From a holistic point of view, a ThingsJS environment comprisesa highly-distributed ThingsJS Application, and dynamically manages its execution over a set ofheterogeneous devices through the ThingsJS Framework. More details on ThingsJS are availablein our vision paper [21].ThingsJS Manager. The ThingsJS Manager is the center-piece of our system architecture,and manages the execution of all components across all devices. It takes as input a ThingsJS83application, written in JavaScript, and schedules and monitors its distributed execution across allparticipating ThingsJS devices. The Manager can decide to trigger the migration of a given appli-cation towards another device – to accomplish this, it uses the ThingsMigrate APIs.ThingsJS Runtime. An instance of the ThingsJS Runtime is present on every device andacts as a thin hypervisor layer. It locally manages the execution of all components on the device,and gathers detailed statistics during execution (CPU, memory and bandwidth usage) which arefed to the Manager. Upon the migration of a given application being requested, ThingsMigratecoordinates the migration through the Runtime components on both devices involved.Inter-Component Communications. ThingsJS provides a pub/sub-based communication sub-strate (MQTT), and requires that all inter-component communications follow that model. Thechoice of this model was primarily motivated by the logical decoupling of content producers fromcontent consumers that it provides. Like any other component, ThingsMigrate makes use of thepub/sub communication substrate for all communications (i.e., migration commands and snapshotsare transferred directly between relevant Runtime nodes through the pub/sub interface).ThingsMigrate. As a subcomponent of ThingsJS, ThingsMigrate provides support for dy-namically migrating JavaScript IoT applications between devices, and is the focus of this thesis.More details on the architecture on ThingsMigrate and its components are given in Section 2.4 andFigure 2.3 of this thesis. In our specific implementation, as scheduling considerations are outsidethe scope, we let the user deploy applications manually, monitor their state and trigger migrations,through a web dashboard interface (appendix A.2).A.1.2 Implementation as an Open-Source ProjectAs mentioned, we implemented ThingsMigrate as an open-source project (built over ThingsJS).The version of ThingsMigrate/ThingsJS that correspond to this thesis has been tagged as spe-v3in our GitHub repository and can be accessed at: https://github.com/karthikp-ubc/ThingsJS/tree/spe-v3. Given the availability of similar hardware and software configurations, it can be used to84Figure A.1: High-Level Architecture of ThingsJSreproduce the results that we obtained.As ThingsJS provides an IoT-based middleware, it needs be installed and run on every device.Each device then executes a worker node (corresponding to the ThingsMigrate/ThingsJS Runtimein our architecture) that hosts one or more JavaScript applications. Worker nodes listen for appro-priate start and stop commands over the pub/sub interface to respectively start and stop theexecution of applications, as well as migrate commands to trigger the migration between twonodes.Detailed installation and usage instructions can be found at our project page1 and in our wiki2.In addition, we also provide a video demo of ThingsMigrate3, as well as a ready-to-use VirtualBox1https://github.com/karthikp-ubc/ThingsJS2https://github.com/karthikp-ubc/ThingsJS/wiki3http://ece.ubc.ca/~karthikp/ThingsMigrate/ecoop2018.html85Virtual Machine image4 that can be used to try ThingsMigrate and our web dashboard (appendixA.2).A.2 Web DashboardIn addition to implementing the ThingsMigrate system over ThingsJS, as well as the code instru-mentation, snapshotting and code generation algorithms as a set of command-line tools, we alsoimplemented a Web Dashboard as a user-friendly interface to interact with ThingsMigrate. Thisappendix highlight the main features of our dashboard.A.2.1 Interface OverviewOur web dashboard can be used to view the status of the currently available devices and applica-tions, and can also launch, migrate and stop the execution of applications across devices. Given thatthe scheduling infrastructure is not yet implemented, the dashboard plays the role of the ThingsMi-grate Manager by allowing the the user to control the deployment and the migration of applicationsto IoT devices manually.Screenshots of the dashboard are shown in Figure A.2. The dashboard provides three differentviews, which can be selected through the menu:1. Main (default) view (Figure A.2a): this view provides a holistic view of all the nodes (de-vices – IoT or cloud-based), their detailed status, performance (CPU and memory usage) andconsole output. In addition, specific to the video streaming / motion detection case-study ap-plication (Section 4.2), this view allows one to observe the raw video stream as well as thedetected motion patterns.2. Codes view (Figure A.2b): this view provides a code viewer and editor to view and edit thesource code of the IoT apps to be run on the devices. Developers can write their apps theredirectly, or cut-and-paste from their favorite editor / IDE into the code editor.4Idem.86(a) Overview(b) IoT Apps Source CodeFigure A.2: ThingsMigrate Dashboard873. Debug view: this view provides network debugging support by showing the flow of pub/submessages across the pub/sub substrate.A.2.2 Main FeaturesIn a nutshell, our dashboard provides the following features:• Viewing the state of worker nodes (Main view, similar to Figure A.2a): in the top-leftportion of the main view, a list of all registered worker nodes is shown, with their status:– Green: the node is currently available and idle (i.e., not executing any application)5– Grey: the node is currently available, but busy (i.e., executing another application)– Red: the node is currently unavailable• Viewing detailed worker status (Main view): on the right side, the interface provides threepanes that can be used to show detailed information on a given worker (IoT/cloud) node.Each status pane provides three different views:– Status: shows the status of the node– Graph: shows a graph of the memory or CPU usage of the node– Console: shows the output of the application that the node is currently executing• Launching applications on devices (Main view): the dashboard can be used to launch anyapplication defined in the Codes section of the dashboard. The code is instrumented on-the-fly by ThingsMigrate (Section 3.2.1) in order to support migration, and is launched throughthe Runtime on the target device.• Stopping applications (Main view): the execution dashboard can instruct the Runtime onany target device to stop the execution of a given application.5Although our framework supports executing several applications on a given worker node, our web dashboardcurrently allows for only one application to be mapped to a given worker node.88• Migrating applications between nodes (Main view): the dashboard can trigger the migra-tion of an application from one device to another. When doing so, it instructs the Runtime onthe first device to initiate the migration. The Runtime will then pause the execution, take asnapshot of the current state, send the state to the Runtime of the second device, and instructthe second device to resume the execution (Figure 2.3).89

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            data-media="{[{embed.selectedMedia}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0395872/manifest

Comment

Related Items