@prefix vivo: .
@prefix edm: .
@prefix ns0: .
@prefix dcterms: .
@prefix skos: .

vivo:departmentOrSchool "Science, Faculty of"@en, "Computer Science, Department of"@en ;
edm:dataProvider "DSpace"@en ;
ns0:degreeCampus "UBCV"@en ;
dcterms:creator "MacRow, Kalan W."@en ;
dcterms:issued "2017-08-14T17:16:36Z"@en, "2017"@en ;
vivo:relatedDegree "Master of Science - MSc"@en ;
ns0:degreeGrantor "University of British Columbia"@en ;
dcterms:description "Vaportrail is a privacy-preserving platform for personal data and applications. It allows users to archive their personal data and safely expose it to untrusted third-party applications. As a trusted hub for data sources like email, social updates, location data, and health metrics it enables new types of applications that combine several sensitive personal data streams. Through carefully designed isolation mechanisms, the platform prevents applications from exfiltrating data and unburdens developers of the fiduciary responsibility associated with handling personal data. Vaportrail provides an open package format and APIs for building service connectors that ingest data from external services, as well as a robust browser-based sandbox that mediates application access to sensitive data."@en ;
edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/62599?expand=metadata"@en ;
skos:note "Vaportail: A Platform for Personal Data Applications

by

Kalan W. MacRow

B.Sc. Computer Science, University of British Columbia, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Computer Science)

The University of British Columbia
(Vancouver)

August 2017

© Kalan W. MacRow, 2017

Abstract

Vaportrail is a privacy-preserving platform for personal data and applications. It allows users to archive their personal data and safely expose it to untrusted third-party applications.
As a trusted hub for data sources like email, social updates, location data, and health metrics it enables new types of applications that combine several sensitive personal data streams. Through carefully designed isolation mechanisms, the platform prevents applications from exfiltrating data and unburdens developers of the fiduciary responsibility associated with handling personal data. Vaportrail provides an open package format and APIs for building service connectors that ingest data from external services, as well as a robust browser-based sandbox that mediates application access to sensitive data.

Lay Summary

Vaportrail is a software system that allows users to import their data from websites, social networks, smart devices, and other services. It provides a safe platform for exploring and leveraging personal data using third-party applications.

Preface

This dissertation is original, unpublished, independent work by the author, Kalan W. MacRow. It was conducted in the Networks, Systems and Security Laboratory at the University of British Columbia, Point Grey campus.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Programs
Glossary
Acknowledgments
1 Introduction
  1.1 Deployment model
  1.2 Personal data warehousing
  1.3 Service connectors
  1.4 Application sandbox
  1.5 A new class of applications
  1.6 A simple trust model
  1.7 Sharing with intent
  1.8 Call for an open ecosystem
2 Design
  2.1 A personal appliance
  2.2 A modern user experience
  2.3 Simple trust model
    2.3.1 Trust the infrastructure
    2.3.2 Trust the platform
    2.3.3 Service connectors
  2.4 Personal data warehousing
  2.5 An open and extensible platform
    2.5.1 Service connectors
    2.5.2 Applications
    2.5.3 Platform core
    2.5.4 Intent-based sharing
  2.6 Enabling personal data applications
  2.7 Summary
3 Implementation
  3.1 The appliance
    3.1.1 System overview
    3.1.2 Threat model and security
    3.1.3 Appliance lifecycle and costs
    3.1.4 Backend platform components
    3.1.5 Dashboard user interface
  3.2 Service connectors
    3.2.1 Packaging and distribution
    3.2.2 Isolation and developer flexibility
    3.2.3 Permissions and capabilities
    3.2.4 Connector lifecycle
    3.2.5 Reading and writing data
    3.2.6 Example connectors
    3.2.7 Summary
  3.3 Application Sandbox
    3.3.1 WebWorkers as execution containers
    3.3.2 Hosted JavaScript runtime
    3.3.3 Application sandbox lifecycle
    3.3.4 Application monitoring
    3.3.5 Platform APIs
    3.3.6 Summary
  3.4 Applications
    3.4.1 Packaging and distribution
    3.4.2 The application manifest
    3.4.3 Application lifecycle
    3.4.4 Example applications
    3.4.5 Summary
  3.5 Summary
4 Related Work
  4.1 Personal data platforms
  4.2 Privacy-preserving web applications
  4.3 JavaScript sandboxes
  4.4 Summary
5 Conclusions
Bibliography
A Platform APIs

List of Tables

Table 3.1 The data store implementations used by Vaportrail
Table 3.2 Application permission classes

List of Figures

Figure 3.1 The Vaportrail appliance runs on a trusted infrastructure provider. Service connectors load and transform data, which is then stored using the platform API. A web dashboard allows the user to safely expose their data to applications.
Figure 3.2 Platform components and service connector containers run on isolated bridge networks. The platform API container is routable from service connector networks using the DNS name platform.api.
Figure 3.3 Service connectors authenticate with the platform API by presenting a secret the platform places in their environment at initialization. The connector’s IP address and immutable container name are used to establish the set of permissions that will be enforced for subsequent requests.
Figure 3.4 The Vaportrail dashboard provides an overview of installed components, applications and available storage capacity.
Figure 3.5 The installation flow for a service connector. A connector package is uploaded through the dashboard and unpacked to a staging location on the appliance filesystem. The user then approves or rejects the platform permissions required by the connector manifest via a UI workflow.
Figure 3.6 Vaportrail applications run in a dedicated JavaScript interpreter within a WebWorker thread. A trusted broker marshals application API calls over an asynchronous RPC interface and into the platform monitor, where authorization checks are made before updating the DOM or making requests into the platform API over the network.
Figure 3.7 When an application is launched, the monitor and the sandbox broker coordinate to set up the new environment for the application instance, creating a new DOM root in the dashboard and loading the application code into the sandbox interpreter.
Figure 3.8 The lifecycle of a Vaportrail application.
Figure 3.9 The Contrail timeline application.
Figure 3.10 The Altitude search application.
Figure 3.11 The Radar account fraud detection application.

List of Programs

3.1 A JSON service connector manifest specifying package metadata and platform permissions required by a Facebook connector.
3.2 Method calls on remote objects are implemented by mapping a (taskID, objectID) pair to a specific object instance and applying the method on a set of wrapped arguments. Argument wrapping allows us to transparently support callbacks into the sandbox interpreter by hiding function pointer semantics in simple callable functions.
3.3 The JSON application manifest specifying package metadata and the platform permissions required by the Radar application.
3.4 A “Hello, World!” Vaportrail application that stores the date of its last run in localStorage, logs a message to the browser console, and creates a modal dialog with a familiar salutation.
Glossary

RDBMS Relational Database Management System
TCB Trusted Computing Base
ETL Extract Transform Load
DOM Document Object Model
CSS Cascading Style Sheets
AMI Amazon Machine Image
UUID Universally Unique Identifier
CSP Content Security Policy, a browser technology to mitigate cross-site scripting and data injection attacks.

Acknowledgments

I would like to thank my supervisor, Andy Warfield, for his limitless patience and encouragement to complete this project. I would also like to thank Bill Aiello for the many conversations that have shaped this work and my perspective on online privacy. I would like to thank my family, and Sylvanna, for their constant support, and my friends and fellow NSS students for their guidance over the years. Finally, I would like to thank UBC Graduate and Postdoctoral Studies for their financial support.

Chapter 1
Introduction

A man cannot be comfortable without his own approval.
— Mark Twain

Social networking and cloud-based services have ushered in an era of unprecedented data creation, sharing, and interaction [12]. Apps and websites encourage us to create, connect and share an increasingly intimate portrait of our daily lives with everyone around us. We generate a continuous stream of personal content in the form of pictures, videos, emails, documents, comments, likes, purchases and GPS location data. Behind each page impression, click, and scroll event is a trail of personal data. Detailed logs of every interaction are captured and stored, creating a valuable, high-resolution history of what we view, like, dislike and search for. With the advent of the Internet of Things (IoT), a growing array of “smart” devices, including everyday appliances and mundane household objects, are passively (and in many cases actively [23][2][21]) listening: contributing enormous volumes of personal data to our vaportrail in the cloud. In this thesis we present Vaportrail, a self-contained and self-hosted appliance that enables users to leverage the data they generate.
Vaportrail has a plugin architecture that allows service connectors to continuously load and store personal data from external services. To leverage the data, it provides a browser-based sandbox in which untrusted third-party applications can operate without the ability to exfiltrate data over the network.

1.1 Deployment model

Vaportrail is packaged as a virtual machine image (or virtual appliance) that can be hosted on any public cloud or private infrastructure. All operational costs, including compute and storage, are the responsibility of the user. There is no central Vaportrail service: the appliance does not “phone home” or have dependencies on any external services. Vaportrail is an entirely self-contained device. Although this model imposes some cost and operational complexity beyond familiar cloud services, it ensures that costs are explicit and avoids the need for an exploitative business model. The continued growth and maturity of public cloud infrastructure has brought the cost of non-trivial compute and storage resources well within the means of the hobbyist [22][5][14]. We believe that for privacy-conscious users, the control and transparency afforded by operating the appliance are preferable to other deployment options.

1.2 Personal data warehousing

Inexpensive cloud storage and the expected (business) value of data have made collecting and warehousing it, regardless of its immediate utility, standard practice for Software as a Service (SaaS) providers [12]. We take inspiration from modern consumer-scale data warehousing to propose a form of personal-scale data warehousing. Instead of archiving raw data to flat files or coercing various schemas into a single traditional Relational Database Management System (RDBMS), we combine several specialized data stores to support analytical queries across a range of structured and unstructured personal data.
The platform organizes the data stores under a unified namespace and API.

1.3 Service connectors

Service connectors leverage platform APIs and a flexible execution environment to authenticate with external services and load data into Vaportrail. The connectors write to schemas that the user has explicitly granted them access to at install time. Service connector developers can choose from various storage interfaces to suit the nature of the data and/or the expected access pattern. A Gmail [17] connector might store attachments as binary objects in gmail.objects.inbox.attachments, and messages as indexed text in gmail.nosql.inbox.messages.

1.4 Application sandbox

Applications are the primary way users interact with their Vaportrail. A browser-based sandbox provides a safe execution environment for untrusted applications, restricting their access to data and preventing them from exfiltrating it over the network. Applications can use a rich set of platform APIs to query data, render GUIs, trigger platform-mediated sharing and save their state. There is a substantial amount of related work in JavaScript sandboxing, including AdSafe [1], Treehouse [44], JS.JS [51] and others [52][39][38]. We build on this work to create a robust application monitor that works in any modern web browser. In contrast to related work, we have made trade-offs in favour of security and reliability over performance and backwards compatibility with existing applications.

1.5 A new class of applications

We hope to enable a new class of applications characterised by combining personal data streams that were previously siloed and/or too sensitive to share with untrusted third parties. Often we suffer from data lock-in: our data is trapped in the platform in which it was created, limiting what we can do with it. When we can export it, our data is often too sensitive to expose to any except the most credible, trustworthy third parties.
Trust and credibility are rightfully difficult to attain, but this limits the ability of independent developers to build applications that consume personal data. With Vaportrail, we unburden the developer of traditional fiduciary responsibility, and the user of the fear of their privacy being compromised, shifting trust onto the platform and enabling a new application design space.

1.6 A simple trust model

With sensitive personal data at stake, a straightforward trust model is critical. To that end, we state Vaportrail’s trust model here:

Trust the platform and infrastructure, but not applications or service connectors.

We expect the number of service connectors to be small compared to the number of applications, and rely on the community and trusted brands to endorse them. In practice, an acceptable level of privacy can be achieved by deploying Vaportrail on a reputable public cloud and using service connectors that have been vetted not to interfere with the services they integrate with.

1.7 Sharing with intent

Vaportrail supports a limited form of sharing via platform-mediated intents [35]. We provide an API that applications can use to signal that an item (e.g. an image or text) is shareable. The platform then provides the user with the option to share the content using any service connector that has registered as a handler for the relevant data type. Intents provide a mechanism to publish results out of the platform while completely decoupling the application from the channel used to do so. Sharing is currently limited to basic data types, but we imagine it could be extended to support richer peer-to-peer sharing over WebRTC [36] or other protocols, with the platform as a trusted broker. There is some risk of exfiltration inherent in sharing, which we make explicit by requiring the user to grant an application permission to share.

1.8 Call for an open ecosystem

The success of Vaportrail depends on community adoption and a low-friction experience for developers and users.
To foster an ecosystem, we model the packaging and distribution of service connectors and applications on web browser extensions. Service connectors and applications are packaged as self-contained archives including their source code and a manifest describing the permissions they require. Vaportrail is a decentralized platform: there is no official “app store”, only simple packages that are easily created, inspected and shared using familiar tools.

The aim of this thesis is to describe the design, prototype implementation and work related to Vaportrail. We also explore the application design space enabled by such a platform. The contributions of this thesis are threefold:

• We describe the design space for a modern, privacy-preserving platform for personal data and summarize related commercial and academic work;
• We present Vaportrail, a prototype implementation, and discuss the specific challenges and trade-offs encountered in its development;
• We discuss several examples of fun and useful apps enabled by the platform, and consider the viability of Vaportrail’s deployment model as an alternative to conventional SaaS.

At present the system is very much a research prototype; however, enough of the core functionality has been implemented to evaluate the design of key mechanisms and APIs. In Chapter 2 we discuss the overall design of the system. In Chapter 3 we present the implementation. In Chapter 4 we summarize related work, and in Chapter 5 we conclude.

Chapter 2
Design

Vaportrail is a self-contained platform for personal data and applications. It must be inexpensive for users to operate, simple to maintain and extensible throughout: from the appliance architecture, to the services it can ingest data from, to the core isolation mechanisms and APIs exposed to applications. In addition to these goals, we aim to demonstrate that such a system can provide a modern and familiar web-based user interface.
We argue that user experience is a critical factor in the adoption and success of a system like Vaportrail. We believe Vaportrail achieves these goals through infrastructure-agnostic packaging that gives users a choice of hosting environments, a plugin architecture for service connectors, modular design throughout, and careful adherence to the principle that our implementation should not break the familiar browser-based web experience. This chapter provides an overview of the design space and the specific considerations and trade-offs made in pursuit of these design objectives. We provide a summary of the design requirements at the end of the chapter.

2.1 A personal appliance

One of the guiding design goals for Vaportrail was that it should be a completely self-contained appliance that a user can operate with minimal cost and effort. The rise of Software as a Service has shifted user expectations toward a software delivery model in which the complexity (and cost) of providing sophisticated applications is largely centralized and hidden. Using a new application usually requires little more effort than pointing a web browser at a URL. While the centralized nature of this model presents several challenges for privacy and data ownership, it has a few attributes from a user experience perspective that are worth maintaining: applications are platform-independent JavaScript and HTML, execution is confined to a trusted sandbox (the browser), and application lifecycle is managed through familiar browser system primitives (tabs, history, bookmarks, etc.).

With Vaportrail we took the decision to make the costs of operating the system explicit to the user by default, to ensure that the design did not depend on the recovery of these costs through some form of business model.
At the same time, we embrace the merits of the SaaS model for application delivery and aim to provide the “best of both worlds”: a standalone appliance that is simple and inexpensive to deploy, with a familiar browser-based web application for interacting with services and applications. The virtual appliance should support deployment without special networking or storage configuration.

Because the appliance is self-hosted and maintained by the user, it should be capable of long-term operation without intervention. To achieve this, we use a modular, component-oriented architecture throughout the system. Platform components are decoupled and isolated from one another. When possible, we choose proven software, protocols and isolation mechanisms over newer or less stable options.

2.2 A modern user experience

The requirement that Vaportrail provide a familiar, browser-based UI presented several challenges in balancing robust application sandboxing with a seamless user experience. Specifically, it was important that it be possible to run several Vaportrail applications concurrently and that the sandbox not require multiple browser tabs, page reloads, popups, or frames. We also set the requirement that Vaportrail should not depend on a custom browser extension to provide a trusted/privileged execution environment, because extension semantics and APIs vary across platforms and browsers. Restricting the design to standard browser APIs and features that do not vary across platforms makes Vaportrail usable on a larger number of devices and safer, since it requires no elevated privileges, while still providing a modern single-page application experience.

2.3 Simple trust model

A simple and plainly stated trust model should be central to any device that consolidates highly personal data, and Vaportrail is no exception.
It is important from a practical perspective, in that the system should be simple to use and reason about. It also gets to the heart of one of our broader goals with Vaportrail: to demonstrate that there are architectural patterns that support familiar application primitives without nebulous privacy implications, and that these patterns can be deployed today. It is a choice, and not a technical necessity, to build software without privacy controls. We state the platform’s trust model here and discuss the design space and implications for each item below:

1. The user must trust the infrastructure that they choose to run their Vaportrail instance on.
2. The user must trust the core platform base that mediates access to their personal data.
3. The user must grant service connectors a) access to the external service(s) they integrate with, and b) platform-mediated write access to personal data schemas.

2.3.1 Trust the infrastructure

Because Vaportrail is a self-hosted virtual appliance, the user is free to run the software on any compatible infrastructure platform. This allows the user to balance cost and privacy according to their own priorities. An extremely cautious user might deploy the appliance on physical hardware that they own, while others might find a public cloud server, or even an instance hosted by someone else, acceptable. There is a gradient of options between the two extremes. Whatever the case, the user must trust that the underlying hardware and software will not compromise their data. Verifying the low-level safety and integrity of the environment is beyond the scope of this project; we rely on the user to make a choice that is consistent with their priorities.

2.3.2 Trust the platform

The user must trust the platform core, which together with the underlying infrastructure forms the Trusted Computing Base (TCB). The platform core consists of the isolation mechanisms and APIs that mediate access to personal data by untrusted third-party applications and connectors.
The platform will ensure that the user is made aware of the specific data schemas and permissions that applications and connectors request at install time. The fundamental guarantee of the platform is that data cannot be exfiltrated except through explicit, user-approved, platform-mediated sharing mechanisms on a case-by-case basis.

2.3.3 Service connectors

The platform prevents service connectors from negatively impacting the stability of the appliance or the integrity of the data stores; however, it cannot police their interactions with external services. The user must trust the connectors they install to operate in good faith with respect to the API access granted to them and the data that flows through them. We expect the number of service connectors to be small compared to the number of applications. Our hope is that trusted service providers will build their own connector integrations, and that the community will vet third-party connectors based on developer reputation and code reviews.

2.4 Personal data warehousing

Vaportrail can be related to traditional information infrastructure. The platform is conceptually similar to a data warehouse (the combination of an archive and an analytical platform), but for a single individual’s data instead of a large enterprise. Service connectors could be framed as Extract Transform Load (ETL) jobs, and applications as their analytical processing counterparts.

Although there are high-level comparisons to be drawn with traditional data warehousing, Vaportrail differs in two important ways: a) it is tailored to a single individual’s data and not a large enterprise, and b) it leverages a combination of specialized modern data stores instead of either a monolithic RDBMS or a “lake” of unstructured data. Reducing the scope to a single individual alleviates most performance and scalability concerns: the data is decidedly small by modern standards.
The relatively small scale allows us to consider more convenient row- and document-oriented databases that make storing and querying diverse schemas easier.

Leveraging multiple specialized data stores simplifies the transformations required in service connectors and makes a richer application-facing query API possible. An important trade-off versus using a single SQL-driven column store or similar is that the query interface is more complex, and applications will tend to be more coupled to the design choices of upstream connectors. A potentially significant limitation of this design is that schema migrations due to connector changes are more likely to break downstream applications. Mitigating this through a platform-managed schema migration mechanism was beyond the scope of the design at this prototype stage. Our current solution is to recommend versioned schema names.

Determining a set of data store interfaces that could meet the needs of a representative group of service connectors was a key design goal. The database architecture for an IMAP connector will be substantially different from one that imports videos from a social media service. As in other parts of the system, we aim for an extensible data layer so that unforeseen use cases can be met by adding storage and query interfaces as needed.
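
The dotted schema names in the Gmail example of Section 1.3 (gmail.objects.inbox.attachments, gmail.nosql.inbox.messages) suggest one way such an extensible data layer could be organized: the first segment names the owning connector, the second names the store interface, and the remainder is a path within that store. The TypeScript sketch below assumes exactly that convention, which is read off the example and is not a documented rule, and routes each name to whichever backend is registered for its store token. StoreBackend, MemoryBackend, DataLayer and parseSchemaName are hypothetical names, and the in-memory backend only stands in for a real SQL, object, key/value or document store.

```typescript
// Hypothetical sketch: the naming rule <connector>.<store>.<path...> is an
// assumption inferred from the Gmail example, not a documented Vaportrail API.

// Minimal contract a storage backend is assumed to satisfy.
interface StoreBackend {
  readonly kind: string; // store token used in schema names, e.g. 'nosql'
  write(key: string, value: unknown): void;
  read(key: string): unknown;
}

interface SchemaName {
  connector: string;
  store: string;
  path: string[];
}

// Split a dotted schema name into connector, store token and path.
function parseSchemaName(name: string): SchemaName {
  const parts = name.split('.');
  if (parts.length < 3 || parts.some((p) => p.length === 0)) {
    throw new Error('invalid schema name: ' + name);
  }
  const [connector, store, ...path] = parts;
  return { connector, store, path };
}

// In-memory stand-in for a real SQL, object, key/value or document store.
class MemoryBackend implements StoreBackend {
  readonly kind: string;
  private readonly data = new Map<string, unknown>();

  constructor(kind: string) {
    this.kind = kind;
  }

  write(key: string, value: unknown): void {
    this.data.set(key, value);
  }

  read(key: string): unknown {
    return this.data.get(key);
  }
}

// Routes schema names to backends; adding an interface is one register call.
class DataLayer {
  private readonly backends = new Map<string, StoreBackend>();

  register(backend: StoreBackend): void {
    this.backends.set(backend.kind, backend);
  }

  private resolve(name: string): { backend: StoreBackend; key: string } {
    const schema = parseSchemaName(name);
    const backend = this.backends.get(schema.store);
    if (backend === undefined) {
      throw new Error('no backend registered for store: ' + schema.store);
    }
    // Keying by connector keeps each connector's data in its own namespace.
    const key = [schema.connector, ...schema.path].join('/');
    return { backend, key };
  }

  write(name: string, value: unknown): void {
    const { backend, key } = this.resolve(name);
    backend.write(key, value);
  }

  read(name: string): unknown {
    const { backend, key } = this.resolve(name);
    return backend.read(key);
  }
}
```

Under this arrangement, supporting a new interface such as a time series store amounts to registering one more backend, and a versioned schema name (for instance a path ending in a hypothetical v2 segment) is simply a distinct path, so applications written against the older name keep reading it.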
Motivated by a thorough examination of the requirements of a few example connectors (discussed in Chapter 3), we arrived at a minimum set of storage interfaces for the initial prototype:

• SQL store providing efficient create, update and delete operations with column indexes for highly structured, row-oriented data
• Object store with support for large binary objects, for use as a bulk repository for video, images, text blobs or other large files
• Memory key/value store for values that change frequently, caches, persistent data structures and publish-subscribe functionality
• Document or “nosql” store for less structured data with variable or frequently changing schemas

We could imagine adding other potentially useful interfaces as future work (e.g. a graph database for capturing networks and dependency graphs, or a time series database for event or other time-indexed data), but most use cases are reasonably well served by a combination of the initial set of data stores.

By providing an abstraction layer on top of the native data store interfaces, the platform can restrict connector and application access by putting authorization logic on the data path and mapping platform permissions down to data store-specific access control mechanisms, e.g. a database user with a specific set of GRANTs in the case of an SQL store. System resources can be managed similarly by mapping high-level resource limits onto configuration options on the data stores.

2.5 An open and extensible platform

A guiding principle of Vaportrail is that it should be an open and extensible platform driven and owned by the community. It should invite extension and personalisation by individuals. This is more than a “feel good” design goal: for a platform to succeed, developers need to build connectors and applications and to extend the core platform capabilities. The existence of several SaaS products in this space suggests there is broad interest in leveraging personal data.
Our aim is to lay the foundation for an open, privacy-preserving alternative to commercial offerings.

2.5.1 Service connectors

Service connectors provide the logic to connect to external services and load data into Vaportrail. Each connector is a self-contained application that runs in an isolated environment with limited system resources. The platform does not restrict the outgoing network traffic of connectors or impose any requirements on how they communicate with external services. Service connectors are third-party components, and the wide array of protocols and authentication mechanisms used by external services makes over-specifying their implementation impractical. Instead, we assume connectors are well-intentioned and focus on insulating the platform from well-meaning, but perhaps badly behaved or broken, connectors. This is discussed in detail in Section YYY.

Connectors are shared and distributed using an open format inspired by browser extensions: essentially an archive containing a manifest and the connector code. The platform requires that the user accept the terms of the manifest during installation. This simple, open format is easy to read and write with existing tools, and is easily distributed with no strings attached. We leave building a central package manager or “app store” to the community.

2.5.2 Applications

Applications are untrusted third-party code that can be installed to provide fun and useful new functionality built on personal data streams. Much like service connectors, applications are distributed in a simple package format containing a manifest and code; however, there are important differences between connectors and applications. Applications can be developed in any language that compiles to JavaScript, and can only use a very narrow platform API to query data, persist application state, and render user interface (UI) elements.
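
To make the narrowness of this interface concrete, the TypeScript sketch below shows the kind of object an application might receive in place of the usual browser globals: a query call, per-application state, and a rendering call, each mediated by the platform, with queries checked against the schemas the user approved from the manifest at install time, and no network primitive at all. The ApprovedManifest shape, the AppApi methods and makeAppApi are hypothetical; the prototype’s actual interface is the subject of Section 3.3.5 and Appendix A. A real sandbox must also ensure that untrusted code cannot reach fetch or XMLHttpRequest through other globals, which this sketch does not attempt.

```typescript
// Hypothetical sketch of a narrow, permission-checked application API.
// Names and shapes are illustrative, not Vaportrail's actual interface.

// The part of an application manifest the user approved at install time.
interface ApprovedManifest {
  name: string;
  readSchemas: string[]; // schemas the application may query
}

// Everything the application can do. There is deliberately no network call.
interface AppApi {
  query(schema: string, match: (record: unknown) => boolean): unknown[];
  saveState(key: string, value: string): void;
  loadState(key: string): string | undefined;
  render(markup: string): void;
}

function makeAppApi(
  manifest: ApprovedManifest,
  store: Map<string, unknown[]>,
  renderSink: (markup: string) => void,
): AppApi {
  // State is private to this application instance.
  const state = new Map<string, string>();
  return {
    query(schema, match) {
      // Authorization check on the data path.
      if (!manifest.readSchemas.includes(schema)) {
        throw new Error(manifest.name + ' may not read ' + schema);
      }
      return (store.get(schema) ?? []).filter(match);
    },
    saveState(key, value) {
      state.set(key, value);
    },
    loadState(key) {
      return state.get(key);
    },
    render(markup) {
      // Checking markup before it reaches the DOM is omitted from this sketch.
      renderSink(markup);
    },
  };
}
```

Building a fresh object per application instance keeps state and permissions from leaking between applications that run concurrently in the same dashboard.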
The runtime environment for applications is extremely restricted compared to the standard browser Document Object Model (DOM) and JavaScript APIs. While our aim for the Vaportrail prototype was to demonstrate a sandbox that enables purpose-built applications, rather than to support running existing applications, we believe the sandbox could be extended to support a complete virtual DOM, and thus many popular libraries and frameworks. We focus on providing a low-friction format for sharing applications and a robust, extensible browser-based sandbox.

2.5.3 Platform core
The applications we use to create and interact with the content we generate form a diverse and ever-evolving ecosystem. For Vaportrail to remain relevant in such a dynamic environment, we acknowledge that even the core facilities of the platform should be open and extensible. To this end, we imagine Vaportrail as a kind of personal data “hub” that should invite extension to the way service connectors and applications interact with the platform, including the isolation mechanisms and protocols governing those interactions. We design for extensibility throughout the platform core by using a modular architecture that relies on simple, well-documented API contracts between components, allowing entirely different implementations of core facilities (e.g. the application sandbox) to be used interchangeably.

2.5.4 Intent-based sharing
Personal data does not exist in a vacuum, and we argue that a viable platform for leveraging personal data cannot either. Vaportrail must reconcile the need for robust privacy with the ability to intentionally share useful application results with external services. The platform accomplishes this with Intents[35], a platform-mediated sharing mechanism that is widely deployed in mobile operating systems. An intent allows the platform to match an application-provided share object (text, image, or other data) with a target capable of publishing the object to an external service.
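The matching step can be illustrated with a small Python sketch. It assumes each share target declares the MIME types it accepts, with wildcards such as image/* (the notation used for the share permission in Program 3.1), and returns the targets the user may choose from. The ShareObject and ShareTarget types and the matching rule are hypothetical, not Vaportrail's actual intent API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShareObject:
    """An application-provided item to share (hypothetical type)."""
    mime_type: str  # e.g. "image/png"
    payload: bytes


@dataclass
class ShareTarget:
    """A component able to publish shares to an external service (hypothetical)."""
    name: str
    accepts: List[str]  # MIME patterns such as "image/*" or "text/plain"


def mime_matches(pattern: str, mime_type: str) -> bool:
    """Return True if a MIME pattern like "image/*" accepts mime_type."""
    p_type, _, p_sub = pattern.lower().partition("/")
    m_type, _, m_sub = mime_type.lower().partition("/")
    if p_type == "*":
        return True  # "*/*" accepts anything
    return p_type == m_type and p_sub in ("*", m_sub)


def resolve_targets(share: ShareObject, targets: List[ShareTarget]) -> List[str]:
    """List candidate targets for the platform UI to offer the user.

    The application only supplies the share object; it never learns
    which target the user finally picks.
    """
    return [t.name for t in targets
            if any(mime_matches(p, share.mime_type) for p in t.accepts)]
```

Because the candidate list is shown to the user by the platform rather than returned to the application, the final destination of a share stays hidden from the untrusted code, which is the property the approval workflow below relies on.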
The content of each share is inspected by the user via a UI workflow before being handed off to the selected target to be published. This mechanism fully decouples the untrusted application from the sharing target and puts the user in control of an approval process, making each instance of data leaving the platform explicit. While sharing unavoidably creates the possibility of abuse by a malicious application, we believe Vaportrail mitigates the risk of such an exploit in two ways: 1) the application cannot know to which external service the data will be published, and thus would need access to several possible targets in order to recover the data for exploitation, and 2) the user must explicitly grant applications permission to share. We believe that in practice the value of a sensible sharing mechanism will outweigh the potential risk of its exploitation.

2.6 Enabling personal data applications
Applications that operate on even a single personal data stream (e.g. banking transactions or email) demand a degree of trust that is extremely difficult to attain. The developer or service provider requires a level of credibility that is scarcely achieved except by enormous corporations, entities in highly regulated sectors, or public institutions. Consequently, our most personal data is siloed within these trusted systems, making it extremely difficult for us to make our own copy, or to derive further value from it without unacceptable privacy risks. Services increasingly provide APIs or data export features that solve one aspect of the problem (allowing us to make our own copy), but loading that data into any other software means placing significant trust in a third party.

Vaportrail enables a new breed of personal data-consuming applications by providing a platform that prevents untrusted applications from exfiltrating data. This unburdens developers of the enormous cost and responsibility of being a legitimate fiduciary, and relieves users of the fear of having their data exploited.
An exciting corollary of this is that multiple sensitive data streams can safely be exposed to applications that we previously would not have trusted with any personal data. We believe this opens up an entirely new design space for personal data applications, and suggest that there are several new application archetypes in this space.

Watchdogs
Watchdogs passively monitor your data streams and alert you when something good, bad or out of the ordinary happens. They might employ outlier detection or other machine learning techniques to build a model of your activity and monitor deviations from it.

Aggregators
Aggregators extend the features of external services (e.g. search) by combining multiple data streams under a single UI. Imagine searching your emails and credit card transactions in one place, perhaps plotted on a timeline alongside your heart rate data.

Lifestyle
Lifestyle applications provide useful hints, insights and reminders by looking at correlations in your data streams. Perhaps your location data and your fuel purchases suggest that you could be achieving better fuel economy.

Productivity
Productivity apps will leverage your personal data to actively help you achieve goals. They may suggest you purchase healthier food, find more time- or cost-efficient modes of transportation for common routes, or combine health metrics with activity streams to determine your most productive times to work.

2.7 Summary
In this chapter we have reviewed the design objectives for Vaportrail and discussed the key features, capabilities and trade-offs involved in meeting them. The major design goals for Vaportrail include:

• Simple deployment and management for an individual
• A familiar, modern user experience
• A simple trust model
• An open and extensible platform

We believe the Vaportrail prototype achieves these goals.
In Chapter 3 we discuss the implementation in detail, and in Chapter 4 we survey related work.

Chapter 3
Implementation

In this chapter we discuss the prototype implementation of Vaportrail. The prototype is a self-contained, standalone virtual appliance that provides a fully managed platform for importing and archiving personal data. The platform serves as a privacy-preserving runtime and mediator between personal data and untrusted third-party applications.

The primary goal of the implementation is to provide a working proof of concept that meets the design criteria set out in Chapter 2, and to demonstrate that even a naive “reference” implementation of Vaportrail can be practical, useful and fun. In particular, we aim to show that with a reasonable level of effort we can build a platform with a modern user experience that is easy and inexpensive to operate while providing strong, practical privacy guarantees. This implementation serves to validate the basic architecture and appliance form factor, isolation mechanisms, programming interfaces and cost expectations. We acknowledge that the true test of a system like Vaportrail would depend on observing it in the hands of users and developers over an extended period of time. Due to time constraints, we limit the scope of our evaluation to a discussion of how well the prototype meets the basic design goals, and to aspects of the system that can be measured at the scale of a single deployment.

Vaportrail was inspired by several privacy-preserving systems that came before it, and our implementation builds on the lessons and trade-offs they articulated.
Specifically, we credit Priv.io[52] with the basic idea that users might “bring their own infrastructure” to an application ecosystem, and Treehouse[44] with the notion of repurposing WebWorkers as execution environments for untrusted code. DataBox[42] provided a philosophical framework that guided the design specification of Vaportrail as a personal data hub, and inspired the virtual “form factor” of Vaportrail as a personal appliance.

Although there is a considerable amount of prior work in this area, we felt the development of a new system was justified by a few broad themes (limitations) in the related work:

1. requiring modified environments (custom operating system extensions, browsers, etc.) that are impractical for regular users,
2. depending on adoption by trusted service providers and/or changes to their business models, and
3. code isolation mechanisms that are difficult to verify or reason about in practice.

We argue that in the development of privacy-preserving systems, practical usability is as important as robust privacy controls. In many ways Vaportrail is only a novel arrangement of existing good ideas, with a focus on making them usable today. We also take advantage of external factors, like the declining cost and increasing reliability of public cloud infrastructure, to make a case for individuals operating their own virtual appliances.

We believe the prototype implementation of Vaportrail successfully meets the design criteria and demonstrates that, with further refinement, the system could be a practical fiduciary and platform for our personal data.
While we cannot extrapolate from our findings to conclude that a broader ecosystem around Vaportrail would be successful, we are hopeful that this initial work makes a compelling case for the possibility of wider adoption.

3.1 The appliance
In this section we discuss the packaging and core components of the platform. Vaportrail is designed as a personal appliance that can be operated by an individual with minimal technical or operational intervention.

3.1.1 System overview
The platform is packaged as a virtual machine image based on a standard Ubuntu Server 16.04 LTS release. The LTS (long-term support) designation means that the release is focused on stability for enterprise applications and will receive maintenance and updates over an extended period. We chose Ubuntu largely for its familiarity, though a number of other Linux distributions might have served equally well. Although the machine image could easily be built for almost any virtualization environment, we chose to target the Amazon Machine Image (AMI) format used by Amazon’s Elastic Compute Cloud (EC2), for convenience. Given a Vaportrail AMI, users can create a virtual server running Vaportrail with resources (storage, CPU, RAM) tailored to their needs and budget. With a publicly visible IP address attached to the server, the user can point their browser at the platform as soon as the appliance is running.

Figure 3.1: The Vaportrail appliance runs on a trusted infrastructure provider. Service connectors load and transform data, which is then stored using the platform API. A web dashboard allows the user to safely expose their data to applications.

At a high level, the platform implementation is composed of two pieces: the backend (components that run directly on the virtual machine) and the frontend (components that run in the browser). Backend components run in Linux containers managed by Docker[48], while the frontend components are served over HTTPS to the user’s browser and executed client-side.
Running backend components (databases, service connectors, platform API servers, etc.) in containers affords us a) fine-grained and dynamic control over their isolation from one another, and b) the ability to limit the system resources (CPU, memory) that any one component can consume. Docker images provide us with an open, versionable and familiar format for packaging and distributing Vaportrail components. By running the components in isolated containers and having them communicate via (generally narrow) API contracts, we achieve a highly modular design that facilitates testing, upgrades and even wholesale replacement of component implementations if the need arises. In order to orchestrate the various component containers, we run a special platform service in a privileged mode that allows it to create, configure, stop and start other containers. In keeping with the principle of least privilege, even the platform service does not have full control of the system, only the limited set of capabilities necessary to create containers at or below its own privilege level.

The platform dashboard that the user interacts with is served by a single web application (vaportrail-wsgi) that both serves the static files (HTML, JavaScript and CSS) and acts as a RESTful API server, providing endpoints for authentication and all other platform operations: installing, updating and removing applications and connectors, selecting data, and persisting state. The frontend application is similarly modular: a privileged monitor establishes a session with the backend via the API and orchestrates the execution of sandboxed applications in isolated WebWorkers. The application sandbox traps platform API calls and forwards them to the monitor, which authorizes the operation and makes the necessary calls into the backend before returning control (and any results) to the sandboxed application.
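The trap-and-forward cycle can be modelled in a few lines. In Vaportrail the monitor and sandbox are JavaScript running in the browser and exchange messages via postMessage(); the Python sketch below substitutes asyncio queues for those channels to show the request, authorize and reply steps. The message fields, the permission names and the query_backend helper are hypothetical.

```python
import asyncio
from typing import Any, Dict, Set


async def query_backend(msg: Dict[str, Any]) -> Any:
    """Hypothetical stand-in for the monitor's call into the platform API."""
    await asyncio.sleep(0)  # pretend to perform network I/O
    return f"rows for {msg['args']['name']}"


async def monitor(inbox: asyncio.Queue, outbox: asyncio.Queue,
                  granted: Set[str]) -> None:
    """Trusted side: authorize every trapped call before touching the backend."""
    while True:
        msg: Dict[str, Any] = await inbox.get()
        if msg["op"] == "exit":
            break
        if msg["op"] not in granted:
            reply = {"id": msg["id"], "error": "permission denied"}
        else:
            reply = {"id": msg["id"], "result": await query_backend(msg)}
        await outbox.put(reply)


async def sandboxed_call(inbox: asyncio.Queue, outbox: asyncio.Queue,
                         call_id: int, op: str, **args: Any) -> Dict[str, Any]:
    """Untrusted side: its only capability is to post a message and await a reply."""
    await inbox.put({"id": call_id, "op": op, "args": args})
    return await outbox.get()


async def demo():
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(monitor(inbox, outbox, granted={"data.query"}))
    ok = await sandboxed_call(inbox, outbox, 1, "data.query",
                              name="facebook.sql.timeline.comments")
    denied = await sandboxed_call(inbox, outbox, 2, "share")
    await inbox.put({"op": "exit"})  # shut the monitor down
    await task
    return ok, denied
```

Running asyncio.run(demo()) yields one successful reply and one denial. The sandboxed side never holds a reference to the backend, only to its message channel, which is what allows the monitor to sit on every data path.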
We achieve similar architectural benefits as in the backend by imposing a strict separation of concerns within the frontend implementation: the monitor, sandbox and API communicate only over asynchronous message-based interfaces, allowing them to evolve independently and allowing different implementations to be swapped in and out. Separating components behind an asynchronous interface also affords a particularly interesting opportunity for future work, in which we could imagine migrating running applications to the server (i.e. by migrating the sandbox) before the browser window is destroyed.

3.1.2 Threat model and security
Exposing any service on the internet is inherently risky, especially when that service is a box containing all of your most sensitive personal data. We take a number of precautions in the Vaportrail architecture to minimize the network-exposed surface of the platform: locking down all but the necessary ports, using random ports when a component needs to expose a service (e.g. to complete a third-party authentication workflow in a service connector), rate-limiting API endpoints, and applying the principle of least privilege throughout the system. We also employ strong password-based authentication over HTTPS, and use techniques to prevent Cross-Site Request Forgery (CSRF), Cross-Site Scripting (XSS) and phishing by malicious third-party applications (discussed in detail in Section 3.3). Beyond these precautions, the user has a considerable degree of control over (and responsibility for) securing their appliance on the network. For example, they might whitelist only the range of IP addresses from which they intend to access their appliance.
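Such an allowlist would most naturally live in the cloud provider's firewall or security groups. If it were instead enforced by the appliance itself, the check could be as small as the Python sketch below, which uses the standard ipaddress module. The function name and the example networks are assumptions.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Return True if client_ip lies in any of the allowed CIDR networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: reject rather than guess
    for net in allowed_networks:
        network = ipaddress.ip_network(net, strict=False)
        # Membership between IPv4 and IPv6 objects is simply False.
        if addr in network:
            return True
    return False
```

For example, with the documentation ranges 203.0.113.0/24 and 2001:db8::/32 allowed, a request from 203.0.113.7 is admitted and one from 198.51.100.1 is refused.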
They might employ circuit-breaking middleware between the appliance and the internet to protect against denial-of-service attacks, or to provide online traffic analysis and alerting. Automating the provisioning of a more sophisticated, production-grade security stack was beyond the scope of our initial prototype, but we could imagine providing a more comprehensive deployment template via Amazon’s CloudFormation[8] (or similar) as future work.

3.1.3 Appliance lifecycle and costs
It is important to consider the entire lifecycle of the appliance in terms of maintenance and costs. One of the advantages of centralized SaaS is that the costs and maintenance are entirely absorbed by the service provider. With Vaportrail, we exploit the fact that cloud computing resources have become increasingly affordable, and with the entrance of several major competing providers (namely Google and Microsoft, in addition to Amazon) we expect the downward trend to continue. In 2017, a basic Vaportrail instance with 80 GB of storage, a single CPU and 8 GB of memory can be operated for roughly US$10 per month. In many cases, these resources fall under the limits of a free pricing tier. In general, Vaportrail is bound by storage capacity rather than compute, as it acts as an archive and offloads much of the application computing to the browser. Adding 100 GB of storage would cost less than US$5 on most cloud platforms[14][5][22] today.

Due to time limitations, we were not able to complete a streamlined update process for the prototype appliance. That process would essentially involve updating the operating system packages (e.g. apt-get update followed by apt-get upgrade), pulling the latest platform service container image from the Vaportrail registry, and potentially restarting the appliance VM. Given that the supported lifetime of the operating system release is five years, we imagine that a full re-spin of the appliance would be necessary after that period of time.
Data would be migrated either by mounting the outgoing appliance’s root volume on the new appliance, or by exporting data from one appliance and importing it into the other.

Backups can be achieved using conventional approaches, i.e. by snapshotting the root volume of the appliance and storing those snapshots on secondary storage (for example Amazon S3[32]). Interestingly, a Vaportrail service connector with sufficient permissions could serve as a backup/export tool targeting external storage services such as Dropbox[13], Google Drive[20] or even another Vaportrail instance. Such a connector would require an uncommon degree of access and, as discussed in Section 3.2, the user would be required to grant it explicitly.

3.1.4 Backend platform components
The backend of the appliance hosts several important components of the platform. Each of these components runs in its own Docker container and is allocated a slice of the system resources. In this section we describe each of the major components, as well as their communication and network topology within the appliance.

Container networking
The backend containers are network-isolated from one another using virtual bridge networks. Specifically, the trusted platform components run on one network (vplatform), while third-party service connectors each run on their own private network. A trusted platform API service is attached to both the platform network and each of the service connector networks as an “API gateway”, i.e. a single point of entry into the platform services.

Figure 3.2: Platform components and service connector containers run on isolated bridge networks. The platform API container is routable from service connector networks using the DNS name api.platform.

All containers that share a network are IP-addressable by one another and can resolve container names (e.g. api.platform) to IP addresses using DNS. The bridge network is a very lightweight abstraction that can easily accommodate thousands of networks and containers on a single host.
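The isolation property of this topology reduces to a simple rule: two containers can reach each other only if they share at least one bridge network. The Python sketch below models the layout of Figure 3.2 so that the rule can be checked. Apart from vplatform and api.platform, the container and network names are hypothetical, and the model ignores traffic to the host and the internet.

```python
from typing import Dict, Set

# Container name -> bridge networks it is attached to (illustrative layout).
topology: Dict[str, Set[str]] = {
    "api.platform": {"vplatform", "net-facebook", "net-health"},  # API gateway
    "postgres": {"vplatform"},                                    # data stores
    "redis": {"vplatform"},
    "facebook-connector": {"net-facebook"},                       # third-party
    "health-connector": {"net-health"},
}


def can_reach(a: str, b: str, topo: Dict[str, Set[str]]) -> bool:
    """Containers are mutually IP-addressable iff they share a network."""
    return bool(topo[a] & topo[b])


def attach_connector(name: str, topo: Dict[str, Set[str]]) -> str:
    """Give a new connector its own private network and join the gateway to it."""
    network = f"net-{name}"  # hypothetical naming scheme
    topo[f"{name}-connector"] = {network}
    topo["api.platform"].add(network)
    return network
```

Under this model every connector can reach api.platform, while no connector shares a network with a data store or with another connector, which is the property the gateway design depends on.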
This approach to container networking gives us very fine-grained programmatic control over network isolation between components running on the appliance. It also alleviates many of the challenges that are common when running multiple services on a single host, such as service port collisions and accidental exposure of services to the internet (e.g. if a server binds to 0.0.0.0:80 inside a container, it will not be bound to the appliance’s public IP). Although we do not leverage them in the Vaportrail prototype, containers (which are essentially process groups) can also be run under AppArmor[3] profiles to further restrict their network access, filesystem access, and other capabilities.

Platform API
The platform API is the top-level web application that serves the static files (HTML, JavaScript, CSS) that make up the Vaportrail dashboard UI, as well as the JSON-based REST endpoints that control the platform. The platform API is exposed as the default web server (ports 80 and 443) on the appliance to enable access from the internet, and is also discoverable as the DNS name api.platform from within service connector containers. The API is implemented as a Python web application and mounts several sub-application modules at URL base paths corresponding to their functions. For example, the authentication module is available at api.platform/api/v1.0/authn, while the Data API is accessed at api.platform/api/v1.0/data. All of the API endpoints expect JSON input and produce JSON output, and all except the static HTML endpoint require a valid authentication token to be present in the request. A very basic rate-limiting scheme wraps all endpoints to curtail intentional or accidental abuse.

Authentication and authorization
The platform API has two authentication mechanisms, used by the Vaportrail dashboard UI (frontend) and by service connectors, respectively.
The first mechanism is a fairly standard password-based challenge in which the user submits a password over a secure HTTPS form and receives a time-limited token with which to make subsequent API requests. A production implementation would likely employ a two-step authentication flow using a Time-based One-time Password in addition to the fixed password, but this was beyond the scope of our prototype. The second authentication mechanism, used by service connectors, exploits the container network topology to verify the identity of the client based on an immutable container name.

A service connector only needs to make a request to the authentication API that includes a random Universally Unique Identifier (UUID) Vaportrail instance secret in order to receive a valid API token. The container name is then automatically used to associate the session with the correct permission set for the connector. Once a client is in possession of a valid API token, each API call is authorized based on the permission set associated with the token. We implement both the authentication and authorization steps as middleware between the incoming HTTP request and the actual endpoint implementation, allowing endpoints to specify which authorizations they require in a declarative style. Authorizations can also be performed dynamically as needed, for example if the required permissions vary between requests to a single endpoint.

Figure 3.3: Service connectors authenticate with the platform API by presenting a secret the platform places in their environment at initialization. The connector’s IP address and immutable container name are used to establish the set of permissions that will be enforced for subsequent requests.

Data stores
As discussed in Section 2.4, the platform is built around several specialized data stores, each tailored to different types of data and access patterns.
Each data store is a standalone database server that provides at least one of the key storage interfaces identified by the design. The databases run on the trusted platform network, and access to them is mediated by the platform API. None of the third-party components that read or write data to the stores have direct network access to the database servers; all operations go through the abstraction layer provided by the platform API. While this implementation choice comes with significant performance overhead (versus allowing clients to communicate directly over native protocols), the value of having a trusted platform component on the data path justifies it: it provides a single point of authorization for each operation and hides the implementation backing each storage and query interface from clients. Because Vaportrail is a single-tenant, privacy-oriented platform, we prefer correctness over performance. The relatively small, personal scale means that even a 200x throughput/latency overhead does not noticeably impact the user experience.

Table 3.1: The data store implementations used by Vaportrail

Interface              | Implementation     | Notes
SQL store              | PostgreSQL[29] 8.4 | Postgres is a flexible, performant, modern row store that speaks an extended SQL variant.
Object store           | Riak CS[31] 2.0    | Riak CS is an S3-compatible object store built on the Riak KV store.
Memory key/value store | Redis[30] 3.0      | Redis is an in-memory store sometimes described as a data structure server. It also provides publish-subscribe functionality.
Document store         | MongoDB[27] 2.6    | MongoDB is a schemaless (“nosql”) document store with a flexible JSON-based query language.

Each of the data stores has read-write access to a dedicated volume on the appliance’s filesystem. A special volume mount is necessary in order to persist data beyond the lifetime of any one container, as the default container filesystem only lives as long as the container itself.
Table 3.1 gives an overview of the specific data store implementations chosen to meet the design requirements. In general, we chose the latest stable release of the most established system in each category. Many of the data stores are designed to run at scale, with configurable replication and failover policies for high availability; for simplicity, we run them in their simplest single-node configuration. We did not implement health checks or any kind of watchdog process for the data stores, which would be a wise addition in a production implementation.

Data API
The data API is an abstraction layer on top of the data stores and forms an integral part of the platform API. It provides a write interface with a driver tailored to each data store that enables create, update and delete operations, with semantics determined by the specific store. Similarly, the read (or query) interface provides a driver for each data store that maps a user-provided query down to a store-specific request. All of the data stores are addressable under a single unified namespace with the structure ..., which allows the data API to map each request to a specific store driver, and allows that driver to apply whatever semantics it chooses to the (component, collection, schema) tuple. For example, a Facebook service connector might write to facebook.sql.timeline.comments to address a table named “comments” in the “timeline” schema of the “facebook” database in the SQL (PostgreSQL) data store. For writers, the component portion of the namespace tuple is fixed based on the identity of the writer. Unifying the data store namespace provides us with a convenient and familiar grammar for describing permissions across datasets.
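A minimal sketch of parsing such names and checking them against permission patterns follows. It assumes the four dot-separated fields are component, store, schema and collection, as the facebook.sql.timeline.comments example suggests, and that * matches exactly one field. The Address type and the helper names are illustrative rather than the platform's actual code.

```python
from typing import NamedTuple


class Address(NamedTuple):
    component: str   # e.g. "facebook"
    store: str       # e.g. "sql", "blobs"
    schema: str      # e.g. "timeline"
    collection: str  # e.g. "comments"


def parse_address(name: str) -> Address:
    """Split a dotted data API name into its four (assumed) parts."""
    parts = name.split(".")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected component.store.schema.collection: {name!r}")
    return Address(*parts)


def permits(pattern: str, name: str) -> bool:
    """True if each pattern field is '*' or equals the corresponding field."""
    return all(p == "*" or p == n
               for p, n in zip(parse_address(pattern), parse_address(name)))


def writer_address(writer: str, store: str, schema: str,
                   collection: str) -> Address:
    """For writes, the component comes from the writer's identity, not the request."""
    return Address(writer, store, schema, collection)
```

With these helpers, a grant of facebook.*.*.* covers facebook.sql.timeline.comments but not vaportrail.sql.metastore.users, and the store field of a parsed address is what would select the driver.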
An application might request access to facebook.*.*.* (all of the user’s Facebook data) or only to their images: facebook.blobs.timeline.images. The platform itself stores most of its state in vaportrail.sql.metastore.{users, applications, connectors} and vaportrail.blobs.metastore.code. Although constructing data API requests by hand can be tedious (and is currently the only option), a well-designed client SDK could abstract most of the complexity away for the developer.

Service connectors
Service connectors are third-party components that connect to external services and load data into Vaportrail via the data API. Connectors can be built in any language, using any framework or libraries the developer chooses, which is crucial given the degree of variation in the wider API ecosystem. Each connector runs in a container on a dedicated bridge network. The only other container that is routable from within the connector is the platform API service, which is discoverable through DNS. The random API secret required to connect to the platform API is injected into the container through an environment variable. Service connectors have unrestricted access to the internet to facilitate diverse authentication workflows and API access patterns. Service connectors are discussed in detail in Section 3.2.

3.1.5 Dashboard user interface
The dashboard (broadly, the frontend) of the appliance is the web application through which the user interacts with Vaportrail. The dashboard is a modern “single page” application composed of several JavaScript components. The entire application is loaded when the user navigates to the Vaportrail instance, and subsequently makes calls into the platform API from JavaScript. We provide an overview of the three major components of the UI in this section, and delve into greater implementation detail for each of them in subsequent sections. Throughout the frontend codebase we use only standard JavaScript, HTML5 APIs and the Bootstrap[6] UI component library.
The platform components are largely decoupled from one another and communicate asynchronously across a publish-subscribe bus, event handlers, or the browser’s postMessage() API.

Vaportrail dashboard. The dashboard is the first view the user sees when they log in to Vaportrail. It provides an overview of which service connectors and applications are installed, as well as the available storage capacity. The dashboard view integrates with the application monitor to launch and monitor Vaportrail applications, providing a “dock” along the top of the frame for running applications.

Figure 3.4: The Vaportrail dashboard provides an overview of installed components, applications and available storage capacity.

Application monitor. The application monitor runs in the background and serves as the trusted platform intermediary through which all platform API requests originating in sandboxed applications flow. It is also responsible for launching applications in the application sandbox, and hosts the engine that renders application UI elements. The monitor is responsible for enforcing application access to data stores via the data API.

Application sandbox. The sandbox provides a very restrictive runtime environment in which arbitrary JavaScript code can run completely isolated from the DOM and browser APIs. Each running application has a dedicated instance of the sandbox running in a browser thread separate from the application monitor and stream view. The sandbox hosts the untrusted application code and traps platform calls in the guest code, routing them into the monitor.

3.2 Service connectors
Service connectors are an integral part of the Vaportrail platform, serving as the conduits through which data is exported from external services and loaded into the platform. Like applications, service connectors are developed by third parties. They integrate with external services using the SDKs and APIs provided by those services.
A typical service connector will complete an authentication workflow with a service as part of its installation process, and subsequently synchronize with the service periodically or in real time, depending on the nature of the service and the application use cases imagined by the developer. Once a service connector is installed, the user can install applications that know how to process data stored by that connector.

3.2.1 Packaging and distribution
As discussed in Section 3.1.4, each service connector runs in a Docker container on an isolated virtual network within the platform appliance. The Docker tooling provides us with a very convenient format for efficiently distributing versioned binary container images with built-in data integrity checks. We build on a packaging convention that has become common for browser extensions to create our own standalone service connector package format, which consists of a gzipped tar file containing a JSON manifest and, optionally, the container image in the Docker image format.

The manifest is a simple JSON file that specifies details about the connector (name, author, version, etc.), the platform permissions it requires to run, any settings that can be configured by the user, and a reference to the container image, either as a package-relative path or as a Docker registry URL from which the image can be downloaded.
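Because a package is just a gzipped tar file with a JSON manifest, it can be opened with standard libraries. The Python sketch below reads a package with the tarfile and json modules, checks for a few required fields, and reports whether the image reference points to a file inside the package or elsewhere (which it treats as a registry reference). The manifest file name (manifest.json), the required-field list and that path-versus-registry heuristic are assumptions.

```python
import io
import json
import tarfile
from typing import Any, Dict, Tuple

# Assumed minimum set of manifest fields (based on Program 3.1).
REQUIRED_FIELDS = ("name", "author", "version", "image", "permissions")


def read_package(data: bytes) -> Tuple[Dict[str, Any], str]:
    """Return (manifest, image_kind) for a gzipped connector package."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        try:
            member = tar.extractfile("manifest.json")  # hypothetical file name
        except KeyError:
            raise ValueError("package has no manifest.json") from None
        if member is None:
            raise ValueError("manifest.json is not a regular file")
        manifest = json.load(member)
        missing = [f for f in REQUIRED_FIELDS if f not in manifest]
        if missing:
            raise ValueError(f"manifest missing fields: {missing}")
        if manifest["image"] in tar.getnames():
            kind = "bundled"   # package-relative path, e.g. assets/image.tar
        else:
            kind = "registry"  # assumed: anything else names a registry image
    return manifest, kind
```

Since nothing here is specific to Vaportrail beyond the manifest fields, the same inspection can be done by hand with tar and a text editor, which is part of the format's appeal.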
This package format has a number of benefits:

• It is simple to inspect and construct using standard tools (vi, Docker, tar)
• It is built on an open, and increasingly standard, container image format
• The JSON manifest is extensible, human-writable and machine-readable
• It supports distribution either as a complete binary package, or as a lightweight installer

Program 3.1: A JSON service connector manifest specifying package metadata and platform permissions required by a Facebook connector.

{
  "name": "Facebook Connector",
  "author": "Kalan MacRow",
  "description": "Facebook for Vaportrail",
  "version": "0.0.1",
  "image": "assets/image.tar",
  "permissions": [
    {"type": "write", "schema": "facebook.*.*.*"},
    {"type": "memory", "request": "1G"},
    {"type": "share", "mime_type": "image/*"},
    {"type": "host_port", "port": 8080}
  ]
}

Although we do not implement or enforce any particular cryptographic features (e.g. package signing), the simple, self-contained nature of the format invites the use of existing tools (e.g. GnuPG[18]) for this purpose. We could imagine incorporating package signing directly into the platform tooling in the future. Building on a simple, open package format is crucial to facilitating adoption in the open source community, and we believe the Vaportrail package format meets this design goal.

3.2.2 Isolation and developer flexibility
Running service connectors in resource-restricted, network-isolated containers helps us meet two important design goals: connectors are isolated from the core platform services, the data stores, and each other, while developers retain the flexibility to use whichever Linux distribution, frameworks, libraries, languages and general program structure they prefer in implementing the connector. The container environment provides a degree of self-determination similar to that of a dedicated virtual machine, with the caveat that the operating system must be a flavour of Linux.
Flexibility is critical because of the large degree of variation in how external services choose to expose APIs and data. There are a multitude of authentication protocols and workflows, some open standards (OAuth2[43], OpenID[49]) and many proprietary. SDKs are often made available in several languages, but installing and configuring them usually involves installing dependencies, updating or setting environment variables, and creating files at sensitive filesystem locations. In short, restricting the service connector sandbox to a more managed environment, as we do with applications, would significantly reduce the likelihood that many connectors are built. We believe implementing service connectors this way balances the need for isolation, in accordance with the platform trust model, with the need for developer flexibility in integrating with a highly fragmented wider API ecosystem.

3.2.3 Permissions and capabilities

Service connectors are able to request required and optional permissions from the user via the package manifest. The permissions are presented to the user upon installation for explicit approval before the connector image is loaded into the appliance environment and scheduled to run. The permissions fall into two broad categories.

System resources. Connectors can request specific memory, CPU slices and appliance port forwarding. These advanced permissions exist to accommodate connectors that a) perform non-trivial data transformations requiring more than the default memory and CPU allocations, or b) rely on authentication workflows or APIs that involve “callbacks” (e.g. WebHooks) from external services. The platform UI clearly identifies the gravity of these advanced permissions at install time so that the user can make an informed decision.

Data store access. Connectors request access to the specific data store schemas they need to read or write to.
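Schema permissions in Program 3.1 use patterns such as facebook.*.*.*, while connectors write to concrete names such as facebook.sql.posts.status. The sketch below is hypothetical. It assumes that schema names always have four dot-separated components (service, store type, namespace, name) and that * matches exactly one component. It also assumes a cpu permission type alongside the memory and host_port types shown in Program 3.1. It shows how such patterns could be checked, and how permissions could be split into the two categories described above.

```javascript
// Hypothetical sketch: assumes a schema name has exactly four dot-separated
// components (service.store.namespace.name) and that '*' matches one whole
// component, as in the pattern "facebook.*.*.*".
function schemaMatches(pattern, schema) {
  const p = pattern.split('.');
  const s = schema.split('.');
  if (p.length !== 4 || s.length !== 4) return false;
  return p.every((part, i) => part === '*' || part === s[i]);
}

// Assumed grouping of manifest permission types into the two categories.
// 'cpu' is a hypothetical type name; 'memory' and 'host_port' appear in Program 3.1.
const SYSTEM_TYPES = new Set(['memory', 'cpu', 'host_port']);
const DATA_TYPES = new Set(['read', 'write']);

// Splits a manifest's permissions so an install UI can highlight advanced
// system-resource requests separately from data store access.
function classifyPermissions(permissions) {
  const result = { system: [], data: [], other: [] };
  for (const perm of permissions) {
    if (SYSTEM_TYPES.has(perm.type)) result.system.push(perm);
    else if (DATA_TYPES.has(perm.type)) result.data.push(perm);
    else result.other.push(perm);
  }
  return result;
}

// True if some data permission of the given type ('read' or 'write')
// covers the concrete schema name.
function isSchemaAccessAllowed(permissions, type, schema) {
  return permissions.some((perm) =>
    perm.type === type && typeof perm.schema === 'string' &&
    schemaMatches(perm.schema, schema));
}
```

Applied to the Program 3.1 manifest, a write to facebook.blob.posts.photo would be allowed, and a read of the same schema would not.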
The schemas need not exist at install time: they will be created on the first write operation. However, the user must explicitly approve the connector's access. The platform UI clearly identifies when a connector is requesting a) read access to any schema, given the risk of exfiltration, and b) any access to a schema that already exists, given the risk of data corruption or exfiltration.

3.2.4 Connector lifecycle

The user can install connectors by uploading a package through the dashboard UI. The platform API places the package in a staging location and returns the manifest to the UI for the approval workflow. If the user approves the permissions and settings for the connector, the manifest is stored in the platform metabase and the connector image is imported into the appliance. The connector container is created and started using either the default system resource limits or the (approved) requested limits. If port forwarding is requested, the platform chooses a random free port on the appliance and maps it to the requested port in the container. The host IP address and forwarded port are then passed into the container in an environment variable so that they can be advertised as needed.

Figure 3.5: The installation flow for a service connector. A connector package is uploaded through the dashboard and unpacked to a staging location on the appliance filesystem. The user then approves or rejects the platform permissions required by the connector manifest via a UI workflow.

Once the connector has started, the user is free to access its configuration page, which can be used for changing settings or completing authentication workflows. The configuration page is served by the service connector itself. It is available through a random, temporarily mapped port accessible only from the user's current IP address. The connector then runs indefinitely, scheduling data ingestion as and when it deems appropriate.
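A connector that schedules its own ingestion must also cope with the platform's rate limiting, which is signalled with an HTTP 429 response. The sketch below shows one such ingestion pass. It is hypothetical in several respects: the `send` and `sleep` functions are injected stand-ins for the HTTP write and a timer, the records are assumed to carry a `timestamp` field and to be sorted by it, and exponential backoff is our choice of retry policy rather than something the platform prescribes.

```javascript
// Hypothetical sketch of a connector's write path with back-pressure.
// `send(record)` is an injected async function that performs the HTTP write
// to the platform data API and resolves to { status }.
async function writeWithBackoff(send, record, {
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  maxRetries = 5,
  baseDelayMs = 500,
} = {}) {
  for (let attempt = 0; ; attempt++) {
    const { status } = await send(record);
    if (status !== 429) return status; // success or a non-retryable error
    if (attempt >= maxRetries) {
      throw new Error(`gave up after ${attempt + 1} attempts (HTTP 429)`);
    }
    await sleep(baseDelayMs * 2 ** attempt); // exponential backoff
  }
}

// One ingestion pass over records sorted by ascending `timestamp` (an
// assumed field). Records at or before the checkpoint are skipped, and the
// checkpoint only advances after a successful (2xx) write.
async function ingestBatch(records, checkpoint, send, opts) {
  let latest = checkpoint;
  for (const record of records) {
    if (record.timestamp <= latest) continue;
    const status = await writeWithBackoff(send, record, opts);
    if (status < 200 || status >= 300) {
      throw new Error(`write failed with HTTP ${status}`);
    }
    latest = record.timestamp;
  }
  return latest;
}
```

Returning the new checkpoint lets the connector persist it, for example on its local filesystem, so that a restarted container resumes where it left off.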
If the user no longer wants the service connector, it can be stopped and completely removed through the dashboard. Although the connector software can be removed, we do not currently provide a mechanism for wholesale deletion of any associated data store schemas.

3.2.5 Reading and writing data

The primary purpose of service connectors is to load personal data streams into the platform. They accomplish this by writing into the data API endpoints of the platform API. The API is accessed from within the container using the DNS name api.platform. In the simplest case, the connector will load data from its upstream service(s) and write the objects or records directly to a schema that it has access to, with minimal transformation. In some cases, the connector may need to load some state (e.g. a checkpoint marker of some kind) from the data API or its local filesystem. It may also perform some non-trivial operations or transformations (e.g. aggregation, rollup, deduplication, downsampling) before writing the data into the platform. The platform will automatically provide back-pressure in the form of API rate limiting, communicated back to the connector as an HTTP 429 “Too Many Requests” response. It also enforces the schema access permissions configured at install time. In general, connectors can “fire and forget” data into the platform.

3.2.6 Example connectors

We implement three prototype service connectors for popular services to demonstrate the capabilities of the platform, and to serve as test mules in evaluating the design and implementation of the platform API and service connector runtime. We planned the connector implementations by referring to the best practices prescribed by the respective services, and without regard for any limitations that the platform design might impose on us.

Facebook

Facebook remains the dominant social network globally, with over 1 billion active users[40] in 2017.
It is a significant personal data sink for many users, who generate a continuous stream of status updates, photo uploads, likes and comments every day. The company provides a comprehensive API that gives users the ability to search, export and post content programmatically with nearly as much flexibility as the main Facebook website itself. Given its prominence and excellent API, Facebook would be a canonical data source for Vaportrail and thus a good candidate for prototyping. The service connector is based on a standard Ubuntu 14.04 base image and uses the Python bindings for the Facebook Graph API[16] to query the user's posts every 10 minutes. The connector requests write access to three data stores:

• facebook.sql.posts.status (status updates)
• facebook.sql.posts.comment (comments posted by the user)
• facebook.blob.posts.photo (photos posted by the user)

It creates the relational tables for statuses and comments using the data API's “create table if not exists” operation. The connector code simply waits until completion of the OAuth authentication workflow with Facebook creates a file at /var/auth_token. It then enters an infinite loop, syncing new posts before sleeping for 10 minutes. The connector only requests a very limited set of read-only Facebook permissions: public_profile, user_posts and user_photos. Although our prototype only imports a small subset of the data available, it is sufficient to validate our design goal that it should be simple and frictionless to develop connectors that load diverse types of objects.

Twitter

Twitter is the world's leading microblogging website in 2017, with over 320 million active users globally[50]. For many users, their stream of Twitter updates (tweets) represents an important personal log of daily activities and interactions. Like the Facebook connector, we base the Twitter service connector on an Ubuntu 14.04 base image and use the service connector configuration page to trigger an OAuth flow with Twitter.
We then use a Python script to access the Twitter User Stream REST API[33]. The User stream is an HTTP long-polling endpoint that returns a stream of JSON objects representing the activity of the authenticated user. We filter the stream to only include the user's updates. The connector only populates a single data schema:

• twitter.nosql.stream.events

In this case we push all stream updates (tweets, retweets, likes) into a single unstructured stream.events collection, which seemed more natural than routing them into separate tables. At the time of writing, the User stream API is being phased out in favour of a WebHook-based “push” model. We could readily support this change by having the service connector request an appliance host port mapping into a small stateless HTTP handler that would write the JSON payload into the stream.events collection.

Gmail

Email is a crucial archive of personal data in the form of personal correspondence, purchase receipts, travel itineraries, photos, and documents. Google's Gmail is one of the leading free email services and offers a number of options for programmatic access to the inbox, including traditional IMAP and RESTful APIs[19]. We built the Gmail connector in much the same way as the Facebook and Twitter prototypes. We opted for the more convenient, modern RESTful API, which provides high-level JSON interfaces to message threads and message content. We designed the data schema to align with the top-level JSON objects:

• gmail.nosql.inbox.threads
• gmail.nosql.inbox.messages
• gmail.blobs.inbox.attachments

The unstructured data store allowed us to begin storing data with a minimum of table or schema design effort. To demonstrate the flexibility of the platform service connector environment, we elected to use the Gmail Java SDK to develop the connector. To accomplish this, we install the open Java 8 runtime[28] in the container and bundle the Gmail SDK JAR files.
The connector runs a Java application with a single-threaded web server for configuration and authentication, and runs the syncing logic on a separate thread. Currently the prototype only fetches messages that arrive after the connector is installed. We imagine a more sophisticated implementation would allow the user to backfill messages for a period of time in addition to capturing new ones as they are sent or received.

3.2.7 Summary

In this section we have discussed the implementation of service connectors. This covered how they are deployed and isolated within the platform, how they are packaged for sharing and distribution, what permissions and capabilities they have at runtime, and how they read and write data into the platform. We have also discussed the implementation of three non-trivial prototype connectors that demonstrate the flexibility of the runtime environment and the ability of the platform to accommodate real-world data models.

3.3 Application Sandbox

In this section we discuss the implementation of the application sandbox, which provides the isolation necessary to expose personal data streams to untrusted applications running in the browser. The sandbox hosts JavaScript code written by third-party developers and works in an unmodified, modern HTML5 browser. Several platform APIs are baked into the sandbox environment that allow applications to query personal data stores, render UI elements, share results through intents and persist state. Our aim is to foster an ecosystem of purpose-built Vaportrail applications, not to support existing web applications in general. We therefore trade emulation of the standard browser environment (e.g. DOM and related APIs) for more robust isolation through a completely virtualized JavaScript interpreter. We argue that the straight-line performance overhead incurred, while significant, is quite acceptable in practice, because Vaportrail applications tend to be I/O bound.
We also hope to demonstrate that even the limited set of APIs we implemented in the prototype is sufficient for building interesting applications. Furthermore, with some effort it would be feasible to fully emulate the DOM and native APIs in the hosted JavaScript environment.

Figure 3.6: Vaportrail applications run in a dedicated JavaScript interpreter within a WebWorker thread. A trusted broker marshals application API calls over an asynchronous RPC interface and into the platform monitor. There, authorization checks are made before the monitor updates the DOM or makes requests into the platform API over the network.

3.3.1 WebWorkers as execution containers

Much like Treehouse[44], we exploit WebWorkers as a standard, natively supported execution container. The WebWorker provides a thread of execution (usually an OS thread, though this depends on the browser implementation) and a JavaScript context separate from that of the main web page. WebWorkers do not share memory or object references directly with the main page and can only communicate with it via a very narrow message-based interface (postMessage()). Out of the box, the worker provides a useful degree of isolation from the main page. First, the DOM and window objects are not accessible from the worker context. Second, the worker process can be monitored and terminated from the main page context. For simply preventing guest code from modifying the visible page, the worker abstraction alone may be enough. However, although WebWorkers cannot manipulate the DOM, they do have the ability to spawn child workers, make network requests and import arbitrary JavaScript code. Other sandbox implementations (discussed in Chapter 4) have attempted to limit access to these capabilities by several means: overwriting or interposing on native APIs before loading guest code, applying a Content Security Policy (CSP), or statically enforcing a safe subset of JavaScript.
The multiplicity of non-standard browser implementations and corner cases makes it difficult to argue convincingly that solutions based on these approaches provide comprehensive isolation. As in Treehouse, we initialize the WebWorker with a monitor (or broker) module that prepares the environment for executing guest code. Instead of locking or freezing native APIs, we embed a complete JavaScript engine and wire our platform APIs into it so that they appear “native” to guest application code. When the environment is ready, we hand control over to the application.

3.3.2 Hosted JavaScript runtime

The application sandbox is built on the js.js[51] runtime, a stripped-down version of Mozilla's JavaScript engine compiled first to LLVM[25] and then to asm.js[4] (a highly optimizable subset of JavaScript) using emscripten[15]. The js.js runtime is a complete JavaScript interpreter, equal in capability (if not performance) to the interpreter hosting it in the browser. To “wire” our platform APIs into the interpreter, we use broadly the same interfaces that a web browser would use to implement the DOM and other standard native APIs. The platform broker initializes a new instance of the JavaScript virtual machine, installs the platform APIs, and then loads the guest code into the runtime. A platform-defined entry point triggers a lifecycle event on which the application code registers a handler. The application-defined handler serves as the application's main() function.

Compute-intensive code running in js.js is roughly two orders of magnitude (200x) slower than the same code running in a native interpreter. Even so, we find the overhead almost unnoticeable in practice. Applications tend to be I/O bound, and much of the heavy lifting (animation, DOM updates) is offloaded to native JavaScript through high-level platform APIs.
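A back-of-the-envelope model makes the I/O-bound argument concrete. Suppose an application task spends $T_{\text{I/O}}$ waiting on platform API calls and $T_{\text{CPU}}$ executing JavaScript natively. Assume also that only the JavaScript execution is slowed by the roughly 200x factor reported above. The timings below are illustrative assumptions, not measurements of Vaportrail.

```latex
\begin{aligned}
T_{\text{native}} &= T_{\text{I/O}} + T_{\text{CPU}}, \qquad
T_{\text{js.js}} = T_{\text{I/O}} + 200\,T_{\text{CPU}} \\
T_{\text{I/O}} = 300\,\text{ms},\ T_{\text{CPU}} = 1\,\text{ms}
  &\;\Rightarrow\; \frac{T_{\text{js.js}}}{T_{\text{native}}} = \frac{300 + 200}{300 + 1} = \frac{500}{301} \approx 1.66 \\
T_{\text{I/O}} = 300\,\text{ms},\ T_{\text{CPU}} = 100\,\text{ms}
  &\;\Rightarrow\; \frac{T_{\text{js.js}}}{T_{\text{native}}} = \frac{300 + 20000}{300 + 100} = \frac{20300}{400} \approx 50.8
\end{aligned}
```

Under these assumptions, an I/O-dominated task slows down by well under 2x. A compute-dominated task approaches the full interpreter penalty.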
Applications that apply machine-learning algorithms, image processing or other tasks that require straight-line performance will be affected the most. Two factors help mitigate these concerns going forward: the relatively small scale of personal data, and the promise of WebAssembly as a replacement target for asm.js. We could have taken the virtualization a step further and provided a limited Linux environment within the browser. However, this would have come with increased complexity and would completely divorce the sandbox environment from the ergonomics of traditional web development.

The JavaScript virtual machine provides robust isolation at an acceptable performance cost. We believe it achieves the design goal of building a practical system based on mechanisms strong enough for users to trust with their personal data today.

3.3.3 Application sandbox lifecycle

Each Vaportrail application instance runs in a dedicated WebWorker and JavaScript virtual runtime created and managed by the platform monitor from the main page context. When the user launches an application from their dashboard, a new tab is created in the application dock. In the background, the monitor creates a new WebWorker based on the platform sandbox broker. When the new worker starts, the broker code initializes its hosted JavaScript interpreter by installing references to all of the global platform API objects. When the worker is ready to execute guest (application) code, it notifies the monitor via the postMessage() interface. The platform then responds with the application JavaScript code, which the broker loads into the interpreter.

Any compilation errors are passed back to the monitor, which then terminates the worker. If the application code compiles, the sandbox is placed in the ready state for the monitor to start at any time.
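The lifecycle described in this section can be summarized as a small state machine. It also covers the forced-termination paths the monitor uses. The state and event names below are hypothetical labels chosen for this sketch, not identifiers from the Vaportrail code.

```javascript
// Hypothetical model of the sandbox lifecycle described in the text.
// State and event names are illustrative, not Vaportrail identifiers.
const TRANSITIONS = {
  created:      { brokerInitialized: 'awaitingCode' },
  awaitingCode: { codeCompiled: 'ready', compileError: 'terminated' },
  ready:        { start: 'running' },
  running:      { unload: 'unloading' },
  unloading:    { unloaded: 'terminated' },
  terminated:   {},
};

// Events that end a live instance immediately, skipping the unload flow
// (a failed health check or an unauthorized platform API call).
const FATAL_EVENTS = new Set(['healthCheckFailed', 'unauthorizedCall']);

function nextState(state, event) {
  if (state !== 'terminated' && FATAL_EVENTS.has(event)) return 'terminated';
  const next = TRANSITIONS[state] && TRANSITIONS[state][event];
  if (!next) throw new Error(`invalid event "${event}" in state "${state}"`);
  return next;
}

let state = 'created';
for (const event of ['brokerInitialized', 'codeCompiled', 'start', 'unload', 'unloaded']) {
  state = nextState(state, event);
}
console.log(state); // terminated
```

Making invalid transitions throw is a deliberate choice in this sketch. For example, an unload sent to a sandbox that was never started is treated as a monitor bug rather than silently ignored.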
When the broker receives the start command, control is handed over to the hosted interpreter. The interpreter executes some bootstrapping code and triggers an application lifecycle event, ultimately entering the application handler. The platform monitor pings the worker at regular intervals to check for liveness, and will terminate the worker if it does not receive a response. When the user quits the application from the dashboard, the monitor sends an unload command to the broker, which then fires the unload application lifecycle event inside the interpreter. The application can handle this event to persist any state before exiting. Finally, the broker notifies the monitor that the application has unloaded, and the monitor terminates the worker. As discussed in Section 3.3.4, unauthorized API calls will also trigger termination of the application without notice.

Figure 3.7: When an application is launched, the monitor and the sandbox broker coordinate to set up the new environment for the application instance, creating a new DOM root in the dashboard and loading the application code into the sandbox interpreter.

3.3.4 Application monitoring

The Vaportrail monitor runs as a component within the dashboard and is responsible for launching applications (creating instances of the sandbox). It also supervises them at runtime via a) continuous health checks and b) authorizing platform API calls made by application code and enforcing platform policies.

While an application is running, the monitor pings the application worker every 10 seconds and expects a response within 1 second that includes any error codes from the sandbox interpreter. The purpose of the health check is to ensure that the JavaScript context within the worker has not become wedged (e.g. by entering a tight infinite loop) or halted unexpectedly (e.g. because of an unhandled exception).
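The health-check policy just described can be sketched as follows. The `ping` function is a hypothetical stand-in for the postMessage() round trip to the worker. It is assumed to resolve to an object carrying any interpreter error codes. The defaults mirror the 10-second interval and 1-second deadline from the text. Everything else, including the result shape passed to `terminate`, is an assumption.

```javascript
// Hypothetical sketch of the monitor's health check. `ping` stands in for
// the postMessage() round trip and resolves to { errors: [...] }.
function checkOnce(ping, timeoutMs) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ healthy: false, reason: 'timeout' }), timeoutMs);
  });
  const reply = Promise.resolve()
    .then(ping)
    .then((res) => (res && Array.isArray(res.errors) && res.errors.length > 0)
      ? { healthy: false, reason: 'errors', errors: res.errors }
      : { healthy: true })
    .catch((error) => ({ healthy: false, reason: 'exception', error }));
  // Whichever settles first wins; the deadline timer is always cleared.
  return Promise.race([reply, timeout]).finally(() => clearTimeout(timer));
}

// Runs checkOnce every `intervalMs` and calls `terminate` on the first
// failure. Returns a function that stops the checks (e.g. on a normal unload).
function startHealthChecks(ping, terminate, { intervalMs = 10000, timeoutMs = 1000 } = {}) {
  let stopped = false;
  const handle = setInterval(async () => {
    const result = await checkOnce(ping, timeoutMs);
    if (!result.healthy && !stopped) {
      stop();
      terminate(result);
    }
  }, intervalMs);
  function stop() {
    stopped = true;
    clearInterval(handle);
  }
  return stop;
}
```

A worker stuck in a tight loop never answers the ping, so the timeout branch wins the race. A halted interpreter instead reports error codes in its reply.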
The WebWorker isolation ensures that a broken or misbehaving worker cannot adversely affect the responsiveness of the main page or other Vaportrail applications. However, without a health check the application could crash or become unresponsive without the platform (or the user) knowing.

The monitor is also responsible for authorizing platform API calls that originate within sandboxed applications. If an application attempts to make an unauthorized API call, we terminate it immediately, bypassing the normal unload lifecycle flow. The reason for this is twofold. First, we provide a platform permissions API that allows an application to check whether it has permission to execute a particular API call. This is necessary to support optional permission grants, and to allow applications with pattern-based permission grants (discussed in Section 3.4) to test specific access at runtime. Thus, there is no need for an application to issue an unauthorized API call to probe for permission. Second, issuing unauthorized calls strongly implies the app may be malicious, or at least broken. In either case, we err on the side of caution by implementing a zero-tolerance policy.

With relatively heavy-handed isolation of application code in place, it could be argued that there is little need for limiting application access to data through fine-grained permissions. With no access to native APIs, and only the small platform API and a mature JavaScript interpreter as an attack surface, data exfiltration is unlikely. We argue that there are at least three compelling reasons to gate access to data and sharing:

1. It is conceivable that in the future we would add permissions allowing applications more access to the network, increasing exfiltration risks. Having a framework in place that limits exposure to specific data sets will be important.
2. As a design axiom of secure systems, we apply the principle of least privilege throughout the platform.
Restricting applications to only the data they need is in keeping with this principle.
3. Clearly defined permissions allow the platform to maintain a dependency graph of applications and data schemas. This facilitates a better user experience when a service connector is being upgraded or removed, which can affect the downstream application(s).

Although the application sandbox provides robust isolation, we implement these additional layers of monitoring and authorization. They further protect against broken or malicious applications, and against the consequences of a sandbox escape.

3.3.5 Platform APIs

The application sandbox embeds a number of platform objects and APIs that applications can use to query data stores, trigger sharing intents, render UI elements and persist application state. The platform APIs are broken into namespaces (ui, net, sharing, query, etc.) and wired directly into the sandbox interpreter. They therefore appear to guest code as “native” APIs, alongside other JavaScript standard library classes and objects.

RPC mechanism

We implement a custom RPC protocol that transparently proxies operations on platform API objects within the sandbox interpreter through the broker and across the postMessage() interface to corresponding components in the monitor. In the other direction, we route events from the monitor side into the application by replaying them on objects in the interpreter. The objects that the application code interacts with inside the sandbox are little more than lightweight proxies instrumented to forward method calls and property accesses through the RPC mechanism. Component implementations sanitize all arguments and data originating within applications to protect against script injection.

Program 3.2 Method calls on remote objects are implemented by mapping a (taskID, objectID) pair to a specific object instance and applying the method on a set of wrapped arguments.
Argument wrapping allows us to transparently support callbacks into the sandbox interpreter by hiding function pointer semantics in simple callable functions.

// Monitor-side handler for method calls proxied out of the sandbox.
rpc['__method'] = (function(objId, method, args, rpc) {
  // Look up the task (application instance) and the target object.
  var task = this._tasks[rpc.taskId],
      obj = task.objects[objId];
  // Wrap function arguments so they can call back into the sandbox.
  args = this.wrapMethodArgs(args, rpc);
  obj[method].apply(obj, args);
  return true;
});

platform.ui

The platform UI toolkit provides a number of user interface components that applications can use to render static or interactive UIs. The platform establishes a DOM root node for the application, and the platform API classes are based on Bootstrap Components[6]. The components created by the platform.ui toolkit are consistent in form with the rest of the Vaportrail dashboard, but are styled to be visually distinct from trusted platform UI elements. Applications are given very little control over the visual style of the elements. This ensures that a) applications cannot “disguise” themselves as platform features, and b) a consistent visual style is maintained across applications. A complete listing of the available UI components is available in Appendix A.

platform.net

The platform network API allows applications to interact with the network subject to very restrictive permissions. Currently the only network access available is to fetch a pre-approved URL once every 24 hours. Construction of the actual HTTP request is handled in the monitor. The application only passes an identifier corresponding to a URL that was included in the application manifest and approved at install time. Preventing the application from fetching the URL more than once per day mitigates the risk of the mechanism being abused (to any practical extent) as an exfiltration channel. Providing a unidirectional mechanism to fetch a URL enables a class of applications that rely on evolving external data sources (e.g.
gas prices, interest rates or machine-learning models).

platform.share

The platform sharing API allows applications to integrate with a platform-mediated sharing mechanism. The application signals that an item is shareable, and the platform matches the item with a list of installed service connectors capable of handling it, based on the type of data. Currently platform.share supports only two rudimentary forms of sharing, text/* and image/*. However, we imagine the API evolving to support more sophisticated, platform-managed channels. We did not have time to implement a service connector with sharing support, although the mechanism is supported. Service connectors can specify which share types they support (based on a MIME-type pattern) in their manifest. They can then poll a platform API to receive shares that have been queued for them to handle.

platform.query

The platform query API allows applications to execute queries against the personal data stores that they have permission to access. The API is read-only and presents a unified interface to the various storage interfaces. In our prototype, applications need to be aware of the query language used by the underlying data store. A more sophisticated implementation might provide a unified, higher-level query language. All results are returned as JSON, including binary objects. The exact format of the JSON depends on the type of data store. For example, the SQL store will return a list of lists representing a result set, while the document store will return a list of objects. We do not support pagination or streaming for large result sets: the entire response is returned for each query. Applications can implement streaming or pagination by making repeated queries with the appropriate filter clauses.
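One way an application could page through a large result set by re-issuing queries is keyset pagination: each query asks for rows after the last key it has already seen. The sketch below assumes rows carry a monotonically increasing `id` field. `runQuery` is a hypothetical async stand-in for a platform.query call whose filter clause selects `id > afterId`, ordered by `id`, limited to `limit` rows. Neither the field nor the wrapper is part of the documented API.

```javascript
// Hypothetical sketch: keyset pagination over repeated queries.
// `runQuery(afterId, limit)` stands in for a platform.query call whose
// filter clause selects rows with id > afterId, ordered by id, at most
// `limit` rows, e.g. in SQL:
//   SELECT * FROM messages WHERE id > ? ORDER BY id LIMIT ?
async function* paginate(runQuery, limit) {
  let afterId = null; // null means "start from the beginning"
  while (true) {
    const page = await runQuery(afterId, limit);
    for (const row of page) yield row;
    if (page.length < limit) return;    // a short page means no more rows
    afterId = page[page.length - 1].id; // resume after the last row seen
  }
}
```

An application would consume the rows with `for await (const row of paginate(runQuery, 100)) { ... }`. It holds only one page in memory at a time, even though each underlying query still returns its page in full.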
Even once the monitor has authorized a specific query API call for the application, the monitor itself only has read-only access to non-platform data schemas through the data API.

platform.localStorage

We emulate the standard HTML5 LocalStorage[37] API (window.localStorage) in the sandbox environment to provide applications with a general-purpose facility for persisting state across runs. The LocalStorage API is implemented as a global object with a getItem(key), setItem(key, value) interface. Any properties set on the global object are automatically relayed into the monitor and saved to a platform-managed application state store. When the platform instantiates a new application sandbox, the saved state is passed into the worker along with the application code, and the LocalStorage is initialized before the application code executes. All values are automatically converted to strings, as in the standard LocalStorage implementation. We do not support the StorageEvent interface for notifying other instances of the same application of changes to the storage object.

3.3.6 Summary

Vaportrail applications run in a robust and flexible sandbox environment based on dedicated JavaScript interpreters, using WebWorkers as execution containers. Our implementation is cross-platform and runs in unmodified modern browsers. We find the performance overhead imposed by the hosted JavaScript interpreter to be acceptable in practice today, and we expect the performance of this approach to improve with time. We leverage js.js's interface for binding functions and objects into the interpreter context to expose a range of platform APIs that developers can use to build rich data-driven applications. The platform APIs available to applications are built on a simple RPC mechanism between the sandbox and the platform, and are easily extended.
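Program 3.2 shows the monitor side of the RPC mechanism. The sandbox side can be sketched as a proxy that turns method calls into messages and resolves a promise when the matching reply arrives. Everything here is hypothetical: `post` stands in for postMessage(), and the message fields are illustrative rather than Vaportrail's wire format. Unlike Program 3.2, this sketch does not wrap function-valued arguments.

```javascript
// Hypothetical sketch of the sandbox side of an RPC proxy. `post(msg)`
// stands in for postMessage(); message fields are illustrative only.
// Function-valued arguments are not wrapped here (Program 3.2 does that).
function createRpcClient(post) {
  let nextCallId = 0;
  const pending = new Map(); // callId -> { resolve, reject }

  // Returns a lightweight proxy whose method calls become RPC messages.
  function remoteObject(taskId, objectId) {
    return new Proxy({}, {
      get(_target, method) {
        // Hide 'then' so the proxy itself is never treated as a promise.
        if (typeof method !== 'string' || method === 'then') return undefined;
        return (...args) => new Promise((resolve, reject) => {
          const callId = nextCallId++;
          pending.set(callId, { resolve, reject });
          post({ kind: '__method', callId, taskId, objectId, method, args });
        });
      },
    });
  }

  // Called when the monitor's reply for a given call arrives.
  function handleReply({ callId, result, error }) {
    const entry = pending.get(callId);
    if (!entry) return; // unknown or duplicate reply: ignore
    pending.delete(callId);
    if (error !== undefined) entry.reject(new Error(error));
    else entry.resolve(result);
  }

  return { remoteObject, handleReply };
}
```

Correlating replies by a per-call identifier lets several asynchronous calls be in flight at once. This matters because nearly all platform API interaction is asynchronous.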
We believe our prototype application sandbox meets the design goals of providing robust isolation that is practical and works today. It also provides a familiar developer and user experience that will foster adoption of the platform.

3.4 Applications

Vaportrail applications complement service connectors by providing the user with a way to leverage their personal data in new tools, visualizations, and games developed by untrusted third parties. In this section we discuss how applications are packaged and distributed, general development using the platform-provided APIs, and permissions and capabilities. Finally, we present three prototype applications that demonstrate the flexibility of the platform.

3.4.1 Packaging and distribution

Much like their service connector counterparts, applications are packaged in a simple, self-contained format that facilitates sharing through conventional channels (e.g. email or HTTP download). Like service connectors, application packages can be built using standard, familiar tools like tar and a text editor. The package is a gzipped tarfile containing an application manifest, the application code and any assets (images or other dependencies). As with service connectors, the application manifest is specified in JSON and describes the permissions the application requires to run.

Vaportrail applications can be written in JavaScript, or in any programming language that can be compiled to JavaScript, e.g. TypeScript[34], CoffeeScript[9], Dart[11], or Caja[7]. In our prototype implementation, all of the application code must reside in a single file. This would be an unreasonable limitation for a release implementation, but a standard form of module loading could easily be added.

3.4.2 The application manifest

The application manifest is a machine-readable (and human-readable) JSON file that describes various properties of the application (name, version, author, release date, etc.) as well as the platform permissions it requires.
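To make the install-time role of manifest permissions concrete, the sketch below resolves a manifest's permission list against the user's decisions. Each permission may be either required or optional. The `optional` flag is an assumed encoding, because Program 3.3 does not show how optional permissions are marked. The `userApproves` callback is a hypothetical stand-in for the approval UI.

```javascript
// Hypothetical sketch: resolve manifest permissions against the user's
// install-time decisions. The `optional` flag is an assumed encoding;
// `userApproves(perm)` stands in for the approval UI.
function resolveInstall(permissions, userApproves) {
  const granted = [];
  for (const perm of permissions) {
    if (userApproves(perm)) {
      granted.push(perm);
    } else if (!perm.optional) {
      // A declined required permission blocks installation outright.
      return { installable: false, granted: [], denied: perm };
    }
    // A declined optional permission is simply left out of `granted`.
  }
  return { installable: true, granted };
}
```

Because declined optional permissions are dropped rather than recorded as denials, the application itself must check at runtime which optional grants it actually received.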
The manifest fully specifies which data schemas the application requests access to, which URLs it can fetch data from, and which types of data it is capable of sharing. Any of these permissions can be marked as required or optional: an application cannot be installed if one or more of its required permissions are not granted by the user. Whether or not optional permissions were granted can be discovered by the application at runtime using the platform.permissions.can(permission) API. We outline the application permission classes in Table 3.2. All applications have access to all platform.ui components as well as platform.localStorage.

Program 3.3 The JSON application manifest specifying package metadata and the platform permissions required by the Radar application.

{
  "name": "Radar App",
  "author": "Kalan MacRow ",
  "description": "Radar for Vaportrail",
  "version": "0.0.1",
  "code": "src/radar.js",
  "permissions": [
    {"type": "read", "schema": "facebook.*.*.*"},
    {"type": "read", "schema": "twitter.*.*.*"},
    {"type": "read", "schema": "gmail.*.*.*"},
    {"type": "network_get", "url": "http://", "id": "SPAM_DB"},
    {"type": "window_open", "domain": "*.facebook.com"},
    {"type": "window_open", "domain": "*.twitter.com"},
    {"type": "window_open", "domain": "*.gmail.com"}
  ]
}

3.4.3 Application lifecycle

Vaportrail applications have an event-driven lifecycle that is closely related to the platform sandbox lifecycle, and should be familiar to developers accustomed to building browser-based software. When the user launches an application from the platform dashboard, the monitor creates and initializes a new instance of the application sandbox.

When the sandbox reaches the ready state, and the application code has successfully been loaded into the interpreter, the application lifecycle begins. The broker managing the interpreter hands control over to the interpreter, which evaluates all code in the global scope, including a synthetic (i.e.
generated) bootstrapping function that in turn fires the platform.onload event. The application code should use platform.addEventListener('load', handler) to register a handler for this event. The onload handler is effectively the application’s “main” function, and should be used to load state, issue queries and create UI elements.

Table 3.2: Application permission classes

Permission class | Example | Description
Query data | {\"type\": \"read\", \"schema\": \"gmail.sql.inbox.messages\"} | Request access to a data schema. Wildcards can be used in any component of the schema name.
Share content | {\"type\": \"share\", \"content\": \"text/plain\"} | Request access to share content by MIME type. Wildcards can be used.
Network access | {\"type\": \"network_get\", \"url\": \"http://.../prices.json\", \"id\": \"PRICE_DB\"} | Network access capabilities. We only support fetching approved URLs by name. Wildcard patterns are not supported in URLs.
Navigation | {\"type\": \"window_open\", \"domain\": \"(.*).facebook.com\"} | If enabled, the application can use platform.open to open browser windows to domains that match the specified domain pattern.

Program 3.4 A “Hello, World!” Vaportrail application that stores the date of its last run in localStorage, logs a message to the browser console, and creates a modal dialog with a familiar salutation.

'use strict';
!function () {
  platform.addEventListener('load', function (e) {
    localStorage['lastRun'] = new Date() + '';
    console.log('Hello, log!');
    platform.ui.alert('Hello, world!');
  });
}();

Figure 3.8: The lifecycle of a Vaportrail application.

Nearly all interaction with the platform API is asynchronous. The application may also register a handler on the platform.onunload event, which is fired by the monitor when (a) the user closes the application, or (b) the browser window is closed.
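The event ordering described above can be simulated outside the sandbox. The sketch below is a hypothetical stand-in, not the monitor implementation: createPlatform, dispatch and runApplication are invented names, the application body is passed in as a function instead of being loaded into the interpreter, and the forced termination covered in Section 3.3.4 is modelled with a simple flag.

```javascript
// Hypothetical simulation of the application lifecycle. The real platform
// object lives inside the sandbox; this stand-in only records handlers.
function createPlatform() {
  const handlers = { load: [], unload: [] };
  return {
    addEventListener: function (type, fn) {
      if (!Object.prototype.hasOwnProperty.call(handlers, type)) {
        throw new Error('unknown event: ' + type);
      }
      handlers[type].push(fn);
    },
    // Invoked by the simulated monitor, never by application code.
    dispatch: function (type) {
      handlers[type].forEach(function (fn) { fn({ type: type }); });
    }
  };
}

// Simulated monitor: evaluate the application, fire load through the
// synthetic bootstrap, then fire unload when the application is closed.
// If the application made an unauthorized call (violation is true), it is
// terminated instead and unload is skipped.
function runApplication(app, platform, violation) {
  const log = [];
  app(platform, log);          // global-scope evaluation registers handlers
  platform.dispatch('load');   // bootstrap fires platform.onload
  if (violation) {
    log.push('terminated');    // forced termination: no unload event
  } else {
    platform.dispatch('unload');
  }
  return log;
}

// Usage: an application that records each lifecycle event it sees.
function demoApp(platform, log) {
  platform.addEventListener('load', function () { log.push('load'); });
  platform.addEventListener('unload', function () { log.push('unload'); });
}
console.log(runApplication(demoApp, createPlatform(), false).join(', ')); // load, unload
console.log(runApplication(demoApp, createPlatform(), true).join(', '));  // load, terminated
```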
The unload handler can be used to persist state using platform.localStorage. As discussed in Section 3.3.4, the monitor will forcibly terminate the application and sandbox (skipping the unload event) if an unauthorized API call is made. The most important phase of life for a Vaportrail application happens between the load and unload events, during which time it responds to events triggered by the user (via UI components), processes data, and updates its UI.

3.4.4 Example applications

We implement three prototype applications to help motivate the design and implementation of the platform APIs, and to demonstrate the platform’s ability to support interesting, non-trivial applications. Each application consumes data from multiple services, depends on persistent state and provides a simple UI.

Contrail

Contrail is a simple application that renders events from multiple services as icons on a horizontal timeline. Each event (e.g. email, post, upload) is plotted as an icon sized in proportion to the “impact” of the event. The definition of impact depends on the type of event: it corresponds to likes for Facebook, retweets (replies for private messages) for Twitter, and thread size for Gmail. The application shows the current day’s events at hourly resolution, and the trailing week at daily resolution. Contrail demonstrates a simple but useful visualization that, without Vaportrail, would have required granting a third-party application access to three personal data sources.

Figure 3.9: The Contrail timeline application.

Altitude

Altitude is a basic search engine that provides an aggregate free-form search across several data sources. It leverages the native text search capabilities of the backend data stores (i.e. SQL’s LIKE operator with % wildcards, and MongoDB’s $text operator) to combine events, messages, email attachments and images into one result set.
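Altitude’s actual queries are not shown here, so the sketch below offers only one plausible shape. It assumes a PostgreSQL table named messages with subject and sent_at columns and a MongoDB collection that has a text index. likePattern, buildQueries and mergeResults are hypothetical helpers. User input is escaped before it is embedded in a LIKE pattern, and the per-store results are merged into a single list, newest first.

```javascript
// Hypothetical query builders for an aggregate search. Table, column and
// field names are assumptions made for illustration.

// Escape the LIKE wildcards % and _ (and the escape character ! itself) so
// that a search for 50% matches that literal text rather than a pattern.
function likePattern(term) {
  return '%' + term.replace(/[!%_]/g, '!$&') + '%';
}

function buildQueries(term) {
  return {
    // PostgreSQL-style parameterized query; $1 is a bind parameter.
    sql: {
      text: `SELECT id, subject, sent_at AS ts FROM messages
             WHERE subject LIKE $1 ESCAPE '!'`,
      values: [likePattern(term)]
    },
    // MongoDB filter using the $text operator (requires a text index).
    mongo: { $text: { $search: term } }
  };
}

// Merge result lists from several stores into one list, newest first.
// ts may be a number or a Date; subtraction works for both.
function mergeResults(lists) {
  return lists.flat().sort(function (a, b) { return b.ts - a.ts; });
}

// Usage with two stubbed result lists standing in for database replies.
const q = buildQueries('50% off');
console.log(q.sql.values[0]); // %50!% off%
const merged = mergeResults([
  [{ source: 'gmail', ts: 1 }, { source: 'gmail', ts: 3 }],
  [{ source: 'facebook', ts: 2 }]
]);
console.log(merged.map(function (r) { return r.source; }).join(',')); // gmail,facebook,gmail
```

Passing the pattern as a bind parameter keeps the search term out of the SQL text itself; the ESCAPE clause only controls how wildcards inside the pattern are interpreted.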
The application uses platform.open() in the onclick handler of platform.ui.Link components to link the user out of Vaportrail into the originating service’s view for the item. Altitude demonstrates the flexibility of the platform APIs and the utility of applications that provide a familiar primitive (in this case search) over a unified view of personal data.

Figure 3.10: The Altitude search application.

Radar

Radar is a rudimentary fraud detection application that leverages the network access permission to periodically download a database of spam detection patterns, which it applies to all outgoing messages across Facebook, Twitter and Gmail. The rationale is that if a spam message has been sent from a personal account, the account has likely been compromised. Each monitored service appears in the UI along with its current status (Green: all clear; Yellow: possible spam detected; Red: account compromised). When a message that is likely spam is detected, it is shown in the UI with a direct link to view it in the context of the originating service.

Radar uses a simple detection model based on a set of regular expressions, each mapping to a score and a human-readable reason that a match implies spam. Each score is multiplied by the number of matches, and the products are summed into a subtotal for the message. Alerts are raised based on two thresholds: warning and critical. At fewer than 100 lines of code, Radar demonstrates that useful monitoring applications can be developed for the platform with a minimum of effort.

Figure 3.11: The Radar account fraud detection application.

3.4.5 Summary

The ability to run untrusted third-party applications on personal data is the core contribution of the platform. In this section we have reviewed how applications are packaged and shared, the high-level capabilities available to them, and their runtime lifecycle within the sandbox.
We have also presented three prototype applications that illustrate the flexibility of the platform APIs and the ability of the platform to host interesting, non-trivial applications. We believe the application packaging and runtime implementation balances the design requirements of openness, flexibility and a familiar development style with the need for robust privacy controls.

3.5 Summary

In this chapter we have discussed the implementation of the Vaportrail platform prototype, including several example service connectors and applications. We have implemented a self-contained personal appliance that leverages Linux containers to isolate and network platform and third-party components within the appliance. We discuss how this architecture provides an open, robust, extensible and maintainable platform that is simple and inexpensive to operate. We argue that our choice of industry-proven abstractions (containers, RESTful APIs, databases) and careful decoupling of components will allow the appliance to operate without intervention over long periods of time, though a long-term user study would be required to verify this.

We discuss service connectors in Section 3.2, along with the implementation challenges involved in ensuring developers have sufficient flexibility to integrate with a diverse array of external services while retaining enough platform control to safely manage misbehaving connectors. We aim to balance safety and security with the networking capabilities required to build non-trivial integrations.

In Section 3.3 we discuss our application sandbox in detail. We build on ideas developed by Treehouse [44] and js.js [51] to implement a robust browser-based application sandbox capable of hosting untrusted JavaScript applications and enforcing policies and permissions through a platform monitor.
We implement APIs in the sandbox environment that enable guest code to query data sources, render UI components and persist application state.

Finally, in Section 3.4 we discuss the platform implementation from the perspective of applications: how they are packaged and distributed, the programming environment, and platform permissions and APIs. We present three example applications to demonstrate the ability of the platform to support interesting applications that combine personal data sources.

Although it is only a prototype, we believe the implementation succeeds in addressing the primary design goals of the system and would serve as a good starting point for a more production-ready implementation.

Chapter 4
Related Work

Vaportrail is a self-contained platform for personal data. The guiding philosophy behind Vaportrail is that it should be simple and inexpensive to operate while providing a safe and practical environment for running untrusted applications. This is largely in contrast to existing commercial products, which may trade privacy for opportunities to monetize users, and to existing academic works that either provide only a call for solutions or present implementations that limit the likelihood of wider adoption by regular users.

4.1 Personal data platforms

There have been several commercially successful products that help a user aggregate their personal data into a single location and provide some value in the form of applications or integrations with other systems. If-This-Then-That [24] (IFTTT) provides a fully-managed SaaS platform that allows a user to select from over 4,000 “applets” that connect to external services and trigger various platform-mediated actions based on events in the data stream. Applets can have triggers on multiple service streams and generally provide simple, stateless utilities such as automatically re-posting content from one social network to another, or emailing a daily digest of activity.
Unlike Vaportrail, IFTTT makes no claims about privacy and provides the service for free.

Cue [10] (formerly Greplin) allowed users to link several personal accounts into a single dashboard and provided search capabilities over multiple data streams at once. Before being acquired by Apple, Cue had a tiered pricing model in which users could pay for more storage capacity. Like IFTTT, Cue made no particular claims about privacy or ownership of data once it was loaded into the platform.

Loggacy [26] represents a special class of personal data platforms designed specifically for establishing a digital legacy that can be shared across generations. Loggacy does not provide an application runtime; however, long-term personal data archiving is also a central capability of Vaportrail. We see establishing a digital legacy in a format that invites exploration and discovery as an important use case for Vaportrail. We also argue that, as a self-hosted, privacy-preserving platform, Vaportrail is much better positioned to fulfil this long-term responsibility than a SaaS product that is subject to changeable business conditions.

The existence of several related products in this space, and indeed their ability to attract venture capital and buyers like Apple, demonstrates the potential business value of consolidating personal data, as well as user interest in the capabilities it enables. Although these products generally lack privacy controls, we have taken inspiration from their user experience design, their seamless integrations with the services people use, and the types of applications (or applets) they have developed.

Databox [42] proposed a trusted personal data platform for collating personal data, managing it, and providing controlled access to it.
Although Databox did not present a specific implementation, we take considerable inspiration from its authors’ exploration of the design space and their suggestions as to what a Databox should be. Our implementation diverges from the Databox proposal in a few important ways: Vaportrail does not provide a mechanism for the user to monetize or broker their data out of the platform, nor does it allow applications to take copies of data or external devices to access the data over the network. Nonetheless, Vaportrail does implement a subset of the Databox features. Furthermore, while the Databox authors envision the Databox as a core piece of infrastructure for the user, acting as a clearinghouse for sensitive data, we do not position Vaportrail on the user’s critical path. Instead, Vaportrail is a trusted, passive consumer of data that provides a largely offline means of interacting with it. Databox suggests that an implementation would provide a means for users to explicitly trade their data in exchange for services, enabling targeted advertising as an incentive for developers. Coordinating with a multitude of external service providers to broker data (or “take money”, as the Databox authors put it) is a much less tractable problem than building the core platform in relative isolation. For this reason, we leave the ability to act as a broker to future work.

4.2 Privacy-preserving web applications

PiBox [46] presents a privacy-preserving platform with design goals very similar to Vaportrail’s but with a significantly different implementation. Like Vaportrail, PiBox shifts the burden of establishing trust from applications to the platform itself, and provides a sandbox with communication, storage and control primitives that restrict application access to sensitive personal data. While Vaportrail employs a single user-controlled virtual appliance for deployment, the PiBox sandbox spans a user device and a virtual server hosted by one of a few trusted cloud providers.
The platform uses differential privacy techniques to expose anonymized usage patterns to the application publisher to enable ad-driven revenue streams. We recognize that systems do not exist in an economic vacuum, but have elected to make the costs of hosting and application development as explicit as possible in Vaportrail: we argue that cloud infrastructure for a personal appliance is already sufficiently inexpensive for an individual, and that privacy-conscious users will pay for useful applications. The differences in architecture also have important implications for the possibility of wider adoption. The PiBox sandbox is based on a modified Android kernel that manipulates existing IPC, filesystem and network isolation mechanisms to limit the capabilities of applications. It also assumes the willingness of large trusted service providers (e.g. Google, Apple) to host the cloud portion of the platform, acting as fiduciaries on the basis of their reputations. This is in stark contrast to Vaportrail’s sandbox, which is platform independent (it runs in the browser) and does not rely on adoption by large third-party vendors whose interests may not be well aligned with those of the individual. Although the motivating philosophy, trust model and types of applications that the two platforms enable are very comparable, we believe Vaportrail is better positioned for real-world adoption.

Priv.io [52] is a platform for building and running privacy-preserving web services. In Priv.io, users pay for their share of cloud resources (storage, messaging and bandwidth) and all computing is performed by applications in a browser sandbox. The authors argue that an always-on cloud server is too expensive for most people, based on an analysis of how much users would pay for services like Facebook and Twitter if they were built on Priv.io.
Using strong encryption to ensure that plain-text personal data is never exposed to the infrastructure provider, Priv.io can support a wide range of existing applications through a browser-based sandbox and API. With Vaportrail we aim neither for a general alternative deployment model for web services nor to support existing services; instead, we focus on a much narrower class of applications specifically built around creating value from personal data streams. Because Vaportrail is a specialized tool for safely exploring data generated on other platforms, rather than a way to host those platforms with stronger privacy controls as Priv.io is, we believe the Priv.io cost analysis does not translate well to Vaportrail. In terms of sandbox implementation, Priv.io takes an approach that balances the need for robust control over the flow of sensitive data into and out of the hosted applications with the flexibility to support existing browser-based code. It uses an HTML iframe to host the untrusted application code and exposes the platform API over the postMessage() interface, while restricting network access using a browser-enforced Content Security Policy (CSP). The advantage of this approach is that an iframe hosts a completely isolated and fully functional DOM, enabling existing applications to be easily ported to function in the sandbox. There are two significant limitations: (1) the DOM is a very large and complex API implemented differently across browsers, making the CSP susceptible to circumvention by clever use of JavaScript or Cascading Style Sheets (CSS) DOM manipulations that trigger network requests; and (2) as a complete web page, the untrusted content of the iframe is very capable of mounting phishing attacks to subvert the platform, e.g. by presenting UI that tricks the user into entering sensitive information that can then be exfiltrated through (1).
For these reasons, and others discussed in Section 3.3, we prefer a more robust (albeit more restrictive) approach that involves complete virtualization of the JavaScript runtime in which untrusted code runs. Consequently, a successful attack would require escaping a proven JavaScript engine or exploiting a very narrow platform API.

4.3 JavaScript sandboxes

Treehouse [44] presents a JavaScript sandbox for running untrusted code that shares a number of goals with ours, namely that it should work without browser modifications and should prevent guest code from directly manipulating the DOM and using network APIs. We take considerable inspiration from Treehouse, in particular its repurposing of WebWorkers as containers for executing untrusted code, but take a stronger (albeit less performant) approach to isolating guest code from the native browser APIs. Treehouse loads a “broker” into the WebWorker context that creates a virtual DOM and interposes itself on several sensitive browser APIs before loading and executing guest code. The virtual DOM allows sandboxed code to run largely unmodified: the broker asynchronously propagates changes into the real DOM via the WebWorker postMessage() interface, and routes events back into the virtual DOM. Fine-grained control over which browser objects, methods and arguments the sandboxed code can use is specified by the user through policies, and enforced by the broker at runtime when the API is called. This yields better performance than running a virtualized JavaScript interpreter as we do, but has an important limitation: the lack of standard API implementations across browsers means that it is difficult to verify, even for a single browser, that the broker has “locked down” and interposed on the entire API surface.
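The policy enforcement described above can be illustrated with a JavaScript Proxy that interposes on method calls. This is a minimal sketch of the general technique under assumed names (interpose and the policy format are hypothetical), not Treehouse’s implementation. It also hints at the completeness problem: only calls made through the wrapper are checked, so anything the wrapper fails to cover, such as a value returned by an allowed call, escapes mediation.

```javascript
// Sketch of policy enforcement by interposition: each method call on the
// wrapped object is checked against a policy before being forwarded.
// Policy keys have the form 'objectName.method'; an optional allowArgs
// predicate inspects the call arguments. Methods with no rule are denied.
function interpose(target, policy, objectName) {
  return new Proxy(target, {
    get: function (obj, prop) {
      const value = Reflect.get(obj, prop);
      if (typeof value !== 'function') return value; // data passes through
      const key = objectName + '.' + String(prop);
      const rule = Object.prototype.hasOwnProperty.call(policy, key) ? policy[key] : null;
      return function () {
        const args = Array.prototype.slice.call(arguments);
        if (!rule || (rule.allowArgs && !rule.allowArgs(args))) {
          throw new Error('policy denies ' + key);
        }
        // The return value is not wrapped, so it is not mediated.
        return value.apply(obj, args);
      };
    }
  });
}

// Usage with a stand-in network object (not a real browser API).
const rawNet = {
  fetch: function (url) { return 'fetched ' + url; },
  beacon: function (url) { return 'sent ' + url; }
};
const policy = {
  'net.fetch': {
    allowArgs: function (args) { return String(args[0]).startsWith('https://example.com/'); }
  }
};
const net = interpose(rawNet, policy, 'net');
console.log(net.fetch('https://example.com/prices.json')); // fetched https://example.com/prices.json
try {
  net.beacon('https://attacker.example/');
} catch (e) {
  console.log(e.message); // policy denies net.beacon
}
```

Defaulting to deny for methods without a rule is the safer choice here; the hard part in a real browser is enumerating, and wrapping, every object through which guest code could reach the network.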
Even if the JavaScript API has been secured, syncing virtual DOM manipulations into the real DOM enables an attacker to carefully construct DOM operations that, when applied outside the sandbox, could conceivably trigger network requests, inject code or present misleading UI elements. Instead of engaging in an API isolation “arms race”, we take advantage of highly performant modern JavaScript engines and opt for a fully virtualized interpreter. Consequently, Vaportrail cannot easily run unmodified applications, though our approach does not preclude exposing a virtual DOM compatibility layer as future work. Doing so would require very careful consideration of the DOM syncing implementation, for the same reasons that apply to Treehouse.

Several other projects have investigated sandboxing or otherwise restricting the capabilities of client-side JavaScript. ADsafe [1], Caja [7] and FBJS [16] identify a “safe” subset of JavaScript that can be statically checked ahead of execution. There are also a number of iframe-based sandboxing implementations, including AdJail [47], SMash [41] and Subspace [45]. These projects have provided inspiration for Vaportrail but often make trade-offs in favour of supporting existing applications, or supporting use cases (e.g. rendering ads) that differ from ours. js.js [51], which we build on in our sandbox implementation, compiles an existing JavaScript interpreter to a subset of JavaScript, enabling it to run in modern browsers. js.js incurs non-trivial (though acceptable) overhead and completely isolates sandboxed code from the browser JavaScript environment and APIs.

4.4 Summary

There is a considerable body of work related to aggregating and leveraging personal data streams, building privacy-preserving web applications and services, and sandboxing client-side browser JavaScript.
We note that the commercial success of several personal data platforms (IFTTT, Cue, Loggacy) highlights the interest in, and, we argue, the need for, an alternative that makes the costs and privacy controls explicit to the user. We reviewed Databox and its influence on our design of the Vaportrail prototype, as well as PiBox and Priv.io, which attempt to solve similar problems by leveraging centralized fiduciaries and peer-to-peer infrastructure sharing approaches, respectively. Finally, we summarized client-side JavaScript sandboxing work, in particular Treehouse for its influence on our implementation.

Chapter 5
Conclusions

We have described the design and implementation of Vaportrail, a platform for personal data and applications. Vaportrail provides a self-contained network appliance that individuals can use to archive their own personal data and safely run applications from untrusted third-party developers on their data. We have developed a simple trust model that puts the user in control of their data, as well as an architecture that implements this model with modern web-based ergonomics. We were inspired by several other privacy-preserving systems and built on them in an effort to design a platform that is open, extensible and practical.

Our aim with Vaportrail was to build a platform for personal data that, unlike commercial offerings, puts the user in control of their data and makes the operating costs explicit. Our hope is that by building Vaportrail using standard, open technologies that are readily deployable today, we can demonstrate that an alternative model to ad-driven Software-as-a-Service is practically feasible and can deliver a modern user experience. We acknowledge that the economics of Vaportrail (and other privacy-conscious systems) are challenging: it remains to be seen whether the public will pay for privacy; however, without viable privacy-preserving alternatives we may never know.
We argue that the decreasing costs, and increasing reliability, of cloud infrastructure are promising indications that storage and computing resources, at least, may not be a significant barrier now or in the future.

Although the current implementation is only an early-stage prototype, it demonstrates several of the core components of a personal data hub, as identified by projects such as Databox before us. The concept of a networked hub for personal data is not new, and we believe it could be an important primitive in a more privacy-aware future. Vaportrail is an attempt to refine these ideas in a modern design context and, hopefully, to provide inspiration for others to continue the development of tools and systems that support privacy and personal data provenance. We observe similarities with public key infrastructures, which have only recently begun to enjoy wider consumer adoption, largely as a result of ongoing refinement of the user experience around them rather than of the core technology and ideas.

The near-term next step for Vaportrail would be to produce a sufficiently stable version of the platform software to conduct a user study. In the longer term, we would like to see Vaportrail extended to support private peer-to-peer data sharing primitives (perhaps via WebRTC [36]), enabling direct social sharing and fully decentralized services. We would also extend the application sandbox and runtime with graphics APIs to support data-driven 3D games and visualizations. Porting the js.js interpreter to run on WebAssembly would significantly improve the runtime performance of applications.

Vaportrail is far from having all of the features and properties that we would like from a personal data platform; however, we submit that building privacy-preserving software is difficult on several broad fronts: technically, economically, and socially.
That being said, we feel that privacy is a worthwhile pursuit and that Vaportrail is a step towards a platform that puts users in control of their data and gives developers the freedom to innovate using personal data sources.

Bibliography

[1] ADsafe: A safe JavaScript widget framework for advertising and other mashups. https://github.com/douglascrockford/ADsafe. Accessed: 2017-02-04.
[2] Amazon Echo and Alexa Devices. https://www.amazon.com/Amazon-Echo-And-Alexa-Devices/. Accessed: 2017-02-04.
[3] AppArmor - Ubuntu wiki. https://wiki.ubuntu.com/AppArmor. Accessed: 2017-02-04.
[4] asm.js. http://asmjs.org. Accessed: 2017-07-09.
[5] Microsoft Azure Pricing. https://azure.microsoft.com/en-ca/pricing. Accessed: 2017-02-04.
[6] Bootstrap CSS. http://getbootstrap.com. Accessed: 2017-07-09.
[7] The LLVM Compiler Infrastructure. http://llvm.org. Accessed: 2017-07-09.
[8] Amazon CloudFormation. https://aws.amazon.com/cloudformation/. Accessed: 2017-02-04.
[9] CoffeeScript. http://coffeescript.org. Accessed: 2017-07-09.
[10] Cue Search Engine (Greplin). https://en.wikipedia.org/wiki/Cue_(search_engine). Accessed: 2017-02-04.
[11] The Dart Programming Language. https://www.dartlang.org. Accessed: 2017-07-09.
[12] Big Data, for better or worse: 90% of world’s data generated over last two years. https://www.sciencedaily.com/releases/2013/05/130522085217.htm. Accessed: 2017-02-04.
[13] Dropbox. https://www.dropbox.com. Accessed: 2017-02-04.
[14] Amazon EC2 Pricing. https://aws.amazon.com/ec2/pricing. Accessed: 2017-02-04.
[15] Emscripten: An LLVM-to-JavaScript Compiler. https://github.com/kripken/emscripten. Accessed: 2017-07-09.
[16] Facebook for Developers. https://developers.facebook.com. Accessed: 2017-02-04.
[17] Gmail: Free Storage and Email from Google. https://www.google.com/gmail/about/. Accessed: 2017-02-04.
[18] The GNU Privacy Guard.
https://www.gnupg.org. Accessed: 2017-07-09.
[19] Google Data APIs. https://developers.google.com/gdata/docs/directory. Accessed: 2017-02-04.
[20] Google Drive - Cloud Storage. https://www.google.com/drive/. Accessed: 2017-02-04.
[21] Google Home. https://madeby.google.com/home/. Accessed: 2017-02-04.
[22] Google Compute Platform Pricing. https://cloud.google.com/compute/pricing. Accessed: 2017-02-04.
[23] Apple HomePod. https://www.apple.com/homepod/. Accessed: 2017-02-04.
[24] If-This-Then-That. https://ifttt.com. Accessed: 2017-02-05.
[25] The LLVM Compiler Infrastructure. http://llvm.org. Accessed: 2017-07-09.
[26] Loggacy. https://www.loggacy.com. Accessed: 2017-02-04.
[27] MongoDB. https://www.mongodb.com. Accessed: 2017-07-09.
[28] OpenJDK. http://openjdk.java.net. Accessed: 2017-02-04.
[29] PostgreSQL. https://www.postgresql.org. Accessed: 2017-07-09.
[30] Redis in-memory data structure store. https://redis.io. Accessed: 2017-07-09.
[31] Riak CS. https://github.com/basho/riak_cs. Accessed: 2017-07-09.
[32] Amazon Simple Storage Service. https://aws.amazon.com/s3/. Accessed: 2017-02-04.
[33] Twitter Developer Documentation. https://dev.twitter.com/rest/public. Accessed: 2017-02-04.
[34] TypeScript: JavaScript that scales. https://www.typescriptlang.org. Accessed: 2017-07-09.
[35] Web Intents. http://webintents.org. Accessed: 2017-02-04.
[36] WebRTC: Browser Real-Time Communications. https://webrtc.org. Accessed: 2017-02-04.
[37] Window.localStorage - Web APIs | MDN. https://developer.mozilla.org/en/docs/Web/API/Window/localStorage. Accessed: 2017-07-09.
[38] R. Baden, A. Bender, N. Spring, B. Bhattacharjee, and D. Starin. Persona: An online social network with user-defined privacy. SIGCOMM Comput. Commun. Rev., 39(4):135–146, Aug. 2009.
ISSN 0146-4833. doi:10.1145/1594977.1592585. URL http://doi.acm.org/10.1145/1594977.1592585. Accessed: 2017-02-04.
[39] T. Chajed, J. Gjengset, J. van den Hooff, M. F. Kaashoek, J. Mickens, R. Morris, and N. Zeldovich. Amber: Decoupling user data from web applications. In 15th Workshop on Hot Topics in Operating Systems (HotOS XV), Kartause Ittingen, Switzerland, 2015. USENIX Association. URL http://blogs.usenix.org/conference/hotos15/workshop-program/presentation/chajed. Accessed: 2017-02-04.
[40] J. Constine. Facebook now has 2 billion monthly users... and responsibility. https://techcrunch.com/2017/06/27/facebook-2-billion-users/. Accessed: 2017-02-04.
[41] F. De Keukelaere, S. Bhola, M. Steiner, S. Chari, and S. Yoshihama. SMash: Secure component model for cross-domain mashups on unmodified browsers. In Proceedings of the 17th International Conference on World Wide Web, WWW ’08, pages 535–544, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-085-2. doi:10.1145/1367497.1367570. URL http://doi.acm.org/10.1145/1367497.1367570. Accessed: 2017-02-04.
[42] H. Haddadi, H. Howard, A. Chaudhry, J. Crowcroft, A. Madhavapeddy, and R. Mortier. Personal data: Thinking inside the box. CoRR, abs/1501.04737, 2015. URL http://arxiv.org/abs/1501.04737. Accessed: 2017-02-04.
[43] D. Hardt. The OAuth 2.0 authorization framework. RFC 6749, RFC Editor, Fremont, CA, USA, Oct. 2012. URL http://www.rfc-editor.org/rfc/rfc6749.txt. Accessed: 2017-02-04.
[44] L. Ingram and M. Walfish. Treehouse: Javascript sandboxes to help web developers help themselves. In Presented as part of the 2012 USENIX Annual Technical Conference (USENIX ATC 12), pages 153–164, Boston, MA, 2012. USENIX. ISBN 978-931971-93-5. URL https://www.usenix.org/conference/atc12/technical-sessions/presentation/ingram. Accessed: 2017-02-04.
[45] C. Jackson and H. J. Wang. Subspace: Secure cross-domain communication for web mashups.
In Proceedings of the 16th International Conference on World Wide Web, WWW ’07, pages 611–620, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-654-7. doi:10.1145/1242572.1242655. URL http://doi.acm.org/10.1145/1242572.1242655. Accessed: 2017-02-04.
[46] S. Lee, E. L. Wong, D. Goel, M. Dahlin, and V. Shmatikov. πBox: A Platform for Privacy-Preserving Apps. In Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), pages 501–514, Lombard, IL, 2013. USENIX. ISBN 978-1-931971-00-3. URL https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/lee_sangmin. Accessed: 2017-02-04.
[47] M. T. Louw, K. T. Ganesh, and V. N. Venkatakrishnan. Adjail: Practical enforcement of confidentiality and integrity policies on web advertisements. In Proceedings of the 19th USENIX Conference on Security, USENIX Security’10, pages 24–24, Berkeley, CA, USA, 2010. USENIX Association. ISBN 888-7-6666-5555-4. URL http://dl.acm.org/citation.cfm?id=1929820.1929852. Accessed: 2017-02-04.
[48] D. Merkel. Docker: Lightweight linux containers for consistent development and deployment. Linux J., 2014(239), Mar. 2014. ISSN 1075-3583. URL http://dl.acm.org/citation.cfm?id=2600239.2600241. Accessed: 2017-02-04.
[49] D. Recordon and D. Reed. Openid 2.0: A platform for user-centric identity management. In Proceedings of the Second ACM Workshop on Digital Identity Management, DIM ’06, pages 11–16, New York, NY, USA, 2006. ACM. ISBN 1-59593-547-9. doi:10.1145/1179529.1179532. URL http://doi.acm.org/10.1145/1179529.1179532. Accessed: 2017-02-04.
[50] T. Team. Twitter’s Surprising User Growth Bodes Well For 2017. https://www.forbes.com/sites/greatspeculations/2017/04/27/twitters-surprising-user-growth-bodes-well-for-2017/. Accessed: 2017-07-09.
[51] J. Terrace, S. R. Beard, and N. P. K. Katta. Javascript in javascript (js.js): Sandboxing third-party scripts.
In Presented as part of the 3rd USENIX Conference on Web Application Development (WebApps 12), pages 95–100, Boston, MA, 2012. USENIX. ISBN 978-931971-94-2. URL https://www.usenix.org/conference/webapps12/technical-sessions/presentation/terrace. Accessed: 2017-02-04.
[52] L. Zhang and A. Mislove. Building confederated web-based services with priv.io. In Proceedings of the First ACM Conference on Online Social Networks, COSN ’13, pages 189–200, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2084-9. doi:10.1145/2512938.2512943. URL http://doi.acm.org/10.1145/2512938.2512943. Accessed: 2017-02-04.

Appendix A
Platform APIs

In this appendix we provide a listing of platform APIs and components. The platform API surface consists of methods facing service connectors, methods facing applications, and internal methods (i.e. those used only by the platform itself in the service of higher-level APIs or functions).

platform.authn

The authentication endpoints handle authenticating service connectors and the web dashboard to establish a session that can be reused for subsequent requests.

• POST /authn/auth
  – Requires password or VT_CONNECTOR_SECRET
  – Returns an access token that must be included in all subsequent requests using a custom X-Auth-Token HTTP header.

platform.authz

The authorization module determines whether a given API request is authorized for a previously authenticated client. Authorization logic wraps platform API server HTTP handlers and permission-restricted calls in the application monitor.

• platform.permission.can(permission)
platform.query provides a slightly higher-level, application-facing API.
• POST /data/write
  – Takes a data store schema and a store-type-specific write request, and returns when the write is persistent or has failed.
• GET /data/read
  – Takes a data store schema and a store-specific query request, and returns data in a store-specific format (e.g. a JSON list of rows, or a binary blob).

platform.meta
The platform metabase APIs are used largely internally for platform bookkeeping: storing and retrieving connector and application manifests, unpacking and parsing packages, setting up internal networks, and creating and managing containers.

platform.ui
The platform application UI components. Component colours are restricted to a platform-defined palette. Similarly, sizing and positioning are restricted to prevent components from being abused.
• Button - simple button with text
• Checkbox - stateful checkbox button with label
• Dropdown - simple dropdown list of options
• Input - a text input field
• Image - static image label
• Layer - container that can be sized and styled within certain visual parameters
• Label - static text label
• Link - text label that appears as a clickable hyperlink
• Panel - a container with a heading and an outline
• TableLayout - grid-based layout container for other components
• Tooltip - simple tooltip with static text

platform.net
The platform application networking APIs.
• get(uri_id)
  – Fetches and returns the content of a URI, referenced by an ID specified in the application manifest. Note that the sandbox interpreter does not have an eval() function, which makes it very difficult to use this mechanism to load code.

platform.localStorage
The platform sandbox implementation of window.localStorage, used by applications to persist state.
• get(key) : String
• set(key, Object) : String

platform.share
The application-facing APIs for triggering sharing intents.
These largely wrap calls into platform.data.
• share(type, content)
  – Triggers a share request for content, whose content type is given by type."@en ; edm:hasType "Thesis/Dissertation"@en ; vivo:dateIssued "2017-09"@en ; edm:isShownAt "10.14288/1.0354256"@en ; dcterms:language "eng"@en ; ns0:degreeDiscipline "Computer Science"@en ; edm:provider "Vancouver : University of British Columbia Library"@en ; dcterms:publisher "University of British Columbia"@en ; dcterms:rights "Attribution-NonCommercial-NoDerivatives 4.0 International"@* ; ns0:rightsURI "http://creativecommons.org/licenses/by-nc-nd/4.0/"@* ; ns0:scholarLevel "Graduate"@en ; dcterms:title "Vaportrail : a platform for personal data applications"@en ; dcterms:type "Text"@en ; ns0:identifierURI "http://hdl.handle.net/2429/62599"@en .