UBC Theses and Dissertations

Partitioning and distribution of web applications to the hybrid cloud. Kaviani, Nima. 2014.
Partitioning and Distribution of Web Applications to the Hybrid Cloud

by

Nima Kaviani

B.Sc., Bu-Ali Sina University, 2005
M.Sc., Simon Fraser University, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2014

© Nima Kaviani 2014

Abstract

Hybrid cloud deployment is an effective strategy for deploying software services across public cloud and private infrastructure. It allows deployed software systems to benefit from the cost savings and scalability offerings of the cloud while keeping control over privacy- or security-sensitive code and data entities. However, the complexity of determining which code and data entities should reside on-premises, and which can be migrated to the cloud, is daunting. Researchers have attempted to address this complexity by using partitioning algorithms to optimize the distribution and deployment of code entities across public cloud and private infrastructure. However, we have identified the following shortfalls in the existing research work:

• Current research does not provide enough flexibility in the placement of software function execution and data entities between public/private hosts. In particular, it does not allow for replication or optimized separation of code and data entities in relation to one another.

• Current research on partitioning of software systems does not explicitly consider the dynamics of a hybrid cloud deployment when making decisions about public cloud and private infrastructure. In particular, current research lacks support for making explicit tradeoffs between monetary cost and improved performance in hybrid cloud software systems.

• The dynamics of the cloud require partitioning algorithms to be tailored towards features inherent to a hybrid cloud deployment.
This includes encoding data dependency models and component dependency models of a software system collectively into one unique mathematical optimization model. There is no existing algorithm that allows combined code and data dependency requirements to be modelled under one optimization formula.

This thesis presents my work on implementing algorithms and tools that address the shortcomings of the previous research discussed above. These algorithms and tools are put together under a partitioning and distribution framework named Manticore. Manticore has been used to drive partitioning and deployment decisions on several open source software systems. The experiment results show an estimated reduction of up to 54% in monetary costs compared to a premises-only deployment and a 56% improvement in performance compared to a naïve separation of code entities from data entities in a hybrid cloud deployment.

Preface

This dissertation is original, unpublished, independent work by the author, Nima Kaviani.

The contributions and evaluations presented in this dissertation are summarized and published in three conference papers, namely: IEEE Conference on Service Oriented Computing 2011 (ICSOC 2011) [66], IEEE Cloud Computing Conference 2012 (CloudCom 2012) [67], and Usenix Middleware Conference 2013 [68].

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Programs
Acknowledgements
Dedication

1 Introduction
1.1 Deploying Applications to Hybrid Cloud
1.2 Problem Statement
1.3 Methodology
1.4 Contributions

2 Related Work
2.1 Code Mobility & Distributed Middleware
2.2 Application Partitioning
2.3 Software Migration and Resource Utilization
2.4 Summary

3 Code Dependency Modeling and Application-tier Partitioning
3.1 Creating the Dependency Graph
3.2 Modeling Code Components in the Dependency Graph
3.2.1 Request-based Model (RBM)
3.2.2 Static Structure Model (SSM)
3.2.3 Context-Sensitive Model (CSM)
3.3 Defining Constraints for Placement & Cost
3.3.1 Policy Specification Scheme
3.3.2 Cost Scheme
3.4 Augmenting Dynamic Dependency Graphs with Cost Metrics
3.5 Applying the Partitioning Algorithm
3.5.1 The Partitioning Algorithm for Symmetric Data Exchange Costs
3.5.2 The Partitioning Algorithm for Asymmetric Data Exchange Costs
3.6 Evaluation
3.6.1 Micro-Benchmarks on Performance Improvements
3.6.2 Cost Models vs Measured Deployments
3.6.3 Evaluation of Context-Sensitive Modeling
3.6.4 Evaluation of Flexible Cost Modeling
3.6.5 Evaluation of Scalability with Code-only Partitioning
3.6.6 Evaluation of Deployment Costs
3.6.7 Evaluating Symmetric vs. Asymmetric Partitioning Algorithms
3.6.8 Evaluating Data Entity Placement Constraints
3.7 Summary

4 Data Dependency Modeling and Data-tier Partitioning
4.1 Extending Application-tier Partitioning with Data-tier Partitioning
4.2 High-Level Overview of Data-tier Partitioning
4.3 Profiling the Data-tier with Explain Plan
4.4 Analyzing Dependencies for the Data-tier
4.5 Partitioning the Data-tier
4.5.1 Defining BIP Constraints for Data Dependencies
4.5.2 Defining the BIP Objective for Data Dependencies
4.6 Evaluation
4.6.1 Evaluation of Performance
4.6.2 Evaluation of Deployment Costs
4.6.3 Evaluation of Scalability
4.7 Summary

5 Manticore: The Partitioning Framework
5.1 Manticore Overview
5.2 The Software Profiling Tool
5.3 The Tool for Analysis and Generating the Dynamic Dependency Graph
5.4 The Tool for Partitioning the Dynamic Dependency Graph
5.5 Distribution and Deployment
5.6 Discussion
5.7 Summary

6 Conclusion & Future Work
6.1 Primary Research Contributions
6.1.1 Context-Sensitive Dependency Modeling
6.1.2 Flexible Cost Modeling
6.1.3 Cross-tier Partitioning
6.1.4 Asymmetric Data Exchange Costs
6.2 Secondary Research Contributions
6.2.1 Simulation and Real-world Evaluation
6.2.2 Comprehensive Tooling
6.3 Thesis Review and Observations
6.4 Challenges and Future Work
6.4.1 Semi-automatic Partitioning
6.4.2 Heterogeneous vs. Homogeneous Machines
6.4.3 Multi-way and Online Partitioning
6.4.4 Data Dependency Analysis
6.4.5 Security and Privacy Requirements
6.5 Summary

Bibliography

Appendices
A Appendix on Hybrid Cloud Solutions
B Appendix on Dynamic Dependency Graph
C Appendix on Cost and Policy Specifications for Partitioning
C.1 Policy Specification
C.2 Cost Schema

List of Tables

2.1 Hilda's Characteristics
2.2 SWIFT's Characteristics
2.3 Wishbone's Characteristics
2.4 J-Orchestra's Characteristics
2.5 Characteristics of the approach by Diaconescu et al.
2.6 Characteristics of the approach by Gu et al.
2.7 Characteristics of the approach by Ou et al.
2.8 Coign's Characteristics
2.9 ABACUS's Characteristics
2.10 Characteristics of the approach by Nanda et al.
2.11 Characteristics of the approach by Giurgiu et al.
2.12 Dimensions of Investigation for application partitioning approaches.
3.1 Symbols used in application trace of data dependency graph
3.2 The number of nodes in the DDG of DayTrader as well as the top ten largest JForum business logic functionality.
3.3 Microbenchmarks of Database and RPC roundtrips for JForum requests
3.4 Microbenchmarks on the amount of data exchanged between the public cloud and the private cloud for various hybrid deployments.
3.5 Code placement for different request types in Apache DayTrader as Gamma (γ) changes from 0.5 to 15.
3.6 Deployment costs for DayTrader with changes in constrained tables.
4.1 Constraint generation functions
4.2 Functions for generating objective helper constraints
4.3 Functions for generating objective function
4.4 Naming conventions for deployment models used in evaluation.
4.5 Response time evaluation of the Cross-Tier deployment compared to other models of deployment (a) for doLogin and (b) across all business logic functionality. Results show how a deployment based on cross-tier partitioning improves or degrades response time compared to the other models of deployment.
4.6 Cost improvements of Cross-Tier deployment compared to other models of deployment. Results show how a deployment based on cross-tier partitioning improves deployment costs compared to the other models of deployment.

List of Figures

1.1 Hybrid deployment architecture with distributed code and data.
1.2 High-level representation of our methodology involving the steps of profiling, analysis, and partitioning.
1.3 Cross-tier partitioning of a monolithic web application.
3.1 Application Dependency Models: a) RBM, b) SSM, c) CSM. Circles are application nodes and cylinders are database tables. The edge weights represent number of round-trip data exchanges between nodes.
3.2 CSM visualization of partitioning results for doLogin
3.3 DDG and data exchange for the sub-graph of doAccount in DayTrader.
3.4 A high-level model of how different partitioning algorithms would choose component placements between the public and the private cloud.
3.5 Inbound and Outbound data exchanges for Naïve Hybrid versus partitioned deployment models
3.6 Simulated and measured execution times for hybrid and cloud code placements for each DayTrader request type
3.7 Simulated and measured data transfer sizes for hybrid and cloud code placements for each DayTrader request type
3.8 Comparison of latency adjustments for the SSM, RBM, and CSM as the premises cost changes.
3.9 Monetary cost of deploying requests of various type for DayTrader with respect to changes in Gamma (γ)
3.10 Scalability tests for full premises, full cloud, and hybrid deployments
3.11 Comparison of monthly deployment costs for different deployment models of DayTrader.
3.12 Comparison of monthly deployment costs for different deployment models of JForum.
3.13 Measured data exchanged between the cloud and premises using the Symmetric vs. Asymmetric partitioning algorithms.
3.14 Cloud-to-premise ratio of module placements.
4.1 A cross-tier partitioning suggested by our tool for the doLogin request from DayTrader.
4.2 Possible query plans from one of the queries in DayTrader.
4.3 The high-level steps in performing data-tier partitioning.
4.4 The query plan tree representing alternatives in executing a given query joining three tables A, B, and C with one another.
4.5 The first phase in partitioning the data-tier is to profile the data-tier with EXPLAIN PLAN.
4.6 The second phase in partitioning the data-tier is to analyze dependencies in a query plan.
4.7 The third phase in partitioning the data-tier is to encode the partitioning problem into a BIP formulation.
4.8 Measured execution times for selected request types in the four deployments of DayTrader.
4.9 Measured execution times for selected request types in the four deployments of RUBiS.
4.10 Monthly cost comparison for different deployments of DayTrader and RUBiS.
4.11 Monthly cost comparison for different deployments of RUBiS.
4.12 Scalability tests for full premises, full cloud, and hybrid deployments for DayTrader.
4.13 Scalability tests for full premises, full cloud, and hybrid deployments for RUBiS.
5.1 Profiling, Analysis, and Partitioning in Manticore
5.2 The screen shot for the eclipse launcher of the Java profiler in Manticore.
5.3 The screen shot for selecting the profiling trace, the policy constraints, and the host configuration constraints in Manticore.
5.4 The screen shot for selecting the execution cost model, the communication cost model, the partitioning algorithm, and applying partitioning in Manticore.
5.5 The screen shot for the results of partitioning and the generated distribution plan in Manticore.
5.6 The screen shot for statistical information on partitioning results in Manticore.

List of Programs

4.1 Function to collect statistics for alternative query plan operators for the input query Q. Pi is the powerset operator over sets of size i.
4.2 Constraint generation for cross-tier partitioning.
4.3 Objective generation
B.1 A sample profiling trace collected by jip-osgi.
C.1 A sample policy specification filtering on given DDG.
C.2 An example cost scheme provided to the profiler.

Acknowledgements

I am honoured and delighted to acknowledge the many people whose counsel, support, and encouragement have contributed immensely to the completion of my dissertation.

First and foremost I would like to thank my amazing supervisors Dr. Rodger Lea and Dr. Eric Wohlstadter for their constant help, support, and advice throughout my PhD. When I started my PhD, I had this assumption that it would be an easy ride from the start to the end.
However, it did not take long for me to realize that a PhD is not just a scientific adventure but, more importantly, a life adventure. It was in the first year of my PhD that I had to change the direction of my research. My passion was always in distributed systems; however, if it wasn't for the help and support of Eric and Rodger, I would have probably ended up in a completely different field of research. It was my great pleasure to work with the two of them and benefit from their insight, experience, and help not only as research supervisors but, more importantly, as role models in life.

I also need to thank the very many great friends from the MAGIC lab and the Software Practices lab who helped me with their constant advice, support, encouragement, and feedback:

I would like to thank Dr. Michael Blackstock for being an amazing friend and a great advisor throughout my PhD. He was the best counselor for me to talk to when making the early career decisions for my PhD.

My great level of appreciation also goes to Dr. Thomas Fritz for being an amazing friend. My friendship with Thomas very quickly grew beyond a simple work relation and we developed bonds to the point that I consider him one of my closest friends in life. Thomas has always been a wonderful source of energy, encouragement, and positive vibe. Particularly whenever I was low on steam, he constantly pushed me forward with his encouraging words. I consider myself very lucky to get to know him as a great individual and I hope I can return his favours at some point in life.

I also immensely appreciate the help and support I received from Dr. Sarah Rastkar, Dr. Aiman Erbad, Dr. Matthias Finke, Dr. Ebrahim Bagheri, Reza Babanezhad, Roberto Calderon, Vincent Tsao, Albert Thompson, and Julius Davis.

My deepest levels of gratitude also go to my amazing friend and brother Dr. Bardia Mohabbati who made a significant mark in my life with his presence, support, and encouragement.
Being roommates for five years and still being best friends speaks a world about the level of trust and friendship we developed. I am very honoured to have him as my friend. Thanks very much Bardia for all the adventures we shared together, for all the all-nighters we pulled together, and for all the ideas we bounced off each other.

Last but not least, I would like to thank my beautiful and amazing friend Mona Erfani. Mona walked alongside me from the very starting point of my PhD career to the very last minute that I heard the word "pass" on the day of my PhD defence. Without her there would have been less meaning to life, to joy, to fun, and to friendship for me. Thanks Mona for all the help, for all the support, for your kind heart, and for your trustful manner. Stay amazing like this!

Dedication

To the three stars of my life: "my dad, my mom, and my beautiful sister ..."

Chapter 1

Introduction

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. The benefits of cloud computing are in providing i) on-demand elastic computing resources, ii) pay-per-use billing models, and iii) minimal up-front user commitments [38]. All these benefits have made cloud computing an attractive technology to be employed and used by businesses.

In early 2011, the Information Week Analytics survey [50] revealed growing interest in using cloud computing resources among companies, from 31% in 2009 to 46% in 2010. The trend continued to 56% in 2011 [29], 67% in 2012, and 75% in 2013 [30].
In a separate research survey, Gartner predicts cloud computing service revenue growth from $131 billion in 2013 [52] to $180 billion by the end of 2015 [31].

Despite the benefits of the public cloud in providing increased flexibility at lower costs, the idea of a complete migration of software systems to the cloud is not fully embraced by cloud customers. Issues such as operational challenges, data compliance requirements, data or architectural lock-in to a particular provider, and security and privacy concerns are among the major obstacles preventing a full application migration to the cloud [32, 55, 107]. This is to the extent that a recent study by RightScale (http://www.rightscale.com/) identifies privacy concerns and data compliance as the topmost fears among IT managers when considering a full migration to the cloud [87]. To mitigate some of these challenges, companies have turned to a new architectural model, referred to as Hybrid Cloud.

In essence, Hybrid Cloud [38, 93, 97] is an architectural model in which computation and storage capacities from a public cloud are offered as supplementary resources to a private infrastructure in order to obtain the benefits of both public cloud and private infrastructure. With a hybrid deployment, flexible business operations, enhanced performance, and optimized deployment costs are borrowed from the public cloud, while stronger security and better control over resources are taken from the private infrastructure [32].

With the growing number of proponents for hybrid cloud architectures, public cloud providers have also started offering solutions that support hybrid deployments (i.e., combining their public cloud infrastructures with the private infrastructure from their customers).
Existing hybrid solutions range from offerings as simple as establishing cheap software-enabled virtual private network (VPN) connections between the cloud infrastructure and the private infrastructure, to having dedicated fiber-optic cables between the public cloud and the private infrastructure (see Appendix A for details on hybrid cloud solutions).

With all the hybrid solutions available, deploying applications to a hybrid cloud is heavily influenced by the capabilities and offerings of the hybrid solution used for application deployment. These capabilities contribute differently to the overall performance, cost, and scalability of a deployed application. As a result, a system architect making decisions about a hybrid deployment has to deal with the challenging problem of deciding about the variabilities for all infrastructure elements and their effects on application behavior in order to achieve an optimal deployment.

1.1 Deploying Applications to Hybrid Cloud

There has been a significant amount of academic and commercial interest in understanding the implications of deploying an application to the cloud [31]. This includes understanding factors such as the overall performance of the application, cost of deployment, bottlenecks in a deployment, points of vulnerability or failure in a deployment, etc. There have been both manual and automated efforts to gain this level of understanding. RightScale [28] and Microsoft [27] help their customers decide about their cloud deployments by having engineers manually look into the variabilities and requirements for public and private deployments of an application. In contrast, frameworks such as PaaSLane [26] have tried to achieve this automatically by doing code-level inspection of software systems and verifying application deployments against cost and performance schemes offered by cloud providers.

A hybrid deployment should prevent placing privacy-critical code or data in the public cloud in order to adhere to confidentiality, integrity, and privacy requirements for critical business information. Constraining the placement of these privacy-critical code and data entities to the premises infrastructure demands careful planning in order to achieve performance and cost optimization in a hybrid cloud deployment. Suboptimal planning can lead to inaccurate provisioning of required resources (e.g., computation, communication, and storage resources), which may lead to expensive deployments with degraded performance.

Consequently, effective hybrid cloud deployment of a software system depends on identifying the appropriate location for deploying critical components or data entities in the system, identifying their inter-dependencies, assessing entity placement constraints or cost considerations, finding the right hybrid solution for the deployment, and finally distribution and deployment of the system. Identifying the inter-dependencies between code and data has been a major challenge from the early stages of research on code mobility [42, 51]. This challenge leads to difficulties in optimal separation and distribution of code and data entities of an application [60, 74]. Programmers' intuitions on distribution and deployment of functionality are often inaccurate and can result in decreased performance and efficiency of the running software [63]. Also, manual identification of business-critical information and evaluation of the requirements for its secure deployment is error prone [108]. Zdancewic et al. [108] have shown that programmers often misjudge the sensitivity of information, which may lead to undesirable placement of critical data when deploying software to the cloud. Since such decisions are usually made at the early stages of software design, making changes to the deployment can be costly if not done automatically.
Automated techniques have previously been utilized to help with the analysis and deployment of service implementations across distributed host machines. These techniques are commonly referred to as application partitioning [45, 58, 71]. Techniques for application partitioning can also be utilized in the context of hybrid cloud for efficient deployment of software systems.

1.2 Problem Statement

The problem of application partitioning has been addressed in previous research. Systems such as CloneCloud [45], Cloudward Bound [58], and Leymann et al.'s [71] partition only software but not data. Other work in the area provides for partitioning of data, e.g., partitioning of relational databases [69] or Map-Reduce job/data components [33, 70, 103]. Unfortunately, none of these approaches combines code and data partitioning in the context of a hybrid cloud, and one cannot "cobble together" a hybrid solution by using independent results from such approaches. The problem of hybrid deployment requires detailed analysis of inter-dependencies between code and data within the context in which the code accesses the data. To address the requirements of a hybrid deployment, a new approach is needed that integrates application and data partitioning natively. This partitioning strategy also needs to be tailored to the type of application deployed in a hybrid cloud (see Chapter 4 for a detailed example).

Among all types of applications deployable to the cloud, web applications are of particular importance because of their prevalence and their scalability, performance, and cost efficiency requirements. The focus of this thesis is particularly on partitioning of On-Line Transaction Processing (OLTP)-style web applications. Web applications follow the well-known multi-tier architecture, generally consisting of tiers such as: the client-tier, the application-tier (serving dynamic web content), and the back-end data-tier.
Existing research on (semi-)automated partitioning of web applications has been applied only to one of the application- or data-tiers and does not address partitioning of software systems across both application- and data-tiers (which we refer to as cross-tier partitioning). In the rest of the thesis, we use the terms code and application-tier interchangeably. Consequently, the main argument of this research work is that combined partitioning of code and data is a challenging task for software designers and system architects.

One major challenge in cross-tier partitioning is the tight coupling of data-flow between the application- and data-tiers. The application-tier can make several queries during its execution, passing information to and from different queries. Even though developers follow best practices to ensure the source code for the business logic and the data access layer are loosely coupled, this loose coupling does not apply to the data-flow. The data-flow crosscuts the application- and data-tiers, requiring a new strategy that considers the two simultaneously and explores data dependencies within the context of single requests in the web application. This is particularly important when making decisions about the secure and efficient deployment of software systems. Any efficient deployment must avoid, whenever possible, the latency and bandwidth requirements imposed by distributing such data-flow.

A second challenge for cross-tier partitioning is that it requires an analysis that simultaneously reasons about the execution of application-tier code and data-tier queries. On the one hand, previous work on partitioning of code is not applicable to database queries because it does not account for modeling the execution of database queries, particularly when several database tables are involved. On the other hand, existing work on data partitioning
On the other hand, existing work on data partitioningdoes not account for the data-flow or execution footprint of the application-tier [69].Given all the above, a high-level formulation of the research problem canbe defined as follows:How could code and data dependency analysis, system resource usage,and cost models of a cloud platform be leveraged in a cross-tier partitioningframework in order to help developers make cost- or performance-effectivedecisions for hybrid deployment of OLTP-style web applications to the cloud?To clarify the research problem, consider the following example. As-sume that a web application company (e.g., a stock trading company, anonline auction site, etc.) is willing to take its previously on-premises soft-ware system and deploy it to the cloud to take advantage of the cost savingsand scalability capabilities. The migration plan suggests cost reductions onownership and maintenance for individual applications, savings on energyconsumption, and cost reductions on the required IT staff by leasing someinfrastructure from public cloud provider companies. Savings on cost andenergy are sufficiently high for the company to give up control and owner-ship on major parts of the system. However, the company requires privateand confidential information (e.g., personal data for its users including theircredit card information) to be stored on the company’s on-premises servers.When migrating to the cloud, keeping control on where the data (par-ticularly privacy-sensitive data) is physically stored is one of the major re-quirements of most medium and large enterprises in order to avoid datalock-in or reduce risks of exposing confidential data [87]. Revealing corre-51.2. Problem Statementlations between application components and different types of data to beplaced on servers within or outside company’s premises requires thoroughinvestigation of the architecture of the application. 
The company thus requires tooling and solutions to determine and assess the performance gains and cost savings of optimal partitioning plans for the target software system. In the absence of proper tooling and solutions, given the scale and number of applications to be re-architected and data to be partitioned, such a deployment would require hours of planning and engineering by IT managers and system architects [1].

Figure 1.1 shows a high-level diagram of an ideal hybrid deployment for our target company, in which both code and data are partitioned across the public cloud and the private infrastructure. As shown in Figure 1.1, in an ideal deployment the company will keep a minimal subset of code and data entities on-premises and push the rest to the cloud. This minimal set needs to be determined based on the defined placement constraints, the dependencies across code entities, the interaction of code entities with data entities, computation resource needs, estimated data transfers to/from the cloud (inbound/outbound data), and all their associated monetary costs. Under such circumstances, the company needs to deal with the following questions:

• How to model inter-relations between code and data?
• How to model the performance and cost implications of a deployment?
• How to optimize cost and performance towards an optimal hybrid deployment?
• Will the combination of optimization strategies and cost models lead to increased performance and cost savings?

The questions above are in fact derivatives of the high-level research question we formulated earlier in this section. The goal of this research work is to provide algorithms, techniques, and tools to help software architects and system engineers answer the above questions accurately and avoid uninformed hybrid cloud deployments.
Consequently, we define the thesis of this research as follows:

We can develop context-sensitive dependency models and cross-tier partitioning algorithms to facilitate the migration of OLTP-style web applications to a hybrid cloud deployment. This is achieved through identification of constraints imposed on cost and data placement; modeling and analysis of code and data dependencies in a monolithic software system on a reference machine; formulating the dependency models and constraints into an optimization problem; and transforming the monolithic dependency models and constraints to a target distributed deployment through solving the optimization problem.

Figure 1.1: Hybrid deployment architecture with distributed code and data.

Next, we describe our methodology to validate the thesis above.

1.3 Methodology

The approach taken in this thesis builds on top of existing research on application partitioning, resource provisioning, and cost modeling. Formulating a cross-tier partitioning strategy requires combining profiling, cost modeling, and partitioning for the application-tier with those of the data-tier. As shown in Figure 1.2, creating the partitioning formulation at each tier consists of the same high-level steps (although the specific details vary):

• Profiling
– measuring the execution of software functions on a reference machine.
– measuring the execution of SQL queries on a reference machine.
Figure 1.2: High-level representation of our methodology involving the steps of profiling, analysis, and partitioning.

– measuring data exchange between software functions and data entities on a reference machine.

• Dependency Analysis and Cost Modeling
– converting the collected profiling data into a dependency model representing the software system.
– transforming the reference software dependency model into a distributed deployment model on target machines.
– applying cost and performance models to the distributed deployment model.
– identifying placement constraints for code and data in the distributed model.

• Application Partitioning
– transforming the distributed model into an optimization problem.
– solving the optimization problem using binary integer linear programming (BIP).

Profiling. We rely on profiling of the application-tier and the data-tier to provide a model of the system under analysis and to monitor the load on different application components. This process involves injecting extra profiling code into the application and data tiers to collect traces of application execution and data access for each tier in the system. The data is used to create a dynamic dependency graph (DDG) [39, 49, 78] of the application as a directed acyclic graph in which functions and data entities (e.g., database tables) are represented as graph nodes and their dependencies are represented as graph edges. Weights on the vertices and edges of the graph represent CPU usage and data exchange, respectively, between entities in the application.

Analysis. Dependency analysis involves understanding which code or data entities are inter-related and what model of separating components would contribute to a cost-effective yet performant deployment of the software system. Existing research on software dependency modeling provides techniques to determine the optimal mapping of software functions and data entities to network hosts (e.g., client or server) [43, 63, 78].
Such research only supports a simple one-to-one mapping of functions and data to hosts. However, in our research we have found this simple mapping to be inadequate, because the optimal placement of a function depends on the context in which the function is used or the data is accessed. In short, sometimes it is better to execute a particular function in the public cloud and sometimes it is better to execute it on premises. We have developed a technique called context-sensitive dependency modeling that allows for the replication of software functions or data entities depending on their context of use. In Chapter 3 we discuss the details of this dependency model and report on the results of using this model compared to existing dependency models.

Once a dependency model is created, it is augmented with data that reflects either of the two primary objectives for optimization:

1. performance, e.g., request processing latency
2. monetary cost of deployment

Previous work on performance and cost models [45, 58, 71] does not provide a means for developers to explicitly make trade-offs between these two objectives. We provide a flexible cost modeling technique which allows developers to make trade-offs between latency and monetary costs. The flexible cost model supports the specification of code replication and data replication, and the adjustment of cost-to-performance trade-offs, when declaring the performance or cost characteristics of an application. All these constraints are formulated into the partitioning algorithm to help achieve optimal partitioning results.
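To make the idea of context-sensitive dependency modeling concrete, the sketch below shows the core data structure such a model might use; all names and weights here are hypothetical illustrations, not the thesis's actual tooling. Instead of one node per function, nodes are keyed by (function, context), so each context-specific copy of a function can later receive its own placement.

```python
# Minimal sketch of a context-sensitive dependency model (hypothetical names
# and weights). A node is keyed by (function, context) rather than function
# alone, so the same function can be replicated and placed per context.
from collections import defaultdict

class ContextSensitiveDDG:
    def __init__(self):
        self.cpu = {}                   # (function, context) -> accumulated CPU (ms)
        self.data = defaultdict(float)  # ((fn, ctx), (fn, ctx)) -> bytes exchanged

    def add_call(self, caller, callee, ctx, cpu_ms, bytes_exchanged):
        """Record one profiled call edge observed under a request context."""
        self.cpu[(callee, ctx)] = self.cpu.get((callee, ctx), 0.0) + cpu_ms
        self.data[((caller, ctx), (callee, ctx))] += bytes_exchanged

    def replicas(self, function):
        """All context-specific copies of a function (candidates for replication)."""
        return [node for node in self.cpu if node[0] == function]

ddg = ContextSensitiveDDG()
# The same helper is cheap but chatty in one context, heavy in another:
ddg.add_call("browseItems", "renderPage", ctx="browse", cpu_ms=2, bytes_exchanged=50_000)
ddg.add_call("checkout", "renderPage", ctx="buy", cpu_ms=40, bytes_exchanged=500)
print(ddg.replicas("renderPage"))
```

A context-insensitive model would force a single placement for `renderPage`; here the "browse" copy (large data exchange) and the "buy" copy (large CPU demand) can be assigned to different hosts.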
This also allows different hybrid deployment solutions (see Appendix A) to be encoded into the cost model (see Chapter 3 for details).

For our cost model to be effective within the context of hybrid cloud deployments, the financial and operational characteristics of the host cloud need to be defined in the cost model and utilized in the partitioning algorithms. Our analysis of the cost schemes of various cloud providers revealed that public cloud providers impose asymmetric charges for data transferred to/from their infrastructure [2, 3]. Deliberately, these asymmetric data transfer charges encourage pushing data to the cloud by making the cost of data transfer into the cloud considerably cheaper than that of data retrieved from the cloud. Exploiting this asymmetric cost model, we developed an optimization formulation for our partitioning algorithm that further reduces the cost of a hybrid deployment (see Section 3.5).

Partitioning. At its core, partitioning is a method for applying mathematical optimization to distributed software development. Binary Integer Programming [90] has been utilized previously for partitioning of applications (although not for cross-tier partitioning) [44, 67, 78, 105]. A binary integer program (BIP) consists of the following:

• Binary variables: A set of binary variables x1, x2, ..., xn ∈ {0, 1}.
• Constraints: A set of linear constraints between variables, where each constraint has the form c1x1 + c2x2 + ... + cnxn {≤, =, ≥} c0, with each ci a constant.
• Objective: A linear expression to minimize or maximize: cost1x1 + cost2x2 + ... + costnxn, with costi being the cost charged to the model when xi = 1.
The job of a BIP optimizer is to choose the set of values for the binary variables which minimizes/maximizes this expression. We combine our code and data partitioning algorithms into a cross-tier partitioning algorithm in order to guarantee an optimal hybrid cloud deployment (see Chapter 4 for details).

The overall process of applying cross-tier partitioning is shown in Figure 1.3. In the top left we see a monolithic web application before partitioning. Notice that the profiling logs are split into two branches, with the upper branch performing profiling and analysis of the application tier and the lower branch performing profiling and analysis of the data tier. As shown at the top of the figure, the process of application partitioning involves code dependency analysis, which results in the application-tier BIP. At the bottom of the figure, the relations among database tables (e.g., join orders) are analyzed, and objective functions and dependency constraints are created, which are then converted to a data-tier BIP. (Amazon and RackSpace have no charges for data transfer into their cloud infrastructure, while they charge between $0.12 and $0.18 for every GB of data leaving the cloud.)

Figure 1.3: The overall process of applying cross-tier partitioning to a monolithic web application (process flows from left to right). The top of the figure shows the process of application partitioning through code dependency analysis and generating the application-tier BIP. The bottom of the figure shows data partitioning through analyzing data access patterns and creating objective functions and dependency constraints. The two BIP models are then combined and passed to an off-the-shelf BIP solver to provide the solution to the optimization problem.
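As a toy illustration of this BIP structure, the sketch below places four entities across cloud and premises under a placement constraint and asymmetric transfer prices. All entities, weights, prices, and the constraint are hypothetical, and the tiny search space is brute-forced here rather than handed to a real solver.

```python
# Toy cross-tier placement BIP solved by brute force (hypothetical numbers;
# a real instance is passed to an off-the-shelf BIP solver instead).
# x[e] = 1 places entity e in the public cloud; x[e] = 0 keeps it on premises.
from itertools import product

entities = ["browse()", "checkout()", "items_table", "users_table"]
cpu_cost_cloud = {"browse()": 2.0, "checkout()": 1.0, "items_table": 3.0, "users_table": 1.0}
cpu_cost_premises = {"browse()": 6.0, "checkout()": 3.0, "items_table": 9.0, "users_table": 3.0}

# Edges carry GB/month in each direction between dependent entities.
edges = [("browse()", "items_table", 5.0, 20.0),   # (a, b, a->b GB, b->a GB)
         ("checkout()", "users_table", 1.0, 2.0)]
IN_RATE, OUT_RATE = 0.0, 0.15   # $/GB into vs. out of the cloud (asymmetric)

def total_cost(x):
    cost = sum(cpu_cost_cloud[e] if x[e] else cpu_cost_premises[e] for e in entities)
    for a, b, ab, ba in edges:
        if x[a] != x[b]:                  # edge crosses the hybrid boundary
            to_cloud = ab if x[b] else ba      # traffic entering the cloud (free)
            from_cloud = ba if x[b] else ab    # traffic leaving the cloud (charged)
            cost += to_cloud * IN_RATE + from_cloud * OUT_RATE
    return cost

# Constraint: the privacy-sensitive users table must stay on premises.
best = min((dict(zip(entities, bits)) for bits in product([0, 1], repeat=len(entities))
            if dict(zip(entities, bits))["users_table"] == 0),
           key=total_cost)
print(best, round(total_cost(best), 2))
```

With these numbers the optimizer pushes everything except the pinned users table to the cloud; only the small checkout-to-users traffic then crosses the boundary. Brute force is exponential in the number of entities, which is why real instances rely on a BIP solver.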
The two BIPs for the application-tier and data-tier are then combined into one BIP formulation and fed to an off-the-shelf BIP solver, and the results are used to derive function and table placements.

1.4 Contributions

The work presented here makes four primary contributions and two secondary contributions. The four primary contributions correspond to the four research questions we listed in Section 1.2, and are discussed below:

Contribution #1 (Context-Sensitive Dependency Modeling): We designed and developed application profiling tools which generate dynamic dependency models of the target application and capture resource usage within the context of application execution.

Contribution #2 (Flexible Cost Modeling): We developed a flexible cost model that captures the specification of code and data replication as well as cost-to-performance trade-offs, in order to capture the details of the deployment environments (both public and private).

Contribution #3 (Cross-tier Partitioning): We demonstrate that, in the applications we studied, combined partitioning of code and data entities to a hybrid cloud can provide a cost improvement of more than 55% compared to a naïve hybrid deployment, and 40% compared to when only code (and not data) is partitioned. The cross-tier partitioning work presented here is the first of its kind in tackling the problem of combined code and data partitioning for the hybrid cloud.

Contribution #4 (Asymmetric Data Exchange Costs): We show that by exploiting the asymmetry in data exchange costs in our evaluated applications, we are able to reduce the monthly cost of hybrid deployments by 11% compared to when this asymmetry in communication costs is ignored.
Employing asymmetric cost models offers an interesting opportunity to further refine partitioning algorithms for hybrid cloud deployment.

Besides the primary contributions above, this research work also makes the following secondary contributions:

Contribution #5 (Detailed Evaluations): To validate the effectiveness of the developed solutions, we applied the developed techniques to several open source software systems (i.e., RUBiS [4], Apache DayTrader [5], and JForum [6]) and provided real-world deployments to Amazon EC2 [2] to measure the cost and performance of the deployed systems.

Contribution #6 (Tooling): All the algorithms and tools described in this thesis are implemented in a framework for application partitioning and distribution to the cloud, named Manticore (http://nima.magic.ubc.ca/manticore) [67]. Manticore helps software architects make informed decisions about cost- or performance-effective deployments of their applications.

The contributions were summarized and published in three conference papers, namely: the IEEE Conference on Service Oriented Computing 2011 (ICSOC 2011) [66], the IEEE Cloud Computing Conference 2012 (CloudCom 2012) [67], and the Usenix Middleware Conference 2013 [68].

In the remainder of this dissertation, we describe the details of our methodology and highlight the contributions. This thesis is organized as follows: In Chapter 2, we discuss related work on application partitioning, resource provisioning, and cost modeling. In Chapter 3, we discuss code partitioning and address the contributions on context-sensitive dependency modeling and flexible cost modeling. In Chapter 4, we explain how code partitioning can be augmented with data partitioning to provide cross-tier partitioning. In that chapter we also present the contribution on asymmetric cost models for data exchange. In Chapter 5, we discuss the implementation of our Manticore framework.
Finally, in Chapter 6, we summarize the work presented in this research, explain the challenges, highlight the path for future work, and conclude.

Chapter 2
Related Work

The problem of hybrid cloud deployment sits at the intersection of existing research on automated application partitioning, resource provisioning, and software distribution. These bodies of work correspond to the three steps of Profiling, Modeling, and Partitioning that we described in Chapter 1.

Application partitioning allows for identifying the size and the level of granularity of the application components to be distributed for hybrid cloud deployment. That includes identifying whether and which components of the application (e.g., software functions, modules, or servers) need to be placed on premises, and which ones need to be placed in the cloud, for the overall cost or performance of the application to be optimized. In the case of hybrid cloud deployment, we are particularly interested in identifying existing research that allows us to understand the right level of granularity for partitioning and distribution of web applications, and the types of partitioning algorithms that allow us to better capture system characteristics when it comes to application partitioning. We are also interested in the set of metrics that are collected when doing application partitioning, in order to determine which ones are potentially applicable to hybrid cloud deployment. In the area of application partitioning, we narrow our focus to approaches that discuss two-way partitioning. Even though we are fully aware of the work on multi-way partitioning, we do not report on that body of research here, mostly because at this point we are not aware of a practical approach to combine software modelling and multi-way application partitioning into a combined strategy for hybrid cloud deployment.

Resource provisioning helps with predicting the amount of required resources for each set of migrated components.
Existing research in this area presents models and mappings to analyze the behaviour of a software system on a target platform based on its execution profile on a source platform. This is particularly useful where the hosts of deployment are not completely identical in their capabilities, or where the overall behaviour of the system could be affected by external factors such as the number of requests to be processed. In the case of hybrid cloud deployment, resource provisioning techniques are helpful when modelling the non-determinism in network capabilities, incoming requests, multi-threading issues, etc. While focusing on resource provisioning, we omit any work that concentrates on maximizing resource usage, since that is not part of our focus in this research. From the resource provisioning literature we only borrow the models suggested to capture and map the behaviour of a software system when deployed from one infrastructure to another.

The final step in having effective hybrid deployments of an application is to facilitate the migration of code components onto target hosts and the linking of those components together, so that the distributed application behaves similarly to the monolithic software system. By looking into code mobility techniques and distribution platforms, our aim is to identify how this overall process can be facilitated. From the area of code mobility, we ignore previous literature on agent-based code migration, where there are assumptions about pre-configuration of the source or the target hosts.
Also, out of all the existing distribution stacks (e.g., CORBA, Java RMI, etc.), we only focus on the ones supported in the default runtime environment of web applications (e.g., Java) and ignore non-standard efforts.

In this section, we briefly discuss the most related research in each of the areas mentioned above and discuss how each work has contributed to our research.

2.1 Code Mobility & Distributed Middleware

Code mobility enables dynamic changing of the bindings between fragments of code and the locations where they reside in a distributed system [41]. Fuggetta et al. [51] provide a comprehensive study and analysis of code mobility from the three angles of technologies, design paradigms, and application domains. Code technologies refer to the languages and systems that provide the required mechanisms for supporting and enabling code mobility. The authors consider a five-layer architecture for a system supporting code mobility. At the lowest layer reside the hardware and physical resources, followed by the core operating system providing basic system functionalities at the second layer. At the third layer, the network operating system provides low-level functionalities for network access. Individual components, at the topmost (fifth) layer, are placed on top of a computational environment
The authors have classified mobility mecha-nisms to code and execution state management and data space management.The above classification resembles the cloud support for applica-tion execution where the first three layers could be combined to theInfrastructure-as-a-Service (IaaS) layer in cloud computing with the secondand the third layer combined into the abstract concept of hypervisor incloud computing. The fourth layer could be considered as the Platform-as-a-Service (PaaS) and the fifth layer could be considered equivalent to theSoftware-as-a-Service (SaaS) levels of abstraction in the cloud classification.In this sense, the concept of code mobility with the considerations onmobility and elasticity is well applicable to cloud computing. Code mobilityis related to the problem of partitioning and placing services in a cloudnetwork. Distribution and relocation of code units (e.g., software functions)across machines in the cloud may effectively reduce the overall monetarycost of deployment, improve performance, and minimize resource usagebased on metrics such as network traffic, energy expenditure, quality ofservice (QoS), etc.Here we review the recent systems providing the technology for dis-tribution of software components and enabling location transparency forcomponent placements in a distributed software system.There is a strong body of previous research on code mobility anddistribution at the level of software objects. Sadjadi [89] provides a com-prehensive study of available object-oriented adaptive middleware systems.This includes systems like Common Object Request Broker Architecture(CORBA) [79], Java Remote Method Invocation (Java RMI) [76], and theDistributed Component Object Model (DCOM) [46]. What is common inall these distributed middleware systems is that they provide an interfacefor binding a remote object to be realized as a local object located in thelocal address space. 
In our research we particularly make use of the Java RMI middleware to allow for distributed execution of software systems after deployment to the hybrid cloud. Benefiting from the distribution model offered in Java RMI, we choose to have some of the software code executed in the public cloud and some of it in the private cloud. The boundaries on how much of the code should be executed in either of the two infrastructures are determined based on the analysis we perform on software metrics such as execution time, data exchange, quality of service, etc. (see Chapter 3 for details).

Aside from the major object-oriented middleware systems, there also exist service-oriented component-based frameworks. In these frameworks, applications are realized at the level of modular units which form application components. These frameworks, once enabled for distributed deployment, allow the service components of their applications to be deployed across different host infrastructures. OSGi is one of the prominent service-oriented component-based frameworks available. Remoting middleware built on top of OSGi has been widely used for distributed deployment of OSGi-based applications in a hybrid cloud setting. Next we review the remoting middleware utilized in the context of the hybrid cloud:

Remote OSGi (R-OSGi) is a distributed middleware platform capable of transparently distributing the software modules of an application across accessible devices and platforms [85]. R-OSGi resembles a distributed service-oriented component model where cross-network distribution of services and their late bindings happen through dynamic generation of proxies that connect one peer to another over a network channel. Any failure in the network or in remote invocation of services in R-OSGi is mapped to local events, allowing dependent services to treat remote failures just like local failures.
Performance analysis of R-OSGi shows that the overhead of its remote method invocations is negligible (1.5%) and that it outperforms Java RMI (16% overhead). The prominent aspect of the R-OSGi distribution model is that its distribution of a software system happens at the boundaries of the system's components, with minimal intervention in the internal component code. While distribution at the level of software components allows for easier decoupling, not all software systems have clear boundaries for their components. In our work we hold the position that analyzing a software system for distributed deployment should not be limited by logical boundaries built around the units of code forming the software system. As such, in our distributed middleware we analyze dependencies among code entities at various levels of granularity and provide a flexible model of distribution (see Chapter 3 for details).

AlfredO [86] is a light-weight middleware architecture aiming to allow mobile phones and resource-constrained devices to be augmented by resources from the cloud for performing resource-intensive computation [86]. Developed by the designers of R-OSGi, AlfredO fully relies on R-OSGi for the purpose of code mobility and the exchange of required modules and components between the mobile phone and a cloud platform. AlfredO offers scalability, ease of administration, flexibility, device independence, security, and efficiency. AlfredO's prominent enhancement over R-OSGi is the possibility of integrating on-demand code mobility into its design, which allows moving a thin client for modules and bundles to a resource-constrained device while keeping this client as minimal as possible. This thin-client strategy helps enhance efficiency, and a similar strategy is utilized in our approach for the distribution of software components.
However, in our approach it is achieved through replication or distribution of functions in a software system, which are at a finer level of granularity than the modules used in AlfredO.

Further to the distribution support provided by OSGi-enabled middleware, PCOM [40] is another component-based service-oriented framework which inherently supports distribution. PCOM's major goal is to provide generic automatic adaptation support for service and component selection in a distributed environment. To do so, PCOM requires: i) applications to be clearly specified in terms of their services and their non-functional properties; ii) services to be monitored in order to deal with missing or lost services and also with changes in their non-functional properties; iii) users to define policies and strategies for choosing from a set of services with similar functional and quality-of-service properties; and iv) the resulting system to stay minimal in terms of resource consumption. Dynamism in PCOM is addressed through late binding and contract negotiation; nonetheless, due to the lack of a centralized source for registering and discovering services, the peer-to-peer process of contract exchange in search of a particular service might become extremely exhaustive and subject to failure. In our approach we stay away from a contract-based model of distributing components due to the extra overhead of contract negotiation. We achieve distribution passively (i.e., at compile time), as opposed to the runtime model of distribution through contract negotiation used in PCOM.

2.2 Application Partitioning

As discussed in Section 1.3, application partitioning splits a monolithic application into standalone, yet dependent, modules that can run on multiple hosts, while ensuring that the overall behavior of the partitioned application remains consistent with the original application.
The process of partitioning is often performed in order to improve quality of service (QoS) (e.g., increasing throughput while decreasing latency and response time) within the context of business-to-business and business-to-customer applications. In order to classify the work done in the area of application partitioning for large and scalable distributed systems, we have extracted the following dimensions of the design of partitioning algorithms from the reviewed related work in this area:

• Level of Granularity for Distribution: Level of granularity refers to the boundaries at which the application partitioning algorithm separates the constructs of an application into standalone modules. These modules, while independent, should preserve their behavior within the context of the original application upon the application's distributed deployment. Partitioning is usually applied at the following levels of granularity: language entities in the source code (e.g., software functions), the application binary, application modules and components, and the application execution engine.

• Model of Profiling: Model of profiling refers to the resource usage information collected from the application in order to make informed decisions about optimal partitioning of the application. We consider two main models for application profiling: i) static profiling and ii) dynamic profiling. In static profiling, analyzing the code (whether source or binary) happens statically, and collecting the required optimization information usually happens prior to or during compile time. In this case, analyzing the signatures of operations or functions allows for estimating the amount of data exchanged between different entities in the application, and information about bandwidth and throughput can be collected without analyzing the running code. Dynamic profiling is done during runtime, when analyzing method signatures is not enough to collect all the information required to partition the application.
Dynamic profiling can be done in offline or online modes. During offline profiling, application entities are executed and monitored in a controlled environment. During online profiling, however, lightweight profiling is added to the modules and entities of the application in a real setting. Dynamic profiling helps with collecting accurate information on CPU and memory usage, communication bandwidth, I/O operations, etc.

• Model of Placement: We have identified two major models of placement for a partitioned application: i) client/server placement and ii) multi-node placement. In the client/server placement model, application entities are placed across two nodes, i.e., the client and the server. In the multi-node placement model, the entities of the application are usually placed on more than one node, where nodes run separate, independent, and distinct entities and modules of an application, or replications of modules. The distinction is important, as some algorithms are only tractable for the client/server model of placement.

• Partitioning Methods: Graph-based partitioning is a common and widely used partitioning strategy in which the relations between the application elements are modeled in the form of a directed acyclic graph (DAG). The nodes in the graph usually represent the application entities that are distributed across several nodes during the process of partitioning, while the edges show the relations between these entities. The edges can also bear weights showing the size of data communicated between two consecutive vertices. Depending on what the entities are, the type of graph that is used to model the relations varies. The Program Dependency Graph (PDG) is used to show the data flow between the functions or operations in an application. The PDG is used in situations where bandwidth or throughput are the critical parameters for the application partitioning.
The Object Relation Graph (ORG) is a different model, in which the relations between objects are modeled as a graph. The graph indicates which objects create, reference, or use the others. Finally, a Module Dependency Graph (MDG) is used to show the relations between the modules or services composing an application. In this model, the edges in the graph represent coarse-grained dependency relations between components in the application, and the amount of data exchanged between these components is usually encoded as the weight of the edges connecting them.

• Optimization Parameters: When creating a graph of the application, whether it is a Program Dependency Graph (PDG), an Object Relation Graph (ORG), or a Module Dependency Graph (MDG), the vertices and edges in the graph are weighted by information collected during application execution, in order to determine how the graph should be partitioned for the application distribution to be optimal. The weights are collected based on parameters like throughput (TP), response time (RT), CPU usage (CPU), bandwidth (BW), memory usage (Mem), and battery usage (Battery). Defining the edge weights for the dependency graphs is influenced by what information from the application is collected prior to partitioning and how the pieces of information are combined to properly reflect the behavior of the application.

In the remainder of this section we go over the existing work on application partitioning. We investigate each work using the application partitioning dimensions we discussed earlier in this section.

Hilda [104, 106] is a high-level declarative language for developing data-driven Web applications. The latest version of Hilda supports automatic partitioning with performance optimization based on linear programming. Its programming model is based on SQL and is only suitable for data-driven applications. The entities in Hilda are called Application Units and are roughly comparable to classes.
The programmer develops the code for the Application Units, and hence partitioning criteria are enforced at the level of source code in Hilda. The performance optimization problem in Hilda is NP-complete and is solved with a randomized rounding approximation algorithm [84]. The model of placement in Hilda is a client/server model due to the nature of the target Web applications it generates. As for optimization parameters, Hilda focuses on optimizing for response time and bandwidth. Table 2.1 summarizes the characteristics of Hilda.

System | Level of Granularity | Profiling Model | Partitioning Method | Placement Model | Partitioning Params
Hilda  | Source               | Offline Dynamic | Integer Programming | Client/Server   | RT, BW

Table 2.1: Hilda's Characteristics

SWIFT [43] is a system enabling development of 3-tier Web applications using Jif as a high-level Java-like programming language. Jif enables developers to enforce security constraints on data flow between the client and the server. Jif is translated to WebIL as an intermediate language, gets augmented with module placement annotations, and is then translated to two Java applications, one for the server and one for the client. The client code is further translated to JavaScript using the Google Web Toolkit, which enables placement of the client code on the client machine. Similar to Hilda, SWIFT also enforces partitioning at the level of source code through static and offline dynamic profiling. To decide on the best partitioning, SWIFT employs a linear integer programming technique which optimizes the flow of information in a client/server placement through optimization of response time and bandwidth. The authors provide an integral relaxation of their ILP formulation that is polynomial-time solvable following the optimizations performed on modeling data exchange between the elements of the SWIFT application.
The graph that SWIFT uses to show the relations between the elements of the program is an acyclic control flow graph (CFG). The nodes in the graph are the statements of the program, and an edge is given a weight of 1 if its adjacent vertices are split between client and server and 0 otherwise. Table 2.2 summarizes the characteristics of SWIFT.

System | Level of Granularity | Profiling Model          | Partitioning Method | Placement Model | Partitioning Params
SWIFT  | Source               | Static + Offline Dynamic | Integer Programming | Client/Server   | RT, BW

Table 2.2: SWIFT's Characteristics

Wishbone [78] performs distribution by defining language entities for partitioning. Wishbone focuses on partitioning code between resource-constrained sensors and servers. It addresses the problem of heterogeneity of resource-constrained devices including motes, smart phones, embedded devices, etc. For Wishbone to be able to support proper partitioning, the developer writes code in WaveScript, a stream processing language that has runtimes for both embedded nodes and servers. WaveScript has been extended so that developers can logically define the data flow graph (DFG) and determine which parts of the dataflow graph should be replicated on all embedded nodes. When partitioning the application, operators are considered either movable or pinned, with pinned operators having strict dependencies on the platform they are executed on and movable operators having looser dependencies on their underlying platform. Having this information, Wishbone creates a directed acyclic graph (DAG) whose vertices are WaveScript streaming operators. The graph is then run through the partitioning algorithm to generate the distributed entities. Wishbone optimizes the CPU usage and the bandwidth used by the streaming operators. It uses an integer programming model to define the optimization problem and uses the off-the-shelf lp_solve integer programming solver to solve it.
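Several of the systems above reduce placement to an optimization over a weighted graph, and for two-host (client/server) placement the problem can be viewed as a minimum s-t cut: the cheapest set of edges to sever between a pinned "server" node and a pinned "client" node. The sketch below only illustrates that idea with a small max-flow/min-cut routine; it is not SWIFT's or Wishbone's actual solver, and the node names and edge weights are invented.

```python
from collections import deque

def min_cut(capacity, s, t):
    """s-t min cut via Edmonds-Karp max-flow.
    capacity: dict-of-dicts {u: {v: cap}} of communication costs."""
    res = {u: dict(vs) for u, vs in capacity.items()}   # residual graph
    for u, vs in capacity.items():
        for v in vs:
            res.setdefault(v, {}).setdefault(u, 0)      # reverse edges
    flow = 0
    while True:
        parent = {s: None}                              # BFS for an augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        v, bottleneck = t, float("inf")                 # find bottleneck capacity
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:                    # push flow along the path
            res[parent[v]][v] -= bottleneck
            res[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck
    side = {s}                                          # residual-reachable nodes
    q = deque([s])                                      # form the server partition
    while q:
        u = q.popleft()
        for v, c in res[u].items():
            if c > 0 and v not in side:
                side.add(v)
                q.append(v)
    return flow, side

# Statements a..c sit between pinned endpoints; edge weights are
# (invented) communication costs, so the min cut is the cheapest split.
g = {"server": {"a": 3, "b": 2},
     "a": {"c": 1},
     "b": {"c": 4},
     "c": {"client": 5}}
cost, server_side = min_cut(g, "server", "client")
print(cost, sorted(server_side))  # prints: 3 ['a', 'server']
```

Here the cut severs the a→c and server→b edges for a total cost of 3; in a real partitioner the capacities would come from profiled communication volumes rather than hand-picked numbers.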
Table 2.3 summarizes the characteristics of Wishbone.

System   | Level of Granularity | Profiling Model          | Partitioning Method | Placement Model | Partitioning Params
Wishbone | Source               | Static + Offline Dynamic | Integer Programming | Client/Server   | CPU, BW, Mem

Table 2.3: Wishbone's Characteristics

Our research work differs from Hilda, SWIFT, and Wishbone in that we i) look into the broader range of 3-tier OLTP-style Web applications that can potentially be deployed to the cloud, ii) focus on other partitioning parameters (e.g., monetary costs of deployment to the cloud, data placement constraints, CPU, I/O, and memory usage requirements), and iii) deal with partitioning the binary.

J-Orchestra [99] has been one of the early efforts towards enabling automatic partitioning of Java applications. The difference between J-Orchestra and the systems mentioned to this point is that, unlike the previous systems, J-Orchestra does not require access to the source code of the system in order to enable distribution. It employs an offline dynamic partitioning approach at the level of application binary (bytecode) with the potential for multi-node distribution. J-Orchestra converts all application objects to remote-capable objects accessible from remote sites. It then enables the user to decide about the placement of these objects based on whether they are anchored (i.e., they should stick to their location) or mobile (i.e., they can be moved to other locations). J-Orchestra classifies Java objects into modifiable and mobile objects and, using a classifier and a profiler, helps the user decide where the mobile objects should be placed. J-Orchestra focuses on minimizing network traffic and uses a heuristic strategy to decide on the proper partitioning, with the user able to override the results. Table 2.4 summarizes the characteristics of J-Orchestra.

System      | Level of Granularity | Profiling Model | Partitioning Method | Placement Model | Partitioning Params
J-Orchestra | Binary               | Offline Dynamic | Heuristics          | Multi-Node      | BW

Table 2.4: J-Orchestra's Characteristics

Diaconescu et al. [48] introduce a dynamic runtime infrastructure and automatic partitioning approach that, similar to J-Orchestra, performs Java byte code analysis and rewriting to partition and monitor the entities of a Java application. In their approach, an object dependence graph (ODG) is created with its edges indicating creation of one object by another, use of it, or a reference to it. Similar to J-Orchestra, a classification into local and dependent objects is performed, in which dependency relations between objects are resolved by placing inter-process communication code between them. For the system to support execution in heterogeneous environments, the code for the system is first converted to a high-level intermediate representation to generate an abstract syntax tree (AST), and then to code for the target platform using a bottom-up rewrite system. In order to match the accuracy of the sampling strategy against the real behavior of objects and elements in the application, automatic profiling through code instrumentation is performed. During profiling, information about method duration, method frequency, hot methods, hot paths, memory allocation, and the dynamic call graph is collected. The system performs multi-constraint graph partitioning following Hendrickson et al.'s [61] heuristic graph partitioning approach, which allows for multi-node distribution. Table 2.5 summarizes the characteristics of the approach by Diaconescu et al.

System            | Level of Granularity | Profiling Model          | Partitioning Method           | Placement Model | Partitioning Params
Diaconescu et al. | Binary               | Static + Offline Dynamic | Multi Constraint Partitioning | Multi-Node      | CPU, BW, Mem, Battery

Table 2.5: Characteristics of the approach by Diaconescu et al.

Our work is close to the ones offered by J-Orchestra and Diaconescu et al. in that we also analyze the application binary during the profiling process.
However, our profiling process then results in a dependency graph that deals not only with code but also with how data entities are accessed in the application. As a result, our partitioning relies on augmenting the code dependency graph of an application with data dependencies. Also, our partitioning strategy uses the monetary cost of deployment to the private cloud as one of the primary motivations for application partitioning. Furthermore, the focus of our research is on 3-tier OLTP-style web applications, as opposed to the typical desktop Java applications used in the case of J-Orchestra's and Diaconescu et al.'s work.

Gu et al. [57] propose a system called Spectra for adaptive offloading of Java applications from resource-constrained devices to resource-rich systems. Spectra offers four phases: i) application execution monitoring, ii) resource monitoring, iii) application partitioning candidate generation, and iv) transparent remote procedure calls. The monitoring happens by providing an application execution graph (AEG), with each node in the graph indicating a class in the application and edges indicating method invocations and resource accesses between classes in the application. The weight metrics for each node in the graph capture memory size, access frequency, location, and whether or not the class is native. The edges between classes represent interaction frequency and bandwidth requirements for the classes. As for partitioning, the system employs an efficient partitioning heuristic using Stoer and Wagner's MinCut algorithm [96] to generate candidate partitioning plans. Their partitioning algorithm splits classes only between two nodes, a surrogate (a server) and a client. For the distribution to happen, they have modified the Hewlett-Packard Chai virtual machine by adding transparent migration of objects using RPC.
Their main contribution is an inference engine using a Fuzzy Control Model to decide when to partition the application, based on the load on the system and the exhaustion of resources. Table 2.6 summarizes the characteristics of the approach by Gu et al.

System    | Level of Granularity | Profiling Model | Partitioning Method | Placement Model | Partitioning Params
Gu et al. | Binary               | Online Dynamic  | MinCut Algorithm    | Client/Server   | CPU, BW, Mem

Table 2.6: Characteristics of the approach by Gu et al.

Ou et al. [80, 81] propose an adaptive heuristic (k+1) partitioning algorithm in which they partition an application into one unoffloadable partition and k offloadable partitions. Similar to Gu's approach, they create a dynamic multi-cost graph with the classes of the application representing the vertices of the graph and the edges representing associations between classes. The vertex weights indicate a combination of memory utilization, accumulated processing time, and bandwidth usage. The edge weights represent the number of invocations and data accesses between nodes (classes) of the application. Their partitioning algorithm works by first identifying the unoffloadable nodes in the graph and combining them into a single node. The algorithm then performs a (k+1) coarse partitioning based on a cost function to reduce the number of nodes to the k nodes desired for the partitioning. Finally, fine-tuning is done by shifting classes on the boundaries between nodes in order to reduce interactions between nodes. Their coarsening algorithm is based on their heavy-edge light-vertex matching (HELVM) algorithm, which combines adjacent vertices with heavy edge weights and light vertex weights. Once the partitioning is done, a binary rewriting of the code is performed to enable offloading by placing proxies between classes on different nodes.
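The coarsening step just described can be sketched as a greedy matching. The code below is a simplified illustration in the spirit of heavy-edge light-vertex matching, not Ou et al.'s exact HELVM algorithm; the class names, vertex weights, and edge weights are invented.

```python
def coarsen(vertex_w, edge_w, max_vertex_w):
    """vertex_w: {class: weight}; edge_w: {frozenset({u, v}): weight}.
    Repeatedly merges the pair joined by the heaviest edge whose combined
    vertex weight stays under max_vertex_w, accumulating weights as it goes."""
    vertex_w, edge_w = dict(vertex_w), dict(edge_w)
    clusters = {v: {v} for v in vertex_w}
    while True:
        mergeable = [e for e in edge_w
                     if sum(vertex_w[v] for v in e) <= max_vertex_w]
        if not mergeable:
            return clusters, vertex_w, edge_w
        u, v = sorted(max(mergeable, key=edge_w.get))
        vertex_w[u] += vertex_w.pop(v)        # absorb v's vertex weight into u
        clusters[u] |= clusters.pop(v)
        merged = {}
        for e, w in edge_w.items():           # rewire v's edges onto u
            e2 = frozenset(u if x == v else x for x in e)
            if len(e2) == 2:                  # drop the collapsed u-v self-loop
                merged[e2] = merged.get(e2, 0) + w
        edge_w = merged

verts = {"A": 1, "B": 1, "C": 5}              # e.g., memory footprints
edges = {frozenset({"A", "B"}): 10,           # e.g., invocation/data volumes
         frozenset({"B", "C"}): 2,
         frozenset({"A", "C"}): 1}
clusters, vw, ew = coarsen(verts, edges, max_vertex_w=3)
print(clusters)   # the chatty, lightweight pair A and B merge; C stays alone
```

The heavy A-B edge between two light classes collapses into one cluster, while the heavy class C stays separate, mirroring how the coarsening keeps merged partitions from growing too large.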
Table 2.7 summarizes the characteristics of the approach by Ou et al.

System    | Level of Granularity | Profiling Model | Partitioning Method | Placement Model | Partitioning Params
Ou et al. | Binary               | Online Dynamic  | HELVM Heuristics    | Multi-Node      | CPU, BW, Mem

Table 2.7: Characteristics of the approach by Ou et al.

Our work employs the strategies from the HELVM algorithm, as well as its way of combining resource usage parameters when creating the application graph. However, we have constraints on cost and data placement that are not considered in Gu et al.'s or Ou et al.'s work. Furthermore, the systems by Gu et al. and Ou et al. only address partitioning of the application and do not deal with how code in an application depends on data. These works do not investigate the idea of cross-tier partitioning investigated in our work, and furthermore, none of them targets the cloud as the deployment infrastructure.

Coign [63] has been one of the early efforts supporting partitioning of MS COM components. In this sense, application partitioning in Coign happens at the level of application modules, a coarser level of granularity compared to the approaches by J-Orchestra and Diaconescu. The problem that Coign solves is facilitating repartitioning of applications so that optimized partitioning can be performed in different execution scenarios. Coign enables application profiling by binary rewriting of the COM components. Based on the component communication profiles and component location constraints, an abstract inter-component communication (ICC) graph is created. Location constraints are obtained from the programmer, by analyzing the component communication record, or from the application binaries. Once the ICC graph is created, the lift-to-front minimum-cut graph cutting algorithm is used to produce a two-machine client-server distribution of the original application.
Once the application is distributed, a component factory is replicated on each host the application runs on. It is responsible for redirecting calls for remote components to the hosts where those components are located. Coign supports both heavy-weight offline and light-weight online profiling mechanisms to monitor and analyze the COM components during runtime as well. Table 2.8 summarizes the characteristics of Coign.

System | Level of Granularity | Profiling Model          | Partitioning Method       | Placement Model | Partitioning Params
Coign  | Module               | Offline + Online Dynamic | Lift-to-Front MinCut Alg. | Client/Server   | BW

Table 2.8: Coign's Characteristics

ABACUS [37] follows an application programming model in which a programmer composes an application from explicitly-migratable components or objects that have no dependency in their functionality. ABACUS deals with application programs written in C++, and for the system to perform properly, the programmer needs to embed functions associated with data streams into distinct components that can store or retrieve their states during migration. ABACUS sits at the borderline between an application partitioning system and a load balancing system in that it provides support for fine-grained distribution and redistribution of components among different hosts. The components in ABACUS are mobile objects of low granularity performing self-contained data-intensive processing. ABACUS uses a heuristic approach for partitioning and distribution of mobile objects between different hosts and only deals with distribution of applications between a client and a server. An interesting aspect of ABACUS is that it provides support for dynamic monitoring of the behavior of the application, which enables the application modules to be redistributed across the client and the server based on immediate changes in the profiling information.
ABACUS is similar to J-Orchestra and Coign in that it only deals with the amount of data transferred between the source and the sink, and it uses dynamic online profiling to collect information about the entities involved in partitioning and distribution. An important point about distribution in ABACUS is that profiling is done based on a sample of potential scenarios. Table 2.9 summarizes the characteristics of ABACUS.

System | Level of Granularity | Profiling Model          | Partitioning Method | Placement Model | Partitioning Params
ABACUS | Module               | Offline + Online Dynamic | Heuristics          | Client/Server   | TP, RT, CPU, Mem

Table 2.9: ABACUS's Characteristics

Nanda et al. [77] deal with distribution of composite Web services. Their system takes in the centralized orchestration of Web services represented in the Business Process Execution Language (BPEL) format and tries to decentralize the execution in order to increase parallelism and reduce the network traffic required by an application. In their case, execution entities are Web services running on different hosts. The decentralization helps in that sources producing data send their data directly to less loaded sinks for consumption, which helps with distributing the load of receiving and processing data. Following a program dependency graph, the services involved in the decentralized orchestration are of a finer level of granularity compared to the original services. As a result, a fixed node is a node in the program dependency graph that has execution resources, whereas a portable node could be a simple assignment of data to a variable, only indicating flow of data. The authors assume that, for a centralized orchestration to be partitioned into a decentralized orchestration, at least one fixed service and zero or more portable services must exist. Their partitioning algorithm is also concerned with the flow of data between services, and throughput is considered the primary performance metric.
Finally, to partition the graph, the authors use a simple heuristic technique called merge-by-def-use. Table 2.10 summarizes the characteristics of the approach by Nanda et al.

System       | Level of Granularity | Profiling Model | Partitioning Method        | Placement Model | Partitioning Params
Nanda et al. | Module               | Online Dynamic  | Heuristic Merge-by-def-use | Multi-Node      | TP, RT

Table 2.10: Characteristics of the approach by Nanda et al.

Giurgiu et al. [54] present a distribution model at the level of application modules that works with components built on top of the OSGi framework [36]. The goal is to minimize communication between the OSGi modules that together form an OSGi-based Java application. All the instrumentation and profiling of OSGi bundles is done manually, and the collected information is statically analyzed to decide on the best partitioning for the OSGi bundles. For the sake of simplicity, the authors assume that the developer marks the movable bundles and that each bundle provides only one single service shared with other bundles in the application. The model of distribution for the work by Giurgiu et al. is built on top of AlfredO [86]. For optimized partitioning of OSGi bundles, the parameters considered include memory, code size, and the amount of data communicated between the source and the client. Giurgiu et al. employ two heuristic models to find an optimized partitioning of the graph. In their first approach they use a full heuristic analysis of all possible distributions of OSGi bundles across the client and the server. In a K-step algorithm, a minimized version of the full heuristic approach is chosen, where a local optimum is selected as the possible distribution for the application. As mentioned earlier, their approach also distributes bundles only between two nodes, i.e., a resource-constrained mobile node and a resource-rich server node.
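The "full heuristic" search over all client/server distributions mentioned above can be sketched in a few lines. This is an invented miniature (the bundle names, pinned locations, and traffic figures are assumptions), and Giurgiu et al.'s real cost function also weighs memory and code size on the constrained device:

```python
from itertools import product

bundles = ["ui", "logic", "storage"]
pinned = {"ui": "client", "storage": "server"}           # unmovable bundles
traffic = {("ui", "logic"): 5, ("logic", "storage"): 2}  # KB per interaction

def cost(placement):
    """Total traffic crossing the client/server boundary."""
    return sum(kb for (a, b), kb in traffic.items()
               if placement[a] != placement[b])

# Enumerate every assignment of the movable bundles and keep the cheapest.
best = None
movable = [b for b in bundles if b not in pinned]
for choice in product(["client", "server"], repeat=len(movable)):
    placement = {**pinned, **dict(zip(movable, choice))}
    if best is None or cost(placement) < cost(best):
        best = placement
print(best, cost(best))   # the chatty ui-logic pair ends up co-located
```

Because the search enumerates all 2^n assignments of the movable bundles, it only scales to small module graphs, which is why a K-step variant settling for a local optimum is offered as well.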
Table 2.11 summarizes the characteristics of the approach by Giurgiu et al.

System         | Level of Granularity | Profiling Model | Partitioning Method           | Placement Model | Partitioning Params
Giurgiu et al. | Module               | Offline Dynamic | Heuristic K-Step Partitioning | Client/Server   | CPU, BW, Mem, Battery

Table 2.11: Characteristics of the approach by Giurgiu et al.

Modular applications are closer to the typical model of 3-tier or multi-component applications deployed on a PaaS in a cloud environment. As a result, the applications evaluated in this thesis are similar to the ones investigated in Coign, ABACUS, or by Nanda et al. and Giurgiu et al. As discussed earlier, our research work has focused on profiling and partitioning at the level of application binary and through offline dynamic profiling. Consequently, our work differs from that of Giurgiu et al., as they assume that the application source code is available. As discussed earlier, the primary difference between the work presented in this dissertation and the previous body of research is our primary focus on providing cross-tier partitioning targeted at OLTP-style web applications. Also, the special constraints of the cloud, particularly monetary costs and data privacy constraints, are the constraints driving the profiling, partitioning, and distributed deployment of the applications.

Table 2.12 summarizes the contributions of all the research works presented above, based on the dimensions we identified at the beginning of Section 2.2.

2.3 Software Migration and Resource Utilization

Another area of research related to our work is the research on predicting resource utilization by a software system and performing software migration to improve non-functional requirements in a system (e.g., quality of service and security). Software migration is similar to application partitioning in that both address non-functional requirements.
The two are, however, different in that application partitioning assumes tighter coupling among software components, where the loosening of components happens through partitioning, while software migration assumes that software components are loosely coupled, with clear component boundaries that make them easily separable for relocation.

Within the context of both application partitioning and software migration it is critical to accurately predict resource utilization by software components. Both approaches rely on algorithms that require the resource usage of software components to be analyzed prior to distribution and deployment on a target host. Accurate capturing of these metrics in the software model leads to partitioning solutions that are optimal in terms of using resources and reducing costs where constraints on cost and data placement are tight.

System            | Level of Granularity | Profiling Model          | Partitioning Method           | Placement Model | Partitioning Params
Hilda             | Source               | Offline Dynamic          | Integer Programming           | Client/Server   | RT, BW
SWIFT             | Source               | Offline Dynamic          | Integer Programming           | Client/Server   | RT, BW
Wishbone          | Source               | Static + Offline Dynamic | Integer Programming           | Client/Server   | CPU, BW, Mem
J-Orchestra       | Binary               | Offline Dynamic          | Heuristics                    | Multi-Node      | BW
Diaconescu et al. | Binary               | Static + Offline Dynamic | Multi Constraint Partitioning | Multi-Node      | CPU, BW, Mem, Battery
Gu et al.         | Binary               | Online Dynamic           | MinCut Algorithm              | Client/Server   | CPU, BW, Mem
Ou et al.         | Binary               | Online Dynamic           | HELVM Heuristics              | Multi-Node      | CPU, BW, Mem
Coign             | Module               | Offline + Online Dynamic | Lift-to-Front MinCut Alg.     | Client/Server   | BW
ABACUS            | Module               | Offline + Online Dynamic | Heuristics                    | Client/Server   | TP, RT, CPU, Mem
Nanda et al.      | Module               | Online Dynamic           | Heuristic Merge-by-def-use    | Multi-Node      | TP, RT
Giurgiu et al.    | Module               | Offline Dynamic          | Heuristic K-Step Partitioning | Client/Server   | CPU, BW, Mem, Battery

Suggested Improvements (across all approaches): added partitioning constraints (e.g., cost and placement), multi-node placement, a broader range of applications, online dynamic profiling, and placement and resource expansion models.

Table 2.12: The dimensions of investigation for all of the reviewed approaches on application partitioning. In the "Partitioning Params" column, RT = response time, BW = bandwidth, Mem = memory usage, Battery = battery usage, and TP = throughput. The suggested improvements summarize the improvements we suggest over the existing approaches, as discussed in Section 2.2.

Ou et al. [82, 83] and Zhuang et al. [109] have shown that even for identical instance types for virtual machines leased from a given cloud provider (e.g., Amazon Web Services), different physical capabilities may be presented to users. When modeling the behavior of an application for deployment to the hybrid cloud, it is important to understand how the application would behave on a target platform given the capabilities of that platform (e.g., CPU cycles, memory size, etc.). A conservative strategy to estimate the behavior of the application on a target machine is to perform real deployments. However, given the variety of physical machines assigned to a given instance type (e.g., on Amazon Web Services), physical deployments would introduce extra deployment costs, extra effort during analysis, and potentially incomplete information on the behavior of the software system under study.
An alternative strategy is to model the estimated behavior of the target software system on heterogeneous hosts, given the set of capabilities of those hosts.

Research on resource utilization has long looked into techniques for modeling the behavior of software systems, in particular to estimate how the behavior of a software system changes as the load on the software varies. In the rest of this section, we look into previous work on modeling software resource utilization and software migration for deployment on heterogeneous machines. We integrate the models of resource utilization from this section into the definition of our cost model when we combine cost schemes and performance constraints for partitioning (see Chapter 3 for details).

Urgaonkar et al. [101, 102] have looked into the problem of resource utilization in a shared platform where maximization of profit and optimized utilization of resources are desired. In their work, Urgaonkar et al. have employed kernel-based profiling as an empirical approach for collecting QoS measures on CPU and bandwidth usage. This usage-based profiling is done for each component of the application running on a single machine. The prominent aspect of the method suggested by Urgaonkar et al. is that it performs online usage monitoring in order to provide predictions of how resource usage changes under varying load. The work of Urgaonkar et al., even though one of the major contributions in the area, does not consider dependencies between code and data placement when making resource usage predictions. Furthermore, cost constraints for the underlying machines can heavily affect how resources will be used when deciding about migration and placement strategies.

In a more recent work in 2011, Tak, Urgaonkar, and Sivasubramaniam [98] investigated the implications of migrating web applications in three settings: cloud-only, premises-only, and hybrid.
For their cost models they evaluated the Net Present Value to measure the profitability of investing in each of the three deployment settings. For their cost model, Tak et al. used the three dimensions of material costs (including software and hardware costs), maintenance costs, and expense costs (e.g., facility costs, etc.). In their modeling of costs, the authors find workload, data transfer, storage capacities, and software licensing to be the important cost factors when combined. Through their analysis, the authors discovered that a full cloud deployment is only attractive to small businesses, whereas for bigger organizations a hybrid deployment of applications is suitable, with horizontal partitioning and migration of application components being most beneficial to these organizations. The authors state that in their analysis vertical partitioning showed smaller benefits due to the high costs of data transfer. In our work we have reached a similar conclusion in that partitioning helps with better utilization of cloud resources. We show, however, that a combination of vertical and horizontal partitioning can achieve even further efficiency for hybrid deployment if patterns of data exchange are properly optimized. This is particularly useful when we utilize our asymmetric data exchange algorithm for optimizing ingress and egress data transfer to a public cloud platform (see Chapter 3 for details).

Stewart and Shen [94] have suggested a method for performance modeling of multi-component applications based on their component placements. Their goal is to enable system managers to have a cost-effective approach for optimal capacity planning and component placement. The authors divide their approach into two phases: application profiling and performance modeling.
Application profiling is done by kernel profiling of application components based on CPU, disk, and communication overheads for a running application, and then generalizing the results into a linear function through linear fitting. The authors rely on the assumption that the overhead on resource usage can be modeled linearly and generalized to reflect the overall behavior of the application. To derive the performance model for the application, the authors employ queueing theory [35] with assumptions of independent request arrivals. In their model, the inter-arrival times for requests follow an exponential distribution, the service times follow a general distribution, and requests are served one at a time. The authors show that their approach improves the approximation of resource usage under varying load within a range of 7-36%. The approach of Stewart and Shen lacks the same considerations for data and cost as explained for the work of Urgaonkar et al. However, even though fairly simplified, the approach by Stewart and Shen proved to be effective in modeling varying load for web applications. In our work we borrow the linear fitting ideas from the work of Stewart and Shen when modeling changes in software behavior upon changes in the underlying host machine.

Shimizu et al. [92] suggest an approach for prediction and modeling of application resource usage similar to the one suggested by Stewart and Shen [94]. Shimizu et al., however, try to provide a platform-independent model, so that the derived model can be used to predict the amount of required resources on any machine with arbitrary specifications. The authors use profiling techniques to collect computation (e.g., CPU clock speed, L2 cache size, etc.), communication (e.g., round-trip time, interconnect bandwidth), and storage (e.g., main memory size, memory access bandwidth) parameters during application execution.
In order to make the profiling information platform independent, the profiling data is contributed from several platforms with heterogeneous computation, communication, and storage capabilities. The evaluations are performed using real-world applications (e.g., the Postmark email server benchmark) on several platforms with different system specifications. The authors use numerical methods to ensure resource requirement predictions and application constraint satisfaction. Even though in our analysis and modeling of hybrid cloud deployments we assume cloud machines to be homogeneous, the ability to deal with varying resource capacities as offered by Shimizu et al. could particularly increase the effectiveness of our work when dealing with heterogeneous machines. We consider integrating Shimizu et al.'s work into ours as part of our future research plan.

Li et al. [72, 73] suggest a heuristic approach, a combination of linear programming and a nonlinear performance model, to the problem of scalable component migration for service configuration with cost and QoS considerations. The authors deal with contention for software resources, which is especially relevant in the multi-tenant setting of a cloud environment. The approach starts with creating a Layered Queueing Network (LQN) model providing representations for resource usage, users, entries, tasks, processors, and delays. The model is accompanied by a linear network flow model representing resource usage (in particular, CPU demand) by services, based on user requests for each server task on each host. This network flow model, referred to as the processing network, produces the queueing delays for using resources. Their results show that for an application of 50 services it takes about 258 seconds for the algorithm to decide on the proper placement, with error rates below 2%.
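The queueing delays produced by such a processing network can be illustrated, in drastically simplified form, with the single-server M/M/1 response-time formula; an LQN layers many such contended servers on top of one another. The service rates below are invented:

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: S / (1 - rho),
    where S is the mean service time and rho the utilization."""
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError("queue is unstable at this load")
    return (1.0 / service_rate) / (1.0 - rho)

# A two-step call path: a web tier (100 req/s capacity) followed by a
# database (120 req/s capacity), each modeled as an independent queue.
for lam in (10, 50, 90):                     # offered load, requests per second
    r = mm1_response_time(lam, 100) + mm1_response_time(lam, 120)
    print(f"{lam} req/s -> {r * 1000:.1f} ms")
```

Delay grows mildly at low load but explodes as utilization approaches 1, which is exactly the non-linearity that such performance models must capture.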
The main issue with Li et al.'s technique for modeling the behavior of the system is that the LQN is generated manually and hence its accuracy requires a good understanding of the original application. Furthermore, while the queueing model helps with mapping the communication behavior of the system, once combined with other resource usage metrics, the model becomes harder to analyze and use. In contrast, our work focuses on a flexible method of resource modeling by performing it automatically and through a well-known dependency graph representation. This takes the extra work of modeling software systems off the shoulders of system architects by relying on analyzing the application binary for partitioning, distribution, and provisioning.

Hajjat et al. introduce Cloudward Bound [59] to provide a model for exploring the benefits of a hybrid migration approach for distributing multi-tier software systems between the public and the private cloud. For optimal migration of servers to the cloud, the authors solve optimization problems that assure data flow balance across the hybrid components of an application such that all transactions happening in a hybrid deployment are addressed either by the components in the public or the private cloud. The authors also try to minimize communication costs, minimize the increase in transaction delays, model and maximize cost benefits of migration, and assure the effectiveness of the security policies in place. The authors demonstrate that, even though users will experience delays as a result of migration to the cloud, the increase in delay stays within accepted limits. The approach taken by Hajjat et al. is similar to ours in taking cost and privacy constraints into account when planning a hybrid migration. However, their view of the problem differs from ours in that Cloudward Bound treats application components as coarse-grained components in multi-tier software systems, with each component consisting of several servers.
On the contrary, our approach considers execution units or components of an application as fine-grained elements in the system. Migration of fine-grained components to the cloud allows for better utilization of cloud resources, whereas coarse-grained migration may result in under-utilization of resources and hence lost cost savings. Furthermore, we make no assumptions about the existence of resource usage models for components in the application and instead dynamically measure resource usage for components in the system.

2.4 Summary

In this section we described the related work in the areas of code mobility & distributed middleware, application partitioning, and software migration and resource utilization. As discussed earlier in this chapter, our work benefits from each of the three areas by: understanding and modelling the behaviour of software applications in a distributed environment; analyzing and partitioning the software application for optimized deployment in the hybrid cloud; and finally distributing and deploying the application across the public cloud and the private infrastructure.

Our work, even though heavily inspired by the existing research work in all of the above areas, distinguishes itself from the previous work by: i) looking at a finer level of granularity for application partitioning (software functions), ii) doing cost and performance optimizations across different tiers of an application, taking both code and data dependencies into account, and iii) benefiting from cloud cost models that allow for better decision making on optimized application partitioning and distribution.
As shown in the forthcoming chapters, the collection of these contributions positively affects the overall distribution of applications towards cheaper and better performing hybrid deployments.

Chapter 3
Code Dependency Modeling and Application-tier Partitioning

Our approach towards cheaper and faster deployment of a software system to a hybrid cloud relies on modeling the behavior of the target software system (e.g., dependencies between code components and data entities), assessing cost and performance measures for code components and data entities in the software system, and partitioning and distribution of the target software system. In this context, we rely on resource monitoring, cost specification, software modeling, and partitioning of the software model in order to generate cost- or performance-effective hybrid deployments.

As discussed in Section 1.3 of Chapter 1, we perform cross-tier partitioning of both the application tier and the data tier of a multi-tier software system (particularly a web application) to achieve cost and performance efficiency in hybrid deployments. In this chapter of the thesis, we focus our attention on dependency modeling, cost/performance analysis, and partitioning of the application tier. In particular, in the remainder of this chapter we address three of the contributions discussed in Section 1.4, i.e., context-sensitive dependency modeling, flexible cost modelling, and asymmetric data exchange costs. We describe how the combination of these contributions can lead to effective hybrid cloud deployment decisions. In the next chapter we discuss how this can be combined with data-tier partitioning to form an overall cross-tier partitioning strategy.

This chapter is summarized and published in the IEEE Cloud Computing Conference 2012 (CloudCom 2012) [67]. The remainder of this chapter is organized as follows: in Section 3.1, we describe the process of creating the dynamic dependency graph for a given software system.
We elaborate on the details of creating the dependency graphs for software functions in Section 3.2 and introduce context-sensitive dependency modeling as our strategy towards enabling code replication for cloud deployment. Section 3.3 shows how cost schemas and policy constraints are introduced into the partitioning framework. This is then followed by Section 3.4, where we describe the application of cost and policy constraints to the dependency models. Finally, in Section 3.5 we show how the partitioning algorithm is applied to the generated cost models.

3.1 Creating the Dependency Graph

A common approach to modeling the behavior of a software system is creating its Dynamic Dependency Graph [39]. A Dynamic Dependency Graph, DDG(V, E), is a directed acyclic graph where the set of nodes (V) represents software functions involved in the execution of a given software system, and the set of edges (E) represents the dependencies between those entities. The DDG is created by profiling the software system, collecting profiling traces, and converting the traces into a DDG. The process of profiling involves injecting extra profiling code into the software and collecting execution traces of the application representing code, data, and their respective resource usage. We have developed a software profiler called jip-osgi [7, 66] to perform profiling and analysis of software systems and generate a profiling trace (details in Chapter 5). A profiling trace contains information on the name of and reference to the code function used, its performance metrics, and its preceding code or data entities. Table 3.1 shows the set of data collected in our traces when instrumenting a software system. The DDG is the result of converting the profiling trace into its graph equivalent.

Upon creation of the DDG, it is then augmented with vertex weights and edge weights.
Depending on the context of analysis for the software system, vertex weights and edge weights may take on different values. When software performance is the focus, a vertex weight w_v can be set to the average execution time for node v in the DDG, and an edge weight w_e(u,v) can refer to the average amount of time taken for data exchange between vertices u and v [45, 78]. On the other hand, for a cloud deployment, where monetary deployment costs are the focus, w_v is set to the monetary cost of execution for v and w_e(u,v) to the monetary cost of data exchange between u and v (cf. Section 3.4) [58, 67].

Element Name | Description
bn  | code module name
cn  | code class name
mn  | code method name
c   | number of execution times for a given software function
t   | time of execution for a given software function
dfp | data transfer from the caller function to the callee
dtp | data transfer from the callee function to the caller
cfp | number of calls from the caller function to the callee
ctp | number of callbacks from the callee function to the caller

Table 3.1: The set of symbols used in the application trace of a data dependency graph, generated by our profiler tool. An example of the generated trace can be found in Program B.1 of Appendix B.

We allow the dependency graph to be generated at various levels of granularity, e.g., a coarse level of granularity where nodes represent code components, software modules or jar files, or a fine level of granularity, e.g., classes or methods (a detailed model of the generated dependency graph is presented in Program B.1 of Appendix B). The level of granularity used in modeling code or data in the dependency graph has a direct effect on how the partitioning algorithm is applied to the model.
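To illustrate how trace records with fields such as those in Table 3.1 fold into a weighted graph, the following sketch aggregates vertex and edge weights; the record layout and all numbers are invented for illustration and do not reproduce the actual jip-osgi trace format:

```python
# Simplified sketch: aggregating profiler trace records (fields named as
# in Table 3.1) into DDG vertex and edge weights. The record layout and
# numbers are illustrative, not the actual jip-osgi trace format.
from collections import defaultdict

# Each record: (caller mn, callee mn, t, c, dfp, dtp).
trace = [
    ("doPortfolio", "getQuotes", 40.0, 1, 512, 2048),
    ("getQuotes", "getQuote", 2.0, 15, 128, 256),
]

vertex_weight = defaultdict(float)  # e.g., aggregated execution time
edge_weight = defaultdict(float)    # e.g., aggregated bytes exchanged

for caller, callee, t, c, dfp, dtp in trace:
    vertex_weight[callee] += t * c                     # time spent in callee
    edge_weight[(caller, callee)] += (dfp + dtp) * c   # data on the edge
```

Switching the aggregation from times and bytes to monetary figures is what Section 3.4 does when the same graph is reused for cost analysis.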
Next we describe various strategies in modeling code components for dependency graphs.

3.2 Modeling Code Components in the Dependency Graph

In this section we discuss two of the existing models for representing an application dependency graph and compare them with a third dependency model that we suggest for effective hybrid cloud partitioning. The models are discussed within the context of multi-tier web applications (cf. Section 1.2). In web applications, the entry point to the software system is an incoming web request to a front-end tier, which gets passed along to the application tier. The application tier, in consultation with the database tier, prepares a response to the incoming request. The DDG for the web application is captured through modeling the set of business logic functionality (i.e., incoming request types) that it implements.

Common to all three models is the fact that vertices represent execution components in the application, with vertex weights corresponding to aggregated CPU usage during the profiling of the application. Edges represent data links, with edge weights capturing latency and data transfer. The models differ in how the notions of component and data link are realized. For example, nodes might represent low-level function definitions from source code or high-level service request types. While both of these capture the notion of an execution component, these different choices have an important effect on what kind of optimization can be achieved.

In the remainder of this section, we will refer to Figure 3.1, inspired by Apache DayTrader [5], to illustrate the details of our discussed models. Apache DayTrader is a benchmark multi-tier web application emulating the behavior of a stock trading system.
DayTrader implements seven service request types allowing users to login (doLogin), view/update their account information (doAccount & doAccountUpdate), view their portfolio (doPortfolio), look up stock quotes (doQuotes), and buy/sell (doBuy & doSell) stock shares. It also consists of five database tables, storing information for account, accountprofile, holding, order, and quote. It should be noted that Figure 3.1 is simply a high-level comparative mockup and does not correspond to the actual models generated or used by our framework. A real example of a dependency model can be seen in Figure 3.2.

3.2.1 Request-based Model (RBM)

In this model nodes represent either request types (e.g., implemented business logic functionality) or data objects (e.g., relational tables). Edges are created between each request node and all of the data objects the request operates on [34].

An illustration of the RBM, inspired by DayTrader, is presented in Figure 3.1(a). Two request types, doPortfolio and doBuy, are shown, each dependent on some tables from the DayTrader database (the identity of the tables is not relevant for this example). The edge weights show the average number of round trips between the code components and database tables in a single request.

Figure 3.1: Application Dependency Models: a) RBM, b) SSM, c) CSM. Circles are application nodes and cylinders are database tables. The edge weights represent the number of round-trip data exchanges between nodes.

When partitioning the application for a hybrid deployment, the partitioning algorithm needs to decide whether the code for each request type should be executed in the public cloud or the private premises. The decision is based on the CPU demands of processing each request type and the various data dependencies. Executing code in the cloud benefits from higher CPU scalability at a decreased cost as compared to on-premises CPU resources.
However, separating request types from the data they depend on could introduce additional latency or bandwidth requirements. For example, in Figure 3.1(a), we see that doPortfolio accesses the middle database table an average of 15 times per request. This makes the separation of doPortfolio from the database table inefficient both in terms of cost and performance. If the database table is restricted to be placed on premises, the doPortfolio request also needs to be placed on premises. Having both code and data on premises clearly would not yield any cloud-specific cost or performance improvement.

3.2.2 Static Structure Model (SSM)

A drawback of the RBM is that it does not separate request types into the individual programming language functions used in the implementation. This means there is only a single node representing all the software functions involved in executing each request type in the application. Consequently, the partitioner does not have the flexibility to split the execution of software functions for a given request type between the cloud and the premises, even if that would provide an advantage. Some other research on application partitioning ([43, 59, 63]) deals with an implementation-level model of a service, which we refer to as the SSM.

In this model nodes in the graph represent the definition of a function or the existence of data objects. By definition we mean that a node corresponding to some function occurs only once in the graph, regardless of how many times the function is typically executed during the processing of a request. For data objects, this distinction is not necessary (because data does not execute). An edge between two nodes means that there is at least one call between those functions.

An illustration of the SSM, inspired by DayTrader, is provided in Figure 3.1(b). Here we see doPortfolio and doBuy broken down into the functions that implement them.
Using this model, the job of the application partitioner is to decide whether each function should execute in the public cloud or the private premises. However, both request types make use of the function getQuote. This causes the edge weights induced by doPortfolio and doBuy over getQuote to become conflated, resulting in an overly abstract representation. We found that having only one node representing functions such as getQuote may over-constrain the partitioning optimizer, causing it to produce suboptimal code placement suggestions. Referring back to Figure 3.1(b), the conflated edge from getQuote to the database table bears a weight of 16, for which the profiler is not able to distinguish the weight induced by doPortfolio from the weight induced by doBuy. This causes the partitioner to assume a tight binding between getQuote and the database table for both requests doPortfolio and doBuy, when in fact only doPortfolio has a tight dependency with the database.

In an SSM, applying the partitioning strategy would place each and every software function either in the cloud or on premises, leaving no further refinement to be done to the final function placement decisions. Furthermore, conflated dependencies across multiple request types may confuse the partitioner in finding an optimal partitioning and separation of functions in an SSM.

Conflated dependencies are less of a problem in a request-based model. This is because there will be a new instance of each function under every request type, allowing for higher replicability of functions. However, functions are at most replicated as many times as the number of request types in the model, and functions within a particular request will not be replicated.
As a result, the dependency model internal to each request in an RBM would in fact follow an SSM.

3.2.3 Context-Sensitive Model (CSM)

In order to address the problems of the previous two models, we have built our framework on a context-sensitive model of software behavior. In this model each node represents the execution of a function in a distinct calling context. A calling context captures the transitive set of callers of each function execution (i.e., an execution stack). Using the execution of a function rather than a function's static definition as the basis for our framework allows our runtime to execute the same function on premises or in the public cloud depending on the situation. Note, however, that if we simply tried to apply automated optimization to a graph which contained one node for each distinct function execution in a captured execution profile, the graph would grow too large. We build on the assumption that we can capture differences between function executions by considering two executions different only if they occur in different calling contexts.

An illustration inspired by DayTrader is provided in Figure 3.1(c). Now we can see that two copies of getQuote are created, one in the context of getQuotes and one in the context of doBuy. Consider, as is the case for DayTrader, that getQuotes calls getQuote several times in a loop; an average of 15 times, as indicated by the edge weight. If getQuote were executed only in the public cloud, this would cause a large number of round trips over the network, so in an optimal deployment it should be placed on premises where it is close to the data. However, in the context of doBuy, getQuote is only called once, so in this case executing getQuote in the public cloud does not incur extra latency, and yet we can take advantage of the public cloud's pricing benefits.
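The essential difference between the SSM and the CSM can be captured by how node identities are derived from profiled call stacks: by function name alone, versus by the function plus its full calling context. The call stacks below are hypothetical DayTrader-like examples:

```python
# Sketch: SSM identifies a node by the function alone; CSM identifies it
# by the function plus its transitive callers (the calling context).
# The call stacks are hypothetical DayTrader-like examples.
def ssm_node(stack):
    return stack[-1]          # static definition: one node per function

def csm_node(stack):
    return tuple(stack)       # one node per distinct calling context

stacks = [
    ["doPortfolio", "getQuotes", "getQuote"],
    ["doBuy", "getQuote"],
]

ssm_nodes = {ssm_node(s) for s in stacks}
csm_nodes = {csm_node(s) for s in stacks}
# The SSM conflates both getQuote executions into a single node, while
# the CSM keeps two, so each copy can be placed (cloud vs. premises)
# independently.
```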
Unlike previous research, our approach provides the capability for such context-sensitive partitioning. A realistic example of a context-sensitive model, collected by our framework for the DayTrader application, is shown in Figure 3.2. The figure captures the CSM for the doLogin request type in Apache DayTrader.

Figure 3.2: CSM visualization of partitioning results for the doLogin request type (root node of the tree) in the DayTrader example, provided by our framework, Manticore. Yellow nodes (alternatively light gray nodes) represent database tables and all the other nodes represent individual function executions, with dark gray nodes chosen to be placed on premises and white nodes to be placed in the cloud. Note that nodes representing database tables are replicated in the figure for presentation purposes, where in fact the actual graph is a DAG instead of a tree. The figure exemplifies a case where all database tables are restricted to be placed on premises. As the figure shows, with latency being the dominant performance bottleneck, functions are pulled to the private premises when there is frequent database access under the partitioned subgraph (this also depends on other factors such as CPU usage of functions and size of data transferred).

3.3 Defining Constraints for Placement & Cost

As discussed so far in this thesis, minimizing deployment costs while keeping control over placement of particular code and data entities is the main motivation behind hybrid deployments for the cloud. Consequently, enabling software architects and designers to effectively specify their code placement constraints and their deployment costs has been one of our major areas of concentration throughout this research.
To provide enough support for system architects to flexibly define their software placement constraints and the costs associated with deployment of their software to public or private clouds, we have introduced two deployment schemes: i) a Policy Specification Scheme that allows for defining the placement constraints, and ii) a Cost Scheme that allows for specifying deployment costs. In the rest of this section, we describe how these two specifications can be utilized by software architects to define constraints for a hybrid cloud deployment.

3.3.1 Policy Specification Scheme

The policy specification scheme (policy spec) is used to declare the business logic functionality to be included in the DDG; the level of granularity for different entities in the DDG (e.g., methods, classes, etc.); the list of entities that can possibly be ignored; and constraints on co-location or replication of code or data entities. We describe some of the features supported in our policy specification model below:

Changing the level of granularity allows system developers to go from a coarse-grained dependency model (e.g., a request-based model) to finer levels of granularity (e.g., static structure model or context-sensitive model) when creating the DDG.

Possibility to include or exclude modules in the DDG enables software architects to narrow down their focus when analyzing a software system, e.g., by isolating the analysis to particular request types in the application, excluding cross-cutting components of the system (e.g., security or logging components), etc.

Co-location or replication provides flexibility in deciding on component placements, e.g., by restricting the authentication or authorization functions to the private premises, or allowing stateless components in the system to be replicated, etc.

Details of our declarative policy specification can be found in Appendix C.
3.3.2 Cost Scheme

The partitioner also accepts a cost scheme to be used for a hybrid cloud deployment. The cost scheme essentially encodes the hardware capabilities and associated monetary costs offered by a cloud provider, but in a format that can be processed by our framework.

The cost model captures hardware characteristics for the machines hosting the target application. In our cost model, we have included hardware capabilities for which there is a direct monetary cost defined in existing public cloud cost schemes. This includes CPU performance measures, network bandwidth, storage capacities, etc. Details of our cost scheme are presented in Appendix C.2.

The partitioner tool combines all the cost and policy specifications into the representation of the DDG prior to applying a partitioning algorithm to it. These constraints affect the DDG by changing the weights of vertices and edges to reflect the cost metrics rather than the performance metrics. The placement constraints are also modeled as constraints fed to the partitioning algorithm before searching for an optimal solution. All the collected information is utilized to generate a model of the application for the analysis.

3.4 Augmenting Dynamic Dependency Graphs with Cost Metrics

Cost modeling involves associating quantifiable metrics with the execution footprint of a given application. Since the overall time of execution for an application directly contributes to the cost of deployment, a simplistic optimization strategy would be to only minimize the total time of execution. However, execution time does not account for all the costs billed by cloud providers. While solving the optimization problem only with execution times allows for tuning the system for optimal performance, missing cost information prevents any optimization of other deployment costs (e.g., cost of data transfer or cost of data storage).
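As an illustration only, a cost scheme can be thought of as a small structured record mapping priced capabilities to charges; the field names and prices below are hypothetical and do not reproduce the actual scheme of Appendix C.2:

```python
# Hypothetical cost scheme (illustration only; field names and prices
# are invented and do not reproduce the scheme in Appendix C.2).
cost_scheme = {
    "provider": "example-cloud",
    "cost_exec_unit": 0.32,      # $ per T_unit of CPU time
    "T_unit_hours": 1.0,
    "cost_comm_unit_out": 0.12,  # $ per D_unit leaving the cloud
    "cost_comm_unit_in": 0.0,    # inbound data is free
    "D_unit_gb": 1.0,
    "storage_gb_month": 0.10,    # $ per GB-month of storage
}

def outbound_charge(gigabytes):
    """Charge for data leaving the cloud under this scheme."""
    return (gigabytes / cost_scheme["D_unit_gb"]) * cost_scheme["cost_comm_unit_out"]
```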
To reflect the costs of a hybrid deployment, we augment the dependency graphs from Section 3.2 with the cost implications of a hybrid deployment. This is done in two phases:

Phase I: In the first phase, vertex weights and edge weights need to be updated to reflect the overall time of application execution when deployed across private premises and public cloud. For a given vertex v in the dependency graph, we generate two weights indicating the execution time of any corresponding function on premises and in the cloud (based on the CPU capabilities of the host machines). Generating these weights can be done either by profiling the software system on machines both in the public cloud and the private premises, or by applying interpolation techniques (e.g., linear fitting [95]) to estimate its execution time on a target machine (t_exec_v) based on its execution time (t_measured_v) during the profiling phase:

    t_exec_v = t_measured_v × (CPU_target / CPU_source)    (3.1)

where CPU_target represents the CPU capabilities of the target deployment host and CPU_source represents the CPU capabilities of the machine used during the profiling phase.

The edge weights are set to reflect the amount of time it takes for data to be communicated between two components u, v in the application where those components are split between the cloud and the on-premises data center. We utilize the communication latency model introduced in [65] as follows:

    t_comm_u,v = Π + d_u,v / β + λ    (3.2)

where d_u,v indicates the amount of data communicated on the edge e and β is the communication bandwidth between the premises and the cloud provider. The first term (Π) determines the queuing delay for data transfer on the network; d_u,v / β is the data transmission time between the premises and the cloud; and λ is the median measured latency between the private premises and a given public cloud.

Phase II: In the second phase, we include the implications of cloud deployment. The model starts by accounting for the actual cost of deployment
by applying cost schemes offered by a cloud provider. For pricing, we use information from the cost scheme as shown in Program C.2 of Appendix C to update vertex weights of the dependency graph model as follows:

    cost_exec_v = α × cost_exec_unit × (t_measured_v / T_unit) × (CPU_target / CPU_source)    (3.3)

where T_unit represents the time unit for which cloud charges apply, cost_exec_unit indicates the cloud charges for each T_unit, and α is the cost ratio of running each function on a target premises machine versus running it on the cloud machine. α is configurable by the system architect. In our evaluations we change it from 1 to 25 for varied premises deployment costs, evaluating a 0 to 25 times cost saving for public cloud deployment.

Next we model the monetary cost of communication between vertices u and v as follows:

    cost_comm_u,v = γ × cost_exec_unit × (t_comm_u,v / T_unit) + (d_u,v / D_unit) × cost_comm_unit    (3.4)

In the first part of the above equation, we account for charges incurred due to the latency of remote function calls introduced in the hybrid cloud deployment (e.g., extra data exchange, etc.). The second part of the equation accounts for charges directly related to transferring data between functions deployed in the cloud and the ones on premises. In the above formula, D_unit represents the data unit for which cloud data charges apply and cost_comm_unit indicates the cloud charges for each D_unit.

Finally, we introduce γ as a configurable parameter reflecting the effect of latency on the cost of deployment. This allows developers to use our cost model to make flexible trade-offs between monetary cost and latency. The larger the value of γ a developer chooses, the more the algorithm works towards minimizing communication latency and improving the round-trip time, whereas a smaller γ diminishes the effect of latency.
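The two phases above amount to straightforward arithmetic. The sketch below computes Equations 3.1 through 3.4 directly; all CPU ratings, bandwidth, and price figures passed as arguments are placeholder values, not measurements:

```python
# Direct computation of Equations 3.1-3.4; argument values used in any
# call are placeholders, not measured or billed figures.
def t_exec(t_measured, cpu_target, cpu_source):
    # Eq. 3.1: scale profiled execution time to the target host.
    return t_measured * (cpu_target / cpu_source)

def t_comm(d_uv, bandwidth, queue_delay, median_latency):
    # Eq. 3.2: queuing delay + transmission time + median latency.
    return queue_delay + d_uv / bandwidth + median_latency

def cost_exec(alpha, cost_exec_unit, t_measured, t_unit,
              cpu_target, cpu_source):
    # Eq. 3.3: monetary cost of executing a vertex.
    return (alpha * cost_exec_unit * (t_measured / t_unit)
            * (cpu_target / cpu_source))

def cost_comm(gamma, cost_exec_unit, t_comm_uv, t_unit,
              d_uv, d_unit, cost_comm_unit):
    # Eq. 3.4: latency-derived charge + data transfer charge.
    return (gamma * cost_exec_unit * (t_comm_uv / t_unit)
            + (d_uv / d_unit) * cost_comm_unit)
```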
We can formulate γ as in the equation below:

    γ = (T_unit × cost_latency_unit) / (cost_exec_unit × T_latency_unit)    (3.5)

with cost_latency_unit being the monetary cost of perceiving the latency time (T_latency_unit) incurred during an end-to-end execution of a request.

The formulation of Equation 3.5 defines γ in relation to T_unit and cost_exec_unit, allowing latency costs to be tied into the cost measures given in a public cloud provider's cost schema. For example, given a workload of 100 req/sec and a round-trip latency of 10 msec/req, there will be 1 second of latency for every second of system execution. If a software developer defines a cost-to-latency policy indicating that every hour of time wasted on latency (T_latency_unit) is worth $0.32 (cost_latency_unit), and given that cost_exec_unit is $0.32 per hour (T_unit), Equation 3.5 reveals that the value of γ should be set equal to 1 for the cost formulation to account for the overhead.

In a real deployment setting, the value of γ is derived from the cost-to-latency value indicated by the system architect. The system architect can experiment with different cost-to-latency values to reach the appropriate partitioning cost. Upon the introduction of each new cost-to-latency value γ, our framework performs partitioning of the dependency models of the software system and recalculates the estimated monetary cost and performance implications of the deployment.

3.5 Applying the Partitioning Algorithm

As explained in Chapter 2, application partitioning, at its core, is a method for applying mathematical optimization to a dependency model of a given software system with the objectives of minimizing latency and reducing cost. We discussed how vertices and edges in an application's augmented DDG are assigned weights that represent performance measures [45, 78] or monetary costs [58, 67] of entities in the DDG.
For every v ∈ V, the ultimate goal of the partitioning algorithm is to determine whether v should be placed on premises or in the cloud, such that the overall monetary cost of deployment is minimized or performance is maximized. In this section, we describe the mathematical formulation of the DDG into an application partitioning problem.

3.5.1 The Partitioning Algorithm for Symmetric Data Exchange Costs

For each v ∈ V we define cost_exec_v to represent the cost of executing v on premises and cost'_exec_v to represent the cost of executing v in the cloud (see Equation 3.3). We also simplify the definition of Equation 3.4 into the following formula:

    cost_comm_u,v = latency_cost_(u,v) + (d_u↔v / D_unit) × cost_comm_unit    (3.6)

where latency_cost_(u,v) is equivalent to γ × cost_exec_unit × (t_comm_u,v / T_unit) from Equation 3.4, D_unit is the unit of data to which cloud data charges are applied, cost_comm_unit is the cloud charge for each D_unit of data transfer, and d_u↔v represents the data exchanged between vertices u and v. It should be noted that cost_exec_v and cost_comm_u,v can hold either the performance costs or the monetary costs, depending on whether we optimize for performance or monetary cost.

As briefly discussed in Chapter 1, we utilize Binary Integer Programming (BIP) to encode the DDG into an optimization problem. For every node u in the dependency graph we consider a variable x_u in the IP formulation, where the set s refers to entities placed on premises and the set t refers to entities placed in the cloud:

    x_u ∈ {0, 1}
    ∀ x_u ∈ s: x_u = 0    (3.7)
    ∀ x_u ∈ t: x_u = 1

With all the above constraints and cost modeling, the following objective can then be defined:

    min( Σ_{u∈V} (x_u cost'_exec_u + (1 − x_u) cost_exec_u)
       + Σ_{(u,v)∈E} (x_u − x_v)² cost_comm_u,v )    (3.8)

where placement of x_u in the cloud results in cost'_exec_u being charged to the objective function, while its placement on premises results in cost_exec_u being charged to the objective function.
Also, if u and v are on different hosts (i.e., x_u ≠ x_v), the equation above charges the communication cost cost_comm_u,v to the objective function.

The quadratic expression in the objective function of Equation 3.8 can be relaxed by making the expansion suggested in [78]:

    ∀ (u, v) ∈ E: 0 ≤ e_(u,v) ≤ 1
    ∀ (u, v) ∈ E: 0 ≤ e'_(u,v) ≤ 1    (3.9)
    ∀ (u, v) ∈ E: x_u − x_v + e_(u,v) ≥ 0
    ∀ (u, v) ∈ E: x_v − x_u + e'_(u,v) ≥ 0

With the expansion of Equation 3.9, e_(u,v) and e'_(u,v) will both be 0 if x_u and x_v are assigned to the same host; e_(u,v) will be 0 and e'_(u,v) will be 1 if x_u is in t and x_v is in s; and e_(u,v) will be 1 and e'_(u,v) will be 0 if x_u is in s and x_v is in t. Given Equation 3.9, the objective function of Equation 3.8 is converted to the following non-quadratic formulation:

    min( Σ_{u∈V} (x_u cost'_exec_u + (1 − x_u) cost_exec_u)
       + Σ_{(u,v)∈E} (e_(u,v) + e'_(u,v)) cost_comm_u,v )    (3.10)

We refer to the objective of Equation 3.10 as the Symmetric IP in the rest of the thesis, as it does not distinguish between inbound and outbound communication costs.

3.5.2 The Partitioning Algorithm for Asymmetric Data Exchange Costs

With the expansion of Equation 3.9 comes an immediate benefit of being able to determine function placements in relation to one another. Knowing whether two immediate functions are co-located or distributed (i.e., separated between public cloud and private premises), and whether data goes from the function in the public cloud to the function on premises or vice versa, we are able to formulate the asymmetric data exchange charges of a cloud deployment into the partitioning problem (cf. Contribution 4 in Section 1.4). As discussed earlier, public cloud providers have no monetary charges when data goes into their cloud data centers, but they apply charges to the data that leaves their data centers.
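On a toy graph, the Symmetric IP objective of Equation 3.10 (equivalently, the quadratic form of Equation 3.8) can be checked by brute force over all 0/1 placements; the cost numbers and the pinned node below are invented for illustration:

```python
# Brute-force minimization of the symmetric objective on a toy DDG
# (x_u = 0: on premises, x_u = 1: cloud). All cost values are invented.
from itertools import product

nodes = ["a", "b", "c"]
cost_exec_prem = {"a": 5.0, "b": 5.0, "c": 5.0}   # cost_exec_v
cost_exec_cloud = {"a": 1.0, "b": 1.0, "c": 1.0}  # cost'_exec_v
cost_comm = {("a", "b"): 2.0, ("b", "c"): 10.0}   # cost_comm_u,v
pinned = {"c": 0}  # e.g., c touches a database table kept on premises

def objective(x):
    exec_cost = sum(x[u] * cost_exec_cloud[u] + (1 - x[u]) * cost_exec_prem[u]
                    for u in nodes)
    comm_cost = sum((x[u] - x[v]) ** 2 * w for (u, v), w in cost_comm.items())
    return exec_cost + comm_cost

placements = [dict(zip(nodes, bits)) for bits in product([0, 1], repeat=3)]
feasible = [x for x in placements
            if all(x[n] == v for n, v in pinned.items())]
best = min(feasible, key=objective)
# The cheap cut is between a and b: a goes to the cloud while b stays
# with c on premises, since the b-c edge is too expensive to split.
```

A real IP solver replaces this enumeration, but the objective being minimized is the same.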
Being able to extract and encode inbound and outbound data charges into the partitioning algorithm allows for smarter monetary cost optimizations when applying partitioning to a DDG.

Benefiting from the expansion in Equation 3.9, asymmetric billing charges can be added to the communication cost of Equation 3.4 as follows:

\[
cost'^{comm}_{u,v} = \big( e_{(u,v)} + e'_{(u,v)} \big) \times latency\_cost_{(u,v)} + \frac{d_{u \to v}}{D_{unit}} \times cost\_out^{comm}_{unit} \times e'_{(u,v)} + \frac{d_{v \to u}}{D_{unit}} \times cost\_in^{comm}_{unit} \times e_{(u,v)} \tag{3.11}
\]

where d_{u→v} stands for data transfer from the public cloud to the premises with x_u being in the cloud and x_v being on-premises, and d_{v→u} represents data transfer from the cloud to the premises where x_u is on-premises and x_v is in the cloud. Furthermore, cost\_out^{comm}_{unit} represents cloud data charges when data leaves the cloud and cost\_in^{comm}_{unit} represents cloud data charges when data enters the cloud. This formulation, combined with the separation of outgoing and incoming data to an entity during application profiling, leads to cost optimization of the target software service for a hybrid deployment. Following the changes above, the objective function of Equation 3.10 can be updated by replacing (e_{(u,v)} + e'_{(u,v)}) cost^{comm}_{u,v} with cost'^{comm}_{u,v}. In our evaluations, we refer to this new IP formulation as the Asymmetric IP.

As a concrete example of how the asymmetric algorithm works within the context of DayTrader, we provide the code partitioning of Figure 3.3 as suggested by the execution of the asymmetric algorithm when applied to DayTrader's DDG. Figure 3.3 shows a snippet of the DDG for the method doAccount. Let us assume that the accountprofile database table is constrained to stay on-premises, the dominant performance bottleneck is the round-trip network latency, and each edge in the graph is exercised only once.
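The per-edge charge of Equation 3.11 can be sketched directly. The rendering below uses our own names, with the inbound rate defaulting to zero as in the EC2-style pricing assumed later in the evaluation:

```python
# Illustrative rendering of the asymmetric edge cost of Equation 3.11.

def comm_cost_asymmetric(x_u, x_v, latency_cost, d_u_to_v, d_v_to_u,
                         d_unit, cost_out_unit, cost_in_unit=0.0):
    """Edge cost of Equation 3.11 for one edge (u, v).

    d_u_to_v / d_v_to_u are the measured directed data volumes and
    d_unit is the billing unit of data (e.g. one GB).
    """
    if x_u == x_v:          # co-located: e = e' = 0, nothing is charged
        return 0.0
    if x_u == 1:            # u in cloud (t), v on-premises (s): e' = 1
        e, e_prime = 0, 1
    else:                   # u on-premises (s), v in cloud (t): e = 1
        e, e_prime = 1, 0
    return ((e + e_prime) * latency_cost
            + (d_u_to_v / d_unit) * cost_out_unit * e_prime
            + (d_v_to_u / d_unit) * cost_in_unit * e)
```

With a free inbound rate, only the latency term survives for a cut whose billed direction points into the cloud, which is what lets the asymmetric algorithm cut data-heavy edges cheaply.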
Given that premises resources are more expensive than cloud resources and that there are asymmetric data exchange costs for data transfer between the cloud and the premises, the optimal deployment in the scenario of Figure 3.3 needs to reduce data transfer over the network and minimize the number of functions deployed on-premises. In a symmetric partitioning algorithm, separating updateAccountData from doAccount (7 KB of data exchange) is preferred over cutting executeQuery (13 KB of data exchange). In an asymmetric partitioning, however, the algorithm assigns no costs to data inbound to the cloud. As such, cutting executeQuery so that it is pushed to the premises has an equivalent cost overhead (1 KB) compared to cutting updateAccountData. However, there is a gain in cutting executeQuery in that by pushing only this method to the premises, all other code entities in the DDG (the two left branches of updateAccountData) can be placed in the cloud, benefiting from scalable and cheaper resources.

Figure 3.3: DDG and data exchange for the sub-graph of doAccount in DayTrader. Black nodes represent software functions in the graph.

3.6 Evaluation

In this section we evaluate the contributions discussed earlier in this chapter, i.e., context-sensitive dependency modeling, flexible cost modeling, and asymmetric data exchange costs. In our evaluations, we need to analyze whether or not our models capture the behavior of the system effectively. Proper modeling of the system allows the results of the partitioning algorithms to match the real-world behavior of the software under analysis. We also need to analyze whether our context-sensitive model provides advantages over the other existing models, and whether the asymmetric cost models and the cost-to-latency analysis provide any additional advantage when analyzing the system. To evaluate all of this, we have formalized the evaluation points into the following questions:

1.
How do the partitioning algorithms help with optimal hybrid deployment of web applications? (cf. Section 3.6.1).
2. How do our dependency models compare to real-world deployments of the applications under full-cloud or hybrid deployment scenarios? (cf. Section 3.6.2).
3. How does our context-sensitive dependency modeling compare to the request-based or static-structure dependency models when used in application partitioning for the hybrid cloud? (cf. Section 3.6.3).
4. How do our cost model and the cost-to-latency ratio (γ) contribute to an optimal partitioning of web applications for hybrid cloud deployment? (cf. Section 3.6.4).
5. How does application partitioning affect the scalability of the system? (cf. Section 3.6.5).
6. How does application partitioning affect the overall deployment costs of the system? (cf. Section 3.6.6).
7. How does symmetric application partitioning compare to asymmetric application partitioning? (cf. Section 3.6.7).
8. How does the number of tables constrained to be on-premises affect the partitioning performance and monetary costs? (cf. Section 3.6.8).

We evaluate our contributions discussed in this chapter using two different applications: Apache DayTrader [5] and JForum [6]. DayTrader (as discussed in Chapter 3) is a Java benchmark implementation of a stock trading system and has already been used for evaluation purposes in previous cloud computing research [64, 100]. JForum, on the other hand, is a widely used real-world open source discussion board implemented in Java. It allows users with different levels of privilege to contribute to discussions in a forum by logging in (validateLogin), creating profiles (profile), browsing forums (show), listing posts (list), inserting/storing posts (insert, insertSave), quoting posts (quote), editing posts (edit, editSave), bookmarking posts (bookmark), etc.
Compared to DayTrader, which consists of 63 Java classes and almost 6200 LOC, JForum is a bigger application with 347 Java classes and 25.6K LOC. Additionally, JForum consists of 36 database tables and supports close to 100 different business logic functionalities.

Application  Business Logic Functionality  ~Node # in DDG
---------------------------------------------------------
DayTrader    doSell                         299
             doAccountUpdate                114
             doQuotes                        62
             doAccount                       79
             doPortfolio                     73
             doBuy                          293
             doLogin                        176
             Total                         1096
JForum       validateLogin                  345
             edit                           450
             quote                          430
             profile                        310
             list                           570
             insertSave                    1200
             editSave                       500
             insert                         430
             show                           350
             bookmark                       430
             Total                         5015

Table 3.2: The number of nodes in the DDG of DayTrader as well as the top ten largest JForum business logic functionalities.

Due to the large scale of JForum, we limit our evaluation of JForum to the top 10 business logic functionalities with the largest number of code entities in their dependency graphs. Table 3.2 shows the selected business logic functionalities as well as the number of nodes in the DDG of each, for both Apache DayTrader and JForum.

Given the two types of algorithms suggested for application partitioning, we evaluated DayTrader and JForum in the following four deployments: i) both code and data are deployed to the premises (Private-premises); ii) data is on-premises and code is in the cloud (Naïve-Hybrid); iii) data is on-premises and code is partitioned using the symmetric partitioning algorithm (Symmetric-Split-Code); and iv) data is on-premises and code is partitioned using the asymmetric partitioning algorithm (Asymmetric-Split-Code). In addition, in order to compare our simulations with real-world deployments on cloud platforms, we have provided deployments where both code and data are in the cloud (Full-Cloud).

We used the following setup for the evaluation: for the premises machines, we used two 3.5 GHz dual-core machines with 4.0 GB of memory, one as the application server and the other as our database server.
Both machines were located at our lab in Vancouver and were connected through a 100 Mb/sec data link. For the cloud machines, we used a large EC2 instance with 2 EC2 Compute Units and 3.75 GB of memory as our application server and another large instance as our database server. Both machines were leased from Amazon's US West region (Oregon) and were connected by a 1 Gb/sec data link. For the operating system, all machines ran Ubuntu 13.04 with a 64-bit Java Virtual Machine v.7. Our web applications ran on top of the Jetty web server v.8 [8] and used Oracle Database 11g Express Edition [9] as the database servers. The only point of variability among the machines in our deployments was the underlying hardware, as we kept the software stack the same across all machines.

Unless otherwise stated, we ran all the experiments in this section a minimum of five times. We assumed the cost of a large EC2 instance to be $0.32/hour. With the assumption that the cloud machine cost is roughly 80% cheaper than the premises machine cost, we set the cost of leasing a premises machine to $1.50/hour. We also assume the cost of data transfer to be $0.12 per gigabyte where data is going from the cloud to the premises, and $0 per gigabyte^5 where data is going from the premises to the cloud. Our cloud cost schemes match those of Amazon EC2 [2].

3.6.1 Micro-Benchmarks on Performance Improvements

We started evaluating our approach by performing micro-benchmarks on how applying partitioning affects the overall behavior of the applications compared to a Naïve-Hybrid deployment. As discussed earlier, the main advantage of fine-grained profiling and context-sensitive modeling is the ability to track inter-code and code-to-data dependencies at a higher level of detail and to see how partitioning affects the total number of round-trips or the amount of data going over the network for each of the transactions in the application.
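For reference, the cost scheme stated above can be collected into a small helper. This is an illustrative sketch with our own names; the rates are the ones assumed in this section:

```python
# Pricing assumptions used throughout the evaluation (stated above).
CLOUD_HOURLY = 0.32      # large EC2 instance, $/hour
PREMISES_HOURLY = 1.50   # premises machine, $/hour
OUT_PER_GB = 0.12        # data leaving the cloud, $/GB
IN_PER_GB = 0.00         # data entering the cloud is free

def hourly_cost(cloud_machines, premises_machines, gb_out, gb_in=0.0):
    """Hourly charge of a hybrid deployment under the scheme above."""
    return (cloud_machines * CLOUD_HOURLY
            + premises_machines * PREMISES_HOURLY
            + gb_out * OUT_PER_GB
            + gb_in * IN_PER_GB)
```

For example, the four-machine setup above (two cloud machines, two premises machines) moving 1 GB per hour out of the cloud comes to hourly_cost(2, 2, 1.0), i.e. about $3.76/hour.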
As a reference, consider Figure 3.3 of Section 3.5 (also shown below). In the micro-benchmarks presented here, we evaluate how applying any of the partitioning algorithms changes the overall behavior of the application for each request type in terms of inbound/outbound data exchange, and the number of network round-trips caused by separating software functions and database tables in a target application.

^5 We assume fixed flat-rate monthly charges for bandwidth.

Figure 3.4: A high-level model of how different partitioning algorithms choose the placement of code and data components between the public and the private cloud. For each of the three lines, code elements and database tables falling below the line are considered to be placed on-premises and all the other elements are placed in the public cloud.

We omit micro-benchmark evaluations for Private-premises deployments, since in a full premises deployment there is no data exchange and there are no network round-trips. In order to collect micro-benchmark results for each transaction in the applications, we developed network sniffer software that tracks the number of connections and the size of data exchanged between the peers of a hybrid cloud deployment.

Network Roundtrips

In our first set of evaluations we measured changes in the number of network round-trips when moving from a Naïve-Hybrid deployment of JForum to a partitioned version of it (either Symmetric-Split-Code or Asymmetric-Split-Code). A network round-trip was counted when the source and the target of communication were located on two different machines, one on the private premises and the other in the public cloud. In a network round-trip, the request travels from the source machine to the target machine and the response returns, with the request being either a remote procedure call or a database request.
Network round-trips are particularly important as they cause performance degradation in the distributed deployment of a software system, and it is desirable to keep them to a minimum.

Deployment             Request Type   DB Roundtrips  HTTP Roundtrips  Total
Naïve-Hybrid           validateLogin   6              0                6
                       edit            7              0                7
                       quote           9              0                9
                       profile         5              0                5
                       list            9              0                9
                       insertSave     17              0               17
                       editSave       10              0               10
                       insert          3              0                3
                       show            1              0                1
                       bookmark        3              0                3
                       Total          70              0               70
Symmetric-Split-Code   validateLogin   3              2                5
                       edit            1              3                4
                       quote           0              4                4
                       profile         3              1                4
                       list            3              2                5
                       insertSave      6              3                9
                       editSave        4              1                5
                       insert          0              1                1
                       show            1              1                2
                       bookmark        3              0                3
                       Total          24             18               42
Asymmetric-Split-Code  validateLogin   6              0                6
                       edit            2              3                5
                       quote           2              1                3
                       profile         2              1                3
                       list            3              2                5
                       insertSave      5              3                8
                       editSave        3              2                5
                       insert          5              2                7
                       show            3              2                5
                       bookmark        0              2                2
                       Total          23             17               40

Table 3.3: Micro-benchmarks on the number of round-trips for database and RPC invocations between the public cloud and the private infrastructure for various application deployments to the cloud.

As shown in Table 3.3, moving from a Naïve-Hybrid deployment to a Symmetric-Split-Code or Asymmetric-Split-Code deployment reduces the average number of overall round-trips between the private cloud and the public cloud by 40%. The table shows that even though a hybrid deployment results in extra HTTP message exchanges between the public and the private cloud, it reduces the number of database round-trips by 65.7% by placing code and data closer to one another.

Average Data Exchanges

In another benchmark, we measured the amount of data exchanged between the public cloud and the private infrastructure for every request type. Figure 3.5 shows this measured data. In the figure, Http-In is the average amount of application HTTP traffic entering the public cloud and Http-Out is the average amount of application HTTP traffic leaving the cloud. Similarly, DB-In and DB-Out show the average amount of database traffic entering and leaving the cloud, respectively.
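The round-trip reductions quoted above can be recomputed from the totals row of Table 3.3; this is plain arithmetic on the published numbers, not thesis code:

```python
# Recomputing the reductions reported for Table 3.3 from its totals.

def reduction(before, after):
    """Relative reduction, in percent."""
    return 100.0 * (before - after) / before

naive_total, symmetric_total = 70, 42   # all round-trips
naive_db, symmetric_db = 70, 24         # database round-trips only

print(round(reduction(naive_total, symmetric_total), 1))  # 40.0
print(round(reduction(naive_db, symmetric_db), 1))        # 65.7
```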
Given the fact that public cloud providers impose monetary charges based on the amount of data leaving the cloud, an effective optimization strategy tries to minimize the amount of data going out of the public cloud. As Figure 3.5 shows, the Symmetric-Split-Code algorithm reduces the overall amount of data exchanged between the cloud and the private premises by 36%, with a reduction of 33.2% in the total amount of outgoing data from the cloud. The Asymmetric-Split-Code algorithm, on the other hand, reduces the total amount of data exchange by 11.4%, but achieves a reduction of 49.8% in the amount of outgoing data from the cloud to the private premises. The Asymmetric-Split-Code has 28.2% more data exchange compared to the Symmetric-Split-Code algorithm; however, when it comes to the amount of outgoing data, the Asymmetric-Split-Code has nearly 25% less data transferred out of the cloud. Given that outgoing data exchanges impose extra monetary costs on a hybrid cloud deployment, the Asymmetric-Split-Code algorithm results in more cost savings for the deployment. It should be noted in the figure that, for the case of a Naïve-Hybrid deployment (the bottom two pairs of bars in Figure 3.5), no HTTP inbound or outbound data is observed. This is because the Naïve-Hybrid deployment assumes a separation of code components from data entities and does not separate code components from one another, and hence does not generate any HTTP traffic.

Table 3.4 shows more details on the amount of inbound and outbound HTTP/database data exchanged for each request type in JForum. The table shows that, for 9 out of 10 request types analyzed, partitioning of the code
using the Split-Code algorithms results in observed HTTP network traffic. However, the overall amount of data exchange is reduced when a partitioning algorithm is used for hybrid deployment. Also, as the table shows, the asymmetric data exchange algorithm, while increasing the data transferred inbound to the public cloud, reduces the size of data leaving the cloud. This justifies the extra cost savings we observe when using the asymmetric partitioning algorithm compared to the symmetric algorithm (cf. Section 3.6.7).

Figure 3.5: Inbound and outbound data exchanges for Naïve-Hybrid versus partitioned deployment models.

To summarize, the analyses show that, compared to a naïve hybrid deployment, both symmetric and asymmetric split-code partitioning result in a smaller number of network round-trips and less data transfer between the public and the private clouds. Asymmetric split-code partitioning achieves further savings by decreasing the size of cloud-outbound data transfer.
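The data-exchange percentages quoted above can likewise be rechecked against the totals row of Table 3.4 (again, plain arithmetic on the published numbers):

```python
# Recomputing the data-exchange reductions from the totals of Table 3.4.

def reduction(before, after):
    return 100.0 * (before - after) / before

# Totals row of Table 3.4 (bytes per request).
naive      = {'http_out': 0,        'db_out': 35607,   'total': 91877}
symmetric  = {'http_out': 15309,    'db_out': 8481,    'total': 58452.6}
asymmetric = {'http_out': 11756.27, 'db_out': 6090.71, 'total': 81411.01}

def outbound(d):
    return d['http_out'] + d['db_out']

print(round(reduction(naive['total'], symmetric['total']), 1))    # 36.4
print(round(reduction(outbound(naive), outbound(symmetric)), 1))  # 33.2
print(round(reduction(naive['total'], asymmetric['total']), 1))   # 11.4
print(round(reduction(outbound(naive), outbound(asymmetric)), 1)) # 49.9
```

The text rounds the first figure to 36% and the last to 49.8%.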
Deployment             Request Type   Http-In    Http-Out   DB-In     DB-Out   Total
Naïve-Hybrid           validateLogin  0          0          6620      322      9843
                       edit           0          0          8147      3783     11930
                       quote          0          0          11047     4916     15964
                       profile        0          0          3611      2618     6230
                       list           0          0          5464      4331     9795
                       insertSave     0          0          8041      7926     15967
                       editSave       0          0          6574      4974     11548
                       insert         0          0          4346      1774     6120
                       show           0          0          515       605      1120
                       bookmark       0          0          1903      1457     3360
                       Total          0          0          56268     35607    91877
Symmetric-Split-Code   validateLogin  3492       1150       809       742      6194
                       edit           3033       1774       337.6     214      5458.6
                       quote          4873       2274       0         0        7279
                       profile        1824       638        1510      1159     5132
                       list           4444       1675       1008      1062     8191
                       insertSave     3635       5651       1562      2218     13068
                       editSave       1264       1031       1108      1449     4853
                       insert         1994       949        0         0        2943
                       show           1497       167        361       444      2471
                       bookmark       0          0          1911      1193     3105
                       Total          26056      15309      8606.6    8481     58452.6
Asymmetric-Split-Code  validateLogin  3327.875   1082.3125  430.9375  369.375  5210.5
                       edit           1248.95    928        558.75    532.25   16508.5
                       quote          1437.75    1447.5     559.5     565.25   14937
                       profile        1634.94    621.26     1548.47   985.26   4989.94
                       list           4027.125   1514.075   1162.575  539.225  7943
                       insertSave     4921.75    4428.3     2056.125  837.83   14944.04
                       editSave       11341      686.5      809.5     994.5    16931.5
                       insert         1701.2     998.6      0         0        2699.8
                       show           410.34     49.69      317.91    372.52   1150.47
                       bookmark       0          0          1889      894.5    3083.5
                       Total          54231.24   11756.27   9332.77   6090.71  81411.01

Table 3.4: Micro-benchmarks on the amount of data exchanged between the public cloud and the private cloud for various hybrid deployments.

3.6.2 Cost Models vs Measured Deployments

In order to verify the accuracy of the cost models, we compare the execution and data transfer measurements of the generated dependency models to those of real deployments for DayTrader. Since our profiling data was collected on the premises machine, the models and the real deployment for the machine on-premises are identical. In a hybrid or a full cloud deployment, the models are generated by applying linear fitting techniques as described in Section 3.4.
We compared the generated cost models with the real deployment of the DayTrader application for settings where the deployment is fully in the cloud (Full-Cloud) or partitioned using the asymmetric algorithm (Asymmetric-Split-Code). Figure 3.6 compares the results, averaged over 1000 requests to each request type over 5 runs.

Figure 3.6: Simulated and measured execution times for hybrid and cloud code placements for each DayTrader request type.

As can be seen from Figure 3.6, the execution times for the generated models in the case of Asymmetric-Split-Code are 81.3% similar in average accuracy to the actual deployments (σ=0.13), while the models for Full-Cloud deployments are 86.1% similar (σ=0.052) to a practical deployment. Similarly, we compared the actual data transfer to the modeled data transfer (cf. Figure 3.7). For data transfer, the Asymmetric-Split-Code model provides 87.4% average estimation accuracy compared to the real deployment (σ=0.061), while for a Full-Cloud deployment we get 86.3% average estimation accuracy (σ=0.076).

In summary, the results show a variance of ±15% for our generated models compared to real-world deployments of the application. The results verify that partitioning of real-world applications based on the simulated models leads to real-world hybrid deployments whose cost and performance measures fall within the boundaries of the estimated cost and performance of the simulated models.
In the following sections, we discuss real deployments of DayTrader and JForum, and as the results show, using cost models of this accuracy can provide benefits in actual deployments.

Figure 3.7: Simulated and measured data transfer sizes for hybrid and cloud code placements for each DayTrader request type.

3.6.3 Evaluation of Context-Sensitive Modeling

Next, we evaluate the three dependency models from Section 3.2 to see which one contributes to a performant deployment under varied premises cost and fixed cloud cost. We used Manticore to decide about function placements for DayTrader as the cost of deployment to a premises machine changes linearly from $0.16 per hour to $2.5 under an expected load of 100 requests per second. For each cost value, we took the mapping of function executions suggested by Manticore for each of the models and physically deployed it. We then measured the average perceived latency across all the requests to see how the partitioning affects the overall performance of the system. Results are shown in Figure 3.8. The line at the top of the graph shows a full deployment of the code to the cloud, whereas the bottom line indicates a full premises deployment.

We observe that for premises costs greater than $0.16/hour, SSM yields a full deployment to the cloud (cf. Figure 3.8). This is because this model does not account for function replication, and thus separation of code elements would introduce network round-trips and increase the communication latency compared to a full cloud deployment.
To prevent this, upon an increase in premises cost, the analysis framework chooses to have all the code in the cloud and only pay the communication latency when it comes to retrieving data from database tables.

Figure 3.8: Comparison of latency adjustments for the SSM, RBM, and CSM as the premises cost changes.

For RBM, the dependency model consists of subgraphs separated by request types, and as a result there is freedom in the model to choose code placements based on request types. Compared to the SSM, this model tolerates changes to the cost of deployment on-premises and compensates between cost and performance by pushing the code for different request types from the premises to the cloud, to the point where all the code is in the cloud. Figure 3.8 shows that for RBM, a full cloud deployment happens when premises charges exceed $0.50/hour.

In CSM, the fine level of granularity in the model allows for replication of code while enabling the partitioning algorithms to find the edges with the lowest performance overhead to cut. As a result, unlike SSM and RBM, cuts made in the CSM do not solely separate code from data but also separate code units with low communication overhead from one another, thus improving the costs of the deployment. In CSM, a full cloud deployment happens when premises charges exceed $2.15/hour, which leads to higher performance in the deployment.

To summarize Figure 3.8, when the cost of deployment to the premises is cheap (left part of the graph), all three models choose to have the code deployed on-premises. Similarly, when the cost of deployment to the premises is expensive, all three models choose to push all the code to the cloud. However, between these two ends of the spectrum, CSM provides latency-aware deployments by not only separating code from data but also separating code units with lower performance overhead from one another. As a concrete example,
As a concrete example,for the premises resource cost of $0.48/hour, a CSM -based deployment has46% less latency compared to SSM and 25.3% less latency compared toRBM .3.6.4 Evaluation of Flexible Cost ModelingPerformance degradation is one of the biggest concerns when it comes toseparation of code from data or code from code for a distributed applicationdeployment. The degradation is concerned with the extra latency addedto the overall execution process as a matter of having the data sent andreceived over the network. In this section we show how by changing γ fromEquation 3.5, we allow for latency and cost to be traded for one another.663.6. EvaluationWith a measured median round-trip communication latency of 15msbetween the cloud machine and our premises machine over 100 trials,we linearly increased γ from 0.5 to 15 (i.e. ×30) to see how it affectsfunction placement decisions and the corresponding deployment costs. Tomeasure deployment costs, we set α = 5 representing 80% cost saving(cf. Equation 3.3). Table 3.5 shows code placement decisions made bythe partitioning framework for different request types in Apache DayTrader.As shown in Table 3.5, with γ = 0.5, the partitioning algorithm choosesto have all functions in the cloud (Na¨ıve-Hybrid) where the deploymentwould be the cheapest (low γ favors monetary cost over latency). As γincreases, the algorithm pushes more functionality to the premises in orderto accommodate a performant deployment, either by making a cut in thecode (Asymmetric-Split-Code) or by pushing the entire code for a requesttype to the premises (Private-premises). Figure 3.9 shows the increase incost of deployment as the functions are pushed to the premises. 
As we already discussed in Section 3.6.4, the cost of deployment increases as the partitioner moves more code entities to the premises.

Request Type     γ = 0.5       γ = 1.5           γ = 5             γ = 15
doLogin          Naïve-Hybrid  Asym-Split-Code   Asym-Split-Code   Private-premises
doBuy            Naïve-Hybrid  Asym-Split-Code   Asym-Split-Code   Private-premises
doPortfolio      Naïve-Hybrid  Private-premises  Private-premises  Private-premises
doAccount        Naïve-Hybrid  Naïve-Hybrid      Private-premises  Private-premises
doQuotes         Naïve-Hybrid  Naïve-Hybrid      Private-premises  Private-premises
doAccountUpdate  Naïve-Hybrid  Naïve-Hybrid      Private-premises  Private-premises
doSell           Naïve-Hybrid  Asym-Split-Code   Asym-Split-Code   Private-premises

Table 3.5: Code placement for different request types in Apache DayTrader as Gamma (γ) changes from 0.5 to 15.

Figure 3.9: Monetary cost of deploying requests of various types for DayTrader with respect to changes in Gamma (γ).

Comparing each deployment with its predecessor and successor deployments illustrates that changing γ for the partitioning algorithm can balance between the cost and the performance of the deployed application. For example, the deployment when γ equals 5 is (on average) 50.9% more expensive than the successor deployment suggested when γ equals 1.5, yet its average performance is only 21.7% slower. However, the deployment suggested when γ equals 5, compared to its predecessor deployment when γ equals 15, is on average 24.6% slower but 2.6% more cost effective.

In a nutshell, γ allows system architects to establish a relation between the cost of the deployment and the level of perceived latency. The higher the value of γ, the higher the cost of latency, and the higher the chances of collocating code and data.
Considering γ as a tuning knob, software developers are able to decide on the proper value of γ for their distribution model based on the latency and cost policies of their system.

3.6.5 Evaluation of Scalability with Code-only Partitioning

We also performed a scalability analysis for DayTrader to see how different code placement choices affect application throughput. DayTrader comes with a client workload generator that models user behaviors for browsing and purchasing stocks. For the deployment tests, we used a range of 500 to 4500 simulated clients over a period of 5 minutes with a 1 minute ramp-up time, and measured throughput. Figure 3.10 shows the throughput of the system under varied user load. As shown in the figure, the machine on the premises reaches the highest throughput of 384 req/sec when the number of user threads sending requests is 2500. For 2500 user threads, Private-premises has 18% better throughput compared to both the Naïve-Hybrid and Asymmetric-Split-Code deployments. Once we passed this threshold, the premises machine became overloaded and was unable to properly handle incoming requests, to the point that for 4500 user threads the Naïve-Hybrid and Asymmetric-Split-Code deployments reached 32% better throughput compared to the Private-premises deployment.

Figure 3.10: Scalability tests for full premises, full cloud, and hybrid deployments.

Finally, we notice that the hybrid and cloud deployments had similar scalability. However, as shown in Section 3.6.3, a hybrid deployment using our CSM provides better response time when the CPU is not a bottleneck. So for the DayTrader case study there is an extra advantage in using partitioning for hybrid cloud deployment over simple public cloud or private premises deployments.
3.6.6 Evaluation of Deployment Costs

Next, we evaluate the effects of combined code and data partitioning on the estimated monetary deployment costs (see [67]) of DayTrader and JForum (Figures 3.11 and 3.12) under various deployment models. For DayTrader, both the symmetric and asymmetric partitioning algorithms generate the same partitioning suggestions, and thus Symmetric-Split-Code and Asymmetric-Split-Code overlap.

As shown in both graphs, a full deployment of the web applications on-premises (Private-premises) results in a deployment cost increase linear in the increase in premises machine costs, rendering such deployments expensive. As shown in Figure 3.11, in the case of DayTrader with premises machines costing $0.5/hour, the premises deployment is 38% cheaper than Naïve-Hybrid, 34% cheaper than Symmetric-Split-Code, and 31% cheaper than Asymmetric-Split-Code. However, for premises resource costs of $2.0/hour, the premises deployment is 35% more expensive than any of the hybrid deployments.

In the cases of Symmetric-Split-Code and Asymmetric-Split-Code, an increase in the costs of machines on-premises results in the algorithm pushing more code to the cloud, to the point where all the code is in the cloud and all the data is on-premises. In such a situation, both Symmetric-Split-Code and Asymmetric-Split-Code eventually yield a deployment identical to the one suggested by Naïve-Hybrid; that is, pushing all the code to the cloud while all the data inevitably needs to be on-premises.

For JForum, the trend in deployment costs (see Figure 3.12) is similar to DayTrader. With cheap premises resource costs, the premises deployment is the cheapest of all deployments. However, a linear increase in the cost of premises resources increases the cost of deployment linearly as well, to the point that it becomes more expensive than the hybrid deployments. For a premises cost of $1.5/hour, the Symmetric-Split-Code and Asymmetric-Split-Code deployments are 15% cheaper than Private-premises and Naïve-Hybrid.
But eventually all three hybrid deployments converge by pushing all the code to the cloud while keeping all the data on-premises. Also, there are differences between the deployment costs for JForum and DayTrader. Since JForum is a larger application (4.6 times more functions in its dependency graph), the overall cost of deploying JForum for each model is comparatively higher than for DayTrader. Furthermore, in JForum, partitioning of code and data results in a fine-grained transition of code modules from the premises to the cloud, and as a result changes in premises cost lead to linear changes in the overall deployment cost. However, in DayTrader, due to the smaller size of the application (4.6 times smaller), less code is moved to the cloud. Finally, as shown in Figure 3.12, when we only partition code, there is a difference of 11% between the costs of deployment for Symmetric-Split-Code and Asymmetric-Split-Code. This is because, due to the size of JForum, any optimization that results in relocating code and data deploys more code in the cloud and benefits from cloud resource costs.

Figure 3.11: Comparison of monthly deployment costs for different deployment models of DayTrader. The cost is measured by detailed profiling of application resource usage for a 10 minute period over 5 runs, interpolating resource usage to a month of deployment, and then applying the cost metrics from the cost scheme to the measured resource usage.

When measuring the costs of deployment, we rely on the assumption that resource costs on-premises grow at a higher rate compared to the cost
of resources in the cloud. In fact, in the years since the start of Amazon Web Services, the resource costs for the public cloud have only gone down. With growing costs of the premises cloud, deployments dependent on the premises cloud experience an increase in their overall deployment costs. However, the results show that using partitioning algorithms for hybrid deployments lessens the dependency of a deployment on premises resources, and as such their deployment costs are less affected by growth in premises deployment costs.

Figure 3.12: Comparison of monthly deployment costs for different deployment models of JForum. The cost is measured by detailed profiling of application resource usage for a 10-minute period over 5 runs, interpolating resource usage to a month of deployment, and then applying the cost metrics from the cost scheme to the measured resource usage.

3.6.7 Evaluating Symmetric vs. Asymmetric Partitioning Algorithms

In this section we evaluate how the symmetric and asymmetric partitioning algorithms affect data exchange patterns and so lead to different partitioning choices. As discussed in Section 3.5.2, the asymmetric algorithm improves performance by loosening dependencies between code and data entities. The loosening happens by reducing the monetary charges for data exchange (i.e., the edge weights) in the dependency graph, thus allowing the partitioning algorithm to separate graph nodes from one another with less penalty. This allows code entities that are tightly bound together, due to their data dependencies, to be freely partitioned.
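To make the edge-weight intuition concrete, the following toy calculation contrasts the two weighting schemes. This is a minimal sketch: the transfer volumes and per-GB rates are hypothetical, and it assumes the symmetric scheme prices both transfer directions at the higher outbound rate, whereas the asymmetric scheme prices each direction at its own rate.

```python
# Sketch of symmetric vs. asymmetric edge weighting (hypothetical numbers).
# An edge between a code node and a data node carries traffic in both
# directions; cloud-inbound traffic is typically much cheaper than outbound.

OUT_RATE = 0.12   # $/GB leaving the cloud (assumed rate)
IN_RATE = 0.01    # $/GB entering the cloud (assumed rate)

def edge_cost(gb_to_cloud, gb_from_cloud, asymmetric):
    """Monetary penalty for cutting this edge (placing its endpoints apart)."""
    if asymmetric:
        # charge each direction at its true rate
        return gb_to_cloud * IN_RATE + gb_from_cloud * OUT_RATE
    # symmetric: every byte crossing the cut is charged at the higher rate
    return (gb_to_cloud + gb_from_cloud) * OUT_RATE

# A code node that sends 10 GB/month toward the cloud and receives 1 GB back.
print(f"symmetric cut penalty:  ${edge_cost(10, 1, asymmetric=False):.2f}/month")
print(f"asymmetric cut penalty: ${edge_cost(10, 1, asymmetric=True):.2f}/month")
```

Under the asymmetric weighting, the penalty for cutting a code-data edge whose traffic is dominated by cloud-inbound transfer drops sharply, which is what lets the optimizer move such code entities to the cloud.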
However, for the asymmetric algorithm to work effectively, the application under analysis needs to satisfy one of the following conditions: i) it needs to be data intensive, implying heavy data dependencies between its software entities (code and data), or ii) its business logic needs to be computation intensive, so that relocation of its code entities would result in performance changes.

DayTrader and JForum, as 3-tier web applications, do not fall within the category of data-intensive applications. Moreover, as we already showed, DayTrader is a small application (see Table 3.3) and relocation of its code entities does not necessarily imply changes in its performance (cf. Figure 3.11). For JForum, however, relocation of code entities may result in cost changes. As shown in Figure 3.12, in JForum these cost changes can result in better performance and cheaper deployments for Asymmetric-Split-Code compared to Symmetric-Split-Code.

Figure 3.13 shows how the partitioning decisions made using the asymmetric algorithm have different data exchange patterns compared to the symmetric algorithm. In particular, Figure 3.13 shows that even though partitioning using the asymmetric algorithm increases data transfer into the cloud by a factor of 1.5, it reduces the amount of data leaving the cloud to 0.25 of its previous volume, and more importantly it allows for relocation of code entities from the premises to the cloud. In JForum this results in a cost saving of 11% when using Asymmetric-Split-Code compared to using Symmetric-Split-Code.

In a nutshell, the results show that the asymmetric partitioning algorithm achieves improvements in cost by reducing the size of data outbound from the cloud.

Figure 3.13: Measured data exchanged between the cloud and premises after using the symmetric vs. asymmetric partitioning algorithms of Section 3.5.2.
In each bar, the left part shows data inbound to the cloud and the right part shows data outbound from the cloud. Requests partitioned using Asymmetric-Split-Code have a smaller outbound data exchange (the right part of the bar) than when those requests are partitioned using Symmetric-Split-Code.

Even though Asymmetric-Split-Code partitioning may result in larger volumes of data going to the cloud, due to the lower cost of inbound data transfer the overall deployment is cheaper.

3.6.8 Evaluating Data Entity Placement Constraints

So far, we have made specific assumptions about which database tables are constrained to be on-premises for DayTrader and JForum. Our criteria for constraining database tables have been security and privacy concerns that may lead to data protection issues. In this section we show how different choices of table placement constraints can change the deployment models and their associated costs.

To do so, for DayTrader, we started by placing all five tables (accountprofile, account, holding, order, and quote) in the cloud. Then, following the order above, we constrained the tables to be pushed to the premises one by one, to the point where all the tables were placed on-premises. Figure 3.14 shows how the placement of code entities changes as more data entities are pulled to the premises.

As shown in Figure 3.14, when all the tables are in the cloud, all the code is also pushed to the cloud. However, as tables are pinned to the premises, code entities with tightly coupled data constraints migrate to the premises. We noticed that when we constrained accountprofile to be on-premises, its data dependencies with the account table pulled that table to the premises as well.
For this reason, the data shown in Figure 3.14 when only the accountprofile table is pinned to the premises is identical to when both accountprofile and account are pinned to the premises.

Our cost estimates for Asymmetric-Split-Code deployments with a varying number of constrained tables are shown in Table 3.6. The results show that by moving only the quote table to the cloud, a cost saving of 20% can be achieved compared to having all the tables on premises. On the other hand, pinning only accountprofile and account to the premises results in 33% cost savings compared to pinning all tables to the premises. However, the placement choices for holding and order did not appear to affect the overall deployment costs.

The results from this evaluation show that minimizing data placement constraints lowers deployment costs. More importantly, our analysis shows that proper placement decisions for particular data entities (e.g., quote in DayTrader) affect the overall cost of deployment. This further necessitates in-depth analysis of code and data dependencies when doing hybrid cloud deployments. We show in Chapter 4 how a thorough analysis of code and data dependencies, combined with cross-tier partitioning of code and data, can lead to improvements in the overall cost and performance of a hybrid deployment.

Figure 3.14: Ratio of module placement in the cloud (upper bar) to the premises (lower bar) when changing the number of tables pinned to the premises.

3.7 Summary

In this chapter, we proposed an extension to existing application partitioning techniques to provide for hybrid deployment of web applications. The evaluation on DayTrader showed that the new approach can contribute towards an optimized hybrid cloud deployment.
In particular, it showed that: we are able to reduce network round-trips and data exchange outbound from the cloud by using partitioning algorithms (cf. Section 3.6.1); the costs of a hybrid deployment extrapolated from monitoring a single-host test version of the service were at least 81.3% accurate (cf. Section 3.6.2); the context-sensitive modeling of service behavior provided a better representation to optimize placement of software function execution (cf. Section 3.6.3); our formulation of the objective function for optimization allows developers to tune the tradeoff between end-to-end round-trip time and deployment costs (cf. Section 3.6.4); and a hybrid deployment, while providing similar scalability to a full cloud deployment, offers better round-trip latency (cf. Section 3.6.5) and is cheaper overall (cf. Section 3.6.6).

  # Tables Pinned to Premises    Deployment Cost ($/Month)
  1                              1032.86
  2                              1032.86
  3                              1109.12
  4                              1230.38
  5                              1539.16

Table 3.6: Deployment costs for DayTrader as the number of tables pinned to the premises changes.

Throughout this chapter we made specific assumptions about the nature of the web applications we analyzed that affect our process of profiling, modelling, and partitioning. Here we describe some of those assumptions, enumerate threats to validity, and describe solutions or potential directions for future research:

• We assumed that web applications are collections of stateless software functions. This implies that none of the software functions in the application should keep an internal state (i.e., a precedent event) on which the forthcoming sequence of interactions depends. While this assumption holds for the web applications we analyzed, there could be situations where the statelessness constraint is violated by some software functions in the application. Our current strategy to handle such cases is to manually inspect and identify functions in the application that keep internal state and avoid their replication.
As part of future work this could be further automated by benefitting from techniques such as static analysis [85] or dynamic data dependency analysis in order to automate the process of identifying stateful software functions. The internal states can then either be converted into a shared state to be accessed by different replicas of a software function, or such stateful software functions can be constrained from being replicated.

• For the deployment machines, we assumed all the machines in the cloud to be homogeneous. As such, we assumed that the overall behaviour of the application, once replicated from one instance to another, will stay the same as long as those instances are identical in their type. However, existing research shows that the assumption of homogeneity of replicated instances, while true in theory, does not hold in practice. As part of our future work we aim to look into techniques to encode the heterogeneity of machines into the modelling and partitioning algorithms to allow for more accurate hybrid deployment of software applications in the cloud.

• The partitioning algorithms discussed in this chapter only allow for two-way partitioning of web applications. In our case this is useful because it allows for optimal partitioning between the public cloud and the private infrastructure. However, two-way partitioning relies on an assumption related to the previous point, i.e., that all machines in a given partition are homogeneous. It is due to this homogeneity assumption that we can estimate the deployment costs for the machines in each partition. As mentioned above, this assumption may not hold in the real world.
As such, we intend to look into existing algorithms for multi-way partitioning and investigate how their combination with modelling of heterogeneous machines could improve cost and performance in hybrid cloud deployments.

• Finally, for our cost model we focused on capturing the cost of using CPU resources and the cost of data transfer. While our current cost model is in line with the cost metrics provided by cloud providers, we acknowledge that the cost of a hybrid deployment can also depend on other factors such as the cost of data storage, the cost of cooling, CO2 emissions, etc. Even though our current cost model does not deal with these extra cost parameters, the flexibility encoded into our cost model allows these parameters to be declaratively included in the model. These factors can then either be related to performance metrics we collect from the application or be added as a constant to the overall cost of deployment, where they will come into effect during the optimization process for hybrid deployment.

Chapter 4

Data Dependency Modeling and Data-tier Partitioning

In a real-world software system, partitioning for a hybrid cloud deployment requires reasoning about an optimal placement of software functions and database tables across all request types in the web application. In the previous chapter, we discussed how a combination of code dependency modeling and application-tier partitioning allows for separation of software functions from one another. In this chapter we explain how application-tier partitioning is augmented with data-tier partitioning in order to improve performance and cost for a hybrid cloud deployment.

To motivate combined application-tier and data-tier partitioning, we again use the DayTrader doLogin example as a running scenario throughout this chapter. Figure 4.1 shows the output of our cross-tier partitioning for doLogin.
The figure shows the call-tree of function executions in the application-tier and the dependencies between the application-tier and the data-tier. For each database query, the data-tier dependencies are represented in the form of a dependency graph, known as a query plan. The query plan provides an ordered set of steps describing how data is fetched from database tables in order to respond to the target query. In a query plan, database operands (i.e., tables) and operators (e.g., join or select) form the nodes of the graph, and their relations within the context of the target query form the edges.

Figure 4.1: A cross-tier partitioning suggested by our tool for the doLogin request from DayTrader, showing a partitioned application- and data-tier: nodes from the data-dependency graph on premises (black square nodes), nodes from the data-dependency graph in the cloud (white square nodes), functions from the code-dependency graph on premises (black circle nodes), and functions from the code-dependency graph in the cloud (white circle nodes). Here we only show the full dependency graph for doLogin.
In Figure 4.1 (also at a higher level in Figure 1.1), we see four categories of components in three regions separated by dashed lines:

• Nodes from the data-dependency graph placed on premises, shown as black squares in Regions (A) and (C) of the figure.

• Nodes from the data-dependency graph placed in the cloud, shown as white squares in Region (B) of the figure.

• Functions from the code-dependency graph placed on premises, shown as black circles in Region (A) of the figure.

• Functions from the code-dependency graph placed in the cloud, shown as white circles in Region (B) of the figure.

We use each of these four categories to motivate cross-tier partitioning and achieve the following objectives.

First, some data is not suitable for deployment in the cloud due to privacy concerns or regulations [58]. Thus, many enterprises need to avoid committing the deployment of certain data to the public cloud, and instead host it on private premises infrastructure. For example, in DayTrader, given that user information is privacy sensitive, the tables account and accountprofile in Figure 4.1 are constrained to stay on premises (black square nodes in Regions (A) and (C) of the figure).

Second, function execution requires CPU resources, which are generally cheaper and easier to scale in the public cloud (some reports claim 80% savings using the public cloud versus on-premises private systems [75]). Thus, placing function execution in the public cloud is useful to limit the amount of on-premises infrastructure. So, without regard to other factors, we would want to execute application-tier functions in the cloud.
For the example of Figure 4.1, this would include the majority of functions involved in the doLogin request type (i.e., white circle nodes in Region (B)).

Third, since we would like to deploy functions to the cloud, the associated non-sensitive data tightly bound to those functions should also be deployed to the cloud; otherwise we will incur excessive latency and bandwidth usage. So there is motivation to move non-sensitive data to the cloud. However, such non-sensitive data may be bound to sensitive data through queries that operate over both. For this reason, moving non-sensitive data to the public cloud is not always a winning proposition. We need an analysis which, on one hand, can reason about the benefit of moving data closer to functions executing in the public cloud and, on the other hand, considers the drawbacks of pulling non-sensitive data away from the sensitive data on premises. Figure 4.1 shows how the holdings table is marked as non-sensitive and thus is allowed to be placed in the cloud (i.e., white square nodes in Region (B) of the figure).

Finally, executing all functions in the public cloud is not always beneficial. Some functions are written as transactions over several data resources. Such functions may incur communication overhead if they execute in the public cloud but operate on private premises data. Furthermore, choices of function placement have a propagating effect on decisions about where to place non-sensitive data, and vice versa. So the benefit of executing functions in the cloud or on the premises needs to be balanced against the communication overhead incurred by the placement decisions made for the corresponding data.
In Figure 4.1, the black circle nodes of Region (A) show how constraining the account and accountprofile tables to the premises infrastructure pulls the code closely dependent on those database tables to the premises infrastructure as well.

These four objectives help to illustrate the inter-dependencies between the application-tier and the data-tier. In the case of the doLogin request of Figure 4.1, a developer may manually arrive at a similar partitioning with only minor effort. The developer might look into factors such as the number of round-trips over communication links (i.e., edges in the graph), the CPU usage of each function (i.e., nodes in the graph), and the privacy-sensitivity of each node in the graph, and decide how to make a cheaper hybrid partitioning of the graph. However, in a real-world software application with hundreds to thousands of request types (each possibly containing hundreds to thousands of function and data nodes in their dependency graphs), partitioning an entire application requires developers to simultaneously reason about the effects of component placement across all request types. Without proper tooling and automation, the process of dependency analysis and partitioning is not manageable.

This chapter is summarized and published in the Usenix Middleware Conference 2013 [68]. In the rest of this chapter we explain how we extend application-tier partitioning with data-tier partitioning to achieve a full cross-tier partitioning strategy for hybrid cloud deployment. For the example application of DayTrader, through cross-tier partitioning we will be able to take the dependency graph of Figure 3.2, augment it with extra information on data dependencies, and apply our partitioning algorithms, in order to achieve a full dependency graph for the application and data tiers, similar to the one in Figure 4.1.
In Section 4.1 we provide an example scenario of augmenting the application-tier partitioning of Section 3.2.3 for doLogin with data-tier partitioning, to show how cross-tier partitioning improves cost and performance savings over a code-only partitioning strategy. In Section 4.3 the process of collecting profiling data for data entities is explained. Section 4.4 discusses our analysis to capture data interactions and dependencies across database tables in the software system. In Section 4.5 we elaborate on the mathematical formulation of the partitioning problem for the data-tier and how it can be combined with the application-tier to form cross-tier partitioning. Finally, in Section 4.6 we evaluate the effectiveness of our approach through an analysis of benchmark software systems and their partitioning and deployment in the hybrid cloud using our framework.

4.1 Extending Application-tier Partitioning with Data-tier Partitioning

We take the following steps when extending application-tier partitioning to integrate the data-tier: (i) weighing the benefits of distributing queries, (ii) comparing the trade-offs between join orders, (iii) taking into account intra-request data-dependencies, (iv) taking into account inter-request data-dependencies, and (v) providing a query execution model comparable to application-tier function execution. We focus on a data-tier implemented with a traditional SQL database. While some web application workloads can benefit from the use of alternative NoSQL techniques, we chose to initially focus on SQL due to its generality and widespread adoption.

1. Weighing the Benefits of Distributing Queries. As described earlier, placing more of the unconstrained data in the cloud allows the corresponding code from the application-tier to also be placed in the cloud, increasing the overall efficiency of the deployment and reducing data transfer. However, this can result in splitting the set of tables used in a query across public and private locations.
With the constrained tables kept on the premises and the unconstrained tables moved to the cloud, any query operating on a combination of these two sets of tables would have to be turned into a distributed query. In particular, this would require binary relational operations in a query (e.g., a join operation) to work on distributed tables. As such, the partitioning of join operations leads to trade-offs between cost and performance similar to the case of code partitioning. The challenge for partitioning data-dependency graphs is to find the right balance between minimizing the number of round-trips and the data transfer between the cloud and the premises, and maximizing the use of cheap resources (e.g., CPU and memory) from the cloud. In our DayTrader example, each user can have many stocks in her holdings, which makes the holding table quite large. As shown in Figure 4.1, in an optimal splitting of the join operation, the holdings table can be pushed to the cloud (white square nodes) to eliminate the traffic of moving its data to the cloud. This splitting also maintains our constraint of keeping the privacy-sensitive account table on the private premises. An effective modeling of the data-tier needs to help the BIP optimizer reason about the trade-offs of distributing such queries across the hybrid architecture.

2. Comparing the Trade-offs Between Join Orders. The order in which tables are joined can have an effect not only on traditional processing time but also on round-trip latency. Throughout this section, we use the running example of the query shown in Figure 4.2, with two different join orders, left and right.
If the query results are processed in the public cloud, where the holding table is in the cloud and account and accountprofile are stored on the private premises, then the plan on the left will incur two round-trips between the public and the private clouds for distributed processing. On the other hand, the plan on the right requires only one round-trip. While both plans are feasible for the database engine to process, it can be instructed to choose the plan incurring fewer round-trips and less latency. Modeling the data-tier should help the BIP optimizer reason about the cost of execution plans for different placements of tables and accordingly instruct the database engine to execute the better-performing query. The aim is to extend the capabilities of database engines to execute distributed queries with extra guidance on the execution of database operations, to achieve improved cost and performance.

Figure 4.2: Two possible query plans for one of the queries in DayTrader: SELECT p.*, h.* FROM holding h, accountprofile p, account a WHERE h.accountid = a.accountid AND a.userid = p.userid AND h.quote_symbol = ? AND a.accountid = ?. The figure shows two different execution plans for the same query where in (a) first holding and account are joined and the result is then joined with accountprofile, while in (b) account and accountprofile are joined prior to being joined with holding.

3. Taking into Account Intra-request Data-dependencies. A web application is a collection of tens to hundreds of request types, with each request type implementing a business logic functionality in the web application (e.g., doLogin or doAccount in Apache DayTrader). Due to the stateless nature of web applications, the software functions involved in the execution of each request type and their data access patterns can be realized independently of other request types.
Across all application request types, some can execute more than one query to access data. In these cases, it may be beneficial to partition the functions internal to the request type into groups that execute with data at a single location. Such grouping helps to eliminate the latency overhead otherwise needed to move data to the location where the application-tier code executes. An example of this is shown in Figure 4.1, where a sub-tree of function executions for TradeJdbc:login is labeled as "private" (black circle nodes). By pushing this sub-tree to the private premises, the computation needed for working over account and accountprofile data in the two queries under TradeJdbc:login can be completed at the premises without multiple round-trips between locations.

4. Taking into Account Inter-request Data-dependencies. While code dependencies can be analyzed internally to each request type in a web application (see the requirement above on intra-request data dependencies), data dependencies need to be analyzed across different request types. This is due to the fact that there is a single database shared across all request types that stores or supplies data for the code in each request type. As a result, while decisions on the placement of code across the public cloud and the private infrastructure only affect the execution of the corresponding request type, decisions on the placement of data have a wide effect on all request types depending on that data. As an example, in Figure 4.1, moving the holding table to the cloud is only effective if code from other request types using this database table is also largely placed in the cloud. Otherwise, the latencies in accessing this table from other request types would have a negative effect on the cost and performance of the overall deployment.

5. Providing a Query Execution Model Comparable to Application-tier Function Execution.
Since the trade-offs on function placement depend on the placement of data and vice versa, we need a model that can reason simultaneously about both application-tier function execution and query plan execution. Previously we discussed the mathematical model used for defining the constraints and the objective function when partitioning the code dependency graphs. The model for the data-tier should be compatible for integration with an approach to application partitioning such as the one described in Chapter 3.

4.2 High-Level Overview of Data-tier Partitioning

As shown in Section 1.3 of Chapter 1, the overall process of applying cross-tier partitioning involves the following three high-level steps:

• Profiling
• Analysis
• Partitioning

In Chapter 3, we described how the three steps of profiling, analysis, and partitioning are applied to the application-tier of a web application. In this chapter, we describe how profiling, analysis, and partitioning are applied to the data-tier, with the next sections elaborating on the details of each step. It should be noted that the main goal of our data partitioning approach is to decide on the proper placement of database tables, not to invent a new query optimization technique. To do data-tier partitioning, we could have tried to implement a query optimizer for a hybrid cloud setting. However, that would have required developing a database query optimization engine and re-inventing a lot of existing algorithms for query optimization.
Instead of re-inventing the wheel, we chose to implement a layer of abstraction on top of traditional query optimization engines that performs the following tasks: i) collecting profiling data on the implications of changing the order of execution for joins in a query plan, ii) analyzing the collected profiling data using the hybrid cloud optimizer, which considers the implications of executing join orders across distributed tables, iii) utilizing optimization techniques to choose the query plan whose distribution of join orders has the minimum overhead on performance and cost, and iv) placing database tables following the selection of the optimal query plans for a hybrid cloud setting. Figure 4.3 summarizes the steps involved in our data-tier partitioning process.

Figure 4.3: The high-level steps in performing data-tier partitioning.

For profiling, unlike the case of code partitioning, where CPU usage and data exchange are evaluated at the level of software functions, for the data-tier these performance footprints are collected by monitoring the database engine and how it works on the operations (e.g., join, selection, and projection) as well as the operands (i.e., database tables) involved in a query.

Dependency analysis, as explained in Section 4.1, involves analyzing all applicable models of executing a query. Given a query, the order of applying the operations involved in it may be changeable. In this case, the analyzer examines all the different variations of the query plan and collects statistics on each instance of the query plan. The collected data is then fed to the partitioner in order to create the mathematical formulation based on the identified query plans.

The partitioning phase involves creating the constraints and the objective function for the optimization algorithm. The constraints consist of the business-level constraints as well as the constraints internal to the execution of join orders in a query plan.
The business-level constraints should ensure the placement of the nodes of the chosen query plan (i.e., the operations and operands in the data dependency graph) either on premises or in the cloud. They also ensure satisfaction of constraints on the cost of deployment or the expected performance. The internal constraints for a query plan ensure exclusivity of query plans, such that out of all possible variations for a given query, eventually only one is suggested as the optimal plan.

Finally, the generated constraints and objective function should be merged with the constraints and objective function defined for the code dependency graph. The mathematical formulation of the partitioning algorithm for the code dependency graph makes assumptions about the values of the variables representing node placements in the cloud or on the premises (see Section 3.5). Those same assumptions should hold for the data dependency graph and its mathematical formulation as well. The novelty of our approach is that instead of optimizing to a specific join order in isolation from the structure of application-tier execution, we encode the possible orders together with the BIP of the application-tier as a combined BIP.

As a high-level example of how the query plan encoding works, let us consider a query plan joining three tables A, B, and C with two possible orders of executing the joins: ((A ⋈ B) ⋈ C) and (A ⋈ (B ⋈ C)). We encode the possibilities for the execution of the query plans into a query tree similar to the one shown in Figure 4.4. The left subtree represents the first possible join order and the right one represents the second.

Our partitioning of the data-tier using the query plan requires this tree to be created for every query plan and encoded into a BIP formulation. The combined BIP formulation from all query plans defines the problem whose solution provides the optimized partitioning of the data-tier. The steps for the creation of the BIP formulation for this query tree consist of the following parts:
Collecting Operator Status: Identifying alternative executions of join orders in a given query plan. For the example of Figure 4.4, this means identifying that joining A, B, and C can be achieved through either ((A ⋈ B) ⋈ C) or (A ⋈ (B ⋈ C)).

                 CODE
            ______|______
           |             |
         AB_C          A_BC
         __|__         __|__
        |     |       |     |
        AB    C       A     BC
      __|__             __|__
     |     |           |     |
     A     B           B     C

Figure 4.4: The query plan tree representing alternatives in executing a given query joining three tables A, B, and C with one another.

2. Generating Choices: Ensuring mutual exclusivity of alternative query plans. This implies that out of ((A ⋈ B) ⋈ C) and (A ⋈ (B ⋈ C)) only one is to be chosen by the analyzer.

3. Generating Input Constraints: Ensuring that selection of (A ⋈ (B ⋈ C)) requires its inputs, i.e., (B ⋈ C) joined with A, to also be chosen.

Further to this, for each join operation in the query plan the following constraints have to be met:

1. At Most One Location Placement: For every child in a join operation, there should be a single placement, either in the cloud or on the premises. That is, each table or join operation in Figure 4.4 should happen on premises, happen in the cloud, or be ignored. For example, if ((A ⋈ B) ⋈ C) is chosen then there should be placement decisions for A, B, and C as well as (A ⋈ B); however, (B ⋈ C) should be canceled out of the BIP formulation.

2. Generate Execution Cost: For each child of a join operation, the cost of its execution should be considered.

3. Generate Communication Cost: For each child of a join operation, the cost of its data exchange should be considered.

In the following sections we go into the details of how each step of profiling, analysis, and partitioning for the data-tier is performed.
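To make the three steps concrete, the following is a hedged sketch (illustrative variable names, not the thesis implementation) of the constraints they would emit for the A, B, C example. Each candidate operator is a 0/1 variable: AB joins A and B, BC joins B and C, AB_C stands for ((A ⋈ B) ⋈ C), and A_BC for (A ⋈ (B ⋈ C)).

```python
def encode_abc_query():
    """Emit BIP constraint strings for the two plans joining A, B, C."""
    constraints = []
    # Generating Choices: the final join set {A, B, C} is produced by
    # exactly one of the two root operators (mutual exclusivity).
    constraints.append("AB_C + A_BC = 1")
    # Generating Input Constraints: a chosen operator forces its inputs,
    # encoded as -2*op + in_left + in_right >= 0. Base tables are always
    # available, so they contribute the constant 1.
    constraints.append("-2*AB_C + AB + 1 >= 0")  # AB_C needs {A,B} and C
    constraints.append("-2*A_BC + BC + 1 >= 0")  # A_BC needs A and {B,C}
    constraints.append("-2*AB + 1 + 1 >= 0")     # AB joins two base tables
    constraints.append("-2*BC + 1 + 1 >= 0")     # BC joins two base tables
    return constraints
```

The per-operator location and cost constraints (At Most One Location Placement, execution cost, communication cost) are layered on top of these selection variables, as described in Section 4.5.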
4.3 Profiling the Data-tier with Explain Plan

The process of data-tier partitioning starts with profiling the data-tier with EXPLAIN PLAN, as highlighted in Figure 4.5.

Figure 4.5: The first phase in partitioning the data-tier is to profile the data-tier with EXPLAIN PLAN.

Profiling information is available for query execution through the Explain Plan SQL command. Given a particular query, this command provides a tree-structured result set detailing the execution of the query. We use a custom JDBC driver wrapper to collect information on the execution of queries. During application profiling (cf. Chapter 3), whenever a query is issued by the application-tier, our JDBC wrapper intercepts the query and collects the plan for its execution.

From the extracted statistics on the executed query, we are interested in the plan of execution for the query and its execution footprint. In particular, we collect data on how much time the query takes to execute, the measured CPU usage and data exchange after query execution, and finally the detailed plan for the execution of the query (i.e., the query plan). The query plan consists of a list of nodes, i.e., operations and operands, and also the order in which operations (i.e., joins, selections, or projections) are applied to the operands (i.e., database tables). The extracted plan shows only the pattern of execution for the recently issued query, but it can also help extract alternative plans of execution for a query. In a query plan, we use the term join order to refer to the order in which the join operators are composed.

From the plan returned by the database, we extract the following information:

1. type(op): Each node in the query plan is an operator such as a join, table access, selection (i.e., filter), sort, etc. We leverage the database's own cost model directly by recording from the provided plan how much each operator costs.
Hence, we do not need to evaluate different operator implementations to estimate their costs. On the other hand, we do need to handle joins specially, because table placement is greatly affected by their ordering.

2. cpu(op): This statistic gives the expected time of execution for a specific operator. In general, we assume that the execution of a request in a hybrid web application will be dominated by the CPU processing of the application-tier and the network latency, so in many cases this statistic is negligible. However, we include it to detect the odd case of expensive query operations which can benefit from executing on the public cloud.

3. size(op): This statistic captures the expected number of bytes output by an operator, which is equal to the expected number of rows times the size of each retrieved row. From the perspective of the plan tree-structure, this is the data which flows from a child operator to its parent.

4. predicates(joinOp): Each join operator combines two inputs based on a set of predicates which relate those inputs. We use these predicates to determine if alternative join orders are possible for a query.

Profiling data is collected by executing the web application and gathering statistics. The data collected from executions of identical queries are aggregated and normalized in order to obtain an estimated average of the cost and performance footprints. As a result, the CPU usage and data size collected are in fact aggregates over all executions of a unique query plan.

As discussed earlier, when profiling the application, the profiler observes and collects execution statistics only for plans that get executed, but not for their alternative join orders. However, the optimal plan executed by the database engine in a distributed hybrid deployment can be different from the one observed during a centralized, non-distributed profiling of the application.
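The four per-operator statistics above, and the averaging of repeated executions of the same plan, can be sketched as follows. Field and function names here are illustrative assumptions, not the thesis's actual profiler code.

```python
from dataclasses import dataclass

@dataclass
class OperatorStats:
    op_type: str       # type(op): join, table access, filter, sort, ...
    cpu_ms: float      # cpu(op): expected execution time of the operator
    size_bytes: int    # size(op): expected rows * row size flowing upward
    predicates: tuple  # predicates(joinOp): join predicates, if any

def aggregate(samples):
    """Average CPU and output size over all executions of one plan."""
    n = len(samples)
    return OperatorStats(
        op_type=samples[0].op_type,
        cpu_ms=sum(s.cpu_ms for s in samples) / n,
        size_bytes=sum(s.size_bytes for s in samples) // n,
        predicates=samples[0].predicates,
    )
```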
When optimizing the deployment, it is important to be able to examine all the alternative plans and choose the one that results in the better optimization. For example, in a centralized execution of the query in Figure 4.2, both executions of the query may result in identical execution footprints. However, in a distributed execution, the query plan of Figure 4.2b is preferred over that of Figure 4.2a due to the reduced number of network round-trips. This makes the plan of Figure 4.2b the optimal plan in a distributed execution of the query.

In order to make the BIP partitioner aware of alternative orders, we have extended our JDBC wrapper to consult the database engine and examine the alternatives by utilizing a combination of Explain Plan and join order hints. Our motivation is to leverage the already existing cost model from a production database for cost estimation of local operator processing, while still covering the space of all query plans. The profiler also captures which sets of tables are accessed together as part of an atomic transaction. This information is used to model the additional costs of applying a two-phase commit protocol, should the tables get partitioned. The collected metrics for data exchange and CPU usage are then fed through the same cost models explained in Chapter 3 to provide monetary cost models for the data-tier equivalent to those of the application-tier.

4.4 Analyzing Dependencies for the Data-tier

Once the profiling data is collected using Explain Plan, we start the second phase, Query Plan Dependency Analysis, as highlighted in Figure 4.6.

Figure 4.6: The second phase in partitioning the data-tier is to analyze dependencies in a query plan.

As discussed in the previous section, we need to encode enough information in the BIP so it can reason over all possible plans. Otherwise, the BIP optimizer would mistakenly assume that the plan executed during our
initial profiling is the only one possible. For example, during initial profiling on a single host, we may only observe the plan from Figure 4.2a. However, in the example scenario, we saw that the plan in Figure 4.2b introduces fewer round-trips across a hybrid architecture. We need to make sure the right plan is accounted for when deciding about table placement. Our strategy to collect the necessary information for all plans consists of two steps: (i) gather statistics for all operators in all plans irrespective of how they are joined, and (ii) encode BIP constraints about how the operators from step (i) can be joined. Here we describe step (i) and then describe step (ii) in the next subsection.

As is commonly the case in production databases, we assume a query plan to be left-deep [88]. In a left-deep query plan, a join takes two inputs: one from a single base relation (i.e., table) providing immediate input (referred to as the "inner relation"), and another potentially derived as an intermediate result from a different set of relations (the "outer relation"). The identity of the inner relation and the set of tables comprising the outer relation uniquely determine the estimated best cost for an individual join operator. This is true regardless of the order in which the outer relation was derived [91]. For convenience in our presentation, we call this information the operator's id, because we use it to represent an operator in the BIP.

As an example, the root operator in Figure 4.2a takes accountProfile as an inner input and {holding, account} as an outer input. The operator's id is then {(holding, account), accountProfile}. We will refer to the union of these two inputs as a join set (the set of tables joined by that operator). For example, the join set of the aforementioned operator is {holding, account, accountProfile}.
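The id scheme just described can be captured with a small helper; a minimal sketch with illustrative names, mirroring the running DayTrader example:

```python
def operator_id(outer_tables, inner_table):
    """An operator id: (outer relation's table set, inner relation)."""
    return (frozenset(outer_tables), inner_table)

def join_set(op_id):
    """The join set of an operator is the union of its two inputs."""
    outer, inner = op_id
    return outer | {inner}

# The roots of Figures 4.2a and 4.2b share a join set but differ in id:
root_a = operator_id({"holding", "account"}, "accountProfile")
root_b = operator_id({"accountProfile", "account"}, "holding")
```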
Notably, while the join sets for the roots of Figures 4.2a and 4.2b are the same, Figure 4.2b's root node has the operator id {(accountProfile, account), holding}, allowing us to differentiate the operators in our BIP formulation. Our task is to collect statistics on the execution of possible join operators with unique ids. As discussed in Section 4.3, this can be done by instructing the database engine to execute other variations of join orders for a given query.

Most databases provide the capability for developers to give hints to the query optimizer in order to force certain join orders. For example, in Oracle a developer can use the hint LEADING(X, Y, Z, ...). This tells the optimizer to create a plan where X and Y are joined first, then their intermediate result is joined with Z, etc. We use this capability to extract statistics for all join orders. For tables in a query that are not referenced in a LEADING hint, the optimizer is still free to choose any order. As mentioned earlier, when collecting statistics on the execution of a query plan, we can also extract information on the type and order of joins executed for the given query plan. This information can be utilized to identify other feasible query plans based on the set of observed tables and joins in the original query plan. We have developed an algorithm that allows for extraction and execution of all possible query plans by sending hints to the database engine.

Program 4.1 takes as input a query observed during profiling. In line 2, we extract the set of all tables referenced in the query. Next, we start collecting operator statistics for joins over two tables and progressively expand the size through each iteration of the loop on line 3. The table t, selected in each iteration of line 4, can be considered as the inner input of a join. Then, on line 5, we loop through all sets of tables of size i which do not contain t.
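A hedged Python rendering of Program 4.1's enumeration (the explain callback stands in for the real explainPlanWithLeadingRelations, which calls into the database through the JDBC wrapper; the hint-building helper assumes Oracle-style syntax and an uppercase SELECT keyword, and all names are illustrative):

```python
from itertools import combinations

def leading_hint_sql(sql, leading_tables):
    """Prefix a SELECT with EXPLAIN PLAN FOR and a LEADING join-order hint."""
    hint = "/*+ LEADING(" + ", ".join(leading_tables) + ") */"
    return "EXPLAIN PLAN FOR " + sql.replace("SELECT", "SELECT " + hint, 1)

def collect_operator_stats(query_tables, joinable, explain):
    """Enumerate every (outer set S, inner table t) operator id.

    joinable(S, t): True if t shares a join predicate with a table in S.
    explain(S, t): records the explained plan for this operator id.
    """
    tables = set(query_tables)
    for i in range(1, len(tables)):        # size of the outer input S
        for t in sorted(tables):           # candidate inner input
            for S in combinations(sorted(tables - {t}), i):
                if joinable(set(S), t):    # the pseudocode's line 6
                    explain(set(S), t)     # the pseudocode's line 7
```

For a three-table query with all pairs joinable, this explains nine operator ids, consistent with the exponential bound discussed below.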
On line 6, we verify that t is joinable with the set S by making sure that at least one table in S shares a join (access) predicate with t. This set forms the outer input to a join. Finally, on line 7, we collect statistics for this join operator by forcing the database to explain a plan in which the join order is prefixed by the outer input set, followed by the inner input relation. We record the information for each operator by associating it with its id.

For example, consider the query of Figure 4.2 as the input Q to Program 4.1. In a particular iteration of the loops, i might be chosen as 2 and t as accountProfile. Since accountProfile has a predicate shared with account, S could be chosen as the set of size 2: {account, holdings}. Now on line 7, explainPlanWithLeadingRelations({account, holdings}, accountProfile) will get called, and the statistics for the join operator with the corresponding id will get recorded.

The bottom-up structure of the program is similar to the classic dynamic programming algorithm for query optimization [91]. However, in our case we make calls into the database to extract costs by leveraging Explain Plan and the LEADING hint. The complexity of Program 4.1 is O(2^n), where n is the number of tables, which is the same as the classic algorithm for query optimization [91], so our approach scales in a similar fashion. Even though Program 4.1's complexity is exponential, queries typically operate on the order of tens of tables, making the performance of the program reasonable.

Program 4.1: Function to collect statistics for alternative query plan operators for the input query Q.
P_i is the powerset operator restricted to subsets of size i.

Function collectOperatorStats(Q)
    tables ← getTables(Q)
    for i ← 1 to |tables| do
        foreach t ∈ tables do
            foreach S ∈ P_i(tables − {t}) do
                if isJoinable(S, t) then
                    explainPlanWithLeadingRelations(S, t)
                end
            end
        end
    end
end

4.5 Partitioning the Data-tier

The last step in data-tier partitioning involves encoding the collected profiling data and dependency information for query plans into a BIP formulation. This is done through Defining the Constraints and the Objective Function, as highlighted in Figure 4.7. With the constraints for the data-tier defined and the objective function described, integration of the data-tier partitioning with the code-tier partitioning is as simple as taking the union of the constraint sets and appending the objective functions. We describe defining the constraints in the first subsection and defining the objective function in the second subsection.

Figure 4.7: The third phase in partitioning the data-tier is to encode the partitioning problem into a BIP formulation.

4.5.1 Defining BIP Constraints for Data Dependencies

Given the statistics for all operators with a unique id, we need to instruct the BIP how they can be composed. The mathematical formulation of the composition is important, as it needs to produce the target query and ensure the exclusive selection of the optimized query plan from the set of feasible plans for a given query. Our general strategy is to model each query plan operator, op, as a binary variable in a BIP. The variable takes the value 1 if the operator is part of the query plan which minimizes the objective of the BIP, and 0 otherwise. Each possible join set is also modeled as a variable. Constraints are used to create a connection between operators that produce a join set and operators that consume a join set (cf.
Table 4.1). The optimizer will choose the plan having the least cost, given both the optimizer's choice of table placement and its choice of function execution placement (for the application-tier). Each operator also has associated variables op_cloud and op_premises which indicate the placement of the operator. Table placement is controlled by each table's associated table access operators. The values of these variables for operators in the same query plan allow us to model the communication costs associated with distributed queries.

Our program to formulate these composition constraints makes use of two helper functions, shown in Table 4.1, namely genChoice and genInputConstraint. When these functions are called by our programs, they append the generated constraint to the BIP that was already built for the application-tier. The first function, genChoice, encodes that a particular join set may be derived by multiple possible join operators (e.g., {holding, account, accountProfile} could be derived by either of the root nodes in Figure 4.2). The second function, genInputConstraint, encodes that a particular join operator takes as inputs the join sets of its two children. It ensures that if op is selected, both of its children's join sets (in_left and in_right) are selected as well, constraining which subtrees of the execution plan can appear under this operator. The "≥" inequality in Table 4.1 helps to encode the boolean logic op → in_left ∧ in_right.

Function: genChoice(joinSet, {op1 ... opn})
Generated constraint: op1 + ... + opn = joinSet
Description: a joinSet is produced by one and only one of the operators op1 ... opn

Function: genInputConstraint(op, {in_left, in_right})
Generated constraint: −2 × op + in_left + in_right ≥ 0
Description: if op is 1, then the variables representing its left and right inputs (in_left and in_right) must both be 1

Table 4.1: Constraint generation functions

Starting with the final output join set of a query, Program 4.2 recursively generates these constraints, encoding choices between join operators and how parent operators are connected to their children. It starts on line 2 by calling a function to retrieve all operator ids which could produce that join set (these operators were all collected during the execution of Program 4.1). It passes this information to genChoice on line 3. On line 4, we loop over all these operator ids, decomposing each into its two inputs on line 5. This information is then passed to genInputConstraint. Finally, on line 7, we test for the base case of a table access operator. If we have not hit the base case, then the left input becomes the join set for recursion on line 8.

Program 4.2: Constraint generation, using the functions from Table 4.1. The details of the functions getOperatorsForJoinSet, getInputs, sizeof, and left are not shown, but their uses are described in the text.

Function createConstraints(joinSet)
    ops ← getOperatorsForJoinSet(joinSet)
    genChoice(joinSet, ops)
    foreach op ∈ ops do
        inputs ← getInputs(op)
        genInputConstraint(op, inputs)
        if sizeof(left(inputs)) > 0 then
            createConstraints(left(inputs))
        end
    end
end

Once the constraints are defined, the next step is to define the objective function for the BIP formulation.

4.5.2 Defining the BIP Objective for Data Dependencies

Creating the optimization objective function consists of two parts: (i) determining the cost of executing individual operators, and (ii) determining the costs of distributing operators in a hybrid setting. Once all the costs are measured, we need to create a mathematical formulation of those costs and encode it into the objective function.

Determining the Cost of Executing Operators

The magnitude of the execution cost for each operator, and of the communication cost between operators that are split across the network, are computed using the cost model introduced in Chapter 3.
This accounts for the variation between local execution and distributed execution, in that the latter will make use of a semi-join optimization to reduce costs (i.e., input data to a distributed join operator will transmit only the columns needed to collect matching rows). We need to encode information on CPU and data transmission costs into the objective function. Table 4.2 shows the functions that generate the required helper constraints. The first constraint specifies that if an operator is included as part of a chosen query plan (its associated id variable is set to 1), then either the auxiliary variable op_cloud or op_premises will have to be 1, but not both. This enforces a single placement location for op. The second builds on the first and toggles the auxiliary variable cut_{op1,op2} when op1_cloud and op2_premises are 1, or when op1_premises and op2_cloud are 1.

The objective function itself is generated using the two functions in Table 4.3. For each operator op, the function genExecutionCost charges to the objective function one of the following three values: (i) the execution cost of the operator on the cloud infrastructure if op_cloud is set to one, (ii) the execution cost of the operator on the premises infrastructure if op_premises is set to one, or (iii) zero if neither of the two variables is set to one. Note that it will never charge both, due to the constraints of Table 4.2.
The second function charges the communication cost between two operators if the associated cut variable was set to 1. In the case that there is no communication between two operators, this cost is simply 0.

Function: genAtMostOneLocation(op)
Generated constraint: op_cloud + op_premises = op
Description: if the variable representing op is 1, then either the variable representing its placement in the cloud is 1 or the variable representing its placement on the premises is 1

Function: genSeparated(op1, op2)
Generated constraints: op1_cloud + op2_premises − cut_{op1,op2} ≤ 1; op1_premises + op2_cloud − cut_{op1,op2} ≤ 1
Description: if the variables representing the locations of the two operators are different, then the variable cut_{op1,op2} is 1

Table 4.2: Functions for generating objective helper constraints

Function: genExecutionCost(op)
Generated objective component: op_cloud × execCost_cloud(op) + op_premises × execCost_premises(op)
Description: if the variable representing op being deployed in the cloud/premises is 1, then charge the associated cost of executing it in the cloud/premises, respectively

Function: genCommCost(op1, op2)
Generated objective component: cut_{op1,op2} × commCost(op1, op2)
Description: if cut_{op1,op2} for two operators op1 and op2 was set to 1, then charge their cost of communication

Table 4.3: Functions for generating the objective function

Program 4.3 takes a join set as input and follows a similar structure to Program 4.2. The outer loop on line 3 iterates over each operator that could produce the particular join set.
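The four generators of Tables 4.2 and 4.3 can be sketched for a single operator/child pair as follows; the cost values and the variable-naming scheme are illustrative assumptions, not the thesis's code.

```python
def gen_objective(op, child, costs):
    """Return (constraints, objective_terms) for one op/child pair.

    costs: dict with execution costs of op in each location and the
    communication cost of the op/child edge if it is cut.
    """
    constraints = [
        # genAtMostOneLocation: a chosen operator has exactly one location
        f"{op}_cloud + {op}_premises = {op}",
        f"{child}_cloud + {child}_premises = {child}",
        # genSeparated: cut_{op,child} is forced to 1 when placements differ
        f"{op}_cloud + {child}_premises - cut_{op}_{child} <= 1",
        f"{op}_premises + {child}_cloud - cut_{op}_{child} <= 1",
    ]
    terms = [
        # genExecutionCost: charge the cloud or the premises execution cost
        f"{costs['exec_cloud']}*{op}_cloud",
        f"{costs['exec_premises']}*{op}_premises",
        # genCommCost: charge communication only when the edge is cut
        f"{costs['comm']}*cut_{op}_{child}",
    ]
    return constraints, terms
```

Program 4.3 applies these generators to every operator of every candidate plan.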
It generates the location constraints on line 4 and the execution cost component of the objective function on line 5. Next, on line 7, it iterates over the two inputs to the operator. For each, it extracts the operators that could produce that input (line 8) and generates the communication constraint and objective function component. Finally, if the left input is not a base relation (line 11), it recurses using the left input as the next join set.

Program 4.3: Objective generation

Function createObjFunction(joinSet)
    ops ← getOperatorsForJoinSet(joinSet)
    foreach op ∈ ops do
        genAtMostOneLocation(op)
        genExecutionCost(op)
        inputs ← getInputs(op)
        foreach input ∈ inputs do
            foreach childOp ∈ getOperatorsForJoinSet(input) do
                genSeparated(op, childOp)
                genCommCost(op, childOp)
            end
        end
        if sizeof(left(inputs)) > 0 then
            createObjFunction(left(inputs))
        end
    end
end

Determining the Cost of Distributing Operators

We extend the previous cost model to account for possible transaction delays. We assume that if the tables involved in an atomic transaction are split across the cloud and the private premises, by default the transaction will be resolved using the two-phase commit protocol.

Performance overhead from atomic two-phase distributed transactions comes primarily from two sources: protocol overhead and lock contention [47]. Protocol overhead is caused by the latency of the prepare and commit messages in a database's two-phase commit protocol. Lock contention is caused by queuing delay, which increases as transactions over common table rows become blocked. We provide two different strategies to account for the cost of such overhead when formulating our objective function:

• For some transactions, lock contention is negligible. This is because the application semantics do not induce sharing of table rows between multiple user sessions.
By this we mean that at any point in time during the execution of the application, only a single request is modifying the data at a given row of any database table. For example, in the DayTrader example of Figure 4.1, although the Account and Holdings tables are involved in an atomic transaction, specific rows of these tables are only ever accessed by a single user concurrently. The reason is that the operation joining the two tables applies its filtering based on the userid of the user making the request. As such, each join operation exclusively accesses the data that belongs to the given user. In such cases, where no table rows are shared between multiple user sessions, we account for the overhead of the two-phase commit by charging it to the objective function. This is done by adding the cost of two extra round-trips between the cloud and the private premises to the objective function: one to prepare the remote site for the transaction and another to commit it. This model assumes that no extra request would cause any extra delay in applying a two-phase commit or degrade the performance of executing the request. Applying the extra charges to the objective function reduces the chances for tables involved in a two-phase commit to be distributed across a hybrid deployment.

• In situations where congestion in accessing table rows is expected, the cost of a two-phase commit could worsen if it is applied to a distributed deployment of the tables involved. For such cases, where lock contention is expected to be considerable, developers can request that certain tables be co-located in any partitioning suggested by our tool. This prevents locking for transactions over those tables from being delayed by network latency. Since such decisions require knowledge of application semantics that is difficult to infer automatically, our tool provides an interactive visualization of partitioning results.
This allows developers to work through different "what-if" scenarios of table co-location constraints and the resulting suggested partitioning.

It should be noted that we have chosen a conservative strategy for assessing data partitioning costs, by assuming that transactions over tables involved in the same atomic transaction will be resolved using a two-phase commit strategy. In practice, the two-phase commit strategy ensures that the ACID properties associated with a transaction are preserved. In theory, unless strong integrity, consistency, and durability are required, some transactions can be resolved using strategies other than a two-phase commit, e.g., retry or compensation [62]. In fact, due to the scalability issues of preserving consistency for transactions in large-scale web applications, many new web applications choose to provide eventual consistency in favor of performance. With respect to our algorithms, this implies smaller penalties for separating database tables from one another, and thus looser coupling between tables when it comes to their partitioning. With the possibility to freely separate database tables from one another, there are better chances for co-location of tightly coupled code and data. Partitioning at the data-tier and increased co-location of code and data imply less network round-trip overhead, through which we expect hybrid partitioning of web applications to improve.

Having appended the constraints and objective components associated with query execution to the application-tier BIP, we make a connection between the two. This is done by encoding the dependency between each
Solving this combined op-timization problem results in a deployment that considers optimality ofplacement for software functions and database tables at the same time. Wecontinue to claim that the resulting cross-tier partitioning is efficient interms of performance and cost savings compared to partitioning only at theapplication-tier of an OLTP-style software system.In the next Section we show how hybrid deployments of a web applicationfollowing the cross-tier partitioning algorithm performs better compared toa situation where only code is partitioned.4.6 EvaluationIn this section we present the results of evaluations we performed to assessthe contributions discussed in this chapter. In particular, we provide answersto the following research questions:1. How does combined data-tier partitioning and application-tier parti-tioning affect the performance of the application compared to onlyapplication-tier partitioning, or no partitioning? (cf. Section 4.6.1)2. How is the overall cost of deployment affected when application-tierpartitioning is combined with data-tier partitioning compared to othermodels of application deployment? (cf. Section 4.6.2)3. How is the scalability of the system affected when cross-tier partition-ing is used, compared to other models of application deployment? (cf.Section 4.6.3)We evaluate the cross-tier partitioning work using two different ap-plications: Apache DayTrader [5] and RUBiS [4]. DayTrader is a Javabenchmark of a stock trading system. RUBiS implements the functionalityof an auctioning Web site. Both applications have already been used inevaluating previous cloud computing research [64, 95]. We have carefullychosen Apache DayTrader and RUBiS for our evaluations, due to theirunique characteristics in representing web applications with requirementsfor privacy and scalability at the same time. Even though our deploy-ments for Apache DayTrader and RUBiS are not totally comparable with1024.6. 
real-world software deployments (due to incomparable network traffic, data size, and transaction frequency), they still allow us to analyze code and data inter-dependencies in an OLTP-style web application with privacy requirements on data placement.

There are 9 possible deployment variations, with each of the data-tier and the application-tier being (i) on the private premises, (ii) on the public cloud, or (iii) partitioned for hybrid deployment. Out of all the placements, we eliminate the 3 that place all data in the cloud, as doing so contradicts the constraint of keeping restricted information on-premises. Also, we consider deployments where only data is partitioned as a subset of deployments with both code and data partitioned; thus our evaluation of code-and-data partitioning deployments includes deployments where only data is partitioned. The remaining four models deployed for evaluation were as follows: (i) both code and data deployed on the premises (Private-premises); (ii) data on-premises and code in the cloud (Naïve-Hybrid); (iii) data on-premises and code partitioned (Split-Code); and (iv) both data and code partitioned (Cross-Tier). Table 4.4 shows the list of deployments. The main reason for not considering a data-only partitioning deployment in the evaluation is that the benefit of using the public cloud lies in providing cheap resources, i.e., CPU, memory, and storage. According to [109], for web applications the dominating bottleneck is the CPU when horizontally scaling the business logic tier of the application. In a data-only partitioning model, the business logic part of the application would be placed on the private premises, defeating the purpose of hybrid deployments.

For both DayTrader and RUBiS, we consider privacy incentives to be the reason behind constraining the placement of some database tables.
As such, when partitioning data, we constrain tables storing user information (account and accountprofile for DayTrader, and users for RUBiS) to be placed on-premises. The remaining tables are allowed to be flexibly placed on-premises or in the cloud.

             Code
Data         premises           Cloud          Partitioned
premises     Private-premises   Naïve-Hybrid   Split-Code
Cloud        n/a                n/a            n/a
Partitioned  n/a                n/a            Cross-Tier

Table 4.4: The map of naming conventions used for different deployment models in the evaluation.

We used the following setup for the evaluation. For the premises machines, we used two 2.8 GHz Core i5 machines (quad core) with 8.0 GB of memory, one as the application server and the other as our database server. Both machines were located at our lab in Vancouver and were connected through a 100 Mb/sec data link. For the cloud machines, we used an extra-large EC2 instance with 8 EC2 Compute Units and 7.0 GB of memory as our application server, and another extra-large instance as our database server. Both machines were leased from Amazon's US West region (Oregon) and were connected by a 1 Gb/sec data link. We use Jetty as the web server and Oracle 11g Express Edition as the database server. We measured the median round-trip latency between the cloud and our lab to be 15 milliseconds. Our intention in choosing this setup was to create an environment where the cloud offers the faster and more scalable environment. To generate load for the deployments, we launched simulated clients from a 3.0 GHz quad-core machine with 8 GB of memory located in our lab in Vancouver. To ensure validity of the collected results, all the experiments mentioned in this section were executed for 10 minutes and repeated a minimum of five times.
In the rest of this section we provide the following evaluation results for the four deployments described above: execution times (Section 4.6.1), expected monetary deployment costs (Section 4.6.2), and scalability under varying load (Section 4.6.3).

4.6.1 Evaluation of Performance

In this section we analyze how the performance of a hybrid deployment partitioned using a cross-tier algorithm compares to the other methods of deployment discussed earlier in this section. The hypothesis is that applying cross-tier partitioning for hybrid cloud deployment results in better overall performance for the web applications.

We measured the execution time across all request types in DayTrader and RUBiS under a load of 100 requests per second, for ten minutes. By execution time we mean the elapsed wall-clock time from the beginning to the end of each request type. Figures 4.8 and 4.9 show the request types with the largest average execution times. We model a situation where CPU resources are not exhausted. As shown in Figures 4.8 and 4.9, the execution time under cross-tier partitioning is better than under any other model of hybrid deployment and is comparable to that of a non-distributed private-premises deployment. This is due to the fact that a higher percentage of code and data is co-located in a cross-tier deployment than in any other hybrid deployment, resulting in a reduced number of network round-trips and improved performance.

Figure 4.8: Measured execution times for selected request types in the four deployments of DayTrader (requests shown: doLogin, doBuy, doPortfolio, doAccount, doQuotes, doAccountUpdate, doSell; x-axis: execution time per request, in milliseconds).

As Figure 4.8 shows, the response time of DayTrader's doLogin under the Cross-Tier deployment is 50% faster than under Naïve-Hybrid, while doLogin's response time for Cross-Tier is only 5% slower than for Private-premises (i.e., the lowest bar in the graph).
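Most of the gap between the deployments can be understood with a back-of-envelope latency model, which we sketch here for illustration only (it is not the formal model of Chapter 3): each cross-tier call pays roughly one wide-area round trip, about 15 ms in our setup, so co-locating code with the data it touches removes round trips from the critical path.

```java
/** Back-of-envelope request-time model: local compute time plus one WAN
 *  round-trip per cross-tier call. The numbers in main are illustrative. */
public class RoundTripModel {
    public static double requestTimeMs(double computeMs, int crossTierCalls, double rttMs) {
        return computeMs + crossTierCalls * rttMs;
    }

    public static void main(String[] args) {
        double rtt = 15.0; // measured median cloud-to-lab round-trip latency
        // A chatty split (20 cross-tier calls) vs. a co-located one (2 calls):
        System.out.println(requestTimeMs(50, 20, rtt)); // 350.0 ms
        System.out.println(requestTimeMs(50, 2, rtt));  // 80.0 ms
    }
}
```

Under this simple model, cutting cross-tier calls from 20 to 2 shrinks the request time from 350 ms to 80 ms, which is the effect cross-tier partitioning exploits.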
It can also be seen that, for doLogin, Cross-Tier has a 25% better response time than Split-Code, showing the effectiveness of cross-tier partitioning compared to partitioning only at the application tier. Similarly, for other business logic functionality, we note that cross-tier partitioning achieves performance improvements compared to the other distributed deployment models, and results in performance measures similar to a full premises deployment.

Figure 4.9: Measured execution times for selected request types in the four deployments of RUBiS (requests shown: AboutMe, ViewBidHistory, ViewItem, ViewUserInfo, BrowseCategories, SearchItemsByCategory, SearchItemsByRegion; x-axis: execution time per request, in milliseconds).

For the case of DayTrader, across all business logic functionality of Figure 4.8, Cross-Tier results in an overall performance improvement of 56% compared to Naïve-Hybrid and of around 45% compared to Split-Code. By making data partitioning possible and allowing code and data to be co-located, we see improvements in performance. Compared to a code-only partitioning of the web application, performance improves because data partitioning allows less privacy-sensitive data to be moved to the cloud, further reducing the number of network round-trips that would otherwise occur between distributed code and data.

We observed similar performance improvements for RUBiS. Cross-Tier RUBiS performs 28.3% better, across all business logic functionality of Figure 4.9, than its Naïve-Hybrid counterpart, and 15.2% better than Split-Code. Based on these results, cross-tier partitioning provides flexibility for moving function execution to the cloud and can increase performance for a hybrid deployment of an application. Table 4.5 summarizes these performance improvements for DayTrader and RUBiS.

  Cross-Tier compared to:   Private-premises   Naïve-Hybrid   Split-Code
  DayTrader - doLogin       5% slower          50% faster     25% faster
  DayTrader - Overall       9% slower          56% faster     45% faster
  RUBiS - Overall           27% slower         42% faster     24% faster

Table 4.5: Response time evaluation of the Cross-Tier deployment compared to the other models of deployment (a) for doLogin and (b) across all business logic functionality. Results show how a deployment based on cross-tier partitioning improves or degrades response time compared to the other models of deployment.

4.6.2 Evaluation of Deployment Costs

Next, we evaluate the effects of cross-tier partitioning on the estimated monetary deployment costs of DayTrader and RUBiS (Figures 4.10 and 4.11) under the various deployment models. The hypothesis is that we should see an improvement in the cost of deployment when the hybrid deployment is driven by a cross-tier partitioning algorithm.

For computing the monetary costs of deployments, we use parameters taken from the advertised Amazon EC2 service, where the cost of an extra-large EC2 instance is $0.48/hour and the cost of data transfer is $0.12/GB for data transfer inbound to the cloud and $0.0/GB for data transfer outbound from the cloud. To evaluate deployment costs, we apply these machine and data transfer costs to the performance results from Section 4.6.1, scale the obtained deployment costs to one month of deployment, and experiment with different ratios of premises-to-cloud deployment costs to assess the effects of varying the cost of the private premises on the overall deployment costs.

As shown in both graphs, a Private-premises deployment of the web applications results in a cost increase linear in the increased cost of machines on premises, rendering such deployments inefficient. In contrast, all partitioned deployments of the applications result in optimal deployments (a logarithmic increase in cost as the cost of machines on premises grows), with Cross-Tier being the cheapest deployment. For a cloud cost
For a cloud cost80% cheaper than the private-premises cost (5 times ratio), DayTrader’s1074.6. Evaluation1 3 5 7 95001,0001,5002,0002,5003,000Ratio of premises to Public Machine CostsExpectedDeploymentCost($/Month)Private-premises Na¨ıve-HybridSplit-Code Cross-TierFigure 4.10: Monthly cost comparison for different deployments of DayTrader andRUBiS.Cross-Tier is 20.4% cheaper than Private-premises and 11.8% cheaper thanNa¨ıve-Hybrid and Split-Code deployments. RUBiS achieves even bettercost savings with Cross-Tier being 54% cheaper than Private-premises and29% cheaper than Na¨ıve-Hybrid and Split-Code. As shown in Figure 4.10and Figure 4.11, in cases where only code is partitioned, an increase incosts for machines on-premises eventually results in the algorithm pushingmore code to the cloud to the point where all code is in the cloud and alldata is on-premises. In such a situation Split-Code eventually converges toNa¨ıve-Hybrid; i.e., pushing all the code to the cloud. Similarly, Cross-Tierwill finally stabilize. However since in Cross-Tier part of the data is alsomoved to the cloud, the overall cost is cheaper than Na¨ıve-Hybrid andSplit-Code. Table 4.6 summarizes the overall cost savings of cross-tier de-ployments compared to other deployment models for DayTrader and RUBiS.Intuitively, this cost reduction is reasonable. In cross-tier partitioning,by enabling data partitioning, the unconstrained data has freedom to be1084.6. Evaluation1 3 5 7 901,0002,0003,0004,0005,000Ratio of premises to Public Machine CostsExpectedDeploymentCost($/Month)Private-premises Na¨ıve-HybridSplit-Code Cross-TierFigure 4.11: Monthly cost comparison for different deployments of RUBiS.pushed to the cloud. 
This leads to pushing the dependent code to the cloudas well, which increases utilization of cloud resources.DeploymentsCross-Tier Private-premises Na¨ıve-Hybrid Split-CodeDayTrader - Overall 20.4% cheaper 11.8% cheaper 11.8% cheaperRUBiS - Overall 58.2% cheaper 38.8% cheaper 19.1% cheaperTable 4.6: Cost improvements of Cross-Tier deployment compared to othermodels of deployment. Results show how a deployment based on cross-tierpartitioning improves deployment costs compared to the other models ofdeployment.4.6.3 Evaluation of ScalabilityWe also performed scalability analyses for both DayTrader and RUBiS tosee how different placement choices affect application throughput. The aim1094.6. Evaluationis to show that cross-tier partitioning results in better application through-put due to improved performance. DayTrader comes with a random clientworkload generator that dispatches requests to all available functionalityon DayTrader. On the other hand, RUBiS has a client simulator designedto operate either in the browse mode or the buy mode. In the browse modeof RUBiS, only requests for browsing categories and items in the auctioningweb site are dispatched to the web application. In the buy mode however,requests for buying or selling items from each category are also issued.For both DayTrader and RUBiS we used a range of 10 to 1000 clientthreads to send requests to the applications in 5 minute intervals with 1minute ramp-up. For RUBiS, we used the client in buy mode. Resultsare shown in Figures 4.12 and 4.13. As the figure shows, for both ap-plications, after the number of requests passes a certain threshold of 300requests/sec, Private-premises becomes overloaded. For Na¨ıve-Hybrid and10 100 300 500 1,0001002003004005006007008009001,000Number of Simulated UsersThroughput(req/sec)Private-premises Na¨ıve-HybridSplit-Code Cross-TierFigure 4.12: Scalability tests for full premises, full cloud, and hybrid deploymentsfor DayTrader.1104.6. 
Split-Code, on the other hand, the applications progressively provide better throughput. However, due to the bottleneck of accessing the data on premises, both deployments sustain only a modest increase in throughput. Finally, Cross-Tier achieved the best scalability. For 500 requests/sec in RUBiS, the Cross-Tier deployment is 30% better than Private-premises, 52% better than Naïve-Hybrid, and 49% better than Split-Code. Similarly, for DayTrader at 500 requests/sec, the Cross-Tier deployment is 17% better than Private-premises, 31% better than Naïve-Hybrid, and 22% better than Split-Code. With a large portion of the data in the cloud, the underlying resources for both code and data can scale to reach better overall throughput for the applications. Despite part of the data remaining on the private premises, due to its small size the on-premises database machine becomes congested at a slower rate and the deployment can sustain a high throughput.

Figure 4.13: Scalability tests for full premises, full cloud, and hybrid deployments for RUBiS (x-axis: number of simulated users; y-axis: throughput in req/sec).

4.7 Summary

In this chapter we demonstrated that combining code and data dependency models can lead to cheaper and better performing hybrid deployments of web applications. In particular, we showed that for our evaluated applications, combined code and data partitioning can achieve up to 56% performance improvement compared to a naïve partitioning of code and data between the cloud and the premises, and a more than 40% performance improvement compared to when only code is partitioned.
Similarly, for deployment costs, we showed that combining code and data can provide up to 54% expected cost savings compared to a fully on-premises deployment, and almost 30% expected savings compared to a naïvely partitioned deployment of code and data or a deployment where only code is partitioned.

The results presented here demonstrate that cross-tier partitioning of both code and data improves the performance, cost efficiency, and scalability of deployed web applications. Even though the applications presented in our evaluation are benchmark applications, their overall architecture is similar to that of real-world software systems, so the results are broadly applicable to real-world software systems. In particular, the results of our analysis demonstrate that co-location of code and data helps reduce data transfer and network round-trips. The results also demonstrate that the entanglement of software functions and database tables directly affects how easily software functions and database tables can be separated from one another for distributed deployment. These characteristics are not specific to the type of applications we analyzed, and are in fact independent of the workload the application receives or the size of the database used. Consequently, the results discussed in this chapter are equally applicable to real-world deployments of software systems, and in particular web applications.

As in Chapter 3, we next describe some of the assumptions we made throughout this chapter, enumerate threats to validity, and describe solutions or potential directions for future research:

• We assumed the behaviour of the web application across its different tiers to be deterministic in responding to incoming requests. It was based on this assumption that we were able to model and project cost and performance metrics for the web application for different deployments and under different load. We acknowledge that web applications could behave nondeterministically when running in multi-threaded, resource-constrained environments. While such behaviour may affect our overall modelling, the partitioning and distribution of the web application is independent of how the application is modelled, so a model that better captures the nondeterministic behaviour of a web application can still benefit from our partitioning and distribution strategy.

• We relied on data received from the database engine as part of the metrics used to augment and analyze our generated models of the application. This reliance tied our modelling and partitioning to the technology used in the web applications. A more realistic approach would be to use a middleware layer that captures interactions between the application tier and the data tier and provides more accurate metrics for cross-tier partitioning.

• Our focus throughout this dissertation has been on making partitioning decisions based on constraints on data placement. As such, we do not deal with how data flows through the application. While constraints on data storage address regulatory requirements for hybrid cloud deployments, they do not necessarily prevent privacy-sensitive data from flowing to the public cloud. We recognize that understanding the flow of privacy-sensitive data requires research on data flow analysis and taint tracking in order to capture which software functions in the application consume privacy-sensitive data. These software functions can then be modified either to work on encrypted data or to be constrained to stay on premises. Further research in this area is a subject of our future work.

Chapter 5

Manticore: The Partitioning Framework

In the previous chapters, we described how the combination of code and data partitioning can lead to effective application deployments for hybrid cloud.
In particular, in Chapter 3 we looked at combining code dependency analysis and application partitioning to achieve optimal web application deployments for hybrid cloud, with the constraint of having all the data on the private premises. In Chapter 4, we extended our approach to include data-tier partitioning and achieve cross-tier partitioning and deployment of web applications for hybrid cloud. We showed that cross-tier partitioning can achieve better performance, cheaper deployment costs, and better scalability compared to other models of hybrid cloud deployment.

During our work on the analysis and deployment of applications for hybrid cloud, we came to realize that the process of converting a monolithic software system to its hybrid equivalent is cumbersome without proper tooling. First, a fully automatic partitioning for a hybrid cloud deployment is not feasible: developers have different preferences for the amount of money they would like to pay and the performance they expect to perceive. As an example, the Internet is full of articles and posts by people who either praise the payment models and service offerings provided by Amazon Web Services (AWS) (footnotes 6, 7), or express dissatisfaction with their monetary expenses and system performance when using AWS (footnotes 8-10). As such, there is no fixed cost-to-performance ratio that can be considered the silver bullet of hybrid cloud deployments.

6. http://www.tnrglobal.com/blog/2009/11/will-amazon-aws-save-me-money/
7. http://www.jonzobrist.com/2011/10/09/my-comments-on-cheap-hosting-amazon-aws-blog-post/
8. http://blog.carlmercier.com/2012/01/05/ec2-is-basically-one-big-ripoff/
9. http://www.cloudadvantage.com.au/deep-value-claims-aws-is-380-more-expensive-can-this-be-true-2/
10. http://hostingfu.com/article/amazon-web-services-is-expensive

In order to support developers and system architects in the efficient hybrid deployment of their software systems, we developed Manticore.
Manticore provides a framework that allows hybrid-cloud partitioning of software applications to be included as part of the development process. In developing Manticore we pursued the following goals:

• Flexibility. To allow for flexible and effective encoding of cost and performance preferences for a hybrid cloud deployment into the design and development process of a software system.

• Modeling. To facilitate the modeling and simulation of software partitioning, analyze the implications of software partitioning, and correct design decisions prior to an actual deployment of a software system to the hybrid cloud.

• Automation. To automate the process of partitioning and deploying the final software system to the hybrid cloud, eliminating the extra engineering effort of software partitioning.

Manticore is a semi-automatic hybrid partitioning and deployment framework that integrates all the algorithms and techniques discussed in the previous chapters and facilitates a more interactive partitioning process for its users. Manticore is closely integrated with the development environment of a software system, so that as the software evolves the framework can perform its processing and suggest hybrid partitionings and deployments. Hence, Manticore allows for continuous and iterative analysis of a system towards a hybrid deployment. We implemented Manticore as an extension to the Java integrated development environment, Eclipse [56], due to its extensible and customizable plugin system. The Manticore plugin integrates the three steps of profiling, dependency modeling, and partitioning to a target cloud environment. In the rest of this chapter, we discuss each of these steps.

5.1 Manticore Overview

Manticore provides implementations for each of the three steps of profiling, analysis, and partitioning for both the code tier and the data tier of a Java web application.

• Profiling.
Runtime instrumentation of applications is an effective method for understanding application behavior [66]. The profiling process involves injecting extra profiling code into the software and collecting execution traces of the application that represent code and data dependencies as well as application resource usage. We have developed a software profiler called jip-osgi [7, 66] to perform instrumentation and analysis of Java applications. The profiler allows profiling to happen at the level of Jar files, Java classes, or Java methods, i.e., from coarse-grained to fine-grained. Profiling at different levels of granularity helps software architects balance the trade-off between the level of detail they want to capture during the instrumented execution of a software system and the overhead incurred from the extra profiling code added to the software. A finer level of granularity implies more instrumentation of more methods in the code, and can lead to lower performance for the target software system. It is left to system architects to decide on the level of granularity of their instrumentation for Jar files and Java classes in the application.

When instrumenting a Java web application, we hook a Java agent to the binary of the target web application right at the start of running the application. The Java agent mines the code of every Java class loaded into the JVM's runtime. Through the loading process, the agent injects profiling code into the beginning and the end of methods in loaded Java classes, collecting information on the overall execution time of each method. This profiling code collects traces of how methods call one another, how much time it takes for a method to be executed, and how much data is transferred from a caller method to a callee. This information allows for the creation of the dependency graph during the analysis phase (cf. Section 5.2).

• Analysis.
The data collected from the profiling process is used to create a dynamic dependency graph (DDG) [39, 49, 78] of the application. We have developed an algorithm that transforms the traces collected during the profiling process into a tree representation, with CPU usage stored as node weights and data exchange stored as edge weights. As in the profiling process, during the analysis phase the architects can decide on the level of granularity of their dependency graph prior to applying any partitioning algorithm to it. The initial dependency graph holds information on performance metrics for the analyzed application, i.e., the execution times of the methods, the volumes of data exchanged between methods, etc. Any partitioning applied to the dependency graph at this stage would thus contribute to improved performance of the application. In order to analyze for minimized costs instead, the analyzer can also augment the dependency graph with a cost model of the target cloud platform, in which vertex weights and edge weights are updated to reflect the monetary implications of execution time and data exchange. Partitioning this new graph would thus result in deployments that minimize the overall cost of deployment. In Manticore, developers can choose to optimize for cost or for performance. For enhanced understandability, a visual representation of this dependency model is implemented using the graph visualization tool Jung [25]. Details of how Manticore performs this analysis can be found in Section 5.3.

• Partitioning. The partitioning process takes the dependency graph generated in the analysis phase and produces a mathematical representation of it as a BIP formulation prior to solving it with a BIP solver.
The transformation applies all the constraints and policies of table deployments, cost constraints, and placement constraints, and generates BIP constraints and a BIP objective function. These formulations are then passed to a BIP solver, whose results determine the final placement of methods, classes, and Jar files in the Java application. Details of our implementation of the partitioning algorithms in Manticore are presented in Section 5.4.

Figure 5.1 shows the series of actions Manticore performs in order to go from profiling to partitioning and deployment. From left to right, Manticore operates on a Java application connected to an Oracle database (footnote 11). In the first phase, the jip-osgi agent is hooked to the Java application, where it performs bytecode instrumentation on the code and profiles database access patterns using the modified database driver. Next, in the analysis phase, the profiling trace, constraints, and host configurations are processed to generate the dependency model. In the partitioning phase, constraints are enforced on the dependency model and the partitioning algorithm is applied to generate a distribution plan for the distributed deployment of the Java application. Finally, the plan is passed to the deployment engine, which takes care of the distributed deployment.

11. Other databases can also be supported, but in our deployments we used Oracle in particular due to its capabilities in supporting distribution of database tables.

In the following sections we describe the implementation details of Manticore.
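To make the binary placement problem behind the partitioning step concrete, the following sketch solves a tiny instance of it by brute force. Manticore itself hands the formulation to a BIP solver (lp_solve); this exhaustive search, along with all names and cost numbers below, is purely illustrative:

```java
/** Brute-force sketch of the binary placement problem: choose x[i] in {0,1}
 *  (0 = premises, 1 = cloud) minimizing execution cost plus communication
 *  cost across cut edges, with some nodes pinned on-premises by placement
 *  constraints. Only feasible for tiny graphs; real runs use a BIP solver. */
public class PlacementSketch {
    // execCost[i][h]: cost of running node i on host h (0 = premises, 1 = cloud)
    // comm[i][j] (i < j): communication cost paid only if i and j are separated
    public static int[] solve(double[][] execCost, double[][] comm, boolean[] pinned) {
        int n = execCost.length;
        int[] best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int mask = 0; mask < (1 << n); mask++) {
            int[] x = new int[n];
            boolean ok = true;
            for (int i = 0; i < n; i++) {
                x[i] = (mask >> i) & 1;
                if (pinned[i] && x[i] == 1) { ok = false; break; } // placement constraint
            }
            if (!ok) continue;
            double cost = 0;
            for (int i = 0; i < n; i++) cost += execCost[i][x[i]];
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++)
                    if (x[i] != x[j]) cost += comm[i][j]; // cut edge pays transfer cost
            if (cost < bestCost) { bestCost = cost; best = x; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] exec = {{5, 1}, {5, 1}, {2, 1}};   // cloud CPU is cheaper
        double[][] comm = {{0, 0, 10}, {0, 0, 0.1}, {0, 0, 0}};
        boolean[] pinned = {false, false, true};      // node 2: a constrained table
        // Node 0 talks heavily to the pinned node, so it stays on-premises;
        // node 1 is cheap to separate and moves to the cloud.
        System.out.println(java.util.Arrays.toString(solve(exec, comm, pinned)));
    }
}
```

The example illustrates the core trade-off the objective function encodes: a node follows the cheap cloud CPU only when its communication with pinned on-premises entities is light.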
Each section describes the implementation of the tools corresponding to each phase of profiling, analysis, and partitioning in Figure 5.1.

Figure 5.1: From left to right, Profiling: jip-osgi is hooked to the application; Java bytecode is instrumented and database queries are analyzed. Analysis: cost constraints and host configuration information augment the profiling trace; the level of granularity for the dependency model is decided and the dependency model is generated. Partitioning: the partitioner enforces the constraints and partitions the dependency model; finally, the result of partitioning is encoded into a distribution plan that is used by the deployment engine for a distributed deployment of the Java application.

5.2 The Software Profiling Tool

As mentioned in Chapter 3, for the purpose of profiling we developed a Java software profiler called jip-osgi [7] that allows for instrumentation and analysis of a given software system. To implement the profiling process, Manticore implements an Eclipse launcher that hooks jip-osgi's Java profiling agent to a target application. Figure 5.2 shows the implemented launcher. As shown in the VM arguments part of the window, upon choosing the Java Profile launcher (Figure 5.2B), the VM arguments are augmented with the information for the javaagent as well as other parameters required by Manticore to perform the profiling process (Figures 5.2C.1 and 5.2C.2).

Figure 5.2: The screenshot of the Eclipse launcher for the Java profiler. (A) The user selects the target Java web application and chooses to modify the runtime environment. (B) The Java Profiler is selected in the runtime environment. (C.1) The required parameters get added to the set of runtime parameters for launching the javaagent.
As shown in (C.2), the parameters include the path to the javaagent library as well as minimum and maximum memory requirements, and information about the database to be used by our modified database driver for collecting information on data exchange.

Once the javaagent is hooked to the target web application, the process of collecting profiling data can be initiated. We refer to the complete profiling trace of the application as a snapshot, which is managed using the Snapshot View in Manticore. The Snapshot View communicates with the Java profiler agent over a TCP connection and allows for monitoring profiling information on remote hosts as well as the local host. The host and port number for the TCP communication with the profiling server can be defined in the Snapshot View. Through the TCP communication, the Snapshot View can start or stop the profiling process and store the profiling traces of the execution for further analysis and partitioning.

5.3 The Tool for Analysis and Generating the Dynamic Dependency Graph

Upon collection, the profiling trace is stored in a snapshot folder. Any of the profiling traces in the snapshot folder can be chosen for creating the DDG and analysis. Once a profiling trace is clicked, a dedicated editor view is generated for its dependency graph. However, prior to the generation of the dependency graph, the developer needs to specify the policy model (cf. Section 3.3.1) and the host configuration model (cf. Section 3.3.2) to be used when creating the dependency graph. As explained in Chapter 3, the policy model is used to decide on the set of software methods and database tables that need to be included in the DDG. The cost model, on the other hand, is used to adjust vertex weights and edge weights in the dependency graph. To enable this selection
To enable this selectionof the policy constraints and the cost model, Manticore implements aConfiguration View that allows the software developer to define theseconstraints and feed them to the system (Figure 5.3A). Both host configu-ration and the policy model files are XML files that can be imported intothe configuration view. Figure 5.3B.1 shows that the Profile XML Tracereferences the trace file, the Mod Exposer XML references the policy modelfile, and the Host Config XMl reference the host configuration file.Prior to generating the dependency graph, Manticore also enablessoftware developers to select the type of dependency modeling that theywant to apply to the DDG of the target software. In Section 3.2, wedescribed different models that we can use in order to generate a DDG of atarget system (see Figure 5.3B.2). Manticore enables software developersto experiment with various models of generating the DDG and comparethe analysis results. The developers can choose from the possible set ofRequest-Based Model, Static-Structure Model, or Context-Sensitive Model ofSection 3.2 when deciding about which model of dependency analysis theywant to apply to the collected profiling data. In the particular example ofFigure 5.3, the Context-Sensitive Model is chosen as the target applicablemodel. Once all the proper options are selected, the dependency model canbe created by clicking on the Generate Model button (Figure 5.3C) fromthe Configuration View.1205.4. The Tool for Partitioning the Dynamic Dependency GraphFigure 5.3: The screen shot for selecting the profiling trace, the policy con-straints, and the host configuration constraints prior to generating the de-pendency graph. (A) The user select a given snapshot from the snapshotview on the right pane of the framework for analysis. 
With the snapshot selected, the user provides information on (B.1) the collected profiling trace (i.e., the snapshot), the constraints files, and the host configuration file, as well as (B.2) the type of coarsening to be applied to the collected profiling trace in the snapshot file. (C) With all the information provided, the user clicks the button to generate the model.

5.4 The Tool for Partitioning the Dynamic Dependency Graph

Once the DDG is successfully created, a visualization of the graph appears in the model view editor of the plugin (cf. Figure 5.4). The dependency graph contains the methods calling one another as the nodes in the graph, with edges showing the call order. The view also contains information on data exchange between methods in the target application (edge weights) and the execution time of each method in the graph (vertex weights). The values of the edge weights and vertex weights for the graph shown in the editor are calculated based on the measurements described earlier in Chapter 3.

Figure 5.4: The screenshot for selecting the execution cost model, the communication cost model, and the partitioning algorithm, and applying partitioning. (A) The dependency graph is generated. (B) The user can provide information on applying charges to the edge weights and node weights in the dependency graph based on the type of cost model selected (i.e., monetary costs or performance), and can also select the partitioning algorithm. (C) With all the partitioning information selected, the user can click the button to partition the model.

Once the dependency graph is generated, the Configuration View is augmented with a sub-view on partitioning. In this view, the software developer is able to choose the proper encoding of resource usage costs into the dependency graph.
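As an illustration of the monetary encoding, the sketch below converts execution time into a vertex weight and data exchange into an edge weight. The rates mirror those of Section 4.6.2; the formulas themselves are a simplification we assume here, not Manticore's exact cost model:

```java
/** Illustrative encoding of monetary cost into dependency-graph weights:
 *  a vertex weight becomes CPU-seconds priced at the host's hourly rate,
 *  and an edge weight becomes bytes priced at the per-GB transfer rate. */
public class CostModel {
    static final double CLOUD_CPU_PER_HOUR = 0.48;  // extra-large EC2 instance
    static final double TRANSFER_PER_GB_IN = 0.12;  // data transfer inbound to the cloud

    /** Vertex weight: execution time converted to machine cost in dollars. */
    public static double vertexCost(double cpuSeconds) {
        return cpuSeconds / 3600.0 * CLOUD_CPU_PER_HOUR;
    }

    /** Edge weight: data exchanged converted to transfer cost in dollars. */
    public static double edgeCost(double bytes) {
        return bytes / (1024.0 * 1024 * 1024) * TRANSFER_PER_GB_IN;
    }

    public static void main(String[] args) {
        System.out.println(vertexCost(3600));              // one CPU-hour of cloud time
        System.out.println(edgeCost(1024L * 1024 * 1024)); // one GiB moved into the cloud
    }
}
```

With weights encoded this way, the same partitioning algorithm minimizes dollars instead of milliseconds, which is how Manticore lets developers switch between the two objectives.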
For example, if the software developer is interested in minimizing the execution time of the hybrid system, the edge weights and vertex weights can be set to the communication time and the execution time of the components in the graph. Alternatively, if the software developer chooses to minimize the monetary costs of a hybrid deployment, edge weights and vertex weights can be set to reflect the monetary costs of deployment (cf. Figure 5.4B). Upon selecting the proper encoding for edge weights and vertex weights in the DDG, the formulations from Section 3.4 are used to update the weights in the graph. The details of how these edge weights are calculated can be found in Chapter 3.

Finally, the last step in the partitioning phase is to choose the partitioning algorithm. As discussed in Section 3.5, the work in this thesis is primarily concerned with two models of binary integer programming (BIP) algorithms, i.e., the symmetric and the asymmetric partitioning algorithms. For the BIP algorithms we used the off-the-shelf integer programming solver lp_solve [10]. The results of the partitioning process are then reflected both in the dependency graph and in the form of a distribution plan. The visualized dependency graph in the plugin is updated with nodes going to the cloud colored green and nodes staying on premises colored red (cf. Figure 5.5). This provides a visual view of which nodes are separated from one another and which edges need to be cut. Besides the visualization results, the plugin also generates the partitioning output in the form of a distribution plan shown in the console of the plugin.
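As a rough illustration of what serializing such a plan can look like, the snippet below emits placement decisions as XML. The element and attribute names are hypothetical; Manticore's actual plan schema is not reproduced in this chapter:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of serializing placement decisions into an XML distribution plan.
 *  The module names and XML vocabulary here are illustrative only. */
public class PlanWriter {
    public static String toXml(Map<String, String> placements) {
        StringBuilder sb = new StringBuilder("<distributionPlan>\n");
        for (Map.Entry<String, String> e : placements.entrySet())
            sb.append("  <module name=\"").append(e.getKey())
              .append("\" host=\"").append(e.getValue()).append("\"/>\n");
        return sb.append("</distributionPlan>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> placements = new LinkedHashMap<>();
        placements.put("TradeServices.login", "cloud");
        placements.put("AccountDAO.read", "premises");
        System.out.print(toXml(placements));
    }
}
```

A machine-readable plan of this shape is what allows the deployment engine to act on the partitioning results without human intervention.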
The distribution plan highlights which modules in which request types of the application are separated from one another and how they need to be distributed. The partitioner also generates an XML version of the distribution plan. Figure 5.5A.1 highlights some of the nodes in the dependency graph whose placements are set to the private premises (colored red in the figure) and Figure 5.5A.2 shows some of the nodes set to be in the cloud (colored green). Figure 5.5B shows the output of the partitioning and placement algorithms.

Figure 5.5: The screen shot for the results of partitioning and the generated distribution plan. (A) The output of the partitioning process is shown with (A.1) nodes that are subject to placement on the private premises (colored in red) and (A.2) nodes that are subject to placement in the cloud (colored in green). (B) A text-based representation of the distribution plan is also generated both in plain text and XML to be used for distributed deployment.

Further to the visualization of the partitioning results and the distribution plan, Manticore also generates statistical information on the number of modules placed on each host, the cost of execution, the cost of communication, and the total cost of deployment as calculated by the partitioning algorithm. Figure 5.6 shows how this information is presented to the user.

Figure 5.6: The screen shot for statistical information on partitioning results. The figure shows information on the total size of data going from the private premises to the cloud, the total data going from the cloud to the private premises, the overall execution time for the code in the cloud, and the total execution time for the code on the premises. For each data point, the monetary costs are also calculated based on the information provided in the host configuration file.

5.5 Distribution and Deployment

The distribution plan generated from partitioning of the dependency graph serves as the input to the Distribution Engine. Since the DDG is augmented with both code and data interdependencies, a cut in the DDG may separate code entities from one another, code entities from data entities, and data entities from one another. Separation of code and data is achievable by accessing the database engine through the database driver. Separating inter-code or inter-data dependencies requires some extra engineering work. For code entities, we have developed a bytecode rewriting engine as well as an HTTP remoting library that takes the partitioning plan generated by the analyzer and splits the target application based on the generated plan. Splitting the application happens by identifying methods that will be separated from one another in the distributed deployment of the application, injecting remote method invocation (RMI) calls from the caller methods to the callee methods, and serializing data between the callers and the callees. The RMI calls are then given the host addresses where the private and the public portions of the application are deployed.

In order to allow for distribution of data entities, we have taken advantage of Oracle's distributed database management system (DDBMS) [11]. This allows tables remote to a local Oracle DBMS to be identified and queried for data through the local Oracle DBMS. This is made possible by providing a database link (@dblink) between the local and the remote DBMS systems. Once a bidirectional dblink is established, the two databases can execute SQL statements targeting tables from one another. Using the established dblinks, we can apply the distribution plans from our analyzer system to perform vertical sharding at the level of database tables.
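As a hypothetical illustration of how a placement decision maps onto dblink-qualified SQL (the table names, the link name, and the naive string rewriting are illustrative only; the real system relies on Oracle's distributed query engine rather than text substitution):

```python
import re

# Sketch: once tables are vertically sharded across a dblink, queries
# issued against the local DBMS must qualify remote tables as
# "table@dblink". The placement map and link name are hypothetical.
placement = {"users": "local", "bids": "remote"}  # from a distribution plan
DBLINK = "cloud_db"

def qualify(sql):
    """Append @dblink to tables the plan placed on the remote DBMS."""
    for table, host in placement.items():
        if host == "remote":
            sql = re.sub(r"\b%s\b" % table, "%s@%s" % (table, DBLINK), sql)
    return sql
```

For instance, `qualify("SELECT * FROM bids")` rewrites the remote table to `bids@cloud_db`, while queries touching only locally placed tables pass through unchanged.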
It is important to note that the distributed query engine acts on the deployment of a system after an intelligent decision about the placement of tables has been made by our partitioning algorithm.

5.6 Discussion

Manticore's integration into an integrated development environment (IDE) provides a unique opportunity for hybrid deployment analysis to be woven into the development process of a web application. We envision that, by using Manticore, developers will be able to constantly assess and revise their decisions regarding a distributed deployment of their target applications and work towards deployments that better meet cost and performance optimality requirements.

We conducted some informal pilot studies of Manticore with developers and software architects at the University of British Columbia. Our early analysis revealed that the developers found the design intuitive and useful. They expressed interest in using Manticore to assist them with their design and optimization decisions. Following the three goals of flexibility, modelling, and automation in the design of Manticore, during our study we asked system architects to experiment with the tool, modify their cost or constraint models, explain the results from the generated models, and articulate the effectiveness of the automated deployment that Manticore provides. The received feedback highlighted the fact that modelling could significantly impact and improve how a distributed system is realized. The developers found it beneficial to be able to see how distribution and database roundtrips could affect performance and to work towards performance improvements.
They also found it effective to have an automated tool for the final distribution and deployment of the application. One of the developers stated: "typically system architects suffer a lot when it comes to the final distribution of an application and it is really helpful if I can program my distribution model into the tool and let it take care of the process".

Even though the current user study was informal and limited in scale, the comments were encouraging enough to continue building the platform and to move towards a fully-fledged analysis and user study of the platform. It is part of our future work to use Manticore during the development lifecycle of a web application and see how it can help developers with making informed decisions about their hybrid cloud deployments.

5.7 Summary

In this chapter we presented our implementation of Manticore as a framework that allows for profiling, partitioning, and distribution of a monolithic software system into a hybrid version deployable to a hybrid cloud. In describing Manticore, we discussed how its implementation addressed the design decisions and requirement analysis discussed earlier in this thesis. In Section 5.2, we discussed how Manticore, in combination with JiP-OSGi, allows for profiling and collecting traces of a software system. In Section 5.3, we described how the results of the profiling process are used to generate the DDG of the application, how
the DDG is affected by the defined policy and cost models, and how different models for generating the DDG can be selected prior to generating it. Finally, in Section 5.4, we showed how the generated DDG can be used during the partitioning phase, how execution and communication costs are taken into consideration, and how the partitioning algorithms are selected. Moreover, we showed how the generated results are presented both in a visualized form for software developers and architects to understand, and as machine-interpretable XML files that can be used by the distribution engine. Manticore is open source and can be downloaded from GitHub [12].

Chapter 6
Conclusion & Future Work

While there are advantages to deploying web applications on public cloud infrastructure, many companies wish to retain control over specific resources [38] by keeping them on private premises. As a result, hybrid cloud computing has become a popular architectural model where systems are built to take advantage of both public and private infrastructure. This allows hybrid application deployments to meet privacy, confidentiality, elasticity, and scalability requirements all at the same time. However, architecting an efficient distributed system across these locations requires significant effort. An effective partitioning should not only guarantee that privacy constraints and performance objectives are met, but should also deliver on one of the primary reasons for using the public cloud: a cheaper deployment.

For multi-tier web applications, the challenge of hybrid cloud deployments manifests itself primarily in the partitioning of the application and database tiers. While there is existing research that focuses on either application-tier or data-tier partitioning, it usually focuses on a single tier and at a higher level of abstraction (e.g., the application servers).
In this dissertation we showed that optimized deployment of web applications benefits from partitioning at a finer level of granularity (i.e., software functions and database tables), with both tiers considered simultaneously during the partitioning process. We presented our research on a new cross-tier partitioning approach to help developers make effective trade-offs between performance and cost in a hybrid cloud deployment. We showed that for our benchmark applications (i.e., RUBiS [4], DayTrader [5], and JForum [6]), our approach can result in up to 54% reduction in monetary costs compared to a premises-only deployment and 56% improvement in execution time compared to a naïve partitioning where the application tier is deployed in the cloud and the data tier is on private infrastructure.

6.1 Primary Research Contributions

The work presented in this dissertation has made the following four primary contributions:

6.1.1 Context-Sensitive Dependency Modeling

Existing research on application partitioning provides techniques to determine the optimal mapping of software functions to network hosts (e.g., client or server) [43, 63, 78]. Such research only supports a simple one-to-one mapping of functions to hosts. However, in our research we have found this simple mapping to be inadequate, because the optimal placement of a function depends on the context in which the function is used. In short, sometimes it is better to execute a particular function in the public cloud and sometimes it is better to execute it on premises. We called this distinction context-sensitive partitioning, and described our solution to this problem in Chapter 3. We showed that context-sensitive dependency modeling can improve both cost and performance in a hybrid deployment by as much as 40%.

6.1.2 Flexible Cost Modeling

At its core, application partitioning is a method for applying mathematical optimization to distributed software development.
The primary objectives for optimization are performance, i.e., request processing latency, and the monetary cost of deployment. However, previous work did not provide a means for developers to explicitly make trade-offs between these two objectives. In our research, we provided a flexible cost modeling technique which allows developers to clearly specify their expected trade-offs between latency and monetary costs when analyzing an application for partitioning and hybrid deployment. Details of our flexible cost modeling were described in Chapter 3 and examples are provided in Appendix C. We introduced the notion of γ as a parameter that relates the cost of deployment to the level of perceived latency. We demonstrated that a higher γ values lower latency while a lower γ values cheaper deployments. We showed that choosing the right value for γ can improve deployment costs by up to 51% when cheaper deployment is the focus. Similarly, in a situation where achieving lower latency is the goal, choosing the right value for γ can improve the overall application throughput by almost 25%.

6.1.3 Cross-tier Partitioning

When distributing a system across a hybrid architecture, developers need to simultaneously minimize data transfer, maximize use of public cloud resources, and honor privacy constraints for data. Achieving this goal requires intelligent design for both business logic execution and data placement. Thus it is important not only to analyze the context of code execution, but also to profile relational operations at the level of database queries. We have investigated combining code partitioning and data partitioning to guarantee an optimal hybrid cloud deployment. The work presented here is the first of its kind in tackling the problem of combined code and data partitioning.
Our experiments on benchmark applications (i.e., RUBiS [4], DayTrader [5], and JForum [6]) revealed that combined partitioning of code and data entities to a hybrid cloud can provide a cost improvement of more than 55% compared to a naïve hybrid deployment, and around 40% compared to when only code (and not data) is partitioned. The details of our cross-tier partitioning approach are presented in Chapter 4.

6.1.4 Asymmetric Data Exchange Costs

For partitioning and distribution algorithms to be effective within the context of hybrid cloud deployments, we need to consider financial and operational characteristics of the host cloud. Our analysis of cost schemes for various cloud providers revealed that public cloud providers impose asymmetric charges for data transferred to/from their infrastructure [2, 3]. Purposefully enough, these asymmetric data transfer charges encourage pushing data to the cloud by making the cost of data transfer to the cloud considerably cheaper than data retrieval from the cloud.¹² Exploiting this asymmetric cost model, we proposed an optimization formulation for software partitioning that further reduces the cost of a hybrid deployment. We showed that, by exploiting this asymmetry in data costs, we were able to reduce the monthly costs of hybrid deployments by 11% compared to when this asymmetry in communication costs is ignored. Details of our strategy for dealing with asymmetric cost models are presented in Chapter 3.

¹² Amazon and RackSpace have no charges for data transfer into their cloud infrastructure, while they charge between $0.12 and $0.18 for every GB of data leaving the cloud.
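Using the egress rate quoted in the footnote above (the traffic volumes are hypothetical), the effect of the asymmetry can be sketched as:

```python
# Rough sketch of asymmetric transfer charges: ingress to the cloud is
# free, egress is charged per GB (here $0.12/GB, the low end of the
# quoted range). Monthly traffic volumes below are hypothetical.
INGRESS_PER_GB = 0.00
EGRESS_PER_GB = 0.12

def monthly_transfer_cost(gb_to_cloud, gb_from_cloud):
    """Monthly communication cost for the data crossing a cut."""
    return gb_to_cloud * INGRESS_PER_GB + gb_from_cloud * EGRESS_PER_GB

# A cut that pushes 500 GB/month into the cloud but pulls back only
# 50 GB is far cheaper than the reverse, so an asymmetry-aware
# optimizer prefers placements whose cut edges carry data cloud-ward.
cheap = monthly_transfer_cost(500, 50)      # mostly ingress
expensive = monthly_transfer_cost(50, 500)  # mostly egress
```

A symmetric formulation would price both directions identically and miss exactly this difference, which is what the asymmetric BIP formulation exploits.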
6.2 Secondary Research Contributions

Further to the primary contributions discussed above, we also made the following secondary contributions:

6.2.1 Simulation and Real-world Evaluation

To validate the effectiveness of the developed solutions, we conducted thorough analyses of our developed techniques on several open source software systems (i.e., RUBiS [4], Apache DayTrader [5], and JForum [6]). For all the analyses, we first conducted simulations using our implemented framework, Manticore, and then validated the results of our simulations by conducting real-world deployments of the benchmark software systems on the Amazon Web Services (AWS) platform. Through these evaluations, we were able to demonstrate that cost and performance were improved for a hybrid deployment of the applications when the deployment model was generated using our automated partitioning algorithms. Details of these evaluations can be found in Chapters 3 and 4.

6.2.2 Comprehensive Tooling

All the algorithms and tools described in this thesis are implemented under a framework for application partitioning and distribution to the cloud, Manticore [67]. Manticore helps software architects make informed decisions with cost-, data-, and performance-effective deployments of their applications. Manticore has been implemented on top of the Eclipse integrated development environment, and can be used within the same development environment where web applications are developed. This close integration with the development environment is intended to facilitate the overall process of software development and allow for easier deployment of web applications to the hybrid cloud.
Details on the usage of Manticore were presented in Chapter 5.

6.3 Thesis Review and Observations

As discussed in Chapter 1, we started our research following the thesis below:

We can develop semi-automatic partitioning and distribution algorithms and tools to facilitate migration of OLTP-style web applications to a hybrid cloud deployment. This is achieved through identification of constraints imposed on cost and data placement, modeling and analysis of code and data dependencies in a monolithic software system, and formulating the dependency models and constraints into an optimization problem.

Based on the contributions in this work, we demonstrated that:

• developing semi-automatic partitioning algorithms is feasible through our cross-tier partitioning algorithm;
• cost and data placement can be captured through our flexible cost modeling scheme;
• code and data dependencies can be analyzed using our context-sensitive dependency model;
• dependency models can be formulated into a BIP optimization problem; and finally
• cross-tier partitioning is proven feasible through extensive tooling and detailed evaluation of benchmark applications.

Following the results of our research, we feel confident in claiming that this research work validated our hypothesis. In the next section we go over some of the challenges we faced throughout our research work and discuss plans for our future work.

6.4 Challenges and Future Work

Like any other research work, the hypotheses discussed and validated in this dissertation had their own shortcomings and challenges. In this section, we discuss some of these challenges and shortcomings and propose how they can be addressed in the future.

6.4.1 Semi-automatic Partitioning

While our approach simplifies manual partitioning for hybrid cloud deployment, it requires some input from a developer. First, we require a representative workload for profiling.
Second, a developer may need to provide input about the impact that atomic transactions have on partitioning. After partitioning, a developer may also want to consider changes to the implementation to handle some transactions in an alternative fashion, e.g., providing forward compensation [53]. To ameliorate some of this burden, our tool provides an interactive visualization of partitioning results, as shown in Figure 4.1, to allow developers to work through different "what-if" scenarios. Also, as noted, our current implementation and experience requires some manual intervention and is limited to Java-based web applications and SQL-based databases. However, as part of future work, we plan to increase the level of automation and extend the tool to work with web applications utilizing other programming languages and database systems as well.

Furthermore, as described in Chapter 3, we make assumptions about the statelessness of web applications. Given that functions in web applications may keep internal state, we plan to extend our research to perform static analysis and symbolic execution, combined with dynamic profiling and data dependency analysis, to detect shared state in application functions and allow for replication or restriction of affected functions in such a way that the overall behaviour of the target application stays intact.

6.4.2 Heterogeneous vs. Homogeneous Machines

Following the points in Section 3.7 of Chapter 3, in our current implementation of the algorithms, our cost models and host configuration information assume homogeneity of all machines in the private premises and the public cloud. However, previous research shows that the cloud environment is mostly a heterogeneous environment, with machines of the same category having different capabilities.
As part of our future work, we plan to revise the cost schemes and host configuration information in such a way that a multi-way partitioning of the dependency models would allow for deployment across machines with heterogeneous capabilities.

6.4.3 Multi-way and Online Partitioning

As shown in this dissertation, our formulation of an asymmetric cost model works well for a two-way partitioning of a given application. However, as discussed in Section 3.7, we are not aware of any efficient algorithm that would allow for multi-way partitioning of dependency graphs with asymmetric cost models. This remains an open problem for us to investigate. Furthermore, we plan to develop tools that allow dynamic repartitioning of code and data through online monitoring of application performance. Particularly for the variations of query plans, we are currently limited to the set of potentially incomplete suggestions provided by the database engine. As described in Chapter 4, our data model is built by consulting the ORACLE database engine. As a result, our data dependency model is only as good as the statistical knowledge, collected by the database engine, of predictions on database query executions. This could clearly result in a suboptimal partitioning of data entities due to suboptimal prediction of query plan executions.

As part of the future work, we plan to develop techniques that allow for more exhaustive analysis of query plans by either i) deploying all distribution models for data entities to identify the optimal query plans, or ii) integrating our optimization techniques more closely with those of database engines to more effectively report on distributed query plans.
We also intend to investigate how providing more adaptivity in relocating data and code entities can result in better performing and more cost-effective deployments of applications.

6.4.4 Data Dependency Analysis

As discussed in Section 4.7, our current implementation of data-tier partitioning relies on leveraging the distributed query engine from a production database (i.e., ORACLE). In some environments, relying on a homogeneous integration of data by the underlying platform may not be realistic. A given monolithic OLTP-style web application may use ORACLE as its database engine; however, when it comes to a distributed deployment in a hybrid cloud, conditions may enforce different choices of database for different parts of the distribution. As an example, a hybrid cloud offering of a MySQL database may happen to be cheaper than its ORACLE counterpart, or a cloud provider may lack proper licensing to support an ORACLE database. In situations like this, it is important to allow heterogeneous engines to manage data for different parts of a hybrid deployment, e.g., ORACLE for the premises deployment and MySQL for the cloud deployment. We are currently working to automatically generate REST interfaces to integrate data between the public cloud and private premises rather than relying on a SQL layer. In future work we plan to support a more loosely coupled service-oriented architecture to accommodate a distributed version of the partitioned applications.

6.4.5 Security and Privacy Requirements

Following the discussions in Section 4.7, our current work mostly focuses on the security and privacy of data storage when dealing with hybrid cloud deployments. This implies that, for our deployments, we are mostly concerned about where the data is stored rather than how the data flows. Consequently, our analysis and partitioning algorithms deal with constraints on data placement and not the flow of data.
While this allows for addressing scenarios concerning regulations on data storage, it falls short in dealing with more security- and privacy-critical scenarios where the flow of data is also important. Our current approach does not guarantee that leakage of privacy-sensitive data from the private infrastructure to the public cloud will be prevented. For future work, we plan to develop techniques for analyzing data flow and tracking privacy-critical data. We intend to combine this work on data flow analysis with our approach to provide a better guarantee on protecting highly sensitive user information.

6.5 Summary

To summarize, we have shown that, in the presence of code and data placement constraints, the use of automated profiling, analysis, and partitioning techniques leads to optimal deployment of web applications to the hybrid cloud. We showed how this can be achieved through a combination of context-sensitive dependency modeling, a flexible cost model, cross-tier partitioning, and asymmetric data exchange models. We also conducted detailed analysis of benchmark applications to validate the overall improvements in cost and performance when doing a cross-tier partitioning of web applications for the hybrid cloud.

Overall, our findings showed significant improvements in hybrid deployment of web applications to the cloud using the proposed automated approach. Nonetheless, this research is only scratching the surface when it comes to (semi-)automated application deployment to the cloud. Web applications, while one of the major candidates, are not the only type of application for hybrid cloud deployment. With NoSQL databases, map-reduce style applications, CPU-intensive scientific applications, batch processing applications, etc., the problem of (semi-)automatic partitioning and deployment of applications to the cloud remains open.
We believe the correct strategy for a hybrid deployment really depends on the context in which the application is utilized. As such, any other type of application deployed to the hybrid cloud requires detailed research and evaluation for the identification of a proper deployment strategy to the hybrid cloud.

Bibliography

[1] Kiip Corp. - [Online - August 2013]: http://www.kiip.me/.
[2] Amazon Web Services (AWS) Pricing - [Online - August 2013]: http://aws.amazon.com/ec2/pricing/.
[3] Rackspace Pricing - [Online - August 2013]: http://www.rackspace.com/cloud/public/servers/pricing/.
[4] RUBiS: Rice University Bidding System - [Online - August 2013]: http://rubis.ow2.org/.
[5] Apache DayTrader - [Online]: https://cwiki.apache.org/GMOxDOC20/daytrader.html.
[6] JForum Discussion Board - [Online - August 2014]: http://jforum.net.
[7] JiP-OSGi - [Online - August 2014]: http://code.google.com/p/jip-osgi/.
[8] Jetty Web Server - [Online - August 2013]: http://jetty.codehaus.org/jetty/.
[9] Oracle Database 11g Express Edition - [Online - August 2013]: http://www.oracle.com/technetwork/products/express-edition/overview/index.html.
[10] lp_solve Linear Programming solver - [Online - August 2014]: http://lpsolve.sourceforge.net/.
[11] Oracle Distributed Database Architecture - [Online - August 2014]: http://docs.oracle.com/cd/B28359_01/server.111/b28310/ds_concepts001.htm.
[12] Manticore Repository - [Online - August 2014]: https://github.com/nkaviani/ca.ubc.magic.partitioning.analyzer.
[13] Vyatta Corp. - [Online - August 2013]: http://www.vyatta.com/.
[14] CloudReach Corp. - [Online - August 2013]: http://www.cloudreach.com/en/.
[15] Google Secure Data Connector - [Online - August 2013]: https://developers.google.com/secure-data-connector/.
[16] Amazon Virtual Private Cloud - [Online - August 2013]: http://aws.amazon.com/vpc/.
[17] Karma Corp.
- [Online - August 2013]: https://yourkarma.com/.
[18] EC2 to VPC: A transition worth doing - [Online - August 2013]: http://blog.engineering.kiip.me/post/12288961849/ec2-to-vpc-transition-worth-doing.
[19] Building private clouds with Amazon VPC - [Online - August 2013]: http://blog.yourkarma.com/building-private-clouds-with-amazon-vpc/.
[20] Amazon Direct Connect - [Online - August 2013]: http://aws.amazon.com/directconnect/.
[21] NetApp Corp. - [Online - August 2013]: http://www.netapp.com/.
[22] Zadara Storage Corp. - [Online - August 2013]: http://www.zadarastorage.com/.
[23] Google AppEngine - [Online - August 2014]: https://developers.google.com/appengine/.
[24] Google Compute Engine - [Online - August 2014]: https://cloud.google.com/products/compute-engine/.
[25] Java Universal Network/Graph Framework - [Online - August 2014]: http://jung.sourceforge.net/.
[26] PaaSLane, the Fastest Way to Migrate Applications to Amazon Web Services - [Online - August 2014]: http://www.paaslane.com/.
[27] Personal correspondence with Microsoft Principal Architect, Doug Franklin.
[28] Personal correspondence with RackSpace Engineer, David Campos.
[29] SMB Cloud Adoption Study, March 2011. Microsoft Survey Reveals 39 Percent of SMBs to Pay for Cloud Services Within Three Years - [Online - August 2014]: http://www.microsoft.com/en-us/news/press/2011/mar11/03-25smbresearchpr.aspx.
[30] 2013 Future of Cloud Computing Survey Reveals Business Driving Cloud Adoption in Everything as a Service Era; IT Investing Heavily to Catch up and Support Consumers Graduating from BYOD to BYOC, June 2013 - [Online - August 2014]: http://www.businesswire.com/news/home/20130619005581/en/2013-Future-Cloud-Computing-Survey-Reveals-Business.
[31] Will Cloud Services Make or Break Your Offshore Provider?, July 2013 - [Online - August 2013]: http://www.gartner.com/DisplayDocument?id=2554916.
[32] September 2013.
Hybrid cloud computing overview and benefits - [Online - August 2014]: http://www.cloudcomputing-news.net/blog-hub/2013/sep/05/hybrid-cloud-computing-overview-benefits/.
[33] Sharad Agarwal, John Dunagan, Navendu Jain, Stefan Saroiu, and Alec Wolman. Volley: Automated data placement for geo-distributed cloud services. In Proc. of NSDI, 2010.
[34] Sanjay Agrawal, Vivek Narasayya, and Beverly Yang. Integrating vertical and horizontal partitioning into automated physical database design. In Proc. of SIGMOD, 2004.
[35] A. O. Allen. Probability, Statistics, and Queueing Theory: With Computer Science Applications. Academic Press, 1990.
[36] OSGi Alliance. OSGi Service Platform Core Specification, Release 4.2. Technical report, September 2009. [Online - August 2014]: http://www.osgi.org/Release4/Download.
[37] K. Amiri, D. Petrou, G. R. Ganger, and G. A. Gibson. Dynamic function placement for data-intensive cluster computing. In Proceedings of the Annual USENIX Technical Conference, page 25. USENIX Association, 2000.
[38] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. Above the Clouds: A Berkeley View of Cloud Computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, Feb 2009.
[39] Todd M. Austin and Gurindar S. Sohi. Dynamic dependency analysis of ordinary programs. SIGARCH Comput. Archit. News, 20(2):342–351, 1992.
[40] C. Becker, M. Handte, G. Schiele, and K. Rothermel. PCOM - a component system for pervasive computing. 2004.
[41] Antonio Carzaniga, Gian Pietro Picco, and Giovanni Vigna. Designing distributed applications with mobile code paradigms. In ICSE, pages 22–32, 1997.
[42] Antonio Carzaniga, Gian Pietro Picco, and Giovanni Vigna. Is Code Still Moving Around? Looking Back at a Decade of Code Mobility. In ICSE Companion, pages 9–20, 2007.
[43] S. Chong, J. Liu, A. C. Myers, X. Qi, K.
Vikram, L. Zheng, and X. Zheng. Building secure web applications with automatic partitioning. Communications of the ACM, 52(2):79–87, 2009.
[44] S. Chong, J. Liu, A. C. Myers, X. Qi, K. Vikram, L. Zheng, and X. Zheng. Building secure web applications with automatic partitioning. In Proc. of SOSP, 2009.
[45] Byung-Gon Chun, Sunghwan Ihm, Petros Maniatis, Mayur Naik, and Ashwin Patti. CloneCloud: elastic execution between mobile device and cloud. In Proc. of EuroSys, 2011.
[46] Microsoft Corporation. Microsoft COM technologies - DCOM. Technical report, 1998. [Online - August 2014]: http://www.microsoft.com/com/dcom.asp.
[47] Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. Schism: a workload-driven approach to database replication and partitioning. Proc. VLDB Endow., 3(1-2), 2010.
[48] R. E. Diaconescu, L. Wang, Z. Mouri, and M. Chu. A compiler and runtime infrastructure for automatic program distribution. 2005.
[49] Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319–349, 1987.
[50] Rick Freedman. Cloud computing migration issues: What you need to know, December 2009. [Online - March 2011]: http://blogs.techrepublic.com.com/project-management/?p=1248.
[51] Alfonso Fuggetta, Gian Pietro Picco, and Giovanni Vigna. Understanding Code Mobility. IEEE Transactions on Software Engineering, 24:342–361, 1998.
[52] Bruce Gain. Cloud Computing & SaaS In 2010: What To Expect After The Uncertainty & Hype Fade, January 2010.
[53] Hector Garcia-Molina and Kenneth Salem. Sagas. In Proc. of SIGMOD, 1987.
[54] I. Giurgiu, O. Riva, D. Juric, I. Krivulev, and G. Alonso. Calling the cloud: enabling mobile phones as interfaces to cloud applications. In Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware, pages 1–20. Springer-Verlag New York, Inc., 2009.
[55] Bernard Golden.
The case against cloud computing, part one, January 2009. [Online - July 2011]: http://www.computerworld.com/s/article/9126620/The_case_against_cloud_computing_part_one?taxonomyId=1.
[56] O. Gruber, BJ Hargrave, J. McAffer, P. Rapicault, and T. Watson. The Eclipse 3.0 platform: adopting OSGi technology. IBM Systems Journal, 44(2):289–299, 2005.
[57] X. Gu, A. Messer, I. Greenberg, D. Milojicic, and K. Nahrstedt. Adaptive offloading for pervasive computing. IEEE Pervasive Computing, pages 66–73, 2004.
[58] Mohammad Hajjat, Xin Sun, Yu-Wei Eric Sung, David Maltz, Sanjay Rao, Kunwadee Sripanidkulchai, and Mohit Tawarmalani. Cloudward bound: planning for beneficial migration of enterprise applications to the cloud. In Proc. of SIGCOMM, 2010.
[59] Mohammad Hajjat, Xin Sun, Yu-Wei Eric Sung, David Maltz, Sanjay Rao, Kunwadee Sripanidkulchai, and Mohit Tawarmalani. Cloudward bound: planning for beneficial migration of enterprise applications to the cloud. In SIGCOMM '10: Proceedings of the ACM SIGCOMM 2010 conference on SIGCOMM, pages 243–254, New York, NY, USA, 2010. ACM.
[60] Griffith Hamlin, Jr. Configurable applications for satellite graphics. In SIGGRAPH '76: Proceedings of the 3rd annual conference on Computer graphics and interactive techniques, pages 196–203, New York, NY, USA, 1976. ACM.
[61] B. Hendrickson and R. Leland. An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM Journal on Scientific Computing, 16(2):452–469, 1995.
[62] Gregor Hohpe. Your coffee shop doesn't use two-phase commit. IEEE Softw., 22(2):64–66, March 2005.
[63] G.C. Hunt and M.L. Scott. The Coign automatic distributed partitioning system. Operating Systems Review, 33:187–200, 1998.
[64] Waheed Iqbal, Matthew N. Dailey, and David Carrera. SLA-driven dynamic resource management for multi-tier web applications in a cloud. In CCGRID, 2010.
[65] Jesper M. Johansson. On the impact of network latency on distributed systems design.
Information Technology and Management, 1:183–194, 2000.
[66] Nima Kaviani, Eric Wohlstadter, and Rodger Lea. Profiling-as-a-Service: Adaptive Scalable Resource Profiling for the Cloud in the Cloud. In Proc. of Int'l Conf. on Service Oriented Computing (ICSOC), 2011.
[67] Nima Kaviani, Eric Wohlstadter, and Rodger Lea. Manticore: A Framework for Partitioning of Software Services for Hybrid Cloud. In Proc. of IEEE CloudCom, 2012.
[68] Nima Kaviani, Eric Wohlstadter, and Rodger Lea. Cross-Tier Application and Data Partitioning of Web Applications for Hybrid Cloud Deployment. In ACM/IFIP/USENIX International Conference on Middleware, 2013.
[69] Vaibhav Khadilkar, Murat Kantarcioglu, and Bhavani Thuraisingham. Risk-Aware Data Processing in Hybrid Clouds. Technical report, University of Texas at Dallas, 2011.
[70] Steven Y. Ko, Kyungho Jeon, and Ramsés Morales. The HybrEx model for confidentiality and privacy in cloud computing. In Proc. of HotCloud, 2011.
[71] Frank Leymann, Christoph Fehling, Ralph Mietzner, Alexander Nowak, and Schahram Dustdar. Moving applications to the cloud: an approach based on application model enrichment. Int. J. Cooperative Inf. Syst., 20(3):307–356, 2011.
[72] J. Li, J. Chinneck, M. Woodside, M. Litoiu, and G. Iszlai. Performance model driven QoS guarantees and optimization in clouds. In Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, pages 15–22. IEEE Computer Society, 2009.
[73] J.Z. Li, J. Chinneck, M. Woodside, and M. Litoiu. Fast scalable optimization to configure service systems having cost and quality of service constraints. In Proceedings of the 6th International Conference on Autonomic Computing, pages 159–168. ACM, 2009.
[74] Janet Michel and Andries van Dam. Experience with distributed processing on a host/satellite graphics system. In SIGGRAPH '76: Proceedings of the 3rd annual conference on Computer graphics and interactive techniques, pages 190–195, New York, NY, USA, 1976. ACM.
[75] Microsoft.
The Economics of the Cloud. USA, November 2010.
[76] Sun Microsystems. Java remote method invocation. Specification, Sun Microsystems, Palo Alto, CA, 1999.
[77] M.G. Nanda, S. Chandra, and V. Sarkar. Decentralizing execution of composite web services. ACM SIGPLAN Notices, 39(10):170–187, 2004.
[78] Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel Madden. Wishbone: Profile-based Partitioning for Sensornet Applications. In NSDI 2009, Boston, MA, April 2009.
[79] Object Management Group, 492 Old Connecticut Path, Framingham, MA 01701, USA. The Common Object Request Broker: Architecture and Specification, Revision 2.2. Technical report, February 2002.
[80] S. Ou, K. Yang, and J. Zhang. An effective offloading middleware for pervasive services on mobile devices. Pervasive and Mobile Computing, 3(4):362–385, 2007.
[81] Shumao Ou, Kun Yang, and Antonio Liotta. An adaptive multi-constraint partitioning algorithm for offloading in pervasive systems. In PerCom, pages 116–125. IEEE Computer Society, 2006.
[82] Z. Ou, H. Zhuang, A. Lukyanenko, J. Nurminen, P. Hui, V. Mazalov, and A. Yla-Jaaski. Is the Same Instance Type Created Equal? Exploiting Heterogeneity of Public Clouds. volume PP, pages 1–1, 2013.
[83] Zhonghong Ou, Hao Zhuang, Jukka K. Nurminen, Antti Ylä-Jääski, and Pan Hui. Exploiting Hardware Heterogeneity Within the Same Instance Type of Amazon EC2. In Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, HotCloud'12, pages 4–4, Berkeley, CA, USA, 2012. USENIX Association.
[84] Prabhakar Raghavan and Clark Tompson. Randomized rounding: A technique for provably good algorithms and algorithmic proofs. Combinatorica, 7:365–374, 1987. 10.1007/BF02579324.
[85] Jan S. Rellermeyer, Gustavo Alonso, and Timothy Roscoe. R-OSGi: Distributed applications through software modularization. In Renato Cerqueira and Roy H. Campbell, editors, Middleware, volume 4834 of Lecture Notes in Computer Science, pages 1–20. Springer, 2007.
[86] J.S.
Rellermeyer, O. Riva, and G. Alonso. AlfredO: an architecture for flexible interaction with electronic devices. In Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware, pages 22–41. Springer-Verlag New York, Inc., 2008.
[87] RightScale. IT managers' concerns on cloud lock-in. [Online]: http://www.rightscale.com/products/advantages/cloud-portability.php.
[88] Manuel Rodriguez-Martinez and Nick Roussopoulos. MOCHA: A self-extensible database middleware system for distributed data sources. In Proc. of Int'l Conf. on Management of Data (SIGMOD), 2000.
[89] S. M. Sadjadi. A survey of adaptive middleware. Technical report, Michigan State University, 2003.
[90] A. Schrijver. Theory of Linear and Integer Programming. Wiley & Sons, 1998.
[91] Griffiths Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. In Proc. of SIGMOD, 1979.
[92] S. Shimizu, R. Rangaswami, H.A. Duran-Limon, and M. Corona-Perez. Platform-independent modeling and prediction of application resource usage characteristics. Journal of Systems and Software, 82(12):2117–2127, 2009.
[93] Borja Sotomayor, Ruben S. Montero, Ignacio M. Llorente, and Ian Foster. Virtual Infrastructure Management in Private and Hybrid Clouds. IEEE Internet Computing, 13:14–22, 2009.
[94] C. Stewart and K. Shen. Performance modeling and system management for multi-component online services. In Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2, page 84. USENIX Association, 2005.
[95] Christopher Stewart and Kai Shen. Performance modeling and system management for multi-component online services. In NSDI'05: Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation, pages 71–84, Berkeley, CA, USA, 2005. USENIX Association.
[96] M. Stoer and F. Wagner. A simple min-cut algorithm. Journal of the ACM (JACM), 44(4):591, 1997.
[97] Sun.
Introduction to Cloud Computing Architecture, June 2009. 1st Edition.
[98] Byung Chul Tak, Bhuvan Urgaonkar, and Anand Sivasubramaniam. To Move or Not to Move: The Economics of Cloud Computing. In Proceedings of the 3rd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'11, pages 5–5, Berkeley, CA, USA, 2011. USENIX Association.
[99] E. Tilevich and Y. Smaragdakis. J-Orchestra: Automatic Java application partitioning. ECOOP 2002 - Object-Oriented Programming, pages 1–3, 2006.
[100] Yohei Ueda and Toshio Nakatani. Performance variations of two open-source cloud platforms. In Proc. of Int'l Symp. on Workload Characterization, 2010.
[101] B. Urgaonkar, A.L. Rosenberg, P. Shenoy, and A. Zomaya. Application placement on a cluster of servers. International Journal of Foundations of Computer Science, 18(5):1023–1041, 2007.
[102] B. Urgaonkar, P. Shenoy, and T. Roscoe. Resource overbooking and application profiling in shared hosting platforms. ACM SIGOPS Operating Systems Review, 36(SI):254, 2002.
[103] Alexander Wieder, Pramod Bhatotia, Ansley Post, and Rodrigo Rodrigues. Orchestrating the deployment of computations in the cloud with Conductor. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, NSDI'12, pages 27–27, Berkeley, CA, USA, 2012. USENIX Association.
[104] F. Yang, N. Gupta, N. Gerner, X. Qi, A. Demers, J. Gehrke, and J. Shanmugasundaram. A unified platform for data driven web applications with automatic client-server partitioning. In Proceedings of the 16th international conference on World Wide Web, page 350. ACM, 2007.
[105] F. Yang, J. Shanmugasundaram, M. Riedewald, and J. Gehrke. Hilda: A high-level language for data-driven web applications. In WWW, 2006.
[106] F. Yang, J. Shanmugasundaram, M. Riedewald, and J. Gehrke. Hilda: A high-level language for data-driven web applications. 2006.
[107] Mamoon Yunus. Understanding Enterprise-to-Cloud Migration Costs and Risks, June 2010.
[Online - July 2010]: http://www.ebizq.net/topics/cloud_computing/features/12741.html.
[108] Steve Zdancewic, Lantian Zheng, Nathaniel Nystrom, and Andrew C. Myers. Secure program partitioning. ACM Trans. Comput. Syst., 20(3):283–328, 2002.
[109] Hao Zhuang, Xin Liu, Zhonghong Ou, and Karl Aberer. Impact of Instance Seeking Strategies on Resource Allocation in Cloud Data Centers. In Proceedings of the 2013 IEEE Sixth International Conference on Cloud Computing, CLOUD '13, pages 27–34, Washington, DC, USA, 2013. IEEE Computer Society.

Appendix A
Appendix on Hybrid Cloud Solutions

As discussed in Chapter 1, three infrastructure elements contribute to a hybrid cloud solution: i) the public cloud infrastructure, ii) the private cloud infrastructure, and iii) the network infrastructure connecting the two. Public cloud infrastructure involves resources and services offered by cloud provider companies, in the form of computation and storage resources as well as accompanying services enabling resource usage elasticity and pay-per-use cost models. Similar to public cloud infrastructure, private infrastructure also involves computation and storage resources, with the distinction that these resources are fixed in size and are privately managed.

Despite the existence of various points of variability across different hybrid cloud solutions, in this appendix we distinguish them based on their type of network connectivity, mainly because the network connection can join any two types of public and private infrastructure in a hybrid cloud solution. Options for network connectivity range from software-enabled virtual private networks (VPNs), to more reliable hardware-enabled VPNs, to highly complex low-latency, high-capacity direct network connections between public cloud and private data centers, which vary in their offered resources and pricing models. Next we review each hybrid cloud solution as well as the respective provider companies.

Software-enabled VPNs.
Hybrid clouds using software-enabled VPNs utilize the existing network infrastructure of the Internet and do not require any extra hardware to enable secure connectivity between the public cloud and private data centers. This reduces costs in the overall setup of these hybrid cloud solutions. However, establishing such hybrid connectivity is complex and may suffer from unstable data transfer rates. Vyatta [13] and CloudReach [14] are examples of companies offering such hybrid cloud deployments, connecting the Amazon public cloud to arbitrary private infrastructure. Google also connects its Compute Engine [24] and AppEngine [23] to arbitrary private data centers using a service called Secure Data Connector (SDC) [15]. In both these examples, capabilities and costs of utilizing public cloud resources follow the cost models for Amazon AWS and Google Compute Engine / AppEngine, respectively. On the private side, however, it is the responsibility of companies utilizing these services to establish, configure, and manage their private resources. The model is particularly suitable for software systems that do not require a significant amount of data exchange between the public cloud and the private infrastructure and can tolerate occasional connectivity interruptions.

Hardware-enabled VPNs. Hybrid clouds with hardware-enabled VPNs provide faster and more secure connectivity at a higher cost. The cost of using a hardware-enabled VPN usually falls above that of the regular network appliances of a cloud provider. However, their security and reliability offerings, as well as configurable network setups, make them attractive options for hybrid deployments. Amazon Web Services (AWS) offers such hardware-enabled VPNs via its Virtual Private Cloud (VPC) [16] service, to be combined with arbitrary private infrastructure. VPC is utilized by companies such as Kiip [1] and Karma [17] to build hybrid connectivity with increased security and confidentiality [18, 19].
Hardware-enabled VPNs can replace software-enabled VPNs in systems where more reliability and security are expected. However, in terms of connectivity and bandwidth, such systems also rely on the underlying network infrastructure, which can limit data exchange rates and cause occasional network disruptions.

Direct Data Links. Finally, a hybrid cloud with direct network connections between a private data center and a cloud data center offers the most consistent network performance of all hybrid deployments. On the upside, clients of such services are offered low latency and high capacity for data exchange, making such deployments suitable for traffic-heavy cloud applications. On the downside, however, there are only limited options for private infrastructure that allow this type of connectivity to cloud data centers. Furthermore, these setups require complex network hardware and result in deployment costs often three times as much as software-enabled VPN connectivity. As an example, Amazon Direct Connect [20], with its more reliable high-capacity network topology, provides the ideal infrastructure for companies requiring high-throughput and data-intensive interactions between their private infrastructure and the public cloud. NetApp [21] and Zadara Storage [22], offering cloud-enabled content-delivery networks and data management, are among the customers of Amazon Direct Connect. From the set of customers of a direct data link solution, it can be seen that this model of deployment is particularly effective in systems where extensive data exchange over a reliable network is expected between the cloud and the private infrastructure. The offered
The offeredreliability and network capacity justifies the higher costs of deploymentover a direct data link solution.With all the existing solutions, choosing the right hybrid cloud solution isa matter of identifying requirements for performance and reliability versuscost and complexity of the deployments. This is the job of the systemarchitect to decide about the right balance for all these factors when choosingwhich hybrid cloud solution to adopt.148Appendix BAppendix on DynamicDependency Graph149Appendix B. Appendix on Dynamic Dependency GraphProgram B.1 A sample profiling trace collected by jip-osgi.<frame bn="jdbcwrapper" cn="Util$TraceItem"mn="nz.jdbc.Util$TraceItem:print"c="1" t="663702" dfp="23444" dtp="123324" cfp="1" ctp="1"><frame bn="jdbcwrapper" cn="Util"mn="nz.jdbc.Util:line"c="1" t="12712" dfp="53" dtp="56" cfp="1" ctp="1"/><frame bn="jdbcwrapper" cn="RollingPrintWriter"mn="nz.jdbc.RollingPrintWriter:println"c="1" t="547975" dfp="0" dtp="0" cfp="1" ctp="1"><frame bn="jdbcwrapper" cn="RollingPrintWriter"mn="nz.jdbc.RollingPrintWriter:print"c="1" t="124318" dfp="0" dtp="0" cfp="1" ctp="1"><frame bn="jdbcwrapper" cn="RollingPrintWriter"mn="nz.jdbc.RollingPrintWriter:write"c="1" t="90654" dfp="0" dtp="0" cfp="1" ctp="1"><frame bn="jdbcwrapper" cn="RollingPrintWriter"mn="nz.jdbc.RollingPrintWriter:write"c="1" t="57759" dfp="0" dtp="0"cfp="1" ctp="1"><frame bn="jdbcwrapper" cn="RollingPrintWriter"mn="nz.jdbc.RollingPrintWriter:ensureOpen"c="1" t="12152" dfp="0" dtp="0"cfp="1" ctp="1"/></frame></frame></frame><frame bn="jdbcwrapper" cn="RollingPrintWriter"mn="nz.jdbc.RollingPrintWriter:println"c="1" t="352978" dfp="0" dtp="0" cfp="1" ctp="1"><frame bn="jdbcwrapper" cn="RollingPrintWriter"mn="nz.jdbc.RollingPrintWriter:newLine"c="1" t="194368" dfp="0" dtp="0" cfp="1" ctp="1"><frame bn="jdbcwrapper" cn="RollingPrintWriter"mn="nz.jdbc.RollingPrintWriter:ensureOpen"c="1" t="11105" dfp="0" dtp="0"cfp="1" ctp="1"/></frame></frame></frame></frame></frame> 
Appendix C
Appendix on Cost and Policy Specifications for Partitioning

C.1 Policy Specification

As shown in Program C.1, the policy spec can include constraints on which code entities to include in the dependency graph. For example, the policy spec of Program C.1 constrains the DDG to contain only the information for the TradeServletAction:doQuotes business logic functionality of the DayTrader application. The policy spec can also define entities that can be ignored in the dependency model. This can involve code blocks that belong to the underlying platform on which the application is executed. For the example of Program C.1, the policy model excludes entities of the catalina component in the Tomcat JEE server from the DDG of the application, and as such the created DDG will not contain any code entities from this component as part of its model.

The policy spec can also specify which entities may or may not be replicated when creating the DDG. For non-replicated code entities, the DDG ensures that there is only a single instance of each of these code blocks in the dependency graph. When the application is partitioned, these code blocks are constrained to be placed either on premises or in the cloud. On the contrary, for replicable code or data entities, the entity can co-exist both on the private infrastructure and in the cloud. For stateless code entities, this replication can easily be supported by simply having replicated instances of the code on public and private infrastructure. For stateful code entities or data entities, synchronizing the state of code instances or maintaining consistency across different versions of data entities needs to be handled by the system under analysis. Given that the system can guarantee consistency across
replicated instances, we can use the DDG to effectively decide on the optimal partitioning and distribution of the application across public and private infrastructures.

Program C.1 A sample policy specification: filtering the given DDG, ignoring any code from the catalina jar in the dependency model, and making all the database tables non-replicable. The constraints are defined as regular expressions.

{
  "constraints": {
    "root": {
      "entry": {
        "id": "1",
        "entity": {
          "component": "^.+(TradeServletAction:doQuotes).*$",
          "target": "component" }
      }
    },
    "ignore": {
      "entity": {
        "component": "^.*(catalina).*$",
        "target": "component" }
    },
    "non-replicable": {
      "entity": {
        "component": "^(DBBundle:Oracle).*$",
        "target": "component" }
    }
  }
}

C.2 Cost Schema

As shown in Program C.2, the cost scheme can encode information on the CPU and memory capabilities of the machines in the cloud or on-premises, as well as their associated costs for a given amount of time. All machine capabilities and associated monetary charges are configurable in the cost scheme. Similarly, for data exchange, the cost scheme can encode information on the bandwidth of the connection link between the cloud and the private infrastructure for both the inbound and the outbound communications. Separate monetary costs can also be configured for the inbound and outbound communications.
Program C.2 An example cost scheme provided to the profiler, with information on host capabilities and data exchange rates as well as resource usage costs.

{
  "capabilities": {
    "hosts": {
      "host-num": "2",
      "host": {
        "id": "1", "default": "true",
        "cpu": {
          "capability": { "scale": "GHz", "#text": "2.4" },
          "cost": { "unit": "3600", "scale": "second", "#text": "0.50" }
        },
        "memory": {
          "capability": { "scale": "GB", "#text": "3" },
          "cost": { "unit": "1", "scale": "GB", "#text": "0" }
        }
      }
      // data for host2 is omitted
    },
    "exchange-rates": {
      "exchange-rate": {
        "from-host": "1", "to-host": "2",
        "capability": { "scale": "MB", "#text": "100" },
        "latency": { "scale": "second", "#text": "0.03" },
        "cost": { "unit": "1", "scale": "GB", "#text": "0.12" }
      }
      // data exchange for host2->host1 is omitted
    }
  }
}
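To show how such a scheme could drive a partitioning decision, the sketch below scores a candidate placement by combining per-host CPU charges with per-GB charges on the inter-host link, mirroring the numbers in Program C.2 (host 1 bills $0.50 per 3600 CPU-seconds; the host1-to-host2 link bills $0.12 per GB). It is illustrative only: the flattened scheme dictionary, the key names, and placement_cost are hypothetical simplifications, not the profiler's actual input format.

```python
def placement_cost(cpu_seconds, gb_exchanged, scheme):
    """Estimate the monetary cost of a candidate partitioning:
    CPU-time charges per host plus per-GB charges for the data
    crossing the public/private link."""
    total = 0.0
    for host, secs in cpu_seconds.items():
        tariff = scheme["hosts"][host]          # {"cost": $, "unit": seconds}
        total += tariff["cost"] * (secs / tariff["unit"])
    for link, gb in gb_exchanged.items():
        total += scheme["links"][link]["cost_per_gb"] * gb
    return total

# Hypothetical flattened form of Program C.2; host 2 (on-premises)
# is assumed to incur no per-use charge.
SCHEME = {
    "hosts": {"1": {"cost": 0.50, "unit": 3600},
              "2": {"cost": 0.00, "unit": 3600}},
    "links": {"1->2": {"cost_per_gb": 0.12}},
}
```

A partitioner could evaluate this objective for each feasible cut of the DDG and trade the resulting cost against the latency figure the scheme also records.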

