UBC Theses and Dissertations



Context-aware conversational developer assistants Bradley, Nicholas 2018



Full Text

Context-Aware Conversational Developer Assistants

by

Nicholas Bradley

B.Sc., Queen’s University, 2013

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in The Faculty of Graduate and Postdoctoral Studies (Computer Science)

The University of British Columbia (Vancouver)

August 2018

© Nicholas Bradley, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled: Context-Aware Conversational Developer Assistants, submitted by Nicholas Bradley in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Examining Committee:
Reid Holmes, Computer Science, Supervisor
Gail Murphy, Computer Science, Supervisory Committee Member

Abstract

Building and maintaining modern software systems requires developers to perform a variety of tasks that span various tools and information sources. The crosscutting nature of these development tasks requires developers to maintain complex mental models and forces them (a) to manually split their high-level tasks into low-level commands that are supported by the various tools, and (b) to (re)establish their current context in each tool. In this thesis I present Devy, a Conversational Developer Assistant (CDA) that enables developers to focus on their high-level development tasks. Devy reduces the number of manual, often complex, low-level commands that developers need to perform, freeing them to focus on their high-level tasks. Specifically, Devy infers high-level intent from developers’ voice commands and combines this with an automatically generated context model to determine appropriate workflows for invoking low-level tool actions; where needed, Devy can also prompt the developer for additional information.
Through a mixed methods evaluation with 21 industrial developers, we found that Devy provided an intuitive interface that was able to support many development tasks while helping developers stay focused within their development environment. While industrial developers were largely supportive of the automation Devy enabled, they also provided insights into several other tasks and workflows that CDAs could support to help them focus on the important parts of their development tasks.

Lay Summary

Software engineers use many different tools for any one project. They work with millions of lines of computer code and rely on various independent tools to edit, build, and test their systems, and on project management tools to keep their programs running smoothly. Needing all of these tools can complicate engineers’ workflows, because each tool uses its own syntax and engineers have to understand how to put the tools together. In this thesis, I describe Devy, a tool for software engineers that enables Amazon Alexa to take care of mundane programming tasks, helping to increase productivity and speed up workflows. I also present an evaluation of Devy’s ability to support the workflows of 21 professional software engineers and discuss some of Devy’s benefits and challenges.

Preface

All of the work presented henceforth was conducted in the Software Practices Laboratory at the University of British Columbia. All projects and associated methods were approved by the University of British Columbia’s Research Ethics Board [certificate #H17-01251].

A version of this material has been published [Nick C. Bradley, Thomas Fritz, Reid Holmes. Context-Aware Conversational Developer Assistants. ICSE 40:993–1003, 2018]. I was the lead investigator, responsible for all major areas of concept formation, data collection and analysis, as well as manuscript composition. Thomas Fritz was involved in study design and participant recruitment and contributed to manuscript edits.
Reid Holmes was the supervisory author on this project and was involved throughout the project in concept formation and manuscript composition.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgment
1 Introduction
2 Scenario
3 Approach
  3.1 Conversational Interface (Devy Skill)
  3.2 Context Model
  3.3 Intent Service
4 Study Method
  4.1 Participants
  4.2 Procedure
  4.3 Data Collection and Analysis
5 Results
  5.1 Completing Development Tasks with Devy
  5.2 CDA Benefits & Scenarios
  5.3 CDA Challenges
  5.4 Summary
6 Related Work
  6.1 Development Context
  6.2 Bots for SE
  6.3 Natural Language Tools for SE
7 Discussion
  7.1 Threats to Validity
  7.2 Future Work
8 Conclusion
Bibliography
A Implementation Notes
  A.1 Handling Requests
  A.2 Developing Alexa Skills
  A.3 Processing User Intents
B Study Details
  B.1 Recruiting Participants
  B.2 Conducting the Study
  B.3 Data Analysis
    B.3.1 Quantifying Number of Invocation Attempts
    B.3.2 Open Coding Interview Transcripts

List of Tables

Table 2.1 Manual steps for the common ‘share changes’ workflow.
Table 2.2 Literal CDA steps for the ‘share changes’ workflow.
Table 3.1 Context Model elements.
Table 4.1 Order, sample questions, and tasks from our mixed methods study.
Table B.1 Participant and study metadata.
Table B.2 Devy’s response when task is completed successfully.
Table B.3 Total and valid attempts to complete each task using Devy. Valid attempts exclude attempts that failed due to technical reasons. Data from transcript 6 was excluded because the participant completed the tasks in the wrong order.

List of Figures

Figure 3.1 Devy’s architecture. A developer expresses their intention in natural language via the conversational layer. The intent service translates high-level language tokens into low-level concrete workflows which can then be automatically executed for the developer. Dotted edges predominantly communicate in the direction of the arrow, but can have back edges in case clarification is needed from the user.
Figure 3.2 Devy’s finite state machine for handling workflows. Stack-push transitions are shown with solid lines while stack-pop transitions are shown with dotted lines. For readability, some arrows do not connect with their state. However, all lines are labelled with the action that causes the state transition and correspond to the next state. Edges between the FSM and the Context Model are elided for clarity.
Figure 5.1 Adjusted number of attempts required to complete each task of T1 across 20 participants.
Figure A.1 Detailed view of a request being processed by our system.
Figure A.2 Amazon Alexa developer console showing the Devy skill. Supported intents are shown on the left in the Interaction Model section. Other skill development tasks can be completed using the links on the right under the Skill builder checklist. Testing and deployment are completed via the tabs in the header bar.
Figure A.3 Training utterances for the startIssue intent. Slots (variables) are shown in blue and surrounded by curly braces. The slot type and order are specified below the list of utterances.
Figure B.1 Consent form contents
Figure B.2 Study guide
Glossary

API  Application Programming Interface
CDA  Conversational Developer Assistant
FSM  Finite State Machine
GUI  Graphical User Interface
IDE  Integrated Development Environment
NLP  Natural Language Processing
UI   User Interface
URL  Uniform Resource Locator

Acknowledgment

I would first like to thank my supervisor Dr. Reid Holmes for his guidance and support whenever I ran into a trouble spot or had a question about my research or writing. He consistently allowed this thesis to be my own work, but steered me in the right direction whenever he thought I needed it.

I would also like to thank Dr. Thomas Fritz for helping me design the study and recruit participants, and for seeing me through my first qualitative analysis, as well as all the students in the Software Practices Lab who helped pilot the study and who make coming to work a pleasure each day.

I would like to acknowledge Dr. Gail Murphy as the second reader of this thesis, and I am gratefully indebted to her for her very valuable comments on this thesis.

I would also like to thank the 21 professional software engineers (and their employers and managers) who dedicated their time to trying out Devy. Without their passionate participation and valuable insights, this work could not have been successfully completed.

Finally, I must express my very profound gratitude to my wife Susanne for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis.

This accomplishment would not have been possible without the support of everyone mentioned and many others. Thank you.

Chapter 1
Introduction

The process of developing software is complex.
Developers discuss, write, test, debug, version, and deploy code with the aid of a plethora of different tools, which often need to be used in concert to achieve a high-level task. Combining these tools to complete tasks can be cumbersome and inefficient, since developers need to manually execute tools in the correct sequences (workflows) and specify the correct parameter values (contextual information). These manual steps force developers to maintain mental models that combine data from a variety of disparate sources, including project source code, version control systems, issue trackers, discussion threads, and test and deployment environments. Manual processes are also inefficient because developers need to switch frequently between tools and contexts [5, 10, 11], and because they are more error-prone. Additionally, today’s tools are designed for different platforms, including the web, the desktop, and the terminal, requiring developers to learn and work with different interfaces and to act as bridges that move information between tools. For example, terminal tools are often considered very powerful, but their rigid syntax and limited support for feature discovery can make them difficult to use. Web tools, on the other hand, aim to be intuitive but typically lack the customizability of terminal or desktop tools. For common tasks, developers can learn the required steps of the workflow, but even small variations in the task may require significant changes to the workflow. Many tasks can also be completed in several different ways, which can lead developers to learn less efficient workflows, or ones that cause problems with team processes. A complete workflow for sharing code changes is given in Chapter 2.

Tools operate within a certain context, which is specified both by the system (e.g., the user’s home directory) and by the developer (e.g., the particular file to operate on).
While the system context is specified automatically, the developer needs to understand, obtain, and transform the remaining contextual information and provide it to the tool. This can be cognitively demanding and can distract developers from their actual task. Alternatively, we can track developer activities and automatically extract and synthesize much of the necessary contextual information. This approach is commonly employed in integrated development environments, where developer interactions are used to provide more intelligent tools. In our case, capturing and maintaining this information is more challenging because it comes from a variety of tools, but doing so allows us to run multi-step workflows that span tools.

Techniques for automating developer tasks have been around for decades and typically take the form of scripts or macros. Scripts, consisting of sequences of commands, are commonly used in software development due to their versatility in connecting different tools and services; however, they can be brittle to changes in the system, tool, or task and can be expensive to implement initially. Macros instead work by recording the steps developers take when completing a task, but they are often only available within specific tools and may still require effort to make them robust. More recently, tools for testing, deployment, and service integration have become popular choices for automating parts of the software development process, but they impose their own restrictions. In general, while it would benefit developers to automate these workflows, doing so is challenging for three reasons. First, it is hard to determine a priori all the tasks and workflows developers will need to complete. Second, these workflows consist of various low-level actions that often span tool boundaries and require a diverse set of parameters that depend on the current context and developer intent.
Third, even if scripts are configured for automating workflows, the developer needs to remember their existence and how to invoke them manually in the current context.

In this thesis, I argue that automated assistants are an intuitive and robust method of automating workflows, and that they benefit developers by automatically mapping intentions to commands, eliminating application and mental context switches, and reducing the need to manually maintain and specify context. To support this claim, we designed a conversational developer assistant (CDA) that (a) provides a conversational interface for developers to specify their high-level tasks in natural language, (b) uses an intent service to automatically map high-level tasks to low-level development actions, and (c) automatically tracks developers’ actions and relevant state in a context model to automate workflows and the specification of their parameters. The CDA allows developers to express their intent conversationally, eliminating the need to learn and remember rigid syntax while promoting discoverability and task flexibility. The automatic mapping and execution of workflows based on the developer’s high-level intent, augmented by the context model, reduces the cognitive effort developers spend breaking down high-level intents into low-level actions, switching context between disparate tools, and parameterizing complex workflows.

In order to conduct a meaningful industrial evaluation of the feasibility, usability, and potential use cases of CDAs in software development, we implemented Devy, a prototype voice-controlled CDA with a predefined set of automated Git and GitHub tasks. Devy’s primary goal is to help developers maintain their focus on their development tasks, enabling them to offload low-level actions to an automated assistant.

We performed a mixed methods study (a combination of an interview and an experiment) with 21 industrial software engineers, using Devy as a technology probe.
Participants had the opportunity to interact with our Devy prototype so they could offer concrete feedback about alternative applications of CDAs to their industrial workflows. Each engineer performed multiple experimental tasks with Devy and answered a series of open-ended questions. The evaluation showed that engineers were able to successfully use Devy’s intent-based voice interface and that they saw promise in this type of approach in practice.

This feedback provides evidence of the potential and broad applicability of Conversational Developer Assistants, and of developers’ interest in increased automation of their day-to-day workflows.

The primary contributions of this thesis are:
• A context model and conversational agent to support automated development assistants.
• Devy, a prototypical voice-activated CDA that infers developer intent and transforms it into complex workflows.
• A mixed methods study demonstrating the value of this approach to industrial developers and providing insight into how CDAs can be used and extended in the future.

We describe a concrete scenario in Chapter 2, our approach in Chapter 3, and our experiment in Chapters 4 and 5. Related work, discussion, and conclusions follow in Chapters 6–8.

Chapter 2
Scenario

Development projects often use complex processes that involve integrating numerous tools and services. To perform their high-level tasks, developers need to break down their intent into a list of atomic actions that are performed as part of a workflow. While the intent may be compact and simple, workflows often involve interacting with a variety of different tools.

Consider a developer whose task is to submit their source code changes for review, which requires using version control, issue tracking, and code review tools. At a low level, the developer needs to commit their changes, push them to a remote repository, link the change to the commit in the issue tracker, and assign reviewers in the code review system.
Our context model is able to track what project the developer is working on, what issue is currently active, and who the common reviewers for the changed code are, which enables the developer to just say “Devy, I’m done” to complete this full workflow without having to change context between different tools.

To perform this task manually, the developer must follow a workflow similar to that shown in Table 2.1 (for simplicity, we illustrate this workflow using GitHub). In this scenario, developers use three tools: GitHub (Table 2.1-(a),(f),(g)), the test runner (Table 2.1-(b)), and git (Table 2.1-(c),(d),(e)). They also perform four overlapping subtasks: running the tests (Table 2.1-(b)), linking the commit (Table 2.1-(a),(c)), managing version control (Table 2.1-(c),(d),(e)), and configuring the code for review (Table 2.1-(f),(g)). In addition, they rely on several pieces of implicit contextual knowledge: (1) the repository being used, (2) the current issue, (3) how to run the tests, (4) the project’s commit linking protocol, and (5) the project’s code review assignment protocol.

Table 2.1: Manual steps for the common ‘share changes’ workflow.
(a) Open a web browser for the issue tracker and check the issue number for the current work item.
(b) Open a terminal and run the tests against the changed code to ensure they work (e.g., npm run tests).
(c) Open a terminal and commit the code, tagging it with the current work item number (e.g., git commit -m 'See issue #1223').
(d) Pull any external changes from the remote repository (e.g., git pull).
(e) Push the local change to the remote repository (e.g., git push).
(f) Open the commit in the version control system using the GitHub web interface and open a pull request.
(g) Determine a set of reviewers and assign them to the pull request with the GitHub web interface.

Providing a voice-activated CDA for this workflow without any additional abstraction (or context model) offers little benefit, as shown by the transcript in Table 2.2, where each developer utterance is followed (→) by the CDA’s response.

Table 2.2: Literal CDA steps for the ‘share changes’ workflow.
(a) “CDA, create a branch ‘issue1223’ in the FrontEnd repo.” → “Branch created.”
(b) “CDA, run all the tests in the FrontEnd repo.” → “Tests executing.”
(c) “CDA, commit with ‘Fix #1223’ in the FrontEnd repo.” → “Commit made.”
(d) “CDA, pull the FrontEnd repo.” → “Pulled.”
(e) “CDA, push the FrontEnd repo.” → “Pushed.”
(f) “CDA, open GitHub for the FrontEnd repo and create a pull request for branch issue1223.” → “Pull request created.”
(g) “CDA, open GitHub for the FrontEnd repo and add alice79 as a reviewer for the issue1223 pull request.” → “Reviewer added.”

This implementation has obvious shortcomings: it provides no meaningful benefit over accessing the commands directly, as the developer must say all of the commands with the right names, in the right order, and with the right parameters, and it would no doubt be faster to perform the actions directly.

Automating this workflow would require knowing the five pieces of contextual information along with knowledge of how to use the three tools employed by the developer. Fortunately, these are all pieces of information that are tracked directly by the context model of our conversational developer assistant, Devy. The same workflow can be completed using Devy’s verbal natural language interface:

Dev: Devy, I’m done.
Devy: You have uncommitted changes. Should I commit them?
Dev: OK.
Devy: OK, I’m about to open a pull request, should I assign Alice?
Dev: Yeah.

During this interaction, the developer did not need to use any other tools or switch their context away from their source code. The context model automatically tracked the project being worked on, the files being changed, and the current issue number.
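To make the role of the context model concrete, the following is a minimal Python sketch (not Devy’s implementation) of how a single “I’m done” utterance could be expanded into the ordered steps of Table 2.1, with every parameter filled from context. The `DevContext` and `Step` types, the action names, and the `ask` callback, which stands in for a yes/no question asked over the voice channel, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DevContext:
    """Hypothetical snapshot of what a context model knows at request time."""
    repo: str
    branch: str
    issue: int
    has_uncommitted_changes: bool
    likely_reviewer: Optional[str] = None


@dataclass
class Step:
    tool: str             # tool that performs the action
    action: str           # illustrative low-level action name
    args: Dict[str, str]  # parameters taken from the context, not the developer


def expand_done_intent(ctx: DevContext, ask: Callable[[str], bool]) -> List[Step]:
    """Expand the high-level "I'm done" intent into ordered low-level steps.

    Only the yes/no confirmations come from the developer; the repository,
    branch, issue number, and reviewer are all read from `ctx`.
    """
    steps = [Step("test runner", "run_tests", {"repo": ctx.repo})]
    if ctx.has_uncommitted_changes and ask(
            "You have uncommitted changes. Should I commit them?"):
        # The message encodes a (hypothetical) commit-linking protocol.
        steps.append(Step("git", "commit",
                          {"repo": ctx.repo, "message": f"See issue #{ctx.issue}"}))
    steps.append(Step("git", "pull", {"repo": ctx.repo}))
    steps.append(Step("git", "push", {"repo": ctx.repo, "branch": ctx.branch}))
    steps.append(Step("github", "create_pull_request",
                      {"repo": ctx.repo, "branch": ctx.branch}))
    if ctx.likely_reviewer and ask(f"Should I assign {ctx.likely_reviewer}?"):
        steps.append(Step("github", "add_reviewer",
                          {"branch": ctx.branch, "reviewer": ctx.likely_reviewer}))
    return steps


if __name__ == "__main__":
    ctx = DevContext(repo="FrontEnd", branch="issue1223", issue=1223,
                     has_uncommitted_changes=True, likely_reviewer="alice79")
    for step in expand_done_intent(ctx, ask=lambda question: True):
        print(step.tool, step.action, step.args)
```

In this sketch the developer answers at most two yes/no questions, mirroring the dialogue above, while the values a developer would otherwise have to recall and type come from `ctx`.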
To show the results of the tests (Table 2.1-(b)), Devy appends the output of the test execution as a comment to the pull request thread when it is complete. To identify the list of appropriate reviewers (Table 2.1-(g)), Devy is able to query a simple service that examines past reviewers for the code involved in the developer’s change.

While the developer’s intent, submitting changes, is simple, it can only be realized through the indirect process listed above, which involves interacting directly with the version control system, issue tracker, test execution environment, and code review system. Each of these systems incurs its own mental and temporal costs and provides opportunities for a team’s workflow to be ignored (e.g., if new team members are not aware of all steps, or an experienced one skips a step). Ultimately, this task involves four context switches between the issue tracker, the version control system, the pull request interface, and the code review system; Devy abstracts away these minutiae so the developer can focus on their high-level intent.

Chapter 3
Approach

Devy, our conversational developer assistant (CDA), has three main components: a conversational user interface, a context model, and an intent service. Developers express their intent using natural language. Our current prototype uses Amazon Echo devices and the Amazon Alexa platform to provide the conversational interface; this interface converts developer sentences into short commands.

These commands are passed to our intent service, which runs on the developer’s computer. The context model actively and seamlessly updates in the background on the developer’s computer to gather information about their activities. Using the context model and the short commands from the conversational interface, the intent service infers a rich representation of the developer’s intent.
This intent is then converted to a series of workflow actions that can be performed for the developer. While the vast majority of input to the intent service is derived from the context model, in some instances clarification is sought from the developer (via the conversational interface). Devy’s architecture is shown in Figure 3.1.

[Figure 3.1: Devy’s architecture. Components: Echo, Amazon NLP, and the Devy Skill in the conversation layer; the Intent Service, Context Model, and Workflow Actions on the developer’s computer, with natural language and standard development behaviour as inputs. A developer expresses their intention in natural language via the conversational layer. The intent service translates high-level language tokens into low-level concrete workflows which can then be automatically executed for the developer. Dotted edges predominantly communicate in the direction of the arrow, but can have back edges in case clarification is needed from the user.]

3.1 Conversational Interface (Devy Skill)

The conversational interface plays a crucial role by allowing developers to express their intents naturally without leaving their development context. The Devy Skill has been implemented for the Amazon Alexa platform (apps on this platform are called skills). To invoke the Devy Skill, a developer must say:

“Alexa, ask Devy to ...”

They can then complete their phrase with any intent they wish to express to Devy. The Amazon microphones only start recording once they hear the ‘Alexa’ wake word, and the Devy skill is only invoked once ‘Devy’ has been spoken.

The Amazon natural language APIs translate the developer’s speech into a JSON object; to do this, the Devy skill tells the Amazon Alexa platform what kinds of tokens we are interested in. We have provided the platform with a variety of common version control and related development ‘utterances’ that we identified in the literature and from discussions with professional developers; many utterances also have synonyms (e.g., for ‘push’, we also include ‘submit’ and ‘send’).
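As a rough illustration of how a skill back end could consume the platform’s output, the following minimal Python sketch routes a request by its intent name (e.g., `pushChanges`). It assumes the general shape of an Alexa-style `IntentRequest` (`request.intent.name` plus an optional `slots` map) and a plain-text speech response; the handler functions and the `issueNumber` slot name are hypothetical, and in Devy the command would instead be forwarded to the intent service on the developer’s machine.

```python
import json
from typing import Any, Callable, Dict

Slots = Dict[str, Dict[str, Any]]


def handle_push(slots: Slots) -> str:
    # Hypothetical: a real skill would forward this to the intent service.
    return "Pushed."


def handle_start_issue(slots: Slots) -> str:
    # "issueNumber" is a hypothetical slot name used only for illustration.
    issue = slots.get("issueNumber", {}).get("value")
    return f"Starting issue {issue}." if issue else "I need an issue number to start."


HANDLERS: Dict[str, Callable[[Slots], str]] = {
    "pushChanges": handle_push,
    "startIssue": handle_start_issue,
}


def speech(text: str) -> Dict[str, Any]:
    """Wrap text in a minimal plain-text speech response."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def dispatch(raw_request: str) -> Dict[str, Any]:
    """Route an Alexa-style JSON request to a handler based on its intent name."""
    request = json.loads(raw_request)["request"]
    if request.get("type") != "IntentRequest":
        return speech("Devy is ready.")
    intent = request["intent"]
    handler = HANDLERS.get(intent["name"])
    if handler is None:
        return speech("Sorry, I can't do that yet.")
    return speech(handler(intent.get("slots", {})))


if __name__ == "__main__":
    raw = json.dumps({"version": "1.0", "request": {
        "type": "IntentRequest", "intent": {"name": "pushChanges", "slots": {}}}})
    print(dispatch(raw)["response"]["outputSpeech"]["text"])  # Pushed.
```

Because synonym handling happens on the platform side, the back end in this sketch only ever sees canonical intent names, which keeps the dispatch table small.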
For a sentence like “Alexa, tell Devy to push.”, the Amazon JSON object would contain one primary field, intent.name, with the value ‘pushChanges’.

While Alexa has been useful for constructing our prototype, it imposes two restrictions that hinder our approach:
1. The requirement to use two names, “Alexa” and “Devy”, is cumbersome.
2. More technically, Alexa does not allow push notifications and requires the client app to respond within ten seconds, both of which cause issues for long-running commands.

While our current approach uses the Amazon APIs for voice input, a text-based method (e.g., a chatbot) would also be feasible for scenarios where voice-based input is not appropriate.

3.2 Context Model

The context-aware development model represents the ‘secret sauce’ that enables advanced voice interactions with minimal explicit developer input. The differences between the manual CDA and context-aware CDA (Devy) approaches are exemplified in Chapter 2. The model acts as a knowledge base, allowing the majority of the parameters required for performing low-level actions to be inferred automatically without developer intervention. In cases where required information is not present in the context model, the developer can be prompted for it using the conversational interface.

The context model for our prototype system is described in Table 3.1. The current model supports version control actions and online code hosting actions. Our prototype tool includes concrete bindings for Git and GitHub, respectively, for these two roles, but also supports other semantically similar systems such as Mercurial and BitBucket. While other sets of information can be added to the model, these were sufficient for our current prototype.

The ActiveFile model parameter is the most frequently updated aspect of the model. As the developer moves from file to file in any text editor or IDE, the full path of the active file is noted and persisted in the ActiveFile field.
The context model is also populated with information about all version control repositories it finds on the developer’s machine. From these local repositories, using the OriginURL, it is also able to populate information about the developer’s online code hosting repositories. The path component of ActiveFile lets the model index into the list of version control repositories to get additional details, including the remote online repository, if applicable. Our prototype also allows developers to exclude specific repositories and paths from being tracked by the context model, and only minimal information about the state of the command is exchanged with Amazon to ensure the privacy of the user.

Table 3.1: Context Model elements.
Current Focus
  ActiveFile
Each Local Repository
  Path
  Version Control Type
  OriginURL
  UserName
  CurrentBranch
  FileStatus
Each Remote Repository
  OpenAssignedIssues[]
  Collaborators[]
Other Services
  BlameService
  TestService
  ReviewerAssignmentService

We designed our context model to pull changes from a developer’s computer when they interact with Devy. Due to the limited size of our context model, the pull-based architecture is sufficient. However, for more advanced models, a push-based architecture, where the model is initialized at startup and continuously updated by external change events, would be preferable to avoid delaying the conversational interface.

Extending the model is only required if the developer wishes to support workflows that require new information. Model extensions can take the form of either pre-populated entries (push-based, as above) or pointers to services that can be populated on demand (pull-based). For example, the TestService, which takes a list of files and returns a list of tests that can be run, can be pointed to any service that conforms to this API (to enable easy customization of test selection algorithms).
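One way to picture a subset of the Table 3.1 elements and the path-based lookup described above is the following Python sketch. It illustrates the idea under stated assumptions rather than reproducing Devy’s data model: the class and field names are hypothetical renderings of the table entries, repositories are matched by whole path components (the most deeply nested match wins), excluded paths are honoured, and the TestService is modeled as any callable from a list of files to a list of tests, so that a different test selection algorithm can be plugged in.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Callable, List, Optional

# Hypothetical signature: a test service maps changed files to tests to run.
TestService = Callable[[List[str]], List[str]]


@dataclass
class LocalRepository:
    path: str
    vcs_type: str        # e.g. "git" or "hg"
    origin_url: str
    current_branch: str


def _within(child: PurePosixPath, root: PurePosixPath) -> bool:
    """True if `child` is `root` or lies underneath it (component-wise)."""
    return child.parts[:len(root.parts)] == root.parts


@dataclass
class ContextModel:
    active_file: Optional[str] = None
    repositories: List[LocalRepository] = field(default_factory=list)
    excluded_paths: List[str] = field(default_factory=list)
    test_service: Optional[TestService] = None

    def active_repository(self) -> Optional[LocalRepository]:
        """Index into the known repositories using the path of the active file."""
        if self.active_file is None:
            return None
        active = PurePosixPath(self.active_file)
        if any(_within(active, PurePosixPath(p)) for p in self.excluded_paths):
            return None  # the developer opted this path out of tracking
        matches = [r for r in self.repositories
                   if _within(active, PurePosixPath(r.path))]
        # Prefer the most deeply nested repository (e.g. a nested checkout).
        return max(matches, key=lambda r: len(PurePosixPath(r.path).parts),
                   default=None)

    def tests_for(self, changed_files: List[str]) -> List[str]:
        """Delegate test selection to whichever service is plugged in."""
        return self.test_service(changed_files) if self.test_service else []
```

Comparing path components rather than raw string prefixes avoids mistaking a sibling directory such as `FrontEndOld` for a file inside `FrontEnd`.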
If developers wanted more fine-grained information, such as the current class, method, or field being investigated, they could add a relevant entry to the model and populate it using some kind of navigation monitor for their current text editor or IDE.

3.3 Intent Service

The intent service does the heavy lifting of translating the limited conversational tokens and combining them with the context model to determine the developer’s intent. This intent is then executed for the developer in the form of workflow actions. The context model is updated as the actions execute, since their outcomes may influence subsequent steps of the workflow.

The conversational layer provides the intent service with extremely simple input commands (e.g., a single verb or noun). The intent service uses a stack-based finite state machine (FSM) to reason about what the input command means in this context. While more constrained than plan-based models, FSMs are simple to implement and are sufficient for the purposes of evaluating the potential of CDAs. The complete FSM for our version control intent service is shown in Figure 3.2. Within the FSM, transitions between states define workflow steps, while states contain the logic needed to prepare and execute low-level actions. Each state is aware of other states that may need to be executed before it can successfully complete (e.g., a pull may be required before a push if the local repository is behind the remote repository). We use a stack-based FSM because workflow actions frequently depend on each other. By using a stack, we can simply push commands onto the stack and allow execution to return to the right place in an understandable way. These potential return edges are denoted by the dotted arrows in Figure 3.2; for example, Stashing can be accessed either directly by the developer from the Ready state, or as a consequence of a Pulling precondition.
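The stack discipline described here can be illustrated with a small Python sketch. It is not Devy’s FSM (Figure 3.2 has many more states); it assumes just two hypothetical context flags, `uncommitted` and `behind_remote`, and two illustrative precondition rules: a push needs a pull when the local branch is behind, and a pull needs a stash when there are uncommitted changes. Prerequisite states are pushed on top of the requested one, and popping returns execution to the state that was waiting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RepoState:
    """Hypothetical slice of the context model used by this example."""
    uncommitted: bool
    behind_remote: bool


def preconditions(action: str, s: RepoState) -> List[str]:
    """Return the states that must complete before `action` can (illustrative rules)."""
    if action == "push" and s.behind_remote:
        return ["pull"]
    if action == "pull" and s.uncommitted:
        return ["stash"]
    return []


def perform(action: str, s: RepoState) -> None:
    """Stand-in for the low-level action; it only updates the context."""
    if action == "pull":
        s.behind_remote = False
    elif action in ("stash", "commit"):
        s.uncommitted = False


def run_intent(intent: str, s: RepoState) -> List[str]:
    """Execute `intent`, first running any prerequisite states via a stack."""
    executed: List[str] = []
    stack: List[str] = [intent]        # an empty stack means we are back in Ready
    while stack:
        top = stack[-1]
        pending = preconditions(top, s)
        if pending:
            stack.extend(reversed(pending))  # stack-push: handle prerequisites first
            continue
        stack.pop()                          # stack-pop: return to the waiting state
        perform(top, s)
        executed.append(top)
    return executed


if __name__ == "__main__":
    state = RepoState(uncommitted=True, behind_remote=True)
    print(run_intent("push", state))  # ['stash', 'pull', 'push']
```

Each `perform` call updates the context, which is what lets the waiting state re-check its preconditions and proceed; a fuller implementation would also need to guard against prerequisites that can never be satisfied, which this sketch does not do.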
The states in the FSM make heavy use of the context model to provide values for their parameters.

[Figure 3.2 diagram: states Ready, Pulling, Stashing, Committing, Pushing, Branching, ListingIssues, GettingOwner, CreatingPR, ListingUsers, and RunTests; transitions labelled pull, stash, commit, push, commit?, stash?, "behind: pull first", branch, owner, list issues, get issue, create pull request, run tests, test report on PR, and assign reviewers]
Figure 3.2: Devy’s finite state machine for handling workflows. Stack-push transitions are shown with solid lines, while stack-pop transitions are shown with dotted lines. For readability, some arrows do not connect with their state; however, all lines are labelled with the action that causes the state transition and correspond to the next state. Edges between the FSM and the Context Model are elided for clarity.

Chapter 4
Study Method

The long-term objective of our research is to enable intent-based workflows without software developers having to continuously map their intents to low-level commands. Therefore, we are investigating when and how software developers would use a conversational developer assistant that supports intent-based workflows. Specifically, we are examining the following research questions:

RQ1 How well can a conversational developer assistant approach support basic development tasks related to version control?
RQ2 For which workflows would a conversational developer assistant be useful to developers and why?

To answer our research questions, we developed the voice-enabled CDA, Devy, as a prototype (see Chapter 3), piloted it with several student developers, and then conducted a mixed-methods study with 21 professional software developers. The study combined an experiment with Devy and semi-structured interviews.

4.1 Participants

We recruited 21 professional software developers (2 female, 19 male) from 6 local software companies of varying size.
Participants had an average of 11 (±8) years of professional development experience and an average of 15 (±11) years[1] of programming experience. Participants were classified as either junior developers (6) or senior developers (15) based on job title. All participants had experience using version control systems, and all but one had experience with Git.

Participants were recruited through personal contacts and recruiting emails. To pique interest and foster participation, we included a short video[2] introducing Devy and demonstrating it with the use case of determining the owner of an open file. In addition, we incentivized participation with a lottery for two Amazon Echo Dots among all participants. To participate in our study, subjects had to be software developers and be familiar with some version control system.

4.2 Procedure

The study consisted of three parts: (1) a brief semi-structured interview asking about developers’ tasks and workflows as well as the possible value of a conversational assistant in supporting them, (2) an experiment with Devy comprising two study tasks, and (3) a follow-up semi-structured interview on the experience and use of a CDA. We piloted our study and adapted the procedure based on insights from our pilots. We chose this three-step procedure to stimulate developers to think about a broad range of workflows and how a CDA might or might not help, and to avoid priming participants too much and too early with the functionality that our current Devy prototype provided. The order and sample questions of the parts of our study are illustrated in Table 4.1. The study was conducted in quiet meeting rooms at each participant’s industrial site.

Interview (Part One).
To have participants reflect upon their work and workflows, we started our first semi-structured interview by asking general questions about participants’ workdays and then, more specifically, about the tasks they work on as well as the specific steps they perform for these tasks (see Table 4.1 for sample questions). We then introduced Amazon’s Alexa to participants. To make participants more comfortable interacting with Alexa, we had them ask Alexa to tell them a joke. Next, we asked participants about the possible tasks and workflows that a conversational assistant such as Alexa could help them with in their workplaces.

[1] Missing data for two participants.
[2] http://soapbox.wistia.com/videos/HBzPb4ulqlQI

Experiment.
To give participants a more concrete idea of a CDA and investigate how well it can support basic workflows, we conducted an experiment with Devy on two small tasks. For this experiment, we provided participants with a laptop that we configured for the study. We connected the Amazon Echo Dot to the laptop for power, and we connected both the laptop and the Echo Dot to the participant’s corporate wireless guest network.

The tasks were designed to be familiar to participants and involved version control, testing, and working with issues. The objective of the first task (the interaction task) was to examine how well developers interact with Devy to complete a task that was described at a high level (intent level) in natural language. This first task focused on finding out who the owner of a specific file is and making a change to the file available in the repository on GitHub. The task description (see also Table 4.1) was presented to the participants in a text editor on the laptop.
For the task, we set up a repository on GitHub with branches and the file to be changed.

The objective of the second task (the demonstration task) was to have participants use Devy for a second case and to demonstrate its potential for mapping higher-level intents to lower-level actions, e.g., going from telling Devy that one “is done” to Devy committing and pushing the changes, automatically running the relevant tests, and opening the pull request in a web browser tab (see Table 4.1 for the task description).

Interview (Part Two).
After the experiment, we continued our interview. Interacting with Devy and seeing its potential might have stimulated participants’ thinking, so we asked them further about which scenarios an assistant such as Devy would be well suited for and why. We also asked them about their experience with Devy during the two experimental tasks (see Table 4.1 for the questions). Finally, we concluded the study by asking participants demographic questions and thanking them for their participation.

4.3 Data Collection and Analysis

The study, including the interviews and experiment, lasted an average of 36 (±4) minutes. We audio recorded and transcribed the interviews and the experiment, and we took written notes during the study.

To analyze the data, we used Grounded Theory methods, in particular open coding to identify codes and emerging themes in the transcripts [14]. For the open coding, two authors coded five randomly selected transcripts independently and then discussed and merged the identified codes and themes. In a second step, we validated the codes and themes by independently coding two additional randomly selected transcripts. For the coding of all transcripts, we used the RQDA [6] R package.
In this thesis, we present some of the most prominent themes, notably those that illustrate the most common use cases, the benefits, and the shortcomings of CDAs.

From the experimental task T1, we derived a count based on the number of times a participant had to speak to Devy before they were able to complete a subtask. We adjusted this count by removing 55 attempts (out of 175) that failed due to technical issues (i.e., connectivity problems or unexpected application failures of Alexa), due to a participant speaking too quietly, or due to participants trying to invoke Devy without using the required utterance “Alexa, ask/tell Devy...”.

Table 4.1: Order, sample questions, and tasks from our mixed-methods study.

Interview - Part One
  1.1  Walk me through typical development tasks you work on every day.
  1.2  How do you choose a work item; what are the steps to complete it?
  1.3  How do you debug a failed test?
  2    To help you get familiar with Alexa, ask Alexa to tell us a joke.
  3    Can you think of any tasks that you would like to have “magically” completed by either talking to Alexa or by typing into a natural language command prompt?

Experiment - Interaction Task (T1)
  Complete the following tasks:
  Launch Devy by saying “Alexa, launch Devy” [..]
  T1.1 Using Devy, try to get the name of the person whom you might contact to get help with making changes to this ‘readme’ file.
  T1.2 Next, make sure you are on branch ‘iss2’ and then make a change to this ‘readme’ file (and save those changes).
  T1.3 Finally, make those changes available on GitHub.

Experiment - Demonstration Task (T2)
  Complete the following tasks:
  T2.1 Say “Alexa, tell Devy to list my issues.” to list the first open issue on GitHub. List the second issue by saying “Next”, then stop by saying “Stop”.
       Notice that the listed issues are for the correct repository.
  T2.2 Say “Alexa, tell Devy I want to work on issue 2.” to have Devy prepare your workspace for you by checking out a new branch.
  T2.3 Resolve the issue: comment out the debug console.log on line 8 of log.ts by prepending it with //. Save the file.
  T2.4 Say “Alexa, tell Devy I’m done.” to commit your work and open a pull request. Devy will ask if you want to add the new file; say “Yes”. Next, Devy recommends up to 3 reviewers. You choose any you like. When completed, Devy will say it created the pull request and open a browser tab showing the pull request. Notice the reviewers you specified have been added. Also, notice that tests covering the changes were automatically run and the test results were included in a comment made by Devy.

Interview - Part Two
  1    Imagine that Devy could help you with anything you would want, what do you think it could help you with and where would it provide most benefit?
  2    Are there any other tasks / goals / workflows that you think Devy could help with, maybe not just restricted to your development tasks, but other tools you or your team or your colleagues use?
  3    When you think about the interaction you just had with Devy, what did you like and what do you think could be improved?
  4    Did Devy do what you expected during your interaction? What would you change?
  5    Do you think that Devy adds value? Why or why not?

Chapter 5
Results

In this chapter we present the results for our two research questions, based on the rich data gathered from the experimental study tasks and the interviews. First, we report on the participants’ interaction and experience with our CDA Devy. Then we report on the workflows and tasks that a CDA might support, as well as its benefits and challenges.

5.1 Completing Development Tasks with Devy

Overall, all participants were able to complete all subtasks of T1 and T2 successfully with Devy.
Many participants expressed that Devy was “neat” (P17) and “cool” (P18), and some also stated that Devy did more than they expected. For instance, P9 explicitly stated “[Devy] exceeded my expectations”, while P8 “[was] surprised at how much it did [..] it actually did more than [..] expected”.

For the first experimental task T1, we examined whether participants were able to interact with Devy and complete specific subtasks that were specified at the intent level rather than at the level of specific, executable (Git) commands. We guided participants through the subtasks, confirming successful attempts and prompting them to retry failed attempts. Successful attempts, shown in Table B.2, were intuitive to participants since Devy’s responses directly answered or confirmed the requested action. If a subtask failed for technical reasons (e.g., a lost WiFi connection), Alexa responded “The requested skill took too long to respond”; if the user forgot to invoke Devy, Alexa responded “Sorry, I’m not sure”. Due to limitations with Alexa, Devy would sometimes misinterpret a user’s request. In these cases we asked the participant to try again, although it was usually clear from Devy’s response that the task was not successful (e.g., “You are on branch iss2” in response to “Alexa, ask Devy who owns this file”).

[Figure 5.1 plot: y-axis "Number of Attempts (Adjusted)", 1 to 8; x-axis subtasks T1.1 (File Owner), T1.2 (Branch Name), T1.3 (Submit Changes)]
Figure 5.1: Adjusted number of attempts required to complete each task of T1 across 20 participants.

Figure 5.1 shows the number of Devy interactions (attempts) that each participant needed to complete each of the three subtasks of T1. The numbers in the figure are based on 20 participants (one participant completed T2 before T1 and was excluded due to learning effects); the values were adjusted by removing attempts that failed due to technical issues (see Section 4.3).
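The adjustment described in Section 4.3 amounts to a filter over attempt records. The Python sketch below is illustrative only: the record fields, the failure-reason labels, and the sample data are hypothetical and are not the study data. Only the totals come from the thesis: removing 55 of the 175 recorded attempts leaves 175 - 55 = 120 adjusted attempts. The sketch also assumes that misinterpreted requests, which participants were asked to retry, still count as attempts.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

# Failure reasons excluded from the adjusted count (labels are hypothetical).
EXCLUDED_REASONS = {"connectivity", "alexa_failure", "too_quiet", "missing_invocation"}


@dataclass
class Attempt:
    participant: str
    subtask: str                   # e.g. "T1.1"
    failure: Optional[str] = None  # None marks the successful attempt


def adjusted_attempts(attempts: List[Attempt]) -> List[Attempt]:
    """Drop attempts that failed for technical or invocation reasons."""
    return [a for a in attempts if a.failure not in EXCLUDED_REASONS]


def mean_attempts_per_subtask(attempts: List[Attempt]) -> Dict[str, float]:
    """Average number of adjusted attempts per participant, for each subtask."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for a in adjusted_attempts(attempts):
        counts[a.subtask][a.participant] += 1
    return {s: float(mean(list(per_p.values()))) for s, per_p in counts.items()}


# Hypothetical records, not taken from the study.
sample = [
    Attempt("P1", "T1.1", failure="misinterpreted"),  # kept: a genuine retry
    Attempt("P1", "T1.1"),
    Attempt("P1", "T1.2", failure="connectivity"),    # removed by the adjustment
    Attempt("P1", "T1.2"),
    Attempt("P2", "T1.1"),
    Attempt("P2", "T1.2"),
]
print(mean_attempts_per_subtask(sample))  # {'T1.1': 1.5, 'T1.2': 1.0}
```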
Across all three subtasks, participants needed very few attempts, with an average of two attempts for T1.1 and T1.3 and a single attempt for T1.2.

Subtask T1.1 required getting the name of the person who made the most changes to an open file. This subtask had the highest variance, with one participant taking 8 attempts because he had never used Git before. Six participants required some guidance. This was largely because Devy had only been trained with three utterances, all of which focused on file ownership and none on the number of changes. In fact, seven participants used an utterance similar to “who has made changes” (P1, P2, P3, P4, P14, P17, P19) on their first attempt. This shows that either developers require training to learn the accepted utterances or, better yet, that Devy should support a broader set of utterances. One participant compared it to the specific and rigid syntax of command-oriented approaches:

“Multiple ways you can tell it to do the same thing [because] it might be advantageous where [you] might forget the exact terminology.” (P10)

In their first interactions with Devy, most participants (16 out of 20) did not take advantage of its automatic context tracking and instead included the specific file name in their utterances. This was because participants thought of Devy as a traditional tool, “making the assumption [Devy] would work just like the command line client” (P19), and they “expected [Devy] to be kind of simple and dumb” (P7). As their sessions progressed, participants started to express their intentions more naturally and more vaguely, for instance by replacing the file name with “this” or “this file”, and they appreciated the automated context tracking:

“I was just thinking it knows the context about what I’m talking about. That’s kind of cool.” (P2)

Subtask T1.2 required getting the checked-out branch and took only one attempt on average. All but two participants were able to complete the subtask without guidance.
Participants used various utterances to interact with Devy, from ones that were close to the Git commands, such as “Alexa, tell Devy to checkout branch iss2” (P15), to less command-oriented phrases, such as “Alexa, ask Devy to get onto branch iss2 [..]” (P8). The two participants who did not complete this task had accidentally skipped it. The one participant who took 4 attempts paused between starting Devy and issuing the command.

Subtask T1.3 focused on submitting changes to GitHub. Participants took an average of 2 attempts to complete it, with a lower variance than for T1.1. While 14 participants followed the Git command-line interaction closely by first committing the changes and then pushing them, the other 6 participants took advantage of some of Devy’s automation and, for example, directly said “Alexa, tell Devy to push to GitHub” (P15), which resulted in Devy committing and pushing the changes. Also, for this subtask, most participants took advantage of Devy’s context model and omitted some of the specifics from their phrases, such as what exactly should be committed or pushed.

The second experimental task T2 was designed to demonstrate Devy’s potential. Since all utterances were specified in the task description, and no participants had problems following the steps, we do not include any further analysis of this task.

Observation 1: Participants were able to use our conversational developer assistant to successfully complete three common development tasks without explicit training, with very few attempts, and by taking advantage of the automatic context tracking.

5.2 CDA Benefits & Scenarios

Participants reported a myriad of scenarios and situations in which a CDA could enhance their professional workflow.

One common theme is support for multi-step and cross-application tasks. Several development tasks require a certain “process” (P8) or sequence of steps to be completed, which oftentimes requires developers to switch between different applications.
An example referred to by several participants was the sharing of changes:

“Once I’ve committed everything locally, I’ll push it to GitHub. I’ll go to GitHub in my web browser, create a new pull request, write out a description of the change [..] and how I tried to accomplish that [..] Once the pull request is created, I’ll go to Slack and post a message on our channel and say there is a new PR” (P15)

Participants pointed out the cost and effort of these multi-step and cross-application tasks, and how a conversational assistant would reduce the necessary application and context switches, allowing developers to not “lose [..] concentration on the thing I’m looking at” (P6) and to stay more focused:

“If you could do some [of these tasks] automatically, just by talking, I’d be really happy because I usually have a ton of consoles and switching over really confuses me when you have so many screens open. Just alt-tabbing between them consistently, even if you do that like 9 out of 10 times successfully, at the end of the day you start getting sloppy and holding those trains of thought in mind would probably be simpler if you weren’t changing screens so often” (P7)

“Today, I think I had like 20 emails all related to my pull request and it was all just single comments and I have to link back [to the pull request..] and then come back [to my email client] and then delete, and then go back and [..]. So there’s a lot of back and forth there.
Those are the main things that I feel: ‘oh these are taking time and it shouldn’t...’” (P3)

A CDA is also considered particularly beneficial for the automatic mapping of higher-level tasks to commands:

“Anything that helps you stay in the flow is good, so if I can do these higher level tasks with a brief command to some place rather than break them down into a sequence of git commands plus switching to the browser plus doing yet another thing interspersed with some manual reading, it would be a win.” (P19)

Participants see this automatic aggregation of multiple steps as a “simplification” (P7):

“[If] we have a bot tell me the IP of my deployed AWS container rather than a 10 step ssh-based process to get it that would be very simple [..and] interacting with a voice assistant to get information [..] out of the development ecosystem would be useful.” (P18)

By abstracting away the specific low-level actions, the automatic mapping reduces the need to memorize commands, which reduces errors and saves time:

“There are too many command line steps that you can get wrong” (P18)

“A lot of the time you know what it is in your head, but you still gotta find it. So that’s the stuff [..] this would be really helpful for.” (P8)

Participants mentioned that this can be valuable for infrequent but recurring tasks, done “once in a while” (P11), since they “do [them] often enough to want a better interface but seldom enough that [they] can’t remember all the right buttons” (P10), or they can’t remember the “crazy flags that you’ve gotta remember every single time” (P8).

By continuously switching between applications, developers have to frequently re-establish their working context. For instance, after committing and pushing a code change using a command-line tool, developers often “have to go to the browser, open a new tab, go to GitHub and find the pull request” (P15), which can be “a pain in the arse” (P8).
In general, when switching between applications, participants need to do a lot of “admin work” (P14) just to ensure that the applications are “in sync” (P14). Therefore, a major benefit of a CDA that automatically tracks context in the background is that it reduces the need to specify context explicitly.

Participants also thought that, by automatically aggregating multiple steps and keeping track of the current context, a CDA could support information retrieval, especially when “there isn’t really a good interface” (P1) for querying, and that it could speed up the process:

“So, looking at the context of what I’m doing and then like highlighting an error text and then copying it and pasting it into Google and looking for it. And then looking down for some of the best matches. Even just ‘Devy look this up for me’.” (P2)

“Right now, you need to do two or three commands and know what all the change list numbers are [..to] look up all the information [about who last touched a file].” (P20)

Beyond just automating and aggregating tasks, participants suggested that a CDA that tracks a developer’s steps and context could help to enforce workflows and make sure that nothing is forgotten:

“There are certain flows that often go together. When I start a day, I often need to check the same things [..and it is] not easy to check off the top of my head, so I forget to do that sometimes [..] so that type of thing would be great to do all in one shot. So where am I [which branch], what is my status, do I need to rebase, and if I need to rebase, do it [..]” (P3)

If a developer does not follow the usual workflow, context tracking can come in handy and allow her to go back in history:

“sometimes I just go too far down a path. And I’ve gone through three or four of those branches in my mind and I know I need to go back but because I [..] only want to go part way back, that just becomes really difficult.
So if there was some simple way it could recognize that I’m far enough down a path [..it] would be amazing if I could then realize that I have screwed up, rewind 10 minutes, commit and then come back to where I am now.” (P21)

Several participants noted that the additional communication channel offered by Devy could be used to parallelize tasks that would otherwise require a switch in context and focus. The archetypal case was setting reminders:

“Yeah, I think of them like if I’m coding and I have an idea of another approach I could take but I want to finish the current one I’m doing, I’ll throw myself in a little Slack reminder and then I get a notification in the time I specified.” (P21)

However, this idea is more general and can be particularly useful in cases where the secondary task may take some time to complete:

“Where it’d be useful is where I’m in the middle of one task and I want another being done. If I’m working on one part of it, either debugging or editing some code and I want something to go on in the background... Like we’ve got a build system so maybe I want to start the build for another tool that I will be using soon and I don’t want to switch over to start that.” (P11)

“If I have a pull request and I’m waiting for it to be approved and I have nothing else to do in the meantime, I’m going to make lunch. I could just be cooking and I could just be like: ‘has it been approved yet?’ and if it has then merge it before someone else gets their stuff in there. Oh, that would be great.” (P3)

Seven participants explicitly mentioned that a voice-activated assistant provides an alternative to typing that allows tasks to be performed when the developer’s hands are “busy” (P11) or “injured” (P16, P20), and that “as intuitive as typing is [..], talking is always going to be more intuitive” (P12).
Similarly, it provides an alternative to interacting with GUIs that “waste a lot of time just by moving the mouse and looking through menus” (P7), as well as a way to navigate code with context, for example by asking “where’s this called from” (P10) or “what classes are relevant to this concept” (P13).

Observation 2: There are a large number of development tasks in participants’ workflows that are currently inefficient to perform due to their multi-step and cross-application nature. A conversational developer assistant might be able to support these scenarios by reducing application switches and the need for context specification and memorization, and by supporting the parallelization of tasks.

5.3 CDA Challenges

Participants also raised several concerns about their interaction with Devy and about conversational developer assistants more generally. The predominant concern, mentioned by several participants, was the disruptiveness of the voice interface in open office environments:

“I work in a shared workspace so there would have to be a way for us to have these dialogs that are minimally disruptive to other people.” (P19)

“I imagine a room full of people talking to their computers would be a little chaotic.” (P2)

Further concerns about voice interaction were its “accuracy” (P11) and the slowness of verbal interaction:

“I didn’t really enjoy the verbal interaction because it takes longer.” (P2)

“It feels weird to interrupt [Devy]. That’s probably more of a social thing [..]
it’s a voice talking and you don’t want to interrupt it and then you have to end up waiting” (P15)

While Devy was able to automate several steps, participants were concerned about the lack of transparency and noted that it is important to know which low-level actions Devy is executing:

“The downside is I have to know exactly what it’s doing behind the scenes which is why I like the command line because it only does exactly what I tell it to do.” (P8)

This concern could be mitigated by providing more feedback, possibly through channels other than audio:

“I think for me, when [Devy] is changing branches or something, I’d probably want to see that that has happened in there. Just some indication visually that something has happened. I mean it told me so I’d probably get used to that too.” (P6)

However, there is some disagreement on exactly how much feedback is wanted:

“I liked that there was definitely clear feedback that something is happening, even for things that take a bit of time like git pushes.” (P1)

For a conversational developer assistant, completeness (the number of intents that the CDA is able to understand) is important. Participant P14 made the case that “the breadth of commands needs to be big enough to make it worthwhile.”

Completeness is also related to the challenge of understanding the intent behind all possible utterances a developer could use:

“It’s frustrating to talk to something that doesn’t understand you. Regardless of how much more time it takes then another method, it would still be more frustrating to argue with a thing that fundamentally doesn’t feel like it understands me.” (P12)

Finally, since developers use a diverse set of tools in a variety of different ways and “everyone’s got a little bit of a different workflow” (P2), CDAs need to support customization. For this, one could either “create macros” (P2) or have some other means of adapting to each developer’s particular workflow so that Devy “could learn how [people are] using it” (P9).
This aspect is related to completeness but emphasizes altering existing functionality to suit the individual or team.

Observation 3: Participants raised several concerns about conversational developer assistants related to the disruptiveness of voice interactions, the need for transparency, completeness, and customization.

5.4 Summary

Industrial developers were able to successfully perform basic software development tasks with our conversational developer assistant, providing positive evidence for RQ1. In terms of RQ2, CDAs appear to be most useful for simplifying complex workflows that involve multiple applications and multiple steps, because such workflows otherwise force unnecessary context switches that interfere with developer concentration.

Chapter 6
Related Work

We build upon a diverse set of related work in this thesis. To support developers in their tasks, researchers have long tracked development context in order to provide more effective analyses and to surface relevant information. The emerging use of bots for software engineering also shows promise for automating some tasks and improving developers’ effectiveness and efficiency. Finally, natural language interfaces show increasing promise for reducing complexity and performing specific development tasks.

6.1 Development Context

Our model of task context is fundamental to enabling Devy to provide a natural interface to complex workflows. Previous work has looked at different kinds of context models. Kersten and Murphy provide a rich mechanism for collecting and filtering task context data, including details about program elements being examined, as developers switch between different tasks [7]. Our context model is more restrictive in that we mainly track the current task context: past contexts are not stored. At the same time, our context model also includes details about non-code aspects relevant to the developer, their project, and their teammates.
Other systems have looked at providing richer contextual information to help developers understand their systems. For example, TeamTracks uses the navigation information generated by monitoring how members of a team navigate through code resources to build a common model of related elements [4]. MasterScope provides additional context about code elements as they are selected in an IDE [15]. The similarity between task contexts can also be used to help identify related tasks [8]. Each of these systems demonstrates the utility that context models can confer on development tasks. Our work extends these prior efforts by providing a context model appropriate for conversational developer assistants.

6.2 Bots for SE

In their Visions paper, Storey and Zagalsky propose that bots act as “conduits between users and services, typically through a conversational UI” [13]. Devy clearly sits within this context: the natural language interface provides a means for developers to ‘converse’ with their development environment, while the provided workflows offer an effective means for integrating multiple different products within a common interaction mechanism. Building on their bot metaphor, Devy is able to interpret the conversation to perform much more complex sequences of actions based on relatively little input, only checking with the developer if specific clarification is required. As Storey and Zagalsky point out, there is a clear relationship between bots and advanced scripts. We firmly believe that the conversational interface, combined with the context model, moves beyond mere scripting to enable new kinds of interactions and workflows that could not be directly programmed. One study participant also pointed out that “it’s nice to have the conversation when there are things that are not specified or you forgot to do; that’s when you want to get into a dialog.
And when you’re in the zone, then you can just tell it what to do” (P19), showing a further benefit of the conversational UI beyond scripting itself.

Acharya et al. also discuss the concept of Code Drones, in which every program artefact has its own agent that acts semi-autonomously to monitor and improve its client artefact [1]. One key aspect of these drones is that they can be proactive instead of reactive. While Devy is not proactive, in that it requires a developer to start a conversation, it can proactively perform many actions in the background once a conversation has been started, if it determines that this is appropriate for the given workflow. Devy also takes a different direction than Code Drones: rather than attaching drones to code artefacts, Devy primarily exists to improve the developer’s tasks and experience directly, not the code itself.

6.3 Natural Language Tools for SE

A number of tools have been built to provide natural language interfaces specifically for software engineering tasks.

The notion of programming with natural language is not new (it was first described by Sammet in 1966 [12]). Begel further described the diverse ways in which spoken language can be used in software development [2]. More recently, Wachtel et al. have investigated using natural language input to relieve the developer of repetitive aspects of coding [16]. Their system provides a mechanism for users to specify algorithms for spreadsheet programs using natural language. In contrast, Devy does not try to act as a voice front-end for programming: it works more at a workflow level, integrating different services.

Others, though, have looked at natural language interfaces as a means for simplifying the complex tools used for software development. One early relevant example of this, by Manaris et al., investigated using natural language interfaces to improve the ability of novice users to access UNIX tools in a more natural way [9].
NLP2Code provides a natural language interface to a specific task: finding a relevant code snippet for a task [3]. NLP2Code takes a similar approach to Devy in that it supports a specific development task, but unlike Devy it does not use a rich context model, nor does it involve more complex development workflows.

Chapter 7
Discussion

In this chapter we discuss threats to the validity of our study and future work suggested by our study participants.

7.1 Threats to Validity

The goal of our study was to gain insight into the utility of conversational developer assistants in the software engineering domain. As with any empirical study, our results are subject to threats to validity.

Internal validity. We elicited details about participants' workflows before they interacted with our prototype, to mitigate bias, and again after they used Devy, to capture more detail. Despite this, participants may not have described all the ways Devy could impact their workflows; given more time and a wider variety of sample tasks, they may have described more scenarios. While we followed standard open coding processes, other coders may discern alternative codes from our interview transcripts.

External validity. Though our 21 interviews with industrial developers yielded many insights, this was a limited sample drawn from our local metropolitan region. While participants had differing years of experience and held various roles at six different organizations, each with a different set of workflows, our findings may not generalize to the wider developer community.

7.2 Future Work

The feedback we received from industrial developers was broadly positive for our prototype conversational developer assistant.
Our participants also offered many suggestions for extending and improving Devy to make it more effective in the future.

The most criticized aspect of Devy was the voice interface; all participants worked in open-plan offices and believed that conversing with Devy would annoy their co-workers. Thus, one piece of future work is to implement alternative conversational layers for Devy, specifically a text-based, chatbot-like interface, and to determine what feedback to give and how it should be presented to users.

Currently, Devy can be extended through the intent service by wiring up new states in the FSM. This requires at least as much work as creating scripts, but it enables better integration with existing states than simple scripting does. Based on participant feedback, supporting a more parameterized view of how the states are connected to form custom workflows seems like a reasonable tradeoff between complete scripting and a fully autonomous agent. Participants were also forthcoming with suggestions for a diverse set of future workflows that could define the out-of-the-box workflows for version control, debugging, testing, collaboration, task management, and information retrieval. To support these more diverse workflows, more sophisticated methods of collecting, maintaining, and exposing the required contextual information need to be examined. One possible approach would be to develop a framework or platform in which tools share information through a common query language, pushing the burden of supplying required information onto the tool provider.
This could be combined with a library of atoms for accessing and processing the information and for executing commands. Workflows would then simply be compositions of these atoms.

A larger step beyond this would be for the CDA to support generic workflows out-of-the-box that self-adapt, based on each user's own usage patterns, to better enable user-specific custom workflows without user intervention.

Several participants also wished for tighter source code integration. The intent of this integration was to ask query-based questions about the specific code elements they were looking at without interrupting their current task. For example:

"the thing people want the most...are abstract syntax trees. I think it is something that would offer a lot of power if you also had assistive technology layered on top." (P8)

Using lightweight notes and reminders, CDAs might enable semantic undos that could be further maintained using the context model to roll back changes to meaningful prior states.

Enabling CDAs to take action proactively, in terms of awareness or in response to external events, was widely requested:

"influence the output of what I'm working on...by [notifying] me about getting off my regular pattern, that would be the most valuable." (P8)

This could also help by preventing mistakes before they happen:

"If I tell it to commit and [there are an unusual number of changes], it should confirm." (P15)

Next, extending support to the tools commonly used by industrial teams will enable Devy to be deployed in a wider variety of practical contexts. Participants were also enthusiastic about the potential for enhanced cross-application workflows that otherwise cause them to context switch or 'copy-and-paste' between independent systems.
We will further investigate extending support for these kinds of tasks that force developers to context switch.

Finally, we built our prototype using the Alexa service and our intent service to handle the natural language discourse and map it to workflow actions. To support further workflows and ease the natural language discourse with developers, we will examine whether and how to extend the underlying discourse representation structure.

Chapter 8
Conclusion

In this thesis, we have explored the potential of conversational agents to support developer workflows. In particular, we have described Devy, a conversational developer assistant that enables developers to invoke complex workflows with only minimal interaction using a natural language conversational interface. Through its context-aware model, Devy supports rich workflows that can span multiple independent tools. This lets developers offload low-level actions and focus on their high-level tasks.

Using our Devy prototype as a technology probe, we evaluated our approach in a mixed-methods study with 21 industrial software engineers. These engineers were able to use Devy successfully. They appreciated that they did not need to specify and memorize multi-step workflows and that Devy reduced context switches. They additionally identified a concrete set of challenges and future directions that will improve the utility of future CDAs.

The Devy prototype demonstrates that developers can successfully launch complex workflows without interrupting their current tasks. We believe that future conversational developer assistants will be able to improve developers' productivity and/or effectiveness by letting them offload meaningful portions of their workflows to such automated agents and focus on their core development tasks.

Bibliography

[1] M. P. Acharya, C. Parnin, N. A. Kraft, A. Dagnino, and X. Qu. Code Drones. In Proceedings of the International Conference on Software Engineering (ICSE), pages 785–788, 2016.
[2] A. Begel. Spoken Language Support for Software Development. PhD thesis, EECS Department, University of California, Berkeley, 2006.
[3] B. A. Campbell and C. Treude. NLP2Code: Code snippet content assist via natural language tasks. In Proceedings of the International Conference on Software Maintenance and Evolution (ICSME), pages 628–632, 2017.
[4] R. DeLine, M. Czerwinski, and G. Robertson. Easing program comprehension by sharing navigation data. In Proceedings of the Visual Languages and Human-Centric Computing (VLHCC), pages 241–248, 2005.
[5] V. M. González and G. Mark. Constant, constant, multi-tasking craziness: Managing multiple working spheres. In Proceedings of the Conference on Human Factors in Computing Systems (CHI), pages 113–120, 2004.
[6] R. Huang. RQDA: R-based Qualitative Data Analysis, 2017.
[7] M. Kersten and G. C. Murphy. Using task context to improve programmer productivity. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE), pages 1–11, 2006.
[8] W. Maalej, M. Ellmann, and R. Robbes. Using contexts similarity to predict relationships between tasks. Journal of Systems and Software (JSS), 128:267–284, 2017.
[9] B. Z. Manaris, J. W. Pritchard, and W. D. Dominick. Developing a natural language interface for the UNIX operating system. Proceedings of the Conference on Human Factors in Computing Systems (CHI), 26(2):34–40, 1994.
[10] A. N. Meyer, T. Fritz, G. C. Murphy, and T. Zimmermann. Software developers' perceptions of productivity. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), pages 19–29. ACM, 2014.
[11] A. N. Meyer, L. E. Barton, G. C. Murphy, T. Zimmermann, and T. Fritz. The work life of developers: Activities, switches and perceived productivity. IEEE Transactions on Software Engineering (TSE), 43(12):1178–1193, 2017.
[12] J. E. Sammet. Survey of formula manipulation. Communications of the ACM (CACM), 9(8):555–569, 1966.
[13] M.-A. Storey and A. Zagalsky. Disrupting developer productivity one bot at a time. In Proceedings of the International Symposium on Foundations of Software Engineering (FSE), pages 928–931, 2016.
[14] A. Strauss and J. M. Corbin. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory. SAGE Publications, 1998. ISBN 9780803959408.
[15] W. Teitelman and L. Masinter. The Interlisp programming environment. Computer, 14(4):25–33, 1981.
[16] A. Wachtel, J. Klamroth, and W. F. Tichy. Natural language user interface for software engineering tasks. In Proceedings of the International Conference on Advances in Computer-Human Interactions (ACHI), volume 10, pages 34–39, 2017.

Appendix A
Implementation Notes

A.1 Handling Requests

To give a more detailed view of Devy's architecture than that of Chapter 3, we start by walking through how the system handles a user request (see Figure A.1).

When the developer says "Alexa, ask Devy..." to an Amazon Alexa device, the device records the speech and sends it to the Devy skill (step 1 in Figure A.1), hosted by Amazon in the cloud. The skill processes the audio and attempts to match it to one of several predefined intents. Each intent is described by example utterances that we created when we developed the skill. These intents represent the actions the skill can perform. During processing, any variables in the matched utterance are resolved from the speech. A structured representation of the speech is then generated that includes the name of the matched intent and the values for the variables.
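To illustrate that structured representation, the sketch below models a simplified subset of the JSON body that Alexa sends for an intent request. It uses only `request.type`, `request.intent.name` and `request.intent.slots`, where each slot has a `name` and an optional `value`. The sketch flattens this body into a shape that a local intent service might work with. The `DevyIntent` type and both helper functions are hypothetical names chosen for illustration. They are not part of Devy or of any Amazon SDK. The startIssue intent and IssueNumber slot are the ones shown in Figure A.3.

```typescript
// Simplified subset of the JSON body Alexa sends for an intent request.
// Real requests carry many more fields (session, context, requestId, ...).
interface AlexaSlot {
  name: string;
  value?: string; // absent when Alexa could not resolve the slot from speech
}

interface AlexaRequestBody {
  request: {
    type: string; // "IntentRequest" when an intent was matched
    intent?: {
      name: string;
      slots?: { [slotName: string]: AlexaSlot };
    };
  };
}

// Hypothetical flattened form handed to the local intent service.
interface DevyIntent {
  name: string;
  slots: { [slotName: string]: string | undefined };
}

// Extracts the intent name and slot values; returns null for non-intent requests.
function toDevyIntent(body: AlexaRequestBody): DevyIntent | null {
  const intent = body.request.intent;
  if (body.request.type !== "IntentRequest" || intent === undefined) {
    return null; // e.g. a LaunchRequest carries no intent
  }
  const slots: { [slotName: string]: string | undefined } = {};
  const raw = intent.slots || {};
  for (const key of Object.keys(raw)) {
    slots[key] = raw[key].value; // stays undefined if Alexa left the slot unset
  }
  return { name: intent.name, slots };
}

// Lists required slots that are still unset, e.g. to decide whether to re-prompt.
function missingSlots(intent: DevyIntent, required: string[]): string[] {
  return required.filter((slot) => intent.slots[slot] === undefined);
}
```

An unresolved slot simply arrives without a `value`. The service can detect this and re-prompt, or Alexa can be configured to prompt for the missing value.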
If the Alexa service is unable to determine the value to assign to a required variable, it can automatically prompt the user to specify the value explicitly. Once the structured representation is generated, Alexa sends it to the configured endpoint as an HTTPS request. Since developer machines are rarely configured with an internet-accessible hostname, we created a simple proxy server and configured it to listen for requests from the Devy skill (2a). The proxy forwards requests to connected developer computers (clients) over a WebSocket connection (2b).

Figure A.1: Detailed view of a request being processed by our system.

Once the request is received on the developer's local computer, it is further processed (3) using a finite state machine (FSM). The request transitions the FSM from the ready state to a state corresponding to the request's intent. The new state contains the executable code that actually performs the intent. Once that code completes, the state transitions the state machine back to the ready state. For example, if the user says "Alexa, ask Devy who owns this file." then the state machine transitions from ready to gettingOwner and back to ready. The gettingOwner state queries the context model to get the path of "this file". It then calls git blame on that path to see which git user changed the most lines. Next, it gets the developer's actual name from the GitHub API using the git username found previously. Finally, it generates a response object with the phrase to be spoken to the user, e.g., "{developer name} owns the file {file name} having made {n} changes in the past 30 days." The response object is sent to the proxy, which responds to the original request (4a,b). Upon receipt, the Alexa device speaks the supplied phrase, ending the interaction (5).

The user request in the preceding example only required transitioning to a single state; however, many states have preconditions.
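One way to picture such preconditions is a stack that travels with each reply and comes back with the user's next answer. The TypeScript sketch below replays the push walk-through that follows, using an in-memory stand-in for the repository. Several parts are assumptions made for illustration rather than Devy's actual Voxa states: the `Repo`, `Reply` and `PRECONDITION` names, the helper functions, and the final push step. No git commands are run.

```typescript
// Hypothetical sketch: states with preconditions tracked on a stack that is
// sent out with each reply and returned with the user's next answer.
type StateName = "pulling" | "committing";
type Answer = "yes" | "no";

interface Repo {
  uncommitted: number; // stand-in for local git status: modified files
  actions: string[];   // what would have been executed, in order
}

interface Reply {
  speech: string;     // phrase for Alexa to speak
  stack: StateName[]; // pending states, echoed back with the next request
  done: boolean;      // true once the stack is empty
}

// Assumed wiring: pulling needs committing to run first.
const PRECONDITION: { [S in StateName]?: StateName } = { pulling: "committing" };

// First request ("push my changes"): ready -> pulling -> committing,
// pushing each state before moving on to its precondition.
function startPush(repo: Repo): Reply {
  const stack: StateName[] = [];
  let state: StateName | undefined = "pulling";
  while (state !== undefined) {
    stack.push(state);
    state = PRECONDITION[state];
  }
  return drainStack(repo, stack);
}

// Follow-up request: the user's answer plus the stack from the last reply.
function resume(repo: Repo, stack: StateName[], answer: Answer): Reply {
  return drainStack(repo, stack.slice(), answer);
}

// Pops and runs states until one needs user input or the stack is empty.
function drainStack(repo: Repo, stack: StateName[], answer?: Answer): Reply {
  while (stack.length > 0) {
    const state = stack.pop()!;
    if (state === "committing") {
      if (answer === undefined && repo.uncommitted > 0) {
        // Needs user input: leave the state on the stack and ask.
        stack.push(state);
        return { speech: "You have uncommitted changes, should I commit them?", stack, done: false };
      }
      if (answer === "yes") {
        repo.actions.push("commit");
        repo.uncommitted = 0;
      }
      answer = undefined; // the answer only applies to this state
    } else {
      // pulling: its precondition already ran, so skip the check and pull.
      repo.actions.push("pull");
    }
  }
  // Assumption: the requested push runs once every precondition has completed.
  repo.actions.push("push");
  return { speech: "Your changes have been pushed.", stack, done: true };
}
```

Because the stack is carried in each reply and in the following request, the client keeps no per-conversation state between the two requests.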
For example, if a user says "Alexa, tell Devy to push my changes." then we should check whether there are any uncommitted local changes and, if so, ask the developer whether they would like the changes to be committed first. In this case the state machine transitions from ready to pulling and pushes the pulling state onto a stack. The pulling state then transitions the state machine to committing, which is also pushed onto the stack. The committing state checks for uncommitted changes. If there are any, it generates a response object (including the states in the stack) and sends it to Alexa to get feedback from the user, e.g., by asking "You have uncommitted changes, should I commit them?" The user answers "yes" or "no", and a new request (still containing the stack) is sent to the client, which then pops the state off the stack and executes it. For the second request, the committing state uses the user's answer to decide whether or not to commit the changes. It then pops the pulling state off the stack (emptying it) and executes it. This time, the pulling state bypasses the check for uncommitted changes and simply pulls (and merges) any remote changes.

Note that the state transitions are designed so that the user cannot cancel an operation that is partially completed, i.e., the stack will always be emptied before the interaction terminates. In the previous example, this means that remote changes will always be pulled after local changes are committed, so the state of the repository always matches the user's expectations.

A.2 Developing Alexa Skills

Alexa is a voice-operated virtual assistant offered by Amazon that uses natural language processing to understand user intents (the actions the user would like Alexa to perform). While Alexa can handle many intents out-of-the-box, it can be extended by writing skills, which are invoked by name. For example, our skill Devy is invoked by saying "Alexa, tell Devy...".
The remaining speech is then used by Devy to determine the intent.

Skills are developed in a web-based integrated development environment hosted by Amazon, shown in Figure A.2.

Alexa recognizes intents by matching users' speech with textual example phrases supplied during development. Figure A.3 shows the utterances used to train the startIssue intent. Notice that the utterances are largely just permutations of each other and simply represent the variation in how the user may request the intent. Alexa is also able to handle small variations (e.g., adding a "the") automatically.

Utterances may also contain slots (i.e., variables) whose values are automatically recognized and assigned from users' speech. The example utterances in Figure A.3 contain a single slot, IssueNumber, that expects a numeric value. If Alexa is unable to determine the value for the variable, it remains unset; Alexa can also be configured to specifically prompt for the missing value. In our experience, Alexa is typically able to recognize the correct value if the possible values are constrained (Amazon provides variable types that only accept values from pre-populated lists, e.g., the US states). However, Alexa is not able to capture free text, e.g., a commit message.

Dialog management is handled by sending state information in the request and response objects passed between the skill and Alexa, so the Alexa service is effectively stateless.

We encountered several limitations with the Alexa service when developing our Devy skill. In general, skills:

1. must receive a response from the client within ten seconds,
2. are not able to initiate interactions with users,
3. cannot accurately capture free text, and
4. only allow training utterances to be specified during development.

The first two limitations make it difficult to handle long-running commands like executing a test suite.
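The sketch below shows one way a local client could work within the first two limitations. It assumes a hypothetical in-memory job table; `startJob`, `statusSpeech` and the `Job` type are illustrative names, not part of Devy, Voxa, or the Alexa Skills Kit. The long-running task is started but not awaited, so a reply can be sent well within the ten-second window. A later status request then reads whatever outcome has been recorded.

```typescript
// Hypothetical sketch: acknowledge a long-running command immediately and
// let the user ask for its outcome in a later request.
type JobState = "running" | "succeeded" | "failed";

interface Job {
  id: number;
  name: string;
  state: JobState;
  error?: string;
}

const jobs = new Map<number, Job>();
let nextId = 1;

// Starts the task without awaiting it, so the reply can go back at once.
function startJob(name: string, task: () => Promise<void>): { job: Job; speech: string } {
  const job: Job = { id: nextId++, name, state: "running" };
  jobs.set(job.id, job);
  task().then(
    () => {
      job.state = "succeeded";
    },
    (err: unknown) => {
      job.state = "failed";
      job.error = String(err);
    },
  );
  return { job, speech: `Started the ${name}. Ask me for its status later.` };
}

// Answers a later "what's the status" request from the recorded outcome.
function statusSpeech(id: number): string {
  const job = jobs.get(id);
  if (job === undefined) return "I don't know that command.";
  if (job.state === "running") return `The ${job.name} is still running.`;
  if (job.state === "succeeded") return `The ${job.name} finished successfully.`;
  return `The ${job.name} failed: ${job.error || "unknown error"}.`;
}
```

Because skills cannot start a conversation, a failure recorded in the job table is only reported when the user asks for the status.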
In practice, this means that the skill is only able to notify the user that the command has started. It cannot notify the user of problems encountered while running the command. Instead, it is up to the user to request the status of the command (or simply observe the execution on-screen). Amazon has partly addressed this shortcoming by providing a Notifications API, but it only informs the user that they have pending notifications; they must still retrieve the notifications manually.

The third limitation makes it difficult to support intents requiring unconstrained text. In the case of commit messages, we simply use a static message, "Committed by Devy". Tools exist to summarize committed changes, however, and could be incorporated in future iterations. More generally, it would be possible to provide an on-screen text box where users could type the necessary text.

Finally, the requirement to specify all the training utterances at development time may constrain customization, since users cannot add to or alter these training phrases. The training utterances are also global to all users. The myriad utterances needed to support a potentially large number of developer tasks would be difficult to manage, and Alexa could have difficulty distinguishing between them. A recently released alternative to traditional skills, called Skill Blueprints, may address these limitations by allowing users to customize their own instances of the Devy skill, but this would require more work on the part of the user.

A.3 Processing User Intents

The Devy skill runs in the cloud and converts users' speech into a JSON-structured intent object. To actually perform the intended actions, these objects are sent to a service running on the user's local computer, where they are processed.

The service is written in TypeScript for Node.js and uses four main libraries: ws¹, Voxa², NodeGit³, and @octokit/rest⁴ (previously named github).
ws is a WebSocket client and server that is used to communicate with the proxy server (which in turn communicates with the hosted Devy skill). To simplify the code needed to handle requests from the Devy skill, we used the Voxa library. Voxa provides a way to organize a skill into a state machine using a model-view-controller pattern. We found this to be much more convenient than the officially supported Alexa Skills Kit SDK for Node.js⁵, which does not use a state machine. Upon receiving a request, Voxa automatically transitions the state machine to the associated intent state (coded in the model) and executes its callback function. The callback function contains the code to carry out the intent, which often uses the NodeGit library (to execute git functions) and/or @octokit/rest (to make calls to the GitHub API). The functions can also transition to other states to check preconditions.

After the function completes, Voxa automatically generates a response object by calling the intent's view. Views consist of template strings for each of the possible outcomes of the intent callback function. Variables in the strings are substituted automatically with values set in the intent function. The resulting text is what the Alexa device will speak in response to the user's request. Finally, Voxa generates a response object incorporating the output from the view and sends it back to the Devy skill over the ws WebSocket.

¹ https://www.npmjs.com/package/ws
² https://www.npmjs.com/package/voxa
³ https://www.npmjs.com/package/nodegit
⁴ https://www.npmjs.com/package/@octokit/rest
⁵ https://github.com/alexa/alexa-skills-kit-sdk-for-nodejs

Figure A.2: Amazon Alexa developer console showing the Devy skill. Supported intents are shown on the left in the Interaction Model section. Other skill development tasks can be completed using the links on the right under the Skill builder checklist. Testing and deployment are completed via the tabs in the header bar.

Figure A.3: Training utterances for the startIssue intent. Slots (variables) are shown in blue and surrounded by curly braces. The slot type and order are specified below the list of utterances.

Appendix B
Study Details

B.1 Recruiting Participants

After obtaining approval from UBC's Ethics Board, we recruited participants by reaching out to managers at local software companies who were able to recruit developers internally. We sent managers the following study advertisement via email:

The Software Practices Lab at UBC is conducting a study examining objective-based interactions with automated assistants. It will take approximately 30 minutes to complete and will consist of an interview looking at how automated assistants could be used by software developers followed by a short questionnaire. During the study, you will have the opportunity to try our prototype assistant and give suggestions for new features. A short demonstration is available at https://soapbox.wistia.com/videos/HBzPb4ulqlQI. Participants will be entered into a draw to win Amazon Echo Dots.

The advertisement included a short video of me asking Devy who owned an open file, to help managers raise awareness and pique interest in the study. The managers also arranged a meeting space in the office where we could interact with participants privately. Due to legal concerns, managers did not want Devy to be used on computers that had access to the company's intellectual property. Therefore, we had participants complete the study on a department Lenovo ThinkPad laptop running the Fedora Linux operating system. An Amazon Echo Dot and a mobile phone were connected to the laptop for power. Before starting the study at each location, we connected the laptop and Echo Dot to the company's public WiFi network¹ for internet access.
The phone was used to take audio recordings of each interview for later transcription.

At the start of each interview session, we walked the participant through the study procedure and consent form (Figure B.1) and answered any questions. After obtaining consent, we began the study.

¹ Initially, we tried to use cellular data via our mobile phone's WiFi hotspot capabilities, but the connection was too slow, causing the Devy skill to time out.

Figure B.1: Consent form contents

Evaluating Automated Software Engineering Assistants

Who is conducting the study?
Principal Investigator: Dr. Reid Holmes, Associate Professor, Department of Computer Science, UBC, rtholmes@cs.ubc.ca, 604-822-0409.
Co-Investigator: Nick Bradley, Graduate Student, Department of Computer Science, UBC, nbrad11@cs.ubc.ca.

Who is funding the study?
This study is funded by NSERC.

Why are we doing this study?
You are being invited to participate in a study that investigates the potential applications of an automated assistant in the software development industry. The goal of this study is to develop an effective prototype automated software assistant and to show that it can generalize to support a range of tasks. Data collected will be used to evaluate the prototype and to help inform future features.

How is the study done?
Your participation in the study will involve answering a short questionnaire covering your professional background, completing some software development tasks and discussing your experience with the investigators. The study will take approximately 30 minutes to complete.

What happens next?
The aggregate results of the study will be made available through open channels and may be published in peer reviewed journals without any individual respondent or institutional identifiers.

Is there any way the study could pose a risk for you?
There are no anticipated risks for participants.
You do not have to answer any questions you feel uncomfortable answering and you can end the study at any time with no repercussions. The study will only ask very limited questions about your professional background as a software developer and your experience using the automated assistant.

What are the benefits of participating in the study?
This study will provide you with an opportunity to use and provide feedback on our automated assistant. The feedback you provide will shape future versions of the assistant and will contribute to making it useful to professional software developers. Sweets will be provided during the study and you can choose to be entered into a draw to win one of two Amazon Echo Dots.

How will your privacy be maintained?
Your confidentiality will be respected. Original data collected in this study will be examined by the research team members only. Although limited identifying information will be collected, the research team will ensure that any instances of self-disclosure will be anonymized. All data will be encrypted and password protected, and held in a password protected file space accessible only to the research team. Reports will contain descriptive statistics and include select quotes with any identifiers removed. The reports will not contain any personally identifying data. The research team will not identify individuals in publications.

Who can you contact if you have questions about this study?
If you have any questions or concerns about what we are asking of you, please contact the co-investigator.
Contact information is listed at the top of the first page of this form.Who can you contact if you have complaints or concerns about this study?If you have any concerns or complaints about your rights as a research participant and/or your experiences while participating in this study, contact the Research Participant Complaint Line in the UBC Office of Research Ethics at 604-822-8598 or if long distance e-mail RSIL@ors.ubc.ca or call toll free 1-877-822-8598.Taking part in this study is entirely voluntary. You have the right to refuse to participate in this study. If you decide to take part, you may choose to pull out of the study at any time without giving a reason and without any negative impact on you or your employment.By completing this form, you are consenting to participate in this research.I have read the Automated Software Engineering Assistants survey consent formNameSignature DateIf you would like to be entered into a draw to win one of two Echo Dots, please fill in your email address.Email50B.2 Conducting the StudyInterviews were conducted in three phases (see also the study guide (Figure B.2)).Before starting the study, five GitHub accounts were created: devy-participant,devybot, a-aaronson, jeffmiddle, and frypj. The devy-participant account was usedby the participant. The devybot account was used to post automated feedback onpull requests. The other three were meant to represent the accounts of colleagues.They allowed us to create issues and files under different users and allowed the par-ticipant to assign someone to review pull requests. Next, two GitHub repositorieswere created under the frypj account2. The repositories were called devy-study-1and devy-study-2. Creation of the repositories was scripted (Listing B.1) to allowus to quickly and reliably reset them between trials. In particular, we deleted andre-created the repositories on GitHub and then deleted and cloned local versions ofthe repositories. Specifically, the script:1. 
deletes the existing devy-study-1 and devy-study-2 repositories from GitHub,2. re-creates the two repositories on GitHub,3. adds a-aaronson, jeffmiddle, and frypj as collaborators on the devy-study-1and devy-study-2 repositories,4. if the repositories were created for the first time, the invitation request wasprogrammically accepted by each user,5. creates two open issues on devy-study-2,6. creates base files in the repositories on GitHub: a README in devy-study-1and a logging class and corresponding test file in devy-study-2,7. deletes the local versions of the repositories from the study laptop, and8. clones the two repositories from GitHub,This programmatic approach made it very easy to make changes after collectingfeedback during the pilot.Listing B.1: Ruby script to reset repositories on GitHub between participants.# ! / usr / b in / env ruby# Reset GitHub Devy study r epo s i t o r i e s and update l o c a l copies2Typically, repositories would be set up by the company and using a user different from devy-participant better simulates this.51r equ i re ’ dotenv / load ’requ i re ’ g i t hub ap i ’r equ i re ’ f i l e u t i l s ’r equ i re ’ g i t ’OWNER = ’ devy−pa r t i c i p a n t ’REPOS = [ ’ devy−study−1 ’ , ’ devy−study−2 ’ ]COLLABORATORS = [ ’ a−aaronson ’ , ’ j e f fm i d d l e ’ , ’ f r y p j ’ ]g i thub = Github . new bas ic au th : ” #{OWNER} : # {ENV[ ’GITHUB PASSWORD ’ ] } ”# Update repos on GitHubREPOS. each do | repo |puts ” De le t ing r epos i t o r y #{OWNER} / # { repo } . ”beging i thub . repos . de le te OWNER, reporescue Github : : E r ro r : : G i thubEr ror => eputs ”WARN − Fa i led to de le te r epos i t o r y . #{e .message } ”endputs ” Creat ing remote r epos i t o r y #{OWNER} / # { repo } . ”g i thub . repos . create name : repo ,has w ik i : false ,has downloads : false ,a u t o i n i t : fa lseCOLLABORATORS. 
each do | co l l a bo r a t o r |puts ” Adding co l l a bo r a t o r ’ # { co l l a bo r a t o r } ’ t o #{OWNER} / # { repo } . ”g i thub . repos . co l l a bo r a t o r s . add OWNER, repo , c o l l a bo r a t o rend# Create Issue 1 on devy−study−2puts ” Creat ing issue on #{OWNER} / # { repo } . ”g i thub . issues . create user : OWNER, repo : repo ,t i t l e : ’HTML inpu t tag maxLength ’ ,body : ’ ’ ,assignee : OWNER,l abe l s : [ ’ enhancement ’ ]# Create Issue 2 on devy−study−2puts ” Creat ing issue on #{OWNER} / # { repo } . ”g i thub . issues . create user : OWNER, repo : repo ,t i t l e : ’ Disable debugging output ’ ,body : ’ comment out the debug ‘ console . log ‘ on l i n e 8 of log . t s . ’ ,assignee : OWNER,l abe l s : [ ’ bug ’ ]52end# Create README in devy−study−1puts ” Creat ing README.md on #{OWNER} / # {REPOS[ 0 ] } . ”g i thub . repos . contents . c reate OWNER, REPOS[ 0 ] , ’README.md ’ ,path : ’README.md ’ ,message : ’ Create readme f i l e ’ ,content : <<˜doc# Devy Study Task 1Devy i s a pro to type developer ass i s t an t t ha t you can converse wi th tocomplete development tasks . To s t a r t a conversat ion , prompt Devy bysaying ” Alexa , t e l l Devy . . . ” or ” Alexa , ask Devy . . . ” .Complete the f o l l ow i ng tasks :1 . F i r s t , s imply launch Devy by saying ” Alexa , launch Devy ” . This i sj u s t to make sure Devy i s working ( occas iona l l y there are issueswi th w i f i ) .2 . Now, using Devy , t r y to get the name of the person whom you mightcon tac t to get help w i th making changes to t h i s README.md f i l e .3 . Next , make sure you are on branch iss2 and then make a change tot h i s README.md f i l e (and save those changes ) .4 . F i na l l y , make those changes ava i l ab l e on GitHub .This task w i l l l i k e l y be f r u s t r a t i n g to complete s ince we haven ’ tg iven you the commands Devy supports . 
    However, the purpose of the
    question is not to complete the task but for us to understand what
    developers would say to get their work done. This information will
    be invaluable for creating more useful versions of Devy in the
    future. Note also that this prototype is trained to recognize a
    limited number of commands which also contributes to making this
    a hard task.
  doc

# Create README in devy-study-2
puts "Creating README.md on #{OWNER}/#{REPOS[1]}."
github.repos.contents.create OWNER, REPOS[1], 'README.md',
  path: 'README.md',
  message: 'Create readme file',
  content: <<~doc
    # Devy Study Task 2

    Complete the following tasks.

    1. Say "Alexa, tell Devy to list my issues." to list the first open
       issue on GitHub. List the second issue by saying "Next", then
       stop by saying "Stop". Notice that the listed issues are for the
       correct repository.
    2. Say "Alexa, tell Devy I want to work on issue 2." to have Devy
       prepare your workspace for you by checking out a new branch.
    3. Resolve the issue: comment out the debug `console.log` on line 8
       of log.ts by prepending it with `//`. Save the file.
    4. Say "Alexa, tell Devy I'm done." followed by "Yes" and "Yes" then
       "No" and finally "Yes". This will commit your work and open a pull
       request.

    Notice that the reviewers you specified have been added. Also, notice
    that tests covering the changes were automatically run and the results
    included as a comment by Devy.
  doc

# Create log.ts in devy-study-2
puts "Creating log.ts on #{OWNER}/#{REPOS[1]}."
github.repos.contents.create OWNER, REPOS[1], 'log.ts',
  path: 'log.ts',
  message: 'Initial version of logging',
  content: <<~doc
    /**
     * Collection of logging methods. Useful for making the output easier to
     * read and understand.
     *
     * @param msg
     */
    export default class Log {
      public static debug(msg: string) {
        console.log("<D>: " + msg);
      }
      public static trace(msg: string) {
        console.log("<T>: " + msg);
      }
      public static info(msg: string) {
        console.log("<I>: " + msg);
      }
      public static warn(msg: string) {
        console.error("<W>: " + msg);
      }
      public static error(msg: string) {
        console.error("<E>: " + msg);
      }
    }
  doc

# Create test/log.spec.ts in devy-study-2
puts "Creating test/log.spec.ts on #{OWNER}/#{REPOS[1]}."
github.repos.contents.create OWNER, REPOS[1], 'test/log.spec.ts',
  path: 'test/log.spec.ts',
  message: 'Create test suite for logging',
  content: <<~doc
    const sinon = require('sinon');
    const assert = require('assert');
    import Log from "../log.ts";

    describe("log", () => {
      const spy = sinon.spy(console, 'log');
      const msg = "log message";

      after(() => {
        spy.restore();
      })

      it("should log debug messages", () => {
        Log.debug(msg);
        assert(spy.calledWith(`<D>: ${msg}`));
      });

      it("should log trace messages", () => {
        Log.trace(msg);
        assert(spy.calledWith(`<T>: ${msg}`));
      });

      it("should log information messages", () => {
        Log.info(msg);
        assert(spy.calledWith(`<I>: ${msg}`));
      });

      it("should log warning messages", () => {
        Log.warn(msg);
        assert(spy.calledWith(`<W>: ${msg}`));
      });

      it("should log error messages", () => {
        Log.error(msg);
        assert(spy.calledWith(`<E>: ${msg}`));
      });
    });
  doc

# Update local copy of each repo
REPOS.each do |repo|
  path = "/home/#{ENV['USER']}/#{repo}"
  puts "Deleting local repository #{path}."
  begin
    FileUtils.rm_rf(path)
  rescue
    puts "WARN - Failed to delete local repo directory #{repo}."
  end
  puts "Cloning #{OWNER}/#{repo} to #{path}."
  g = Git.clone("git@github.com:#{OWNER}/#{repo}.git", path)
  if repo == 'devy-study-1'
    puts "Checking out new branch iss2."
    g.branch('iss2').checkout
  end
end

In the first phase, we asked participants questions designed to help us better understand their workflows and what thought(s) prompted, initiated, or preceded each workflow. For instance, a participant might describe their workflow for debugging a failed test, and we would then prompt them to tell us what they thought before starting the workflow (e.g., "Why did this break?"). These questions were important in helping us design a richer context model that supports a wider assortment of workflows. They also helped us better understand the statements developers would use to initiate a specific workflow. With this information, we can provide more training utterances to the Devy skill and thus offer more natural interactions for the user. We concluded the first phase by introducing the Echo Dot and providing the participant with a list of Devy commands for reference. To make the participant more comfortable interacting with the Echo before the next phase, we had them ask "Alexa, tell me a joke".

The second phase was an interactive session in which the participants completed a series of common development tasks using Devy. The tasks were meant to be familiar to the participants and included version control, testing, and working with issues. We gave the participants instructions orally, and in writing via issues and in the files they were to modify. To start, participants were asked to read the README.md file open in Visual Studio Code on the study laptop, which was used throughout the study.
The README is shown in Listing B.2.

Listing B.2: Instructions for completing the first task.

# Devy Study Task 1

Devy is a prototype developer assistant that you can converse with to
complete development tasks. To start a conversation, prompt Devy by
saying "Alexa, tell Devy..." or "Alexa, ask Devy...".

Complete the following tasks:

1. First, simply launch Devy by saying "Alexa, launch Devy". This is
   just to make sure Devy is working (occasionally there are issues
   with wifi).
2. Now, using Devy, try to get the name of the person whom you might
   contact to get help with making changes to this README.md file.
3. Next, make sure you are on branch iss2 and then make a change to
   this README.md file (and save those changes).
4. Finally, make those changes available on GitHub.

This task will likely be frustrating to complete since we haven't
given you the commands Devy supports. However, the purpose of the
question is not to complete the task but for us to understand what
developers would say to get their work done. This information will
be invaluable for creating more useful versions of Devy in the
future. Note also that this prototype is trained to recognize a
limited number of commands which also contributes to making this
a hard task.

The next task was designed to show participants the power of Devy and to get them thinking further about what features they could suggest. This time, we had the participants open the README.md file in the devy-study-2 repository, shown in Listing B.3.

Listing B.3: Instructions for completing the second task.

# Devy Study Task 2

Complete the following tasks.

1. Say "Alexa, tell Devy to list my issues." to list the first open
   issue on GitHub. List the second issue by saying "Next", then
   stop by saying "Stop". Notice that the listed issues are for the
   correct repository.
2. Say "Alexa, tell Devy I want to work on issue 2." to have Devy
   prepare your workspace for you by checking out a new branch.
3. Resolve the issue: comment out the debug `console.log` on line 8
   of log.ts by prepending it with `//`. Save the file.
4. Say "Alexa, tell Devy I'm done." followed by "Yes" and "Yes" then
   "No" and finally "Yes". This will commit your work and open a pull
   request.

Notice that the reviewers you specified have been added. Also, notice
that tests covering the changes were automatically run and the results
included as a comment by Devy.

In the next phase, we gathered qualitative feedback from the participants about their experience with Devy by asking the following open-ended questions:

1. Imagine that Devy could help you with anything you would want: what do you think it could help you with, and where would it provide the most benefit?
2. Are there any other tasks / goals / workflows that you think Devy could help with, maybe not just restricted to your development tasks, but other tools you or your team or your colleagues use?
3. When you think about the interaction you just had with Devy, what did you like and what do you think could be improved?
4. Did Devy do what you expected during your interaction? What would you change?
5. Do you think that Devy adds value? Why or why not?

We concluded the study by asking participants demographic questions and thanking them for their participation.

B.3 Data Analysis

In this section, we describe in more detail how we transcribed and coded the study interviews and how we analyzed the data used to generate Figure 5.1.
Anonymized participant data, including experience and job title (summarized in Section 4.1) and interview duration (summarized in Section 4.3), are given in Table B.1.

B.3.1 Quantifying Number of Invocation Attempts

To get a sense of Devy's usability, we had participants complete some common development tasks without having received prior instruction. We recorded the number of invocations required for the participant to successfully complete each task, removing invocations that failed due to technical reasons, most notably the network connection dropping out and steps missed during the experiment setup. However, we still counted incorrect invocations, e.g., omitting the skill name from the invocation: "Alexa, get the current branch". A task is considered completed once Devy responds with the phrase in Table B.2, regardless of the invocation. Across all tasks we removed 55 out of 175 invocations due to technical reasons. Invocation counts for each participant, by task, are shown in Table B.3.

B.3.2 Open Coding Interview Transcripts

I manually transcribed the audio recordings captured during the interviews, using VLC media player to control audio playback and Visual Studio Code to write the text. The raw transcripts were written in markdown: the interviewer's text was marked by a preceding (N)ick or (T)homas, the participant's text by (P)articipant, and Alexa's responses by (A)lexa. Durations were recorded in the heading of each section of the interview to facilitate linking the transcribed text to the audio recording. Formatted versions of the transcripts are provided in Appendix ??.

Transcribing the 12 hours of recordings took approximately 22 hours over the course of about one week.

Using the transcripts, we open coded participants' responses to the questions outlined in Part III of the study guide (Figure B.2).
Thomas Fritz (second author) and I developed the codes by independently coding five randomly selected transcripts and discussing and merging the resulting codes.

Table B.1: Participant and study metadata.

Recording Duration | Programming Exp. (Yrs) | Professional Exp. (Yrs) | Job Title | Classification
--- | --- | --- | --- | ---
00:34:50 | 7.5 | 6 | Senior developer | Senior
00:34:03 | - | 15 | Lead member of technical staff | Senior
00:46:26 | 5 | 1 | Software developer | Junior
00:36:11 | 27 | 20 | Architect | Senior
00:29:42 | - | 20 | Principal member of technical staff | Senior
00:31:53 | 25 | 17 | Senior developer | Senior
00:39:25 | 2 | 1.5 | Software developer | Junior
00:28:46 | 20 | 20 | Web developer | Senior
00:31:26 | 1 | 1 | Frontend developer | Junior
00:35:06 | 20 | 15 | Software developer | Senior
00:34:19 | 35 | 20 | Programmer | Senior
00:35:55 | 7.5 | 1.5 | Programmer | Junior
00:33:31 | 35 | 18 | Architect | Senior
00:30:33 | 9 | 9 | Principal software engineer | Senior
00:33:57 | 7 | 2 | Software developer | Junior
00:33:48 | 20 | 20 | Senior developer | Senior
00:29:17 | 8.5 | 8.5 | Systems engineer | Senior
00:42:02 | 15 | 9 | Software engineer | Senior
00:33:22 | 25 | 12 | Software developer | Senior
00:33:45 | 6.5 | 0.5 | Software engineer | Junior
00:33:21 | 22 | 15 | Technical product architect | Senior

Table B.2: Devy's response when task is completed successfully.

Task | Response
--- | ---
File Owner | "A-Aaronson owns the file, readme dot md, having made 100% of the changes in the past three months."
Branch Name | "You are on branch i-s-s-2."
Submit Changes | "OK, I've pushed your changes."

Table B.3: Total and valid attempts to complete each task using Devy. Valid attempts exclude attempts that failed due to technical reasons. Data from transcript 6 was excluded because the participant completed the tasks in the wrong order.

Transcript | File Owner Total | File Owner Valid | Branch Name Total | Branch Name Valid | Submit Changes Total | Submit Changes Valid
--- | --- | --- | --- | --- | --- | ---
1 | 4 | 3 | 2 | 1 | 1 | 1
2 | 4 | 2 | 3 | 1 | 3 | 2
3 | 6 | 2 | 0 | 0 | 4 | 3
4 | 2 | 1 | 3 | 1 | 3 | 3
5 | 4 | 1 | 2 | 2 | 5 | 3
6 | - | - | - | - | - | -
7 | 9 | 2 | 0 | 0 | 3 | 2
8 | 6 | 6 | 1 | 1 | 4 | 2
9 | 1 | 1 | 3 | 3 | 2 | 1
10 | 4 | 1 | 1 | 1 | 4 | 1
11 | 10 | 8 | 3 | 2 | 1 | 1
12 | 7 | 4 | 1 | 1 | 1 | 1
13 | 1 | 1 | 1 | 1 | 2 | 1
14 | 5 | 4 | 3 | 2 | 3 | 2
15 | 1 | 1 | 1 | 1 | 1 | 1
16 | 3 | 1 | 5 | 4 | 3 | 2
17 | 3 | 3 | 1 | 1 | 2 | 2
18 | 1 | 1 | 3 | 3 | 2 | 2
19 | 11 | 6 | 1 | 1 | 3 | 2
20 | 4 | 3 | 1 | 1 | 2 | 1
21 | 1 | 1 | 1 | 1 | 3 | 3
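The counting rule described in Section B.3.1 can be sketched in Ruby, the language of the study setup script. This is an illustrative sketch only. The `Invocation` record, its field names, and the sample log are hypothetical and do not reflect the format of the actual transcripts. For a given task, every invocation up to and including the first one answered with that task's Table B.2 phrase counts toward the total. Invocations flagged as technical failures are excluded from the valid count, while incorrect phrasings (such as a missing skill name) still count as valid attempts.

```ruby
# Hypothetical record of one spoken invocation (field names are assumptions
# for this sketch, not the study's transcript format).
Invocation = Struct.new(:task, :response, :technical_failure, keyword_init: true)

# Completion phrases from Table B.2.
COMPLETION_RESPONSES = {
  file_owner: "A-Aaronson owns the file, readme dot md, having made 100% " \
              "of the changes in the past three months.",
  branch_name: "You are on branch i-s-s-2.",
  submit_changes: "OK, I've pushed your changes."
}.freeze

# Tallies total and valid attempts per task. Counting for a task stops at the
# first invocation whose response matches that task's completion phrase.
def count_attempts(invocations)
  counts = {}
  invocations.each do |inv|
    c = (counts[inv.task] ||= { total: 0, valid: 0, done: false })
    next if c[:done] # ignore anything said after the task was completed

    c[:total] += 1
    # Technical failures (e.g. a dropped network connection) are not valid
    # attempts; incorrect phrasings still are.
    c[:valid] += 1 unless inv.technical_failure
    c[:done] = true if inv.response == COMPLETION_RESPONSES[inv.task]
  end
  counts
end

# Hypothetical log for the Branch Name task.
log = [
  # Skill name omitted ("Alexa, get the current branch"): still a valid attempt.
  Invocation.new(task: :branch_name, response: nil, technical_failure: false),
  # Network connection dropped: counted in the total only.
  Invocation.new(task: :branch_name, response: nil, technical_failure: true),
  Invocation.new(task: :branch_name, response: "You are on branch i-s-s-2.",
                 technical_failure: false)
]

p count_attempts(log)[:branch_name]
```

For this hypothetical log the task has a total of 3 attempts and 2 valid ones. This mirrors the Total and Valid columns of Table B.3.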
We then validated the codes by independently coding two additional randomly selected transcripts and checking agreement.

To facilitate the coding process, we used the R Qualitative Data Analysis (RQDA) package, which allows tagging of text selections in plaintext documents using a graphical user interface. The text, tags, and participant metadata are also available in the R environment, making them easy to use in further analyses.

Figure B.2: Study guide

devy-study-procedure.md 12/07/2018 1 / 3

Devy Study Procedure

Start

1. Sign consent form. Check it before proceeding.
2. Ask if we can record them.
3. Let me know if you want to stop at any point.

Part I (7 min)

1. Walk me through the typical development tasks you work on every day. [5 min]
   - [ ] Task management. Walk me through a change task:
     - choosing next work item
     - preparing to work on it
     - submitting the change for review (and choosing reviewers)
   - [ ] Version control
     - Command line or IDE?
     - How often?
     - What information do you need to commit and how do you get it?
     - What features do you use (e.g. commit, log, status, blame)?
   - [ ] Testing
     - When?
     - How long to run?
     - Always full test suite? If no, how to choose the tests?
   - [ ] Debugging
     - Always with debugger?
     - How do you know where to set breakpoints?
     - How often?
   - [ ] Documentation/Google/Problem solving~~
     - How do you formulate your questions?
   - [ ] Collaboration
     - Who do you talk to?
     - What about?
     - By what means?
2. To help you get familiar with Alexa, I'll have you ask Alexa to tell us a joke. You can say: "Alexa, tell me a joke."
3. Can you think of any of your tasks you would like to have "magically" completed just by either talking to Alexa or by typing into a natural language command prompt?

Check Time

Part II (10 min)

Now I'm going to have you complete a couple of tasks using our automated assistant, Devy.

Committing and pushing changes

Have them open the README in devy-study-1.
Confirm that the correct README is open. Tell them to read through the README and to start the tasks therein when they are ready.

Devy is a prototype developer assistant that you can converse with to complete development tasks. To start a conversation, prompt Devy by saying "Alexa, tell Devy..." or "Alexa, ask Devy...".

Complete the following tasks:

1. First, simply launch Devy by saying "Alexa, launch Devy". This is just to make sure Devy is working (occasionally there are issues with wifi).
2. Now, using Devy, try to get the name of the person whom you might contact to get help with making changes to this README.md file.
3. Next, make sure you are on branch iss2 and then make a change to this README.md file (and save those changes).
4. Finally, make those changes available on GitHub.

This task will likely be frustrating to complete since we haven't given you the commands Devy supports. However, the purpose of the question is not to complete the task but for us to understand what developers would say to get their work done. This information will be invaluable for creating more useful versions of Devy in the future. Note also that this prototype is trained to recognize a limited number of commands which also contributes to making this a hard task.

Working with issues

Switch to devy-study-2 and work through the README.

Complete the following tasks.

1. Say "Alexa, tell Devy to list my issues." to list the first open issue on GitHub. List the second issue by saying "Next", then stop by saying "Stop". Notice that the listed issues are for the correct repository.
2. Say "Alexa, tell Devy I want to work on issue 2." to have Devy prepare your workspace for you by checking out a new branch.
3. Resolve the issue: comment out the debug console.log on line 8 of log.ts by prepending it with //. Save the file.
4. Say "Alexa, tell Devy I'm done." to commit your work and open a pull request. Devy will ask if you want to add the new file; say "Yes". Next, Devy will recommend up to 3 reviewers.
   You can choose any you like.

When completed, Devy will say it created the pull request and will open a new tab showing the pull request. Notice that the reviewers you specified have been added. Also, notice that tests covering the changes were automatically run and the results included as a comment by Devy.

Check Time

Part III (10 min)

1. Now you've seen a few of the initial capabilities of Devy and we are currently working on more. So, imagine that Devy could help you with anything you'd want: what do you think it could help you with and where would it provide most benefit? ...Can you think of anything else you'd have it do based on the development tasks you mentioned at the start?
2. If we left Devy with you, would you try it on your own? Why or why not? Would you be interested in having one?
3. Are there any other tasks / goals / workflows that you think Devy could help with, maybe not just restricted to your development tasks, but other tools you or your team or your colleagues use?
4. When you think about the interaction you just had with Devy, what did you like and what do you think could be improved?
5. Did Devy do what you expected during your interaction? What would you change?
6. Do you think that Devy adds value and why or why not?

Check Time

Part IV (3 min)

1. How many years have you been programming?
2. How long have you been working as a professional software developer?
3. What is your current job title?
4. What IDE or text editor do you use?
5. What tools do you use to collaborate with your colleagues?
6. Before this study, had you used voice-operated assistants like Alexa?

Conclusion

Thank you so much for participating. We will let you know about the results of the lottery in the next couple of weeks.

If you have any further ideas or comments on Devy, please let us know; we really appreciate and value your feedback since we want to develop an approach that you might be able to use sometime soon.
