UBC Theses and Dissertations

An exploratory study of socio-technical congruence in an ecosystem of software developers Azad, Deepak 2014

An Exploratory Study of Socio-Technical Congruence in an Ecosystem of Software Developers

by

Deepak Azad

Bachelor of Engineering, University of Delhi, India, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Computer Science)

The University of British Columbia
(Vancouver)

December 2014

© Deepak Azad, 2014

Abstract

Software is not built in isolation but builds on other software. When one project relies on software produced by another project, we say there is a technical dependence between the projects. The socio-technical congruence literature suggests that when there is a technical dependence there may need to be a social dependence. We investigate the alignment between social interactions and technical dependence in a software ecosystem.

We performed an exploratory study of 250 Java projects on GitHub that use Maven for build dependences. We create a social interaction graph based on developers' interactions on issues and pull requests. We compare the social interaction graph with a technical dependence graph representing library dependences between the projects in the ecosystem, to get an overview of the congruence, or lack thereof, between social interactions and technical dependences. We found that in 23.6% of the cases in which there is a technical dependence between projects there is also evidence of social interaction between project members. We found that in 8.67% of the cases in which there is a social interaction between project members, there is a technical dependence between projects.

To better understand the situations in which there is congruence between the social and technical graphs, we examine pairs of projects that meet this criterion. We identify three categories of these project pairs and provide a quantitative and qualitative comparison of project pairs from each category.
We found that for 45 (32%) of project pairs, no social interaction had taken place before the introduction of the technical dependence, and interactions after the introduction of the dependence are often about upgrading the library being depended upon. For 49 (35%) of project pairs, more than 75% of the interaction takes place after the introduction of the technical dependence. For the remaining 45 (32%) of project pairs, less than 75% of the interaction takes place after the introduction of the technical dependence. In the latter two cases, although there is interaction before the technical dependence is introduced, it is not always about the dependence.

Preface

The work presented in this thesis is original, unpublished, independent work by the author and was conducted in the Software Practices Lab under supervision of Prof. Gail Murphy.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
2 Related Work
  2.1 Socio-Technical Congruence
  2.2 Developer Social Networks
3 Dataset Description
4 Constructing the Graphs
  4.1 Technical Dependence Graph
  4.2 Social Project Graph
    4.2.1 User Interaction Graph
    4.2.2 Forming the Social Project Graph
5 Results
  5.1 Socio-Technical Congruence in an Ecosystem
  5.2 Origins of Socio-Technical Congruence
    5.2.1 Qualitative Examination
6 Discussion
  6.1 Formulations of TDG
  6.2 Formulations of SPG
  6.3 Understanding Socio-Technical Congruence
7 Conclusion
Bibliography
A Projects Explored
B Project Pairs With Both Technical Dependence and Social Interactions
C Timeline Plots for Category A Project Pairs
D Timeline Plots for Category B Project Pairs
E Timeline Plots for Category C Project Pairs
F Communities Detected in UIG

List of Tables

Table 3.1 A characterization of the 250 projects studied between 1 January 2010 and 2 April 2014
Table 5.1 Overview of TDG and SPG

List of Figures

Figure 4.1 An example technical dependence graph with four projects p, q, m, n. Each edge is annotated with the date the technical dependence was introduced and the technical dependence at the time.
Figure 4.2 An example user interaction graph with only two users x, y. These users interact on three projects p, q, n. Each edge is annotated with the project to which the corresponding user interactions belong, and each edge is weighted by the number of times the two users interacted with each other on that project.
Figure 4.3 Degree of nodes in UIG
Figure 4.4 An example social project graph with four projects p, q, m, n. Each edge is annotated with all the actions of the users on the two projects who were active on both the projects.
Figure 4.5 Community detection in user interaction graph
Figure 4.6 Communities detected
Figure 4.7 Example communities detected in UIG. Nodes are users and the edges are the interaction between users. Different edge colors within a community represent the different projects over which the interactions occurred.
Figure 5.1 Degree of nodes in TDG and SPG
Figure 5.2 Percentage activity after technical dependency is introduced
Figure 5.3 Category A - wildfly/wildfly (blue) and resteasy/Resteasy (green)
Figure 5.4 Category B - infinispan/infinispan (blue) and weld/core (green)
Figure 5.5 Category C - sarxos/webcam-capture (blue) and netty/netty (green)
Figure 6.1 Distances between users in a community
Figure C.1 Category A project pairs (1-6)
Figure C.2 Category A project pairs (7-12)
Figure C.3 Category A project pairs (13-18)
Figure C.4 Category A project pairs (19-24)
Figure C.5 Category A project pairs (25-30)
Figure C.6 Category A project pairs (31-36)
Figure C.7 Category A project pairs (37-42)
Figure C.8 Category A project pairs (43-45)
Figure D.1 Category B project pairs (1-6)
Figure D.2 Category B project pairs (7-12)
Figure D.3 Category B project pairs (13-18)
Figure D.4 Category B project pairs (19-24)
Figure D.5 Category B project pairs (25-30)
Figure D.6 Category B project pairs (31-36)
Figure D.7 Category B project pairs (37-42)
Figure D.8 Category B project pairs (43-48)
Figure D.9 Category B project pair (49)
Figure E.1 Category C project pairs (1-6)
Figure E.2 Category C project pairs (7-12)
Figure E.3 Category C project pairs (13-18)
Figure E.4 Category C project pairs (19-24)
Figure E.5 Category C project pairs (25-30)
Figure E.6 Category C project pairs (31-36)
Figure E.7 Category C project pairs (37-42)
Figure E.8 Category C project pairs (43-45)
Figure F.1 Communities detected in UIG (1-6)
Figure F.2 Communities detected in UIG (7-12)
Figure F.3 Communities detected in UIG (13-18)
Figure F.4 Communities detected in UIG (19-22)

Acknowledgments

I am grateful to my supervisor Gail Murphy for her encouragement and guidance in pursuing an interesting topic for my Master's thesis.

I thank Marc Palyart for his guidance in solidifying the research methodology. I also thank my second reader, Ivan Beschastnikh, who provided valuable feedback on my thesis draft.

Many thanks to my colleagues in the Software Practices Lab who created a cheerful atmosphere in the lab and provided guidance when needed.

Finally, I would like to acknowledge NSERC for funding this research.

Dedication

To my amazing parents...
...for their endless love and support.

Chapter 1
Introduction

Software is not typically built in isolation, but instead builds on other software. When one project relies on software produced by another project, we say that there is a technical dependence between the projects. The socio-technical congruence literature suggests that when there is a technical dependence between the projects, there may need to be some social interactions between members of those projects [5, 6, 15]. To date, socio-technical congruence has been considered in the context of a single project or system (e.g., [5–7]), in a set of highly related projects that result in a simultaneous release (e.g., [19]) or over the lifetime of one system (e.g., [5, 7]). Socio-technical congruence has not been studied in the context of an ecosystem of software developers, in which a variety of projects may be developed and evolved largely independently but may share developers through use of a common infrastructure.

In this thesis, we investigate socio-technical congruence in the context of an ecosystem of software developers. We study socio-technical congruence across 250 projects that use Java and Maven¹ on GitHub.
We chose GitHub as an example of a software ecosystem because it hosts a large number of diverse open source software projects, includes independent projects for which there are technical interactions between the projects, and includes data on social interactions, such as comments on issues and pull requests. We form a technical dependence graph for the 250 projects based on Maven project information that details library dependences between projects. We form a social dependence graph by building a social interaction graph from communities detected in developer interactions on the various projects.

¹ http://maven.apache.org/

We investigate two research questions:

1. How often does congruence between technical dependencies and social interactions occur in an ecosystem of software developers?
2. When socio-technical congruence does exist, how does it come about?

We found that for 23.6% of the cases in which there is a technical dependence between two projects, there is also evidence of some social interactions. In only 8.67% of the cases when there is a social interaction between projects is there also a technical dependence. We categorized all cases in which there are both social and technical dependences between projects, finding that for 45 (32%) of these cases, no social interaction had taken place before the introduction of the technical dependence. In another 49 (35%) of the cases, most of the social interaction (more than 75%) took place after the technical dependence was introduced. In the remaining 45 (32%) of the cases, less than 75% of the social interaction took place after the introduction of the technical dependence. We also qualitatively examined an instance from each category to provide insight into the different cases that arise.

We begin with a review of related work in socio-technical congruence and the determination of developer social networks (Chapter 2).
We then describe the GitHub data used in this investigation (Chapter 3) before detailing the construction of the technical and social graphs (Chapter 4). We present the results of our investigations (Chapter 5), discuss the approach and results (Chapter 6) and summarize (Chapter 7).

Chapter 2
Related Work

Coordination among project members has been recognized as one of the fundamental problems in software engineering (e.g., [9, 12, 14]). It can affect many development qualities including developer productivity [6], build success probability [15] and software failures [5]. We describe previous efforts in socio-technical congruence and approaches for analyzing social networks amongst developers.

2.1 Socio-Technical Congruence

Conway first proposed the idea that the structure of a system usually resembles the communication structure of the organization that designs it [8]. This concept has been explored in engineering [4] and management science [18]. In software engineering, Cataldo et al. focused attention on socio-technical congruence [6]. They defined socio-technical congruence as the match between the coordination needs established by the technical dimension of the socio-technical system and the actual coordination activities carried out by the project members representing the social dimension.

Previous work has looked at congruence in a single product or system or software that is released together. Cataldo et al. investigated congruence in a single large distributed system developed by one company involving 114 developers [6, 7]. Later, Cataldo and Herbsleb also investigated congruence in a single complex embedded system developed by a single organization involving 380 developers [5]. Syeed and Hammouda empirically examined Conway's law in the FreeBSD open source project that is developed by a team of individuals as opposed to a single organization [19]. FreeBSD involved 1128 contributors in the last release who developed 20 packages that were released together.
In contrast, we explore congruence between technical dependencies and social interactions in an ecosystem of software developers who work on independent projects which are not necessarily released together.

High socio-technical congruence is often considered desirable. Cataldo et al. found that high socio-technical congruence is associated with increased developer productivity as measured in terms of time taken to resolve a change request [6]. Kwan et al. studied the relationship between socio-technical congruence and build success probability. They report that for continuous builds, increasing congruence improves the chance of build success; however, increasing congruence can actually decrease build success probability in integration builds [15]. Cataldo and Herbsleb also studied the impact of socio-technical congruence on software failures, and found that lower congruence increased software failures [5].

Another direction of study has been the evolution of socio-technical congruence over the life cycle of a development project. Cataldo et al. examined the evolution of congruence across four releases of a project and found that congruence often improved or remained stable over time [7]. They also examined the evolution of congruence among developers who contributed the most and the rest of the developers. They found that congruence among developers who contributed the most increased more over time as compared to the rest of the developers. This analysis was replicated on two different projects with similar results [5]. Syeed and Hammouda also examined the evolution of the congruence value across several releases of the FreeBSD open source project [19]. They found that the congruence value remains stable and increases gradually over time. However, little has been done to examine the origins of socio-technical congruence when it does exist.

The socio-technical congruence literature has considered a range of definitions for identifying coordination needs or technical dependencies. Cataldo et al. used dependencies among tasks [6, 7]. Syeed and Hammouda use dependencies in source code at a file level to establish coordination needs [19]. To enable proactive detection of coordination needs, Blincoe et al. use information about the activities of developers associated with different tasks [2]. In this thesis, we use library dependencies as a natural unit for a technical dependence between two Java projects.

2.2 Developer Social Networks

Other researchers have focused on analysis of how developers interact separate from the technical structure of the system(s) they are developing. Bird et al. extracted developer social networks from mailing list archives of five large open source projects and identified the community structure from these social networks [1]. Hong et al. examined the community evolution patterns in developer networks based on Mozilla bug reports and observed the individual community evolution paths [13]. They also compare these developer social networks with general social networks. They find that the size of communities in developer social networks is small compared to that in most general social networks. They also find that developer social networks have a widespread community size distribution and their biggest community accounts for 21% to 36% of the total developers. Our formulation of the developer social network is similar to Hong et al. in that we use discussions on issues and pull requests as a means to identify developer interactions [13].

Chapter 3
Dataset Description

To explore the research questions of interest in this thesis, we needed a diverse set of projects through which many developers interact. We chose GitHub because it is a popular hosting service for open source software projects, and hosts a diverse set of projects. As the projects on GitHub are open-source, we can analyze the project source for technical dependencies.
To make the determination of technical dependencies tractable, we focused on projects on GitHub in Java that use Maven. Maven is a build automation tool primarily used for Java projects. Java projects that use Maven describe dependencies via Project Object Model (POM)¹ files, which are XML files that contain information about the project, its configuration and all technical dependencies used by Maven to build the project. A POM uniquely identifies a software artefact via three required fields: groupId, artifactId and version. The groupId is generally unique to an organization or a project. For example, all core Maven artifacts live under the groupId org.apache.maven. However, an organization may have several projects. The artifactId is generally the name that the project is known by. It, along with the groupId, creates a key that uniquely identifies a project. The version determines which incarnation of a project is being talked about. A POM also includes a list of all the dependencies needed by a project for successful compilation. If a project uses Maven for its builds, all the dependencies for the project can be obtained from the POM.

¹ http://maven.apache.org/pom.html

Listing 3.1 provides an example of a POM file for the project cucumber-testng. The groupId for this project is derived from the groupId of its parent, info.cukes. The file also specifies dependencies on two artefacts: cucumber-core, which has the same groupId, and testng, which is a project from another organization. In this way, we can analyze POM files to uniquely identify projects and extract their technical dependencies.

Listing 3.1: Sample POM file

    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <parent>
        <groupId>info.cukes</groupId>
        <artifactId>cucumber-jvm</artifactId>
        <relativePath>../pom.xml</relativePath>
        <version>1.2.0-SNAPSHOT</version>
      </parent>
      <artifactId>cucumber-testng</artifactId>
      <packaging>jar</packaging>
      <name>Cucumber-JVM: TestNG</name>
      <dependencies>
        <dependency>
          <groupId>info.cukes</groupId>
          <artifactId>cucumber-core</artifactId>
        </dependency>
        <dependency>
          <groupId>org.testng</groupId>
          <artifactId>testng</artifactId>
        </dependency>
      </dependencies>
    </project>

In this thesis we consider the top 250 Java projects that use Maven on GitHub based on the number of comments and activity on issues and pull requests. We consider the top 250 projects based on these criteria to ensure that we are analyzing projects with sufficient captured social interactions. We identify these projects using the GHTorrent dataset [11], and we use the GHTorrent MySQL dump from 2 April 2014².

Since Maven is used primarily with Java projects, we first identify Java projects. GitHub detects the primary language of each project and this is available in the GHTorrent data. We simply filter projects marked as 'Java' projects. Next we sort these projects based on the number of comments on issues and pull requests. To filter Maven projects from the list of Java projects we detect the presence of at least one POM file in a project repository. We obtain POM files by directly cloning the git repositories from GitHub and extracting the necessary files.

Table 3.1: A characterization of the 250 projects studied between 1 January 2010 and 2 April 2014

                            Total      Mean     Std. dev.
    #Issues                 110,374    441.5    613.9
    #Issue comments         236,240    945      2066.6
    #Pull requests          61,370     254.6    488
    #Pull request comments  45,902     257.9    510.8

For these 250 projects, the total number of issues (110,374) is about twice the total number of pull requests (61,370). Also, the total number of comments on issues (236,240) is about five times the total number of comments on pull requests (45,902).
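The POM analysis described in this chapter can be illustrated with a small sketch. It is not the tooling used for the thesis; it only shows, under stated assumptions, how a project's groupId (inherited from the parent when absent) and the groupIds of its dependencies could be read from a file like Listing 3.1 using Python's standard library. The function names and the rule of discarding dependencies on the project's own groupId are hypothetical choices made for the illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace declared by Maven POM files (as in Listing 3.1).
NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Abbreviated copy of Listing 3.1, used only as sample input.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>info.cukes</groupId>
    <artifactId>cucumber-jvm</artifactId>
    <version>1.2.0-SNAPSHOT</version>
  </parent>
  <artifactId>cucumber-testng</artifactId>
  <dependencies>
    <dependency>
      <groupId>info.cukes</groupId>
      <artifactId>cucumber-core</artifactId>
    </dependency>
    <dependency>
      <groupId>org.testng</groupId>
      <artifactId>testng</artifactId>
    </dependency>
  </dependencies>
</project>"""


def project_group_id(root):
    """Return the project's own groupId, falling back to the parent's groupId."""
    own = root.findtext("m:groupId", namespaces=NS)
    if own is None:
        own = root.findtext("m:parent/m:groupId", namespaces=NS)
    return own.strip() if own else None


def dependency_group_ids(root):
    """Return the set of groupIds named in the <dependencies> section."""
    found = set()
    for dep in root.findall("m:dependencies/m:dependency", NS):
        group_id = dep.findtext("m:groupId", namespaces=NS)
        if group_id:
            found.add(group_id.strip())
    return found


root = ET.fromstring(SAMPLE_POM)
own_group = project_group_id(root)
# Hypothetical rule: a dependency on the project's own groupId is internal,
# so it does not count as a dependence on another project.
external_groups = dependency_group_ids(root) - {own_group}
print(own_group, sorted(external_groups))
```

For the sample input, the project resolves to the parent's groupId and only the dependency from the other organization remains as a candidate inter-project dependence. A real POM can also declare dependencies in other sections (for example, managed dependencies or profiles), which this sketch ignores.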
We consider the data from January 2010 onwards since the social interaction data is quite sparse before that. Table 3.1 summarizes these numbers about the dataset.

² http://ghtorrent.org/downloads/mysql-2014-04-02.sql.gz

Chapter 4
Constructing the Graphs

To investigate socio-technical congruence, we need to be able to form graphs that represent both technical and social dependences that exist between the selected 250 projects. We analyze the GitHub and Maven data to form two graphs: a technical dependence graph (Section 4.1) and a social dependence graph (Section 4.2).

4.1 Technical Dependence Graph

To represent technical dependencies, we form a technical dependence graph $TDG = (V_P, E_{TDG})$. Let $P$ be the set of projects; then $V_P$ is the set of vertices where each project corresponds to a vertex. The edge set, $E_{TDG}$, defines directed edges between vertices in the graph, one for each technical dependence.

Let $L$ be the set of libraries from all projects (each project can have one or more libraries), and let $D$ be the set of dates. For each edge $e \in E_{TDG}$, we define $dependence(e) : E_{TDG} \rightarrow (L \times D)$. In other words, each edge is annotated with the date the technical dependence was introduced and the technical dependence at the time. Figure 4.1 shows an example technical dependence graph.

Figure 4.1: An example technical dependence graph with four projects p, q, m, n. Each edge is annotated with the date the technical dependence was introduced and the technical dependence at the time.

We determine $E_{TDG}$ based on the dependencies specified in the POM files extracted from the git repositories of the selected projects. We cloned git repositories for all of the selected projects on 2 April 2014 and then extracted all POM files at the latest commit in each project. To identify a GitHub project uniquely we consider only the groupId from the POM files. This is because a project can produce multiple artefacts and, for the purpose of this thesis, it does not matter which artefact of a project introduces a technical dependence. However, we do look over the repository history and keep track of the time a technical dependence was introduced and the technical dependence at that time; each edge is annotated with this information.

4.2 Social Project Graph

The social project graph (SPG) describes interactions between projects. This graph aggregates interactions between GitHub users¹ via issues and other mechanisms to the project level to enable comparison to the TDG. We say that an interaction has occurred between two users when they have performed an action on the same issue, or the same pull request. We consider three actions on an issue: create, comment, and close. We consider four actions on a pull request: create, comment, merge and close. The issues and pull requests belong to projects. We say that two projects interact with each other if users who work on these projects interact with each other. Hence, to construct the SPG, we aggregate these user-user interactions to project-project interactions.

¹ We use the term user as opposed to developer since a GitHub user may only open issues or comment on issues and never write code. However, it is likely that a majority of GitHub users do in fact write code and can be considered developers.

4.2.1 User Interaction Graph

We first construct an undirected multigraph which captures user interactions, $UIG = (V_U, E_{UIG})$. Let $U$ be the set of users who have at least one action on an issue or a pull request; then $V_U$ is the set of vertices where each user corresponds to a vertex. The multiset, $E_{UIG}$, defines undirected edges between vertices in the graph. Each edge corresponds to interactions between two users on a single project.
If two users interact on multiple projects, there are multiple edges between the corresponding vertices, one for each project.

Let $P$ be the set of projects. For each edge $e \in E_{UIG}$, we define $project(e) : E_{UIG} \rightarrow P$. In other words, we keep track of specific projects by annotating each edge with the project to which the corresponding user interactions belong.

Let $x$ and $y$ be two users and $p$ a project. For each edge $e_{xyp} \in E_{UIG}$, we define $weight(e_{xyp}) : E_{UIG} \rightarrow \mathbb{N}$.

The weight function is given by

$$weight(e_{xyp}) = \sum_{i_{xyp} \in I_{xyp}} \left( |i_{xyp}(x)| \times |i_{xyp}(y)| \right) + \sum_{r_{xyp} \in R_{xyp}} \left( |r_{xyp}(x)| \times |r_{xyp}(y)| \right)$$

where

    $x$ and $y$ are two users
    $p$ is a project
    $I_{xyp}$ is the set of issues on $p$ on which both $x$ and $y$ performed an action
    $R_{xyp}$ is the set of pull requests on $p$ on which both $x$ and $y$ performed an action
    $i_{xyp}(x)$ is the set of actions of user $x$ on issue $i_{xyp}$
    $r_{xyp}(x)$ is the set of actions of user $x$ on pull request $r_{xyp}$

In other words, each edge is weighted by the number of times the two corresponding users interacted with each other on a single project. Figure 4.2 shows an example user interaction graph.

Figure 4.2: An example user interaction graph with only two users x, y. These users interact on three projects p, q, n. Each edge is annotated with the project to which the corresponding user interactions belong, and each edge is weighted by the number of times the two users interacted with each other on that project.

The UIG formed from the dataset has 18,471 vertices and 90,837 edges. On average, a user has interacted with five other users across all projects. Figure 4.3 shows the connectedness of users in the UIG. The majority of users have interacted with fewer than 10 other users. However, there are a few who have interacted with a larger number of users (up to 50).

Figure 4.3: Degree of nodes in UIG

The UIG results in an over approximation of actual interactions as the communication might have been one-way. Different users comment on an artefact at different times, possibly weeks or months apart.
Hence, we cannot be sure of an interaction between all users who commented on an artefact, since we do not know if a comment was read by another user. In some ways, the UIG is also an underestimate because not all communication related to projects may be captured in a repository. We discuss these issues further in Chapter 6.

4.2.2 Forming the Social Project Graph

To represent social interactions between projects, we form a social project graph $SPG = (V_P, E_{SPG})$. Let $P$ be the set of projects; then $V_P$ is the set of vertices where each project corresponds to a vertex. The edge set, $E_{SPG}$, defines undirected edges between vertices in the graph. Each edge corresponds to social interactions between two projects.

Let $U_{mn}$ be the set of users who performed an action on some issues or pull requests of both projects $m$ and $n$. Also, let $u_{mn} \in U_{mn}$ and let $IA_{mn}(u_{mn})$ be the set of issue actions and $RA_{mn}(u_{mn})$ be the set of pull request actions performed by user $u_{mn}$ on projects $m$ and $n$. Then for each edge $e_{mn} \in E_{SPG}$, we define

$$actions(e_{mn}) = \bigcup_{u_{mn} \in U_{mn}} \left( IA_{mn}(u_{mn}) \cup RA_{mn}(u_{mn}) \right)$$

In other words, each edge is annotated with all the actions of the users on the two projects who were active on both the projects. Figure 4.4 shows an example social project graph.

Figure 4.4: An example social project graph with four projects p, q, m, n. Each edge is annotated with all the actions of the users on the two projects who were active on both the projects.

To form the SPG, we need to determine projects that interact with each other based on the UIG. We only want to represent interactions between projects if there are a number of interactions between users working on those projects. To determine where there are strong interactions between users, we apply community detection to the UIG. Girvan and Newman defined community detection as the division of a graph into communities or sub-graphs in which the connections within communities are much denser than the connections between them [10].
By definition, community detection filters out weak connections between users, and hence allows us to focus on strong interactions. Since we say that two projects interact with each other if users who work on these projects interact with each other, we construct the SPG by applying community detection to the UIG and then forming edges between all projects in a community.

Figure 4.5 provides an illustration of community structure in a graph. This figure shows three communities with dense connections within each community, but only a single connection between communities. In Figure 4.5 each edge is also labelled with the project name corresponding to the interactions. For this graph we create an SPG with vertices {p, q, m, n} and edges {pq, pn, qn, mn}. This is the graph shown in Figure 4.4.

Figure 4.5: Community detection in user interaction graph

Forming edges between all projects in a community results in an over approximation of project to project social interactions. The rationale of the SPG is to capture potential indirect interactions between people interacting on different projects, much like social network analysis. We discuss this more in Chapter 6.

Figure 4.6: Communities detected. (a) Histogram of number of projects involved in a community. (b) Histogram of number of users in a community.

We use the fast community detection algorithm by Blondel et al. [3]. This algorithm is a heuristic method that is based on modularity optimization. It is reported to outperform other known community detection methods in terms of computation time. At the same time, the quality of the communities detected is also good as measured by modularity. The modularity of detected communities quantifies their goodness: it is a scalar value between -1 and 1 that measures the density of links inside communities as compared to links between communities [16].
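To make the two ideas in this section concrete, the sketch below scores a given partition of users by modularity and then forms SPG edges between all projects that appear inside a community. It is an illustration only, not the thesis implementation: it takes the communities as given rather than running the algorithm of Blondel et al., it computes the standard unweighted modularity (ignoring the edge weights of the UIG), and it assumes that a project belongs to a community when at least one UIG edge labelled with that project joins two users of that community. The toy graph and all names are hypothetical, chosen so that the resulting SPG has the edges {pq, pn, qn, mn} mentioned above.

```python
from itertools import combinations


def modularity(labelled_edges, community_of):
    """Unweighted modularity Q of a partition of an undirected multigraph.

    labelled_edges: list of (user_a, user_b, project) triples.
    community_of:   dict mapping each user to a community id.
    """
    m = len(labelled_edges)
    intra = {}   # community id -> number of edges with both ends inside it
    degree = {}  # community id -> sum of the degrees of its users
    for a, b, _project in labelled_edges:
        ca, cb = community_of[a], community_of[b]
        degree[ca] = degree.get(ca, 0) + 1
        degree[cb] = degree.get(cb, 0) + 1
        if ca == cb:
            intra[ca] = intra.get(ca, 0) + 1
    # Q = sum over communities of (fraction of edges inside the community)
    #     minus (expected fraction for a random graph with the same degrees).
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def spg_edges(labelled_edges, community_of):
    """Form an SPG edge between every pair of projects seen inside one community."""
    projects_in = {}
    for a, b, project in labelled_edges:
        if community_of[a] == community_of[b]:  # assumption: only intra-community edges count
            projects_in.setdefault(community_of[a], set()).add(project)
    edges = set()
    for projects in projects_in.values():
        edges.update(combinations(sorted(projects), 2))
    return edges


# Hypothetical UIG: three communities joined by single links.
uig = [
    ("a", "b", "p"), ("b", "c", "q"), ("a", "c", "n"),  # community 0: projects p, q, n
    ("d", "e", "m"), ("e", "f", "n"), ("d", "f", "m"),  # community 1: projects m, n
    ("g", "h", "m"),                                    # community 2: project m only
    ("c", "d", "n"), ("f", "g", "m"),                   # links between communities
]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1, "g": 2, "h": 2}

q_value = modularity(uig, partition)
spg = spg_edges(uig, partition)
print(round(q_value, 3), sorted(spg))
```

In this toy graph the single-project community contributes no SPG edge, while the three-project community contributes all three pairs, which is exactly the over approximation discussed above.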
We use the implementation available in the Python igraph² library.

² http://igraph.sourceforge.net/

When applied to the dataset, the algorithm detects 104 communities in the UIG. Figure 4.6a shows a histogram of the number of projects involved in each community detected. From this histogram, we can see that a large number of communities involve interactions on a single project, and most communities involve eight projects or fewer. Figure 4.6b shows a histogram of the number of users involved in each community; most communities are quite small. Figure 4.7 shows a few communities detected in the UIG. Each vertex corresponds to a user and each edge corresponds to interactions between two users on a single project. Different edge colours in a single community correspond to different projects the users in the community interact on. Some communities involve interactions on only one project (e.g., Figure 4.7a). Some communities connect two projects, but there may be only one connecting user (e.g., Figure 4.7b). Other communities may have several connecting users (e.g., Figure 4.7c). There are also some communities which connect several projects and have several connecting users (e.g., Figure 4.7d).

Figure 4.7: Example communities detected in UIG. (a) One project community. (b) Two project community. (c) Two project community. (d) Multiple project community. Nodes are users and the edges are the interaction between users. Different edge colors within a community represent the different projects over which the interactions occurred.

Chapter 5
Results

We consider each research question in turn.

5.1 Socio-Technical Congruence in an Ecosystem

To answer our first research question, “How often does congruence between technical dependencies and social interactions occur in an ecosystem of software developers?”, we compare the TDG and the SPG formed from the data of the 250 GitHub projects described in Chapter 4.
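A comparison of this kind can be sketched as a set operation over edges. The following is a minimal illustration only, not the script used in this study: the function name, the dictionary keys, and the toy projects a to e are hypothetical. It assumes that a directed TDG edge counts as overlapping when its two endpoints are adjacent in the undirected SPG, and that no two projects depend on each other mutually.

```python
def compare_graphs(tdg_edges, spg_edges):
    """Compare directed TDG edges with undirected SPG edges.

    tdg_edges: iterable of (dependent_project, dependency_project) pairs.
    spg_edges: iterable of (project, project) pairs, order irrelevant.
    """
    spg_pairs = {frozenset(edge) for edge in spg_edges}
    tdg_edges = set(tdg_edges)
    # A technical dependence is congruent if the same two projects
    # are also connected socially.
    overlap = {edge for edge in tdg_edges if frozenset(edge) in spg_pairs}
    return {
        "overlap": len(overlap),
        "tdg_only": len(tdg_edges) - len(overlap),
        # Assumes no mutual dependencies, so each overlapping TDG edge
        # accounts for exactly one SPG edge.
        "spg_only": len(spg_pairs) - len(overlap),
        "social_given_technical": len(overlap) / len(tdg_edges),
        "technical_given_social": len(overlap) / len(spg_pairs),
    }


# Hypothetical toy graphs over five projects.
tdg = [("a", "b"), ("a", "c"), ("d", "b"), ("e", "a")]
spg = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("d", "e")]

summary = compare_graphs(tdg, spg)
```

In the toy data, two of the four dependences are matched by a social edge, and two of the five social edges are matched by a dependence, which shows why the two ratios generally differ.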
The number of project pairs that have some social interaction is 1809; in comparison, the number of project pairs related by a technical dependence is 664. The overlap between the TDG and the SPG is 157 edges, meaning that in 23.6% of cases in which there is a technical dependence between projects there is also evidence of social interaction between project members. However, in only 8.67% of cases in which there is a social interaction between project members is there a technical dependence between projects.

To provide a sense of the graphs, we compare the connectedness of nodes in the TDG and the SPG. Figure 5.1a shows the degree of TDG vs degree of SPG; since the TDG is directed, Figure 5.1b shows the in-degree of TDG vs degree of SPG. Each dot in these plots corresponds to a project. Figure 5.1 shows that most projects have fewer than 10 technical dependencies to other selected projects from GitHub; however, a number of projects interact with up to 50 other projects.

Table 5.1 summarizes these numbers.

Table 5.1: Overview of TDG and SPG

#nodes in both SPG and TDG            250
#edges in TDG                         664
#edges in SPG                         1809
avg in-degree of TDG                  2.656
avg degree of SPG                     14.472
#edges overlapping in TDG and SPG     157
#edges TDG-SPG                        507
#edges SPG-TDG                        1652

5.2 Origins of Socio-Technical Congruence

To answer our second research question, “When socio-technical congruence does exist, how does it come about?”, we consider the common edges in the TDG and the SPG. For each overlapping edge, we determine all users who contributed to both projects and extract the activity of each such user on these two projects (i.e., $\mathrm{actions}(e_{mn})$ as defined in Section 4.2.2). The activity information gives us the project, user, type of interaction and time at which the action occurred. Since we over-approximate the number of edges in the SPG, for some of the overlapping edges there are no common users. Such cases are few, occurring 18 times out of 157 (11.46%).
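The extraction of per-edge activity described above can be sketched as follows. This is only an illustration of the definition of actions(e_mn): the record layout, the function name, and the users and dates in the toy log are hypothetical and are not taken from the GHTorrent schema.

```python
from collections import namedtuple

# Hypothetical action record: which project, who, issue or pull request, when.
Action = namedtuple("Action", "project user kind timestamp")


def actions_on_edge(actions, m, n):
    """Return the actions on projects m and n by users active on both."""
    users_m = {a.user for a in actions if a.project == m}
    users_n = {a.user for a in actions if a.project == n}
    common = users_m & users_n
    return [a for a in actions if a.user in common and a.project in (m, n)]


log = [
    Action("m", "alice", "issue", "2013-01-10"),
    Action("n", "alice", "pull_request", "2013-03-02"),
    Action("m", "bob", "issue", "2013-02-01"),
    Action("q", "carol", "issue", "2013-02-05"),
]

shared = actions_on_edge(log, "m", "n")  # both of alice's actions
empty = actions_on_edge(log, "m", "q")   # no user is active on both m and q
```

The second call corresponds to the over-approximated case: two projects may be linked in the SPG through a community even though no single user acted on both.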
For each overlapping edge, we also extract the time when a technical dependence was introduced between the two projects (i.e., dependence(e) as defined in Section 4.1).

For each project pair, we plot the activity of each user in common to both projects of the pair as a timeline to understand if there are patterns that occur. Figures 5.3, 5.4, and 5.5 show three such plots. The x-axis shows the time period under consideration, and the y-axis shows the different users who are active on both projects; the labels on the y-axis are the user ids of these users in the GHTorrent dataset. Each square corresponds to a user action on an issue and each circle corresponds to a user action on a pull request. For each user, there are two timelines, one for each project, distinguished by colour. The data points in blue correspond to the first project in the figure title and the data points in green correspond to the second project. In all cases, the first project has a technical dependence on the second project. The date the technical dependence was introduced is shown by the vertical red line.

Figure 5.1: Degree of nodes in TDG and SPG. (a) In-degree in TDG vs. degree in SPG for each project node. (b) Degree in TDG vs. degree in SPG for each project node.

We also plot the percentage of social activity that happens after the technical dependence of interest is introduced. In Figure 5.2, each dot corresponds to a project pair. We categorize all project pairs in three categories:

A) All social activity for users who contribute to both projects comes after the technical dependence is introduced. There are 45 such project pairs.
B) More than 75% (but not all) of social activity comes after the technical dependence is introduced. There are 49 such project pairs.
C) The remaining 45 project pairs, in which less than 75% of social activity comes after the technical dependence.

Figure 5.2: Percentage activity after technical dependency is introduced

We chose a threshold of 75% so that categories B and C form groups of similar size.

5.2.1 Qualitative Examination

To better understand the categories, we visually examined the timeline plots for each category and selected a project pair that is representative of the category. For each such project pair, we search for a textual match for the dependency name in all the issues and pull requests of the project creating the dependency, as a means of determining the issues and pull requests related to the dependence. After this filtering process, we manually examine each issue and pull request and all the comments to understand more about the technical dependency. We present three such selected examples below.

Figure 5.3: Category A - wildfly/wildfly (blue) and resteasy/Resteasy (green)

Category A: All social activity after technical dependence introduction. The example for Category A is the project pair: wildfly/wildfly¹ and resteasy/Resteasy².

The WildFly Application server, formerly known as JBoss AS³ (JBoss Application Server), is a flexible, lightweight, managed application runtime. It implements the Java Platform, Enterprise Edition (Java EE) specification. RESTEasy is a JBoss.org project aimed at providing productivity frameworks for developing client and server RESTful applications and services in Java. It is a fully certified and portable implementation of the JAX-RS specification. JAX-RS is a new JCP specification that provides a Java API for RESTful Web Services over the HTTP protocol.

WildFly introduced a dependency on RESTEasy on Feb 25 2011. The first mention of the technical dependency occurs six months later, in August 2011, about upgrading the dependency from version 2.2.1 GA to version 2.2.2 GA⁴.
There are six further mentions of upgrading the dependency version in Jan 2013⁵, Feb 2013⁶, June 2013⁷, Sep 2013⁸,⁹, Oct 2013¹⁰ and Dec 2013¹¹. All these upgrades involved creating a build with the new version and then running tests. If the build succeeded, the dependency version was upgraded. On occasion, there is mention of the reason for upgrading to a new version. For instance, the upgrade in Feb 2013 happens because of the need for JAX-RS 2.0 support, and the upgrade in June 2013 happens to be able to support multiple Application classes.

¹ https://github.com/wildfly/wildfly
² https://github.com/resteasy/Resteasy
³ http://jbossas.jboss.org/
⁴ https://github.com/wildfly/wildfly/pull/126

Over time more dependencies on RESTEasy are added. In Jan 2012 a dependency on resteasy-yaml-provider is added¹². In May 2013 a dependency on the resteasy-crypto module is added¹³. In July 2013 a dependency on the resteasy-client module is added¹⁴. In Jan 2014 a dependency on resteasy-spring is added¹⁵.

There are several other issues which talk about specific bugs or adding more tests.

Category B: More than 75% of social activity comes after technical dependence introduction. The example for Category B is the project pair: infinispan/infinispan¹⁶ and weld/core¹⁷.

Infinispan is an open source data grid platform and highly scalable NoSQL cloud data store. Weld is the reference implementation of CDI: Contexts and Dependency Injection for the Java EE Platform, which is the Java standard for dependency injection and contextual lifecycle management.
Weld is integrated into many Java EE application servers such as WildFly, JBoss Enterprise Application Platform, GlassFish, Oracle WebLogic and others.

⁵ https://github.com/wildfly/wildfly/pull/3838
⁶ https://github.com/wildfly/wildfly/pull/4117
⁷ https://github.com/wildfly/wildfly/pull/4673
⁸ https://github.com/wildfly/wildfly/pull/4991
⁹ https://github.com/wildfly/wildfly/pull/5009
¹⁰ https://github.com/wildfly/wildfly/pull/5391
¹¹ https://github.com/wildfly/wildfly/pull/5596
¹² https://github.com/wildfly/wildfly/pull/1195
¹³ https://github.com/wildfly/wildfly/pull/4565
¹⁴ https://github.com/wildfly/wildfly/pull/4758
¹⁵ https://github.com/wildfly/wildfly/pull/5695
¹⁶ https://github.com/infinispan/infinispan
¹⁷ https://github.com/weld/core

Figure 5.4: Category B - infinispan/infinispan (blue) and weld/core (green)

Infinispan introduced a dependency on Weld on Jul 18 2011. Five users who eventually contribute to both projects are active on at least one project before this date. Two of them have contributions on both projects, and one of them, pmuir¹⁸, has multiple contributions on both projects prior to the introduction of the technical dependence. The technical dependence is introduced by a pull request¹⁹. pmuir is the author of some commits in this pull request, and also the one who merges the pull request. There is, however, no discussion of the merits or demerits of adding this technical dependence. Apart from this pull request, pmuir’s contributions to both projects are unrelated to the technical dependence.

¹⁸ https://github.com/pmuir
¹⁹ https://github.com/infinispan/infinispan/pull/420

After the dependency is introduced, several pull requests are opened to make sure that the version of Weld being used by Infinispan is kept up to date, in Sept 2011²⁰, Nov 2011²¹, Dec 2011²², Nov 2012²³ and June 2013²⁴. This last one in June 2013 is opened by pmuir.
In this pull request, updating the dependence is a small part; the main focus is a code cleanup.

Figure 5.5: Category C - sarxos/webcam-capture (blue) and netty/netty (green)

Category C: Less than 75% of social activity after technical dependence introduction. The example for Category C is the project pair: sarxos/webcam-capture²⁵ and netty/netty²⁶.

The project sarxos/webcam-capture is a Webcam Capture API for Java. This library allows one to use the built-in or external webcam directly from Java. It is designed to abstract commonly used camera features and support multiple capturing frameworks. Netty is an asynchronous event-driven network application framework for rapid development of maintainable high performance protocol servers and clients.

²⁰ https://github.com/infinispan/infinispan/pull/540
²¹ https://github.com/infinispan/infinispan/pull/642
²² https://github.com/infinispan/infinispan/pull/695
²³ https://github.com/infinispan/infinispan/pull/1472
²⁴ https://github.com/infinispan/infinispan/pull/1908
²⁵ https://github.com/sarxos/webcam-capture
²⁶ https://github.com/netty/netty

Webcam-capture introduced a dependency on Netty in Mar 2013. One user, hepin1989²⁷, creates a pull request²⁸ on the webcam-capture project for a Webcam Capture live streaming example. Another user, sarxos²⁹, accepts this pull request: “Thank you :) I really appreciate your help. Tomorrow I will Mavenize it.” This pull request introduces the technical dependency.

Prior to creating this pull request, hepin1989 also opens multiple issues on both projects, starting in Jan 2013. These issues were bugs the user encountered while trying to use both projects.
On one such issue³⁰ the following exchange takes place, which eventually results in the creation of the technical dependency.

ZAHIDHAF: can you please send me the code webcam encode and decode using xuggler.
HEPIN1989: i will send you an gist [hepin1989 sends a gist]
SARXOS: Can I refine your code and include it as a new usage example?
HEPIN1989: I will write example for your project,how about this?
SARXOS: It would be perfect if you prepare such example, thank you :)
HEPIN1989: Ok,I will going to wirte something about it.and could i use netty,or just the original java NIO?
SARXOS: Sure :) Please use whatever you want until it is available from Maven Central.

After the pull request is accepted, there is some follow-up discussion on using the Webcam Capture live streaming example with an Android phone as a client³¹. However, since this technical dependency was introduced for one specific piece of code, after that feature is implemented and works well there isn’t much social interaction.

²⁷ https://github.com/hepin1989
²⁸ https://github.com/sarxos/webcam-capture/pull/68
²⁹ https://github.com/sarxos
³⁰ https://github.com/sarxos/webcam-capture/issues/11
³¹ https://github.com/sarxos/webcam-capture/pull/68#issuecomment-19867443

Chapter 6
Discussion

We made a number of choices in the TDG and SPG formation in our investigation of socio-technical congruence. We discuss these choices and also consider our overall approach for investigating socio-technical congruence.

6.1 Formulations of TDG

We form the TDG by identifying technical dependencies at the level of Java libraries. However, we do not know if a library is actually used by a project or how often it is used. Hence, an argument could be made to look for technical dependencies at the more fine-grained level of import statements, since such analysis would more accurately capture library usage.
However, such a fine-grained analysis would be quite expensive computationally, since it requires scanning all the source files. For this exploratory study, we chose the computationally cheaper option, and we leave a more fine-grained analysis of technical dependencies as future work.

While forming the TDG, we assume that the POM files, if present in a git repository, specify the technical dependencies completely and correctly. However, we never compile the projects to ascertain if this is in fact true. It is possible that we miss dependencies because library jar files may have been copied in directly, or because a project does not actually use POM files for dependency management.

6.2 Formulations of SPG

In the UIG formation, we give equal weights to all types of interactions, i.e., we do not differentiate between interactions on pull requests and interactions on issues. The weight function from Section 4.2.1 could be changed to

$$\mathrm{weight}(e_{xyp}) = \mathit{weight}_i \times \sum_{i_{xyp} \in I_{xyp}} \left( |i_{xyp}(x)| \times |i_{xyp}(y)| \right) + \mathit{weight}_{pr} \times \sum_{r_{xyp} \in R_{xyp}} \left( |r_{xyp}(x)| \times |r_{xyp}(y)| \right)$$

where
$\mathit{weight}_i$ is the weight of an interaction on an issue
$\mathit{weight}_{pr}$ is the weight of an interaction on a pull request

Experimenting with different weights is left as future work. In this thesis, we give equal weights to different types of interactions, i.e., $\mathit{weight}_i = \mathit{weight}_{pr} = 1$.

The user interaction graph (UIG) potentially underestimates user-to-user interactions, because not all communication related to projects may be captured in the GitHub repository. In this work, we do not consider communication between users that may happen outside of GitHub issues, such as email exchanges and chat conversations between users, or mailing lists that may be used by projects. However, we consider this acceptable because social network analysis in open source projects has been shown to remain valid in the presence of missing or inadequate data. Nia et al. [17] studied the effect of missing links and temporal data aggregation on measures of centrality of nodes, including clustering coefficient, in the network.
They demonstrate on three different OSS projects that while these issues do change network topology, the measures are stable with respect to such changes.

From another perspective, the UIG may also be an over-approximation of actual interactions in GitHub. We assume that two users interacted if they commented on the same issue or pull request. However, one user may not have read a comment by another user. We consider this assumption reasonable because there are a small number of comments on average on issues and pull requests. The average number of comments on an issue is 2.14 and on a pull request is 0.74 (Table 3.1). The small number of comments on each issue and pull request implies that, on average, we discover a small number of user-user interactions per issue or pull request, which reduces the amount of over-approximation.

To form the SPG, we first detect user communities in the UIG. There are other possible approaches to creating an SPG. For instance, we could simply form connections between all those projects that have at least one common user as a contributor. However, this approach could result in connections between projects where a single user interacted with both projects just once. This kind of single isolated connection would probably represent noise rather than an actual connection between two projects. To overcome this problem, we could have selected minimum thresholds for the number of common users and the number of interactions a user needs to have with a project. However, these minimum thresholds would be arbitrary. Hence, we chose to go with community detection, since that provides us a mechanism to filter out the weak interactions.

We did not validate the detected communities with actual users because it is not necessary for the communities found by the community detection algorithm to be recognizable to a user.
We use the community detection algorithm solely as a means to filter out weak connections between users.

When we form the SPG, we form edges between all projects in a community. This approach results in an over-approximation of project-to-project social interactions. Later, for all edges common to both the TDG and the SPG, we examine the activity of all users who contributed to both projects. However, because of the over-approximation, for some of the overlapping edges there are no common users. The number of edges for which this is true is small, 18 out of 157 or 11.46%. As a result, to understand how far apart these projects might be in the UIG, we want to get a sense of the distance between users in a community. Figure 6.1 shows the distances between users in a community in the UIG, both the average distances and the largest distances. Figure 6.1a shows that in most communities users are, on average, two users away from any other user. Figure 6.1b shows that in most communities the largest distance between two users is five.

Figure 6.1: Distances between users in a community. (a) Distribution of average length of shortest paths between two users in a community. (b) Distribution of the largest distance between two users in a community.

6.3 Understanding Socio-Technical Congruence

When comparing the TDG and the SPG, we look for the existence of a connection between two projects in both graphs. However, we ignore the strength of this connection. The strength of a connection in the TDG could be related to the number of libraries from one project on which the other project depends. In terms of the SPG, the strength of a connection could be related to the number of users who contribute to both projects. Investigating connection strength is left to future work.

To understand the origins of socio-technical congruence, we investigate a few project pairs manually.
More sophisticated methods of automatic analysis could be used to classify issue and pull request comments based on their relationship to the technical dependence. For instance, automatic analysis could be used to investigate discussions about a dependence, bugs related to dependences and dependence upgrades. Applying such an analysis would give a more thorough understanding of the categories of how social and technical congruences occur.

Chapter 7
Conclusion

We explored socio-technical congruence in 250 projects on GitHub that use Maven for build dependences. We created a social interaction graph based on developers’ interactions on issues and pull requests, and we used community detection techniques to identify strong user-user interactions. We also created a technical dependence graph based on build dependences specified in POM files used by Maven. We compared these two graphs to get an overview of the congruence, or lack thereof, between social interactions and technical dependences. We found that for 23.6% of the cases in which there is a technical dependence between two projects, there is also evidence of some social interactions. In only 8.67% of the cases when there is a social interaction between projects is there also a technical dependence.

We categorized all cases in which there are both social and technical dependences between projects. We found that for 45 (32%) of project pairs, no social interaction had taken place before the introduction of the technical dependence, and interactions after the introduction of the dependence are often about upgrading the library being depended upon. For 49 (35%) of project pairs, more than 75% (but not all) of the interaction takes place after the introduction of the technical dependence. For the remaining 45 (32%) of project pairs, less than 75% of the interaction takes place after the introduction of the technical dependence.
In the latter two cases, although there is interaction before the technical dependence is introduced, it is not always about the dependence.

We also discussed some of the ways that this exploratory study could be fine-tuned in the future.

Bibliography

[1] C. Bird, D. Pattison, R. D’Souza, V. Filkov, and P. Devanbu. Latent social structure in open source projects. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT ’08/FSE-16, pages 24–35, New York, NY, USA, 2008. ACM.
[2] K. Blincoe, G. Valetto, and S. Goggins. Proximity: A measure to quantify the need for developers’ coordination. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, CSCW ’12, pages 1351–1360, New York, NY, USA, 2012. ACM.
[3] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
[4] T. Browning. Applying the design structure matrix to system decomposition and integration problems: a review and new directions. Engineering Management, IEEE Transactions on, 48(3):292–306, Aug 2001.
[5] M. Cataldo and J. Herbsleb. Coordination breakdowns and their impact on development productivity and software failures. Software Engineering, IEEE Transactions on, 39(3):343–360, March 2013.
[6] M. Cataldo, J. D. Herbsleb, and K. M. Carley. Socio-technical congruence: A framework for assessing the impact of technical and work dependencies on software development productivity. In Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’08, pages 2–11, New York, NY, USA, 2008. ACM.
[7] M. Cataldo, P. A. Wagstrom, J. D. Herbsleb, and K. M. Carley. Identification of coordination requirements: Implications for the design of collaboration and awareness tools. In Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, CSCW ’06, pages 353–362, New York, NY, USA, 2006. ACM.
[8] M. E. Conway. How do committees invent? Datamation, 14(4):28–31, 1968.
[9] B. Curtis, H. Krasner, and N. Iscoe. A field study of the software design process for large systems. Commun. ACM, 31(11):1268–1287, Nov. 1988.
[10] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.
[11] G. Gousios. The GHTorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, pages 233–236, 2013.
[12] J. D. Herbsleb and R. E. Grinter. Splitting the organization and integrating the code: Conway’s law revisited. In Proceedings of the 21st International Conference on Software Engineering, ICSE ’99, pages 85–95, New York, NY, USA, 1999. ACM.
[13] Q. Hong, S. Kim, S. Cheung, and C. Bird. Understanding a developer social network and its evolution. In Software Maintenance (ICSM), 2011 27th IEEE International Conference on, pages 323–332, Sept 2011.
[14] R. E. Kraut and L. A. Streeter. Coordination in software development. Commun. ACM, 38(3):69–81, Mar. 1995.
[15] I. Kwan, A. Schroter, and D. Damian. Does socio-technical congruence have an effect on software build success? A study of coordination in a software project. Software Engineering, IEEE Transactions on, 37(3):307–324, May 2011.
[16] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69:026113, Feb 2004.
[17] R. Nia, C. Bird, P. Devanbu, and V. Filkov. Validity of network analyses in open source projects. In Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on, pages 201–209, May 2010.
[18] M. E. Sosa, S. D. Eppinger, and C. M. Rowles.
The misalignment of product architecture and organizational structure in complex product development. Management Science, 50(12):1674–1689, 2004.
[19] M. Syeed and I. Hammouda. Socio-technical congruence in OSS projects: Exploring Conway’s law in FreeBSD. In E. Petrinja, G. Succi, N. El Ioini, and A. Sillitti, editors, Open Source Software: Quality Verification, volume 404 of IFIP Advances in Information and Communication Technology, pages 109–126. Springer Berlin Heidelberg, 2013.

Appendix A
Projects Explored

The 250 GitHub projects explored in this dissertation:

1. wildfly/wildfly
2. elasticsearch/elasticsearch
3. netty/netty
4. infinispan/infinispan
5. brooklyncentral/brooklyn
6. jcabi/jcabi-github
7. getlantern/lantern
8. Bukkit/CraftBukkit
9. MasDennis/Rajawali
10. neo4j/neo4j
11. openmicroscopy/bioformats
12. mcMMO-Dev/mcMMO
13. hazelcast/hazelcast
14. junit-team/junit
15. turesheim/eclipse-utilities
16. jbosstm/narayana
17. Multiverse/Multiverse-Core
18. mff-uk/ODCS
19. hibernate/hibernate-orm
20. cucumber/cucumber-jvm
21. Catrobat/Catroid
22. Bukkit/Bukkit
23. nathanmarz/storm
24. FamilySearch/gedcomx
25. square/okhttp
26. AndlyticsProject/andlytics
27. irstv/orbisgis
28. gradle/gradle
29. forcedotcom/phoenix
30. ArcBees/GWTP
31. hornetq/hornetq
32. ujhelyiz/EMF-IncQuery
33. BroadleafCommerce/BroadleafCommerce
34. caelum/vraptor4
35. stratosphere/stratosphere
36. molgenis/molgenis
37. dana-i2cat/opennaas
38. hector-client/hector
39. square/dagger
40. gxa/gxa
41. mozilla-services/android-sync
42. FoundationDB/sql-layer
43. square/picasso
44. Sage-Bionetworks/SynapseWebClient
45. pentaho/pentaho-platform
46. weld/core
47. jboss-reddeer/reddeer
48. zanata/zanata-server
49. square/retrofit
50. owncloud/android
51. DataTorrent/Malhar
52. thinkaurelius/titan
53. intermine/intermine
54. rackerlabs/repose
55. SpigotMC/BungeeCord
56. nostra13/Android-Universal-Image-Loader
57. ceylon/ceylon-ide-eclipse
58. metamx/druid
59. cwi-swat/rascal
60. imeji-community/imeji
61. bndtools/bnd
62. Caleydo/caleydo
63. essentials/Essentials
64. fabric8io/fabric8
65. rackerlabs/blueflood
66. SpoutDev/Spout
67. jfeinstein10/SlidingMenu
68. JakeWharton/ActionBarSherlock
69. geoserver/geoserver
70. atlasapi/atlas
71. jim618/multibit
72. hibernate/hibernate-ogm
73. jboss-fuse/fuse
74. kevinweil/elephant-bird
75. jacoco/jacoco
76. l0rdn1kk0n/wicket-bootstrap
77. endosnipe/ENdoSnipe
78. tinkerpop/rexster
79. BaseXdb/basex
80. facebook/presto
81. Prototik/HoloEverywhere
82. lobid/lodmill
83. richardwilly98/elasticsearch-river-mongodb
84. UniversalMediaServer/UniversalMediaServer
85. jboss-switchyard/core
86. overturetool/overture
87. sk89q/worldedit
88. OpenGrok/OpenGrok
89. maxcom/lorsource
90. griddynamics/jagger
91. restlet/restlet-framework-java
92. xetorthio/jedis
93. nutzam/nutz
94. droolsjbpm/jbpm
95. caelum/vraptor
96. sarxos/webcam-capture
97. danieloeh/AntennaPod
98. MythTV-Clients/MythTV-Android-Frontend
99. amplab/tachyon
100. openlegacy/openlegacy
101. kijiproject/kiji-schema
102. jbosstools/jbosstools-integration-tests
103. crawljax/crawljax
104. selendroid/selendroid
105. Graylog2/graylog2-server
106. caprica/vlcj
107. filipg/amu_automata_2011
108. alibaba/RocketMQ
109. itm/testbed-runtime
110. e-ucm/ead
111. laforge49/JActor2
112. Multiverse/Multiverse-Portals
113. SurvivalGamesDevTeam/TheSurvivalGames
114. nla/banjo
115. jberkel/sms-backup-plus
116. rgladwell/m2e-android
117. curtisullerich/attendance
118. EmilHernvall/tregmine
119. libreliodev/android
120. mapstruct/mapstruct
121. Monstercraft/MonsterIRC
122. gwtquery/gwtquery
123. basho/riak-java-client
124. jboss-switchyard/quickstarts
125. jboss-switchyard/tools
126. Adobe-Consulting-Services/acs-aem-commons
127. jbosstools/jbosstools-server
128. springside/springside4
129. apigee/usergrid-stack
130. Silverpeas/Silverpeas-Components
131. graphhopper/graphhopper
132. mitreid-connect/OpenID-Connect-Java-Spring-Server
133. DSH105/EchoPet
134. alibaba/druid
135. jayway/maven-android-plugin
136. undertow-io/undertow
137. dCache/dcache
138. resteasy/Resteasy
139. samarthgupta437/falcon-regression
140. bigbluebutton/bigbluebutton
141. CruGlobal/conf-registration-api
142. demoiselle/behave
143. irstv/H2GIS
144. Findwise/Hydra
145. yegor256/s3auth
146. NightWhistler/PageTurner
147. huskysoft/403Interviewer
148. nasa/mct
149. branflake2267/GWT-Maps-V3-Api
150. ios-driver/ios-driver
151. jclouds/jclouds-chef
152. sk89q/commandhelper
153. Cloudname/cloudname
154. mbax/VanishNoPacket
155. xXKeyleXx/MyPet
156. jbosstools/jbosstools-base
157. jbosstm/quickstart
158. github/android
159. MinecraftPortCentral/MCPC-Plus-Legacy
160. pulse00/Symfony-2-Eclipse-Plugin
161. FenixEdu/fenix
162. Jasig/uPortal
163. The-Dream-Team/Tardis
164. liquibase/liquibase
165. jclouds/jclouds-karaf
166. dbpedia-spotlight/dbpedia-spotlight
167. jankotek/MapDB
168. jbosstools/jbosstools-openshift
169. jboss-switchyard/release
170. yahoo/oozie
171. vkostyukov/la4j
172. Hidendra/LWC
173. OpenNTF/JavascriptAggregator
174. jbosstools/jbosstools-central
175. NineWorlds/serenity-android
176. webbit/webbit
177. maplesyrup/maple-android
178. rydnr/queryj
179. robovm/robovm
180. ceylon/ceylon-module-resolver
181. p6spy/p6spy
182. karma-exchange-org/karma-exchange
183. fixteam/fixflow
184. square/wire
185. jbosstools/jbosstools-jst
186. Jasig/java-cas-client
187. mybatis/mybatis-3
188. redline-smalltalk/redline-smalltalk
189. taoneill/war
190. OSBI/saiku
191. greenlaw110/Rythm
192. eclipse/vert.x
193. Activiti/Activiti
194. symphonytool/symphony
195. Nodeclipse/nodeclipse-1
196. axemblr/axemblr-provisionr
197. MxUpdate/Update
198. eclipsesource/tabris
199. mongodb/morphia
200. tcurdt/jdeb
201. aws/aws-sdk-java
202. javajigi/slipp
203. docdoku/docdoku-plm
204. jenkinsci/git-client-plugin
205. capedwarf/capedwarf-blue
206. TooTallNate/Java-WebSocket
207. jantje/arduino-eclipse-plugin
208. joel-costigliola/assertj-core
209. MarkehMe/FactionsPlus
210. tntim96/JSCover
211. pardom/ActiveAndroid
212. chrisbanes/PhotoView
213. Governance/s-ramp
214. alkarinv/BattleArena
215. Microsoft-CISL/REEF
216. ralscha/extdirectspring
217. nhaarman/ListViewAnimations
218. SeqWare/seqware
219. greenlaw110/play-morphia
220. facebook/swift
221. cbeust/testng
222. alexruiz/fest-assert-2.x
223. robotoworks/mechanoid
224. dropwizard/dropwizard
225. np98765/BattleKits
226. todoroo/astrid
227. wuetherich/bundlemaker
228. jknack/handlebars.java
229. jline/jline2
230. ps3mediaserver/ps3mediaserver
231. moagrius/TileView
232. andrewphorn/ClassiCube-Client
233. janinko/ghprb
234. openplanets/scout
235. compbio-UofT/medsavant
236. sonyxperiadev/BacklogTool
237. Docear/Desktop
238. objectos/objectos-dojo
239. jbosstools/jbosstools-vpe
240. marytts/marytts
241. ceylon/ceylon-runtime
242. BatooOrg/BatooJPA
243. korpling/ANNIS
244. backmeup/backmeup-prototype
245. carrotsearch/randomizedtesting
246. bguerout/jongo
247. kijiproject/kiji-mapreduce
248. sqlparser/sql2jooq
249. aerogear/aerogear-android
250. CompendiumNG/CompendiumNG

Appendix B
Project Pairs With Both Technical Dependence and Social Interactions

The 139 project pairs which have common users. We categorize these projects in 3 categories in Section 5.2.

1. Governance/s-ramp ↔ jboss-switchyard/release
2. jclouds/jclouds-karaf ↔ jclouds/jclouds-chef
3. greenlaw110/Rythm ↔ alibaba/druid
4. ceylon/ceylon-runtime ↔ ceylon/ceylon-module-resolver
5. neo4j/neo4j ↔ rgladwell/m2e-android
6. np98765/BattleKits ↔ Bukkit/CraftBukkit
7. jbosstm/quickstart ↔ weld/core
8. jbosstm/quickstart ↔ jbosstm/narayana
9. jbosstm/quickstart ↔ wildfly/wildfly
10. pardom/ActiveAndroid ↔ rgladwell/m2e-android
11. undertow-io/undertow ↔ netty/netty
12. JakeWharton/ActionBarSherlock ↔ rgladwell/m2e-android
13. JakeWharton/ActionBarSherlock ↔ todoroo/astrid
14. JakeWharton/ActionBarSherlock ↔ square/wire
15. MarkehMe/FactionsPlus ↔ Bukkit/CraftBukkit
16. MarkehMe/FactionsPlus ↔ essentials/Essentials
17. weld/core ↔ cbeust/testng
18. weld/core ↔ wildfly/wildfly
19. weld/core ↔ fabric8io/fabric8
20. jboss-switchyard/release ↔ jboss-switchyard/quickstarts
21. bguerout/jongo ↔ joel-costigliola/assertj-core
22. BroadleafCommerce/BroadleafCommerce ↔ cbeust/testng
23. greenlaw110/play-morphia ↔ mongodb/morphia
24. webbit/webbit ↔ netty/netty
25. jboss-switchyard/tools ↔ jboss-switchyard/release
26. kijiproject/kiji-mapreduce ↔ kijiproject/kiji-schema
27. Bukkit/CraftBukkit ↔ jline/jline2
28. infinispan/infinispan ↔ weld/core
29. infinispan/infinispan ↔ resteasy/Resteasy
30. infinispan/infinispan ↔ wildfly/wildfly
31. infinispan/infinispan ↔ netty/netty
32. infinispan/infinispan ↔ fabric8io/fabric8
33. jbosstm/narayana ↔ jbosstm/quickstart
34. jbosstm/narayana ↔ undertow-io/undertow
35. jbosstm/narayana ↔ weld/core
36. jbosstm/narayana ↔ wildfly/wildfly
37. p6spy/p6spy ↔ liquibase/liquibase
38. alkarinv/BattleArena ↔ Bukkit/CraftBukkit
39. alkarinv/BattleArena ↔ mbax/VanishNoPacket
40. alkarinv/BattleArena ↔ mcMMO-Dev/mcMMO
41. robovm/robovm ↔ rgladwell/m2e-android
42. square/retrofit ↔ rgladwell/m2e-android
43. square/retrofit ↔ square/wire
44. square/retrofit ↔ square/okhttp
45. mbax/VanishNoPacket ↔ Bukkit/CraftBukkit
46. SpoutDev/Spout ↔ jline/jline2
47. SpoutDev/Spout ↔ netty/netty
48. jboss-switchyard/quickstarts ↔ weld/core
49. jboss-switchyard/quickstarts ↔ jboss-switchyard/release
50. jboss-switchyard/quickstarts ↔ jbosstm/narayana
51. jboss-switchyard/quickstarts ↔ wildfly/wildfly
52. chrisbanes/PhotoView ↔ nostra13/Android-Universal-Image-Loader
53. mcMMO-Dev/mcMMO ↔ Bukkit/CraftBukkit
54. alibaba/druid ↔ mybatis/mybatis-3
55. alibaba/druid ↔ nutzam/nutz
56. NineWorlds/serenity-android ↔ rgladwell/m2e-android
57. resteasy/Resteasy ↔ weld/core
58. resteasy/Resteasy ↔ infinispan/infinispan
59. nostra13/Android-Universal-Image-Loader ↔ rgladwell/m2e-android
60. nostra13/Android-Universal-Image-Loader ↔ square/wire
61. nhaarman/ListViewAnimations ↔ rgladwell/m2e-android
62. irstv/H2GIS ↔ irstv/orbisgis
63. springside/springside4 ↔ mybatis/mybatis-3
64. springside/springside4 ↔ joel-costigliola/assertj-core
65. ios-driver/ios-driver ↔ cbeust/testng
66. ios-driver/ios-driver ↔ webbit/webbit
67. wildfly/wildfly ↔ undertow-io/undertow
68. wildfly/wildfly ↔ cbeust/testng
69. wildfly/wildfly ↔ weld/core
70. wildfly/wildfly ↔ infinispan/infinispan
71. wildfly/wildfly ↔ jbosstm/narayana
72. wildfly/wildfly ↔ resteasy/Resteasy
73. github/android ↔ rgladwell/m2e-android
74. square/picasso ↔ rgladwell/m2e-android
75. square/picasso ↔ square/wire
76. square/picasso ↔ square/okhttp
77. alibaba/RocketMQ ↔ alibaba/druid
78. alibaba/RocketMQ ↔ netty/netty
79. jboss-fuse/fuse ↔ jclouds/jclouds-karaf
80. jboss-fuse/fuse ↔ bndtools/bnd
81. jboss-fuse/fuse ↔ wildfly/wildfly
82. jboss-fuse/fuse ↔ fabric8io/fabric8
83. hibernate/hibernate-ogm ↔ infinispan/infinispan
84. hibernate/hibernate-ogm ↔ resteasy/Resteasy
85. DSH105/EchoPet ↔ Bukkit/CraftBukkit
86. DSH105/EchoPet ↔ mbax/VanishNoPacket
87. DSH105/EchoPet ↔ sk89q/worldedit
88. Hidendra/LWC ↔ Bukkit/CraftBukkit
89. elasticsearch/elasticsearch ↔ jboss-fuse/fuse
90. essentials/Essentials ↔ Bukkit/CraftBukkit
91. sk89q/commandhelper ↔ Bukkit/CraftBukkit
92. Multiverse/Multiverse-Portals ↔ sk89q/worldedit
93. Multiverse/Multiverse-Portals ↔ Multiverse/Multiverse-Core
94. richardwilly98/elasticsearch-river-mongodb ↔ cbeust/testng
95. richardwilly98/elasticsearch-river-mongodb ↔ elasticsearch/elasticsearch
96. MinecraftPortCentral/MCPC-Plus-Legacy ↔ SpigotMC/BungeeCord
97. SpigotMC/BungeeCord ↔ jline/jline2
98. SpigotMC/BungeeCord ↔ netty/netty
99. jboss-switchyard/core ↔ jboss-switchyard/release
100. eclipse/vert.x ↔ hazelcast/hazelcast
101. eclipse/vert.x ↔ netty/netty
102. cucumber/cucumber-jvm ↔ cbeust/testng
103. cucumber/cucumber-jvm ↔ webbit/webbit
104. cucumber/cucumber-jvm ↔ rgladwell/m2e-android
105. demoiselle/behave ↔ cucumber/cucumber-jvm
106. Multiverse/Multiverse-Core ↔ Bukkit/CraftBukkit
107. capedwarf/capedwarf-blue ↔ undertow-io/undertow
108. capedwarf/capedwarf-blue ↔ infinispan/infinispan
109. capedwarf/capedwarf-blue ↔ resteasy/Resteasy
110. capedwarf/capedwarf-blue ↔ wildfly/wildfly
111. yegor256/s3auth ↔ jcabi/jcabi-github
112. square/wire ↔ square/retrofit
113.
square/wire ↔ rgladwell/m2e-android49114. jayway/maven-android-plugin ↔ rgladwell/m2e-android115. square/okhttp ↔ rgladwell/m2e-android116. selendroid/selendroid ↔ netty/netty117. apigee/usergrid-stack ↔ hector-client/hector118. maxcom/lorsource ↔ elasticsearch/elasticsearch119. xXKeyleXx/MyPet ↔ Bukkit/CraftBukkit120. xXKeyleXx/MyPet ↔ alkarinv/BattleArena121. xXKeyleXx/MyPet ↔ mcMMO-Dev/mcMMO122. Monstercraft/MonsterIRC ↔ Bukkit/CraftBukkit123. thinkaurelius/titan ↔ tinkerpop/rexster124. thinkaurelius/titan ↔ elasticsearch/elasticsearch125. facebook/presto ↔ jline/jline2126. facebook/presto ↔ aws/aws-sdk-java127. facebook/presto ↔ hector-client/hector128. symphonytool/symphony ↔ overturetool/overture129. ArcBees/GWTP ↔ gwtquery/gwtquery130. Jasig/uPortal ↔ Jasig/java-cas-client131. fabric8io/fabric8 ↔ jclouds/jclouds-karaf132. fabric8io/fabric8 ↔ wildfly/wildfly133. fabric8io/fabric8 ↔ jboss-fuse/fuse134. fabric8io/fabric8 ↔ elasticsearch/elasticsearch135. taoneill/war ↔ mbax/VanishNoPacket136. square/dagger ↔ rgladwell/m2e-android137. square/dagger ↔ square/wire138. jclouds/jclouds-chef ↔ jclouds/jclouds-karaf139. sarxos/webcam-capture ↔ netty/nettyThe 18 project pairs which have no common users. These are a result of overapproximation of social interactions in SPG.1. infinispan/infinispan ↔ cbeust/testng2. mapstruct/mapstruct ↔ weld/core3. apigee/usergrid-stack ↔ jline/jline24. jclouds/jclouds-chef ↔ cbeust/testng505. jboss-fuse/fuse ↔ jclouds/jclouds-chef6. fabric8io/fabric8 ↔ jclouds/jclouds-chef7. fabric8io/fabric8 ↔ bndtools/bnd8. infinispan/infinispan ↔ jbosstm/narayana9. capedwarf/capedwarf-blue ↔ fabric8io/fabric810. sk89q/commandhelper ↔ jline/jline211. MarkehMe/FactionsPlus ↔ sk89q/worldedit12. xXKeyleXx/MyPet ↔ sk89q/worldedit13. alkarinv/BattleArena ↔ sk89q/worldedit14. MinecraftPortCentral/MCPC-Plus-Legacy ↔ jline/jline215. xetorthio/jedis ↔ rgladwell/m2e-android16. Activiti/Activiti ↔ bndtools/bnd17. Activiti/Activiti ↔ mybatis/mybatis-318. 
dropwizard/dropwizard ↔ joel-costigliola/assertj-core51Appendix CTimeline Plots for Category AProject PairsFor project pairs in Category A, all social activity occurs after the introduction oftechnical dependence.52Figure C.1: Category A project pairs (1-6)53Figure C.2: Category A project pairs (7-12)54Figure C.3: Category A project pairs (13-18)55Figure C.4: Category A project pairs (19-24)56Figure C.5: Category A project pairs (25-30)57Figure C.6: Category A project pairs (31-36)58Figure C.7: Category A project pairs (37-42)59Figure C.8: Category A project pairs (43-45)60Appendix DTimeline Plots for Category BProject PairsFor project pairs in Category B, 75% social activity comes after the introduction oftechnical dependence.61Figure D.1: Category B project pairs (1-6)62Figure D.2: Category B project pairs (7-12)63Figure D.3: Category B project pairs (13-18)64Figure D.4: Category B project pairs (19-24)65Figure D.5: Category B project pairs (25-30)66Figure D.6: Category B project pairs (31-36)67Figure D.7: Category B project pairs (37-42)68Figure D.8: Category B project pairs (43-48)69Figure D.9: Category B project pair (49)70Appendix ETimeline Plots for Category CProject PairsFor project pairs in Category C, less than 75% of social activity occurs after theintroduction of technical dependence.71Figure E.1: Category C project pairs (1-6)72Figure E.2: Category C project pairs (7-12)73Figure E.3: Category C project pairs (13-18)74Figure E.4: Category C project pairs (19-24)75Figure E.5: Category C project pairs (25-30)76Figure E.6: Category C project pairs (31-36)77Figure E.7: Category C project pairs (37-42)78Figure E.8: Category C project pairs (43-45)79Appendix FCommunities Detected in UIGThe 22 communities detected in UIG which result in social connections for the 157project pairs with both technical dependence and social interactions.In these plots nodes are users and the edges are the interaction between users.Different edge colors within a community 
represent the different projects overwhich the user interactions occurred. The captions show the number of verticesor users in the community (∣V ∣), the number of edges (∣E ∣), and the number ofprojects on which the interactions took place (∣P∣).The remaining 82 communities are not shown here. A large number of thosecommunities involve only one project.80(a) ∣V ∣ = 288, ∣E ∣ = 722, ∣P∣ = 11 (b) ∣V ∣ = 360, ∣E ∣ = 768, ∣P∣ = 10(c) ∣V ∣ = 15, ∣E ∣ = 84, ∣P∣ = 2 (d) ∣V ∣ = 527, ∣E ∣ = 2463, ∣P∣ = 21(e) ∣V ∣ = 263, ∣E ∣ = 1927, ∣P∣ = 8 (f) ∣V ∣ = 16, ∣E ∣ = 54, ∣P∣ = 2Figure F.1: Communities detected in UIG (1-6)81(a) ∣V ∣ = 50, ∣E ∣ = 111, ∣P∣ = 2 (b) ∣V ∣ = 420, ∣E ∣ = 1248, ∣P∣ = 7(c) ∣V ∣ = 116, ∣E ∣ = 141, ∣P∣ = 2 (d) ∣V ∣ = 32, ∣E ∣ = 113, ∣P∣ = 5(e) ∣V ∣ = 692, ∣E ∣ = 2693, ∣P∣ = 23 (f) ∣V ∣ = 671, ∣E ∣ = 2188, ∣P∣ = 8Figure F.2: Communities detected in UIG (7-12)82(a) ∣V ∣ = 797, ∣E ∣ = 2723, ∣P∣ = 19 (b) ∣V ∣ = 357, ∣E ∣ = 1537, ∣P∣ = 19(c) ∣V ∣ = 85, ∣E ∣ = 149, ∣P∣ = 4 (d) ∣V ∣ = 3045, ∣E ∣ = 12251, ∣P∣ = 23(e) ∣V ∣ = 1866, ∣E ∣ = 29245, ∣P∣ = 7 (f) ∣V ∣ = 20, ∣E ∣ = 31, ∣P∣ = 2Figure F.3: Communities detected in UIG (13-18)83(a) ∣V ∣ = 1992, ∣E ∣ = 7585, ∣P∣ = 20 (b) ∣V ∣ = 638, ∣E ∣ = 2331, ∣P∣ = 6(c) ∣V ∣ = 61, ∣E ∣ = 333, ∣P∣ = 3 (d) ∣V ∣ = 17, ∣E ∣ = 32, ∣P∣ = 2Figure F.4: Communities detected in UIG (19-22)84
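To illustrate how the three caption values for a community relate to the underlying interaction data, the following minimal Python sketch computes the number of users, edges and projects from a list of interactions. It is an illustration only and not the tooling used in this study: the `Interaction` record, the `community_summary` function and the sample user and project names are hypothetical. It also assumes that the same pair of users interacting on two different projects counts as two edges (one per edge color); the text above does not state whether edges were counted this way.

```python
from typing import Iterable, NamedTuple


class Interaction(NamedTuple):
    """One user-to-user interaction observed on a project (hypothetical record)."""
    user_a: str
    user_b: str
    project: str


def community_summary(interactions: Iterable[Interaction]) -> dict:
    """Return the counts |V|, |E| and |P| for one community's interactions."""
    users = set()
    edges = set()
    projects = set()
    for it in interactions:
        users.update((it.user_a, it.user_b))
        # Assumption: an edge is an unordered user pair on a given project,
        # so the same pair interacting on two projects yields two edges.
        edges.add((frozenset((it.user_a, it.user_b)), it.project))
        projects.add(it.project)
    return {"V": len(users), "E": len(edges), "P": len(projects)}


if __name__ == "__main__":
    # Made-up sample data, not taken from the study.
    sample = [
        Interaction("alice", "bob", "org/project-x"),
        Interaction("bob", "alice", "org/project-x"),  # same edge, reversed order
        Interaction("alice", "bob", "org/project-y"),
        Interaction("bob", "carol", "org/project-y"),
    ]
    print(community_summary(sample))
```

Storing each edge as an unordered pair together with its project keeps repeated interactions between the same two users on the same project from inflating the edge count, while still distinguishing the projects that the edge colors represent.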
