UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A new change propagation metric to assess software evolvability Gonzalez, Marco A. 2013

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
24-ubc_2013_fall_gonzalez_marco.pdf [ 3.09MB ]
Metadata
JSON: 24-1.0073917.json
JSON-LD: 24-1.0073917-ld.json
RDF/XML (Pretty): 24-1.0073917-rdf.xml
RDF/JSON: 24-1.0073917-rdf.json
Turtle: 24-1.0073917-turtle.txt
N-Triples: 24-1.0073917-rdf-ntriples.txt
Original Record: 24-1.0073917-source.json
Full Text
24-1.0073917-fulltext.txt
Citation
24-1.0073917.ris

Full Text

A NEW CHANGE PROPAGATION METRIC TO ASSESS SOFTWARE EVOLVABILITY  by Marco A Gonzalez  B.A.Sc., Instituto Tecnológico y de Estudios Superiores de Monterrey, 1998.  A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF  MASTER OF APPLIED SCIENCE  in The Faculty of Graduate Studies  (Electrical and Computer Engineering)  THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) June 2013  © Marco A Gonzalez, 2013  Abstract The development of software-intensive systems faces many challenges; one of the most important from an economic perspective is to reduce their maintenance costs. This thesis proposes a modified change propagation metric as a tool to assist the analysis of evolvability and maintainability of a software system and to ultimately support the reduction of its maintenance cost.  The technical complexity of software systems has a great impact on their ability to make increased functionality and adaptability to the environment possible. One approach to understand and master the complexity of large software systems, varying from thousands to millions of lines of source code, is through software architecture. This study examines a sample of software systems from the dependencies of their static structural view. The dependencies and their importance are expressed as a design structure matrix (DSM) that is used as an indicator to reflect the strength of dependence and connection among the different modules. In this thesis, we propose a “modified change propagation” metric as a set of incremental improvements over the original Propagation Cost (PC) metric proposed by MacCormack (2008). Our improved metric uses dependencies weighted with strength to convey more information about the incidence of strongly connected relationships and it discounts weak dependencies. Moreover the original propagation metrics considered that the system should be acyclical; but we found that in practice a very few real systems are exempt of cycles. Furthermore, if cyclic dependencies are heavy rather than weak then these cycles should be treated differently. Finally, our metric is normalized to minimize the effect of both change in the total depth of the dependency graph, and increases in the size of the code.  Our modified change propagation metric can help software designers assess the maintainability of a software system at design time and over a proposed release sequence by comparing change propagation measures for different designs of software architecture. For instance, after refactoring. We validated our metric both on a system developed at UBC, and on several large open-source repositories for which we were able to obtain long release histories.  ii  Preface All of the work presented hereafter was conducted in the Software Engineering and Architecture Laboratory, and at the Electrical Power and Energy Systems at the University of British Columbia, Point Grey campus.  Chapter 3 and 5 were written with the work done on (Nord 2012). The system analyzed was DR-NEP provided by the author, the analysis of the rework cost was done following the methodology provided by Nord R. and Ozkaya I. Philippe Kruchten coordinated the work. The work on (González-Rojas 2011) was used to implement the DR-NEP system supervised by Dr . DR-NEP was an project funded by Canarie for the Electrical Power and Energy Systems Lab under the supervision of Jose Marti.  For Chapter 5 and 7 were written based on the work from (Kruchten 2012). Kruchten provided guidance onto reviewing the matrix operations for a cost propagation metric.  I wrote the thesis and Dr. Philippe Kruchten provided guidance and revisions for the manuscript.  iii  Table of Contents Abstract................................................................................................................................. ii Preface ................................................................................................................................ iii Table of Contents................................................................................................................. iv Acknowledgements ............................................................................................................. xii Dedication .......................................................................................................................... xiii 1  Introduction ................................................................................................................... 1 1.1  Background............................................................................................................ 1  1.1.1 Software Architecture ......................................................................................... 1 1.1.2 Software Evolution ............................................................................................. 2 1.1.3 Iterative and Incremental Development .............................................................. 2 1.2  Value of Software Architecture ............................................................................... 3  1.2.1 Software Architecture and Evolution. .................................................................. 3 1.3  Technical Debt ....................................................................................................... 4  1.3.1 Technical Debt at the Architecture Level ............................................................ 4 1.3.2 How to Prevent Technical Debt Originated from the Software Design ................ 5  2  1.4  Propagation Cost ................................................................................................... 6  1.5  Limitations of Propagation Cost ............................................................................. 7  1.6  Research Method .................................................................................................. 8  1.7  Structure of the Thesis ........................................................................................... 9  Related Work ...............................................................................................................10 2.1  Introduction ...........................................................................................................10  iv  2.2  Software Evolution ................................................................................................10  2.2.1 Environment ......................................................................................................10 2.2.2 Evolution ...........................................................................................................11 2.3  Modularity .............................................................................................................12  2.4  Metrics ..................................................................................................................14  2.4.1 Granularity.........................................................................................................17  3  2.5  Value and Software Development .........................................................................17  2.6  Technical Debt ......................................................................................................18  2.7  Propagation Cost ..................................................................................................19  Approach Taken...........................................................................................................21  3.1  Introduction ...........................................................................................................21  3.2  Internal and External Module Dependencies .........................................................21  3.3  Analysis Tool ........................................................................................................22  3.3.1 Analysis Assumptions........................................................................................22 3.3.1.1 Dependency Strength Count ......................................................................23 3.4  Validation ..............................................................................................................24  3.4.1 In-House System ...............................................................................................25 3.4.2 Open-Source Systems ......................................................................................28 3.4.2.1 Hadoop ......................................................................................................28 3.4.2.2 PDFBox ......................................................................................................29 3.4.2.3 Ant .............................................................................................................30 3.4.2.4 Checkstyle..................................................................................................36 3.4.2.5 Detailed Analysis ........................................................................................37 3.4.2.5.1 Normalizing by Visibility Length 1...........................................................43 3.4.2.5.2 Adjustment in the Analysis of Checkstyle ...............................................44  v  3.4.2.5.3 Similarities with other Metrics..................................................................44 3.5 4  Sum Variant Breadth Sensitivity ............................................................................45  Deriving Value for Software Architecture from User-visible Concerns ..........................52 4.1  Introduction ...........................................................................................................52  4.2  Value for Invisible Elements in the Backlog ...........................................................53  4.2.1 Benefit for Architectural Elements......................................................................55 4.3  Mapping Benefit from Features to Architectural Elements .....................................56  4.3.1 Scheduling ........................................................................................................59 4.3.2 Implementation ..................................................................................................60 4.4 5  Conclusion ............................................................................................................65  Dealing with Dependency Strength ..............................................................................66 5.1  Introduction ...........................................................................................................66  5.2  Propagation Cost ..................................................................................................66  5.2.1 Limitation of Boolean Dependencies .................................................................67 5.2.2 Modified Dot Product .........................................................................................69 5.2.3 Parallel Operator   ..........................................................................................69  5.3  Change Propagation Analysis ...............................................................................70  5.4  Change Propagation Estimation Proposed Variants..............................................71  5.5  Validating the 7 Variants .......................................................................................74  5.5.1 Case 1: Two Parallel Paths for the Same Destination........................................75 5.5.2 Case 2: Two Parallel Paths for Two Destinations ..............................................76 5.5.3 Case 3: Layered Structure with Multiple Parallel Paths Version 1 ......................79 5.5.4 Case 4: Layered Structure with Multiple Parallel Paths Version 2 ......................81 5.6  Normalization ........................................................................................................83  vi  5.7  Refactoring with Mediators (Brokers) ....................................................................85  5.7.1 Use All Structure 5 Elements 2 Layers ..............................................................85 5.7.2 Use All Structure 13 Elements Two Layers ........................................................86 5.7.3 Use All 5 Elements 2 Layers with Three Layers.................................................87 5.7.3.1 Mediators ...................................................................................................87  5.7.3.1.1 5 Elements 3 Layers One Mediator Structure .........................................87 5.7.3.1.2 13 Elements 3 Layers One Mediator .......................................................88 5.8 6  Conclusions ..........................................................................................................90  Addressing Cycles .......................................................................................................91 6.1  Introduction ...........................................................................................................91  6.2  Transitivity and Change Propagation ....................................................................91  6.3  Change Propagation Metric and Cycled Dependencies ........................................92  6.3.1 Matrix Powers and Propagation Visibility ...........................................................93 6.3.1.1 Indirect Dependencies ................................................................................93 6.4  Systems with Cyclical Dependencies ....................................................................96  6.4.1 Tactics to Address Cycles .................................................................................98 6.4.1.1 Change Propagation with Cycles................................................................99 6.4.1.2 Ignoring Cycling Dependencies ..................................................................99 6.4.2 Limiting the Impact of Cycles ...........................................................................100  7  6.5  Cycle Detection ...................................................................................................103  6.6  Conclusions ........................................................................................................104  Dealing with the Depth of Propagation .......................................................................106 7.1  Introduction .........................................................................................................106  7.2  Propagation and Depth Estimation......................................................................106  vii  8  7.3  Interdependency .................................................................................................109  7.4  Conclusion ..........................................................................................................110  Conclusions ...............................................................................................................111 8.1  Introduction .........................................................................................................111  8.2  Research Goals Summary ..................................................................................111  8.3  Contributions.......................................................................................................113  8.3.1 Propagation Breadth and Depth ......................................................................114 8.3.2 Cycles .............................................................................................................116 8.4  Future Work ........................................................................................................116  8.4.1 Level of Analysis .............................................................................................116 8.4.2 Propagation Decrease for Path Length Increase .............................................117 8.4.3 Dependency Types .........................................................................................117 8.4.4 Internal Dependencies.....................................................................................118 8.4.5 Analysis per Release .......................................................................................118 8.4.6 Visibility ...........................................................................................................119 8.5  Conclusion ..........................................................................................................119  References .......................................................................................................................121  viii  List of Tables Table 3-1 Gross change propagation, V, and normalized change propagation, V/n2 ...........42 Table 3-2 Checkstyle 2.0 change propagation per length ....................................................46 Table 3-3 MAX change propagation per path length for checkstyle-5.0, scaled by path length ...........................................................................................................................47 Table 3-4 SUM change propagation evolution per path leength for checkstyle-5.0, scaled by ....................................................................................................................................48 Table 3-5 Divide by Length (DIVPOW) ................................................................................50 Table 3-6 Increase of change propagation for checkstyle 5.0 MAX vs SUM ........................51 Table 4-1 Architectural Element before mapping.................................................................62 Table 4-2 Feature after mapping .........................................................................................63 Table 4-3 Development element types prioritized for scheduling .........................................64 Table 5-1. Dot Product for system with two parallel paths ...................................................76 Table 5-2 Change propagation for variants .........................................................................76 Table 5-3 Parallels change propagation ..............................................................................78 Table 5-4 Parallels eliminated change propagation .............................................................78 Table 5-5 paths of length 2 (M2) and length 3 (M3) .............................................................80 Table 5-6 Change Propagation for ......................................................................................80 Table 5-7 Change propagation for symmetric and asymmetric systems ..............................82 Table 5-8 Symmetric and Asymmetric paths of length 3......................................................82  ix  List of Figures Figure 2-1 Technical Debt by McConnell ............................................................................................. 19 Figure 3-1 DRNEP Path #1 Deliver soon design structure matrix ....................................................... 26 Figure 3-2 DRNEP Path #2 Reduce rework design structure matrix ................................................... 27 Figure 3-3 Propagation cost and modified change propagation for DRNEP Paths #1 and #2 ............ 27 Figure 3-4 Hadoop sources lines of code, PC, and MCP (v/n^2) ........................................................ 29 Figure 3-5 PDFBox source lines of code, PC, and MCP (V/N2) ........................................................ 30 Figure 3-6 Sources lines of code versions and cycles for ANT 1.4.1 to 1.9.0 ..................................... 31 Figure 3-7 Visibility for ANT versions 1.4.1 to 1.9.0 for 2  nd  level in Figure 3-1 .................................... 33  Figure 3-8 Ant 1.8.4 expanded to 2nd level and taskdefs and types modules to 3rd level expansion ..................................................................................................................................................... .34 Figure 3-9 Ant Change Propagation version 1.4.1 to 1.9.0 .................................................................. 35 Figure 3-10 Checkstyle SLOC, PC, MCP, and SBO from Bouwers .................................................... 37 Figure 3-11 Checkstyle propagation compared to SBO, CSU, CB ...................................................... 39 Figure 3-12 10x zoom for Visibility, V ................................................................................................... 40 Figure 3-13 5x zoom for Change propagation, V/n^2 .......................................................................... 41 Figure 3-14 Zoom for Change propagation, V/n^2 ............................................................................... 42 1  2  Figure 3-15 V divided by D and compared vs V divided by n ............................................................ 43 Figure 4-1 value system framework ..................................................................................................... 53 Figure 4-2 Mapping of features to architectural elements .................................................................... 57 Figure 4-3 Priority(value) estimated for Architectural Elements (yellow) ............................................. 62 Figure 4-4 Tasks assigned to iterations ............................................................................................... 64 Figure 5-1 Indirect dependency from A to C ........................................................................................ 69 Figure 5-2 Parallel Operator,   .......................................................................................................... 70  Figure 5-3 System with two parallel paths (strength to be divided by 10*) .......................................... 75 Figure 5-4 Four parallel paths .............................................................................................................. 77 Figure 5-5 Dot product for system with four parallel paths .................................................................. 77  x  Figure 5-6 Parallels eliminated ............................................................................................................. 78 Figure 5-7 Layered structure with multiple paths version 1 ................................................................. 79 Figure 5-8 Multiple paths asymmetric .................................................................................................. 81 Figure 5-9 Use All 5 elements 2 levels ................................................................................................. 85 Figure 5-10 Use 13 elements 2 layers ................................................................................................. 86 Figure 5-11 Use all 3 levels .................................................................................................................. 87 Figure 5-12 5 elements 3 layers one mediator structure ..................................................................... 88 Figure 5-13 13 elements, 3 layers, one mediator................................................................................. 89 Figure 6-1  a) Directed graph  b) Design structure matrix ...................................................... 92  Figure 6-2 Expanded dependencies transitive closure ........................................................................ 92 Figure 6-3 Paths of length 2 ................................................................................................................. 95 2  Figure 6-4 Cycles saturate at D .......................................................................................................... 97 Figure 6-5 Dependency Structure Matrix with hierarchy ...................................................................... 99 Figure 6-6 System with cycles............................................................................................................ 101 Figure 6-7 Steps of cycled path.......................................................................................................... 101 Figure 6-8 Power 2 obtains all the paths of length 2 ......................................................................... 102 Figure 6-9. Power 3 obtains all paths of length 3 ............................................................................... 102 Figure 6-10. Power 4 obtains all paths of length 4 ............................................................................. 103 Figure 7-1 Propagation Depth ............................................................................................................ 106  xi  Acknowledgements  Thanks for Dr. Philippe Kruchten for his help and support during my research, I woul also like to thank to Dr. Jose R. Marti from the Electrical and Power Energy Systems Laboratory for his support and collaboration in the Disaster-Response Network Enabled Platform. Thanks to my friends Cesar E. Lopez Castellanos, Rui Ren. William Wang, Pranab Kini, Thanks to Cameron Dale, Effron Esseiva, Danuta Jaworski, and others from IBM for their support in the DRNEP project. Thanks to Ipek Ozkaya, Robert Nord, Julien Delange at the Software Engineering Institute at CMU. I want to thank NSERC for partially funding my research.  Thanks to my friends who helped me, especially Chris Campbell. Also thanks to Larix Lee, Mandana Sotoodeh, Steve Adolph, Erin Lim, and others at the Software Engineering and Architecture Laboratory. I also like to thank Neeraj Sangal at Lattix for his support.  xii  Dedication  Dedicated with love to:  to Sergio, Fernanda, Deya, Edgard, Martha, Ignacio, and my extended family in Vancouver.  xiii  1 Introduction  Software systems provide tools that benefit a wide range of human activities. However, the development of software systems face several challenges; one of the most important from the economic and technical perspective is to reduce its maintenance costs. This thesis proposes a modified change propagation metric as a vehicle to assist in the analysis of evolvability and to ultimately support in the reduction of maintenance costs.  1.1 Background 1.1.1 Software Architecture One approach to understand and master the complexity of large software systems is through software architecture. Software systems are complex products varying from thousands to millions of lines of source code, embodying implicit design decisions, internal and external constraints, different technical and non-technical concerns, etc. Software systems have different concerns depending on the environment they interact with, and this presents an engineering and management challenge. Software architecture is a relatively new discipline that aims to understand the properties, elements, relationships, and environment of a software system. Software architecture provides guidelines and practices to elicit the goals and properties and to reason about the design decisions that will produce a system aimed at fulfilling these properties and the intended functionality. Software architecture also provides guidelines on how to analyze the tradeoffs among the properties of a system, and non-monetary estimation of the benefits and costs of the properties of a software system in an abstract way. The software architecture of a software system can be seen as the set of all abstract and implemented design decisions, the  1  elements that conform a system, and the relationships that determine the properties of a system (ISO/42010 2011).  1.1.2 Software Evolution A software system is subject to interactions and pressures that force it to evolve; this phenomenon is referred as software evolution. A software system is a subsystem that interacts with other subsystems in an environment that conforms a broader dynamic system that will change and force the software subsystem to be adapted. For instance, cross-domain systems that involve a social context constantly adapt to changes derived from the dynamic social component. External changes force the software system to be modified, and after a number of adaptations the initially correct designed software architecture will successively accumulate a mismatch between its structure and the present conditions in the environment. This mismatch in most of the times is solved by mappings and other modifications in the software system, which increases the size and complexity of the system. Therefore, the maintenance cost increases over time and a redesign is needed to evolve the structure to avoid increasing mismatch that provokes decay in a system.  1.1.3 Iterative and Incremental Development Software development professionals had improved delivery time and a reduced cost by developing iteratively and incrementally. The previous waterfall development model proposed to sequentially execute the requirements specification, design, construction, validation and maintenance phases. This produced a high rate of failed projects due to discrepancies and errors but mainly because of miscommunication in each of the phases of the waterfall model. Iterative development divides an entire development project in several part or iterations each with a requirement, design, construct and validation phases. This division allowed to construct  2  part of the requirements which also enabled to validate not only the design and construction, but also to receive feedback and more importantly improve the requirements specification which in some cases accounts for more than 50% of the errors and mismatches (McConnell 2006; Apache 2002). Iterative development has also allowed to incrementally deliver a system by partial releases, allowing software development organizations to increase the quality and reduce costs.  1.2 Value of Software Architecture While software professionals agree that software design has a value it is difficult to demonstrate that the architecture design has value for a particular system. Considering value as the ratio of benefit to cost, in software development most of the cost is directly related with the effort assigned to tasks; this is similar for software architecture analysis and design. However, the benefits are difficult to make explicit since they manifest in the medium and long term or in properties that are not directly identifiable. While there are economic approaches for software architecture properties (Ozkaya 2007), there is the need to determine a value for the software architecture design.  1.2.1 Software Architecture and Evolution. Software Architecture is an important tool to elicit and reason about the evolution of a software system. And this is important since some software development professionals mention cost of software maintenance to be around 40 to 80% of the total software development life-cycle cost (Glass 2003; McConnell 2006). In addition, because evolving a system without considering architecture design will cause the system to accumulate more mismatches. These discrepancies might not be appreciated as relevant in a given point in time but that will have an impact later. That led to the formulation of the question:  3  RQ1. Can a value be estimated for software architecture? Could a heuristic be assigned for software architecture value?  1.3 Technical Debt Technical debt is a metaphor introduced by (Cunningham 1992) to refer to the pending and accumulating amount work that needs to be done because of a suboptimal code implementations made to deliver functionality in to fit into a tight deadline or decisions. The work that needs to be done has similar consequences to the interests in financial debt; the more time it passes without correcting the suboptimal code implementation, the more work needs to be done to correct it in the future. Technical debt also reduces the software development productivity to a point where  it can increase the cost per functional unit and increase  complexity, as McConnell (2007) mentions.  1.3.1 Technical Debt at the Architecture Level Technical or design debt at the architectural level can be understood as a metaphor to refer to the pending amount of design and implementation effort to solve the mismatch between the software architecture design of a system and the evolving conditions. It is important to distinguish the technical debt, which is the product of design decisions from code implementation debt from the fact that design decisions were taken consciously. Technical debt at the design level produces larger amounts of debt that individual code debts. Additionally, debt bearing design decisions might have been taken to satisfy a restriction, to embody a strategy, or because at that point in time and context that decision was judged adequate (Nord 2012).  4  1.3.2 How to Prevent Technical Debt Originated from the Software Design Ideally one way on how to prevent a design technical debt is to analyze at which point in time a suboptimal decision should be replaced by an optimal decision. In order to do this the value for design decisions in a system should be known and this is presents challenge. Some design decisions that produce technical debt were taken because the benefits were perceived minimal when compared to the costs in a particular point in time. However, optimal design decisions have more value in the medium and long term than the suboptimal decisions. Many design decisions are based on experience and subjective judgments, since there are no quantitative measures to give decisions a value. Therefore, the resolution on whether to implement suboptimal decisions or at which point in time to change from suboptimal to optimal decisions is left to the intuition. One approach could be to assign a strength, or weight, to the expected benefits and costs to design decisions. However, benefits are difficult to quantify because these are abstract properties of the system and some of them are visible in the medium and long term. Nonetheless, some of the benefits manifest as a reduction of complexity and ultimately as a reduction of cost. Also, software development professionals have more experience estimating costs and therefore associating as product of a design decision. Design decisions can be associated with organizational strategies that result in software architecture with an evolutionary path. Therefore, there is the possibility to associate the evolution of a software architecture and cost. This leads to the formulation of the next question:  RQ2. Can a metric be defined to compare the cost of different evolution paths for a software system?  5  1.4 Propagation Cost MacCormack (2008) proposed propagation cost as a metric to analyze how change potentially affects the structure and also as an indicator for the effort needed to evolve a system. In order to calculate the propagation cost of a system its components relationships structure is initially expressed as a design structure matrix, D, in order to determine how closely coupled or decoupled the components in a system are. The indirect dependencies among all the components are obtained by deriving the transitive closure of a system or visibility matrix, V. The propagation cost metric is the density of the direct and indirect dependencies of the system in the visibility matrix. These dependencies are used to obtain an indicator of how a change affects the rest of the system, the higher the change spreads through the system a higher propagation cost will produce. Ideally, systems that are well design and therefore decoupled should produce a lower propagation cost. However, this study found some inconsistencies and limitations.  MacCormack (2008) proposed propagation cost obtains cost propagation as a ratio of the transitivity of dependencies to all of the possible dependencies in a system. The transitive closure of a system is the propagation of the direct dependencies by making explicit all the indirect dependencies. This transitivity is estimated and expressed in a visibility matrix, V. In a system with n modules all the possible combinations are n2, therefore propagation cost is explained as: n  PC   n  Vij j 1 i 1 2  (1)  n  n 1  V   Di i 0  6  (2)  where Vij, is a given direct and/or indirect dependency between module i and module i. The direct dependencies can extracted by code dependency analysis tools. Static structure or code dependencies are used because they represent a good metric when refactoring the more dependencies are in modules the more possibilities there are for modifications in the dependent modules.  1.5 Limitations of Propagation Cost Propagation cost (MacCormack 2008) allowed the calculation derived from the software structure to be considered as an indicator of sensibility to change and necessary effort to evolve in a binary fashion. For this reason, propagation cost has the following limitations or opportunity for improvement:  D1. Propagation costs models dependencies with binary values, 1 to denote that a dependency exists and 0 to denote no dependency. This assumes that all dependencies in the system are of the same type; however in a system there will be dependencies from a subset of components to others what will exert a great influence, while in other subset will be of little or relatively no consequence.  D2. When a change affects the system it is also supposed that it will affect all of the components in the same proportion. While there are some changes that are localized in only one or a subset of components; having changes that affect all elements is less probable. Propagation cost does not capture this information. There is the need for another structure, or vector, to capture the probability of change for the elements in the system.  7  D3. Propagation cost might not give a good result when dealing when increasing the number of components in the system. Propagation cost results in a reduced estimation due to the division of the visibility by the square of the number of elements in the system. Propagation cost decreases non-linearly when more components are added.  D4. Many systems have dependency cycles, which would cause propagation cost to give an incorrect and saturated result. Propagation cost uses the visibility concept (Sharman and Yassine 2004) which is developed upon the matrix powers (Warfield 1973). If there is a cycle in the system that is all connected this will produce the final product to saturate. A different algorithm or equation that takes into account cycles is needed.  These limitations give raise to the following research question:  RQ2.1. Could propagation cost be modified to address its limitations and become an improved metric?  1.6 Research Method The approach taken was to review relevant work done regarding software evolution, software architecture design and technical debt. Some ideas about software architecture and its value where induced. The concept of deriving value for software architecture from visible features was explored. Propagation Cost (MacCormack 2008) a metric used to indicate how susceptible a system is to change is examined. Limitations to this metric were found the following solutions are proposed:   Characterizing dependencies with a strength    Proposing depth propagation variants for the estimation  8    Analyzing and limiting the effect of cycles  The solutions are tested with:   Small artificial and representative examples of systems structure    Large open-source systems for which different versions are accessible    A system developed in-house  1.7 Structure of the Thesis The next chapters explore how the change propagation metric was derived. Chapter 2 has a review of the relevant subjects on software evolution, architecture, and technical debt and others. Secondly, the research method, the tools and representations are described in Chapter 3. In Chapter 4 an approach to derive value for architecture the benefit from user-visible features and is used to determine the value of architectural elements and its limitation is commented. In Chapter 5, characterizing dependencies with a strength rather than a binary descriptor is analyzed. Afterwards, in Chapter 6 the issue of dependency cycles is analyzed and a solution is proposed. In chapter 7 the depth of the propagation metric is analyzed and observations are made. In Chapter 8 a better metric for propagation is proposed. Finally, in Chapter 9 summarizes and concludes the research.  9  2 Related Work 2.1 Introduction This section presents previous work regarding software context and value. The topics reviewed are software evolution, software design, software architecture, technical debt and propagation cost. These topics cover the environment of software development, its evolvable properties, how software design is a user-invisible but empowering or constraining factor in the evolution.  2.2 Software Evolution 2.2.1 Environment Manny Lehman (1980) made an important contribution by making explicit the complexity of the environment and mentioning that a software system is a part of a real-world solution. Many software systems interact with other real-world systems often of dynamic nature. In addition, software systems are the product of several abstractions from a real-world problem. First, there is an abstraction or view of a real-world problem by the human observer, then the requirements and design phases are another abstraction exercises in order to adequate the real-world abstraction into a software-development abstraction. When transforming a real-world problem and its solution into a software view. The problems-solution is modeled to different concerns software views as the views proposed by Kruchten (1995). A software system as a product is the result of several views or transformations from real-world and these tacit transformations must be considered in order to understand the nature of a software system and its maintenance and evolution.  10  2.2.2 Evolution Lehman (1980) pointed out in his influential work on software evolution that all systems are subject to “continuing change”. For instance, software systems interact in a social context, which are dynamical and continually evolve. Dynamical systems force software to change frequently and adapt to this change. This adaptation is important to separate from the addition of functionality or features, systems will be subject to change in the hypothetical case their functionality remained static. Incremental increase in complexity could eventually cause decay in a software system if organizations do not allocate effort for redesign.  Another effect is the increase in complexity, Lehman (1980) mentions that as environment changes the conditions for a software system change as well. This change requires the architecture of a system to adapt as well. However, most of the times, additional code between the architecture and the internal system compensates the discrepancies if no architecture redesign effort is done. Furthermore, additional functionalities or modifications also cause an increase in size and complexity that add on top of the architecture discrepancy compensating code. The accumulating complexity and architecture discrepancy cause a software system to decay in the long-term period.  Baldwin and Clark (2000) connected the concepts of software evolution and modularity. The software industry referred to the concept of modularity in the mid 1960’s because of the interest in reuse among others. Modularity was used to refer to independent and exchangeable modules in different families of systems while reducing the required for adaptation. The authors also refer to software structure or architecture when considering evolution, since the complex structures that comprise systems are made of simple ones and the architecture of the whole system can be referred as a fundamental structure (Garlan and Shaw 1994). In the work of Baldwin and  11  Clark (2000) software design, modularity, and software structures or architecture are highly related; software systems evolution is compared to organic evolution. The authors mention that in biological evolution the selection units are genes or the characteristics of an organism contained in its set of genes. The authors point out that the selection criteria in biology is fitness to the environment and for software systesms this criteria can be market value among others.  2.3 Modularity Baldwin and Clark (2000) suggested that modularity received special attention in the 1960’s when the software industry intended to reduce costs and time to deliver by reusing software. In the past software development organizations faced the problem that despite delivering very similar products each product would have similar cost and required time to produce as when developing a new system. This was due in part because of understandability and changing involved modifying several parts of the system that consume significant effort.  Modularization is an organizational technique that allows dividing a system into components, while recombining and reusing these modules. In the field of software development modularization also allows to reduce system complexity, leading to development that is more efficient and a greater capacity for adaptation. This enables reducing cost and time to deliver as well. Another characteristic of modularity is the division and allocation of work and responsibilities to teams that mirrors how designers decide how a system should coordinate to perform tasks. This partitioning of a system into modules implies detail hiding, because design decisions within a module are the responsibility of the module. As such, only the inputs and outputs are visible at the modules interface: the “how” is hidden inside the module. This hiding of details also reduces the complexity and the required amount of work from collaborating modules.  12  Parnas (1972) suggested that in modularizing systems and to achieve the modularization benefits the “criteria” for modularizing needed to strongly consider detail hiding. Detail hiding allows modules to abstract tasks, to have a relative independence and can be decoupled from the system. Accordingly, Baldwin and Clark (2000) refer to modularity as a concept that involves independence outside modules and dependence inside the modules. Hiding details reduces the interaction complexity among modules as well as decreasing technical dependencies among modules, which entails that design decisions made by designers are also hidden. This is important because since modularity has been a desired characteristic some modularizations fail to produce reusable parts and even to reduce complexity; because sometimes modularization is done exchanging technical details.  Dijkstra (1968) argues in his influential paper on the structure of software that modularization should also involve a structure composed of hierarchies organized to be sequentially more and more abstract such that all possible design considerations are mostly covered at a reasonable degree. In the same direction Parnas (1979) proposed modularizing with a hierarchical structure that will allow a system to have different reusable parts of the system that could be used by a family of similar programs that could be “extended” or “reduced” in functionality. Similarly, Baldwin and Clark (2000) point out that modularity involves a hierarchy. Parnas (1979) observed that in order to produce a “better structure” it is necessary to identify minimal subsets, and implement them in minimal steps something that hints at developing in as an incremental and iterative process. Nord (2013) states that “in order to understand complex systems that are nearly divisible hierarchical structures is a major facilitating factor enabling to understand, describe and even to visualize the system and the parts that form it. If there are important  13  systems in the world that are complex without being hierarchic, they may to a considerable extent escape our observation and understanding”.  Parnas (1979) also refers to software structural design as providing stability through the life cycle. This structural stability enables the increase, decrease, or change of functionality according to the conditions of the environment without increasing the complexity, thus reducing cost and increasing value. Koziolek (2011) identifies this concept as sustainability. Benefits can increase when a software structure, or part of it, can be reused (extended or used partially) in another systems serving as a basis for future evolution. Parnas (1979) refers to the division of the system, or modularization, and the associated “uses” hierarchy in the design as “the major milestone” in the project.  2.4 Metrics Metrics provide traceability, meaning that the result can be corroborated if repeated at different times if the conditions have not altered the characteristic to be measured. A metric is a way to provide quantification by comparing versus a convened invariable unit of measure. This serves the purpose of comparing over time one characteristic or physical dimension. In the case of software it has metrics over the amount of code which are representations of set of tasks and information. Software development professionals had been using metrics of observable characteristics of software in order to communicate its complexity and other characteristics. A very common metric is source lines of code (SLOC).  Metrics have been developed to measure how well a system is modularized and employ techniques such as dependency graphs, extract related measures from code, object oriented metrics, etc. Koziolek (2011) provides a comprehensive classification of metrics for  14  sustainability; with a particular focus on evolution and focused to prevent system’s decay and allow expansion of its “life-time”. Lakos (1996) uses the sum of dependencies at the element level and obtains densities using the number of elements in a module. Mancoridis et al. (1999) propose a modularity quality metric in which external (inter-connectivity) is subtracted from averaged internal (i.e. intra-connectivity) binary dependency densities inside automatically clustered modules. The intra and inter connectivity are densities of the dependencies. Intraconnectivity is the sum of binary dependencies within a given subset of modules divided by the square number of components, similar to densification in propagation cost formula (MacCormack 2008). For the inter-connectivity measures the outside coupling that divides the sum of binary dependencies among the number of possible connections among the modules in a pair of subsets defined as 2n1n2.  Sethi et al. (2009) obtained two metrics for stability and modularity called decision volatility and independence level respectively. For stability, or decision volatility, the intent of the software system and the environment are expressed in terms of design decision, i.e. dimensions, and the “environmental constraints”, respectively. This is referred to as an augmented constraint network and from these declarations, a design structure matrix is derived. From this matrix the software stability metrics are calculated based on assumptions about the coupling. This is similar to adding the binary dependencies that a module consumes in a DSM as in of fan-in and fan-out metrics from MacCormack (2004). Modularity is measured by an “independence level” metric which quantifies the extent to which modules are independent of each other. This is accomplished by an algorithm which identifies the modules that are independent. In the example from the metric is the result of the percentage of independent elements from all the modules in the system.  15  Liu et al. (2008) report that the average propagation ratio and network diameters (or path length as referred in this thesis) reduces as the number of edges increases for direct dependencies. This may be attributed to a lack of depth in the dependencies connection or to coupling and propagation. In addition, reachable nodes Rik at distance k take into account the node itself and all of the previous “visited” nodes; increasing the values of averaged propagation cost (PCk). In this thesis the PC does not take into account the node itself and at every PCk or Di in formula (1) it only considers the modules to which dependency propagation has not been propagated before.  Milev, Muegge, and Weiss (2009) propose a new modularity metric based by a clustering cost algorithm. The clustered algorithm “collapses” or creates a cluster of modules in an attempt to reduce the propagation. However, their conclusions do not always corroborate with the expected results. It is important to point out that they compare propagation cost (MacCormack 2008) and relative clustered cost, with some results coinciding and others differing. Relative clustered costs assign strengths to internal module dependencies which are smaller than those assigned to external dependencies. Relative clustered cost also assigns a smaller strength to the elements (classes) that provide dependencies to a large percentage of elements in the system (hubs), which the authors set as 10 percent in this case.  Agrawal (2009) proposes to use two metrics of modularity for a general product: degree and bridge modularity. It is important to note that in degree modularity the relative strength of dependencies is quantified rather than assigned a binary value. Degree modularity attempts to measure the extent to which a component is connected to the rest of the product components; both by the dependencies it consumes and by the dependencies it provides to the rest of the system. Bridge modularity claims to measure the extent to which a component stands between  16  the rest of the components in the system. In a hierarchical structure such as that of software it is unclear how useful this metric is because most components are needed in order to make a system work and the elements in the middle of the structure could be as equally important as the rest.  2.4.1 Granularity In accordance with Laval (2011), this research analyzes systems at the package level, since packages can be considered as granular units of reuse and deployment. The studies conducted by (Sosa 2008) also suggest that analyzing systems at the top levels of the hierarchy is fruitful for obtaining a measures of a system’s architectural evolution. In contrast, Lakos (1996), Milev, Muegge, and Weiss (2009) and others, use the atomic artifacts of a system like classes or files that reside within modules. Moreover, Agrawal (2009) divides dependencies as internal and external to the module. Some researchers use object-oriented metrics at the file level; Lakos (1996) is an example of using internal module metrics to indicate how coupled elements are.  2.5 Value and Software Development In this section I present a sample of representative proponents considering value as an integral consideration for software engineering is mentioned. Boehm (2003) suggests the importance that software decisions have on the software development process and ultimately in the degree of fulfilment from objectives of the system. For this reason, the author proposes changing from the previous neutral-value into a value-based software engineering mindset. The author proposes to understand the benefits, values for an organization and from these to understand the goals of the system through a “shared vision” from the development team and the stakeholders. In order to realize the goals of a system Boehm proposes “concurrent system and software engineering” as one of the foundations of value-based software engineering.  17  Concurrent system and software engineering includes value-based architecting, in which the systems objectives (e.g. RUP(Kruchten 2003) life cycle objectives), the architecture objectives and the implemented architecture are managed. The shared vision is briefly the understanding of the customer benefits, stakeholders’ propositions which helps software professionals to understand the context and the value for a given end-use organization. Agile methods proponents cite as one of the main tenets of these methodologies to deliver value to the client, Cockburn (2001) proposes to incrementally evolve software architecture –or “rearchitecture” as Cockburn refers to this strategy.  Denne (2004) introduces the “Incremental Method Funding” in which he also addresses the stress of dedicating some effort to designing a system vs. delivering functionality immediately without a proper or minimal design. Denne points out that while there might be time and funding constraints still a part of the infrastructure could be constructed incrementally based on iterative development and the management of the releases. In a commercial context delivering parts of the system functionality can also give the chance to implement subsets of some parts of the software architecture. Releasing incrementally can also reduce the upfront cost of implementing the entire architecture.  2.6 Technical Debt Technical debt is mentioned first by Cunningham (1992) and compared to incurring in a financial debt with the aim to develop and deliver fast, but he emphasizes that it should be paid promptly. Technical debt is a metaphor to the increase of software development effort for a functional unit, features or maintenance as referred as equivalent to financial debt (Cunningham 1992) as a result of suboptimal implementations. The more time it passes before the debt is paid the more interests will accumulate in the form of more changes in the system that requires effort and  18  consequently a cost. McConnell (2007) classifies technical as intentional and unintentional. McConnell suggests that intentional debt is product of a strategy that considers the present development cost more expensive than the cost in the future. McConnell also notes that development organizations incur in this type of debt in what he termed “big chunks” to differentiate it from debt that is taken unconsciously, in smaller proportions. Managing architectural debt is better than several pieces of small debt that is spread through the system and that is taken unconsciously and is more visible. The most common cause for incurring in architectural technical debt are directed towards reducing time deliver, uncertainty, limited resources, etc. Architectural design decisions also cause technical debt in what McConnell classifies as “identifiable” individual shortcuts. For instance, a company might have reduce time to market as part of a strategy to be the first team to offer a given software to increase the probability of increasing its market share and creating captive users. Type I Type II  Unintentional Intentional (Strategic) A. Short-term reactive paid off, frequently B. Long-term proactive Figure 2-1 Technical Debt by McConnell  2.7 Propagation Cost Steward (1981) proposed to express module dependencies for a system in the form of a matrix called design structure matrix (DSM). Doing this allows to visualize how much dependencies came in and out of a system. Design structure matrices had been used in several fields of engineering to determine the criticality of given modules or subsystems, especially in the manufacturing, supply chain and other disciplines.  19  Using strengths to describe dependencies in contrast to Boolean descriptors in a design structure matrix, as proposed by MacCormack (2008), allows to capture more information that differentiates and qualifies the relationships among module. For instance, in systems there are modules with weak dependencies to some other modules; while other modules provide the majority of the dependencies. Therefore, supporting modules that provide a more significant proportion of the code dependencies play a more critical role when modifications are made to them: the dispersion of a change could potentially affect more elements than the elements providing weaker strength dependencies.  The goal of propagation cost (MacCormack 2008) is to measure the potential impact of changes, estimated by equation (2):  n  PC   n  Vij j 1 i 1 2  n  n 1  V  D i 0  (2)  i  (3)  by adding up all the directly and indirectly influenced elements in the visibility matrix, V (3). The visibility matrix contains all of the Boolean indirect and direct dependencies and its estimation is explained in Chapter 5. Propagation cost divides the total propagation or transitive dependencies and obtains normalization by dividing the total by the number the number of all possible dependencies in a system with n elements, which is n2.  20  3 Approach Taken 3.1 Introduction This section explains the approach we took to validate modified change propagation metric with open source software systems. In order to validate the use of strength dependencies descriptors we obtained these data from real systems to perform our estimations. Dependencies were extracted from the compiled code of software systems, we explain the tools used and how was the change propagation obtained from this dependency data. This chapter also covers which types of systems were used and other considerations in our study.  3.2 Internal and External Module Dependencies Some authors, e.g. Lakos (1996) Milev, Muegge, and Weiss (2009), Agrawal (2009), etc., consider the internal and external module dependencies as metrics for modularity. This study considers that both types of dependencies should be analyzed separately and focuses only on the analysis of external module dependencies. While one of the aims of module design is module independence, still dependencies are needed to integrate the module with the system. External module dependencies are common in commercial software and a low coupling or low number of dependencies are convenient to support reuse and maintenance, which leads to time-to-deliver, reduction of maintenance and development costs. Some proposed modularity metrics consider internal and external module dependencies together. For instance, Agrawal (2009) considers internal and external dependencies to be the same. Other researchers such as (Mancoridis et al. 1999) work on the basis that internal cohesion is desirable while considering that external coupling should be minimized. Accordingly, they estimate modularity as a function of internal dependencies minus a function of external module dependencies. Software  21  engineering professionals, e.g. (Fowler 2001) have applied this principle to reduce coupling while others (Mancoridis et al. 1999) consider internal dependencies to be a good characteristic. This thesis considers that internal and external dependencies as separate concerns for measuring modularity.  A high number of external dependencies between modules while allowing for reuse present constraints to software evolution that require effort and cost to overcome. It is also true that the dependencies that allow for reuse, should be the result of abstraction, information hiding, and enforcing “minimality” in interfaces (Lakos 1996). When these guidelines are not followed or when despite following them, the number of dependencies increase the complexity and cost for a given system increase as well.  3.3 Analysis Tool Lattix LDM (Sangal et al. 2005) was used to extract the dependencies and structure of the systems analyzed. This tool identifies the hierarchy using algorithms, which is important for dealing with software systems when their hierarchies and software architecture are unknown. A useful feature of Lattix LDM is that it allows expressing the dependencies as a design structure matrix, and it provides an interface to interact with the matrix. This is useful because data does not need to be exported to other tools in order to calculate the change propagation metric. The selection of Lattix LDM was based on recommendations made by Sosa (2008).  3.3.1 Analysis Assumptions There are several ways to describe dependencies: Laval (2011) classifies dependencies by type, such as by inherits, implementing an interface, referencing values, and by the calling of methods. In this thesis all the dependencies are considered as of one type for the sake of  22  simplicity. It can be argued that the inheritance dependency should be in function of the code metrics of the inherited classes and thus being of a greater strength than other types of dependencies such as a simple reference to a value or an invocation to a simple method. However, some method invocations might allow for reuse to a considerable extent placing the dependency as of a greater strength. Therefore, it is difficult to determine which of the dependencies should weight more and analyses of the reference and its scope might be needed to determine the strength of per type of dependencies. For the previous reasons and for the sake of simplicity in this study all dependencies are considered as of the same type. 3.3.1.1 Dependency Strength Count The static analysis tool Lattix LDM determines dependency strength or in three ways, by knowledge-based, by knowledge-based subsystem and usage-based dependency strength. Dependency strength by knowledge counts n references to an outside class as one reference to determine strength, independently of how many times references are made within a class inside a module. The intent of this thesis is to get a metric for change propagation; for that matter all the times a dependency to given an external class appears inside a module are counted to determine the dependency strength, for this purpose we use the usage-based dependency strength. This is because while maintaining or refactoring a system a change in a module can cause compile or run-time errors or a change in the functionality inside and outside the modules. This study is concerned with the change outside a module mainly because the changes within a module can be done by the developer(s) assigned to a given module and the changes might be contained by the information hiding implemented in the design. However, changes in the other modules, for nontrivial projects might require the coordination with other workers or separate teams which entails effort, time and cost. For that reason, we are more concerned with determining strength by counting all the used references to external modules.  23  3.4 Validation This section explains the steps used to validate the metric. First, this study used small test cases of structures to corroborate that change propagation metric obtained consistent results with empirical practices in software engineering. In order to validate the change propagation metric non-trivial examples were used; generally-speaking, projects with more than 6 months of lifespan and more than 3 developers. For this purpose, an in-house software system and four open source projects from the Apache Software Foundation (Apache 2002) were analyzed. Some of these open source systems have been used by several software development organizations for 10 years of and have what is referred to by software development professionals as a stable architecture (Sosa 2008), meaning that at a certain point they do not change drastically. We found that change propagation is a good indicator for analyzing increases in dependency complexity.  These small test cases were used to illustrate how the dependencies propagated along software systems, variations of layered, and mediator structures where used. For instance, a mediator resembles at a high-level the structure of service-oriented architectures, whereas the layered structure is the de facto hierarchical organization for many software systems. The results of these test cases is covered in Chapter 5 (Kruchten 2012, Nord 2013). We also used the Divide by Propagation Length variant for the change propagation metric because of the distance denominator that reflects the distance between layers in the hierarchy.  The change propagation metric at second layer of the structure produced the most significant results. A software system structure has different layers in its structure; for instance, Hadoop had up to 5 layers and Ant up to 6 layers. Considering the first layer to comprehend the entire analyzed system we obtained the most significant measures at the second layer. In some  24  cases, the second and third layers would have the same behavior, and at some points, they would have disproportionate effects due to the amplification of indirect dependencies when the number of elements in that layer increased. We considered that architectural analysis should comprehend the fundamental relationships, and for that reason we decided to limit the number of elements between 10 and 20 modules.  3.4.1 In-House System An in-house system, the Disaster Response Network Platform (DRNEP) (González-Rojas 2011; Marti et al. 2012) developed by the Electric Power  Group of UBC for disaster-response  simulation management was used. DRNEP was developed in order to allow collaboration among organizations with knowledge in different simulation domain areas and knowledge on disaster response in order to provide research and preparedness services to organizations around the world. This would allow to integrate simulators of different infrastructure domains and integrate their results by an interdependencies simulator (Martí et al. 2008). For instance, one implementation (Cunha et al. 2011) included a water distribution network simulator (Rossman 2000), a human decision disaster and procedure policies simulator. The DRNEP project integrated simulators that run on diverse platforms and was developed with different programming languages.  In order to compare two different strategies or evolutionary paths for the design of DRNEP, path #1 “deliver soon” and path #2 “reduce rework” were compared. We used the t-shirt size approach to characterize the strengths of the dependencies since there were no software artifacts at design time. The t-shirt size strengths are:  1=strong  0.5= medium  25  0.1=weak  For the deliver soon strategy we consider that no integration applications frameworks are going to be used, the design decision will be that for every interaction between any two simulators an  1 Controller I2SimAdapter MTAdapter MT_I2Sim_Transl EPANetAdapter Epa_I2Sim_Transl Epa_MT_Transl BrahmsAdapter Brahms_I2Sim_Transl Brahms_MT_Transl Brahms_EPANet_Transl FloodAdapter Flood_I2Sim_Transl Flood_MT_Transl Flood_EPANet_Transl Flood_Brahms_Transl Data Persistence Total  1 2 3 4 5 7 6 8 9 10 11 12 13 14 15 16 17  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  1 1 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  1  0  1  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  Data Persistence  Flood_Brahms_Transl  Flood_EPANet_Transl  Flood_MT_Transl  Flood_I2Sim_Transl  FloodAdapter  Brahms_EPANet_Transl  Brahms_MT_Transl  Brahms_I2Sim_Transl  BrahmsAdapter  MTAdapter  PATH 1 DELIVER SOON  I2SimAdapter  Controller  good for delivering fast results in the first iterations.  Epa_MT_Transl  Epa_I2Sim_Transl  EPANetAdapter  MT_I2Sim_Transl  adapter for each simulator will be used and a translator will be developed. This approach is  17 Cost 16 0 3 0 3 0 8 0 3 0 8 0 8 0 3 0 8 0 8 0 8 0 3 0 8 0 8 0 8 0 8 1 5 116 0  Figure 3-1 DRNEP Path #1 Deliver soon design structure matrix  This design requires that when more simulators are added into the DRNEP system the more translators are going to be needed. This is a combination of n elements in groups of 2, which results in  n(n  1) translator which is a problem of order О(n2). Another approach is to use the 2  well-known mediator pattern, in this case all the models representations have to be converted into the DRNEP canonical data model for simulation collaboration (González-Rojas 2011). The design in the next figure.  26  Epa  Brah  Flood  Web  WAS  Controller  Service  ServiceBusiness  SimulatorServiceClient  Entities  1 1 0 0 0 0 0 0.1 0 0 0 0 0 0.1  2 0 1 0 0 0 0 0.1 0 0 0 0 0 0.1  3 0 0 1 0 0 0 0.1 0 0 0 0 0 0.1  4 0 0 0 1 0 0 0.1 0 0 0 0 0 0.1  5 0 0 0 0 1 0 0.1 0 0 0 0 0 0.1  6 0 0 0 0 0 1 1 0 0 0 0 0 0.1  7 0 0 0 0 0 0 1 0 0 0 0 0 0.1  8 0 0 0 0 0 0 0 1 0 0 0 0 0.1  9 0 0 0 0 0 0 0 0 1 0.1 0 ? 0.1  10 0 0 0 0 0 0 0 0 0 1 0.1 0.5 0.1  11 0 0 0 0 0 0 0 0 0 0 1 0 0.1  12 0 0 0 0 0 0 0 0 0 0 0 1 0  Common  MT  1 2 3 4 5 6 7 8 9 10 11 12 13  I2sim I2SimService MTService EPAService BrahmsService FloodSimServ DrNepWeb DrNepServiceWASClient Controller DrNepService DrNepServiceBusiness SimulatorServiceClient DrNepEntities DrNepCommon Total  13 Cost 0 7 0 9 0 9 0 9 0 9 0 3 0 8 0 8 0 6 0 5 0 8 0 11 1 2 94  Figure 3-2 DRNEP Path #2 Reduce rework design structure matrix  Intuitively the Path#2 should give a smaller propagation due to the use of the mediator pattern, in order to measure the propagation metrics we run the propagation cost (MacCormack 2008) and the modified change propagation metrics. The results are shown in Figure 3-3. It can be appreciated that the propagation cost was higher for Path #2 which is counter intuitively. For the modified change propagation the result corroborates empirical practices in software design.  Figure 3-3 Propagation cost and modified change propagation for DRNEP Paths #1 and #2  27  3.4.2 Open-Source Systems Several versions of open-source systems were used to perform validation because of the widespread use of the software systems and especially because some of them have been used for abound 10 years. This allows examining the evolution in time of their software architecture for systems that are widely used by software development organizations as well as end-users. 3.4.2.1 Hadoop Hadoop (Borthakur 2007) is an open source project from the Apache Software Foundation (Apache 2002) for distributed processing. Hadoop is a highly fault-tolerant distributed file system for processing large data sets in a cluster computing environment. Several organizations use Hadoop, for instance Adobe, eBay, Facebook, Fox Audience network, Google, etc. (Apache 2013). Hadoop delivers high-availability by detecting and handling faults.  The results obtained in Figure 3-4 show that the propagation cost is different from change propagation estimations. In some cases propagation cost does not identify and in some other systems it has an opposite result. For instance, in Hadoop versions 0.18.0 to 0.19.0. propagation cost does not detect any difference, despite changes in the hierarchy where two modules are introduced, hdfs, http, and module dfs is removed. The configuration of versions 0.18.0 and 0.19.0 changed; however, because of the cycles the propagation cost estimates for the most part a saturated visibility matrix. Also, For versions 1.0.4 and 1.1.2 propagation cost shows similar estimation whereas the modified change propagation calculates an increase.  28  Figure 3-4 Hadoop sources lines of code, PC, and MCP (v/n^2)  3.4.2.2 PDFBox PDFBox is an open source project from the Apache Software Foundation (Apache 2002) for viewing and editing PDF documents. The propagation cost and the modified change propagation are similar; however there is a decrease in change propagation metric that  29  propagation cost does not identify in version 1.5.0. Also, change propagation detects a slight variation from versions 1.7.0 to version 1.7.1 whereas propagation cost detects a reduction.  Figure 3-5 PDFBox source lines of code, PC, and MCP (V/N2)  3.4.2.3 Ant Ant is an open source project from the Apache Software Foundation (Apache 2002) for building applications from source code into an executable and distributable application. We can see in Figure 3-6 that propagation cost (MacCormack 2008) estimates sometimes are coincident, other  30  times the differ, and sometimes opposite as those of the modified change propagation metric (V/n^2 DIV_POW) in label A, versions 1.5.4 to 1.6.0. Propagation cost estimates a decrease whereas change propagation calculates a decrease. From release 1.8.0 to 1.8.1, identified in Figure 3-6 with label B, opposite estimates from propagation cost and modified change propagation metrics can be seen.  B  A  Figure 3-6 Sources lines of code versions and cycles for ANT 1.4.1 to 1.9.0  31  We found that for most of the Ant versions there is an increase for both; size of the systems, measured in source lines of code, and the modified change propagation. However, a decrease of 0.3 percent in the dependencies from version 1.8.1 to 1.8.2 also causes a reduction of 4.32 percent in the modified change propagation. The modified change propagation metric also found that there is a difference in the estimation result when done with the a) cycle elimination and b) cycle limitation calculation for ANT versions 1.8.1 to 1.9.0.  a) number of modules (at second level)  32  b)  Visibility  c) Modified change propagation nd  Figure 3-7 Visibility for ANT versions 1.4.1 to 1.9.0 for 2  level in Figure 3-1  In Figure 3-7 the SUM and the Divide by Propagation Length variants (see section 5.4) are shown with the Ignore and the Limit cycle treatment approaches (see section 6.4.1) We also calculated the modified change propagation for some modules that contained more dependencies inside them as rest of the others. This implied to expand some modules which resulted in increasing the number of modules in the design structure matrix. We observed that the for the cycle elimination, or ignoring cycles, calculation the increase in our proposed modified change propagation is around 50 to 2100 percent. This is due to the increase of parallel and series operations, especially for the parallel which adds the result of propagated dependencies. This leads to question why do the modified change propagation result differ so much depending on the granularity of the analysis; however, it is my opinion that this is because architectural and atomically object metrics differ in their concerns. The intent of clustering atoms in layers or packages is to isolate, to hide information from other element. However, for large systems, the  33  8  9  .types 10  26 19 416 7  .* 11  .util  1 35  44 2 10  7 36 541 6 30  995 5 47 5006 121 4221 34 22 78 35 227  2  69 24  491 266 143 2 327 2892 158 6 329 922 1949 79 708  level  * 22  tools 23  types.* 21  types.resources 20  types.optional 18  types.selectors 19  types.mappers 17  util 15  types.spi 16  loader 14  property 13  9  input 11  8  filters 12  taskdefs.email  7  dispatch 10  taskdefs.condition  5  6  taskdefs.rmic  4  75 86  2 12 13 3034 54 2  nd  16  g1 12  .loader  5  7  4  6098 5 25 77 16  taskdefs.*  taskdefs.optional  taskdefs.compilers  2  taskdefs.cvslib 3  1  types level level expanded 3rd level 2nd2nd  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23  helper  listener  taskdefs 2nd level expanded 3rd level 2nd level  listener helper taskdefs.cvslib taskdefs.optional taskdefs.compilers taskdefs.rmic taskdefs.* taskdefs.condition taskdefs.email dispatch input filters property loader util types.spi types.mappers types.optional types.selectors types.resources types.* * tools  8  485  a)  Ant 1.8.4 expanded to 3rd  6  .input  6  .filters  .dispatch  3  5  .property  .taskdefs  2  1 .listener 1 64 .helper 2 .taskdefs 3 28 .dispatch 4 .input 5 .filters 6 .property 7 .loader 8 .util 9 16 .types 10 .* 11 82 org.apache.ant.utils 12 12  .helper  .listener  Ant 1.8.4  64 485  1 109 1796 41 20  28  519 11 3 5  118 3  14 94  5  82 12  26  9 10 416 7  36  5 51  290  1  5  21 1  7  50  58  64  82  290 22 189 116  9 51 99 456 47 9 599 1165 50 268 485 1949 62 17 708  7  7 25 77 16  16  1 7  6  19  19 1811 378 1266 59  7  73 40  599 7  42 2  92 2 10  36 541 6 37  7  5  16  8  4  491  3  59  30  24  47  143  10  86 5 438 8 2105 52 2523 221 200 12  4 15 61 15  34  22  2 119 78  35  2 6  2 79 246 329  16 3  14  2 11  33 28  rd  3 level  Figure 3-8 Ant 1.8.4 expanded to 2nd level and taskdefs and types modules to 3rd level expansion  The results for the change propagation metric for Ant with modules taskdefs and types to the third level are shown in Figure 3-6.  34  Figure 3-9 Ant Change Propagation version 1.4.1 to 1.9.0  35  3.4.2.4 Checkstyle Checkstyle is an open source tool that helps to enforce coding standards, and this tool is highly configurable to perform similar checks. Other projects use checkstyle to enforce coding styles, to find some common class design problems; and to detect duplication of code and bug or antipatterns. By reviewing the evolution of checkstyle our study found that there is an improvement measured by the decrease in the initial change propagation of the system. This is important since in some other systems the variation of change propagation closely resembles that of the increase in source lines of code (SLOC). In early releases, identified by the releases starting in label A prior to the release 3.0 in label B as illustrated by Figure 3-10, checkstyle has the smallest source code between 10 to 20 KSLOC’s. Despite Checkstyle source code being the smallest in the early releases, we measured the greatest change propagation with a peak in version 2.4 prior to event B. At event B we find a reduction in change propagation caused by a refactoring, development organizations decide to redesign the software architecture when complexity or cost to develop functional units increases non-linearly to overcome increasing cost. Reviewing the release notes of checkstyle 3.0 we find:   “Completely new architecture based around pluggable modules. This means that users can now write their own checks without changing the source code of checkstyle itself (request 578712).    Configuration is not based on Properties any more. Instead a Configuration interface is used …”  B  36  Figure 3-10 Checkstyle SLOC, PC, MCP, and SBO from Bouwers  3.4.2.5 Detailed Analysis In Figure 3-10 it can be observed that propagation cost (PC) and the modified cost propagation (MCP) have a similar trend but noted that there are some trends that are different, as indicated by the red circles. For instance, the changes from versions: a) 2.3 to 2.4, b) 3.1 to 3.2, c) 3.2 to 3.3 and d) 4.3 to 4.4 are different.  In release 3.1, located right after event B in Figure 3-11, we found a decrease in the change propagation, V/n^2. This is more clear in the Visibility matrix, V, in Figure 3-12, where it can be appreciated that there was variation of 25 percent decrease in release 3.0 compared to version  37  2.4 in the dependency propagation regardless of the number of modules. In Table 3-1, and in Figure 3-11 it can be appreciated that there is a 217 percent increase in the number of modules, n, that together with the decrease in the dependency propagation contribute to the sharp decrease of the change propagation, V/n^2. This is important because there is an increase of approximately 33 percent in the number of source lines of code, SLOC, from version 2.4 to 3.0. However, the change propagation decreases as does visibility, V, which does not take into account the number of classes or elements, n, as illustrated in Table 3-1, this is also more visible in Figure 3-12. We include the change propagation variants to show their behavior.  38  Figure 3-11 Checkstyle propagation compared to SBO, CSU, CB  In release 3.1, we found an increase in the Visibility, V, which is not normalized by n2 as noted in Figure 3-12. However, since there is also a 39 percent increase in the number of modules, n, from version 3.0 to 3.1 the change propagation, V/n2, appears relatively stable for the Divide by  39  Propagation Length (DIV_LENGTH) of as illustrated by Figure 3-14. This leads us to formulate the question that if to obtain our modified change propagation metric the visibility matrix, V of length n-1 should be normalized by n2; or whether it should be normalized by the initial direct dependency matrix D. In order to clarify the estimation of change propagation we denote visibility matrix of lengths of n-1 as V(n-1) and the direct dependency design structure matrix, D, as V(1). n 1  V (n  1)   D  i  i 0  V (1)  D  1  Figure 3-12 10x zoom for Visibility, V  Normalizing V(n-1) by V(1) (see section 3.4.2.5.1) results in the ratio of dependency propagation growths in reference to its initial state, which is the matrix with the direct dependencies. A similar behaviour is observed in Figure 3-15 as in Figure 3-11, but with a different scale when compared to V/n^2. However, an increase is detected from version 3.3 to 3.4 similar to V in Figure 3-11.  Another increase in change propagation, V/n^2, is found with release 3. (label C in Figure 3-11). Repeating the data for version 3.4 allows us to compare our results with those from Bouwers (2011), since we were not able to find data for the release 3.5 of checkstyle. Checkstyle change propagation stabilizes from version 4.0 to 4.4, which is visible in the releases running prior to  40  label E. These versions have small variations on their size measured in SLOC’s, number of modules, n, visibility, V, and change propagation, V/n^2.  In checkstyle release 5.0, marked with label E, there is an increase in visibility of around 30 percent for Max, Aver, MaxPlus change propagation variants, which amounts to an increase of 169 percent for the Divide by Propagation Length. For the SUM and MAX change propagation variants, an increase of approximately 400 percent was measured because these respond more sensibly to the increase of parallel paths as will be explained in the next sections.  For versions 5.0 to 5.6 we can see in Figure 3-11 some other minor variations that do not affect the overall trend. From analysis of version 1.1 the earliest version accessible in the Checkstyle repository (label A) we observe that the initial change propagation V/n2, reduces consecutively up to version 1.3. In version 1.4 it starts to grow again until version 2.2. In version 2.3 a minimal decrease is detected. In version 2.4 the highest change propagation is detected and a sharp decline is observed at version 3.0, in label B in Figure 3-11.  Figure 3-13 5x zoom for Change propagation, V/n^2  41  Figure 3-14 Zoom for Change propagation, V/n^2  A  B 1.0  SLOC D noDiagonal Cycles n SLOC/n V  1.0  SUM MAX AVER MAX+ DIVN DIV1.1 DIVPOW V/n^2 SUM MAX AVER MAX+ DIVN DIV1.1 DIVPOW  1.1 11,827 776 18 657 1.1 1,679 1,160 1,086 1,216 826 1,597 1,069 1.1 5.2 3.6 3.4 3.8 2.5 4.9 3.3  1.2 12,283 845 19 646 1.2 1,767 1,246 1,171 1,303 894 1,683 1,146 1.2 4.9 3.5 3.2 3.6 2.5 4.7 3.2  1.3 13,390 960 26 515 1.3 1,975 1,437 1,350 1,499 999 1,883 1,305 1.2 2.9 2.1 2.0 2.2 1.5 2.8 1.9  1.4 10,919 1,316 25 437 1.4 3,501 2,558 2,356 2,715 1,403 3,302 2,061 1.4 5.6 4.1 3.8 4.3 2.2 5.3 3.3  2.0 11,575 1,456 25 463 2.0 3,641 2,698 2,496 2,856 1,543 3,442 2,201 2.0 5.8 4.3 4.0 4.6 2.5 5.5 3.5  2.1 12,541 1,640 27 464 2.1 4,975 3,386 3,068 3,605 1,764 4,671 2,740 2.1 6.8 4.6 4.2 4.9 2.4 6.4 3.8  SLOC D noDiagonal Cycles n SLOC/n V SUM MAX AVER MAX+ DIVN DIV1.1 DIVPOW V/n^2 SUM MAX AVER MAX+ DIVN DIV1.1 DIVPOW  2.3 21,858 2,802 34 643 2.3 9,603 6,119 5,329 6,391 3,002 8,985 5,047 2.3 8.3 5.3 4.6 5.5 2.6 7.8 4.4  2.4 22,106 3,132 36 614 2.4 12,003 7,354 6,528 7,725 3,378 11,196 6,042 2.4 9.3 5.7 5.0 6.0 2.6 8.6 4.7  C  3.0 29,823 3,189 114 262 3.0 6,857 6,048 5,873 6,264 3,221 6,523 4,513 3.0 0.5 0.5 0.5 0.5 0.2 0.5 0.3  3.1 38,121 4,186 12 159 240 3.1 16,594 11,344 10,532 12,269 4,276 15,467 7,589 3.1 0.7 0.4 0.4 0.5 0.2 0.6 0.3  3.2 45,087 4,775 1 80 564 3.2 8,529 7,116 6,747 7,396 4,823 8,197 6,146 3.2 1.3 1.1 1.1 1.2 0.8 1.3 1.0  3.3 45,747 3,105 1 82 558 3.3 7,018 5,579 5,209 5,870 3,153 6,662 4,522 3.3 1.0 0.8 0.8 0.9 0.5 1.0 0.7  3.4 50,579 3,498 82 617 3.4 17,061 12,111 10,344 13,067 3,661 15,828 8,128 3.4 2.5 1.8 1.5 1.9 0.5 2.4 1.2  E  D 4.0 52,296 4,833 249 210 4.0 28,340 18,411 17,087 20,027 4,927 26,203 11,091 4.0 0.5 0.3 0.3 0.3 0.1 0.4 0.2  2.2 21,174 2,697 31 683 2.2 8,565 5,545 4,885 5,789 2,886 8,031 4,635 2.2 8.9 5.8 5.1 6.0 3.0 8.4 4.8  4.1 52,278 4,824 249 210 4.1 28,326 18,397 17,073 20,014 4,918 26,189 11,080 4.1 0.5 0.3 0.3 0.3 0.1 0.4 0.2  4.2 52,974 4,893 250 212 4.2 28,459 18,525 17,196 20,144 4,987 26,316 11,172 4.2 0.5 0.3 0.3 0.3 0.1 0.4 0.2  4.3 53,696 4,957 255 211 4.3 28,619 18,665 17,331 20,289 5,049 26,467 11,273 4.3 0.4 0.3 0.3 0.3 0.1 0.4 0.2  4.4 53,822 4,969 256 210 4.4 28,639 18,682 17,349 20,307 5,061 26,487 11,288 4.4 0.4 0.3 0.3 0.3 0.1 0.4 0.2  5.0 55,302 5,635 270 205 5.0 144,439 24,673 22,308 28,083 4,149 131,820 30,352 5.0 2.0 0.3 0.3 0.4 0.1 1.8 0.4  5.1 55,522 5,638 270 206 5.1 144,448 24,683 22,320 28,092 6,152 131,829 30,356 5.1 2.0 0.3 0.3 0.4 0.1 1.8 0.4  5.2 56,989 5,822 96 276 206 5.2 153,242 27,123 24,551 30,740 6,423 139,846 31,990 5.2 2.0 0.4 0.3 0.4 0.1 1.8 0.4  5.3 56,593 4,403 276 205 5.3 152,916 26,878 24,366 30,469 6,355 139,544 31,830 5.3 2.0 0.4 0.3 0.4 0.1 1.8 0.4  5.4 56,989 5,895 279 204 5.4 153,242 27,123 24,552 30,740 6,423 139,846 31,990 5.4 2.0 0.3 0.3 0.4 0.1 1.8 0.4  5.5 58,493 5,911 279 210 5.5 153,282 27,163 24,583 30,780 6,439 139,884 32,016 5.5 2.0 0.3 0.3 0.4 0.1 1.8 0.4  Table 3-1 Gross change propagation, V, and normalized change propagation, V/n  42  5.6 58,808 5,916 279 211 5.6 153,259 27,139 24,555 30,574 6,444 139,864 32,014 5.6 2.0 0.3 0.3 0.4 0.1 1.8 0.4  2  3.4 50,579 3,498 82 617 3.4 17,061 12,111 10,344 13,067 3,661 15,828 8,128 3.4 2.5 1.8 1.5 1.9 0.5 2.4 1.2  3.4.2.5.1 Normalizing by Visibility Length 1 Taking a closer look at the sum of dependencies for V, and V/n2 in Figure 3-11 we found some cases where the tendency of V is opposite to the tendency observed when normalizing by n2. This leads us to the following question: Could the increase of dependency propagation normalized by the dependencies be a better indicator of propagation cost than normalizing by n2? We found that normalizing by V(1) did not produce a good result when compared with system breakdown optimality approach from Bouwers (2011). Normalizing by n2 provides a result that is coincides more with the system breakdown optimality.  1  Figure 3-15 V divided by D and compared vs V divided by n  43  2  3.4.2.5.2 Adjustment in the Analysis of Checkstyle In order to graphically compare the metrics of this study with that of Bouwers (2011), we repeated the data from Checkstyle version 3.4 for version 3.5 to compare the change propagation metric with that of Bouwers’ study. We also performed the analysis at the class level for Checkstyle, since some versions only had one package containing all of the classes, e.g. versions 1.1 to 2.4. Previous analyses were done at the first, second and sometimes at the third level since the other systems analyzed did had a complex hierarchy, with up to seven hierarchical levels as in the case of Hadoop. As in some analysis, (Sosa 2008); (Bouwers 2011), the hierarchical levels of analysis were determined by the analyst. In some cases, at the second level the dependencies among modules will represent less than 10% of total dependencies. In this case the modules that concentrate most of the dependencies are expanded whenever the number of modules, n, are less than 15. The researcher initially wanted to analyze the software architecture of a system as the fundamental relationships among them. For this reason, a threshold of 10 modules was set heuristically since it is a manageable number for an analyst. However, practically this number did not reflected the systems hierarchical structure and was stretched at some points to 22 modules.  3.4.2.5.3 Similarities with other Metrics Some measurements in this study and those of Bouwers, (2011) coincide in terms of measuring the architecture. The most visible similarity is that system breakdown optimality (SBO) increased from version 2.4 to 3.0 and the change propagation detects improvement from versions 2.4 to 3.0. The same applies to component balance (CB), which was used in Bouwers (2011), to measure the organization and decomposition of the system. Component breakdown increase also coincides with the decrease in propagation. There are other differences as well:  44  the small increase of propagation from version 2.3 to 2.4 is not detected. One possible cause of this is that the metrics from (Bouwers 2011) might be saturated or that our metric is more sensible. The change propagation stabilization from version 4.0 to 4.4 coincides as well with the component size uniformity (CSU) and measurements from Bowers’, but the increase from version 4.4 to 5.0 appears in (Bouwers 2011) as continuous.  3.5 Sum Variant Breadth Sensitivity The SUM variant is sensible to the change propagation caused by breadth, and this variant can be used to analyze elements that might spread dependencies in the system. We detected that some structures at some releases produce a high result in the estimation of change propagation of the SUM (Nord 2012, Nord 2013) variant while being consistent with the other change propagation variants in other versions. This is due to the sensitivity to the propagation caused by breadth of the SUM variant since SUM adds the parallel paths instead of taking the maximum or an average as the other variants. In addition, the DIVIDE BY variants use the SUM variant as their numerator but are reduced by their denominator.  In Checkstyle versions 5.0 to 5.6, illustrated by Figure 3-11, it is noted that differences of approximately an order of magnitude of one among variants SUM and DIVIDE_BY1_1 and the rest of the variants. This study considers the DIVIDE_BY1_1 as similar to SUM variant because DIVIDE_BY1_1 has the same parallel and series operator divided by a constant of 1.1. For the SUM variation, for instance, if we observe the gross change propagation , V, of Checkstyle version 5.0 with the SUM (144,439) and MAX (24,673) variants there are difference of approximate 5 times between them as illustrated in Table 3-2. It is noted that the propagation under the MAX variant diminishes after the propagation of length 2, or power 2. On the other  45  hand, the SUM variant increases consecutively until the propagation of length, or power, of 6 which has a value of approximately 5 times the initial design structure matrix, D1.  length MAX SUM 1 5,635 5,635 2 3,612 4,324 3 4,267 6,940 4 4,003 13,246 5 3,058 22,381 6 1,921 26,110 7 983 22,096 8 531 18,120 9 339 14,080 10 197 8,239 11 91 2,794 12 32 450 13 4 24 length ∑ 24,673 144,439 Table 3-2 Checkstyle 2.0 change propagation per length  The causes of the increase for the SUM variation are the parallel operations, which sums all the parallel paths. For structures produce a high number of parallel paths in their propagation the SUM variant will be more sensible. In the case of series operation, in structures where the difference is small among x and y for the series operator, represented by min(x, y) will result in a marginal decrease in the propagation strength.  At a closer inspection, we observe that for some elements, the change propagation produces more indirect dependencies, as illustrated in Table 3-3 and Table 3-4. For instance, for element 251, CheckstyleException, propagation increases. Element CheckstyleException growth represents less than 5% (1,167/24,673) of the total growth of the MAX variant while its growth is more than 10% (14,534/144,439) for the SUM variant. It is observed that logically the SUM  46  variant has a higher change propagation since it adds the cells with indirect dependencies, whereas MAX choses only the maximum.  MAX element 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 length ∑  1 144 16 28 17 14 203 16 21 2 51 2 154 8 72 6 84 99 21 21 8 39 22 110 47 47 7 150 51 48 2851 85 5635  2 109 6 5 34 32 135 198 24 9 46 9 58 24 183 61 71 178 15 35 8 25 202 47 311 86 0 55 34 98 612 51 3612  3 107 1 1 3 17 88 147 188 17 126 17 40 28 136 183 189 108 0 15 0 2 140 15 379 313 0 2 0 75 477 98 4267  4 69 0 0 1 4 71 70 141 97 263 97 203 194 88 136 219 70 0 0 0 0 88 3 196 378 0 0 0 44 253 75 4003  path length 5 6 7 29 27 24 0 0 0 0 0 0 0 0 0 1 0 0 29 27 24 56 28 27 70 56 28 73 36 33 241 121 65 73 36 33 238 206 110 232 115 65 71 29 27 88 71 29 128 80 35 61 53 27 0 0 0 0 0 0 0 0 0 0 0 0 71 29 27 0 0 0 112 72 29 196 112 72 0 0 0 0 0 0 0 0 0 28 27 24 143 96 29 44 28 27 3058 1921 983  8 4 0 0 0 0 4 24 27 27 34 27 43 34 24 27 27 24 0 0 0 0 24 0 27 29 0 0 0 4 27 24 531  9 0 0 0 0 0 0 4 24 27 27 27 33 27 4 24 27 4 0 0 0 0 4 0 24 27 0 0 0 0 24 4 339  10 0 0 0 0 0 0 0 4 24 27 24 27 27 0 4 24 0 0 0 0 0 0 0 4 24 0 0 0 0 4 0 197  11 0 0 0 0 0 0 0 0 4 24 4 27 24 0 0 4 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 91  12 0 0 0 0 0 0 0 0 0 4 0 24 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 32  13 ∑ length 0 513 0 23 0 34 0 55 0 68 0 581 0 570 0 583 0 349 0 1029 0 349 4 1167 0 782 0 634 0 629 0 888 0 624 0 36 0 71 0 16 0 66 0 607 0 175 0 1201 0 1288 0 7 0 207 0 85 0 348 0 4516 0 436 4 24673  Table 3-3 MAX change propagation per path length for checkstyle-5.0, scaled by path length  47  SUM path length element 1 2 3 4 5 6 7 8 9 240 144 111 172 169 629 952 330 24 0 241 16 7 2 0 0 0 0 0 0 242 28 5 1 0 0 0 0 0 0 243 17 34 3 1 0 0 0 0 0 244 14 32 17 4 1 0 0 0 0 245 203 135 153 170 629 952 330 24 0 246 16 218 148 136 154 627 952 330 24 247 21 32 202 142 136 154 627 952 330 248 2 10 23 104 74 81 123 623 952 249 51 48 147 299 327 333 858 1702 1905 250 2 10 23 104 74 81 123 623 952 251 154 75 93 377 548 542 594 1700 3896 252 8 26 40 226 246 210 235 750 1575 253 72 186 136 153 170 629 952 330 24 254 6 61 186 136 153 170 629 952 330 255 84 71 206 224 221 250 752 1575 1282 256 99 183 202 309 955 1772 1286 354 24 257 21 15 0 0 0 0 0 0 0 258 21 36 16 0 0 0 0 0 0 259 8 8 0 0 0 0 0 0 0 260 39 30 2 0 0 0 0 0 0 261 22 205 141 154 170 629 952 330 24 262 110 86 38 6 0 0 0 0 0 263 47 324 450 351 335 800 1581 1282 354 264 47 114 579 620 493 489 1427 2533 1612 265 7 0 0 0 0 0 0 0 0 266 150 56 2 0 0 0 0 0 0 267 51 34 0 0 0 0 0 0 0 268 48 98 140 142 627 952 330 24 0 269 2851 890 1064 2366 5472 6893 4830 2034 402 270 85 54 100 140 142 627 952 330 24 ∑ length 5635 4324 6940 13246 22381 26110 22096 18120 14080  10 11 12 13 ∑ length 0 0 0 0 2531 0 0 0 0 25 0 0 0 0 34 0 0 0 0 55 0 0 0 0 68 0 0 0 0 2596 0 0 0 0 2605 24 0 0 0 2620 330 24 0 0 2346 1306 354 24 0 7354 330 24 0 0 2346 4139 1990 402 24 14534 1282 354 24 0 4976 0 0 0 0 2652 24 0 0 0 2647 354 24 0 0 5043 0 0 0 0 5184 0 0 0 0 36 0 0 0 0 73 0 0 0 0 16 0 0 0 0 71 0 0 0 0 2627 0 0 0 0 240 24 0 0 0 5548 378 24 0 0 8316 0 0 0 0 7 0 0 0 0 208 0 0 0 0 85 0 0 0 0 2361 24 0 0 0 26826 0 0 0 0 2454 8239 2794 450 24 144,439  Table 3-4 SUM change propagation evolution per path leength for checkstyle-5.0, scaled by  These small differences increase because the SUM variant adds the parallel paths in contrast to the calculation using the MAX variant where the calculation is based on the maximum value, as illustrated by Table 3-6. This is visible from paths of length 2 where a small increase in the SUM variant in the dependencies (e.g., element 251, CheckstyleException) results in a change propagation 17 units higher, detail in Table 3-6. For example, the difference between the SUM and the MAX variant grows 53 for paths of length 3, 174 in paths of length 4, to 310 in paths of length 5, to 336 in paths of length 6 and so on. These small differences are increased when paths of higher lengths are estimated. Therefore, we can conclude that the SUM is more sensible for a higher number of parallel paths in which the following possibility is implemented:  48  PRL01.The higher the number of parallel paths as indirect dependencies from one element to another exist a higher number in the modified dot product should result.  The increase detected in Checkstyle versions 5.0 to 5.6 could indicate that the SUM variant has a high sensitivity to the increase of the number parallel paths and that it does give a masked value of change propagation. Additionally, as the depth in the propagation increase there is less chance that the effects of the change in one element that provides dependencies can affect the rest of the elements with longer paths (Kazman 2012). In order to reflect this premise the Divide By Propagation Length variant implements in the proposition:  VAR7. The resulting indirect dependency should the sum divided by the a denominator that decreases the result when the length of the path, or i, in equation (2) changes to:  n  n 1  V  i 1  n   D j 1 k 1  i jk  i  It can be observed in Figure 3-11 that the Divide by Propagation Length variant that appears as DIVPOW reduces the gross change propagation V for Checkstyle versions 5.0 to 5.6 which maintains more similarities with the measures obtained by Bouwers (2011). For this reasons, we conclude that despite the SUM variation might estimate a high change propagation it is still useful for a sensitivity analysis as that shown in Table 3-2 and Table 3-3 were element 249, Configuration, 251, CheckstyleException, 255, TextBlock, 256, Utils, 263, LocalizedMessage, and 264, SeverityLevel are elements that record approximately 50% of the change propagation (72,805/144,439) in just 3% of the elements. Elements 249, 251, 255 and 264 have their highest partial change propagation in paths of length 8 and concentrate most of their partial change  49  propagation in the mention paths lengths, with 71, 83, 80, 64 and 55% of their total change propagation. When the Divide by Propagation Length variant diminishes the change propagation proportionally to the distance of indirect dependency dividing by the path length (modified dot product power) the gross change propagation is diminished as illustrated byTable 3-5.  DIVPOW element 48 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 ∑ length  1 0 144 16 28 17 14 203 16 21 2 51 2 154 8 72 6 84 99 21 21 8 39 22 110 47 47 7 150 51 48 2851 85 4444  2 98 56 4 3 17 16 68 109 16 5 24 5 38 13 93 31 36 92 8 18 4 15 103 43 162 57 0 28 17 49 445 27 2159  3 92 57 1 0 1 6 51 49 67 8 49 8 31 13 45 62 69 67 0 5 0 1 47 13 150 193 0 1 0 47 355 33 2313  4 319 42 0 0 0 1 43 34 36 26 75 26 94 57 38 34 56 77 0 0 0 0 39 2 88 155 0 0 0 36 592 35 3312  path length 5 6 7 703 709 353 126 159 47 0 0 0 0 0 0 0 0 0 0 0 0 126 159 47 31 105 136 27 26 90 15 14 18 65 56 123 15 14 18 110 90 85 49 35 34 34 105 136 31 28 90 44 42 107 191 295 184 0 0 0 0 0 0 0 0 0 0 0 0 34 105 136 0 0 0 67 133 226 99 82 204 0 0 0 0 0 0 0 0 0 125 159 47 1094 1149 690 28 105 136 4476 4352 3157  8 77 3 0 0 0 0 3 41 119 78 213 78 213 94 41 119 197 44 0 0 0 0 41 0 160 317 0 0 0 3 254 41 2265  9 4 0 0 0 0 0 0 3 37 106 212 106 433 175 3 37 142 3 0 0 0 0 3 0 39 179 0 0 0 0 45 3 1564  10 0 0 0 0 0 0 0 0 2 33 131 33 414 128 0 2 35 0 0 0 0 0 0 0 2 38 0 0 0 0 2 0 824  11 0 0 0 0 0 0 0 0 0 2 32 2 181 32 0 0 2 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 254  Table 3-5 Divide by Length (DIVPOW)  50  12 0 0 0 0 0 0 0 0 0 0 2 0 34 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 38  13 ∑ length 0 2355 0 634 0 20 0 31 0 35 0 37 0 699 0 524 0 440 0 305 0 1031 0 305 2 1877 0 640 0 567 0 439 0 814 0 1052 0 29 0 44 0 12 0 55 0 529 0 167 0 1075 0 1372 0 7 0 179 0 68 0 513 0 7477 0 493 2 29,158  col row  Variant SUM 251 MAX 251  col row  Variant SUM 251 MAX 251 SUM-MAX  Variant  col row  SUM 251 MAX 251 SUM-MAX  Variant  col row  SUM 251 MAX 251 SUM-MAX 81  Variant  col row  SUM 251 MAX 251 SUM-MAX  Variant  col row  SUM 251 MAX 251 SUM-MAX  Length 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 … 0 0 0 0 0 1 14 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 14 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  ∑ 154 154  Length 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 …  ∑  0 0 2 2 2 3 0 0 0 0 0 0 0 0 2 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 0 0 0 0 0 0 0 0 2 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0  75 58 17  Length 3 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 …  ∑  0 0 3 3 3 3 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2  93 40 53  Length 4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 …  ∑  0 0 1 1 1 1 0 0 0 0 0 0 5 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 2 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3 Length 5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 … 0 0 0 2 2 0 0 0 0 0 0 0 4 0 5 5 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 1 1 0 0 0 0 0 0 0 1 0 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Length 6 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 … 0 0 0 3 3 0 0 0 0 0 0 0 1 0 4 4 1 6 57 9 9 11 11 9 6 6 6 6 9 6 9 9 13 9 9 9 9 11 6 9 6 9 6 5 0 0 0 1 1 0 0 0 0 0 0 0 1 0 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 0 3 3 0 4 55 7 7 9 9 7 4 4 4 4 7 4 7 7 11 7 7 7 7 9 4 7 4 7 4 4  Table 3-6 Increase of change propagation for checkstyle 5.0 MAX vs SUM  51  377 203 174  ∑ 548 238 310  ∑ 542 206 336  4 Deriving Value for Software Architecture from User-visible Concerns 4.1 Introduction In this chapter we report our attempt to define a metric for software architecture value, metric that we could not fully validate, and that we abandoned in favor of an improved propagation cost approach. This chapter describes the approach to estimate the value of systems’ software architectural design by mapping value from user-visible concerns.  One of the challenges in the software development industry is that the development new functionality and maintenance cost and effort increase because the complexity has grown. Design or redesign of software architecture helps to reduce the complexity allows for efficient and scalable development. Despite common knowledge of resource allocation to architectural design tasks is widespread in organizations benefits are not visible and quantifiable, and the cost of rearchitecting is seem as a deterrent. This deters from a proactive and frequent scheduling for software architectural effort allocation. Most software architectural effort allocation occurs reactively when the increase in cost per functional unit is noticeably high and when technical debt has accumulated into the software system. In order to estimate software architecture value it would be useful to have estimation techniques that can help organizations to quantify their value to support effort allocation decision-making process.  For this purpose we tried another approach that departs from computing a cost propagation. This alternative proposal is based on directly mapping value from user-visible concerns, such 52  as features into user-invisible but organizational and technical value concerns such as architectural elements and their development tasks. However, this approach proved difficult for us to validate due to the difficulty to produce a proper validation strategy and because of the time and resources that would have been needed.  4.2 Value for Invisible Elements in the Backlog Kruchten (2010) proposes a value system framework in order to classify and prioritize the different elements in a backlog. Kruchten suggests that software development activities can be classified in four categories, within two dimensions: visibility and value as shown in Figure 4-1. In the visibility dimension, there are user-visible and user-invisible development activity types. Features are visible to the client or end-user whereas invisible architectural elements are hidden to the end-user.  Figure 4-1 value system framework  To use an analogy, architectural elements are concealed in the same way as the engine, chassis and other structural elements are hidden to the driver. All that the driver is concerned with are the features realized in the parts of the car that the driver has contact with or the car 53  interface. The car designers and manufacturers have a knowledge and perception of the car infrastructure and can better visualize tasks and the infrastructure involved in the production of future car models. The infrastructure knowledge supports their design decision-making, time, cost and effort estimation in the long-term.  In the same way, in software development infrastructure, design tasks which include the rationales for design decisions, architectural designs support the technical and organizational decision-making progress. This design and planning provide with modularization, layers, standardization, abstraction, reuse and a reduction of code dependencies. A software architectural design provide with standards, interfaces and a framework to implement features and following these guidelines dependency is abstracted and minimized (Lakos 1996) and thus reducing code dependencies and the probability of change in the rest of the module components.  The value dimension distinguishes development tasks that involve a positive or negative value to the system. In the previous paragraph the functional and architectural tasks are used as visible and invisible examples correspondingly, both adding positive value to the system. On the other hand, there are potentially code implementations with side effect aspects in a software system which, if not corrected, will continue to or could have a negative impact on the system. Defect correction tasks do not add value per se, but including these types of tasks in the schedule will avoid unexpected effects, such as involuntary behavior, that decrease the value of the system. For instance, the negative value of a defect that causes incorrect results might require additional effort from the user. Technical debt is a metaphor that refers to elements that cause an increase in effort and that are perceived as a consequence of suboptimal implementation. Technical debt are code side-effects that result in a negative quality in the system in the present. Furthermore, this technical debt will cause 54  stronger long-term negative effects on future development effort. Software that is built on top of suboptimal, technical debt ridden structures will incur in costs for removal and reworking (Nord 2012). In this sense, the risk that technical debt poses on the system is high.  Kruchten (2010) argues that including all categories of tasks inside the backlog and giving them a value allows for their direct and active prioritization and management. Wiegers (1999) proposes that scheduling priorities based on feature value and implicitly most scheduling approaches have in their backlog formed by functional requirements. In the same sense, agile methods advocate for the delivery of value to the client as one of their main tenets (Beck et al. 2001). Kruchten suggests that including all the types of tasks in the backlog and estimating their value can help them to be scheduled quantitatively rather than qualitatively. This gives more clarity and substance to aspects that subtract value and that are hard to perceive by end-users and stakeholders and organizational managers who make decisions on software development resource allocation.  4.2.1 Benefit for Architectural Elements We attempted to derive value for user-invisible architectural elements and its tasks from the visible activity types it supports. We use the value formula from Wiegers (1999) which is similar to a benefit-cost ratio, and other considerations, and we assume that the cost for invisible elements can be estimated by the software development team. In a similar way Ojala (2008) proposes that benefits can be assessed on the basis that the business and/or operational side of organizations have a better perspective on the benefits that a given architectural element might bestow on a system based on the quality attributes it supports. The approach in this chapter estimates the benefit as a relationship of the features and nonfunctional requirements.  55  The challenge for estimating value for invisible development types is that their benefit is intangible or abstract. Denne (2004) proposes to incrementally implement the architectural structure by incrementally including and refactoring architectural elements in the release plan. The approach in this chapter aims to estimate a value for software architectural elements in order to prioritize them by mapping, inferring and making explicit its cost and benefits. By doing so, development teams will have a better visibility and control of the benefits and costs and organizations can adapt their strategies accordingly considering other concerns such as revenue in time and net present value principles as Denne suggests.  An architectural element’s benefits are difficult to estimate or intangible for an organization; however, the architectural elements approximate cost can be estimated to some degree. Software development team with experience and knowledge can make educated guesses about the effort needed to implement architectural elements.  Bachmann, Bass, and Nord (2007) cost-benefit analysis method (CBAM) estimates the architectural elements value based on the judgments of stakeholders and technical staff. CBAM is a complete and extensive method that requires an intensive effort from experienced software architects and requires commitment from business stakeholders. Hence, costbenefit analysis method requires an extensive effort. The approach proposed in this thesis is an attempt to provide a lighter method for estimating the value of architectural elements.  4.3 Mapping Benefit from Features to Architectural Elements Since architectural elements provide benefits to features, this study models these relationships as constraints used to estimate value. The purpose of the mapping is to  56  estimate a value for architectural elements based on the features it supports as illustrated by Figure 4-2. It is important to assign a value for scheduling, reducing technical debt, to reduce cost, to be able to react more quickly to changes in the commercial, technical and social environment. Visible feat A  Visible feat B  Visible feat D  Visible feat E  Benefit: 10  Benefit: 7  Benefit: 3  Benefit: 16  Cost: 2  Cost: 1  Cost: 1  Cost: 2  Invisible ArchEl C  Invisible ArchEl F  Benefit: ?  Benefit: ?  Cost: 2  Cost: 2  Figure 4-2 Mapping of features to architectural elements  For simplicity we consider that a system is composed of a set of features:  S  {f1, f2, fn} (4) and supported by a set of architectural elements:  AE   AE1, AE2 ,  , AE n  (5)  Whether a feature, sub-feature, or code artifact is implemented irrelevant since these elements are considered visible elements that have benefits, costs and consequently a value. For simplicity all features, sub-features, and tasks are considered features in this approach. The value of an architectural element is derived from the features it supports by bestowing infrastructure services such as the implementation of common technical tasks inside the  57  system. The visible value of a system is understood by the stakeholders in an organization. The invisible value of architectural elements can be understood and consequently assessed if there is an explicit mapping to the architectural elements and quality attributes they support. The architectural element derived value, DV(AEi), is calculated in two parts; a) the maximum value of the features it supports b) the portion of visible value the system architectural elements realize. where i ( F )  F is the subset of features are supported by AEi , for this reason the derived value for AEi, DV(AEi) is expressed by:  DV(AEi )  max Vi ( F )     Vi ( F )  (6)  V ( F )   The maximum value from the supported features is to give a value that will be slightly higher than any of the supported elements. Value of an architectural element is also a function of the portion of the value it supports to the total visible value in the system.  In order to determine the value of an architectural element, this study departs from the notion that its value should ideally reflect the provided invisible sub-functionality, standards, interfaces, patterns and other technical infrastructure aspects provided by the architectural elements. However, during the initial design phase there are no implemented code artifacts, and a notion of its ratio is lacking. Since the goal is to provide a light-weight metric, the following ideas for determining the architectural element value were considered:  a) A heuristic percentage of the value of each feature supported, e.g. 10%  However, this would imply that if the total value of an architectural element was less than the value of any of its supported features, the feature(s) with value greater than the architectural  58  element would be implemented first. If this rationale is compared to building construction, then if wall decorations in a bathroom have greater value than the plumbing inside it. Then the decorations would be scheduled before plumbing. Implementing tasks that depend on hidden infrastructures result in greater costs at the end of the project life cycle. Part of the decoration would have to be removed to reinstall the plumbing followed by a wall reconstruction and finishing (Nord 2012). Accordingly, the benefit for a given architectural element should be greater than any of the elements it supports:  b) The value of an architectural element is greater than any of the elements it supports.  the maximum value from the supported elements plus this calculation of value based on the ratio of the sum of the supported elements’ value and the sum of the visible elements value in the system an architectural value is expressed in equation (6).  4.3.1 Scheduling The value obtained is used for scheduling in which all of the features, sub-features and architectural elements are part of the backlog B  B  {F1 , SF1_1 , F 2,..., FN , AE1 , AE2 ,..., AEN } (7)  R  R1 , R 2 , , R n  (8) I  I1 , I2 , , I n  (9) Ii  Fi , ...,Fj , , SFi , ...SFj , AEi ,.., AE j (10) Iteration task value is used to prioritize and allocate tasks into a project schedule. The iteration length is set in terms of the time a development team is available. For instance, if a one week iteration is desired with a team of 5 members, and each week is 40 hours, then  59  each iteration should have a value of 400 hours. Given that the cost of the elements is known, and that the cost is considered as the required time or effort, cost is used as a capacity factor and can be quantified to determine how many tasks assigned to an iteration.  4.3.2 Implementation This study used the Agile Planner for Digital Tabletop (Wang, Ghanam, and Maurer 2009) developed by the Agile Surface Engineering laboratory from University of Calgary. This software was developed for surface tabletops supporting agile project planning. This software was used to be used by distributed teams in an agile project planning. This thesis intended to use the agile planner for digital tabletop to express the code artifacts and the architectural elements supporting them in order to map the value from them. An infrared sensor and an infrared pen were added to the software in order to allow the digital tabletop features to be available when the tool was projected on a screen.  One of the challenges of implementation was to create an algorithm capable of determining the “levels” within the hierarchical structure of dependencies. For this reason, a searching algorithm based on the depth-first search algorithm (Tarjan 1972) was used to create all of the possible paths and thus to obtain the longest sequence of chained dependencies. This sequence was used to determine how many levels a software hierarchical structure has. One a hierarchical structure was obtained.  The value-deriving approach in this chapter used the value estimation proposed by Wiegers (1999), which expands benefit to cost into four elements, benefit plus penalty divided by the sum of cost plus risk as:  value   (benefit * BW  penalty * PW ) (cos t * CW  risk * RW ) 60  (11)  This definition is used to reflect aspects not considered as value directly to the client but as environmental conditions that encourage or hinder the implementation features or architectural elements in the backlog. Also, Wiegers (1999) definition includes different weighing factors that can be used to indicate that in certain conditions benefits are preferred than others. For instance, if benefits are more important than penalties then a greater weight can be given to benefit (BW). Or for instance in a highly regulated environment such as software for medical devices penalties of not implementing a regulation could have a greater weight than implementing a new feature. For simplicity in this study all the weights have been set to 1. The value of an element (feature or architectural element) is used to prioritize tasks in scheduling. For this reason, the result is referred as priority rather than value in the estimation application shown in the next page. Also, because in other literature such as () it is The estimated derived or mapped value or priority is referred as constrained priority since its used as a priority that is constrained to be implemented first by the dependencies The derived value is referred as constrained priority since its used as a priority that is constrained to be implemented first by the dependencies modeled as constraints (Ruhe 2005; Greer 2004). Since the values are used for prioritizing and to display them as integers priority and constrained priority are multiplied by a factor of 10,000 to obtain an integer from the smallest possible value according equation (11).  After the longest paths are calculated, levels are assigned to each element. The lowest level l is assigned to elements that do not provide any dependencies or that can be considered as the visible elements. The value assigning algorithm starts scanning the elements assigned at level l+1, l+2 and assigns them with another value termed constrained priority. The value that takes into account dependencies was named constrained priority since value is used for prioritizing and constrained because dependencies are used to constrain which element  61  should be scheduled first. The values calculated for a hypothetical system are shown in Figure 4-3.  Figure 4-3 Priority(value) estimated for Architectural Elements (yellow)  For instance for element C which has only cost has only cost: C (AE) benefit ? Penalty ? cost 10 risk ? Priority ? (value) Constrained 59181 Priority Table 4-1 Architectural Element before mapping  62  from the chain of dependencies value is derived, from instance visible element J has the following values: J (feature) benefit Penalty cost risk Priority (value)  4 5 1 1 4.5*10,000=45,000 max(45,000, 12500) + (12,500+45,000)/(165,272)*10,000 =48,479  Constrained Priority  Table 4-2 Feature after mapping  The derived value contained in constrained priority is mapped all the way down to element C, which has no benefits and no priority, but does have a constrained priority. Similarly for the other elements with no benefit, the prioritization process produces the following results:  Priority Priority Name Sign Visible Benefit Penalty Cost Risk Constrained 78038 67860 66868 64070 57836 57231 56918 56918 53671 50609 50609 50609 50609 50609 47722 12500 10000 8235 4375  0 0 0 0 0 0 10000 7272 10000 5000 15555 14000 23333 0 45000 12500 10000 8235 4375  K D L M H C P N O I A B E Z J R F G Q  + + + + + + + + + + + + + + + + + + + 63  False False False False False False True True True True True True True False True True True True True  0 0 0 0 0 0 7 7 5 3 9 8 6 0 4 7 1 5 3  0 0 0 0 0 0 2 1 1 6 5 6 1 0 5 3 1 9 4  1 4 2 5 3 1 2 7 5 9 8 7 2 2 1 7 1 8 9  6 0 1 1 1 0 7 4 1 9 1 3 1 0 1 1 1 9 7  Table 4-3 Development element types prioritized for scheduling  Most of the architectural elements in the system were assigned with the greatest derived value or constrained priority, architectural elements K, D, L, M, H and C where assigned with the greatest priority for scheduling. In the same fashion all of the other elements in the backlog are assigned based on the constrained priority. The iteration length is set to a week, each week is supposed to have 5 workdays and since in this example there is only one developer the week has 40 hours available. The time needed to implement an element is that in the field “most likely” which in the case of architectural element K is 33 hours, the additional time is 33 but only the square value is used for scheduling. Therefore, the remaining time for week 1 is not 7, but 7 minus square root of 6 which is 1.29.  Figure 4-4 Tasks assigned to iterations  64  4.4 Conclusion While this approach allowed mapping value to architectural elements with a benefit that is not directly quantifiable, it had the main disadvantage that while intuitive is cumbersome to validate. As more architectural elements are included in the design to support the system’s user-visible functional elements the more value the supporting architectural elements will have. While this approach is intuitive it faces the problem of knowing how much effort to allocate to design, and whether this effort is aligned with the best value for a system (Baldwin and Clark 2000). This is because improvements can be done almost indefinitely while consuming development resources since architectural benefit is unknown. Also another difficulty was to compare the software architecture improved system and schedule to a baseline with no architectural elements. At first instance, this could involve two development teams for the same software system, one team developing an architecturally improved system and another developing a no-architectured system. We were looking at projects with more than six months and with at least four full-time developers. This validation approach requires funding and/or organizations promoting a project would consume critical development resources. For this reason, we decided to try a different the cost propagation metric (MacCormack 2008) in Chapter 5. .  65  5 Dealing with Dependency Strength 5.1 Introduction This section describes the first step we took in our attempt to improve the MacCormack’s (2008) propagation cost metric, to make it more pertinent to calculate and compare different architectural strategies. MacCormack’s original metric is based in crude binary dependencies but this does reflect well the reality of software systems, whose components or modules can be more or less dependent on each other. This study proposes a modified propagation cost metric in which the existence of dependencies and their relative strengths are used to estimate a modified propagation cost. This approach presents challenges because the original propagation cost calculation employs Boolean values to characterize module dependencies and algebraic operations to estimate value for software architectural design. Replacing the Boolean values and algebraic operations with numbers does not follow the logic of the original change propagation model and alternatives are analyzed in this section.  5.2 Propagation Cost Rather than using Boolean descriptors, the use of strengths to describe dependencies as proposed by MacCormack (2008), in a design structure matrix allows more information to differentiate and qualify the relationships among modules to be captured. For instance, in systems there may be modules with a low number dependencies between modules; while other modules have the majority of the dependencies in the system. Therefore, modules that have dependencies with more strength play a more critical role when modifications are made to them: the associated change propagation potentially affects more elements than do the weaker dependencies that exists between modules.  66  Steward (1981) proposed the use of design structure matrices as support for design. Steward suggested that expressing system interdependencies in a design structure matrix was useful to understand interdependencies and precedence. Having these precedence dependencies in a matrix helps to partition a design structure matrix and identify groups of elements that are closely related elements that might need taking into consideration when making designdecisions  Propagation cost estimation (MacCormack 2008) in equation (12) measures potential change, by adding up all the directly and indirectly dependency propagation in the visibility matrix, V for all modules. Element Vij represents the dependency between module i and module j. This addition is normalized by the number of all possible model relationships for a system with n elements, such number is n2. n  PC   n  Vij j 1 i 1 2  n 1  where  (12)  n  V  D i 0  i  (13)  The estimation of the visibility matrix, V, is explained in the next section.  5.2.1 Limitation of Boolean Dependencies However, incorporating dependency strengths also poses the challenge on how to estimate the visibility matrix (MacCormack 2008; Sharman and Yassine 2004). Visibility is the sum of all the matrix powers (Warfield 1973), which calculates the indirect dependencies in is more commonly expressed as a one-bit binary fashion equivalent to Boolean values. Each matrix  67  power contains a path with a length of that power; where the longest path for a system expressed in a design structure matrix, Dnxn, with n modules is n-1. The visibility is initially defined by MacCormack (2008) as in equation (13); however, this study has found that D0, or the identity matrix, introduces cycles because it implies an element referencing itself (Frost 1992). Therefore, any dependency to an element that depends on itself will produce a cycle in the same way Frost (1992) mentions that eliminating the diagonal at every Di reduces the effects on cycling. For this reason, this thesis proposes to modify visibility as in equation(14): n 1  V  D i 1  i  (14)  In order to calculate each path of length i, or power i of a design structure matrix, D, the dot product is used to obtain each Di :  Di  D  Di 1  (15)  This dot product obtains for each cell in the matrix a one-bit binary number indicating whether an indirect dependency exists, with a 1, or does not exist, with a 0. This indirect dependency is estimated from two direct dependencies x and y as illustrated in Figure 5-1, and is expressed as a function of x and y. Where x is the one-bit binary dependency descriptor from component A to B, and y is the dependency from B to C. When shifting to weighed dependencies in the design structure matrix of a system the direct multiplication in the standard dot product scales direct dependencies x and y. While this might be possible for a given case, it might not be general for all indirect dependencies. Intuitively there is the need for a modified dot product operation that has a more general estimation for indirect dependency in the static structure of a system.  68  Figure 5-1 Indirect dependency from A to C  5.2.2 Modified Dot Product The dot product is used to estimate the visibility matrix, V, for a design structure matrix as in equation(). The definition of a matrix multiplication: Q=R  S where  for every cell of Q is obtained as: n  Q[i, j ]   R[i, k ]* S[k , j ] k 1  (0.16)  For this matter, equation (0.16) is expressed into its Boolean algebraic equivalent as: n  Qi , j   Ri ,k  Sk , j k 1  Where  (0.17)   , or series operator is a conjunction and  operator is the disjunction algebraic  operation. Therefore, new operators have to be propose in order to obtain Q when Ri,k and Sk,j are different than Boolean or its one-bit binary number representation.  5.2.3 Parallel Operator  To estimate the dependency for a module that has more than one indirect dependency to other modules the parallel operation is used, which is performed by an addition. For one-bit binary values, Boolean algebraic operations are used and for weighed dependencies another approach is needed. For instance, to estimate the indirect dependency from module A to C, QAC is expressed as illustrated by Figure 5-2 where element A, the dependency consumer,  69  depends at the same time on elements D, E and B, which both depend on C. To obtain QA,C a parallel operation is and can be expressed as:  i. direct dependencies  ii. series operations  iii. parallel operation  QA,C= parallel (series(x, w), series(x, y), series(x, z)) QA,C=   ( (w  t) ,(x  y), (z  u)) QA,C=   ( e , f, g)  Figure 5-2 Parallel Operator,    5.3 Change Propagation Analysis The modified change propagation metric intends to measure the propagation of a given change into the rest of the structure. This modified change propagation metric uses a visibility matrix (MacCormack 2008; Sharman and Yassine 2004) obtained by powering the design structure matrix (Warfield 1973). Since the dependencies are changing from one-bit binary to weighed (strength) in the design structure matrix a new modified product is needed in order to correctly or approximately reflect the propagation. In order to find a modified dot product that reflects the change propagation in a system the following possibilities for parallel paths are explored:  70  PRL01.The higher the number of parallel paths as indirect dependencies from one element to another exist a higher number in the modified dot product should result.  Another possibility contrary to PRL01 is:  PRL02. Regardless of the number of parallel paths for a given indirect dependency, the calculation of the modified dot product should not be affected.  Concerning the result of the change propagation, which is the sum of all the matrix powers:  CP01. The change propagation calculation before and after the redesign of a system should result in a decreased change propagation.  Another possibility contrary to CP01:  CP02. If the change propagation calculation before and after the redesign does not decrease, the value of such a redesign can be determined in a per release fashion as in (Nord 2012).  5.4 Change Propagation Estimation Proposed Variants In order to validate our assumptions on how to estimate the change propagation metric we describe variants for how dependencies with strength propagate in a system. We propose the Sum Parallel Paths variant to support PRL01, and variants Divide by System Size and Divide by Propagation Length variants to support PRL02.  71  This thesis uses the approaches proposed by Kruchten et al. (2012) and extended variants on how the estimated indirect dependencies should be combined to estimate one indirect dependency from one module to another:  VAR1. The total indirect dependency estimate should increase when more parallel indirect dependencies exist. Sum all Parallel Paths variant where   (e, f, g) = (e + f + g)  VAR2. The total indirect dependency estimate should consider only the maximum parallel indirect dependency out of all dependencies. Max from Parallel Paths variant where   (e, f, g) = MAX ( e, f, g)  VAR3. The total indirect dependency estimate should average all the parallel indirect dependencies. Average Parallel Paths variant where the:   (e, f, g) = (e + f + g) / count(e, f, g)  VAR4. The total indirect dependency estimate should take the maximum of the parallel indirect dependencies plus the minimum divided by a constant. Max Parallel Paths Plus variant where the:   (e, f, g) = MAX ( e, f, g) + MIN(e, f, g) / c  One of the purposes of layered architectural styles is to encapsulate details of a group of code artifacts, such as classes and packages by an interface. This is an example of the detail hiding principle (Parnas 1972), where a complex structure is simplified allowing developers to refer to the interface instead of needing to know the inside details of the group of elements in a layer. The original dot product propagates dependencies without taking into account that  72  the intent of a layer is to minimize complexity. It is true that a change in the internals of the layer could affect the functionality, however; the interface is an agreement that the functionality will remain intact while the details that implement it would produce the same or a similar usable result. Also introducing a layer might be considered a priori as an increase of complexity because it increases the number of elements in a system. However, the intent of introducing another module is to reduce the complexity and also the number of dependencies in the static structure. Taking into consideration the use of layers in the structure of software systems we propose the following variants:  VAR5. The total indirect dependency estimate should add all the indirect parallel dependencies and divided them by the numbers of elements in the system. Divide by System Size variant where the:   (e, f, g) = (e + f +g) / n; where n is the  size of Dnxn  VAR6. The total indirect dependency estimate should add the parallel indirect dependencies and divides it by the a denominator that decreases the result when the length of the path, or i, in equation (2) changes Divide by 1.1 variant where the:   (e, f, g) = (e + f + g) / 1.1; where i is the power of Di  VAR7. The total indirect dependency estimate should the sum divided by the a denominator that decreases the result when the length of the path, or i, in equation (2) changes Divide by Propagation Length variant where the: from Di n  n 1  V   n   D j 1 k 1  i 1  73  i  i jk  (18)   (e, f, g) = (e +f + g)/i; where i is  Variant Divide by propagation Length is also based on the observations of (Kazman 2012).  5.5 Validating the 7 Variants In order to examine the behavior of the modified change propagation variants with parallel paths the following hypothetical systems are analyzed. In (Kruchten et al. 2012) we propose using a strength between [0..1]. Similarly, for simplicity in this section strengths are exemplified by three sample sizes: large=1, medium=0.5, small=0.1, and no dependency = 0 To simplify the analysis at this stage this study uses all dependency strengths set to large (1). One consideration is setting all strengths to large before and after redesign adds another level when a mediating or abstracting element is added. This causes parallel propagation estimations SUM, MAX, from the entire system will increase; however, the variants 5, 6 and 7 take into account that layers and interfaces minimize communication and simplify interaction with the elements in a layer. The improvement in the design will be visible as: i)  reduction number of modules depending on other modules  ii)  reduction in the strength of the direct dependencies for interfaces  iii)  reduction of overall maintainability cost,  the more visible improvement in organizations is maintainability cost reduction. To visualize maintainability cost reduction then at least two alternative designs have to be compared in a set of releases and multiply the dependency matrix by the cost vector similar to the work in (Nord 2012). We have used a set of small but typical cases of module configuration: Case 1.  Two parallel paths with the same destination  Case 2.  Two parallel paths with the two destinations  Case 3.  Layered Structure with multiple parallel paths version 1  74  Case 4.  Layered Structure with multiple parallel paths version 2  5.5.1 Case 1: Two Parallel Paths for the Same Destination In order to examine the behavior of the change propagation variants with parallel paths a hypothetical system is analyzed. This system has a dependency with two parallel paths as it is shown in Figure 5-3.  a) graph  b) DSM * multiplied by 10  Figure 5-3 System with two parallel paths (strength to be divided by 10*)  *Note: Dependency strengths in these examples appear multiplied by 10 and are divided by 10 afterwards. These steps are needed to express strengths lesser than the unit since the analysis tool only captures integer for dependencies. For instance, to express a medium strength dependency, the dependency is captured as 5 and later divided by 10.  The modified change propagation is obtained by estimating the visibility matrix, the longest paths for this system are of length 2. The dependency for paths of length 2 from A to E is formed by parallel(series(A,C,E)) series(A,D,E) which are expressed with the paths ACE, ADE in the cell that describes the dependency from A  75  to E. The indirect dependency  between elements A and E of length 2 is shown in cell crossed by column A and row E of matrix M2 and has the paths shown in Table 5-1.  M A B C D  A X 0 1 1  B 0 X 0 1  C 0 0 X 0  D 0 0 0 X  E 0 0 0 0  F 0 0 0 0  E  0  0  1  1 X 0  F  0  0  0  1 0 X  M A A X B 0 C 1 D 1 E 0 F 0 M2 A A X B 0 C 0 D 0 ACE E ADE F ADF  B 0 X 0 1 0 0 B 0 X 0 0  C 0 0 X 0 1 0 C 0 0 X 0  D 0 0 0 X 1 1 D 0 0 0 X  E 0 0 0 0 X 0 E 0 0 0 0  F 0 0 0 0 0 X F 0 0 0 0  BDE  0  0  X  0  BDF  0  0  0  X  Table 5-1. Dot Product for system with two parallel paths  The results for each of the variants and using the subvariants are: M2 cell path hop A-E ACE AC= ADE AD= A-F ADF AD= B-E BDE BD= B-F BDF BD= M2 Sum M1 SUM V=M1+M2 V/N V/N^2  series 1 1 1 1 1 1  2 CE= DE= DF= DE= DF=  min(hops)  1 1 1 1 1  1 1 1 1 1  parallel SUM MAX AVER MAX+  Div/n  Div/1.1.  Div/pow  2 1 1 1 1 1 1 1 1 1 1 1 5 4 4 6 6 6 11 10 10 1.83 1.67 1.67 0.306 0.278 0.278  0.33 0.17 0.17 0.17 0.83 6 6.83 1.14 0.19  1.82 0.91 0.91 0.91 4.5 6 10.5 1.8 0.29  1.0 0.50 0.50 0.50 2.5 6 8.5 1.4 0.24  1.5 1 1 1 4.5 6 10.5 1.75 0.292  Table 5-2 Change propagation for variants  Appendix 1 contains the details of the results obtained from the analysis tool (extension on Lattix).  5.5.2 Case 2: Two Parallel Paths for Two Destinations In order to analyze the behavior of the cost propagation variants we want to consider the following hypothetical system with two indirect dependencies  76  a) graph  b) design structure matrix  Figure 5-4 Four parallel paths  It has three layers or levels, its longest paths are of length 2. The next table shows its dot product for power 2, or paths of length two:  M A B C D E  A X 0 1 1 0  B 0 X 0 1 1  C 0 0 X 0 0  D 0 0 0 X 0  E 0 0 0 0 X  F 0 0 0 0 0  G 0 0 0 0 0  F  0  0  1  1 0 X 0  G 0  0  0  1 1 0 X  M A B A X 0 B 0 X C 1 0 D 1 1 E 0 1 F 0 0 G 0 0 M2 A B A X 0 B 0 X C 0 0 D 0 0 E 0 0 ACF F BDF ADF BDG G ADG BEG  C 0 0 X 0 0 1 0 C 0 0 X 0 0  D 0 0 0 X 0 1 1 D 0 0 0 X 0  E 0 0 0 0 X 0 1 E 0 0 0 0 X  F 0 0 0 0 0 X 0 F 0 0 0 0 0  G 0 0 0 0 0 0 X G 0 0 0 0 0  0  0  0  X  0  0  0  0  0  X  Figure 5-5 Dot product for system with four parallel paths  The matrix dot product variant values for this system appears in the following table:  77  POWER 2 N= 7 M2 cell path hop A-F ACF AC= ADF AD= A-G ADG AD= B-F BDF BD= BDG BD= B-G BEG BD= M2 Sum M1 SUM V=M1+M2 V/N V/N^2  series 1 1 1 1 1 1 1  min(hops)  2 CF= DF= GF= DF= DG= EG=  1 1 1 1 1 1  1 1 1 1 1 1  parallel series parallel SUM MAX AVER MAX+ ALL_AVER AVER1_1 AVER_POWR aver(hops) SERS_AVER_AVER_POWR 1 2 1 1 1.5 0.29 1.82 1.0 1 1 1 1 1 1 0.14 0.91 0.50 1 0.5 1 1 1 1 0.14 0.91 0.50 1 0.5 1 2 1 1 1.5 0.29 1.82 1.0 1 1 6 4 4 5 0.86 5.5 3.0 3.0 8 8 8 8 8 8 8 8 14 12 12 13 8.9 13.5 11 11 2.0 1.71 1.71 1.86 1.27 1.9 1.6 1.6 0.29 0.24 0.24 0.27 0.18 0.27 0.22 0.22  Table 5-3 Parallels change propagation  If the parallels where to be removed from the system at least the variants SUM PARALLEL PATHS, MAX, AVER, MAX+ should produce the same result as in Table 5-4:  Figure 5-6 Parallels eliminated M2 series parallel series parallel cell path hop 1 2 min(hops) SUM MAX AVER MAX+ ALL_AVER AVER1_1 AVER_POWR aver(hops) SERS_AVER_AVER_POWR A-F ACF AC= 1 CF= 1 1 1 1 1 1 1 0.14 0.91 0.5 0.5 B-G BEG BD= 1 EG= 1 M2 Sum M1 SUM V=M1+M2 V/N V/N^2  1  1 1 1 2 2 2 6 6 6 8 8 8 1.1 1.14 1.14 0.16 0.16 0.16  1 2 6 8 1.14 0.16  0.14 0.29 6 6.3 0.90 0.13  0.91 1.8 6 7.8 1.1 0.16  0.5 1.0 6 7 1.0 0.14  Table 5-4 Parallels eliminated change propagation  78  1  0.5 1.0 6 7 1.0 0.14  We can observe that in this case the SUM PARALLEL PATHS, MAX, AVER and MAX+ variants behave the same, while the variants, Divide by System Size, Divide by 1.1, and Divide by propagation length, that divide accordingly to the depth of the propagation paths show a lower modified change propagation.  5.5.3 Case 3: Layered Structure with Multiple Parallel Paths Version 1 We consider the following layered structure with parallel paths:  a) Graph  b) DSM Figure 5-7 Layered structure with multiple paths version 1  For this structure paths of length 2 and 3 exist. The dependency strength for all of the relationship is 1 (represented by 10 and divided by 10 later in order to represent values lesser than the unit in the analysis tool). The paths of length two are shown in Table 5-5 that represents the dot product:  79  M2 A B C D E F  A X 0 0 0 0 ACF AC G G AD G  C 0 0 X 0 0 0  D 0 0 0 X 0 0  E 0 0 0 0 X 0  F 0 0 0 0 0 X  G 0 0 0 0 0 0  H 0 0 0 0 0 0  I 0 0 0 0 0 0  J 0 0 0 0 0 0  K 0 0 0 0 0 0  L 0 0 0 0 0 0  M 0 0 0 0 0 0  N 0 0 0 0 0 0  BDG 0  0  0  0  X  0  0  0  0  0  0  0  H ADH  0  0  0  0  X  0  0  0  0  0  I J  BDH 0 BEH 0 BEI 0 0 CFJ  0 0  0 0  0 0  0 0  0 0  X 0  0 X  0 0  0 0  K  0  0  0  0  0  0  0  X  0  L  0  M  0  N  0  B 0 X 0 0 0 0  0  CFK CG DGK K  DGL 0 CGL EHL DHL EHM 0 0 DHM EIM 0 0 0 EIN  M3 A B C D E F G  A X 0 0 0 0 0 0  B 0 X 0 0 0 0 0  C D E F G H I J K 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 X 0 0 0 0 0 0 0 0 0 X 0 0 0 0 0 0 0 0 0 X 0 0 0 0 0 0 0 0 0 X 0 0 0 0 0 0 0 0 0 X 0 0 0 0  H  0  0  0  0 0 0 0 X 0 0 0 0 0 0  0  I J  0 ACFJ  0 0  0 0  0 0 0 0 0 X 0 0 0 0 0 0 0 0 0 0 0 X 0 0 0 0  0 0  0 0  K  ACFK ACGK BDGK ADGK  0  0 0 0 0 0 0 0 X 0 0 0  0  0  ACGL BDGL ADGL BDHL 0 ADHL BEHL BDHM M ADHM BEHM 0 BEIM N 0 BEIN 0 L  0  0  0  0  0  0  X  0  0  0  0  0  0  0  0  0  X  0  0  0  0  0  0  0  0  0  X  L 0 0 0 0 0 0 0  M 0 0 0 0 0 0 0  N 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0 X 0 0  0 0 0 0 0 0 0 0 0 X 0 0 0 0 0 0 0 0 0 0 0 X  Table 5-5 paths of length 2 (M2) and length 3 (M3)  The results for the hypothetical system with the proposed variants are shown in the next table:  path hop 1 M3 Total M2 total M1 Total V=M1+M2+M3 V/N V/N^2  2  3  series parallel min(hops) SUM 16 20 18 54 3.9 0.28  MAX AVER MAX+ DIV_SIZE DIV_1_1 8 8 10 1.1 14.5 15 15 17.5 1.4 18.2 18 18 18 18 18 41 41 45.5 20.6 50.7 2.9 2.9 3.3 1.5 3.6 0.21 0.21 0.23 0.10 0.26  DIV_POW  5.3 10 18 33.3 2.4 0.17  Table 5-6 Change Propagation for “layered structure multiple paths version 1”  We can observe that the modified change propagation variant that has the greatest increase in SUM PARALLEL PATHS, followed by Divide by 1.1. The variants with the lowest propagation are Average by system size and Divide by Propagation Length.  80  5.5.4 Case 4: Layered Structure with Multiple Parallel Paths Version 2 A system with similar dependency structure as in previous section but without the direct dependency from element I to M. This results in less indirect dependencies generated from elements B and E. This is shown in Figure 5-8 in the graph and as a red cell in the design structure matrix of the system. M A B C D E F G H A X 0 0 0 0 0 0 0 B 0 X 0 0 0 0 0 0 C 1 0 X 0 0 0 0 0 D 1 1 0 X 0 0 0 0 E 0 1 0 0 X 0 0 0 F 0 0 1 0 0 X 0 0  a) Graph  I 0 0 0 0 0 0  J 0 0 0 0 0 0  K 0 0 0 0 0 0  L 0 0 0 0 0 0  M 0 0 0 0 0 0  N 0 0 0 0 0 0  G  0  0 1 1 0 0 X 0 0 0 0 0 0 0  H  0  0 0 1 1 0 0 X 0 0 0 0 0 0  I J  0 0  0 0 0 1 0 0 0 X 0 0 0 0 0 0 0 0 0 1 0 0 0 X 0 0 0 0  K  0  0 0 0 0 1 1 0 0 0 X 0 0 0  L  0  0 0 0 0 0 1 1 0 0 0 X 0 0  M  0  0 0 0 0 0 0 1 0 0 0 0 X 0  N  0  0 0 0 0 0 0 0 1 0 0 0 0 X  b) DSM  Figure 5-8 Multiple paths asymmetric  The visibility and change propagation results can be observed and compared to the previous structure in Table 5-7.  path hop 1 2 3 V=M1+M2+M3 V/N V/N^2 a) Version 1 (symmetric)  series parallel min(hops) SUM 54 3.9 0.28  81  MAX AVER MAX+ DIV_SIZE DIV_1_1 41 41 45.5 20.6 50.7 2.9 2.9 3.3 1.5 3.6 0.21 0.21 0.23 0.10 0.26  DIV_POW  33.3 2.4 0.17  M3 cell path hop 1 2 3 V=M1+M2+M3 V/N V/N^2 b) Version 2 (asymmetric)  series min(hops)  SUM 51 3.6 0.26  parallel MAX AVER MAX+ DIV_SIZE DIV_1_1 40 40 44 20.5 49.8 2.9 2.9 3.1 1.5 3.6 0.20 0.20 0.22 0.10 0.25  DIV_POW  33.0 2.4 0.17  Table 5-7 Change propagation for symmetric and asymmetric systems  A decrease can be observed on the asymmetric over the symmetric system. However, in a closer inspection at the every power of the matrix power of 3 we can observer that the SUM PARALLEL PATHS variant is more sensible to the increase/decrease of parallel paths. To illustrate this we compare the symmetric and asymmetric DSM to power 3 or paths of length 3. It can be observed Table 5-8 that the dependencies from B-M for the symmetric system there are 3 parallel paths whereas there are only 2 for the asymmetric system. However, the variant MAX+ for the dependencies from B-M have the same value in both symmetric and asymmetric indirect dependency paths; despite the fact of the difference in number of parallel paths.  M3 cell A-J  path hop ACFJ AC= ACFK AC= A-K ACGK AC= ADGK AD= ACGL AC= A-L ADGL AD= ADHL AD= A-M ADHM AD= B-K BDGK BD= BDGL BD= B-L BDHL BD= BEHL BE= BDHM BD= B-M BEHM BE= BEIM BE= B-N BEIN BE= M3 Total  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1  2 CF= CF= CG= DG= CG= DG= DG= DH= DG= DG= DH= EH= DH= EH= EI= EI=  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1  3 FJ= FK= GK= GK= GL= GL= HL= HM= GK= GL= HL= HL= HM= HM= IM= IN=  series parallel M3 SUM MAX AVER MAX+ ALL_AVER PARL1_1 PARL_POWR SERS_AVER_PARL_POWR cell path hop 1 1 1 1 1 0.07 0.91 0.33 0.33 A-J ACFJ AC= 1 ACFK AC= 1 A-K ACGK AC= 1 3 1 1 1.5 0.21 2.73 1.0 1.0 ADGK AD= 1 ACGL AC= 1 A-L ADGL AD= 1 3 1 1 1.5 0.21 2.73 1.0 1.0 ADHL AD= 1 1 1 1 1 0.07 0.91 0.33 0.33 A-M ADHM AD= 1 1 1 1 1 0.07 0.91 0.33 0.33 B-K BDGK BD= 1 BDGL BD= 1 B-L BDHL BD= 1 3 1 1 1.5 0.21 2.73 1.0 1.0 BEHL BE= 1 BDHM BD= 1 B-M BEHM BE= 1 3 1 1 1.5 0.21 2.73 1.0 1.0 1 1 1 1 1 0.07 0.91 0.33 0.33 B-N BEIN BE= 16 8 8 10 1.1 14.5 5.3 5.3 M3 Total  min(hops)  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1  a) Symmetric system  series 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1  2 CF= CF= CG= DG= CG= DG= DG= DH= DG= DG= DH= EH= DH= EH=  1 1 1 1 1 1 1 1 1 1 1 1 1 1  3 FJ= FK= GK= GK= GL= GL= HL= HM= GK= GL= HL= HL= HM= HM=  min(hops)  1 1 1 1 1 1 1 1 1 1 1 1 1 1  1 1 1 1 1 1 1 1 1 1 1 1 1 1  1 EI= 1 IN= 1  1  SUM 1  parallel MAX AVER MAX+ ALL_AVER PARL1_1 PARL_POWR SERS_AVER_PARL_POWR 1 1 1 0.07 0.91 0.33 0.33  3  1  1  1.5  0.21  2.73  1.0  1.0  3 1 1  1 1 1  1 1 1  1.5 1 1  0.21 0.07 0.07  2.73 0.91 0.91  1.0 0.33 0.33  1.0 0.33 0.33  3  1  1  1.5  0.21  2.73  1.0  1.0  2 1 15  1 1 8  1 1 8  1.5 1 10  0.14 0.07 1.1  1.82 0.91 13.6  0.7 0.33 5.0  0.67 0.33 5.0  b) asymmetric system  Table 5-8 Symmetric and Asymmetric paths of length 3  82  The symmetric system has 3 parallel dependencies from B to M with paths BDHM, BEHM and BEIM with a propagation of 3 for the variant SUM PARALLEL PATHS. Whereas the asymmetric system has 2 dependencies with paths BDHM and BEHM with a propagation with a propagation of 2 for the SUM PARALLEL PATHS variant. The MAX+ variant is 1.5 for both systems despite the difference.  1. SUM PARALLEL PATHS change propagation variant, V, increases from 4 to 100 times depending the version and level of detail of the system, to be explained in a further analysis. However, it is the variant that grows the most from all of the variants including the ones proposed by me, Marco Gonzalez. 2. However, the Sum Parallel Paths variant behaves as if there was a direct link from the element using the dependency from the one with providing it. For this reason, the change propagation could or should diminish the strength as in the MAX+ but with the ability to qualify multiple paths. 3. The roll of the mediator has to be explored with another example.  5.6 Normalization In this thesis, we tried another way of normalizing the dependency propagation by dividing the weighed V(n-1) visibility of length n-1 by the weighed V(1) visibility of length 1. However, we could not obtain a measurement that was comparable among different versions, dividing V(n-1) by V(1) indicates how much the propagation grows respective to the initial dependency matrix for a software system. MacCormack (2008) introduced the concept of normalization in order to obtain a ratio of indirect dependencies to the number of all possible dependencies within a system. MacCormack (2008) visibility matrix is the set of Boolean direct and indirect dependencies regardless. The estimation of the propagation cost is the  83  count of the true elements in the Boolean visibility matrix divided by n to the square. This is because n is the number of elements and n2 is the number of all possible dependencies within the system. Vij cell in the visibility matrix, V, and the cell Vij contains the direct and/or indirect relationships between module i and module j. This is expressed in the following formula: n  PC  Original MacCormack:  n  V j 1 i 1 2  n  ij  (1)  For a Boolean design structure matrix the result will indicate the ratio of indirect dependencies to the quantity of total possible dependencies within a system. For a weighed dependency design structure matrix such a division by n to the square will scale the sum of the propagated dependencies in the visibility. This is because the weighed visibility is estimated using dependency strength rather than Boolean descriptors. However, we could ask: what is the meaning of dividing the weighted visibility by n to the square? We initially proposed the alternative normalization as a ratio of propagation indirect dependencies to direct dependencies. The modified change propagation metric was:  n  MCP   n  V (n 1) j 1 i 1 n n  ij  V (1) j 1 i 1  (19)  ij  Where the visibility of length 1 is the matrix containing all of the dependencies of length 1, which is the same as the direct dependencies or:  84  k  V (i)   Di i 1  (20)  1  V (1)   D  D i  i 1  However, the results were not conclusive  5.7 Refactoring with Mediators (Brokers) In order to explore the effect of the change propagation a system with no mediators or brokers is used as an example the state of a system before being refactored. This is an example with a deliver first design later, approach.  5.7.1 Use All Structure 5 Elements 2 Layers We first examine a 5 element system arranged in two layers as an example of a system before a refactoring. The top layer elements use all elements in the bottom, and since there are only two layers there is no propagation or indirect dependencies. We obtaind the following results. As there are only two layers there is no propagation; however we obtain the results for comparison.  parallel parallel SUM MAX AVER MAX+ ALL_AVER AVER1_1 AVER_POWR SERS_AVER_AVER_POWR V=M1+M2 6 6 6 6 6 6 6 6 V/N 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2 V/N^2 0.24 0.24 0.24 0.24 0.24 0.24 0.24 0.24  Figure 5-9 Use All 5 elements 2 levels  85  5.7.2 Use  All  Structure  13  Elements Two Layers We first examine a 13 element system with two layers as an example of a system before a refactoring. This is one example of the class explosion (Gal, Schröder-Preikschat, and Spinczyk 2001).  POWER N  1 13  parallel parallel SUM MAX AVER MAX+ ALL_AVER AVER1_1 AVER_POWR SERS_AVER_AVER_POWR V=M1+M2 42 V/N 3.23 all the same V/N^2 0.25  Figure 5-10 Use 13 elements 2 layers  Since there are no indirect relationships all the variants produce the same change propagation result. One thing to note is that if compared with use all 5 elements 2 layers is that while the matrix visibility, V, increased from 6 to 42; its density, V/n2, was increase was small from 0.24 to 0.2485. This is observable since an a system with a non-structured hierarchy there is a chance that this will happen or even worst some circular dependencies might exist filling the matrix portion above the diagonal.  86  5.7.3 Use All 5 Elements 2 Layers with Three Layers If the same system described in use all 5 elements with 2 layers where to use an artificial layer in order to simulate 3 levels we would obtain the following system:  POWER N  2 11  parallel parallel M2 SUM MAX AVER MAX+ ALL_AVER AVER1_1 AVER_POWR SERS_AVER_AVER_POWR V=M1+M2 18 18 18 18 12.5 17.5 15 15 V/N 1.6 1.64 1.64 1.64 1.14 1.6 1.4 1.4 V/N^2 0.15 0.15 0.15 0.15 0.10 0.14 0.12 0.12  Figure 5-11 Use all 3 levels  Using an extra layer could artificially diminish the density of the change propagation by increasing the number of elements. Probably one question to consider is if the elements in the intermediate layer are adding any value to the system or reducing the complexity. What is visible is that the modified change propagation was reduced 50 percent, from 0.24 to 0.12 for the Divide by Propagation Length. 5.7.3.1 Mediators 5.7.3.1.1 5 Elements 3 Layers One Mediator Structure Another variation of the use all 5 elements 2 layers system is by adding a mediator between the two layers: 87  parallel parallel SUM MAX AVER MAX+ ALL_AVER AVER1_1 AVER_POWR SERS_AVER_AVER_POWR V=M1+M2 11 11 11 11 6 10.5 8 8 V/N 1.83 1.83 1.83 1.8 1 1.7 1.3 1.3 V/N^2 0.31 0.31 0.31 0.31 0.17 0.2904 0.22 0.22  Figure 5-12 5 elements 3 layers one mediator structure  We can observe that the estimated propagations was reduced 9 percent, from 0.24 to 0.22 when refactoring the system use all 5 elements 2 layers to 5 elements 3 layers and one mediator. The strengths of both systems could probably be different in the refactoring; however, the strengths were set to high just to illustrate the fact. Probably the dependency strength from the top layer to the mediator could increase and the dependency from the mediator to the bottom layer would decrease. In order to simplify the analysis the strengths are fixed in this analysis. However, one of the goals of introducing mediators, or enforcing a layer that acts as an interface is to reduce the number of dependencies an to minimize the software dependencies. 5.7.3.1.2 13 Elements 3 Layers One Mediator Similarly we refactor the system use all 13 elements two layers to 13 Elements 3 layers one mediator.  88  parallel parallel SUM MAX AVER MAX+ ALL_AVER AVER1_1 AVER_POWR SERS_AVER_AVER_POWR V=M1+M2 55 55 55 55 16 51.2 34 34 V/N 3.93 1.83 1.83 1.8 1 1.7 1.3 1.3 V/N^2 0.28 0.31 0.31 0.31 0.17 0.2904 0.22 0.22  Figure 5-13 13 elements, 3 layers, one mediator  When the results from this system are compared to use all 13 elements two layers system that does not involve a mediator, as the roll of module F we can observe that for the variant Divide by propagation length, displayed as AVER_POWR in Figure 5-13, we can observe a decrease of 9 percent as well, from 0.22 to 0.25.  In the practice using a mediator decouples the modules making use of it, and this is a common practice in software engineering. However we still have the question of how does the use of mediators reflect their advantage in the modified change propagation. There are two possibilities: a) The change propagation metric should produce a smaller result when comparing similar systems refactored with architectural styles. This can be appreciated in the Divide By Propagation Length variant.  89  b) The overall advantage of using patterns and styles should be analyze by incorporating a vector of cost and taking into account modules to be changed and cost per releases as in the (Nord 2012)  5.8 Conclusions Based on the validations of our propagations we believe that dependencies propagate through the system; however, layers propagate dependencies but they also isolate them to a certain degree. For this reason we believe that general change propagation metric in software systems has a behavior similar to Divide by Propagation Length variant which has the formula (18): n  n 1  V   n   D j 1 k 1  i  i 1  i jk  (0.21)  This formula was be verified in chapter 3 as we address the issues in the original MacCormack (2008) propagation cost. We validate the metric in three open-source software and one in-house project  90  6 Addressing Cycles  6.1 Introduction This section presents how cycles that violate the architecture design were handled in our change propagation metric. Software systems, e.g. commercial and open-source software, usually have a hierarchical design in which modules in a level depend on elements on a lower level. However, the implementation of a software system involves in some cases violations to this architectural design rules where sometimes modules depend on modules in a higher hierarchical level; therefore, introducing cyclical dependencies that represent more effort while maintenance. While cycles have usually a lesser strength than the relationships following the design, cycles alter how change is propagated and create feedbacks. This section describes how cycles were treated in our estimation of the modified change propagation.  6.2 Transitivity and Change Propagation The change propagation metric estimates the potential change in the system caused by evolution. One way to estimate the potential change is to infer the indirect impact from the direct dependencies. The advantage of estimating change propagation from the direct dependencies is that it can be done a priori at design time and at any time during implementation. Obtaining the indirect dependencies, or relations, is similar to derive the expanded set of relations for a system in transitive closure. The transitive relation for a given set of relations R, in this thesis the relations are the direct dependencies, is the expanded set of relations R+; where R is included in R+, and R is minimal. (Lidl and Pilz 1997). The direct  91  dependencies for a software system can be expressed for its analysis as a directed graph or its equivalent adjacency matrix as it is illustrated in Figure 6-1.The matrix can be read as module in row X depends on module on column Y. The expanded or indirect dependencies for the system in Figure 6-1 are illustrated in Figure 6-2.  D A B C D E  A  B  C  D  B 1 X 0 0 0  C 0 1 X 0 0  D 0 0 1 X 0  E 0 0 0 1 X  ( dependency reads: module on row depends on Module on column)  E  Figure 6-1  A X 0 0 0 0  a) Directed graph  b) Design structure matrix  Figure 6-2 Expanded dependencies transitive closure  6.3 Change Propagation Metric and Cycled Dependencies When a cycle exists in a system propagation cost is artificially inflated and the more elements an levels a cycle involves the more the propagation cost will inflate. Laval (2011) also  92  suggest that cycles create a feedback effect, in order to avoid this feedback we describe the our approach to avoid them.  6.3.1 Matrix Powers and Propagation Visibility Change propagation expands the propagation cost metric from MacCormack (2008) that uses the dependency visibility (Sharman and Yassine 2004) in order to visualize indirect dependencies in a design structure matrix. MacCormack proposes the visibility matrix to be obtained by the following formula: n 1  V   Di i 0  (2)  The visibility matrix calculates the expanded set of indirect dependencies. Raising the design structure matrix of the system, D, to the power i, calculates the indirect dependencies with a path length of i. For instance, the indirect dependencies of length 2 for system in Figure 6-1 are shown in Figure 6-3. Raising D to n-1 power obtains the expanded relationships set of path length n-1, which obtains the longest path in an acyclical system as Warfield (1973) suggests when analyzing matrix properties for system modeling. The design structure matrix D has nxn dimension, where n is the total number of modules in the software system and its dependencies among each other. For this reason, the longest path among a system with n modules is n-1. As n-1 produce the longest paths visibility uses n-1 as the maximum path considered. 6.3.1.1 Indirect Dependencies A sample of indirect dependencies is observed in Figure 6-3. For the system in Figure 6-1 the indirect dependency from A to C of length 2, is shown by steps in Figure 6-3 a.i). The AC’  93  indirect path is shown in Figure 6-3. AC’ indirect dependency starts with the direct dependency from A to B shown in Figure 6-3 a.i) and continues with the direct dependency from B to C. This indirect dependency describes the path from A to B to C, or path ABC of length 2. Direct dependencies are also visible in the design structure matrix (DSM) in Figure 6-1 b). For instance, A depends on B can be read in Figure 6-1 b) from the cell on the intersection of the row for module A, and the column for module B. Similarly, the dependency -B depends on C- can also be read from the DSM. Likewise, direct dependencies are visible in graph in Figure 6-1 a), the dependency -A depends on B- is depicted with the arrow from module A to B. In the same manner, paths BCD and CDE are formed as it is shown in Figure 6-3 a) subsections a.ii) and a.iii). The resulting indirect dependencies, AC’, BD’ and CE’ for these paths are shown as a graph in Figure 6-3 section b). Additionally, AC’, BD’ and CE’ indirect dependencies are shown as a result of the property of powers of matrix (Warfield 1973) in Figure 6-3 section c). In this case, the design structure matrix of the system, D, raised to the power of 2 provides the propagated indirect dependencies of length 2.  94  D^2  A B C D E  a.i)A-C:A-B-C a.ii)BD:BCD a.iii)CE:CDE a) Separated paths  b) new indirect dependencies  A X 0 0 0 0  B 0 X 0 0 0  C 1 0 X 0 0  D 0 1 0 X 0  E 0 0 1 0 X  *dependency reads: column depends on row (e.g. element on column C depends on element on row A) c) new dependencies in matrix representation  ABC BCD CDE  d) Paths  Figure 6-3 Paths of length 2  MacCormack (2008) uses, in the visibility matrix, the property of powers of matrix; raising the DSM of a system from 0 up to the n-1 power in order to visualize all the possible indirect dependencies. The visibility operation later converts these values to one-bit binary values. Our work will expand by using strengths compared to the one-bit binary or Boolean relationship indicator in visibility matrix (MacCormack 2008) see Chapter 5. However, for the cycle limitation analysis, strengths will be omitted at the beginning, and we concentrate here on the module direct and indirect relationships. The paths obtained and the strengths of the dependencies will be used in Chapters 5 and 7 to calculate the change propagation.  95  6.4 Systems with Cyclical Dependencies The implementation of the visibility matrix calculation, that considers strengths, would be simple for systems with no cycles, in other words, their dependency graph is acyclical. As discussed previously, the systems analyzed contained cycled dependencies. In objectoriented design, a cycled dependency is where a module A depends on a module B, and module B also depends on module A. A change in module A can potentially imply a change in module B, if this is done by a single developer it might be straightforward. However, if changes to module A and B are done by different developers or different teams and the changes in one part are ignored by the other this will cause to adjust both modules. In cases like this the initially allocated effort will increase. For instance, in Figure 6-4 the module ant.filters depends on the ant.utility module; however, there is a cycled dependency from ant.utility to ant.filters. Changes in the ant.utility module could potentially require changes in ant.filters and vice versa. Lakos (1996), Sangal et al. (2005), Sullivan (2001), Nordberg III (2005), etc. refer to avoiding cycled dependencies among elements as a principle of a design that is understandable, evolvable and scalable.  Propagation cost (MacCormack 2008) estimation saturated the visibility matrix for cyclical dependencies in the systems at the architectural level. For this reason, we want to limit the cycle impact on our metric. For instance, the representation of the open-source PDFBox DSM in Figure 6-4 contains cyclical dependencies in the upper triangular matrix and are displayed in red for the original DSM. Figure 6-4 illustrates dependencies in the DSM by reading that element on column X depends on element on row Y. These cycles cause the propagation estimation to be amplified by feedback (Laval 2011). It is seen in Figure 6-4 that when the PDFBox DSM that contains cycles is raised to the square results in a matrix, observable in the second row, that quickly saturates. This result contrasts with the ideally  96  acyclical matrix, shown in second column “PDFBox DSM-Ignoring cycles” and second column. The PDFBox DSM shows a hierarchy which was obtained by organized the system hierarchically by provider proximity with a software tool (Sangal et al. 2005).  PDFBox DSM Original DSM  1  (D )  Change Propagation for paths of length 2  2  (D )  1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 13  1 0 0 15 16 5 51 19 396 0 0 196 0 26  2 3 4 5 6 7 8 0 0 0 0 0 4 0 0 0 0 0 0 0 23 0 0 0 0 6 0 59 0 1 0 0 0 0 56 0 55 38 0 0 0 9 0 17 30 256 0 22 176 0 0 0 0 106 0 8 25 51 21 0 743 123 0 0 0 0 0 0 0 46 0 0 0 0 0 0 10 90 542 236 6 643 89 7212 0 252 0 0 0 0 44 36 8 59 0 11 0 111  1 2 3 4 6 0 0 0 105 12 19 11 182 49 174 93 117 20 27 19 185 35 164 70 288 73 278 182 140 8 46 41 313 27 335 256 111 18 24 17 154 34 154 70 2308 1809 2122 1988 240 54 173 89 222 62 204 108  5 0 0 72 0 5 2 91 255 0 5 225 15 74  6 28 192 366 202 368 429 188 236 197 352 2175 436 384  7 0 37 85 50 61 111 71 318 42 59 2050 78 102  PDFBox DSM -Ignoring cycles 9 10 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 43 125 0 43 0 0  11 0 0 20 0 14 2 0 16 0 12 0 55 23  12 0 0 0 0 0 0 0 0 0 0 15 0 0  13 0 0 0 0 0 0 0 0 0 0 0 0 0  1 2 3 4 5 6 7 8 9 10 11 12 13  1 0 0 15 16 5 51 19 396 0 0 196 0 26  1 2 8 9 10 11 12 13 0 0 3 0 0 0 0 0 1 0 0 0 6 0 10 0 0 2 0 0 1854 33 36 21 9 0 3 4 0 15 14 0 23 0 0 4 5 31 0 1861 17 35 28 7 0 6 85 0 1930 56 96 125 4 0 39 0 71 29 0 33 0 0 7 0 2140 201 35 204 8 0 8 234 18 0 12 0 16 0 0 9 111 9 1806 17 34 7 7 0 10 102 555 1964 16 2166 0 0 11 2305 1809 54 1908 36 45 97 18 0 12 240 62 1916 48 37 42 10 0 13 222  2 0 0 0 0 0 0 0 25 0 0 90 0 36  3 0 0 0 1 55 17 0 51 0 0 542 252 8  3 0 0 0 0 10 86 31 190 0 0 224 149 163  4 5 6 7 8 9 10 11 12 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 38 0 0 0 0 0 0 0 0 0 30 256 0 0 0 0 0 0 0 0 0 0 106 0 0 0 0 0 0 0 21 0 743 123 0 0 0 0 0 0 0 0 0 0 46 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 236 6 643 89 7212 43 125 0 0 0 0 0 0 0 44 0 43 55 0 0 59 0 11 0 111 0 0 23 0 0 4 0 0 0 0 0 74 34 193 0 0 168 73 75  5 6 7 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 0 0 0 250 57 0 0 0 197 42 0 0 188 33 0 225 2038 1834 56 0 371 78 1830 67 380 87 1809  9 0 0 0 0 0 0 0 0 0 0 0 25 17  10 0 0 0 0 0 0 0 0 0 0 0 45 37  11 0 0 0 0 0 0 0 0 0 0 0 0 0  12 0 0 0 0 0 0 0 0 0 0 0 0 0  13 0 0 0 0 0 0 0 0 0 0 0 0 0  The red cells are the cells that do not appear in the non-cycled DSM for the second hop. Values that are in orange appear in the non-cycled DSM propagation at hop 2 but with their value increased.  *dependencies read: column depends on row (element on column depends on element on row) Figure 6-4 Cycles saturate at D  2  It is shown that more cycles were generated when calculating indirect dependencies for paths of length 2. A feedback effect (Laval 2011) is introduced and it is identified by coloring cells in red and orange. Red cells are the product of cycles and orange cells are partially the product of cycles. This feedback effect distorts the propagation for paths of a given length being calculated. The new red cells above the diagonal will cause more cycling as soon as more 97  indirect dependencies of longer paths are calculated. Laval (2011) also mentions that cycles will produce a feedback effect in the system. In this study it is found that feedback in the cycles will beget more cycles in the change propagation for indirect dependencies of length I and power i of a matrix, which in for power i+1.  Another observation is that cycled dependencies are of lesser strength than the dependencies that follow the hierarchical design rules. Sangal et al. (2005) uses the excessive complexity as a measure to represent the ration of cycled dependencies to the total of dependencies in the system. In our analysis we used a similar ratio and found that cycled dependencies for the analyzed systems are among 2 and 10 percent, only Checkstyle had no cycled dependencies for most of its releases (with the exception of one release). Baldwin and Clark (2000) refer to cycles in the design of industrial equipment, they mention that in reality no block is truly independent but the design intends to isolate as much as possible. In the software development some propose to decouple such cyclic dependencies, i.e. use abstract data types (Sullivan 2001), by enforcing the hierarchy (Lakos 1996).  6.4.1 Tactics to Address Cycles The following approaches were considered to investigate the impact of cycles on change propagation: 1. Move atoms causing cycles, and ignore other cycles. 2. Ignoring cyclical dependencies. 3. Limiting the impact of cycles.  In order to determine the hierarchy within a system the modules are ordered by allocating the heaviest provider near to the consumer module. This produces a design structure matrix with  98  The ideal hierarchy is visible in the lower triangular matrix; therefore, the cycles will be the dependencies with the lowest strengths in the upper triangular matrix as seen in Figure 6-5.  18  3148  10  137  33  18  88 18  util 10  2  9  16 439  309  *  315  1759 5  1  35 10  4 2  6 23  25 3  types 8  7  7  2893  util  23  helper6  9  filters 5  input 4  taskdefs 3  1 2 3 4 5 6 7 8 9 10  listener2  org.apache.tools.ant  loader listener taskdefs input filters helper util types * util  loader1  Ant 1.6.0  60  300  48 1  2  133  52  64  95  1607  49  175  585 1578 10  496  Figure 6-5 Dependency Structure Matrix with hierarchy  6.4.1.1 Change Propagation with Cycles The analysis tools examines atoms, or classes, causing dependency cycles within a given module and if simply moving to other modules to eliminate the cycle. This technique will reduce the number of cycles and therefore the masking, ideally dependencies should be removed by the creating an atom(s) (class), set of atoms, or layer (package) allocated at a lower level in the system’s hierarchical structure. 6.4.1.2 Ignoring Cycling Dependencies In order to avoid the feedback caused by cyclical dependencies and considering that these cyclical dependencies are the weakest the propagation cost is done ignoring these dependencies. In section 3.4 some examples are shown where and it can be observed that in some cases where the propagation cost calculation does not differ for some versions the modified change propagation shows an increase because of the cyclical dependencies.  99  6.4.2 Limiting the Impact of Cycles In order to limit the impact of cycles this study extends the cycle suppression approach used by Frost (1992), which consists in zeroing the diagonal of the matrix, D, raised at the power of i, Di, that provides with the indirect dependencies of length i. Frost argues that to eliminate cycles the diagonal of Di should be zeroed since for a given indirect dependency path that starts and ends on itself, at any path of length i, is a cycle to itself as is show in Figure 6-8 for the cell of row A column A. This is partially correct, since we are only eliminating the cycles that start and end with the same modules. However, in this study cycles were found inside other paths as well. For instance, for the system in Figure 5 the path depicted in Figure 6-7 b) of length 2 starting from A and ending in E, travelling from A to B to E, or path ABE, will create a cycled path ABEB observed in Figure 6-7 c). Similarly, path AEC will create the path AECE of length 3. The mentioned paths will consequently form paths ABEBC AECEC of length 4 that contain a cycle and for this matter the propagation calculation should limit cycled paths. This cycling effect grows for paths of greater length. For instance 24 out of 69 the paths of length 3 where cycled as shown in Figure 6-9. The paths in Figure 6-9 are the result of raising the design structure matrix, D, to the power of 3. The cycled paths are colored in red and non-cycled paths are colored in black. For the sample of paths of length 4 ending in A are shown in column A of Figure 6-10; 23 out of 28 these paths were cycled. In order to obtain the modified change propagation metric is important to limit the feedback caused by cycles.  100  a) Graph representation  D  A  B  C  D  E  A  X  1  1  1  1  B  1  X  1  0  1  C  0  0  X  1  1  D  1  0  0  X  1  E  1  1  1  1  X  b) DSM representation  Figure 6-6 System with cycles  All the paths that can be formed from the interrelationships in the system are the set of paths that form the transitive closure of this system. A subset of these paths is obtained when raising the matrix to the power for each i in formula (1). For this reason, every new visited module should be checked for cycles when expanding a path.  a) Path AB, length 1 b) Path ABE, length 2 c) Path of length 3 ABEB Figure 6-7 Steps of cycled path  101  col  M  A  B  C  D  E  A  X  1  1  1  1  B  1  X  1  0  1  C  0  0  X  1  1  D  1  0  0  X  1  1 A  1 B  1 C  1 D  X E  ABA x  AEB 1  ABC, AEC 1  ACD, AEC 1  ABE, ACE, ADE 1  row-> M  E A B C D E M2  A  X 1 1 1 1  B  1 X 1 0 1  A B  C D E  BEA … BAC, BEC BAD, BCD, BED 1 x 1 1 C CDA, CEA CEB … CED 0 0 X 1 1 1 1 x 1 D DEA DBA, DEB DAC, DEC … 1 0 0 X 1 1 1 1 x E EBA EDA EAB EAC, EBC EAD, ECD 1 1 1 1 X 1 1 1 1  BAE, BCE 1 CDE 1 DAE 1 … x  Figure 6-8 Power 2 obtains all the paths of length 2  col  M  A  B  C  D  E  A  X  1  1  1  1  B  1  X  1  0  1  C  0  0  X  1  1  D  1  0  0  X  1  1 A A..A x  1 B  1 C  1 D  X E  ABEB, ACEB  AEBC, ABEC, ACEC  ABCD, AECD ABED, ACED  AEBE ABCE AECE ACDE AECE  BADA, BCDA, BEDA, BAEA, BCEA  B..B x  BEAC BAEC BCEC  BEAD BACD BECD BAED BCED  BEAE BACE BECE BADE BCDE BEDE  C  CEBA, CEDA, CDEA  CDAB, CDEB  C..C x  CDAD CEAD CDED  CDAE CEAE CEBE CEDE  D  DABA, DEBA, DAEA  DEAB DAEB  DEAC DBAC DEBC DAEC  D..D x  DEAE DBAE DEBE DACE DECE  E  EABA, EADA, ECDA  EBAB  EBAC EDAC  EBAD EDAD EACD EBCD  E…E x  row-> M2 A  A ABA x  B  BEA 1  C D E  B AEB 1  E C D E M3 ABC, AEC ACD, AEC ABE, ACE, ADE A 1 1 1  … BAC, BEC BAD, BCD, BED x 1 1 CDA, CEA CEB … CED 1 1 x 1 DEA DAB, DEB DAC, DEC … 1 1 1 x EBA EDA EAB EAC, EBC EAD, ECD 1 1 1 1  BAE, BCE 1 CDE 1 DAE 1 … x  B  Figure 6-9. Power 3 obtains all paths of length 3  102  D4 A  B C  A A..A x  B  C  AEBEB ABCEB AECEB ACDEB AECEB  ABEBC ACEBC AEBEC ABCEC AECEC ACDEC AECEC  BEADA BACDA BECDA BAEDA BCEDA BEAEA BACEA BECEA BADEA BCDEA BEDEA  B..B  BADAC BCDAC BEDAC BAEAC BCEAC BEAEC BACEC BECEC BADEC BCDEC BEDEC  x  CDABA, CDEBA CDADA CEADA CDEDA CDAEA CEBAB CEDAB CDEAB CDAEB CEAEA CEBEA CEDEA CEAEB CEBAB CEDEB  E  BADAD BCDAD BEDAD BAEAD BCEAD BEACD BAECD BCECD  C..C x  D  E  D  D..D DEABA DAEBA DEAEA DBAEA DEBEA DACEA DECEA  DABAB DEBAB DAEAB DEAEB DBAEB DEBEB DACEB DECEB  EBABA EBADA EDADA EACDA EBCDA  EABAB EADAB ECDAB  x E…E  x  Figure 6-10. Power 4 obtains all paths of length 4  6.5 Cycle Detection In order to check for cycles we must obtain the indirect-dependency paths traversed for all the modules in the system. We find that there is a previous body of knowledge in algorithms and graphs to traverse paths. We should note that a system can be expressed indistinctly as a directed adjacency matrix or as a directed graph. This thesis considered two well-known search approaches for graphs, and used depth search first (Tarjan 1972).  Depth-first search is a recursive algorithm that reaches all the vertexes, the algorithm used is a depth-first search variation that only visits vertexes not previously visited. In order to keep track of the vertexes previously visited this information is stored. The pseudo code for this variation is:  103  CallDFSmodified(Digraph digraph) … path.add (sourceV) dfs(sourceV, digraph, path) dfsModified(sourceV, graph, path) for neighbor: digraph.adjacencyList(sourceV) if neighborList > 1 currentPath.copy (path)  path bifurcates pathList.add(currentPath) if neighbor not in path cycle detection currentPath.add(neighbor) path generation storeInMatrix(path) dfs(neighbor, graph, path);  next path of length i+1  Once all the paths had been generated, the change propagation is obtained for each cell in the formed matrices. For instance, for a cell with two paths: path1: ABC, for AB the strength is 10 and from BC the strength is 5 the strength is the result from the operation series(10,5) path 2: ADC, for AD the strength is 20 and for DC the strength is 15 the strength operation series(20,15) the result is the operation parallel of path 1 and path 2 which can be expressed as: parallel(series(20,15), series(10,5)).  These set of series and parallel operations are done for all cells of all the matrices. The change propagation is the sum of all matrices from paths of length 1 to length n-1.  6.6 Conclusions Cyclical dependencies indefinite propagation can be limited to the first set of indirect dependencies on the modules involved in the cycle and by eliminating the identity matrix. The 104  assumption is that we take into account all the different set of indirect dependencies the first time; after the first time all the set of propagations the next indirect set of propagations will just artificially inflate the dependency strengths. The identity matrix contains cycles to elements starting from themselves.  The estimation of the proposed change propagation metric is obtained in two parts; the first part by generating a visibility matrix, V, containing propagated dependency paths instead of dependency strengths, and secondly by evaluating the dependency paths. The propagated dependency paths are obtained by representing the design structure matrix as a graph and by generating all the possible non-cyclical paths. This is accomplished by modifying search algorithms that check for cycles.  105  7 Dealing with the Depth of Propagation 7.1 Introduction In this section, it is presented how the depth in the change propagation was approached when using a number that quantifies a dependency rather than Boolean or its one-bit binary descriptor. Notions on how to estimate the depth propagation and reasoning about them are introduced in this chapter.  7.2  Propagation and Depth Estimation  The visibility matrix used by MacCormack (2008) implicitly uses the standard dot product and is adequate for one-bit binary dependency descriptors. However, it might not reflect potential change propagation correctly for cases in general when using weights of dependencies. The weights of dependencies are numbers that describe the strength between different modules in the design structure matrix that characterizes the structure of a system. The binary approach is well suited by using the dot product, as explained in Chapter 5. For an indirect dependency dot product is expressed by equation(22): n  Qi , j   Ri ,k  Sk , j  (22)  k 1  Figure 7-1 Propagation Depth  106  This standard dot product has two modules, breadth and dept. The   or parallel operator  deals with the breadth of the propagation was explained in Chapter 5. The depth module  ,  or series operator involves a direct multiplication; however, a multiplication would magnify propagation at every step of the path lengths in the transitive closure. Additionally, a direct multiplication might not reflect all the possible cases, for instance:  a) A module, B, at a level l depending on a module C at level l-1 might use or depend marginally or non-substantially o module C to deliver a service to module A in level l+1. b) A module, B, at level l despite depending from another module C at level l-1 might not depend on C to deliver a service to module A. c) A module, B, at a level l depending on a module C at level l-1 might depend part of the operations exposed by module C to deliver a service to module A in level l+1.  In order to reason about dependency in modules this study presents an exercise. We intend to answer the question how much does an element depends on an indirect dependency. For instance, as Figure 7-1 depicts a module A depends indirectly on element L. It is not intended to measure other dependencies such as the one from element A directly depending on element D and the indirect dependency from A to G. The dependence from A to L should be in function of:  D( A, L)  x  y  z (23) Additionally, similar functions do not behave in a transitive manner and other functions such as average and sum operation does not reflect general cases. For these reasons, the depth propagation was decided to be implemented as a minimum in order to reflect a general case since average does not behave in a transitive manner.  107  The addition operation to estimate indirect dependency has the inconvenience of masking dependencies. If an element A depends directly from an element B by a weak dependency and if element B depends strongly a sum would estimate that A has a strong dependency from C. The same would be similar to the maximum operation and average does not behave transitively. The addition operation to estimate indirect dependency has the inconvenience of masking dependencies. If an element A depends directly from an element B by a weak dependency and if element B depends strongly a sum would estimate that A has a strong dependency from C. The same would be similar to the maximum operation and average does not behave transitively.  This study characterizes the dependency weight as the count of the static code dependencies between any two modules, which can be normalized to 1, or not. This raises the question on how to determine in a general way the dependency. For instance, consider the hypothetical structure in Figure 7-1, if direct dependencies are x, y and z is the dependency from module A to L an algebraic multiplication, a sum, a maximum a minimum, or a different mathematical formula with different variables to consider? One could say that the “amount” of dependence is determined by the internal code structure, and this would depend on the particular implementation.  108  Figure 7-1.  Different approaches exist to measure the code metrics in object-oriented programming as the metrics proposed by McCabe McCabe (1976), Lakos, Lakos (1996), Hitz and Montazeri (1995). However, the aim of the metric is to have a general approach that shows the potential changes similar to the propagation cost from MacCormack (2008).  Furthermore, the  proposed propagation metric should be available both: a priori and a posteriori of the implementation. For this reason, we need a general and reasonable approach that used to reason about how change is propagated. In order to have a framework we review the work done in interdependency in Martí et al. (2008).  7.3  Interdependency  This thesis bases the depth propagation estimation on work done on infrastructure dependency. The propagation in depth is considered the minimum value of a interdependency chain based on the rationale that the weakest link in a series determines the overall capacity. Different physical infrastructure depends on each other to work, thus being interdependent as (Martí et al. 2008) (Peerenboom 2001) point out. For instance, some nodes in the water distribution network depend on electricity to power the pumps; or an emergency operation center might depend on the telephone, electricity and other infrastructure. If one of the module degrades then the overall output of the chain of modules 109  degrades as well. For this reason, in real world interdependence de depth of propagation is a function of the minimum of a chain of dependencies. The I2sim simulators (Martí et al. 2008) that simulates several infrastructure modules implement the operational depth propagation at every node in a serial connection as a function of the minimum of the inputs or dependencies.  7.4 Conclusion The minimum operation in a series of interconnected elements is a better simple estimation of dependency since other functions such as average do not behave transitively, or mask weaker dependencies with stronger ones as the case of maximum and sum operations. Therefore, this study based on minimum to propagate in depth.  110  8 Conclusions 8.1 Introduction In this section, we summarize the reasons why we believe our proposed improved change propagation metric is a better metric to measure evolution cost for software architectures. The module dependencies propagation is used to assess how much resistance a software infrastructure presents while evolving. Weighed dependencies between modules convey more information than the Boolean dependencies approach proposed by MacCormack (2008), and we also take into consideration also two factors breadth and depth, caused by indirect dependency propagation. We propose here the “Gonzalez change propagation”, which uses the Divide By Power variant (see section 5.4) and use the direct dependencies to normalize the result.  8.2 Research Goals Summary A modularized software architecture allows reuse of common tasks to combine and perform more complex tasks. Software designers divide the system entire functionality and assign it to different modules within a system. This presents an advantage; however, the ways to implement a software system differs as well as the way to design its architecture. Different designs have different tradeoffs and quality attributes (Bass, Clements, and Kazman 2003) . Long-term maintenance or evolvability is the quality attribute that we are concerned with in our study and is it also noted to cause around 40 to 80 percent of the total cost in software development. We are trying to find which are the factors that make some software systems more evolvable than others. Software development organizations acknowledge the role of the software structure design as one of the enablers or disablers that allow a system to adapt to  111  change. In this study we propose to measure the ability to adapt to change based on the topology and the strength of relationships among the system modules and the way these relationships indirectly expand throughout the system. Some authors such as Parnas (1972), Dijkstra (1968), Baldwin and Clark (2000), etc. state that loosely coupled software designs allow for more maintainable designs. MacCormack (2008) proposes to measure system coupling by expressing the system in a Boolean design structure matrix. In this thesis, we propose to use strength as a dependency qualifier since not all dependencies are of the same strength as exposed in Chapter 2.  In this study we were looking for an answer to the question:  RQ1. Can a value be estimated for software architecture? Could a heuristic be assigned for software architecture value?  while trying to answer that question many software architecture practitioners refer to modularity (Baldwin and Clark 2000), (Sangal et al. 2005), (Sosa 2008), (Koziolek 2011), etc. as a preferred characteristic of architectures that adapt to change. For this reason, research has been conducted to find a metric for modularity, which reduces the needed effort while evolving. In this study, a complementary inverse approach is taken based in prior research (MacCormack 2008) and we try to measure how decoupled a system is. Measuring system decoupling is also an indicator modules in the systems will be less likely to be affected because do not depend on information about other modules. Based on this decoupling metric we ask whether this measure could be used to compare different system architectural designs; the questions are:  112    RQ2. Can a metric be defined to compare the cost of different evolution paths for a software system?    RQ2.1. Could propagation cost be modified to address its limitations and become an improved metric?  The change propagation metric is used in analyses to calculate rework costs per release for different evolutionary paths that involve designs with tradeoffs, as in (Nord 2012). These designs and its tradeoffs are the implementation of organizational strategies that are a way to deal with the environment. For instance, one dimension of the environment is the commercial dimension where time to deliver is a very well-known restriction. For these reasons, change propagation metrics are important in order to analyze future possible costs based on the architecture of a system.  8.3 Contributions This thesis has the aim to address some of the limitations of propagation cost as proposed by MacCormack (2008):  D1. Propagation costs models dependencies with Boolean values, this assumes that all dependencies in the system are of the same type; however, in a system, there are dependencies from a subset of modules to others what will exert a great influence, while other subsets are of little or relatively no consequence.  This study proposes to use a  strength, e.g. as the count of the usage of dependencies, to perform the indirect dependencies propagation analysis.  113  D3. Propagation cost might not give a good result when dealing when increasing the number of modules in the system. Propagation cost might estimate a reduced metric due to normalizing with the square of the number of elements in the system. Propagation cost decreases non-linearly when more modules are added. We propose to change the normalization for this reason and also because when dealing with strengths rather than Boolean dependency indicators dividing by the square of the number of modules, or all of the possible indirect dependencies combinations of modules, might just use normalization as a scaling factor.  D4. Many systems have dependency cycles, which would cause propagation cost to give an incorrect and saturated result. Propagation cost uses the visibility concept (Sharman and Yassine 2004) which is developed upon the matrix powers (Warfield 1973). If there is a cycle in the system that is all connected this will produce the final product to saturate. A different algorithm or equation that takes into account cycles is needed. We propose taking into account the indirect dependency path caused by an acyclical dependency while the generated indirect path does not repeat any of the steps visited.  8.3.1 Propagation Breadth and Depth Dependency propagation has two modules breadth and depth, the results of the analysis show that the propagation caused by breadth has to be considered as an addition of the indirect dependencies and that propagation caused by dependency depth has to be attenuated as depth increases. In our analysis we saw that there is practically a small variation among the breadth propagation that considers averaging, taking the maximum of the indirect dependencies, or taking a max plus the minimum.  114  Another conclusion is that if the propagation caused by breadth is not restricted by the depth then the estimation increases of one or more orders of magnitude depending on the expansion of the hierarchical structure. This is due to a magnifying effect of the propagation by breadth that adds all the parallel paths and results in elevated results. In a private conversation with other software architecture practitioner (Kazman 2012) there was an observation that as the indirect dependencies were more distant it was less probable for those indirect dependencies to have an effect than the dependencies with a closer indirect propagation. This assumption was confirmed by comparing the estimations of change propagation and the system breakdown optimality (Bouwers 2011) results which are similar for the system Checkstyle.  It could be confirmed that systems that have their architectural design refactored can reduce the propagation change, such as the case of Checkstyle version 3.0 that according to the release notes contemplates:   “Completely new architecture based around pluggable modules. This means that users can now write their own checks without changing the source code of Checkstyle itself (request 578712).    Configuration is not based on Properties any more. Instead a Configuration interface is used …”  This software architecture refactoring helps to reduce the propagation and thus the indirect dependency propagation is reduced as a consequence. Another observation is that the propagation in some other systems closely resembles the change in the amount of source lines of code. In this systems we can conclude that the refactorings at least allow the dependency propagation not to increase exponentially. 115  8.3.2 Cycles We could observe that architectures of the analyzed software systems contain some violations to the hierarchical design and introduce cycles. Cycling cause dependencies to create a feedback effect and to increase propagation dependency, Laval (2011) also mentions that cycling creates a feedback effect. In this study the approach was to consider paths of indirect cycled dependencies of limited up to the creation of another cycle, so that a cycle would create paths of a maximum length of m-1, where m is the number of modules related in the cycle. This allowed us to consider cycles within architecture while limiting their effect. We also conclude that dependencies with the smaller strength were in a cycle also resulted in a smaller influence in the estimation of propagation. What we also found was that when we estimated change propagation without limiting cycles the created cycled paths caused the estimation to grow orders of magnitude. After including cycle limitation it was observed a reduction from cycling from 2 to 10 percent.  8.4 Future Work There are several venues for future work which are detailed in the next sections.  8.4.1 Level of Analysis There is the need for formal methods and guidelines to determine up to which level in the hierarchal static structure should be considered for propagation estimation. Some researchers e.g. Laval (2011), Sosa (2008) obtain their measures at levels determined by an analyst. For instance, for the Ant software system version 1.3 used in (Sosa 2008) has the ant element at the first level concentrating 97% of the total dependencies in the system as internal dependencies to this element. Some systems such as Hadoop version 1.0.4 have up to 4 levels and with 96 packages, some of these packages having 1 class as the jmx package  116  at the first level or others like mapred.* at the second level having 189 classes. In order to make a meaningful and useful analysis of the dependencies in the system, modules with weak dependencies are collapsed and modules with more dependencies are expanded. Based on empiric knowledge we were initially considering analyzing no more than 10 fundamental modules in a system. In some cases, collapsing and expanding is useful, for instance, in our analyses we contracted elements that contained weak dependencies (around 1% of the total dependencies) and expanded elements that contained stronger dependencies as Sangal et al. (2005). However, guidelines that are more detailed or formal methods can be of great advantage for the analysis of systems with complex hierarchies.  8.4.2 Propagation Decrease for Path Length Increase According to several researchers in the cost propagation as the indirect dependency visits or passes through more elements the probability of the elements at the end of that path affecting each other by a change diminishes. How much should the indirect propagation might diminish is still an open question. In this study, we use the length of the path as an intuitive heuristic; however there might be a more formal method to determine how much should the indirect dependency should decrease as the length of the path increases.  8.4.3 Dependency Types Dependencies in the static structure of a software system are of different types; for instance, Laval (2011) differentiates the dependencies by method call, class reference, class inheritance and class extension (for OOP languages such as Ruby and others). Research that is more detailed is needed to determine the differentiation of strength caused by the different types of dependency, e.g. by inheritance and a class reference. One intuitive strength could be:  117    Determine the strength caused by an inheritance by the amount of functionality and data inherited, by the amount of lines of code or some other measure    Determine the strength caused by an inheritance by the amount of functionality and data inherited and the usage in the inheriting class  8.4.4 Internal Dependencies Every time there is a module expansion which contains more submodules we can see that the dependencies within the previously not expanded module result part in dependencies among submodules and other part in internal submodules dependencies.  Sangal et al. (2005) make a classification of the structure of the systems, and it can be observed that the majority of the systems analyzed in this study fall into the “imperfectly layered system” category. Some software practitioners mention that one of the causes of complexity is not enforcing layering, which refers to enforce that an element at layer i+1 only depends on layer i. Some other researchers state that enforcing layering might not bring any advantage and might increase the cost due to “wiring” the dependencies into a strict layering for the elements that use them. One possible line could be that when any dependency is used in more than half of the layered elements it might be useful to incorporate it into to the interface of the layers.  8.4.5 Analysis per Release The dependency propagation obtained can now be used to analyze cost per different releases as in (Nord 2012). The analysis per release can be performed using the visibility matrix, V (n-1) in order to find which modules are involved in the modifications and to obtain an indicator of the rework needed to modify the involved modules. This information used  118  along with the cost of the module can give more accurate information when compared with the binary dependencies.  8.4.6 Visibility MacCormack (2008) estimates the visibility matrix from a Boolean design structure matrix that is used to describe the static structure dependencies in the system. An element describing a dependency between module i and module j has a value of 1 in the matrix to indicate a code dependency exists or a 0 to indicate no dependencies involved. MacCormack’s visibility matrix is the result of adding up all of the indirect dependencies of length 0 to n-1 by adding the powers of the design structure matrix (Warfield 1973), D, as it can be observed in the formula: n 1  V  D  i  i 0  We propose to use add only the dependencies of length 1 to n-1, since dependencies of length 0 is the is the identity matrix (MacCormack 2008). We also found that suppressing the identity matrix at every Di causes cycling and is eliminated as Frost (1992) proposes. n 1  V    i  i 1  where    D I  (24)  8.5 Conclusion The “modified change propagation” is a useful metric to gauge the modularity and evolvability of a software system; it is based on the dependency propagation and offers a guideline for how maintainable a software architectural design is or might be as the structure enables. Our 119  change propagation metric is a set of incremental improvements over the original Propagation Cost (PC) metric suggested by MacCormack (2008), our metric weights dependencies with the strength of their connections, which conveys more information about the incidence of strongly connected relationships and discounts weak dependencies. This is important since it has been known for decades that maintenance cost can be reduced by implementing loosely coupled systems. Additionally for existing systems, propagation metrics based on MacCormack (2008) consider that the system should be acyclical; however, in practice a very few actual systems are exempt of cycles. Furthermore, if cyclic dependencies are heavy rather than weak then these cycles should be treated differently. Change propagation can also can help maintainability analysis in a release sequence by considering costs and propagation in order to choose a more evolvable software architecture.  Our modified change propagation metric can help software designers assess the maintainability of a software system over a proposed release sequence by considering propagation costs across multiple proposed evolutions of a software architecture, when this architecture is “refactored”. We validated our metric both on a system developed at UBC, and on several large open-course repositories for which we were able to obtain long release histories.  120  References Agrawal, Anumap. 2009. "Product Networks, Component Modularity and Sourcing." Journal of Technology & Innovation no. 4 (1):23. Apache, S. 2002. "Apache Software Foundation, accessed June 19, 2013, http://apache.org. Apache 2013. "Powered by Hadoop ". Bachmann, Felix, Len Bass, and Robert Nord. 2007. Understanding Architectural Patterns in Terms of Tactics and Models. News at SEI, accessed June 19, 2013, http://www.sei.cmu.edu/library/abstracts/news-at-sei/architect200708.cfm. Baldwin, Carliss Y., and Kim B. Clark. 2000. Design Rules, Vol. 1: The Power of Modularity: The MIT Press. Bass, Len, Paul Clements, and Rick Kazman. 2003. Software architecture in practice. 2nd Edition ed: Addison-Wesley Longman Publishing Co., Inc. Beck, Kent, Mike Beedle, Arie Van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, and Ron Jeffries. 2001. "Manifesto for agile software development." The Agile Alliance:2002-04. Boehm, Barry. 2003. "Value-Based Software Engineering." ACM SIGSOFT Software Engineering Notes no. 28 (2). Borthakur, D. 2007. "The hadoop distributed file system: Architecture and design." Hadoop Project Website no. 11:21. Bouwers, Eric, Arie van Deursen, and Joost Visser. 2011. Quantifying the encapsulation of implemented software architectures. Technical Report TUD-SERG-2011-031, Delft Software Engineering Research Group, Delft University of Technology. Cockburn, Alistair. 2001. Agile Software Development: Addison-Wesley Professional. Cunha, A., Pham Viet-Ha, W. Ross, and Wang Xianbin. 2011. Incorporating external command signals in the simulation of interdependent infrastructures: The control room concept. Paper read at Electrical and Computer Engineering (CCECE), 2011 24th Canadian Conference on, 8-11 May 2011. Cunningham, Ward. 1992. "The WyCash portfolio management system." In OOPSLA 92 Addendum: Conference on Object-Oriented Programming Systems, Languages and Applications, 29-30. Denne, Mark. 2004. "The Incremental Funding Method: Data-Drive Software Development." IEEE Software (May/June 2004):9. Dijkstra, E.W. 1968. "The structure of “THE”-multiprogramming system." Communications of the ACM no. 11 (5):341-346. Fowler, Martin. 2001. "Reducing coupling." IEEE Software no. 18 (4):102-104. Frost, RB. 1992. "Directed graphs and their adjacency matrices: misconceptions and more efficient methods." Engineering optimization no. 20 (3):225-239. Gal, Andreas, Wolfgang Schröder-Preikschat, and Olaf Spinczyk. 2001. On minimal overhead operating systems and aspect-oriented programming. Paper read at Proceedings of the 4th ECOOP Workshop on Object-Orientation and Operating Systems (ECOOP-OOOSWS’2001). Garlan, David, and Mary Shaw. 1994. An Introduction to Software Architecture. Carnegie Mellon University. Glass, R. 2003. Facts and Fallacies of Software Engineering. A practical guide. 1 ed. Boston: Addison-Wesley Professional. González-Rojas, M., Marti, J., Kruchten, P. 2011. A Canonical Data Model for Simulator Interoperation in a Collaborative System for Disaster Response Simulation. Paper  121  read at Proceedings of the 24th Canadian Conference on Electrical and Computer Engineering, CCECE 20118-11, May, 2011, at Niagara Falls, Ontario, Canada, . Greer, D., Ruhe, G. 2004. "Software release planning: an evolutionary and iterative approach." Information and Software Technology no. 46 (2004):11. Hitz, Martin, and Behzad Montazeri. 1995. Measuring coupling and cohesion in objectoriented systems. Paper read at Proceedings of the International Symposium on Applied Corporate Computing. ISO/42010. 2011. "Systems and software engineering -- Architecture description." ISO/IEC/IEEE 42010:2011(E) (Revision of ISO/IEC 42010:2007 and IEEE Std 14712000):1-46. doi: 10.1109/ieeestd.2011.6129467. Kazman, R. 2012. Private Communication, March 22, 2012. Koziolek, Heiko. 2011. Sustainability evaluation of software architectures: a systematic review. In Proceedings of the joint ACM SIGSOFT conference -- QoSA and ACM SIGSOFT symposium -- ISARCS on Quality of software architectures -- QoSA and architecting critical systems -- ISARCS. Boulder, Colorado, USA: ACM. Kruchten, P. 2010. InfoQ: What Color is your Backlog? Interview with Philippe Kruchten, accessed June 19, 2013. http://www.infoq.com/news/2010/05/what-color-backlog. Kruchten, P. B. 1995. "Architectural Blueprints—The “4+1” View Model of Software Architecture." IEEE Software no. 12 (6):8. Kruchten, P., M. Gonzalez, I. Ozkaya, and R. Nord. 2012. Change Propagation Metric as a Measure of Structural Technical Debt. In WICSA/ECSA. Helsinski, Finland. Kruchten, Philippe. 2003. The Rational Unified Process: An Introduction. 3rd ed: AddisonWesley Professional. Kruchten, P., Gonzalez, M., Ozkaya, I., Nord, R., Kruchten, N., 2012. Change Propagation Cost Metric as a Measure of Structural Technical Debt. Working paper. Lakos, J. 1996. "Large-scale C++ software design." Reading, MA. Laval, J. 2011. Package Dependencies Analysis and Remediation in Object-Oriented Systems, Laboratoire d'Informatique Fondamentale de Lille, l'Universite des Sciences et Technologies de Lille, Lille. Lehman, Meir, M. 1980. "Programs, Life Cycles, and Laws of Evolution." Proceedings of the IEEE no. 68 (9):17. Lidl, R., and G. Pilz. 1997. Applied abstract algebra: Springer. Liu, J., J. Lu, K. He, B. Li, and C.K. Tse. 2008. "Characterizing the structural quality of general complex software networks." International Journal of Bifurcation and Chaos in Applied Sciences and Engineering no. 18 (2):605. MacCormack, A., Rusnak, J., Baldwin, C. 2004. Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code. Harvard Business School Working Knowledge: 40. MacCormack, A., Rusnak, J., Baldwin, C. 2008. Exploring the Duality between Product and Organizational Architectures: A Test of the “Mirroring” Hypothesis. Harvard Business Schools Working Knowledge. Mancoridis, S., B.S. Mitchell, Y. Chen, and E.R. Gansner. 1999. Bunch: A clustering tool for the recovery and maintenance of software system structures. Paper read at Software Maintenance, 1999.(ICSM'99) Proceedings. IEEE International Conference on. Marti, J., P. Kini, P. Lusina, A.D. Pietro, V. Rosato, B. Charnier, and W. Kui. 2012. InterSystem Software Adapter for Decision Support by Interfacing Disaster Response Platforms & Simulation Platforms. Paper read at Global Humanitarian Technology Conference (GHTC), 2012 IEEE. Martí, J., C. Ventura, J. Hollman, K. Srivastava, and H. Juárez. 2008. I2Sim modelling and simulation framework for scenario development, training, and real-time decision support of multiple interdependent critical infrastructures during large emergencies. 122  Paper read at NATO (OTAN) MSG-060 Symposium on” How is Modelling and Simulation Meeting the Defence Challenges out to. McCabe, Thomas J. 1976. "A complexity measure." Software Engineering, IEEE Transactions on (4):308-320. McConnell, S. 2006. "Software Estimation, Desmystifying the Black Art." Microsoft Press. McConnell, Steve. 2007. "Technical Debt, accessed June 19, 2013." http://www.construx.com/10x_Software_Development/Technical_Debt/. Milev, R., S. Muegge, and M. Weiss. 2009. "Design Evolution of an Open Source Project Using an Improved Modularity Metric." Open Source Ecosystems: Diverse Communities Interacting:20-33. Nord, R. L., Ozkaya, I., Kruchten, P., González-Rojas, M. . 2012. In Search of a Metric for Managing Architectural Technical Debt. In Working IEEE / IFIP Conference on Software Architecture Helsinki, Finland. Nord, R. L., Ozkaya, I., Sangwan, R. S., Delange, J., Kruchten, P., González-Rojas, M., . 2013. Limitations in Using Structural Metrics to Measure Architecture Modifiability Properties. Nordberg III, Martin E. 2005. "Aspect-oriented dependency management." Aspect-Oriented Software Development:557-584. Ojala, Pasi. 2008. "Assessing Value of Software Architecture: a Case Study " WSEAS TRANSACTIONS on INFORMATION SCIENCE & APPLICATIONS no. 5 (9):10. Ozkaya, I., Kazman, R., Klein, M. 2007. "Quality-Attribute-Based Economic Valuation of Architectural Patterns." Parnas, D. L. 1972. "On the Criteria to be Used in Decomposing Systems into Modules." Association for Computing Machinery. Parnas, David L. 1979. Designing software for ease of extension and contraction. In Proceedings of the 3rd international conference on Software engineering. Atlanta, Georgia, United States: IEEE Press. Peerenboom, James. 2001. "Infrastructure interdependencies: Overview of concepts and terminology." Infrastructure Assurance Center, Argonne National Laboratory, Argonne, IL, p., accessed June 19, 2013 http://we-partner.org/onionbelt/wpcontent/uploads/2011/06/peerenboom_pdf.pdf Rossman, L.A. 2000. "EPANET 2: users manual." Ruhe, Günther. 2005. "The Art and Science of Software Release Planning." IEEE Software (Novermber/December 2005):7. Sangal, Neeraj, Ev Jordan, Vineet Sinha, and Daniel Jackson. 2005. Using dependency models to manage complex software architecture. In Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications. San Diego, CA, USA: ACM. Sethi, Kanwarpreet, Yuanfang Cai, Sunny Wong, Alessandro Garcia, and Claudio Sant'Anna. 2009. From retrospect to prospect: Assessing modularity and stability from software architecture. Paper read at Software Architecture, 2009 & European Conference on Software Architecture. WICSA/ECSA 2009. Joint Working IEEE/IFIP Conference on. Sharman, David M., and Ali A. Yassine. 2004. "Characterizing complex product architectures: Regular Paper." Syst. Eng. no. 7 (1):35-60. doi: 10.1002/sys.v7:1. Sosa, M. E., Browning, T., Mihm, J. 2008. "Studying the dynamics of the architecture of software products." 2007 Proceedings of the ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, DETC2007 no. 3 PART A:329-342. Steward, Donald V. 1981. Systems analysis and management: structure, strategy and design: PBI (New York).  123  Sullivan, K., Griswold, W., Cai, Y., Hallen, B. 2001. The Structure and Value of Modularity in Software Design. In ESEC/FSE 2001. Vienna, Austria: ACM. Tarjan, R. 1972. "Depth-first search and linear graph algorithms." SIAM journal on computing no. 1 (2):146-160. Wang, X., Y Ghanam, and F. Maurer. 2009. AgilePlanner for Digital Tabletop. Warfield, John N. 1973. "Binary Matrices in System Modeling." Systems, Man and Cybernetics, IEEE Transactions on no. SMC-3 (5):441-449. doi: 10.1109/tsmc.1973.4309270. Wiegers, K. E. 1999. "First things first: prioritizing requirements." Software Development no. 7 (9):11-19.  124  

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0073917/manifest

Comment

Related Items