UBC Theses and Dissertations

Essential software structure through implicit context Walker, Robert James 2003

Essential Software Structure through Implicit Context

by

Robert James Walker

B.Sc. (Geophysics, Major Program), University of British Columbia, 1992
B.Sc. (Computer Science, Honours Program), University of British Columbia, 1994
M.Sc. (Computer Science), University of British Columbia, 1996

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Department of Computer Science)

We accept this thesis as conforming to the required standard

The University of British Columbia
March 2003
© Robert James Walker, 2003

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
Vancouver, Canada
Date: 7 March 2003

Abstract

Software reuse and evolution are problematic. Modules tend to express a great deal of knowledge about the external modules, large-scale structure, and behaviour of the systems in which they reside. As a result, the reusability of individual modules is decreased as they are too dependent on other modules. Likewise, the evolvability of systems is decreased as they are too dependent on the details of individual modules. However, we must have some dependences between our modules, or they would not be able to operate together. Not all dependences are equally bad, then.
Each module can be seen to have a minimal core, an essential structure, beyond which it makes no sense to reduce. This essential structure describes the basic responsibilities of that module, and its expectations of the responsibilities of external modules. The thesis of this dissertation is that expressing the essential structure of our software modules, through the use of implicit context, makes those modules easier to reuse and the systems containing those modules easier to evolve. Implicit context revolves around the notion that we must abandon the need for all modules to be defined with respect to an absolute frame of reference. Instead, modules can be defined relative to a reference frame of convenience. In order for modules to work together, the inconsistencies in their independent world views must be reconciled when they attempt to communicate. Implicit context provides the novel mechanism of contextual dispatch to perform this reconciliation. When any communication passes into or out of a module, that communication may be rerouted, altered, replaced, or discarded altogether, as the situation dictates. This translation process sometimes requires access to information about the communication history of the system; objects previously passed can be retrieved, or the appropriate recipient of a message can be determined. The dissertation describes a prototype tool that supports the application of implicit context to Java source code. The thesis is validated by applying this tool in two case studies: comparing the evolution of an implicit context-based implementation of an FTP server to an object-oriented implementation; and reusing the Outline View of the Eclipse integrated development environment in a different application, even though the Outline View was not designed for such reuse.  

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgments
Trademarks
1 Introduction
  1.1 Overview of Implicit Context
  1.2 A Simple Example
  1.3 A Brief Survey of Evolution and Reuse Techniques
    1.3.1 Ad Hoc Approaches
    1.3.2 Product-, Architecture-, and Interface-Centric Approaches
    1.3.3 Generative Approaches
  1.4 Organization of the Dissertation
2 Motivation
  2.1 Reusing the Outline View of Eclipse
  2.2 Copy-and-Modify
  2.3 Refactor
    2.3.1 Dependences in makeContributions
    2.3.2 Refactoring without Changing the Class Hierarchy
    2.3.3 Refactoring through an Abstract Class
  2.4 Summary
3 Essential Structure
  3.1 Minimal Specifications
  3.2 Extraneous Embedded Knowledge
    3.2.1 Varieties of EEK
  3.3 Reduction to Essential Structure: An Abstract View
  3.4 Summary
4 A Model for Essential Structure through Implicit Context
  4.1 Implicit Context
    4.1.1 Boundaries
    4.1.2 Contextual Dispatch
    4.1.3 Communication History
  4.2 Using Implicit Context to Express Essential Structure
    4.2.1 Adding Contextual Details to an Abstract Module
    4.2.2 Cancelling Out EEK Already Present within a Module
  4.3 Summary
5 A Prototype Tool for Utilizing Implicit Context in Java
  5.1 A Simple Example
  5.2 Tool Structure
  5.3 Boundary Maps
    5.3.1 Capturing Communications
    5.3.2 Parameterized Boundary Maps
    5.3.3 Resuming Captured Communications
    5.3.4 Modifiers
    5.3.5 Communication History
    5.3.6 Renaming Maps
  5.4 Tool Directives
  5.5 Sequencing Boundary Maps
  5.6 Implementation Issues
    5.6.1 Combining Boundary Maps and Java Source
    5.6.2 Instrumentation, Communication History, and Sequencing of Boundary Maps
  5.7 Summary
6 Validation
  6.1 Reusing the Outline View of the Eclipse IDE
    6.1.1 The Procedure
    6.1.2 Results and Lessons Learned
  6.2 Evolution of a File Transfer Protocol Server
    6.2.1 FTP Concepts
    6.2.2 The Procedure
    6.2.3 Results and Lessons Learned
  6.3 Summary
7 Discussion
  7.1 Local Execution
    7.1.1 Event Traces
    7.1.2 Reconciliation of Local and Global Execution
  7.2 Communication History: Use and Efficient Support
    7.2.1 Communication History Queries as an Expression of Minimal Dependences
    7.2.2 Communication History Queries as Pattern Matches
    7.2.3 Efficient Support of Communication History Queries
  7.3 Implicit Context and Aspect-Oriented Programming
    7.3.1 Outline of Concepts in AspectJ
    7.3.2 Describing AspectJ in Terms of Event Traces
  7.4 Summary
8 Related Work
  8.1 Principles
    8.1.1 Separation of Concerns
    8.1.2 Dependences, Coupling, and Cohesion
    8.1.3 Local Reasoning
    8.1.4 Design-for-Change
    8.1.5 Reflection
  8.2 Ad Hoc Approaches
    8.2.1 Copy-and-Modify
    8.2.2 Refactoring
  8.3 Interface-, Architecture-, and Product-Centric Approaches
    8.3.1 Inheritance, Polymorphism, and Frameworks
    8.3.2 Components and Middleware
    8.3.3 Product Lines
    8.3.4 Adaptors and Wrappers
  8.4 Generative Approaches
    8.4.1 Generic Programming
    8.4.2 Generators
    8.4.3 Collaboration-Based Design
    8.4.4 Aspect-Oriented Programming
  8.5 Other Related Topics
    8.5.1 Essential Structure
    8.5.2 Context
    8.5.3 Multiple World Views
    8.5.4 Names
    8.5.5 History and Traces
  8.6 Summary
9 Conclusion
  9.1 Contributions
  9.2 Future Work
Bibliography
Appendix A A Brief Overview of Notation
  A.1 Square Boxes
  A.2 Rounded Boxes
  A.3 Lines and Arrows
  A.4 Filled Circles
  A.5 Jagged Polygons
Appendix B A Brief Overview of the Abstract Factory Design Pattern
Appendix C Listing of makeContributions
Appendix D The Application Programming Interface to Call History
  D.1 The History Class
  D.2 The Call Class
Appendix E Example Results of Applying the Prototype Tool
  E.1 Before Applying the Tool
    E.1.1 Contents of SomeClass.java
    E.1.2 Contents of OtherClass.java
    E.1.3 Contents of example.map
    E.1.4 Contents of apply.app
  E.2 After Applying the Tool
    E.2.1 Contents of icworkingdir/SomeClass.java
    E.2.2 Contents of icworkingdir/OtherClass.java

List of Tables

5.1 Interpretation and support for capturing communications, for in event designators, by each kind of boundary
5.2 Interpretation and support for capturing communications, for out event designators, by each kind of boundary
5.3 Interpretation and support for capturing communications, for gets and sets event designators, by each kind of boundary
5.4 The query methods defined on the History class
5.5 The interface to the Call class
6.1 Statistics on Eclipse and reusing the Outline View
6.2 Statistics on synthesizing and evolving the FTP server

List of Figures

1.1 A design utilizing the Abstract Factory design pattern
1.2 Supporting independent world views in conjunction with the Abstract Factory design pattern example
2.1 Screenshot of the Eclipse integrated development environment
2.2 Reusing the Outline View in another application
2.3 Dependences of the JavaOutlinePage class
3.1 Supposed, actual, and ideal dependences involving extraneous identification
3.2 An extraneous parameter arises due to the mismatch between minimal specifications and implementations
4.1 Overlapping boundaries
4.2 Nested boundaries
4.3 Performing contextual dispatch to use the Abstract Factory design pattern
4.4 Performing contextual dispatch to retrieve state through an added method
4.5 Retrieving state through the use of the communication history
4.6 An implementation of makeContributions expressing only its essential structure
4.7 The original implementation of makeContributions
4.8 Using communication history to retrieve discarded information
4.9 Altering an incoming message to hide the presence of EEK
5.1 Using the Abstract Factory design pattern to isolate platform-specific dependencies
5.2 The operation of boundary maps attached to CompoundWidget
5.3 Overview of the operation of the prototype tool
5.4 BNF syntax for mapsets
5.5 BNF syntax for capture clauses
5.6 BNF syntax for calls to proceed
5.7 BNF syntax for modifiers
5.8 A sample query method on the History class
5.9 An example communication history
5.10 BNF syntax for renaming maps
5.11 BNF syntax for tool directives
6.1 The Eclipse integrated development environment
6.2 The module resulting from the reuse of the Outline View
6.3 The Outline View reused as an interface to my prototype tool
6.4 An example mapset performing a communication history query, Part 1
6.5 An example mapset performing a communication history query, Part 2
6.6 Evolving versions of the FTP server
6.7 Initial design for the purely object-oriented version of the FTP server, Part 1
6.8 Initial design for the purely object-oriented version of the FTP server, Part 2
6.9 Initial design for the implicit context version of the FTP server, Part 1
6.10 Initial design for the implicit context version of the FTP server, Part 2
6.11 The FTP-specified state machine for authentication
6.12 The FTP-specified state machine for passive mode
6.13 Evolved design for the object-oriented version of the FTP server, Part 1
6.14 Evolved design for the object-oriented version of the FTP server, Part 2
6.15 Evolved design for the implicit context version of the FTP server, Part 1
6.16 Evolved design for the implicit context version of the FTP server, Part 2
6.17 Cross-evolved design of the FTP server
6.18 Using communication history to implement the Abstract Factory design pattern
6.19 Boundary map applied to command interpreters to alter their behaviour on the basis of whether the session has been authenticated, Part 1
6.20 Boundary map applied to command interpreters to alter their behaviour on the basis of whether the session has been authenticated, Part 2
A.1 Square boxes denoting types
A.2 Square boxes denoting instances and boundaries
A.3 Rounded boxes
A.4 Lines and arrows denoting attribute relationships
A.5 Lines and arrows denoting composition relationships
A.6 Lines and arrows denoting subtyping relationships
A.7 Lines and arrows denoting control and data flow
A.8 Lines and arrows denoting dependence relationships
A.9 Filled circles denoting message interception
A.10 Jagged polygon denoting call history
B.1 Static structure of the Abstract Factory design pattern
E.1 General structure of instrumentation for methods and constructors

Acknowledgments

It was a dark and stormy night in the Imager Computer Graphics Laboratory. I haunted the corridors of a Counterstrike world, inflicting nightmarish vengeance upon those who opposed me. And then I thought of my thesis topic....

Or something like that. There were definitely nights, Imager, and Counterstrike involved. All of which were made more bearable by my friends and colleagues there, including David Bullock, Bill Gates, Rob Scharein, Lisa Streit, Valerie Summers, Roger Tarn, and Marcelo Walter. Many interesting technical discussions. Much help and advice. But most of all: Good times.

Anne Lavergne, who also started as an Imagerite, taught me much: some good, some bad. I would take the same route over again.

Things fall apart.

The second half of my grad student life began by haunting the dark corners of the basement of the CICSR building in what came to be known as the Sensory Deprivation Chamber. The Software Engineering Group being the newest and smallest group in the department, the battle for real estate was prolonged. In those days, the only other grad student around was Elisa Baniassad, another friend and colleague. Eventually, we eked out a corner of the Distributed Systems Group Laboratory, and Martin "I hate that" Robillard joined our merry band. Ah, Martin, you can always be counted on to be up for beer no matter what corner of the planet we find ourselves in, as it should be.

At some point, I ran off for the summer to IBM T.J. Watson Research Center in Hawthorne, New York. There, I met Peri Tarr and Harold Ossher, whose work helped drive some of my thoughts on my thesis. I also met Siobhan Clarke, another summer intern, with whom a close friendship and work collaboration developed.

The Software Engineering Group grew into the Software Practices Laboratory, and new faces arrived on the scene. We were not as boisterous and close a group as the circle of friends in Imager, but the atmosphere was collegial, and much help was offered by all. Of particular note were Brian de Alwis, Kris De Volder, Gregor Kiczales, and Jonathan Sillito. Yvonne Coady joined us eventually, and we all purchased ear plugs. But deafness is a small price to pay for her energy and incite (not to mention her insight).

There are many others who have helped over the years in so many ways, personal and professional. You are not forgotten, but there is only so much into which I wish to delve.

And finally, there is Gail Murphy: the ideal mentor, a fun colleague, and a good friend. I would not be finishing this dissertation now, and especially not in software engineering, were it not for you. I cannot find the words to express my appreciation and admiration for you. Thank you for everything, but especially, for being human and showing me that there is nothing wrong with that.

Thank you all. A Ph.D. is a lonely road, but you were the music that brightened the way.

ROBERT JAMES WALKER
The University of British Columbia
March 2003

Trademarks

"POPL" and "Principles of Programming Languages" are registered trademarks of the Association for Computing Machinery. "LEGO" is a registered trademark of The LEGO Group. "Microsoft" and "Windows" are registered trademarks of Microsoft Corporation. "CORBA" is a registered trademark of Object Management Group; "Unified Modeling Language" is a trademark of Object Management Group. "Motif" and "UNIX" are registered trademarks of The Open Group. "AspectJ" is a trademark of Palo Alto Research Center Inc. "Java" is a trademark of Sun Microsystems, Inc.

Chapter 1

Introduction

Software systems evolve [Belady and Lehman, 1976; Lehman and Parr, 1976; Parnas, 1994; Svahnberg and Bosch, 1999; Eick et al., 2001].
The need to correct errors, to introduce new features, and to support different platforms all exert pressure upon our existing systems. Software is expensive to create; therefore, we would rather modify our systems to accommodate new requirements than construct wholly new systems. Similarly, when pieces of existing systems provide functionality that is needed in new systems, there is a potential savings in reusing those pieces within the new systems [McIlroy, 1969; Lanergan and Grasso, 1984; Krueger, 1992; Sen, 1997].

Software systems are modularized in the hope that changes can be isolated to individual modules without affecting the remainder [Parnas, 1972; Korson and Vaishnavi, 1986; Sullivan et al., 2001]. Consider a particular module in a system: it will depend on other modules in that system and other modules in that system will depend upon it. When modules are too interdependent, too coupled, problems arise [Stevens et al., 1974]. We are limited in our ability to modify a module because of the dependences upon its interface and externally visible behaviour. ("Dependence" is used synonymously with "dependency" in this dissertation [Barber, 2001].) We are also limited in our ability to reuse a module outside of its current system because of its dependences on the interfaces and externally visible behaviour of other modules there.

However, we must have some dependences between our modules, or they would not be able to operate together. Each module can be seen to have a minimal core, an essential structure, beyond which it makes no sense to reduce. This essential structure describes the basic responsibilities of that module, and its expectations of the responsibilities of external modules. Essential structure is often abstract: a module may care that an external service exist and that it behave within certain constraints, but otherwise may be oblivious to the exact details of a concrete interface to such a service.

Dependences that are not essential to the structure of our modules, which I call extraneous embedded knowledge (EEK), occur in two ways. First, when an abstract description of an external service is made too concrete, we constrain the module in which that description occurs; we cannot reuse the module unless that exact concrete service is present. Second, systems often have large-scale concerns in addition to the small-scale concerns of our individual modules. Details of such large-scale concerns tend to get pushed into our small-scale modules. Once there, we cannot reuse the module in another system with different large-scale concerns. Once there, we cannot evolve our system to change its large-scale concerns without the modification of multiple modules, a time-consuming and error-prone process.

Ultimately, the presence within our modules of details of their contexts of operation causes us difficulty in evolving or reusing them. State-of-the-art programming mechanisms do not permit us the right level of abstraction to avoid such contextual details when implementing our modules. The thesis of this dissertation is that expressing the essential structure of our software modules, through the use of implicit context, makes those modules easier to reuse and the systems containing those modules easier to evolve.

In Section 1.1, I look in brief at implicit context, and consider how it can be used to express the essential structure of a simple example in Section 1.2. Section 1.3 gives a short survey of existing techniques for software reuse and evolution. The organization of the dissertation is described in Section 1.4.

1.1 Overview of Implicit Context

To ease software evolution and reuse, we must abandon the notion that all modules can be defined with respect to an absolute frame of reference. Such a reference frame would need to hold across all versions of a system and across all systems; otherwise, the constraints and names that are valid for one version of one system will not be valid for some other version or system.

Instead, I define modules relative to a reference frame of convenience. In traditional software development, this frame lies implicitly at the system-level: every module expresses only a subset of the names and constraints upon the system, but no conflict is permitted among these modules. Implicit context pushes the reference frame down to the level of modules: each module is permitted to express a set of names and constraints that are different from those expressed by every other module. With implicit context, each module is permitted an independent world view.

Since each module does not need to agree on a common frame of reference, it is free to express only those details of its operation that are of immediate importance from its local point of view; that is, it is free to express its essential structure. Details of significance only to higher-level modules can be omitted, and dealt with separately.

When we compose modules to form a higher-level module, we must reconcile the differences in the world views of the lower-level modules to agree with the world view of the higher-level module. After all, the modules must ultimately work together to form a coherent system. But postponing the reconciliation as long as possible maximizes the flexibility of the modules. Each module only expresses what it must from its local perspective, and this expression is transformed to suit higher-level modules only when those higher-level modules are definitely to be used.

Conceptually, the conflict in world views does not manifest itself until communication occurs between modules using different world views. Therefore, we can alter those communications to reconcile the differences as they cross between different world views.

Implicit context consists of three concepts: boundaries between conflicting world views, contextual dispatch which is used to alter communications, and communication history which is used to retrieve previous state when performing contextual dispatch.

Contextual dispatch is a generalization of object-oriented dispatch. Contextual dispatch may be performed on communications as they cross a boundary between conflicting world views.
Rather than selecting the implementation of a method on the basis of the run-time type of a given object, we select the method or methods to execute on the basis of the context in which the communication occurs: both static properties and dynamic properties of the system. Contextual dispatch can be used to fill in concrete details in an abstract service request: a message can be rerouted to a concrete implementor of the service, with additional parameters filled in.

Trouble can arise when the source of such additional parameters is not immediately obvious. Instead, communication history can be used. Communication history is, conceptually, a record of all communication that has occurred in the system, including all arguments passed and values returned. Through queries on communication history, we can retrieve state information of interest: the last passed argument of a particular type, for example. Communication history queries have the advantage that they can be expressed to minimize our assumptions about the system, as consistent with the principle of essential structure. Thus, communication history allows us to reduce the coupling in our systems, easing software evolution and reuse.

Let us now consider a simple example in which contextual dependences adversely affect the evolvability and reusability of a module within a system, and then see how essential structure through implicit context can remedy the situation.

[Figure 1.1 (diagram): Client, with method aMethod(AbstractFactory); AbstractFactory, with method createButton(), subclassed by MotifFactory and MSWindowsFactory, each implementing createButton(); AbstractButton, subclassed by MotifButton and MSWindowsButton.]

Figure 1.1: A design utilizing the Abstract Factory design pattern. This figure and all others in this document are based on the notation of the Unified Modeling Language™ [Object Management Group, 2000], but with extensions and liberties. See Appendix A for details of this notation, which is used throughout this dissertation.
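
The notion of a communication history query, such as retrieving "the last passed argument of a particular type", can be illustrated with a minimal Java sketch. The class and method names below (RecordedCall, CommunicationLog, record, lastArgumentOfType) are hypothetical and chosen only for illustration; they are not the History and Call classes of the prototype tool, and a real implementation would record communications automatically rather than through explicit calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for one recorded communication: a method name plus
// the arguments that were passed. Not the thesis's Call class.
final class RecordedCall {
    final String method;
    final Object[] arguments;

    RecordedCall(String method, Object... arguments) {
        this.method = method;
        this.arguments = arguments;
    }
}

// Hypothetical stand-in for a communication history: an append-only log of
// calls in the order they occurred. Not the thesis's History class.
final class CommunicationLog {
    private final List<RecordedCall> calls = new ArrayList<>();

    // Appends one communication to the log (done by hand in this sketch).
    void record(String method, Object... arguments) {
        calls.add(new RecordedCall(method, arguments));
    }

    // Query: the most recently passed argument that is an instance of the
    // given type, searching from the newest call backwards.
    <T> Optional<T> lastArgumentOfType(Class<T> type) {
        for (int i = calls.size() - 1; i >= 0; i--) {
            Object[] args = calls.get(i).arguments;
            for (int j = args.length - 1; j >= 0; j--) {
                if (type.isInstance(args[j])) {
                    return Optional.of(type.cast(args[j]));
                }
            }
        }
        return Optional.empty();
    }
}
```

The query names only a type, not the module or method through which the value originally arrived, which is one way of reading the claim that such queries can be expressed to minimize assumptions about the system.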

1.2 A Simple Example

The Abstract Factory design pattern [Gamma et al., 1994] (see Appendix B for an overview) has become a standard means of reducing coupling in our systems in the hopes of easing evolution. Consider a system that involves the use of a graphical user interface (GUI); which concrete classes need instantiating to represent widgets (such as buttons, sliders, or text areas) depends on concrete details of the context in which the system exists, such as the operating system or hardware present. This design pattern allows some modules to be decoupled from knowledge about these concrete details of the context, in the hopes that the system can then be run on a wider variety of platforms; the specific design I describe is illustrated in Figure 1.1 using a notation based on the Unified Modeling Language [Object Management Group, 2000], but with extensions and liberties. See Appendix A for details of this notation, which is used throughout this dissertation.

Take a particular widget, such as a button. There are different concrete implementations of button for the Motif® GUI library (which runs on UNIX® platforms) and for Microsoft® Windows®. If we wish to isolate these differences from a module that just needs to create a button, we can create a small hierarchy of classes consisting of AbstractButton at the base. This abstract base class is then subclassed on the basis of which platform is supported: MotifButton implementing buttons for the Motif library and MSWindowsButton implementing buttons for Microsoft Windows.

A client module that needs to create buttons should be oblivious to which platform is in use, and thus, should express no knowledge of either MotifButton or MSWindowsButton. However, such a client cannot instantiate the abstract button base class to get an instance of the subclass appropriate for the current platform in use. Instead, this instantiation is done indirectly.

There is a hierarchy of classes parallel to the button hierarchy. At the base is an abstract class called AbstractFactory. This is subclassed on the basis of platform into MotifFactory and MSWindowsFactory. Some part of the system must know which platform-specific factory is appropriate for the current operating context; this part is then responsible for instantiating the correct factory subclass. AbstractFactory possesses an abstract method called createButton; this method is then implemented by each platform-specific factory to create the corresponding platform-specific version of button.

Now, a client module must possess a parameter of type AbstractFactory, through which is passed the concrete, platform-specific factory instantiated elsewhere. The createButton method is called on this factory, and a concrete, platform-specific instance of button is created, but without the client requiring knowledge of what platform is currently in use. The instance of button is then accessed through the interface defined by AbstractButton. This approach also allows the addition of subclasses supporting other platforms, without the need to modify any client modules.

Difficulties exist with this approach. Consider the nature of a client module using the Abstract Factory design pattern. In the absence of the pattern, the client would construct a new button object through the call (using Java™ [Gosling et al., 2000] syntax):

    AbstractButton button = new AbstractButton();

But in the presence of the pattern, this call must instead become:

    AbstractButton button = factory.createButton();

where factory is a formal parameter of the module. If the client module is to use the Abstract Factory pattern, it is stuck with this constraint. Even if the client module is moved to a system (through reuse or evolution) where the Abstract Factory pattern is not needed or inappropriate, it remains constrained to use it. This constraint is of importance to the system in which the client exists; the client module itself simply requires that a button be created. Thus, we again see the contextual dependences of a module reducing its reusability and evolvability.

The call above that directly instantiates AbstractButton represents the essential structure of this part of the client: the client must express that an instance of AbstractButton is to be created. It is only the system as a whole that must express that this instantiation is to take place via the Abstract Factory design pattern.

Implicit context allows for the separation of the concrete details of instantiation from the fact that an object must be created, as shown in Figure 1.2. There are two issues with which to deal when performing this separation.
First, we want the module(s) calling the client to continue to see the same interface: these callers should still pass a factory for the client to use. But the client should be unaware that this information is being passed to it, because it should express no knowledge of the Abstract Factory pattern: the factory is effectively discarded. Second, when the attempt is made to instantiate AbstractButton, we can use contextual dispatch to replace this attempt with an actual call to the createButton method on the factory object that was passed earlier. This factory object is retrieved via a query to communication history.

Through the expression of essential structure via implicit context, I have made the client less dependent upon this particular system, allowing it to be reused in other versions or other systems where the Abstract Factory design pattern is not used. The client possesses a world view where Abstract Factory is not in place, but where concrete details of platforms remain unseen. The system as a whole possesses a different world view, one in which Abstract Factory is very much in use. Each world view is an expression of the essential structure appropriate at that level; implicit context allows their differences to be reconciled.

1.3  A Brief Survey of Evolution and Reuse Techniques

Many software reuse and software evolution techniques, approaches, and tools have been defined. These address a wide range of problems, including locating reusable modules of interest and re-engineering legacy systems; many of these problems, while important, lie outside the purview of this dissertation. Here, I assume that one has managed to locate software of interest, but requires the means to reuse or evolve it. There are three broad categories of prior work related to this thesis: ad hoc approaches; product-, architecture-, and interface-centric approaches; and generative approaches.
1.3.1  Ad Hoc Approaches

Ad hoc approaches are ones where manual intervention forms the basis for supporting change. Copy-and-modify and refactoring are representative examples. Copy-and-modify (also called scavenging) is likely the most common form of reuse, since it requires no special support from languages or tools [Krueger, 1992]; the programmer simply notices a similarity between an existing source code module and a desired one, makes a copy of it, and modifies the copy to the needs of the new situation. Refactoring [Opdyke, 1992; Fowler et al., 2000] is a slightly more sophisticated technique for restructuring a system, often used for object-oriented systems; example refactorings include moving methods between classes, adding a parameter to an overridden method, and collapsing a class hierarchy. One can avoid simple errors in refactoring through the use of tools.

Figure 1.2: Supporting independent world views in conjunction with the Abstract Factory design pattern example. In (a), the client attempts to instantiate AbstractButton directly. In (b), the rest of the system believes that the client uses the AbstractFactory to instantiate AbstractButton indirectly. These conflicting world views are combined and reconciled in (c). At a boundary surrounding Client, implicit context is used to intercept the attempt to directly instantiate AbstractButton, rerouting it to a particular instance of AbstractFactory. The net effect is to remove the dependence upon AbstractFactory from inside the client, placing this dependence externally. Now, the client can be reused in contexts that do not use the Abstract Factory pattern.

Typically, ad hoc approaches suffer from three shortcomings: non-repeatability, code replication, and interface constraints. If the changes made to a module are not quite correct, reverting to an earlier version of the module is necessary. Many steps may have been needed to derive the new version from the old; these steps need to be partially repeated to derive a corrected new version. This can be onerous and error-prone when many steps exist between the old and new versions. Tool support can mitigate this difficulty to an extent.

Code replication results from the need to maintain both the original version and the modified version of a module. Consider what must occur if a problem is detected in one of these versions; it is often the case that identical or similar changes are needed in every copy. This requires one to locate all the copies and perform the same or similar steps on each to correct the problem. Such duplicated effort is tedious and error-prone [Burd and Munro, 1997].

Finally, one does not always have the luxury of having full control over every module that is dependent upon the ones requiring modification. Although the concept of encapsulation [Parnas, 1972] allows implementations of an interface to change, we are often constrained to maintain a published interface for the sake of clients; otherwise, those clients would all break.
Refactoring can be used to change the structure of parts of a system without replicating those parts, but it cannot help us to overcome interface constraints. I will describe other difficulties with refactoring in Chapter 2.

1.3.2  Product-, Architecture-, and Interface-Centric Approaches

Product-, architecture-, and interface-centric approaches form a continuum that varies on the basis of the degree to which systems using them are constrained. At one end of this continuum are systems that possess a set of configurable options, allowing for a pre-defined family of products. At the other end are systems built from some pre-existing parts, where the interfaces to these parts constrain the way in which the rest of the system must be built.

Object-orientation provides for reuse and evolution through inheritance. Object-oriented frameworks [Johnson, 1997] supply a set of interacting classes, with the intent that some will be subclassed, specializing the framework to a given context. Middleware frameworks, such as CORBA® [Object Management Group, 1998], provide a standardized calling protocol and interface constraints to which new modules (i.e., components) are to adhere [Szyperski, 1997].

The trouble with all of these approaches is twofold. First, no matter how standardized the interface provided by one of these approaches, it inevitably needs to evolve. The need to maintain backwards compatibility for existing client modules and the need to update the interface to accommodate new ideas and new features exert pressure in opposite directions; predicting such changes ahead of time is rarely possible [Wegner, 1996]. As a result, languages evolve, libraries evolve, and standards evolve; otherwise, they stagnate and are eventually abandoned.

Second, no single architecture is ideal, or even usable, for all systems. Expecting modules to always plug in to any architecture like "LEGO® blocks" is unrealistic.
Most modules place constraints upon their systems in the form of dependences on external modules. A given module may need support from its system for certain global properties, such as particular security or fault detection protocols; such support cannot be provided by an arbitrary framework [Ran, 1999]. All these approaches require a single, global perspective on the system, an unrealistic requirement for software change that is distributed over time and geographical space.

1.3.3  Generative Approaches

Generative approaches fall into a number of subcategories; Czarnecki and Eisenecker [2000] give a good survey. Implicit context is itself a generative approach; the most closely related generative approaches are generic programming and aspect-oriented programming.

Generic Programming

Generic programming [Musser, 1997; Musser and Nishanov, 1997; Jazayeri et al., 1998], most famously exemplified by the C++ Standard Template Library [Stepanov and Lee, 1995], is concerned with finding the most abstract representation of efficient algorithms. This goal is similar to that of essential structure, though narrower in scope. Generic programming represents such algorithms in terms of generic parameters; concrete versions can be generated for concrete contexts by binding these generic parameters to concrete values or types. However, generic parameters must be declared explicitly. This results in one of two requirements: either the most elaborated use of the generic module must be predicted, so that appropriate generic parameters may be provided, or the generic module must evolve. Even should such prediction be possible, a generic module will then tend to possess such a large number of generic parameters that it will be difficult to use in simpler, concrete situations. Evolving a generic module to add generic parameters would require that all instantiations of it be modified to bind values to these new parameters, which is often not possible.
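The cost of explicitly declared generic parameters can be illustrated with a small Java sketch. The class GenericMax and both of its methods are hypothetical names chosen for this illustration; Java generics are used here in place of C++ templates, so this is an analogy to the STL case rather than a reproduction of it.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Hypothetical generic module, shown in two stages of its evolution.
final class GenericMax {
    private GenericMax() { }

    // Stage 1: one generic parameter T, bound at each call site together
    // with an explicit ordering. Assumes a non-empty list.
    static <T> T max(List<T> items, Comparator<? super T> order) {
        T best = items.get(0);
        for (T item : items) {
            if (order.compare(item, best) > 0) {
                best = item;
            }
        }
        return best;
    }

    // Stage 2: a more elaborate use was not predicted, so a second generic
    // parameter K (the type of a derived key) is added. Every call site
    // that wants this version must now bind K and supply a key function.
    static <T, K> T maxBy(List<T> items,
                          Function<? super T, ? extends K> key,
                          Comparator<? super K> order) {
        T best = items.get(0);
        K bestKey = key.apply(best);
        for (T item : items) {
            K candidateKey = key.apply(item);
            if (order.compare(candidateKey, bestKey) > 0) {
                best = item;
                bestKey = candidateKey;
            }
        }
        return best;
    }
}
```

A caller of the first version writes GenericMax.max(words, comparator); a caller of the second must also pass a key function, such as String::length. Had the second signature replaced the first rather than being added beside it, every existing call would have needed editing, which is the evolution problem described above.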
Aspect-Oriented Programming

Aspect-oriented programming (AOP) [Kiczales et al., 1997] permits the separation and encapsulation of crosscutting concerns, that is, concerns that affect the implementation of multiple modules within a system, such as synchronization, remote invocation, or fault tolerance. By pulling out these concerns and placing them in their own, special modules (called aspects), the hope is that the core functionality of the modules will be less obscured and more reusable, and that the crosscutting functionality will be more evolvable, understandable, and reusable since it is localized. AOP is realized in a number of concrete programming languages, especially AspectJ™ [Kiczales et al., 2001], an aspect-oriented version of Java [Gosling et al., 2000].

More recently, the term "aspect-oriented programming" has come to be used for a number of approaches related to AOP as originally formulated [Elrad et al., 2001], including composition filters [Aksit et al., 1992] and subject-oriented programming (SOP) [Harrison and Ossher, 1993]. Composition filters provide a means for intercepting and rerouting messages at run-time. SOP allows separate class hierarchies to be composed; messages sent to an object from within one of these hierarchies can cause the invocation of behaviour within other hierarchies.

Some features of implicit context are similar to those of existing AOP techniques. However, its support for inconsistent, independent world views is unique. Such support permits reuse of modules without a need for pre-planning. It also allows modules to be oblivious to changes in the system around them, reducing the need to invasively modify many modules within a system undergoing change.

1.4  Organization of the Dissertation

I begin the meat of this dissertation in Chapter 2, considering a motivational example of the difficulties of software evolution and reuse.
This example points the way to the concept of essential structure, discussed in Chapter 3, describing minimal specifications and extraneous embedded knowledge along the way. In Chapter 4, I define the model of implicit context and demonstrate how it can be used to realize essential structure.

The model has been realized in a concrete, prototype tool for applying implicit context to Java source code. The tool is described in Chapter 5. It operates as a preprocessor to Java source code. The developer writes snippets of code, called boundary maps, that specify the details of the translation process for particular kinds of communications. The developer also specifies to which boundary (or boundaries) a given boundary map is to be applied; under the prototype, boundaries can be defined around only individual classes or individual methods. The original source code and boundary maps are fed into the prototype tool, which mechanically transforms the source accordingly.

In Chapter 6, I describe two case studies that I performed: one describing a case of unanticipated reuse of a feature from an industrially-developed system; and one describing a case of parallel development and evolution with and without implicit context. The case studies make use of the prototype tool. Taken together, these two case studies validate the thesis by providing evidence that essential structure through implicit context eases software evolution and software reuse.

I discuss a number of outstanding issues and potential solutions in Chapter 7, especially with regard to making the technique practical. Related work is described more thoroughly in Chapter 8. I describe my contributions, future work, and conclusions in Chapter 9.

Chapter 2  Motivation

When we begin building a software system, we strive to design software modules¹ that are simple and conceptually clean. When we finish building a version of the system, a different story has typically unfolded. An original vision of independent and cohesive modules that interact through a narrow interface is too often replaced with a reality in which there exists a larger than desired set of interactions between modules. Such additional interactions couple our modules, preventing them from being easily reused in new systems, and preventing our systems from being easily evolved.

Obviously, modules must communicate to provide system behaviour. Communication leads to interaction between modules. Sometimes we can point to interdependences between modules and say outright that these do not belong: the modularization of the system is poor, and as a result, our modules are strongly coupled or weakly cohesive. There are other times when we cannot make such a strong statement, when certain dependences are necessary. The problem resides in the fact that, at such times, a module ends up interacting with other modules for reasons not directly related to providing its core behaviour. At these times, it is not the mere existence of dependences that is negative, but their placement relative to the rest of the structure.
To investigate the sources of these troubles, I examine a motivational example in Section 2.1, where I attempt to reuse a feature, called the Outline View, that exists in an open source integrated development environment, called Eclipse. The Outline View displays a tree-like view on the major declarations in a source file. This Outline View is highly dependent upon parts of the development environment that I am not interested in reusing. As evolution is an extreme form of reuse in which we are reusing most of one system in another (version), this example serves as a general motivation for the difficulties encountered in both software reuse and software evolution.

I discuss the two most common approaches to reuse in this chapter. First, I examine the simple reuse technique of copy-and-modify in Section 2.2 and find it lacking for my purposes. Second, I consider the more sophisticated approach of refactoring in Section 2.3; while I find it, too, to be lacking, it brings us closer to the concept of essential structure described in Chapter 3. Less common approaches using special mechanisms are considered in Chapter 8.

¹ I use the term module in a generic sense to mean any method, class, package, procedure, unit, component, etc.

2.1  Reusing the Outline View of Eclipse

Figure 2.1 shows a screenshot of an integrated development environment (IDE) called Eclipse.² Eclipse is written in the Java programming language [Gosling et al., 2000], combining a number of standard programming tools such as a compiler, debugger, and text editor.³ It also supports an Outline View, shown on the right of the screenshot.
The Outline View is an abstract, hierarchical view of the major programmatic declarations within the Java source file currently displayed in the text editor. In the example screenshot, a simple class called Test is being edited, and the Outline View shows the outline of Test as a tree: the root of the tree represents Test itself, there are nodes for methods within Test (i.e., foo(), foo2(), and blah(int)), and so on. Each node in the outline tree has an icon representing a combination of its kind (i.e., class vs. field or method) and its modifiers for visibility, synchronization, etc. At the top of the Outline View window are four toggle buttons that can be used to modify the view of the programmatic declarations. When each toggle is depressed, respectively, the declarations are sorted lexically, fields are hidden (or "filtered"), static members are hidden, and nonpublic members are hidden.

The text editor and Outline View operate in unison: as the source file is modified, the Outline View is dynamically updated. When a declaration is selected in the Outline View, the text editor jumps to the corresponding location within the source text. For example, in the screenshot, the declaration for the inner class Inner within Test has been selected, and the text editor has moved to the corresponding location. Also available within the Outline View is a pop-up menu (not shown) that allows various tasks to be performed in a context-specific fashion on the selected declarations. For example, a set of declarations can be deleted from the source or various refactoring operations [Opdyke, 1992] can be performed, such as renaming a single declaration (all references to that declaration are altered correspondingly).

The Outline View thus possesses a non-trivial set of functionality.
This functionality would be useful for a number of programming and analysis tools. For example, the Feature Extraction and Analysis Tool [Robillard and Murphy, 2002] makes use of a similar graphical user interface (GUI) to select sets of programmatic declarations and perform extraction and analysis tasks on these. The tool that is described in Chapter 5, for the application of implicit context, itself needs GUI support to make its use easier; the Outline View would be a good basis to build this support upon (see Figure 2.2).

² Eclipse was originally developed by Object Technology International, Inc.; it has since been released to the open-source community and may be downloaded from http://www.eclipse.org/.

³ Reiss [1996] provides a survey of software tools and development environments.

Figure 2.1: Screenshot of the Eclipse integrated development environment, operating in its Java Perspective mode. The text editor is in the central sub-window, and the Outline View is in the sub-window on the right.

Eclipse itself is large, consisting of approximately 800,000 lines of source code.⁴
Most of its functionality is not of interest to us in the context of these other tools. Were I to simply reuse the whole of Eclipse, the additional 800,000 lines of source code would be an unacceptable cost: there would be a cost in memory overhead, system maintenance, dynamic performance, and system comprehensibility. In short, our tools would become unnecessarily complex and thus fragile, even were we able to somehow hide the unwanted functionality, which is not a given. Therefore, I desire to "pull out" and reuse the Outline View, leaving behind the remainder of Eclipse.

⁴ The line count was calculated using the UNIX wc tool, and thus includes blank lines and comment-only lines.

Figure 2.2: The Outline View would make a good basis upon which to build a GUI for various tools, including the one described in Chapter 5.

Unfortunately, the Outline View was not designed to be reusable outside the context of Eclipse. To see the difficulties this causes us, let us consider the design of Eclipse. JavaOutlinePage is a class within Eclipse that forms a key part of the implementation of the Outline View. Along with its inner classes, an instance of JavaOutlinePage is responsible for maintaining the tree data structure for the outline of a particular source class, for displaying this tree, and for performing the various filtering operations available; multiple instances of JavaOutlinePage may exist at a time to permit quick switching between different source files in the Eclipse editor. Since JavaOutlinePage and its inner classes are so central to the implementation of the Outline View and I wish to reuse the Outline View, I need to reuse these classes. Figure 2.3 shows all of the classes upon which JavaOutlinePage and its inner classes (shown in boldface, and boxed) depend.
These are simple dependences, consisting of explicitly named classes within the source text of JavaOutlinePage, any superclasses these possess, and any interfaces they implement; this diagram does not include less obvious, indirect dependences, should they exist. Nevertheless, even at this level, JavaOutlinePage possesses many dependences upon classes (e.g., JavaPluginImages, JavaEditorMessages) that I am not interested in reusing when I pull out the Outline View.

Figure 2.3: The classes upon which JavaOutlinePage and its inner classes depend. Methods and fields are not shown. The inheritance relationships of these classes are shown explicitly. The other, individual dependency and aggregation relationships are not shown. JavaOutlinePage and its inner classes are shown in boldface and are boxed. The three superclasses that fall outside Eclipse (in the standard Java libraries) are shown in parentheses.

Eclipse is well-designed. It provides a type hierarchy with increasing abstraction towards its base. This permits clients to refer to the least concrete class that each needs; as a result, more concrete classes can be created to extend the system without the need to modify most clients. This is good object-oriented design [Booch, 1986; Rumbaugh et al., 1991; Zweben et al., 1995]. In many cases, Eclipse also provides a hierarchy of interfaces⁵ in parallel to its hierarchy of classes; this makes it harder for developers to ignore the conceptual coherence of their classes, since one cannot so easily overlook the abstract interface because of implementation details.
Eclipse also provides well-integrated support for design patterns such as Observer [Gamma et al., 1994]: each class is designed taking into consideration what events it might provide that could be of interest to other parts of the system, and these events themselves are structured in a hierarchy of abstractions.

Though it is well-designed, Eclipse remains complex, with many interdependences between its classes. There are two traditional approaches to deal with such problematic dependences in reusing the Outline View:

1. Copy the classes implementing the Outline View, and heavily modify them either to create a generic, reusable widget, or to customize them on a case-by-case basis. I describe this in Section 2.2.

2. Refactor Eclipse to separate the problematic dependences. Section 2.3 looks at this approach.

2.2  Copy-and-Modify

A typical approach to source code reuse today is to copy the source code of interest and modify it to suit the new context in which it must exist [Krueger, 1992; Rosson and Carroll, 1996]. This approach tends to greatly increase the amount of source code: one extra copy for each new context. Each of these must then be maintained independently, which is troublesome since the bigger the system, the more work there is to debug it and modify it. Software reuse is intended to minimize duplication of effort, and copy-and-modify is a poor substitute for it [Krueger, 1992; Burd and Munro, 1997].

In a variant on this, one might be tempted to alter the reused code to structure it "correctly." This assumes that this code was not structured correctly, in some sense, in the first place. There are two possibilities: either the code really was poorly structured initially, or what is desired is to create a version of the code explicitly supporting reuse, one that can then be tailored to specific situations. As I have argued above, Eclipse is well-designed, so the first possibility does not hold.
The second possibility is equivalent to refactoring, and is examined in the following section.

⁵ In Java, an interface is a special class that defines abstract methods, but no method implementations. As a result, interfaces are not instantiable. However, multiple inheritance can be simulated through the use of interfaces, without as many concerns over ambiguities as to the recipient of an overridden method call.

2.3  Refactor

Krueger [1992: p. 133] identifies abstraction as "the essential feature in any reuse technique" and others agree that explicitly designing for reuse makes reuse easier [Johnson and Foote, 1988; Castano and De Antonellis, 1993; Parnas, 1994; Griss et al., 1995; Wasmund, 1995; Jacobson et al., 1997]. Therefore, the question is whether we can refactor [Opdyke, 1992] Eclipse to have the Outline View explicitly support reuse by improving its abstraction. A sufficient abstraction must permit the non-reusable, context-specific dependences within the Outline View to be separated from the parts I wish to reuse.

As an example, Section 2.3.1 describes the dependences arising on a particular method in the Outline View (specifically, in JavaOutlinePage) called makeContributions. This method is declared as an abstract method within Page, the superclass of JavaOutlinePage. The makeContributions method is called polymorphically by Eclipse upon the instance of Page that is about to be displayed in the Outline View. This call gives the Page an opportunity to add specialized functionality to the standard menu and status line of Eclipse, and to the standard tool bar of the Outline View. Such an interface only makes sense in a context where a standard menu, status line, and tool bar exist. Outside Eclipse, the standard menu and status line will not necessarily exist.
Therefore, assuming that such a menu and status line exist ties the Outline View to Eclipse, preventing it from being reused in other contexts.

We can consider two variants of refactoring to attempt to eliminate the unwanted dependences present in this method: coarse-level refactoring, where the system decomposition is radically altered, and fine-level refactoring, where the basic class hierarchy is left as-is.

At the coarse level, Eclipse is currently well-structured. Classes in Eclipse are conceptually coherent, with small sets of methods. Coarse refactoring, such as changing the class hierarchy or moving methods between classes, is unlikely both to maintain cohesion and to separate out the problematic dependences. Therefore, such coarse-level refactoring to reduce the dependences present in the Outline View would be at the cost of making other portions of Eclipse even harder to reuse or to evolve. Each new situation where we needed to reuse a different portion of Eclipse would require yet another convoluted refactoring. Thus, I abandon hope for the coarse-level refactoring approach.

At the fine level, perhaps individual methods could be chopped up into smaller methods within the same class, separating out the unwanted dependences. Ideally, this would leave behind a highly-reusable core structure, while the context-dependent details could be manipulated in new contexts without the need to modify this core. I examine this approach further in Section 2.3.2, but find that the ideal is not met: the unwanted dependences cannot be so easily removed from makeContributions. As an alternative, the core structure could be placed in a reusable, abstract class while each set of context-dependent details would be placed in a subclass of the abstract class. This alternative is discussed in Section 2.3.3. While it, too, is insufficient, it contains the seed of an approach we can use.
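The general shape of that last alternative, a reusable abstract class holding the core structure with context-dependent details pushed into subclasses, can be sketched in a few lines of Java. This is only an illustration of the idea, not the design examined in Section 2.3.3 and not Eclipse code; every class and method name below is hypothetical, and strings stand in for real user-interface contributions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reusable core: it knows that contributions are made,
// but not where they end up.
abstract class OutlineCore {
    // Core structure, written once and shared by every context.
    final List<String> makeContributions() {
        List<String> contributions = new ArrayList<>();
        contributeSortingControl(contributions);
        contributeFilterControls(contributions);
        return contributions;
    }

    // Context-dependent details, supplied by each subclass.
    abstract void contributeSortingControl(List<String> sink);

    abstract void contributeFilterControls(List<String> sink);
}

// Hypothetical context that has a tool bar, standing in for an IDE-like host.
class ToolBarOutline extends OutlineCore {
    @Override
    void contributeSortingControl(List<String> sink) {
        sink.add("toolbar: sort");
    }

    @Override
    void contributeFilterControls(List<String> sink) {
        sink.add("toolbar: hide fields");
    }
}

// Hypothetical context with no tool bar at all: the hooks do nothing.
class HeadlessOutline extends OutlineCore {
    @Override
    void contributeSortingControl(List<String> sink) { }

    @Override
    void contributeFilterControls(List<String> sink) { }
}
```

Calling new ToolBarOutline().makeContributions() yields the two tool bar entries, while new HeadlessOutline().makeContributions() yields an empty list. One limitation visible even in this small sketch is that the abstract class must fix its set of hooks in advance, so a context needing a different kind of detail would still force a change to the core.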
2.3.1  Dependences in makeContributions

Below is part of the source for a method on JavaOutlinePage called makeContributions (see footnote 6). I begin by looking at its existing functionality, and identify its dependences upon the rest of the system. This method is called by other parts of Eclipse when a JavaOutlinePage instance is made visible; it allows that instance the opportunity to add functionality to the standard menu bar, tool bar, and status line that are provided to all graphical views in Eclipse.

[Footnote 6] Throughout the thesis, there are lengthy examples in which explanatory text is interspersed with source text. These are introduced with a comment of // Begin running example and terminated with a comment of // End running example. The original source text of makeContributions is presented in Appendix C without the interspersed commentary.

    // Begin running example 2.1
    public void makeContributions(IMenuManager menuManager,
                                  IToolBarManager toolBarManager,
                                  IStatusLineManager statusLineManager) {

The method begins by checking that an actual instance of IStatusLineManager has been passed and, if so, decorates it with an instance of StatusBarUpdater; StatusBarUpdater (potentially) writes messages on the status line when the user changes the set of declarations, on display in the Outline View, that are selected.

        if (statusLineManager != null) {
            StatusBarUpdater updater =
                new StatusBarUpdater(statusLineManager);
            addSelectionChangedListener(updater);
        }

Next, an instance of LexicalSortingAction (an inner class of JavaOutlinePage, and so, already customized for its purposes) is created and added to the tool bar; an action is essentially a callback function. The tool bar manager adds a new button to the tool bar according to the properties of the added action; when the user presses the button, the run method of the registered action is invoked. In the case of LexicalSortingAction, the action taken is to sort (or unsort) the declarations displayed in the Outline View.

        Action action = new LexicalSortingAction();
        toolBarManager.add(action);

Similar calls are made to add actions for the other three buttons on the tool bar of the Outline View. Shown below are those that create the button that hides or reveals (i.e., filters) field declarations; the calls for setting up the other buttons are similar and, thus, are not shown. FilterAction is also an inner class of JavaOutlinePage, but the actual kind of filter (e.g., for fields) and the images, descriptions, etc. appropriate thereto parameterize it, unlike in the case of LexicalSortingAction. As a result, some detailed parameters are exposed within this section of code.

        action = new FilterAction(new FieldFilter(),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.label"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.description.checked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.description.unchecked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.tooltip.checked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.tooltip.unchecked"),
            "HideFields.isChecked");
        JavaPluginImages.setImageDescriptors(action, "lcl16", "fields_co.gif");
        toolBarManager.add(action);
    // End running example 2.1

The dependences (good and bad) that occur in this code include the following pieces of knowledge.

• Specific classes external to JavaOutlinePage (i.e., Action, IMenuManager, IStatusLineManager, IToolBarManager, JavaEditorMessages, JavaPluginImages, and StatusBarUpdater) exist.
• FilterAction, FieldFilter, and LexicalSortingAction exist, but these are inner classes within JavaOutlinePage, and so, do not trouble our task of reusing JavaOutlinePage.
• StatusBarUpdater can deal with events concerning a change in the items selected in the Outline View.
• StatusBarUpdater requires a parameter of type IStatusLineManager in its constructor, which apparently cannot be null.
• LexicalSortingAction and FilterAction are subclasses of Action.
• Actions are individually passed to IToolBarManager and not, say, buttons, and the add(Action) method is to be used for this purpose.
• FilterAction requires a particular, ordered set of parameters in its constructor, where the first is a (supertype of) FieldFilter and the rest are strings, each with a particular purpose.
• The strings appropriate as parameters to the constructor of FilterAction are returned by static calls on JavaEditorMessages, and these messages are obtainable by passing other, particular strings to it.
• The file location of the image to be associated with an action (which is drawn on its button) can be set through a cryptic-looking call on a class called JavaPluginImages.

Now, I examine how refactoring can or cannot help us eliminate the unwanted dependences in makeContributions.

2.3.2  Refactoring without Changing the Class Hierarchy

In reusing the Outline View, I choose a set of classes to pull out of Eclipse on the basis of their support for the functionality of interest. There will be no status line nor menu bar in my separated Outline View, so IMenuManager, IStatusLineManager, and StatusBarUpdater are not of interest.
Likewise, I am not interested in classes supporting a Java editor and there will be no concept of a Java plugin external to Eclipse; thus, JavaEditorMessages and JavaPluginImages are not of interest either. To reuse the Outline View independent of the rest of Eclipse, I must eliminate mention of these classes from makeContributions.

First, I simplify the creation of the field filtering action by making an additional inner class within JavaOutlinePage, called FieldFilterAction, inheriting from FilterAction. Two additional inner classes are needed for the static filter action and public visibility filter action (not shown in the code above). The replacement statement is similar to the statement instantiating LexicalSortingAction, and has the positive effect of making clearer the tasks performed by makeContributions.

The downside of such a refactoring is that JavaOutlinePage as a whole still makes mention of JavaEditorMessages and JavaPluginImages. Also, it is not possible to move any inner classes outside of JavaOutlinePage without exposing private state information to the system at large, which would lead to structural degradation. But let us ignore this for the moment, and look at the next set of dependences.

Mention of StatusBarUpdater can be eliminated from makeContributions by replacing the lines mentioning it by a call to a hook, say

    if (statusLineManager != null) {
        contributeToStatusLine(statusLineManager);
    }

and implementing contributeToStatusLine to make mention of StatusBarUpdater. But again, JavaOutlinePage as a whole still mentions StatusBarUpdater.
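The hook extraction just described can be sketched in a self-contained form. This is only an illustrative sketch: StatusLine, SelectionListener, StatusUpdater, and OutlinePageSketch are hypothetical stand-ins introduced here so that the fragment compiles on its own; they are not the Eclipse classes, and their methods are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IStatusLineManager (not the Eclipse interface).
interface StatusLine {
    void setMessage(String message);
}

// Hypothetical stand-in for a selection-changed listener.
interface SelectionListener {
    void selectionChanged(String selection);
}

// Hypothetical stand-in for StatusBarUpdater: echoes the selection on the status line.
class StatusUpdater implements SelectionListener {
    private final StatusLine statusLine;

    StatusUpdater(StatusLine statusLine) {
        this.statusLine = statusLine;
    }

    @Override
    public void selectionChanged(String selection) {
        statusLine.setMessage("Selected: " + selection);
    }
}

class OutlinePageSketch {
    private final List<SelectionListener> listeners = new ArrayList<>();

    // After the refactoring, this method no longer names StatusUpdater.
    void makeContributions(StatusLine statusLine) {
        if (statusLine != null) {
            contributeToStatusLine(statusLine);
        }
    }

    // The hook still names StatusUpdater, so the class as a whole
    // keeps the dependence; it has only been moved.
    protected void contributeToStatusLine(StatusLine statusLine) {
        listeners.add(new StatusUpdater(statusLine));
    }

    // Simulates the user changing the selection in the view.
    void select(String selection) {
        for (SelectionListener listener : listeners) {
            listener.selectionChanged(selection);
        }
    }

    int listenerCount() {
        return listeners.size();
    }
}
```

In this sketch the dependence on the updater class has moved from makeContributions into the hook, which mirrors the observation above: the method is cleaner, but the enclosing class is no more reusable than before.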
Dealing with IMenuManager and IStatusLineManager is more difficult, as the signature of makeContributions forms part of the published application programming interface (API) of Eclipse: I cannot simply change it on a whim, as client developers depend upon it. I can, however, reimplement it as follows:

    public void makeContributions(IMenuManager menuManager,
                                  IToolBarManager toolBarManager,
                                  IStatusLineManager statusLineManager) {
        contributeToStatusLine(statusLineManager);
        contributeToToolBar(toolBarManager);
    }

where the implementation of contributeToToolBar ends up consisting solely of the construction and addition to the tool bar of the four actions provided by the Outline View, as described earlier.

    private void contributeToToolBar(IToolBarManager toolBarManager) {
        Action action = new LexicalSortingAction();
        toolBarManager.add(action);
        action = new FieldFilterAction();
        toolBarManager.add(action);
        // the static and public visibility filter actions are added similarly
    }

Now, contributeToToolBar is reusable in my new context, although large swathes of the remainder of JavaOutlinePage are not. I do not want merely to copy out the reusable portions of JavaOutlinePage, for the same reasons I rejected the copy-and-modify approach discussed in Section 2.2. Let us now consider how we might eliminate the dependences altogether from JavaOutlinePage without destroying the structure of Eclipse in the large.

2.3.3  Refactoring through an Abstract Class

As an alternative, we can modify the approach described above to split JavaOutlinePage into two classes: an abstract class containing the reusable elements, and a concrete subclass of this containing the Outline View dependences specific to Eclipse. I implement the abstract class much as I refactored JavaOutlinePage above.
    // Begin running example 2.2
    public abstract class AbstractJavaOutlinePage extends ... {

        public void makeContributions(IMenuManager menuManager,
                                      IToolBarManager toolBarManager,
                                      IStatusLineManager statusLineManager) {
            contributeToStatusLine(statusLineManager);
            contributeToToolBar(toolBarManager);
        }

This has the advantage that the default implementation of contributeToStatusLine is empty, thereby removing mention of StatusBarUpdater from the abstract class.

        protected void contributeToStatusLine(IStatusLineManager manager) {
            // subclasses to override this as needed
        }

Also, rather than require the implementation of a new FieldFilterAction class, I invoke an abstract method createFieldFilterAction that is to be implemented by my concrete subclass.

        protected void contributeToToolBar(IToolBarManager toolBarManager) {
            Action action = new LexicalSortingAction();
            toolBarManager.add(action);
            action = createFieldFilterAction();
            toolBarManager.add(action);
        }

        protected abstract Action createFieldFilterAction();
    }
    // End running example 2.2

I have thus created so-called hooks in my abstract class that concrete classes can then implement. The context-dependent details are implemented in a context-specific concrete class through the use of these hooks. Since this concrete class is intended to be specific to Eclipse and not reusable outside it, these details are no longer problematic from the perspective of reusing the Outline View.

    public class EclipseJavaOutlinePage
            extends AbstractJavaOutlinePage {

        protected void contributeToStatusLine(IStatusLineManager manager) {
            if (manager != null) {
                StatusBarUpdater updater =
                    new StatusBarUpdater(manager);
                addSelectionChangedListener(updater);
            }
        }

        protected Action createFieldFilterAction() {
            Action action =
                new FilterAction(new FieldFilter(),
                    JavaEditorMessages.getString(
                        "JavaOutlinePage.HideFields.label"),
                    JavaEditorMessages.getString(
                        "JavaOutlinePage.HideFields.description.checked"),
                    JavaEditorMessages.getString(
                        "JavaOutlinePage.HideFields.description.unchecked"),
                    JavaEditorMessages.getString(
                        "JavaOutlinePage.HideFields.tooltip.checked"),
                    JavaEditorMessages.getString(
                        "JavaOutlinePage.HideFields.tooltip.unchecked"),
                    "HideFields.isChecked");
            JavaPluginImages.setImageDescriptors(action, "lcl16", "fields_co.gif");
            return action;
        }
    }

Problems remain with this solution. AbstractJavaOutlinePage still explicitly mentions IMenuManager, IToolBarManager, and IStatusLineManager, even though in a given context each of these might not be reused. For example, the reused Outline View in Figure 2.2 does not possess a status line, and so, IStatusLineManager has no use there. However, since these are formal parameters required by the Eclipse API and the implementation of makeContributions is intended to override that of its superclass, mention of these classes cannot be removed by this approach.

More serious problems arise when one considers additional cases of reuse. It might be necessary to pull out other dependences, requiring a different abstract base class. This would necessitate one of two changes.
The first option requires the implementations of my AbstractJavaOutlinePage, EclipseJavaOutlinePage, and any other subclasses of AbstractJavaOutlinePage to change, increasing the abstraction of AbstractJavaOutlinePage and transferring the concrete details to the subclasses. If we lack control over the source code in the original system, we would need to create and maintain a separate version of this source; this has the same drawbacks as copy-and-modify, described in Section 2.2. Furthermore, such changes are dangerous, potentially introducing errors every time we invasively modify the system.

The second option would introduce an additional abstract class as the superclass of AbstractJavaOutlinePage. This abstract superclass would have even fewer concrete details; most of the body of AbstractJavaOutlinePage would move to the new abstract superclass, with only a few concrete implementation details remaining. This solution leads to structural degradation, as a hierarchy of abstract classes comes into being, most of which are little different from their superclasses; such a hierarchy would be difficult to maintain, as the conceptual cohesion of each class would be weak, amounting to mere dribs and drabs of concrete details.

Many of the other dependences listed in Section 2.3.1 (page 20 of the thesis) remain after my attempts at refactoring. But there are two points to consider: not all of these dependences are problematic; and, not all knowledge can be eliminated if an algorithm fulfilling the role of makeContributions is to remain.

There is the seed of an approach here. What I have approximated in refactoring makeContributions is its essence; I have stated in an abstract fashion the core of what this method is about, and allowed the details to be filled in externally.
To reduce makeContributions much further would be to render it devoid of content; there would be no value in reusing it then. Thus, there is a hope that there exists a minimal, essential structure to makeContributions, a structure that possesses some quality of context-independence and, therefore, high reusability.

2.4  Summary

Software reuse and software evolution are problematic because of the presence of dependences between the modules of our systems. I have examined a particular example, the reuse of the Outline View from the Eclipse integrated development environment, to investigate the nature of these dependences. I have attempted two simple, traditional approaches to removing these dependences: copy-and-modify and refactoring.

Copy-and-modify is found lacking due to the resulting need to maintain two parallel but closely related versions of the same code base, which would significantly reduce the benefits of reuse.

Refactoring also possesses general drawbacks. It requires modifying interfaces upon which client code depends; this is generally not possible since we rarely have control over such client code. It requires that all combinations of concrete details within our classes be separated into a complex assortment of abstract superclasses; such structural degradation is anathema to software evolution and comprehensibility.

Although more sophisticated techniques exist (and are examined in Chapter 8), none provides sufficient means to cope with excess contextual dependences. Refactoring has motivated the idea of essential structure: the minimal implementation, maximally abstract, that actually provides the functionality that forms the core of a given module. I examine this concept in more detail in the following chapter.

Chapter 3

Essential Structure

Software reuse and software evolution are made more difficult when our modules have assumptions built into them about the context in which they operate.
In Chapter 2, we have seen how JavaOutlinePage expresses knowledge of the context in which it exists, namely Eclipse, by explicitly referring to the JavaEditorMessages and JavaPluginImages classes. The latter classes have no meaning outside of the particular context of Eclipse, which makes JavaOutlinePage difficult to reuse.

Our refactored version of the makeContributions method in JavaOutlinePage attempted to do away with these contextual dependences by abstracting out some of these details. But arbitrary abstraction is insufficient. For example, when a class participates in the Abstract Factory design pattern [Gamma et al., 1994] as a client, it must be aware of this participation; the abstract factory class must be explicitly named even though only the product classes managed by the factory are of interest to the client. As a result, the client class cannot be reused in a context where Abstract Factory is inappropriate: even in an attempt at flexibility, such built-in assumptions make software brittle.

This is not to say that there is no form of abstraction that suffices. To reuse a given module, it must represent some form of functionality that is of interest to us, or there is no point in reusing it; such functionality lies at the core of the purpose of that module. Abstracting away whatever this particular functionality is would remove the core of the purpose of the module, an operation that makes no sense. This core may require some dependences upon the system outside the module. However, any details of the external context beyond this core functionality are constraints that will impede the reusability of the module. An overconstrained module is more likely to require change as the system evolves.
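The Abstract Factory case mentioned above can be made concrete with a small sketch. All names here (Widget, WidgetFactory, PlainWidget, PlainWidgetFactory, FactoryClient) are hypothetical and chosen only to label the roles of the pattern; nothing below is taken from Eclipse or from the thesis.

```java
// AbstractProduct role (hypothetical name).
interface Widget {
    String render();
}

// AbstractFactory role (hypothetical name).
interface WidgetFactory {
    Widget createWidget();
}

// ConcreteProduct role.
class PlainWidget implements Widget {
    @Override
    public String render() {
        return "plain widget";
    }
}

// ConcreteFactory role.
class PlainWidgetFactory implements WidgetFactory {
    @Override
    public Widget createWidget() {
        return new PlainWidget();
    }
}

// Client role: its purpose involves only Widget, yet it must name
// WidgetFactory in its constructor and in its body to obtain one.
class FactoryClient {
    private final WidgetFactory factory;

    FactoryClient(WidgetFactory factory) {
        this.factory = factory;
    }

    String show() {
        Widget widget = factory.createWidget();
        return "[" + widget.render() + "]";
    }
}
```

In this sketch, show() needs nothing beyond a Widget, but FactoryClient cannot be compiled or reused in a system that has no WidgetFactory. The reference to the factory type is a built-in assumption about the surrounding context of the kind discussed above.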
These extra constraints are knowledge of the external context of a module that is extraneous from the perspective of the core purpose; I thus refer to such knowledge as extraneous embedded knowledge (EEK). Abstracting the implementation down to its inner core, which I term the essential structure, will maximize the number of contexts in which a module can be used, and will eliminate the presence of EEK within it, thereby making that module less likely to change when the system around it changes.

More precisely, knowledge expressed within a given module is EEK relative to that module if that knowledge is not explicitly described by a minimal specification of the module's responsibilities. A module that contains no EEK expresses only its essential structure.

In Section 3.1, I discuss and define the concept of minimal specifications and, in Section 3.2, examine their use in defining EEK and identifying it in the makeContributions method example of Chapter 2. I then consider a useful categorization of some forms of EEK in Section 3.2.1. The general way in which each might be eliminated to realize essential structure is described in Section 3.3.

3.1  Minimal Specifications

A given module has a set of goals it must achieve. When a module must be implemented to meet a minimum set of constraints, those constraints are termed the requirements of the module. Requirements represent the set of constraints that are necessary for an implementation to meet. The process of implementation is one of elaboration of the requirements, to such an extent that a typical reaction to the fact that an implementation of a module contains many more details than its requirements is "Of course!" Thus, it is generally not sufficient for the implementation of a module to meet only the constraints imposed by its requirements.

To judge whether a module has met its goals, we need some idea of what those goals are, some description of them.
Such a description may be implicit, it may be explicit but informal, or it may be explicit and formal; regardless, a description is possible. A specification can be considered such a description, since the specification of a module "should tell the user everything he must know to use the [module], and it should tell the implementer everything he must know about the [module] to implement it" [Lamport, 1989]. A requirements specification is a specification that is limited to a description of the necessary constraints on a module; these may or may not be sufficient. A specification in general can contain more details than arise from the requirements specification.

The more constraints there are upon a module, the fewer implementations that can be created to meet those constraints; the more constraints that a module places upon its system, the fewer systems in which the module can be used. We wish to maximize the number of contexts in which a module can potentially be reused; then, if the system in which the module occurs needs to change, this minimizes the likelihood that the module itself will need to evolve. Therefore, we must reduce the constraints upon our modules to those that are both necessary and sufficient.

This is not a radical departure from current thought; the departure lies in the assumptions that are made about what is necessary and what is sufficient. I wish to describe the goals of our modules in as abstract a fashion as possible, and to implement our modules to express this description without the addition of details. Then, we will have implementations that possess only the necessary and sufficient constraints upon the behaviour of modules and the behaviour of the system containing these modules. Let us look at an example.

An Example of the Problem

Consider one possible, informal specification of the makeContributions method of JavaOutlinePage, judging by its implementation.
It might read something like the following.

• makeContributions takes three parameters, of types IMenuManager, IToolBarManager, and IStatusLineManager, and has a void result type.
• makeContributions checks whether the argument passed in the IStatusLineManager parameter is null; if and only if the parameter is not null, it creates a new instance of StatusBarUpdater and adds this instance to JavaOutlinePage's set of selection changed listeners.
• makeContributions creates a new instance of LexicalSortingAction and adds this to the argument passed in the IToolBarManager parameter.

The rest of the specification would continue in this fashion. This specification is very detailed and concrete, to such an extent that many would object to its use as a "specification" rather than an implementation. Even so, it makes no mention that the IStatusLineManager must be passed to the constructor called on StatusBarUpdater, nor that addSelectionChangedListener must be used to add the new instance of StatusBarUpdater to JavaOutlinePage's set of selection changed listeners, and so on.

The claim could be made that these details that have appeared in the implementation were necessary to meet the specification, or even that there was no other way to implement them. Such claims could be perfectly true under the constraints of the current system design and current programming mechanisms. But note that these details might well not be valid in this implementation should the system around makeContributions change.
While the same specification might hold for an implementation of makeContributions in a different system, or different version of the same system, these details would prevent this particular implementation from being usable in all such situations.

At first glance, the constraining nature of this detailed specification might seem a good thing. After all, it tells us how to implement makeContributions in a fairly precise way. One might then be tempted to over-specify makeContributions to ensure that all its details are necessary and sufficient. However, Wing [1990: p. 11] states:

"In practice, you must usually deal with incomplete specifications. Why? Specifiers may intentionally leave some things unspecified, giving the implementer some freedom to choose among different data structures and algorithms. Also, specifiers cannot realistically anticipate all possible scenarios in which a system will be run and thus, perhaps unwittingly, have left some things unspecified. Finally, specifiers develop specifications gradually and iteratively, perhaps in response to changing customer requirements, and hence work with unfinished products more often than finished ones."

We can observe three interdependent principles at work here.

1. Current programming techniques require details to be added to the implementation of a module, beyond those that are specified.
2. Modules tend to be under-specified to allow their implementations some flexibility.
3. Specifications tend to miss details because specifiers cannot predict all reuse or evolution scenarios in which a module must operate.

Therefore, feeling the need to create exacting specifications is misguided.
At some level, the concrete details that appear in the implementation of makeContributions for this particular version of Eclipse must be specified and implemented; after all, makeContributions needs to operate in that particular context. However, there is an inner core, an essential structure, that makes an implementation of makeContributions bona fide. If we could directly implement this essential structure, in a manner similar to the refactoring of Section 2.3.2, and layer on the concrete details, ideally the inner core would be highly reusable and the outer layers more easily changed.

Defining the Inner Core

To be able to define such an essential structure, then, requires us to describe the details that are necessary and sufficient for an implementation to possess. I call such a description a minimal specification; it is intrinsic to a module, whether or not it is made explicit.

Consider again the case of makeContributions. At its core, it must add some functionality to allow the status line to reflect selection changes in the Outline View, and to add four actions to the tool bar, one for sorting declarations lexically, one for hiding fields, etc. Thus, a minimal specification for makeContributions is (see footnote 1):

• makeContributions must add functionality to the status line, if and only if there is one, to reflect selection changes in the Outline View.
• It must add four actions to the tool bar, one for sorting declarations lexically, one for hiding fields, etc. (see footnote 2).

[Footnote 1] Since the given specification is informal, there is room for ambiguity in its interpretation; however, such worries are secondary in defining the necessary and sufficient constraints.

The difference between this and the earlier specification lies in the degree of abstraction.
In the latter specification, there are no references to concrete names, or specific protocols or algorithms, but merely to the concepts that are necessary to realize. It is a specification of sufficient detail since it gives all the abstract details that are significant for makeContributions. Attempting to reduce the specification beyond this point would be to alter the nature of makeContributions, and therefore, it is a specification of necessary detail. This latter specification is therefore minimal.

Definition 1 (Minimal specification) A minimal specification is a specification in which every detail present is necessary and sufficient to meet the goals of its module, in as abstract a fashion as possible.

This is not to say that there is no useful module that could be defined with a further reduced specification. In fact, we have seen in Section 2.3 how separating makeContributions into contributeToStatusLine and contributeToToolBar is useful. The minimal specification for contributeToStatusLine would be provided by reducing the minimal specification for makeContributions to the first item present there:

• contributeToStatusLine must add functionality to the status line, if there is one, to reflect selection changes in the Outline View.

Similarly, the minimal specification for contributeToToolBar would be provided by reducing the minimal specification for makeContributions to the second item present there. But neither of these single items by themselves is sufficient to specify makeContributions.

Now, I consider how implementations are overconstrained because they possess knowledge in addition to that described in their minimal specifications.
3.2  Extraneous Embedded Knowledge

In Chapter 2, we saw how dependences upon the external context coupled the parts of Eclipse that I wished to reuse, to the parts that I did not. The concept of a minimal specification allows us to identify those dependences that should not be present within a given module. Once these have been identified, we can hope to do something about them.

[Footnote 2, attached to the second item of the minimal specification in Section 3.1] Note that there is an implicit assumption here that the tool bar exists.

Consider the differences between the implementation of makeContributions (given in Section 2.3.1) and the minimal specification of makeContributions (given in Section 3.1). First, the minimal specification says nothing about a return value from makeContributions, so a result type of void seems appropriate. The minimal specification says nothing about menus, but does mention a tool bar and a status line; however, it does not explicitly mention the classes IMenuManager, IToolBarManager, nor IStatusLineManager. The minimal specification does not even indicate that the tool bar and status line are passed as arguments.

    // Begin running example 3.1
    public void makeContributions(IMenuManager menuManager,
                                  IToolBarManager toolBarManager,
                                  IStatusLineManager statusLineManager) {

The minimal specification indicates that the status line might not exist, but if it does, that functionality must be added to it to reflect selection changes in the Outline View. However, the minimal specification does not state that a null value indicates that the status line does not exist, nor does it state that creating an instance of StatusBarUpdater and adding it as a selection changed listener are necessary in exactly this form.
        if (statusLineManager != null) {
            StatusBarUpdater updater =
                new StatusBarUpdater(statusLineManager);
            addSelectionChangedListener(updater);
        }

And finally, while the minimal specification mentions the addition of four actions to the tool bar, it makes no mention of the concrete, detailed interfaces and protocols for doing so, such as the classes LexicalSortingAction, FilterAction, JavaEditorMessages, or JavaPluginImages, or any of the methods thereon.

        Action action = new LexicalSortingAction();
        toolBarManager.add(action);
        action = new FilterAction(new FieldFilter(),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.label"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.description.checked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.description.unchecked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.tooltip.checked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.tooltip.unchecked"),
            "HideFields.isChecked");
        JavaPluginImages.setImageDescriptors(action,
            "lcl16", "fields_co.gif");
        toolBarManager.add(action);
    }
    // End running example 3.1

As pointed out by Wing (see page 30), specifications are often incomplete, with details needing to be added in the implementation. The fact that such extra details appear in the implementation of makeContributions should come as little surprise.

However, let us consider an ideal situation. Imagine that we develop a new system, decomposing it into various modules, for each of which we provide a minimal specification. Assuming that such minimal specifications are expressed in (or can be translated to) some canonical form, we seek out extant modules matching all or part of the minimal specifications we need.
But here is the catch: even if we manage to locate such extant modules, there is no guarantee whatsoever that they will be reusable in our new system. They will likely possess extra dependences upon the larger context of the system; since these dependences are not present in the minimal specification, they are not likely to be satisfied in a different system. I thus consider such dependences, and the knowledge they imply, to be extraneous. This is not to suggest that it is not important knowledge within a larger piece of the system, or the system as a whole; merely, this knowledge does not belong in the local scope of the given module.

Definition 2 (Extraneous embedded knowledge) Any knowledge expressed by a module of its external context that does not explicitly appear in its minimal specification is extraneous embedded knowledge (EEK) with respect to that module.

Recall the example of the Abstract Factory design pattern from Chapter 1; EEK tends to be present when a class participates in this design pattern as a Client. The minimal specification of the class that acts as Client is itself unlikely to refer to the Abstract Factory design pattern; some larger module that contains this class is the likely source for the requirement that Abstract Factory be used. Therefore, knowledge of Abstract Factory is EEK with respect to Client. This EEK manifests itself through the explicit naming and use of the AbstractFactory class when the Client needs to instantiate an AbstractProduct class. Client would be just as effective at providing its core purpose were it possible to express it as directly instantiating AbstractProduct.

A more subtle version of EEK can be found in large-scale modules containing smaller-scale modules. Consider drawing toy robots represented by a Robot class. Robot possesses a draw method that delegates to a series of other methods for drawing the different parts of the robot: the torso, head, left arm, and so forth. Each of these methods, in turn, delegates to yet other methods for drawing smaller bits of the robot, primitive objects (such as rectangles), or both. In implementing these methods, we realize that we need a canvas on which to draw primitives. Because the robot may need to be drawn on different canvasses, a canvas object must be passed to the Robot class; therefore, a parameter is added to the draw method to accept such an object. Since we need to have this canvas at the points where it is required to draw primitives, a parameter is added to all the methods that draw parts of the robot and each method dutifully passes the canvas on. Only the methods that draw primitives actually do anything with the canvas; the others simply pass it to the methods to which each delegates. Knowledge of the canvas can thus be seen to be EEK from the perspective of these methods: a minimal specification of each of these methods could be created that ignored the existence of any canvas object. The canvas object is only present within these methods because a canvas object needs to be explicitly used to draw primitives, and this object must travel from the clients of Robot, through the hierarchy of delegations, down to the primitive drawing calls being made.

To re-emphasize, categorization of knowledge as EEK can only occur relative to a particular module. For example, the fact that knowledge of the Abstract Factory design pattern is EEK within Client does not imply that it is EEK within a larger, parent module containing Client. This parent may indeed be very concerned with the fact that the Abstract Factory design pattern is being used in favour of some other means of flexibility. Likewise, primitive drawing methods need to refer to a canvas for their operation to be meaningful. Therefore, a canvas object is not likely to be EEK within such a primitive drawing method even if canvas objects are not mentioned in the minimal specifications of clients of these primitives.

3.2.1  Varieties of EEK

Extraneous embedded knowledge can be characterized on the basis of the kind of dependence it creates. Such a characterization is not intended as a partitioning of the space of all EEK, but rather as a convenient nomenclature, and as a means of insight into whether all EEK is equally problematic.
Some of the categories into which EEK can be seen to fall include: extraneous identification, where specific types or methods are explicitly named; extraneous parameters, where particular ordered sequences of types are required of the arguments for a given method to be called; extraneous pre-condition assumptions, where the caller must adhere to certain constraints prior to calling a method; extraneous post-condition assumptions, where a called method is assumed to return a value meeting some, possibly implicit, semantics; and extraneous protocol adherence, where sequences of calls must be made in particular orders, or where certain semantic properties are otherwise assumed to hold sway. I examine each of these categories in turn.

Extraneous Identification

The simplest form of EEK consists of the dependences a client module forms on particular names and signatures of external modules. To specify a module, we often resort to using external modules to provide partial functionality. Generally, it is the conceptual functionality of these external modules we are interested in, and not the specific interfaces provided by them. For example, a method may require some library method for determining the current time, without caring what the exact name of this method is, or which library it is in, or even the multitude of possible parameters it accepts (Figure 3.1). But with current mechanisms, it is difficult to avoid such knowledge. If any of the external names or signatures change, our client module will break. What should be important to the client is not which module will be providing desired external functionality, but rather what functionality is needed. Therefore, I refer to false dependence on names or signatures as extraneous identification.
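
As a rough illustration of extraneous identification, the following Java sketch contrasts a client that names a specific library method with one that names only the concept it needs (a source of the current time). The class names HardWiredStamp and ConceptualStamp are hypothetical and do not come from this dissertation. Conventional indirection of this kind only relocates the named dependence into an interface type, so the sketch approximates the concern rather than showing the mechanism proposed in later chapters.

```java
import java.util.function.LongSupplier;

// Hypothetical names throughout; a sketch, not the thesis's mechanism.
class ExtraneousIdentificationDemo {
    public static void main(String[] args) {
        // The binding to a concrete time source is made here, outside the client.
        ConceptualStamp stamp = new ConceptualStamp(System::currentTimeMillis);
        System.out.println(stamp.label("saved"));
    }
}

// Names System.currentTimeMillis directly: the name and signature of the
// library method are knowledge of the external context embedded in the client.
class HardWiredStamp {
    String label(String event) {
        return event + "@" + System.currentTimeMillis();
    }
}

// Depends only on the concept "something that yields the current time".
class ConceptualStamp {
    private final LongSupplier currentTime;

    ConceptualStamp(LongSupplier currentTime) {
        this.currentTime = currentTime;
    }

    String label(String event) {
        return event + "@" + currentTime.getAsLong();
    }
}
```

Renaming or replacing the time source affects only the line in main that performs the binding; ConceptualStamp itself is untouched, whereas HardWiredStamp would have to be edited.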
For example, say we had a client that referred to the Vector class of the Java standard class libraries, but at some point Vector is renamed DynamicArray, thereby breaking our client. Our client cared about the Vector functionality, not the name; therefore, the specific name was EEK within the client. Similarly, in makeContributions, our minimal specification mentions concepts such as status line and tool bar but our implementation mentions IStatusLineManager and IToolBarManager. The concepts would remain the same even should the names of the concrete classes be otherwise.

Extraneous Parameters

An implementation of a module can possess parameters that its minimal specification does not mention. For example, consider three methods m1, m2 and m3. Method m1 calls m2 and m2 calls m3. Method m3 requires some particular information param that is known by or needs to be created by m1, but m2 does not need to know about param for its internal computations. Nevertheless, for param to pass from m1 to m3, it is typically passed via m2 (Figure 3.2). In most such situations, this parameter is not present due to the minimal specification of m2; therefore, param is an extraneous parameter within the signature of m2. The argument passed to m2 from m1 may or may not be EEK there, depending on the minimal specification of m1: if it recognizes that the service it desires of m2 requires this argument, then it is not EEK; otherwise, it is.

[Figure 3.1, panels: (a) Supposed dependence; (b) Actual dependence; (c) Ideal dependence]

Figure 3.1: The caller method calls the callee method to utilize its service. This implies that (a) caller depends upon the behaviour of callee (represented by the simple, dashed arrow); however, (b) caller must access this behaviour by making explicit reference to the name "callee," whether or not this name represents the behaviour of true interest, and thus this dependence is a case of extraneous identification. Ideally, (c) caller and callee each realizes only its minimal specification (represented by the dashed, rounded boxes); the minimal specification of caller depends upon the minimal specification of callee but not upon the abstract label "callee." Since the only dependence upon callee from caller is indirect via the minimal specification in the ideal situation, the concrete entity realizing this minimal specification may be replaced or renamed.

We have seen the presence of an extraneous parameter in the implementation of makeContributions on JavaOutlinePage. There, the minimal specification made no mention of any menu bar concept (let alone, the actual IMenuManager type). Such a formal parameter appeared for the sake of parallel structure: implementations of makeContributions on other classes may need to use the menu bar in some way; in order to utilize polymorphic dispatch, every sibling class of JavaOutlinePage must implement makeContributions with an identical interface, including the formal parameter of type IMenuManager. In this case, the argument passed in the formal parameter may well end up going nowhere at all, but is passed "just in case" it is needed.

Extraneous parameters are burdensome for evolution.
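
The m1, m2, m3 chain can be sketched in Java as follows. The sketch borrows the toy robot of Section 3.2: the hypothetical Canvas, drawRobot, drawTorso, and drawRectangle stand in for param, m1, m2, and m3 respectively. None of this code comes from the dissertation; it is only meant to show that the intermediate method never uses the canvas, yet must declare and forward it.

```java
import java.util.ArrayList;
import java.util.List;

class ExtraneousParameterDemo {
    public static void main(String[] args) {
        Canvas canvas = new Canvas();
        Robot.drawRobot(canvas);          // m1 is handed (or creates) param
        System.out.println(canvas.shapes);
    }
}

// Hypothetical stand-in for "param": records what was drawn on it.
class Canvas {
    final List<String> shapes = new ArrayList<>();
}

class Robot {
    // m1: knows about the canvas because its clients choose where to draw.
    static void drawRobot(Canvas canvas) {
        drawTorso(canvas);
    }

    // m2: performs no computation on the canvas; the parameter exists
    // only so that it can be forwarded to m3.
    static void drawTorso(Canvas canvas) {
        drawRectangle(canvas, "torso");
    }

    // m3: the only method whose own work requires the canvas.
    static void drawRectangle(Canvas canvas, String part) {
        canvas.shapes.add("rectangle:" + part);
    }
}
```

If drawRectangle later needed a further piece of data known only to the caller of drawRobot, the signatures of both drawRobot and drawTorso would have to change as well, even though neither uses that data.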
It often comes to pass [Lewis et al., 2000; Walker and Murphy, 2000] that, during the elaboration of a system, we require some additional data to be passed to a method deep in the control flow. When this data is only obtainable from much higher in the control flow, all the intervening methods must have an extraneous parameter added to them. Should the module requiring the extra data be replaced with one that does not, this extraneous parameter remains in all the intervening methods even though it serves no purpose. Add the constraints of polymorphism across multiple classes, and the situation gets even more tangled.

Figure 3.2: An extraneous parameter arises due to the mismatch between minimal specifications and implementations. The minimal specification m1 depends upon the minimal specification m2 that, in turn, depends upon the minimal specification m3*. While the modules m1 and m2 realize m1 and m2 respectively, m3 actually realizes the minimal specification m3, which is more detailed than m3* (though it must be similar enough to substitute). This extra information causes the introduction of an extraneous parameter (param) into m1 and m2.

Extraneous Pre- and Post-condition Assumptions

Consider a method client that calls another method, service, to perform some portion of its functionality. The minimal specification of client has a notion of the service provided by service, and therefore, what information should be passed to service that the service should act upon. However, the minimal specification of service may well have a different notion of what information is required. For example, service may require a random number to complete its functionality. The minimal specification of service requires that its clients generate this random number and pass it in.
Thus, for client to be able to call service, the pre-condition that client compute and pass a parameter, which is of significance only to service, must be satisfied. If the minimal specification of client fails to recognize the need for this random number, this pre-condition is extraneous with respect to client. The implementor of client likely saw the presence of service, thought that the pre-condition was not very onerous, and proceeded to use it accordingly. This is a problem: client will always be constrained to use a service requiring this pre-condition, even though it did not require it itself.

Similarly, we can consider the case where the return value of service does not directly satisfy the needs of client. Instead, a post-condition on service must be explicitly recognized by client: the implementation of client must include additional computations to convert the return value.

The expression of pre- and post-conditions within client is structurally unsound. The definition of such conditions depends on the implementation of and interface to service, instead of the interface to the abstract, external service that is described by the minimal specification of client. The introduction of dependences is similar to the situation with extraneous parameters (see Figure 3.2) save that an additional computation is added as a result of the dependence, rather than an additional parameter.

In the case of the Outline View of Eclipse, it could be argued that the supposedly minimal specification of makeContributions (given on page 30) really could be reduced further while remaining a specification of makeContributions. Namely, it could become:

• makeContributions must add functionality to the status line to reflect selection changes in the Outline View.

• It must add four actions to the tool bar, one for sorting declarations lexically, one for hiding fields, etc.

where there is no mention that the status line might not exist.³ In such a case, the presence within the implementation of the test to determine whether the status line is null is not only EEK from the point of view that it states too concretely what it means for the status line not to exist, but also because it is performing a test that it should not know about.

Footnote 3: In reality, this is a minimal specification for a subtly different module than the one discussed earlier, rather than an alternative minimal specification.

Extraneous Protocol Adherence

The information contained in a client module regarding a service module that it calls is not necessarily limited to the syntactic interface to the service. The implementor of the client had access to knowledge regarding the semantics of each parameter and return value of every method of the service (from the interface definition of the service and its accompanying documentation) at the time the client was implemented. Since such detailed semantics are unlikely to have been present within the minimal specification of the client module, their presence would be EEK within the implementation of the client. For example, Java version 1.1 contained no Set class or interface in its standard class library; to emulate such a class, Hashtable may have been used instead. If our client module were using Hashtables to provide the functionality of Sets, the implementor of the client module may have assumed other properties of particular Hashtables such as lookup efficiencies; or, the Hashtable could be required as the type of an input parameter, spreading the need for the use of the Hashtable to other modules.⁴
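
The Hashtable situation can be made concrete with a short Java sketch. The class names HashtableBackedTags and SetBackedTags are hypothetical, and the code uses modern generics for readability (Java 1.1 had none). The sketch only shows how the choice of Hashtable leaks into a method signature and so obliges callers to follow the same emulation.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

class ProtocolAdherenceDemo {
    public static void main(String[] args) {
        Hashtable<String, Boolean> emulated = new Hashtable<>();
        emulated.put("urgent", Boolean.TRUE);   // dummy value: only the key matters
        System.out.println(HashtableBackedTags.isTagged(emulated, "urgent"));

        Set<String> tags = new HashSet<>();
        tags.add("urgent");
        System.out.println(SetBackedTags.isTagged(tags, "urgent"));
    }
}

// A set emulated with Hashtable keys. Callers are obliged to build a
// Hashtable and to follow its key/value protocol.
class HashtableBackedTags {
    static boolean isTagged(Hashtable<String, Boolean> tags, String tag) {
        return tags.containsKey(tag);
    }
}

// Only the abstract Set functionality is named in the signature.
class SetBackedTags {
    static boolean isTagged(Set<String> tags, String tag) {
        return tags.contains(tag);
    }
}
```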
However, the minimal specification of the client only required the functionality of an abstract Set; the other semantic properties of Hashtables formed extraneous constraints upon the implementation of our client.

In the implementation of the makeContributions method of JavaOutlinePage, we have similarly seen cases of extraneous protocol adherence. The clearest case involved the creation of the field filter action: the protocol required that makeContributions possessed knowledge: of the key strings to look up message bodies; that the getString method of JavaEditorMessages be used to locate the actual labels, descriptions, etc.; and that the setImageDescriptors method of the JavaPluginImages class be used to locate the images. The minimal specification requires only that makeContributions create a field filter action; this specific protocol ties it tightly to Eclipse, and furthermore to a particular version of Eclipse.

Having examined some varieties of extraneous embedded knowledge, I consider how minimal specifications might more closely be realized through the concept of essential structure.

3.3  Reduction to Essential Structure: An Abstract View

With current programming techniques, we implement each of our modules both to realize its minimal specification and to satisfy the minimal specification of any other modules that we decide to call or otherwise reference within that implementation (consider Figure 3.2). Consider the effect of adding details of the context of operation into our implementation of a module when those details are not present within the minimal specification for the module (i.e., the effect of adding EEK). Such EEK couples the implementation of our module to one particular context, making it difficult to change that context without changing that implementation.
What we need is a means of implementing a module in such a way that it is not tied to a single context. We can implement our modules to be maximally reusable and maximally unlikely to need to change if their system evolves, if they contain only essential structure.⁵

Footnote 4: That the Set interface was added to version 1.2 of Java is indicative of the fact that even standard libraries evolve, so making assumptions about their behaviours and interfaces leads to evolutionary problems.

Definition 3 (Essential structure) An implementation of a module that expresses no extraneous embedded knowledge possesses only essential structure.

The question arises of how the implementation of a module can be reduced to its essential structure. Clearly, some details can be removed through refactoring, as we saw in Section 2.3. But some EEK would not seem to be amenable to such simple solutions: a given implementation will always explicitly mention some names and make some assumptions about how its external context is communicating with it. Such assumptions are EEK if not described in a minimal specification of the module. However, all is not lost. There are three key ideas that can still allow us to reduce our modules to their essential structure.

The first idea is to do away with the assumption that there is a one-to-one correspondence between a name that appears in an implementation and the actual name of an external module that gets accessed via that name. If we accept that identifiers within an implementation are simply abstract labels for external modules that meet some minimal specifications, the dependence that those labels actually be the names of those modules disappears. Permitting late binding of our identifiers and the like allows great flexibility: we can express only the essential structure of a given module, nothing more, nothing less. The essential structure of the entire system includes its concrete details, and so it must express them.
But below this level, only the details that are consistent with the essential structure of our modules should be expressed: we should only express identifiers in ways that are meaningful within the implementations of our modules themselves; we should only declare parameters and pass arguments that are meaningful within those implementations; we should express only whatever pre- and post-conditions and protocols that are consistent with the essential structure of our modules.

Footnote 5: Some might be tempted to think of an implementation that expresses only essential structure as being some sort of "pseudocode implementation." Unfortunately, the notion of what constitutes pseudocode varies too greatly to aid us here. While some uses of pseudocode parallel the goals of essential structure, other pseudocodes are effectively formal languages for which one could write a compiler (e.g., those of Lekkos and Peters [1978] or of Robillard [1986]). Programs written in the latter form consist of pseudocode only in the sense that the given language does not exist in reality.

The second idea is similar: rather than assuming that the concrete communication protocols expressed by our implementation must be exactly those that actually exist, let us consider our implementation to possess an abstract description of the communication with the external context. This abstract description is expressed in a manner that is convenient to the expression of the module without consideration of the larger, concrete system in which it resides. For example, a common communication style consists of method calls that pass arguments. The presence of a method call within an implementation is likely to be a straightforward, abstract description of communication with that external method. However, the communication that appears within a module to be a method call might actually be reified as a remote procedure call.
Thus, rather than insisting that we explicitly express only the essential structure of our modules, retroactive abstraction permits us to utilize "emergent" essential structure. By reifying our retroactively abstract implementations in the right way, we can obtain the same effect as if the module really had been implemented to express only its essential structure.

The third idea deals with the fact that the EEK that we avoid or remove from our modules almost certainly has to be expressed somewhere in our systems; it is important that this knowledge be expressed in the appropriate place, where it is not extraneous. To permit this external expression of details, we need to compose these details into our modules to form higher-level modules.

What we need is a model that allows us to express the essential structure of our modules, either directly or in an emergent fashion. This model must also allow those modules to be composed in such a way that the essential structure of ever larger-scale modules can still be expressed. I consider such a model in Chapter 4.

3.4  Summary

Every module in our systems, regardless of scale, must be specified, if only implicitly. The more detailed and concrete a specification, the fewer contexts in which the specification will be applicable; hence, an implementation of a detailed and concrete specification will tend to be less reusable and more likely to need to evolve. Therefore, we should strive to specify only the details that are both necessary and sufficient for our modules to meet their goals, and do so as abstractly as possible; that is, we should give their minimal specifications.

Extraneous embedded knowledge (EEK) arises in our implementations when we express knowledge of the context external to a module that is not explicitly given in its minimal specification. EEK is problematic because it places dependences in our modules upon specific contexts.
When those dependences are not enforced by a minimal specification, we have constrained our module so that it can only operate in a subset of the contexts where the minimal specification should otherwise satisfy the requirements. Such EEK includes:

• extraneous identification: extraneous assumptions about the names and interfaces of external modules;

• extraneous parameters: parameters that are not used at all, or that are not needed from a local perspective;

• extraneous pre- and post-condition assumptions: where our modules perform (or avoid) computations to meet pre- or post-conditions on the use of external interfaces because of extraneous assumptions about those interfaces; and

• extraneous protocol adherence: where our modules perform specific sequences of calls because of extraneous assumptions about the required protocols for external interfaces.

By eliminating such extraneous dependences, we can maximize the number of contexts in which a given module is reusable, and decrease the likelihood that a given module will need to evolve. But to eliminate all EEK requires that we treat the identifiers, method calls, etc. that appear in an implementation as abstract even when they may appear concrete. Considering these constructs to be abstract retroactively allows us to delay binding concrete programmatic elements to them as late as necessary prior to running our systems; in this fashion, we need not necessarily explicitly express only the essential structure of our modules, but such an essential structure can arise in an emergent fashion. To realize this level of abstraction requires a model for the expression of essential structure that allows us to compose our modules to express the essential structure of higher- and higher-level modules. I consider such a model, implicit context, in the following chapter.

Chapter 4

A Model for Essential Structure through Implicit Context

In previous chapters, we have seen how the concept of essential structure arises, in order to improve software reuse and to ease software evolution. But to benefit from the concept, we need a model of how essential structure may be realized. We have identified two points that such a model must meet: retroactive abstraction, where the apparently concrete identifiers and programmatic constructs within a module can be reinterpreted after the fact as abstract concepts to be reified; and composability of our abstract modules in such a way as to form a concrete system.

When a module expresses only its essential structure, it becomes as context-independent as possible. However, this is not to say that the context in which a module operates is irrelevant; quite the opposite is true in fact.
Therefore, we must provide a means for a module to interact with its context while it does not explicitly make mention of that context (except as required by its essential structure). I call this means implicit context. Through implicit context, we provide retroactive abstraction within our modules in such a way that we may compose our modules to form fully concrete systems.

Implicit context utilizes two basic concepts. First, each module may have an independent world view of the context in which it operates. With today's common practice, we expect all our modules to agree upon the global view of the system: certain classes exist with certain interfaces; each module may only have a partial view of the world, but this view agrees with all the others. This level of abstraction does not suffice. It is difficult enough to define a single, consistent global view for a particular version of a system; defining a single global view that will hold sway for all time as a system evolves is next to impossible. Thus, the process of reuse and evolution would potentially be made simpler were we freed from the constraints of conforming to a complex, volatile world view. Second, reconciliation of the differences between those world views must be performed when the modules interact in order to form a coherent system. Reconciliation is performed by interpreting communications differently depending on the context in which they take place.

In Section 4.1, I examine the model for implicit context. I then demonstrate how implicit context can be used to express the essential structure of modules in Section 4.2. I postpone a description of implementing implicit context until Chapter 5.

4.1  Implicit Context

Before I describe the model for implicit context, I give an analogy with human conversations as a motivational framework.
Consider that in human conversation we do not spell out every concept we wish to communicate at every instant the understanding of those concepts is required. We expect much information to be understood from or altered by context. Such use of context takes two forms: omission, where words or details are left out to be filled in from earlier details within a conversation or general knowledge, and alteration, where the words that are spoken or the way that they are interpreted depends upon the individuals who are speaking. "It spun wildly" could refer to a ride at the county fair, or to one's impression of a room while experiencing extreme nausea; the details about "it" have been omitted, to be understood from what has previously been discussed. Likewise, one's response to the question, "What is politics?" might be quite different depending on who is asking; the explanation given a young child is likely to be significantly different from that given an adult. Meaningful use of context can require that the participants in a conversation share a common world view when referencing knowledge outside the confines of the conversation. Stating that someone "acted the role of Cyrano" would be meaningless if the listener knew nothing of Cyrano de Bergerac.

Similarly, the history of communications within a system can be viewed as a conversation between modules, and so, we can leverage the concepts of omission, alteration, and world view. Rather than forcing modules to repeatedly give the same details in their communications, I wish to allow them to communicate while omitting details (details that would be extraneous embedded knowledge within those modules). At the same time, we must perform alteration of communications according to the implicit context in which they occur, including from where they are received or to whom they are sent. And finally, modules need to share a common world view, or apparently share one, so their implicit context can be correctly used.¹
[Footnote 1: In human conversation, we also perform operations such as checking that we have understood what is being discussed. I am not attempting to provide such operations via implicit context.]

With these concepts in hand, I proceed to discuss the three components of implicit context. In Section 4.1.1, I describe boundaries, which are the transition zones between differing world views. To reconcile the differing world views, contextual dispatch is performed at these boundaries; Section 4.1.2 details this. And finally, communication history (outlined in Section 4.1.3) is the conceptual record of all communications that have occurred within the system; it is used to provide details for performing contextual dispatch.

4.1.1  Boundaries

We need to define the program fragments for which differing world views are in force. When communication occurs between these program fragments, we must translate this communication according to the differences in these world views in order to get the program fragments to work together. I define the program fragments to be modules, and define there to be a boundary around each module that demarcates the end of one world view and the imposition of another. Ultimately, we care about the communications between these world views because these are what must be translated. This means that, for some program fragment to be defined as a module, it must be possible (at a minimum) to differentiate which communications arise inside the module and which arise outside, with a boundary between them. This leads to our definition of modules as follows.

Definition 4 (Module) Any program fragment can be defined to be a module as long as two constraints are met. First, every communication in the program, potential or actual, can be unequivocally identified as arising from either inside or outside that module.
Second, every communication in the program, potential or actual, can be unequivocally identified as travelling to the inside of that module or not.

This raises the question of what a communication is, of course. Obvious forms of communication include method invocations and field accesses. Under some circumstances, it can be useful to consider other program instructions to be communications, such as arithmetic operations, type casts, or assignment operations. In general, any program instruction could be viewed as a communication, possibly to another module within the program, or possibly to some virtual machine executing the program. In the implementation discussed in Chapter 5 of this dissertation, I view communications as being limited to method invocations, field accesses, and type references. Other implementations could make different choices.

A concept similar to boundaries has long been used to separate and encapsulate the details within modules that are to be hidden from the external world [Parnas, 1972]; traditionally, a module then exposes only a high-level interface to its functionality, allowing the implementation behind this interface to be changed. Boundaries differ from module interfaces in two ways. First, they encapsulate the external context of the module (the system outside the module), in addition to the internal details. Second, each boundary acts as a facade both for the module itself, and for the external context. As a result, the external context may perceive the interface to the module to be quite different from that perceived by the module itself and, similarly, the module (the inside of the boundary) may perceive the interface to the external context to be quite different from that perceived by the external context itself. This allows for differing world views to hold sway within each boundary.
We can consider boundaries to exist around obvious program modules, such as classes, but also around less obvious ones, such as individual methods, individual statements within methods, groups of classes, groups of methods, etc. Even very complicated boundaries could be defined, such as a boundary around a method from one class, several methods from a second class, an entire third class, and a single statement within a fourth class. Whether such complex boundaries are needed or useful is a separate issue; nothing prevents us from conceiving of such a convoluted boundary.

In simple cases, boundaries around individual methods, boundaries around individual classes, and boundaries around individual (Java) packages nest simply. As a result, communication to and from these modules crosses one boundary at a time. However, program fragments may be defined to belong to multiple modules; thus, boundaries need not be well-nested but may overlap in complicated ways in the most general case. Therefore, a definition of boundaries is required that permits arbitrary overlap of modules.

Definition 5 (Boundary) Every communication from inside a module that references the system external to that module must cross the boundary of that module. Likewise, every communication from outside a module that references some part of that module must cross the boundary of that module.

As a result, communications to or from a program fragment in the intersection of two modules must cross two boundaries (see Figure 4.1). When one module is strictly a subset of another, outgoing communication is considered to cross the boundary of the inner module first and incoming communication is considered to cross the boundary of the outer module first (see Figure 4.2).
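Definitions 4 and 5 can be illustrated with a minimal Java sketch. It is only an illustration under stated assumptions, not the implementation discussed in Chapter 5: the names `Communication`, `ModuleDef`, `BoundaryDemo`, and `crossed` are hypothetical, modules are modelled simply as sets of method names, and the modifiers assumed for `print`, `doit`, and `main` are guesses chosen to match the overlap situation of Figure 4.1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: a communication is a (source, target) pair of program fragments.
final class Communication {
    final String source;
    final String target;

    Communication(String source, String target) {
        this.source = source;
        this.target = target;
    }
}

// Hypothetical model: a module is a named set of program fragments (here, method names).
final class ModuleDef {
    final String name;
    private final Set<String> members;

    ModuleDef(String name, String... members) {
        this.name = name;
        this.members = new HashSet<>(Arrays.asList(members));
    }

    // Definition 4, first constraint: does the communication arise inside the module?
    boolean arisesInside(Communication c) { return members.contains(c.source); }

    // Definition 4, second constraint: does the communication travel to the inside?
    boolean travelsInside(Communication c) { return members.contains(c.target); }

    // Definition 5: the boundary is crossed when exactly one endpoint lies inside.
    boolean crosses(Communication c) { return arisesInside(c) != travelsInside(c); }
}

final class BoundaryDemo {
    // Assumed for illustration: print is non-public and void, doit is non-public
    // with a non-void result, and main is public and void.
    static final ModuleDef NON_PUBLIC = new ModuleDef("non-public", "Simple.print", "Simple.doit");
    static final ModuleDef VOID_RESULT = new ModuleDef("void-result", "Simple.print", "Simple.main");

    // Returns the names of the boundaries that a communication must cross.
    static List<String> crossed(String source, String target) {
        Communication c = new Communication(source, target);
        List<String> names = new ArrayList<>();
        for (ModuleDef m : Arrays.asList(NON_PUBLIC, VOID_RESULT)) {
            if (m.crosses(c)) {
                names.add(m.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(crossed("Simple.print", "Simple.doit")); // [void-result]
        System.out.println(crossed("Simple.print", "Simple.main")); // [non-public]
        System.out.println(crossed("Simple.print", "Other.run"));   // [non-public, void-result]
    }
}
```

Because `crosses` is evaluated independently for each module, the sketch handles overlapping modules without any nesting assumption; it deliberately says nothing about the order in which several boundaries are crossed.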
In the general case of overlap, no natural order to these boundaries exists, but it is convenient to impose an order on them so we can unequivocally determine which is crossed first; imposing such an order is an issue with which an implementation of the model must cope, so we delay further discussion of this topic to Chapter 5.

Figure 4.1: Modules can be defined that overlap. Here, we see a module that consists of the non-public methods within the class Simple and another that consists of the methods with void result types; the boundary of the former is represented as a dashed line, the latter as a solid line. If the print method communicates with the doit method, the solid-line boundary must be crossed. If print communicates with main, the dashed-line boundary must be crossed. But if print communicates with any other modules, both these boundaries must be crossed.

Figure 4.2: Modules can be defined such that one is strictly a subset of the other. Here, we see a module that consists of the print method within the class Simple and another that consists of the entire class Simple itself; the boundary of the former is represented as a dashed line, the latter as a solid line. If the print method communicates with the doit method or the main method, the dashed-line boundary must be crossed. If print communicates with any other modules, both these boundaries must be crossed.

4.1.2  Contextual Dispatch

With boundaries, we can segregate differing world views into well-defined subsets of our systems. But these subsets must work in conjunction to form a whole, functioning system. The presence of differing world views implies a disagreement as to the resolution of names and interpretation of communication. Therefore, this disagreement must be reconciled to achieve a coherent system.

Contextual dispatch is a mechanism for reconciliation between differing world views. It is a generalization of such concepts as object-oriented polymorphism [Strachey, 1967; Cardelli and Wegner, 1985], multiple dispatch [Bobrow et al., 1986; Chambers, 1992], predicate dispatch [Chambers, 1993; Ernst et al., 1998], and subjectivity [Harrison and Ossher, 1993] (see footnote 2). These other mechanisms allow for determining the method implementation that is to receive a given message variously on the basis of the types and values of the arguments of that message, the type of the receiving object, or the type of the sending object. Contextual dispatch additionally allows the recipient implementation (or implementations) to be selected on the basis of the context from which the message originates; we can utilize both structural information about the system and knowledge of the current and previous system state to further discriminate between potential recipients.

Contextual dispatch also allows for the interception and alteration of communication. The implicit context model is generic with respect to what communication is: any information that flows across a boundary is subject to contextual dispatch. We often make casual reference to message passing when discussing contextual dispatch, but such discussion is applicable to any other form of information flow too, such as procedure calls, events, signals, field accesses, or thrown exceptions. Communications are intercepted and, depending upon the context, may be:

• rerouted to a different recipient or multiple recipients,
• delayed or ignored completely, and/or
• altered to add, remove, or reorder arguments.

Figure 4.3 illustrates an example of contextual dispatch. As described in Section 3.2, for a class to participate in the Abstract Factory design pattern [Gamma et al., 1994] as a Client, it must explicitly name the AbstractFactory class in order to create instances of an AbstractProduct class.
Such explicit knowledge of the Abstract Factory design pattern will generally not be part of a minimal specification of the class acting as Client and, thus, will be extraneous embedded knowledge there. Instead, the essential structure of the Client would simply be to create the AbstractProduct:

    AbstractProduct product = new AbstractProduct();

The difficulty lies in the fact that I need the system to use the Abstract Factory design pattern, with this class acting as Client. To reconcile this difference, I first permit my Client class to contain the statement above. When the message to create this AbstractProduct crosses the boundary of the Client, contextual dispatch is performed to reroute the message to the appropriate AbstractFactory instance. This rerouting may be to different AbstractFactory instances at different times, depending upon the state of the system. This implies that an implementation of contextual dispatch must provide a means for testing and retrieving the state of the system. Contextual dispatch could utilize additional method calls or field accesses to determine the current system state.

[Footnote 2: I leave further discussion of these other mechanisms to Chapter 8.]

Figure 4.3: Performing contextual dispatch to use the Abstract Factory design pattern. The class SomeClass apparently sends the message "new" to the class AbstractProduct; however, this message is intercepted at the boundary of SomeClass, altered to the message "makeAbstractProduct," and rerouted to one of two instances of AbstractFactory depending on the current state of the system.
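The rerouting illustrated in Figure 4.3 can be approximated by the following minimal sketch. Plain Java cannot intercept a `new` expression, so the interception is simulated here by routing instantiation through an explicit, hypothetical `ClientBoundary` object; a real implementation of implicit context would keep this indirection out of the client's source. The names `ClientBoundary`, `PlainFactory`, `FancyFactory`, `describe`, and `DispatchDemo` are assumptions made for illustration, and the "system state" is reduced to a single mutable reference.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Roles from the Abstract Factory pattern as named in the text.
interface AbstractProduct { String describe(); }
interface AbstractFactory { AbstractProduct makeAbstractProduct(); }

// Hypothetical factories standing in for the two AbstractFactory instances of Figure 4.3.
final class PlainFactory implements AbstractFactory {
    public AbstractProduct makeAbstractProduct() { return () -> "plain product"; }
}

final class FancyFactory implements AbstractFactory {
    public AbstractProduct makeAbstractProduct() { return () -> "fancy product"; }
}

// Hypothetical boundary around the client: it alters the "new" message to
// "makeAbstractProduct" and reroutes it to whichever factory the state selects.
final class ClientBoundary {
    private final Supplier<AbstractFactory> currentFactory;

    ClientBoundary(Supplier<AbstractFactory> currentFactory) {
        this.currentFactory = currentFactory;
    }

    AbstractProduct newAbstractProduct() {
        return currentFactory.get().makeAbstractProduct();
    }
}

// The client asks only for "an AbstractProduct" and names no factory.
final class SomeClass {
    private final ClientBoundary boundary;

    SomeClass(ClientBoundary boundary) { this.boundary = boundary; }

    String run() {
        AbstractProduct product = boundary.newAbstractProduct();
        return product.describe();
    }
}

final class DispatchDemo {
    // Runs the client twice, changing the system state in between.
    static String[] demo() {
        AtomicReference<AbstractFactory> state = new AtomicReference<>(new PlainFactory());
        SomeClass client = new SomeClass(new ClientBoundary(state::get));
        String first = client.run();
        state.set(new FancyFactory());
        String second = client.run();
        return new String[] { first, second };
    }

    public static void main(String[] args) {
        for (String result : demo()) {
            System.out.println(result);
        }
    }
}
```

The same client code yields products from different factories at different times, which is the behaviour the rerouting is meant to provide; how the state is located is left open here.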
In the Abstract Factory design pattern example, I need to locate an instance of AbstractFactory that meets some set of criteria; perhaps it is the most recently created instance of AbstractFactory, or the instance that was passed from a particular module, indicating that it is the one to be currently used. However, such state information is not always immediately retrievable from a given module; this is where we can make use of the communication history of the system.

4.1.3  Communication History

Contextual dispatch needs to access the state of the system in order to add arguments to a message, or to determine where a communication is to be sent. Sometimes the required state information is available directly, where one or more methods can be called to retrieve it. Other times the required state information is not retrievable through the interfaces provided by the modules of our system. At these times, communication history can be used to retrieve the information of interest.

Communication history is a dynamic record of all communication that has occurred within the system from the start of its execution to the current point in its execution. [Footnote 3: Communication history was originally termed call history [Walker and Murphy, 2000]; however, it can provide access to other forms of communication, so I have chosen to change the name.] Communication history provides access to all the method calls with their arguments, target objects, return values, and thrown exceptions, and all field accesses. Information is stored implicitly within the communication history during the system execution. Information is retrieved explicitly from the communication history via queries. Each query defines a set of constraints, such as the type of an object of interest, the call in which an object was passed as an argument, the causal or temporal order of calls, etc.
A query is fulfilled by finding the information recorded within the communication history that satisfies these constraints.

To see the utility of communication history, consider again the example involving the Abstract Factory design pattern (Figure 4.3). In our example, contextual dispatch must determine to which instance of AbstractFactory the message makeAbstractProduct is to be sent. There could be a method called getCurrentFactory already available on some class (say, FactoryCreator) in the system from which the appropriate instance could be retrieved. If such a method does not exist, some other means must be provided to allow the retrieval of this state. There are two options, which I describe below.

In the first option, I can use contextual dispatch to effectively add the method getCurrentFactory to the interface of FactoryCreator, as shown in Figure 4.4. Here, I use contextual dispatch to intercept the instantiation of a particular subclass of AbstractFactory, storing the newly created instance. Next, I use contextual dispatch to simulate the presence of the method getCurrentFactory. The attempt to call this method on FactoryCreator is intercepted, returning the stored instance of AbstractFactory. This instance is then used to create the abstract product of interest to SomeClass.

In the second option (shown in Figure 4.5), communication history is used. The instance of AbstractFactory created by FactoryCreator is implicitly recorded within the communication history. Contextual dispatch, upon intercepting the attempt by SomeClass to instantiate AbstractProduct, makes a query against communication history to retrieve this instance of AbstractFactory. The retrieved instance is then used to create the desired abstract product.
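The second option can be sketched in Java as follows. This is a deliberately naive illustration under stated assumptions: the class `CommunicationHistory`, its methods `recordCreation` and `mostRecentInstanceOf`, and the helper classes `ConcreteFactory` and `HistoryDemo` are hypothetical names, only object creations are recorded, recording is done by explicit calls rather than implicitly, and the only query constraint supported is the type of the object of interest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical, naive communication history that records object creations only.
final class CommunicationHistory {
    // One recorded event: the created instance and the name of the creating module.
    private static final class Creation {
        final Object instance;
        final String creator;

        Creation(Object instance, String creator) {
            this.instance = instance;
            this.creator = creator;
        }
    }

    private final List<Creation> creations = new ArrayList<>();

    // In the conceptual model this recording happens implicitly; here it is explicit.
    void recordCreation(Object instance, String creator) {
        creations.add(new Creation(instance, creator));
    }

    // Query: the most recently created instance of the given type.
    // The creator is intentionally not part of the constraint.
    <T> Optional<T> mostRecentInstanceOf(Class<T> type) {
        for (int i = creations.size() - 1; i >= 0; i--) {
            Object candidate = creations.get(i).instance;
            if (type.isInstance(candidate)) {
                return Optional.of(type.cast(candidate));
            }
        }
        return Optional.empty();
    }
}

// Simplified stand-ins: the product is reduced to a String for brevity.
interface AbstractFactory { String makeAbstractProduct(); }

final class ConcreteFactory implements AbstractFactory {
    private final String label;

    ConcreteFactory(String label) { this.label = label; }

    public String makeAbstractProduct() { return "product from " + label; }
}

final class HistoryDemo {
    static String demo() {
        CommunicationHistory history = new CommunicationHistory();
        history.recordCreation(new ConcreteFactory("first"), "FactoryCreator");
        history.recordCreation("unrelated object", "Elsewhere");
        history.recordCreation(new ConcreteFactory("second"), "FactoryCreator");

        // What contextual dispatch would do on intercepting "new AbstractProduct()".
        return history.mostRecentInstanceOf(AbstractFactory.class)
                .map(AbstractFactory::makeAbstractProduct)
                .orElse("no factory recorded");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // product from second
    }
}
```

The query mentions only the type `AbstractFactory`, so the code issuing it carries no reference to `FactoryCreator`.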
Figure 4.4: Performing contextual dispatch to retrieve state through an added method. The instance of ConcreteFactory created by FactoryCreator (1) is intercepted and stored. When SomeClass attempts to instantiate AbstractProduct (2), contextual dispatch intercepts this attempt; in its place, the method getCurrentFactory is invoked on FactoryCreator (2.1). This invocation attempt is itself intercepted through contextual dispatch and the stored instance of AbstractFactory is returned in its stead (return arrow not shown). This instance is then used to create the abstract product desired by SomeClass (2.2).

Figure 4.5: Retrieving state through the use of the communication history. The instance of ConcreteFactory created by FactoryCreator (1) is implicitly stored within the communication history (1.1). When SomeClass attempts to instantiate AbstractProduct (2), contextual dispatch intercepts this attempt; in its place, a query on communication history is made (2.1) thereby retrieving this instance of ConcreteFactory, viewed as an AbstractFactory (the return arrow from call 2.1 possessing this reference is not shown). This instance is then used to create the abstract product desired by SomeClass (2.1).

Consider the minimal specification of the module containing SomeClass that needs it to use the Abstract Factory design pattern. This minimal specification might explicitly require that the abstract product be created via that instance of AbstractFactory obtained through a method called getCurrentFactory on FactoryCreator. In this case, this module could reasonably be implemented according to the first option. However, it will not be very reusable as it will be explicitly dependent upon FactoryCreator. A more realistic minimal specification would have an abstract requirement concerning the instance of AbstractFactory that is to be used. For example, the minimal specification might require the use of the most recently created instance of AbstractFactory without reference to its creator. The option using communication history would then be preferable in this situation.

To see why, we must consider the nature of communication history queries. Communication history allows us to perform a pattern match on the state of the system, where as much or as little of the pattern as needed can be defined. In the example above, we can literally ask for the most recently created instance of AbstractFactory without saying anything about its origins. Such a query thus allows us to decouple our module from FactoryCreator. We can express the minimal necessary knowledge of the state of the system via such a query.

It is important to emphasize two points about communication history. First, the model presented above is conceptual. Pragmatic constraints prevent any practical implementation of implicit context from naively conforming to this model; for example, there is a computational overhead associated with recording every method call that occurs in the execution of a system, and the longer a system runs, the larger the space required to record the communication history. However, a practical implementation of implicit context need not actually adhere to this model in order to give the appearance of doing so; static analysis can be performed to strip down the overhead to support communication history. I examine the opportunities for realizing a practical implementation in Chapters 5 and 7.
Second, it is important that communication history queries be made with respect to the local world views of the modules in which contextual dispatch occurs. Otherwise, we still risk embedding global assumptions in our modules; they would merely be moved out by one layer of indirection.

With this outline of the implicit context model in hand, I proceed to describe how it can be used in order to achieve our goals: expressing essential structure in order to increase the reusability of our modules while making them less sensitive to the evolution of our systems.

4.2  Using Implicit Context to Express Essential Structure

Previous chapters have identified two features that must be provided to express essential structure: retroactive abstraction of our modules and their composition to form concrete, coherent systems. In this section, I examine two approaches to meeting these goals. To do this, I make use of the example from Chapters 2 and 3 of the makeContributions method of the JavaOutlinePage class of Eclipse. Recall that the minimal specification of makeContributions was described in Section 3.1 as:

• makeContributions must add functionality to the status line, if there is one, to reflect selection changes in the Outline View.
• It must add four actions to the tool bar, one for sorting declarations lexically, one for hiding fields, etc.

In the first approach, described in Section 4.2.1 (below), I provide an implementation of the module that expresses only its essential structure, and demonstrate how to have such a module interoperate with a complete system. I implement the minimal specification directly. The resulting, abstract implementation is then embedded in a concrete context. I use the implicit context model to translate between its local world view and the system-wide world view.

Realistically, we cannot always expect to correctly identify the minimal specification of a module.
Even when we do, a given implementation may provide behaviour only some of which is of interest to us. Therefore, situations will arise in which a reused module contains what we consider to be extraneous embedded knowledge. Rather than refactor such a module or reimplement it, it might be preferable to somehow "cancel out" the EEK already present. In the second approach, described in Section 4.2.2, I begin with an existing implementation that I wish to reuse. However, this implementation provides more than the essential structure of makeContributions: it contains EEK. I need to have this module operate as though it expressed only its essential structure, so I could then embed it in a concrete context in a fashion identical to the first approach. Therefore, I use implicit context in this second approach to capture any communication from my reused module that expresses EEK with respect to the minimal specification of makeContributions. The result is indistinguishable from the abstract implementation provided in the first approach.

4.2.1  Adding Contextual Details to an Abstract Module

To create an implementation of makeContributions that is highly reusable and unlikely to evolve, we can express its essential structure by adhering to its minimal specification as closely as possible. The result, using Java, is shown in Figure 4.6. Because I have chosen to write this module in Java, I have used several simple assumptions, including that the statusLine and toolBar are passed as arguments, that their types are StatusLine and ToolBar respectively, that toolBar has a method addAction, etc. Strictly speaking, all such assumptions are extraneous embedded knowledge. However, if we treat them as abstractions rather than as concrete expressions, they are not constraining and effectively do not act as EEK.
I show how this is possible below, as I embed this implementation in a concrete context.

    public void makeContributions(StatusLine statusLine, ToolBar toolBar) {
        if(statusLine.exists())
            reflectOutlineViewSelectionChanges(statusLine);

        toolBar.addAction(new LexicalSortingAction());
        toolBar.addAction(new FieldVisibilityFilterAction());
    }

Figure 4.6: An implementation of makeContributions, in Java, expressing only its essential structure.

Embedding the Abstract Module in a Concrete Context

Consider taking the implementation of Figure 4.6 and using it in a concrete context. As an example, this implementation could be used within a larger implementation for Eclipse as a whole. Let us say that the higher-level module containing makeContributions possesses a minimal specification that includes all the concrete details in the original implementation of makeContributions, repeated in Figure 4.7. We must use implicit context to add in these concrete details to the abstract module given in Figure 4.6, translating between the world view of Figure 4.6 and that of Figure 4.7. There is a series of translations that we need to make.

The first step of the translation lies in the differences between the signature of the abstract module and the signature expected by Eclipse as a whole. This means that any incoming messages (which would expect the signature of the original implementation in Figure 4.7) crossing the boundary of makeContributions must be intercepted and altered to the signature of our essential structure implementation (in Figure 4.6). Conceptually, this involves three operations:

• discarding the menuManager argument;
• swapping the toolBarManager and statusLineManager arguments; and
• replacing the IToolBarManager and IStatusLineManager types with ToolBar and StatusLine respectively.
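The three operations listed above can be sketched as a translation over a reified message. This is only an illustration under stated assumptions: the classes `Argument` and `SignatureTranslator` are hypothetical, types are modelled as plain string labels rather than Java types, and the incoming argument order is taken from the original signature (menuManager, toolBarManager, statusLineManager).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reified argument: a type label plus the value carried by the message.
final class Argument {
    final String type;
    final Object value;

    Argument(String type, Object value) {
        this.type = type;
        this.value = value;
    }
}

// Hypothetical translator applied to incoming messages at the boundary.
final class SignatureTranslator {
    // One-for-one relabelling between the external and the local world view.
    private static final Map<String, String> TYPE_LABELS = new HashMap<>();
    static {
        TYPE_LABELS.put("IToolBarManager", "ToolBar");
        TYPE_LABELS.put("IStatusLineManager", "StatusLine");
    }

    // Expects (menuManager, toolBarManager, statusLineManager) and
    // produces (statusLine, toolBar), the signature of the abstract module.
    static List<Argument> translateIncoming(List<Argument> incoming) {
        Argument toolBar = relabel(incoming.get(1));
        Argument statusLine = relabel(incoming.get(2));
        // The argument at index 0 (the menu manager) is discarded;
        // the remaining two are swapped.
        return Arrays.asList(statusLine, toolBar);
    }

    // The value is passed through untouched; only its type label changes.
    private static Argument relabel(Argument argument) {
        String label = TYPE_LABELS.getOrDefault(argument.type, argument.type);
        return new Argument(label, argument.value);
    }
}
```

The relabelling leaves each value untouched, which corresponds to treating the two pairs of types as interchangeable labels for the same objects.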
The first two operations are straightforward. The third is too because, for these two types at least, we can define a bijective mapping between the world views of our essential structure implementation of makeContributions and the implementation of the higher-level module containing it; in other words, these types are simply different labels that are interchangeable.

    public void makeContributions(IMenuManager menuManager,
                                  IToolBarManager toolBarManager,
                                  IStatusLineManager statusLineManager) {
        if(statusLineManager != null) {
            StatusBarUpdater updater =
                new StatusBarUpdater(statusLineManager);
            addSelectionChangedListener(updater);
        }

        Action action = new LexicalSortingAction();
        toolBarManager.add(action);
        action = new FilterAction(new FieldFilter(),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.label"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.description.checked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.description.unchecked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.tooltip.checked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.tooltip.unchecked"),
            "HideFields.isChecked");
        JavaPluginImages.setImageDescriptors(action, "lcl16",
            "fields_co.gif");
        toolBarManager.add(action);
    }

Figure 4.7: The original implementation of makeContributions, expressing the concrete details expected by Eclipse, which are extraneous embedded knowledge with respect to its minimal specification.

In the second translation step, I intercept the outgoing message "statusLine.exists()" and replace it with "statusLine != null". The third translation is similar, intercepting the message:

    reflectOutlineViewSelectionChanges(statusLine);

and replacing it with two messages:

    StatusBarUpdater updater =
        new StatusBarUpdater(statusLine);
    addSelectionChangedListener(updater);

Finally, the world view of Eclipse as a whole does not possess a class called FieldVisibilityFilterAction; therefore, I must intercept the attempt at instantiating it, and replace it with the sequence of operations:

    FilterAction action =
        new FilterAction(new FieldFilter(),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.label"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.description.checked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.description.unchecked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.tooltip.checked"),
            JavaEditorMessages.getString(
                "JavaOutlinePage.HideFields.tooltip.unchecked"),
            "HideFields.isChecked");
    JavaPluginImages.setImageDescriptors(action, "lcl16",
        "fields_co.gif");

returning the new instance of FilterAction instead. After these translations, applied externally to the essential structure of makeContributions, we have a concrete implementation interoperating with the rest of Eclipse that remains free from EEK.

An Alternative Interpretation of the Essential Structure

In the situation described above, the mapping between the world view of the essential structure implementation of makeContributions and the world view of Eclipse is straightforward, consisting of mere rearrangement, discarding of some information, and one-for-one replacement. However, I could have chosen a slightly different implementation of the essential structure of makeContributions.
This is especially true since the informal minimal specification of makeContributions is ambiguous: it is unclear whether we should consider there to be a single status line and tool bar in existence (i.e., some sort of global properties) or not. Let us consider how this ambiguity might affect the essential structure and implementation of makeContributions.

The implementation I gave in Figure 4.6 takes the view that the status line and tool bar parameterize makeContributions. One alternative interpretation could be implemented thus:

    public void makeContributions() {
        if(statusLineExists())
            reflectOutlineViewSelectionChangesInStatusLine();

        addActionToToolBar(new LexicalSortingAction());
        addActionToToolBar(new FieldVisibilityFilterAction());
    }

A second alternative interpretation treats statusLine and toolBar as fields or globals:

    public void makeContributions() {
        if(statusLine.exists())
            reflectOutlineViewSelectionChanges(statusLine);

        toolBar.addAction(new LexicalSortingAction());
        toolBar.addAction(new FieldVisibilityFilterAction());
    }

In both these alternative implementations, in order to correctly interpret the particular status line and tool bar objects to which the messages are to be passed, I need to retrieve the objects that were passed to makeContributions but discarded at the boundary. Therefore, I can perform a communication history query when the outgoing messages are intercepted, locate the appropriate status line and tool bar objects, and pass the messages on to them. This is illustrated in Figure 4.8. In other words, the ambiguity in the minimal specification can still be dealt with by the application of implicit context.
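The retrieval of discarded arguments for the first alternative interpretation can be sketched as follows. The sketch makes several simplifying assumptions: the boundary and the abstract module are folded into one hypothetical class `MakeContributionsBoundary`, the class `ConcreteToolBarManager` is a stand-in for the concrete tool bar type, actions are reduced to strings, the status line is left out, and the history is limited to the argument lists of intercepted incoming calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the concrete tool bar type of the external world view.
final class ConcreteToolBarManager {
    final List<String> actions = new ArrayList<>();

    void add(String action) { actions.add(action); }
}

final class MakeContributionsBoundary {
    // Naive communication history: argument lists of intercepted incoming calls.
    private final List<Object[]> incomingCalls = new ArrayList<>();

    // (a) An external client passes three arguments; the call is recorded, the
    // arguments are discarded, and the parameterless abstract module is run.
    void makeContributions(Object menuManager,
                           ConcreteToolBarManager toolBarManager,
                           Object statusLineManager) {
        incomingCalls.add(new Object[] { menuManager, toolBarManager, statusLineManager });
        abstractMakeContributions();
    }

    // The abstract module (status line handling omitted); it names no tool bar object.
    private void abstractMakeContributions() {
        addActionToToolBar("LexicalSortingAction");
        addActionToToolBar("FieldVisibilityFilterAction");
    }

    // (b) The outgoing message is intercepted; the discarded tool bar is retrieved
    // from the most recently recorded call and the message is passed on to it.
    private void addActionToToolBar(String action) {
        Object[] lastCall = incomingCalls.get(incomingCalls.size() - 1);
        for (Object argument : lastCall) {
            if (argument instanceof ConcreteToolBarManager) {
                ((ConcreteToolBarManager) argument).add(action);
                return;
            }
        }
        throw new IllegalStateException("no tool bar recorded in communication history");
    }
}
```

A caller would create a `ConcreteToolBarManager`, pass it to `makeContributions` along with the other two arguments, and afterwards find both actions in its `actions` list, even though the abstract module never received the tool bar as a parameter.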
In all these situations, I have implemented makeContributions to express only its essential structure, and used implicit context to add in the concrete details needed by higher-level modules. I now consider an alternative scenario, where my implementation contains EEK, but I still wish to reuse it.

Figure 4.8: Using communication history to retrieve discarded information. In (a), some external client calls makeContributions, passing three objects to it as arguments; this call is intercepted and the arguments are discarded. In (b), makeContributions attempts to call addAction on some object of type ToolBar that it believes to exist; this call is intercepted, the object of type IToolBarManager that was discarded in (a) is retrieved from communication history, and the add method is called upon it. (The argument list for addAction and add has been left out of the diagram.)

4.2.2  Cancelling Out EEK Already Present within a Module

Rather than implementing makeContributions from scratch, we may decide that some existing implementation almost does the job. However, this existing implementation likely expresses knowledge of its external context that is EEK relative to the minimal specification of interest. Therefore, we need a means to translate the existing implementation to simulate a module that expresses only the essential structure of interest; we must somehow "cancel out" the EEK.

Consider starting with the original implementation of makeContributions, shown in Figure 4.7. I want to translate this implementation, so that, from the perspective of an external viewer, it appears to provide only the essential structure of makeContributions.
The implementation of makeContributions shown in Figure 4.6 is much closer to the essential structure. We can thus think of this cancellation of EEK as a translation from the original implementation to the essential structure implementation, consisting of a series of steps. This transformation changes the externally observable structure and behaviour of the module; however, I describe the transformation as though I am altering the internal implementation. From the external perspective, there is no way to see a difference between these two descriptions.

As a first step, I need to modify the signature of makeContributions to eliminate the first formal parameter, swap the other two, and alter the types from IStatusLineManager and IToolBarManager to StatusLine and ToolBar respectively. This alteration is illustrated in Figure 4.9. Now, the world view external to the boundary of makeContributions perceives it as having the signature required by the minimal specification, while internal to the boundary, it has the original signature. In this situation, when I translate an incoming message that lacks an argument corresponding to the IMenuManager formal parameter, I have contextual dispatch simply add null to the message in this position. If I were dealing with a similar situation in which adding a null argument did not suffice, I would need contextual dispatch to create a dummy instance or retrieve an instance from communication history, as appropriate.

The next bit of EEK present in the original implementation involves extraneous details involved with the status line. The minimal specification requires that the implementation add functionality to the status line, if it exists, to reflect selection changes in the Outline View.
The original implementation of makeContributions simply checks if statusLineManager is null to determine if it exists, but this makes an extraneous assumption about what it means for the status line to exist. Therefore, this test should be altered somehow, but difficulty arises from the fact that one might not consider the expression

    statusLineManager != null

to be communication that flows across the boundary of makeContributions. With such an interpretation, this test cannot be intercepted and altered to the more general form of interest to us, namely "statusLineManager.exists()." I consider an alternative solution in the next subsection.

Figure 4.9: Altering an incoming message to hide the presence of EEK within a module. Within the boundary of makeContributions, the instances of StatusLine and ToolBar are viewed as instances of IStatusLineManager and IToolBarManager; their positions are also swapped within the message. Since the original implementation of makeContributions expects an instance of IMenuManager as its first argument, null is added to the message in this position.

Since I wish to eliminate mention of Action and its attendant constraints that LexicalSortingAction and FilterAction be subclasses of it, I should also replace reference to Action with reference to LexicalSortingAction. Finally, the detailed set of calls to create and initialize the instance of FilterAction needs to be captured and discarded, returning null instead.
The second call to addAction is also captured, and its argument replaced with a call to instantiate FieldVisibilityFilterAction before it is allowed to continue. I have thus successfully modified the original implementation of makeContributions to express only the essential structure (I discuss the difficulties of capturing the inequality test below). The resulting module can now be reused in any context where its minimal specification is needed.

Coping with Statements that Do Not Produce Communication

Above, I pointed out that certain statements and expressions, such as inequality tests, do not obviously produce communication. If we choose to consider them as producing communication, coping with them is straightforward. The alternative, of not considering them as producing communication, requires two steps: (a) add the if-statement appearing at the start of the essential structure implementation of makeContributions; and (b) effectively delete the if-statement from the start of the original implementation. Recall that the external world view expects that makeContributions will have the signature appearing in Figure 4.6. For step (a), I capture calls to the expected signature

    makeContributions(StatusLine, ToolBar)

and use contextual dispatch to perform the functionality of the new if-statement and to invoke the actual signature of makeContributions, in that order.
Thus, we can think of the change as though the call were to this facade:

    public void makeContributions(StatusLine s, ToolBar t) {
      if(s.exists())
        reflectOutlineViewSelectionChanges(s);
      makeContributions(null, (IToolBarManager)t, (IStatusLineManager)s);
    }

For step (b), presumably the if-statement at the start of Figure 4.7 would not be considered a communication either, so I must capture the individual calls within its body (to instantiate StatusBarUpdater and call addSelectionChangedListener) and discard them. This alternative solution is not very satisfactory, since it effectively throws out several lines of code, calling into question the purpose of reusing this module in the first place. Indeed, the reduction to essential structure in this example is extreme and may indicate that doing so is not worthwhile. All in all, it would be better to treat the test for the null status line as an outgoing message, in order to avoid convoluted solutions. Implicit context does permit EEK already present within our modules to be cancelled out, even in such problematic situations.

Although it is possible to cancel out the EEK present in an existing implementation, it may require some convolutions to do so. Too much discarding and too many convolutions themselves will lead to structural degradation, the avoidance of which motivated the need for implicit context in part. This dissertation does not address at what point it becomes more expensive (in whatever sense) to reuse an ill-fitting module than to implement one expressing the required essential structure.

4.3  Summary

Implicit context permits each module to have its own, independent view of the world in which it operates. These views must be reconciled when the modules within a system communicate, in order for those modules to be able to work together.
To this end, implicit context defines a boundary around each module; the world view within this boundary need not agree with the world view outside it. When communication crosses these boundaries, it is intercepted and translated according to the differences in world views on either side of the boundary; this operation is called contextual dispatch. It sometimes occurs that additional information must be added to the communication crossing a boundary or that the recipient of such communication depends on the state of the system. When such information cannot be simply retrieved via the interfaces provided by our modules, we can use the history of communication within the system to retrieve otherwise hidden information about the system state. Such use of the communication history can allow the definition of the system state to be expressed explicitly instead of, for example, assuming that an invariant exists between the system state and the value of an encoding of that state as recorded within some variable.

Implicit context allows for the expression of essential structure within our modules, and for the elimination of extraneous embedded knowledge already present there. A module whose minimal specification is indifferent to contextual details of interest to a higher-level module can have those details added through implicit context by intercepting incoming and outgoing communication and altering it appropriately. A module that possesses extraneous contextual details can have those details cancelled out through implicit context, also by intercepting incoming and outgoing communication and altering it appropriately.

Chapter 5 describes how implicit context can be and has been implemented in a prototype tool; in Chapter 6, I give concrete examples of case studies I have performed using implicit context; and Chapter 7 discusses further optimizations and pragmatic considerations for better implementations.
Chapter 5

A Prototype Tool for Utilizing Implicit Context in Java

In Chapter 4, I presented a model of essential structure through implicit context. To realize this model, we must possess a mechanism for implementing it. Such a mechanism requires a means to define boundaries between independent world views; to capture information flowing across these boundaries; and to perform contextual dispatch on the captured information flow, including the use of communication history.

To evaluate the implicit context model, I have constructed a prototype tool that supports the main features of the model for use on source code written in the Java programming language [Gosling et al., 2000]. I chose to work with programs written in Java so as to be able to apply implicit context to large existing systems. Although the work in this chapter is devoted to tackling Java specifically, the concepts of essential structure and implicit context are more generally applicable. Developing a special purpose language tailored to implicit context is premature at this point.

The prototype tool operates as a preprocessor to Java source code. I chose to have the tool support boundaries only around individual classes and methods, largely because these declarations are easily named in Java source. To support interception and alteration of communication at these boundaries, I provide a construct called a boundary map; system developers must explicitly create boundary maps to perform contextual dispatch. Boundary maps are declared in files that are separate from the Java source code. System developers also explicitly declare, through tool directives, to which boundary or boundaries a given boundary map is to be attached.

I begin, in Section 5.1, by giving an overview of the syntax and operation of boundary maps through a simple example.
I describe the general structure of the tool in Section 5.2 along with an overview of its usage, leaving the non-trivial details of its algorithms to the other sections of this chapter. I then describe the features of boundary maps as provided by the tool in Section 5.3. Tool directives are described in Section 5.4. I consider, in Section 5.5, how boundary maps are sequentially applied so that multiple boundary maps can be composed. Finally, Section 5.6 describes some of the implementation issues of the tool.

5.1  A Simple Example

To examine the major features of my prototype tool, consider the following example that uses implicit context; this example was introduced in Section 1.2. We have a simple Java class, called CompoundWidget, for creating a particular kind of compound widget comprising a button and a scroll bar. Shown below is part of the declaration of CompoundWidget, including the declarations for two fields that hold its button and scroll bar objects and a constructor declaration.

    public class CompoundWidget {
      private Button button;
      private ScrollBar scrollBar;

      public CompoundWidget() {
        button = new Button();
        scrollBar = new ScrollBar();
        // Perform other setup
      }

      // Other methods
    }

The constructor instantiates the Button and ScrollBar classes, and stores each resulting object in the appropriate field of the newly-created instance of CompoundWidget. This source code would occur in a file named CompoundWidget.java.

Now, imagine that we have different versions of Button and ScrollBar for different platforms, such as a version operating with the Motif library to run on Unix systems and a version to operate on Microsoft Windows systems. We decide to use the Abstract Factory design pattern [Gamma et al., 1994] to isolate these platform-specific differences from the rest of our system. The class structure at which we arrive is shown in Figure 5.1.
Here, Button and ScrollBar are abstract base classes, each specialized into platform-specific versions. There is also a hierarchy of widget factories, consisting of an abstract base class called WidgetFactory and platform-specific subclasses. Each platform-specific widget factory creates instances of the corresponding platform-specific versions of Button and ScrollBar. Clients can ignore platform differences by possessing an instance of either subclass of WidgetFactory but treating it as being of type WidgetFactory. Calling the methods on this instance causes the appropriate instances of Button or ScrollBar to be created polymorphically. One part of the system must still create an instance of the appropriate widget factory subclass and pass it to the client, and thus, this part will be aware of which platform is in use.

[Class diagram: WidgetFactory, declaring createScrollBar() and createButton(), with subclasses MotifWidgetFactory and MSWindowsWidgetFactory; Button with subclasses MotifButton and MSWindowsButton; ScrollBar with subclasses MotifScrollBar and MSWindowsScrollBar.]

Figure 5.1: Using the Abstract Factory design pattern to isolate platform-specific dependencies. We have separate subclasses of Button and ScrollBar for both Motif and Microsoft Windows systems.

I (in my role as a particular developer of this example) wish to allow the interior of the CompoundWidget module to ignore the added complexity of Abstract Factory, while having the system as a whole utilize these platform-specific subclasses. The system as a whole needs to interact with CompoundWidget as though the Abstract Factory pattern were in use there. Thus, the declaration of CompoundWidget should be externally indistinguishable from the following declaration.
    public class CompoundWidget {
      private Button button;
      private ScrollBar scrollBar;

      public CompoundWidget(WidgetFactory fact) {
        button = fact.createButton();
        scrollBar = fact.createScrollBar();
        // Perform other setup
      }

      // Other methods
    }

The original declaration describes one world view, that which holds sway within CompoundWidget. This new declaration describes a different world view, that which holds sway in the system external to CompoundWidget. The differences between these two declarations are that an instance of WidgetFactory should be passed to the constructor, and that this instance should be used to build the individual pieces of the CompoundWidget. In fact, since Button and ScrollBar are abstract classes in the view of the system as a whole, they are not directly instantiable.

To translate between these differences, I begin by considering there to be a boundary around CompoundWidget between these independent world views. The prototype tool supports the use of boundaries around individual classes and individual methods; the fully qualified name of the class or method is used to refer to its boundary.

Next, I must specify contextual dispatch at this boundary to resolve the conflict between the world views. I specify three boundary maps to do this: one that translates incoming attempts at constructing CompoundWidget instances parameterized by a WidgetFactory object; one that translates outgoing attempts at instantiating Button; and one that translates outgoing attempts at instantiating ScrollBar. The operation of these boundary maps is depicted in Figure 5.2. I walk through the boundary map code below.

I group these three boundary maps into a construct called a mapset. The mapset below is named WidgetFactory to allude to its purpose. Mapsets are specified in files separate from source code files.
    // Begin running example 5.1
    mapset WidgetFactory {

The first boundary map translates parameterized calls into the CompoundWidget constructor. Since no parameterized constructor exists within the boundary, such calls must be replaced by calls to the non-parameterized constructor. The communications to be captured by this boundary map are specified in the capture clause to the right of the colon on the line below; this boundary map intercepts in-bound messages to CompoundWidget parameterized by WidgetFactory so the capture clause uses the keyword in. Each boundary map must declare its result type, even when we are dealing with a constructor; the result type here is CompoundWidget, declared at the beginning of the line below. Once a communication conforming to the description in the capture clause crosses the boundary, it is captured and the body of the boundary map (declared within a matching pair of braces) is executed in its place.

Figure 5.2: The operation of boundary maps attached to CompoundWidget. In (a), messages from the external system that attempt to instantiate CompoundWidget via a call to a parameterized constructor (which does not actually exist) are rerouted to the non-parameterized constructor. In (b) and (c), attempts to instantiate Button and ScrollBar within the boundary are rerouted via an instance of a widget factory. (Note that the use of communication history, described in the main text, is not depicted here.)
      CompoundWidget map(): in(CompoundWidget(WidgetFactory)) {

The body of this boundary map simply indicates that the non-parameterized constructor should be run in place of the original call.[1] The argument of type WidgetFactory passed in the original call is discarded.

        this();
      }

The second boundary map specifies the translation of requests to instantiate Button. Any communications passing out across the boundary that are to instantiate Button are captured by this boundary map, so it uses the keyword out (additional forms are available for capturing each access that gets or sets the value of a field).[2] An object of type Button will be returned to the site that sent the captured communication.

      Button map(): out(Button.new()) {

I need to replace the call to instantiate Button with the appropriate call on an instance of WidgetFactory, but no such instance is lying about for us to use. Therefore, I query communication history to retrieve an instance of WidgetFactory. Specifically, I find the instance that has most recently been passed in any communication.[3] Communication history queries are implemented through a set of methods on the History class, provided by the tool. This particular query requires an instance of type Class[4] and returns an instance of type Object; since we know that this must actually be an instance of type WidgetFactory, I downcast the returned object.

[1] In Java, one constructor on a given class may invoke another constructor on the same class. This is done via the syntax this followed by the argument list appropriate to the desired constructor. In this example, we are replacing the invocation of one constructor (which takes a single argument of type WidgetFactory) with another (which takes no arguments). The latter invocation is implicitly on the same class as the former.
[2] In Java, one instantiates classes through the syntax "new SomeClass(args)." To refer to these instantiation attempts, the syntax is changed a little (to "SomeClass.new(args)") for consistency with other message capture syntax.

[3] The selection of this particular instance of WidgetFactory is a design choice on the part of the developer who creates the boundary maps; other choices are possible using other communication history queries, default values, etc.

[4] In Java, one may access the instance of type Class that represents a particular class, say SomeClass, through the syntax "SomeClass.class." The resulting instance can be used in standard Java introspection.

        WidgetFactory fact = (WidgetFactory)
          History.lastInstancePassed(WidgetFactory.class);

The first call to construct an instance of CompoundWidget may come before the first call to construct an instance of WidgetFactory; the communication history query would then return null. In such a case, I choose to create a default widget factory of the Motif variety.[5]

        if(fact == null)
          fact = new MotifWidgetFactory();

Finally, I use the retrieved (or created) widget factory to create and return a button object that is platform-specific.

        return fact.createButton();
      }

The third and final boundary map is much like the second, the only differences being that the communications captured are to instantiate ScrollBar and that the call to the widget factory creates and returns an instance of ScrollBar.[6]

      ScrollBar map(): out(ScrollBar.new()) {
        WidgetFactory fact = (WidgetFactory)
          History.lastInstancePassed(WidgetFactory.class);
        if(fact == null)
          fact = new MotifWidgetFactory();
        return fact.createScrollBar();
      }
    }
    // End running example 5.1

Boundary maps can sometimes be specified in a way that permits their reuse in conjunction with multiple maps.
Therefore, we must indicate to the tool which mapsets should be attached to which boundaries; this is done with tool directives. The following tool directive indicates that the mapset named WidgetFactory should be attached to the boundary named CompoundWidget (the tool determines that there is a class of the same name, and therefore, it is the boundary around this class that is being referenced). Tool directives are specified in one or more files separate from those containing source code or boundary maps.

    apply WidgetFactory to CompoundWidget;

To apply implicit context in this situation, the prototype tool is run with the three files containing the tool directive, the WidgetFactory mapset, and the CompoundWidget source code. The tool produces a new set of Java source code that is ready for compilation. This new source code is not intended to be human readable; if changes are needed, the tool is re-run with changed inputs. The resulting source code simulates the effects of the implicit context model: the CompoundWidget module is oblivious to the presence of the Abstract Factory design pattern, while the system as a whole sees this design pattern in use.

[5] Again, this is not the only choice available.

[6] This boundary map replicates the core of the earlier one. Such replication could be reduced by the support for mechanisms similar to C++ templates, for example. However, such extensions are optimizations. They do not affect the key properties of the tool, so they remain a topic for future work.

5.2  Tool Structure

Figure 5.3 shows an overview of the operation of the prototype tool. The system developer provides three kinds of inputs to the tool: the original Java source code on which to apply implicit context, the boundary maps describing the use of implicit context, and the tool directives indicating which boundary maps attach to which modules within the source code.
Each kind of input is provided in one or more files separate from the other kinds of input. The tool parses the original source code and boundary maps and forms abstract syntax trees (ASTs) of the code found there. The parser is generated by JJTree and JavaCC[7] from an annotated grammar for the Java language. This grammar is a modified form of the one provided in the JavaCC distribution; it has been tweaked to make the resulting ASTs more explicitly represent such things as method invocations.

According to the tool directives given to the tool, the boundary maps are combined with the original source code. To do this, the tool replicates pieces of the boundary map AST and inserts them at particular points of the source code AST; further details of this process are provided in Section 5.6.1.

The combined AST is then traversed and modified in order to support communication history queries. Instructions are inserted at particular nodes within the AST to record the communication history; further details of instrumentation and communication history support are provided in Sections 5.3.5 and 5.6.2.

The final, instrumented AST is then translated back into Java source code. This new version of the source code is not intended for human consumption. If the developer requires changes to the resulting system, the inputs to the tool are altered and the tool re-run. In fact, the instrumentation and other transformations make the new source code difficult for a human to read. The tool could as well compile the code as produce transformed source; I chose the source-level transformation approach in order to permit debugging of the tool.

[7] Both available from WebGain at the time of writing, http://www.webgain.com/.

Figure 5.3: Overview of the operation of the prototype tool. See the main text for an explanation of the four steps involved.
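To make the effect of combination and instrumentation more concrete, the following hand-written Java sketch approximates how a transformed CompoundWidget from Section 5.1 might behave. It is not the tool's actual output, which the text describes only as difficult to read. The History class here is a simplified stand-in that borrows the lastInstancePassed name; recordPassed, reset, platform(), and TransformedSketch are assumptions introduced for illustration, as is the choice to record the factory at the call site.

```java
import java.util.ArrayDeque;
import java.util.Deque;

abstract class Button { abstract String platform(); }
abstract class ScrollBar { abstract String platform(); }

abstract class WidgetFactory {
    abstract Button createButton();
    abstract ScrollBar createScrollBar();
}

class MotifWidgetFactory extends WidgetFactory {
    Button createButton() { return new Button() { String platform() { return "Motif"; } }; }
    ScrollBar createScrollBar() { return new ScrollBar() { String platform() { return "Motif"; } }; }
}

class MSWindowsWidgetFactory extends WidgetFactory {
    Button createButton() { return new Button() { String platform() { return "MSWindows"; } }; }
    ScrollBar createScrollBar() { return new ScrollBar() { String platform() { return "MSWindows"; } }; }
}

// Simplified stand-in for the tool-provided History class (assumed behaviour).
class History {
    private static final Deque<Object> PASSED = new ArrayDeque<>();

    // Assumed instrumentation hook: remember an object passed in a communication.
    static void recordPassed(Object value) {
        if (value != null) {
            PASSED.push(value);
        }
    }

    // Most recently passed instance of the given type, or null if none.
    static Object lastInstancePassed(Class<?> type) {
        for (Object candidate : PASSED) {
            if (type.isInstance(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    static void reset() { PASSED.clear(); }
}

class CompoundWidget {
    final Button button;
    final ScrollBar scrollBar;

    // Original constructor; the two instantiations have been rerouted (out-maps).
    CompoundWidget() {
        button = currentFactory().createButton();       // was: new Button()
        scrollBar = currentFactory().createScrollBar();  // was: new ScrollBar()
    }

    // Constructor seen by the external world view (in-map): the argument is
    // discarded and the non-parameterized constructor runs instead.
    CompoundWidget(WidgetFactory fact) {
        this();
    }

    private static WidgetFactory currentFactory() {
        WidgetFactory fact = (WidgetFactory) History.lastInstancePassed(WidgetFactory.class);
        return (fact != null) ? fact : new MotifWidgetFactory();
    }
}

class TransformedSketch {
    // Stands in for an instrumented call site that records the argument
    // before the communication crosses the boundary.
    static CompoundWidget construct(WidgetFactory fact) {
        History.recordPassed(fact);
        return new CompoundWidget(fact);
    }

    public static void main(String[] args) {
        CompoundWidget widget = construct(new MSWindowsWidgetFactory());
        System.out.println(widget.button.platform() + " " + widget.scrollBar.platform());
    }
}
```

Falling back to a Motif factory when the history holds no factory mirrors the default chosen in the running example; other defaults would be equally possible.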
5.3  Boundary Maps

For convenience, rather than naming each individual boundary map, boundary maps are collected into sets called mapsets. Each mapset possesses a name, unique among all mapsets, that can be used within tool directives. Below is a simple mapset called example:

    mapset example {
      void map(): in(someMethod()) throws SomeException {
        System.err.println("Calling someMethod()");
        CONTEXT.proceed();
      }
    }

The example mapset contains a single boundary map. Usually there will be multiple boundary maps in a mapset, and each mapset can be applied to more than one boundary. Each boundary map consists of up to six parts: a capture clause, formal parameters, an optional throws clause, a body, a result type, and optional modifiers; mapsets conform to the Backus-Naur Form (BNF) syntax [Naur et al, 1960][8] shown in Figure 5.4.[9]

[8] Surrounding non-terminals with "<" and ">" has been dropped in favour of slanted text. Uppercase slanted text refers to non-terminals unchanged from their definition in the grammar for Java; lowercase refers to non-terminals that are new or redefined. Terminals are denoted by boldface, for keywords, or are surrounded by quotation marks, for special characters.

[9] This syntax is intentionally reminiscent of that of AspectJ [Kiczales et al, 2001], the connection with which is described in Chapter 8.

The capture clause specifies which communications should be intercepted that cross the boundary to which this boundary map is attached; the example specifies that communications should be intercepted that cross the boundary from the external context into the module that are bound for the someMethod method. I look at communication capture in Section 5.3.1.

Boundary maps can allow the arguments of captured communications to be exposed to the body of the boundary map, although the communications captured here possess no arguments; I examine how this is done in Section 5.3.2.

    mapset ::= mapset IDENTIFIER "{" { boundary-map } "}"
    boundary-map ::= standard-boundary-map | renaming-map
    standard-boundary-map ::=
        [ modifier ] map FORMAL-PARAMETERS ":" capture-clause [ THROWS-CLAUSE ] CONSTRUCTOR-BODY

Figure 5.4: BNF syntax for mapsets. The production rules for the non-terminals IDENTIFIER, FORMAL-PARAMETERS, THROWS-CLAUSE, and CONSTRUCTOR-BODY are identical to those for identifiers, formal parameters, method throws, and constructor bodies in Java [Gosling et al, 2000: §3.8, §8.4.1, §8.4.4, & §8.8.5 respectively]. Other non-terminals are defined later in the chapter.

The types of exceptions thrown within the body of the boundary map sometimes need to be explicitly declared within a throws clause, in identical fashion to the rules for Java methods [Gosling et al, 2000: §8.4.4]. The example above declares that the body can throw a SomeException. To translate from a world view where a call can result in an exception to one where it cannot, one declares a set of exceptions within the boundary map's throws clause. To translate in the opposite direction, one must place the contextually dispatched call within a try block, and handle the exception in an associated catch block.

The body of the boundary map specifies how these captured communications should be altered; in general, each body can consist of an arbitrary block of Java source code, with a few additions and restrictions. In this case, the boundary map has been specified to invoke a print statement prior to the execution of someMethod. The body indicates that the original communication should be allowed to proceed on its way through a call to proceed, indicated by the statement:[10]

    CONTEXT.proceed();

The call to proceed is used, rather than re-invoking the original method (i.e., calling someMethod again). Section 5.3.3 discusses the reasons for using this construct.
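The role of proceed, and of the try/catch translation mentioned above, can be approximated in plain Java by representing a captured communication as an object that the map body may invoke. This is only an analogy under stated assumptions: CapturedCall, ExampleMap, Target, and ProceedDemo are hypothetical names, the log list replaces the print statement so that the effect can be inspected, and nothing here reflects how the tool itself weaves map bodies into source code.

```java
import java.util.ArrayList;
import java.util.List;

class SomeException extends Exception {
    SomeException(String message) {
        super(message);
    }
}

// Hypothetical representation of a captured communication.
interface CapturedCall {
    void proceed() throws SomeException;
}

class Target {
    private final List<String> log;

    Target(List<String> log) {
        this.log = log;
    }

    void someMethod() throws SomeException {
        log.add("someMethod body");
    }
}

class ExampleMap {
    // Mirrors the body of the example boundary map: note the call, then let
    // the original communication continue on its way.
    static void apply(CapturedCall context, List<String> log) throws SomeException {
        log.add("Calling someMethod()");
        context.proceed();
    }
}

class ProceedDemo {
    static List<String> run() throws SomeException {
        List<String> log = new ArrayList<>();
        Target target = new Target(log);
        ExampleMap.apply(target::someMethod, log);
        return log;
    }

    // Translating toward a world view in which the call cannot throw:
    // the dispatched call is wrapped in try/catch.
    static List<String> runWithoutExceptions() {
        try {
            return run();
        } catch (SomeException e) {
            return new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithoutExceptions());
    }
}
```

Passing the captured call as an object keeps the map body independent of which method was intercepted, which is one way to read the preference for proceed over re-invoking someMethod by name.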
Boundary maps can be prefaced by certain modifiers just as constructor, method, and field declarations can. The permitted modifiers and their interaction with the modifiers of source code declarations are described in Section 5.3.4. This simple boundary map does not need to make use of communication history, but this is often not the case; I describe how communication history can be used in Section 5.3.5. An additional form of boundary map is provided by the tool, the renaming map; this is described in Section 5.3.6.

[10] This syntax was chosen to reduce the likelihood that an existing name (such as proceed) in the input source would be treated as a keyword. Also, there was the possibility that other context references might be needed. The call to proceed is the only one defined by the prototype tool.

    capture-clause ::= event-designator "(" communication-description ")"
    event-designator ::= in | out | gets | sets
    communication-description ::=
        [ qualifier ] ( IDENTIFIER | new ) [ formal-arguments ]
    qualifier ::= IDENTIFIER "." { IDENTIFIER "." }
    formal-arguments ::=
        "(" [ type-or-param-name { "," type-or-param-name } ] ")"
    type-or-param-name ::= [ qualifier ] IDENTIFIER { "[" "]" }

Figure 5.5: BNF syntax for capture clauses. Strictly speaking, communication descriptions for gets and sets event designators cannot include formal arguments; I avoid this wrinkle for the sake of clarity in this presentation.

5.3.1  Capturing Communications

Communications may be captured as they cross boundaries. The capture clause of each boundary map specifies the kind of communications that are captured by that boundary map. While communication capture and alteration are concepts in the dynamic realm, the tool uses and modifies the static properties of the code it operates on to achieve the effect of dynamic capture and alteration. The implementation of this process is described in Section 5.6.1.
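The capture-clause grammar of Figure 5.5 is small enough to exercise with a regular expression. The recognizer below is a rough sketch based on one reading of that figure, not the tool's parser (which is generated by JJTree and JavaCC); the class and method names are hypothetical. It also enforces the caveat from the caption that gets and sets descriptions carry no formal arguments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureClauseRecognizer {
    // Simplification: Java keywords (including "new") are accepted as identifiers.
    private static final String ID = "[A-Za-z_$][A-Za-z0-9_$]*";
    // [ qualifier ] ( IDENTIFIER | new )
    private static final String QNAME = ID + "(?:\\." + ID + ")*";
    // type-or-param-name: optionally qualified name followed by "[]" pairs
    private static final String TYPE = QNAME + "(?:\\[\\])*";
    // formal-arguments: parenthesized, comma-separated, possibly empty
    private static final String ARGS = "\\((?:" + TYPE + "(?:," + TYPE + ")*)?\\)";

    // Group 1 is the event designator; group 2 is the optional argument list.
    private static final Pattern CLAUSE =
        Pattern.compile("(in|out|gets|sets)\\(" + QNAME + "(" + ARGS + ")?\\)");

    static boolean isCaptureClause(String text) {
        // Simplification: all whitespace is dropped before matching.
        Matcher m = CLAUSE.matcher(text.replaceAll("\\s+", ""));
        if (!m.matches()) {
            return false;
        }
        boolean fieldEvent = m.group(1).equals("gets") || m.group(1).equals("sets");
        // Field accesses may not carry formal arguments.
        return !(fieldEvent && m.group(2) != null);
    }

    public static void main(String[] args) {
        System.out.println(isCaptureClause("in(CompoundWidget(WidgetFactory))"));
        System.out.println(isCaptureClause("out(Button.new())"));
        System.out.println(isCaptureClause("sets(button(int))"));
    }
}
```

A regular expression suffices here only because the grammar has no nesting beyond one argument list; a full parser would be needed to check that a description is meaningful at a given boundary.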
Boundary maps are differentiated on the basis of whether they map communications crossing the boundary from the outside to the inside (in-maps) or vice versa (out-maps). In addition, boundary maps can be defined to select communications that attempt to set the value of a field or to get the value of a field; these are special forms of out-maps. Each of these four options is denoted by a different event designator[11] appearing in the capture clause: in, out, gets, or sets.

[11] Strictly speaking, these are not keywords but identifiers treated in a context-sensitive way by the tool's parser; they may be used elsewhere as ordinary identifiers in the boundary map code or the source code.

Capture clauses conform to the BNF syntax shown in Figure 5.5. Each event designator is parameterized by a communication description, consisting of the name of the method (or field) being called and a set of formal arguments. Each formal argument is either a type name or the name of a formal parameter; this latter kind is described further in Section 5.3.2.

Not all capture clauses can be used in conjunction with all kinds of boundaries; even of those combinations for which a meaningful interpretation can be defined, not all are supportable in Java. The combinations consist of three dimensions: kind of event designator, communication description properties, and kind of boundary. Tables 5.1, 5.2, and 5.3 describe the treatment of each combination; the four communication kinds are represented as id (bare identifier), Q.id (qualified name), id(args) (unqualified name with formal arguments), and Q.id(args) (qualified name with formal arguments).

    Use of in   | —                      | id           | Q.id                        | id(args)                         | Q.id(args)
    Class       | ∅                      | field access | field access on nested type | method or constructor invocation | method or constructor invocation on nested type
    Interface   | ∅                      | field access | field access on nested type | ∅                                | method or constructor invocation on nested type
    Method      | method invocation      | ∅            | ∅                           | ∅                                | ∅
    Constructor | constructor invocation | ∅            | ∅                           | ∅                                | ∅
    Field       | field access           | ∅            | ∅                           | ∅                                | ∅

Table 5.1: Interpretation and support for capturing communications, for in event designators, by each kind of boundary. The symbol ∅ indicates that there is no simple interpretation for the occurrence of the given communication at the given boundary, and so the case causes the tool to indicate an error. Boldface entries indicate that capture of the given communication kind is fully supported at the given boundary. Italicized entries indicate that capture of the given communication kind is unsupported at the given boundary. Roman entries indicate that capture of the given communication kind is partially supported at the given boundary; see the main text for details.

    Use of out  | id           | Q.id         | id(args)                         | Q.id(args)
    Class       | field access | field access | method or constructor invocation | method or constructor invocation
    Interface   | field access | field access | method or constructor invocation | method or constructor invocation
    Method      | field access | field access | method or constructor invocation | method or constructor invocation
    Constructor | field access | field access | method or constructor invocation | method or constructor invocation
    Field       | field access | field access | method or constructor invocation | method or constructor invocation

Table 5.2: Interpretation and support for capturing communications, for out event designators, by each kind of boundary. Boldface entries indicate that capture of the given communication kind is fully supported at the given boundary. Roman entries indicate that capture of the given communication kind is partially supported at the given boundary; see the main text for details.

    Use of gets/sets | id           | Q.id
    Class            | field access | field access
    Interface        | field access | field access
    Method           | field access | field access
    Constructor      | field access | field access
    Field            | field access | field access

Table 5.3: Interpretation and support for capturing communications, for gets and sets event designators, by each kind of boundary. Formal arguments are not permitted in the communication descriptions for gets and sets event designators, so no columns are shown for those kinds. Boldface entries indicate that capture of the given communication kind is fully supported at the given boundary. Roman entries indicate that capture of the given communication kind is partially supported at the given boundary; see the main text for details.

For in event designators, communication descriptions can have prefixes that are identical to the names of the boundaries on which these communications are being intercepted; such prefixes are stripped from the communication description prior to considering how to deal with it. This is where the column marked "—" comes from: if the communication description consists solely of the name of the boundary on which the communication is being intercepted, I have found the final destination for the communication. For example, in-mapping communications of the form doit() on the boundary around a method named doit() indicates that the entry in the "—" column should be considered. Similarly, in-mapping communications of the form SomeClass.doit() on the boundary around a class named SomeClass results in "SomeClass" being stripped from the communication; thus, the entry in the column marked "id(args)" should be considered. However, if we in-map communications of the form SomeClass.doit() on the boundary around a method named doit(), no stripping occurs and the entry in the column marked "Q.id(args)" should be considered.

This stripping process does not have an analogue for out-mapping, where a fully qualified name cannot be interpreted until the outermost boundary named is encountered. For example, a reference to String cannot be interpreted as equivalent to a reference to java.lang.String until import declarations make this interpretation unavoidable. Since import declarations are not found until a communication has passed an outer class boundary, there is no means to gradually resolve such a qualified name.

Any communication description matching a table entry that is unsupported results in the tool signalling an error. Communications consisting of just a type name do not occur in Java, so they are unsupported; these correspond to the entries in the "—" column for classes and interfaces. Similarly, the only communications that should cross the boundary of a field, method, or constructor from the outside to the inside are those that name that field, method, or constructor, respectively; all others do not occur in Java. Field accesses cannot be in-mapped in Java, because the Java Virtual Machine does not permit code to be executed when a field is accessed [Lindholm and Yellin, 1999: §3.11.5 & §3.11.8].[12]

The interception of field accesses and method or constructor invocations is partially supported (as indicated in Tables 5.2 and 5.3) because I require that such communications actually cross the boundary.
For example, a communication that is sent directly by a method to itself is not considered to have crossed the boundary of that method. This is a subtle point: if a method doit() contains an invocation doit(), it is considered to be sending itself a communication directly. In general, a name reference is considered to cross a boundary if it does not resolve to a declaration within that boundary. The name resolution process is described below. The remaining forms of capturing communications are partially supported due to the way in which communication capture and alteration is implemented; this is discussed in Section 5.6.

[12] In-mapping of a field can be simulated by out-mapping all accesses to that field (i.e., code can be added at every location where the field is referenced); this is a potentially difficult prospect, since it requires that all source code making such accesses be available to the tool.

Name Resolution

Declarations are referenced through names; however, a given name can refer to different declarations depending upon the scope in which the name occurs. Therefore, we need a set of rules for determining how to resolve and match names. The following process is used for resolving the names in boundary maps and source code. Names are resolved with respect to a given boundary; only the declarations inside the boundary are used to resolve the names there. The following code is an example of a complete compilation unit[13] that I will use to illustrate name resolution.

    package somepackage;

    import java.util.Vector;

    public class SomeClass extends AnotherClass {
        private Vector set = new java.util.Vector();

        public SomeClass() {}

        public Object doSomething(SomeClass other, SomeClass set) {
            if(other.set.elementAt(0) == this.set.elementAt(0))
                return set.elementAt(0);
            return null;
        }

        private Object elementAt(int index) {
            return set.elementAt(index);
        }
    }

[13] A compilation unit consists of the contents of a single Java source file. In general each Java source file can contain the declaration of only one class. Some minor exceptions exist.

There are three sites in this source code that reference the set field, but each is qualified differently: one is qualified by the local variable other, one by the keyword this, and one is unqualified. Consider the following capture clauses:

    1. out(set)
    2. out(SomeClass.set)

The expressions that the tool determines as matching each of these capture clauses vary. If a boundary map containing either of these capture clauses is attached to the boundary of the entire class, none of the three sites in the source code is considered to match. Each communication description resolves to a reference to the field declaration that occurs within the boundary; therefore, these communications would not cross the class boundary.

In the case of boundary maps attached to the boundary around the doSomething method, the sites captured by each of these capture clauses are different. The expression other.set resolves to SomeClass.set since other is declared as a formal parameter of type SomeClass within the method boundary. The expression this.set resolves to set, since this is no longer needed to disambiguate the field reference from the formal parameter reference. The expression set resolves to SomeClass since this is a formal parameter reference. Therefore, capture clause #1 would match only the second expression in doSomething, this.set, while capture clause #2 would match only the first expression, other.set.

Similarly, capture clauses of the forms:

    3. out(java.util.Vector.elementAt(int))
    4. out(Vector.elementAt(int))

would capture different expressions if contained in a boundary map attached to the class boundary. Capture clause #3 would match no expressions within this class; it is not until we encounter the import statement that we know that Vector resolves to java.util.Vector, and this happens outside the class boundary. Capture clause #4 matches three expressions, though: the two within the equality test of the if-statement, and the one within the body of the elementAt method.

5.3.2  Parameterized Boundary Maps

The body of a boundary map often needs to make use of the objects passed within the communications captured by it. To enable this, boundary maps can be specified with formal parameters; these formal parameters are then bound to arguments within the communication descriptions of the boundary map's capture clause. For example, for communications sent out of a module to the Vector.elementAt(int) method and captured, one might be interested in making use of the instance of Vector upon which the method is called, the value of the int argument passed in this call, or both or neither of these. All four of these options can be expressed by boundary maps:

    Object map(Vector v): out(v.elementAt(int)) { ... }
    Object map(int pos): out(Vector.elementAt(pos)) { ... }
    Object map(Vector v, int pos): out(v.elementAt(pos)) { ... }
    Object map(): out(Vector.elementAt(int)) { ... }

The body of the boundary map can then make use of these formal parameters identically to how a method uses its formal parameters [Gosling et al., 2000: §8.4.1]. The order of the formal parameters does not matter. Thus, the third example above could equivalently be:

    Object map(int pos, Vector v): out(v.elementAt(pos)) { ... }

The tool currently requires that each formal parameter to a boundary map have its name occur exactly once within the capture clause.[14]
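The binding of formal parameters to positions in a communication description can be sketched as a simple substitution of parameter names by their declared types. This is a hedged illustration under several assumptions: the class ParamResolution and its resolve method are hypothetical, the description is assumed to include formal arguments, and only the leading qualifier segment and the argument positions are considered.

```java
import java.util.Map;

// Hypothetical sketch: replaces boundary-map parameter names that occur in a
// communication description with the types those parameters were declared with.
final class ParamResolution {
    static String resolve(String description, Map<String, String> formals) {
        int open = description.indexOf('(');
        String head = description.substring(0, open);
        String args = description.substring(open + 1, description.length() - 1);

        // Only a qualifying first segment (e.g. "v" in "v.elementAt") can be
        // a parameter name; an unqualified method name is left untouched.
        String[] segments = head.split("\\.");
        if (segments.length > 1) {
            segments[0] = formals.getOrDefault(segments[0], segments[0]);
        }

        StringBuilder out = new StringBuilder(String.join(".", segments)).append('(');
        if (!args.trim().isEmpty()) {
            String[] parts = args.split(",");
            for (int i = 0; i < parts.length; i++) {
                String name = parts[i].trim();
                if (i > 0) {
                    out.append(", ");
                }
                // A parameter name becomes its type; a type name stays as it is.
                out.append(formals.getOrDefault(name, name));
            }
        }
        return out.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, String> formals = Map.of("v", "Vector", "pos", "int");
        System.out.println(resolve("v.elementAt(pos)", formals));
    }
}
```

Under these assumptions, each of the boundary maps listed above reduces to the same pattern once its own parameter names are substituted, regardless of the order in which the parameters were declared.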
[14] This constraint can be relaxed. If the name of a given formal parameter occurs two or more times within the event designator, it implies that the objects passed in these positions must be identical. This is a minor feature, and so I have left it for future work.

The tool resolves names occurring in the capture clause as much as possible before matching it to communications occurring in the source code. Names are resolved against the formal parameters declared by the boundary map; thus, in all five examples listed above, the communication pattern that is resolved is Vector.elementAt(int). The matching process then proceeds as described in Section 5.3.1.

5.3.3  Resuming Captured Communications

Boundary maps capture particular communications. Often, we wish to add functionality to the execution of the methods receiving these communications rather than replacing the original functionality outright. However, directly re-calling the method that has been captured causes a recursive call that is itself captured by the boundary map. Thus, we need a means to avoid the resulting infinite loop. This means is the call to proceed, indicated by the expression:

    CONTEXT.proceed(args)

The args are a comma-separated list of arguments of identical type and order as the formal parameters of the boundary map. The result type of the call to proceed is that of the captured communication had it not been captured. BNF syntax for calls to proceed is given in Figure 5.6.

    PRIMARY ::= PRIMARY-NO-NEW-ARRAY | ARRAY-CREATION-EXPRESSION | call-to-proceed
    call-to-proceed ::= CONTEXT "." proceed "(" [ ARGUMENT-LIST ] ")"

Figure 5.6: BNF syntax for calls to proceed. The Java syntax for primary expressions (indicated by the non-terminal PRIMARY) [Gosling et al., 2000: §15.12] has been extended to include calls to proceed. The other non-terminals in slanted text correspond to existing Java syntax.

For example, consider the following boundary map. It captures all attempts to read elements from Vectors, ensuring that a read lock is in place before the access happens and that the read lock is released after the access happens.

    Object map(): out(Vector.elementAt(int)) {
        LockManager.getReadLock();
        // Allow vector access to proceed
        Object obj = CONTEXT.proceed();
        LockManager.releaseReadLock();
        return obj;
    }

The functionality of this particular boundary map is not concerned with the actual Vector instance on which the access is happening nor with the value of the int being passed to this method. To do this in the absence of a call to proceed would require a different boundary map, one that exposed these values to the body:

    Object map(Vector v, int pos): out(v.elementAt(pos)) {
        LockManager.getReadLock();
        // Allow vector access to proceed
        Object obj = v.elementAt(pos);
        LockManager.releaseReadLock();
        return obj;
    }

But this makes the code more complicated, runs the risk that we may inadvertently alter the values of v and pos, and causes an infinite loop because the call to Vector.elementAt within the boundary map body is recursive, and thus is captured by the boundary map as well.

If we do wish to replace the Vector object and int value passed in the call to elementAt, we could use the following boundary map.

    Object map(Vector v, int pos): out(v.elementAt(pos)) {
        LockManager.getReadLock();
        Vector special = getSpecialVector();
        Object obj = CONTEXT.proceed(special, pos + 1);
        LockManager.releaseReadLock();
        return obj;
    }

In this example, I have changed the access attempt to have it made on a different vector than the one originally called, and the position at which the access is made is one greater than the original. The call to proceed must be provided with arguments of the same types and same order as occur in the formal parameters of the boundary map.

The use of the call to proceed places an onus on the system. Boundary maps can be used to effectively introduce methods to a class that do not already exist.[15] Should such a boundary map make a call to proceed, there is no pre-existing implementation that should be executed. There are two possible options in such a situation: ignore the call to proceed, or signal an error. The tool does the latter.

5.3.4  Modifiers

Boundary maps translate communications between differing world views. Constructor, method, and field declarations can all possess modifiers. Therefore, to be a true translation, boundary maps need to cope with any difference in which modifiers are perceived to affect behaviour. The boundary map should be declared as possessing the set of modifiers expected by the calling context, and should translate these into the set of modifiers provided by the called context. For example, say that the method doit is declared only with the modifier private. If the world view of the calling context expects that doit is public, static, and synchronized, then the boundary map should be declared public, static, and synchronized while being implemented to translate this view to the reality that doit is only private. There are many issues in permitting or limiting this sort of power to change visibility that lie beyond this dissertation.
Currently, the prototype tool only supports the static modifier on boundary maps (see Figure 5.7 for BNF syntax); issues arising from their use are described in the subsection below. However, some visibility modifiers can be simulated with the tool; these are described in the subsection after next.

[15] Other techniques and languages, such as AspectJ [Kiczales et al., 2001], support method introduction.

    modifier ::= static

Figure 5.7: BNF syntax for modifiers.

The static Modifier

Methods and fields in Java can be declared as operating either in the context of a particular instance, or without such a context. The former case is the default; the latter is indicated by the presence of the static modifier on the declaration of the method or field. Likewise, boundary maps can be declared as static.

Specifying a boundary map without a static modifier implies two things. First, the communication being captured must possess an instance context; for an out-map, this is the instance context from which the communication was sent. Second, this captured instance context can be used within the body of the boundary map via the keyword this just as is the case with method bodies.

Specifying a boundary map with a static modifier indicates that no instance context is required of the communications captured by the boundary map. Consequently, the keyword this cannot be used within the body of such a boundary map; its presence will cause a compile-time error.

A non-static boundary map cannot capture communications that do not provide an instance context. A static boundary map can capture communications that provide an instance context; this instance context will be discarded. This discarding procedure is consistent with the operation of Java itself [Gosling et al., 2000: §15.12.1].

Simulating Visibility Modifiers

The prototype tool ignores issues of the standard visibility modifiers.
For example, it does not take into account the fact that calls to private members cannot cross class boundaries, so any in-mapped method is considered public. Consider the following simple class:

    public class Simple {
        private void doit() {}
    }

We may in-map calls to the doit method, with or without adding behaviour there. This effectively causes doit to appear to be public from the perspective of the external context. So, for example, we might specify a boundary map to be attached to Simple as:

    void map(): in(doit()) {
        CONTEXT.proceed();
    }

The tool adds a method to Simple that invokes the original implementation of doit; thus, this added method can access all the private methods of Simple. The external context would then see the declaration of Simple as:

    public class Simple {
        public void doit() {}
    }

Translation of visibility modifiers can be mimicked to a degree. Such mimicry is useful when we have two independent world views where a given method is considered private in one world view but public in the other. If we wished to make the method doit visible only within the boundary, we could in-map calls to it in the following way:

    void map(): in(doit()) {
        throw new IllegalAccessError();
    }

Instances of the class IllegalAccessError are thrown by the Java Virtual Machine when an attempt is made to invoke a method from a context that is not permitted access to it; this implementation mimics that behaviour. This approach is not totally satisfactory, as modules containing calls to doit will still compile; only at run-time will they fail.

Out-mapping can similarly be used to alter effective interfaces.
Imagine that we have a class Client that calls doit:

    public class Client {
        public void someMethod() {
            Simple simple = new Simple();
            simple.doit();
        }
    }

We can out-map this call to mimic the effect of a public modifier on doit:

    void map(): out(Simple.doit()) {
    }

If we wished to have this out-map make a call to proceed, though, problems would occur because of the way boundary maps are implemented by the tool. Boundary maps are implemented within the scope of the boundary to which they apply. The in-map is implemented within the scope of the Simple class and, thus, is able to access the doit method even though this is private. The out-map is implemented within the scope of the Client class and, thus, doit remains private from its perspective. This behaviour is not a deliberate design. Visibility control remains an issue for future work.

5.3.5  Communication History

Communication history is, conceptually, a complete record of every piece of information passed between modules. The prototype tool deals strictly with method and constructor invocations, though. These are recorded in two fashions: on a causal basis, in terms of a tree structure consisting of calls and returns for each thread of control within an execution; and on a temporal basis, irrespective of causality.

A means of querying communication history on either basis is supported by the prototype tool, through an interface on a specially provided class called History and its attendant helper, Call. For each invocation, the parameters passed can be retrieved through the appropriate queries. This is meant literally: in Java, objects are passed by reference; therefore, the reference that has been passed can be retrieved, not necessarily the original state behind that reference.

There are several methods defined on History for querying communication history, as listed in Table 5.4; these are all static methods.
Most of these allow the thread, class, method name, and method formal parameter types to be optionally specified; null arguments are treated as though any value in the corresponding position matches. All these methods return an instance of Call if a valid match is found, or null otherwise. A sample method, lastCall, is shown and described in Figure 5.8. The methods defined on the Call class are shown in Table 5.5. The complete interface to the History and Call classes is described in Appendix D.

    Method                 | Search kind
    firstCall              | either
    lastCall               | either
    lastCallAnySubclass    | either
    lastCallInCFlow        | causal
    lastCallReturningClass | either
    lastInstancePassed     | temporal
    mostRecentCall         | temporal

Table 5.4: The query methods defined on the History class. Each method searches causally or temporally. Those methods marked as "either" have forms that can operate in either of these fashions. The interface is described in detail in Appendix D.

    Result type      | Method name
    java.lang.String | getMethodName()
    java.lang.Thread | getThread()
    java.lang.Object | getReturnValue()
    java.lang.Object | getTarget()
    java.lang.Class  | getTargetClass()
    java.lang.Object | getParameter(int i)
    Call             | getPredecessor()
    Call             | getParent()
    boolean          | precedes(Call other)

Table 5.5: The interface to the Call class. The interface is described in detail in Appendix D.

    lastCall

    static Call lastCall(Thread thread, Class cls, String mth, String[] paramTypes, Object returnVal, Call relativeTo)

    Searches for and returns the most recently occurring call conforming to the search parameters passed as arguments; returns null if no call can be found that matches these parameters.

    Parameter  | Description | Default
    thread     | The thread of control whose tree should be searched, in the case of causal searches; or null if temporal searches should be performed | current thread
    cls        | The exact class on which the executed method was defined; or null if the class is not of interest |
    mth        | The name of the executed method; or null if the method is not of interest |
    paramTypes | An array containing the names of the types of the formal parameters as they appear lexically in the method declaration; or null if the parameter types are not important |
    returnVal  | The object that must have been returned by the method execution; or null if the returned object is not of interest | null
    relativeTo | The call at which the search should begin; the predecessors of this call are compared, in temporal order, to the search parameters | mostRecentCall()

Figure 5.8: A sample query method on the History class. The slanted formal parameters are optional; that is, the lastCall method is overloaded with forms missing these parameters, in all combinations. If an argument is not provided that corresponds to an optional formal parameter, the default value indicated in the table is used.

To examine the operation of communication history and its query interface, consider the following toy program.

    public class Test {
        public static void main(String[] args) {
            Test test = new Test();
            test.printIt(1);
            test.printIt(2);
            aMethod();
            test.printIt(3);
        }

        public void printIt(int number) {
            System.out.println(number);
        }

        public static int aMethod() {
            Test otherTest = new Test();
            otherTest.printIt(2);
            return 0;
        }
    }

Executing this program causes Test.main to be invoked initially. A series of numbers is printed to the standard output in the order: 1, 2, 2, 3. The communication history just before the program terminates is shown in Figure 5.9.
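The two orderings kept by communication history, causal and temporal, can be sketched with a small hand-rolled recorder. This is not the tool's History or Call implementation: the classes MiniHistory and MiniCall are hypothetical, the sketch handles a single thread only, calls are recorded by hand rather than by instrumentation, and matching is by method name alone. The search is assumed to start at the given call itself and then walk back through its temporal predecessors.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for one recorded call.
final class MiniCall {
    final String method;
    final MiniCall parent;       // causal link: the call that made this one
    final MiniCall predecessor;  // temporal link: the call recorded just before
    Object returnValue;          // filled in when the call returns

    MiniCall(String method, MiniCall parent, MiniCall predecessor) {
        this.method = method;
        this.parent = parent;
        this.predecessor = predecessor;
    }
}

// Hypothetical single-threaded recorder with one query method.
final class MiniHistory {
    private final Deque<MiniCall> stack = new ArrayDeque<>();
    private MiniCall mostRecent;

    void enter(String method) {
        // peek() yields the current caller, or null for the outermost call.
        MiniCall call = new MiniCall(method, stack.peek(), mostRecent);
        stack.push(call);
        mostRecent = call;
    }

    void exit(Object returnValue) {
        stack.pop().returnValue = returnValue;
    }

    MiniCall mostRecentCall() {
        return mostRecent;
    }

    // Temporal search: starts at 'from' and walks back over predecessors.
    MiniCall lastCall(String method, MiniCall from) {
        for (MiniCall c = from; c != null; c = c.predecessor) {
            if (c.method.equals(method)) {
                return c;
            }
        }
        return null;
    }

    // Replays the calls of the toy program by hand, then queries them.
    static String demo() {
        MiniHistory h = new MiniHistory();
        h.enter("main");
        h.enter("printIt"); h.exit(null);   // test.printIt(1)
        h.enter("printIt"); h.exit(null);   // test.printIt(2)
        h.enter("aMethod");
        h.enter("printIt"); h.exit(null);   // otherTest.printIt(2)
        h.exit(0);                          // aMethod returns 0
        h.enter("printIt"); h.exit(null);   // test.printIt(3)

        MiniCall call = h.lastCall("printIt", h.mostRecentCall());
        call = h.lastCall("printIt", call.predecessor);
        call = call.parent;
        return call.method + ": " + call.returnValue;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the replay, main is never exited, mirroring the fact that its execution is still in progress when the history is examined.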
We can see the effects of communication history by applying the following mapset to the boundary of main:

    mapset findIt {
        static void map(): in(main(String[])) {
            CONTEXT.proceed();
            String[] params = new String[] { "int" };
            Call call = History.lastCall(Test.class, "printIt", params);
            call = History.lastCall(Test.class, "printIt", params, call.getPredecessor());
            call = call.getParent();
            System.out.println(call.getMethodName() + ": " + call.getReturnValue());
        }
    }

    apply findIt to Test.main(String[]);

This boundary map begins with a call to proceed, which executes the original functionality of main. It then retrieves the most recent call to printIt, which can be seen from Figure 5.9 to be the lowermost call on the diagram. Tracing back over the sequence of dashed arrows, the boundary map finds the call to printIt that occurs most recently prior to this one, which is the rightmost call on the diagram. The parent of this second most recent call to printIt is retrieved by tracing back over the solid arrow. Finally, the boundary map prints the name of this parent and the result it returns, as:

    aMethod: 0

At the time of developing the boundary map, we may know that the call we are interested in retrieving is the last call to aMethod.
In this case, a simpler query to communication history could be used in the boundary map:

    mapset findIt2 {
        static void map(): in(main(String[])) {
            CONTEXT.proceed();
            String[] params = new String[0];
            Call call = History.lastCall(Test.class, "aMethod", params);
            System.out.println("aMethod: " + call.getReturnValue());
        }
    }

This locates the same, ultimate call of interest and produces the same output as the earlier version.

[Figure 5.9 (diagram): six call records on thread main and class Test, each with slots for thread, class, method, target, parameter types, parameter values, result type, and return value. The records are main(String[]), whose result type and return value are marked "in progress"; printIt(1), printIt(2), and printIt(3), called from main with result type void; aMethod(), called from main with result type int and return value 0; and printIt(2), called from aMethod.]

Figure 5.9: An example communication history. The diagram represents the actual data structure used by the tool, so even though, for example, it makes no sense to have a return value when the return type is void, there is always a slot for both. The solid arrows indicate the causal relationships between the calls; these arrows lead from the callers to the callees with later callees lower on the diagram. The dashed arrows indicate the temporal ordering of the calls, leading from earlier to later calls. Since the execution of main has not finished, its result is not recorded (represented as "in progress" in the diagram, a garbage value). Calls to constructors have been left off the diagram. The target objects are actually stored as references; the names test and otherTest are shown here for readability.

I chose to provide the communication history functionality in this class-based fashion for the sake of ease of development. Since it was initially unclear how communication history was to operate and be maintained, I decided that a class-based approach would allow maximum flexibility with minimum changes needed to the tool as the research progressed. The interface that has been implemented for communication history is by no means complete. For example, there are currently no query methods that allow one to specify that a call should be found that contains a particular object as an argument, and there is currently no method for determining the number of parameters on a returned call. Section 5.6.2 discusses the drawbacks of the current implementation approach, and Section 7.2 describes how an improved version of the tool should implement it to avoid such drawbacks.

5.3.6  Renaming Maps

The boundary maps described thus far are based upon the notion that communications are captured as they cross boundaries, and that an operational description of the alteration of these communications can be specified. The pragmatics of Java make the use of this model difficult for altering communications involving type references, such as static method invocations (e.g., capturing the reference to System within an invocation to System.loadLibrary). Type references are not first-class entities in Java, so replacing one type reference with another is not possible procedurally.
Although Java provides support for introspection, one cannot easily transform a static type reference embedded within a potentially complex expression into a different expression that uses introspection. Instead, the prototype tool provides a second kind of boundary map: renaming maps. A renaming map specifies a target name to be matched and a replacement name that is inserted in its stead. For example, if we wished to replace all references to java.lang.System, within a boundary, with a reference to SomeClass, we would use the following renaming map:

    java.lang.System -> SomeClass;

This replaces only explicit occurrences of the string "java.lang.System"; implicit references through the string "System" will not match. That is, name resolution does not occur. In general, the tool allows any type name (including exception types) to be replaced with any other type name (although array types are not supported). However, Java syntax is ambiguous: type names and field accesses have identical syntax. If a name does not resolve at a given boundary, we often cannot tell whether it refers to a type or a field. Therefore, the tool considers any unresolved name to be a fair target of renaming maps. BNF syntax for renaming maps is provided in Figure 5.10.

    renaming-map        ::=  possible-type-name "->" possible-type-name ";"
    possible-type-name  ::=  [ qualifier ] IDENTIFIER

Figure 5.10: BNF syntax for renaming maps.

Renaming maps do not, by themselves, ensure the type compatibility of all expressions upon which they act. A local variable whose type is altered by a renaming map may have assigned to it an expression whose type is not assignable to the replacement type.
For example, consider the following source method and renaming map:

    public void someMethod() {
      String s = "test string";
    }

    String -> mypackage.String;

The body of someMethod, after applying the renaming map, effectively becomes:

    mypackage.String s = "test string";

However, this statement is illegal because an instance of java.lang.String is not assignable to a variable of type mypackage.String since the former is not a subtype of the latter. If the tool allowed assignment operations to be captured and altered (which it does not), the assignment could presumably be replaced; otherwise, the resulting module will not compile.

    tool-directive  ::=  apply IDENTIFIER to [ qualifier ] IDENTIFIER ";"

Figure 5.11: BNF syntax for tool directives.

This behaviour is peculiar to java.lang.String since it is the only class for which literals are defined within Java. I did not consider this significant enough of a problem to deal with in the prototype. More generally, one would need to ensure that the types of both sides of assignments were compatible. The tool does not currently flag such constraint violations, but allows the compiler (used after the tool has produced transformed source) to catch them. The presence of such incompatibilities means that the translation between world views internal and external to the module is incomplete.
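The purely lexical behaviour of renaming maps can be illustrated with a small sketch. This is not the prototype's implementation; it is a string-level approximation written for illustration, and the names RenamingSketch and rename are hypothetical. Because it works on raw text, it would also touch matching names inside comments and string literals, which a parser-based tool need not do. A name matches only where it appears in full, so a bare "System" is untouched by a map on java.lang.System, and "mypackage.String" is untouched by a later map on String.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RenamingSketch {
    /**
     * Replaces whole-name occurrences of target with replacement.
     * No name resolution is attempted: the match is purely textual.
     */
    public static String rename(String source, String target, String replacement) {
        // Reject matches that are only part of a longer or more qualified name.
        Pattern p = Pattern.compile(
            "(?<![\\w$.])" + Pattern.quote(target) + "(?![\\w$])");
        return p.matcher(source).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Only the explicit, fully qualified reference is replaced.
        String src = "java.lang.System.loadLibrary(n); System.exit(0);";
        System.out.println(rename(src, "java.lang.System", "SomeClass"));

        // A second map on "String" finds nothing once the first has been applied.
        String decl = "String s = \"test string\";";
        String once = rename(decl, "String", "mypackage.String");
        System.out.println(once);
        System.out.println(rename(once, "String", "java.lang.String"));
    }
}
```

The result of the second example, mypackage.String s = "test string";, is exactly the kind of statement that the compiler would then reject, since nothing in the renaming step checks assignability.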
5.4  Tool Directives

Tool directives are statements telling the tool which mapsets to attach to which boundaries, like the following:

    apply someMapSet to SomeClass;

Each tool directive consists of two parts: the name of the mapset (someMapSet, in this case) to apply to a boundary, and the name of the boundary itself (SomeClass). BNF syntax for tool directives appears in Figure 5.11.

The prototype tool allows three kinds of boundaries to be named within tool directives: whole classes, individual methods and constructors, and individual fields. This is not to say that implicit context is fully supported at all these boundaries; see Section 5.3.1 for more detail. If the tool is unable to locate the named boundary within the files provided to it, it prints a warning to this effect.

Tool directives reside in the global namespace. This means that mapsets and boundaries must be referenced within tool directives via global names, i.e., they must be fully qualified. For example, the class JavaOutlinePage of Eclipse occurs in the package org.eclipse.jdt.internal.ui.javaeditor; thus, it must be referred to as org.eclipse.jdt.internal.ui.javaeditor.JavaOutlinePage within a tool directive.

Fields are referenced by appending their names to the fully qualified name of their defining class. Methods also are referenced by appending their names to the fully qualified name of their defining class, but are more problematic because of the formal parameter list of each. To match a method, the tool lexically matches the types of the formal parameters as specified within the source and within the tool directives; no resolution of the types found in either formal parameter list is performed prior to matching. For example, consider this simple example of Java source:

    package somepackage;

    import otherpackage.SomeClass;

    public class AnotherClass {
      public AnotherClass(SomeClass arg) {}
      public void doit(SomeClass arg) {}
    }

and this tool directive:

    apply someMapSet to
      somepackage.AnotherClass.doit(otherpackage.SomeClass);

Although the tool would search within AnotherClass here to find the doit method, it would not find any matches since the lexical declaration of doit says that it takes a formal parameter of type SomeClass and not otherpackage.SomeClass. The tool only uses the information provided within the files passed to it, but it does not assume that the information contained there forms a closed universe. Therefore, I chose the parameter type matching rule to avoid the need for disambiguating so-called on-demand import statements, such as

    import otherpackage.*;

In the presence of this alternative statement, SomeClass could resolve to SomeClass, otherpackage.SomeClass, somepackage.SomeClass, or even java.lang.SomeClass, even though the import statement says that SomeClass resolves to otherpackage.SomeClass.

Constructors are handled similarly to methods. The name of a constructor is the same as the unqualified name of its class.
For example, the constructor shown above for AnotherClass would be referenced in a tool directive as:

    somepackage.AnotherClass.AnotherClass(SomeClass)

Since tool directives are separated from the modules they affect, I chose to place tool directives in the global namespace; placing them inside particular source modules seems inappropriate. This choice has additional consequences: not all boundaries are easily named from the global namespace. For example, Java possesses constructs such as local classes whose names are not well-defined outside their local scopes, and anonymous classes which have no names at all. While we can define an arbitrary rule to construct their names (such as a numbering scheme based on their lexical order within a globally named scope), these resulting names tend to be unstable: if we merely reorder the source code, the names will change. Similar problems would arise should we choose to support sub-method-level boundaries, such as boundaries around individual statements. It is possible that specifying boundaries through operational means (defining properties or conditions that need to be met) would allow such problems to be overcome. However, it is not clear at this point whether separate boundaries around such difficult-to-name modules are needed.

5.5  Sequencing Boundary Maps

More than one boundary map can readily be defined that attempts to capture a given communication. In these situations, it is important to have an explicit sequence in which boundary maps are to apply. A total order of boundary maps is defined for the prototype tool in the following fashion: of greatest precedence is the order in which the files containing tool directives are specified to the tool, followed by the order of the tool directives within a particular file, followed by the order of boundary maps within a mapset named by a tool directive.
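The precedence rule just stated amounts to a lexicographic comparison over three positions, with each transformation applied to the result of the previous one. The following is a minimal sketch of that idea only; the class MapOrderSketch, its Entry type, and the string-appending stand-in transformations are all hypothetical and say nothing about how the prototype represents boundary maps internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;

public class MapOrderSketch {
    /** One boundary map, its position, and a stand-in source transformation. */
    public static final class Entry {
        final int fileIndex;       // order in which the directive file was given to the tool
        final int directiveIndex;  // order of the tool directive within that file
        final int mapIndex;        // order of the boundary map within the named mapset
        final UnaryOperator<String> transform;

        public Entry(int fileIndex, int directiveIndex, int mapIndex,
                     UnaryOperator<String> transform) {
            this.fileIndex = fileIndex;
            this.directiveIndex = directiveIndex;
            this.mapIndex = mapIndex;
            this.transform = transform;
        }
    }

    /** File order first, then directive order, then order within the mapset. */
    static final Comparator<Entry> PRECEDENCE =
        Comparator.comparingInt((Entry e) -> e.fileIndex)
                  .thenComparingInt(e -> e.directiveIndex)
                  .thenComparingInt(e -> e.mapIndex);

    /** Applies every map in precedence order; each sees the previous output. */
    public static String applyAll(String source, List<Entry> entries) {
        List<Entry> ordered = new ArrayList<>(entries);
        ordered.sort(PRECEDENCE);
        String current = source;
        for (Entry e : ordered) {
            current = e.transform.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry(1, 0, 0, s -> s + " C"),
            new Entry(0, 1, 0, s -> s + " B"),
            new Entry(0, 0, 1, s -> s + " A2"),
            new Entry(0, 0, 0, s -> s + " A1"));
        System.out.println(applyAll("src", entries)); // prints "src A1 A2 B C"
    }
}
```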
The tool blindly applies the boundary map of greatest precedence to the source before considering where or how the second most precedent boundary map is to apply. The source that is input to the tool for matching against the second boundary map is the output of the first. For example, consider the following source, mapset, and tool directive.

    public class SomeClass {
      public void method1() {
        System.err.println("In method1()");
      }

      private void method2() {
        String s = "test string";
      }
    }

    mapset example {
      void map(): in(method1()) {
        System.err.println("Before method1()");
        CONTEXT.proceed();
      }

      void map(): in(method1()) {
        System.err.println("Ignoring method1()");
      }

      String -> mypackage.String;
      String -> java.lang.String;
    }

    apply example to SomeClass;

The tool would apply the first in-map on method1 first, followed by the second in-map on method1. As a result, calls to method1 would result in the string "Ignoring method1()" being printed. Next would come the first renaming map, which would result in the type of the local variable s becoming mypackage.String. Finally, the second renaming map would be applied, but would not find any matches for "String" since the first renaming map has already replaced these.

5.6  Implementation Issues

The previous sections within this chapter have largely described the features of the prototype tool from the perspective of a system developer using it. However, this leaves the question of how the tool actually performs its tasks, a question that leads to subtle properties of behaviour to which I have, so far, merely alluded.

Recall that every boundary map possesses a body that can contain fairly arbitrary Java source code. Once a boundary map has captured a communication, its body indicates how that communication is to be altered.
However, this is a model that must operate in the context of the realities of Java, where one cannot literally intercept communications (especially to non-existent methods) as they cross boundaries, and where it would be very inefficient to actually do so. Instead, as indicated in Section 5.2, the tool performs a multistage process before compile-time on the three kinds of inputs provided to it.

I begin, in Section 5.6.1, by examining how the tool combines boundary maps and Java source to simulate the interception of communications crossing boundaries. Section 5.6.2 looks at instrumentation of the resulting source to support communication history, and the effects that sequencing multiple boundary maps has upon this support.

5.6.1  Combining Boundary Maps and Java Source

The tool combines specified boundary maps with the source within specified boundaries. The way it goes about this depends on two things: the kind of boundary and the kind of boundary map to be combined. I begin by looking at the combination process for in-maps, followed by the combination process for out-maps (including gets and sets event designators). Table 5.1 indicates the support provided by the tool for various combinations of kinds of communication descriptions and boundaries.

The in-mapping of method invocations results in two possible situations: either the method is already present, or it is not. If it is already present, I change the name of the method to hide it and any references to that method within the boundary are changed to the new name of the method.¹⁶ In both cases, the body of the boundary map is inserted as the body of a new method, with the original name of the replaced method. The result type and context-kind of this new method are taken from the result type and context-kind of the boundary map; after all, the purpose of the boundary map may be to alter these properties to the perspective of a different world view.
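To make the already-present case concrete, the following is a hand-written approximation of what a class might look like after a single in-map on method1(), using the SomeClass example from Section 5.5. It is not output from the tool. The generated name follows the "__CONTEXT$inner" naming scheme that the chapter describes, the caller() method is a hypothetical addition to show a reference from inside the boundary, and a list stands in for System.err so the effect can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

public class InMapSketch {
    // Collects output so that the effect of the transformation can be inspected.
    public static final List<String> log = new ArrayList<>();

    // Approximation of SomeClass after an in-map on method1() whose body
    // logs a message and then calls CONTEXT.proceed().
    public static class SomeClass {
        // The original method1(), hidden under a generated name.
        public void __CONTEXT$inner0() {
            log.add("In method1()");
        }

        // New method with the original name; its body is the boundary map body.
        public void method1() {
            log.add("Before method1()");
            __CONTEXT$inner0(); // stands where CONTEXT.proceed() appeared
        }

        // Hypothetical reference from inside the boundary: it originally called
        // method1() and is redirected to the new name, bypassing the map body.
        public void caller() {
            __CONTEXT$inner0();
        }
    }

    public static void main(String[] args) {
        SomeClass c = new SomeClass();
        c.method1(); // call arriving from outside the boundary: map body runs
        c.caller();  // reference within the boundary: original body only
        System.out.println(log);
    }
}
```

The design consequence worth noticing is that callers outside the boundary see only the new method1(), while code inside the boundary continues to reach the original behaviour under its hidden name.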
The new method is inserted at the level of the closest enclosing class, even when the boundary map is applied to a method boundary.¹⁷

In-mapping constructors is more complicated, because we are stuck with the name of the constructor. Instead, we can play with the number and types of parameters that the constructor takes; dummy parameters can be inserted to effectively hide the original constructor, with appropriate dummy arguments inserted in calls to it within the boundary. This has not been fully implemented in the tool.

Attempts to capture qualified method or constructor invocations can be problematic. When applied to the boundary of a type, they are interpreted as method or constructor invocations on nested types. This is done as in the non-qualified case as long as the nested types are already present. I deemed it too complicated to introduce new classes correctly for it to be worthwhile doing so in the prototype; it is not clear from where some details about the new class would come.

Out-maps are generally more straightforward. Again, the body of the boundary map is inserted as a new method with an obfuscated name.¹⁸ All call sites (or field access sites) within the boundary matching the communication description are replaced by calls to the new method.

¹⁶ The new name consists of the prefix "__CONTEXT$inner" followed by a number, which starts at 0 for the first changed method, 1 for the second, etc.
¹⁷ One cannot embed a method directly inside of another method in Java; it is not clear what the semantics of such an embedding would be.
¹⁸ Here, the name consists of the prefix "__CONTEXT$map" followed by a number.

Out-mapping constructors has a few more twists to it. In Java, there are two components to an invocation of a constructor. First, space is allocated to the new instance via use of the new operator, then this space is initialized through the actual constructor invocation.
If we were to only capture the latter component, we would need to constrain our out-map to ensure that it consisted of only a different constructor invocation. Instead, the tool allows for the capture of the new operator itself, through a special syntax. To capture constructor invocations of the form new somepackage.SomeClass(args), one describes the communication as somepackage.SomeClass.new(args). Each entire instantiation expression within the boundary is replaced in this situation.

If the boundary map body contains a call to proceed, this must be replaced with an invocation of the original method (via its new name, in the case of in-maps) or original constructor, or with an access of the original field. The communication arguments that are exposed to the boundary map body are replaced with whatever new values are in place, and the other arguments are unchanged. If no such original target exists, the tool signals an error.

The tool implements renaming maps by replacing all the occurrences of a name within a boundary with a different name in a straightforward fashion.

5.6.2  Instrumentation, Communication History, and Sequencing of Boundary Maps

In order to reflect upon the history of communications made within a system, we need both a means to record the communications made within that system, and a means to access this record. My proof-of-concept implementation of communication history for Java stores method calls and method returns within a data structure consisting of one tree per thread of control, and a doubly-linked list through the nodes in all these trees to indicate the temporal order of the events therein. Interfaces to this data structure are provided for the system developer to utilize communication history; see Section 5.3.5 for details.
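The shape of such a record can be sketched as follows. This is an illustrative approximation, not the prototype's History and Call classes: the names HistorySketch, CallNode, enter, and exit are hypothetical, threads and classes are identified by plain strings, and parameter types and values are omitted. Each node sits in a per-thread call tree (through caller and callees) and in a single temporal list (through earlier and later); a query walks the temporal list backwards from the most recent call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HistorySketch {
    /** One recorded call: a node in its thread's tree and in the temporal list. */
    public static final class CallNode {
        public final String thread;
        public final String className;
        public final String methodName;
        public final CallNode caller;                         // causal parent, null for a root
        public final List<CallNode> callees = new ArrayList<>();
        public CallNode earlier;                              // previous call in time, any thread
        public CallNode later;                                // next call in time, any thread
        public Object returnValue;
        public boolean finished;

        CallNode(String thread, String className, String methodName, CallNode caller) {
            this.thread = thread;
            this.className = className;
            this.methodName = methodName;
            this.caller = caller;
        }
    }

    private final Map<String, Deque<CallNode>> active = new HashMap<>(); // calls in progress
    private final Map<String, List<CallNode>> roots = new HashMap<>();   // one tree per thread
    private CallNode latest;                                             // tail of temporal list

    /** Records a method entry on the given thread. */
    public synchronized CallNode enter(String thread, String className, String methodName) {
        Deque<CallNode> stack = active.computeIfAbsent(thread, t -> new ArrayDeque<>());
        CallNode caller = stack.peek();
        CallNode node = new CallNode(thread, className, methodName, caller);
        if (caller == null) {
            roots.computeIfAbsent(thread, t -> new ArrayList<>()).add(node);
        } else {
            caller.callees.add(node);
        }
        node.earlier = latest;
        if (latest != null) {
            latest.later = node;
        }
        latest = node;
        stack.push(node);
        return node;
    }

    /** Records the exit of the innermost call in progress on the given thread. */
    public synchronized void exit(String thread, Object returnValue) {
        CallNode node = active.get(thread).pop();
        node.returnValue = returnValue;
        node.finished = true;
    }

    /** Most recent call matching the names; a null className matches any class. */
    public synchronized CallNode lastCall(String className, String methodName) {
        for (CallNode n = latest; n != null; n = n.earlier) {
            if ((className == null || className.equals(n.className))
                    && methodName.equals(n.methodName)) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HistorySketch h = new HistorySketch();
        h.enter("main", "Test", "main");
        h.enter("main", "Test", "aMethod");
        h.exit("main", 0);
        System.out.println("aMethod: " + h.lastCall("Test", "aMethod").returnValue);
    }
}
```

In an instrumented method, the call to enter would sit at the start of the body and the call to exit in a finally block, so that the record stays consistent when an exception propagates. Note also that nothing here ever discards a node, which mirrors the unbounded memory growth of a record that is kept for the whole execution.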
To store calls and call returns in the tree, I defined two snippets of code to instrument the methods in a system, one that is executed at the start of each method and one that is executed at the end of each method. These snippets need to worry about operating correctly in the presence of exceptions, and thus, tend to be long and ugly; Appendix E shows a simple example in all its gory detail.

Every method in the classes operated on by the tool is instrumented. This means that every method invocation causes information to be stored in the communication history record, and it is stored for the entire duration of the execution of the system. This is obviously inefficient both in terms of execution speed and memory requirements. There are many cases in which it is fairly simple to detect when such information is not needed and thus does not need to be stored in the first place, or situations in which the information is no longer needed and can be discarded. These ideas are discussed further in Chapter 7.

More subtle problems arise from the fact that instrumentation does not occur until after all boundary map transformations have occurred, and that only method entries and exits are instrumented, not the call sites themselves. Consider the following source, mapset, and tool directive.
    public class SomeClass {
      public static void main(String[] args) {
        OtherClass obj = new OtherClass();
        for(int i = 0; i < 3; i++)
          obj.method1();
      }
    }

    public class OtherClass {
      public void method1() {
        method2();
      }

      public void method2() {
        String s = "a string";
        System.err.println(s.length());
      }
    }

    mapset example {
      void map(): in(method1()) {
        Call call = History.lastCall(null, "method2", null);
        if(call == null)
          CONTEXT.proceed();
        else
          System.err.println("method2() called before");
        Call call2 =
          History.lastCall(String.class, "length", null);
        System.err.println(call2 != null);
      }

      void map(): in(method2()) {
        CONTEXT.proceed();
      }
    }

    apply example to OtherClass;

Implicit context is being used here (through the first boundary map) to change the behaviour of the system such that method2 is invoked at most once. The second boundary map is present simply to illustrate the problem discussed in the next paragraph; conceptually, its effect should be idempotent. Before implicit context was applied, running SomeClass would have caused the following to be printed:

    8
    8
    8

We would expect that, after applying implicit context, this would become:

    8
    true
    method2() called before
    true
    method2() called before
    true

However, this will not happen under the prototype tool. Instead, this will be printed:

    8
    false
    8
    false
    8
    false

Appendix E shows the code that results from the application of the tool on this example. There are two problems, as described below.

The in-map on method2 causes it to be renamed to __CONTEXT$inner1 and the call site within method1 is changed to this new name. A new method is inserted in OtherClass called method2 that delegates to __CONTEXT$inner1, replacing the call to proceed within the boundary map. But when the resulting class is
But when the resulting class is 100  instrumented, the method that is getting invoked each time is not called method2 but __CONTEXT$ i n n e r 1—we have changed the name out from under our communication history query. This is why "method2 () c a l l e d b e f o r e " is never printed. This problem is solved by having the tool differentiate between original and mapped names better. Which name to use in a given situation depends on the world view(s) under consideration, and the tool needs to take this into account. The other problem arises from the fact that the call site within method2, invoking S t r i n g . l e n g t h (), is not instrumented and the source for S t r i n g itself has not been fed into the tool. Therefore, the query to find the last call on S t r i n g . l e n g t h () always fails, returning n u l l . This problem is simple to fix: at a minimum, one could instrument every call site. Better solutions exist (see Chapter 7). All these difficulties arise from the simplistic implementation of communication history that I chose. This sufficed for my proof-of-concept investigations (see Chapter 6), but does not suffice for a non-prototype version of the tool. A stronger implementation of communication history will be based on the discussion in Chapter 7; such an implementation would address the problems outlined above.  5.7  Summary  A system developer must describe four things in specifying and using my prototype tool: (1) the Java source code on which implicit context is to operate; (2) to which boundary or boundaries it must be attached; (3) which communications it is to intercept; and (4) how it should alter the intercepted communications. Items 3 and 4 are provided by the boundary map construct; multiple boundary maps are organized into,named mapsets. Item 2 is provided by tool directives, which name mapsets and the boundaries to which they are to apply. 
The prototype tool performs a multi-stage process before compile-time on the three kinds of inputs provided to it (namely, tool directives, boundary maps, and original Java source):

1. The boundary maps and original Java source are parsed and stored as abstract syntax trees (ASTs).

2. Each tool directive is parsed to identify the mapset and boundary named in it. These names are matched against the ASTs.

3. Assuming that the mapset and boundary matching the names within the tool directive are found, each boundary map within the mapset is applied to the boundary, in the lexical order in which they occur. After each transformation, the transformed source AST is used as the input for the next transformation.

4. After all the tool directives have been carried out, the transformed source AST is instrumented to support communication history queries.

5. The transformed source AST is unparsed to produce a new set of Java source, ready for compilation.

Boundary maps consist of: a capture clause, describing the set of communications to be intercepted at the boundary to which the boundary map is attached; a body, specifying how the intercepted communications are to be altered; a set of formal parameters, bound to parts of the intercepted communications and exposed for use within the body; and various other minor parts, such as a result type, an optional throws clause, and/or a static modifier.

Communication history can be used within boundary map bodies. It is supported through a set of methods on two special classes, called History and Call. Communication history is supported by instrumenting the transformed source code in order to record every method entry and exit. This simplistic approach has several drawbacks, including impact on the speed of execution, and ever-increasing memory demands the longer the execution continues. I consider solutions to these problems and to the other shortcomings of the tool in Chapter 7.
Chapter 6

Validation

I now consider the validation of the hypothesis that expressing essential structure through implicit context eases software evolution and software reuse. I performed a set of case studies to provide evidence in support of several properties involved in easing software evolution and reuse.

Good design can go a long way to easing software evolution and reuse. But "no matter how ambitious the intentions of the designers..., software designs tend to erode over time" [van Gurp and Bosch, 2002]. State-of-the-art design and implementation practices that do provide a measure of evolvability and reusability come at the cost of explicitly anticipating and planning for specific kinds of change. Implicit context attempts to allow us to do away with much of this cost, focusing on immediate system needs, not potential future ones.

The changes that are actually needed in our systems over time often differ from those that we anticipate. State-of-the-art design and implementation practices generally handle unanticipated change poorly, requiring invasive modifications of the details of our systems, a risky and difficult prospect.¹ Implicit context helps us to cope with unanticipated change by specifying modifications external to our modules.

¹ Invasive modifications involve making a change within the body of a module, as opposed to externally specifying a change.

State-of-the-art design and implementation practices also tend to increase the complexity of individual modules for the sake of greater overall flexibility, as described in Chapter 2. Implicit context allows us to avoid this complexity in our local modules, moving it to more global scopes where it can be manipulated directly as the system evolves. In other words, implicit context allows us to express the local concerns of each module locally, separating the more global concerns.

At an early stage of the work on the ideas in this dissertation, Walker and Murphy [2000] performed a case study and published its results. It demonstrated that implicit context could be used to restructure a system to isolate a particular module from an unwanted feature, while continuing to have the system as a whole provide that feature. Specifically, the pluggable look-and-feel feature of the Java Swing graphical user interface library [Fowler, 1998] was removed from the JButton class, which provides the functionality of a button widget. This case study addressed the concept of essential structure in only its most nascent form; as such, I do not discuss it further within this dissertation.

For this dissertation, I performed two case studies, each dealing with a subset of the properties discussed above; each case study involved systems written in the Java programming language [Gosling et al., 2000].

The first case study, described in Section 6.1, focused on whether unanticipated reuse can be handled more easily through the use of implicit context. I removed the Outline View of Eclipse (a large, industrially-developed integrated development environment) and reused it in another system, independent of the rest of Eclipse. Such reuse was not planned for in the design of Eclipse.

The second case study, described in Section 6.2, focused on whether unanticipated evolution can be handled more easily through the use of implicit context. I designed and implemented two versions of a server providing a subset of the file transfer protocol (FTP), one version using a purely object-oriented approach and one version using implicit context as well; these designs ignored the existence of the remainder of the FTP specification. Each version was then evolved to determine whether fewer invasive changes were necessary when implicit context was used.
The purely object-oriented version was additionally evolved via implicit context; this demonstrates that implicit context can be used to evolve a system not constructed with it from scratch, while still providing benefit.

The two case studies described below concentrate on structural issues bearing on the evolvability of our systems and the reusability of our modules. For implicit context to be adoptable as an industrial development technique, additional factors must be addressed, such as the development process to be used, tool design, and run-time cost. While further research involving a non-prototype tool is needed, the case studies provide some insights into these factors.

6.1  Reusing the Outline View of the Eclipse IDE

This case study concentrated on demonstrating the use of implicit context in the arena of unanticipated reuse. I began with an existing, large, well-designed system: the Eclipse integrated development environment. I chose to reuse a feature of Eclipse called the Outline View (see Figure 6.1); the functionality provided by the Outline View is described in Section 2.1.

Eclipse was designed and implemented to be highly extensible, utilizing such state-of-the-art techniques as object-orientation [Booch, 1986; Rumbaugh et al., 1991], reflection [Smith, 1982], and design patterns [Gamma et al., 1994]. However, Eclipse was not designed in a way that planned for such reuse of the Outline View. The Outline View was not modularized in a fashion amenable to reusing it, possessing too many dependences on classes within Eclipse that I did not want to reuse (see Section 2.1 for details). Therefore, state-of-the-art practices were unable to aid us in this task without requiring extensive, invasive modifications to Eclipse, the very thing I wanted to avoid.

[Figure 6.1: screenshot not reproduced.]

Figure 6.1: The Eclipse integrated development environment. The Outline View is in the sub-window on the right.

Instead, I used implicit context to create a facade around the Outline View. From the perspective of the interior of the Outline View, it continued to operate within the rest of Eclipse. From the perspective of the external world, the Outline View was turned into a GUI widget (in a class called JavaViewer) independent of the remainder of Eclipse. I then used this GUI widget as the beginnings of an interface to the prototype tool described in Chapter 5; this interface currently allows one to browse through the declarations in a set of source classes and to generate very basic skeletons of boundary maps.

The procedure that I followed in accomplishing this task is described in Section 6.1.1. Section 6.1.2 discusses the results of the case study.

6.1.1  The Procedure

The procedure that I followed in reusing the Outline View outside Eclipse consisted of the following major steps.

1. Identify the classes implementing the Outline View.

2. Determine the dependences these classes have on the remainder of Eclipse.

3. Create the JavaViewer class to act as a facade to the Outline View, replacing the dependences the reused classes had on the remainder of Eclipse.

4. Utilize JavaViewer as a widget in a graphical user interface to my prototype tool.

I describe each of these steps in turn.

Identifying Classes for Reuse

Eclipse contains no class called "OutlineView"; therefore, some effort needed to be expended to determine the classes that implement the Outline View. I began by performing a search for any class whose name contained the words "Outline" or "View." Examining each of the classes that matched this search, I eliminated those that clearly had no relationship to the Outline View. Of the remainder, two stood out as appearing significant on the basis of their names and content: JavaOutlinePage and ContentOutline. I then determined the superclass hierarchies of each of these classes, and examined the comments contained within the source corresponding to all these classes.

Eclipse contains a number of "views" (such as editors, task lists, and the Outline View) that operate together to form the environment for system development. The ContentOutline class provides the Outline View, as evidenced by the comments contained in it, but only in coarse form. ContentOutline refers to the actual contents displayed within it in an abstract form; this allows the contents to be specialized, for example, on the basis of the programming language in which development is being done. This is the role of the JavaOutlinePage class.

This description is further complicated by the fact that each view really acts as a placeholder (with certain semantic properties) for GUI widgets. Thus, PageBookView, which is the superclass for ContentOutline, contains a PageBook, which is a kind of GUI widget used for displaying one of a set of pages at a time.
In turn, PageBook holds instances of Page, which is a superclass of JavaOutlinePage. I decided that the majority of the functionality of the Outline View is implemented in a total of seven classes within Eclipse: JavaOutlinePage, ContentOutline, PageBookView, ViewPart (the superclass of PageBookView), WorkbenchPart (the superclass of ViewPart), PageBook, and OverlayIcon (responsible for drawing the icons shown for each node of the tree within the Outline View). In addition to these were several classes that made small contributions to the functionality of the Outline View.

Eclipse is built on top of a custom graphics package, called the Standard Widget Toolkit (SWT), instead of the graphics package provided by the standard Java class libraries, the Abstract Window Toolkit (AWT). Eclipse also makes heavy use of a GUI library, called JFace, built on top of the SWT; the equivalent GUI library provided by the standard Java class libraries is called Swing. While it should be possible to translate any program written on top of SWT/JFace to instead use AWT/Swing, this would be a significant undertaking. I felt that this translational process would require too much effort for too little additional evidence for it to be worthwhile within the scope of this case study. Therefore, I chose to reuse the SWT and JFace as libraries: i.e., they were not altered in any way, either manually or by the tool.

Determining the Dependences on Eclipse

Determining the dependences that my reused classes had on the remainder of Eclipse was an iterative process. I isolated the classes that I intended to reuse and attempted to compile them. The compiler complained any time it could not resolve a reference to types and their members that were not present.
From the error messages generated by the compiler, I created a set of types lacking any functionality so that the compiler could resolve these references. Eventually, I refined this set of dummy types to the point where the compiler would compile the classes I was reusing, implying that I had determined the set of dependences on the interface to the remainder of Eclipse.²

² This process is equivalent to forming declaratively complete hyperslices in the Hyper/J tool (see http://www.research.ibm.com/hyperspace/ for details of the tool, and Tarr et al. [1999] for details of the concepts it uses).

Of course, mere compilation does not imply correct behaviour. I proceeded to determine the more subtle dependences: which of the types and their members performed functionality I considered useful in the context of my application, and what protocols were expected to be followed to have the Outline View function. Determining these subtle dependences was an ad hoc process involving two operations: running the compiled system and seeing where it broke, and examining the original system to determine the protocols for initializing and using the Outline View. I describe this more below.

Creating the JavaViewer Facade

To provide an interface for the external world to use the Outline View in isolation from the remainder of Eclipse, I created a class called JavaViewer. JavaViewer operates in the style of other widgets provided by SWT/JFace. Its public interface includes methods that perform the following behaviour: creation of the actual widget controls that are displayed; setting the input model that is represented by the tree within the Outline View; getting and setting the title of the shell in which the Outline View is displayed; getting and setting the set of nodes in the tree of the Outline View that are selected; and adding new action buttons to the tool bar of the Outline View.
To implement this behaviour, JavaViewer contains an instance of JavaOutlinePage and an instance of ContentOutline; many of the methods described above simply delegate to existing methods on these classes.

In order for the interface provided by JavaViewer to suffice, I had to replace all of the dependences on the parts of Eclipse I decided against reusing. Most of these dependences implemented functionality required by Eclipse overall but not required by my application; therefore, contextual dispatch was used to "dead-end" these, i.e., intercepting outgoing calls and having them immediately return, with trivial result values such as null. The non-trivial dependences that remained fall into three categories: additional Outline View functionality, translation of initialization protocols, and translation of the Java source element representations. I discuss these below.

Recall that I decided to reuse a set of classes that implemented a majority of the functionality of the Outline View; the remainder was scattered amongst other classes. Rather than reuse these other classes, which would have pulled along much unwanted functionality, I reimplemented the small remaining pieces in two classes: MemberPage and JavaElementLabelProvider. The names of these two classes are identical to existing classes in Eclipse; since they fulfill the same roles as these existing classes, from the perspective of the Outline View, this choice seemed reasonable.

Since the Outline View was never designed to be a widget, its implementation assumes various details of initialization at run-time have already occurred prior to its instantiation. These details are taken care of by parts of Eclipse not reused in my application and, hence, had to be dealt with through a small set of boundary maps.

The most significant remaining dependences dealt with the representation of elements in Java source code.
The Outline View provides a tree view of the elements in Java source, such as methods, fields, and classes. Eclipse provides a particular set of classes for representing these source elements. These classes are intimately tied-in with other functionality of Eclipse, which deals with source editing and compilation. Thus, I decided that these classes representing source elements were too specific to Eclipse for a general-purpose Outline View widget. Instead, I created a separate, more generic hierarchy of classes for representing source elements. The world view of the reused classes had to be translated to make use of this new source element hierarchy.

At this point, I had obtained a module providing the functionality of the Outline View that could be used independent from the remainder of Eclipse. Figure 6.2 shows the classes in this module.

[Class diagram not reproduced. Classes and interfaces shown: WorkbenchPart, IPropertyListener, PageBook, ViewPart, PageBookView, IPage, MessagePage, ContentOutline, Constants, JavaOutlinePage, IContentOutlinePage, JavaViewer, JavaElementLabelProvider, OverlayIcon, JavaViewerException, Package, Element, Import, Member, Function, Initializer, Constructor, Field, Method, Type, Class, Interface.]

Figure 6.2: The module resulting from the reuse of the Outline View. The classes with names in boldface text are those that were reused verbatim from Eclipse. The classes with names in Roman text represent classes and interfaces I implemented to replace like-named ones in Eclipse. The classes with italicized names are new. Dependences denoted with dotted arrows are those that arise from the boundary maps attached to many of these classes. Note that dependences on SWT, JFace, and the standard Java class libraries are not shown.
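The combination of a delegating facade and "dead-ended" dependences can be approximated in plain Java. The following is a minimal sketch under stated assumptions, not the mechanism of the prototype tool: contextual dispatch rewrites calls at module boundaries, whereas this sketch substitutes a dynamic proxy at run time. WorkbenchSite, OutlinePage, DeadEnd, and JavaViewerSketch are hypothetical names invented for the illustration; only java.lang.reflect.Proxy and InvocationHandler are real library types.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

/** Hypothetical stand-in for an Eclipse service that reused code calls but the widget does not need. */
interface WorkbenchSite {
    Object getSelectionProvider();
    boolean isActive();
    int getPriority();
    void registerContextMenu(String menuId);
}

/** Hypothetical stand-in for a reused class, in the spirit of JavaOutlinePage. */
class OutlinePage {
    private final WorkbenchSite site;
    private String title = "";

    OutlinePage(WorkbenchSite site) { this.site = site; }

    /** Makes the outgoing calls that the original environment expected at start-up. */
    void init() {
        site.registerContextMenu("outline");
        if (site.isActive()) {
            title = "active";
        }
    }

    void setTitle(String title) { this.title = title; }
    String getTitle() { return title; }
}

/** Builds objects whose every method returns at once with the most obvious default result. */
final class DeadEnd {
    private DeadEnd() {}

    /** Default result for a return type: false, a zero value, or null. */
    static Object defaultFor(Class<?> type) {
        if (type == boolean.class) return Boolean.FALSE;
        if (type == char.class) return Character.valueOf('\0');
        if (type == byte.class) return Byte.valueOf((byte) 0);
        if (type == short.class) return Short.valueOf((short) 0);
        if (type == int.class) return Integer.valueOf(0);
        if (type == long.class) return Long.valueOf(0L);
        if (type == float.class) return Float.valueOf(0f);
        if (type == double.class) return Double.valueOf(0d);
        return null; // void and all reference types
    }

    static <T> T of(Class<T> iface) {
        InvocationHandler handler =
            (proxy, method, args) -> defaultFor(method.getReturnType());
        return iface.cast(Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}

/** Facade in the style described above: it owns the reused page and delegates to it. */
class JavaViewerSketch {
    private final OutlinePage page;

    JavaViewerSketch() {
        // The reused class still "sees" a workbench site, but every call on it is dead-ended.
        page = new OutlinePage(DeadEnd.of(WorkbenchSite.class));
        page.init();
    }

    void setTitle(String title) { page.setTitle(title); }
    String getTitle() { return page.getTitle(); }
}

class DeadEndDemo {
    public static void main(String[] args) {
        JavaViewerSketch viewer = new JavaViewerSketch();
        viewer.setTitle("Outline");
        System.out.println(viewer.getTitle());
    }
}
```

A proxy of this kind only works where the unwanted dependence is expressed through an interface and can be injected; the case study's dependences were on concrete classes referenced directly in the source, which is one reason the sketch should be read as an analogy rather than a substitute.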
Using JavaViewer as an Interface to the Tool

Under normal circumstances, I could use the Outline View module I created simply by instantiating JavaViewer and making the appropriate calls to its interface. However, the circumstances that I encountered in using the Outline View as an interface to my prototype tool were special. My module, providing an isolated Outline View, makes use of a generic hierarchy representing Java source elements. My prototype tool produces abstract syntax trees (ASTs), the nodes of which also represent Java source elements. To use JavaViewer as an interface to my tool, one would need to dynamically translate these ASTs into a tree representation using the source element hierarchy provided by the JavaViewer. This should not be difficult, but it does seem wasteful. Instead, I deemed it appropriate to use implicit context to alter the world view of the JavaViewer so that it directly used the source element hierarchy already provided by my tool. In this case, the tool would not need to perform a translation to and from its ASTs, but could hand these ASTs directly to the JavaViewer.

Figure 6.3 shows the results: a JavaViewer instance displaying the major source elements from an AST provided by the parser of my tool. I created a special feature for this JavaViewer through a new class called TemplateMapAction. The user of the interface can select a set of nodes in the tree; they can also bring up a pop-up menu by right-clicking in the JavaViewer window. TemplateMapAction adds an option to this pop-up menu that reads, "Create in-map templates." Selecting this option causes a skeleton mapset to be written to a temporary file, and a text editor displaying this file to be started. The user can then add in their specifications for contextual dispatch to in-map the selected nodes.
This is simple functionality, but a step towards interactive tool support for using implicit context.

Figure 6.3: The Outline View of Eclipse reused as an interface to my prototype tool (see Chapter 5). The contents of the window consist of a JavaViewer widget.

6.1.2  Results and Lessons Learned

I successfully reused the classes implementing the Outline View of Eclipse as a simple interface to my prototype tool. Table 6.1 shows statistics for the case study. Only a small fraction of Eclipse was reused (17% if we include the SWT and JFace in the calculation, or 0.3% if we do not), indicating that I reused the Outline View without pulling along much extraneous functionality.

Description       Source size      Boundary maps                                   Tool dirs.
                  (loc)            all             non-trivial     renaming        (loc)
                                   (loc)   (#)     (loc)   (#)     (loc, #)
Eclipse, total    Java: 775,314
                  C: 11,121
SWT & JFace       131,162
reused            2,175
new, facade       1,231            666     181     214     23      80              352
new, interface    128              466     53      442     29      24              111

Table 6.1: Statistics on Eclipse and reusing the Outline View. The rows give statistics, respectively, for all of Eclipse, including for the SWT and JFace; the SWT and JFace only; the Outline View classes I reused; the new classes, etc. that I added in creating the JavaViewer facade; and the additional classes, etc. that I added to the facade to use it as an interface to my prototype tool. Numbers represent either number of lines of source code (labelled "loc"), including blank and comment lines, or number of instances (labelled "#"). Statistics for non-trivial boundary maps and renaming maps are given in addition to the statistics for all boundary maps combined. Trivial boundary maps are ones that intercept messages and immediately return with the most obvious default result, such as null for reference types, false for boolean type, etc. Each renaming map occupies one line of code.

Tool Limitations

Unfortunately, a very small number of invasive modifications were needed due to limitations in the prototype tool. There were three kinds of these, involving package and import statements and implements clauses. In designing the prototype tool, I had deliberately chosen to ignore issues arising from these statements and clauses for the sake of limiting the scope of the project.

The package name for the reused modules needed to be commented out to avoid namespace collisions between the original, unmodified source and the output of the prototype tool. Similarly, import statements needed to be commented out when non-reused classes were imported by the original code. A better implementation of the tool should support management of the two "versions" of the source in a completely transparent manner.

In addition, when a reused class declares that it implements an interface that is not to be reused, I had two options: I could comment out the implements clause, or I could provide a dummy interface to replace the original. Neither option is very satisfactory. A better implementation of the tool would provide a means to ignore an unwanted implements clause.

Determining Classes to Reuse

A major difficulty I had in reusing the Outline View was simply to figure out which classes within Eclipse implement it. Because of the choices of class decomposition made within Eclipse, the functionality that we can point to on a display as being "the Outline View" is scattered amongst many classes. This was a perfectly reasonable decision on the part of the designers of Eclipse, since they were designing it to be extensible; the behavioural abstractions in place within this design serve these extensibility requirements.
However, to ease software reuse through implicit context, we need to know what classes we wish to reuse in the first place; the model does not speak to how to determine this. Thus, going from the position of possessing no knowledge of Eclipse to the position of having sufficient knowledge to attempt to apply implicit context required approximately 6 weeks of effort. This time would have been shorter if the various documentation then available on Eclipse had spoken more to the existing design of Eclipse rather than to how one can use its extensibility features. After the point of knowing enough about Eclipse to attempt to reuse some of its classes, I only required approximately 1 week to successfully apply implicit context. This includes the time required to locate and correct a few significant errors in my prototype tool.

Automated Tool Support

Some automated tool support for generating boundary maps would have aided in this case study. As pointed out in Table 6.1, a large number of the boundary maps that I created and applied were of a trivial nature. Trivial boundary maps are ones that intercept messages and immediately return with the most obvious default result, such as null for reference types, false for boolean type, etc. An automated tool that could generate trivial maps for any outgoing, unresolved dependences would have saved me considerable effort. Furthermore, simply providing automated tool support to determine what the unresolved dependences are for a given boundary would have helped enormously. Both of these forms of support would be straightforward to create, building upon the functionality that the prototype tool already possesses for resolving names at boundaries (see Section 5.3.1 for details of this process).

Organizing and Ordering Boundary Maps

Organizing and ordering boundary maps proved to be a significant challenge.
Most boundary maps performed minor transformations; as such, it was difficult to have a clear mental picture of a given module before and after one of these small transformations. I typically created one mapset for all of the dependences on a given class that was not being reused; hence, there was one mapset for eliminating references to IJavaElement, one for JavaEditor, and so forth.

As described in Chapter 5, the prototype tool applies boundary maps in the order in which they are specified to the tool. Since each transformation is quite small, getting this order correct was difficult at best, and sometimes not possible. Consider that a given module might contain the following line of source:

    A.m1().m2();

and that A.m1 is not declared within that module. Let us also say that we need to non-trivially map calls to A.m1 and B.m2. The details of our boundary maps and the order in which we apply them matter. If we were to attempt to remap calls to B.m2 first, the line above would not resolve at all and the boundary map would not apply here. If we were to remap calls to A.m1 first, through a boundary map where the result type were B, then the attempt to remap B.m2 would match the second call here. Even more subtle effects arise when we alter the result types of messages through our boundary maps; some of these are described in Section 5.6.2.

Many of these troubles could be averted through two provisions. First, boundary maps that are applied "simultaneously," as opposed to being explicitly ordered, should not be arbitrarily applied but applied in the lexical order of the source that they affect. For example, in the statement shown above, boundary maps affecting A.m1 should be applied prior to boundary maps affecting B.m2. Second, specifications should be allowed that complete partial world views.
In the example above, part of the trouble stems from the fact that, within the boundary, we cannot determine the type of A.m1 until a boundary map is attached that affects this. Instead, we should permit the inclusion of statements specifying such type information, much as import statements do within Java currently.

Communication History

In this case study, I utilized a total of 11 queries on communication history: 6 queries to construct the facade, and an additional 5 queries to construct the interface to the prototype tool. Four of the 11 queries were involved in translating calls on the Java source element hierarchy from those appropriate to Eclipse to those appropriate to the facade, such as checking whether a given method is static or whether a given field is private. Two were involved in handling the initialization of the Outline View for the facade. One translated the Eclipse protocol for looking up the currently displayed page. And the remaining four were involved in translating calls on the Java source element hierarchy from those appropriate to the facade to those appropriate to the prototype tool.

The main problem encountered here was for the first four queries involved in translating between source element hierarchies. The prototype tool does not currently instrument call sites. This can lead to the apparent disappearance of calls that one is interested in retrieving from communication history, as described in Section 5.6.2. I dealt with this through a kludge involving the insertion of a dummy method. The call site is changed to call the dummy method, and the dummy method delegates to the method originally called. Communication history could then be queried for calls on this dummy method. Future implementations of the tool should deal with call sites directly.

An example mapset using communication history is shown in Figures 6.4 and 6.5; it is used for translating between the source element hierarchies of Eclipse and the facade.
Java source element hierarchy classes in Eclipse, such as IMember, represent their modifiers (e.g., static) as a set of flags encoded in an int. This int could be retrieved via the getFlags method; the presence of individual modifiers encoded in this int could then be tested through static methods provided on the class Flags, such as isStatic. Within the facade, the encoding of modifiers was to be hidden and the tests for modifiers directly supported by the individual classes of the source element hierarchy, such as proto.hierarchy.Member.

Four boundary maps needed to be applied to any class testing for the presence of modifiers (namely, JavaOutlinePage). First, any attempts to retrieve the int encoding the modifiers had to be dead-ended, returning "0." But ultimately, one will need to query communication history regarding these attempted calls, so the kludge described above needs to be used: the call to getFlags is intercepted and replaced with a call to getFlags_dummy. This dummy method is introduced into the boundary through the second boundary map. The third boundary map replaces references to the type IMember with references to the type proto.hierarchy.Member. And the fourth boundary map changes the test for the static modifier via the Flags class to a direct call on the isStatic method on proto.hierarchy.Member. But this latter call invokes an instance method, meaning that the appropriate instance has to be found. This instance is retrieved via communication history, where the last call to getFlags must have been called on the instance of interest.
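The effect of these four boundary maps can be imitated in ordinary Java to make the data flow easier to follow. The sketch below is an assumption-laden illustration, not the prototype tool's History API: CallRecord, CallHistory, Member, and FlagsMapsSketch are hypothetical names, and the history is kept in a simple per-thread stack. It shows the essential trick: the dummy method records its receiver and returns 0, and the later flag test ignores the int and recovers the receiver from the recorded call.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** One recorded call: the method name and the parameters it was passed. */
final class CallRecord {
    final String methodName;
    private final Object[] parameters;

    CallRecord(String methodName, Object... parameters) {
        this.methodName = methodName;
        this.parameters = parameters.clone();
    }

    Object getParameter(int index) { return parameters[index]; }
}

/** Hypothetical per-thread communication history (most recent call first). */
final class CallHistory {
    private static final ThreadLocal<Deque<CallRecord>> CALLS =
        ThreadLocal.withInitial(ArrayDeque::new);

    private CallHistory() {}

    static void record(String methodName, Object... parameters) {
        CALLS.get().push(new CallRecord(methodName, parameters));
    }

    /** Returns the most recent call with the given name on this thread, or null. */
    static CallRecord lastCall(String methodName) {
        for (CallRecord call : CALLS.get()) { // push() adds at the head, so this is newest first
            if (call.methodName.equals(methodName)) {
                return call;
            }
        }
        return null;
    }
}

/** Stand-in for the facade's source element class, which answers flag tests directly. */
final class Member {
    private final String name;
    private final boolean isStatic;

    Member(String name, boolean isStatic) {
        this.name = name;
        this.isStatic = isStatic;
    }

    String getName() { return name; }
    boolean isStatic() { return isStatic; }
}

final class FlagsMapsSketch {
    private FlagsMapsSketch() {}

    /** Replaces member.getFlags(): records the receiver and dead-ends with 0. */
    static int getFlagsDummy(Member member) {
        CallHistory.record("getFlags_dummy", member);
        return 0;
    }

    /** Replaces Flags.isStatic(int): the int is ignored; the member comes from history. */
    static boolean isStatic(int ignoredFlags) {
        CallRecord flagsCall = CallHistory.lastCall("getFlags_dummy");
        if (flagsCall == null) {
            throw new IllegalStateException("no earlier getFlags call on this thread");
        }
        Member member = (Member) flagsCall.getParameter(0);
        return member.isStatic();
    }
}

class FlagsMapsDemo {
    public static void main(String[] args) {
        Member main = new Member("main", true);
        int flags = FlagsMapsSketch.getFlagsDummy(main);
        System.out.println(main.getName() + " static? " + FlagsMapsSketch.isStatic(flags));
    }
}
```

The sketch inherits the assumption stated in the text: the most recent getFlags call on the current thread must have been made on the member of interest. If another member's flags are fetched between the retrieval and the test, the lookup silently answers for the wrong member.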
    //
    // FlagsMaps
    //
    // Translate the Java source element hierarchy modifier
    // flags (e.g., static, private) from the Eclipse format
    // to the facade format.
    //
    // Eclipse returns an encoding of all flags via getFlags()
    // in which we can test for individual flags, such as:
    //
    //   int flags = someMember.getFlags();
    //   boolean isStatic = Flags.isStatic(flags);
    //
    // The facade provides methods to test for individual flags
    // directly.
    //
    mapset FlagsMaps {

      // Delegate calls to getFlags() to a dummy method.  The
      // returned int encodes the flags.
      static int map(IMember member): out(member.getFlags()) {
        return getFlags_dummy(member);
      }

      // Provide a dummy method to dead-end calls to getFlags();
      // just return 0.
      static int map(): in(getFlags_dummy(IMember)) {
        return 0;
      }

    (continued in next figure)

Figure 6.4: An example mapset performing a communication history query, Part 1 of 2. This mapset translates the tests for modifiers on the Java source element hierarchy from those appropriate to Eclipse to those appropriate to the facade.

    (continued from previous figure)

      // Rename IMember (from Eclipse) to proto.hierarchy.Member
      // (from the facade).
      IMember -> proto.hierarchy.Member;

      // Replace attempts at testing for the static flag on the
      // int returned by getFlags() with direct calls.
      static boolean map(): out(Flags.isStatic(int)) {
        // We need to find the call from whence the flags being
        // tested came.  Do this by finding the last call to
        // getFlags[_dummy]().
        Call flagsCall =
          History.lastCall(Thread.currentThread(), null,
                           "getFlags_dummy", null);

        // The first parameter passed to this call will be the
        // source hierarchy member.
        proto.hierarchy.Member member =
          (proto.hierarchy.Member)flagsCall.getParameter(0);

        // Now perform the direct test on this member.
        return member.isStatic();
      }
    }

Figure 6.5: An example mapset performing a communication history query, Part 2 of 2.

Overall

Overall, the case study demonstrates an ability to reuse a chunk of an existing system that was not designed to enable such reuse, while avoiding invasive modifications to the reused modules in the process (other than a few due to minor limitations in the prototype tool). There is no way we could have accomplished this kind of reuse through existing approaches, as discussed in Chapter 2.

The largest question remains that of when it becomes more expensive to reuse modules than to reimplement them. At some point, extant modules will be such a poor fit for the new context to which we wish to move them that they would not truly meet our needs. Where this point lies is very sensitive to details of the process and the domain in which we work. In this case study, it was expensive to determine the classes I wished to reuse, but had I already possessed background knowledge of Eclipse, this cost would have been reduced. On the other hand, implementing a module with the functionality of the Outline View would have required a learning effort as well. Finding the correct tradeoffs awaits further research.

6.2  Evolution of a File Transfer Protocol Server

This case study concentrated on demonstrating the use of implicit context in the arena of unanticipated evolution. I constructed parallel versions of a File Transfer Protocol (FTP) [Postel and Reynolds, 1985] server using different approaches and evolved each in ways not catered for by the original design. I outline some high-level FTP concepts of importance in Section 6.2.1.

There were two stages to the case study. In the first stage, I selected a subset of the features described in the FTP specification.
I designed and implemented two versions of an FTP server providing this subset of features: one using standard object-orientation, and the other using object-orientation plus implicit context. Both of these versions ignored the presence of the remainder of the FTP specification. In the second stage, I selected a subset of the remaining features of the FTP specification, and evolved my two FTP server implementations to incorporate these new features. Each of the evolved versions continued to use the development approach it used for the initial stage.

Additionally, I wished to demonstrate the efficacy of implicit context in easing the evolution of a system that was not initially constructed with it. Therefore, I took the initial version of my purely object-oriented FTP server and "cross-evolved" it, i.e., I evolved it using implicit context.

In Section 6.2.2, I describe in more detail the procedure that I followed in the case study. The results and lessons that I learned from the case study are discussed in Section 6.2.3.

6.2.1  FTP Concepts

The File Transfer Protocol uses two kinds of connections to communicate between client and server: control connections and data connections. The server listens at a given port on a given machine for connection attempts by clients; successful attempts establish a control connection. The client sends messages to the server consisting of a command and zero or more arguments as appropriate for the command. The server processes each message and responds with a series of one or more reply codes indicating the status of the requested operation along with any return data. Each control connection is stateful and some commands must be sent in an order prescribed by the protocol to complete successfully.

Some commands (notably RETR to retrieve files from and STOR to store files to the server) establish a data connection for transferring large amounts of data.
This data connection is established by the server to a default port on the client; a PORT command can be issued to select a different client-side port. A server can be told to passively await the establishment of a data connection, rather than establish it itself, through the PASV command; this command returns the server-side port at which the server awaits the connection. A data connection may be transient, shutting down upon the completion of an operation; this depends upon the requests made and the implementation of the server. Both the control connection and data connection are closed at the successful reception of the QUIT command, if they remain open.

Other commands can be issued to modify the way the data passed in the data connection is interpreted. For example, the TYPE command is used to indicate whether data is to be interpreted as ASCII or binary (called "image" type), the MODE command is used to indicate whether the transmission mode is as a stream of bytes or in blocks, and the STRU command is used to indicate whether the data is structured as files or records. Users can be identified via the USER command and authenticated via the PASS (password) command.

Within the FTP specification (RFC 959 [Postel and Reynolds, 1985]), there is a specification of the Minimum Implementation that a server must provide in order to be a bona fide FTP server: the commands USER, QUIT, PORT, TYPE, MODE, STRU, RETR, STOR, and NOOP (which does nothing) must be recognized. Note that PASS and PASV are not required by the Minimum Implementation. The protocol provides for additional commands not of importance here.

6.2.2  The Procedure

The procedure that I followed in developing and evolving the FTP server consisted of the following major steps, as illustrated in Figure 6.6.

1. Design and implement two versions of the FTP server meeting the requirements for the Minimum Implementation of RFC 959, one using only object-oriented techniques, and one using object-orientation plus implicit context.
2. Select a subset of the remaining features from RFC 959.
3. Evolve the designs and implementations of each version, using its original development approach, to add the new features.
4. Evolve the design and implementation of the purely object-oriented version of the server from Step 1 using implicit context to add the new features.

I describe each of these steps in turn.

[Diagram not reproduced. Horizontal axis: approach (pure object-orientation; object-orientation + implicit context). Vertical axis: feature set, from the Minimum Implementation to authentication and passive mode.]

Figure 6.6: Evolving versions of the FTP server, from the initial, Minimum Implementation to include the authentication and passive mode features. The circles represent versions of the server; the arrows represent tracks of evolution.

Developing the Minimum Implementations

In developing the two versions of the Minimum Implementation, I was concerned that I provide a fair comparison between the two approaches. On the one hand, I did not want the design of the object-oriented version to be influenced by the ideas of implicit context. On the other hand, I did want the object-oriented version to be well-designed. To deal with these concerns, I took two precautions. First, I completely designed, implemented, and tested the object-oriented version prior to developing the version utilizing implicit context. Second, I had several colleagues critique my design for the object-oriented version, altering it accordingly. Below, I first describe the design of the object-oriented version and then the design of the version using implicit context.

Object-oriented version. The UML class diagram for the object-oriented design is shown in Figures 6.7 and 6.8. An instance of Server runs listening for connection attempts by clients on a particular port.
When a client connects to the server, an instance of Session is created. In turn, the Session object creates one instance each of ControlConnection, DataConnection, and TransferContext. The Session object is used to provide a convenient handle on the current control connection, data connection, and transfer context.

Each ControlConnection runs in its own thread. A ControlConnection listens at its port for incoming request messages from the client. Received requests are sent through a chain of objects for parsing and action. The Interpreter encapsulates the knowledge that the initial three or four letters of the message specify the FTP command, while the remainder consists of command-specific arguments. The CommandFactory returns an instance of one of the Command subclasses as appropriate to the specified command. Each Command subclass encapsulates the knowledge of the syntax of the arguments specific to that command. Thus, Interpreter strips the command out of the message and asks CommandFactory for the corresponding Command instance. If such an instance exists, Interpreter hands it the remainder of the message for parsing and action. One or more responses are generated along this chain and handed to the ControlConnection to be sent to the client; each response is encoded as an instance of a subclass of Reply. Each Reply subclass encapsulates the knowledge of its corresponding reply code, used in formatting the response sent over the control connection.

The DataConnection object encapsulates the knowledge of the data connection protocol appropriate to the current session state, such as the file structure and transmission mode. For example, this protocol may require that the actual data connection be closed after each file transfer and this fact needs to be communicated to the client.
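The chain from Interpreter through CommandFactory to Command and Reply can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions, not the thesis's implementation: the real design passes a Session to perform and uses one class per command, whereas the sketch omits Session and uses lambdas; CommandOkayReply, CommandNotImplementedReply, and CommandNotImplementedForParameterReply are names assumed for the example, and the message texts are invented. The numeric reply codes (200, 501, 502, 504) are those defined by RFC 959.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** A reply knows its own code; the control connection only needs format(). */
abstract class Reply {
    private final String message;

    Reply(String message) { this.message = message; }

    abstract int getCode();

    String format() { return getCode() + " " + message; }
}

final class CommandOkayReply extends Reply {
    CommandOkayReply(String message) { super(message); }
    @Override int getCode() { return 200; }
}

final class ArgumentSyntaxErrorReply extends Reply {
    ArgumentSyntaxErrorReply(String message) { super(message); }
    @Override int getCode() { return 501; }
}

final class CommandNotImplementedReply extends Reply {
    CommandNotImplementedReply(String message) { super(message); }
    @Override int getCode() { return 502; }
}

final class CommandNotImplementedForParameterReply extends Reply {
    CommandNotImplementedForParameterReply(String message) { super(message); }
    @Override int getCode() { return 504; }
}

/** Each command owns the syntax of its arguments (Session omitted in this sketch). */
interface Command {
    Reply perform(String arguments);
}

/** Hands out one shared instance per command name. */
final class CommandFactory {
    private static final Map<String, Command> FLYWEIGHTS = new HashMap<>();

    static {
        FLYWEIGHTS.put("NOOP", arguments -> new CommandOkayReply("NOOP ok"));
        FLYWEIGHTS.put("MODE", arguments -> {
            if (arguments.isEmpty()) {
                return new ArgumentSyntaxErrorReply("MODE requires an argument");
            }
            if (arguments.equalsIgnoreCase("S")) {
                return new CommandOkayReply("Stream mode selected");
            }
            return new CommandNotImplementedForParameterReply("Only stream mode is supported");
        });
    }

    private CommandFactory() {}

    /** Returns the command for the given name, or null if it is not recognized. */
    static Command getCommand(String name) {
        return FLYWEIGHTS.get(name.toUpperCase(Locale.ROOT));
    }
}

/** Splits a request into command name and arguments, then dispatches. */
final class Interpreter {
    private Interpreter() {}

    static Reply process(String message) {
        String trimmed = message.trim();
        int space = trimmed.indexOf(' ');
        String name = space < 0 ? trimmed : trimmed.substring(0, space);
        String arguments = space < 0 ? "" : trimmed.substring(space + 1).trim();

        Command command = CommandFactory.getCommand(name);
        if (command == null) {
            return new CommandNotImplementedReply("Command not implemented");
        }
        return command.perform(arguments);
    }
}

class InterpreterDemo {
    public static void main(String[] args) {
        for (String request : new String[] {"NOOP", "MODE S", "MODE B", "MODE", "XYZZ"}) {
            System.out.println(request + " -> " + Interpreter.process(request).format());
        }
    }
}
```

Adding a command in this arrangement touches only the factory's table, and adding a reply touches only the Reply hierarchy, which is the kind of isolated change the design was intended to permit.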
The TransferContext object contains the state specific to the session, such as the address and port of the client and the representation type.

This system is reasonably well-designed. It uses a number of design patterns [Gamma et al., 1994], including Interpreter, Command, Factory Method, and Singleton. The design encapsulates specific pieces of knowledge about the File Transfer Protocol (such as the encoding of reply codes, the argument syntax for particular commands, and the data format used across the data connection) in different classes, so that each might be changed in isolation from the rest of the system. Subclassing is used so that additional classes can be added (for any new commands, say) without the need for much of the rest of the system to change. For example, ControlConnection can treat any instance of Reply through the same, abstract interface without worrying about its specific subclass. Each class has a well-defined purpose.
[UML class diagram. Classes shown: Server, ServerException, Session, SessionException, ControlConnection, Interpreter, DataConnection, TransferContext, DataStructure, FileDataStructure, RepresentationType, ASCIINonPrintRepresentationType, ImageRepresentationType.]

Figure 6.7: Initial design for the purely object-oriented version of the FTP server, Part 1 of 2. This diagram shows only the static features of the design, and does not include dependences other than those arising from subclassing and the existence of fields.
[UML class diagram. Classes shown: CommandFactory; Command and its subclasses DataPortCommand, FileStructureCommand, LogoutCommand, ModeCommand, NoopCommand, RepresentationTypeCommand, RetrieveCommand, StoreCommand, and UserNameCommand; Reply and its subclasses ArgumentSyntaxErrorReply, CannotOpenDataConnectionReply, ClosingControlConnectionReply, CommandNotImplementedForPassedArgumentReply, CommandOKReply, ConnectionEstablishmentReply, FileUnavailableReply, LocalErrorReply, OpeningDataConnectionReply, ServiceNotAvailableReply, and SyntaxErrorReply.]

Figure 6.8: Initial design for the purely object-oriented version of the FTP server, Part 2 of 2.

Implicit context version. A modified UML class diagram of the design using implicit context is shown in Figures 6.9 and 6.10. Many of the classes are surrounded by dotted lines; these represent boundaries. Attached to these are mapsets, shown as rounded boxes.

An instance of Server runs listening for connection attempts by clients on a particular port. When a client connects to the server, an instance of Session is created, running in a separate thread. Session provides the basic structure of the FTP server architecture: it instantiates ControlConnection and CommandInterpreter, and then runs in a continuous loop, reading lines from the ControlConnection and having the CommandInterpreter interpret them. At least, this is the world view within Session. In reality, the SessionArchitecture mapset ignores the attempt to instantiate CommandInterpreter and intercepts attempts at interpreting the received message, rerouting them to CommandFactory. CommandFactory strips off the initial four characters of the received message (which encode the name of an FTP command) and locates the corresponding instance of a subclass of CommandInterpreter. The arguments to the command are then passed to this instance for further interpretation.
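As a rough illustration of the dispatch step just described, the following Java sketch splits a received message into a command name and its arguments and hands the arguments to a registered interpreter. It is only a sketch under stated assumptions, not the thesis implementation: the class name CommandDispatchSketch, the register and dispatch methods, and the choice to have perform return the reply text are hypothetical, and the rerouting done by the SessionArchitecture mapset is left out entirely. The reply strings follow the reply codes of RFC 959.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CommandDispatchSketch {
    /** Hypothetical stand-in for the CommandInterpreter hierarchy. */
    interface CommandInterpreter {
        String perform(String arguments);
    }

    // Command name (upper case) -> interpreter for that command.
    private final Map<String, CommandInterpreter> commands = new HashMap<>();

    void register(String name, CommandInterpreter interpreter) {
        commands.put(name.toUpperCase(Locale.ROOT), interpreter);
    }

    /** Splits a request into command name and arguments, then delegates. */
    String dispatch(String message) {
        String line = message.trim();
        // The command name occupies at most the first four characters.
        int nameLength = Math.min(4, line.length());
        String name = line.substring(0, nameLength).trim().toUpperCase(Locale.ROOT);
        String arguments = line.substring(nameLength).trim();

        CommandInterpreter interpreter = commands.get(name);
        if (interpreter == null) {
            return "502 Command not implemented.";
        }
        return interpreter.perform(arguments);
    }

    public static void main(String[] args) {
        CommandDispatchSketch factory = new CommandDispatchSketch();
        factory.register("NOOP", arguments -> "200 Command okay.");
        factory.register("USER", arguments -> "331 User name okay, need password.");

        System.out.println(factory.dispatch("USER alice"));
        System.out.println(factory.dispatch("NOOP"));
        System.out.println(factory.dispatch("XYZZ"));
    }
}
```

Keeping the name-to-interpreter table separate from the parsing of each command's arguments mirrors the separation of knowledge described above: the table knows which names exist, while each interpreter knows only its own argument syntax.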
[Footnote 3, on the preceding description of CommandFactory: The knowledge of the encoding of the command within the message, and the knowledge of the correspondence between a particular four-letter sequence and a given subclass of CommandInterpreter, are intimately connected but different. Therefore, CommandFactory could have been designed as a module containing two smaller modules, one for each of these two different pieces of knowledge. This situation is similar to the example of reducing makeContributions on page 31.]

Each ControlConnection is much simpler in this design than in the purely object-oriented one. A ControlConnection only reads from its socket or writes to its socket when it is explicitly told to do so. ControlConnection knows nothing of the rest of the system. DataConnection is similarly simple, knowing only how to read or write files. Extra pieces of behaviour are added to it through the DataConnectionArchitecture mapset: the files being read and written must be filtered according to the current representation type; and the data connection must be set up according to the remote address and data port for the client.

Each subclass of CommandInterpreter is responsible for parsing the arguments to one of the FTP commands; exceptions are thrown if the arguments do not conform to the correct syntax. The mapset attached to each command interpreter intercepts any thrown exceptions and sends replies in their place. It also intercepts other outgoing messages, inserting additional communication with the client as required by the protocol for each command; for example, the RETR command requires that the client be notified when the data connection is about to be opened. The ReplyMarshal mapset around Session is
[Modified UML class diagram. Classes shown: Server, ServerException, Session, SessionException, ControlConnection, DataConnection, RepresentationType, ASCIINonPrintRepresentationType, ImageRepresentationType, CommandFactory, and CommandInterpreter with its subclasses DataPortCommandInterpreter, FileStructureCommandInterpreter, LogoutCommandInterpreter, ModeCommandInterpreter, NoopCommandInterpreter, RepresentationTypeCommandInterpreter, RetrieveCommandInterpreter, StoreCommandInterpreter, and UserNameCommandInterpreter. Mapsets shown: ServerArchitecture, SessionArchitecture, ReplyMarshal, ControlConnectionArchitecture, DataConnectionArchitecture, Singleton, CommandFactoryProtocol, StandardProtocol, RetrieveProtocol, StoreProtocol, LogoutProtocol, UserNameProtocol, and CommandAction.]

Figure 6.9: Initial design for the implicit context version of the FTP server, Part 1 of 2.
[Modified UML class diagram. Reply, a subclass of java.lang.Throwable, and its subclasses ArgumentSyntaxErrorReply, CannotOpenDataConnectionReply, ClosingControlConnectionReply, CommandNotImplementedForPassedArgumentReply, CommandOKReply, ConnectionEstablishmentReply, FileUnavailableReply, LocalErrorReply, OpeningDataConnectionReply, ServiceNotAvailableReply, and SyntaxErrorReply, each with a Singleton mapset attached.]

Figure 6.10: Initial design for the implicit context version of the FTP server, Part 2 of 2.

responsible for actually translating the Reply objects into messages and handing them to the ControlConnection.

Since the use of the Singleton design pattern is merely an optimization within the system, I chose to design all the classes to treat the Reply and RepresentationType subclasses as though they were not singletons. Instead, attempts to instantiate any Reply or RepresentationType subclass were intercepted and replaced with a call to the first-created instance of that subclass. The remaining mapsets performed minor adjustments, such as dealing with exceptions, making small name adjustments, and handling the finer points of stream-based communication over network sockets (for example, buffering for efficiency).

My implementation expressed the essential structure of the modules in this version of the FTP server. The most basic, and least likely to change, features of FTP were encapsulated in these modules. More flexible details of protocol have been separated and added externally.

Selecting New Features

RFC 959 describes various commands and features not required of a Minimum Implementation.
Some of these are simple additions, accommodated by adding new subclasses to the hierarchy of commands; we would expect an object-oriented design to handle these with ease. Others are not so straightforward, requiring modifications to multiple parts of the system. I selected two of these more problematic features, authentication and passive mode.

Authentication involves ensuring that a user is who they claim to be. FTP provides for authentication through its PASS command, through which a user can provide a password. Most FTP commands require that the user be authenticated prior to their usage, the exceptions being USER, QUIT, and PASS itself. FTP implicitly specifies a state machine for authentication, shown in Figure 6.11.

The server is normally responsible for establishing the data connection. That is, the server creates a socket and connects to a particular port on a remote machine. In passive mode, the server waits, listening at a specific port for the client or other server to establish the data connection. Passive mode thus affects any functionality involving the data connection. The state machine involved in passive mode is shown in Figure 6.12.

Figure 6.11: The FTP-specified state machine for authentication. The three states are unauthenticated (U), awaiting password (P), and authenticated (A).

Figure 6.12: The FTP-specified state machine for passive mode. The two states are active (ac) and passive (pa).

Evolving the Versions to Add the New Features

I evolved my initial designs to add the authentication and passive mode features. I used purely object-oriented techniques to evolve the object-oriented version, and a combination of object-orientation and implicit context to evolve the other version. I describe, in turn, the changes required to each.

Object-oriented version.
The changes to the design of the purely object-oriented version, resulting from the addition of the authentication and passive mode features, are shown in Figures 6.13 and 6.14; new classes and altered classes are highlighted in grey. Five new subclasses of Reply were also added in a straightforward manner; thus, I do not show the Reply hierarchy for the evolved design.

Authentication required a number of additions to the TransferContext class. Previously, I had not bothered to store the user name passed in the USER command because FTP did not require its use within the Minimum Implementation. In my evolved design, I added fields to record the user name and whether the user is logged in. I also added setter and getter methods for this state. A PasswordCommand was added to the Command hierarchy to parse the arguments to the PASS command. A class called Authenticator was created to perform the actual
[UML class diagram. Classes as in Figure 6.7. DataConnection gains a passiveSocket field and a setPassiveMode method; TransferContext gains the fields userNameIsSet, userName, and isLoggedIn, and the methods getUserName, setUserName, unsetUserName, hasUserName, isAuthenticated, and setAuthenticated.]

Figure 6.13: Evolved design for the object-oriented version of the FTP server, Part 1 of 2.

[UML class diagram. CommandFactory and the Command hierarchy of Figure 6.8, with the added classes PassiveModeCommand, PasswordCommand, and Authenticator (method isValid(String, String)).]

Figure 6.14: Evolved design for the object-oriented version of the FTP server, Part 2 of 2.

validity check on a username/password pair. PasswordCommand uses Authenticator to perform this validity check, storing the results in the current TransferContext. The UserNameCommand had to be altered to store the arguments passed in the USER command into the current TransferContext object. Its reply to the client was also changed from "Command OK" to "Password needed." Every other command, save QUIT, must be authenticated prior to its operation.
Therefore, every subclass in the Command hierarchy had to be altered to check that the user had been logged in. If not, a reply of "User not logged in" had to be sent to the client. For most commands, this involved the insertion of a few lines of code at the beginning of the corresponding perform method.

To add passive mode, the DataConnection class required extensive modifications. First, a method and a field had to be added to allow passive mode to be set. The passiveSocket field records the socket used to listen for attempts by the client or other servers to establish the data connection. The send and receive methods had to be modified to test for passive mode. If passive mode was set, the server listened at the passive socket. Once the data connection was established, the passive socket was closed, causing the server to return to active mode.

I added a PassiveModeCommand to the Command hierarchy. This class parses the arguments to the PASV command (none are permitted) and tells the current DataConnection instance that passive mode is set. CommandFactory was altered to store an instance of each of the two new subclasses of Command, just as it stores an instance of all the original subclasses of Command.
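To make the cost of this kind of change concrete, the following Java sketch shows the shape of the guard that has to be inserted at the start of each perform method. It is a hedged illustration rather than the code of the case study: the reduced TransferContext, the two command classes, and the choice to return the reply text directly (and to pass the TransferContext instead of a Session) are assumptions made to keep the example self-contained. Reply strings follow RFC 959.

```java
public class AuthGuardSketch {
    /** Hypothetical, much-reduced stand-in for TransferContext. */
    static class TransferContext {
        private boolean authenticated;

        boolean isAuthenticated() { return authenticated; }
        void setAuthenticated(boolean value) { authenticated = value; }
    }

    /** Hypothetical stand-in for the abstract Command class. */
    abstract static class Command {
        abstract String perform(String arguments, TransferContext context);
    }

    static class NoopCommand extends Command {
        @Override
        String perform(String arguments, TransferContext context) {
            // Guard inserted during evolution; not part of NOOP itself.
            if (!context.isAuthenticated()) {
                return "530 Not logged in.";
            }
            if (!arguments.isEmpty()) {
                return "501 Syntax error in parameters or arguments.";
            }
            return "200 Command okay.";
        }
    }

    static class ModeCommand extends Command {
        @Override
        String perform(String arguments, TransferContext context) {
            // The same guard, duplicated in a second class.
            if (!context.isAuthenticated()) {
                return "530 Not logged in.";
            }
            if (!arguments.equalsIgnoreCase("S")) {
                return "504 Command not implemented for that parameter.";
            }
            return "200 Command okay.";
        }
    }

    public static void main(String[] args) {
        TransferContext context = new TransferContext();
        Command noop = new NoopCommand();
        System.out.println(noop.perform("", context));
        context.setAuthenticated(true);
        System.out.println(noop.perform("", context));
        System.out.println(new ModeCommand().perform("S", context));
    }
}
```

Each guard is small, but it is repeated in every command class and ties each one to the authentication state, which is why the change counts as an invasive modification to many classes at once.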
[Modified UML class diagram. Classes shown: Server, ServerException, Session, SessionException, ControlConnection, DataConnection, RepresentationType, ASCIINonPrintRepresentationType, and ImageRepresentationType. Mapsets shown: ServerArchitecture, SessionArchitecture, ControlConnectionArchitecture, DataConnectionArchitecture, and Singleton.]

Figure 6.15: Evolved design for the implicit context version of the FTP server, Part 1 of 2.

Implicit context version. The changes to the design of the version using implicit context, resulting from the addition of the authentication and passive mode features, are shown in Figures 6.15 and 6.16; new classes and altered classes are highlighted in grey. As with the object-oriented version, five subclasses of Reply were also added and are not shown.

Authentication required the introduction of the PasswordCommandInterpreter class to parse the arguments passed for the PASS command, plus the attendant PasswordProtocol mapset to return a standard reply to the client. Beyond this, authentication was enabled through the application of three mapsets: UserNameAuthentication, PasswordAuthentication, and AuthenticationCheck. Each of these mapsets expressed a portion of the state machine shown in Figure 6.11, as described below.

UserNameAuthentication is the simplest of these mapsets. We can see from the state machine that any occurrence of a USER command results in the system transitioning to the state where it awaits a password.
Therefore, this mapset merely intercepts the reply "Command OK" sent from the UserNameCommandInterpreter class and replaces it with "Password needed."

[Modified UML class diagram. CommandFactory and the CommandInterpreter hierarchy of Figure 6.9 with their protocol mapsets, plus the added classes PasswordCommandInterpreter (mapsets PasswordProtocol and PasswordAuthentication), PassiveModeCommandInterpreter (mapset PassiveModeProtocol), and Authenticator (method isValid(String, String)). UserNameCommandInterpreter gains the UserNameAuthentication mapset; the AuthenticationCheck and CommandAction mapsets surround the hierarchy.]

Figure 6.16: Evolved design for the implicit context version of the FTP server, Part 2 of 2.

The other two mapsets require us to determine which state the session is in. For the session to be in an authenticated state, two properties must hold. First, the most recent occurrence of the USER command must have been immediately followed by an occurrence of the PASS command. Second, the password sent as an argument to this PASS command must be valid for the user name sent as an argument to this USER command. Finding the most recent occurrence of the USER command requires a communication history query to find the most recent call for the current thread to the perform method on the UserNameCommandInterpreter.
If the next call to a perform method on any subclass of CommandInterpreter happens to be to PasswordCommandInterpreter, we know that a PASS command immediately followed the USER command. The communication history queries can retrieve the user name and password arguments passed to these methods. The validity check is performed by a method on the Authenticator class, added for this purpose.

The PasswordAuthentication mapset captures the "Command OK" reply issuing from the PasswordCommandInterpreter, and follows the procedure above to determine if the session is authenticated. If it is, the captured reply is replaced with "User logged in"; if it is not, the captured reply is replaced with "Not logged in."

The AuthenticationCheck mapset intercepts requests to perform any command except USER, PASS, or QUIT. It checks that the session is authenticated using the procedure above. If it is, the intercepted message is allowed to proceed; if it is not, the reply "Not logged in" is sent to the client.

The passive mode feature required the addition of a PassiveModeCommandInterpreter, to parse the arguments to the PASV command (none are permitted), with its attendant PassiveModeProtocol mapset to translate exceptions into the proper reply messages. The CommandAction mapset was modified to intercept the "Entering passive mode" reply sent from the PassiveModeCommandInterpreter. The real reply needs to contain the address and port number of the socket on which the server will be listening for the establishment of the data connection. This information is added to the reply through communication history queries. Finally, the DataConnectionArchitecture mapset needed a boundary map added to it to intercept attempts to create a socket.
If the session is in passive mode, the server listens for the data connection to be established; otherwise, it calls to proceed with the socket creation.

Cross-evolution: Adding Features to the OO Version through Implicit Context

Finally, I took the initial design and implementation of the version using only object-orientation and added the authentication and passive mode features to it with implicit context; the changes are shown in Figure 6.17.

[Modified UML class diagram. DataConnection with the PassiveMode mapset; CommandFactory; Authenticator (method isValid(String, String)); Command and its subclasses, including the added PassiveModeCommand (mapset PassiveMode) and PasswordCommand (mapset PasswordAuthentication), and UserNameCommand with the UserNameAuthentication mapset.]

Figure 6.17: Cross-evolved design of the FTP server. Much of the design that remains unchanged is not shown, nor are the five added Reply subclasses.

Two new Command subclasses were added, PasswordCommand and PassiveModeCommand, and an Authenticator class to test username/password pairs for validity, as in the evolved object-oriented version of the system. Likewise, CommandFactory was altered as before, and five subclasses of Reply were added that are not shown in the diagram.
The mapsets UserNameAuthentication, PasswordAuthentication, and AuthenticationCheck were effectively the same as in the evolved implicit context version of the system. The biggest difference between the cross-evolved version and implicit context version of the system lies in the use of the PassiveMode mapset. PassiveModeCommand was implemented as though it were communicating directly with the DataConnection. When DataConnection attempts to create a socket, this attempt is intercepted to see if the session is in passive mode. If it is, the DataConnection listens at a socket for the data connection to be established; if it is not, the message to create the socket is called to proceed.

6.2.3 Results and Lessons Learned

Table 6.2 shows some statistics for the case study. The total number of lines of code for the versions using implicit context tends to be slightly greater than the number of lines of code for the versions using only object-orientation. This is an artifact due to two limitations

                              Source   Boundary maps                       Tool     Invasive
                              size     all     all   all       renaming    dirs.    mods.
    Version   Approach        (loc)    (loc)   (#)   (# CHQs)  (loc, #)    (loc)    (#)   (# cls)
    Initial   Pure OO         1,404
    Initial   OO + IC           848      909   120      39        29        121      0       0
    Evolved   Pure OO         1,722                                                 16      11
    Evolved   OO + IC           941    1,169   133      57        29        144      1       1
    Cross                     1,602      196    34      10        22         31      1       1

Table 6.2: Statistics on synthesizing and evolving the FTP server. Numbers represent one of: number of lines of source code (labelled "loc"), including blank and comment lines; number of instances (labelled "#"); number of queries to communication history (labelled "# CHQs"); or, number of classes that were modified (labelled "# cls"). Statistics for renaming maps are given in addition to the statistics for all boundary maps combined. Each renaming map occupies one line of code.
The number of invasive modifications to the code that were required to evolve the server is shown in the rightmost column; each deletion or insertion of a consecutive set of lines of code is counted as one modification.

of the tool, a lack of wildcarding and a lack of direct support for subtyping; both these limitations lead to the need to duplicate a significant number of boundary maps and tool directives. When the numbers are modified to eliminate such duplicates, the total number of lines of code for each approach is roughly equal.

Invasive Modifications

More significant to this case study is the number of invasive modifications that needed to be made to each version to accommodate the new features. The purely object-oriented approach required a total of 16 invasive modifications among 11 classes. The two approaches that evolved the FTP server with implicit context required only a single instance of invasive modification [4]. While 16 modifications may not sound like many, realize that this was for only one step of evolution. Systems undergo continual evolution throughout their existence [Lehman and Parr, 1976; Belady and Lehman, 1976]; each stage of changes would require further invasive modifications. Such invasive modifications tend to cause software structure to degrade, making each successive evolution step harder [Lehman, 1974, 1980; Eick et al., 2001].

Avoiding Planning for Change

One might be tempted to interpret the results of this case study in the wrong way. While I say that I wish to avoid having to plan for change, it is important to be clear about my meaning. Change happens and it is important to plan for it, but problems occur when we assume that specific changes will happen. For example, in the object-oriented version of the FTP server, I created a class hierarchy for different commands.
This hierarchy is a natural structure to use for the sake of abstraction within the initial version of the server; it also happens to be easy to add new subclasses to it to support one form of evolution. The initial version using object-orientation also provides a class hierarchy to support different structures of data being communicated over the data connection; however, there is only one kind of data structure actually supported. The hierarchy is present in the latter case to explicitly worry about the possibility of specific future extensions, i.e., new structures for data. Thus, I expended time and effort in an attempt to support future extensions that may never happen, which is a poor investment.

[Footnote 4, on the single invasive modification required when evolving with implicit context: This one was due to the fact that the prototype tool does not support the capture of initializer executions.]

Therefore, one might think that it is not necessary to synthesize our modules without extraneous embedded knowledge, that trying to reduce our modules to their minimum would be an act of planning for change. After all, it is possible to have our modules express essential structure in emergent fashion (e.g., using implicit context to evolve the purely object-oriented version of the FTP server), providing a means for evolving existing systems. We can see from Table 6.2 that the evolved version of the FTP server using implicit context actually required the addition of more boundary maps than did the cross-evolved version, both in terms of lines of code (260 vs. 196) and number of non-renaming maps (13 vs. 12).

However, partially expressing essential structure is not ideal. For starters, the numbers above are slightly deceiving; 64 of those lines of code (constituting 4 boundary maps) that were added to the initial implicit context version are necessary only because of a lack of support for wildcarding in the tool. But more significantly, the modules of the cross-evolved version continue to express knowledge beyond their emergent essential structure.
Further evolution steps would still need to cope with this remaining EEK, as would any attempt at reusing these modules. By building our modules to avoid EEK in the first place, we are not planning for specific changes, we are simply avoiding worrying about the future. Through implicit context, we can make each module do exactly what it should and nothing more.

Communication History

The initial version of the FTP server using implicit context made 39 queries to communication history. Of these, 13 were used in the Singleton mapset and 13 were used in the ReplyMarshal mapset. Each of these mapsets contained 13 boundary maps to deal with each of the different subclasses of Reply; hence, these 26 queries would be reduced to 2 if the tool dealt with subclasses explicitly and/or supported wildcarding.

Most communication history queries here involved retrieving the current object (i.e., the most recently created object in the current thread) representing the state of the FTP session, such as the current DataConnection, current remote address for the client, or the current Session. A few queries involve determining the current state of the system by testing for the order of a small set of calls. For example, a small state machine is specified by the boundary map in Figure 6.18. Here, the current representation type is defined to be ASCII non-print by default, unless the client has most recently explicitly specified that it should be image type. RepresentationTypeCommandInterpreter must know how the arguments to the TYPE command correspond to different representation types; however, it should not know what the system in which it exists does with the representation types. Therefore, it simply calls a non-existent method to set the representation type appropriate to the arguments passed with the TYPE command.
The boundary map in Figure 6.18 takes this event and translates it into an instantiation of the appropriate subtype.

  // Replace attempts at instantiating RepresentationType with
  // an instantiation of one of its subclasses.
  static RepresentationType map(): out(RepresentationType.new()) {
    // Find the last call that attempted to set image
    // representation type, if such exists.
    Call imageCall =
      History.lastCall(null, "setImageRepresentationType", null);

    // Find the last call that attempted to set ASCII
    // representation type, if such exists.
    Call asciiCall =
      History.lastCall(null, "setASCIINonPrintRepresentationType", null);

    // By default, we use ASCII representation type. So, if there
    // has been no attempt at setting image type, we're done.
    // Otherwise, the attempt at setting image type needs to be
    // more recent than any attempt at setting ASCII type for image
    // type to be current.
    if(imageCall == null ||
       asciiCall != null && imageCall.precedes(asciiCall))
      return new ASCIINonPrintRepresentationType();
    return new ImageRepresentationType();
  }

Figure 6.18: Using communication history to implement the Abstract Factory design pattern. Attempts to instantiate the abstract class RepresentationType are intercepted and replaced with an instance of one of its subclasses. The choice of subclass is based upon which representation type the client most recently specified, as determined through the communication history queries.

In the evolved version, there were 18 additional queries to communication history: 2 of these were used to find the local port number on the socket used for passive mode so the client could be given this information; 4 were used for extensions to the ReplyMarshal mapset for the added subclasses of Reply (see footnote 5); 6 were used for supporting the authentication protocol; and 6 were used to have the DataConnection's attempt to create a socket create a passive mode socket when appropriate.

The boundary maps making use of the state machines for authentication and passive mode (Figures 6.11 and 6.12) were by far the most complicated. For example, Figures 6.19 and 6.20 show the boundary map applied to all command interpreters that must respond "Not logged in" if the session has not been authenticated; this boundary map operates as follows. The most recent occurrence of the PASS command is located as an execution of the perform method on PasswordCommandInterpreter. If none such is found, the session is definitely unauthenticated. If the client has sent the PASS command, two conditions must be met: (1) the password in it must have been validated and (2) the PASS command must be immediately preceded by the most recent occurrence of the USER command. We determine that the password has been validated by finding the call to the isValid method of Authenticator and making sure it has returned true (which is recorded as its object equivalent, Boolean.TRUE). We locate the most recent occurrence of the USER command and the occurrence of any command that immediately precedes the PASS command located earlier. These must be one and the same for the session to be authenticated.

The boundary maps shown in Figures 6.18 and 6.19-6.20 are relatively complicated and hard to understand, but this is largely a product of clumsy syntax. All that I am really doing here is pattern matching on the communication history, plus a couple of specific tests on the calls that matched the pattern of interest.
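The pattern-matching reading of these history queries can be sketched with ordinary regular expressions. The following is only an illustration under stated assumptions: the history is flattened to a space-separated string of hypothetical event names (IMAGE, ASCII, USER, PASS, and anything else), and the class and method names are invented for this example rather than taken from the prototype tool. The sketch covers only the ordering constraints; the separate test on the return value of isValid is not modelled.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not the prototype tool's query API.
class HistoryPatterns {
    // Image type is current iff some IMAGE event exists and no ASCII
    // event occurs after it.
    private static final Pattern IMAGE_CURRENT =
        Pattern.compile("(^| )IMAGE( (?!ASCII( |$))\\S+)*$");

    // The ordering condition for authentication: a USER event is
    // immediately followed by a PASS event, and neither USER nor PASS
    // occurs afterwards.
    private static final Pattern USER_THEN_PASS =
        Pattern.compile("(^| )USER PASS( (?!(USER|PASS)( |$))\\S+)*$");

    static boolean imageIsCurrent(String history) {
        return IMAGE_CURRENT.matcher(history).find();
    }

    static boolean userThenPass(String history) {
        return USER_THEN_PASS.matcher(history).find();
    }

    public static void main(String[] args) {
        System.out.println(imageIsCurrent("ASCII IMAGE LIST")); // true
        System.out.println(imageIsCurrent("IMAGE ASCII"));      // false
        System.out.println(userThenPass("USER PASS LIST"));     // true
        System.out.println(userThenPass("USER PASS USER"));     // false
    }
}
```

Each pattern is anchored at the end of the history, which is what makes it a statement about the current state rather than about any earlier point in the session.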
For the representation type, we see that the session is to use the ImageRepresentationType if and only if the communication history matches a pattern that is essentially "image [^ascii]* $". For the session to be authenticated, the communication history must match a pattern that is essentially "USER PASS [^USER, PASS]* $". In Chapter 7, I examine how a different style of communication history query can be supported that directly expresses such pattern matching.

  // Ensure that the session is authenticated before allowing
  // a command to be performed.
  void map(): in(perform(String)) {
    Thread thread = Thread.currentThread();

    // Find the most recent occurrence of processing the PASS
    // command.
    Call passCall =
      History.lastCall(thread, PasswordCommandInterpreter.class,
                       "perform", null, null);

    // If the client has issued no PASS command, the session
    // is definitely unauthenticated.
    if(passCall != null) {
      // If the session has been authenticated, the isValid
      // method on Authenticator must have been called as a
      // result of the PASS command. Find it.
      Call lastTest =
        History.lastCallInCFlow(thread, Authenticator.class,
                                "isValid", null, null, passCall);

  (continued in next figure)

Figure 6.19: Boundary map applied to command interpreters to alter their behaviour on the basis of whether the session has been authenticated, Part 1 of 2. See main text for discussion.

  (continued from previous figure)

      // If isValid was not called, the session is
      // unauthenticated.
      if(lastTest != null) {
        // Find the last occurrence of processing the USER
        // command.
        Call userCall =
          History.lastCall(thread, UserNameCommandInterpreter.class,
                           "perform", null, null);

        // Find the last command that was processed just
        // before the PASS command found above.
        Call predecessorOfPass =
          History.lastCallAnySubclass(thread, CommandInterpreter.class,
                                      "perform", null, null,
                                      passCall.getPredecessor());

        // For the session to be authenticated, the isValid
        // test must be true, and the last USER command must
        // have immediately preceded the last PASS command.
        if(lastTest.getReturnValue().equals(Boolean.TRUE) &&
           userCall == predecessorOfPass) {
          CONTEXT.proceed();
          return;
        }
      }
    }

    // Otherwise, the session is unauthenticated.
    Context.send(new NotLoggedInReply());
  }

Figure 6.20: Boundary map applied to command interpreters to alter their behaviour on the basis of whether the session has been authenticated, Part 2 of 2.

[Footnote 5: There were 5 subclasses of Reply added; I forgot to add a boundary map to ReplyMarshal for one of these, which is another good reason why the tool should deal directly with subclasses. Also, I did not turn the added subclasses into Singletons; this was a decision not to bother, as the system worked fine without the Singleton optimization.]

Boundary Map Organization

I originally envisioned mapsets as constructs of convenience, with little understanding of how boundary maps should be organized into them. The structure of the mapsets used for the FTP server reflected this lack. I did attempt to impose some vague order on the basis of the shared purpose of a set of boundary maps, but it was insufficient. A mapset is a transformation from one (partial) world view to another. As such, one must have a clear picture of what the initial and final world views are; if one does not, it is hard to maintain a mental model of what the mapsets do. This is why it is easy, at least at an abstract level, to understand the purpose of the AuthenticationCheck mapset, but not that of CommandAction.

In the evolved version of the FTP server originally constructed with implicit context, I modified some of the existing mapsets for two reasons.
First, I once again needed to compensate for the lack of call site instrumentation by the prototype tool. Second, I decided to add boundary maps directly into these existing mapsets. However, these additions should have been, and could have been, made externally to the existing mapsets. This was an oversight at the time, as I failed to recognize the importance of avoiding invasive modification throughout the program, mapsets included.

Overall

I successfully evolved both versions of the FTP server to incorporate the "unanticipated" addition of the authentication and passive mode features. This is not surprising: put sufficient effort into modifying a system, even to the point of rewriting it in toto, and one can have that system accommodate arbitrarily complex changes. However, remember that we wish to minimize the number and scope of the changes we make, and on this score, implicit context aids us, as seen in the results.

6.3 Summary

I conducted two case studies to provide evidence in support of the thesis of this dissertation. In the first case study, I selected a feature in an existing system and reused it in a different system of markedly different character, without reusing parts of the original system in which I was not interested. This feature was the Outline View of the Eclipse integrated development environment, described in some detail in Chapter 2. By using implicit context to express the emergent essential structure of the Outline View, I was able to cancel out the dependences upon Eclipse that were created by the extraneous knowledge embedded there. The Outline View was not modularized in a fashion amenable to its reuse; the designers of Eclipse did not plan for it to be reused in this way. Nevertheless, implicit context allowed for its reuse, which would not have been possible when limited to the state of the art.
In the second case study, I designed and implemented parallel versions of a system: one version using only object-orientation (OO) and one version using object-orientation plus implicit context. I then evolved these versions, each using the same approach with which it was originally constructed. The original version consisted of the Minimum Implementation of the File Transfer Protocol (FTP) as defined by the FTP specification; my designs ignored the existence of the remainder of the FTP specification. I had colleagues vet the initial OO design to ensure that it was reasonable. I selected two features from the remainder of the FTP specification, authentication and passive mode, and evolved the versions to include them. In addition, I "cross-evolved" the initial OO version using implicit context.

While all three evolved versions were able to incorporate the new features, the OO version required extensive invasive modifications. The versions evolved through implicit context were able to avoid all but one invasive modification, and that one was due to a lack of support in my prototype tool. The cross-evolved version demonstrates that implicit context can be adopted after the initial construction of a system. However, only the modules of the version initially constructed with implicit context are maximally free of EEK. Future evolution of these systems, inevitable in the real world, would require more and more invasive modifications, at great difficulty and risk, of the versions not initially constructed with implicit context. The EEK contained therein fully remained in the evolved OO version: for example, an instance of Session still needs to be passed to each Command subclass, the entire system needs to know that the Reply subclasses are each Singletons, every Command subclass holds a piece of the authentication protocol, etc.
EEK partially remained in the cross-evolved version (e.g., an instance of Session still needs to be passed to each Command subclass, and the entire system needs to know that the Reply subclasses are each Singletons) since I only cancelled out the minimum amount of EEK hindering the immediate evolution step being performed.

What remains is to make the use of implicit context fully practical. I have identified a number of shortcomings in the prototype tool through the case study, beyond those described earlier in Chapter 5. I now proceed to discuss how these may be addressed in future work.

Chapter 7

Discussion

I have alluded to a number of issues that revolve around the question of what a module represents in isolation from its context of operation. In this chapter, I begin in Section 7.1 by describing a representation for the "meaning" of modules in isolation from their systems; this representation is based on the idea of local execution, the specification of the behaviour defined by a module without reference to its context. This section goes on to explain how implicit context can be used to support such localized reasoning about modules.

Reasoning about modules through appeals to local execution is only useful if such reasoning can be transformed to the global perspective of a system. Communication history can be used to bind a module in a flexible way to a novel context of operation. This binding is a specification of minimal dependences between a module's local execution and its context's execution. Section 7.2 describes this property of communication history and how it and local execution can be used to efficiently support communication history queries. Section 7.3 then points out how aspect-oriented programming can be described in terms of local execution and minimal dependences, and compares it to implicit context.

7.1 Local Execution

A program is a specification of an execution [Hehner, 1999].
The lexicon and syntax of a programming language are used to interpret a string (the program) as a set of instructions; the semantics of the language places constraints on how to perform each instruction and on the order in which instructions are to be performed. A virtual machine reads each instruction, performs it, and selects the next instruction according to the semantic constraints imposed by the language.

A module is a program fragment; therefore, a module is an execution specification fragment. However, whether one can attribute any meaning to a module in isolation, as separated from the context of any particular system, depends upon the way one interprets this statement. There are two possibilities: either a module is a fragment of an execution specification, or a module is a specification of an execution fragment. Let us consider whether that difference is significant.

If we regard a module as merely a fragment of a specification, we cannot expect it to necessarily have much meaning in isolation. For example, in the Abstract Factory design pattern, some class must act the role of Client; that Client class will possess instructions similar to the following:

  public void doit(AbstractFactory factory) {
    Product p = factory.makeProduct();
  }

However, while the use of particular names here is intended as an aid to understanding what is going on in the system, this code is equivalent to the following, where the names have simply been changed:

  public void a(A obj) {
    B b = obj.f();
  }

It is not until we consider some higher-level module containing this Client class that the presence of the Abstract Factory design pattern, or the part that the code snippet above plays in it, becomes definite. In other words, we would have to consider the other specification fragments (modules) that our module depends upon to make significant definite statements about it.
But a module does say something in isolation about the execution of the program of which it is a part. A set of "arbitrary" calls (from the perspective of an isolated module) may indeed only possess a meaning that we would consider significant when those calls exist in a larger context. That we cannot ascribe greater significance to that set of calls may bring into question our choice of modularization. However, when we read the source text of such a module in isolation, it does not say anything more than that this set of "arbitrary" calls is to take place. The snippet of code from the Client class shown above simply makes a call on an object passed as an argument, and stores the result; if there is anything more going on (such as the presence of the Abstract Factory design pattern), the clues lie in the parts of the system not seen above.

Rather than disavowing any knowledge of the meaning of a module in isolation from any given system, we can consider the module in isolation to be a complete program written in a special-purpose language. There is a virtual machine that interprets the instructions in this language [Goldberg, 1973a,b]. It is defined by the context in which the module executes; that context consists of such items as the other modules present, the state of the system, higher-level virtual machines, and the platform on which the system is executing.

Some instructions may be invariant under a large set of contexts. For example, consider a module written using Java syntax that utilizes primitive arithmetic operators: the behaviour resulting from those operators is invariant under all contexts that adhere to standard Java semantics. However, if that module were to execute in a context that did not adhere to standard Java semantics, the behaviour of an arithmetic operator might be overridden, perhaps in the fashion of C++ [ISO/IEC, 1998: p. 231ff.].
The presence of an arithmetic operator is simply an instruction to the local virtual machine, to be interpreted as appropriate to the situation. It does not matter from the local module's perspective how its instructions are implemented; only the data that are input to it (through formal parameters, global variables, or return values) will affect its local computation.

With this picture, we can consider a module to be a specification of a local view on a fragment of its program's execution; that is, a module is a specification of local execution. This local execution is invariant for a given set of inputs to the module. The arithmetic operator will always be called at the same point in the module, but the actual behaviour caused by that call will vary depending on the context of the module. This is not to say that the execution of a module will be invariant from a global perspective, though.

To see how the local perspective maps to the global perspective, I use event traces. Section 7.1.1 introduces event traces and their use in representing local execution. Section 7.1.2 demonstrates how contextual dispatch permits the reconciliation of local and global execution.

7.1.1 Event Traces

I have described the execution of a module in terms of the instructions performed by a virtual machine that is defined by the context of the module. However, another representation exists with which to picture an execution: an event trace. An event trace can be thought of as a ticker tape, where time monotonically increases in one direction along the tape. The events transcribed upon the tape are the instructions performed by a virtual machine in the order in which they are performed. Consider this simple Java program:

  public class Simple {
    public static void main(String[] args) {
      Simple s = new Simple();
      s.doit();
    }

    void print() {
      System.err.println("Hello world!");
    }

    void doit() {
      for(int i = 0; i < 2; i++) {
        print();
      }
    }
  }

An event trace for the execution of this program might look like the following (see footnote 1), where time increases downwards. I have used indentation to help indicate the nesting of executions.

  Enter: Simple.main(String[])
    Enter: Simple()
    Exit: Simple()
    Enter: Simple.doit()
      Enter: Simple.print()
      Exit: Simple.print()
      Enter: Simple.print()
      Exit: Simple.print()
    Exit: Simple.doit()
  Exit: Simple.main(String[])

[Footnote 1: Java supplies a default constructor for concrete classes where one has not been explicitly declared.]

This is a very simple trace, showing only the entries and exits to methods within the class Simple. A more complex trace might show the calls to and returns from these methods, the execution of the instructions to print "Hello world!", the creation of the instance of Simple, assignments, comparisons, field accesses, garbage collection, class loads, etc.

Different event traces are possible for the same execution, with widely differing levels of detail. For example, the trace produced by one particular Java Virtual Machine (JVM) (see footnote 2) on the simple program shown above consists of nearly 9,000 lines of text. Even so, this trace is fairly high-level, displaying only such events as method entries and exits, object creations, garbage collection, class loads and unloads, and field accesses; other, lower-level events, such as the execution of an arithmetic operator or an assignment statement, are not shown.

[Footnote 2: Sun Classic VM, version 1.2.2, build 1.2.2.006, green threads, nojit; executed as "java -Xrunhprof Simple" at the command line.]

The low-level events that do not appear in an event trace can sometimes be implied. For example, in order that a method entry occur, a call to that method must have happened; the trace produced by the JVM does not happen to record this call as a separate event.
Thus, an alternative trace to the one I showed above is as follows, where method calls and returns have been added. I explain the significance of the events marked with an asterisk below.

  Call: Simple.main(java.lang.String[])
  Enter: Simple.main(String[])
    Call: Simple()
    Enter: Simple()
    Exit: Simple()
    Return: Simple()
    Call: Simple.doit()
*   Enter: Simple.doit()
*     Call: Simple.print()
      Enter: Simple.print()
        Call: System.err.println(java.lang.String)
        Return: System.err.println(java.lang.String)
      Exit: Simple.print()
*     Return: Simple.print()
*     Call: Simple.print()
      Enter: Simple.print()
        Call: System.err.println(java.lang.String)
        Return: System.err.println(java.lang.String)
      Exit: Simple.print()
*     Return: Simple.print()
*   Exit: Simple.doit()
    Return: Simple.doit()
  Exit: Simple.main(String[])
  Return: Simple.main(java.lang.String[])

In Java, an event trace for an entire program and an event trace for an individual module would be related in a straightforward manner. The full program trace that I have just given contains the event traces for the executions of the individual modules within the program; these local executions form subsequences within the full trace. Above, we can see the event trace for the Simple.doit method embedded within the full trace; it consists of the lines marked with an asterisk. This local execution is independent of the context in which Simple.doit exists.

To see more clearly this independence of local execution from context, consider an extension to the simple program given earlier. Now, we have a second class, Subsimple, which is a subclass of Simple. This new class overrides the implementation of the print method.
  public class Subsimple extends Simple {
    public static void main(String[] args) {
      Subsimple s = new Subsimple();
      s.doit();
    }

    void print() {
      System.err.println("Ciao world!");
    }
  }

The full program event trace for the execution of "java Subsimple" would be as follows, with the local events of Simple.doit marked with an asterisk.

  Call: Subsimple.main(java.lang.String[])
  Enter: Subsimple.main(String[])
    Call: Subsimple()
    Enter: Subsimple()
    Exit: Subsimple()
    Return: Subsimple()
    Call: Subsimple.doit()
*   Enter: Simple.doit()
*     Call: Simple.print()
      Enter: Subsimple.print()
        Call: System.err.println(java.lang.String)
        Return: System.err.println(java.lang.String)
      Exit: Subsimple.print()
*     Return: Simple.print()
*     Call: Simple.print()
      Enter: Subsimple.print()
        Call: System.err.println(java.lang.String)
        Return: System.err.println(java.lang.String)
      Exit: Subsimple.print()
*     Return: Simple.print()
*   Exit: Simple.doit()
    Return: Subsimple.doit()
  Exit: Subsimple.main(String[])
  Return: Subsimple.main(java.lang.String[])

Note the difference between the methods being called and the methods being entered; this is simply object-oriented dispatch taking place. The effect of the calls made by Simple.doit has changed in this new context; however, the local execution of Simple.doit remains unchanged: the same sequence of local events occurs. Thus, we can represent this local execution without the need to represent the remainder of the full program execution:

  Enter: Simple.doit()
    Call: Simple.print()
    Return: Simple.print()
    Call: Simple.print()
    Return: Simple.print()
  Exit: Simple.doit()

This representation begins to give us a flavour of independent world views.
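The invariance of the local trace can be checked mechanically. The sketch below is a hand-instrumented variant of the two classes, written for illustration only: the class names (TracedSimple, TracedSubsimple, LocalTraceDemo) and the two trace lists are assumptions of this example, and a real tracer would of course not require edits to the traced code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, hand-instrumented counterpart of Simple.
class TracedSimple {
    // Events visible from inside doit(): its entry, its exit, and its calls.
    final List<String> localTrace = new ArrayList<>();
    // All recorded events, including which print() body is actually entered.
    final List<String> globalTrace = new ArrayList<>();

    void print() {
        globalTrace.add("Enter: TracedSimple.print()");
        System.err.println("Hello world!");
        globalTrace.add("Exit: TracedSimple.print()");
    }

    void doit() {
        recordLocal("Enter: TracedSimple.doit()");
        for (int i = 0; i < 2; i++) {
            recordLocal("Call: TracedSimple.print()");
            print(); // object-oriented dispatch chooses the body entered
            recordLocal("Return: TracedSimple.print()");
        }
        recordLocal("Exit: TracedSimple.doit()");
    }

    // A local event is also part of the global trace.
    private void recordLocal(String event) {
        localTrace.add(event);
        globalTrace.add(event);
    }
}

// Hypothetical counterpart of Subsimple: only print() is overridden.
class TracedSubsimple extends TracedSimple {
    @Override
    void print() {
        globalTrace.add("Enter: TracedSubsimple.print()");
        System.err.println("Ciao world!");
        globalTrace.add("Exit: TracedSubsimple.print()");
    }
}

class LocalTraceDemo {
    public static void main(String[] args) {
        TracedSimple plain = new TracedSimple();
        TracedSimple sub = new TracedSubsimple();
        plain.doit();
        sub.doit();
        // The local view of doit() is identical in both contexts ...
        System.out.println(plain.localTrace.equals(sub.localTrace));   // true
        // ... while the global view differs in which print() is entered.
        System.out.println(plain.globalTrace.equals(sub.globalTrace)); // false
    }
}
```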
However, object-oriented dispatch is a limited means of expressing something akin to independent world views. After all, Java classes explicitly declare that they specialize other classes and they implicitly declare (through the absence of the final modifier) that they permit such specialization. To make the leap to truly independent world views and the reconciliation of local and global execution, we must consider a different example.

7.1.2 Reconciliation of Local and Global Execution

Once more, I consider the example of the Abstract Factory design pattern. We have a Client class that requires the creation of an instance of a Product class. We want this Client class to specify its essential structure; an invocation of the constructor of Product is as straightforward a specification of the creation of an instance as Java permits. The resulting Client class is shown below (for the sake of simplicity, I have reduced it to the minimum necessary to illustrate the point).

  public class Client {
    public void doit() {
      Product p = new Product();
    }
  }

The local execution of this module can be represented by the following event trace.

  Enter: Client.doit()
    Call: Product()
    Return: Product()
  Exit: Client.doit()

Let us proceed to consider using Client in different systems. In one situation, we embed the Client class in a context where Product is a concrete class, as shown below.

  public class Product {
  }

  public class Main {
    public static void main(String[] args) {
      Client c = new Client();
      c.doit();
    }
  }

Here, the full program execution is represented as a straightforward event trace, with the local execution of Client.doit marked with asterisks:

  Call: Main.main(java.lang.String[])
  Enter: Main.main(String[])
    Call: Client()
    Enter: Client()
    Exit: Client()
    Return: Client()
    Call: Client.doit()
*   Enter: Client.doit()
*     Call: Product()
      Enter: Product()
      Exit: Product()
*     Return: Product()
*   Exit: Client.doit()
    Return: Client.doit()
  Exit: Main.main(String[])
  Return: Main.main(java.lang.String[])

In a different situation, we must embed the Client class in a context where the Abstract Factory design pattern is in use.

  public abstract class Product {
  }

  public class MSWindowsProduct extends Product {
  }

  public class MotifProduct extends Product {
  }

  public abstract class AbstractFactory {
    public abstract Product makeProduct();
  }

  public class MSWindowsFactory extends AbstractFactory {
    public Product makeProduct() {
      return new MSWindowsProduct();
    }
  }

  public class MotifFactory extends AbstractFactory {
    public Product makeProduct() {
      return new MotifProduct();
    }
  }

  public class Main {
    public static void main(String[] args) {
      AbstractFactory factory;
      String os = System.getProperty("os.name");
      // Compare string contents rather than references; os.name
      // carries a version suffix on Windows (e.g., "Windows XP").
      if(os.startsWith("Windows"))
        factory = new MSWindowsFactory();
      else
        factory = new MotifFactory();
      Client c = new Client();
      c.doit(factory);
    }
  }

We would like this program to result in an execution that could be represented by the following event trace (for executions on UNIX platforms), where the local execution of Client.doit is marked with asterisks.

  Call: Main.main(java.lang.String[])
  Enter: Main.main(String[])
    Call: System.getProperty(java.lang.String)
    Return: System.getProperty(java.lang.String)
    Call: MotifFactory()
    Enter: MotifFactory()
    Exit: MotifFactory()
    Return: MotifFactory()
    Call: Client()
    Enter: Client()
    Exit: Client()
    Return: Client()
    Call: Client.doit(AbstractFactory)
*   Enter: Client.doit()
*     Call: Product()
      Enter: MotifFactory.makeProduct()
        Call: MotifProduct()
        Enter: MotifProduct()
        Exit: MotifProduct()
        Return: MotifProduct()
      Exit: MotifFactory.makeProduct()
*     Return: Product()
*   Exit: Client.doit()
    Return: Client.doit(AbstractFactory)
  Exit: Main.main(String[])
  Return: Main.main(java.lang.String[])

As in the object-oriented dispatch example described earlier, there are disconnects in this event trace between which methods are being called and which methods are being entered. In both examples, this disconnection cannot be explained through a simple examination of the event traces as shown; the event traces are effectively non-deterministic. In the object-oriented dispatch example, the means of mapping between the local execution trace and the methods to be executed was specified by annotations in the source code, namely extends clauses, and the run-time type of the target object (Simple versus Subsimple). In the contextual dispatch example, this mapping must take place through the provision of a boundary map, such as the following.

  mapset AbstractFactoryMap {
    void map(Client c): in(c.doit(AbstractFactory)) {
      c.doit();
    }

    Product map(): out(Product.new()) {
      AbstractFactory factory = (AbstractFactory)
        History.lastInstancePassed(AbstractFactory.class);
      return factory.makeProduct();
    }
  }

  apply AbstractFactoryMap to Client;

There is one boundary map here for each of the two disconnects present in the event trace. The first boundary map translates from the call to Client.doit(AbstractFactory) to the method entry of Client.doit(); this is a translation from the event trace expressing the global perspective (that the Abstract Factory design pattern is being used) to the local perspective (that it is not). The second boundary map translates from the call to instantiate Product to the method entry of AbstractFactory.makeProduct; this is a translation from the local perspective back to the global perspective.

That is the nature of dispatch in general: to map from one perspective to another. Contextual dispatch permits these perspectives to be inconsistent. Thus, from the global perspective, what is "really" happening during the execution of this program is the following event trace, expressed in a fashion consistent with object-oriented dispatch. The events that correspond to the local execution of Client.doit are marked with asterisks.

  Call: Main.main(java.lang.String[])
  Enter: Main.main(String[])
    Call: System.getProperty(java.lang.String)
    Return: System.getProperty(java.lang.String)
    Call: MotifFactory()
    Enter: MotifFactory()
    Exit: MotifFactory()
    Return: MotifFactory()
    Call: Client()
    Enter: Client()
    Exit: Client()
    Return: Client()
    Call: Client.doit(AbstractFactory)
*   Enter: Client.doit(AbstractFactory)
*     Call: MotifFactory.makeProduct()
      Enter: MotifFactory.makeProduct()
        Call: MotifProduct()
        Enter: MotifProduct()
        Exit: MotifProduct()
        Return: MotifProduct()
      Exit: MotifFactory.makeProduct()
*     Return: MotifFactory.makeProduct()
*   Exit: Client.doit(AbstractFactory)
    Return: Client.doit(AbstractFactory)
  Exit: Main.main(String[])
  Return: Main.main(java.lang.String[])

On the other hand, from the local perspective of Client, the global execution is consistent with the following event trace.

  Call: Main.main(java.lang.String[])
  Enter: Main.main(String[])
    Call: Client()
    Enter: Client()
    Exit: Client()
    Return: Client()
    Call: Client.doit()
*   Enter: Client.doit()
*     Call: Product()
      Enter: Product()
      Exit: Product()
*     Return: Product()
*   Exit: Client.doit()
    Return: Client.doit()
  Exit: Main.main(String[])
  Return: Main.main(java.lang.String[])

In this case, Client would not necessarily have anything to say about exactly what had happened throughout the entire execution, but this is equivalent to event traces in general: they always leave out some details of what has happened. Neither perspective should be considered more "real" than the other. Both perspectives leave out details, and describe the execution in a way that is convenient to a given reference frame. Contextual dispatch permits a translation between these reference frames, so that one can reason about what is happening in a localized fashion.

I now consider the connection between communication history and event traces.

7.2 Communication History: Use and Efficient Support

In Chapter 4, I described communication history as conceptually being a complete record of everything that has happened within an execution. Regardless of the fact that any particular event trace leaves out some information, we can select an event trace that provides sufficient details to answer particular questions about communication history. In Chapter 5, I described an implementation of communication history that was a literal realization of this model: every event within an execution was recorded and at the disposal of queries that occur in boundary maps.

However, there are two important points to be made about communication history. First, communication history should not be regarded as simply some novelty in a programmer's "bag o' tricks", although it could certainly be misused in this fashion. Second, the naive implementation provided by my prototype tool can be highly optimized.

7.2.1 Communication History Queries as an Expression of Minimal Dependences

We want to maximize the number of contexts in which a module may be used. That is, we want to be able both to reuse a module in multiple systems and to make it insensitive to changes within a given system.
Through the expression of essential structure, we can achieve this maximization. But the scalability of the technique will be limited if we have to take a given module and immediately tie it to concrete entities in a system. Those entities may well change, and by tying down our abstract dependences so immediately, we limit the flexibility of all but our lowest level modules. Instead, we should express the minimal dependences that our module has upon its context. For modules that approach the global level, these minimal dependences may indeed be concrete. But below this level, they need not be.

Look again at the Abstract Factory design pattern example described in Section 7.1.2. In performing the translation between the local and global executions there, we needed to express where the instance of AbstractFactory (used in the translation process) was to be located. One thing must be understood here: where to locate this instance is a choice that is made on the basis of the system design. That is, there is some protocol that the system follows that defines where the instance is to be found. In the example in Section 7.1.2, the protocol chosen was simply that whichever instance of AbstractFactory was last passed in the system was the one to be found. That choice would not be appropriate in every system. In systems where that choice was not appropriate, the boundary maps would need to be replaced, and hence, only the inner module (i.e., Client) would be reusable.

In those systems where that choice was appropriate, the boundary map should be reusable in addition to the inner module. But to realize that potential reusability, the protocol needs to be declared as abstractly as possible; otherwise, the declaration of the protocol itself would contain extraneous embedded knowledge.
In the absence of communication history, we would need to take our choice of protocol and manually compile it. We would need to manually analyze where the last instance of AbstractFactory was passed, insert some code in our system there to record this instance, and have our boundary map access this record. But this manual compilation would need to be performed again any time we were to change the system, and any time we wanted to reuse Client. This would be tedious and error-prone.

7.2.2  Communication History Queries as Pattern Matches

A query to communication history allows us to specify an abstract description of the protocol to be followed. If the protocol is simply to find the last-passed instance of a given type, the appropriate communication history query describes this. If a more complex protocol is needed, a different communication history query can be made.

As long as the event trace we are translating from is deterministic, we can describe the necessary subsequence(s) that locate the information we need. An event trace can be thought of as a string: each event represents a "character" in a complex alphabet. Consequently, a communication history query is a pattern match against one of the event traces of the two independent world views undergoing translation.
For example, to locate the last passed instance of AbstractFactory, we would need to describe those "characters" representing method calls containing an argument of type AbstractFactory. Let us say that this set of call events is described as

    call(*(.., AbstractFactory, ..))

where call is a keyword. The argument in this expression gives the kind of calls within the set: the asterisk indicates that any method name is a valid match, and the formal parameter list says that one formal parameter has to be of type AbstractFactory but that its position in the list does not matter. I call each of the events in this set a primitive event because its description makes no mention of its position relative to other events.

To describe the placement of this pattern within the event trace, we might borrow some standard notation from regular expression tools, such as grep from UNIX:

    ^ event* call(*(.., AbstractFactory, ..))
      [^call(*(.., AbstractFactory, ..))]* $

The caret matches the start of the event trace. The keyword event matches any single event; the asterisk after it matches zero or more occurrences. Next comes the call passing an instance of AbstractFactory, followed by zero or more events which are not calls passing AbstractFactory. And finally, the dollar sign matches the (current) end of the event trace. I call each event matched by this complete query a compound event; it consists of primitive events with ordering constraints attached.
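
As a rough illustration of the query just described, the following Java sketch evaluates it naively against a fully recorded trace by scanning backwards for the most recent call carrying an argument of the requested type. The names TraceQuerySketch, CallEvent, and the nested AbstractFactory interface are hypothetical stand-ins introduced only for this sketch; they are not the representation used by the prototype tool, and the choice to return the first matching argument within a call is likewise an assumption.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: a naive evaluation of
//   ^ event* call(*(.., T, ..)) [^call(*(.., T, ..))]* $
// over a complete, recorded trace of call events.
final class TraceQuerySketch {
    /** Stand-in for the AbstractFactory type of the running example. */
    interface AbstractFactory {}

    /** A primitive call event: the method called and the arguments passed. */
    static final class CallEvent {
        final String method;
        final List<Object> args;

        CallEvent(String method, List<Object> args) {
            this.method = method;
            this.args = args;
        }
    }

    /**
     * Returns the instance of the given type passed in the most recent call
     * event, or empty if no call in the trace passed such an instance.
     * Assumption: if one call passes several such instances, the first wins.
     */
    static <T> Optional<T> lastInstancePassed(List<CallEvent> trace, Class<T> type) {
        // Scan from the (current) end of the trace towards its start.
        for (int i = trace.size() - 1; i >= 0; i--) {
            for (Object arg : trace.get(i).args) {
                if (type.isInstance(arg)) {
                    return Optional.of(type.cast(arg));
                }
            }
        }
        return Optional.empty();
    }
}
```

Scanning backwards corresponds to the trailing part of the pattern: every event after the matched call must not be a call passing an instance of the type, so the first match found from the end is the one the query designates.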
The syntax I have just used in the example is intended merely to give a flavour of the pattern matching in communication history; I do not promote it as being elegant or even sufficient. In fact, we are generally interested in matching patterns within communication history that possess more structure. Specifically, we often want to pair up method entries and method exits, or calls and returns, that are nested in particular ways. Such pairing is equivalent to balanced parenthesis matching. Thus, regular expressions do not suffice to describe the patterns of interest; more general context-free grammars are required [Chomsky, 1956, 1959].

Determining if an event trace matches a pattern of interest is equivalent to deciding if a string belongs to a language. Therefore, one might be tempted to point out that this could be an undecidable problem in the most general case. To this, I have two points to make.

First, while one could make arbitrarily complex queries, there is no reason to assume that such would be necessary in any real situation. The hope here is that, by making "reasonable" local choices in terms of system structure and queries to the communication history, no bizarre queries would need to be made at higher levels as a result. It is true that this is a conjecture; my desire to address this conjecture in the context of "practical," "reasonable," and "real" situations makes an analytic approach to its validation all the more difficult. But a communication history query that was more complex than the kind of system it was querying would be an indication of misuse.

Second, a communication history query does not need to be arbitrarily complex. An arbitrarily complex query would effectively be a complete computation independent of the rest of the system. In most practical situations, one could simply retrieve the information of interest through a communication history query, and perform a computation based upon this information separately.
For example, in the boundary maps shown in this dissertation, that is precisely what happens: one or more communication history queries are made, retrieving particular information about the execution, and then computations are performed on the basis of this information.

A communication history query is simply a localized means of retrieving information from the execution of a system. This locality has two aspects. A communication history query is present at a given place in the source text. The question it answers does not depend on individual snippets of code that have to all work correctly and in concert. A communication history query also takes place at a specific location within the execution of the system. The question it answers does not depend on having asked various smaller questions previously during the execution, and having stored those answers away for later retrieval. By contrast, a variable that contains, for example, a flag simply contains a flag: one cannot be sure that its value expresses the property in which one is interested.

7.2.3  Efficient Support of Communication History Queries

Recall that the prototype tool that I described in Chapter 5 is quite inefficient in two ways: (1) it instruments every method entry and method exit, resulting in bloat in the size of the source text and in a degradation of dynamic performance due to the execution of all this instrumentation; and (2) every event is recorded once and forever, requiring ever increasing storage space, and ever slower responses to queries. With the picture I have outlined of communication history queries as pattern matches against event traces, I can now address how we might efficiently support communication history queries. The basic idea is, rather than support every possible query that one might make on communication history, to support within a given system only those queries that are actually made.
Thus, in the example used in the previous section, we would need to support only this query:

    ^ event* call(*(.., AbstractFactory, ..))
      [^call(*(.., AbstractFactory, ..))]* $

if no others occurred in the system. There are three steps needed to reduce the support to the minimum necessary: (1) optimize the query expression; (2) instrument those points in the source text that represent the primitive events of interest; and (3) feed these primitive events into an automaton to identify if the compound event of interest has occurred.

In the example query, the subexpression

    event*

means that we are only interested in a suffix of the event trace. This subexpression can be discarded, reducing the query to

    call(*(.., AbstractFactory, ..))
      [^call(*(.., AbstractFactory, ..))]* $

The primitive events of interest here are partitioned into those that match

    call(*(.., AbstractFactory, ..))

and those that do not. The pattern "a [^a]* $" for any subexpression a indicates that we want the most recent occurrence of a. Therefore, we only need to instrument those points in the source text that can be statically identified as potentially generating calls to methods with arguments of type AbstractFactory. Finally, our automaton needs to record only the instance of AbstractFactory passed in the most recently generated primitive call event handed to it by the instrumentation. As soon as a new event of this kind occurs, the old instance stored by the automaton can be discarded in favour of the new.

Not all communication history queries will necessarily be so amenable to reduction. However, my prototype implementation of communication history provided only a small set of queries, and this set quickly reached a stable size through the course of practical use. This is an indication that, at worst, optimized support for this small set of queries can be provided.
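
The reduced query for the most recent occurrence can be pictured as an automaton with a single slot of state. The Java sketch below is only an illustration under stated assumptions: the class LastPassedAutomaton and its onCall hook are hypothetical names, the instrumentation that would invoke onCall at the statically identified call sites is not shown, and nothing here is taken from the prototype tool.

```java
import java.util.Optional;

// Hypothetical sketch of the automaton for the reduced query
//   call(*(.., T, ..)) [^call(*(.., T, ..))]* $
// It keeps only the most recently passed instance, not the whole trace.
final class LastPassedAutomaton<T> {
    private final Class<T> type;
    private T last; // null until a matching call event has been observed

    LastPassedAutomaton(Class<T> type) {
        this.type = type;
    }

    /**
     * Assumed to be invoked by instrumentation at each call site that may
     * pass an instance of T. A matching event replaces the stored instance;
     * non-matching events leave the state untouched.
     */
    void onCall(Object... args) {
        for (Object arg : args) {
            if (type.isInstance(arg)) {
                last = type.cast(arg); // discard the old instance
                return;                // assumption: first matching argument wins
            }
        }
    }

    /** Answers the query in constant time and constant space. */
    Optional<T> lastInstancePassed() {
        return Optional.ofNullable(last);
    }
}
```

Compared with recording every event, this design stores one reference per supported query and does no work at call sites that cannot pass an instance of the type, which is the saving the three steps above aim for.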
For more general queries, parser generation technology is available that can cope with context-free grammars. This is not to say that the problem will necessarily be simple to solve, but there is a promising route forward. Previous work by Havelund and Roşu [2001] and by Giannakopoulou and Havelund [2001] may help here.

7.3  Implicit Context and Aspect-Oriented Programming

Implicit context bears resemblance to a group of techniques generally referred to as aspect-oriented programming (AOP) [Kiczales et al., 1997]. AOP was originally described as providing the means for the separation and encapsulation of concerns (such as synchronization or distribution) that "crosscut" the base modularity of a system. More recently the term has come to refer to any technique that permits functionality to be inserted into a system on the basis of predicates that describe the points of interest [Elrad et al., 2001]. Implicit context provides a superset of the former definition, but is an element of the latter definition.

Describing the fundamental differences between the various AOP techniques, rather than superficial differences of syntax or tool support, has been difficult for the community. Superficial differences can be overcome; fundamental differences are not so amenable to change. The model of local execution and event traces can be used as a means of comparison. I outline some of the features and concepts behind a particular AOP approach in Section 7.3.1, and translate this description to the model of event traces in Section 7.3.2. A more general description of AOP techniques can be found in Section 8.4.4.

7.3.1  Outline of Concepts in AspectJ

AspectJ [Kiczales et al., 2001] is an aspect-oriented extension to Java; it is the chief representative of a particular style of AOP, the one with the highest level of development (at the time of writing this dissertation). The AspectJ-style of AOP operates on the concept of join points.
A join point is an event in the execution of a program where behaviour can be added, to either replace or supplement other behaviour. Join points are equivalent to primitive events as I have described them above. To add or replace functionality, one quantifies a set of join points through a predicate that matches certain properties of events of interest. These sets are called pointcuts, since they generally describe a set of join points that crosscuts the system's modularity. The pointcut predicates are constructed as expressions that compose simpler keyword expressions called pointcut designators (PCDs). PCDs exist that capture method executions, field accesses, initializer executions, etc. For example, a pointcut predicate declaration might read as:

    pointcut pc(String s):
        args(s) &&
        execution(public void *.meth(String));

This statement declares a pointcut predicate called pc; it captures join points (specifically, method executions) possessing certain properties. The argument to the execution PCD indicates that the method being executed must be named meth, that it must take exactly one argument of type String, that it must have a result type of void, and that it must possess public visibility. The asterisk indicates that this method may belong to any class; such wildcards may be used in place of much of the other information that we do not wish to describe with particular values. The args PCD permits two things: it constrains the passed argument to be of a particular type (String in this case), and it can expose the object passed in that argument through the formal parameter to the predicate.

Pointcut predicates can be used in the declaration of functionality much as event designators are used in the declaration of boundary maps [footnote 3]; this added functionality is called advice. That functionality is performed whenever the predicate evaluates to true. There are three basic kinds of advice: before advice executes just before an identified event occurs; after advice executes just after an identified event occurs; and around advice executes in place of an identified event. For example, we can use the pointcut predicate pc to define advice like the following:

    void around(String s): pc(s) {
        System.err.println("Discarding call to meth:" + s);
    }

Now, whenever an execution of a method called meth, with the other properties mentioned above, is about to happen, this execution is discarded in favour of this advice. The String object that was to be passed to meth is captured and passed to the body of the advice, where it can be used like a formal parameter to a method.

An additional PCD of interest in my discussion is cflow. A pointcut predicate using this PCD can be declared as:

    pointcut pc2(int i, String s):
        cflow(execution(* *.*(int)) && args(i)) &&
        pc(s);

The pointcut described by pc2 is a restriction (i.e., a subset) of that described by pc. It requires that any execution described by pc also happen within the control-flow of a certain other method execution; that is, it must occur after a certain method entry, but before that method exits. This nesting method must possess an argument of type int; again, this argument is exposed so it can be used within added, boundary-map-like functionality.

[Footnote 3: In fact, that is where the syntax for boundary maps was derived from.]

7.3.2  Describing AspectJ in Terms of Event Traces

The semantics of AspectJ are based upon changes to the execution stack [Wand et al., 2002]. For example, execution selects stack changes involving the top frame on the stack (i.e., its addition or removal), while cflow selects those stack changes that happen when a given execution frame remains on the stack.
This stack-based semantics is extended in a straightforward manner to permit the capture of events that do not truly involve the execution stack of the Java Virtual Machine, such as field accesses. Once a frame is removed from the stack, it ceases to be accessible.

It can be difficult to tell from all the details of syntax and semantics if there is any significant difference between the AspectJ-style of AOP and implicit context. However, Hannemann and Kiczales [2002] have indicated that AspectJ is not capable of separating the Abstract Factory design pattern in the manner shown above. Describing AspectJ semantics in terms of event traces allows a more general comparison, resulting in three key differences.

• Most pointcut predicates can only differentiate between primitive events.
• The cflow PCD permits the description of compound events, but only in a very limited way [footnote 4].
• Reconciliation between conflicting world views is not possible.

I examine these points below.

[Footnote 4: This statement remains true for the minor variants of the cflow PCD, like cflowbelow.]

Pointcut Predicates and Primitive Events

Consider again the simple Java program shown in Section 7.1.1:

    public class Simple {
        public static void main(String[] args) {
            Simple s = new Simple();
            s.doit();
        }

        void print() {
            System.err.println("Hello world!");
        }

        void doit() {
            for (int i = 0; i < 2; i++) {
                print();
            }
        }
    }

We can capture and replace the execution of print through the following pointcut and advice:

    pointcut printPC():
        execution(void Simple.print());

    void around(): printPC() {
    }

This would discard the execution of print without replacing it with anything. This has the effect of translating the original execution trace to the following one; events discarded from the original trace are marked with [-].

        Call:   Simple.main(java.lang.String[])
        Enter:  Simple.main(String[])
        Call:   Simple()
        Enter:  Simple()
        Exit:   Simple()
        Return: Simple()
        Call:   Simple.doit()
        Enter:  Simple.doit()
        Call:   Simple.print()
    [-] Enter:  Simple.print()
    [-] Call:   System.err.println(java.lang.String)
    [-] Return: System.err.println(java.lang.String)
    [-] Exit:   Simple.print()
        Return: Simple.print()
        Call:   Simple.print()
    [-] Enter:  Simple.print()
    [-] Call:   System.err.println(java.lang.String)
    [-] Return: System.err.println(java.lang.String)
    [-] Exit:   Simple.print()
        Return: Simple.print()
        Exit:   Simple.doit()
        Return: Simple.doit()
        Exit:   Simple.main(String[])
        Return: Simple.main(java.lang.String[])

But note that this pointcut does not differentiate between the two calls to Simple.print. The only difference between them lies in their relative position within the trace. Thus, only a compound event could differentiate between them.

The cflow PCD and Compound Events

The cflow PCD can be used to specify compound events to a limited extent. Consider the following pointcut predicate:

    pointcut doitCF():
        cflow(execution(void Simple.doit()));

It would match every event that occurred while Simple.doit remained on the stack, as shown by the lines marked with an asterisk below:

      Call:   Simple.main(java.lang.String[])
      Enter:  Simple.main(String[])
      Call:   Simple()
      Enter:  Simple()
      Exit:   Simple()
      Return: Simple()
      Call:   Simple.doit()
    * Enter:  Simple.doit()
    * Call:   Simple.print()
    * Enter:  Simple.print()
    * Call:   System.err.println(java.lang.String)
    * Return: System.err.println(java.lang.String)
    * Exit:   Simple.print()
    * Return: Simple.print()
    * Call:   Simple.print()
    * Enter:  Simple.print()
    * Call:   System.err.println(java.lang.String)
    * Return: System.err.println(java.lang.String)
    * Exit:   Simple.print()
    * Return: Simple.print()
    * Exit:   Simple.doit()
      Return: Simple.doit()
      Exit:   Simple.main(String[])
      Return: Simple.main(java.lang.String[])

Let us say that we wanted to ensure that no executions of the constructor of Simple occur while Simple.doit executes; none do so in this example, but we might want to ensure this property even if the system changes later. We could capture any such events by the pointcut:

    pointcut constructor():
        execution(Simple.new());

However, this constructor pointcut would also capture every other execution of this constructor. To differentiate between the constructor executions of interest and those not of interest, we could conjoin these pointcuts:

    pointcut unwanted():
        constructor() && doitCF();

Advising the unwanted pointcut would allow us to signal an error should it occur.

Thus, we can see that cflow allows us to specify some compound events. But there is nothing obviously special about frames remaining on the stack, other than the fact that the flow of control ought to return to each frame eventually. One might be interested in some arbitrary primitive event in the past, irrespective of whether some representation of it remains on the stack. We have seen such use in previous chapters (e.g., in Figures 6.5, 6.18, 6.19 (where a control-flow-based query is also used), and 6.20).

Communication history queries and pointcut predicates permit the partitioning of event traces into those of interest and those not of interest, at a given point in an execution. Without any reference to prior events, every event trace with the same final primitive event falls into the same equivalence class. The use of generalized communication history allows us to discriminate more finely between different event traces.
Pointcut designators like cflow are an intermediate level of expressiveness, without an obvious reason as to why they are not either more general (permitting finer call trace differentiation) or less (treating primitive events as sufficient) [footnote 5]. In general, making unwarranted assumptions about the structure of the control flow is to introduce extraneous embedded knowledge (EEK); for example, when an assertion that a particular call has occurred at some previous time will suffice, an assertion used in its place that a very specific and complete sequence of calls has occurred is EEK.

[Footnote 5: It should be pointed out that both implicit context and AspectJ provide lexical means for discriminating between event traces of interest and not of interest. Implicit context tests for events relative to a boundary, which immediately separates event traces into two partitions. Similarly, AspectJ provides the means for limiting events of interest to within a lexically-identified scope (through the within and withincode PCDs) and aspect associations; neither of these exposes more of the communication history, but can be used in selecting modularizations. Implicit context does not impose any modularizations, and my prototype tool supporting it provides only simple ones compared to AspectJ; this is an area for future work.]

Reconciliation of Conflicting World Views

Aside from the limited power of discrimination provided by its pointcut designators, AspectJ inherits from Java the constraint that a globally-consistent perspective of the system is necessary. For example, the Abstract Factory design pattern example from Section 7.1.2 is not supportable in AspectJ. In AspectJ, one could specify a pointcut via call(Product.new()) that would capture any attempt to instantiate the Product class, and replace this attempt with other functionality (such as calling a method on a Factory class); however, Product must be a concrete class.
That is, given that a call to the constructor exists, a declaration of the constructor must also exist, even if the execution can never get there. Without the concept of local execution, as provided by implicit context, such separation of concerns is not possible.

7.4  Summary

The behaviour specified by a module in isolation remains the same regardless of the context in which that module is embedded; this behaviour is that module's local execution. The local execution of a module does not change if that module does not change. We can represent an execution, local or otherwise, as an event trace. Such a trace records events of significance within an execution in the temporal order in which they occur. In simple situations, an event trace representing a local execution can be considered a subsequence of an event trace representing a global execution.

Special forms of dispatch, such as object-oriented polymorphism or contextual dispatch, form more complex situations. In the presence of these dispatch mechanisms, the local execution is a translation of the global execution. For object-oriented dispatch, events that occur in the global view, such as a call to execute a given method, may be replaced with a different event in the local view, such as the execution of a different method. For contextual dispatch, the translation process can be much more complex; it can depend upon earlier events that have occurred in the event trace.

The idea of event traces can be used to efficiently support queries on communication history. These queries are pattern matches against event traces. The primary principle involved is that of supporting only those queries that are actually made within the system. The query expression can then be optimized for typically-encountered patterns; this is analogous to the process of optimizing typically-encountered regular expression patterns. Two parts are then required to support the optimized queries.
First, the individual (primitive) events mentioned in the pattern can be statically matched in the source text. Second, an automaton is needed to which these primitive events of interest can be fed. The points in the source text that match the primitive events of interest are instrumented so that, should they occur during the execution, they can feed information to the automaton. This efficient support for communication history queries remains to be implemented in a tool.

Finally, I have demonstrated some differences in the expressibility of implicit context and one particular approach to aspect-oriented programming (namely, AspectJ) through the use of event traces. Pointcut predicates in AspectJ are the equivalent of communication history queries; however, only relatively simple queries can be made because AspectJ's semantics are based upon the idea of the execution stack. Most pointcut predicates can only differentiate between primitive events, ones that do not involve ordering of multiple events. The cflow pointcut designator is an exception, but its use is limited to describing events still in existence upon the execution stack. And AspectJ is not capable of translating between global and local executions: a single, globally-consistent world view must exist. I have given a simple example where this is shown to be a limitation with negative consequences.

Chapter 8

Related Work

I have argued in earlier chapters for the ability of essential structure through implicit context to ease software evolution and software reuse. There has been much other work that addresses evolution and reuse. In addition, there have been various ideas that bear some resemblance to those of essential structure or implicit context without necessarily focusing on evolution or reuse. In this chapter, I consider the relationships of these two sets of work to this dissertation.

I begin by considering the relationships to some basic principles in Section 8.1.
I then proceed to other approaches to software reuse and evolution; these fall into three major categories [footnote 1]. Ad hoc approaches are considered in Section 8.2. Approaches that centre upon particular interfaces, architectures, or products are described in Section 8.3. And Section 8.4 expounds upon generative approaches. Sometimes, a given approach falls under multiple headings. Cross-references are given to other categories where needed. Finally, I consider some additional topics that bear directly on the concepts of essential structure and implicit context, in Section 8.5.

[Footnote 1: Run-time evolution lies outside this dissertation as a topic for future work; thus, we do not consider approaches to it here.]

8.1  Principles

In this section, I first consider two basic ideas that are applicable to the majority of approaches addressing software evolution or reuse: separation of concerns, in Section 8.1.1; and dependences, coupling, and cohesion, in Section 8.1.2. I then proceed to three other ideas: local reasoning, in Section 8.1.3; design-for-change, in Section 8.1.4; and reflection, in Section 8.1.5. There is less widespread agreement as to the importance of these latter ideas in addressing reuse or evolution.

8.1.1  Separation of Concerns

Dijkstra [1976] is credited with having first coined the term separation of concerns to describe the idea that software should be structured in such a way that different "concerns" can be dealt with individually. The idea pervades modern thought on software structure, with its most extreme application in aspect-oriented approaches [Hürsch and Lopes, 1995] (see also Section 8.4.4). However, the question of what a "concern" is or is not still troubles us [Murphy et al., 2001]. While essential structure is clearly another attempt at separating concerns, it gives some specific guidance as to what is to be removed from our modules: extraneous embedded knowledge.
8.1.2  Dependences, Coupling, and Cohesion

Essential structure is an attempt to minimize the dependences that a module has on its system, and vice versa. Software tends to get more difficult to evolve as it grows older [Belady and Lehman, 1976; Parnas, 1994]; this can be attributed to the growth in the number of dependences in the system as its structure degrades [Stevens et al., 1974; Wang et al., 2001]. Stevens et al. described software structure and dependences in terms of coupling and cohesion. Coupling is a property describing the degree of dependence between modules; a highly coupled system is harder to evolve. Cohesion is a property that has been more elusive to describe. Roughly speaking, it is a property that describes how meaningful the relationship is among elements in a module. If a module consists of odds and ends that are thrown together, it has low cohesion; if the elements within a module are tightly bound such that they must work together for the module to function correctly, the module has high cohesion. High cohesion is thus considered preferable for good design. Much work has tried to measure and assess the cohesion of systems [Eder et al., 1992; Ott and Bieman, 1992; Lakhotia, 1993; Briand et al., 1998; Kang and Bieman, 1998; Nandigam et al., 1999]; these measures effectively describe cohesion as intra-modular dependences. Thus, it should not be surprising that some empirical work indicates that reusing a module is easier when it possesses low cohesion [Bieman and Kang, 1995], since reuse often requires adaptation or evolution of a module. Coupling and cohesion tend to treat all dependences equally. None of this prior work permits the differentiation of necessary dependences from extraneous embedded knowledge (EEK); the dependences that represent EEK are the problematic ones.
8.1.3  Local Reasoning

There have been a number of efforts that have either aimed at or touched upon the idea that one needs to reason about systems on a local rather than global basis. Permitting such local reasoning can ease the burden on a developer, requiring less effort to construct good mental models for the tasks they undertake [Edwards, 1995]. On the other hand, requiring global reasoning tends to bury the hapless developer in a "plethora of details" [Parnas and Clements, 1990]. Local reasoning and isolation of changes are complementary. Global variables were regarded as harmful early on precisely because of the necessity for global reasoning when using them [Wulf and Shaw, 1973]. Even objections to the use of local variables at the time were concerns about the ability for local reasoning [Fisher, 1983]; without strong scoping mechanisms in place, the particular "local" variable to which one refers can be quite malleable and difficult to determine. Encapsulation of details behind an interface [Parnas, 1972] permits those details to be modified without affecting the rest of the system, to an extent. However, such encapsulation does not suffice to permit local reasoning, since the internals of a module can depend on the interfaces of many external modules. The need for exception handling arose because of the impossibility of performing truly global reasoning [Parnas and Würges, 1976]. Even in a system where one could have all the source code in order to perform a formal verification, the system ultimately needs to interact with the real world: hardware breaks down, and humans can behave unpredictably. To reason soundly at the local level, we need to have that reasoning translate well to the global level. Due to the shortcomings of the current state of the art, developers tend to be understandably dubious about forgoing the need to see all the details for an entire system all at once. Sitaraman et al.
[2000] discuss reasoning about software component behaviour, both in isolation and as composed into a system. The properties of interest reasoned about locally must be maintained when moving to the global view. As long as we can somehow continue the reasoning at an inter-modular level without re-analyzing low-level details, local reasoning can be sound. In my future work on implicit context, I will need to demonstrate how the translation between world views also transforms the local reasoning into sound global reasoning. Such work awaits formalization of the approach. However, the local reasoning aspects of the work of Sitaraman et al. should be directly applicable.

8.1.4  Design-for-Change

It has long been held that good software design must consider the accommodation of change [Parnas, 1972, 1979]. Some design decisions are narrowly applicable to a single context and therefore are prone to volatility. Isolating these design decisions from the majority of a system (for example, by encapsulating them in modules behind more general interfaces) makes it easier to change these decisions. VanHilst and Notkin [1996a] explain the need for finer-grained modularization to isolate smaller design decisions, a recognition that modules can be highly complex entities in their own right and thus difficult to change. Essential structure considers modules to exist at multiple levels of granularity. This principle of design-for-change is often applied in the form of good coding practices or making design decisions explicit [Han, 1997]; for example, using a named constant whose value is defined at one place is preferable to having the value of that constant scattered throughout the system. However, on the larger scale, design-for-change amounts to predicting change and designing for it, rather than using a more malleable coding style.
Indeed, some proponents would claim that evolution and, in particular, reuse are impractical unless one anticipates changes during the development of a system [Parnas, 1972; Castano and De Antonellis, 1993; Wasmund, 1995]. But one can rarely accurately predict the future. Without accurate prediction, there are two pitfalls. First, we waste effort adding complexity into our systems that ends up not being used, while making these systems less efficient. Our systems become harder to understand, since the purpose of the added complexity is not apparent. Second, our designs will not necessarily accommodate the changes that are actually needed. Essential structure allows us to design our modules in a style that accommodates external change while minimizing the likelihood of internal change. It avoids the expense and omniscience required to predict change. However, in situations where one can get away with making assumptions about the structure of a system and the evolution thereof, techniques that take advantage of these assumptions are liable to be at an advantage. The cost-benefit tradeoffs of essential structure through implicit context remain to be shown.

8.1.5  Reflection

Computational reflection [Smith, 1982] is "the activity performed by a computational system when doing computation about (and by that possibly affecting) its own computation" [Maes, 1987]. Two degrees of reflection are often distinguished: introspection involves reacting to the representations of the computation present in a given system, while metaprogramming involves the manipulation of those representations. Introspective capabilities are provided by Java, for example, where one can specify that classes be dynamically loaded, ask about the methods and fields of classes, and cause those methods to be invoked if they prove of interest. One cannot, however, add or replace methods or types through this facility.
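As an illustration of the introspective capabilities just described, the following is a minimal sketch using the standard java.lang.reflect facilities. The class IntrospectionSketch, the nested class Greeter, and the helper loadAndInvoke are hypothetical names introduced only for this example; they do not appear in the dissertation.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class IntrospectionSketch {
    // Hypothetical class to be inspected; any class on the classpath would do.
    public static class Greeter {
        public String greeting = "hello";

        public Greeter() { }

        public String greet(String name) {
            return greeting + ", " + name;
        }
    }

    // Loads a class by name, looks up a public method taking one String,
    // and invokes it on a fresh instance. Reflective failures are wrapped
    // in an unchecked exception to keep the sketch short.
    static Object loadAndInvoke(String className, String methodName, String arg) {
        try {
            Class<?> type = Class.forName(className);                  // dynamic loading
            Object instance = type.getDeclaredConstructor().newInstance();
            Method method = type.getMethod(methodName, String.class);  // ask about a method
            return method.invoke(instance, arg);                       // invoke it
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String className = Greeter.class.getName();
        try {
            Class<?> type = Class.forName(className);
            for (Field f : type.getFields()) {
                System.out.println("field: " + f.getName());
            }
            for (Method m : type.getDeclaredMethods()) {
                System.out.println("method: " + m.getName());
            }
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        System.out.println(loadAndInvoke(className, "greet", "world"));
    }
}
```

The sketch only reads and invokes what already exists in Greeter; nothing in this reflection API adds or replaces a method on the class, which is the limit the text draws between introspection and metaprogramming.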
Metaprogramming capabilities are provided by CLOS, for example [Kiczales et al., 1991]. Metaprogramming is a superset of introspection, adding facilities to dynamically create or replace methods or types. Metaprogramming is a very powerful tool. It requires a great deal of discipline to use properly, rather than to resort to its use at every opportunity. Ad hoc application of metaprogramming tends to make programs harder to reason about, since it alters those programs dynamically in a way that can be opaque. Implicit context provides a mechanism that is less powerful than metaprogramming: for example, one cannot dynamically create new classes with implicit context. Implicit context does permit reasoning about a system, and certain translations of dynamic events within a system (see Chapter 7); however, the hope is that the static specification of these events will permit the right balance between expressive power and comprehensibility. Further study is required to judge whether this hope is met.

8.2  Ad Hoc Approaches

Ad hoc approaches are those where manual intervention forms the basis for supporting change. These fall into two categories: copy-and-modify, described in Section 8.2.1; and refactoring, described in Section 8.2.2.

8.2.1  Copy-and-Modify

Copy-and-modify is likely the most common form of reuse, since it requires no special support from languages or tools. The programmer notices a similarity between an existing source code module and a desired one, makes a copy of the original, and modifies the copy to the needs of the situation. This idea goes by other names such as scavenging [Krueger, 1992], reclamation [Garnett and Mariani, 1990], or extraction [Lanubile and Visaggio, 1997]. Some such techniques use more sophisticated means for identifying the candidate modules to be copied, such as flow graph-based program slicing [Lanubile and Visaggio, 1997]. Copy-and-modify suffers from two shortcomings.
First, after having been copied, the original module may eventually require changes itself due to error corrections. These changes cannot automatically propagate to the copies of the module. Therefore, the manual copy-and-modify operation must be repeated to propagate the changes, and since the process was manual, there is no guarantee that one will be able to repeat the process faithfully. Second, copies of the module may be scattered hither and yon wherever it was reused. Since there is no permanent connection between the original and any copies of it, locating each version in order to propagate a change to it is problematic. Empirical results support the contention that these shortcomings are a practical reality [Burd and Munro, 1997]; copy-and-modify, while lightweight, is error-prone.

8.2.2  Refactoring

A significant number of techniques centre upon the idea of meaning-preserving transformations. This is often called refactoring [Opdyke, 1992; Tokuda and Batory, 1999; Fowler et al., 2000; Beck, 2000; Butler and Xu, 2001; Baxter, 2002]; similar ideas have been promoted without using this term [Feather, 1982; Burson et al., 1990; Griswold and Notkin, 1993]. The differences between these techniques lie in the kinds of restructuring supported, or details of usage. In short, refactorings are performed when one desires to restructure the functionality in a system. Object-oriented refactorings might include such operations as moving the implementation of a method from one class in a hierarchy to its children, adding a parameter to an overridden method, or collapsing a class hierarchy. Tools are often applied so that refactoring avoids simple errors. Refactoring can either be performed in the modify step of copy-and-modify, or it can be performed on the original code itself.
In the former case, it suffers from the same shortcomings as does copy-and-modify: code must be replicated, and it is not a simple process to propagate changes through the copies, even should those copies be located. When refactoring is applied to the original source code of a module, the problems of replication are avoided; however, new troubles arise. If one is working on a system the source code of which is completely isolated from that of any other system, then one is completely free to restructure it as desired. But it is more common that one must adhere to published interfaces, either those provided by other developers (for example, interfaces to libraries) or provided to other developers (for example, the interface to a framework); in this situation, one is constrained as to what restructuring is possible without violating these interfaces. Some empirical work indicates that it is best to avoid changing such interfaces whenever possible [Svahnberg and Bosch, 1999]. Furthermore, it is not always possible to find a simple modularization that satisfies the needs of every system in an ideal fashion, and thus, there is no corresponding refactoring that will suffice; I elaborate on this point in Section 2.3. This is the reason for much of the work on multiple world views (see Section 8.5.3).

8.3  Interface-, Architecture-, and Product-Centric Approaches

There exists a continuum of approaches that centre upon the idea of standardizing certain commonalities between systems or modules. At one end of this continuum are systems that allow for minor variation on the basis of configurable options. At the other end are systems that are built from pre-existing components, each of which adheres to some standard interface. Inheritance, polymorphism, and frameworks are the topics of Section 8.3.1. I describe approaches that adapt or wrap existing modules to integrate them, in Section 8.3.4. In Section 8.3.2, I discuss components and middleware architectures.
And I finish with product line approaches in Section 8.3.3.

8.3.1  Inheritance, Polymorphism, and Frameworks

Object-orientation is founded on three interacting mechanisms: encapsulation of details behind an interface [Parnas, 1972]; inheritance of the implementation of parts of that interface [Taivalsaari, 1996]; and polymorphic dispatch of a call on that interface to one of its alternative implementations [Strachey, 1967; Cardelli and Wegner, 1985]. This is a powerful combination, and its potential for easing synthesis, reuse, and evolution of software is no small part of why there has been so much research into and adoption of object-orientation. Although object-orientation has been somewhat successful, particularly when compared to procedural development techniques [Kiran et al., 1997], it has not lived up to all the claims made about it with regard to its support for reuse and evolution [Bassett, 1997a: pp. 57-58]. Inheritance can be overused [Johnson and Foote, 1988]. It is not a general-purpose reuse mechanism, but requires conceptual similarities to exist between the superclass and subclass lest the design of the system erode. Some empirical studies indicate that inheritance is used fairly sparingly in real systems, with the average number of subclasses residing in the neighbourhood of 0.6 [Bieman and Zhao, 1995]. One might claim that this is simply an indication that there was little opportunity for reuse in those systems; however, the prevalence of code replication [Burd and Munro, 1997] and design replication [Gamma et al., 1994] makes this doubtful. That the depth of inheritance hierarchies is inversely related to their comprehensibility has been claimed [Daly et al., 1996], but remains contentious [Deligiannis et al., 2002]. Added complications such as multiple inheritance (i.e., inheritance from more than one superclass) or representation inheritance [Edwards, 1997] do not alleviate the lack of a general-purpose reuse mechanism.
The difficulty of typical approaches to inheritance is that subtyping becomes coupled to the reuse of source code. Some languages allow these concepts to be separated to a degree. On the one hand, interface inheritance [Hamilton and Radia, 1994] is straightforward; for example, Java permits the description of interfaces, essentially abstract classes that specify a protocol to which implementing classes must adhere. Implementation inheritance is also possible. C++ permits private inheritance, which can be used indirectly (i.e., via delegation to an inherited, private method) in the reuse of a set of implementations. However, this kind of implementation inheritance is still inheritance, requiring more conceptual continuity between classes than pure reuse of algorithms or source [Weihe, 1997, 2001]. Generative techniques (see Section 8.4) have been pursued to address this shortcoming. Polymorphism is a mechanism allowing the decoupling of method calls from method implementations. Commonly used object-oriented languages like C++ and Java provide single dispatch: the method implementation that receives a message is selected on the basis of the run-time type of only one object; only the static types of additional parameters are used to discriminate between overloaded methods. Other languages, such as CLOS [Bobrow et al., 1986] and Cecil [Chambers, 1992], provide for multiple dispatch: the run-time type of every parameter to a message can affect the selection of the method implementation to receive that message. Subjectivity [Harrison and Ossher, 1993; Harrison et al., 1994; Kristensen, 1998] permits method selection not only on the basis of the type of the receiver, but also of the type of the sender (see Section 8.4.4 for further discussion of subjectivity). Predicate dispatch permits selection of a recipient implementation on the basis of whether a given predicate evaluates to true or false [Chambers, 1993; Ernst et al., 1998].
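The single-dispatch behaviour described above for Java can be seen in a small sketch. All class and method names here (DispatchSketch, Shape, Circle, Renderer, demo) are hypothetical and chosen only for illustration.

```java
class DispatchSketch {
    static class Shape {
        String describe() { return "shape"; }
    }

    static class Circle extends Shape {
        @Override
        String describe() { return "circle"; }
    }

    static class Renderer {
        // Two overloads: the compiler picks one from the STATIC type of the argument.
        String render(Shape s)  { return "render(Shape) on " + s.describe(); }
        String render(Circle c) { return "render(Circle) on " + c.describe(); }
    }

    static String demo() {
        Shape s = new Circle();          // static type Shape, run-time type Circle
        return new Renderer().render(s); // overload chosen by static type of s
    }

    public static void main(String[] args) {
        System.out.println(demo());      // prints "render(Shape) on circle"
    }
}
```

The call to describe() is dispatched on the run-time type of its receiver (Circle), while the choice between the two render overloads uses only the static type of the argument (Shape). Under multiple dispatch, the run-time type of the argument could also influence that choice.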
Contextual dispatch is a generalization of these other dispatching mechanisms. The context in single dispatch is the run-time type of the object receiving the message; in predicate dispatch, it is the system state used to compute the predicate. Only contextual dispatch permits local frames of reference that conflict with one another. However, as these forms of dispatch increase in power and expressiveness, they become more complex to specify. How sharply this complexity increases, or whether it is outweighed by the benefits gained, are matters requiring further study. Metaphoric polymorphism purports to permit the identification of semantic correspondences between types, as independent from subtyping [Rinat and Magidor, 1996]. This concept was later made concrete (though renamed correspondence polymorphism) in a language called LCP [Rinat et al., 1999]. Their primary example is based upon the perceived similarity of an algorithm for computing powers of an integer with an algorithm for computing powers of a matrix. This idea is a limited form of retroactive abstraction (see Section 3.3): calls in the source text are reinterpreted so that they are sent to the appropriate implementation in the given context. The limitations lie in the fact that there can only be a one-for-one replacement of types and methods: no dynamic selection is possible, and there must still be a globally consistent frame of reference. Metaphoric/correspondence polymorphism is effectively retroactive genericity (see Section 8.4.1). Another mechanism that can be considered a form of dispatch is implicit invocation (also called publish-subscribe) [Garlan and Notkin, 1991]. Here, a given module provides an interface whereby external modules can register an interest in specific kinds of events as defined by that interface; each external module provides a callback function when it registers.
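A minimal sketch of such a registration interface, together with the announcement step it exists to support, might look as follows. The names ImplicitInvocationSketch, Document, onSaved, and save are hypothetical, and the use of java.util.function.Consumer as the callback type is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ImplicitInvocationSketch {
    // Hypothetical publisher module: it defines one kind of event ("saved")
    // and lets external modules register callbacks for it.
    static class Document {
        private final List<Consumer<String>> savedListeners = new ArrayList<>();

        // Registration interface: an external module supplies a callback.
        void onSaved(Consumer<String> callback) {
            savedListeners.add(callback);
        }

        void save(String name) {
            // ... the module's own work would happen here ...
            // The publisher must explicitly announce the event.
            for (Consumer<String> listener : savedListeners) {
                listener.accept(name);
            }
        }
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Document doc = new Document();
        doc.onSaved(name -> log.add("indexer saw " + name));
        doc.onSaved(name -> log.add("backup saw " + name));
        doc.save("thesis.tex");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that Document itself contains the decision about which events exist and where they are announced; the subscribers are decoupled from the publisher, but the publisher is not free of knowledge about being observed.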
When an event occurs that the module publishes, each external module that has subscribed to this event has its callback function invoked. One can consider this event to be a call that gets dispatched to zero or more method implementations, though without any return values. Implicit invocation requires that the publisher module announce specific events. As such, it must be explicitly aware that these events (and no others) are of potential interest to external modules and announce when they occur; often, this explicit knowledge is EEK with respect to the publisher. On the other hand, implicit context permits the form of the events of interest to be specified externally to a module. Thus, the events published can be tailored to meet the needs of the system without the need to invasively modify the module. As a result, from the system's perspective, that module would be providing an implicit invocation interface; from the module's perspective, it would not.

A framework is a collection of interacting classes, some of which are abstract [Johnson and Foote, 1988]. To reuse a framework, a developer subclasses the abstract classes as appropriate to the particular system being developed. As such, frameworks can be considered highly constrained middleware architectures where components must be added via inheritance (see Section 8.3.2) [Johnson, 1997]. Frameworks do not relieve us of the difficulties of evolution, since the frameworks themselves need to evolve [van Gurp and Bosch, 2001]. Indeed, this evolution can be particularly difficult since, presumably, existing systems depend on the original interface provided by the framework (or there would be no reason to invest in evolving the framework in the first place). In short, objects and frameworks are insufficient to support general software reuse or evolution by themselves [Griss et al., 1995].
Implicit context does not reject object-orientation, but can be used orthogonally to it, as has been shown in the case studies of Chapter 6.

8.3.2  Components and Middleware

The term software component [McIlroy, 1969] is used in a variety of ways. One possible definition offered by Szyperski [1997: p. 34] is: "A software component is a unit of composition with contractually specified interfaces and explicit context dependences only. A software component can be deployed independently and is subject to composition by third parties." Thus, "component" tends to imply more than "module," in that a module (as defined in this dissertation; see Section 4.1.1) can be quite tightly coupled to its system, and not intended for separate reuse. A component differs from a framework in that a framework allows custom modules to be plugged into it (via inheritance) while components are the modules getting composed [Johnson, 1997]. Through essential structure and implicit context every module has the potential to be treated as a component after the fact; consider the case of the Outline View of Eclipse (see Chapter 6). Components can be made to interoperate in one of two fashions: through "glue" techniques, described in Section 8.3.4; or through standardized interfaces to an architecture that is then used to communicate with other components. These standardized architectures are called middleware. A difficulty with components is indicated by Johnson [1997: p. 14]: "Each component makes assumptions about its environment. If components make different assumptions then it is hard to use them together." Others agree [Berlin, 1990; Garlan et al., 1995]. This need to make consistent assumptions naturally leads to the desire for standard environments in which to compose components, and thence, to the standard middleware architectures, described below.
POLYLITH [Purtilo, 1994] provides a "software bus" that allows the specifications of a program's structure, its deployment onto nodes, and inter-module communication to be separated. Simple communication statements within components are adapted to conform to the needs of the actual deployment geometry and communication protocol needs of a heterogeneous architectural and language environment. While POLYLITH is an attempt at making modules more flexible by separating the concerns of distributed and inter-process communication, it cannot remove the same degree of EEK as implicit context since it does not permit the indirect kind of communication where, for example, parameters can be filled in from the communication history.

Microsoft's Component Object Model (COM) [Williams and Kindel, 1994] permits components to communicate through universally-identified, immutable interfaces. When a component increases in functionality, rather than changing the definition of its interface, it defines an additional, new interface that encompasses the new functionality; this is intended to alleviate problems of version-control. Also, when a COM component A possesses an interface pointer to another COM component B, component A can query component B to see if it supports additional interfaces. COM assumes that component functionality never decreases. COM+ [Kirtland, 1997], which automates some of the specification processes of COM, permits the implementor of a class to specify attributes on COM components to express how to initialize system services, how to bind fields to other components, or hints on how to pass parameters. COM+ class specifications are riddled with EEK; such classes require intimate knowledge of COM+ to operate correctly. Neither COM nor COM+ provides a means to reflect upon system history.

The Common Object Request Broker Architecture (CORBA) [Object Management Group, 1998] permits components to request particular, registered services.
The broker locates other components that supply these services, and directs the communications appropriately. CORBA IDL context clauses are a means of passing environment variables across a network connection [Baker, 1997: p. 65]. JavaBeans (self-contained, reusable software units that can be composed into composite components in a standardized fashion, and manipulated visually in a builder tool) [Hamilton, 1997] provide BeanContexts [Cable, 1998] to permit an abstraction for the context in which a JavaBean logically functions at run-time. A JavaBean may query this context to determine the availability of services and to subsequently use those services. JavaBeans will therefore be riddled with EEK in support of the BeanContext mechanism; JavaBeans and BeanContexts will also need to agree upon names, semantics, and protocols for services. Although partial system history is represented via the hierarchy of nested BeanContexts and JavaBeans, no other reflection upon system history or state is implicit in the model. No matter how standardized the interface provided by one of these approaches, it inevitably needs to evolve. The need to maintain backwards-compatibility to support existing client modules, and to update the interface to accommodate new ideas and new features result in pressure in opposite directions; predicting such changes ahead of time is rarely possible [Wegner, 1996]. As a result, languages evolve, libraries evolve, and standards evolve; otherwise, they stagnate and are eventually abandoned. Additionally, no single architecture is ideal, or even usable, for all systems. Expecting modules to always plug in to any architecture like "LEGO blocks" is unrealistic. Most modules place constraints upon their systems in the form of dependences on external modules.
The need that a given module may have for certain global properties, such as particular security or fault detection protocols, to be supported by its system cannot be provided by an arbitrary framework [Ran, 1999]. In contrast, implicit context avoids the need for an overarching standard. Individual modules may disagree upon the global perspective, permitting differin