UBC Theses and Dissertations

A model and method for measuring information system size Wrigley, Clive Donald 1988

A MODEL AND METHOD FOR MEASURING INFORMATION SYSTEM SIZE

by

CLIVE DONALD WRIGLEY

B.A. (Honors), Simon Fraser University, 1980

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Faculty of Commerce and Business Administration)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
December 1988
© Clive Donald Wrigley, 1988

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
Vancouver, Canada
Date

ABSTRACT

This thesis develops a measurement model and method that allow information system professionals to establish measures of information system size that are accurate and can be established early in the system development life cycle. It reports the results of an empirical investigation into the aspects of requirements and design metrics which lead to the production of source code.

The theoretical foundations used to investigate this topic originate in systems theory, models of information systems and organizational theories of structure and complexity. Models of system development are reviewed as they play a key role in our notions of activities to perform during development. By drawing on existing estimating models from software engineering and other sources of practitioner literature, a model to predict development effort was synthesized. Three distinct constructs emerged: 1) system requirements, which drive effort; 2) personnel experience, which can mitigate effort; and 3) technology, which can also mitigate effort. System requirements were chosen for further definition and operationalization, since they are the principal source of development complexity and hence system size. An Entity-Relationship and Event approach was taken to establish an early measure of system requirements size. A theoretical framework of data and process complexity was developed which may be used to initially size a development project based on the information available at the requirements specification and design phases.
It is argued that this new approach is more general than existing sizing techniques, namely Function Points and Lines of Code. This new sizing approach is tested against 26 simple transaction-based processing systems developed in FOCUS. An automated Code Analyser was developed to reverse engineer these systems back to their design measures. The personnel and technology variables were held constant for this initial test. For the thesis, only retrospective measurement occurred, but it is expected that longitudinal measurement will eventually be possible. Two primary research contributions are seen emanating from this study. First is the development and application of theory to the problem of information system sizing. Second is a method for data collection and analysis which will help the software development industry move towards the goal of system development measurement and evaluation. This will improve planning for, and management of, information systems development.

TABLE OF CONTENTS

List of Tables
List of Figures
Abstract
Acknowledgements
Chapter 1. Introduction
  1.1. Thesis Statement
  1.2. Motivation: The Estimating Problem And Its Significance
    1.2.0.1. Pilot Survey
  1.3. Goals of the estimating process
  1.4. The measurement issues
  1.5. Research Methods used in this thesis
  1.6. Contribution to Knowledge and Profession
  1.7. Thesis Structure
Chapter 2. Relevant Literature Review
  2.1. Model Classification
  2.2. Historical Development
  2.3. Empirical Validations
  2.4. Comparison of Factors and Model Sensitivity
    2.4.1. Discussion: Effort Estimation and Productivity
  2.5. System Sizing
  2.6. Summary: The missing Link
Chapter 3. Theoretical Framework
  3.1. Framing the Issues
  3.2. Key Information System Concepts
    3.2.1. System Development Transformations
    3.2.2. Requirements Modelling
    3.2.3. A Dynamic Information System Model
    3.2.4. Requirements Transformation
    3.2.5. Requirements Complexity
    3.2.6. Requirements Sizing
    3.2.7. Square footage
  3.3. Processing Complexity
    3.3.1. Processing complexity based on E-R-E concepts
    3.3.2. Input Events
    3.3.3. Output Events
    3.3.4. Functionality
  3.4. E-R-E and Function Points
  3.5. Formalizing Requirements Size
    3.5.1. Events, States and Transforms
  3.6. Summary
Chapter 4. Empirical Research Design
  4.1. Research Model
    4.1.1. Detailed Research Model
    4.1.2. Metric Linkages
  4.2. System Development Measurement and Evaluation
    4.2.1. Calibration
    4.2.2. Measures of E, S and T From the Code
      4.2.2.1. System Dynamic Measurements: E
      4.2.2.2. System Static Measurements: S
      4.2.2.3. Process Measurement: T
  4.3. Example System
    4.3.1. System Description
  4.4. Automating The Expert Code Analyser
    4.4.1. Reliability
    4.4.2. The Automated Code Analyser
      4.4.2.1. Tool Development
      4.4.2.2. Tool Description
      4.4.2.3. Parsing Strategy
      4.4.2.4. Program classification
      4.4.2.5. Token counting rules
      4.4.2.6. Known limitations
    4.4.3. Systems Database
    4.4.4. Metrics Analysis
  4.5. Hypotheses
  4.6. Summary
Chapter 5. The Empirical Study
  5.1. Data Site 1
    5.1.1. The data set
    5.1.2. Pilot Investigation
    5.1.3. Pilot Validation
    5.1.4. Data Collection
      5.1.4.1. Data Validity
  5.2. Data Site 2
  5.3. Discussion
  5.4. Other Measurements from the data
  5.5. Summary
Chapter 6. Empirical Findings
  6.1. Unit of Analysis: Program Level
    6.1.1. Program Size
    6.1.2. Program Class
      6.1.2.1. Update Programs
      6.1.2.2. Output Programs
      6.1.2.3. Control Programs
      6.1.2.4. Unclassified Analysis
    6.1.3. Regression Results
      6.1.3.1. Regression Discussion
      6.1.3.2. Towards Parsimony
    6.1.4. Programmer differences
      6.1.4.1. Effects on Program Size
      6.1.4.2. Test For Learning Effects
  6.2. Unit of Analysis: System Level
    6.2.1. Demographics
    6.2.2. Metric Linkage at System Level
    6.2.3. Resource Consumption
  6.3. Unit of Analysis: Firm Level
  6.4. Summary and Discussion
    6.4.1. Generalizability
      6.4.1.1. System Size
      6.4.1.2. Model Validation
      6.4.1.3. Reverse Engineering
      6.4.1.4. Forward Engineering
      6.4.1.5. Hours Data
      6.4.1.6. Language
    6.4.2. Maintenance
    6.4.3. System Decomposition
Chapter 7. Summary and Research Directions
  7.1. Thesis Summary
    7.1.1. Empirical Limitations
  7.2. Direct Extensions
  7.3. Expert Behaviour
  7.4. Modelling
  7.5. Management
  7.6. Closing Remarks
References
Appendix A: Operand Coding Scheme
Appendix B: Pilot Validation Results
Appendix C: Systems Database Detailed Structure
Appendix D: Code Analyser Source Code

List of Tables

Table 2.1: Comparison of Estimating Approaches
Table 2.2: Effects on Productivity
Table 5.1: Pilot Regression - Explained Variance in Code Size
Table 6.1: Program class differences
Table 6.2.1: Regression model - Update class
Table 6.2.2: Regression model - Output class
Table 6.2.3: Regression model - Control class
Table 6.3: Reduced Model
Table 6.4: Programmer by Program Class
Table 6.5.1: Programmer Analysis - Updates
Table 6.5.2: Programmer Analysis - Outputs
Table 6.5.3: Programmer Analysis - Control
Table 6.6: System Size Measures
Table 6.7: Design Predictors of System Size

List of Figures

Figure 2.1: Simplified Water Fall Model
Figure 2.2: DeMarco's Project Classification
Figure 3.1: Requirements, Personnel and Tools Model
Figure 3.2: System Development Transformations
Figure 3.3: Information System Model
Figure 4.1: Research Model
Figure 4.2: Detailed Research Model
Figure 4.3: Metric Linkages
Figure 4.4: System Development - Measurement and Evaluation
Figure 4.5: Asset Disposal System: E-R-E Structure
Figure 4.6: Asset Disposal System Processes
Figure 4.7: Top Level Data Flow Diagram
Figure 4.8: ADS Main Menu Processes
Figure 4.9: Sub-Menu 1020: Report Processes
Figure 4.10: Batch Report Run 4080
Figure 4.11: Asset Status Report 4160
Figure 4.12: Outputs from Code Analyzer
Figure 4.13: Systems Database
Figure 6.1: Program size distribution
Figure 6.2: Hours vs. Lines of Code
Figure 6.3: Program size distribution - Site 2
Figure 6.4: Calculated vs. Actual length

ACKNOWLEDGEMENTS

It is impossible of course to identify all the contributors towards a thesis. Our thoughts are mosaics of what we have read, heard and discussed. Many ideas come from off-hand comments in seminars and conversation. It is amusing to remember oneself basking in the delight of what appeared to be a precocious new thought only to find several days later that another author had articulated the concept much better, five years earlier.
The role of the thesis advisor in this process is incalculable, so first and foremost I would like to acknowledge and thank Albert Dexter whose continued patience, support and humour smoothed out the rough spots while continually challenging me not to be satisfied with the ordinary. Although, at times this can be a curse without the self discipline to constrain one's ambitions. Al, you will be a friend and colleague for many long years to come.

To the other members of my committee, Craig Pinder and Son Vuong, for your insight and guidance through this lucubration and to the faculty and graduate students in the MIS division, I thank you for your candour. Critical appraisal is the route to knowledge.

To the managers who provided access to their data and to those working in the data sites, who unwittingly became involved in this project yet gave their fullest co-operation, I extend my gratitude. Hopefully the output from this research is not too small a payment.

Finally, to my family, Don, Dorothy, Lynda, Joyce and Dugal, whose unending patience and moral support, if not understanding, gave me the motivation to keep going. And of course to Shandy, who really knows how to live.

CHAPTER 1. INTRODUCTION

1.1. THESIS STATEMENT

The objective of this dissertation is to establish a theoretical and empirical link between:
1. The entities, events, and their inter-relationships that occur in the real world, and
2. The size of the implemented information system which represents those real world entities, relationships and events.

1.2. MOTIVATION: THE ESTIMATING PROBLEM AND ITS SIGNIFICANCE

There are essentially two distinct issues with which the Information Systems Manager is concerned. The first is the decision of whether or not to proceed with a software development project. Here the accuracy of the total life cycle development cost estimate is critical as it is the prime input into the cost/benefit decision models which we use to justify our investments. As Boehm [1981] writes:

"There is no good way to perform a software cost benefit analysis, break even analysis, or make or buy decision without some reasonably accurate method of estimating software costs, and their sensitivity to various product, project, and environmental factors." (pg. 30)

The second issue is the management of the development process. Total effort to be expended must be distributed over the project duration. The two aspects of this latter issue are scheduling of resources and project control. Estimates form a baseline against which the management process can be carried out. The usefulness of project management tools, such as PERT and GANTT charts, depends mainly on the accuracy of the original estimate. A good estimate will identify areas of project development that may cause problems or require a greater degree of managerial control. Without this baseline estimate, projects are out of control from their inception.
Two measures of system success are whether a project is delivered on time and within budget. Obviously the accuracy of the original estimate may determine if a system is considered successful given these criteria. As Pressman [1987] explains:

"In the early days of computing, software costs represented a small percentage of the overall cost of a computer based system. A sizable error in estimates of software costs had relatively little impact. Today, software is the most expensive element in many computer based systems. A large cost estimation error can make the difference between profit and loss or between system success and system abandonment." (pg. 98)

It is the purpose of this thesis:
1. To argue that system sizing is a major area of difficulty in estimating resource consumption,
2. To develop a theoretical approach to system sizing,
3. To demonstrate that the sizing model and method can be used to explain system size at the various phases of information system development, and
4. To demonstrate that units of system size can be related to units of resource consumption.

1.2.0.1. Pilot Survey

To establish if, in fact, estimating software costs is of real concern to practicing IS professionals, a pilot survey was conducted in the Vancouver area among data processing managers. These managers are members of the FOCUS users group and represent most of the larger DP shops in Vancouver.
Fifty questionnaires were distributed which asked basic questions about the use of estimating methods and the percentage of software development projects delivered on time and within budget. Of the 30% who responded, it was discovered that very little use was made of formal estimating methods. Most managers relied on past experience and gut feel. Further, their self-reported data on estimate accuracy revealed that over 50% of all projects were completed either over schedule or over budget. While we could speculate that the large percentage of projects being over budget was due to not using formal estimating techniques, it may also be due to the lack of usefulness of these techniques.

The basic findings of the survey were presented at a FUSE (Focus Users group) meeting in June 1987. A great deal of interest was expressed by these practitioners to the extent that several firms agreed to provide access to data on their own systems development projects.

1.3. GOALS OF THE ESTIMATING PROCESS

In order to construct an estimate, resources must be expended to gather information on which the estimate is based. This introduces two questions: 1) What is the appropriate information to gather? and, 2) What quantity of resources should be expended to gather this information in order to make an estimate? Once these two questions have been addressed an estimate is constructed. Finally, a decision must be made as to the quality of the estimate. This last point is best expressed by Aristotle (330 B.C.E.)¹:

"...it is the mark of an instructed mind to rest satisfied with the degree of precision which the nature of the subject admits, and not to seek exactness when only an approximation of the truth is possible..."

¹ In Pressman [1987], pg. 82.

Within the broader scope of IS investment decisions and project management many issues enter into the picture. More specifically, an IS development estimate should attempt to address the following basic questions [Rubin 1983]:
1. What quantity of human resources are required?
   a. What kind of skills are needed?
   b. How many people are required?
   c. What are the constraints?
2. How much will it cost?
3. How long will it take?
4. What are the risks?
5. What will the effect be on the existing portfolio?
6. What are the trade-offs?

While all of the above are clearly important, this research will focus on a more critical issue on which all of the above questions are based. The central issue revolves around the concept of system size. Early in the System Development Life Cycle (SDLC), what properties of a system are available for counting and what units should be used? Can these measurable properties of system requirements be directly linked to implementable code? If this is the case, then by establishing the linkages between requirements and code, it is possible to construct estimates of code size based upon the requirement measures. In general, once this central question can be accurately assessed, dollar cost figures, schedule, skills, and number of people required can be derived reasonably well given existing project management tools.
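The idea of linking requirement measures to code size can be illustrated with a small sketch. The Python fragment below is an illustration under stated assumptions, not the model developed in this thesis: it fits a one-variable least-squares line relating a single requirements count to Lines of Code, then scores one estimate with the relative-error form (Actual - Estimated) / Actual, which the thesis gives in a footnote in Section 1.4 as its definition of accuracy. The function names, the single-predictor form and every number in the data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def relative_error(actual, estimated):
    """Accuracy expressed as (Actual - Estimated) / Actual."""
    return (actual - estimated) / actual


# Hypothetical calibration data from completed systems: a requirements
# count (for example, entities plus events) and the delivered Lines of Code.
# These values are invented for illustration only.
requirement_counts = [4, 7, 10, 15, 20]
lines_of_code = [130, 220, 310, 460, 610]

intercept, slope = fit_line(requirement_counts, lines_of_code)

# Estimate the size of a new system from its requirements count alone,
# then compare against a (hypothetical) actual size once it is known.
new_count = 12
estimated_loc = intercept + slope * new_count
actual_loc = 400

print(f"estimated LOC: {estimated_loc:.0f}")
print(f"relative error: {relative_error(actual_loc, estimated_loc):.3f}")
```

A single predictor keeps the sketch readable; a calibration against real systems would more plausibly use several requirement and design measures and a regression over all of them.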
The estimating process can be considered as a semi-structured problem which requires information in order to predict the value of a number of variables. We can safely ignore the pathological case where an estimate is requested in the absence of any information. The estimating process is ongoing, since, as more information is gathered, refinements on the initial estimate are possible. After all information has been gathered, i.e. at project completion, the difference between the estimate and the actual should be zero. The principal goal is to produce an accurate estimate of resource consumption as early as possible with the minimum amount of information.

1.4. THE MEASUREMENT ISSUES

In order to construct an estimate of any kind, a measurement must be made. The first measurement issue is to determine which variables in the development process are causally connected. The second is to actually measure those variables. For example, in order to estimate system development dollar costs you must first be able to determine the functional relationship between resource consumption and dollars. Clearly, the largest determinant of costs in software development is the consumption of the labour resource, but estimating labour requirements is the central problem. We first have to be able to measure some properties of the software project in order to estimate labour effort. One common practice is to estimate overall project size by first estimating Lines of Code (LOC) and then estimate labour effort as a function of this size metric.
As we shall see there is a serious conceptual flaw with the LOC estimating approach. In brief, LOC is correlated with effort but does not causally determine effort. Effort is expended to produce LOC. It is effort that causes the production of LOC, not the other way around. A second approach is to size a project using Function Points (FP). While this basic approach is sound it is limited in terms of when accurate information is available to make an estimate. The timing of the measure is critical. If a size estimate of LOC is made after the detail design is complete then we would feel more comfortable with its accuracy² than if LOC were estimated during analysis. LOC may be an appropriate basis for estimating effort but only at a certain point in time, usually late in the development process. Likewise the accuracy of a FP measure will be higher if it is made when sufficient information is available.

² Here accuracy is defined as (Actual LOC - Estimated LOC) / Actual LOC. This is generalizable to other accuracy measures in the general form (Actual measure - Estimated measure) / Actual measure.

1.5. RESEARCH METHODS USED IN THIS THESIS

1. Existing estimating methods were investigated and found to have deficiencies.
2. A causal model was synthesized from the literature based on existing models and the reported effects of various factors on the development process. The model contains three major variables which are claimed to causally affect effort: System requirements, Personnel experience, and Technology.
3. A pilot survey was conducted to establish the practical need for research in this area and to solicit access to live data in order to test the model.
4. This thesis starts from the theoretical position that the underlying entities and events which the information system models are the root source of effort to build the system. Theoretical development from this position draws from systems theory concepts, theory of the system development process, and requirements modelling using both the data flow and the Entity Relationship data model approaches.
5. Data on completed systems was collected and used to calibrate the proposed requirements and design metrics. Technology and experience were held constant in the field setting.
6. The requirements and design size metrics were empirically validated by predicting code size based on information present in the source code of completed systems using regression analysis.

1.6. CONTRIBUTION TO KNOWLEDGE AND PROFESSION

1. Current techniques for sizing system development projects are based on practice not theory. It is anticipated that by applying a theory of requirements size to estimating effort required to build an information system a theoretically based metric can be developed.
2. The focus of existing estimating techniques can be considered to be on the software processes and not on the data requirements. This research will attempt to show, in part, that data structures are more general than processes for the purposes of estimating³.
3. An earlier, more parsimonious, more reliable metric of requirements and design size is expected. It is anticipated that the E-R data structure and event approach to estimating (E-R-E) encompasses the two most common sizing techniques, namely, Function Points and Lines of Code. An E-R-E diagram is used to capture the essential objects and relationships that exist in the real world and their dynamics. By measuring more permanent, more basic objects in a business system, we should be able to predict not only effort but also Function Points and Lines of Code. If this can be achieved, then the E-R-E approach could be considered a more generalized form of the other two approaches.
4. The design and development of a prototype source code analyser provides the capability to reverse-engineer installed systems. This provides management with a tool to fully understand their installed software base by providing reliable software metrics.
5. This study is the first to investigate estimating in system development environments using a 4th generation language. If generalizable, this will open the door to estimating in modern development environments.

³ It may also be the case that a data structure approach is a more general approach to analysis and design. However, this issue will not be addressed in this thesis.

1.7. THESIS STRUCTURE

Chapter 2 of this thesis reviews the relevant literature on estimating and software metrics. Previous estimating models are also reviewed and assessed.
The factors found to affect development effort are organized into a table where a basic structure emerges. Three major influences are found which lay the groundwork for the theoretical development in Chapter 3.

Chapter 3 develops a theory of requirements size and proposes a method of measurement based on entities, relationships and events.

Chapter 4 moves from the theory to the empirical domain. It presents a research model containing both the theoretical constructs developed in Chapter 3 and the empirical measures proposed to substantiate the theory. An example system is used to articulate the measures suggested. A field study research design is described where the theory may be tested. The design and construction of an automated Code Analyser is presented to address the issue of measurement reliability.

Chapter 5 describes the empirical setting and the data collected in order to test the estimation model. Data reliability and validity issues are addressed by comparing manual counting methods with the automated tool.

Chapter 6 discusses the results of the data analysis and identifies the limitations of the research.

Chapter 7 contains a discussion on the ramifications of this research both in terms of extending the academic investigation and in terms of its potential contribution to industry.

CHAPTER 2. RELEVANT LITERATURE REVIEW

2.1. MODEL CLASSIFICATION

Several authors have suggested a taxonomy of estimating approaches [Wolverton, 1974; Basili, 1980; Benbasat and Vessey, 1980; Kitchenham and Taylor, 1984; Conte, Dunsmore and Shen, 1986]:

1. Expert judgement: Estimates are arrived at from "gut feel" and perhaps some experience with other projects sharing a common attribute. Subjective evaluations and comparisons are used heuristically. This technique is used when completely new projects are considered. It marks the most common approach used in the early years of computing. Also, a pilot survey found this approach still common practice.

2. Mathematical or Algorithmic models: The next stage of development appears to be the construction of formulas based on attributes of a number of completed projects. Curve fitting and factor analysis are applied to a dataset in the hopes of teasing out some underlying effects. The resulting equation(s) has a number of parameters which presumably correlate highly with system development effort. Distribution of this effort over the schedule is derived from assumptions about the shape of the life cycle curve.

3. Bottom up: This approach is widely employed when a structured systems development methodology is available. The project is broken down into identifiable tasks, each task is separately estimated, then summed up over the entire project. While this is expected to reduce error variance it requires far more effort than is feasible for initial job sizing.

4. Top down: Global properties of the software product, technology, environment and personnel are used to create an overall estimate. Effort is then distributed over the project according to the assumed shape of the life cycle curve. It is this technique which offers the most promise for generating early accurate estimates.

Kitchenham and Taylor [1984] point out that all of the above approaches share a common attribute: they all require historical data of completed projects, and even the so called theoretical mathematical models are constructed and calibrated from real data. It would appear from their analysis that there is an absence of pure theory in this area. A brief comparison of the strengths and weaknesses of these approaches appears in Table 2.1.⁴

Table 2.1

| Approach | Strengths | Weaknesses |
| Expert judgement | Assessment of reasonableness, interactions, exceptional circumstances | No better than experience, imperfect recall |
| Algorithmic | Objective, repeatable, analyzable formula; efficient, good for sensitivity analysis | Subjective inputs; calibrated to past not future |
| Bottom-up | High level of detail, more stable | May overlook some costs, requires more effort |
| Top down | System level focus | Less detailed basis, less stable |

⁴ Adapted from Boehm [1981], pg. 342

2.2. HISTORICAL DEVELOPMENT

The first published work on software cost estimation came from Nelson [1966]⁵. He constructed a regression equation from 169
software projects studied by System Development Corporation (SDC). This regression model contains 14 parameters which are rated by the estimator. The dependent variable is measured in man-months (MM). Against its own data base, the model produces a mean estimate of 40 MM with a standard error of 62 MM. With a 1.5 coefficient of variation, clearly something may be missing. The principal conclusion from the study was that there were too many non-linear relationships for a linear regression to be meaningful.

⁵ See Boehm [1981], pp. 510-519, and Mohanty [1981] for further descriptions.

During the early seventies, TRW was involved with the development of numerous real time military projects. Out of this project base, Wolverton [1974] developed a graph relating cost per instruction to difficulty of project, for 6 different project categories. Difficulty is on a percentile scale, i.e. relative degree of difficulty compared to other projects. Cost per instruction relates to a line of assembler code. His model is basically algorithmic: it takes three input parameters (lines of code, level of difficulty, and the historical cost base) and produces a total project cost estimate. His model forecasts that difficulty can change cost per instruction by 100%.

In 1975 Brooks wrote his now famous book "The Mythical Man Month". While not having a model per se, Brooks motivated the central idea that personnel and time are not interchangeable. It is this non-linear relationship which had confounded the management of projects for years. "Adding people to a late project makes it later".⁶

⁶ Brooks [1975], pg. 25

A major study to investigate programming productivity was conducted by Walston and Felix [1977] at IBM's Federal Systems Division (FSD). They investigated 60 projects, written in 28 different languages, ranging in size from 4000 to 467,000 lines of code. Productivity was found to range from 27 to 1000 delivered source lines per man-month, nearly two orders of magnitude! From this data set 68 project attributes were collected and compared to lines of code per man-month; 29 of these were found to be significant. The basic model calculates a productivity index based upon the 29 significant "cost drivers":

$$\text{Productivity index} = \sum_{i=1}^{29} W_i X_i$$

where:
$W_i = 0.5 \log_{10}(PC)_i$
$X_i$ = project attribute
and $(PC)_i$ is the productivity change between a low and high rating for the $i$th cost driver attribute.

Putnam's SLIM model [Putnam, 1978; Putnam and Fitzsimmons, 1979] uses the Rayleigh distribution to relate the three critical variables: source lines of code, total man-years, and project duration. The software equation is:

$$S = C\,K^{0.33}\,t^{1.33}$$

where:
$S$ = delivered source lines
$K$ = life-cycle effort in man-years
$t$ = development time in years
$C$ = a technology constant

The user of the model typically inputs an initial size estimate, then manipulates the shape of the curve to arrive at the final effort estimate. The shape is modified by two parameters, the productivity factor C, and its initial slope, representing the constraint on personnel buildup.
These parameters can be calibrated either by accepting input data from completed projects or answering a series of 22 questions. In 1973, C was calibrated at 4900. By 1978 this figure had reached 10,000, indicating either productivity gains of 100% over 5 years or lack of stability in the parameters.

Later, Allan Albrecht of IBM changed the preoccupation with lines of code as the principal sizing metric by introducing function points [Albrecht, 1979]. Instead of estimating lines of code, a count is made of identifiable system functions: external inputs, external outputs, logical internal files, external interface files, and external inquiries. Each function has an associated number of points:

External Input = 4
External Output = 5
Logical Internal File = 10
External Interface File = 7
External Inquiry = 4

The software product is initially sized by multiplying the number of identifiable software functions in each category by its associated function points. This function count is modified by a number of complexity adjustment factors to account for the different kinds of system requirements and development environments. In the original model the adjustment could modify the initial function count by plus or minus 35%; however, the effects of complexity have since been revised upwards to plus or minus 300%. This model is the first example of a top down approach to estimation.

In 1981 Barry Boehm of TRW published his COnstructive COst MOdel (COCOMO).
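Before turning to COCOMO, the function point arithmetic described above can be made concrete with a minimal Python sketch. The weights and the plus or minus 35% adjustment bound are the ones quoted in the text; the function names, dictionary keys, and example counts are hypothetical and chosen only for illustration. This is not Albrecht's published procedure in full, only the initial count and the range the original adjustment allows.

```python
# Points per function type, as quoted in the text above.
FUNCTION_POINT_WEIGHTS = {
    "external_input": 4,
    "external_output": 5,
    "logical_internal_file": 10,
    "external_interface_file": 7,
    "external_inquiry": 4,
}


def function_count(counts):
    """Unadjusted function count: sum of (number of functions * points)."""
    unknown = set(counts) - set(FUNCTION_POINT_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown function types: {sorted(unknown)}")
    return sum(FUNCTION_POINT_WEIGHTS[kind] * n for kind, n in counts.items())


def adjusted_bounds(count, max_adjustment=0.35):
    """Lowest and highest adjusted count for a +/- max_adjustment factor."""
    return count * (1.0 - max_adjustment), count * (1.0 + max_adjustment)


# Hypothetical system used only as an example.
example_counts = {
    "external_input": 10,
    "external_output": 8,
    "logical_internal_file": 5,
    "external_interface_file": 2,
    "external_inquiry": 6,
}
example_total = function_count(example_counts)
example_low, example_high = adjusted_bounds(example_total)
```

For the hypothetical counts shown, the unadjusted count is 168, and the original 35% adjustment would place the final figure somewhere between roughly 109 and 227 function points.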
COCOMO is based on the analysis of 63 software projects during his tenure as Director of Software Research and Technology. The basic model relates "thousands of delivered source instructions" (KDSI) to "effort" (MM):

$$MM = 2.4\,(KDSI)^{1.05}$$

The development schedule in months (TDEV) is estimated next with:

$$TDEV = 2.5\,(MM)^{0.38}$$

The distribution of effort by phase within this time period is taken from the normative prescriptions of the traditional life cycle curve. The detailed COCOMO considers 15 "cost drivers" which have been found to affect productivity. These cost drivers are applied to the different phases of development to adjust the original effort estimate upward or downward. The basic COCOMO can be considered an algorithmic model while the consideration of macro effects in the detailed version makes it a hybrid somewhere between top-down and algorithmic. When applied to its own database of projects, COCOMO predicts development costs within 20% of actuals 70% of the time. Boehm claims this translates to a standard deviation of the residuals of roughly 20% of the actuals.⁷

The Bailey-Basili metamodel was published in 1981 and is similar to Boehm's COCOMO model.
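As a brief aside before the metamodel is described, the two basic COCOMO equations quoted above can be sketched in Python. The coefficients and exponents are those given in the text; the function names and the 32 KDSI example size are assumptions made for illustration, and the cost drivers of the detailed model are deliberately left out.

```python
def cocomo_basic_effort(kdsi):
    """Effort in man-months from size in thousands of delivered source
    instructions: MM = 2.4 * KDSI**1.05."""
    if kdsi <= 0:
        raise ValueError("size must be positive")
    return 2.4 * kdsi ** 1.05


def cocomo_basic_schedule(man_months):
    """Development schedule in months: TDEV = 2.5 * MM**0.38."""
    if man_months <= 0:
        raise ValueError("effort must be positive")
    return 2.5 * man_months ** 0.38


# Hypothetical 32 KDSI product, used only as an example.
example_effort = cocomo_basic_effort(32.0)
example_schedule = cocomo_basic_schedule(example_effort)
```

For the hypothetical 32 KDSI product this gives about 91 man-months spread over about 14 months. Because the size exponent is only slightly above 1, effort grows a little faster than linearly with size, while the schedule grows much more slowly than effort.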
Developed from 18 projects in the NASA-Goddard Software Engineering Laboratory, the metamodel uses prior effort and size data to scale three parameters:

$$\text{Effort} = a\,(\text{Size})^{b} + c$$

Two other project attributes, total methodology (METH) and cumulative complexity (CMPLX), are then used to determine effort multipliers and error ratio (ER) in the form:

$$\text{Effort} = ER\,[\,a\,(\text{Size})^{b} + c\,]$$
$$ER = d\,(\text{METH}) + e\,(\text{CMPLX}) + f$$

When applied to its own database the resulting equations were:

$$\text{Effort} = ER\,[\,0.73\,(\text{Size})^{1.16} + 3.5\,]$$
$$ER = -0.036\,(\text{METH}) + 0.009\,(\text{CMPLX}) + 0.80$$

with a standard error of estimate of 1.15.

The ESTIMACS model was developed and published in 1983 by Howard Rubin of Hunter College [Rubin, 1983]. As it is a proprietary product, no detailed information is currently available. The model engages the user in a series of 25 questions. It uses a modified function point approach for a size estimate which is then adjusted by assumptions about project complexity. This adjustment can change the function point estimate by a factor of 2. In addition, ESTIMACS produces staffing profiles, a development schedule and a project risk estimate.

International Telephone and Telegraph formed a study group to investigate the factors affecting productivity. The results were published by Vosburgh [1984]. The group analyzed 44 projects, written in several different languages, ranging in size from 5000 to 500,000 lines of code. The projects were a mix of switching systems, defense applications, and process control.
Beginning with a list of 100 factors, the researchers found 5 to be significant. Regressed against the dependent variable effort, these 5 explained 65% of the variance:

1. Complexity and resource constraints 16%
2. Client interface and experience 12%
3. Modern programming practice 24%
4. Personnel experience 4%
5. Stability of specifications 9%

The most recent estimating product, SPQR/20 [Jones, 1986], was also derived from Albrecht's function approach. Developed by Capers Jones of Software Productivity Research Inc., it is an interactive package which provides similar outputs to ESTIMACS. The user is prompted by a series of product and project related questions. Then the project size estimate is input by the user in terms of function point categories.

In addition to the above models, a number of bottom-up estimating approaches have been developed from proprietary system development methodologies. One example of this class is ESTIMATE/1 developed and used by Arthur Andersen and Co. The basic structure of the model is as follows:

$F_f$ = Estimating factor, f = 1..n
$K_k$ = Constant to adjust units to common base, k = 1..n
$T_t$ = Task in work breakdown structure, t = 1..m
$S_s$ = Segment: includes a group of tasks, s = 1..p
$C_s$ = Estimate by analyst of overall complexity for segment s
$UT_t$ = # of units in task t
$UT_t = fn(F_i K_i, \ldots, F_j K_j)$
$HT_t$ = Hours estimated to complete task t
$HT_t = UT_t \cdot fnc(C_s, T_t)$ ⁸

⁸ fnc is derived empirically and is considered proprietary information.

A base estimate is calculated by summing up over all tasks. This base is then adjusted for several macro complexity factors. The complexity multipliers at this macro level plus the multipliers at the task level can affect total project costs by several hundred percent. The distribution of effort is explicit in the work breakdown structure. Cost figures for each category of personnel, e.g. senior analyst down to clerks, are multiplied by the number of hours found in the task estimates. Each person on the project team also has a productivity multiplier which is used to adjust the total cost figure up or down.

2.3. EMPIRICAL VALIDATIONS

There have been few attempts to externally validate the various models against other projects to date. There are several possible reasons for this. First, many consider their project database as proprietary information. The second, and far more likely, is that the data is simply not collected. Once a system is up and running there is little motivation to conduct a post mortem. It is unlikely that project management systems are geared to track critical data points. Third, formal estimation techniques are not widely used. Expert judgement may still be the dominant approach.

Brooks [1981] reviewed the Walston and Felix database and noted that 4 of the 29 variables related to structured programming. These variables were grouped as one and then the projects were re-analyzed.
He found that productivity gains of 35% were achieved for small projects reporting the use of structured programming, while gains of 200% were found for larger projects using similar techniques. Further, he found that in projects using unstructured techniques low productivity was highly correlated with high customer interface complexity, high application complexity, timing and storage constraints, and low personnel experience.

Albrecht and Gaffney [1983] collected data on a new set of projects in an attempt to establish a correlation between function points and source lines of code. This is important because a major problem in estimating is in obtaining a reasonable feel for the software size. Their sample consisted of 24 projects developed by IBM DP Services. Relations were found between language type, size, and work effort. It appears from this article that function points map well to eventual lines of code.⁹ In a separate validation study, Kemerer [1987] found a .65 correlation between function points and lines of code for 15 COBOL projects.

Behrens [1983] collected data from 11 projects completed in 1980 and 14 completed in 1981. The projects were sized using Albrecht's function point technique. Cost data was made available from an automated project management system. He found an exponential relation between function points and work effort. Language used and development environment were also found to affect work effort.
In contradiction to many studies, years of experience was not found to be a significant factor.

Kitchenham et al. [1984] evaluated both SLIM and COCOMO in a joint research project between ICL and British Telecom. Their objective was to find an estimating model to use in future development projects. Their dataset consisted of 20 completed projects dealing with advanced telephone switching. Many of these systems were of a real-time nature and developed at many different sites. Plots of actuals vs. estimates show little systematic relation. They pointed out many difficulties in using and calibrating the models to their particular environments. Their conclusions were to reject the use of either model and to establish their own historical database.

⁹ Jones [1986], pg. 77 gives examples of the size mapping between 25 languages and function points.

Kemerer [1987] conducted a post hoc evaluation of 15 projects using 4 of the previous models: SLIM, COCOMO, ESTIMACS, and FUNCTION POINTS. Data were collected from completed projects within the same organization. Nearly all projects were developed in COBOL and ranged in size from 39 thousand source lines of code (KSLOC) to 1107 KSLOC, with the mean around 190 KSLOC. Each of the models was used to estimate total effort, which was then compared to actual effort. SLIM had an average percent error of 772%, i.e. SLIM consistently estimated more effort than was actually used. Similarly, COCOMO had an average error of 614%.
Albrecht's function point technique did much better with the average error at 102%. Finally, Rubin's ESTIMACS package produced errors of 167%.

2.4. COMPARISON OF FACTORS AND MODEL SENSITIVITY

Five of the preceding models have been selected for further analysis. The selection was made informally, based primarily upon the publication of empirical results but also upon the frequency of references by other authors. Table 2.2, from Wrigley and Dexter [1987], contains a collection of 74 factors which have been found to affect development effort.

Several factor classifications appear in the literature. Boehm [1981] organizes 29 factors into 5 major groupings: size, program, computer, personnel, and project attributes. Factors are equated to attributes in his analysis. Kemerer [1986] builds on this basic taxonomy but adds a sixth group, user factors, and renames program to product, and computer to environment. Although Boehm's factor classification has been generally maintained, a higher level of abstraction has been added to serve as the basis of a conceptual model presented in Chapter 3.

Table 2.2 Effects on Productivity (%)

Columns (models): IBM-FSD, COCOMO, SLIM, ESTIMACS, SPQR/20

SYSTEM REQUIREMENTS:
  Product Size
    Data Base Size
    # of Items in Database/1000 LOC
    # of Logical Files
    # of Logical Business Inputs
    # of Logical Business Outputs
    # of On-line Inquiries
    # of Business Functions To Implement
    % Functions User Already Performs
    % Functions Already Automated
    Source Code Re-usability
  Complexity
    Customer Interface Complexity
    Complexity of Code
    Complexity of Application Processing
    Complexity of Program Flow
    % Code Non-Mathematical or I/O
    Software Complexity
    Complexity of Processing Logic
    Logical Complexity
    Structural Complexity
    Data Complexity
  Time
    Batch, On-line, Real Time
    Execution Time Constraints
    Performance Explicit Design Criteria
    Response Time Constraints
    % Code for Real Time or Interactive
  Documentation
    User Doc. (Excellent..Informal)
    # Pages Documentation/1000 LOC
  Reliability
    Required Reliability
    Back-up System Required
    Automated Error Detection Needed
    Special Disaster Recovery Needed
    Critical Business System
  Project Class (Personnel..Military)
  Project Type (Batch..AI)
  User Participation
  Requirements Volatility

PERSONNEL EXPERIENCE:
  Overall Personnel Experience
  Analysts Capability
  Programmer Capability
  Language Experience
  Experience with Operational Hardware
  Application Experience
  Project Novelty (Conversion..New)
  Comparable System in Operation
  User Knowledge of Data Processing
  Customer Experience

PROJECT DEVELOPMENT:
  Methods and Tools
    Use of Structured Programming
    Use of Design and Code Inspections
    Use of Top Down Development
    Use of Chief Programmer Team
    Use of Modern Programming Practices
    Use of Software Tools
    Use of Automated Design Tools
    Language Used
  Environment
    Distributed, On-line, or Batch
    Computer Turnaround Time
    Computer Access Open
    Computer Access Closed
    Classified Security Environment
    Office Facilities (Separate..Cramped)
    # Sites in Other Cities
    # People Relocated for Project
    Travel Time to Involved Sites
  Process
    Development Schedule Constraint
    Constraints on Program Design
    Main Storage Constraints
    Virtual Machine Volatility
    Hardware Concurrently Developed
    % Programmers in Design & Development
    (Mean Staff Size)/Duration
    User/Management Agreement
    # of People in Organization Involved
    # of Organizational Units in Project
    # Users on Development Team

Cell values in extraction order (row and column alignment lost in the extracted text):
System requirements block: 44 • 73 • * • ft • • • ft • * 1000 3 0 0 70 106 36 42 135 • ft ft * * 77 95 • * 30 37 30 6 * 180 • • • • • * • 15 140 52 50
Personnel experience and project development block: 2 1 0 2 0 0 1 44 • 130 • 2 1 5 30 * 114 44 * 160 4 7 • • 50 » * 54 78 54 64 8 5 130 ft 107 40 • ft ft ft 44 ft 58 78 8 5 40 ft • ft 2 0 ft i 64 102 8 5 ft 75 68 155 76 ft ft ft ft
The three top level factor groupings are system requirements, personnel experience and development technology. Analysis of Table 2.2 shows that most of the 74 factors map reasonably well into these three major groupings.

Another aspect of Table 2.2 is worth noting. Complexity has been split off as a separate class. One reason for this is the general lack of clarity as to what the definition of complexity is and the resulting lack of rigour in measuring its effects. But more importantly, complexity has been found to influence much of the development effort. For example, in 1979, Albrecht allowed complexity factors to impact the estimate by plus or minus 35%. This has since been revised upward to 300% potential effect [Jones 1986].

Since authors publish their results in different ways it was necessary to convert the various findings to a common base for comparison. The figures in the columns under each model represent the potential impact on productivity measured in percent. What this means is that, ignoring all other factors and possible correlations, the productivity of projects having a high rating on the factor was higher by X%, when compared to projects having a low rating on the factor (at some level of significance). Where not otherwise obvious, the usual scale for each factor was low, medium and high. For example, in the IBM-FSD database, projects having a high level of personnel experience reported productivity gains of 210% compared to projects having low personnel experience.
Clearly for some factors the reverse effect is expected. For example, in the same dataset, productivity gains of 300% were found for projects with low customer interface complexity. The '*'s are used to show that the author includes this factor in his model but the weight of its impact is not presently available¹⁰. More than one entry per line indicates that this factor is referred to exactly the same way in each of the indicated models. It is obvious that there is much duplication in Table 2.2. One purpose of presenting the data in this manner is to show the diversity and ambiguity surrounding these models.

2.4.1. Discussion: Effort Estimation and Productivity

The concepts of software effort estimation and programmer productivity are inextricably linked. Explicit in all models of cost estimation are notions of factors which either increase or decrease software development effort. Programmer productivity gains are achieved by enhancing those factors which are believed to decrease effort and mitigating those that are believed to increase effort. For the most part these factors have been empirically derived from collections of projects.

A question of external validity can be raised when models are constructed from historical data. The project mix may significantly influence the model parameters. For example, of the 63 projects in Boehm's COCOMO database only 7 are classified as business applications while 24 were developed in FORTRAN and 20 in Assembler. Generalizations from this sample to other areas may be risky.
The better empirical studies report the effects of various factors on productivity. Productivity, however, is a deceptively amorphous concept. In fields other than software engineering there is usually some reasonable metric for the output or yield from a process. Likewise, inputs can be measured. Economic productivity can be easily computed as yield/input. The problem we face in software development is that both the inputs and the outputs have not yet had a stable metric for comparison. Nevertheless, if we are to progress in this area, some measures are required. Currently the common usage is to measure inputs in terms of man-months (MM)¹¹ and outputs in lines of code (LOC). Productivity is then defined as LOC/MM. One difficulty in interpreting the empirical studies is that factors affecting LOC and those affecting MM are not explicitly separated.

Another anomaly in using LOC/MM as a measure of productivity is that the majority of development effort is not coding [McKeen 1981, 1983]. If we are simply interested in the coding phase then LOC/MM may be reasonable, but many other types of people are involved in project development such as users, analysts, designers etc. Assigning productivity indices for these people is more problematical.

¹⁰ In an independent test of SPQR/20 the authors found each of the complexity factors to impact development costs by up to 80%, and by nearly one order of magnitude when combined. However, this may be due to the specific attributes of the software system used as a test case.
The ability to construct an accurate estimate of effort critically depends upon implicit assumptions about productivity. Even assuming the size estimate is accurate, productivity, and hence effort, is influenced by many of the factors identified in Table 2. Currently, there are insufficient studies to accurately assess the effects of these factors on development effort.

As a side issue, activities included in the definition of effort can affect estimates. Brooks [1975] points out that while coding effort may be the easiest to estimate it may only comprise 20% of the total system effort. Management effort, clerical support, maintenance, training etc. are significant activities to include. Comparisons among published studies, therefore, are confounded through differences in definitions.

Perhaps the most important idea to come out of Brooks' and Putnam's work is that effort and schedule are strongly linked. Total effort can rise exponentially as project schedule is shortened to less than some optimum time. The reasons for this are cited as coming from the communication load among project personnel and diminished productivity as stress levels go beyond some optimum point. The central idea is that projects must go through a natural gestation period. It is claimed that minimum effort, and hence minimum cost, may be achieved with the appropriate schedule. The theoretical reason for this has received little, if any, attention in the literature.

¹¹ Non-gender specific.
There is the implicit assumption in all estimating models to date that the factors which affect development effort are orthogonal. This may be due to insufficient data points to check for interdependencies. But, if we are to build more parsimonious estimating models, then redundant factors must be identified and removed to avoid double counting and confounded results.

As can be seen from the preceding discussion, many theoretical and empirical limitations exist with the various estimating models. A further review and critique of these approaches appears in Boehm [1981], Mohanty [1981], and Wrigley & Dexter [1987]. We turn now to an area of the literature which forms key components in later theoretical development.

2.5. SYSTEM SIZING

The issue which emerges from existing estimating models is that an accurate, early estimate of system size is crucial in order for the estimate to be useful in predicting actual effort to build the given system. At present the two sizing approaches are the lines of code (LOC) approach and the function point (FP) approach. Unfortunately, neither of these approaches utilizes information about the system being developed which is captured with structured analysis methods (see e.g. Gane & Sarson [1979], Warnier [1976], Jackson [1975], DeMarco [1978, 1982]).

Several models exist in which a size estimate in LOC is the prime input into the model: Metamodel - Bailey and Basili [1980], COCOMO - Boehm [1981], SLIM - Putnam [1978, 1979]; and Wolverton [1974].
The rationality of using LOC, or any other output metric of a system development effort, as an input to an estimating model, is questionable. The problems with using code size as an input into an effort equation are as follows:

1. The size of a software system is the result of many contingencies. It is the by-product or cumulation of all factors in the development process.
2. Size, measured in LOC, is that which results after requirements have been met. It should not be considered as a target.
3. Size estimates at the requirements phase are quite subjective. Accurate estimates may not be available until after the detail design is complete.

The fundamental assumption in using any of the code oriented sizing models is that a priori, a reasonable estimate of system size in LOC is available. This requires a subjective estimate of project size to be initially input to the model. Statements such as:

"An initial study has determined that the size of the program will be roughly 32,000 delivered source instructions..." (Boehm [1981], pg. 63).

"It is usually reasonable to estimate a range of possible sizes for the system..." (Putnam [1979], pg. 356).

"Given that S (LOC) can be estimated..." (Kitchenham and Taylor [1984], pg. 82).

are not explicitly stated as the critical assumption. In the Golden et al. [1984] review of Putnam's model, it is conceded in a closing paragraph:

"One difficulty and unresolved problem with the use of this or any similar model is system sizing, that is, estimating lines of code. It follows that a good requirements and system specification is needed." (pg. 14).

An example of how difficult this sizing problem is comes from a report by Conte et al. [1986] on a study by Yourdon. Several experienced software managers were asked to estimate the size of 16 completed software projects given only the complete design specifications for each project. An R² of .07 between actual size and estimated size is reported. An interesting observation from this study is that the expert analysts consistently underestimated the actual product sizes. The significance of these results is that even with the design specification in hand the ability to subjectively size a project in terms of LOC is elusive.

The conclusion from the above discussion is that SLIM, COCOMO and the others are not sizing models but are better suited to estimating resource consumption and scheduling once a size estimate is available. Notwithstanding their contribution to our understanding of the issues, significant sizing problems remain.

Additionally, a number of models use Function Points as prime inputs: Albrecht [1979], Albrecht & Gaffney [1983], Rubin [1983, 1985], Jones [1986], Symons [1988]. This more recent approach considers larger units of software than LOC, such as screens, reports, inquiries, files and interfaces as inputs to an estimating model. Our ability to estimate these larger units of software is better than estimating LOC, as the information necessary to measure them is available during the design phase of development.
However, ideally, what is sought are properties of an information system which are measurable during analysis, and which are found to causally affect the amount of effort and code required to build the system. This thesis addresses the problem of obtaining size estimates based on system requirements, i.e. estimates based on inputs to the development process.

The various sizing approaches may be placed on a simplified version of the Waterfall model [Boehm 1981] of system development. It can be seen from this placement when information is available to make an estimate. As can be seen from Figure 2.1 there is a paucity of sizing strategies at the analysis phase. The notable exception is DeMarco [1982]. He has developed a "Bang" metric to estimate system effort. DeMarco's "p-counts" (system primitives) include 12 different ways of counting system properties which are indicators of system complexity. However, in his own words:

"You might reason, as I originally did, that all work in a project is work spent implementing one of the things counted by the various p-counts. This theory implies that you ought to base your function metric on all of the p-counts, with each one weighted by its unique factor. I have never had much success with this approach; it is statistically intractable and some of the counts overlap and measure redundantly. A simpler and more productive way to characterize Bang is to choose one of the counts as a principal indicator..." (pg. 83, bold added).

Figure 2.1: Simplified Waterfall Model. Sizing approaches shown: Bang Metric [DeMarco]; Function Points [Albrecht, Jones, Rubin, Symons]; LOC [Boehm, Bailey, Putnam, Wolverton].

Jones [1986], the leading proponent for the use of function points, reflects on this unresolved issue:

"As of 1985, software engineering is slowly emerging from the dark fog of lines of code measurements to explore new methods and new concepts of measurement" ... "...new software measures are starting to appear that come to grips with functions, with structural complexity, and with data complexity. Although these methods were developed independently by different researchers, new synergistic hybrid measures are starting to be explored, with significant potential values" (pp. 81-82).

One purpose of this thesis is to find the simpler, more parsimonious method that Jones and DeMarco suggested. As a step towards identifying which principal indicators to use, DeMarco differentiated systems on two axes: scientific to business processing and function strong to data strong (Figure 2.2). Much of business data processing can be characterized as data strong. This thesis takes the position that requirements complexity of these business applications can be captured in terms of data measurements. The development of Requirements metrics is critical if estimating is to advance on a scientific footing.

Figure 2.2: De Marco's Project Classification. Labels shown: Scientific, Commercial; Data Strong, Hybrid, Function Strong; All Projects.

2.6. SUMMARY: THE MISSING LINK

While our assumptions about the distribution of effort over a system's life and the effect of development methods and tools on productivity are important, they are secondary to the central measurement question. The critical shortfall of existing software metrics is that they are measures of the product of development. As such, their usefulness as estimating metrics is limited. If metrics of the product of development are used, semantic information about the requirements is lost as these product metrics are just shadows of the underlying reality.

The essential problem is this: If what we really want to do is estimate effort, then we must measure those things that cause effort. By using a metric of the end product as a basis for determining effort, we immediately introduce error in our estimate of effort. This is because our measure of the end product is itself an estimate complete with its own error margin. Therefore, to minimize this error we must estimate effort not on another estimate, but on an actual measure at the time the estimate of effort is made.

An example will clarify this concept: If an estimate of effort is made based on an estimate of Lines of Code, then there are two sources of error between an estimate of effort and eventual actual effort. The first is the error margin between the estimate of LOC and the actual LOC, i.e. the amount by which the subjective estimate of LOC is different than the eventual actual LOC.
The second source of error is introduced when LOC is an imperfect predictor of effort, i.e. when LOC is not perfectly correlated with effort.

To remove this inherent source of error in estimating, it is necessary to move from techniques and rules of thumb developed in practice to a more substantial view of the basic sources of effort in system development. This will occur by constructing estimates based on the real world entities, relationships and events that an information system models. The next chapter develops the general theory for this approach.

CHAPTER 3. THEORETICAL FRAMEWORK

The purpose of this chapter is to develop a theoretical framework of system requirements size. To achieve this, several major concepts are developed beginning with a discussion of the basic inputs to the development process. The chapter then develops the key information system concepts of isomorphic transformational properties from requirements through to implementation, methods of requirements modelling, and an overview of requirements complexity and requirements sizing. Next, concepts of processing complexity and how they relate to requirements are introduced. Finally, these concepts are synthesized into a more formal statement of requirements size.

3.1. FRAMING THE ISSUES

Building information systems is a production process. The prime inputs into any production process are labour (effort), technology and capital. The output from the system development process is a working information system.
The resulting system can be measured in many ways, the easiest of which is to simply count the amount of software produced using some meaningful metric such as lines of code. Alternatively, user satisfaction or system reliability are increasingly common ways of evaluating systems. Although measures of the end product are clearly important, they are not the focus of this thesis. Instead, we will turn our attention to the inputs of the system development process.

The estimating problem is, inter alia, to determine the amount of effort¹² that is required to produce the working system. While the production model is useful in understanding the building process, its usefulness in estimating is limited because the effort input is the independent variable. What we are after is a model which considers effort as the dependent variable. The central question is: What does the amount of effort depend on?

¹² Effort is used in place of labour as effort is considered to be "professional labour".

A model to help structure this question has been synthesized from the literature and appears in Figure 3.1. The justification for this model comes from the analysis of Table 2.2 in Chapter 2 and from Wrigley & Dexter [1987]: The numbers in the columns of Table 2.2 represent the effects of each factor on system development productivity. What is important to note, however, is the direction of the effects.
For all of the factors under "System Requirements", an increase in the factor is associated with a decrease in productivity and hence an increase in effort. For all of the factors under "Personnel Experience", an increase in the factor is associated with an increase in productivity and hence a decrease in effort. For all the factors under "Development Technology", an increase in the factor is associated with an increase in productivity and hence a decrease in effort. Moreover, since these three constructs are temporally antecedent to the development process this model suggests that these constructs causally affect system development effort.

Figure 3.1: Requirements, Personnel and Tools Model

The face validity of this model motivates three assumptions which this thesis accepts as unproven premises, namely:

1. Holding personnel and technology constant, as the size of system requirements increases, required effort will also increase.
2. For a given set of requirements and level of technology, as personnel skill and experience (of both users and developers) increases, required effort decreases.
3. For a given set of requirements and level of personnel experience, as the use of development methods and tools advances, required effort will decrease.

It may be argued that some interaction effects will exist among the independent variables. For example, the more difficult or larger systems may be developed by senior personnel. While this is most likely the case, it is presumed that any interaction effects are small in comparison to the main effects.
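The three premises fix only the direction of each effect. The following Python sketch is one way to make them concrete. It assumes a purely additive form with no interaction terms, in keeping with the presumption that interactions are small. The linear form, the function name and every coefficient value are hypothetical and are not estimated from any data in this thesis.

```python
def required_effort(req_size, personnel, technology,
                    b0=10.0, b_req=0.5, b_pers=2.0, b_tech=3.0):
    """Hypothetical main-effects model of development effort.

    Only the signs of the three effects come from the premises; the
    linear form and all coefficient values are assumptions.
    """
    return b0 + b_req * req_size - b_pers * personnel - b_tech * technology


base = required_effort(req_size=100, personnel=5, technology=4)
larger_requirements = required_effort(120, 5, 4)  # premise 1: effort rises
more_experience = required_effort(100, 6, 4)      # premise 2: effort falls
better_tools = required_effort(100, 5, 5)         # premise 3: effort falls
```

Any other functional form with the same signs would satisfy the premises equally well; the choice here is made only for simplicity.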
The construct which is hypothesized as the principal independent source of effort is system requirements. Clearly this construct is quite complex, requiring further definition if it is to be useful. Therefore the remainder of this Chapter will focus on its operationalization and measurement. For estimating purposes, how can we measure system requirements? Before we start to answer this question it is first necessary to understand the process of building information systems and exactly what it is we are building. I will start by describing a few key concepts.

3.2. KEY INFORMATION SYSTEM CONCEPTS

3.2.1. System Development Transformations

It is generally accepted that building any machine artifact consists of a series of transformations which start from some conceptual reality in human thought, through various phases, into a working system. The demarcations between phases are not always clear but nevertheless must reflect an identifiable stable point in the process. For our purposes the stable points in the transformations from conceptual reality to working computer systems are defined as 1) requirements definition, 2) logical representation, and 3) machine implementable code. The corresponding transformations are analysis, design and coding. The final transformation, from machine implementable code to the working system, is done by machines via compilers, linkers, translators, etc. This simplified development model is portrayed in Figure 3.2 (see also Wand and Weber [1987, 1988]).
For estimating purposes, what is important, and is a central premise of this thesis, is that there exist properties of a system's requirements that remain invariant over the transformations necessary to bring about the working system. Moreover these properties are measurable. If the working system accurately reflects its requirements, then the transformations have maintained a basic isomorphism. The property of maintaining the structural form of a system requirement through to the working system is referred to as an isomorphic transformation¹³. This premise of isomorphic transformation is a cornerstone in the theoretical framework of this chapter and will be expanded in a later section under Requirements Transformations. Before doing so, the next two sections introduce requirements modelling concepts and then define an I.S. model.

¹³ For a more general articulation of this concept see Ashby [1956].

Figure 3.2: System Development Transformations. (1) Analysis: Entities-Relationships, Events; (2) Design: Logical Data, Process Representation; (3) Coding: Source Code; (4) Translators: Machine Implementation.

3.2.2. Requirements Modelling

As described in Chapter 2, two separate but parallel schools of thought exist with respect to system analysis: 1) the Data Structure or Data Model approach and 2) the Data Flow approach. While both approaches are extremely useful for purposes of analysis and design, this thesis takes the position that for estimating purposes an integration of both is required.
The central reason for this is that both a system's statics (data structure) and dynamics (events which generate data flow) must be captured if a measure of requirements size is desired. It is the combination of statics and dynamics which drives the transformations (processes) in a working information system.

The most widely accepted data model is the Entity-Relationship (E-R) model [Chen 1976], also called the Entity Relationship Attribute model (E-R-A). Various extensions and interpretations of this model appear in the literature. For the purposes of this thesis the terminology chosen is stated by Atzeni et al. [1983]:

    Three different classes of objects exist in the model: entities, relationships and attributes. Each object is represented and identified by an object name (entity name, relationship name, attribute name); moreover it may have associated both a set of synonyms and an explicative text in natural language.

    Entities represent those classes of objects in the real world involved in the application. An elementary object within a class will be referred to as an occurrence (or instance) of an entity.

    Relationships represent classes of logical associations between entities. An element of one of these classes will be referred to as an occurrence of a relationship. We use the term "entity" ("relationship") as a synonym of the term "entity set" ("relationship set"). The model can describe the cardinality (type) of each relationship, i.e. it can distinguish between 1:1, 1:n, m:n (binary) relationships. ...

    Attributes represent properties of entities or relationships; an attribute is a mathematical function: the domain is the set of occurrences of an entity or a relationship and the codomain is a set of values ... The set of attributes of an entity that uniquely identifies its occurrences is the key of the entity. (pp. 379-380)

The data model approach, however, is not without its limitations. The first, and less critical shortfall, lies in the modelling of entity-relationship connectivity with the use of mapping ratios. Many combinations of set mappings exist in reality other than the three mentioned above. Further, there is the implicit assumption that entity sets are connected to other entity sets via a single attribute. For estimating requirements size the simplistic mappings of 1:1, 1:n, n:m lose much information as to the real underlying entity linkages. This observation will become apparent when we try to predict processing complexity.

The second, and far more serious situation, lies in the modelling of environmental changes, i.e. events. While a data model may capture the essential static complexity of a system, it fails to capture the dynamic aspects of a system, i.e. how the system responds to real world events. Since software processes are the dynamics of a system, the E-R model has a fundamental weakness in its pure form. It is, however, possible to describe events in terms of the attributes of the entities which are involved in the event. This concept will be discussed in greater detail shortly.
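The terminology quoted above can be made concrete with a small Python sketch. The class and field names are illustrative assumptions rather than part of the Atzeni et al. definition, and the employee and department data are hypothetical. The final statement illustrates the idea of describing an event through the attributes of the entity involved.

```python
from dataclasses import dataclass, field


@dataclass
class EntitySet:
    """An entity (entity set): a class of real-world objects."""
    name: str
    key: tuple  # names of the attributes that identify an occurrence
    occurrences: dict = field(default_factory=dict)  # key value -> attributes

    def add(self, **attributes):
        """Store one occurrence and return its key value."""
        key_value = tuple(attributes[k] for k in self.key)
        self.occurrences[key_value] = attributes
        return key_value


@dataclass
class RelationshipSet:
    """A relationship (relationship set) between two entity sets."""
    name: str
    left: EntitySet
    right: EntitySet
    cardinality: str  # "1:1", "1:n" or "m:n"
    occurrences: set = field(default_factory=set)  # (left key, right key)


employee = EntitySet("employee", key=("emp_no",))
department = EntitySet("department", key=("dept_no",))
works_in = RelationshipSet("works_in", department, employee, "1:n")

e1 = employee.add(emp_no=1, name="Lee", grade=3)
d1 = department.add(dept_no=10, title="Accounts")
works_in.occurrences.add((d1, e1))

# An event described through attributes: a promotion changes one attribute.
employee.occurrences[e1]["grade"] = 4
```

Note that the sketch records the cardinality only as a label, which mirrors the limitation discussed above: the mapping ratio alone says little about the underlying entity linkages.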
When the E-R model extends to include events it becomes the Entity-Relationship-Event (E-R-E) approach to capturing both the statics and dynamics of requirements. With these terms defined, we can now describe what an information system is.

3.2.3. A Dynamic Information System Model

An information system is a representation of the attributes of entities and the inter-relationships of entities that exist in the real world. The state of these attributes and object relations must be maintained if the IS is to preserve its "faithful representation"¹⁴ of the statics and dynamics of the real world. The more attributes an object has and the more these attributes can vary the more complex it is. When the attributes of an object change, this change can be considered as an event to the information system. These attributes, or, to keep later terminology consistent, data elements, must be originally specified to the system and then kept current as events occur in the environment.

Entities also have relationships with other entities. The knowledge of these relationships must also be specified to the information system in some way. When the relationship among objects changes in the real world, this can also be considered an event to the information system. These events must also be recorded in the information system.

A data element is essentially a representation of some real world state, or the cumulation of events in the real world: hence, a data element is a state or event tracker. Events and entities in the real world must be reflected in one or more data elements. "Code" is required to maintain these variables to ensure they parallel their counterparts in the real world. Code is needed to locate, retrieve, change, delete or add to instances or sets of instances of these data elements. The dependency relationships among variables introduce the need for additional logic. Retrieval requirements, or views of the data, also introduce the need for processing logic. If this description of an information system is acceptable then it is important to incorporate the concept of transformations developed earlier to understand how the size of requirements ends up as complexity in processing.

¹⁴ The system theoretic approach as articulated by Ashby [1956], von Bertalanffy [1968], Bunge [1977], among others and the more recent IS model of events, states and laws being developed by Wand and Weber [1987], turns out to be a useful vehicle for describing requirements size.

3.2.4. Requirements Transformation

Restated, a central premise of this thesis is that the complexity of the software remains isomorphic to the complexity of its requirements. Specifically, statements in the requirements definition about events, entities and relationships that are to be modelled in the information system will be identifiable somehow in the machine implementable code.
As an example, consider the English phrase, "customer buys part" (C-B-P), that would likely exist in the requirements statement of an online inventory system. It is basically a description of an event that is of interest to the system. We parse the sentence into subject and predicate, yielding two nouns, customer and part, and one verb, buys. Interestingly enough this phrase can also be described in terms of entities, relationships and events. The two entities, customer and part, have a relationship formed between them by the buying action. The entire sequence, C-B-P, is considered as the event. Here it can be seen how the E-R model could be extended to capture system dynamics by including events. In this light a working definition of an event is:

    An event to the system is that which creates or destroys a relationship between two or more existing entity occurrences. An event also may create, destroy, or change one or more attributes of an entity occurrence.

For example, in the above system we may wish to add a customer entity occurrence.
In addition, if we assume that the system is well designed, customers should be represented and identifiable as unique entity occurrences within the system. Likewise, parts will be represented as entity occurrences. If the isomorphism is to be preserved from the requirements definition to the machine implementation, then somewhere in the software we would expect to find the C-B-P event fully articulated in the form of variables and processing. From a machine perspective (assuming current processor architecture) transformations can be expressed in the form: operand-operator-operand. It is conceivable that our C-B-P event could eventually be represented in this manner. The complete transformation from the original English phrase to machine form is shown in Table 3.1, where the items in the three righthand columns form the inputs into the system development phases shown in the lefthand column.

Table 3.1 Event: Customer Buys Part

Requirements      Noun      Verb           Noun
Design            Entity    Relationship   Entity
Code              Data      Process        Data
Implementation    Operand   Operator       Operand

If requirements specifications contained only simple noun-verb phrases, the life of system developers would be easy. Unfortunately, the real world places restrictions on who may buy and on how many parts can be bought at any one point. This introduces the concept of control, which must accompany the C-B-P description.
The controls placed on events usually take the conjunctive "if" form in requirements. The "if" has a similar effect as a verb in that a relationship is defined between entities. This is equivalent to making the state of one variable dependent on another. Complex controls will result in complex logic, e.g. validity checks, in software. The disturbance to the system, namely the C-B-P event, will require a series of transformations by the software to ensure that all checks and balances are carried out, i.e. to return the system to an equilibrium state if possible.

Other examples of language forms which we would expect to have isomorphic transformational properties are:

1. Has: implies ownership or a relationship between two entity sets.
2. Is a: an entity occurrence is a member of an entity set.
3. Adjectives: attributes of entities.
4. Adverbs: attributes of relationships.

While it is beyond the scope of this thesis to even attempt to decompose the English language into its functional primitives, it is a central premise that, if done so, the structure provided would serve well as a requirements analysis tool. The basic structure of these kinds of phrases should remain isomorphic through the several transformations necessary to go from requirements to implementation.
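To make the C-B-P transformation concrete, the following minimal Python sketch labels the three parts of the event with the vocabulary of each phase in Table 3.1 and then applies the event subject to two "if" controls. The names `Event`, `trace` and `buy`, the stock dictionary and the two controls themselves are illustrative assumptions, not part of the thesis.

```python
from dataclasses import dataclass

# Phase vocabularies copied from Table 3.1.
PHASES = {
    "Requirements": ("Noun", "Verb", "Noun"),
    "Design": ("Entity", "Relationship", "Entity"),
    "Code": ("Data", "Process", "Data"),
    "Implementation": ("Operand", "Operator", "Operand"),
}


@dataclass(frozen=True)
class Event:
    """A simple noun-verb-noun event such as 'customer buys part'."""
    subject: str
    verb: str
    obj: str


def trace(event):
    """Label the three parts of an event with each phase's vocabulary."""
    parts = (event.subject, event.verb, event.obj)
    return {phase: list(zip(roles, parts)) for phase, roles in PHASES.items()}


def buy(stock, credit_ok, part, quantity):
    """Apply the C-B-P event under two hypothetical 'if' controls.

    Returns True when the event is accepted and the stock level updated.
    """
    if not credit_ok:                   # assumed control on who may buy
        return False
    if stock.get(part, 0) < quantity:   # assumed control on how many parts
        return False
    stock[part] -= quantity             # operand - operator - operand
    return True
```

Calling `trace(Event("customer", "buys", "part"))` returns one labelled triple per phase, so the same three parts can be followed from requirements to implementation. Each added control in `buy` adds one branch of logic, which is the sense in which complex controls lead to complex logic.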
Support for this basic position can be found in Halstead [1977], who found significant regularity in the structure of English prose and machine language. Therefore, the canonical form of a requirements definition, i.e. stripped of redundant prose, should provide a measure of the overall size of the information system to be implemented in software. For now it is sufficient to accept that the logical structure of the requirements, if measured correctly, can be used as an early measure of system size.

3.2.5. Requirements Complexity

Requirements complexity can be considered along a number of dimensions. With respect to an information system, complexity may be measured as 1) the variety in the classes of things that must be dealt with, 2) the level of interaction among its parts, 3) the uncertainty of event occurrences, and 4) the timing of system responses [Kottemann and Konsynski 1984]. While this list is not meant to be exhaustive, it provides a basis for understanding the nature of complexity. In many senses complexity can be defined as anything which causes confusion in human beings or induces mental load.
If a complex phenomenon is well understood, it may no longer seem confusing to those who understand it but will undoubtedly induce cognitive strain on those who don't. For our purposes though, a complex phenomenon requires effort to understand it, define it, and give it structure.

Halstead [1977] made the theoretical connection between complexity and effort. He claimed that programmers undertake some search process through the operand and operator space of a language to transform a program specification into code. The number of primitive mental operations required to locate the right operand-operator sequences, divided by the number of primitive mental operations per second [Stroud 1954], gives the total time required to program a given specification. If this theory is generalizable to the analysis and design transformations, then by counting the things that are likely to induce mental load on both users and developers we should obtain a high correlation with actual effort to build the software. Moreover, this relationship will be causal.

3.2.6. Requirements Sizing

In the construction industry estimating practices are well developed.
In the case of buildings, the metric most commonly used to estimate cost is square footage. A reasonable estimate for the cost of building a house can be obtained simply by multiplying the desired size of the house, in square feet, by some quality (luxury) index per square foot. In 1988 the standard residential house can be built for around $60 per square foot. The problem we face in IS development is that there is no similar standard metric, such as square feet, which can be used early in the IS planning process. If we consider that an information system somehow maps onto the organization which it supports, then the area of the organization which is to be computerized could be used as a basis for estimating system development costs.

3.2.7. Square footage

Assume that a comprehensive E-R-E diagram can be constructed for a business firm. This diagram would represent the entire organization's knowledge or data base. It could be considered as the organization's data map. The area on this map is referred to as "square footage" [15]. Further, it would also capture the organization's linkages to its environment. These linkages, or relationships with its environment, can be used to capture the events of interest to the firm.

[15] Recognize that the E-R-E diagram is used as a vehicle, albeit imperfectly, to represent knowledge. It is hoped that the approach suggested in this thesis will be extendable to any knowledge representation scheme.
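The square-footage analogy amounts to multiplying a size by a rate per unit of size. The short Python sketch below works the house example with the $60 figure quoted above and then applies the same arithmetic to a data map. The 2,000 square foot house, the item counts, the rate of 40 labour hours per item and the choice to take "area" as a plain sum of counted items are all invented for illustration and are not figures or definitions from the thesis.

```python
def estimate_cost(size_units, cost_per_unit):
    """Estimate cost as size multiplied by a per-unit rate."""
    if size_units < 0 or cost_per_unit < 0:
        raise ValueError("size and rate must be non-negative")
    return size_units * cost_per_unit


# House example: the rate is the 1988 figure quoted in the text,
# the 2,000 square foot size is a hypothetical house.
HOUSE_RATE_1988 = 60  # dollars per square foot
house_cost = estimate_cost(2000, HOUSE_RATE_1988)

# Data map analogue: counts and rate are invented, and treating the
# "area" as the sum of counted items is an assumption of this sketch.
data_map = {"entities": 12, "relationships": 15, "events": 20}
HYPOTHETICAL_HOURS_PER_ITEM = 40
system_hours = estimate_cost(sum(data_map.values()), HYPOTHETICAL_HOURS_PER_ITEM)
```

The point of the sketch is only that an early, countable size measure turns estimating into a calibration problem: once a rate has been observed on past projects, the same multiplication applies to new ones.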
In a well designed business there should be concentrations or clusters of entities and relationships with high connectivity. These clusters should correspond to our notion of a business function or department. There should also be a simple minimal linkage between these clusters. Minimal linkage would be where a single entity in one cluster is related to a single entity in another. A simple linkage would be where the mapping ratio between the two is simple, e.g. (1,1) or (1,*).

The above description is similar to our intuitive notions of loose coupling and high internal cohesion. Hence we should be able, hypothetically at least, to draw a circle around a cluster of entities on a business' data map and have those entities represent some logical business unit. We can then think of a system development project as computerizing or implementing a small portion of the business' data map. The entities, relationships and events within the implementation circle give us an early indication of the "size" of the requirements.

3.3. PROCESSING COMPLEXITY

Initially, we will deal with the processing complexity of a relatively simple transaction processing information system. These types of systems are the best understood by all and have the best empirical data available for model validation. The objective of the discussion below is to show how one could start to measure the size of a system's requirements.
The principal assumption is that all processing requirements originate from events that occur in the system environment and from the complexity of the data structures to which those events are related.

3.3.1. Processing complexity based on E-R-E concepts

Jackson [1975], Warnier [1976], DeMarco [1982] and others have suggested that a well structured function should match the logical data structure on which the function operates. DeMarco [1982] explains:

"The reasoning behind Warnier's observation is this: All decisions in the code are based on the data processed by the code; the structure of that data - that is, the associations among its component pieces gives a strong hint as to how the code could be written, eg. if there is a repeated substructure in the arriving data there should be a loop in the code to deal with it. If there is an option in the data, i.e. a field that may or not be present, then there will have to be an IF-ELSE sequence to deal with the two situations." (pg. 107)

The extension of the above reasoning is that processing performed by a program is proportional to the complexity of the data at the program interface. If this proposition is true, then by measuring the size of the inputs and outputs to a system as a whole (based on requirements) we can approximate the size of the processing task. The theory which explains the above practitioners' observations, and which is a major premise of this thesis, can be found in Ashby's Theory of Requisite Variety [Ashby 1956].
Simply stated, a system, to remain ecologically viable, must have sufficient internal variety to be able to respond to various changes in the environment, i.e. a system must be at least as complex as its immediate environment. The key, then, is to capture the essential structural complexity of the data at the system interface.

3.3.2. Input Events

When an event occurs in the environment, it is usually presented to the information system in the form of a transaction. This transaction contains a number of attributes which are logically related to one or more entities and their attributes in the database. A first step towards the measurement of processing complexity would be to simply count the number of system entities referenced by the transaction. Depending upon the specific implementation, this processing may be more or less difficult. But the knowledge of which system entities must be referenced is contained within the overall logical data structure (E-R-E chart). The E-R-E diagram also contains basic information regarding the linkages among the entities. These linkages should provide clues as to which entities will be referenced by a transaction. The complexity of these linkages leads to potential processing complexity required to record an event and also to extract the information once stored.

3.3.3. Output Events

Recording the business events in the database is only one part of the processing required.
Presumably, decision makers will want to know about the event at some point in time and at some level of aggregation. This introduces retrieval requirements. For each entity there is at least one simple retrieval processing requirement, for example an inquiry into a customer's account status. There will most likely be more complex ones, such as relating one entity set to another. The amount of processing logic for information retrieval will depend upon the actual number of entities that the query must access.

Determining processing difficulty for retrievals presents special problems. For the same E-R-E diagram it is possible to have two sets of inquiry requirements which vary greatly in complexity. The issue centers around deciding on how much functionality to give the software. For now, it is sufficient to assume away the problem by defining which queries or reports are to be supported and which ones are not [16].

In general, however, a retrieval can be compared to an input transaction which has to access specific entities. We can consider a query request by a user as another event to the system. The only difference is that the query is an extraction request. A retrieval event will still have to access various entities in the database in order to collect the necessary data to satisfy the query. The processing complexity introduced by a query event could be counted in the same way that an update event would be counted.
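The counting rule described in Sections 3.3.2 and 3.3.3 can be sketched in a few lines of Python: an event, whether an update or a query, is sized by the number of distinct entities it references. The schema, the `entity.attribute` naming convention and the two example events below are assumptions made for illustration only.

```python
# Hypothetical logical data structure: entity name -> attribute names.
SCHEMA = {
    "customer": {"id", "name", "balance"},
    "part": {"id", "description", "on_hand"},
    "purchase": {"quantity", "date"},
}


def entities_referenced(event_attributes):
    """Return the set of entities referenced by an event.

    Each attribute is written as 'entity.attribute'; an attribute that
    is not in the schema raises KeyError.
    """
    entities = set()
    for qualified in event_attributes:
        entity, attribute = qualified.split(".", 1)
        if attribute not in SCHEMA.get(entity, set()):
            raise KeyError(f"unknown attribute: {qualified}")
        entities.add(entity)
    return entities


# An update event and a query event are counted in exactly the same way.
UPDATE_EVENT = ["customer.id", "part.id", "purchase.quantity"]
QUERY_EVENT = ["customer.id", "customer.balance"]

update_size = len(entities_referenced(UPDATE_EVENT))
query_size = len(entities_referenced(QUERY_EVENT))
```

Under these assumptions the purchase update references three entities while the account inquiry references one, which matches the intuition that the simple inquiry requires less processing logic.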
[16] An interesting point emerges from the concept of unimplemented queries. These potential queries may provide a clue to the huge maintenance costs experienced in the software industry today. "Creeping functionality" and new releases may be no more than just implementing new portions of, and providing access to, the overall E-R-E diagram that was discussed earlier.

However, a problem arises when we try to determine the effect of the system's internal data structure complexity on processing. Halstead [1977] established that there is a relationship between the number of operators plus the number of operands and program length (and hence effort required to implement a specification). If we consider that operands are actually the data that a program must operate on, then by counting all data that a program accesses, we should be able to estimate program length. The problem is that we may not know the number of data elements in a data structure that will actually be used in a program. For example, a program accessing a data structure containing 20 elements may actually use only 5 of these elements. According to Halstead, program length is a function of 5, not 20. Two programs may access the same entity during event processing yet may differ in size depending upon the actual elements referenced. It may, nevertheless, be possible to "estimate" the number of elements to be referenced by considering the "class" of program being written.
For example, in an on-line update program it is likely that data manipulation will occur at the element level. This implies that a high percentage of accessible data elements will actually be used. Alternatively, in a file interface program individual elements are less likely to be manipulated. To summarize, knowledge of the function of a program may give us insight into the kind and extent of data manipulation that may occur.

3.3.4. Functionality

There are some basic functions which we would expect an information system to provide in order to manipulate the entities and relationships in the system. The level of functionality or capabilities provided by the software can vary greatly but can at least be specified very early in the development process. Some examples of these are:

1. Entity occurrences will have to be added, deleted and have attribute values changed.
2. If an entity is added then we need to know certain attributes before the entity is allowed to exist in the data base.
3. A single or group of entity occurrences must be locatable and information provided to users of the system.
4. A listing of entities in various logical sequences should be available.

The point is that all of these processing requirements are common to all entity types. Only the specific attributes should affect the amount of processing logic to be specified. This kind of processing logic may be referred to as "entity maintenance". Each entity will likely require a menu screen for adds, changes, and deletes.
Within each menu item another screen will be required to handle all the attributes, along with the necessary code to ensure data validity.

The most important aspect of the E-R-E diagram is that it can be one of the first documents produced during analysis. This thesis assumes that, if calibrated correctly, E-R-E diagrams can be used as the basis for estimating further work.

3.4. E-R-E AND FUNCTION POINTS

If the E-R-E approach to estimating is to be useful it must be usable earlier in the development process than current techniques. From the above discussion we can begin to see where function point estimating fits in. For example, each entity will require a number of screens for "maintenance" purposes. However, the screens are just shadows of the underlying data complexity. It makes more sense to focus on the entities and events. By measuring entities, relationships and events we can obtain a reasonable approximation to the number of function points that the software will have to provide. The following table shows how the E-R-E is the more general case of the popular Function Point approach to estimating.

Table 3.2

Entities        ->   logical internal files, external files
Relationships   ->   logical internal files
Events          ->   input transaction types, external outputs, inquiries

It can be seen from the above table that the E-R-E approach is also more parsimonious than the Function Point approach. The reason for this is that a detailed evaluation of Albrecht's approach reveals at least two redundant measures.
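The correspondence in Table 3.2 can be written down as a small lookup structure. The Python sketch below, in which the dictionary layout and the helper names `distinct_fp_components` and `ere_sources` are illustrative assumptions, counts how many Function Point component types the three E-R-E concepts cover and shows which component is fed by more than one concept. It does not attempt to identify which of Albrecht's measures are the redundant ones.

```python
# Table 3.2 as a mapping from E-R-E concept to Function Point components.
ERE_TO_FP = {
    "entities": ["logical internal files", "external files"],
    "relationships": ["logical internal files"],
    "events": ["input transaction types", "external outputs", "inquiries"],
}


def distinct_fp_components(mapping=ERE_TO_FP):
    """List each Function Point component once, in first-seen order."""
    seen = []
    for components in mapping.values():
        for component in components:
            if component not in seen:
                seen.append(component)
    return seen


def ere_sources(component, mapping=ERE_TO_FP):
    """Return the E-R-E concepts that map onto a given component."""
    return [concept for concept, components in mapping.items()
            if component in components]


n_ere_counts = len(ERE_TO_FP)                # three E-R-E counts
n_fp_counts = len(distinct_fp_components())  # five component types
shared = ere_sources("logical internal files")
```

Three E-R-E counts stand in for five Function Point component types, and "logical internal files" receives contributions from both entities and relationships, which is one way to read the parsimony claim.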
DeMarco [1982] observed this same phenomenon in his own attempt to estimate system effort as a function of his Bang metric. His "p-counts" (system primitives) include 12 different ways of counting system properties which are indicators of system complexity. However, in his own words:

"You might reason, as I originally did, that all work in a project is work spent implementing one of the things counted by the various p-counts. This theory implies that you ought to base your function metric on all of the p-counts, with each one weighted by its unique factor. I have never had much success with this approach; *it is statistically intractable and some of the counts overlap and measure redundantly.* A simpler and more productive way to characterize Bang is to choose one of the counts as a principal indicator..." (pg. 83, bold added)

This thesis has taken entities, relationships, and events as the principal indicators of requirements size. These concepts are now integrated into a theoretical model.

3.5. FORMALIZING REQUIREMENTS SIZE

3.5.1. Events, States and Transforms

The ideal situation is where we have complete information about the events and inter-relations that exist at the system boundary.
If it is assumed that we have been able to accurately represent a system's requirements in terms of events at the system boundary, entity sets, known relationships among entity sets and entity occurrences, and controls or laws which specify allowable combinations of one or more events, then it is possible to describe the statics and dynamics of a system with the use of set theory and matrix notation (see e.g. Bunge 1977). The discussion below represents an initial approach towards rigorously specifying the concepts developed previously in this chapter.

We can define the event space of a system by a vector E which contains an element for each possible input and output event at the system boundary:

$$E = [e_1, \ldots, e_n]$$

The event vector E is comprised of two sub-vectors, $E^i$ and $E^o$, which correspond to the input events and output events respectively. Further, the $e_i$, $i = 1, \ldots, n$, are defined as vectors themselves which contain information about the specific event that has taken or is taking place. The elements in the $e_i$ are values of entity attributes involved in the event.

A second vector S is defined as the system state vector. The elements of S are defined as vectors (sets) whose elements contain the values of the entities and relationships that exist in the system's memory.

The two vectors $E^i$ and S are the inputs into the matrix T, which transforms $E^i$ and S into the new system state vector, $S'$, and possibly creates new output events at the system boundary, $E^o$. The transformation T is what we usually refer to as code. This model of an information system is shown below in Figure 3.3.
[Figure 3.3: model of an information system. Legend: $E^i$ = input event space; S = system state space; $E^o$ = output event space; $S'$ = new system state space; T = process transformation.]

If the assumption regarding isomorphic transformations holds, then we would expect that the size of the transformation T reflects the size of E and S. The task now becomes one of measuring the size of both E and S in order to predict the size of T.

From Figure 3.3 we can see that a first approximation of requirements size can be obtained by simply counting the number of events that a system either must respond to or generates, and the size of each event in terms of the number of data elements involved in each event. This is a measure of E. Additionally, S can be counted in the same way. A count of the number of entities and relationships, weighted by the size of each entity or relationship, will provide a measure of S. The combination of these two requirements measures provides a first estimate of the size of T necessary to operate a system.

While these measures may be incomplete with respect to the eventual complexity of T, they are available early in the development process. It must be kept in mind that estimating is a task carried out in the presence of incomplete information. When more information about the E's and S's is introduced it is possible to refine our initial estimate of T. Here we can see the benefit of establishing early macro measures of system requirements which can be tracked and refined as more information becomes available [17].
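The two first-approximation measures can be sketched in Python as simple counts. Here the size of an event, entity or relationship is taken to be its number of data elements, and the two measures are combined by a weighted sum with unit weights. The text does not state how the measures are to be combined, so the combining function, its weights and all of the example data are assumptions of this sketch.

```python
def size_of_E(events):
    """Measure of E: total data elements over all boundary events.

    `events` maps an event name to the list of data elements it carries.
    """
    return sum(len(elements) for elements in events.values())


def size_of_S(entities, relationships):
    """Measure of S: entities and relationships weighted by their size."""
    entity_size = sum(len(attrs) for attrs in entities.values())
    relationship_size = sum(len(attrs) for attrs in relationships.values())
    return entity_size + relationship_size


def first_estimate_of_T(e_size, s_size, w_e=1.0, w_s=1.0):
    """Hypothetical combination of the two measures (weights assumed)."""
    return w_e * e_size + w_s * s_size


# Invented example data for a tiny order-entry system.
events = {
    "customer buys part": ["customer_id", "part_id", "quantity"],
    "account inquiry": ["customer_id", "balance"],
}
entities = {
    "customer": ["customer_id", "name", "balance"],
    "part": ["part_id", "description", "on_hand"],
}
relationships = {"buys": ["customer_id", "part_id", "quantity"]}

e_size = size_of_E(events)
s_size = size_of_S(entities, relationships)
t_estimate = first_estimate_of_T(e_size, s_size)
```

In an empirical setting the weights would be calibrated against observed code size rather than fixed at one, and the estimate would be recomputed as the lists of events and entities are refined.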
[17] The importance of this from a project management perspective is that the series of decisions to proceed or not to proceed with a project must have consistent units for comparison.

To better understand the potential complexity of T it is helpful to project forward to a fully implemented information system. At system start-up S is comprised of a number of defined sets, $s_1, \ldots, s_p$. Some are empty; these will be used to track events over some time period, e.g. a transaction file. Some are given members by humans at system initiation. These initial values can be considered as master files reflecting the current state of the environment, e.g. opening balances in asset accounts, existing clients, etc. The processing of the transaction events against the master files naturally keeps the latter sets current with a system's environment.

What are important, however, are the linkages, i.e. the mappings among these sets. Consider two subsets of S, $s_1$ and $s_2$, within a system. The sets $s_1$ and $s_2$ are mutually exclusive if any event $e_i$ occurring in the system's environment requires that either $s_1$ must be modified in some way (e.g. add, change, or delete an element) or $s_2$ must be modified, but not both. No effort is required by humans (or machines) to record the event in both places. An additional requirement for mutual exclusivity is that the state of set $s_2$ in no way influences the possible changes to set $s_1$.
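The mutual exclusivity condition can be checked mechanically once we know which sets each event modifies. The Python sketch below is one possible reading of the definition: two sets are mutually exclusive if no event modifies both and neither set's state conditions updates to the other. Testing the dependency in both directions, as well as the set names and example data, are assumptions of this sketch.

```python
def mutually_exclusive(s1, s2, event_updates, dependencies=()):
    """Check whether two state sets are mutually exclusive.

    event_updates: maps an event name to the set of state sets it modifies.
    dependencies:  pairs (target, source) meaning that updates to `target`
                   depend on the state of `source`.
    """
    # Condition 1: no single event may require changes to both sets.
    for modified in event_updates.values():
        if s1 in modified and s2 in modified:
            return False
    # Condition 2: neither set's state may influence changes to the other
    # (checked symmetrically here, which is an assumption).
    return (s1, s2) not in dependencies and (s2, s1) not in dependencies


# Invented example: one transaction file and two master files.
EVENT_UPDATES = {
    "customer buys part": {"transactions", "parts_master"},
    "new client signs up": {"clients_master"},
}
# Hypothetical control: a purchase is recorded only for an existing client.
DEPENDENCIES = {("transactions", "clients_master")}
```

With this data, `transactions` and `parts_master` are not mutually exclusive because one event modifies both, while `parts_master` and `clients_master` are. The pair `transactions` and `clients_master` fails only because of the assumed conditional dependency.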
In the case of simple input events, there should be a one to one correspondence between the state change in the system's environment and a single variable update, i.e. there is a direct transformation of the input variable into storage. The trivial case is where a change in the environment is not recorded, i.e. the system has no defined response for the event. In the case of more complex events, changes to more than one set may be required. This would be the case where the event is comprised of more than one data element.

A first order approximation to the size of an individual input event can now be defined as the number of set changes in S that must be made given the event $e_i$ [18]. Additionally, a complex event may dictate that updates to one set are conditionally dependent on the state of one or more other sets. Hence, a second order upper limit to the size of an event $e_i$ can be defined as the square of the number of sets involved in the event.

[18] An issue which may be raised at this point is that a "good" design would minimize the number of set changes, possibly to one. The position of this thesis is that even with the "best" design, complex events will affect more than one set.

Given the above event-system state interaction information it is anticipated that an extremely accurate estimate of T could be generated. Unfortunately the previous two measures of complexity would not be available until well into the design phase. What we produce at the analysis phase are approximate measures of the $e_i$ and the $s_j$. These individual estimates, $\hat{e}_i$ and $\hat{s}_j$, are aggregated to form $\hat{E}^i$, $\hat{E}^o$, and $\hat{S}$. Essentially, these estimates represent data flows to and from the system processes and can be used to estimate T. This estimate of T is referred to as $\hat{T}$.

3.6. SUMMARY

This chapter started out with the proposition that requirements size is the driving force behind the effort to build software. It claims that requirements size can be estimated early in the system development life cycle by measuring a system's event space and its internal state space. It is the size of these vectors that drives the transformation process.

The conclusion that must be drawn from this chapter is that software processes are driven by the data. Without data there are no processes. While it is true that events reflect processes occurring in a system's environment, these events must be represented as data to a system. The significance of this from a research perspective is that the data can be viewed as the independent variable and the software processes as the dependent variable. Therefore, measurements of the data should predict the size of the process.

This chapter has presented an approach to estimate system requirements based on measures of system statics and dynamics. This has been achieved by linking theory of requirements modelling with theory of system development transformations. It has shown how measurements of requirements are available earlier in the development process than other measures such as Function Points and lines of code.
It has also shown how they are more parsimonious than these existing estimating approaches because they occur at a higher level of abstraction. What remains to be seen is whether these measures of system requirements size are better, more reliable predictors of effort. To address this question, it is necessary to move away from theory into the real world where it can be investigated empirically. The remainder of the thesis is dedicated to this task.

CHAPTER 4. EMPIRICAL RESEARCH DESIGN

The objective of this chapter is to move from theory into the empirical domain. The preceding chapter has established the theoretical justification for using measures of requirements size as an early measure of system size. The purpose in doing so is that transforming system requirements into a working information system is believed to be the principal driving force behind effort.

Three issues emerge from Chapter 3 which together form the purpose of the remainder of the thesis. The first is the extent to which measures of system requirements can explain design size. The second is the extent to which design measures can explain process or code size. The third is the extent to which effort can be predicted from any of these sizing metrics.

Therefore the empirical portion of this thesis consists of three distinct steps. The first step is to establish the relationship between measures of $\hat{E}$ and $\hat{S}$ available at the design phase and process size, $\hat{T}$. The unit of analysis in this step is the individual program, as programs can be considered as design decisions by developers.
The second step is then to establish the relationship between requirements size and design size. In this second step the unit of analysis moves to the system level because during analysis it is the system which is the conceptual entity of interest. The first two steps are carried out in reverse order of system development to ensure that the source code is an accurate reflection of design and that design is an accurate reflection of requirements. Finally, once this linkage is established it is then possible to introduce the resource consumption variable and relate units of requirements and code size to units of labour.

This chapter proceeds in the following manner. In the first section, a research model graphically presents the concept of requirements size causing effort at both the conceptual and empirical levels. By combining Figures 3.1, 3.2 and 3.3 from Chapter 3, this basic model expands to include the theory of requirements transformation and the corresponding sources of effort. This section also explores issues of construct validity and reliability and further defines the specific metric linkages connecting measures of requirements with measures of a working system.

The second section presents a general strategy of how the research model may be empirically tested. This strategy calls for: 1) the systematic measurement and evaluation of working information systems, achieved by reverse engineering existing systems back to their design and requirements specifications, and 2) construction of a software metrics database for model testing.
To more fully articulate the concepts of entities, relationships, events and process transformations this section includes an example system which was reverse engineered manually. The issue of reliability and computational tractability identifies the need to automate the expert code analysis process. The design, construction and implementation of such an automated tool is then presented along with the measures for input to the derived metrics database. Finally, a general regression model uses the measures identified in the example system to explain process size, $\hat{T}$.

4.1. RESEARCH MODEL

Research into software engineering, and the system development process in general, has suffered from the lack of a nomological net, i.e. the theoretical connections among a set of concepts or constructs in which to discuss empirical phenomena. Further, substantive research in software engineering has been carried out in the absence of construct validation research. Scientific knowledge of the system development process will be advanced, and our ability to estimate resource consumption improved, if and only if construct valid relationships are established between the system desired and the one defined for development, and between the design and coding effort expected and actual work hours expended. Furthermore, the relationship from our defined requirements to the amount of resources consumed in the process must then be empirically validated.
Schwab [1980] dichotomizes research into two separate phases of 1) construct validation research and 2) substantive research. Schwab defines construct validity as "...the correspondence between a construct (conceptual definition of a variable) and the operational procedure to measure that construct" (pp. 5-6). In contrast, substantive research refers to research that attempts to establish "...relationships between independent and dependent variables" (pg. 4).

One depiction of both the theoretical and empirical relationships among the independent and dependent variables of interest appears in Figure 4.1. This figure is interpreted as follows. At the conceptual level there is a causal relationship between the size of the requirements of the desired system and the amount of resource consumption necessary to implement the requirements, mitigated by two factors: kinds of tools used and the experience or skill level of the system builders. These effects are depicted as F. The system requirement size construct is further defined as including both system statics and system dynamics (S,E). In Figure 4.1 the independent variable, I, size of system requirements, causally affects the dependent variable D, which is the amount of resources consumed in the process. It is important to realize that the constructs at this level cannot be measured directly.

[Figure 4.1: Research Model. Labels: Time; Conceptual level: Size of System Requirements (S,E), F = Experience, Tools, Resource Consumption (D); Empirical level: # Entities, # Relationships, Input Events, Output Events, Work Hours (D').]

At the empirical level, the two components of I, system statics and dynamics, are operationalized as entities and relationships, and input events and output events respectively. In Figure 4.1 the measurable system requirements are labelled as I'. Similarly, the dependent variable, resource consumption, D, is operationalized as the number of work hours, D', expended to build the system. It is the covariation between our operationalization of a system's requirements, I', and the measures of effort to produce the working system, D', that is of substantive interest. Finally, two major factors affecting the amount of effort expended, level of tool use and level of personnel experience, are represented by the F' arrow. In this figure these effects have not been specifically operationalized. The model presented here is a general one which may be operationalized within specific empirical domains in future field work.

Referring to Figure 4.1 again, the scientist emphasizes the vertical relationships between both the desired and actual requirements and between the resources consumed and work hours. If measures of both constructs are valid, then the empirical relationship between the actual requirements and effort expended provides valid research. The practitioner, however, is satisfied if there is construct validity only on the vertical relationship between the measured dependent variable, i.e.
the resources consumed and the work hours. Here the relationship between measured requirements and measured effort is enough to satisfy considerations of managerial control of the development process.

Both practitioners and academics have an interest in construct validity. In this case systems developers are interested in accounting for the variance primarily in the dependent variable. In other words they are interested in predicting the amount of effort to develop a given system, i.e. their interest is in the relationship between I' and D of Figure 4.1. Thus the difference in approaches between academics and practitioners lies not in the choice of the independent and dependent variables, but rather in the amount of emphasis placed on the validity of the construct measures. While academics place equal value on the validity of the measures of both the independent and dependent constructs, the practitioner places more emphasis on predicting or estimating the size of the dependent variable, in this case the amount of resources consumed in building an information system.

4.1.1. Detailed Research Model

Figure 4.2 presents these concepts of construct validation and substantive research in a more useful context and establishes the detailed linkage between requirements size and effort. To understand this detailed linkage it is useful to review the discussion on the process and the product of system development developed in Chapter 3.
The process of systems development is conceptualized as consuming resources during: analysis of information requirements, design of specifications, and development or coding of specifications into an operational system. Each effort area takes as input some definition of the problem space, or size, and converts this definition into another representation form. The working IS is the result of transformations from each of the previous phases (see e.g. Kottemann and Konsynski 1984). These transformations take conceptual reality in the minds of users through analysis, design, coding and a final translation into machine representation via the use of compilers or interpreters. Each of the transformations, except the last one, requires human effort to understand and structure the problem space. If the principle of top-down decomposition and stepwise refinement is applied, then each of the units at one level will subsequently expand to many units at a lower level. At the analysis stage of system development, the units that are available to count consist of the real world objects that are to be modelled within the information system, as well as knowledge of the events affecting those objects that will occur at the system boundary.

[Figure 4.2: Detailed Research Model. Labels: Time; Empirical; S, E: # Entities, # Relationships, Input Events, Output Events; F, Work Hours; Processes, Files, Fields, Proj., Joins, Reports, Screens, I/O Data Elements; F, Work Hours; T: Source Lines of Code; F = Experience, Tools.]

In Figure 4.2 both the constructs and their respective empirical measures can be observed.
At the conceptual level we are interested in the size of: the system requirements, the logical design, and the working system. Construct validity between all three of the depicted vertical relationships is necessary to make scientific progress. Here, the static and dynamic properties of system requirements are operationalized as entities and relationships, and input events and output events respectively. $I_1$ and $I'_1$ correspond to I and I' in Figure 4.1. $D_1$, the output of the design phase, and $I_2$, the input to the coding phase, are equivalent, as are their operationalizations $D'_1$ and $I'_2$. In this figure $I_1$ is shown to cause $D_1$ through the intervening variable $IV_1$, effort.

The size of the logical model represents the result of the transformation from system requirements. It is theorized that the size of this logical model maintains the size and complexity of the initial requirements through the design phase. Furthermore, these properties hold through the coding phase which eventually produces the size of the working system.

At the empirical level $I_1$ is operationalized as discussed above. $D'_1$ and $I'_2$ are operationalized as the static and dynamic properties of the logical model. These measures are analogous to function points. For simplicity, $D_2$, the size of the working system, is measured as $D'_2$, source lines of code. Alternately, Halstead [1977], McCabe [1976] or other program complexity metrics could be used at this point.
The reason that the mapping between measures of requirements and measures of code is important is that the relationships among the units are traceable. This discussion started by claiming that there is a causal relationship between size of requirements and resource consumption. However these units are not functionally comparable. It is necessary first to establish the properties of requirements which cause the production of source code. Once the functional relationships between requirements and code are established, then units of requirements can be used as predictors of development effort.

4.1.2. Metric Linkages

The mapping between requirements and code is shown in Figure 4.3. While Figure 4.2 simply showed a single empirical link between the I's and D's, Figure 4.3 shows in greater detail the specific measurable linkages from requirements through to source code. Figure 4.3 depicts how the operational measures of size are functionally related. Prior research has focused primarily on measurements at the design and code level (see e.g. Conte et al. 1986), and has developed empirical relationships between these measures and the eventual code of installed systems (Albrecht 1979; Albrecht and Gaffney 1983; Rubin 1983, 1985; Jones 1986; Symons 1988). The empirical findings, discussed in Chapter 6, are the result of reverse engineering the code back to the design metrics shown in this figure. The regression equation then uses these counts to predict code size.

[Figure 4.3: Metric Linkages. Requirements size metrics (S,E): S = Statics: Entities (number, attributes), Relationships (number, attributes); E = Events: Input Events, Output Events. Design size metrics: Files, Fields, Projections, Joins; Update Process (screens, screen variables, files used); Retrieval Process (reports, report variables, files used). Working system metric: Source Lines of Code.]

In order to map backwards from function point metrics to requirement metrics it is necessary to introduce semantic information into the reverse engineering process. If we assume that system designers decompose a requirement specification into processes (programs, modules) that deal with or hide some aspect of the specification (see e.g. Parnas 1975), then it is reasonable to classify a system's programs according to the function that they perform. If we further assume that the basis for functional decomposition reflects a data flow orientation to design, typical of transaction processing systems, then we may conclude that events, identified as requirement units, would require separate processes to handle the dynamics of each event. The information necessary to determine a program's function is embedded in its source code. At the program level certain language keywords and patterns of variable usage provide clues to the function of the program. For example, a program which has a high content of screen I/O and file manipulation statements will most likely be an online transaction capture program.
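The keyword-based classification described above can be illustrated with a minimal sketch. The thesis does not name the source language or the keywords its analysis relies on, so the keyword sets, the `threshold` parameter and the class labels below are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical keyword sets: the thesis does not list the actual keywords.
SCREEN_IO = {"DISPLAY", "ACCEPT"}
FILE_UPDATE = {"WRITE", "REWRITE", "DELETE"}
REPORT_IO = {"PRINT"}


def classify_program(source: str, threshold: float = 0.2) -> str:
    """Guess a program's function from the share of statement kinds.

    Only the first token of each non-blank line is inspected, which is
    a deliberately crude stand-in for real statement parsing.
    """
    firsts = [line.split()[0].upper()
              for line in source.splitlines() if line.strip()]
    if not firsts:
        return "unknown"
    counts = Counter(firsts)
    total = len(firsts)
    screen = sum(counts[k] for k in SCREEN_IO) / total
    update = sum(counts[k] for k in FILE_UPDATE) / total
    report = sum(counts[k] for k in REPORT_IO) / total
    if screen >= threshold and update > 0:
        return "online transaction capture"
    if report >= threshold:
        return "report"
    return "other"


def class_distribution(programs: dict) -> Counter:
    """Tally programs (name -> source text) by inferred class."""
    return Counter(classify_program(src) for src in programs.values())
```

A distribution such as the one returned by `class_distribution` is the kind of summary from which counts of events could then be inferred; the thresholds would need tuning against programs whose function is already known.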
Once programs are classified correctly, semantic inferences may be made as to the number of entities, relationships and events that exist in a requirements specification by looking at the distribution of programs by program class. The proposition is that this information can be used to predict the amount of code required to implement a new requirements specification.

A limitation of the empirical portion of this research is that it deals primarily with the relationship between design and code size, i.e. at the program level. The extension to include the relationship between requirements and design is theoretical at this point and not fully operationalized. Future studies with larger data sets will be able to fully operationalize these concepts and test this theory.

4.2. SYSTEM DEVELOPMENT MEASUREMENT AND EVALUATION

Previous discussion laid out the conceptual framework for this study. Now we turn to the task of testing the model empirically. To achieve this, a general strategy for collection and analysis of field data is necessary [footnote 19]. A top level data flow diagram of system development measurement and evaluation appears in Figure 4.4. The Systems Development process bubble is seen as consuming the resources of system developers and users, producing the working systems while generating a number of Project Details such as amount of resource consumption, personnel involved, tools and methods used and other demographics.
The System Development Process bubble produces working systems as tangible outputs consisting of source code and data definitions. Two measurement process bubbles are required to produce the Derived Metrics database: the Expert Code Analyzer and the Project Detail Extraction. The Expert Code Analyzer process bubble, either automated or manual, takes as input completed working systems and generates a number of software metrics shown later in Figure 4.12. From the perspective of this process a system is defined as a collection of integrated programs and their corresponding data definitions that, together, have an identifiable function within a company. The output from this process updates the Metrics database. Project Parameters can be extracted by either an advanced project control system or, in its absence, trained research personnel.

[Figure 4.4: System Development Measurement and Evaluation. Labels: System Builders; Project Details; System Development Process; Source Code and Data Definitions; Derived Metrics Database; Metrics Analysis; Extract Project Parameters.]

Footnote 19: Several authors have identified and discussed the general need for a software metrics collection and analysis group within a system development environment, and research is underway to automate the entire process (see e.g. Basili and Rombach [1988], and DeMarco [1982]).
Finally, the Metric Analysis consists of statistical routines which provide management with information which improves their understanding and estimating capabilities within their own development environment.

The advantage of reversing the code over using existing documentation lies in the accuracy of the documents. In the field it is common for documentation to lag the software. For example, added requirements may not be reflected in the requirements document. From a theoretical perspective it is useful to think of reverse engineering as reverse transformations. Hence it is assured that the analysis and design documents are true representations, i.e. without modifications, of the final working system, subject only to the interpretations of the researcher. The reversing operations, however, are objective and reproducible. Details of the rules applied to reverse engineer the code are contained in the Code Analyzer, discussed later.

4.2.1. Calibration

Calculating regression coefficients from the reverse engineered systems is just the first step towards being able to predict resource consumption early in the development process. Once relationships have been established between requirements or design metrics and code for a given environment, it is then possible to introduce effort and other project factors into the estimation model. This is achieved by extracting resource consumption data from the project details and regressing these units against requirements, design or code units.
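As an illustration of the calibration step, the following minimal sketch fits a single-predictor least squares line relating work hours to a size measure. It is a simplification under stated assumptions: the function name `fit_line` is hypothetical, the data values are invented for the example, and an actual calibration could involve several predictors and other project factors.

```python
def fit_line(sizes, efforts):
    """Ordinary least squares for effort = a + b * size; returns (a, b)."""
    n = len(sizes)
    if n < 2 or n != len(efforts):
        raise ValueError("need at least two paired observations")
    mean_x = sum(sizes) / n
    mean_y = sum(efforts) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, efforts))
    b = sxy / sxx              # work hours per unit of size
    a = mean_y - b * mean_x    # fixed effort component
    return a, b


# Invented calibration data for one environment (not from the thesis).
sloc = [1000, 2000, 3000, 4000]
hours = [120, 220, 320, 420]

a, b = fit_line(sloc, hours)
estimate = a + b * 2500  # predicted hours for a 2500-line system
```

The fitted weights `a` and `b` describe only the environment that supplied the data, which is the sense in which the estimation model is calibrated.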
Hence, the resulting regression weights are calibrated to a specific system development environment consisting of a unique mixture of people skills and applied technology.

4.2.2. Measures of $\hat{E}$, $\hat{S}$ and $\hat{T}$ From the Code

From Figure 4.3 above it is possible to define a number of specific measures for events and system statics. These appear below. Following these is the measure of process size used and the justification for the measure.

4.2.2.1. System Dynamic Measurements: $\hat{E}$

1. Inputs: $\hat{E}^I$
   a. Input events = screens
   b. Input event size = screen variables
2. Outputs: $\hat{E}^O$
   a. Output events = reports
   b. Output event size = report variables

4.2.2.2. System Static Measurements: $\hat{S}$

1. Entities = files accessed
2. Entity size = fields available
3. Relationships = temporary files used (projections) + file cross references (joins)

4.2.2.3. Process Measurement: $\hat{T}$

An issue raised in Chapter 2 is that code may be counted in various ways. The most common method is to count text lines of source code while ignoring comment lines [see Boehm 1981]. However, this approach is not without its drawbacks. Even within the same language a line of code may vary greatly with respect to the amount of work or function performed. For example a nested "IF" statement is more difficult to code than a simple assignment statement.
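The line-counting convention just described can be sketched in a few lines. The comment marker is an assumption (`*` in the first non-blank position), since the language of the measured systems is not stated here, and the sample program is invented for the example.

```python
def count_sloc(source: str, comment_prefixes: tuple = ("*",)) -> int:
    """Count non-blank text lines that are not whole-line comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no code
        if stripped.startswith(comment_prefixes):
            continue  # whole-line comment, ignored by convention
        count += 1
    return count


# Invented sample: one comment, a nested IF, a blank line, assignments.
sample = """\
* compute discount (hypothetical example program)
IF TOTAL > 100
   IF CUSTOMER = 'A'
      DISCOUNT = 10

TOTAL = TOTAL - DISCOUNT
"""

print(count_sloc(sample))  # prints 4
```

Note that the nested "IF" lines and the assignments each contribute exactly one to the count, which is the drawback raised above: the measure is blind to differences in the work performed per line.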
Ideally, to control for these variances in individual lines, code should be broken down into its operator and operand symbols in order to obtain a more rigorous measure of program length [footnote 20]. However, without the automated code counting program this approach is not computationally feasible. Instead, the more crude but commonly accepted method of counting text lines was used as a measure of process size for the pilot data. The development and testing of a code analyzer to alleviate the counting issue is the subject of a following section.

Footnote 20: See Halstead [1977] for definitions of program length, program volume, and program information content. Also see Fitzsimmons and Love [1978] and Shen et al. [1983] for independent validations of these measures.

4.3. EXAMPLE SYSTEM

An example system will be used to describe the concepts developed above and in Chapter 3. This system has not been artificially created: it is a system which is currently in use within a company. It was not chosen because it reflected the theory well, and as such some portions of the theory may not be represented. Rather, it was chosen for no other reason than that its name starts with 'A', which placed it on top of the field sample pile. The source code from the company was obtained from its production library which contains the current versions of all its working systems.

The example below begins with a system's requirements statement and traces the system's development down to the process level.
The system's requirements contain the basic specification of system level entities, relationships and events. These are presented graphically. Then, the system's menu structure is presented using a Warnier-Orr type process hierarchy. Next, the process hierarchy is analyzed using data-flow diagrams (DFD's) which are exploded down to the lowest process level until each process bubble represents a single program. At that level it is possible to accurately count the software in various ways.

In the conclusion to Chapter 3 it was stated that the independent variable for predicting process size is its data. Therefore, data flows are counted in and out of each program. Two distinctive data flows are counted. The first represents data flows across the human-machine boundary, i.e. attribute values of events occurring at the system boundary. The second represents the flows between each process and the system's static space, i.e. between the program and system storage.

4.3.1. System Description

The Asset Disposal System (ADS) is a relatively small menu-driven transaction processing and management reporting application system. Its function is to maintain inventory details on assets which the company has decided to sell as well as to record information about their sale when eventually sold to customers. There are two separate classes of assets about which the system must maintain information: capital assets and scrap assets. The first major event of interest to the system occurs when management decides to sell an asset.
Information about the asset is then recorded into the system by a user via an online terminal. Further, decisions are not irrevocable, i.e. management can change its mind and decide to remove any asset up for sale or change any details on assets already up for sale. The second major event of interest is the sale of an asset to a purchaser. Sales events require the recording of the asset(s) involved in the sale and information about the purchaser and financial terms of the sale. Finally, a number of management reports are required which are available to users by selecting from menus presented at an online terminal. In total, 13 separate reports are requested by management. These reports range from simple listings of transactions (events) to sales and profit cross-tabulations.

The analysis of the above system requirements produces the first document, Figure 4.5, depicting the major entities, relationships and events involved in the system. Reporting requirements are not shown. In Figure 4.5 the two classes of assets are represented as the entity sets Capital Assets ASDX0001 and Scrap Assets ASDX0002 [footnote 21]. The decision to sell events are depicted as the relationship created between management and the company's assets. The decision to not sell an asset, i.e. to reverse a decision, is represented by the 1:n relationship between management and the Asset entity sets. This is equivalent to stating that an asset may be involved in more than one event.
Sales events are shown as the relationship created between a customer and one or more assets or asset classes via the entity set Sales Orders. Each sales order is made by one customer but each customer may be involved in many sales events. Each sales order may involve one or more assets from either asset class. These relationships are represented by the "Includes" diamonds. The reporting requirements consist of extracting information from the three entity sets in various combinations. The ADS system can be summarized as having three kinds of input events (two decisions and one sale), three entity sets (two asset classes and one set of sales orders), and thirteen output events: reports which convey information about all entities and relationships that exist within the system. It will be seen how each entity, relationship and event will be implemented as either a file, a data flow, a process, or user interface.

^21 For the remainder of the example a numerical or capitalized reference specifies the actual name assigned to the system object.

[Figure 4.5: Asset Disposal System E-R-E Structure]

In Figure 4.6 the process structure is shown. Each line contains a numerical reference and description of a program. The first screen a user sees is the Main Menu. From this menu the user may select one of eight alternatives. The first three programs in Figure 4.6, 1100-1130, deal with the three events identified in Figure 4.5. The remainder of the menu selections allow the user to choose which reports are to be printed out and where.
Menu item 4080 in Figure 4.6 executes four programs in sequence producing two reports without further user interaction. Menu item 1020 produces a report sub-menu from where a user can select one of nine different reports. Each report selection executes a single program which may prompt a user for further report parameters. The linkage between these programs and the system files will now be shown in the DFDs.

[Figure 4.6: Asset Disposal System Processes]

Figure 4.7 contains the top level view of the system from a data flow perspective. There are six nodes in this diagram. The first box, User, represents the immediate system environment. The second node, process bubble 1000, contains the 21 sub-processes identified in Figure 4.6 and corresponds to the transformation matrix T in Figure 3.3 from the last chapter. The line between the User and process bubble 1000 represents all data flowing across the man-machine boundary. All input and output events occur across this system boundary, which is depicted as a weighted arc between the two nodes. This arc corresponds to the E vector from Chapter 3. Three numbers define this arc. The first indicates that 3 input events occur at the boundary. The second indicates that 13 output events, i.e. reports, are generated by the system. The third, 682, is an actual variable count of this flow. It includes all screen and report variables, and no distinction is made at this point regarding direction of flow, although in general screen variables tend to be input to the system and report variables necessarily come out of the system. This value will play a key role later when estimates of process size are made. In brief, it is our first good estimate of system dynamics.

[Figure 4.7: Top Level DFD. Arc weight between User and process 1000: 3,13,682. Data stores: Capital Assets ASDX0001, Scrap Assets ASDX0002, Sales Orders ASDX0003, S.O. Print Temp ASDX9001]

The remaining nodes in Figure 4.7 are the system's data stores. These correspond to S and S' from Chapter 3. They also correspond to the three entities Capital Assets, Scrap Assets and Sales Orders. A fourth data store, the Sales Order Temporary Print file, is used by some of the report programs as a buffer and is included primarily for the sake of completeness.

When the process bubble 1000 is exploded down to its next level, Figure 4.8 emerges. Each process bubble in this diagram represents a program which is accessible from the Main Menu program. As in Figure 4.7 the weighted arcs between the User boxes and each process represent the data-flows across the man-machine boundary, although these numbers now represent "micro-events" at the program level.
For example, the "3,0,53" arc flowing into program 1100 should be interpreted as 3 screens, 0 reports, and 53 data elements. There are three on-line update programs available in the system. It is not surprising to learn that each update program corresponds exactly to each of the major events described in Figure 4.5. The added detail that this diagram provides is information regarding which data stores are accessed by each program and the size and direction of each data flow into each process.

[Figure 4.8: Asset Disposal System - Main Menu Processes. Panels: Update Programs, Report Programs]

Figure 4.9 contains the detail of the Reports Sub-Menu 1020 process. Where the User has some control over a program's execution the connecting arc has a double headed arrow. The weighted arcs across the system's boundary again represent the number of variables transferred between the system and its Users and the "micro-events" that occur. Figure 4.10 contains the processes executed from the batch report main-menu item. Finally, Figure 4.11 has been included to show a subtle but important aspect of estimating process size. Within process 4160 a temporary hold file is created to reorganize the data coming in from the other data stores. This adds additional code which is not apparent from a higher level of abstraction. The added information of temporary files comes only after significant design work has been carried out. Hence it is expected that given this information an estimate of process size would increase in accuracy.
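As an illustration only, the weighted arcs in the DFDs can be treated as triples of counts. The sketch below, written in Python rather than FOCUS, parses an arc label such as "3,0,53" and totals the data elements over a set of program-level arcs. The names `ArcWeight`, `parse_arc` and `total_data_elements` are assumptions introduced for this example, the entry for program "P2" is hypothetical, and the idea that program-level element counts may simply be summed is an assumption rather than a rule stated in the text.

```python
from typing import Dict, NamedTuple


class ArcWeight(NamedTuple):
    """Weight on a program-level DFD arc crossing the human-machine boundary."""
    screens: int        # first number on the arc
    reports: int        # second number on the arc
    data_elements: int  # third number on the arc


def parse_arc(label: str) -> ArcWeight:
    """Parse an arc label such as "3,0,53" into its three counts."""
    parts = [int(p) for p in label.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three counts, got {label!r}")
    return ArcWeight(*parts)


def total_data_elements(arcs: Dict[str, ArcWeight]) -> int:
    """Sum the data-element counts over all arcs (assumed to be additive)."""
    return sum(arc.data_elements for arc in arcs.values())


arcs = {
    "1100": parse_arc("3,0,53"),  # arc quoted in the text for program 1100
    "P2": parse_arc("0,1,20"),    # hypothetical report program, illustration only
}
total = total_data_elements(arcs)
```

Only the third count is summed here, since the first two counts denote events at the system level but screens and reports at the program level.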
This gain in accuracy is indicative of the iterative nature of estimate refinement.

[Figure 4.10: Batch Report Run 4080]

[Figure 4.11: Asset Status Report 4160]

4.4. AUTOMATING THE EXPERT CODE ANALYSER

In order to establish unbiased, reliable measurements from the systems, it is necessary to solve the problem of the relative expertise among the measurers. It is believed that the problem of measurement reliability is solvable by the use of an automated code analyzer which can be used to reverse engineer installed systems.

4.4.1. Reliability

While Figures 4.1 and 4.2 have provided an overall framework for research, the issue of measurement reliability is central to the estimating problem. Reliability refers to the extent that various measures of code, or design function points, are consistently measured from one researcher or analyst to the next and from one system to the next. Problems with function point analysis were succinctly reviewed and articulated by Symons (1988), who indicated that the very process of measurement was fraught with difficulties.

"The method is not as easy to apply in practice as it first appears ... For some time to come therefore it will be best in any one organization if all measurements are supervised by one objective, experienced function point analyst. Such an analyst should accumulate and document cases and derive general rules ... which will help ensure consistency and objectivity in function point analysis." (pp. 9, 10)

Briefly, two major criticisms can be leveled at function point analysis:

1. It is unlikely that the function point weights and complexity adjustments are generalizable outside of a particular project data set. No theory has yet been proposed as to why a particular complexity scale is appropriate. This leads to the conclusion that all scales used in function point analysis are derived from empirical relationships found in a specific project database.
2. Assessing the complexity level of each major function type and determining overall processing complexity adjustments is subjective and may vary substantially from one analyst to the next.

To overcome these shortcomings one alternative is to first measure each individual system development environment to ascertain specific definitions of low, medium and high complexity. This may be achieved by obtaining a distribution of, for example, the number of data elements on each input screen. Low complexity could then be defined as the bottom third of the distribution, while high complexity may be defined as the top third. However, with the use of automated methods it is unnecessary to categorize function types as low, medium or high, as the continuous regression coefficient can be used directly. This would remove some of the inherent subjectivity involved with complexity classification.

4.4.2. The Automated Code Analyser

It became apparent very early in this research project that manual analysis of completed systems was time consuming, tedious and prone to researcher error.
This motivated the design and development of an automated tool. It differs significantly from traditional code counters, which are limited to counting lines of code and number of tokens. Essentially, the Code Analyser is an implementation of the concepts developed in this and other chapters plus an implementation of Software Science theory [Halstead 1977] in a Fourth Generation language. The tool is written in FOCUS and hence is potentially portable to over 5000 MIS development environments in North America ^22.

4.4.2.1. Tool Development

Prior to construction, 75 FOCUS programs were counted manually using the reverse engineering approach discussed in the previous example ^23. Rules were established to identify each program's main function, identify major sections of code, count variables and classify variable usage. These rules were later implemented, and modified by experience, in the parsing and aggregation routines of the Analyser.

^22 See Misra and Jalics [1988] for a discussion on the difficulties in using non-procedural languages to solve procedural problems.
^23 The counting was performed by the author.

The construction of the Analyser began in early July 1988 and was iteratively tested, first against simple programs and then more complex code. By the beginning of August the Analyser had sufficient knowledge of FOCUS code to be run against the original 75 pilot programs. This revealed three sources of differences between the manual and automated measures:

1. bugs in the Analyser itself
2. manual counting errors in the pilot data
3. code understanding and interpretation differences

The first source of error was dealt with easily. In the second case, the Analyser did a better job than the original manual counting. The third source of difference reflects a change in the knowledge and understanding of the FOCUS language by this researcher. This was brought about by building an Analyser written in the same language as the object of analysis. Hence, the manual vs automated results differ slightly as a result of different rules being applied in slightly different circumstances. As it turns out these changes are an improvement over the manual rules. In short, the manual process overlooked some key features. This process is discussed separately under Pilot Validation in Chapter 5.

4.4.2.2. Tool Description

The Code Analyser was implemented in PC/FOCUS 3.0 on an AT class micro-computer. Source code and data definitions for each completed system are loaded from diskette or imported over a communications link into separate sub-directories under each firm's name. The tool generates a list of systems for each company and corresponding lists of FOCEXECS and MASTER FILE DEFINITIONS to be analyzed. These lists are then processed by the parsing routines which update the SYSTEMS database. For each program the main parser is invoked. The Code Analyser scans each program's source code and produces the following outputs (see Figure 4.12):

1. Program classification
2. Program length: text lines (excluding comments)
3. Halstead's software science metrics:
   a. Number of unique operands: $n_1$
   b. Total number of operands: $N_1$
   c. Number of unique operators: $n_2$
   d. Total number of operators: $N_2$
   e. Program length: $N = N_1 + N_2$
   f. Program vocabulary: $n = n_1 + n_2$
   g. Program volume: $V = N \log_2 n$
4. McCabe's cyclomatic complexity metric
5. Number of screen images and number of input and output data elements on each screen
6. Number of reports and output variables
7. Number of files accessed
8. Number of projections and joins performed on tables
9. Distribution of language keyword usage

[Figure 4.12: Outputs from Code Analyzer. Input: source code and data definitions from working systems. Outputs: metrics to predict new system development (Function Points, Bang Metrics, Requirements Size) and metrics to predict maintenance (Halstead, McCabe, others)]

4.4.2.3. Parsing Strategy

Each line of text is read in sequentially by the text parser. Blank lines and comments are omitted from further analysis, although the comment lines are later scanned to pick up programmer and date information. Depending on what the line of code contains, various positional logic switches are turned on or off. The initial parse uses blank as a delimiter. This generates a token which may be anywhere from 1 to 80 characters long and may contain one or more embedded tokens. This token is used in a table lookup of FOCUS keywords. If found then the token is an operator. Otherwise, if the token has the properties of a well formed operand then it is an operand.
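The first two parsing rules and the software science measures listed in Section 4.4.2.2 can be sketched as follows. This is a minimal illustration in Python, not the Analyser's FOCUS implementation: the keyword set is a small illustrative subset, the file and field names ASSETS, COST and CLASS are hypothetical, every non-keyword token is assumed to be a well formed operand, and the decomposition of complex tokens is omitted. Subscripts follow the convention used above, with 1 denoting operands and 2 denoting operators.

```python
import math
from collections import Counter

# Illustrative subset of FOCUS keywords (assumption); the Analyser's actual
# lookup table is not reproduced in the text.
KEYWORDS = {"TABLE", "FILE", "PRINT", "BY", "END"}


def classify(tokens):
    """Apply the keyword lookup: keywords are operators, the rest operands."""
    operators, operands = Counter(), Counter()
    for tok in tokens:
        if tok in KEYWORDS:
            operators[tok] += 1
        else:
            # Assumption: every non-keyword token is a well formed operand.
            operands[tok] += 1
    return operators, operands


def halstead(operators, operands):
    """Compute length N, vocabulary n and volume V from token counts."""
    n1, N1 = len(operands), sum(operands.values())    # unique / total operands
    n2, N2 = len(operators), sum(operators.values())  # unique / total operators
    N = N1 + N2
    n = n1 + n2
    V = N * math.log2(n) if n > 0 else 0.0
    return {"n1": n1, "N1": N1, "n2": n2, "N2": N2, "N": N, "n": n, "V": V}


# Hypothetical one-line report request, split on blanks as in the initial parse.
source = "TABLE FILE ASSETS PRINT COST BY CLASS END"
ops, opnds = classify(source.split())
metrics = halstead(ops, opnds)
```

For the sample request the sketch yields N = 8, n = 8 and V = 24.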
If both these two rules fail then further decomposition is performed on the token. A series of rules are applied to the complex token to break it down to a recognizable item. When a new chunk is parsed out, it becomes the current token and is passed to the top of the rules stack. New rules are easily incorporated into the structure with only a linear increase in processing time. An error trap routine waits at the bottom of the rule hierarchy to catch unidentified tokens. This routine is triggered when none of the preceding rules fire. It traps the tokens and writes them out to a log file which can then be looked at by a human after processing ^24. The algorithm is recursive so that any complex token will eventually be decomposed into a simple interpretable one.

Even after a simple token is found its interpretation depends on its position relative to preceding code. This complexity identifies a current flaw in the Analyser. The parser makes only one pass through the code. It can only determine or interpret a token in the context of code that precedes the token. The Analyser does not look forward in the code and hence more subtle interpretations are not possible. For example, a variable appearing in a TABLE request can either be classified as being involved in the production of a report or actually appearing in a report. The information necessary to make this distinction does not appear in the code until the end of the report program. Here, backwards classification would be necessary, a capability that the current version does not possess ^25.

^24 This technique proved useful in debugging the Analyser as new or special cases were uncovered when new programs were analyzed. It also revealed that some programs had been scrambled during transmission upload or download. These programs were deleted from the study.

4.4.2.4. Program classification

The top level structure of the program classification follows standard data processing form. The subordinate classes reflect variations on this theme:

1. Control programs
   a. Menus
   b. Job control
   c. O/S interface
   d. Routing
2. Input programs
   a. Online interactive update
   b. Online interactive update with report
   c. Batch file update
   d. Batch file update with report
3. Output programs
   a. Batch reports
   b. Reports with user selection
   c. File output

4.4.2.5. Token counting rules

Tokens fall into two main categories: operators and operands. Tokens identified as being part of the FOCUS language or built-in procedures are operators. All remaining tokens fall into the category of data and the Code Analyser treats them as operands. The Analyser classifies each operand into one of 23 alternatives depending on its content and context within the program. The operand coding scheme appears in Appendix A.

^25 This is one instance where the automated Code Analyser would differ from manual analysis.

4.4.2.6. Known limitations

1. The INCLUDE statement: The INCLUDE statement is a compile time instruction which brings in chunks of source code into programs.
   At present the Analyser does not bring these code chunks in for analysis in the context of the calling program but considers them as separate programs. When these code chunks are small re-useable routines such as input edit checks or printer setup, moderate usage has negligible effect on overall counting. However, when these code chunks are substantial in size, contain important file accessing information, and are used extensively, the situation is more problematic. Programs may be classified incorrectly, or their lengths may be underestimated and design characteristics may be hidden. The issue is best understood by considering that at one level the program is the unit of analysis. Operationally, a program is counted as a file which contains a number of source instructions. If a program is physically distributed over several files then the unit of analysis breaks down. The same situation would arise at the system level if all programs were not present for analysis.
2. Compiled MODIFY: At present a problem exists in finding all the JOIN structures used by update programs that have been compiled. FOCUS requires that compiled MODIFY statements be run from a driver program which contains all the file access and join information. The problem is similar to the one above where two separate programs, a small driver and a larger body, are conceptually the same function. Several solutions exist to combine programs such as these but none have yet been implemented.
3. Indirect file reference: If a file being accessed does not match a known master file definition name then this master's information, clearly, cannot be attached to the program. This situation may arise if complex indirect file referencing is used, such as global macro text substitution during run time.

The limitations identified above all revolve around the inability to logically collect related pieces of code at the program level while avoiding double counting. Several solutions are on the drawing boards to dynamically attach logically related chunks of code but these are being left for future development.

4.4.3. Systems Database

The entities and relationships in the Systems Database appear in Figure 4.13, with further detail in Appendix C. The internal structure of the SYSTEMS database is basically hierarchical with a company as the root instance. Below company are systems, which are parents of programs and data definitions. Data definitions are cross-referenced to programs when encountered in the code. Below programs are lines of code which are the parents of operators and operands. There is a network type cross linkage to the Projects file. The Projects file contains project details regarding resource consumption, tasks performed, dates, and other project demographics.

[Figure 4.13: Systems Database. Entities: Company, Projects, Resources, Systems, Programs, Masters, LOC, Operators, Operands]

4.4.4. Metrics Analysis

From the operationalizations provided previously the regression model to predict process size at the program level is:

$\hat{T} = fn(\text{Screens, Input Variables, Output Variables, Master files accessed, Fields accessed})$

At the system level the model becomes:

$\hat{T} = fn(\text{Input events, Output events, Master files, Segments within master files, Total number of fields})$

These models are tested against sample data described in Chapters 5 and 6.

4.5. HYPOTHESES

As discussed in this and previous chapters, the construct assumed to be the driving force behind effort is the size of the requirements. Clearly, other factors will have their effects during the process of development but these will have their effect on the base complexity. If, in fact, system requirements and design are the most important constructs to measure then it follows that:

1. The variance in requirements size and design size will explain a significant percentage of the variance in development effort measured at the system level.
2. Variance in $\hat{E}'$ and $\hat{S}'$ will explain a significant percentage of the variance in $\hat{T}'$ at both the program and system level.

Additionally, due to the iterative nature of system development and estimating it is expected that:

3. As more information about $\hat{E}'$ and $\hat{S}'$ is obtained an increase in the accuracy of $\hat{T}'$ will be achieved.

4.6. SUMMARY

The research design consisted of three steps. The first was to establish a linkage between measures of design size and code size at the program level.
The second step was to aggregate these parameters to the system level to link requirements size to design size. The third step required that a given system development environment have all its developed systems reverse engineered and then calibrated against resource consumption detail. This is to account for the wide differences amongst people skills, system or application type, and tool usage. In the next chapter the specific research environment is described where this is performed.

CHAPTER 5. THE EMPIRICAL STUDY

The purpose of this Chapter is to describe the field setting and field methods used in an initial test of the research model. This initial test focused on software systems written in a single language, within two development environments, and within the general class of "business data processing" as identified by DeMarco [1982]. While this restriction will limit the generalizability of the findings to software systems developed in similar circumstances, there are a large number of development environments with similar characteristics in which to extend the research. Chapter 7 develops the extensions from this initial test.

This chapter first discusses the field setting and the data available for analysis. Second, it presents the results of pilot data analysis and validation against the Code Analyser. Third, it describes the data collection process and addresses a number of methodological issues.

Working in the field requires a great deal of patience and the ability to work as unobtrusively as possible.
After all, any company agreeing to take part in a study is not guaranteed to receive any direct benefit. The researcher's presence consumes company resources, whether it is a single 30 second question or an explicit request for a time slot. Most critically, when resources become scarce, i.e. things get busy, research becomes secondary to the demands of the environment.

5.1. DATA SITE 1

The first data site can be described as reasonably typical of modern system development shops. DP Services employs nearly 80 people in a variety of jobs including clerical workers, machine operators, information centre staff, programmers, senior analysts and project leaders. The information systems in use consist of those developed in-house as well as modified packages. Both 3rd GL (COBOL, PLI) and 4th GL (FOCUS) languages are used. The staff turnover is comparatively low, with most professional personnel being with the firm over the time period of the systems analyzed. This helped greatly as the person who built the system was available to answer any questions that arose during the investigation. The hardware development environment was constant, e.g. same operating system and screen editors, over the development period of the systems analyzed.

5.1.1. The data set

The data from this site included 26 application systems written in FOCUS. All systems were developed by the Small Projects Group in the company between 1984 and 1988. The systems analyzed represent all of the systems developed by the company in FOCUS.
In addition to the measures specified in the last chapter, data on each system include:

1. Business function/application type
2. Programmer(s)
3. Hours to build: analysis, design, implementation and maintenance

The 26 systems represented nearly 800 FOCUS programs and over 62,000 FOCUS lines of code (LOC). Details on the data collection procedure appear in a following section. In 4 systems, resource consumption data was not available as these hours were buried in other billing numbers such as user support, the Information Centre or another larger project. The hours data represent only the time spent building and maintaining the systems and do not reflect the ongoing support and training required by users. Hence the hours data are under-estimates of the actual labour resources spent on the systems.

5.1.2. Pilot Investigation

For the pilot study a small sample of the available data were analyzed manually before the automated tool was built. Two systems were manually counted in detail at the program level following the methodology described in Chapter 4. These two systems contained about 11,000 FOCUS LOC written within 75 programs. One system was 2700 LOC and 22 programs while the other was 8300 LOC and 53 programs. All programs were designed and written by the same senior systems analyst. The two systems were functionally similar in that they were basically menu driven by users. The users typically enter transactions from a terminal and may select a number of reports by specifying report parameters.
Mean and variance tests were performed to determine whether program length was system dependent. The hypotheses of equal means and equal variances were not rejected, allowing the programs to be pooled. Analysis of this data set appears in Wrigley and Dexter [1988]. The strength of the relationships found in these 75 programs indicates that a stable relationship exists at the program level between operationalizations of the theory and LOC produced.

Additionally, hours and total FOCUS LOC were manually counted for 8 of the systems. Hours were extracted from the company's automated project control system. Hours for this sample range from 50 to 450 while LOC range from 300 to 8700. From this small sample there is evidence that hours and LOC are strongly related.

5.1.3. Pilot Validation

Prior to analyzing the sample data with the Code Analyser it was necessary to ensure that it generated the same metrics as the manual method. A problem immediately arose in comparing the manual pilot counts with the automated counts. The manual analysis was performed in January 1988. When the source code for these systems was captured in machine readable form during July 1988, both systems had undergone revisions, one very minor and one more substantial.
The functionality added to the second system resulted in it being 10% larger in terms of LOC. New reporting functions, several new data elements and new update routines were added. Also, several of the larger update programs had been converted to compiled modules for run-time efficiency. The combined modifications resulted in the system being expanded from 55 to 81 programs. Many of the new programs resulted from certain update programs having been split into two programs. The update programs required that a separate small driver routine be split off. The two parts had been together in the original pilot analysis and were split into two code chunks during system modification because of a restriction in FOCUS which requires that compiled modules contain only MODIFY type statements. These driver routines contained all file JOIN and file COMBINE information. Hence, the situation arose where some programs had a lot of sizing information yet no code body, while their logically related programs had just a code body with their data defined elsewhere. A current limitation of the Code Analyser is that it cannot deal with this situation in an automated way.[26]
For purposes of direct comparison with the pilot data the small driver programs were manually concatenated with their main program body. Also, only those programs appearing in both manual and automated counting were included in the validation. Comparison of both manual and automated counts appears in Appendix B, a - f, while the actual counts for both the independent and dependent variables for all programs appear in Appendix B, g - h. The simplest validation is to compare program length demographics. A more difficult task for the Analyser is to classify programs properly. Visual inspection of the two sets of counts shows very similar measures. The most difficult task is to count the independent variables accurately. For this validation, program by program visual checks were performed on the data in Appendix B.g and B.h. Differences that do exist are from three possible sources: 1) manual error, 2) modifications made to the programs between January and July, and 3) different counting rules as discussed in Chapter 4. Finally, regressions were run to see if the differences that remained were important. A summary of the regression results appears below in Table 5.1.

[26] Several solutions are on the drawing boards to dynamically attach logically related chunks of code.

Table 5.1: Pilot Regression - Explained Variance in Code Size

Program Class    Manual    Automated
Control           .48        .38
Updates           .87        .90
Outputs           .75        .74

One difference is noteworthy. The report variable is significant in the automated version.
This is due to the different counting rules being used. Basically, in the manual analysis only those data elements actually appearing on a report were counted. The automated counting includes data elements involved in the production of the report as well as those actually appearing on the printed report.

Satisfied that the Automated Code Analyser was doing as good a job as the manual method, if not a better one, the source code from the remaining 26 systems in site 1 was collected and fed to the Code Analyser. Discussion of this collection process appears below while the analysis results appear in Chapter 6.

5.1.4. Data Collection

All systems written in FOCUS in the organization being studied were identified via discussions first with management and then with senior analysts. Reports from the project control system were used to cross check this list. Even at this stage the distinction between a project and a system became apparent. Several projects and even different systems could be logically grouped into a single system. The criterion used to call a group of programs a system was whether they could be attributed to an identifiable organization function, task or activity. A final list was constructed and used to locate the corresponding source code.
In general all source code for a particular system was resident in its own library or machine. However, in some instances portions of systems resided in a production library. This entailed cross checking with system documentation and support personnel. All source code and data definitions were written to tape and transported off site to U.B.C.'s computing facility. The contents of the tapes were loaded onto the University mainframe and then downloaded to an AT class micro-computer.

As discussed above, the Code Analyser was run first against the pilot data. After validation, the remaining systems were input to the Analyser and cross checking was done to ensure no duplicate programs existed. In several instances different program names were discovered to contain approximately the same source code. This came about due to some programmers leaving old copies of programs around. The duplication was discovered by careful visual inspection of the software metrics produced by the Analyser. Some programs were very similar as revealed by their metric "signatures." Inspection of the original source code confirmed this.
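The comparison of metric signatures described above was carried out by visual inspection. As an illustration only, the same idea can be sketched programmatically: two programs are flagged as candidate duplicates when every metric in their signatures agrees within a small relative tolerance. The metric tuple, the program names and the 5% tolerance below are assumptions made for the sketch and are not taken from the study.

```python
from itertools import combinations


def near_duplicates(signatures, tolerance=0.05):
    """Return pairs of program names whose metric signatures nearly match.

    signatures: dict mapping program name -> tuple of numeric metrics
    tolerance:  largest relative difference allowed on every metric
                (an illustrative threshold, not a value from the study)
    """
    def close(a, b):
        # Relative difference, guarded so that two zero counts compare equal.
        largest = max(abs(a), abs(b))
        return largest == 0 or abs(a - b) / largest <= tolerance

    pairs = []
    for (name_a, sig_a), (name_b, sig_b) in combinations(sorted(signatures.items()), 2):
        if len(sig_a) == len(sig_b) and all(close(x, y) for x, y in zip(sig_a, sig_b)):
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical signatures: (LOC, screens, input elements, output elements, masters)
example = {
    "PAYUPD": (212, 3, 18, 24, 2),
    "PAYUPDOLD": (208, 3, 18, 24, 2),
    "PAYRPT": (64, 1, 2, 30, 1),
}
candidates = near_duplicates(example)
print(candidates)
```

Any pair flagged this way would still need the source code inspection and programmer confirmation described in the text before being removed from a sample.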
The duplicate programs were deleted from the sample after verbal confirmation by the programmers responsible for their production.

Originally the sample consisted of 28 systems. However, during cross checking two systems were deleted from the sample. It was discovered that one system was not a working system at all but an experimental collection of programs. A second system was deleted because it was written entirely by a contract analyst who was no longer with the firm and hence unavailable for verification questions. Additionally, it was discovered that the main update program for this system had been scrambled during data transmission from the data site. It was deemed unimportant to include this system in the sample.

The 26 remaining systems comprised 793 programs. During data counting the Code Analyser found that an additional four programs had been scrambled in transport. As these programs were relatively small it was again deemed appropriate to drop them from the sample. Additionally, 19 programs were found to have missing master file definitions and were dropped from the sample. Hence the final sample in data site 1 was 26 systems comprising 770 programs. This data set was uploaded to the campus mainframe where the statistical package MIDAS was used.

5.1.4.1. Data Validity

Discussions were held with analysts responsible for each system to ensure that the program and master file descriptions extracted from the libraries represented complete pictures of the working systems.
Where questions arose regarding possible errors in the data, the analysts responsible for each system were contacted and shown what the current assumptions were. Where errors or omissions were found they were asked to supply the necessary data. This was done to verify that the data analyzed were accurate reflections of the systems' status. It may be concluded that if another researcher went into the same environment the same measures would be produced.

There exists the possibility that some bias existed in the hours reported, specific to the "culture" of this particular environment. For example, some hours spent working on a system may be reported in general task classes and not against a specific project number. This may arise when a particular activity is applicable to more than one system; in this case hours may be reported against general department overhead and not to the specific systems. This is a threat only to the conclusions regarding the relationship between hours and other variables and therefore may reduce the internal validity of the observed relationship. However, it is unlikely that any field setting exists where there are no reporting confounds.

5.2. DATA SITE 2:

The second data site differed substantially from the first. It was a small consulting firm specializing in 4GL applications. The design and development team for all systems consisted of two persons only. Five systems consisting of over 1000 FOCUS programs were provided on floppy disks and loaded onto the campus micro.
Hence data collection issues were minimal. This ease was probably due not only to similar hardware environments being used but also to a learning effect: by this point the researcher knew the right questions to ask. The first data site represents not only the first test of the research model, but also a test and debugging of the research method. The feasibility of conducting field research in this manner was borne out by the relative speed of data collection and analysis in the second site.

Prior to constructing the Code Analyser one sample system was obtained for testing purposes. It was reasoned that the Code Analyser should be tested against code written by an independent source to ensure that the parsing rules were not site specific. This strategy proved invaluable as the consulting firm had written far more complex user interface code with a wider range of language usage.

5.3. DISCUSSION

In Chapter 4 limitations of the Code Analyser were identified. During the analysis of the data from site 2 one of the limitations became an issue. In site 1 INCLUDE statements (instructions that bring in re-usable code) were very few. They consisted primarily of PF KEY definitions, OS interface and printer routing, i.e. they were unclassified and did not affect the calling program's counting or classification. However, in site 2 the Analyser's limitation was more serious. Over 200 programs called in re-usable code. These code chunks contained critical information the Analyser needed to reverse engineer the systems properly.
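The extent of this limitation at a given site can be gauged by scanning the source for INCLUDE statements before any counting is attempted. The following is a minimal sketch, not part of the Code Analyser: it assumes that each INCLUDE statement sits on its own line in the form `-INCLUDE name`, and the program names and texts are invented for illustration.

```python
import re

# Assumption: an INCLUDE statement occupies its own line and starts with
# "-INCLUDE" (optionally indented), followed by the name of the shared code.
INCLUDE_LINE = re.compile(r"^[ \t]*-INCLUDE[ \t]+(\S+)", re.IGNORECASE | re.MULTILINE)


def included_names(source_text):
    """Return the names brought in by INCLUDE statements, in order of appearance."""
    return INCLUDE_LINE.findall(source_text)


def programs_with_includes(programs):
    """Map each program that uses INCLUDE to the list of names it brings in.

    programs: dict mapping program name -> source text
    """
    flagged = {}
    for name, text in programs.items():
        names = included_names(text)
        if names:
            flagged[name] = names
    return flagged


# Hypothetical program texts, invented for illustration only.
example = {
    "MENU1": "-INCLUDE PFKEYS\n-TYPE MAIN MENU\n",
    "RPT7": "TABLE FILE SALES\nPRINT AMOUNT\nEND\n",
    "UPD3": "-* update driver\n  -include COMMONIO\n-INCLUDE VALIDATE\n",
}
flagged = programs_with_includes(example)
print(flagged)
```

A count of flagged programs gives an early indication of whether included code can safely be ignored, as in site 1, or whether it carries design information, as in site 2.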
A choice therefore had to be made between upgrading the Analyser and limiting the analysis of site 2 data. The latter alternative was chosen with the knowledge that the Analyser tool was not yet completely portable to different development environments.

5.4. OTHER MEASUREMENTS FROM THE DATA

In addition to the theory testing portion of the research, a number of descriptive statistics will be of interest to practitioners. The Automated Code Analyser allows an in-depth view of the code which was previously computationally infeasible. Examples of the kinds of outputs the Analyser can produce appear below and will be addressed in the next chapter.

1. Comparison of systems and programs completed by the same personnel. This is useful as an analytic control mechanism.
   a. Programming style: language usage, vocabulary, code density.
   b. Changes over time in programming style.
2. Branching complexity: McCabe's cyclomatic complexity. This measure is important for predicting maintenance effort.
3. Mix of program types, e.g. number of reports, updates, menus.
4. Productivity of programmers: LOC/Hours.
5. Comparison of code produced by users vs system developers.
6. Software Science measures of a 4GL.

5.5. SUMMARY

This chapter has described the field settings and methods used to perform an initial test of the research model. Results of the manual and automated counting methods show the automated counting to be both valid and reliable within the specific field site. Limitations of the tool do not make it completely portable at this time; however, some cross site comparisons are possible. The next chapter presents the findings from analysing the field data with the automated tool.

CHAPTER 6. EMPIRICAL FINDINGS

This chapter presents the findings from the empirical portion of the research. Section 6.1 considers the program as the unit of analysis and establishes the relationship between design and code for the detailed design phase. It reports program demographics, tests the program level regression model defined in section 4.4.4, and tests for programmer effects. Section 6.2 deals with the system as the unit of analysis and establishes the relationship between design and code for the preliminary design phase. It reports aggregated system level measures, tests the system level regression model defined in section 4.4.4, then briefly discusses resource consumption and overall productivity. Section 6.3 presents the first step towards generalizing the results across development environments by comparing the data from site 1 with the data from site 2. Finally, section 6.4 summarizes and discusses the empirical findings.

6.1. UNIT OF ANALYSIS: PROGRAM LEVEL

6.1.1. Program Size

The distribution of program size for site 1 appears in Figure 6.1. Of the 770 programs in site 1, 408 are less than 60 LOC (about 1 page of code) while 258 are between 60 and 150 LOC. The remaining 104 programs are considered "large", being over 150 LOC. However, in terms of total percentage of code in the sample these numbers are misleading. The 408 small programs represent only 15% of the code. The 258 medium sized programs represent another 35% of the code. The larger programs, while only 14% of the program count, represent just over 50% of the code. It is these larger programs which may be of greater interest to managers, not only because they may consume more effort to construct but also because they may require greater maintenance effort in the future.

6.1.2. Program Class

The theory developed in Chapter 3 and applied in Chapter 4 predicts that a software system will have processes to handle input and output events. Referring to Table 6.1, ANOVA tests on the mean and variance of program size in each class indicate that three different populations of programs do exist and correspond to Figure 3.3 as described below.

6.1.2.1. Update Programs

Update programs correspond conceptually to the event input vector, E', of Figure 3.3. Of 770 programs, 200 were classified as updates. These contained 48% of the code and have a mean size more than double that of the other classes. However, this class represents more than simple update processes. As the Code Analyser identifies main design points within the code it is able to partition the update class into several sub-classes.
The Analyser discovered that some output functions such as update control totals were embedded within the code. The sub-classes within the general update class are:

1. Online using CRT: n = 125
   a. No report: n = 107
   b. With report: n = 18
2. Batch file update: n = 75
   a. No report: n = 61
   b. With report: n = 14

Thirty two of the update programs contained output reports. In general these latter programs are not database extraction reports per se but reports on transaction processing, or changes to the database. Hence they can be considered as programmed output events.

Figure 6.1: Histogram of program size (LOC), site 1

HISTOGRAM FOR 5.LOC

MIDPOINT    COUNT
   0.         102
  25.000      243
  50.000      124
  75.000      106
 100.00        47
 125.00        27
 150.00        34
 175.00        22
 200.00        12
 225.00         8
 250.00        10
 275.00         3
 300.00         3
 325.00         1
 350.00         2
 375.00         1
 400.00         3
 425.00         2
 450.00         1
 475.00         2
 500.00         3
 525.00         3
 550.00         1
 575.00         2
 600.00         1
 625.00         1
 650.00         1
 675.00         1
 700.00         1
 725.00         0
 750.00         1
 775.00         0
 800.00         0
 825.00         0
 850.00         0
 875.00         1
 900.00         1
 925.00         0
 950.00         0
 975.00         0
1000.0          0
TOTAL         770   (INTERVAL WIDTH= 25.000)

Table 6.1: Program class differences

UNIVARIATE 1-WAY ANOVA
ANALYSIS OF VARIANCE OF LOC   N= 720 OUT OF 720

SOURCE      DF    SUM OF SQRS    MEAN SQR    F-STATISTIC    SIGNIF
BETWEEN       2   .11315+7       .56575+6    51.253         .0000
WITHIN      717   .79144+7       11038.
TOTAL       719   .90459+7

(RANDOM EFFECTS STATISTICS)
ETA= .3537   ETA-SQR= .1251   (VAR COMP= 2497.2   %VAR AMONG= 18.45)

EQUALITY OF VARIANCES: DF= 2, .85500+6   F= 179.11   .0000

TOP CLASS     N     MEAN      VARIANCE    STD DEV
UPDATE      200    147.16     28518.      168.87
OUTPUT      368    65.000     5421.9      73.634
CTL         152    47.651     1651.8      40.642
GRAND       720    84.160     12581.      112.17

6.1.2.2. Output Programs

Output programs correspond to the event output vector, E°, of Figure 3.3. The sample contained 368 programs (48%) in this class but only 39% of the code. Clearly, reporting processes were more numerous yet smaller.

6.1.2.3. Control Programs

The third primary class, control programs, represents the overhead required to organize and execute programs in the other two classes. There were 152 programs in this class representing 12% of the code.

6.1.2.4. Unclassified Analysis

Fifty (6%) of the 770 programs were not classified by the Code Analyser, i.e. they did not contain any code chunks or keywords which could be identified as design decisions. These programs consisted mainly of small code stubs (e.g. include files, utility programs, program function key definitions or printer routing). The total amount of code in the unclassified programs was 685 LOC, or 1% of the total code from data site 1. These programs were discarded from further analysis, leaving 720 programs of interest.

6.1.3. Regression Results

The results from running the regression model in Section 4.4.4 against the three classes of programs appear in Tables 6.2.1 - 6.2.3.

Table 6.2.1: Regression results - Update class

LEAST SQUARES REGRESSION <1> CLASS:UPDATE
ANALYSIS OF VARIANCE OF 5.LOC   N= 200 OUT OF 200

SOURCE        DF    SUM SQRS    MEAN SQR    F-STAT    SIGNIF
REGRESSION      7   .48723+7    .69604+6    166.45    .0000
ERROR         192   .80290+6    4181.8
TOTAL         199   .56752+7

MULT R= .92657   R-SQR= .85852   SE= 64.667

VARIABLE          PARTIAL    COEFF      STD ERROR    T-STAT     SIGNIF
CONSTANT                     -3.3345    9.1546       -.36424    .7161
6.#SCREENS        .31890     15.304     3.2825       4.6623     .0000
7.SCREEN_IN       .38937     1.5461     .26395       5.8576     .0000
8.SCREEN_OUT      .58744     2.4872     .24728       10.058     .0000
9.OUTPUT_DATA     .29229     .51447     .12148       4.2350     .0000
10.MASTERS        .25929     19.014     5.1112       3.7200     .0003
12.SEGMENTS       .07893     2.9489     2.6880       1.0971     .2740
11.FIELDS         .10313     .37831     .26332       1.4367     .1524

Table 6.2.2: Regression model - Output class

LEAST SQUARES REGRESSION <2> CLASS:OUTPUT
ANALYSIS OF VARIANCE OF 5.LOC   N= 368 OUT OF 368

SOURCE        DF    SUM SQRS    MEAN SQR    F-STAT    SIGNIF
REGRESSION      7   .15438+7    .22055+6    178.01    0.
ERROR         360   .44602+6    1238.9
TOTAL         367   .19899+7

MULT R= .88083   R-SQR= .77585   SE= 35.198

VARIABLE          PARTIAL    COEFF      STD ERROR    T-STAT     SIGNIF
CONSTANT                     -1.7931    4.3138       -.41567    .6779
6.#SCREENS        .19375     27.049     7.2184       3.7472     .0002
7.SCREEN_IN       .10099     6.5655     3.4090       1.9259     .0549
8.SCREEN_OUT      .18134     6.3111     1.8039       3.4987     .0005
9.OUTPUT_DATA     .82075     .88397     .32428-1     27.259     0.
10.MASTERS        .16469     9.6389     3.0426       3.1680     .0017
12.SEGMENTS       .00932     .19926     1.1263       .17692     .8597
11.FIELDS         .18129     .33605     .96078-1     3.4977     .0005

Table 6.2.3: Regression model - Control class

LEAST SQUARES REGRESSION <3> CLASS:CONTROL
ANALYSIS OF VARIANCE OF 5.LOC   N= 152 OUT OF 152

SOURCE        DF    SUM SQRS    MEAN SQR    F-STAT    SIGNIF
REGRESSION      7   .14553+6    20789.      28.816    .0000
ERROR         144   .10389+6    721.46
TOTAL         151   .24942+6

MULT R= .76385   R-SQR= .58346   SE= 26.860

VARIABLE          PARTIAL    COEFF       STD ERROR    T-STAT      SIGNIF
CONSTANT                     9.2193      3.8833       2.3741      .0189
6.#SCREENS        .57950     21.871      2.5632       8.5328      .0000
7.SCREEN_IN       .01243     .40625      2.7244       .14911      .8817
8.SCREEN_OUT      .37864     4.6876      .95485       4.9093      .0000
9.OUTPUT_DATA     .00181     .49062-1    2.2547       .21760-1    .9827
10.MASTERS        .04587     3.0950      5.6167       .55104      .5825
12.SEGMENTS       .00537     .30617      4.7483       .64481-1    .9487
11.FIELDS         -.02928    -.16777     .47730       -.35149     .7257

6.1.3.1. Regression Discussion

The Master Files Accessed component of the regression model has been expanded to include Segments in order to obtain a more complete measure of the complexity in the database accessed. Segments are a measure of the structural linkages within a Master file. FOCUS structures its databases in a three level hierarchy with the Master at the top consisting of subordinate Segments, which in turn consist of a number of atomic Fields. Segments are not significant in the model as Masters and Segments are highly correlated. The Segments variable is included here as it will be used later, in conjunction with Fields, to form a principal component measure of the systems' static space, S.

As can be seen from Table 6.2.1 the variables in the model explain 86% of the variance in update program size. Of the three variables Masters, Segments and Fields, only one explains a significant proportion of the variance. This is because these three variables are highly correlated. The distribution of residuals from this regression has a standard deviation of 63 LOC with a slight positive skew of 2.7 LOC. A plot of the residuals and the predicted values shows slight heteroscedasticity.
This may indicate that a non-linear term in the model should be explored.

In Table 6.2.2 the model explains 78% of the variance in output program size. The SCREEN_IN variable falls out of significance, as we would expect, because these are output programs. SCREEN_IN is the number of data elements entered by the user. In this class it represents user control over report production, which usually entails only one or two variables being entered by the user. Segments are again insignificant for the same reason stated above. The distribution of residuals from this regression has a standard deviation of 34 LOC with a slight positive skew of 2.9 LOC. As above, a scatter plot of the residuals against the predicted values shows slightly increasing variance.

The Control class, in Table 6.2.3, is more problematic. The model explains only 58% of the variance. As expected, the database related measures do not explain any significant variance. There are no data flowing in these programs so we would also not expect screen input variables to be significant. The only code in this class of programs consists of screen display and a few output variables, as is evident from the number of screens (#SCREENS) and the number of data elements displayed (SCREEN_OUT) being highly significant. The significant constant term in Table 6.2.3 indicates some minimum overhead in writing each program, such as variable initialization or other set-up procedures. The distribution of residuals from this regression has a standard deviation of 26 LOC with a slight positive skew of 2.7 LOC.
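The explained variance figures quoted for the three classes follow directly from the sums of squares printed in Tables 6.2.1 - 6.2.3. As a worked check, the short sketch below recomputes R-SQR, the F statistic and the standard error of estimate from those sums of squares. It reads the printed notation `.48723+7` as 0.48723 x 10^7, which is an interpretation of the statistical package's output format rather than something stated in the text; the function and variable names are illustrative.

```python
import math


def regression_summary(ss_regression, ss_error, df_regression, df_error):
    """Recompute summary statistics from an ANOVA table for a regression."""
    ss_total = ss_regression + ss_error
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return {
        "r_squared": ss_regression / ss_total,   # share of variance explained
        "f_stat": ms_regression / ms_error,      # overall F statistic
        "std_error": math.sqrt(ms_error),        # standard error of estimate
    }


# (SS regression, SS error, DF regression, DF error) as printed in the tables,
# with ".48723+7" read as 0.48723e7 (an assumed reading of the notation).
anova_tables = {
    "update": (0.48723e7, 0.80290e6, 7, 192),
    "output": (0.15438e7, 0.44602e6, 7, 360),
    "control": (0.14553e6, 0.10389e6, 7, 144),
}

summaries = {name: regression_summary(*row) for name, row in anova_tables.items()}
for name, stats in summaries.items():
    print(name, round(stats["r_squared"], 3), round(stats["f_stat"], 2),
          round(stats["std_error"], 2))
```

The recomputed values agree with the printed R-SQR, F-STAT and SE entries to within rounding, which also confirms the 86%, 78% and 58% figures used in the discussion.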
As with the other two classes, the residuals from the Control class regression show increasing variance when plotted against the predicted dependent variable.

6.1.3.2. Towards Parsimony

In the previous regressions the screen related variables are highly correlated, as are the database related variables. To remove this multicollinearity the first principal component was extracted from the number of screens (#SCREENS) and the number of input data elements (SCREEN_IN), and from the number of logical database groups (SEGMENTS) and the number of fields in all segments (FIELDS). These principal components correspond conceptually to the notions of input event size, E', and system static space size, S, and are given the labels INSIZE and DATASIZE. As a measure of output event size, E°, the amount of data flowing out of the systems (OUTPUT_DATA) is relabelled OUTSIZE. These three variables form a new reduced model and the results of this run appear in Table 6.3.

For update and output programs all three independent variables were highly significant, explaining 82% and 75% of the variance in program length respectively. Control programs, however, remain problematic. The single significant independent variable, INSIZE, explained only 37% of the variance. However, we are primarily interested in the more numerous and larger input and output programs. This more parsimonious model indicates that aggregate measures of input, output and data size can be used to predict program size during the detailed design phase of system development.

Table 6.3: Reduced Model

LEAST SQUARES REGRESSION: UPDATE
ANALYSIS OF VARIANCE OF LOC   N= 200 OUT OF 200

SOURCE        DF    SUM SQRS    MEAN SQR    F-STAT    SIGNIF
REGRESSION      3   .46818+7    .15606+7    307.91    .0000
ERROR         196   .99340+6    5068.4
TOTAL         199   .56752+7

MULT R= .90827   R-SQR= .82496   SE= 71.192

VARIABLE     PARTIAL    COEFF      STD ERROR    T-STAT    SIGNIF
CONSTANT                33.298     7.8665       4.2328    .0000
INSIZE       .88322     2.8641     .10862       26.368    .0000
OUTSIZE      .44716     .81512     .11646       6.9990    .0000
DATASIZE     .24684     .74600     .20919       3.5662    .0005

LEAST SQUARES REGRESSION: OUTPUT
ANALYSIS OF VARIANCE OF LOC   N= 368 OUT OF 368

SOURCE        DF    SUM SQRS    MEAN SQR    F-STAT    SIGNIF
REGRESSION      3   .15007+7    .50022+6    372.22    0.
ERROR         364   .48918+6    1343.9
TOTAL         367   .19899+7

MULT R= .86843   R-SQR= .75416   SE= 36.659

VARIABLE     PARTIAL    COEFF      STD ERROR    T-STAT    SIGNIF
CONSTANT                8.4064     3.4805       2.4153    .0162
INSIZE       .53586     13.008     1.0742       12.109    .0000
OUTSIZE      .81603     .89919     .33383-1     26.936    0.
DATASIZE     .26675     .46729     .88493-1     5.2805    .0000

LEAST SQUARES REGRESSION: CONTROL
ANALYSIS OF VARIANCE OF LOC   N= 152 OUT OF 152

SOURCE        DF    SUM SQRS    MEAN SQR    F-STAT    SIGNIF
REGRESSION      3   91829.      30610.      28.747    .0000
ERROR         148   .15759+6    1064.8
TOTAL         151   .24942+6

MULT R= .60678   R-SQR= .36818   SE= 32.631

VARIABLE     PARTIAL    COEFF       STD ERROR    T-STAT      SIGNIF
CONSTANT                22.573      4.1942       5.3819      .0000
INSIZE       .58567     8.2710      .94092       8.7903      .0000
OUTSIZE      .00066     .21907-1    2.7294       .80263-2    .9936
DATASIZE     -.05201    -.10222     .16135       -.63354     .5274

6.1.4. Programmer differences

In Site 1 a total of thirteen persons were involved in producing the 26 systems. Of the 770 programs, 693 could be uniquely attributed to a programmer from the documentation embedded within the source code. Of these 693, 638 (92%) were written by seven programmers. The remaining programs were written by persons on loan from other areas of the DP shop. The number of observations for these programmers is too small to analyze.

Table 6.4: Programmer by Program Class

Person    Control    Outputs    Updates    Unclass    Total N
A            7         27         14          0          48
B            6         25         15          0          46
C            0         25          0          1          26
D            0          4          0          0           4
E            6         45         21          0          72
F            2          2          3          0           7
G **        33         84         46          3         166
H            2          3          5          0          12
J **        44         47         27          7         125
I           10         17         21          0          48
K **        31         38         38         26         133
L            0          0          2          0           2
M            2          0          0          2           4

Of the 638 programs, 424 were written by just three people, accounting for over 60% of the FOCUS code output. The distribution of programs by programmer and program class appears in Table 6.4. The double asterisk beside a programmer indicates sufficient sample size for further analysis.

The source code for 77 of the 770 programs contained no documentation identifying the programmer. These programs consisted of 1544 LOC or 2.5% of the sample code. Of these 77 only 11 were unclassified and consisted of 137 LOC. Of the remainder, 9 (113 LOC) were small menus, 8 (310 LOC) were small update programs and, interestingly, 49 (1120 LOC) were report programs. If we assume that lack of programmer documentation in a program indicates that it was written by a non-DP professional, i.e. an end-user, then this points to an interesting observation. Purportedly, 4GL languages are a boon to end-user computing (EUC). The data in this sample indicate that EUC is still very much limited to small extraction programs accessing existing databases. The more difficult aspects of building the core of systems must rely on the expertise of MIS professionals.
The n u m b e r of observat ions for these p r o g r a m m e r s are t o o smal l t o analyze. Table 6.4: P r o g r a m m e r by Program Class P e r s o n C o n t r o l O u t p u t s U p d a t e s Unc lass Total N A 7 27 14 0 48 B 6 25 15 0 46 C 0 25 0 1 26 D 0 4 0 0 4 E 6 45 21 0 72 F 2 02 03 0 7 C ** 33 84 46 3 166 H 2 3 5 0 12 j * * 44 47 27 7 125 I 10 17 21 0 48 K ** 31 38 38 26 133 L 0 0 02 0 2 M 2 0 0 2 4 O f the 638 programs, 424 w e r e wr i t ten by just th ree p e o p l e , a c c o u n t i n g for over 6 0 % of the F O C U S c o d e output . The d is t r ibut ion of p rograms wr i t ten by p r o g r a m m e r by p r o g r a m class appears in Table 6.4. The d o u b l e asterisk bes ide p r o g r a m m e r ind icates suf f ic ient sample s ize fo r further analysis. Empir ical F ind ings / 125 The s o u r c e c o d e for 77 of the 770 programs c o n t a i n e d n o d o c u m e n t a t i o n ident i fy ing the p rog rammer . These p rograms c o n s i s t e d of 1544 L O C or 2 . 5 % of the samp le c o d e . O f these 77 on ly 11 w e r e unc lass i f ied and c o n s i s t e d of 137 L O C . O f the remainder , 9 (113 L O C ) w e r e smal l menus , 8 (310 L O C ) w e r e smal l update p rograms and , interest ingly , 49 (1120 L O C ) w e r e report p rograms . If w e assume that lack of p r o g r a m m e r d o c u m e n t a t i o n in a p r o g r a m indicates that it was wr i t ten by a n o n - D P p ro fess iona l , i.e. an end -user , t h e n this po in ts t o an in terest ing observa t ion . Purported ly , 4 C L languages are a b o o n t o e n d - u s e r c o m p u t i n g (EUC) . The data in this samp le ind icates that E U C is still very m u c h l imited to smal l ext ract ion p rog rams access ing ex is t ing databases. The m o r e di f f icult aspects of b u i l d i n g the c o r e of systems must rely o n the exper t i se of MIS pro fess iona ls . 
Furthermore, of the total 4GL code written, less than 3% appears to have been generated by non-DP staff. However, this observation may be an artifact of the data collection method used in this data site. With the growing use of micro-computer to mainframe communication it may be the case that user written code resides on end user personal computers and is only executed against databases accessible through the mainframe. Due to the diverse nature of the data site it was not possible to pursue this observation further.

6.1.4.1. Effects on Program Size

Three programmers, G, I and K, had sufficient data points in each program class to allow comparisons across classes. To test for programmer effects on program size three dummy variables were added to the regression model in Tables 6.2.1 - 6.2.3. This expanded model appears in Tables 6.5.1 - 6.5.3 for program classes Update, Output and Control respectively. As can be seen from Table 6.5.1, none of the programmer dummy variables is significantly different from zero. This indicates that the other independent variables, representing measures of design, are the principal contributors to program length. In Table 6.5.2 programmer G appears to write slightly larger output programs (p < .004); however, the coefficient is only 14 LOC, so this may not be too meaningful for management.

An interesting phenomenon occurs in control programs. Collectively, the three programmers, G, I and K, tend to produce smaller control programs than the other programmers combined. Since control programs represent only 11% of all code produced this may not be important from a project management perspective. However, from a personnel management perspective insights of this kind may assist in allocating training budgets or task assignments. Due to the letter of understanding between the researcher and the company involved the identities of the programmers will remain confidential.

Table 6.5.1: Programmer Differences - Updates

LEAST SQUARES REGRESSION <1> CLASS:UPDATE
ANALYSIS OF VARIANCE OF 5.LOC   N = 200 OUT OF 200

SOURCE        DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    10   .48770E+7   .48770E+6   115.49   .0000
ERROR        189   .79813E+6   4222.9
TOTAL        199   .56752E+7

MULT R = .92702   R-SQR = .85936   SE = 64.984

VARIABLE         PARTIAL   COEFF     STD ERROR   T-STAT       SIGNIF
CONSTANT                   -.56162   10.092      -.55650E-1   .9557
 6.#SCREENS      .32495    15.829    3.3510      4.7237       .0000
 7.SCREEN_IN     .38962    1.5483    .26621      5.8159       .0000
 8.SCREEN_OUT    .57926    2.4563    .25143      9.7695       .0000
 9.OUTPUT_DATA   .28794    .52755    .12763      4.1336       .0001
10.MASTERS       .26022    20.149    5.4383      3.7050       .0003
12.SEGMENTS      .04247    1.7621    3.0152      .58442       .5596
11.FIELDS        .12035    .46121    .27674      1.6666       .0973
45.PERSON_G      -.05359   -9.2850   12.585      -.73779      .4616
46.PERSON_I      -.06435   -13.543   15.277      -.88654      .3765
47.PERSON_K      .00193    .35394    13.342      .26528E-1    .9789

Table 6.5.2: Programmer Differences - Outputs

LEAST SQUARES REGRESSION <2> CLASS:OUTPUT
ANALYSIS OF VARIANCE OF 5.LOC   N = 368 OUT OF 368

SOURCE        DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    10   .15651E+7   .15651E+6   131.55   0.
ERROR        357   .42473E+6   1189.7
TOTAL        367   .19899E+7

MULT R = .88688   R-SQR = .78655   SE = 34.492

VARIABLE         PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                   -4.9776   4.3501      -1.1442   .2533
 6.#SCREENS      .19385    27.118    7.2635      3.7335    .0002
 7.SCREEN_IN     .06550    4.2089    3.3938      1.2402    .2157
 8.SCREEN_OUT    .18621    6.3488    1.7729      3.5809    .0004
 9.OUTPUT_DATA   .82867    .89232    .31900E-1   27.972    0.
10.MASTERS       .19715    11.786    3.1019      3.7997    .0002
12.SEGMENTS      -.04763   -1.0577   1.1740      -.90095   .3682
11.FIELDS        .20146    .37531    .96575E-1   3.8862    .0001
45.PERSON_G      .15300    14.484    4.9511      2.9254    .0037
46.PERSON_I      -.09174   -10.817   6.2143      -1.7406   .0826
47.PERSON_K      .09139    11.011    6.3504      1.7340    .0838

Table 6.5.3: Programmer Differences - Control

LEAST SQUARES REGRESSION <3> CLASS:CTL
ANALYSIS OF VARIANCE OF 5.LOC   N = 152 OUT OF 152

SOURCE        DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    10   .16104E+6   16104.      25.691   .0000
ERROR        141   88381.      626.81
TOTAL        151   .24942E+6

MULT R = .80352   R-SQR = .64565   SE = 25.036

VARIABLE         PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                   23.335    4.6745      4.9921    .0000
 6.#SCREENS      .62444    22.787    2.4004      9.4932    .0000
 7.SCREEN_IN     .02201    .66811    2.5553      .26146    .7941
 8.SCREEN_OUT    .39718    4.6232    .89964      5.1389    .0000
 9.OUTPUT_DATA   -.04901   -1.2366   2.1225      -.58264   .5611
10.MASTERS       .08210    5.2344    5.3514      .97813    .3297
12.SEGMENTS      -.03334   -1.7634   4.4518      -.39612   .6926
11.FIELDS        -.02810   -.14930   .44720      -.33385   .7390
45.PERSON_G      -.19397   -14.266   6.0764      -2.3478   .0203
46.PERSON_I      -.36021   -24.968   5.4456      -4.5850   .0000
47.PERSON_K      -.30728   -22.956   5.9871      -3.8343   .0002

6.1.4.2. Test For Learning Effects

One potential confounding variable lies in programmer learning effects over time. To address this possibility, the date that each program was written was extracted from the source code. This information was available for 626 programs.
The period 1985-1988 was converted to a day displacement from January 1, 1985. For example, January 01, 1986 = 366, January 01, 1987 = 731, etc. The date that each program was written was converted to the above scale. In general, programs were written within a short time period. This DATE variable was then correlated with all variables in the regression model for each programmer. The hypothesis that there was a change over time in the regression variables was not supported, although some minor differences did emerge.

For programmer G there is some evidence that programs were becoming smaller over the three year period: R(date, LOC) = -.167 (p < .04). Programmer I included more variables in report programs over time, R = .16 (p < .03), while programmer J showed no discernible change over time.

The research design in chapter 4 called for the personnel variables and system type to be held constant. As the staff were experienced analysts and FOCUS programmers and the kinds of systems being built were similar, the above results are what we would expect. Based on the evidence, and looking at the regression model alone, it is safe to conclude that the programmers in the sample were tackling similar kinds of programming tasks in a consistent manner.

However, the automated tool offered a unique opportunity to investigate programming style and a few interesting observations emerged. A learning effect was detected for programmer I. Here the richness of language usage, or vocabulary, increased significantly over time (p < .01).
The density of the code (number of code tokens per line) also increased (p < .03). Conversely, the density of programmer K's code decreased over time (p < .01). These anecdotes give some indication as to the managerial insights made possible by the Code Analyser.

6.2. UNIT OF ANALYSIS: SYSTEM LEVEL

6.2.1. Demographics

Measures of system sizes for the 26 systems from site 1 appear in Table 6.6. A histogram of system sizes in terms of LOC shows the same general shape as the distribution of program size observed at the program level.

6.2.2. Metric Linkage at System Level

For the 26 systems three measures of design size predict eventual total LOC very well. Number of screens, number of reports and number of master files in the database together explain 94% of the variance in this sample. However, it is premature to claim that the model is validated because several large systems could create this unusually striking result. The regression line is being forced through the large systems, which contain most of the variance. Ideally, a larger data set would have data points uniformly distributed over the range. Details of this regression appear in Table 6.7. The large negative intercept is not meaningful in this context as there are no observations close to zero, and extending the regression line past the small data set is inappropriate.

6.2.3.
Resource Consumption

Table 6.6: System Level Size Measures

System    LOC   Programs   Masters   Segments   Fields   Screens   Reports   Devel Hours   Maintain Hours   Total Hours
  1      3145      22         4         10        126       33        13         206              5             211
  2      9509      81        11         13        109      101        30         607              0             607
  3      1243      34         3          3         18       12        23          59              0              59
  4      6335      52        17         30        245       42        25        2009            536            2545
  5      1598      16         1          7         58       33         5        missing
  6      3076      41        10         18        153       36        24         869            153            1022
  7      4273      49        14         18        174       42        21         427              0             427
  8      2046      31         7         10         69       28        10          59             11              70
  9       594      20         5          5         89        3        10          63              0              63
 10       476      14         4          4         41       15         5          35              0              35
 11      1623      10         1          2          8       10         5        missing
 12      9444      84        30         44        349       59        32        3749            624            4373
 13      1383      21         2          4         19       31        12         165              0             165
 14       865      18         9          9        133       12         5        missing
 15      3371      47        15         22        179       24         7         292              0             292
 16       304      10         1          1         26        8         5          34              0              34
 17       635      13         2          2         24        7         1          54              0              54
 18       734      17         3          5         54       14         9          50              0              50
 19       744      18         3          3         58       16         7        missing
 20      1343      25         3          4         61       11        12          97              0              97
 21      1861      46         2          3         40       29        27         161             17             178
 22      2267      41         6         15        181        0        32         502            200             702
 23      1651      27         4          5         21       25        10          96              0              96
 24      1194      20         3          4        118       16         9          91             39             130
 25      1029      12         1          2         39       13         6          67              0              67
 26      1327      20         1         10        118       26         1          90              0              90

Table 6.7: Predictors of System Size

LEAST SQUARES REGRESSION
ANALYSIS OF VARIANCE OF 6.LOC   N = 26 OUT OF 26

SOURCE        DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION     3   .14473E+9   .48244E+8   118.85   .0000
ERROR         22   .89304E+7   .40593E+6
TOTAL         25   .15366E+9

MULT R = .97051   R-SQR = .94188   SE = 637.13

VARIABLE   PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT             -782.34   224.51      -3.4847   .0021
MASTERS    .79428    152.42    24.857      6.1320    .0000
SCREENS    .87651    65.967    7.7243      8.5402    .0000
REPORT     .48757    43.652    16.665      2.6193    .0157

The 22 systems with hours data required 11,365 hours to develop, enhance and maintain.
The distribution of effort was as follows:

Phase                   Hours   Proportion
Original development     7500       66%
Enhancements             2280       20%
Maintenance              1585       14%
Total                   11365      100%

The total code produced in these 22 systems was 57,000 LOC, yielding an overall productivity of 7.6 LOC/Hour for original development and dropping to 5.0 LOC/Hour when enhancement and maintenance hours are included. However, since it is impossible to determine the exact amount of code re-use and possible hours omission, these productivity figures may be artificially high.

As a side issue, here we see some evidence of surprisingly low maintenance costs for systems built around a fourth generation language. This supports the general practitioners' position (and 4GL vendors' position) that 4GLs require less maintenance. An alternative explanation, however, is that the systems in this sample were developed recently and maintenance costs may be yet to emerge. Longitudinal tracking of these systems would be needed to settle this question.

The relationship between total development hours (original development + enhancements) and lines of code appears in Figure 6.2. The linear regression of hours against LOC results in an R² of .61. In the absence of the outlier at 750 hours and 9500 LOC, not only does there appear to be a slight curvature to the relationship, but the fit of the regression line improves to an R² of .87.

[Figure 6.2: Development Hours vs. Lines of Code. Scatter plot of LOC against total development hours (TOTDEVHR), N = 22.]
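The productivity figures reported above for the 22 systems follow directly from the effort distribution and the 57,000 LOC total. The following is only an illustrative sketch of that arithmetic in Python; the names EFFORT_HOURS, TOTAL_LOC, loc_per_hour and effort_shares are hypothetical and are not part of the study's tooling.

```python
# Hours by phase, as reported in the effort distribution for the 22 systems.
EFFORT_HOURS = {
    "original_development": 7500,
    "enhancements": 2280,
    "maintenance": 1585,
}
TOTAL_LOC = 57_000  # total code reported for the same 22 systems


def loc_per_hour(loc, hours):
    """Productivity expressed as lines of code per hour of effort."""
    return loc / hours


def effort_shares(effort):
    """Share of total hours spent in each phase, as fractions of 1."""
    total = sum(effort.values())
    return {phase: hours / total for phase, hours in effort.items()}


total_hours = sum(EFFORT_HOURS.values())
dev_only = loc_per_hour(TOTAL_LOC, EFFORT_HOURS["original_development"])
all_phases = loc_per_hour(TOTAL_LOC, total_hours)
shares = effort_shares(EFFORT_HOURS)

print(f"total hours: {total_hours}")
print(f"original development only: {dev_only:.1f} LOC/hour")
print(f"all phases: {all_phases:.1f} LOC/hour")
for phase, share in shares.items():
    print(f"{phase}: {share:.0%}")
```

Run as written, the sketch gives 11365 total hours, 7.6 and 5.0 LOC/hour, and phase shares of 66%, 20% and 14%, in agreement with the reported figures. It says nothing about code re-use or omitted hours, which is why the reported productivity may be overstated.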
The outlier in Figure 6.2 may be due to unreported hours against the system or heavy use of reusable code.

When development hours were regressed against the three predictors of system size, MASTERS, SCREENS and REPORTS, an R² of .80 resulted. However, MASTERS is the only significant variable; SCREENS and REPORTS drop out of the model. This may indicate that reasonable estimates of development hours are possible when an aggregate measure of data, i.e. master files, is available for simple transaction processing systems written in FOCUS. This observation, it is stressed again, is specific to this one data site and no extendability of the regression coefficients is suggested or warranted.

6.3. UNIT OF ANALYSIS: FIRM LEVEL

The preceding discussion focused on a single development environment. The findings indicate that the model and method can be used by the firm to better understand its own system development efforts. Of theoretical and practical interest is whether the model and method are extendable into other environments. As a first step towards this goal the source code from data site 2 was analyzed with the Code Analyser. Due to the limitations identified earlier only rough comparisons are possible at this time. The two firms in the study are widely different in function, size, and mission. It would be surprising to find any similarities in the systems they develop unless there is some underlying generalizability to the model and method proposed in this thesis.
A comparison of the two data sites follows. The distributions of program size for all programs appear in Figure 6.1 for site 1 and Figure 6.3 for site 2. The means of these two distributions are not statistically different as revealed by an F-test (p > .35). Although the variances are significantly different, this difference could be misleading because the sample size is so large. This also may be an instance where the variances differ significantly but not meaningfully. An alternate plausible explanation is that the distribution having the larger variance comes from the data site with many programmers. We would expect more variance when more people are involved in system building due to any latent design and programming styles. However, this is a post hoc explanation for an observed phenomenon.

Program classification similarities appear in both firms with only slightly different frequency distributions. This is likely due to the same language, i.e. FOCUS, being applied to similar kinds of problems. More importantly, it shows that system builders use the tool consistently.

The similarity in program size distribution indicates initial support for the position that this model and method are generalizable to other FOCUS development environments. It is expected that differences would appear only in the regression parameters when programmer style and application area have their influence on the process of system building.

Figure 6.3 (distribution of program size, site 2)

HISTOGRAM <1> FIRMCODE:2 CASES=CLASS:JOBCTL-CRTUFO
COUNT FOR 5.LOC (INTERVAL WIDTH = 25.000)

MIDPOINT   COUNT
   0.        58
  25.000    137
  50.000    332
  75.000    200
 100.00     104
 125.00      52
 150.00      37
 175.00      22
 200.00      17
 225.00      22
 250.00       9
 275.00      14
 300.00      13
 325.00       3
 350.00       2
 375.00       3
 400.00       7
 425.00       4
 450.00       1
 475.00       0
 500.00       2
 525.00       2
 550.00       0
 575.00       1
 600.00       0
 625.00       1
 650.00       1
 675.00       0
 700.00       0
 725.00       0
 750.00       0
 775.00       0
 800.00       0
 825.00       1
 850.00       0
 875.00       0
 900.00       0
 925.00       0
 950.00       0
 975.00       0
1000.0        0
TOTAL      1045

6.4. SUMMARY AND DISCUSSION

Based upon the regression results presented in section 6.1 we may conclude that, given an accurate detailed design which specifies the major input and output functions for programs, good predictions of code size are possible, explaining 86%, 78% and 58% of the variance in code size for update, output and control programs respectively. Further, section 6.2 has shown that, given an accurate preliminary design document specifying major system functions and objects such as screens, reports and master files, it is possible to predict overall system size in terms of LOC. These measures of preliminary design size together explain 94% of the variance in code size. This study has not empirically demonstrated the relationship between requirements size and design size. This remains for future work.
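To make the system-level result concrete, the fitted equation of Table 6.7 can be evaluated for a preliminary design. The sketch below is illustrative only: the coefficients are those reported in Table 6.7 for site 1, while the function name predict_system_loc and the example design (10 master files, 40 screens, 20 reports) are hypothetical. As noted in section 6.2.2, the coefficients are specific to this data site and should not be extrapolated beyond the range of the observed systems.

```python
# Coefficients as reported in Table 6.7 (site 1, 26 systems).
INTERCEPT = -782.34
COEFF = {"masters": 152.42, "screens": 65.967, "reports": 43.652}


def predict_system_loc(masters, screens, reports):
    """Evaluate the fitted linear model: predicted total LOC for one system."""
    return (
        INTERCEPT
        + COEFF["masters"] * masters
        + COEFF["screens"] * screens
        + COEFF["reports"] * reports
    )


# Hypothetical preliminary design, chosen only to illustrate the calculation.
example_loc = predict_system_loc(masters=10, screens=40, reports=20)
print(f"predicted size: {example_loc:.0f} LOC")

# With no design objects the model returns the bare intercept, which is
# negative; this is why the intercept is not meaningful near zero.
empty_design_loc = predict_system_loc(masters=0, screens=0, reports=0)
print(f"empty design: {empty_design_loc:.2f} LOC")
```

For the hypothetical design the model gives roughly 4,254 LOC. The standard error of the regression in Table 6.7 is about 637 LOC, so any single prediction of this kind carries considerable uncertainty.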
The study has shown that measures of design size explain 80% of the variance in resource consumption for this data set. However, due to the lack of measures of code re-use, motivation or skill level, extension to new development work is unwarranted.

6.4.1. Generalizability

6.4.1.1. System Size

The systems examined in both data sites fall into the category of relatively small transaction processing systems. The results indicate that the linear models employed do quite well in explaining code size. However, our a priori notions of size and complexity lead us to expect that increases in requirements and design size increase complexity and hence should result in a non-linear increase in code size. Examination of the residuals in section 6.1.3 indicates that non-linear terms may be missing from the model. To address the possibility that code size increases non-linearly with increased size of design, and to make the sizing method more generalizable to other settings, second order terms for each of the independent variables were added to the model. This expanded model did not explain any more variance in code size than the simpler linear model. Further, various combinations of interaction terms were tried but did not improve the results. It is anticipated, however, that larger systems will require these second order terms to account for the effects of requirements complexity.

Another alternative for dealing with the non-linearity issue would be to perform a log transform on each of the variables.
However, in the data set in this study many observations on the independent variables had zero values, making the log transform infeasible. When a log transform was taken on just the dependent variable, LOC, and the regression models rerun, R² values of .59, .62, and .61 resulted for the update, output and control classes respectively. From a practical perspective, however, the log transformed model yields uninterpretable coefficients. This moves away from the objective of simplicity.

6.4.1.2. Model Validation

The generalizability of the regression model for predicting new system development has not been assessed. This would require longitudinal tracking of the system development process, which is beyond the scope of this work but may be addressed in future research.

Another level of generalization is when a sample of the population can be used to generalize to the remaining systems in the population. This requires that the model be validated against a second set of data. To address this issue a split-half double cross validation was performed on each class of program. This validation entailed the following procedure:

1. Randomly split the sample in half for each program class,
2. Use the first half of the sample to generate regression coefficients,
3. Use the coefficients to predict the second half of the sample,
4. Correlate the predicted values with the actual values,
5. Use the second half of the sample to generate a second set of regression coefficients,
6. Use these coefficients to predict the first half of the sample,
7. Correlate the predicted values with the actual values.

When the above procedure was performed on the data from site 1, correlation pairs of (.93, .90), (.80, .90) and (.75, .76) were obtained for the update, output, and control classes respectively, indicating a high degree of internal homogeneity in the data. However, this procedure was carried out primarily for methodological reasons rather than strict model validation. As the sample data represented the entire population of systems it was expected that cross validation would provide these results. In other settings it may be the case that only a sample from the population of systems can be measured. In these instances the possibility of sample bias may influence the regression coefficients. Some measure of this sampling error would be achieved with the above described double cross validation technique, although some would argue that adjusted R² or other shrinkage formulas would achieve the same result (see e.g. Murphy [1983]).

6.4.1.3. Reverse Engineering

Measures of systems design in this thesis were derived from the source code. The design units used are typical of business transaction processing systems. For systems occupying the remaining domain of Figure 2.2, such as scientific or function strong systems, it will likely be the case that design units other than screens, reports etc. would need to be reversed out of the software to reflect design size. Here we would have to assess the outputs from the system analysis and design tools used in each application area.
For example, if object oriented design were used to model a system then the design units would have to reflect that orientation.

Moving from design specification back to requirements specification is a more difficult problem. Semantic inferences from the design specification would be required, such as aggregating a number of processes to represent a conceptual input or output event. Further, the identification of entities and relationships in the code would likely require domain knowledge to be effective. Extensive validation between human analysts and the machine reversed requirements would be needed.

6.4.1.4. Forward Engineering

The usefulness of this research approach is limited in application areas where formalized design specifications exist and forward engineering compilers are available. For example, communication protocols can be specified in terms of state transition diagrams and their high level design specified in languages such as FDT ESTELLE. From a design specification modelled in ESTELLE, the ESTELLE compiler can generate PASCAL source code. Here the effort to go from design to implementation is minimal. The bulk of the effort expended is in going from requirements to design specification. This situation may eventually arise in all application areas as more powerful languages and tools are developed.

6.4.1.5. Hours Data

Several limitations exist in being able to relate hours to the production of software. These come from two distinct sources:

1. Reusable code: Without actual process tracing it is difficult to know what portion of the code was generated from scratch and what portion was imported from similar programs;
2. Personnel differences: Although the results show that programmer style and learning effects do not significantly affect the size relationship between design and code, this study cannot demonstrate that efficiency differences do or do not exist. While past conventional wisdom suggests that programmer efficiency can vary by a factor of 10, more data is needed to address this.

Additionally, as no measures were taken of personnel motivation or actual skill level it would be premature to extend the relationship between hours and code to new development even within the same environment. However, the methodology is now in place to obtain these measures in the future.

6.4.1.6. Language

An issue which is legitimately raised is whether the method and measures used in this research are specific to FOCUS, are generalizable to other 4GL languages, or are generalizable across language generations. While it is premature to say anything conclusive in this area, the issue may be addressed by comparing findings by other researchers and judging whether they are similar enough to entertain the argument that the model and method demonstrated in this thesis have the potential of being generalizable across languages.
Halstead [1977] first theorized and then empirically showed that an interesting relationship exists between the length of a program, measured in the number of tokens, and the number of unique operators and operands used to write the program. He determined that a program's length, $N$, could be estimated by $\hat{N}$, based on the following formula:

$$\hat{N} = n_1 \log_2 n_1 + n_2 \log_2 n_2$$

where:

$n_1$ = number of unique operators (language keywords)
$n_2$ = number of unique operands (variables)

This surprising relationship is similar to being able to predict the length of a story written in English given only the number of unique verbs and unique nouns in the story. The correlation found between these two measures for a sample of published PL/1 and Fortran programs, primarily scientific, varied between .83 and .89 depending on who did the research and the language used. If a similar relationship is found in another language (presumably of a higher level), from an independent sample of programs written for a different application area, then something more profound and more generalizable is being observed.

The correlation between $N$ and $\hat{N}$ for the 703 programs from data site 1 is .89 using simple linear regression. However, what is more revealing is that Halstead's data shows a curvilinear relationship as program size increases. A plot of the 703 programs shows a similar phenomenon. Perhaps a more meaningful and powerful comparison would be to compare Halstead's result at the system level instead of at the program level.
The rationale for this is that the programs Halstead looked at were essentially self contained, defined functions. Similarly, it seems reasonable to consider entire systems as self contained, stable units with defined interfaces. A separate analysis program was written to aggregate Halstead's metrics to the system level. This entailed integrating all of the variables across all programs in each system to arrive at unique variable usage. For example, if VARX was used in 5 different programs it should only be counted as 1 unique occurrence at the system level. For the 26 systems at data site 1 the correlation between $N$ and $\hat{N}$ is .916 and the scatter plot, shown in Figure 6.4, shows similar behaviour. Regressing one measure on the other yields an $R^2$ of .84 with the constant term not significantly different from zero.

These results indicate that computer languages are much more similar than the ordinal scale of "generations" would lead us to believe. They show that programs can be broken down and described by their operators and operands.

What the previous discussion does not address is whether the Function Point like measures are language independent. Here, the argument is much easier to demonstrate. Function point measures proposed by Albrecht [1979], DeMarco [1982], Jones [1986] and others were and are based on 3rd generation languages, yet they were developed to be used at the design phase, which should be independent of the implementation language. This research has now extended this use into a 4GL language.
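A minimal sketch of Halstead's length estimator and of the system-level aggregation described above, written in Python for illustration only. It is not the separate analysis program used in the study: the function names, the tuple layout and the sample token names are hypothetical, and taking system length as the sum of program lengths is an assumption that the text does not confirm.

```python
import math


def estimated_length(n1: int, n2: int) -> float:
    """Halstead's length estimate: n1*log2(n1) + n2*log2(n2).

    n1 is the number of unique operators, n2 the number of unique operands.
    A count of zero contributes nothing (0*log2(0) is taken as 0).
    """
    return sum((n * math.log2(n) for n in (n1, n2) if n > 0), 0.0)


def system_level_counts(programs):
    """Aggregate per-program measures to the system level.

    `programs` is an iterable of (operators, operands, length) tuples
    (a hypothetical layout): two sets of unique token names and the
    program's token count N. A name used in several programs is counted
    once for the system. System length is taken as the sum of program
    lengths, which is an assumption of this sketch.
    """
    operators, operands, length = set(), set(), 0
    for program_operators, program_operands, program_length in programs:
        operators |= set(program_operators)
        operands |= set(program_operands)
        length += program_length
    return len(operators), len(operands), length


# Hypothetical two-program system; VARX is shared by both programs.
example_system = [
    ({"TABLE", "PRINT", "END"}, {"VARX", "SALES"}, 40),
    ({"TABLE", "SUM", "END"}, {"VARX", "COST"}, 35),
]
n1, n2, actual_length = system_level_counts(example_system)
print(n1, n2, actual_length, round(estimated_length(n1, n2), 2))
```

Because VARX appears in both hypothetical programs, it contributes a single unique operand at the system level, in line with the VARX example in the text.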
The data elements counted by the Analyser are linked to requirement specifications and are contained within a program by a design decision. Screen images are the result of design decisions to deal both with data entry and information extraction requirements. Master files are the result of requirements for retained data on some real world entity. Nowhere is the implementation language considered in these units. It may be the case, however, that a design orientation such as data flow may be implemented more easily in certain languages relative to others. Taking this one step further, it may also be the case that certain application areas, in this instance transaction processing systems, may naturally take a data flow approach to design. The issue is now removed from considerations of language dependence to a more substantial issue of generalizability across application areas and across design orientations. Future research could address these issues.

[Figure 6.4: Calculated vs. Actual length. Scatter plot of NHAT (vertical axis) against LENGTH (horizontal axis), N = 26 out of 26.]

6.4.2. Maintenance

In the systems analyzed the main source of system maintenance effort was found to come from users requesting changes or additions to the data elements within the system files. This information was found in the requirements definition and change log of completed systems. These data element changes led to modifications in the data entry routines and in report layouts.
Of the systems analyzed, maintenance effort resulted both from new field definition and from range or integrity checks on existing fields. While the actual value of the maintenance effort attributable to these changes is beyond the scope of this thesis, one retrospective metric for requirements volatility can be tentatively offered as:

$$\text{requirements volatility} = \frac{\text{Number of data element changes}}{\text{Total number of data elements}}$$

A confidence interval may be constructed around the mean requirements volatility from completed projects. This will give project managers some idea of the potential for requirements changes in their environment as well as the impact of the changes on resource consumption.

An observation from this field research is that there is ambiguity between what people consider to be "projects" and "systems". Projects should be considered as a measure of resource consumption while information systems are the products of one or more projects. Projects have defined life cycles but it is not necessarily the case that information systems do. Information systems are objects that continue to be used, to grow and be maintained. The problem is that information systems are not stable units. Information systems can be merged with others, or can be redefined. Conceptually, information systems should continue to exist and change as long as the parent organization does. However, specific implementations of information systems may come and go with changes in technology or other environmental change.
Hence the whole concept of maintenance may have to be rethought to recognize that systems continue to grow and be modified to meet continual environmental change. The model presented in Figure 3.3 assumes the input and output vectors are stable. However, if these vectors change, i.e. new events occur, then a system must change in order to remain ecologically viable. These new events can be considered as second order environmental change. First order change is that which system software is designed to deal with, i.e. the changes within and among the entities and relationships in the retained data model. Second order change involves the definition of new entities, new relationships and new events.

6.4.3. System Decomposition

Average program length in Site 1 is 85 LOC with a lognormal appearance to the distribution. The bulk of the programs appear to be between 40 and 100 LOC. This indicates that, in general, design proceeds down to the level where a unit may be implemented in 1-2 pages of code. On rare occasions programs become quite lengthy, to a maximum of nearly 1000 LOC. This may be for one of several reasons. First, unbalanced design may lead to a programmer being faced with a piece too large to be easily implemented. The converse side to this is that for some kinds of update or reporting situations the complexity of the task quickly rises beyond the language's ability. This also may be due to the design tradeoffs made in the database structure.
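The retrospective requirements volatility ratio offered in Section 6.4.2, and a confidence interval around its mean across completed projects, can be sketched as follows. This is an illustration under stated assumptions rather than a procedure taken from the study: the function names and sample figures are hypothetical, the interval is the usual t-based interval for a mean, and the critical t value is supplied by the caller from a t table for n - 1 degrees of freedom.

```python
import math
import statistics


def requirements_volatility(changed_elements: int, total_elements: int) -> float:
    """Number of data element changes divided by total number of data elements."""
    if total_elements <= 0:
        raise ValueError("total_elements must be positive")
    return changed_elements / total_elements


def mean_confidence_interval(values, t_critical: float):
    """Return (low, high) for the mean of `values`.

    Uses mean +/- t_critical * s / sqrt(n), where s is the sample standard
    deviation. `t_critical` must be looked up by the caller for n - 1
    degrees of freedom and the desired confidence level.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two completed projects are needed")
    mean = statistics.mean(values)
    half_width = t_critical * statistics.stdev(values) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical (changes, total elements) counts for three completed projects.
completed_projects = [(5, 50), (12, 60), (9, 30)]
volatilities = [requirements_volatility(c, t) for c, t in completed_projects]
# 4.303 is the two-sided 95% critical t value for 2 degrees of freedom.
low, high = mean_confidence_interval(volatilities, t_critical=4.303)
print(volatilities, low, high)
```

With so few projects the interval is very wide, which is consistent with the point that a history of completed projects has to accumulate before such a baseline becomes useful.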
An important observation and conclusion can be drawn from this study. If we assume that a given requirements specification may be implemented in many different ways then we would expect to see no consistent relationship, across systems, among the requirements, the design, and the size of code needed to implement those requirements. Each person or system builder would have their own way of solving a given specification. If, on the other hand, size is predictable from requirements and programmer differences are not large then this strongly suggests that human beings follow a reasonably well structured process in problem solving. They may be more or less efficient in carrying out the process, but the process is similar. With respect to computer programming, this phenomenon was first observed and articulated by Halstead [1977] p. 15:

    This finding gains significance when it is remembered that, for every way in which an algorithm can be implemented in agreement with equation (2.7)²⁷, there are an infinite number of ways in which an equivalent version could be written. This suggests that the human brain obeys a more rigid set of rules than it has been aware of...

²⁷ $\hat{N} = n_1 \log_2 n_1 + n_2 \log_2 n_2$

CHAPTER 7. SUMMARY AND RESEARCH DIRECTIONS

The purpose of this chapter is to summarize the contribution of the research and identify fruitful areas for future investigation, both those extending directly from the thesis and those raised while conducting the study.

7.1. THESIS SUMMARY

The first issue raised by this thesis was the problem of system sizing.
The central problem in estimating resource consumption for information system development is to determine the size of the information system as early in the life cycle as possible. This thesis has argued that the size of requirements transforms into size of design, which in turn transforms into size of code, and that properties of system requirements remain structurally isomorphic through design into implementable code.

Second, while current estimating models exist, their generalizability has not been demonstrated. It is unlikely that these models can be general without individual site calibration, due to wide ranging environmental differences of technology, experience, skills and application area. For a realistic estimating environment to exist, knowledge of past system development efforts is crucial. To achieve this it is insufficient to have the knowledge resident in project managers' heads, as they may either forget or change jobs; instead, a methodology is needed which captures and maintains a database of system metrics.

Third, a prototype research instrument has been implemented to resolve the problems of computational tractability and measurement reliability. While the regression coefficients found in Chapter 6 are not generalizable beyond the data, the methodology for data capture and individual site calibration is.
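As an illustration of what individual site calibration involves, the sketch below fits a least-squares line from a site's own completed programs. It is a simplified, hypothetical example and not the regression model of Chapter 6: it assumes a single size predictor (for instance a count of data elements per program), and the function name and sample values are invented for the example.

```python
def calibrate_site(design_sizes, code_sizes):
    """Fit code_size = intercept + slope * design_size by least squares.

    Returns (intercept, slope, r_squared) for one site's completed programs.
    """
    n = len(design_sizes)
    if n != len(code_sizes) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(design_sizes) / n
    mean_y = sum(code_sizes) / n
    sxx = sum((x - mean_x) ** 2 for x in design_sizes)
    syy = sum((y - mean_y) ** 2 for y in code_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(design_sizes, code_sizes))
    if sxx == 0:
        raise ValueError("design sizes must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R^2 is the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return intercept, slope, r_squared


# Hypothetical history: design size counts and resulting lines of code.
history_design = [1, 2, 3, 4]
history_loc = [12, 22, 32, 42]
intercept, slope, r_squared = calibrate_site(history_design, history_loc)
# Predicted size of a new program with a design size of 5 at this site.
predicted_loc = intercept + slope * 5
print(intercept, slope, r_squared, predicted_loc)
```

The coefficients obtained this way describe only the site whose history produced them, which is why each new environment would need its own fit.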
As it stands, with a few minor enhancements, the automated Code Analyser can be applied to any FOCUS development environment and used to construct regression coefficients for each MIS department.

Fourth, the model and tool have been used to measure two development environments. These findings provide an initial validation of the model and tool, showing that the parsimonious model can be used to explain program and system size within a limited data set. The specific system development environments have been benchmarked, which will allow for longitudinal tracking of individual system changes as well as overall productivity changes that may result from new development methods and technologies.

7.1.1. Empirical Limitations

Naturally, there are a large number of influences on the development effort outside the scope of this empirical work. This thesis recognizes their existence but cannot hope to measure or control all of them. The trade-off made by conducting this field research has been between construct validity and internal validity. The measures that are made of requirements size have high construct validity. The field setting introduces threats to internal validity by the existence of possible confounding variables, i.e. other factors influencing effort which cannot be measured or controlled. Each of the previous studies, identified in Chapter 2, found some common effects and some unique effects.
It is likely that each new investigation will also find common as well as unique effects, reflecting individual environmental differences. While these seemingly unique effects may be considered as such, they may in fact be part of a common factor which has yet to be identified. However, there are simply insufficient studies to begin to converge on an exhaustive set. Rather than identify all factors, this research has chosen to hold the tools and development environment constant, provide analytic control for the personnel involved and focus on what is believed to be the principal driving force behind effort, namely requirements size. By establishing a stable, causal linkage between requirements, code, and effort it will then be possible to move into other development environments and have a solid ground on which to gather data.

7.2. DIRECT EXTENSIONS

The first objective is to extend this research approach into a larger sample of systems and companies, initially within the same language and then in others. The ability to understand the history of a system development environment is an important management goal. The knowledge to construct reasonable estimates of resource consumption is resident in the minds of the systems development staff. These people know the history of system development, the skills each person holds and special circumstances surrounding each system developed. Unfortunately, when senior systems personnel leave, their knowledge leaves with them.
Even if staff turnover is low, people have selective recall and imperfect evaluation of prior development efforts. However, part of this knowledge base is also resident in the completed systems. They possess the culmination of all effort expended to build them. This thesis has taken the position that analysis of these systems can contribute to management's understanding of their own development environment. The reasons that this is not yet common practice include:

1. History takes time to develop. Only within the past decade have information systems become widespread. Now, companies possess portfolios of software built in-house. This software can be considered as a database of systems containing valuable information;

2. Manual analysis of software is infeasible. With the use of automated tools, such as the one designed and developed in this thesis, collection of data on past and current systems becomes practical.

Information about past development exists in software and can be extracted and used by management to improve project control:

1. Knowledge about how different programmers and designers have built different systems is useful for resource allocation. It makes sense to assign people to tasks they are best suited to perform.

2. Automated source code analysis allows complex programs to be identified. Maintenance effort will likely result from these programs.

3. Program length as a function of analysis and design information is useful for predicting new development.

4. Calibration of development environments: The objective here is to provide a firm with an understanding of its own people and tool use. Calibration is necessary so that a baseline of productivity can be established. It is anticipated that direct calibration will be required for each firm until a large enough data set is available for generalizations into new environments based upon attributes of those environments.

5. After sufficient external validity is established it will be feasible to make normative statements about the estimating and system development process, i.e. objects and interactions which are found to consistently affect development can be identified early and managed accordingly.

Extending this method into other languages would require expanding the Code Analyser. Modifications to the low level parsing routines would be required so that languages such as Cobol, PL/1, Fortran or 4th generation languages other than FOCUS could be recognized. This would entail 1) replacing the language keyword dictionary, 2) changing the parsing delimiters and syntax assumptions, and 3) adding recognition logic for major code chunks. Further, as the FOCUS language evolves, enhancements to the existing Analyser will be needed.

7.3. EXPERT BEHAVIOUR

One fruitful area of investigation is the process of estimate construction by practising experts, as the bulk of estimating work is done from experience.
Experts in the field are doing quite well if an estimate of effort comes within several hundred percent of the actual effort expended, considering the potential range of software development costs. What this points to is that project managers base their estimates on characteristics of system size and that these characteristics should be measurable. Here we may conjecture that estimators actually perform some mental transformation from the system's requirements into some units (such as LOC) and then apply a productivity rule of thumb. Hence it would be fruitful to engage in observation of practising system developers in a real system feasibility decision situation.

Within this category of research several different investigation strategies are feasible. The objective of observing practising system developers in an estimating situation is to understand their problem solving behaviour and the techniques used to:

a) understand the scope of a system;
b) carry out user-developer interaction, information elicitation and issue clarification;
c) understand and process environmental signals.

1. One approach in this area would be to create an example system requirements specification and ask 20 practising analysts to derive an estimate of labour resources needed to implement the example system. The verbal protocols of the analysts could be used to determine which aspects of the problem domain caused more difficulty. The limitations of this approach are that i) the problem domain is artificial, and ii) the system must be very simple, i.e. analysts must be able to construct an initial design within a few hours.

2. A second approach would be to move into a live situation. For this approach to be successful it would be necessary to obtain the cooperation of a company about to embark upon a system development project. Ideally, user and developer teams would be brought together into a Group Decision Support situation with the stated objectives of i) capturing system requirements, and ii) establishing a cost estimate. The GDSS environment would allow for online capture of estimating behaviour. The Delphi technique could be used to focus the process and converge on agreement. An integrating extension of this approach would be to allow analysts access to their database of past projects in order to obtain accurate cost data.

7.4. MODELLING

Integration of System Static and System Dynamic modelling: Within the area of conceptual modelling, improvements are needed in the way in which systems are represented conceptually. This thesis has argued that measures of system requirements are actually measures of complexity. It is this complexity which induces effort in humans to understand, decompose, structure, and otherwise translate into a working machine artifact. The methods we use to model a system fundamentally affect our ability to structure complexity.
Further work is needed to incorporate all dimensions of complexity within a unified methodology. This may entail introducing aspects of object oriented analysis and design, modelling the dynamic behaviour (events) of those objects or groups of objects (entities), all within a static structural model of the system.

7.5. MANAGEMENT

A central problem facing the software development industry is the lack of consistent record keeping on system development projects. Even when records are kept, a second problem originates from the kinds of records kept. If data on development projects are collected, it is invariably not the right kind of data to construct a meaningful baseline for any estimating model. The problem stems from the fact that data is collected with individual project management and control as the objective, and not towards gaining a broader perspective by collecting data with a research framework in mind. In order to obtain a research perspective, data must be collected on the units of development work so that these units may be compared across systems. Many project management systems allow developers to record hours spent on a particular task within their system development methodology. This approach, however, does not identify functional aspects of the software. For a meaningful estimating environment to exist, the units which form the basis for estimating must be the units against which effort is reported.
Essentially, the problem is that project management systems attempt to control the process of system development but not the product. For example, if the estimating approach uses Function Points to size a project then hours must be reported against the delivered function. Likewise, if entities, relationships and events are used to form a project estimate then work should be reported as effort is expended analyzing, designing, and implementing those system objects. Additionally, specifics about the activities performed should be noted so that conditions causing further complexity can be examined. Finally, specifics on tool use must be reported for each system development in order to test for productivity gains (or losses) accompanying the installation of new development technologies. Reporting on this basis is crucial for project control.

7.6. CLOSING REMARKS

The measurement model and method developed in this thesis assume that a valid representation of system requirements is available. They do not include such organizational factors as the political decision making process, cross departmental data flows, stability of the environment of the parent organization, and structural dynamics, to name a few. It is assumed for this initial study that these factors influence the requirements definition and, mutatis mutandis, result in requirements being more or less complex.
Much more work is needed to be able to estimate system size based on system objectives, organizational context and organizational complexity in general. This research represents one small step in that direction.

REFERENCES

Albrecht, Allan J., "Measuring Application Development Productivity," Proceedings of the IBM Applications Development Symposium, GUIDE/SHARE, October 1979, pp. 83-92.

Albrecht, Allan J. and John Gaffney Jr., "Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation," IEEE Transactions on Software Engineering, November 1983, SE-9(6), pp. 639-648.

Ashby, Ross W., An Introduction to Cybernetics, John Wiley & Sons Inc., New York, 1956.

Atzeni, P. et al., "INCOD: A System for Conceptual Design of Data and Transactions in the Entity-Relationship Model," in Entity-Relationship Approach to Information Modelling and Analysis, edited by P. P. Chen, North-Holland, 1983.

Bailey, J. W., and V. R. Basili, "A Meta-Model for Software Development Resource Expenditures," Proceedings of the Fifth International Conference on Software Engineering, IEEE/ACM/NBS, March 1981, pp. 107-116.

Basili, Victor R., Models and Metrics for Software Management and Engineering: Tutorial, IEEE Computer Society Press, 1980.

Basili, Victor R., and H. Dieter Rombach, "The TAME Project: Towards Improvement-Oriented Software Environments," IEEE Transactions on Software Engineering, June 1988, pp. 758-773.

Behrens, Charles A., "Measuring the Productivity of Computer Systems Development Activities with Function Points," IEEE Transactions on Software Engineering, November 1983, Vol. SE-9, No. 6, pp. 648-652.

Benbasat, Izak, and Iris Vessey, "Programmer and Analyst Time/Cost Estimation," MIS Quarterly, June 1980, pp. 31-42.

Boehm, Barry W., Software Engineering Economics, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1981.

Boehm, Barry W., "Software Engineering Economics," IEEE Transactions on Software Engineering, January 1984, pp. 4-21.

Borgida, Alexander, Sol Greenspan and John Mylopoulos, "Knowledge Representation as the Basis For Requirements Specification," IEEE Computer, April 1985, pp. 82-90.

Brooks, Frederick P., Jr., The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley Publishing Company, Inc., Philippines, 1975.

Brooks, W. D., "Software technology payoff - some statistical evidence," The Journal of Systems and Software, Vol. 2, 1981, pp. 3-9.

Bunge, M., Treatise on Basic Philosophy: Ontology II: A World of Systems, Reidel, Boston, 1977.

Burton, B. J., "Manpower Estimating for Systems Projects," Journal of Systems Management, January 1975, pp. 29-33.

Chen, Peter, "Entity relationship Diagrams and English Sentence Structure," in Entity Relationship Approach to Systems Analysis and Design, edited by P. P. Chen, North-Holland, 1980.

Chen, Peter, "The Entity Relationship Model: Towards a Unified View of Data," ACM Transactions on Database Systems, Vol. 1, No. 1, June 1976, pp. 9-36.

Conte, S. D., H. E. Dunsmore and V. Y. Shen, Software Engineering Metrics and Models, The Benjamin/Cummings Publishing Company, Inc., Menlo Park, CA, 1986.

Chrysler, Earl, "Improved Management of Information Systems Development," Journal of Systems Management, March 1980, pp. 6-13.

Chrysler, Earl, "Some Basic Determinants of Computer Programming Productivity," Communications of the ACM, June 1978, pp. 472-483.

Davis, Gordon B., "Strategies for information requirements determination," IBM Systems Journal, Vol. 21, No. 1, 1982, pp. 4-30.

DeMarco, T., Structured Analysis and System Specification, Yourdon Press, New York, 1978.

DeMarco, T., Controlling Software Projects, Prentice-Hall, 1982.

Essink, Leo J. B., "A Modelling Approach to Information System Development," in Information System Design Methodologies: Improving the Practice, T. W. Olle, H. G. Sol and A. A. Verrijn-Stuart (editors), Elsevier Science Publishers (North-Holland), IFIP, 1986.

Fitzsimmons, Ann, and Tom Love, "A Review and Evaluation of Software Science," Computing Surveys, March 1978, pp. 4-18.

Freiman, Frank R., and Robert E. Park, "The PRICE Software Cost Model," Proceedings of the National Aerospace Electronics Conference, IEEE, 1979.

Gane, Chris, and Trish Sarson, Structured Systems Analysis: Tools and Techniques, Prentice-Hall, New Jersey, 1979.

Gaffney, John E. Jr., Robert Goldberg and Linda D. Misek-Falkoff, "Score82 summary," ACM Sigmetrics: Performance Evaluation Review, Winter 1984-1985, Vol. 12, No. 4, pp. 4-12.

Gaffney, John E. Jr., "The Impact on Software Development Costs of Using HOL's," IEEE Transactions on Software Engineering, March 1986, pp. 496-499.

Golden, John R., James R. Mueller and Barbara Anselm, "Software Cost Estimating: Craft or Witchcraft," DATA BASE, Spring 1981, pp. 12-14.

Halstead, M. H., Elements of Software Science, Elsevier, North-Holland, New York, 1977.

Harel, Ellie C. and Ephraim R. McLean, "The Effects of Using Nonprocedural Computer Language on Programmer Productivity," MIS Quarterly, June 1985, pp. 109-120.

Henry, Sallie, and Dennis Kafura, "Software Structure Metrics Based On Information Flow," IEEE Transactions on Software Engineering, September 1981, pp. 510-518.

Jackson, M., Principles of Program Design, Academic Press, London, 1975.

Jones, Capers, Programming Productivity, McGraw-Hill, Inc., 1986.

Kemerer, Chris F., "An Empirical Validation of Software Cost Estimation Models," GSIA Working Paper #41-85-86, Carnegie-Mellon University, Pittsburgh, PA, 1986.

Kemerer, Chris F., "An Empirical Validation of Software Cost Estimation Models," Communications of the ACM, May 1987.

Kitchenham, B. A., and N. R. Taylor, "Software Cost Models," ICL Technical Journal, May 1984, pp. 73-102.

Kottemann, Jeffrey E., and Benn R. Konsynski, "Complexity Assessment: A Design and Management Tool For Information System Development," Information Systems, 1984.

Li, H. F. and W. K. Cheung, "An Empirical Study of Software Metrics," IEEE Transactions on Software Engineering, June 1987.
M c C a b e , T h o m a s J . , " A C o m p l e x i t y M e a s u r e , " IEEE Transact ions o n Sof tware Eng ineer ing , D e c e m b e r 1976, V o l . 2, pp . 308 -320 . M c K e e n , J . , A n Empir ica l Invest igat ion of the Process and P r o d u c t of A p p l i c a t i o n System  D e v e l o p m e n t , U n p u b l i s h e d Ph .D . Thesis , Univers i ty of M i n n e s o t a , M a r c h 1981 . M c K e e n , Jim, " S u c c e s s f u l D e v e l o p m e n t Strategies for Bus iness A p p l i c a t i o n Sys tems , " MIS  Quarter ly , S e p t e m b e r 1983, p p . 47 -65 . M is ra , Santosh K. and Paul J. Jalics, " T h i r d - G e n e r a t i o n versus Four th G e n e r a t i o n Sof tware D e v e l o p m e n t " , IEEE Software , July 1988, pp . 8-14. M o h a n t y , S iba N . , "So f tware C o s t Est imat ion: Present and Future" , So f tware -Pract ice and  Exper ience , V o l . 11, 1981, pp . 103 -121 . / 161 M u r p h y , Kevin R., " F o o l i n g Yourse l f w i th C r o s s - V a l i d a t i o n : S ingle S a m p l e D e s i g n s , " Pe rsonne l P s y c h o l o g y , V o l . 36, 1983, pp . 111 -118 . N a u m a n n , J.D. and M . Jenkins, " P r o t o t y p i n g : The N e w Parad igm for Systems D e v e l o p m e n t " , MIS Quarter ly , V o l . 6, N o . 3, S e p t e m b e r 1982, pp . 29-44. Orr , K. S t ructured Systems D e v e l o p m e n t , Y o u r d o n Press, N e w York, 1977. Parnas, Dav id , " O n the Cr i ter ia to be U s e d in D e c o m p o s i n g Systems into M o d u l e s , " C o m m u n i c a t i o n s of the A C M . D e c e m b e r 1972. Parr, F. N . , " A n Al ternat ive t o the Rayleigh C u r v e M o d e l for Sof tware D e v e l o p m e n t Effort ," IEEE Transact ions o n Sof tware Eng ineer ing , M a y 1980, pp . 291 -296 . Pressman, R. ,Software Eng ineer ing : A Pract i t ioners A p p r o a c h , S e c o n d Ed i t ion , M c C r a w Hi l l , 1987. Putnam, Larry H. 
, " A Genera l Empir ical S o l u t i o n to the M a c r o Sof tware Siz ing and Est imating P r o b l e m , " IEEE Transact ions o n Sof tware Eng ineer ing , July 1978, V o l . SE-4, N o . 4, pp . 345 -361 . Pu tnam, Larry H. and A n n F i t zs immons , "Est imat ing Sof tware C o s t s , " Datamat ion . S e p t e m b e r , O c t o b e r , N o v e m b e r 1979. V o l . 25, N o . 10 ,11,12. Ramamoor thy , C. V. , A tu l Prakash, W e i - T e k Tsai, "So f tware Eng ineer ing : P r o b l e m s and Pe rspect i ves" , IEEE C o m p u t e r , O c t o b e r 1984, pp . 191 -209 . Rub in , H o w a r d A. , " M a c r o - E s t i m a t i o n of Sof tware D e v e l o p m e n t Parameters: The E S T I M A C S S y s t e m , " SOFTFAIR - Software D e v e l o p m e n t : T o o l s , T e c h n i q u e s , and Alternat ives,  IEEE, July 1983, pp . 109 -118 . Rubin , H o w a r d A. , "The Art and S c i e n c e of Sofware Est imat ion : Fifth G e n e r a t i o n Est imators" , P r o c e e d i n g s of the 7th A n n u a l ISPA C o n f e r e n c e , V o l . 5, June 1985. Rubin , H o w a r d A. , " A C o m p a r i s o n of C o s t Est imat ion T o o l s " , P r o c e e d i n g s of the 8th  International C o n f e r e n c e o n Sof tware Eng ineer ing , IEEE C o m p u t e r Society , A u g u s t 28 -30 , 1985, L o n d o n , England. Sayler, John H o w a r d , "The Structure and C o m p l e x i t y of Sof tware System D e s i g n s " , P h D D isser tat ion , The Univers i ty of M i c h i g a n , 1985. S c h w a b , D o n a l d P., " C o n s t r u c t Val id i ty o n Organ i za t iona l Behav io r " , in Research in  O rgan i za t iona l Behavior , JAI Press Inc., V o l 2, 1980, pp . 3 -43. Shen , V . Y. , S. D. C o n t e , and H. E. D u n s m o r e , "So f tware S c i e n c e Revis i ted : A Cr i t ica l Analys is of the Theory and Empir ical S u p p o r t " , IEEE Transact ions o n Sof tware  Eng ineer ing , SE-9, 2, M a r c h 1983, pp . 155 -165 . S i m o n , H . A. 
, The S c i e n c e s of the Art i f ic ial . MIT Press, C a m b r i d g e Mass . , 1969. / 162 Software , P roduct iv i ty Research Inc., So f tware Product iv i ty , M a r c h / A p r i l 1986. V o l . 1, N o . 1, pp . 1-7. S P Q R / 2 0 , S P Q R / 2 0 Users G u i d e . So f tware Product iv i ty Research Inc., January 1986. S teward , D o n a l d V. , Software Eng ineer ing w i t h Systems Analysis and D e s i g n , B r o o k s / C o l e Pub l i sh ing C o . , 1987, pp . 143 -163 . S t roud , John M . , " T h e Fine Structure of P s y c h o l o g i c a l T i m e " , in In format ion T h e o r y in  Psycho logy , e d i t e d by Henry Quast le r , The Free Press, G l e n c o , Ill inois. 1954. Symons , Char les R., " F u n c t i o n Po int Analysis : Di f f icult ies and I m p r o v e m e n t s " , IEEE  Transact ions o n Software Eng ineer ing , January 1988. T e i c h r o e w , Dan ie l , Fernao G e r m a n o and Luca Silva, " A p p l i c a t i o n s of the Ent i ty -Relat ionship A p p r o a c h " , in Ent i ty -Relat ionship A p p r o a c h to In fomat ion M o d e l l i n g and Analysis, P. P. C h e n edi tor , Elsevier Sc ience Publ ishers B. V. ( N o r t h - H o l l a n d ) , 1983, pp . 1-17. Toe l lner , John , "P ro jec t Est imat ing" , Journal of Systems M a n a g e m e n t , M a y 1977, p p . 6 - 9 . v o n Bertalanffy, L. Genera l Systems Theory , G e o r g e Braziller, N e w York, 1968. Verner , June and Graham Tate, "Est imat ing Size and Effort in Four th G e n e r a t i o n D e v e l o p m e n t " , IEEE Software, July 1988, pp . 15 -22 . V o s b u r g h , J . , et. al . , "P roduct i v i t y Factors and P r o g r a m m i n g Env i ronments , " P r o c e e d i n g s  of the Seventh International C o n f e r e n c e o n Sof tware Eng ineer ing , 1984, pp . 143 -152 . W a l s t o n , C . E., and C . P. Felix, " A M e t h o d of P r o g r a m m i n g M e a s u r e m e n t and Est imat ion , " I B M Systems Journal , 1977, V o l . 16, N o . 1, pp . 
54 -73 . W a n d , Y. and R. W e b e r , "Forma l i za t ion of In format ion Systems D e s i g n " , U . B . C . W o r k i n g Paper, M a y 1987. W a n d , Y. and R. W e b e r , " A n O n t o l o g i c a l Analysis of S o m e Fundamenta l In format ion System C o n c e p t s , " P r o c e e d i n g s , of the 9th A n n u a l International C o n f e r e n c e o n  In format ion Systems, M i n n e a p o l i s , U.S.A. , D e c e m b e r 1988. Warn ier , J. Log ica l C o n s t r u c t i o n of Programs , Van N o s t r a n d Re inho ld , N e w York , 1974. W a n g , A n d r e w S h e n g - Y e n , "The Est imat ion of Sof tware Size and Effort: A n A p p r o a c h Based o n the Evo lut ion of Sof tware M e t r i c s " , P h D D isser tat ion , Pu rdue Universi ty , 1984. W o l v e r t o n , Ray W . , "The C o s t of D e v e l o p i n g Large-Scale So f tware , " IEEE Transact ions o n  C o m p u t e r s , June 1974, V o l . c - 23 , N o . 6, pp . 615 -636 . / 163 Wr ig ley , C . D., and A. S. Dexter , "So f tware D e v e l o p m e n t Est imat ion M o d e l s : A Rev iew and C r i t i q u e " , P r o c e e d i n g s of the Admin is t ra t ive Sc iences A s s o c i a t i o n of Canada :  MIS D iv i s ion , T o r o n t o , C a n a d a , June 1987. Wr ig ley , C . D., and A. S. Dexter , " A M o d e l For Est imat ing In format ion System Requ i rements Size: Prel iminary F ind ings" , P r o c e e d i n g s of the 9th A n n u a l  International C o n f e r e n c e o n In format ion Systems, M i n n e a p o l i s , U.S.A. , D e c e m b e r 1988. Wr ig ley , C . D., and A. S. Dexter , "Towards A n A u t o m a t e d T o o l For In format ion System Requ i rements S i z i n g " , W o r k i n g Paper # M I S - 0 1 1 - 8 8 , The Univers i ty of British C o l u m b i a , 1988. 
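The operand coding scheme listed in Appendix A groups its two-digit codes into ranges, one range per operand category. As a minimal sketch of how such a range lookup might work, the fragment below maps a code to its category. It is written in Python purely for illustration and is not the FOCUS code of the analyser; the names `CATEGORY_RANGES` and `operand_category`, and the treatment of codes that fall outside the listed ranges, are assumptions.

```python
# Category ranges as listed in the Appendix A operand coding scheme.
# The table name and function name are hypothetical, chosen for this sketch.
CATEGORY_RANGES = [
    (0, 19, "SCREEN I/O"),
    (20, 39, "REPORT OUTPUT"),
    (40, 49, "FILENAMES"),
    (50, 59, "INTERNAL DATA MANIPULATION"),
    (60, 69, "LOGICAL"),
]


def operand_category(code: int) -> str:
    """Return the Appendix A category name for a two-digit operand code.

    Code 99 is UNCLASSIFIED in the scheme; any other code outside the
    listed ranges is also reported as UNCLASSIFIED (an assumption).
    """
    for low, high, name in CATEGORY_RANGES:
        if low <= code <= high:
            return name
    return "UNCLASSIFIED"
```

For example, code 53 (string constant) falls in the 50-59 range and is reported as internal data manipulation, while code 99 is reported as unclassified.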
APPENDIX A: OPERAND CODING SCHEME

OPERAND CODING SCHEME: CLIVE WRIGLEY AUG.08.1988

                                          EXAMPLES
00-19 SCREEN I/O:
01 - DATA ENTRY VARIABLE                  <VARNAME
02 - DATABASE TURNAROUND VARIABLE         <T.VARNAME
03 - DATABASE DISPLAY VARIABLE            <D.VARNAME
04 - LOCAL VARIABLE DISPLAY               &VARNAME OR <D.&VARNAME
05 - LOCAL VARIABLE TURNAROUND            <T.&VARNAME
06 - CHARACTER STRING                     "STRING LITERAL"
07 - POSITIONAL CONTROL                   <99, </99
08 - GRAPHIC ATTRIBUTE                    <.H.

20-39 REPORT OUTPUT:
20 - DATABASE VARIABLE                    VARNAME
21 - DATABASE VARIABLE IN HEADING         <VARNAME
22 - LOCAL VARIABLE IN HEADING            -&VARNAME
23 - COLUMN HEADING                       'STRING'
24 - REPORT HEADINGS                      "STRING"
25 - PRINTER POSITIONING                  <99

40-49 FILENAMES:
40 - MASTER FILENAME                      MASTNAME
41 - TEMPORARY FILENAME                   HOLD
42 - INCLUDE FILENAME                     FOCEXEC

50-59 INTERNAL DATA MANIPULATION:
50 - DATABASE VARIABLE MANIPULATION       VARNAME OR D.VARNAME
51 - LOCAL VARIABLE MANIPULATION          &VARNAME
52 - NUMERICAL CONSTANT                   99
53 - STRING CONSTANT                      'STRING'

60-69 LOGICAL:
60 - LABEL                                LABELNAME

99 - UNCLASSIFIED

APPENDIX B: PILOT VALIDATION RESULTS

Appendix B: Pilot Validation - Program Demographics

a. Manual Counting: LOC

            N    MINIMUM   MAXIMUM   MEAN     STD DEV
Control     13   30.000    88.000    65.615   20.855
Updates     18   15.000    740.00    240.28   226.58
Outputs     43   10.000    440.00    142.16   94.810

b. Automated Counting: LOC

            N    MINIMUM   MAXIMUM   MEAN     STD DEV
Control     13   29.000    87.000    62.615   22.221
Updates     18   15.000    910.00    277.39   280.39
Outputs     43   10.000    427.00    131.05   91.879

c. Manual Program Classification: Two Sample T-Test

        UPDATE    REPORT        TEST   STATISTIC   DF      SIGNIF
MEAN    240.28    142.16        T=     2.4008      59      .0195
VAR     51340.    8989.0        F=     5.7114      17,42   .0000
N       18        43            PROB(1ST MEAN>2ND |DATA)= .9532

        REPORT    MENU          TEST   STATISTIC   DF      SIGNIF
MEAN    142.16    65.615        T=     2.8726      54      .0058
VAR     8989.0    434.92        F=     20.668      42,12   .0000
N       43        13            PROB(1ST MEAN>2ND |DATA)=1.0000

        UPDATE    MENU          TEST   STATISTIC   DF      SIGNIF
MEAN    240.28    65.615        T=     2.7579      29      .0100
VAR     51340.    434.92        F=     118.04      17,12   .0000
N       18        13            PROB(1ST MEAN>2ND |DATA)= .9977

d. Automated Program Classification: Two Sample T-Test

        UPDATE    REPORT        TEST   STATISTIC   DF      SIGNIF
MEAN    277.39    131.05        T=     3.0791      59      .0031
VAR     78620.    8441.8        F=     9.3132      17,42   .0000
N       18        43            PROB(1ST MEAN>2ND |DATA)= .9778

        REPORT    MENU          TEST   STATISTIC   DF      SIGNIF
MEAN    131.05    62.615        T=     2.6462      54      .0106
VAR     8441.8    493.76        F=     17.097      42,12   .0000
N       43        13            PROB(1ST MEAN>2ND |DATA)=1.0000

        UPDATE    MENU          TEST   STATISTIC   DF      SIGNIF
MEAN    277.39    62.615        T=     2.7426      29      .0103
VAR     78620.    493.76        F=     159.23      17,12   .0000
N       18        13            PROB(1ST MEAN>2ND |DATA)= .9976

Appendix B: Pilot Validation - Regression Results

e.
Manual Counting

LEAST SQUARES REGRESSION: Update Programs
ANALYSIS OF VARIANCE OF 5.LOC    N= 18 OUT OF 18

SOURCE       DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    2   .75851 +6   .37925 +6   49.783   .0000
ERROR        15   .11427 +6   7618.1
TOTAL        17   .87278 +6

MULT R= .93224   R-SQR= .86907   SE= 87.282

VARIABLE      PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                10.804    38.774      .27863    .7843
8.SCRNVARS    .89319    1.6648    .21642      7.6927    .0000
15.MASTERS    .66891    49.417    14.179      3.4852    .0033

LEAST SQUARES REGRESSION: Control Programs
ANALYSIS OF VARIANCE OF 5.LOC    N= 13 OUT OF 14

SOURCE       DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    1   2514.9      2514.9      10.230   .0085
ERROR        11   2704.2      245.83
TOTAL        12   5219.1

MULT R= .69417   R-SQR= .48187   SE= 15.679

VARIABLE      PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                22.000    14.313      1.5371    .1525
8.SCRNVARS    .69417    9.6102    3.0046      3.1985    .0085

LEAST SQUARES REGRESSION: Output Programs - Preliminary
ANALYSIS OF VARIANCE OF 4.LOC    N= 43 OUT OF 43

SOURCE       DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    3   .23903 +6   79678.      26.900   .0000
ERROR        39   .11552 +6   2962.0
TOTAL        42   .35455 +6

MULT R= .82109   R-SQR= .67418   SE= 54.425

VARIABLE      PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                -9.8091   20.688      -.47415   .6380
5.SCRN        .41143    49.416    17.530      2.8190    .0075
9.FLDS        .19241    .51473    .42037      1.2245    .2281
7.RVAR        .74025    1.6813    .24452      6.8758    .0000

LEAST SQUARES REGRESSION: Output Programs - Detailed
ANALYSIS OF VARIANCE OF 4.LOC    N= 43 OUT OF 43

SOURCE       DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    4   .26343 +6   65858.      27.464   .0000
ERROR        38   91123.      2398.0
TOTAL        42   .35455 +6

MULT R= .86197   R-SQR= .74299   SE= 48.969

VARIABLE      PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                -3.4236   18.721      -.18287   .8559
SCRN          .39640    42.388    15.926      2.6617    .0113
FLDS          .16168    .38422    .38044      1.0099    .3189
RVAR          .46058    .98873    .30911      3.1986    .0028
PANDJ         .45955    11.318    3.5482      3.1897    .0029

f.
Automated Counting

LEAST SQUARES REGRESSION: Update Programs
ANALYSIS OF VARIANCE OF 4.LOC    N= 18 OUT OF 18

SOURCE       DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    2   .12141 +7   .60706 +6   74.386   .0000
ERROR        15   .12241 +6   8161.0
TOTAL        17   .13365 +7

MULT R= .95310   R-SQR= .90841   SE= 90.338

VARIABLE      PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                20.846    35.351      .58970    .5642
6.SVAR        .89655    1.8907    .24119      7.8391    .0000
8.MAS         .67917    40.894    11.411      3.5837    .0027

LEAST SQUARES REGRESSION: Control Programs
ANALYSIS OF VARIANCE OF 4.LOC    N= 13 OUT OF 13

SOURCE       DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    1   2228.4      2228.4      6.6311   .0258
ERROR        11   3696.6      336.06
TOTAL        12   5925.1

MULT R= .61327   R-SQR= .37610   SE= 18.332

VARIABLE      PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                20.632    17.078      1.2081    .2523
6.SVAR        .61327    8.6632    3.3642      2.5751    .0258

LEAST SQUARES REGRESSION: Output Programs - Preliminary
ANALYSIS OF VARIANCE OF 5.LOC    N= 43 OUT OF 43

SOURCE       DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    3   .14230 +6   47432.      7.8637   .0003
ERROR        39   .23524 +6   6031.8
TOTAL        42   .37754 +6

MULT R= .61393   R-SQR= .37691   SE= 77.665

VARIABLE      PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                9.9198    30.193      .32855    .7443
6.SCREENS     .35934    59.260    24.643      2.4047    .0210
17.FIELDS     .34380    1.4506    .63443      2.2864    .0277
10.REPVARS    .18439    1.0608    .90544      1.1716    .2485

LEAST SQUARES REGRESSION: Output Programs - Detailed
ANALYSIS OF VARIANCE OF 5.LOC    N= 43 OUT OF 43

SOURCE       DF   SUM SQRS    MEAN SQR    F-STAT   SIGNIF
REGRESSION    4   .28235 +6   70587.      28.178   .0000
ERROR        38   95191.      2505.0
TOTAL        42   .37754 +6

MULT R= .86479   R-SQR= .74786   SE= 50.050

VARIABLE      PARTIAL   COEFF     STD ERROR   T-STAT    SIGNIF
CONSTANT                4.9679    19.469      .25517    .8000
6.SCREENS     .51285    58.484    15.881      3.6825    .0007
17.FIELDS     .18069    .48556    .42874      1.1325    .2645
10.REPVARS    .15548    .56972    .58719      .97025    .3381
49.PANDJ      .77159    18.632    2.4918      7.4771    .0000

Pilot Validation: Manual Counting

Program    LOC  Scrn  Svar  Rvar  Mast  Flds  Proj  Join   P+J
ASDX1000    87     2     6     0     0     0     0     0     0
ASDX1020    73     2     5     0     0     0     0     0     0
ASDX1100    88     3    53     0     1    26     0     0     0
ASDX1110   295     7   157     0     1    26     0     0     0
ASDX1130   352    12    84     0     2    51     0     0     0
ASDX4010    92     0     0    24     2    51     0     1     1
ASDX4020   116     0     0    12     3    77     0     2     2
ASDX4030   169     1     3    22     3    77     0     2     2
ASDX4040    71     0     0    19     2    51     0     1     1
ASDX4050    92     0     0    28     2    51     0     2     2
ASDX4060    52     0     0    19     1    26     0     0     0
ASDX4070   205     1     5    50     2    51     0     2     2
ASDX4080    31     0     0     0     0     0     0     0     0
ASDX4090   141     1     3    19     3    77     0     2     2
ASDX4100   346     0     0     0     6    75     4     0     4
ASDX4110   140     0     0    62     2    75     1     1     2
ASDX4120   191     0     0    62     2    75     1     1     2
ASDX4130    15     0     0     0     3    76     1     0     1
ASDX4140   180     1     3    20     3    77     2     3     5
ASDX4150   150     1     3    20     3    77     1     2     3
ASDX4160    52     0     0    16     2    51     0     1     1
ASDX9999    33     1     4     0     0     0     0     0     0
CAPX1000    87     2     6     0     0     0     0     0     0
CAPX1010    72     2     5     0     0     0     0     0     0
CAPX1030    69     2     5     0     0     0     0     0     0
CAPX1040    73     2     5     0     0     0     0     0     0
CAPX1050    67     2     5     0     0     0     0     0     0
CAPX1060    76     2     5     0     0     0     0     0     0
CAPX1070    67     2     5     0     0     0     0     0     0
CAPX1080    76     2     5     0     0     0     0     0     0
CAPX2010   691    11   314     0     4    61     0     3     3
CAPX2020   740    11   325     0     4    61     0     3     3
CAPX2030   486     6    83     0     4    61     0     3     3
CAPX2080   232     1     3     0     1    53     0     0     0
CAPX2090   176     1     8     0     1    53     0     0     0
CAPX2100   422     6   114     0     4    61     0     3     3
CAPX2110    73     1     7     0     2    57     1     0     1
CAPX3010    49     1     7     0     1     3     0     0     0
CAPX3020    40     1     7     0     1     3     0     0     0
CAPX3030    41     1     5     0     1     2     0     0     0
CAPX3040    41     1     5     0     1     2     0     0     0
CAPX4010   118     1     6    55     4    61     0     3     3
CAPX4020   115     1     7    55     4    61     0     3     3
CAPX4030   237     1     6    49     4    61     3     5     8
CAPX4040    11     0     0     5     1     3     0     0     0
CAPX4050    11     0     0     5     1     3     0     0     0
CAPX4060    10     0     0     4     1     2     0     0     0
CAPX4070    10     0     0     4     1     2     0     0     0
CAPX5010    48     1     4     0     0     0     0     0     0
CAPX5020   183     0     3     0          70     3     0     3
CAPX5030    65     0     0     0     3    59     2     2     4
CAPX5040    55     0     0     0     3    63     0     0     0
CAPX6010   124     1     4    35     2    55     0     1     1
CAPX6020   162     1     4    19     1    53     2     1     3
CAPX6030   116     1     4    22     1    53     0     0     0
CAPX6040   185     1     4    28     2    56     3     3     6
CAPX6050   298     1     4    28     2    56     5     4     9
CAPX6060   216     1     4    28     2    56     4     3     7
CAPX6070   226     1     4    28     2    56     4     3     7
CAPX6080   115     1     4    21     1    53     0     0     0
CAPX6090   440     1     2    25     1    53     6     3     9
CAPX6100   430     1     3    22     1    53     6     3     9
CAPX6110   135     1     3    18     2    56     2     2     4
CAPX6120   163     1     6    18     2    56     2     4     6
CAPX6130    77     1     3    17     1    10     0     0     0
CAPX6140    61     1     3    13     1     8     0     0     0
CAPX6150   126     1     3    16     2    56     2     2     4
CAPX7010   125     1     4    39     2    55     0     1     1
CAPX7020    88     1     4     0     0     0     0     0     0
CAPX7040   143     0     0    37     2    56     4     3     7
CAPX7050   253     0     0    36     2    56     7     5    12
CAPX7060   190     0     0    36     2    56     6     4    10
CAPX7070   190     0     0    36     2    56     6     4    10
CAPX7080    62     0     0    23     3    21     0     2     2
CAPX9999    30     1     4     0     0     0     0     0     0

Pilot Validation: Automated Counting

Program    LOC  Scrn  Svar  Rvar  Mast  Flds  Proj  Join
ASDX1000    87     2     6     0     0     0     0     0
ASDX1020    73     2     5     0     0     0     0     0
ASDX1100    88     3    52     0     1    26     0     0
ASDX1110   295     5   157     0     1    26     0     0
ASDX1130   455    13   107     0     2    51     0     0
ASDX4010    91     0     0    32     2    51     0     1
ASDX4020   116     0     0    29     3    77     1     2
ASDX4030   171     1     3    34     3    77     1     2
ASDX4040    73     0     0    30     2    51     0     1
ASDX4050    92     0     0    30     2    51     0     1
ASDX4060    54     0     0    19     1    26     0     0
ASDX4070   209     1     5    57     2    51     1     2
ASDX4080    31     0     0     0     0     0     0     0
ASDX4090   141     1     3    28     3    77     1     2
ASDX4100   348     0     0    28     2    74     4     0
ASDX4110   194     0     0   126     2    75     1     1
ASDX4120   194     0     0   126     2    75     1     1
ASDX4130    15     0     0     2     2    74     1     0
ASDX4140   178     1     3    32     3    77     2     3
ASDX4150   150     1     3    28     3    77     1     2
ASDX4160    56     0     0    27     2    51     0     1
ASDX9999    34     1     5     0     0     0     0     0
CAPC2010   869    10   320     0     6    67     0     5
CAPC2020   910    11   349     0     6    67     0     5
CAPC2030   552     6   102     0     6    67     0     5
*CAPC2040   75     0     2     0     2    63     0     0
*CAPC2050   75     0     2     0     2    61     0     0
CAPC2080   256     1    16     0     6    67     0     5
CAPC2090   208     1    14     0     6    67     0     5
CAPC2100   497     6   135     0     6    67     0     5
CAPC3010    45     3     6     0     1     3     0     0
CAPC3020    45     3     6     0     1     3     0     0
CAPC3030    41     3     4     0     1     2     0     0
CAPC3040    41     3     4     0     1     2     0     0
*CAPC3050   47     3    12     0     1     6     0     0
CAPX1000    85     2     6     0     0     0     0     0
CAPX1010    83     2     5     0     0     0     0     0
CAPX1030    75     2     6     0     0     0     0     0
CAPX1040    73     2     6     0     0     0     0     0
CAPX1050    66     2     5     0     0     0     0     0
CAPX1060    75     2     5     0     0     0     0     0
CAPX1070    70     2     5     0     0     0     0     0
*CAPX2010   11     0     0     0     6    67     0     5
*CAPX2020   11     0     0     0     6    67     0     5
*CAPX2030   44     0     0     0    10   107     0     5
*CAPX2080   11     0     0     0     6    67     0     5
*CAPX2090   18     0     0     0     6    67     0     5
*CAPX2100   11     0     0     0     6    67     0     5
CAPX2110    73     1    10     0     1    51     1     0
*CAPX2120  101     1    15     0     3    61     1     1
*CAPX2130   68     1     9     0     1    51     1     0
*CAPX2140   76     1     9     0     1    51     1     0
*CAPX2150   89     1     3    16     1    51     1     0
CAPX3010     4     0     0     0     0     0     0     0
CAPX3020     4     0     0     0     0     0     0     0
CAPX3030     4     0     0     0     0     0     0     0
CAPX3040     4     0     0     0     0     0     0     0
*CAPX3050    4     0     0     0     0     0     0     0
*CAPX3110   51     1     3     4     2    54     2     0
*CAPX3120   51     1     3     4     2    54     2     0
*CAPX3130   51     1     3     4     2    53     2     0
*CAPX3140   51     1     3     4     2    53     2     0
*CAPX3150   51     1     3     4     2    57     2     0
CAPX4010   114     1     6    42     4    59     0     3
CAPX4020   122     1     7    44     4    59     0     3
CAPX4030   245     1     6   112     4    59     3     4
CAPX4040    11     0     0     8     1     3     0     0
CAPX4050    11     0     0     8     1     3     0     0
CAPX4060    10     0     0     6     1     2     0     0
CAPX4070    10     0     0     6     1     2     0     0
*CAPX4080   17     0     0    16     1     6     0     0
CAPX5010    62     1     4     2     1     2     1     0
CAPX5020   196     1    10    19     3    73     5     0
CAPX5030    63     0     0    14          57     2     4
CAPX5040    59     0     4     0     1    51     0     0
*CAPX5050   25     0     1     1     1     2     1     0
CAPX6010    71     1     4    42     2    53     0     1
CAPX6020   106     1     4    45     1    51     2     1
CAPX6030    62     1     4    32     1    51     0     0
CAPX6040   147     1     4    72     2    54     3     2
CAPX6050   254     1     4   101     2    54     5     3
CAPX6060   171     1     4    82     2    54     4     2
CAPX6070   160     1     4    82     2    54     4     2
CAPX6080    59     1     4    22     1    51     0     0
CAPX6090   427     1     3    96     1    51     6     3
CAPX6100   405     1     3    96     1    51     6     3
CAPX6110   116     1     3    36     2    54     3     2
CAPX6120   157     1     6    59     4    59     2     4
CAPX6130    74     1     4    44     2    15     0     1
CAPX6140    50     1     4    29     1    10     0     0
CAPX6150    99     1     3    24     2    54     2     2
CAPX6160   109     1     3    41     1    51     2     1
CAPX7010    70     1     4    42     2    53     0     1
CAPX7020    33     1     4     0     0     0     0     0
CAPX7040   144     0     0    91     2    54     4     3
CAPX7050   254     0     0   137     2    54     7     5
CAPX7060   190     0     0   120     2    54     6     4
CAPX7070   190     0     0   120     2    54     6     4
CAPX7080    62     0     0    29     0     0     0     1
*CAPX7090   52     1     4    20     2    53     0     1
*CAPX8010   63     0     0     0     0     0     0     0
*CAPX8020   34     0     0     0     0     0     0     0
CAPX9999    29     1     5     0     0     0     0     0

APPENDIX C: SYSTEMS DATABASE DETAILED STRUCTURE

-* MASTER FILE DEFINITION: SYSTEMS
-* DATE CREATED: JULY.15.1988
-* BY: C.D.
-* WRIGLEY
-************************************************************************
FILENAME=SYSTEMS,SUFFIX=FOC,$
SEGNAME=COMPANY,SEGTYPE=S1,$
FIELDNAME=COMPANYNAME,ALIAS=FIRM,FORMAT=A8,FIELDTYPE=I,$
SEGNAME=SYSNAMES,PARENT=COMPANY,SEGTYPE=S1,$
FIELDNAME=SYSTEMNAME,ALIAS=SYSTEM,FORMAT=A8,FIELDTYPE=I,$
FIELDNAME=SYSTEMTYPE,,FORMAT=A20,$
SEGNAME=PROJNAME,PARENT=SYSNAMES,SEGTYPE=S1,$
FIELDNAME=PROJID,ALIAS=,FORMAT=A20,FIELDTYPE=I,$
SEGNAME=PROJINFO,PARENT=PROJNAME,SEGTYPE=KU,
CRSEGNAME=PROJINFO,CRKEY=PROJID,CRFILE=PROJECTS,$
SEGNAME=PROGRAMS,PARENT=SYSNAMES,SEGTYPE=S1,$
FIELDNAME=PROGRAMNAME,ALIAS=FOCNAME,FORMAT=A8,FIELDTYPE=I,$
SEGNAME=BUILDER,PARENT=PROGRAMS,SEGTYPE=S1,$
FIELDNAME=PROGRAMMER,,FORMAT=A8,FIELDTYPE=I,$
SEGNAME=LINES,PARENT=PROGRAMS,SEGTYPE=S0,$
FIELDNAME=LOC,,FORMAT=A80,$
SEGNAME=OPERANDS,PARENT=LINES,SEGTYPE=S0,$
FIELDNAME=OPERAND,,FORMAT=A12,$
FIELDNAME=OPERANDTYPE,,FORMAT=I2,$
SEGNAME=VOCAB,PARENT=LINES,SEGTYPE=S0,$
FIELDNAME=KEYWORD,,FORMAT=A12,$
SEGNAME=FILES,PARENT=PROGRAMS,SEGTYPE=S1,$
FIELDNAME=FILENAME,FORMAT=A8,FIELDTYPE=I,$
FIELDNAME=#FIELDS,ALIAS=,FORMAT=I5,$
FIELDNAME=#SEGMENTS,ALIAS=,FORMAT=I5,$
FIELDNAME=#INDEXES,ALIAS=,FORMAT=I2,$
FIELDNAME=#FILES,ALIAS=,FORMAT=I2,$
FIELDNAME=F_LENGTH,ALIAS=,FORMAT=I4,$
SEGNAME=MASTERS,PARENT=SYSNAMES,SEGTYPE=S1,$
FIELDNAME=MASTERNAME,,FORMAT=A8,FIELDTYPE=I,$
FIELDNAME=NUM_FIELDS,ALIAS=FIELDS,FORMAT=I5,$
FIELDNAME=NUM_SEGMENTS,ALIAS=SEGMENTS,FORMAT=I5,$
FIELDNAME=NUM_INDEXES,ALIAS=INDEXES,FORMAT=I2,$
FIELDNAME=NUM_FILES,ALIAS=,FORMAT=I2,$
FIELDNAME=FIELD_LENGTH,ALIAS=TOTAL_LENGTH,FORMAT=I4,$

-* MASTER FILE DEFINITION: PROJECTS
-* DATE WRITTEN: JUL.18.1988 BY: C.D.
-* WRIGLEY
-* MODS:
FILENAME=PROJECTS,SUFFIX=FOC,$
SEGNAME=PROJINFO,SEGTYPE=S1,$
FIELDNAME=PROJID,ALIAS=PROJID,FORMAT=A20,FIELDTYPE=I,$
FIELDNAME=PROJNAME,ALIAS=,FORMAT=A40,$
SEGNAME=SYSTEMS,PARENT=PROJINFO,SEGTYPE=S1,$
FIELDNAME=SYSTEMNAME,ALIAS=,FORMAT=A12,FIELDTYPE=I,$
SEGNAME=SYSINFO,PARENT=SYSTEMS,SEGTYPE=KU,
CRSEGNAME=SYSNAMES,CRFILE=SYSTEMS,CRKEY=SYSTEM_NAME,$
SEGNAME=RESOURCE,PARENT=PROJINFO,SEGTYPE=S1,$
FIELDNAME=RESOURCEID,ALIAS=,FORMAT=A20,$
FIELDNAME=SKILL_LEVEL,ALIAS=,FORMAT=I2,$
SEGNAME=TASKS,PARENT=PROJINFO,SEGTYPE=S1,$
FIELDNAME=TASKID,ALIAS=,FORMAT=A12,FIELDTYPE=I,$
SEGNAME=WORKDONE,PARENT=TASKS,SEGTYPE=S1,$
FIELDNAME=WORKUNITS,ALIAS=,FORMAT=I4,$
SEGNAME=TOOLS,PARENT=PROJINFO,SEGTYPE=U,$
FIELDNAME=LANGUAGE,ALIAS=,FORMAT=A8,$
FIELDNAME=METHODOLOGY,ALIAS=,FORMAT=A20,$
FIELDNAME=HARDWARE,ALIAS=,FORMAT=A20,$
FIELDNAME=SAD_TOOLS,ALIAS=,FORMAT=A20,$

APPENDIX D: CODE ANALYSER SOURCE CODE

-************************************************************************
-* PROGRAM: SYSLIST
-*
-* THIS EXEC TAKES TWO INPUTS:
-* THE FIRST IS NAME OF THE COMPANY TO BE USED AS AN IDENTIFIER
-* IN THE SYSTEMS DATABASE.
-* THE SECOND IS THE PATHNAME OF A DIRECTORY
-* WHERE THE COMPANY'S SYSTEMS TO BE REVERSE ENGINEERED ARE LOCATED.
-* ITS GOOD POLICY TO USE THE DIRECTORY NAME
-* AS THE SYSTEM NAME. FUTURE MODS WILL DO THIS AUTOMATICALLY.
-* THE PROGRAM CONSTRUCTS A LIST OF SYSTEMS AND STORES THIS LIST IN
-* SYSLIST.DIR IN THE COMPANY'S ROOT DIRECTORY
-* AUG.08.88 CLIVE WRIGLEY
-************************************************************************
DOS ERASE FOCSTACK.FTM
-RUN
-SET &STACK=ON;
-DOS FILEDEF COMPANY DISK D:COMPANY.PTR
-DOS FILEDEF SYSLIST DISK D:SYSLIST.DIR
-* FIRST GET THE LOCATION OF THE SYSTEMS TO BE COUNTED
-TYPE ENTER COMPANY NAME (DIRECTORY CONTAINING THE SYSTEMS.)
-PROMPT &&COMPANY.A8.ENTER ABREVIATED NAME OF COMPANY (8 CHARS).
-PROMPT &&DRIVE.ENTER DRIVE AND PATH WERE COMPANY DATA IS STORED.
-SET &&PATH = &&DRIVE || &&COMPANY;
-DOS CD &&PATH
-* STORE THE COMPANY NAME FOR FUTURE REFERENCE
-TYPE STORING COMPANY NAME: &&COMPANY
-WRITE COMPANY &&COMPANY
-TYPE CONSTRUCTING LIST OF SYSTEMS IN: &&PATH
EXEC GETDIRS PATH = &&PATH
-RUN
DOS C:
-RUN

-************************************************************************
-* PROGRAM: SYSDIRS
-*
-* THIS EXEC TAKES TWO INPUTS:
-* THE FIRST IS NAME OF THE COMPANY TO BE USED AS AN IDENTIFIER
-* IN THE SYSTEMS DATABASE.
-* THE SECOND IS THE PATHNAME OF A DIRECTORY
-* WHERE THE COMPANY'S SYSTEMS TO BE REVERSE ENGINEERED ARE LOCATED.
-* THE PROGRAM CONSTRUCTS LISTS OF FOCEXEC AND MASTER FILE DEFINITIONS
-* FOR EACH NAMES FOUND IN SYSLIST.DIR IN THE COMPANY'S DIRECTORY
-* THESE LISTS ARE STORED IN FEXLIST.DIR AND MASLIST.DIR IN THE SUB-
-* DIRECTORIES CONTAINING THE ACTUAL SOURCE CODE.
-* ADDITIONALLY THE COMPANY AND SYSTEM NAME ARE STORED IN SYSNAME.PTR
-* AUG.08.88 C.D. WRIGLEY
-************************************************************************
DOS ERASE FOCSTACK.FTM
-RUN
-SET &STACK=ON;
-DOS FILEDEF COMPANY DISK D:COMPANY.PTR
-DOS FILEDEF SYSLIST DISK D:SYSLIST.DIR
-* FIRST GET THE LOCATION OF THE SYSTEMS TO BE COUNTED
-TYPE ENTER COMPANY NAME (DIRECTORY CONTAINING THE SYSTEMS.)
-PROMPT &&COMPANY.A8.ENTER ABREVIATED NAME OF COMPANY (8 CHARS).
-PROMPT &&DRIVE.ENTER DRIVE AND PATH WERE COMPANY DATA IS STORED.
-SET &&PATH = &&DRIVE || &&COMPANY;
-DOS CD &&PATH
-DOS STATE D:SYSLIST.DIR
-IF &RETCODE NE 0 THEN GOTO NOSYS;
-DOS CD &&PATH
-GETSYS
-READ SYSLIST &&SYSNAME.A8.
-IF &IORETURN NE 0 THEN GOTO ENDSYS;
-* CHANGE DIRECTORY TO SYSNAME
-SET &&SYSPATH = &&PATH || '\' || &&SYSNAME;
-DOS CD &&SYSPATH
-* STORE THE SYSTEM NAME FOR LATER ACCESS
-TYPE STORING NAME FOR &&SYSNAME SYSTEM IN &&SYSPATH
-DOS FILEDEF SYSNAME DISK D:SYSNAME.PTR
-WRITE SYSNAME &&COMPANY
-WRITE SYSNAME &&SYSNAME
-TYPE CONSTRUCTING FOCEXEC AND MASTER LISTS FOR: &&SYSNAME
DOS CD &&SYSPATH
EXEC GETNAMES EXTENSION='MAS'
EXEC GETNAMES EXTENSION='FEX'
-DOS CD &&PATH
-GOTO GETSYS
-NOSYS
-TYPE **** NO SUBDIRECTORIES FOUND FOR: &COMPANY. TERMINATING RUN.
-GOTO EXIT
-ENDSYS
-RUN
DOS C:
-RUN

-************************************************************************
-* PROGRAM: GETDIRS
-*
-* THIS PROGRAM CONSTRUCTS A LIST OF SUB-DIRECTORIES IN THE PATH
-* SPECIFIED BY THE CALLING PROCEDURE. THE FILE TEMP.DIR IS CREATED IN THE
-* CURRENT DRIVE AND DIRECTORY ALONG WITH SYSLIST.DIR
-* CLIVE WRIGLEY JULY.11.88
-************************************************************************
-DEFAULTS &DRIVE='D:'
-DEFAULTS &RTNDRIVE='C:'
DOS &DRIVE
DOS CD &PATH
DOS DIR *. > TEMP.DIR
-RUN
-SET &SYSLIST = 'D:SYSLIST.DIR';
-DOS FILEDEF SYSLIST DISK &SYSLIST
-DOS FILEDEF DIRLIST DISK TEMP.DIR
-TOP
-READ DIRLIST &FILENAME.A8. &EXTENSION.A5. &DIR.A5.
-IF &IORETURN NE 0 THEN GOTO DONE;
-IF &DIR NE '<DIR>' OR (&FILENAME EQ '.' OR '..') THEN GOTO TOP;
-WRITE SYSLIST &FILENAME
-GOTO TOP
-DONE
DOS ERASE TEMP.DIR
DOS &RTNDRIVE
-RUN

-************************************************************************
-* PROGRAM: GETNAMES
-* THIS PROGRAM CONSTRUCTS A LIST OF DOS FILENAMES HAVING THE EXTENSION
-* SPECIFIED BY THE CALLING PROCEDURE.
-* THE FILE TEMP.DIR IS CREATED IN THE
-* CURRENT DRIVE AND DIRECTORY ALONG WITH XXXLIST.DAT
-* CLIVE WRIGLEY JULY.11.88
-************************************************************************
-DEFAULTS &DRIVE='D:'
-DEFAULTS &RTNDRIVE='C:'
DOS &DRIVE
DOS DIR *.&EXTENSION >TEMP.DIR
-RUN
-SET &FILELIST = &EXTENSION || 'LIST.DIR';
-DOS FILEDEF FILELIST DISK &FILELIST
-DOS FILEDEF DIRLIST DISK TEMP.DIR
-TOP
-READ DIRLIST &FILE1.A1. &FILE2.A7.
-IF &IORETURN NE 0 THEN GOTO DONE;
-IF &FILE1 EQ ' ' THEN GOTO TOP;
-SET &FILENAME = &FILE1 || &FILE2;
-WRITE FILELIST &FILENAME
-GOTO TOP
-DONE
DOS ERASE TEMP.DIR
DOS &RTNDRIVE
-RUN

-************************************************************************
-* PROGRAM TO COUNT MASTERFILE DEFINITIONS
-* &MASNAME IS PASSED FROM SYSCOUNT
-* MASTER FILE LIST IS IN D:MASLIST.DIR
-* SYSTEM NAME IS IN D:SYSNAME.PTR OF THE CURRENT DIRECTORY
-* CLIVE WRIGLEY JUL.08.88
-* MODS: SE.08.88 CHANGED NAME OF LOG FILE TO UPDATE.LOG
-************************************************************************
-DOS FILEDEF SYSNAME DISK D:SYSNAME.PTR
FILEDEF SYSTEMS DISK C:SYSTEMS.FOC
FILEDEF LOGFILE DISK C:UPDATE.LOG APPEND
-* CHECK MASTERFILE DEMOGRAPHICS
DOS D:
CHECK FILE &MASNAME HOLD
DOS C:
TABLE FILE HOLD
COUNT SEGNAME AND CNT.FIELDNAME AND SUM.SKEYS
BY FILENAME
ON TABLE HOLD AS HOLD2
END
MODIFY FILE SYSTEMS
FIXFORM ON HOLD2 MASTERNAME/8 NUM_SEGMENTS/5 NUM_FIELDS/5 NUM_INDEXES/2
GOTO MASADD
CASE AT START
COMPUTE STUPID/A8 = ' ';
GOTO FINDFIRM
ENDCASE
CASE FINDFIRM
TYPE " "
FIXFORM ON SYSNAME COMPANYNAME/8
FIXFORM ON SYSNAME SYSTEMNAME/8
MATCH COMPANYNAME
ON NOMATCH TYPE "ADDING COMPANY: <COMPANYNAME TO DATABASE"
ON NOMATCH TYPE ON LOGFILE "ADDING COMPANY: <COMPANYNAME TO DATABASE"
ON NOMATCH INCLUDE
GOTO FINDSYS
ENDCASE
CASE FINDSYS
TYPE " "
MATCH SYSTEMNAME
ON NOMATCH TYPE "ADDING SYSTEM: <SYSTEMNAME TO DATABASE"
ON NOMATCH TYPE ON LOGFILE "ADDING SYSTEM: <SYSTEMNAME TO DATABASE"
ON NOMATCH INCLUDE
ENDCASE
CASE MASADD
MATCH MASTERNAME
ON MATCH TYPE "DUPLICATE MASTER: <SYSTEMNAME <MASTERNAME"
ON MATCH TYPE ON LOGFILE "DUPLICATE MASTER: <SYSTEMNAME <MASTERNAME"
ON MATCH REJECT
ON NOMATCH TYPE ON LOGFILE "ADDING MASTER: <SYSTEMNAME <MASTERNAME"
ON NOMATCH TYPE "ADDING MASTER <MASTERNAME"
ON NOMATCH INCLUDE
ENDCASE
DATA
END

-************************************************************************
-* PGMCOUNT.FEX
-* VERSION AS OF AUG.07.88
-* PRINCIPAL CHANGES:
-* 1) OPERANDS ARE CATEGORIZED ACCORDING TO THE OPERAND
-* CODING SCHEME IN FOCUS\ANALYSER\OPCODES.DOC
-* 2) TOKENS ARE STORED BELOW LOC AND NOT IN ANY ORDER AS OPPOSED TO
-* BELOW PROGRAMNAME AND IN ORDER.
-* 3) TOKEN COUNTING IS NOT DONE ANY MORE. THIS FUNCTION IS HANDLED BY
-* TABLE REQUESTS
-* MODS: SE.07.88 FILENAME MAY CONTAIN SEGMENT INFO. THIS IS PARSED OUT
-* EG.
MODIFY FILE NAME.SEG
-*
-* SYSTEMNAME IS STORED IN SYSNAME.PTR IN THE CURRENT DIRECTORY
-* &FOCEXEC IS PASSED FROM SYSCOUNT
-* CLIVE WRIGLEY JULY.88
-DOS FILEDEF SYSNAME DISK D:SYSNAME.PTR
-DEFAULTS &OUTFILE1=C:SYSTEMS.FOC
-DEFAULTS &LOOKUP1 = C:FOCWORDS.FOC
-SET &FOCFILE = 'D:' || &FOCEXEC || '.FEX';
-************************************************************************
-* THIS SECTION READS IN A TEXTFILE 'FOCFILE' INTO A FOCUS FILE 'SYSTEMS'
-* EACH LINE OF TEXT IN TEXTFILE IS CHOPPED INTO TOKENS
-* TABLE LOOK UP IS PERFORMED TO DETERMINE IF TOKENS ARE FOCUS KEYWORDS
-* (OPERATORS) OR DATA REFERENCES (OPERANDS).
-* TOKENS AND LOC ARE ADDED TO THE SYSTEMS FILE
-************************************************************************
FILEDEF SYSTEMS DISK &OUTFILE1
FILEDEF TEXTFILE DISK &FOCFILE
FILEDEF FOCWORDS DISK &LOOKUP1
FILEDEF LOGFILE DISK C:UPDATE.LOG APPEND
MODIFY FILE SYSTEMS
FIXFORM ON TEXTFILE LOC/80
GOTO PARSER
CASE AT START
COMPUTE
MAXLOC/I4 = 9999;
LOCOUNT/I4 = 0;
TABLEFLAG/I1 = 0;
CRTFLAG/I1 = 0;
GOTOFLAG/I1 = 0;
FILEFLAG/I1 = 0;
LEFTSIDE/I1 = ;
COMMLEN/I1 = 2;
COMMENT/A2 = '-*';
QUOTE/A1 = HEXBYT(34,QUOTE);
APOSTROPHE/A1 = HEXBYT(39,APOSTROPHE);
QUOTEPOS/I2 = ;
CHARPOS/I2 = 0;
OPENQUOTE/I1 = 0;
CARPOS/I2 = ;
LCARROT/A1 = '<';
RCARROT/A1 = '>';
DELIMETER/A1 = ;
NEWLINE/A80 = ;
BUFFER/A80 = ;
GOTO FINDSYS
ENDCASE
-************************************************************************
-* SEARCH FOR COMPANY AND SYSTEM. ADD IF NOT THERE.
-************************************************************************
CASE FINDSYS
FIXFORM ON SYSNAME COMPANYNAME/8
FIXFORM ON SYSNAME SYSTEMNAME/8
MATCH COMPANYNAME
ON MATCH TYPE "PROCESSING COMPANY: <COMPANYNAME"
ON NOMATCH TYPE "ADDING NEW COMPANY: <COMPANYNAME"
ON NOMATCH INCLUDE
MATCH SYSTEMNAME
ON MATCH TYPE "PROCESSING SYSTEM: <SYSTEMNAME"
ON NOMATCH TYPE "ADDING NEW SYSTEM: <SYSTEMNAME"
ON NOMATCH INCLUDE
GOTO FINDPROG
ENDCASE
-* SEARCH FOR PROGRAMNAME, ADD NEW ROOT IF NOT THERE, EXIT IF DUPLICATE
-* READ IN FOCEXEC NAME TO PARSE FROM AMPER VARIABLE IN DATA STREAM BELOW
CASE FINDPROG
FIXFORM PROGRAMNAME
MATCH PROGRAMNAME
ON MATCH TYPE ON LOGFILE "DUPLICATE PROGRAM: <SYSTEMNAME <PROGRAMNAME"
ON MATCH TYPE "DUPLICATE PROGRAM: <PROGRAMNAME"
ON MATCH REJECT
ON MATCH GOTO EXIT
ON NOMATCH TYPE ON LOGFILE "PARSING PROGRAM: <SYSTEMNAME <PROGRAMNAME"
ON NOMATCH TYPE "PARSING PROGRAM: <PROGRAMNAME"
ON NOMATCH INCLUDE
ENDCASE
-************************************************************************
-* EACH LINE IS PASSED TO HERE FIRST
-* DEAL WITH MATTERS RELATING TO THE ENTIRE LINE
CASE PARSER
-* TYPE " "
TYPE "<LOC"
-* TYPE " "
COMPUTE
LOC/A80 = LJUST(80,LOC,LOC);
LOCLEN/I2 = ARGLEN(80,LOC,LOCLEN);
LOCBUF/A80 = LOC;
LEFTSIDE/I4 = 1;
TOKCOUNT/I2 = 0;
LOCOUNT/I4 = LOCOUNT + 1;
IF LOCOUNT GT MAXLOC THEN GOTO ABORTPGM;
-* IGNORE BLANK LINES AND COMMENTS
IF (LOCLEN EQ 0) OR (LOC CONTAINS COMMENT) OR (LOC EQ '-') THEN GOTO ENDCASE;
-* INCLUDE THE LOC INTO THE DATABASE
NEXT LOC
ON NONEXT INCLUDE
-* FOCUS BUG CAUSES THIS BRANCHING CLUGE CDW.
AU.09.88
GOTO PARSE2
ENDCASE
CASE PARSE2
-* IN THE EVENT THAT STRINGS EXTEND OVER 2 OR MORE LINES
IF OPENQUOTE THEN GOTO PATCHIT;
-* SET FLAGS TO INDICATE THE CONTEXT OF THE CODE
COMPUTE
TABLEFLAG/I1 = IF (LOC CONTAINS 'TABLE ') AND (LOC OMITS QUOTE OR APOSTROPHE)
   THEN 1 ELSE (IF TABLEFLAG EQ 1
   THEN (IF LOC CONTAINS 'END' AND LOCLEN LE 4 THEN 0 ELSE TABLEFLAG)
   ELSE TABLEFLAG);
MATCHFLAG/I1 = IF (LOC CONTAINS 'MATCH ') AND (LOC CONTAINS 'FILE ')
   AND (LOC OMITS QUOTE OR APOSTROPHE)
   THEN 1 ELSE (IF MATCHFLAG EQ 1
   THEN (IF LOC CONTAINS 'END' AND LOCLEN LE 4 THEN 0 ELSE MATCHFLAG)
   ELSE MATCHFLAG);
CRTFLAG/I1 = IF LOC CONTAINS 'CRTFORM ' THEN 1
   ELSE (IF LOC CONTAINS 'TYPE ' THEN 0);
COMBINEFLAG/I1 = IF LOC CONTAINS 'COMBINE ' AND LOC CONTAINS ' FILES '
   AND LOC OMITS QUOTE OR APOSTROPHE
   THEN 1 ELSE (IF COMBINEFLAG EQ 1 AND LOC CONTAINS ' AS ' THEN 0);
-*TYPE " TBL= <TABLEFLAG CRT= <CRTFLAG FF= <FILEFLAG MAT= <MATCHFLAG"
-*TYPE " COM= <COMBINEFLAG"
-* LINE MAY REFER TO A MASTER FILE
IF LOC CONTAINS 'FILE' OR 'JOIN ' OR 'TABLE ' THEN GOTO FILEPARSER;
GOTO EASYSTUFF
ENDCASE
-************************************************************************
-* FILEPARSER IS CALLED WHEN AN INPUT LINE LOOKS SUSPICIOUSLY
-* LIKE IT MAY HAVE SOME MASTERFILE INFORMATION IN IT
-************************************************************************
CASE FILEPARSER
COMPUTE
NEWLINE/A80 = LOC;
NEWLEN/I2 = LOCLEN;
GOTO GETFILE
ENDCASE
CASE GETFILE
IF NEWLEN EQ 0 THEN GOTO EASYSTUFF;
COMPUTE
TOKEN/A80 = GETTOK(NEWLINE,NEWLEN,1,' ',80,TOKEN);
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
NEWLINE = SUBSTR(NEWLEN,NEWLINE,TOKENLEN+1,NEWLEN,80,NEWLINE);
NEWLINE/A80 = LJUST(80,NEWLINE,NEWLINE);
NEWLEN/I2 = ARGLEN(80,NEWLINE,NEWLEN);
PERIODPOS/I2 = POSIT(TOKEN,TOKENLEN,'.',1,PERIODPOS);
MASTERNAME = IF PERIODPOS EQ 0 THEN SUBSTR(80,TOKEN,1,8,8,MASTERNAME)
   ELSE SUBSTR(80,TOKEN,1,PERIODPOS-1,8,MASTERNAME);
-*TYPE "MASTERNAME = <MASTERNAME"
MATCH MASTERNAME
ON MATCH GOTO CHECKFILE
ON NOMATCH GOTO GETFILE
ENDCASE
CASE CHECKFILE
COMPUTE FILENAME = MASTERNAME;
MATCH FILENAME
ON MATCH CONTINUE
ON NOMATCH COMPUTE
#FIELDS = D.NUMFIELDS;
#SEGMENTS = D.NUMSEGMENTS;
#INDEXES = D.NUM_INDEXES;
#FILES = D.NUM_FILES;
FLENGTH = D.FIELD_LENGTH;
ON NOMATCH INCLUDE
ON NOMATCH TYPE "INCLUDING FILENAME <FILENAME"
GOTO GETFILE
ENDCASE
-************************************************************************
-* START OF MAIN TOKEN PARSING LOOP
-************************************************************************
CASE EASYSTUFF
COMPUTE
TOKEN/A80 = ' ';
LOC/A80 = LJUST(80,LOC,LOC);
LOCLEN/I2 = ARGLEN(80,LOC,LOCLEN);
TOKEN/A80 = GETTOK(LOC,LOCLEN,1,' ',80,TOKEN);
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
TOKCOUNT/I2 = TOKCOUNT+1;
IF TOKCOUNT GT 99 THEN GOTO ABORTLINE;
-* DONE WITH THE LINE
IF TOKENLEN EQ 0 THEN GOTO ENDCASE;
GOTO LOOKITUP
ENDCASE
-* DO A TABLE LOOKUP TO SEE IF THE TOKEN IS A FOCUS KEYWORD
CASE LOOKITUP
COMPUTE
KEYWORD/A12 = ' ';
KEYWORD/A12 = IF TOKENLEN LE 12
   THEN SUBSTR(TOKENLEN,TOKEN,1,TOKENLEN,12,KEYWORD)
   ELSE SUBSTR(TOKENLEN,TOKEN,1,12,12,KEYWORD);
KEYTEST = FIND(KEYWORD IN FOCWORDS);
IF KEYTEST EQ 0 THEN GOTO MEDIUMSTUFF ELSE GOTO PARSEKEY;
ENDCASE
-***************************** OPERATORS *******************************
-* PARSEKEY IS CALLED WHEN THE INPUT TOKEN MATCHES A FOCUS KEYWORD
-************************************************************************
CASE PARSEKEY
COMPUTE
LEFTSIDE/I4 = IF KEYWORD EQ '=' THEN 0;
GOTOFLAG/I1 = IF KEYWORD CONTAINS 'GOTO' OR 'CASE' THEN 1;
FILEFLAG/I1 = IF KEYWORD EQ 'FILE' OR 'FILES'
   OR ((TABLEFLAG EQ 0 AND MATCHFLAG EQ 0) AND (KEYWORD IS 'AS' OR 'IN'))
   OR (COMBINEFLAG EQ 1 AND KEYWORD IS 'AND')
   THEN 1;
NEXT KEYWORD
ON NONEXT INCLUDE
-* ON NONEXT TYPE " KEYWORD = <KEYWORD FILEFLAG= <FILEFLAG"
GOTO REMOVE
ENDCASE
-****************************** OPERANDS *******************************
-* THIS IS WHERE OPERANDS ARE PROCESSED: ONCE THEY'RE FOUND.
-* DEPENDING ON ITS CONTEXT AN OPERAND MAY BE ONE OF MANY THINGS
-************************************************************************
CASE CODEOPERAND
COMPUTE
OPERANDTYPE = IF FILEFLAG EQ 1 THEN 40
   ELSE (IF GOTOFLAG EQ 1 THEN 60
   ELSE (IF TABLEFLAG EQ 1 OR MATCHFLAG EQ 1
      THEN (IF LOCBUF CONTAINS ' HOLD ' THEN 41
         ELSE (IF MATCHFLAG EQ 1 THEN 50 ELSE 20))
      ELSE (IF LOCBUF CONTAINS '-INCLUDE ' THEN 42
         ELSE (IF TOKEN GT ':' THEN 50
            ELSE (IF TOKEN CONTAINS '&' THEN 51 ELSE 52)))));
GOTO PARSEOP
ENDCASE
-* ADD TO THE DATABASE
CASE PARSEOP
COMPUTE
TOKTEMP/A80 = ' ';
SLASHPOS/I2 = POSIT(TOKEN,TOKENLEN,'/',1,SLASHPOS);
TOKTEMP/A80 = IF (SLASHPOS GT 0) AND (OPERANDTYPE NE 7)
   THEN SUBSTR(TOKENLEN,TOKEN,1,SLASHPOS-1,SLASHPOS-1,TOKTEMP)
   ELSE TOKEN;
TEMPLEN/I2 = ARGLEN(80,TOKTEMP,TEMPLEN);
OPERAND/A12 = SUBSTR(80,TOKTEMP,1,TEMPLEN,12,OPERAND);
FILEFLAG = 0;
GOTOFLAG = 0;
IF OPERAND EQ '""' THEN GOTO REMOVE;
NEXT OPERAND
ON NONEXT INCLUDE
-* ON NONEXT TYPE "OPERAND = <OPERAND TYPE = <OPERANDTYPE"
GOTO REMOVE
ENDCASE
-* UTILITY TO REMOVE TOKEN FROM LOC
CASE REMOVE
COMPUTE
LOC/A80 = IF LOCLEN EQ TOKENLEN THEN ' '
   ELSE SUBSTR(80,LOC,TOKENLEN+1,LOCLEN,80,LOC);
LOC/A80 = LJUST(80,LOC,LOC);
LOC/A80 = IF KEYWORD EQ '-TYPE' THEN '"'||LOC||'"';
LOCLEN/I2 = ARGLEN(80,LOC,LOCLEN);
GOTO EASYSTUFF
ENDCASE
-**************************** MEDIUM STUFF *****************************
-* WHEN THE TOKEN ISN'T STRAIGHTFORWARD SOME MORE LOGIC IS INVOKED
-* TO ASCERTAIN WHAT'S IN THE LINE
-************************************************************************
CASE MEDIUMSTUFF
-* DEAL WITH QUOTES AND APOSTROPHES - A ROYAL PAIN IN THE ASS
IF POSIT(TOKEN,TOKENLEN,QUOTE,1,QUOTEPOS) EQ 1 OR 2 THEN GOTO PQUOTE;
IF POSIT(TOKEN,TOKENLEN,APOSTROPHE,1,QUOTEPOS) EQ 1 THEN GOTO PAPOST;
-* IF THERE ARE NO SPECIAL CHARACTERS IN TOKEN THEN IT'S AN OPERAND
IF TOKEN OMITS '(' OR ')' OR ',' OR ';' OR '=' OR '-' OR '*' OR '+' OR '[' OR '!'
   OR '@' OR '<' OR '/' THEN GOTO CODEOPERAND;
-* NOW THINGS ARE A BIT TRICKIER: SCREEN OUT DELIMETERS OR PAD
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,'(',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO BLANKOUT;
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,')',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO BLANKOUT;
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,',',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO BLANKOUT;
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,';',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO BLANKOUT;
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,'=',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO PADIT;
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,'-',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO CHECKLABEL;
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,'*',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO PADIT;
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,'+',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO PADIT;
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,'/',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO SLASHIT;
COMPUTE CHARPOS/I2 = POSIT(TOKEN,TOKENLEN,'|',1,CHARPOS);
IF CHARPOS NE 0 THEN GOTO PADIT;
-* WHAT IS THIS STUFF?
GOTO HARDER
ENDCASE
-* THE "/" CHARACTER PRESENTS SPECIAL PROBLEMS. IT CAN BE EITHER A
-* FORMATTING DELIMETER OR A DIVISION OPERATION DEPENDING ON WHICH SIDE
-* OF THE "=" SIGN IT LIES.
CASE SLASHIT
IF LEFTSIDE EQ 1 THEN GOTO CODEOPERAND;
GOTO PADIT
ENDCASE
-* DIALOGUE MANAGER STATEMENTS HAVE A - IN THE FIRST COLUMN
CASE CHECKLABEL
IF POSIT(LOCBUF,80,'-',1,CHARPOS) NE 1 THEN GOTO PADIT;
COMPUTE OPERANDTYPE = 60;
GOTO PARSEOP
ENDCASE
-* APOSTROPHES LOC='BLAH...
CASE PAPOST
COMPUTE
NEWLINE/A80 = SUBSTR(LOCLEN,LOC,2,LOCLEN,80,NEWLINE);
QUOTEPOS/I2 = POSIT(NEWLINE,80,APOSTROPHE,1,QUOTEPOS);
TOKEN/A80 = SUBSTR(LOCLEN,LOC,1,QUOTEPOS+1,80,TOKEN);
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
OPERANDTYPE = IF TABLEFLAG EQ 1 THEN 23 ELSE 53;
GOTO PARSEOP
ENDCASE
-* QUOTES LOC="BLAH..
CASE PQUOTE
-*TYPE "PQUOTE"
COMPUTE
QUOTEPOS = POSIT(TOKEN,TOKENLEN,QUOTE,1,QUOTEPOS);
LOC/A80 = '"' || LJUST(80,SUBSTR(LOCLEN,LOC,QUOTEPOS+1,LOCLEN,80,NEWLINE),LOC);
LOCLEN/I2 = ARGLEN(80,LOC,LOCLEN);
CARPOS/I2 = POSIT(LOC,LOCLEN,LCARROT,1,CARPOS);
QUOTEPOS/I2 = POSIT(SUBSTR(LOCLEN,LOC,2,LOCLEN,80,NEWLINE),LOCLEN-1,QUOTE,1,QUOTEPOS);
OPENQUOTE/I1 = IF QUOTEPOS EQ 0 THEN 1 ELSE 0;
IF CARPOS EQ 2 THEN GOTO GETSVARS ELSE (IF CARPOS GT 2 THEN GOTO GETSTRING);
-* JUST CHARACTERS LEFT IN STRING
COMPUTE
TOKEN/A80 = IF QUOTEPOS GT 0 THEN SUBSTR(LOCLEN,LOC,1,QUOTEPOS+1,80,TOKEN) ELSE LOC;
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
OPERANDTYPE = IF TABLEFLAG EQ 1
   THEN (IF TOKEN CONTAINS '&' THEN 22 ELSE 24)
   ELSE (IF TOKEN CONTAINS '&' THEN 4 ELSE 6);
GOTO PARSEOP
ENDCASE
-* PARSE ON THE "<" DELIMETER
CASE GETSTRING
COMPUTE
TOKEN/A80 = GETTOK(LOC,LOCLEN,1,'<',80,TOKEN);
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
ADJUST/I2 = 1;
OPERANDTYPE = IF TABLEFLAG EQ 1
   THEN (IF TOKEN CONTAINS '&' THEN 22 ELSE 24)
   ELSE (IF TOKEN CONTAINS '&' THEN 4 ELSE 6);
GOTO QUOTEIT
ENDCASE
-* MOST OF THE STUFF BELOW IS BECAUSE FOCUS STRING HANDLING ROUTINES
-* ARE NOT WELL DOCUMENTED AND SOMETIMES DO MYSTERIOUS THINGS
-* INPUT LOC = "<VAR BLAH" OR "<VAR<VAR"
CASE GETSVARS
COMPUTE
CARPOS/I2 = POSIT(LOC,LOCLEN,'>',1,CARPOS);
LOC/A80 = IF CARPOS GT 0 THEN OVRLAY(LOC,LOCLEN,' ',1,CARPOS,LOC);
TOKEN/A80 = GETTOK(LOC,LOCLEN,1,' ',80,TOKEN);
TOKEN/A80 = OVRLAY(TOKEN,TOKENLEN,' ',1,1,TOKEN);
TOKEN/A80 = LJUST(80,TOKEN,TOKEN);
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
TOKEN/A80 = IF POSIT(TOKEN,TOKENLEN,'"',1,QUOTEPOS) GT 1
   THEN OVRLAY(TOKEN,TOKENLEN,' ',1,TOKENLEN,TOKEN);
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
REVERSED/A80 = REVERSE(TOKENLEN,TOKEN,REVERSED);
ADJUST/I1 = 2;
IF POSIT(REVERSED,TOKENLEN,'<',1,CARPOS) LT TOKENLEN-2 THEN GOTO STRIPVAR;
GOTO STRIPATTR
ENDCASE
-* TOKEN="<VAR<VAR" OR "<VAR<VAR
CASE STRIPVAR
COMPUTE
TOKEN/A80 = '<'||GETTOK(LOC,LOCLENA'<,,80,TOKEN);
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
GOTO STRIPATTR
ENDCASE
-************************************************************************
-* WE NOW FINALLY HAVE A TOKEN CONTAINING SOMETHING RESEMBLING A SCREEN
-* VARIABLE. UNFORTUNATELY WITH ALL THE FANCY GRAPHICS OUTPUTS THERE MAY
-* BE SOME EMBEDDED GRAPHICS CONTROL CHARS. WE GET RID OF THESE HERE
-* GRAPHICS CONTROL LOOK LIKE < AA.
-************************************************************************
CASE STRIPATTR
-*TYPE "STRIPATR1 TOKEN = <TOKEN TOKENLEN = <TOKENLEN REVERSED = <REVERSED"
-*TYPE "STRIPATTR1 LOC= <LOC LOCLEN = <LOCLEN"
IF (TOKEN GE '</' AND TOKEN LT '<:') OR (TOKEN CONTAINS '<+' OR '<-')
   THEN GOTO SCRNPOS;
IF (TOKEN GE '<A') OR (TOKEN LE '<(' AND TOKEN GE '<&')
   THEN GOTO SCRNVAR;
COMPUTE
REVERSED/A80 = ' ';
REVERSED/A80 = REVERSE(TOKENLEN,TOKEN,REVERSED);
ENDPERIOD/I2 = POSIT(REVERSED,TOKENLEN,'.',1,ENDPERIOD);
OPERANDTYPE = 8;
-* TYPE "ENDPERIOD = <ENDPERIOD"
IF ENDPERIOD EQ 1 THEN GOTO QUOTEIT;
COMPUTE
SPLITFLAG/I2 = IF TOKEN CONTAINS 'T.' OR 'D.' THEN 1 ELSE 0;
SPLITPOS/I2 = IF SPLITFLAG EQ 1 THEN TOKENLEN-ENDPERIOD-1
   ELSE TOKENLEN-ENDPERIOD+1;
TOKEN/A80 = SUBSTR(TOKENLEN,TOKEN,1,SPLITPOS,80,TOKEN);
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
NEWLINE/A80 = SUBSTR(LOCLEN,LOC,SPLITPOS+2,LOCLEN,80,NEWLINE);
LOC/A80 = '"' || TOKEN || '<' || NEWLINE;
LOCLEN = ARGLEN(80,LOC,LOCLEN);
-*TYPE "STRIPATR2 TOKEN = <TOKEN TOKENLEN = <TOKENLEN SPLITPOS= <SPLITPOS"
-*TYPE "STRIPATTR2 LOC= <LOC LOCLEN = <LOCLEN NEWLINE= <NEWLINE"
GOTO QUOTEIT
ENDCASE
CASE SCRNPOS
-* TYPE "SCRNPOS"
COMPUTE OPERANDTYPE = IF TABLEFLAG EQ 1 THEN 25 ELSE 7;
GOTO QUOTEIT
ENDCASE
CASE SCRNVAR
-* TYPE "SCRNVAR TOKEN = <TOKEN OPERANDTYPE= <OPERANDTYPE"
COMPUTE OPERANDTYPE = IF TABLEFLAG EQ 1 THEN 21
   ELSE (IF TOKEN CONTAINS '<D.'
      THEN (IF TOKEN CONTAINS '&' THEN 4 ELSE 3)
      ELSE (IF TOKEN CONTAINS '<T.'
         THEN (IF TOKEN CONTAINS '&' THEN 5 ELSE 2)
         ELSE (IF TOKEN CONTAINS '&' THEN 4
            ELSE (IF CRTFLAG EQ 1 THEN 1 ELSE 3))));
-* TYPE "OPERANDTYPE = <OPERANDTYPE"
COMPUTE
ADJUST = IF TOKEN CONTAINS '<D.' OR '<T.' THEN 5 ELSE 3;
-* DROP THE <D., <T. OR <
TOKEN/A80 = IF TOKEN CONTAINS '<D.' OR '<T.'
   THEN SUBSTR(TOKENLEN,TOKEN,4,TOKENLEN,80,TOKEN)
   ELSE SUBSTR(TOKENLEN,TOKEN,2,TOKENLEN,80,TOKEN);
TOKENLEN/I2 = ARGLEN(80,TOKEN,TOKENLEN);
-* TYPE "LOC= <LOC TOKEN = <TOKEN ADJUST = <ADJUST"
GOTO QUOTEIT
ENDCASE
CASE QUOTEIT
COMPUTE
LOC/A80 = TOKEN || '@"' || SUBSTR(LOCLEN,LOC,TOKENLEN+ADJUST,LOCLEN,80,LOC);
LOC/A80 = OVRLAY(LOC,LOCLEN,' ',1,TOKENLEN+1,LOC);
LOCLEN/I2 = ARGLEN(80,LOC,LOCLEN);
GOTO PARSEOP
ENDCASE
CASE BLANKOUT
COMPUTE
LOC/A80 = OVRLAY(LOC,LOCLEN,' ',1,CHARPOS,LOC);
GOTO EASYSTUFF
ENDCASE
CASE PADIT
COMPUTE
DELIMETER/A1 = SUBSTR(TOKENLEN,TOKEN,CHARPOS,TOKENLEN,80,DELIMETER);
NEWLINE/A80 = SUBSTR(LOCLEN,LOC,1,CHARPOS-1,80,NEWLINE) || '@';
NEWLINE/A80 = NEWLINE || DELIMETER;
NEWLINE/A80 = NEWLINE || '@' || SUBSTR(LOCLEN,LOC,CHARPOS+1,LOCLEN,80,NEWLINE);
NEWLEN/I2 = ARGLEN(80,NEWLINE,NEWLEN);
NUPOS/I2 = POSIT(NEWLINE,NEWLEN,'@',1,NUPOS);
NEWLINE/A80 = OVRLAY(NEWLINE,NEWLEN,' ',1,NUPOS,NEWLINE);
NEWLINE/A80 = OVRLAY(NEWLINE,NEWLEN,' ',1,NUPOS+2,NEWLINE);
LOC/A80 = NEWLINE;
GOTO EASYSTUFF
ENDCASE
CASE HARDER
-* TYPE "HARDER"
IF OPENQUOTE EQ 1 THEN GOTO PATCHIT;
GOTO GARBAGE
ENDCASE
CASE PATCHIT
-* TYPE "PATCHED A LINE"
COMPUTE LOC/A80 = QUOTE||LOC;
GOTO EASYSTUFF
ENDCASE
CASE GARBAGE
TYPE ON LOGFILE "ERROR: GARBAGE = <TOKEN"
GOTO REMOVE
ENDCASE
CASE ABORTLINE
TYPE ON LOGFILE "ERROR: TOKEN COUNT EXCEEDS MAXIMUM. LINE ABORTED"
TYPE ON LOGFILE "LOC= <LOCBUF"
ENDCASE
CASE ABORTPGM
TYPE ON LOGFILE "ERROR:PROGRAM LENGTH EXCEEDS MAXIMUM.
PROGRAM ABORTED"
GOTO EXIT
ENDCASE
DATA
&FOCEXEC
END

-* PROGRAM: METRICS
-* PURPOSE: GENERATES A NUMBER OF SOFTWARE METRICS FROM THE SYSTEMS DATABASE
-* MAIN OUTPUT: METRICSP.FTM WHICH CAN BE IMPORTED INTO A STATS PACKAGE
-* WRITTEN: AU.10.88
-* AUTHOR: C.D. WRIGLEY
-* MODS: AU.22.88 ADD PERSON VECTOR
-*       AU.24.88 ADDED SORT BY FIRM
-************************************************************************
EXEC PGMCLASS
-TYPE FOCEXEC: PGMCLASS
-RUN
EXEC PERSON
-TYPE FOCEXEC: PERSON
-RUN
EXEC FPOINTS
-TYPE FOCEXEC: FPOINTS
-RUN
EXEC HALTEMP
EXEC HALSTEAD
-TYPE FOCEXEC: HALTEMP AND HALSTEAD
-RUN
EXEC MCCABE
-TYPE FOCEXEC: MCCABE
-RUN
-* NOW TAKE THE OUTPUT GENERATED FROM THE ABOVE PROGRAMS AND MERGE THEM
EXEC MERGE
-TYPE MERGING FILES: PGMCLASS, FPOINTS, HALSTEAD, MCCABE, PERSON
-RUN
-* PRODUCE THE FINAL ASCII FILE
EXEC METRICP
-TYPE GENERATING METRICS REPORT
-RUN
DOS TIME

-************************************************************************
-* PROGRAM: MERGE
-* PURPOSE: MERGES SOFTWARE METRICS PRODUCED BY PGMCLASS, FPOINTS,
-*          HALSTEAD AND MCCABE INTO ONE FILE
-* MAIN OUTPUT: METRICS.FTM
-* WRITTEN: AU.10.88
-* AUTHOR: C.D. WRIGLEY
-* MODS: AU.22.88 MERGES PROGRAMMER ID FILE: PERSON
-*       AU.25.88.
SYSNAMES AND FIRMNAMES TEXT MACROS
-************************************************************************
MATCH FILE PGMCLASS
PRINT PGMCLASS AS 'CLASS'
BY COMPANYNAME BY SYSTEMNAME BY PROGRAMNAME
RUN
FILE PERSON
PRINT PROGRAMMER AS 'PERSON'
BY COMPANYNAME BY SYSTEMNAME BY PROGRAMNAME
AFTER MATCH HOLD OLD-OR-NEW
RUN
FILE FPOINTS
PRINT LOC AND SCRN AND SVARIN AND SVAROUT AND RVAR AND MAS AND FLDS AND
SEGS AND IND AND VFILE AND FLEN AND PROJ AND JOIN
BY COMPANYNAME BY SYSTEMNAME BY PROGRAMNAME
AFTER MATCH HOLD OLD-AND-NEW
RUN
FILE HALSTEAD
PRINT N1 AND N2 AND ETA1 AND ETA2 AND LEN AND VOCAB AND VOL AND DSDEX AND NHAT
BY COMPANYNAME BY SYSTEMNAME BY PROGRAMNAME
AFTER MATCH HOLD OLD-AND-NEW
RUN
FILE MCCABE
PRINT KEYWORD AS 'MCABE'
BY COMPANYNAME BY SYSTEMNAME BY PROGRAMNAME
AFTER MATCH HOLD AS METRICS OLD-OR-NEW
END
DOS ERASE FOCSORT.FTM

-************************************************************************
-* PROGRAM: PGMCLASS
-* CATEGORIZES PROGRAMS INTO CLASSES
-* AU.01.88 C.D. WRIGLEY
-* MOD: AU.18.88. INTRODUCED BITMAPPED KEYWORD USAGE APPROACH.
-* A 6 BIT FIELD IS USED TO RECOGNIZE THE OCCURRENCE OF SPECIFIC FOCUS KEYWORDS.
-************************************************************************
DEFINE FILE SYSTEMS
CRTCLASS/I1 WITH LOC = IF LOC CONTAINS 'CRTFORM ' OR '-CRTFORM ' THEN 1 ELSE 0;
MODCLASS/I1 WITH LOC = IF (LOC CONTAINS 'MODIFY ') AND (LOC OMITS '"')
   THEN 1 ELSE 0;
DATACLASS/I1 WITH LOC = IF (LOC CONTAINS 'FIXFORM ' OR 'FREEFORM ')
   AND (LOC OMITS '"') THEN 1 ELSE 0;
TABCLASS/I1 WITH LOC = IF (LOC CONTAINS 'TABLE ' OR 'TABLEF')
   AND (LOC CONTAINS ' FILE ') AND (LOC OMITS '"') THEN 1 ELSE 0;
ONTCLASS/I1 WITH LOC = IF LOC CONTAINS 'ON ' AND LOC CONTAINS 'TABLE '
   AND (LOC CONTAINS 'HOLD ' OR 'SAVE ' OR 'SAVB ')
   AND (LOC OMITS '"') THEN 1 ELSE 0;
CONCLASS/I1 WITH LOC = IF (LOC CONTAINS 'EX ' OR 'EXEC ' OR 'RUN ')
   AND (LOC OMITS '"') AND (LOC OMITS 'FOCEXEC')
   AND (LOC OMITS '-RUN') AND (LOC NE 'RUN') THEN 1 ELSE 0;
END
TABLE FILE SYSTEMS
SUM CRTCLASS NOPRINT AND MODCLASS NOPRINT AND TABCLASS NOPRINT AND
ONTCLASS NOPRINT AND CONCLASS NOPRINT AND DATACLASS NOPRINT AND
COMPUTE BIT6/I6 = IF SUM.MODCLASS GT 0 THEN 100000 ELSE 0; NOPRINT AND
COMPUTE BIT5/I6 = IF SUM.CRTCLASS GT 0 THEN 10000 ELSE 0; NOPRINT AND
COMPUTE BIT4/I6 = IF SUM.DATACLASS GT 0 THEN 1000 ELSE 0; NOPRINT AND
COMPUTE BIT3/I6 = IF SUM.TABCLASS GT 0 THEN 100 ELSE 0; NOPRINT AND
COMPUTE BIT2/I6 = IF (SUM.ONTCLASS GT 0) AND (SUM.ONTCLASS EQ SUM.TABCLASS)
   THEN 10 ELSE 0; NOPRINT AND
COMPUTE BIT1/I6 = IF SUM.CONCLASS GT 0 THEN 1 ELSE 0; NOPRINT AND
COMPUTE PGMCLASS/I6 = BIT1+BIT2+BIT3+BIT4+BIT5+BIT6;
BY FIRM BY SYSTEMNAME BY PROGRAMNAME
SYSNAMES
FIRMNAMES
ON TABLE HOLD AS PGMCLASS
END
DOS ERASE FOCSORT.FTM

-************************************************************************
-* PROGRAM: FPOINTS
-*
-* FPOINTS EXTRACTS FUNCTION POINT LIKE METRICS FROM THE DATABASE
-* AU.10.88 C.D. WRIGLEY
-* AU.22.88 SPLIT SVARS AND MAS COUNTING
-* AU.24.88.
SYSNAMES AND FIRMNAMES ARE TEXT MACROS
-************************************************************************
-* SPECIFY LOGICAL CRITERIA IN DEFINE
DEFINE FILE SYSTEMS
JOINFLAG/I5 = IF (((LOC CONTAINS 'MATCH') AND (LOC CONTAINS 'FILE'))
   OR ((LOC CONTAINS 'JOIN ') AND (LOC OMITS 'CLEAR')))
   AND (LOC OMITS '"') THEN 1 ELSE 0;
PROJFLAG/I5 = IF LOC CONTAINS 'ON ' AND LOC CONTAINS ' TABLE '
   AND (LOC CONTAINS ' HOLD ' OR ' SAVE ') THEN 1 ELSE 0;
END
MATCH FILE SYSTEMS
COUNT LOC AND FILENAME AS 'MAS' AND
SUM.#FIELDS AS 'FLDS' AND
SUM.#SEGMENTS AS 'SEGS' AND
SUM.#INDEXES AS 'INDEX' AND
SUM.#FILES AS 'VFILE' AND
SUM.FLENGTH AS 'FLEN'
BY FIRM BY SYSTEMNAME BY PROGRAMNAME
SYSNAMES
FIRMNAMES
RUN
FILE SYSTEMS
COUNT KEYWORD AS 'SCRN'
IF KEYWORD EQ '-CRTFORM' OR 'CRTFORM'
BY FIRM BY SYSTEMNAME BY PROGRAMNAME
SYSNAMES
FIRMNAMES
AFTER MATCH HOLD OLD-OR-NEW
RUN
FILE SYSTEMS
COUNT OPERAND AS 'SVARIN'
IF OPERANDTYPE EQ 1 OR 2 OR 5
BY FIRM BY SYSTEMNAME BY PROGRAMNAME
SYSNAMES
FIRMNAMES
AFTER MATCH HOLD OLD-OR-NEW
RUN
FILE SYSTEMS
COUNT OPERAND AS 'SVAROUT'
IF OPERANDTYPE EQ 3 OR 4
BY FIRM BY SYSTEMNAME BY PROGRAMNAME
SYSNAMES
FIRMNAMES
AFTER MATCH HOLD OLD-OR-NEW
RUN
FILE SYSTEMS
COUNT OPERAND AS 'RVAR'
IF OPERANDTYPE IS-FROM 20 TO 22
BY FIRM BY SYSTEMNAME BY PROGRAMNAME
SYSNAMES
FIRMNAMES
AFTER MATCH HOLD OLD-OR-NEW
RUN
FILE SYSTEMS
SUM PROJFLAG AS 'PROJ' AND JOINFLAG AS 'JOIN'
BY FIRM BY SYSTEMNAME BY PROGRAMNAME
SYSNAMES
FIRMNAMES
AFTER MATCH HOLD AS FPOINTS OLD-OR-NEW
END
DOS ERASE FOCSORT.FTM

-* HALTEMP: AUG.08.88 C.D. WRIGLEY
-* EXTRACTS OPERATOR AND OPERAND DATA FROM SYSTEMS FILE.
-* TO BE RUN BEFORE HALSTEAD.FEX
-* AU.25.88.
SYSNAMES AND FIRMNAMES ARE TEXT MACROS
-************************************************************************
TABLE FILE SYSTEMS
COUNT KEYWORD
BY FIRM BY SYSTEMNAME BY PROGRAMNAME BY KEYWORD NOPRINT
SYSNAMES
FIRMNAMES
ON TABLE HOLD AS HOLD1
END
TABLE FILE SYSTEMS
COUNT OPERAND
BY FIRM BY SYSTEMNAME BY PROGRAMNAME BY OPERAND NOPRINT
SYSNAMES
FIRMNAMES
ON TABLE HOLD AS HOLD2
END
MATCH FILE HOLD1
SUM KEYWORD AND CNT.KEYWORD
BY COMPANYNAME BY SYSTEMNAME BY PROGRAMNAME
RUN
FILE HOLD2
SUM OPERAND AND CNT.OPERAND
BY COMPANYNAME BY SYSTEMNAME BY PROGRAMNAME
AFTER MATCH HOLD AS HALTEMP OLD-AND-NEW
END
DOS ERASE HOLD1.*
DOS ERASE HOLD2.*
DOS ERASE FOCSORT.FTM

-************************************************************************
-* HALSTEAD AUG.08.88 C.D. WRIGLEY
-* PRODUCES HALSTEAD SOFTWARE METRICS FROM HALTEMP HOLD FILE
-* RUN HALTEMP FIRST
-* AU.25.88. SYSNAMES AND FIRMNAMES ARE TEXT MACROS
-************************************************************************
TABLE FILE HALTEMP
PRINT E04 NOPRINT AND E05 NOPRINT AND E06 NOPRINT AND E07 NOPRINT AND
COMPUTE
N1/I6 = E04;
N2/I6 = E06;
ETA1/I6 = E05;
ETA2/I6 = E07;
LEN/I6 = N1 + N2;
VOCAB/I6 = ETA1 + ETA2;
VOL/I6 = LEN*(LOG(VOCAB)/LOG(2));
DSDEX/D6.3 = ETA2/N2;
NHAT/I6 = (ETA1*(LOG(ETA1)/LOG(2))) + (ETA2*(LOG(ETA2)/LOG(2)));
BY COMPANYNAME BY SYSTEMNAME BY PROGRAMNAME
ON TABLE HOLD AS HALSTEAD
END
DOS ERASE FOCSORT.FTM
DOS ERASE HALTEMP.FTM
DOS BEEP
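
The arithmetic in the HALSTEAD listing above can be checked outside FOCUS. What follows is a minimal Python sketch of the same formulas, not the author's code: the function names `halstead_from_counts` and `halstead_from_tokens` are hypothetical, the zero guards are an added assumption, and results are kept as floats whereas the listing stores most of them in I6 integer fields. The inputs correspond to the HALTEMP columns: total operator and operand occurrences (N1, N2) and the number of distinct operators and operands (ETA1, ETA2).

```python
import math
from collections import Counter


def halstead_from_counts(n1, n2, eta1, eta2):
    """Halstead measures from total (n1, n2) and distinct (eta1, eta2)
    operator/operand counts, following the COMPUTE block in the listing."""
    def log2_or_zero(x):
        # Assumption: treat log2(0) as 0 so empty programs do not raise.
        return math.log2(x) if x > 0 else 0.0

    length = n1 + n2          # LEN = N1 + N2
    vocab = eta1 + eta2       # VOCAB = ETA1 + ETA2
    return {
        "N1": n1,
        "N2": n2,
        "ETA1": eta1,
        "ETA2": eta2,
        "LEN": length,
        "VOCAB": vocab,
        "VOL": length * log2_or_zero(vocab),          # LEN * log2(VOCAB)
        "DSDEX": eta2 / n2 if n2 else 0.0,            # ETA2 / N2
        "NHAT": eta1 * log2_or_zero(eta1) + eta2 * log2_or_zero(eta2),
    }


def halstead_from_tokens(operators, operands):
    """Derive the four counts from token lists, as HALTEMP does with
    COUNT ... BY KEYWORD and COUNT ... BY OPERAND."""
    op_counts = Counter(operators)
    od_counts = Counter(operands)
    return halstead_from_counts(
        sum(op_counts.values()),   # total operator occurrences
        sum(od_counts.values()),   # total operand occurrences
        len(op_counts),            # distinct operators
        len(od_counts),            # distinct operands
    )
```

For example, `halstead_from_counts(10, 6, 4, 4)` gives a length of 16, a vocabulary of 8 and a volume of 16 × log2(8) = 48.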

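The PGMCLASS listing earlier in this appendix encodes six yes/no keyword-usage flags as decimal digits of a single integer (100000 for MODIFY down to 1 for EXEC/RUN). A minimal Python sketch of that encoding is given here for illustration only; the function name `program_class` and its parameter names are hypothetical, and each argument stands for the per-program sum of the corresponding DEFINE flag.

```python
def program_class(modify, crtform, data, table, on_table, control):
    """Return the decimal-coded class for one program.

    Each argument is the number of lines in the program that set the
    corresponding flag (MODCLASS, CRTCLASS, DATACLASS, TABCLASS,
    ONTCLASS, CONCLASS in the listing).
    """
    bit6 = 100000 if modify > 0 else 0
    bit5 = 10000 if crtform > 0 else 0
    bit4 = 1000 if data > 0 else 0
    bit3 = 100 if table > 0 else 0
    # BIT2 is set only when every TABLE request is paired with an
    # ON TABLE HOLD/SAVE/SAVB line, i.e. the two counts are equal.
    bit2 = 10 if on_table > 0 and on_table == table else 0
    bit1 = 1 if control > 0 else 0
    return bit1 + bit2 + bit3 + bit4 + bit5 + bit6
```

A program with MODIFY, FIXFORM and three TABLE requests that all hold their output, `program_class(2, 0, 1, 3, 3, 0)`, is classed as 101110.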