UBC Theses and Dissertations

A comparison of the ability of novices and experienced third generation language programmers to learn… Pulfer, Charles E. 1987

A COMPARISON OF THE ABILITY OF NOVICES AND EXPERIENCED THIRD GENERATION LANGUAGE PROGRAMMERS TO LEARN FOURTH GENERATION LANGUAGES

by

CHARLES E. PULFER

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN BUSINESS ADMINISTRATION

in

THE FACULTY OF GRADUATE STUDIES
Commerce and Business Administration

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
March 1987
© Charles E. Pulfer, 1987

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Commerce and Business Administration
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date: March 1987

ABSTRACT

This thesis describes research which was carried out to determine whether novices could program in fourth generation languages as well as experienced third generation programmers. It was thought that experience with a third generation language could be transferred to a fourth generation environment. This hypothesis was tested using a completely randomized block design lab experiment consisting of two factors and a block. The two factors were experience with third generation languages, and complexity of the task. The block was the educational institution where the lab sessions were conducted. Each of the factors and the block had two levels.
The specific hypotheses tested were:

1. Experienced third generation language programmers will record higher mean scores on both simple and complex tests of fourth generation languages.
2. The difference in test scores, between simple and complex fourth generation language tasks, will be greater for novices than for experienced third generation language programmers.
3. Experience with other software tools, especially report writers, query languages, and other fourth generation languages, will affect the subjects' performance on the fourth generation language tests.

Using FOCUS as the fourth generation language, lab sessions were run for fifty-seven subjects. The results indicate that experience with third generation languages affects a subject's performance on simple tests of fourth generation languages. The results also indicate that the experience has no effect on complex tests of fourth generation languages. Because of a lack of data, no meaningful conclusions could be reached for hypothesis number three.

We feel experienced third generation language programmers scored higher than novices on simple 4GL reporting tests because experienced 3GL programmers had skills which were very similar to the skills needed in a simple 4GL reporting application. There are several possible ways of explaining why experienced programmers could do no better than novices on complex 4GL reporting tests. One possible explanation is that, because complex 4GL reporting commands are so different from third generation language commands, third generation language programmers had no advantage over novices. A second explanation might be that the complex test was too difficult, or too long. As a result of this difficulty, no one was able to perform very well.
We conclude that experienced programmers should be preferred over novices when applications involve simple 4GL commands. More research is necessary to determine if in fact novices can perform as well as experienced third generation language programmers on complex 4GL tasks.

Table of Contents

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
1. INTRODUCTION
2. REVIEW OF THE LITERATURE ON FOURTH GENERATION LANGUAGES
   2.1 DEFINITION OF FOURTH GENERATION LANGUAGES
   2.2 CLAIMS MADE ABOUT FOURTH GENERATION LANGUAGES
3. RESEARCH RELEVANT TO FOURTH GENERATION LANGUAGES
   3.1 MEASURES OF EASE-OF-LEARNING
   3.2 RESEARCH ON FOURTH GENERATION LANGUAGES
   3.3 QUERY LANGUAGE RESEARCH
   3.4 THIRD GENERATION LANGUAGE RESEARCH
4. THEORY
5. METHOD
   5.1 PARTICIPANTS
   5.2 DESIGN
      5.2.1 COMPLEXITY FACTOR
      5.2.2 EXPERIENCE FACTOR
      5.2.3 CLUSTER ANALYSIS
      5.2.4 STATEMENT OF THE MODEL EQUATION
      5.2.5 OTHER VARIABLES USED IN THE ANALYSIS
   5.3 MEASURE OF THE DEPENDENT VARIABLE
   5.4 PROCEDURE
      5.4.1 PILOT TEST
      5.4.2 THE ACTUAL EXPERIMENT
   5.5 ESTIMATION OF THE SAMPLE SIZE NEEDED
   5.6 STATISTICAL METHODS USED
      5.6.1 HYPOTHESIS ONE
      5.6.2 HYPOTHESIS TWO
      5.6.3 HYPOTHESIS THREE
   5.7 VARIABLES USED IN THE ANALYSIS
6. RESULTS
   6.1 HYPOTHESIS ONE
   6.2 HYPOTHESIS TWO
   6.3 HYPOTHESIS THREE
   6.4 SUMMARY OF THE PERFORMANCE OF THE SUBJECTS
7. DISCUSSION OF THE RESULTS
BIBLIOGRAPHY
APPENDICES
   1. JUDGES' RATINGS OF THE SUBJECTS
   2. SIMPLE TEST
   3. COMPLEX TEST
   4. MARKING SCHEME FOR THE TESTS
   5. EXPERIMENTAL PROCEDURES
   6. REPORT GENERATION TRAINING MANUAL
   7. PRACTICE PROBLEMS
   8. QUESTIONNAIRE
   9. ESTIMATION OF THE SAMPLE SIZE NEEDED
   10. DATA COLLECTED DURING THE EXPERIMENT

LIST OF TABLES

I. COMPARISON OF 4GL CATEGORIES [JENKINS AND SCHUSSEL]
II. CATEGORIZATION OF FOCUS COMMANDS
III. JUDGES' RATINGS OF THE SUBJECTS' EXPERIENCE
IV. OTHER SOFTWARE USED BY THE SUBJECTS
V. VARIABLES USED IN THE ANALYSIS
VI. ANOVA TABLE FOR MODEL 1.0
VII. TABLE OF REGRESSION STATISTICS FOR HYPOTHESIS THREE
VIII. MEANS, STANDARD ERRORS, AND NUMBER OF OBSERVATIONS FOR EACH TREATMENT

LIST OF FIGURES

1. MARTIN'S MODEL OF A FOURTH GENERATION ENVIRONMENT
2. EXPERIMENTAL DESIGN
3. FREQUENCY CHART FOR THE MEANS OF THE JUDGES' RATINGS
4. AGREEMENT OF THE TWO METHODS OF SUBJECT SEPARATION
5. NUMBER OF SUBJECTS IN EACH TREATMENT
6. GRAPH OF NOVICE, INTERMEDIATE AND EXPERT CLUSTERS
7. EXPERIMENTAL DESIGN WITHOUT BLOCKING
8. GRAPH OF SUBJECT SCORES VERSUS NUMBER OF REPORT WRITER PROGRAMS WRITTEN
9. GRAPH OF SUBJECT SCORES VS MEAN OF THE JUDGES' RATINGS (SIMPLE TEST)
10. GRAPH OF SUBJECT SCORES VS MEAN OF THE JUDGES' RATINGS (COMPLEX TEST)
11. GRAPH OF SUBJECT SCORES VERSUS EACH OF THE FOUR TREATMENTS

ACKNOWLEDGEMENT

This research was supported by an NSERC Postgraduate Scholarship awarded to Charles Pulfer in 1985.

1. INTRODUCTION

This research was prompted by the growing acceptance of a new type of software in the corporate information systems environment. Though these "fourth generation languages" are growing in acceptance (in 1985 a study found that fourteen percent of IBM installations in the U.S. use fourth generation languages)¹, information systems managers still know little about the realities of this type of software. Information systems managers have trouble defining the term "fourth generation language."
Their inability to define the term is caused by the fact that software vendors, marketing everything from report writers to database management systems, label their products fourth generation languages (4GL's). As well, these software vendors claim that their products can not only be used by computer novices, but also by experienced programmers. If information systems managers are to make proper use of fourth generation languages, they will need a clearer indication of what they are, who can use them, and when they should be used. The research was begun with these aims in mind.

¹ "4GLs enter dp mainstream despite some resistance", Computing Canada: Software Report, (May 1985), p.6.

The purpose of this experiment was to provide insight into the ability of novices and experienced third generation language programmers to learn a fourth generation language. The primary goal was to determine whether knowledge of third generation languages affected a person's ability to learn a fourth generation language. A secondary goal was to determine whether novices had more difficulty with complex fourth generation language commands than did experienced third generation language programmers. It was hoped that answers to these questions would help information systems managers decide

1. who should do the programming in a fourth generation environment, and
2. whether novices should be allowed to produce more complicated applications, or whether this task should stay within the information systems department.

The rest of the thesis proceeds as follows. Chapter 2 reviews the literature on fourth generation languages, Chapter 3 reviews some prior research relevant to fourth generation languages, Chapter 4 reviews some relevant theory on learning, Chapter 5 describes the method used for the experiment, Chapter 6 analyses the data obtained from the experiment, and Chapter 7 discusses the results.

2. REVIEW OF THE LITERATURE ON FOURTH GENERATION LANGUAGES

2.1 DEFINITION OF FOURTH GENERATION LANGUAGES

As previously mentioned, one of the biggest problems involved with doing research in this area is the lack of a precise definition of these languages. Without a precise definition, information systems managers are not equipped to handle the combination of marketing literature, and "buzzwords" produced in the practitioner literature. For this reason, the first step in this thesis was to uncover the characteristics which define a fourth generation language.

Martin² defines a fourth generation language as a tool which will result in a productivity improvement of at least ten to one over COBOL. Fourth generation languages also use an order of magnitude fewer lines of code, when developing an application, than would be needed with COBOL, PL/1 etc. Therefore fourth generation languages might be characterized as high productivity languages. The claimed productivity improvement is a result of the languages using a diversity of other mechanisms, besides sequential commands, such as filling in forms or panels, screen interaction and built-in defaults.

Unfortunately little research has been done to substantiate the productivity improvement claims put forward by vendors. Therefore, we cannot be sure that fourth generation languages increase productivity. In addition, we cannot be sure if a fourth generation language can be used for all applications, or whether it is only suitable for a certain core of applications. Third generation languages such as COBOL and FORTRAN are domain independent.
They are used across a variety of application areas and do not incorporate domain specific knowledge. Some very high level languages (e.g. IFPS, GPSS) are domain dependent. That is to say, they can only be applied to solve specific problems.

Fourth generation languages vary greatly in their power and capabilities. While a third generation language could create all or most applications, some fourth generation languages are designed only for a specific class, or range of applications. Some are highly restricted in their range, while others can handle a diversity of applications well. In some cases, a specific fourth generation language might have to be chosen for a specific application. On the other hand, some 4GL's are just as flexible as COBOL and can be used to produce complete applications in almost any area of business.

Fourth generation languages are also characterized as problem oriented³ or nonprocedural. As Leavenworth and Sammet state:

"It is hard to convey an intuitive notion of languages which in some sense are higher than FORTRAN, COBOL, PL/1 etc. The most common term used for this concept has been nonprocedural, and the most common phrase has been 'what' rather than 'how'. That phrase refers to the facility of a user to indicate the goals (what) he wishes to achieve rather than the specific methods of solution (how) that must be used."⁴

In discussing the advantages of a nonprocedural language Leavenworth and Sammet state:

"The solution should be specified implicitly in terms of structures or abstractions which are relevant to the problem rather than those operations, data and control structures which are convenient for some machine organization." [ibid, p.2.]

Leavenworth and Sammet also identify some characteristics of these nonprocedural or problem oriented languages:

1) Associative referencing - the programmer does not have to specify access paths, or conduct a search for a specific data structure. The data is accessed on some intrinsic property of the data.
2) Aggregate operators - no need for looping.
3) Elimination of arbitrary sequencing - if a program satisfies the "single assignment" test (no variable is assigned values by more than one statement) then the order of the statements is immaterial.
4) Pattern directed structures - search for a pattern without specifying how to search.

The degree of nonproceduralness of a language is not absolute, but, rather, is relative. A third generation language with a statement such as A=(B * C)+D can be considered nonprocedural when compared to the equivalent operation in assembly language. Generally, we can state that a fourth generation language is more nonprocedural than a third generation language. Some of the fourth generation commands can be expressed in terms of a series of third generation commands.

² James Martin, Application Development Without Programmers (Toronto: Prentice Hall, 1982), p.28.
³ Steven L. Mandell, Computers and Data Processing: Concepts and Applications (New York: West Publishing, 1985), pp.246-247.
⁴ Burt M. Leavenworth and Jean E. Sammet, "An Overview of Nonprocedural Languages", IBM Research Report RC4685 (1974), p.1.
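The 'what' versus 'how' distinction, and the aggregate operator characteristic in particular, can be illustrated with a short sketch. Python is used here only as a convenient modern notation; it is not one of the languages discussed in the thesis, and the sample records and function names are hypothetical.

```python
# Hypothetical sample data: (department, salary) records.
EMPLOYEES = [
    ("SALES", 30000),
    ("SALES", 34000),
    ("MIS", 41000),
]


def total_salary_procedural(records, department):
    """'How': the programmer spells out the search and the loop."""
    total = 0
    index = 0
    while index < len(records):
        dept, salary = records[index]
        if dept == department:
            total = total + salary
        index = index + 1
    return total


def total_salary_nonprocedural(records, department):
    """'What': an aggregate operator (sum) applied to the records that
    have a given property, with no explicit loop counter or access path."""
    return sum(salary for dept, salary in records if dept == department)
```

Both functions return the same total for a given department. The second states the goal (the sum of salaries for matching records) and leaves the traversal to the language, which is the sense in which a statement can be called more nonprocedural than its step-by-step equivalent.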
But, as Elder states:

"A fourth generation language that is entirely nonprocedural will allow users to retrieve information without detailed programming but will be limited to queries only. To develop applications that involve any logical decisions and/or the processing of data (e.g. sorting) a language must have procedural aspects.... The difference between a procedural fourth generation language and third generation languages is the number of procedural instructions necessary to write an application."⁵

Most fourth generation languages contain procedural commands, for example IF's and GOTO's, in order to handle more complex logic.

Schmidt attempts to build database querying capabilities into PASCAL.⁶ We can see that the PASCAL commands are substantially longer than the equivalent commands in IBM's SQL, or a fourth generation language. This gives an indication of the differences between a fourth generation language and a third generation language. The savings realized in lines of code by a 4GL are usually a result of more powerful nonprocedural commands incorporated into a 4GL. Savings are also realized as a result of default options chosen by a 4GL. Most 4GLs can automatically select the format of a report, put page numbers on it, select chart types for graphic display, put labels on the axes or on column headings, and ask the user in a friendly manner when it needs more information.

Jenkins⁷ has classified current software generator products into three classes:

1) application generators,
2) code generators, and
3) productivity enhancement tools.

⁵ Marvin Elder, "SALVO - A Fourth Generation Language for Personal Computers", Proceedings of the National Computer Conference 1984
⁵ (cont'd) (Montvale, N.J.: AFIPS, 1984), p.564.
⁶ Joachim W. Schmidt, "Some High Level Constructs for Data of the Type Relation", ACM Transactions on Data Bases, 2(1977), pp.247-261.
⁷ Milton A. Jenkins, "Surveying the Software Generator Market", Datamation, 31, No.17(1985), pp.247-261.

Application generators have an end user orientation. Application generators will produce complete working applications, such as payroll or accounts receivable, from specifications given by the user. FOCUS and RAMIS II are examples. Code generators produce a coded program, in third generation language, from the given specifications. This coded program would then have to be compiled and run. These code generators are oriented more towards technical users who can "fine tune" the code. Productivity enhancement tools facilitate the development process but cannot produce whole applications by themselves (examples of productivity enhancement tools include query languages, test data generators, automatic documentation, and report generators). Both application generators and code generators fall into the category of fourth generation languages, but productivity enhancement tools are not considered 4GL's because they can produce only a portion of the total application.

Schussel outlined three types of fourth generation languages in a slightly different manner.⁸

1) Interpretive programming languages are nonprocedural and easy to learn. Examples include NOMAD2, RAMIS II, FOCUS and NATURAL. These languages do not handle complex logic well.
2) Function generators interact with the developer in the form of dialogues and screens. Examples are ADS/O, MANTIS and IDEAL. As an example, an IDEAL application consists of two major classes of components:
   1. Fill-in-the-blank screens for description of the application: its inputs and outputs, reports, and panels.
   2. A high level procedural language incorporating relational commands.
   With this system, the developer is insulated from the operating system, teleprocessing monitor, and data base manipulation.
3) Compiled system generators will generate COBOL or PL/1 code from high level specifications. Examples include PacBase, GAMMA, TELON, and UMBRELLA.

⁸ Schussel, the President of Digital Consulting, is quoted in the following article: Miriam Cu-Uy-Gam, "Do-it-yourself is on the way for system development", Computing Canada: Software Report, (May 1985), p.9.

Schussel's compiled system generator category is equivalent to Jenkins' code generator category. Both function generators and interpretive programming languages fall into Jenkins' application generator category. Table I summarizes Jenkins' and Schussel's categorizations. Jenkins' term "application generators" is used in a business environment as a synonym for a fourth generation language. Grochow provides an introduction and bibliography on the topic of application generators.⁹

Fourth generation languages use database management systems to improve productivity. Fourth generation languages such as FOCUS, NOMAD or MAPPER are built on top of, and integrated with, their own database management system. Other 4GL's can be joined to database management systems already in place. One of the keys to efficient application generation is the fact that the data structure for an application already exists; it is represented in the database data dictionary which the software can use. The creator of an application is not required to design the data or its structuring. The data dictionary is the foundation of many report generators, query languages and application generators.
In addition to describing the data, such dictionaries may contain report headings, alternate names for data (aliases), report formats, screen layouts, and titles for fields that can be placed in column headings.

Fourth generation languages suitable for users, as well as computer professionals, are emerging from three primary sources:

1) data base management systems designed for mainframes that include a 4GL for report generation, for query, and for prototyping business computer applications;
2) relational data base management programs designed, initially, for personal computers with integrated spreadsheets and other functions, including a 4GL for applications development; and
3) 4GL's designed originally as application development tools.

TABLE I - COMPARISON OF 4GL CATEGORIES [JENKINS AND SCHUSSEL]

| CATEGORY | 4GL | PRODUCES APPLICATION | PRODUCES 3GL CODE | PRODUCES PART OF AN APPLICATION | EXAMPLE |
|----------|-----|----------------------|-------------------|---------------------------------|---------|
| APPLICATION GENERATOR [JENKINS] | YES | YES | NO | NO | FOCUS |
| CODE GENERATOR [JENKINS] | YES | NO | YES | NO | GAMMA |
| PRODUCTIVITY ENHANCEMENT TOOLS [JENKINS] | NO | NO | NO | YES | COBOL REPORT WRITER |
| INTERPRETIVE PROGRAMMING LANGUAGES [SCHUSSEL] | YES | YES | NO | NO | FOCUS |
| FUNCTION GENERATORS [SCHUSSEL] | YES | YES | NO | NO | IDEAL |
| COMPILED SYSTEM GENERATORS [SCHUSSEL] | YES | NO | YES | NO | GAMMA |

⁹ Jerrold M. Grochow, "Application Generators: An Introduction", Proceedings of the National Computer Conference 1982 (Montvale, N.J.: AFIPS, 1982), pp.391-392.
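The role of the data dictionary described in this section can be sketched in a few lines. This is a minimal illustration in Python, not a description of how FOCUS or any other product is implemented; the dictionary layout, field names, aliases and widths are all hypothetical. The point is only that the report writer takes column titles, aliases and formats from the dictionary, so the person requesting the report does not supply them.

```python
# Hypothetical data dictionary: field name -> descriptive entries.
DATA_DICTIONARY = {
    "EMP_ID": {"title": "EMPLOYEE", "alias": "ID", "width": 8},
    "SALARY": {"title": "SALARY", "alias": "SAL", "width": 10},
}


def resolve(name, dictionary=DATA_DICTIONARY):
    """Map a field name or one of its aliases to the real field name."""
    for field, entry in dictionary.items():
        if name in (field, entry["alias"]):
            return field
    raise KeyError(name)


def report(rows, fields, dictionary=DATA_DICTIONARY):
    """Build report lines; headings and column widths are defaults
    taken from the dictionary rather than from the caller."""
    names = [resolve(f, dictionary) for f in fields]
    header = " ".join(
        dictionary[n]["title"].ljust(dictionary[n]["width"]) for n in names
    )
    lines = [header.rstrip()]
    for row in rows:
        line = " ".join(
            str(row[n]).ljust(dictionary[n]["width"]) for n in names
        )
        lines.append(line.rstrip())
    return lines
```

A request such as `report(rows, ["ID", "SAL"])` names only the fields wanted, by alias; the headings and layout follow from the dictionary entries.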
To be considered a 4GL, a language must, firstly, be tied to, or incorporate, a database management system, which includes backup, recovery, and security features, as well as a data dictionary. It must use a nonprocedural language which has the following features: associative referencing, aggregate operators, elimination of arbitrary sequencing, and pattern directed structures. It must have the ability to handle some complex logic with procedural code and incorporate an interactive query facility and report generator. It should incorporate a screen formatter and offer a productivity improvement over COBOL. Usually this is achieved via a reduction in the number of lines of code, and a reduction in the number of hours of programming effort. Lastly, it must make intelligent default assumptions concerning what the user needs.

Some existing products have only some of the above characteristics. They can be classified as productivity enhancement tools, not 4GL's. Simple report writers, query languages, graphics packages, databases and screen generators do not qualify. Rather, all of these features must be combined in an integrated package to produce a fourth generation language. Figure 1 illustrates a good 4GL environment.¹⁰

Given the above characteristics the term "fourth generation language" still remains generic rather than specific. Products which have all of the above characteristics do not necessarily have similar design or similar syntax. This is the result of the multitude of vendors in the 4GL field.
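The definition above amounts to a checklist, which can be written down as a small sketch. The feature labels below are hypothetical names chosen for this illustration, and the sketch assumes that every listed characteristic is required, following the statement that all of the features must be combined in an integrated package.

```python
# Hypothetical labels for the characteristics listed in the text.
REQUIRED_FEATURES = frozenset({
    "dbms_with_data_dictionary",
    "nonprocedural_language",
    "procedural_code_for_complex_logic",
    "interactive_query_facility",
    "report_generator",
    "screen_formatter",
    "productivity_gain_over_cobol",
    "intelligent_defaults",
})


def classify(features):
    """Return a (label, missing) pair for a product's feature set.

    A product with every required characteristic is labelled a fourth
    generation language; otherwise it is labelled a productivity
    enhancement tool and the missing characteristics are listed.
    """
    missing = REQUIRED_FEATURES - set(features)
    if not missing:
        return "fourth generation language", []
    return "productivity enhancement tool", sorted(missing)
```

Under this reading, a simple report writer that offers only `report_generator` is classified as a productivity enhancement tool, with the remaining seven characteristics reported as missing.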
For the purpose of this thesis, third generation languages will be defined as languages which, firstly, obtain their data from files rather than databases, and are "procedure-oriented". In other words, the programmer must specify "how" rather than what he wants to accomplish. In addition they may involve data typing, and are domain independent. Examples of 3GL's include PASCAL, BASIC, COBOL, FORTRAN, and PL/1.

It is obvious that 4GL's have different characteristics than languages such as COBOL, and PL/1. As a result, learning a 4GL might be easier, or more difficult than learning a third generation language.

¹⁰ This illustration is taken from James Martin, Fourth Generation Languages (Lancaster: Savant Institute, 1983), p.203.

FIGURE 1 - MARTIN'S MODEL OF A FOURTH GENERATION ENVIRONMENT
[Figure not reproduced. Labels recoverable from the extraction: Communications; Mailbox facilities; Intelligent Data Base; Automatically derived data; Data-base triggers; Integrity checks; Audit controls; Human Usage Aids: on-line documentation creation, menu generator, help aids, computer-based teaching.]

2.2 CLAIMS MADE ABOUT FOURTH GENERATION LANGUAGES

Authors writing on the subject of fourth generation languages have not only failed to better define fourth generation languages, but have also tended to make extravagant claims about the languages. Generally, these claims have not been backed up by empirical evidence. The following quotes are examples of some of the claims made concerning the capabilities of fourth generation languages. Read and Harmon state: "With 4GL's, programming productivity gains of 1,000% to 2,000% over 3GL's are routinely achieved by personnel with no prior programming experience...
About 75% of all programming can be done by end users with only two days of training, specialist programmers, however, will still handle complex applications."¹¹ Other quotes are similar, "System development with FOCUS takes about one fourth of the time... Programmers multiplied their output by 5 to 10 times by using Ads/Online."¹² James Martin, in his book Application Development Without Programmers, discusses the 4GL NOMAD. He states:

"Beginners who have never programmed find it very easy to achieve results of value by using the nonprocedural statements... It is easy to understand and modify what another person has written in NOMAD... Most end users employ only a small subset of the language. They can be taught to achieve powerful results in a few hours."¹³

The literature also indicates that 4GL's offer a more responsive tool for prototyping because of their interpretive nature and their nonprocedural code. They produce shorter programs because they do not include some of the control statements found in 3GL's, those dealing with input/output format, loop control, handling error conditions, and memory allocation.

Many successful implementations of systems, designed with fourth generation languages, have been reported in the literature. The most famous case occurred at the Santa Fe Railway Co., where a railway reporting system was completely rebuilt by nonprogrammers in a few weeks. [Ibid, p.175]

¹¹ Nigel S. Read and Douglas L. Harmon, "Readers' Forum: Language Barrier to Productivity", Datamation, 29, No.2(1983), pp.209-210.
¹² David Kull, "Nonprocedural Languages: Bringing up the Fourth Generation", Computer Decisions, 15, No.13(1983), pp.156-162.
¹³ Martin, Application Development Without Programmers, pp.206-208.
Some writers go so far as to state that neither programming experience, nor technical training are prerequisites for the use of 4GL programming techniques.¹⁴

On the other hand, some writers offer caveats to the claims made by others. Dr Tom Purcell, director of IS with Borg Warner Chemical Corp., states, "As easy as MANTIS (4GL) is to use, and as much as it increases programmer productivity, it's still an order of magnitude too difficult for the average user. Until they make it easier to use, they can't have a programmerless DP department."¹⁵

Wilco tries to dispel some of the myths of 4GL's. She argues that 4GL's are not that easy to learn because of their fairly rigid syntax rules. She also argues that a 4GL cannot be both flexible and easy to use. The more functions that are built in to make it easy the more the user is restricted to the software designer's preconception about what the application system will look like. She concludes that there is still a need for professional programmers.¹⁶

Read and Harmon also try to dispel some of the myths of 4GL's:

"The glossy brochures and magazine ads touting 4GL's often claim nonprogrammers can produce their own reports with little training. On the whole this is true, but what is not explained is that such claims apply to report generation for single applications using single databases employing only a minor part of the full power of the 4GL. With current 4GL's programming complexity rises exponentially with product complexity, and to be functional at the upper levels requires a considerable amount of knowledge and experience." [Read and Harmon, "Assuring MIS Success", pp 118-119]

¹⁴ Nigel S. Read and Douglas L. Harmon, "Assuring MIS Success", Datamation, 27, No.2(1981), p.109.
¹⁵ Quoted in an article by Micheal Tyler, "Cincom Shifts Gears", Datamation, 29, No.6(1983), p.65.
¹⁶ Elaine Wilco, "System Development Without Programming", Computer Data, 9, No.2(1984), p.19.

The literature also indicates that fourth generation languages have a number of weaknesses. For example, 4GL's are resource "hogs" and can use up to 50% more CPU time than 3GL's. Good database design is important in order to minimize the use of resources. The computer using a 4GL must have virtual memory and high speed I/O handling. [Ibid, p.116] Fourth generation languages are not suitable for number crunching operations [Ibid, p.116]. They are weak at character manipulation [Ibid, p.116]. They are not suited for an environment with a high number of transactions per hour - over 30,000 [Cu-Uy-Gam, p.9].

Fourth generation languages also lack language standards. Different vendors offer completely different 4GL's, and even the same 4GL may not be compatible over all hardware. The ease of documentation and maintenance is questionable.¹⁷ There may be a need to fit the software to particular applications. No one type of 4GL is appropriate for all situations. Resistance of existing data processing staff is a problem. [Martin, Application Development Without Programmers, pp.45-47.]

Finally, 4GL procedural code is harder to read than COBOL code. When used, the procedural code of a 4GL decreases the nonprocedural benefits of a 4GL dramatically. From his experience, Johnson suggests a limit of 300 statements for 4GL programs.¹⁸ There are exceptions: the EDP Analyzer reports that with the use of Burroughs' LINC 4GL, larger programs benefitted more from the use of a 4GL than the shorter programs.¹⁹

¹⁷ Paul C. Tinnirelo, "Software Maintenance with Fourth Generation Languages", Proceedings of the National Computer Conference 1984 (Montvale, N.J.: AFIPS, 1984), pp.251-257.
¹⁸ James R. Johnson, "A Prototypical Success Story", Datamation, 29, No.11(1983), p.256.

Considering the weaknesses it is obvious that an Information Systems manager must study his environment carefully before adopting a fourth generation language. This is not to say that a fourth generation language cannot be extremely valuable when used correctly.

The above claims and counter-claims serve only to confuse readers about the strengths and weaknesses of fourth generation languages. From the reports it appears that end users could learn a small portion of a 4GL in a few days and use this knowledge for querying and report generation. But, in order to program large production applications, more knowledge is necessary. It appears that end users, using a 4GL, still cannot develop a large production system. The EDP Analyzer Special Report on 4GL suggests that the applications best suited to a 4GL are those subject to rapid changes, or where the need for ad hoc reporting is high, as in personnel or budgeting. [Ibid, p.15]

Some research has investigated the improvements in programming productivity brought about by using the 4GL FOCUS. Harel and McLean's work lends support to some of the claims of increased productivity although they studied only relatively small systems.²⁰

Still one question remained unanswered - could novices learn 4GL's as well as experienced programmers?
[19] EDP Analyzer, Special Report: Fourth Generation Languages and Prototyping (Vista,Ca.: Canning Pub.,1984), p.29.
[20] Elie C. Harel and Ephrain R. McLean, "The Effects of Using a Nonprocedural Computer Language on Programmer Productivity", MIS Quarterly, 9, No.2(1985), pp.109-119.

Read and Harmon offer their opinion: "Since programming techniques with a 4GL are so different from those of earlier generation programming languages, everyone has to start from square one, which opens up a large new pool of programming talent ... These programming techniques have to be learned from scratch, because there is almost no similarity between programming in COBOL and programming in a 4GL. In fact, for a variety of reasons, a knowledge of COBOL may be a hindrance." [Read and Harmon, "Assuring MIS Success", p.120].

Schleuter, in an extract from his book User Designed Computing, quoted in Nicoll-Griffith, adds: "It is a significant fact that the more sophisticated and experienced a DP person is in conventional methodologies, the less likely it is that such a person will be comfortable or even effective when trained to do report processing application design."[21]

While consulting, Martin has encountered the same phenomenon. He states: "New graduates often learn and become skilled with the new techniques faster than many established programmers. This phenomenon has been observed and measured with many application generators and 4GL's. IBM uses ADF extensively for its own internal development. It has measured the performance of many ADF users and discovered that new graduates do much better on average, than experienced programmers. National CSS staff sometimes refer to the NOMAD programs written by old COBOL programmers as "NOBOL" programs.
The COBOL programmers, thinking in COBOL-like terms, fail to use the powerful but different constructs in the NOMAD language".[Martin, Fourth Generation Languages, p.64]

Intuitively, it is reasonable to assume that experience in one programming language would help in learning another, especially when learning and using the procedural aspects of a 4GL. But the above anecdotal evidence points in the opposite direction.

[21] Mike Nicoll-Griffith, rev of User-Designed Computing, by Louis Schleuter Jr., MAPPER was the First User Command Language (Montreal: Canadian Pacific Consulting,1983), p.4.

3. RESEARCH RELEVANT TO FOURTH GENERATION LANGUAGES

Because programming is a complex, but poorly defined task, researchers have experienced many problems in conducting experiments with programmers. Most of these problems have surfaced in experiments conducted with third generation languages. The lessons learned from these experiments are also relevant to researchers studying fourth generation languages.

Brooks and Shneiderman discuss the problems caused by the large variation in programmer performance.[22],[23] Brooks states that because the ratio of programmer performance can vary from four-to-one to twenty-five-to-one, it is difficult to assemble a group of programmers with equivalent skills. This kind of confound could easily invalidate the results of an experiment. Brooks proposes a large sample as a partial solution. Both Shneiderman and Brooks suggest using within-subject tests for experiments where multiple levels are involved.

It is also important that the subjects of the experiment be representative of the population to whom we wish to apply our findings.
Brooks suggests that subjects be relatively uniform with regard to their characteristics and abilities before the experiment, in order to avoid introducing confounds. Shneiderman proposes that a lot of data be collected on the subjects. Examining this data for correlations with the dependent variable will help the researcher determine if confounds exist. For example, for each subject, the job experience, number of courses taken, number of programming courses taken, number of languages known, number of years programming, and number of months with each language should be collected.

[22] Ruven Brooks, "Studying Programming Behavior Experimentally: The Problems of Proper Methodology", Communications of the ACM, 23(1980), p.209.
[23] Ben Shneiderman, "Improving the Human Factors Aspect of Database Interactions", ACM Transactions on Database Systems, 3(1978), pp.423-425.

Reisner suggests that an aptitude measure be developed to ensure that the groups are equal. She also lists questions to consider when assessing query languages. Did the subjects have the same kind of background as the people who are expected to use the query language? Were they of the same educational level, the same intelligence level? Was their level of motivation the same as that of the intended users? If some subjects were called "programmers" or "more advanced" how were these classes defined?[24] Prior to the experiment, researchers should make every effort to ensure that subjects are, in all respects, as equal in ability as possible.

Brooks also discusses the kind of programs that should be used in an experiment testing comprehension. Programs which are too easy would result in consistently high scores. Thus, poor variance in the results would be produced, making it difficult to reach any conclusions from the data.
Secondly, programs should be representative of real world applications. Brooks suggests 50 to 100 lines of code. Longer programs would be more representative but they would be very hard to administer in a lab experiment.

[24] Phyllis Reisner, "Human Factors Studies of Database Query Languages: A Survey and Assessment", ACM Computing Surveys, 13, No.1(1981), pp.27-28.

3.1 MEASURES OF EASE-OF-LEARNING

Shneiderman breaks programming into five tasks: learning, composition, comprehension, debugging, and modification. [Shneiderman, p. 419] As we will see, how well a person has learned programming has often been measured by requiring the subject to perform any of the other four tasks. For instance, if a novice has thoroughly learned a 3GL such as COBOL, he will be able to write a COBOL program, and understand and modify an existing program. This principle is used extensively in testing university computer science students.

To measure ease-of-learning Brooks,[25] Reisner,[26] and Shneiderman[27] suggest some alternatives: requiring a modification to the program, location of a bug, response to a set of multiple choice questions, a subjective estimation of the clarity of a program, or a hand tracing of the execution sequence. The following questions could be asked for hand tracing: the value of a variable at a specific point in the program, the sequence of values assumed by a variable, the number of times a particular statement is executed, the sequence of statements executed, the output of a program, a brief description of the function of the program, and the impact of an alteration. Some of these tests are not valid for a 4GL because of its nonprocedural nature. The execution of a 4GL is not always strictly sequential.
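As an illustration of what such hand-tracing questions look like in practice, consider the following minimal sketch. It is written in Python purely for convenience; the function name total_large_orders, the order amounts, and the threshold are hypothetical and do not come from any of the studies cited. The sketch records the information a subject would be asked to produce by hand: the final value of a variable, the sequence of values it assumes, and the number of times a particular statement is executed.

```python
def total_large_orders(amounts, threshold):
    """Sum the amounts above a threshold, recording what a hand trace would ask for.

    All names and data here are hypothetical, chosen only for illustration.
    """
    total = 0
    history = [total]   # sequence of values assumed by `total`
    executions = 0      # number of times the accumulation statement runs
    for amount in amounts:
        if amount > threshold:
            total += amount          # the statement being counted
            executions += 1
            history.append(total)
    return total, history, executions


# A subject tracing by hand would be asked to predict these three results.
total, history, executions = total_large_orders([40, 120, 75, 300], 100)
print(total, history, executions)
```

Questions of this kind presuppose a visible, step-by-step execution order, which is why they fit a procedural 3GL better than the nonprocedural portions of a 4GL.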
The literature indicates that researchers have had problems with some of the measures of understanding. Subjective answers by participants of experiments have not proven to be reliable. Open-ended questions can be difficult to score. Weissman tried to develop good measures of understanding while doing experiments on the effects of certain variables (e.g. comments, structure) on the complexity of a program.[28] He concluded that although hand simulation is not a valid measure of understanding, it is an important factor which can contribute to one's understanding of a program. Quiz scores tended to go up after hand simulation. He also found that fill-in-the-blank questions were inadequate and decided to use open-ended quiz questions instead.

[25] Brooks, pp.211-213.
[26] Phyllis Reisner, pp.17-19.
[27] Ben Shneiderman, "Exploratory Experiments in Programmer Behavior", International Journal of Computer and Information Sciences, 5, No.2(1976), pp.125-126.
[28] Larry Weissman, "Psychological Complexity of Computer Programs: An Experimental Methodology", SIGPLAN Notices, 9, No.6(1974), pp.30-34.

Shneiderman was not satisfied with existing measures, so he developed a new measure of comprehension: memorization/recall.[29] He presented programmers with scrambled and unscrambled FORTRAN programs and they were asked to memorize them in a given time. They were then asked to reproduce the programs. He showed that more experienced programmers could better reproduce the unscrambled programs because of their chunking ability. The results for scrambled programs were not significant. He hypothesized that memorization and comprehension were correlated. Unfortunately, the results of other studies force us to question the validity of this measure. In Vessey's study, this measure did not correlate well with other measures of programming skill, and did not correctly predict programmer performance.[30]

Composition skills can be measured by asking the subject to write a program, or part of a program. The problem here is that writing a complete program can be time-consuming, and writing a short program can undermine the external validity of the research. In fourth generation languages, programs are usually short, so a short program would still be valid.

[29] Ben Shneiderman, "Measuring Computer Program Quality and Comprehension", International Journal of Man-Machine Studies, 9(1977), pp.465-478.
[30] Iris Vessey, An Investigation of the Psychological Processes Underlying the Debugging of Computer Programs, Unpublished Doctoral Dissertation, University of Queensland, Australia (1984), pp.218-220.

When measuring ease-of-use of a query language, Reisner (Reisner's use of "ease-of-use" here is equivalent to my "ease-of-learning") suggests some tasks that can be used as measures. The tasks are query writing, query reading (translate meaning into English), query interpretation (what will it do given the data), question comprehension, memorization, and problem solving (given a problem, what queries will solve it).[Reisner, pp.16-17] She also lists some kinds of tests used to measure ease-of-use: final exams of learning, immediate comprehension (tests while teaching), reviews, retention (how well can the language be used after a long period of time), and relearning.
To date, the bulk of the research done in query languages has used the task of query writing, and final exams, immediate comprehension and retention tests as measures of ease-of-use.

Shneiderman argues against using time as a measure of quality because those who finish first are not necessarily the best.[31] He supports setting a fixed time length for task performance because it focuses attention on correctness and quality.

Before we can measure subject comprehension the subjects have to be taught the language. To date, most researchers have used the traditional classroom method. Reisner[32] justified the method and stated that "classroom teaching is relatively quick to implement and known to be effective, and because it provides opportunity for on the spot feedback between the teacher and the student." But she admitted that computer instruction would have been more reproducible. If more than one class is taught, or if multiple instructors teach different classes, the equivalence of teaching between classes is hard to establish. A more uniform method would be to teach using a training manual. This method has the added advantage of being most representative of how workers learn on the job.

[31] Shneiderman, "Improving the Human Factors Aspect of Database Interactions", p.426.
[32] Phyllis Reisner, "Human Factors Evaluation of Two Data Base Query Languages - Square and Sequel", Proceedings of the National Computer Conference 1975 (Montvale, N.J.: AFIPS,1975), p. 451.

3.2 RESEARCH ON FOURTH GENERATION LANGUAGES

To date, research on fourth generation languages has concentrated on the productivity advantages of 4GL's over third generation languages.
As yet, no research has examined the ability of experienced third generation language programmers, and novices, to learn fourth generation languages. The two studies summarized below are the only pieces of research conducted in the area of fourth generation languages up to this point.[33]

Munnecke conducted a descriptive study of a fourth generation language.[34] He compared the fourth generation language MUMPS, used at the Massachusetts General Hospital, with COBOL, on a strictly linguistic basis. He believed that a computer language should support users, linguistically, on their own terms, adapt to their needs, and be as forgiving and friendly as possible. He compared the access methods of COBOL/IMS with MUMPS. COBOL/IMS has 12 different complicated access methods while MUMPS has only one simple access method. Also, COBOL/IMS database pointers involve complicated physical references while MUMPS navigates more logically. The amount of documentation that supports each system is also compared. Over 1700 pages must be read for COBOL/IMS. MUMPS documentation is much shorter. Munnecke concludes that the 4GL is better suited to meeting the needs of users, within the range of applications it can handle.

[33] There has been a lot written in the practitioner literature, but, as I mentioned earlier, it cannot be considered "research".
[34] Thomas Munnecke, "A Linguistic Comparison of MUMPS and COBOL", Proceedings of the National Computer Conference 1980, (Montvale,N.J.: AFIPS,1980), pp.723-729.

Recently, Harel and McLean studied the effects of using a nonprocedural language on programmer productivity.[Harel and McLean, pp.109-119]. A field experiment was conducted in order to compare COBOL with the 4GL FOCUS, in terms of programmer productivity and program efficiency.
Beginners (people who had written fewer than 20 programs in that language) and experts (more than 20 programs) were asked to program report generation applications (simple and complex). In every case, the 4GL FOCUS was found to be faster but less efficient in terms of machine resources used. Programmers with little experience did significantly better with FOCUS than COBOL, while the difference for experts was not as great. This suggests FOCUS might be a good end user language, but this conclusion is weakened by the fact that the less experienced programmers were far from being novices, and, in fact, all the programmers were professionals. Also the "complex" applications were not really complex. But neither were they overly simple. The complex task took approximately three days to program in COBOL. In longer programs, COBOL might have had an advantage over FOCUS.

The results of the study imply that FOCUS would be a good end user language because novices can learn it quickly. On the other hand, the experiment does not deal with the issue of whether novices or experienced programmers learn 4GL's more easily. Programmers need to be tested more directly to see if novices can learn and use a 4GL as easily as an experienced 3GL programmer.

3.3 QUERY LANGUAGE RESEARCH

Some comparison of novice versus experienced programmer learning has been conducted with query languages. Query languages are similar to 4GL's because they are nonprocedural, and because they are often incorporated into 4GL's. Reisner et al. compared two data base query languages, SQUARE and SEQUEL, using both novices (no programming experience) and programmers (had taken one programming course).[35] Their main finding was that SEQUEL was easier to learn than SQUARE, but they also had other interesting results.
Programmers were able to learn the new nonprocedural languages somewhat faster than nonprogrammers (12 hours versus 14 hours of class time). The difference in scores between programmers and nonprogrammers on the quizzes was significant (p < .01). Programmers scored higher than nonprogrammers. On the one hand, since query languages are similar to 4GL's, we might expect the same results for subjects learning a 4GL. On the other hand, the learning time comparison must be viewed with skepticism. Reisner explains that the pace of classes was determined by the slower learners. Therefore, a few slow learners in one of the classes could seriously affect the learning time of the whole class, undermining the validity of any comparisons between classes.

In testing the students, Reisner et al. used five review quizzes during the classes, a final exam at the end, and a memory test one week later. With the exception of the memory test, students were allowed to use reference materials.

[35] Phyllis Reisner, Donald D. Chamberlain, and Raymond F. Boyce, "Human Factors Evaluation of Two Database Query Languages - Square and Sequel", Proceedings of the National Computer Conference 1975 (Montvale,N.J.: AFIPS,1975), pp.447-452.

Based on their results, Reisner et al. argue that query languages can be learned in layers.[36] Given the basic subset of a query language, an inexperienced programmer can learn to use the language very quickly with few errors. Because this subset is so simple, experienced programmers may not have an advantage in learning when only the basics are considered. Their advantage may only become apparent when the more complex procedural aspects are taught. The same results might be expected in this research when scores on the simple task are compared between novices and experienced programmers.
In a similar study, Welty and Stemple compared the nonprocedural query language SEQUEL with the procedural query language TABLET.[37] They believed there was a point of complexity in languages beyond which a procedural language is easier than the nonprocedural. They defined a metric for procedurality and went on to show that TABLET was a better language for complex queries. Again, programmers outperformed nonprogrammers in learning the language (significant at the .05 level). They also had a higher retention score one week later. The scoring, teaching and subject classifications were basically the same as Reisner's. Again, this seems to imply that experienced programmers have an advantage in learning a 4GL, especially the more procedural aspects used for complex applications. But the study also reveals that, when the quiz scores for the experienced programmers are compared, SEQUEL scores are lower on average than the SQUARE scores. This is not the case for inexperienced (novice) programmers. Again, this may mean that programmers do not have as large an advantage when learning a nonprocedural language. Still, the results indicate that experienced programmers can outperform novices when learning query languages.

[36] Phyllis Reisner, "Use of Psychological Experimentation as an Aid to Development of a Query Language", IEEE Transactions on Software Engineering, 3(1977), p.222.
[37] Charles Welty and David W. Stemple, "Human Factors Comparison of a Procedural and a Non-Procedural Query Language", ACM Transactions on Database Systems, 6(1981), pp.626-649.

3.4 THIRD GENERATION LANGUAGE RESEARCH

Chrysler, in his study of what affects the productivity of programmers, found that experience does have significant impact.[38] He measured the following variables: the number of months of programming experience, the number of months of experience using COBOL, the number of months experience using the specific COBOL compiler, the number of months experience in programming business applications, and the number of months experience programming for the current employer. Programmers developed code in the COBOL language. All variables were found to be significantly correlated to the productivity of programmers. Even the number of months programming experience (not necessarily all in COBOL) was significant. This indicates that experience with one third generation language improves performance in another 3GL.

In another COBOL study, Gordon et al. showed that programmers with at least three years experience in COBOL outperformed students who had just learned COBOL.[39] The professionals finished their programs with fewer errors and fewer runs. These studies indicate that experience with one 3GL assists when using another 3GL, and the more experienced the programmer the better. But these results may not apply in a 4GL environment.

[38] Earl Chrysler, "Some Basic Determinants of Computer Programming Productivity", Communications of the ACM, 21(1978), pp.472-483.
[39] J.D. Gordon, A. Salvadori, and C.K. Capstick, "An Empirical Study of COBOL Programmers", INFOR, 15(1977), pp.229-241.

Kennedy studied the learning curves of naive users of a new system. He showed that the anxiety and fear of naive users can have an effect on their ability to learn.[40]

DuBoulay and O'Shea report a study by Mayer comparing different program structures.[41] The GOTO, IF THEN and a nonprocedural construct were compared for comprehension.
The novices were not asked to write programs but to interpret and answer questions about a sporting competition, expressed in the various program-like forms. Results indicated that the nonprocedural construct was the most comprehensible. The harder the question was, the greater the superiority of the nonprocedural representation. This study is weak, in the sense that real programs and programmers are not involved. Yet, it again indicates that nonprocedural structures are easier for novices to comprehend, which has some implications for learning a 4GL. It suggests that novices are at less of a disadvantage, as compared to experienced programmers, when learning a nonprocedural language than when learning a procedural language.

[40] T.C.S. Kennedy, "Some Behavioural Factors Affecting the Training of Naive Users of an Interactive Computer System", International Journal of Man-Machine Studies, 7(1975), pp.817-834.
[41] B. DuBoulay and T. O'Shea, "Teaching Novices Programming", Human Interactions with Computers, ed H.T. Smith and T.R.G. Green (New York: Academic Press,1980), pp.159-162.
[42] Edward A. Youngs, "Human Errors in Programming", International Journal of Man-Machine Studies, 6(1974), pp.361-376.

Some research has been conducted on debugging programs. Youngs used thirty novices and twelve professional programmers in his study of debugging.[42] Novices were university students taking their first programming course and professionals held or had held professional programming jobs. Youngs compared the number of errors committed by the two groups. Experienced programmers committed fewer errors and corrected their programs more quickly. Youngs also discovered that novices made many more syntax and semantic errors than the professionals but the same number of logic errors.
He pointed out especially troublesome areas for novices, like looping and input/output formatting. If these could be eliminated (as in 4GL's), novices could compete more evenly with professional programmers. This could explain why novices can learn 4GL's as well as experienced 3GL programmers.

A debugging study conducted by Vessey concluded just the opposite.[Vessey, pp.206-222] She showed that more experienced programmers do not necessarily debug better than less experienced programmers. Managers classified programmers as experts or novices on the basis of the number of years experience they had accumulated. Vessey concluded that this classification was not a good predictor of debugging performance. Even though Vessey was dealing with debugging and not programming, her study indicated that the number of years of programming alone cannot predict how proficient a person will be at programming.

In conclusion, it is obvious that there is a need for research in the area of fourth generation languages. The questions of whether novices can prepare their own applications with fourth generation languages, and whether or not 3GL programmers can transfer their skills into a 4GL environment, have not been addressed. The one thing that we can conclude from the research done to date is that experienced programmers have always outperformed novices in using query languages and third generation languages.

4. THEORY

How well a person can learn a fourth generation language, after using a third generation language, is a question of transfer of training. Most of the theory relevant to learning comes from psychology. These theories have yet to be applied to programming studies. The most important theories involve the concepts of positive and negative transfers in learning.
Garry and Kingsley explain the theory as follows: "When training in one situation or one form of activity affects one's ability in another type of activity or one's performance in different situations we have what is commonly understood as transfer of training. An attempt to operate a tractor or a truck based upon one's knowledge of operating an automobile requires transfer of training in order to succeed in the task. In countless ways we use the results of past learning to meet the demands of new situations. In many ways the results of past learning interfere with new learning, for instance, the difficulty we experience in correctly pronouncing a foreign language because of our habitual manner of pronouncing sounds."[43]

If prior experience facilitates learning in a new situation, we say that positive transfer has occurred. Osgood showed that the key factor is the similarity of stimuli in different situations, when the same behaviors are required.[44]

When prior learning interferes with learning in a new situation we say that negative transfer has occurred. Typically negative transfer occurs when persons are required to learn new responses to stimuli to which other responses have, previously, been learned. An example is learning to drive a car with manual transmission after having learned on a car with automatic transmission.

[43] Ralph Garry and Howard L. Kingsley, The Nature and Conditions of Learning (Englewood Cliffs, N.J.: Prentice Hall,1970), p.512.
[44] C.E. Osgood, Method and Theory in Experimental Psychology (New York: Oxford,1953), pp.495-548.
The question is - which one of these cases applies to an experienced third generation language programmer attempting to learn a fourth generation language?

When fourth generation languages are examined, we notice that their commands accomplish more than third generation commands. In other words, one fourth generation language command is equivalent to a number of third generation language commands. For example, the COUNT command in FOCUS is equivalent to a third generation language DO loop:

    I = 0
    WHILE NOT END-OF-FILE DO
        READ RECORD
        I = I + 1
    END WHILE
    COUNT = I

In some cases a 4GL command is directly equivalent to a 3GL command. For instance, the TYPE command in FOCUS has a similar function to the WRITELN command in PASCAL.

Because programmers use the same algorithm to solve the problem no matter what the language, 3GL programmers should be able to translate some of their skills to a 4GL environment. But the different syntax, nonprocedurality, and conciseness of fourth generation languages could make the transfer of skills very difficult.

We gain some insight into which of the above two possibilities is more likely to be true by examining two theories of learning transfer. Garry and Kingsley describe the two theories.[Garry and Kingsley, pp.513-531] The theories have opposing views of the conditions which make transfer of training possible.

The theory of transfer by similarity states that the more two functions have in common, the more likely it is that training in the first will tend to improve the second. The commonality of the two situations is measured by the constituents, or components of the situation.
The mere presence of common components does not assure positive transfer; under some conditions of training, they produce negative transfer. The amount of transfer, due to identical features of two functions, varies with the locus of identity or the phases of the functions in which identity occurs. Identity in the response phase is conducive to far more positive transfer than identity in the stimulus factors. Thus, it is easier to learn to respond to a new situation in an old way than it is to develop a new method of response to an old situation, for, in the latter case, the interference from previously formed habits is greater. Since, in our case, we have a new method of response (4GL) rather than a new situation, transfer is more difficult.

An opposing view is developed in the theory of transfer through relationships. This theory argues that positive transfer is due not only to similarity of content, but also to the similarities in the patterns of relationships. It is often claimed that competitive athletics contribute greatly to the successful performance of an American soldier in combat. According to the relationship theory, transfer occurs because both activities involve coordinated teamwork of individuals performing related operations, not because of the similarity of the required abilities (speed, strength, agility). The strategy involved in outthinking and outmanoeuvring an opponent would be more important than the specific tactics employed. If this were the case in programming, 3GL programmers should be able to transfer their training to 4GL's because both tasks involve individuals solving business problems using data processing skills.
In this case, it is not the similarity of syntax which is an important condition of transfer, but rather it is the overall data processing strategies and knowledge that make transfer possible.

The majority of theories of programming developed so far support transfer through relationship. Shneiderman's work on comprehension hypothesizes that experienced programmers can outperform novices because they have better chunking ability than novices. [45] Experienced programmers have a better understanding of the semantics and logic of a program, so they can view it at a higher level than novices (at the problem level rather than at the syntax level). In other words, the most important skills acquired by a programmer are the semantic and logic skills, not familiarity with the syntax. It would be reasonable to assume that once these skills are acquired, programmers could use them in many language environments.

[45] Ben Shneiderman, Software Psychology: Human Factors in Computer and Information Systems (Toronto: Little, Brown and Co., 1984), pp.46-53.

Shneiderman explains his model as follows. The programmer first conceives the problem in general terms such as general programming strategies. He refers to these general plans as "internal semantics". He suggests that this internal representation progresses from a very general outline to a more specific plan, to a specific generation of code focusing on minute details. Shneiderman's "funneling" view of problem solving (going from general to specific) was first introduced as a result of Duncker's experiment based on asking subjects to solve complex problems aloud. [46] Once the programmer has worked out the internal semantics, the construction of a program is a relatively straightforward task. The programmer draws on his knowledge of semantic structures and syntax to write the code.
The program may be composed in any familiar programming language.

[46] K. Duncker, "On Problem Solving", Psychological Monographs, 58 (1945), p.270.

Other authors have developed theories along the same lines. Chase and Simon conducted similar work with chess players. [47] They showed that experienced chess players could memorize chessboard positions more easily than less experienced players. Simon used the "chunking" hypothesis to explain this phenomenon. Higher level chess players do not memorize the position of each piece, but rather memorize meaningful "chunks" of pieces.

Mayer hypothesizes that experienced programmers have an advantage over novices because they have "anchoring ideas" in long term memory, which they use when learning. He explains:

"In the course of meaningful learning the learner must come into contact with the new material, then must search long term memory... for anchoring ideas and then must transfer these ideas to short term memory so that they can be combined with new incoming information." [48]

Novices are at a disadvantage because they have not encountered similar syntax, and because they do not have built-in algorithms. Some programmers can immediately identify the purpose of loops, subprograms and other structures because they have seen similar constructs before.

[47] William G. Chase and Herbert A. Simon, "Perceptions in Chess", Cognitive Psychology, 4, No.1 (1973), pp.55-81.
[48] Richard E. Mayer, "The Psychology of How Novices Learn Computer Programming", ACM Computing Surveys, 13, No.1 (1981), p.122.

The remaining question to be considered is: do the differences between fourth generation languages and third generation languages require that the strategies and algorithms used to attack a programming problem also be changed? Shneiderman would base his answer to this question on the semantic similarity of the two languages.
He states:

"Learning a first language requires development of both semantic concepts and specific syntactic knowledge, while learning a second language involves learning only a new syntax, assuming the same semantic structures are retained." [49]

[49] Shneiderman, Software Psychology: Human Factors in Computer and Information Systems, p.48.

Though 4GL's are very different, they are used to accomplish the same tasks as 3GL's and must therefore be semantically similar. To use 4GL's in complex tasks, programmers must use the same concepts used in 3GL's, specifically, control breaks, file structures, and field formats. An experienced 3GL programmer will have a thorough knowledge of these concepts. This will help him in any 4GL work.

To recap, Shneiderman's and Simon's work indicates that experienced programmers make better programmers than novices because of, among other things, their ability to chunk problems. They emphasize that insight into the semantics and logic of a program is more important than mere syntactic knowledge. Further to this point, 4GL's are semantically similar to 3GL's because they are used to accomplish the same goals. Therefore, experienced programmers should be able to transfer their skills to 4GL's. Research in query languages, which are very similar to 4GL's, supports this conclusion. Reisner's and Welty's research in query languages has shown that experience in other programming languages does improve performance with query languages.

Accordingly, if experienced programmers and novices are tested on their ability to learn 4GL's, experienced third generation language programmers should record higher mean scores than novices on both simple and complex tests of fourth generation languages.
This is empirically tested using the following one-sided test:

Ho: Experienced 3GL programmers' scores on 4GL tests will be equal to novices' scores on 4GL tests.
Ha: Experienced 3GL programmers' scores on 4GL tests will be greater than novices' scores on 4GL tests.

The practitioner literature indicates that novices can program in fourth generation languages, but, once the application becomes complex, experienced data processing people are needed. When novices are faced with more complex applications, they lack what Mayer calls "anchoring ideas". They lack the data processing concepts which would help them understand these problems. In 4GL's, as problems become more complex, the 4GL commands used become many times harder, semantically, than the simpler commands. Shneiderman has hypothesized that knowing the semantics of a language allows programmers to transfer their skills to other languages. Since 3GL's and 4GL's are semantically similar, experienced 3GL programmers should be able to apply their experience to complex 4GL problems. Accordingly, the difference in test scores between simple and complex 4GL tasks should be greater for novices than for experienced 3GL programmers. This is empirically tested as follows:

Ho: The difference in simple and complex test scores for experienced programmers is equal to the difference in simple and complex test scores for novices.
Ha: The difference in simple and complex test scores for experienced programmers is less than the difference in simple and complex test scores for novices.

Finally, from previous work with programmers, Shneiderman and Reisner have found that other variables, such as work experience and prior experience with other programming languages, improve programming performance.
Accordingly, the third hypothesis to be tested is: the number of query language programs written, the number of report writer programs written, and the number of 4GL programs written will be positively correlated with 4GL test scores.

5. METHOD

To test the developed hypotheses, a laboratory experiment was conducted in which the performance of novices and experienced third generation language programmers using a fourth generation language was measured across two levels of task complexity.

5.1 PARTICIPANTS

Fifty-seven volunteers from two different educational institutions participated in the study. Twenty-four students (mostly MBA students) at the University of British Columbia (UBC) took part in the experiment, as well as thirty-three Computer Systems Diploma (two year program) students from the British Columbia Institute of Technology (BCIT). These two groups of students were used in order to improve the external validity of the study, and to provide the necessary mix of novice and experienced programmers.

The MBA students were chosen for two reasons. Firstly, these students had little programming experience, and, therefore, provided a supply of novices for the experiment. Secondly, as business students, they represented the end-users of the future: people who have little knowledge of computers and considerable knowledge within business areas, and who may be doing the programming in the future.

The Computer Systems Diploma students were also chosen for two main reasons. They provided a supply of experienced student programmers for the experiment. The second reason again relates to the issue of external validity. It was felt that Computer Systems students were more representative of the type of programmer who would use a fourth generation language.
Specifically, Computer Systems students were preferred over university computer science students because their education stressed business programming, especially the use of COBOL, whereas university computer science programs place more emphasis on scientific computing and theory. Since the experiment specifically tested the use of a fourth generation language in a business reporting setting, business oriented students were required rather than scientifically oriented students. For this reason, university computer science students were ruled out.

The difficulty in using two distinct groups of subjects was that educational background could become confounded with experience. In order to minimize the chances of other variables affecting the results of the experiment, two steps were taken. The first step involved using the educational institution as a blocking variable in the statistical analysis. This will be explained in more detail in the Design section of this thesis. The second step involved collecting information on other possible confounding variables (e.g. number of query language programs written), and regressing them against scores obtained on the fourth generation language tests. This analysis revealed whether any other variables, besides experience with a third generation language, could explain a person's ability to learn and use a fourth generation language.

It was thought that the advantages of having two representative groups outweighed the mentioned disadvantages. Using only Computer Science or MBA students could have cast doubts on the external validity of the experiment.

5.2 DESIGN

The experimental design used was a randomized block design (see Figure 2). Two factors were studied: experience with third generation languages, and task complexity. Each factor had two levels. In addition, educational institution was used as a blocking variable.
5.2.1 COMPLEXITY FACTOR

One of the two factors was task complexity. After learning some FOCUS fourth generation language reporting commands, subjects were asked to write a test involving either a simple reporting task or a complex reporting task. Two levels of task complexity were used in order to test hypothesis number two, concerning the ability of novices to program complex applications. The subjects were randomly assigned to one of the two tasks.

The first task tested the subjects' knowledge of a set of approximately seventeen FOCUS commands. These commands form the basic subset of FOCUS reporting commands. They are the easiest to learn, and can be used to generate reports involving printing, summing, counting, and subtotalling.

The second task tested the subjects' ability to produce more complex reports involving multiple files, temporary fields, complex summarization and more detailed formatting. The commands needed to accomplish these tasks increased the subjects' required command set by seven. Including the simple commands, this increased the subjects' inventory of commands to twenty-four. Thus, the task became more complex for two reasons. Firstly, the user had to have a larger vocabulary of FOCUS commands, and, secondly, the user had to be able to understand the workings of the more complex commands.

FIGURE 2 - EXPERIMENTAL DESIGN
[For each block (UBC and BCIT): a two-by-two grid of TASK COMPLEXITY (SIMPLE, COMPLEX) by 3GL PROGRAMMING EXPERIENCE (NOVICE, EXPERIENCED).]

The boundary between the two categories (simple and more complex) is not extremely sharp. Other than the above reasoning, no precise method of categorization is advanced. Notwithstanding, there is no doubt that the complex test was more difficult because of the additional commands needed.
Indeed, the analysis of variance that was performed later indicated that subjects scored much lower on the complex test. The difference in test scores between simple and complex tests was significant (p value = .0011).

Both the complex and simple commands were explained in the Report Generation Manual. Although there were only seven complex commands, they had to be read more carefully to be understood, and the user needed practice before he or she was able to understand how they worked. The simple commands were more intuitive. In addition, using the more complex commands took more time because the user had to consider the many implications (results) of the command. As the commands became more complex, the user had to consider the effect of using one command in combination with others. For example, the SUBHEAD command could only be used after consideration of the effects of the SUBTOTAL command. The two groups of commands used in the tests appear in Table II below, along with the reasoning for placing them in the appropriate category.

TABLE II - CATEGORIZATION OF FOCUS COMMANDS

COMMAND | CATEGORY | REASONING
TABLE FILE | simple | Used to begin reporting sessions. User has only to type this as well as the filename of the file he wants to use.
PRINT, SUM, COUNT | simple | These are the basic verbs. Meanings are the same as their normal English definition.
HEADING, FOOTING | simple | Prints headings. The user has only to enclose his text in quotes.
IF | simple | Used for record selection. Easy to understand because it has the same meaning that "if" does in everyday vocabulary.
BY, ACROSS, OVER | simple | Simple sort display commands. Either sorts the fields horizontally or vertically, or prints them over each other.
SUBTOTAL, SUB-TOTAL | simple | Simplest of totalling commands. Can only add fields together at the sort break.
AS | simple | Used to replace headings with another name supplied by the user. User has only to supply the new name in quotes.
NOPRINT | simple | To suppress printing of a field. User specifies the field to be suppressed.
UNDER-LINE, SKIP-LINE, PAGE-BREAK | simple | User has only to specify the field he wants this action to affect.
JOIN | complex | User requires understanding of indexes and field formats. Join involves the use of at least two files. User also needs to know how to refer to the joined file, and how the JOIN is related to DEFINE and TABLE.
DEFINE, COMPUTE | complex | Used to create a new field from those given. Involves other concepts such as concatenation and assigning new field values selectively.
RECAP, SUMMARIZE | complex | Used to produce other than simple subtotals at the control break. User needs an understanding of both COMPUTE and SUBTOTAL combined to use these control break commands.
SUBFOOT, SUBHEAD | complex | Used to print summary at control break. Need an understanding of where control breaks will occur. Also need to know how to print current database field values in the text.

When the user begins to use the complex commands, an understanding of database theory is an advantage. Commands like JOIN and DEFINE require some understanding of database concepts. The simple commands are more similar to the third generation language commands than the more complex commands.

5.2.2 EXPERIENCE FACTOR

The experience factor had two levels, novice and experienced 3GL. Subjects were classified using information provided in the questionnaire. Three independent judges determined whether the subjects were novice programmers or experienced 3GL programmers. One of the judges was an MIS academic, and the other two were MIS professionals.
The subjects were classified on a seven point scale according to their prior experience with third generation languages. Experience with other software, such as query languages or database management systems, was not considered, because the purpose of the classification was to test hypothesis number one, concerning the influence that prior work with third generation programming languages had on the ability to learn a fourth generation language. Experience with these other types of software is considered under hypothesis number three.

The instructions that were distributed to the judges appear in Appendix 1. The judges' ratings appear in Table III. The reliability between judges, as measured by the correlations, was .65 for judges one and two, .78 for judges one and three, and .68 for judges two and three. From the results, we can see that judge number two used a slightly different rating scale than judges one and three, but, overall, the ratings were similar. The mean score of the three judges for each subject was calculated.

TABLE III - JUDGES' RATINGS OF THE SUBJECTS' EXPERIENCE

SUBJECT  JUDGE 1  JUDGE 2  JUDGE 3  MEAN
1        3.0      1        2        2.00
2        4.5      2        4        3.50
3        1.0      1        1        1.00
4        4.5      5        5        4.83
5        1.0      1        1        1.00
6        4.5      5        4        4.50
7        4.5      7        5        5.50
8        5.0      3        5        4.33
9        4.5      4        4        4.17
10       4.5      3        3        3.50
11       3.5      1        2        2.17
12       4.5      4        4        4.17
13       3.5      1        2        2.17
14       4.5      3        4        3.83
15       4.5      3        6        4.50
16       4.5      3        4        3.83
17       4.5      5        4        4.50
18       3.5      2        3        2.83
19       5.5      5        6        5.50
20       3.0      1        2        2.00
21       4.5      3        3        3.50
22       2.0      1        2        1.67
23       1.0      1        1        1.00
24       5.5      7        5        5.83
25       2.0      1        1        1.33
26       4.5      1        2        2.50
27       5.5      4        4        4.50
28       5.0      2        7        4.67
29       3.5      2        4        3.17
30       1.0      1        1        1.00
31       2.5      3        3        2.83
32       5.0      3        3        3.67
33       5.0      4        4        4.33
34       3.5      3        7        4.50
35       6.0      7        5        6.00
36       1.0      1        1        1.00
37       4.0      2        3        3.00
38       1.0      1        1        1.00
39       5.5      1        5        3.83
40       1.0      1        1        1.00
41       3.5      1        2        2.17
42       5.0      4        3        4.00
43       1.0      1        1        1.00
44       3.5      5        4        4.17
45       6.0      7        5        6.00
46       5.0      2        3        3.33
47       1.0      1        1        1.00
48       2.0      1        2        1.67
49       3.5      1        3        2.50
50       3.5      1        2        2.17
51       4.5      3        3        3.50
52       4.5      2        3        3.17
53       5.5      7        6        6.17
54       4.0      4        4        4.00
55       3.5      2        2        2.50
56       3.5      7        4        4.83
57       3.0      3        3        3.00

These mean scores were plotted on a frequency chart (see Figure 3). As can be seen, the double humped distribution, which would indicate obvious novice and expert groups, did not occur. Since the frequency chart showed no distinct novice and experienced groups, other methods had to be used to form them. Two methods were used. The first method grouped all subjects rated one, two, or three as novices, those rated five, six, or seven (though no subjects were rated 7) as experts, and those rated four as intermediates. The intermediate group was not used in hypothesis testing. The second method treated the top forty percent of the subject ratings as experts and the bottom forty percent as novices. The in-between twenty percent were considered intermediates and were excluded.

Figure 4 shows the amount of agreement and disagreement that was found on subject ratings between the two methods. As can be seen, the first method resulted in a smaller expert class than the second method because few subjects were rated as sixes or sevens. Figure 5 shows the breakdown of subjects in each treatment. Method one, the number separation method, resulted in a group of twenty-eight novices and a group of fifteen experts. Method two, the percentage separation method, resulted in a group of twenty-three novices and a group of twenty-two experts.
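The rating procedure described above (averaging the three judges, checking agreement between judges by correlation, and then separating subjects by the two methods) can be sketched as follows. This is an illustrative reconstruction, not the analysis actually run for the thesis: the six rating triples are hypothetical, the cut-offs in `classify_by_rating` assume that each mean is rounded to the nearest whole rating, and `classify_by_percentage` ignores the handling of tied ratings.

```python
import math

# Hypothetical (judge 1, judge 2, judge 3) ratings on the seven point scale.
ratings = [
    (1.0, 1, 1),
    (3.5, 1, 2),
    (3.0, 3, 3),
    (4.5, 3, 4),
    (4.5, 5, 5),
    (6.0, 7, 5),
]


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def mean_rating(row):
    """Mean of the three judges' ratings for one subject."""
    return sum(row) / len(row)


def classify_by_rating(mean):
    """Method one (assumed cut-offs): round the mean, then 1-3 novice, 4 intermediate, 5-7 expert."""
    rounded = math.floor(mean + 0.5)  # round halves upward
    if rounded <= 3:
        return "novice"
    if rounded >= 5:
        return "expert"
    return "intermediate"


def classify_by_percentage(means, fraction=0.4):
    """Method two: bottom `fraction` novices, top `fraction` experts, rest intermediate."""
    n = len(means)
    k = int(n * fraction)
    order = sorted(range(n), key=lambda i: means[i])
    labels = ["intermediate"] * n
    for i in order[:k]:
        labels[i] = "novice"
    for i in order[n - k:]:
        labels[i] = "expert"
    return labels


means = [mean_rating(row) for row in ratings]
judge_1, judge_2, judge_3 = (list(column) for column in zip(*ratings))
print("r(1,2) =", round(pearson(judge_1, judge_2), 2))
print("r(1,3) =", round(pearson(judge_1, judge_3), 2))
print("r(2,3) =", round(pearson(judge_2, judge_3), 2))
print([classify_by_rating(m) for m in means])
print(classify_by_percentage(means))
```

Comparing the two printed label lists for the same subjects gives the kind of agreement check between separation methods that the text describes.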
The groups for method two are slightly unbalanced because the novice group included all ratings up to, but not including, three, and the expert group included all ratings of four and above. A twenty-third expert was not chosen to preserve the 3 to 4 range as intermediates.

[FIGURE 3 - frequency chart of the subjects' mean ratings; vertical axis: FREQUENCY]

FIGURE 4 - COMPARISON OF SUBJECT SEPARATION METHODS
[Cross-classification of the NUMBER RATING SEPARATION METHOD (NOVICE, INTERMEDIATE, EXPERT) against the PERCENTAGE SEPARATION METHOD (NOVICE, INTERMEDIATE, EXPERT).]

FIGURE 5 - NUMBER OF SUBJECTS IN EACH TREATMENT

TASK DIFFICULTY | PROGRAMMING EXPERIENCE | METHOD 1 | METHOD 2
SIMPLE | NOVICE | 16 | 14
SIMPLE | EXPERIENCED | 6 | 10
COMPLEX | NOVICE | 12 | 9
COMPLEX | EXPERIENCED | 9 | 12

Besides the slightly larger number of experts produced by method two, the two methods of separation resulted in ratings which were almost identical. Statistical tests of hypotheses number one and two were performed using both separation methods. Identical results were obtained. For this reason, the statistical analyses presented in the rest of the thesis will only show results using the first method of separation. This method is preferred because it uses the judges' ratings directly. It does not force a certain percentage into the expert group as the second method does.

A third variable, a blocking variable, was used to eliminate the differences which might occur between subjects.
These differences were due to the location of the testing, or the background of the subjects. The blocking variable (educational institution) removed the variability in the scores caused by location of the experiment. Thus, it cannot be argued that the difference in test scores was due to educational background or experimental setting.

In order to use this randomized complete block design, we had to show that there were no interactions between the factors and the block, and that the blocking variable did not explain a significant amount of variation of the dependent variable. If this blocking variable did explain a significant amount of variation, educational institution would have to be considered a confound. When analysis of variance was performed, educational institution was not found to be a significant factor. Also, educational institution did not have any significant interactions with the other factors. Therefore, educational institution can be used as the blocking variable.

5.2.3 CLUSTER ANALYSIS

"Given a sample of N objects or individuals, each of which is measured on each of p variables, cluster analysis is a classification scheme for grouping the objects into classes." [50] For our purposes, cluster analysis was used to classify the subjects of the experiment into three classes, based upon measures of their programming experience. The three measures used were: the number of third generation programming languages known, the number of third generation programs written, and the amount of programming work experience. The three classes of subjects obtained were novice, intermediate, and expert programmers.
This classification was then used as a comparison with the classifications arrived at from the judges' ratings. In other words, cluster analysis was used as a non-subjective tool to lend validity to the judges' ratings.

[50] Brian Everitt, Cluster Analysis (London: Heinemann Educational Books, 1974), p.1.

There are several different methods of cluster analysis. Clusters can be obtained by hierarchical techniques in which a group of subjects is split into smaller groups (or individual subjects are joined into clusters) based on distance measures. The clusters can also be obtained by density techniques which seek regions of high density to form clusters. Clumping techniques, which allow overlap between clusters, are also used, but are not appropriate here because of the need for mutually exclusive groups. Based on his studies, Everitt concludes that the best results seem to be obtained by hierarchical techniques. [Everitt, p.45]

Several hierarchical techniques were used, but the best results came with Ward's method. "Ward proposes that at any stage of an analysis the loss of information which results from the grouping of individuals into clusters can be measured by the total sum of squared deviations of every point from the mean of the cluster to which it belongs. At each step in the analysis, union of every possible pair of clusters is considered and the two clusters whose fusion results in the minimum increase in the error sum of squares are combined." [Everitt, p.15]

Outliers in the data have a negative effect on the cluster analysis because they form their own cluster and impede decomposition of other clusters. For this reason, subject number fifty-six, who had much more experience than any other subject, was deleted. Figure 6 shows the clusters produced by Ward's method.
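The fusion rule quoted from Everitt can be illustrated with a small sketch. It is not the software used in the study: the data points are hypothetical triples of (languages known, programs written, months of work experience), the variables are left unscaled, and the merge cost uses the standard identity that fusing clusters A and B raises the error sum of squares by |A||B|/(|A|+|B|) times the squared distance between their centroids.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def merge_cost(a, b):
    """Increase in the error sum of squares if clusters a and b are fused."""
    centre_a, centre_b = centroid(a), centroid(b)
    dist_sq = sum((x - y) ** 2 for x, y in zip(centre_a, centre_b))
    return len(a) * len(b) / (len(a) + len(b)) * dist_sq


def ward_clusters(points, n_clusters):
    """Start from singletons and fuse the cheapest pair until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index j is still valid
        del clusters[j]
    return clusters


# Hypothetical subjects: (3GLs known, 3GL programs written, months of work experience).
subjects = [(1, 2, 0), (1, 5, 0), (2, 30, 6), (2, 35, 12), (4, 100, 36), (5, 110, 48)]
for label, members in enumerate(ward_clusters(subjects, 3), start=1):
    print(label, members)
```

In practice the three measures would probably be standardized before clustering so that the program count does not dominate the distances; whether that was done in the study is not stated.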
In Figure 6, the ones represent novices, the twos intermediates, and the threes experts. Comparison of the clusters produced by Ward's method with those obtained from the judges' scores shows that forty-two of the fifty-seven subjects are classified the same way by both methods (this is true for both the classifications obtained from the judges' scores). This represents a seventy-four percent agreement. In addition, all of the experienced cluster, and nineteen of the twenty subjects in the novice cluster, produced by the cluster analysis are classified in the same way by the judges. The only real disagreement occurs between the novice-intermediate and intermediate-expert classifications, because cluster analysis produces a larger intermediate group. Thus, the cluster analysis seems to lend validity to the judges' ratings.

FIGURE 6 - GRAPH OF NOVICE, INTERMEDIATE AND EXPERT CLUSTERS
[Scatter plot of subjects labelled 1 = Novice, 2 = Intermediate, 3 = Expert; horizontal axis: NUMBER OF 3GL PROGRAMS WRITTEN, 0 to 120.]

5.2.4 STATEMENT OF THE MODEL EQUATION

The hypotheses were tested empirically using the following model:

\[
\mathrm{SCORE}_{ijk} = \mu_{..} + \mathrm{EXPERIENCE}_{i} + \mathrm{COMPLEXITY}_{j} + (\mathrm{EXPERIENCE} \times \mathrm{COMPLEXITY})_{ij} + \mathrm{EDUCATIONAL\ INSTITUTION}_{k} + \epsilon_{ijk} \qquad (1.0)
\]

where
\(\mu_{..}\) is the overall mean,
\(i\) is the level of the EXPERIENCE factor,
\(j\) is the level of the COMPLEXITY factor,
\(k\) is the level of the blocking factor, and
\(\epsilon_{ijk}\) is the random error.

5.2.5 OTHER VARIABLES USED IN THE ANALYSIS

Some of the subjects in the experiment had backgrounds which included experience with other types of software besides third generation languages. This experience could affect their performance on the FOCUS tests.
Since a fourth generation language, such as FOCUS, is basically an integration of other types of software, such as a query language, report writer, screen painter, database management system, procedural language, etc., individual experience with these other types of software might have an effect on how well a subject can learn FOCUS. In order to account for possible confounding effects, the subjects' experience with various types of software was recorded, and was related to their performance on the FOCUS tests via a regression study.

Before proceeding with the regression study, two questions needed immediate answers: how could software experience be categorized, and which software products fit into which categories? The first attempt resulted in a categorization of software experience into the following groups: nonprocedural languages, report writers, query languages, and fourth generation languages. These categories were not satisfactory because most of them were badly defined, and because they overlapped. For example, a fourth generation language could also be categorized as a query language or a nonprocedural language. Even though terms like query language and report writer are commonly used, there are no accepted definitions for any of these categories.

Some weak definitions have been given in the literature for these terms. For example, the National Bureau of Standards in the U.S. has defined a query language as a "language used to specify how database objects (items, entities, and relationships) are retrieved, manipulated (inserted, deleted, and modified) and how new objects are created." [51] Sometimes the inserting, deleting, and modifying aspects are not considered to be part of a query language, but, rather, make up a data sublanguage.
Reisner defines a query language in the stricter sense: "A query language is a special purpose language for constructing queries to retrieve information from a database of information stored in a computer. It is usually intended to be used by people who are not professional programmers. Query languages are usually higher level languages with a fairly limited number of functions."⁵² The problem with this definition is that languages like SQL and QBE can no longer be considered query languages.

⁵¹ National Bureau of Standards, An Architecture for Database Management Standards, NBS Special Publication 500-86 (Washington: National Bureau of Standards, 1982) p.37.
⁵² Reisner, "Human Factors Studies of Database Query Languages: A Survey and Assessment" p.14.

In order to make the categorization simpler, three mutually exclusive categories were created. The first category was query languages. Query languages were defined as languages used to retrieve and update information in a database. In this sense, query languages do not allow the user to select the location (column position) or appearance (insert commas, dollar signs, etc.) of fields in the report; rather, the user must accept the default report. Query languages also lack the logic of procedural languages.

The second category was report writers. Report writers were defined as languages which had extensive formatting functions which could be used to produce simple and complex reports from a database or sequential file. This includes the ability to produce financial and statistical reports. In this sense, report writers have some programming logic, but lack good interactive query facilities.

The third category was fourth generation languages. As defined earlier, fourth generation languages include a query language, a report writer, a screen painter, a nonprocedural language, a database management system, and some procedural code for complex logic. For the purpose of this research, if software fell somewhere in-between a query language or report writer, and a fourth generation language, it was classified as a fourth generation language. This should not weaken the results because regression of test scores with the first two categories will show how using a query language or report writer affects using a fourth generation language. Regression of test scores with the last category will indicate how using multiple parts of a fourth generation language (or all the parts) will affect using the reporting commands of a fourth generation language.

The effect of having prior experience with other software tools on the subject's ability to learn a 4GL depends heavily on what prior tasks the subject has performed with the software tools. As it was not feasible to ask each subject what tasks he performed with the software, we could not determine how well the subject knew the particular tool. Therefore, the results of the regression are somewhat weakened by the fact that we only use aggregate experience data. Software used by subjects is classified below in Table IV.

TABLE IV - OTHER SOFTWARE USED BY THE SUBJECTS

Query Language    Report Writer          Fourth Generation Language
SQL               COBOL REPORT WRITER    DBASE II and III
QUERY             MARK IV                RBASE 5000
EDBS              RPG III                KMAN
                                         ORACLE
                                         IFPS
                                         IMAGE
                                         SAS, GPSS

5.3 MEASURE OF THE DEPENDENT VARIABLE

The subjects' ability to learn fourth generation language reporting commands was measured by their performance on either the simple or complex reporting test.
Both the simple and complex tests were made up of three questions. Appendix 2 contains the simple test, Appendix 3 contains the complex test, and Appendix 4 contains the marking scheme used to score the two tests. As can be seen, question one is identical on the simple and complex tests. This first question was intended to familiarize the student with a FOCUS exercise. As a result, question one was a very easy question, and, therefore, was only included in the overall score for the simple test. The overall score for the complex test included only the scores on questions two and three. Question two on the complex test was more difficult than question two on the simple test, but both questions required the subject to produce a similar report on registration information. The two versions of question three required the subject to produce similar reports, one being more complex than the other.

Questions two and three were really measuring the same thing within each test: the ability of the subjects to learn the fourth generation language commands (either simple or complex). But, as can be seen from the appendices, question three was much more comprehensive, and used many more FOCUS commands than question two. Because question three is very comprehensive, it could be used as a measure of the dependent variable, ability to learn a 4GL. The overall test score could also be used as a measure of the dependent variable. Questions one and two were not comprehensive enough to be used as valid measures of 4GL learning ability. Regression analysis indicated that the scores achieved on question three and the overall score were highly correlated ($R^2$ = .7335).
Since the overall score incorporates the scores achieved on question three, only the overall score will be used as a measure of the dependent variable.

The tests were marked by an independent FOCUS expert (an MIS professional) who was not involved in the research. An independent judge was used because someone involved in the research would have been less objective in his scoring.

5.4 PROCEDURE

5.4.1 PILOT TEST

Prior to the actual experiment, a pilot test was conducted to identify any weaknesses in the experimental materials, and to obtain practice in administering the experimental sessions. Eleven MBA students at UBC took part in the pre-testing, as part of a systems analysis course they were taking. One other MBA student was asked to take part because he had little knowledge of computers. The systems analysis students had varied amounts of prior programming experience. The students were randomly assigned to either the complex or simple reporting tasks. During the experiment, subjects were isolated in a quiet room. For the majority of the time, the only other person in the room was the lab assistant. An IBM AT was used to run the fourth generation language.

The pre-test sessions ran smoothly; all students finished the experiment in less than the three hours scheduled. The cumulative time taken to complete the session varied from eighty-nine to one hundred and seventy-seven minutes. Subjects were given forty-five minutes to complete the test. The subjects who wrote the simple test generally finished in less than forty-five minutes (only one of the six subjects took the maximum forty-five minutes).
Scores on the simple test varied from fifty-seven to ninety-four percent, with four of the six subjects scoring above eighty percent. The subjects who wrote the complex test generally took all forty-five minutes (four of the five subjects) to complete the test. Scores on the complex test varied from sixty-one to eighty-seven percent, with four out of the five scoring seventy percent or higher.

In order to achieve a wider range of scores, for data analysis purposes, it was decided to add two small sections to question one, and question three was made slightly longer (on both simple and complex tests). Forty-five minutes remained as the allotted time for both tests. The thesis results, which will be presented later, indicated that this forty-five minute time limit, along with the increased difficulty, resulted in a much more difficult complex test.

While doing the sample problems some subjects complained that they were unable to understand the data structure used in the problems. For example, some did not know if the RETURNS field in the SALES file contained information about just one product, or was aggregate data by store. In order to clarify these problems, examples of data records were added to the file. Subjects also experienced problems trying to decide what the proper order of the DEFINE and JOIN commands was when the two commands were used in the same program. In order to help the subjects understand the logic of these commands, problem number eight was later redesigned to involve both the DEFINE and JOIN commands. Other than these two minor problems, the subjects had no other difficulties.
As a result of the subjects' comments, as well as comments made by other reviewers, several changes were made to the Report Generation Manual after the pre-test sessions. The following sections were deleted, because they were not directly relevant to the test and were time-consuming: Direct Operations, Includes and Excludes Tests, Testing Accumulated Values, Reports with no Verbs, Calculations. In addition, the explanations for the PRINT, SUM and COUNT verbs were expanded because the students were occasionally confused as to which verb to use in a given situation. Examples were also added to some sections of the manual in order to clarify the purpose of certain commands. Finally, the section describing the JOIN command was expanded because of subject confusion. This confusion was both verbalized, and obvious from the results of the pilot test.

5.4.2 THE ACTUAL EXPERIMENT

Each subject completed a three part session which lasted approximately three hours. The first part of the session involved learning a fourth generation language, the second involved practicing fourth generation language commands in a sample problem session, and the third tested the subject's knowledge of the fourth generation language which they had just learned.

As mentioned earlier, the fourth generation language used was FOCUS, the most popular fourth generation language on the market today. FOCUS was used because it has all of the characteristics of a fourth generation language which were enumerated in an earlier chapter of this thesis. FOCUS, an application generator, was used rather than a code generator, because application generators are more widely used in industry, and are more oriented towards end-users. The specific range of FOCUS reporting commands used in the tests was enumerated in Table II of this thesis.
The lab assistant began the sessions by giving a brief explanation of the purpose of the experiment (the lab procedures followed during the experiment are shown in Appendix 5). He explained that the main purpose of the experiment was to compare the ability of novices and experienced third generation language programmers to learn a fourth generation language. Next, the lab assistant discussed how important fourth generation languages are becoming in business. This was intended to reinforce the subjects' belief that they would learn something useful. Following this, the lab assistant discussed the importance of collecting the data for this thesis, and then briefly covered the sequence of events for the session. At this point, the lab assistant stressed that the session would take three hours, and possibly more. This was mentioned in order to avoid losing disinterested students after the reading or practice stages of the experiment. It was hoped that once a student began, he would commit himself to finishing the session. As mentioned before, the lab sessions comprised three main parts.

Part 1

The approximate duration of Part 1 was one-half to one and one-half hours. Subjects learned FOCUS commands used to generate reports by reading an instruction manual adapted, by the author, from the PC/FOCUS Users Manual. This manual appears in Appendix 6. The subjects were encouraged to take as much time as necessary to read the manual, and to read it carefully, as it would save them time when they proceeded to the practice and test portions of the experiment.

The lab experiments, in which the subjects were tested, were slightly different at the two institutions (UBC and BCIT). At BCIT, large numbers of people were run through the session at the same time. At UBC, subjects were tested individually.
As a result, less control could be exercised over the subjects at BCIT, even though they were asked not to talk to one another during the session. This was one of the reasons for using educational institution as a blocking variable in the statistical analysis; the second reason was the difference in educational background.

Part 2

The approximate duration of Part 2 was one-half to one and one-half hours. After reading the FOCUS manual, the subjects were given a set of eight practice problems (see Appendix 7) in order to practice the commands explained in the manual. The practice problems were based on a fictitious milk company's database, which was taken from the PC/FOCUS Operations Manual. A brief description of the database, the description of the fields making up the database, and an example of database records were included with the practice problems. Subjects were asked to budget as close to an hour as possible for the practice problems. They were also provided with the solutions to the problems. Subjects were instructed to consult the solutions only after attempting the problem at least once, but were encouraged to refer to the solution if they were spending an excessive amount of time on any one problem. Each of the sample problems forced the subject to use one or more of the most important FOCUS commands. Solutions to the problems were typed into the computer by the subject, who then executed them using FOCUS. This was continued iteratively until the subject answered correctly. The time taken to complete the problem session was recorded.

Part 3

The approximate duration of Part 3 was one hour. After completing the problem session, subjects were given a written test.
The subjects were randomly assigned to either the simple or complex test. The tests involved creating course and subject reports from a university registration database, which had been created by the author. A forty-five minute time limit was imposed for the simple and complex tests. Subjects were advised when two or three minutes remained in the test.

Following the test, the subjects were required to fill out a questionnaire asking for personal data such as: number of years work experience, educational level, knowledge of other 4GLs, prior use of report writers, query languages, data base management programs, computer languages known, number of programs written, number of years programming etc. The questionnaire is shown in Appendix 8. This data was used by the judges to rate the subjects as either novices or experts, and to test hypothesis number three, concerning the effect of other variables on the subject's ability to learn a fourth generation language.

5.5 ESTIMATION OF THE SAMPLE SIZE NEEDED

Before running the experiment itself, statistical analysis was performed to estimate the size of the sample needed. Two methods of estimating the sample size were used. The calculations are shown in Appendix 9. The results indicated that a sample size of 48 subjects would enable us to test the hypotheses with a sufficient level of confidence.

5.6 STATISTICAL METHODS USED

5.6.1 HYPOTHESIS ONE

Analysis of variance was used to determine if 3GL experience significantly affects the subjects' performance on tests of fourth generation languages.
Specifically, one-sided t-tests were performed to determine if experienced third generation language programmers achieved higher mean scores than novices on simple and complex tests of fourth generation languages. The tested hypotheses were:

Ho: Experienced programmers' scores on simple 4GL tests will be equal to novices' scores on simple 4GL tests (i.e. $\mu_{21} = \mu_{11}$)
Ha: Experienced programmers' scores on simple 4GL tests will be greater than novices' scores on simple 4GL tests (i.e. $\mu_{21} > \mu_{11}$)

and

Ho: Experienced programmers' scores on complex 4GL tests will be equal to novices' scores on complex 4GL tests (i.e. $\mu_{22} = \mu_{12}$)
Ha: Experienced programmers' scores on complex 4GL tests will be greater than novices' scores on complex 4GL tests (i.e. $\mu_{22} > \mu_{12}$)

To test these hypotheses we used one-sided t-tests (at the 5% and 10% levels of significance). With reference to Figure 7, if we let L equal the difference between experienced and novice scores (e.g. $\mu_{21} - \mu_{11}$), then the test statistic t* is calculated as follows:

\[ t^* = \frac{L - 0}{S(L)}, \qquad \text{where } L = \bar{Y}_{21} - \bar{Y}_{11} \text{ and } S^2(L) = \text{MSE}\left(\frac{1}{n_{21}} + \frac{1}{n_{11}}\right). \]

The t table value will have $n_T - ab$ degrees of freedom, where $n_T$ is the sample size, and a and b are the number of levels of the first and second factor. Once t* is calculated we would compare it with the t-value obtained from a t-distribution table for a given level of significance. If t* ≤ t we will accept Ho; otherwise, if t* > t, we will reject Ho and accept Ha.

Regression analysis was also used to support the analysis of variance results, and to determine the numerical relationship between the subjects' experience with 3GLs, and their ability to learn a 4GL.
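As a worked illustration of the one-sided t-test described in this section, the following minimal Python sketch computes t* for the simple test and applies the decision rule. The function names `pairwise_t` and `reject_h0` are hypothetical and chosen only for this illustration. The inputs are the cell means and cell sizes reported later in Table VIII, the MSE reported in Table VI, and the table values of t quoted in the results chapter; the pairwise form of S(L) is assumed to be the one intended by the formula above.

```python
import math


def pairwise_t(mean_a, mean_b, n_a, n_b, mse):
    """Return t* = (L - 0) / S(L) for the difference L = mean_a - mean_b."""
    difference = mean_a - mean_b
    # S^2(L) = MSE * (1/n_a + 1/n_b), using the pooled ANOVA error term.
    std_error = math.sqrt(mse * (1.0 / n_a + 1.0 / n_b))
    return difference / std_error


def reject_h0(t_star, t_table):
    """One-sided rule: reject Ho only when t* exceeds the table value."""
    return t_star > t_table


# Simple test: experienced (mean 82.0, n = 6) versus novice (mean 69.1, n = 16),
# with MSE = 346.3 taken from the ANOVA table for model 1.0.
t_simple = pairwise_t(82.0, 69.1, 6, 16, 346.3)

print(round(t_simple, 2))           # statistic for the simple test
print(reject_h0(t_simple, 1.684))   # decision at the .05 level
print(reject_h0(t_simple, 1.303))   # decision at the .10 level
```

Under these inputs the sketch reproduces the value t* = 1.45 reported in the results for the simple test, which is not rejected at the .05 level but is rejected at the .10 level.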
FIGURE 7 - EXPERIMENTAL DESIGN WITHOUT BLOCKING
[2 x 2 grid: PROGRAMMING EXPERIENCE (NOVICE, EXPERIENCED 3GL PROGRAMMERS) by TASK DIFFICULTY (SIMPLE, COMPLEX)]

For this analysis, the dependent FOCUS test score (OV) was regressed with the independent variable, judges' mean experience rating (SCA) (measuring the subjects' experience with 3GLs). The model is as follows:

\[ \text{Test Score}_i = \beta_0 + \beta_1\,\text{Experience Rating}_i + \epsilon_i \qquad (1.1) \]

We examined the significance of the relationship ($R^2$), the direction of $\beta_1$, and the p-value for the Experience Rating variable. Hypothesis number one predicts $\beta_1 > 0$.

The above analysis was repeated for both the simple and complex tests. The two tests could not be combined in the same analysis because a certain score on the complex test indicated a completely different performance than the same score on the simple test. Scores on the simple test were significantly higher.

5.6.2 HYPOTHESIS TWO

Analysis of variance was used to determine if the novices' mean drop in scores between the simple and the complex test was greater than the experienced 3GL programmers' mean drop in scores. If the (EXPERIENCE * COMPLEXITY) interaction term in our model is significant, we can conclude that the difference in test score, between the simple and complex test, has changed significantly either for the novices or the experienced 3GL programmers. But, specifically, we would like to test if novice scores have decreased by a larger number. We can test this using a one-sided t-test.

Ho: $(\mu_{11} - \mu_{12}) = (\mu_{21} - \mu_{22})$
Ha: $(\mu_{11} - \mu_{12}) > (\mu_{21} - \mu_{22})$

\[ t^* = \frac{L - 0}{s(L)}, \qquad \text{where } L = (\bar{Y}_{11} - \bar{Y}_{12}) - (\bar{Y}_{21} - \bar{Y}_{22}) \text{ and } s^2(L) = \text{MSE} \sum_i \sum_j \frac{c_{ij}^2}{n_{ij}}. \]

If Ho is accepted, we would conclude that there is no evidence indicating that novice programmers' scores decrease more than experienced 3GL programmers' scores.

5.6.3 HYPOTHESIS THREE

Simple and multiple regression analyses were used to test hypothesis three: that other variables are positively related to the subjects' test scores. Scatter plots of the number of query language, report writer, and 4GL programs written versus test scores were examined before performing the regressions to ensure that enough data was available to proceed with the regression. Figure 8 shows the graph of subjects' scores versus the number of Report Writer programs written. The scatter plots of subjects' scores versus the number of query language programs written, and the number of fourth generation language programs written, are very similar. It is obvious that there is a shortage of data for all the variables, but the data for the query language programs written is most scarce. Only four subjects had used query languages before and no one had written more than two query language programs. The query language variable was not examined because of the shortage of data.

FIGURE 8 - GRAPH OF SUBJECTS' SCORES VERSUS THE NUMBER OF REPORT WRITER PROGRAMS WRITTEN
[Scatter plot; horizontal axis: NUMBER OF REPORT WRITER PROGRAMS WRITTEN]

The following models were analyzed:

\[ \text{Test Score} = \beta_0 + \beta_1\,\text{Report Writer} + \epsilon \qquad (1.2) \]
\[ \text{Test Score} = \beta_0 + \beta_1\,\text{4GL} + \epsilon \qquad (1.3) \]
\[ \text{Test Score} = \beta_0 + \beta_1\,\text{Report Writer} + \beta_2\,\text{4GL} + \epsilon \qquad (1.4) \]

Report Writer and 4GL are dummy variables, with the following meaning: if the variable = 1 then the subject has used this tool before; if the variable = 0 then the subject has no experience with the tool.

The above regressions were repeated for both simple and complex tests. From the results, we looked for overall significance of the relationship ($R^2$, F-statistic), and a positive $\beta_1$ value.
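To make the role of the dummy variable in models 1.2 and 1.3 concrete, the sketch below fits a one-predictor least-squares line in plain Python. It is only an illustration: the function name `fit_simple_ols` and the seven scores are hypothetical and are not data from this study. With a 0/1 predictor, the fitted intercept equals the mean score of subjects without the tool experience, and the slope equals the difference between the two group means.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: 1 = subject has used the tool before, 0 = has not.
used_tool = [0, 0, 0, 0, 1, 1, 1]
scores = [60.0, 70.0, 65.0, 45.0, 80.0, 75.0, 85.0]

b0, b1 = fit_simple_ols(used_tool, scores)

# Group means, for comparison with the fitted coefficients.
mean_without = sum(s for u, s in zip(used_tool, scores) if u == 0) / used_tool.count(0)
mean_with = sum(s for u, s in zip(used_tool, scores) if u == 1) / used_tool.count(1)

print(b0, mean_without)             # intercept versus mean of the "0" group
print(b1, mean_with - mean_without) # slope versus difference in group means
```

A positive slope in such a model therefore says only that subjects with prior exposure to the tool scored higher on average; it does not measure how much experience they had.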
A positive $\beta_1$ value indicated that, as hypothesized, a greater amount of experience with the tool results in higher 4GL test scores.

5.7 VARIABLES USED IN THE ANALYSIS

In the course of the research, the variables shown in Table V were used in the analysis of the hypotheses. The data collected from the subjects appears in Appendix 10.

TABLE V - VARIABLES USED IN THE ANALYSIS

VARIABLE NAME                                               TYPE OF VARIABLE
1.  Educational Institution                                 Dummy variable (UBC=1, BCIT=0)
2.  Previous Related Full-Time Work Experience (PWE)        Dummy variable (YES=1, NO=0)
3.  Years of Programming Experience at Work (YPEW)          Continuous variable (= percent of time * no. of years)
4.  MBA student (MBA)                                       Dummy variable (YES=1, NO=0)
5.  Computer Systems student (SS)                           Dummy variable (YES=1, NO=0)
6.  Other student (OS)                                      Dummy variable (YES=1, NO=0)
7.  Simple/Complex Task (COMPLEXITY)                        Dummy variable (Simple=1, Complex=0)
8.  Previous 3GL Experience (E3GL)                          Dummy variable (YES=1, NO=0)
9.  Number of 3GLs Known (N3GL)                             Continuous variable
10. Number of 3GL Programs Written (N3GLW)                  Continuous variable
11. Number of Report Writer Programs Written (RWPW)         Continuous variable
12. Number of Query Language Programs Written (QLPW)        Continuous variable
13. Number of 4GL Programs Written (N4GLW)                  Continuous variable
14. Total time of the Subject's Session (TTIME)             Continuous variable
15. Reading time (RT)                                       Continuous variable
16. Problem time (PT)                                       Continuous variable
17. Test time (TT)                                          Continuous variable
18. Overall Score on FOCUS Test (OV)                        Continuous variable
19. Score on Q1 (Q1)                                        Continuous variable
20. Score on Q2 (Q2)                                        Continuous variable
21. Score on Q3 (Q3)                                        Continuous variable
22. Novice/Experienced 3GL Classification (EXP)             Dummy variable (Novice=0, Exp3GL=1)
23. Mean experience score of the three judges (SCA)         Continuous variable (Scale of 1 to 7)

6. RESULTS

6.1 HYPOTHESIS ONE

The analysis of variance was performed for model 1.0.
The ANOVA table is shown below in Table VI.

TABLE VI - ANOVA TABLE FOR MODEL 1.0

Source of variation         SS         df    MS        F-value    PR > F
COMPLEXITY                  4341.4     1     4341.4    12.54      .0011
EXPERIENCE                  29.8       1     29.8      0.09       .7709
(COMP*EXP)                  673.8      1     673.8     1.95       .1712
EDUCATIONAL INSTITUTION     319.3      1     319.8     0.92       .3430
ERROR                       13159.6    38    346.3

The ANOVA table indicates that Educational Institution has no significant effect on test scores. Subjects from both educational institutions did equally well. The table does indicate that complexity is an important factor, as was intended. Scores on the complex test are significantly different from scores on the simple test (p = .0011). The COMP*EXP interaction term indicates some weak interaction between complexity and experience. This will be discussed later under hypothesis two.

The ANOVA table indicates that Experience is not an important factor when the mean scores of simple and complex tests are combined. But we also need to determine if 3GL experience has an effect on test scores for the simple test only. This must also be repeated for the complex test. We can test the hypothesis that experienced 3GL programmers achieve higher mean scores than novices, using the t-test described earlier, in the Method section.

First, for the simple test, using the one-sided t-test described earlier, t* = 1.45. The critical t-value at the .05 level of significance is t[.95, 39] = 1.684. Since t* is not larger than t we cannot reject Ho at the .05 level of significance. The critical t-value at the .10 level of significance is t[.90, 39] = 1.303. Since t* = 1.45 is greater than 1.303 we can reject Ho, and conclude that 3GL experience is an important factor for simple tests at the .10 level.
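The F-values in Table VI can be re-derived from the reported sums of squares and degrees of freedom, since each F-value is the mean square of the source divided by the error mean square. The short Python sketch below is offered only as an arithmetic check of the table; the variable and function names are hypothetical, and the numbers are the SS and df entries as they appear above.

```python
# (source, SS, df) as reported in Table VI.
sources = [
    ("COMPLEXITY", 4341.4, 1),
    ("EXPERIENCE", 29.8, 1),
    ("COMP*EXP", 673.8, 1),
    ("EDUCATIONAL INSTITUTION", 319.3, 1),
]
error_ss, error_df = 13159.6, 38

# Error mean square: MSE = SS(error) / df(error).
mse = error_ss / error_df


def f_ratio(ss, df):
    """F = (SS / df) / MSE for one source of variation."""
    return (ss / df) / mse


f_values = {name: round(f_ratio(ss, df), 2) for name, ss, df in sources}

print(round(mse, 1))
for name, value in f_values.items():
    print(name, value)
```

The recomputed ratios agree with the F-value column of the table to two decimal places, and the error mean square agrees with the reported 346.3.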
For the complex test, t* = -.52. This is not larger than the critical t-value 1.684. Therefore we cannot reject Ho at the .05 level. The same conclusion is reached at the .10 level. This indicates that 3GL experience is not an important factor for the complex test.

The scatter plots of test scores versus the judges' mean experience ratings, for simple and complex tests, are shown in Figures 9 and 10. The scatter plot for the simple test shows an upward trend, whereas the scatter plot for the complex test is random.

FIGURE 9 - GRAPH OF SUBJECT SCORES VERSUS MEAN OF THE JUDGES' RATINGS FOR THE SIMPLE TEST
[Scatter plot; vertical axis: score, 20 to 100; horizontal axis: MEAN OF THE JUDGES' RATINGS]

FIGURE 10 - GRAPH OF SUBJECTS' SCORES VERSUS MEAN OF THE JUDGES' RATINGS FOR THE COMPLEX TEST
[Scatter plot; vertical axis: score, 40 to 100; horizontal axis: MEAN OF THE JUDGES' RATINGS]

The results of regression 1.1, score versus 3GL experience, for the simple test are as follows: $R^2$ = .1318, p-value = .028, and $\beta_1$ = 4.4. The results for the complex test are $R^2$ = .0015, p-value = .42, and $\beta_1$ = 0.46. These regression results support the results obtained from the analysis of variance for hypothesis one. The regression for the simple test is significant, again indicating that experience with 3GLs affects simple 4GL test performance. The positive $\beta_1$ value indicates that greater experience is correlated with higher simple test scores, as hypothesized. The results of the complex test regression again indicate that the experience-score relationship is not significant for the complex test.

6.2 HYPOTHESIS TWO

We can directly test hypothesis number two to see if novices' scores drop by more than experienced 3GL programmers' scores. We will use a .10 level of significance.
In order to accept Ha, that novice scores decrease more than experienced 3GL programmers' scores, t* will have to be larger than t[.90, $n_T - ab$], which is 1.303. We determined that t* = -1.43, which is not larger than 1.303. Therefore we cannot reject Ho: the evidence does not allow us to conclude that novices' scores drop by more than experienced programmers' scores.

6.3 HYPOTHESIS THREE

Table VII presents the results of the regression analysis. We shall use the term Report Writ as a short form for the dummy variable Report Writer. The table indicates that 4GL is the only significant variable. The table shows the results of both the simple regressions and the multiple regressions. The multiple regression results are presented at the bottom of the table.

TABLE VII - TABLE OF REGRESSION STATISTICS FOR HYPOTHESIS 3

Simple/Complex    Dependent    Independent    P-value    $R^2$    Value of $\beta$
task              Variable     Variable
Simple            Score        Report Wrt     .2065      .0259    6.16
Simple            Score        4GL            .0632      .0874    11.82
Complex           Score        Report Wrt     .3296      .0073    -3.21
Complex           Score        4GL            .3468      .0058    2.56

Simple            Score        Report Wrt     .1920               0.19
                               4GL            .4484               0.92
Complex           Score        Report Wrt     .3665               2.55
                               4GL            .0068               22.4

More data would be needed to be certain of the results on report writer and 4GL experience. The raw data collected shows that only nineteen out of the fifty-seven subjects had used report writers (no one had written more than sixteen programs), and only twenty-four out of fifty-seven had previously used a fourth generation language. These are both small samples. Results indicate that 4GL is a significant variable.
The fact that students with previous 4GL experience do significantly better on the FOCUS 4GL tests is not surprising. The evidence we have indicates that we have to reject the hypothesis that experience with report writers leads to higher 4GL test scores.

6.4 SUMMARY OF THE PERFORMANCE OF THE SUBJECTS

Figure 11 and Table VIII summarize the performance of each of the four treatments, as well as the number of subjects in each treatment.

TABLE VIII - MEANS, STANDARD ERRORS, AND NUMBER OF OBSERVATIONS FOR EACH TREATMENT

                       PROGRAMMING EXPERIENCE
TASK DIFFICULTY        NOVICE                  EXPERIENCED 3GL
SIMPLE                 Mean = 69.1             Mean = 82.0
                       Std. error = 22.3       Std. error = 8.7
                       N = 16                  N = 6
COMPLEX                Mean = 56.9             Mean = 52.6
                       Std. error = 19.6       Std. error = 13.3

FIGURE 11 - GRAPH OF SUBJECT SCORES VERSUS EACH OF THE FOUR TREATMENTS
[Plot of scores, 20 to 100, for four groups: SIMPLE NOVICE, SIMPLE EXPERT, COMPLEX NOVICE, COMPLEX EXPERT]

Table VIII and Figure 11 both indicate that novice scores are much more variable than experienced programmer scores. The data strongly suggests that some novices performed poorly on both tests, while experienced programmers were more consistent. All the scores recorded on the complex test were low, indicating that the test was very difficult, or that too little time was provided for the subjects to finish the test. This could explain why there was little difference in the scores achieved by novices and experienced programmers on the complex test. We also note in Figure 11 a break in scores between the lower scorers and the higher scorers. This break indicates that some subjects could learn 4GLs easily, while others had a lot of difficulty. This facility with 4GLs is not solely dependent on previous 3GL experience, as can be seen from the graph.
Some other factor, not captured in this design, appears to explain why some subjects learn 4GLs more easily than others.

7. DISCUSSION OF THE RESULTS

Though not all the hypotheses were accepted when tested, some very interesting results were obtained from the analysis. Before discussing the results, some weaknesses of the methodology, which could have affected the results, should be discussed.

Because the subjects were given only approximately an hour to read the FOCUS commands, and one hour to run through the problem session, some question remains as to the validity of the measure of the dependent variable, ability to learn a fourth generation language. On the one hand, in a work setting, employees are given a number of days to familiarize themselves with the FOCUS commands. On the other hand, the two to three hour learning session given the students allowed them to learn almost all of the FOCUS reporting commands. Because the syntax is so English-like, it takes little time to learn FOCUS.

Because the tasks assigned to students were relatively short, some critics might question the external validity of the results. The problems assigned to the subjects were typical of problems faced by business professionals. But these problems are more simplistic than those handled by information system professionals. A more realistic problem for an information system professional would involve building, maintaining, and reporting from a database. Getting the subjects to commit more time to complete a more complex task would have been a problem. The three hour session was already long. In this study, only reporting tasks were examined, yet the novices showed that they handled more complex tasks no more poorly than the experienced programmers.

The final weakness in our methodology is that no real "expert" programmers were used in the experiment. Rather, experienced student programmers were used.
If professional programmers had been used, they might have done better than the novices. Still, a few of the subjects had worked as programmers, and the rest of the experienced subjects were only one year away from being in the job market, where they would be considered professional programmers.

Notwithstanding the above problems, the results are intriguing. The problems are a realistic representation of what some marketing or finance professionals will be faced with in the near future. The results of hypotheses two and three will be discussed first, as they are not as important as the results of hypothesis one.

The results of hypothesis number two were not what we expected. We expected novices' scores to decrease by more than experienced programmers' scores when progressing from the simple to the complex test. The results can be partially explained by the higher score recorded by experienced programmers on the simple test. When we were setting up the tests, we expected experienced programmers to perform much better than novices on the complex test. As a result, the second hypothesis would have measured a much bigger drop for novices. But because experienced programmers scored higher than novices on the simple test, but no better on the complex test, the results of hypothesis number two became less meaningful than originally expected.

Because of the lack of data, few meaningful results were obtained for hypothesis number three. Only the fact that knowing one 4GL helps when learning another can be safely concluded.

Concerning hypothesis number one, the results of the simple test indicate that experienced programmers outperform novices. The simple commands such as PRINT, BY, and ON involving control breaks are better handled by the experienced programmers.
These commands are familiar to experienced third generation language programmers; therefore, they can transfer previous skills to a fourth generation application involving simple reporting commands. As a result, their scores are higher than those of novices, who have never been exposed to these commands.

Results of the complex test do not support the hypothesis that experienced programmers can outperform novices on complex 4GL tests. Some people may argue that these results are due to an excessively hard test combined with too short a testing time (45 minutes). Their argument would be that the test was too hard for all the subjects, and therefore the results are meaningless. Thirty percent of the subjects who wrote the complex test scored below fifty percent. Also, 16 out of the 29 subjects who wrote the complex test did not finish writing the test before the 45 minute time limit had expired. These data could indicate that many of the subjects found the test too long or too hard. Based on the fact that the experienced programmers' scores on the complex test were less variable than the novices' scores, we might conclude that if a bigger sample had been taken the experienced programmers would have outperformed the novices.

I believe the results may be partly due to the difference between the simple and complex 4GL commands. The more complex 4GL reporting commands such as JOIN, SUBHEAD, and DEFINE used in the complex test are semantically very different from anything encountered in a third generation language. Therefore, the experienced third generation language programmer cannot transfer any previous skills into this complex 4GL environment. These very complex commands were the ones which caused the most problems for experienced 3GL programmers.
As a result, the experienced 3GL programmer has little or no advantage over a novice in a complex 4GL reporting application. The fact that experienced programmers' scores were less variable than novices' scores can be explained by the fact that a few simpler commands were used in the complex test. As we have already concluded, experienced programmers have an advantage over novices in simple 4GL reporting applications. Possibly, experienced programmers score no better than novices because they have no more experience with the different 4GL semantics than novices do.

The results of this research indicate that experienced third generation language programmers would be preferred over novices when an application involves simple 4GL commands. For complex applications, novices performed as well as experienced third generation language programmers, but the results were not conclusive. Several possible explanations were advanced to try to explain the results. More testing is needed to determine if any of the explanations is correct. Further testing should also be conducted by other researchers with other fourth generation languages, and with other tasks, to see if the same results are obtained.

APPENDIX ONE
JUDGES' RATINGS OF THE SUBJECTS

RATING THE SUBJECTS

Please rate the subjects of the fourth generation language experiment on their experience with third generation languages. Use the information about the subjects given in the attached tables to rate the subjects. The information supplied in the tables is structured as follows:

Column 1   The subject number.
Column 2   Educational degrees the subject has completed, or now has in progress. (See note 1.)
Column 3   The number of years the subject has been doing full-time work involving some degree of programming.
Column 4   The percentage of time spent programming (vs. doing other tasks) at work.
Column 5   The number of third generation languages the subject has used.
Column 6   The total number of third generation language programs the subject has written.
Column 7   The third generation language known best.
Column 8   The number of years experience the subject has with this language.
Column 9   The number of programs the subject has written in this language.
Columns 10 to 12   Same as columns 7 to 9, but for the second best language known.
Columns 13 to 15   Same as columns 7 to 9, but for the third best language known.

Rate the subjects' experience with third generation languages using the following scale. Place your ratings on the attached RATING SHEET.

NOVICE = 1     EXPERT = 7

Note 1 - Computer Systems is a two year diploma program offered at BCIT.

RATING SHEET
[Blank form listing each SUBJECT NUMBER with a space for the judge's RATING.]

NO. DEGREES COMPLETED     YRS   % TIME # OF  TOTAL  LANG 1  YRS  PROGS LANG 2  YRS  PROGS LANG 3  YRS  PROGS
    / IN PROGRESS         PROG. PROG.  3GLs  PROGS
1   COMP. SYS             -     -      6     25     PASCAL  2    6     BASIC   2    5     ASSEM   2    5
2   B.A., COMP. SYS       -     -      7     64     BASIC   3    30    ASSEM   3    11    C       1    [?]
3   B.MATH, M.Sc.B.A.     -     -      1     6      FORTRA  1/2  6     -       -    -     -       -    -
4   B.Sc., COMP. SYS      -     -      6     54     PASCAL  4    30    FORTRA  3    8     COBOL   1    6
5   B.Sc., MBA            -     -      1     3      FORTRA  1    3     -       -    -     -       -    -
6   B.COMM, M.Sc.B.A.     -     -      2     55     FORTRA  1.5  40    COBOL   1    15    -       -    -
7   COMP. SYS             -     -      6     78     BASIC   6    30    PASCAL  1    20    ASSEM   1    12
8   COMP. SYS             .25   80     6     30     COBOL   1    15    PASCAL  1.1  5     ASSEM   [?]  5
9   COMP. SYS             -     -      5     41     PASCAL  4    25    BASIC   2    6     COBOL   1    5
10  B.B.A., COMP. SYS     -     -      4     35     BASIC   1/2  20    ASSEM   1/2  6     COBOL   [?]  6
11  B.COMM, MBA           -     -      2     9      COBOL   1/4  5     APL     [?]  4     -       -    -
12  COMP. SYS             -     -      5     39     PASCAL  3    20    ASSEM   3    10    COBOL   2    5
13  B.Sc., MBA            -     -      2     5      APL     1/4  3     COBOL   [?]  2     -       -    -
14  COMP. SYS             -     -      6     30     PL/1    1    8     BASIC   1    5     COBOL   1    5
15  B.ENG (MECH)          3     5      3     33     FORTRA  1    20    BASIC   1    10    PASCAL  1/2  3
16  B.S.F., COMP. SYS     -     -      6     35     BASIC   1    10    PASCAL  1    6     COBOL   1    6
17  COMP. SYS             -     -      6     47     BASIC   6    30    COBOL   1    5     ASSEM   1    4
18  B.ENG, MBA            -     -      4     20     BASIC   1    10    FORTRA  1    5     COBOL   1/2  3
19  COMP. SYS             .5    25     6     51     PASCAL  2    30    FORTRA  2    8     ASSEM   1    5
20  B.Sc.                 -     -      3     10     BASIC   1/4  6     PASCAL  1/4  3     COBOL   1/8  1
21  B.ENG, COMP. SYS      -     -      5     28     BASIC   1    10    FORTRA  1    10    ASSEM   1    6
22  B.COMM, MBA           -     -      2     13     BASIC   1/2  12    FORTRA  1/8  1     -       -    -
23  B.Sc., MBA            -     -      -     -      -       -    -     -       -    -     -       -    -
24  B.Sc., COMP. SYS      -     -      5     74     FORTRA  2    30    PASCAL  2    30    COBOL   1    7
25  B.ENG, MBA            -     -      1     7      FORTRA  2    7     -       -    -     -       -    -
26  COMP. SYS             -     -      5     10     ASSEM   1/2  4     COBOL   1/2  3     PL/1    1/2  2
27  COMP. SYS             -     -      7     39     PASCAL  4    12    COBOL   2    6     ASSEM   2    6
28  B.C.Sc., MBA          1     100    5     23     PASCAL  1    10    C       1    6     COBOL   1/2  3
29  B.COMM, MBA           -     -      3     23     FORTRA  5    20    APL     2    2     COBOL   1    1
30  B.A. (LINGUISTICS)    -     -      -     -      -       -    -     -       -    -     -       -    -
31  B.ENG, MBA            -     -      2     33     BASIC   2    30    FORTRA  1/2  3     -       -    -
32  COMP. SYS             -     -      6     27     BASIC   2    5     ASSEM   2    8     COBOL   1.5  5
33  COMP. SYS             -     -      5     38     COBOL   2    10    ASSEM   2    10    C       1    10
34  B.Sc., M.A.           8     5      3     32     FORTRA  10   20    APL     [?]  10    BASIC   1    2
35  COMP. SYS             -     -      5     87     ASSEM   4    50    COBOL   3    20    FORTRA  1/2  10
36  B.A., MBA             -     -      -     -      -       -    -     -       -    -     -       -    -
37  COMP. SYS             -     -      7     21     ASSEM   2    6     PASCAL  1    3     C       1    3
38  B.Sc., MBA            -     -      -     -      -       -    -     -       -    -     -       -    -
39  COMP. SYS             1/2   60     4     13     COBOL   2    6     ASSEM   2    4     PL/1    2    2
40  B.H.N., MBA           -     -      -     -      -       -    -     -       -    -     -       -    -
41  B.Sc., M.Sc., MBA     -     -      2     5      APL     2    4     COBOL   2    1     -       -    -
42  COMP. SYS             -     -      5     37     ASSEM   1.5  13    COBOL   1    12    PASCAL  1/2  6
43  B.A. (ENGLISH)        -     -      -     -      -       -    -     -       -    -     -       -    -
44  B.ENG, M.A., PhD      -     -      5     46     BASIC   1/4  20    PASCAL  1    10    C       1/4  10
45  COMP. SYS             -     -      6     143    BASIC   7    50    PASCAL  6    30    COBOL   4    50
46  COMP. SYS             -     -      5     23     PASCAL  1/2  8     ASSEM   1/2  6     COBOL   1/2  5
47  B.H.E., MBA           -     -      -     -      -       -    -     -       -    -     -       -    -
48  B.A. (ECON), MBA      -     -      2     7      BASIC   4    5     APL     [?]  2     -       -    -
49  COMP. SYS             -     -      5     15     COBOL   2    6     ASSEM   1    4     PASCAL  1/2  2
51  COMP. SYS             -     -      5     11     ASSEM   1/2  4     COBOL   1/2  4     BASIC   1    1
53  COMP. SYS             -     -      5     27     ASSEM   2    10    COBOL   1.5  10    BASIC   2    3
55  COMP. SYS             -     -      5     24     BASIC   1    10    ASSEM   1    6     COBOL   1/2  4
56  COMP. SYS             -     -      7     320    BASIC   7    300   ASSEM   2    5     COBOL   1    5
57  COMP. SYS             -     -      6     40     BASIC   3    20    PASCAL  1/2  8     ASSEM   1    5
58  COMP. SYS             -     -      4     19     BASIC   2    6     COBOL   1    6     ASSEM   1    5
59  THEOLOGY, COMP. SYS   -     -      7     69     BASIC   1.5  40    ASSEM   1    12    PASCAL  1    7
60  B.A., COMP. SYS       -     -      6     29     ASSEM   2    10    FORTRA  1    6     BASIC   2    5

APPENDIX TWO
SIMPLE TEST

REPORT PREPARATION TEST

The database you will be using for the test problems is a university registration database. This database maintains information on which courses each student is taking, who teaches the courses, the student registration for each course and section, and personal data on the students. The database is composed of three physical files: 1) STUDSEG file (contains information on students), 2) PROFSEG file (contains information on the professors), and 3) REGISTER file (contains information on the courses and sections). Figure 1 illustrates the database, and figures 2 and 3 give the fields in each of the files.

FIGURE 1 - Hierarchical Database for University Registration
[REGISTER FILE: Faculty, Course, Section, Attendee. STUDSEG FILE: Student. PROFSEG FILE: Professor.]

Description of Fields in the REGISTER File

FACULTY_NAME - This is the name of the faculty (Arts, Commerce etc.) in which the course is taught.
This is an indexed field. ALIAS=FACNAME
COURSE_NUM - This is the course number (field of length 3, e.g. 536). This is an indexed field. ALIAS=CONUM
COURSE_NAME - The name of the course, e.g. Accounting. ALIAS=CONAME.
CO_FEE - The dollar amount charged to the student for taking the course. ALIAS=COCHARG
SEC_NO - The section number of the course. Each course can have several sections (e.g. 001, 002 etc.). ALIAS=SECNUM
PROF_TEACH - This is a numeric field of length 6 representing the identification number of the professor teaching the course. ALIAS=TEACH.
MAX_ENROLL - This is the maximum number of students which can be enrolled in the course section. ALIAS=MAX.
PERSON_ID - This is a numeric field of length 6 representing the identification number of the students enrolled in the course section. ALIAS=PERID.

Description of the Fields in the PROFSEG File

1. PROF_ID - This is a numeric field of length 6 representing the identification number of the professor. This is an indexed field. ALIAS=PROID.
2. PROF_NAME - The name of the professor. ALIAS=PROF.
3. OFFICE_NUM - The professor's office number. ALIAS=OFFNUM.
4. PR_FACULTY - The name of the faculty to which the professor belongs. ALIAS=PRFAC.

Description of the Fields in the STUDSEG File

1. STUDENT_ID - A numeric field of length 6 representing the identification number of the student. This is an indexed field. ALIAS=STUDID
2. STUDENT_NAME - The student's name. ALIAS=STUDNAME.
3. FACULTY - The name of the faculty to which the student belongs. ALIAS=FAC.
4. STREET_ADD - The student's home street address. ALIAS=STADDR
5. CITY - The student's home city. ALIAS=CT.
6. PROVINCE - The student's home province. ALIAS=PROV.
7. TEL_NUM - The student's home telephone number. ALIAS=TELNO.

Shown below are examples of data records in each of the three files.

[Sample records. STUDSEG FILE columns: STUDENT_ID, STUDENT_NAME, FACULTY, STREET_ADD, CITY, PROVINCE, TEL_NUM. PROFSEG FILE columns: PROF_ID, PROF_NAME, OFFICE_NUM, PR_FACULTY. REGISTER FILE columns: FACULTY_NAME, COURSE_NUM, COURSE_NAME, CO_FEE, SEC_NO, PROF_TEACH, MAX_ENROLL, PERSON_ID.]

QUESTION 1

PART A

Show the structure of the report (column headings) produced by the following programs. Write the column headings exactly as FOCUS would produce them. If your second answer is the same as your first, just write "Same as above" in the space provided for the second problem.

    TABLE FILE REGISTER
    PRINT CO_FEE AND MAX
    BY FACNAME BY CONUM BY SECNUM
    END

    TABLE FILE REGISTER
    BY SECNUM BY CONUM BY FACNAME
    PRINT CO_FEE MAX
    END

PART B

Write a program which will display the course number, the section number, and the number of students registered in the section, for every course and section in the database. Structure the report as follows.

    COURSE_NUM    SEC_NO    PERSON_ID COUNT

PART C

Write a program which will produce the same fields as above but in the following matrix format.

[Matrix of course numbers against SEC_NO values; each cell gives the number of students registered in the section.]
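The kind of report Parts B and C ask for, a count of attendees per course and section and then the same counts laid out as a matrix, can be sketched outside FOCUS. The Python below is only an analogue for readers unfamiliar with 4GL report requests; it is not FOCUS syntax and not a model answer to the test. The record values are hypothetical, and the function names `count_by_section` and `as_matrix` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical attendee records. The field names follow the REGISTER file
# description; the values are invented for illustration.
attendees = [
    {"COURSE_NUM": "536", "SEC_NO": "001", "PERSON_ID": "010101"},
    {"COURSE_NUM": "536", "SEC_NO": "001", "PERSON_ID": "010102"},
    {"COURSE_NUM": "536", "SEC_NO": "002", "PERSON_ID": "010103"},
    {"COURSE_NUM": "207", "SEC_NO": "001", "PERSON_ID": "010101"},
]


def count_by_section(records):
    """Count attendee records for each (course number, section number) pair."""
    counts = Counter((r["COURSE_NUM"], r["SEC_NO"]) for r in records)
    return sorted(counts.items())


def as_matrix(section_counts):
    """Pivot the counts: one row per course, one column per section number."""
    sections = sorted({sec for (_, sec), _ in section_counts})
    courses = sorted({course for (course, _), _ in section_counts})
    lookup = dict(section_counts)
    rows = {c: [lookup.get((c, s), 0) for s in sections] for c in courses}
    return sections, rows


section_counts = count_by_section(attendees)
sections, matrix = as_matrix(section_counts)

# List form (Part B style): one line per course and section.
for (course, section), n in section_counts:
    print(course, section, n)

# Matrix form (Part C style): sections across, courses down.
print("COURSE", *sections)
for course, row in matrix.items():
    print(course, *row)
```

The grouping, sorting, and pivoting written out by hand here are the steps a 4GL report writer performs on the programmer's behalf, which is the contrast between procedural and nonprocedural styles that the experiment examines.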
QUESTION 2

Write a program which will print the course name, and below it the course number and section number, for all the courses and sections in the database. Sequence the courses according to the faculty to which they belong. Show only one faculty per page. At the bottom of the report print "COURSE AND SECTION LIST AS OF JANUARY 31, 1986". Structure the report exactly as follows.

    FACULTY_NAME ARTS

    COURSE_NAME POLITICAL SCIENCE
    COURSE_NUM    SEC_NO
    199           001

    COURSE_NAME POLITICAL SCIENCE
    COURSE_NUM    SEC_NO
    199           002

QUESTION 3

Write a program to produce a report which shows, for each section of the course names ENGLISH and FINANCE, 1) the ID number of the professor teaching, and 2) the maximum enrollment allowed in the course. Sequence the report by the faculty to which the course belongs, and by the course name and the section number, but do not print the section number. Also, show a subtotal for maximum enrollment each time the course name or faculty name changes. Produce the report and column headings exactly as follows:

[Report layout with subtotal lines "TOTAL COURSE_NAME" and "TOTAL FACULTY_NAME" for maximum enrollment.]

APPENDIX THREE
COMPLEX TEST

REPORT PREPARATION TEST

The database you will be using for the test problems is a university registration database. This database maintains information on which courses each student is taking, who teaches the courses, the student registration for each course and section, and personal data on the students.
The database is composed of three physical files:

1) STUDSEG file (contains information on students)
2) PROFSEG file (contains information on the professors)
3) REGISTER file (contains information on the courses and sections).

Figure 1 illustrates the database, and figures 2 and 3 give the fields in each of the files.

FIGURE 1. Hierarchical Database for University Registration
[Diagram not recoverable from the scanned text; its labels are REGISTER FILE, Faculty, Section, Attendee, STUDSEG FILE and PROFSEG FILE.]

Description of Fields in the REGISTER File

1. FACULTY_NAME - This is the name of the faculty (Arts, Commerce etc.) in which the course is taught. This is an indexed field. ALIAS=FACNAME
2. COURSE_NUM - This is the course number (field of length 3, e.g. 536). This is an indexed field. ALIAS=CONUM
3. COURSE_NAME - The name of the course, e.g. Accounting. ALIAS=CONAME.
4. CO_FEE - The dollar amount charged to the student for taking the course. ALIAS=COCHARG
5. SEC_NO - The section number of the course. Each course can have several sections (e.g. 001, 002 etc.) ALIAS=SECNUM
6. PROF_TEACH - This is a numeric field of length 6 representing the identification number of the professor teaching the course. ALIAS=TEACH.
7. MAX_ENROLL - This is the maximum number of students which can be enrolled in the course section. ALIAS=MAX.
8. PERSON_ID - This is a numeric field of length 6 representing the identification number of the students enrolled in the course section. ALIAS=PERID.
Description of the Fields in the PROFSEG File

1. PROF_ID - This is a numeric field of length 6 representing the identification number of the professor. This is an indexed field. ALIAS=PROID.
2. PROF_NAME - The name of the professor. ALIAS=PROF.
3. OFFICE_NUM - The professor's office number. ALIAS=OFFNUM.
4. PR_FACULTY - The name of the faculty to which the professor belongs. ALIAS=PRFAC.

Description of the Fields in the STUDSEG File (Figure 3)

1. STUDENT_ID - A numeric field of length 6 representing the identification number of the student. This is an indexed field. ALIAS=STUDID
2. STUDENT_NAME - The student's name. ALIAS=STUDNAME.
3. FACULTY - The name of the faculty to which the student belongs. ALIAS=FAC.
4. STREET_ADD - The student's home street address. ALIAS=STADDR
5. CITY - The student's home city. ALIAS=CT.
6. PROVINCE - The student's home province. ALIAS=PROV.
7. TEL_NUM - The student's home telephone number. ALIAS=TELNO.

Shown below are examples of data records in each of the three files.

[Handwritten sample records for the STUDSEG, PROFSEG and REGISTER files; the figure is not legible in the scanned text.]

QUESTION 1

PART A

Show the structure of the report (column headings) produced by the following programs. Write the column headings exactly as FOCUS would produce them.
If your second answer is the same as your first, just write "Same as above" in the space provided for the second problem.

    TABLE FILE REGISTER
    PRINT CO_FEE AND MAX
    BY FACNAME
    BY CONUM
    BY SECNUM
    END

    TABLE FILE REGISTER
    BY SECNUM
    BY CONUM
    BY FACNAME
    PRINT CO_FEE MAX
    END

PART B

Write a program which will display the course number, the section number and the number of students registered in the section, for every course and section in the database. Structure the report as follows.

    COURSE_NUM    SEC_NO    PERSON_ID COUNT

PART C

Write a program which will produce the same fields as above but in the following matrix format.

[Handwritten matrix layout, partly legible: SEC_NO across the top, COURSE_NUM values (apparently 107, 150, 333) down the side, each cell holding the number of students.]

QUESTION 2

NOTE - For some of the following programs you may need to use the JOIN and DEFINE commands before issuing the TABLE command. Please show the commands exactly as you would have to type them on the computer.

Write a program which will print the course name, and below it the course number and section number together, in a field called NUMBER, for all the courses and sections in the database. Order the courses according to the faculty to which they belong. Structure the report exactly as follows:

    FACULTY_NAME
    ARTS            COURSE_NAME   POLITICAL SCIENCE
                    NUMBER        199.001
                    COURSE_NAME   POLITICAL SCIENCE
                    NUMBER        199.002

QUESTION 3

Produce a course report for the Faculty of Arts. For each course, print the name of the course, the name of the professor teaching the section, the maximum enrollment allowed in the section, and the size of the classroom needed. Size of the classroom is defined as 'BIG' if the maximum enrollment is greater than 100, and 'SMALL' otherwise.
Summarize each course by indicating the total maximum enrollment allowed (the sum of the maximums of the sections). Sequence the report by course number and section number, but do not print the course number with the rest of the fields. Instead, print the course number above the other course information as shown below. Your report should appear as follows (print column headings and summary lines exactly as they are shown below):

[Handwritten report layout, partly legible: column headings that appear to be SEC_NO, COURSE_NAME, PROFESSOR, MAXIMUM ENROLLMENT and SIZE; one sample row for section 001 with size SMALL; and a *TOTAL COURSE_NUM summary line.]

APPENDIX FOUR

MARKING SCHEME FOR THE TESTS

MARKING SCHEME

General Guidelines

1. Unless otherwise noted the command lines between the first command, TABLE FILE, and the last command, END, can be placed in any order.
2. Do not subtract marks for mistakes which are obviously only spelling mistakes, but make sure they are only spelling mistakes, not a reference to another data field or command.
3. Any fieldname can be replaced by its alias.
4. If commands are added which are not needed and
   i) which would cause the program to fail, subtract a 'noticeable' amount of marks.
   ii) which would unnecessarily add to the program, but would not cause it to fail, subtract a 'minimal' amount of marks.
5. Commands which use the right keywords, but which are out of order, such as SUBTOTAL ON fieldname, or PRINT PERSON_ID COUNT, are wrong and should not receive any marks.
6. ANDs are optional between fieldnames, and IS is equivalent to EQ.

QUESTION 1

PART A - 5 MARKS

The subject receives 1/2 mark for each column heading which is correctly placed and has the correct heading. If either the column is out of place or the heading is incorrect, no marks are awarded.
SOLUTION

    FACNAME  CONUM  SECNUM  COFEE  MAX
    SECNUM  CONUM  FACNAME  COFEE  MAX

PART B - 12 MARKS

    SOLUTION                      MARKS
    TABLE FILE REGISTER             1
    COUNT PERSON_ID                 4
    BY COURSE_NUM BY SEC_NO         6
    END                             1

- If the verb is not COUNT for the field PERSON_ID, don't award any marks for that line.
- If the BY fields are reversed, i.e. BY SEC_NO BY COURSE_NUM, award only 3 of 6 marks.
- For each BY, or fieldname, which is missing, subtract 2 marks, up to a total of 6.
- If PRINT COURSE_NUM SEC_NO is used instead of BY COURSE_NUM BY SEC_NO, award only 2 of 6 marks.

PART C - 13 MARKS

    SOLUTION                      MARKS
    TABLE FILE REGISTER             1
    COUNT PERSON_ID                 4
    BY COURSE_NUM                   3
    ACROSS SEC_NO                   4
    END                             1

- Lines 1 and 5 must be exactly as shown to receive marks.
- If the field is not PERSON_ID, or the verb is not COUNT, don't award any marks for line 2.
- If the field is not COURSE_NUM, or the command is not BY, don't award any marks for line 3.
- If the ACROSS command is used, but with the wrong field, award 2 of the 4 marks.
- If 'BY SEC_NO' is used instead of 'ACROSS SEC_NO', award only 1 of the 4 marks.

QUESTION 2 - 25 MARKS

    SOLUTION                                                  MARKS
    TABLE FILE REGISTER
    PRINT COURSE_NAME OVER COURSE_NUM OVER SEC_NO              10
    BY FACNAME                                                  3
    ON FACNAME PAGE-BREAK                                       5
    FOOTING                                                     4
    "COURSE AND SECTION LIST AS OF JANUARY 31, 1986"
    END

- Lines 1 and 7 must be exactly as shown to receive marks.
- If the OVER commands are left out of line 2, award only 3 of 10 marks.
- If the OVER fields are reversed, i.e. 'OVER SEC_NO OVER COURSE_NUM', award 6 of 10 marks.
- If BY or ACROSS commands are used in place of OVER commands in line 2, award only 3 of 10 marks.
- If a verb other than PRINT is used in the second line, subtract 4 marks. Also, subtract 2 marks for each fieldname which is missing.
- Line 3 must be exactly as shown to receive marks (but PAGE-BREAK could be added to line 3 instead of line 4).
- Award the 5 marks for line 4 if PAGE-BREAK is added to line 3 in place of line 4.
- Line 4 must appear exactly as shown to receive the marks. If the PAGE-BREAK is elsewhere than with the ON or BY command it is incorrect.
- FOOTING can be on the same line as the quote.
- If SUBFOOT is used in place of FOOTING award 2 of 4 marks.
- If quotes are missing for line 6, do not award the mark.
- FOOTING CENTER is OK for line 5.

QUESTION 3 - 38 MARKS

    SOLUTION                                                  MARKS
    TABLE FILE REGISTER
    PRINT TEACH MAX_ENROLL AS 'MAXIMUM ENROLLMENT'              6
    BY FACNAME BY CONAME BY SEC_NO                              9
    ON SEC_NO NOPRINT                                           5
    ON CONAME SUB-TOTAL                                        10
    IF CONAME EQ ENGLISH OR FINANCE                             6
    END

- Lines 1 and 7 must be exactly as shown.
- If line 2 is missing the AS command for the column title, award only 3 of the 6 marks.
- If TEACH is not in the PRINT line, but rather is in a BY field, this is acceptable (but it must be the last BY field).
- If the PRINT verb is replaced by SUM or COUNT, award only 3 of 6 marks.
- If one of the BY fields in line 3 is placed in a PRINT instead of a BY, subtract 3 marks for each mistake.
- If BY fields are out of order, award only 3 of 9 marks.
- Award 5 marks for line 4 as is, or if NOPRINT follows BY SEC_NO, e.g. BY SEC_NO NOPRINT.
- Award 10 marks for line 5 as is, or if SUB-TOTAL follows the BY CONAME command, e.g. BY CONAME SUB-TOTAL BY SEC_NO
- If the ON field for line 5 is incorrect use your judgement as to how close the results would be to the one desired; don't award more than 5 of the 10 marks.
- If SUBTOTAL is used in place of SUB-TOTAL don't award more than 5 of the 10 marks.
- If the ON field is wrong in line 5, and SUBTOTAL is used in place of SUB-TOTAL, don't award more than 3 of the 10 marks.
- If AND is used instead of OR in line 6 award 2 of the 6 marks.
- If line 6 is split into two lines as follows, award 2 of the 6 marks:
      IF CONAME EQ ENGLISH
      IF CONAME EQ FINANCE
- Naming the field to be subtotaled is acceptable, i.e. SUB-TOTAL MAX_ENROLL.
- If the following 2 lines replace line 5, subtract 1 mark:
      ON CONAME SUBTOTAL
      ON FACNAME SUBTOTAL

COMPLEX TEST

QUESTION 2 - 26 MARKS

    SOLUTION                                  MARKS
    DEFINE FILE REGISTER                        3
    NUMBER = CONUM || '.' || SECNUM;           10
    END
    TABLE FILE REGISTER
    PRINT CONAME OVER NUMBER                    7
    BY FACULTY                                  3
    END

- The DEFINE commands must come before the TABLE commands. If they follow, award a maximum of 7 of the 14 marks (cut the marks in half).
- Lines 1, 3, 4 and 7 must be exactly as shown to receive the marks.
- If line 2 is placed inside the TABLE command rather than in a DEFINE, divide the marks for that line in half.
- If '|' is used instead of '||' in line 2, subtract 4 marks.
- If the '.' and one of the '||' are missing, subtract 4 marks.
- If both of the above are wrong, award 4 of 10 marks.
- If the CONUM or SECNUM fields are replaced by other fields in line 2, award one mark each for '.', '||', '||', NUMBER and one of the correct fields.
- If a name other than NUMBER is used in the DEFINE, and the same name is also used in the TABLE, subtract 4 marks.
- If line 5 is correct, but NUMBER is not defined, award only 5 of the 7 marks.
- If the OVER command is missing in line 5, award only 3 of the 7 marks.
- Award 2 of the 7 marks if PRINT CONAME is correct, but the rest of the line is incorrect.
- Award 4 marks for PRINT CONAME OVER, if the OVER field is wrong.
- If the semi-colon is missing in the DEFINE, subtract 1 mark.

QUESTION 3 - 64 MARKS

    SOLUTION                                                                MARKS
    JOIN TEACH IN REGISTER TO PROID IN PROFSEG AS NEW                        10
    DEFINE FILE REGISTER                                                      3
    SIZE = IF MAX_ENROLL GT 100 THEN 'BIG' ELSE 'SMALL';                     10
    END                                                                       1
    TABLE FILE REGISTER                                                       1
    PRINT CONAME PROF AS 'PROFESSOR' MAX AS 'MAXIMUM ENROLLMENT' SIZE         9
    BY CONUM BY SECNUM                                                        6
    ON CONUM NOPRINT                                                          5
    ON CONUM SUBTOTAL                                                         6
    SUBHEAD "COURSE NUMBER <CONUM>"                                           8
    IF FACULTY EQ ARTS                                                        4
    END                                                                       1

- Lines 2, 4, 5 and 12 must be exactly as shown in order to receive the marks.
- If the files are reversed in the JOIN command, award 7 of the 10 marks, i.e. JOIN PROID IN PROFSEG TO TEACH IN REGISTER ...
- The name NEW in line 1 can be replaced by any other name.
- The name of the DEFINE field does not have to be SIZE, just as long as the field is later printed AS 'SIZE'.
- If quotes are missing in line 3, award 8 of the 10 marks.
- If the DEFINE block comes before the JOIN, award a maximum of 7 of the 14 marks.
- If line 3 is placed inside a TABLE block rather than in the DEFINE block, award a maximum of 4 of the 10 marks.
- If the DEFINE block follows the TABLE block, award a maximum of 5 of the 14 marks.
- If a COMPUTE or IF statement is used in place of the DEFINE block, award a maximum of 4 marks.
- If the JOIN is not the first command, award only 5 of the 10 marks.
- CONAME, SECNUM, and PROF can be either BY fields, or fields in the PRINT command. But they must be in the proper order.
- Subtract 2 marks for each AS command which is executed incorrectly or is missing.
- Subtract 2 marks for each PRINT or BY field which is missing, up to a maximum of 12 marks. If the BY CONUM field is missing subtract 4 rather than 2 marks.
- Award a maximum of 6 marks for lines 8 and 9 together if BY CONUM is missing.
- NOPRINT must be associated with the CONUM field in order to receive the 5 marks.
- SUB-TOTAL can be used in place of SUBTOTAL.
- SUBTOTAL MAX_ENROLL is acceptable.
- If SUMMARIZE is used in place of SUBTOTAL, award only 4 of 6 marks.
- If the SUBTOTAL field is not CONUM don't award any marks.
- In line 10, award 3 marks for SUBHEAD, 3 for <CONUM>, and 2 for the text "COURSE NUMBER".

APPENDIX FIVE

EXPERIMENTAL PROCEDURES

1) Brief Introduction of the experiment

- Brief explanation of the purpose of the experiment (comparing the ability of novice and experienced programmers to learn a 4GL).
- Stress how important 4GLs could be to the subjects in the future.
- Stress that the session is also important because it supplies the experimenter with data for his thesis.
- Subjects should sit at least one seat apart if possible.

2) Explain the sequence of events for the session

- The session will take 3 hours, maybe more. Stress to the subjects that they have to run through all of the experiment (reading, sample problems, test, questionnaire), otherwise the data will not be of value to the study. At this point they should decide whether they want to commit to the three hours involved.
- Stress that there should be no talking at any time during the experiment. If the subjects have problems they should consult one of the lab assistants.
- Tell the subjects that the first step involves reading the manual, which will introduce them to some basic FOCUS reporting commands.
Inform them that they can keep the manual for the problem session, and for the test which follows.
- Tell the subjects they should read the manual carefully because it will make the sample problems and test much easier.
- Stress to them that there is no time limit on either the reading or the sample problem sessions.
- When the subject is finished reading, he should get up quietly and inform the lab assistant that he has finished. Subjects should not disturb the other subjects.
- Explain that the next step after reading is a practice session on the computer. Eight practice problems have to be completed. Solutions are provided, but the subjects should try to solve the problem at least once before looking at the solutions. Try to budget about an hour, or 8 minutes per problem, for the sample problems. When the subjects are finished they should inform the lab assistant, who will give them the test. Subjects will be allowed 45 minutes for the test. After completing the test the subjects will be asked to fill out a questionnaire on their computer background. This will give the experimenter information concerning their level of programming experience.

3) What the lab assistant should do during the experiment

- Hand out the FOCUS manuals to subjects.
- Record the reading starting time for each subject.
- Record the time the subject finished reading the manual.
- Give the subject the set of sample problems when he has finished reading the manual.
- Record the time the subject starts the sample problems.
- Record the time when the subject has finished the sample problems.
- Give the subject the appropriate test (either complex or simple), and tell him he has 45 minutes.
- Record the time the subject starts the test.
- Advise the subject when he has only two minutes left.
- Record the subject's finish time.
- Make sure the subject's name is on the test.
- Give the subject the questionnaire, and collect it when he has finished. Check to make sure the subject has filled out all of the questionnaire.

APPENDIX SIX

REPORT GENERATION TRAINING MANUAL

FOURTH GENERATION LANGUAGES
REPORT GENERATION TRAINING MANUAL
(Adapted from PC/FOCUS User's Manual)
C. Pulfer

Report Generation

Before beginning the report generation course it is important that you understand some simple database concepts. Three important terms which you should know are: files, records, and fields.

We can think of the computer as being equivalent to a filing cabinet. Within the filing cabinet we might have an employee file, containing information on all of the company's employees. Similarly we can have an equivalent employee file on the computer, in a database. A database might contain many of these types of files. For example, the computer might have a file on employees and also a file on shareholders, within the database.
In our filing cabinet file on employees we have information on many employees. The information concerning only one employee is called a record. Therefore we have as many records as we have employees. We would have a record for employee John Smith, one for Doug Johnson, one for Don Wilson etc. The record contains information on just one employee.

Each piece of information on the employee is called a field. For example we might have a salary field listing the salary of the employee, and an experience field listing how many years of experience the employee has.

TABLE OF CONTENTS

Introduction
Report Writing
Verbs
    PRINT
    SUM
    COUNT
Producing a Matrix Report
Displaying Data Fields Over Each Other
AS
Grouping Numerical Data
Record Selection
Control Conditions
    SUB-TOTAL
    SUBTOTAL
    SKIP-LINE
    SUMMARIZE
    RECOMPUTE
    NOPRINT
    COMPUTE - RECAP
    HEADING and FOOTING
    UNDER-LINE
    SUBHEAD
    SUBFOOT
DEFINE command
Concatenating Character Strings
Reports from Several Files

Introduction

This Report Preparation Manual will introduce you to a computer package called FOCUS. FOCUS is a database management system which allows users to store, maintain, and report on data which is of interest to them. FOCUS is a "fourth generation", or high productivity, language which has gained wide acceptance in the business community. What you learn today should be useful to you in the future.

Typically, businesses will want to store data on their personnel, the firm's financial position etc.
Decision-makers within the business will want to see regular reports on how things are progressing in recruiting, sales etc. The report preparation commands within FOCUS can be used to prepare these reports. The rest of the manual will introduce you to these FOCUS commands.

Most of the FOCUS commands are relatively English-like and easy to understand, though some are more difficult than others. Read the manual and learn the commands carefully. There is no time limit for reading the manual. Take as much time as you feel is necessary. Once you are finished, you will be tested on the commands you have learned. Notify the lab assistant once you are finished.

REPORT WRITING

FOCUS can be used to enter data, maintain data and report on data. Issuing the command TABLE FILE filename, where filename is the name of the file containing the information which will appear in the report, allows us to enter the reporting mode. The TABLE FILE command must be the first command in the report program. For example, if we have a file containing employee names and salaries, called SALARY, which we want to use in our report, we would issue the command TABLE FILE SALARY.

    TABLE FILE SALARY

Other report commands would then be issued, followed finally by the word END, on a line by itself.

    TABLE FILE SALARY
    END

a) Request Statements

"Request statements" are the commands which are used to produce the reports a user desires. A report request statement follows the rules of an imperative English sentence.
The sentence begins with a verb of action which is followed by verb objects, then a series of phrases.

The examples which follow all use a sample file called PRODUCT which contains the following fields:

    PRODUCT-TYPE    Identity of product
    AREA            Geographical area
    CUSTOMER        Name of customer
    MONTH           Month from 1 to 12
    UNITS           Number of units shipped
    AMOUNT          Dollar value of shipment

A field can also be referred to by its ALIAS. The ALIAS is a shorter version of the fieldname, making it easier for the user to type in his request statements. If the PRODUCT-TYPE field has as its ALIAS, PROD, the user could issue either of the following equivalent commands:

    i)  PRINT PRODUCT-TYPE
    ii) PRINT PROD

The order of presentation of the command elements within a report is arbitrary. The following are all equivalent commands:

    i)   SUM UNITS BY MONTH
    ii)  SUM UNITS BY MONTH
    iii) BY MONTH SUM UNITS

1. VERBS

A verb is a word of action. The action is performed on the fields which are named as the objects of the verb. The verbs are:

    Verb     Meaning
    PRINT    List the fields desired, with a different record on each line.
    COUNT    Count the number of occurrences of a field, and display the results.
    SUM      Add the numeric fields of the records together, and display the results.
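For readers who know a third generation language, the three verbs can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not FOCUS itself: the function names `focus_print`, `focus_count` and `focus_sum` are hypothetical, and the sample records are made up for illustration.

```python
# Hypothetical sample of PRODUCT records; the values are illustrative only.
records = [
    {"PRODUCT-TYPE": "AXLE", "AREA": "EAST", "UNITS": 150},
    {"PRODUCT-TYPE": "BEARING", "AREA": "EAST", "UNITS": 324},
    {"PRODUCT-TYPE": "BOLT", "AREA": "NORTH", "UNITS": 90},
]


def focus_print(rows, *fields):
    """Model of PRINT: one output line per record, listing the named fields."""
    return [tuple(row[f] for f in fields) for row in rows]


def focus_count(rows, field):
    """Model of COUNT: the number of records in which the field occurs."""
    return sum(1 for row in rows if row.get(field) is not None)


def focus_sum(rows, field):
    """Model of SUM: the total of a numeric field over all records."""
    return sum(row[field] for row in rows)


listing = focus_print(records, "PRODUCT-TYPE", "UNITS")
occurrences = focus_count(records, "PRODUCT-TYPE")
total_units = focus_sum(records, "UNITS")
```

The point of the comparison is that PRINT returns one line per record, while COUNT and SUM each collapse the file to a single value.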
The syntax (structure of the command) for a simple verb phrase is:

    VERB fieldname [AND] fieldname [AND] fieldname etc.

(the AND between fieldnames is optional)

Examples

    SUM AMOUNT
    PRINT PRODUCT-TYPE AND AREA
    COUNT PRODUCT-TYPE

a) PRINT

The PRINT command causes the information in the fields desired to be listed. The order in which fieldnames are provided in the verb phrase is the order in which the columns of the report are printed.

*NOTE - The PRINT verb cannot be used in the same program as the SUM or COUNT verb. This is because PRINT, and SUM or COUNT, display different amounts of incompatible information. PRINT does not involve any summarization of information contained in the database, as COUNT and SUM do. PRINT simply lists the information which exists in the database.

Example

    TABLE FILE PRODUCT
    PRINT PRODUCT-TYPE UNITS
    END

This produces the following report containing the fields PRODUCT-TYPE and UNITS extracted from each record in the file PRODUCT.

    PRODUCT-TYPE    UNITS
    AXLES           150
    BEARING         324

b) SUM

When the total of the values of the data field (for all records in the file) is required then the verb SUM is used. If the PRODUCT-TYPE and UNITS fields look like this

    PRODUCT-TYPE    UNITS
    AXLE            150
    BEARING         324
    SCREW           300
    BOLT            90

SUM UNITS would result in only one piece of information being displayed, i.e. the total of the UNITS field.
    UNITS
    1545

We can also produce SUMs for portions of the file by using the BY command. For instance, SUM UNITS BY MONTH means that all of the values of the field UNITS are to be added together for each MONTH. In this example there will only be 12 lines on the printed report, one for each month. The report appears as:

    MONTH    UNITS
    1        18000
    2        14625
    3        10843

Example 2

    TABLE FILE PRODUCT
    SUM UNITS AND AMOUNT
    BY AREA
    END

These commands produce the following report

    AREA     UNITS    AMOUNT
    EAST     10000    26000
    NORTH    8000     19000
    ...

Note - The SUM command cannot be used to show column-totals, for PRINTed fields, at the bottom of a report. As we will see, COLUMN-TOTAL or SUBTOTAL can be used for this purpose.

Sorting With the BY Command

A phrase in the request statement beginning with the word BY means to sequence the lines of the report by the field whose name follows (see above example). Multiple BY commands can be used. The first BY field specified would be the major sort field, the second BY sorts within each occurrence of the first BY field etc. For example the following report has three BY fields:

    TABLE FILE PRODUCT
    SUM UNITS
    BY AREA
    BY MONTH
    BY PRODUCT-TYPE
    IF MONTH IS 1 OR 2 OR 3
    END

The report appears as

    AREA     MONTH    PRODUCT-TYPE    UNITS
    EAST     1        BOLTS           200
                      FLANGES         125
             2        BOLTS           600
                      FLANGES         800
             3        BOLTS           625
                      FLANGES         515
    NORTH    1        BOLTS           125
                      FLANGES         315
    etc.
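The way several BY phrases nest can be seen by writing the same grouping out by hand. The Python sketch below is an illustration under stated assumptions, not FOCUS code: `sum_by` is a hypothetical helper and the shipment records are invented. It totals UNITS for each distinct combination of the BY fields and returns the lines in ascending order, with the first BY field as the major sort key.

```python
from collections import defaultdict

# Hypothetical shipment records, deliberately out of order; values are illustrative.
shipments = [
    {"AREA": "NORTH", "MONTH": 1, "PRODUCT-TYPE": "BOLTS", "UNITS": 100},
    {"AREA": "EAST", "MONTH": 2, "PRODUCT-TYPE": "BOLTS", "UNITS": 600},
    {"AREA": "EAST", "MONTH": 1, "PRODUCT-TYPE": "FLANGES", "UNITS": 125},
    {"AREA": "EAST", "MONTH": 1, "PRODUCT-TYPE": "BOLTS", "UNITS": 150},
    {"AREA": "EAST", "MONTH": 1, "PRODUCT-TYPE": "BOLTS", "UNITS": 50},
    {"AREA": "NORTH", "MONTH": 1, "PRODUCT-TYPE": "BOLTS", "UNITS": 25},
]


def sum_by(rows, value_field, by_fields):
    """Total value_field for every distinct combination of by_fields.

    Lines come back in ascending order of the BY fields, the first
    field being the major sort key, as in SUM ... BY a BY b BY c.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in by_fields)
        totals[key] += row[value_field]
    return [key + (total,) for key, total in sorted(totals.items())]


report = sum_by(shipments, "UNITS", ["AREA", "MONTH", "PRODUCT-TYPE"])
for line in report:
    print(*line)
```

What takes a dictionary, a loop and a sort here is expressed in FOCUS by the single request SUM UNITS BY AREA BY MONTH BY PRODUCT-TYPE.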
The default sort sequence used to present the report is ascending (low to high: alphabetically A-Z, numerically 1, 2, 3...). This can be changed to a descending sequence, Z-A and highest numbers first, by the command BY HIGHEST fieldname.

c) COUNT

COUNT produces a count of the number of occurrences of some data field (i.e. in how many records the field occurs in the database).

Example

    COUNT CUSTOMER BY AREA

This produces:

    AREA     CUSTOMER
             COUNT
    EAST     248
    NORTH    172

SUM and COUNT can be combined in one command.

    SUM AMOUNT AND COUNT CUSTOMER BY AREA

This request would produce the same report as the example above, except that another column, AMOUNT, would be added.

Producing a Matrix Report

Producing a matrix display is accomplished by combining the BY command with an ACROSS command. In this case the columns are spread ACROSS some variable of interest in addition to the sort fields controlling the rows of the report. Note the use of the phrase ACROSS MONTH in the following example, and the command 'COLUMN-TOTAL' which produces column totals.

    TABLE FILE PRODUCT
    COUNT PRODUCT-TYPE AND COLUMN-TOTAL
    BY REGION
    ACROSS MONTH
    END

             MONTH
    REGION   1     2     3     4     5     6     7
    EAST     20    10    15    16    19    ...
    NORTH    14    70    21    26    28    ...
    SOUTH    18    10    14    19    19    ...
    WEST     15    8     7     4     9     ...

    TOTAL    68    48    57    65    75    ...

The values in the matrix represent the number of products available in the region by month.
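A rough Python model of the matrix request may help readers picture what BY, ACROSS and COLUMN-TOTAL each contribute. It is a sketch only: `count_matrix` is a hypothetical helper, the records are invented, and a combination with no records is shown as 0 here, which is an assumption about presentation rather than a statement of how FOCUS prints an empty cell.

```python
from collections import Counter

# Hypothetical PRODUCT records; the values are illustrative only.
products = [
    {"REGION": "EAST", "MONTH": 1, "PRODUCT-TYPE": "BOLTS"},
    {"REGION": "EAST", "MONTH": 1, "PRODUCT-TYPE": "FLANGES"},
    {"REGION": "EAST", "MONTH": 2, "PRODUCT-TYPE": "BOLTS"},
    {"REGION": "NORTH", "MONTH": 2, "PRODUCT-TYPE": "BOLTS"},
]


def count_matrix(rows, by_field, across_field):
    """Count records per (BY value, ACROSS value) pair.

    Returns the sorted column values, a dict mapping each row value to
    its list of counts (one per column), and the list of column totals.
    """
    counts = Counter((row[by_field], row[across_field]) for row in rows)
    row_keys = sorted({row[by_field] for row in rows})
    col_keys = sorted({row[across_field] for row in rows})
    # Counter returns 0 for a pair that never occurred (assumed display value).
    matrix = {rk: [counts[(rk, ck)] for ck in col_keys] for rk in row_keys}
    totals = [sum(matrix[rk][i] for rk in row_keys) for i in range(len(col_keys))]
    return col_keys, matrix, totals


columns, matrix, totals = count_matrix(products, "REGION", "MONTH")
```

The BY field supplies the row labels, the ACROSS field supplies the column labels, and COLUMN-TOTAL corresponds to the final list of per-column sums.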
The number of columns displayed on the report is equal to the number of verb object fields times the number of values retrieved for the ACROSS field. For instance, if the phrase ACROSS MONTH is used and there are two verb objects (PRINT AMOUNT and UNITS), then there will be 24 output columns, composed of 12 pairs. The report title appears as:

    MONTH
    1               2               3
    AMOUNT  UNITS   AMOUNT  UNITS   AMOUNT  UNITS   ...

2. Displaying Data Fields Over Each Other

Normally one column is occupied by one data field and its title heads the column. This can be reversed by specifying that the data fields are to appear one over the other. Instead of connecting the verb object fields with the word AND, the word OVER is used. For example:

    SUM AMOUNT OVER UNITS BY AREA

This would produce:

    AREA
    EAST    AMOUNT  4050
            UNITS    487
    NORTH   AMOUNT  2686
            UNITS    456

3. AS

The default column-title (which is the fieldname) can be replaced by a more meaningful title by the use of the phrase "AS column-title".

Example

    PRINT PCT.AMOUNT AS 'PERCENTAGE OF AMOUNT'

would produce the column title PERCENTAGE OF AMOUNT instead of PCT AMOUNT.

4. Grouping Numerical Data

Care must be taken when sorting by numeric fields. Each different value (perhaps only different in the last decimal place) would result in a separate line or column. A facility is provided to sort data in ranges of values.
A convenient way to view numerical data is to group the values into ranges and display the results in these ranges. The command IN-GROUPS-OF can be used to group numerical values in desired groups.

e.g.

    TABLE FILE PRODUCT
    COUNT PRODUCT-TYPE
    BY UNITS IN-GROUPS-OF 500
    END

Produces:

    UNITS  PRODUCT-TYPE
           COUNT
       0   40
     500   85
    1000   72
    1500   58
    2000   14
    2500    8

5. Record Selection

In general any command accesses all records in the file. If we only want to access certain records, we can use the IF command. Any number of IF phrases can be used, and they may refer to any data field in the file. The syntax is

    IF fieldname RELATION literal [OR literal OR literal]

where RELATION is one of the operators shown below, and LITERAL is either a numeric constant (e.g. 34) or a character constant (e.g. 'STEEL').

e.g.

    IF AREA IS EAST

Here 'IS' is the RELATION and 'EAST' is the LITERAL. Only those records which contain the value EAST in the field AREA are accessed; the others are ignored.
    RELATION            MEANING
    IS, EQ              Equality between field value and literal
    IS-NOT, NE          Inequality between field value and literal
    IS-FROM, GE         Field value equal to or greater than literal
    TO, LE              Field value equal to or less than literal
    EXCEEDS, GT         Field value greater than literal
    IS-LESS-THAN, LT    Field value less than literal
    FROM ... TO         Field value in range
    NOT-FROM ... TO     Field value not in range
    CONTAINS            Characters in field value contain characters in literal
    OMITS               Characters in field value do not contain characters
                        in literal. Opposite of CONTAINS

Note - Relations separated by commas are equivalent.

Examples

    IF AREA IS EAST OR WEST

Here, only those records which contain the value EAST or WEST in the field AREA are accessed; the others are ignored. Character values such as EAST do not have to be enclosed in single quotes unless the value contains two or more words separated by blanks, such as 'NEW YORK'.

    IF AMOUNT IS-FROM 100

The field AMOUNT must be equal to or greater than the value 100.

    IF PRODUCT-TYPE CONTAINS STEEL

The field PRODUCT-TYPE must contain the characters "STEEL" anywhere within it, e.g. COLDSTEEL.

    IF AMOUNT EXCEEDS 40

    IF UNITS FROM 100 TO 140

The field UNITS must have a value between 100 and 140 inclusive.

    IF UNITS NOT-FROM (4 TO 6) OR (9 TO 11)

The field value must be equal to a number outside the ranges given.
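The relations in the table behave like ordinary comparison tests applied to each record. The Python sketch below models a few of them; it is not FOCUS code, and the names RELATIONS, passes, in_range and select, as well as the sample records, are hypothetical. It assumes that when several IF phrases are given, a record must satisfy all of them. How an OR list behaves after IS-NOT or OMITS is not stated in this manual, so the sketch refuses that case rather than guess.

```python
import operator

# Relation keywords from the table, mapped to ordinary comparisons.
RELATIONS = {
    "IS": operator.eq, "EQ": operator.eq,
    "IS-NOT": operator.ne, "NE": operator.ne,
    "IS-FROM": operator.ge, "GE": operator.ge,
    "TO": operator.le, "LE": operator.le,
    "EXCEEDS": operator.gt, "GT": operator.gt,
    "IS-LESS-THAN": operator.lt, "LT": operator.lt,
    "CONTAINS": lambda value, literal: literal in value,
    "OMITS": lambda value, literal: literal not in value,
}


def passes(record, field, relation, *literals):
    """True if the field satisfies the relation for any of the OR'ed literals."""
    if len(literals) > 1 and relation in ("IS-NOT", "NE", "OMITS"):
        raise ValueError("OR lists with negative relations are not modelled")
    test = RELATIONS[relation]
    return any(test(record[field], literal) for literal in literals)


def in_range(record, field, low, high):
    """FROM low TO high: the range includes both end values."""
    return low <= record[field] <= high


def select(records, *conditions):
    """Keep the records for which every condition (one per IF phrase) holds."""
    return [r for r in records if all(cond(r) for cond in conditions)]


# Hypothetical records, invented for illustration only.
records = [
    {"AREA": "EAST", "AMOUNT": 150, "UNITS": 120, "PRODUCT-TYPE": "COLDSTEEL"},
    {"AREA": "WEST", "AMOUNT": 30, "UNITS": 100, "PRODUCT-TYPE": "BOLTS"},
    {"AREA": "NORTH", "AMOUNT": 500, "UNITS": 90, "PRODUCT-TYPE": "STEEL ROD"},
]

# IF AREA IS EAST OR WEST
# IF AMOUNT EXCEEDS 40
selected = select(
    records,
    lambda r: passes(r, "AREA", "IS", "EAST", "WEST"),
    lambda r: passes(r, "AMOUNT", "EXCEEDS", 40),
)
print([r["AREA"] for r in selected])
```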
NOTE - Two IF conditions cannot be placed on the same line. For instance

    IF AMOUNT EXCEEDS 40 AND UNITS EQ 100

is incorrect. The two conditions should be placed on separate lines as follows:

    IF AMOUNT EXCEEDS 40
    IF UNITS EQ 100

6. Control Conditions

Within FOCUS, a variety of actions are provided which pertain to what happens on the printed report when a sort control field (BY) changes value. For instance, a sub-total may be displayed.

The syntax for specifying the control conditions uses a phrase beginning with the word 'ON'. This is followed by the name of the field. When this field changes value on the printed report, then just before the next value is printed, the action mentioned is taken.

    SUB-TOTAL         accumulate and display all sub-totals
    SUBTOTAL          accumulate and display single subtotal
    PAGE-BREAK        force a page break
    SKIP-LINE         insert a blank line
    FOLD-LINE         fold along line
    SUMMARIZE         summarize calculations at all sort breaks
    RECOMPUTE         summarize only at named sort breaks
    NOPRINT           suppress printing of this column
    UNDER-LINE        draw underline across page
    SUBFOOT           insert free text after values
    SUBHEAD           insert free text before values
    COMPUTE / RECAP   compute recaps of content line

a) SUB-TOTAL

Under each column of numeric data, a subtotal is printed.
Each sub-total line displays the running accumulations of the sort field up to its last break in value. A COMPLETE GRAND TOTAL FOR EACH COLUMN IS PRODUCED AUTOMATICALLY.

All of the sub-totals are displayed up to and including the point of sort break requested, so that only the inner-most point of sub-totalling should be requested. For instance, if the 'BY' fields are

    BY AREA
    BY PRODUCT-TYPE
    BY MONTH
    ON PRODUCT-TYPE SUB-TOTAL

then when the AREA changes, SUBTOTALS FOR BOTH PRODUCT-TYPE AND THE OUTER SORT FIELD AREA ARE DISPLAYED. For example:

    TABLE FILE PRODUCT
    PRINT UNITS AND AMOUNT
    BY AREA
    BY PRODUCT-TYPE
    BY MONTH FROM 1 TO 3
    ON PRODUCT-TYPE SUB-TOTAL
    END

    AREA   PRODUCT-TYPE  MONTH  UNITS  AMOUNT
    EAST   BEARINGS      1        100   50.45
                         2        140   61.75
                         3        210   76.49
    *TOTAL PRODUCT-TYPE BEARINGS  450  188.69
           FLANGES       1        125   64.40
                         2        115   91.38
                         3        143   63.51
    *TOTAL PRODUCT-TYPE FLANGES   383  219.29
    *TOTAL AREA EAST              833  407.98
    WEST   AXLES         1        100  130.50
    etc.

b) SUBTOTAL

When the word SUBTOTAL without a hyphen is used, then only the subtotal of the sort break field mentioned in the 'ON' phrase is displayed. Subtotals for the outer 'BY' fields ARE NOT DISPLAYED. For instance:

    BY AREA
    BY PRODUCT-TYPE
    BY MONTH
    ON PRODUCT-TYPE SUBTOTAL

This will display a subtotal when the PRODUCT-TYPE field changes value, but WILL NOT display a subtotal for the outer field AREA when it changes value.
For example

    TABLE FILE PRODUCT
    PRINT UNITS AND AMOUNT
    BY AREA
    BY PRODUCT-TYPE
    BY MONTH FROM 1 TO 3
    ON PRODUCT-TYPE SUBTOTAL
    END

    AREA   PRODUCT-TYPE  MONTH  UNITS  AMOUNT
    EAST   BEARINGS      1        100   50.45
                         2        140   61.75
                         3        210   76.49
    *TOTAL PRODUCT-TYPE BEARINGS  450  188.69
           FLANGES       1        125   64.40
                         2        115   91.38
                         3        143   63.51
    *TOTAL PRODUCT-TYPE FLANGES   383  219.29
    WEST   AXLES         1        100  130.50
    etc.

SUBTOTAL for Specific Fields

A list of specific fields to subtotal can be supplied after the word SUB-TOTAL, or SUBTOTAL, is typed. This list overrides the default, which includes all numerical verb object fields. For example

    TABLE FILE PRODUCT
    SUM UNITS AND AMOUNT
    BY PRODUCT-TYPE
    BY MONTH
    ON PRODUCT-TYPE SUBTOTAL UNITS
    END

Here only the UNITS field will be subtotaled (normally UNITS and AMOUNT would be subtotaled).

The syntax is

    ON byfield {SUB-TOTAL / SUBTOTAL} [fieldname AND fieldname ...]

Combined PAGE-BREAK and SUB-TOTAL

The general use of a page break is to provide a separate report for each major sort control value. If a sub-total is also requested, then each of the separate reports may or may not have the running subtotals accumulated at the same time the page break is forced. The rule which is followed is:

- If the page break is requested first, then each such break will be given only the sub-totals which pertain to it, and will not accumulate running totals into future pages.

- If the sub-total is requested first, followed by the page break, then it is assumed that the standard way of producing sub-totals is desired and that the subtotals will accumulate into subsequent subtotals on other pages.

c) SKIP-LINE

The SKIP-LINE command can be used when referring to sort control (BY) fields or to verb objects. The following example produces a blank line each time the area changes.

    BY PRODUCT-TYPE
    BY AREA
    ON AREA SKIP-LINE

The next example creates a double-spaced report: a line is skipped after each PRODUCT-TYPE.

    PRINT PRODUCT-TYPE
    ON PRODUCT-TYPE SKIP-LINE

d) SUMMARIZE

Summarization is similar to a sub-total. The difference depends upon whether any columns in the output report are themselves the result of a calculation or a calculated field. If they are, then the indicated calculation is performed instead of a sub-total on the summary line of the report.
Example

    TABLE FILE PRODUCT
    SUM AMOUNT AND UNITS
    COMPUTE PER UNIT = AMOUNT/UNITS;
    BY PRODUCT-TYPE
    BY AREA
    ON PRODUCT-TYPE SUMMARIZE
    END

    PRODUCT-TYPE  AREA   AMOUNT      UNITS  PER UNIT
    AXLES         EAST    1,342.50   1500    .89
                  NORTH   2,761.41   2000   1.38
                  SOUTH   3,849.52   3700   1.04
                  WEST    2,147.36   1800   1.19
    *TOTAL AXLES         10,100.79   9000   1.12

The COMPUTE verb is used to produce the new field PER UNIT. COMPUTE can be used to produce new fields resulting from the normal math operators: / (divide), * (multiply), - (subtract), + (add). Note that the PER UNIT field is displayed automatically; no PRINT command is needed. The COMPUTE verb acts like a PRINT verb in this case. The COMPUTE verb can never be the first command of a program.

Notice that in the column titled PER UNIT the value printed on the total line is 1.12. This is 10,100.79 / 9000, and is the result of the same calculation as the other numbers in the column. Had a sub-total been requested, the sum of the numbers in the PER UNIT column would have appeared. This would have been 4.50 and would have been meaningless in this situation.

e) RECOMPUTE

Exactly like SUMMARIZE, except that only the values for the specific 'BY' field requested are displayed, not the higher level 'BY' fields. The difference between SUMMARIZE and RECOMPUTE is equivalent to the difference between SUB-TOTAL and SUBTOTAL.

f) NOPRINT

This is used when we do not want one of the sort control fields to be printed in the final report.
The report will still be sequenced by this field, but both the column title and the data for that column are removed from the final printed report.

e.g.

    BY AREA
    BY MONTH
    ON AREA NOPRINT

The AREA field will not be printed in the report.

g) COMPUTE - RECAP LINES

The facility to produce calculations in addition to or instead of sub-totals at desired breaks in the sort control fields is provided as a recap line. A recap line is a line which is calculated, based on the data in the content lines, every time a control field changes value. In a way a sub-total is a recap line, but because it is so common it can be directly requested. Instead of just totalling the fields, as is done for subtotals, RECAP can be used to display averages or ratios at the sort break. A recap is performed whenever the control field changes value.

    ON fieldname {COMPUTE / RECAP}

COMPUTE and RECAP are equivalent when used with 'ON'. This is followed by the calculations to be performed.

e.g.

    TABLE FILE PRODUCT
    SUM AMOUNT AND UNITS AND COUNT
    BY AREA
    BY MONTH FROM 1 TO 3
    ON AREA RECAP
    UNITPRICE = AMOUNT/UNITS;
    AVE SHIPMENT = UNITS/COUNT;
    END

    AREA   MONTH  AMOUNT  UNITS  COUNT
    EAST   1        4528    200     13
           2        1200    110     14
           3        6240    460      5
    EAST UNITPRICE      15.55
         AVE SHIPMENT   30.80
    NORTH           1400    200
    etc.
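The difference between summarizing a calculated column and sub-totalling it can be checked by hand with the AXLES figures from the SUMMARIZE example. The Python sketch below is only an illustration of the arithmetic, not FOCUS code; the variable names are hypothetical.

```python
# AXLES rows from the SUMMARIZE example: (AREA, AMOUNT, UNITS)
rows = [
    ("EAST", 1342.50, 1500),
    ("NORTH", 2761.41, 2000),
    ("SOUTH", 3849.52, 3700),
    ("WEST", 2147.36, 1800),
]

total_amount = sum(amount for _, amount, _ in rows)
total_units = sum(units for _, _, units in rows)

# SUMMARIZE-style line: redo the AMOUNT/UNITS calculation on the totals.
summarized_per_unit = total_amount / total_units

# SUB-TOTAL-style line: add up the already-calculated PER UNIT column.
subtotaled_per_unit = sum(amount / units for _, amount, units in rows)

print(f"{total_amount:,.2f} {total_units} {summarized_per_unit:.2f}")
print(f"sum of PER UNIT column: {subtotaled_per_unit:.2f}")
```

The first figure is a meaningful average cost per unit for AXLES; the second is roughly four times larger, because it simply adds four ratios together.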
h) HEADING and FOOTING

A report heading can be supplied by giving the command HEADING followed by the title enclosed in double quotes. For example

    HEADING
    "PRODUCT REPORT"
    "AS OF DEC 31, 1986"

would produce

    PRODUCT REPORT
    AS OF DEC 31, 1986

at the top of the report. To center the heading use the command HEADING CENTER followed by text in quotes. To produce text at the end of the report, we would use the same syntax as for HEADING, but with the command FOOTING.

i) UNDER-LINE

This command is used to draw an underline after the named field changes. The line is drawn after any other option such as RECAP or SUB-TOTAL.

j) SUBHEAD

This command is used to insert text before a control break caused by a BY command. When data is to be embedded in text, the fieldnames are enclosed in brackets, i.e. "text <fieldname> text". The data values are those which would appear on the first line of the control break had this data been placed on the control break line rather than in the SUBHEAD.

e.g.

    TABLE FILE PRODUCT
    SUM UNITS AND AMOUNT
    BY AREA
    BY PRODUCT-TYPE
    ON AREA NOPRINT AND SUBHEAD
    " SUMMARY FOR <AREA>"
    END

    PRODUCT-TYPE  UNITS  AMOUNT
     SUMMARY FOR EAST
    AXLES           650  141
    BEARINGS        720  182
     SUMMARY FOR WEST
    AXLES           534  162.45
    etc.

Two special field prefixes are applicable to SUBHEADS:

    <ST.fieldname   The subtotal value of the field at that point in the report.
    <CT.fieldname   The running column total of the field at that point in the report.

k) SUBFOOT

This command is equivalent to SUBHEAD, but the text appears after, rather than before, the control break.

7. DEFINE command

The DEFINE command is used to create temporary data fields. Temporary data fields can be defined as mathematical or logical combinations of real or other temporary data fields. These temporary fields can be used in report request statements. Some of the uses of temporary data fields are (don't worry if you don't understand these right away):

- Compute new numerical values which are not in a data record
- Compute conditional numerical values based on IF-THEN-ELSE conditional logic
- Compute new strings of alphanumeric characters from other strings.

DEFINE syntax

    DEFINE FILE filename
    name = expression;
    name = expression;
    END

where

    filename    is the name of the file you are using to produce a new field.
    name        can be up to 12 characters long.
    expression  is the calculation, or conditional calculation, to be performed. It must be terminated by a semi-colon.

In the following example the file PRODUCT contains the fields AMOUNT and UNITS and will contain the new field PRICE. The field PRICE could then be used in a report program begun with TABLE FILE PRODUCT.
But DEFINE must take place before using the new field in a TABLE command.

e.g.

    DEFINE FILE PRODUCT
    PRICE = AMOUNT/UNITS;      (NOTE THE USE OF THE END SEMICOLON)

The arithmetic operations available are

    +    plus
    -    minus
    *    multiply
    /    divide
    **   exponentiation

The logical operators are

    EQ   equal
    NE   not equal
    LE   less than or equal
    LT   less than
    GE   greater than or equal
    GT   greater than
    AND  logical connective AND
    OR   logical connective OR

These operators are used most frequently in conditional calculations of the IF-THEN-ELSE type.

e.g.

    DEFINE FILE PRODUCT
    NEW-VAL = IF AMOUNT LT 100 THEN AMOUNT*1.5 ELSE AMOUNT*2.0;
    END

This sets NEW-VAL to either AMOUNT*1.5 or AMOUNT*2 depending on the value of AMOUNT.

    TFACTOR = IF AMOUNT GT 100 OR PRICE LT FACTOR THEN AMOUNT ELSE AMOUNT * FACTOR;

    SIZE = IF UNITS GT 100 THEN 'LARGE' ELSE 'SMALL';

Here a character string (either LARGE or SMALL) is assigned to the field SIZE.

    TYPE = IF PRODUCT EQ 'AUTO' THEN 1 ELSE 0;

1. NOTE ALPHANUMERIC LITERALS MUST BE ENCLOSED IN SINGLE QUOTES
2. EACH EXPRESSION MUST END IN A SEMI-COLON

a) Concatenating Character Strings

Concatenation is used to join fields. Two or more character strings of alphanumeric fields and/or constants can be combined into a single field. In this way, the values of different fields can be joined in a new field.

e.g.
    DEFINE FILE PRODUCT
    FULLNAME = PLANT | AREA | '3';
    END

    if    PLANT = GEOR in a 6 character field
          AREA = EAST in a 6 character field
    then  FULLNAME = GEOR  EAST  3

Note the vertical bar is used for concatenation. If we want to get rid of trailing blanks we could use the double vertical bars ||. Using the same fields as above,

    FULLNAME = PLANT || AREA || '3';

would result in

    FULLNAME = GEOREAST3

8. Reports from Several Files

Data from two or more FOCUS files may be joined together by their common values and the result stored in a temporary file. Reports spanning the entire collection of data can then be requested from this 'results' file. In this way we can prepare reports which display information from many files. For example, if we have a SALES file, containing information on product sales, and a COST file, containing information on product costs, we could produce a report on both product sales and costs, instead of one containing only sales or cost information.

The joining process is controlled by the FOCUS command JOIN. The syntax is

    JOIN field1 in file1 to field2 in file2 AS joiname

where

    'field1' is any field in the file named 'file1'
    'field2' is any field in the file named 'file2' (this field must be indexed)

'field1' AND 'field2' MUST CONTAIN THE SAME 'TYPE' OF DATA. For example, the two fields could contain names, some of which might be the same. IF THE TWO FIELDS CONTAIN THE SAME VALUE (e.g. SMITH) THEN THE TWO FILES WOULD BE JOINED FOR THAT RECORD.
IF THE FIELD VALUES ARE NOT THE SAME THEN THE TWO FILES ARE NOT JOINED FOR THAT RECORD.

The JOIN command should be issued before entering a report request that accesses data from the joined files. TABLE and DEFINE commands using the JOINed fields can only be issued after the JOIN command. Fields DEFINEd before a JOIN command are automatically deactivated by issuing the JOIN command.

TABLE and DEFINE commands can be applied as if 'file1' were a new file made up of both the original files.

Shown below are two files, the PRODUCT file and the SALESCOM file, which will be used in the next example.

    PRODUCT FILE
    AREA   UNITS  AMOUNT
    EAST   1642   7466
    WEST   2354   2635
    NORTH  5636   1234

    SALESCOM FILE
    TERRITORY  SALESREP  POINTS
    EAST       BROWNE     862
    EAST       MEHTA     1984
    EAST       RUBIN     1482
    EAST       VRONSKY   1640
    NORTH      SMITH      876

    JOIN AREA IN PRODUCT TO TERRITORY IN SALESCOM AS NEW

The above command has the effect of joining the PRODUCT file to the SALESCOM file, which contains information on sales commissions by sales territory. Diagrammatically, the JOIN command looks like this:

    [Diagram: the PRODUCT and SALESCOM files joined into a new file]

The records joined are the ones sharing common values in the AREA and TERRITORY fields. For example, all the records containing the value 'EAST' in the PRODUCT file are joined to all the records containing 'EAST' in the TERRITORY field of the SALESCOM file, producing one large file. The same is true for records sharing the value 'NORTH'. As can be seen, not all the records are joined. Notice the WEST area in the PRODUCT file has no match in the SALESCOM file and is therefore not joined.
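The matching rule just described can be sketched in a few lines of Python. This is only an illustration of the rule, not FOCUS code and not how FOCUS implements JOIN; the function name join is hypothetical, and the dictionary index merely stands in for the requirement that 'field2' be indexed. The records are the PRODUCT and SALESCOM samples shown above.

```python
def join(file1, field1, file2, field2):
    """Pair each file1 record with every file2 record sharing the same value."""
    # Index file2 on its join field so matches can be looked up directly.
    index = {}
    for record in file2:
        index.setdefault(record[field2], []).append(record)

    joined = []
    for record in file1:
        # A file1 record with no match in file2 contributes nothing.
        for match in index.get(record[field1], []):
            joined.append({**record, **match})
    return joined


product = [
    {"AREA": "EAST", "UNITS": 1642, "AMOUNT": 7466},
    {"AREA": "WEST", "UNITS": 2354, "AMOUNT": 2635},
    {"AREA": "NORTH", "UNITS": 5636, "AMOUNT": 1234},
]
salescom = [
    {"TERRITORY": "EAST", "SALESREP": "BROWNE", "POINTS": 862},
    {"TERRITORY": "EAST", "SALESREP": "MEHTA", "POINTS": 1984},
    {"TERRITORY": "EAST", "SALESREP": "RUBIN", "POINTS": 1482},
    {"TERRITORY": "EAST", "SALESREP": "VRONSKY", "POINTS": 1640},
    {"TERRITORY": "NORTH", "SALESREP": "SMITH", "POINTS": 876},
]

new = join(product, "AREA", salescom, "TERRITORY")
for row in new:
    print(row["AREA"], row["UNITS"], row["AMOUNT"], row["SALESREP"], row["POINTS"])
```

The single EAST record of the PRODUCT file is repeated once for each of the four EAST sales representatives, and the WEST record disappears because no TERRITORY value matches it.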
The joined file contains the following records:

    AREA   UNITS  AMOUNT  SALESREP  POINTS
    EAST   1642   7466    BROWNE     862
    EAST   1642   7466    MEHTA     1984
    EAST   1642   7466    RUBIN     1482
    EAST   1642   7466    VRONSKY   1640
    NORTH  5636   1234    SMITH      876

After a join, the joined file can be used for reporting.

    JOIN AREA IN PRODUCT TO TERRITORY IN SALESCOM AS NEW
    TABLE FILE PRODUCT      <-- (note the use of "file1")
    PRINT SALESREP POINTS
    BY AREA
    BY UNITS
    END

    AREA   UNITS  SALESREP  POINTS
    EAST   1642   BROWNE     862
                  MEHTA     1984
                  RUBIN     1482
                  VRONSKY   1640
    NORTH  5636   SMITH      876

A dynamic JOIN is very useful because it does not affect the master description. New files can be created by joining other files, and these other files are not affected.

APPENDIX SEVEN

PRACTICE PROBLEMS

In order to practice the commands you have just learned, you will be asked to write up several small programs. You should then enter these programs into the computer using the TED editor in FOCUS. Run the programs in FOCUS until they work correctly. Solutions are provided following the questions, but please attempt each problem at least once before turning to the solution.

All the questions will be based on a database from a fictitious milk company. The company and the files which make up its database are described below, and on the following page.

Let the session leader know when you have successfully finished all the programs.

SAMPLE APPLICATION

Application Description

Our sample application concerns the Milkmore Farms Company.
This company manufactures a variety of milk and cheese products and sells them to the public through seven outlet stores. All stores are open seven days a week.

The company wishes to record information about products manufactured and sold. To do this, two files are needed:

1. A Product file to contain descriptive information about each product manufactured by the company. This file is called XPROD.

2. A Sales file to contain sales data about products sold by each outlet store each day. This file is called SALES.

The company routinely adds, updates, and deletes information in these files so that up-to-date product description and sales reports can be produced.

MILKMORE FARMS COMPANY XPROD FILE

Any report request using this file will begin with TABLE FILE XPROD. The following are the fields which make up the file:

1. PROD_CODE - A unique alphanumeric code which identifies a certain product, e.g. B10, C17. This field is an indexed field. A field can also be referred to by its alias. ALIAS=PCODE.

2. PROD_NAME - The name of the product sold, e.g. whole milk, sour cream. ALIAS=ITEM.

3. PACKAGE - Describes the amount of product each package contains, e.g. 16 ounces, 1 dozen. ALIAS=SIZE.

4. UNIT_COST - The cost of one package of a product, e.g. $.65, $1.15. ALIAS=COST.
The following are some of the data records in the file:

    [Sample XPROD records, handwritten and largely illegible in the source; legible entries include WHOLE MILK and MEDIUM EGGS, 1 DOZEN.]

MILKMORE FARMS COMPANY SALES FILE

Any report request using this file will begin with TABLE FILE SALES. The fields which make up the file are as follows:

1. STORE_CODE - A code which uniquely describes a store which sells Milkmore products, e.g. 14B, 77F. ALIAS=SNO.

2. CITY - City in which the store is located. ALIAS=CTY.

3. AREA - A letter which describes the area in which the store is located, e.g. S, U. ALIAS=LOC.

4. DATE - The date on which the products were sold. ALIAS=DTE.

5. PROD_CODE - Same as in XPROD file above. This field is an indexed field.

6. UNIT_SOLD - The number of units of a certain product sold. ALIAS=SOLD.

7. RETAIL_PRICE - The price the product retails for. ALIAS=RP.

8. DELIVER_AMT - The number of units of a product delivered. ALIAS=SHIP.

9. OPENING_AMT - The number of units of a product in opening inventory. ALIAS=INV.

10. RETURNS - The number of units of a product returned by the customer. ALIAS=RTN.

11. DAMAGED - The number of units of a product which are damaged. ALIAS=BAD.

The following are some of the data records in the file:

    [Sample SALES records, handwritten and largely illegible in the source; columns shown: CITY, AREA, DATE, PROD_CODE, UNIT_SOLD, RETAIL_PRICE, DELIVER_AMT.]

EXAMPLE OF A FOCUS SESSION

If we want to create a program to display a store's identification code and the number of units of each product sold at the store, and want to call it SHOWSALES, the FOCUS session would appear as follows:

    >>TED SHOWSALES.FEX

This command puts us into the TED editor where we can enter our program. The FEX extension indicates that the program is a FOCUS execution procedure.
It must be added to all program names. Once in TED the following screen appears. To create or edit our program we enter EDIT at the command line (bottom of the screen).

Now we want to create space between the TOP OF FILE and END OF FILE in order to be able to enter our program. To do this we issue the command ADD 3 at the command line. We now enter the program in the space just provided using the arrow keys, and the double arrow key as the return key.

Once we are finished entering the program we return to the command line using the return key. There we type the command FILE to save the program we have just entered. This command also transfers us out of the TED editor back into normal FOCUS mode. To run the program, we can now type EX and the name of the program.

    >>EX SHOWSALES.FEX

If an error occurs in the program, a message will appear. At this point you will usually be prompted by a single caret ">". If you do not wish to continue execution of the incorrect program, the word QUIT can be typed as a response and the report request is cancelled. If there is more than one error, you may have to type QUIT to two ">" prompts in order to get back to normal FOCUS mode ">>". Once back in FOCUS you can edit your program with the TED editor (TED SHOWSALES.FEX).

If the program works, FOCUS will tell us the length of the report and ask us to hit the return key to see the report. Hitting the return key shows us the first page of the report. If the report occupies more than one page we would keep hitting the return key to see the rest of the report. At the end of the report, an END-OF-REPORT message will appear at the bottom of the screen. Hitting the return key two or three more times will get us back to the FOCUS prompt '>>'.

PRACTICE PROGRAMS FOR THE REPORT GENERATION SESSION

1. Produce a report which shows the total units sold and total returns for each store_code.
The first part of the output should appear as follows:

    STORE_CODE  UNIT_SOLD  RETURNS
    14B         376        40

Produce a report which will sum the units sold for each date within each city, and print these sums in alphabetical order by city. The first part of the report should appear as follows:

    CITY  DATE  UNIT_SOLD

Produce a report which will sum the units sold for each city and print these horizontally across the top of the page. The column headings should appear as:

    NEW YORK  NEWARK  STAMFORD  UNIONDALE

Produce the following matrix-type report which details units sold and returns for each city by retail price. (Note: there may not be units sold in each city at every retail price. In this case a dot will occur in place of a number.)

    CITY          NEW YORK            NEWARK
    RETAIL_PRICE  UNIT_SOLD  RETURNS  UNIT_SOLD  RETURNS
    .85           30

    [The remaining entries of the sample row are illegible in the source text.]

Produce a report which shows the quantities of a specific product (B10) sold, and on hand (opening amt), in a specific city (Newark). The first part of the report should appear as:

    CITY  PROD_CODE  UNIT_SOLD  OPENING_AMT

Produce a report which shows the units sold for each product code by city. For each city, show the units sold by date. Show subtotals for both the city and each date within the city. Also show a grand total for units sold. The report should have the following structure:

    CITY      DATE   PROD_CODE  UNIT_SOLD
    NEW YORK  10/17  B10        30
    *TOTAL DATE 10/17           162
    *TOTAL CITY NEW YORK        162

Produce a report which shows the ratio of returns to units sold for each product code. The first part of the report should appear as follows:

    PROD_CODE  RATIO
    B10        .17

Produce a report which prints the product name, unit cost, retail price, and ratio of cost to retail price for each product.
The first part of the report should appear as follows:

    PROD_NAME   UNIT_COST  RETAIL_PRICE  RATIO
    WHOLE MILK  [unit cost and retail price illegible in the source text]  .68

SOLUTION TO PROBLEM 1

    TABLE FILE SALES
    SUM UNIT_SOLD RETURNS
    BY STORE_CODE
    END

    PAGE 1

    STORE_CODE  UNIT_SOLD  RETURNS
    14B         376        40
    14Z         162        15
    77F         65         1
    K1          42         2

SOLUTION TO PROBLEM 2

    TABLE FILE SALES
    SUM UNIT_SOLD
    BY CITY BY DATE
    END

    PAGE 1

    CITY       DATE   UNIT_SOLD
    NEW YORK   10/17  162
    NEWARK     10/18  13
               10/19  29
    STAMFORD   12/12  376
    UNIONDALE  10/18  65

SOLUTION TO PROBLEM 3

    TABLE FILE SALES
    SUM UNIT_SOLD
    ACROSS CITY
    END

    PAGE 1

    CITY
    NEW YORK  NEWARK  STAMFORD  UNIONDALE
    162       42      376       65

SOLUTION TO PROBLEM 4

    TABLE FILE SALES
    SUM UNIT_SOLD RETURNS
    ACROSS CITY
    BY RETAIL_PRICE
    END

    PAGE 1

    CITY          NEW YORK            NEWARK              STAMFORD
    RETAIL_PRICE  UNIT_SOLD  RETURNS  UNIT_SOLD  RETURNS  UNIT_SOLD

    [The dots marking blank cells were mostly lost in the source text, so the values below are listed in reading order for each retail price and are not aligned to the columns above.]

    $.85   30 2
    $.89   30 4
    $.95   60
    $.99   13 1 80
    $1.09  35 4 70
    $1.29  . . 40
    $1.49  29 1
    $1.89  20 2 29 25
    $1.99  15 0
    $2.09  32 3
    $2.19  27
    $2.39  45
    $2.49  10
    9 8 3 Z 3 0 5

SOLUTION TO PROBLEM 5

    TABLE FILE SALES
    PRINT CITY PROD_CODE UNIT_SOLD OPENING_AMT
    IF PROD_CODE EQ B10
    IF CITY EQ NEWARK
    END

    PAGE 1

    CITY    PROD_CODE  UNIT_SOLD  OPENING_AMT
    NEWARK  B10        13         15

SOLUTION TO PROBLEM 6

    TABLE FILE SALES
    PRINT PROD_CODE UNIT_SOLD
    BY CITY BY DATE
    ON DATE SUB-TOTAL
    END

    PAGE 1

    CITY       DATE   PROD_CODE  UNIT_SOLD
    NEW YORK   10/17  B10        30
                      B17        20
                      B20        15
                      C17        12
                      D12        20
                      E1         30
                      E3         35
    *TOTAL DATE 1017             162
    *TOTAL CITY NEW YORK         162
    NEWARK     10/18  B10        13
    *TOTAL DATE 1018             13
               10/19  E12        29
    *TOTAL DATE 1019             29
    *TOTAL CITY NEWARK           42
    STAMFORD   12/12  B10        60
                      B12        40
                      C13        25
                      C7         45
                      D12        27
                      E2         80
                      E3         70
    *TOTAL DATE 1212             376
    *TOTAL CITY STAMFORD         376
    UNIONDALE  10/18  B20        25
                      C7         40
    *TOTAL DATE 1018             65
    *TOTAL CITY UNIONDALE        65
    TOTAL                        616

SOLUTION TO PROBLEM 7

    DEFINE FILE SALES
    RATIO=RETURNS/UNIT_SOLD;
    END
    TABLE FILE SALES
    PRINT PROD_CODE RATIO
    END

    PAGE 1

    PROD_CODE  RATIO
    B10        .17
    B12        .07
    C13        .12
    C7         .11
    D12        .00
    E2         .11
    E3         .11
    B10        .07
    B17        .10
    B20        .00
    C17        .00
    D12        .15
    E1         .13
    E3         .11
    B20        .04
    C7         .00
    B12        .03
    B10        .08

SOLUTION TO PROBLEM 8

    JOIN PROD_CODE IN XPROD TO PROD_CODE IN SALES AS NEW
    DEFINE FILE XPROD
    RATIO=UNIT_COST/RETAIL_PRICE;
    END
    TABLE FILE XPROD
    PRINT PROD_NAME UNIT_COST RETAIL_PRICE RATIO
    END

APPENDIX EIGHT

QUESTIONNAIRE

FOURTH GENERATION LANGUAGE EXPERIMENT

Please fill out all sections of the questionnaire as accurately as possible. The information you provide will be used to determine factors which could explain your success in the experiment. If you feel there are other factors which could be important in explaining your success with a fourth generation language, and are not mentioned below, please include them at the end of the questionnaire. All information will be held strictly confidential. If you have any questions do not hesitate to ask the lab assistant. Thank you for your cooperation.

1. Name

Please complete question 2 if you are now a student. In addition, please complete question 3 if you have worked full time, or now work full time.

2. Which school do you attend? (please check one)
      UBC
      BCIT
      Other (please specify)
3. a) Which company did/do you work for?
   b) What was your job title?
   c) In your work, did/do you make use of computers? YES  NO
   d) If yes, what type of work do/did you do with the computer?
      (check any of the following which apply)
      Querying databases
      Data entry
      Programming
      Software user
      Computer operator
      Other, please specify
   e) If your job involves/involved programming, for how many years have/had you been programming at work? (indicate number of years) ____ year(s)
   f) If your job involves/involved programming, approximately what percentage of your time at work is/was spent using computers for this purpose? (indicate percentage) ____ percent
4. What is your program of study at school? (check one of the following)
      Commerce
      MBA
      Computer Science
      Other (please specify)
5. If you have previous degrees/diplomas, please specify the title of the degree.
6. Have you ever done any computer programming? YES  NO
   If yes, list the programming languages you know, the number of years experience you have had with each, and the approximate number of programs you have written in each (e.g. COBOL, FORTRAN, APL, PASCAL).
      Language    Years Experience    Approx. no. of programs written
7. Have you ever used any report writers, spreadsheets, query languages, database management systems, or fourth generation languages? (e.g. Sequel, EDBS, dBase II, Lotus 1-2-3, IMS, FOCUS, RAMIS, TOTAL, DB2, IDEAL, ADF, ADABAS) YES  NO
   If yes, please list them below.
      Software    Years Experience    Approx. no. of programs written
8. How would you characterize your use of microcomputers over the past few years?
      (please circle the number which best describes your use)

      1      2      3      4      5      6      7
      Never use                       Use every day

9. List below any other factors in your background which you think might have had an effect on your performance in the experiment.

That's it, thank you for your participation. Please do not discuss this study with other participants as you may unduly influence their performance and learning process.

APPENDIX NINE

ESTIMATION OF THE SAMPLE SIZE NEEDED

The two methods of estimating the sample size needed are the power approach and the estimation approach. The power approach uses an estimate of the standard deviation ($\sigma$), the levels at which Type I ($\alpha$) and Type II ($\beta$) errors are to be controlled, and an estimate of the magnitude of the minimum range ($\Delta$) of the factor level means ($\mu$) which it is important to detect with high probability.

The estimation approach specifies the major comparisons of interest, and from these determines the expected widths of the confidence intervals for various sample sizes, given an advance planning value for the standard deviation. The approach is iterative, starting with an initial guess for the needed sample sizes. If the confidence intervals based on the initial sample sizes are satisfactory, the iteration process is terminated.

The power method was used first to determine a range of likely sample sizes. A sample size was then used in the estimation approach to ensure that the confidence intervals were satisfactory. The calculations are shown below.
POWER APPROACH

An estimate of the standard deviation of the subject population is needed for determining the sample size in this approach. The mean and standard deviation from the pilot study were 75.4 and 13 respectively. This standard deviation will be used as our estimate. We would like our hypothesis tests to detect differences in mean scores between subjects of about 10 marks ($\Delta = 10$). Any difference smaller than 10 could be due to chance. Because $\sigma = 13$, and we need an even $\Delta/\sigma$ ratio to use the statistical tables, we will increase $\Delta$ slightly to 13. The number of levels of the first factor (complexity) is $a = 2$, and the number of levels of the second factor (experience) is $b = 2$. We will control the Type I and Type II errors at

    $\alpha = .05$
    $1 - \beta = .90$

The power method says that if we let the number of factor levels ($r$) equal the number of factor levels of the first factor ($a$), then the resulting sample size equals the number of levels of the second factor ($b$) multiplied by the sample size for each treatment ($n$). From the power tables (Table A-10 in Neter and Wasserman, Applied Linear Statistical Models) this resulted in an $n$ of 11.5. By increasing the difference to $\Delta = 16.25$, the sample size becomes 7.5. On the other hand, with $1 - \beta = .95$ a sample size of 9 is needed. From the above calculations, it appears we need a sample size of approximately 8 to 12 for each treatment. Since there are four treatments, this results in a total sample size of 32 to 48.

ESTIMATION APPROACH

We can now use our estimates from the power approach to check the confidence intervals obtained for hypothesis testing.
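The arithmetic behind this check can be retraced with a short Python sketch. It is an illustration only, not the procedure used in the thesis: the function names are hypothetical, the t value of 1.68 is an assumed table value for 44 degrees of freedom, and the MSE of 13 is inferred from the figure of 4.33 quoted under Hypothesis Two rather than stated in the text. The standard error for the factor-level contrast follows the appendix's own formula, the square root of 2σ/n.

```python
import math

# Values quoted in the appendix.
SIGMA = 13.0      # pilot-study standard deviation
A_LEVELS = 2      # levels of the first factor (complexity)
B_LEVELS = 2      # levels of the second factor (experience)

# Assumption: one-sided 95% t value for 44 degrees of freedom,
# read from a standard t table (not stated in the appendix).
T_VALUE = 1.68


def degrees_of_freedom(n, a=A_LEVELS, b=B_LEVELS):
    """Error degrees of freedom, (n - 1) * a * b."""
    return (n - 1) * a * b


def total_sample_size(n, a=A_LEVELS, b=B_LEVELS):
    """Total subjects when each of the a * b treatments gets n subjects."""
    return n * a * b


def half_width_factor_contrast(sigma, n, t=T_VALUE):
    """Half-width for a contrast of factor level means.

    Uses the appendix's stated standard error, sqrt(2 * sigma / n).
    """
    return t * math.sqrt(2.0 * sigma / n)


def half_width_treatment_contrast(mse, n, coefficients, t=T_VALUE):
    """Half-width for a contrast of treatment means.

    s^2(L) = (MSE / n) * sum of squared contrast coefficients.
    """
    variance = mse / n * sum(c * c for c in coefficients)
    return t * math.sqrt(variance)


if __name__ == "__main__":
    print("total sample size:", total_sample_size(8), "to", total_sample_size(12))
    print("degrees of freedom at n=12:", degrees_of_freedom(12))
    print("factor contrast half-width:",
          round(half_width_factor_contrast(SIGMA, 12), 2))
    # Assumption: MSE = 13, inferred from s^2(L) = 4.33 with n = 12.
    print("treatment contrast half-width:",
          round(half_width_treatment_contrast(13.0, 12, [1, -1, -1, 1]), 2))
```

Under these assumptions the sketch gives a half-width of about 2.47 for the first contrast and about 3.50 for the second. The second differs slightly from the appendix's 3.49 because the appendix rounds the standard error to 2.08 before multiplying.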
HYPOTHESIS ONE

Empirically, the first hypothesis, that experienced programmers will obtain higher mean scores than novices on simple and complex tests, involves a contrast of factor level means. We must determine if the confidence interval for a given sample size is sufficiently small for our analysis. The confidence interval for a contrast of factor level means is

    $\hat{L} \pm t[1 - \alpha; (n - 1)ab] \, s(\hat{L})$

where $L$ is the difference between factor level means, $\hat{L}$ is the estimator of $L$ (the difference in factor level sample means), and $s(\hat{L})$ is the standard deviation of $\hat{L}$, which can be computed as the square root of $2\sigma/n$. Therefore the confidence interval, if $n = 12$, will be $\hat{L} \pm 2.47$. This is a sufficiently small interval for estimating the difference in scores between experienced programmers and novices, as we were willing to accept $\Delta = 13$.

HYPOTHESIS TWO

The second hypothesis, that the difference in scores between complex and simple tests will be greater for novices than experienced programmers, involves a contrast of treatment means. If $n = 12$, the confidence interval for a contrast of treatment means with interaction is

    $\hat{L} \pm t[1 - \alpha; (n - 1)ab] \, s(\hat{L})$

where $\hat{L} = (\bar{Y}_{11} - \bar{Y}_{12}) - (\bar{Y}_{21} - \bar{Y}_{22})$ and $s^2(\hat{L}) = \frac{MSE}{n} \sum\sum c_{ij}^2 = 4.33$, so that $s(\hat{L}) = 2.08$. With $n = 12$ and $\alpha = .05$, the interval is $\hat{L} \pm 3.49$, which is sufficient. Therefore $n = 12$ is a sufficient sample size, which makes our total target sample $N = 48$.

APPENDIX TEN

DATA COLLECTED DURING THE EXPERIMENT

DBS SN EI PWE VPEW M B A CSS OS E3GL N3GL N3GLW RWPk 1 1 0 0 0 . 0 0 0 1 0 1 6 25 0 2 2 0 0 0 . 0 0 0 1 0 1 7 64 0 3 3 1 0 0 . 0 0 1 0 0 1 1 6 0 4 4 0 o 0 . 0 0 0 1 0 1 6 54 0 5 5 1 0 0 . 0 0 1 0 0 1 1 3 4 6 6 1 0 o.oo 1 o o 1 2 55 5 7 7 0 0 0 . 
0 0 0 1 0 1 6 78 2 8 8 0 1 0 . 2 0 0 1 0 1 6 30 0 9 9 0 0 0 . 0 0 0 1 0 1 5 4 1 0 10 10 0 0 0 . 0 0 0 1 0 1 4 35 0 1 1 1 1 1 0 0 . 0 0 1 0 0 1 2 9 a 12 12 0 0 0 . 0 0 0 1 0 1 5 39 0 13 13 1 0 0 . 0 0 1 0 0 1 2 5 0 14 14 0 0 0 . 0 0 0 1 0 1 6 30 0 15 15 1 1 O . 15 0 0 1 1 3 33 0 16 16 0 0 0 . 0 0 0 1 0 1 6 35 0 17 17 0 0 o.oo 0 1 0 1 6 47 ' 0 16 18 1 0 0 . 0 0 1 0 0 1 4 20 4 19 19 0 1 0 . 13 0 1 0 1 6 51 0 20 20 1 0 o.oo 0 0 1 1 3 10 1 2 1 21 0 0 0 . 0 0 0 1 0 1 5 26 0 22 22 1 0 0 . 0 0 1 0 0 1 2 13 2 23 23 1 0 0 . 0 0 1 0 0 0 0 0 5 24 24 0 1 0 . 25 0 1 0 1 5 74 0 25 25 1 0 0 . 0 0 1 0 0 1 1 7 7 26 26 0 0 0 . 0 0 0 1 0 1 5 10 0 27 27 0 0 o.oo 0 1 0 1 7 39 0 28 28 1 1 1 . 00 1 0 0 0 5 23 0 29 29 1 0 o.oo 1 0 0 1 3 23 12 30 30 1 0 0 . 0 0 0 0 1 0 0 0 0 31 31 1 0 o.oo 1 0 0 1 2 33 3 32 32 0 0 o.oo 0 1 0 1 6 27 0 33 33 0 0 o.oo 0 1 0 1 5 38 0 34 34 1 1 0 . 4 0 0 0 1 1 3 32 0 35 35 0 0 0 . 0 0 0 1 0 1 5 87 0 36 36 1 0 o.oo 1 0 0 0 0 0 0 37 37 0 0 0 . 0 0 0 1 0 1 7 2 1 1 38 38 1 0 o.oo 1 0 0 0 0 0 0 39 39 0 1 0 . 3 0 0 1 0 1 4 13 0 40 40 1 0 0 . 0 0 1 0 0 0 0 0 0 4 1 4 1 1 o 0 . 0 0 1 0 0 1 2 5 7 42 42 0 o 0 . 0 0 0 1 0 1 5 37 1 43 43 1 0 0 . 0 0 0 0 1 0 0 0 0 4 4 44 1 0 0 . 0 0 0 0 1 1 5 46 3 45 45 0 0 o.oo 0 1 0 1 6 143 1 1 46 46 0 0 0 . 0 0 0 1 0 1 5 23 1 47 47 1 0 o.oo t 0 0 0 0 0 0 48 48 1 0 o.oo 1 o 0 1 2 7 16 49 49 0 0 0 . 0 0 0 1 0 1 5 15 0 50 51 0 0 0 . 0 0 0 1 0 1 5 1 1 1 51 53 0 0 0 . 0 0 0 1 0 1 5 27 0 52 55 0 0 0 . 0 0 0 1 0 1 5 24 0 53 56 0 o 0 . 0 0 0 1 0 1 7 320 0 54 57 0 0 0 . 0 0 0 1 0 1 6 • 40 0 55 58 0 0 0 . 0 0 0 1 0 1 4 19 0 56 59 0 o 0 . 
0 0 o 1 0 1 7 69 0 57 6 0 0 0 0 0 1 0 1 6 29 0 3LPW N4GLW TTIME SIMP RT PT TT 01 02 03 OV EXP 0 0 204 0 44 1 15 45 43 81 33 47 0 0 1 138 0 30 78 30 73 62 41 47 1 0 0 235 0 80 1 10 45 100 73 19 34 0 0 25 133 0 25 69 39 90 100 66 76 1 0 0 176 1 40 91 45 63 52 13 40 0 0 0 151 O 46 6 0 45 83 92 66 73 1 1 0 131 0 22 64 45 100 88 23 42 1 0 2 134 0 33 63 38 97 lOO 73 8 1 0 3 143 1 26 75 42 90 100 68 84 1 3 146 1 31 89 26 100 96 79 90 0 0 156 1 52 59 45 100 100 61 84 0 0 3 138 O 31 65 42 95 27 72 59 0 0 150 1 45 60 45 93 60 18 54 0 0 2 150 0 31 82 37 57 88 47 59 0 0 190 1 70 80 40 too 100 76 90 1 0 3 155 0 32 79 44 97 38 64 7 1 0 0 152 0 25 82 45 97 77 25 40 1 0 0 188 0 53 90 45 97 58 52 53 0 0 40 136 1 23 78 35 100 100 79 91 1 0 0 155 1 37 75 43 100 100 79 91 ' 0 1 1 157 0 32 89 36 92 38 28 31 0 0 169 1 66 73 30 50 92 74 7 1 0 0 0 154 1 39 70 45 90 100 74 86 0 0 1 159 0 22 96 41 80 42 58 53 1 0 0 183 1 57 81 45 100 16 76 68 0 0 0 158 1 40 82 36 67 68 74 70 0 0 2 163 0 28 96 39 too 58 4 1 46 1 0 0 130 1 40 60 30 93 80 53 73 1 0 0 158 0 43 70 45 100 4 63 46 0 0 0 182 0 52 85 45 93 27 61 53 0 1 0 190 1 75 80 35 lOO 100 100 100 0 0 1 166 0 22 105 39 100 38 44 42 0 6 169 1 33 104 32 87 60 32 57 0 0 200 o 60 95 45 70 77 28 42 1 0 25- 173 1 33 97 43 83 61 7 1 1 0 0 199 1 50 104 45 63 0 0 20 0 0 2 174 1 52 94 28 too 80 50 74 0 0 0 188 1 45 103 40 lOO 28 92 77 0 0 20 180 0 38 97 45 97 4 39 29 0 0 198 1 60 93 45 83 76 50 68 0 0 0 162 1 53 64 45 100 52 50 67 o 0 2 185 1 42 101 42 100 88 84 90 0 0 152 0 45 62 45 100 73 47 54 0 0 0 125 1 45 SO 30 1CO 100 76 90 0 0 184 0 44 95 45 30 58 53 54 1 0 1 175 0 50 8 0 45 100 96 70 78 0 0 0 145 0 60 45 40 93 77 77 77 0 0 3 190 0 6 0 4B 4 3 93 0 33 23 0 0 3 175 0 6 0 7 0 45 83 65 84 84 0 0 0 165 0 40 9 0 35 87 58 48 51 0 0 0 145 1 40 75 30 90 100 55 78 0 0 154 o 44 65 45 100 81 84 83 0 1 1 149 r 44 75 30 100 96 71 87 1 0 2 174 0 44 90 40 100 88 52 62 2 1 179 1 44 9 0 45 100 100 97 99 0 0 0 194 1 44 105 45 90 92 63 80 1 0 0 199 1 49 105 45 82 40 
0 37 0 
