Data administration and control : a framework for design 1973


DATA ADMINISTRATION AND CONTROL: A FRAMEWORK FOR DESIGN

by

BRIAN L. FINLEY
B.Sc., University of British Columbia, 1965

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF BUSINESS ADMINISTRATION

in the Faculty of Commerce and Business Administration

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1973

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Commerce and Business Administration
The University of British Columbia, Vancouver 8, Canada.
Date: September 15th, 1973.

ABSTRACT

Data is an important resource of an organization and is one of the fundamental building blocks of an effective information system. The failure of top-level management to define a framework for information systems and to recognise the potential of the data resource has a serious impact on information systems costs and development. This thesis attempts to identify some of the problem areas associated with unmanaged data and proposes a framework for the design of a Data Administration and Control System (DACS). Existing data analysis techniques have been reviewed and were found to be inadequate to meet the general requirements for data definition and documentation.
DACS, when implemented, will assist in the identification and definition of the data resource, how it is used and where it is stored throughout the organization. It provides a tool to monitor and control the data and to assist in the design of information systems.

DACS has applicability in the growing field of computer-aided information systems analysis and design. DACS itself is an automated approach to the definition of data and its uses. Extensions to the basic design are discussed which would further contribute to the development of computer-aided design tools.

TABLE OF CONTENTS

CHAPTER
I INTRODUCTION
II INFORMATION, DATA AND MANAGEMENT
    2.1 Information Systems Development
    2.2 The Management of Data
    2.3 The Problems of Data Mismanagement
    2.4 Integrated Data Banks
    2.5 Automated Systems Design
III ELEMENTS OF A DATA ADMINISTRATION AND CONTROL SYSTEM
    3.1 Objectives of DACS
    3.2 Questions to be Answered
    3.3 Requirements of DACS
IV DATA ADMINISTRATION AND CONTROL SYSTEM
    4.1 Definitions
    4.2 Data Specification Methodology
    4.3 Record Specification
    4.4 Document Specification
    4.5 Data Element Specification
    4.6 File Considerations
    4.7 System Output Reports
V THE IMPLEMENTATION AND USE OF DACS
    5.1 The Data Administrator
    5.2 Standards Concerning Data
    5.3 Collection of the Data Information
    5.4 The Users of DACS
VI SUMMARY
BIBLIOGRAPHY
APPENDIX I

LIST OF FIGURES

FIGURE
2.1 Information Systems Requirements
3.1 Data Element Identification
4.1 Data Specification Sheet
4.2 Supplementary Specification Sheet
4.3 Title Statement for Record and Segment Specifications
4.4 Title Statement for Document Specification
4.5 Title Statement for Element/Group/Array Specification
4.6 Element Function Definitions
4.7 Logical DACS Record Structure
4.8 Data Dictionary/Directory
4.9 Keyword Index
4.10 Data Requirements Analysis Report
5.1 Information Systems Organization

CHAPTER I

INTRODUCTION

"Data constitute the sine qua non of information processing".¹

Throughout organizations people depend on information to assist them in performing their functions. They use information of various forms and in various ways in strategic planning, management control and operations control.² Information from internal and external sources, historical, current and projective, is employed in goal formulation by the strategic planners. It is used in the management control function to assure that resources are used effectively and efficiently in the accomplishment of the organization's objectives and it is used by operations control in the day-to-day operation of the organization.

It can be, and has been, argued that the information systems of an organization are analogous to the nervous system in a human. They consist of stimuli (inputs), processing and responses (output and action). Formal and informal information and communication channels are vital to the success and health of an organization. The information and control systems are a cohesive network that binds the organization together for a common purpose.

¹ C.J. Bontempo, "Data Resource Management", Data Management, February 1973, p.31.
² R.N. Anthony, Planning and Control Systems, A Framework for Analysis, Division of Research, Graduate School of Business Administration, Harvard University, 1965.

An effective information system is one of the requirements for a successful organization and for this reason, the field of information systems has been receiving considerable attention at the present time and during the last decade. However, if information is analyzed it is apparent that one of the building blocks consists of data elements. Close examination will reveal that most information and control systems are based on a relatively small collection of these data elements and without these elements there would be no way to meet the information requirements of strategic management, management control or operations control.

Considering the importance of information to an organization, it could be assumed that the data elements which constitute information would be a highly valued and managed resource; one that is considered with the same degree of attention that is given to the other resources such as people and capital.

It is evident, however, from personal observations and perusals of the literature, that too few companies consider data as a valued resource of the organization. Often, they have no established philosophy towards the data that exists within their organization and they have few structures, standards or control mechanisms to ensure that the data resource is used effectively and efficiently throughout the company. In fact, it is often apparent that the data resource suffers from management neglect. This apparent neglect can cause severe problems for an organization.
The information systems may suffer from a lack of integrity and consistency, considerable cost can be incurred in duplicated efforts, opportunities may be lost when data is thought to be unavailable and increased burdens are placed on information systems architects and analysts who are trying to design and implement systems for use across functional areas of an organization.

The subject of "management information systems" has been receiving significant attention by management and systems professionals. As a fundamental requirement in the design and implementation of MIS, there are three major tasks which must be accomplished:³

- the information requirements of management have to be identified.
- the data elements which are potentially available must be defined as to technical description, meaning, storage location and retrieval.
- the data relationships among the various data elements must be identified.

³ R.V. Head, "Management Information Systems; A Critical Appraisal", Datamation, May 1967, p. 23.

The first of these requirements, that management must be able to identify its information needs, is a basic component of an effective information system. An information system can only be meaningful if it reflects and supports the organization's objectives, strategies, policies, and procedures. It is the responsibility of management to determine and define the framework for the development of information systems. In the context of this thesis, it is assumed that upper-level management is able and willing to define this requirement framework and to recognize the potential of information and its underlying data resource.

It is the undertaking and completion of the latter two tasks that will significantly contribute to better data management and control of the existing data resource.
If the definition and relationships are accomplished in a disciplined, standardized manner, a strong foundation will have been established on which to build an effective management information system.

It is the purpose of this thesis to examine the role of data as a resource of an organization; to trace the development of information systems with the effect this has had on data and to discuss the problems encountered as a result of data mismanagement. In order to better manage and control the data resource, it is contended that a formal system is required which will identify data elements, where and how they are used and stored in the organization and with whom the responsibility of data integrity resides. For this purpose the requirements for a Data Administration and Control System (DACS) are examined and a proposal for the design of such a system is forwarded.

The scope of this work includes the input and output requirements and conceptual file organization for the Data Administration and Control System but does not include the writing of the support programs to make it operational. It is felt that an in-depth analysis of what is required of a system for the management of the data resource will contribute to the state-of-the-art in the ever growing field of information system analysis and design.

It is emphasized that the Data Administration and Control System is not a file organization technique for the retrieving, analyzing and displaying of data in a data bank. This belongs to the area of data management systems whose integral function is the storage and retrieval of data in support of an information system. The Data Administration and Control System is concerned with identifying the unique data elements themselves, where they are located in the various files, and where and how they are used.

Chapter II examines the development of information systems and the effect on data elements, how data is managed (or mismanaged) in many organizations, the problems encountered because of management neglect of the data resource and the need for better management and control. The philosophy of integrated data banks is also discussed in perspective with data control.

Chapter III analyzes the objectives for a Data Control System, what information it should provide to its users and what requirements are needed for its implementation.

Chapter IV discusses the proposed Data Administration and Control System. Through a Data Specification Methodology the required information regarding data elements is captured. The various types of informational output which must be provided by the system are described, as well as a conceptual analysis of a permanent file organization to store and retrieve the data information.

In Chapter V, the role of the Data Administrator, the standards which must be developed and the approach to data collection are discussed. The potential users of the system and how they can employ and interface with the system are also described.

CHAPTER II

INFORMATION, DATA AND MANAGEMENT

2.1 INFORMATION SYSTEMS DEVELOPMENT

The requirement for data processing and the need for relevant information, both internal and external, have always been characteristic of an enterprise. Before the growth of the presently large and complex organizations a business was composed of relatively few people and operated in a rather static and structured environment. Each businessman conducted most of his own affairs and through personal involvement and observation intimately knew the required information about his business and its environment. His information system was composed of his first-hand knowledge and personal discussion with his employees and people in his market.
Each of his decisions was based on a fairly complete knowledge of the situation from the environment, market, economic condition and state of production and he could quite accurately predict the interrelating effects his decisions would have on these components.

As organizations became larger and more complex, communication lines longer, slower and more difficult to maintain, the enterprise began to fragment along functional lines. At the same time, the quantity of data and information requirements increased at a seemingly exponential rate. Data collection, data processing and information dissemination became a very real and costly concern.

Many manual systems and procedures were developed to meet these increasing problems. The first systems were based on accounting transactions and the data base consisted of journals, ledgers and accounts organized in specialized recording and filing systems to serve selected purposes. Additional systems began to develop in the other functional areas such as marketing, production, and personnel. A characteristic of these manual systems was that they generally served one functional area and one level in the organization. It was very difficult to obtain information which crossed functional or vertical lines from these subsystems.

The advent of computers in the early 1950's was considered by many to be the panacea for the data processing and information problems which had arisen. The first systems to be automated when computers were introduced into business were those of high volume and high clerical activity. Such activities as payroll, invoicing, order processing and general ledger were systems that could easily be identified as to their requirements and could be justified on a cost-benefit basis.
These systems were designed to replace the manual systems and in scope did not offer much more than the manual systems except faster processing and lower cost. Each system was still independent unto itself and served a single function and single purpose. These systems were implemented at the operational level of the firm and did not have included in their design any integration with other operational systems horizontal in the organization, nor did they provide any significant amount of vertical information to higher levels of the organization. Each of these systems had its own input, files and output and the data these systems captured and processed were used exclusively for that system.

With the increasing capability of hardware and software, systems designers recognized that the proliferation of single function systems might be profitably co-ordinated into more comprehensive systems. For example, it was desirable to integrate the payroll function, the employee benefits function, the distribution of costs to the accounting systems and the personnel records keeping function into a cohesive manpower system. The co-ordinated systems often, however, fell short of their objectives. This was mainly because the individual systems maintained their single function approach and uniqueness, and data was passed through each system and processed where it was required. Each system retained its own files with the data organized as required. In addition, the systems were usually agreed to and developed by functional area management with little involvement by top management, and their impact was primarily at the operational level.

Today, the great majority of computerized systems maintain the characteristics of the earlier manual systems.
These systems were developed independently over an extensive period of time, have little regard for future developments and concentrate on limited functional areas of the company controlled by the same management that had controlled the non-mechanized system. The context and structure of a specific system and its data is well understood by its users and system support personnel. However, these data generally exist isolated from the rest of the organization and knowledge about them is very limited. Other users are ignorant of their existence, location or the fact that the data may be exploited by them. The bottom portion of figure 2.1 illustrates some of these closed functional systems which have been developed.

[Figure 2.1 Information Systems Requirements; the bottom portion is labelled "Individual Systems and Data Files"]

At the present time there is great appeal to the "integrated systems approach"; to develop information systems which would serve not only the functional areas but would have the capability to cross the vertical and horizontal lines of figure 2.1, and serve the entire organization. There is a realization that the data elements existing in various subsystems can be profitably exploited to meet the information requirements of management control and at a higher level, to aid in the policy setting and decision making functions of strategic management. Often the higher level information requirements are of an "undefined" and demand nature which require as one component, a clear, definitive description of the available data resources. Therefore, a common means for describing data that would lend itself to a common understanding across divisional, functional and vertical lines is required.
The problem of uniformly defining the meaning, structure and use of data located within individual data banks or files of various information systems and of providing a vehicle to facilitate common understanding becomes apparent.

The desire to develop integrated systems must be based on sound understanding of what is required, what is presently available, what the real problems are and the economies involved. One of the first steps that is required in the quest for better information systems is the adoption by top management of a concern and philosophy towards the data in their organization, their recognition that data is a very valuable resource and that a means is required for its administration and control.

2.2 THE MANAGEMENT OF DATA

When data is viewed as a primary and very valuable resource of an organization, it is readily apparent that it suffers from a lack of attention and proper management. This is highlighted when the attitude towards the other valuable resources - people and capital - is considered. With these resources, their value to the organization is illustrated by the concern and constructive management afforded to them. Well established tools, techniques and procedures are implemented in an effort to obtain, allocate, utilize and monitor these resources in such a way as to maximize their utility.

It is easy to assume that once the crucial role that data serves is recognized, similar management practices and attention would be directed towards data as is given to people and capital. However, data resources are left virtually unmanaged. It is a rare manager who has ever given serious thought or consideration to the data resources.
He is concerned with informational content in the form of reports and displays and neglects the fundamental requirement of administering the underlying data elements. This task is left to the analysts and programmers who generally are concerned with individual applications in separate functional areas. Frequently, there is very little communication between analysts when identifying data elements in order to avoid unnecessary redundancy and duplication and to ensure the data resource is utilized for the optimum benefit of the entire organization.

No conscientious manager would allow his personnel or capital resources to go unmanaged, yet the dangers and problems involved in allowing data to suffer neglect have not been properly recognized. The complexities of modern data processing and information systems, multi-functional systems, on-line processing and dynamic user requirements, all compound the far reaching problems associated with the unmanaged use of the data resource.

The quantitative value of information and its contribution to decision-making is very difficult to determine. It is even more difficult to allocate a value to the data element components of information since allocations must also be made to other components such as programs and processes. However, the identification of the costs associated with data may be less difficult to quantify. Jarvinen¹ has analyzed these cost factors and summarizes them as:

- design of files
- programming of file manipulation programs
- generation of files
- maintenance of files
- data processing
- required or reserved space
- outer properties, i.e. frequency of input and output
- security and protection
- query capability
- equipment

¹ P. Jarvinen, "Design of Information Systems", Computer-Aided Information Systems Analysis and Design, Bubenko, Langefors and Solvberg (eds.), p.83.

No attempt is made in this thesis to quantify either the cost or the value associated with data. Generalities cannot be made as each situation must be evaluated in its respective environment. However, certain problems associated with data can be highlighted. These problems have frequently been observed in the author's personal experience and they are discussed on a general level to avoid presenting examples which may be interpreted as a fabrication.

2.3 THE PROBLEMS OF DATA MISMANAGEMENT

The mechanization of previous manual systems, the continuance of the functional and single application approach to new systems design and the failure of management to provide direction and guidelines has had serious impact on the data resource. It has frequently led to an uncontrolled data environment which has associated problems and costs.

Data Fragmentation

A severe problem arising from the development of independent information systems to meet the limited functional areas of an organization is the fragmentation of data resources. Data is introduced on an "ad hoc" basis to meet the requirement of each of the application areas. The primary mission of each application is to collect, process and disburse the data resources necessary to satisfy the information needs in its own area. Thus, each of these application areas or systems tend to introduce, process and maintain data in their own sets of files without regard for the needs of the other systems. These other systems, of course, must meet the pressures exercised by their users and will follow suit in their data utilization practices. Applications will also sometimes introduce data without determining first whether they can capitalize on the fact that other systems may be using the same data and have it already in a form which may be useful.
Once this cycle of fragmentation, duplication and proliferation of data throughout the various systems has begun, there is a lack of control over the data resource which leads to several problems.

Data Duplication

With each application area or system processing and maintaining its own data in isolation from the other systems, identical data elements become distributed over many files. This not only leads to additional costs of storage for the data elements but more important, it results in the duplication of effort in the capture, processing and maintenance of the data elements. The degree of duplication is often not known and management is unaware of any data or processing redundancy.

However, there are circumstances in which it is desirable to maintain and process duplicate data. This is the case when the effectiveness of the information system would be impaired or the processing cost would be increased in an effort to eliminate the duplication. What is required in order to determine when duplicate data is, in fact, redundant, is a clear description of the data resources, the uses to which these data are put and the cost of capturing, storing and processing these data. Only then can an intelligent decision be made on the costs and benefits of duplicate data. In addition to the costs involved, the maintenance of duplicate data also gives rise to the problem of data inconsistency.

Data Inconsistency

Since the same data elements may exist in several systems, these data elements are subject to the processing requirements of each respective system. A particular data element in one system may be updated on a daily basis while the same element in another system may be updated on a monthly cycle.
The data elements thus become out of phase with each other and although each purports to be the same thing they represent the situation at different periods in time. This may present no problem when the information generated is used exactly for the purpose it was intended and designed. However, there is a tendency for information to migrate outside these intended boundaries. The problem often rears its head when personnel from two or more different areas discuss a problem or decision area and discover that their respective information, although supposedly the same, in fact differs in time.

When the same data is carried in several systems the problem of maintaining the data so it is consistent becomes very difficult. There is always the chance that a transaction which updates the data element in one system does not find its way through all the subsystems to update them accordingly. If this happens, the entire information system suffers an integrity problem from which it is often difficult to recover.

In these cases, management develops an uneasy feeling about the integrity of the information they receive and about the information systems themselves. Control over the consistency of data used by various areas has been lost, and inevitably information is generated which suffers an integrity problem whose source is the same data inconsistency.

Communications

Every organization develops its own vocabulary of terms which it uses in day-to-day communications. Similarly, the various functions and aspects within the organization develop vocabularies and technical languages. If effective communication is to take place, information systems must employ a common language and produce a common understanding.
There is a danger with segmented and independent systems that the same terms are used to represent different entities or that the same entities are referenced by different names. This leads to misunderstanding, misinterpretation and poor communications between people, especially those from different areas of the organization. A serious concern arises when information coming from different sources, although identified similarly, has a different meaning and context. For an information system to be effective there must be a common vocabulary of data and information definition which is understood and provides a non-ambiguous frame of reference for data and information.

Data Isolationism

The fact that the data resides in separate files, serving unique purposes with no central control, makes it difficult for knowledge of the data resource to be communicated throughout the organization. Situations often occur when new information requirements in one area may demand data that is currently unfamiliar to this area, even though the data may be, in fact, resident and available for processing in the files of other existing application areas. Since the availability of this data is unknown there may be misguided responses to the request for the new information. The prospective user may be told that his request cannot be satisfied since the data is not available or erroneous estimates may be made regarding the cost of obtaining the data and satisfying the new requirements. If the information is worth the cost, a new system may be built duplicating the data collection and storage and further contributing to data isolationism.
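
The situation described above, in which a requester cannot tell whether the data he needs is already held in another application's files, can be illustrated with a small directory of data elements. The following is only a minimal sketch under stated assumptions: the class names, method names, element names and file names are all hypothetical illustrations and are not taken from the DACS design presented in Chapter IV.

```python
from dataclasses import dataclass, field


@dataclass
class ElementEntry:
    """One directory entry: a data element and every file holding a copy.

    Hypothetical structure for illustration only.
    """
    name: str
    definition: str
    # Maps file name -> update cycle (e.g. "daily", "monthly").
    locations: dict = field(default_factory=dict)


class DataDirectory:
    """A central record of which data elements exist and where they reside."""

    def __init__(self):
        self._entries = {}

    def register(self, name, definition, file_name, update_cycle):
        """Record that `file_name` carries a copy of the element `name`."""
        entry = self._entries.setdefault(name, ElementEntry(name, definition))
        entry.locations[file_name] = update_cycle

    def locate(self, name):
        """Return the files holding the element, or [] if it is unknown."""
        entry = self._entries.get(name)
        return sorted(entry.locations) if entry else []

    def duplicated(self):
        """Return the elements that are carried in more than one file."""
        return {
            name: sorted(entry.locations)
            for name, entry in self._entries.items()
            if len(entry.locations) > 1
        }


# Example: two applications each maintain their own copy of one element.
directory = DataDirectory()
directory.register("EMPLOYEE-NUMBER", "Unique employee identifier",
                   "PAYROLL-MASTER", "weekly")
directory.register("EMPLOYEE-NUMBER", "Unique employee identifier",
                   "PERSONNEL-FILE", "monthly")
directory.register("INVOICE-AMOUNT", "Total amount billed on an invoice",
                   "INVOICE-FILE", "daily")

print(directory.locate("EMPLOYEE-NUMBER"))
print(directory.duplicated())
```

With such a record, a request for new information can first be checked against `locate` before any new data collection is designed. The `duplicated` listing only shows where copies exist; whether a given copy is actually redundant remains a judgment that depends on how each copy is used.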

If the content and location of the data resource were centralized, the first step in responding to new requirements would be to determine whether or not the data already exists in a form that will satisfy the requirements.

The ability of an information system to respond to random requests and changing requirements is seriously hampered by isolated data. New techniques such as RPG and special retrieval languages have done much to lower the interface barriers between the data and the end-users. However, these languages still require knowledge and understanding of the available data and much effort is involved in these tasks. If the data resource were clearly defined these techniques would be much more valuable. The time required to respond to unanticipated requests would be appreciably shortened since the effort to define and locate the required data would be reduced.

Design and Implementation of New Systems

The fragmentation and isolationism of data elements causes analysts and programmers to expend much effort and time familiarizing themselves with the data resources that are required to meet their own program specification. It may be easier for an analyst to design a new system with data collection, files and output, than it is to try to determine if the data already exists. Even if he does attempt to accomplish this through informal communication and analysis, data element definition is so non-standardized that he may not succeed.

The information systems architect faces an extremely difficult task when he analyzes the organization to determine what information is required where, and who uses what data. Since he is viewing the information system requirements from an overall viewpoint, the fragmentation of data throughout the existing systems may seem like a giant jigsaw puzzle.
He would welcome a technique to lend some order to the maze of the data resource, to provide some standardization in the identification of the elements and to shed light on the relations between data, information and users.
Increasing Data Processing Costs
Total data processing costs have continued to increase in spite of gains in computer hardware efficiencies, with the ratio of people costs to equipment costs continually growing larger. Much of this is due to increased personnel costs. However, a significant share of data processing dollars is spent inefficiently on identifying, maintaining and controlling the data resource. The fragmentation of data elements, the resulting duplication of efforts, the data redundancy and the inflexibility of the systems all contribute to the costs of information systems. Analysts and programmers spend too much of their time identifying and defining data requirements to satisfy informational needs. They could be more productive and effective if supported by a system such as DACS.
2.4 INTEGRATED DATA BANKS
The recognition that one or more of the above problems exists with regard to data has often led organizations to consider, or to adopt, the concept of an "integrated corporate data bank" as a panacea for the problems associated with data management and control. All too often data resource management is viewed simply as a technical matter requiring a technical solution. This approach is understandable enough, since there is a strong tendency for people in the data processing field to solve data processing problems with more and faster hardware or more complex software.
Recognition of data problems often causes an over-reaction by the personnel involved, and they may hastily agree that their problems can be solved by eliminating data redundancy and creating program-independent data banks. Frequently, the approach considered is to incorporate all data resources within the framework of a single "data base management system" (DBMS) which eliminates data redundancy and locates all the data in a central repository.
Bontempo [2] highlights the danger of viewing redundant data as the main problem and warns that duplicate data does not in itself constitute conclusive evidence of redundancy. He suggests that the view that it does so is based on one of two non-sequiturs in the analysis of data problems: "that any two occurrences of the same data constitute a gratuitous duplication or redundancy of data". He continues by saying:
"There are circumstances in which it is desirable to maintain and process duplicate data. What is required in order to determine when duplicate data is, in fact, redundant, is an unequivocal, clear description of data resources and of the uses to which these data are put. Without this evidence, any remedies invoked to eliminate redundancy can serve merely to compound the original error which is the failure to monitor and control data utilization in a systematic and deliberate way - in the same way we monitor and control the use of other resources".
The view that a single data base management system is the only solution or the best solution to data control is based on his second non-sequitur: "data resource fragmentation implies a need for actual data integration achievable only by means of a centralized repository of data, i.e. a data base, managed by a complex and elaborate set of programs".
[2] C.J. Bontempo, "Data Resource Management", Data Management, Vol. 11, No. 2, February 1973, pp. 33-34.
Once this decision is made, management has committed itself, perhaps unconsciously, to integration and centralization as its data processing goals, and the cost involved is high. In addition to bearing the cost of the software, the organization must re-educate analysts, programmers and users, add new systems support, restructure and convert data and reorganize the entire systems flow.
Unfortunately, the approach to a DBMS is often taken very irrationally and without the required concern for planning, analysis, selection and implementation. There is no doubt that the data base management system approach and the investment it requires can well be worth the cost and effort when implemented in the proper manner, in the right environment and with the necessary management involvement. Very often, however, data banks are constructed with the goal of data processing efficiency, and the models stress input data format, flows and files with little attention to the specific output requirements of the users. There is a tendency, as well, to approach the DBMS with the same philosophy as with previous designs; that is, to concentrate on functional areas with limited scope, without consideration at the outset for the requirements and general needs of the entire organization. The concentration is often on the techniques for structuring the data in hierarchies, networks, chains, lists, rings, etc., and too little attention is paid to the information content of the data, where it is used, and the need for control. It is often in such cases that the data residing in data banks is no better understood, is no better managed or controlled and, as Dearden [3] observed, does not provide the expected increase in the quality and value of information.
The decision to implement corporate data banks, centralized or distributed [4], is not one to be considered lightly or naively.
A considerable amount of effort, planning and control must be undertaken by all levels of management to ensure that the implementation meets the requirements, produces economical returns and is justified through better and more timely information with reduced costs of new systems.
[3] J. Dearden, "MIS is a Mirage", Harvard Business Review, January-February 1972, p. 96.
[4] C.H. Kriebel, "The Future MIS", Business Automation, June 1972, p. 42.
Management, when considering solutions to its data problems, should regard its primary objective as the need to monitor and control its data resource rather than the "integration" of the data via a DBMS. What is needed is a tool by which management can analyze the data, determine if, in fact, problems do exist and thus develop evidence on the basis of which rational decisions can be made with regard to the remedies required and the approach most suited to its particular environment.
In the succeeding chapters of this thesis a proposal is put forward for a Data Administration and Control System (DACS) which serves as such a tool. The implementation of this system can provide management with a vehicle which serves not only as an aid in identifying data, its uses and associated problems, but which, by itself, may be adequate to meet the objectives of sound data management and control.
If, on the basis of rigorous analysis and planning, a DBMS is considered an economical approach to data management, DACS will provide valuable assistance in defining existing data as to location, characteristics, relationships and use. It may be used to advantage in improving the order, uniformity and discipline of the data before and during the implementation of the Data Base Management System.
2.5 AUTOMATED SYSTEMS DESIGN
The analysis and design of computer-based information systems is generally characterized as a manual process. From the time of an initial request for a new or modified system to the completion of a set of detailed specifications for programs, hardware, output and processing requirements, there are few tools or automated processes to assist the analysts and designers. Recently, however, there has been increasing research to develop automated procedures to assist in the various design phases [5, 6].
One of the most ambitious projects in this field is the development of the Information System Design and Optimization System (ISDOS) by Teichroew. This system is envisaged as encompassing the entire design process from the initial specification of the requirements via a Problem Statement Language (PSL), through to the structuring of the data and the production of the object programs.
One of the modules of ISDOS is the "Data Re-organizer", which accepts specifications from various other modules to structure the data in the form required. In order to accomplish this phase, a "meta-databank" is needed which describes and defines data as they exist in the various data banks and files of the information systems. The Data Re-organizer then interrogates this meta-databank or directory to map the logical data requirements onto the existing data base and to indicate any missing or incomplete requirements.
[5] R.V. Head, "Automated System Analysis", Datamation, August 15, 1971, p. 22.
[6] D. Teichroew and H. Sayani, "Automation of System Building", Datamation, August 15, 1971, p. 25.
CODASYL [7] and Guide-Share [8] have published proposals describing the requirements for such a directory as a means to define the physical and logical characteristics of a data base and to act as an interface between the actual data and the users and programs which employ these data. This interface, although designed to facilitate the definition and retrieval of data, could also serve as the data directory required in techniques for automated systems design.
The Data Administration and Control System (DACS) as presented in this thesis could be extended to provide a facility to meet the requirements of CODASYL's Data Description Language and Guide-Share's Data Base Descriptive Language (DBDL). As DACS goes beyond the identification and definition of the data as they exist in the data structures of the various systems, and also includes the identification of data elements as they are captured and employed throughout the organization, it may be extended to automate the process of file and data base design. The automated analysis of the data elements, how they are used, and their time and retrieval requirements could conceptually result in the definition and structuring of the data base itself.
[7] CODASYL Systems Committee, "Feature Analysis of Generalized Data Base Management Systems", May 1971.
[8] Joint Guide-Share Data Base Requirements Group, "Data Base Management System Requirements", November 11, 1970.
A system such as DACS could serve as the nucleus of one of the components of Automated Management Information Systems [9]. It would accept the data specifications from the Problem Statement Language, scan the data directory for the required data and construct logical and physical data structures.
If the design of the data base can be automated, and there is no reason why it cannot or will not be done, a significant step towards automated MIS will have been taken.
[9] H.J. Will, "A System of MIS Concepts", p. 12, June 1973.
CHAPTER III
ELEMENTS OF A DATA ADMINISTRATION AND CONTROL SYSTEM
A data processing or information system conceptually consists of input data being collected and maintained in files, and this data in turn being manipulated by procedures to produce the required output and information. The building blocks, or raw material, of these systems are data elements. They exist in the input, file and output phases, with the data collection and processing programs massaging and organizing them to meet various requirements (Figure 3.1). It is with these data elements that the Data Administration and Control System is concerned.
Figure 3.1 Data Element Identification
In considering the design for DACS, other data analysis techniques were reviewed. These included AUTOSATE and TAG (see Appendix I). AUTOSATE was one of the earlier techniques developed, and its primary emphasis was on the documents existing in the organization and the flow of these documents between the various stations. Secondary importance was given to the actual data elements which comprised these documents and the files in which they were stored. AUTOSATE's emphasis makes it difficult to analyze the data elements in order to identify redundancies, to gain insight into the uses of these data and to structure them into more effective data organizations. The main emphasis of DACS is on the data elements, employing document analysis to determine where and how the data are used.
The TAG technique uses the input and output documents of an application in order to develop data requirements and file structures for each of the applications.
TAG, however, does not have a facility to describe data as they currently exist and, moreover, has limited ability to analyze data as they are used across application boundaries.
For these reasons, it was determined that a system could be developed which would concentrate on the data elements as they currently exist, would have the generality to analyze these data in all their locations and uses, and would also have the flexibility to determine the impact of new requirements on existing data structures.
It is recognised that the processing programs, procedures, models, and the data collection and maintenance processes are an integral part of the information system (Figure 3.1). They must be included in a comprehensive tool for the design and documentation of information systems. The emphasis of this thesis and of the design of DACS is on the data elements, and for this reason the definitions of the processes are beyond the present scope. However, it would be feasible to extend the facilities to include process and program definitions as well. If this were done, the definition of the total information system could be rigorously documented and automated.
3.1 OBJECTIVES OF DACS
The basic objective of DACS is to provide a tool to assist management in the planning, monitoring and controlling of the organization's data resource. It accomplishes this by identifying the data elements as they exist in the present systems, where they originate, where they are stored and how and by whom they are used. It provides knowledge about the data resource as it is employed throughout the organization.
Where data is shared among various systems and across organizational or functional lines, DACS serves as a central repository of information about business and operating data for all persons in the organization. DACS uniformly defines the meaning, structure and use of data, provides a vehicle for common control, communication and understanding, and assists in improving the order, uniformity and discipline of the data resource.
The centralized control of data element definitions, locations and uses does not imply the integration and centralization of the data elements themselves. Instead, information on data is integrated and centralized and is available through data resource reports to all interested people in the organization. It may be desirable for effectiveness and efficiency to retain decentralized files and to employ the distributed data processing (DDP) approach as suggested by Kriebel [1].
[1] C.H. Kriebel, "The Future MIS", Business Automation, June 1972, p. 42.
DACS will provide a tool by which management can define the data resource, identify existing problems and obtain information which can be used to make more effective and efficient use of this resource.
3.2 QUESTIONS TO BE ANSWERED
To meet the objectives as outlined, DACS must provide answers to key questions about the data for management, operational and technical personnel.
Identification and Definition
1. Is the data element currently available in our system?
2. What is the element's formal name and description?
3. Is it derived from other data elements?
4. What are its technical characteristics?
5. Where is it located for retrieval and use?
Usage
1. What reports use which data?
2. Who are the users and where are they located?
3. What is the frequency of use?
4. How is the data used?
Source
1. Where does this data originate?
2. How often is it received?
3. How often is it processed?
Responsibility and Control
1. Who is responsible for the specification of the data?
2. Who is responsible for its integrity and consistency?
3. Who has authority to update the data?
Processing
1. How is the data structured and accessed for processing?
2. What programs or procedures use what data?
Redundancy
1. Is there unnecessary data element redundancy?
2. Is there unnecessary document or report redundancy?
3. Is there unnecessary file or record redundancy?
3.3 REQUIREMENTS OF DACS
In defining DACS, the following requirements have been considered.
Identification and Definition
Each data element, report, document and record must have a label, a name, a textual description and a technical description. The label is a short and unique identifier used to refer to the data item and to cross-reference it to other data items in the system. The name is a string of key words which meaningfully identifies the data item and can be used to prepare indexes to facilitate retrieval. The description is free-form English text which allows the data to be described in as much detail as required so as to be unambiguous and convey a clear meaning and understanding. The technical characteristics of the data provide information such as length, format and precision.
Location
The permanent location of the data element and how it can be accessed must be specified. If data resides in more than one location, all locations must be specified. This allows the analyst to draw upon data that already exists in processable form rather than propose recapturing the data, which may result in duplication and redundancy.
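The identification and location requirements above can be illustrated with a minimal sketch. It is not part of DACS as specified in this thesis: the class name DataItem, the function build_keyword_index, and the sample labels, descriptions and file names are all assumptions chosen for illustration. The sketch holds a label, a keyword name, a free-form description, technical characteristics and every known location for each item, and derives an index from the keywords of the names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """Hypothetical in-memory form of one specified data item."""
    label: str                                       # short, unique identifier
    name: str                                        # string of keywords
    description: str                                 # free-form English text
    technical: dict = field(default_factory=dict)    # e.g. length, format
    locations: list = field(default_factory=list)    # every file holding the item


def build_keyword_index(items):
    """Map each keyword of each name to the sorted labels that carry it."""
    index = defaultdict(set)
    for item in items:
        for keyword in item.name.upper().split():
            index[keyword].add(item.label)
    return {keyword: sorted(labels) for keyword, labels in sorted(index.items())}


# Sample entries; the labels and file names are invented for this example.
items = [
    DataItem("EMPNO", "EMPLOYEE NUMBER",
             "Number assigned to an employee on hiring.",
             {"length": 6, "format": "numeric"},
             ["PAYROLL-MASTER", "PERSONNEL-MASTER"]),
    DataItem("EMPNAME", "EMPLOYEE NAME",
             "Surname and initials of the employee.",
             {"length": 25, "format": "alphabetic"},
             ["PAYROLL-MASTER"]),
]

index = build_keyword_index(items)
print(index["EMPLOYEE"])   # ['EMPNAME', 'EMPNO']
print(index["NUMBER"])     # ['EMPNO']
```

An analyst who does not know a label could look up a keyword in such an index, obtain the candidate labels, and then read the locations recorded for each item before proposing to capture the data again.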
Responsibility for Specification and Integrity
It must be determined exactly who is responsible for specifying the data possessed in the various areas of the organization (i.e. who "owns" the data) and who is responsible for, or has the authority for, updating, changing or deleting the data. If one area has the responsibility for maintaining the integrity of a data element and its relationships, the problem of inconsistency is reduced.
Determination of Data Users
The system must include the capability of registering all the users of each data item. This provides insight into the data flows in the organization, how the data is used and the time requirements for it. It also prevents the change or deletion of data items until the impact on all users has been considered.
Determination of Data Source
In order to capture source data efficiently and to avoid unnecessary duplication, it is necessary to know in what manner and from what source the data items originate.
Data Duplication and Redundancy
Identical or very similar items of data occurring in different parts of the organization indicate potential areas for concern. There may be duplication in files, source documents or output information. Detection of these cases is necessary so each may be studied in detail. Those responsible for the data items involved must have a tool to enable them to determine if inconsistency or duplication actually exists. If such cases are found, there must be an ability to determine whether the duplication constitutes redundancy or is desirable.
Retrieval of Data Information
Users must be able to selectively retrieve the items of data that are of interest to them and review or analyze only the specifications or relationships of those individual items.
This requires the support of an appropriate data processing facility so that the characteristics of each item may be simply and selectively analyzed.
If the above requirements are satisfied through the Data Administration and Control System, the framework for controlling the data resource has been built. Efforts must then be made to review the system output, define cases of inconsistency and redundancy and determine where effectiveness and efficiency can be improved.
CHAPTER IV
DATA ADMINISTRATION AND CONTROL SYSTEM
The Data Administration and Control System (DACS) is designed to satisfy the requirements which have been identified in the preceding chapter. It is conceptualized as a computer-supported system which has the following components:
A Data Specification Methodology (DSM) which facilitates the definition and specification of the identity, technical and relational attributes of documents, records and data elements.
A file structure to facilitate the storage and retrieval of the data specifications and their related information.
A series of output displays or reports which enable the user to analyze and control the data resource as to its identification, terminology, usage and relationships.
In developing DACS, the underlying concept is that the basic units of information are data elements, and it is on these elements and how they are combined and used that the system is built.
The definitions used throughout DACS are common IBM definitions and closely correspond to Guide-Share requirements. As there is no universally accepted terminology at the present time for data definition, the author has attempted to provide keywords which will best represent data concepts.
CODASYL has attempted to outline a data definition language and several of their terms correspond to DACS definitions. Explicitly, DACS definitions for element, group, segment, record and file correspond to CODASYL's data item, group, entry, record and file. CODASYL definitions do not include document specifications and, therefore, there is no correspondence in this area.
4.1 DEFINITIONS
The basic levels of data definition are the element, group and array.
The element is the smallest unit of data specification and cannot be separated into smaller components, e.g. month.
A group consists of two or more data elements which can be considered as one logical entity or unit, e.g. date is a group made up of the data elements day, month and year.
An array is a series of recurrences of data elements or groups which are identical in their meaning or attributes, e.g. a series of amounts showing sales of products by month is an array.
Elements, groups and arrays can be combined into segments and records for storage in an automated file or into documents for operational or informational purposes.
A segment is a portion of a data record containing one or more logically related data elements, groups or arrays. A segment is typically the smallest portion of data which can be physically retrieved from a file.
A record consists of one or more logically related segments, elements, groups or arrays, e.g. an employee record contains all data pertinent to that employee.
A file or data set is a collection of logically related records, e.g. an employee file would contain all records for all employees.
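The storage-oriented levels defined so far (element, group, array, segment, record and file) nest inside one another, and the nesting can be made concrete with a short sketch. This is an illustration only, not a structure given in the thesis: the Python class names, the helper element_labels and the sample labels are assumptions. The examples reuse the date group and the monthly sales array mentioned in the definitions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Element:
    """Smallest unit of data specification, e.g. month."""
    label: str


@dataclass
class Group:
    """Two or more elements treated as one logical unit, e.g. date."""
    label: str
    elements: List[Element]

    def __post_init__(self):
        if len(self.elements) < 2:
            raise ValueError("a group consists of two or more data elements")


@dataclass
class Array:
    """Recurrences of one element or group, e.g. sales amounts by month."""
    label: str
    item: Union[Element, Group]
    occurrences: int


@dataclass
class Segment:
    """One or more logically related elements, groups or arrays."""
    label: str
    parts: List[Union[Element, Group, Array]]


@dataclass
class Record:
    """One or more logically related segments, elements, groups or arrays."""
    label: str
    parts: List[Union[Segment, Element, Group, Array]]


@dataclass
class DataFile:
    """A collection of logically related records."""
    label: str
    records: List[Record]


def element_labels(node):
    """Return the labels of the elements beneath a node, in definition order."""
    if isinstance(node, Element):
        return [node.label]
    if isinstance(node, Group):
        children = node.elements
    elif isinstance(node, Array):
        children = [node.item]        # all recurrences share one definition
    elif isinstance(node, (Segment, Record)):
        children = node.parts
    elif isinstance(node, DataFile):
        children = node.records
    else:
        raise TypeError(f"not a data definition level: {node!r}")
    return [label for child in children for label in element_labels(child)]


# Sample labels are invented for this example.
date = Group("DATE", [Element("DAY"), Element("MONTH"), Element("YEAR")])
monthly_sales = Array("MTHSALE", Element("SALEAMT"), occurrences=12)
record = Record("SALESRC", [Element("PRODNO"),
                            Segment("SALESEG", [date, monthly_sales])])
sales_file = DataFile("SALESFL", [record])

print(element_labels(sales_file))
# ['PRODNO', 'DAY', 'MONTH', 'YEAR', 'SALEAMT']
```

Walking the structure down to its elements reflects the underlying concept of DACS that every higher level is ultimately a combination of data elements.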
A document is a combination of data elements, groups and arrays put into such a form as to be humanly understood and usable for informational or operational purposes, e.g. invoices, purchase orders and reports are documents.
4.2 DATA SPECIFICATION METHODOLOGY
In order to capture information about data, its attributes and its use, DACS uses Specification Sheets and Specification Statements to describe the data. Information about the data being analyzed is entered onto the preprinted Specification Sheets by completing the appropriate Specification Statements.
There are two Specification Sheets that are used to gather information about data: the Data Specification Sheet and the Supplementary Specification Sheet. The latter is used when there is not sufficient room on the first sheet to complete the data specification. Sample Specification Sheets are shown in Figures 4.1 and 4.2.
Specification Statements
The specification or description of data to be included in DACS is accomplished by means of Specification Statements. Each short, concise statement communicates specific characteristics of the specified data item. The Data Specification Methodology provides three sets of Specification Statements: one set is designed for elements, groups and arrays (E/G/A), one for documents and the other for records and segments (R/S).
The Specification Statements for each of the data entries fall into five general classes:
1. Label: The label, a short and unique tag associated with each item of data in DACS, is intended primarily for computer manipulation and retrieval.
2. Name: The name consists of a number of keywords assigned to provide descriptive information about the data. It is the human identifier for a piece of data and must be unique. The name appears in the Keyword Index report and provides a means for identification and retrieval.
3. Description: The description includes a free-form text which unambiguously describes each item of data and provides additional information deemed necessary for clarity.
4. Technical Data: Technical data, mainly of interest to the programmers and analysts, includes such information as the length, precision and mode of a data element. It also includes access method and language processor in the case of a mechanized record.
5. Relational Data: Relational data refers to such information as the ordering of a data element within a record, the use of a unit of data such as an element by another entity such as a record or a document, the source and destination of the data and the use of the data in the organization.
Figure 4.1 Data Specification Sheet (preprinted coding form: columns 1-12, holding the data label and sequence number, are punched into each card; header fields for card number, file label, version, prepared by, preparation date, certified by and certification date; remaining columns for Specification Statements and text)
Figure 4.2 Supplementary Specification Sheet (preprinted continuation form: columns 1-12, holding the data label and sequence number, are punched into each card; remaining columns for Specification Statements and text)
The DSM uses the name and label in conjunction to achieve selective retrieval capability. If an analyst knows the label of a particular item of data, an element, record or document, he may reference the label for information on that item. If he does not know the label associated with the item of data, he may consult the Keyword Index, which is ordered by the keywords of the name, to find the data items and labels he is interested in. He may then use the label to refer to the data specification he wishes to study.
The Title Statement, the first statement on the Specification Sheets, must be completed for every data specification and is the only statement which has a fixed positional format.
The remaining Specification Statements are essentially free-form and are designed to permit as much flexibility as possible to the person describing the data. These statements have three components: the operator, the delimiter and the operand.
The operator is a keyword which identifies a single attribute of the data such as the mode, the frequency or the access method. Full words are used as the operators in the Specification Statements; however, the first four letters may be used for brevity.
The delimiter follows the operator and is always an equal sign (=). The operand is the information supplied by the analyst and is the actual data which will be entered into DACS. The operand may consist of one or more values, and the interpretation of the values is determined by their position within the list. Commas (,) are used to separate the values in a multi-value operand list. A Specification Statement is always terminated by a semicolon (;). Other than the Title Statement, the Specification Statements may be completed in any desired sequence on the Specification Sheets.

Title Statement

The Title Statement, the first line of the Specification Sheet, identifies to DACS the label of the data item to be submitted to the system and the type of processing to be performed. Data items can be added to DACS, have existing specifications changed or be deleted. The Title Statement also provides for the inclusion of administrative information such as the name of the analyst specifying the data and the date of specification. It is recorded on the Specification Sheet in a fixed format and must always be completed for the data specification to be accepted.

1. Activity Type (positions 1-2)
This field indicates whether the specification is for a document, record or element/group/array (E/G/A) and how the specification is to be processed.
Position 1 can contain one of:
E - to indicate the specification is for an E/G/A.
R - to indicate the specification is for a record.
D - to indicate the specification is for a document.
Position 2 contains one of:
B - to indicate a new specification is being entered.
C - to indicate an existing specification is being changed.
D - to indicate a specification is being deleted.
In building a specification, the Title Statement is followed by the remaining Specification Statements required to specify the data item. To change a specification, the Title Statement need only be followed by those statements required to effect the desired changes. In deleting a specification, only the Title Statement is required to remove the entire specification of the data item from the system.

2. Data Label (positions 3-9)
The label is created by the analyst at the time the specification is prepared. It may contain up to seven alphameric characters and serves as the unique identifier for the particular E/G/A, record or document. It is used for machine manipulation and retrieval, and appears in several reports.

3. Sequence Number (positions 10-12)
The sequence number consists of three numeric characters. It is used when completing E/G/A specifications and indicates the relative position of the E/G/A within the record or document.

4. Card Number (positions 13-14)
The card number is printed on the Specification Sheets. Its purpose is to sequentially number the statements for a given data specification.

5. Record or File Label (positions 15-21)
The purpose of this field depends on whether an E/G/A, a record or a document is being specified. When specifying an E/G/A, this field contains the label of the record or document on which the E/G/A appears. When specifying a record, the file or data set to which the record belongs is placed in this field. For a document this field is left blank or, if a manual file is used to store the documents, its name can be placed here.

6. Version Number (positions 22-23)
The version number consists of two numeric characters and is required when specifications differ for the same item. It is used when multiple usages create a need to show different characteristics of data.

7. Status (position 24)
This field shows the status of the item at various phases.
P - indicates a Proposed status and means that the addition, change or deletion of a data specification is being proposed to meet a requirement. The proposed specification is reviewed by the data administrator for accuracy and is entered into the system. Any affected users must be notified of the new specification and their agreement obtained. Any conflicts with other items in the system must also be identified and corrected.
A - indicates an Approved status, where the proposed specification has been agreed to by all concerned. The specification is now formally entered into the system and is available for testing but not for formal use.
E - indicates Effective status; the item is now available for use. All changes affecting various systems throughout the organization have been made and the new specification is permanently resident in the system.

8. Prepared By (positions 25-39)
This field contains the initials and last name of the analyst preparing the specification.

9. Date Prepared (positions 40-45)
This field indicates the date the specification was prepared.

10. Certified By (positions 46-60)
This field contains the name of the manager of the department that is primarily responsible for the data. New or changed specifications must be approved by him before entry into DACS.

11. Effective Date (positions 61-66)
This field indicates the date on which the status of the item becomes effective.

Name Statement

NAME = keyword string;

The Name Statement provides a humanly sensible identifier for the data item, whether E/G/A, record or document. Each item therefore has two identifiers: the Label and the Name. The Name gives the data item an identifier which indicates the use of the item. It provides a retrieval mechanism for the user who is able to describe the definition being sought but does not know its Label.

The Name consists of a series of keywords, chosen by the analyst, which describe the item at the time of specification. Each of the keywords used must be highly definitive and easily understood to ensure effective usage. When building the Name, the analyst should first consider which is the prime word, i.e. the most general term that can be used to describe the item. He should then consider the modifiers, or next most general terms, and so on until he reaches the least general or most specific term.

The Name is the data specification which is input to the Keyword Index. In order to make this report as complete and useful as possible, the analyst should be very conscientious when he constructs the Name for the data.

The Name Statement should be the last statement completed by the analyst. The creation of an effective Name depends on how well he knows the data and can identify the most descriptive keywords. During the process of completing the other Specification Statements he will gain insight into the data and its characteristics which may be helpful in identifying keywords for the Name.
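The free-form statement syntax described in this section (operator, equal sign, comma-separated operand list, terminating semicolon, optional four-letter operator abbreviation) can be illustrated with a minimal sketch. This is not the DACS implementation: the operator list is a partial selection of statements named in this chapter, and the function names and the treatment of free-text operands are assumptions made for illustration.

```python
# Partial list of operators named in this chapter (illustrative only).
OPERATORS = ["NAME", "DESCRIPTION", "TYPE", "CLASS", "COUNT", "MODE", "CODE", "RETENTION"]

# Assumption: NAME and DESCRIPTION carry free text, so their operand is not
# split on commas. The source text does not say how such commas are handled.
TEXT_OPERATORS = {"NAME", "DESCRIPTION"}


def resolve_operator(word):
    """Return the full operator for a full word or its first four letters."""
    key = word.strip().upper()
    for operator in OPERATORS:
        if key == operator or key == operator[:4]:
            return operator
    raise ValueError(f"unknown operator: {word!r}")


def parse_statement(text):
    """Split 'OPERATOR = v1, v2, ...;' into (operator, [values])."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Specification Statement must end with ';'")
    head, delimiter, operand = text[:-1].partition("=")
    if not delimiter:
        raise ValueError("missing '=' delimiter")
    operator = resolve_operator(head)
    if operator in TEXT_OPERATORS:
        return operator, [operand.strip()]
    # Multi-value operand: the position in the list gives each value its meaning.
    return operator, [value.strip() for value in operand.split(",")]
```

For instance, `parse_statement("COUN = 3, 2, 4;")` resolves the abbreviation to COUNT and returns the three values in order, leaving their positional interpretation to the caller.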
Description Statement

The Description Statement supplies DACS with the textual description which unambiguously describes the E/G/A, record or document. The analyst should compose a complete, clear and concise Description Statement. He should avoid abbreviations, mnemonics or obscure phraseology which may confuse a user who is not familiar with the data. He may include in the Description Statement any information which he judges to be relevant and which is not included in any of the other Specification Statements.

The Description Statement takes the form:

DESC = text;

The statement begins in column fifteen and, if required, can be continued on succeeding lines. The first character of any continuation line must start in column sixteen or after.

Reference Definition

Before proceeding to identify a data element, it should always be determined whether the element already exists in the system. This is done by checking the Keyword Index for similar elements and using the associated Labels to determine whether the element has already been defined. If it has, a Reference Definition can be used to simplify the repeated specification of the data.

If the element to be specified is identical to an existing one, only the Title Statement need be completed, giving the same Label and Version number. If the element differs in some of its attributes, a new Version number is assigned and only the attributes which differ from the original entry need to be supplied.

Origin, User and Responsibility Codes

In completing the Source, User and Responsibility Statements, it is necessary to identify a person, function or department in the organization.
The Source Statement identifies the originator of a document, the User Statement identifies who uses the document and the Responsibility Statement determines who is responsible for the content and integrity of the data.

The code may be structured to suit the requirements of the specific organization. Whatever structure is chosen, it must allow for the clear identification of the location and user of the document or the person responsible for the data integrity. A possible structure may consist of division, function, location, department and position numbers. An organization chart may be helpful in determining the code and, in fact, a classification structure may already be in existence. Schruben¹ suggests that this code can be similar to an accountant's chart of accounts or budgetary system.

Although the formalization of these codes may entail considerable effort in specific organizations, it is a necessary requirement for a successful implementation of DACS in each case.

¹ L. Schruben, "The Information System Model", Datamation, Vol.15, No.7, July 1969, p.93.

Data Specification Sheets

The methodology followed to complete the data specifications consists of first completing the record or document specifications and then completing a Specification Sheet for each of the data elements which appear on the document or are contained in the record. In this way a linkage is set up between records, documents and E/G/A's which is later used in the output reports.

4.3 RECORD SPECIFICATION

The Data Specification Sheet (fig. 4.1) is used to define the machine processable records that exist in the information system. These records are stored in files or data sets on cards, magnetic tape or magnetic disc and are processable by a computer or business machine.
A distinction between documents and records, as defined here, can be made by determining whether they can be used as they are (i.e. are humanly intelligible) or whether they require machine processing in order to reveal the data or information captured on them. If they are humanly intelligible, they are documents; if not, they are records.

The specification of a record includes identifying it by a Label, Name and Description and providing information as to where the record is located and how to access it.

A distinction must also be made between records and segments. A segment is a portion of a data record containing one or more logically related elements, groups or arrays. A record, then, is defined as a logical collection of one or more segments, elements, groups or arrays. Segments are included in the definition of records since they are the basis on which many Data Management Systems are built and may be the smallest portions of a data set or file which can be separately accessed and stored.

Title Statement

This statement takes the same general form as the Title Statement for documents or elements. It is described in section 4.2 and illustrated in figure 4.3.

Name Statement

The Name Statement is described in section 4.2.

Description Statement

The Description Statement, providing textual information about the record, is described in section 4.2.

Type Statement

TYPE = Record/Segment;

The Type Statement specifies whether the specification is for a record or a segment.
Class Statement

CLASS = Perm/Temp;

The Class Statement identifies the record/segment as being a member of a permanent file or a member of a temporary file which exists for only a limited duration.

FIELD                  POSITION  DESCRIPTION
Activity               1-2       RB: to create a new record
                                 RC: to change an existing record specification
                                 RD: to delete an existing record specification
Record/Segment Label   3-9       Up to seven alphameric characters giving a unique label to the record/segment
                       10-12     Blank
Card Code              13-14     Card Code
File Label             15-21     Up to seven alphameric characters identifying the file or data set in which the record/segment resides
Version                22-23     Two numeric characters which indicate the version number of the record/segment
Status                 24        The status of the specification in the system
                                 P: proposed
                                 A: approved
                                 E: effective
Prepared by            25-39     The name of the person preparing the specification
Preparation date       40-45     The date the specification was prepared
Certified by           46-60     The name of the manager of the prime user department
Effective date         61-66     The date the specification becomes effective

Figure 4.3 Title Statement for Record and Segment Specifications

Access Statement

ACCESS = ISAM/BDAM/SAM/ etc;

This statement provides the data organization and access method for a record/segment. Any access method pertinent to the installation is acceptable.

Language Statement

LANGUAGE = COBOL/ALGOL/FORTRAN/ etc;

The Language Statement indicates the language used when the record/segment was initially created. The operating system used (i.e. DOS or OS) should also be indicated.

Media Statement

MEDIA = TAPE/DISC/CARD;

The Media Statement describes the media on which the record/segment is stored.
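The fixed-position Title Statement layout of figure 4.3 lends itself to a simple column-slicing routine. The sketch below is only an illustration under stated assumptions: the field names, the function name `parse_title`, the sample card values and the choice to keep dates as raw six-character strings are all hypothetical, since the thesis does not prescribe an implementation or a date format.

```python
# Field layout from figure 4.3, as 1-based inclusive card columns.
TITLE_FIELDS = [
    ("activity", 1, 2),
    ("label", 3, 9),
    ("sequence", 10, 12),   # blank for records and documents
    ("card", 13, 14),
    ("file_label", 15, 21),
    ("version", 22, 23),
    ("status", 24, 24),
    ("prepared_by", 25, 39),
    ("date_prepared", 40, 45),
    ("certified_by", 46, 60),
    ("effective_date", 61, 66),
]

ITEM_TYPES = {"E": "element/group/array", "R": "record", "D": "document"}
ACTIONS = {"B": "enter new", "C": "change", "D": "delete"}
STATUSES = {"P": "proposed", "A": "approved", "E": "effective"}


def parse_title(card):
    """Slice a Title Statement card into named fields and check its codes."""
    card = card.ljust(66)  # pad short cards so every slice is defined
    fields = {name: card[start - 1:end].strip() for name, start, end in TITLE_FIELDS}
    activity = fields["activity"]
    if len(activity) != 2 or activity[0] not in ITEM_TYPES or activity[1] not in ACTIONS:
        raise ValueError(f"invalid activity type: {activity!r}")
    if fields["status"] not in STATUSES:
        raise ValueError(f"invalid status: {fields['status']!r}")
    return fields


# Hypothetical sample card for a new record specification (all values invented).
sample = (
    "RB" + "ACCREC".ljust(7) + "   " + "01" + "ARFIL".ljust(7) + "01" + "E"
    + "B FINLEY".ljust(15) + "010168" + "J SMITH".ljust(15) + "010168"
)
title = parse_title(sample)
```

Keeping the layout in a single table means the same routine could serve the document and element Title Statements, which use the same positions.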
Format Statement

FORMAT = F/V/U/FB/VB/FS/VS;

The Format Statement identifies whether the record is fixed or variable length, blocked or unblocked, undefined, or consists of fixed or variable length segments.

Computer Statement

COMPUTER = IBM360/UNIV9400/ etc;

The Computer Statement identifies the system used to create the record/segment.

Count Statement

COUNT = number;

The Count Statement, in reference to a record, indicates the number of records in the file or data set. If a segment is being specified, the count refers to the number of occurrences of the segment in the record.

Retention Statement

RETENTION = number of periods;

The Retention Statement provides the length of time that the data in a record/segment is maintained; for example, two months would be indicated by 2M.

End Statement

This indicates the end of the Record/Segment Specification. After completing the Record/Segment Specification, each of the individual data elements is defined by means of the Data Element Specification.

4.4 DOCUMENT SPECIFICATION

The Data Specification Sheet shown in figure 4.1 is also used to identify data and information that appear on various forms, documents, reports and displays throughout the organization. Its main purpose is to record each document by label, name, description, volume, frequency of preparation or use, source and identity of users.

A document can be defined as a medium carrying information in the form of a collection of one or more data elements. The medium in this context includes not only paper but also cards, machine processable forms, microfilm, telex messages, video displays, etc.
Any document that exists in the information system, which serves a purpose and provides data of informational or operational content to a user or process, must be identified by means of the Document Specification Statements.

ARDI² classifies documents according to their characteristics:

A source document is a document which introduces new data to the system. Generally, data collection and conversion will be necessary when a source document is used in conjunction with computers.

A work or intermediate document is a document such as a worksheet or a summary card, used mainly to summarize a large quantity of source data or to facilitate the editing of transactions or reports.

A record is a document carrying recordings of a set of related data elements. It is usually part of a manual file which is regularly updated to permit the supply of current information for use in the preparation of reports.

A report is a document carrying managerial or operational information. Such a document may call for a managerial decision or may serve to initiate a necessary operation.

² W. Hartman, H. Matthes and A. Proeme, Management Information Systems Handbook, McGraw-Hill Book Co., 1968, sect. 3.2, p.31.

Title Statement

The Title Statement has the same format as described earlier and is shown in figure 4.4.

Name Statement

The Name Statement, as described earlier, consists of a series of keywords which unambiguously identify the document.

Description Statement

The Description Statement provides textual information and a description of the document.

Type Statement

TYPE = source/work/record/report;

The Type Statement identifies the document according to the classification outlined in section 4.4.
Class Statement

CLASS = status/activity/transaction;

Status indicates a report recording status at a given date, e.g. a balance sheet. Activity indicates a report recording activity over a given period, e.g. a Profit and Loss Statement. Transaction indicates a document recording some action or event, e.g. a purchase order.

FIELD              POSITION  DESCRIPTION
Activity           1-2       DB: to enter a new document specification
                             DC: to change an existing document specification
                             DD: to delete an existing document specification
Document Label     3-9       Up to seven alphameric characters giving a unique label to the document
                   10-12     Blank
Card Code          13-14     Card Code
File Label         15-21     If applicable, up to seven alphameric characters defining the file where the document is stored for reference; otherwise blank
Version            22-23     Two numeric characters which indicate the version number of the document
Status             24        The status of the document in the system
                             P: proposed
                             A: approved
                             E: effective
Prepared by        25-39     The name of the person preparing the specification
Preparation Date   40-45     The date the specification was prepared
Certified by       46-60     The name of the manager of the prime user department
Effective date     61-66     The date the specification becomes effective

Figure 4.4 Title Statement for Document Specification

Form Statement

FORM = paper/card/video/micro;

The Form Statement further defines the document by describing the type of media on which the document appears.

Count Statement

COUNT = number;

The Count Statement represents the quantity of the documents received, processed or used in the time interval defined by the Frequency and Period Statements.

Preparation Statement

PREPARATION = manual/computer;

The Preparation Statement indicates whether the document is manually or computer produced.
Frequency Statement

FREQUENCY = number;

Frequency represents the number of times this document is received, processed or used in the period defined by the Period Statement.

Period Statement

PERIOD = SS/MM/HH/DD/WW/BW/SM/MO/QQ/SY/YY/RR;

The period is the shortest time period in which this document is received, processed or used.

SS  seconds        SM  semi-monthly
MM  minutes        MO  monthly
HH  hours          QQ  quarterly
DD  days           SY  semi-yearly
WW  weeks          YY  yearly
BW  bi-weekly      RR  as required

Origin Statement

ORIGIN = source-code;

This statement describes from whom or where the document originates. The source-code specifies the department or person who initially submits the document to the system. The document can be either internally generated by the organization or come from an external source such as a customer. The format of the source-code conforms to the structure established for the source and user specification described in section 4.2.

Users Statement

USERS = user1, user2, user3, ..., user n;

This statement defines all the users of a particular document. As there may be more than one user, it is important to ascertain the identity of all the users even though they may only use certain pieces of data on the document. The format for the user code is described in section 4.2.

Retention Statement

RETENTION = number of periods;

The Retention Statement indicates the number of time periods the document is stored, or retained, between its initial processing and eventual destruction. Periods are indicated by:

D - days
W - weeks
M - months
Y - years

For example, a document that is kept for six months would be described as:

RETENTION = 6M;

End Statement

The End Statement indicates the end of the document specification. When the document specification has been completed, the data elements appearing on the document must each be separately specified.
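As a small illustration of the Retention Statement operand described above (a number followed by D, W, M or Y), the following sketch decodes an operand such as 6M. The function name and the error handling are assumptions for illustration only; the thesis does not specify how DACS validates the operand.

```python
# Period letters from the Retention Statement description.
RETENTION_UNITS = {"D": "days", "W": "weeks", "M": "months", "Y": "years"}


def parse_retention(operand):
    """Decode a retention operand such as '6M' into (number, unit name)."""
    operand = operand.strip().upper()
    number, unit = operand[:-1], operand[-1:]
    if unit not in RETENTION_UNITS or not number.isdigit():
        raise ValueError(f"invalid retention operand: {operand!r}")
    return int(number), RETENTION_UNITS[unit]
```

Under these assumptions, `parse_retention("6M")` yields the pair `(6, "months")`, matching the six-month example in the text.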
4.5 DATA ELEMENT SPECIFICATION

A Data Specification Sheet is completed for every data element on the document or record after the document or record itself has been specified. The data elements are linked to the corresponding documents or records through the Document/Record label in the Title Statement.

Reference definition and the version number can be used to simplify the specification of a data element if it already exists in the system. As described in section 4.2, only the Title Statement and those statements needed to specify attributes which differ from the initial entry need to be supplied.

Title Statement

The Title Statement is similar to the description in section 4.2, and is shown in figure 4.5.

Name Statement

The Name Statement is described earlier in section 4.2.

Description Statement

The Description Statement is described earlier in section 4.2.

Type Statement

TYPE = element/group/array;

This statement specifies the data item as an element, group or array.
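The reference-definition rule stated at the start of this section (a later version supplies only the attributes that differ from the initial entry) can be sketched as an overlay of the later version on the original. This is an interpretation, not the thesis's own procedure: the function name, the dictionary representation and the assumption that the lowest version number is the initial entry are all hypothetical.

```python
def resolve_version(versions, version):
    """Return the full attribute set for one version of a data element.

    versions maps a two-digit version number (as a string) to the attributes
    supplied for that version. Assumption: the lowest version number is the
    initial, fully specified entry.
    """
    initial = min(versions)
    resolved = dict(versions[initial])
    if version != initial:
        # Only the differing attributes were supplied for the later version.
        resolved.update(versions[version])
    return resolved


# Hypothetical element with a second version that differs only in its mode.
custno_versions = {
    "01": {"CLASS": "name", "MODE": ("CH", "6")},
    "02": {"MODE": ("PD", "4")},
}
```

Here `resolve_version(custno_versions, "02")` would carry the class over from version 01 and take the mode from version 02.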
FIELD                   POSITION  DESCRIPTION
Activity                1-2       EB: to enter a new element specification
                                  EC: to change an existing specification
                                  ED: to delete an existing specification
Element Label           3-9       A seven-character alphameric label giving a unique identifier to the element
Sequence Number         10-12     Three numeric characters that indicate the relative position of the element within the document or record
Card Number             13-14     Card number
Document/Record Label   15-21     A seven-character alphameric label identifying the document/record on which the element appears
Version Number          22-23     Two numeric characters that indicate the version of the element
Status                  24        P: proposed
                                  A: approved
                                  E: effective
Prepared by             25-39     Name of person preparing the specification
Preparation Date        40-45     Date the specification was prepared
Certified by            46-60     Name of the manager responsible for the specification
Effective Date          61-66     Date the specification becomes effective

Figure 4.5 Title Statement for Element/Group/Array Specification

Class Statement

CLASS = name/code/amount/date/text/quantity/flag/control;

The Class Statement identifies the general use of the data. The classes have been chosen to portray information to a user or analyst. They are an extension of the object (entity), property and time concepts discussed by Langefors³ and required for automated design.
Name - data which identifies specific entities
Code - data which identifies a classification of an entity
Quantity - the number or quantity of anything (excluding monetary units)
Amount - the quantity of monetary amounts
Date - calendar date
Text - data having undefined content
Flag - an indicator showing a yes-no condition

Function Statement

FUNCTION = FI/FF/VF/VR;

This statement further defines the function of the data element. The Function Statement is applicable to Solvberg's process relations⁴, which identify the relationships between data elements. The definitions are given in figure 4.6.⁵

³ J. Bubenko, B. Langefors and A. Solvberg, Computer-Aided Information Systems Analysis and Design, Studentlitteratur, Lund, 1971, p.22.
⁴ Ibid, p.98.
⁵ J.F. Kelly, Computerized Management Information Systems, The MacMillan Company, New York, N.Y., 1970, p.372.

Figure 4.6 Element Function Definitions

Count Statement

COUNT = number, number, number, ...;

The Count Statement, used when defining groups or arrays, provides a numerical count of the number of data elements within the group or array. If one number appears in the Count Statement, it indicates the number of elements in the array or group. If more than one number appears, the numbers give the size of each dimension of a multidimensional array. For example, COUNT = 3, 2, 4; is a three-dimensional array having 3 planes, 2 rows and 4 columns.

Responsibility Statement

RESPONSIBILITY = code;

The Responsibility Statement indicates the person who is responsible for the content and integrity of a given element, group or array and who is authorized to update or delete the element. This statement, in fact, determines who "owns" the data. The code is defined in section 4.2.

Mode Statement

MODE = format, length, scale;

The Mode Statement describes the format of the data as it appears on a document or is contained in a record.
The format operand identifies the element as:

CH - a string of alphabetic or numeric characters
BI - a string of binary bits
FD - a zoned decimal number
PD - a packed decimal number
BD - a binary fixed point number
DF - a decimal floating point number
BF - a binary floating point number

The length operand specifies the number of characters or, in the case of binary numbers, the number of bits in the element. The scale operand specifies the number of fractional digits for fixed point numbers.

Edit Statement

EDIT = picture;

The Edit Statement furnishes an edit mask for use when displaying the element on a report or video display. COBOL or PL/I edit words may be used for the operand.

Key Statement

KEY = digit;

The Key Statement is used to identify a data element as an access key. In addition, it provides a means of indicating the relative importance of the element as a key in relation to other keys. The digit in the operand indicating this relative importance may be from 1 (major) to 9 (minor).

Code Statement

CODE = symbol, meaning, symbol, meaning ...;

The Code Statement is used to specify coded data. It should appear in the specification whenever CODE appears in the Class Statement.

Example: CODE = 01, Eastern Region, 02, Western Region;

End Statement

This indicates the end of the data element specification.

4.6 FILE CONSIDERATIONS

Permanent computer files must be designed, organized and structured in order to store the data information collected and to allow a means for manipulating and retrieving this information. The physical format of these files will depend on the implementation of DACS and on the computer hardware and software the user has at his disposal.
If the user has a Data Management System available, the structure of the DACS files will undoubtedly be physically different from that of a user who has only the more conventional organizations and access methods, such as Indexed Sequential, at his disposal. For these reasons, the discussion of file design here will be at a rather general and conceptual level and will concentrate on the logical organization and content of the files rather than on the actual physical structure.

This conceptual design, however, does make the assumption that a direct access storage device is used to store the information and that some type of indexed or direct file organization is available to allow for the selective retrieval of information by data labels.

The conceptual record structure for the DACS file is shown in figure 4.7. The Label of the record, document or E/G/A serves as the primary access key to the individual records. The records can be considered to be made up of a number of segments, each containing some of the information provided by the Specification Sheets. These segments may be fixed or variable length depending on the capabilities of the organization and access method employed.

[Diagram: a record keyed by Data Label, with a Data Name segment, one Technical Description segment per version and a Textual Description segment; for Documents, Element Labels and User Codes segments; for Elements, Code Definitions, Record Labels and Document Labels segments; for Records, Element Labels segments.]

Figure 4.7 Logical DACS Record Structure

For all data items there are the Name, Description and Technical data segments.

Data Name Segment

This segment contains the Keyword name of the data item and is the input to the Keyword Index.

Description Segment

This segment contains the textual description of the data. It may be necessary to have more than one of these segments in order to contain the complete description.
Technical Segment

This segment contains the Version number and status for each data item as well as the technical information. Since the specifications and characteristics vary for different versions of the data, there may be several of these segments. For each version this segment will include:

Status
Effective Date
Type
Class

In addition it will include

For Records:
Access
Language
Media
Format
Computer
Count
Retention

For Documents:
Form
Count
Frequency
Period
Source
Retention

For Elements:
Function
Count
Responsibility
Mode
Edit

For each version of the data item there will also be other associated segments. These segments will differ depending upon whether the item is a record, document or E/G/A.

For documents the following segments are defined:

Element Segments

These segments contain the Labels, Version numbers and sequence numbers of the elements which appear on the document.

User Segments

These segments contain the user codes of the departments, or persons, who use the document either as generators, processors or recipients.

For records, the element segments are defined to indicate the elements which make up the record. These segments will contain the element Label, Version and sequence number and the key indicating the relative importance of the element in the record.

For elements there are three more segment types.

Record Segment

Each of these segments contains the Label of a record in which this element can be found.

Codes Segment

If the element is classed as a code, segments should exist which identify the various codes and their definitions.

Document Segment

The Labels of the various documents in which the element appears are contained in these segments. An indicator showing that the document is source, work or report could also be included.

These various segments could be combined into physical records in varying ways.
They may all be grouped together to form one variable length record with the Label as the key; or each segment may be treated as a separate record using the Label and a suffix code as the key. If a hierarchical-oriented Data Management System such as IMS⁴ is available, this conceptual segmented approach could be utilized directly.

Two other files may be considered to simplify retrieval and processing. One would be a User File organized by user codes which, for each user, would contain a list of the documents which that particular user employs. This could be a separate file or could be an inverted index file maintained by the file organization method. The second file would be the Name File, which would be in sequence by each Keyword of the data names and would be utilized to produce the Keyword Index Report.

⁴ IBM Corporation, Information Management System V2, General Information Manual, GH20-0765, 1972.

4.7 SYSTEM OUTPUT REPORTS

The output reports or displays produced by DACS fall into two general categories:

Verification and Edit Reports
Usage Reports

Verification and Edit Reports

These reports are produced when the Specifications for the data are processed by the system as additions, changes or deletions. They provide the following information:

Errors: Any errors found in processing the Specification Statements, or unsuccessful update transactions, are identified. These could include format errors, duplicate labels or references to labels which do not exist in the system.

Omissions: Any missing specifications or characteristics that have not been supplied should be identified.

Statistics: These show the current content of the system by number of entries and indicate the additions, deletions and changes that have occurred during the last processing cycle.
The Verification and Edit reports are used mainly by the Data Administrator to maintain the completeness and integrity of the contents of DACS. It is his responsibility to check these reports and ensure that any errors or omissions are rectified.

Usage Reports

These are the significant reports produced by DACS and are used to answer the questions that were raised in section 3.2.

    Data Dictionary and Directory: This lists all the specifications for a given record, document or element; it provides information as to the content of a record or document and the location and use of the data elements.

    Keyword Index: This is an alphabetic listing by the keywords of the Name Statement and displays the complete data name and its associated label.

    Data Requirements Analysis: This report shows, by user, the source, work and report documents employed by that user.

Data Dictionary/Directory

The Data Dictionary/Directory (figure 4.8) provides a reference for all data elements, documents and records that have been specified to the system. It is divided into three sections, one each for documents, records and E/G/A's, with the data labels in alphabetical order within each section.

The dictionary section of the report shows, for each data item, the label, name, description and technical characteristics. The directory portion of the report shows:

    For Documents - the elements which appear on the document and the users of the document.

    For Records - the elements which are contained in the record.

    For Elements - the documents on which the elements appear and the records that contain the elements. If the element is classed as a code element then the code information is also included.
DATA ADMINISTRATION AND CONTROL SYSTEM
DATA DICTIONARY/DIRECTORY                                   PAGE 1  07/01/73

DATA ELEMENTS

CUSTNO   CUSTOMER NUMBER                                    (Label and Name)
         A NUMERIC IDENTIFIER ASSIGNED TO ANY COMPANY
         OR PERSON PURCHASING PRODUCTS OR SERVICES.         (Description)

         VER  STATUS  TYPE  CLASS  FUNC  COUNT  MODE  EDIT  RESPONS   EFFECT      (Technical Characteristics)
         01   E       ELEM  NAME   F1    1600   CH,6        02160903  01/01/69

         LOCATION
         LABEL    VER  SEQ  KEY  NAME                                   (Records containing this element)
         CUSTMST  01   01   1    CUSTOMER NAME AND ADDRESS RECORD
         ACCREC   01   01   1    ACCOUNTS RECEIVABLE STATUS RECORD

         USAGE
         LABEL    VER  SEQ  NAME                                        (Documents containing this element)
         CUSTORD  01   01   CUSTOMER PURCHASE ORDER
         INVOICE  01   03   CUSTOMER INVOICE

DATA RECORDS

ACCREC   CUSTOMER ACCOUNTS RECEIVABLE STATUS
         RECORD CONTAINING CUSTOMER INFORMATION SHOWING OUTSTANDING ACCOUNTS
         RECEIVABLE STATUS BY CURRENT, 30-60, 60-90, AND OVER-90 CATEGORIES

         VER  STAT  TYPE  CLASS  COUNT  FILE   ACCESS  LANG   MEDIA  FORMAT  COMP    RETENT  EFFEC
         01   E     REC   PERM   1600   ARFIL  SAM     COBOL  TAPE   FB      IBM360          01/01/68

         CONTENTS
         LABEL   VER  SEQ  KEY  NAME
         CUSTNO  01   01   1    CUSTOMER NUMBER
         CURAMT  01   02        CURRENT AMOUNT OUTSTANDING
         ETC.

Figure 4.8 DATA DICTIONARY/DIRECTORY

The Dictionary/Directory serves as the main source of reference for the data resource. It is used for identification, definition and clarification purposes by anyone requiring information about a particular data item. It also serves to highlight duplication and redundancy in documents and records and can be used by an analyst to improve the efficiency of data use and storage.

When new information requirements arise, the Dictionary/Directory can be referenced to determine if the data elements already exist in the system and if there are source documents and reports which can be utilized to satisfy the new need. If not, the report will be helpful in determining how the new information can most effectively be produced.
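The directory portion of the report is essentially an inversion of the contents of records and documents. The following minimal sketch shows one way the LOCATION and USAGE lists for an element might be derived, and how an analyst's question of whether an element already exists could be answered. The function names and the dictionary layout are assumptions for illustration, and the sample contents are abbreviated from figure 4.8.

```python
from collections import defaultdict


def build_directory(records, documents):
    """Invert content lists into per-element location and usage lists.

    records and documents map a label to the element labels it contains.
    """
    location = defaultdict(list)   # element label -> records containing it
    usage = defaultdict(list)      # element label -> documents on which it appears
    for record_label, elements in records.items():
        for element in elements:
            location[element].append(record_label)
    for document_label, elements in documents.items():
        for element in elements:
            usage[element].append(document_label)
    return location, usage


def where_used(element, location, usage):
    """Return the records and documents in which an element is found."""
    return {
        "records": sorted(location.get(element, [])),
        "documents": sorted(usage.get(element, [])),
    }


# Sample data abbreviated from figure 4.8 (contents lists are illustrative).
records = {"CUSTMST": ["CUSTNO"], "ACCREC": ["CUSTNO", "CURAMT"]}
documents = {"CUSTORD": ["CUSTNO"], "INVOICE": ["CUSTNO"]}
location, usage = build_directory(records, documents)
print(where_used("CUSTNO", location, usage))
```

An element that returns two empty lists is not yet held anywhere in the system, which is the case in which a new source document or record would have to be considered.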
Keyword Index

The Keyword Index (figure 4.9) is a listing of data labels and their corresponding names. The name of each document, record or data element appears in the Index in alphabetical order of each keyword in the name. This enables the user to locate any DACS entry through a knowledge of any word in the data name. Once having found the correct name, he can, by the associated Label, locate the data item and any information he requires in the other reports of the system.

DATA ADMINISTRATION AND CONTROL SYSTEM
KEYWORD INDEX                                               PAGE 1  01/07/73

LABEL    E/R/D  NAME
CUST002  E      ADDRESS *CUSTOMER BILLING
CUST020  R      ADDRESS MASTER *CUSTOMER NAME AND
CUST003  E      ADDRESS *CUSTOMER SHIPPING
SA001    D      ANALYSIS BY PRODUCT BY CUSTOMER *SALES
AR001    D      ACCOUNTS RECEIVABLE STATUS
CUST002  E      BILLING ADDRESS *CUSTOMER
CUST002  E      CUSTOMER BILLING ADDRESS
CUST005  D      CUSTOMER INVOICE
CUST020  R      CUSTOMER NAME AND ADDRESS MASTER
CUST001  E      CUSTOMER NUMBER
CUST004  E      CUSTOMER PURCHASE ORDER NUMBER
SA001    D      CUSTOMER *SALES ANALYSIS BY PRODUCT BY
EMP001   E      EMPLOYEE NAME
EMP002   E      EMPLOYEE NUMBER
EMP003   E      EMPLOYEE SALARY
SA002    D      HISTORY BY PRODUCT BY YEAR *SALES
INV001   E      INVENTORY NUMBER
INV003   E      INVENTORY QUANTITY ON ORDER
INV002   E      INVENTORY QUANTITY IN STOCK
INV010   D      INVENTORY STOCK STATUS
CUST005  D      INVOICE *CUSTOMER
CUST020  R      MASTER *CUSTOMER NAME AND ADDRESS
CUST020  R      NAME AND ADDRESS MASTER *CUSTOMER
EMP001   E      NAME *EMPLOYEE
CUST001  E      NUMBER *CUSTOMER
CUST004  E      NUMBER *CUSTOMER PURCHASE ORDER
EMP002   E      NUMBER *EMPLOYEE
INV001   E      NUMBER *INVENTORY
INV003   E      ORDER *INVENTORY QUANTITY ON
CUST004  E      ORDER NUMBER *CUSTOMER PURCHASE
SA001    D      PRODUCT BY CUSTOMER *SALES ANALYSIS BY
SA002    D      PRODUCT BY YEAR *SALES HISTORY BY
INV003   E      QUANTITY ON ORDER *INVENTORY
INV002   E      QUANTITY IN STOCK *INVENTORY
AR001    D      RECEIVABLE STATUS *ACCOUNTS
EMP003   E      SALARY *EMPLOYEE
CUST003  E      SHIPPING ADDRESS *CUSTOMER
SA001    D      SALES ANALYSIS BY PRODUCT BY CUSTOMER
SA002    D      SALES HISTORY BY PRODUCT BY YEAR
AR001    D      STATUS *ACCOUNTS RECEIVABLE
INV010   D      STATUS *INVENTORY STOCK
INV002   E      STOCK *INVENTORY QUANTITY IN
INV010   D      STOCK STATUS *INVENTORY
SA002    D      YEAR *SALES HISTORY BY PRODUCT BY

Figure 4.9 Keyword Index

Data Requirements Analysis

This report (figure 4.10) shows, by user or department code, the documents and data that they generate, process or receive. It can be used, when analyzing a particular area, to determine the data requirements for that area and whether any redundancy, duplication or inefficiency exists.

DATA ADMINISTRATION AND CONTROL SYSTEM                      PAGE 1  01/07/73
DATA REQUIREMENTS ANALYSIS

USER                LABEL   VER  STAT  NAME
02061301  Origin    INV001  01   E     INVENTORY RECEIPT NOTICE
                    INV002  01   E     INVENTORY SHIPPING TRANSMITTAL
          Report    INV010  01   E     INVENTORY STOCK STATUS
                    INV020  01   E     INVENTORY WAREHOUSE ORDERS
01071312  Origin    PAY001  01   E     EMPLOYEE PAYROLL CHEQUE
                    PAY004  01   E     EMPLOYEE PAYROLL REGISTER
                    PAY010  01   E     EMPLOYEE F-4 FORMS
          Work      PAY020  01   E     PAYROLL FILE UPDATE AND EDIT LIST
          Report    PAY050  01   E     EMPLOYEE TIME TICKETS
                    PAY055  01   E     EMPLOYEE PAYROLL CHANGE NOTIFICATION

Figure 4.10 Data Requirements Analysis Report

This report would also be of value to the Information Systems Architect when determining what data and information are being employed by various functions or departments within the organization. Concentration on a particular subsystem of the total information system may be achieved by analyzing the data elements and documents which serve this particular subsystem. For example, the marketing information system may be analyzed by obtaining a report retrieved by the marketing function code, which shows the data required by this function.
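Both of the listings in figures 4.9 and 4.10 can be derived mechanically from the stored names and user segments. The sketch below is illustrative only. It assumes that connecting words such as AND, BY, IN and ON are not treated as keywords (an inference from figure 4.9, which shows no entries beginning with them), and the function names and input layouts are hypothetical.

```python
# Assumption: these connecting words never start an index entry.
STOP_WORDS = {"AND", "BY", "IN", "ON", "OF"}


def keyword_index(names):
    """Build rotated index lines from a mapping of label -> data name.

    Each keyword is brought to the front; the words that preceded it
    follow an asterisk, as in figure 4.9.
    """
    lines = []
    for label, name in names.items():
        words = name.split()
        for i, word in enumerate(words):
            if word in STOP_WORDS:
                continue
            head = " ".join(words[i:])
            tail = " ".join(words[:i])
            rotated = head if i == 0 else f"{head} *{tail}"
            lines.append((rotated, label))
    return sorted(lines)


def requirements_by_user(user_segments):
    """Group (user code, role, document label) triples by user and role."""
    report = {}
    for user, role, document in user_segments:
        report.setdefault(user, {}).setdefault(role, []).append(document)
    for roles in report.values():
        for labels in roles.values():
            labels.sort()
    return report


names = {"CUST002": "CUSTOMER BILLING ADDRESS", "EMP001": "EMPLOYEE NAME"}
for rotated, label in keyword_index(names):
    print(f"{label:8} {rotated}")
```

Sorting on the rotated name gives a strictly alphabetical listing, so the order may differ slightly from the sample in figure 4.9. Restricting the second function's input to one function code would give the subsystem view described above.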
If, as Langefors[5] suggests, lists could be constructed showing the functions of firms of different types and their direct information needs, the Data Requirements Analysis Report could be compared to a standard for data requirements. In this way it could be determined whether certain functions are above or below this standard with respect to the information they receive.

[5] B. Langefors, Theoretical Analysis of Information Systems, Studentlitteratur, Lund, Sweden, 1966, Vol.2, p.225.

Redundancy and Similarity Reports

AUTOSATE[6] and Schruben's[7] model contain reports which identify identical or similar documents and files by the commonality of data elements. AUTOSATE determines duplication by comparing a calculated value for each file with the values for other files and reporting the files which have identical or similar values. Schruben, when analyzing each document in the system, also identifies similar documents by listing the documents which satisfy a "prespecified criterion of similarity". The criterion he used was to list the ten reports which were missing the least number of data elements on the report under study. He suggests that other criteria could include percent of identical data elements or frequency of generation.

Such reports showing similarity would be very useful to the analysts but may be difficult to program and produce. A more realistic approach may be to define a set of data elements and then to scan the documents and files to determine which ones contain a certain percentage of the specified set.

The examples cited do indicate that duplication and similarity can be identified through automated analysis and that such analysis provides a valuable tool in analyzing the data resource.
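The two criteria discussed above can be sketched briefly. This is an interpretation and not Schruben's or AUTOSATE's actual procedure: "missing the least number of data elements" is read here as the number of elements of the document under study that another document lacks, and the function names, input layout and default threshold are assumptions.

```python
def most_similar(target, documents, top=10):
    """Rank the other documents by how few of the target's elements they lack.

    documents maps a document label to the element labels it contains.
    """
    wanted = set(documents[target])
    ranked = []
    for label, elements in documents.items():
        if label == target:
            continue
        missing = len(wanted - set(elements))
        ranked.append((missing, label))
    ranked.sort()
    return [(label, missing) for missing, label in ranked[:top]]


def containing_share(element_set, documents, threshold=0.75):
    """List the documents holding at least `threshold` of a specified element set."""
    wanted = set(element_set)
    if not wanted:
        return []
    hits = []
    for label, elements in documents.items():
        share = len(wanted & set(elements)) / len(wanted)
        if share >= threshold:
            hits.append(label)
    return sorted(hits)
```

The second function corresponds to the more modest approach suggested above: the analyst specifies a set of elements and the system lists the documents or files which contain a given percentage of that set.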
[6] O.T. Gatto, "AUTOSATE", Communications of the ACM, Vol.7, No.7, July 1964, p.430.
[7] L. Schruben, "The Information System Model", Datamation, Vol.15, No.7, July 1969, p.99.

CHAPTER V

THE IMPLEMENTATION AND USE OF DACS

5.1 THE DATA ADMINISTRATOR

The Data Administration and Control System cannot achieve its potential in effectively and efficiently monitoring the data resource unless certain other requirements are met.

    Management must be informed and educated to regard data as a corporate resource and to understand the value and potential of DACS as a tool in data management.

    DACS must be assiduously maintained and updated as to content and integrity. It will quickly lose its value if allowed to become outdated or neglected.

    DACS must be made available to all people concerned and must be used regularly in identifying problem areas and defining new data requirements.

The requirements and responsibilities of the above functions make it necessary to establish a specific organizational position identified as the Data Administrator.

The Data Administrator should be considered as a staff function reporting directly to the person responsible for Information Systems and Data Processing as shown in figure 5.1.

[Figure 5.1 Information Systems Organization. Organization chart; boxes from top to bottom: Director of Information Systems; Information Systems Architect, Data Administrator, Hardware Systems Architect, Software Systems Architect, Systems & Procedures; Systems Maintenance Manager, Systems Development Manager, Data Processing Manager; Maintenance Programmers, Systems Analysts, Application Programmers, Systems Programmers, Computer Operations, Quality Control, Data Entry. Note on the chart: some may be associated with user departments.]

One of the attractive features of a system such as DACS is that once it has been set up, it is relatively easily administered by a single person. Such may not be the case with more complex Data Base Management Systems and their requirements for definition, organization and structure, access and retrieval, security and recovery procedures.

The Data Administrator is concerned with the data that presently exist within the company and with analyzing problems that may be inherent in their present organization and use. He is not the owner of the data but only the custodian. He is responsible for ensuring that the data resource is utilized according to certain standards and objectives which have been developed. He may, in fact, be deeply involved in defining these standards and objectives.

Supported by a system such as DACS, the Data Administrator is the focal point for the administration and control of the data resource and will have unique knowledge of the overall extent, content and location of this resource.

His functions and responsibilities will include:

    To inform and educate management on the extent and value of the data resource.

    To assist in the establishment of data resource objectives and standards.

    To analyze and define existing data elements as to name, source, use, location and responsibility.

    To implement a system (i.e. DACS) by which the above analysis and definition may be formalized.

    To analyze the data for unjustified inconsistency and redundancy and to make recommendations for its rectification.

    To ensure that he is informed of all new and changed data requirements.

    To monitor and concur with new data element definitions and the source, file and output requirements for new applications.

    To have continuous liaison with the Information Systems Architect and his objectives, plans and requirements.

    To keep aware of data users' needs and requirements.
    To make recommendations regarding file organization and retrieval techniques including Generalized Data Base Management Systems.

    To evaluate the data resource from an overall organizational viewpoint, considering effective and efficient utilization.

The Data Administrator will hold a very responsible and demanding position in the realm of Information Systems and Data Management and may perhaps be considered analogous to the comptroller[1] in the financial area. Certainly he has the same mission of developing and maintaining a control system for definition and recording, measuring of effectiveness and efficiency, auditing for accuracy and consistency and disseminating of statistics and information concerning the resource.

[1] M.J. Gordon and G. Shillinglaw, Accounting: A Management Approach, Third Edition, Richard D. Irwin Inc., Homewood, Illinois, 1964, pp.527-529.

The Data Administrator has the responsibility of ensuring that the data resource is maintained and organized to support the information framework that is defined by the higher level management and that it meets their requirements, objectives and standards. It is management who must develop the standards and controls for the data resource, and it is on these standards and the ability of the data to fulfil the information framework that the Data Administrator will be evaluated.

5.2 STANDARDS CONCERNING DATA

Many of the required standards associated with data are inherently implied in the design and structure of the Data Administration and Control System and are discussed in Section 3.3, under Requirements for a Data Control System. It is through the implementation and use of DACS that the standards for identification, definition, source, use and responsibility of data are satisfied.
The data specifications must be complete and accurate and any conflicts resolved before entry is made into the formalized system. The Data Administrator may be required to act as an arbitrator in disputes over the ownership or responsibility of certain data that may be shared across functional boundaries.

Any data duplication or apparent redundancy must be closely examined to determine if it is justified on the basis of cost, timeliness or effectiveness and, if not, the redundancy must be corrected through the consolidation of files and the redesign or elimination of source or report documents. Although some resistance may be encountered from user groups or data processing personnel, an examination of the costs and problems resulting from unmanaged data and its effect on the operational and informational content should justify the need for redesign and a more disciplined approach to data.

It must be stressed to users, analysts and programmers that DACS serves as a central repository for data and that its output should be examined first when new or changing data requirements arise.

DACS provides a control mechanism to preclude the introduction of redundant or inconsistent data elements. It should be coupled with procedures which require that new data elements be checked against DACS to ensure that they do not duplicate existing elements unnecessarily and that they are not inconsistent with these existing elements. It is only through the formalized use of this tool that its value can be realized and its objectives be attained.

5.3 COLLECTION OF THE DATA INFORMATION

A most important and demanding task in the implementation of DACS is the collection of the information about the data itself.
Most of this information probably exists in some fashion in various areas of the company but it will have to be searched for and collected in an organized manner.

Useful approaches to this task are suggested by Kelly[1] and in ARDI[2] in the discussions on starting points for information system analysis and design. One approach is to start with the operating system or production flow. It is based on an in-depth analysis of the manufacturing and production processes involved in the company's operations and the data required to perform and control this function. The other approach is to start with the information flows and systems and analyze the order processing, sales, marketing and financial functions, with emphasis on the documents and data that are used in the operation, planning and decision-making roles.

The choice between the two approaches will depend on the type of organization, that is, whether it has a manufacturing or service orientation. However, as Kelly makes clear, the study should begin at the operational level rather than at the management control or strategic planning levels. This is the level at which a majority of the data elements can be identified most clearly as to definition, source, responsibility and use. If the elements at this level are rigorously defined, it will simplify the task of identifying them as they appear in informational content at higher levels in the organization.

[1] J.F. Kelly, Computerized Management Information Systems, The MacMillan Company, New York, 1970, pp.73-75.
[2] W. Hartman, H. Matthes and A. Proeme, Management Information Systems Handbook, McGraw-Hill Book Co., 1968, Sects. 3-2.2 and 3-2.3.

The point at which to begin the study should be the most logical system input and the one upon which a large part of the company is dependent.
For example, the service-order would be a logical starting point for a utility such as a Telephone Company.

Once a starting point has been determined, the various departments are analyzed as to their data use and requirements. An organization chart may be useful to define the data as to user, source and responsibility and to determine the interfaces and relationships between other departments.

The data collection cycle will proceed through interviews with department managers, analysis of the documents and information used and review of any documentation which exists in the data processing department, including input forms, record specifications and output reports. Care must be taken to identify both manual and computer prepared reports. Discussions with people in the departments and with analysts will help identify the informational requirement and significance of the data elements.

As data items are identified and submitted to DACS, increasing use is made of the output reports to determine if the data elements have already been defined and, if so, the existing definition can be employed by reference.

Once the operational levels have been reviewed, the analysis should proceed up the organizational hierarchy, noting new source data that enters the system and the varying employment of the data elements. Higher levels of management will be characterized as information consumers and there may be special purpose systems developed to suit their needs. These systems are important as they are often a significant cause of data fragmentation.

The data collection process may be a complex, tedious and time-consuming endeavour. However, the standardization and discipline of the Data Specification Methodology and the automated ability of DACS should make this process simpler and more complete.
5.4 THE USERS OF DACS

The Data Administrator has previously been discussed as to his use of and interface with DACS and the responsibilities he has concerning the data resource. However, Management, Information Systems Architects, Analysts and Programmers can all make beneficial use of DACS.

Management

One of the earliest benefits of DACS to the management of a company comes during the data collection phase. As management personnel should be involved in the identification of the data in their department or area, they will become aware of the data resources that they provide, process or consume. As the data specification requires responsibility to be assigned to the data elements, the individual managers assume this responsibility for the consistency and integrity of the data within their control. With this increased awareness and concern for the data resource, they will pay more attention to it and be conscious of problems associated with data neglect.

Once the content and extent of the data is realized, management will be able to provide a significant contribution in the analysis of data problems and their solutions. They will be more aware of data redundancies, inefficiencies and misuse and should draw attention to possible areas of concern.

After DACS has been implemented, several of its uses will furnish tangible benefits. The interface to DACS can be direct consultation with the output reports but probably will be through the Data Administrator.

When new data or information requirements arise, DACS can be consulted to determine the availability of the data elements and this information can be factored into development lead times and cost and manpower requirements to meet the new need.
This practice will eliminate the problem of refusal of information demands when the data may already exist in a usable form.

The ability of a manager to define his requirements explicitly in the form of standardized data elements will narrow the communications gap that sometimes exists between users and data processing personnel. As a result of better communications and data identification, his requirements should be met more quickly and with a better quality product.

As inconsistency and duplication are reduced, management will have more confidence in the information they receive and in the data and systems which underlie it.

Information Systems Architect

A fundamental problem confronting the designers of information systems is the identification of the information requirements of the horizontal and vertical structures in the organization. Their concern is whether different functional portions of the company can share a common data base or whether the data base must be separated into vertical and horizontal segments. A major challenge lies in the integration of the data resource so that it can be utilized by all levels and components.

The Information Systems Architect is also faced with determining management's "information threshold": the level of detail which a given person may require or expect.

Although the Information Systems Architect may be more concerned with the eventual design of the information system than with its present structure, he can extract a large amount of useful information from DACS.

DACS will provide him with the identity of the data elements which are used by the various segments of the organization and the context and relationships among them. It can also provide clarification of the informational interfaces between these segments.
During analysis of the reports, weaknesses in the existing systems may become apparent and may provide insights into more effective design approaches.

Whether he takes a "top-down" or "bottom-up" approach to information systems design, at some phase he must document, analyze and structure the data elements and define models or procedures with which to manipulate the elements into information. With a tool such as DACS, this task may be greatly simplified.

The existing systems have grown and developed over many years and analysis of the present status can help in identifying and determining future requirements. DACS relieves the architect of the necessity of determining "here is what exists" and allows him to concentrate on "what is really needed of an information system".

Analysts

Analysts generally concentrate on a more limited and detailed environment than Information Systems Architects. The architect is concerned with the overall view and philosophy of the information system while the analyst, given the general requirements and design, directs his efforts to more specific subsystems or functional applications. It is the analyst who details input formats, file requirements, output specifications and user interfaces.

For analysts working on new requirements or extensions to existing applications, DACS provides a central source of data information. Once the data output and requirements have been defined, the analyst should consult DACS to determine whether or not existing documents can satisfy his needs and whether the data elements already exist somewhere in the system.

If the data requirements do exist in some form, the analyst should exploit them in a manner consistent with the established data objectives. He must preclude the introduction of redundant or inconsistent data elements.
If new elements are to be introduced into the system, he should first consult with the Data Administrator and determine if any users will be affected.

The analyst, by drawing on existing data resources while meeting the objectives of the Information Systems Architects, plays an important role in maintaining the integrity, consistency and control of the data.

Programmers

The application programmer's function consists of translating the analyst's design specifications into a computer-intelligible language. He can utilize DACS to gain more knowledge of the data he is to employ and to familiarize himself with the way this data is stored and accessed in the computer files. By utilizing DACS labels and names in the source programs, standardization and communication can be enhanced.

An interface to DACS could feasibly be incorporated to place record definitions from the DACS file into a source library which could then be "called" or "included" by COBOL or PL/I source programs. This ensures standardization of data usage and minimizes the programmer time required in making up these statements whenever a data file is used in an application. A further extension of this interface would enhance DACS by incorporating the Data Description Language of CODASYL and satisfying the Guide-Share requirements.

CHAPTER VI

SUMMARY

In the quest for better information systems, numerous problems are encountered by management and systems personnel. Some of the earliest questions to arise are expressed as follows:

    Where is the logical starting point if relatively little is known about the data that are presently in use in the various systems throughout the company?

    How can the data, which should constitute the nucleus of the information system, be determined?
    There is a tremendous investment in the present systems. Can these systems be improved without a complete redesign and reprogramming effort?

In the preceding chapters of this thesis, it has been emphasized that a fundamental requirement in developing effective information systems is to introduce discipline and control to the data resource. Once this is accomplished, the task of melding the information requirements and the data elements into a management information system will become a less formidable endeavour.

A review of the development of information systems and the management of data has indicated that the data resource suffers from management neglect, has become severely fragmented and is often not recognized as a valuable corporate resource. In order to alleviate these problems, a Data Administration and Control System has been proposed to introduce standardization and discipline to the existing data elements.

DACS, by itself, does not promise to provide more effective or improved information. It is a tool by which management and systems personnel can better manage, control and utilize the data resource. Through this tool the data elements are defined and described and data relationships are identified. This definition and identification is a primary requirement in information systems development. In conjunction with information requirements determination, data base design, file creation and maintenance, model development and information retrieval techniques, it constitutes one component of the complete development effort.

DACS can be considered to serve two purposes:

    It will provide a method by which the data resource can be better managed and controlled on a continuing basis.

    It can provide a starting point in the development of a comprehensive management information system.
The implementation of DACS will highlight the extent of the data resource and make management more aware of the importance of consistent, reliable data to corporate well-being. It is only through the realization of the value of the data resource and its effective management that this resource can be exploited to achieve its potential of meeting the informational requirements of an organization.

In the realm of Management Information System design, DACS provides a method by which the data elements can be identified in a disciplined manner. Through the Data Dictionary/Directory, redundancy and duplication are identified and rectified. Standardized data definitions are developed to be employed throughout the organization with DACS serving as a central repository for data information.

With the Data Dictionary/Directory and the Data Requirements Analysis reports, designers can determine how each data element is employed and what the users' data requirements are. Through this facility, they will have more knowledge of and insight into the data when new data base structures are designed.

As computer-aided system design becomes more developed, DACS could be extended to construct model data banks to meet the MIS requirements. This model data bank could then serve as input to a simulation model constructed to evaluate the performance of various generalized data management systems.

Another necessary task, if automated design is to become a reality, is the development of techniques to define the models, programs and process relationships between various data elements and the resulting information. Incorporating this technique into DACS will further contribute to information system documentation and design.
Computer-aided system design is certainly a development of the near future and DACS would require many extensions and modifications to contribute to this field. However, as an interim step, DACS, as described in this thesis, could be programmed with less than six man-months of effort and would provide a much needed and desired tool.

BIBLIOGRAPHY

Ackoff, Russell L., "Management Misinformation Systems", Management Science, Vol.14, No.4, p.B147.

Anthony, Robert N., Planning and Control Systems, A Framework for Analysis, Division of Research, Harvard School of Business Administration, Harvard University, 1965.

Blumenthal, Sherman C., Management Information Systems; A Framework for Planning and Development, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1969.

Bontempo, Charles J., "Data Resource Management", Data Management, Vol.11, No.2, February 1973.

Bubenko, J., Langefors, B., Solveberg, A. (eds.), Computer-Aided Information Systems Analysis and Design, Studentlitteratur, Lund, Sweden, 1971.

CODASYL Systems Committee, "Feature Analysis of Generalized Data Base Management Systems", May 1971, Available from Association for Computing Machinery, New York, N.Y.

Dearden, John, "MIS is a Mirage", Harvard Business Review, January-February, 1972, p.90.

Dodd, George G., "Elements of Data Management Systems", Computing Surveys, Vol.1, No.2, June 1969, p.118.

Gatto, O.T., "AUTOSATE", Communications of the ACM, Vol.7, No.7, July 1964, p.425.

Gordon, M.J. and Shillinglaw, G., Accounting: A Management Approach, Richard D. Irwin, Inc., Homewood, Illinois, Third Edition, 1964.

Hanold, Terrance, "A President's View of MIS", Datamation, Vol.14, No.11, November 1968.

Hanold, Terrance, "An Executive View of MIS", Datamation, Vol.18, No.11, November 1972.

Hartman, W., Matthes, H. and Proeme, A., Management Information Systems Handbook, McGraw-Hill Book Co., 1968.
Head, Robert V., "Automated System Analysis", Datamation, August 15, 1971.

Head, Robert V., "Management Information Systems: A Critical Appraisal", Datamation, May 1967, p.22.

IBM Corporation, "Information Management System V2", General Information Manual, GH20-0765, 1972.

Joint Guide-Share Data Base Requirements Group, "Data Base Management System Requirements", November 11, 1970, Available from SHARE Inc., New York, N.Y.

Kelly, Joseph F., Computerized Management Information Systems, The MacMillan Company, New York, 1970.

Kriebel, Charles H., "The Future MIS", Business Automation, June 1972, p.18.

Langefors, Borge, Theoretical Analysis of Information Systems, Studentlitteratur, Lund, Sweden, 1966.

Morton, Michael S. Scott, Management Decision Systems, Division of Research, Graduate School of Business Administration, Harvard University, 1971.

Nolan, Richard L., "Systems Analysis of Computer Based Information Systems Design", Data Base, Vol.3, No.4, Winter 1971, p.1.

Paul, Norman L., "MIS .... Are You Ready?", Data Management, October 1972, p.26.

Price, Gerald F., "The Ten Commandments of Data Base", Data Management, May 1972, p.14.

Sage, David M., "Information Systems: A Brief Look into History", Datamation, Vol.14, No.11, November 1968.

Schruben, Lee, "The Information System Model", Datamation, Vol.15, No.7, July 1969, p.93.

Teichroew, Daniel, and Sayani, Hasan, "Automation of System Building", Datamation, August 15, 1971, p.25.

Will, Hartmut J., "A System of MIS Concepts", condensed from H.J. Will, "Towards a System of MIS Concepts", Working Paper No.194, Faculty of Commerce and Business Administration, University of British Columbia, June 1973.

Will, Hartmut J., "Management Information Systems as a Scientific Endeavour; The State of the Art", Working Paper No.84A, University of British Columbia, Vancouver, B.C.

Will, Hartmut J., "MIS: Mirage or Mirror Image?", Working Paper No.146, University of British Columbia, Vancouver, B.C.

Zani, William M., "Blueprint for MIS", Harvard Business Review, Vol.48, No.6, November-December, 1970, p.95.

APPENDIX I

EXISTING DATA ANALYSIS TECHNIQUES

There have been several approaches, cited in the literature, which have attempted to analyze the data and information resource, establish relationships and bring more control and discipline to the area of information system analysis and design. Most of these techniques have been concerned with automating parts of the design process in the development of data processing systems which replace existing manual ones. A common characteristic is that at some phase each analyzes the present system as it currently exists. Some of the concepts employed in these techniques have been adopted in the formulation of the Data Administration and Control System and a brief comment about them is warranted here.

AUTOSATE

One of the first attempts at automated systems design was AUTOSATE, developed by O.T. Gatto of The Rand Corporation for the United States Air Force. The technique "is geared to determining workload, relationships and storage characteristics of documents in the information network automatically"[1]. AUTOSATE examines "stations" as nodes in the information network and identifies data activity in and out, and information stored in these stations. Stations are defined as a person, a group of people, a work centre or a specific thing. Through interviews with station personnel, information used, processed or stored at each station is identified on document specification sheets.

[1] O.T. Gatto, "AUTOSATE", Communications of the ACM, Vol.7, No.7, July 1964.
The specifications are translated into machine readable form and a series of analysis reports are produced. These indicate by station, document activity as to input, processing, output and storage. Other reports trace the flow of documents through the various stations and the activities which are performed with them. A series of "redundancy" reports is also produced to show identical and similar documents.

AUTOSATE's main emphasis is on the document flow through a manual information system and provides the analyst with more insight and knowledge on which to base his design of a new system. Although its major contribution was intended to assist in the design of new computer systems, several of the concepts are applicable and have been used in the identification and use of data elements in DACS. The use of data specification sheets, the removal of artificial boundaries of the data by function or application and the insights provided by the redundancy reports, have all contributed to the design of DACS.

Schruben's "Information System Model"[2] is similar to AUTOSATE in that it uses existing documents to trace the information flow through a system. "Document analysis", "where used", and "data-flow" reports assist in the analysis and design of new information systems.

[2] L. Schruben, "The Information System Model", Datamation, Vol.15, No.7, July 1969.

ARDI [3]

ARDI is a technique developed to provide a detailed guide to the Analysis, Requirements determination, Design and development, and Implementation and evaluation phases of the system design effort. In one phase, identifying the information flow, description sheets are used to document forms, files and fields existing in the system.
While automated analysis is not used, the techniques used to identify the data elements and illustrate relationships have been useful in determining DACS requirements.

TAG [4]

The Time Automated Grid technique developed by IBM is intended to aid the systems analyst in the design of information systems. Data to be analyzed is recorded on "input/output analysis forms" that describe the characteristics of the inputs, outputs or files. The use of TAG begins with a description of required outputs and then works backwards to determine what inputs are necessary at what periods of time. When inputs and outputs have been defined, the next iteration of the program produces file and system flow descriptions.

[3] W. Hartman, W.H. Matthes, A. Proeme, Management Information Systems Handbook, McGraw-Hill Book Co., New York, 1969.

[4] J. Kelly, Computerized Management Information Systems, The MacMillan Co., New York, 1970, Chapter 8.

Although time is obviously one of the factors in file design, a study of the example provided in the reference indicates that a proliferation of files results from TAG analysis. These files are constructed to meet very structured requirements and do not appear to consider redundancy or commonality of use of data items. The technique seems to create very functional and application oriented files, contributes to the problems of common data base definition and defeats the desire to make the data resource flexible in order to meet various, and perhaps unforeseen, requirements.

Although the techniques described above do not directly focus on the problems of data resource management, they all have the common characteristic of attempting to identify the data resource in a disciplined, standardized manner so that it may more readily be analyzed.
However, none of these systems answers all the questions or satisfies the requirements that have been identified to achieve the desired level of control, but the contribution of each to the development of DACS is acknowledged.
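
As a rough illustration of the "redundancy" and "where used" reports mentioned in this appendix, the following Python sketch compares document specification sheets by the data elements they carry. It is not taken from AUTOSATE, the Information System Model or DACS: the sheet contents, the function names, the use of a Jaccard overlap ratio and the 0.5 threshold are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical specification sheets: document name -> set of data elements.
# Names and fields are illustrative only; they do not come from AUTOSATE or DACS.
SHEETS = {
    "purchase_order": {"vendor_no", "part_no", "quantity", "unit_price"},
    "receiving_slip": {"vendor_no", "part_no", "quantity"},
    "vendor_invoice": {"vendor_no", "part_no", "quantity", "unit_price"},
    "time_card": {"employee_no", "hours"},
}


def where_used(sheets):
    """Map each data element to the sorted list of documents that carry it."""
    index = {}
    for doc, elements in sheets.items():
        for element in elements:
            index.setdefault(element, []).append(doc)
    return {element: sorted(docs) for element, docs in index.items()}


def redundancy_report(sheets, threshold=0.5):
    """Classify document pairs as 'identical' or 'similar' by element overlap.

    Overlap is the Jaccard ratio |A & B| / |A | B|. Both the measure and the
    threshold are assumptions of this sketch, not part of any cited technique.
    """
    report = []
    for a, b in combinations(sorted(sheets), 2):
        union = sheets[a] | sheets[b]
        if not union:
            continue  # two empty sheets carry no information to compare
        overlap = len(sheets[a] & sheets[b]) / len(union)
        if overlap == 1.0:
            report.append((a, b, "identical"))
        elif overlap >= threshold:
            report.append((a, b, "similar"))
    return report


if __name__ == "__main__":
    for a, b, kind in redundancy_report(SHEETS):
        print(f"{a} / {b}: {kind}")
    for element, docs in sorted(where_used(SHEETS).items()):
        print(f"{element}: {', '.join(docs)}")
```

Comparing documents by their element sets, rather than by the function or application that owns them, reflects the removal of artificial data boundaries noted above; a real analysis would also have to reconcile differing names for the same element before any comparison is meaningful.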
