UBC Theses and Dissertations

A personal library system Kehler, Eric Gregory 1982

A PERSONAL LIBRARY SYSTEM

by

ERIC GREGORY KEHLER
B.Sc., The University of British Columbia, 1980

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
Department of Computer Science

We accept this Thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July 1982
© Eric Gregory Kehler, 1982

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Abstract

This thesis reports a novel implementation of a library catalogue. The catalogue is designed as a three level hierarchy designed to capture the published structure of knowledge. A fourth level exists which links the bibliographic data of the top three levels to the actual physical copies of the described item.
The catalogue is implemented on a micro-computer to provide a convenient method that small libraries could utilize as an on-line catalogue system. The limitations of the micro coupled with the rigorous demands of a library made the choice of storage structures a major issue of the implementation. The use of Prefix B-trees and Bitmaps in the catalogue index is discussed in detail.

KEYWORDS: Library Catalogue, Prefix B-Trees, Micro-computer, On-line Catalogues, Efficient Storage Structures, Bit Maps, Information Retrieval

Table of Contents

Abstract
Table of Tables
Table of Figures
Acknowledgement
1. Introduction
  1.1 Background
  1.2 General Research Area
    1.2.1 Micro Power
    1.2.2 Data Structures
    1.2.3 Automated Catalogues
  1.3 Summary of Proposed Work
    1.3.1 Goals of the Library System
2. Field Framework
  2.1 Library Functions
  2.2 Library Automation
  2.3 Lack of Integration
  2.4 A Co-ordinated System
  2.5 Small Library Requirements
    2.5.1 Efficient Storage Structures
  2.6 Current Systems
    2.6.1 Dobis
    2.6.2 OCLC
    2.6.3 Dialog
3. Proposed System
  3.1 Purpose of System
  3.2 Integrated System
  3.3 The Three Level Hierarchy
  3.4 Why it will work
4. System Implementation
  4.1 System Description
    4.1.1 Hardware
    4.1.2 Operating System
  4.2 Data Structures
    4.2.1 Overview
    4.2.2 Inter-file Relationships
    4.2.3 File Usage
    4.2.4 Internal File Structure
    4.2.5 Physical and Logical Relationship
  4.3 Programs
  4.4 Data Robustness
5. Evaluation
  5.1 The Hierarchical Structure
  5.2 Practicality
  5.3 A Personal System
  5.4 File Structure
6. Conclusions
Bibliography
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E

Table of Tables

5:1 Test Database Current Size
5:2 Retrieval Times
5:3 Storage Space Utilization Factors

Table of Figures

2:1 Flow of Documents Through a Library
3:1 Three Level Hierarchy of Published Knowledge
3:2 Publication Types
3:3 Logical Structure of Catalogue
4:1 Physical File Structures
4:2 Doubly Linked Prefix B-tree
4:3 Bit Map Structure
4:4 Logical and Physical Structure of Catalogue

Acknowledgement

I dedicate this Thesis to my dear wife, Carolyn, for her patience raising our young son while I was busy at the university and also for the assistance she gave in the proof-reading and editing of this work. I also would like to thank my father who set an example of hard work and excellence for which I will be forever grateful. Thanks go also to my two readers Drs. Paul Gilmore and Bob Goldstein for their help in the production of this Thesis.
Most of all I wish to thank my Lord who gave me the abilities to be able to reach this level of achievement. To Him be the Honour.

1. Introduction

1.1 Background

This Thesis revolves around an institution which is far older than any of our modern Sciences, namely that of the storage and dissemination of Knowledge. This Knowledge has been handed down from generation to generation in many forms. The earliest was by word of mouth; then came the written form with an increase in both accuracy and longevity. In "recent" years multi-media formats have become common-place.

Over these millennia not only has the form of knowledge changed but so has the amount. The ever increasing amounts of knowledge have made control an impossible task. This problem is not unique to today; it has been the major concern over all years of recorded history. It just appears as more of a problem today due to the huge quantities of, and the heavy demands for, knowledge.

To improve the sharing of knowledge, some form of centralized control over it was required, thence the foundation of the "Library" and its various functions. The earliest libraries housed cuneiform tablets and papyrus scrolls bound with animal hides. Today one may find almost anything in a library from books in paper or plastic on through to computerised data stored as magnetic flux on a plastic tape.
Once the knowledge was centralized it was easier to locate; however, as the library grew it became too difficult to search every item for a specific request and thus the major problem was still with us - how to find a specific piece of knowledge in a reasonable amount of time without a great deal of wasted effort. The knowledge had to be classified in such a way that all relevant facts on a subject were easily obtainable. This was hampered by the fact that one item may be relevant to several subject areas. This problem was dealt with by providing a catalogue which listed each item in the library and provided a cross reference by subject to the various items.

This cataloguing was done by hand and was therefore limited to what could be expected from a manual system. Items were listed under only a few of their relevant areas as it would be impossible to manage the references if all subjects a book dealt with were maintained. It became a difficult task to decide what the main area a book referred to was and what subject areas to catalogue it under. This function is the most important and difficult task of a library. However, each library was left on its own and a book would be catalogued differently by two libraries. This added to the confusion in finding a given item of knowledge.

In the last century efforts have been made to standardize the cataloguing of knowledge. It was hoped that this would reduce the difficulty both in finding an item and in cataloguing it.
The major standard in North America is known as the AACR2 cataloguing rules [ALA 1978], and for computer storage of bibliographic data there are the MARC II formats [MARC 1972]. Unfortunately these have been only marginally successful. The AACR2 is not followed fully even by the Library of Congress [Dwyer 1981 and Fasana 1980]. The MARC II format has undergone a major revision since its initial release and each country using it forms its own standard MARC II format [Rush 1980]. Thus with standards that are only poorly followed the problems are not effectively dealt with.

These problems are aggravated by the antiquity of the field - which can be very slow to change - and by the complexity of information in its own right. Knowledge is abstract; each fact is unique unto itself and thus is very hard to classify in standard formats. Coupled with this is the wide relevance of knowledge; a seemingly simple fact can have great bearing in areas distinct from the basic subject areas. Determining all these subject areas is an impossible task given a finite amount of time.

As a result of these various factors we have poorly classified knowledge. This poor classification leads to low recall of relevant facts to a given request; this in turn hampers further research in any given field. Taking the opposite extreme and cataloguing a fact under as many subject areas as possible leads to poor precision in the retrieval of items by a specific request.
This poor precision is as inconvenient as poor recall since now great amounts of irrelevant facts must be weeded through to find those which are relevant.

Today research is being done in the area of improving both recall and precision simultaneously. With the advent of the computer age it has become feasible for a library to classify items under many headings - rather than just two or three - while providing some analysis of the query by the system to improve precision. This trend can be seen in the proliferation of computer-aided article search services such as MEDLARS [Lancaster 1968] for medical research and ERIC [Yarborough 1975] for education research.

This thesis describes the implementation of a system designed to provide better access to the knowledge stored in a library and to overcome the various inherent problems.

1.2 General Research Area

This research touches on three widely divergent areas. Two of the areas, micro-computers and storage structures, are dealt with only in passing and little new work is produced in these areas. The third area, that of library catalogue structures, is the main focus of the research and could be impacted significantly by this research.

1.2.1 Micro Power

One concern in developing this system was to demonstrate the potential capabilities of a small micro computer. Micros are still thought of as small, incapable toys by the majority of the computing industry. While it is true that they are far less powerful than the largest main frames, their computing potential should not be overlooked.
In many cases the micro is the only alternative. Since micros are inexpensive and provide a very good computing per dollar ratio, they can be afforded by even a personal or small, underfinanced library. A library system developed on a main frame would have limited value for the personal or small library. The two reasons for this are the expense and lack of availability of the main frame to the average person. It was, therefore, desirable to determine whether a micro could support such a system. Should this be successful, it would be especially beneficial to professionals and academics whose libraries often consist mainly of journals.

1.2.2 Data Structures

The second research area is in the selection of the optimal storage structures. Given the constraints imposed on the structures by a micro-computer - namely limited I/O throughput, low storage, and slow processing - coupled with the demanding requirements of a library system with variable length data, can efficient structures be developed? The two constraints work against each other and make the selection of an efficient storage structure essential to the success of the implementation. Finding and implementing a structure which adequately meets both constraints could prove to be a challenging task.

1.2.3 Automated Catalogues

The third and major research area is in the automation of library catalogue systems. It is in this area that the major research effort has been made.
Most library catalogues are still using paper cards for the storage of cross references. The computerised systems simply copy this data to disk files and occasionally print the sorted contents on fiche and distribute multiple copies. The latest systems provide on-line retrieval of the card catalogue, but therein lies the problem: it is still the old card catalogue in nature.

The standard card catalogue has a single level structure despite the hierarchical way in which knowledge is published. The author's constant exposure to the basic structure of published material through experience in the library field led to the development of this approach - the hierarchical structure. Published information can be easily mapped on to a hierarchy and it was desirable to capture this fact in an operational library system.

1.3 Summary of Proposed Work

The aim of this research was to produce a library system which would enable a person to locate and control the literature in his personal library. The system had to provide access to individual articles within journals as well as indicate where the referenced item could be located in the library. Multiple copies of a given article had to be controlled and managed as a single logical entity for retrieval but as individual items when loaning them. The system had to be readily available and inexpensive to operate. This restricted the development to a system on a micro-computer.
The following points served as the detailed goals of the development.

1.3.1 Goals of the Library System

1.3.1.1 Current Standards

It was felt that the prototype system should not conform to the established library data storage standards. This was done because the MARC II standard [MARC 1972] is very detailed and this results in large amounts of storage overhead. It was believed that this amount of detail would soon overwhelm the storage capabilities of a micro system and provide little or no additional valuable information to the user.

It should be remembered that as this is a personal system the user will be the one keying most of the data. It would be undesirable to key more than what is required for one's own personal use. There comes a point where a system is too painful to use and is therefore abandoned. Personally, I dread the thought of keying hundreds of abstracts, yet I feel that they are necessary for my requirements. Stibic in his book [Stibic 1980], on page 36, gives a warning in this regard.

  "A law of interdependence of the costs of storage and retrieval must be kept in mind: What is saved at the moment of storage has to be paid for at the moment of retrieval, and vice versa" [Stibic 1980]

He also points out that the saving of time without loss of clarity and usability of the system is the supreme rule in personal retrieval systems. He lists the following points as essentials in a personal cross reference system.
- Identifying the physical form of the document
- Maintaining a unique ID for each document
- Indicating the type of document
- Identifying the authors and title
- Identifying the source of the document (publisher/journal)
- Indicating the date of the document
- Describing the content (keywords)
- Providing abstracts on documents

In light of these points the designed system resembles the MARC II structure, i.e. a tagged file, but with much simplification.

1.3.1.2 Search Fields

The second goal was to provide for a minimum of four major methods of retrieving documents in the system, via:

- Author
- Title of work
- Keywords and phrases with boolean selection. This provides fine selection of the stored information.
- Subject headings. This provides general access to a given field of knowledge.

1.3.1.3 Document Description

The retrieval system must provide adequate methods for describing the documents. This should include an on-line recall of an abstract, and review of the item. It should also be possible to record personal comments about the article.

1.3.1.4 Access Methods

All access to the system must be via interactive sessions and all accessible data and descriptions must be on-line. The whole purpose of the system is to provide a quick method of retrieving information in a small library. Batch methods would greatly reduce the value of the system as it would soon become faster to browse the actual documents than to use the computer.
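
As an illustration only, the essential fields of 1.3.1.1 and the four access methods of 1.3.1.2 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the field names, the in-memory dictionaries and the functions build_index and keyword_query are hypothetical, the sample records are invented, and nothing here reflects the tags or file structures actually used by the system.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One catalogue entry; the field names are illustrative assumptions."""
    doc_id: int
    physical_form: str
    doc_type: str
    authors: list
    title: str
    source: str
    date: str
    keywords: set = field(default_factory=set)
    subjects: set = field(default_factory=set)
    abstract: str = ""


def build_index(records, key):
    """Map each lower-cased term returned by key(record) to a set of doc_ids."""
    index = {}
    for rec in records:
        for term in key(rec):
            index.setdefault(term.lower(), set()).add(rec.doc_id)
    return index


def keyword_query(index, all_of=(), any_of=(), none_of=()):
    """Boolean selection: AND over all_of, OR over any_of, NOT over none_of."""
    def ids(term):
        return set(index.get(term.lower(), set()))

    result = None
    for term in all_of:
        result = ids(term) if result is None else result & ids(term)
    if any_of:
        either = set().union(*(ids(term) for term in any_of))
        result = either if result is None else result & either
    if result is None:
        result = set()
    for term in none_of:
        result = result - ids(term)
    return result


# Invented sample data, for demonstration only.
records = [
    Record(1, "journal", "article", ["Smith"], "Sample article A",
           "Sample Journal", "1979", {"b-tree", "index"}, {"data structures"}),
    Record(2, "journal", "article", ["Jones"], "Sample article B",
           "Sample Journal", "1981", {"catalogue", "index"}, {"library automation"}),
    Record(3, "book", "monograph", ["Smith", "Lee"], "Sample monograph",
           "Sample Press", "1980", {"catalogue", "bitmap"},
           {"library automation", "data structures"}),
]

by_author = build_index(records, lambda r: r.authors)
by_title = build_index(records, lambda r: [r.title])
by_keyword = build_index(records, lambda r: r.keywords)
by_subject = build_index(records, lambda r: r.subjects)

print(sorted(keyword_query(by_keyword, all_of=["index"], none_of=["b-tree"])))  # [2]
print(sorted(by_author["smith"]))                                               # [1, 3]
```

Each access method is simply a separate index over the same records, and boolean selection reduces to set intersection, union and difference on document IDs.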

1.3.1.5 External Sub-systems

As all libraries require control over the use and location of their documents, the retrieval system must be readily expandable to provide circulation control. Even in personal libraries items are loaned to various people and quite often forgotten or lost. Providing circulation control would reduce losses and immediately indicate whether the desired item is currently in the library.

Control over acquisitions should also be considered in the design phase. This would be a useful feature for the professional who receives several journals. The subsystem would track the receipt of items and warn when some are overdue. In my own experience it would quite often be several months before I noticed the tardiness of a journal. More months would pass before a response from the publisher was received indicating that a copy was on the way. An acquisitions system would detect the late arrival sooner, thus shortening the delay time. This would result in more current information - a must in any growing field.

1.3.1.6 Exhaustive Search

The on-line data must be arranged so that exhaustive search analysis could be done fairly easily. Exhaustive searching would permit intelligent search methods. This would enable the system to select only those documents meeting very complex and specific requirements which would be infeasible to do by hand.
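
The idea of exhaustive search in 1.3.1.6 can be illustrated by a sequential scan that applies an arbitrary test to every record. The sketch below is an assumption-laden illustration, not the search method of the system: the record layout, the function exhaustive_search and the sample data are all hypothetical.

```python
def exhaustive_search(records, predicate):
    """Scan every record once and keep those the predicate accepts."""
    return [rec for rec in records if predicate(rec)]


# Invented sample data, for demonstration only.
records = [
    {"id": 1, "year": 1979, "authors": ["Smith"],
     "abstract": "Index structures for small machines."},
    {"id": 2, "year": 1981, "authors": ["Jones", "Lee"],
     "abstract": "An on-line catalogue for a small library."},
    {"id": 3, "year": 1975, "authors": ["Lee"],
     "abstract": "Card catalogue maintenance."},
]


def wanted(rec):
    """A compound requirement that no single index answers directly."""
    return (rec["year"] >= 1978
            and "small" in rec["abstract"].lower()
            and len(rec["authors"]) == 1)


hits = exhaustive_search(records, wanted)
print([rec["id"] for rec in hits])  # [1]
```

Because the test is an ordinary function, requirements of any complexity can be expressed, at the cost of reading every record.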

2. Field Framework

2.1 Library Functions

The standard academic library can be split into four basic functional areas. Each of these areas performs a specific role in the processing of information and its eventual dissemination. The data produced by one area becomes the source for the next and thus can be seen as an assembly line for providing information.

The first station that information must pass in entering a library is the purchasing or acquisitions department. In this department books are selected for inclusion in the library. In ordering a book much of the associated bibliographic information is required for the ordering process, thus it can be keyed into the system at this point. This information then forms the nucleus for the next phase.

After the book has been received by the library it is sent to the cataloguing department. Here the item is given a shelf-id and decisions are made as to what subject areas the book is to be cross referenced under. This phase of book processing is aided by on-line catalogues which share cataloguing effort across several libraries. This function is terminated by the shelving of the book onto the various book stacks.

The third step in retrieving information is the actual location of the desired work by the library patrons.
This phase is by far the most essential and critical to the success of the library; however, it can only be successful if the previous processes are performed correctly and accurately.

Finally the last area of a library is the circulation system. This area of the library tracks the location and loaning of books. Without a good circulation department books will be lost or misplaced, resulting in the degradation of the library's effectiveness in providing information.

It is the third area which was concentrated on in this thesis. It is the least automated of the areas yet is the most important to the user of the library. As a result of the low level of automation this area has a large number of interesting problems to solve.

[Figure 2:1: Flow of documents through a library. Labels in the figure: Acquisitions, Classification, Retrieval, Circulation, On-line Search.]

2.2 Library Automation

Today's libraries have been using computers for several decades yet the applications have been mainly limited to control over circulation and acquisitions. There has been some automation done in the retrieval area but it is only very recently that on-line searchable data bases have existed.

The automation in the retrieval area can be divided into two main streams. The first is on-line searching for articles in serials; the second stream is the shared catalogue.
The former system is designed for the person seeking to find specific information. The system stores analytics for each article documented in its files; these are retrieved by the searcher who then determines the value of obtaining that document. This type of system is characterised by the MEDLARS [Lancaster 1968] and Dialog [Miastkowski 1981] retrieval systems.

The second stream, those systems used by librarians to aid in the cataloguing of books, is designed to share the cost of cataloguing across several libraries thereby reducing the individual cost. The second type of system also provides a method by which the combined catalogues of several libraries can be conveniently queried - a union catalogue. This makes interlibrary loan facilities much more convenient. This type of system is characterised by the OCLC [Jacobs 1979] and DOBIS [Newman 1979] systems.

The design purpose of the analytic searching systems was to provide a multi-access index into a collection of articles. Each article forms the main bibliographic entity - its physical published structure is not considered by the system; the database is searched to locate articles which are of interest to the searcher. The system then provides the title and author of the work and the name of the journal in which it was published as part of the analytical data stored on the article.
Once an article is chosen, the next step of the searcher is to consult the local library holdings to determine if the desired journal is received and if so where it might be found in the local library.

Shared catalogue systems such as OCLC and DOBIS are designed to be used by the professional cataloguer in the cataloguing of books. They are used only indirectly by the general public through partial listings of the contents on fiche or possibly on-line. These systems generally provide complete MARC formats of each catalogued item. The average patron does not need these. In this type of system the main bibliographic entity is the physical object, usually a book. The system catalogues books in the same way as did the old card catalogues - with little to no indication of the contents, or substructure, of the individual books or journals.

In use, a cataloguer would consult the system to determine if an item was previously catalogued. If it was, the associated data would be used to catalogue the book and the data would be copied to the local library holdings. If the item was not located it would be catalogued and the data added to the shared system. Thus the primary purpose of the system is to share cataloguing costs, and to provide a central catalogue facility across several member libraries to aid interlibrary loans.

2.3 Lack of Integration

These two services comprise the bulk of computerised library retrieval access.
The first deals only with abstracted articles with no regard to their published structure. The latter deals with the physical item with little regard to its contents. Neither system is complete by itself.

In using the analytic systems, when an article is found the displayed information must be copied and used to access the local library holdings file to determine if the article is contained in the library. This manual step should not be necessary. The first place to look for items should be the local catalogue, which also indicates where the item is stored. This would enhance the usefulness of the catalogue and article description for two major reasons: 1) since the item is listed you know that the article is conveniently obtainable, and 2) it is easier and faster to obtain the selected item. Should additional works in a field still be required then the global databases could be consulted.

The on-line shared catalogue systems are too shallow in depth; they only look at the physical object as a whole. These systems need to separate the physical objects into their intellectual subcomponents and provide access to both the physical object and the intellectual parts. This would greatly improve the value of the catalogue to the searcher as many additional access points into the library would be available; yet the system would still have ties to the physical containers, which is the feature lacking in the analytic systems.
Neither style of system captures the structure of published information as we currently obtain it. Both systems take a flat view of the world of literature publication when there are in fact several levels of published information, with a side tier representing the physical object. Ignoring this hierarchy results in the loss of a large amount of information which is useful in the obtaining of knowledge.

2.4 A Co-ordinated System

For good access into a library the on-line catalogue must be a combination of both the analytic and shared catalogue types of systems. The catalogue must provide access to each individual intellectual unit - article - contained in the library. The integrated system must then provide the searcher with the location of the desired item on the various stacks in the library. To handle this combined view, the notion of how knowledge is structured must be built into the system. The absence of this notion from today's catalogues makes them unadaptable for handling the complete integrated system. Current systems could provide some form of combined system, but problems would arise due to the functional dependencies inherent in the published structure of knowledge, which these systems could not adequately represent.

The need for an integrated system is even more acute in the smaller, personal library. Here the user is trying to find a work he knows he has in his library.
At present, the analytic search systems would come the closest to meeting his needs. However, this type of system fails in that it would only tell him the name of the journal which contains his requested document; ancillary information is used to actually locate the document [e.g. the fact that all journals are stored alphabetically in issue sequence starting on shelf 5]. This type of system would also not provide circulation control over the loaning of the various documents. The shared catalogue systems would not be able to locate the desired work as they do not provide enough cross-reference detail. The only time one would find the desired work is if the work is also a physical item. Thus this type of system will produce inconsistent results in the retrieval of information, resulting in low recall of information.

An integrated system would catalogue each and every intellectual work in the library, for consistency, and would also indicate the specific physical object the desired work is found in along with its storage location. This would provide a consistent level of retrieval access with greater recall and would also be more convenient in the locating of stored items in the library.

Major libraries would benefit from such a catalogue as a more complete access method would be available to their patrons. Users would not have to rely on outside services to locate items within the library.
This type of combined catalogue would be good for a union catalogue and the current shared catalogue systems, as a complete reference to all contained documents would be available, yet only one copy of the associated bibliographic data would be present; thus the overhead of the additional data storage would be shared. The integrated catalogue would result in a single step process for finding any document stored in the system.

2.5 Small Library Requirements

Small and personal libraries have special needs which are not found in a large central library. A small library exists to serve a different purpose than its big brother; it is this difference which creates the different requirements between the two libraries.

The smallness of the small library dictates smallness in the way it approaches all aspects of library science - especially where funding is concerned. A large library has multiple purposes for existence and therefore multiple funding sources. A small library's purpose is to provide access to the immediate population. It concentrates on self and is not as concerned with outside information. A large (or archival) library provides access to all information, that which it owns and that in other libraries. The large library also provides an archival place for literature and certain publications; thus it becomes the guardian of knowledge. The small library is concerned with its own holdings and not that much with items it doesn't own.
The small library is also not a primary archival facility; it simply exists to provide convenience to its patrons in the obtaining of knowledge. It should be observed that small and large do not refer to the actual numbers of owned items but rather to the purpose of the library.

As a result of the "self centeredness" of the small library, the catalogue takes on a much more important role in the library; it now forms the only access method into the library's holdings which is available to its patrons. Since it is the only access method, it must provide for all types of access and be as comprehensive as possible. To meet this requirement the catalogue must be an integrated system referencing all intellectual works and linking them to their physical location.

A second problem with the small library is a result of its very smallness. A central library can obtain funding for various computerization programs involving large research systems. This funding can come either strictly for operations or indirectly through research facilities on which library automation research is being done. The small library has been left out as it has extremely low funding and no research is done on this small scale. Now that micros are available the small library has an opportunity to enter the computer era. However, all is not rosy; the micro imposes several new constraints on the implementation of the system which are not as critical with larger systems.
The major restrictions are limited storage space and processing throughput. Overcoming these problems is a concern of this paper.

2.5.1 Efficient Storage Structures

2.5.1.1 Data Storage

The majority of data in a library catalogue consists of varying length text strings. A title, for instance, can be anywhere from a few characters to several hundred in length. Abstracts can be several hundreds of characters in length. The structure must be able to handle both the shortest and longest entries possible. This makes using a fixed length structure impractical, since providing for the maximum entry makes the storage utilization extremely poor. The only structure which provides good storage utilization is the tagged file. In this structure each data item can have its storage space tailored to the optimal size for the data. This permits more efficient use of the available space.

2.5.1.2 Index Structure

Several potential storage structures were available to choose from; each has its pros and cons. The most critical factors governing the choice of index structure were the limited storage space and low disk transfer rates found with most micro-computers. The structure which would do best in this environment would be one which minimized the amount of data transfer and disk seeks, and handled variable length text strings efficiently.

Initially the K-d tree [Bentley 1975] looked like a promising structure.
It provided rapid access to the data and convenient methods for multi-key searching and range queries [Bentley 1978]. A library catalogue would make heavy use of these features in locating various articles. However, the K-d tree is a static structure; it works best when built in a balanced fashion. First, the library environment is a very rapidly growing environment with huge amounts of data; secondly, the data is almost impossible to characterize well enough to build optimal trees. This would result in unbalanced trees and degraded access times.

The second index structure considered was the more conventional B-tree [Comer 1979]. Although several trees would be required to provide the multi-key search facility inherent in the K-d tree, it was believed that the additional storage overhead would not be great. It is also known that the B-tree is faster than a K-d tree for single and some multi-key queries [Weddell 1980]. Since most queries will be single key queries, the B-tree should prove a better choice than the K-d tree. A slight modification to the pure B-tree structure should allow good range query processing as well. Weddell in [Weddell 1980] uses B-trees to support a K-d structure; the B-trees are used to provide faster access for certain classes of queries. A second feature of the B-tree is its dynamic nature; it will always remain balanced. This will keep the access times as low as possible. To enhance storage utilization, which in turn improves access, a variant of the B-tree was considered - the Prefix B-tree.
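One way a prefix-style B-tree can save space on variable length keys is to keep, in its index nodes, only the shortest string that still separates two neighbouring keys instead of a full key. The Python sketch below illustrates that idea only. The function name `shortest_separator` is hypothetical, and the code is an assumption made for illustration, not the implementation described in this thesis.

```python
def shortest_separator(left: str, right: str) -> str:
    """Return the shortest prefix s of `right` such that left < s <= right.

    Hypothetical helper: an index node could store s instead of a full
    key and still route a search correctly between `left` and `right`.
    """
    if not left < right:
        raise ValueError("keys must be distinct and in ascending order")
    for length in range(1, len(right) + 1):
        candidate = right[:length]
        # The first prefix that sorts after `left` is the shortest separator.
        if candidate > left:
            return candidate
    return right  # not reached: the full key always sorts after `left`
```

For the neighbouring keys "bayer" and "bentley" the separator is "be", so two characters are stored in the index node rather than seven. The saving grows with long, similar keys such as titles.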
The Prefix B-tree [Bayer 1977] can support variable length data items; it also provides for some data compression, and thus it would fit nicely in the library environment. Range queries could be provided by adding forward and reverse links between the nodes in the B-tree; this would also allow browsing through the index given a starting point.

Since most index terms refer to several documents, an efficient method of linking terms to multiple records was required. Two methods were available to choose from. The first, and most conventional, method involved using a linked list of pointers to common documents. The second method utilized bit maps. The bit map has the advantage that considerable storage savings could be possible over the more traditional inverted index structure [Ragan 1978]. The bit map also provides more efficient boolean combination than does a linked list structure [Daneliuk 1979]; this feature alone makes the bit map the strongest contender for use in the index.

2.6 Current Systems

2.6.1 Dobis

The Dobis system has been adopted by the Canadian Government to serve as the on-line library management system for its sponsored libraries [McAllister 1979]. The Dobis system provides a fully on-line system with all access to the data available via terminals. The system provides a full bibliographic catalogue which can be linked to the holdings of multiple libraries.
This separation of the bibliographic data from the physical copies permits the system to provide each member library with circulation control and acquisitions management. The pooled bibliographic data enables libraries to share the cost of classifying books.

The bibliographic data is complete in that the full MARC II data format is utilized for storage. The individual MARC records can be searched via eight access indexes: Author, Title, Subject, Classification, Publisher, ISSN/ISBN, LC card numbers and other numbers. Individual index terms are limited to a maximum length of 255 characters. Searching can be specified to include the entire collection or just the local library's holdings. All searching is done via a browsing method. The searcher first selects which access index to use and then enters a starting point into a linear list of terms. Next the desired item can be selected to show greater detail; this process repeats until the desired detail is displayed and all desired items are seen. The system provides no boolean combinations for searching.

The Dobis system is closely modelled after the common card catalogue; it was in fact designed to appear like the manual card system with only a few changes which make its use slightly easier and faster.
The system is used to catalogue physical objects without regard to their substructure and is thus a single level system with little capability to support a hierarchically structured retrieval method. Dobis is incapable of directly supporting indexing into the various subcomponents of the physical object.

2.6.2 OCLC

OCLC (Ohio College Library Center) is also a shared catalogue system [Jacobs 1979]. As with Dobis, OCLC provides a complete on-line catalogue with full MARC records. All user access to the system is via on-line terminals. OCLC pools bibliographic data entered into the system and links the holdings of the various member libraries to the common pool. At present there is no circulation control available with the system.

The bibliographic records can be searched through eight main indexes: Author, Title, Author/Title, Library of Congress card number, ISBN/ISSN, coden, and OCLC control number. The system does not allow explicit boolean combinations but the first three search indexes provide implicit 'and' functions. When performing an author search the input would consist of the following: Aaaa,aaa,a where the first group is the first four characters of the author's surname, followed by first name and middle initial; the lowercase characters are optional. Author/Title searches consist of Aaaa,tttt for surname and first title word, and title searches of Ttt,tt,tt,t for the first four words in the title.
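The key patterns described above can be sketched in Python as follows. This is a minimal illustration built only from the patterns given in the text (Aaaa,aaa,a; Aaaa,tttt; Ttt,tt,tt,t). The function names are hypothetical, the keys are always built at their maximum length, and any further rules the real system applies, such as the treatment of leading articles or punctuation, are not modelled.

```python
def author_key(surname: str, first_name: str = "", middle_name: str = "") -> str:
    """Aaaa,aaa,a: four characters of surname, three of first name, one initial."""
    return ",".join([surname[:4], first_name[:3], middle_name[:1]])


def author_title_key(surname: str, title: str) -> str:
    """Aaaa,tttt: four characters of surname, four of the first title word."""
    words = title.split()
    first_word = words[0] if words else ""
    return ",".join([surname[:4], first_word[:4]])


def title_key(title: str) -> str:
    """Ttt,tt,tt,t: 3, 2, 2 and 1 characters of the first four title words."""
    lengths = (3, 2, 2, 1)
    words = title.split()[:4]
    words += [""] * (4 - len(words))  # pad short titles with empty groups
    return ",".join(word[:n] for word, n in zip(words, lengths))
```

Because each group of the key must match, a single derived key behaves like an implicit 'and' over its parts, which is the effect noted in the text.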
The retrieved documents are displayed in two formats. The short form, a one line description of the document, is used when a search retrieves more than one document. The user then has the option of displaying selected documents in greater detail.

Like Dobis, OCLC is a single level system; it cannot directly support the hierarchical structure of how knowledge is published. The system therefore cannot provide enough content detail of its contained documents to make selection of relevant documents efficient. Since the system was designed to share cataloguing costs this requirement was not considered. For the task it was intended for, the system performs well and is in fact the most widely used shared catalogue system today.

2.6.3 Dialog

Dialog is an example of an analytic retrieval system, perhaps the largest. The system was developed and is currently operated by Lockheed Missile and Space Company Inc. [Miastowski 1981]. The Dialog system provides full indexing into a collection of bibliographic references and abstracts. Searching of the data base is done interactively with batch citation printing. The system was designed to provide abstracts and publication details; there are no links in the system to the physical holdings of the referenced documents.

The system is searched by boolean operations; there is a restricted browsing feature available but it is secondary to the boolean searching.
The searchable indexes are built from descriptors assigned to each entry as well as from every significant word found in the abstract of the entry. Any boolean combination of these terms can be used to retrieve the citation. The boolean search can be performed in two styles - as a single, one line entry yielding a single result, or stepwise, displaying intermediate results. After each step the system returns a count of the number of documents meeting the criteria.

The selection criteria can be augmented by several constraints beyond the main index. These constraints consist of such things as specifying the journal name, publication year, author name, etc. In addition, the system provides several types of string matching features. These enable exact matching, prefix matching, phrase matching, and word order specifications to be enforced along with the basic boolean combinations. Retrieved citations can then be displayed in several formats of differing detail.

Dialog provides for the saving of a search strategy which can be recalled at later dates and rerun. This feature eliminates the chore of rekeying a search strategy each time it is to be run.

Dialog provides a comprehensive level of searching but has no facilities for supporting the primary purpose of a card catalogue, namely the locating of an item on the shelves. The system has no concept of the actual physical object it is referencing.
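A stepwise boolean search that reports a count after each step can be sketched with bit maps of the kind discussed in Section 2.5.1.2. The Python below is an illustration under stated assumptions, not a description of Dialog's internals: the class `BitmapIndex` and the functions `count`, `documents` and `stepwise_search` are hypothetical names, and an arbitrary-length Python integer stands in for each bit map, with bit n set when document n carries the term.

```python
from typing import Dict, Iterable, List, Tuple


class BitmapIndex:
    """Maps each index term to an integer used as a bit map of document numbers."""

    def __init__(self) -> None:
        self.maps: Dict[str, int] = {}

    def add(self, doc_number: int, terms: Iterable[str]) -> None:
        for term in terms:
            self.maps[term] = self.maps.get(term, 0) | (1 << doc_number)

    def lookup(self, term: str) -> int:
        return self.maps.get(term, 0)  # unknown terms match no documents


def count(bitmap: int) -> int:
    """Number of documents whose bit is set."""
    return bin(bitmap).count("1")


def documents(bitmap: int) -> List[int]:
    """Document numbers whose bit is set, in ascending order."""
    return [n for n in range(bitmap.bit_length()) if (bitmap >> n) & 1]


def stepwise_search(index: BitmapIndex, first_term: str,
                    steps: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Apply ('and' | 'or' | 'not', term) steps, keeping (bitmap, count) per step."""
    result = index.lookup(first_term)
    history = [(result, count(result))]
    for operator, term in steps:
        other = index.lookup(term)
        if operator == "and":
            result &= other
        elif operator == "or":
            result |= other
        elif operator == "not":
            result &= ~other  # keep documents that lack the term
        else:
            raise ValueError("unknown operator: " + operator)
        history.append((result, count(result)))
    return history
```

Each step costs one bitwise operation over the whole collection, which is the property that makes the bit map attractive for boolean combination compared with merging linked lists of pointers.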
The Dialog system also lacks the notion of how knowledge is published. It sees every citation as being independent, without links to brother articles in a journal issue. The only connection between citations is where two or more share common descriptors.

3 Proposed System

3.1 Purpose of System

The goal of this implementation, as previously stated, was to provide a means whereby an individual can manage his personal library. This meant that access to the computer had to be convenient and affordable. The only choices which met these criteria were to rent time on a system or to utilize a small micro computer. The first option can be more expensive in the long run and is not as portable; therefore it is not a long term solution to providing an access tool to the library. Secondly, service bureau access is not as readily obtainable as a micro and thus would prevent a large class of users from making use of the system. A micro system is as portable as the library; it can be situated next to the collection and is thus conveniently located. The startup costs of a micro are higher than those of a service bureau but the operational costs are much lower; in the long run the micro would prove cheaper than rented time. The micro has the further advantage of providing additional functions at little additional cost.

The micro's major obstacle is the possibility of its being too small for the storage requirements of the library.
Fortunately most personal libraries have at most a few thousand items. This number would conveniently fit on a micro with a small Winchester disk, which would therefore be more than adequate. Larger libraries would require a larger system for the sole use of the catalogue.

Thus, the hardware which would best meet the requirements would be a microcomputer with a small hard disk of a few million characters capacity. Common micro systems on the market would meet all these storage requirements for under $10,000. Since a growing number of professionals already have a micro system, the system start up cost would often be much lower than presented.

3.2 Integrated System

If a library system is to be of the greatest possible use in a personal library it must be a fully integrated system. Current on-line systems are either too coarse in their cataloguing or not easily linked to a physical object structure with loanable items. The integrated system must be able to handle all types of items in the library (books, records, exhibits...). The objects must be referenceable on an individual basis for controlling circulation, as well as by their constituent intellectual components.

To allow the intellectual works to be catalogued individually and also linked to their physical occurrences, the notion of how knowledge is published must be built into the catalogue system. This structure is essential when linking multiple items to a single physical object.
Utilizing the flat structure found in today's systems will create difficulty in the linking of the intellectual works to their physical occurrences. This in turn makes the management of the system difficult.

The structure which will provide convenient linking must take into account the functional dependencies of intellectual works and the physical occurrences of them. Put simply, a single physical object can contain many intellectual works; therefore the two must be dealt with differently. This can be done by employing a hierarchical structure. The structure of published knowledge can be captured by a simple three level hierarchy. This structure has been proposed by Hoffman in his book [Hoffman 1976] and later refined by him in [Hoffman 1981].

3.3 The Three Level Hierarchy

The first consideration in the structuring of knowledge is the separation of the physical container of knowledge from the contents - the intellectual work. It is the intellectual work which is the thing of worth and that which is desired by the searcher. The container has no value in itself except to provide a convenient method of housing knowledge. Once this distinction is made it is possible to deal with the intellectual knowledge and map its published structure onto a data structure.

Knowledge is published in its current fashion for convenience and timeliness.
As new technologies appear the mechanisms of transmission and storage will change but not the basic published structure. Current publication restrictions have unfortunately interfered with the pure structure of knowledge and how it should be truly structured. This interference can be seen in the addition of extra levels in the hierarchical structure.

In its purest form knowledge can be seen as a two level structure. The author forms a root node and all his intellectual works form leaves under the root. In an ideal system an author would disseminate his knowledge by entering it into a common system under his name as the author. These intellectual works would eventually become leaves of other hierarchies as the case requires. These additional hierarchies would consist of such things as refereeing bodies which attest to the validity of the item.

Unfortunately past and current technology has not allowed this type of ideal dissemination method. The largest reason for this to date is the cost of the dissemination of knowledge. In past centuries it was too expensive to publish a single paper of a few pages and have this shipped to all interested parties. Today the cost has become even greater. To offset this the works of various authors would be collected by a publisher and periodically - when sufficient were collected - they would be published as a unit. These units in turn formed part of an on-going series forming a complete set.
Thus we can see a three level hierarchy appearing, having little in common with the pure structure. Instead of the author being the root we have a journal; the journal has its individual issues as its children, which in turn have the actual intellectual works as their leaves. Thus the true root has been pushed aside to accommodate the way knowledge is published.

The collection of various sets of published objects
The individual published objects
The intellectual works

The Three Level Hierarchy of Published Knowledge
Figure 3:1

Hoffman in his article [Hoffman 1981] shows that published objects can be classified into seven different publication categories. These seven categories can then be mapped onto the three level hierarchical structure. Some of the seven require only two levels; these two levels of the structure are not defined by him. The seven classes were mapped into different levels in the hierarchy depending upon which best represented the class. Hoffman goes on to state that all literature can be classified into one of these seven categories, although he does allow some mixing. He shows that a collection could consist of two or more classes although an individual object can be of only one. In his paper he uses the three terms item, document, and work to refer to a serial, physical book, and intellectual work respectively.
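The three levels, named with Hoffman's terms item, document and work, can be sketched as a small Python data structure. This is a minimal illustration only: the class names, field names and the `find_by_author` helper are assumptions made for the example, not the record layout used by the system described in this thesis.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Work:
    """Level three: an intellectual work, such as an article."""
    title: str
    author: str


@dataclass
class Document:
    """Level two: an individual published object, such as a journal issue."""
    label: str
    works: List[Work] = field(default_factory=list)


@dataclass
class Item:
    """Level one: a set of published objects, such as a serial."""
    name: str
    documents: List[Document] = field(default_factory=list)


def find_by_author(items: List[Item], author: str) -> List[Tuple[str, str, str]]:
    """Return (item name, document label, work title) for each work by `author`."""
    return [
        (item.name, document.label, work.title)
        for item in items
        for document in item.documents
        for work in document.works
        if work.author == author
    ]
```

A search by author walks down to the intellectual works but returns the enclosing document and item with each hit, so the searcher learns which published object contains the work as well as that the work exists.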
Publication Types [Hoffman 1981]

Column headings: One Document Per Item; Several Documents Per Item (Finite, Infinite)
Row headings: One Work Per Item; Several Works Per Item; Several Works Per Document; One Work Per Document

A - Typical novel
B - Multi-volume story
C - A single volume collection of short stories
D - A multi-volume collection of short stories
E - An ongoing series of short works
F - A collection of novels - trilogy
G - An ongoing collection of novels

Figure 3:2

This classification structure categorizes all forms of published knowledge; all intellectual works can be mapped into this structure and then mapped into the three level hierarchy. However, as is often the case, intellectual works are published more than once and often journal issues can be members of more than one set. These anomalies corrupt even the three tier hierarchy in its purest form; fortunately this can be easily rectified. The solution to the problem involves the sharing of common nodes across the hierarchical structures. This modification of the pure hierarchical structure is mainly for operational ease of the system; without it multiple copies of the lower nodes would be required. While allowing this modification the basic three level structure can still be seen. This hierarchy consists of a single root with multiple children, each of which may have children; any non-root node may be common to several hierarchies.
There are no cycles in the structure as the arcs are directed, going from the single root to its children and then to their children. Children cannot point to a higher level or their own level in the structure. A node belongs to only one level in the structure regardless of how many individual hierarchies it is a member of. Thus the functional dependencies can still be clearly seen forming a three level structure. This structure is simpler than a full network, although a network database would be easier to use for the implementation than a strict hierarchical system would.

The premise of this thesis is that this hierarchical structure is sufficient to build a complete library catalogue system upon. This is believed because the hierarchy correctly captures the structure of published knowledge. Once this notion of physical structure is captured it becomes possible to correctly link the abstract nature of intellectual works with their physical containers. The resulting system is a fully integrated catalogue providing access to the constituent parts of the physical objects in the library as well as to the location of the objects on the shelves.

3.4 Why it will work

The system to be developed will utilize the three level hierarchy. The catalogue will be implemented as a forest of trees, each tree being independent except where they share a common node.
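The rules stated so far (directed arcs that only go downward, one level per node, and non-root nodes shared between trees) can be illustrated with a minimal in-memory sketch. It is not the thesis implementation, which stores the links in files; the names `Node`, `link`, and `roots_of` and the level constants are assumptions made for this example only.

```python
from dataclasses import dataclass, field

# Level numbers follow Figure 3:1 from top to bottom (hypothetical constants).
COLLECTION, PUBLISHED_OBJECT, WORK = 0, 1, 2


@dataclass(eq=False)
class Node:
    title: str
    level: int  # a node belongs to exactly one level
    children: list = field(default_factory=list, repr=False)
    parents: list = field(default_factory=list, repr=False)


def link(parent, child):
    """Add a directed arc parent -> child; arcs may only point downward."""
    if child.level <= parent.level:
        raise ValueError("a child must sit on a lower level than its parent")
    parent.children.append(child)
    child.parents.append(parent)


def roots_of(node):
    """Return the titles of every top-level node the given node hangs under."""
    if not node.parents:
        return {node.title}
    found = set()
    for parent in node.parents:
        found |= roots_of(parent)
    return found


# One journal issue shared by two sets, as in the anomaly described above.
set_a = Node("Set A", COLLECTION)
set_b = Node("Set B", COLLECTION)
issue = Node("Issue 1", PUBLISHED_OBJECT)
paper = Node("Paper", WORK)
link(set_a, issue)
link(set_b, issue)
link(issue, paper)
```

Because the shared issue is a single object referenced from both roots, no copy of the lower nodes is needed, and the level check is what keeps the structure free of cycles.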
The top most level of the structure will always be a member of only one tree. The structure can be easily linked to the physical occurrences of the referenced objects. Thus the system becomes practical for a library catalogue. The three level hierarchy can support all seven classes of published documents and thus adequately models knowledge. Physical copies of the various items can therefore be attached at the correct level, ensuring the correct handling of their functional dependencies. Thus the proposed system can handle all forms of published documents and properly link them to physical copies of the documents. The system is therefore sufficient for providing a basis for a fully integrated library catalogue system.

[Diagram: SERIAL ====> JOURNAL ====> ARTICLE, with single-line links to COPY]

Figure 3:3: The Logical Structure of the Catalogue. Double lines represent the links of the hierarchy; the single lines are links locating the physical copy of the item.

4 System Implementation

4.1 System Description

4.1.1 Hardware

The micro computer selected for the prototype implementation was a small business system. The selected system was chosen because of its relatively low cost, high processing throughput, hardware expandability, and availability. Being a business system, the hardware is of commercial quality and should therefore be more reliable than the average personal hobby computer. The system is based upon the Motorola 6809 microprocessor, operating at 2 MHz.
This chip is a hybrid 8/16 bit part and thus capable of providing greater throughput than the more common 8 bit micros. The basic configuration consists of the micro, 64KB of memory, two 8 inch floppy disk drives (each containing 1.1 Mbytes), and a single terminal. It was this configuration which was used for the initial development of the library system. By the end of the implementation phase an additional 64KB of memory became available. The system can be expanded to 1 Mbyte of internal memory, 100Mb+ of disk, and several terminals with relative ease. The major restriction of the system is the limited address space; the 6809 can directly support only a 64K range. This restriction increases the complexity of large software systems developed for the micro.

4.1.2 Operating System

The O/S managing the micro system is a Unix-like system called OS9. OS9 provides most of the features of Unix - hierarchical file structure, process management, pipes - and is therefore a convenient environment for the development of the library system. OS9 supports several languages including a Pascal with random file I/O. This feature made the development fairly straightforward; without the random I/O feature it would not have been possible to develop the system.

4.2 Data Structures

4.2.1 Overview

The catalogue system is comprised of nineteen on-line files.
These files contain all the stored data elements which form the bibliographic entities, and the logical structure which defines the hierarchical relationships. Of these nineteen, three form the nucleus of the system; the three each have an associated index file. The remaining files form the searchable indexes into the main data area. The three main files are:

Biblio.Main.data
    This file contains the variable length bibliographic data. All information stored by the system concerning an object is found in this file. There are two exceptions to this; the hierarchical linkage data and the data about the physical copies of an item are stored separately.

Work.data
    This file contains the information pertaining to each physical object referenced by the catalogue. This includes such things as media type, condition, and location.

Components.data
    This file contains the links between the two previous files as well as the hierarchical links between entries within the bibliographic file.

Each of these files has a DEF file (Direct Entry File) associated with it. The DEF provides an index into its data file and forms the only access path into these files.

The next group consists of six sets of two files. These twelve files form the searchable indexes of the catalogue. The first file of each set is a Prefix B-tree which serves as an index into the second file, which is a compressed bit map. The twelve files are:

Author.Tree & .Map
    - Searched in Author name queries
    - Author names are stored: last first middle

Title.Tree & .Map
    - Searched in title queries
    - All titles of every entry are stored in the order they appear in the work

Keywords.Tree & .Map
    - Searched in Keyword queries
    - Contains all words and phrase descriptors manually assigned

Subjects.Tree & .Map
    - Searched in Subject heading queries
    - Similar to Keywords except that the entries are pre-coordinated and come from a controlled vocabulary

TitleC.Tree & .Map
    - Searched in components queries
    - Contains all significant words in title and abstract

ISSN.Tree & .Map
    - Searched in ISSN queries
    - Contains ISSN and ISBN numbers of all referenced items

The final file is used as a stop list. It contains approximately 210 of the most common words in the English language as well as a few which are common to the computing industry. These words are the ones which are not to be indexed in the TitleC index structure.

4.2.2 Inter-file Relationships

The most complex relationships found in the catalogue system exist between the three nucleus files. These relationships form the backbone of the system. Associated with each bibliographic entry is a components/owner list; this list links the entry with its hierarchical parents, its children and the physical copies of the item. This linkage data is stored in the components file as a linked list. The works file is referenced by the components file and it in turn refers back to the bibliographic file by indicating its owner. The following diagram illustrates these relationships.

The twelve index files are used simply to locate a starting point in searching the main bibliographic file; at present they have no other function. The twelve index files can be looked upon as a separate subsystem. They are linked to the main files only by symbolically identifying entries in the bibliographic file. Internally these files come in sets of two. The first of each set is an index into the second. The index contains every term which forms the data stored by that catalogue index (e.g. Author names). This index then points into the bit map file. The bit map identifies the associated bibliographic entries by their internal document number and thus forms the link between the index and the bibliographic data.

The final file, the stop list, has no data links to the rest of the system. It is related only through the data entry process.

[Diagram: the Main Bibliographic, Components and Works files, each with its DEF. Solid arrows are physical pointers; dotted arrows are symbolic pointers.]

Figure 4:1: Physical File Structures

4.2.3 File Usage

The most obvious files to the average user of the library system are the index tree files. The searcher will almost always approach the catalogue via a search through the index files.
He would start by first selecting which of the indexes he wants to search. His next step would be to enter a term as the initial starting point within the index. Next he would browse through the index until he finds a desired term. Once a term has been chosen the system can be requested to provide a list of the various documents associated with the term. The bit map is used to provide the list of associated documents for each term; each document can be retrieved in turn. Once a bibliographic entry is displayed in a short format, further related information can be obtained by utilizing the linkages contained in the components file. Thus the individual articles of a retrieved book can be recalled as well as where the physical copies of the selected entry are stored. The main flow of the search can be seen as originating in the indexes, moving to the hierarchical structure and terminating in the works file, which indicates where the actual item can be found.

To the librarian, who maintains the data base, the system is seen more as the three level logical structure with the physical copies attached to the side of the tree. This differing view results because the librarian deals directly with the logical structure. The main bibliographic file can be entered directly without originating in an index. The librarian can also enter the works file directly and can from there move back into the main bibliographic file.
4.2.4 Internal File Structure

For a detailed layout of the internal file structures see appendix A.

As documents are added to the system they are assigned a permanent sequential internal document number (IDN). This number has two logical parts. The first part is a two bit identifier; the four combinations indicate whether the identified item is a physical copy entry or a bibliographic entry. In the latter case it also indicates which level of the hierarchy the entry belongs to. The second part of the IDN is a thirty bit number used to uniquely identify the entry.

Since the IDN is assigned sequentially with no intentional gaps, the DEF for the main bibliographic file and for the components file is a simple ordered sequential list of pointers. The relative position of a pointer within the DEF determines the entry it is associated with.

The works file also utilizes the IDN to uniquely identify its entries. However, the IDN is here assigned the same value as the item's accession number. Since the accession number is assigned manually it cannot be counted upon to be sequential, and thus the Works file's DEF must be structured differently from the others.

The Works DEF is structured as a B-tree. This B-tree is very similar to the conventional B-tree with fixed length entries and all data items in the leaves. The tree has nodes of 256 characters containing thirty sets of pointers and accession numbers.
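The two-part IDN can be sketched as simple bit packing. The text does not say which two-bit code stands for which kind of entry, nor whether the identifier occupies the high or the low bits (appendix A holds the actual layout), so the code assignment in `KINDS` and the choice of the two high-order bits are assumptions made only for illustration.

```python
KIND_BITS = 2
NUMBER_BITS = 30
NUMBER_MASK = (1 << NUMBER_BITS) - 1

# Hypothetical assignment of the four two-bit codes: three hierarchy
# levels for bibliographic entries plus one code for physical copies.
KINDS = ("serial", "journal", "article", "copy")


def pack_idn(kind, number):
    """Combine a two-bit kind code and a thirty-bit number into one IDN."""
    if not 0 <= number <= NUMBER_MASK:
        raise ValueError("number does not fit in thirty bits")
    return (KINDS.index(kind) << NUMBER_BITS) | number


def unpack_idn(idn):
    """Split an IDN back into its kind and its thirty-bit number."""
    if not 0 <= idn < (1 << (KIND_BITS + NUMBER_BITS)):
        raise ValueError("IDN does not fit in thirty-two bits")
    return KINDS[idn >> NUMBER_BITS], idn & NUMBER_MASK


# An IDN always fits the four bytes used for it elsewhere in the catalogue.
example = pack_idn("article", 5)
stored = example.to_bytes(4, "big")
```

With this layout the kind of an entry can be read from the IDN alone, without touching any file.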
The main bibliographic file is also divided into 256 byte records. As bibliographic data is inserted, the data is placed into these records, which are linked into a sequential list. Each record contains a fixed and a variable section. The fixed section contains internal data used to control the linkages and identify the record along with its relative position in the list. This latter data is present to enable the reconstruction of the list should the file become damaged. The variable data section consists of a series of three repeating fields: tag id, length, data. It is in this variable section that all the bibliographic information is stored.

The Components data file consists of fixed length 32 byte records. Each record contains space for four IDNs. These IDNs symbolically point back into the bibliographic file and the works file. The records are linked into a sequential list as additional space is required.

The Works file is also a fixed format file. The records here are 64 bytes in length. They contain information which is particular to the individual physical objects. Half of the record is undefined and available for use by a circulation control subsystem.
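The repeating tag id, length, data layout of the variable section can be sketched as follows. The field widths are not given in this chapter, so a one byte tag id and a one byte length are assumed here, along with the convention that a zero tag marks unused space at the end of a record; the tag values in the usage lines are likewise hypothetical.

```python
def encode_fields(fields):
    """Pack (tag, data) pairs as repeating tag id, length, data groups."""
    out = bytearray()
    for tag, data in fields:
        if not 1 <= tag <= 255:
            raise ValueError("tag id must fit in one non-zero byte")
        if len(data) > 255:
            raise ValueError("data too long for a one byte length")
        out += bytes([tag, len(data)]) + data
    return bytes(out)


def decode_fields(buf):
    """Walk the variable section and return the (tag, data) pairs found."""
    fields = []
    pos = 0
    while pos + 2 <= len(buf) and buf[pos] != 0:  # zero tag: unused space
        tag, length = buf[pos], buf[pos + 1]
        start, end = pos + 2, pos + 2 + length
        if end > len(buf):
            raise ValueError("field runs past the end of the section")
        fields.append((tag, buf[start:end]))
        pos = end
    return fields


# Hypothetical tags: 1 for an author name, 2 for a title.
section = encode_fields([(1, b"Kehler Eric Gregory"),
                         (2, b"A Personal Library System")])
padded = section + bytes(16)  # remainder of the record left unused
```

A layout of this kind lets new data elements be introduced by defining a new tag, without changing the record format.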
Index Files

This set of files, although not essential to the representation and storage of the bibliographic data, forms the main access point into the system. Since these files will be the most heavily used and the most obvious to the user, they had to provide efficient, rapid, yet simple access into the main data areas. The major complication which had to be overcome was the format of the data to be stored, namely variable length text strings. Since the lengths vary from only a few characters to several hundred, it is impractical to allocate the same fixed amount each time. This made the selection of the data structure very difficult and, as described in chapter two, a B-tree form was selected for the term storage part of the index. The standard B-tree could not be used because it is based upon fixed length keys. Therefore the variant of the B-tree known as the Prefix B-tree was used. This structure efficiently dealt with the variable length of the strings.

The Prefix B-tree has the additional benefit of front-end compression. This compression is seen in the non-leaf levels of the tree.
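The compression in the non-leaf levels comes from storing only as much of a term as is needed to separate two neighbouring nodes. A minimal sketch of that choice is given below; it follows the usual Prefix B-tree rule (the shortest string greater than the last term on the left and not greater than the first term on the right) and is not taken from the thesis code. The function name is hypothetical.

```python
def shortest_separator(left_max, right_min):
    """Return the shortest prefix of right_min that sorts after left_max.

    left_max is the last term kept in the left node and right_min the first
    term of the right node; every separator s returned here satisfies
    left_max < s <= right_min.
    """
    if not left_max < right_min:
        raise ValueError("terms must be in strictly ascending order")
    for length in range(1, len(right_min) + 1):
        prefix = right_min[:length]
        if prefix > left_max:
            return prefix
    # Not reached: the full right_min always sorts after left_max.
    raise AssertionError("no separator found")


# Two characters are enough to separate these neighbouring author terms.
separator = shortest_separator("Kehler", "Knuth")
```

Only the separator is promoted to the parent, so long terms in the leaves do not reduce the number of entries an upper node can hold.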
As a node is split, a minimal length separator is selected and promoted to the upper level. This has the effect of reducing the storage required for separators, which in turn increases the branching factor of the higher nodes, producing a shallower tree.

In addition to the standard tree pointers, the nodes of each level are linked together with forward and backward pointers. This doubly linked list links the nodes in sequential order across the various parent nodes. This feature permits rapid browsing through the tree and will provide the means for a good concurrent updating facility as developed by Lehman [Lehman 1981].

The nodes of the tree are 512 bytes in length, 492 of which are available for storage of the variable length text strings and associated pointers. This allows a maximum string length of 488 characters. The variable data area is comprised of an alternating series of terms and four byte pointers. The variable length term is terminated by setting the high-order bit of the last character high. The leaves of the tree contain the pointers into the bit map file.

[Diagram: a root node with tree pointers down to the nodes of each lower level; the nodes of each level are joined by doubly linked level pointers; the leaves carry external pointers.]

Figure 4:2: Doubly Linked Prefix B-tree

The Bit Map file contains maps for every term found in the B-tree. The file is comprised of fixed format records of 64 bytes. Each of these records contains thirteen sets of bit map. When additional sets are needed, additional records are linked into a list structure which forms a single logical bit map.

The bit map set consists of two parts: a two byte offset and a two byte bit field. Each document entered into the system is assigned a single bit in the map. When the document is relevant to the map its bit is set; otherwise the bit is clear. The bit/document relation is established by the bit's position relative to the start of the map. Since most maps will only refer to a small percentage of the documents, there must be some form of cleared bit compression; it is the offset which provides this compression. The offset indicates the number of clear bit fields between the previous field and the following one (i.e. an offset value of five would indicate that there are 80 clear bits between the two fields).

[Diagram: a row of alternating Offset and Bits fields.]
    Offset - Number of clear sets of 16 bits between actual bit fields
    Bits   - A set of 16 bits, each of which represents one document

Figure 4:3: Bit Map Structure

This form of index was selected over the conventional approach due to its potential space reductions.
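The offset and bit field scheme can be made concrete with a short sketch that builds and reads such a map. The bit order within a field (lowest document number in the least significant bit) and the numbering of documents from zero are assumptions, as are the function names; the limit of 2**15 - 1 on a single offset is the figure quoted in the space analysis of this section.

```python
WORD = 16                 # documents covered by one bit field
MAX_OFFSET = 2**15 - 1    # largest skip a single offset can express


def encode(doc_numbers):
    """Build the list of (offset, bit field) sets for a set of documents."""
    words = {}
    for n in doc_numbers:
        words[n // WORD] = words.get(n // WORD, 0) | (1 << (n % WORD))
    sets = []
    next_word = 0  # index of the word that follows the previous field
    for w in sorted(words):
        gap = w - next_word
        while gap > MAX_OFFSET:           # gap too wide for one offset:
            sets.append((MAX_OFFSET, 0))  # emit a filler set with a zero map
            gap -= MAX_OFFSET + 1
        sets.append((gap, words[w]))
        next_word = w + 1
    return sets


def decode(sets):
    """Return the document numbers whose bits are set, in ascending order."""
    docs = []
    word = 0
    for offset, bits in sets:
        word += offset                    # skip the clear words
        for bit in range(WORD):
            if bits & (1 << bit):
                docs.append(word * WORD + bit)
        word += 1
    return docs


# Documents 0 and 96 lie five clear words (80 clear bits) apart.
example = encode([0, 96])
# Sixteen neighbouring documents share one four byte set.
dense = encode(range(16))
```

Each set occupies four bytes, so `4 * len(sets)` can be compared directly with the four bytes per reference of a list of IDNs.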
The index in a conventional index consists of a list of pointers or symbolic identifiers into the main data file. In this catalogue implementation the list would consist of the four byte IDN; thus four bytes per reference, plus overhead, would be required for each document associated with each term. For space requirement comparisons the overhead can be ignored as both the list and the map will have a similar percentage of overhead.

Ragan in his article [Ragan 1978] found that the use of bit maps reduced the index size from 126 percent of the data file, in conventional inverted indexes, to 32 percent. In the current implementation the potential compression could be as high as sixteen to one, where sixteen references could be compressed into the space one would occupy in the conventional approach. Although achieving this much compression is unlikely, even a two or three fold reduction in size would make the map more attractive than the conventional index structure.

The actual space usage per entry in the bit map can be shown to fall into one of three cases. The first case arises when two referenced items have IDNs differing by less than sixteen. This is the only case which can potentially provide space compression. Should the second and subsequent entries of a localized series fall on the same sixteen bit map word, then the second and subsequent entries are stored with no additional storage requirements over that required by the first entry.
Should the entries not fall on the same map word then the entries would require space allocation as described by case two.

Case two space requirements apply to those entries which are missed by case one and to all entries which differ in value by sixteen or more but by less than 524,272, that is (2**15 - 1)*16. In this case, storage requirements for both the map and the conventional system are identical. Each map entry will occupy four bytes - a two byte offset and a two byte map.

The final case comprises those entries which differ by more than 524,272. Since this is the largest number of bits an offset can skip over in a single step, multiple offsets with zero maps would be required to provide the necessary range. This final case is the only case where the map can result in poorer space requirements than the conventional structure. Fortunately, Daneliuk [Daneliuk 1979] observed that entries are generally found in clumps, where several entries are densely packed, followed by a gap and another clump. This clumping would cause most entries to fall in case one. The structure he employed was not used in this implementation as it was more complex than the one implemented and it did not appear to provide any additional benefits.

Computational requirements of the map can be divided into two cases determined by use.
The first case comprises queries which involve only one term and map; entries are retrieved and displayed sequentially. In this case additional cpu time is required to perform the map decoding, which is not present with conventional systems. However, the small time penalty is not significant as the major time delay is in the displaying of the entries and the user's response to them. Case two consists of those queries which perform boolean combinations between two maps. Here the map structure provides a more efficient mechanism for combinations than the conventional approach [Daneliuk 1979]. In this second case any time saving is critical, as more work is required, with potentially many maps being combined before the first response to the user. Thus case two requirements take precedence over those of case one.

Overall, the use of the map structure should result in significant space reduction and reduced cpu activity (in time critical computations) compared with the conventional index structure. In light of this analysis the bit map was the better choice for the implementation of the inverted index structure.
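The advantage claimed for boolean combinations is that two maps can be combined a word at a time, sixteen documents per operation, without first expanding either map into a list of document numbers. The sketch below shows one way this might be done for a logical AND over lists of (offset, bit field) sets; the function names are hypothetical, and for brevity the result is not split into filler sets when a gap exceeds the largest single offset.

```python
def absolute_words(sets):
    """Yield (word index, bit field) for every non-zero field in a map."""
    word = 0
    for offset, bits in sets:
        word += offset          # skip the clear words before this field
        if bits:
            yield word, bits
        word += 1


def and_maps(a, b):
    """Return the map of documents present in both a and b."""
    words_a, words_b = absolute_words(a), absolute_words(b)
    result = []
    next_word = 0               # word that follows the last field emitted
    item_a, item_b = next(words_a, None), next(words_b, None)
    while item_a is not None and item_b is not None:
        if item_a[0] < item_b[0]:
            item_a = next(words_a, None)
        elif item_b[0] < item_a[0]:
            item_b = next(words_b, None)
        else:
            common = item_a[1] & item_b[1]   # sixteen documents at once
            if common:
                result.append((item_a[0] - next_word, common))
                next_word = item_a[0] + 1
            item_a, item_b = next(words_a, None), next(words_b, None)
    return result


# First map: documents 0, 2 and 96. Second map: documents 1, 2, 51 and 96.
first = [(0, 0b0101), (5, 0b0001)]
second = [(0, 0b0110), (2, 0b1000), (2, 0b0001)]
both = and_maps(first, second)
```

Words present in only one map are skipped without any bit-level work, which is where the saving over merging two lists of document numbers arises when entries are clumped.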
4.2.5 Physical and Logical Relationship

All of the logical hierarchical linkages connecting the various bibliographic entries to one another and to their physical copies are stored in a single file - Components. Although most retrieval systems have a similar type of file for linking the bibliographic entries to their physical copies, they use it only for that purpose. The only difference between this system and the others at the physical level is that here the pointers take on two separate meanings depending upon the file they reference. Thus the general construct of the low level structures is very similar to the systems found today.

As the following diagram illustrates, most pointers are symbolic in nature; the implicit pointers are based upon relative positions. Since the components file has two distinct purposes it has been given two names - components and location - which distinguish these purposes. The bibliographic file is shown under three names depending upon what type of entry is being described.
[Diagram: boxes labelled SERIAL, JOURNALS and ARTICLES (bibliographic file), COMP and LOC (components file) and WORKS (works file), joined by implicit and symbolic pointers.]
    B - Bibliographic file
    C - Components file
    W - Works file

Figure 4:4: Logical and Physical Structure of the Catalogue System

4.3 Programs

The catalogue system can be viewed from two different vantage points - that of the librarian and that of the library patron. In the case of the latter there is a single program which the patron employs for his queries - the retrieval program. It is this program which provides the information for which the entire system is created. In addition to the retrieval program the librarian also utilizes various maintenance routines to maintain the data stored in the system. At present the number of maintenance functions which have been implemented is minimal.

To add entries to the system a sequence of four programs is employed. The first of these is a data entry program which works interactively, prompting the user for the various data items of a typical bibliographic item. The prompting control used by this program and the permissible data elements are stored in the Bibliotags control file (see appendix E for the current values). This program builds a batch file of the entered data.
This program does not reference the main system files and thus cannot do any editing of the data. The result is a text file with a few structure control codes used to separate the various entries and indicate the hierarchical relationships of the entered data. Since this program simply produces a common text file, any program doing the same could be employed. However, the purpose of this program is to aid in the entry of data, and it thus provides the easiest method of data insertion into the library system.

The second program in the entry sequence is the edit program. This program checks the batch file for proper syntax; it also flags data items which could be in error or already present in the system. Should corrections be required, the text file can be updated by employing a standard text editor. See appendix B for a sample edit list.

When the batch file is error free it is put through a two step merge process. The first of these steps reads the text file and copies the data into the main bibliographic data file and the physical works file. This program also builds the linkages for each entry in the components file and creates a non-text file containing all the individual terms which could potentially be merged into the index files.
The second program in the two step merge reads the non-text file and adds the terms to the index files. Entries marked for addition to the TITLEC file are first checked against a stop list and, if found there, are not added to the index. Once this file has been processed the merge is complete.

The retrieval program is the major source of output from the catalogue system. It is this program which produces the desired result of the catalogue system. The program provides the user with four types of access into the catalogue data files. These four methods are browsing, boolean searching, direct entry via the IDN, and direct entry via accession numbers.

1) Browse
This is the main method used for obtaining information stored in the catalogue system. Once the user has selected the browse option he is prompted to select an index to browse through. Next the program requests an entry point into the index; the program responds by displaying the entries on the page which contains the entered term. If the term does not exist in the index, the page on which it would appear is displayed. Once a page is displayed the user can page forward and backward through the index or select a displayed term to be shown in greater detail. On selecting more detail, each related document in the system is sequentially displayed in short form. As each document is presented the searcher has the option of displaying the full data of the document, the children of the document, or the physical copy data. See appendix C for a description of the various screens displayed in the browse sequence.
2) Logic
This facility permits a single logical 'AND' between two search terms. The terms can be in any combination of the indexes. If common items are found they are sequentially displayed. As each common document is displayed the searcher has the same choices for further detail as in the browse function.

3) Doc
This search method is used to retrieve a single specified document from the system. The number entered is the IDN. Once a document is displayed, greater detail can be displayed as in the browse function.

4) Accession #
This works in a similar fashion to method 3, except that the number entered is the accession number of a physical object. The system responds by displaying the bibliographic data of the owner of the specified object. Once again the document can be displayed in greater detail.

These four query methods are complete in that they can retrieve all information stored in the system. Eventually the logic query must be improved to allow a full range of boolean combinations.

Appendix C describes the format of the major displays produced by the retrieval process. For an example of a retrieval see appendix D.

4.4 Data Robustness

The system maintains its primary data in three files. These files have been designed so that they can be rebuilt should certain classes of damage occur to them. Damage which can be repaired includes corruption of the file control section, of internal record pointers, or of the DEFs.
This amount of robustness has been achieved by storing in each record the ID of the record's owner (IDN) and, where applicable, the sequential place where the record belongs in its linked list. Should entire records be damaged, the bibliographic data is lost from the system. Thus there is a minimal amount of recovery built into the current system.

All data stored in the index trees and maps is redundant and can be obtained from the main bibliographic file. However, when an index is damaged it is recreated, not repaired.

5 Evaluation

5.1 The Hierarchical Structure

The implementation of the hierarchical catalogue structure does demonstrate the effectiveness of modeling a catalogue after the physical way knowledge is published. The structure allows the user to learn more fully the contents of an item and thereby allows him to make better decisions as to the relevance of the item to his needs.

The hierarchical structure increases the usefulness of the catalogue beyond that of the conventional systems detailed in chapter two. This increase is partially a result of the greater detail of data kept in the catalogue. And as stated earlier, this amount of detail can only be properly managed by imposing a structure on the various entries.

Since the hierarchical structure groups documents related by their physical occurrences, new types of queries, not easily available on other systems, were introduced. These new queries improve the precision and recall of a search.
Without the hierarchical structure these types of queries would be a difficult task and thus would not often be performed.

During test queries of the system the value of being able to retrieve the subcomponents of a document was seen. Rather than having to decide the relevance of a document by a brief description, each article contained in it could be reviewed individually. This enabled the user to determine, more accurately than in the conventional catalogue, whether a certain document was actually relevant or too superficial in the topic sought.

The test queries also showed that the majority of retrievals were made to the actual articles, bypassing the higher levels. These types of queries are not possible with the conventional system. Thus the higher recall potential of the hierarchical catalogue was also observed.

Although these features could be seen with the current test data, the demonstration of this was not impressive. This can be attributed to the smallness of the database. For a good demonstration and proof of the system's capabilities a database with several thousand entries would be required.

It would appear that the hierarchical structuring of a library catalogue is an effective and feasible method for providing better retrieval of knowledge. It is still to be determined if the three level structure is absolutely sufficient for all current and foreseeable types of published knowledge.
The case in point is a serial published monthly where the individual issues are grouped into volumes. Is the middle level the volume, the issue, the issue then the volume when bound, or both? Here there are four apparent levels.

5.2 Practicality

At the present time, this structure of catalogue is not practical for use in an operational system. Current systems are easier to use and maintain. This ease of maintenance is a direct consequence of their lack of logical structure. In conventional systems each bibliographic entry is dealt with on an individual basis and can be added and deleted with little effect on other entries. The hierarchically based system forces the updater to be concerned with how the item relates to others in the system; thus the addition of items is a more difficult task. Presently there is no easy way to determine all of these various relationships and how the addition and deletion of an entry affects those around it.

In the current implementation, the logical inter-document links must be manually specified. Thus when a document is to be entered into the system, the first thing the user must do is to locate all the documents which are hierarchically associated with this new entry (i.e. journal entries, series entries, constituent articles etc.). Once these documents are found, their respective IDNs must be recorded and entered with the new entry. In addition the user must specify which level of the hierarchy the new entry belongs in.
Upon entry of the IDNs, hierarchical position, and the entry's data elements the system is able to correctly insert the new document into the system. Fortunately, when entire journals are entered as single batches the inter-entry links are assigned automatically by the system for all simple hierarchies described in the batch structure; outside links, however, must still be manually found and entered.

This additional entry requirement can considerably increase the effort required in adding entries to the catalogue system. Since we live in an age of tight fiscal funding this extra cost could be hard to justify.

The hierarchical system makes heavy use of the internal document number (IDN). In itself this is not bad, but the number is unfortunately visible to the user and, as mentioned above, must be used extensively by him to indicate the hierarchical links to the system. Conventional systems also assign numbers to each document, but these are used for simple identification, not for interdocument linkage. Since the number in the conventional system is only for identification, it can be assigned automatically and then ignored. Until the IDN can be handled more automatically, and the various logical links can be established by the system, the hierarchical system will remain clumsy to use in a practical implementation.

The current implementation is not robust enough to handle continued use in a hostile environment.
The hierarchical structure links are stored by a fragile data structure which can be easily damaged, resulting in loss of large amounts of information.

For a small, personal library the current implementation would not be simple enough to use. Whereas the personal user does not require the robustness required in the commercial environment, he does have very limited time to spend on entering data. Setting up the hierarchical linkages would be too time consuming for the average personal user and would thus make the system impractical. The average personal user is too busy to spend the time required to control the various anomalies found in the current implementation.

5.3 A Personal System

The implementation of the catalogue has demonstrated the practicality of using a micro computer system for the control of a professional's personal library. The micro system was able to provide a response time of one second to the majority of the queries performed on the test database when using the browse feature. Logical queries averaged under three seconds response time. The following two tables detail several attributes of the test database and response times.

Test Database Current Size

  Rows: Number of entries; Average length of entries; Total number of Keywords; Total number of Component entries; Total number of Authors
  Values (in the order extracted): 490; 1702; 194; 128; 499.1

Table 5:1

Retrieval Timings (timing in seconds +/- .25)

  Rows: Initialization; Index Request; Initial Search term display; Browsing; Detail Display; Child Retrieval; Works display; Logical retrieval, time to first document display; Logical retrieval, nothing found; IDN retrieval; Accession number retrieval
  Columns (as extracted): Time; Author; Keywords; Components; Keywords Components
  Values (in the order extracted):
    Time: 38 5-2 5-1 8-2 5-0.8
    Author, Keywords: NA 2.1 0 0 6-0, 5-0 2.0-2.6 NA 6-1.0 NA NA 2.1 0. 0, 6-1 5-0 2.0-2.5 NA NA
    Components, Keywords Components: NA 2.1 0.6-1.1 0.5-0.8 2.7-4.1 2.6-2.9 2.8-3.1 1 .8-3.3 NA NA NA NA NA NA NA 2.8-3.9 2.5-3.0 NA NA

  NA - Not Applicable
     - All identical

Table 5:2

Timings were done by hand using a digital stopwatch readable to hundredths of a second. The times are not statistically significant; they are presented simply to give an idea of the system's response. These tests made as little use of the internal data buffering as possible. When queried data is already in a buffer the response is instantaneous.

The micro can be considered sufficiently fast to provide good retrieval access to a small library, and the need for a large computer system does not exist when small libraries are catalogued on-line.

The implementation has also demonstrated that it can provide better access into the library than printed indexes and one's memory. Indexes often cover documents not contained in the library, thus producing irrelevant responses and effectively reducing the precision of the search.
Relying on one's memory is also a poor choice for a catalogue, as the human mind has a way of 'losing' the needed cross-references, especially when they are required in a hurry.

The one major drawback of any catalogue system for use by a professional is that the catalogue data must be keyed by the user. This keying is a time consuming process and its benefits must be carefully weighed.

5.4 File Structure

Utilizing the Prefix B-tree for the handling of the variable length text strings has proven successful. The tree algorithm has minimized the amount of data promoted to the higher levels, thus providing a maximal branching factor. This has been demonstrated in the components index, where the average length of the index term was 7.45 characters, (19495 - 1702*4)/1702, while the average length of the node separators was only 2.07 characters, (352 - 58*4)/58 (58 is correct as one pointer is part of the overhead). Furthermore, the tree is only two levels high with a root only 71.5 percent filled. See Table 5:3 for full details.

The space utilization has been very good, thus confirming the usefulness of this data structure in the library catalogue environment. The level links have been invaluable in the implementation and effectiveness of the browse function. They provide an efficient access method for retrieving neighbouring information.

Storage Space Utilization Factors (bytes)

                              Author    Title  Keywords  Components
Number allocated nodes            10       15        28          60
Total data bytes alloc.         5120     7680     14336       30720
Number of non-leaf nodes           1        1         1           1
Non-leaf used data space          42       72       149         352
  percentage                    8.5%    14.6%     30.3%       71.5%
Non-leaf empty data space        450      420       343         140
  percentage                   91.5%    85.4%     69.7%       28.5%
Non-leaf overhead space           20       20        20          20
Number of leaf nodes               9       14        27          59
Leaf used data space            3481     4916      8747       19495
  percentage                   78.6%    71.4%     65.8%       67.2%
Leaf empty data space            947     1972      4537        9533
  percentage                   21.4%    28.6%     34.2%       32.8%
Leaf overhead space              180      280       540        1180

Table 5:3

The bit map implementation was not as successful as the B-tree; the maps were generally larger than their associated trees. This was due to very poor utilization of the potential storage space of each map. This failure of the bit maps can be attributed to the following two causes: large map allocation amounts relative to the amount of data, and the usage of a map for only one reference. As detailed in Appendix A, the bit map is allocated in records of thirteen sets of offsets and maps which can potentially reference over 200 items. However, the majority of the index terms in the test database reference only one to three documents, thus wasting a large percentage of the allocated space. Fortunately, as the database grows this average will increase and consequently reduce wasted space and improve the space utilization.

The space utilization can be indirectly improved by increasing the size of the entire library catalogue, and also directly by two methods. In the first direct method the allocation amount could be tuned to each size of database.
A smaller amount would be used for the small library and larger amounts for the big database. There is always a trade off: too small an amount will increase the overhead space, and too big an allocation amount will result in the current situation. The second direct method of increasing space utilization is to employ the map only when two or more items are to be referenced. A single reference can be handled directly by the tree, thus requiring no map allocation until the second document is to be referenced.

Despite this initial failure, the bit map structure shows good potential for significant space savings over conventional methods which store the actual document numbers or physical pointers into the referenced data.

6 Conclusions

The major goal of this thesis was to demonstrate the feasibility of a hierarchically structured library catalogue and to demonstrate that this approach permits greater accuracy in the retrieval of knowledge than in conventional systems. The implementation has demonstrated the practicality of employing the three level hierarchy as a catalogue structure. The three level hierarchy enabled greater information to be stored about a given item (in an orderly way) and thus improved the ability of a searcher to determine the desirability of the document.

The implementation raised the question as to whether the three level structure was sufficient. This study should be pursued.
The implementation also showed the weakness of using a document number for linkage. Research should be continued to determine how to make the IDN less visible to the user of the system. This must be improved to allow greater automation in the building of the hierarchy as data is entered. The system must be enhanced to allow additional interdocument linkage, relating documents which are not hierarchically related. Finally, and most importantly, the system must be improved to take greater advantage of the hierarchy and the implied commonality of the parents' attributes to the children. This last point raises both implementation and theoretical questions on which attributes are common from parents to children, and how.

The second goal of the implementation was to demonstrate the capabilities of a micro system and its potential for providing an on-line catalogue for a small library. This goal was the most obviously successful achievement of the implementation. The micro was able to rapidly perform retrievals into the data base. The speed was more than adequate in providing good response to a query. With micros becoming more powerful there should be no problem with too slow a response time, with the exception of complex boolean searches, which are not currently permitted.

The final goal was to find an efficient data structure for the storing of variable text strings and provide an effective cross index of them.
The Prefix B-tree structure proved well suited for the job of handling variable length strings while providing rapid access with minimal I/O and processing time. The structure could be further improved by moving data between brother nodes prior to the splitting of a node.

The bit map structure was not as successful as the B-tree. This was probably due to the choice of allocation size and the relatively small amount of test data rather than to the structure itself. Further studies should be undertaken to determine whether a different allocation size would improve the space utilization of the map files.

The hierarchically structured catalogue does appear to be a better method of structuring a catalogue for on-line retrieval. The system should be further tested with larger amounts of data, paying particular attention to the usefulness of the hierarchy to the user in obtaining better results of document selection.
Bibliography

[ALA 1978] American Library Association; "Anglo-American Cataloging Rules"; 2nd edition; American Library Association; 1978

[BAYER 1977] Bayer, Rudolf; Unterauer, Karl; "Prefix B-Trees"; ACM Transactions on Data Base Systems; March 1977

[BENTLEY 1975] Bentley, Jon Lewis; "Multidimensional Binary Search trees used for associative searching"; Communications of the ACM; September 1975

[BENTLEY 1978] Bentley, Jon Lewis; "Multidimensional Binary Search trees in Data Base Applications"; Carnegie-Mellon University research paper CMU-CS-78-139; September 1978

[COMER 1979] Comer, Douglas; "The Ubiquitous B-tree"; ACM Computing Surveys; June 1979

[DANELIUK 1979] Daneliuk, Faye A; "The Design & Implementation of a Data Base System for Bibliographic Applications on a Minicomputer"; McGill University technical report; SOCS 79.14; August 1979

[DWYER 1981] Dwyer, James R; "Who Rules the Rules?"; Journal of Library Automation; volume 14:3; September 1981

[FASANA 1980] Fasana, Paul J; "1981 and Beyond: Visions and Decisions"; Journal of Library Automation; volume 13:2; June 1980

[HOFFMAN 1976] Hoffman, Herbert H.; "Descriptive Cataloging in a new light: Polemical Chapters for Librarians"; Rayline Printing Co.; 1976

[HOFFMAN 1981] Hoffman, Herbert H.; "A Structure code for Machine Readable Library Catalog record Formats"; Journal of Library Automation; volume 14:2; June 1981

[JACOBS 1979] Jacobs, May Ellen; Woods, Richard; "Online Resource Sharing II"; California Library Authority for Systems and Services; 1979

[LANCASTER 1968] Lancaster, F.W.; "Evaluation of the Medlars Demand Search Service"; U.S. Department of Health, Education and Welfare; 1968

[LEHMAN 1981] Lehman, Philip L.; Yao, S. Bing; "Efficient Locking for Concurrent Operations on B-Trees"; ACM Transactions on Data Base Systems; December 1981

[MARC 1972] Marc Task Group; "Canadian Marc"; The National Library of Canada; 1972

[MCALLISTER 1979] McAllister, Caryl; "Dobis/Libis: An Integrated, On-line Library Management System"; Journal of Library Automation; volume 12:4; December 1979

[MIASTKOWSKI 1981] Miastkowski, Stan; "Information Unlimited - The Dialog Information Retrieval Service"; McGraw-Hill; Byte Magazine; June 1981

[NEWMAN 1979] Newman, William L.; Brodie, Nancy; and others; "Dobis: The Canadian Government Version"; Canadian Library Journal; August 1979

[RAGAN 1978] Ragan, Don P; Jones, Steven A.; "High-level Language Implementation of Bit Map Inverted files"; Computers and Biomedical Research; volume 11, pages 595-612

[RUSH 1980] Rush, James E.; "The MARC Formats: Their Use, Standardization, and Evolution"; Journal of Library Automation; volume 13:3; September 1980

[STIBIC 1980] Stibic, Vladimir; "Personal Documentation for Professionals: Means and Methods"; North-Holland Publishing Co.; 1980

[WEDDELL 1980] Weddell, Grant E.; "Automating Physical Reorganization Requirements at the Access Path Level of a Relational Data Base Management System"; University of British Columbia, Department of Computer Science, technical report 80-3; March 1980

[YARBOROUGH 1975] Yarborough, Judith; "How to Prepare for a Computer Search of ERIC: A Non-technical Approach"; ERIC Clearinghouse on Information Resources; Stanford University; September 1975

Appendix A:

This appendix describes the file structure of the various files used in the library catalogue system. The files are fairly straightforward in nature. The record structure is presented in the order it appears in the actual file. In files with more than one record type, the first byte of every record is reserved for identifying the record type. This record type byte contains the numbers which are used to label each record listed in this appendix.

Definition of Internal Document Number (IDN)

The IDN is used to identify each item stored in the catalogue system. It is composed of two parts: a 2 bit classifier and a 30 bit item number. The 30 bit number is unique across three of the class codes; the fourth is independent. The two most significant bits of the IDN form the classifier; the remaining 30 form the item number. The classifier is used to define which of the four types of bibliographic entities the identifier represents. The number portion is system maintained and sequentially assigned as entries are added to the catalogue. The first three classes identify the logical bibliographic entities and their hierarchical position; the fourth indicates that the entity is a physical work.
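The bit layout of the IDN described above can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the thesis implementation: the names IdnClass, pack_idn and unpack_idn are hypothetical, the class codes are the ones listed under "Classifier definition" in this appendix, and nothing is assumed about the byte order in which the 4 byte IDN is stored on disk.

```python
from enum import IntEnum

ITEM_BITS = 30                      # low 30 bits hold the item number
ITEM_MASK = (1 << ITEM_BITS) - 1    # 0x3FFFFFFF


class IdnClass(IntEnum):
    """Two-bit classifier held in the two most significant bits (hypothetical names)."""
    PHYSICAL = 0b00   # physical object; number is the accession number
    SERIAL = 0b01     # serial/set entry
    JOURNAL = 0b10    # journal/book entry
    ARTICLE = 0b11    # article entry


def pack_idn(cls: IdnClass, number: int) -> int:
    """Combine a classifier and a 30 bit item number into one 32 bit IDN."""
    if not 0 <= number <= ITEM_MASK:
        raise ValueError("item number does not fit in 30 bits")
    return (int(cls) << ITEM_BITS) | number


def unpack_idn(idn: int) -> tuple[IdnClass, int]:
    """Split a 32 bit IDN into (classifier, item number)."""
    if not 0 <= idn <= 0xFFFFFFFF:
        raise ValueError("IDN does not fit in 32 bits")
    return IdnClass(idn >> ITEM_BITS), idn & ITEM_MASK
```

For instance, under these assumptions pack_idn(IdnClass.ARTICLE, 5) yields 0xC0000005, and unpack_idn applied to that value returns the article class together with item number 5. An accession number packs to itself because the physical class code is 00.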
Classifier definition

  1 1 - Entry refers to an article entry
  1 0 - Entry refers to a journal/book entry
  0 1 - Entry refers to a serial/set entry
  0 0 - This refers to a physical object. The number stored is the item's accession number. The number portion is independent of the other three classes; it is user allocated. The number must be unique in its own class.

Biblio.Main.data

This file stores the bibliographic data for all entries in the system. The data is stored as variable length tagged fields. Individual records are linked together to form a complete logical entry. Records are 256 bytes in length.

Record type 10 - Data record

  field     attributes        description
  RECNUM    4 byte integer    Unique record ID (record address)
  CONT      4 byte integer    Continuation pointer to next record
  DOCNUM    4 byte IDN        IDN of the logical entity
  RECLEN    2 byte integer    1st record - length of variable data
                              subsequent - relative record number in list
  FLD       241 characters    Variable data area
                              Three part sequence
                              1. - 1 byte tag
                              2. - 2 byte data length
                              3. - variable data string

Record type 11 - Control record

  field     attributes        description
  D1        4 bytes           dummy
  FLIST     4 byte integer    Pointer to start of free list
  LLIST     4 byte integer    Pointer to end of free list
  D2        4 bytes           dummy
  LINTDOC   4 byte integer    Last assigned IDN
  BYTES     4 byte integer    Size of file

Record type 12 - Empty record

  field     attributes        description
  RNUM      4 byte integer    Unique record ID
  NEXT      4 byte integer    Pointer to next empty record

Direct entry files (DEF)

This file structure is used as an index into the Biblio.Main.data and the Components.data files. Each record is a 4 byte integer which points to a record in the associated data file.

Components

This file stores the various inter-document linkages. These links form both the hierarchical structure and pointers to the physical copies of an item. Records are 32 bytes in length.

Record type 40 - Link storage record

  field     attributes        description
  RECNUM    4 byte integer    Unique record ID
  OWNER     4 byte IDN        Pointer to owner of record
  CONT      4 byte integer    Pointer to next record in list
  MEMBER    4 x 4 byte IDNs   Linkage storage

Record type 41 - Control record

  field     attributes        description
  D1        4 bytes           dummy
  FLIST     4 byte integer    Pointer to start of free list
  LLIST     4 byte integer    Pointer to end of free list
  D2        4 bytes           dummy
  BYTES     4 byte integer    Size of file

Record type 42 - Empty record

  field     attributes        description
  RNUM      4 byte integer    Unique record ID
  NEXT      4 byte integer    Pointer to next empty record

Works.data

This file stores the data pertaining to single physical copies of items stored in the library. Records are 64 bytes in length.

Record type 50 - Physical data storage record

  field      attributes       description
  ACCESSION  4 byte IDN       Accession number of associated item
  OWNER      4 byte IDN       IDN of bibliographic owner
  MEDIA      2 characters     Type of medium comprising the item
  COST       4 byte integer   Cost of item x 100
  LOCATE     12 characters    Location of item in library
  CONDITION  2 characters     Condition of object
  INDIRECT   4 byte integer   Pointer to another Works file record;
                              indicates that this item has been combined
                              with others into the designated item.

Record type 51 - Control record

  field     attributes        description
  D1        4 bytes           dummy
  FLIST     4 byte integer    Pointer to start of free list
  LLIST     4 byte integer    Pointer to end of free list
  D2,D3     4 bytes           dummys
  BYTES     4 byte integer    Size of file

Record type 52 - Empty record

  field     attributes        description
  RNUM      4 byte integer    Unique record ID
  NEXT      4 byte integer    Pointer to next empty record

Biblio.tags

This file contains descriptors of the various tagged data items which can be stored on each entry. The file also contains control information on how to administer the various fields. The tags file is used by the data entry program. Records are 32 bytes in length.
The file is structured as a two level hierarchy: all tag IDs divisible by 10 are group headers; the others are members. See Appendix E for a complete file listing.

Record type 90 - Tag field descriptor

    field     attributes        description
    DUPSER    1 byte            Serial control
    DUPJOUR   1 byte            Journal control
    DUPART    1 byte            Article control
    MENTRY    1 character       Multiline entry if = 'M'
    DESC      27 characters     Tag description

Control codes -

     0   ignore this tag
     1   set to default
     2   0 or once per item
     3   must be present and only once
     4   0 or more times
     5   1 or more times
    12   0 or once per group
    13   must be present once per group
    14   0 or more times per group
    15   1 or more times per group

Works.DEF

This file is used as the index into the works data file. The file is structured as a B-tree. Records are 256 characters in length.
Record type 150 - Tree node or leaf

    field     attributes            description
    RECNUM    4 byte integer        Unique record ID
    NTYPE     1 byte                0 = node, 1 = leaf
    BFACT     2 byte integer        Branching factor of node
    KEY       30 x 4 byte IDN       Accession number storage
    PTRS      31 x 4 byte integer   Pointers into the Works file or
                                    pointers to children nodes

Record type 151 - Control record

    field     attributes        description
    ROOT      4 byte integer    Pointer to B-Tree root node
    FLIST     4 byte integer    Pointer to start of free list
    LLIST     4 byte integer    Pointer to end of free list
    ACTIVE    4 byte integer    Count of active nodes
    ENTRY     4 byte integer    Count of active entries
    BYTES     4 byte integer    Size of file
    H         2 byte integer    Height of tree

Record type 152 - Empty records

    field     attributes        description
    RNUM      4 byte integer    Unique record ID
    NEXT      4 byte integer    Pointer to next free record

Tree files

These files are used in the index section of the library system. They store the terms of the index and provide pointers into the index bit map. The files are structured as prefix B-trees, each node being 512 bytes in length. See main body of thesis for further details of the tree structure.
Record type 160 - Tree node format

    field     attributes        description
    RECNUM    4 byte integer    Unique record ID
    NTYPE     1 character       Node type
                                0 = non-leaf node
                                1 = leaf node
    BUSED     2 byte integer    Bytes of data area in use
    PRED      4 byte integer    Pointer to predecessor node
    SUC       4 byte integer    Pointer to successor node
    PTRO      4 byte integer    Pointer to a node or bit map
    FLD       492 byte field    Variable length data

The FLD field consists of an alternating sequence of variable length strings and four byte integer pointers. Each string is terminated by setting the high order bit of the last character. The length of the node limits the length of terms to a maximum of 488 characters.

Record type 161 - Control record

    field     attributes        description
    ROOT      4 byte integer    Pointer to B-Tree root node
    FLIST     4 byte integer    Pointer to start of free list
    LLIST     4 byte integer    Pointer to end of free list
    ACTIVE    4 byte integer    Count of active nodes
    ENTRY     4 byte integer    Count of active entries
    BYTES     4 byte integer    Size of file
    H         2 byte integer    Height of B-tree

Record type 162 - Empty record

    field     attributes        description
    RNUM      4 byte integer    Unique record ID
    NEXT      4 byte integer    Pointer to next empty node

Bit map files

These files contain the bit maps of the index structure. Each record is 64 bytes in length. See main body of thesis for further details of structure.
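As a rough illustration of the FLD layout given for record type 160 in the tree files, the following Python sketch packs and unpacks the alternating term/pointer sequence. It is not the thesis implementation. The function names, the 7-bit ASCII character set and the big-endian byte order of the pointers are all assumptions made for the example; only the high-order-bit terminator, the four byte pointers and the 492 byte field size come from the description.

```python
import struct

FLD_SIZE = 492             # size of the FLD area (from the record layout)
POINTER_SIZE = 4           # each term is followed by a four byte pointer
MAX_TERM = FLD_SIZE - POINTER_SIZE   # longest term that still leaves room for its pointer

def encode_fld(entries):
    """Pack (term, pointer) pairs; hypothetical helper, assumes non-empty ASCII terms."""
    out = bytearray()
    for term, pointer in entries:
        raw = bytearray(term.encode("ascii"))
        raw[-1] |= 0x80                      # mark the last character of the term
        out += raw + struct.pack(">i", pointer)   # big-endian is an assumption
    if len(out) > FLD_SIZE:
        raise ValueError("entries do not fit in one node")
    return bytes(out)

def decode_fld(fld, used):
    """Split the first `used` bytes (the BUSED count) into (term, pointer) pairs."""
    entries = []
    pos = 0
    while pos < used:
        end = pos
        # Scan forward to the character whose high-order bit is set.
        while (fld[end] & 0x80) == 0:
            end += 1
            if end >= used:
                raise ValueError("unterminated term")
        term = bytes(fld[pos:end]) + bytes([fld[end] & 0x7F])
        pos = end + 1
        (pointer,) = struct.unpack_from(">i", fld, pos)
        pos += POINTER_SIZE
        entries.append((term.decode("ascii"), pointer))
    return entries
```

Reserving one pointer after the longest possible term is what gives the 488 character limit: 492 bytes of FLD less the 4 byte pointer.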
Record type 180 - Bit map storage record

    field     attributes         description
    RECNUM    4 byte integer     Unique record ID
    MAPPTR    4 byte integer     Pointer to next map record in list
    BITCNT    2 byte integer     Count of active bits in entire map
    MAP       13 x (2+2) bytes   13 sets of bit map

Record type 181 - Control record

    field     attributes        description
    D1        4 bytes           dummy
    FLIST     4 byte integer    Pointer to start of free list
    LLIST     4 byte integer    Pointer to end of free list
    D2,D3     2 x 4 bytes       dummies
    BYTES     4 byte integer    Size of data file

Record type 182 - Empty record

    field     attributes        description
    RNUM      4 byte integer    Unique record ID
    NEXT      4 byte integer    Pointer to next empty record

Data Entry file

The data entry file is a temporary measure used to provide the initial data entry into the system. Eventually this batch approach will be replaced with an on-line data entry method. The file is a fixed format text file with variable length lines. Each line is composed of two fields: a three character tag field followed by a 75 character data field. The tag field contains the tag IDs which identify the data item; the tags are also used to delimit entries.
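The line format just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are invented for the example, and joining a `***` line (the tag the delimiter list defines as a continuation of the previous line) directly onto the preceding data, with no separator added, is a guess about how the data entry program treats it.

```python
TAG_WIDTH = 3      # three character tag field
DATA_WIDTH = 75    # up to 75 characters of data follow the tag

def split_line(line):
    """Return (tag, data) for one data entry line (hypothetical helper)."""
    line = line.rstrip("\n")
    if len(line) < TAG_WIDTH:
        raise ValueError("line is shorter than the tag field")
    return line[:TAG_WIDTH], line[TAG_WIDTH:TAG_WIDTH + DATA_WIDTH]

def join_continuations(lines):
    """Merge '***' continuation lines into the entry they continue."""
    merged = []
    for line in lines:
        tag, data = split_line(line)
        if tag == "***":
            if not merged:
                raise ValueError("continuation line with nothing to continue")
            prev_tag, prev_data = merged[-1]
            merged[-1] = (prev_tag, prev_data + data)   # assumption: no separator inserted
        else:
            merged.append((tag, data))
    return merged
```

Lines that carry only a delimiter tag come back with an empty data field, so the same routine serves both data items and entry delimiters.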
The following codes are used as delimiters:

    STR - Start batch
    STS - Start serial entry
    STJ - Start journal entry
    STA - Start article entry
    ENA - End article entry
    ENJ - End journal entry
    ENS - End serial entry
    END - End the batch
    STW - Start a physical work entry
    F02 - Accession number
    F03 - Owner IDN
    F04 - Media type
    F05 - Replacement cost
    F06 - Storage location
    F07 - Physical condition
    ENW - End a physical work
    *** - Continuation of previous line

There may be up to four levels of embedding - article within journal within serial within a batch. The work entry may be within either the journal or the article entry, depending upon which of the two the physical item is. A work will normally be entered in one of two ways.

1) As a journal is received the bibliographic data will be entered journal name - articles. Here the work will refer to the journal entry.

2) When a single article is entered with its physical work, the work entry will be within the article.

Adding additional copies of works will cause the work entry to be outside any enclosing entry (except the batch). See Appendix B for a sample Edit List.

Appendix B

This appendix displays a sample of the edit report produced by the system. The edit feature is essential to ensure the intactness of the delicate hierarchical structure.
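The kind of structural check such an edit pass needs can be sketched with a stack of open delimiters. The Python below is a hypothetical illustration, not the edit program itself: the function name and error wording are invented, and the rule that an entry may open directly inside any shallower level is inferred from the embedding description and from the sample listing, where an article sits directly inside the batch.

```python
OPEN_TO_CLOSE = {"STR": "END", "STS": "ENS", "STJ": "ENJ", "STA": "ENA", "STW": "ENW"}
CLOSERS = set(OPEN_TO_CLOSE.values())
DEPTH = {"STR": 0, "STS": 1, "STJ": 2, "STA": 3}   # batch > serial > journal > article
WORK_PARENTS = {"STR", "STJ", "STA"}               # where a physical work entry may appear

def check_nesting(tags):
    """Return a list of error messages; an empty list means the nesting is intact."""
    errors = []
    stack = []
    for lineno, tag in enumerate(tags, start=1):
        if tag in OPEN_TO_CLOSE:
            parent = stack[-1] if stack else None
            if tag == "STR":
                ok = parent is None
            elif tag == "STW":
                ok = parent in WORK_PARENTS
            else:
                ok = parent in DEPTH and DEPTH[parent] < DEPTH[tag]
            if not ok:
                errors.append(f"line {lineno}: {tag} not allowed inside {parent}")
            stack.append(tag)
        elif tag in CLOSERS:
            if stack and OPEN_TO_CLOSE[stack[-1]] == tag:
                stack.pop()
            else:
                errors.append(f"line {lineno}: unexpected {tag}")
        # Any other tag is a data item and does not affect nesting.
    errors.extend(f"unclosed {tag}" for tag in stack)
    return errors
```

A batch that passes this check has every entry closed in the reverse order it was opened, which is the property the hierarchy in the Components file depends on.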
Data Entry file edit listing

    STR
    STA
    0 2
    10Solution to the Concurrent Deletion Problem for B-trees
    11A
    12TR 80-CS-7
    161980
    20Kwong, Y S
    ++++ WARNING: Author not present in Data Base ++++
    20Wood, D
    ++++ WARNING: Author not present in Data Base ++++
    30B-Trees
    30Concurrent Deletion
    ++++ WARNING: Keyword not present in Data Base ++++
    1401980
    20026W
    $$$$ ERROR: A non-numeric found in a numeric field $$$$
    210The B-tree and its variants are very useful for storing large
    ***amounts of information, particularly on secondary storage. Th
    *** ( )
    *** ( )
    STW
    F02 1004
    F03 0.0
    F04PC
    F05
    F06PC file
    F07GD
    ENW
    ENA
    END
    Normal termination

Figure 6:2

Appendix C

This appendix displays the major screen layouts of the library system as they are seen by the searcher.

NOTE: x = a character string of varying length, n = a number showing the maximum length.
Screen 1 Initial start up display

    Enter (B)rowse, (D)oc, (A)ccession, (L)ogical, (E)nd ->

Screen 2 Index selection prompt

    Select desired index - (A)uthor
                           (T)itle
                           (K)eywords
                           (S)ubjects
                           (C)omponents
                           (I)ssn ->

Screen 3 Prompt for browse starting point

    Enter Search term ->

Screen 4 Short form display of local area

     1 xxxxxxxxxxx
     2 xxxxxxxxxxxx
     3 xxxxxxx
     4 xxxxxxxxxxxxxxxxxxx
     5 xxxxxxxxxxxxxxxxxxxxxxxxx
     6 xxxxxxxxxxxxxx
     7 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     8 xxxxxxxxxxxxxxxxxx
     9 xxxxxxxx
    10 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
       xxxxxxxxxxxxxxxxxxxxxx
    # - More detail
    F - Scroll forward
    B - Scroll backward
    R - Redisplay
    E - End ->

Screen 5 Bibliographic detail display

    3-47
    2 date & time
    10 title
    20 author 1
    20 author 2
    30 keyword 1
    30 keyword 2
    Continue detail? Y
    210 abstract line 1
        abstract line 2
        abstract line n
    Display next or (E)nd?

Screen 6 Physical works display

    Display Physical works? Y
    ACCESSION nnnn OWNER nnnn MEDIA xx COST nn.nn
    LOCATION xxxxxxxxxxxx CONDITION xx

Appendix D

This appendix presents a sample session of querying the library catalogue through the on-line system. Most of the major features of the system are illustrated. Explanatory comments are enclosed in braces.
    ENTER (B)rowse, (D)oc, (A)ccession, (L)ogical, (E)nd -> B     {Screen 1}

    Select desired index - (A)uthor
                         - (T)itle
                         - (K)eywords
                         - (S)ubjects
                         - (C)omponents
                         - (I)ssn -> K                            {Screen 2}
    Enter Search Term -> FREE                                     {Screen 3}

     1 FAMOUS PEOPLE             2 FAST SEARCH UPDATE             {Screen 4}
     3 FAULT TOLERANCE           4 FIGHTER AIRCRAFT
     5 FILE ORGANIZATION         6 FILE SEARCHING
     7 FILE STRUCTURE            8 FILE STRUCTURES
     9 FINLAND                  10 FINNISH LIVELIHOOD
    11 FISHERMEN                12 FISHING
    13 FORGOTTEN CROPS          14 FUNGUS
    15 GAMBLING                 16 GEOGRAPHY
    17 GEOLOGICAL RESEARCH      18 GORALE PEOPLE
    19 GORILLA                  20 GORILLA MATING
    21 GORILLAS
    Enter Request : # - More Detail
                    F - Scroll Forward
                    B - Scroll Backward
                    R - Redisplay
                    E - End -> F                                  {Browsing forward}

     1 GREAT AMERICANS           2 GREAT BARRIER REEF
     3 GREEK MYTHOLOGY           4 GRENADA
     5 GROUNDNUT
    13 HAWAII                   14 HAWAIIAN LANDSCAPE
    23 HORSESHOE CRAB           24 HUMAN SACRIFICE
    Enter Request : # - More Detail
                    F - Scroll Forward
                    B - Scroll Backward
                    R - Redisplay
                    E - End -> 13                                 {Requesting detail}

    2-94                                                          {IDN} {Screen 5}
    2 82:02:26:15:39:11
    10 National Geographic
    15 160:2
    16 Aug 1981
    Continue Detail? Y                                            {Request additional detail}
    30 Maya Art
    30 Sharks
    30 Hawaii
    30 Helsinki
    30 Yellowstone
    50 National Geographic Society
    52 USA
    60 National Geographic Society
    82 Photos
    140 1981
    200 142
    204 141,278
    Display sub-components? Y                                     {Request to view children}

    3-95                                                          {First child}
    2 82:02:26:15:39:27
    10 Sharks Magnificent and Misunderstood
    16 Aug 1981
    20 Clark, Eugenie
    20 Doubilet, David
    24 Phot
    Continue Detail? N
    Display Physical Works of: 95?
    Display Next component? Y

    3-96                                                          {Second child}
    2 82:02:26:15:39:36
    10 Molokai: Forgotten Hawaii
    16 Aug 1981
    20 Starbird, Ethel A.
    20 Cooke, Richard A., III
    24 Phot
    Continue Detail? Y
    30 Hawaii
    30 Molokai
    30 Hawaiian People
    30 Hawaiian Lifestyles
    30 Hawaiian Landscape
    82 Photos
    140 1981
    200 32
    210 The small rural Hawaiian island of Molokai is described in
        its full beauty. The island is starting to attract developers
        and tourists which is dividing the local residents.
    Display Physical Works of: 96? Y                              {Request Physical works}
    ACCESSION 3932 OWNER 94 MEDIA OR COST 1.70                    {Screen 6}
    LOCATION BK SHELVE 0 CONDITION GD
    Display next component? N
    Display Physical Works of: 94? N
    Obtain next document or (E)nd? E

     1 GREAT AMERICANS           2 GREAT BARRIER REEF
     3 GREEK MYTHOLOGY           4 GRENADA
    23 HORSESHOE CRAB           24 HUMAN SACRIFICE
    Enter Request : # - More Detail
                    F - Scroll Forward
                    B - Scroll Backward
                    R - Redisplay
                    E - End -> B

     1 FAMOUS PEOPLE             2 FAST SEARCH UPDATE
     3 FAULT TOLERANCE           4 FIGHTER AIRCRAFT
     5 FILE ORGANIZATION         6 FILE SEARCHING
    21 GORILLAS
    Enter Request : # - More Detail
                    F - Scroll Forward
                    B - Scroll Backward
                    R - Redisplay
                    E - End ->

    3-9
    2 82:01:13:18:09:07
    10 Ubiquitous B-tree
    11 The
    16 June 1979
    20 Comer, Douglas
    Continue Detail? N
    Display Physical Works of: 9? N
    Obtain next document or (E)nd? Y

    3-11                                                          {Second document with same keyword}
    2 82:01:13:18:09:39
    10 Optimal Partial-Match Retrieval when Fields are independently
       Specified
    16 June 1979
    20 Aho, Alfred V.
    20 Ullman, Jeffrey D.
    Continue Detail? N
    Display Physical Works of: 11? N
    Obtain next document or (E)nd? Y                              {No more exist}

     1 FAMOUS PEOPLE             2 FAST SEARCH UPDATE
     3 FAULT TOLERANCE           4 FIGHTER AIRCRAFT
    21 GORILLAS
    Enter Request : # - More Detail
                    F - Scroll Forward
                    B - Scroll Backward
                    R - Redisplay
                    E - End -> E
    Enter Search term ->

    ENTER (B)rowse, (D)oc, (A)ccession, (L)ogical, (E)nd -> L

    Select desired index - (A)uthor
                           (T)itle
                           (K)eywords
                           (S)ubjects
                           (C)omponents
                           (I)ssn -> K                            {Perform a logical "AND"}
    Enter Key 1 -> ITALY                                          {on two terms}
    Select desired index - (A)uthor
                           (T)itle
                           (K)eywords
                           (S)ubjects
                           (C)omponents
                           (I)ssn -> C
    Enter key 2 -> ROAD

    3-79
    2 82:02:25:17:42:22
    10 Down the Ancient Appian way
    16 June 1981
    20 Cerruti, James
    20 Mazzatenta, O. Louis
    24 Photog
    Continue detail? Y
    30 Appian Way
    30 Ancient Rome
    30 Italy
    30 Ancient Road Building
    30 Roman Culture
    30 Horace
    82 Photos
    140 1981
    200 35
    210 The path of the ancient Appian Way is followed describing
        the history of the period, the local attractions and what
        is seen today. A short report is given on the major modern
        cities the road passes through.
    Display Physical Works of: 79? Y
    ACCESSION 3926 OWNER 78 MEDIA OR COST 1.45
    LOCATION BK SHELVE 5 CONDITION GD

    ENTER (B)rowse, (D)oc, (A)ccession, (L)ogical, (E)nd -> E

Appendix E

This appendix presents the current values contained in the Bibliographic tags control file. The various items shown are the currently accepted data elements which can be stored for each level in the hierarchy. The control numbers apply only to a designated level in the hierarchy and describe how that element is to be controlled in that level.
Control Codes

     0   Ignore this tag
     1   Set to default
     2   Zero or once per item
     3   Must be present and only once
     4   Zero or more times
     5   One or more times
    12   Zero or once per group
    13   Must be present once per group
    14   Zero or more times per group
    15   One or more times per group

    Tag ID   Control          Lines   Description
             Ser  Jour  Art
    000       1    1    1             Internal Document Number
    002       1    1    1             Date entered
    010       3    5    3     M       Title of Work
    011      12   12   12             Initial title article
    012      12   12   12             Subtitle
    014      14   14    0     M       Common name
    015       0   12   12             Revision/Ed./volume:no.
    016       0    2    2             Publication date
    018       0   12    4             Owner ID
    019       2   12    0             Coden
    020       4    4    5             Author Indiv. - last; first
    023      12   12   12             Dates
    024      12   12   12             author class eg. editor
    026      12   12   12     M       nationality/employer
    030       4    4    4             Keywords
    032      12   12   12             keyword weight factor
    040       4    4    4             Subject headings
    050       4    4    0             Publisher name
    052      13   13    0             city/country published in
    058       4    4    0             Publisher coden
    060       4    4    4             Corporate Author (name)
    062      12   12   12             street address
    064      12   12   12             city / country
    070       0    2    0             Conference name
    072      12   12   12             conference date
    074      12   12   12             conference location
    080       2    2    2             Type of work (eg film)
    082       2    2    2             Illustration level
    140       0    2    2             Copyright date
    150       2    2    0             L.C. Catalogue number
    155       2    2    0             L.C. Card number
    160       0    2    0             Dewey Decimal number
    170       2    4    0             ISSN / ISBN
    180       0    0    0             Special indexing numbers
    181       4    4    4             Computing Reviews Subjects
    200       0    2    2             Pages
    204       0    2    2             Starting/ending page
    210       0    4    4             Abstract
    212       0   12   12             Language
    214       0   12   12             Organization
    216       0   12   12             Number
    220       4    4    4             Review
    222      12   12   12             language
    224      12   12   12             organization
    226      12   12   12             number
    228      12   12   12             reviewer
    240       4    4    4             Comments
    250       0    0    0             Related works (doc. numbers)
    252       0    4    4             Cross Entry
    255       0    4    0             Sub-components

    Lines: three tags in the range 040-255 are marked M; the rows they
    belong to cannot be determined from this copy.
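To show how the control codes might be applied when an entry is checked, here is a minimal Python sketch. It is an illustration only: the function names and error wording are invented, and the reading of "per group" (each occurrence of a group header tag, a tag ID divisible by 10, opens a new group instance in which its member tags are counted) is an assumption, since the appendix does not spell that rule out. Codes 0 and 1 are simply skipped here.

```python
# (minimum, maximum) occurrences for the per-item codes; None means unbounded.
# Codes 12-15 are treated as the same bounds applied inside each group instance.
BOUNDS = {2: (0, 1), 3: (1, 1), 4: (0, None), 5: (1, None)}

def in_bounds(count, code):
    low, high = BOUNDS[code]
    return count >= low and (high is None or count <= high)

def check_entry(tags, controls):
    """tags: tag IDs of one entry, in order; controls: {tag ID: code} for one level."""
    errors = []
    item_counts = {}
    groups = []                      # one (header, {member: count}) per group instance
    for tag in tags:
        code = controls.get(tag, 0)
        header = tag - tag % 10      # group headers are divisible by 10
        if tag == header:
            groups.append((header, {}))
        if code in BOUNDS:
            item_counts[tag] = item_counts.get(tag, 0) + 1
        elif code - 10 in BOUNDS:
            if groups and groups[-1][0] == header:
                counts = groups[-1][1]
                counts[tag] = counts.get(tag, 0) + 1
            else:
                errors.append(f"tag {tag:03d} has no preceding group header {header:03d}")
    for tag, code in controls.items():
        if code in BOUNDS and not in_bounds(item_counts.get(tag, 0), code):
            errors.append(f"tag {tag:03d} violates control code {code}")
    for header, counts in groups:
        for tag, code in controls.items():
            if code - 10 in BOUNDS and tag - tag % 10 == header:
                if not in_bounds(counts.get(tag, 0), code - 10):
                    errors.append(f"tag {tag:03d} violates control code {code} in group {header:03d}")
    return errors

# Article-level codes for a few tags, copied from the Art column of the table.
ARTICLE = {10: 3, 11: 12, 16: 2, 20: 5, 24: 12, 30: 4}
```

With these article-level codes, an entry holding one title, one initial article, a date, two authors (the first with an author class) and two keywords passes, while an entry with no title is reported against code 3.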
