Design and implementation of an event monitor for the unix operating system Chan, Susan Chui-Sheung 1987

DESIGN AND IMPLEMENTATION OF AN EVENT MONITOR FOR THE UNIX OPERATING SYSTEM

By SUSAN CHUI-SHEUNG CHAN
B.Sc., University of British Columbia, 1982

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
(DEPARTMENT OF COMPUTER SCIENCE)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1987
© Susan Chui-Sheung Chan, 1987

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Abstract

Tuning a computer system effectively requires prior studies on the performance of the system. There are different types of tools available to measure a system: hardware, firmware and software. This thesis presents the design and implementation of an event monitor, which is one type of software tool. The event monitor was developed on a SUN1 workstation running UNIX 4.2bsd version 1.4. Six types of events were selected to be measured, namely transactions, logins/logouts, pageins, pageouts, disk I/Os and forks/exits. The operating system was modified to include probes to trap these events. For a final testing of the event monitor, it was ported and installed onto a SUN3 workstation running UNIX 4.2bsd version 3.2. Measurements collected were analyzed by a capacity planning package, condenser. The results give an indication of the system workload and the system performance. Benchmarks were also set up to measure the overhead incurred by the event monitor.

Contents

Abstract
Contents
List of Figures
List of Tables
Acknowledgement

1 Introduction
  1.1 Thesis Motivations and Objectives
  1.2 Thesis Outline

2 Measurement Techniques
  2.1 Criteria for a Good Measurement Tool
  2.2 Hardware Tools
  2.3 Firmware Tools
  2.4 Software Tools
    2.4.1 Event Detection
    2.4.2 Event Sampling

3 Event Monitor
  3.1 Environment
    3.1.1 Hardware
    3.1.2 Operating System
    3.1.3 Implementation Language
  3.2 Structure of the Event Monitor
    3.2.1 Overall Design
    3.2.2 Buffer Management
    3.2.3 Data Structures
  3.3 Probe Routine
  3.4 Process Synchronization
  3.5 Critical Section
  3.6 Security Measures
  3.7 Compatibility and Dependability

4 Installation
  4.1 Installation Procedures
  4.2 Probes
  4.3 Events to be Measured
    4.3.1 Transactions
    4.3.2 Logins and Logouts
    4.3.3 Paging
    4.3.4 Disk I/Os
    4.3.5 Forks and Exits

5 Testing
  5.1 Measuring a Real System
  5.2 Benchmarks

6 Concluding Remarks
  6.1 Tool Evaluation
    6.1.1 Scope
    6.1.2 Interference
    6.1.3 Accuracy
    6.1.4 Portability
  6.2 Future Enhancements

Bibliography

Appendix
  A User Guide
  B Module Design
  C System Data Structures

List of Figures

2.1 A Hardware Monitor
3.1 Relationship of monitor with rest of UNIX
3.2 Buffer Management Scheme of monitor
3.3 monitor_buf structure
3.4 Layout of data collected by monitor
3.5 The Header Structure
3.6 The Event Record
4.1 The LRU clock hands for UNIX 4.2bsd and UNIX 4.3bsd

List of Tables

4.1 Event Descriptions and their Auxiliary Information
5.1 Summarized measurement results of ubc-csgrads
5.2 Global Statistics of ubc-csgrads
5.3 Variability in CPU time (secs) under the three conditions

Acknowledgement

I would like to thank my supervisor Dr. Sam Chanson for his guidance and cooperation. Thanks must also go to Dr. Son Vuong, who served as the second reader for this thesis. My greatest gratitude should go to Jee Fung Pang, who has given me numerous valuable ideas, and who adapted the condenser package to run on the event data collected. Lastly, I am grateful for all the help I obtained from Frank Pronk and Rick Sample. They have answered many of my questions which might otherwise have taken me a long time to figure out.

Chapter 1
Introduction

1.1 Thesis Motivations and Objectives

The studies on Tuning, Measurements and Performance Evaluations of Computer Systems have gained recognition in Computer Science research. In most cases, tuning a system improves its performance, and sometimes the improvement can be considerable. However, in order to be able to tune a system effectively, the system must first be evaluated. We are specifically interested in the case where the system already exists and is available to be measured.

There are many types of tools which can be used to measure the performance of a system. Each type of tool has its merits as well as drawbacks, and hence the choice of tool depends entirely on the type of measurements desired. For system usage data, such as CPU and disk utilizations, it is most accurate to collect the relevant information as the events occur.
This will require either a hardware measurement tool, which can be quite expensive, or a piece of software, an event monitor, to be inserted into the operating system to monitor the activities of the system. The thesis deals with the design and implementation of a software event monitor.

The UNIX operating system (UNIX is a trademark of AT&T Bell Laboratories) is used extensively both in research and commercial environments. It is a very powerful operating system running on a range of computers from microprocessors to mainframes. The operating system itself, however, does not maintain adequate measurement statistics. Consequently, performance evaluations of the UNIX operating system have proved difficult. However, unlike other operating systems, UNIX was written in the high level language C, which makes it easier to decode than operating systems written in assembler language, and its source is available. In addition, the kernel of the operating system is fairly compact and manageable.

This thesis was motivated by the preceding considerations. It was felt that an event monitor could be developed on a UNIX 4.2bsd operating system to capture and record events as they occur. The data collected by such a monitor can be used to characterize the system's workload, and can be fed into other capacity planning packages to study system performance.

Three of the key concerns in Performance Evaluations are the amount of interference added to the system, the amount of overhead incurred, and the accuracy of the data collected. When designing the event monitor, special attempts were made to cope with these problems. In addition, the event monitor can be used interactively, allowing users to turn it ON or OFF at will. It also gives users the flexibility to select the types of events to be measured, and the number and size of buffers to be used.
Since the event monitor is implemented as a separate process, it can be ported to other versions of UNIX with ease.

1.2 Thesis Outline

The thesis is organized as follows. Following the introduction in Chapter 1, the reader is presented with the different techniques that are available for measuring computer systems in Chapter 2. The design and implementation of the event monitor developed for UNIX is described in Chapter 3. The installation of the event monitor, including where to insert probes to trap some selected events, is discussed in Chapter 4. Measurements collected on a real system are presented in Chapter 5. Evaluations of the event monitor and possible future enhancements conclude the thesis in Chapter 6. Appendix A contains a sample session and thus can be used as a simple guide for first time users of the event monitor. Appendix B contains a detailed module design, which may be helpful for programmers who want to modify the system. Appendix C contains the data structures of some pertinent UNIX system variables.

Chapter 2
Measurement Techniques

There are three main categories of measurement tools: hardware, firmware and software. Each type of tool has its own characteristics, and is suitable for collecting different sets of data. Since the event monitor developed is one type of software tool, software measuring techniques are presented in detail. Hardware and firmware tools are also presented for comparison.

2.1 Criteria for a Good Measurement Tool

Two of the most important criteria for a good measurement tool are its efficiency and accuracy. A measurement tool should be efficient, and should not impose too much extraneous load on the system. Equally important, the data collected by the tool should accurately reflect a system's workload. The accuracy of a tool is determined in part by its resolution, which is the maximum frequency at which events can be detected and correctly recorded.
Inevitably, any measurement tool that does not use an external processor, no matter how perfect its design, will add to the system load, and the data collected will necessarily contain an error margin. Nevertheless, if the overhead is acceptable and can be measured, and the error margin is low, the measurement tool is still useful.

Hardware tools, as discussed below, are superior to software and firmware tools in both efficiency and accuracy, though they are generally more expensive and difficult to use. One cannot, however, compare the different types of tools as if they had equivalent capabilities and applications. Even though some measurements can be taken by any one of the tools, one type is usually preferred. The characteristics of each type of tool are discussed in the following sections.

2.2 Hardware Tools

Hardware monitors are electronic devices connected to specific system points where they can detect voltage levels or pulses characterizing the events to be measured. Since they are completely external to the system, they do not interfere with the system's activities, nor do they add to the system's workload. The only energy consumption is at the point where the connection occurs, but the amount is usually considered negligible. Because of their negligible interference and high resolution (they can detect events at frequencies of 1 MHz or higher), their accuracy is generally higher than that of the other tools.

Hardware monitors are connected to the system via probes. These probes are usually circuits of high impedance, capable of detecting the change in voltage levels. Care must be taken when installing the probes, because at some critical points of the system, the addition of even a slight electrical load can introduce serious system disturbances.
After the signals have been collected by the probes, they are sent through an event filter, a logic module which processes the signals. From the event filter, signals are then sent to a set of counters, one counter for each specific event. At the end of the measurement session, or periodically, depending on the duration, the contents of the counters are written onto a mass storage device, usually disk or tape. The analysis of these data, a process known as data reduction, is usually done off-line to produce reports for capacity planning. A hardware monitor is represented pictorially in Figure 2.1.

[Figure 2.1: A Hardware Monitor. Components shown: clock, probes, logic modules, counters, storage device (magnetic tape).]

Hardware monitors are more sensitive to changes of the system on a physical level, and since they hardly interfere with the system's activities, they are ideal for measuring microscopic events of high frequencies. Examples of such events are the transfer rate of a channel, CPU and device utilizations, and the seek activities of a disk unit. However, it may be difficult to relate the microscopic events to higher level events. Also, installation of a hardware monitor is usually very complicated, because it involves physical connections to the machine; hence, it requires good knowledge of the hardware architecture to place the probes properly.

2.3 Firmware Tools

Firmware tools are measurement tools that are micro-programmed into the system. They are not as common as either hardware or software tools. Many of their characteristics, such as interference, accuracy, resolution and ease of use, lie between those of hardware and software tools. Installations of firmware tools are fairly complicated, and their costs are higher than those of software tools.
2.4 Software Tools

Software tools are programs inserted into the operating system to monitor its activities. This may be done in one of three ways:

1. addition of a program

2. modification of the software to be measured

3. modification of the operating system

The first method is generally preferred because it makes it easier to use the tool when required, and to remove it when not needed. The integrity of the operating system is also preserved. The second method requires the insertion of code at some critical points of the program to be measured. The last method is the most cumbersome, as it involves rewriting part of the operating system, and is usually done because the existing O/S does not provide some of the necessary data.

Since a software monitor competes for resources with the rest of the system, it introduces interference. In particular, the data collected by the software monitor, which is generally large, has to be stored in main memory and written out periodically onto secondary storage devices, thus interrupting the normal I/O activities of the system. The collection and compilation of statistics also consume CPU time. The design of the software monitor can greatly influence the above factors. A good software tool should satisfy the following requirements [Kole71]:

• it should be able to extract quantitative and descriptive data from the system.

• it should require as little modification to the operating system as possible.

• its data collection techniques should not alter the workload characteristics and hence the performance of the measured system.

• it should require as little memory as possible.

Due to the amount of interference, software tools are only good for measuring events of a much lower frequency. They are appropriate for obtaining descriptive and quantitative data, such as page table entries and file access information. Within the software tools domain, there are two distinct measurement techniques: event detection and event sampling, which are discussed in the following sections.

2.4.1 Event Detection

An event in the computer system is defined to be a change in the system's state. Examples of events are the start and end of I/O operations, users logging on or off the system, and the recognition of a page fault.

In event detection, a piece of software, known as an event monitor, is inserted into the operating system; it is capable of collecting and compiling information when an event occurs. Special code, commonly known as probes or hooks, is also placed strategically in various spots of the operating system. When an event of interest occurs, this code causes control to be transferred to the monitor routine. Inside the monitor, relevant information is collected and written into a buffer area, which is a temporary storage. Depending on the size of the buffer and the frequencies of events, the buffer is emptied periodically onto a secondary storage device. This technique preserves the order in which the events occur and provides the necessary data associated with each event. Detailed and accurate workload characterizations can thus be obtained.

Since an event monitor usually deals with a large volume of data, buffer space is extremely critical. In most machines, buffer space is limited, and hence writing to the secondary device has to be done frequently. When an event occurs, if the buffer is full but the transfer of its contents has not been completed, then the question arises as to whether the system should wait for the completion of the transfer. If the system waits, it will be slowed down appreciably; if not, some event data may be lost. It is up to the implementor to decide how to handle this situation.
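The wait-or-lose trade-off can be made concrete with a small C sketch. It is a hypothetical user-space illustration, not code from the monitor: the names event_buf, record_event, flush_events and BUF_CAPACITY are assumptions introduced here. The sketch takes the "do not wait" option, so an event that arrives while the buffer is full is dropped and counted.

```c
#define BUF_CAPACITY 4              /* hypothetical size, tiny on purpose */

struct event_buf {
    int  events[BUF_CAPACITY];      /* stored event identifiers */
    int  count;                     /* number of slots in use */
    long lost;                      /* events dropped while the buffer was full */
};

/* Probe side: store the event, or count it as lost if the buffer is full.
 * Returns 1 if the event was stored, 0 if it was dropped. */
static int record_event(struct event_buf *b, int event_id)
{
    if (b->count == BUF_CAPACITY) {
        b->lost++;
        return 0;
    }
    b->events[b->count++] = event_id;
    return 1;
}

/* Output side: stands in for the transfer to secondary storage.
 * Empties the buffer and returns the number of events flushed. */
static int flush_events(struct event_buf *b)
{
    int n = b->count;
    b->count = 0;
    return n;
}
```

Counting dropped events keeps the measured system from stalling, at the cost of an incomplete trace; the count at least tells the analyst how much was missed.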
2.4.2 Event Sampling

Event sampling is a statistical approach to measuring the behaviour of a computer system. Instead of measuring every event as it occurs, this method collects only selected samples for analysis, from which one can usually estimate, with a high degree of accuracy, parameters that characterize the activities of the computer system.

The main advantage of sampling is that it produces a much smaller set of data, thus reducing the overhead and simplifying the analysis. The problem of buffer management is also less critical. The amount of interference is comparatively lower than with event detection.

There are two types of sampling techniques, count sampling and time sampling. In count sampling, a measurement routine is invoked after a fixed number of predefined events have occurred. The more common technique is time sampling, where measurement routines are invoked at pre-specified time intervals. Sampling intervals can be constant or random. Random sampling is particularly useful when the distribution of the data is unknown.

Sample size has to be fairly large in order that the data collected be representative. The sample interval should be short, such that the distribution of workload is homogeneous. Sampling is suitable for collecting resource usage data, and particularly those data in which the sequencing of events is unimportant.

Chapter 3
Event Monitor

This chapter discusses the design and implementation of the event monitor for the UNIX operating system. Issues of importance are the implementation environment, the buffer management scheme, data structures, process synchronization, critical sections, security measures, compatibility and dependability. For the purpose of brevity, the term monitor is henceforth used synonymously with event monitor.

3.1 Environment

3.1.1 Hardware

The monitor was implemented on a 68000-based SUN workstation (SUN Workstation is a trademark of Sun Microsystems Inc.) named ubc-andrew. It is one of the early SUN1 workstations, which SUN Microsystems no longer manufactures. Even though the implementation did not involve the manipulation of the very low level machine architecture, a knowledge of its structure is useful when designing the monitor and making modifications to the UNIX kernel.

The SUN 68000 Board uses two buses: an internal synchronous bus for communicating with local memory and I/O devices, and the Multibus system bus for referencing additional memory and offboard I/O devices. Seven levels of interrupts, numbered 1 through 7, are recognized by the SUN processor. Level 7 has the highest priority, and level 1 has the lowest. Interrupts are acknowledged and processed for all priority levels greater than the current processor priority level contained in the 68000 status register.

ubc-andrew has 1 Mbyte of main memory. There is a Memory Management Unit (MMU) in the workstation which provides address translation, protection, sharing and memory allocation for multiple processes executing on the 68000 CPU. The MMU consists of a context register, a segment map and a page map. Virtual addresses from the CPU are translated into intermediate addresses by the segment map and then into physical addresses by the page map.

The page size is 2048 bytes, the segment size is 32K bytes (giving 16 pages per segment), and up to 16 contexts can be mapped concurrently. The maximum logical address space that can be mapped simultaneously is 2M bytes.

3.1.2 Operating System

Inasmuch as monitor is one type of software tool, it is to be inserted into the operating system. The target operating system is UNIX 4.2bsd. A very brief overview of the relevant aspects of the operating system is presented below.

The UNIX O/S provides the process abstraction. A process is a program in execution. The system starts up with the init process as process 1, and the page daemon as process 2. New processes can be created by the fork system call.

There are two types of processes: those that reside in the user space, and those that are within the kernel. These two types of processes do not share the same address space; hence, variables cannot be shared between the two layers. Kernel processes have access to all kernel variables, but each user process has its own stack for variables. Communication amongst user processes is via pipes and sockets, while communication between the layers is via system calls. Special kernel routines such as copyin and copyout are required to copy variables in and out of the kernel space.

Associated with each process is a data structure called the process structure (see Appendix C). The process structures of all running processes are linked together in a process table. Each process structure contains everything that is necessary to know about a process when it is swapped out, such as its unique process identifier (an integer), scheduling information and pointers to other control blocks. Associated with each user is a user structure, which contains information such as user id, group id and resource usages for each user (see Appendix C). These two structures provide most of the required data for monitor.

Memory of the system is available in the form of buffers, linked up in three separate queues. The first queue, which contains all the super blocks of the file system, must be kept permanently in main memory. The second queue contains the cache, while the third queue contains I/O buffers for different devices, and also some empty buffers. Buffers for monitor usage come from the third queue. The structure of a UNIX buffer can be found in Appendix C.

3.1.3 Implementation Language

The language used to develop monitor is the C programming language.
The choice of this language is obvious, as almost the entire UNIX operating system is written in C. Initially, it was felt that parts of monitor might have to be coded in assembler language to increase efficiency, but as yet, this has not been found necessary.

3.2 Structure of the Event Monitor

3.2.1 Overall Design

The monitor is implemented partly as a user process, and partly as a kernel process. Because most of the events of interest, such as page faults and disk I/Os, happen within the kernel, the processing of events is most appropriately done within the kernel. To improve efficiency, the large quantity of I/O and the buffer management are also handled within the kernel. There is a user command interface at the user level, which processes user commands, and then does a context switch to pass parameters into the kernel portion of the monitor. Within the kernel, the probe routine awaits the occurrences of events. The inter-relationship between monitor and the rest of UNIX is illustrated in Fig. 3.1. A detailed module design is given in Appendix B.

3.2.2 Buffer Management

Buffer management is handled entirely within the kernel. The monitor allows users to select the number and size of the buffers, but some knowledge of the hardware architecture is essential when optimal usage of buffers is desired. For instance, with the SUN architecture, blocks are always allocated in units of 2048 bytes. Hence, it is only sensible to choose a buffer size that is a multiple of 2048. The number of buffers should also be chosen wisely, such that there is always an overlap between filling buffers and writing them out onto the secondary storage device. The amount of available memory on a machine also governs the number of buffers to be used. For example, with a machine that has only one megabyte of memory, it is not logical to allocate more than 16K bytes for monitor usage.
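The sizing advice above amounts to two pieces of arithmetic, sketched here in C. This is an illustration under stated assumptions rather than the monitor's own logic: the function names round_buf_size and max_buffers are hypothetical, and the 16K byte ceiling is simply the figure the text suggests for a one megabyte machine.

```c
#include <stddef.h>

#define BLOCK_SIZE        2048u           /* allocation unit on the SUN, per the text */
#define MAX_MONITOR_BYTES (16u * 1024u)   /* suggested ceiling for a 1 Mbyte machine */

/* Round a requested buffer size up to the next multiple of BLOCK_SIZE.
 * A request of zero is treated as one block. */
static size_t round_buf_size(size_t requested)
{
    if (requested == 0)
        return BLOCK_SIZE;
    return ((requested + BLOCK_SIZE - 1) / BLOCK_SIZE) * BLOCK_SIZE;
}

/* Largest number of buffers of the given size (after rounding)
 * that fits under the MAX_MONITOR_BYTES ceiling. */
static size_t max_buffers(size_t bufsize)
{
    return MAX_MONITOR_BYTES / round_buf_size(bufsize);
}
```

For instance, a request for 3000-byte buffers is rounded to 4096 bytes, and at most four such buffers fit within 16K bytes.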
With the preceding considerations in mind, monitor is designed to supply suitable default values for buffer size and number of buffers for users who do not have an in-depth knowledge of the machine architecture.

[Figure 3.1: Relationship of monitor with rest of UNIX. Kernel address space: system tables, system buffer pool, probes, events, monitor (kernel portion), event data. User address space: monitor (user command interface), user commands.]

At the outset of monitor, buffers are allocated and linked up cyclically within the kernel. The ring approach is chosen over maintaining two separate linked lists of empty and full buffer queues, because it simplifies the task of pointer re-assignments when moving a buffer from the empty queue to the full queue, and vice versa. Only three pointers are maintained at any one time: a pointer to the current buffer, a pointer to the logical head of the entire buffer pool, and a pointer to the logical head of the full buffer queue. The current pointer, mp, points to the buffer being used to collect data. The full pointer, fp, points to the next buffer to be written out onto the secondary device. Pictorially, the buffer management system of monitor is represented in Fig. 3.2.

[Figure 3.2: Buffer Management Scheme of monitor]

The first 10 bytes of each buffer are reserved for administrative purposes. The actual data storage area is thus bufsize - 10 bytes. The administrative information required is summarized in the monitor_buf structure in Fig. 3.3.

    struct monitor_buf {
        short filled;
        struct buf *nextbp;
        struct monitor_buf *nextmp;
    };

Figure 3.3: monitor_buf structure

The filled field is a status flag, turned on when the buffer is full, and off when the buffer is empty. This flag tells the output routine if there are more buffers to write, and the probe routine if there are empty buffers to use.
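The ring of buffers with a current pointer mp, a full pointer fp and a filled flag can be sketched in user-space C as follows. This is a simplified illustration, not the kernel code: the type and function names are hypothetical, the pool-head pointer is omitted, and the buffers carry no data area.

```c
#include <stddef.h>

struct ring_buf {
    short filled;               /* 1 when full and awaiting output, 0 when empty */
    struct ring_buf *next;      /* next buffer in the ring */
};

struct ring {
    struct ring_buf *mp;        /* current buffer being filled */
    struct ring_buf *fp;        /* next full buffer to write out */
};

/* Link n buffers into a cycle; both pointers start at the first buffer. */
static void ring_init(struct ring *r, struct ring_buf *bufs, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        bufs[i].filled = 0;
        bufs[i].next = &bufs[(i + 1) % n];
    }
    r->mp = r->fp = &bufs[0];
}

/* Filling side: the current buffer if it is empty, or NULL if every
 * buffer in the ring is still waiting to be written out. */
static struct ring_buf *ring_current(struct ring *r)
{
    return r->mp->filled ? NULL : r->mp;
}

/* Filling side: mark the current buffer full and step to the next one. */
static void ring_mark_full(struct ring *r)
{
    r->mp->filled = 1;
    r->mp = r->mp->next;
}

/* Output side: release the oldest full buffer, if any.
 * Returns 1 if a buffer was released, 0 if none was full. */
static int ring_drain_one(struct ring *r)
{
    if (!r->fp->filled)
        return 0;
    r->fp->filled = 0;
    r->fp = r->fp->next;
    return 1;
}
```

Moving a buffer between the "empty" and "full" sets needs only a flag change and a pointer step, which is the simplification the ring is chosen for.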
Two separate pointers are also required to point to the next buffer structure. The necessity of these two pointers may need some explanation. From Appendix C, it can be seen that the actual data area of a UNIX buffer is at the address pointed to by the field b_addr. The monitor buffer's administrative data starts at this address. The address of the entire buffer structure, however, must also be maintained, because the system needs the address when releasing the buffer storage.

A buffer is written out onto disk or tape as soon as it gets filled. When the monitor is turned off, all buffers are deallocated and returned to the system for other usages.

3.2.3 Data Structures

The organization of data collected by monitor is illustrated in Fig. 3.4.

[Figure 3.4: Layout of data collected by monitor: header record, event record, event record, ..., event record, trailer record.]

A header record is written for each invocation of the monitor. Its purpose is to identify the system being measured, and to record the start time of monitor. This information is useful for the capacity planner when analyzing the event data. The structure of the header record is shown in Fig. 3.5.

As each event occurs, an event record is written into the buffer storage. The structure of the event record is shown in Fig. 3.6. There are two parts to an event record, fixed and variable. The fixed portion contains pertinent information for each event, such as event id, user id, process id, real time and cpu time. The event id is computed as event_group*256+event_type, where event_group is one of the possible events and event_type is either the start or the end of the event (see Table 4.1). The event id is a compact way to represent the event_group and event_type. The variable portion is for auxiliary information, and varies for each type of event. A table of events measured on a SUN workstation and associated auxiliary information can be found in Chapter 4.
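The event id formula can be illustrated with a short C sketch. The helper names are hypothetical and the numeric group and type values in the note below are invented for illustration; the actual codes are those of Table 4.1. The sketch assumes a non-negative event_group small enough for the result to fit in a short, and an event_type between 0 and 255.

```c
/* Pack group and type as event_group * 256 + event_type, as stated in the text. */
static short make_event_id(int event_group, int event_type)
{
    return (short)(event_group * 256 + event_type);
}

/* Recover the group: the quotient of the id by 256. */
static int event_group_of(short event_id)
{
    return event_id / 256;
}

/* Recover the type: the remainder of the id by 256. */
static int event_type_of(short event_id)
{
    return event_id % 256;
}
```

With an assumed group code of 3 and type code of 1, the id is 3*256 + 1 = 769, and dividing by 256 gives back group 3 with remainder 1.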
    struct header_record {
        short month;
        short day;
        short year;
        short hour;
        short min;
        short sec;
        short userno;
        char username[32];
        char version[16];
        short cpuid;
        short memory;
        short nusers;
        short mon_version;
    };

Figure 3.5: The Header Structure

    struct event_record {
        short len;
        short event_id;
        short user_id;
        short pid;
        unsigned long cpu_time;
        unsigned long real_time;
        short *auxinfo;   /* may or may not be present */
    };

Figure 3.6: The Event Record

The trailer record is another time stamp, indicating the termination time for monitor. The difference between termination time and start time is the elapsed time for the monitor session.

Associated with each monitor buffer is a 2-byte field, lost_event, which counts the number of events lost while waiting for the next available buffer. If lost_event is too high, the capacity planner may choose to discard the buffer of event records as not being representative of the entire workload.

3.3 Probe Routine

The probe routine is the heart of monitor, awaiting the occurrences of events. There are two entry points to this routine: probe, to be called from the user level, and sys_probe, to be called from the kernel level. The two entry points are necessary because one would like to trap user level events as well as kernel events. The tasks for probe can be summarized as follows:

1. check that monitor is turned on.

2. check that the particular event is selected to be monitored.

3. if no empty buffer is available, increment lost_event.

4. otherwise, fill an event_record and add it to the current buffer.

3.4 Process Synchronization

For maximum efficiency, monitor strives to completely overlap the tasks of filling buffers with event records and writing buffers out onto secondary devices. There are two separate routines to assume the two tasks: sys_probe to fill buffers, and write_buf to invoke an output routine to write buffers out.
The two processes are synchronized by two primitives, sleep and wakeup. Depending on the workload of the system, the type of events to be measured, the speed of I/O drivers, and the number of buffers available, one process may have to wait for signals from the other process before it can continue. For instance, if the events of interest occur at such a rapid pace that all available buffers are filled, then sys_probe has to wait for an empty buffer before it can collect more event records. The proper calling sequences for these primitives are:

    sleep(chan, prio)
    caddr_t chan;
    int prio;

    wakeup(chan)
    caddr_t chan;

The first argument of sleep is by convention the address of a kernel data structure, and the second argument is a scheduling priority. When a process goes to sleep, it gives up the processor until a wakeup occurs, at which time the process enters the scheduling queue at priority prio. The priority, if negative, also prevents the process from being prematurely awakened by some exceptional event, such as a signal. Hence, when sys_probe has to wait for empty buffers, it goes to sleep until it is awakened by write_buf when empty buffers become available.

3.5 Critical Section

The critical section problem arises when several processes try to asynchronously change the contents of a common data area. The updated area may not, in general, contain the intended changes if protection against contention of competing processes is not provided. In the case of monitor, the common data area is the buffer storage. To safeguard the buffer area from contention when it is being filled with an event_record, the interrupt level is raised to level 6, and the old level is restored when the update is done. In UNIX 4.2bsd, the routines spl0, spl1, ..., spl6 can be used to raise or lower the interrupt level.

3.6 Security Measures

Presently, monitor is designed such that only the super-user of the system can invoke it.
Of course, it can easily be modified to give access rights to any user. The potential danger of the second approach, however, cannot be overlooked. If the user of monitor does not have a clear concept of the nature of the events being measured, system workload can increase appreciably and system performance will deteriorate. Since the main objective of monitor is to collect data for performance evaluation studies, it is best to grant access permission only to a user with that objective in mind.

3.7  Compatibility and Dependability

The data collected by monitor is designed mainly to be used by the capacity planning package condenser, developed by Jee Fung Pang as his master's thesis for the Department of Computer Science, University of British Columbia [Pang86]. The event record may have to be modified if it is to be used by other packages, but the changes should be minimal.

Chapter 4

Installation

This chapter presents the installation procedures of the event monitor on a UNIX operating system running on a SUN workstation. To trap events, probes have to be inserted in strategic locations of the operating system. For illustrative purposes, six different types of events are selected to be measured, and the locations of the probes for those events are also discussed.

4.1  Installation Procedures

The event-monitor was developed on a 68000-based SUN workstation running UNIX 4.2bsd, but it can easily be transported to and installed on other compatible UNIX operating systems. The installation procedures are outlined below:

1. Copy the object modules for the event monitor, which include both the user-level and kernel portions, to the target machine. The object modules for these routines are collectively stored in the file monitor.o, with the associated source in the file monitor.c and the header file monitor.h.

2. Make entries in the system entry table for the two kernel routines that are also callable by users from the user level. These routines are setmonitor and probe, and they take 6 and 5 arguments respectively.

3. Make entries in the system C library for the above two routines, such that they can be invoked from the user level of the operating system, which is written in C.

4. Modify the kernel to trap the selected events. See Sec. 4.2.

5. Recompile the UNIX kernel, linking the resident kernel portion of the event monitor with the rest of the system.

6. Install and load the new UNIX kernel.

If the installation is successful, and the target machine meets the minimum memory requirement, then monitor is ready to be used. Refer to Appendix A for a user guide.

4.2  Probes

In order to trap the selected events, probes have to be inserted at the precise locations in the operating system where the events occur. As mentioned previously, there are two separate entry points to the probe routine: one from the user level and one from the kernel level. The calling sequences for these two entry points are:

    probe(event_group, event_type, auxinfo, auxlen);
    int event_group;
    int event_type;
    short *auxinfo;    /* pointer to auxiliary info */
    int auxlen;        /* length of auxiliary info */

    sys_probe(event_group, event_type, auxinfo, auxlen, kern);
    int event_group;
    int event_type;
    short *auxinfo;
    int auxlen;
    int kern;          /* 1 if it's a kernel event; 0 otherwise */

There are flags within the kernel, such as KERN_FORK, KERN_PAGEIN and KERN_TRANS, which are set when the corresponding event is turned on. Before the sys_probe routine is invoked, the appropriate flag is checked to ensure that the event about to occur is selected to be measured. This way, unnecessary invocations of sys_probe are avoided. For example, to trap a pagein, the following statements are inserted into the operating system at the precise location:

    if (KERN_PAGEIN)
        sys_probe(KERN_PAGEIN, START_EVENT, &pf, 2, 1);

4.3  Events to be Measured

For illustrative purposes, six different types of events are selected to be measured. They are transactions, logins/logouts, pageins, pageouts, disk I/Os and forks/exits. The events and their associated auxiliary information are summarized in Table 4.1.

    Event Group   Event Description   Associated Types   Aux info     Auxlen
    1             TRANSACTION         start, end         none         0
    2             LOGIN/LOGOUT        start, end         none         0
    3             PAGEINS             start, end         page no.     2
    4             PAGEOUTS            start, end         page no.     2
    5             DISK I/O            start, end         device no.   2
    6             FORK/EXIT           start, end         none         0

Table 4.1: Event Descriptions and their Auxiliary Information

The following sections give a brief description of how UNIX handles the different events, and the precise locations of the different probes. The descriptions are based on UNIX 4.2bsd.

4.3.1  Transactions

A transaction is defined as an interaction with the system, whether it is input or output. For instance, when the user issues a shell command, he is initiating a transaction. When the command is acted upon by the system, the transaction is terminated.

The system handles input differently, depending on the mode it is in. In NORMAL mode, such as within the shell, input is not processed until a carriage return is encountered. In RAW and CBREAK modes, which are used within editors, input is processed a character at a time; characters are also output without processing. The START_EVENT for NORMAL mode is therefore different from that of the other modes, and occurs when a carriage return is entered. The END_EVENT for all modes is the output of the first character. Probes are inserted in tty_input and tty_output in the file tty.c.

The statistics collected at START_EVENT and END_EVENT can be used to calculate system response time and users' think times.
4.3.2  Logins and Logouts

On each terminal port available for interactive use, init forks a new process, which attempts to open the port for reading and writing. The open succeeds when the terminal is turned on, or when a telephone call is accepted by a dial-up modem. The program getty is then executed by init.

Getty initializes terminal line parameters and prompts the user to type a login name. The login name is passed as an argument to another program, login. Login encrypts the typed password and compares it with the encrypted password string for the login name found in the file /etc/passwd. If they are the same, login sets the uid of the process to that of the user logging in. The START_EVENT of the LOGINOUT probe is placed at the location after the uid is set, in the file login.c. Login then executes a shell, a command interpreter. When the user logs out, the shell process dies. The END_EVENT of the LOGINOUT probe is placed in the routine goodbye(), in the file sh.c.

The LOGINOUT probes are the only two probes that are placed in the user level. The login and logout events usually occur less frequently than the other events.

4.3.3  Paging

Memory pages are arranged into frames, which are represented by the core map, or cmap. This map records the disk block corresponding to a frame that is in use by a process, and also maintains a free-list of frames that are not used by any process.

UNIX 4.2bsd uses a modified Global Clock Least Recently Used (LRU) algorithm for memory management [Quar85]. A software clock hand linearly and repeatedly sweeps all frames of main memory that are available for paging. The reference bit of a page is marked invalid, i.e., reclaimable, when the clock hand sweeps over it. If the page is referenced before the clock hand next reaches it, a page fault occurs, and the page is made valid again.
However, if the page has not been referenced when the clock hand reaches it again, it is reclaimed for other use. Various software conditions are also checked before a page is marked invalid.

A pagein occurs when a process needs a page, and the page is not mapped into a memory frame. This causes the kernel to allocate a frame of main memory, map it into the appropriate process page, and read the proper data into it. A pagein does not necessarily mean a disk I/O. If the required page is still in the process' page table, but has been marked invalid by the last pass of the clock hand, it can be marked valid and used without any I/O transfer. Pages can similarly be retrieved from the memory free-list. If the page has to be fetched from disk, it must be locked during the I/O transfer to prevent data from being corrupted. The pagein probe is inserted in the pagein routine in the file vm_page.c. The probe catches only those pageins that involve a disk I/O.

The pageout algorithm is the LRU clock hand, which was described earlier. The algorithm is implemented in the pagedaemon, which is process 2. The pagedaemon's purpose is to keep the memory free-list large enough that paging demands on memory will not exhaust it. This process spends most of its time sleeping, but a check is done several times per second to see if action is necessary. Whenever the number of free frames falls below a threshold, the process is awakened; thus, if there is always a lot of free memory, the pagedaemon imposes no load on the system because it never runs.

With systems having a large main memory, the clock hand may take a long time to complete a cycle. Thus the second encounter of the hand with a given page has little relevance to the first encounter, and the pagedaemon will have difficulty finding reclaimable page frames.
In 4.3bsd, a second clock hand, which follows behind the first clock hand, reclaims pages that are marked invalid by the first hand (see Fig. 4.1).

[Figure 4.1: The LRU clock hands for UNIX 4.2bsd (one clock hand) and UNIX 4.3bsd (two clock hands)]

The pageout probe has been placed to trap the reclamation of pages. It is inserted in the pageout() routine in the file vm_page.c.

4.3.4  Disk I/Os

This category of events includes I/O associated with the block devices, namely disks and magnetic tapes. Attached to each device driver is a list of buffers, with each buffer assigned a device name and a device address. This list of buffers also acts as a cache for the block devices, as it is always searched first for a desired block on a read request. If the block is found, the data is made available without any physical I/O. If the block is not found, the least recently referenced buffer is used for the transfer. On a write request, the correct buffer is located in the cache and marked "dirty". Physical I/O is deferred until the buffer is reclaimed for a read request.

Buffer I/O routines are collectively stored in the file ufs_bio.c. Probes are inserted in bread(), breada(), bwrite(), and bdwrite(). Only events that cause actual physical I/Os are trapped.

4.3.5  Forks and Exits

Processes in UNIX are created by the fork system call. During a fork, a new entry is allocated in the process table. A process structure is created for the new process (the child process), and all relevant information is copied from the parent process. This copying of information preserves open file descriptors, user and group identifiers, signal handling, and other similar properties of a process. The process id of the child process, however, is different from that of its parent. At the user level, fork returns 0 to the child process, and the process id of the child to the parent process. A process is terminated by the exit system call.
When a process is to be terminated, its parent is notified, its resource utilization statistics are recorded, and all resources allocated to it are returned to the system.

For our purpose of capacity planning, FORK/EXIT is treated as a single event. The START_EVENT is FORK, and its probe is inserted in the fork1() routine in the file kern_fork.c, after the process structure is allocated and the information copied. The END_EVENT is EXIT, and its probe is inserted in the exit() routine in the file kern_exit.c. Forks and exits of processes happen very frequently within the UNIX operating system; hence, this event usually generates a large amount of event data.

Chapter 5

Testing

As a final test, the event-monitor was ported from the development system, a SUN1 workstation running UNIX 4.2bsd version 1.4, and installed on a SUN3 running UNIX 4.2bsd version 3.2. The event-monitor was turned on to measure the system for approximately 6 hours. The data collected were analyzed by the capacity planning package condenser. In addition, benchmarks were set up to determine the interference of the monitor on the system.

5.1  Measuring a Real System

The system measured is a SUN3/260 running UNIX 4.2bsd version 3.2, known as ubc-csgrads. It is a production system that supports most of the research work of the graduate students in the Computer Science Department at UBC. The event-monitor was installed overnight, and was turned on from 9:00 a.m. to 3:00 p.m. on Monday, April 6, 1987 to measure the six types of events outlined in the previous chapter.

The results were analyzed by condenser, which reports system utilization on a per user basis, as well as overall system performance.
Only overall system performance patterns and those of three classes of users (light, medium, heavy) of ubc-csgrads are reported here. The following is a brief description of the different types of statistics collected. For more in-depth discussions, refer to Jee Fung Pang's thesis [Pang86].

Definition of statistics:

• Response time is the average response time for all interactive users in the class.

• Think time is the average think time for all interactive users in the class.

• True I/O is the time for those I/O operations (including queue wait times) not caused by page faults.

• True CPU is the CPU usage excluding any CPU time for page faults.

• Physical Page Fault is the number of page faults that actually cause one or more I/O operations.

• Virtual Page Fault is the number of page faults that do not cause any I/O operations at all.

• Login is the average session length of a terminal user, or the average length of a child process. Logout is the average time between logins, i.e. the average time between two terminal sessions or the average time between two child processes.

• CPU Utilization is the percentage of time that the CPU is utilized during the measurement session.

• Disk Utilization is the percentage of time that the disk is utilized during the measurement session.

• Page Fault Rate is the number of page faults per second.

The following is a summary of measured statistics for ubc-csgrads.
    Monitor started on 04/06/87 09:00:36
    Monitor Version: 1 on 4.2BSD
    Monitor User Name: schan
    Monitor User Number: 1022
    CPU: ubc-csgrads
    Memory: 8 Megabytes
    Maximum users: 12

    Elapsed time = 22377.760 seconds
    Number of events processed = 208536
    Total blocks/buffers read = 1804

    System Parameters              Light      Medium     Heavy      Overall
    Response time (secs)           2.748      6.946      6.870      3.312
    Think time (secs)              9.393      76.050     28.517     13.871
    True CPU (ms)                  1.318      36.927     8017.077   3440.912
    True I/O (ms)                  6411.319   6681.434   9972.275   9080.095
    Physical Page Fault (number)   0          0          2          2
    Virtual Page Fault (number)    198        130        301        629
    Login (secs)                   n/a        n/a        n/a        10.4142
    Logout (secs)                  n/a        n/a        n/a        131.6857

Table 5.1: Summarized measurement results of ubc-csgrads

The global statistics are given in Table 5.2. They indicate that the bottleneck of the system is probably the disk, with a utilization of 57.9837 percent, as compared to the CPU, which has a utilization of only 43.4541 percent. The system has virtually no page faults (0.0282 page faults/sec), which is probably due to the relatively large physical memory of the machine (8 Megabytes).

    Statistic                              True      Virtual   Total
    CPU Utilization (percent)              43.4539   0.0002    43.4541
    Disk Utilization (percent)             57.9837   0.0000    57.9837
    Page Fault Rate (no. page faults/sec)  0.0001    0.0281    0.0282

Table 5.2: Global Statistics of ubc-csgrads

The workload of the system during the measurement session was considered heavier than normal, since it was the end of term and students were finishing term projects and assignments. However, it should be noted that the statistics are means over the 6-hour session, including lunch hours. Peak loads during small intervals in the session will have much higher utilization figures. Nevertheless, the system was not saturated. The results suggest that the system can accommodate more users than its current maximum. The number of lines connected to ubc-csgrads via the switch can also be increased from the present 12.
Classification of users into the three classes (light, medium and heavy) is based on CPU usage. One would expect response time to increase from light to heavy users. This is, however, not necessarily the case. In the execution of a fork, for example, the parent process has to wait for its child process to die, and hence has a large response time, but it is not using any CPU. Similarly, any process that initiates a pipe also has to wait for the other process to complete before exiting. This explains why the average response time for the medium user (6.946 secs) is slightly higher than that for the heavy user (6.870 secs).

The true I/O times collected include disk wait times. When calculating disk utilization, the queuing times have been removed.

5.2  Benchmarks

Benchmarks are artificial and reproducible workloads processed by the system. They enable meaningful comparison of system performance before and after system changes by providing the same workload for each case. To measure the interference added to ubc-csgrads by the installation of monitor, two sets of benchmarks were executed on the system under three different conditions:

1. monitor is not installed.

2. monitor is installed but not turned on.

3. monitor is installed and turned on with six events selected to be measured.

The contents of the two benchmark files are:

Benchmark I:

• echo 'date' Begin MIXTEST 'NUSERS'
• mtime "cc -DSYS3 -c driver.c" &
• sleep 10
• mtime -u2 "ed < edscript" &
• sleep 5
• mtime "sort -r words -o /dev/null" &
• sleep 50
• mtime "nroff -man nroff.1 > /dev/null" &
• sleep 60
• mtime "cp editor.c editor.cc" &
• sleep 24
• mtime "pwd" & mtime "cd /tmp" & fork 5 &
• sleep 15
• mtime "who" & mtime "tsh" &
• sleep 17
• mtime "cc -DSYS3 -c editor.c" &
• mtime "cat driver.c | tr a-z A-Z > driver.cc"
• wait
• echo 'date' End MIXTEST 'NUSERS'

Benchmark II:

• date
• mtime "cc -DSYS3 driver.c -o driver" &
• mtime -u4 "ed < edscript" &
• sleep 5
• mtime "sort -r words -o /dev/null" &
• mtime "nroff -man nroff.1 > /dev/null" &
• mtime "cc -DSYS3 editor.c -o editor" &
• sleep 4
• mtime "pwd" & mtime "cd /tmp" & fork 20 &
• mtime "who" & mtime "tsh" &
• mtime "cp editor.c editor.cc" &
• mtime "cat driver.c | tr a-z A-Z > driver.cc" &
• wait
• date

The two benchmarks differ in the order in which the commands are executed, the number of users executing each command, and the time interval between commands. mtime is a program written in C that spawns subshells to time the execution of a given command by any number of users, as given by the -u option. Each user independently runs the command the number of times indicated by the -r option. mtime is modelled after the standard UNIX time command, and enjoys comparable accuracy. Internally, /bin/sh is called to execute the "command", and the subprocess times are returned via the system call time.

The two sets of benchmarks were executed on a single-user SUN3 machine, ubc-csfs2, with no other workload on the system. Each benchmark was executed ten times, and the CPU time averages for each command were used in the comparison. Variability in response times is not reported, because response times depend largely on the timing and the order in which the commands were executed.

Results of the benchmarks are summarized in Table 5.3. The three conditions under which the benchmarks were executed are as outlined earlier.
    command                  condition 1   condition 2   condition 3
    cc -DSYS3 -c driver.c    7.832         7.836         7.835
    ed edscript              0.133         0.134         0.133
    sort -r words            0.149         0.150         0.152
    nroff -man nroff.1       3.967         3.967         3.968
    cp editor.c editor.cc    0.117         0.117         0.118
    pwd                      0.084         0.083         0.089
    cd /tmp                  0.017         0.016         0.015
    who                      0.162         0.164         0.164
    cc -DSYS3 -c editor.c    12.048        12.047        12.049
    cat driver.c driver.cc   0.534         0.535         0.535

Table 5.3: Variability in CPU time (secs) under the three conditions

From the tabulated results, it can be seen that the largest increase in CPU time from condition 1 to condition 3 is 5 msec, in the pwd command. In some commands, however, the CPU time in condition 3 is even lower than in condition 1. With most commands, whether a large compilation, a text formatting job, or a simple shell command, there is hardly any variability at all. The accuracy of these results depends on the accuracy of the shell time command, but they are good indications that the event-monitor does not increase the system load noticeably.

Chapter 6

Concluding Remarks

This thesis discusses the design and implementation of an event-monitor to be used in a UNIX operating system to trap and record events as they occur. The data collected is used for capacity planning. The event-monitor was developed on a SUN1 workstation, and ported to a SUN3. The tool is evaluated based on four criteria: scope, interference, accuracy and portability. Suggestions for possible future enhancements conclude the thesis.

6.1  Tool Evaluation

6.1.1  Scope

The scope of a measurement tool is the classes of events it can detect. Events can be macroscopic, for example, the number of users logging on, or microscopic, such as the utilization of disks and cpu. Obviously, the wider the scope of a measurement tool, the greater its range of applications. As seen in the earlier chapters, the scope of monitor is very wide.
Probes can be inserted in any part of the kernel, as well as in user programs. The only limitation of monitor is that it cannot be used to probe events below the kernel layer, such as hardware events that occur at the instruction and macro layers. Being a software tool, it is also not suitable for measuring events that occur with very high frequency.

6.1.2  Interference

Every measurement tool extracts energy from the system. Interference can be classified in terms of resources and memory utilization. For monitor, the amount of interference introduced depends heavily on the workload, and on the types of events selected. When designing monitor, special care was taken to allow the tool to make optimal use of the resources available. For instance, the user can select the number and size of buffers to suit the particular system. Output data can reside on disk if space is available, but can also be stored on magnetic tape to conserve space. Interference introduced on ubc-csgrads was found to be minimal in the previous chapter.

6.1.3  Accuracy

The accuracy of a tool is often reflected by the error affecting the data collected. Due to the interference caused by the event-monitor, the time statistics (cpu time and real time) collected in the event record necessarily contain an error, which fortunately should be constant for all types of events. Since events are always probed in pairs, START_EVENT and END_EVENT, analysis performed using the differences in statistics between each pair of probes eliminates the effect of the overhead. Generally, if probes are well placed within the operating system, data collected by monitor is very accurate, since events are always trapped as they occur.

6.1.4  Portability

monitor is implemented mainly as an individual kernel process, which functions independently from the rest of the system.
It does not depend on very low level registers or on the machine architecture. As a result, monitor can easily be ported to other systems running compatible versions of UNIX. For this thesis, monitor was initially developed on a SUN1 workstation running UNIX 4.2bsd version 1.4, but was later installed on a SUN3 running UNIX 4.2bsd version 3.2.

6.2  Future Enhancements

This event-monitor is designed to run on a centralized system with a single processor. With the advent of distributed computing and computer systems with multiple cpus, there is a need for measurement tools for these more complicated systems. monitor can be adapted to collect statistics for systems with multiple cpus, and perhaps to measure the traffic flow across networks.

Bibliography

[Ferr83]  D. Ferrari, G. Serazzi, A. Zeigner, Measurement and Tuning of Computer Systems, Prentice-Hall, Englewood Cliffs, N.J., U.S.A., 1983.

[Hu86]  I. Hu, Measuring File Access Patterns in UNIX, Performance Evaluation Review, Vol. 14, #2, 1986.

[Kern78]  B.W. Kernighan, D.M. Ritchie, The C Programming Language, Prentice-Hall, Englewood Cliffs, N.J., U.S.A., 1978.

[Kern84]  B.W. Kernighan, R. Pike, The UNIX Programming Environment, Prentice-Hall, Englewood Cliffs, N.J., U.S.A., 1984.

[Kole71]  K. Kolence, A software view of measurement tools, Datamation, pp. 32-38, Jan. 1971.

[Lion77]  J. Lions, A Commentary on the UNIX operating system, Dept. of Computer Science, University of New South Wales, 1977.

[Pang86]  J.F. Pang, Characterizing User Workload for Capacity Planning, M.Sc. Thesis, University of British Columbia, Oct. 1986.

[Quar85]  J.S. Quarterman, A. Silberschatz, J.L. Peterson, 4.2BSD and 4.3BSD as Examples of the UNIX System, Computing Surveys, Vol. 17, #4, Dec. 1985.

[Ritc78]  D.M. Ritchie, K. Thompson, The UNIX time-sharing system, Bell Sys. Tech. J., Vol. 57, #6, Jul. 1978.

[SUN82]  Programmer's Reference Manual for the SUN Workstation Version 1.0, SUN Microsystem Inc., Oct. 1982.

[SUN83]  SUN 68000 Board, Revision B, SUN Microsystem Inc., Feb. 1983.

[SUN85]  System Interface Manual for the SUN Workstation, SUN Microsystem Inc., Oct. 1985.

[Thom78]  K. Thompson, UNIX Implementation, Bell Sys. Tech. J., Vol. 57, #6, Jul. 1978.

Appendix A

User Guide

This appendix serves as a quick user guide to the user command monitor. An overview of the command line options is given, followed by a sample session with the monitor.

The command line options for monitor are as follows:

    monitor -on | -off | -status | -help [-buffer nbufs bufsize] [-event evtnos] [-file fname]

The meanings of the options are elaborated below. Options separated by commas are equivalent.

-on | -off | -status, -s | -help, -h

One of the above options must be selected, otherwise the monitor command is void. -on turns monitor on, -off turns monitor off, -s queries the status of the monitor, and -h prints out help information.

-buffer, -b nbufs bufsize

This is an optional parameter. It takes two arguments, the number of buffers (nbufs) and the size of each buffer (bufsize) to be used by monitor. If not given, defaults suitable to the particular system are supplied.

-events, -e evtnos

This is also optional. Monitor can accommodate up to a pre-defined maximum number of events. Evtnos can be a list of event-groups separated by blanks. Currently, only events 1 to 6 are defined. They are, respectively, transaction, login/logout, pagein, pageout, disk I/O and forks/exits. Only the events corresponding to the given evtnos are monitored.

-file, -f filename

An optional parameter, which takes an output file name as an argument. If the output file exists, the user will be queried before it is overwritten. If the file does not exist, it will first be created. The default output file is monitor.out.

A sample session is given below. User's input is given in bold face.
    % monitor -on -b 2 2048 -e 1 3
    MONITOR version 1.0 - April 1987
    Number of buffers to be used : 2
    Size of each buffer (bytes) : 2048
    Output file name : monitor.out
    Events to be monitored are :
    1. Transaction
    3. Pagein

    % monitor -on
    MONITOR is already turned on
    % monitor -s
    Status of MONITOR version 1.0 - April 1987
    Number of buffers being used : 2
    Size of each buffer (bytes) : 2048
    Events being monitored are :
    1. Transaction
    3. Pagein
    % monitor
    Usage: monitor -on | -off | -s | -h [-b nbufs bufsize] [-e evtnos] [-f filename]
    % monitor -off
    MONITOR is turned OFF
    Event data collected in file monitor.out
    %

Appendix B

Module Design

This appendix gives a brief module design of the event-monitor. It may serve as a guide for programmers who want to modify or update the system. There are two portions to the event-monitor, one residing in the user address space, and the other in the kernel address space. The relationship between the two parts is illustrated in Fig. 3.1.

For the modules below, a design/execution level number is included to indicate the module's position in the monitor's hierarchical algorithm. A brief description of each module's algorithm is also included.

User Level

main(argc,argv) - level 0
int argc;
char **argv;

The main program acts as the interface between the user and the kernel portion of monitor. It records the startup time of the monitor, and sets default values for the buffer size, the number of buffers and the output file. It then processes the user command line, allowing user options to override the default values. It creates the output file if it does not exist; otherwise it asks the user whether the existing file may be emptied. If everything seems fine, it echoes the parameters and options of monitor that are in effect. It then forks a child process, which does a context switch into the kernel. The parent process exits and dies.
GetTime(tmbuf) - level 1
struct timedat *tmbuf;

It is called from the main program to return the time of day. It uses the system routines gettimeofday and localtime to get the time and convert it to the form mm:dd:yy:hh:mm:ss.

MapArg(arg) - level 1
char *arg;

It is called from the main program to return the next argument on the user command line. A -1 is returned if it is an illegal argument.

Kernel Level

setmonitor(tmbuf,fdes,nbufs,bsize,flag,event_on) - level 0
struct timedat *tmbuf;
int fdes, *nbufs, *bsize, flag, event_on;

This is a system call, i.e., callable from the user level. It is invoked by the main routine in the user level. If the option is -status, the current values of mon_numbufs, mon_bufsize and kern_event_on are copied out to the user address space. If the option is -off, haltmonitor is called to stop the monitor. If the option is -on, then the relevant system variables are set with values passed from the user level. Other routines are also invoked to allocate buffer storage, to write the header record, and to write out a full buffer when one is available.

writebuf() - level 1

As long as the monitor is turned on, this routine makes sure that any full buffers get written out, by calling the routine dump_buffer_to_file. When the monitor is turned off, it also calls monitor_off to do the final clean up.

clear_buffer(bp,bphead) - level 1
struct buf *bp;
struct monitor_buf *bphead;

It clears the storage area pointed to by bp, such that it can be used for storing event data. The position of the buffer within the cyclic buffer pool is preserved.

dump_buffer_to_file() - level 2

This routine calls the low level I/O routine rdwri to dump a buffer onto a peripheral device, usually a disk file or a magnetic tape. The device is specified by the inode pointer.

getstorage() - level 1

This routine transforms a UNIX buffer to a monitor buffer.
The first 10 bytes of the monitor buffer are set aside for administrative purposes. The rest of the buffer is cast to type short.

PutHeader(tm) - level 1
timedat *tm;

The header record is filled with the necessary information, and sys_probe is invoked with HEADER as the event-group.

haltmonitor() - level 1

This routine is invoked when the monitor is to be turned off. It writes the trailer record, resets all the kernel variables, and wakes up any process waiting for a buffer to be filled.

monitor_on() - level 1

This routine is invoked at the start of the monitor. Its main task is to allocate the specified storage buffers and link them up into a cyclic pool. It initializes the current buffer pointer, mp, the full buffer pointer, fp, and the pseudo head of the buffer pool, hp.

monitor_off() - level 1

This routine is invoked at the end of the monitor session to return all the buffer storage areas to the system.

probe(group,type,auxinfo,auxlen) - level 0
int group, type, auxlen;
short *auxinfo;

This is another system call. It is designed to be invoked from the user layer, so that events of interest happening in the user address space can also be recorded. An example of such an event is login. This routine does not collect any statistics itself. It calls sys_probe to process the event.

sys_probe(group,type,auxinfo,auxlen,kern) - level 1
int group, type, auxlen, kern;
short *auxinfo;

This is the central routine that processes an event-record. It first checks whether buffer storage is available. If no buffer is available, it increments the lost_events count and exits; it does not wait for a buffer to become available. If a buffer is available, it collects the necessary information for the event-record and writes it into the buffer. When a buffer is filled, it wakes up the process sleeping on the full pointer fp if necessary. It then calls getstorage to get another buffer ready for event data.

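The interplay of the cyclic buffer pool, the non-blocking sys_probe and the writer can be sketched as a small user-space simulation. This is only an illustration under stated assumptions, not the kernel code: the names mp, fp and lost_events follow the descriptions above, while struct pool, pool_init, sys_probe_sim, drain_one, the record layout (group, type, auxlen, then the auxiliary shorts) and the tiny buffer size are hypothetical. The real routines sleep and wake up processes, which is reduced here to a full flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUFS        2           /* buffers in the cyclic pool (illustrative) */
#define BUFSIZE      64          /* bytes per buffer, kept tiny on purpose */
#define ADMIN_BYTES  10          /* administrative area at the start of a buffer */
#define DATA_SHORTS  ((BUFSIZE - ADMIN_BYTES) / sizeof(short))

struct monitor_buf {
    struct monitor_buf *next;    /* link to the next buffer in the cycle */
    int full;                    /* set while waiting to be written out */
    size_t used;                 /* shorts already filled with event data */
    short data[DATA_SHORTS];     /* event area, addressed as shorts */
};

struct pool {
    struct monitor_buf bufs[NBUFS];
    struct monitor_buf *mp;      /* buffer currently being filled */
    struct monitor_buf *fp;      /* oldest buffer that may need writing */
    unsigned long lost_events;   /* events dropped for lack of storage */
};

/* Link the buffers into a cycle and point mp and fp at the first one. */
void pool_init(struct pool *p)
{
    int i;

    memset(p, 0, sizeof *p);
    for (i = 0; i < NBUFS; i++)
        p->bufs[i].next = &p->bufs[(i + 1) % NBUFS];
    p->mp = p->fp = &p->bufs[0];
}

/* Record one event without ever blocking. Returns 0 if stored, -1 if lost. */
int sys_probe_sim(struct pool *p, int group, int type,
                  const short *auxinfo, int auxlen)
{
    size_t need;
    int i;

    if (auxlen < 0 || (size_t)auxlen + 3 > DATA_SHORTS || p->mp->full) {
        p->lost_events++;        /* no usable buffer: count the loss and leave */
        return -1;
    }
    need = (size_t)auxlen + 3;
    if (p->mp->used + need > DATA_SHORTS) {
        p->mp->full = 1;         /* the kernel would wake the writer here */
        p->mp = p->mp->next;
        if (p->mp->full) {       /* writer has not caught up yet */
            p->lost_events++;
            return -1;
        }
        p->mp->used = 0;
    }
    p->mp->data[p->mp->used++] = (short)group;
    p->mp->data[p->mp->used++] = (short)type;
    p->mp->data[p->mp->used++] = (short)auxlen;
    for (i = 0; i < auxlen; i++)
        p->mp->data[p->mp->used++] = auxinfo[i];
    return 0;
}

/* Stand-in for the writer: release the oldest full buffer. Returns 1 if one was released. */
int drain_one(struct pool *p)
{
    if (!p->fp->full)
        return 0;
    /* A real writer would dump p->fp->data to the output file here. */
    p->fp->used = 0;
    p->fp->full = 0;
    p->fp = p->fp->next;
    return 1;
}
```

In this sketch, once every buffer is full and none has been drained, further events only increase lost_events. As soon as drain_one releases the oldest buffer, recording resumes. This mirrors the design choice described above: the probe never waits, so a slow writer costs lost events rather than delaying the code that triggered the event.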
Appendix C

System Data Structures

This appendix gives the data structures of some UNIX system variables.

/*
 * The process structure
 *
 * One structure allocated per active process. It contains all data
 * needed about the process while the process may be swapped out.
 * Other per process data (user.h) is swapped with the process.
 */
struct proc {
    struct proc *p_link;        /* linked list of running processes */
    struct proc *p_rlink;
    struct pte *p_addr;         /* u-area kernel map address */
    char p_usrpri;              /* user-priority based on p_cpu and p_nice */
    char p_pri;                 /* priority, negative is high */
    char p_cpu;                 /* cpu usage for scheduling */
    char p_stat;
    char p_time;                /* resident time for scheduling */
    char p_nice;                /* nice for cpu usage */
    char p_slptime;             /* time since last block */
    char p_cursig;
    int p_sig;                  /* signals pending to this process */
    int p_sigmask;              /* current signal mask */
    int p_sigignore;            /* signals being ignored */
    int p_sigcatch;             /* signals being caught by user */
    int p_flag;
    short p_uid;                /* user id, used to direct tty signals */
    short p_pgrp;               /* name of process group leader */
    short p_pid;                /* unique process id */
    short p_ppid;               /* process id of parent */
    u_short p_xstat;            /* Exit status for wait */
    struct rusage *p_ru;        /* mbuf holding exit information */
    short p_poip;               /* page outs in progress */
    short p_szpt;               /* copy of page table size */
    size_t p_tsize;             /* size of text (clicks) */
    size_t p_dsize;             /* size of data space (clicks) */
    size_t p_ssize;             /* copy of stack size (clicks) */
    size_t p_rssize;            /* current resident set size in clicks */
    size_t p_maxrss;            /* copy of u.u_limit[MAXRSS] */
    size_t p_swrss;             /* resident set size before last swap */
    swblk_t p_swaddr;           /* disk address of u area when swapped */
    caddr_t p_wchan;            /* event process is awaiting */
    struct text *p_textp;       /* pointer to text structure */
    struct pte *p_p0br;         /* page table base P0BR */
    struct proc *p_xlink;       /* linked list of procs sharing same text */
    short p_cpticks;            /* ticks of cpu time */
    long p_pctcpu;              /* %cpu for this process during p_time */
    short p_ndx;                /* proc index for memall (because of vfork) */
    short p_idhash;             /* hashed based on p_pid for kill+exit+... */
    struct proc *p_pptr;        /* pointer to process structure of parent */
    struct itimerval p_realtimer;
    struct quota *p_quota;      /* quotas for this process */
#ifdef sun
    struct context *p_ctx;      /* pointer to current context */
#endif
};

/*
 * The User Structure
 */
struct user {
    struct pcb u_pcb;
    struct proc *u_procp;       /* pointer to proc structure */
    int *u_ar0;                 /* address of users saved R0 */
    char u_comm[MAXCOMLEN + 1];

/* syscall parameters, results and catches */
    int u_arg[8];               /* arguments to current system call */
    int *u_ap;                  /* pointer to arglist */
    label_t u_qsave;            /* for non-local gotos on interrupts */
    char u_error;               /* return error code */
    union {                     /* syscall return values */
        struct {
            int R_val1;
            int R_val2;
        } u_rv;
#define r_val1 u_rv.R_val1
#define r_val2 u_rv.R_val2
        off_t r_off;
        time_t r_time;
    } u_r;
    char u_eosys;               /* special action on end of syscall */

/* 1.1 - processes and protection */
    short u_uid;                /* effective user id */
    short u_gid;                /* effective group id */
    int u_groups[NGROUPS];      /* groups, 0 terminated */
    short u_ruid;               /* real user id */
    short u_rgid;               /* real group id */

/* 1.2 - memory management */
    size_t u_tsize;             /* text size (clicks) */
    size_t u_dsize;             /* data size (clicks) */
    size_t u_ssize;             /* stack size (clicks) */
    struct dmap u_dmap;         /* disk map for data segment */
    struct dmap u_smap;         /* disk map for stack segment */
    struct dmap u_cdmap, u_csmap; /* shadows of u_dmap, u_smap, for
                                     use of parent during fork */
    label_t u_ssave;            /* label variable for swapping */
    size_t u_odsize, u_ossize;  /* for (clumsy) expansion swaps */
    time_t u_outime;            /* user time at last sample */

/* 1.3 - signal management */
    int (*u_signal[NSIG])();    /* disposition of signals */
    int u_sigmask[NSIG];        /* signals to be blocked */
    int u_sigonstack;           /* signals to take on sigstack */
    int u_oldmask;              /* saved mask from before sigpause */
    int u_code;                 /* "code" to trap */
    struct sigstack u_sigstack; /* sp & on stack state variable */
#define u_onstack u_sigstack.ss_onstack
#define u_sigsp u_sigstack.ss_sp

/* 1.4 - descriptor management */
    struct file *u_ofile[NOFILE]; /* file structures for open files */
    char u_pofile[NOFILE];      /* per-process flags of open files */
#define UF_EXCLOSE 0x1          /* auto-close on exec */
#define UF_MAPPED 0x2           /* mapped from device */
    struct inode *u_cdir;       /* current directory */
    struct inode *u_rdir;       /* root directory of current process */
    struct tty *u_ttyp;         /* controlling tty pointer */
    dev_t u_ttyd;               /* controlling tty dev */
    short u_cmask;              /* mask for file creation */

/* 1.5 - timing and statistics */
    struct rusage u_ru;         /* stats for this proc */
    struct rusage u_cru;        /* sum of stats for reaped children */
    struct itimerval u_timer[3];
    int u_XXX[3];
    time_t u_start;
    short u_acflag;

/* 1.6 - resource controls */
    struct rlimit u_rlimit[RLIM_NLIMITS];
    struct quota *u_quota;      /* user's quota structure */
    int u_qflags;               /* per process quota flags */

/* BEGIN TRASH */
    char u_segflg;              /* 0:user D; 1:system; 2:user I */
    caddr_t u_base;             /* base address for IO */
    unsigned int u_count;       /* bytes remaining for IO */
    off_t u_offset;             /* offset in file for IO */
    union {
        struct {                /* header of executable file */
            int Ux_mag;         /* magic number */
            unsigned Ux_tsize;  /* text size */
            unsigned Ux_dsize;  /* data size */
            unsigned Ux_bsize;  /* bss size */
            unsigned Ux_ssize;  /* symbol table size */
            unsigned Ux_entloc; /* entry location */
            unsigned Ux_unused;
            unsigned Ux_relflg;
        } Ux_A;
        char ux_shell[SHSIZE];  /* #! and name of interpreter */
    } u_exdata;
#define ux_mag Ux_A.Ux_mag
#define ux_tsize Ux_A.Ux_tsize
#define ux_dsize Ux_A.Ux_dsize
#define ux_bsize Ux_A.Ux_bsize
#define ux_ssize Ux_A.Ux_ssize
#define ux_entloc Ux_A.Ux_entloc
#define ux_unused Ux_A.Ux_unused
#define ux_relflg Ux_A.Ux_relflg
    caddr_t u_dirp;             /* pathname pointer */
    struct direct u_dent;       /* current directory entry */
    struct inode *u_pdir;       /* inode of parent directory of dirp */
/* END TRASH */

    struct uprof {              /* profile arguments */
        short *pr_base;         /* buffer base */
        unsigned pr_size;       /* buffer size */
        unsigned pr_off;        /* pc offset */
        unsigned pr_scale;      /* pc scaling */
    } u_prof;
    struct nameicache {         /* last successful directory search */
        int nc_prevoffset;      /* offset at which last entry found */
        ino_t nc_inumber;       /* inum of cached directory */
        dev_t nc_dev;           /* dev of cached directory */
        time_t nc_time;         /* time stamp for cache entry */
    } u_ncache;
#ifdef sun
    int u_lofault;              /* catch faults in locore.s */
    int u_memropc[12];          /* state of ropc */
    struct skyctx {
        unsigned usc_regs[8];   /* the Sky registers */
        short usc_cmd;          /* current command */
        short usc_used;         /* user is using Sky */
    } u_skyctx;
    struct hole {               /* a data space hole (no swap space) */
        int uh_first;           /* first data page in hole */
        int uh_last;            /* last data page in hole */
    } u_hole;
#endif
    int u_stack[1];
};

/*
 * The UNIX buffer structure
 */
struct buf {
    long b_flags;               /* too much goes here to describe */
    struct buf *b_forw, *b_back; /* hash chain (2 way street) */
    struct buf *av_forw, *av_back; /* position on free list if not BUSY */
#define b_actf av_forw          /* alternate names for driver queue */
#define b_actl av_back          /* head - isn't history wonderful */
    long b_bcount;              /* transfer count */
    long b_bufsize;             /* size of allocated buffer */
#define b_active b_bcount       /* driver queue head: drive active */
    short b_error;              /* returned after I/O */
    dev_t b_dev;                /* major+minor device name */
    union {
        caddr_t b_addr;         /* low order core address */
        int *b_words;           /* words for clearing */
        struct fs *b_fs;        /* superblocks */
        struct csum *b_cs;      /* superblock summary information */
        struct cg *b_cg;        /* cylinder group block */
        struct dinode *b_dino;  /* ilist */
        daddr_t *b_daddr;       /* indirect block */
    } b_un;
    daddr_t b_blkno;            /* block # on device */
    long b_resid;               /* words not transferred after error */
#define b_errcnt b_resid        /* while i/o in progress: # retries */
    struct proc *b_proc;        /* proc doing physical or swap I/O */
    int (*b_iodone)();          /* function called by iodone */
    int b_pfcent;               /* center page when swapping cluster */
#ifdef sun
    caddr_t b_saddr;            /* saved address */
    short b_kmx;                /* saved kernelmap index */
    short b_npte;               /* number of pte's mapped */
#endif
};
