UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A portable real time threads environment Mechler, Roland 1997

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


831-ubc_1997-0262.pdf [ 3.92MB ]
JSON: 831-1.0051524.json
JSON-LD: 831-1.0051524-ld.json
RDF/XML (Pretty): 831-1.0051524-rdf.xml
RDF/JSON: 831-1.0051524-rdf.json
Turtle: 831-1.0051524-turtle.txt
N-Triples: 831-1.0051524-rdf-ntriples.txt
Original Record: 831-1.0051524-source.json
Full Text

Full Text

A P o r t a b l e R e a l T i m e T h r e a d s E n v i r o n m e n t by Roland Mechler B.Sc, University of British Columbia, 1992 A THESIS S U B M I T T E D IN PARTIAL F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Master of Science in T H E F A C U L T Y O F G R A D U A T E STUDIES (Department of Computer Science) We accept this thesis as conforming to the required standard T h e U n i v e r s i t y o f B r i t i s h C o l u m b i a April 1997 © Roland Mechler, 1997 In p resent ing this thesis in partial fulf i lment of the requ i rements for an a d v a n c e d d e g r e e at the University of British C o l u m b i a , I agree that the Library shall make it freely available for re ference and study. I further agree that p e r m i s s i o n for extensive c o p y i n g of this thesis fo r scholar ly p u r p o s e s m a y b e g ran ted b y the h e a d o f m y depar tment or by his o r her representat ives. It is u n d e r s t o o d that c o p y i n g o r pub l ica t ion of this thesis for financial gain shall not b e a l l o w e d wi thout my writ ten p e r m i s s i o n . D e p a r t m e n t o f Cp W\ pU 'f&f S C \ & V \ C C ' T h e Universi ty of Brit ish C o l u m b i a V a n c o u v e r , C a n a d a p a t e Apr; I i o , mi DE-6 (2/88) A b s t r a c t Real Time Threads (RT Threads) is a threads package which provides real time scheduling semantics, application portability to a variety of host operating systems and hardware architectures, and support for distributed application programming via location transparent message passing between threads in distinct address spaces. RT Threads was initially designed as a programming environment for the devel-opment of the Continuous Media File System (CMFS) , a distributed multimedia application requiring real time scheduling of asynchronous tasks. In particular, real time scheduling is required for timely disk reading and network transmission of con-tinuous media data at the server nodes, as well as for synchronization of multiple media streams at clients. The goal of this thesis is to show that RT Threads has performance characteristics making it practical for use both as a general purpose threads package, and as a platform for developing applications requiring real time scheduling. In fact, it will be shown that RT Threads peforms acceptably well (for non-critical applications), even in a non-real time operating system environment. ii C o n t e n t s Abstract ii Contents iii List of Tables v i List of Figures vi i i Acknowledgements ix 1 Introduction 1 2 Real T i m e Threads Overview 6 2.1 Real Time Scheduling 7 2.2 Thread Communication and Synchronization 8 2.3 External I/O 11 3 Appl icat ion Programming Interface 12 3.1 Types and Constants 14 3.2 Thread Management 16 3.2.1 Thread Creation 16 iii 3.2.2 Thread Termination 17 3.2.3 Thread Scheduling 18 3.2.4 Miscellaneous 18 3.3 Synchronization 19 3.4 Communication 22 3.4.1 Intra-Address Space Communication 22 3.4.2 Inter-Address Space Communication 24 3.5 Dynamic Memory Allocation 25 3.6 Miscellaneous 26 3.7 External I /O 27 4 Implementation 29 4.1 Data Structures 30 4.2 System Startup 33 4.3 Thread Management 35 4.4 Queue Management 40 4.5 Thread Scheduling 43 4.5.1 System Clock 44 4.5.2 Context Switching 47 4.5.3 Modifying Scheduling Attributes 50 4.6 Semaphores 51 4.7 Inter Thread Communication 55 4.7.1 Cross Address Space ITC 60 4.8 Dynamic Memory Allocation 66 4.9 External I /O 66 4.10 Mapped Implementation 70 iv 4.11 Native Microkernel Implementation 72 5 Performance 73 5.1 Functional Overheads 74 5.1.1 Thread Creation 75 5.1.2 Semaphore Overhead 76 5.1.3 Context Switching 77 5.2 Inter Thread Communication 78 5.3 Real Time Performance 79 6 Related W o r k 94 6.1 POSIX 96 6.2 Real Time Support for Multimedia Applications 97 6.3 Inter Thread Communication 98 7 Addi t ional W o r k 101 8 Conclusions 102 Bibl iography 104 A p p e n d i x A Example P r o g r a m 107 v L i s t o f T a b l e s 3.1 RT Threads Interface 13 3.2 Interface to U N I X I /O System Calls 28 5.1 Host operating systems and architectures 74 5.2 Thread creation times in microseconds 76 5.3 Context switch times in microseconds 78 5.4 Local send/receive/reply times 79 5.5 Cross address space send/receive/reply times 80 5.6 Base Test: No background threads, no background processes . . . . 82 5.7 1 background thread, no background processes 83 5.8 No background threads, 1 background process 84 5.9 1 background thread, 1 background process 85 5.10 No background threads, 2 background processes 86 5.11 1 background thread, 2 background processes 86 5.12 No background threads, 1 background compile 87 5.13 1 background thread, 1 background compile 88 5.14 2 RT Threads environments 89 5.15 2 RT Threads environments, 1 with background thread 89 vi 5.16 2 RT Threads environments, both with background thread 90 5.17 One RT Threads environment with 100 threads 91 5.18 One RT Threads environment with 200 threads 91 5.19 One RT Threads environment with 300 threads 92 5.20 One RT Threads environment with 300 threads, measured in kernel 92 5.21 Multimedia client scheduling performance 93 vii L i s t o f F i g u r e s 2.1 ITC via Message Passing 8 2.2 Client/Server Communication 10 4.1 RttThreadld 33 V l l l A c k n o w l e d g e m e n t s First and foremost, I wish to thank my supervisor, D r . Gerald Neufeld, for the tremendous support, guidance, and encouragement he has given me, for which I will be forever grateful. I would also like to thank D r . Norman Hutchinson, for his invaluable generosity in lending advice and opinion. M a n y thanks to David Finkel-stein and Dwight Makaroff, who were both closely involved in the implementation and testing of R T Threads, and with whom it has been a pleasure to work. Acknowl-edgement and thanks go as well to M u r r a y Goldberg, whose Pthreads preemptive threads package was the basis for this implementation, and to A n n L o , whose thesis work on real time scheduling was incorpoated into R T Threads. Finally, I thank my parents for their patience, their love, and for always standing behind me. None of this would have been possible without them. R O L A N D M E C H L E R The University of British Columbia April 1997 I X C h a p t e r 1 I n t r o d u c t i o n This thesis is concerned with the design, implementation and performance of Real Time Threads (RT Threads). RT Threads is a threads package which supports distributed application programming and application portability across operating systems and machine architectures. The first motivation for RT Threads was the desire for a portable multi-threaded programming platform on which to build dis-tributed multimedia applications in a heterogeneous computer network environment. Another motivating factor was that a real time scheduling policy (with a well-defined scheduling behaviour) was considered best suited to the time critical nature of mul-timedia applications, particularly with respect to the synchronization of multiple media objects and the timely delivery of continuous media data. RT Threads has been ported to several versions of U N I X (AIX, SunOs 4.1.3, Solaris, Linux, FreeBSD), Microsoft's Win32 software development platform (Win-dows N T , 95), as well as the Kea microkernel platform [30]. Machine architectures supported are Sun S P A R C , I B M PowerPC, and Intel x86/Pentium. A l l of the im-1 plementations are at the user level, except the Kea version which is implemented in the kernel. Threads packages are often closely associated with a particular operating system or software development platform. A key problem with porting threaded applications among various threads packages (as a means of porting between plat-forms) is that these packages do not all have a common set of functions and policies. For example, A I X Pthreads [1] does not support semaphores in its interface. Win32 Threads, on the other hand, has only five priority levels and does not support real time scheduling of threads1. Defining a set of thread functions and policies with a single interface, and supporting that interface on various platforms was chosen as a way to make the applications easily portable. General purpose threads packages tend to use a time sliced round robin scheduling policy among ready threads at the highest priority level which has ready threads. Often the priorities are not fixed, i.e., the system may temporarily modify priorities. The advantage of time slicing is that the application is not responsible for fairness among threads of equal priority. The advantage of non-fixed priorities is that the system has some control of fairness among threads in different processes (i.e., to prevent a high priority thread in one process from starving other processes of C P U resources), since threads have a global scheduling scope in many of these systems. The disadvantage of both policies is that they take control of scheduling away from the application. 'A time sliced round robin policy is the only one supported. 2 RT Threads, on the other hand, uses a fixed priority scheduling policy with no time slicing. Threads run to completion unless preempted by another thread of higher priority or equal priority and earlier deadline. The lack of time slicing means that the application has a greater control over scheduling of threads, but also that the application itself is responsible for fairness among its threads. This can be thought of as a cooperative threads environment. An RT Threads application can in fact implement its own time slicing, if this is desired. Starvation of system processes is not an issue since RT Threads has a per process scheduling scope for threads (although operating in a multi-process environment may have consequences for the real time response of RT Threads). Several aspects of the design of U B C ' s distributed Continuous Media File Server (CMFS) [19, 18, 20] require real time scheduling. At the server, threads must be scheduled to read data from disk and to submit data for network trans-mission according to strict time constraints, in order to guarantee timely delivery of continuous media streams to clients. A t the clients, real time scheduling is used to synchronize multiple media streams, e.g., video, audio and closed captioning text. Synchronization is achieved by scheduling real time threads to submit the data for each stream to the appropriate display device at the correct time. RT Threads does not provide hard real time guarantees. Its scheduling al-gorithm is optimal, meaning it will produce a feasible schedule if such a schedule exists [12], but no dynamic schedulability analysis is done and threads which pass their deadline continue executing until completion. Hard real time scheduling was not implemented for RT Threads because such guarantees are not possible when 3 p o r t i n g t o n o n r e a l t i m e o p e r a t i n g s y s t e m s s u c h as U N I X a n d W i n d o w s N T , a n d b e c a u s e t h e o v e r h e a d o f d y n a m i c s c h e d u l a b i l i t y a n a l y s i s w o u l d l i k e l y be self d e f e a t -i n g . R T T h r e a d s w a s p o r t e d t o a m i c r o k e r n e l e n v i r o n m e n t i n a n effort t o be a b l e t o p r o v i d e r e a l t i m e g u a r a n t e e s f o r t h e C M F S s e r v e r . R e s u l t s f r o m [2] s h o w e d t h a t i m p l e m e n t i n g a t h r e a d s p a c k a g e o n a b a r e m a c h i n e p r o v i d e s a h i g h d e g r e e o f t i m i n g p r e d i c t a b i l i t y as c o m p a r e d t o i m p l e m e n t a t i o n o n t o p o f U N I X . H o w e v e r , o u r r e s u l t s s h o w t h a t e v e n i n a n o n - r e a l t i m e g e n e r a l p u r p o s e o p e r a t i n g s y s t e m like U N I X o r W i n d o w s N T , r e a s o n a b l y a c c u r a t e r e s u l t s a r e a c h i e v e d f o r t i m i n g p r e d i c t a b i l i t y , so l o n g as l o a d s f r o m o t h e r n o n - r e a l t i m e p r o c e s s e s is r e l a t i v e l y l i g h t . B e i n g a b l e t o s u p p o r t a p p l i c a t i o n s w i t h soft r e a l t i m e r e q u i r e m e n t s , s u c h as m u l t i m e d i a d i s p l a y c l i e n t s , o n w i d e l y u s e d g e n e r a l p u r p o s e p l a t f o r m s h a s o b v i o u s a d v a n t a g e s o v e r re-q u i r i n g s p e c i a l p u r p o s e o p e r a t i n g s y s t e m e n v i r o n m e n t s . A n a l t e r n a t i v e t o i m p l e m e n t i n g a n e w t h r e a d s p a c k a g e f o r p o r t a b i l i t y w o u l d h a v e b e e n t o use a s t a n d a r d t h r e a d s i n t e r f a c e , i .e. , P O S I X t h r e a d s ( P t h r e a d s ) . U n -f o r t u n a t e l y , t h e P O S I X i n t e r f a c e d o e s n o t p r o v i d e t h e f u n c t i o n a l i t y r e q u i r e d f o r t h e t a r g e t r e a l t i m e a p p l i c a t i o n s . In p a r t i c u l a r , t h e s p e c i f i c a t i o n d o e s n o t p r o v i d e f o r d e a d l i n e s c h e d u l i n g , t h e a b i l i t y for o n e t h r e a d t o s u s p e n d a n o t h e r t h r e a d o r p u t a n o t h e r t h r e a d t o s leep, o r s u p p o r t f o r d i s t r i b u t e d p r o g r a m m i n g b y a l l o w i n g d i -r e c t c o m m u n i c a t i o n b e t w e e n t h r e a d s in d i s t i n c t a d d r e s s s p a c e s v i a m e s s a g e p a s s i n g . A l s o , P O S I X d o e s n o t s p e c i f y a level o f s c h e d u l i n g c o n t r o l c o n s i d e r e d i m p o r t a n t f o r t h e t a r g e t a p p l i c a t i o n s . F o r e x a m p l e , t h e i n t e r f a c e d o e s n o t s p e c i f y t h a t a t h r e a d w h i c h s i g n a l s a h i g h e r p r i o r i t y t h r e a d b l o c k e d o n a c o n d i t i o n v a r i a b l e 2 s h o u l d b e p r e e m p t e d b y t h a t t h r e a d i m m e d i a t e l y . 2Pthreads uses the condition variable as its basic synchronization primitive, whereas RT Threads uses semaphores. 4 Thesis statement: It is possible to implement a real time threads environ-ment on a general purpose operating system, such that acceptable results for real time schduling will be acheived even under moderate system loads. The remainder of this thesis is organized as follows. An overview of RT Threads is given in chapter 2. Chapter 3 describes the application programming interface. Chapter 4 describes how RT Threads was implemented. Performance re-sults are given in chapter 5. Related work and conclusions follow in chapters 6 and 8. 5 C h a p t e r 2 R e a l T i m e T h r e a d s O v e r v i e w RT Threads has the following key features: • A real time scheduling algorithm based on the starting time, priority and deadline attributes of threads • Application modification of thread scheduling attributes • Thread creation/destruction • Thread synchronization (semaphores) • Inter-thread communication (ITC) • Globally unique (on the Internet) thread identifiers, allowing transparent mes-sage passing between address spaces (and machines) • Asynchronous I /O (blocking I /O calls block only the calling thread) 6 2.1 R e a l T i m e S c h e d u l i n g Real time scheduling in RT Threads is defined by three scheduling attributes, start-ing time, priority, and deadline. Starting times and deadlines are absolute time values (measured in in seconds and microseconds), and priorities are integer values from 0 to 31, where 0 is the highest priority. No thread is made ready before its starting time. The currently executing thread is called the running thread. Threads are scheduled first by priority, i.e., threads are preempted by higher priority ready threads. Threads of equal priority are scheduled according to the Earliest Deadline First (EDF) rule [13], which states that threads are to be scheduled in order of non-decreasing deadlines. Threads of equal priority and equal deadline are sched-uled on a first come first served basis. Within a given priority level, threads with no deadline are scheduled after threads with deadlines. A running thread will run to completion unless preempted by another thread which becomes ready and has higher precedence according to the above algorithm. Such a thread can become ready in one of the following ways: • Its starting time has been reached (i.e., the thread was sleeping). • It has become unblocked due to completion of I /O. • It has been signaled by the running thread, (i.e., it was blocked on a semaphore or an ITC call). • Its scheduling attributes have been explicitly modified by the running thread to give it higher precedence. • It has been created by the running thread, with scheduling attributes giving it higher precedence. 7 Thread A Thread B I RttReceiveO ^ Thread is RttSendO request I I Mocked reply RttReplyO | rara,n»S M Figure 2.1: ITC via Message Passing 2.2 Thread Communication and Synchronization RT Threads provides two main mechanisms for inter-thread communication (ITC): shared memory and message passing. All threads within an RT Threads environ-ment share a common address space. RT Threads provides semaphores to synchro-nize access to this shared memory. Semaphores are standard counting semaphores. A semaphore has a value which is decremented each time RttP() is called and in-cremented each time Rt tVQ is called. Calling RttP() when the semaphore value is zero or negative will cause the calling thread to block. Calling RttV() when the semaphore value is negative will cause one blocked thread to become unblocked. A semaphore's initial value is specified when it is created with RttAl locSemO. Semaphores can be created in one of two modes. In first come first served mode, threads become unblocked in the same order in which they blocked, regardless of priority. In priority mode, threads become unblocked according to the same priority and deadline based algorithm used by RT Threads to schedule ready threads. Message passing uses a send/receive/reply model. RttSendO sends a mes-8 sage (request) to a specified thread and blocks until it receives a reply. R t t R e c e i v e O receives a message from any thread, blocking if there are no messages waiting for the calling thread. R t t R e p l y O sends a reply to a specified thread, which must be waiting for a reply, and does not block. Figure 2.1 shows the temporal relationship and blocking behaviour of these calls. Thread A and Thread B can be in the same address space or in different address spaces. Messages of arbitrary length can be sent using this mechanism. RT Threads' message passing facilities provide reliable communication between threads in distinct address spaces (and distinct machines), so as to provide support for programming distributed applications. These facilities are similar to the distributed inter-process communication (IPC) provided by the V Kernel [4]. RT Threads provides location transparency for message passing ITC using thread ids that are globally unique across the Internet. This is accomplished by embedding an IP address and a local port number in the identifier. Each RT Threads environment (i.e., local address space) is identified by an IP address and port number, specified using the R t t N e t l n i t O call. A n RT Threads environment can create one special name server thread using the Rt tCreateNSO call. The thread id for the name server thread in a remote RT Threads environ-ment can be obtained using the RttGetNSO call, which takes the IP address and port number of the remote environment as parameters. RttGetNSO does not per-form any network communication, the IP address and port number provide enough information to uniquely identify the name server thread in a remote RT Threads environment. The name server thread simply provides a mechanism by which each RT Threads environments can create an easily locatable thread. Whether this mech-anism is actually used to implement a name service, with which local threads can 9 Figure 2.2: Client/Server Communication register, is left up to the application. Cross address space ITC provides a convenient mechanism for implement-ing R P C style client/server transactions. Figure 2.2 shows an example of how a distributed client/server application might be designed using ITC. In the figure, the server consists of an administrator thread which creates local worker threads to handle client requests. The numbers in the figure indicate the order of events in the handling of a single request from a remote client. Note that administrator/worker communication is arranged such that the administrator never does an RttSendO because RttSendO blocks and the administrator must always be available to receive requests. 10 2.3 External I/O RT Threads provides an I/O interface which is essentially a subset of the UNIX interface. Operations for file and socket I/O are supported. I/O calls which block will block only the calling thread. When a thread's I/O is complete it is immediately made ready. 11 C h a p t e r 3 A p p l i c a t i o n P r o g r a m m i n g I n t e r f a c e A n R T T h r e a d s a p p l i c a t i o n b e g i n s e x e c u t i o n at t h e first s t a t e m e n t o f t h e r o u t i n e m a i n p O . M a i n p O is p a s s e d t h e e n v i r o n m e n t a r g u m e n t s a r g c a n d a r g v [ ] . M a i n p O is i t s e l f a t h r e a d a n d h a s t h e h i g h e s t p o s s i b l e p r i o r i t y a n d e a r l i e s t p o s s i b l e d e a d l i n e (it c a n n o t b e p r e e m p t e d ) . T a b l e 3.1 s h o w s t h e o p e r a t i o n s w h i c h m a k e u p t h e R T T h r e a d s i n t e r f a c e . A l l R T T h r e a d s i n t e r f a c e r o u t i n e s h a v e a n i n t r e t u r n v a l u e , r e t u r n i n g e i t h e r R T T O K (for success) o r R T T F A I L E D (for f a i l u r e ) , u n l e s s i n d i c a t e d o t h e r w i s e in t h e f o l l o w i n g i n t e r f a c e d e s c r i p t i o n . 12 T h r e a d Managamen t R t t C r e a t e O Create a new thread RttMyThreadldO Get thread id of running thread RttGetThreadSchedAttr() Get scheduling attributes of given thread RttSetThreadSchedAttr() Set scheduling attributes of given thread RttNameToThreadld() Get thread id given name R t tTh readEx i s t sO Given thread id, determine whether thread exists R t t K i l l O Given thread id, terminate thread R t t E x i t O Terminate running thread RttGetTimeDfDay() Get the current time T h r e a d Synchron iza t ion and C o m m u n i c a t i o n Rt tAl locSemO Allocate a new semaphore RttFreeSemO Free a semaphore R t t P O Wait on a semaphore R t t V O Signal a semaphore RttSemValueO Get the value of a semaphore Rt tSendO Send a message to a given thread and receive a reply R t tRece iveO Receive a message from another thread RttMsgWaitsO Determine whether R t tRece iveO would block R t t R e p l y O Reply to a message from given thread R t t N e t l n i t O Enable cross address space ITC R t t M y l P O Get IP address of local RT Threads environment Rt tMyPor tO Get port number of local RT Threads environment Rt tCreateNSO Create new thread as name server thread RttGetNSO Get thread id of name server, given IP address and port M e m o r y Management R t t M a l l o c O Allocate memory from heap R t t F r e e O Free memory previously allocated with R t t M a l l o c O R t t R e a l l o c O Reallocate memory previously allocated by R t t M a l l o c O R t t C a l l o c O Allocate and clear memory from heap Misce l laneous R t tReg i s t e rEx i tRou t ine ( ) Register a routine to be executed when application exits R t t B e g i n C r i t i c a l 0 Disable preempting events (I/O completion and clock) R t t E n d C r i t i c a l O Re-enable preempting events Rt tGetDataO Get data associated with thread Rt tSe tDataO Set data associated with thread Table 3.1: RT Threads Interface 13 3.1 Types and Constants The type RttThreadld is an abstract data type used to identify threads at the user level. The routine xdr_RttThreadId() is provided to facilitate external data repre-sentation of Rtt Thread Ids so they can be transmitted as data between RT Threads environments on heterogeneous architectures. RttSem is another abstract data type, used to identify semaphores. The type RttTimeValue is used to represent time val-ues in RT Threads. The type is transparent, i.e., the application has knowledge of its fields seconds and microseconds. typedef s t ruct { long seconds; long microseconds; > RttTimeValue; The seconds field can take on values from 0 to 2147483647 and the microsec-onds field values from 0 to 999999. Variables of type RttTimeValue can express both relative times and absolute times. Absolute times are determined using the RttGetTimeOfDayO call. The type RttSchAttr is used to specify the scheduling attributes of a thread. Its fields start ingtime, p r i o r i t y and deadline are visible to applications. typedef s t ruct { RttTimeValue star t ingt ime; int p r i o r i t y ; RttTimeValue deadl ine; } RttSchAttr ; 14 The startingtime and deadline fields are used to specify the starting time and dead-line, respectively, of a thread. Both express an absolute time. The priority field is used to specify the priority of a thread. Priorities can take on values from 0 to 30 (inclusive), where 0 is the highest priority and 30 is the lowest. It is highly recommended that threads use only priorities in the range 10 to 30 inclusive so as not to interfere with higher priority threads used by the underlying system (i.e., threads used to implement cross address space communication). For applications that do not require a fine granularity of priority, the following priority constants are available: RTTHTGH = 10, RTTNORM = 20, RTTL0W = 30 The following constant values are also available as a convenience for scheduling time values: RttTimeValue RTTZEROTIME; RttTimeValue RTTINFINITE; RttTimeValue RTTNODEADLINE; RTTZEROTIME can be used to specify either the starting time or the deadline of a thread. Giving a thread a starting time of RTTZEROTIME will make it ready im-mediately (assuming it is not blocked). A deadline of RTTZEROTIME is the earliest possible deadline for a thread, causing it to be scheduled ahead of any thread with the same priority and a different deadline. RTTINFINITE can be used to specify a thread's starting time. A thread with a starting time of RTTINFINITE will effectively be suspended (it can only run again if another thread modifies its starting time). RTTNODEADLINE can be used to specify a thread's deadline. A thread with a deadline 15 of RTTNODEADLINE will be scheduled after all threads of equal priority which have a deadline. 3.2 Thread Management Thread management consists of routines for creating, scheduling and terminating threads, as well as some miscellaneous related routines. 3.2.1 Thread Creation i n t Rt tCreate(Rt tThreadId *thread, v o i d (*addr) ( ) , i n t s t k s i z e , char *name, i d *arg, R t tSchAt t r schedAt t r , i n t l e v e l ) This call is used to create threads. The first parameter, thread, is a pointer to an RttThreadld, and is used to return the newly created thread's identifier by refer-ence. The second parameter, addr, is a pointer to the routine which acts as the entry point for the created thread. The stksize parameter is the size, in bytes, that should be allocated for the new thread's stack. The name parameter is a null-terminated string used to specify a thread name for debugging purposes. The name is copied, so it's memory may be released after the R t tCrea t eO call returns. More than one thread may have the same name. The fifth parameter, arg, is an argument (pa-rameter) for the created thread. The entry routine for the new thread receives this argument as a parameter. It can be used to pass a 32-bit value, or a pointer to a larger data structure. Note that passing the address of an automatic variable is dangerous since the procedure calling R t tCrea t eO may return before the thread 16 ever runs. The sixth argument, schedAttr, specifies the scheduling attributes of the created thread. If the starting time indicated is in the future, the thread will not be-come ready until that time.The final argument, level, indicates whether the created thread is a system (RTTSYS) or an application (RTTUSR) thread. The only difference between the two is that the RT Threads process (i.e., threads environment) will exit when there are no more RTTUSR level threads. Therefore, perpetual server threads should be created with level RTTSYS (unless they service requests from other RT Threads environments, in which case they will likely want to use RTTUSR). 3.2.2 Thread Termination Threads terminate in one of three ways. Threads can return from their entry routine (either with a re turn( ) statement or by falling off the end of the routine). Equiva-lently, they can terminate themselves using the R t t E x i t O call. Lastly, threads can be killed by other threads using the R t t K i l l O call. v o i d R t t E x i t O R t t E x i t O allows a thread to terminate itself, i n t R t t K i l K R t t T h r e a d l d thread) R t t K i l l O allows a thread to terminate another thread, specified with the thread parameter. 17 3.2.3 Thread Scheduling A thread's scheduling attributes are set at creation time. These attributes may also be retrieved and modified (by the thread itself or by another thread) at any time during the thread's lifetime. i n t RttGetThreadSchedAttr(RttThreadId thread , R t tSchAt t r *schedAttr) This call is used to retrieve the current values of the scheduling attributes of the thread specified by the thread parameter. The attributes are returned by reference via the schedAttr parameter. This call has no effect on the scheduling of the speci-fied thread. i n t RttSetThreadSchedAttr(RttThreadId thread, R t tSchAt t r schedAttr) This call is used to set the scheduling attributes of the thread specified by the thread parameter, as specified in the schedAttr parameter. Any system scheduling effects caused by a change in scheduling attributes will take place immediately. 3.2.4 Miscellaneous RttGetTimeOfDayO, RttMyThreadldO and Rt tThreadExis t s ( ) are miscellaneous routines which are useful for thread scheduling. i n t RttGetTimeOfDay(RttTimeValue *time) 18 This call returns, by reference, the current clock time in seconds and microseconds since January 1, 19701. Rt tThreadld RttMyThreadldO This call returns the currently running thread's Rt tThreadld. i n t Rt tThreadExis t s (Rt tThread ld thread) This call is a predicate which returns 1 if the thread specified by parameter thread exists, and 0 otherwise. Use of this call is discouraged since even with a return of 1, by the time the following statement executes, the specified thread may no longer exist (preemption may occur between the two statements). The call is included primarily for reasons of backward compatibility. 3.3 Synchronization Counting semaphores are provided as the primary means of thread synchronization. Two attributes are specified by the user when allocating a new semaphore, value and mode. Value indicates the number of times Rt tP( ) (see below) can be called before a calling thread will block, assuming no R t t V Q calls are made. Mode indicates the desired semantics for the order in which semaphore blocked threads are made ready. RTTFCFS specifies first come first served semantics, in which threads are made ready 'RttGetTimeOfDayO is based on the UNIX gettimeof day () system call. On other architectures, the base time may be different. 19 in the order in which they were blocked, regardless of priority. RTTPRIORITY speci-fies highest priority first semantics, in which threads of higher priority will be made ready before threads of lower priority, with threads of equal priority being made ready in earliest deadline first order. Threads of equal priority and equal deadline are made ready in first come first served order in RTTPRIORITY mode. Since memory is shared within an RT Threads environment, semaphores can easily be shared among threads within a single environment. In keeping with the cooperative nature of an RT Threads environment, it is left up to the user to ensure that semaphores are no longer in use when a semaphore is freed. To help detect such a situation, an attempt to free a semaphore on which other threads are blocked will result in failure. In the event that a thread is killed while blocked on a semaphore, the state of the semaphore is automatically modified (its value is incremented). Other threads packages provide primitives intended specifically for providing mutually exclusive access to shared data (these are usually called mutexes). RT Threads does not provide special "mutex" primitives; semaphores are lightweight in RT Threads and can easily be used for this purpose. RT Threads provides semaphore operations R t tA l loc S e m O, RttFreeSemO, R t t P ( ) , Rt tV() and RttSemValueO. Semaphore variables are of type RttSem. i n t RttAllocSem(RttSem *sem, i n t va lue , i n t mode) This call allocates a new semaphore, returned by reference via the sera parameter. The initial value of the semaphore is specified by the value parameter, and must be 20 greater than or equal to 0. The mode for the semaphore ( RTTFCFS or RTTPRIORITY) is specified by the mode parameter. i n t RttFreeSem(RttSem sem) This call frees a previously allocated semaphore, specified by the sem parameter. RttFreeSemO will return RTTFAILED if any threads are blocked on the semaphore. i n t RttP(RttSem sem) This is the conventional semaphore wait primitive. It decreases the value of the semaphore specified by parameter sem by one, and blocks if the resulting value is less than zero. i n t RttV(RttSem sem) This is the conventional semaphore signal primitive. It increases the value of the semaphore specified by parameter sem by one, and unblocks one thread blocked on the semaphore if the value was less than zero when the call was made. i n t RttSemValue(RttSem sem, i n t *value) This call returns, via reference parameter value, the current value of the semaphore specified by parameter sem. This call should be used with caution, since the semaphore's value could change between the return from the RttSemValueO call and the following statement. 21 3 .4 C o m m u n i c a t i o n An inter-thread communication (ITC) facility is provided by RT Threads, allow-ing message passing between threads both within an RT Threads environment, and between threads in different environments. A single interface for both types of com-munication is made possible because RttThreadlds are universally unique on the internet. 3.4.1 Intra-Address Space Communication RT Threads provides blocking send/receive/reply style inter-thread communication primitives. A thread sending a message using RttSendO is blocked until the mes-sage has been received by the destination thread and a reply has been made. A thread receiving a message using R t tRece iveO is blocked until some other thread sends it a message. R t t R e p l y O , a non-blocking operation, is used to reply to a received message. There is also a non-blocking routine RttMsgWaitsO that allows a thread to test whether it has a message waiting to be received. Rt tSendO and R t t R e p l y O both copy their message (request and reply) data. A description of the message passing routines follows. i n t Rt tSend(RttThreadld t o , v o i d *sData, u_in t s l e n , v o i d *rData, u_in t *r len) This call sends a message to the thread specified by the to parameter, and blocks waiting for a reply. The request message is specified by a buffer pointer, the sData parameter, and a length, the slen parameter. A buffer must be supplied for the 22 reply to be copied into, given as the rData parameter. The parameter rlen is an input/output parameter, specifying the maximum reply length as input, and re-turning the actual length of the reply message as output. If the input length is less than the length of the reply sent, the reply will be truncated. RttSendO returns RTTOK on success, RTTNOSUCHTHREAD if the specified destination thread does not ex-ist, RTTFAILED on failure for any other reason. i n t Rt tReceive(Rt tThreadId *from, v o i d *data, u_ in t *len) This call receives a message which was sent by another thread using Rt tSendO. The Rt tThreadld of the sending thread is returned by reference via the from pa-rameter. A buffer must be supplied for the message to be copied into, given as the data parameter. The parameter len is an input/output parameter, specifying the maximum message length as input, and returning the actual length of the received message as output. If the input length is less than the length of the message sent, the message will be truncated. i n t R t tReply(Rt tThread ld sndr, v o i d *data, u_in t len) This call sends a reply message back to a thread which is blocked on an Rt tSendO call. The destination thread (i.e., the original sender) is specified using the sndr parameter. A pointer to the reply message buffer and its length are given as the data and length parameters, respectively. R t t R e p l y O returns RTTOK on suc-cess, RTTNOTBLOCKED if the thread indicated by sndr is not blocked sending, and RTTFAILED for any other failure. 23 i n t RttMsgWaitsO This non-blocking call returns 1 if there is a message waiting to be received by the calling thread, and 0 otherwise. 3.4.2 Inter-Address Space Communication The message passing routines described in the previous section can also be used to send messages between threads in distinct RT Threads environments, i.e., across address space, possibly machine, boundaries. The routines described in this section make this possible. i n t R t tNe t ln i t (uns igned long ipAddr , unsigned short portNo) This call is used to uniquely identify the calling RT Threads environment on the in-ternet, and enable cross address space ITC. It must be called in mainpO, before any R t tCrea t eO calls so that the IP address and port number specified by the ipAddr and portNo parameters can be embedded in all RttThreadlds. The user can let the system choose values for the IP address and/or the port number by specifying 0 for the respective parameters. i n t RttMyIP(unsigned i n t *ipAddr) i n t RttMyPort(unsigned i n t *portNo) These calls return, respectively, by reference, the IP address and port number which identify the local RT Threads environment. They are useful if 0 was specified as 24 parameters to R t t N e t l n i t ( ) . Both return RTTFAILED if R t t N e t l n i t O has not yet been called. i n t Rt tCreateNS(RttThreadld *thread, v o i d (*addr) () , i n t s t k s i z e , char *name, v o i d *arg, R t tSchAt t r schedAt t r , i n t l e v e l ) This call takes the same parameters as RttCreate() and has the same semantics, except the created thread is marked by RT Threads as the unique name server thread for the environment. RttCreateNS() may be called only once (it will return RTTFAILED on subsequent calls). i n t RttGetNS(unsigned i n t ipAddr , i n t portNo, Rt tThreadld *nServThreadId) This call returns, by reference, the Rt tThreadld of the name server thread for an RT Threads environment specified by IP address ipAddr and port number portNo. bool_t xdr_RttThreadId(XDR *xdrs , Rt tThreadld *objp) This call is provided so that RttThreadlds being sent in a message between address spaces can be encoded and decoded using the X D R external data representation. 3.5 Dynamic Memory Allocation RT Threads provides thread-safe wrappers for the standard C library heap memory allocation routines m a l l o c O , c a l l o c O , r e a l l o c Q and f r ee ( ) , declared as follows: 25 v o i d * R t t M a l l o c ( i n t s i z e ) v o i d * R t t C a l l o c ( i n t numElems, i n t elemSize) v o i d * R t t R e a l l o c ( v o i d *pt r , i n t s i z e ) v o i d R t tF ree (vo id *mem) Other C library routines which do not have RT Threads equivalents should be pro-tected from context switches with R t t B e g i n C r i t i c a l ( ) and R t t E n d C r i t i c a l O calls. 3 .6 M i s c e l l a n e o u s i n t R t t R e g i s t e r E x i t R o u t i n e ( v o i d (* rou t ine) ( ) ) This call registers procedure routine (which takes no parameters) to be called when RT Threads exits. R t t R e g i s t e r E x i t R o u t i n e O may be called more than once to register multiple exit routines. i n t R t t R e g i s t e r D e s c r i p t o r ( i n t fd) Any file descriptors obtained from a non-RT Threads call (e.g., ones obtained in a non-RT Threads process and passed to an RT Threads environment) should be registered with RT Threads using the R t tReg i s t e rDesc r ip to r ( ) call. v o i d R t t B e g i n C r i t i c a l ( ) v o i d R t t E n d C r i t i c a l O 26 These calls should surround any system calls or library calls which do not have RT Threads equivalents and may not be re-entrant. They delimit a critical section within which no RT Threads context switches will occur. i n t Rt tSetData(Rt tThreadld thread , unsigned long data) i n t Rt tGetData(RttThreadld thread , unsigned long *data) These calls allow data to be associated with a given thread, and for that data to be retrieved. 3.7 External I/O RT Threads also includes interface routines for many UNIX-style I /O calls. Blocking calls will only block the thread making the call and not the entire process in which RT Threads is running. The parameter lists are identical and the semantics are the same as the corresponding U N I X calls. See the U N I X man pages for details of a par-ticular call. Table 3.2 shows the RT Threads calls which map directly to U N I X calls. Some other routines are provided which do not map directly to U N I X calls, but are provided for convenience. They are described as follows: i n t RttReadN(int f d , char *buf, i n t numbytes) This call has the same semantics as the U N I X read() , except that while read() may return after reading less than numbytes bytes, RttReadNO will block until 27 R T T R o u t i n e U N I X C a l l R T T R o u t i n e U N I X C a l l RttOpen open RttPipe pipe RttSocket socket RttClose close RttBind bind RttListen listen RttAccept accept RttConnect connect RttRead read RttWrite write RttLseek lseek RttRecvfrom recvfrom RttSendto sendto RttRecvmsg recvmsg RttSendmsg sendmsg RttGetsockopt getsockopt RttSetsockopt setsockopt RttGetsockname getsockname Table 3.2: Interface to U N I X I /O System Calls numbytes has been read (unless an error or the end of file is encountered), i n t R t tWr i t eN( in t f d , char *buf, i n t numbytes) This call has the same semantics as the U N I X w r i t e ( ) , except that while w r i t e ( ) may return after writing less than numbytes bytes, R t tWr i t eNO will block until numbytes has been written (unless an error is encountered). i n t RttSeekNRead(int f d , char *buf, i n t numbytes, i n t seekpos, i n t seekmode) This call combines the functionality of R t tLseekQ and RttReadO into a single call. i n t Rt tSeekNWri te( in t f d , char *buf, i n t numbytes, i n t seekpos, i n t seekmode) This call combines the functionality of R t tLseekO and R t t W r i t e O into a single call. 28 C h a p t e r 4 I m p l e m e n t a t i o n Three models were used for implementing RT Threads: a native host model where RT Threads implements its own context switching; a mapped model where RT Threads threads are mapped to existing threads provided by the underlying op-erating system (or a separate user level threads layer); and a native microkernel model where RT Threads forms part of a microkernel implementation. Most of the implementation is shared between the native host and mapped implementations. The source code is shared between the two models for all archi-tectures, with precompiler directives used to switch between model and architecture specific parts of the code. The general description of the implementation here will follow that of the native host version, that being the original model and environ-ment (UNIX) used to implement RT Threads. Features which differ in the mapped implementation will be described in section 4.10. The native microkernel is briefly described in section 4.11. 29 4.1 Data Structures The overall state of each thread in the system is maintained in a structure called a thread control block (TCB). This structure has the type TCB. typedef s t r uc t tcb { Rt tThreadld th read ld ; i n t errNo; ThreadMgmtlnfo threadMgmtlnfo; QueueMgmtlnfo queueMgmtlnfo; Schedlnfo schedlnfo; ITCInfo i t c l n f o ; Semlnfo semlnfo; IOInfo i o l n f o ; } TCB; Each thread is identified within the system by a pointer to its T C B . The threadld field of the T C B contains the thread's thread id (an identifier of type Rt tThreadld, described later in this section). The errNo field of the T C B is a thread specific ver-sion of the errno global of the underlying operating system (for U N I X and Win32). The remaining fields of the T C B are structures specific to the various functional modules of the RT Threads implementation: threadMgmtlnfo contains information used by the thread management module, queueMgmtlnfo for the queue management module, and so on. Each will be described in greater detail in its respective section in this general implementation description. 30 RT Threads has a few global variables. Most are part of the _Globals_ global variable, which is a structure of the following type: s t r uc t g loba l s { TCB *curr thread; i n t inOsFlag; i n t numbThreads; i n t numbReadyThreads; i n t numbSysThreads; i n t ioPending; i n t t i ckPend ing ; i n t nameServerCreated; Rt tThreadld nameServerld; u_long h i d ; u_long l i d ; s t r uc t tcb tcbTbl[NUMBTCBS]; QUEUE qTbl[NUMQS]; QUEUE rdyqTbl[NUM_READYQ_PRIORITIES]; }; The currthread field is a pointer to the T C B of the currently running thread. The inOsFlag field is used to control shared access to shared data within the RT Threads kernel. Fields numbThreads, numb ReadyThreads and numbSysThreads are the num-ber of threads currently alive, the number which are ready, and the number which are "system" threads (threads created with level RTTSYS). The ioPending and tickPend-ing fields are flags which indicate whether I /O processing and clock tick handling, respectively, have been deferred. The nameServerCreated field is a flag indicating 31 whether the special name server thread has been created, and nameServerld is the thread id of the name server thread, if there is one. The hid and lid fields are the host id and local id (IP address and port) for identifying the local RT Threads environment (see section 4.7.1). The tcbTbl, qTbl and rdyqTbl fields are the global T C B table, queue table and ready queue table, respectively. Access to shared data structures of the RT Threads implementation, such as TCBs , is protected by setting inOsFlag whenever a thread is in a critical section in the RT Threads kernel. This flag is checked whenever a preempting event occurs in the kernel, i.e., whenever a clock tick occurs (see section 4.5) or I /O has com-pleted (see section 4.9). If the flag is set when such an event occurs, any associated processing and scheduling is deferred (by setting the ioPending or tickPending flag) until the original thread leaves its critical section. These critical sections are delim-ited with ENTERINGOS and LEAVINGOS, which are macros for the enteringOs () and leav ingOsO routines. enteringOs () simply sets inOsFlag. l eav ingOsO clears the flag, and also checks ioPending and tickPending to see whether any processing has been deferred, calling their respective handlers directly if necessary. The publicly visible type for the threads, Rt tThreadld, is a 64-bit value which contains enough information to uniquely locate a thread among RT Threads environments on the internet. Figure 4.1 shows the layout of the Rt tThreadld. It is divided into two parts, the host id (hid) and local id (lid). The host id contains the 32-bit IP address associated with the thread's RT Threads environment, used to locate its host machine. The local id identifies the thread on a particular machine, using the 16-bit port number, and within a particular threads environment, using 32 hid lid 3 IP address port r-t-& index O 32 bits 16 bits 4bits 12 bits Figure 4.1: Rt tThreadld the instance number and index. The index is an 12-bit index into the T C B table, allowing for as many as 4096 simultaneous alivethreads per address space. The 4-bit instance number is incremented each time a T C B structure is reused, to help prevent the thread id for a terminated thread from mistakenly being accepted as a reference to a newer thread using the same T C B table index. The instance number can range from 1 to 15, instance 0 being reserved to indicate the special name server thread. 4.2 System Startup This section describes how the RT Threads system (i.e., an RT Threads environ-ment running in an underlying process) is initialized. From the user application's point of view, an RT Threads application starts in the same manner as a normal console application, except that the entry routine is named mainpO rather than main(). The actual main() routine is defined in the RT Threads library, and is the entry point for RT Threads startup. In main(), first the routine o s I n i t O is called to initialize data structures, and then the null thread and main thread are created. 33 After these threads are created, the StartOs_() routine (of the scheduling module, see section 4.5) is called to start the scheduling clock. s t a t i c v o i d o s I n i t O This routine first does dynamic initialization of the global variables curr thread and inOsFlag, and then calls the routines InitReadyQueue_(), InitThreadMgmt_() and Ini t IO_() to do module specific initialization. These routines are described in their respective sections. RTTTHREAD NullThread_() This is the entry routine for the null thread. The null thread is created with a special priority lower than that of any other thread in the system, and thus runs only when no other threads are ready to run. The null thread simply loops on the pause () system call, which yields control of the processor allowing other processes in the underlying system to run while the RT Threads environment is idle waiting for threads to wake up or I /O to complete. When such an event occurs, a signal is generated, causing pause() to return. This return will not actually happen until threads scheduled by the preempting event have completed their tasks, at which time the RT Threads system will become idle again, and the null thread will loop back to call pause() again. RTTTHREAD mainThreadQ 34 This is the entry routine for the main thread, which simply calls the user defined mainpO routine, passing along the argc and argv parameters from main(). The main thread is created with the highest possible priority and the earliest possible deadline (RTTZEROTIME), and as such it will not be preempted by any threads created in mainpO (the intended purpose of mainpO is for user application initialization, e.g., command line parsing and the creation of one or more initial threads). 4 . 3 T h r e a d M a n a g e m e n t The thread management module is responsible for the creation and termination of threads, and for maintaining state specifically related to the running of a thread, such as its entry point and stack. The ThreadMgmtlnf o structure is shown below. typedef s t r uc t { i n t inUse; R t tSchAt t r a t t r s ; TSTATE s t a t e ; char name[MAXTNAMELEN + 1] ; char *namePtr; i n t *stmembeg; i n t *stkend; i n t *sp; i n t s t a c k S i z e ; v o i d (*threadAddr)() ; v o i d *threadArg; 35 unsigned long u V a l ; i n t l e v e l ; } ThreadMgmtlnfo; The inUse field is a flag indicating whether the T C B is currently in use. The attrs field maintains the current scheduling attributes for the thread in an R t tSchAt t r structure, as described in section 3.1. The current state of the thread is maintained in the state field, which is of enumerated type TSTATE, with possible values as follows: typedef enum { RBLK = 0, /* rece ive blocked */ SBLK = 1, /* send blocked */ WTFR = 2, /* wai t i ng f o r r e p l y */ SLEP = 3, /* s leep ing threads */ SEMB = 4, /* semaphore blocked */ IOBK = 5, /* I/O blocked */ THREAD. .FREE = 20, /* not i n use . */ THREAD. .READY = 21, /* ready */ THREAD. .RUNNING = 22, /* running */ THREAD. .DYING = 23 , /* dying */ NOQ = Oxff /* not on any queue */ } TSTATE; The name and namePtr fields are used to hold and refer to the string name of the thread. The thread's stack is maintained using the stmembeg, stkend, sp and stack-Size fields, which indicate the beginning and end of the stack in memory, the current stack pointer, and the stack size, respectively. The threadAddr and threadArg fields 36 are the thread entry point (i.e., the address of the entry routine) and the argument passed into the entry routine. Thread specific user data is stored in the u Val field. The /eue/field indicates the thread level as specified at thread creation and described in section 3.2.1. The following routines, defined in the thread management module, are used to assist in thread management: s t a t i c TCB *f indUnusedTCBO This routine searches the T C B table for a T C B whose inllse field is 0. If found, a pointer to that T C B is returned, otherwise TNULL is returned (TNULL is a null T C B pointer). The last index tried is maintained as a static local variable as an optimization, so the search can continue with the next T C B in the table the next time f indUnusedTCBO is called. s t a t i c void in i tTCB( in t tcbNo, TCB *tcbPtr) This routine initializes a T C B at system startup. Most notably, the inllse field is set to 0, the state is set to THREADJFREE, and the instance number is set to 1 (instance number 0 is reserved for the special name sever thread). s t a t i c void threadIsLeaving(TCB *tcbPtr) This routine is used to call cleanup routines for exiting threads, as defined in other modules. ItcThreadCleanup_() and SemCleanup_() are called. 37 s t a t i c v o i d k i l l C u r r t h r e a d O This routine is called when thread terminates itself, either by returning from its en-try routine or by calling R t t E x i t O or R t t K i l l O , to remove itself from the system. The numbThreads field of _Globals_ is decremented, as is the numbSysThreads field if the thread is a system thread. The th read l sLeav ingO routine is called to do any cleanup for other modules. SysFree_() is used to free the thread's stack. The T C B ' s state is set to THREAD_FREE, the instance number is incremented (and wrapped to 1 if it exceeds 15), and the inUse field is set to 0. If numbThreads is greater than numSysThreads, the next ready thread is scheduled by using DeqReadyThread_() to assign a new curr thread, and then calling SchedThread_() to schedule it. Other-wise, there are no RTTUSR threads left in the system and the RT Threads environment first calls Ca l lEx i tRou t ines_ ( ) to call any user registered exit routines, and then exits using the underlying e x i t ( ) call. v o i d InitThreadMgmt_() This routine is used to initialize thread management. The numbThreads and numb-SysThreads fields of _Globals_ are set to 0, and the i n i t T C B O routine is called for each T C B in the T C B table. v o i d C a l l E x i t R o u t i n e s _ ( ) This routine is called when the RT Threads environment exits naturally. It traverses a linked list of routines previously registered by the using application with the 38 R t t R e g i s t e r E x i t R o u t i n e O A P I call, v o i d Ca l lNclean_( ) Cal lNclean_() is the entry routine used by the RT Threads implementation to start each new thread, the first time it is scheduled to run. The actual entry routine for the thread, as specified in the R t t C r e a t e O call, is explicitly called by Cal lNclean_( ) . When C a l l N c l e a n _ 0 is called, cur r thread will have already been set to the point to the new thread's T C B by the scheduler, so the entry routine address and ar-gument can be obtained from the threadAddr and threadArg fields of the T C B ' s threadMgmtlnfo field. Should the user specified thread entry routine return, the k i l l C u r r t h r e a d O routine is called to remove the thread from the system. Threads are created using the R t tCrea t eO interface routine. First, an unused T C B is obtained using the f indUnusedTCBO call. If none are available, R t t C r e a t e O returns RTTFAILED. Otherwise, the new thread's threadMgmtlnfo field is initialized. The inUse field is set to 1, and the attrs, name, stackSize, threadAddr, threadArg and level are assigned the values passed in as parameters to Rt tCreate () . A stack is allocated for the thread using SysMalloc_() . The IP address and port number of the threadld field of the T C B are set to the values of the hid and lid globals. The thread's state is set to THREAD_READY, and the new field of the T C B ' s schedlnfo field is set to 1, to indicate to the scheduler that this is a new thread. The new thread is now ready to be scheduled. The current time is determined using RttGetTimeOfDay(), and is compared with the starting time attribute of the new thread to see whether it should be put to sleep or made ready. If the starting time has been reached, the thread is made ready by calling EnqueueEdfTail_(), other-wise it is put to sleep by calling SetTimeout_() (see section 4.5). If the thread's 39 level is RTTSYS, the number of system threads (numbSysThreads) is incremented, and the next ready thread is scheduled using the SchedM macro (this will either be the currently running thread, i.e., the one which called R t t C r e a t e O , or the newly created thread, depending on their scheduling attributes). R t t E x i t O and R t t K i l l O can be used to terminate threads. R t t E x i t O terminates the currently running thread by calling k i l l C u r r t h r e a d O . R t t K i l l O first checks to see whether its thread id parameter is the currently running thread, in which case it too calls k i l l C u r r t h r e a d O . Otherwise, it terminates the specified thread .in much the same way that k i l l C u r r t h r e a d O does, except that it must also dequeue the thread from whichever scheduling queue it is on, determined by its state (and priority if it is a ready thread), using the DEQUEUE-ITEMO macro. 4 . 4 Q u e u e M a n a g e m e n t A l l threads except the running thread are maintained in queues. The queues de-scribed in this section are used primarily by the scheduler, and will be referred to in general as scheduling queues. Ready threads are maintained in an array of ready queues, one for each priority level. Queues are implemented as doubly linked lists, for efficient removal of a thread from a queue. The QueueMgmtlnf o structure is shown below. typedef s t r uc t { s t r u c t tcb *prev; s t r u c t tcb *next; 40 u_char queue; } QueueMgmtlnfo; The prev and next fields are links into whichever queue the thread is currently on. The queue field is an 8-bit integer value, corresponding to the current state of the thread, indicating which queue the thread is on. The routines ENQUEUE_HEAD(), ENQUEUE_TAIL () , EN QUEUE _I TEM(), DEQUEUEJffiAD ( ) and DEQUEUEJTEM () , defined as macros for efficiency, allow threads to be placed on or removed from anywhere in a queue. The FindThreadOnQ_() macro is used to determine whether a thread is on a particular queue. A thread is always on a single scheduling queue, unless it is the currently running thread (state THREAD-RUNNING), in which case it is not on any queue. Threads which are ready to run, i.e., are not blocked, sleeping, or running, have state THREAD_READY and are maintained on one of the ready queues. There is one ready queue for each priority level, these queues being maintained in an array called the ready queue table (field rdyqTbl'm the _Globals_ structure). The ready queue module provides routines which place threads in the appropriate position (accord-ing to the E D F rule) on the appropriate queue (i.e., the one corresponding to the thread's priority). Thus, the thread at the head of each priority queue is always the one which should be scheduled next for that priority. The ready queue routines are: v o i d EnqueueEdfTail .(TCB *thread) This routine places a ready thread, specified as a pointer to its T C B , on the ready queue for its priority, behind all threads with an earlier deadline. The thread is 41 placed immediately following (at the tail of) any other threads with the same dead-line. v o i d EnqueueEdfHead_(TCB *thread) This routine places a ready thread, specified as a pointer to its T C B , on the ready queue for its priority. The thread is placed immediately preceding (at the head of) any other threads with the same deadline. This routine is used when a thread is made ready after being blocked sending or receiving a message, or doing external I /O. TCB *DeqReadyThread_(void) This routine returns a pointer to the T C B of the thread which is at head of the highest priority, non-empty ready queue, removing that thread from the queue. TCB *DeqIfReady_() Like DeqReadyThread_(), this routines removes the highest priority thread and re-turns a pointer to its T C B , but only if that thread would preempt the currently running thread. If so, the currently running thread, i.e., the thread indicated by the global variable curr thread, is placed on its appropriate priority queue. Otherwise, the value of cur r thread is returned. v o i d DeleteReadyThread.(TCB *thread) 42 This routine removes the thread specified by thread from its priority queue. Threads which are not ready or running are either sleeping or blocked, and will be on an appropriate queue, having been placed there with one of the macro routines described earlier in this section. Sleeping threads, i.e., those whose start-ing time is in the future, are maintained in the sleep queue, ordered by start time for efficient waking. A l l other threads are on various blocked queues, one each for semaphores, sending, waiting for a reply, receiving and I /O. 4.5 Thread Scheduling The key features of thread scheduling are: • determining which thread should be scheduled to run at any given time • a clock for scheduling future events • a mechanism for context switching, including saving thread state (registers) Determination of which ready thread should be next to run at any given time is handled largely by the queue management routines described in section 4.4. The following sections describe the implementation of the system clock for RT Threads, and how context switches are performed. 43 4.5.1 System Clock A "clock" is required by RT Threads for the purpose of making threads ready at their starting time. This is implemented using an interval timer ( s e t i t i m e r O ) which causes periodic alarm signals to be sent to the process. These signals are caught by a signal handler called the tick handler. The tick handler checks the sleep queue to see whether any threads have a starting time less than or equal to the current time (obtained using RttGetTimeOfDayO). A l l such threads are taken off the sleep queue and made ready. If there is a thread which should preempt the currently running thread, a context switch to that thread takes place immediately, within the signal handler. This means that there may be no actual return from the signal handler for an indefinite period of time, possibly longer than the interval between clock ticks. Thus, in order to ensure that subsequent alarm signals are caught at the correct time, the signal mask for the process must be modified before such a context switch in order to unblock alarm signals. The following scheduling module routines are used to implement the clock and related functions: s t a t i c i n t wakeSleepersO This routine is called by the tick handler to take all threads off the sleep queue whose starting times have passed, and make them ready by putting them on their respective ready queues. RttGetTimeOfDayO is called to get the current time for comparison, and the sleep queue is traversed using the queue management macros, dequeuing threads until one whose starting time is still in the future is encountered. 44 Threads are ordered in the sleep queue by starting time, so it is not necessary to search the entire queue. Each thread which is dequeued from the sleep queue is put on its ready queue using EnqueueEdfTail_(). wakeSleepersO then returns the number of threads which were awakened. s t a t i c v o i d t i c k H a n d l e r O The tick handler is a routine registered with the operating system to to be called at regular intervals. It is a signal handler which catches the SIGALRM signal. When it is called, the inOsFlag is first checked to see if it set. If so, the tickPending field of _Globals_ is set, to defer the signal handling, and t i ckHand le r returns imme-diately. If not in a critical section, it calls wakeSleepersO to wake (make ready) any threads whose starting time has passed. If wakeSleepersO returns a non-zero value, at least one thread has been awakened, meaning that a context switch may be required. The SchedM scheduling macro (see section 4.5.2) is called to perform such a switch, if required. However, while in the signal handler, the underlying operating system blocks SIGALRM signals until the signal handler routine returns. If a context switch occurs, the routine will not return for an indefinite period of time, during which the SIGALRM signal would continue to block, thus delaying future clock ticks indefinitely. To overcome this problem, the SigMask_() routine (see below) is called to unblock the SIGALRM signal before calling SchedM. i n t SetTimeout_(TCB *thread, RttTimeValue wakeTime) The SetTimeout_() routine is used to put a thread to sleep by placing it on the time ordered sleep queue. The thread's state is set to SLEP, and its starting time is set to 45 wakeTime. The sleep queue is traversed using the queue management macros until a thread with a later starting time is encountered, and the new sleeper is linked into the queue ahead of that thread, unless the end of the queue is reached, in which case the new thread is linked to the tail of the queue. v o i d SigMask_(int how, i n t s i g ) SigMask_() is a convenience routine used to block or unblock signals. The how pa-rameter is used to select whether the signal is to be blocked or unblocked, taking as a value either SIGJ3L0CK or SIG_UNBLOCK. The sig parameter is the signal to block or unblock, for example SIGALRM. The underlying sigprocmaskO call is used. v o i d S t a r t C l o c k _ ( i n t useconds) Star tClock_() is used to start the system clock. It uses the s e t i t i m e r O call to initiate the generation of SIGALRM signals at regular intervals, that interval being specified in microseconds via the useconds parameter. v o i d Star tOs_() StartOs_() is used to initiate the signal related functionality of the system. The tick handler and SIGIO handler (see section 4.9) are registered with the operating system using the s i g a c t i o n O call, and the RT Threads system clock is started by calling S ta r tClock_( ) . 46 4.5.2 Context Switching The following data structure, Schedlnfo, is the type for the schedlnfo field of a T C B . typedef s t ruc t { i n t savearea[36]; i n t new; }• Schedlnfo; The savearea field is a statically allocated piece of memory which is used to store the register state of a thread when execution switches to the context of another thread. Its size, 144 bytes, is large enough to accomodate the necessary registers for all of the architectures on which RT Threads is implemented. The new parameter is a flag used to determine whether a thread is being scheduled to run for the first time. The routines which perform the actual context switches are implemented in assembler code because they require access to the registers for saving thread state, and because they employ a "trick" involving return addresses1. The assembler code implements 3 functions used to achieve user level context switching: v o i d startNewThread(void (*func)() , v o i d *sp) This routine is used to start a new thread. It sets the initial stack pointer for the thread to point to sp, the beginning of the stack memory for the thread as allocated in R t t C r e a t e O , and then jumps to June, which is the address of the entry routine 'Solaris provides support for user level context switching (setcontext () and getcontext). This support was not used in the RT Threads implementation, in part because it is not portable to other operating systems. 47 for the new thread (in reality, the address of Cal lNclean_() is passed in, so that this routine can do cleanup if the actual thread entry routine returns). i n t saveThreadContext(void *savearea) The saveThreadContextO routine is used to save the state (i.e., the registers) for a thread prior to a context switch. It takes a parameter savearea which is a pointer to memory which is used to save the current state of the thread. The savearea field of the T C B ' s schedlnfo field is passed in for this purpose. When called to save the state of a thread, saveThreadContextO returns a non-zero value, so that the context switching code will know that another thread must be resumed. v o i d returnToThread(void *savearea) The returnToThreadO routine is used to resume a thread which had at some point previously saved its state using saveThreadContextO. The contents of the memory referred to by its parameter savearea are loaded into the registers. returnToThreadO does an "unnatural" return, i.e., the function returns to the same address as the last call to saveThreadContextO for that thread did, except this time the return value is 0. Thus the context switching code can tell that a thread is being resumed and it should continue with execution of user code for that thread. The following routines are also involved in context switching: 48 v o i d SchedThread_() This routine is called to perform a context switch, by calling the appropriate assem-bler routine. When called, cur r thread must point to the T C B of the next thread to run. The state of curr thread is set to THREAD_RUNNING. Then, if the new field for the thread is set, startNewThreadO is called, and otherwise returnToThreadO is called resulting in the actual context switch. SchedThread_() never returns. SchedM The SchedM routine is defined as a preprocessor macro for efficiency. It is called by the RT Threads kernel whenever there is the possibility that a context switch is needed, and it calls the appropriate routines to effect the context switch if required. It must be called within a kernel critical section (i.e., ENTERINGOS must have been called previously), and LEAVINGOS is called within the macro. When it is called, the current thread must either be in the THREAD_RUNNING state, or one of the blocked states (e.g., SEMB). In the latter case, the thread will have just been put in a blocked state by the kernel code, and SchedM will put the thread on the appropriate schedul-ing queue. A context switch will necessarily occur, because the current thread is now blocked. On the other hand, if the current thread was in the THREAD_RUNNING state before the SchedM call, a context switch might not occur, since there might not be a ready thread which would preempt the current one. Deqlf Ready_() is called to determine the next thread to run. If it returns cur r thread no context switch takes place, otherwise saveThreadContext () is called to save the state of the cur-rent thread in preparation for a context switch. The saveThreadContext () call will return a non-zero value, and this condition will result in cur r thread being set to 49 the T C B pointer returned by Deqlf Ready_(). The same procedure is followed if the originally current thread was in a blocked state, except that DeqReadyThread_() is used to obtain the T C B pointer for the next thread to run, since it is known that it will not be curr thread. At this point SchedThread_() is called to perform the context switch. The saveThreadContextO call is in a conditional statement, which, when the return value is zero, results in a LEAVINGOS call and a return to the code which called SchedM. This occurs when the originally preempted thread is scheduled to run again. 4.5.3 M o d i f y i n g Schedul ing A t t r ibu tes The RttSetThreadSchedAttrO A P I routine can be used to modify the scheduling attributes of a thread. This can result in an immediate context switch. If the thread is sleeping, it is first removed from the sleep queue with the DEQUEUE_ITEM() queue management macro and its state is set to THREAD_READY. The sleeping thread is dequeued because if the thread is to remain sleeping, but with a modified starting time, it will need to be re-enqueued on the sleep queue to ensure it ends up in the correct position in the time ordering. If the thread is on a ready queue, it is re-moved from that ready queue with DeleteReadyThread_() because it may change state or priority. RttGetTimeOfDayO is then called to determine the current time for comparison. If the new starting time for the thread is in the future, the thread's scheduling attributes are assigned the values indicated by the attr parameter, and if the thread was running or ready, it is placed on the sleep queue using SetTimeout_() and SchedM is called to schedule the next ready thread. Otherwise, if the thread having its attributes set is the currently running thread, its attributes are simply 50 assigned their new values and SchedM is called. If the thread was not the currently running thread and its state is now THREAD_READY (because it had been ready or sleeping), it is put on the appropriate ready queue with EnqueueEdf T a i l _ ( ) SchedM is called to schedule the next ready thread (which may or may not be the one just made ready). 4 . 6 S e m a p h o r e s The publicly visible type for semaphores is the opaque type RttSem (defined as a vo id* in the interface). In the semaphore implementation, the RttSem type maps to the private type SemaphoreStruct, i.e., RttSem variables passed into the semaphore routines are actually pointers to SemaphoreStructs, and are cast accordingly within the routines. The SemaphoreStruct type is a structure defined as follows: typedef s t r u c t semaphore { i n t magicNumb; TCB *f i r s tBlockedThread ; TCB *las tBlockedThread; i n t semaphValue; i n t mode; } SemaphoreStruct; The magicNumb field is a magic number used to ensure that the semaphore pointer is valid. The firstBlockedThread and lastBlockedThread fields are T C B pointers for the first and last thread in the queue of threads blocked on the semaphore. The semaphValue field is the current value of the semaphore. The mode field indicates 51 the mode of semaphore, either first come first served (RTTFCFS) or priority based (RTTPRIORITY). Semaphores are created using the Rt tAl locSemO routine, which allocates a SemaphoreStruct dynamically using SysMalloc_() , and initializes its fields (the semaphValue and mode values are obtained from the value and mode parameters). The pointer to the allocated memory is cast to RttSem and returned by reference. RT Threads semaphores are destroyed by calling the R t t F r e e O routine, which frees the memory allocated for the semaphore using SysFree_(). R t t F r e e O first checks, however, whether the semaphore value is less than zero, indicating that threads are blocked on the semaphore, in which case it returns RTTFAILED without freeing the semaphore. Threads which are blocked on a semaphore maintain the pertinent informa-tion in the semlnfo field of the T C B , which is a structure of type: typedef s t r u c t { s t ruc t tcb *semaphNext; s t ruc t tcb *semaphPrev; RttSem semaphore; } Semlnfo; The semaphNext and semaphPrev fields are links used to place the thread in the queue of threads blocked on a particular semaphore. The semaphore field is the semaphore which the thread is blocked on (or NULL if the thread is not blocked on a semaphore). 52 The following are routines local to the semaphore module used to add and remove threads from a semaphore's queue of blocked threads: s t a t i c v o i d addBlockedThread(SemaphoreStruct *semaph) This routine links the currently running thread (currthread) into the blocked queue for the semaphore indicated by the pointer semaph. Links from the semlnfo field of the thread's T C B are used. The lastBlockedThread field of semaph is set to be curr thread, placing the thread at the tail of the queue. s t a t i c v o i d removeThread(SemaphoreStruct *semaph, TCB *tcbPtr ) This routine removes the thread indicated by tcbPtr from anywhere in the blocked queue for semaph. This is an efficient operation because the T C B has the previous and next links which locate the thread in the queue. The removeThread routine is used when a thread waiting on a semaphore becomes unblocked (due to an Rt tV() call on the semaphore), or when a semaphore blocked thread is terminated. s t a t i c TCB *dequeueFirstBlockedThread(SemaphoreStruct *semaph) This routine calls removeThread() with the firstBlockedThread field of semaph to remove the first blocked thread from the semaphore's blocked queue. It is used for first come first served mode semaphores. s t a t i c TCB *dequeueHighPrioBlockedThread(SemaphoreStruct *semaph) 53 This routine searches the blocked queue for semaph to determine which thread has the highest priority. If more than one thread has this priority, the one with the earliest deadline is chosen, and if there is more than one such thread, the one closest to the first blocked thread is chosen. The resulting thread is then removed from the queue using removeThreadO. This routine is used for priority mode semaphores. v o i d SemCleanup_(TCB *tcbPt r ) SemCleanup_() is called by a terminating thread to do semaphore related cleanup. If the thread indicated by tcbPtr is blocked on a semaphore, it is removed from the semaphore's blocked queue using removeThreadO, and the semaphore's value is incremented to reflect the fact that one less thread is blocked on it. Threads can wait on or signal a semaphore using the Rt tP( ) and Rt tV() A P I routines. Each takes a semaphore variable (type RttSem), previously created with R t tA l locSemO, as its single parameter, and casts that parameter to a pointer of type SemaphoreStruct* so it can access the semaphore's state. Each also checks the semaphore's magic number to ensure it is a valid semaphore. In R t t P ( ) , the semaphore's value is first decremented. If the new value is less than zero, the thread will block on the semaphore. If this is the case, addBlockedThreadO is called to put the thread (i.e., the currently running thread, currthread) at the tail of the blocked queue for the semaphore. Then the state of the thread is set to SEMB (semaphore blocked), and the scheduling macro SchedM is called to put the thread on the semaphore blocked queue (the scheduling queue, 54 not to be confused with the semaphore's own blocked queue) and schedule the next ready thread (see section 4.5). If the new value of the semaphore is not less than zero, the thread does not block and Rt tP( ) returns immediately. Rt tV() increments the semaphore's value. If the resulting value is less than or equal to zero, then there was at least one thread blocked on the semaphore and thus a blocked thread should be made ready. If the semaphore's mode is RTTFCFS, then the thread to be made ready is chosen with dequeueFirstBlockedThreadO, and if the mode is RTTPRIORITY, the thread is chosen with dequeueHighPrioBlockedThreadO. The chosen thread is made ready with EnqueueEdfTail_(), and the scheduling macro SchedM is called. If the newly ready thread has scheduling attributes which take precedence over those of the currently running thread, the current thread will be preempted immediately. 4.7 Inter Thread Communication Information used for RT Threads' message passing facility is maintained in the itclnfo field of the T C B of each thread. This field is a structure of type ITCinf o, defined as follows: typedef s t r uc t { i n t remote l tc ; Rt tThreadld remoteThreadld; v o i d *sendmsg; u_ in t sendlen; 55 struct tcb *peer; void *receivedmsg; u_int receivedlen; struct tcb *waitingFirst; struct tcb *waitingLast; struct tcb *waitingNext; struct tcb *waitingPrev; > ITCInfo; The first field, remoteltc, is a flag indicating whether the thread is currently in the midst of a cross address space message passing transaction. The second field, re-moteThreadld is the thread id of the remote sending thread in such a transaction (see section 4.7.1). The fields sendmsg and sendlen are the message buffer and buffer length for the request message, maintained by the sending thread in the event that the intended receiver is not blocked receiving a message at the time the message is sent. The peer field is used to communicate the identity of the sending thread to the receiving thread. The receivedmsg and receivedlen fields are the buffer and buffer length for depositing the request message at the receiving thread, and for depositing the reply message at the sending thread. The remaining fields are used to implement a queue of waiting senders at a receiving thread, for the case where multiple threads send to a single thread before it does a receive. The waitingFirst and waitingLast fields are used to mark the ends of the queue in the intended receiver's T C B . The waitingNext and waitingPrev are links in the T C B of each of the sending threads in the queue. 56 The following local routines are defined in the ITC module: s t a t i c v o i d addWaitingSender(TCB *to) This routine uses the links in the itclnfo field of a sending thread's T C B to place the thread in the queue of waiting senders for an intended receiver thread, as specified by the to parameter. s t a t i c v o i d dequeueSender(TCB *tcbPtr) This routine dequeues the first sender thread from the queue of waiting senders maintained at a receiving thread. A first come first served policy is used for enqueu-ing and dequeuing senders. s t a t i c TCB *getTcbFromRemoteThreadId(RttThreadId thread) This routines maps a remote thread id to the T C B in which it is stored. The T C B table is searched until a match is found. v o i d ItcThreadCleanup_(TCB *tcbPtr ) This routine is called to perform ITC related cleanup when a thread terminates. If any sender threads are on the waiting sender queue of the thread referred to by tcbPtr, they are taken off the scheduler's send blocked queue and made ready. 57 The RttSendO routine first compares the IP address and port number in the destination thread id (determined using the GetlpAddrFromThreadldO and GetPortNoFromThreadldO macros) with the local RT Threads environment's host id and local id to see whether the intended receiver is a local thread. If they differ, the receiver is in another threads environment and ExecuteRemoteRequest_() is called to perform the cross address space communication (see section 4.7.1). Other-wise, the receiver is local, in which case a pointer to the T C B of the intended receiver is obtained to do the processing of the local send (the sending thread is the currently running thread, i.e., curr thread) . The macro getTcbFromThreadldO is used to convert a thread id to a T C B pointer. This is done with the to parameter thread id, unless its instance number is 0 which indicates that the intended receiver is the local name server thread (see section 4.7.1), in which case getTcbFromThreadldO is used with the globally stored name server thread id to determine its T C B pointer. This will or occur either if the nameserver thread is the destination in a cross ad-dress space communication, in which case this will be the local proxy send, or if the nameserver thread id was obtained locally with the RttGetNSO call. The re-ceivedmsg field for the sender thread is then set to the reply buffer address, passed in as the rData parameter to Rt tSendO. The FindThreadOnQueue_() macro is used to see if the the intended receiver is on the receive blocked (RBLK) scheduling queue. If so, the receiver will have set the receivedmsg and receivedlen fields of its itclnfo structure, and the sender copies the request data (parameter sData) into the receiver's buffer. The length of the copy is the minimum of the send length specified by the sender (parameter slen), and the maximum receive length specified by the receiver (the value in the receivedlen field). The sender then sets itself as the receiver's peer, sets its own state to be waiting for reply (WTFR), makes the receiving 58 thread ready by changing its state to THREAD-READY and putting it on its ready queue using EnqueueEdf Head_(), and then blocks itself and makes the next thread ready (possibly the receiver) by calling the scheduling macro SchedM. In the event that the intended receiver exists but is not blocked receiving, the sender, puts itself on the receiver's queue of waiting senders using addWaitingSenderO, sets its own sendmsg and sendlen fields for the receiver to eventually copy the request message from, sets the receiver as its peer, sets its own state as send blocked (SBLK) and calls SchedM to block itself and schedule the next ready thread. Note that Rt tSendO will always block. When the sending thread eventually becomes unblocked, it checks its own receivedlen field to find out the length of the reply message, as set by the replying thread, and returns (the reply length is returned by reference). In R t t R e c e i v e O , the receiving thread starts by checking the waitingFirst field of its T C B ' s itclnfo field. If not TNULL, there is at least one message waiting, and the thread can pick it up right away without blocking. I.e., the waitingFirst field will be the T C B pointer for the next sender, and the receiver copies the mes-sage stored in its sendmsg field into its own receive buffer (the Data parameter to R t t R e c e i v e O ) , the copy length being the minimum of the send length (specified in the sender's sendlen field) and the receive length (the dereferenced value of the len parameter to R t tRece iveO) . The receiver then changes the sender's state from send blocked (SBLK) to waiting for reply (WTFR), switching it to the appropriate scheduling queue, and then returns. In the event that there are no messages waiting for the receiver, it sets its own receivedmsg and receivedlen fields to indicate the buffer to eventually be filled with a request message, then sets its state to receive blocked (RBLK) and calls SchedM to block itself and schedule the next ready thread. 59 RttMsgWaitsO simply checks waitingFirst for the calling thread, returning 0 if it is NULL and 1 otherwise. R t t R e p l y O first checks the thread id to which the reply is being sent (pa-rameter sndr) to see if it is a local or a remote thread. If it is a local thread, its T C B pointer is obtained using the getTcbFromThreadldO macro, otherwise the getTcbFromRemoteThreadldO routine is used (this maps the remote thread id to the T C B for a proxy thread which does a local send for the remote thread). In the local case, FindThreadOnQ_() is used to see whether the destination thread (i.e., original sender) is actually in blocked waiting for a reply. If not, RTTNOTBLOCKED is returned. Otherwise, the reply is copied into the receievedmsg field in the sender's T C B , the sender is made ready by setting its state to THREAD_READY and calling EnqueueEdfHead_(), and SchedM is called. R t t R e p l y O never blocks, but there may be an immediate context switch to the sender if it has a higher priority or equal priority and earlier or equal deadline. 4.7.1 Cross Address Space ITC The cross address space ITC facility is for the most part built on top of RT Threads, but since it makes use of a few of the kernel's facilities to make message passing location transparent, it is considered part of the RT Threads kernel. Location trans-parency is achieved by embedding each thread's location in the Rt tThreadld, and extracting it within the Rt tSendO call. T C P / I P is used as the underlying protocol for sending messages across address space boundaries. Thus, if within the Rt tSendO 60 call it is determined that the Rt tThreadld of the intended receiver refers to a thread with an IP address and port number for an RT Threads environment other than the local one, ExecuteRemoteRequest_() is called to set up a T C P connection to the remote environment. For receiving messages from remote senders, each RT Threads environment can enable an ITC server system thread which listens for incoming connection re-quests, and spawns worker threads which forward ITC messages to the appropriate local thread using a local R t tSendO. The following data structures are used by the cross address space ITC implementation: s t ruc t ITCinfo { u_int s l e n ; u_int r l e n ; Rt tThreadld sender; Rt tThreadld r e c e i v e r ; } ITCinfo ; The ITCinfo structure is used for transmitting control information. It precedes the actual message data in every transmission from sender to receiver. The slen field is the message data length, and rlen is the maximum length reply the sender will accept. The sender field is the sender thread id (which will be returned to the receiving thread by the R t tRece iveO call at the remote end) and r ece ive r is the thread id of the intended receiver, so it can be located when the message gets to the remote threads environment. s t r uc t Reply lnfo { 61 i n t retCode; u_in t r ep lyLen ; } Rep ly ln fo ; The Reply lnfo structure is used to transmit control information back to the sender along with the reply message. The retCode field is used to relay the return code for the remote RttSendO back to the sender, and the replyLen indicates the length of the reply. The R t t N e t l n i t O A P I routine is used to enable cross address space ITC. It sets the host id (hid) and local id (lid) for the local RT Threads environment according to the ipAddr and port parameters passed to it. Either or both of these parameters may be zero, in which case system chosen values are used. In the case of the IP address, the underlying gethostbyname() call is used to determine the local IP address (which may not be the desired one if the host is multi-homed). In the case of the port number, the underlying b i n d O call is used to determine the port, if one is not chosen by the application. The routines of the remote ITC module which implement cross address space message passing are described below: i n t ExecuteRemoteRequest_(RttThreadld t o , u_in t s l e n , v o i d *sData, u_ in t * r l e n , v o i d *rData) This routine is used to perform a remote send. The parameters are the same as those passed to the Rt tSendO call from which ExecuteRemoteRequest_() is called. A socket is created, bound to a system chosen port, and used to connect (using Rt tConnectO) to the IP address and port embedded in the to parameter thread id. 62 An ITCinf o structure is then filled in with the slen and dereferenced rlen parameter values, the thread id of the currently running thread for the sender, and the receiver being the to parameter. This control information structure is byte order encoded for network transmission using the locally defined encodelnfoO routine, which converts each field to network byte order using n t o h l ( ) . The control information structure (variable info) is then sent to the intended receiver using R t t W r i t e N O , followed by another Rt tWr i t eNO call to send the actual message data. The mes-sage is considered to be a stream of bytes, and any byte order considerations are the responsibility of the higher level application calling Rt tSendO. At this point ExecuteRemoteRequest_() will block waiting for the reply from the remote receiver. First RttReadNO is called to receive the reply control information, which is read into a Reply lnfo structure. If the return value (retCode) value is not RTTOK, the T C P socket is closed and the return code value is returned to Rt tSendO, where it is returned to the original caller. Otherwise, the replyLen field is used to deter-mine how many more bytes are to be read from the connection in a subsequent RttReadNO call, which reads the data into the rData buffer supplied by the original caller. The dereferenced value of the rlen parameter is also set to the reply length. The T C P connection is then closed and RTTOK is returned. s t a t i c RTTTHREAD i t c S e r v e r O This routine defines the ITC server thread, created in the R t t N e t l n i t O call. The ITC server first calls R t t L i s t e n O on the local socket descriptor (also created by R t t N e t l n i t O ) , to listen for incoming connection requests from other RT Threads environments doing remote sends to this one. The server then goes into a loop, first calling Rt tAccept ( ) to accept any incoming connections. Whenever Rt tAccept ( ) 63 returns with the socket descriptor for a new connection it calls R t t C r e a t e O to cre-ate an ITC worker thread (see the i t cWorke rO description below), passing it the socket descriptor as its argument. s t a t i c RTTTHREAD i tcWorker (vo id *arg) A n ITC worker thread is created each time a new connection corresponding to an incoming remote send request arrives, leaving the ITC server available to handle more incoming requests. The arg argument of i tcWorker contains the socket de-scriptor for the T C P connection on which the request control information and data is being sent. RttReadNO is first called to receive the control information into an ITCinfo structure, the contents of which are then decoded to host byte order us-ing the locally defined decodelnfoO routine. The slen field from that structure is then used to determine how long the message data will be, so that a buffer (local variable sData) can be allocated for it and it can be read off the T C P connection using RttReadNO, providing it is not a zero length message, in which case sData is set to NULL. At this point the ITC worker thread prepares to do a local send, with the ITC worker acting as a proxy for the remote thread originally doing the send. Rt tSendO is used to do the local send. However, the corresponding local R t t R e c e i v e O needs to return the thread id of the remote sender, and not that of the ITC worker (proxy), as would normally be the case. To accomplish this, the remoteltc field of the ITC worker's itclnfo T C B field is set to 1, and the re-moteThreadld field is set to the thread id of the remote sender, as obtained from the sender field of the control information structure received on the T C P connec-tion. Within the local R t tRece iveO implementation, before the sender thread id is returned to the receiving thread (via its reference paramater), the remoteltc flag 64 is checked, and if set the remote ThreadId thread id is returned instead of that of the local sender. Once the local RttSendO returns to the ITC server, the ITC server can send the reply to the remote sender. The retCode field of a Rep ly ln f o structure is assigned the return value of the RttSendO call. If the return value is RTTOK, the replyLen field is set to the reply length returned by Rt tSendO, otherwise it is set to zero. The fields of the Rep ly ln f o structure are then converted to network byte order using h t o n l O , and it and the reply message are sent to the remote sender using R t t W r i t e O . The T C P connection is then closed and the ITC worker thread terminates. The cross address space ITC facility uses a name server thread mechanism so that a thread in one RT Threads environment can locate (i.e., find out the thread ids of) threads in other environments. RT Threads does not provide an actual name service, but rather provides a way to locate a single well known thread in any given RT Threads enviroonment, such an environment itself being uniquely identified by its IP address and port number, as set with the R t t N e t l n i t O call. The mecha-nism used to identify the special name server thread is give it an instance number of 0, while no other threads are allowed to have an instance number of 0. This is done by the Rt tCreateNSO call. A remote thread can use RttGetNSO to construct (without doing any communication) a thread id containing the IP address and port number of the destination threads environment. When RttSendO is called with this thread id, the remote receiver will notice the 0 instance number and send the message to the well known name server thread there. This is actually done within the local Rt tSendO call at the remote receiver. 65 4.8 Dynamic Memory Allocation The RT Threads routines for dynamic allocation of heap memory are implemented simply as wrappers around the standard C library routines m a l l o c O , c a l l o c O , r e a l l o c O and f r ee ( ) . The corresponding A P I routines add protection from con-text switches (using ENTERINGOS and LEAVINGOS), because the underlying library routines are not necessarily re-entrant. Private (to the RT Threads kernel) rou-tines SysMalloc_() and SysFree_() are also provided for memory allocation within the kernel, and must be called from within a critical section (i.e., kernel code pro-tected with ENTERINGOS and LEAVINGOS). These routines simply call the underlying m a l l o c O and f ree ( ) , but are provided to facilitate alternate implementations. 4.9 External I/O The asynchronous I /O features of U N I X are used to implement asynchronous I/O in RT Threads, i.e., all file (and socket) descriptors are set to be non-blocking and to have a SIGIO signal generated whenever I/O is complete. When an I /O routine is called by a thread (e.g., Rt tReadO), the underlying U N I X system call (e.g., r eadO) is called. If that system call returns with an indication that the call would block, the thread is placed on the I /O blocked (IOBK) queue. SIGIO signals are caught by the I/O handler routine, which uses the s e l e c t ( ) 2 system call to determine which file descriptors have completed I /O. Each blocking A P I routine has a corresponding local routine which is called to complete the I/O operation when it is known that the descriptor is ready. For example, the read() system call is called in the RttReadO 2For Solaris, the po l l ( ) system call is used. 66 routine to read from the file or socket indicated by the file descriptor passed to it. If the read() call returns with an error and errno indicates that the call would have blocked, the calling thread is placed on the I /O blocked queue and changes its state to IOBK. When the I /O is ready, the SIGIO handler will catch a SIGIO signal and the local doReadO routine will be called to call read() again. If it completes successfully, the thread will be made ready again and a context switch will occur if the thread should run. The following structure, which is maintained as the iolnfo field of the T C B , is used to communicate the I /O operation to be performed and its parameters to the lower level I /O routine (e.g., doReadO). typedef s t r uc t { char ioDp; i n t ioRes; char * ioBuf ; i n t ioNumBytes; i n t i o F l a g s ; char • i o S r c B u f ; i n t • i o S b l P t r ; i n t ioSeekPos; i n t ioSeekMode; i n t i o A l ; i n t ioA2; i n t i oA3; i n t ioA4; i n t i oA5; 67 } IOInfo; The ioOp field indicates the I /O operation which is to be performed. Constants are assigned to indicate various I /O operations, for example the SNDM constant flags the sendmsgO operation. The ioRes field is used to communicate the result of the underlying I /O operation back to the calling thread. The other fields of the IOInfo structure are used to communicate various parameters to the lower level routine. The ioAl, ioA2, etc., fields are general purpose fields used for any I /O call argu-ments, except that ioAl is always used for the file descriptor argument. s t a t i c i n t e n a b l e S i g i o ( i n t fd) This routine uses i o c t l O system calls to mark the given file descriptor for non blocking I /O and asynchronous I /O notification with the underlying operating sys-tem. This means that normally blocking system calls on the file descriptor will not block, but will return an error value and set the errno system global to indicate that the call would block (EWOULDBLOCK or EAGAIN, depending on the version of U N I X ) . It also means that whenever I /O is ready for completion at the operating system level, for example if data has arrived on the network for a given socket, a SIGIO signal will be generated, and a subsequent call for the associated file descriptor will not block. v o i d I n i t I 0 _ ( ) In i t I0_( ) is called to initialize I /O for an RT Threads environment. It clears the global signal masks and then sets descriptors 0, 1 and 2 (standard input, standard 68 output and standard error) for f dmask. It also enables SIGIO generation for stan-dard input by calling e n a b l e S i g i o O . i n t checkBlkIoOpera t ions( in t se lected) This routine traverses the I /O blocked queue, and for each thread in the queue, checks whether the appropriate global signal mask (WIOMASK or RIOMASK) has been set for its file descriptor (which is in the ioAl field of the T C B ) . If so, I /O is ready for completion and the appropriate lower level I /O routine can be called. A switch is done on the ioOp field of the thread's T C B to determine the appropriate signal mask to check (e.g., wiomask for output I /O such as w r i t e O or sendmsgO), and the appropriate lower level routine to call (e.g., doWri teO for an ioOp of WRITE). The number of operations which were completed is returned. v o i d IoHandler_() This routine defines the SIGIO handler, registered to be the signal handler for SIGIO signals. When called, the global inOsFlag is first checked to see if the in-terrupted thread was in a critical section in the kernel. If so, the handling must be deferred, which is done by setting the ioPending field of _Globals_ and return-ing immediately. Otherwise, the signal handler enters a critical section by calling ENTERINGOS. Then, s e l ec t () is called to determine which file descriptors (hence, which threads) have I /O ready for completion. The descriptors to be checked are maintained in a global (to the I /O module) mask, called f dmask, which is of type sigset_t . Two other masks global to the module, riomask and wiomask are set to 69 fdmask and passed to s e l e c t ( ) to determine which file descriptors are ready for reading (riomask), and which are ready for writing (wiomask). If any descriptors are selected, checkBlk loQpera t ionsO is called to complete the operations. At this point SIGIO signals are unblocked using SigMask_(), and SchedM is called in case one of the threads whose I /O was just completed would preempt the thread which was interrupted by the handler. 4.10 Mapped Implementation Neither Win32 nor Windows N T provide an equivalent to the U N I X system call s e t i t i m e r O for generating alarm signals at a regular interval. Win32 does provide similar functionality with its multimedia timer facilities, allowing a callback routine to be called at regular intervals. Thus the tick handler could be implemented as one of these multimedia callbacks. Unfortunately, callbacks will not occur if there is an outstanding callback which has not returned. Thus a context switch within the tick handler would effectively cause the scheduling clock to stop until that thread ran to completion. This could easily lead to incorrect scheduling behaviour (i.e., a thread with higher scheduling precedence might miss its "wake up call"). A suitable solution was not found, which provided the motivation for trying a new approach, the mapped model. In the mapped model, RT Threads is implemented on top of an existing threads package, with each RT Threads thread mapped directly to a thread native to the underlying package. Thus the RT Threads implementation can take advan-tage of some of the features of the underlying package, such as context switching 70 and asynchronous I /O among threads. A mapped implementation of RT Threads was done for both A I X 4.1 PThreads and for the Win32 Software Development K i t 3 . The Win32 implementation is described here, i.e., those features which differ from the native host implementation. A l l of the underlying Win32 threads are suspended at any given time, ex-cept the one underlying the currently running RT Thread. Context switches are performed by suspending the currently running thread's underlying Win32 thread, and resuming that for the new thread to be scheduled. The tick handler runs as a separate Win32 thread which executes in a loop blocking on a Win32 semaphore on each pass. The multimedia timer callback routine simply signals this semaphore to effect the clock tick. The tick handler can then perform context switches by sus-pending one thread and resuming another. Note that the RT Threads scheduler is still used and not the Win32 scheduler, since the Win32 scheduler does not give the desired scheduling behaviour. Mapping to underlying Win32 threads is used merely as a technique for performing context switches. Not all context switches occur in the tick handler. Often a thread yields con-trol itself, for example by blocking on a semaphore. In this case the thread cannot suspend itself first and then resume the next thread. Instead, it first resumes the next thread and then suspends itself. This could, however, cause a context switch before it actually suspends itself. To overcome this problem, the thread first boosts its Win32 priority before resuming the next thread to ensure that a context switch does not take place until it suspends itself. When the original thread is resumed 3Win32 provides portability between Windows NT and Windows 95 71 again, it lowers its priority again to the normal priority reserved for the underlying Win32 threads. Mapping RT Threads to Win32 threads made it possible to take advantage of Win32's asynchronous I /O among threads. I.e., this made it possible for the RT Threads I /O routines to simply call the underlying Win32 I /O routines (e.g., _read()) and block if necessary rather than using an I /O queue. Additional schedul-ing is required to resume another thread before the actual I /O call (in case it blocks), and to ensure that the correct thread is scheduled to run when I /O completes, since during the I /O call (and immediately before and after), more than one underlying Win32 thread is in an unsuspended state. 4.11 Native Microkernel Implementation There were at one time plans to implement RT Threads as a stand-alone kernel on top of bare hardware for the Intel Pentium architecture. The motivation for this was to provide a very efficient environment for implementing the Continuous Media File Server, one in which real time guarantees could be provided. However, the Kea microkernel operating system had already been developed at U B C , allowing the RT Threads scheduler to be dropped into the kernel, so that Kea's threads would be scheduled according to RT Threads' scheduling policy. This put RT Threads close to the hardware, made it possible to make the RT Threads environment the only process4 in the system, and eliminated the need to develop new operating system support from the ground up. 4 domain, in K e a terminology 72 C h a p t e r 5 P e r f o r m a n c e The performance of a threads package can be measured on a number of criteria. The most straightforward is measurement of the overhead of specific calls and threads related functional aspects. The term functional overheads will be used in this thesis to describe such overheads. The most important of these are for synchronization calls (e.g., for mutexes), for thread creation, and for context switching. Less straightforward is the impact on overall application performance of using a threads package. While functional overheads play a part in overall performance, their effect may be insignificant compared to performance improvements due to con-currency achieved by using threads for blocking I /O calls, so that computation may continue while I /O requests are being serviced by external hardware. The perfor-mance of a particular threaded application may be highly dependent on the nature of the application and the way it is implemented (i.e., its decomposition into con-current tasks). It is difficult to choose representative benchmark applications which will show performance differences due to differences in scheduling policies or imple-73 Hostname Operating System Architecture Stephen SunOS 5.5.1 SPARC2 rickards SunOS 5.5.1 Pentium 166 cascade SunOS 5.5.1 Ul t raSPARC ice SunOS 4.1.3C S P A R C 10 blackdog A I X 4.1 PowerPC 601 appaloosa A I X 3.2.5 PowerPC 601 dsg A I X 3.2.5 PowerPC 601 monteith Windows N T 3.51 Pentium 100 grouse Windows N T 4.0 Pentium 200 sumas Windows N T 4.0 Pentium 200 asp Windows 95 Pentium 120 robson-win95 Windows 95 Pentium 200 Table 5.1: Host operating systems and architectures mentations between different threads packages. Instead, the approach taken here is to try to measure how well RT Threads performs in terms of meeting its own real time scheduling criteria. In the tests described in this section, performance measurement times were averaged over several iterations, and unrelated overheads were subtracted where possible. A l l times listed in the tables are in microseconds. Table 5.1 shows a list of hostnames of the machines used in the tests, and their operating systems and architectures. 5.1 Functional Overheads The following sections describe experiments used to measure the functional over-heads of thread creation, semaphore calls and context switching for RT Threads. Measurements were made for other threads packages as well, for comparison. 74 5.1.1 Thread Creation The importance of thread creation performance depends on the nature of the ap-plication using threads. If only a few long lived threads are created, then creation time is not critical. However, if threads have a low overhead for creation ("cheap" threads), implementation approaches which make liberal use of dynamic thread cre-ation may become practical. An example would be protocol stacks which allocate a thread per packet for processing. The results for thread creation testing are summarized in table 5.2. Thread creation times for RT Threads were highly dependent on the size of the stack allo-cated (i.e., they reflect the time taken to allocate the stack using m a l l o c O within the call). Solaris Threads and Pthreads allow a stack to be allocated by the user, and stacks were allocated using m a l l o c O in the tests, the time for which is included in the results. Values in parentheses show creation times with average m a l l o c O times factored out. For Win32 Threads, a stacksize is specified at creation, but the stack is allocated by the system. For the Win32 mapped RT Threads implementa-tion, the stack of the underlying Win32 thread is utilized, and no explicit m a l l o c O is done. Creation times were measured as an average of 1000 sequential creation calls. A fairly high degree of variability was observed between successive repeats of the same test, thus the results shown are very approximate. They do show, however, that creation times for RT Threads are comparable to those for the other threads packages tested. They are generally better than those for the U N I X thread packages, several time better than those for A I X 4.1 Pthreads. They are only slightly worse 75 Hostname Threads 1024 4096 16384 Package bytes bytes bytes RT Threads 280 (65) 922 (110) 1830 (95) Stephen Solaris Threads 300 (117) 856 (144) 1744 (265) Pthreads 305 (118) 867 (169) 1698 (240) RT Threads 82 (32) 230 (40) 425 (40) rickards Solaris Threads 93 (46) 249 (59) 468 (85) Pthreads 100 (52) 245 (50) 448 (82) blackdog RT Threads 103 (64) 275 (145) 297 (144) Pthreads 626 (589) 816 (680) 887 (720) monteith RT Threads 67 93 138 Win32 Threads 62 86 129 grouse RT Threads 45.5 65 100 Win32 Threads 45.2 63 93 robson-win95 RT Threads 27 99 106 Win32 Threads 25.5 95 98 Table 5.2: Thread creation times in microseconds than those for Win32 Threads, a good result considering that an underlying Win32 Thread creation is included in the cost of an RT Thread creation in the mapped implementation. 5.1.2 Semaphore Overhead The overhead of semaphores is measured here for two reasons. Firstly, semaphores are the most common way of controlling access to shared data structures in RT Threads 1. Threaded applications will often require a high number of such calls, although few might actually result in context switches if there is low contention for such access, as is likely to be the case in a uniprocessor environment. Thus the over-head of the call itself can be important. Secondly, context switching performance was measured using semaphores to cause the switches, and the overhead of the calls 1 Semaphores can easily provide the same functionality as "mutex" constructs often found in other threads packages 76 needs to be determined so it can be factored out. Results for semaphore call overheads are included in table 5.3 along with results for context switching. Semaphore overhead is lower for RT Threads in all cases. This is to be expected, since RT Threads is implemented at user level and there is no system call overhead for these calls. 5.1.3 Context Switching The context switching performance of RT Threads was tested on various machines, comparing the results with those for native thread packages on each machine. For So-laris and A I X , the native host implementation of RT Thread was used, and for Win-dows N T and Windows 95 the mapped implementation was used. The test consisted of two threads alternately blocking on and signaling semaphores. Semaphores are implemented using condition variables and mutexes for Pthreads, since semaphores are not part of the Pthreads interface. Table 5.3 shows the results, where times are averaged over 100,000 iterations. Context switch times include the overhead for semaphore calls. An estimate of the overhead for semaphore calls (one P and one V per switch) is also shown, determined by timing calls which did not result in context switches. For context switching, the native host implementation of RT Threads out-performed the native threads packages in all cases. The mapped implementation showed a considerable performance penalty as compared to the underlying Win32 threads alone. While the mapped implementation can obviously not perform bet-ter than the threads package it is built on top of, work is ongoing to reduce this 77 Hostname Threads Context Semaphore Package Switch Overhead RT Threads 61 7 Stephen Solaris Threads 99 10 Pthreads 117 8 RT Threads 10 2 cascade Solaris Threads 24 4 Pthreads 28 3 RT Threads 16 1 rickards Solaris Threads 21 5 Pthreads 25 2 blackdog RT Threads 38 4 Pthreads 115 27 monteith RT Threads 81 4 Win32 Threads 23 17 asp RT Threads 182 5 Win32 Threads 82 7 Table 5.3: Context switch times in microseconds overhead. 5.2 Inter Thread Communication Tests were made for RT Threads to determine the average times for send/receive/reply transactions, measured as the blocking time for the RttSendO call. Results were averaged over several iterations, the number of which varied with the message size. Message sizes of 0, 1024, 65536 and 1048576 bytes were tested, both for local ITC transactions (i.e., between threads in the same RT Threads environment), and for cross address space ITC between machines on a local Ethernet network. Longer times for longer messages would reflect the overhead in copying the message in the local case, and the network transmission overhead in the cross address space tests. The other threads packages tested did not have a communication facility 78 Host 0 bytes 1024 bytes 64 KBytes 1 MByte Stephen 129 707 36458 557847 rickards 28 121 7342 126042 ice 155 190 3360 118748 blackdog 85 103 4075 68528 appaloosa 71 101 4172 58207 cascade 20 48 1104 17803 monteith 147 183 2310 49832 grouse 81 116 2513 51269 robson-win95 164 210 2773 57403 Table 5.4: Local send/receive/reply times comparable to RT Threads' message passing, so comparisons could not be made. Table 5.5 shows times, in microseconds, for ITC between two threads in a single RT Threads environment, for various message sizes. Times for zero length message would reflect the overhead for two context switches, plus the overhead of the Rt tSendO, R t tRece iveO and R t t R e p l y O calls combined. Table 5.5 shows times, in microseconds, for cross address space ITC between various machines on a lOBaseT Ethernet network. 5.3 Real Time Performance Experiments were designed to determine how well RT Threads does in terms of real time scheduling in various environments. To truly be a "real time" threads environ-ment (i.e., such that threads will start as close to their scheduled starting times as is possible given the priorities, deadlines and execution times of other threads in the environment), RT Threads threads would have to be the only threads in the overall system on a computer, or preempt other non-RT Threads threads or processes in-79 Sending Host Receiving Host 0 bytes 1024 bytes 64 KBytes 1 MByte ice appaloosa 26549 24186 190584 2554876 appaloosa ice 21355 20682 217334 2646952 dsg appaloosa 20092 21526 228083 2211720 appaloosa dsg 20058 66237 200006 2194095 cascade appaloosa 26404 32858 403443 3117465 appaloosa cascade 25841 31825 400061 3136351 S t e p h e n appaloosa 31016 33668 398772 2620522 appaloosa Stephen 20225 23389 411676 2841969 rickards appaloosa 21535 20107 409943 2823639 appaloosa rickards 21322 21810 415503 2958978 grouse appaloosa 13048 10750 203123 2323293 appaloosa grouse 20073 20073 168049 2274666 monteith appaloosa 13607 11505 201410 2412353 appaloosa monteith 20014 20035 200548 2497584 robson-win95 appaloosa 12209 10394 201115 2382625 appaloosa robson-win95 20103 20002 201727 2413129 robson-win95 grouse 6433 12892 309948 2270538 grouse robson-win95 10176 10174 301699 2410048 grouse sumas 6966 7669 148257 2267732 sumas grouse 6905 6831 138490 2153997 Table 5.5: Cross address space send/receive/reply times 80 stantaneously whenever ready to run. Nevertheless, it is of interest to see whether RT Threads would be useful on a variety of platforms where such guarantees are not possible. In other words, can user applications comfortably make use of the real time aspects of RT Threads' scheduling in the context of a general purpose operating system such as U N I X or Windows N T . The basic test program consists of a thread which repeatedly schedules itself with a starting time which is a random short time in the future. The rand() library function is used to derive a delay time of between 20 and 40 milliseconds. Each time the thread wakes up, the difference between the scheduled starting time and the actual starting time is measured, and the average and standard deviation are calculated at the end of the test. In the base test, no other user threads were run-ning while the test thread was sleeping. Tests were run for 10,000 iterations. In the tables, "Idle System" means there were no user processes running, other than those described for the tests, but there may have been active system processes. "Loaded System" refers to a system with several active user processes. In tests labeled "RT Threads in Real Time Process", RT Threads was run as a U N I X real time process. In Solaris, real time processes cannot be preempted by non-real time processes, in-cluding system processes. Unfortunately, the resolution of the system clock used for scheduling threads was 10 milliseconds (10,000 microseconds) in all cases, so even under ideal condi-tions (i.e., no other competition for C P U resources), a thread could start as much as just under 10 milliseconds, plus any post interrupt scheduling overhead, after it's scheduled starting time. With a uniform random distribution of starting times 81 Hostname Average Standard Maximum Difference Deviation Difference Idle System Stephen 5856 3932 101182 r i c k a r d s 5566 3044 46489 i c e 5389 3340 77746 b l a c k d o g 5738 2921 13093 a p p a l o o s a 5721 2922 12835 m o n t e i t h 5022 2887 10961 g r o u s e 5017 2890 10909 robson-win95 5969 3799 14954 Loaded System c a s c a d e 5540 5206 415719 R T Threads in Real Time Process S t e p h e n 5689 2926 10494 r i c k a r d s 5548 2940 10349 Table 5.6: Base Test: No background threads, no background processes one would expect the actual starting times would be roughly 5 milliseconds plus the scheduling overhead later than the scheduled starting times, on average, with a standard deviation of around 3 milliseconds. Table 5.6 shows results for the base test. As expected, results show an av-erage difference somewhat higher than 5,000 microseconds (5 ms), and standard deviations of around 3,000 microseconds (3 ms). Maximum differences tend to be somewhat over 10,000 microseconds (10 ms), as expected, except on the various versions of SunOS where considerably higher maximums are seen. For the loaded system, a relatively high standard deviation (5.2 ms) and maximum (415 ms) were observed, yet the average was still reasonable. The results for RT Threads run-ning in a real time process, the case which should have the best results, were only marginally better. 82 Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 5940 5641 299452 rickards 5623 3414 90510 ice 6350 26121 1966924 blackdog 5763 3088 85118 a p p a l o o s a 5733 2977 40758 monteith 5027 2889 14322 grouse 5007 2887 10946 robson-win95 5955 3774 19356 Loaded System cascade 137988 165153 814844 R T Threads in Real Time Process S t e p h e n 5694 2926 10711 rickards 5545 2940 10240 Table 5.7: 1 background thread, no background processes Table 5.7 shows results for the base test, with a low priority spinning (i.e., in a tight loop using up C P U resources) thread running in the same RT Threads environment (since they are of higher priority, the test threads which take the timing measurements should preempt the spinning thread). For the most part, results were only slightly worse than for the base test, as one would expect. The exception-was the loaded system, where the result was far worse. One possible explanation is that with the spinning thread, the RT Threads environment process is taking up considerable C P U resources and the underlying operating system's scheduler is less likely to schedule it when its alarm signal is generated (in other words, it will have used up its quantum). Table 5.8 shows results where there is a background spinning process, which presumably would compete with the RT Threads environment for C P U resources. Results are comparable to the base test, which suggests that an RT Threads appli-cation which is not compute intensive should still observe reasonably good real time 83 Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 5852 3767 71459 rickards 5600 3491 160631 ice 5528 7451 379967 blackdog 5749 2919 12824 appaloosa 5728 2945 38246 monteith 5021 2889 18729 grouse 5019 2896 28729 robson-win95 5376 3557 13998 Loaded System cascade 5469 3141 38402 R T Threads in Real Time Process S t e p h e n 5699 2925 10610 rickards 5553 2938 10340 Table 5.8: No background threads, 1 background process scheduling behaviour, even when other processes are active in the system. Table 5.9 shows results where there is a low priority background thread and a background spinning process. In this case there is greater competition for C P U resources, which is reflected in the results. For the various versions of SunOS, the results are quite poor, especially for Solaris with an average of over 38 ms on both the SPARC2 and the Pentium 166, and a standard deviation of around 54 ms. How-ever, for A I X and N T , the results are actually quite good, especially for A I X 4.1 and N T 3.51. For the loaded system, the results are similar to those in table 5.7, which is not surprising since the background process may not be adding much to the overall background load. For the RT Threads environment running in a real time process, the results are still virtually the same as for the base test, as would be expected. 84 Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 38978 54476 252087 rickards 38174 53869 216343 ice 22187 15538 106325 blackdog 5881 3259 49079 a p p a l o o s a 7701 4880 39121 monteith 5249 3930 49692 grouse 5967 10452 129904 robson-win95 6081 4486 34924 Loaded System cascade 149734 170841 820978 R T Threads in Real Time Process S t e p h e n 5689 2926 10717 rickards 5538 2941 10233 Table 5.9: 1 background thread, 1 background process Table 5.10 shows results where there is no low priority background thread, but there are two background spinning processes. Results are similar to those for the same test with only one background process. Table 5.11 shows results where there is a low priority background thread and two background spinning processes. Results are somewhat worse than for the same test with only one background process, but still quite good for the various Windows operating systems. Tables 5.12 and 5.13 show results with a compilation (GNU gcc for U N I X , Visual C++ 4.0 for Windows) being performed in the background, which may be a more realistic situation than having a process spinning in a tight loop. Results are similar to those with a background spinner, although generally somewhat worse. The results for the loaded system are better, but that may be due to changes in the 85 Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 5861 4028 103529 r i c k a r d s 5571 3050 45683 i c e 5377 3355 74688 b l a c k d o g 5742 2917 11015 a p p a l o o s a 5726 2922 15373 m o n t e i t h 5023 2886 10961 g r o u s e 5021 2971 66134 robson-win95 3632 1971 7999 Loaded System c a s c a d e 5484 3258 102751 R T Threads in Real Time Process S t e p h e n 5703 2924 10561 r i c k a r d s 5554 2939 10308 Table 5.10: No background threads, 2 background processes Hostname Average Difference Standard Deviation Maximum Difference Idle System S t e p h e n 73425 99654 380421 rickards 71768 98779 378353 ice 45873 30867 363165 blackdog 10153 9143 67849 appaloosa 13856 10482 60720 monteith 5608 6928 89987 grouse 7142 22149 250817 robson-win95 5044 3550 24279 Loaded System cascade 173509 181477 714466 R T Threads in Real Time Process S t e p h e n 5692 2925 10672 rickards 5542 2941 10286 Table 5.11: 1 background thread, 2 background processes 86 Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 6203 4412 85392 rickards 5734 3511 101128 ice 5767 3504 46513 blackdog 6070 5900 466406 appaloosa 5786 2995 24153 monteith 5022 2887 10961 grouse 5026 2895 24697 robson-win95 6323 9672 302643 Loaded System cascade 5522 3167 28866 Table 5.12: No background threads, 1 background compile overall load over time. Tables 5.14, 5.15 and 5.16 show the results of tests where two RT Threads environments were run concurrently on the same machine. In the first test (ta-ble 5.14), neither has a spinning background thread. In the second (table 5.15), one of the two RT Threads processes has a spinning background thread, and in the third (table 5.16), both do. Two results are shown in each case, one for each environment. In table 5.14, the results in all cases are reasonably close to those for the base case, except that the maximum difference tended to be much higher. In the second test, with one of the environments having a background thread spinning, results were again similar but slightly worse. The results for the third case, where both environments had a background spinning thread, the results are considerably worse. For the U N I X operating systems, the results are similar to those for table 5.9 (one background thread, one background process), which is not a surprising result as each of the RT Threads environments would act like a background spinning pro-87 Hostname Average Difference Standard Deviation Maximum Difference Idle System S t e p h e n 18157 39494 390007 r i c k a r d s 16900 39887 526672 i c e 31452 125322 1533903 b l a c k d o g 13165 36437 631339 appaloosa 12799 42353 628420 m o n t e i t h 5073 3131 47836 g r o u s e 5133 3758 169975 robson-win95 7068 7473 378022 Loaded System cascade 35452 65177 724476 Table 5.13: 1 background thread, 1 background compile cess for the other. For the first time, however, some very poor results are observed for the Windows operating systems. In each case only one of the RT Threads envi-ronments exhibits the poor results, the one which was started first. For Windows N T 4.0 and Windows 95 the average was almost double that for the base test, for the first environment started. For Windows N T 3.51, the average was 39 ms, the standard deviation 360 ms, and the maximum nearly 4 seconds. It is difficult to interpret these results, it may be be a problem with interaction of the multimedia timers in the two processes, but that is merely speculation. Tables 5.17, 5.18 and 5.19 show results for the base test, but with several threads doing the scheduling test simultaneously (100, 200 and 300 respectively). The results show totals for all threads combined. As the number of threads in-creases, the results show considerably larger differences in the scheduled and actual starting times for threads. This result probably reflects the fact that many threads will be scheduled to run at approximately the same time, and by the time they all 88 Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 5949/5930 4139/4189 81745/81525 rickards 5581/5587 3029/3028 38304/38040 ice 5483/5430 3069/3040 55919/55742 blackdog 5760/5737 3057/2919 95803/13274 appaloosa 5713/5828 2927/2909 24606/12268 monteith 5024/5018 3088/2896 113711/27781 grouse 5023/5015 3105/2887 116683/10946 robson-win95 5556/5649 3633/3621 14990/14982 Loaded System cascade 5493/5491 3140/3185 35649/32227 Table 5.14: 2 RT Threads environments Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 6023/5981 6199/4436 386794/99719 rickards 5590/5576 3030/3013 39661/45026 ice 5402/5414 3429/3012 138975/44699 blackdog 5767/5751 3376/2917 167307/10735 appaloosa 5767/5733 3298/2920 149607/12443 monteith 5023/5017 3034/2893 97619/18353 grouse 5044/5014 3296/2891 159975/12650 robson-win95 5782/9887 3729/5400 24741/24127 Loaded System cascade 6802/5474 8488/3273 254772/103873 Table 5.15: 2 RT Threads environments, 1 with background thread 89 Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 33287/27166 51808/42670 352237/211320 r i c k a r d s 32459/26167 50755/42056 239984/241409 i c e 22008/22252 16807/16107 565581/99998 b l a c k d o g 6802/8588 4707/5354 172364/43439 appaloosa 9705/9343 8196/5574 595853/42113 m o n t e i t h 39060/5181 360349/3471 3893468/40304 g r o u s e 9922/5021 14915/2886 169975/10907 robson-win95 10082/5240 8954/3430 261841/22954 Loaded System cascade 35079/32557 55748/52008 569994/384839 Table 5.16: 2 RT Threads environments, both with background thread get a chance to run briefly, the accumulated overhead will add up to cause a high average delay in actual starting times. To test this idea, the test with 300 threads was run again, but measuring the time when each thread is made ready in the RT Threads kernel rather than its actual starting time. Results are shown in table 5.20, and are more reasonable, with average differences ranging from 5 ms to 10 ms, and standard deviations of around 3 ms. Table 5.21 shows results for a scheduling test done with a real multimedia client application used with the C M F S file server. The application presents J P E G motion video using a Parallax hardware decoder card, synchronized with a simulta-neous audio stream, and in some cases a closed captioning text stream. Threads, one for each stream, are scheduled periodically by the client to display their respective media data. Measurements are made in the RT Threads kernel to see how close to their scheduled starting time the threads are made ready, on average. Two versions of the same video clip showing the music group Yes, one at 15 frames per second, 90 Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 9054 9777 617879 rickards 7251 2309 67894 ice 7525 3368 191491 blackdog 7645 3199 169761 appaloosa 8451 3118 196294 monteith 5046 3530 298528 grouse 5051 3247 178528 robson-win95 9507 3737 98975 Loaded System cascade 6002 3128 94445 Table 5.17: One RT Threads environment with 100 threads Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 30756 22385 1602504 rickards 10245 2000 207351 ice 14419 5878 376447 blackdog 21088 14307 822681 appaloosa 27473 5371 318741 monteith 5130 4182 374181 grouse 5424 14117 1756181 robson-win95 12716 9299 1195975 Loaded System cascade 7248 3880 145242 Table 5.18: One RT Threads environment with 200 threads 91 Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 59979 39486 1930563 rickards 12885 4026 415807 ice 39954 16533 1085964 blackdog 46488 53473 4108626 appaloosa 57031 62089 4254461 monteith 18921 4239 229975 grouse 8868 69442 6277325 robson-win95 33167 12136 1224975 Loaded System cascade 12978 18406 512005 Table 5.19: One RT Threads environment with 300 threads Hostname Average Standard Maximum Difference Deviation Difference Idle System S t e p h e n 5308 6086 354863 rickards 5167 3256 93237 ice 5071 3432 191146 blackdog 5029 3607 151694 appaloosa 5058 3891 148228 monteith 5018 2889 19189 grouse 5092 3246 129425 robson-win95 2912 2122 404975 Loaded System cascade 7344 13224 329269 Table 5.20: One RT Threads environment with 300 threads, measured in kernel 92 Title Media Type Average Difference Standard Deviation Maximum Difference Yes Video (15 fps) Video 4925 2936 29381 Audio 4963 3151 42423 Yes Video (30 fps) Video 6247 2603 24592 Audio 7348 7971 65987 Reservoir Dogs (20 fps) Video 4394 3373 47382 Audio 4779 4617 29077 Text 4890 5124 32107 Table 5.21: Multimedia client scheduling performance one at 30, and both with synchronized audio were tested, as was a clip from the movie "Reservoir Dogs" with synchronized 20 frames per second video, audio and text. Measurements were taken over a two minute period. Results were generally as expected. Results were somewhat poorer for the 30 frame per second clip, although that may in part be explained by the fact that calls to the X server for displaying video frames were protected by the R t t B e g i n C r i t i c a l () and R t t E n d C r i t i c a l calls which block signals to the RT Threads kernel and may have caused scheduling clock ticks to be delayed. 93 Chapter 6 R e l a t e d W o r k A number of other threads packages and operating systems with kernel thread sup-port are available, such as A I X 4.1 Pthreads [1], Solaris Threads [23, 9], Mach C Threads [5], etc. Some of these provide real time functionality. For example, A I X 4.1 Pthreads can specify a fixed priority F IFO policy, but this requires root au-thority, due to the danger of starving system processes because of global thread scheduling. Solaris threads offer a real time class of threads, but these real time threads also require root authority and require a high degree of caution, since they preempt system processes [11] and could cause the entire system to deadlock. Other work has been done to provide real time support for existing packages. The Real Time Thread Package (RTTP) developed at Brown University is a C + + framework which is an extension of the Brown Thread Package. The R T T P imple-mentation described in [24] is a wrapper for Solaris Threads, providing a mechanism for non-super users to create real time threads and for protection against runaway real time threads taking over system resources. 94 Real-Time Threads [26] is a real time threads package based on Mach C Threads. One of its main goals, like RT Threads, is to provide portability among operating system platforms. Unlike RT Threads, this package does dynamic schedu-lability analysis, and offers support for multiprocessor platforms. The admission algorithm does, however, require static analysis of maximum execution times for threads, and threads must be assigned to specific nodes in multiprocessor environ-ments. Alternate techniques for providing user-level thread scheduling have been implemented. In [14], shared memory records in user space are used to communi-cate scheduling information efficiently between kernel space and user space to help user space thread schedulers make scheduling decisions based on kernel-level process scheduling and system call blocking behaviour. Most notably, software interrupts are delivered to user space when system calls block and unblock, and optionally to warn of imminent preemption to assist in avoiding context switches while holding locks. This framework was implemented in the Psyche parallel operating system and was used to implement more than one threads package, with different pack-ages being able to coexist and communicate with each other. A similar shared data mechanism called a Meta-Level Scheduler (MLS) is presented in [22]. User-Level Schedulers (ULSs) communicate their state to the kernel via a shared memory re-gion in the M L S (in user space), so the kernel scheduler can schedule the highest priority thread. This mechanism was implemented for the A R T S real time operating system [27] (see section 6.3). Another example of a shared memory mechanism for coordinating user level thread scheduling with the kernel can be found in [6]. 95 6.1 POSIX In [21], experience with the POSIX 1003.4 and 1003.4A standard proposals is dis-cussed. POSIX 4 (1003.4) is a proposal for adding base real time support to the POSIX standard, while POSIX 4A provides thread extension (the "Pthreads" in-terface). LynxOS, a commercial U N I X compatible real time operating system, was modified to provide an interface and functionality conforming to 1003.4 and 1003.4A, with some of the new functionality added at the kernel level but as much as pos-sible implemented as user libraries. POSIX 4 requires that at least two scheduling policies be implemented; Round-Robin and F I F O . Round-Robin is a round robin time-slicing policy, whereas F I F O is a fixed priority policy where threads of a given priority run to completion unless preempted by higher priority threads (threads have a global scheduling scope, so single threaded processes can be thought of as threads in the overall system in terms of scheduling policy). The F IFO policy is similar to the RT Threads scheduling policy, except that it excludes deadline scheduling. With respect to POSIX 4A, an interesting result found by the authors was unexpectedly poor performance of the mutex and condition variable synchronization mechanisms. A strictly user level implementation of Pthreads is described in [15]. Similar-ities with the RT Threads implementation include the use of a monolithic monitor for protecting kernel data structures, i.e., a single global flag to indicate execution in a kernel critical section, and a flag used to defer signal handling. Performance results were similar to those experienced with RT Threads; synchronization and con-text switching were as good as or better than kernel implementations, and dynamic allocation of stacks was found to be the main source of overhead for thread creation. Most of the complexity of the implementation was in providing for the delivery of 96 signals to individual threads, as required by the POSIX standard. RT Threads does not provide this functionality because using a synchronous programming approach to handling asynchronous events with blocking threads was considered preferable to the asynchronous programming approach of using signals and signal handlers. In fact, this is one of the main motivations for providing threads in the first place, a sentiment also expressed in [21]. MiThOS (Microkernel Threads Operating System) [3], developed at Florida State University, provides deadline and periodic scheduling extensions to a POSIX Pthreads library, and is implemented on a bare machine. 6.2 Real Time Support for Multimedia Applications Other work has been done in providing real time support for multimedia applica-tions, particularly in terms of resource allocation in a shared system. Much of the work takes a Quality of Service (QoS) approach where threads which cannot meet their real time requirements dynamically lower those requirements. Real time thread support focuses on periodic scheduling and handling of missed deadlines. Another issue which has prominence in the area of real time systems is dealing with the problem of priority inversion, where a low priority thread holding a shared resource can cause a medium priority thread to hold up a high priority thread. In [28] and [16], periodic real-time threads are used to handle periodic events in a multimedia client application, such as the timely display of continuous media data units (e.g., video frames). In both cases the test application was a QuickTime 97 video player. Support multiple clients on a single machine with graceful degrada-tion required a mechanism for QoS management, and global scope for scheduling of threads. Real-Time Mach [29, 25] was used as the underlying operating system, providing such real time threads. One of the methods demonstrated was a self sta-bilization scheme whereby a missed deadline handler for each thread modified the thread's timing attributes, i.e., period and deadline, such that a thread would lower its frame rate if it was missing deadlines. Results from [16] showed good results when running multiple video clients with varying priorities, i.e., only the quality of the low priority client was degraded. The same test run with a time-sliced scheduler showed erratic degradation among the clients, such that the high priority stream was as likely to reduce frame rate as the low priority one. In [17], a complex protocol for dealing with priority inversion, called the Semaphore Dependency Protocol, is presented. This protocol is less conservative than the protocol ceiling protocol, and results in fewer context switches than the priority inheritance protocol. The authors go on to conjecture, however, that the structure of multimedia applications is usually not complex enough to warrant such an elaborate protocol, and that simple priority inheritance should suffice. The philosophy of RT Threads is that even priority inheritance is not required, i.e., that it should be left up to the application to deal with problems like priority inversion. 6 .3 I n t e r T h r e a d C o m m u n i c a t i o n RT-IPC, a real time extension to Mach IPC for Real-Time Mach is discussed in [10]. Like RT Threads, Mach provides a facility for message passing between threads, where multiple messages arriving at a receiver are queued awaiting reception by the 98 receiving thread (a port mechanism is used in Mach, but the effect is the same). As is the case with RT Threads, messages are dequeued in F I F O order, regardless of the priority of the sender. RT-IPC adds a number of options with respect to dequeuing policy and priority of the receiver. For dequeuing order, a choice of F IFO or priority based policy is available, much like the first come first served and priority modes available for RT Threads semaphores. Priority inheritance and priority handoff op-tions are also available to deal with priority inversion problems which may occur between sender and receiver, much as similar protocols are available for mutex locks in many real time operating systems. In RT Threads, only F I F O semantics are pro-vided for message queuing. In keeping with RT Threads' cooperative programming philosophy, it is condidered the application's responsibility to ensure that reasonable priorities are used for the message receiver, and that an appropriate design is used for handling received messages so as not to unduly hold up senders waiting in the message queue (fro eaxmple, the client/server model depicted in figure 2.2). The Chant threads package [7] extends the POSIX Threads interface with a mechanism called talking threads for distributed programming. Talking threads allow threads on different machines to communicate directly using message passing primitives. Thread identifiers are made universally unique by embedding in them a machine identifier, process identifier and thread identifier, a technique similar to that used by RT Threads. Notification of message reception is done by polling the underlying communication system in the Chant implementation, whereas RT Threads uses underlying asynchronous signals (as part of the external I /O imple-mentation) to unblock threads waiting for messages. 99 H A R T O S [8] is a distributed real time operating system developed at the Uni-versity of Michigan. H A R T O S is part of the H A R T S Hexagonal Architecture for Real-Time Systems, where several multiprocessor nodes are connected on a hexago-nal mesh interconnection network. Dedicated network processors and per-hop dead-lines for data transmission are among the techniques used to implement a real time message passing facility. A R T S is an object oriented distributed real time operating system. Its com-munication facilities include a synchronous request/response style message passing mechanism similar to that of RT Threads. Real time support for this mechanism includes transmitting the sender's time fence (deadline) with the request message, which is compared with the receiver's time fence, taking communication delays into account, to determine whether the request can be processed in time. Also, a com-munication manager is employed at the receiver with prioritized message queues and prioritized, preemptable message handling worker threads, using a priority inheri-tance protocol to avoid priority inversion within the communication subsystem. 100 Chapter 7 A d d i t i o n a l W o r k Work which remains to be done with RT Threads includes: • testing the native microkernel implementation • testing FreeBSD and Linux versions of RT Threads, and comparing the results with those for Windows N T and Windows 95 on the same hardware • implementing a mapped version of RT Threads for Solaris Threads, testing, and comparing the results for those with the native host implementation for Solaris. 101 Chapter 8 C o n c l u s i o n s RT Threads has been used in a U N I X environment for approximately two years. Although refinements have been made over that time, it has proven to be stable. Both the server components and the clients for the C M F S distributed continu-ous media file server project have been implemented using RT Threads. These applications have a complex design of communicating threads, including network communication (and synchronization) between client and server components using the cross address space inter-thread communication facilities of RT Threads. Real time scheduling functionality has been used to implement timely disk reading and network transmission of continuous media data, fine grained network rate control for the purpose of traffic shaping, and display synchronization for multiple media streams. RT Threads has provided a convenient and powerful model for program-ming cooperating real time tasks. Performance results show comparable or better results for RT Threads, in terms of functional overheads, than for the other threads packages tested in a U N I X 102 environment (Solaris Threads and POSIX Threads). This suggests that RT Threads makes a practical alternative for use as a general purpose threads package in such an environment. Results for functional overheads in the Win32 environment were some-what worse for RT Threads than for Win32 threads in the case of thread creation and context switches, which is to be expected considering that for RT Threads these functions are built on top of the Win32 facilities. Still, results were close enough that RT Threads should be a practical alternative for most applications. Real time scheduling tests showed that RT Threads, in most cases, exhibits the expected real time scheduling behaviour in lightly loaded systems. Scheduling was almost always accurate to within the limiting underlying system timer clock resolution of 10 milliseconds. Results worsened when the underlying system had a heavy load, or when multiple compute intensive RT Threads applications were run on the same machine. These experiments were intended to test the limits of the RT Threads package. Generally, one would not expect to be able to run a compute in-tensive real time application and share computing resources successfully with other processes in a general purpose system. Thus RT Threads can be successfully used as the basis for applications which require non-critical real time scheduling of events, even in a non-real time operat-ing system environment. This result was shown in practical terms by testing the performance of a synchronized multimedia client. 103 B i b l i o g r a p h y [1] AIX Version 4-1 General Programming Concepts: Writing and Debugging Pro-grams, October 1994. [2] T.P. Baker, Frank Mueller, and Viresh Rustagi. Experience with a Prototype of the POSIX "Minimal Realtime System Profile". In the 11th IEEE Workshop on Real-Time Operating Systems and Software, May 1994. [3] T.P. Baker, Frank Mueller, and Viresh Rustagi. MiThOS - A Real-Time Micro-Kernel Threads Operating System. In IEEE Real-Time Systems Symposium 1995, 1995. [4] David R. Cheriton. The V Kernel: A Software Base for Distributed Systems. IEEE Software, l(2):19-42, Apri l 1984. [5] Eric C. Cooper and Richard P. Draves. C Threads. Technical Report C M U -CS-88-129, Carnegie Mellon University, Apri l 1988. [6] Ramesh Govindan and David P. Anderson. Scheduling and IPC Mechanisms For Continuous Media. In Proceedings of the 13th Symposium on Operating Systems Principles, pages 68-80, October 1991. [7] Matthew Haines, David Cronk, and Piyush Mehrotra. On the Design of Chant: A Talking Threads Package. In Supercomputing 94, Apr i l 1994. [8] Dilip D. Kandlur, Daniel L . Kiskis, and Kang G . Shin. H A R T O S : A Distributed Real-Time Operating System. ACM Operating Systems Review, 23(3):72-89, July 1989. [9] Sandeep Khanna, Michael Sebree, and John Zolnowsky. Realtime Scheduling in SunOS 5.0. In Winter USENIX, pages 375-390, January 1992. [10] Takuro Kitayama, Tatsuo Nakajima, and Hideyuki Tokudo. RT-IPC: An IPC Extension for Real-Time Mach. In Proceedings of the 2nd USENIX Conference on Microkernels and Other Kernel Architectures, 1993. 104 [11] B i l Lewis and Daniel J . Berg. Threads Primer. SunSoft Press, 1996. [12] Siu Ling Ann Lo. An Operating System Architecture for Real-Time Tasking Abstractions. PhD thesis, University of British Columbia, Apri l 1995. [13] Siu Ling Ann Lo, Norman C . Hutchinson, and Samuel T. Chanson. Architec-tural Considerations in the design of Real-Time Kernels. In IEEE Real-Time Systems Symposium, pages 138-147, December 1993. [14] Brian D . Marsh, Michael L . Scott, Thomas J . LeBlanc, and Evangelos P. Markatos. First-Class User-Level Threads. In Proceedings of the 13th Sym-posium on Operating Systems Principles, pages 110-120, October 1991. [15] Frank Mueller. A Library Implementation of POSIX Threads under U N I X . In Proceedings of the Winter USENIX Conference, pages 29-41, January 1993. [16] Tatsuo Nakajima and Hiroshi Tezuka. A Continuos Media Application sup-porting Dynamic QOS Control on Real-Time Mach. In A CM Multimedia '94, October 1994. [17] Roger Needham and Akira Nakamura. An Approach to Real-time Scheduling -but is it Really a Problem for Multimedia? In Proceedings of the 3rd Interna-tional Workshop on Network and Operating System Support for Digital Audio and Video, pages 32-39, 1993. [18] Gerald W . Neufeld, Dwight J . Makaroff, and Norman C. Hutchinson. Design of a Variable Bit-Rate Continuous Media Server. In Proceedings of the 5th International Workshop on Network and Operating System Support for Digital Audio and Video, pages 375-378, Durham N H , Apri l 1995. [19] Gerald W . Neufeld, Dwight J . Makaroff, and Norman C. Hutchinson. Internal Design of the U B C Distributed Continuous Media File Server. Technical Report 95-06, University of British Columbia, Apri l 1995. [20] Gerald W . Neufeld, Dwight J . Makaroff, and Norman C. Hutchinson. Design of a Variable Bit-Rate Continuous Media Server for an A T M Network. In Pro-ceedings of the IS&T/SPIE Multimedia Computing and Networking Conference, San Jose, C A , January 1996. [21] Bi l l O.Gallmeister and Chris Lanier. Early Experience With POSIX 1003.4 and POSIX 1003.4A. In Proceedings of the IEEE Symposium on Real-Time Systems Conference, pages 190-198, July 1991. 105 [22] Shuichi Oikawa and Hideyuki Tokuda. User-Level Real-Time Threads: A n Approach Towards High Performance Multimedia Threads. In Proceedings of the 4th International Workshop on Network and Operating System Support for Digital Audio and Video, pages 66-76, November 1993. [23] M . L . Powell, S. R. Kleiman, S. Barton, D . Shah, D . Stein, and M . Weeks. SunOS Multi-thread Architecture. In Proceedings of the Winter USENIX Con-ference, pages 65-76,1991. [24] Kostadis Roussos. Real-Time Thread Package. Technical Report CS-96-18, Department of Computer Science, Brown University, May 1996. [25] Stefan Savage and Hidyuki Tokuda. Real-Time Mach Timers: Exporting Time to the User. In Proceedings of the 3rd USENIX Mach Conference, pages 111-118, Apri l 1993. [26] Karsten Schwan, Hongyi Zhou, and Ahmed Gheith. Real-Time Threads. ACM Operating Systems Review, 25(4):35-46, October 1991. [27] Hideyuki Tokuda and Clifford W . Mercer. A R T S : A Distributed Real-Time Kernel. ACM Operating Systems Review, 23(3):29-53, July 1989. [28] Hide Tokudo and Takuro Kitayama. Dynamic QOS Control based on Real-Time Threads. In Proceedings of the Jtth International Workshop on Network and Operating System Support for Digital Audio and Video, pages 114-121, November 1993. [29] Hideyuki Tokudo, Tatsuo Nakajima, and Prithvi Rao. Real-Time Mach: To-wards a Predictable Real-Time System. In Proceedings of the 1st USENIX Mach Symposium, October 1990. [30] Alistair C. Veitch and Norman C. Hutchinson. Kea - A Dynamically Exten-sible and Configurable Operating System Kernel. In Proceedings of the Third Conference on Configurable Distributed Systems (ICCDS '96), May 1996. 106 Appendix A E x a m p l e P r o g r a m The following simple example program is included to demonstrate the basic struc-ture of an RT Threads program. Note that the header file r t t h r eads .h must be included. In the program, two threads are created, each being passed an integer to identify it in the output. Both threads are created with the same priority, RTTHIGH. Each thread lowers its priority, in turn, causing the other to be scheduled. This is done within a critical section, protected with a semaphore. The example program demonstrates the scheduling effects of both scheduling attribute modification and the use of synchronization primitives. Sample output is shown in comments at the end. #include <stdio.h> #include " r t th reads .h" s t a t i c Rt tThreadld threada, threadb; s t a t i c RttSem sem; 107 RTTTHREAD thr(void *arg) •C RttSchAttr myattrs; RttThreadld me; int i , threadNum; threadNum = (int)arg; me = RttMyThreadldO ; for ( i = 0; i < 3; i++) •C printf("7,d: Wait\n", threadNum); RttP(sem); RttGetThreadSchedAttr(me, ftmyattrs); p r i n t f ("*/,d: My p r i o r i t y i s '/,d\n", threadNum, myattrs . p r i o r i t y ) ; /* lower my p r i o r i t y */ myattrs.priority++; RttSetThreadSchedAttr(me, myattrs); p r i n t f ("7,d: Signal\n" , threadNum); RttV(sem); } p r i n t f ("'/,d: Done\n" , threadNum); } 108 mainpO { RttSchAttr a t t r s ; RttAllocSem(&sem, 1, RTTFCFS); attrs.startingtime = RTTZEROTIME; a t t r s . p r i o r i t y = RTTHIGH; attrs.deadline = RTTNODEADLINE; RttCreate(&threada, thr, 16000, "a", (void *)1, a t t r s , RTTUSR); RttCreate(&threadb, thr, 16000, "b", (void *)2, a t t r s , RTTUSR); printf("threads created\n"); > 109 /* Sample Output: threads created 1: Wait 1: My p r i o r i t y i s 10 2: Wait 1: Signal 2: My p r i o r i t y i s 10 2: Signal 2: Wait 2: My p r i o r i t y i s 11 1: Wait 2: Signal 1: My p r i o r i t y i s 11 1: Signal 1: Wait 1: My p r i o r i t y i s 12 2: Wait 1: Signal 2: My p r i o r i t y i s 12 2: Signal 2: Done 1: Done 110 


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items