Open Collections

UBC Theses and Dissertations

Cyclical processor and computer architectures for highly parallel applications Wong, Fut-Suan 1984

Cyclical Processor and Computer Architectures for Highly Parallel Applications

by

Fut-Suan Wong
B.Eng.(Hons.), University of Singapore, 1979
M.A.Sc., The State University of New York, 1980

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
Department of Electrical Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
January, 1984
© F. S. Wong, 1984

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
1956 Main Mall
Vancouver, Canada
V6T 1Y3

Abstract

During the last few decades, the search for powerful computing machines has been one of the several endless pursuits among the scientific community.
In this thesis, several novel architectural ideas for the designs of high-performance computing machines are presented, and the practicality and usefulness of cyclical architectures -- ones which have their hardware resources cyclically arranged -- in this respect are examined. These ideas are illustrated with the use of specific application examples including parallel sorting, packet-switched communications and the design methodology of a class of next-generation computers.

In the first part of our studies, the structure and control algorithms of a single-chip, recirculating systolic sorter (RSS) are presented. The correctness of the algorithms is proved, and general operational constraints are derived. This parallel sorter is highly amenable to VLSI implementations because of the simple control structure and the regular, repetitive and near-neighbour type of interconnections required. The number of quadruple comparators needed to sort N items is N/4, and the average sorting time is found to be bounded by (log N)**2 and N.
A hardware termination is incorporated into the control unit of the sorter, so that the sorting process can be terminated as soon as the input list is in the desired order.

In the second part of our studies, a novel loop-structured switching network (LSSN) is presented. It is intended for packet communications in large-scale systems consisting of hundreds to thousands of interconnected devices. With L loops, where L is a power of two, it can connect up to N = L (log L) pairs of transmitters and receivers, using only N/2 two-by-two switches; in terms of switch counts and the amounts of wiring, this network is very advantageous when the value of N is large. It can be extended incrementally, and is free of the store-and-forward type of deadlocks which prevail in other cyclical, packet-switched networks. Our simulation results show that its average throughput rate and delay are close to those of other designs despite its relatively low switch count.

In the third part of our studies, a new design methodology for the next-generation computers is described.
Our proposed system, the Event-Driven Computer (EDC), is primarily a data-driven system which has its computing resources arranged as a circular pipeline, and it is supplemented with control-driven activities. Such a combined approach is aimed at extracting the advantages of both the "pure" data-driven and control-driven computations while alleviating their shortcomings. Compared to other designs, an EDC has the merits of a simpler architecture, better resource utilization, array processing capabilities and a higher speed range.

As is shown by our studies, the properties of the cyclical architectures depend greatly on how the information packets interact with each other; deadlocks, for instance, will occur on systems such as the Loop-Structured Switching Network because of the asynchronous, circular requests of network resources by the packets (we have, however, presented a deadlock avoidance scheme), but will not occur on synchronous systems such as the Recirculating Systolic Sorter.
In general, the resource utilization of the cyclical architectures is higher than that of the acyclic ones (or equivalently, the cyclical architectures can handle larger amounts of information with relatively smaller areas); they are therefore more suitable to the designs of very large-scale systems.

Key phrases: computer architectures, next-generation computing, systolic arrays, parallel sorting, packet-switched networks, store-and-forward deadlocks, data-driven and control-driven computations.

Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Nomenclature
Acknowledgements

Chapter I. Introduction
  I.1. Background Information
  I.2. Cyclical Architectures
  I.3. Objectives and Scope of the Thesis

Chapter II. A Systolic Processor for Parallel Sorting
  II.1. Introduction
  II.2. The Recirculating Systolic Sorter (RSS)
    A. Network Description
    B. The Quadruple Comparator
    C. The Comparison/Exchange/Shift Operations
  II.3. The RSS Algorithms
    A. Algorithm I
    B. Algorithm II
    C. Examples
  II.4. Operational Constraints
    A. Constraints on the Size of RSS
    B. Marking Scheme A
    C. Marking Scheme B
  II.5. Analysis of the RSS Algorithms
    A. Analogy with the Odd-Even Transposition Sort
    B. Correctness of the RSS Algorithms and Marking Schemes
    C. Correctness of the Termination Method
    D. Timing Complexities
  II.6. Discussions

Chapter III. A Novel Loop-Structured Switching Network (LSSN)
  III.1. Introduction
  III.2. Network Topology
    A. Addressing Scheme and Connection Function
    B. Routing Scheme
  III.3. Network Properties
    A. Network Conflicts
    B. Deadlocks and Avoidance Method
    C. Network Extensibility
  III.4. Simulations and Performance Analysis
  III.5. Discussions and Outlook

Chapter IV. Design and Evaluation of the Event-Driven Computer (EDC)
  IV.1. Introduction
    A. Background Information
    B. Recent Developments
    C. Overview of Our Approach
  IV.2. The EDC Hardware Architecture
    A. Processing Modules
    B. Storage Modules
    C. Switches
  IV.3. The EDC Information Structure
    A. Machine Code Formats
    B. Packet Formats
    C. Program Organization
    D. Data Structures
    E. Process and Resource Management
  IV.4. The EDC Programming Language Structure
    A. Statements and Program Blocks
    B. Language Constructs for Array Processing
  IV.5. Performance Analysis
    A. Flow Analysis of EDC
    B. Example
    C. Considerations for Generalized Computations
  IV.6. Discussions and Outlook

Chapter V. Conclusions
  V.1. Summary of Results
  V.2. General Discussions
  V.3. Suggestions for Further Work

Appendix A
Appendix B
Appendix C
References
Vita

List of Figures

1. Fig. I.1. The cyclical configuration
2. Fig. II.1. The Recirculating Systolic Sorter (RSS)
3. Fig. II.2. The control unit of RSS
4. Fig. II.3. The schematic diagram of a quadruple comparator
5. Fig. II.4. Symbols used for comparison and shift
6. Fig. II.5. The four operations performed by the quadruple comparators
7. Fig. II.6. An example to illustrate RSS Algorithm I and Marking Scheme A
8. Fig. II.7. An example to illustrate RSS Algorithm II and Marking Scheme B
9. Fig. II.8. The odd-even sorter
10. Fig. II.9. The three indexes: i, j, and J, and the initial marker position M(i)
11. Fig. II.10. The horizontal comparisons carried out on the RSS array
12. Fig. II.11. The number of comparison cycles versus the number of items to be sorted (Algorithm I)
13. Fig. II.12. A general-purpose computer system with special-purpose chips attached [19]
14. Fig. III.1. Assignment of loop and link labels on a LSSN which has 16 loops and 32 switches
15. Fig. III.2. Connection of transmitting and receiving devices on a LSSN with 16 loops
16. Fig. III.3. The schematic diagrams of a Type-A switch
17. Fig. III.4. A 16x16 baseline network
18. Fig. III.5. Effects of buffer size on the throughput and delay of a 64x64 LSSN
19. Fig. III.6. The throughput rates of a 64x64 baseline, a 64x64 LSSN and a 16x16 baseline, versus the inter-arrival time
20. Fig. III.7. The delay curves of a 64x64 baseline, a 64x64 LSSN and a 16x16 baseline, versus the inter-arrival time
21. Fig. IV.1. EDC system block diagram
22. Fig. IV.2. The connection diagram of EDC hardware architecture
23. Fig. IV.3. The schematic diagram of a PSN switch
24. Fig. IV.4. Parameter passing between the calling program M and the called program P
25. Fig. IV.5. The interactions between calling programs and a task program
26. Fig. IV.6. The physical and logical arrangements of the EDC memory system
27. Fig. IV.7. The implementation of a resource manager using a task program
28. Fig. IV.8. A "Begin/End" block and its data-flow graph
29. Fig. IV.9. An "IF" block and its data-flow graph
30. Fig. IV.10. A "Match" block and its data-flow graph
31. Fig. IV.11. A "Loop" block and its data-flow graph
32. Fig. IV.12. The statement, data-flow graph and machine code of a parallel vector operation
33. Fig. IV.13. The statement, data-flow graph and machine code of a reduction operation
34. Fig. IV.14. The statements of some alignment operations, and the data-flow graph and machine code format of the "SHIFT" operation
35. Fig. IV.15. The MARPf, MATR and MARPc,max curves of the given example

List of Tables

1. Table II.1. Requirements of the RSS marking schemes
2. Table II.2. Complexities of sorting networks
3. Table III.1. Scalar operations
4. Table III.2. Compound operations
5. Table III.3. The "Operand/Next instructions" fields of scalar operations
6. Table III.4. The "Operand/Next instructions" fields of compound operations
7. Table III.5. The formats of instruction packets
8. Table III.6. The formats of result packets

Nomenclature

ADT  : Array Description Table
ARS  : Average routing steps
C    : Number of columns
CS   : Channel Selector
EDC  : Event-Driven Computer
GCU  : Global Control Unit
i, I : Index (short form of "Instruction" when used as subscript)
j, J : Index
k, K : Index
IRs  : Instruction Registers
L    : Number of loops used in LSSN
LIT  : Linkage Information Table
LMs  : Local Memories
LSSN : Loop-Structured Switching Network
M    : Number of Local Memories
MATR : Maximum Average Throughput Rate
MARP : Maximum Acceptance Rate of Packets
MIMD : Multiple-Instruction and Multiple-Data (computer systems)
N    : Number of inputs
P    : Number of processors
PDF  : Piece-wise Data Flow (computer)
PSN  : Packet-Switched Network
R    : Number of rows or Receiving Processors (short form of "Result" when used as subscript)
RL   : Request List
RPs  : Receiving Processors
Rr   : Receiver
RSS  : Recirculating Systolic Sorter
SIMD : Single-Instruction and Multiple-Data (computer systems)
SISD : Single-Instruction and Single-Data (computer systems)
SM   : System Memory
SP   : Supervisory Processor
SUT  : Storage Utilization Table
SW   : Switch
T    : Number of Transmitting Processors
TPs  : Transmitting Processors
Tr   : Transmitter

Acknowledgements

I sincerely thank my supervisor,
Dr. M. R. Ito, for his patient help and guidance during the course of my graduate program. I would also like to thank Dr. Chanson, Dr. Schrack and Dr. Vuong for their services as members of my supervisory committee. I am also thankful to my past and current officemates, for making my stay on this campus a memorable one.

As for my financial support, I am grateful for the research assistantships provided by my supervisor, the teaching assistantships provided by the Department of Computer Science, and the awards provided by the Lee Foundation of Singapore.

Chapter I. Introduction

1. Background Information

The demand for high speed computation is ever-increasing, particularly among the scientific community engaged in large-scale computation such as weather forecasting, real time battlefield assessment, artificial intelligence and simulations of very large and complex processes. While conventional computer systems can handle many of the current demands, they suffer from certain drawbacks, ranging from software obesity to hardware inextensibility, which severely restrict their usefulness in the design of the so-called "fifth-generation" computers [1] which are currently being planned for future very large-scale applications.
The first four generations of computers are commonly distinguished by their constituent technologies: vacuum tubes, transistors, integrated circuits and, currently, very large-scale integration (VLSI). Central to the fifth-generation concept is a break with the conventional computer architecture, sometimes referred to as the Von Neumann architecture, that has prevailed in the first four computer generations [2].

Several classes of computer architectures have been proposed for the next-generation computers, including tree structures, square and cube arrays, pipelines, systolic arrays [3], data-driven systems [4], demand-driven systems [5] and dynamic structures [55,57,61]. As of today, none of these architectures has yet evolved to become the single, dominant basis of research work in this area. In this thesis, we will look into another interesting design methodology -- cyclical architectures -- for highly parallel applications, and several ideas based on the concept of cyclical architectures will be proposed. Our new designs will also incorporate the fundamental principles of systolic, packet communications, data-driven and control-driven systems.

Systolic systems are characterized by their data-flow pattern: rhythmic data movements analogous to the pulsations in the arteries caused by the recurrent contractions of the heart.
Because of their simple, highly repetitive structures, systolic systems are very amenable to VLSI implementations. The algorithms of many specialized applications, such as the Fast Fourier Transform and matrix multiplications, have been proposed for systolic computation [3].

Packet communications are traditionally meant for computer systems which are geographically apart and interconnected via local networks; but recently, they have also been proposed for multiprocessor systems consisting of tens to thousands of closely interconnected processing and storage modules. Examples are data-driven computers, in which instruction executions are triggered by the arrivals of input operands which are encapsulated into the form of packets, and networks are used to convey these packets among the hardware modules.

Data-driven computers have recently received enormous attention due to their simplicity in the exploitation of asynchronous parallelism; but on the other hand, they do not take advantage of the simple control structure that exists in array computation, and some inherently sequential activities do not conform naturally to the notion of data-driven computation.
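
As a rough illustration of the data-driven firing rule described above, the following Python sketch models instructions that execute as soon as all of their operand packets have arrived, and that forward their results as new packets. It is only a minimal sketch under stated assumptions, not the EDC design: the names Instruction, run_dataflow and example, and the (destination, slot, value) packet layout, are hypothetical and do not come from the thesis.

```python
import operator
from collections import deque


class Instruction:
    """A node that fires once every operand slot has received a packet."""

    def __init__(self, op, n_operands, successors):
        self.op = op                      # function applied to the operands
        self.slots = [None] * n_operands  # operand values received so far
        self.pending = n_operands         # number of operands still missing
        self.successors = successors      # list of (destination, slot) pairs


def run_dataflow(program, packets):
    """Deliver packets until none remain; return values sent to unknown
    destinations, which this sketch treats as program outputs."""
    queue = deque(packets)                # packets are (destination, slot, value)
    outputs = {}
    while queue:
        dest, slot, value = queue.popleft()
        if dest not in program:           # hypothetical convention: an output port
            outputs[dest] = value
            continue
        instr = program[dest]
        instr.slots[slot] = value
        instr.pending -= 1
        if instr.pending == 0:            # all operands present: the instruction fires
            result = instr.op(*instr.slots)
            for succ, succ_slot in instr.successors:
                queue.append((succ, succ_slot, result))
    return outputs


def example():
    """Evaluate (2 + 3) * (7 - 4) with operand packets arriving out of order."""
    program = {
        "add": Instruction(operator.add, 2, [("mul", 0)]),
        "sub": Instruction(operator.sub, 2, [("mul", 1)]),
        "mul": Instruction(operator.mul, 2, [("out", 0)]),
    }
    packets = [("sub", 1, 4), ("add", 0, 2), ("sub", 0, 7), ("add", 1, 3)]
    return run_dataflow(program, packets)
```

Calling example() returns a dictionary with the single entry "out". No program counter sequences the three instructions; the order of execution follows only from the arrival of operand packets, which is the property the paragraph above attributes to data-driven systems.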

In the conventional, control-driven computers, instruction executions are sequenced explicitly by control signals generated by the central processing units. In contrast to data-driven systems, they are more advantageous in handling array computation because they make use of the simple control structures which exist in array computation. On the other hand, the exploitation of parallelism in control-driven systems is more difficult because explicit control signals are needed to specify the branching and merging of execution paths, which could otherwise be done implicitly in data-driven systems by the operand packets which are sent among the instructions. More details of these various systems will be provided in the following chapters.

We believe that in order to gain significant improvement in the computation speed over existing computer systems, the new designs may have to depart from the prevalent sequential computation in both hardware and software to various extents. In other words, some of the existing development tools such as off-the-shelf components, compiler techniques, etc., may not be useful in our designs; for these reasons, we will only emphasize the architectural aspects but not any immediate implementation.

Throughout this dissertation, the term "processor" is used to denote a piece of passive hardware capable of only primitive operations; on the other hand, "computer" refers to a full-fledged machine capable of executing high-level operations. "Highly parallel" or "next-generation" applications are those containing large amounts of both synchronous and asynchronous, high and low-level computation which can be performed in parallel, such as the examples quoted in the beginning of this chapter.

2. Cyclical Architectures

The rationale of our advocacy of cyclical architectures is based on the behaviour of program executions. As exhibited in the execution cycles of instructions as well as the "DO-LOOPS" which exist in nearly all scientific and business-oriented programs, the ways in which most programs are executed are basically cyclical in nature. It is therefore natural to envision a class of architectures which have their resources arranged into a cyclical configuration as follows:

                       Feedback path
            +-------------------------------------+
            |                                     |
            v         Computation path            |
  Input --> (storage, processors and switches) ---+--> Output

  Fig. I.1. The cyclical configuration.

The main computation path in Fig. I.1 consists of both processing and switching elements, and either shift-registers or memory words are used for buffering and storage purposes. The information which goes through the feedback path consists of packets of either data, control signals or both, depending on the applications.
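
The idea of re-using a computation path through feedback can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not one of the designs of this thesis: a single row of two-input comparators is used as the computation path, its output is fed back to its input, and the two phases alternate as in the well-known odd-even transposition sort. The function names and the stopping rule (two consecutive passes without an exchange) are hypothetical choices made for this example only.

```python
def compare_exchange_row(items, phase):
    """One pass through a single row of two-input comparators.

    phase 0 compares the pairs (0,1), (2,3), ...;
    phase 1 compares the pairs (1,2), (3,4), ...
    """
    out = list(items)
    for i in range(phase, len(out) - 1, 2):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def recirculating_sort(items):
    """Feed the output of the single comparator row back to its input.

    Returns the sorted list and the number of passes taken. The loop stops
    after two consecutive passes (one of each phase) leave the data
    unchanged, since every adjacent pair has then been found in order.
    """
    data = list(items)
    passes = 0
    unchanged = 0
    phase = 0
    while len(data) > 1 and unchanged < 2:
        fed_back = compare_exchange_row(data, phase)
        passes += 1
        unchanged = unchanged + 1 if fed_back == data else 0
        data = fed_back
        phase ^= 1          # alternate between the two comparison phases
    return data, passes
```

For instance, recirculating_sort([5, 2, 9, 1, 7, 3]) returns the sorted list together with the number of passes used. An acyclic realization of the same computation would need one physical comparator row per pass, whereas the cyclical one needs a single row plus the feedback path.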

Current research involving such cyclical architectures could be broadly classified into three areas depending on the nature of the feedback signals:

(1) Special-purpose processors attached to host computers: Examples are processors for the Fast Fourier Transform [6,7] and matrix transposition [9]. For this area of application, the purpose of feedback is to allow further interactions among the data elements and also to re-use the resources along the computation path.

(2) Interconnection networks for processor-to-memory or processor-to-processor communications: Examples are single-staged shuffle-exchange networks [9,12] and multi-staged shuffle-exchange networks [20,21,22]. For this area of application, the sole purpose of feedback is to re-use the resources; there are no interactions among the data.

(3) Full-fledged, high-performance computers: With only a few exceptions [28,60], nearly all data-driven systems are based on the cyclical configuration [4,5]. For this area of application, packets are fed back as the result of the completion of instruction cycles, and new instruction packets are brought into the computation path when certain result packets are received at the end of the feedback path.

If a system could be implemented with either the cyclical or the acyclic configuration, then the relative merits and demerits of the two configurations are as follows.
In general, the cyclical configuration would give rise to a better resource utilization than the acyclic one, because its resources could be used repeatedly by means of feedback; this feature would yield tremendous savings in system resources, especially when the size of the system is very large. Therefore, if the entire system is to be considered for fabrication on a single integrated-circuit chip, its cyclical configuration would be a better choice. On the other hand, the control of the cyclical configuration is usually more difficult: in some systems, masking bits are needed to disable a subset of the processing resources [9], while in others, feedback counts are required to separate the feedback signals from the incoming ones. If the cyclical configurations are used asynchronously (e.g., as packet-switched communications networks), then they would be susceptible to the store-and-forward type of deadlocks due to circular requests of resources. Another important characteristic of packet-switched, cyclical systems is their lack of responsiveness: when interrupts occur, the computation path could already be congested with information packets, so that the interrupts cannot be processed immediately.

3. Objectives and Scope of the Thesis

The main objective of this thesis is to advocate cyclical architectures as the basic design principle of a class of high-performance systems.
Our ideas will be demonstrated by specific applications including parallel sorting, packet-switched communications and the design of a novel computer, all of which are of current research interest. The advantages of our designs relative to others will be discussed, and the methods to resolve the various afore-mentioned demerits of cyclical architectures will be presented.

In Chapter II, we will present a recirculating systolic sorter (RSS) which is designed as a single-chip, parallel sorting module to be attached to a host computer. The sorting algorithms, design of the controller, and relative merits of the RSS will be detailed. Chapter III will describe a loop-structured switching network (LSSN) intended for communications in packet-switched, multiprocessing environments. The topology, properties and performance analysis of LSSN will be discussed, and the occurrence and resolution of deadlocks will be presented. Chapter IV will outline the design of the Event-Driven Computer (EDC), which is primarily a data-driven system supplemented with control-driven activities. The rationale of design, hardware and software organizations and performance of EDC will be addressed. General discussions and suggestions of further work will be given in Chapter V.

Chapter II. A Systolic Processor for Parallel Sorting

Abbreviations:
N             : Number of input items
C, Column#    : Number of columns
R, Row#       : Number of rows
P, Comparator#: Number of comparators
i             : Loop index
j             : Moving position index
J             : Fixed position index
M(i)          : Initial marker's position in loop i
t             : Comparison cycle time
"*"           : A marker

1. Introduction

Sorting has been an important operation in business and computer engineering applications [13]. Many standard and novel sorting algorithms could be found in the literature [9-17]; some of them are optimal in time complexities, some in the number of comparators used, while others lay emphasis on architectural designs, i.e., processor interconnections, data flow, control strategies and implementation technologies.

In this chapter, we present a parallel sorting network which embodies the concepts of both the cyclical architectures and the systolic systems [3]. Systolic systems are characterized by their data flow pattern: once data are loaded from the memories, they and/or their intermediate results will move within the system along predetermined paths provided among the processing elements, and every element accepts and distributes data from and to its neighbours in a rhythmic fashion analogous to the pulsations in the arteries caused by the recurrent contractions of the heart.
A major advantage of such systems lies in the fact that processor-memory communications are involved only during the loading of the input data and unloading of the final results; therefore, there is no delay due to bus contention or memory access conflicts during the computation time.

This study will demonstrate that a cyclical architecture coupled with systolic data movements can perform the useful task of sorting. Because of the highly regular interconnection and the simple control and addressing structures, the area required by this design is very compact, and hence it is highly amenable to VLSI implementation. A description of the recirculating systolic sorter (RSS) will be given in Section 2, and the sorting algorithms in Section 3. The constraints on RSS will be discussed in Section 4, while Section 5 will analyse the RSS algorithms and their timing complexities. The relative merits of RSS will be compared and discussed along with other designs in Section 6.

Fig. II.1. The Recirculating Systolic Sorter (RSS). [Diagram labels: I/O switch; systolic array.]

Fig. II.2. The control unit of RSS. [Diagram labels: sequencer; clock; reset; counter; comparator; register holding 2*Column#; lines to/from the systolic array; line to the I/O switch; Terminate.]

2. The Recirculating Systolic Sorter (RSS)

2.A. Network Description

A schematic diagram of the proposed sorter RSS is given in Fig. II.1. The RSS network consists of an array of "quadruple" comparators which are arranged into R rows and C columns. The whole array is articulated by 2*R circular loops as shown. Each of the quadruple comparators holds and sorts four input items during a comparison cycle, except those situated at the top and bottom and located in the odd-numbered columns of the array, where only either the upper or lower portion of these comparators is involved in the sorting process.

During the initial loading phase, all the loops are opened at the Input/Output switch and connected to the input lines; data items enter the network through the loops in a serial manner, with neighbouring loops shifted in opposite directions. After the network has been loaded, the input lines are disconnected and all the loops are closed.

Before sorting commences, the comparator array has to be "marked"; the sole purpose of this is to place a marker in a certain position within each loop, to indicate the beginning and end of that loop. The convention of marking adopted here is that the "head" of each loop is associated with a marker, and the position on the right-hand side of the marker is regarded as the "tail" of that loop. The reader may refer to the examples given in Section 3 for illustrations; in these examples, asterisks are used to represent markers. Notice that the marking schemes, i.e., the ways to place the markers on the array prior to the first cycle of sorting, are different for the two examples, and they will be referred to as Scheme A and Scheme B respectively.
After the marking procedure, one of the proposed RSS algorithms is applied to the array. During a comparison cycle, input data are compared and exchanged within the quadruple comparators. If a pair of data items has to be exchanged, their associated markers, if any, do not move with them but remain where they are. However, between successive comparison cycles, when the data are shifted, the markers are shifted along with the data with which they are associated.

A schematic diagram of the control unit is presented in Fig. II.2. This unit generates the control signals (i.e., "Opcode" in Fig. II.2) to indicate one of the operations to be performed by the comparators: (1) Vertical-comparison; (2) Horizontal-comparison; (3) Diagonal-comparison and (4) Shift-operation. At the end of each comparison cycle, the control unit tests the status of the array (i.e., "Exchange/No-Exchange") to see whether any exchange has taken place during that cycle. It also has a cycle counter which keeps track of the current number of consecutive "No-Exchange" cycles. In other words, the content of the counter is incremented upon entering a new cycle, and is reset whenever there is at least one exchange in that cycle; when the count reaches twice the number of columns (i.e., Count=2*C), a termination signal is generated. At this stage, the input items have been sorted into a linear list.
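The termination rule of the control unit can be sketched in software as follows. This is only a minimal illustration in Python, not the hardware design of Fig. II.2: the class name NoExchangeCounter and the method end_of_cycle are hypothetical, and it is assumed that the counter is updated once per comparison cycle from the array's "Exchange/No-Exchange" status.

```python
class NoExchangeCounter:
    """Counts consecutive comparison cycles in which no exchange occurred."""

    def __init__(self, columns):
        # Terminate once the count reaches twice the number of columns (2*C).
        self.limit = 2 * columns
        self.count = 0

    def end_of_cycle(self, exchanged):
        """Update the count from one cycle's Exchange/No-Exchange status.

        Returns True when the termination signal should be raised.
        """
        if exchanged:
            self.count = 0      # any exchange restarts the run of quiet cycles
        else:
            self.count += 1     # one more consecutive No-Exchange cycle
        return self.count >= self.limit


if __name__ == "__main__":
    counter = NoExchangeCounter(columns=3)
    # Hypothetical status sequence: two cycles with exchanges, then quiet cycles.
    statuses = [True, True] + [False] * 6
    for cycle, exchanged in enumerate(statuses, start=1):
        if counter.end_of_cycle(exchanged):
            print("terminate after cycle", cycle)
            break
```

Resetting on any exchange, rather than counting quiet cycles in total, matters because the comparison pattern only repeats after 2*C cycles; a single exchange anywhere in that window means the array may still change.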
As demonstrated in the examples of Section 3, the first items of the sorted lists are accompanied by asterisks in the uppermost loops, and the last items are on the right-hand side of the asterisks in the lowest loops.

2.B. The Quadruple Comparator

The quadruple comparators have a higher logic density than the conventional, binary sorters used in other networks, but the number of input/output lines per comparator of the former is only slightly more than that of the latter. Fig. II.3 gives a sketch of the input/output configuration of a quadruple comparator.

Fig. II.3. The schematic diagram of a quadruple comparator. [Diagram labels: upper loop with data in, marker in, data out, marker out; lower loop with data out, marker out, data in, marker in.]

In addition to the two sets of input and two sets of output data lines, there are four single-bit lines used for shifting of markers along the two loops connected to the comparator; furthermore, one line is for the clock signal, one line is used to indicate whether any exchange has taken place during the current comparison cycle, and two lines are for the opcodes.

Essentially, what a comparator unit accomplishes is the following. If it is located in an odd-numbered column, then it will push the smallest of the four data items which it holds to its upper-right neighbour; if it is in an even-numbered column, then it will retain the smallest and the largest items in its upper-right and lower-left positions respectively. However, when markers are present inside the comparator, the situation becomes somewhat different and will be described in the next subsection.

2.C. The Comparison/Exchange/Shift Operations

For the convenience of illustration, the following symbols will be used throughout this chapter:

(i) Direction of comparison
(ii) Direction of shift

Fig. II.4. Symbols used for comparison and shift.

The direction of comparison is used to show the ordering of items after each comparison. In Fig. II.4(i), the solid arrow head indicates the position of the larger item for an ascending order; if, on the other hand, a descending order is desired, then the arrow head will indicate the smaller one. Without loss of generality, the ascending order will be assumed in this study. The open arrow of Fig. II.4(ii) is used to indicate the direction of movement for both the items and the markers during the "Shift" operations.

The four operations performed by a comparator are depicted in Fig. II.5 and described below:

1. Vertical-comparison: The two items on the upper portion of the comparator are compared to the two at the bottom in parallel, with the directions of comparison pointing downward. The presence of markers is ignored.

2. Horizontal-comparison: Case (i) When no marker is inside the comparator: the two items on the right portion of the comparator are compared to the two on the left in parallel, with the directions of comparison pointing to the left; Case (ii) When one or two markers are present: when a marker appears on the left portion of the comparator, the corresponding direction of comparison points to the right; otherwise it points to the left. Note that in the horizontal comparisons, the direction of comparison always points from right to left according to the convention adopted, except when both the head and the tail of a loop are involved in the comparison, i.e., when the marker appears on the left portion of the comparator, in which case the direction is reversed. This reversal prevents the minimum and maximum items in a loop from crossing over each other, and it is achieved by the action taken in Case (ii) above.

3. Diagonal-comparison: The two items on the upper portion of the comparator are compared to the two at the bottom in parallel, with the directions of comparison pointing downward and crossing each other. At first glance, the diagonal comparison involving the top-right and lower-left items seems redundant, because these two items are already in order after the vertical and horizontal comparisons; however, it is useful when two markers appear on the left portion of the comparator simultaneously. Furthermore, the top-left/bottom-right comparison provides an exchange not provided by the combination of the vertical and horizontal comparisons.

4. Shift: Case (i) if the comparator is located in an even-numbered column, then its top two items are shifted to the left and its lower two items to the right; Case (ii) if the comparator is located in an odd-numbered column, its top two items are shifted to the right and its lower two items to the left.

Fig. II.5. The four operations performed by the quadruple comparators. [Panels: 1. Vertical-comparison; 2. Horizontal-comparison, case (i) no marker is involved, case (ii) markers are involved; 3. Diagonal-comparison; 4. Shift, case (i) for comparators in even columns, case (ii) for comparators in odd columns.]

3. The RSS Algorithms

3.A. Algorithm I

This algorithm involves only the "Vertical-comparison", "Horizontal-comparison" and "Shift" operations but not the "Diagonal-comparison", and is described in the following program fragment written in Pascal:

    Program Recirculating-Systolic-Sorter;
      ...
    Var Terminate         : boolean;
        Column#           : integer;
        Row#              : integer;
        Comparator#       : integer;
        Exchange          : boolean;
        Count-No-Exchange : integer;
    (*Initialization*)
      ...
    While NOT Terminate do   (*enter next cycle of comparison*)
    Begin
      for C:=1 to Comparator# do
      Begin
        Vertical-comparison;
        Horizontal-comparison;
      End;
      Check-Terminate;
      Shift;
    End;

    (Algorithm I)

The procedure "Check-Terminate" manipulates the following global variables:

1. "Exchange" - This boolean variable is always reset to "False" before a new comparison cycle commences, and is set to "True" if any exchange takes place during the cycle.

2. "Count-No-Exchange" - This variable keeps track of the number of consecutive cycles which have no exchange, and is reset to zero whenever "Exchange" equals "True".

3. "Terminate" - This boolean variable controls the "WHILE-DO" loop, and is set to "True" if the following condition is satisfied:

Condition(1) (for termination): Count-No-Exchange >= 2*Column#

3.B. Algorithm II

This algorithm is similar to Algorithm I except that the "Diagonal-comparison" operation is included in its "WHILE-DO" loop:

    Program Recirculating-Systolic-Sorter;
      ...
    While NOT Terminate do   (*enter next cycle of comparison*)
    Begin
      For I:=1 to Comparator# do
      Begin
        Vertical-comparison;
        Horizontal-comparison;
        Diagonal-comparison;
      End;
      Check-Terminate;
      Shift
    End;

    (Algorithm II)

3.C. Examples

Two examples using three columns, three rows and eight comparators (i.e., C=3, R=3, P=8) are presented in Fig. II.6 and Fig. II.7. After the initial loading and marking procedures, Algorithm I and II are applied to the first and second examples respectively. The contents of the comparator array are shown for the first and the last two cycles. Both input lists are sorted into ascending order. At the end of the last cycle, the minimum of each loop is indicated by the markers and the direction of increasing values is from right to left. All the numbers in a given loop are greater than or equal to those in the next loop above.

Fig. II.6. An example to illustrate RSS Algorithm I and Marking Scheme A. [Array contents after the vertical and horizontal comparisons and the shift, at cycle times 1, 47 and 48, ending with the sorted list.]

Fig. II.7. An example to illustrate RSS Algorithm II and Marking Scheme B. [Array contents after the vertical, horizontal and diagonal comparisons and the shift, at cycle times 1, 18 and 19, ending with the sorted list.]

4. Operational Constraints

4.A. Constraints on the Size of RSS

Most sorting networks impose certain constraints on the size of the networks. For example, Batcher's bitonic sorter [13] requires that the number of its input lines be a power of two, and some mesh sorters [14,15] work on square arrays only. The basic constraint of the RSS array appears to be less stringent:

Requirement(1): Column# >= 2
                Row# >= 1

Further constraints may or may not be required depending on the marking scheme used. For Schemes A and B described below, Requirement(1) is sufficient to guarantee correct operation of both RSS algorithms when Scheme A is used, but an additional constraint on the size of RSS, which will be given later on, is needed when Scheme B is used.

4.B. Marking Scheme A

It is observed from our simulation studies that only certain ways of marking the array can guarantee correct results, and one such way is given below.

Marking Scheme A: The initial marker position, M(i), of loop i is:

    M(i) := 4*i - 2 + 1 - M(i-1)

where i=1,2,...2*Row#-1, 0 < M(i) <= 2*Column#, and M(0) can be any value in the range of M(i).

Scheme A is applied to the example of Fig. II.6, where M(0)=1, M(1)=3-1=2, M(2)=5-2=3, M(3)=9-3=6, M(4)=7-6=1, M(5)=3-1=2, and the pattern repeats. If there are only two columns, then M(3)=6 MOD (2*Column#)=2. The rationale behind this scheme will be explained in Section 5.B.

4.C. Marking Scheme B

In the second scheme, the markers are placed along the two sides of the comparator array, as demonstrated in Fig. II.7. This method is simpler, and we may use the Input/Output lines to insert the markers; also, the retrieval of the final sorted list is easier than when Scheme A is used. However, this scheme requires that the number of columns of the RSS array be twice an odd integer, or the next higher integer of that value:

Marking Scheme B: M(i) := 1,          for i=even
                       := 2*Column#,  for i=odd.

Requirement(2) (for Scheme B only): Column# := 2*(An odd integer), or
                                            := 2*(An odd integer) + 1

5. Analysis of the RSS Algorithms

5.A. Analogy with the Odd-Even Transposition Sort

The RSS algorithms bear some resemblance to the Odd-Even Transposition Sort [11]; therefore, a brief explanation of the Odd-Even sorter will be helpful in analysing the RSS design.

Fig. II.8. The Odd-Even Sorter. [Diagram labels: stage; input; sorted output.]

In Fig. II.8, the appearance of an arrow indicates the presence of a conventional, binary sorter situated at that position. An item a(j) will be compared to another item a(j') at stage s if

    j' = j + (-1)**(j+s)                                   (II.1)

where j', j and s are all greater than or equal to zero and less than N, where N is the number of input lines. The value of j' alternates between (j+1) and (j-1) as s is incremented. This sorter guarantees correct sorting of N items in N cycles [11], but it requires a total of N*(N-1)/2 sorters; it is therefore impractical if N is large.

5.B. Correctness of the RSS Algorithms and Marking Schemes

In this section, we will first prove that the RSS algorithms are correct, and then the two marking schemes will be derived. In Lemma (II.1), we will examine the effects of the RSS algorithms on each loop of the RSS array, temporarily ignoring the interactions among the loops; then Theorem (II.1) will show that with these interactions, a complete sorting process can be achieved.

Lemma (II.1): The Odd-Even Transposition Sort is performed on each of the RSS loops when either Algorithm I or II is applied to the RSS array.

Proof: Let us consider three indexes i, j and J on the RSS array. As demonstrated in Fig. II.9, i (=0,1,...2*R-1) indexes the loops of the RSS array; J (=1,2,...2*C) indexes the fixed positions of the array; and for loop i, M(i) indicates the initial position of the marker of the loop, and j (=0,1,...2*C-1) indicates the distance of a position away from the marker. Because the markers are shifted with time, j is a function of time and is related to the other indexes as follows:

    j = [(M(i)+2C-J) + (2C+(t*(-1)**i) MOD 2C)] MOD 2C     (II.2')
      = [4C + M(i) - J + (t*(-1)**i) MOD 2C] MOD 2C        (II.2)

Fig. II.9. The three indexes: i, j, and J, and the initial marker position M(i).

Fig. II.10. The horizontal comparisons carried out on the RSS array. [Diagram labels: cycle times t=0, 1, 2, ..., 2C-1; loops 0 to 2R-1 holding items a(i,0)* to a(i,2C-1); sorted output.]

The first composite term in expression (II.2') shows the effect of the initial marker's position on j, and the second composite term is due to the effect of time indexed by t. The modulo functions are used to trim the values of t and j because both of them are repetitive with a period of 2C.
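Expression (II.2) can be evaluated directly, as in the sketch below. It is an illustrative Python transcription only: the function name moving_index and its argument names are hypothetical, and it relies on Python's % operator returning a non-negative remainder for a positive modulus, so that a negative time term (odd-numbered loops) is handled without further adjustment.

```python
def moving_index(i, J, t, M_i, C):
    """Distance j of fixed position J from the marker of loop i at cycle t.

    Transcribed from expression (II.2):
        j = [4C + M(i) - J + (t*(-1)**i) MOD 2C] MOD 2C
    """
    period = 2 * C
    # The sign of the time term alternates with the parity of the loop index,
    # because neighbouring loops are shifted in opposite directions.
    time_term = (t * (-1) ** i) % period
    return (4 * C + M_i - J + time_term) % period


if __name__ == "__main__":
    C = 3          # three columns, so each loop has 2*C = 6 positions
    M0 = 2         # assumed initial marker position of loop 0
    for t in range(3):
        row = [moving_index(0, J, t, M0, C) for J in range(1, 2 * C + 1)]
        print("t =", t, "j for J = 1..6:", row)
```

At t=0 the position J=M(i) has j=0, i.e., it is the marked head of the loop, and for any fixed t the values of j over J=1,...,2C form a permutation of 0,...,2C-1.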
The reader may verify easily that expression (II.2) is correct from the example of Fig. II.9.

Having established the relationship among the indexes, we will now derive several expressions to relate a pair of data items {a(i,j), a(i',j')} involved in a comparison. First, let us consider the horizontal comparison. In Fig. II.9, a(i,J) is always compared to a(i,J') where

    J' = J - (-1)**J                                        (II.3)

For item a(i,j), j is related to the other indexes as in (II.2):

    j = [4C+M(i)-J+(t*(-1)**i) MOD 2C] MOD 2C

Similarly, for item a(i',j'):

    j' = [4C+M(i')-J'+(t*(-1)**i') MOD 2C] MOD 2C
       = [4C+M(i')-J+(-1)**J+(t*(-1)**i') MOD 2C] MOD 2C

For a horizontal comparison, i equals i'; therefore j' reduces to:

    j' = j + (-1)**J                                        (II.3')

Again from expression (II.2):

    J = -j+4C+M(i)+(t*(-1)**i) MOD 2C + 2KC

where K is any integer such that J will be positive. Substituting J into (II.3'), and noting that -j and j have the same parity, we obtain the expression for j', where a(i',j') will be compared to a(i,j) horizontally:

    j' = j + (-1)**[j+M(i)+2C+(t*(-1)**i) MOD 2C]           (II.4)

Within loop i, M(i) and (-1)**i are constants; therefore j' will alternate between (j-1) and (j+1) as t increases. By comparing (II.4) and (II.1), we can see that the "Horizontal-comparison", when coupled with the "Shift" operations, performs the Odd-Even Sort as far as loop i is concerned, and therefore items within a loop can be sorted within 2*C cycles. This point is further illustrated in Fig. II.10. Q.E.D.

Theorem (II.1): The entire RSS array is capable of sorting with the combination of either Algorithm I or Algorithm II, and either Marking Scheme A or Scheme B.

Proof: In addition to the Odd-Even comparisons within a loop, the RSS also compares the items of any two adjacent loops by means of the "Vertical-comparison" and "Diagonal-comparison"; the purpose of these operations is to move smaller items upward and larger items downward. It is easily discernible that, if comparisons are provided between the head (i.e., a(i+1,0)) of one loop and the tail (i.e., a(i,2C-1)) of the next higher loop, then odd-even sort will be carried out on the entire RSS array. Therefore, the proof of this theorem is reduced to the proof that the "head-tail" comparisons are provided by the combination of the algorithms and the marking schemes.

Let us consider the "Vertical-comparison" between a pair of items {a(i,j), a(i',j')}. In Fig. II.9, note that a(i,j) will always be compared to a(i',j') if

    i' = i - (-1)**(i + ⌈J/2⌉)                              (II.5)

From expressions (II.2, 4 and 5), we can obtain the position J where the head and tail of any two loops meet:

    j = 2C-1  =>  4C+M(i)-J+(t*(-1)**i) MOD 2C = 2KC+2C-1        (II.6)
    j' = 0    =>  4C+M(i')-J+(t*(-1)**(i+1)) MOD 2C = 2K'C       (II.7)

Combining expressions (II.6 and 7), we obtain

    8C+M(i)+M(i')-2J = (K+K')*2C+2C-1
    => J = K"C+(M(i)+M(i')+1)/2                             (II.8)

where K and K' are integers such that 0<=j<2C, and K" equals either -1, 0 or 1 because 1<=J<=2C. Expression (II.8) means that the tail of loop i will be compared to the head of loop (i+1) either halfway between M(i) and M(i'), i.e., at J=(M(i)+M(i')+1)/2, or at J=(M(i)+M(i')+1)/2+C, depending on whether there is any comparator situated at these locations. From expression (II.8),

    M(i)+M(i+1)+1 = 2(J-K"C)
    => M(i)+M(i+1) = 2(J-K"C) - 1                           (II.9)

Expression (II.9) gives rise to another requirement for the marking of the RSS array:

Requirement(3): (for marking schemes other than Scheme A and B)
    M(i)+M(i+1) = An odd integer

This requirement will ensure that the tails and heads of the loops will be compared by the vertical comparisons. It is automatically satisfied when Marking Scheme A or B is used, but has to be considered for other marking schemes. Q.E.D.

Now we will derive Marking Scheme A. From (II.5),

    i' = i+1 and i' = i-(-1)**(i+⌈J/2⌉)
    => i+⌈J/2⌉ = An odd integer
    => ⌈J/2⌉ = (An odd integer) - i

Case (i) at i=even, ⌈J/2⌉=odd; therefore,

    => J = 2*(An odd integer), or
         = 2*(An odd integer) - 1                           (II.10)

from expressions (II.9 and 10),

    => M(i+1) = 4*(An odd integer)-2K"C-1-M(i), or
              = 4*(An odd integer)-2K"C-3-M(i)              (II.11)

Case (ii) at i=odd, ⌈J/2⌉=even; therefore,

    => J = 2*(An even integer), or
         = 2*(An even integer) - 1                          (II.12)

from expressions (II.9 and 12),

    => M(i+1) = 4*(An even integer)-2K"C-1-M(i), or
              = 4*(An even integer)-2K"C-3-M(i)             (II.13)

We could obtain Scheme A by setting K"=0 in expressions (II.11 and 13):

    M(i+1) = 4*i-2+1-M(i)

Or, equivalently,

    M(i) = 4*i-2+1-M(i-1)

which is Scheme A and where 1<=M(i)<=2C, for i=1,2,...2R.

To derive Scheme B, let

    M(i) := 1,   for i=even
         := 2C,  for i=odd

Case (i) At i=odd, from expressions (II.8 and 12),

    J = K"C+(1+2C+1)/2 = 2*(An even integer), or
                       = 2*(An even integer) - 1
    => J = K"C+C+1 ∈ {3,4,7,8,...}                          (II.14)

Out of the three possible values of K", i.e., -1, 0, and +1, only K"=0 can satisfy both expression (II.14) and (1<=J<=2C); therefore,

    J = C+1 ∈ {3,4,7,8,...}
    => C ∈ {2,3,6,7,...}
    => C := 2*(An odd integer), or
         := 2*(An odd integer) + 1                          (II.15)

Case (ii) At i=even, from expressions (II.8 and 10),

    J := K"C+(2C+1+1)/2
       = K"C+C+1 ∈ {1,2,5,6,...}                            (II.16)

Both K"=0 and K"=-1 can satisfy expression (II.16) and (1<=J<=2C) simultaneously:

    when K"=0,   C ∈ {0,1,4,5,...}                          (II.17')
    when K"=-1,  C = Any positive integer                   (II.17)

For all values of i, both expressions (II.15 and 17) can be satisfied simultaneously by the requirement below:

    C := 2*(An odd integer), or
      := 2*(An odd integer)+1
    => C ∈ {2,3,6,7,...}

which is Requirement (2) of Scheme B.

Now let us consider the diagonal comparisons, and we will show that Requirement (3) can actually be waived when Marking Scheme B is used with Algorithm II. Again, from Fig. II.9, items a(i,j) and a(i',j') will be compared diagonally if

    J' = J-(-1)**J                                          (II.3)
    i' = i-(-1)**(i+⌈J/2⌉)                                  (II.5)

Converting J and J' into j and j' using expressions (II.2 and 3):

    j  = [4C+M(i)-J+(t*(-1)**i) MOD 2C] MOD 2C
    j' = [4C+M(i')-J+(-1)**J+(t*(-1)**i') MOD 2C] MOD 2C

The heads and tails of the loops will be compared by the diagonal comparisons if

    j = 2C-1
    j' = 0
    i' = i+1

Substituting these values into the above expressions, we get

    j = 2C-1  =>  4C+M(i)-J+(t*(-1)**i) MOD 2C = 2KC+2C-1
    j' = 0    =>  4C+M(i+1)-J+(-1)**J+(t*(-1)**(i+1)) MOD 2C = 2K'C

Adding up the two expressions,

    8C+M(i)+M(i+1)-2J+(-1)**J = (K+K'+1)*2C-1
    => M(i)+M(i+1) = 2J-2K"C-1-(-1)**J
                   = 2(J-K"C)-1-(-1)**J
                   = 2(J-K"C), or 2(J-K"C)-2
                   = An even integer.
The last result shows that Requirement (3) can be waived when Scheme B is used with Algorithm II, because if M(i)+M(i+1) equals an odd integer, then the head-and-tail comparisons will be provided by the "Vertical-comparison", but if it equals an even integer, then they will be provided by the "Diagonal-comparison" as demonstrated above. The various requirements for the marking schemes are summarized in Table II.1.

Table II.1 - Requirements of the RSS marking schemes.

    Marking Scheme   Algorithm I             Algorithm II
    A                Requirement (1)         Requirement (1)
    B                Requirements (1)&(2)    Requirements (1)&(2)
    others           Requirements (1)&(3), and others to be derived
                     from expressions II.5 and II.8.

5.C. Correctness of the Termination Methods

If the RSS array is correctly marked and Requirements (1), (2) and (3) are duly met, then Condition (1) of Section (2.C) is sufficient to guarantee proper termination. The reason is that, as we can see from expressions (II.2), (II.4) and (II.5), the comparison pattern repeats every 2C cycles; if there is no exchange in the most recent consecutive 2C cycles, then there will be no further exchanges in the subsequent comparisons, meaning that the sorting process must have terminated.

5.D. Timing Complexities

The RSS simulation program is listed in Appendix B for reference.
Input parameters to the simulator include the numbers of rows and columns (these two numbers determine the total number of items to be sorted) and the initial seed value for the generation of the input list; at the end of each simulation run, the simulator produces the sorted list as well as the number of sorting cycles needed. Fig. II.11 shows the numbers of cycles needed by RSS Algorithm I to sort on arrays with various combinations of numbers of rows and columns. These simulation results indicate that with Algorithm I, the average number of cycles needed to sort a random set of N items is bounded by the line N, and approaches N/2 as N increases. When Algorithm II is used, the number of cycles needed will be much smaller, due to the presence of diagonal comparisons; the examples of Fig. II.6 and II.7 help illustrate this point. However, the actual speed of Algorithm II may or may not exceed that of Algorithm I, because its comparison cycle includes the diagonal comparison and hence its cycle time will be longer.

Fig. II.11. The number of comparison cycles versus the number of items to be sorted (Algorithm I).

6. Discussions

Since sorting is such a common and necessary operation in computer applications, there are dozens of sorting algorithms described in the literature.
In this chapter, we have presented two similar sorting algorithms which apply the systolic idea to a cyclical architecture, and the functional design of a sorter (RSS) based on these algorithms has also been suggested. Our primary goal is to look into the design of a special-purpose VLSI chip that can be attached to a host computer such as the one envisioned by Foster and Kung [19]:

[Figure: block diagram of a CPU, primary memory and a pattern matcher attached to a system bus.]

Fig. II.12. A general-purpose computer system with special-purpose chips attached [19].

Undoubtedly, the usefulness of the sorter is not limited to scientific computation; it could also be used in office information systems and relational data base machines.

With the stated goal in mind, we now compare our proposal with some existing ones using the following criteria: (a) time complexity; (b) hardware complexity and (c) control complexity.

(a) Time complexity: In Table II.2, the sorting times of some existing algorithms could be divided into four categories, namely, O(logN), O((logN)**2), O(N**0.5) and O(N), where N is the number of items to be sorted. Muller and Preparata's algorithm [10] is in the fastest category, but it requires a discouraging number of comparators, O(N**2). Batcher's bitonic sorter [13] and the perfect shuffle [9] are both in the O((logN)**2) category, and they are characterized by the shuffle-exchange type of interconnections.
The two mesh sorting schemes sort N**2 items on an NxN mesh in approximately O(N) time; measured against the number of items sorted, they therefore belong to the O(N**0.5) category. Nassimi and Sahni's mesh sorting scheme [14] is based on Batcher's bitonic merge algorithm and it needs approximately 14N routing steps and 2*logN compare-exchange steps on an NxN mesh; moreover, it requires that the input subfiles be pre-sorted. Thompson and Kung's mesh sorting scheme [15] needs roughly 6N+O((N**(2/3))logN) routing steps and N+O((N**(2/3))logN) compare-exchange steps. The RSS algorithms belong to the O(N) category, but because of their simpler control structures and near-neighbour type of data movements, their actual sorting times might be less than those of the mesh sorting schemes, which require more complex control and data movements.

(b) Hardware complexity: Sorters with the shuffle-exchange type of interconnections are not well-suited to VLSI implementations, because shuffle-exchange networks have a very low degree of regularity and modularity, and require wires of various lengths. It has been shown by Thompson [18] that at least O((N**2)/(logN)**2) chip area is required to lay out an N-vertex shuffle-exchange network; this is a serious drawback when N is large. On the other hand, because the interconnection patterns required by the mesh and the RSS algorithms are highly regular and repetitive, these two types of sorters could be fabricated easily by replicating the circuits of a single comparator unit for the entire arrays.

(c) Control complexity: The logic of the various operations (i.e., Horizontal-comparison, Vertical-comparison and Diagonal-comparison) can be hardwired into each of the quadruple comparators, and the control unit shown in Fig. II.1 simply broadcasts the sequence of these operations to all the comparators. The control structure required by the RSS algorithms is therefore comparable to that required by Batcher's bitonic sorters, and is much simpler than that required by the mesh sorters. The simplicity of the RSS controller is another important factor as far as VLSI implementation is concerned.

Most other sorting networks impose certain non-trivial constraints on their sizes; for example, Batcher's sorter and the perfect shuffle network require that the number of input lines be a power of two, and the mesh sorting algorithms operate on square arrays. The constraints of the RSS algorithms (see Table II.1) appear to be less stringent in this respect.

In summary, although the RSS design is not optimal in every aspect, it is highly amenable to VLSI implementations as far as its hardware and control complexities are concerned.
Table II.2 - Complexities of Sorting Networks+

    Method                               #Input   #Comparators     Time            Interconnection   Control
    Batcher's Bitonic Sorter [13]        N        O(N{logN}**2)    O({logN}**2)    high              low
    Muller & Preparata's [10]            N        O(N**2)          O(logN)         low               high
    Perfect Shuffle [9]                  N        O(N)             O({logN}**2)    high              low
    Thompson & Kung's Mesh Sort [15]     NxN      NxN mesh         O(N)++          low               high
    Nassimi & Sahni's Mesh Sort [14]     NxN      NxN mesh         O(N)++          low               high
    RSS                                  N        O(N)             O(N)++          low               low

Notes:
+  In terms of amenability to VLSI implementations.
++ Please see discussions in Section II.6.

Chapter III. A Novel Loop-Structured Switching Network (LSSN)

1. Introduction

Many large-scale computer applications, such as image processing, weather forecasting and ballistic missile defence systems, require execution rates of more than one billion instructions per second. With the advent of VLSI technologies, it is feasible and more flexible to construct such large-scale systems by interconnecting hundreds or even thousands of off-the-shelf processing and storage devices, to work in a co-operative manner. Although several existing networks can provide the required communication bandwidth among these devices, they are expensive to build and difficult to expand. For example, the switch counts of an NxN crossbar and an NxN baseline [20] are O(N**2) and O(NlogN) respectively; for N=1024, the crossbar would require more than a million switches while the baseline would need about five thousand of them.
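The two figures quoted for N=1024 can be reproduced with a short calculation. The sketch below is only illustrative: the function names are hypothetical, and it assumes the usual exact counts behind the stated orders of growth, namely N*N crosspoints for a crossbar and logN stages of N/2 two-by-two switches for a baseline-type network (the text itself gives only O(N**2) and O(NlogN)).

```python
def crossbar_switch_count(n: int) -> int:
    """Crosspoints in an n x n crossbar, assuming one switch per input/output pair."""
    return n * n


def baseline_switch_count(n: int) -> int:
    """Two-by-two switches in an n x n baseline-type network.

    Assumes log2(n) stages of n/2 switches each, with n a power of two.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1  # exact log2(n) for a power of two
    return (n // 2) * stages


if __name__ == "__main__":
    n = 1024
    print("crossbar:", crossbar_switch_count(n))   # more than a million
    print("baseline:", baseline_switch_count(n))   # about five thousand
```

Under these assumptions the crossbar needs 1,048,576 switches and the baseline 5,120, which agrees with the rounded figures in the text.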
Another disadvantage of using a large number of switches is that of system reliability: the networks are more likely to fail when more switches are used.

In this chapter, we introduce a novel loop-structured switching network (LSSN) which overcomes the above problems. The main feature of LSSN is its cyclical connections; it requires only N/2 two-by-two switches for interconnecting N pairs of transmitting and receiving devices, and is therefore very attractive for large-scale, heterogeneous systems made up of many devices.

From the structural and functional points of view, LSSN is a packet-switched, multi-staged, blocking network with distributed control. In the next section, we will present its connection function, addressing and routing algorithms. In Section three, several important properties of LSSN will be revealed, and the causes of deadlocks and a method to avoid them will be discussed. In Section four, the results of our simulation studies and performance evaluations will be presented. Discussions and topics for further work are provided in Section five, and the LSSN simulation program is listed in Appendix C for reference.

2. Network Topology

Networks using (logL) stages of two-by-two switches, where L is a power of two, are well-known [20,21,22,25,26]. Traditionally, they are used to connect L input terminals to L output terminals, i.e., for interconnecting L transmitters to L receivers.
Feedback paths are sometimes provided to route the information back from the output side to the input side, thus forming loops. LSSN is also based on the concept of feedback loops, but it differs from others in that all its switches could be used as both entry and exit stations for data transmission and reception. With L loops, where L is a power of two and at least equal to four, LSSN can connect up to N=LlogL pairs of transmitters (Trs) and receivers (Rrs), using only N/2 switches; this feature renders it attractive for large values of N. An example with L=16 and N=64 is illustrated in Fig. III.1.

2.A. Addressing Scheme and Connection Function

The following description can be easily understood if the reader refers to the example of Fig. III.1, in which all the switches have been set to the "Straight-through" connection; a loop is defined as a closed path in this configuration.

For a LSSN with L loops, each of the loops is labeled with L'=logL bits of code, i.e., $\ell_{L'} \ldots \ell_1$, which is the binary representation of an integer in the closed range [0, L-1]. The switches are arranged into L' stages, each of which is labeled with S'=⌈logL'⌉ binary digits represented as $\sigma_{S'} \ldots \sigma_1$. The output links of a switch at the s-th stage would be assigned the following addresses:

$$\text{Left output link} = \sigma_{S'} \ldots \sigma_1 \, \ell_{L'} \ldots \ell_{s+1} \, 0 \, \ell_{s-1} \ldots \ell_1$$

$$\text{Right output link} = \sigma_{S'} \ldots \sigma_1 \, \ell_{L'} \ldots \ell_{s+1} \, 1 \, \ell_{s-1} \ldots \ell_1$$

These addresses are obtained by concatenating the stage and loop labels together, with the s-th bit of the loop label set to "0" in the address of the left output link and to "1" in that of the right output link. One could verify this scheme on the example of Fig. III.1.

Consider a switch located in the s-th stage, and suppose one of its output links is part of the loop $\ell_{L'} \ldots \ell_s \ldots \ell_1$; then it realizes the following connection function:

$$\mathrm{LSSN}_s(\ell_{L'} \ldots \ell_s \ldots \ell_1) = \ell_{L'} \ldots \bar{\ell}_s \ldots \ell_1$$

where $\bar{\ell}_s$ is the one's complement of $\ell_s$. The connection function states that at the s-th stage, any two loops with labels differing only in their s-th bits will be connected by a switch at that stage.

2.B. Routing Scheme

In LSSN, receivers (Rrs) are identified by the addresses of the output links to which they are connected, whereas transmitters (Trs) need not be identified. To dispatch a message, a Tr will generate a packet which has the following format:

    < f'f" ; destination address ; message >

where f'f" is a 2-bit field which will be referred to as the feedback count; it is initialized to zero when a packet is newly formed, and incremented whenever the packet goes through the feedback paths. Later on, it will be shown that this field never requires more than two bits regardless of the network size. The address of the Rr and the actual message to be transmitted are also contained in the packet.

Two types of switches, namely Type-A and Type-B, will be considered in our studies, and their schematic diagrams are shown in Fig. III.3.

[Figure: the loops drawn across four stages labeled 00, 01, 10 and 11, with the binary labels of the loops and links.]

Fig. III.1. Assignment of loop and link labels on LSSN which has 16 loops and 32 switches.

Fig. III.2. Connection of transmitting and receiving devices on a LSSN with 16 loops. (x = a hardwired connection for a transmitter, Tr; 0 = a hardwired connection for a receiver, Rr.)

[Figure: (a) a switch with input buffers and output ports; (b) a switch with input ports, buffer pools, intermediate ports and output ports, showing the data and control signals and the status signals indicating availability of Class-0 and Class-1 buffers.]

Fig. III.3. The schematic diagrams of a Type-A switch (a) and a Type-B switch (b).

A Type-A switch is similar to those used in conventional packet-switched networks, except that it has two built-in first-in-first-out buffers. When a packet enters a Type-A switch located in the s-th stage, it is first placed into one of its input buffers, and then switched to the left output port if the s-th bit of its destination address is a "0", or to the right output port if that bit is a "1".

As shown in Fig. III.3, a Type-B switch has a slightly more complicated internal structure than a Type-A switch. Its main features are the "structured buffer pools", which are made up of three classes of first-in-first-out (FIFO) buffers: Class-0, Class-1 and Class-2.
It also contains four intermediate ports which are connected to the Class-0 and Class-1 buffers as shown. It has two sets of outgoing status lines indicating the availabilities of its Class-0 and Class-1 buffers to its preceding switches (which are connected to its left and right input links), and two sets of incoming status lines giving the same information from its succeeding switches (which are connected to its left and right output links). The Class-k buffer, where k is in {0,1,2}, is used to accommodate packets with a feedback count of k; the Class-2 buffer is connected to the output port directly, while the Class-0 and Class-1 buffers are connected to the output port through the intermediate ports. The functions of these various mechanisms will be explained later on.

When a packet enters a Type-B switch located in the s-th stage, it will undergo the following operations:

(a) From an input port to the buffer pool: The packet will be placed into one of the buffers according to its feedback count;

(b) From the buffer pool to the output port:

    Case (i) From the Class-0 and Class-1 buffers: If the s-th bit of the destination address of the packet is a "0", then the packet will be switched to the left intermediate port and then to the left output port; else it will be switched to the right intermediate port and then to the right output port.

    Case (ii) From a Class-2 buffer: The packet will be forwarded to the output port connected to the Class-2 buffer without switching and without going through the intermediate ports.

(c) From an output port to the exterior: At the output port, the destination address of the packet will be matched against that of the output link. If a match occurs, then a strobe signal will be sent to the receiver attached to that output link, and the packet will be removed from the output port by that receiver; else the next switch at the other end of the output link will be strobed and the packet will be forwarded to its input port.

For the transmission between the last and the first stages via the feedback loops, the same operations will take place, but in addition, the feedback count of those packets emerging from the output port of the last stage will be incremented. These three operations will be collectively referred to as a single routing step for the Type-B switch.

According to the descriptions above, the Type-B switch must contain the following features in addition to those depicted in Fig. III.3.
First of all, the addresses of its output links must be made available to the matching operations (e.g., by storing the addresses inside the switch), and there must be some logic gates to perform the matching; the switch must also be able to determine whether or not it is located in the last stage of the network by examining the labels assigned to its output links, because the feedback counts of those packets passing through the last stage have to be incremented. Similar features must also be present in a Type-A switch, but the hardware involving the feedback counts need not be included.

Since the routing of packets is performed locally by each of the switches, LSSN has the advantage of not requiring a central controller. On the other hand, the lack of central control will give rise to conflicts among the packets for the shared network resources such as ports and links; the effects of such conflicts on LSSN when Type-A and Type-B switches are used will be detailed in the next section.

3. Network Properties

First, some useful theorems concerning the behavior of LSSN in the presence of a single packet will be stated; then the various properties of LSSN in the presence of more than one packet will be examined. The proofs of all these theorems are given in Appendix A, and all the logarithms used are base-two.

Notice that even though the destination address $\sigma_{S'} \ldots \sigma_1 \ell_{L'} \ldots \ell_1$ carried by a packet consists of (S'+L') bits of information, only the L' least significant bits are involved in the switching operation (i.e., Operation (b) of Section 2.B); the S' most significant bits, together with the L' least significant bits, are involved in the matching operation (i.e., Operation (c) of Section 2.B) only. This observation leads to the following lemma:

Lemma III.1: Consider a LSSN which has L loops and a packet which is destined for the address $\sigma_{S'} \ldots \sigma_1 \ell_{L'} \ldots \ell_1$, where L'=logL and S'=⌈logL'⌉. The packet will be routed to the loop $\ell_{L'} \ldots \ell_1$ within L' steps of routing after its admission into the LSSN.

Example: Consider a LSSN with L=16; then L'=4 and S'=2. A packet destined for the address (101111) will be routed to the loop (1111) within 4 steps of routing regardless of where it is generated.

Lemma III.2: Consider a LSSN with L loops and a packet which is destined for the address $\sigma_{S'} \ldots \sigma_1 \ell_{L'} \ldots \ell_1$, where L'=logL and S'=⌈logL'⌉. After the packet has been routed to the loop $\ell_{L'} \ldots \ell_1$, it needs at most another (L'-1) steps of matching along that loop to reach its destination.

Theorem III.1: In a LSSN with L loops, a packet will be delivered to its destination within (2logL-1) steps of routing regardless of where it is generated.

Example: In the example of Fig. III.1, L=16; therefore the maximum number of routing steps is (2*4-1)=7.

Theorem III.2: The average number of routing steps (ARS) needed to deliver a result packet in a LSSN with L loops is

$$\mathrm{ARS}(L) = \frac{3\log L - 1}{2} + \frac{2}{L} - 1$$

Example: In the example of Fig. III.2, L=16; therefore ARS(L=16) = (3log16-1)/2 + 2/16 - 1 = 4.625.

Corollary III.1: Any packet admitted into LSSN will go through the feedback path at most twice.

Example: In the example of Figs. III.1 and III.2, if Tr49 (which is attached to link 10 0000) sends a packet to Rr32 (which is attached to link 01 1111), then this packet will go through the feedback paths twice: the first time through the loop (1000), and the second time through the loop (1111).

Corollary III.1 explains why the packets only have to carry two bits to indicate their feedback count f'f", and also why each buffer pool of the Type-B switch is made up of three classes of buffers regardless of the network size L.

Theorem III.3: In a Type-B switch of a LSSN which has L loops, the probability that the destination address carried by a result packet will match the label of an output link of the switch, and hence that the packet will be removed from the network, is

$$P_{\text{removed}} = \frac{2L}{3L\log L - L + 4}$$

where the transmission pattern is such that each and every receiving port of the network is equally likely to receive that packet.

Theorem III.4: The maximum average throughput rate (MATR) of a LSSN with L loops is

$$\mathrm{MATR}(L) = \frac{3}{2} \times S_{R,SW} \times \log L \times \frac{L^2}{3L\log L - L + 4}$$

where $S_{R,SW}$ is the maximum rate of transmitting result packets between two switches via an output link.

3.A. Network Conflicts

When there are two or more packets in LSSN, they may contend for the same network resources such as input buffers, ports and data links, thus giving rise to conflicts.
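Before the individual conflict types are examined, the single-packet results stated in Theorems III.1 to III.3 can be checked numerically against the L=16 examples. The following is a minimal sketch that simply evaluates the stated formulas; the function names are hypothetical, and L is assumed to be a power of two and at least four, as required in Section 2.

```python
from math import log2


def lssn_size(loops: int) -> tuple[int, int]:
    """Return (N, switches): N = L*logL Tr/Rr pairs served by N/2 switches."""
    stages = loops.bit_length() - 1  # logL for a power of two
    pairs = loops * stages
    return pairs, pairs // 2


def max_routing_steps(loops: int) -> int:
    """Upper bound of Theorem III.1: 2*logL - 1 routing steps."""
    return 2 * (loops.bit_length() - 1) - 1


def average_routing_steps(loops: int) -> float:
    """ARS(L) of Theorem III.2: (3logL - 1)/2 + 2/L - 1."""
    return (3 * log2(loops) - 1) / 2 + 2 / loops - 1


def removal_probability(loops: int) -> float:
    """P_removed of Theorem III.3: 2L / (3LlogL - L + 4)."""
    return 2 * loops / (3 * loops * log2(loops) - loops + 4)


if __name__ == "__main__":
    L = 16
    print(lssn_size(L))              # (64, 32), as in Fig. III.1
    print(max_routing_steps(L))      # 7
    print(average_routing_steps(L))  # 4.625
    print(removal_probability(L))    # 32/180, roughly 0.178
```

For L=16 this reproduces the 64 Tr/Rr pairs and 32 switches of Fig. III.1, the bound of 7 routing steps, and ARS=4.625.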
If Type-A switches are used in LSSN, then there would be two types of conflicts:

(a) A1 conflicts - which are the contentions due to the simultaneous requests made by packets in the two input buffers, for the same output port;

(b) A2 conflicts - which are the contentions between an output port and the Tr sharing the same link, for the same input port of the switch at the end of the link.

A simple round-robin discipline can resolve both types of conflicts and will ensure fairness. A better alternative for A1 conflicts is to honor the input buffer which has more waiting packets in it; if both input buffers are equally occupied, then an arbitrary buffer will be chosen. As for A2 conflicts, the output ports perhaps should be given a higher priority over the Trs so that those packets which are already admitted into the network could reach their destinations faster (i.e., a faster response time could be obtained). These observations were obtained from the simulator listed in Appendix C.

As for Type-B switches, there are three possible types of conflicts (the reader may refer to Fig. III.3 for the following descriptions):

(a) B1 conflicts - which are due to the simultaneous requests made by packets from the left and right buffer pools, for the same intermediate port;

(b) B2 conflicts - which are the contentions among the intermediate ports and the Class-2 buffer for the same output port;

(c) B3 conflicts - which are the contentions between an output port and the Tr sharing the same output link, for the input port at the end of the link.

B1 conflicts could be resolved by a simple round-robin discipline: the conflicting packets are switched to the intermediate port alternately.

The resolution of B2 conflicts is more intricate. Our simulation studies showed that the round-robin discipline would give rise to unbearable propagation delay for certain packets, but much better performance, in terms of average throughput rate and delay, could be obtained with a priority-based policy (an explanation will be given in Section 3.B) which assigns the highest priority to the Class-2 buffer, then to the intermediate port connected to the Class-1 buffers, and finally to the intermediate port connected to the Class-0 buffers. With this policy, packets in the Class-2 buffers are switched to the output port immediately when the output port becomes empty; as could be explained by Lemmas III.1 and III.2, these packets will always remain in the same loops, and therefore they need not go through any intermediate port.
Furthermore, since they are assigned the highest priority in the use of the output port, they will not accumulate and hence the size of the Class-2 buffers is always bounded. When the Class-2 buffer is empty, the "eligible" intermediate port with the next highest priority will be granted access to the output port. An intermediate port connected to the Class-k buffers is said to be "eligible" if it is non-empty, and if the incoming status lines indicate that the Class-k buffer of the next switch is not full. As for the connections between the last and first stages, an intermediate port connected to the Class-k buffer is "eligible" if it is non-empty and if the Class-(k+1) buffer of the next switch in the first stage is not full. The difference between the above definitions of "eligibility" becomes clear once one realizes that the feedback count of a packet is incremented whenever it goes through the feedback paths back to the first stage. The purpose of the status lines is therefore to help prevent the output ports and input ports from being clogged with packets which cannot be switched away immediately.
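The priority order and the two eligibility rules described above can be summarized in a short sketch. This is only an illustration of the policy as stated, for a single output port of a Type-B switch; the function name, its arguments and the returned labels are hypothetical and are not taken from the simulator of Appendix C.

```python
from typing import Optional


def select_b2_source(
    class2_nonempty: bool,
    port_nonempty: dict[int, bool],
    next_buffer_full: dict[int, bool],
    last_stage: bool = False,
) -> Optional[str]:
    """Pick which source may use a free output port (B2 conflict resolution).

    port_nonempty[k]    : True if the intermediate port fed by Class-k
                          buffers (k = 0 or 1) holds a packet.
    next_buffer_full[c] : True if the Class-c buffer of the next switch is
                          full, as reported by the incoming status lines.
    last_stage          : True if this switch feeds the first stage through
                          a feedback path, where feedback counts increase.
    """
    # Highest priority: the Class-2 buffer, which bypasses the intermediate ports.
    if class2_nonempty:
        return "class2"
    # Next: the Class-1 intermediate port, then the Class-0 one, if eligible.
    for k in (1, 0):
        # Across the feedback path a Class-k packet becomes Class-(k+1).
        target_class = k + 1 if last_stage else k
        if port_nonempty[k] and not next_buffer_full[target_class]:
            return f"intermediate{k}"
    return None  # nothing eligible; the output port stays idle this step


if __name__ == "__main__":
    status = {0: False, 1: True, 2: False}
    ports = {0: True, 1: True}
    print(select_b2_source(False, ports, status))                   # intermediate0
    print(select_b2_source(False, ports, status, last_stage=True))  # intermediate1
```

In the usage example the Class-1 buffer of the next switch is full, so within a stage the Class-0 port is served instead; across the feedback path the Class-1 port is checked against the Class-2 buffer, which has room, and is therefore served first.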
The priority-based policy would favor those packets originating at the lower stages, because the feedback counts of these packets are incremented sooner than those of packets originating at the upper stages, and hence they will be assigned higher priorities sooner. However, our simulation studies show that this policy is superior to the round-robin discipline as far as the overall performance is concerned (an explanation will be offered at the end of the next section).

The resolution of B3 conflicts is rather straightforward: the conflicting Tr and the output port are granted access alternately. In addition, it is necessary for the Tr to check that the Class-0 buffer (not just the input port) at the entry point is not full before it can transmit. The availability of the Class-0 buffer has to be checked because newly admitted packets carry feedback counts of zero.

3.B.
Deadlock and Avoidance Method

In conventional non-recirculating, packet-switched networks, the blockage due to data path conflicts is temporary as long as there is a fair scheduling policy; whereas in an LSSN which uses Type-A switches, blockage might lead to deadlocks, that is, situations in which certain loops are clogged with packets and no further switching can take place along these loops, so that very soon the whole network becomes impassable. The deadlock problem in LSSN is attributable to the store-and-forward type of data movements and the cyclical requests of network resources. In a Type-A switch, if the packets coming out of its two input buffers always contend for the output ports, then the input buffers will be filled up rapidly; and if all the input buffers and output ports along a particular loop are filled with packets in transit, and if the first packets of all these input buffers are waiting for these occupied output ports to be freed, then this loop will enter a "single-loop" deadlock. A "multiple-loop" deadlock is produced in a similar manner but involves more than one loop.

According to our simulation studies, the probability of deadlock could be reduced significantly by increasing the size of the buffers and restricting the input load to a certain level; but this approach does not eliminate deadlocks entirely, and moreover, it requires a deadlock detection scheme and a recovery procedure.
Perhaps it is more efficient to get around the deadlock problem by avoiding cyclical requests of the network resources; Type-B switches are meant for such a purpose.

Our idea of using Type-B switches to prevent deadlocks is based on the concept of "structured buffer pools" put forward by Raubold and Haenle [29]. According to their method, buffer pools are divided into K classes, where K is the length of the longest path in the network concerned, and if a packet is r routing steps away from its transmitter, then it may be placed into any Class-k buffer such that k<r<K. Clearly, their method has the drawback that K must be a function of the network size. We eliminate this drawback by classifying packets according to their feedback counts, which have been proved to be bounded.

With the use of Type-B switches, the LSSN will be free of the store-and-forward type of deadlocks. A simple explanation is as follows: packets entering the buffer pools request buffers according to their feedback counts, and therefore there is no circular request on the buffers; as for the shared links and input/output ports, these network resources are granted to the requesting packets on the condition that their occupation by the packets will always be temporary. With this idea in mind, we now state the following theorem:

Theorem III.5: The LSSN which uses Type-B switches is deadlock free.

An explanatory proof of Theorem III.5 is given in Appendix A.

3.C.
Network Extensibility

Very often it is desirable to expand a network after it has been built; but usually such an expansion is difficult with most, if not all, existing designs. LSSN has the very useful property that it can be expanded incrementally by adding more stages to the basic structure without complicating the addressing and routing algorithms. Of course, to facilitate the expansion, there must be sufficient address lines to account for the added stages and devices.

One way to expand the basic structure while keeping it intact is to add the new stages to the bottom of it, immediately after the last stage of switches and before the perfect shuffle takes place. Suppose there are L loops and L' stages originally, and we want to add L" more stages; then the expanded LSSN would have a total of (L'+L") stages: stages 1, 2, ..., L' form the basic structure, and stages L'+1, ..., L'+L" are the additional stages.

Now the stages and Rr's will be addressed using $S'' = \lceil \log_2 (L'+L'') \rceil$ and $\lceil \log_2 (L'+L'') \rceil + L' = S'' + L'$ binary digits respectively. The addressing scheme and connection function for the newly added stages are:

Left output link = [expression illegible in the scanned source]
Right output link = [expression illegible in the scanned source]
LSSN(...) = [expression illegible in the scanned source]

In words, all the new stages would be treated much the same as the last stage of the basic structure, i.e., the L'-th stage; there is no shuffling among the output links of these new stages; and packets which are sent to them are routed according to the L'-th bits of the destination addresses of the packets. In the expanded LSSN, the duty of incrementing the feedback counts is performed by the (L'+L")th stage rather than the L'-th; such a minor change has to be taken care of during the expansion. We do not intend to derive theorems from scratch for the expanded network because the validity of Corollary III.1 and Theorem III.2 is discernible if the new stages are regarded as subsidiaries of the L'-th stage, i.e., if stage L' through stage (L'+L") are considered as a single, compound stage.

Lastly, we would like to point out that LSSN could also be expanded in the more expensive way by doubling its loop count.

4. Simulations and Performance Analysis

In our simulation studies, the throughput rate and delay are the two measures used for evaluations and comparisons. Throughput rate is defined as the average number of packets flowing through the network per unit time, and the delay of packets is defined as the average interval between their generation and reception. The delay is made up of "entrance delay" and "propagation delay", where entrance delay is the average duration that a packet has to wait at the entry point, and propagation delay includes the time spent in queueing and switching within the network.
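As a small worked example of the address widths quoted above for the expanded network, the following sketch computes the number of stage-address bits and Rr-address bits. The function name is hypothetical, and the logarithm is assumed to be base 2 since the text counts binary digits.

```python
def address_widths(original_stages: int, added_stages: int):
    """Return (stage address bits, Rr address bits) for an expanded LSSN.

    original_stages is L' and added_stages is L'' in the text.
    """
    total_stages = original_stages + added_stages
    # ceil(log2(n)) computed exactly with integer arithmetic.
    stage_bits = (total_stages - 1).bit_length()
    # Rr addresses use the stage bits plus L' further bits.
    receiver_bits = stage_bits + original_stages
    return stage_bits, receiver_bits


# An unexpanded network with 4 stages, and the same network with 2 added stages.
print(address_widths(4, 0))
print(address_widths(4, 2))
```

With L' = 4 and no added stages, the sketch gives 2 stage bits and 6 Rr bits, which agrees with a network of 64 receivers; adding two stages raises these to 3 and 7 bits.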
Request interval is the varying parameter and is defined as the average time between the last successful transmission and the generation of the next packet.

In order to obtain some meaningful results and to facilitate the analysis later on, we have made the following assumptions:

(a) The transmitting and receiving pairs are randomly selected out of the entire address space;
(b) The transmission pattern is such that if the current request to transmit is in process or blocked, then the transmitter affected will not generate the next request;
(c) Packets are removed immediately from the network when they arrive at their destination;
(d) As for timing considerations, the amount of switching delay in going through a conventional binary switch was estimated to be five gate delays [30]: three for path selection and two for data transfer. Since Type-A switches would lead to deadlocks on LSSN, their analysis will not be included in our studies. A Type-B switch would need more delay than the conventional ones: three gate delays for path selection, two for data transfer from the input ports to the buffer pools, two from the buffer pools to the intermediate ports, and another five for path selection and data transfer from the intermediate ports to the output ports, giving a total of twelve gate delays. For packets which are inside Class-2 buffers, the switching delay is shorter because they do not have to go through the intermediate ports.
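The gate-delay budget in assumption (d) can be checked with a few lines of arithmetic. The dictionary keys below are descriptive labels chosen for this sketch; only the numbers come from the text.

```python
# Gate delays through a conventional binary switch, as quoted in the text.
CONVENTIONAL_SWITCH = {
    "path selection": 3,
    "data transfer": 2,
}

# Gate delays through a Type-B switch, as quoted in the text.
TYPE_B_SWITCH = {
    "path selection": 3,
    "input ports to buffer pools": 2,
    "buffer pools to intermediate ports": 2,
    "intermediate ports to output ports (selection and transfer)": 5,
}


def total_gate_delays(stages: dict) -> int:
    """Sum the per-step gate delays of one switch traversal."""
    return sum(stages.values())


print(total_gate_delays(CONVENTIONAL_SWITCH), total_gate_delays(TYPE_B_SWITCH))
```

The totals are five and twelve gate delays respectively, matching the figures stated above.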
The main duty of the LSSN simulator is to compute the total delay of each individual packet by summing up its entrance, switching and queueing delays.

These assumptions are considered justifiable, and they have also appeared in the simulation studies of other packet switching networks (e.g., references [25,30]).

In the LSSN simulator, a timer was associated with each packet entering the network, so as to record each type of delay that it would encounter. From time to time, the whole switching array was inspected to make sure that no packet was subjected to substantial delay, which is an indication of potential deadlocks.

The LSSN under study had 16 loops and was fully connected with 64 pairs of transmitters and receivers. The effects of the buffer size on the network performance were investigated first, and it was confirmed that because the Class-2 buffers were given the highest priority in the B2 type of conflicts, the maximum requested size of the Class-2 buffers was bounded by two. For this reason, the size of the Class-2 buffers was fixed at two, and the sizes of the Class-0 and Class-1 buffers were varied from 4 to 14 (an arbitrary range). Our results (please see Fig. III.5) show that the variation of the sizes does not have a significant effect; the reason is that when more buffers are used, more traffic is introduced into the network, so although the entrance delay of a packet is reduced, its propagation delay is increased; as a result, the total delay is not much affected.

In the second part of our study, we compared the performance of a 64x64 LSSN to that of a 64x64 baseline and then a 16x16 baseline. The baseline networks were considered because they are topologically equivalent to many existing networks [20]. We must emphasize that our comparison studies are not entirely fair because baseline-like networks could be used as either circuit-switched networks (e.g., the Star network [24]) or packet-switched networks, whereas LSSN is intended to be used as a packet-switched network only; furthermore, the LSSN switches have a much higher logic gate density than the conventional ones.
For comparisons, we assumed that the switches used in the baselines had a buffer size of 16 (an arbitrary number); as for the LSSN switches, the sizes of their Class-0, Class-1 and Class-2 buffers were fixed at seven, seven and two, respectively, which is a total of 16 as well. The results obtained for this buffer size are presented in Fig. III.5; other buffer sizes would produce results very similar to these.

In Fig. III.5, all the measurements were scaled by the factor f, which is the operating frequency of the networks. The value of f could be as high as 60 MHz if the switches are fabricated with TTL gates, or 400 MHz with ECL gates. A 64x64 baseline would need a total of 192 switches whereas a 64x64 LSSN and a 16x16 baseline would require 32 switches each. Not only do the numbers of switches contribute to the complexities of the networks, but the amounts of wiring have to be considered as well. Our results show that if the request interval is very short, the throughput of the LSSN will be close to that of the 16x16 baseline, and its delay will be about three times higher; but when the request interval is longer than 40/f, both the throughput and delay of the LSSN will approach those of the 64x64 baseline.
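The switch counts quoted above can be reproduced with a short calculation. The sketch assumes binary (2x2) switches, the usual (N/2) log2 N count for an N x N baseline network, and an LSSN with L loops, log2 L stages and one switch per pair of loops in each stage; the LSSN formula is inferred from the figures quoted in the text rather than stated there, and the function names are hypothetical.

```python
def baseline_switches(ports: int) -> int:
    """Number of 2x2 switches in an N x N baseline network (N a power of two)."""
    stages = ports.bit_length() - 1      # log2(N) stages
    return (ports // 2) * stages         # N/2 switches per stage


def lssn_size(loops: int):
    """Return (Tr/Rr pairs, switches) for an LSSN with the given loop count.

    Assumes log2(L) stages, L/2 switches per stage and one Tr/Rr pair
    per loop per stage (an inference from the numbers in the text).
    """
    stages = loops.bit_length() - 1
    return loops * stages, (loops // 2) * stages


print(baseline_switches(64), baseline_switches(16), lssn_size(16))
```

Under these assumptions a 16-loop LSSN serves 64 Tr/Rr pairs with 32 switches, the same switch count as a 16x16 baseline and one sixth of the 192 switches of a 64x64 baseline.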
If the request interval is further increased, the delay of the LSSN will be reduced to that of the 16x16 baseline. In summary, our results indicate that the performance of a 64x64 LSSN can match that of a 64x64 baseline with a significantly lower switch count and hence less wiring. This saving is even more substantial when the networks considered are very large.

5. Discussions and Outlook

We have described a novel method to set up a communication network based on the concept of cyclical architectures; we have also presented the addressing and routing schemes, and several properties of the network. Although packet-switched, cyclically connected structures are susceptible to the store-and-forward type of deadlocks when used asynchronously, we have suggested a deadlock avoidance scheme based on some unique features of our design.

The topology of our proposed network, LSSN, resembles that of some existing ones. If only L processors are attached to LSSN such that they all transmit packets through the first stage and receive packets from the last stage, then LSSN would be reduced to an indirect binary n-cube [21].
This similarity implies that the useful algorithms developed for the indirect binary n-cube could be adapted for LSSN easily. LSSN also resembles the last stage of Batcher's bitonic sorter [13]; therefore, it is possible to perform Batcher Sort on LSSN provided there exists a masking scheme to disable some of the attached processors as data items are circulated around the network. LSSN could also be partially connected and used as an arbitrator or a distributor, both of which are essential in the designs of data-driven computers [4,5,8,52,53].

LSSN can also be used to perform arbitrary permutations, i.e., one-to-one mappings, from the input side to the output side. Such permutations would require the presence of a central controller to compute the routing information; alternatively, frequently used control patterns could be pre-computed and retrieved when needed. The simpler Type-A switches could be used for such an application, and they will not cause the network to deadlock as long as only one permutation is performed at a time, and provided there are sufficient buffers inside the switches:

$$B(L) > \max\left\{ \min\left(2^{\lceil 0.5 \log N \rceil},\ N / 2^{\lceil 0.5 \log N \rceil}\right),\ \min\left(2^{\lfloor 0.5 \log N \rfloor},\ N / 2^{\lfloor 0.5 \log N \rfloor}\right) \right\} = O(N^{0.5})$$

where B(L) is the number of buffers of the Type-A switches needed to avoid deadlocks, and N = L log L, where L is the number of loops; the worst-case delay to perform a one-to-one mapping is:

$$T_{max}(L) = 2^{\lceil 0.5 \log N \rceil} + N / 2^{\lfloor 0.5 \log N \rfloor} - \log L - 1 = O(N^{0.5})$$

and the average delay is:

$$T_{avg}(L) = O(\log N)$$

These results can be found in reference [23]; because both B(L) and Tmax(L) are intolerably large for large values of N, we do not intend to include the analysis in this dissertation.

As is shown by our simulation results (Fig. III.6), the performance of the LSSN is not as good as that of the baseline network for applications which have very short inter-transmission times; but if the transmitters are processors which send out data packets as the results of instruction executions, i.e., if the processors compute and dispatch alternately, then the LSSN will be an attractive design. Our proposed system reduces external hardware complexity (e.g., component counts, wiring, etc.) at the cost of higher internal hardware complexity (i.e., logic gates per switch).
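To give a feel for the buffer bound and worst-case delay quoted above for permutation routing with Type-A switches, the following sketch evaluates both expressions numerically. It assumes base-2 logarithms and N = L log2 L, and it reads the first exponent of the worst-case delay as a ceiling, which is an interpretation of the scanned formula rather than a confirmed reading; the function name is hypothetical.

```python
from math import ceil, floor, log2


def permutation_bounds(loops: int):
    """Evaluate the buffer bound and worst-case delay for L loops.

    Returns (bound that B(L) must exceed, T_max(L)).
    """
    n = loops * log2(loops)              # N = L log L
    half = 0.5 * log2(n)
    hi = 2 ** ceil(half)                 # 2 ** ceil(0.5 log N)
    lo = 2 ** floor(half)                # 2 ** floor(0.5 log N)
    buffer_bound = max(min(hi, n / hi), min(lo, n / lo))
    t_max = hi + n / lo - log2(loops) - 1
    return buffer_bound, t_max


for loops in (4, 16, 256):
    print(loops, permutation_bounds(loops))
```

Both quantities grow roughly as the square root of N, which illustrates why they become large for large networks.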
High internal complexity can be achieved easily with today's technologies, but too many external components and too much wiring often render a system difficult to manage; this is the main motive behind our design.

[Figure: throughput (packets/sec) and delay (sec) versus inter-arrival time (sec), from 40/f to 200/f, where f is the network frequency. Curves: a: 64x64 LSSN (buffer = 14/14/2); b: 64x64 LSSN (buffer = 7/7/2); c: 64x64 LSSN (buffer = 4/4/2). The numbers in brackets refer to the sizes of the Class-0, Class-1 and Class-2 buffers respectively.]
Fig. III.5. Effect of buffer size on the throughput and delay of a 64x64 LSSN.

[Figure: throughput rate (packets/sec) versus inter-arrival time (sec), from 40/f to 200/f, where f is the network frequency (Hz). Curves: baseline network (64 Tr's x 64 Rr's, 192 switches); loop-structured switching network (64 Tr's x 64 Rr's, 32 switches); baseline network (16 Tr's x 16 Rr's, 32 switches).]
Fig. III.6. The throughput rates of a 64x64 baseline, a 64x64 LSSN and a 16x16 baseline, versus the inter-arrival time.

[Figure: delay (sec/packet) versus inter-arrival time (sec), from 40/f to 200/f.]
Fig. III.7. The delay curves of a 64x64 baseline, a 64x64 LSSN and a 16x16 baseline, versus the inter-arrival time.

Chapter IV. Design and Evaluation of the Event-Driven Computer (EDC)

1. Introduction

1.A.
Background Information

In this chapter, we will examine how the concept of cyclical architectures could be applied to the design of a high-performance supercomputer; to start with, we will discuss the shortcomings of conventional computer systems in this respect, and then identify the approach of our design.

Conventional computer systems are often referred to as Von Neumann, or sometimes as Harvard, machines, and they all have very similar "control" and "data" mechanisms. "Control" mechanisms refer to the methods for scheduling instructions for execution, and "data" mechanisms refer to the methods for passing data among instructions. Von Neumann computers are termed "control-driven" because their instruction executions are sequenced by control signals generated by the CPUs (Central Processing Units). In these computers, data are passed among instructions by writing and reading memory locations which are specifically assigned to these data.
A major drawback of using Von Neumann computers in a highly parallel environment is attributable to the need for, and difficulties in, specifying concurrency: either the programmer or the compiler has to be careful with the generation of control signals, to ensure that memory locations are not corrupted by wrong information during read and write operations. Such a drawback is easy to overcome in a SIMD (Single Instruction stream Multiple Data streams [6]) system because there is only a single stream of executable instructions; whereas in a MIMD (Multiple Instruction streams Multiple Data streams) system, the asynchronous behavior of memory accesses among the various instruction streams often complicates the implementation of the control mechanisms. Another major drawback is attributable to the difficulty in "load partitioning", which often gives rise to uneven work load distributions among the processors and access bottlenecks in the memory modules. Moreover, system extensibility is always difficult to achieve, and increments in the number of processors are often not accompanied by proportionate improvement in the system performance.
Research in "data-driven" computers [4] is aimed at alleviating the above shortcomings, and both their data and control mechanisms are implemented very differently from those of Von Neumann systems: the data mechanism is such that data are passed from the producing instructions to the consuming ones directly without going through any intermediate storage, and the control mechanism is such that the consuming instructions are readied for execution if, and only if, they have received all the required data and information. For a data-driven computer, the control mechanism could therefore be implemented as suboperations incorporated into its data mechanism, which could be easily implemented using well-known compilation techniques (e.g., data-flow analysis). The absence of an explicit control mechanism would ease the task of the programmer in specifying parallelism to a great extent. As a result, the data-driven approach is very appealing for multiprocessing and multiprogramming environments which contain large amounts of unstructured, asynchronous concurrency.

However, the implicit control mechanism of data-driven systems does not conform to the notion of certain activities such as input and output operations, which are not necessarily ready for execution when their data have arrived. Further, the data mechanism of data-driven systems is very inefficient in handling large arrays because sending arrays among instructions for computation is both time and space consuming.
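The data-driven firing rule described above (an instruction is readied only when all of its operands have arrived, and its result is sent directly to its consumers) can be illustrated with a toy interpreter. This is a minimal sketch for illustration only; the `Instruction` class, the token format and the example program are assumptions and do not correspond to any particular data-driven machine.

```python
import operator


class Instruction:
    """A data-driven instruction with operand slots and a list of consumers."""

    def __init__(self, op, n_inputs, consumers):
        self.op = op
        self.n_inputs = n_inputs
        self.consumers = consumers   # list of (instruction name, operand slot)
        self.operands = {}           # operand slot -> value received so far


def run(program, tokens):
    """Deliver data tokens until none remain; return each fired result."""
    results = {}
    pending = list(tokens)           # tokens are (name, slot, value)
    while pending:
        name, slot, value = pending.pop()
        instr = program[name]
        instr.operands[slot] = value
        # Firing rule: execute if, and only if, all operands have arrived.
        if len(instr.operands) == instr.n_inputs:
            args = [instr.operands[i] for i in range(instr.n_inputs)]
            out = instr.op(*args)
            results[name] = out
            # Pass the result directly to the consuming instructions.
            for target, target_slot in instr.consumers:
                pending.append((target, target_slot, out))
    return results


# Evaluate (a + b) * (c - d) with a=2, b=3, c=7, d=4.
program = {
    "add": Instruction(operator.add, 2, [("mul", 0)]),
    "sub": Instruction(operator.sub, 2, [("mul", 1)]),
    "mul": Instruction(operator.mul, 2, []),
}
results = run(program, [("add", 0, 2), ("add", 1, 3), ("sub", 0, 7), ("sub", 1, 4)])
print(results["mul"])
```

Note that no explicit sequencing of `add`, `sub` and `mul` is written anywhere: the order of execution follows entirely from the arrival of data, which is the property the text attributes to the implicit control mechanism.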
From the above discussions, it is clear that the data-driven and control-driven approaches complement each other; therefore, it is very natural to envision a class of computers which combine their control and data mechanisms for the purpose of better performance.

1.B. Recent Developments

This section will examine three existing proposals which adopt the combined approach:

(1) Dependence-Driven system (1981) [32];
(2) Combined system (1982) [33];
(3) Piece-wise Data-Flow system (1983) [34].

The Dependence-Driven system is made up of a GCU (Global Control Unit) and several processor clusters, each capable of executing a high-level function. The compiler is expected to produce all the static information about the computation, and the GCU performs run-time scheduling. This system is best suited for computation which can be heavily vectorized; however, the presence of scalar computation would cause some processing resources to stand idle during its execution, thus giving rise to under-utilization of these resources.

The Combined system integrates the concepts of "pure" data-driven computation and those of "multi-thread" control-driven computation.
Treleaven et al. [33] have shown how iterations, procedure calls and resource management are carried out on this system; not mentioned is how array operations are performed. If array operations are decomposed into individual packets each containing a participating array element, then there will be an enormous amount of overhead associated with setting up and transmitting the packets, and with synchronizing the completion of the array operations.

Requa et al. [34] have provided a rather detailed description of the PDF system, which possesses both SIMD and MIMD characteristics; these two classes of computation are performed on different types of hardware modules which are not interchangeable. We believe that if both scalar and array operations could be carried out on the same type of hardware module, then there would be fewer module types, and hence the system would be less expensive to design and easier to control. The PDF system avoids using any interconnection network by limiting the number of scalar processors to about eight; therefore, the speed of the PDF system is expected to show limited improvement over existing ones (please refer to the Introduction Section of [34]).

1.C.
Overview of Our Approach Our o b j e c t i v e i s t o design a heterogeneous m u l t i p r o c e s s o r system which, (1) i s capable of us i n g hundreds t o thousands of pr o c e s s o r s ; (2) has a p r o j e c t e d speed range of 100 to 1,000 MOPS ( m i l l i o n o p e r a t i o n s per second); ( 3 ) possesses both SIMD and MIMD c h a r a c t e r i s t i c s — t h i s i s to be achieved by combining the p r i n c i p l e s of da t a - d r i v e n and c o n t r o l - d r i v e n computation; ( 4 ) i s intended f o r ne x t - g e n e r a t i o n a p p l i c a t i o n s , and i s expected to depart from the p r e v a l e n t , Von Neumann a r c h i t e c t u r e s . 86 In order to connect a l a r g e number of p r o c e s s o r s together and yet ma i n t a i n i n g a high degree of f l e x i b i l i t y , a l a r g e s w i t c h i n g network would be i n c l u d e d i n our d e s i g n . To achieve the d e s i r e d speed range, the intended a p p l i c a t i o n s must possess a l a r g e amount of concurrency to keep the pr o c e s s o r s busy most of the time. Since the r a t i o of SIMD and MIMD i n s t r u c t i o n mix d i f f e r s from a p p l i c a t i o n to a p p l i c a t i o n , i n order to f u l l y u t i l i z e i t s r e s o u r c e s , the system must be abl e to maintain roughly the same l e v e l of performance r e g a r d l e s s of the r a t i o of mix. We a l s o b e l i e v e that i n order to a t t a i n a s i g n i f i c a n t achievement toward u l t r a - f a s t computation, the new design may have to depart from the p r e v a l e n t Von Neumann systems i n both hardware and software; t h e r e f o r e , we only emphasize the a r c h i t e c t u r a l aspects of our design r a t h e r than any immediate implementation. 
In our proposed system, there are two basic types of operations, scalar and compound operations, both of which are scheduled for execution using data-driven principles; but suboperations within a compound operation are sequenced for execution in a control-driven manner. A compound operation is either a computational array operation, an array alignment operation or a block of sequential program. A sequential program is one which either requires a fast computation time and could run faster when executed solely by a single processor in a SISD (Single Instruction stream Single Data stream) mode than by many of them in a MIMD mode (due to communications overhead), or is used to control an inherently sequential process such as printing on the line printer.

[Figure: block diagram showing the I/O devices, SP, LMs and RPs.]
Fig. IV.1. The EDC system block diagram.

As shown in Fig. IV.1, an EDC consists of six basic parts: a Supervising Processor (SP); a bank of Local Memories (LMs); several Transmitting Processors (TPs) and Receiving Processors (RPs); a number of Instruction Registers (IRs) and a Packet Switching Network (PSN). The main duty of SP is to load and spread instructions and data into the bank of LMs, and to initiate the appropriate TPs to start execution, both using the read/write links provided on the left of LMs and TPs. Compound operations (i.e., array computation, array alignments and executions of sequential programs) will involve the use of these read/write links as well: when a TP receives such a compound operation, it will become a subcontroller and request SP either for the control of several TP-LM pairs for array computation in a SIMD manner, or to initiate the execution of a block of sequential program on another TP in a SISD fashion.

Information flowing on the right of LMs and TPs is encapsulated into the form of either result or instruction packets: result packets are generated by TPs and are switched through PSN to RPs which will place them into the proper LMs; RPs are also responsible for the formation of instruction packets: they retrieve the executable instructions and data from LMs and buffer them into IRs to wait for free TPs for execution. Because each of the TPs will receive instructions from different instruction streams from time to time, the interpretation of "MIMD" in this case is somewhat different from the traditional one.

Other salient features of the EDC system include:

(a) Fewer module types: Only a few types of hardware modules are used, although each type is intended to be used in large amounts. This would reduce the design costs and give rise to a simpler architecture which is easier to control than one which uses a lot of module types.

(b) Interleaved instructions and skewed arrays: A subset of the LMs are used to store sequential programs which will be executed by their associated TPs using the read/write links in a SISD mode; while the majority of the LMs are meant for non-sequential programs which will be executed in a MIMD mode by TPs using the packet-switched network. For the latter case, the instructions will be interleaved into the LMs concerned, thus randomizing and equalizing the access pattern of TPs and RPs; array elements will be skewed into these LMs using known storage techniques [36,37] which allow different portions of an array to be referenced concurrently. These features would reduce the problem of memory access bottlenecks.

(c) Overlapped array operations: While an array is being operated on using the read/write links, several array alignment operations could be carried out using the packet-switched network which provides a novel way to synchronize the completions of the alignment operations, and to signal those instructions dependent on them.

(d) Extensibility: The EDC architecture is highly extensible. After an EDC system has been built, the numbers of TPs, RPs, IRs and LMs could be increased incrementally. Such an advantage is attributable to the extensibility of the network.

A more detailed schematic diagram of an EDC is shown in Fig. IV.2. The functional descriptions of the system hardware will be given in Section 2, and Section 3 will explain how information is stored and processed in an EDC.
Section 4 will describe the nature of the programming language to be used, and Section 5 will examine the performance of an EDC. A comparison of an EDC with the three afore-mentioned designs and some suggested work will be given in Section 6.

[Figure: connection diagram; the external devices shown include a terminal, a file, sensors, actuators, etc.]
Abbreviations: SP = Supervising Processor; SM = System Memory; CS = Channel Selector; LM = Local Memory; TP = Transmitting Processor; RP = Receiving Processor; IR = Instruction Register; PSN = Switching Network; T = Number of Transmitters; R = Number of Receivers; M = Number of Local Memories.
Fig. IV.2. The connection diagram of EDC hardware architecture.

2. EDC Hardware Architecture

2.A. Processing Modules

(1) Supervising Processor (SP)

SP is the master controller of the whole system and it oversees the executions of the following activities:

(a) Program downloading and initialization: Programs are loaded from external sources such as the host computer or bulk memories, and stored in the System Memory (SM) initially. When a program is called for, SP will access a storage utilization table (SUT) which is located in SM, and allocate free memory pages to the called program which will then be fetched from SM and loaded into LMs. At the end of loading, SP will signal the TPs concerned to start execution.

(b) Input and output operations: Input data will first go to the input buffer located in SM, and then proceed to the appropriate LMs. If the various parts of an array are to be referenced independently and concurrently, then it will be skewed into the first R LMs using techniques described by Budnik et al [36,37] for conflict-free accesses; the storage pattern will then be recorded in an array description table (ADT) located in SM for future references. Output data will be transferred from LMs to the output buffer which is also located in SM, and then to the outside devices.

(c) Process and resource management: SP also handles requests for process creations, interactions and terminations, procedure calls and the use of memories as well as other resources. A request list (RL) is maintained by SP to enqueue those requests that cannot be honored immediately.

(d) Setting up of compound operations: Scalar operations need not go through SP and are executed by TPs autonomously; whereas compound operations have to be set up by SP.
If the compound operation is an array operation, then SP will request a subset of the first R TPs to perform the operation under the demand and control of a subcontroller TP(j), where j > R. In the case of a block of sequential program, SP will load it into LM(k), where R < k <= M, and request TP(k) to execute it. In the former case, the choice of TPs will be specified by the subcontroller TP(j) according to the compound operation it has received; in the latter case, the choice is arbitrary. The Channel Selector (CS) will be set up by SP to realize the above connections.

(e) Other operating system tasks: SP may either execute these tasks directly, or regard them as application tasks and assign them to TPs. The choice depends on the nature of the O.S. tasks.

(2) Receiving Processors (RPs)

There are R RPs connected to the receiving side of the network PSN. An RP will continuously remove the arriving result packets from the network and update the contents of the LMs accordingly. The formats of the various types of result packets are listed in Table IV.6.
An RP will respond to the content of a result packet as follows:

(a) If it is an array element, then it will be stored into the memory location as specified by its destination address;

(b) If it is a scalar operand, the base address of an array or a signalling token, then the receiving processor will update the instruction word given by the destination address of the packet, and it will then examine whether that instruction has received all the required information; if it has, then the instruction will be placed in an instruction register (IR) to wait for a free TP for execution (the selection of IR-TP pairs will be given in Section 2.B(3)); otherwise no further action will take place.

(3) Transmitting Processors {TP(1) to TP(R)}

This group of TPs will execute both scalar and array operations. Any free TP belonging to this group will continuously check its associated instruction registers for the addresses of executable instructions. If IR(k) contains one, then TP(k) will fetch the corresponding instruction from LM(k) and execute it. The computed results together with the addresses of the next instructions will be packaged into result packets which are then forwarded to the network for distribution.

A subset of these TPs may undergo an array operation under the control of TP(i) where i > R.
When TP(i) receives such a compound operation, it will generate and broadcast the control signals to these TPs via the Channel Selector (CS). As soon as these TPs have finished their current activities, they will respond by fetching the array elements from their LMs according to the broadcast signals. If the array operation is,

(a) a computational activity, then these TPs will operate on the elements and then store the results back to the memories using the read/write links;

(b) an alignment operation, then these TPs will package the elements into result packets and forward them to the network for alignment. After the last element has been sent out, some of these TPs will be requested by SP to generate a synchronization token which will be forwarded to the network to indicate the end of transmission (this synchronization process will be described in Section 2.C(2)).

If a TP is not involved in or has just completed an array operation, it will resume its normal activities as mentioned in the beginning of this subsection.

(4) Transmitting Processors {TP(R+1) to TP(T)}

The main function of these TPs is to execute scalar operations; those with LMs may be requested by SP to execute sequential programs as well.
Any free TP belonging to this group will continuously check its associated IR for executable instruction packets. Unlike the previous group, these TPs require that the actual instructions (i.e., the opcodes, immediate operands and addresses of next instructions) be available in the IRs, rather than the addresses of the instructions, because these TPs do not have direct read/write links to access the first R LMs where the non-sequential programs are stored. The results computed by these TPs will be packaged into result packets which will then be forwarded to the network for distribution.

To initiate the execution of a sequential program, SP will select any free TP-LM pair of this group, load the program into the LM, and request the associated TP to execute it. Upon completion, that TP will either signal SP or produce a result packet to trigger other instructions via the network.

The number of TPs could be larger than or equal to that of LMs, depending on the speeds of the various hardware modules and the intended applications.

2.B. Storage Modules

(1) System Memory (SM)

The aforementioned input and output buffers are located in SM which also contains application programs as well as system software such as I/O routines and interrupt service routines.
While in SM, all the addresses of a program will remain in the relative form so that the program could be relocatable; when copied into LMs, these relative addresses will be translated into absolute ones by the TPs connected to the LMs, using the base address provided by SP. If a program is to be called repeatedly, then a copy of it will be kept in SM for replication purposes. SM also contains those aforementioned tables, namely, the storage utilization table (SUT), the array description table (ADT), the request list (RL), as well as a linkage information table (LIT) which provides the linkage information between a calling program and its called programs.

(2) Local Memories (LMs)

LM(1) through LM(R) are used to contain interleaved instructions and skewed arrays. Their left ports are connected to SP and TPs while their right ports to the RPs. Contentions between RPs and TPs could be resolved by granting their requests in an alternating manner. LM(R+1) through LM(M) are used to store sequential programs which are to be executed solely by the associated TP. At times SP will interrupt the above activities for the loading and unloading of programs; such interferences could be reduced by increasing the size of LMs so that most of those frequently needed programs could reside in them.
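
The alternating grant between a TP and an RP contending for the same LM can be illustrated with a small sketch. It is only one possible reading of "granting their requests in an alternating manner": the class name, the method name and the choice of which side wins the very first conflict are assumptions made for illustration and do not come from the design itself.

```python
class AlternatingArbiter:
    """Hypothetical per-LM arbiter: on a conflict, serve the side not served last."""

    def __init__(self):
        # Assumed initial state: behave as if the RP was served last,
        # so the TP wins the first simultaneous request.
        self.last_granted = "RP"

    def grant(self, tp_request, rp_request):
        """Return "TP", "RP", or None for the port that gets this memory cycle."""
        if tp_request and rp_request:
            # Both sides want the LM: alternate with respect to the last grant.
            winner = "TP" if self.last_granted == "RP" else "RP"
        elif tp_request:
            winner = "TP"
        elif rp_request:
            winner = "RP"
        else:
            return None  # idle cycle, arbitration state is unchanged
        self.last_granted = winner
        return winner


arbiter = AlternatingArbiter()
# Four consecutive conflicts are served in strict alternation.
conflict_grants = [arbiter.grant(True, True) for _ in range(4)]
```

Under sustained contention neither side is delayed by more than one memory cycle in this sketch, which is the property the alternating rule is meant to provide.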

(3) Instruction Registers (IRs)

IRs serve as buffers between RPs and TPs. As has been mentioned in Section 2.A(3) and (4), IR(1) through IR(R) contain only the addresses of executable instructions while IR(R+1) through IR(T) contain the actual instructions; therefore, the buffering capacities of these two groups of IRs are different.

Associated with each IR are two single-bit flags: the "Full/Not-Full" flag which indicates the status of the IR, and the "Autonomous/Slave" flag which indicates the operating mode of the connected TP. An autonomous TP is one which is ready to accept or is currently executing instructions from IRs, while a slave TP is one which is undergoing a compound operation under the control of another processor.

To schedule an executable instruction, RP(i) will examine the flags of IR(i+n*R) in the order of increasing n, which is a non-negative integer, and the first IR which is not full and is connected to an autonomous TP will receive the instruction packet.

2.C. Switches

(1) Channel Selector (CS)

CS enables SP to select any of the TP-LM pairs to perform those activities mentioned in Section 2.A(1), namely, program loading, input and output activities and setting up of compound operations. The implementation of CS is quite straightforward and hence will not be discussed in this dissertation.
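
The IR selection rule of Section 2.B(3), in which RP(i) scans IR(i+n*R) for increasing n, can be sketched as follows. The function name, the dictionary representation of the two flags and the behaviour when no IR qualifies (returning None, so that the caller retries later) are assumptions for illustration; the text does not specify what happens in that case.

```python
def select_ir(i, R, T, full, autonomous):
    """Return the index of the IR that RP(i) should load, or None if none qualifies.

    i          -- 1-based index of the receiving processor, 1 <= i <= R
    R, T       -- number of RPs and number of IRs/TPs
    full       -- dict: IR index -> True if its Full flag is set
    autonomous -- dict: IR index -> True if the connected TP is autonomous
    """
    n = 0
    while i + n * R <= T:
        k = i + n * R
        # The first IR that is not full and feeds an autonomous TP wins.
        if not full[k] and autonomous[k]:
            return k
        n += 1
    return None  # assumed behaviour: the packet waits and the scan is repeated


# Example with R = 4 RPs and T = 12 IRs: RP(2) scans IR(2), IR(6), IR(10).
R, T = 4, 12
full = {k: False for k in range(1, T + 1)}
autonomous = {k: True for k in range(1, T + 1)}
full[2] = True          # IR(2) is occupied
autonomous[6] = False   # TP(6) is a slave in some compound operation
chosen = select_ir(2, R, T, full, autonomous)
```

Because RP(i) only ever inspects the IRs whose index is congruent to i modulo R, no two RPs compete for the same IR in this reading of the rule.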

(2) Packet Switching Network (PSN)

Other conventional packet switching networks could be used in place of PSN, but they require at least (N/2)logN switches for a (NxN) connection, whereas PSN uses only (N/2) switches; therefore, PSN is attractive when N is very large. PSN is a modified version of the Loop-Structured Switching Network (LSSN) which has been described in Chapter III, and its functions are:

(a) to deliver result packets from TPs to RPs and LMs;
(b) to perform hardware synchronization to signal the completion of array alignments.

It is the second function above which distinguishes PSN from LSSN. The topology and addressing scheme of PSN are the same as those of LSSN; but in order to perform hardware synchronization on the network, the PSN switches have to be different from the LSSN switches. Fig. IV.3 illustrates the schematic diagram of a PSN switch.

[Figure: a PSN switch. Recoverable labels: from Transmitting Processors; loop; stage; input ports; left and right buffer pools for each class; Synchronization Stations; intermediate ports; output ports; output link; to Receiving Processors.]
Fig. IV.3. The schematic diagram of a PSN switch.
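
To give a feel for the switch counts quoted at the start of this subsection, the following sketch evaluates both expressions for a given N. It assumes that the logarithm is taken to base 2 and that N is a power of two; the function names are hypothetical.

```python
def conventional_switches(n):
    """Lower bound quoted in the text for a conventional network: (N/2) * log2(N)."""
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes N is a power of two")
    log2_n = n.bit_length() - 1  # exact log2 for a power of two
    return (n // 2) * log2_n


def psn_switches(n):
    """Switch count quoted in the text for PSN: N/2."""
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes N is a power of two")
    return n // 2


# For a 1024 x 1024 connection the two counts differ by a factor of log2(1024) = 10.
conventional = conventional_switches(1024)
psn = psn_switches(1024)
```

The ratio between the two counts is log2(N), so the saving grows slowly but steadily with the size of the network.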

In general, a result packet sent out by a transmitting processor would have the packet format as follows:

<Feedback Count; Destination Address; Result Type; Result>

When a result packet enters the input port of a switch, it will be placed into the Class-i buffer inside the switch according to its feedback count i, where i is one of {0, 1, 2}; the count is set to zero when the packet is initially generated, and is incremented whenever the packet goes through the feedback path. In Fig. IV.3, all types of result packets except the Synchronization packets will bypass the Synchronization Stations when they emerge from the buffer pools. A packet coming out of the Class-2 buffer will be forwarded to the output port immediately and directly when the latter becomes empty; packets coming out of the Class-0 and Class-1 buffers will be switched to an intermediate port to wait for their turns to be transferred to the output port. For a switch located in the s-th stage, the direction of switching is determined by the s-th bit of the destination address of the packet: if it is a "0", then the packet will be switched to the left intermediate port; else to the right one.
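
The buffering and switching rules above can be sketched for ordinary (non-synchronization) result packets. The field names follow the packet format given in the text; the class and function names are hypothetical, and the bit numbering (stage s counted from zero, bit s counted from the least significant bit) is an assumption, since the numbering convention is defined in Chapter III rather than here.

```python
from dataclasses import dataclass


@dataclass
class ResultPacket:
    feedback_count: int   # 0, 1 or 2
    destination: int      # destination address
    result_type: str
    result: object


def buffer_class(packet):
    """A packet entering a switch is placed in the Class-i buffer, i = feedback count."""
    if packet.feedback_count not in (0, 1, 2):
        raise ValueError("feedback count must be 0, 1 or 2")
    return packet.feedback_count


def next_port(packet, stage):
    """Where a packet goes when it leaves the buffer pool of a switch in `stage`."""
    if buffer_class(packet) == 2:
        # Class-2 packets go straight to the output port.
        return "output"
    # Assumed numbering: bit `stage` counted from the least significant bit.
    bit = (packet.destination >> stage) & 1
    return "left" if bit == 0 else "right"


packet = ResultPacket(feedback_count=0, destination=0b0110,
                      result_type="scalar", result=3.5)
ports = [next_port(packet, s) for s in range(4)]
```

Synchronization packets are deliberately left out of this sketch, because they are held at the Synchronization Stations instead of following the rule shown here.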

Because of the similarities that exist between the topologies of PSN and LSSN, those theorems developed for LSSN are also applicable to PSN. Theorem III.1 has shown that for a network with L loops, the maximum number of switches that any packet would have to go through in order to arrive at its destination is (2logL - 1). Consider the case in which a packet is admitted at the last stage of PSN and has to go through the maximum number of switches, (2logL - 1); this packet will be removed from PSN when it emerges from a Class-2 buffer located in the (logL - 2)th stage, which is the furthest destination any packet will have to go to regardless of where it originated. The significance of this observation will become obvious when we discuss the method of hardware synchronization on PSN. Another important property of PSN, as revealed by Lemma III.2, is that any packet which has already acquired a feedback count of 2 will always remain in the same loop for any of its further routing steps; this explains why packets coming out of the Class-2 buffers in Fig. IV.3 need not go through the intermediate ports.

The purpose of the Synchronization Stations is to achieve the effect of hardware synchronization on PSN, i.e., to signal the completion of array alignment operations so that other computation dependent on these operations may proceed. After all the elements involved in an alignment operation have been dispatched to PSN, each of the first L TPs (i.e., those TPs connected to the first stage of PSN) will be requested, by either SP or the subcontroller of the alignment operation, to forward a synchronization token in the form of a result packet. These packets will be treated much the same as other result packets except that they will be retained by the Synchronization Stations when emerging from the buffer pools; a synchronization packet retained by the left (right) Class-i station would have to wait for the arrival of another synchronization packet in the right (left) station of the same class, then both packets will proceed to the intermediate and output ports in a straight-through manner.
Such a scheme would ensure that the synchronization packets will always lag behind the array elements which they are trailing, and that when these packets arrive at the Class-2 Synchronization Stations of the (logL - 2)th stage, all the array elements concerned must have been delivered to their destinations (as has been explained in the previous paragraph). Upon the arrivals of the synchronization packets, the Class-2 Synchronization Stations of the (logL - 2)th stage will transform them into signalling tokens by resetting their feedback counts to zero, and changing their result types (please refer to Table IV.6 for their formats); these signalling tokens will be retransmitted to trigger those instructions dependent on the completion of the array alignment operation, and their destination addresses are those originally carried by the synchronization packets.

When a result packet arrives at an output port of a PSN switch, its destination address will be matched against that of the output link connected to the port.
If a match occurs, then the RP connected to that link will be strobed and the result packet will be handed over to it; otherwise the packet will be forwarded to the switch situated at the other end of the link.

More details of PSN could be found in Chapter III which also explains how to expand the network incrementally, a significant advantage of PSN over other conventional networks. Although a PSN switch has a much more complex internal structure than a conventional binary switch, the savings in the number of switches as well as external wiring will offset such a disadvantage when the size of the network is large. With today's technologies, a high internal complexity could be easily achieved, but if a system involves too many external components, it will still be difficult to manage.

3. EDC Information Structure

3.A. Machine Instruction Formats

(1) Format for sequential programs: It is similar to that of conventional computer systems, and is arranged as one double-byte of opcode followed by one or more double-bytes of operands.

    [ Opcode: 16 bits ][ Operands ]

(2) Format for non-sequential programs: It is used to encode scalar and those encapsulated compound operations, and is made up of eight double-bytes which are divided into 4 fields:

    [ (a) Opcode field: 16 bits ][ (b) Control Information field: 16 bits ][ (c) Operand field and (d) Next Instruction field: 6 x 16 bits ]

The "Opcode" and "Control Information" fields are of one double-byte each, while the "Operand" and "Next Instruction" fields share the remaining six double-bytes.

(a) "Opcode" field: Tables IV.1 and 2 show the four categories of scalar and compound operations respectively, along with some typical examples and their data-flow graphs.

(b) "Control Information" field: It is further divided into five subfields:

    [ (1) Result type: 3 bits ][ (2) Format type: 4 bits ][ (3) #Operands Required: 3 bits ][ (4) #Tokens Required: 3 bits ][ (5) #Tokens To Go: 3 bits ]

The original figure also brackets these subfields into a "constant" part and a "variable" part.

(1) "Result type": Specifies whether the computed result will be of single or double precision, a numerical or boolean value, or a signalling token.

(2) "Format type": Specifies the type of format used to accommodate operands and the addresses of those instructions dependent on the current instruction.

(3) "#Operands Required": Specifies the number of operands needed by the instruction.

(4) "#Tokens Required": Equals "#Operands Required" plus the total number of signalling tokens needed.
(5) "#Tokens To Go": Equals "#Tokens Required" minus the number of tokens received. When a RP receives an operand or a signalling token, it will decrement the "#Tokens To Go" of the receiving instruction; when this value reaches zero, the receiving instruction will be placed into an instruction register (IR) to wait for execution.

(c) & (d) "Operands" and "Next Instruction" fields: The various types of formats used by scalar and compound operations are listed in Tables IV.3 and IV.4 respectively. These simple formats will meet almost all the computational needs; new formats could be added if necessary (a total of 4 bits are assigned to the "Format Type" field, which can account for 16 formats).

In Table IV.3, "Opi" refers to the i-th operand of an instruction, "Nextj" refers to the address of the j-th next instruction, and "NextT" and "NextF" are the addresses of the next instructions when the result of a boolean operation is "True" and "False" respectively. Format No.8 is useful for operations such as "Duplicate" and "Wait" which do not carry embedded operands.

In Table IV.4, "No. of elements" refers to the total number of array elements involved in the array operation, and "Stride" is the difference in the indexes of two neighboring array elements which take part in the operation. Both the "No. of elements" and "Stride" are obtained from loop control statements such as "DO I=1,64,2" or "FOR I=1 to 64 step 2 DO".
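The "Control Information" subfields and the "#Tokens To Go" firing rule of Section 3.A can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the placement of the first subfield in the high-order bits is an assumption, since the text gives the subfield widths but not the bit order.

```python
# Subfield widths in bits, in the order listed in the text:
# result type, format type, #operands required, #tokens required, #tokens to go.
WIDTHS = (3, 4, 3, 3, 3)  # sums to 16 bits, i.e. one double-byte


def pack_control(result_type, format_type, n_operands, n_tokens, tokens_to_go):
    """Pack the five subfields into one 16-bit word.

    Assumption: the first subfield occupies the high-order bits.
    """
    word = 0
    values = (result_type, format_type, n_operands, n_tokens, tokens_to_go)
    for value, width in zip(values, WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_control(word):
    """Inverse of pack_control: return the five subfields as a tuple."""
    fields = []
    for width in reversed(WIDTHS):
        fields.append(word & ((1 << width) - 1))
        word >>= width
    return tuple(reversed(fields))


def receive_token(word):
    """Model an RP receiving one operand or signalling token.

    Decrements "#Tokens To Go" and reports whether the instruction is now
    ready to be placed into an instruction register.
    """
    result_type, format_type, n_operands, n_tokens, to_go = unpack_control(word)
    if to_go == 0:
        raise ValueError("instruction has already received all its tokens")
    to_go -= 1
    return pack_control(result_type, format_type, n_operands, n_tokens, to_go), to_go == 0
```

For an instruction that needs two operands and one signalling token, `receive_token` reports readiness only on the third call, which mirrors the rule that the instruction moves to an IR when the count reaches zero.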
In Table IV.4, "(V1)" is the base address assigned to the resulting vector V1, and "(V2)" and "(V3)" are those of the input vectors V2 and V3, respectively. All compound operations except the "Reduction" ones produce vectors which are too expensive (in terms of time and space) to be sent to each and every instruction requiring them; therefore, only the base addresses of the vectors will be sent. As for "Reduction" operations such as summation and product, their scalar results are treated much the same as those produced by scalar operations.

Although the formats shown in Tables IV.3 and IV.4 have limited numbers of "Next Instruction" fields, their actual fan-outs can be extended indefinitely by having one or more of their "Next Instruction" fields point to a number of "Duplicate" operators.

3.B. Packet Formats

There are two classes of packets that exist in EDC, namely,

(a) Instruction packets: They flow from RPs to TPs and reside in IRs while waiting for execution. (Please refer to Table IV.5.)
(b) Result packets: They are produced by TPs and are forwarded to RPs via the network PSN. (Please refer to Table IV.6.)

3.C. Program Organization

The EDC program organization is similar to those of the existing computer systems.
Both the application and system software are made up of three types of program components:

(1) Main programs: They are activated via external means such as the console and are not to be called by other program components.

(2) Procedures: They are activated by explicit calls from the program components. The calling programs use "Call" and "Distribute" operators and the called programs use "Distribute" and "Return" operators for parameter passing. As depicted in Fig.IV.4, when the "Call" instruction has gathered all its input tokens, it will be dispatched by a RP to a free TP which will then request the program code from SP. If the program code does not exist in LMs, then SP will allocate free memory pages to it and load it from SM to LMs, and its starting memory location will be returned to the requesting TP, which will then proceed with other computations. When the "Return" operator is executed, all the computed results will be routed back to the calling program and the memory pages assigned to the called program will be released.

[Fig.IV.4: diagram showing CALL (P; a, b; M.j) and DISTRIBUTE (M.j; c) in the calling program M, and DISTRIBUTE (a, b, M.j) and RETURN (c; M.j) in the called program P.]

Fig.IV.4 Parameter passing between the calling program M and called program P. "a" and "b" are the input parameters, "c" is the returned result, and "j" is the return address.
(3) Task programs: They are used to protect shared data and/or physical resources so as to ensure their proper use. A task program consists of one or more entry points whereby other programs can send data or signals to it, and therefore it is a means of providing communications and interactions among the various types of program components. The implementation of parameter passing between a task and the calling programs is very much the same as that of procedure calling; the major difference is that a procedure is activated by an explicit call while a task program is activated when the program which declares it comes into existence. Also, a procedure terminates when the computed results are returned to the callers, whereas a task program may continue to serve other callers until an explicit termination statement is encountered, or the program which declared it has terminated.

e.g.
    Task t;
      Accept A(x: Real) Return (y: Real);
        ...
      End A;
    End t;

[Fig.IV.5: diagram showing the callers, the task t, and the execution of entry A.]

Fig.IV.5. The interactions between calling programs and a task program.

The example of Fig.IV.5 shows a task with a single execution path; but in general, a task could be "multi-threaded", i.e., made up of several concurrent execution paths.
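As a loose software analogy for the single-path task of Fig.IV.5, the sketch below models a task with one entry point that keeps serving callers until it is explicitly terminated. It is not the EDC mechanism itself: the class name `Task`, the method names and the use of Python threads and queues are all assumptions made for illustration.

```python
import queue
import threading


class Task:
    """Hypothetical stand-in for a task program with a single entry point A."""

    def __init__(self, body):
        self._body = body                  # computation performed by entry A
        self._requests = queue.Queue()     # callers' requests wait here
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()               # the task exists as soon as it is declared

    def _serve(self):
        # Single execution path: accept one call at a time, in arrival order.
        while True:
            item = self._requests.get()
            if item is None:               # explicit termination request
                break
            x, reply = item
            reply.put(self._body(x))       # return the result to that caller

    def call_a(self, x):
        """Call entry A with input x and wait for the returned value y."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((x, reply))
        return reply.get()

    def terminate(self):
        """Stop serving callers (the explicit termination statement)."""
        self._requests.put(None)
        self._thread.join()
```

Unlike a procedure, the object keeps serving further calls after each result is returned; it stops only when `terminate` is invoked.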
The advantages of using task programs instead of low-level concurrency primitives such as semaphores [42] in handling inter-program activities are ease of use and clarity. Furthermore, the implementation of tasks conforms to the principle of data-driven computation. Compared to other high-level constructs, a task is quite different from the "monitor" of Concurrent Pascal [38] but very similar to the "task" of Ada [44].

3.D. Data Structure

We only discuss arrays in this paper, although some other more complicated structures [49,68] may also be considered in our design. The handling of arrays in an EDC is illustrated in Fig.IV.6.

[Fig.IV.6: diagram of the System Memory and the local memories.]

Fig.IV.6. The physical and logical arrangements of EDC memory system. The first R LMs are logically divided into pages as shown.

LM(1) to LM(R) are used to store skewed arrays so that they may be processed concurrently by TP(1) through TP(R). However, for reasons of efficiency or algorithmic constraints, an array may not be skewed but instead will be either loaded entirely into a local memory LM(k) and processed by TP(k), or divided among several TP(k)-LM(k) pairs where k>R. The decisions concerning these arrangements could be made either statically at compile time, or dynamically by SP at run time.
The array description table (ADT) and storage utilization table (SUT) must always be updated to reflect the storage patterns.

3.E. Process and Resource Management

In the EDC environment, a process is defined as either a main or task program in execution, and those procedures called and data structures owned by the program are regarded as parts of the process. The treatment of process creations and terminations is very similar to that of procedure calls: the request to create a process will first be placed on the Request List (RL) until it is removed by SP, which will then assign an unused identification number (ID) from the linkage information table (LIT) and free memory pages from the storage utilization table (SUT) to the process; SP will then load the allocated memories with the program code and initialize it to run. When the process terminates, SP will again update SUT and LIT accordingly.

As illustrated in Fig.IV.7, the management of hardware and/or software resources could be implemented conveniently using a task program.
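A resource manager of the kind shown in Fig.IV.7 can be approximated in conventional software by the minimal sketch below. The lock plays the role of the single signalling token that admits one request at a time into the critical region, and the counter corresponds to the number N of unused resources. The class name, the method names and the choice to refuse an "Acquire" request when no resource is free are assumptions; the text does not specify that case.

```python
import threading


class ResourceManager:
    """Hypothetical counterpart of the task-based resource manager."""

    def __init__(self, n):
        self._n = n                       # number of unused resources (N)
        self._token = threading.Lock()    # stands in for the signalling token

    def acquire(self):
        """Handle an "Acquire" request; return True if a resource was granted."""
        with self._token:                 # only one request inside at a time
            if self._n == 0:              # assumption: refuse instead of waiting
                return False
            self._n -= 1                  # N := N - 1
            return True

    def release(self):
        """Handle a "Release" request."""
        with self._token:
            self._n += 1                  # N := N + 1

    @property
    def free(self):
        """Current number of unused resources."""
        with self._token:
            return self._n
```

Requests that arrive while the lock is held simply wait for it; the order in which waiting requests are admitted depends on the implementation.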
The number of unused resources of a particular type (e.g., the number "N" of Fig.IV.7) is stored in a memory location which can only be accessed from within the critical region enclosed by the "Select" and "End Select" operators. In order to prevent malicious accesses to that memory location, only one request at a time is allowed to enter the critical region to modify the number of the resources, and this is achieved with the use of a signalling token as shown. The content of that memory location is incremented whenever a "Release" request is honored and decremented whenever an "Acquire" request is granted.

[Fig.IV.7: diagram of a task program in which "Acquire" and "Release" requests from processes, together with a signalling token, enter a SELECT ... END SELECT region; the (Acquire) branch performs N:=N-1 and the (Release) branch performs N:=N+1, and "Acquired" and "Released" acknowledgements are returned to the processes.]

Fig.IV.7. The implementation of a resource manager using a task program.

The "Select" operator used in the resource manager of Fig.IV.7 does not conform faithfully to the data-driven principles, because its execution is triggered by the arrival of the signalling token plus at least one request, not necessarily all of them. If there are several requests at the same time, then they will be enqueued when they arrive, and the selection policy for these requests could be either first-come-first-served or priority-based, depending on the implementation.

4. EDC Programming Language Structure

A program that runs on EDC takes the form of either a main program, a procedure or a task program, and it is composed of one or more "program blocks", which are collections of instructions that have no branching into or out of the blocks except at the beginnings and endings. The advantages of using blocks are program clarity and the fact that existing techniques of optimizing compilers can be used.

The objective of this section is to present some useful ideas concerning the design of the EDC programming language, and to illustrate how some language constructs are compiled into data-flow graphs which can be easily translated into machine code using the formats presented in Section 3.

4.A. EDC Statements and Program Blocks

(1) Declaration statements: Most of them are used to assist the compiler in setting up the data-flow graphs and are not translated into executable operations. Exceptions are the declarations of task programs and arrays, which will be compiled into operations that request SP for process creations and memory space respectively.
(2) Assignment statements: In a conventional sequential program, variable names can be used repeatedly to represent different entities in different parts of the program without causing much confusion; however, such a "convenience" would often lead to obscurities in a concurrent environment. In the EDC system, the Single Assignment Rule (SAR) is used to avoid such confusion whenever necessary. The SAR simply states that a variable name must not be assigned more than one value within its scope; when applied to data-flow graphs, it means that each arc of the graphs can have at most one source of origin.

(3) "Begin/End" block: The "Begin" and "End" statements will be compiled into "Wait" operators as demonstrated in Fig.IV.8. The "Wait" operator is a means of imposing dependencies among program blocks so as to achieve the desired sequentiality not explicitly expressed by their data dependencies.

e.g.
    Begin
      ...
    End;

[Fig.IV.8: diagram in which signalling token(s) and operand token(s) enter a "Begin: Wait" operator, followed by the program graph and an "End: Wait" operator.]

Fig.IV.8 A "Begin/End" block and its data-flow graph.

(4) "IF" block: It will be compiled into a boolean plus a number of "Switch"ing operators which are used to direct the flow of input operands into either the "IF" or "ELSE" part of the program. Sometimes certain emanating arcs have to be "grounded" in order to discard the unused operands after the conditional test.

e.g.
    IF (C1 ≤ C2) THEN
      ...
    ELSE
      ...
    END;

Fig.IV.9 An "IF" block and its data-flow graph.

(5) "Match" block: It will be compiled into a series of boolean and "Switch"ing operators as shown in Fig.IV.10. The input operand "C" will be matched against all the comparands "C1, C2, ..." in parallel, and the "Switch"ing operators will direct the operands to the part of the program which has a successful match. An "Else" part should be provided in case all the matches fail.

e.g.
    MATCH (C)
      CASE (C1) DO ...
      CASE (C2) DO ...
      ELSE ...
    END;

Fig.IV.10. A "Match" block and its data-flow graph.

(6) "Loop" block: An EDC loop is different from the "For-all" and "Do-all" loops proposed in other data-driven languages [40]. Since parallel array computations in EDC are encoded as compound operations, it is not necessary to use "Loop"s for these computations; instead, loops are used for iterations and recurrence operations which exhibit data dependencies between two successive loop computations.

e.g.
    LOOP WHILE (C1-C2) DO
      ...
      NEXT X := ...
      NEXT C1 := ...
      NEXT C2 := ...
      ...
    LOOP EXIT
      XLAST := X;
    END;

Fig.IV.11. A "LOOP" block and its data-flow graph.
As exemplified by Fig.IV.11, the data-flow graph of an EDC loop contains some arcs which have two sources, one outside the loop and another within the loop body, implying that some variables are assigned more than once and thus violate the Single Assignment Rule. A common remedy [39,65] is to place prefixes such as "NEW" and "NEXT" in front of the variable names in question. Thus, "NEXT X" is treated differently from "X" during a loop computation, but "NEXT X" will be updated as "X" at the boundary of two consecutive loop computations, and "NEXT X" will be an entirely new variable when the next loop computation commences. When the conditional test associated with a loop is satisfied, the loop will be exited and some of the computed results will be passed to the exterior of the loop by assigning them to names that are not used within the loop (e.g. "XLAST" of Fig.IV.11).

(7) Prefix for sequentiality: As has been mentioned before, certain activities such as input and output are inherently sequential, and it is more convenient and efficient to execute their instructions in the order specified by the programs; the "SEQ"uential prefix is meant for such purposes. If an instruction block is prefixed with "SEQ", then signalling tokens are used to enhance its data-flow graph to achieve the desired sequentiality. If an entire program is prefixed with "SEQ", then it will be compiled into a sequential program using the format mentioned in Section 3.A(1).

4.B. Language Constructs for Array Processing

Array operations are probably the richest source of synchronous parallelism and abound in scientific computations. This section discusses how one-dimensional arrays are encoded and executed in EDC; arrays with a higher dimension will be reduced into one-dimensional arrays before their computations.

(1) Parallel Vector Operations: The range and stride of a parallel vector operation are indicated with the use of an "INDEX" set, which has to be declared prior to its use as follows:

e.g.
    DECLARE I: INDEX 1..64 STRIDE 2
    BEGIN
      C(I) := A(I) + B(I);
    END;

[Fig.IV.12, data-flow graph: inputs (A), (B) and I feed the (ADD) operator, which produces (C).]

[Fig.IV.12, machine code:]
    | Opcode (ADD) | Control Infor. | #Elements = 64 | Stride = 2 | Address of Output Array = (C) | Address of Input Array = (A) | Address of Input Array = (B) | Next2 | Next1 |

Fig.IV.12. The statement, data-flow graph and machine code of a parallel vector operation. The machine code format is as described in Table IV.4.

Both the range and stride indicated in an index are regarded as input operands to the array operation, and they can be either constants or variables to be determined at run time. The base addresses of the input arrays (i.e., (A) and (B) of Fig.IV.12) are received from the preceding operations, and that of the output array (i.e., (C) of Fig.IV.12) is obtained from the memory manager prior to the execution of the operation, and is sent to the succeeding operations as an input operand. The execution of such operations has been described in Section 2.A(1) and (3).

(2) Reduction Operations: This is another type of array operation frequently encountered, and there are six of them, namely, "SUMmation", "PRODUCT", "MAXimum", "MINimum", "AND" and "OR".

e.g.
    DECLARE J: INDEX 1..1024
    BEGIN
      xsum := SUM(x(J));
    END;

[Fig.IV.13, data-flow graph: input (X) feeds the (SUM) operator, which produces XSUM.]

[Fig.IV.13, machine code:]
    | Opcode (SUM) | Control Infor. | #Elements = 1024 | Stride = 1 | Address of Input Array = (X) | Next4 | Next3 | Next2 | Next1 |

Fig.IV.13. The statements, data-flow graph and machine code of a reduction operation. The machine code format is as described in Table IV.4.

When an array is to be reduced, its elements will be partitioned and loaded into several LM(k)'s, where k>R, and each part of it will be processed by the associated TP independently; at the end of the computation, each of these TPs will forward its partial result, in the form of a result packet, via the network to an instruction which will combine all the partial results. The number of TP-LM pairs used for a reduction operation depends on the size of the array and the speeds of the various hardware and software modules, and has to be optimized in order to obtain the shortest possible computation time.
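The partitioned evaluation of a reduction operation described above can be sketched as follows. The sketch runs sequentially and only mirrors the data movement: each chunk stands for the part of the array held by one TP-LM pair, and the final step stands for the instruction that combines the partial results. The function names and the split into contiguous, nearly equal chunks are assumptions; the text does not say how the elements are partitioned.

```python
from functools import reduce
import operator

# The six reduction operators named in the text, as binary Python functions.
REDUCERS = {
    "SUM": operator.add,
    "PRODUCT": operator.mul,
    "MAX": max,
    "MIN": min,
    "AND": operator.and_,
    "OR": operator.or_,
}


def partition(values, parts):
    """Split values into `parts` contiguous chunks of nearly equal length."""
    n = len(values)
    bounds = [n * i // parts for i in range(parts + 1)]
    return [values[bounds[i]:bounds[i + 1]] for i in range(parts)]


def reduce_array(op, values, parts):
    """Reduce a non-empty array via per-chunk partial results."""
    combine = REDUCERS[op]
    # Each non-empty chunk yields one partial result (one result packet).
    partials = [reduce(combine, chunk) for chunk in partition(values, parts) if chunk]
    # A single combining step merges all the partial results.
    return reduce(combine, partials)
```

For instance, `reduce_array("SUM", list(range(1, 1025)), 4)` sums four chunks of 256 elements each and then adds the four partial sums. Varying `parts` corresponds to varying the number of TP-LM pairs, which the text notes must be tuned for the shortest computation time.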
(3) Alignment operations: "SHIFT" and "ROTATE" are two alignment primitives: the "ROTATE" operator moves the array elements cyclically by the amount specified after the operator, while the "SHIFT" operator functions in a similar way except that there is no cyclic feedback of array elements, and zeroes are inserted into the positions vacated by the shifting operations. The direction of an alignment could be fixed arbitrarily; and if none of the operators is specified, then the "SHIFT" operation will be assumed.

e.g.
    x := a(K SHIFT -1);
    y := b(I ROTATE 2);
    z := c(J-1);

[Fig.IV.14, data-flow graph: inputs (A) and -1 feed the SHIFT operator, which produces (X).]

[Fig.IV.14, machine code:]
    | Opcode SHIFT | Control Infor. | #Elements = 1024 | Stride = 1 | Address of Output Array = (X) | Address of Input Array = (A) | Displacements = -1 | Next2, Next1 |

Fig.IV.14. The statements of some alignment operations, and the data-flow graph and machine code of the "SHIFT" operation.

5. Performance Analysis

5.A. Flow Analysis of EDC

In order to simplify the performance analysis of EDC and to arrive at some meaningful results, some assumptions will be made; later on, the justification of these assumptions will be discussed.
(1) Assumptions:

(a) The problems to be processed by EDC comprise a large amount of concurrency which will keep all the TPs busy most of the time;
(b) Initially all the computations are assumed to be of scalar type; compound operations will be ignored temporarily;
(c) The scalar operations are randomly distributed among the first R LMs, thus giving rise to approximately equal packet flow among the output ports of the network PSN.

(2) Constraints On Communications Loads: As stated in Section 2.C(2), the maximum throughput rate (MATR) that can be delivered by a PSN with L loops is:

$$\mathrm{MATR}(L) = \frac{3}{2}\, S_{R,SW}\, \frac{L^{2}\log L}{3L\log L - L + 4} \quad \text{(packets/second)} \qquad \text{(IV.1)}$$

(3) Constraints On Processing Loads: In order to prevent the instruction registers (IRs) from overflowing, the total processing speed of TPs must exceed that of RPs, i.e.,

$$T \times S_{I,TP} > R \times S_{I,RP}, \qquad T > \left(S_{I,RP} / S_{I,TP}\right) \times R \qquad \text{(IV.2)}$$

where T and R are the numbers of TPs and RPs respectively, $S_{I,RP}$ is the average rate of producing instruction packets by a RP, and $S_{I,TP}$ is the average rate of consuming instruction packets by a TP. For a PSN with L loops, we can connect up to a maximum of N = LlogL pairs of TPs and RPs.
If all the input ports are connected with TPs, then T = N = LlogL, and from expression (IV.2),

$$R < \left(S_{I,TP} / S_{I,RP}\right) \times L \times \log L \qquad \text{(IV.3)}$$

If each RP is capable of accepting $S_{R,RP}$ result packets from the network per second, then the maximum acceptance rate of result packets (MARP) by RPs per second is:

$$\mathrm{MARP} = S_{R,RP} \times R$$

Most of the scalar instructions will be compiled into binary operations, so that on the average the acceptance of every two result packets will cause one instruction to be readied for execution, i.e.,

$$S_{I,RP} = S_{R,RP} / 2 \qquad \text{(IV.4)}$$

If RPs are connected to all the output ports of the network, then R = LlogL and the maximum acceptance rate of result packets of such a fully connected configuration ($\mathrm{MARP}_f$) is:

$$\mathrm{MARP}_f = R \times S_{R,RP} = (L \log L) \times S_{R,RP} \qquad \text{(IV.5)}$$

But the value of R is constrained by expression (IV.3); therefore, the maximum acceptance rate subject to such a constraint is:

$$\mathrm{MARP}_c < \left(S_{I,TP} / S_{I,RP}\right) \times L \times \log L \times S_{R,RP}$$

Substituting $S_{I,RP} = S_{R,RP}/2$ into the above expression,

$$\mathrm{MARP}_{c,max} = 2L \times \log L \times S_{I,TP} \qquad \text{(IV.6)}$$

Expressions (IV.1), (IV.5) and (IV.6) show that the EDC performance depends on the network size L and the speeds of the TPs, RPs and switches.

5.B. Example:

Let us consider some typical values for the speeds of the various hardware modules, and then examine the EDC performance as a function of the network size, L.
We will assume that each TP and RP is capable of 3 MOPS (million operations per second) on the average; this assumption is fair and has also appeared in other studies (for instance, see reference [30]). Since most of the scalar instructions require two result packets as their input operands, approximately half of the result packets coming out of PSN will not trigger their receiving instructions for execution; to process such a result packet, a RP would need one operation to retrieve the token count (i.e., "#Tokens To Go") of the receiving instruction, one operation to decrement it, one operation to store it back and another one to store the result token, a total of four operations. The other half of the result packets would ready the receiving instructions for execution; to process such a result packet, a RP would require eight more operations to transfer an 8-byte instruction word from its associated LM to an IR, in addition to the four operations mentioned above. Therefore, the average number of operations needed to process a result packet is (4+(8+4))/2 = 8, and hence the value of $S_{R,RP}$ equals $(3\times10^{6})/8$ packets per second.
As for TPs, let us assume that the total number of operations needed for a TP to fetch an instruction from an IR, execute it, package the result into a result packet, and then forward it to the network, is around 40 (one might try other values); therefore $S_{I,TP} = (3 \times 10^6)/40$ packets per second. As for the speed of the PSN switches, $S_{R,SW}$ equals $f / t_{min}$, where $f$ is the clocking frequency of the network and $t_{min}$ is the minimum number of clock pulses needed to transfer a packet from the output port of a switch to the input port of the next switch. In this example, $t_{min}$ is assumed to be 10 while $f$ is taken to be 40 MHz; therefore, the value of $S_{R,SW}$ equals $40 \times 10^6 / 10 = 4 \times 10^6$ packets per second.

The MATR, $\mathrm{MARP}_f$ and $\mathrm{MARP}_{c,max}$ curves of this example are plotted against the network size L, and are shown in Fig.IV.15.

Fig.IV.15. The $\mathrm{MARP}_f$, MATR and $\mathrm{MARP}_{c,max}$ curves of the given example.

In this example, the maximum throughput rate attainable by the EDC is limited by the $\mathrm{MARP}_{c,max}$ curve, which is the lowest among the three curves. Two observations could be obtained from Fig.IV.15:

(a) The relative positions of the $\mathrm{MARP}_f$ and $\mathrm{MARP}_{c,max}$ curves suggest that in this example, it is not necessary to connect all the output ports of the PSN with RPs.
For L = 64, $T = L \log L = 64 \times 6 = 384$; from expression (IV.2), with $S_{I,RP} = S_{R,RP}/2$:

$$R = \frac{S_{I,TP}}{S_{I,RP}} \times T = \frac{3 \times 10^6 / 40}{3 \times 10^6 / 8 / 2} \times 384 \approx 154$$

(b) The relative positions of the MATR and $\mathrm{MARP}_{c,max}$ curves indicate that for most of the time, the capacity of the PSN will be higher than that required, and hence the extent and probability of traffic congestion in the PSN are expected to be low.

If there are always computations to keep all the TPs busy, then the raw speed attainable by the EDC is:

$$T \times 3 \times 10^6 = 384 \times 3 \times 10^6 \text{ operations per second} = 1152 \text{ MOPS}$$

From Fig.IV.15, the maximum rate of flow of result packets in the system is:

$$\mathrm{MARP}_{c,max}(L = 64) = 57.6 \times 10^6 \text{ (packets/second)}$$

and the maximum rate of flow of instruction packets is:

$$R \times S_{I,RP} = R \times S_{R,RP}/2 = 154 \times (3 \times 10^6)/8/2 = 28.9 \times 10^6 \text{ (packets/second)}$$

The curves of Fig.IV.15 are useful in estimating the size of the EDC architecture for a desired speed, and they also indicate which part of the architecture will be the most likely performance bottleneck after the system has been built.

5.C. Considerations for Generalized Computations:

If the values of $S_{I,RP}$, $S_{I,TP}$ and $S_{R,SW}$ are different from those in the previous example, then, as expected, different throughput curves and conclusions would be obtained. In general, computations in the EDC will be made up of both scalar and compound operations mixed in an unknown ratio; therefore, assumption (b) has to be removed for generality.
The result of such a removal would give rise to a better performance because the execution of compound operations (i.e., array operations and sequential programs) is control-driven, requires simpler control structures and incurs less communication overhead than scalar operations, which are data-driven. Assumptions (a) and (c) are justifiable since the EDC is intended for applications involving large amounts of concurrency, and the interleaving of instructions and skewing of arrays would spread the programs randomly and evenly among the TP-LM pairs.

6. Discussions and Outlook

A design methodology has been proposed for a class of next-generation supercomputers. Our proposal, which is named the Event-Driven Computer (EDC), is primarily a data-driven system supplemented with control-driven activities. Although we do not emphasize the immediate implementation of the EDC, most of its hardware is implementable with off-the-shelf components, except the PSN switches which, however, could be easily fabricated with today's technologies.
If the PSN (Packet Switching Network) consists of 64 loops, then approximately 400 TPs (Transmitting Processors) can be attached to the system, and the maximum rates of flow of result and instruction packets will be approximately 58 and 30 million per second, respectively; and the raw speed attainable by the EDC will exceed 1,000 MOPS. Since the proposed EDC language structure is similar to the existing ones [39,41,44,69], many of the techniques available today could be applied to its compiler design. Compared to the Dependence-Driven System [32], the resources of the EDC are better utilized: when a TP is not involved in an array operation, it could always get a scalar instruction from its associated IR (instruction register) and execute it. The array processing capabilities of the EDC also distinguish it from the Combined System [33]. The speed range of the EDC is expected to be many times higher than that of the PDF System [34]. Because only a few component types are used, the design costs of the EDC are expected to be low. Since most of the programs are randomly distributed among the LMs, thus equalizing the memory access load, the serious problem of memory bottlenecks could be reduced.
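The 64-loop figures quoted in this section follow from the module speeds assumed in the example of Section 5.B. The Python sketch below re-derives them under those assumptions; the function and key names are hypothetical, the number of RPs is rounded up to a whole processor (which is how the value 154 arises), and nothing here models congestion or program behaviour.

```python
import math


def edc_example(loops=64, proc_ops=3e6, tp_ops_per_instruction=40,
                rp_ops_per_result=8, clock_hz=40e6, t_min=10):
    """Re-derive the Section 5.B example figures (illustrative sketch only)."""
    stages = loops.bit_length() - 1             # log2(L), L a power of two
    ports = loops * stages                      # T = N = L log L
    s_i_tp = proc_ops / tp_ops_per_instruction  # instructions/s per TP
    s_r_rp = proc_ops / rp_ops_per_result       # result packets/s per RP
    s_i_rp = s_r_rp / 2                         # expression (IV.4)
    s_r_sw = clock_hz / t_min                   # packets/s per switch link
    # RPs needed to keep up with T transmitting processors, rounded up.
    rps_needed = math.ceil(ports * s_i_tp / s_i_rp)
    return {
        "ports": ports,
        "s_r_sw": s_r_sw,
        "rps_needed": rps_needed,
        "result_packet_rate": 2 * ports * s_i_tp,        # expression (IV.6)
        "instruction_packet_rate": rps_needed * s_i_rp,  # R x S_I,RP
        "raw_mops": ports * proc_ops / 1e6,
    }


figures = edc_example()
for name, value in figures.items():
    print(f"{name}: {value:,.0f}")
```

Calling `edc_example(loops=128)` or changing `tp_ops_per_instruction` shows how the estimates scale when the assumed module speeds or the network size differ from those of the example.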
As for array processing capabilities, array computation could be carried out by a subset of the first R TP-LM pairs using the read/write links provided; and the alignments of arrays could be performed using the PSN, which provides a novel synchronization method to indicate the end of each array alignment.

Designing a supercomputer is not a simple task; we have presented some architectural ideas, but there are still several issues which deserve immediate attention before the EDC concepts could become practical. Firstly, the detailed specifications of the EDC hardware have to be developed, and the division of labor between the compiler and hardware has to be clarified at the outset. Secondly, it is necessary to analyse the effects of run-time overhead (such as program loading) on the system performance, and remedies (such as increasing the size of the LMs) have to be provided if the effects are severe. Thirdly, if several independent processes are run simultaneously, then an identification method is necessary (unique identification tags are often proposed for other data-driven systems [48]).
Fourthly, the policies used by the SP to schedule compound operations and processes play an important role in providing the processors with enough operations to keep them busy; it is not certain whether such policies exist, or whether those found in the literature could be useful in this respect. Fifthly, we have suggested that array computation be encoded as compound operations, but perhaps those operations on small arrays could be decomposed into scalar operations so that the load submitted to the SP could be reduced; the criteria for such decompositions constitute another area of further study. Sixthly, it would be convenient to the users if more data structures, such as records and lists, were provided. Lastly, although studies on fault tolerance in packet communication architectures have been found in the literature [45], a specific study concerning the EDC in this respect is indispensable. We feel that only after the above issues have been adequately dealt with can a supercomputer be built along the lines set forth for the EDC.

Table III.1 - Scalar operations.

Scalar operation                 Typical examples*
1. Arithmetic and logic          ADD, MUL, OR
2. Boolean                       EQUAL?
3. Data transfer and control     SWITCH, DUPLICATE, WAIT
4. Procedural calls              CALL, RETURN

[Data-flow graph column: graphical, not reproduced.]

Table III.2 - Compound operations.

Compound operation               Typical examples*
1. Vector arithmetic and logic   (ADD), (MUL)
2. Reduction                     (SUM), (PRODUCT), (MAX), (MIN)
3. Vector boolean                (EQUAL?)
4. Alignment                     (RIGHT SHIFT), (LEFT ROTATE)

[Data-flow graph column: graphical, not reproduced.]

* See Section 4 on the use of these operators.

Table III.3 - The "Operand/Next instructions" fields of scalar operations.

Computation                      Format No.   Operands/Next instructions
Scalar Arithmetic, Logic         1            Op1, Op2, Op3, Op4, Next2, Next1
and Procedure Call               2            Op1, Op2, Next2, Next1
                                 3            Op1, Op2, Next4, Next3, Next2, Next1
                                 4            Op1, Next4, Next3, Next2, Next1
                                 5            Op1, Next5, Next4, Next3, Next2, Next1
Boolean                          6            Op, NextT, NextF
                                 7            Op1, Op2, NextT, NextT, NextF, NextF
Data Transfer & Control          8            Next6, Next5, Next4, Next3, Next2, Next1

Field widths: six 16-bit fields per instruction word.

Table III.4 - The "Operand/Next instructions" fields of compound operations.

Computation                      Format No.   Operands/Next instructions
Vector Arithmetic and Logic      9            No. of elements, Stride, (V1), (V2), (V3), Next2, Next1
Vector Boolean                   10           No. of elements, Stride, (V1), (V2), (V3), NextT, NextF
Alignments                       11           No. of elements, Stride, (V1), (V2), Displacement, Next2, Next1
Reduction                        12           No. of elements, Stride, (V2), Next4, Next3, Next2, Next1

Field widths: 8 bits, 8 bits, then five 16-bit fields.

Table III.5 - The formats of instruction packets.

Packet Content                   Packet Format
a. Instruction Address           <Address of instruction>
b. Actual Instruction word       <Opcode; result & format types; operands; Next instruction addresses>

Table III.6 - The formats of result packets.

Result Type                                  Packet Format
a. Scalar Operand                            <Feedback Count; Destination Address; Result Type; Result>
b. Array Element being aligned               <Feedback Count; Destination Address; Result Type; Result>
c. Base Address assigned to an array         <Feedback Count; Destination Address; Result Type; Result>
d. Signalling Token                          <Feedback Count; Destination Address; Result Type>
e. Synchronization Token                     <Feedback Count; Destination Address; Result Type>

Chapter V. Conclusions

1. Summary of Results

In Chapter II, we have presented a re-circulating systolic sorter (RSS) and two algorithms which work on the RSS. The correctness of the algorithms has been proved and general operational constraints have been derived. This design is highly amenable to VLSI implementations due to the following attributes: (1) the simple control structure required by the algorithms; (2) the regular, repetitive and near-neighbour type of interconnections among the comparators; and (3) the systolic data movements.
The sorting array is also well-suited for fabrication on shift-register types of storage and logic devices such as magnetic bubble memories (MBMs) and charge-coupled devices (CCDs), because of its closed-loop structure. The number of quadruple comparators needed to sort N items is N/4, and the average number of sorting cycles, as found by our simulation studies, is within the range $[(\log N)^2, N]$. A hardware termination method is incorporated into the control unit of the sorter, so that the sorting process can be terminated as soon as the input list is in the desired order.

Chapter III describes a novel loop-structured switching network (LSSN) intended for packet communications in highly parallel applications. With L loops, it can connect up to $N = L \log L$ pairs of transmitting and receiving devices, using only N/2 two-by-two switching elements. Therefore, it is very cost-effective in terms of its component count. Its topology resembles that of the indirect binary n-cube network [21], but a much higher device-to-switch ratio can be achieved by the LSSN because all the links between the switches could be used as both transmitting and receiving stations.
It has the advantage of incremental extensibility, and it is free of the store-and-forward type of deadlocks which prevail in other cyclical packet-switched networks. Our simulation studies have shown that the average throughput rate and delay of the LSSN are close to those of other designs despite its relatively low component count.

Chapter IV describes a new design methodology for next-generation computers. Our proposal, the Event-Driven Computer (EDC), is primarily a data-driven, heterogeneous system which is supplemented with control-driven activities; such a combined approach is aimed at extracting the advantages of both the "pure" data-driven and control-driven systems while alleviating their shortcomings. Compared to other designs, the EDC has the advantages of a simpler architecture, better resource utilization, array processing capabilities and a higher speed range. The LSSN of Chapter III has been modified for this application; with a configuration of 64 loops, this network can connect up to approximately 400 processors, and hence an execution speed of more than 1,000 million operations per second can be obtained by the EDC.

2. General Discussions

The main theme of this thesis is to demonstrate the practicality and usefulness of cyclical architectures in the designs of high-performance processors and computers.
Our ideas have been illustrated through the use of specific application examples including parallel sorting, packet-switched communications and the design methodology of a novel, next-generation computer. The ideas of feedback in our proposals are entirely different from those of process control, which uses feedback signals for correctional purposes (i.e., additive or multiplicative manipulations of the input signals). In the RSS arrays, feedback allows data items to be further compared among themselves until the whole input list is sorted. In the LSSN, the sole purpose of feedback is to re-use the network resources until the packets are routed to their destinations, but there is no direct interaction (such as the comparisons in the RSS arrays) among the information packets, other than competition for the network resources. In the EDC, the arrivals of result packets in the feedback path signify the completion of one or more instruction cycles, and as a result, new instructions may or may not be brought into the computation path for execution, depending on the amount of information they have gathered.
The manner in which feedback packets interact with each other contributes greatly to the properties of the cyclical architectures. For instance, the LSSN is susceptible to the store-and-forward type of deadlocks (we have, however, demonstrated how this problem can be solved), but the deadlock problem does not exist in the RSS, because data movements in the RSS network take place along specific paths and there is no data path conflict. In the case of the EDC, if there are always memory locations available in the Local Memories to await the result packets coming out of the network (i.e., if the programs are correctly written, compiled and loaded), then the EDC system should be deadlock-free.
Another property of packet-switched, cyclical architectures is their lack of responsiveness: interrupts cannot be processed immediately because the computation path could already be congested with packets when the interrupts occur. In the EDC, however, direct read/write links connecting the Transmitting Processors and the Local Memories allow programs to be executed in a control-driven manner, without going through the PSN and the feedback path; therefore, fast execution and hence short response times could be expected. In general, resources in cyclical architectures are better utilized when compared to those in acyclic systems.

3. Suggestions for Further Work

In this thesis, we have developed some ideas based on several new architectural concepts and demonstrated their practicality and usefulness. We have not implemented any of the proposals, because we feel that more related work has yet to be done.
Specific topics for further research have been suggested in the previous chapters; in particular, work concerning the detailed hardware specifications and fault tolerance studies of the three designs should perhaps receive the utmost attention. Since our proposed systems are designed to make use of hundreds to thousands of interconnected processing and storage components, failures of single components will paralyse the entire systems, and these important issues are not included in our studies.

Appendix A

Lemma III.1: Consider a LSSN which has L loops and a packet which is destined for the address $A_{S'} \ldots A_1 \ell_{L'} \ldots \ell_1$, where $L' = \log L$ and $S' = \lceil \log L' \rceil$. The packet will be routed to the loop $\ell_{L'} \ldots \ell_1$ within $L'$ steps of routing after its admission into the LSSN.

Proof: Suppose the packet is admitted into the LSSN via the loop $\ell^*_{L'} \ldots \ell^*_s \ldots \ell^*_1$. According to the routing scheme, this packet will be routed to the loop $\ell^*_{L'} \ldots \ell_s \ldots \ell^*_1$ by a switch located in the s-th stage, where $s = 1, 2, \ldots, L'$. Since the maximum value of s is $L'$, only $L'$ steps of routing are required to route the packet to the aforementioned loop $\ell_{L'} \ldots \ell_s \ldots \ell_1$.

Lemma III.2: Consider a LSSN with L loops and a packet which is destined for the address $A_{S'} \ldots A_1 \ell_{L'} \ldots \ell_1$, where $L' = \log L$ and $S' = \lceil \log L' \rceil$. After the packet has been routed to the loop $\ell_{L'} \ldots \ell_1$, it needs at most another $(L'-1)$ steps of matching along that loop to reach its destination.

Proof: According to the routing scheme, the destination address $A_{S'} \ldots A_1 \ell_{L'} \ldots \ell_1$ is one of the $L'$ output links along the loop $\ell_{L'} \ldots \ell_1$. After the packet has been switched to this loop, it will be removed by either the receiver attached to the link which is part of that loop, or one of the remaining $(L'-1)$ receivers attached to the same loop. In either case, at most $(L'-1)$ steps of matching are necessary.

Theorem III.1: In a LSSN with L loops, a packet will be delivered to its destination within $(2 \log L - 1)$ steps of routing regardless of where it is generated.

Proof: This theorem is a result of Lemmas III.1 and III.2.

Theorem III.2: The average number of routing steps (ARS) needed to deliver a result packet in a LSSN with L loops is

$$\mathrm{ARS}(L) = (3 \log L - 1)/2 + 2/L - 1$$

Proof: Without loss of generality and for simplicity, we shall consider a transmitter (Tr) located in the first stage of the network. The routes from this Tr to the set of output links which can be reached without going through the feedback paths, as shown in Fig.III.2, are in the form of a "binary tree" which branches out toward the lower end of the network; the routes to the remaining set of output links would include the feedback paths, and are in the form of an irregular, "tapering" tree. The numbers of routing steps needed to reach the receivers on these two trees are tabulated in Table III.1. From Table III.1, the average number of routing steps needed for a Tr to reach any output link is therefore

$$
\begin{aligned}
\mathrm{ARS}(L) &= \frac{(2 \times 1 + 4 \times 2 + \ldots + L \log L) + (L-2)(\log L + 1) + (L-4)(\log L + 2) + \ldots + (L/2)(2 \log L - 1)}{(2 + 4 + \ldots + L) + (L-2) + (L-4) + \ldots + (L/2)} \\
&= \frac{(L \log L - 2 \log L + L) + (L \log L - 4 \log L + 2L) + \ldots + (L \log L - (L/2) \log L + L(\log L - 1)) + L \log L}{L \log L} \\
&= \frac{L \{ \log L + (\log L + 1) + (\log L + 2) + \ldots + (\log L + \log L - 1) \} - \log L \, (2 + 4 + \ldots + L/2)}{L \log L} \\
&= \frac{L \log L \, (3 \log L - 1)/2 - 2 \log L \, (2^{\log L - 1} - 1)}{L \log L} \\
&= (3 \log L - 1)/2 + 2/L - 1
\end{aligned}
$$

Q.E.D.

Table III.1 - The number of routing steps needed to reach the receivers of the "binary" and "tapering" trees.

                       Binary tree             Tapering tree
Stage # of Rr          # Rrs     # steps       # Rrs     # steps
stage 1                2         1             L-2       log L + 1
stage 2                4         2             L-4       log L + 2
...                    ...       ...           ...       ...
stage (log L - 1)      L/2       log L - 1     L/2       2 log L - 1
stage (log L)          L         log L         0         2 log L

Corollary III.1: Any packet admitted into the LSSN will go through the feedback path at most twice.

Proof: Consider a transmitter (Tr) which sends a packet at the s-th stage to a receiver (Rr) which is r routing steps away, and suppose there are $L'$ stages in the network. The number of feedbacks, F, could be calculated as:

$$F(L) = \mathrm{Quotient}((s + r - 1)/L')$$

The reader may verify the correctness of this expression with a simple example on Fig.III.2. The maximum value of F is therefore

$$F(L)_{max} = \mathrm{Quotient}((s_{max} + r_{max} - 1)/L')$$

Since $s_{max} = L'$ and $r_{max} = 2L' - 1$ (from Theorem III.1),

$$F(L)_{max} = \mathrm{Quotient}((3L' - 2)/L') = 2$$

Q.E.D.

Theorem III.3: For a LSSN with L loops, the probability that the destination address carried by a result packet will match the label of an output link, and hence that the packet will be removed from the network, is:

$$P_{removed} = 2L / \{3L \log L - L + 4\}$$

where the transmission pattern is such that each and every receiving port of the network is equally likely to receive that packet.
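As a numerical cross-check of Theorem III.2 and Corollary III.1 (not part of the original proofs), the following Python sketch sums the entries of Table III.1 of this appendix directly and compares the result with the closed form. It assumes that L is a power of two and that log denotes the base-2 logarithm; the function names are hypothetical.

```python
from fractions import Fraction


def ars_from_table(stages: int) -> Fraction:
    """Average routing steps obtained by summing Table III.1 row by row."""
    loops = 2 ** stages                      # L, with stages = log2(L)
    total_steps = 0
    total_receivers = 0
    for s in range(1, stages + 1):
        binary_rrs = 2 ** s                  # reached in s steps
        tapering_rrs = loops - 2 ** s        # reached in log L + s steps
        total_steps += binary_rrs * s + tapering_rrs * (stages + s)
        total_receivers += binary_rrs + tapering_rrs
    assert total_receivers == loops * stages  # L log L output links in all
    return Fraction(total_steps, total_receivers)


def ars_closed_form(stages: int) -> Fraction:
    """Closed form of Theorem III.2: (3 log L - 1)/2 + 2/L - 1."""
    loops = 2 ** stages
    return Fraction(3 * stages - 1, 2) + Fraction(2, loops) - 1


def max_feedbacks(stages: int) -> int:
    """Corollary III.1: Quotient((3L' - 2)/L') with L' = log L stages."""
    return (3 * stages - 2) // stages


for n in range(1, 11):
    assert ars_from_table(n) == ars_closed_form(n)
print(float(ars_closed_form(6)), max_feedbacks(6))
```

Exact rational arithmetic is used so that the table sum and the closed form can be compared for equality rather than within a tolerance.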
Proof: Since the LSSN has L loops, it would have $\log L$ stages of switches and $L \log L$ pairs of transmitting ports (TPs) and receiving ports (RPs). Consider the case in which a TP in each stage of the network transmits a result packet to each and every RP in the network; then the number of packets transmitted by each stage of RPs is as tabulated in Table III.2. In Table III.2, "Feedback Count" is the number of times the packets will go through the feedback paths in order to reach their destinations. The correctness of this table could be verified on the example given in Fig.III.3. From this table, the total number of packets that will be received by the RPs connected to a particular stage, say the last (i.e., $\log L$-th) stage, is obtained by summing up the numbers across the corresponding row of the table, and it is:

$$N_{matched} = L \log L$$

and the total number of switching operations performed by the same stage is:

$$N_{total} = N_{matched} + N_{unmatched}$$

where $N_{unmatched}$ is the number of packets which will not be removed by the RPs of that stage because of the unmatched destination addresses carried by them.
In the case of the (log L)-th stage, $N_{unmatched}$ could easily be computed as the sum of products of the entries and their respective "Feedback Count"s in Table III.2:

$$\begin{aligned}
N_{unmatched} = 1 \times \{\; & (L-2) + (L-4) + \cdots + (L-L/2) + (L-L) \\
& + L + (L-2) + \cdots + (L-L/4) + (L-L/2) \\
& + L/2 + L + (L-2) + \cdots + (L-L/8) + (L-L/4) \\
& + \cdots \\
& + 4 + 8 + 16 + \cdots + (L-4) + (L-2) \;\} \\
+\; 2 \times \{\; & (L-L/2) \\
& + (L-L/4) + (L-L/2) \\
& + (L-L/8) + (L-L/4) + (L-L/2) \\
& + \cdots \\
& + (L-4) + (L-8) + \cdots + (L-L/2) \;\}
\end{aligned}$$

Let $N' = (L-2) + (L-4) + \cdots + (L-L/2)$; then, after re-arrangement,

$$\begin{aligned}
N_{unmatched} &= N' \\
&\quad + L + N' \\
&\quad + L/2 + L + N' + (L-L/2) \\
&\quad + L/4 + L/2 + L + N' + (L-L/2) + (L-L/4) \\
&\quad + \cdots \\
&\quad + 4 + 8 + \cdots + L/2 + L + N' + (L-L/2) + (L-L/4) + \cdots + (L-4) \\
&= N' \log L + (1 + 2 + 3 + \cdots + (\log L - 1))\, L \\
&= \{ L(\log L - 1) - (2 + 4 + 8 + \cdots + L/2) \} \log L + L\,(1 + \log L - 1)(\log L - 1)/2 \\
&= \{ L(\log L - 1) - (L - 2) \} \log L + L \log L\,(\log L - 1)/2 \\
&= \tfrac{3}{2} L (\log L)^2 - \tfrac{3}{2} L \log L + 2 \log L
\end{aligned}$$

$$\Rightarrow\; N_{total} = \tfrac{3}{2} L (\log L)^2 - \tfrac{1}{2} L \log L + 2 \log L$$

$$\Rightarrow\; P_{removed} = \frac{N_{matched}}{N_{total}} = \frac{L \log L}{\tfrac{3}{2} L (\log L)^2 - \tfrac{1}{2} L \log L + 2 \log L} = \frac{2L}{3L \log L - L + 4}$$

Q.E.D.

Theorem III.4: The maximum average throughput rate (MATR) of a LSSN with L loops is:

$$MATR(L) = \frac{3}{2} \times SR_{sw} \times \log L \times \frac{L^2}{3L \log L - L + 4}$$

where $SR_{sw}$ is the maximum rate of transmitting result packets between two switches via an output link.
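The last simplification in the proof of Theorem III.3, and the closed form stated in Theorem III.4, can be checked with exact rational arithmetic. This is a sketch under stated assumptions rather than anything taken from the thesis: the helper names are hypothetical, L is restricted to powers of two, `log` is taken as base 2, and $SR_{sw}$ defaults to 1 so that the result is expressed in units of the link rate.

```python
from fractions import Fraction


def log2_exact(loops):
    """Exact base-2 logarithm; only powers of two >= 2 are accepted (assumption)."""
    if loops < 2 or loops & (loops - 1):
        raise ValueError("loops must be a power of two >= 2")
    return loops.bit_length() - 1


def n_matched(loops):
    """N_matched = L log L."""
    return loops * log2_exact(loops)


def n_total(loops):
    """N_total = 3L(log L)^2/2 - L log L/2 + 2 log L, as stated in the proof."""
    k = log2_exact(loops)
    return Fraction(3 * loops * k * k, 2) - Fraction(loops * k, 2) + 2 * k


def p_removed(loops):
    """P_removed = 2L / (3L log L - L + 4)."""
    k = log2_exact(loops)
    return Fraction(2 * loops, 3 * loops * k - loops + 4)


def matr(loops, sr_sw=1):
    """MATR(L) = 3/2 * SR_sw * log L * L^2 / (3L log L - L + 4)."""
    k = log2_exact(loops)
    return Fraction(3, 2) * sr_sw * k * loops ** 2 / (3 * loops * k - loops + 4)


if __name__ == "__main__":
    for loops in (2, 4, 8, 16, 32, 64):
        # The ratio N_matched / N_total must reduce to the closed form.
        assert n_matched(loops) / n_total(loops) == p_removed(loops)
        print(loops, p_removed(loops), matr(loops))
```

Using `Fraction` avoids floating-point rounding, so the equality between the ratio and the closed form is tested exactly rather than to a tolerance.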
Proof: Since there are L log L links in an L-looped LSSN, the maximum average rate of delivering packets to all the receiving ports could be formulated as:

$$MATR(L) = L \log L \times SR_{sw} \times (1 - P_{conflicted}) \times P_{removed}$$

where $P_{conflicted}$ is the probability that an output link will not contain a packet due to conflicts within the switch concerned, and it could be computed with the illustrations below:

[Fig. III.6]

On the average, 25% of the time an output link will not receive any packet due to conflicts in the switch; therefore,

$$1 - P_{conflicted} = 3/4$$

and also,

$$MATR(L) = \frac{3}{4} \times SR_{sw} \times L \log L \times \frac{2L}{3L \log L - L + 4} = \frac{3}{2} \times SR_{sw} \times \log L \times \frac{L^2}{3L \log L - L + 4}$$

Q.E.D.

Theorem III.5: The LSSN which uses Type-B switches is deadlock free.

Proof: Type-B switches provide two essential features in avoiding deadlocks in the LSSN:

(a) The intermediate ports are used to hold packets with feedback counts of 0 and 1, such that they are not eligible to contend for the output ports if they cannot be switched to the next stage immediately, i.e., if the buffer pools of the next switch have no room to accept them.
(b) The feedback counts of the packets emerging from the last stage are incremented so that when they are fed back to the first stage, they will request buffers of the next higher class.

The first feature ensures that links that are shared by packets with various feedback counts will not be clogged. The second feature prevents the formation of any cyclical request loop. With these two features, the path traversed by any packet in the network is "spiral" rather than "cyclical" in shape, and the whole network could be conceived as several spirals interconnected in parallel, with the Class-0 buffers of the first stage as the heads of the spirals, and the Class-2 buffers of the last stage as the tails. Since there is no cyclical request of resources, the network is therefore deadlock free.

Appendix B

PROGRAM RECIRCULATING_SYSTOLIC_SORT;

CONST
  INIT_SEED = 3;
  MARKER_COLUMN = 1;
  RANGE = 100.0; (* LARGEST RANDOM NUMBER TO BE SORTED *)

TYPE
  STORE_RECORD = RECORD
    MARKER : BOOLEAN;
    ITEM : INTEGER
  END;
  COMPARATOR_RECORD = RECORD
    INIT_ROW, INIT_COLUMN : INTEGER;
    AI, AJ, BI, BJ, CI, CJ, DI, DJ : INTEGER;
    TEMPORARY : INTEGER
  END;
  DATA_ARRANGE_TYPE = (RRANDOM, SSEQUENTIAL);

VAR
  N_COMP, H_COMP, V_COMP : INTEGER;
  LIMIT_NO_SWITCH : INTEGER;
  ROW, COLUMN : INTEGER;
  STORE : ARRAY (. 0 .. 50, 0 .. 50 .) OF STORE_RECORD;
  COMPARATOR : ARRAY (. 1 .. 200 .) OF COMPARATOR_RECORD;
  TOTAL_CYCLE, SWITCH_PER_CYCLE : INTEGER;
  SEED : INTEGER;
  I, J, K, JJ, EVEN_START, ODD_START : INTEGER;
  TEMPA, TEMPB, TEMPC, TEMPD : INTEGER;
  TEMP_COL : INTEGER;
  MARKERA, MARKERB, MARKERC, MARKERD : BOOLEAN;
  TERMINATE : BOOLEAN;
  DATA_ARRANGE : DATA_ARRANGE_TYPE;
  DECR : INTEGER;
  CONTINUOUS_NO_SWITCH : INTEGER;

PROCEDURE SETUP_NETWORK;
BEGIN (* SETUP_NETWORK *)
  ROW := 2 * V_COMP;
  COLUMN := 2 * H_COMP;
  N_COMP := V_COMP * H_COMP; (* NUMBER OF COMPARATORS *)
  LIMIT_NO_SWITCH := 2 * H_COMP (* TRUNC (H_COMP / 2) *);
  (* TERMINATE IF NO SWITCHING CONTINUOUSLY *)
  I := 0;
  J := 0;
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .) DO
    BEGIN (* EACH COMPARATOR WILL HOLD 4 ITEMS TO BE SORTED *)
      INIT_ROW := I;
      INIT_COLUMN := J;
      I := I + 2;
      IF I >= 2 * V_COMP - 1 THEN
      BEGIN
        I := I - 2 * V_COMP + 1;
        J := J + 2
      END
    END
END (* SETUP_NETWORK *);

FUNCTION RANDOM (VAR SEED : INTEGER) : INTEGER;
BEGIN (* RANDOM *)
  RANDOM := TRUNC ((SEED / 65536 - 0.1) * RANGE);
  SEED := (25173 * SEED + 13849) MOD 65536
END (* RANDOM *);

PROCEDURE CHECK_TERMINATE;
BEGIN (* CHECK_TERMINATE *)
  IF SWITCH_PER_CYCLE = 0 THEN
  BEGIN
    INCR (CONTINUOUS_NO_SWITCH);
    IF CONTINUOUS_NO_SWITCH >= LIMIT_NO_SWITCH THEN
      TERMINATE := TRUE
  END
  ELSE
  BEGIN
    SWITCH_PER_CYCLE := 0;
    CONTINUOUS_NO_SWITCH := 0
  END
END (* CHECK_TERMINATE *);

PROCEDURE INITIALIZE;
BEGIN (* INITIALIZE *)
  FOR I := 0 TO (ROW - 1) DO
    FOR J := 0 TO (COLUMN - 1) DO
      WITH STORE (. I, J .) DO
      BEGIN
        IF DATA_ARRANGE = RRANDOM THEN
          ITEM := RANDOM (SEED)
        ELSE
        BEGIN
          ITEM := SEED;
          SEED := SEED - DECR
        END;
        (* ********* MARKING EACH LOOP ********* *)
        IF ((J = 2 * MARKER_COLUMN - 2) AND (I MOD 2 = 1)) OR
           ((J = 2 * MARKER_COLUMN - 1) AND (I MOD 2 = 0)) THEN
          MARKER := TRUE
        ELSE
          MARKER := FALSE
      END;
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .)
    DO BEGIN
      TEMPORARY := 0;
      AI := INIT_ROW;
      AJ := INIT_COLUMN;
      BI := AI;
      BJ := AJ + 1;
      CI := AI + 1;
      CJ := AJ;
      DI := AI + 1;
      DJ := AJ + 1
    END;
  TOTAL_CYCLE := 0;
  SWITCH_PER_CYCLE := 0;
  CONTINUOUS_NO_SWITCH := 0;
  TERMINATE := FALSE
END (* INITIALIZE *);

PROCEDURE VERTICAL_COMP;
BEGIN (* VERTICAL_COMP *)
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .) DO
    BEGIN
      IF (STORE (. AI, AJ .).ITEM > STORE (. CI, CJ .).ITEM) THEN
      BEGIN
        TEMPORARY := STORE (. AI, AJ .).ITEM;
        STORE (. AI, AJ .).ITEM := STORE (. CI, CJ .).ITEM;
        STORE (. CI, CJ .).ITEM := TEMPORARY;
        INCR (SWITCH_PER_CYCLE)
      END;
      IF (STORE (. BI, BJ .).ITEM > STORE (. DI, DJ .).ITEM) THEN
      BEGIN
        TEMPORARY := STORE (. BI, BJ .).ITEM;
        STORE (. BI, BJ .).ITEM := STORE (. DI, DJ .).ITEM;
        STORE (. DI, DJ .).ITEM := TEMPORARY;
        INCR (SWITCH_PER_CYCLE)
      END
    END
END (* VERTICAL_COMP *);

PROCEDURE DIAGONAL_COMP;
BEGIN (* DIAGONAL_COMP *)
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .) DO
    BEGIN
      IF (STORE (. AI, AJ .).ITEM > STORE (. DI, DJ .).ITEM) THEN
      BEGIN
        TEMPORARY := STORE (. AI, AJ .).ITEM;
        STORE (. AI, AJ .).ITEM := STORE (. DI, DJ .).ITEM;
        STORE (. DI, DJ .).ITEM := TEMPORARY;
        INCR (SWITCH_PER_CYCLE)
      END;
      IF (STORE (. BI, BJ .).ITEM > STORE (. CI, CJ .).ITEM) THEN
      BEGIN
        TEMPORARY := STORE (. BI, BJ .).ITEM;
        STORE (. BI, BJ .).ITEM := STORE (. CI, CJ .).ITEM;
        STORE (. CI, CJ .).ITEM := TEMPORARY;
        INCR (SWITCH_PER_CYCLE)
      END
    END
END (* DIAGONAL_COMP *);

PROCEDURE HORIZONTAL_COMP;
BEGIN (* HORIZONTAL_COMP *)
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .) DO
    BEGIN
      TEMPA := STORE (. AI, AJ .).ITEM;
      TEMPB := STORE (. BI, BJ .).ITEM;
      TEMPC := STORE (. CI, CJ .).ITEM;
      TEMPD := STORE (. DI, DJ .).ITEM;
      MARKERA := STORE (. AI, AJ .).MARKER;
      MARKERB := STORE (. BI, BJ .).MARKER;
      MARKERC := STORE (. CI, CJ .).MARKER;
      MARKERD := STORE (. DI, DJ .).MARKER;
      TEMP_COL := TRUNC (INIT_COLUMN / 2);
      IF ((NOT MARKERA) AND (TEMPB > TEMPA)) OR
         ((MARKERA) AND (TEMPB < TEMPA)) THEN
      BEGIN
        TEMPORARY := STORE (. BI, BJ .).ITEM;
        STORE (. BI, BJ .).ITEM := STORE (. AI, AJ .).ITEM;
        STORE (. AI, AJ .).ITEM := TEMPORARY;
        INCR (SWITCH_PER_CYCLE)
      END;
      IF ((NOT MARKERC) AND (TEMPD > TEMPC)) OR
         ((MARKERC) AND (TEMPD < TEMPC)) THEN
      BEGIN
        TEMPORARY := STORE (. DI, DJ .).ITEM;
        STORE (. DI, DJ .).ITEM := STORE (. CI, CJ .).ITEM;
        STORE (. CI, CJ .).ITEM := TEMPORARY;
        INCR (SWITCH_PER_CYCLE)
      END
    END
END (* HORIZONTAL_COMP *);

PROCEDURE DISPLAY;
BEGIN (* DISPLAY *)
  WRITELN;
  WRITELN;
  WRITELN ('NUMBER OF SWITCHING = ', SWITCH_PER_CYCLE : 5);
  WRITELN ('AT CYCLE TIME = ', TOTAL_CYCLE : 5);
  FOR I := 0 TO (ROW - 1) DO
  BEGIN
    IF (I MOD 2 = 0) THEN
    BEGIN
      WRITELN;
      WRITELN
    END;
    FOR J := 0 TO (COLUMN - 1) DO
      WITH STORE (. I, J .) DO
      BEGIN
        IF (J MOD 2 = 0) THEN
          WRITE (' ' : 1);
        WRITE (ITEM : 4)
      END;
    WRITELN
  END
END (* DISPLAY *);

PROCEDURE TRU_DISPLAY;
BEGIN (* TRU_DISPLAY *)
  WITH COMPARATOR (. 1 .) DO
  BEGIN
    EVEN_START := AJ;
    ODD_START := CJ
  END;
  WRITELN;
  WRITELN;
  WRITELN ('AT CYCLE TIME=', TOTAL_CYCLE : 5);
  FOR I := 0 TO (ROW - 1) DO
  BEGIN
    IF (I MOD 2 = 0) THEN
    BEGIN
      FOR J := EVEN_START TO EVEN_START + COLUMN - 1 DO
      BEGIN
        JJ := J MOD COLUMN;
        IF STORE (. I, JJ .).MARKER THEN
          WRITE (' *', STORE (. I, JJ .).ITEM : 2)
        ELSE
          WRITE (' ', STORE (. I, JJ .).ITEM : 2)
      END;
      WRITELN
    END
    ELSE
    BEGIN
      FOR J := ODD_START TO ODD_START + COLUMN - 1 DO
      BEGIN
        JJ := J MOD COLUMN;
        IF STORE (. I, JJ .).MARKER THEN
          WRITE (' *', STORE (. I, JJ .).ITEM : 2)
        ELSE
          WRITE (' ', STORE (. I, JJ .).ITEM : 2)
      END;
      WRITELN
    END
  END
END (* TRU_DISPLAY *);

PROCEDURE SHIFT;
BEGIN (* SHIFT *)
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .)
    DO BEGIN
      IF (AI MOD 2 = 1) THEN AJ := (AJ + 1) MOD COLUMN
      ELSE AJ := (AJ + COLUMN - 1) MOD COLUMN;
      IF (BI MOD 2 = 1) THEN BJ := (BJ + 1) MOD COLUMN
      ELSE BJ := (BJ + COLUMN - 1) MOD COLUMN;
      IF (CI MOD 2 = 1) THEN CJ := (CJ + 1) MOD COLUMN
      ELSE CJ := (CJ + COLUMN - 1) MOD COLUMN;
      IF (DI MOD 2 = 1) THEN DJ := (DJ + 1) MOD COLUMN
      ELSE DJ := (DJ + COLUMN - 1) MOD COLUMN
    END
END (* SHIFT *);

BEGIN (* PARA_SORT *)
  H_COMP := 6;
  V_COMP := 3;
  SEED := INIT_SEED;
  DATA_ARRANGE := RRANDOM;
  DECR := 1;
  WHILE (H_COMP <> -999) DO
  BEGIN
    WRITELN ('H_COMP/V_COMP/SEED/RAND/DECR=', H_COMP : 5, V_COMP : 5,
             SEED : 5, DATA_ARRANGE : 5, DECR : 5);
    WRITELN ('ENTER NEW VALUES/ -999 FOR TERMINATION');
    READLN (H_COMP, V_COMP, SEED, DATA_ARRANGE, DECR);
    IF (H_COMP <> -999) THEN
    BEGIN
      SETUP_NETWORK;
      INITIALIZE;
      TRU_DISPLAY;
      WHILE (NOT TERMINATE) DO
      BEGIN
        INCR (TOTAL_CYCLE);
        VERTICAL_COMP;
        HORIZONTAL_COMP;
        DIAGONAL_COMP;
        CHECK_TERMINATE;
        SHIFT
      END;
      TRU_DISPLAY;
      WRITELN;
      WRITELN ('NUMBER OF HORIZONTAL COMPARATORS =', H_COMP : 5);
      WRITELN ('NUMBER OF VERTICAL COMPARATORS =',
               V_COMP : 5);
      WRITELN ('NUMBER OF ITEMS SORTED =', ROW * COLUMN : 5);
      WRITELN ('NUMBER OF DOUBLE_COMPARISON/SHIFT CYCLES =',
               TOTAL_CYCLE - LIMIT_NO_SWITCH : 5)
    END
  END
END (* PARA_SORT *).

Appendix C

PROGRAM LSSN;
(* DEADLOCK FREE
 * USE CENTRAL BUFFERS FOR SIMULATION, ELSE STACK OVERFLOW
 * CLASS_0,_1,_2 BUFFERS ARE PRIORITIZED:
     CLASS-2 = HIGHEST
     CLASS-1 = MIDDLE
     CLASS-0 = LOWEST
 * MAY 1983
 *)

CONST
  N_SW=32;
  N_TR=64;
  N_RR=64;
  TR_INTL=10;
  T_SIMULAT=1000000;
  T_TRANSFER=2;
  T_DECIDE=3;
  T_SWITCH=2;
  F0_SIZE=7;
  F1_SIZE=7;
  F2_SIZE=2;
  F012_SIZE=F0_SIZE+F1_SIZE+F2_SIZE;
  F_TOTAL=2*N_SW*F012_SIZE;

TYPE
  BUFFER_RECORD=RECORD
    B_EMPTY:BOOLEAN;
    B_TR_TIME:INTEGER;
    B_DEST:INTEGER;
    B_FEEDBACK_COUNT:INTEGER;
  END;
  FIFO_RECORD=RECORD
    F_START,F_STOP:INTEGER;
    F_TOP,F_BOTTOM:INTEGER;
    F_EMPTY,F_FULL:BOOLEAN;
    F_COUNT:INTEGER;
  END;
  INPORT_RECORD=RECORD
    I_TIMER:INTEGER;
    I_EMPTY:BOOLEAN;
    I_TR_TIME:INTEGER;
    I_DEST:INTEGER;
    I_FEEDBACK_COUNT:INTEGER;
  END;
  OUTPORT_RECORD=RECORD
    O_TIMER:INTEGER;
    O_EMPTY,O_MATCHED:BOOLEAN;
    O_TR_TIME:INTEGER;
    O_DEST:INTEGER;
    O_FEEDBACK_COUNT:INTEGER;
    O_RR,O_NEXT_SW,O_NEXT_PT:INTEGER;
  END;
  SWITCH_RECORD=RECORD
    INPORT:ARRAY(. 0..1 .) OF INPORT_RECORD;
    FIFO:ARRAY(. 0..1, 0..2 .) OF FIFO_RECORD;
    OUTPORT:ARRAY(. 0..1 .) OF OUTPORT_RECORD;
  END;
  TR_RECORD=RECORD
    T_EMPTY,T_BLOCKED:BOOLEAN;
    T_DEST:INTEGER;
    T_NEXT_SW,T_NEXT_PT:INTEGER;
    T_TIMER:INTEGER; (* TRANSMIT WHEN TIMER REACHES CLOCK *)
  END;

VAR
  SWITCH:ARRAY(. 1..N_SW .) OF SWITCH_RECORD;
  TR:ARRAY(. 1..N_TR .) OF TR_RECORD;
  BUFFER:ARRAY(. 1..F_TOTAL .) OF BUFFER_RECORD;
  FMAX:ARRAY(.
       0..2 .) OF INTEGER; (* MAX USAGE OF FIFO *)
  CLOCK:INTEGER;
  SEED:INTEGER;
  R_PACKET,T_PACKET:INTEGER;
  TOTAL_DELAY:INTEGER;
  MAX_DELAY:INTEGER;
  TR_DELAY:INTEGER; (* BLOCKAGE AT ENTRANCE *)

FUNCTION RANDOM (VAR SEED : INTEGER) : REAL;
BEGIN
  RANDOM:=SEED/65535;
  SEED:=(25173*SEED+13849) MOD 65536;
END;

PROCEDURE INITIALIZE;
VAR
  TI,SI,IPI,OPI,CI,PI,BI:INTEGER;
  TEMPI:INTEGER;
BEGIN
  WRITELN('LAST STAGE DOES NOT MATCH LOOP NUMBER,JUST INCR FB#');
  WRITELN('NUMBER OF SWITCHES =',N_SW:5);
  WRITELN('NUMBER OF TR=',N_TR:5);
  WRITELN('FIFO SIZE OF CL-0,1,2,TOTAL PER SWITCH=',F0_SIZE:5,
          F1_SIZE:5,F2_SIZE:5,F012_SIZE:5);
  WRITELN('REQUEST RATE=',1/TR_INTL:10:5);
  WRITELN('TR INTL= ',TR_INTL:7);
  WRITELN('SIMULATION TIME =',T_SIMULAT:5);
  FOR SI:=1 TO N_SW DO
    WITH SWITCH[SI] DO
    BEGIN (*S*)
      FOR IPI:=0 TO 1 DO
        WITH INPORT[IPI] DO
        BEGIN (*IP*)
          I_TIMER:=0;
          I_EMPTY:=TRUE;
          I_DEST:=-1;
          I_TR_TIME:=0;
          I_FEEDBACK_COUNT:=0;
        END; (*IP*)
      FOR PI:=0 TO 1 DO
      BEGIN
        FOR CI:=0 TO 2 DO
          WITH FIFO[PI,CI] DO
          BEGIN (*BP,CL*)
            F_EMPTY:=TRUE;
            F_FULL:=FALSE;
            F_COUNT:=0;
          END; (*BP,CL*)
        (** DETERMINE MEMORY LOCATIONS FOR EACH SWITCH'S BUFFER **)
        TEMPI:=F012_SIZE*(2*SI-2+PI)+1;
        FIFO[PI,0].F_START:=TEMPI;
        FIFO[PI,0].F_TOP:=TEMPI;
        FIFO[PI,0].F_BOTTOM:=TEMPI;
        FIFO[PI,0].F_STOP:=TEMPI+F0_SIZE-1;
        FIFO[PI,1].F_START:=TEMPI+F0_SIZE;
        FIFO[PI,1].F_TOP:=TEMPI+F0_SIZE;
        FIFO[PI,1].F_BOTTOM:=TEMPI+F0_SIZE;
        FIFO[PI,1].F_STOP:=TEMPI+F0_SIZE+F1_SIZE-1;
        FIFO[PI,2].F_START:=FIFO[PI,1].F_STOP+1;
        FIFO[PI,2].F_TOP:=FIFO[PI,1].F_STOP+1;
        FIFO[PI,2].F_BOTTOM:=FIFO[PI,1].F_STOP+1;
        FIFO[PI,2].F_STOP:=FIFO[PI,2].F_START+F2_SIZE-1;
      END;
      FOR OPI:=0 TO 1 DO
        WITH OUTPORT[OPI] DO
        BEGIN (*OP*)
          O_TIMER:=0;
          O_EMPTY:=TRUE;
          O_TR_TIME:=0;
          O_MATCHED:=FALSE;
          O_DEST:=-1;
          O_FEEDBACK_COUNT:=0;
          READLN(O_RR,O_NEXT_SW,O_NEXT_PT);
        END; (*OP*)
    END; (*S*)
  FOR TI:=1 TO N_TR DO
    WITH TR[TI] DO
    BEGIN (*T*)
      T_EMPTY:=TRUE;
      T_BLOCKED:=FALSE;
      T_DEST:=-1;
      T_NEXT_SW:=TRUNC((TI+1)/2);
      T_NEXT_PT:=(TI+1) MOD 2;
      REPEAT
        T_TIMER:=TRUNC(RANDOM(SEED)*2*TR_INTL);
      UNTIL
        T_TIMER>0 AND T_TIMER<2*TR_INTL;
    END; (*T*)
  FOR BI:=1 TO F_TOTAL DO
    WITH BUFFER[BI] DO
    BEGIN
      B_EMPTY:=TRUE;
      B_TR_TIME:=0;
      B_DEST:=-1;
      B_FEEDBACK_COUNT:=0;
    END;
  CLOCK:=0;
  TOTAL_DELAY:=0;
  TR_DELAY:=0;
  R_PACKET:=0;
  T_PACKET:=0;
  MAX_DELAY:=0;
  FMAX[0]:=0;
  FMAX[1]:=0;
  FMAX[2]:=0; (** MAX USAGE OF EACH CLASS OF BUFFERS *)
END; (*INIT*)

PROCEDURE TR_GET_DEST;
VAR
  T:INTEGER;
BEGIN
  FOR T:=1 TO N_TR DO
    WITH TR[T] DO
    BEGIN (*T*)
      IF T_TIMER <= CLOCK AND T_EMPTY AND NOT T_BLOCKED THEN
      BEGIN (*TIMER*)
        REPEAT
          T_DEST:=TRUNC(RANDOM(SEED)*65);
        UNTIL (T_DEST IN (. 1..64 .) AND T_DEST <> T);
        T_EMPTY:=FALSE;
      END; (*TIMER*)
    END;
END; (*TR_GET_DEST*)

FUNCTION ROUTE(SW,DT:INTEGER) : INTEGER;
VAR
  SR,DR:INTEGER;
BEGIN
  SR:=SW;
  DR:=DT;
  ROUTE:=1; (*INITIAL SETTING*)
  IF SR<=8 AND ((DR-1) MOD 2=0) THEN
    ROUTE:=0
  ELSE IF SR<=16 AND SR>8 AND (TRUNC((DR-1)/2) MOD 2=0) THEN
    ROUTE:=0
  ELSE IF SR<=24 AND SR>16 AND (TRUNC((DR-1)/4) MOD 2=0) THEN
    ROUTE:=0
  ELSE IF SR<=32 AND SR>24 AND (TRUNC((DR-1)/8) MOD 2=0) THEN
    ROUTE:=0
END;

PROCEDURE TR_TO_INPORT;
VAR
  TT:INTEGER;
  PACKET_COUNT:INTEGER;
BEGIN
  FOR TT:=1 TO N_TR DO
    WITH TR[TT] DO
    BEGIN (*T*)
      IF NOT T_EMPTY AND T_TIMER<=CLOCK THEN
        WITH SWITCH[T_NEXT_SW],INPORT[T_NEXT_PT] DO
        BEGIN (*READY TO TRANSMIT*)
          PACKET_COUNT:=FIFO[T_NEXT_PT,0].F_COUNT;
          IF (I_EMPTY) AND I_TIMER<=CLOCK AND
             FIFO[T_NEXT_PT,0].F_COUNT<F0_SIZE AND
             FIFO[T_NEXT_PT,1].F_COUNT<F1_SIZE AND
             FIFO[T_NEXT_PT,2].F_COUNT<F2_SIZE THEN
          (* PACKET_COUNT<(F0_SIZE-1)*(TRUNC(T_NEXT_SW/8.4+1)*8/N_SW) THEN *)
          (* (FIFO[0,0].F_COUNT<(TRUNC(T_NEXT_SW/8.4)+1)) AND
             (FIFO[1,0].F_COUNT<(TRUNC(T_NEXT_SW/8.4)+1)) THEN *)
          BEGIN (*GRANTED TO TRANSMIT*)
            I_TIMER:=CLOCK+T_TRANSFER;
            I_EMPTY:=FALSE;
            I_DEST:=T_DEST;
            I_FEEDBACK_COUNT:=0;
            I_TR_TIME:=CLOCK;
            T_EMPTY:=TRUE;
            T_TIMER:=CLOCK+TRUNC(TR_INTL*2*(RANDOM(SEED)));
            T_DEST:=-1;
            T_BLOCKED:=FALSE;
            INCR(T_PACKET);
          END
          ELSE
          BEGIN
            T_BLOCKED:=TRUE;
            INCR(TR_DELAY); (*INCREMENT TOTAL TR_DELAY*)
          END;
        END; (*READY TO TRANSMIT*)
    END; (*T*)
END; (*TR_TO_INPORT*)

PROCEDURE TRANSFER(
  SS,BB,CC,PPRT,OOPT,CC_NEXT:INTEGER);
VAR
  ST,BT,CT,PTRT,OPT,CT_NEXT:INTEGER;
BEGIN (*GRANT CLASS-CL BUFFER*)
  ST:=SS;
  BT:=BB;
  CT:=CC;
  PTRT:=PPRT;
  OPT:=OOPT;
  CT_NEXT:=CC_NEXT;
  WITH SWITCH[ST],FIFO[BT,CT],OUTPORT[OPT] DO
  BEGIN
    WITH BUFFER[PTRT] DO
    BEGIN
      DECR(F_COUNT);
      O_TIMER:=CLOCK+T_SWITCH+T_DECIDE;
      O_EMPTY:=FALSE;
      O_DEST:=B_DEST;
      IF O_DEST NOT IN (. 1..64 .) THEN
        WRITELN('????? WRONG DEST, LINE 257??????',O_DEST:5);
      IF O_DEST=O_RR THEN O_MATCHED:=TRUE
      ELSE O_MATCHED:=FALSE;
      O_FEEDBACK_COUNT:=CT_NEXT;
      O_TR_TIME:=B_TR_TIME;
      B_EMPTY:=TRUE;
      B_DEST:=-1;
      B_FEEDBACK_COUNT:=0;
      B_TR_TIME:=CLOCK;
      F_FULL:=FALSE;
      F_BOTTOM:=PTRT;
      IF F_TOP=F_BOTTOM THEN F_EMPTY:=TRUE
      ELSE F_EMPTY:=FALSE;
    END
  END (*CLASS_CL*)
END;

PROCEDURE OUTPORT_TO_INPORT;
VAR
  S,OP:INTEGER;
BEGIN
  FOR S:=1 TO N_SW DO
    WITH SWITCH[S] DO
      FOR OP:=0 TO 1 DO
        WITH OUTPORT[OP],SWITCH[O_NEXT_SW],INPORT[O_NEXT_PT] DO
        BEGIN (*S,OP*)
          IF NOT O_EMPTY AND NOT O_MATCHED AND O_TIMER<=CLOCK THEN
          BEGIN (*READY TO TRANSMIT*)
            IF (I_EMPTY) THEN
            BEGIN (*GRANTED TO TRANSMIT*)
              I_TIMER:=CLOCK+T_TRANSFER;
              I_EMPTY:=FALSE;
              I_DEST:=O_DEST;
              I_TR_TIME:=O_TR_TIME;
              I_FEEDBACK_COUNT:=O_FEEDBACK_COUNT;
              O_TIMER:=CLOCK+T_TRANSFER;
              O_DEST:=-1;
              O_EMPTY:=TRUE;
              O_FEEDBACK_COUNT:=0;
              O_TR_TIME:=CLOCK;
            END
            ELSE
            BEGIN (*BLOCKED*)
              (********** OUTPORT IS BLOCKED??????? ****************)
              WRITE('?????OUTPORT BLOCKED???? ');
              WRITE('AT T,SW,NEXT_SW,FB_COUNT ');
              WRITELN(CLOCK:3,S:3,O_NEXT_SW:3,O_FEEDBACK_COUNT:4);
              O_EMPTY:=FALSE;
            END;
          END (*READY*)
          ELSE IF O_MATCHED THEN (*REMOVE PACKETS*)
          BEGIN
            O_TIMER:=CLOCK;
            O_EMPTY:=TRUE;
            O_MATCHED:=FALSE;
            INCR(R_PACKET);
            WRITELN('RECEIVED AT TIME,DEST,RR,DELAY=',CLOCK:5,O_DEST:5,
                    O_RR:5,CLOCK-O_TR_TIME:5);
            IF O_TR_TIME=0 THEN WRITELN('??????? LINE 343????');
            TOTAL_DELAY:=TOTAL_DELAY+(CLOCK-O_TR_TIME);
            IF (CLOCK-O_TR_TIME)>MAX_DELAY THEN
              MAX_DELAY:=(CLOCK-O_TR_TIME);
            O_DEST:=-1;
            O_FEEDBACK_COUNT:=0;
            O_TR_TIME:=CLOCK;
          END; (*REMOVE PACKETS*)
        END; (*S,OP*)
END; (*OUTPORT_INPORT*)

PROCEDURE INPORT_TO_POOL;
VAR
  SS,IP:INTEGER;
BEGIN
  FOR SS:=1 TO N_SW DO
    WITH SWITCH[SS] DO
      FOR IP:=0 TO 1 DO
        WITH INPORT[IP] DO
        BEGIN (*S,IP*)
          IF NOT I_EMPTY AND I_TIMER<=CLOCK THEN
          BEGIN (*READY TO STORE PACKETS INTO BUFFER POOL*)
            IF FIFO[IP,I_FEEDBACK_COUNT].F_FULL THEN
              WRITELN('!!!!!!!F_FULL!!!!! S,P,CL=',SS:2,IP:2,I_FEEDBACK_COUNT:3);
            WITH FIFO[IP,I_FEEDBACK_COUNT] DO
              IF NOT F_FULL THEN
              BEGIN
                F_TOP:=(F_TOP+1);
                IF F_TOP>F_STOP THEN F_TOP:=F_START;
                WITH BUFFER[F_TOP] DO
                BEGIN
                  INCR(F_COUNT);
                  IF F_COUNT>FMAX[I_FEEDBACK_COUNT] THEN
                    FMAX[I_FEEDBACK_COUNT]:=F_COUNT;
                  B_EMPTY:=FALSE;
                  B_DEST:=I_DEST;
                  B_TR_TIME:=I_TR_TIME;
                  B_FEEDBACK_COUNT:=I_FEEDBACK_COUNT;
                  I_TIMER:=CLOCK+T_SWITCH+T_DECIDE;
                  I_EMPTY:=TRUE;
                  I_DEST:=-1;
                  I_FEEDBACK_COUNT:=0;
                  I_TR_TIME:=CLOCK;
                  F_EMPTY:=FALSE;
                  IF F_TOP=F_BOTTOM THEN F_FULL:=TRUE
                  ELSE F_FULL:=FALSE;
                END
              END
          END
        END (*S,IP*)
END; (*INPORT_BUFFER POOL*)

PROCEDURE POOL_TO_OUTPORT;
VAR
  PTRP:INTEGER; (*POINTER OF STRUCTURED BUFFERS*)
  TERMINATE:BOOLEAN;
  SP,OPP,PP,CLP,CLP_NEXT:INTEGER;
  CHECK_DEST:INTEGER;
  CHECK_BIT:INTEGER;
  OK_TRANSFER:BOOLEAN;
  NOW_SCHEDULED:INTEGER;
BEGIN
  FOR SP:=1 TO N_SW DO
    WITH SWITCH[SP] DO
      FOR OPP:=0 TO 1 DO
        WITH OUTPORT[OPP] DO
        BEGIN (*S,OP*)
          IF O_EMPTY AND O_TIMER<=CLOCK THEN
          BEGIN (*READY TO ACCEPT PACKETS FROM CLASS_0,_1,_2 BUFFERS*)
            NOW_SCHEDULED:=0;
            TERMINATE:=FALSE;
            WHILE (NOW_SCHEDULED<6) AND (NOT TERMINATE) DO
            BEGIN
              CASE NOW_SCHEDULED OF
                0: BEGIN PP:=0; CLP:=2; END;
                1: BEGIN PP:=1; CLP:=2; END;
                2: BEGIN PP:=0; CLP:=1; END;
                3: BEGIN PP:=1; CLP:=1; END;
                4: BEGIN PP:=0; CLP:=0; END;
                5: BEGIN PP:=1; CLP:=0; END;
                <>: BEGIN WRITELN('ERROR IN POOL TO OUTPORT !!!!!!!'); END;
              END;
              PTRP:=(FIFO[PP,CLP].F_BOTTOM+1);
              IF PTRP>FIFO[PP,CLP].F_STOP THEN
                PTRP:=FIFO[PP,CLP].F_START;
              (*REMOVE PACKET FROM BOTTOM OF BUFFER*)
              CHECK_DEST:=BUFFER[PTRP].B_DEST;
              CHECK_BIT:=ROUTE(SP,CHECK_DEST);
              (*DETERMINE SWITCH BIT OF PACKET*)
              (*IF FEEDBACK PACKET, THEN GO TO NEXT CLASS OF BUFFER*)
              IF O_NEXT_SW IN (. 1..8 .) THEN
              BEGIN
                IF CLP<2 AND (((CHECK_DEST-1) MOD 16)=((O_RR-1) MOD 16)) THEN
                  CLP_NEXT:=2
                ELSE IF CLP<2 THEN
                  CLP_NEXT:=CLP+1
                ELSE
                  CLP_NEXT:=CLP
              END
              ELSE
                CLP_NEXT:=CLP;
              IF O_NEXT_SW IN (. 1..8 .) AND CLP<2 THEN
                CLP_NEXT:=CLP+1
              ELSE
                CLP_NEXT:=CLP;
              WITH SWITCH[O_NEXT_SW],FIFO[O_NEXT_PT,CLP_NEXT] DO
                IF (NOT FIFO[PP,CLP].F_EMPTY) AND
                   (F_COUNT<(F_STOP-F_START)) AND
                   (CHECK_BIT=OPP) THEN
                BEGIN
                  OK_TRANSFER:=TRUE;
                  TERMINATE:=TRUE;
                  O_TIMER:=CLOCK+T_SWITCH;
                  O_EMPTY:=FALSE;
                END
                ELSE
                  WITH BUFFER[PTRP] DO
                  BEGIN
                    OK_TRANSFER:=FALSE;
                    NOW_SCHEDULED:=(NOW_SCHEDULED+1);
                    TERMINATE:=FALSE;
                  END;
            END; (*WHILE*)
            IF OK_TRANSFER THEN
              TRANSFER(SP,PP,CLP,PTRP,OPP,CLP_NEXT);
          END; (*EMPTY*)
        END; (*S,OP*)
END; (*POOL_TO_OUTPORT*)

PROCEDURE GROSS_DISPLAY;
BEGIN
  WRITELN('AT TIME=',CLOCK:5);
  WRITELN(' T_PACKET=',T_PACKET:7);
  WRITELN(' R_PACKET=',R_PACKET:7);
  IF R_PACKET > 1 THEN
    WRITELN(' AVERAGE DELAY=',TOTAL_DELAY/R_PACKET:10:5);
  WRITELN(' MAX DELAY= ',MAX_DELAY:5);
  WRITELN(' AVERAGE TR_DELAY= ',TR_DELAY/T_PACKET:10:5);
  WRITELN(' AVERAGE THROUGHPUT=',R_PACKET/T_SIMULAT:10:5);
  WRITELN(' UNDELIVERED PACKETS=',T_PACKET-R_PACKET:5);
  WRITELN(' MAX FIFO USAGE=',FMAX[0]:3,FMAX[1]:3,FMAX[2]:3);
  WRITELN(' TOTAL FIFO USAGE=',FMAX[0]+FMAX[1]+FMAX[2]:5);
END;

PROCEDURE DETAIL_DISPLAY;
VAR
  SW,C,P,B:INTEGER;
BEGIN
  WRITELN('BUFFER DISPLAY OF SWITCH ARRAY');
  WRITE(' S PT CL B SRC DST SBIT STEP FB GEN_T TR_T BF_T DE_T DE_?');
  WRITELN(' F_TOP F_BOTTOM MAX_USED');
  FOR SW:=1 TO N_SW DO
    WITH SWITCH[SW] DO
    BEGIN
      FOR P:=0 TO 1 DO
      BEGIN
        FOR C:=0 TO 2 DO
          WITH BUFFER_POOL[P,C] DO
          BEGIN
            IF F_FULL THEN
              WRITELN('BUFFER FULL:S,P,C ',SW:3,P:3,C:3);
            IF NOT F_EMPTY THEN
            BEGIN
              FOR B:=0 TO FIFO_SIZE-1 DO
                WITH
                     BUFFER[B] DO
                  IF NOT B_EMPTY THEN
                  BEGIN
                    WRITE(' ',SW:2,' ',P:2,' ',C:2,' ',B:2,' ',' ',B_DEST:5);
                    WRITE(B_FEEDBACK_COUNT:3,' ');
                    WRITELN(B_GENERATE_TIME:5,B_TR_TIME:5,' ',
                            F_TOP:6,F_BOTTOM:6);
                  END;
            END;
          END;
      END;
      WRITELN;
    END;
END;

PROCEDURE DEBUG;
VAR
  B:INTEGER;
BEGIN
  WRITELN('DISPLAY OF BUFFERS');
  FOR B:=1 TO F_TOTAL DO
    WITH BUFFER[B] DO
    BEGIN
      IF NOT B_EMPTY THEN
        WRITELN(B:7,B_TR_TIME:5,B_DEST:5,B_FEEDBACK_COUNT:5);
    END;
END;

BEGIN (*LSSN*)
  SEED:=8476;
  INITIALIZE;
  FOR CLOCK:=1 TO T_SIMULAT DO
  BEGIN
    TR_GET_DEST;
    INPORT_TO_POOL;
    POOL_TO_OUTPORT;
    OUTPORT_TO_INPORT;
    TR_TO_INPORT;
    IF CLOCK > T_SIMULAT-1 THEN GROSS_DISPLAY;
  END;
END.
(* INPUT FILE WHICH CONTAINS THE INTERCONNECTION PATTERN OF THE LSSN *)
W a t s o n a n d J . G u r d , " A P r o t o t y p e D a t a F l o w C o m p u t e r w i t h T o k e n L a b e l l i n g , " N a t i o n a l C o m p u t e r C o n f e r e n c e 1 9 7 9 , p p . 6 2 3 - 6 2 8 . 172 6 8 . D. P. M i s u n a s , " S t r u c t u r e P r o c e s s i n g i n a D a t a F l o w P r o c e s s o r , " P r o c e e d i n g s o f 1 9 7 6 I n t e r n a t i o n a l P a r a l l e l P r o c e s s i n g , A u g . 1976 p p . 1 0 0 - 1 0 5 . 6 9 . R. H. P e r r o t t , "A L a n g u a g e f o r A r r a y a n d V e c t o r P r o c e s s o r s , " ACM T r a n s , o n P r o g r a m m i n g L a n g u a g e a n d S y s t e m s , V o l . 1, N o . 2, O c t . 1 9 7 9 , p p . 1 7 7 - 1 9 5 . 
