Open Collections

UBC Theses and Dissertations

Cyclical processor and computer architectures for highly parallel applications 1984

Cyclical Processor and Computer Architectures for Highly Parallel Applications

by

Fut-Suan Wong

B.Eng.(Hons.), University of Singapore, 1979
M.A.Sc., The State University of New York, 1980

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
Department of Electrical Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
January, 1984
© F. S. Wong, 1984

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Abstract

During the last few decades, the search for powerful computing machines has been one of the several endless pursuits among the scientific community. In this thesis, several novel architectural ideas for the designs of high-performance computing machines are presented, and the practicality and usefulness of cyclical architectures (ones which have their hardware resources cyclically arranged) in this respect are examined. These ideas are illustrated with specific application examples including parallel sorting, packet-switched communications and the design methodology of a class of next-generation computers.

In the first part of our studies, the structure and control algorithms of a single-chip, recirculating systolic sorter (RSS) are presented. The correctness of the algorithms is proved, and general operational constraints are derived. This parallel sorter is highly amenable to VLSI implementations because of the simple control structure and the regular, repetitive and near-neighbour type of interconnections required. The number of quadruple comparators needed to sort N items is N/4, and the average sorting time is found to be bounded by (log N)^2 and N. A hardware termination is incorporated into the control unit of the sorter, so that the sorting process can be terminated as soon as the input list is in the desired order.

In the second part of our studies, a novel loop-structured switching network (LSSN) is presented. It is intended for packet communications in large-scale systems consisting of hundreds to thousands of interconnected devices. With L loops, where L is a power of two, it can connect up to N = L(log L) pairs of transmitters and receivers, using only N/2 two-by-two switches; in terms of switch counts and the amounts of wiring, this network is very advantageous when the value of N is large. It can be extended incrementally, and is free of the store-and-forward type of deadlocks which prevail in other cyclical, packet-switched networks. Our simulation results show that its average throughput rate and delay are close to those of other designs despite its relatively low switch count.

In the third part of our studies, a new design methodology for the next-generation computers is described. Our proposed system, the Event-Driven Computer (EDC), is primarily a data-driven system which has its computing resources arranged as a circular pipeline, and it is supplemented with control-driven activities. Such a combined approach is aimed at extracting the advantages of both the "pure" data-driven and control-driven computations while alleviating their shortcomings. Compared to other designs, an EDC has the merits of a simpler architecture, better resource utilization, array processing capabilities and a higher speed range.

As is shown by our studies, the properties of the cyclical architectures depend greatly on how the information packets interact with each other; deadlocks, for instance, will occur on systems such as the Loop-Structured Switching Network because of the asynchronous, circular requests of network resources by the packets (we have, however, presented a deadlock avoidance scheme), but will not occur on synchronous systems such as the Recirculating Systolic Sorter.
In general, the resource utilization of the cyclical architectures is higher than that of the acyclic ones (or, equivalently, the cyclical architectures can handle larger amounts of information with relatively smaller areas); they are therefore more suitable to the designs of very large-scale systems.

Key phrases: computer architectures, next-generation computing, systolic arrays, parallel sorting, packet-switched networks, store-and-forward deadlocks, data-driven and control-driven computations.

Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Nomenclature
Acknowledgements
Chapter I. Introduction
  I.1. Background Information
  I.2. Cyclical Architectures
  I.3. Objectives and Scope of the Thesis
Chapter II. A Systolic Processor for Parallel Sorting
  II.1. Introduction
  II.2. The Recirculating Systolic Sorter (RSS)
    A. Network Description
    B. The Quadruple Comparator
    C. The Comparison/Exchange/Shift Operations
  II.3. The RSS Algorithms
    A. Algorithm I
    B. Algorithm II
    C. Examples
  II.4. Operational Constraints
    A. Constraints on the Size of RSS
    B. Marking Scheme A
    C. Marking Scheme B
  II.5. Analysis of the RSS Algorithms
    A. Analogy with the Odd-Even Transportation Sort
    B. Correctness of the RSS Algorithms and Marking Schemes
    C. Correctness of the Termination Method
    D. Timing Complexities
  II.6. Discussions
Chapter III. A Novel Loop-Structured Switching Network (LSSN)
  III.1. Introduction
  III.2. Network Topology
    A. Addressing Scheme and Connection Function
    B. Routing Scheme
  III.3. Network Properties
    A. Network Conflicts
    B. Deadlocks and Avoidance Method
    C. Network Extensibility
  III.4. Simulations and Performance Analysis
  III.5. Discussions and Outlook
Chapter IV. Design and Evaluation of the Event-Driven Computer (EDC)
  IV.1. Introduction
    A. Background Information
    B. Recent Developments
    C. Overview of Our Approach
  IV.2. The EDC Hardware Architecture
    A. Processing Modules
    B. Storage Modules
    C. Switches
  IV.3. The EDC Information Structure
    A. Machine Code Formats
    B. Packet Formats
    C. Program Organization
    D. Data Structures
    E. Process and Resource Management
  IV.4. The EDC Programming Language Structure
    A. Statements and Program Blocks
    B. Language Constructs for Array Processing
  IV.5. Performance Analysis
    A. Flow Analysis of EDC
    B. Example
    C. Considerations for Generalized Computations
  IV.6. Discussions and Outlook
Chapter V. Conclusions
  V.1. Summary of Results
  V.2. General Discussions
  V.3. Suggestions for Further Work
Appendix A
Appendix B
Appendix C
References
Vita

List of Figures

1. Fig. I.1. The cyclical configuration
2. Fig. II.1. The Recirculating Systolic Sorter (RSS)
3. Fig. II.2. The control unit of RSS
4. Fig. II.3. The schematic diagram of a quadruple comparator
5. Fig. II.4. Symbols used for comparison and shift
6. Fig. II.5. The four operations performed by the quadruple comparators
7. Fig. II.6. An example to illustrate RSS Algorithm I and Marking Scheme A
8. Fig. II.7. An example to illustrate RSS Algorithm II and Marking Scheme B
9. Fig. II.8. The odd-even sorter
10. Fig. II.9. The three indexes: i, j, and J, and the initial marker position M(i)
11. Fig. II.10. The horizontal comparisons carried out on the RSS array
12. Fig. II.11. The number of comparison cycles versus the number of items to be sorted (Algorithm I)
13. Fig. II.12. A general-purpose computer system with special-purpose chips attached [19]
14. Fig. III.1. Assignment of loop and link labels on a LSSN which has 16 loops and 32 switches
15. Fig. III.2. Connection of transmitting and receiving devices on a LSSN with 16 loops
16. Fig. III.3. The schematic diagrams of a Type-A switch
17. Fig. III.4. A 16x16 baseline network
18. Fig. III.5. Effects of buffer size on the throughput and delay of a 64x64 LSSN
19. Fig. III.6. The throughput rates of a 64x64 baseline, a 64x64 LSSN and a 16x16 baseline, versus the inter-arrival time
20. Fig. III.7. The delay curves of a 64x64 baseline, a 64x64 LSSN and a 16x16 baseline, versus the inter-arrival time
21. Fig. IV.1. EDC system block diagram
22. Fig. IV.2. The connection diagram of EDC hardware architecture
23. Fig. IV.3. The schematic diagram of a PSN switch
24. Fig. IV.4. Parameter passing between the calling program M and the called program P
25. Fig. IV.5. The interactions between calling programs and a task program
26. Fig. IV.6. The physical and logical arrangements of the EDC memory system
27. Fig. IV.7. The implementation of a resource manager using a task program
28. Fig. IV.8. A "Begin/End" block and its data-flow graph
29. Fig. IV.9. An "IF" block and its data-flow graph
30. Fig. IV.10. A "Match" block and its data-flow graph
31. Fig. IV.11. A "Loop" block and its data-flow graph
32. Fig. IV.12. The statement, data-flow graph and machine code of a parallel vector operation
33. Fig. IV.13. The statement, data-flow graph and machine code of a reduction operation
34. Fig. IV.14. The statements of some alignment operations, and the data-flow graph and machine code format of the "SHIFT" operation
35. Fig. IV.15. The MARPf, MATR and MARPc,max curves of the given example

List of Tables

1. Table II.1. Requirements of the RSS marking schemes
2. Table II.2. Complexities of sorting networks
3. Table III.1. Scalar operations
4. Table III.2. Compound operations
5. Table III.3. The "Operand/Next instructions" fields of scalar operations
6. Table III.4. The "Operand/Next instructions" fields of compound operations
7. Table III.5. The formats of instruction packets
8. Table III.6. The formats of result packets

Nomenclature

ADT : Array Description Table
ARS : Average routing steps
C : Number of columns
CS : Channel Selector
EDC : Event-Driven Computer
GCU : Global Control Unit
i, I : Index (short form of "Instruction" when used as subscript)
j, J : Index
k, K : Index
IRs : Instruction Registers
L : Number of loops used in LSSN
LIT : Linkage Information Table
LMs : Local Memories
LSSN : Loop-Structured Switching Network
M : Number of Local Memories
MATR : Maximum Average Throughput Rate
MARP : Maximum Acceptance Rate of Packets
MIMD : Multiple-Instruction and Multiple-Data (computer systems)
N : Number of input
p, P : Number of processors
PDF : Piece-wise Data Flow (computer)
PSN : Packet-Switched Network
R : Number of rows or Receiving Processors (short form of "Result" when used as subscript)
RL : Request List
RPs : Receiving Processors
Rr : Receiver
RSS : Recirculating Systolic Sorter
SIMD : Single-Instruction and Multiple-Data (computer systems)
SISD : Single-Instruction and Single-Data (computer systems)
SM : System Memory
SP : Supervisory Processor
SUT : Storage Utilization Table
SW : Switch
T : Number of Transmitting Processors
TPs : Transmitting Processors
Tr : Transmitter

Acknowledgements

I sincerely thank my supervisor, Dr. M. R. Ito, for his patient help and guidance during the course of my graduate program. I would also like to thank Dr. Chanson, Dr. Schrack and Dr. Vuong for their services as members of my supervisory committee. I am also thankful to my past and current officemates for making my stay on this campus a memorable one.

As for my financial support, I am grateful for the research assistantships provided by my supervisor, the teaching assistantships provided by the Department of Computer Science, and the awards provided by the Lee Foundation of Singapore.

Chapter I. Introduction

1. Background Information

The demand for high-speed computation is ever-increasing, particularly among the scientific community engaged in large-scale computation such as weather forecasting, real-time battlefield assessment, artificial intelligence and simulations of very large and complex processes. While conventional computer systems can handle many of the current demands, they suffer from certain drawbacks, ranging from software obesity to hardware inextensibility, which severely restrict their usefulness in the design of the so-called "fifth-generation" computers [1] which are currently being planned for future very large-scale applications.
The first four generations of computers are commonly distinguished by their constituent technologies: vacuum tubes, transistors, integrated circuits and, currently, very large-scale integration (VLSI). Central to the fifth-generation concept is a break with the conventional computer architecture, sometimes referred to as the Von Neumann architecture, that has prevailed in the first four computer generations [2]. Several classes of computer architectures have been proposed for the next-generation computers, including tree structures, square and cube arrays, pipelines, systolic arrays [3], data-driven systems [4], demand-driven systems [5] and dynamic structures [55,57,61]. As of today, none of these architectures has yet evolved to become the single, dominant basis of research work in this area.

In this thesis, we will look into another interesting design methodology, cyclical architectures, for highly parallel applications, and several ideas based on the concept of cyclical architectures will be proposed. Our new designs will also incorporate the fundamental principles of systolic, packet communications, data-driven and control-driven systems.

Systolic systems are characterized by their data-flow pattern: rhythmic data movements analogous to the pulsations in the arteries caused by the recurrent contractions of the heart. Because of their simple, highly repetitive structures, systolic systems are very amenable to VLSI implementations. The algorithms of many specialized applications, such as the Fast Fourier Transform and matrix multiplications, have been proposed for systolic computation [3].

Packet communications are traditionally meant for computer systems which are geographically apart and interconnected via local networks; but recently, they have also been proposed for multiprocessor systems consisting of tens to thousands of closely interconnected processing and storage modules. Examples are data-driven computers, in which instruction executions are triggered by the arrivals of input operands which are encapsulated into the form of packets, and networks are used to convey these packets among the hardware modules.

Data-driven computers have recently received enormous attention due to the simplicity with which they exploit asynchronous parallelism; on the other hand, they do not take advantage of the simple control structure that exists in array computation, and some inherently sequential activities do not conform naturally to the notion of data-driven computation.
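The data-driven firing rule described above (an instruction executes once all of its operand packets have arrived, and its result travels onward as new packets) can be illustrated with a minimal sketch. This is not the EDC design or its packet format; the names Instruction, build_program and run, and the (destination, slot, value) packet layout, are hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable
import operator


@dataclass
class Instruction:
    op: Callable[..., int]      # operation performed when the instruction fires
    arity: int                  # number of operand packets required
    dests: list                 # (instruction name, operand slot) pairs for the result
    slots: dict = field(default_factory=dict)  # operands received so far


def build_program():
    # Hypothetical data-flow graph for (a + b) * (c - d).
    return {
        "add": Instruction(operator.add, 2, [("mul", 0)]),
        "sub": Instruction(operator.sub, 2, [("mul", 1)]),
        "mul": Instruction(operator.mul, 2, []),
    }


def run(program, packets):
    pending = deque(packets)    # operand packets: (destination, slot, value)
    results = {}
    while pending:
        name, slot, value = pending.popleft()
        instr = program[name]
        instr.slots[slot] = value
        if len(instr.slots) < instr.arity:
            continue            # still waiting for operands: do not fire
        args = [instr.slots[i] for i in range(instr.arity)]
        instr.slots.clear()
        out = instr.op(*args)
        if not instr.dests:
            results[name] = out  # no successor: treat as a program output
        for dest, dest_slot in instr.dests:
            pending.append((dest, dest_slot, out))
    return results


# Operand packets may arrive in any order; no explicit sequencing is given.
packets = [("sub", 1, 4), ("add", 0, 2), ("sub", 0, 7), ("add", 1, 3)]
print(run(build_program(), packets))
```

Note that the order in which "add" and "sub" execute is decided only by packet arrival; no program counter or explicit control signal appears anywhere in the sketch.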
In the conventional, control-driven computers, instruction executions are sequenced explicitly by control signals generated by the central processing units. In contrast to data-driven systems, they are more advantageous in handling array computation because they make use of the simple control structures which exist in array computation. On the other hand, the exploitation of parallelism in control-driven systems is more difficult, because explicit control signals are needed to specify the branching and merging of execution paths, which could otherwise be done implicitly in data-driven systems by operand packets which are sent among the instructions. More details of these various systems will be provided in the following chapters.

We believe that in order to gain significant improvement in the computation speed over existing computer systems, the new designs may have to depart from the prevalent sequential computation in both hardware and software to various extents. In other words, some of the existing development tools such as off-the-shelf components, compiler techniques, etc., may not be useful in our designs; for these reasons, we will only emphasize the architectural aspects but not any immediate implementation.
Throughout this dissertation, the term "processor" is used to denote a piece of passive hardware capable of only primitive operations; on the other hand, "computer" refers to a full-fledged machine capable of executing high-level operations; "highly parallel" or "next-generation" applications are those containing large amounts of both synchronous and asynchronous, high- and low-level computation which can be performed in parallel, such as the examples quoted at the beginning of this chapter.

2. Cyclical Architectures

The rationale of our advocacy of cyclical architectures is based on the behaviour of program executions. As exhibited in the execution cycles of instructions as well as the "DO-LOOPS" which exist in nearly all scientific and business-oriented programs, the ways in which most programs are executed are basically cyclical in nature. It is therefore very natural to envision a class of architectures which have their resources arranged into a cyclical configuration as follows:

[Figure: Input -> Computation path (storage, processors and switches) -> Output, with a Feedback path returning from the output side of the computation path to its input side.]

Fig. I.1. The cyclical configuration.

The main computation path in Fig. I.1 consists of both processing and switching elements, and either shift-registers or memory words are used for buffering and storage purposes. The information which goes through the feedback path consists of packets of data, control signals or both, depending on the applications.
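The cyclical configuration of Fig. I.1 can be mimicked in a few lines: one computation stage is reused several times by feeding packets back to its input, whereas an acyclic arrangement needs one copy of the stage per pass. The sketch below is only an illustration under stated assumptions; the function names, the toy stage, and the idea of tagging each packet with a remaining-pass count are hypothetical and are not taken from the thesis.

```python
from collections import deque


def cyclical_run(stage, inputs, passes):
    """One stage of hardware, reused `passes` times through a feedback path."""
    # Each packet carries its value and the number of passes still required
    # (an assumption of this sketch, used to decide when a packet may leave).
    loop = deque((value, passes) for value in inputs)
    outputs = []
    while loop:
        value, remaining = loop.popleft()
        value = stage(value)                      # computation path
        remaining -= 1
        if remaining == 0:
            outputs.append(value)                 # packet leaves at the output
        else:
            loop.append((value, remaining))       # feedback path
    return outputs


def acyclic_run(stages, inputs):
    """Equivalent acyclic pipeline: one physical stage per pass."""
    values = list(inputs)
    for stage in stages:
        values = [stage(v) for v in values]
    return values


def toy_stage(x):
    # Arbitrary stand-in for the work done by the computation path.
    return 2 * x + 1


print(cyclical_run(toy_stage, [1, 2, 3], passes=3))   # uses 1 stage
print(acyclic_run([toy_stage] * 3, [1, 2, 3]))        # uses 3 stages
```

Both calls produce the same values; the difference is that the cyclical version occupies a single stage for three passes, while the acyclic version needs three stages that are each used once.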
Current r e s e a r c h i n v o l v i n g such c y c l i c a l a r c h i t e c t u r e s c o u l d be broadly c l a s s i f i e d i n t o three areas depending on the nature of the feedback s i g n a l s : (1) S p e c i a l - p u r p o s e p r o c e s s o r s attached to host computers: Examples are p r o c e s s o r s f o r the Fast F o u r i e r Transform [6,7] and matrix t r a n s p o s i t i o n [ 9 ] . For t h i s area of a p p l i c a t i o n , the purpose of feedback i s to allow f u r t h e r i n t e r a c t i o n s among the data elements and a l s o to re-use the resources along the computation path. 6 (2) I n t e r c o n n e c t i o n networks f o r processor-to-memory or p r o c e s s o r - t o - p r o c e s s o r communications: Examples are s i n g l e - s t a g e d shuffle-exchange networks [9,12] and m u l t i - s t a g e d shuffle-exchange networks [20,21,22], For t h i s area of a p p l i c a t i o n s , the s o l e purpose of feedback i s to re-use the res o u r c e s ; there i s no i n t e r a c t i o n s among the data. (3) F u l l f l e d g e d , high-performance computers: With only a few exceptions [28,60], n e a r l y a l l d a t a - d r i v e n systems are based on the c y l i c a l c o n f i g u r a t i o n [4,5]. For t h i s area of a p p l i c a t i o n s , packets are fed back as the r e s u l t of the completion of i n s t r u c t i o n c y c l e s ; and new i n s t r u c t i o n packets are brought i n t o the computation path when c e r t a i n r e s u l t packets are r e c e i v e d at the end of the feedback path. If a system c o u l d be implemented with e i t h e r the c y c l i c a l or the a c y c l i c c o n f i g u r a t i o n , then the r e l a t i v e m e r i t s and demerits of the two c o n f i g u r a t i o n s are as f o l l o w s . 
In g e n e r a l , the c y c l i c a l c o n f i g u r a t i o n would give r i s e t o a b e t t e r resource u t i l i z a t i o n than the a c y c l i c one, because i t s resou r c e s c o u l d be used r e p e a t e d l y by means of feedback; t h i s f e a t u r e would i n c u r tremendous savings i n system r e s o u r c e s , e s p e c i a l l y when the s i z e of the system i s very l a r g e . T h e r e f o r e , i f the e n t i r e system i s to be c o n s i d e r e d f o r f a b r i c a t i o n on a s i n g l e i n t e g r a t e d - c i r c u i t c h i p , i t s c y c l i c a l c o n f i g u r a t i o n would be a b e t t e r c h o i c e . On the other hand, 7 the c o n t r o l of the c y c l i c a l c o n f i g u r a t i o n i s u s u a l l y more d i f f i c u l t : i n some systems, masking b i t s are needed to d i s a b l e a subset of the p r o c e s s i n g r e s o u r c e s [9]; while i n o t h e r s , feedback counts are r e q u i r e d to separate the feedback s i g n a l s from the incoming ones. I f the c y c l i c a l c o n f i g u r a t i o n s are used asynchronously (e.g., as packet-switched communications networks) , then they would be s u s c e p t i b l e to the store-and- forward type of deadlocks due to c i r c u l a r requests of re s o u r c e s . Another important c h a r a c t e r i s t i c of packet- switched, c y c l i c a l systems i s t h e i r lack of res p o n s i v e n e s s , because when i n t e r r u p t s occur, the computation path c o u l d a l r e a d y be congested with i n f o r m a t i o n packets such that the i n t e r r u p t s cannot be processed immediately. 3. O b j e c t i v e s and Scope of the T h e s i s The main o b j e c t i v e of t h i s t h e s i s i s t o advocate c y c l i c a l a r c h i t e c t u r e s as the b a s i c design p r i n c i p l e of a c l a s s of high-performance systems. 
Our ideas will be demonstrated by specific applications including parallel sorting, packet-switched communications and the design of a novel computer, all of which are of current research interest. The advantages of our designs relative to others will be discussed, and the methods to resolve the various aforementioned demerits of cyclical architectures will be presented.

In Chapter II, we will present a recirculating systolic sorter (RSS) which is designed as a single-chip, parallel sorting module to be attached to a host computer. The sorting algorithms, design of the controller, and relative merits of the RSS will be detailed. Chapter III will describe a loop-structured switching network (LSSN) intended for communications in packet-switched, multiprocessing environments. The topology, properties and performance analysis of LSSN will be discussed, and the occurrence and resolution of deadlocks will be presented. Chapter IV will outline the design of the Event-Driven Computer (EDC), which is primarily a data-driven system supplemented with control-driven activities. The rationale of design, hardware and software organizations and performance of EDC will be addressed. General discussions and suggestions for further work will be given in Chapter V.

Chapter II. A Systolic Processor For Parallel Sorting

Abbreviations:
N: Number of input items
C, Column#: Number of columns
R, Row#: Number of rows
P, Comparator#: Number of comparators
i: Loop index
j: Moving position index
J: Fixed position index
M(i): Initial marker's position in loop i
t: Comparison cycle time
"*": A marker

1. Introduction

Sorting has been an important operation in business and computer engineering applications [13]. Many standard and novel sorting algorithms can be found in the literature [9-17]; some of them are optimal in time complexity, some in the number of comparators used, while others lay emphasis on architectural design, i.e., processor interconnections, data flow, control strategies and implementation technologies. In this chapter, we present a parallel sorting network which embodies the concepts of both the cyclical architectures and the systolic systems [3]. Systolic systems are characterized by their data flow pattern: once data are loaded from the memories, they and/or their intermediate results move within the system along predetermined paths provided among the processing elements, and every element accepts and distributes data from and to its neighbours in a rhythmic fashion analogous to the pulsations in the arteries caused by the recurrent contractions of the heart.
A major advantage of such systems lies in the fact that processor-memory communications are involved only during the loading of the input data and unloading of the final results; therefore, there is no delay due to bus contention and memory access conflicts during the computation time.

This study will demonstrate that a cyclical architecture coupled with systolic data movements can perform the useful task of sorting. Because of the highly regular interconnection and the simple control and addressing structures, the area required by this design is very compact, and hence it is highly amenable to VLSI implementation. A description of the recirculating systolic sorter (RSS) will be given in Section 2, and the sorting algorithms in Section 3. The constraints on RSS will be discussed in Section 4, while Section 5 will analyse the RSS algorithms and their timing complexities. The relative merits of RSS will be compared and discussed along with other designs in Section 6.

[Figure: the systolic array and the I/O switch.]
Fig. II.1. The Recirculating Systolic Sorter (RSS)

[Figure: the control unit, with a sequencer, clock, reset, counter, comparator and a register holding 2*Column#; signals run to and from the systolic array and to the I/O switch, including Terminate.]
Fig. II.2. The control unit of RSS.

2. The Recirculating Systolic Sorter (RSS)

2.A. Network Description

A schematic diagram of the proposed sorter RSS is given in Fig. II.1. The RSS network consists of an array of "quadruple" comparators which are arranged into R rows and C columns. The whole array is articulated by 2*R circular loops as shown. Each of the quadruple comparators holds and sorts four input items during a comparison cycle, except those situated at the top and bottom and located in the odd-numbered columns of the array, where only either the upper or the lower portion of these comparators is involved in the sorting process.

During the initial loading phase, all the loops are opened at the Input/Output switch and connected to the input lines; data items enter the network through the loops in a serial manner, with neighbouring loops shifted in opposite directions. After the network has been loaded, the input lines are disconnected and all the loops are closed.

Before sorting commences, the comparator array has to be "marked", the sole purpose of which is to place a marker in a certain position within each loop, to indicate the beginning and end of that loop. The convention of marking adopted here is that the "head" of each loop is associated with a marker, and the position on the right-hand side of the marker is regarded as the "tail" of that loop. The reader may refer to the examples given in Section 3 for illustrations; in these examples, asterisks are used to represent markers. Notice that the marking schemes, i.e., the ways to place the markers on the array prior to the first cycle of sorting, are different for the two examples, and they will be referred to as Scheme A and Scheme B respectively.
After the marking procedure, one of the proposed RSS algorithms is applied to the array. During a comparison cycle, input data are compared and exchanged within the quadruple comparators. If a pair of data items has to be exchanged, then their associated markers, if there are any, do not move with them but remain where they are. However, between successive comparison cycles, when the data are shifted, the markers are shifted along with the data with which they are associated.

A schematic diagram of the control unit used is presented in Fig. II.2. This unit generates the control signals (i.e., "Opcode" in Fig. II.2) to indicate one of the operations to be performed by the comparators: (1) Vertical-comparison; (2) Horizontal-comparison; (3) Diagonal-comparison and (4) Shift-operation. At the end of each comparison cycle, the control unit tests the status of the array (i.e., "Exchange/No-Exchange") to see whether any exchange has taken place during that cycle. It also has a cycle counter which keeps track of the current number of consecutive "No-Exchange" cycles. In other words, the content of the counter is incremented upon entering a new cycle, and is reset whenever there is at least one exchange in that cycle; when the count reaches twice the number of columns (i.e., Count=2*C), a termination signal is generated. At this stage, the input items have been sorted into a linear list.
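The counter logic of the control unit can be sketched in isolation from the comparator array. The following is a minimal illustration only, not the thesis's hardware design: the names `CycleCounter` and `cycles_until_terminate` are hypothetical, and the per-cycle "Exchange/No-Exchange" status is assumed to be supplied as a list of booleans rather than read from an array.

```python
class CycleCounter:
    """Counts consecutive comparison cycles in which no exchange occurred."""

    def __init__(self, columns):
        self.limit = 2 * columns  # terminate when the count reaches 2*C
        self.count = 0

    def end_of_cycle(self, exchanged):
        """Update the count after one cycle; return True when sorting should stop."""
        if exchanged:
            self.count = 0       # any exchange resets the counter
        else:
            self.count += 1      # one more consecutive No-Exchange cycle
        return self.count >= self.limit


def cycles_until_terminate(exchange_flags, columns):
    """Return the 1-based cycle at which Terminate is raised, or None if never."""
    counter = CycleCounter(columns)
    for cycle, exchanged in enumerate(exchange_flags, start=1):
        if counter.end_of_cycle(exchanged):
            return cycle
    return None


# With C=3, six consecutive No-Exchange cycles are needed before termination.
flags = [True, False, True] + [False] * 6
print(cycles_until_terminate(flags, columns=3))  # prints 9
```

In this sketch the counter is updated at the end of a cycle instead of on entry to the next one; under the assumption that the status is tested once per cycle, the two formulations raise the termination signal at the same cycle.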
As demonstrated in the examples of Section 3, the first items of the sorted lists are accompanied by asterisks in the uppermost loops, and the last items are on the right-hand side of the asterisks in the lowest loops.

2.B. The Quadruple Comparator

The quadruple comparators have a higher logic density than the conventional, binary sorters used in other networks, but the number of input/output lines per comparator of the former is only slightly more than that of the latter. Fig. II.3 gives a sketch of the input/output configuration of a quadruple comparator.

[Figure: a quadruple comparator. Upper loop: data in, marker in, data out, marker out. Lower loop: data out, marker out, data in, marker in.]
Fig. II.3. The schematic diagram of a quadruple comparator.

In addition to the two sets of input and two sets of output data lines, there are four single-bit lines used for shifting markers along the two loops connected to the comparator; in addition, one line carries the clock signal, one line indicates whether any exchange has taken place during the current comparison cycle, and two lines carry the opcodes.

Essentially what a comparator unit accomplishes is the following. If it is located in an odd-numbered column, then it will push the smallest of the four data items which it holds to its upper-right neighbour; if it is in an even-numbered column, then it will retain the smallest and the largest items in its upper-right and lower-left positions respectively.
However, when markers are present inside the comparator, the situation becomes somewhat different and will be described in the next subsection.

2.C. The Comparison/Exchange/Shift Operations

For convenience of illustration, the following symbols will be used throughout this chapter:

[Figure: (i) direction of comparison, drawn from a tail to a solid arrow head; (ii) direction of shift, drawn as an open arrow.]
Fig. II.4. Symbols used for comparison and shift.

The direction of comparison is used to show the ordering of items after each comparison. In Fig. II.4(i), the solid arrow head indicates the position of the larger item for an ascending order; if, on the other hand, a descending order is desired, then the arrow head indicates the smaller one. Without loss of generality, the ascending order will be assumed in this study. The open arrow of Fig. II.4(ii) is used to indicate the direction of movement for both the items and the markers during the "Shift" operations.

The four operations performed by a comparator are depicted in Fig. II.5 and described below:

1. Vertical-comparison: The two items on the upper portion of the comparator are compared to the two at the bottom in parallel, with the directions of comparison pointing downward. The presence of markers is ignored.

2. Horizontal-comparison: Case (i) When no marker is inside the comparator: the two items on the right portion of the comparator are compared to the two on the left in parallel, with the directions of comparison pointing to the left; Case (ii) When one or two markers are present: when a marker appears on the left portion of the comparator, the corresponding direction of comparison points to the right; otherwise it points to the left.

Note that in the horizontal comparisons, the direction of comparison always points from right to left according to the convention adopted, except when both the head and the tail of a loop are involved in the comparison, i.e., when the marker appears on the left portion of the comparator; in that case the direction is reversed. This reversal prevents the minimum and maximum items in a loop from crossing over each other, and it is achieved by the action taken in Case (ii) above.

3. Diagonal-comparison: The two items on the upper portion of the comparator are compared to the two at the bottom in parallel, with the directions of comparison pointing downward and crossing each other. At first glance, the diagonal comparison involving the top-right and lower-left items seems redundant, because these two items are already in order after the vertical and horizontal comparisons; however, it is useful when two markers appear on the left portion of the comparator simultaneously. Furthermore, the top-left/bottom-right comparison provides an exchange not provided by the combination of the vertical and horizontal comparisons.
4. Shift: Case (i) if the comparator is located in an even-numbered column, then its top two items are shifted to the left and its lower two items to the right; Case (ii) if the comparator is located in an odd-numbered column, its top two items are shifted to the right and its lower two items to the left.

[Figure: the four operations and their actions. 1. Vertical-comparison; 2. Horizontal-comparison, case (i) no marker is involved, case (ii) markers are involved; 3. Diagonal-comparison; 4. Shift, case (i) for comparators in even columns, case (ii) for comparators in odd columns. Each operation is labelled with its time.]
Fig. II.5. The four operations performed by the quadruple comparators.

3. The RSS Algorithms

3.A. Algorithm I

This algorithm involves only the "Vertical-comparison", "Horizontal-comparison" and "Shift" operations but not the "Diagonal-comparison", and is described in the following program fragment written in Pascal:

    Program Recirculating-Systolic-Sorter;
    ...
    Var Terminate         : boolean;
        Column#           : integer;
        Row#              : integer;
        Comparator#       : integer;
        Exchange          : boolean;
        Count-No-Exchange : integer;
    (*Initialization*)
    ...
    While NOT Terminate do (*enter next cycle of comparison*)
    Begin
        For C:=1 to Comparator# do
        Begin
            Vertical-comparison;
            Horizontal-comparison;
        End;
        Check-Terminate;
        Shift;
    End;
    (Algorithm I)

The procedure "Check-Terminate" manipulates the following global variables:

1. "Exchange" - This boolean variable is always reset to "False" before a new comparison cycle commences, and is set to "True" if any exchange takes place during the cycle.

2. "Count-No-Exchange" - This variable keeps track of the number of consecutive cycles which have no exchange, and is reset to zero whenever "Exchange" equals "True".

3. "Terminate" - This boolean variable controls the "WHILE-DO" loop, and is set to "True" if the following condition is satisfied:

Condition(1) (for termination): Count-No-Exchange >= 2*Column#

3.B. Algorithm II

This algorithm is similar to Algorithm I except that the "Diagonal-comparison" operation is included in its "WHILE-DO" loop:

    Program Recirculating-Systolic-Sorter;
    ...
    While NOT Terminate do (*enter next cycle of comparison*)
    Begin
        For I:=1 to Comparator# do
        Begin
            Vertical-comparison;
            Horizontal-comparison;
            Diagonal-comparison;
        End;
        Check-Terminate;
        Shift
    End;
    (Algorithm II)

3.C. Examples

Two examples using three columns, three rows and eight comparators (i.e., C=3, R=3, P=8) are presented in Fig. II.6 and Fig. II.7. After the initial loading and marking procedures, Algorithms I and II are applied to the first and second examples respectively. The contents of the comparator array are shown for the first and the last two cycles. Both input lists are sorted into ascending order. At the end of the last cycle, the minimum of each loop is indicated by the markers and the direction of increasing values is from right to left. All the numbers in a given loop are greater than or equal to those in the next loop above.

[Figure: contents of the comparator array at cycle times 1, 47 and 48, shown after the vertical comparison, the horizontal comparison and the shift, ending with the sorted list.]
Fig. II.6. An example to illustrate RSS Algorithm I and Marking Scheme A.

[Figure: contents of the comparator array at cycle times 1, 18 and 19, shown after the vertical comparison, the horizontal comparison, the diagonal comparison and the shift, ending with the sorted list.]
Fig. II.7. An example to illustrate RSS Algorithm II and Marking Scheme B.

4. Operational Constraints

4.A. Constraints on the Size of RSS

Most sorting networks impose certain constraints on the size of the networks. For example, Batcher's bitonic sorter [13] requires that the number of its input lines be a power of two, and some mesh sorters [14,15] work on square arrays only. The basic constraint of the RSS array appears to be less stringent:

Requirement(1): Column# >= 2
                Row# >= 1

Further constraints may or may not be required depending on the marking scheme used. For Schemes A and B described below, Requirement(1) is sufficient to guarantee correct operation of both RSS algorithms when Scheme A is used, but an additional constraint on the size of RSS, which will be given later on, is needed when Scheme B is used.

4.B. Marking Scheme A

It is observed from our simulation studies that only certain ways of marking the array can guarantee correct results, and one such way is given below.

Marking Scheme A: The initial marker position, M(i), of loop i is:

    M(i) := 4*i - 2 + 1 - M(i-1)

where i=1,2,...,2*Row#-1, 0 < M(i) <= 2*Column#, and M(0) can be any value in the range of M(i).

Scheme A is applied to the example of Fig. II.6, where M(0)=1, M(1)=3-1=2, M(2)=5-2=3, M(3)=9-3=6, M(4)=7-6=1, M(5)=3-1=2, and the pattern repeats. If there are only two columns, then M(3)=6 MOD (2*Column#)=2. The rationale behind this scheme will be explained in Section 5.B.

4.C. Marking Scheme B

In the second scheme, the markers are placed along the two sides of the comparator array, as demonstrated in Fig. II.7. This method is simpler, and we may use the Input/Output lines to insert the markers; also, the retrieval of the final sorted list is easier than when Scheme A is used. However, this scheme requires that the number of columns of the RSS array be twice an odd integer, or the next higher integer of that value:

Marking Scheme B:
    M(i) := 1,         for i=even
         := 2*Column#, for i=odd.

Requirement(2) (for Scheme B only):
    Column# := 2*(An odd integer), or
            := 2*(An odd integer)+1

5.
Analysis of the RSS Algorithms

5.A. Analogy with the Odd-Even Transposition Sort

The RSS algorithms bear some resemblance to the Odd-Even Transposition Sort [11]; therefore, a brief explanation of the Odd-Even sorter is helpful in analysing the RSS design.

[Figure: the Odd-Even sorter, with the inputs a(0) to a(N-1) on the left, stages s=0 to N-1 across, and the sorted output on the right.]
Fig. II.8. The Odd-Even Sorter.

In Fig. II.8, the appearance of an arrow indicates the presence of a conventional, binary sorter situated at that position. An item a(j) will be compared to another item a(j') at stage s if

    j' = j + (-1)**(j+s)    (II.1)

where j', j and s are all greater than or equal to zero and less than N, and N is the number of input lines. The value of j' alternates between (j+1) and (j-1) as s is incremented. This sorter guarantees correct sorting of N items in N cycles [11], but it requires a total of N*(N-1)/2 sorters; it is therefore impractical if N is large.

5.B. Correctness of the RSS Algorithms and Marking Schemes

In this section, we will first prove that the RSS algorithms are correct, and then the two marking schemes will be derived. In Lemma (II.1), we will examine the effects of the RSS algorithms on each loop of the RSS array, temporarily ignoring the interactions among the loops; then Theorem (II.1) will show that with these interactions, a complete sorting process can be achieved.

Lemma (II.1): The Odd-Even Transposition Sort is performed on each of the RSS loops when either Algorithm I or II is applied to the RSS array.

Proof: Let us consider three indexes i, j and J on the RSS array. As demonstrated in Fig. II.9, i (=0,1,...,2*R-1) indexes the loops of the RSS array; J (=1,2,...,2*C) indexes the fixed positions of the array; and for loop i, M(i) indicates the initial position of the marker of the loop, and j (=0,1,...,2*C-1) indicates the distance of a position away from the marker. Because the markers are shifted with time, j is a function of time and is related to the other indexes as follows:

    j = [(M(i)+2C-J) + (2C+(t*(-1)**i) MOD 2C)] MOD 2C    (II.2')
      = [4C + M(i) - J + (t*(-1)**i) MOD 2C] MOD 2C        (II.2)

[Figure: the RSS array for loops i=0 to 7, showing the fixed position index J, the moving position index j and the initial markers (*).]
Fig. II.9. The three indexes: i, j, and J, and the initial marker position M(i).

[Figure: loops 0 to 2R-1, each holding items a(i,0)* to a(i,2C-1), at cycle times t=0, 1, 2, ..., 2C-1, leading to the sorted output.]
Fig. II.10. The horizontal comparisons carried out on the RSS array.

The first composite term in expression (II.2') shows the effect of the initial marker's position on j, and the second composite term is due to the effect of time indexed by t. The modulo functions are used to trim the values of t and j because both of them are repetitive with a period of 2C.
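The two relations introduced so far, the pairing rule (II.1) of the Odd-Even sorter and the index relation (II.2), can be sketched numerically. This is an illustrative sketch under stated assumptions, not the simulator of Appendix B: the function names are hypothetical, the sorter is run on a plain one-dimensional list instead of an RSS loop, and the parallel comparators of one stage are visited sequentially, which gives the same result because the pairs within a stage are disjoint.

```python
def partner(j, s):
    """Index j' compared with j at stage s, per (II.1)."""
    return j + (-1) ** (j + s)


def odd_even_transposition_sort(items):
    """Sort into ascending order using N stages of neighbour comparisons."""
    a = list(items)
    n = len(a)
    for s in range(n):
        for j in range(n):
            k = partner(j, s)
            # Handle each pair once, from its lower index; skip out-of-range partners.
            if k == j + 1 and k < n and a[j] > a[k]:
                a[j], a[k] = a[k], a[j]
    return a


def moving_index(J, t, i, M_i, C):
    """Distance j of fixed position J from the marker of loop i at cycle t, per (II.2)."""
    period = 2 * C
    drift = (t * (-1) ** i) % period  # Python's % is non-negative for a positive modulus
    return (4 * C + M_i - J + drift) % period


print(odd_even_transposition_sort([7, 2, 9, 2, -1, 5]))
print([moving_index(J, t=0, i=0, M_i=2, C=3) for J in range(1, 7)])
```

With C=3 and M(i)=2 at t=0, the sketch gives j=0 at J=2 (the marked head) and j=2C-1=5 at J=3, the position immediately to the right of the marker, which agrees with the head and tail convention of Section 2.A.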
The reader may verify easily that expression (II.2) is correct from the example of Fig. II.9.

Having established the relationship among the indexes, we will now derive several expressions to relate a pair of data items {a(i,j), a(i',j')} involved in a comparison. First, let us consider the horizontal comparison. In Fig. II.9, a(i,J) is always compared to a(i,J') where

    J' = J - (-1)**J    (II.3)

For item a(i,j), j is related to the other indexes as in (II.2):

    j = [4C+M(i)-J+(t*(-1)**i) MOD 2C] MOD 2C

Similarly for item a(i',j'):

    j' = [4C+M(i')-J'+(t*(-1)**i') MOD 2C] MOD 2C
       = [4C+M(i')-J+(-1)**J+(t*(-1)**i') MOD 2C] MOD 2C

For a horizontal comparison, i equals i'; therefore j' reduces to:

    j' = j + (-1)**J    (II.3')

Again from expression (II.2):

    J = -j+4C+M(i)+(t*(-1)**i) MOD 2C + 2KC

where K is any positive integer such that J will be positive. Substituting J into (II.3'), and noting that only the parity of the exponent matters, we obtain the expression for j', where a(i',j') will be compared to a(i,j) horizontally:

    j' = j + (-1)**[j+M(i)+2C+(t*(-1)**i) MOD 2C]    (II.4)

Within loop i, M(i) and (-1)**i are constants; therefore j' alternates between (j-1) and (j+1) as t increases. By comparing (II.4) and (II.1), we can see that the "Horizontal-comparison", when coupled with the "Shift" operations, performs the Odd-Even Sort as far as loop i is concerned, and therefore items within a loop can be sorted within 2*C cycles. This point is further illustrated in Fig. II.10. Q.E.D.

Theorem (II.1): The entire RSS array is capable of sorting with the combination of either Algorithm I or Algorithm II, and either Marking Scheme A or Scheme B.

Proof: In addition to the Odd-Even comparisons within a loop, the RSS also compares the items of any two adjacent loops by means of the "Vertical-comparison" and "Diagonal-comparison"; the purpose of these operations is to move smaller items upward and larger items downward. It is easily discernible that, if comparisons are provided between the head (i.e., a(i+1,0)) of one loop and the tail (i.e., a(i,2C-1)) of the next higher loop, then the odd-even sort will be carried out on the entire RSS array. Therefore, the proof of this theorem is reduced to the proof that the "head-tail" comparisons are provided by the combination of the algorithms and the marking schemes.

Let us consider the "Vertical-comparison" between a pair of items {a(i,j), a(i',j')}. In Fig. II.9, note that a(i,j) will always be compared to a(i',j') if

    i' = i - (-1)**(i+⌈J/2⌉)    (II.5)

From expressions (II.2, 4 and 5), we can obtain the position J where the head and tail of any two loops meet:

    j = 2C-1  =>  4C+M(i)-J+(t*(-1)**i) MOD 2C = 2KC+2C-1       (II.6)
    j' = 0    =>  4C+M(i')-J+(t*(-1)**(i+1)) MOD 2C = 2K'C      (II.7)

Combining expressions (II.6 and 7), we obtain

    8C+M(i)+M(i')-2J = (K+K')*2C+2C-1
    =>  J = K"C+(M(i)+M(i')+1)/2    (II.8)

where K and K' are integers such that 0<=j<2C, and K" equals either -1, 0 or 1 because 1<=J<=2C. Expression (II.8) means that the tail of loop i will be compared to the head of loop (i+1) either halfway between M(i) and M(i'), i.e., at J=(M(i)+M(i')+1)/2, or at J=(M(i)+M(i')+1)/2+C, depending on whether there is any comparator situated at these locations. From expression (II.8),

    M(i)+M(i+1)+1 = 2(J-K"C)
    =>  M(i)+M(i+1) = 2(J-K"C)-1    (II.9)

Expression (II.9) gives rise to another requirement for the marking of the RSS array:

Requirement(3) (for marking schemes other than Scheme A and B):
    M(i)+M(i+1) = An odd integer

This requirement ensures that the tails and heads of the loops will be compared by the vertical comparisons. It is automatically satisfied when Marking Scheme A or B is used, but has to be considered for other marking schemes. Q.E.D.

Now we will derive Marking Scheme A. From (II.5),

    i' = i+1 and i' = i-(-1)**(i+⌈J/2⌉)
    =>  i+⌈J/2⌉ = An odd integer
    =>  ⌈J/2⌉ = (An odd integer)-i

Case (i) at i=even, ⌈J/2⌉=odd; therefore,

    =>  J = 2*(An odd integer), or
          = 2*(An odd integer)-1    (II.10)

from expressions (II.9 and 10),

    =>  M(i+1) = 4*(An odd integer)-2K"C-1-M(i), or
               = 4*(An odd integer)-2K"C-3-M(i)    (II.11)

Case (ii) at i=odd, ⌈J/2⌉=even; therefore,

    =>  J = 2*(An even integer), or
          = 2*(An even integer)-1    (II.12)

from expressions (II.9 and 12),

    =>  M(i+1) = 4*(An even integer)-2K"C-1-M(i), or
               = 4*(An even integer)-2K"C-3-M(i)    (II.13)

We could obtain Scheme A by setting K"=0 in expressions (II.11 and 13):

    M(i+1) = 4*i-2+1-M(i)

Or, equivalently,

    M(i) = 4*i-2+1-M(i-1)

which is Scheme A and where 1<=M(i)<=2C, for i:=1,2,...,2R.

To derive Scheme B, let

    M(i) := 1,  for i=even
         := 2C, for i=odd

Case (i) At i=odd, from expressions (II.8 and 12),

    J = K"C+(1+2C+1)/2 = 2*(An even integer), or
                       = 2*(An even integer)-1
    =>  J = K"C+C+1 ∈ {3,4,7,8,...}    (II.14)

Out of the three possible values of K", i.e., -1, 0 and +1, only K"=0 can satisfy both expression (II.14) and (1<=J<=2C); therefore,

    J = C+1 ∈ {3,4,7,8,...}
    =>  C ∈ {2,3,6,7,...}
    =>  C := 2*(An odd integer), or
          := 2*(An odd integer)+1    (II.15)

Case (ii) At i=even, from expressions (II.8 and 10),

    J := K"C+(2C+1+1)/2
       = K"C+C+1 ∈ {1,2,5,6,...}    (II.16)

Both K"=0 and K"=-1 can satisfy expression (II.16) and (1<=J<=2C) simultaneously:

    when K"=0,  C ∈ {0,1,4,5,...}           (II.17')
    when K"=-1, C = Any positive integer     (II.17)

For all values of i, both expressions (II.15 and 17) can be satisfied simultaneously by the requirement below:

    C := 2*(An odd integer), or
      := 2*(An odd integer)+1
    =>  C ∈ {2,3,6,7,...}

which is Requirement (2) of Scheme B.

Now let us consider the diagonal comparisons, and we will show that Requirement (3) can actually be waived when Marking Scheme B is used with Algorithm II. Again, from Fig. II.9, items a(i,j) and a(i',j') will be compared diagonally if

    J' = J-(-1)**J              (II.3)
    i' = i-(-1)**(i+⌈J/2⌉)      (II.5)

Converting J and J' into j and j' using expressions (II.2 and 3):

    j = [4C+M(i)-J+(t*(-1)**i) MOD 2C] MOD 2C
    j' = [4C+M(i')-J+(-1)**J+(t*(-1)**i') MOD 2C] MOD 2C

The heads and tails of the loops will be compared by the diagonal comparisons if

    j = 2C-1
    j' = 0
    i' = i+1

Substituting these values into the above expressions, we get

    j = 2C-1  =>  4C+M(i)-J+(t*(-1)**i) MOD 2C = 2KC+2C-1
    j' = 0    =>  4C+M(i+1)-J+(-1)**J+(t*(-1)**(i+1)) MOD 2C = 2K'C

Adding up the two expressions,

    8C+M(i)+M(i+1)-2J+(-1)**J = (K+K'+1)*2C-1
    =>  M(i)+M(i+1) = 2J-2K"C-1-(-1)**J
                    = 2(J-K"C)-1-(-1)**J
                    = 2(J-K"C), or 2(J-K"C)-2
                    = An even integer.
The last result shows that Requirement (3) can be waived when Scheme B is used with Algorithm II, because if M(i)+M(i+1) equals an odd integer, then the head-and-tail comparisons will be provided by the "Vertical-comparison", but if it equals an even integer, then they will be provided by the "Diagonal-comparison" as demonstrated above. The various requirements for the marking schemes are summarized in Table II.1.

Table II.1 - Requirements of the RSS marking schemes.

    Marking Scheme   Algorithm I              Algorithm II
    A                Requirement (1)          Requirement (1)
    B                Requirements (1) & (2)   Requirements (1) & (2)
    Others           Requirements (1) & (3), and others to be derived from expressions II.5 and II.8.

5.C. Correctness of the Termination Methods

If the RSS array is correctly marked and Requirements (1), (2) and (3) are duly met, then Condition (1) of Section (2.C) is sufficient to guarantee proper termination. The reason is that, as we can see from expressions (II.2, II.4 and II.5), the comparison pattern repeats every 2C cycles; if there is no exchange in the most recent consecutive 2C cycles, then there will be no further exchanges in the subsequent comparisons, meaning that the sorting process must have terminated.

5.D. Timing Complexities

The RSS simulation program is listed in Appendix B for reference.
Input parameters to the simulator include the numbers of rows and columns (these two numbers determine the total number of items to be sorted) and the initial seed value for the generation of the input list; at the end of each simulation run, the simulator will produce the sorted list as well as the number of sorting cycles needed. Fig. II.11 shows the numbers of cycles needed by RSS Algorithm I to sort on arrays with various combinations of numbers of rows and columns. These simulation results indicate that with Algorithm I, the average number of cycles needed to sort a random set of N items is bounded by the line N, and approaches N/2 as N increases. When Algorithm II is used, the number of cycles needed will be much smaller, due to the presence of diagonal comparisons; the examples of Fig. II.6 and II.7 help illustrate this point. However, the actual speed of Algorithm II may or may not exceed that of Algorithm I, because its comparison cycle includes the diagonal comparison and hence its cycle time will be longer.

[Figure: plot; axis label: number of comparison cycles.]
Fig. II.11. The number of comparison cycles versus the number of items to be sorted (Algorithm I).

6. Discussions

Since sorting is such a common and necessary operation in computer applications, there are dozens of sorting algorithms described in the literature.
In this chapter, we have presented two similar sorting algorithms which apply the systolic idea to a cyclical architecture, and the functional design of a sorter (RSS) based on these algorithms has also been suggested. Our primary goal is to look into the design of a special-purpose VLSI chip that can be attached to a host computer such as the one envisioned by Foster and Kung [19]:

[Figure: block diagram showing CPU, System Bus, Primary Memory and Pattern Matcher.]
Fig. II.12. A general-purpose computer system with special-purpose chips attached [19].

Undoubtedly, the usefulness of the sorter is not limited to scientific computation; it could also be used in office information systems and relational data base machines. With the stated goal in mind, we now compare our proposal with some existing ones using the following criteria: (a) time complexity; (b) hardware complexity and (c) control complexity.

(a) Time Complexity: In Table II.2, the sorting times of some existing algorithms could be divided into four categories, namely, O(logN), O((logN)**2), O(N**0.5) and O(N), where N is the number of items to be sorted. Muller and Preparata's algorithm [10] is in the fastest category, but it requires a discouraging number of comparators, O(N**2). Batcher's bitonic sorter [13] and the perfect shuffle [9] are both in the O((logN)**2) category, and they are characterized by the shuffle-exchange type of interconnections.
The two mesh sorting schemes sort N**2 items on a NxN mesh with approximately O(N) time, therefore they belong to the O(N**0.5) category. Nassimi and Sahni's mesh sorting scheme [14] is based on Batcher's bitonic merge algorithm and it needs approximately 14N routing steps and 2*logN compare-exchange steps on a NxN mesh; moreover, it requires that the input subfiles be pre-sorted. Thompson and Kung's mesh sorting scheme [15] needs roughly 6N+O((N**(2/3))logN) routing steps and N+O((N**(2/3))logN) compare-exchange steps. The RSS algorithms belong to the O(N) category, but because of their simpler control structures and near-neighbour type of data movements, their actual sorting times might be less than those of the mesh sorting schemes which require more complex control and data movements.

(b) Hardware complexity: Sorters with shuffle-exchange type of interconnections are not well-suited to VLSI implementations, because shuffle-exchange networks have a very low degree of regularity and modularity, and require wires of various lengths. It has been shown by Thompson [18] that at least O((N**2)/(logN)**2) chip area is required to lay out an N-vertex shuffle-exchange network; this is a serious drawback when N is large. On the other hand, because the interconnection patterns required by the mesh and the RSS algorithms are highly regular and repetitive, these two types of sorters could be fabricated easily by replicating the circuits of a single comparator unit for the entire arrays.

(c) Control complexity: The logic of the various operations (i.e., Horizontal-comparison, Vertical-comparison and Diagonal-comparison) can be hardwired into each of the quadruple comparators, and the control unit shown in Fig. II.1 simply broadcasts the sequence of these operations to all the comparators. The control structure required by the RSS algorithms is therefore comparable to that required by Batcher's bitonic sorters, and is much simpler than that required by the mesh sorters. The simplicity of the RSS controller is another important factor where VLSI implementation is concerned.

Most other sorting networks impose certain non-trivial constraints on their sizes; for example, Batcher's sorter and the perfect shuffle network require that the number of input lines be a power of two, and the mesh sorting algorithms operate on square arrays. The constraints of the RSS algorithms (see Table II.1) appear to be less stringent in this respect.

In summary, although the RSS design is not optimal in every aspect, it is highly amenable to VLSI implementations as far as its hardware and control complexities are concerned.
Table II.2 - Complexities of Sorting Networks+

    Method                             #Input   #Comparators     Time            Interconnection   Control
    Batcher's Bitonic Sorter [13]      N        O(N{logN}**2)    O({logN}**2)    high              low
    Muller & Preparata's [10]          N        O(N**2)          O(logN)         low               high
    Perfect Shuffle [9]                N        O(N)             O({logN}**2)    high              low
    Thompson & Kung's Mesh Sort [15]   NxN      NxN mesh         O(N)++          low               high
    Nassimi & Sahni's Mesh Sort [14]   NxN      NxN mesh         O(N)++          low               high
    RSS                                N        O(N)             O(N)++          low               low

    Notes: +  In terms of amenability to VLSI implementations.
           ++ Please see discussions in Section II.6.

Chapter III. A Novel Loop-Structured Switching Network (LSSN)

1. Introduction

Many large-scale computer applications, such as image processing, weather forecasting and ballistic missile defence systems, require execution rates of more than one billion instructions per second. With the advent of VLSI technologies, it is feasible and more flexible to construct such large-scale systems by interconnecting hundreds or even thousands of off-the-shelf processing and storage devices to work in a co-operative manner. Although several existing networks can provide the required communication bandwidth among these devices, they are expensive to build and difficult to expand. For example, the switch counts of a NxN crossbar and a NxN baseline [20] are O(N**2) and O(NlogN) respectively; for N=1024, the crossbar would require more than a million switches while the baseline would need about five thousand of them.
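The two figures quoted for N=1024 can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the baseline network is taken to consist of logN stages of N/2 two-by-two switches, which is the usual construction but is not spelled out in the text above. The function names are illustrative only.

```python
def log2_exact(n):
    """Return log2(n) for a power of two n; reject any other input."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def crossbar_switches(n):
    """Crosspoint switches in an N x N crossbar: N**2."""
    return n * n


def baseline_switches(n):
    """Two-by-two switches in an N x N baseline network,
    assuming logN stages of N/2 switches each."""
    return (n // 2) * log2_exact(n)


if __name__ == "__main__":
    n = 1024
    print(crossbar_switches(n))   # 1048576, i.e. more than a million
    print(baseline_switches(n))   # 5120, i.e. about five thousand
```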
Another disadvantage of using a large number of switches is that of system reliability: the networks are more likely to fail when more switches are used.

In this chapter, we introduce a novel loop-structured switching network (LSSN) which overcomes the above problems. The main feature of LSSN is its cyclical connections; it only requires N/2 two-by-two switches for interconnecting N pairs of transmitting and receiving devices, and is therefore very attractive for large-scale, heterogeneous systems made up of many devices.

From the structural and functional points of view, LSSN is a packet-switched, multi-staged, blocking network with distributed control. In the next section, we will present its connection function, addressing and routing algorithms. In Section three, several important properties of LSSN will be revealed, and the causes of deadlocks and a method to avoid them will be discussed. In Section four, the results of our simulation studies and performance evaluations will be presented. Discussions and topics for further work are provided in Section five, and the LSSN simulation program is listed in Appendix C for reference.

2. Network Topology

Networks using (logL) stages of two-by-two switches, where L is a power of two, are well-known [20,21,22,25,26]. Traditionally, they are used to connect L input to L output terminals, i.e., for interconnecting L transmitters to L receivers.
Feedback paths are sometimes provided to route the information back from the output side to the input side, thus forming loops. LSSN is also based on the concept of feedback loops, but it differs from others in that all its switches could be used as both entry and exit stations for data transmission and reception. With L loops (where L is a power of two and at least equal to four), LSSN can connect up to N=LlogL pairs of transmitters (Trs) and receivers (Rrs), using only N/2 switches; this feature renders it attractive for large values of N. An example with L=16 and N=64 is illustrated in Fig. III.1.

2.A. Addressing Scheme and Connection Function

The following description can be easily understood if the readers refer to the example of Fig. III.1, in which all the switches have been set to the "Straight-through" connection; a loop is defined as a closed path in this configuration.

For a LSSN with L loops, each of the loops is labeled with L'=logL bits of code, i.e., ℓ_L' ... ℓ_1, which is the binary representation of an integer in the closed range [0, L-1]. The switches are arranged into L' stages, each of which is labeled with S'=⌈logL'⌉ bits of binary digits represented as a_S' ... a_1. The output links of a switch at the s-th stage would be assigned the following addresses:

    Left Output Link  = a_S' ... a_1 ℓ_L' ... ℓ_(s+1) 0 ℓ_(s-1) ... ℓ_1
    Right Output Link = a_S' ... a_1 ℓ_L' ... ℓ_(s+1) 1 ℓ_(s-1) ... ℓ_1

These addresses are obtained by concatenating the stage and loop labels together, with the s-th bit of the address of the left output link set to "0" and that of the right output link set to "1". One could verify this scheme on the example of Fig. III.1.

Consider a switch located in the s-th stage, and suppose one of its output links is part of the loop ℓ_L' ... ℓ_s ... ℓ_1; then it realizes the following connection function:

    LSSN_s(ℓ_L' ... ℓ_s ... ℓ_1) = ℓ_L' ... ~ℓ_s ... ℓ_1,

where ~ℓ_s is the one's complement of ℓ_s. The connection function states that at the s-th stage, any two loops with labels differing only in their s-th bits will be connected by a switch at that stage.

2.B. Routing Scheme

In LSSN, receivers (Rrs) are identified by using the address of the output links to which they are connected, whereas transmitters (Trs) need not be identified. To dispatch a message, a Tr will generate a packet which has the following format:

    < f'f" ; destination address ; message >

where f'f" is a 2-bit field which will be referred to as the feedback count; it is initialized to zero when a packet is newly formed, and incremented whenever the packet goes through the feedback paths. Later on, it will be shown that this field would never require more than two bits regardless of the network size. The address of the Rr and the actual message to be transmitted are also contained in the packet.

Two types of switches, namely Type-A and Type-B, will be considered in our studies, and their schematic diagrams are shown in Fig. III.
3.

[Figure: loops and link labels laid out across Stage 00, Stage 01, Stage 10 and Stage 11.]
Fig. III.1. Assignment of loop and link labels on LSSN which has 16 loops and 32 switches.

Fig. III.2. Connection of transmitting and receiving devices on a LSSN with 16 loops. (x = a hardwired connection for a transmitter, Tr; o = a hardwired connection for a receiver, Rr.)

[Figure: (a) input buffers and output ports; (b) loop label, stage label, input ports, buffer pools, intermediate ports, output ports and output link, with data and control signals and with status signals indicating availability of Class-0 and Class-1 buffers.]
Fig. III.3. The schematic diagrams of a Type-A switch (a) and a Type-B switch (b).

A Type-A switch is similar to those used in the conventional packet-switched networks, except that it has two built-in first-in-first-out buffers. When a packet enters a Type-A switch located in the s-th stage, it is first placed into one of its input buffers, and then switched to the left output port if the s-th bit of its destination address is a "0", or to the right output port if that bit is a "1".

As shown in Fig. III.3, a Type-B switch has a slightly more complicated internal structure than a Type-A switch. Its main features are the "structured buffer pools" which are made up of three classes of first-in-first-out (FIFO) buffers: Class-0, Class-1 and Class-2.
It also contains four intermediate ports which are connected to the Class-0 and Class-1 buffers as shown. It has two sets of outgoing status lines indicating the availabilities of its Class-0 and Class-1 buffers to its preceding switches (which are connected to its left and right input links), and two sets of incoming status lines giving the same information from its succeeding switches (which are connected to its left and right output links). The Class-k buffer, where k is in {0,1,2}, is used to accommodate packets with a feedback count of k; the Class-2 buffer is connected to the output port directly while the Class-0 and Class-1 buffers are connected to the output port through the intermediate ports. The functions of these various mechanisms will be explained later on.

When a packet enters a Type-B switch located in the s-th stage, it will undergo the following operations:

(a) From an input port to the buffer pool: The packet will be placed into one of the buffers according to its feedback count;

(b) From the buffer pool to the output port:

Case (i) From a Class-0 or Class-1 buffer: If the s-th bit of the destination address of the packet is a "0", then the packet will be switched to the left intermediate port and then to the left output port; else it will be switched to the right intermediate port and then to the right output port.
Case (ii) From a Class-2 buffer: The packet will be forwarded to the output port connected to the Class-2 buffer without switching and without going through the intermediate ports.

(c) From an output port to the exterior: At the output port, the destination address of the packet will be matched against that of the output link. If a match occurs, then a strobe signal will be sent to the receiver attached to that output link, and the packet will be removed from the output port by that receiver; else the next switch at the other end of the output link will be strobed and the packet will be forwarded to its input port. For the transmission between the last and the first stages via the feedback loops, the same operation will take place, but in addition, the feedback count of those packets emerging from the output port of the last stage will be incremented.

These three operations will be collectively referred to as a single routing step for the Type-B switch.

According to the descriptions above, the Type-B switch must contain the following features in addition to those depicted in Fig. III.3.
First of all, the addresses of its output links must be made available to the matching operations (e.g., by storing the addresses inside the switch), and there must be some logic gates to perform the matching; the switch must also be able to determine whether or not it is located in the last stage of the network by examining the labels assigned to its output links, because the feedback counts of those packets passing through it have to be incremented. Similar features must also be present in a Type-A switch, but the hardware involving the feedback counts may be omitted.

Since the routing of packets is performed locally by each of the switches, LSSN has the advantage of not requiring a central controller. On the other hand, the lack of central control will give rise to conflicts among the packets for the shared network resources such as ports and links; the effects of such conflicts on LSSN when Type-A and Type-B switches are used will be detailed in the next section.

3. Network Properties

First, some useful theorems concerning the behavior of LSSN in the presence of a single packet will be stated; then the various properties of LSSN in the presence of more than one packet will be examined. The proofs of all these theorems are given in Appendix A, and all the logarithms used are base-two.

Notice that even though the destination address a_S' ... a_1 ℓ_L' ... ℓ_1 carried by a packet consists of (S'+L') bits of information, only the L' least significant bits are involved in the switching operation (i.e., Operation (b) of Section 2.B); and the first S' most significant bits, together with the L' least significant bits, are involved in the matching operation (i.e., Operation (c) of Section 2.B) only. This observation leads to the following lemma:

Lemma III.1: Consider a LSSN which has L loops and a packet which is destined for the address a_S' ... a_1 ℓ_L' ... ℓ_1, where L'=logL and S'=⌈logL'⌉. The packet will be routed to the loop ℓ_L' ... ℓ_1 within L' steps of routing after its admission into the LSSN.

Example: Consider a LSSN with L=16, then L'=4 and S'=2. A packet destined for the address (101111) will be routed to the loop (1111) within 4 steps of routing regardless of where it is generated.

Lemma III.2: Consider a LSSN with L loops and a packet which is destined for the address a_S' ... a_1 ℓ_L' ... ℓ_1, where L'=logL and S'=⌈logL'⌉. After the packet has been routed to the loop ℓ_L' ... ℓ_1, it needs at most another (L'-1) steps of matching along that loop to reach its destination.

Theorem III.1: In a LSSN with L loops, a packet will be delivered to its destination within (2logL-1) steps of routing regardless of where it is generated.

Example: In the example of Fig. III.1, L=16; therefore the maximum number of routing steps is (2*4-1)=7.

Theorem III.2: The average number of routing steps (ARS) needed to deliver a result packet in a LSSN with L loops is,

    ARS(L) = (3logL-1)/2+2/L-1

Example: In the example of Fig. III.2, since L=16, therefore,

    ARS(L=16) = (3log16-1)/2+2/16-1 = 4.625.

Corollary III.1: Any packet admitted into LSSN will go through the feedback path at most twice.

Example: In the example of Fig. III.
1 and 2, if Tr49 (which is attached to link 10 0000) sends a packet to Rr32 (which is attached to link 01 1111), then this packet will go through the feedback paths twice: the first time through the loop (1000), and the second time through the loop (1111).

Corollary III.1 explains why the packets only have to carry two bits to indicate their feedback count f'f", and also why each buffer pool of the Type-B switch is made up of three classes of buffers regardless of the network size L.

Theorem III.3: In a Type-B switch of a LSSN which has L loops, the probability that the destination address carried by a result packet will match the label of an output link of the switch, and hence that the packet will be removed from the network, is:

    P_removed = 2L/{3LlogL-L+4}

where the transmission pattern is such that each and every receiving port of the network is equally likely to receive that packet.

Theorem III.4: The maximum average throughput rate (MATR) of a LSSN with L loops is:

    MATR(L) = 3/2 x S_R,SW x logL x L**2/{3LlogL-L+4}

where S_R,SW is the maximum rate of transmitting a result packet between two switches via an output link.

3.A. Network Conflicts

When there are two or more packets in LSSN, they may contend for the same network resources such as input buffers, ports and data links, thus giving rise to conflicts.
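Before turning to the individual conflict types, the closed-form results of Theorems III.1 to III.3 and the network size N=LlogL can be evaluated numerically. The Python sketch below is only an illustration (it is not the simulator of Appendix C, and the function names are hypothetical); it reproduces the worked values quoted for L=16.

```python
def log2_exact(L):
    """Return log2(L) for a loop count L that is a power of two and at least 4."""
    if L < 4 or L & (L - 1):
        raise ValueError("L must be a power of two and at least 4")
    return L.bit_length() - 1


def network_size(L):
    """N = L logL pairs of transmitters and receivers."""
    return L * log2_exact(L)


def switch_count(L):
    """LSSN uses N/2 two-by-two switches."""
    return network_size(L) // 2


def max_routing_steps(L):
    """Theorem III.1: at most 2logL - 1 routing steps."""
    return 2 * log2_exact(L) - 1


def average_routing_steps(L):
    """Theorem III.2: ARS(L) = (3logL - 1)/2 + 2/L - 1."""
    return (3 * log2_exact(L) - 1) / 2 + 2 / L - 1


def removal_probability(L):
    """Theorem III.3: P_removed = 2L / (3LlogL - L + 4)."""
    return 2 * L / (3 * L * log2_exact(L) - L + 4)


if __name__ == "__main__":
    L = 16
    print(network_size(L), switch_count(L))   # 64 32
    print(max_routing_steps(L))                # 7
    print(average_routing_steps(L))            # 4.625
    print(round(removal_probability(L), 4))    # 0.1778
```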
If Type-A switches are used in LSSN, then there would be two types of conflicts:

(a) A1 conflicts - which are the contentions due to the simultaneous requests made by packets in the two input buffers, for the same output port;

(b) A2 conflicts - which are the contentions between an output port and the Tr sharing the same link, for the same input port of the switch at the end of the link.

A simple round-robin discipline can resolve both types of conflicts and will ensure fairness. A better alternative for A1 conflicts is to honor the input buffer which has more waiting packets in it; if both input buffers are equally occupied, then an arbitrary buffer will be chosen. As for A2 conflicts, the output ports perhaps should be given a higher priority over the Trs so that those packets which are already admitted into the network could reach their destinations faster (i.e., a faster response time could be obtained). These observations were obtained from the simulator listed in Appendix C.

As for Type-B switches, there are three possible types of conflicts (the reader may refer to Fig. III.
3 for the following descriptions):

(a) B1 conflicts - which are due to the simultaneous requests made by packets from the left and right buffer pools, for the same intermediate port;

(b) B2 conflicts - which are the contentions among the intermediate ports and the Class-2 buffer for the same output port;

(c) B3 conflicts - which are the contentions between an output port and the Tr sharing the same output link, for the input port at the end of the link.

B1 conflicts could be resolved by a simple round-robin discipline: the conflicting packets are switched to the intermediate port alternately.

The resolution of B2 conflicts is more intricate. Our simulation studies showed that the round-robin discipline would give rise to unbearable propagation delay for certain packets, but much better performance, in terms of average throughput rate and delay, could be obtained with a priority-based policy (an explanation will be given in Section 3.B) which assigned the highest priority to the Class-2 buffer, then the intermediate port connected to the Class-1 buffers, and finally the intermediate port connected to the Class-0 buffers. With this policy, packets in the Class-2 buffers are switched to the output port immediately when the output port becomes empty; as could be explained by Lemmas III.1 and III.2, these packets will always remain in the same loops, therefore they need not go through any intermediate port.
Furthermore, since they are assigned the highest priority in the use of the output port, they will not accumulate and hence the size of the Class-2 buffers is always bounded. When the Class-2 buffer is empty, the "eligible" intermediate port with the next highest priority will be granted access to the output port. An intermediate port connected to the Class-k buffers is said to be "eligible" if it is non-empty, and if the incoming status lines indicate that the Class-k buffer of the next switch is not full. As for the connections between the last and first stages, an intermediate port connected to the Class-k buffer is "eligible" if it is non-empty and if the Class-(k+1) buffer of the next switch in the first stage is not full. The difference in the above definitions of "eligibility" is discernible if one realizes that the feedback count of a packet is incremented whenever it goes through the feedback paths back to the first stage. The purpose of the status lines is therefore to help prevent the output ports and input ports from being clogged with packets which cannot be switched away immediately.
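The priority-based resolution of B2 conflicts and the two definitions of "eligibility" can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the arbitration logic of the simulator in Appendix C: the function name, argument names and return labels are hypothetical, the output port is assumed to be free when the function is called, and the fullness of the next switch's Class-2 buffer is assumed to be observable for the last-to-first-stage case.

```python
def select_b2_source(class2_waiting, port_waiting, next_buffer_full, is_last_stage):
    """Choose which source may use a free output port of a Type-B switch.

    class2_waiting   -- True if the Class-2 buffer holds a packet
    port_waiting     -- {0: bool, 1: bool}; True if the intermediate port fed by
                        the Class-k buffers holds a packet
    next_buffer_full -- {0: bool, 1: bool, 2: bool}; fullness of the Class-k
                        buffer of the next switch (assumed observable)
    is_last_stage    -- True if the output link is a feedback path to stage one
    Returns "class2", "port1", "port0", or None if nothing may be sent.
    """
    # Highest priority: the Class-2 buffer, which bypasses the intermediate ports.
    if class2_waiting:
        return "class2"
    # Next: the Class-1 intermediate port, then the Class-0 intermediate port.
    for k in (1, 0):
        # On a feedback path the feedback count is incremented, so the packet
        # will land in the Class-(k+1) buffer of the next switch.
        target = k + 1 if is_last_stage else k
        if port_waiting[k] and not next_buffer_full[target]:
            return "port%d" % k
    return None


if __name__ == "__main__":
    free = {0: False, 1: False, 2: False}
    print(select_b2_source(True, {0: True, 1: True}, free, False))    # class2
    print(select_b2_source(False, {0: True, 1: True}, free, False))   # port1
```

In this sketch a blocked Class-1 port does not prevent an eligible Class-0 port from being served, which is one reading of "the eligible intermediate port with the next highest priority".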
The priority-based policy favors packets originating at the lower stages, because the feedback counts of these packets are incremented sooner than those of packets originating at the upper stages, and hence they are assigned higher priorities sooner. Nevertheless, our simulation studies show that this policy is superior to the round-robin discipline as far as the overall performance is concerned (an explanation is offered at the end of the next section). The resolution of B3 conflicts is rather straightforward: the conflicting Tr and the output port are granted access alternately. In addition, the Tr must check that the Class-0 buffer (not just the input port) at the entry point is not full before it can transmit. The availability of the Class-0 buffer has to be checked because newly admitted packets carry feedback counts of zero.
3.B.
Deadlock and Avoidance Method
In conventional non-recirculating, packet-switched networks, the blockage due to data path conflicts is temporary as long as there is a fair scheduling policy; in an LSSN which uses Type-A switches, however, blockage might lead to deadlocks: situations in which certain loops are clogged with packets and no further switching can take place along these loops, so that very soon the whole network becomes impassable. The deadlock problem in LSSN is attributable to the store-and-forward type of data movements and the cyclical requests of network resources. In a Type-A switch, if the packets coming out of its two input buffers always contend for the output ports, then the input buffers will be filled up rapidly; and if all the input buffers and output ports along a particular loop are filled with packets in transit, and if the first packets of all these input buffers are waiting for these occupied output ports to be freed, then this loop will enter a "single-loop" deadlock. A "multiple-loop" deadlock is produced in a similar manner but involves more than one loop. According to our simulation studies, the probability of deadlock could be reduced significantly by increasing the size of the buffers and restricting the input load to a certain level; but this approach does not eliminate deadlocks entirely, and moreover, it requires a deadlock detection scheme and a recovery procedure.
Perhaps it is more efficient to get around the deadlock problem by avoiding cyclical requests of the network resources; Type-B switches are meant for this purpose. Our idea of using Type-B switches to prevent deadlocks is based on the concept of "structured buffer pools" put forward by Raubold and Haenle [29]. According to their method, buffer pools are divided into K classes, where K is the length of the longest path in the network concerned, and if a packet is r routing steps away from its transmitter, then it may be placed into any Class-k buffer such that k<r<K. Clearly, their method has the drawback that K must be a function of the network size. We eliminate this drawback by classifying packets according to their feedback counts, which have been proved to be bounded. With the use of Type-B switches, the LSSN will be free of the store-and-forward type of deadlocks. A simple explanation is as follows: packets entering the buffer pools request buffers according to their feedback counts, therefore there is no circular request on the buffers; as for the shared links and input/output ports, these network resources are granted to the requesting packets on the condition that their occupation by the packets will always be temporary. With this idea in mind, we now state the following theorem:
Theorem III.5: The LSSN which uses Type-B switches is deadlock free.
An explanatory proof of Theorem III.5 is given in Appendix A.
3.C.
Network Extensibility
Very often it is desirable to expand a network after it has been built; but usually such an expansion is difficult with most, if not all, existing designs. LSSN has the very useful property that it can be expanded incrementally by adding more stages to the basic structure without complicating the addressing and routing algorithms. Of course, to facilitate the expansion, there must be sufficient address lines to account for the added stages and devices. One way to expand the basic structure while keeping it intact is to add the new stages to the bottom of it, immediately after the last stage of switches and before the perfect shuffle takes place. Suppose there are L loops and L' stages originally, and we want to add L'' more stages; then the expanded LSSN would have a total of (L'+L'') stages as shown:
stage number, s
1, 2, ..., L'            (basic structure)
L'+1, ..., L'+L''        (additional stages)
Now the stages and Rr's will be addressed using $S'' = \lceil \log(L'+L'') \rceil$ and $\lceil \log(L'+L'') \rceil + L' = S'' + L'$ bits of binary digits respectively. The addressing scheme and connection function for the newly added stages are:
[Three expressions, for the left output link, the right output link and the LSSN connection function; not legible in this copy.]
In words, all the new stages would be treated much the same as the last stage of the basic structure, i.e., the L'-th stage; there is no shuffling among the output links of these new stages; and packets which are sent to them are routed according to the L'-th bits of the destination addresses of the packets. In the expanded LSSN, the duty of incrementing the feedback counts is performed by the (L'+L'')-th stage rather than the L'-th; this minor change has to be taken care of during the expansion. We do not intend to derive theorems from scratch for the expanded network, because the validity of Corollary III.1 and Theorem III.2 is discernible if the new stages are regarded as subsidiaries of the L'-th stage, i.e., if stage L' through stage (L'+L'') are considered as a single, compound stage. Lastly, we would like to point out that LSSN could also be expanded in the more expensive way by doubling its loop count.
4. Simulations and Performance Analysis
In our simulation studies, the throughput rate and delay are the two measures used for evaluations and comparisons. Throughput rate is defined as the average number of packets flowing through the network per unit time, and the delay of packets is defined as the average interval between their generation and reception. The delay is made up of "entrance delay" and "propagation delay", where entrance delay is the average duration that a packet has to wait at the entry point, and propagation delay includes the time spent in queueing and switching within the network.
Request interval is the varying parameter and is defined as the average time between the last successful transmission and the generation of the next packet. In order to obtain some meaningful results and to facilitate the analysis later on, we have made the following assumptions:
(a) The transmitting and receiving pairs are randomly selected out of the entire address space;
(b) The transmission pattern is such that if the current request to transmit is in process or blocked, then the transmitter affected will not generate the next request;
(c) Packets are removed immediately from the network when they arrive at their destination;
(d) As for timing considerations, the amount of switching delay in going through a conventional binary switch was estimated to be five gate delays [30]: three for path selection and two for data transfer. Since Type-A switches would lead to deadlocks on LSSN, their analysis will not be included in our studies. A Type-B switch would need more delay than the conventional ones: three gate delays for path selection, two for data transfer from the input ports to the buffer pools, two from the buffer pools to the intermediate ports, and another five for path selection and data transfer from the intermediate ports to the output ports, for a total of twelve gate delays. For packets inside the Class-2 buffers, the switching delay is shorter because they do not have to go through the intermediate ports.
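The gate-delay budget in assumption (d) can be tallied as a short worked example. The per-step figures are those quoted above; the dictionary keys and the helper `switching_time_ns` (with its per-gate delay argument) are hypothetical names introduced only for this sketch.

```python
# Gate delays for a conventional binary switch, as quoted in assumption (d).
CONVENTIONAL_SWITCH = {
    "path selection": 3,
    "data transfer": 2,
}

# Gate delays for a Type-B switch, step by step, as quoted in assumption (d).
TYPE_B_SWITCH = {
    "path selection": 3,
    "input ports to buffer pools": 2,
    "buffer pools to intermediate ports": 2,
    "intermediate ports to output ports (selection and transfer)": 5,
}

conventional_total = sum(CONVENTIONAL_SWITCH.values())  # 3 + 2 = 5
type_b_total = sum(TYPE_B_SWITCH.values())              # 3 + 2 + 2 + 5 = 12


def switching_time_ns(gate_delays: int, gate_delay_ns: float) -> float:
    """Convert a gate-delay count to time, for an assumed per-gate delay in ns."""
    return gate_delays * gate_delay_ns
```

Under these figures a Type-B switch costs 12/5 = 2.4 times the switching delay of a conventional switch per stage traversed, which is part of the price paid for the deadlock-free buffer organization.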
The main duty of the LSSN simulator is to compute the total delay of each individual packet by summing up its entrance, switching and queueing delays. These assumptions are considered justifiable, and they have also appeared in the simulation studies of other packet switching networks (e.g., references [25,30]). In the LSSN simulator, a timer was associated with each packet entering the network, so as to record each type of delay that it encounters. From time to time, the whole switching array was inspected to make sure that no packet was subjected to substantial delay, which would be an indication of potential deadlocks. The LSSN under study had 16 loops and was fully connected with 64 pairs of transmitters and receivers. The effects of the buffer size on the network performance were first investigated, and it was confirmed that because the Class-2 buffers were given the highest priority in the B2 type of conflicts, the maximum requested size of the Class-2 buffers was bounded to two. For this reason, the size of the Class-2 buffers was fixed at two, and the sizes of the Class-0 and Class-1 buffers were varied from 4 to 14 (an arbitrary range). Our results (please see Fig. III.
5) show that the variation of the sizes does not have a significant effect. The reason is that when more buffers are used, more traffic is introduced into the network: although the entrance delay of a packet is reduced, its propagation delay is increased, and as a result the total delay is not much affected. In the second part of our study, we compared the performance of a 64x64 LSSN to that of a 64x64 baseline and then a 16x16 baseline. The baseline networks were considered because they are topologically equivalent to many existing networks [20]. We must emphasize that our comparison studies are not entirely fair, because baseline-like networks could be used as either circuit-switched networks (e.g., the Star network [24]) or packet-switched networks, whereas LSSN is intended to be used as a packet-switched network only; furthermore, the LSSN switches have a much higher logic gate density than the conventional ones.
For comparisons, we assumed that the switches used in the baselines had a buffer size of 16 (an arbitrary number); as for LSSN switches, the sizes of the Class-0, Class-1 and Class-2 buffers were fixed at seven, seven and two, respectively, for a total of 16 as well. The results obtained for this buffer size are presented in Fig. III.5; other buffer sizes would produce results very similar to these. In Fig. III.5, all the measurements were scaled by the factor f, which is the operating frequency of the networks. The value of f could be as high as 60 MHz if the switches are fabricated with TTL gates, or 400 MHz with ECL gates. A 64x64 baseline would need a total of 192 switches, whereas a 64x64 LSSN and a 16x16 baseline would require 32 switches each. Not only do the numbers of switches contribute to the complexities of the networks, but the amounts of wiring have to be considered as well. Our results show that if the request interval is very short, the throughput of the LSSN will be close to that of the 16x16 baseline, and its delay will be about three times higher; but when the request interval is longer than 40/f, both the throughput and delay of the LSSN will approach those of the 64x64 baseline.
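The switch counts quoted above can be checked with a short calculation. The sketch assumes 2x2 (binary) switches, base-2 logarithms and power-of-two sizes; the baseline formula (N/2) log N is the standard count for such multistage networks, while the LSSN formula (L/2) log L is inferred here from the figures in the text (16 loops, 64 pairs, 32 switches) rather than taken from a stated definition. The function names are hypothetical.

```python
def log2_exact(n: int) -> int:
    """Base-2 logarithm of a power of two, computed without floating point."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    return n.bit_length() - 1


def baseline_switches(n: int) -> int:
    """Switches in an n x n baseline network: log2(n) stages of n/2 binary switches."""
    return (n // 2) * log2_exact(n)


def lssn_pairs(loops: int) -> int:
    """Transmitter/receiver pairs of a fully connected LSSN: N = L log L."""
    return loops * log2_exact(loops)


def lssn_switches(loops: int) -> int:
    """Switches in an LSSN, assuming log2(L) stages of L/2 binary switches."""
    return (loops // 2) * log2_exact(loops)
```

With these assumptions, `baseline_switches(64)` gives 192, while `lssn_switches(16)` and `baseline_switches(16)` both give 32, and `lssn_pairs(16)` gives 64, in agreement with the numbers used in the comparison.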
If the request interval is further increased, the delay of the LSSN will be reduced to that of the 16x16 baseline. In summary, our results indicate that the performance of a 64x64 LSSN can match that of a 64x64 baseline with a significantly lower switch count and hence less wiring. This saving is even more substantial when the sizes of the networks considered are very large.
5. Discussions and Outlook
We have described a novel method to set up a communication network based on the concept of cyclical architectures; we have also presented the addressing and routing schemes, and several properties of the network. Although packet-switched, cyclically connected structures are susceptible to the store-and-forward type of deadlocks when used asynchronously, we have suggested a deadlock avoidance scheme based on some unique features of our design. The topology of our proposed network LSSN resembles that of some existing ones. If only L processors are attached to LSSN such that they all transmit packets through the first stage and receive packets from the last stage, then LSSN would be reduced to an indirect binary n-cube [21].
This similarity implies that the useful algorithms developed for the indirect binary n-cube could be adapted for LSSN easily. LSSN also resembles the last stage of Batcher's bitonic sorter [13]; therefore, it is possible to perform Batcher Sort on LSSN provided there exists a masking scheme to disable some of the attached processors as data items are circulated around the network. LSSN could also be partially connected and used as an arbitrator or a distributor, both of which are essential in the designs of data-driven computers [4,5,8,52,53]. LSSN can also be used to perform arbitrary permutations (i.e., one-to-one mappings) from the input side to the output side. Such permutations would require the presence of a central controller to compute the routing information; alternatively, frequently used control patterns could be pre-computed and retrieved when needed. The simpler Type-A switches could be used for such an application, and they will not cause the network to deadlock as long as only one permutation is performed at a time, and provided there are sufficient buffers inside the switches:
$$B(L) > \max\{\min(2^{\lceil 0.5\log N\rceil},\ N/2^{\lceil 0.5\log N\rceil}),\ \min(2^{\lfloor 0.5\log N\rfloor},\ N/2^{\lfloor 0.5\log N\rfloor})\} = O(N^{0.5})$$
where B(L) is the number of buffers of the Type-A switches needed to avoid deadlocks, and N = L log L, where L is the number of loops; the worst-case delay to perform a one-to-one mapping is:
$$T_{max}(L) = 2^{[0.5\log N]} + N/2^{\lfloor 0.5\log N\rfloor} - \log L - 1 = O(N^{0.5})$$
and the average delay is:
$$T_{avg}(L) = O(\log N)$$
These results can be found in reference [23]; because both B(L) and Tmax(L) are intolerably large for large values of N, we do not intend to include the analysis in this dissertation. As is shown by our simulation results (Fig. III.6), the performance of the LSSN is not as good as that of the baseline network for applications which have very short inter-transmission times; but if the transmitters are processors which send out data packets as the results of instruction executions (i.e., if the processors compute and dispatch alternately), then the LSSN will be an attractive design. Our proposed system trades off external hardware complexities (e.g., component counts, wiring, etc.) against internal hardware complexities (i.e., logic gates per switch).
High internal complexities can be easily achieved with today's technologies, but too many external components and too much wiring often render the system difficult to manage; this is the main motive behind our design.
Fig. III.5. Effect of buffer size on the throughput and delay of a 64x64 LSSN. Throughput (packets/sec) and delay (sec) are plotted against inter-arrival time (sec), with f the network frequency. Curves: a, 64x64 LSSN (buffer=14/14/2); b, 64x64 LSSN (buffer=7/7/2); c, 64x64 LSSN (buffer=4/4/2). Note: the numbers in brackets refer to the sizes of the Class-0, Class-1 and Class-2 buffers respectively.
Fig. III.6. The throughput rates of a 64x64 baseline (64 Tr's x 64 Rr's, 192 switches), a 64x64 LSSN (64 Tr's x 64 Rr's, 32 switches) and a 16x16 baseline (16 Tr's x 16 Rr's, 32 switches), versus the inter-arrival time.
Fig. III.7. The delay curves of a 64x64 baseline, a 64x64 LSSN and a 16x16 baseline, versus the inter-arrival time.
Chapter IV. Design and Evaluation of The Event-Driven Computer (EDC)
1. Introduction
1.A.
Background Information
In this chapter, we will examine how the concept of cyclical architectures could be applied to the design of a high-performance supercomputer; to start with, we will discuss the shortcomings of the conventional computer systems in this respect, and then the approach of our design will be identified. Conventional computer systems are often referred to as Von Neumann, or sometimes as Harvard, machines, and they all have very similar "control" and "data" mechanisms. "Control" mechanisms refer to the methods for scheduling instructions for execution, and "data" mechanisms refer to the methods for passing data among instructions. Von Neumann computers are termed "control-driven" because their instruction executions are sequenced by control signals generated by the CPUs (Central Processing Units). In these computers, data are passed among instructions by writing and reading memory locations which are specifically assigned to these data.
A major drawback of using Von Neumann computers in a highly parallel environment is attributable to the need for, and difficulties in, specifying concurrency: either the programmer or the compiler has to be careful with the generation of control signals, to ensure that memory locations are not corrupted by wrong information during read and write operations. Such a drawback is easy to overcome in a SIMD (Single Instruction stream Multiple Data streams [6]) system because there is only a single stream of executable instructions; whereas in a MIMD (Multiple Instruction streams Multiple Data streams) system, the asynchronous behavior of memory accesses among the various instruction streams often complicates the implementation of the control mechanisms. Another major drawback is attributable to the difficulty in "load partitioning", which often gives rise to uneven work load distributions among the processors and access bottlenecks in the memory modules. Moreover, system extensibility is always difficult to achieve, and increments in the number of processors are often not accompanied by proportionate improvement in the system performance.
Research in "data-driven" computers [4] is aimed at alleviating the above shortcomings, and both their data and control mechanisms are implemented very differently from those of Von Neumann systems: the data mechanism is such that data are passed from the producing instructions to the consuming ones directly without going through any intermediate storage, and the control mechanism is such that the consuming instructions are readied for execution if, and only if, they have received all the required data and information. For a data-driven computer, the control mechanism could therefore be implemented as suboperations incorporated into its data mechanism, which could be easily implemented using well-known compilation techniques (e.g., data-flow analysis). The absence of an explicit control mechanism would ease the task of the programmer in specifying parallelism to a great extent. As a result, the data-driven approach is very appealing to multiprocessing and multiprogramming environments which contain large amounts of unstructured, asynchronous concurrency. However, the implicit control mechanism of data-driven systems does not conform to the notion of certain activities such as input and output operations, which are not necessarily ready for execution when their data have arrived. Further, the data mechanism of data-driven systems is very inefficient in handling large arrays, because sending arrays among instructions for computation is both time and space consuming.
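The data-driven firing rule described above (an instruction is readied for execution if, and only if, all of its operands have arrived) can be illustrated with a minimal sketch. It is a generic token-passing interpreter written for illustration only, not the mechanism of any particular machine discussed here; the names `Node`, `run` and `demo` are hypothetical.

```python
from collections import deque


class Node:
    """One instruction of a data-flow graph."""

    def __init__(self, name, func, arity):
        self.name = name
        self.func = func          # the operation to perform when fired
        self.arity = arity        # number of operands required
        self.operands = {}        # operand slot -> value received so far
        self.consumers = []       # (node, slot) pairs that receive the result

    def receive(self, slot, value):
        """Store an operand; return True when the firing rule is satisfied."""
        self.operands[slot] = value
        return len(self.operands) == self.arity


def run(tokens):
    """Deliver (node, slot, value) tokens until none remain; return results by name."""
    pending = deque(tokens)
    results = {}
    while pending:
        node, slot, value = pending.popleft()
        if node.receive(slot, value):
            args = [node.operands[i] for i in range(node.arity)]
            result = node.func(*args)
            results[node.name] = result
            # The result is passed directly to the consuming instructions.
            for consumer, consumer_slot in node.consumers:
                pending.append((consumer, consumer_slot, result))
    return results


def demo():
    """Evaluate (2 + 3) * (7 - 4) with operands arriving in interleaved order."""
    add = Node("add", lambda a, b: a + b, 2)
    sub = Node("sub", lambda a, b: a - b, 2)
    mul = Node("mul", lambda a, b: a * b, 2)
    add.consumers.append((mul, 0))
    sub.consumers.append((mul, 1))
    return run([(add, 0, 2), (sub, 0, 7), (add, 1, 3), (sub, 1, 4)])
```

No program counter appears anywhere in the sketch: the order of execution is determined solely by the arrival of data, which is why the add and subtract nodes could fire in either order, or concurrently on separate processors.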
From the above discussions, it is clear that the data-driven and control-driven approaches complement each other; therefore, it is very natural to envision a class of computers which combine their control and data mechanisms for the purpose of better performance.
1.B. Recent Developments
This section will examine three existing proposals which adopt the combined approach:
(1) Dependence-Driven system (1981) [32];
(2) Combined system (1982) [33];
(3) Piece-wise Data-Flow system (1983) [34].
The Dependence-Driven system is made up of a GCU (Global Control Unit) and several processor clusters, each capable of executing a high-level function. The compiler is expected to produce all the static information about the computation, and the GCU will perform run-time scheduling. This system is best suited for computation which could be heavily vectorized; however, the presence of scalar computation would cause some processing resources to stand idle during its execution, thus giving rise to under-utilization of these resources. The Combined system integrates the concepts of "pure" data-driven computation and those of "multi-thread" control-driven computation.
Treleaven et al. [33] have shown how iterations, procedure calls and resource management are carried out on this system; not mentioned is how array operations are performed. If array operations are decomposed into individual packets, each containing a participating array element, then there will be an enormous amount of overhead associated with setting up and transmitting the packets, and with synchronizing the completion of the array operations. Requa et al. [34] have provided a rather detailed description of the PDF system, which possesses both SIMD and MIMD characteristics; these two classes of computation are performed on different types of hardware modules which are not interchangeable. We believe that if both scalar and array operations could be carried out on the same type of hardware modules, then there would be fewer module types, and hence the system would be less expensive to design and easier to control. The PDF system avoids using any interconnection network by limiting the number of scalar processors to about eight; therefore, the speed of the PDF system is expected to show limited improvement over existing ones (please refer to the Introduction Section of [34]).
1.C.
Overview of Our Approach Our o b j e c t i v e i s t o design a heterogeneous m u l t i p r o c e s s o r system which, (1) i s capable of us i n g hundreds t o thousands of pr o c e s s o r s ; (2) has a p r o j e c t e d speed range of 100 to 1,000 MOPS ( m i l l i o n o p e r a t i o n s per second); ( 3 ) possesses both SIMD and MIMD c h a r a c t e r i s t i c s — t h i s i s to be achieved by combining the p r i n c i p l e s of da t a - d r i v e n and c o n t r o l - d r i v e n computation; ( 4 ) i s intended f o r ne x t - g e n e r a t i o n a p p l i c a t i o n s , and i s expected to depart from the p r e v a l e n t , Von Neumann a r c h i t e c t u r e s . 86 In order to connect a l a r g e number of p r o c e s s o r s together and yet ma i n t a i n i n g a high degree of f l e x i b i l i t y , a l a r g e s w i t c h i n g network would be i n c l u d e d i n our d e s i g n . To achieve the d e s i r e d speed range, the intended a p p l i c a t i o n s must possess a l a r g e amount of concurrency to keep the pr o c e s s o r s busy most of the time. Since the r a t i o of SIMD and MIMD i n s t r u c t i o n mix d i f f e r s from a p p l i c a t i o n to a p p l i c a t i o n , i n order to f u l l y u t i l i z e i t s r e s o u r c e s , the system must be abl e to maintain roughly the same l e v e l of performance r e g a r d l e s s of the r a t i o of mix. We a l s o b e l i e v e that i n order to a t t a i n a s i g n i f i c a n t achievement toward u l t r a - f a s t computation, the new design may have to depart from the p r e v a l e n t Von Neumann systems i n both hardware and software; t h e r e f o r e , we only emphasize the a r c h i t e c t u r a l aspects of our design r a t h e r than any immediate implementation. 
In our proposed system, there are two basic types of operations, scalar and compound, both of which are scheduled for execution using data-driven principles; suboperations within a compound operation, however, are sequenced for execution in a control-driven manner. A compound operation is either a computational array operation, an array alignment operation or a block of sequential program. A sequential program is one which either requires a fast computation time and could run faster when executed solely by a single processor in a SISD (Single Instruction stream Single Data stream) mode than by many of them in a MIMD mode (due to communications overhead), or is used to control an inherently sequential process such as printing on the line printer.

[Fig. IV.1. The EDC system block diagram.]

As shown in Fig. IV.1, an EDC consists of six basic parts: a Supervising Processor (SP); a bank of Local Memories (LMs); several Transmitting Processors (TPs) and Receiving Processors (RPs); a number of Instruction Registers (IRs); and a Packet Switching Network (PSN). The main duty of SP is to load and spread instructions and data into the bank of LMs, and to initiate the appropriate TPs to start execution, in both cases using the read/write links provided on the left of LMs and TPs. Compound operations (i.e., array computation, array alignments and executions of sequential programs) involve the use of these read/write links as well: when a TP receives such a compound operation, it becomes a subcontroller and requests SP either for the control of several TP-LM pairs for array computation in a SIMD manner, or to initiate the execution of a block of sequential program on another TP in a SISD fashion.

Information flowing on the right of LMs and TPs is encapsulated in the form of either result or instruction packets. Result packets are generated by TPs and are switched through PSN to RPs, which place them into the proper LMs. RPs are also responsible for the formation of instruction packets: they retrieve the executable instructions and data from LMs and buffer them in IRs to wait for free TPs for execution. Because each of the TPs will receive instructions from different instruction streams from time to time, the interpretation of "MIMD" in this case is somewhat different from the traditional one.

Other salient features of the EDC system include:

(a) Fewer module types: Only a few types of hardware modules are used, although each type is intended to be used in large numbers. This would reduce the design costs and give rise to a simpler architecture which is easier to control than one which uses many module types.
(b) Interleaved instructions and skewed arrays: A subset of the LMs is used to store sequential programs, which will be executed by their associated TPs using the read/write links in a SISD mode, while the majority of the LMs are meant for non-sequential programs, which will be executed in a MIMD mode by TPs using the packet-switched network. For the latter case, the instructions will be interleaved across the LMs concerned, thus randomizing and equalizing the access pattern of TPs and RPs; array elements will be skewed across these LMs using known storage techniques [36,37] which allow different portions of an array to be referenced concurrently. These features would reduce the problem of memory access bottlenecks.

(c) Overlapped array operations: While an array is being operated on using the read/write links, several array alignment operations could be carried out using the packet-switched network, which provides a novel way to synchronize the completion of the alignment operations and to signal those instructions dependent on them.

(d) Extensibility: The EDC architecture is highly extensible. After an EDC system has been built, the numbers of TPs, RPs, IRs and LMs could be increased incrementally. This advantage is attributable to the extensibility of the network.

A more detailed schematic diagram of an EDC is shown in Fig. IV.2. The functional descriptions of the system hardware will be given in Section 2, and Section 3 will explain how information is stored and processed in an EDC.
Section 4 will describe the nature of the programming language to be used, and Section 5 will examine the performance of an EDC. A comparison of an EDC with the three aforementioned designs and some suggested work will be given in Section 6.

[Fig. IV.2. The connection diagram of EDC hardware architecture. External devices shown: terminal, file, sensors, actuators, etc. Abbreviations: SP, Supervising Processor; SM, System Memory; CS, Channel Selector; LM, Local Memory; TP, Transmitting Processor; RP, Receiving Processor; IR, Instruction Register; PSN, Switching Network; T, number of transmitters; R, number of receivers; M, number of local memories.]

2. EDC Hardware Architecture

2.A. Processing Modules

(1) Supervising Processor (SP)

SP is the master controller of the whole system, and it oversees the following activities:

(a) Program downloading and initialization: Programs are loaded from external sources such as the host computer or bulk memories, and stored initially in the System Memory (SM). When a program is called for, SP will access a storage utilization table (SUT), which is located in SM, and allocate free memory pages to the called program, which will then be fetched from SM and loaded into LMs. At the end of loading, SP will signal the TPs concerned to start execution.
(b) Input and output operations: Input data will first go to the input buffer located in SM, and then proceed to the appropriate LMs. If the various parts of an array are to be referenced independently and concurrently, then the array will be skewed across the first R LMs using techniques described by Budnik et al. [36,37] for conflict-free accesses; the storage pattern will then be recorded in an array description table (ADT) located in SM for future reference. Output data will be transferred from LMs to the output buffer, which is also located in SM, and then to the outside devices.

(c) Process and resource management: SP also handles requests for process creations, interactions and terminations, procedure calls, and the use of memories as well as other resources. A request list (RL) is maintained by SP to enqueue those requests that cannot be honored immediately.

(d) Setting up of compound operations: Scalar operations need not go through SP and are executed by TPs autonomously, whereas compound operations have to be set up by SP.
If the compound operation is an array operation, then SP will request a subset of the first R TPs to perform the operation under the demand and control of a subcontroller TP(j), where j > R. In the case of a block of sequential program, SP will load it into LM(k), where R < k <= M, and request TP(k) to execute it. In the former case, the choice of TPs will be specified by the subcontroller TP(j) according to the compound operation it has received; in the latter case, the choice is arbitrary. The Channel Selector (CS) will be set up by SP to realize the above connections.

(e) Other operating system tasks: SP may either execute these tasks directly, or regard them as application tasks and assign them to TPs. The choice depends on the nature of the O.S. tasks.

(2) Receiving Processors (RPs)

There are R RPs connected to the receiving side of the network PSN. An RP will continuously remove the arriving result packets from the network and update the contents of the LMs accordingly. The formats of the various types of result packets are listed in Table IV.6.
An RP will respond to the content of a result packet as follows:

(a) If it is an array element, then it will be stored into the memory location specified by its destination address;

(b) If it is a scalar operand, the base address of an array or a signalling token, then the RP will update the instruction word given by the destination address of the packet, and will then examine whether that instruction has received all the required information; if it has, then the instruction will be placed in an instruction register (IR) to wait for a free TP for execution (the selection of IR-TP pairs will be given in Section 2.B(3)); otherwise no further action will take place.

(3) Transmitting Processors {TP(1) to TP(R)}

This group of TPs will execute both scalar and array operations. Any free TP belonging to this group will continuously check its associated instruction registers for the addresses of executable instructions. If IR(k) contains one, then TP(k) will fetch the corresponding instruction from LM(k) and execute it. The computed results, together with the addresses of the next instructions, will be packaged into result packets which are then forwarded to the network for distribution.

A subset of these TPs may undergo an array operation under the control of TP(i), where i > R.
When TP(i) receives such a compound operation, it will generate and broadcast the control signals to these TPs via the Channel Selector (CS). As soon as these TPs have finished their current activities, they will respond by fetching the array elements from their LMs according to the broadcast signals. If the array operation is

(a) a computational activity, then these TPs will operate on the elements and then store the results back to the memories using the read/write links;

(b) an alignment operation, then these TPs will package the elements into result packets and forward them to the network for alignment. After the last element has been sent out, some of these TPs will be requested by SP to generate a synchronization token which will be forwarded to the network to indicate the end of transmission (this synchronization process will be described in Section 2.C(2)).

If a TP is not involved in, or has just completed, an array operation, it will resume its normal activities as described at the beginning of this subsection.

(4) Transmitting Processors {TP(R+1) to TP(T)}

The main function of these TPs is to execute scalar operations; those with LMs may be requested by SP to execute sequential programs as well.
Any free TP belonging to this group will continuously check its associated IR for executable instruction packets. Unlike the previous group, these TPs require that the actual instructions (i.e., the opcodes, immediate operands and addresses of next instructions) be available in the IRs, rather than the addresses of the instructions, because these TPs do not have direct read/write links to access the first R LMs where the non-sequential programs are stored. The results computed by these TPs will be packaged into result packets which will then be forwarded to the network for distribution.

To initiate the execution of a sequential program, SP will select any free TP-LM pair of this group; the program will be loaded into the LM, and the associated TP will be requested to execute it. Upon completion, that TP will either signal SP or produce a result packet to trigger other instructions via the network.

The number of TPs could be larger than or equal to that of LMs, depending on the speeds of the various hardware modules and the intended applications.

2.B. Storage Modules

(1) System Memory (SM)

The aforementioned input and output buffers are located in SM, which also contains application programs as well as system software such as I/O routines and interrupt service routines.
While in SM, all the addresses of a program will remain in relative form so that the program is relocatable; when copied into LMs, these relative addresses will be translated into absolute ones by the TPs connected to the LMs, using the base address provided by SP. If a program is to be called repeatedly, then a copy of it will be kept in SM for replication purposes.

SM also contains the aforementioned tables, namely the storage utilization table (SUT), the array description table (ADT) and the request list (RL), as well as a linkage information table (LIT) which provides the linkage information between a calling program and its called programs.

(2) Local Memories (LMs)

LM(1) through LM(R) are used to contain interleaved instructions and skewed arrays. Their left ports are connected to SP and TPs, while their right ports are connected to the RPs. Contentions between RPs and TPs could be resolved by granting their requests in an alternating manner.

LM(R+1) through LM(M) are used to store sequential programs which are to be executed solely by the associated TP.

At times SP will interrupt the above activities for the loading and unloading of programs; such interference could be reduced by increasing the size of the LMs so that most of the frequently needed programs could reside in them.
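As an illustration of the interleaved and skewed storage in LM(1) through LM(R), the following minimal Python sketch maps instructions and array elements to LMs. It assumes simple modulo interleaving and a linear skew of one LM per row; the function names, the skew value and the choice of R = 5 are hypothetical, and the actual storage scheme of [36,37] may differ.

```python
def interleaved_location(instr_index, num_lms):
    """Return (LM number, offset) for the instr_index-th instruction (0-based).

    LMs are numbered from 1, as in LM(1)..LM(R). Modulo interleaving is an
    assumption: consecutive instructions land in consecutive LMs.
    """
    return instr_index % num_lms + 1, instr_index // num_lms


def skewed_lm(row, col, num_lms, skew=1):
    """Return the LM number holding array element (row, col), 0-based indexes.

    Assumed linear skew: each row is shifted by `skew` LMs relative to the
    previous row. With skew=0 the array is stored without skewing.
    """
    return (row * skew + col) % num_lms + 1


def conflict_free(cells, num_lms, skew=1):
    """True if every element in `cells` falls in a different LM."""
    lms = [skewed_lm(r, c, num_lms, skew) for r, c in cells]
    return len(set(lms)) == len(lms)


R = 5  # hypothetical number of LMs holding non-sequential data
N = 4  # hypothetical N x N array
row0 = [(0, c) for c in range(N)]
col0 = [(r, 0) for r in range(N)]
diagonal = [(k, k) for k in range(N)]

if __name__ == "__main__":
    for name, cells in (("row", row0), ("column", col0), ("diagonal", diagonal)):
        print(name,
              "skewed:", conflict_free(cells, R, skew=1),
              "unskewed:", conflict_free(cells, R, skew=0))
```

Under these assumptions a row, a column and the main diagonal of the array can each be fetched with one access per LM, whereas without skewing all elements of a column fall into the same LM.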
(3) Instruction Registers (IRs)

IRs serve as buffers between RPs and TPs. As mentioned in Section 2.A(3) and (4), IR(1) through IR(R) contain only the addresses of executable instructions, while IR(R+1) through IR(T) contain the actual instructions; therefore, the buffering capacities of these two groups of IRs are different.

Associated with each IR are two single-bit flags: the "Full/Not-Full" flag, which indicates the status of the IR, and the "Autonomous/Slave" flag, which indicates the operating mode of the connected TP. An autonomous TP is one which is ready to accept, or is currently executing, instructions from IRs, while a slave TP is one which is undergoing a compound operation under the control of another processor.

To schedule an executable instruction, RP(i) will examine the flags of IR(i+n*R) in order of increasing n, where n is a non-negative integer; the first IR which is not full and is connected to an autonomous TP will receive the instruction packet.

2.C. Switches

(1) Channel Selector (CS)

CS enables SP to select any of the TP-LM pairs to perform the activities mentioned in Section 2.A(1), namely program loading, input and output activities, and the setting up of compound operations. The implementation of CS is quite straightforward and hence will not be discussed in this dissertation.
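The IR selection rule of Section 2.B(3), in which RP(i) examines IR(i+n*R) for n = 0, 1, 2, ..., can be sketched as follows. This is a minimal Python sketch rather than the hardware logic: the function name select_ir, the use of sets to hold the two flags, and returning None when no IR qualifies are assumptions made for illustration (the text does not say what happens in that case).

```python
def select_ir(i, R, T, full, slave):
    """Return k such that IR(k) receives the instruction packet from RP(i).

    i     : index of the receiving processor, 1 <= i <= R
    R, T  : number of RPs and number of TPs/IRs
    full  : set of IR indexes whose "Full/Not-Full" flag reads Full
    slave : set of IR indexes whose TP is in Slave mode
    Returns None if every candidate IR is full or attached to a slave TP
    (assumed behaviour; a real RP would presumably retry).
    """
    if not 1 <= i <= R:
        raise ValueError("RP index must lie in 1..R")
    n = 0
    while i + n * R <= T:          # candidates IR(i), IR(i+R), IR(i+2R), ...
        k = i + n * R
        if k not in full and k not in slave:
            return k               # first not-full IR with an autonomous TP
        n += 1
    return None


if __name__ == "__main__":
    # Hypothetical sizes: 4 RPs, 10 TPs. RP(2) may use IR(2), IR(6), IR(10).
    print(select_ir(2, R=4, T=10, full={2}, slave={6}))
```

In this example IR(2) is full and TP(6) is a slave, so the packet goes to IR(10).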
(2) Packet Switching Network (PSN)

Other conventional packet switching networks could be used in place of PSN, but they require at least (N/2)logN switches for an (NxN) connection, whereas PSN uses only (N/2) switches; therefore, PSN is attractive when N is very large. PSN is a modified version of the Loop-Structured Switching Network (LSSN) described in Chapter III, and its functions are:

(a) to deliver result packets from TPs to RPs and LMs;

(b) to perform hardware synchronization to signal the completion of array alignments.

It is the second function above which distinguishes PSN from LSSN. The topology and addressing scheme of PSN are the same as those of LSSN; but in order to perform hardware synchronization on the network, the PSN switches have to be different from the LSSN switches. Fig. IV.3 illustrates the schematic diagram of a PSN switch.

[Fig. IV.3. The schematic diagram of a PSN switch. Labelled parts: links from Transmitting Processors, loop, stage, input ports, left and right buffer pools by class, Synchronization Stations, intermediate ports, output ports, and output links to Receiving Processors.]
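To make the switch-count comparison at the start of this subsection concrete, the short Python sketch below evaluates (N/2)logN and N/2 for a few network sizes. It assumes that the logarithm is base 2 and that N is a power of two (the text does not state the base); the function names are hypothetical.

```python
def log2_exact(n):
    """Return log2(n) for n a power of two (n >= 2), else raise ValueError."""
    if n < 2 or n & (n - 1):
        raise ValueError("N must be a power of two and at least 2")
    return n.bit_length() - 1


def conventional_switches(n):
    """Lower bound (N/2) * log2(N) quoted for conventional networks."""
    return (n // 2) * log2_exact(n)


def psn_switches(n):
    """Switch count N/2 quoted for PSN."""
    return n // 2


if __name__ == "__main__":
    for n in (16, 256, 1024, 4096):
        print(n, conventional_switches(n), psn_switches(n))
```

Under the base-2 assumption the saving grows by a factor of log2(N), which is why the advantage matters mainly for very large N.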
In general, a result packet sent out by a transmitting processor has the following format:

<Feedback Count; Destination Address; Result Type; Result>

When a result packet enters the input port of a switch, it will be placed into the Class-i buffer inside the switch according to its feedback count i ∈ {0, 1, 2}, which is set to zero when the packet is initially generated, and is incremented whenever the packet goes through the feedback path. In Fig. IV.3, all types of result packets except the Synchronization packets will bypass the Synchronization Stations when they emerge from the buffer pools. A packet coming out of the Class-2 buffer will be forwarded directly to the output port as soon as the latter becomes empty; packets coming out of the Class-0 and Class-1 buffers will be switched to an intermediate port to wait for their turn to be transferred to the output port. For a switch located in the s-th stage, the direction of switching is determined by the s-th bit of the destination address of the packet: if it is "0", the packet will be switched to the left intermediate port; otherwise to the right one.
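The packet format and the per-switch routing decision described above can be sketched in Python as follows. This is an illustrative model only: the class and function names are hypothetical, stages are assumed to be numbered from 0, and the "s-th bit" is assumed to be counted from the least significant bit of the destination address, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ResultPacket:
    feedback_count: int   # 0 when generated, +1 on each pass through the feedback path
    destination: int      # destination address
    result_type: str      # e.g. "scalar", "array element", "synchronization"
    result: Any


def buffer_class(packet):
    """Class-i buffer that receives the packet inside a switch (i = feedback count)."""
    if packet.feedback_count not in (0, 1, 2):
        raise ValueError("feedback count must be 0, 1 or 2")
    return packet.feedback_count


def next_port(packet, stage):
    """Port a (non-synchronization) packet is sent to when leaving its buffer.

    Class-2 packets go straight to the output port; Class-0 and Class-1
    packets go to the left or right intermediate port according to bit
    `stage` of the destination address (bit numbering is an assumption).
    """
    if buffer_class(packet) == 2:
        return "output"
    bit = (packet.destination >> stage) & 1
    return "left" if bit == 0 else "right"


def take_feedback_path(packet):
    """Record one pass through the feedback path."""
    packet.feedback_count += 1


if __name__ == "__main__":
    p = ResultPacket(0, 0b0110, "scalar", 3.5)
    print(next_port(p, stage=0), next_port(p, stage=1))
```

Synchronization packets are deliberately left out of this sketch, since they are held at the Synchronization Stations rather than routed immediately.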
Because of the similarities between the topologies of PSN and LSSN, the theorems developed for LSSN are also applicable to PSN. Theorem III.1 has shown that, for a network with L loops, the maximum number of switches that any packet would have to go through in order to arrive at its destination is (2logL-1). Consider the case in which a packet is admitted at the last stage of PSN and has to go through the maximum number of switches, (2logL-1); this packet will be removed from PSN when it emerges from a Class-2 buffer located in the (logL-2)th stage, which is the furthest destination any packet will have to reach regardless of where it originated. The significance of this observation will become obvious when we discuss the method of hardware synchronization on PSN. Another important property of PSN, as revealed by Lemma III.2, is that any packet which has already acquired a feedback count of 2 will always remain in the same loop for any of its further routing steps; this explains why packets coming out of the Class-2 buffers in Fig. IV.3 need not go through the intermediate ports.

The purpose of the Synchronization Stations is to achieve the effect of hardware synchronization on PSN, i.e., to signal the completion of array alignment operations so that other computation dependent on these operations may proceed. After all the elements involved in an alignment operation have been dispatched to PSN, each of the first L TPs (i.e., those TPs connected to the first stage of PSN) will be requested, by either SP or the subcontroller of the alignment operation, to forward a synchronization token in the form of a result packet. These packets will be treated much the same as other result packets, except that they will be retained by the Synchronization Stations when emerging from the buffer pools: a synchronization packet retained by the left (right) Class-i station has to wait for the arrival of another synchronization packet in the right (left) station of the same class, and then both packets will proceed to the intermediate and output ports in a straight-through manner. Such a scheme would ensure that the synchronization packets always lag behind the array elements which they are trailing, and that when these packets arrive at the Class-2 Synchronization Stations of the (logL-2)th stage, all the array elements concerned must have been delivered to their destinations (as explained in the previous paragraph).

Upon the arrival of the synchronization packets, the Class-2 Synchronization Stations of the (logL-2)th stage will transform them into signalling tokens by resetting their feedback counts to zero and changing their result types (please refer to Table IV.6 for their formats); these signalling tokens will be retransmitted to trigger those instructions dependent on the completion of the array alignment operation, and their destination addresses are those originally carried by the synchronization packets.

When a result packet arrives at an output port of a PSN switch, its destination address will be matched against that of the output link connected to the port.
If a match occurs, then the RP connected to that link will be strobed and the result packet will be handed over to it; otherwise the packet will be forwarded to the switch situated at the other end of the link.

More details of PSN can be found in Chapter III, which also explains how to expand the network incrementally, a significant advantage of PSN over other conventional networks. Although a PSN switch has a much more complex internal structure than a conventional binary switch, the savings in the number of switches as well as in external wiring will offset this disadvantage when the network is large. With today's technologies, a high internal complexity can be easily achieved, but if a system involves too many external components, it will still be difficult to manage.

3. EDC Information Structure

3.A. Machine Instruction Formats

(1) Format for sequential programs: It is similar to that of conventional computer systems, and is arranged as one double-byte of opcode followed by one or more double-bytes of operands.

[Diagram: a 16-bit opcode word followed by operand words.]

(2) Format for non-sequential programs: It is used to encode scalar operations and encapsulated compound operations, and is made up of eight double-bytes which are divided into 4 fields:

    (a) Opcode field                 16 bits
    (b) Control Information field    16 bits
    (c) Operand field                6 x 16 bits, shared with (d)
    (d) Next Instruction field

The "Opcode" and "Control Information" fields are of one double-byte each, while the "Operand" and "Next Instruction" fields share the remaining six double-bytes.

(a) "Opcode" field: Tables IV.1 and IV.2 show the four categories of scalar and compound operations respectively, along with some typical examples and their data-flow graphs.

(b) "Control Information" field: It is further divided into five subfields:

    (1) Result type           3 bits
    (2) Format type           4 bits
    (3) #Operands Required    3 bits
    (4) #Tokens Required      3 bits
    (5) #Tokens To Go         3 bits

The subfields are marked in the diagram as either constant or variable.

(1) "Result type": Specifies whether the computed result will be of single or double precision, a numerical or boolean value, or a signalling token.

(2) "Format type": Specifies the type of format used to accommodate operands and the addresses of those instructions dependent on the current instruction.

(3) "#Operands Required": Specifies the number of operands needed by the instruction.

(4) "#Tokens Required": Equals "#Operands Required" plus the total number of signalling tokens needed.
(5) "#Tokens To Go": Equals "#Tokens Required" minus the number of tokens received. When an RP receives an operand or a signalling token, it will decrement the "#Tokens To Go" of the receiving instruction; when this value reaches zero, the receiving instruction will be placed into an instruction register (IR) to wait for execution.

(c) & (d) "Operands" and "Next Instruction" fields: The various types of formats used by scalar and compound operations are listed in Tables IV.3 and 4 respectively. These simple formats will meet almost all the computational needs; otherwise new formats could be added if necessary (a total of 4 bits are assigned to the "Format Type" field, which could account for 16 formats). In Table IV.3, "Opi" refers to the i-th operand of an instruction and "Nextj" refers to the address of the j-th next instruction, and "NextT" and "NextF" are the addresses of the next instructions when the result of a boolean operation is "True" and "False" respectively. Format No.8 is useful for those operations such as "Duplicate" and "Wait" which do not carry embedded operands.

In Table IV.4, "No. of elements" refers to the total number of array elements involved in the array operation, and "Stride" is the difference in the indexes of two neighboring array elements which take part in the operation. Both the "No. of elements" and "Stride" are obtained from the loop control statements such as "DO I=1,64,2" or "FOR I=1 to 64 step 2 DO".
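Returning to the "Control Information" field described above: its five subfields add up to exactly one double-byte (3+4+3+3+3 = 16 bits), and the firing rule amounts to counting "#Tokens To Go" down to zero. The following Python sketch illustrates both points. The function names are hypothetical, and placing the first subfield in the most significant bits is an assumption, since the bit order is not stated in the text.

```python
from typing import Dict, Tuple

# Subfield widths in bits, in the order listed in the text (3+4+3+3+3 = 16).
# Putting the first subfield in the most significant bits is an assumption.
WIDTHS = (
    ("result_type", 3),
    ("format_type", 4),
    ("operands_required", 3),
    ("tokens_required", 3),
    ("tokens_to_go", 3),
)


def pack_control(fields: Dict[str, int]) -> int:
    """Pack the five subfields into one 16-bit control word."""
    word = 0
    for name, width in WIDTHS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_control(word: int) -> Dict[str, int]:
    """Split a 16-bit control word back into its five subfields."""
    fields = {}
    for name, width in reversed(WIDTHS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


def receive_token(word: int) -> Tuple[int, bool]:
    """Decrement "#Tokens To Go" and report whether the instruction is ready."""
    fields = unpack_control(word)
    if fields["tokens_to_go"] == 0:
        raise ValueError("instruction has already received all its tokens")
    fields["tokens_to_go"] -= 1
    return pack_control(fields), fields["tokens_to_go"] == 0
```

An instruction that needs two operands and one signalling token would start with "#Tokens To Go" equal to 3, and the third call to `receive_token` would report it as ready to be placed into an IR.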
In Table IV.4, "(V1)" is the base address assigned to the resulting vector V1, and "(V2)" and "(V3)" are those of the input vectors V2 and V3, respectively. All compound operations except those "Reduction" ones would produce vectors which are too expensive (in terms of time and space) to be sent to each and every instruction requiring them; therefore, only the base addresses of the vectors will be sent. As for "Reduction" operations such as summation and product, their scalar results would be treated much the same as those produced by scalar operations.

Although the formats shown in Tables IV.3 and IV.4 have limited numbers of "Next Instruction" fields, their actual fan-outs could be extended indefinitely by having one or more of their "Next Instruction" fields point to a number of "Duplicate" operators.

3.B. Packet Formats

There are two classes of packets that exist in EDC, namely,

(a) Instruction packets: They flow from RPs to TPs and reside in IRs while waiting for execution. (Please refer to Table IV.5.)

(b) Result packets: They are produced by TPs and are forwarded to RPs via the network PSN. (Please refer to Table IV.6.)

3.C. Program Organization

The EDC program organization is similar to those of the existing computer systems.
Both the application and system software are made up of three types of program components:

(1) Main programs: They are activated via external means such as the console and are not to be called by other program components.

(2) Procedures: They are activated by explicit calls from the program components. The calling programs use "Call" and "Distribute" operators and the called programs use "Distribute" and "Return" operators for parameter passing. As depicted in Fig. IV.4, when the "Call" instruction has gathered all its input tokens, it will be dispatched by an RP to a free TP, which will then request the program code from SP. If the program code does not exist in LMs, then SP will allocate free memory pages to it and load it from SM to LMs, and its starting memory location will be returned to the requesting TP, which will then proceed with other computations. When the "Return" operator is executed, all the computed results will be routed back to the calling program and the memory pages assigned to the called program will be released.

[Figure: data-flow graphs of the calling program M, with the operators CALL (P;a,b;M.j) and DISTRIBUTE (M.j;c), and of the called program P, with the operators DISTRIBUTE and RETURN]

Fig. IV.4. Parameter passing between the calling program M and called program P. "a" and "b" are the input parameters and "c" is the returned result, and "j" is the return address.
(3) Task programs: They are used to protect shared data and/or physical resources so as to ensure their proper use. A task program consists of one or more entry points whereby other programs could send data or signals to it, and therefore it is a means of providing communications and interactions among the various types of program components. The implementation of parameter passing between a task and the calling programs is very much the same as that of procedure calling; the major difference is that a procedure is activated by an explicit call while a task program is activated when the program which declares it comes into existence; also, a procedure terminates when the computed results are returned to the callers, whereas a task program may continue to serve other callers until an explicit termination statement is encountered, or the program which declared it has terminated.

e.g.
Task t;
  Accept A(x: Real) Return (y: Real);
  End A;
End t;

[Figure: callers interacting with task t during the execution of entry A]

Fig. IV.5. The interactions between calling programs and a task program.

The example of Fig. IV.5 shows a task with a single execution path; but in general, a task could be "multi-threaded", i.e., made up of several concurrent execution paths.
The advantages of using task programs instead of low-level concurrency primitives such as semaphores [42] in handling inter-program activities are ease of use and clarity. Furthermore, the implementation of tasks conforms to the principle of data-driven computation. Compared to other high-level constructs, a task is quite different from the "monitor" of Concurrent Pascal [38] but very similar to the "task" of Ada [44].

3.D. Data Structure

We only discuss arrays in this paper, although some other more complicated structures [49,68] may also be considered in our design. The handling of arrays in an EDC is illustrated in Fig. IV.6.

[Figure: the local memories and the System Memory]

Fig. IV.6. The physical and logical arrangements of EDC memory system.

The first R LMs are logically divided into pages as shown. LM(1) to LM(R) are used to store skewed arrays so that they may be processed concurrently by TP(1) through TP(R). However, for reasons of efficiency or algorithmic constraints, an array may not be skewed but instead will be either loaded entirely into a local memory LM(k) and processed by TP(k), or divided among several TP(k)-LM(k) pairs, where k>R. The decisions concerning these arrangements could be made either statically at compile time, or dynamically by SP at run time.
The array description table (ADT) and storage utilization table (SUT) must always be updated to reflect the storage patterns.

3.E. Process and Resource Management

In the EDC environment, a process is defined as either a main or task program in execution, and those procedures called and data structures owned by the program are regarded as parts of the process. The treatment of process creations and terminations is very similar to that of procedure calls: the request to create a process will first be placed on the Request List (RL) until it is removed by SP, which will then assign an unused identification number (ID) from the linkage information table (LIT) and free memory pages from the storage utilization table (SUT) to the process; SP will then load the memories allocated with the program code and initialize it to run. When the process terminates, SP will again update SUT and LIT accordingly.

As illustrated in Fig. IV.7, the management of hardware and/or software resources could be implemented conveniently using a task program.
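A resource manager of this kind might be pictured with the rough Python model below. It is only a sketch under stated assumptions, not EDC code: the count of free resources is touched only while a single signalling token is held, so one request at a time modifies it, and requests are examined in arrival order. The class and method names are hypothetical, and the handling of an "Acquire" request that arrives when nothing is free (it simply stays queued until a "Release" is honored) is an assumption, since the text does not specify it.

```python
from collections import deque


class ResourceManagerSketch:
    """Toy model of a task program guarding a count N of free resources."""

    def __init__(self, free_count):
        self.free_count = free_count   # the guarded memory location "N"
        self.token_present = True      # the single signalling token
        self.requests = deque()        # pending (kind, process_id) requests

    def submit(self, kind, process_id):
        """Enqueue an "acquire" or "release" request from a process."""
        if kind not in ("acquire", "release"):
            raise ValueError(f"unknown request kind: {kind}")
        self.requests.append((kind, process_id))

    def step(self):
        """Fire "Select" once: it needs the token plus at least one request."""
        if not self.token_present or not self.requests:
            return None
        self.token_present = False     # enter the critical region
        ack = None
        # First-come-first-served among the requests that can be served now.
        for index, (kind, process_id) in enumerate(self.requests):
            if kind == "release" or self.free_count > 0:
                break
        else:
            index = None               # only unservable "acquire" requests
        if index is not None:
            kind, process_id = self.requests[index]
            del self.requests[index]
            if kind == "release":
                self.free_count += 1
                ack = ("released", process_id)
            else:
                self.free_count -= 1
                ack = ("acquired", process_id)
        self.token_present = True      # "End Select" returns the token
        return ack
```

In this single-threaded model the token flag is purely illustrative; in a real concurrent setting it is the token that serializes entry into the critical region.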
The number of unused resources of a particular type (e.g., the number "N" of Fig. IV.7) is stored in a memory location which can only be accessed from within the critical region enclosed by the "Select" and "End Select" operators. In order to prevent malicious accesses to that memory location, only one request at a time would be allowed to enter the critical region to modify the number of the resources, and this is achieved with the use of a signalling token as shown. The content of that memory location is incremented whenever a "Release" request is honored and decremented whenever an "Acquire" request is granted.

[Figure: "Release" and "Acquire" requests from processes and a signalling token enter the task program; between SELECT and END SELECT, the (Acquire) branch performs N:=N-1 and the (Release) branch performs N:=N+1; "Acquired" and "Released" acknowledgements are returned to the processes]

Fig. IV.7. The implementation of a resource manager using a task program.

The "Select" operator used in the resource manager of Fig. IV.7 does not conform faithfully to the data-driven principles, because its execution is triggered by the arrival of the signalling token plus at least one request, not necessarily all of them. If there are several requests at the same time, then they will be enqueued when they arrive, and the selection policy for these requests could be either first-come-first-served or priority-based, depending on the implementation.

4. EDC Programming Language Structure

A program that runs on EDC takes the form of either a main program, a procedure or a task program, and it is composed of one or more "program blocks", which are collections of instructions that have no branching into or out of the blocks, except at the beginnings and endings. The advantages of using blocks are program clarity and that existing techniques of optimizing compilers could be used. The objective of this section is to present some useful ideas concerning the design of the EDC programming language, and to illustrate how some language constructs are compiled into data-flow graphs which could be easily translated into machine code using the formats presented in Section 3.

4.A. EDC Statements and Program Blocks

(1) Declaration statements: Most of them are used to assist the compiler in setting up the data-flow graphs and are not translated into executable operations. Exceptions are the declarations of task programs and arrays, which will be compiled into operations that will request SP for process creations and memory space respectively.
(2) Assignment statements: In a conventional sequential program, variable names could be used repeatedly to represent different entities in different parts of the program without causing much confusion; however, such a "convenience" would often lead to obscurities in a concurrent environment. In the EDC system, the Single Assignment Rule (SAR) is used to avoid such confusions whenever necessary. The SAR simply states that a variable name must not be assigned more than one value within its scope; when applied to data-flow graphs, it means that each arc of the graphs could have at most one source of origin.

(3) "Begin/End" block: The "Begin" and "End" statements will be compiled into "Wait" operators as demonstrated in Fig. IV.8. The "Wait" operator is a means of imposing dependencies among program blocks so as to achieve the desired sequentiality not explicitly expressed by their data dependencies.

e.g.
Begin
  ...
End;

[Figure: data-flow graph in which signalling token(s) and operand token(s) enter a "Begin: Wait" operator, pass through the program graph, and leave through an "End: Wait" operator]

Fig. IV.8. A "Begin/End" block and its data-flow graph.

(4) "IF" block: It will be compiled into a boolean plus a number of "Switch"ing operators which are used to direct the flow of input operands into either the "IF" or "ELSE" part of the program. Sometimes certain emanating arcs have to be "grounded" in order to discard the unused operands after the conditional test.

e.g.
IF (C1 ≤ C2) THEN
  ...
ELSE
  ...
END;

[Figure: data-flow graph of the "IF" block, with inputs C1 and C2]

Fig. IV.9. An "IF" block and its data-flow graph.

(5) "Match" block: It will be compiled into a series of boolean and "Switch"ing operators as shown in Fig. IV.10. The input operand "C" will be matched against all the comparands "C1, C2, ..." in parallel, and the "Switch"ing operators will direct the operands to the part of the program which has a successful match. An "Else" part should be provided in case all the matches fail.

e.g.
MATCH (C)
  CASE (C1) DO ...
  CASE (C2) DO ...
  ELSE ...
END;

Fig. IV.10. A "Match" block and its data-flow graph.

(6) "Loop" block: An EDC loop is different from the "For-all" and "Do-all" loops proposed in other data-driven languages [40]. Since parallel array computations in EDC are encoded as compound operations, it is not necessary to use "Loop"s for these computations; instead, loops are used for iterations and recurrence operations which exhibit data dependencies between two successive loop computations.

e.g.
LOOP WHILE (C1-C2) DO
  ...
  NEXT X := ...
  ...
  NEXT C1 := ...
  NEXT C2 := ...
  ...
LOOP EXIT
  XLAST := X;
END;

Fig. IV.11. A "LOOP" block and its data-flow graph.
As exemplified by Fig. IV.11, the data-flow graph of an EDC loop contains some arcs which have two sources, one outside the loop and another one within the loop body, implying that some variables are assigned more than once, thus violating the Single Assignment Rule. A common remedy [39,65] is to place prefixes such as "NEW" and "NEXT" in front of those variable names in question. Thus, "NEXT X" would be treated differently from "X" during a loop computation, but "NEXT X" will be updated as "X" at the boundary of two consecutive loop computations, and "NEXT X" will be an entirely new variable when the next loop computation commences. When the conditional test associated with a loop is satisfied, the loop will be exited and then some of the computed results will be passed to the exterior of the loop by assigning them to names that are not used within the loop (e.g. "XLAST" of Fig. IV.11).

(7) Prefix for sequentiality: As has been mentioned before, certain activities such as input and output are inherently sequential, and it is more convenient and efficient to execute their instructions in the order specified by the programs; the "SEQ"uential prefix is meant for such purposes. If an instruction block is prefixed with "SEQ", then signalling tokens would be used to enhance its data-flow graph to achieve the desired sequentiality. If an entire program is prefixed with "SEQ", then it will be compiled into a sequential program using the format mentioned in Section 3.A(1).

4.B. Language Constructs for Array Processing

Array operations are probably the richest source of synchronous parallelism and abound in scientific computations. This section will discuss how one-dimensional arrays are encoded and executed in EDC; arrays with a higher dimension will be reduced into one-dimensional arrays before their computations.

(1) Parallel Vector Operations: The range and stride of a parallel vector operation are indicated with the use of an "INDEX" set, which has to be declared prior to its use as follows:

e.g.
DECLARE I: INDEX 1..64 STRIDE 2
BEGIN
  C(I) := A(I) + B(I);
END;

[Figure: data-flow graph of the (ADD) operation on the input arrays (A) and (B), producing (C). Machine code: Opcode (ADD) | Control Infor. | #Elements =64 | Stride =2 | Address of Output Array =(C) | Address of Input Array =(A) | Address of Input Array =(B) | Next2 | Next1]

Fig. IV.12. The statement, data-flow graph and machine code of a parallel vector operation. The machine code format is as described in Table IV.4.

Both the range and stride indicated in an index are regarded as input operands to the array operation, and they could be either constants or variables to be determined at run time. The base addresses of the input arrays (i.e., (A) and (B) of Fig. IV.12) are received from the preceding operations, and that of the output array (i.e., (C) of Fig. IV.12) is obtained from the memory manager prior to the execution of the operation, and is sent to the succeeding operations as an input operand. The execution of such operations has been described in Section 2.A(1) and (3).

(2) Reduction Operations: This is another type of array operation frequently encountered, and there are six of them, namely, "SUMmation", "PRODUCT", "MAXimum", "MINimum", "AND" and "OR".

e.g.
DECLARE J: INDEX 1..1024
BEGIN
  xsum := SUM(x(J));
END;

[Figure: data-flow graph with input (X) feeding a (SUM) operator that produces XSUM. Machine code: Opcode (SUM) | Control Infor. | #Elements =1024 | Stride =1 | Address of Input Array =(X) | Next4 | Next3 | Next2 | Next1]

Fig. IV.13. The statements, data-flow graph and machine code of a reduction operation. The machine code format is as described in Table IV.4.

When an array is to be reduced, its elements will be partitioned and loaded into several LM(k)'s, where k>R, and each part of it will be processed by the associated TP independently; at the end of the computation, each of these TPs will forward its partial result, in the form of a result packet, via the network to an instruction which will combine all the partial results. The number of TP-LM pairs used for a reduction operation depends on the size of the array and the speeds of the various hardware and software modules, and has to be optimized in order to obtain the shortest possible computation time.
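The reduction scheme just described (partition the array, let each TP reduce its own part, then combine the partial results) might be sketched as follows. This is an illustration in Python rather than EDC machine code; the contiguous partitioning and the names `partition` and `reduce_array` are assumptions made for the example, and the array is assumed to be non-empty.

```python
import operator
from functools import reduce

# The six reduction operators named in the text.
OPS = {
    "SUM": operator.add,
    "PRODUCT": operator.mul,
    "MAX": max,
    "MIN": min,
    "AND": operator.and_,
    "OR": operator.or_,
}


def partition(elements, parts):
    """Split elements into at most `parts` contiguous, nearly equal chunks."""
    if parts < 1:
        raise ValueError("at least one TP-LM pair is needed")
    size, extra = divmod(len(elements), parts)
    chunks, start = [], 0
    for k in range(parts):
        end = start + size + (1 if k < extra else 0)
        chunks.append(elements[start:end])
        start = end
    return [chunk for chunk in chunks if chunk]


def reduce_array(op_name, elements, parts):
    """Reduce each chunk independently, then combine the partial results."""
    op = OPS[op_name]
    # Each partial result stands for the result packet sent by one TP.
    partials = [reduce(op, chunk) for chunk in partition(elements, parts)]
    # The combining instruction folds the partial results together.
    return reduce(op, partials)
```

The `parts` argument plays the role of the number of TP-LM pairs; the text leaves its best value to an optimization that depends on the array size and module speeds, which this sketch does not attempt.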
(3) Alignment operations: "SHIFT" and "ROTATE" are two alignment primitives: the "ROTATE" operator moves the array elements cyclically by the amount specified after the operator, while the "SHIFT" operator functions in a similar way except that there is no cyclic feedback of array elements, and zeroes are inserted into the positions vacated by the shifting operations. The direction of an alignment could be fixed arbitrarily; and if none of the operators is specified, then the "SHIFT" operation will be assumed.

e.g.
x := a(K SHIFT -1);
y := b(I ROTATE 2);
z := c(J-1);

[Figure: data-flow graph of the "SHIFT" operation on input (A) with displacement -1, producing (X). Machine code: Opcode SHIFT | Control Infor. | #Elements =1024 | Stride =1 | Address of Output Array =(X) | Address of Input Array =(A) | Displacement -1 | Next2, 1]

Fig. IV.14. The statements of some alignment operations, and the data-flow graph and machine code of the "SHIFT" operation.

5. Performance Analysis

5.A. Flow Analysis of EDC

In order to simplify the performance analysis of EDC and to arrive at some meaningful results, some assumptions will be made; later on, the justification of these assumptions will be discussed.
(1) Assumptions:

(a) The problems to be processed by EDC comprise a large amount of concurrency which will keep all the TPs busy most of the time;

(b) Initially all the computations are assumed to be of scalar type; compound operations will be ignored temporarily;

(c) The scalar operations are randomly distributed among the first R LMs, thus giving rise to approximately equal packet flow among the output ports of the network PSN.

(2) Constraints On Communications Loads: As stated in Section 2.C(2), the maximum throughput rate (MATR) that can be delivered by a PSN with L loops is:

$$ \mathrm{MATR}(L) = \frac{3}{2}\, S_{R,SW}\, \frac{L^{2}\log L}{3L\log L - L + 4} \quad \text{(packets/second)} \tag{IV.1} $$

(3) Constraints On Processing Loads: In order to prevent the instruction registers (IRs) from overflowing, the total processing speed of TPs must exceed that of RPs, i.e.,

$$ T \times S_{I,TP} > R \times S_{I,RP}, \quad \text{or} \quad T > \frac{S_{I,RP}}{S_{I,TP}} \times R \tag{IV.2} $$

where T and R are the numbers of TPs and RPs respectively, $S_{I,RP}$ is the average rate of producing instruction packets by an RP, and $S_{I,TP}$ is the average rate of consuming instruction packets by a TP.

For a PSN with L loops, we can connect up to a maximum of $N = L\log L$ pairs of TPs and RPs.
If all the input ports are connected with TPs, then $T = N = L\log L$, and from expression (IV.2),

$$ R < \frac{S_{I,TP}}{S_{I,RP}} \times L\log L \tag{IV.3} $$

If each RP is capable of accepting $S_{R,RP}$ result packets from the network per second, then the maximum acceptance rate of result packets (MARP) by RPs per second is:

$$ \mathrm{MARP} = S_{R,RP} \times R $$

Since most of the scalar instructions will be compiled into binary operations, on the average the acceptance of every two result packets will cause one instruction to be readied for execution, i.e.,

$$ S_{I,RP} = S_{R,RP}/2 \tag{IV.4} $$

If RPs are connected to all the output ports of the network, then $R = L\log L$ and the maximum acceptance rate of result packets of such a fully connected configuration ($\mathrm{MARP}_f$) is:

$$ \mathrm{MARP}_f = R \times S_{R,RP} = (L\log L) \times S_{R,RP} \tag{IV.5} $$

But the value of R is constrained by expression (IV.3); therefore, the maximum acceptance rate subjected to such a constraint is:

$$ \mathrm{MARP}_c < \frac{S_{I,TP}}{S_{I,RP}} \times L\log L \times S_{R,RP} $$

Substituting $S_{I,RP} = S_{R,RP}/2$ into the above expression,

$$ \mathrm{MARP}_{c,max} = 2L\log L \times S_{I,TP} \tag{IV.6} $$

Expressions (IV.1), (IV.5) and (IV.6) show that the EDC performance depends on the network size L and on the speeds of the TPs, RPs and switches.

5.B. Example:

Let us consider some typical values for the speeds of the various hardware modules, and then examine the EDC performance as a function of the network size, L.
We will assume that each TP and RP is capable of 3 MOPS (million operations per second) on the average; this assumption is fair and has also appeared in other studies (for instance, see reference [30]). Since most of the scalar instructions require two result packets as their input operands, approximately half of the result packets coming out of PSN will not trigger their receiving instructions for execution; to process such a result packet, an RP would need one operation to retrieve the token count (i.e., "#Tokens To Go") of the receiving instruction, one operation to decrement it, one operation to store it back and another one to store the result token, a total of four operations. The other half of the result packets would ready the receiving instructions for execution; to process such a result packet, an RP would require eight more operations to transfer an 8-byte instruction word from its associated LM to an IR, in addition to the four operations mentioned above. Therefore, the average number of operations needed to process a result packet is (4+(8+4))/2 = 8, and hence the value of $S_{R,RP}$ equals $(3\times10^{6}/8)$ packets per second.
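The arithmetic behind this value of $S_{R,RP}$ can be checked with a few lines of Python. The constant and function names below are hypothetical; the default firing fraction of one half reflects the assumption in the text that most scalar instructions are binary, and it is left as a parameter so that other mixes can be tried.

```python
RP_OPS_PER_SECOND = 3_000_000    # 3 MOPS per RP, as assumed in the text
COUNT_UPDATE_OPS = 4             # fetch count, decrement, store count, store token
INSTRUCTION_TRANSFER_OPS = 8     # move the instruction word from its LM to an IR


def average_ops_per_result(firing_fraction=0.5):
    """Average RP operations per result packet.

    firing_fraction is the share of result packets that ready an instruction.
    """
    non_firing = COUNT_UPDATE_OPS
    firing = COUNT_UPDATE_OPS + INSTRUCTION_TRANSFER_OPS
    return (1 - firing_fraction) * non_firing + firing_fraction * firing


def result_packets_per_second(firing_fraction=0.5):
    """S_R,RP: result packets one RP can accept per second."""
    return RP_OPS_PER_SECOND / average_ops_per_result(firing_fraction)
```

With the default value, `average_ops_per_result()` gives 8 operations and `result_packets_per_second()` gives 375,000 packets per second, matching the figures above.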
As for TPs, let us assume that the total number of operations needed for a TP to fetch an instruction from an IR, execute it, package the result into a result packet, and then forward it to the network, is around 40 (one might try other values); therefore $S_{I,TP} = (3 \times 10^6)/40$ packets per second. As for the speed of the PSN switches, $S_{R,SW}$ equals $f/t_{min}$, where $f$ is the clocking frequency of the network, and $t_{min}$ is the minimum number of clock pulses needed to transfer a packet from the output port of a switch to the input port of the next switch. In this example, $t_{min}$ is assumed to be 10 while $f$ is taken to be 40 MHz; therefore, the value of $S_{R,SW}$ equals $40 \times 10^6 / 10 = 4 \times 10^6$ packets per second.

The MATR, $MARP_f$ and $MARP_{c,max}$ curves of this example are plotted against the network size L, and are shown in Fig.IV.15.

Fig.IV.15. The $MARP_f$, MATR and $MARP_{c,max}$ curves of the given example.

In this example, the maximum throughput rate attainable by the EDC is limited by the $MARP_{c,max}$ curve, which is the lowest among the three curves. Two observations can be made from Fig.IV.15:

(a) The relative positions of the $MARP_f$ and $MARP_{c,max}$ curves suggest that in this example, it is not necessary to connect all the output ports of the PSN with RPs.
For $L = 64$, $T = L \log L = 64 \times 6 = 384$; from expression (IV.2):

$$R < \frac{S_{I,TP}}{S_{I,RP}} \times T = \frac{3 \times 10^6 / 40}{(3 \times 10^6 / 8)/2} \times 384 \approx 154$$

(b) The relative positions of the MATR and $MARP_{c,max}$ curves indicate that for most of the time, the capacity of the PSN will be higher than that required, and hence the extent and probability of traffic congestion in the PSN are expected to be low.

If there are always computations to keep all the TPs busy, then the raw speed attainable by the EDC is:

$$T \times 3 \times 10^6 = 384 \times 3 \times 10^6 = 1152 \text{ (MOPS)}$$

From Fig.IV.15, the maximum rate of flow of result packets in the system is:

$$MARP_{c,max}(L=64) = 57.6 \times 10^6 \text{ (packets/second)}$$

and the maximum rate of flow of instruction packets is:

$$R \times S_{I,RP} = R \times S_{R,RP}/2 = 154 \times (3 \times 10^6)/8/2 = 28.9 \times 10^6 \text{ (packets/second)}$$

The curves of Fig.IV.15 are useful in estimating the size of the EDC architecture for a desired speed, and they also indicate which part of the architecture will be the most likely performance bottleneck after the system has been built.

5.C. Considerations for Generalized Computations:

If the values of $S_{I,RP}$, $S_{I,TP}$ and $S_{R,SW}$ are different from those in the previous example, then, as expected, different throughput curves and conclusions would be obtained. In general, computations in EDC will be made up of both scalar and compound operations which are mixed in an unknown ratio; therefore, assumption (b) has to be removed for generality.
The result of such a removal would give rise to a better performance because the execution of compound operations (i.e., array operations and sequential programs) is control-driven, and requires simpler control structures and incurs less communications overhead than scalar operations, which are data-driven. Assumptions (a) and (c) are justifiable since EDC is intended for applications involving large amounts of concurrency, and the interleaving of instructions and skewing of arrays would spread the programs randomly and evenly among the TP-LM pairs.

6. Discussions and Outlook

A design methodology has been proposed for a class of next-generation supercomputers. Our proposal, which is named Event-Driven Computer (EDC), is primarily a data-driven system supplemented with control-driven activities. Although we do not emphasize the immediate implementation of EDC, most of its hardware is implementable with off-the-shelf components except the PSN switches which, however, could be easily fabricated with today's technologies.
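The figures obtained in the example of Section 5.B for a 64-loop network can be reproduced with a short calculation. The following is a minimal sketch rather than part of the original design: the function name `edc_bounds` and its parameter names are assumptions introduced here, the default values (3 MOPS, 40 and 8 operations, 40 MHz, $t_{min}$ = 10) are those assumed in the example, and the MATR formula is the one stated in Theorem III.4.

```python
def edc_bounds(loops, mops=3e6, tp_ops=40, rp_ops=8, clock_hz=40e6, t_min=10):
    """Evaluate the Section 5 bounds for a PSN with `loops` loops (a power of two)."""
    if loops < 2 or loops & (loops - 1):
        raise ValueError("loops must be a power of two, at least 2")
    stages = loops.bit_length() - 1   # log L (base 2)
    ports = loops * stages            # T = N = L log L
    s_i_tp = mops / tp_ops            # instructions a TP can issue per second
    s_r_rp = mops / rp_ops            # result packets an RP can accept per second
    s_i_rp = s_r_rp / 2               # expression (IV.4)
    s_r_sw = clock_hz / t_min         # packets a switch link can carry per second
    return {
        "T": ports,
        "R_bound": s_i_tp / s_i_rp * ports,      # expression (IV.3)
        "raw_mops": ports * mops / 1e6,          # all TPs kept busy
        "MARP_f": ports * s_r_rp,                # expression (IV.5)
        "MARP_c_max": 2 * ports * s_i_tp,        # expression (IV.6)
        "MATR": 1.5 * s_r_sw * stages * loops**2
                / (3 * loops * stages - loops + 4),  # Theorem III.4
    }


bounds = edc_bounds(64)
for name, value in bounds.items():
    print(f"{name}: {value:,.1f}")
```

For L = 64 the sketch gives T = 384, a bound on R of about 154, a raw speed of 1152 MOPS and $MARP_{c,max} = 57.6 \times 10^6$ packets per second, with $MARP_{c,max}$ the smallest of the three rates, in agreement with the example. Other values of `tp_ops` or `t_min` can be tried to see which module becomes the bottleneck.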
If the PSN (Packet Switching Network) consists of 64 loops, then approximately 400 TPs (Transmitting Processors) can be attached to the system, and the maximum rates of flow of result and instruction packets will be approximately 58 and 30 million per second, respectively; the raw speed attainable by the EDC will exceed 1,000 MOPS. Since the proposed EDC language structure is similar to the existing ones [39,41,44,69], many of the techniques available today could be applied to its compiler design. Compared to the Dependence-Driven System [32], the resources of EDC are better utilized: when a TP is not involved in an array operation, it could always get a scalar instruction from its associated IR (instruction register) and execute it. The array processing capabilities of EDC also distinguish it from the Combined System [33]. The speed range of EDC is expected to be many times higher than that of the PDF System [34].

Because only a few component types are used, the design costs of EDC are expected to be low. Since most of the programs are randomly distributed among the LMs, thus equalizing the memory access load, the serious problem of memory bottlenecks could be reduced.
As for array processing capabilities, array computation could be carried out by a subset of the first R TP-LM pairs using the read/write links provided; and the alignments of arrays could be performed using the PSN, which provides a novel synchronization method to indicate the end of each array alignment.

Designing a supercomputer is not a simple task; we have presented some architectural ideas, but there are still several issues which deserve immediate attention before the EDC concepts could become practical. Firstly, the detailed specifications of the EDC hardware have to be developed, and the division of labor between the compiler and hardware has to be clarified at the outset. Secondly, it is necessary to analyse the effects of run-time overhead (such as program loading) on the system performance, and remedies (such as increasing the size of LMs) have to be provided if the effects are severe. Thirdly, if several independent processes are run simultaneously, then an identification method is necessary (unique identification tags are often proposed for other data-driven systems [48]).
Fourthly, the policies used by SP to schedule compound operations and processes play an important role in providing the processors with enough operations to keep them busy; it is not clear whether such policies exist, or whether those found in the literature could be useful in this respect. Fifthly, we have suggested that array computation be encoded as compound operations, but perhaps those operations on small arrays could be decomposed into scalar operations so that the load submitted to SP could be reduced; the criteria of such decompositions constitute another area of further study. Sixthly, it would be convenient to the users if more data structures, such as records and lists, are provided. Lastly, although studies on fault tolerance in packet communication architectures have been found in the literature [45], a specific study concerning EDC in this respect is indispensable. We feel that only after the above issues have been adequately dealt with can a supercomputer be built along the lines set forth for EDC.

Table III.1 - Scalar operations.

Scalar operation | Typical examples*
1. Arithmetic and logic | ADD, MUL, OR
2. Boolean | EQUAL?
3. Data transfer and control | SWITCH, DUPLICATE, WAIT
4. Procedural calls | CALL, RETURN

[Data-flow graph column: graph symbols for ADD, EQUAL?, DUP, WAIT and CALL]

Table III.2 - Compound operations.

Compound operation | Typical examples
1. Vector arithmetic and logic | (ADD), (MUL)
2. Reduction | (SUM), (PRODUCT), (MAX), (MIN)
3. Vector boolean | (EQUAL?)
4. Alignment | (RIGHT SHIFT), (LEFT ROTATE)

[Data-flow graph* column: graph symbols for (ADD), (SUM), (EQUAL?) and (ROT)]

* See Section 4 on the use of these operators.

Table III.3 - The "Operand/Next instructions" fields of scalar operations.

Computation | Format No. | Operands/Next instructions
Scalar Arithmetic, Logic and Procedure Call | 1 | Op1, Op2, Op3, Op4, Next2, Next1
 | 2 | Op1, Op2, Next2, Next1
 | 3 | Op1, Op2, Next4, Next3, Next2, Next1
 | 4 | Op1, Next4, Next3, Next2, Next1
 | 5 | Op1, Next5, Next4, Next3, Next2, Next1
Boolean | 6 | Op, NextT, NextF, [...]
 | 7 | Op1, Op2, NextT, NextT, NextF, NextF
Data Transfer & Control | 8 | Next6, Next5, Next4, Next3, Next2, Next1

Field widths (left to right): 16, 16, 16, 16, 16, 16 bits.

Table III.4 - The "Operand/Next instructions" fields of compound operations.

Computation | Format No. | Operands/Next instructions
Vector Arithmetic and Logic | 9 | No. of elements, Stride, (V1), (V2), (V3), Next2, Next1
Vector Boolean | 10 | No. of elements, Stride, (V1), (V2), (V3), NextT, NextF
Alignments | 11 | No. of elements, Stride, (V1), (V2), Displacement, Next2, Next1
Reduction | 12 | No. of elements, Stride, (V2), Next4, Next3, Next2, Next1

Field widths (left to right): 8, 8, 16, 16, 16, 16, 16 bits.

Table III.5 - The formats of instruction packets.

Packet Content | Packet Format
a. Instruction Address | ⟨Address of instruction⟩
b. Actual Instruction word | ⟨Opcode; result & format types; operands; Next instruction addresses⟩

Table III.6 - The formats of result packets.

Result Type | Packet Format
a. Scalar Operand; b. Array Element being aligned; c. Base Address assigned to an array | ⟨Feedback Count; Destination Address; Result Type; Result⟩
d. Signalling Token | ⟨Feedback Count; Destination Address; Result Type⟩
e. Synchronization Token | ⟨Feedback Count; Destination Address; Result Type⟩

Chapter V. Conclusions

1. Summary of Results

In Chapter II, we have presented a re-circulating systolic sorter (RSS) and two algorithms which work on RSS. The correctness of the algorithms has been proved and general operational constraints have been derived. This design is highly amenable to VLSI implementations due to the following attributes: (1) the simple control structure required by the algorithms; (2) the regular, repetitive and near-neighbour type of interconnections among the comparators; and (3) the systolic data movements.
The sorting array is also well-suited for fabrication on shift-register types of storage and logic devices such as magnetic bubble memories (MBMs) and charge-coupled devices (CCDs), because of its closed-loop structure. The number of quadruple comparators needed to sort N items is N/4, and the average number of sorting cycles, as found by our simulation studies, is within the range $[(\log N)^2, N]$. A hardware termination method is incorporated into the control unit of the sorter, so that the sorting process can be terminated as soon as the input list is in the desired order.

Chapter III describes a novel loop-structured switching network (LSSN) intended for packet communications in highly parallel applications. With L loops, it can connect up to $N = L \log L$ pairs of transmitting and receiving devices, using only N/2 two-by-two switching elements. Therefore, it is very cost-effective in terms of its component count. Its topology resembles that of the indirect binary n-cube network [21], but a much higher device-to-switch ratio can be achieved by LSSN because all the links between the switches could be used as both transmitting and receiving stations. It has the advantage of incremental extensibility, and it is free of the store-and-forward type of deadlocks which prevail in other cyclical packet-switched networks. Our simulation studies have shown that the average throughput rate and delay of LSSN are close to those of other designs despite its relatively low component count.

Chapter IV describes a new design methodology for the next-generation computers. Our proposal, the Event-Driven Computer (EDC), is primarily a data-driven, heterogeneous system which is supplemented with control-driven activities; such a combined approach is aimed at extracting the advantages of both the "pure" data-driven and control-driven systems while alleviating their shortcomings. Compared to other designs, EDC has the advantages of a simpler architecture, better resource utilization, array processing capabilities and a higher speed range. The LSSN of Chapter III has been modified for this application; with a configuration of 64 loops, this network can connect up to approximately 400 processors, and hence an execution speed of more than 1,000 million operations per second can be obtained by the EDC.

2. General Discussions

The main theme of this thesis is to demonstrate the practicality and usefulness of cyclical architectures in the designs of high-performance processors and computers.
Our ideas have been illustrated through the use of specific application examples including parallel sorting, packet-switched communications and the design methodology of a novel, next-generation computer. The ideas of feedback in our proposals are entirely different from that of process control, which uses feedback signals for correctional purposes (i.e., additive or multiplicative manipulations of the input signals). In the RSS arrays, feedback allows data items to be further compared among themselves until the whole input list is sorted. In the LSSN, the sole purpose of feedback is to re-use the network resources until the packets are routed to their destinations, but there is no direct interaction (such as the comparisons in the RSS arrays) among the information packets, other than competition for the network resources. In the EDC, the arrivals of result packets in the feedback path signify the completion of one or more instruction cycles, and as a result, new instructions may or may not be brought into the computation path for execution, depending on the amounts of information they have gathered.
The manners in which feedback packets interact with each other contribute greatly to the properties of the cyclical architectures. For instance, the LSSN is susceptible to the store-and-forward type of deadlocks (we have, however, demonstrated how this problem can be solved), but the deadlock problem does not exist in the RSS, because data movements in the RSS network take place along specific paths and there is no data path conflict; in the case of the EDC, if there are always memory locations available in the Local Memories to await the result packets coming out of the network (i.e., if the programs are correctly written, compiled and loaded), then the EDC system should be deadlock-free.
Another property of packet-switched, cyclical architectures is their lack of responsiveness: interrupts cannot be processed immediately because the computation path could already be congested with packets when the interrupts occur. In the EDC, however, direct read/write links connecting the Transmitting Processors and the Local Memories allow programs to be executed in a control-driven manner, without going through the PSN and the feedback path; therefore, fast execution and hence short response times could be expected. In general, resources in cyclical architectures are better utilized when compared to those in the acyclic systems.

3. Suggestions for Further Work

In this thesis, we have developed some ideas based on several new architectural concepts and demonstrated their practicality and usefulness. We have not implemented any of the proposals, because we feel that more related work has yet to be done.
Specific topics for further research have been suggested in the previous chapters; in particular, work concerning the detailed hardware specifications and fault tolerance studies of the three designs should perhaps receive the utmost attention. Since our proposed systems are designed to make use of hundreds to thousands of interconnected processing and storage components, failures of single components will paralyse the entire systems, and these important issues are not included in our studies.

Appendix A

Lemma III.1: Consider a LSSN which has L loops and a packet which is destined for the address $A_{S'} \ldots A_1\, \ell_{L'} \ldots \ell_1$, where $L' = \log L$ and $S' = \lceil \log L' \rceil$. The packet will be routed to the loop $\ell_{L'} \ldots \ell_1$ within $L'$ steps of routing after its admission into the LSSN.

Proof: Suppose the packet is admitted into the LSSN via the loop $\ell^*_{L'} \ldots \ell^*_s \ldots \ell^*_1$. According to the routing scheme, this packet will be routed to the loop $\ell^*_{L'} \ldots \ell_s \ldots \ell^*_1$ by a switch located in the s-th stage, where $s = 1, 2, \ldots, L'$. Since the maximum value of s is $L'$, only $L'$ steps of routing are required to route the packet to the aforementioned loop $\ell_{L'} \ldots \ell_s \ldots \ell_1$.

Lemma III.2: Consider a LSSN with L loops and a packet which is destined for the address $A_{S'} \ldots A_1\, \ell_{L'} \ldots \ell_1$, where $L' = \log L$ and $S' = \lceil \log L' \rceil$. After the packet has been routed to the loop $\ell_{L'} \ldots \ell_1$, it needs at most another $(L'-1)$ steps of matching along that loop to reach its destination.

Proof: According to the routing scheme, the destination address $A_{S'} \ldots A_1\, \ell_{L'} \ldots \ell_1$ is one of the $L'$ output links along the loop $\ell_{L'} \ldots \ell_1$. After the packet has been switched to this loop, it will be removed by either the receiver attached to the link which is part of that loop, or one of the remaining $(L'-1)$ receivers attached to the same loop. In either case, at most $(L'-1)$ steps of matching are necessary.

Theorem III.1: In a LSSN with L loops, a packet will be delivered to its destination within $(2\log L - 1)$ steps of routing regardless of where it is generated.

Proof: This theorem is a result of Lemmas III.1 and III.2.

Theorem III.2: The average number of routing steps (ARS) needed to deliver a result packet in a LSSN with L loops is

$$ARS(L) = (3\log L - 1)/2 + 2/L - 1$$

Proof: Without loss of generality and for simplicity, we shall consider a transmitter (Tr) located in the first stage of the network. The routes from this Tr to the set of output links which can be reached without going through the feedback paths, as shown in Fig.III.2, are in the form of a "binary tree" which branches out toward the lower end of the network; the routes to the remaining set of output links would include the feedback paths, and are in the form of an irregular, "tapering" tree. The numbers of routing steps needed to reach the receivers on these two trees are tabulated in Table III.1. From Table III.1, the average number of routing steps needed for a Tr to reach any output link is therefore

$$
\begin{aligned}
ARS(L) &= \frac{(2 \times 1 + 4 \times 2 + \ldots + L\log L) + (L-2)(\log L+1) + (L-4)(\log L+2) + \ldots + (L/2)(2\log L-1)}{(2+4+\ldots+L) + (L-2)+(L-4)+\ldots+(L/2)} \\
&= \frac{(L\log L - 2\log L + L) + (L\log L - 4\log L + 2L) + \ldots + (L\log L - (L/2)\log L + L(\log L-1)) + L\log L}{L\log L} \\
&= \frac{L\,(\log L + (\log L+1) + (\log L+2) + \ldots + (\log L + \log L - 1)) - \log L\,(2+4+\ldots+L/2)}{L\log L} \\
&= \frac{L\log L\,(3\log L-1)/2 - 2\log L\,(2^{\log L-1} - 1)}{L\log L} \\
&= (3\log L-1)/2 + 2/L - 1
\end{aligned}
$$

Q.E.D.

Table III.1 - The number of routing steps needed to reach the receivers of the "binary" and "tapering" trees.

Stage # of Rr | Binary tree: # Rrs | Binary tree: # steps | Tapering tree: # Rrs | Tapering tree: # steps
stage 1 | 2 | 1 | L-2 | logL+1
stage 2 | 4 | 2 | L-4 | logL+2
... | ... | ... | ... | ...
stage (logL-1) | L/2 | logL-1 | L/2 | 2logL-1
stage (logL) | L | logL | 0 | 2logL

Corollary III.1: Any packet admitted into LSSN will go through the feedback path at most twice.

Proof: Consider a transmitter (Tr) which sends a packet at the s-th stage to a receiver (Rr) which is r routing steps away, and suppose there are $L'$ stages in the network. The number of feedbacks, F, could be calculated as:

$$F(L) = \mathrm{Quotient}((s + r - 1)/L')$$

The reader may verify the correctness of this expression with a simple example on Fig.III.2. The maximum value of F is therefore

$$F_{max}(L) = \mathrm{Quotient}((s_{max} + r_{max} - 1)/L')$$

Since $s_{max} = L'$ and $r_{max} = 2L' - 1$ (from Theorem III.1),

$$F_{max}(L) = \mathrm{Quotient}((3L' - 2)/L') = 2$$

Q.E.D.

Theorem III.3: For a LSSN with L loops, the probability that the destination address carried by a result packet will match the label of an output link, and hence that the packet will be removed from the network, is:

$$P_{removed} = 2L / \{3L\log L - L + 4\}$$

where the transmission pattern is such that each and every receiving port of the network is equally likely to receive that packet.
Proof: Since the LSSN has L loops, it would have $\log L$ stages of switches and $L\log L$ pairs of transmitting ports (TPs) and receiving ports (RPs). Consider the case in which a TP in each stage of the network transmits a result packet to each and every RP in the network; then the number of packets transmitted by each stage of RPs is as tabulated in Table III.2. In Table III.2, "Feedback Count" is the number of times the packets will go through the feedback paths in order to reach their destinations. The correctness of this table could be verified on the example given in Fig.III.3. From this table, the total number of packets that will be received by the RPs connected to a particular stage, say the last (i.e., $\log L$-th) stage, is obtained by summing up the numbers across the corresponding row of the table, and it is:

$$N_{matched} = L\log L$$

and the total number of switching operations performed by the same stage is:

$$N_{total} = N_{matched} + N_{unmatched}$$

where $N_{unmatched}$ is the number of packets which will not be removed by RPs of that stage because of unmatched destination addresses carried by them.
In the case of the (log L)-th stage, $N_{unmatched}$ can easily be computed as the sum of products of the entries and their respective "Feedback Count"s in Table III.2:

$$
\begin{aligned}
N_{unmatched} = 1 \times \{\, & (L-2) + (L-4) + \dots + (L-L/2) + (L-L) \\
& + L + (L-2) + \dots + (L-L/4) + (L-L/2) \\
& + L/2 + L + (L-2) + \dots + (L-L/8) + (L-L/4) \\
& + \dots \\
& + 4 + 8 + 16 + \dots + (L-4) + (L-2) \,\} \\
{} + 2 \times \{\, & (L-L/2) \\
& + (L-L/4) + (L-L/2) \\
& + (L-L/8) + (L-L/4) + (L-L/2) \\
& + \dots \\
& + (L-4) + (L-8) + \dots + (L-L/2) \,\}
\end{aligned}
$$

Let $N' = (L-2) + (L-4) + \dots + (L-L/2)$; then, after re-arrangement,

$$
\begin{aligned}
N_{unmatched} ={} & N' \\
& + L + N' \\
& + L/2 + L + N' + (L-L/2) \\
& + L/4 + L/2 + L + N' + (L-L/2) + (L-L/4) \\
& + \dots \\
& + 4 + 8 + \dots + L/2 + L + N' + (L-L/2) + (L-L/4) + \dots + (L-4) \\
={} & N' \log L + (1 + 2 + 3 + \dots + (\log L - 1)) \times L \\
={} & \{L(\log L - 1) - (2 + 4 + 8 + \dots + L/2)\} \times \log L + \{L(1 + \log L - 1)(\log L - 1)/2\} \\
={} & \{L(\log L - 1) - (L-2)\} \log L + \{L \log L (\log L - 1)/2\} \\
={} & \{3L(\log L)^2\}/2 - \{3L \log L\}/2 + 2 \log L
\end{aligned}
$$

$$\Rightarrow\; N_{total} = \{3L(\log L)^2\}/2 - \{L \log L\}/2 + 2 \log L$$

$$
\begin{aligned}
\Rightarrow\; P_{removed} &= N_{matched} / N_{total} \\
&= (L \log L) / \{(3L(\log L)^2)/2 - (L \log L)/2 + 2 \log L\} \\
&= \{2L\} / \{3L \log L - L + 4\}
\end{aligned}
$$

Q.E.D.

Theorem III.4: The maximum average throughput rate (MATR) of a LSSN with L loops is:

$$MATR(L) = \frac{3}{2} \times SR_{sw} \times \log L \times \frac{L^2}{3L \log L - L + 4}$$

where $SR_{sw}$ is the maximum rate of transmitting result packets between two switches via an output link.
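As an illustration of the Theorem III.4 formula, the sketch below evaluates MATR(L) in units of the link rate and checks that it equals the product "number of links × link rate × 3/4 × P_removed" on which the proof relies. It is not part of the thesis: it assumes L is a power of two with log taken base 2, treats `link_rate` as a stand-in for SR_sw, and uses hypothetical function names.

```python
from fractions import Fraction


def matr(loops: int, link_rate: Fraction = Fraction(1)) -> Fraction:
    """MATR(L) = 3/2 * SR_sw * log L * L^2 / (3 L log L - L + 4)."""
    log_l = loops.bit_length() - 1  # exact log2 when loops is a power of two
    denom = 3 * loops * log_l - loops + 4
    return Fraction(3, 2) * link_rate * log_l * loops ** 2 / denom


def matr_from_factors(loops: int, link_rate: Fraction = Fraction(1)) -> Fraction:
    """Same quantity built as (L log L links) * SR_sw * (3/4) * P_removed."""
    log_l = loops.bit_length() - 1
    p_rem = Fraction(2 * loops, 3 * loops * log_l - loops + 4)
    return loops * log_l * link_rate * Fraction(3, 4) * p_rem


if __name__ == "__main__":
    for loops in (4, 8, 16, 32, 64):
        closed = matr(loops)
        assert closed == matr_from_factors(loops)
        print(f"L={loops:3d}  MATR/SR_sw = {closed} ({float(closed):.3f})")
```

With a unit link rate, L = 4 gives exactly 2 delivered packets per link transmission time.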
Proof: Since there are L log L links in an L-looped LSSN, the maximum average rate of delivering packets to all the receiving ports can be formulated as:

$$MATR(L) = L \log L \times SR_{sw} \times (1 - P_{conflicted}) \times P_{removed}$$

where $P_{conflicted}$ is the probability that an output link will not contain a packet due to conflicts within the switch concerned, and it can be computed with the illustrations below:

[Fig. III.6]

On the average, 25% of the time an output link will not receive any packet due to conflicts in the switch; therefore,

$$1 - P_{conflicted} = 3/4$$

and also,

$$
\begin{aligned}
MATR(L) &= \frac{3}{4} \times SR_{sw} \times \log L \times \frac{2L^2}{3L \log L - L + 4} \\
&= \frac{3}{2} \times SR_{sw} \times \log L \times \frac{L^2}{3L \log L - L + 4}
\end{aligned}
$$

Q.E.D.

Theorem III.5: The LSSN which uses Type-B switches is deadlock free.

Proof: Type-B switches provide two essential features for avoiding deadlocks in LSSN:

(a) The intermediate ports are used to hold packets with feedback counts of 0 and 1, so that they are not eligible to contend for the output ports if they cannot be switched to the next stage immediately, i.e., if the buffer pools of the next switch have no room to accept them.
(b) The feedback counts of the packets emerging from the last stage are incremented so that when they are fed back to the first stage, they will request buffers of the next higher class.

The first feature ensures that links that are shared by packets with various feedback counts will not be clogged. The second feature prevents the formation of any cyclical request loop. With these two features, the path traversed by any packet in the network is "spiral" rather than "cyclical" in shape, and the whole network can be conceived as several spirals interconnected in parallel, with the Class-0 buffers of the first stage as the heads of the spirals, and the Class-2 buffers of the last stage as the tails. Since there is no cyclical request of resources, the network is therefore deadlock free.

Appendix B

PROGRAM RECIRCULATING_SYSTOLIC_SORT;

CONST
  INIT_SEED = 3;
  MARKER_COLUMN = 1;
  RANGE = 100.0;  (* LARGEST RANDOM NUMBER TO BE SORTED *)

TYPE
  STORE_RECORD = RECORD
    MARKER : BOOLEAN;
    ITEM : INTEGER
  END;
  COMPARATOR_RECORD = RECORD
    INIT_ROW, INIT_COLUMN : INTEGER;
    AI, AJ, BI, BJ, CI, CJ, DI, DJ : INTEGER;
    TEMPORARY : INTEGER
  END;
  DATA_ARRANGE_TYPE = (RRANDOM, SSEQUENTIAL);

VAR  (* grouping of the declarations reconstructed from a scrambled two-column listing *)
  N_COMP, H_COMP, V_COMP : INTEGER;
  LIMIT_NO_SWITCH : INTEGER;
  ROW, COLUMN : INTEGER;
  STORE : ARRAY (. 0 .. 50, 0 .. 50 .) OF STORE_RECORD;
  COMPARATOR : ARRAY (. 1 .. 200 .) OF COMPARATOR_RECORD;
  TOTAL_CYCLE, SWITCH_PER_CYCLE : INTEGER;
  SEED : INTEGER;
  I, J, K, JJ, EVEN_START, ODD_START : INTEGER;
  TEMPA, TEMPB, TEMPC, TEMPD : INTEGER;
  TEMP_COL : INTEGER;
  MARKERA, MARKERB, MARKERC, MARKERD : BOOLEAN;
  TERMINATE : BOOLEAN;
  DATA_ARRANGE : DATA_ARRANGE_TYPE;
  DECR : INTEGER;
  CONTINUOUS_NO_SWITCH : INTEGER;

PROCEDURE SETUP_NETWORK;
BEGIN (* SETUP_NETWORK *)
  ROW := 2 * V_COMP;
  COLUMN := 2 * H_COMP;
  N_COMP := V_COMP * H_COMP;  (* NUMBER OF COMPARATORS *)
  (* TERMINATE IF NO SWITCHING CONTINUOUSLY *)
  (* the operator between the two terms is illegible in the source; '+' is assumed *)
  LIMIT_NO_SWITCH := 2 * H_COMP + TRUNC (H_COMP / 2);
  I := 0;
  J := 0;
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .) DO
      BEGIN  (* EACH COMPARATOR WILL HOLD 4 ITEMS TO BE SORTED *)
        INIT_ROW := I;
        INIT_COLUMN := J;
        I := I + 2;
        IF I >= 2 * V_COMP - 1 THEN
          BEGIN
            I := I - 2 * V_COMP + 1;
            J := J + 2
          END
      END
END (* SETUP_NETWORK *);

FUNCTION RANDOM (VAR SEED : INTEGER) : INTEGER;
BEGIN (* RANDOM *)
  RANDOM := TRUNC ((SEED / 65536 - 0.1) * RANGE);
  SEED := (25173 * SEED + 13849) MOD 65536
END (* RANDOM *);

PROCEDURE CHECK_TERMINATE;
BEGIN (* CHECK_TERMINATE *)
  IF SWITCH_PER_CYCLE = 0 THEN
    BEGIN
      INCR (CONTINUOUS_NO_SWITCH);
      IF CONTINUOUS_NO_SWITCH >= LIMIT_NO_SWITCH THEN TERMINATE := TRUE
    END
  ELSE
    BEGIN
      SWITCH_PER_CYCLE := 0;
      CONTINUOUS_NO_SWITCH := 0
    END
END (* CHECK_TERMINATE *);

PROCEDURE INITIALIZE;
BEGIN (* INITIALIZE *)
  FOR I := 0 TO (ROW - 1) DO
    FOR J := 0 TO (COLUMN - 1) DO
      WITH STORE (. I, J .) DO
        BEGIN
          IF DATA_ARRANGE = RRANDOM THEN
            ITEM := RANDOM (SEED)
          ELSE
            BEGIN
              ITEM := SEED;
              SEED := SEED - DECR
            END;
          (* MARKING EACH LOOP *)
          IF ((J = 2 * MARKER_COLUMN - 2) AND (I MOD 2 = 1)) OR
             ((J = 2 * MARKER_COLUMN - 1) AND (I MOD 2 = 0)) THEN
            MARKER := TRUE
          ELSE
            MARKER := FALSE
        END;
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .)
    DO BEGIN
        TEMPORARY := 0;
        AI := INIT_ROW;
        AJ := INIT_COLUMN;
        BI := AI;
        BJ := AJ + 1;
        CI := AI + 1;
        CJ := AJ;
        DI := AI + 1;
        DJ := AJ + 1
      END;
  TOTAL_CYCLE := 0;
  SWITCH_PER_CYCLE := 0;
  CONTINUOUS_NO_SWITCH := 0;
  TERMINATE := FALSE
END (* INITIALIZE *);

PROCEDURE VERTICAL_COMP;
BEGIN (* VERTICAL_COMP *)
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .) DO
      BEGIN
        IF (STORE (. AI, AJ .).ITEM > STORE (. CI, CJ .).ITEM) THEN
          BEGIN
            TEMPORARY := STORE (. AI, AJ .).ITEM;
            STORE (. AI, AJ .).ITEM := STORE (. CI, CJ .).ITEM;
            STORE (. CI, CJ .).ITEM := TEMPORARY;
            INCR (SWITCH_PER_CYCLE)
          END;
        IF (STORE (. BI, BJ .).ITEM > STORE (. DI, DJ .).ITEM) THEN
          BEGIN
            TEMPORARY := STORE (. BI, BJ .).ITEM;
            STORE (. BI, BJ .).ITEM := STORE (. DI, DJ .).ITEM;
            STORE (. DI, DJ .).ITEM := TEMPORARY;
            INCR (SWITCH_PER_CYCLE)
          END
      END
END (* VERTICAL_COMP *);

PROCEDURE DIAGONAL_COMP;
BEGIN (* DIAGONAL_COMP *)
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .) DO
      BEGIN
        IF (STORE (. AI, AJ .).ITEM > STORE (. DI, DJ .).ITEM) THEN
          BEGIN
            TEMPORARY := STORE (. AI, AJ .).ITEM;
            STORE (. AI, AJ .).ITEM := STORE (. DI, DJ .).ITEM;
            STORE (. DI, DJ .).ITEM := TEMPORARY;
            INCR (SWITCH_PER_CYCLE)
          END;
        IF (STORE (. BI, BJ .).ITEM > STORE (. CI, CJ .).ITEM) THEN
          BEGIN
            TEMPORARY := STORE (. BI, BJ .).ITEM;
            STORE (. BI, BJ .).ITEM := STORE (. CI, CJ .).ITEM;
            STORE (. CI, CJ .).ITEM := TEMPORARY;
            INCR (SWITCH_PER_CYCLE)
          END
      END
END (* DIAGONAL_COMP *);

PROCEDURE HORIZONTAL_COMP;
BEGIN (* HORIZONTAL_COMP *)
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .) DO
      BEGIN
        TEMPA := STORE (. AI, AJ .).ITEM;
        TEMPB := STORE (. BI, BJ .).ITEM;
        TEMPC := STORE (. CI, CJ .).ITEM;
        TEMPD := STORE (. DI, DJ .).ITEM;
        MARKERA := STORE (. AI, AJ .).MARKER;
        MARKERB := STORE (. BI, BJ .).MARKER;
        MARKERC := STORE (. CI, CJ .).MARKER;
        MARKERD := STORE (. DI, DJ .).MARKER;
        TEMP_COL := TRUNC (INIT_COLUMN / 2);
        (* the second relational operator in each test below is missing from the source; '<' is assumed *)
        IF ((NOT MARKERA) AND (TEMPB > TEMPA)) OR ((MARKERA) AND (TEMPB < TEMPA)) THEN
          BEGIN
            TEMPORARY := STORE (. BI, BJ .).ITEM;
            STORE (. BI, BJ .).ITEM := STORE (. AI, AJ .).ITEM;
            STORE (. AI, AJ .).ITEM := TEMPORARY;
            INCR (SWITCH_PER_CYCLE)
          END;
        IF ((NOT MARKERC) AND (TEMPD > TEMPC)) OR ((MARKERC) AND (TEMPD < TEMPC)) THEN
          BEGIN
            TEMPORARY := STORE (. DI, DJ .).ITEM;
            STORE (. DI, DJ .).ITEM := STORE (. CI, CJ .).ITEM;
            STORE (. CI, CJ .).ITEM := TEMPORARY;
            INCR (SWITCH_PER_CYCLE)
          END
      END
END (* HORIZONTAL_COMP *);

PROCEDURE DISPLAY;
BEGIN (* DISPLAY *)
  WRITELN;
  WRITELN;
  WRITELN ('NUMBER OF SWITCHING = ', SWITCH_PER_CYCLE : 5);
  WRITELN ('AT CYCLE TIME = ', TOTAL_CYCLE : 5);
  FOR I := 0 TO (ROW - 1) DO
    BEGIN
      IF (I MOD 2 = 0) THEN
        BEGIN
          WRITELN;
          WRITELN
        END;
      FOR J := 0 TO (COLUMN - 1) DO
        WITH STORE (. I, J .) DO
          BEGIN
            (* field widths in the next two statements are partly illegible in the source *)
            IF (J MOD 2 = 0) THEN WRITE (' ' : 4);
            WRITE (ITEM : 1)
          END;
      WRITELN
    END
END (* DISPLAY *);

PROCEDURE TRU_DISPLAY;
BEGIN (* TRU_DISPLAY *)
  WITH COMPARATOR (. 1 .) DO
    BEGIN
      EVEN_START := AJ;
      ODD_START := CJ
    END;
  WRITELN;
  WRITELN;
  WRITELN ('AT CYCLE TIME=', TOTAL_CYCLE : 5);
  FOR I := 0 TO (ROW - 1) DO
    BEGIN
      IF (I MOD 2 = 0) THEN
        BEGIN
          FOR J := EVEN_START TO EVEN_START + COLUMN - 1 DO
            BEGIN
              JJ := J MOD COLUMN;
              IF STORE (. I, JJ .).MARKER THEN
                WRITE (' *', STORE (. I, JJ .).ITEM : 2)
              ELSE
                WRITE (' ', STORE (. I, JJ .).ITEM : 2)
            END;
          WRITELN
        END
      ELSE
        BEGIN
          FOR J := ODD_START TO ODD_START + COLUMN - 1 DO
            BEGIN
              JJ := J MOD COLUMN;
              IF STORE (. I, JJ .).MARKER THEN
                WRITE (' *', STORE (. I, JJ .).ITEM : 2)
              ELSE
                WRITE (' ', STORE (. I, JJ .).ITEM : 2)
            END;
          WRITELN
        END
    END
END (* TRU_DISPLAY *);

PROCEDURE SHIFT;
BEGIN (* SHIFT *)
  FOR K := 1 TO N_COMP DO
    WITH COMPARATOR (. K .)
    DO BEGIN
        IF (AI MOD 2 = 1) THEN AJ := (AJ + 1) MOD COLUMN
        ELSE AJ := (AJ + COLUMN - 1) MOD COLUMN;
        IF (BI MOD 2 = 1) THEN BJ := (BJ + 1) MOD COLUMN
        ELSE BJ := (BJ + COLUMN - 1) MOD COLUMN;
        IF (CI MOD 2 = 1) THEN CJ := (CJ + 1) MOD COLUMN
        ELSE CJ := (CJ + COLUMN - 1) MOD COLUMN;
        IF (DI MOD 2 = 1) THEN DJ := (DJ + 1) MOD COLUMN
        ELSE DJ := (DJ + COLUMN - 1) MOD COLUMN
      END
END (* SHIFT *);

BEGIN (* PARA_SORT *)
  H_COMP := 6;
  V_COMP := 3;
  SEED := INIT_SEED;
  DATA_ARRANGE := RRANDOM;
  DECR := 1;
  WHILE (H_COMP <> -999) DO
    BEGIN
      WRITELN ('H_COMP/V_COMP/SEED/RAND/DECR=', H_COMP : 5, V_COMP : 5,
               SEED : 5, DATA_ARRANGE : 5, DECR : 5);
      WRITELN ('ENTER NEW VALUES/ -999 FOR TERMINATION');
      READLN (H_COMP, V_COMP, SEED, DATA_ARRANGE, DECR);
      IF (H_COMP <> -999) THEN
        BEGIN
          SETUP_NETWORK;
          INITIALIZE;
          TRU_DISPLAY;
          WHILE (NOT TERMINATE) DO
            BEGIN
              INCR (TOTAL_CYCLE);
              VERTICAL_COMP;
              HORIZONTAL_COMP;
              DIAGONAL_COMP;
              CHECK_TERMINATE;
              SHIFT
            END;
          TRU_DISPLAY;
          WRITELN;
          WRITELN ('NUMBER OF HORIZONTAL COMPARATORS =', H_COMP : 5);
          WRITELN ('NUMBER OF VERTICAL COMPARATORS =',
                   V_COMP : 5);
          WRITELN ('NUMBER OF ITEMS SORTED =', ROW * COLUMN : 5);
          WRITELN ('NUMBER OF DOUBLE_COMPARISON/SHIFT CYCLES =',
                   TOTAL_CYCLE - LIMIT_NO_SWITCH : 5)
        END
    END
END (* PARA_SORT *).

Appendix C

PROGRAM LSSN;
(* DEADLOCK FREE
 * USE CENTRAL BUFFERS FOR SIMULATION, ELSE STACK OVERFLOW
 * CLASS_0,_1,_2 BUFFERS ARE PRIORITIZED:
     CLASS-2 = HIGHEST
     CLASS-1 = MIDDLE
     CLASS-0 = LOWEST
 * MAY 1983 *)

CONST
  N_SW = 32;
  N_TR = 64;
  N_RR = 64;
  TR_INTL = 10;
  T_SIMULAT = 1000000;
  T_TRANSFER = 2;
  T_DECIDE = 3;
  T_SWITCH = 2;
  F0_SIZE = 7;
  F1_SIZE = 7;
  F2_SIZE = 2;
  F012_SIZE = F0_SIZE + F1_SIZE + F2_SIZE;
  F_TOTAL = 2 * N_SW * F012_SIZE;

TYPE
  BUFFER_RECORD = RECORD
    B_EMPTY : BOOLEAN;
    B_TR_TIME : INTEGER;
    B_DEST : INTEGER;
    B_FEEDBACK_COUNT : INTEGER;
  END;
  FIFO_RECORD = RECORD
    F_START, F_STOP : INTEGER;
    F_TOP, F_BOTTOM : INTEGER;
    F_EMPTY, F_FULL : BOOLEAN;
    F_COUNT : INTEGER;
  END;
  INPORT_RECORD = RECORD
    I_TIMER : INTEGER;
    I_EMPTY : BOOLEAN;
    I_TR_TIME : INTEGER;
    I_DEST : INTEGER;
    I_FEEDBACK_COUNT : INTEGER;
  END;
  OUTPORT_RECORD = RECORD
    O_TIMER : INTEGER;
    O_EMPTY, O_MATCHED : BOOLEAN;
    O_TR_TIME : INTEGER;
    O_DEST : INTEGER;
    O_FEEDBACK_COUNT : INTEGER;
    O_RR, O_NEXT_SW, O_NEXT_PT : INTEGER;
  END;
  SWITCH_RECORD = RECORD
    INPORT : ARRAY (. 0..1 .) OF INPORT_RECORD;
    FIFO : ARRAY (. 0..1, 0..2 .) OF FIFO_RECORD;
    OUTPORT : ARRAY (. 0..1 .) OF OUTPORT_RECORD;
  END;
  TR_RECORD = RECORD
    T_EMPTY, T_BLOCKED : BOOLEAN;
    T_DEST : INTEGER;
    T_NEXT_SW, T_NEXT_PT : INTEGER;
    T_TIMER : INTEGER;  (* TRANSMIT WHEN TIMER REACHES CLOCK *)
  END;

VAR
  SWITCH : ARRAY (. 1..N_SW .) OF SWITCH_RECORD;
  TR : ARRAY (. 1..N_TR .) OF TR_RECORD;
  BUFFER : ARRAY (. 1..F_TOTAL .) OF BUFFER_RECORD;
  FMAX : ARRAY (.
    0..2 .) OF INTEGER;  (* MAX USAGE OF FIFO *)
  CLOCK : INTEGER;
  SEED : INTEGER;
  R_PACKET, T_PACKET : INTEGER;
  TOTAL_DELAY : INTEGER;
  MAX_DELAY : INTEGER;
  TR_DELAY : INTEGER;  (* BLOCKAGE AT ENTRANCE *)

FUNCTION RANDOM (VAR SEED : INTEGER) : REAL;
BEGIN
  RANDOM := SEED / 65535;
  SEED := (25173 * SEED + 13849) MOD 65536;
END;

PROCEDURE INITIALIZE;
VAR
  TI, SI, IPI, OPI, CI, PI, BI : INTEGER;
  TEMPI : INTEGER;
BEGIN
  WRITELN('LAST STAGE DOES NOT MATCH LOOP NUMBER,JUST INCR FB#');
  WRITELN('NUMBER OF SWITCHES =', N_SW:5);
  WRITELN('NUMBER OF TR=', N_TR:5);
  WRITELN('FIFO SIZE OF CL-0,1,2,TOTAL PER SWITCH=', F0_SIZE:5,
          F1_SIZE:5, F2_SIZE:5, F012_SIZE:5);
  WRITELN('REQUEST RATE=', 1/TR_INTL:10:5);
  WRITELN('TR INTL= ', TR_INTL:7);
  WRITELN('SIMULATION TIME =', T_SIMULAT:5);
  FOR SI := 1 TO N_SW DO WITH SWITCH[SI] DO
    BEGIN (*S*)
      FOR IPI := 0 TO 1 DO WITH INPORT[IPI] DO
        BEGIN (*IP*)
          I_TIMER := 0;
          I_EMPTY := TRUE;
          I_DEST := -1;
          I_TR_TIME := 0;
          I_FEEDBACK_COUNT := 0;
        END; (*IP*)
      FOR PI := 0 TO 1 DO
        BEGIN
          FOR CI := 0 TO 2 DO WITH FIFO[PI,CI] DO
            BEGIN (*BP,CL*)
              F_EMPTY := TRUE;
              F_FULL := FALSE;
              F_COUNT := 0;
            END; (*BP,CL*)
          (* DETERMINE MEMORY LOCATIONS FOR EACH SWITCH'S BUFFER *)
          TEMPI := F012_SIZE*(2*SI-2+PI)+1;
          FIFO[PI,0].F_START := TEMPI;
          FIFO[PI,0].F_TOP := TEMPI;
          FIFO[PI,0].F_BOTTOM := TEMPI;
          FIFO[PI,0].F_STOP := TEMPI+F0_SIZE-1;
          FIFO[PI,1].F_START := TEMPI+F0_SIZE;
          FIFO[PI,1].F_TOP := TEMPI+F0_SIZE;
          FIFO[PI,1].F_BOTTOM := TEMPI+F0_SIZE;
          FIFO[PI,1].F_STOP := TEMPI+F0_SIZE+F1_SIZE-1;
          FIFO[PI,2].F_START := FIFO[PI,1].F_STOP+1;
          FIFO[PI,2].F_TOP := FIFO[PI,1].F_STOP+1;
          FIFO[PI,2].F_BOTTOM := FIFO[PI,1].F_STOP+1;
          FIFO[PI,2].F_STOP := FIFO[PI,2].F_START+F2_SIZE-1;
        END;
      FOR OPI := 0 TO 1 DO WITH OUTPORT[OPI] DO
        BEGIN (*OP*)
          O_TIMER := 0;
          O_EMPTY := TRUE;
          O_TR_TIME := 0;
          O_MATCHED := FALSE;
          O_DEST := -1;
          O_FEEDBACK_COUNT := 0;
          READLN(O_RR, O_NEXT_SW, O_NEXT_PT);
        END; (*OP*)
    END; (*S*)
  FOR TI := 1 TO N_TR DO WITH TR[TI] DO
    BEGIN (*T*)
      T_EMPTY := TRUE;
      T_BLOCKED := FALSE;
      T_DEST := -1;
      T_NEXT_SW := TRUNC((TI+1)/2);
      T_NEXT_PT := (TI+1) MOD 2;
      REPEAT
        T_TIMER := TRUNC(RANDOM(SEED)*2*TR_INTL);
      UNTIL
        (T_TIMER > 0) AND (T_TIMER < 2*TR_INTL);
    END; (*T*)
  FOR BI := 1 TO F_TOTAL DO WITH BUFFER[BI] DO
    BEGIN
      B_EMPTY := TRUE;
      B_TR_TIME := 0;
      B_DEST := -1;
      B_FEEDBACK_COUNT := 0;
    END;
  CLOCK := 0;
  TOTAL_DELAY := 0;
  TR_DELAY := 0;
  R_PACKET := 0;
  T_PACKET := 0;
  MAX_DELAY := 0;
  FMAX[0] := 0;
  FMAX[1] := 0;
  FMAX[2] := 0;  (* MAX USAGE OF EACH CLASS OF BUFFERS *)
END; (*INIT*)

PROCEDURE TR_GET_DEST;
VAR T : INTEGER;
BEGIN
  FOR T := 1 TO N_TR DO WITH TR[T] DO
    BEGIN (*T*)
      IF (T_TIMER <= CLOCK) AND T_EMPTY AND NOT T_BLOCKED THEN
        BEGIN (*TIMER*)
          REPEAT
            T_DEST := TRUNC(RANDOM(SEED)*65);
          UNTIL (T_DEST IN (. 1..64 .)) AND (T_DEST <> T);
          T_EMPTY := FALSE;
        END; (*TIMER*)
    END;
END; (*TR_GET_DEST*)

FUNCTION ROUTE(SW, DT : INTEGER) : INTEGER;
VAR SR, DR : INTEGER;
BEGIN
  SR := SW;
  DR := DT;
  ROUTE := 1;  (*INITIAL SETTING*)
  IF (SR <= 8) AND ((DR-1) MOD 2 = 0) THEN ROUTE := 0
  ELSE IF (SR <= 16) AND (SR > 8) AND (TRUNC((DR-1)/2) MOD 2 = 0) THEN ROUTE := 0
  ELSE IF (SR <= 24) AND (SR > 16) AND (TRUNC((DR-1)/4) MOD 2 = 0) THEN ROUTE := 0
  ELSE IF (SR <= 32) AND (SR > 24) AND (TRUNC((DR-1)/8) MOD 2 = 0) THEN ROUTE := 0
END;

PROCEDURE TR_TO_INPORT;
VAR
  TT : INTEGER;
  PACKET_COUNT : INTEGER;
BEGIN
  FOR TT := 1 TO N_TR DO WITH TR[TT] DO
    BEGIN (*T*)
      IF NOT T_EMPTY AND (T_TIMER <= CLOCK) THEN
        WITH SWITCH[T_NEXT_SW], INPORT[T_NEXT_PT] DO
          BEGIN (*READY TO TRANSMIT*)
            PACKET_COUNT := FIFO[T_NEXT_PT,0].F_COUNT;
            (* The source listing shows three alternative admission tests, each ending in THEN;
               the first two are assumed here to have been commented out in the original. *)
            IF (I_EMPTY) AND (I_TIMER <= CLOCK) AND
            (* (FIFO[T_NEXT_PT,0].F_COUNT < F0_SIZE) AND
               (FIFO[T_NEXT_PT,1].F_COUNT < F1_SIZE) AND
               (FIFO[T_NEXT_PT,2].F_COUNT < F2_SIZE) THEN *)
            (* (PACKET_COUNT < (F0_SIZE-1)*(TRUNC(T_NEXT_SW/8.4+1)*8/N_SW)) THEN *)
               (FIFO[0,0].F_COUNT < (TRUNC(T_NEXT_SW/8.4)+1)) AND
               (FIFO[1,0].F_COUNT < (TRUNC(T_NEXT_SW/8.4)+1)) THEN
              BEGIN (*GRANTED TO TRANSMIT*)
                I_TIMER := CLOCK+T_TRANSFER;
                I_EMPTY := FALSE;
                I_DEST := T_DEST;
                I_FEEDBACK_COUNT := 0;
                I_TR_TIME := CLOCK;
                T_EMPTY := TRUE;
                T_TIMER := CLOCK + TRUNC(TR_INTL*2*(RANDOM(SEED)));
                T_DEST := -1;
                T_BLOCKED := FALSE;
                INCR(T_PACKET);
              END
            ELSE
              BEGIN
                T_BLOCKED := TRUE;
                INCR(TR_DELAY);  (*INCREMENT TOTAL TR_DELAY*)
              END;
          END; (*READY TO TRANSMIT*)
    END; (*T*)
END; (*TR_TO_INPORT*)

PROCEDURE TRANSFER(
    SS, BB, CC, PPRT, OOPT, CC_NEXT : INTEGER);
VAR ST, BT, CT, PTRT, OPT, CT_NEXT : INTEGER;
BEGIN (*GRANT CLASS-CL BUFFER*)
  ST := SS;
  BT := BB;
  CT := CC;
  PTRT := PPRT;
  OPT := OOPT;
  CT_NEXT := CC_NEXT;
  WITH SWITCH[ST], FIFO[BT,CT], OUTPORT[OPT] DO
    BEGIN
      WITH BUFFER[PTRT] DO
        BEGIN
          DECR(F_COUNT);
          O_TIMER := CLOCK+T_SWITCH+T_DECIDE;
          O_EMPTY := FALSE;
          O_DEST := B_DEST;
          IF NOT (O_DEST IN (. 1..64 .)) THEN
            WRITELN('????? WRONG DEST, LINE 257??????', O_DEST:5);
          IF O_DEST = O_RR THEN O_MATCHED := TRUE
          ELSE O_MATCHED := FALSE;
          O_FEEDBACK_COUNT := CT_NEXT;
          O_TR_TIME := B_TR_TIME;
          B_EMPTY := TRUE;
          B_DEST := -1;
          B_FEEDBACK_COUNT := 0;
          B_TR_TIME := CLOCK;
          F_FULL := FALSE;
          F_BOTTOM := PTRT;
          IF F_TOP = F_BOTTOM THEN F_EMPTY := TRUE
          ELSE F_EMPTY := FALSE;
        END
    END (*CLASS_CL*)
END;

PROCEDURE OUTPORT_TO_INPORT;
VAR S, OP : INTEGER;
BEGIN
  FOR S := 1 TO N_SW DO WITH SWITCH[S] DO
    FOR OP := 0 TO 1 DO
      WITH OUTPORT[OP], SWITCH[O_NEXT_SW].INPORT[O_NEXT_PT] DO
        BEGIN (*S,OP*)
          IF NOT O_EMPTY AND NOT O_MATCHED AND (O_TIMER <= CLOCK) THEN
            BEGIN (*READY TO TRANSMIT*)
              IF (I_EMPTY) THEN
                BEGIN (*GRANTED TO TRANSMIT*)
                  I_TIMER := CLOCK+T_TRANSFER;
                  I_EMPTY := FALSE;
                  I_DEST := O_DEST;
                  I_TR_TIME := O_TR_TIME;
                  I_FEEDBACK_COUNT := O_FEEDBACK_COUNT;
                  O_TIMER := CLOCK+T_TRANSFER;
                  O_DEST := -1;
                  O_EMPTY := TRUE;
                  O_FEEDBACK_COUNT := 0;
                  O_TR_TIME := CLOCK;
                END
              ELSE
                BEGIN (*BLOCKED*)
                  (* OUTPORT IS BLOCKED *)
                  WRITE('?????OUTPORT BLOCKED???? ');
                  WRITE('AT T,SW,NEXT SW , FB_COUNT ');
                  WRITELN(CLOCK:3, S:3, O_NEXT_SW:3, O_FEEDBACK_COUNT:4);
                  O_EMPTY := FALSE;
                END;
            END (*READY*)
          ELSE IF O_MATCHED THEN (*REMOVE PACKETS*)
            BEGIN
              O_TIMER := CLOCK;
              O_EMPTY := TRUE;
              O_MATCHED := FALSE;
              INCR(R_PACKET);
              WRITELN('RECEIVED AT TIME,DEST,RR,DELAY=', CLOCK:5, O_DEST:5,
                      O_RR:5, CLOCK - O_TR_TIME:5);
              IF O_TR_TIME = 0 THEN WRITELN('??????? LINE 343????');
              TOTAL_DELAY := TOTAL_DELAY+(CLOCK - O_TR_TIME);
              IF (CLOCK - O_TR_TIME) > MAX_DELAY THEN
                MAX_DELAY := (CLOCK - O_TR_TIME);
              O_DEST := -1;
              O_FEEDBACK_COUNT := 0;
              O_TR_TIME := CLOCK;
            END; (*REMOVE PACKETS*)
        END; (*S,OP*)
END; (*OUTPORT_INPORT*)

PROCEDURE INPORT_TO_POOL;
VAR SS, IP : INTEGER;
BEGIN
  FOR SS := 1 TO N_SW DO WITH SWITCH[SS] DO
    FOR IP := 0 TO 1 DO WITH INPORT[IP] DO
      BEGIN (*S,IP*)
        IF NOT I_EMPTY AND (I_TIMER <= CLOCK) THEN
          BEGIN (*READY TO STORE PACKETS INTO BUFFER POOL*)
            IF FIFO[IP,I_FEEDBACK_COUNT].F_FULL THEN
              WRITELN('!!!!!!!F_FULL!!!!! S,P,CL=', SS:2, IP:2, I_FEEDBACK_COUNT:3);
            WITH FIFO[IP,I_FEEDBACK_COUNT] DO
              IF NOT F_FULL THEN
                BEGIN
                  F_TOP := (F_TOP+1);
                  IF F_TOP > F_STOP THEN F_TOP := F_START;
                  WITH BUFFER[F_TOP] DO
                    BEGIN
                      INCR(F_COUNT);
                      IF F_COUNT > FMAX[I_FEEDBACK_COUNT] THEN
                        FMAX[I_FEEDBACK_COUNT] := F_COUNT;
                      B_EMPTY := FALSE;
                      B_DEST := I_DEST;
                      B_TR_TIME := I_TR_TIME;
                      B_FEEDBACK_COUNT := I_FEEDBACK_COUNT;
                      I_TIMER := CLOCK+T_SWITCH+T_DECIDE;
                      I_EMPTY := TRUE;
                      I_DEST := -1;
                      I_FEEDBACK_COUNT := 0;
                      I_TR_TIME := CLOCK;
                      F_EMPTY := FALSE;
                      IF F_TOP = F_BOTTOM THEN F_FULL := TRUE
                      ELSE F_FULL := FALSE;
                    END
                END
          END
      END (*S,IP*)
END; (*INPORT_BUFFER POOL*)

PROCEDURE POOL_TO_OUTPORT;
VAR
  PTRP : INTEGER;  (*POINTER OF STRUCTURED BUFFERS*)
  TERMINATE : BOOLEAN;
  SP, OPP, PP, CLP, CLP_NEXT : INTEGER;
  CHECK_DEST : INTEGER;
  CHECK_BIT : INTEGER;
  OK_TRANSFER : BOOLEAN;
  NOW_SCHEDULED : INTEGER;
BEGIN
  FOR SP := 1 TO N_SW DO WITH SWITCH[SP] DO
    FOR OPP := 0 TO 1 DO WITH OUTPORT[OPP] DO
      BEGIN (*S,OP*)
        IF O_EMPTY AND (O_TIMER <= CLOCK) THEN
          BEGIN (*READY TO ACCEPT PACKETS FROM CLASS_0,_1,_2 BUFFERS*)
            NOW_SCHEDULED := 0;
            TERMINATE := FALSE;
            WHILE (NOW_SCHEDULED < 6) AND (NOT TERMINATE) DO
              BEGIN
                CASE NOW_SCHEDULED OF
                  0: BEGIN PP := 0; CLP := 2; END;
                  1: BEGIN PP := 1; CLP := 2; END;
                  2: BEGIN PP := 0; CLP := 1; END;
                  3: BEGIN PP := 1; CLP := 1; END;
                  4: BEGIN PP := 0; CLP := 0; END;
                  5: BEGIN PP := 1; CLP := 0; END;
                  <>: BEGIN WRITELN('ERROR IN POOL TO OUTPORT !!!!!!!'); END;
                END;
                PTRP := (FIFO[PP,CLP].F_BOTTOM+1);
                IF PTRP > FIFO[PP,CLP].F_STOP THEN
                  PTRP := FIFO[PP,CLP].F_START;
                (*REMOVE PACKET FROM BOTTOM OF BUFFER*)
                CHECK_DEST := BUFFER[PTRP].B_DEST;
                CHECK_BIT := ROUTE(SP, CHECK_DEST);
                (*DETERMINE SWITCH BIT OF PACKET*)
                (*IF FEEDBACK PACKET, THEN GO TO NEXT CLASS OF BUFFER*)
                (* The source listing shows both of the following forms in sequence;
                   as written, the second assignment overrides the first. *)
                IF O_NEXT_SW IN (. 1..8 .) THEN
                  BEGIN
                    IF (CLP < 2) AND (((CHECK_DEST-1) MOD 16) = ((O_RR-1) MOD 16)) THEN
                      CLP_NEXT := 2
                    ELSE IF CLP < 2 THEN
                      CLP_NEXT := CLP+1
                    ELSE
                      CLP_NEXT := CLP
                  END
                ELSE
                  CLP_NEXT := CLP;
                IF (O_NEXT_SW IN (. 1..8 .)) AND (CLP < 2) THEN
                  CLP_NEXT := CLP+1
                ELSE
                  CLP_NEXT := CLP;
                WITH SWITCH[O_NEXT_SW].FIFO[O_NEXT_PT,CLP_NEXT] DO
                  IF (NOT FIFO[PP,CLP].F_EMPTY) AND
                     (F_COUNT < (F_STOP-F_START)) AND
                     (CHECK_BIT = OPP) THEN
                    BEGIN
                      OK_TRANSFER := TRUE;
                      TERMINATE := TRUE;
                      O_TIMER := CLOCK+T_SWITCH;
                      O_EMPTY := FALSE;
                    END
                  ELSE
                    WITH BUFFER[PTRP] DO
                      BEGIN
                        OK_TRANSFER := FALSE;
                        NOW_SCHEDULED := (NOW_SCHEDULED+1);
                        TERMINATE := FALSE;
                      END;
              END; (*WHILE*)
            IF OK_TRANSFER THEN
              TRANSFER(SP, PP, CLP, PTRP, OPP, CLP_NEXT);
          END; (*EMPTY*)
      END; (*S,OP*)
END; (*POOL_TO_OUTPORT*)

PROCEDURE GROSS_DISPLAY;
BEGIN
  WRITELN('AT TIME=', CLOCK:5);
  WRITELN(' T_PACKET=', T_PACKET:7);
  WRITELN(' R_PACKET=', R_PACKET:7);
  IF R_PACKET > 1 THEN
    WRITELN(' AVERAGE DELAY=', TOTAL_DELAY/R_PACKET:10:5);
  WRITELN(' MAX DELAY= ', MAX_DELAY:5);
  WRITELN(' AVERAGE TR_DELAY= ', TR_DELAY/T_PACKET:10:5);
  WRITELN(' AVERAGE THROUGHPUT=', R_PACKET/T_SIMULAT:10:5);
  WRITELN(' UNDELIVERED PACKETS=', T_PACKET - R_PACKET:5);
  WRITELN(' MAX FIFO USAGE=', FMAX[0]:3, FMAX[1]:3, FMAX[2]:3);
  WRITELN(' TOTAL FIFO USAGE=', FMAX[0]+FMAX[1]+FMAX[2]:5);
END;

PROCEDURE DETAIL_DISPLAY;
(* NOTE: BUFFER_POOL, FIFO_SIZE and B_GENERATE_TIME are not declared in this listing,
   and this procedure is not called by the main program. *)
VAR SW, C, P, B : INTEGER;
BEGIN
  WRITELN('BUFFER DISPLAY OF SWITCH ARRAY');
  WRITE(' S PT CL B SRC DST SBIT STEP FB GEN_T TR_T BF_T DE_T DE_?');
  WRITELN(' F_TOP F_BOTTOM MAX_USED');
  FOR SW := 1 TO N_SW DO WITH SWITCH[SW] DO
    BEGIN
      FOR P := 0 TO 1 DO
        BEGIN
          FOR C := 0 TO 2 DO WITH BUFFER_POOL[P,C] DO
            BEGIN
              IF F_FULL THEN WRITELN('BUFFER FULL:S,P,C ', SW:3, P:3, C:3);
              IF NOT F_EMPTY THEN
                BEGIN
                  FOR B := 0 TO FIFO_SIZE-1 DO WITH
                      BUFFER[B] DO
                    IF NOT B_EMPTY THEN
                      BEGIN
                        WRITE(' ', SW:2, ' ', P:2, ' ', C:2, ' ', B:2, ' ', ' ', B_DEST:5);
                        WRITE(B_FEEDBACK_COUNT:3, ' ');
                        WRITELN(B_GENERATE_TIME:5, B_TR_TIME:5, ' ',
                                F_TOP:6, F_BOTTOM:6);
                      END;
                END;
            END;
        END;
      WRITELN;
    END;
END;

PROCEDURE DEBUG;
VAR B : INTEGER;
BEGIN
  WRITELN('DISPLAY OF BUFFERS');
  FOR B := 1 TO F_TOTAL DO WITH BUFFER[B] DO
    BEGIN
      IF NOT B_EMPTY THEN
        WRITELN(B:7, B_TR_TIME:5, B_DEST:5, B_FEEDBACK_COUNT:5);
    END;
END;

BEGIN (*LSSN*)
  SEED := 8476;
  INITIALIZE;
  FOR CLOCK := 1 TO T_SIMULAT DO
    BEGIN
      TR_GET_DEST;
      INPORT_TO_POOL;
      POOL_TO_OUTPORT;
      OUTPORT_TO_INPORT;
      TR_TO_INPORT;
      IF CLOCK > T_SIMULAT - 1 THEN GROSS_DISPLAY;
    END;
END.
(*INPUT FILE WHICH CONTAINS THE INTERCONNECTION PATTERN OF THE LSSN*)
W a t s o n a n d J . G u r d , " A P r o t o t y p e D a t a F l o w C o m p u t e r w i t h T o k e n L a b e l l i n g , " N a t i o n a l C o m p u t e r C o n f e r e n c e 1 9 7 9 , p p . 6 2 3 - 6 2 8 . 172 6 8 . D. P. M i s u n a s , " S t r u c t u r e P r o c e s s i n g i n a D a t a F l o w P r o c e s s o r , " P r o c e e d i n g s o f 1 9 7 6 I n t e r n a t i o n a l P a r a l l e l P r o c e s s i n g , A u g . 1976 p p . 1 0 0 - 1 0 5 . 6 9 . R. H. P e r r o t t , "A L a n g u a g e f o r A r r a y a n d V e c t o r P r o c e s s o r s , " ACM T r a n s , o n P r o g r a m m i n g L a n g u a g e a n d S y s t e m s , V o l . 1, N o . 2, O c t . 1 9 7 9 , p p . 1 7 7 - 1 9 5 .
