UBC Theses and Dissertations

The application of cluster analysis on a post office scheduling problem Wong, Siu-Sik 1976

THE APPLICATION OF CLUSTER ANALYSIS ON A POST OFFICE SCHEDULING PROBLEM

BY SIU-SIK WONG
B.A.Sc., UNIVERSITY OF BRITISH COLUMBIA, 1972

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER IN BUSINESS ADMINISTRATION in the Faculty of Commerce and Business Administration

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July, 1976
© Siu-Sik Wong, 1976

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5
Date

ABSTRACT

The application of computerized clustering methods in outlining the truck route boundaries for street letter box collection runs is believed to be an effective tool for use by the Vancouver Post Office.
This study investigates and analyses the characteristics, algorithms, and applicability of 12 cluster analysis techniques in grouping sets of two-dimensional data units for the Post Office. A broad view of cluster analysis is presented, including a review of the methodology and the potential problems associated with nine hierarchical and three nonhierarchical clustering methods. Two sets of contrived data and two empirical data sets (consisting of street letter box locations in the Burnaby area) are used to test the suitability of the grouping methods in clustering both evenly and unevenly distributed data units in a 2-dimensional Cartesian space. Computer programs for various clustering procedures are used to generate tree diagrams showing the linkages of the members within each group as well as the membership lists for the four data sets. The results are then plotted onto maps for evaluation.

Results of the evaluations, based on group sizes, distributions of distances within groups, and travel times and distances, can be summarized as follows:
a. Ward's method and the three nonhierarchical methods are better clustering techniques in grouping evenly distributed data sets;
b. the complete linkage method and the two average linkage methods are more suitable for grouping visually identifiable clustered data units;
c. the single linkage methods and the centroid methods are generally less satisfactory in grouping all four sets of data; and
d. clustering techniques provide a useful tool for outlining the route boundaries for street letter box collections.

A comparative study for the Vancouver area would substantiate the feasibility of cluster analysis as an aid to solving the scheduling problem.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENT

CHAPTER I INTRODUCTION
1.1 Vancouver City Postal Transportation Service
1.2 Purpose of the Study
1.3 Overview

CHAPTER II CLUSTER ANALYSIS: A BROAD VIEW
2.1 Need for Clustering Algorithms
2.2 Conceptual Problems in Cluster Analysis
2.2.1 The Objective Function
2.2.2 Choice of Data Units and Variables
2.2.3 Measures
2.2.4 Other Problems of Cluster Analysis
2.3 A Review of Clustering Techniques
2.4 Uses of Clustering Techniques

CHAPTER III HIERARCHICAL CLUSTERING TECHNIQUES
3.1 Basic Agglomerative Procedure and Approaches
3.2 Linkage Methods
3.2.1 Single Linkage Methods
3.2.2 Complete Linkage Method
3.2.3 Average Linkage Within the New Group
3.2.4 Average Linkage Between Merged Groups
3.3 Centroid Methods
3.3.1 Centroid Method
3.3.2 Median (Gower) Method
3.4 Error Sum of Squares or Variant Methods
3.5 Summary

CHAPTER IV NONHIERARCHICAL CLUSTERING TECHNIQUES
4.1 Elements of Nonhierarchical Methods
4.1.1 Seed Points
4.1.2 Initial Partitions
4.2 Nearest Centroid Sorting With Fixed Number of Clusters
4.2.1 Convergence Properties
4.2.2 Forgy's Method
4.2.3 Jancey's Variant
4.2.4 Convergent K-mean Method
4.3 Summary

CHAPTER V COMPARATIVE EVALUATION OF CLUSTERING TECHNIQUES
5.1 Approach to the Evaluation Process
5.1.1 Data Set
5.1.2 Association Measure
5.1.3 Inputs to Clustering Methods
5.1.4 The Number of Clusters
5.1.5 What to Cluster
5.1.6 Clustering Techniques
5.2 Tool for Interpretation of Results
5.3 Results
5.3.1 Evenly Distributed Contrived Data (DATA1)
5.3.2 Unevenly Distributed Contrived Data (DATA2)
5.3.3 North Burnaby Empirical Data (NBDATA)
5.3.4 South Burnaby Empirical Data (SBDATA)
5.4 Tools for Evaluation
5.5 Evaluation
5.5.1 Evenly Distributed Contrived Data (DATA1)
5.5.2 Unevenly Distributed Contrived Data (DATA2)
5.5.3 North Burnaby Empirical Data (NBDATA)
5.5.4 South Burnaby Empirical Data (SBDATA)
5.6 Summary

CHAPTER VI CONCLUSIONS

FOOTNOTE
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D
APPENDIX E
APPENDIX F
APPENDIX G
APPENDIX H

LIST OF FIGURES

1. Location Map of Evenly Distributed Contrived Data Set (DATA1)
2. Location Map of Unevenly Distributed Contrived Data Set (DATA2)
3. Location Map of North Burnaby Mail Boxes (NBDATA)
4. Location Map of South Burnaby Mail Boxes (SBDATA)
5. Linkages Outlined by Single Linkage-"City-Block" Method for DATA1
6. Linkages Outlined by Single Linkage-Euclidean Distance Method for DATA1
7. Linkages Outlined by Single Linkage-Chi Squares Method for DATA1
8. Linkages Outlined by Complete Linkage Method for DATA1
9. Linkages Outlined by Avg. Linkage between Merged Groups Method for DATA1
10. Linkages Outlined by Avg. Linkage within New Group Method for DATA1
11. Linkages Outlined by Centroid Method for DATA1
12. Linkages Outlined by Median Method for DATA1
13. Linkages Outlined by Ward's Method for DATA1
14. Group Boundaries Defined by 3 Nonhierarchical Methods Using Seed Points as Inputs for DATA1
15. Group Boundaries Defined by 3 Nonhierarchical Methods Using Initial Partitions for DATA1
16. Linkages Outlined by Single Linkage-"City-Block" Method for DATA2
17. Linkages Outlined by Single Linkage-Euclidean Distance Method for DATA2
18. Linkages Outlined by Single Linkage-Chi Squares Method for DATA2
19. Linkages Outlined by Complete Linkage Method for DATA2
20. Linkages Outlined by Avg. Linkage between Merged Groups Method for DATA2
21. Linkages Outlined by Avg. Linkage within New Group Method for DATA2
22. Linkages Outlined by Centroid Method for DATA2
23. Linkages Outlined by Median Method for DATA2
24. Linkages Outlined by Ward's Method for DATA2
25. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Seed Points as Inputs for DATA2
26. Group Boundaries Defined by Jancey's Method Using Initial Partitions for DATA2
27. Present Street Letter Box Collection Routes of North Burnaby Area
28. Linkages Outlined by Single Linkage-"City-Block" Method for NBDATA
29. Linkages Outlined by Single Linkage-Euclidean Distance Method for NBDATA
30. Linkages Outlined by Single Linkage-Chi Squares Method for NBDATA
31. Linkages Outlined by Complete Linkage Method for NBDATA
32. Linkages Outlined by Avg. Linkage between Merged Groups Method for NBDATA
33. Linkages Outlined by Avg. Linkage within New Group Method for NBDATA
34. Linkages Outlined by Centroid Method for NBDATA
35. Linkages Outlined by Median Method for NBDATA
36. Linkages Outlined by Ward's Method for NBDATA
37. Group Boundaries Defined by Jancey's Method Using Seed Points as Inputs for NBDATA
38. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Initial Partition for NBDATA
39. Present Street Letter Box Collection Routes of South Burnaby Area
40. Linkages Outlined by Single Linkage-"City-Block" Method for SBDATA
41. Linkages Outlined by Single Linkage-Euclidean Distance Method for SBDATA
42. Linkages Outlined by Single Linkage-Chi-Squares Method for SBDATA
43. Linkages Outlined by Complete Linkage Method for SBDATA
44. Linkages Outlined by Avg. Linkage between Merged Groups Method for SBDATA
45. Linkages Outlined by Avg. Linkage within New Group Method for SBDATA
46. Linkages Outlined by Centroid Method for SBDATA
47. Linkages Outlined by Median Method for SBDATA
48. Linkages Outlined by Ward's Method for SBDATA
49. Group Boundaries Defined by Jancey's Method Using Seed Points as Inputs for SBDATA
50. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Initial Partitions for SBDATA
51. Distribution of Distances Within Groups Defined by Nonhierarchical Methods for DATA1
52. Distribution of Distances Within Groups Defined by Ward's Method for DATA1
53. Distribution of Distances Within Groups Defined by Complete Linkage for DATA1
54. Distribution of Distances Within Groups Defined by Complete Linkage for DATA2
55. Distribution of Distances Within Groups Defined by Avg. Linkage Methods for DATA2
56. Distribution of Distances Within Groups Defined by Forgy's and Convergent K-mean Methods for NBDATA
57. Distribution of Distances Within Groups Defined by Average Linkage Methods for NBDATA
58. Distribution of Distances Within Groups Defined by Ward's Method for NBDATA
59. Distribution of Distances Within Groups Defined by Jancey's Method for SBDATA
60. Distribution of Distances Within Groups Defined by Chi-Squares Method for SBDATA
61. Distribution of Distances Within Groups Defined by Ward's Method for SBDATA

LIST OF TABLES

1. Storage Requirements for Similarity Matrix
2. Parameter Values of the Recurrence Formula for Five Hierarchical Techniques
3. Summary of Nonhierarchical Runs for DATA1
4. Results of 12 Clustering Methods for DATA1
5. Summary of Nonhierarchical Runs for DATA2
6. Results of 12 Clustering Methods for DATA2
7. Results of 12 Clustering Methods for NBDATA
8. Summary of Nonhierarchical Runs for NBDATA
9. Results of 12 Clustering Methods for SBDATA
10. Summary of Nonhierarchical Runs for SBDATA
11. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for DATA1
12. Travel Times and Distances of Groups Defined by 12 Cluster Methods for DATA1
13. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for DATA2
14. Travel Distances and Times of Groups Defined by 12 Cluster Methods for DATA2
15. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for NBDATA
16. Travel Distances and Times of Groups Defined by 12 Cluster Methods for NBDATA
17. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for SBDATA
18. Travel Distances and Times of Groups Defined by 12 Cluster Methods for SBDATA
19. Summary of Method Preferences for the Four Sets of Data

ACKNOWLEDGEMENT

I wish to thank Dr. C. L. Doll, my committee Chairman, for his guidance and time during all stages of my thesis work. I would also like to thank Mr. B. Dyke and Mr. T. Lim, both of Vancouver Post Office, for their assistance in collecting data. And special thanks to my wife Charmaine for her assistance in the typing of this thesis.

CHAPTER I INTRODUCTION

The Vancouver City Postal Transportation Service, like the postal services of many other big cities, is constantly faced with complicated revisions of schedules in the pursuit of assigning suitable duties to various trucks and drivers.
The scheduling of delivery services is highly constrained by the service and process requirements of the Post Office. The scheduling is further complicated by the unevenly distributed destinations of the processed mail volume. Planning for the truck and human resources movements is complex and time-consuming. At the present stage, the planning of all the schedules relies heavily on the experience of the experts, and the management is currently encouraging the development of a more efficient scheduling technique.

One of the biggest problems in scheduling is the setting up of the boundaries within which each scheduled service would render its delivery effort. Cluster analysis is a suitable approach to solving the problem. With the advent of computerization, several cluster methods can be run in a single program, and their results can then aid the planners and schedulers in determining the service boundaries. In order to apply this technique effectively, the complex operation of the Vancouver City Postal Transportation Service must be comprehended.

1.1 Vancouver City Postal Transportation Service

The Vancouver City Transportation operation is a very complex system with a fleet of various sized vehicles performing different types of delivery and collection services. As of May 1976, the Post Office City Transportation system has 151 small vehicles, of which 139 are scheduled for daily services and 12 are kept on a standby basis to meet contingencies, irregularities, replacements and breakdowns. A total of 23 medium to large trucks are also utilized for other transportation services within the Greater Vancouver area.

Small vehicles of \ ton capacities are mainly used for daily street letter box collections, relay bundle deliveries, parcel post deliveries, and special deliveries. Larger trucks are assigned to shuttle services transporting bulk mail volumes to and from the airport, docks, railway stations, satellite post offices and the Vancouver General Post Office.

The scheduling of these different sized trucks for various services is very intricate and complex. Each type of service has its own characteristics in timing, location constraints, and degree of importance to the Post Office operations. Frequent rescheduling of services is required because there are often alterations in the airplane schedules, delivery requirements, service requirements, street letter box and bundle box allocations, mail volume patterns, and union regulations. The rescheduling of truck services is presently done manually by the planners.

Constant growth of mail volumes to and from the Greater Vancouver area in recent years has been brought about by the tremendous increases in population as well as small industries in the area. The coordination of city postal transportation is becoming a very intricate and difficult problem to tackle. A special vehicle utilization study in the summer of 1975, investigating the feasibility of decentralizing the North and West Vancouver Mail Services from the General Post Office, indicated that the efficiency of the system could be greatly improved by decentralization (see Footnote). The study also introduced computerization in volume statistics and routing of vehicles. These computerized methods undoubtedly help the planners and the management in the scheduling of services; however, they cannot be utilized efficiently without the computerization of other elements involved in the scheduling of vehicles and manpower.

Most mail trucks are assigned to multiple services for a day's work, and it is critical that the time allowed for each service is adequate and efficiently allocated.
Relay bundle runs have to transport the bundled mail to the relay boxes in time for mail carriers to distribute it according to their schedules. Street letter box collection periods are restricted by both the delivery and the process plant requirements. The geographical boundaries of each street letter box collection and relay bundle run are, therefore, important to the structure of the vehicle schedules.

1.2 Purpose of the Study

With the advent of computerization, it would be ideal to have a mechanized system that can solve all the complex scheduling problems in a short time. A complete and detailed computer scheduling program is infeasible at the present stage because of the inability to comprehend all the details of an old existing manual system in a short period. ROUTPLOT, a computerized routing program used in the 1975 vehicle utilization study, shows what a partial system can do: although the program does not consider every minor detail, such as traffic restrictions and stop signs, in the construction of a route, it has relieved the management from spending a significant amount of time in routing or re-routing the vehicle schedules once alterations are introduced.
Modification of the computer results is still necessary, but the resultant schedules can be carried out within a day, instead of the week or month it takes to change the routings manually.

The routing of a vehicle actually depends highly on the geographic boundaries of the assigned schedule. Traditionally, the boundaries are either natural geographic boundaries confined by rivers, highways, bridges and sea, or arbitrarily determined postal zones. The boundaries for areas with little physical hindrance are primarily set by experienced planners who are ex-truck drivers. This system of boundary determination is in fact very reliable, but time-consuming.

The purpose of this study is two-fold:
(1) to examine the characteristics of 12 clustering techniques; and
(2) to validate, using contrived and empirical data, the applicability of these 12 techniques in grouping box locations into suitable clusters.

1.3 Overview

Chapter I begins with a general introduction supplemented by a brief description of the Vancouver City Postal Transportation Service operations. The purpose of the study is then outlined and an overview completes the first chapter.

Chapter II presents a broad view of cluster analysis by stressing the need for clustering algorithms and the conceptual problems in utilizing cluster analysis. Elements of cluster analysis such as variables, scales and measures are discussed to indicate the variety of methods for amalgamating variables and constituting data sets for cluster algorithms.
A review of clustering techniques and their uses, and some general remarks on the utilization of cluster analysis, complete this chapter.

Chapter III begins with a brief introduction to the 9 hierarchical clustering methods examined in this study. The basic approach to these techniques is first described, and the characteristics and criterion of each technique, with specific reference to distance measures, are then discussed in detail in the second part of this chapter. The rationale behind each method is also reviewed by referring to the literature on these methods.

Similar to the above chapter, Chapter IV discloses the philosophy, characteristics and criterion of three nonhierarchical clustering methods that are applicable to distance measured data and variables.

In Chapter V, the results generated by twelve different clustering techniques on four sets of input data are examined and evaluated in detail. Two sets of contrived data are designed to test the applicability of techniques for evenly and unevenly distributed data sets.
The other two data sets are actual street letter box locations of North and South Burnaby, and they are used to test the appropriateness and efficiency of these clustering techniques as aids to the scheduling problem. Statistical analyses of the travel times and distances of each set of results are performed and used as evaluation measures.

The concluding chapter discusses the suitability of the tested clustering techniques as a tool for scheduling street letter box collections and perhaps bundle relay runs for the Post Office. Areas of additional investigation are also included in this chapter.

CHAPTER II CLUSTER ANALYSIS: A BROAD VIEW

This chapter gives an introduction to the subject of cluster analysis. In general terms, cluster analysis can be referred to as a collection of techniques adopted in different applied fields to classify objects into groups which satisfy some criteria of homogeneity, inter-relatedness, or inter-group separation. This collection of techniques is the contribution of scientists from various fields, each holding a different viewpoint on the topic of classification or clustering.
The variety of techniques, uses and measures of cluster analysis was developed to suit diverse purposes in grouping data, variables or objects into categories usable for further data interpretation or other mathematical analysis. There is no general solution to the problem of clustering, partly because the application of each criterion of performance leads in principle to a different outcome. It is, therefore, necessary to understand the concepts behind each clustering algorithm, to investigate the applicability of different approaches, and to choose the correct method to derive usable clusters for a set of objects.

2.1 Need for Clustering Algorithms

Classifying objects or data into groups with defined criteria or intuitive notions requires numerous enumerations to search all the possibilities and to choose the best solution. This enumeration process would be very time-consuming and difficult. Abramowitz and Stegun (1968) indicated that the number of ways of sorting n observations into m groups is a Stirling number of the second kind. For even the relatively tiny problem of sorting 25 observations into 5 groups, the number of possibilities is the astounding quantity of over 2 x 10^15. This number could be further compounded if the number of ideal groupings is not known prior to this enumeration. In order to simplify the complexity of sorting objects into an appropriate number of groupings, clustering algorithms are used.
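
The count quoted above can be checked with the standard closed form for the Stirling number of the second kind, S(n, m) = (1/m!) * sum over k = 0..m of (-1)^(m-k) * C(m, k) * k^n. The following is a minimal Python sketch and is not part of the original thesis; the function name stirling2 is an illustrative choice.

```python
from math import comb, factorial


def stirling2(n: int, m: int) -> int:
    """Count the ways to sort n distinct observations into m non-empty groups.

    Uses the closed form
        S(n, m) = (1/m!) * sum_{k=0}^{m} (-1)^(m-k) * C(m, k) * k^n
    evaluated in exact integer arithmetic.
    """
    total = sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(m + 1))
    # The alternating sum is always an exact multiple of m!, so integer
    # division loses nothing.
    return total // factorial(m)


if __name__ == "__main__":
    # The thesis example: 25 observations sorted into 5 groups.
    print(stirling2(25, 5))
```

For 25 observations and 5 groups the result exceeds 2 x 10^15, in line with the figure in the text. If the number of groups is not fixed in advance, the counts for every admissible m have to be added together, which is why the text notes that the total is compounded further.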
These algorithms are procedures for searching through the set of all possible clusters to find one that fits the data reasonably well. Frequently, there is a numerical measure of fit which the algorithm attempts to optimize, but many useful algorithms do not explicitly optimize a criterion. These algorithms use different modes of search, such as sorting, switching, joining, splitting, adding and searching the data set, to identify the cluster that suits the criterion best. The choice of algorithm, however, is basically determined by the user's selection of the data unit, the variables and the similarity measures.

2.2 Conceptual Problems in Cluster Analysis

Application of cluster algorithms introduces a host of problems even though the intuitive idea of clustering is clear enough. The foremost difficulty is that cluster analysis is only a collection of heuristic procedures with a variety of decision rules and algorithms. A series of intuitive decisions is required to determine which elements of the cluster analysis repertory should be utilized. Unfortunately, the literature on cluster analysis does not provide a general framework for this collection of techniques that shows the steps involved, available alternatives, decision points, and relevant criteria for selecting among options. In the following sub-sections, the author attempts to build a framework in which the elements of cluster analysis can be easily related.

2.2.1 The Objective Function

Though there is no absolute solution to cluster problems, cluster analysis is usually used to determine a partitioning that satisfies some optimality criterion.
This optimality criterion may be given in terms of a functional relation that reflects the levels of desirability of the various partitions or groupings. This functional relation is often termed the objective function. Each algorithm uses a particular criterion: distance measures, similarity or dissimilarity measures, or quantifiable measures of homogeneity are all adopted as objective criteria for different techniques. This variety gives rise to the diversified problem of choosing appropriate data units and variables, as well as measures of functional relations.

2.2.2 Choice of Data Units and Variables

The actual mechanics of the cluster analysis are performed on a sample of entities representing "objects", "observations" or "elements". These entities could be the whole of a small population or a fraction of a large population. If random samples are chosen to represent a large population, independence of the data must be assumed. Cluster analysis on a given data set reflects the characteristics of these data units; thus the choice of data affects the outcome of the analysis.

Another problem facing the user is the choice of variables that can identify the entity's characteristics, attributes or traits. Any relevant discriminating variable could strongly affect the result of the cluster analysis. Missing variables could well generate amorphous and confusing clusters. On the other hand, inclusion of strong discriminators not relevant to the purpose at hand could mask the sought-for clusters and give misleading results.
A selection method must be used to determine relevant variables, and based on this selection of variables and with proper scaling, a single index of similarity could be derived.

In most discussions of statistical theory, the variables are assumed to be of a single type, usually continuous and on an interval scale. This convenient assumption, of course, greatly increases the power of mathematical techniques. However, in real-world problems, variables are usually of mixed types. In general, variables can be classified according to the size of the range set or the scale of measurement, and they can be cross-classified. In many cases, mathematical formulation and transformation are used to convert differently classified and scaled variables into usable forms for further interpretation and manipulation. This need for transformation motivates the formulation of scale conversions.

The homogeneity of scale types, as mentioned in the above paragraph, is required in most analysis techniques. Transformation of scale types, however, must be evaluated by its importance and suitability to the analysis techniques. Scales are usually referred to as nominal, ordinal, interval or ratio measures.
The transformation from one type of scale to another often involves subjective consideration of the validity of conversions. For cluster analysis, the variables are usually quantifiable, and unquantifiable scales are transformed by various methods (Cochran and Hopkins, 1961; Shepard, 1962a, 1962b; Kruskal, 1964; Anderberg, 1973) to suit the requirements of clustering techniques.

2.2.3 Measures

The majority of clustering techniques begin with the calculation of a matrix of similarities or distances between entities, and therefore consideration is needed of the possible ways of defining these quantities. Indeed, many clustering techniques may be thought of as attempts to summarize the information or relationships between entities which are given in a similarity matrix, so that these relationships can be easily comprehended and communicated.

Although variables could be scaled by various transformations, a measure of association is still needed to relate, in numerical form, the similarity of one variable to another. This measure is required because all clustering methods have the same basic working assumption that numerical measures among data or variables are comparable and sortable. Different types of variables warrant various measures of association. There is a host of methods for calculating the measures among variables: the angular measure between vectors, the product moment correlation coefficient, the canonical correlation, the matching coefficients, and some probability-based measures are a few measures commonly adopted in establishing numerical relations among variables. These measures of association of variables, however, need to be supplemented by a measure of association of data units if the clustering technique so chosen is designed for grouping data units.
The measures of association among data units differ from those of variables in many aspects. The most prominent difference is that a measure for data units requires a relationship measure among the variables describing the data unit, and such a measure sometimes does not exist. The heterogeneity of measurement units and variable types makes it especially difficult to define meaningful measures of association between data units, within the context of a given set of variables. Similarity and distance measures are the most popular measures adopted in clustering data units. There are a number of similarity measures, as well as distance measures, that are applicable to binary and qualitative data (Anderberg, 1973; Everitt, 1974; Duran, 1974). Two-dimensional problems are easier to measure: the distances among data units could be simply Euclidean distances. Multi-dimensional problems, however, require experimentation to examine the validity of weight assignment, representation spaces, and the basic approach to formulating such a measure.

2.2.4 Other Problems of Cluster Analysis

Even though the user has decided the objective function, the data and variables to be used, and the similarity measure between data or variables, there are still three big questions to be considered:
1. what to cluster;
2. the number of clusters; and
3. the choice of technique.

The variables are often amalgamated into a single index related to the data unit, and it is sometimes necessary to categorize the objects by variables instead of data units. Similar to factor analysis, the multivariate data are often classified and grouped by the attributes of different variables separately or simultaneously. This choice could depend on the degree of difficulty in amalgamating the variables and the relevance of individual variables to the cluster analysis.
A substantial practical problem in performing a cluster analysis is deciding upon the number of clusters in the data. Different clustering methods offer various degrees of flexibility in the number of clusters formed. Hierarchical clustering methods give a configuration for every number of clusters from one up to the number of entities, whereas other approaches might require a defined number of clusters prior to the clustering procedures. Some algorithms begin with a chosen number of groups and then modify this number as indicated by certain criteria, with the objective of simultaneously determining both the number of clusters and their configuration. All these indicate that the choice of technique could affect all the elements of cluster analysis.

The choice of technique is an inherent problem in using cluster analysis. As mentioned previously, the collection of clustering techniques was developed by scientists of different fields to satisfy their own needs, and each technique would be particularly suitable to one set of data or criteria. A review of the literature would help to determine the technique required for sorting objects with appropriate criteria and algorithms. Further discussion of clustering techniques is included in section 2.3 of this chapter.

2.3 A Review of Clustering Techniques

The various backgrounds of the scientists and researchers who developed different clustering techniques result in a variety of clustering algorithms. Comprehensive reviews of classification methods were conducted by Cormack (1971) and Anderberg (1973).
In general, cluster analysis techniques can be "classified" into types roughly as follows:

(i) Hierarchical techniques, in which the classes themselves are clustered into groups, the process being repeated at different levels, step by step, to form a tree diagram.
(ii) Optimization-partitioning techniques, in which the clusters are formed by optimizing the clustering criterion. The classes are mutually exclusive, thus forming a partition of the set of entities.
(iii) Density or mode-seeking techniques, in which clusters are formed by searching for regions containing a relatively dense concentration of entities.
(iv) Clumping techniques, in which the classes or clumps can overlap.
(v) Other methods which do not fall clearly into any of the four previous groups.

Of all the above categories, hierarchical techniques are the most commonly used and discussed. This category of techniques may be subdivided into "agglomerative" methods, which proceed by a series of successive fusions of the N entities into groups, and "divisive" methods, which partition the set of N entities successively into finer partitions. The results of both types can be presented in the form of a dendrogram, a two-dimensional tree diagram illustrating the fusions or partitions which have been made at each successive level. Both types of hierarchical techniques can be viewed as attempts to find the most efficient step, in some defined sense, at each stage in the progressive subdivision or synthesis of the population. Further discussion of hierarchical techniques is included in Chapter III.

Besides hierarchical clustering techniques, the others could be simply termed non-hierarchical methods.
Partitioning techniques can be formulated as attempts to partition the set of entities so as to optimize some pre-defined criterion. Most of these methods assume that the number of groups has been decided by the user, although some allow the number to be changed during the course of analysis. Three distinct procedures are employed by these techniques:
(a) a method of initiating clusters;
(b) a method for allocating entities to initiated clusters; and
(c) a method of reallocating some or all of the entities to other clusters once the initial classificatory process has been completed.
These non-hierarchical techniques are further examined in Chapter IV.

Density search techniques originated from single linkage cluster analysis. These techniques locate the high density regions and define them as clusters (Carmichael, 1968). Clumping techniques allow overlaps between classes, indicating that the overlapped entities may belong in several places (Jones and Jackson, 1967). Other methods such as Q-factor analysis (Cattell, 1952; Parks, 1970; Johnson, 1970), R-factor analysis (Gower, 1966), the BC Try System (Tryon and Bailey, 1970), and many others are less well known among clustering techniques. Most clustering methods require extensive enumeration, and computer programs are written to perform these procedures efficiently and accurately.

2.4 Uses of Clustering Techniques

Cluster analysis is generally used to sort data, variables or objects into meaningful classes for further interpretation. Since the characteristics and criteria of each clustering technique are designed differently for various purposes, it is hard to define the limits of cluster analysis.
At one end of the spectrum, it resembles the factor analysis procedures, and at the other, it is nothing but a sortation methodology. Applications of clustering techniques, therefore, vary from simple sortation of one- or two-dimensional data sets to complex multi-dimensional data classifications. Cluster analysis is most widely used by biological scientists in classifying species of different families. Other fields, such as psychology, geological sciences, economics, archaeology, medicine and many others, use cluster analysis mainly in the sortation of multivariate data sets.

The development of computer technology has helped to reduce the computational time for allocating or reallocating clusters within a data set. The increasing interest in applied statistics and the availability of vast amounts of data have further emphasized the importance of selecting, sorting and grouping useful data into clusters for other mathematical analysis.

The use of cluster analysis, however, is often tempered by the difficulties in interpreting the results. These difficulties could be the results of:
(i) the user's failure to recognize that a technique is inappropriate for the set of data; and
(ii) ignorance of the possibility that there are no clusters in the data set.

In using cluster analysis, it is necessary to keep in mind that:
(a) clustering techniques do not necessarily optimize with respect to the criteria;
(b) the selection of variables, data, and measures is critical to the results; and
(c) biased opinion in the selection of data and variables for a sampled population could generate confusing and meaningless results.
CHAPTER III

HIERARCHICAL CLUSTERING TECHNIQUES

The brief review of clustering techniques in Chapter II has indicated that there is a host of clustering methods applicable to the classification of data, variables or objects. Hierarchical clustering techniques are the most commonly used and, by far, the most discussed in the literature. It is difficult to examine every variation of hierarchical clustering techniques disclosed in the literature, and in this study only 9 of the more popular hierarchical methods are reviewed. The characteristics, criteria and problems, if any, associated with each of these techniques, with specific reference to the agglomerative approach, distance measures, and data clustering, are examined in the following sections.

The many hierarchical clustering methods treated in the literature are alternative formulations or minor variations of three basic clustering concepts:
(1) Linkage methods,
(2) Centroid methods, and
(3) Error sum of squares or variance methods.
All these methods are suitable for clustering data units, and only linkage methods can be adapted to variable clustering procedures.

3.1 Basic Agglomerative Procedure and Approaches

The basic procedure is similar for all the agglomerative methods.
They generally start with the computation of a correlation or distance matrix between the entities, and the end product is a dendrogram showing the sequential fusions of individual entities.

The correlation or distance matrix, which contains the association measure, $S_{ij}$ or $d_{ij}$, between entities i and j, is the most important component of the clustering procedure. Most procedures assume the measure between entities is symmetric, i.e. $S_{ij} = S_{ji}$ or $d_{ij} = d_{ji}$, and because of this assumption, only the lower triangle of the correlation or distance matrix is utilized in the clustering procedures. One critical point concerning the elements of the matrix is that the clustering procedures are not designed to handle negatively valued elements; thus, the absolute values or squares of the measure are frequently used as the association measure.

Another aspect related to the clustering procedure is that there are significant differences in the characteristics of a correlation and a distance matrix of n entities:

(a) Pairwise distances $d(X_i, X_j)$ may be represented in terms of a symmetric $n \times n$ distance matrix:

$$D = \begin{pmatrix} 0 & d_{12} & \cdots & d_{1n} \\ d_{21} & 0 & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \cdots & 0 \end{pmatrix}$$

The diagonal elements of the matrix D are $d_{ii} = 0$ for $i = 1, 2, \ldots, n$, and $d_{ij} \ge 0$ for $i, j = 1, 2, \ldots, n$.

(b) The correlation measure $S_{ij}$ between two entities is a non-negative real-valued function subject to:
(i) $0 \le S_{ij} < 1$ for $i \ne j$;
(ii) $S_{ii} = 1$ for $i = 1, 2, \ldots, n$; and
(iii) $S_{ij} = S_{ji}$.
The pairwise correlations can be represented in a matrix of the form

$$S = \begin{pmatrix} 1 & S_{12} & \cdots & S_{1n} \\ S_{21} & 1 & \cdots & S_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ S_{n1} & S_{n2} & \cdots & 1 \end{pmatrix}$$

These marked differences between distance and correlation measures reverse the clustering criteria of some hierarchical methods described in the following sections.
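As a small illustration of the distance matrix described in (a), the Python sketch below builds D for a few points under the Euclidean and city-block measures, checks the stated properties (zero diagonal, non-negative, symmetric), and extracts the lower triangle that a clustering routine would actually store. The sample points and all function names are illustrative assumptions, not taken from the thesis.

```python
from math import sqrt

def euclidean(x, y):
    """Straight-line distance between two coordinate tuples."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def distance_matrix(points, metric):
    """Full n x n matrix with entry [i][j] = metric(points[i], points[j])."""
    n = len(points)
    return [[metric(points[i], points[j]) for j in range(n)] for i in range(n)]

def is_valid_distance_matrix(d):
    """Check d_ii = 0, d_ij >= 0 and d_ij = d_ji for every i, j."""
    n = len(d)
    return all(d[i][i] == 0 for i in range(n)) and all(
        d[i][j] >= 0 and d[i][j] == d[j][i] for i in range(n) for j in range(n)
    )

def lower_triangle(d):
    """Entries below the diagonal, row by row."""
    return [d[i][j] for i in range(len(d)) for j in range(i)]

# Hypothetical sample data: three points in the plane.
points = [(0, 0), (3, 4), (6, 8)]
D_euclid = distance_matrix(points, euclidean)
D_block = distance_matrix(points, city_block)
print(lower_triangle(D_euclid), lower_triangle(D_block))
```

Because the matrix is symmetric with a fixed diagonal, the lower triangle carries all the information needed by the procedures that follow.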
Once the matrix is defined, the basic agglomerative approach to clustering entities into groups can be considered as follows:

(1) Begin with n clusters, each consisting of exactly one entity. Let the clusters be labeled with the numbers 1 through n.
(2) Search the correlation or distance matrix for the pair of clusters that best satisfies the clustering criterion. Let the chosen clusters be labeled p and q, with p > q, and let the association measure be $S_{pq}$ or $d_{pq}$.
(3) Reduce the number of clusters by 1 through the merger of clusters p and q. Label the product of the merger q, and update the correlation or distance matrix entries to reflect the revised association measures between cluster q and all other existing clusters. Delete the row and column of the original matrix pertaining to cluster p.
(4) Repeat steps (2) and (3) a total of (n - 1) times to form one single cluster containing n entities. The identity of the clusters that are merged and the values of the measures between them at each stage are recorded for the final dendrogram output.

Various agglomerative methods differ from this basic procedure in the clustering criterion used to define the most closely associated pair at step 2 and in the way the correlation or distance matrix is updated and revised at step 3. These variations would produce drastically different results for some data sets and little or no difference for others.

This basic agglomerative procedure is actually a series of comparisons between entities, and based on these comparisons, clusters are formed.
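Steps (1) to (4) above can be sketched in Python as follows. This is a minimal illustration for a distance matrix only, not the program used in this study: the function `agglomerate`, the callback signature `update(d_pr, d_qr, d_pq)`, the 0-based labels and the sample matrix are all assumptions introduced for the example. The update rule is passed in as a parameter because, as noted above, it is the part that differs between methods.

```python
def agglomerate(dist, update):
    """Basic agglomerative procedure on a symmetric distance matrix.

    dist   -- list of lists; dist[i][j] is the distance between entities i and j
    update -- function (d_pr, d_qr, d_pq) -> revised distance between the merged
              cluster and another cluster r
    Returns the merge history as a list of (q, p, d_pq) tuples.
    """
    # Step 1: one cluster per entity, labeled 0 .. n-1 (the text uses 1 .. n).
    active = list(range(len(dist)))
    d = {(i, j): dist[i][j] for i in active for j in active if i != j}
    history = []
    while len(active) > 1:
        # Step 2: search for the closest pair, keeping p > q.
        p, q = min(
            ((i, j) for i in active for j in active if i > j),
            key=lambda pair: d[pair],
        )
        d_pq = d[(p, q)]
        history.append((q, p, d_pq))
        # Step 3: the merged cluster keeps label q; revise its distances.
        for r in active:
            if r not in (p, q):
                d[(q, r)] = d[(r, q)] = update(d[(p, r)], d[(q, r)], d_pq)
        # Dropping label p plays the role of deleting its row and column.
        active.remove(p)
    # Step 4 is the loop itself: it runs n - 1 times.
    return history

# Hypothetical distances between four entities lying at 0, 1, 3 and 7 on a line.
D = [
    [0, 1, 3, 7],
    [1, 0, 2, 6],
    [3, 2, 0, 4],
    [7, 6, 4, 0],
]
nearest = agglomerate(D, lambda d_pr, d_qr, d_pq: min(d_pr, d_qr))
furthest = agglomerate(D, lambda d_pr, d_qr, d_pq: max(d_pr, d_qr))
print(nearest)
print(furthest)
```

Swapping the minimum rule for the maximum rule leaves the order of the merges unchanged for this small data set but alters the distances recorded at each stage, which is what a dendrogram would display.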
Comparison of matrix elements, searching for pairs of clusters, updating the similarity matrix, and deletion of matrix rows all add up to a total of 2n~ - 9n/2 comparisons. The number of comparisons should be one of the considerations in deciding the size of the input matrix and the time required for computation.

There are several computational approaches to clustering problems. Each approach has its own unique advantages and limitations. No single approach is a cure-all for all circumstances; each has its own realm of applications. Among the computational approaches, "stored matrix", "stored data" and "sorted matrix" are the more commonly used.

The stored matrix approach involves storing the correlation or distance matrix in the computer's central memory so that the similarity values may be accessed directly in any sequence. This approach, like any other, has its own unique characteristics:

(1) The clustering procedure is independent of the derivation of the correlation or distance matrix;
(2) This in-core storage method severely limits the number of entities that can be grouped. Anderberg (1973) indicates that problems of more than 150 entities are difficult to handle without a large computer. The storage requirements for the correlation or distance matrix are shown in Table 1.

Number of entities   Storage required   Number of entities   Storage required
        50                 1,225               300                44,850
       100                 4,950               350                61,075
       150                11,175               400                79,800
       200                19,900               450               101,025
       250                31,125               500               124,750

Table 1. Storage Requirements for Similarity Matrix

(3) Both variables and data to be grouped are represented in the matrix and do not affect the procedure of the clustering algorithm in this stored matrix approach.
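The figures in Table 1 are consistent with storing only the lower triangle of a symmetric matrix, that is, n(n - 1)/2 entries for n entities. The short Python check below is offered as a sketch; the function name is illustrative.

```python
def lower_triangle_entries(n: int) -> int:
    """Number of below-diagonal entries in an n x n symmetric matrix."""
    return n * (n - 1) // 2

# Reproduce the entity counts listed in Table 1.
storage = {n: lower_triangle_entries(n) for n in range(50, 501, 50)}
for n, entries in storage.items():
    print(f"{n:>4} entities -> {entries:>7,} stored values")
```

The requirement grows roughly with the square of the number of entities, which is why doubling a problem from 250 to 500 entities roughly quadruples the storage needed.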
In using this approach, the user must be aware of the capacity of the computer for storage and of the enumeration process of the different methods.

The stored data approach involves more computational procedures in its algorithm than the stored matrix approach. Storing the data, instead of the correlation or distance matrix elements, in central memory requires less core space. This approach is applicable to any clustering method with similarity or distance measures, but the combinatorial problem of computing association measures between clusters by reference to the original data is comparable to the sizing issue of the stored matrix approach. This computational problem can be avoided by storing in the computer memory either the association values or the summary statistics for each cluster, from which the desired association measure could be computed. This remedy, however, restricts the user of this approach to data unit clustering.

The sorted matrix approach is a relatively unexploited methodology. It is designed to handle a sizeable similarity matrix for sorting data units or variables. The distinct advantage of this approach is that it saves computer storage space, but the matrix has to be sorted before being input to the clustering program.

Other approaches are also available, using various combinations of the above approaches or specially designed algorithms for specific computer systems. Magnetic tapes and disks are mostly used by these other approaches (Wishart, 1969b; Park, 1970; Wolfe, 1970) for storing data units and/or the similarity matrix; thus up to 1000 data units and 200 variables can be clustered with hierarchical methods. Variations of the stored matrix and stored data approaches are generally adopted in most computational methods handling large data sets.
3.2 Linkage Methods

This category of hierarchical clustering methods is simple to use and easy to understand. The classification procedure is essentially identical to the basic agglomerative procedure. Maximum, minimum or average values of the association measures among entities are used as criteria for grouping the data units or variables into clusters. Methods using different clustering criteria, of course, produce different dendrograms. In the following subsections, the characteristics of four linkage methods are described.

3.2.1 Single Linkage Methods

The methods of single-linkage cluster analysis are the simplest of all hierarchical techniques, and are also the most popular. This approach was first described by Sneath (1957) and later by many other scientists (McQuitty, 1960; Lance and Williams, 1966; Johnson, 1967; Gower and Ross, 1969; Zahn, 1971; Sibson, 1973; Hartigan, 1975). The criterion used in these techniques is the minimum distance (maximum value if a correlation measure is used) between clusters. At each stage, after clusters p and q have been merged, the similarity between the new cluster t and some other cluster r is determined by

$$d_{tr} = \min(d_{pr}, d_{qr}) \quad \text{or} \quad S_{tr} = \max(S_{pr}, S_{qr}).$$

The measure between the two closest or most similar members of clusters t and r is the criterion for further merges. This criterion is used throughout the enumerations until one single cluster is formed.

This linkage procedure is known as single linkage because clusters are joined at each stage by the single shortest or strongest link.
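The single linkage update can be illustrated with a few lines of Python. The sketch assumes each cluster's associations are held as a dictionary keyed by the label of the other cluster; the function name, the `similarity` flag and the sample values are hypothetical.

```python
def single_linkage_update(row_p, row_q, similarity=False):
    """Combine the rows of merged clusters p and q into the row of new cluster t.

    row_p, row_q -- dicts mapping each other cluster r to d_pr and d_qr
                    (or S_pr and S_qr when similarity=True)
    similarity   -- False: distances, keep the minimum;
                    True: correlations, keep the maximum
    """
    pick = max if similarity else min
    return {r: pick(row_p[r], row_q[r]) for r in row_p if r in row_q}

# Distances from p and q to two other clusters (made-up values).
d_t = single_linkage_update({"r1": 3.0, "r2": 7.0}, {"r1": 2.0, "r2": 6.0})

# The same rule stated for correlations: the strongest link is kept.
s_t = single_linkage_update(
    {"r1": 0.2, "r2": 0.9}, {"r1": 0.5, "r2": 0.4}, similarity=True
)
print(d_t, s_t)
```

The reversal from minimum to maximum is the kind of criterion reversal between distance and correlation measures mentioned in section 3.1.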
For any cluster of two or more entities produced by this method, every member is more similar to some other member of the same cluster than to any entity not in the cluster. However, this approach is often criticized for the chained clusters it produces for non-ellipsoidal groupings. It is frequently stated that such a "chained" cluster has distinctly dissimilar entities at each end. The use of different distance measures for this approach, however, would yield different "chaining" effects. Distance measures such as simple "City-Block" and Euclidean distances, and Chi-squares, are often used in this approach to cluster data sets. The outcomes of using these different measures, of course, are different, and they are classified as three different single linkage methods in this study.

3.2.2 Complete Linkage Method

In contrast to the single-linkage method, the complete linkage method uses maximum distance or least correlation as the criterion to group entities. Sorensen's (Sneath, 1968) complete linkage criterion is that two individuals in a group have a similarity or distance which is less than a threshold value s or r. Other scientists termed this method the furthest neighbour technique, in which each individual is initially treated as a single-point cluster. This approach is considered as a "maximally connected subgraph" in graph theory.

Similar to the single-linkage procedure, at each stage of the complete linkage algorithm, after clusters p and q have been merged, the association measure between the new cluster t and some other cluster r is determined as $d_{tr} = \max(d_{pr}, d_{qr})$ for distance measures, and $S_{tr} = \min(S_{pr}, S_{qr})$ for correlation measures.
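A brief Python sketch of the complete linkage update follows. It also checks, on made-up one-dimensional data, that the updated value equals the distance between the furthest pair of members of the two clusters. The entity positions and function names are assumptions introduced only for this illustration.

```python
from itertools import product

# Illustrative positions of four entities on a line (made-up data).
position = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 7.0}

def furthest_pair_distance(cluster_t, cluster_r):
    """Brute force: largest distance between a member of each cluster."""
    return max(abs(position[x] - position[y]) for x, y in product(cluster_t, cluster_r))

def complete_linkage_update(m_pr, m_qr, similarity=False):
    """d_tr = max(d_pr, d_qr) for distances; S_tr = min(S_pr, S_qr) for correlations."""
    return min(m_pr, m_qr) if similarity else max(m_pr, m_qr)

p, q, r = ["a"], ["b"], ["c", "d"]
d_pr = furthest_pair_distance(p, r)  # a to d
d_qr = furthest_pair_distance(q, r)  # b to d
d_tr = complete_linkage_update(d_pr, d_qr)

# The update agrees with recomputing from the members of t = p + q.
print(d_tr, furthest_pair_distance(p + q, r))
```

The agreement between the two printed values shows why only the previous matrix entries, and not the original data, are needed at each stage.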
The quantity $d_{tr}$ (or $S_{tr}$) is the distance (or correlation) between the most distant (or dissimilar) members of clusters t and r. If the clusters were merged, then every entity in the resulting cluster would be no further than $d_{tr}$ from (or no less similar than $S_{tr}$ to) every other entity in the cluster. The $d_{tr}$ or $S_{tr}$ can be considered as the diameter of a sphere to which the maximum distance or minimum correlation relates. The interpretation of the clusters, in contrast to the single-linkage method, can be made only in terms of the relationships within individual clusters; there is no particularly useful interpretation involving the differences between clusters.

3.2.3 Average Linkage Within the New Group

Instead of relying on the extreme values, maximums or minimums, used in the single-linkage and complete linkage methods as criteria for grouping entities into clusters, the average linkage method utilizes average values of the measures as a rule for grouping entities or clusters. Two methods employ this criterion: one uses the within-group averages and the other compares the between-group averages of the merged groups. The latter method is discussed in section 3.2.4.

The $d_{ij}$ or $S_{ij}$ entries in the similarity matrix may be regarded as the sum of similarities associated with all pairwise combinations formed by taking one entity from cluster i and the other from cluster j. Prior to any merges, each cluster consists of just one single entity and there is only one such pair of entities for each pair of clusters.
Upon the merger of clusters p and q, the sum of pairwise similarities between the new cluster t and some other cluster r becomes $d_{tr} = d_{pr} + d_{qr}$ for distance measures, or $S_{tr} = S_{pr} + S_{qr}$ for correlation measures, and the similarity matrix is updated accordingly. The sum of all pairwise similarities among entities within cluster i, $\mathrm{SUM}_i$, becomes

$$\mathrm{SUM}_t = \mathrm{SUM}_p + \mathrm{SUM}_q + d_{pq}$$

when clusters p and q are merged and the new cluster t is formed. At the same time, the number of entities $N_i$ for cluster i increases accordingly as

$$N_t = N_p + N_q.$$

In searching for the most similar pair, the average within-group similarity for the cluster formed by merging the candidate pair of clusters i and j becomes

$$\frac{\mathrm{SUM}_i + \mathrm{SUM}_j + d_{ij}}{(N_i + N_j)(N_i + N_j - 1)/2}.$$

This average linkage method makes no reference to the maximum or minimum similarity values, and the interpretation of the resulting dendrogram needs a different approach from that for the single or complete linkage results. However, as a practical matter, this method frequently gives results that are little different from those obtained with the complete linkage method.

3.2.4 Average Linkage Between Merged Groups

This average linkage method uses different average values from those of the above method. Similar to the average linkage within-group technique, this method defines the distance between groups as the average of the distances between all pairs of individuals in the two groups. The procedure can be used with correlation and distance measures as long as the concept of an average measure is acceptable. The similarity matrix contains $d_{ij}$ (or $S_{ij}$), the sum of similarities associated with all pairwise combinations between clusters i and j.
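The bookkeeping for the two average linkage variants can be sketched in Python as below. The example merges two single-entity clusters p and q into t and then evaluates both averages for the candidate pair (t, r), comparing each against a brute-force average over the members. The data and function names are illustrative assumptions; the between-group average is taken as the matrix entry (a sum of pairwise distances) divided by the number of between-group pairs.

```python
from itertools import combinations, product

# Illustrative positions of three entities on a line (made-up data).
position = {"a": 0.0, "b": 1.0, "c": 3.0}

def dist(x, y):
    return abs(position[x] - position[y])

def within_group_average(sum_i, sum_j, d_ij, n_i, n_j):
    """Average over all pairs inside the cluster formed by merging i and j."""
    n = n_i + n_j
    return (sum_i + sum_j + d_ij) / (n * (n - 1) / 2)

def between_group_average(d_ij, n_i, n_j):
    """Average over the n_i * n_j pairs that straddle clusters i and j."""
    return d_ij / (n_i * n_j)

# Merge p = {a} and q = {b} into t; r = {c} is another existing cluster.
d_pq, d_pr, d_qr = dist("a", "b"), dist("a", "c"), dist("b", "c")
d_tr = d_pr + d_qr        # matrix entry: sum of pairwise distances between t and r
sum_t = 0.0 + 0.0 + d_pq  # SUM_t = SUM_p + SUM_q + d_pq (singletons have SUM = 0)
n_t = 1 + 1               # N_t = N_p + N_q

within = within_group_average(sum_t, 0.0, d_tr, n_t, 1)
between = between_group_average(d_tr, n_t, 1)

# Brute-force averages computed directly from the members, for comparison.
brute_within = sum(dist(x, y) for x, y in combinations(["a", "b", "c"], 2)) / 3
brute_between = sum(dist(x, y) for x, y in product(["a", "b"], ["c"])) / 2
print(within, brute_within, between, brute_between)
```

Only the running sums and counts are carried from stage to stage, so neither average requires returning to the original data.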
The number of such between-group pairwise similarities is the product of $N_i$ and $N_j$, where $N_i$ is the number of entities in cluster i. The average between-group similarity for clusters i and j can be formulated as

$\dfrac{d_{ij}}{N_i N_j}$ or $\dfrac{S_{ij}}{N_i N_j}$.

In this method, the sums of within-group pairwise similarities are ignored. With reference to the application of this method to the clustering of a correlation measure matrix, Lance and Williams (1967) point out that using

$\cos\left(\dfrac{\sum \cos^{-1} S_{ij}}{N_i N_j}\right)$,

with the sum taken over the between-group pairs, as a similarity measure would be more appropriate.

3.3 Centroid Methods

These methods merge the clusters with the most similar mean vectors or centroids. Two different approaches have been developed: centroid clustering analysis (Sokal and Michener, 1958; King, 1966, 1967; Lance & Williams, 1967a) and the median method (Gower, 1967; Lance & Williams, 1967a). The first method weights the mean vectors according to the number of entities in each group, whereas the latter method uses equal weights for the centroids of the groups. A unique characteristic of the centroid methods and their variants is that the similarity value associated with the merger of the most similar clusters may rise and fall from stage to stage. This is the reversal phenomenon associated with this approach. These reversals occur because cluster centroids can migrate as mergers take place.

3.3.1 Centroid Method

This method was originally proposed by Sokal and Michener (1958) and King (1966, 1967), who concentrate on the clustering of variables. Groups are depicted as lying in Euclidean space, and are replaced on formation by the coordinates of their centroid. The distance between groups is defined as the distance between the group centroids.
The procedure is then to fuse groups according to the distance between their centroids, the groups with the shortest distance being fused first. Lance and Williams (1967a) update the formulation of the distance measure between centroids as:

$d_{tr} = \dfrac{N_p}{N_p + N_q}\, d_{pr} + \dfrac{N_q}{N_p + N_q}\, d_{qr} - \dfrac{N_p N_q}{(N_p + N_q)^2}\, d_{pq}$

where p and q are the labels for the clusters just merged, t is the label for the new cluster, and r is any other existing cluster. This equation could be used with any similarity measure for either variables or data units; however, the results would lack a useful interpretation if $d_{ij}$ is not the squared Euclidean distance between the centroids of clusters i and j.

3.3.2 Median (Gower) Method

A disadvantage of the Centroid Method is that if the sizes of the two groups to be fused are very different, the centroid of the new group will be very close to that of the larger group and may remain within that group; the characteristics of the smaller group are then virtually lost. The strategy can be made independent of group size by assuming that the groups to be fused are of equal size; the apparent position of the new group will then always be midway between the two groups to be fused. In other words, as proposed by Gower (1967), the general idea is that the centroids are weighted equally regardless of how many entities are in the respective clusters. When $d_{ij}$ is a distance function, the updating equation for the Median method is

$d_{tr} = \tfrac{1}{2}(d_{pr} + d_{qr}) - \tfrac{1}{4}\, d_{pq}$

or

$S_{tr} = \tfrac{1}{2}(S_{pr} + S_{qr}) - \tfrac{1}{4}\, S_{pq}$

if $S_{ij}$ is a correlation function. Although this method could be made suitable for both similarity and distance measures, Lance & Williams (1967a) suggest that it should be regarded as incompatible with correlation measures, since the geometrical representation of the measure cannot be interpreted easily.
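The two updating equations of sections 3.3.1 and 3.3.2 can be compared on a small case. The following is a minimal sketch, not part of the original study: the function names are hypothetical, and the inputs are assumed to be squared Euclidean distances between centroids, as the text requires for a useful interpretation.

```python
def centroid_update(d_pr, d_qr, d_pq, n_p, n_q):
    """Centroid method: squared distance from the merged cluster t = p + q to cluster r.

    The centroids of p and q are weighted by their sizes n_p and n_q.
    """
    n_t = n_p + n_q
    return (n_p / n_t) * d_pr + (n_q / n_t) * d_qr - (n_p * n_q / n_t ** 2) * d_pq


def median_update(d_pr, d_qr, d_pq):
    """Median (Gower) method: same update with p and q treated as equal in size."""
    return 0.5 * (d_pr + d_qr) - 0.25 * d_pq


# Illustrative one-dimensional case (assumed data): centroid of p at 0 with 9
# members, centroid of q at 10 with 1 member, centroid of r at 20.
d_pr, d_qr, d_pq = 20.0 ** 2, 10.0 ** 2, 10.0 ** 2

print(centroid_update(d_pr, d_qr, d_pq, n_p=9, n_q=1))  # new centroid at 1, so about 19**2
print(median_update(d_pr, d_qr, d_pq))                   # new centroid at 5, so 15**2
```

In this case the centroid method places the new group almost on top of the larger group p, while the median method places it midway between p and q, which is the behaviour the text attributes to each method.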
3.4 Error Sum of Squares or Variant Methods

Although there are several methods using the error sum of squares as the objective function for clustering entities, they are variations of the method developed by Ward (1963) and Ward and Hook (1963). In this study, only Ward's method is examined.

Ward (1963) proposes that at any stage of an analysis the loss of information which results from the grouping of individuals into clusters can be measured by the total sum of squared deviations of every point from the mean of the cluster to which it belongs. At each step in the analysis, the union of every possible pair of clusters is considered, and the two clusters whose fusion results in the minimum increase in the error sum of squares are combined. In the formulation of this approach, the following are defined and calculated:

$x_{ijk}$ = score on the i-th of n variables for the j-th of $m_k$ data units in the k-th of h clusters

$\bar{x}_{ik} = \sum_{j=1}^{m_k} x_{ijk} / m_k$ = mean on the i-th variable for data units in the k-th cluster

$T_{ik} = \sum_{j=1}^{m_k} x_{ijk} = m_k \bar{x}_{ik}$ = total of scores on the i-th variable for data units in the k-th cluster

$S_k = \sum_{i=1}^{n} \sum_{j=1}^{m_k} x_{ijk}^2$ = sum of squared scores on all variables for all data units in the k-th cluster

Then the error sum of squares for cluster k may be written as

$E_k = S_k - \sum_{i=1}^{n} T_{ik}^2 / m_k$

The increase in the total error sum of squares due to the merger of clusters p and q to form the new cluster t is

$\Delta E_{pq} = E_t - E_p - E_q = S_p + S_q - \sum_{i=1}^{n} \dfrac{(T_{ip} + T_{iq})^2}{m_p + m_q} - E_p - E_q$

Based on the above formulas, the pair of clusters with the least $\Delta E_{pq}$ is merged into the new cluster.
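The quantities defined above translate directly into a few lines of code. The following is a minimal sketch under stated assumptions: each cluster is represented as a list of coordinate tuples, and the function names are hypothetical rather than taken from any program used in this study.

```python
def error_sum_of_squares(cluster):
    """E_k = S_k - sum_i T_ik**2 / m_k for one cluster (a list of equal-length tuples)."""
    m_k = len(cluster)
    n_vars = len(cluster[0])
    # S_k: sum of squared scores on all variables for all data units.
    s_k = sum(x * x for point in cluster for x in point)
    # T_ik: total of scores on variable i over the data units of the cluster.
    totals = [sum(point[i] for point in cluster) for i in range(n_vars)]
    return s_k - sum(t * t for t in totals) / m_k


def ward_increase(p, q):
    """Delta E_pq = E_t - E_p - E_q, where t is the union of clusters p and q."""
    return error_sum_of_squares(p + q) - error_sum_of_squares(p) - error_sum_of_squares(q)


# Illustrative data (assumed): a two-point cluster and a single distant point.
p = [(0.0, 0.0), (2.0, 0.0)]
q = [(10.0, 0.0)]
print(error_sum_of_squares(p))  # 2.0
print(ward_increase(p, q))      # 54.0
```

At each stage Ward's method would evaluate `ward_increase` for every candidate pair and merge the pair giving the smallest value.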
Wishart (1969a), in his computer algorithm, indicates that variables with a large variance have more influence on the joins than those with a small variance. The joining of units and sub-groups is decided on the basis of their contributions to the sum of squared deviations E. Although the joining of sub-groups p and q may result in a smaller sum of squared deviations, a link between p and r is chosen when the increase in the error sum of squares $\Delta E_{pr}$ is less than $\Delta E_{pq}$. This occurrence disrupts the homogeneity of the grouped entities. Alterations to both the sub-group structure and the changes in the error sum of squares are required in formulating this algorithm.

The Ward method is designed for a similarity matrix of Euclidean distances computed in any chosen representation space. Although this method may or may not give the minimum possible error sum of squares over all possible sets of h clusters from the m data units, the solution is usually very good even if it is not optimal on the criterion.

3.5 Summary

Many of the above-mentioned hierarchical clustering methods, using a distance measure between groups as the criterion, can be represented by a recurrence formula for the distance between a group k and a group (ij) formed by the fusion of groups i and j. This formula can be written as:

$d_{k(ij)} = \alpha_i d_{ki} + \alpha_j d_{kj} + \beta d_{ij} + \gamma\, |d_{ki} - d_{kj}|$

where $d_{ij}$ is the distance between groups i and j, and $\alpha$, $\beta$ and $\gamma$ are parameters related to the different methods as shown in Table 2.

Single Linkage: $\alpha_i = \alpha_j = 1/2$; $\beta = 0$; $\gamma = -1/2$
Complete Linkage: $\alpha_i = \alpha_j = 1/2$; $\beta = 0$; $\gamma = 1/2$
Centroid: $\alpha_i = n_i/(n_i + n_j)$; $\alpha_j = n_j/(n_i + n_j)$; $\beta = -\alpha_i \alpha_j$; $\gamma = 0$
Median: $\alpha_i = \alpha_j = 1/2$; $\beta = -1/4$; $\gamma = 0$
Ward's Method: $\alpha_i = (n_k + n_i)/(n_k + n_i + n_j)$; $\alpha_j = (n_k + n_j)/(n_k + n_i + n_j)$; $\beta = -n_k/(n_k + n_i + n_j)$; $\gamma = 0$

Table 2. Parameter Values of the Recurrence Formula for Five Hierarchical Techniques

This recurrence relationship is given by Lance & Williams (1967a) and by Wishart (1969c), and it is not suitable for methods using correlation measures.

On the whole, hierarchical clustering techniques all have their merits and limitations. There is no single method that would solve all types of clustering problems, and it is necessary for the user to examine the suitability of these techniques for grouping the data sets.

CHAPTER IV. NONHIERARCHICAL CLUSTERING TECHNIQUES

For a data set of n entities the hierarchical methods give n nested classifications ranging from n clusters of one member each to one cluster of n members. In contrast, the nonhierarchical techniques introduced in this chapter are designed to cluster data units into a single classification of k clusters, where k is either specified prior to the procedure or determined as part of the clustering method.

These methods may be used with much larger problems than the hierarchical methods because it is not necessary to calculate and store the similarity or distance matrix; it is not even necessary to store the data set. In general, the data units are processed serially and can be read from tape or disk as needed; this characteristic allows clustering of larger collections of data units. In this study, only three of the nearest centroid sorting methods with a fixed number of clusters are examined in detail.

4.1 Elements of Nonhierarchical Methods

Most nonhierarchical procedures can start with initial partitions or initial seed points of the data units.
When an initial partition is used, the algorithms change the cluster memberships into "better" partitions. The broad concept for these methods is very similar to that underlying the steepest descent algorithms used for unconstrained optimization in nonlinear programming. These methods start with initial seed points and then generate a sequence of moves from one point to another, each giving an improved value of the objective function, until a local optimum is found. The seed points and initial partition are, therefore, important to the nonhierarchical methods. These initial configurations can be chosen randomly or methodically, as discussed in the following sub-sections.

4.1.1 Seed Points

Various approaches are used in choosing a set of seed points that are adopted as cluster nuclei around which the set of n data units can be grouped. Some methods use the data units themselves as seed points, whereas others use more sophisticated methodology in arriving at the nuclei. The simpler methods choose (1) k data units from the set randomly (McRae, 1971); (2) the first k data units in the data set (MacQueen, 1967); (3) the data units labeled n/k, 2n/k, ..., (k-1)n/k and n, where the data units are initially labeled from 1st to n-th; or (4) k units from the data set selected subjectively. More calculated approaches for selecting the seed points use centroids of initial partitions (Forgy, 1965), densities of initial groups (Astrahan, 1970) or mean vectors of the data set (Ball and Hall, 1967). These approaches, like any other elements of clustering techniques, have substantially different influences on the results of the clustering procedure.

4.1.2 Initial Partitions

In lieu of seed points, some clustering methods emphasize the generation of an initial partition of the data units into mutually exclusive clusters. However, a set of initial seed points is required to generate the initial partition in some of the partitioning procedures.

Forgy (1965) uses a given set of seed points as the nuclei to initiate a partition formed by assigning each data unit to the nearest seed point. The seed points remain stationary throughout the assignment of the full data set, and consequently the resulting set of clusters is independent of the sequence in which data units are assigned. In two-dimensional problems these clusters are separated by pairwise linear boundaries which are equidistant from the cluster nuclei.

MacQueen (1967) assigns data units one at a time to the cluster with the nearest centroid, starting from the single-point clusters defined by the seed points; the centroids, as the true mean vectors of all the data units in each cluster, are updated as the clusters grow in size. In this method the cluster centroids migrate, so the distance between a given data unit and the centroid of a particular cluster may alter widely during the assignment process; as a result, the set of initial clusters is dependent on the order in which data units are assigned.

Wolfe (1970) uses Ward's hierarchical clustering method to provide an initial set of clusters for his algorithm. This approach, however, involves immense computational effort in setting up the partitions, thus limiting the size of the problem tremendously.
Similar to Wolfe's approach, Lance and Williams (1967b) suggest using hierarchical methods on one or more subsets of convenient size and then using the resulting groups as nuclei for assignment of the remaining data units.

Random assignment of the partition is, of course, the simplest approach to use. However, this approach would cluster entities without considering their homogeneity and thus is not an attractive alternative.

4.2 Nearest Centroids Sorting With Fixed Number of Clusters

Of all the nonhierarchical clustering techniques, the simplest iterative methods consist of merely two basic processes: (1) a set of seed points is computed as the centroids of a set of clusters, and (2) a set of clusters is constructed by assigning each data unit to the cluster with the nearest seed point. These two processes are repeated alternately until the configuration converges to a stable one, which is the critical condition for completing the clustering algorithm.

4.2.1 Convergence Properties

In using nearest centroid sorting algorithms, convergence of the process is critical and is expected in grouping data units. Proofs of convergence are mostly rigorous and generally difficult to understand. On the whole, the total within-group error sum of squares is the key to the convergence. Referring to the notation used in section 3.4, the total within-group error sum of squares E can be formulated as:

$E = \sum_{k=1}^{h} \sum_{j=1}^{m_k} \sum_{i=1}^{n} (x_{ijk} - \bar{x}_{ik})^2$

where $\sum_{i=1}^{n} (x_{ijk} - \bar{x}_{ik})^2$ is the squared Euclidean distance between the centroid of cluster k and the j-th data unit in that cluster.

Another characteristic of the algorithms that ensures convergence is that the number of different ways a data set of n data units may be clustered into h clusters is a finite number if n is finite.
This indicates that any method that generates each partition at most once is finitely convergent, because there are only finitely many different partitions.

The criterion chosen for deciding convergence of the nearest centroid sorting method is the stability of cluster membership; an alternative criterion is the stability of the cluster seed points. In most methods, the seed points are the cluster centroids, which depend only on the cluster membership.

4.2.2 Forgy's Method

The simple algorithm suggested by Forgy (1965) consists of basically three steps:

(1) Start with the desired initial configuration. If the configuration is a set of seed points, go to step 2; otherwise go to step 3.
(2) Assign data units to the clusters with the nearest seed point. The seed points remain intact for a full cycle through the entire data set.
(3) Compute new seed points as the centroids of the data units in each cluster.

Steps 2 and 3 are repeated alternately until the process converges; that is, iterate until no data units change their cluster membership at step 2.

In view of the various initial configurations (seed points or partitions), it is difficult to estimate how many iterations of the steps are required to achieve convergence in any particular problem. Empirical evidence reveals that five repetitions or fewer will be sufficient for small problems. A total of nk distance computations and n(k-1) comparisons of distances are required in assigning n data units to k clusters at each repetition of the two steps. A relatively limited number of iterations is actually necessary if the number of clusters is much smaller than the number of data units.
This approach allows users to try several variations of the number of clusters at a lower computational cost than that of a full hierarchical analysis.

4.2.3 Jancey's Variant

Jancey (1966) suggests a method similar to Forgy's with a modified step 3. The first set of cluster seed points is either given or computed as the centroids of the clusters in the initial partition; at all succeeding stages each new seed point is formed by reflecting the old seed point through the new centroid for the cluster. This technique presumably will accelerate convergence and possibly lead to a better overall solution by bypassing inferior local minima. This approach, similar to Forgy's, implicitly minimizes the within-group error function. The boundaries of the clusters are equidistant from each centroid, and the result of this method is not affected by the sequence of data units within the data set.

4.2.4 Convergent K-Mean Method

Unlike Forgy's or Jancey's approach of assigning members to old centroids, MacQueen (1967) uses a "K-mean" process that allocates each data unit to the cluster with the nearest centroid computed on the basis of the cluster's current membership. The convergent clustering method (Wishart, 1969b; McRae, 1971) using MacQueen's K-mean process can be implemented through the following sequence of steps.

(1) Begin with an initial partition of the data units into clusters. The partition could be constructed using any of the approaches described in section 4.1.2.
(2) Take each data unit in sequence and compute the distances to all cluster centroids; if the nearest centroid is not that of the data unit's parent cluster, then reassign the data unit and update the centroids of the losing and gaining clusters.
(3) Repeat step 2 until convergence is achieved; that is, until the membership in each cluster is stabilized.

4.3 Summary

There are still several other nonhierarchical methods described in the literature (MacQueen, 1967; Ball and Hall, 1965). Most of these use procedures similar to the three described, but have different grouping criteria and different procedures for updating the cluster elements. There are, undoubtedly, limitations and merits in each of the other methods, and since these methods were developed for different fields, results from these techniques on the same set of data are expected to differ.

The three techniques described in the previous sections all share the three distinct procedures of initiating, allocating and reallocating until convergence occurs within the data set. When these methods are used to cluster a given data set, the groupings are expected to be relatively similar. Forgy's and the Convergent K-mean methods will probably give similar results because of the resemblance in their procedures. Jancey's method, because of its different procedure for allocating new seed points, could produce unmatched clusters for the same given data set.

The suitability of a method for a given data set, therefore, cannot be determined without trying all three methods, perhaps with different seed points or initial partitions and various numbers of clusters. Interpretation of, and subjective opinion on, the results would be the only means of evaluating the compatibility of these clustering methods.
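The difference between Forgy's method and the Convergent K-mean method lies in when the centroids are recomputed, and a short sketch may make this concrete. The code below is an illustration only, not the Fortran programs used in this study: the function names and the sample data are hypothetical, squared Euclidean distance is assumed, and the initial partition is assumed to contain no empty clusters.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Mean vector of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def nearest(point, seeds):
    """Index of the seed point closest to `point` (first one wins on ties)."""
    return min(range(len(seeds)), key=lambda k: sq_dist(point, seeds[k]))


def forgy(data, seeds, max_iter=100):
    """Forgy: seeds stay fixed during a full pass and are recomputed afterwards."""
    seeds = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, seeds) for p in data]       # step 2
        if new_labels == labels:                             # membership is stable
            break
        labels = new_labels
        for k in range(len(seeds)):                          # step 3
            members = [p for p, lab in zip(data, labels) if lab == k]
            if members:                                      # an empty cluster keeps its old seed
                seeds[k] = centroid(members)
    return labels


def convergent_kmeans(data, labels, k, max_iter=100):
    """Convergent K-mean: centroids follow the membership after every reassignment."""
    labels = list(labels)
    for _ in range(max_iter):
        moved = 0
        for idx, point in enumerate(data):
            # Centroids reflect the current membership, including earlier moves in this pass.
            cents = [centroid([q for q, lab in zip(data, labels) if lab == c])
                     for c in range(k)]
            best = nearest(point, cents)
            # Never move the last member out of a cluster, so no cluster becomes empty.
            if best != labels[idx] and labels.count(labels[idx]) > 1:
                labels[idx] = best
                moved += 1
        if moved == 0:
            break
    return labels


data = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(forgy(data, seeds=[(0, 0), (10, 10)]))                  # [0, 0, 0, 1, 1, 1]
print(convergent_kmeans(data, labels=[0, 1, 0, 1, 0, 1], k=2))  # [0, 0, 0, 1, 1, 1]
```

Recomputing every centroid for every data unit keeps the sketch short; a practical program would update only the losing and gaining centroids, as step 2 of section 4.2.4 describes.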
CHAPTER V. COMPARATIVE EVALUATION OF CLUSTERING TECHNIQUES

Cluster analysis has always been an exploratory tool for generating hypotheses about the data or discerning fundamental facts previously not apparent. Interpretation of the results using various tools is needed to justify hypotheses or simply to evaluate the suitability of the techniques. This stage of judgement is subjective, intuitive, and heuristic. Comparisons of results generated by different techniques would indicate not only the relationship between entities but also the validity of the desired results.

This chapter describes the evaluation process used in this study by examining the appropriateness of each of the twelve clustering methods for grouping four sets of two-dimensional data, and discusses the results so obtained.

5.1 Approach to the Evaluation Process

The above chapters have stressed that each element of a cluster analysis has its own importance to the actual clustering procedure. Pertinent information on the data set is by far the most critical to the grouping procedures: the number of variables, the variable scales, and the association of these variables with each other within a data unit are the key inputs to a clustering algorithm. The association measure between data units is another essential element of the clustering analysis. The number of clusters, the cluster key and the clustering technique to be used are the other vital elements of a grouping analysis.

The scheduling problem of the Post Office has two unique characteristics: the locations of the boxes are fixed, and the travelling speed of the trucks is standardized at 15 m.p.h.
These two conditions emphasize the need for an efficient technique to assign an appropriate number of call points to each truck. The objective of this study is to find, if possible, such a technique. The comparative evaluation process is, therefore, constructed to investigate the appropriateness of several clustering techniques for the data sets, measures and other elements that are pertinent to this scheduling problem.

5.1.1 Data Set

In order to examine the applicability of different clustering methods to data sets of various spatial characteristics, four sets of data pertinent to the Post Office scheduling problem are used. All the data units of these four sets have only two variables, representing the x and y coordinates of a two-dimensional Cartesian space. The use of this dimensional space is based on the assumption that the distances between data units are computable by using two variables. Both variables, in this case, are of interval type, and there is no need to use scale conversions to make the variables conform.

These four data sets can be identified as:
(1) evenly distributed contrived data;
(2) unevenly distributed contrived data;
(3) empirical data for the North Burnaby area; and
(4) empirical data for the South Burnaby area.

The first set of data was constructed by randomly allocating 80 data points, each representing the arbitrary locality of a mail box. These data points are fairly evenly distributed, and there is no outstanding grouping which can be spotted visually (Figure 1).

The second set of contrived data is essentially a collection of data points falling into three visually identifiable groups (Figure 2).
This set also contains 80 data points, and is designed to test the ability of each technique to outline the visually feasible boundaries of the three groups.

Both the North Burnaby and the South Burnaby data sets represent the localities of mail boxes in the Municipality of Burnaby. The boundary dividing North and South Burnaby is an arbitrarily set limit to separate the routes of the mail runs. These two sets of data have no visually detectable clusters (Figure 3 and 4) and the present routes of trucks serving these areas do not conform to any grouping system. There are 87 and 113 box locations in North and South Burnaby respectively. The x and y coordinates of these localities are all measured with respect to the grid system adopted by the Post Office.

Figure 1. Location Map of Evenly Distributed Contrived Data Set (DATA1)

Figure 2. Location Map of Unevenly Distributed Contrived Data Set (DATA2)

Figure 3. Location Map of North Burnaby Mail Boxes (NBDATA)

All four sets of data have their own spatial characteristics. The contrived data sets are designed to test the ability of different clustering techniques to outline visually detectable group boundaries.
On the other hand, the empirical data sets are used to examine the validity of the clustering methods as a tool for grouping the non-patterned actual locations of boxes for the Post Office. Comparing the groupings that result from the different techniques on these four data sets would reveal the "natural" groupings, if any, inherent in each of these data sets.

5.1.2 Association Measure

The two-dimensional Cartesian data sets tested in this study are actually the localities of data points with two unrelated variables. Because the variables of any given data point in the set are mathematically unrelated, the correlation measure is ruled out as the clustering criterion. The other alternative is a distance measure. Among the various types of distance measures, the Euclidean one is probably the most suitable measure for clustering algorithms. Traditionally, the true distance between two points i and j can be expressed, in terms of their x and y coordinates, as:

$d_{ij} = k\left(|x_i - x_j|^p + |y_i - y_j|^p\right)$  (1)

where k and p are derived from analysis of the data set. The values of k and p both reflect the actual distance travelled from point i to point j. A study of the scheduling problem (Tse, 1975) related to the Post Office mail runs indicates that the values of k and p depend on the road pattern and the accessibility from one point to another. It is generally acceptable to use k and p as 1, and formula (1) will become

$d_{ij} = |x_i - x_j| + |y_i - y_j|$  (2)

In this study, the latter form is used because:
(1) the two axes of the city road grid are mostly at right angles to each other;
(2) most mail boxes are located at the corner of a street where the roads intersect; and
(3) the travelling distance from one location to another would be the sum of the horizontal distance along the E-W direction and the distance along the N-S direction.

This distance measure is referred to as "City-Block" distance in this study.

5.1.3 Inputs to Clustering Methods

As described in Chapters III and IV, the inputs required by the various hierarchical and nonhierarchical methods differ in many aspects. Basically, the similarity matrix and the data variables are the two major inputs. In this study, the stored matrix approach is used for six hierarchical methods, and data variables are the inputs to the other six clustering runs.

The elements of the symmetric similarity matrices containing the distance measures of the four data sets are computed by a small computer program, MATRIX (Appendix A). This program stores the values of the variables of each data point in a set in the computer memory, and the distances $d_{ij}$ from point i to point j are computed using formula (2) in the previous section. These distances are later scaled to values less than 1.0, as required by some of the clustering programs. The elements of the lower triangle of the symmetric matrix are then arranged in rows of 10 elements in a file for the clustering procedures.

Raw data variables are all punched on cards in predetermined formats for the various clustering algorithms. All these variables are in units of millimeters, conforming to the system used by the Post Office.
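The preparation of the stored-matrix input can be illustrated briefly. The following is a hypothetical sketch and not the MATRIX program of Appendix A: the function names are invented for illustration, and the scaling rule (dividing by the largest distance plus one) is an assumption, since the text states only that the distances are scaled to values less than 1.0.

```python
def city_block(a, b):
    """City-Block distance of formula (2) between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def lower_triangle(points):
    """Distances d_ij for i > j, taken row by row and scaled below 1.0.

    Assumed scaling: divide by (largest distance + 1).
    """
    raw = [city_block(points[i], points[j])
           for i in range(1, len(points))
           for j in range(i)]
    divisor = max(raw) + 1
    return [d / divisor for d in raw]


def rows_of(values, width=10):
    """Split a flat list into rows of at most `width` elements for output."""
    return [values[i:i + width] for i in range(0, len(values), width)]


# Illustrative coordinates (assumed), in the same units for x and y.
boxes = [(0, 0), (3, 4), (6, 0)]
for row in rows_of(lower_triangle(boxes)):
    print(" ".join(f"{d:.4f}" for d in row))  # 0.8750 0.7500 0.8750
```

For the 80-point contrived data sets this layout would hold 80 x 79 / 2 = 3160 distances, or 316 rows of 10 elements.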
5.1.4 The Number of Clusters

The number of clusters is an important key in the nonhierarchical methods used in this study. The four data sets are grouped into either two or three groups. The number of clusters for both contrived data sets is arbitrarily set: 2 for the evenly distributed data set and 3 for the other. In contrast, the number of clusters for the North and South Burnaby areas is predetermined: there are 2 and 3 routes for North and South Burnaby respectively, and it is only logical to use 2 and 3 clusters accordingly.

There is no requirement to define the number of clusters for hierarchical methods. These methods group data sets into one single cluster at the final stage, thus it is not necessary to predefine the number of groups. In order to compare the results of both hierarchical and nonhierarchical methods, the number of clusters for each data set is made equal to that stated in the above paragraph. The groups for hierarchical methods are identified in the tree diagram representing the dendrogram of the clustering results.

5.1.5 What to Cluster

Obviously there is no choice of what to cluster. The only key for clustering, in this case, is the data unit. The two totally unrelated variables would make the clustering of variables unfeasible and meaningless.
The independence of the variables among data points further discourages the use of variables as the basic cluster unit.

5.1.6 Clustering Techniques

The nine hierarchical and three nonhierarchical clustering methods described in Chapters III and IV are the 12 better known ones and are probably more applicable to distance measures than other complex computational methods. In this study, various computer programs for these clustering methods were used.

Fortran programs for single linkage (using minimum Euclidean distance as the criterion), complete linkage, average linkage within the new group, average linkage between merged groups, centroid and median methods are all modified from Anderberg's (1973) Appendix E (pp. 275-305). The inputs for these programs include the number of entities and the lower triangle of the symmetric similarity matrix. This set of programs is made up of the main program DRIVER; subroutines CNTRL, CLSTR, MTXIN, TREE, and METHOD; and function LFIND (Appendix B). The use of different versions of METHOD generates results for the above six methods. The result of each clustering procedure is presented in a horizontal tree diagram (Appendix B). From these tree diagrams, groups are identified and plotted accordingly onto figures.

The UBC:BMDP2M program is basically a single linkage clustering routine.
In this program, Engelman and Fu (1970) use either the square root of the sum of squares of differences (Euclidean distance) or the chi-square of the data points as the distance measure. Both these criteria give drastically different results from those of single linkage using simple Euclidean distance as a criterion measure. Data variables are the inputs to this program, and a vertical tree diagram (Appendix C) recording the clustering sequences is output from this computer package program.

Another UBC package program, CGROUP (Patterson and Whitaker, 1973), uses Ward's error sum of squares grouping technique to cluster data points with variables. Similar to the BMDP2M program, the inputs to CGROUP include the set of variables for each data unit and the options for running the program as well as outputting the results. The output contains a detailed sequence of the clustering procedure, the group membership at each step, and a vertical tree diagram. An optional output is the plot of the error sum of squares versus the number of groups (Appendix D).

A program for the three nonhierarchical methods is modified from Anderberg's (1973) Appendix F (pp. 306-325). This program is designed to implement the three nearest centroid sorting techniques described in Chapter IV. Forgy's and Jancey's grouping methods are options in one version of subroutine KMEAN, and the Convergent K-Mean method is implemented in another version of KMEAN. The whole program is composed of DRIVER, the main program, and 3 subroutines: EXEC, RESULT and KMEAN (Appendix E). Either seed points or initial partitions can be used to initiate the clusters in this program.
The other inputs include the number of entities, the number of variables for each entity, the number of clusters for this set of data, optional output features and the actual variables for each data point. The output is essentially a list of membership within each resulting cluster. The number of entities moved in the iterative steps is also output (Appendix E).

All the above programs are used to generate different outputs for this study. The single linkage methods (Euclidean, sum of squares of differences and chi-squares) are used to test the effect of measures on grouping results. The other methods are straightforward implementations of the other nine methods described in Chapters III and IV. The results from these trials are then compared, and evaluations of these techniques are discussed in the following sections.

5.2 Tool for Interpretation of Results

The outputs from all the computer programs give various representations of the clustering results. Tree diagrams together with the clustering sequence are commonly the hierarchical outputs. On the other hand, only a membership list is output from nonhierarchical methods. This inconsistent representation of outputs presents a problem in comparing the results effectively.

A tree diagram is actually a very effective tool for interpreting the clustering results.
However, if there are more than 50 points in the data set, the tree becomes complex and it is difficult to trace without a step by step follow-up of the sequence at the same time. A membership list is not useful at all as a tool for interpretation unless frequent references to the data unit inputs are made. Multi-dimensional data are therefore very difficult to interpret if the representation space has more than 3 dimensions.

The 2-dimensional data used in this case, however, can be plotted onto maps according to the values of the data variables. The results from both hierarchical and nonhierarchical methods, namely the linking of entities and the grouping lists, can be plotted onto maps of the data units. These plots of the results are the keys to interpretation and comparisons.

5.3 Results

The results from different clustering techniques from the computer programs are plotted onto maps for comparisons. The links of entities are plotted as straight lines between points in the graph. The sequential links of all the points as output by the hierarchical programs are charted using the lowest indexed point as the link to another entity or group lead point (the lowest indexed point of the group). This results in diagrams of many, if not confusing, links between data points. Recognizing the last few links among groups or entities in the diagrams allows the user to identify the group boundaries fairly easily.

The group boundaries for nonhierarchical results are easier to handle. The data point index on the graph is identical to that on the membership output list, thus boundaries of the clusters can be easily plotted onto the diagram.
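The reading of group boundaries from the last few links, described above for the hierarchical output, can be sketched in code. What follows is a minimal illustration and not the Fortran programs of Appendix B: it assumes a single linkage merge order built from Euclidean distances, and the function name `single_linkage_groups` and the sample coordinates are hypothetical. Stopping before the last k - 1 links leaves k groups, each labelled by its lowest indexed point.

```python
from itertools import combinations
from math import dist


def single_linkage_groups(points, k):
    """Merge the closest groups until k groups remain (single linkage sketch).

    Returns one label per point: the index of the group's lead point,
    taken here to be the lowest indexed point of the group.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the lead point, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every possible link between two points, shortest first.
    links = sorted(
        combinations(range(len(points)), 2),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )

    groups = len(points)
    for i, j in links:
        if groups == k:
            break  # the remaining (last k - 1) links are left undrawn
        lead_i, lead_j = find(i), find(j)
        if lead_i != lead_j:
            # Keep the lowest index as the lead point of the merged group.
            parent[max(lead_i, lead_j)] = min(lead_i, lead_j)
            groups -= 1
    return [find(i) for i in range(len(points))]


# Hypothetical coordinates: three boxes near the origin, two far away.
boxes = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(single_linkage_groups(boxes, 2))
```

In this sketch the group boundary is simply the set of links that were never drawn, which mirrors how the last links are picked out by eye on the plotted maps.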
Since the four sets of data have different spatial characteristics, the results of the 12 methods for each data set are presented in the following sub-sections for easy identification. The discussion of results is also included in these sub-sections.

5.3.1 Evenly Distributed Contrived Data (DATA1)

A total of 80 data points exist in this data set (Figure 1 and Appendix F). Different randomly chosen initial seed points and initial partitions were input for the three nonhierarchical methods, and the resulting groupings are identical in all these trials (Table 3, Figures 14 and 15).

Group Sizes, by Trial and Group

Seed Points
Methods: Jancey's, Forgy's, Convergent K-mean
26 36 36 36 59 20 44 44 44 44 44 44 66 36 36 36 15 36 36 36

Initial Partition
Methods: Jancey's, Forgy's, Convergent K-mean
40 40 35 45 30
44 44 44 36 36 36 36 36 36 44 44 44 36 36 36

Table 3. Summary of Nonhierarchical Runs for DATA1

The results from different clustering techniques are plotted as shown in Figures 5 to 15. Undoubtedly the different methods used give various results in the number of entities within the groups and the membership lists of these groups. Table 4 summarizes the number of entities for the two defined groups for this data set. The lists of memberships of the two groups resulting from each method are included in Appendix G.

The results presented in the diagrams and tables for this evenly distributed contrived data set give various group sizes as well as group memberships. These differences in results are the end products of different clustering criteria and algorithms.
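The nonhierarchical trials above, in which different seed points led to identical groupings, can be illustrated with a small nearest centroid sorting sketch. This is not the KMEAN subroutine of Appendix E. It assumes the usual textbook reading of a Forgy-style pass (assign every unit to its nearest centroid, recompute the centroids, repeat until no unit moves), and the function name `nearest_centroid_sort` and the sample points are hypothetical.

```python
from math import dist


def nearest_centroid_sort(points, seeds, max_passes=100):
    """Forgy-style nearest centroid sorting (illustrative sketch only).

    points: list of (x, y) tuples; seeds: initial seed points, one per group.
    Returns (labels, centroids) once a full pass moves no data unit.
    """
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_passes):
        # Assign every unit to its nearest centroid; centroids stay fixed
        # for the whole pass (assumption of this sketch).
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no unit changed group, so the sorting has converged
        labels = new_labels
        # Recompute each centroid as the mean of its current members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Two hypothetical clumps of three units each.
units = [(0, 0), (0, 2), (2, 0), (10, 10), (10, 12), (12, 10)]
labels_a, _ = nearest_centroid_sort(units, seeds=[(0, 0), (10, 10)])
labels_b, _ = nearest_centroid_sort(units, seeds=[(2, 0), (0, 2)])
print(labels_a, labels_b)
```

In this toy case both seed choices settle on the same two groups, even though the second pair of seeds lies entirely inside one clump. That is only an illustration of the behaviour reported for DATA1, not a general guarantee.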
Among the hierarchical methods, the number of entities in groups 1 and 2 is perhaps best balanced in Ward's method (group sizes of 43 and 37 respectively). This, however, does not necessarily indicate that Ward's error sum of squares method is the best for grouping an evenly distributed data set. The evaluation approach in Section 5.5 gives a more comprehensive judgement on the superiority of different clustering methods. It is interesting to notice that the Centroid and Median methods both have the same clustering sequences and group members in the clusters. Slight variations of the group members in the two average linkage methods reflect the similarity of these two algorithms. The same applies to the single linkage methods using "City-Block" and Euclidean distance as measures. The Chi-square single linkage method results, however, do not resemble those of any other hierarchical method, and it is doubtful that this method is suitable for clustering this evenly distributed data set.

All the nonhierarchical results are identical and they are fairly evenly balanced in group sizes. These nonhierarchical methods have a unique characteristic different from the hierarchical ones: the clusters are grouped laterally instead of radially as defined by most hierarchical methods. This characteristic would definitely affect the within-group inter-unit travel times and distances; the distribution of distances between units would also be different from that of the other methods. These results, however, do not indicate the superiority of one class of method over the other. Further evaluation of these methods is included in Section 5.5.

                                          Group Sizes
Methods                                 Group 1   Group 2

Hierarchical
  Single Linkage
    City-Block                             67        13
    Euclidean Distance                     67        13
    Chi-Squares                            22        58
  Complete Linkage                         35        45
  Avg. Linkage between Merged Group        27        53
  Avg. Linkage within New Group            27        53
  Centroid Method                          49        31
  Median Method                            49        31
  Ward's Method                            43        37
Nonhierarchical
  Jancey's Method                          36        44
  Forgy's Method                           36        44
  Convergent K-mean Method                 36        44

Table 4. Results of 12 Clustering Methods for DATA1

[Figure 5. Linkages Outlined by Single Linkage-"City-Block" Method for DATA1. Legend: Box Location, Cluster Link, 2nd Last Link, Last Link]
[Linkages Outlined by Single Linkage-Euclidean Distance Method for DATA1]
[Linkages Outlined by Single Linkage-Chi Squares Method for DATA1]
[Figure 9. Linkages Outlined by Avg. Linkage between Merged Groups Method for DATA1]
[Figure 11. Linkages Outlined by Centroid Method for DATA1]
[Figure 12. Linkages Outlined by Median Method for DATA1]
[Figure 13. Linkages Outlined by Ward's Method for DATA1]
[Figure 14. Group Boundaries Defined by 3 Nonhierarchical Methods Using Seed Points as Inputs for DATA1. Legend: Box Location, Seed Point, Initial Partition, New Centroid, Group Boundary]
[Figure 15. Group Boundaries Defined by 3 Nonhierarchical Methods Using Initial Partitions for DATA1]

5.3.2 Unevenly Distributed Contrived Data (DATA2)

This data set has 80 data units located in 3 visually distinguishable groupings (Figure 2 and Appendix F). Similar to the trials carried out for the evenly distributed data set, trials using different seed points and initial partitions for nonhierarchical methods were conducted. The results of these trials differ slightly with different seed points or initial partitions (Table 5 and Figures 25, 26).
This is probably the result of the inclusion of in-between data points for the three visually identifiable clusters. The variance of group sizes is relatively small, but this indicates the importance of initial seed points or centroids in using nonhierarchical methods. Resembling the results for DATA1, the clusters defined by these methods extend laterally instead of radially as in hierarchical methods.

                                   Group Sizes
                       Trial 1        Trial 2        Trial 3
Group                 1   2   3      1   2   3      1   2   3

Seed Points           8  35  67     10  40  60      7  30  70
  Jancey's           16  25  39     23  34  23     23  23  34
  Forgy's            23  29  28     16  41  23     15  29  35
  Convergent K-mean  23  23  34     16  41  23     16  29  35

Initial Partition    20  34  26     27  27  26     23  23  34
  Jancey's           15  25  39     23  23  34     16  39  25
  Forgy's            23  23  34     23  23  34     23  34  23
  Convergent K-mean  23  23  34     23  23  34     23  34  23

Table 5. Summary of Nonhierarchical Runs for DATA2

The results from the nine hierarchical methods differ greatly from one another (Table 6 and Figures 16-24). In this set of results, the Centroid and Median methods have identical clustering sequences as well as membership. Contrary to the results for the evenly distributed data set, the single linkage method using the "City-Block" measure does not resemble any other linkage or clustering method. The existence of a 1-member group among the 3 clusters in this method suggests that the single linkage "City-Block" method is inappropriate for grouping an unevenly distributed data set. The other two single linkage approaches, on the other hand, give identical memberships to the 3 groups, though the sequences of grouping are different. The average linkage methods also give identical memberships as well as grouping sequences to the 3 groups.
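Why the choice of measure can change a single linkage result may be seen in a toy case. The sketch below is not drawn from the thesis data: the three points and the helper names `city_block` and `nearest` are hypothetical, chosen only so that the nearest neighbour of a point differs under the two measures. Since single linkage always joins nearest neighbours first, such a difference changes the order of the links.

```python
from math import dist


def city_block(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(origin, candidates, measure):
    """Name of the candidate closest to origin under the given measure."""
    return min(candidates, key=lambda name: measure(origin, candidates[name]))


origin = (0, 0)
candidates = {"A": (3, 3), "B": (5, 0)}

# Euclidean: A is about 4.24 away, B is 5.0 away.
print(nearest(origin, candidates, dist))
# City-block: A is 6 away, B is 5 away.
print(nearest(origin, candidates, city_block))
```

Under the Euclidean measure the first link from the origin goes to A, while under the city-block measure it goes to B.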
The intended group sizes are 19, 34 and 27, and the results from the complete linkage method are identical to these predetermined group sizes. The average linkage methods also give results very similar to these group sizes.

                                               Group Sizes
Methods                                 Group 1   Group 2   Group 3

Hierarchical
  Single Linkage
    City-Block                             53         1        26
    Euclidean Distance                     17        25        38
    Chi-Squares                            17        25        38
  Complete Linkage                         19        34        27
  Avg. Linkage between Merged Group        20        33        27
  Avg. Linkage within New Group            20        33        27
  Centroid Method                          20        37        23
  Median Method                            20        37        23
  Ward's Method                            42        14        24
Nonhierarchical
  Jancey's Method                          23        23        34
  Forgy's Method                           16        29        35
  Convergent K-mean Method                 16        29        35

Table 6. Results of 12 Clustering Methods for DATA2

[Figure 16. Linkages Outlined by Single Linkage-"City-Block" Method for DATA2. Legend: Box Location, Cluster Link, 3rd Last Link, 2nd Last Link, Last Link]
[Figure 17. Linkages Outlined by Single Linkage-Euclidean Distance Method for DATA2]
[Figure 20. Linkages Outlined by Avg. Linkage between Merged Groups Method for DATA2]
[Figure 21. Linkages Outlined by Avg. Linkage within New Group Method for DATA2]
[Figure 22. Linkages Outlined by Centroid Method for DATA2]
[Figure 23. Linkages Outlined by Median Method for DATA2]
[Figure 24. Linkages Outlined by Ward's Method for DATA2]
[Figure 25. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Seed Points as Inputs for DATA2]
[Figure 26. Group Boundaries Defined by Jancey's Method Using Initial Partitions for DATA2]

On the whole, it is clear that visually identifiable groups in an unevenly distributed data set can be detected more easily than those in an evenly distributed data set. Further evaluations are discussed in Section 5.5.

5.3.3 North Burnaby Empirical Data (NBDATA)

There are a total of 87 box locations for two routes in the North Burnaby area (Figure 3 and Appendix F).
These locations are situated in an area of approximately 34 square miles. This scattered, ungrouped data set has no visually identifiable clusters. These points are presently routed as two groups of 46 and 41 respectively (Figure 27), and these routes have little implication on the desired grouping of the box locations.

The hierarchical and nonhierarchical results are presented in Table 7 and Figures 28-38. The group sizes vary from 3 to 39 for group 1 and from 48 to 84 for the other. The results of two dissimilar group sizes in the five hierarchical methods (single linkage: "City-Block" and Euclidean distance; average linkage between merged groups; and the centroid methods) show that a potential cluster (points 1, 2 and 3) located at a distance from the rest of the locations would temper the effectiveness of these clustering algorithms. The only method that groups localities into clusters of 39 and 48 members respectively is the average linkage within the new group algorithm. This, however, does not indicate that this method is most suitable for this set of data. The evaluation approach in later sections gives an in-depth examination of these results.

                                          Group Sizes
Methods                                 Group 1   Group 2

Hierarchical
  Single Linkage
    City-Block                              3        84
    Euclidean Distance                      3        84
    Chi-Squares                            29        58
  Complete Linkage                         37        50
  Avg. Linkage between Merged Group         3        84
  Avg. Linkage within New Group            39        48
  Centroid Method                          12        75
  Median Method                            12        75
  Ward's Method                            35        52
Nonhierarchical
  Jancey's Method                          35        52
  Forgy's Method                           31        56
  Convergent K-mean Method                 31        56

Table 7. Results of 12 Clustering Methods for NBDATA

[Figure 28. Linkages Outlined by Single Linkage-"City-Block" Method for NBDATA (scale 1" = 0.62 mile)]
[Figure 29. Linkages Outlined by Single Linkage-Euclidean Distance Method for NBDATA (scale 1" = 0.62 mile)]
[Figure 30. Linkages Outlined by Single Linkage-Chi Squares Method for NBDATA (scale 1" = 0.62 mile)]
[Figure 32. Linkages Outlined by Avg. Linkage between Merged Groups Method for NBDATA]
[Figure 37. Group Boundaries Defined by Jancey's Method Using Seed Points as Inputs for NBDATA (scale 1" = 0.62 mile)]
[Figure 38. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Initial Partition for NBDATA]

Three sets of seed points and initial partitions were also used to test the effect of the initial clusters on the results (Table 8). There are indications that initial partitions have less influence than seed points on the data set. These variances in resulting group sizes, however, do not indicate the superiority of one nonhierarchical method over the others.

                               Group Sizes
                       Trial 1     Trial 2     Trial 3
Group                  1    2      1    2      1    2

Seed Points            5   62     10   70     15   55
  Jancey's            34   53     35   52     34   53
  Forgy's             35   52     31   56     31   56
  Convergent K-mean   35   52     31   56     31   56

Initial Partition     40   47     43   44     35   52
  Jancey's            35   52     34   53     35   52
  Forgy's             35   52     35   52     35   52
  Convergent K-mean   35   52     35   52     35   52

Table 8. Summary of Nonhierarchical Runs for NBDATA

5.3.4 South Burnaby Empirical Data (SBDATA)

This set of data depicts the locations of 113 mail boxes which are serviced by 3 truck routes (Figure 39) in the South Burnaby area. There are no distinct group boundaries that are visually identifiable (Figure 4 and Appendix F) for this set. This probably represents the typical distribution of mail boxes in other areas. The 12 clustering methods used previously for the other three sets are also employed for grouping this data set.

(Scale 1" = 0.62 mile)
Figure 39.
Present Street Letter Box Collection Routes of South Burnaby Area

The results of these 12 methods are tabulated and plotted in Table 9 and Figures 40-50. The single linkage "City-Block" method identifies group sizes of 1, 1 and 111. These group sizes are totally unacceptable as aids to scheduling problems. The centroid methods have similarly unbalanced results. The complete linkage, Ward's method and the three nonhierarchical algorithms give a reasonably balanced number of members in each group. However, none of the resultant group sizes from these clustering methods resembles the actual number of call points (group sizes 38, 40 and 36). A more extensive evaluation examining the critical group sizes will be discussed in Section 5.5.

The nonhierarchical methods' results presented in Table 9 are the more distinctive group sizes of six trials using different seed points and partitions (Table 10). These trials show few significant differences in the group sizes.

                                               Group Sizes
Methods                                 Group 1   Group 2   Group 3

Hierarchical
  Single Linkage
    City-Block                              1         1       111
    Euclidean Distance                     21        26        66
    Chi-Squares                            34        36        43
  Complete Linkage                         20        45        48
  Avg. Linkage between Merged Group        21        26        66
  Avg. Linkage within New Group            22        36        55
  Centroid Method                           2        18        93
  Median Method                             2        18        93
  Ward's Method                            45        28        40
Nonhierarchical
  Jancey's Method                          34        45        34
  Forgy's Method                           33        46        34
  Convergent K-mean Method                 33        46        34

Table 9. Results of 12 Clustering Methods for SBDATA

                                   Group Sizes
                       Trial 1        Trial 2        Trial 3
Group                 1   2   3      1   2   3      1   2   3

Seed Points          20  44  87     15  40  93     20  39  92
  Jancey's           35  44  34     34  45  34     34  45  34
  Forgy's            33  46  34     33  46  34     34  49  30
  Convergent K-mean  33  46  34     34  49  30     34  49  30

Initial Partition    35  40  38     38  38  37     35  44  34
  Jancey's           34  45  34     35  44  34     33  46  34
  Forgy's            36  46  31     36  47  30     36  46  31
  Convergent K-mean  35  44  34     35  44  34     35  44  34

Table 10. Summary of Nonhierarchical Runs for SBDATA

[Figure 40. Linkages Outlined by Single Linkage-"City-Block" Method for SBDATA (scale 1" = 0.62 mile)]
[Figure 41. Linkages Outlined by Single Linkage-Euclidean Distance Method for SBDATA]
[Figure 42. Linkages Outlined by Single Linkage-Chi-Squares Method for SBDATA (scale 1" = 0.62 mile)]
[Figure 44. Linkages Outlined by Avg. Linkage between Merged Groups Method for SBDATA (scale 1" = 0.62 mile)]
[Figure 47. Linkages Outlined by Median Method for SBDATA (scale 1" = 0.62 mile)]
[Figure 48. Linkages Outlined by Ward's Method for SBDATA (scale 1" = 0.62 mile). Legend: Box Location, Cluster Link, 3rd Last Link, 2nd Last Link]
[Figure 49. Group Boundaries Defined by Jancey's Method Using Seed Points as Inputs for SBDATA. Legend: Box Location, Seed Point, Initial Partition, New Centroid, Group Boundary]
[Figure 50. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Initial Partitions for SBDATA (scale 1" = 0.62 mile)]

5.4 Tools for Evaluation

Two methods were developed to evaluate the groups resulting from various techniques on the four sets of data. Both methods are included in the computer program ROUTE (Appendix H), which computes statistics and routes for the groups outlined by the different clustering methods.

The first part of the program records the data points; calculates the distances among data points within the groups; computes the mean and standard deviation; and plots a histogram of the distribution of distances (Appendix H).
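The statistics produced by the first part of ROUTE can be sketched as follows. This is an illustration only and not the program listed in Appendix H: the function names `within_group_distances` and `summarize` are hypothetical, Euclidean distance between data units is assumed, and the use of the population standard deviation and of equal-width histogram intervals are assumptions of the sketch.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def within_group_distances(points, labels):
    """Distances between every pair of data units that share a group label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: [dist(a, b) for a, b in combinations(members, 2)]
        for label, members in groups.items()
    }


def summarize(distances, bins=10):
    """Mean, standard deviation and equal-width histogram counts.

    A group with fewer than two units has no inter-unit distances;
    zeros are returned in that case (a choice made for this sketch).
    """
    if not distances:
        return 0.0, 0.0, [0] * bins
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / bins or 1.0  # guard against identical distances
    counts = [0] * bins
    for d in distances:
        # The largest distance falls in the last interval.
        counts[min(int((d - lo) / width), bins - 1)] += 1
    return mean(distances), pstdev(distances), counts


# Hypothetical units: three in group 1, a single unit in group 2.
units = [(0, 0), (3, 4), (6, 8), (100, 100)]
by_group = within_group_distances(units, [1, 1, 1, 2])
for label, distances in sorted(by_group.items()):
    print(label, summarize(distances, bins=2))
```

Comparing the means, standard deviations and histogram shapes across the groups produced by one method gives the kind of group-to-group comparison used in Section 5.5.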
The distributions of distances within each group are plotted because they are related to the route distance within each group. Christofides (1969) indicates that the associated distance for an optimized truck route (Do) is a function of the radial distances (Di) and the average value of the maximum number of customers that can be serviced by one route (C). These statistics are then compared and used as the evaluation key for the methods.

The second part of the program was modified from ROUTPLOT, a computer program developed for plotting routes using maximum distance saving as criterion (Chance et al., 1975). This routine gives the travelling distances, the times required and the order of call points for the maximum saving route within the defined groups (Appendix H). Since timing and travelling distances are critical in the scheduling problems for the Post Office, these outputs are useful in judging the usefulness of clustering methods.

5.5 Evaluation

The discussion of the results in the previous sections has indicated that subjective evaluation of the results needs to be supported by some objective measures. The last section introduced two appropriate tools for evaluating the group sizes quantitatively. Since the four sets of data each have spatial characteristics of their own, it is best to evaluate the methods for each data set separately. In the following subsections, the methods will be evaluated using the outputs from the two analysing tools.

5.5.1 Evenly Distributed Contrived Data (DATA1)

As discussed in Section 5.3.1, the complete linkage, Ward's and the three nonhierarchical methods each group this data set into two groups of comparable sizes (Table 4).
An examination of the plots of the groups defined by these methods (Figures 9, 13-15), however, does not indicate any distinct superiority of one method over the others. Both the radial linkage of hierarchical methods and the lateral grouping of nonhierarchical methods have their own grouping boundaries. These groupings of comparable sizes make no difference from a theoretical standpoint, but in an actual case, if there are restrictions imposed on the travelling directions, then the resulting group boundaries would be critical to the scheduling problem.

The outputs from the program ROUTE for this set of data as grouped by various methods are summarized in Tables 11 and 12. These outputs are affected by the group sizes and the distribution of the data units within each group. Contrary to the group size comparisons, the complete linkage groupings do not have comparable means of distances or travel times and distances. This is the result of grouping scattered data units into clusters. The Ward's method groupings, though they give fairly similar means, do not have similar travelling distances or timings because of the uneven scattering of data units within the defined groups. The three nonhierarchical methods give identical results because of the similarity of their groupings. The travelling distances and times for groupings of these methods, in this case, vary slightly and are the best matches among the 12 sets of results.

The distribution patterns of inter-data unit distances within groups are critical in the determination of travel distances and times.
The patterns of the groups of the above five clustering methods, as shown in Figures 51-53, definitely reflect the similarity between the two groups defined by these methods. The patterns of the two groups defined by the complete linkage method are distinctly dissimilar. The Ward's patterns have better resemblance, whereas those of the nonhierarchical methods are almost identical. The degree of resemblance does not only affect the mean and standard deviations of the distances, but also influences the travel distances and times.

                                          Mean (feet)        Std. Deviation
Methods                                 Group 1  Group 2    Group 1  Group 2

Hierarchical
  Single Linkage
    City-Block                           20332     6791      10629     3413
    Euclidean Distance                   20332     6791      10629     3413
    Chi-Squares                          13281    19494       6499    10577
  Complete Linkage                       13926    16008       6757     7863
  Avg. Linkage between Merged Group      16747    12115       8178     5977
  Avg. Linkage within New Group          16747    12115       8178     5977
  Centroid Method                        16233    12800       7973     6008
  Median Method                          16233    12800       7973     6008
  Ward's Method                          15574    15004       7962     7671
Nonhierarchical
  Jancey's Method                        18396    17811      10240    10353
  Forgy's Method                         18396    17811      10240    10353
  Convergent K-mean Method               18396    17811      10240    10353

Table 11. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for DATA1

                                      Travel Time (min)  Travel Dist. (mi)  Total Time (min)
Methods                               Group 1  Group 2   Group 1  Group 2   Group 1  Group 2

Hierarchical
  Single Linkage
    City-Block                         202.82    27.92     50.70     6.98    263.12    39.62
    Euclidean Distance                 202.82    27.92     50.70     6.98    263.12    39.62
    Chi-Squares                         77.07   168.93     19.27    42.23     96.87   221.13
  Complete Linkage                     107.85   136.96     26.96    34.24    139.35   177.46
  Avg. Linkage between Merged Group    153.19    78.02     40.44    19.51    199.09   102.32
  Avg. Linkage within New Group        153.19    78.02     40.44    19.51    199.09   102.32
  Centroid Method                      145.31   127.16     36.33    24.82    189.41   127.16
  Median Method                        145.31   127.16     36.33    24.82    189.41   127.16
  Ward's Method                        137.20   113.58     34.20    28.39    175.90   146.88
Nonhierarchical
  Jancey's Method                      120.50   118.83     30.12    29.71    152.90   158.43
  Forgy's Method                       120.50   118.83     30.12    29.71    152.90   158.43
  Convergent K-mean Method             120.50   118.83     30.12    29.71    152.90   158.43

Travel time and travel distance are measured from first to last box; total time includes stopping time.

Table 12. Travel Times and Distances of Groups Defined by 12 Cluster Methods for DATA1

[Histogram of within-group distances: intervals (feet) versus frequency]
[Figure 52. Distribution of Distances Within Groups Defined by Ward's Method for DATA1 (Group 1 and Group 2; intervals in feet versus frequency)]
[Histogram of within-group distances: intervals (feet) versus frequency]

Based on the above evaluations, the nonhierarchical methods' groupings definitely show better correlations in means, travel distances and times, and distribution patterns, though there is little similarity in the group sizes. If there were travel restrictions on lateral movements, then Ward's method would probably be the best choice among these methods. On the whole, this quantitative evaluation of the grouping results indicates that the five methods mentioned in this sub-section are favoured for this set of data, based on the assumption that there are no restrictions on travel directions.

5.5.2 Unevenly Distributed Contrived Data (DATA2)

The plots of the results of the 12 clustering methods indicate that the complete linkage and the two average linkage methods give results resembling the intended group sizes (19, 34 and 27). A study of the summarized results (Tables 13 and 14) from the program ROUTE for these sets of different group sizes and memberships also indicates that these three methods have very similar means of distances and travel times.
This is probably the result of the varying density of data units within each group. Group 1, the smaller and more scattered cluster, has fewer stops but longer distances between stops in travelling from one location to another. Groups 2 and 3 have denser populations within the groups, thus requiring more stops in traversing the shorter inter-point distances. These characteristics are reflected in the distance distribution patterns (Figures 54-55) of the above three linkage methods.

                                              Mean (feet)             Standard Deviation
Method                                  Group 1  Group 2  Group 3  Group 1  Group 2  Group 3
Hierarchical
  Single Linkage
    City-Block                                      N.A.                       N.A.
    Euclidean Distance                     5252     6471    14692     2414     3375     9592
    Chi-Squares                            5252     6471    14692     2414     3375     9592
  Complete Linkage                         6085     8237     8434     3131     4281     4574
  Avg. Linkage between Merged Group        6642     8124     8434     3672     4285     4575
  Avg. Linkage within New Group            6642     8124     8434     3672     4285     4575
  Centroid Method                          6642     9358     6754     3672     5040     3273
  Median Method                            6642     9358     6754     3672     5040     3273
  Ward's Method                           11668     7569     7033     7257     4573     3468
Nonhierarchical
  Jancey's Method                         10424    14040    13645     8433     9152     8625
  Forgy's Method                           6955    13900    14644     4730     8609     9671
  Convergent K-mean Method                 6955    13900    14644     4730     8609     9671

Table 13. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for DATA2

                                        Travel Time(1) (min)    Travel Dist.(1) (mi)    Total Time(2) (min)
Method (Group)                                1       2       3       1       2       3       1       2       3
Hierarchical
  Single Linkage
    City-Block                                     N.A.                    N.A.                    N.A.
    Euclidean Distance                    25.29   34.85   74.45    6.32    8.71   18.61   40.59   57.34  108.65
    Chi-Squares                           25.29   34.85   74.45    6.32    8.71   18.61   40.59   57.34  108.65
  Complete Linkage                        30.06   50.58   50.82    7.52   12.65   12.71   47.16   81.18   75.12
  Avg. Linkage between Merged Group       32.45   48.20   50.82    8.11   12.05   12.71   50.45   77.90   75.12
  Avg. Linkage within New Group           32.45   48.20   50.82    8.11   12.05   12.71   50.45   77.90   75.12
  Centroid Method                         32.45   65.38   40.09    8.11   16.34   10.02   50.45   98.68   60.79
  Median Method                           32.45   65.38   40.09    8.11   16.34   10.02   50.45   98.68   60.79
  Ward's Method                           69.67   26.72   42.47   17.42    6.68   10.62  107.47   39.32   64.07
Nonhierarchical
  Jancey's Method                         45.27   60.61   69.67   11.39   15.15   17.42   66.27   81.31  100.27
  Forgy's Method                          31.97   66.81   67.05    7.99   16.70   16.76   46.37   92.91   98.55
  Convergent K-mean Method                31.97   66.81   67.05    7.99   16.70   16.76   46.37   92.91   98.55

1. from 1st to last box; 2. including stop time.

Table 14. Travel Distances and Times of Groups Defined by 12 Cluster Methods for DATA2

The intended group sizes for the three clusters are not comparable in number of members, and this makes the evaluation of these results difficult. Undoubtedly, the evaluation of these outcomes must be subjective. If similar travel distances among the three groups are required, then the Jancey's nonhierarchical method gives groups of this nature. On the other hand, if both the travel times and the means of distances are to be comparable, then the complete linkage and the two average linkage methods would be more appropriate for grouping this set of data. One definite conclusion that can be drawn in the evaluation of these clustering techniques, however, is that single linkage using the "City-Block" measure is not an appropriate method for this set of data.

Among the other seven methods, the two centroid methods also have group sizes similar to the intended ones. However, the results from ROUTE for the groupings of these methods do not indicate any distinct characteristics that are superior to those of the three linkage methods or the Jancey's method. The other five methods do not group data units into clusters with sizes resembling the intended groupings, nor do they have comparable means of distances and travel distances within the defined groups.

Figure 55. Distribution of Distances Within Groups Defined by Avg. Linkage Methods for DATA2 (histograms of frequency against distance interval, in feet, for Groups 1, 2 and 3)

On the whole, there are methods that are more suitable for grouping this set of data.
The superiority of any method, in this case, cannot be judged solely by comparing the means of distances, travel distances and times; it requires subjective evaluation. This subjective decision process involves the consideration of appropriate group sizes, the weighing of the quantitative measures and, most important of all, the experience of the user.

5.5.3 North Burnaby Empirical Data (NBDATA)

The complete linkage, the average linkage within the new group, the Ward's and the Jancey's methods all have similar group sizes though different memberships. The uneven distribution of data points in this set, especially points 1, 2 and 3, not only discourages the use of group sizes as an evaluation criterion, it also indicates the importance of the distribution pattern in the judgement of appropriate groupings. The application of evaluation tools to the 12 sets of groupings defined by the different clustering methods on this data set indicates that there is conflicting evidence on the superiority of one method over the others (Tables 15 and 16).

                                         Mean (feet)      Std. Deviation
Method                                  Group 1  Group 2  Group 1  Group 2
Hierarchical
  Single Linkage
    City-Block                              944    11637      476     7241
    Euclidean Distance                      944    11637      476     7241
    Chi-Squares                            9562     7705     5875     4112
  Complete Linkage                         9824     6501     5609     3215
  Avg. Linkage between Merged Group         944    11637      476     7241
  Avg. Linkage within New Group            8614    10087     8105     6125
  Centroid Method                          9489     9531     7226     5293
  Median Method                            9489     9531     7226     5293
  Ward's Method                           10269     7622     6187     4201
Nonhierarchical
  Jancey's Method                          9749     6688     5553     3332
  Forgy's Method                           9576     7091     5487     3588
  Convergent K-mean Method                 9576     7091     5487     3588

Table 15. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for NBDATA

                                        Travel Time(1)    Travel Dist.(1)   Total Time(2)
                                            (min)              (mi)             (min)
Method                                  Group 1  Group 2  Group 1  Group 2  Group 1  Group 2
Hierarchical
  Single Linkage
    City-Block                             1.00   159.91     0.25    39.98     3.70   235.51
    Euclidean Distance                     1.00   159.91     0.25    39.98     3.70   235.51
    Chi-Squares                           50.06    63.25    12.51    15.81    76.16   115.45
  Complete Linkage                        66.81    49.73    16.70    12.43   100.11    94.73
  Avg. Linkage between Merged Group        1.00   159.91     0.25    39.98     3.70   235.51
  Avg. Linkage within New Group           70.68    66.19    17.67    16.55   105.78   109.39
  Centroid Method                         21.21    87.83     5.30    21.96    32.01   155.33
  Median Method                           21.21    87.83     5.30    21.96    32.01   155.33
  Ward's Method                           68.19    57.55    17.05    14.39    99.69   104.35
Nonhierarchical
  Jancey's Method                         65.00    54.07    16.25    13.52    96.50   100.87
  Forgy's Method                          59.46    60.15    14.87    15.04    87.36   110.55
  Convergent K-mean Method                59.46    60.15    14.87    15.04    87.36   110.55

1. from 1st to last box; 2. including stopping time.

Table 16. Travel Distances and Times of Groups Defined by 12 Cluster Methods for NBDATA

The centroid methods' groupings have the most similar means of distances, whereas the Forgy's and the Convergent K-mean nonhierarchical methods' clusters have almost identical travel times and distances. This contrast in the quantitative measures is the result of the differences in the group sizes as well as in the distribution pattern of the data points within the groups. In this case, the most similar group sizes (39, 48), as defined by the average linkage within the new group method, do not have the most closely matched travel times or distances. The inclusion of a potential cluster of 3 points (1, 2 and 3) in the groupings distorts the distribution pattern of the distances (Figures 56-58). The resultant travel time required for traversing a smaller, but fairly scattered, group is relatively similar to that required for travelling through the points of a denser population.
The two nonhierarchical methods (Forgy's and Convergent K-mean) have groupings with comparable distance distributions, thus the resulting travel times and distances are similar. These two clustering methods are probably the most suitable methods among the 12 for this data set.

Figure 56. Distribution of Distances Within Groups Defined by Forgy's and Convergent K-mean Methods for NBDATA

Figure 57. Distribution of Distances Within Groups Defined by Average Linkage Methods for NBDATA

Figure 58. Distribution of Distances Within Groups Defined by Ward's Method for NBDATA

Similar to the conclusion drawn for DATA1, the single linkage methods do not group data points into groups of comparable sizes, nor of similar travel distances. The clusters of the average linkage between merged groups method also have neither similar sizes nor similar travel times. These methods are inappropriate because they tend to isolate the potential 3-point cluster from the rest of the data units. This isolation yields drastically different group sizes and, of course, totally incompatible travel times. The two centroid methods have similar clustering properties, though they are less severe in the degree of isolation of the three data units. The results from the other four methods are generally acceptable but not recommended for clustering this set of data.
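The relation between the travel distance, travel time and total time columns of Table 16 can be sketched as simple arithmetic. The two constants below are not stated in this section; they are assumptions inferred from the tabulated values, which appear consistent with roughly 4 minutes per mile of travel and 0.9 minutes per stop. The stop count of 84 for the large single linkage group likewise assumes the 87 boxes implied by the (39, 48) grouping.

```python
# Hypothetical constants inferred from Table 16, not given in the text.
MIN_PER_MILE = 4.0   # assumed travel rate (about 15 miles per hour)
MIN_PER_STOP = 0.9   # assumed service time at each box


def travel_time(distance_miles):
    """Minutes spent driving from the first to the last box."""
    return distance_miles * MIN_PER_MILE


def total_time(distance_miles, n_stops):
    """Driving time plus the assumed stopping time at every box."""
    return travel_time(distance_miles) + n_stops * MIN_PER_STOP


# Single linkage (City-Block) on NBDATA isolates the 3-point cluster.
isolated = total_time(0.25, 3)      # Table 16 lists 3.70 min
remainder = total_time(39.98, 84)   # Table 16 lists 235.51 min
print(round(isolated, 2), round(remainder, 2))
```

Under these assumptions the isolated 3-point group takes under four minutes while the remaining group takes almost four hours, which is the imbalance that rules out the single linkage groupings for this data set.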
5.5.4 South Burnaby Empirical Data (SBDATA)

The oddly clustered groupings produced by single linkage (group sizes 1, 1 and 111) and by the centroid methods (2, 18 and 93) were not examined in this evaluation process. These methods have failed to cluster the data set into groups of similar sizes and are ruled out as aids for clustering this set of data for the scheduling problem. The summary table of the group sizes (Table 9) indicates that the nonhierarchical methods and the single linkage chi-squares technique cluster the data set into three comparable groups.

An examination of the results from ROUTE on the twelve sets of groupings defined by the various methods indicates that only three methods' groupings actually have comparable means of distances, travel times and distances (Tables 17 and 18). The Jancey's method groups have distinctly similar travel times as well as total times for the traverse within the groups: a result of grouping the data points into regions of almost identical areas (Figure 49). The similarity of the distribution patterns of the inter-unit distances for the groups defined by this method (Figure 59) also reflects the superiority of this method over the others.

Among the hierarchical methods, only the single linkage chi-squares method and the Ward's method have comparable means of distances as well as travel times within the defined groups. The group sizes for the Ward's method are 45, 28 and 40 respectively (Figure 47). Though the scattered distributions of all the data units within these groups resemble each other, the distances between points for group 2 (28 points) are slightly longer than those of the other two groups. Thus the distribution of distances (Figure 60) has a wider spread than the others. The number of stops is, in this case, critical for the total travel and stop time. The single linkage chi-squares method has its distinct linkage pattern (Figure 41).
Although the group sizes are 34, 36 and 43 respectively, the defined boundaries are very different from those of any other method examined in this study. The distribution pattern of the distances is fairly similar to the Ward's. These distributions are slightly skewed to the left as shown in Figure 61. The skewness parallels the difference in the means and standard deviations of the distances. The resulting route travel times and distances also differ slightly (Table 18), because of the different degrees of scattering of the data units within the different sized groups. The differences in population density of the groups defined by the above three methods play an integral part in the calculation of distances and the choice of route. The nonhierarchical (Jancey's) method is apparently the best clustering technique among the twelve methods for grouping this set of data.

                                              Mean (feet)             Standard Deviation
Method                                  Group 1  Group 2  Group 3  Group 1  Group 2  Group 3
Hierarchical
  Single Linkage
    City-Block                                      N.A.                       N.A.
    Euclidean Distance                     5009     6937     7766     2280     3541     4016
    Chi-Squares                            7919     7712     7784     5011     4896     3697
  Complete Linkage                         4738     6566     8309     2107     3231     4029
  Avg. Linkage between Merged Group        5009     6037     7766     2280     3541     4016
  Avg. Linkage within New Group            5146     6230     8283     2350     3232     4132
  Centroid Method                          N.A.     5603     9949     N.A.     2993     4975
  Median Method                            N.A.     5603     9949     N.A.     2993     4975
  Ward's Method                            7039     7346     6333     3818     3847     3348
Nonhierarchical
  Jancey's Method                          7099     6345     6465     3328     3180     3336
  Forgy's Method                           7169     6470     6269     3363     3205     3210
  Convergent K-mean Method                 7169     6470     6269     3363     3205     3210

Table 17. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for SBDATA

                                        Travel Time(1) (min)    Travel Dist.(1) (mile)  Total Time(2) (min)
Method (Group)                                1       2       3       1       2       3       1       2       3
Hierarchical
  Single Linkage
    City-Block                                     N.A.                    N.A.                    N.A.
    Euclidean Distance                    27.75   39.23   77.76    6.94    9.81   19.44   46.65   62.63  137.16
    Chi-Squares                           38.65   50.43   61.97    9.66   12.67   15.49   69.25   82.83  100.67
  Complete Linkage                        24.19   50.14   71.48    6.05   12.54   17.87   42.19   90.64  114.68
  Avg. Linkage between Merged Group       27.75   39.23   77.76    6.94    9.81   19.44   46.65   62.63  137.16
  Avg. Linkage within New Group           28.91   40.84   75.88    7.23   10.21   18.97   48.71   73.24  125.38
  Centroid Method                          N.A.   27.97  190.16    N.A.    6.99   47.75    N.A.   44.17  273.86
  Median Method                            N.A.   27.97  190.16    N.A.    6.99   47.75    N.A.   44.17  273.86
  Ward's Method                           57.97   44.24   48.74   14.49   11.06   12.18   98.47   69.44   84.74
Nonhierarchical
  Jancey's Method                         49.22   54.19   48.96   12.30   13.55   12.17   78.92   95.39   79.29
  Forgy's Method                          50.78   56.12   35.90   12.70   14.02    8.97   98.47   69.44   62.90
  Convergent K-mean Method                50.78   56.12   35.90   12.70   14.02    8.97   98.47   69.44   62.90

1. from 1st to last box; 2. including stopping time.

Table 18. Travel Distances and Times of Groups Defined by 12 Cluster Methods for SBDATA

Figure 59. Distribution of Distances Within Groups Defined by Jancey's Method for SBDATA

Figure 60. Distribution of Distances Within Groups Defined by Chi-Squares Method for SBDATA

5.6 Summary

As mentioned in the above sections, evaluation of the applicability of the twelve clustering methods is difficult and has to be very subjective. In section 5.5, the quantitative evaluation approaches are designed to aid the evaluation process. It is found that not all the methods are suitable for all kinds of data sets with various spatial characteristics. This conclusion parallels that drawn in section 5.3.
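The quantitative measures used throughout section 5.5 (the mean and standard deviation of the distances within a group, and the histogram of those distances) can be sketched as follows. This is only an illustration: it assumes the distances are all pairwise straight-line distances between the boxes of one group and that the sample standard deviation is used, neither of which is confirmed by this section. The coordinates are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """All straight-line distances between pairs of points in one group."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def summarize(distances):
    """Mean and sample standard deviation of a list of distances."""
    n = len(distances)
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / (n - 1)
    return mean, math.sqrt(variance)


def histogram(distances, n_bins=10):
    """Equal-width bins as (lower bound, frequency) pairs."""
    low, high = min(distances), max(distances)
    width = (high - low) / n_bins or 1.0  # guard against identical distances
    counts = [0] * n_bins
    for d in distances:
        index = min(int((d - low) / width), n_bins - 1)
        counts[index] += 1
    return [(low + i * width, count) for i, count in enumerate(counts)]


# Hypothetical box coordinates in feet for a single group.
group = [(0, 0), (3000, 4000), (6000, 8000), (1000, 7000)]
distances = pairwise_distances(group)
mean, std = summarize(distances)
print(f"mean = {mean:.0f} ft, std = {std:.0f} ft")
for lower, count in histogram(distances, n_bins=4):
    print(f"{lower:10.2f} {count:3d} {'*' * count}")
```

Groups whose histograms have a similar shape would be expected to need similar travel distances, which is how the figures are read in the evaluations above.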
Figure 61. Distribution of Distances Within Groups Defined by Ward's Method for SBDATA

The choice of clustering method for grouping data units depends highly on the characteristics of the data set. A table (Table 19) summarizing the preference of method, based on the group sizes, the data unit distributions within the groups, and the travel times and distances required for traversing the data points within the groups, indicates that various methods are favoured for clustering different data sets. Methods that give appropriate groupings for the evenly distributed data sets (DATA1, NBDATA and SBDATA) do not necessarily give satisfactory clusters for other data sets. Among the methods that are considered applicable to the scheduling problem, some are ranked higher than the others for one set of data, and vice versa.

The spatial characteristics of the data units within a group are, perhaps, the most critical element in the decision process. DATA1, a data set without any potential cluster, is best grouped by the three nonhierarchical methods. The results of these methods all have a similar characteristic: the data units are all clustered into lateral groups. This characteristic could be one of the deciding factors in the evaluation process. The inclusion of a potential 3-point cluster in the north-east corner of the area in NBDATA reduces the effectiveness of some methods.
Only two hierarchical methods are favoured for their resulting travel distances and times. The Ward's method is considered to be the alternative technique for grouping this data set. SBDATA, a data set similar to DATA1, is best grouped by Jancey's nonhierarchical method.

Ranking Criterion: Group Sizes; Travel Time & Dist.
Methods: 1 2 3 1 2 3
Hierarchical
  Single Linkage
    City-Block
    Euclidean Distance
    Chi-Squares: 1 3
  Complete Linkage: 3 1 2 3 1
  Avg. Linkage between Merged Group: 2 2
  Avg. Linkage within New Group: 2 1 2 2
  Centroid Method
  Median Method
  Ward's Method: 2 3 2 3 2
Nonhierarchical
  Jancey's Method: 1 3 1 1
  Forgy's Method: 1 2 1 1
  Convergent K-mean: 1 1 3 2 1 1

Legend: 1 = best method; 2 = second best method; 3 = third best method

Table 19. Summary of Method Preferences for the Four Sets of Data

On the whole, the three nonhierarchical methods are generally ranked as applicable methods for grouping evenly distributed data sets. The highest ranked hierarchical method is the Ward's technique, though the results from this method are less appealing than those of the nonhierarchical ones.

The unevenly distributed data set with intended groupings is found to be best grouped by the three linkage methods: the complete linkage and the two average linkage methods. Unfortunately, there were no available empirical data resembling this DATA2 set, and the conclusions concerning the clustering of this type of data distribution can only be drawn from the trials on DATA2. It is difficult to state that these three methods are the best ones for grouping this type of distribution, and supporting evidence is required to substantiate the validity of this subjective evaluation.
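One way to make the travel time criterion of Table 19 concrete is to order the methods by how evenly they split the travel time between groups. The sketch below uses the NBDATA travel times from Table 16 for five of the methods. The measure `relative_spread` is a hypothetical choice made for illustration only; it is not the ranking procedure used in this study, which also weighs group sizes and distribution patterns subjectively.

```python
# Travel times in minutes for Groups 1 and 2, copied from Table 16 (NBDATA).
travel_times = {
    "Complete Linkage": (66.81, 49.73),
    "Avg. Linkage within New Group": (70.68, 66.19),
    "Ward's Method": (68.19, 57.55),
    "Jancey's Method": (65.00, 54.07),
    "Forgy's Method": (59.46, 60.15),
}


def relative_spread(times):
    """Range of the group travel times divided by their mean (assumed measure)."""
    return (max(times) - min(times)) / (sum(times) / len(times))


# The smallest spread means the most evenly balanced routes.
ranked = sorted(travel_times, key=lambda name: relative_spread(travel_times[name]))
for position, name in enumerate(ranked, start=1):
    print(position, name, round(relative_spread(travel_times[name]), 3))
```

On these figures the Forgy's grouping is the most balanced, which agrees with the observation in section 5.5.3 that its travel times are almost identical.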
In evaluating the clustering methods for grouping mail boxes for the scheduling problem, the ranking system based on the similarity of travel distances and times is probably the best approach. In most cases, the schedules for the truck services are set up according to the time allocated to each driver and not to the number of boxes each driver has to service. The Postal Union ruling on the maximum number of stops for each route, however, could hinder the use of the above scheduling approach. For this study, it is assumed that the schedules are prepared for each driver based on the time slot assigned to each type of service. If the assumption is correct, then the three nonhierarchical methods and the Ward's method are probably the better clustering techniques for grouping data sets for the Burnaby area.

CHAPTER VI

CONCLUSIONS

This study, in investigating the applicability and characteristics of the twelve clustering techniques, indicates the following:

1. The spatial characteristics of the data set, or the locations of the mail boxes, have a significant influence on the outcomes of the grouping methods. A set of fairly evenly distributed data units could be best grouped by the nonhierarchical methods and not the others. Conversely, these methods are not suitable for grouping unevenly distributed data units. There is no straight rule, however, to discriminate the use of any clustering method, and test runs are required to prove the suitability of these clustering methods for the particular data set.

2. The three single linkage methods using different distance measures give various results for the same set of data.
The method using the simple "City-Block" association measure tends to group closely located data units into a cluster and isolate the more distant data points. This grouping characteristic generates dissimilar group sizes as well as dissimilar travel distances and times for traverses within the defined groups. The square root of the sum of squares of differences (Euclidean) method does not always isolate the distant data units from the others, and this approach generally gives fairly acceptable groupings of the natural clusters. The chi-squares method clusters data units with distinctive links, and the group sizes outlined by this method are usually acceptable, but not as good as those of other methods. On the whole, single linkage methods are not suitable techniques for grouping these four sets of data.

3. The two average linkage methods and the complete linkage technique are considered to be suitable only for the unevenly distributed data set. The criteria used in these methods differ from those of the single linkage methods, and these grouping criteria generally link data units into groups of fairly comparable sizes. These algorithms tend to link the potential clusters together prior to the linking of more scattered points. The effectiveness of the ability to group potential clusters is highly reduced if there is an absence of potential groupings. The results from these methods for evenly distributed data sets are therefore not as satisfactory as those of the nonhierarchical and the Ward's methods. On the other hand, intended groupings are readily clustered by these methods, as shown in the trials for DATA2.

4. The results generated by the centroid and median methods are always identical for these four data sets.
The differences in weighing the inter-unit or inter-cluster distances in these two methods have little influence on the linking sequence or the linkage pattern. It is apparent that only one of these methods should be used in further studies of the applicability of cluster analysis. The results produced by these methods do not conform to any of the better ranked methods, and they are not suitable for grouping the mail boxes for the scheduling problem.

5. The Ward's method is by far the best ranked among the seven hierarchical methods for grouping evenly distributed data sets. The error sum of squares criterion adopted in this method links the variables with large variances first, thus giving a good representation of both the closely located and the distant data units in the groupings. The results of the Ward's method for the unevenly distributed data set, however, are unacceptable.

6. The results from the nonhierarchical techniques are distinctly satisfactory for grouping evenly distributed data sets. The location of the seed points or initial partitions for these methods, contrary to the findings of other authors, is found to be of minor importance in the grouping of the evenly distributed data units. The continuous allocation and reallocation of the data units to the nearest centroid apparently reduces the importance of the seed points and partitions. In this study, randomly selected seed points and initial partitions are seemingly valid. This not only reduces the complexity of using nonhierarchical methods, it also reduces the time used in data preparation.

7. The distribution of inter-unit distances within groups identifies the scattering of the data units.
The similarity of distribution patterns would indicate the compatibility of travel distances and times required for the groups defined. This conclusion is similar to the relationship between travel distances and the area in which the call points are located described by Christofides (1969). The comparison of the distribution patterns is definitely a useful aid in the selection of clustering methods.

8. The best interpretation tool for both the hierarchical and nonhierarchical clustering results is the representation of the links between entities or the group boundaries on a 2-dimensional graph. Tree diagrams are also useful, but it requires more time to trace a tree than to inspect the linkage or boundary plots. The plot of data units on a graph also helps in understanding the degree of scattering of the call points. On the whole, visual aids are useful in the preparation of schedules.

9. The travel distances and times are critical measures in evaluating the groupings defined by various methods. These elements are actually the most vital information in the preparation of schedules as well as of the specific routes for the trucks. The optimal route as outlined by the maximum distance saving routing method for each group can also be used as a decision criterion in the selection of clustering method.

10. The evaluation of clustering methods, whether qualitative or quantitative in nature, has to be very subjective and heuristic.

The seemingly apparent choice of methods for clustering mail boxes should be the Ward's and the three nonhierarchical methods for data points distributed similarly to North and South Burnaby's.
This, however, does not imply that a l l the groupings of the boxes into c l u s t e r s should be performed by these approaches. I t i s expected that some areas, such as the North Shore, would warrant the use of other c l u s t e r i n g methods e.g. the complete linkage and the two average linkage methods, i n order to give s a t i s f a c t o r y groupings of the mail boxes. The tests on unevenly d i s t r i b u t e d data set DATA2 have shown that nonhierarchical methods are not suitable for t h i s type of data point d i s t r i b u t i o n . The tools developed for t h i s study i n c l u s t e r i n g data sets, c a l c u l a t i n g s t a t i s t i c s , and estimating the route distance and timing could be coordinated into a single program for scheduling purposes. These tools are e f f i c i e n t means to help analyse the route structure as w e l l as the s p a t i a l relationships of c a l l points. The histograms showing 165 d i s t r i b u t i o n of distances between points i s also a useful t o o l for analysing the data points and evaluating the groupings. This study also points out that although computeri-zed c l u s t e r i n g methods can help the schedulers i n determining the assignment of c a l l points, i t does not over-rule the superiority of the groupings outlined manually by inspection as car r i e d out by the experienced planners. An i n t e r e s t i n g study related to t h i s c l u s t e r i n g method i n v e s t i g a t i o n would be the study of a p p l i c a b i l i t y of the f i v e more suitable methods i n grouping the Vancouver's mail or bundle boxes into c l u s t e r s . This t r i a l would further prove the f e a s i b i l i t y of using c l u s t e r analysis as an aid to the Post O f f i c e scheduling problem. 166 FOOTNOTE 1. Unpublished Special Vehicle U t i l i z a t i o n Study Report, Vancouver Post O f f i c e , 1975. 2. M.R. Anderberg. Cluster Analysis for Application (New York: Academic Press, 1973). 
pp. 25-29.
3. B. Everitt, Cluster Analysis (Toronto: Heinemann Educational Books, 1974), pp. 7-9.
4. M.R. Anderberg, pp. 132-133.
5. B.S. Duran and P.L. Odell, Cluster Analysis: A Survey (New York: Springer-Verlag, 1974), pp. 6-7.
6. M.R. Anderberg, pp. 136-137.
7. Ibid., p. 134.
8. Ibid., p. 140.
9. Ibid., p. 156.
10. Ibid., p. 163.

REFERENCES

Anderberg, M.R., Cluster Analysis for Applications, Academic Press, N.Y., 1973, 354 p.
Astrahan, M.M., "Speech Analysis by Clustering, or the Hyperphoneme Method", Stanford Artificial Intelligence Project, Stanford University, Stanford, Calif., 1970, 25 p.
Ball, G.H. and Hall, D.J., "A Clustering Technique for Summarizing Multivariate Data", Behavioral Science, vol. 12, no. 2, 1967, pp. 153-55.
Bijnen, E.J., Cluster Analysis: Survey and Evaluation of Techniques, Tilburg University Press, The Netherlands, 1973, 112 p.
Bonner, R.E., "On Some Clustering Techniques", IBM Journal of Research and Development, vol. 8, 1964, pp. 22-32.
Bridges, C.C., "Hierarchical Cluster Analysis", Psychological Reports, vol. 18, 1966, pp. 851-54.
Carmichael, J.W., George, J.A. and Julius, R.S., "Finding Natural Clusters", Syst. Zool., vol. 17, 1968, pp. 144-150.
Cattell, R.B., Factor Analysis, Harper, N.Y., 1952, p. 355.
Chance, R., Dyke, B. and Wong, S., Unpublished Report on Special Vehicle Utilization Study, Vancouver Post Office, Vancouver, 1975, 40 p.
Christofides, N. and Eilon, S., "Expected Distances in Distribution Problems", Op. Res. Q., vol. 20, no. 4, 1969, pp. 437-43.
Cochran, W.G. and Hopkins, C.E., "Some Classification Problems with Multivariate Qualitative Data", Biometrics, vol. 17, no. 1, 1961, pp. 10-32.
Cole, A.J. and Wishart, D., "An Improved Algorithm for the Jardine-Sibson Method of Generating Overlapping Clusters", The Computer Journal, vol. 13, 1970, pp. 156-163.
Cormack, R.M., "A Review of Classification", Jour. R. Statist. Soc., Series A, vol. 134, no. 3, 1971, pp. 321-367.
Duran, B.S. and Odell, P.L., Cluster Analysis: A Survey, Springer-Verlag, N.Y., 1974, 137 p.
Edwards, A.W.F. and Cavalli-Sforza, L.L., "A Method for Cluster Analysis", Biometrics, vol. 21, 1965, pp. 362-375.
Engelman, L. and Fu, S., BMDP2M program, UCLA BMD Documentation, UCLA, Calif., 1970, 7 p.
Everitt, B., Cluster Analysis, Heinemann Educational Books, London, 1974, 122 p.
Fleiss, J.L. and Zubin, J., "On the Methods and Theory of Clustering", Multivariate Behavioral Research, vol. 4, 1969, pp. 235-250.
Forgy, E., "Cluster Analysis of Multivariate Data: Efficiency vs Interpretability of Classifications", Biometrics, vol. 21, 1965, p. 758.
Gower, J.C., "Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis", Biometrika, vol. 53, 1966, pp. 325-338.
________, "A Comparison of Some Methods of Cluster Analysis", Biometrics, vol. 23, no. 4, 1967, pp. 623-37.
________, "Minimum Spanning Trees and Single Linkage Cluster Analysis", Appl. Statist., vol. 18, no. 1, pp. 54-64.
Green, P.E. and Rao, V.R., "A Note on Proximity Measures and Cluster Analysis", Jour. Mark. Res., vol. VI, 1969, pp. 359-64.
Harris, B., Farhi, A. and Dufour, J., Aspects of a Problem in Clustering, University of Pennsylvania, 1972, 28 p.
Hartigan, J.A., "Representation of Similarity Matrices by Trees", J. Amer. Statist. Assoc., vol. 62, 1967, pp. 1140-1158.
________, "Direct Clustering of a Data Matrix", J. Amer. Statist. Assoc., vol. 67, 1972, pp. 123-129.
________, Clustering Algorithms, John Wiley, N.Y., 1975, 351 p.
Jardine, N. and Sibson, R., "The Construction of Hierarchical and Non-hierarchical Classifications", Comp. J., vol. 11, 1968, pp. 177-184.
________, "Choice of Methods for Automatic Classifications", Comp. J., vol. 14, 1971, pp. 404-406.
Jarvis, R.A. and Patrick, E.A., "Clustering Using a Similarity Measure Based on Shared Near Neighbours", IEEE Trans. Comp., vol. 22, no. 11, 1973, pp. 1025-1034.
Jensen, R.E., "A Dynamic Programming Algorithm for Cluster Analysis", Op. Res., vol. 17, 1969, pp. 1034-57.
Johnson, R.M., "Q-Analysis of Large Samples", Jour. Mark. Res., vol. VII, 1970, pp. 104-5.
Johnson, S.C., "Hierarchical Clustering Schemes", Psychometrika, vol. 32, no. 3, 1967, pp. 241-254.
Jones, K.S. and Jackson, D., "Current Approaches to Classification and Clump-finding at the Cambridge Language Research Unit", Comp. J., vol. 10, 1967, pp. 29-37.
King, B.F., "Stepwise Clustering Procedures", J. Amer. Statist. Assoc., vol. 62, 1967, pp. 86-101.
Koontz, W.L. et al., "A Branch and Bound Clustering Algorithm", IEEE Trans. Comp., vol. 24, no. 9, 1975, pp. 908-914.
Kruskal, J.B., Jr., "On the Shortest Spanning Subtree of a Graph and the Travelling Salesman Problem", Proc. Amer. Math. Soc., no. 7, 1956, pp. 48-50.
Kruskal, J.B., "Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis", Psychometrika, vol. 29, 1964, pp. 1-28.
Lance, G.N. and Williams, W.T., "Computer Programs for Hierarchical Polythetic Classification ('Similarity Analysis')", Comp. J., vol. 9, no. 1, 1966, pp. 60-64.
________, "A General Theory of Classificatory Sorting Strategies I, Hierarchical Systems", Comp. J., vol. 9, 1967, pp. 373-380.
________, "A General Theory of Classificatory Sorting Strategies II, Clustering Systems", Comp. J., vol. 10, 1967, pp. 271-277.
Ling, R.F., "On the Theory and Construction of K-Cluster", Comp. J., vol. 15, 1972, pp. 326-332.
MacQueen, J.B., "Some Methods for Classification and Analysis of Multivariate Observations", Proc. Symp. Math. Statist. and Probability, 5th, Berkeley, vol. 1, 1967, pp. 281-297.
McQuitty, L.L., "Hierarchical Linkage Analysis for the Isolation of Types", Educational and Psychological Measurement, vol. 20, 1960, pp. 55-67.
________, "Hierarchical Syndrome Analysis", Educ. and Psycho. Measure., vol. 20, 1960, pp. 293-304.
________, "Hierarchical Classification by Multiple Linkage", Educ. and Psycho. Measure., vol. 30, 1970, pp. 3-19.
McRae, D.J., "MIKCA: A Fortran IV Iterative K-means Cluster Analysis Program", Behavioral Sci., vol. 16, no. 4, 1971, pp. 423-424.
Marriott, F.H.C., "Practical Problems in a Method of Cluster Analysis", Biometrics, vol. 27, no. 3, 1971, pp. 501-14.
Morrison, D.G., "Measurement Problems in Cluster Analysis", Management Sci., vol. 13, 1967, pp. B-775-780.
Parks, J.M., "Fortran IV Program for Q-Mode Cluster Analysis on Distance Function with Printed Dendrogram", Comp. Contrib. 46, State Geol. Survey, University of Kansas, Lawrence, Kansas, 1970, 36 p.
Patterson, J.M. and Whitaker, R.A., "CGROUP: Hierarchical Grouping Analysis with Optimal Contiguity Constraint Program", UBC Computer Centre Program, UBC, Vancouver, 1973, 20 p.
Rand, W.M., "Objective Criteria for the Evaluation of Clustering Methods", J. Amer. Statist. Assoc., vol. 66, 1971, pp. 846-850.
Rohlf, F.J., "Adaptive Hierarchical Clustering Schemes", Syst. Zool., vol. 19, no. 1, 1970, pp. 58-83.
Sawrey, W.L. et al., "An Objective Method of Grouping Profiles by Distance Functions and its Relation to Factor Analysis", Educ. and Psycho. Measure., vol. 20, 1960, pp. 651-673.
Shepard, R.N., "Analysis of Proximities: Multidimensional Scaling with an Unknown Distance Function I and II", Psychometrika, vol. 27, 1966, pp. 125-140, 219-246.
Shepherd, M.J. and Willmott, A.J., "Cluster Analysis on the Atlas Computer", Comp. J., vol. 11, 1968, pp. 57-62.
Sibson, R., "SLINK: An Optimally Efficient Algorithm for the Single Link Cluster Method", Comp. J., vol. 16, 1973, pp. 30-34.
Sokal, R.R.
and Michener, C.D., "A Statistical Method for Evaluating Systematic Relationships", Univ. Kansas Sci. Bull. 38, 1958, pp. 1409-1438.
Sokal, R.R. and Sneath, P.H.A., Principles of Numerical Taxonomy, Freeman, San Francisco, 1963, 377 p.
Tse, A., "Scheduling of Post Office Letter Box Collection Routes - A Case Study", Comm. 541 Project, UBC, Vancouver, 1975, 26 p.
Ward, J.H., Jr., "Hierarchical Grouping to Optimise an Objective Function", J. Amer. Statist. Assoc., vol. 58, 1963, pp. 236-244.
Ward, J.H., Jr. and Hook, M.E., "Application of an Hierarchical Grouping Procedure to a Problem of Grouping Profiles", Educ. and Psycho. Measurement, vol. 23, 1963, pp. 69-83.
Wishart, D., "An Algorithm for Hierarchical Classifications", Biometrics, vol. 22, no. 1, 1969, pp. 165-170.
________, "Fortran II Programs for 8 Methods of Cluster Analysis (CLUSTAN I)", Comp. Contrib. 38, State Geol. Survey, Univ. of Kansas, Lawrence, 1969, 47 p.
Wolfe, J.H., "Pattern Clustering by Multivariate Mixture Analysis", Multivariate Behavioral Res., vol. 5, no. 3, 1970, pp. 329-350.
Wright, W.E., "An Axiomatic Specification of Euclidean Analysis", Comp. J., vol. 17, 1974, pp. 355-364.
Zahn, C.T., "Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters", IEEE Trans. Comp., vol. 20, no. 1, 1971, pp. 68-86.

APPENDIX A

Listing of MATRIX: a Computer Program for Generating Symmetric Distance Matrix

C     THIS IS MATRIX: PROGRAM FOR CALCULATING DISTANCE MATRIX
C     FOR CLUSTERING PROGRAM INPUTS
C
      DIMENSION X(150),Y(150),DIST(150,150),ADIST(150,150)
      DIMENSION DMAT(7000)
      READ,N
      DO 10 I=1,N
      READ(5,100) X(I),Y(I)
      WRITE(6,101) I,X(I),Y(I)
   10 CONTINUE
      DO 20 I1=1,N
      DO 30 I2=1,N
   30 DIST(I1,I2)=((X(I2)-X(I1))**2)+((Y(I2)-Y(I1))**2)
   20 CONTINUE
      AMAX=0.
      AMIN=999999.9
      DO 40 I3=1,N
      DO 50 I4=1,N
      IF(DIST(I3,I4).GT.AMAX) AMAX=DIST(I3,I4)
   50 IF(DIST(I3,I4).LT.AMIN) AMIN=DIST(I3,I4)
   40 CONTINUE
      DO 60 I5=1,N
      DO 70 I6=1,N
   70 ADIST(I5,I6)=DIST(I5,I6)/AMAX
   60 CONTINUE
      DO 80 I7=1,N
   80 CONTINUE
      K=0
      DO 90 I8=2,N
      N2=I8-1
      DO 95 I9=1,N2
      K=K+1
   95 DMAT(K)=ADIST(I8,I9)
   90 CONTINUE
      COUNT=K/10.
      KOUNT=K/10
      K1=KOUNT
      IF(COUNT.GT.KOUNT) K1=KOUNT+1
      DO 96 I1=1,K1
      KBEG=(I1*10)-9
      KEND=I1*10
      IF(I1.EQ.K1) KEND=K
      WRITE(7,113) (DMAT(K3),K3=KBEG,KEND)
   96 WRITE(6,113) (DMAT(K2),K2=KBEG,KEND)
  100 FORMAT(5X,2F7.0)
  101 FORMAT(I5,2F10.2)
  113 FORMAT(10F7.4)
      STOP
      END

APPENDIX B

Listing and Sample Outputs from HIER: a Computer Program for Six Hierarchical Clustering Techniques

C     PROGRAM DRIVER --- MAIN PROGRAM
      DIMENSION X(10000)
      LIMIT=10000
      CALL CNTRL(X,LIMIT)
      WRITE(6,100)
  100 FORMAT('1',15X,'END OF OUTPUT')
      STOP
      END
C
      SUBROUTINE CNTRL(X,LIMIT)
C
C     THIS SUBROUTINE ALLOCATES STORAGE, READS INPUT AND CONTROLS
C     EXECUTION FOR A HIERARCHICAL CLUSTERING JOB BASED ON A PROVIDED
C     SIMILARITY MATRIX.
C
C     INPUT SPECIFICATIONS
C
C     CARD 1     TITLE CARD
C     CARD 2     INFORMATION FOR SUBROUTINES CLSTR AND TREE
C       COLS 1-3    NE=NUMBER OF ENTITIES (DATA UNITS OR VARIABLES)
C                   TO BE CLUSTERED
C       COLS 4-5    ISIGN=OPTION FOR SIMILARITY FUNCTION
C                   ISIGN=+1, DISTANCE MEASURE
C                   ISIGN=-1, CORRELATION MEASURE
C       COLS 6-7    NTSV=TAPE UNIT ON WHICH CLSTR RESULTS ARE SAVED
C                   NTSV=7, PUNCH RESULTS ON CARDS
C                   NTSV.LE.0, DO NOT SAVE RESULTS
C       COLS 8-9    NTIN=UNIT FROM WHICH SIMILARITY MATRIX IS READ
C                   NTIN=5, CARD READER
C                   NTIN.NE.5, DISK OR TAPE
C       COLS 10-12  INOPT=INPUT OPTION FOR SIMILARITY MATRIX
C                   INOPT.LE.0, EACH RECORD IS ONE ROW OF A LOWER
C                   TRIANGULAR MATRIX
C                   INOPT.GT.0, THE LOWER TRIANGULAR MATRIX IS
C                   CONSIDERED TO BE STORED BY ROWS IN ONE LONG LINEAR
C                   ARRAY AND IS READ IN BLOCKS *INOPT* LONG.
C       COLS 13-14  KOUT=OUTPUT OPTION
C                   KOUT=+2, STANDARD OUTPUT
C                   KOUT=-2, STANDARD OUTPUT PLUS PUNCHED SEQUENCE
C                   LIST FROM SUBROUTINE *TREE*.
C     ANY PREPOSITIONING OF THE I/O UNITS NTSV AND NTIN MUST BE
C     ACCOMPLISHED IN PROGRAM DRIVER OR THROUGH USE OF CONTROL CARDS.
C     CARD 3     INPUT FORMAT FOR SIMILARITY MATRIX (20A4 FORMAT)
C     CARD(S) 4  SIMILARITY MATRIX
C     CARD 5     END OF RECORD CARD (7/8/9)
C     ***INCLUDE CARDS 4 AND 5 ONLY IF THE SIMILARITY MATRIX IS ON
C     CARDS***
C     CARD(S) 6  LABEL CARDS FOR ENTITIES. THERE ARE TWO OPTIONS
C                1. INCLUDE 1 CARD WITH THE 4-CHARACTERS *NOLB* IN
C                   COLUMNS 1-4. UNDER THIS OPTION LABELS ARE NOT
C                   PRINTED ON THE TREE OUTPUT.
C                2. INCLUDE NE CARDS, COLUMNS 1 TO 20 CONTAINING A
C                   LABEL FOR ONE ENTITY.
C                   ORDER THE LABEL CARDS IN THE SAME SEQUENCE AS THE
C                   ENTITIES ARE REPRESENTED IN THE SIMILARITY MATRIX.
C
C     DECK SETUP SPECIFICATIONS
C
C     THE USER PROVIDES PROGRAM DRIVER WHICH PERFORMS THE FOLLOWING
C     TASKS.
C       1. ASSIGNS INPUT/OUTPUT UNITS
C       2. ESTABLISHES THE DIMENSION OF THE X ARRAY AND SETS THIS
C          DIMENSION EQUAL TO LIMIT.
C       3. CALLS SUBROUTINE CNTRL.
C     THE FOLLOWING EXAMPLE WILL SUFFICE IN MOST CASES.
C
C           PROGRAM DRIVER(INPUT,OUTPUT,PUNCH,TAPE5=INPUT,TAPE6=OUTPUT,
C          A TAPE7=PUNCH,TAPE1,TAPE2)
C           DIMENSION X(7000)
C           LIMIT=7000
C           CALL CNTRL(X,LIMIT)
C           END
C
C     A SECOND JOB DEPENDENT SEGMENT IS SUBROUTINE METHOD. THE USER
C     SELECTS AMONG THE SEVERAL ALTERNATIVE VERSIONS OF THIS SUBROUTINE
C     TO IMPLEMENT THE DESIRED CLUSTERING TECHNIQUE.
C
C     THE SUBPROGRAMS CNTRL, CLSTR, MTXIN, LFIND AND TREE GO IN EVERY
C     JOB.
C
C     THE X ARRAY IS PARTITIONED AS FOLLOWS
C
C     STORAGE FOR ARRAYS NEEDED AT ALL STAGES OF THE JOB
C       X(N1) TO X(N2-1)  NE WORDS--STORAGE OF THE II ARRAY
C       X(N2) TO X(N3-1)  NE WORDS--STORAGE OF THE JJ ARRAY
C       X(N3) TO X(N4-1)  NE WORDS--STORAGE OF THE SS ARRAY
C       X(N4) TO X(N5-1)  NE WORDS--STORAGE OF THE IL ARRAY
C       X(N5) TO X(N6-1)  NE WORDS--STORAGE OF THE JL ARRAY
C       X(N6) TO X(N7-1)  NE WORDS--STORAGE OF THE NEXT ARRAY
C
C     STORAGE FOR ARRAYS NEEDED IN SUBROUTINE CLSTR
C       M1=N7
C       X(M1) TO X(M2-1)  (NE*(NE-1))/2 WORDS--STORAGE OF THE S ARRAY
C       X(M2) TO X(M3-1)  NE WORDS--STORAGE OF THE LAST ARRAY
C       X(M3) TO X(M4-1)  NE WORDS--STORAGE OF THE NEAR ARRAY
C       X(M4) TO X(M5-1)  NE WORDS--STORAGE OF THE SREF ARRAY
C       X(M5) TO X(M6-1)  NE WORDS--STORAGE OF THE LIST ARRAY
C       X(M6) TO X(M7-1)  NE WORDS--STORAGE OF THE A ARRAY
C       X(M7) TO X(M8)    NE WORDS--STORAGE OF THE B ARRAY
C
C     STORAGE FOR ARRAYS NEEDED IN SUBROUTINE TREE (OVERLAY ARRAYS
C     NEEDED IN SUBROUTINE CLSTR)
C       L1=N7
C       X(L1) TO X(L2-1)  25*NE WORDS--STORAGE OF THE A ARRAY
C       X(L2) TO X(L3-1)  5*NE WORDS--STORAGE OF THE LABEL ARRAY
C       X(L3) TO X(L4-1)  NE WORDS--STORAGE OF THE LCLNO ARRAY
C       X(L4) TO X(L5-1)  NE WORDS--STORAGE OF THE LINE ARRAY
C       X(L5) TO X(L6-1)  NE WORDS--STORAGE OF THE IS ARRAY
C       X(L6) TO X(L7)    NE WORDS--STORAGE OF THE LAST ARRAY
C
      INTEGER FIRST
      DIMENSION X(1),FMT(20),TITLE(20),EPS(25)
      DATA RLB/'NOLB'/
      READ(5,1000) TITLE
      READ(5,1100) NE,ISIGN,NTSV,NTIN,INOPT,KOUT
      WRITE(6,2500) TITLE
      WRITE(6,2200) NE,ISIGN,NTSV,NTIN,INOPT,KOUT
C     PARTITION THE STORAGE ARRAY
      N1=1
      N2=N1+NE
      N3=N2+NE
      N4=N3+NE
      N5=N4+NE
      N6=N5+NE
      N7=N6+NE
      M2=N7+(NE*(NE-1))/2
      M3=M2+NE
      M4=M3+NE
      M5=M4+NE
      M6=M5+NE
      M7=M6+NE
      M8=M7+NE-1
      L2=N7+25*NE
      L3=L2+5*NE
      L4=L3+NE
      L5=L4+NE
      L6=L5+NE
      L7=L6+NE-1
C     CHECK FOR SUFFICIENT STORAGE
      MAX=M8
      IF(L7.GT.MAX) MAX=L7
      WRITE(6,2300) MAX,LIMIT
      IF(MAX.GT.LIMIT) STOP
C     READ THE SIMILARITY MATRIX
      READ(5,1000) FMT
      WRITE(6,2100) FMT
      CALL MTXIN(X(N7),INOPT,NE,NTIN,FMT)
C     READY TO CLUSTER
   60 CALL CLSTR(X(N1),X(N2),X(N3),X(N4),X(N5),X(N6),X(N7),X(M2),X(M3),
     A X(M4),X(M5),X(M6),X(M7),TITLE,NE,ISIGN,NTSV)
C     READ LABEL CARD(S)
      FIRST=L2
      LAST=L2+4
      READ(5,1000) (X(I),I=FIRST,LAST)
      IF(X(FIRST).EQ.RLB) GO TO 80
C     READ REMAINING LABELS
      DO 70 K=2,NE
      FIRST=LAST+1
      LAST=LAST+5
   70 READ(5,1000) (X(I),I=FIRST,LAST)
C     DRAW THE TREE CORRESPONDING TO THE CLUSTERING
   80 MERGES=NE-1
      CALL TREE(X(N1),X(N2),X(N3),X(N4),X(N5),X(N6),X(N7),X(L2),X(L3),
     A X(L4),X(L5),X(L6),EPS,TITLE,MERGES,1,6,1,KOUT,NE)
      RETURN
 1000 FORMAT(20A4)
 1100 FORMAT(I3,3I2,I3,I2,I3)
 2100 FORMAT(7H FORMAT,20A4)
 2200 FORMAT(5H NE =,I8,/,8H ISIGN =,I5,/,7H NTSV =,I6,/,7H NTIN =,I6,
     A /,8H INOPT =,I5,/,7H KOUT =,I6)
 2300 FORMAT(19H REQUIRED STORAGE =,I5,6H WORDS,/,
     A 19H ALLOTTED STORAGE =,I5,6H WORDS,/)
 2500 FORMAT('1',//,20A4,//)
      END
C
      SUBROUTINE CLSTR(II,JJ,SS,IL,JL,NEXT,S,LAST,NEAR,SREF,LIST,A,B,
     A TITLE,N,ISIGN,NT)
C     IN THIS VERSION THE LOWER TRIANGULAR PORTION OF THE SIMILARITY
C     MATRIX IS STORED BY ROWS IN THE ONE-DIMENSIONAL ARRAY S.
C
C     THE FOLLOWING VARIABLES ARE SPECIFIED IN THE CALLING PROGRAM AND
C     ARE PASSED THROUGH THE ARGUMENT LIST
C     N=NUMBER OF OBJECTS TO BE CLUSTERED
C     S(J)=J-TH ELEMENT IN LOWER TRIANGULAR SIMILARITY MATRIX
C     ISIGN=OPTION SPECIFYING TYPE OF SIMILARITY FUNCTION USED
C     ISIGN=+1=DISTANCE MEASURE (DECREASING FUNCTION OF SIMILARITY)
C     ISIGN=-1=CORRELATION MEASURE (INCREASING FUNCTION OF SIMILARITY)
C     NT=TAPE UNIT ON WHICH THE RESULTS ARE SAVED
C     NT.LE.0=DO NOT SAVE RESULTS ON TAPE
C     NT=7=SAVE RESULTS ON PUNCHED CARDS
C     TITLE=IDENTIFYING TITLE FOR THIS RUN
C
C     THE FOLLOWING VARIABLES REPRESENT THE OUTPUT OF THE PROGRAM AND
C     ARE PASSED BACK THROUGH THE ARGUMENT LIST. THESE RESULTS ARE
C     READY FOR SUBROUTINE TREE.
C     K=STAGE OF CLUSTERING
C     II(K)=LOWER NUMBERED CLUSTER MERGED AT STAGE K
C     JJ(K)=UPPER NUMBERED CLUSTER MERGED AT STAGE K
C     SS(K)=VALUE OF SIMILARITY FUNCTION ASSOCIATED WITH MERGE AT
C       STAGE K
C     IL(K)=PRECEDING STAGE AT WHICH II(K) WAS LAST IN A MERGE
C     JL(K)=PRECEDING STAGE AT WHICH JJ(K) WAS LAST IN A MERGE
C     NEXT(K)=NEXT STAGE AT WHICH II(K) IS IN A MERGE
C
C     IN ADDITION, THE FOLLOWING VARIABLES PLAY IMPORTANT ROLES IN THE
C     PROGRAM
C     NEAR(I)=ID NUMBER OF EXTREME ELEMENT IN ROW I OF THE LOWER
C       TRIANGULAR SIMILARITY MATRIX
C     SREF(I)=SIMILARITY MEASURE FOR THE PAIR (I,NEAR(I))
C     LIST(I)=I-TH CLUSTER ID NUMBER IN SEQUENTIAL LIST OF CURRENT
C       CLUSTERS
C     NCL=NUMBER OF CLUSTERS AT CURRENT STAGE
C     LAST(I)=STAGE NUMBER AT WHICH CLUSTER I WAS LAST IN A MERGE
C     A=WORKING AREA FOR SUBROUTINE METHOD
C     B=WORKING AREA FOR SUBROUTINE METHOD
C
C     THIS SUBROUTINE USES FUNCTION LFIND(I,J) TO FIND THE ADDRESS IN
C     S FOR THE SIMILARITY MEASURE BETWEEN CLUSTERS I AND J
      DIMENSION S(1),II(1),JJ(1),SS(1),IL(1),JL(1),NEXT(1),NEAR(1),
     A SREF(1),LIST(1),LAST(1),A(1),B(1)
      DIMENSION TITLE(20)
C     INITIALIZE VARIABLES AND SET CONSTANTS
      NCL=N
      K=1
      SIGN=ISIGN
      BIG=SIGN*1.E50
      CALL METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,1)
C     INITIALIZE ARRAYS
      DO 10 J=1,N
      LAST(J)=0
      NEXT(J)=0
      LIST(J)=J
      SREF(J)=BIG
   10 CONTINUE
C     FIND EXTREME ENTRY IN EACH ROW
      L=0
      DO 30 I=2,N
      I1=I-1
      DO 30 J=1,I1
      L=L+1
C     IN EFFECT S(L)=S(I,J)
      IF(((S(L)-SREF(I))*SIGN).GT.0.) GO TO 30
      NEAR(I)=J
      SREF(I)=S(L)
   30 CONTINUE
C     MAIN LOOP. FIND EXTREME VALUE IN SREF ARRAY
   40 SREFX=BIG
      DO 50 I=2,NCL
      LISTI=LIST(I)
      IF(((SREF(LISTI)-SREFX)*SIGN).GT.0.) GO TO 50
      IREF=I
      LREF=LISTI
      SREFX=SREF(LISTI)
   50 CONTINUE
C     LREF IS THE ROW NUMBER CONTAINING THE EXTREME ENTRY IN THE S
C     ARRAY. IF THERE ARE TIES, THEN LREF IS THE HIGHEST NUMBERED ROW
C     WITH THIS EXTREME VALUE. HENCE LREF.GT.NEAR(LREF). IREF
C     IDENTIFIES THE PLACEMENT OF LREF IN THE LIST ARRAY.
      NREF=NEAR(LREF)
      CALL METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,2)
C     GENERATE MERGE DATA NEEDED FOR SUBROUTINE TREE
      II(K)=NREF
      JJ(K)=LREF
      SS(K)=SREFX
      IL(K)=LAST(NREF)
      JL(K)=LAST(LREF)
      LAST(NREF)=K
      IF(IL(K).EQ.0) GO TO 60
      ILK=IL(K)
      NEXT(ILK)=K
   60 IF(JL(K).EQ.0) GO TO 70
      JLK=JL(K)
      NEXT(JLK)=K
   70 K=K+1
C     TERMINATE IF N-1 MERGES HAVE OCCURRED
      IF(K.EQ.N) GO TO 140
C     UPDATE FOR THE NEXT CYCLE
      NCL=NCL-1
      IF(IREF.GT.NCL) GO TO 90
C     UPDATE LIST ARRAY BY REMOVING LREF AND PUSHING DOWN THE LIST
      DO 80 I=IREF,NCL
   80 LIST(I)=LIST(I+1)
C     UPDATE FOR NEXT CYCLE
   90 CALL METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,3)
      GO TO 40
C     CLUSTERING FINISHED AND ALL ANCILLARY INFORMATION GENERATED.
C     SAVE RESULTS AS DESIRED.
  140 K=K-1
  160 IF(NT.LE.0) RETURN
      WRITE(NT,2300) TITLE
      DO 170 I=1,K
  170 WRITE(NT,2200) I,II(I),JJ(I),SS(I),IL(I),JL(I),NEXT(I)
      RETURN
 2200 FORMAT(3I10,E16.8,3I10)
 2300 FORMAT(20A4)
      END
C
      FUNCTION LFIND(I,J)
C     IF THE LOWER TRIANGULAR PORTION OF A SYMMETRIC MATRIX IS STORED
C     BY ROWS IN A ONE-DIMENSIONAL ARRAY, THEN THE ELEMENT (I,J) IN THE
C     FULL MATRIX IS ELEMENT LFIND(I,J) IN THE LINEAR ARRAY
      IF(I.GT.J) GO TO 10
C     ROW J, COLUMN I
      LFIND=((J-1)*(J-2))/2+I
      RETURN
C     ROW I, COLUMN J
   10 LFIND=((I-1)*(I-2))/2+J
      RETURN
      END
C
      SUBROUTINE TREE(I,J,S,IL,JL,NEXT,A,LABEL,LCLNO,LINE,IS,LAST,EPS,
     A TITLE,N,KBEG,NT,INTRV,IPRNT,MAXIN)
C
C     DATA INPUT THROUGH CALLING SEQUENCE
C
C     N=HIGHEST STAGE NUMBER IN THE CLUSTER MERGE DATA (MUST BE EXACT)
C     KBEG=STAGE NUMBER AT WHICH THE TREE BEGINS, DEFAULT VALUE 1
C     NT=TAPE NUMBER FOR PRINTED OUTPUT, DEFAULT VALUE = 6
C     INTRV=INTERVAL OPTION FOR SEGMENTATION
C     INTRV=1=DEFAULT VALUE. CONSTRUCT EPS BY DIVIDING THE RANGE OF S
C       INTO 25 EQUAL SEGMENTS
C     INTRV=2=EPS IS PROVIDED AS PART OF THE ARGUMENT LIST
C     INTRV=3=THE IS ARRAY IS ALREADY CONSTRUCTED AND EPS IS PROVIDED
C       FOR INFO
C     IPRNT=PRINT OPTION FOR INPUT INFORMATION
C     IABS(IPRNT)=1. PRINT ONLY TITLE AND *IS* ARRAY
C     IABS(IPRNT).NE.1. IN ADDITION PRINT THE CLUSTER MERGE DATA
C     IPRNT.LE.0.
C       IN ADDITION, PUNCH THE SEQUENCE IN WHICH THE ENTITIES APPEAR
C       IN THE TREE (NEEDED FOR POST-ANALYSIS OF DATA UNIT CLUSTERING
C       IN SUBROUTINE *POSTDU*).
C     EPS(M)=RIGHT ENDPOINT FOR THE MTH INTERVAL USED FOR SEGMENTING S
C     LABEL(M,IJ)=MTH OF 5 WORDS IDENTIFYING THE IJTH OBJECT
C     TITLE=ARRAY OF 20 WORDS FOR IDENTIFYING THE RUN.
C     K=INDEX IDENTIFYING STAGE NUMBER IN THE CLUSTERING
C     I(K)=LOWER NUMBERED CLUSTER IDENTIFICATION NUMBER IN THE MERGE AT
C       THE KTH STAGE
C     J(K)=UPPER NUMBERED CLUSTER IDENTIFICATION NUMBER IN THE MERGE AT
C       THE KTH STAGE
C     S(K)=VALUE OF THE CRITERION FUNCTION FOR THE MERGE AT THE KTH
C       STAGE
C     IS(K)=CATEGORIZED VALUE OF S=INTEGER IN RANGE 1 TO 25
C     IL(K)=STAGE NUMBER WHEN I(K) WAS LAST IN A MERGE (0 FOR FIRST
C       MERGE FOR I(K))
C     JL(K)=STAGE NUMBER WHEN J(K) WAS LAST IN A MERGE (0 FOR FIRST
C       MERGE FOR J(K))
C     NEXT(K)=STAGE NUMBER WHEN I(K) IS NEXT IN A MERGE
C     MAXIN=HIGHEST CLUSTER ID NUMBER IN THE CLUSTER MERGE DATA
C
C     OTHER VARIABLES USED IN THE PROGRAM
C
C     LINE(I)=LINE NUMBER IN THE PRINTOUT AT WHICH I(K) IS CARRIED
C       (AFTER MOST RECENT MERGE)
C     LCLNO(L)=THE CLUSTER NUMBER TO BE PRINTED ON LINE L AT THE LEFT
C       OF THE TREE
C     A(M,L)=THE MTH SEGMENT (OF 25) IN THE LTH LINE OF THE PRINTOUT
C     LAST(L)=FARTHEST RIGHT SEGMENT IN LINE L WHICH IS BLANK
      REAL*4 LABEL
      DIMENSION I(N),J(N),S(N),IS(N),IL(N),JL(N),NEXT(N),
     A A(25,MAXIN),LAST(MAXIN),LCLNO(MAXIN)
      DIMENSION LINE(MAXIN),LABEL(5,MAXIN)
      DIMENSION EPS(25),TITLE(20)
      DATA BARI,BLINK,BARS,BLANK/4H---I,4H   I,4H----,4H    /
      DATA RLB/'NOLB'/
C     DEFAULT VALUES
      IF(KBEG.LT.1) KBEG=1
      IF(INTRV.LT.1.OR.INTRV.GT.3) INTRV=1
      IF(NT.LE.0) NT=6
C     INITIALIZE ARRAYS
      NOBJ=N+1
      DO 10 K=1,NOBJ
      LINE(K)=0
      LCLNO(K)=0
      LAST(K)=0
      DO 10 L=1,25
      A(L,K)=BLANK
   10 CONTINUE
C     SEGMENT THE S ARRAY
      GO TO (20,40,120),INTRV
C     CONSTRUCT INTERVALS OF EQUAL LENGTH
   20 RANGE=S(N)-S(KBEG)
      DELTA=RANGE/25.
      EPS(1)=S(KBEG)+DELTA
      DO 30 K=2,24
   30 EPS(K)=EPS(K-1)+DELTA
      EPS(25)=S(N)
C     CONSTRUCT THE IS ARRAY
   40 IF(EPS(1).GT.EPS(2)) GO TO 70
C     S INCREASES WITH DISSIMILARITY (AS DOES A DISTANCE)
      KK=1
      DO 60 K=1,N
   50 IF(S(K).LE.EPS(KK)) GO TO 60
      IF(KK.EQ.25) GO TO 60
      KK=KK+1
      GO TO 50
   60 IS(K)=KK
      GO TO 120
C     S DECREASES WITH DISSIMILARITY (AS DOES A CORRELATION)
   70 KK=24
      KKK=25
      NN=N+1
      DO 90 K=1,N
      KCOMP=NN-K
   80 IF(S(KCOMP).LT.EPS(KK)) GO TO 90
      KKK=KK
      KK=KK-1
      IF(KK.EQ.0) GO TO 100
      GO TO 80
   90 IS(KCOMP)=KKK
  100 DO 110 K=1,KCOMP
  110 IS(K)=1
C     PRINT INPUT TO TREE
  120 WRITE(NT,2000) TITLE
      WRITE(NT,2100) KBEG,N
      WRITE(NT,2200)
      WRITE(NT,2300)
      M=1
      WRITE(NT,2400) M,S(KBEG),EPS(M)
      DO 130 M=2,25
      MM=M-1
  130 WRITE(NT,2400) M,EPS(MM),EPS(M)
      IF(IABS(IPRNT).EQ.1) GO TO 150
C     PRINT THE CLUSTER MERGE DATA
      WRITE(NT,2000) TITLE
      WRITE(NT,2500)
      DO 140 K=KBEG,N
      WRITE(NT,2600) K,I(K),J(K),S(K),IS(K),IL(K),JL(K),NEXT(K)
  140 CONTINUE
C     START TREE WITH THE MOST SIMILAR PAIR
  150 K=KBEG
      LNO=0
C     MERGE CLUSTERS I(K) AND J(K)
  160 IK=I(K)
      JK=J(K)
C     SET LINE NUMBERS FOR OUTPUT
      IF(IL(K).NE.0) GO TO 170
      LNO=LNO+1
      LINE(IK)=LNO
      LCLNO(LNO)=IK
  170 IF(JL(K).NE.0) GO TO 180
      LNO=LNO+1
      LINE(JK)=LNO
      LCLNO(LNO)=JK
C     FILL IN THE PRINT LINES
  180 ISK=IS(K)
      KT=0
      ITEM=IK
  190 LITEM=LINE(ITEM)
      IF(ISK-LAST(LITEM)-1) 225,200,210
C     ADD ONLY ONE MORE SEGMENT FOR LINE(ITEM)
  200 A(ISK,LITEM)=BARI
      LAST(LITEM)=ISK
      GO TO 225
C     ADD MORE THAN ONE SEGMENT
  210 LBEG=LAST(LITEM)+1
      LEND=ISK-1
      DO 220 L=LBEG,LEND
  220 A(L,LITEM)=BARS
      GO TO 200
C     REPEAT FOR CLUSTER J(K)
  225 KT=KT+1
      IF(KT.NE.1) GO TO 230
      ITEM=JK
      GO TO 190
C     TAKE CARE OF ANY LINES BETWEEN I(K) AND J(K)
  230 LIK=LINE(IK)
      LJK=LINE(JK)
      IF(LIK.GT.LJK) GO TO 240
      LBOT=LJK
      LTOP=LIK
      GO TO 250
  240 LBOT=LIK
      LTOP=LJK
  250 IF(LBOT.EQ.(LTOP+1)) GO TO 270
C     MUST FILL IN SOME VERTICAL CONNECTIONS
      LBEG=LTOP+1
      LEND=LBOT-1
      DO 260 L=LBEG,LEND
      IF(A(ISK,L).EQ.BARI) GO TO 260
      A(ISK,L)=BLINK
      LAST(L)=ISK
  260 CONTINUE
C     UPDATE LINE NUMBER FOR NEW CLUSTER
  270 LINE(IK)=(LINE(IK)+LINE(JK))/2
C     MERGE COMPLETE. FIND NEXT STAGE
      KLAST=K
      K=NEXT(K)
      IF(K.GT.N.OR.K.LT.1) GO TO 400
      IF(IL(K).LE.0) GO TO 280
      IF(JL(K).LE.0) GO TO 290
      GO TO 300
  280 IL(K)=-IL(K)
      GO TO 160
  290 JL(K)=-JL(K)
      GO TO 160
C     THIS MERGE INVOLVES CLUSTERS THAT EACH HAVE MORE THAN ONE MEMBER.
C     BACKTRACK TO THE ROOT OF THE TREE ALONG THE UNEXPLORED BRANCH.
  300 IF(IL(K).EQ.KLAST) GO TO 310
C     GO DOWN IL(K) BRANCH, SET JL(K) SO WE KNOW NOT TO GO DOWN THAT
C     BRANCH AGAIN
      JL(K)=-JL(K)
      K=IL(K)
      GO TO 320
C     GO DOWN JL(K) BRANCH, SET IL(K) SO WE KNOW NOT TO GO DOWN THAT
C     BRANCH AGAIN
  310 IL(K)=-IL(K)
      K=JL(K)
  320 IF(K.LT.1.OR.K.GT.N) GO TO 600
C     TEST TO SEE IF THE END HAS BEEN REACHED, IL(K)=JL(K) IFF BOTH
C     ZERO.
      IF(IL(K)-JL(K)) 330,160,350
  330 IF(IL(K).EQ.0) GO TO 360
  340 K=IL(K)
      GO TO 320
  350 IF(JL(K).EQ.0) GO TO 340
  360 K=JL(K)
      GO TO 320
C     PRINT THE TREE
  400 WRITE(NT,2000) TITLE
      IF(LABEL(1,1).EQ.RLB) GO TO 420
      WRITE(NT,3000) (K,K=1,25)
      DO 410 L=1,LNO
      LL=LCLNO(L)
  410 WRITE(NT,3100) (LABEL(K,LL),K=1,5),LL,(A(K,L),K=1,25)
      GO TO 440
C     LEAVE LABEL SPACES BLANK
  420 WRITE(NT,3010) (K,K=1,25)
      DO 430 L=1,LNO
      LL=LCLNO(L)
  430 WRITE(NT,3210) LL,(A(K,L),K=1,25)
C     TREE COMPLETE
  440 IF(IPRNT.GT.0) RETURN
C     PUNCH SEQUENCE LIST
      WRITE(7,3900) TITLE
      WRITE(7,4000) (LCLNO(L),L=1,LNO)
      RETURN
C     ERROR, PRINT AS MUCH OF THE TREE AS HAS BEEN CONSTRUCTED
  600 WRITE(NT,6000) KLAST,K
      GO TO 400
 2000 FORMAT(1H1,20X,20A4,//)
 2100 FORMAT(65H THIS RUN DEPICTS THE PORTION OF THE TREE GENERATED BETW
     AEEN STAGE,I5,11H AND STAGE ,I5,19H OF THE CLUSTERING.,/)
 2200 FORMAT(63H THE CRITERION VALUES ARE SEGMENTED INTO THE FOLLOWING C
     ALASSES.,/)
 2300 FORMAT(6H CLASS,5X,11HLOWER BOUND,5X,11HUPPER BOUND,//)
 2400 FORMAT(1X,I5,2E16.3)
 2500 FORMAT(1H ,9X,1HK,9X,1HI,9X,1HJ,15X,1HS,8X,2HIS,8X,2HIL,8X,2HJL,6X
     A,4HNEXT,//)
 2600 FORMAT(1X,3I10,E16.3,4I10)
 3000 FORMAT(10H ITEM NAME,12X,5HID NO,2X,25I4,//)
C     IF LOCAL CONVENTIONS PERMIT, RECOMMEND THAT THE CARRIAGE CONTROL
C     CHARACTER IN FORMATS 3100 AND 3200 ALLOW 66 LINES OF PRINT PER
C     PAGE. THAT IS, THE MARGINS AT THE TOP AND BOTTOM OF THE PAGE ARE
C     SUPPRESSED AND PRINTING IS SINGLE SPACE.
 3100 FORMAT(1H ,5A4,I6,2X,25A4)
 3010 FORMAT(5X,5HID NO,2X,25I4,//)
 3210 FORMAT(5X,I6,2X,25A4)
 3900 FORMAT(20A4)
 4000 FORMAT(20I4)
 6000 FORMAT(37H ERROR. WHILE BACKTRACKING FROM KLAST,I6,27H K WAS FOUND
     A OUT OF RANGE. ,/,1X,3HK =,I20)
      END
C
      SUBROUTINE MTXIN(X,IOPT,NE,NTIN,FMT)
C     THIS SUBROUTINE READS A LOWER TRIANGULAR MATRIX *X* REPRESENTING
C     ASSOCIATION AMONG *NE* ENTITIES. THE MATRIX IS READ FROM UNIT
C     *NTIN* IN FORMAT *FMT*. THE MODE OF INPUT FOR THE MATRIX IS
C     DETERMINED BY THE *IOPT* PARAMETER AS FOLLOWS,
C     IOPT.LE.0, MATRIX IS READ IN LOWER TRIANGULAR FORM BY ROWS, EACH
C       ROW BEING A NEW RECORD.
C     IOPT.GT.0, MATRIX IS READ IN CONSTANT LENGTH BLOCKS, EACH *IOPT*
C       WORDS LONG.
      DIMENSION FMT(20),X(1)
      INTEGER FIRST
      IF(IOPT.LE.0) GO TO 30
C     READ THE SIMILARITY MATRIX IN BLOCKS IOPT LONG
      FIRST=1
      LAST=IOPT
   10 READ(NTIN,FMT,END=60) (X(I),I=FIRST,LAST)
C     USE THE END OF RECORD CARD TO SIGNIFY END OF THE SIMILARITY
C     MATRIX
   20 FIRST=FIRST+IOPT
      LAST=LAST+IOPT
      GO TO 10
C     READ THE SIMILARITY MATRIX AS ROWS OF A LOWER TRIANGULAR MATRIX,
C     EACH ROW A RECORD.
   30 FIRST=1
      LAST=1
      DO 50 K=2,NE
      READ(NTIN,FMT,END=200) (X(I),I=FIRST,LAST)
   40 FIRST=LAST+1
      LAST=LAST+K
   50 CONTINUE
C     PASS THE END OF FILE
      READ(NTIN,FMT,END=60) Z
  210 WRITE(6,2500)
      GO TO 999
   60 RETURN
C     ERROR MESSAGES
  200 WRITE(6,2400)
      GO TO 220
  220 WRITE(6,2600) K,FIRST,LAST,Z,(X(I),I=FIRST,LAST)
  999 STOP
 2400 FORMAT(36H EOF ENCOUNTERED WHEN NONE EXPECTED.)
 2500 FORMAT(30H NO EOF WHEN ONE WAS EXPECTED.)
 2600 FORMAT(1X,3I10,F10.7,/,(1X,12F10.
7 ) ) END $ S I G 187 SUBROUTINE M E T H O D ( S , N E A R , S R E F , L I S T , A , B , S R E F X , S I G N , N « N C L , L R E F , N R E F , A J 0 3 J c C H I E R A R C H I C A L C L U S T E R I N G 3Y S I N G L E L I N K A G E . THE L A o O R I T H M I S D E R I V E D C FROM C J O H N S O N , S . C , H I E R A R C H I C A L C L U S T E R I N G S C H E M E S , PSYCHOS ET R I K A , C VOLUME 3 2 , NUMBER 3 , SEPTEMBER 1 9 6 7 , PP 2 4 1 - 2 5 4 . C D I M E N S I O N S ( l ) , N E A R { 1 ) , S R E F ( 1 I , L 1 S T ( 1 ) , A l l ) , 3 ( 1 1 GO TO { 1 0 , 1 5 , 2 C ) , J O B C J 0 3 = l . I N I T I A L I Z A T I O N 1 0 WRITE ( 6 , 3 0 0 0 ) '3000 FORMAT ( 26H0S I N G L E L I N K A G E C L U S T E R I N G ) BIG=SIGN : S--loE50 RETURN C J 0 B = 2 , DUMMY ENTRY* ! l5 RETURN C J 0 B = 3 , UPDATE FOR NEXT ROUND. 20 CONTINUE DO 50 J = 1 , N C L U P D A T E E N T R I E S IN S ARRAY A S S O C I A T E D WITH NREF I = L I S T ( J ) I F ( I . E Q . N R E F ) GO TO 50 t R E C A L L THAT L R E F HAS B E E N REMOVED FORM L I S T SO I NEED NOT BE T E S T E D t FOR E Q U A L I T Y WITH L R E F LL = LF I N D ( I , L R E F ) L N = L F I N D ( I , N R E F ) I F ( l ( S ( L L ) - S ( L N ) ) * S I G N ) o G E . O . ) GO TO 35 S ( L N ) = S ( L L ) I F ( I o G T . N R E F ) GO TO 30 G I . L T . N R E F G CHECK WHETHER S ( L N ) HAS A BETTER VALUE THAN S R E F ( N R E F ) I F ( ( ( S ( L N ) - S R E F ( N R E F ) ) * S I G N ) . G T . O . ) GO TO 50 N E A R l N R E F ) = 1 S R E F ( N R E F ) = S ( L N ) GO TO 50 E50 I F ( L G T . L R E F ) GO TO 4 0 C I . G T . N R E F . A N D . I . L T . L R E F CJ CHECK WHETHER S ( L N ) HAS A BETTER VALUE THAN S R E F ( I ) I F { I ( S ( L N ) - S P v c F i n ) * S I G N ) . G E . O . ) GO TO 50 S R E F C I ) = S ( L N ) N E A R ( I ) = M R E F GO TO 5 0 3 5 I F ( I . L T . L R E F ) GO TO 50 C I . G T . L R E F C UPDATE NEAR ARRAY FOR THOSE ROWS WHOSE EXTREME ELEMENT WAS L R E F 4 0 I F ( N E A R ( D . N E . 
L R E F ) GO TO 50 N E A R ( I ) = NREF S R E F ( I ) = S ( L N ) 50 CONTINUE RETURN END 188 S U B R O U T I N E M E T H O D ( S , N E A R , S R E F , L I S T , . A , B t S R E F X , S I G N . N . N C L , L R E F , N R E F , A J O B ) C C H I E R E R C H I C A L C L U S T E R I N G BY C O M P L E T E L I N K A G E . T H E A L G O R I T H M IS C D E R I V E D FROM C J O H N S O N , S . C , H I E R A R C H I C A L C L U S T E R I N G S C H E M E S , P S Y C H O M E T R I K A , C V O L U M E 3 2 , NUMBER 3 , S E P T E M B E R 1 9 6 7 , PP 2 ^ 1 - 2 5 4 . C D I M E N S I O N S ( 1 ) , N E A R ( I ) , S R E F { 1 ) , L I S T ( 1 ) , A ( 1 ) , 3 1 1 ) GO TO ( 1 0 , 1 5 , 2 0 ) , J O B C J 0 B = 1 . I N I T I A L I Z A T I O N 10 W R I T E ( 6 , 2 0 0 0 ) 2000 F O R M A T ! 2 8 H 0 C 0 M P L E T E L I N K A G E C L U S T E R I N G ) B I G = S I G N * 1 . E 5 0 ] RETURN C J0B=2, DUMMY ENTRY. ;15 RETURN |C J 0 B = 3 , U P D A T E FOR N E X T R O U N D . ;20 DO 3 0 J = 1 , N C L | I = L I S T < J ) I F ( I . E Q . N R E F ) GO TO 3 0 0 RECALL T H A T LREF HAS BEEN R E M O V E D FROM L I S T SO I NEED NOT BE C TESTED FOR E Q U A L I T Y WITH LREF. L L = L F I N D i I , L R E F ) L N = L F I N D I I , N R E F ) I F ( ( ( S ( L L ) - S ( L N ) ) * S I G N ) . L E . G ) GO TO 3 0 S I L N ) = S { L L ) 30 C O N T I N U E C U P D A T E T H E NEAR AND S R E F A R R A Y S . I F T H E E X T R E M E E L E M E N T IN ROW I C WAS E I T H E R L R E F OR N R E F , T H E N I T I S N E C E S S A R Y TO F I N D A NEW E X T R E M E C E L E M E N T . ROWS P R I O R T O N R E F N E E D NOT B E C O N S I D E R E D . 40 DO 50 J = 1 , N C L I = L 1 S T ( J ) I F ( I . E Q . N R E F ) GO TO 55 SO C O N T I N U E 5 5 I F ( J . E Q . l ) GO TO 80 60 S R E F { I ) = B I G J 1 = J - 1 DO 70 L = 1 , J 1 L I S T L = L I S T ( L ) L L = L F I N O ( I , L I S T L ) I F U ( S ( L L ) - S R E F { I ) ) * S I G N ) . G E » 0 < , ) GO TO 7 0 N E A R ! 
I ) = L I S T L S R E F ( I ) = S { L L ) 7 0 C O N T I N U E 80 J = J + 1 I F ( J . G T . N C L ) RETURN I = L I S T ( J ) I F ( N E A R d ) . E Q . L R E F . O R . N E A R C I ) . E Q . N R E F ) GO TO 6 0 GO TO 80 END 189 S U B R O U T I N S M E T H O D ( S , N E A R , S RE F , L I ST , N U M 6 R , S U M , S R E F X » 3 1 G N » N » NC L » A L R E F , N R E F , J O B ) C C H I E R A R C H I C A L C L U S T E R I N G BY M I N I M I Z I N G T H E A V E R A G E D I S T A N C E OR C M A X I M I Z I N G T H E A V E R A G E C O R R E L A T I O N BETWEEN T H E MERGED G R O U P S . C C T H E A L G O R I T H M I S D E R I V E D FROM T H E * G R O U P A V E R A G E * METHOD D E S C R I B E D IN C L A N C E , G . N . AND W. T „ W I L L I A M S , A G E N E R A L THEORY OF C L A S S I F I C A T O R Y C S O R T I N G S T R A T E G I E S , 1. H I E R A R C H I C A L S Y S T E M S , THE C O M P U T E R J O U R N A L , C V O L U M E 9 , NUMBER 4 , F E B R U A R Y 1 9 6 7 , P P 3 7 3 - 3 8 0 . C D I M E N S I O N S ( l ) , N E A R ( 1 ) , S R E F ( 1 ) , L I S T ( 1 ) , N U M B R t 1 ) , S U M ( 1 ) GO TO 1 1 0 , 2 5 , 3 0 ) , J O B C J 0 3 = l , I N I T I A L I Z E . C NUM8R I I3=NUMBER OF E N T I T I E S C U R R E N T L Y IN T H E I - T H C L U S T E R 10 W R I T E ( 6 , 2 0 0 0 ) 2000 FORMAT I 4 2 H 0 A V E R A G E L I N K A G E B E T W E E N T H E MERGED G R O U P S ) DO 20 J = 1 , N 20 N U M B R t J ) = l B I G = S I G N * 1 . E 5 0 R E T U R N C J 0 B = 2 , DUMMY E N T R Y . 25 R E T U R N C J 0 8 = 3 , U P D A T E FOR N E X T R O U N D . C U P D A T E T H E NEW C L U S T E R 30 N U M 8 R ( N R E F ) = N U M B R ( N R E F ) + N U M S R ( L R E F ) C U P D A T E E N T R I E S IN T H E R E D U C E D S I M I L A R I T Y M A T R I X . T H E E N T R I E S ARE C T H E SUM T O T A L OF S I M I L A R I T Y V A L U E S A S S O C I A T E D W ITH A L L C P A I R W I S E L I N K S BETWEEN T H E E L E M E N T S OF THE TWO C L U S T E R S . DO 40 J = 1 , N C L I = L I S T ( J ) I F ( I . 
E Q . N R E F ) GO TO 40 C R E C A L L T H A T L R E F HAS B E E N REMOVED FROM L I S T AND T H E R E F O R E I N E E D NOT C B E T E S T E D FOR E Q U A L I T Y WITH L R E F . L L = L F I N D ( I , L R E F ) L N = L F I N D ( I , N R E F ) S ( L N ) = S ( L N ) + S ( L L ) 40 C O N T I N U E C U P D A T E THE NEAR AND S R E F A R R A Y S . I F T H E E X T R E M E E L E M E N T IN ROW I C WAS E I T H E R L R E F OR N R E F , T H E N IT IS N E C E S S A R Y TO F I N D A NEW WXTREME C E L E M E N T . ROWS PR IOR TO N R E F N E E D NOT BE C O N S I D E R E D . DO 50 J = 1 , N C L I = L I S T ( J ) I F ( I . E Q . N R E F ) GO TO 55 50 C O N T I N U E 55 I F ( J . E Q . l ) GO TO 80 60 S R E F ( I ) = 8 I G J 1 = J - 1 DO 7 0 L = 1 , J 1 L I S T L = L I S T ( L ) 190 L L = L F I N D ( I » L ! S T L ) S R E F X = S ( L L ) / (NUMBR (I ) * N U M B R ( L I S T L ) ) I F { ( ( S R F F X - S R E F ( I ) ) * S I G N ) . G E . O . ) GO T O 7 0 N E A R ( I ) = L I S T L SRE F { I ) = S R E F X 7 0 C O N T I N U E 8 0 J = J + 1 I F ( J . G T . - N C L ) R E T U R N I = L I S T { J ) I F ( N c A R ( I ) » E Q o L R E F o O R o N E A R ( I I . E Q . N R E F ) GO TO 60 GO TO 8 0 END 191 S U B R O U T I N E M E T H O D ! S , N E A R , S R E F , L I S T , N U M B R , S U M , S R E F X , S I G N , N , N C L , A L R F . F , N R E F , J O B ) C C H I E R A R C H I C A L C L U S T E R I N G BY M I N I M I Z I N G T H E A V E R M G E D I S T A N C E OR C M A X I M I Z I N G T H E A V E R A G E C O R R E L A T I O N W ITH IN T H E N E W J R O U P . T H A T I S , C FOR E A C H P O T E N T I A L MERGE T H * A V E R A G E OF A L L L I N K A G E S W I T H I N THE C HEW GROUP I S C A L C U L A T E D . D I M E N S I O N S ( 1 ) » N E A R ( 1 } , S R E F ( 1 ) , L I S T ( 1 ) , N U M B R 1 1 ) , S U M ( i ) GO TO ( 1 0 , 2 5 , 3 0 } , J O B C J 0 B = 1 , I N I T I A L I Z E . 
C NUMBR(I)=NUMBER OF ENTITIES CURRENTLY IN THE I-TH CLUSTER
C SUM(I)=SUM OF ALL PAIRWISE SIMILARITIES AMONG ENTITIES IN THE I-TH
C        CLUSTER
   10 WRITE(6,2000)
 2000 FORMAT(37H0AVERAGE LINKAGE WITHIN THE NEW GROUP)
      DO 20 J=1,N
      NUMBR(J)=1
   20 SUM(J)=0.
      BIG=SIGN*1.E50
      RETURN
C JOB=2, DUMMY ENTRY.
   25 RETURN
C JOB=3, UPDATE FOR NEXT ROUND.
C UPDATE THE NEW CLUSTER
   30 NUMBR(NREF)=NUMBR(NREF)+NUMBR(LREF)
      LN=LFIND(LREF,NREF)
      SUM(NREF)=SUM(NREF)+SUM(LREF)+S(LN)
C UPDATE ENTRIES IN THE REDUCED SIMILARITY MATRIX. THE ENTRIES ARE
C THE SUM TOTAL OF SIMILARITY VALUES ASSOCIATED WITH ALL
C PAIRWISE LINKS BETWEEN THE ELEMENTS OF THE TWO CLUSTERS.
      DO 40 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 40
C RECALL THAT LREF HAS BEEN REMOVED FROM LIST AND THEREFORE I NEED NOT
C BE TESTED FOR EQUALITY WITH LREF.
      LL=LFIND(I,LREF)
      LN=LFIND(I,NREF)
      S(LN)=S(LN)+S(LL)
   40 CONTINUE
C UPDATE THE NEAR AND SREF ARRAYS. IF THE EXTREME ELEMENT IN ROW I
C WAS EITHER LREF OR NREF, THEN IT IS NECESSARY TO FIND A NEW EXTREME
C ELEMENT. ROWS PRIOR TO NREF NEED NOT BE CONSIDERED.
      DO 50 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 55
   50 CONTINUE
   55 IF(J.EQ.1) GO TO 80
   60 SREF(I)=BIG
      J1=J-1
      DO 70 L=1,J1
      LISTL=LIST(L)
      LL=LFIND(I,LISTL)
      NTOT=NUMBR(I)+NUMBR(LISTL)
      NTOT=((NTOT)*(NTOT-1))/2
      SREFX=(SUM(I)+SUM(LISTL)+S(LL))/NTOT
      IF(((SREFX-SREF(I))*SIGN).GE.0.) GO TO 70
      NEAR(I)=LISTL
      SREF(I)=SREFX
   70 CONTINUE
   80 J=J+1
      IF(J.GT.NCL) RETURN
      I=LIST(J)
      IF(NEAR(I).EQ.LREF.OR.NEAR(I).EQ.NREF) GO TO 60
      GO TO 80
      END

      SUBROUTINE METHOD(S,NEAR,SREF,LIST,NUMBR,SUM,SREFX,SIGN,N,NCL,
     ALREF,NREF,JOB)
C
C HIERARCHICAL CLUSTERING BY CENTROID SORTING
C
C THE PARTICULAR ALGORITHM USED HERE IS DESCRIBED IN
C LANCE, G.N. AND W.T. WILLIAMS, A GENERAL THEORY OF CLASSIFICATORY
C SORTING STRATEGIES, 1. HIERARCHICAL SYSTEMS, THE COMPUTER JOURNAL,
C VOLUME 9, NUMBER 4, FEBRUARY 1967, PP 373-380.
      DIMENSION S(1),NEAR(1),SREF(1),LIST(1),NUMBR(1),SUM(1)
      GO TO (10,25,30),JOB
C JOB=1, INITIALIZE.
C NUMBR(I)=NUMBER OF ENTITIES CURRENTLY IN THE I-TH CLUSTER
   10 WRITE(6,2000)
 2000 FORMAT(42H0CENTROID CLUSTERING.  BEWARE OF REVERSALS)
      DO 20 J=1,N
   20 NUMBR(J)=1
      BIG=SIGN*1.E50
      RETURN
C JOB=2, DUMMY ENTRY.
   25 RETURN
C JOB=3, UPDATE FOR NEXT ROUND.
C UPDATE THE NEW CLUSTER
   30 NTOT=NUMBR(NREF)+NUMBR(LREF)
      TOT=NTOT
      ALL=NUMBR(LREF)/TOT
      ALN=NUMBR(NREF)/TOT
      PROD=ALN*ALL
      LBET=LFIND(LREF,NREF)
      DO 40 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 40
C RECALL THAT LREF HAS BEEN REMOVED FROM LIST AND THEREFORE I NEED NOT
C BE TESTED FOR EQUALITY WITH LREF.
      LL=LFIND(I,LREF)
      LN=LFIND(I,NREF)
      S(LN)=ALL*S(LL)+ALN*S(LN)-PROD*S(LBET)
   40 CONTINUE
C UPDATE THE NEAR AND SREF ARRAYS. IF THE EXTREME ELEMENT IN ROW I
C WAS EITHER LREF OR NREF, THEN IT IS NECESSARY TO FIND A NEW EXTREME
C ELEMENT. ROWS PRIOR TO NREF NEED NOT BE CONSIDERED.
      DO 50 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 55
   50 CONTINUE
   55 IF(J.EQ.1) GO TO 80
   60 SREF(I)=BIG
      J1=J-1
      DO 70 L=1,J1
      LISTL=LIST(L)
      LL=LFIND(I,LISTL)
      IF(((S(LL)-SREF(I))*SIGN).GE.0.) GO TO 70
      NEAR(I)=LISTL
      SREF(I)=S(LL)
   70 CONTINUE
   80 J=J+1
      IF(J.GT.NCL) RETURN
      I=LIST(J)
      IF(NEAR(I).EQ.LREF.OR.NEAR(I).EQ.NREF) GO TO 60
      GO TO 80
      END
$SIG

      SUBROUTINE METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,
     AJOB)
C
C HIERARCHICAL CLUSTERING BY THE MEDIAN METHOD OF
C GOWER, J.C., A COMPARISON OF SOME METHODS OF CLUSTER ANALYSIS,
C BIOMETRICS, VOLUME 23, NUMBER 4, DECEMBER 1967, PP 623-637.
C
      DIMENSION S(1),NEAR(1),SREF(1),LIST(1),A(1),B(1)
      GO TO (10,15,20),JOB
C JOB=1. INITIALIZATION
   10 WRITE(6,2000)
 2000 FORMAT(44H0MEDIAN METHOD OF GOWER, BEWARE OF REVERSALS)
      BIG=SIGN*1.E50
      RETURN
C JOB=2, DUMMY ENTRY.
   15 RETURN
C JOB=3, UPDATE FOR NEXT ROUND.
   20 LBET=LFIND(LREF,NREF)
      DO 30 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 30
C RECALL THAT LREF HAS BEEN REMOVED FROM LIST SO I NEED NOT BE
C TESTED FOR EQUALITY WITH LREF.
      LL=LFIND(I,LREF)
      LN=LFIND(I,NREF)
C IF S IS A DECREASING FUNCTION OF SIMILARITY (E.G. DISTANCE) THEN
C     S(LN)=(S(LN)+S(LL))/2.-S(LBET)/4.
C IF S IS AN INCREASING FUNCTION OF SIMILARITY (E.G. CORRELATION) THEN
C     S(LN)=(S(LN)+S(LL))/2.+(1.-S(LBET))/4.
      S(LN)=(S(LN)+S(LL))/2.-S(LBET)/4.
   30 CONTINUE
C UPDATE THE NEAR AND SREF ARRAYS. IF THE EXTREME ELEMENT IN ROW I
C WAS EITHER LREF OR NREF, THEN IT IS NECESSARY TO FIND A NEW EXTREME
C ELEMENT. ROWS PRIOR TO NREF NEED NOT BE CONSIDERED.
   40 DO 50 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 55
   50 CONTINUE
   55 IF(J.EQ.1) GO TO 80
   60 SREF(I)=BIG
      J1=J-1
      DO 70 L=1,J1
      LISTL=LIST(L)
      LL=LFIND(I,LISTL)
      IF(((S(LL)-SREF(I))*SIGN).GE.0.) GO TO 70
      NEAR(I)=LISTL
      SREF(I)=S(LL)
   70 CONTINUE
   80 J=J+1
      IF(J.GT.NCL) RETURN
      I=LIST(J)
      IF(NEAR(I).EQ.LREF.OR.NEAR(I).EQ.NREF) GO TO 60
      GO TO 80
      END

CLUSTER TRIAL RUN

NE = 80   ISIGN = -1   NTSV = -1   NTIN = 7   INOPT = 10   KOUT = 2
REQUIRED STORAGE = 4120 WORDS
ALLOTTED STORAGE = 7000 WORDS
FORMAT (10F7.4)

AVERAGE LINKAGE WITHIN THE NEW GROUP

THIS RUN DEPICTS THE PORTION OF THE TREE GENERATED BETWEEN STAGE    1 AND STAGE    79 OF THE CLUSTERING.

THE CRITERION VALUES ARE SEGMENTED INTO THE FOLLOWING CLASSES.

 CLASS     LOWER BOUND     UPPER BOUND

     1  0.21200000E-01  0.35035290E-01
     2  0.35035290E-01  0.48870590E-01
     3  0.48870590E-01  0.62705870E-01
     4  0.62705870E-01  0.76541120E-01
     5  0.76541120E-01  0.90376370E-01
     6  0.90376370E-01  0.10421160E 00
     7  0.10421160E 00  0.11804680E 00
     8  0.11804680E 00  0.13188210E 00
     9  0.13188210E 00  0.14571730E 00
    10  0.14571730E 00  0.15955260E 00
    11  0.15955260E 00  0.17338780E 00
    12  0.17338780E 00  0.18722310E 00
    13  0.18722310E 00  0.20105830E 00
    14  0.20105830E 00  0.21489360E 00
    15  0.21489360E 00  0.22872880E 00
    16  0.22872880E 00  0.24256410E 00
    17  0.24256410E 00  0.25639930E 00
    18  0.25639930E 00  0.27023460E 00
    19  0.27023460E 00  0.28406980E 00
    20  0.28406980E 00  0.29790510E 00
    21  0.29790510E 00  0.31174030E 00
    22  0.31174030E 00  0.32557560E 00
    23  0.32557560E 00  0.33941090E 00
    24  0.33941090E 00  0.35324610E 00
    25  0.35324610E 00  0.36708240E 00
3 2 5 5 7 S 6 0 £ OT 00 00 00" 00 0 0 0^3394 1 090E '"0,3532«'M'0E" 0.36708240E 00 00 00 "00" 00 CLUSTER TRIAL HUN "K ' I J 5 "IS" IL JL NEXT 1 Vi Ts b~ 5T2Too o c"f ^ o l i ' 5 o FT " 2 40 4 4 0.2120000CE-OI I 0 0 4 _ 5 7 10 0.21200000C-01 I 0 0_ 10 4 69 '70~ "0.26500OOPE-O1 " t " 0" 0. "" 1 3 " 5 14 17 0.26500000F-01 1 0 O- 14 _ 6 61 62 O_.Jt70OO0r!:-0 1 ! 0 0 20__ 7 27 29 o,"M766o6CE-0~l 1 '• B 0- 21 » 16 16 0.3170000CE-01 ! 0 4 IS 9 40 _45 0. J52t>6(.S0E-01 2 2 0 _ 24 "10 7" . 8" 0,35266t>SOF.-01' 2 3 0 1 9 " 11 33 30 0.3700000CE-01 2 0 0 50 _J2 _5_6 6 4 o . i e e o o o o c E - o i 2 C t 2 J _ _ 13 69 76 o;'«235'332"Ct-o"i 2~ 5 0 16 14 14 21 0.42JJJ52CK-01 2 S 0 33 _1S 16 20 0.U58JJJ00E-0I 2 6 , 0 _ 22 16 66 6 9 ~ 0.16733MCE-01 2 0' 13 30 " 17 68 70 0.17600000F-01 2 0 0 39 _J_9_ _36 37 0.47600000E-01 2_ 0 0 4J_ 19 7 n o f o a S e s s j c e - o i T T5 o 34 20 60 61 0.09J666J0E-01 3 0 6 3S 21 ._ 56 67 0.5J783320E-01 3 _12 0 36_ 22 16 25 0.SU6666J0E-01 3 lS 0 31 23 27 31 0.5603J320E-C1 3 7 0 68 _ja __4 0 OJ 0.57 3153a OE-OJ J 9 0 _ l _ 25 71 72 0".5'8260(>6i;£'-61 3 IS 0 62 26 48 49 0.5B20D000E-01 3 0 .0 40 27 / 46 47 O.58JCOO00E-01 3 0 0 51 28"- "23 20"~0.58200000E-01 3 "0 6 58" 29 3 4 0 .5e200000E-ol 3 0 0 57 __30 66 75 0_,_3ue9970E^0_l 31 52 SO 0,63««9"960E-01 32 2 5 0.63499980E-01 _ 3 3 10 16 0,6a495200F-0i 30 " 1 7 O,645O99ROE-OI » 35 56 60 0.6524997CE-01 4 0 20 '54 0 t6 0 «5_ 8 0 0 . 56 1 5 2 5 " - <i 0? 2 5 0 . 3 4 9 9 9 8 - 1 « 0 0 J 7 „ «q J 6 0 , 6 4 0 < > 5 2 0 0 F - 0 : 1 4 14 22 •• I 7 -  64 5 O 9 9 S 0 E - 0 1 4 • H . ' J _ 5 5 f c 6 0 0 . 6 5 2 4 9 9 7 C E - 0 1 4 0 _ 6 57 58 0 . 6 3 7 7 9 9 4 0 E - 0 1 4 » -, 7 J5 J 8 o" .Ve '7999 7 0 £ - 6 l 0 0 j8 13 19 0 . 6 8 7 9 9 9 7 C E - 0 1 4 0 0 21 _63_ ^ : o «2 0 52 39 68 80 0.70533330E-01 4 17 0 46 " « 6 08 51 " " 0.7406663CE-01 0 26 0 "61 " 41 36 00 0.700799JOE-01 4 18 24 5S 42 35 0 3 0 .74099950E-01 0 37 0 59 43 22 28 0".7«r000'|0E-6i : 0 i ! 
CP 6T " 44 9 12 0.74100010F-01 0 0 0 57 45 66 7S__0,75«79920E-01 4 30 0 _ S3 "46" 68 73 0.77600000E-01 5 " 3 9 " ' 0 " 65 » " •> 7(.nnnncr-n 1 S 32 0 67 0 6 61_ 5 1"5 0 ' 60" _ j 2 6 0.7760000CE-01 5 53 55 0,79B00O00E-.0l 5 -S T5 5 2 — o t 7 « _ « i o E - b r - | • » • • To 50' 28 33 0.81099980E-01 5 0 I I *» I. -.o i i . n.8461?29Cr-01 5 0 . 2 7 . . . . " li 39 0 6 0.846J329CE-01 5  . 27 .... »» r i .3 1 5 " 0.84666600E-01 S 38 0 *< H 66 77 O:65661820E-0| .5 . • * . J M Il 5 4 63 C.667699JCE-01 S 35 0 , ?<L. •)M _ _ » -- - _ . - - - -"S'S 36 so oTaTTTiOBsoT.ol s o i o 66 56 52 59 0,eei9997O£ - c l 5 31 0 70 57 J 9_0.908099<)CE-01 t 29 4 4 71 "58 23 " 30 0 191699">5CF-sl *6" "28 0 " "" 71 * %t 35 02 O^dSuf l vgCF-C ! 4 02 0 69 6 0 22 26 9.9(.2P9<)'5T.-C'. 6 . 03 50 63_ 6"l 08 53 0%'962999'iiOt-or 6 o5 «8 Vi~ 62 56 71 0.10l2S5hOE 00 6 5« . 2 5 71 63 _ 57 66 0.10I7J320EC0 _ _ 6 36 53 70 _ '60 " 1 10 " 0,102:9')70E 00"" ~" 6 "34 0 9 72 65 68 79 0,1C687990£ 00 7 4 6 ' 0 73 66 36 39 0.UM154CEC0 7^  55^  51 7 « _ 67 "2 13 t~. 1 ibI66iifiE 00" 6 o7 52 72 68 2 2 , 27 0.1 1880700!: CO 6 60 23 70 69 35 08 0, 1261 05'iOE 00 8 59 6t 76 ' 70 52 57 " 0.134 MlfiOE 00 " " 9 ' 56 63 75 7) 1 23 0,149lht.60E 00 . 10 57 58 77 72 1 2 0.150 7 495 0FJJO J O 6^ 0 67 77_ 7^ 1 56 66 "0',"t60Jj62OE 00. 11 6"2 65 TS >0 22 36 0.18IO5320E 00 12 68 66 76 75 52 56 0.203S308CF. 00 10 70 73 . 79_ " '76 22" 35 " 0.205009S0E 00 " ' lo " " 7 4 " " 64 78 77 I 3 0.21761JOCE 00 IS 72 71 78 76 1 22 0.28134730E 00 19 77 76 79 ~19 52—6", 16 7 0 8 2 o 0 C"0O 2 5 18 75 T 198 ID NO I " " 2 " " j 4 $ 6 1" i 4 tO 11 ~ " '"14 >S~ 1* J 7 ' " "18 1 « " 2 0 " 21 i i 23 2 « ~ 2 5 " 64 65 se 67 57 -*! <0~ 76 (6 "75 78 '7_: "52 51 59 71 72 _tl_ 62 60 56 63 66 7« • r i — . 
i : i I-1 • i - - i £0 73 79 "35 SB 43 « V 48 49 "51 ' 53 55 06 47 39 36 37 DO IK 45 at '50 27 29 --I 31 22 it 33 30 26 I I-•I " I I — I -I T a 9 "12" 23 _24_ 30 7 10 0 II 1 1" 17 21 16 18 20 "25" 32 2 " 5 6 _13 19 15 APPENDIX C Sample Outputs from Program UBC:BMDP2M BHDP2M - CLUSTER ANALYSIS OF CASES HEALTH SCIENCES .COMPUTING FACILITY UNIVERSITY OF CALIFORNIA. LOS ANGELES PROGRAM REVISED FEBRUARY 26, 1973 WRITEUP REVISED SEPTEMBER, 1971 PROBLEM CONTROL CARDS PROB TITLE IS 'CLUSTER TRIAL RUN - TOATA1 - 2M«./ INPUT VARIABLE=2. CASE=80. "FORMAT='(5X,2 F7.6) '•/ PROC SUMCFSO. ST AN 0./ PRINT OAT A. DISTANCE. VERTICAL./ END / PROBLEM TITLE . . . . . . .CLUSTER TRIAL RUN - TDATA1 - 2H NUMBER OF VARIABLES TO READ IN . . 2 NUMBER OF VARIABLES ADDED BY TRANSFORMATIONS. . 0 TOTAL NUM3ER OF VARIABLES 2 NUMBER OF CASES TO READ IN.'. . . . 80 CASE LABELING VARIABLES . . . . . . . . . . '.'.'"" 6 0 LIMITS AND MISSING VALUE CHECKED BEFORE TRANSFORMATIONS INPUT TAPE NUMBER . . "5 REWIND INPUT TAPE PRIOR TO READING DATA . . . . NO INPUT FORMAT . . (i>X,2F7.0) PRINT DISTANCE MATRIX . . . - -_. YES _ TYPE OF TREE PRINTED. '. .' . ". . 7 ~."V .""VERTICAL CALCULATING PROCEDURE , . . . . SUM-SOR STANDARD I ZAT1CN CN INPUT DATA YE_S PRINT INPUT DATA MATRIX AFTER STANDARDIZATION . YES NO. NAME STANDARDIZED INPUT DATA 1 2 3 4 •1.597 •1 .597 •1.597 -1.597 •I .471 -C.879 C.361 I -104 1.649 0. 6 7 a 9 10 u -1 13 -] 14 -1 15 -I 16 j i j 1 7 -1 1.471 1.346 1 .34t> 1 .295 1. 295 1.220. 18 19 20 21 22 _2 i . 220 170 094 1 .094 1.044 L-^4_ -0.969 -0.K93 -0.843. -0.868 -0.843 24 25 26 2 7 28 -22_ 30 31 32 33 34 _3 5_ .36 37 36 39 40 _4J_ -0.843 -0.717 -0.717 -0.591 -0.591 -0.466 42 43 44 45 46 J±.Z. 48 49 50 51 52 _5Ji_ -0 IQ8_ ,135 126 . 730 05 5 ,02 7 ,325_ ,600 ,234 679 ,410 ,275 ,730. , 126 ,135 ,374 879 708 54 55 56 57 5e 59 -0.4 66 -0.466 -0.315 -0.340 iQ-..29JL. -0.2 14 -0.164 -0.139 . -0.089 0.037 _Q..J2.12_ 0.037 0.037 .0.087.. 0.163 0.2 8b _iL.23.8_ 0.2P8 0.213 0.414 0.41<. 0.565 . . 
n . 5 i 5 , 1. -0. -0. 0. - I . -0. - 1 . -0. - I . - c . 0. _ 1 ._352_ 1 . 897 -1.126 . . 0.261.. -0.383 0.906 _=JC,.1.6.5_ 1.749 -0.730 _-l.275 0.311 0.60P _ J L _ _ _ 2 _ -1.126 -0.779 C.955 -0.135 -1.374 ___Q,ii3JL 60 61 62 63 64 65 66 67 60 69 70 71 72 73 74 75 76 77 78 79 0.666 : 0.666 0.666 0. 791 0.917 0.867 0.9 17 •0.917 0.967 1.043 1.043 1.043 "168 168 219 294 1.294 _1_. _24jt_ 172 94 1.420 1.344 1.420 1.420 1 .54 6 -0.086 0. 856 1.600 ~-l.126 -0. 978 O ^ l ^ 1. T04 1.699 1.501 0.706 -0.680 -0.879 -1.126 -0.532 C.460 "-1.C2 7 -0.779 _L^798_ 1,352 0.836 0. 261 -0.433 -1. 126 -1 .523 1.671 1.747 1.621 -0.879 -0.234 C.410 0.46 0 1.352 -1.2 75 -1. 126 -0.383 __L_0.63._ 0.708 1 .104 -C. 879 1.352 -0.482 ,-..0„41Q . N 3 O 202 AHAL&.0IS1. VALUES OF VARIABLES CF £ L U S » E « S 1 0.111 - r l . 3 ? l -1 .077 2.000 2 0.111 0.062 - 1 . 3 2 5 2.000 J 0.157 - 1 . 0 6 9 -0.C04 2.000 A 0.1 60 C. 980 -0 .978 2 .000__ > 0.160 17357 -1 .C77 2".0-6 b 0 . t67 - 1 . 0 0 6 -1 .501 2.000 7 0.195 1.106 -0 .606 2.000 8 0.195 " 1.29* * - 1 .093 3.000 9 0.205 0.9*2 1.600 2.000 11 0 .21* - 1 . 0 0 2 -0 .629 3"."000 12 0.222 0.096 - 1 . 2 5 9 ' 3.000 11 0.235 1.282 0.361 _ 2.000 1* 0.238 - 0 . 9 5 2 - - 1 - 2 5 9 3.000 15 0;250 - 0 . 0 6 3 - 0 . 7 0 5 2.000 l o 0.256 _. 16? _0-6fc'; 3.000_ 17 0.267 -1 .287 - l V l o O 3.000 18 0.269 - 0 . 8 9 3 -1 .226 A.000 _ . ; 19 0.27* 0.917 - 0 . 9 9 * 3.000 20 0.277 0.850 1.600 3.000 21 0-292 - 0 . 3 4 0 -1.201 2.000 22 0. 222 - . . A l l _P • "OJ, 2_O00_ 2 J 0.294 -1 .031 -C.18S 2.000 2* 0.294 0.A27 - 0 . 4 3 3 2.000 25 ... 0.298 - 0 . 3 2 7 C.460 2.000 26 0.304 0.125 1.228 2.000 27 0.306 - 1 . A 4 6 1.079 2.000 28 0.3 15 0 . 221 I.,2i9 3.,00?_ ~2* 0 . 3 1 9 - ' - . ' 7 0 6 0.038 2.000 30 • 0.320 - 0 . 7 1 7 0.807. 2.000 31 0.342 1.231 -0 .606 A . 000 32 CK343 1-395 0.377 3.000 33 0.353 0.791 0.980 2.000 _,*. 0.3 53 fl-163 -0. 5 M 2 . .0 0 Q 38 39 _*0_ A l A2 ' - • - . 35 0.369 - 1 . 5 3 4 0.53'. 2.000 36 0.371 0.e75 0.889 3.000 37 . . 
0.376 - 0 . 2 U ' 1.129 2.000 .. 0.380 - 1 .A09 1.425 2.000 0.382 0.075 -O .C36 2.000 ___392 1.10& -1.C44 6._0CJ__ 0.392 0.280 0.526 3.000 ^ 0.399 -1 .C62 - 1 . 1 9 7 7.000 _.A3 0.373 r l . O A A - 1 . C 8 7 10.000.. AA 0.A03 -0 .198 -0 .713 3.000 AS 0.A05 - 0 . 5 0 6 0.336 2.000 4.6. Q _ D _ _Q _6_4 l i £ 2 3 2_00 0__ A7 0-AAO - 0 . 0 7 9 -1 .236 5.000 *8 0.AA1 0.9A8 1.6A9 A.000 *9... .. 0.443 _ _ _ r l . 1 7 8 . _ _ . - 0 . 168 . . . . 3.000 50 0.AA6 0.A23 -0.5G1 3.000 51 0.A56 1.156 - 0 . e _ 9 10.000 52 0^4 5 i 1.0 IS 1 .590 5,_.0 0 _ 53 C A S E O.0A7 1.213 5.000 5A 0.A80 1.A01 C . 4 5 ' A .000 ... 55 0.507 -0.E11 0.571 A.000 . 56 0.A97 - 0 . 6 5 0 0.53* 6.000 57 0.507 - 0 . 7 1 7 1.666 • 3.000 <f> n . i p - l . l i s - 1 .040 17.000 59 0.515 1.203 -C.B70 11.000 60 0.536 -0 .123 - 1 . 0 4 0 8.000 61 0.517 - 1 . 4 2 7 1.352 A.000 62 0.598 0.196 0.30.1 " " 5.000 63 0.623 - 0 . 9 3 3 -0 .21A 5.000 6 A 0.676 . 0.360 °_'i ? 6. 2-_9CO_ 65 0.656 1. 176 0.165 7".0'00 66 0.713 0.026 - C . 9 1 5 11.000 67 0.738 _ 1.731 -C.92A. 12.000 " 68 0.777 ~ - 1 . 1 2 3 1 . 4 S 7 7.000 69 0 .800 - 0 . 7 7 9 0.19* 11.000 70 O.JSZ& - 0 .895 __?A6 ? _ L 0 0 _ 71 O'. e61 1.27"l - 0 . 8 71 13.000 72 0.926 0.768 0.AA6 1A.0OO _ 73 1.0*1 0.532 _ 1.A02 1C.000 7* * 0.985 0.670 0.E4* 2*.000 75 1.1*8 -0 .570 - 0 . 9 8 0 23.000 7 6 1. 76 1 -0 .975 ' C.6 8 0 20.0 0_0_ 1.653 -6.078 0.7~70 **7'000 ]l ['.111 - C .2A? >:i69 67.000 79 i.e*o 0.000 0.000 80.000 203 APPENDIX D Sample Outputs from Program UBC:CGROUP PR03LEM "NAME" * "CLUSTER TRIAL RUN 7 OA'fA'l""" ' NUMBER.OF ITEMS TO 3E GROUPED ° 60 '  N U M B E R OF G R O U P I N G KEYS » 2 ' S T A R T P R I N T I N G W H E N T H E R E ARr 10 GROUPS '. S T A N D A R D I Z E G R O U P I N G K E Y S t YES  P R I N T A T R E E G R A P H I YF3 C O N T I G U I T Y C O N S T R A I N T 7 NO I T E M I D E N T I F I C A T I O N NAMES TO B E READ t NQ . 
[Sample printout from the hierarchical grouping program, trial run on DATA1. The legible run options are: STORE GROUP MEMBERSHIP: NO; TRANSPOSE DATA MATRIX: YES; PLOT ERROR TERMS: YES. The body of the printout has one line per merge step, from 79 groups down to 1 group, naming the two groups joined with their sizes (N) and giving the ERROR, CUM and INDEX values for the step. Group membership listings are printed for the final steps. The individual values are not legible in this copy.]
APPENDIX E

Listing and Sample Outputs from NONHIER: a Computer Program for Three Nonhierarchical Clustering Techniques

      DIMENSION X(7500)
      LIMIT=7500
      CALL EXEC(X,LIMIT)
      STOP
      END
      SUBROUTINE EXEC(X,LIMIT)
C
C THIS SUBROUTINE READS PARAMETERS, COMPUTES STORAGE AND CALLS MAJOR
C PROGRAM SEGMENTS NEEDED FOR A NON-HIERARCHICAL CLUSTERING JOB USING
C ONE OF THE METHODS PROGRAMMED AS A VERSION OF SUBROUTINE *KMEAN*.
C
C EVERY JOB REQUIRES THREE USER SUPPLIED DECK SEGMENTS,
C
C 1. PROGRAM *DRIVER* PERFORMS THE FOLLOWING TASKS.
C    A. ASSIGNS INPUT/OUTPUT UNITS.
C    B. ESTABLISHES THE DIMENSION OF THE *X* ARRAY AND SETS THIS
C       DIMENSION TO *LIMIT*
C    C. CALLS SUBROUTINE *EXEC*.
C    THE FOLLOWING EXAMPLE WILL SUFFICE IN MOST CASES.
C
C      PROGRAM DRIVER(INPUT,OUTPUT,PUNCH,TAPE5=INPUT,TAPE6=OUTPUT,
C     A TAPE7=PUNCH,TAPE1,TAPE2)
C      DIMENSION X(5000)
C      LIMIT=5000
C      CALL EXEC(X,LIMIT)
C      END
C
C 2. SUBROUTINE *USER* IS EMPLOYED TO READ THE COMPLETE SET OF SCORES
C    ON THE VARIABLES FOR ONE DATA UNIT. THE FOLLOWING EXAMPLE
C    ILLUSTRATES VARIOUS POSSIBILITIES FOR MERGING FILES AND
C    TRANSFORMING VARIABLES AS THEY ARE READ.
C
C      SUBROUTINE USER(X)
C      DIMENSION X(8)
C      READ(1,100) X(7),Y
C      READ(2) (X(I),I=1,6)
C      READ(5,200) X(8),Z
C      X(3)=.5*X(3)
C      X(7)=3.6*X(7)
C      X(8)=.4*X(8)+.35*Y+.25*Z*X(8)
C      RETURN
C  100 FORMAT(2F11.3)
C  200 FORMAT(F8.1,F6.3)
C      END
C
C 3. FUNCTION *DIST* COMPUTES THE DISTANCE BETWEEN TWO DATA UNITS OR
C    BETWEEN A DATA UNIT AND A CLUSTER CENTROID. THE USER CAN SPECIFY
C    ANY DESIRED DISTANCE FUNCTION AND WEIGHT THE VARIABLES IN ANY
C    MANNER. THE FOLLOWING EXAMPLE ILLUSTRATES A WEIGHTED SQUARED
C    EUCLIDEAN DISTANCE BETWEEN TWO DATA UNITS DENOTED AS X AND Y.
C    THE PROBLEM INVOLVES 8 VARIABLES AND THE WEIGHTS ARE IN THE
C    *W* ARRAY.
C
C      FUNCTION DIST(X,Y)
C      DIMENSION X(1),Y(1),W(8)
C      DATA (W(I),I=1,8)/3*1.,3.,4.5,2.,2*1./
C      DIST=0.
C      DO 10 I=1,8
C   10 DIST=DIST+W(I)*((X(I)-Y(I))**2)
C      RETURN
C      END
C
C NOTE THAT SCALING AND TRANSFORMATION OF VARIABLES CAN BE
C ACCOMPLISHED EITHER IN SUBROUTINE *USER* OR IN SUBROUTINE *DIST*.
C
C
C INPUT SPECIFICATIONS
C  CARD 1  TITLE
C  CARD 2  PARAMETER CARD
C   COLS  1- 5  NE=NUMBER OF ENTITIES (DATA UNITS)
C   COLS  6-10  NV=NUMBER OF VARIABLES
C   COLS 11-15  NC=NUMBER OF CLUSTERS
C   COLS 16-20  NTIN=INPUT UNIT FOR THE DATA SET
C               NTIN=5, CARD READER
C               NTIN.NE.5, TAPE OR DISK FILE
C   COLS 21-25  NTOUT=OUTPUT UNIT FOR SAVING CLUSTER MEMBERSHIP LISTS
C               NTOUT=7, CARD PUNCH
C               NTOUT.LE.0, DO NOT SAVE MEMBERSHIP LISTS
C   COLS 26-30  MINREL=TERMINATION PARAMETER. CLUSTERING ENDS WHEN A
C               CYCLE THROUGH THE DATA SET RESULTS IN *MINREL*
C               OR FEWER CHANGES IN CLUSTER MEMBERSHIPS
C               MINREL.LE.0, ITERATE TO COMPLETE CONVERGENCE
C   COLS 31-35  IPART=INITIAL PARTITION PARAMETER
C               IPART=1, SEED POINTS ARE SELECTED FROM THE DATA UNITS.
C                        READ THE SEQUENCE NUMBERS FOR THE CHOSEN DATA
C                        UNITS FROM CARD(S) 3 IN 20I4 FORMAT. IF THE
C                        DATA SET IS NOT STORED IN CORE, THE LIST
C                        OF SEQUENCE NUMBERS MUST BE IN ASCENDING ORDER
C               IPART=2, THE DATA UNITS ARE GROUPED INTO AN INITIAL
C                        PARTITION IN THE INPUT SEQUENCE WITH THE
C                        FIRST *NUMBR(1)* IN CLUSTER 1, THE NEXT
C                        *NUMBR(2)* IN CLUSTER 2 ETC. READ THE
C                        *NUMBR* ARRAY FROM CARD(S) 3 IN 20I4 FORMAT.
C               IPART=3, THE SCORE VECTORS FOR THE SEED POINTS ARE
C                        READ FROM CARD(S) 4 IN FORMAT *FMT* WHICH IS
C                        READ FROM CARD 3.
C   COLS 36-40  METHOD=PARAMETER FOR CHOOSING THE ALGORITHM IN ONE
C               VERSION OF SUBROUTINE *KMEAN*.
C               METHOD=1, JANCEY ALGORITHM
C               METHOD.NE.1, FORGY ALGORITHM
C
C***CARDS 3 AND 4 ARE READ IN SUBROUTINE *KMEAN* ACCORDING TO THE
C***PROCEDURE SPECIFIED BY THE CHOSEN VALUE OF *IPART*. NOTE THAT THE
C***BASIC K-MEANS METHOD OF MACQUEEN SIMPLY USES THE FIRST *NC* DATA
C***UNITS AS CLUSTER SEED POINTS AND THEREFORE IGNORES THE *IPART*
C***PARAMETER.
C
C STORAGE ALLOCATIONS IN THE *X* ARRAY
C  X(N1) TO X(N2-1)  NC*NV WORDS--STORAGE OF THE CENTR ARRAY
C  X(N2) TO X(N3-1)  NC WORDS--STORAGE OF THE NUMBR ARRAY
C  X(N3) TO X(N4-1)  NE WORDS--STORAGE OF THE MEMBR ARRAY
C  X(N4) TO X(N5-1)  NC*NV WORDS--STORAGE OF THE TOTAL ARRAY
C  X(N5) TO X(N6)    NV OR NE*NV WORDS--STORAGE OF THE DATA ARRAY
C  X(N4) TO X(N7)    NE WORDS--STORAGE OF THE LIST ARRAY IN *RESULT*
C
      DIMENSION X(1),TITLE(20)
      READ(5,1000) TITLE
      READ(5,1100) NE,NV,NC,NTIN,NTOUT,MINREL,IPART,METHOD
      WRITE(6,2000) TITLE
      WRITE(6,2100) NE,NV,NC,NTIN,NTOUT,MINREL,IPART,METHOD
      N1=1
      N2=N1+NC*NV
      N3=N2+NC
      N4=N3+NE
      N5=N4+NC*NV
C *N6* MAY BE INCREASED IN *KMEAN*.
      N6=N5+NV-1
      N7=N4+NE-1
      MAX=N6
      IF(N7.GT.MAX) MAX=N7
      WRITE(6,2200) MAX,LIMIT
      IF(MAX.GT.LIMIT) STOP
      CALL KMEAN(X(N1),X(N2),X(N3),X(N4),X(N5),N5,NE,NV,NC,NTIN,MINREL,
     A IPART,METHOD,LIMIT)
      CALL RESULT(X(N1),X(N2),X(N3),X(N4),TITLE,NE,NV,NC,NTOUT)
      RETURN
 1000 FORMAT(20A4)
 1100 FORMAT(8I5)
 2000 FORMAT(1H1,20A4)
 2100 FORMAT(5H0NE =,I8,/,5H NV =,I8,/,5H NC =,I8,/,7H NTIN =,I6,/,
     A8H NTOUT =,I5,/,9H MINREL =,I4,/,8H IPART =,I5,/,9H METHOD =,I4)
 2200 FORMAT(19H0REQUIRED STORAGE =,I5,6H WORDS,/,
     A 19H0ALLOTTED STORAGE =,I5,6H WORDS)
      END
C
      SUBROUTINE RESULT(CENTR,NUMBR,MEMBR,LIST,TITLE,NE,NV,NC,NTOUT)
C THIS SUBROUTINE PRINTS THE RESULTS FROM A CLUSTERING JOB BASED
C ON ANY VERSION OF SUBROUTINE *KMEAN*.
C
      DIMENSION CENTR(1),NUMBR(1),MEMBR(1),LIST(1),TITLE(20)
C
C AS A CONTINGENCY PRECAUTION WRITE OUT THE RAW MEMBERSHIP LIST
      WRITE(6,2000) TITLE
      WRITE(6,2100) (MEMBR(K),K=1,NE)
      WRITE(6,2200) (NUMBR(J),J=1,NC)
C INVERT THE *MEMBR* ARRAY AND PUT THE RESULT IN THE *LIST* ARRAY.
C FIRST REVISE THE *NUMBR* ARRAY TO CONTAIN START POINTS IN THE
C *LIST* ARRAY FOR EACH CLUSTER
      NUMBR(NC)=NE-NUMBR(NC)+1
      JJ=NC
      JJ1=JJ-1
      DO 10 J=2,NC
      NUMBR(JJ1)=NUMBR(JJ)-NUMBR(JJ1)
      JJ=JJ1
   10 JJ1=JJ-1
C BUILD *LIST* ARRAY
      DO 20 K=1,NE
      MEMBRK=MEMBR(K)
      NJ=NUMBR(MEMBRK)
      LIST(NJ)=K
      NUMBR(MEMBRK)=NUMBR(MEMBRK)+1
   20 CONTINUE
C SAVE THE SORTED MEMBERSHIP LIST IF DESIRED
      IF(NTOUT.LE.0) GO TO 30
      WRITE(NTOUT,3000) TITLE
      WRITE(NTOUT,3100) (LIST(K),K=1,NE)
C RESTORE THE *NUMBR* ARRAY
   30 JJ=NC
      DO 40 J=2,NC
      NUMBR(JJ)=NUMBR(JJ)-NUMBR(JJ-1)
   40 JJ=JJ-1
      NUMBR(1)=NUMBR(1)-1
C PRINT RESULTS FOR EACH CLUSTER
      WRITE(6,2000) TITLE
      WRITE(8,2000) TITLE
      K1=1
      DO 50 J=1,NC
      WRITE(6,2300) J,NUMBR(J)
      WRITE(8,2301) J,NUMBR(J)
      J1=(J-1)*NV
      WRITE(6,2400) (CENTR(J1+I),I=1,NV)
      K2=K1+NUMBR(J)-1
      WRITE(6,2500) (LIST(K),K=K1,K2)
      WRITE(8,2501) (LIST(KF),KF=K1,K2)
      K1=K2+1
   50 CONTINUE
      WRITE(6,3500)
      RETURN
 2000 FORMAT(1H1,20A4)
 2100 FORMAT(20H0RAW MEMBERSHIP LIST,/,(1X,25I5))
 2200 FORMAT(14H0CLUSTER SIZES,/,(1X,25I5))
 2300 FORMAT(8H0CLUSTER,I4,9H CONTAINS,I5,11H DATA UNITS)
 2301 FORMAT(2I4)
 2400 FORMAT(21H0CENTROID COORDINATES,/,(1X,10E12.4))
 2500 FORMAT(16H0MEMBERSHIP LIST,/,(1X,25I5))
 2501 FORMAT(15I5)
 3000 FORMAT(20A4)
 3100 FORMAT(20I4)
 3500 FORMAT('1',15X,'END OF OUTPUT',///)
      END
C
      SUBROUTINE USER(X)
      DIMENSION X(2)
      READ(5,100) X(1),X(2)
      RETURN
  100 FORMAT(5X,2F10.2)
      END
C
      FUNCTION DIST(X,Y)
      DIMENSION X(1),Y(1)
      DIST=0.
      DO 10 I=1,2
   10 DIST=DIST+((X(I)-Y(I))**2)
      RETURN
      END

      SUBROUTINE KMEAN(CENTR,NUMBR,MEMBR,TOTAL,DATA,N5,NE,NV,NC,NTIN,
     A MINREL,IPART,METHOD,LIMIT)
C
C VERSION 1. THE DATA SET IS STORED IN CENTRAL MEMORY.
C
C THIS SUBROUTINE ITERATIVELY SORTS *NE* DATA UNITS INTO *NC* CLUSTERS
C USING THE ALGORITHM OF (METHOD.NE.1)
C
C FORGY, E.W., CLUSTER ANALYSIS OF MULTIVARIATE DATA. EFFICIENCY
C VERSUS INTERPRETABILITY OF CLASSIFICATIONS, PAPER PRESENTED AT THE
C BIOMETRIC SOCIETY (WNAR) MEETINGS, RIVERSIDE, CALIFORNIA, JUNE
C 1965. ABSTRACT IN BIOMETRICS, VOLUME 21, NUMBER 3, P 768.
C
C OR THE ALGORITHM OF (METHOD=1)
C
C JANCEY, R.C., MULTIDIMENSIONAL GROUP ANALYSIS, AUSTRALIAN JOURNAL
C OF BOTANY, VOLUME 14, NUMBER 1, APRIL 1966, PP 127-130.
C
C CENTR(NV*(J-1)+I)=SCORE ON I-TH VARIABLE FOR J-TH CLUSTER CENTROID
C TOTAL(NV*(J-1)+I)=TOTAL SCORE ON I-TH VARIABLE FOR DATA UNITS THUS
C                   FAR ALLOCATED TO THE J-TH CLUSTER
C NUMBR(J)=NUMBER OF DATA UNITS THUS FAR ALLOCATED TO THE J-TH CLUSTER
C MEMBR(K)=CLUSTER TO WHICH THE K-TH DATA UNIT CURRENTLY BELONGS
C DATA(NV*(K-1)+I)=SCORE ON I-TH VARIABLE FOR K-TH DATA UNIT
C
      DIMENSION CENTR(1),TOTAL(1),NUMBR(1),MEMBR(1),DATA(1),FMT(20)
     A,NAME(4)
      DATA (NAME(I),I=1,4)/4H   F,4HORGY,4H  JA,4HNCEY/
      I=1
      IF(METHOD.EQ.1) I=3
      WRITE(6,2000) NAME(I),NAME(I+1)
      WRITE(8,2001) NAME(I),NAME(I+1)
C CHECK FOR SUFFICIENT STORAGE
      N6=N5+NE*NV-1
      WRITE(6,2100) N6,LIMIT
      IF(N6.GT.LIMIT) STOP
C ESTABLISH INITIAL PARTITION
      IF(IPART.NE.3) GO TO 20
C SEED POINTS ARE READ DIRECTLY FROM CARDS
      READ(5,1000) FMT
      WRITE(6,2200) FMT
      WRITE(6,2300)
      J1=0
      DO 10 J=1,NC
      READ(5,FMT) (CENTR(J1+I),I=1,NV)
      WRITE(6,2400) (CENTR(J1+I),I=1,NV)
   10 J1=J1+NV
      GO TO 30
C IPART=1 OR 2
   20 WRITE(6,2500) IPART
      READ(5,1100) (NUMBR(J),J=1,NC)
      WRITE(6,2600) (NUMBR(J),J=1,NC)
C READ THE DATA SET INTO CENTRAL MEMORY
   30 K1=1
      DO 40 K=1,NE
      CALL USER(DATA(K1))
   40 K1=K1+NV
      IF(IPART.EQ.3) GO TO 100
C IF *IPART* IS 1 OR 2 SET UP THE SEED POINTS
      IF(IPART.EQ.2) GO TO 60
C IPART=1. THE DATA UNIT WITH SEQUENCE NUMBER *NUMBR(J)* IS USED AS
C THE J-TH SEED POINT
      DO 50 J=1,NC
      NJ=(NUMBR(J)-1)*NV
      J1=(J-1)*NV
      DO 50 I=1,NV
      CENTR(J1+I)=DATA(NJ+I)
   50 CONTINUE
      GO TO 100
C IPART=2. THE DATA UNITS ARE GROUPED INTO CLUSTERS WITH THE J-TH
C CLUSTER HAVING *NUMBR(J)* MEMBERS.
   60 K=0
      J1=-NV
C ACCUMULATE THE TOTAL SCORE ON EACH VARIABLE FOR EACH CLUSTER
      DO 80 J=1,NC
      NJ=NUMBR(J)
      J1=J1+NV
      DO 70 I=1,NV
   70 TOTAL(J1+I)=0.
      DO 80 KJ=1,NJ
      K=K+1
      MEMBR(K)=J
      K1=(K-1)*NV
      DO 80 I=1,NV
      J2=J1+I
      TOTAL(J2)=TOTAL(J2)+DATA(K1+I)
   80 CONTINUE
C COMPUTE THE CENTROIDS
      J1=0
      DO 90 J=1,NC
      DO 90 I=1,NV
      J1=J1+1
      CENTR(J1)=TOTAL(J1)/NUMBR(J)
   90 CONTINUE
      GO TO 115
C INITIALIZE ARRAYS
  100 DO 110 K=1,NE
  110 MEMBR(K)=0
  115 NPASS=1
C BEGINNING OF MAIN LOOP
  120 J1=0
      DO 130 J=1,NC
      NUMBR(J)=0
      DO 130 I=1,NV
      J1=J1+1
  130 TOTAL(J1)=0.
      MOVES=0
      TDIST=0.
C ALLOCATE EACH DATA UNIT TO THE NEAREST CLUSTER CENTROID
      K1=0
      DO 160 K=1,NE
      K2=K1+1
      J2=1
C COMPUTE DISTANCE TO FIRST CLUSTER CENTROID
      DREF=DIST(DATA(K2),CENTR(J2))
      JREF=1
C TEST DISTANCES TO REMAINING CLUSTER CENTROIDS
      DO 140 J=2,NC
      J2=J2+NV
      DTEST=DIST(DATA(K2),CENTR(J2))
      IF(DTEST.GE.DREF) GO TO 140
      DREF=DTEST
      JREF=J
  140 CONTINUE
C ALLOCATE DATA UNIT *K* TO CLUSTER *JREF*
      NUMBR(JREF)=NUMBR(JREF)+1
      TDIST=TDIST+DREF
      IF(JREF.EQ.MEMBR(K)) GO TO 150
C THE DATA UNIT CHANGES ITS MEMBERSHIP
      MOVES=MOVES+1
      MEMBR(K)=JREF
  150 J1=(JREF-1)*NV
      DO 160 I=1,NV
      J1=J1+1
      K1=K1+1
      TOTAL(J1)=TOTAL(J1)+DATA(K1)
  160 CONTINUE
C ALL DATA UNITS ALLOCATED. TEST FOR CONVERGENCE
      WRITE(6,2700) MOVES,NPASS,TDIST
      NPASS=NPASS+1
      JREF=0
      IF(MOVES.GT.MINREL) GO TO 185
      IF(METHOD.NE.1.AND.MOVES.EQ.0) RETURN
      JREF=1
C COMPUTE TRUE CLUSTER CENTROIDS--FORGY UPDATE
  170 J1=0
      DO 180 J=1,NC
      DO 180 I=1,NV
      J1=J1+1
  180 CENTR(J1)=TOTAL(J1)/NUMBR(J)
      IF(JREF.EQ.1) RETURN
      GO TO 120
  185 IF(METHOD.NE.1) GO TO 170
C JANCEY UPDATE
  190 J1=0
      DO 200 J=1,NC
      DO 200 I=1,NV
      J1=J1+1
  200 CENTR(J1)=2.*TOTAL(J1)/NUMBR(J)-CENTR(J1)
      GO TO 120
 1000 FORMAT(20A4)
 1100 FORMAT(20I4)
 2000 FORMAT(1H0,2A4,28H METHOD OF CLUSTER ANALYSIS.,
     A 25H  DATA SET STORED IN CORE)
 2001 FORMAT(20A4)
 2100 FORMAT(19H0REQUIRED STORAGE =,I5,6H WORDS,/,
     A 19H0ALLOTTED STORAGE =,I5,6H WORDS)
 2200 FORMAT(7H0FORMAT,20A4)
 2300 FORMAT(43H1INITIAL CLUSTER CENTERS READ IN AS FOLLOWS///)
 2400 FORMAT(1X,10E12.4)
 2500 FORMAT(9H1 IPART =,I2,30H,  NUMBR ARRAY READ AS FOLLOWS///)
 2600 FORMAT(1X,10I7)
 2700 FORMAT(1H0,I5,37H DATA UNITS MOVED ON ITERATION NUMBER,I3,/,
     A38H SUMMED DEVIATIONS ABOUT SEED POINTS =,E16.8)
      END

      SUBROUTINE KMEAN(CENTR,NUMBR,MEMBR,TOTAL,DATA,N5,NE,NV,NC,NTIN,
     A MINREL,IPART,METHOD,LIMIT)
C
C VERSION 1. THE DATA SET IS STORED IN CENTRAL MEMORY.
C
C THIS SUBROUTINE ITERATIVELY SORTS *NE* DATA UNITS INTO *NC* CLUSTERS
C USING THE CONVERGENT K-MEANS METHOD DESCRIBED IN SECTION 7.2.2.
C
C CENTR(NV*(J-1)+I)=SCORE ON I-TH VARIABLE FOR J-TH CLUSTER CENTROID
C TOTAL(NV*(J-1)+I)=TOTAL SCORE ON I-TH VARIABLE FOR DATA UNITS THUS
C                   FAR ALLOCATED TO THE J-TH CLUSTER
C NUMBR(J)=NUMBER OF DATA UNITS THUS FAR ALLOCATED TO THE J-TH CLUSTER
C MEMBR(K)=CLUSTER TO WHICH THE K-TH DATA UNIT CURRENTLY BELONGS
C DATA(NV*(K-1)+I)=SCORE ON I-TH VARIABLE FOR K-TH DATA UNIT
C
      DIMENSION CENTR(1),TOTAL(1),NUMBR(1),MEMBR(1),DATA(1),FMT(20)
      WRITE(6,2000)
      WRITE(8,2000)
C CHECK FOR SUFFICIENT STORAGE
      N6=N5+NE*NV-1
      WRITE(6,2100) N6,LIMIT
      IF(N6.GT.LIMIT) STOP
C ESTABLISH INITIAL PARTITION
      IF(IPART.NE.3) GO TO 20
C SEED POINTS ARE READ DIRECTLY FROM CARDS
      READ(5,1000) FMT
      WRITE(6,2200) FMT
      WRITE(6,2300)
      J1=0
      DO 10 J=1,NC
      READ(5,FMT) (CENTR(J1+I),I=1,NV)
      WRITE(6,2400) (CENTR(J1+I),I=1,NV)
   10 J1=J1+NV
      GO TO 30
C IPART=1 OR 2
   20 WRITE(6,2500) IPART
      READ(5,1100) (NUMBR(J),J=1,NC)
      WRITE(6,2600) (NUMBR(J),J=1,NC)
C READ THE DATA SET INTO CENTRAL MEMORY
   30 K1=1
      DO 40 K=1,NE
      CALL USER(DATA(K1))
   40 K1=K1+NV
      IF(IPART.EQ.3) GO TO 51
C IF *IPART* IS 1 OR 2 SET UP THE SEED POINTS
      IF(IPART.EQ.2) GO TO 60
C IPART=1. THE DATA UNIT WITH SEQUENCE NUMBER *NUMBR(J)* IS USED AS
C THE J-TH SEED POINT
      DO 50 J=1,NC
      NJ=(NUMBR(J)-1)*NV
      J1=(J-1)*NV
      DO 50 I=1,NV
      CENTR(J1+I)=DATA(NJ+I)
   50 CONTINUE
C THE INITIAL CONFIGURATION IS GIVEN IN TERMS OF SEED POINTS.
C CONSTRUCT AN INITIAL PARTITION BY ASSIGNING EACH DATA UNIT TO THE
C NEAREST SEED POINT. SEED POINTS REMAIN FIXED THROUGHOUT ASSIGNMENT
C OF THE FULL DATA SET.
   51 DO 52 K=1,NE
   52 MEMBR(K)=0
      J1=0
      DO 53 J=1,NC
      NUMBR(J)=0
      DO 53 I=1,NV
      J1=J1+1
   53 TOTAL(J1)=0.
C ALLOCATE EACH DATA UNIT TO THE NEAREST SEED POINT
      K1=0
      DO 55 K=1,NE
      K2=K1+1
      J2=1
C COMPUTE DISTANCE TO FIRST SEED POINT
      DREF=DIST(DATA(K2),CENTR(J2))
      JREF=1
C TEST DISTANCES TO REMAINING SEED POINTS
      DO 54 J=2,NC
      J2=J2+NV
      DTEST=DIST(DATA(K2),CENTR(J2))
      IF(DTEST.GE.DREF) GO TO 54
      DREF=DTEST
      JREF=J
   54 CONTINUE
C ALLOCATE DATA UNIT *K* TO CLUSTER *JREF*
      NUMBR(JREF)=NUMBR(JREF)+1
      MEMBR(K)=JREF
      J1=(JREF-1)*NV
      DO 55 I=1,NV
      J1=J1+1
      K1=K1+1
      TOTAL(J1)=TOTAL(J1)+DATA(K1)
   55 CONTINUE
      GO TO 85
C IPART=2. THE DATA UNITS ARE GROUPED INTO CLUSTERS WITH THE J-TH
C CLUSTER HAVING *NUMBR(J)* MEMBERS.
   60 K=0
      J1=-NV
C ACCUMULATE THE TOTAL SCORE ON EACH VARIABLE FOR EACH CLUSTER
      DO 80 J=1,NC
      NJ=NUMBR(J)
      J1=J1+NV
      DO 70 I=1,NV
   70 TOTAL(J1+I)=0.
      DO 80 KJ=1,NJ
      K=K+1
      MEMBR(K)=J
      K1=(K-1)*NV
      DO 80 I=1,NV
      J2=J1+I
      TOTAL(J2)=TOTAL(J2)+DATA(K1+I)
   80 CONTINUE
C COMPUTE THE CENTROIDS
   85 J1=0
      DO 90 J=1,NC
      DO 90 I=1,NV
      J1=J1+1
      CENTR(J1)=TOTAL(J1)/NUMBR(J)
   90 CONTINUE
C INITIALIZE ARRAYS
  100 NPASS=1
C BEGINNING OF MAIN LOOP
  120 MOVES=0
      TDIST=0.
C ALLOCATE EACH DATA UNIT TO THE NEAREST CLUSTER CENTROID
      K1=0
      DO 160 K=1,NE
      K2=K1+1
      J2=1
C COMPUTE DISTANCE TO FIRST CLUSTER CENTROID
      DREF=DIST(DATA(K2),CENTR(J2))
      JREF=1
C TEST DISTANCES TO REMAINING CLUSTER CENTROIDS
      DO 140 J=2,NC
      J2=J2+NV
      DTEST=DIST(DATA(K2),CENTR(J2))
      IF(DTEST.GE.DREF) GO TO 140
      DREF=DTEST
      JREF=J
  140 CONTINUE
      TDIST=TDIST+DREF
      IF(JREF.NE.MEMBR(K)) GO TO 155
      K1=K1+NV
      GO TO 160
C REALLOCATE DATA UNIT *K* FROM CLUSTER *MEMBR(K)* TO CLUSTER *JREF*
  155 MOVES=MOVES+1
      J2=MEMBR(K)
      NUMBR(J2)=NUMBR(J2)-1
      NUMBR(JREF)=NUMBR(JREF)+1
      MEMBR(K)=JREF
      J1=(JREF-1)*NV
      J3=(J2-1)*NV
      DO 150 I=1,NV
      J1=J1+1
      J3=J3+1
      K1=K1+1
      TOTAL(J1)=TOTAL(J1)+DATA(K1)
      CENTR(J1)=TOTAL(J1)/NUMBR(JREF)
      TOTAL(J3)=TOTAL(J3)-DATA(K1)
      CENTR(J3)=TOTAL(J3)/NUMBR(J2)
  150 CONTINUE
  160 CONTINUE
C ALL DATA UNITS ALLOCATED. TEST FOR CONVERGENCE
      WRITE(6,2700) MOVES,NPASS,TDIST
      NPASS=NPASS+1
      IF(MOVES.LE.MINREL) RETURN
      GO TO 120
 1000 FORMAT(20A4)
 1100 FORMAT(20I4)
 2000 FORMAT(46H0CONVERGENT K-MEANS METHOD OF CLUSTER ANALYSIS,/,
     A 24H DATA SET STORED IN CORE)
 2100 FORMAT(19H0REQUIRED STORAGE =,I5,6H WORDS,/,
     A 19H0ALLOTTED STORAGE =,I5,6H WORDS)
 2200 FORMAT(7H0FORMAT,20A4)
 2300 FORMAT(43H1INITIAL CLUSTER CENTERS READ IN AS FOLLOWS///)
 2400 FORMAT(1X,10E12.4)
 2500 FORMAT(9H1 IPART =,I2,30H,  NUMBR ARRAY READ AS FOLLOWS///)
 2600 FORMAT(1X,10I7)
 2700 FORMAT(1H0,I5,37H DATA UNITS MOVED ON ITERATION NUMBER,I3,/,
     A38H SUMMED DEVIATIONS ABOUT SEED POINTS =,E16.8)
      END

Sample output (only partly legible in this copy):

CLUSTER TRIAL RUN - NON-HIERARCHICAL - DATA1
NE = 80   NV = 2   NC = 2   NTIN = 5   NTOUT = -1   MINREL = 1   IPART = 2   METHOD = 3
FORGY METHOD OF CLUSTER ANALYSIS.  DATA SET STORED IN CORE
REQUIRED STORAGE = 250 WORDS
IPART = 2,  NUMBR ARRAY READ AS FOLLOWS     40     40
CLUSTER 1 CONTAINS 44 DATA UNITS
CLUSTER 2 CONTAINS 36 DATA UNITS
[The iteration summaries, raw membership list, centroid coordinates and per-cluster membership lists are not legible in this copy.]
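As a reading aid for the first KMEAN listing in this appendix, the two centroid-update rules it selects with METHOD can be sketched in a few lines of Python. This is a minimal illustration, not a translation of the thesis program. The names `kmeans_batch`, `nearest` and `max_passes` are hypothetical. The sketch starts from seed points rather than from an IPART=2 partition, and it leaves an empty cluster's centroid unchanged, a case the FORTRAN listing does not guard against.

```python
from typing import List, Sequence, Tuple


def dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Unweighted squared Euclidean distance, like the job's FUNCTION DIST."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest(point: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the closest centroid; ties keep the lower index,
    mirroring the IF(DTEST.GE.DREF) test in the listing."""
    best, best_d = 0, dist(point, centroids[0])
    for j in range(1, len(centroids)):
        d = dist(point, centroids[j])
        if d < best_d:
            best, best_d = j, d
    return best


def kmeans_batch(
    data: Sequence[Sequence[float]],
    seeds: Sequence[Sequence[float]],
    method: str = "forgy",
    minrel: int = 0,
    max_passes: int = 100,  # safety cap; an assumption, not in the listing
) -> Tuple[List[int], List[List[float]]]:
    centroids = [[float(v) for v in s] for s in seeds]
    member = [-1] * len(data)  # -1 means "not yet allocated"
    nv = len(centroids[0])
    for _ in range(max_passes):
        totals = [[0.0] * nv for _ in centroids]
        counts = [0] * len(centroids)
        moves = 0
        # One full pass: centroids stay fixed while every unit is allocated.
        for k, p in enumerate(data):
            j = nearest(p, centroids)
            if j != member[k]:
                moves += 1
                member[k] = j
            counts[j] += 1
            for i in range(nv):
                totals[j][i] += p[i]
        # True cluster means; an empty cluster keeps its old centroid.
        means = [
            [t / n for t in tot] if n else list(old)
            for tot, n, old in zip(totals, counts, centroids)
        ]
        converged = moves <= minrel
        if method == "jancey" and not converged:
            # Jancey: reflect the old centroid through the new mean.
            centroids = [
                [2.0 * m - c for m, c in zip(mean, old)]
                for mean, old in zip(means, centroids)
            ]
        else:
            # Forgy update, also used for the final pass of either method.
            centroids = means
        if converged:
            break
    return member, centroids


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(kmeans_batch(points, seeds=[(0, 0), (10, 0)], method="jancey"))
```

As in the listing, both methods finish on the true cluster means. The Jancey rule differs only in the intermediate passes, where each centroid is moved past the new mean by the same distance it was short of it.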
APPENDIX F

Coordinates of Data Points Used in This Study

INPUT DATA FOR EVENLY DISTRIBUTED CONTRIVED DATA - DATA1

  i     x     y         i     x     y
  1    10    20        47    83    39
  2    10    45        48    85    52
  3    10    60        49    82    60
  4    10    71        50    90    20
  5    15    52        51    90    65
  6    15    35        52    96    28
  7    20    15        53    94    46
  8    20    23        54   100    36
  9    22    59        55   100    55
 10    22    17        56   100    70
 11    25    11        57   105    15
 12    25    70        58   110    18
 13    27    33        59   108    41
 14    30    20        60   110    60
 15    30    46        61   110    72
 16    32    12        62   112    68
 17    32    23        63   115    52
 18    35    15        64   115    24
 19    38    35        65   115    20
 20    40    10        66   120    15
 21    39    20        67   120    27
 22    40    52        68   122    47
 23    40    65        69   125    17
 24    40    76        70   125    22
 25    45    15        71   123    74
 26    45    43        72   125    65
 27    50    30        73   130    55
 28    50    56        74   127    43
 29    52    34        75   130    29
 30    55    73        76   130    15
 31    55    23        77   135     7
 32    55    12        78   140    20
 33    61    44        79   143    33
 34    60    50        80   138    46
 35    62    64
 36    65    15
 37    67    22
 38    68    57
 39    70    35
 40    75    10
 41    75    25
 42    75    47
 43    75    65
 44    77    12
 45    80    15
 46    85    30

INPUT DATA FOR UNEVENLY DISTRIBUTED CONTRIVED DATA

  i     x     y         i     x     y
  1     5    65        47    62    20
  2     7    67        48    58    15
  3    10    55        49    55    20
  4    10    65        50    50    10
  5    10    70        51    45    10
  6    10    75        52    40    15
  7    15    60        53    45    20
  8    15    65        54    75    25
  9    15    77        55    85    30
 10    15    80        56    90    25
 11    20    55        57    90    35
 12    20    65        58   100    35
 13    20    70        59   100    40
 14    22    58        60   105    30
 15    25    45        61   105    45
 16    25    55        62   105    50
 17    25    65        63   110    28
 18    25    75        64   110    36
 19    30    50        65   110    43
 20    35    45        66   115    32
 21    40    50        67   115    40
 22    45    45        68   115    45
 23    45    50        69   115    43
 24    45    55        70   115    55
 25    47    52        71   119    36
 26    50    40        72   120    43
 27    50    45        73   122    51
 28    50    55        74   125    47
 29    50    60        75   125    40
 30    53    48        76   130    33
 31    55    60        77   133    25
 32    55    55        78   128    27
 33    55    45        79   125    30
 34    55    40        80   125    25
 35    57    36
 36    60    33
 37    60    50
 38    62    45
 39    63    41
 40    63    36
 41    65    38
 42    67    41
 43    55    28
 44    59    28
 45    59    23
 46    64    22

INPUT DATA FOR NORTH BURNABY AREA (STREET LETTER BOX LOCATIONS)

ID          x         y     location
NB001    401.23    223.53   SFU MALL
NB002    393.23    229.73   SFU FRONT OF SHELL STATION
NB003    397.93    223.73   SFU FRONT OF STUDENT RES.
NB004    365.23    220.63   DUTHIE AND CURTIS
NB005    361.03    214.33   CLIFF AND WINCH
NB006    365.13    212.03   DUTHIE AND HALIFAX
NB007    365.63    211.63   1800 DUTHIE
NB008    369.93    209.33   PHILLIPS AND CORONADO
NB009    365.33    204.63   DUTHIE AND BROADWAY
NB010    372.13    204.83   CAMROSE AND BROADWAY
NB011    376.43    193.33   LAKE CITY AND ENTERPRISE
NB012    384.93    199.43   UNDERHILL AND ENTER
NB013    391.63    198.63   PRODUCTION WAY AND THUNDERBIRD CRES.
NB014    385.83    193.23   LAKEDALE AND GOVERNMENT
NB015    378.93    193.23   PIPER AND GOVERNMENT
NB016    373.93    193.23   LOZELLES AND GOVERNMENT
NB017    368.93    193.23   PHILLIPS AND GOVERNMENT
NB018    373.93    190.03   LOZELLES AND WINSTON
NB019    368.13    191.63   7342 WINSTON
NB020    363.43    202.03   BAINBRIDGE AND LOUGHEED
NB021    361.13    205.93   CLIFF AND BROADWAY
NB022    356.83    204.83   SPERLING AND BROADWAY
NB023    356.83    208.43   SPERLING AND ADAIR
NB024    356.33    216.53   SPERLING AND KITCHENER
NB025    356.83    220.83   SPERLING AND CURTIS
NB026    356.83    226.53   SPERLING AND HASTINGS (SUB LOCHDALE)
NB027    362.93    226.73   BARNET AND HASTINGS
NB028    364.33    223.83   BARNET AND PANDORA
NB029    364.73    232.33   INLET AND SIERRA
NB030    352.73    226.43   KENSINGTON AND HASTINGS
NB031    348.63    216.53   FELL AND KITCHENER
NB032    345.53    215.33   HOLDOM AND GRANT
NB033    343.53    211.43   FELL AND BUCHANAN
NB034    352.73    212.33   KENSINGTON AND HALIFAX
NB035    350.73    207.33   6265 EAST BROADWAY
NB036    346.73    208.93   5901 EAST BROADWAY (SUB 113)
NB037    344.43    209.93   HOLDOM AND BROADWAY (SUB [number illegible])
NB038    332.23    210.43   BETA AND LOUGHEED
NB039    329.93    203.53   ALPHA AND DAWSON
NB040    327.03    210.83   4477 LOUGHEED
NB041    325.63    212.53   ROSSER AND HALIFAX
NB042    323.53    208.53   MADISON AND DAWSON
NB043    320.13    214.53   GILMORE AND GRAVELEY
NB044    313.33    213.83   3765 EAST 1ST
NB045    314.23    220.33   DOUGLAS AND ESMOND
NB046    318.03    226.03   MCDONALD AND PENDER (NB STAT.)
NB047    323.73    222.13   MADISON AND UNION
NB048    319.43    222.13   GILMORE AND VENABLES
NB049    319.43    220.03   GILMORE AND NAPIER
NB050    323.73    218.43   1265 MADISON (SUB 67)
NB051    323.73    214.93   MADISON AND GRAVELEY
NB052    327.73    214.93   WILLINGDON AND BRENTLAWN
NB053    330.73    216.03   FAIRLAWN AND MIDLAWN
NB054    336.13    219.03   DELTA AND FAIRLAWN
NB055    336.13    221.03   DELTA AND PARKER
NB056    342.53    223.13   HOWARD AND UNION
NB057    339.33    217.23   1381 SPRINGER
NB058    336.13    212.83   DELTA AND BRENTLAWN
NB059    327.73    210.83   WILLINGDON AND LOUGHEED
NB060    330.23    211.53   EATONS BRENTWOOD (SUB 121)
NB061    330.43    213.43   REAR BRENTWOOD SHOP CTR.
NB062    327.73    221.03   WILLINGDON AND PARKER
NB063    327.73    227.03   WILLINGDON AND HASTINGS (SUB 45)
NB064    332.03    227.03   BETA AND HASTINGS
NB065    336.13    226.13   DELTA AND HASTINGS
NB066    334.13    233.23   EMPIRE DR (GAMMA) AND CAMBRIDGE
NB067    336.13    233.23   DELTA AND CAMBRIDGE
NB068    337.43    231.03   HYTHE AND DUNDAS
NB069    341.43    231.03   GROSVENOR AND DUNDAS
NB070    344.53    231.03   HOLDOM AND DUNDAS
NB071    346.73    231.03   WARWICK AND DUNDAS
NB072    347.73    226.43   STRATFORD AND HASTINGS
NB073    343.53    226.43   ELLESMERE AND HASTINGS (SUB 132)
NB074    339.33    226.43   SPRINGER AND HASTINGS
NB075    332.03    223.13   BETA AND UNION
NB076    323.73    227.03   MADISON AND HASTINGS
NB077    323.73    230.73   MADISON AND TRIUMPH
NB078    319.43    227.03   GILMORE AND HASTINGS
NB079    319.43    230.73   GILMORE AND TRIUMPH
NB080    319.43    233.23   GILMORE AND CAMBRIDGE
NB081    319.43    236.33   GILMORE AND TRINITY
NB082    316.03    238.23   INGLETON AND EDINBURGH
NB083    314.33    235.83   ESMOND AND MCGILL
NB084    314.33    232.73   ESMOND AND OXFORD
NB085    312.53    230.73   BOUNDARY AND TRIUMPH
NB086    316.03    227.03   INGLETON AND HASTINGS
NB087    318.03    227.03   MACDONALD AND HASTINGS (SUB 106)

INPUT DATA FOR SOUTH BURNABY AREA (STREET LETTER BOX LOCATIONS)

ID          x         y     location
SB001    375.16    162.27   CANADA WAY AND WEDGEWOOD
SB002    372.36    165.67   CANADA WAY AND GOODLAD
SB003    367.66    171.27   CANADA WAY AND STANLEY
SB004    362.16    172.77   BUCKINGHAM AND BURRIS
SB005    357.66    178.97   CANADA WAY AND SPERLING
SB006    352.16    183.77   CANADA WAY AND LEDGER
SB007    349.86    185.57   4916 CANADA WAY
SB008    348.06    190.67   GODWIN AND SPROTT
SB009    344.86    187.17   MAHON AND SPRUCE
SB010    344.76    181.37   MAHON AND GILPIN
SB011    339.46    181.37   ROYAL OAK AND GILPIN
SB012    333.66    181.17   GARDEN GROVE AND MOSCROP
SB013    342.16    189.07   5325 KINCAID
SB014    345.16    191.57   DOUGLAS AND WOODSWORTH
SB015    338.03    206.93   2210 DOUGLAS ROAD
SB016    344.06    200.17   DOUGLAS AND REGENT
SB017    339.66    198.57   ROYAL OAK AND MANOR
SB018    339.66    194.57   4694 CANADA WAY
SB019    333.86    196.37   GARDNER COURT AND CANADA WAY
SB020    328.66    191.47   3700 WILLINGDON (FRONT BCIT)
SB021    330.26    190.17   3700 WILLINGDON (SAC BCIT)
SB022    323.06    196.17   WILLINGDON AND CANADA WAY
SB023    324.36    196.17   SUMNER AND CANADA WAY
SB024    324.36    197.97   SUMNER AND DOMINION
SB025    326.36    203.37   GILMORE AND STILL CREEK
SB026    316.36    202.17   SMITH AND MYRTLE
SB027    315.16    196.17   3737 CANADA WAY
SB028    321.21    192.07   KALYK AND NITHSDALE
SB029    316.86    190.67   3815 SUNSET (SUB 93)
SB030    317.96    190.67   BURNABY GEN HOSPITAL
SB031    316.16    187.97   SMITH AND SPRUCE
SB032    316.06    182.37   SMITH AND MOSCROP
SB033    320.86    182.87   PATTERSON AND MOSCROP
SB034    320.86    178.17   PATTERSON AND HAZELWOOD
SB035    322.56    174.57   BARKER AND BOND
SB036    327.06    176.17   GILPIN CRES AND BURKE
SB037    324.66    179.62   BARKER AND GILPIN
SB038    324.86    182.87   DARWIN AND MOSCROP
SB039    323.76    167.77   MCKAY AND KINGSWAY
SB040    326.26    164.07   MCKAY AND BERESFORD
SB041    321.36    167.57   PATTERSON AND BERESFORD
SB042    323.46    161.47   CASSIE AND MAYWOOD
SB043    328.36    162.77   BERESFORD AND TELFORD
SB044    326.46    159.17   SUSSEX AND IMPERIAL
SB045    324.26    155.07   MCKAY AND VICTORY
SB046    321.36    151.17   PATTERSON AND RUMBLE
SB047    321.36    148.07
SB048    326.46    143.17
SB049    330.36    146.77
SB050    326.46    151.07
SB051    329.76    153.12
SB052    335.36    151.07
SB053    335.36    146.77
SB054    335.36    155.07
SB055    336.36    159.
17 S3036 339. 61 15 6.07 S3037 3 43 . 66 155.17 SB053 347. 86 153.7 7 S3059 347. 36 15 0.97 S3060 347 . 86 147.97 SB061 347. 86 144.37 S8062 347. 56 140.47 SB063 352 . 16 147.87 SB 06 4 3 52 . 16 151.92 S3065 343 . 66 150.97 S3066 343 . 66 146.87 S3067 3 39 . 86 150.97 S3068 339 . 36 143.97 S3069 339. 86 145.97 S3070 . 339 . 36 141.97 S8071 326 . 96 144.92 S3072 321. 36 144.97 SB0 73 317. 06 143.77" S3074 315 . 46 ' 146.77 S8075 314.86 151.12 SB076 314. 36 154.07 SB077 314.86 15 8.07 SB078 332. 76 176. 17 S3079 332 . 76 171.92 SB080 335 . 26 173.07 SB081 3 39. 3 6 168.02 SB082 344. 21 163.77 SB083 '348. 81 165.57 S3034 352. 06 . 163.57 SB085 355 . 26 164.97 SB036 357. 56 163.37 SB037 357 . 56 161 .47 SB OS 8 3 60. 26 165.07 SB 039 363 . 26 161.37 S3090 372 . 16 159.7 7 SB091 369. 76 160.77 SB092 366. 56 159.07 SB093 367 . 76 155.07 SB094 364. 46 155.27 S8095 3 63 . 26 155.17 SB096 352. 96 159.17 PATTERSON AND PORTLAND SUSSEX AND PORTLAND STRATHEARN AND MCKEE SUSSEX AND RUMBLE FREDERICK AND WITLING NELSON AND RUMBLE NELSON AND MCKEE NELSON AND VICTORY DUNBLANE AND IMPERIAL ROYAL OAK AND BERESFORD MCPHERSON AND BERESFORD BULLER AND BERESFORD BULLER AND RUMBLE BULL EP. AND PORTLAND BULLER AND CARSON GILLEY AND MARINE GILLEY AND PORTLAND 7542 GILLEY MCPHERSON AND RUMBLE MCPHERSON AND MCKEE ROYAL OAK AND RUMBLE (SUB 9 ROYAL OAK AND CLINTON ROYAL OAK AND EWART ROYAL OAK AND MARINE SUSSEX AND MARINE PATTERSON AND MARINE GREENALL AND MARINE JQFFRE AND CARSON JOFFRE AND RUMBLE (SUB 109} J OF F Rt AND ARBOR JOFFRE AND DUBOIS SUSSEX AND BURKS SUSSEX AND SARDIS NELSON AND BUXTON ROYAL OAK AND DOVER ELGIN AND IRVING WALTHAM AND BERWICK GILLEY AND BURNS BRANTFORD AND STANLEY SPERLING AND WALKER SPF; RL I NG AND BUR FORD WALKER AND STANLEY (SUB 97) WALKER AND IMPERIAL MARY AND VISTA HUMPHRIES AND ELWELL LINDEN AND ELWELL ESMONDS AND KINGSWAY 7155 KINGSWAY (SUB 120) SALISBURY AND KING SWAY COL BOURNE AND IMPERIAL SB097 357.56 . 155.17 SB098 3 52 . 06 156.87 SB099 346. 
36 159.17 SBIOO 342. 76 160.32 SB101 339 . 36 162.57 S3102 3 33. 26 159.17 S8103 330. 76 164.42 S8104 332 . 26 164.37 SB105 335 . 26 164.52 S8106 3 33 . 76 165.37 S3107 331 . 36 166.47 SB 103 , 3 31 . 86 167.47 SB109 329 . 51 169.37 SB110 326. 96 168.87 SB111 323 . 50 169.67 SB112 318 . 11 171.47 SB113 316.06 173.57 $SIG SPERLING AND KINGSWAY GILLEY AND KINGSWAY KINGSWAY AND IMPERIAL DENBIGH AND KINGSWAY ROYAL OAK AND KINGSWAY (SUB JUBILEE AND IMPERIAL SIMPSON SEARS (SUB 65) SIMPSON SEARS NELSON AND KINGSWAY MCMURRAY AND KINGSWAY SUSSEX AND KINGSWAY 6025 SUSSEX PIONEER AND GRA JGE 4429 KINGSWAY (SUB 122) 4211 KINGSWAY JERSEY AND KINGSWAY (SUB 88) SMITH AND THURSTON 233 APPENDIX G Membership L i s t s of Groupings Defined by Various Clustering Methods 234 M E M B E R S H I P L I S T OF GROUPS D E F I N E D BY V A R I O U S METHODS E V E N L Y D I S T R I B U T E D C O N T R I V E D O A T A S E T - D A T A l S I N G L E L I N K A G E : E U C L I D E A N D I S T A N C E + SUM GF D I F F E R H c T H O D S GROUP 1 - 6 7 MEMBERS 1 2 3 4 5 6 7 8 9 10 11 12 1 3 1 4 15 16 1 7 18 1 9 2 0 21 2 2 2 3 2 4 25 Z 7 2 8 2 9 3 0 3 1 3 2 33 3 4 3 5 36 3 7 38 3 9 4 0 Hi 4 2 4 3 4 4 4 5 4 6 4 7 4 3 4 9 5 0 51 5 2 5 3 5 4 55 5o 5 9 6 0 61 ' 6 2 6 3 68 71 72 73 74 3 0 GROUP 2 - 1 3 MEMBERS 5 7 5 8 6 4 6 5 66 6 7 6 9 70 75 76 7 7 7 3 7 9 S I N G L E L I N K A G E : CH I - S Q U A R E S METHOD GROUP 1 - 2 2 MEMBERS 1 2 3 4 5 6 8 9 1 2 13 1 5 1 9 2 2 2 3 2 4 2 6 2 8 3 0 3 4 3 5 3 8 4 3 GROUP 2 - 5 8 MEMBERS 7 1 0 11 1 4 1 6 1 7 18 20 2 1 2 5 2 7 2 9 3 1 3 2 3 3 3 6 3 7 3 9 4 0 41 4 2 4 4 4 5 4 6 4 7 Ho 4 9 5 0 51 5 2 5 3 5 4 55 5 6 5 7 " 5 8 .59 6 0 61 62 6 i t>4 6 5 6 6 6 7 6 8 6 9 7 0 " 71 72 73 7 4 7 5 76 7 7 73 79 8 0 C O M P L c T E L I N K A G E METHOD GROUP 1 - 3 5 MEM3ERS 1 - 2 3 4 5 6 7 8 9 1 0 i i J.2 1 3 1 4 1 5 16 1 7 18 1 9 2 0 21 2 2 23 2 4 25 2 o 2 7 2 8 29 3 0 3 3 3 4 3 5 3 8 4 2 GROUP 2 - 4 5 MEMBERS 3 1 3 2 3 6 3 7 3 9 4 0 4 1 4 3 4 4 4 5 4 o Hi 4 8 4 9 5 0 51 52 53 54 55 
56 57 58 59 60 61 62 63 64 65
66 67 68 69 70 71 72 73 74 75 76 77 78 79 80

AVG. LINKAGE BETWEEN MERGED GROUP
GROUP 1 - 51 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
46 47 48 49 50 51
GROUP 2 - 29 MEMBERS
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
67 68 69 70 71 72 73 74 75 76 77 78 79 80

AVG. LINKAGE WITHIN NEW GROUP
GROUP 1 - 53 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 53 55
GROUP 2 - 27 MEMBERS
52 54 56 57 58 59 60 61 62 63 64 65 66 67 68
69 70 71 72 73 74 75 76 77 78 79 80

CENTROID AND MEDIAN METHODS
GROUP 1 - 49 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
48 49 50 51
GROUP 2 - 31 MEMBERS
46 47 52 53 54 55 56 57 58 59 60 61 62 63 64
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
80

WARD'S METHOD
GROUP 1 - 43 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 36 37 40 41 44 45 46 50 52
GROUP 2 - 37 MEMBERS
35 38 39 42 43 47 48 49 51 52 53 54 55 56 57
58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80

JANCEY'S, FORGY'S AND CONVERGENT K-MEAN METHODS
GROUP 1 - 36 MEMBERS
2 3 4 5 9 12 15 22 23 24 26 28 30 33 34
35 38 42 43 48 49 51 53 55 56 59 60 61 62 63
68 71 72 73 74 80
GROUP 2 - 44 MEMBERS
1 6 7 8 10 11 13 14 16 17 18 19 20 21 25
27 29 31 32 36 37 39 40 41 44 45 46 47 50 52
54 57 58 64 65 66 67 69 70 75 76 77 78 79

MEMBERSHIP LIST OF GROUPS DEFINED BY VARIOUS METHODS
UNEVENLY DISTRIBUTED CONTRIVED DATA SET - DATA2

SINGLE LINKAGE: EUCLIDEAN DISTANCE METHOD
MEMBERSHIP LIST NOT ANALYSED

SINGLE LINKAGE: SUM OF DIFFERENCES AND CHI-SQUARES METHODS
GROUP 1 - 17 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 16
17 18
GROUP 2 - 25 MEMBERS
15 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42
GROUP 3 - 38 MEMBERS
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80

COMPLETE LINKAGE METHOD
GROUP 1 - 19 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19
GROUP 2 - 34 MEMBERS
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
50 51 52 53
GROUP 3 - 27 MEMBERS
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
69 70 71 72 73 74 75 76 77 78 79 80

AVG. LINKAGE METHODS
GROUP 1 - 20 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20
GROUP 2 - 33 MEMBERS
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
51 52 53
GROUP 3 - 27 MEMBERS
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
69 70 71 72 73 74 75 76 77 78 79 80

CENTROID AND MEDIAN METHODS
GROUP 1 - 20 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20
GROUP 2 - 37 MEMBERS
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57
GROUP 3 - 23 MEMBERS
58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80

WARD'S METHOD
GROUP 1 - 42 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40 41 42
GROUP 2 - 14 MEMBERS
43 44 45 46 47 48 49 50 51 52 53 54 55 56
GROUP 3 - 24 MEMBERS
57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80

JANCEY'S METHOD
GROUP 1 - 23 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 16
17 18 24 28 29 31 32 70
GROUP 2 - 23 MEMBERS
36 43 44 45 46 47 48 49 50 51 52 53 54 55 56
60 63 66 76 77 78 79 80
GROUP 3 - 34 MEMBERS
15 19 20 21 22 23 25 26 27 30 33 34 35 37 38
39 40 41 42 57 58 59 61 62 64 65 67 68 69 71
72 73 74 75

FORGY'S AND CONVERGENT K-MEAN METHODS
GROUP 1 - 16 MEMBERS
1 2 4 5 6 7 8 9 10 12 13 14 17 18 29
31
GROUP 2 - 29 MEMBERS
35 36 40 43 44 45 46 47 48 49 50 51 52 53 54
55 56 57 58 60 63 64 66 71 76 77 78 79 80
GROUP 3 - 35 MEMBERS
3 11 15 16 19 20 21 22 23 24 25 26 27 28 30
32 33 34 37 38 39 41 42 59 61 62 65 67 68 69
70 72 73 74 75

MEMBERSHIP LIST OF GROUPS DEFINED BY VARIOUS METHODS
EMPIRICAL DATA - NORTH BURNABY AREA - NBDATA

SINGLE LINKAGE: EUCLIDEAN DISTANCE AND SUM OF DIFFERENCES METHODS + AVG.
LINKAGE BETWEEN MERGED GROUP METHOD
GROUP 1 - 3 MEMBERS
1 2 3
GROUP 2 - 84 MEMBERS
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
79 80 81 82 83 84 85 86 87

SINGLE LINKAGE: CHI-SQUARES METHOD
GROUP 1 - 29 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 33 34 35 36 37
GROUP 2 - 58 MEMBERS
25 26 27 28 29 30 31 32 38 39 40 41 42 43 44
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
75 76 77 78 79 80 81 82 83 84 85 86 87

COMPLETE LINKAGE METHOD
GROUP 1 - 37 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37
GROUP 2 - 50 MEMBERS
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
83 84 85 86 87

AVG. LINKAGE WITHIN NEW GROUP METHOD
GROUP 1 - 39 MEMBERS
1 2 3 38 39 40 41 42 43 44 45 46 47 48 49
50 51 52 53 58 59 60 61 62 63 64 75 76 77 78
79 80 81 82 83 84 85 86 87
GROUP 2 - 48 MEMBERS
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 54 55 56 57 65 66 67 68 69 70 71
72 73 74

CENTROID AND MEDIAN METHODS
GROUP 1 - 12 MEMBERS
1 2 3 11 12 13 14 15 16 17 18 19
GROUP 2 - 75 MEMBERS
4 5 6 7 8 9 10 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80 81 82 83 84 85 86 87

WARD'S METHOD
GROUP 1 - 35 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 31 32 33 34 35
36 37 54 55 57
GROUP 2 - 52 MEMBERS
26 27 28 29 30 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 56 58 59 60 61 62 63 64 65
66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87

JANCEY'S METHOD
GROUP 1 - 35 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
31 33 34 35 36
GROUP 2 - 52 MEMBERS
32 37 38 39 40 41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87

FORGY'S AND CONVERGENT K-MEAN METHODS
GROUP 1 - 31 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 34
35
GROUP 2 - 56 MEMBERS
30 31 32 33 36 37 38 39 40 41 42 43 44 45 46
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
77 78 79 80 81 82 83 84 85 86 87

MEMBERSHIP LIST OF GROUPS DEFINED BY VARIOUS METHODS
EMPIRICAL DATA - SOUTH BURNABY AREA - SBDATA

SINGLE LINKAGE: SUM OF DIFFERENCES METHOD + AVG. LINKAGE BETWEEN MERGED GROUP METHOD
GROUP 1 - 21 MEMBERS
1 2 3 4 5 83 84 85 86 87 88 89 90 91 92
93 94 95 96 97 98
GROUP 2 - 26 MEMBERS
6 7 8 9 10 11 13 14 15 16 17 18 19 20 21
22 23 24 25 26 27 28 29 30 31 32
GROUP 3 - 66 MEMBERS
12 33 34 35 36 37 38 39 40 41 42 43 44 45 46
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
77 78 79 80 81 82 99 100 101 102 103 104 105 106 107
108 109 110 111 112 113

SINGLE LINKAGE: CHI-SQUARES METHOD
GROUP 1 - 34 MEMBERS
1 2 48 49 52 53 57 58 59 60 61 62 63 64 65
66 67 68 69 70 71 72 73 87 89 90 91 92 93 94
95 96 97 98
GROUP 2 - 36 MEMBERS
3 4 5 40 42 43 44 45 46 47 50 51 54 55 56
74 75 76 77 81 82 83 84 85 86 88 99 100 101 102
103 104 105 106 107 108
GROUP 3 - 43 MEMBERS
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 41 78 79 80 109 110 111 112 113

COMPLETE LINKAGE METHOD
GROUP 1 - 20 MEMBERS
1 2 3 4 83 84 85 86 87 88 89 90 91 92 93
94 95 96 97 98
GROUP 2 - 45 MEMBERS
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
75 76 77 81 82 99 100 101 102 103 104 105 106 107 108
GROUP 3 - 48 MEMBERS
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
35 36 37 38 39 40 41 42 43 44 78 79 80 109 110
111 112 113

AVG. LINKAGE WITHIN NEW GROUP METHOD
GROUP 1 - 22 MEMBERS
1 2 3 4 5 83 84 85 86 87 88 89 90 91 92
93 94 95 96 97 98 99
GROUP 2 - 36 MEMBERS
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
76 77 82 100 101 102
GROUP 3 - 55 MEMBERS
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 78 79 80 81 103
104 105 106 107 108 109 110 111 112 113

CENTROID AND MEDIAN METHODS
GROUP 1 - 2 MEMBERS
62 70
GROUP 2 - 18 MEMBERS
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
30 31 32
GROUP 3 - 93 MEMBERS
1 2 3 4 5 6 7 8 9 10 11 12 13 14 33
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60 61 63 64
65 66 67 68 69 71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 98 99 100 101 102 103 104 105 106 107 108 109 110
111 112 113

WARD'S METHOD
GROUP 1 - 45 MEMBERS
1 2 3 4 52 53 54 55 56 57 58 59 60 61 62
63 64 65 66 67 68 69 70 81 82 83 84 85 86 87
88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
GROUP 2 - 28 MEMBERS
5 6 7 8 9 10 11 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33
GROUP 3 - 40 MEMBERS
12 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 71 72 73 74 75 76 77 78 79 80 103
104 105 106 107 108 109 110 111 112 113

JANCEY'S METHOD
GROUP 1 - 33 MEMBERS
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 36
37 38 78
GROUP 2 - 46 MEMBERS
35 39 40 41 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 67 68 69 70 71 72 73 74 75 76 77
79 80 81 101 102 103 104 105 106 107 108 109 110 111 112
113
GROUP 3 - 34 MEMBERS
1 2 3 4 5 57 58 59 60 61 62 63 64 65 66
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
97 98 99 100

FORGY'S AND CONVERGENT K-MEAN METHODS
GROUP 1 - 34 MEMBERS
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 78
GROUP 2 - 49 MEMBERS
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 65 66 67 68 69 70 71 72 73 74 75
76 77 79 80 81 100 101 102 103 104 105 106 107 108 109
110 111 112 113
GROUP 3 - 30 MEMBERS
1 2 3 4 5 58 59 60 61 62 63 64 82 83 84
85 86 87 88 89 90 91 92 93 94 95 96 97 98 99

APPENDIX H

Listing and Sample Outputs from ROUTE: a Computer Program Designed for Evaluating Clusters

      DIMENSION X(100),Y(100),D(100,100),S(100,100),DDO(100),
     1 IT(100),NI(5000),NJ(5000),L(100),SN(100,10),
     2 LR(50,100),LLR(10,100),LK(100),PROTIT(15)
     3 ,FSA(2),TIME(100),LAB(100),DISTZ(101,101),
     4 AREA(5),NLR(100),SOT(3,5000),SNAME(100,10),
     5 LABI(100),TITLE(20),WAY(20)
C
C     THE FOLLOWING ARE IMPORTANT VARIABLES:-
C
C     PROTIT = PROBLEM TITLE
C     AREA(I) = GEOGRAPHIC LOCATION
C     ICAP = CAPACITY OF TRUCK (NO. OF STOPS)
C     ITIM = TIME CONSTRAINT OF TRUCK ROUTE
C     M = TOTAL NUMBER OF TRUCKS
C     INUM = TOTAL NUMBER OF BOXES IN THIS AREA
C     ISTIME = STOPPING TIME AT EACH BOX
C     AVGSPD = AVERAGE TRUCK SPEED IN M.P.H.
C     SCALE = MAP SCALE (1 MM= XX MILES)
C     NPLOT = PLOTTING CONTROL NUMBER
C             NPLOT=1    EXISTING ROUTE PLOTTED
C             NPLOT.GT.1 ONLY NEW ROUTE PLOTTED
C     XORD = X-COORDINATE OF STATION IN MM
C     YORD = Y-COORDINATE OF STATION IN MM
C     N = NO OF BOXES FOR THIS ROUTE
C     FSA = FORWARD SORTATION AREA
C     TYPE = SLB/BR
C     XORG = X-COORDINATE FOR PLOT ORIGIN (INCHES)
C     YORG = Y-COORDINATE FOR PLOT ORIGIN (INCHES)
C
      READ (5,11) PROTIT
   11 FORMAT (15A4)
      WRITE (6,190) PROTIT
  190 FORMAT ('1',20(/),25X,'MULTIPLE ROUTE SCHEDULING BY SAVINGS ',
     1'ALGORITHM ',/26X,15A4)
      READ(5,15) (AREA(I),I=1,5),ICAP,ITIM,M,INUM,ISTIME,NPLOT,AVGSPD,
     A SCALE
   15 FORMAT(5A4,6I4,2F10.6)
      WRITE (6,60) (AREA(I),I=1,5),ICAP,ITIM,AVGSPD,M,INUM,SCALE
   60 FORMAT('1',10(/),15X,'THE FOLLOWING ROUTING PRINT-OUTS FOR ',
     1 5A4,' AREA ARE BASED ON :-'//,17X,
     2 'CAPACITY CONSTRAINT OF EACH TRUCK ROUTE = ',I6,2X,'BOXES'//
     3,17X,'TIME CONSTRAINT OF EACH ROUTE',11X,'= ',I6,2X,'MINUTES'//,
     4 17X,'AVERAGE SPEED OF TRUCKS EN ROUTE',8X,'= ',F6.2,2X,'M.P.H.'/
     5/,17X,'TOTAL NUMBER OF ROUTES IN THE AREA',6X,'= ',I6,2X,'ROUTES'/
     6/,17X,'TOTAL NUMBER OF BOXES IN THE AREA',7X,'= ',I6,2X,'BOXES'/
     7/,17X,'MAP SCALE',31X,'= ',2X,'1 MM=',F10.6,' MILES',/)
      WRITE(6,61) ISTIME
   61 FORMAT(17X,'STOPPING TIME FOR RESIDENTIAL BOXES',5X,'=',5X,I2,2X,
     1 'SECONDS',//,17X,'STOPPING TIME FOR BUSINESS BOXES',6X,'=',5X,
     2 'AS SPECIFIED',//)
      JC = 0
      IR = 0
      TDIST=0.
      READ(5,18) FSA,TYPE,XORD,YORD,XORG,YORG,IPLOT,ILIST
   18 FORMAT(5X,2A4,4F8.2,2I4)
    7 DO 987 IL=1,100
  987 IT(IL)=2
C     X(.) - X-COORDINATE OF CALL POINT (.)
C     Y(.) - Y-COORDINATE OF CALL POINT (.)
C     L(.) - LABEL OF CALL POINT (.)
C     TIME(.) - STOPPING TIME REQUIRED AT POINT (.)
C     D(.,.) - DISTANCE BETWEEN CALL POINTS (.,.)
C     DDO(.) - DISTANCE OF CALL POINT (.) FROM (0,0)
C     S(.,.) - SAVINGS BY JOINING CALL POINTS (.,.)
C     LR(I,J) - LABEL OF JTH ELT. IN SUBROUTE I
C     NLR(.) - NO. OF ELT. IN SUBROUTE (.)
C     NR - NO. OF CURRENT SUBROUTES
C
C
C     READ LABEL AND LOCATION OF EACH BOX
C
      CALL SELECT(L,X,Y,TIME,SNAME,INUM,TITLE,WAY,N)
C
C     CALCULATE DISTANCE BETWEEN BOX AND ORIGIN
C
   92 DO 1 I=1,N
    1 DDO(I)=(ABS(X(I)-XORD)+ABS(Y(I)-YORD))
      NM1=N-1
C
C     CALCULATE DISTANCE SAVED MATRIX AND DISTANCE MATRIX
C     D=DISTANCE MATRIX S=DISTANCE SAVED
      DO 3 I=1,NM1
      K=I+1
      DO 3 J=K,N
      D(I,J)=(ABS(X(I)-X(J)))+(ABS(Y(I)-Y(J)))
      S(I,J)=DDO(I)+DDO(J)-D(I,J)
      S(J,I)=S(I,J)
      D(J,I)=D(I,J)
    3 CONTINUE
      NA=N+1
      DO 5 I=1,NA
      D(I,I)=0.
    5 S(I,I)=0.
      K=0
      CALL COUNT(D,N,TITLE,WAY,SCALE)
C
C     STORE DISTANCE SAVED MATRIX IN A VECTOR
C
C     SOT(1,.)=FROM-MAILBOX, SOT(2,.)=TO-MAILBOX, SOT(3,.)=DIST-SAVED
      DO 4 I=1,NM1
      IP1=I+1
      DO 4 J=IP1,N
      K=K+1
      SOT(1,K)=I
      SOT(2,K)=J
      SOT(3,K)=S(I,J)
    4 CONTINUE
C
C     SORT VECTOR BY DISTANCE SAVED VALUES
C
      NN=((N*N)-N)/2
      CALL ISORT(SOT,3,3000,1,NN,3,3,-1)
      DO 10 I=1,NN
      NI(I)=SOT(1,I)
      NJ(I)=SOT(2,I)
   10 CONTINUE
C
C     BEGIN ROUTE CHOICE HEURISTIC.
C
C     SUBROUTES ARE FORMED BY SEARCHING IN THE LIST SOT(3,.) FOR
C     MAX. SAVINGS AND ARE LINKED MULTIPLY.
C
      NR=0
   20 DO 12 I=1,NN
C
C     IF NEGATIVE, IT IS ALREADY IN ROUTE
C
      IF (SOT(3,I).LT.0.) GO TO 12
C
C     TEST TO SEE IF ONLY ONE IS END POINT OF SUBROUTE
C
      IF (IT(NJ(I)).NE.IT(NI(I))) GO TO 21
C
C
C     TEST TO SEE IF IN SUBROUTE
C
      IF (IT(NJ(I)).EQ.1) GO TO 22
C
C     BEGIN NEW SUBROUTE
C
      IT(NI(I))=1
      IT(NJ(I))=1
C     NR= # OF NEW ROUTES, LR(NR,1)=LABEL.1ST.BOX, LR(NR,2)=LABEL.2.BOX
C     NLR(NR)=2.NO.OF.LABELS=2
      NR=NR+1
      LR(NR,1)=NI(I)
      LR(NR,2)=NJ(I)
      NLR(NR)=2
      GO TO 12
C
C     ADD NI(I) TO SUBROUTE
C
   22 DO 71 J=1,NR
C     EG ROUT1=12-14-17 ROUT2=13-20-12 TRY TO MATCH AND MAKE 1 ROUTE
C     1 ROUTE=18-2-12-14-17
C     FIND SUBROUTE(S) AND END
C     CHECK TO SEE IF EITHER HAS BEEN JOINED TOGETHER BEFORE SO THAT
C     A CLOSED LOOP IS NOT COMPLETED TOO EARLY
      IF (LR(J,1).EQ.NI(I).OR.LR(J,NLR(J)).EQ.NI(I)) K1=J
      IF (LR(J,1).EQ.NJ(I).OR.LR(J,NLR(J)).EQ.NJ(I)) K2=J
   71 CONTINUE
      IF (K1.EQ.K2) GO TO 12
C
C     TEST FOR BOTH AT BEGINNING OF SUBROUTE
C     IF NEITHER LABEL OCCURRED BEFORE STORE IN NEW ROW OF MATRIX BUT
C     DO NOT CHECK FOR ANYTHING
      IF (LR(K1,1).EQ.NI(I).AND.LR(K2,1).EQ.NJ(I)) GO TO 73
C
C     TEST FOR ROUTES FITTING TOGETHER
C
      IF (LR(K1,NLR(K1)).EQ.NI(I).AND.LR(K2,NLR(K2)).EQ.NJ(I)) GO TO
     1 79
C     IF OLD BOX LABEL=NEWBOX LABEL GO TO 72
      IF (LR(K1,NLR(K1)).EQ.NI(I)) GO TO 72
      GO TO 99
C
C     REVERSE ORDER OF SUBROUTE K1 & PLACE INTO LLR
C
   73 NK1=NLR(K1)
      DO 75 J=1,NK1
      LLR(K1,NLR(K1)-J+1)=LR(K1,J)
   75 CONTINUE
C     STORE NEW LABEL IN NEW ROW OF MATRIX
      DO 88 J=1,NK1
      LR(K1,J)=LLR(K1,J)
   88 CONTINUE
C
C     PLACE BOTH SUBROUTES INTO ROUTE K1
C     NK1P2=SUM.TOTAL.NO.LABELS NK1=NO.OF.BOXES.IN.K1.ROUTE
   72 NK1P2=NLR(K1)+NLR(K2)
      NK1=NLR(K1)+1
C     JOIN UP 2 SUBROUTES TOGETHER
      DO 76 J=NK1,NK1P2
      LR(K1,J)=LR(K2,J-NLR(K1))
      LR(K2,J-NLR(K1))=0
   76 CONTINUE
      NLR(K1)=NK1P2
C
C     TURN LABELS AROUND
C
      IF (K2.GT.K1) GO TO 78
   89 K2P1=K2+1
C     INCORPORATE NEW ROUTE AND RENUMBER OLD ROUTE SO OLD ONE FITS
C     NLR=# OF BOXES IN ROUTE COUNTER LR=NEW LABELS
      DO 77 J=K2P1,NR
      NLRJ=NLR(J)
      NLR(J-1)=NLR(J)
      DO 77 K=1,NLRJ
      LR(J-1,K)=LR(J,K)
   77 CONTINUE
      NR=NR-1
      GO TO 100
   78 IF (K2.EQ.NR) GO TO 67
      NNR=NR-1
      DO 66 K=K2,NNR
      JAK=K+1
      NNUM=NLR(JAK)
      DO 66 J=1,NNUM
      LR(K,J)=LR(JAK,J)
      LR(JAK,J)=0
      NDUMP=NLR(K)
      NLR(K)=NLR(JAK)
   66 CONTINUE
      NLR(NR)=0
   67 NR=NR-1
      GO TO 100
C
C     FIT SECOND ROUTE INTO LLR
C
   79 NK2=NLR(K2)
      DO 81 J=1,NK2
      LLR(K2,NLR(K2)-J+1)=LR(K2,J)
   81 CONTINUE
      DO 80 J=1,NK2
      LR(K2,J)=LLR(K2,J)
   80 CONTINUE
      GO TO 72
   99 KK1=K1
      K1=K2
      K2=KK1
      GO TO 72
C
C     PLACE NEW LOCATIONS ON SUBROUTE
C
   21 IF (IT(NI(I)).EQ.2) GO TO 52
C     SWITCH LABELS FOR NEW BOXES THE OLD BOX HAS ALREADY JOINED W
C     ANOTHER NOW IT BECOMES A "TO" BOX RATHER THAN A "FROM" BOX
      NII=NI(I)
      NI(I)=NJ(I)
      NJ(I)=NII
   52 L1=0
      L2=0
      DO 53 J=1,NR
C     IF LABELS MATCH L1=OLD.LABEL CHECK 2ND LABEL WITH NEW LABEL
      IF (LR(J,1).EQ.NJ(I)) L1=J
      IF (LR(J,NLR(J)).EQ.NJ(I)) L2=J
   53 CONTINUE
      IF (L1.GT.0.) GO TO 55
      NLR(L2)=NLR(L2)+1
      LR(L2,NLR(L2))=NI(I)
      GO TO 100
   55 NL1=NLR(L1)
C     RELABEL NEW BOX TO OLD 1ST BOX. AND RELABEL 1ST OLD BOX TO 2ND
C     BOX NO.
      DO 56 J=1,NL1
      LR(L1,NL1-J+2)=LR(L1,NL1-J+1)
   56 CONTINUE
      LR(L1,1)=NI(I)
      NLR(L1)=NLR(L1)+1
C
C     CALCULATE STATISTICS AND UPDATE RECORDS
C
  100 IP1=I+1
C     NLR=# OF BOXES JOINED TOGETHER
      IT(NI(I))=IT(NI(I))-1
      IT(NJ(I))=IT(NJ(I))-1
      IF (IT(NI(I)).EQ.0) GO TO 120
C     GOTO ELIMINATE THE POSS OF JOINING AN OLD BOX
  135 IF (IT(NJ(I)).EQ.0) GO TO 128
      GO TO 12
C     ELIMINATE POSS OF JOINING OLD BOXES BY MAKING NEGATIVE
  120 DO 125 J=IP1,NN
      IF (NI(J).EQ.NI(I)) SOT(3,J)=-1.
      IF (NJ(J).EQ.NI(I)) SOT(3,J)=-1.
  125 CONTINUE
      GO TO 135
  128 DO 126 J=IP1,NN
      IF (NI(J).EQ.NJ(I)) SOT(3,J)=-1.
      IF (NJ(J).EQ.NJ(I)) SOT(3,J)=-1.
  126 CONTINUE
   12 CONTINUE
C
C     PRINT RESULTS.
C
      JC=JC+1
      IR=IR+1
      DIST=0.
      IFLAG=1
      DO 35 I=1,NM1
   35 DIST=DIST+D(LR(1,I),LR(1,I+1))
C     DIST=THE DISTANCE IS SUMMED FROM 1ST BOX TO LAST BOX
C     DISS = THE COMPLETE ROUTE DISTANCE. BOTH ARE NECESSARY CAUSE THE
C     ROUTE CAN BE STARTED AFTER SLB OR BR OR IN MIDDLE OF SLB
      DISS=DIST+DDO(LR(1,1))+DDO(LR(1,N))
C     ADIST = ACTUAL DISTANCE (STN TO STN, IN MILES)
C     BDIST = ACTUAL DISTANCE (1ST TO LAST BOX, IN MILES)
C     STIME = TOTAL TIME REQUIRED FOR STOPPING AT BOXES
C     TRTIME = TOTAL TRAVEL TIME REQUIRED (STN TO STN)
C     BTTIME = TOTAL TRAVEL TIME (1ST TO LAST BOX)
C     TOTIME = TOTAL TIME PER ROUTE INCLUDING STIME AND TRTIME
C              (STN TO STN)
C     BTIME = TOTAL TIME FOR TRAVELLING AND STOPPING (1ST TO LAST BOX)
C     DIST = DISTANCE OF THE ROUTE FROM FIRST BOX TO LAST BOX
C     DISS = DISTANCE OF THE ROUTE FROM ORIGIN (STATION) TO ORIGIN
C     CHANGE LR,LAK,ETC TO LAB OR MNEMONICS FOR LABEL
      DO 34 I=1,N
   34 LAB(I)=LR(1,I)
      LAB(N+1)=0
   44 STIME=0.0
      DO 32 I=1,N
      LK(I)=L(LAB(I))
   32 CONTINUE
      DO 36 IN=1,N
      DO 36 INO=1,10
   36 SN(IN,INO)=SNAME(LAB(IN),INO)
      DO 33 IK=1,N
   33 STIME=(STIME+TIME(LR(1,IK)))
      BDIST=DIST*SCALE
      BTTIME=(BDIST/AVGSPD)*60.
      ADIST=DISS*SCALE
      TRTIME=(ADIST/AVGSPD)*60.
      TOTIME=TRTIME+STIME
      BTIME=BTTIME+STIME
      WRITE(6,41) IR,(AREA(IJ1),IJ1=1,5),(FSA(IJ2),IJ2=1,2),TYPE
   41 FORMAT('1',//,110X,'PAGE',I3,//,15X,'MULTIPLE ROUTE SCHEDULING',
     1 /,15X,'USING TIME-SAVING METHOD',///,15X,5A4,' AREA',/,
     2 15X,'F.S.A.',2X,2A4,/,15X,A4,' ROUTES',////)
      IF (IFLAG.EQ.2) GO TO 45
      WRITE(6,40)
   40 FORMAT(15X,'** PRELIMINARY ROUTE **',////)
      GO TO 46
   45 WRITE(6,47)
   47 FORMAT(15X,'** IMPROVED ROUTE **',////)
   46 WRITE(6,42) JC,N,STIME,BDIST,BTTIME,BTIME,ADIST,TRTIME,TOTIME
   42 FORMAT(15X,'ROUTE NO.',I4,//,
     1 15X,'NO. OF BOXES EN ROUTE',16X,'=',I5,3X,'BOXES',//,
     2 15X,'TOTAL STOPPING TIME',18X,'=',3X,F6.2,' MINUTES',//,
     3 15X,'DISTANCE TRAVELLED(1ST TO LAST BOX) =',3X,F6.2,' MILES',//,
     4 15X,'TRAVEL TIME REQUIRED(1ST TO LAST BOX) =',3X,F6.2,' MINUTES',/
     5 /,15X,'TOTAL TIME REQUIRED(1ST TO LAST BOX) =',3X,F6.2,2X,'MINUTES
     6',//,15X,'TOTAL DISTANCE TRAVELLED(STN TO STN) =',3X,F6.2,
     7 ' MILES',//,15X,'TOTAL TRAVEL TIME(STN TO STN)',3X,'=',
     8 3X,F6.2,' MINUTES',//,15X,'TOTAL TIME REQUIRED FOR THIS',
     9 ' ROUTE =',3X,F6.2,2X,'MINUTES',/)
      NP1=N+1
      LR(1,NP1)=0
      LK(NP1)=0
      WRITE(6,43) (LK(J),J=1,NP1)
   43 FORMAT(///,15X,'ORDER OF CALL POINTS:',/,15X,'STATION -',
     1 12(I5,'-'),/,6(25X,12(I5,'-'),/),'STATION',/)
      IF (IFLAG.EQ.2) GO TO 805
      DISTZ(1,1)=0.
      DO 801 IK=1,N
      DISTZ(IK+1,1)=DDO(IK)
  801 DISTZ(1,IK+1)=DDO(IK)
      DO 802 IM=1,N
      DO 802 IJ=1,N
  802 DISTZ(IM+1,IJ+1)=D(IM,IJ)
      XTM=1000.
      CALL IMPROT(LAB,N,XTM,DISTZ,DIST,DISS)
      DO 809 IH=1,N
  809 LABI(IH)=L(LAB(IH))
      IFLAG=2
      IR=IR+1
      GO TO 44
  805 IF (BTIME.GT.ITIM) GO TO 299
  901 IF (ILIST.NE.1) GO TO 902
  803 NUM=N/20
      ANUM=N/20.
      IF (ANUM.GT.NUM) NUM=NUM+1
      ICNT=0
      NON=0
  804 IR=IR+1
      ICNT=ICNT+1
      WRITE(6,808) IR,JC,(AREA(IR1),IR1=1,5),ICNT,NUM
  808 FORMAT('1',//,110X,'PAGE',I3,/,27X,'ROUTE NO.',I5,//,
     1 24X,5A4,//,23X,'ORDER OF CALL POINTS',//,13X,60('-'),//,
     2 13X,'BOX NO.',10X,'LOCATION',20X,'SHEET',I2,' OF',I2,//,
     3 13X,60('-'),//)
      NAN=NON+1
      NON=ICNT*20
      IF (NON.GT.N) NON=N
      DO 806 IW=NAN,NON
  806 WRITE(6,807) LABI(IW),(SN(IW,IS),IS=1,10)
  807 FORMAT(15X,I5,5X,10A4,/)
      IF (ICNT.LT.NUM) GO TO 804
  902 IF (IPLOT.GT.1) GO TO 810
      CALL PLIK(DIST,NP1,LAB,X,Y,XORG,YORG,SCALE,JC,LABI,
     1 NPLOT)
  810 TDIST=TDIST+DIST
      ATDIST=TDIST*SCALE
      GO TO 997
  299 IR=IR+1
      WRITE(6,298) IR,JC
  298 FORMAT('1',110X,'PAGE',I3,15(/),15X,'** WARNING **',////,15X,
     1'TIME REQUIRED FOR ROUTE NO.',I3,
     2' GREATER THAN TIME ALLOWED',//,15X,
     3'TRY LESSER NUMBER OF BOXES IN THIS ROUTE',////)
      GO TO 901
  997 IF (JC.LT.M) GO TO 7
      WRITE(6,170) M,ATDIST
  170 FORMAT('1',10X,'TOTAL NUMBER OF ROUTES',2X,I2,/11X,'TOTAL LENGTH
     1 OF ROUTES',2X,F7.2,' MILES')
  999 STOP
      END
      SUBROUTINE IMPROT(NMROUT,MMR,TMM,DM,DIST,DISS)
      DIMENSION NMROUT(100),NEWRT(100),DM(101,101),NLR(100),LR(10,100)
C
C     THIS SUBROUTINE IMPROVES ROUTES BY REMOVING A CUSTOMER IN A ROUTE
C     AND REINSERTING IT IN THE BEST POSITION IN THIS ROUTE.
C
C
C     MMR = # OF BOXES EN ROUTE
C     LM = NUMBER OF THE ROUTE
C     DM(.,.) = DISTANCE FROM (.) TO (.)
C     TMM = DISTANCE REMAINING
C     I = ROUTE INDEX
C     K = BOX INDEX
C     ITSTCS = TEST BOX NUMBER
C     NMROUT(.) = LABEL OF (.) IN CALL ORDER
C     DIST = TRAVEL DISTANCE OF NEW ROUTE
C
C
C     STEP THRU ROUTES
C
      KM1=1
C
C     STEP THRU CUSTOMERS OF ROUTE I
C
      DO 90 K=1,MMR
C
C     STORE ROUTE WITHOUT TEST CUSTOMER IN NEWRT(.)
C
      DO 10 KK1=1,MMR
      IF (KK1.LT.K) NEWRT(KK1)=NMROUT(KK1)
      IF (KK1.GT.K) NEWRT(KK1-1)=NMROUT(KK1)
   10 CONTINUE
      NEWRT(MMR)=0
      NLR(MMR)=0
C
C     TEST BOX IS K'TH BOX OF NMROUT(.)
C
      ITSTCS=NMROUT(K)+1
      KP1=NEWRT(K)+1
      IF (K.GT.1) KM1=NEWRT(K-1)+1
      IF (ITSTCS.EQ.KM1) GOTO 90
C
C     CALCULATE DISTANCE OF NEWRT
C
      TEMP=TMM-DM(KM1,ITSTCS)-DM(ITSTCS,KP1)+DM(KM1,KP1)
      ISVKK=0
      SVTM=TMM
C
C     KMP IS KM-PREVIOUS
C
      KMP=1
C
C     TEST CUSTOMER IN EACH POSITION AND SAVE LOCATION IN NEWRT (ISVKK)
C     AND DISTANCE ON NEWRT (SVTM).
C
      DO 20 KK2=1,MMR
      KMO=NEWRT(KK2)+1
      TEMPTM=TEMP+DM(KMP,ITSTCS)+DM(ITSTCS,KMO)-DM(KMP,KMO)
      KMP=KMO
      IF (TEMPTM.GE.SVTM) GOTO 20
      ISVKK=KK2
      SVTM=TEMPTM
   20 CONTINUE
C
C     IF NO CHANGE IN NEWRT TRY NEXT BOX
C
      IF (ISVKK.LE.1) GOTO 90
C
C     STORE NEW ROUTE
C
      TMM=SVTM
      ISVM=ISVKK-1
      DO 30 KK3=1,ISVM
      IF (KK3.LT.ISVKK) NMROUT(KK3)=NEWRT(KK3)
      IF (KK3.GT.ISVKK) NMROUT(KK3)=NEWRT(KK3-1)
   30 CONTINUE
      NMROUT(ISVKK)=ITSTCS-1
   90 CONTINUE
      DIST=0.
0 MNR=MMR-2 DO 35 1=1,MNR 3 5 D I S T = D I S T + D M ( N M R O U T ( I ) + 1 , N M R O U T { I + 1 ) + 1 ) 254 01S S=DIST+DM(1 ,NMROUT( 1)+ i ) + DM(NMROUT(MMR) ,11 RETURN END S U 3 R 0 U T I N E PL I K ( S C L T , N A C C 3 T , S , X , Y , X O R G , Y O R G , 1 S C A L E , J C , L A B I , N P L O T > . D I M E N S I O N J J J 1 2 ) , I D A T E ( 3 ) , X { 1 0 0 ) , Y { 1 0 0 ) , L A B 1 ( 1 0 0 ) INTEGER S ( I O O ) C A L L T I M E ( 5 , 0 , I D A T E ) C C PLOT A X I S C C A L L N U M B E R ( 1 . 8 5 , O c 3 5 , 0 . 1 , X O R G , 0 . 0 , 1 ) C A L L P L O T ( 2 . 0 , 1 . 0 , 3 ) C A L L P L 0 T ( 2 . 0 , 1 . 0 , 2 ) J J J ( 1 ) = 1 3 YB-1.0 Y B l = Y 3 - 0 . 2 5 DO 10 1 = 1 , 1 3 AK=(XOKG+( 1 * 2 5 ) ) X 3 = C 1 * 1 . 9 6 9 ) + 2 . 0 C A L L S Y M B O L ( X 3 , Y B , 0 . i , J J J { 1 ) , 0 . 0 , - 1 ) 1 0 C A L L N U M B E R ( X B , Y 8 i , 0 . 1 , A K , 0 . 0 , 1 ) C A L L N U M B E R ! 1 . 2 5 , 1 . 0 , 0 . 1 , Y O R G , 0 . 0 , 1 ) C A L L P L O T ( 2 . 0 , 1 . 0 , 2 ) XB=2.0 X 3 1 = X B - 0 . 7 5 DO 20 1 = 1 , 1 0 A Y = ( Y O R G + l 1 * 2 5 ) ) Y B = ( 1 * 1 . 9 6 9 ) + ! . 0 C A L L S Y M B 0 L ( X B , Y 3 , o l , J J J ( i ) , 9 0 . 0 , - 1 ) 2 0 C A L L N U M B E R ( X 3 1 , Y 8 , 0 . 1 , A Y , 0 . 0 , 1 ) C A L L P L 0 T ( 2 . 0 , 1 . 0 , 2 ) C C CHANGE COORDINATES TO MAP S C A L E C NN=NACCST-1 DO 100 J = 1 , N N S X = X ( J ) X { J ) = ( S X - X 0 R G ) / 2 . 5 4 * 2 S Y = Y ( J ) 1 0 0 Y ( J ) = ( S Y - Y 0 R G ) / 2 o 5 4 * 2 C C PLOT ROUTES C J J J ( 2 ) = 2 X B = X ( S ( 1 ) ) + 2 . 0 Y d = Y ( S ( 1 ) ) + 1 . 0 C A L L P L O T I X B , Y B , 3 ) C A L L S Y M B 0 L ( X B , Y B , . 1 , J J J ( 2 ) , 0 . 0 , - 2 ) X l = X B - 0 . 2 5 255 Y l = Y 6 - 0 . 2 0 •BLAB=LA81{1) ' C A L L N U M B E R ( X I , Y I , 0 . 1 , B L A B i 0 . 0 , - 1 ) C A L L PLOT ( X B , Y B , 3) DO 50 J = 2,N.M I K = S ( J ) X b = X ( I K ) + 2 . 0 Y B = Y ( I K J + 1 . 0 X 2 = X 3 - 0 . 2 5 Y 2 = Y B - 0 . 2 0 A L A B = L A B 1 ( J ) C A L L S Y M B O L ( X B , Y B , . 
1 , J J J ( 2 ) , 0 . 0 , - 2 ) C A L L N U M B E R ! X 2 , Y 2 , 0 . 1 , A L A B , 0 . 0 , - 1 ) 5 0 C A L L P L O T ( X B , Y B , 3 ) I F t N P L O T . N E . l ) GO TO 5 2 C C PLOT P R E S E N T ROUTE C X B = X ( 1 ) + 5 . 1 YB = Y( D + 3 . 1 C A L L P L 0 T ( X B , Y B , 3 ) C A L L D A S H L N C O . l , 0 . 0 5 , 0 . 1 , 0 . 0 5 ) DO 51 J = 2 , N N X B = X ( J ) + 5 o l Y B = Y ( J ) + 3 . 1 51 C A L L P L O T I X B , Y B , 4 ) C C PLOT HEADINGS C 52 S C A L 0 = S C A L E * 2 S C A l = S C A L Q * 2 5 . 4 C A L L P L O T l 1 5 . 0 , 5 . 0 , 3) C A L L S Y M B O L ( 1 5 . 0 , 1 . 2 5 , o i , 2 3 H T 0 T A L TRAVEL D I S T A N C E , 0 . , 2 3 ) C A L L N U M B E R ! 1 7 . 0 , 1 . 2 5 , . l , S 0 L T , 0 . , 1 ) C A L L S Y M B O L . 1 5 . 0 , 1 . 5 0 , . 1 , 1 6 H N U M B E R OF BOXES , 0 . , 1 6 ) ANN=NN C A L L NUMBER! 1 7 . 0 , 1 . 5 0 , . 1 , A N N , 0 . ,-D C A L L S Y M B 0 L ! 1 5 . 0 , 1 . 0 , . 1 , 1 7 H S C A L E : 1 MM = , 0 « , , 1 7 ) C A L L N U M B E R ! 1 7 . 0 , 1 . 0 , . 1 , S C A L D , 0 . , 2 ) C A L L S Y M B O L ( 1 8 . 0 , 1 . 0 , . 1 , 5 H M I L E S , 0 . , 5 ) C A L L S Y M B 0 L ( 1 5 . 5 , 0 . 7 5 , . 1 , 1 0 H ! 1 INCH = , 0 « , 1 0 ) C A L L N U M B E R ( 1 7 . 5 , 0 . 7 5 , . ! , S C A L , 0 . , 2 ) C A L L S Y M B 0 L ( 1 8 . 5 , 0 o 7 5 , . l , 6 H M I L E S ) , 0 . , 6 ) C A L L PLOT ( 1 9 . 0 , 0 . 0 , - 3 ) RETURN END SUBROUTINE S E L E C T ( L , X , Y , T 1 M E , S N A M E , N U M , T I T L e , W A Y , I G P ) DIMENSION T I T L E ! 2 0 ) , WAYI 2 0 ) , L G P ( 1 0 0 ) , L ( 1 0 0 ) , X ( 1 0 0 ) , Y ( 1 0 0 ) , A S N A M E t 1 0 0 , 1 0 ) , T I M E ( 1 0 0 ) , 1 0 ! 1 0 ) , 0 ( 1 0 ) R E A D ( 5 , 1 0 0 ) T I T L E , W A Y 256 100 F0RMAT(20A4,/,20A4) REA0(5,110) INO.IGP 110 FORMAT(214) RSAD(5,120) (LGP(K) ,K=1, IGP) ' 120 FORMAT(1515) ATIME=.9 J= l DO 190 II=1,NUM READ(7,140) IA ,B ,C . (D (K ) ,K=1 ,10 ) I F ( I A . E Q . 
L G P ( J ) ) GO TO 200 GO TO 190 200 L ( J ) = I A X( J)= 8 Y(J)=C TIMciJ)=ATIME DO 201 IC=1,10 201 SNAMSiJ, 1 0 = 0 ( 1 0 J=J + 1 I F ( J„GT„ IGP ) GO TO 99 190 CONTINUE 140 FORMAT(2X,I 3 , 2F10«2 ,3X ,10A4 ) 9 9 RETURN END SUBROUTINE COUNT(0,N,TITLE,WAY,SCALE) DIMENSION XL I S T ( 5 0 0 0 ) , 0 ( 1 0 0 , 1 0 0 ) , I C O N ! ( 1 1 ) , INT(11) ,STOR(150) DIMENSION T I T L E ( 2 0 ) , W A Y ( 2 0 ) REAL INT, INTER DATA STAR , BL"NK/ ' * • , « «/ AMIN=99999„99 AMAX=0»0 NUMK=1 SUM=0. ASUM=0» SCAL1=SCALE*5230 DO 14 I=2,N K= I - i DO 14 J=1,K XLIST(NUMK)=D(I,J) IF ( XL IS T (iMUMK) oGTo AM AX ) AM AX=XL 1ST ( NUMK ) IF (XL I ST(NUMK).LT•AM IN) AMIN=XLI ST(NUMK) ' SUM=SUM+XLIST(NUMK) 14 NUMK=NUMK+1 NUMK=NUMK-1 AVG=SUM/NUMK DO 16 11 = 1,NUMK DSQ=(XL IST( I1 ) -AVG)* *2 16 ASUM=ASUM+DSQ VAR=ASUM/( NIJMK-1 ) STD=SQRT(VAR) TSUM=SUM 257 SUM=TSUM*SCAL1 TAVG=AVG AVG=TAVG*SCAL1 TSTD=STD S T D = T S T D * S C A L 1 8MAX=AMAX AMAX=BMAX*SCAL1 BMIN=AMIN A M I N = B M I N * S C A L 1 R A N G E = A M A X - A M IN I N T E R C H A N G E / L O DO 10 1 2 = 1 , 1 0 10 I C 0 N T C I 2 ) = 0 I N T ( 1 ) = A M I N * 1 DO 17 I G = 2 , 1 1 17 I N T ( I G ) = I N T ( I G - 1 J + I N T E R C A L L S S 0 R T ( X L I 5 T , N U M K , 3 ) IKONT=0 J l = 2 DO 19 J K = 1 ,NUMK V A L U c = X L I S T ( J K 5 * S C A L 1 I F { V A L U E . G T . I N T ! J l ) ) GO TO 22 I K 0 N T = I K 0 N T + 1 GO TO 19 22 I C O N T ( J 1 ) = IKONT IK0NT=1 J1=J1+1 19 CONTINUE W R I T £ < 6 , 1 0 0 ) T I T L E , N , W A Y 1 0 0 F O R M A T ! ' ! • , / / , 5 X , 2 0 A 4 , / / , 5 X , ' S T A T I S T I C S OF D I S T A N C E M A T R I X O F ' , A« G R O U P ' , 1 4 , ' U S I N G f , 2 0 A 4 , / / ) W R I T E ( 6 , 1 0 1 ) S U M , A V G , S T D W R I T E ( 6 , 5 0 ) NUMK 5 0 F O R M A T { 2 0 X , ' T O T A L NUMBER OF E L E M E N T IN O I S T o M A T R I X = ' , 1 1 0 ) WRITE ( 6 , 51) A Mi N , AMAX 51 F O R M A T ! 2 0 X , ' M I N I M U M D I S T A N C E = • , F 1 0 . 
2 , ' F E E T ' , / , 2 0 X , ' M A X I MUM' , A • D I S T A N C E = • , F 1 0 . 2 , • F E E T « , / / ) 101 F 0 R M A T 1 2 0 X , • T O T A L D I S T N C E = ' , F 1 6 . 2 , ' F E E T ' , / , 2 0 X , ' A V E R A G E • , A ' D I S T A N C E = ' , F 1 4 . 2 , ' F E E T ' , / , 2 0 X , ' S T A N D A R D ' , .B ' D E V I A T I O N = • , F 1 2 » 2 , • F E E T » , / / / ) W R I T E ( 6 , 1 0 2 ) 1 0 2 FORMAT { 3 0 X , * * * * * FREQUENCY PLOT * * * * • . , / / / , 2 0 X , • I N T E R V A L S ' , 2 0 X , A * F R E Q U E N C Y ' , / ) M=0 DO 150 I J = 1 , 1 0 I F ( I C O N T ( I J ) . E Q . O ) GO TO 70 M=ICONT( I J )/4 IF{M„EQ.0)M=1 DO 9 J Y = 1 , M 9 S T O R ( J Y ) = S T A R 258 70 MN=M+1 00 8 JX=MN,150 8 ST0R( JX )=ELNK WRITE(6,103) I N T l I J ) , I C O N T < I J ) , ( S T O R { K ) , K = 1 » 6 0 ) 103 F0RMAT120X,F9.2,/,32X,13,3X,60A1) 150 CONTINUE WRITE(6,152) INT t i l ) 152 FORMAT(20X,F9.2) WRIT5(6,151 ) (NUMBE,NUMBE=40,240,4G) 151 FORMAT( / ,37X,6( • I •) , ' I • , / , 37X , ' 0" , 6 {7X, I 3 ) ) RETURN END $ S I G MULTIPLE ROUTE SCHEDULING BY "SAVINGS' ALGORITHM" CLUSTER TRIAL RUN - NON-HIFRARCHICAL - TDATA1 THE FOLLOWING ROUTING PRINT-PUTS FOR £VSNLY DI3T. DATA AREA ARE BASEO ON I • CAPACITY CONSTRAINT OF EACH TRUCK ROUTE a 50 BOXES TIME CONSTRAINT~0F~FACH~ROUTE = 90 MINUTES " AVERAGE SPEED OF TRUCKS EN ROUTE = IS.00 M.P.H.  TOTAL NUMHER OF ROUTES IN THE AREA = 2 ROUTES TOTAL NUMREFOTBIDXES~IN~T~HE"1,RFA~ z~""~'eO BOXES : HAP SCALE • - 1 MM e 0. Q5Q652MILE3  STCpPlNG TIME FOR RESIDENTIAL BOXES = 45 SECONDS STOPPING TIME FOR BUSINESS BOXES = AS~SPECTFlED t o _ . CCOS"f E R""X N TERPREfA f IQti" -' T^AfV i "-6R0U'P ' "" l" STATISTICS OF DISTANCE MATRIX OP GROUP 49 USING MEOIAN(GOWER?METHOD a .CENTROID SORTING TOTAL OISTNCE a 19090196 . 0 OFEET AVERAGE 01 STANCE a 1 c,?.S1,U 1 FEET STANDARD DEVIATION = 7973,69FSST TOTAL NUMBER OF ELEMENT IN DIST. MATRIX a MINIMUM DISTANCE = 1259.85FEET MAXIMUM DISTANCE = U 1 260 . 
07FEET 1176 * * * * FREQUENCY PLOT **** INTERVALS FREOUFNCY I259. f t5 0 5 2 5 9 . 0 7 9259. f l9 95 *********************** 13259.91 157 210 *************************************** **************************************************** 17259 .93 2 1 2 5 9 . 9 5 195 ************************************************ 2 5 2 5 9 . 9 7 215 139 ***************************************************** ******************************'**** 2 9 2 5 9 , 9 9 33260'.01 86 ********************* 3 7 2 6 0 . 0 3 50 21 ************ ***** '11260.05 • T O " 80 160 TOO T a o " MULTIPLE ROUTE SCHEDULING USING TIKE-SAVING METHOD EVENLY DIST. OAT A AREA F.S.A. UNDEFINE SI,B ROUTES  *» PRELIMINARY ROUTE ** ROUTE NO. 1 NO. OF BOXES EN ROUTE a H9 BOXES TOTAL STOPPING TIME = 44.10 MINUTES DISTANCE~TRAVELLED( 1ST TO LASfHBoxT~~= 3 6 765 MILES TRAVEL TIME r<ECUlREP(13T TO LAST BQX) = 154.62 MINUTES TOTAL TIME REOUIREDC1ST TO LAST BOX) = 198.72 MINUTES TOTAL OTSTANCF"~TRAVELLEOCS~TN~T'cTsflfj~s'~~~z9'.bI MILES TOTAL TRAVEL TIM E (S T N TO STN) s 158.44 MINUTES TOTAL TIME REHUIREO ..FOR THIS ROUTE = 202.54 MINUTES ORDER OF CALL POINTS: 3TATI0N - 22- 26- 19- 27- 29- 33- 38- • 34- 28- 23- 24- 30-35- «3- 51- 49- 48- 42- 39- 41- 50- 45- 44- 4 0-36- 37- 32- 31- 25- 20- 16- 11- 10- 7- 1- e-14- 10- 21- 17- 13- . 6- ?.- 5- 3- 4- 12- 9-STATION 15- 0-PAGE 2 MULTIPLE ROUTE 'SCHEDULING T J S T N G . 7 IN-E-SAVINU METH d D EVENLY DIST'. DATA ARFA F.3.A. UNDEFTNE SLB ROUTE3 * * IMPROVED ROUTE * * ROUTE NO". l " NO. OF BOXES EN ROUTE 8 1)9 BOXES TOTAL STOPPING TIME e 44.10 MINUTES OISTAMCE TRAVELLED(1ST TO LAST BOX) = 36.33 MILES TRAVEL. 
T IMr RFOIIIRFOCIST TO LAST BOX) = 145.31 MTNUTES TOTAL TIME RFOUIRFDC1ST TO 1 AST BOX) = 180.41 MINUTES TOTAL DISTANCE TRAVELLEDCSIN TO STN) a 3H.«»S MILES TOTAL TRAVEL TIMf.cSTN TO STN) = 155.81 MINUTES TOTAL TIME R(-OUJRFD FOR THIS ROUTE s 199.91 MINUTES 0!?CER OF CAI.I POINTS: • STATION - ?6- 19- 27- 29- 33- 34- 28- 22- 23- 24- 30- 35-38- 43- 51- 49- 42- 39- 41- 50- 45- 44- 40-3?- " 3T-" . _ ^ '20- 18- 11- ' 10- " 7- 1- 8-STATION 14- 10- 21- 17- 6- 2- 5- 3- 4- 12- 9-1 5 - 0 -
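The comments in subroutine IMPROT describe its idea as removing one box from a route and reinserting it in the best position in the same route. The following is a minimal Python sketch of that general remove-and-reinsert idea, not a translation of IMPROT. The function names, the use of a closed station-to-station tour as the cost, and the convention that index 0 of the distance matrix is the station are all assumptions made for illustration.

```python
from typing import List, Sequence


def tour_length(route: Sequence[int], dist: Sequence[Sequence[float]]) -> float:
    """Length of the closed tour station -> stops -> station (station is index 0)."""
    path = [0] + list(route) + [0]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def reinsert_improve(route: Sequence[int],
                     dist: Sequence[Sequence[float]]) -> List[int]:
    """One improvement pass: take each stop out and put it back where the tour is shortest.

    A stop is moved only if the move strictly shortens the tour,
    so the returned route is never longer than the input route.
    """
    current = list(route)
    for stop in list(route):
        i = current.index(stop)
        rest = current[:i] + current[i + 1:]      # route without the test stop
        best_pos, best_len = i, tour_length(current, dist)
        for pos in range(len(rest) + 1):          # try every insertion position
            candidate = rest[:pos] + [stop] + rest[pos:]
            cand_len = tour_length(candidate, dist)
            if cand_len < best_len - 1e-12:
                best_pos, best_len = pos, cand_len
        current = rest[:best_pos] + [stop] + rest[best_pos:]
    return current


if __name__ == "__main__":
    # Hypothetical data: station at 0 and three boxes at 1, 2, 3 on a straight line.
    coords = [0.0, 1.0, 2.0, 3.0]
    dist = [[abs(a - b) for b in coords] for a in coords]
    start = [2, 1, 3]
    better = reinsert_improve(start, dist)
    print(start, tour_length(start, dist))
    print(better, tour_length(better, dist))
```

The sketch recomputes the whole tour length for each candidate position, which keeps it short and easy to check. The listing instead updates the length incrementally by subtracting the two arcs around the removed box and adding the arcs around each trial position, which is cheaper for long routes.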

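The route report in the listing derives each time from a distance, the average truck speed, and the per-box stopping times: travel time in minutes is distance in miles divided by speed in m.p.h., times 60, and each total adds the stopping time. A small Python sketch of that arithmetic follows. The function name and dictionary layout are assumptions. The keys reuse the listing's variable names. The sample inputs are the improved route's printed figures: 49 boxes at 0.9 minutes each, 36.33 miles from first to last box, and about 38.95 miles station to station, which is consistent with the printed 155.81 minutes at 15 m.p.h.

```python
from typing import Dict, Sequence


def route_times(box_dist_miles: float,
                station_dist_miles: float,
                stop_minutes: Sequence[float],
                avg_speed_mph: float) -> Dict[str, float]:
    """Return the route times in minutes, keyed by the listing's variable names."""
    stime = sum(stop_minutes)                              # total stopping time
    bttime = box_dist_miles / avg_speed_mph * 60.0         # travel, 1st to last box
    trtime = station_dist_miles / avg_speed_mph * 60.0     # travel, station to station
    return {
        "STIME": stime,
        "BTTIME": bttime,
        "TRTIME": trtime,
        "BTIME": bttime + stime,    # travel plus stops, 1st to last box
        "TOTIME": trtime + stime,   # travel plus stops, station to station
    }


if __name__ == "__main__":
    times = route_times(36.33, 38.95, [0.9] * 49, 15.0)
    for name, minutes in times.items():
        print(f"{name:7s} {minutes:8.2f} minutes")
```

Because the inputs here are the rounded printed distances, the results can differ from the printed times in the last digit (for example 145.32 against the printed 145.31 minutes).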