UBC Theses and Dissertations

The application of cluster analysis on a post office scheduling problem Wong, Siu-Sik 1976

THE APPLICATION OF CLUSTER ANALYSIS ON A POST OFFICE SCHEDULING PROBLEM

BY

SIU-SIK WONG
B.A.Sc., UNIVERSITY OF BRITISH COLUMBIA, 1972

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER IN BUSINESS ADMINISTRATION

in the Faculty of Commerce and Business Administration

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July, 1976
© Siu-Sik Wong, 1976

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date

ABSTRACT

The application of computerized cluster analysis in outlining the truck route boundaries for street letter box collection by the Vancouver Post Office is believed to be an effective tool for the Post Office. This study investigates the characteristics, algorithms, and the potential problems associated with clustering techniques for use in grouping street letter box locations. A review of cluster analysis methodology is presented, including a broad view of the clustering procedures and the applicability of nine hierarchical and three nonhierarchical methods.

Two sets of contrived data and two empirical data sets of two-dimensional data units (consisting of street letter box locations in the Burnaby area) are used to test the suitability of each of the 12 clustering methods in clustering both evenly and unevenly distributed data units in a 2-dimensional Cartesian space. Computer programs for the various techniques are used to generate analyses for the four sets of data. Results showing the linkages as well as the membership lists of the groups are then plotted onto maps and tree diagrams for evaluation. The results of the evaluations, based on group sizes, distributions of distances within groups, and travel distances and times, can be summarized as follows:

a. Ward's method and the three nonhierarchical clustering methods are better clustering techniques in grouping evenly distributed data sets;
b. the complete linkage method and the two average linkage methods are more suitable for grouping visually identifiable clustered data units;
c. the single linkage methods and the centroid methods are generally less satisfactory in grouping all four sets of data; and
d. clustering techniques provide a useful tool for outlining the route boundaries for street letter box collections.

A comparative study for the Vancouver area would substantiate the feasibility of cluster analysis as an aid to solving the scheduling problem.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENT

CHAPTER I  INTRODUCTION
1.1  Vancouver City Postal Transportation Service
1.2  Purpose of the Study
1.3  Overview

CHAPTER II  CLUSTER ANALYSIS: A BROAD VIEW
2.1  Need for Clustering Algorithms
2.2  Conceptual Problems in Cluster Analysis
2.2.1  The Objective Function
2.2.2  Choice of Data Units and Variables
2.2.3  Measures
2.2.4  Other Problems of Cluster Analysis
2.3  A Review of Clustering Techniques
2.4  Uses of Clustering Techniques

CHAPTER III  HIERARCHICAL CLUSTERING TECHNIQUES
3.1  Basic Agglomerative Procedure and Approaches
3.2  Linkage Methods
3.2.1  Single Linkage Methods
3.2.2  Complete Linkage Method
3.2.3  Average Linkage Within the New Group
3.2.4  Average Linkage Between Merged Groups
3.3  Centroid Methods
3.3.1  Centroid Method
3.3.2  Median (Gower) Method
3.4  Error Sum of Squares or Variant Methods
3.5  Summary

CHAPTER IV  NONHIERARCHICAL CLUSTERING TECHNIQUES
4.1  Elements of Nonhierarchical Methods
4.1.1  Seed Points
4.1.2  Initial Partitions
4.2  Nearest Centroid Sorting With Fixed Number of Clusters
4.2.1  Convergence Properties
4.2.2  Forgy's Method
4.2.3  Jancey's Variant
4.2.4  Convergent K-mean Method
4.3  Summary

CHAPTER V  COMPARATIVE EVALUATION OF CLUSTERING TECHNIQUES
5.1  Approach to the Evaluation Process
5.1.1  Data Set
5.1.2  Association Measure
5.1.3  Inputs to Clustering Methods
5.1.4  The Number of Clusters
5.1.5  What to Cluster
5.1.6  Clustering Techniques
5.2  Tool for Interpretation of Results
5.3  Results
5.3.1  Evenly Distributed Contrived Data (DATA1)
5.3.2  Unevenly Distributed Contrived Data (DATA2)
5.3.3  North Burnaby Empirical Data (NBDATA)
5.3.4  South Burnaby Empirical Data (SBDATA)
5.4  Tools for Evaluation
5.5  Evaluation
5.5.1  Evenly Distributed Contrived Data (DATA1)
5.5.2  Unevenly Distributed Contrived Data (DATA2)
5.5.3  North Burnaby Empirical Data (NBDATA)
5.5.4  South Burnaby Empirical Data (SBDATA)
5.6  Summary

CHAPTER VI  CONCLUSIONS

FOOTNOTE
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D
APPENDIX E
APPENDIX F
APPENDIX G
APPENDIX H

LIST OF FIGURES
1.  Location Map of Evenly Distributed Contrived Data Set (DATA1)
2.  Location Map of Unevenly Distributed Contrived Data Set (DATA2)
3.  Location Map of North Burnaby Mail Boxes (NBDATA)
4.  Location Map of South Burnaby Mail Boxes (SBDATA)
5.  Linkages Outlined by Single Linkage-"City-Block" Method for DATA1
6.  Linkages Outlined by Single Linkage-Euclidean Distance Method for DATA1
7.  Linkages Outlined by Single Linkage-Chi Squares Method for DATA1
8.  Linkages Outlined by Complete Linkage Method for DATA1
9.  Linkages Outlined by Avg. Linkage between Merged Groups Method for DATA1
10. Linkages Outlined by Avg. Linkage within New Group Method for DATA1
11. Linkages Outlined by Centroid Method for DATA1
12. Linkages Outlined by Median Method for DATA1
13. Linkages Outlined by Ward's Method for DATA1
14. Group Boundaries Defined by 3 Nonhierarchical Methods Using Seed Points as Inputs for DATA1
15. Group Boundaries Defined by 3 Nonhierarchical Methods Using Initial Partitions for DATA1
16. Linkages Outlined by Single Linkage-"City-Block" Method for DATA2
17. Linkages Outlined by Single Linkage-Euclidean Distance Method for DATA2
18. Linkages Outlined by Single Linkage-Chi Squares Method for DATA2
19. Linkages Outlined by Complete Linkage Method for DATA2
20. Linkages Outlined by Avg. Linkage between Merged Groups Method for DATA2
21. Linkages Outlined by Avg. Linkage within New Group Method for DATA2
22. Linkages Outlined by Centroid Method for DATA2
23. Linkages Outlined by Median Method for DATA2
24. Linkages Outlined by Ward's Method for DATA2
25. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Seed Points as Inputs for DATA2
26. Group Boundaries Defined by Jancey's Method Using Initial Partitions for DATA2
27. Present Street Letter Box Collection Routes of North Burnaby Area
28. Linkages Outlined by Single Linkage-"City-Block" Method for NBDATA
29. Linkages Outlined by Single Linkage-Euclidean Distance Method for NBDATA
30. Linkages Outlined by Single Linkage-Chi Squares Method for NBDATA
31. Linkages Outlined by Complete Linkage Method for NBDATA
32. Linkages Outlined by Avg. Linkage between Merged Groups Method for NBDATA
33. Linkages Outlined by Avg. Linkage within New Group Method for NBDATA
34. Linkages Outlined by Centroid Method for NBDATA
35. Linkages Outlined by Median Method for NBDATA
36. Linkages Outlined by Ward's Method for NBDATA
37. Group Boundaries Defined by Jancey's Method Using Seed Points as Inputs for NBDATA
38. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Initial Partition for NBDATA
39. Present Street Letter Box Collection Routes of South Burnaby Area
40. Linkages Outlined by Single Linkage-"City-Block" Method for SBDATA
41. Linkages Outlined by Single Linkage-Euclidean Distance Method for SBDATA
42. Linkages Outlined by Single Linkage-Chi-Squares Method for SBDATA
43. Linkages Outlined by Complete Linkage Method for SBDATA
44. Linkages Outlined by Avg. Linkage between Merged Groups Method for SBDATA
45. Linkages Outlined by Avg. Linkage within New Group Method for SBDATA
46. Linkages Outlined by Centroid Method for SBDATA
47. Linkages Outlined by Median Method for SBDATA
48. Linkages Outlined by Ward's Method for SBDATA
49. Group Boundaries Defined by Jancey's Method Using Seed Points as Inputs for SBDATA
50. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Initial Partitions for SBDATA
51. Distribution of Distances Within Groups Defined by Nonhierarchical Methods for DATA1
52. Distribution of Distances Within Groups Defined by Ward's Method for DATA1
53. Distribution of Distances Within Groups Defined by Complete Linkage for DATA1
54. Distribution of Distances Within Groups Defined by Complete Linkage for DATA2
55. Distribution of Distances Within Groups Defined by Avg. Linkage Methods for DATA2
56. Distribution of Distances Within Groups Defined by Forgy's and Convergent K-mean Methods for NBDATA
57. Distribution of Distances Within Groups Defined by Average Linkage Methods for NBDATA
58. Distribution of Distances Within Groups Defined by Ward's Method for NBDATA
59. Distribution of Distances Within Groups Defined by Jancey's Method for SBDATA
60. Distribution of Distances Within Groups Defined by Chi-Squares Method for SBDATA
61. Distribution of Distances Within Groups Defined by Ward's Method for SBDATA

LIST OF TABLES

1.  Storage Requirements for Similarity Matrix
2.  Parameter Values of the Recurrence Formula for Five Hierarchical Techniques
3.  Summary of Nonhierarchical Runs for DATA1
4.  Results of 12 Clustering Methods for DATA1
5.  Summary of Nonhierarchical Runs for DATA2
6.  Results of 12 Clustering Methods for DATA2
7.  Results of 12 Clustering Methods for NBDATA
8.  Summary of Nonhierarchical Runs for NBDATA
9.  Results of 12 Clustering Methods for SBDATA
10. Summary of Nonhierarchical Runs for SBDATA
11. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for DATA1
12. Travel Times and Distances of Groups Defined by 12 Cluster Methods for DATA1
13. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for DATA2
14. Travel Distances and Times of Groups Defined by 12 Cluster Methods for DATA2
15. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for NBDATA
16. Travel Distances and Times of Groups Defined by 12 Cluster Methods for NBDATA
17. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for SBDATA
18. Travel Distances and Times of Groups Defined by 12 Cluster Methods for SBDATA
19. Summary of Method Preferences for the Four Sets of Data

ACKNOWLEDGEMENT

I wish to thank Dr. C.L. Doll, my thesis committee Chairman, for his guidance during all stages of my work.

I would also like to thank Mr. B. Dyke and Mr. T. Lim, both of Vancouver Post Office, for their assistance in collecting data.

And special thanks to my wife Charmaine for her assistance and time in the typing of this thesis.
CHAPTER I

INTRODUCTION

The Vancouver City Postal Transportation Services, like the postal services of many other big cities, is constantly faced with the complicated duties of setting up schedules and assigning suitable schedules to various drivers. The scheduling process is further complicated by the unevenly distributed destinations of the processed mail volume. Planning of the truck and human resources requirements is complex and time-consuming. At the present stage, the planning and scheduling of all the delivery services rely heavily on the experience of the planners and schedulers of the Post Office. With the advent of computerization, the management is currently encouraging the development of a more efficient scheduling program to solve the problem.

One of the biggest problems in scheduling is the determination of the boundaries within which each service run is constrained by its delivery movements. Clustering analysis is a suitable technique for this effort: several cluster methods can be run in a single program, and their results can then aid the planners in the determination of the boundaries. In order to apply this technique effectively, the complex operation of the Vancouver City Postal Transportation Service must be comprehended.

1.1  Vancouver City Postal Transportation Service

The Vancouver City Transportation operation is a very complex system with a fleet of various sized vehicles performing different types of delivery and collection services.
As of May 1976, the Post Office City Transportation system has 151 small vehicles, of which 139 are scheduled for daily services and 12 are kept on a standby basis to meet contingencies, irregularities, replacements and breakdowns. A total of 23 medium to large trucks are also utilized for other transportation services within the Greater Vancouver area.

Small vehicles of [illegible] ton capacities are mainly used for daily street letter box collections, relay bundle deliveries, parcel post deliveries, and special deliveries. Larger trucks are assigned to shuttle services transporting bulk mail volumes to and from the airport, docks, railway stations, satellite post offices and the Vancouver General Post Office. The scheduling of these different sized trucks for various services is very intricate and complex. Each type of service has its own characteristics in timing, location constraints, and degree of importance to the Post Office operations. Frequent rescheduling of services is required because there are often alterations in the airplane schedules, street letter box requirements, special vehicle requirements, bundle delivery pattern, and union regulation. The rescheduling of services is presently done manually by the planners.

Constant increases in mail volume in the Greater Vancouver area, brought about by the tremendous growth of population and industries in recent years, are undoubtedly making the transportation problem very difficult for the planners to tackle. A study introduced in the summer of 1975, investigating the feasibility of computerization of the scheduling of Post Office services, indicated that the efficiency of the system could be improved by computerization. The study also indicated that computerized routing could help the planners with truck utilization, vehicle allocations and mail volume statistics.

The rescheduling work, however, cannot be done efficiently without the coordination of the multiple elements involved. In scheduling the services, it is critical that trucks and manpower be utilized efficiently and that the time allowed for each of the assigned runs be adequate. Most small vehicles are assigned a day's work of several services. Relay bundle service has to transport bundled mails to relay boxes in time for the carriers to distribute according to their schedules. Street letter box collections and relay bundle deliveries are restricted both by time and by geographic requirements. The geographical boundaries of each street letter box collection and relay bundle delivery are, therefore, important to the structure of the schedules.

1.2  Purpose of the Study

With the advent of computerization, it is ideal to have a complete mechanized scheduling system to solve all the problems of the old manual system. A computer program of such a detailed structure proves unfeasible at the present stage because of the inability of a computer program to comprehend all the details of the existing scheduling process. ROUTPLOT, a computerized vehicle routing program used in the 1975 study, does not consider details of the route such as traffic restrictions, stop signs, construction, etc. Modification of the resultant routings is therefore necessary, although the program has relieved the management from spending a significant amount of time in routing the vehicles: a day's, a week's or a month's schedules can be run in a short period of time, and the re-routing induced by minor alterations in the schedules can be carried out by the computer instead of manually. The utilization of the vehicles actually depends highly on the geographic boundaries assigned to each schedule.

Traditionally, the boundaries are either natural geographic boundaries confined by rivers, highways, bridges and sea, or arbitrarily determined postal zones. The boundaries for areas with little physical hindrance are primarily set by experienced planners who are ex-truck drivers.
This system of boundary determination is in fact very reliable, but time-consuming. The purpose of this study is two-fold:

(1) to examine the characteristics of 12 clustering techniques; and
(2) to validate, using contrived and empirical data, the applicability of these 12 techniques in grouping box locations into suitable clusters.

1.3  Overview

Chapter I begins with a general introduction supplemented by a brief description of the Vancouver City Postal Transportation Service operations. The purpose of the study is then outlined and an overview completes the first chapter.

Chapter II presents a broad view of cluster analysis by stressing the need for clustering algorithms and the conceptual problems in utilizing cluster analysis. Elements of clustering analysis such as variables, scales and measures are discussed to indicate the variety of methods for amalgamating variables and constituting data sets for cluster algorithms. A review of clustering techniques and their uses, and some general remarks on the utilization of cluster analysis, complete this chapter.

Chapter III begins with a brief introduction to the 9 hierarchical clustering techniques examined in this study. The basic approach to hierarchical clustering is first described, and the philosophy and characteristics of each technique are then discussed in the second part of the chapter. The rationale behind each method is also reviewed with specific reference to the literature and to the distance measures used.

Similar to the above chapter, Chapter IV discloses the characteristics of three nonhierarchical clustering methods that are applicable to distance variables.

In Chapter V, the twelve clustering techniques are evaluated in detail. Four sets of input data are used to test the applicability of these techniques. Two are contrived data sets, one evenly and one unevenly distributed. The other two are actual street letter box locations of North and South Burnaby, and they are designed to test the appropriateness and efficiency of these techniques as aids to the scheduling problem. Statistical analyses are performed on the results generated by the techniques, and the travel times and distances of each set of results are also used as evaluation measures.

The concluding chapter will discuss the suitability of the tested clustering techniques as a tool for scheduling the street letter box collections, and perhaps the bundle relay runs, of the Post Office. Areas for additional investigation are also included in this chapter.

CHAPTER II

CLUSTERING ANALYSIS: A BROAD VIEW

This chapter gives an introduction to the topic of cluster analysis. In general terms, cluster analysis can be referred to as a collection of techniques which classify objects or variables into groups or clusters that satisfy some criteria of homogeneity, inter-relatedness, or inter-group separation. This collection of techniques is a contribution of scientists from different fields, and the variety of approaches was developed to suit diverse purposes of classification, each with its own measures and criterion of clustering.

There is no correct solution in principle to the problem of clustering, partly because the application of different techniques to a set of data leads to different outcomes and interpretations. It is, therefore, necessary to understand the concepts behind each method, to investigate the applicability and performance of each clustering algorithm, and to choose the technique that can derive usable categories for further analysis.
the concepts  can  of techniques i s  t o the problem o f each  analysis  adopted  to suit  or^ o t h e r  different  and t o choose set of  the topic  of techniques,  the application  t o investigate  f o r a  on  subject  inter-relatedness,  various  or objects  solution  t o understand  algorithm,  clusters  variety  t o a  t o the  cluster  c o l l e c t i o n  from  interpretation  i n principle  terms,  This  variables  general  introduction  o f homogeneity,  viewpoint  analysis  VIEW  of techniques  of s c i e n t i s t s  The  BROAD  to c l a s s i f y objects  c r i t e r i a  different  or  c o l l e c t i o n  separation.  contribution  an  I n general  f i e l d s  some  inter-group  gives  : A  performance therefore,  clustering  different  t o derive  usable  9  2.1  Need f o r C l u s t e r i n g  Algorithms  C l a s s i f y i n g o b j e c t s or data i n t o groups w i t h d e f i n e d c r i t e r i a or i n t u i t i v e n o t i o n s r e q u i r e s numerous enumerations t o search a l l the p o s s i b i l i t i e s and t o choose the best s o l u t i o n .  T h i s enumeration process would be very  time-consuming and d i f f i c u l t . indicated  Abramowitz and Stegun  (1968)  t h a t the number o f ways o f s o r t i n g n o b s e r v a t i o n s  i n t o m groups i s a S t i r l i n g number of the second k i n d  For even the r e l a t i v e l y t i n y problem o f s o r t i n g 25 observations  i n t o 5 groups, the number o f p o s s i b i l i t i e s i s the 15  astounding further  q u a n t i t y of over 2x10  .  T h i s number c o u l d be  compounded i f number o f i d e a l groupings  i s not  known p r i o r t o t h i s enumeration. I n order t o s i m p l i f y the complexity o b j e c t s i n t o a p p r o p i a t e number o f groupings, a l g o r i t h m s a r e used. 
s e a r c h i n g through one  i n sorting clustering  These a l g o r i t h m s a r e procedures f o r  the s e t o f a l l p o s s i b l e c l u s t e r s  t h a t f i t the data r e a s o n a b l y w e l l .  to find  F r e q u e n t l y , there  i s a n u m e r i c a l measure o f f i t which the a l g o r i t h m attempts t o o p t i m i z e , but many u s e f u l a l g o r i t h m s do not e x p l i c i t l y optimize a c r i t e r i o n .  These a l g o r i t h m s use d i f f e r e n t mode  of s e a r c h by s o r t i n g , s w i t c h i n g , j o i n i n g , s p l i t t i n g ,  adding  and  searching  the data s e t t o i d e n t i f y the c l u s t e r  s u i t s the c r i t e r i o n b e s t .  The c h o i c e o f a l g o r i t h m ,  that however,  i s b a s i c a l l y determined by the user's s e l e c t i o n of the data u n i t , the v a r i a b l e s and the s i m i l a r i t y measures.  2.2  Conceptual Problems i n C l u s t e r  Analysis  A p p l i c a t i o n o f c l u s t e r a l g o r i t h m s would  introduce  a host o f problems even though the i n t u i t i v e i d e a o f c l u s t e r i n g i s c l e a r enough. analysis  The foremost d i f f i c u l t y  i s only a c o l l e c t i o n o f h e u r i s t i c procedures w i t h  a v a r i e t y o f d e c i s i o n r u l e s and a l g o r i t h m s . i n t u i t i v e decisions  are required  of c l u s t e r a n a l y s i s r e p e r t o r y tunately, a  i s that c l u s t e r  Series of  t o determine which elements  should be u t i l i z e d .  
Unfor-  the l i t e r a t u r e on c l u s t e r a n a l y s i s does not p r o v i d e  g e n e r a l framework f o r t h i s c o l l e c t i o n of techniques  that  shows the steps i n v o l v e d , a v a i l a b l e a l t e r n a t i v e s , d e c i s i o n p o i n t s , and r e l e v e n t In the f o l l o w i n g  c r i t e r i a f o r s e l e c t i n g among  sub-sections,  the author attempts t o b u i l d  a framework from which the elements o f c l u s t e r could  options.  analysis  be e a s i l y r e l a t e d .  2.2.1  The O b j e c t i v e  Function  Though there i s no a b s o l u t e s o l u t i o n t o c l u s t e r problems, i t i s u s u a l l y used t o determine a p a r t i t i o n i n g that  s a t i s f i e s some o p t i m a l i t y c r i t e r i o n .  This  optimality  c r i t e r i o n may  be g i v e n i n terms of a f u n c t i o n a l r e l a t i o n  t h a t r e f l e c t s the  l e v e l s of d e s i r a b i l i t y of the  p a r t i t i o n s or groupings. termed o b j e c t i v e  This  function.  c r i t e r i o n : - distance  various  functional r e l a t i o n i s often  Each a l g o r i t h m  uses a p a r t i c u l a r  measures, s i m i l a r i t y or d i s s i m i l a r i t y  measures, or q u a n t i f i a b l e measures of homogeneity are a l l adopted as o b j e c t i v e  c r i t e r i a f o r d i f f e r e n t techniques.  This variety constitutes appropriate  the d i v e r s i f i e d problem of choosing  data u n i t s , v a r i a b l e s as w e l l as measures of  functional relations.  2.2.2  Choice of Data U n i t s and The  Variables  a c t u a l mechanics of the c l u s t e r a n a l y s i s  performed on a sample of e n t i t i e s r e p r e s e n t i n g  are  "objects",  " o b s e r v a t i o n s " or "elements".  These e n t i t i e s c o u l d  be  e n t i r e small single population  or the  large  population. 
If random samples were chosen from a large population to represent the population, the independence of the data must be assumed. Cluster analysis on a given data set reflects the characteristics of these data units, thus the choice of data affects the outcome of the analysis.

Another problem facing the user is the choice of variables that can identify the entity's characteristics, attributes or traits. Any relevant discriminating variable could highly affect the result of the cluster analysis. Missing variables could generate amorphous clusters. On the other hand, the inclusion of strong discriminators not relevant to the purpose at hand could mask the sought-for clusters and give misleading and confusing results.

In most statistical discussions, variables are usually assumed to be continuous and measured on an interval scale, which is a convenient assumption for mathematical formulation. However, in real world problems, variables are of many types and are scaled differently. In general, variables can be classified according to the scale of measurement or the size of the range set, or be cross-classified by type and scale. With mixed variables, transformation is required to convert them into usable forms for mathematical manipulations and further interpretation. The suitability of scale conversions, however, must be evaluated in most analysis techniques. Scales are usually referred to as nominal, ordinal, interval or ratio measures. The transformation from one type of scale to another often involves subjective consideration of the validity of conversions. For cluster analysis, the variables are usually quantifiable, and unquantifiable scales are transformed by various methods (Cochran and Hopkins, 1961; Shepard, 1962a, 1962b; Kruskal, 1964; Anderberg, 1973) to suit the requirements of clustering techniques.

2.2.3  Measures

The majority of clustering techniques begin with the calculation of a matrix of similarities or distances between entities, and therefore consideration is needed of the possible ways of defining these quantities. Indeed many clustering techniques may be thought of as attempts to summarize the information or relationships between entities which are given in a similarity matrix, so that these relationships can be easily comprehended and communicated.
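As a minimal sketch of the starting point described above, the following Python fragment builds a symmetric distance matrix for a few two-dimensional data units. The Euclidean metric, the function name `distance_matrix` and the sample coordinates are illustrative assumptions and are not taken from the study.

```python
import math

def distance_matrix(points):
    """Return the symmetric matrix of Euclidean distances between points."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            # Compute each lower-triangle entry once ...
            d[i][j] = math.sqrt(sum((a - b) ** 2
                                    for a, b in zip(points[i], points[j])))
            # ... and mirror it, since the measure is symmetric.
            d[j][i] = d[i][j]
    return d

# Hypothetical data units (x, y); not data from the thesis.
units = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(units)
```

Only the entries below the diagonal are actually computed; the diagonal stays at zero because each entity is at zero distance from itself.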
Although variables could be scaled by various transformations, a measure of association is still needed to relate, in numerical form, the similarity of one variable to another. This measure is required because all clustering methods have the same basic working assumption that numerical measures among data or variables are comparable and sortable. Different types of variables warrant various measures of association. There is a host of methods for calculating the measures among variables: the angular measure between vectors, the product moment correlation coefficient, the canonical correlation, the matching coefficients, and some probability-based measures are a few measures commonly adopted in establishing numerical relations among variables. These measures of association of variables, however, need to be supplemented by a measure of association of data units if the clustering technique so chosen is designed for grouping data units.

The measures of association among data units differ from those of variables in many aspects. The most prominent difference is that the measure for data units warrants a sometimes nonexisting relationship measure among the variables related to the data unit. The heterogeneity of measurement units and variable types makes it especially difficult to define meaningful measures of association between data units, within the context of a given set of variables. Similarity and distance measures are the most popular measures adopted in clustering data units.
There are a number of similarity measures, as well as distance measures, that are applicable to binary and qualitative data (Anderberg, 1973; Everitt, 1974; Duran, 1974). Two-dimensional problems are easier to measure: the distances among data units could be simply Euclidean distances. Multi-dimensional problems, however, require experimentation to examine the validity of weight assignment, representation spaces, and the basic approach to formulate such a measure.

2.2.4  Other Problems of Cluster Analysis

Even though the user has decided the objective function, the data and variables to be used, and the similarity measure between data or variables, there are still three big questions to be considered: 1. what to cluster; 2. the number of clusters; and 3. the choice of technique.

The variables are often amalgamated into a single index related to the data unit, and it is sometimes necessary to categorize the objects by clustering variables instead of data units. Similar to factor analysis, the multivariate data are often classified and grouped by the attributes of different variables separately or simultaneously. This choice could depend on the degree of difficulty in amalgamating the variables and the relevance of individual variables to the analysis.

A substantial practical problem in performing a cluster analysis is deciding upon the number of clusters in the data. Different clustering methods offer various degrees of flexibility in grouping numbers of clusters. Hierarchical clustering methods give a configuration for every number of clusters from one up to the number of entities, whereas other approaches might require a defined number of clusters prior to the clustering procedures. Some algorithms begin with a chosen number of groups and then modify this number as indicated by certain criteria, with the objective of simultaneously determining both the number of clusters and their configuration. All these indicate that the choice of technique could affect all the elements of cluster analysis.

The choice of technique is an inherent problem in using cluster analysis. As mentioned previously, the collection of clustering techniques was developed by scientists of different fields to satisfy their own needs, and each technique would be particularly suitable to one set of data or criteria. A review of literature would help to determine the technique required for sorting objects with appropriate criteria and algorithms. Further discussion of clustering techniques is included in section 2.3 of this chapter.

2.3  A Review of Clustering Techniques

The various backgrounds of scientists and researchers who developed different clustering techniques result in a variety of clustering algorithms. Comprehensive reviews of classification methods were conducted by Cormack (1971) and Anderberg (1973). In general, cluster analysis techniques can be "classified" into types roughly as follows:

(i) Hierarchical techniques, in which the classes themselves are clustered into groups, the process being repeated at different levels, step by step, to form a tree diagram.
(ii) Optimization-partitioning techniques, in which the clusters are formed by optimizing the clustering criterion. The classes are mutually exclusive, thus forming a partition of the set of entities.
(iii) Density or mode-seeking techniques, in which clusters are formed by searching for regions containing a relatively dense concentration of entities.
(iv) Clumping techniques, in which the classes or clumps can overlap.
(v) Others: methods which do not fall clearly into any of the four previous groups.

Of all the above categories, hierarchical techniques are most commonly used and discussed. This category of techniques may be subdivided into "agglomerative" methods, which proceed by a series of successive fusions of the N entities into groups, and "divisive" methods, which partition the set of the N entities successively into finer partitions. The results of both types can be presented in the form of a dendrogram or a two-dimensional tree diagram, illustrating the fusions or partitions which have been made at each successive level.
Both types of hierarchical techniques can be viewed as attempts to find the most efficient step, in some defined sense, at each stage in the progressive subdivision or synthesis of the population. Further discussion of hierarchical techniques is included in Chapter III.

Besides hierarchical clustering techniques, the others could be simply termed non-hierarchical methods. Partitioning techniques can be formulated as attempts to partition the set of entities so as to optimize some predefined criterion. Most of these methods assume that the number of groups has been decided by the user, although some allow the number to be changed during the course of analysis. Three distinct procedures are employed by these techniques:

(a) a method of initiating clusters;
(b) a method for allocating entities to initiated clusters; and
(c) a method of reallocating some or all of the entities to other clusters once the initial classificatory process has been completed.

These non-hierarchical techniques are further examined in Chapter IV.

Density search techniques originated from single linkage cluster analysis. These techniques locate the high density regions and define them as clusters (Carmichael, 1968). Clumping techniques allow overlaps between classes, indicating that the overlapped entities may belong in several places (Jones and Jackson, 1967).
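The three partitioning procedures (a) to (c) listed earlier can be illustrated with a short Python sketch. It assumes, purely for illustration, a nearest-centroid rule with squared Euclidean distance (essentially a k-means style scheme) and seeds the clusters with the first k entities; the function name `partition` and the sample points are hypothetical and do not come from the thesis.

```python
def partition(points, k, max_iter=100):
    """Illustrative partitioning: initiate, allocate, then reallocate."""
    # (a) initiate clusters: assume the first k entities act as seed centres
    centres = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # (b) allocate each entity to the nearest centre; on later passes
        # this same step serves as (c), the reallocation of entities
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        if new_labels == labels:          # no entity changed cluster: stop
            break
        labels = new_labels
        # recompute each centre as the mean of its current members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centres

# Hypothetical two-dimensional entities forming two obvious groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centres = partition(pts, 2)
```

With these seeds the first allocation is poor (both seeds lie in the same group), and it is the reallocation passes that recover the two groups, which is the role procedure (c) plays in the text.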
Other methods such as the Q-factor analysis (Cattell, 1952; Parks, 1970; Johnson, 1970), R-factor analysis (Gower, 1966), BC Try System (Tryon and Bailey, 1970), and many others are less known among clustering techniques. Most clustering methods warrant extensive enumeration, and computer programs are written to perform these procedures efficiently with high accuracy.

2.4  Uses of Clustering Techniques

Cluster analysis is generally used to sort data, variables or objects into meaningful classes for further interpretation. Since the characteristics and criterion of each clustering technique are designed differently for various purposes, it is hard to define the limits of cluster analysis. On one end of the spectrum, it resembles the factor analysis procedures, and on the other, it is nothing but a sortation methodology. Applications of clustering techniques, therefore, vary from simple sortation of one- or two-dimensional data sets to complex multi-dimensional data classifications. Cluster analysis is most widely used by biological scientists in classifying species of different families. Other fields, such as psychology, geological sciences, economics, archeology, medicine and many others, use cluster analysis mainly in sortation of multivariate data sets.

The development of computer technology has helped to reduce the computational time for allocating or reallocating clusters within a data set. The increasing interest in applied statistics and the availability of vast amounts of data have further emphasized the importance of selecting, sorting and grouping useful data into clusters for other mathematical analysis. The use of cluster analysis, however, is often tempered by the difficulties in interpreting the results. These difficulties could be the results of:

(i) the failure by the user to recognize the inappropriateness of techniques on the set of data; and
(ii) the ignorance of the possibility of the absence of clusters in the data set.

In using cluster analysis, it is necessary to keep in mind that:

(a) clustering techniques do not optimize with respect to the criteria;
(b) selection of variables, data, and measures is critical to results; and
(c) biased opinion on data and variable selection of a sampled population could generate confusing and meaningless results.

CHAPTER III

HIERARCHICAL CLUSTERING TECHNIQUES

The brief review of clustering techniques in Chapter II has indicated that there is a host of clustering methods applicable to data, variables or objects classifications. Hierarchical clustering techniques are the most commonly used and, by far, the most discussed in literature. It is difficult to examine every variation of hierarchical clustering techniques disclosed in the literature, and in this study, only 9 of the more popular hierarchical methods are reviewed. The characteristics, criterion and problems, if any, associated with each of these techniques, with specific references to agglomerative approach, distance measure, and data clustering, are examined in the following sections.

The abundance of hierarchical clustering methods treated in the literature are alternative formulations or minor variations of three basic clustering concepts:

(1) Linkage methods,
(2) Centroid methods, and
(3) Error sum of squares or variance methods.

All these methods are compatible for clustering data units and only linkage methods can be adopted to variable clustering procedures.

3.1  Basic Agglomerative Procedure and Approaches

The basic procedure with all the agglomerative methods is similar. They generally start with the computation of a correlation or distance matrix between the entities, and the end product is a dendrogram showing the sequential fusions of individual entities.

The correlation or distance matrix, which contains the association measure, $S_{ij}$ or $d_{ij}$, between entities i and j, is the most important component of the clustering procedure. Most procedures assume the measure between entities is symmetric, i.e.
$S_{ij} = S_{ji}$ or $d_{ij} = d_{ji}$, and because of this assumption, only the lower triangle of the correlation or distance matrix is utilized in the clustering procedures. One critical point concerning the elements of the matrix is that the clustering procedures are not designed to handle negatively valued elements; thus, the absolute values or squares of the measure are frequently used as the association measure.

Another aspect related to the clustering procedure is that there are significant differences in the characteristics of a correlation and a distance matrix of n entities:

(a) Pairwise distances $d(X_i, X_j)$ may be represented in terms of a symmetric $n \times n$ distance matrix:

$$D = \begin{pmatrix} 0 & d_{12} & \cdots & d_{1n} \\ d_{21} & 0 & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \cdots & 0 \end{pmatrix}$$

The diagonal elements of the matrix D are $d_{ii} = 0$ for $i = 1, 2, \ldots, n$, and $d_{ij} \ge 0$ for $i, j = 1, 2, \ldots, n$.

(b) The correlation measure $S_{ij}$ between two entities is a non-negative real valued function subjected to:

(i) $0 \le S_{ij} < 1$ for $i \ne j$;
(ii) $S_{ii} = 1$ for $i = 1, 2, \ldots, n$; and
(iii) $S_{ij} = S_{ji}$.

The pairwise correlation can be represented in a matrix of

$$S = \begin{pmatrix} 1 & S_{12} & \cdots & S_{1n} \\ S_{21} & 1 & \cdots & S_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ S_{n1} & S_{n2} & \cdots & 1 \end{pmatrix}$$

These marked differences of distance and correlation measure would reverse the clustering criteria of some hierarchical methods described in the following sections.

Once the matrix is defined, the basic agglomerative approach in clustering entities into groups can be considered as follows:

(1) Begin with n clusters each consisting of exactly one entity. Let the clusters be labeled with the numbers 1 through n.

(2) Search the correlation or distance matrix for the pair of clusters that satisfies best the clustering criterion. Let the chosen clusters be labeled p and q, and let the association measure be $S_{pq}$ or $d_{pq}$, $p > q$.

(3) Reduce the number of clusters by 1 through the merge of clusters p and q. Label the product of the merge q, and update the correlation or distance matrix entries in order to reflect the revised association measures between cluster q and all other existing clusters. Delete the row and column of the original matrix pertaining to cluster p.

(4) Repeat steps (2) and (3) for (n - 1) times to form one single cluster containing n entities. The identity of the clusters that are merged and the values of measures between them at each stage are recorded for the final dendrogram output.

Various agglomerative methods differ from this basic procedure in using the clustering criterion to define the most closely associated pair at step 2 and in updating and revising the correlation or distance matrix at step 3. These variations would produce drastically different results for some data sets and little or no variance for others.

This basic agglomerative procedure is actually a series of comparisons between entities and, based on these comparisons, clusters are formed.
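The four steps of the basic agglomerative procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: it works on a distance matrix (so "best" means smallest), labels clusters from 0 rather than 1, and takes the matrix update rule as a parameter; the function name `agglomerate`, the `update` argument and the sample matrix are hypothetical. The minimum rule used in the usage example is chosen only to make the example concrete.

```python
def agglomerate(d, update):
    """Basic agglomerative procedure on a symmetric distance matrix d.

    update(d_pr, d_qr, d_pq) returns the revised distance between the
    merged cluster and another cluster r. Returns the merge history as a
    list of (p, q, d_pq) tuples, one per stage.
    """
    d = [row[:] for row in d]                  # work on a copy
    active = list(range(len(d)))               # step 1: one cluster per entity
    history = []
    while len(active) > 1:
        # step 2: search the lower triangle for the closest pair, p > q
        p, q = min(((i, j) for i in active for j in active if i > j),
                   key=lambda pair: d[pair[0]][pair[1]])
        history.append((p, q, d[p][q]))
        # step 3: the merged cluster keeps label q; revise its distances
        for r in active:
            if r not in (p, q):
                d[q][r] = d[r][q] = update(d[p][r], d[q][r], d[p][q])
        active.remove(p)                       # "delete" row and column p
    return history                             # step 4: n - 1 recorded merges

# Hypothetical distances between four entities lying at 0, 1, 4 and 6 on a line.
D = [[0.0, 1.0, 4.0, 6.0],
     [1.0, 0.0, 3.0, 5.0],
     [4.0, 3.0, 0.0, 2.0],
     [6.0, 5.0, 2.0, 0.0]]
history = agglomerate(D, lambda d_pr, d_qr, d_pq: min(d_pr, d_qr))
```

The returned history holds exactly the information the text says is recorded for the dendrogram: which clusters merged and at what value. Swapping the `update` argument is where the individual methods of the later sections would differ.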
Comparison of matrix elements, searching for pairs of clusters, updating the similarity matrix, and deletion of matrix rows all add up to a total of 2n~ - 9n/2 comparisons. The number of comparisons should be one of the considerations for the size of the input matrix and the timing required for computation.

There are several computational approaches to clustering problems. Each approach has its own unique advantages and limitations. No single approach is a cure-all for all circumstances; each has its own realm of applications. Among the computational approaches, "stored matrix", "stored data" and "sorted matrix" are more commonly used.

The stored matrix approach involves storing the correlation or distance matrix in the computer's central memory so that the similarity values may be accessed directly in any sequence. This approach, like any other, has its own unique characteristics:

(1) The clustering procedure is independent of the derivation of the correlation or distance matrix;

(2) This in-core storage method severely limits the number of entities that can be grouped. Anderberg (1973) indicates that problems of more than 150 entities are difficult to handle without a large size computer. The storage requirements for the correlation or distance matrix are shown in Table 1.
Number of entities    Storage required
        50                  1,225
       100                  4,950
       150                 11,175
       200                 19,900
       250                 31,125
       300                 44,850
       350                 61,075
       400                 79,800
       450                101,025
       500                124,750

Table 1.  Storage Requirements for Similarity Matrix

Each storage figure equals n(n - 1)/2, the number of entries in the lower triangle of an n x n matrix excluding the diagonal.

(3) Both variables and data to be grouped are represented in the matrix and do not affect the procedure of the clustering algorithm in this stored matrix approach.

In using this approach, the user must be aware of the capacity of the computer for storage and the enumeration process of different methods.

The stored data approach involves more computational procedures in its algorithm than the stored matrix approach. The storage of data, instead of correlation or distance matrix elements, in the central memory requires less core space. This approach is applicable to any clustering method with similarity or distance measures, and the combinatorial problem in computing association measures between clusters by reference to the original data is comparable to the sizing issue of the stored matrix approach. This computational problem can be avoided by storing in the computer memory either the association values or the summary statistics for each cluster, from which the desired association measure could be computed. This remedy, however, restricts the user of this approach to data unit clustering.

The sorted matrix approach is a relatively unexploited methodology. It is designed to handle a sizeable similarity matrix for sorting data units or variables.
The distinct advantage of this approach is that it saves computer storage space, but the matrix has to be sorted before being input to the clustering program.

Other approaches are also available by using various combinations of the above approaches or specially designed algorithms for specific computer systems. Magnetic tapes and disks are mostly used by other approaches (Wishart, 1969b; Park, 1970; Wolfe, 1970) for storing data units and/or the similarity matrix; thus up to 1000 data units and 200 variables can be clustered with hierarchical methods. Variations of stored matrix and stored data approaches are generally adopted in most computational methods handling large data sets.

3.2  Linkage Methods

This category of hierarchical clustering methods is simple to use and easy to understand. This classification procedure is essentially identical to that of the basic agglomerative procedure. Maximum, minimum or average values of the association measures among entities are used as criteria for grouping the data units or variables into clusters. Methods using different clustering criteria, of course, result in different dendrograms. In the following subsections, the characteristics of four linkage methods are described.

3.2.1  Single Linkage Methods

The methods of single-linkage cluster analysis are the simplest of all hierarchical techniques, and are also the most popular.
This approach was first described by Sneath (1957) and later by many other scientists (McQuitty, 1960; Lance and Williams, 1966; Johnson, 1967; Gower and Ross, 1969; Zahn, 1971; Sibson, 1973; Hartigan, 1975). The criterion used in these techniques is the minimum distance (maximum value if a correlation measure is used) between clusters. At each stage, after clusters p and q have been merged, the similarity between the new cluster t and some other cluster r is determined by

$$d_{tr} = \min(d_{pr}, d_{qr}) \quad \text{or} \quad S_{tr} = \max(S_{pr}, S_{qr}).$$

The measure between the two closest or most similar members of clusters t and r is the criterion for further merges. This criterion is used throughout the enumerations until one single cluster is formed. This linkage procedure is known as single linkage because clusters are joined at each stage by the single shortest or strongest link. For any cluster of two or more entities produced by this method, every member is more similar to some other member of the same cluster than to any other entity not in the cluster. However, this approach is often criticized for its resultant chaining clusters for non-ellipsoidal groupings. It is frequently stated that this "chaining" has distinctly dissimilar entities at each end of the cluster. The use of different distance measures for this approach, however, would yield different "chaining" effects. Distance measures such as Euclidean distances, simple "City-Block" and Chi-square are often used in this approach to cluster data sets. The outcomes in using these different measures, of course, are different, and they are classified as three different single linkage methods in this study.

3.2.2  Complete Linkage Method

Different from the single-linkage method, the complete linkage method uses maximum distance or least correlation as the criterion to group entities. Sorensen's (Sneath, 1968) complete linkage criterion is that two individuals in a group have a similarity or distance which is less than a threshold value s or r. Other scientists termed this method the furthest neighbour technique, in which each individual is treated as a single-point cluster. This approach is considered as a "maximally connected subgraph" in graph theory.

Similar to the single-linkage procedure, at each stage of the complete linkage algorithm, after clusters p and q have been merged, the association measure between the new cluster t and some other cluster r is determined as

$$d_{tr} = \max(d_{pr}, d_{qr}) \quad \text{for distance measure, and}$$
$$S_{tr} = \min(S_{pr}, S_{qr}) \quad \text{for correlation measure.}$$

The quantity $d_{tr}$ (or $S_{tr}$) is the distance (or correlation) between the most distant (or dissimilar) members of clusters t and r. If clusters t and r were merged, then every entity in the resulting cluster would be no further than $d_{tr}$ from, or no less similar than $S_{tr}$ to, every other entity in the cluster. The $d_{tr}$ or $S_{tr}$ can be considered as the diameter of the sphere to which the maximum distance or minimum correlation is related.
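The single and complete linkage update rules given above differ only in which extreme is kept, and the choice flips when a correlation (similarity) measure replaces a distance. A minimal Python sketch, with a hypothetical function name `linkage_update` and made-up sample values:

```python
def linkage_update(m_pr, m_qr, method, similarity=False):
    """Revised measure between the merged cluster t (p and q) and cluster r.

    m_pr and m_qr are the measures from p and q to r before the merge.
    With similarity=False they are distances; with similarity=True they
    are correlations, so "closest" means largest instead of smallest.
    """
    closest = max if similarity else min       # nearest pair of members
    farthest = min if similarity else max      # most distant pair of members
    if method == "single":
        return closest(m_pr, m_qr)
    if method == "complete":
        return farthest(m_pr, m_qr)
    raise ValueError("unknown method: " + method)

# Hypothetical distances d_pr = 4.0 and d_qr = 3.0.
d_single = linkage_update(4.0, 3.0, "single")
d_complete = linkage_update(4.0, 3.0, "complete")

# Hypothetical correlations S_pr = 0.2 and S_qr = 0.7.
s_single = linkage_update(0.2, 0.7, "single", similarity=True)
s_complete = linkage_update(0.2, 0.7, "complete", similarity=True)
```

Keeping both variants behind one function makes the reversal of the criterion between distance and correlation matrices, noted in section 3.1, explicit.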
The interpretation of the clusters, in contrast to the single-linkage method, can be made only in terms of the relationship within individual clusters; and there is no particularly useful interpretation involving the differences between clusters.

3.2.3  Average Linkage Within the New Group

Instead of relying on extreme values, maximums or minimums, used in single-linkage and complete linkage methods as criteria for grouping entities into clusters, the average linkage method utilizes average values of the measures as a rule for grouping entities or clusters. Two methods employ this criterion: one uses the within group averages and the other compares the between merged group averages. The latter method is discussed in section 3.2.4.

The $d_{ij}$ or $S_{ij}$ entries in the initial similarity matrix may be built as the sum of similarities associated with all pairwise combinations formed by taking one entity from cluster i and the other from cluster j. Prior to merges of any entities, each cluster consists of just one single entity and there is only one such pair of entities for each pair of clusters. Upon the merge of clusters p and q, the sum of pairwise similarities between the new cluster t and some other cluster r becomes:

$$d_{tr} = d_{pr} + d_{qr} \quad \text{for distance measures, or}$$
$$S_{tr} = S_{pr} + S_{qr} \quad \text{for correlation measures,}$$

and the similarity matrix is updated accordingly.

The sum of all pairwise similarities among entities within cluster i, $SUM_i$, becomes:

$$SUM_t = SUM_p + SUM_q + d_{pq}$$

when clusters p and q are merged and the new cluster t is formed. At the same time, the number of entities $N_i$ for cluster i increases accordingly as:

$$N_t = N_p + N_q$$

In searching for the most similar pair, the average within group similarity for the cluster formed by merging the candidate pair of clusters i and j becomes

$$\frac{SUM_i + SUM_j + d_{ij}}{(N_i + N_j)(N_i + N_j - 1)/2}.$$

This average linkage method has not made any reference to the maximum or minimum similarity values, and the interpretation of the resulting dendrogram would need a different approach than that for the single or complete linkage results. However, as a practical matter, this method frequently gives results that differ little from those obtained with the complete linkage method.

3.2.4  Average Linkage Between Merged Groups

This average linkage method uses different average values from those of the above method. Similar to the average linkage within group technique, this method defines distance between groups as the average of the distances between all pairs of individuals in the two groups. The procedure can be used with correlation and distance measures as long as the concept of an average measure is acceptable. The similarity matrix contains $d_{ij}$ (or $S_{ij}$), the sum of similarities associated with all pairwise combinations between clusters i and j. The number of such between group pairwise similarities is the product of $N_i$ and $N_j$, where $N_i$ is the number of entities in cluster i.
The average between group similarity for clusters $i$ and $j$ can be formulated as

$$\frac{d_{ij}}{N_i N_j} \quad \text{or} \quad \frac{S_{ij}}{N_i N_j}.$$

In this method, the sums of within group pairwise similarities are ignored. In reference to the application of this method to clustering with a correlation measure matrix, Lance and Williams (1967) point out that using

$$\cos\left[\frac{\sum \cos^{-1} S_{ij}}{N_i N_j}\right]$$

as a similarity measure would be more appropriate.

3.3  Centroid Methods

These methods merge the clusters with the most similar mean vectors or centroids. Two different approaches were developed: centroid clustering analysis (Sokal and Michener, 1958; King, 1966 and 1967; Lance & Williams, 1967a) and the median method (Gower, 1967; Lance & Williams, 1967a). The first method employs measures weighted according to the number of entities in the formulation of mean vectors, whereas the latter method uses equal weights for the centroids of groups. A unique characteristic of the centroid methods and their variants is that the similarity value associated with the merger of the most similar clusters may rise and fall from stage to stage. This is the reversal phenomenon associated with this approach. These reversals occur because cluster centroids can migrate as mergers take place.

3.3.1  Centroid Method

This method was originally proposed by Sokal and Michener (1958) and King (1966, 1967), who concentrate on the clustering of variables. Groups are depicted to lie in Euclidean space, and are replaced on formation by the coordinates of their centroid. The distance between groups is defined as the distance between the group centroids. The procedure is then to fuse groups according to the distance between their centroids, the groups with the shortest distance being fused first. Lance and Williams (1967a) update the formulation of the distance measure between centroids as:

$$d_{tr} = \frac{N_p}{N_p + N_q}\, d_{pr} + \frac{N_q}{N_p + N_q}\, d_{qr} - \frac{N_p N_q}{(N_p + N_q)^2}\, d_{pq}$$

where $p$ and $q$ are the labels for the clusters just merged, $t$ is the label for the new cluster, and $r$ is any other existing cluster. This equation could be used with any similarity measure for either variables or data units; however, the results would lack a useful interpretation if $d_{ij}$ is not the squared Euclidean distance between the centroids of clusters $i$ and $j$.

3.3.2  Median (Gower) Method

A disadvantage of the Centroid Method is that if the sizes of the two groups to be fused are very different, the centroid of the new group will be very close to that of the larger group and may remain within that group; the characteristics of the smaller group are then virtually lost. The strategy can be made independent of group size by assuming that the groups to be fused are of equal size; the apparent position of the new group will then always be between the two groups to be fused.
In other words, as proposed by Gower (1967), the general idea is that the centroids are weighted equally regardless of how many entities are in the respective clusters. When $d_{ij}$ is a distance function, the updating equation for the Median method is

$$d_{tr} = \tfrac{1}{2}\,(d_{pr} + d_{qr}) - \tfrac{1}{4}\, d_{pq},$$

or

$$S_{tr} = \tfrac{1}{2}\,(S_{pr} + S_{qr}) + \tfrac{1}{4}\,(1 - S_{pq})$$

if $S_{ij}$ is a correlation function.

Although this method could be made suitable for both similarity and distance measures, Lance & Williams (1967a) suggest that it should be regarded as incompatible with correlation measures, since the geometrical representation of the measure cannot be interpreted easily.

3.4  Error Sum of Squares or Variant Methods

Although there are several methods using the error sum of squares as the objective function for clustering entities, they are variations of the method developed by Ward (1963) and Ward and Hook (1963). In this study, only Ward's method is examined.

Ward (1963) proposes that at any stage of an analysis the loss of information which results from the grouping of individuals into clusters can be measured by the total sum of squared deviations of every point from the mean of the cluster to which it belongs. At each step in the analysis, the union of every possible pair of clusters is considered, and the two clusters whose fusion results in the minimum increase in the error sum of squares are combined. In the formulation of this approach, the following are defined and calculated:

$x_{ijk}$ = score on the $i$th of $n$ variables for the $j$th of $m_k$ data units in the $k$th of $h$ clusters

$\bar{x}_{ik} = \sum_{j=1}^{m_k} x_{ijk} / m_k$ = mean on the $i$th variable for data units in the $k$th cluster

$T_{ik} = \sum_{j=1}^{m_k} x_{ijk} = m_k \bar{x}_{ik}$ = total of scores on the $i$th variable for data units in the $k$th cluster

$S_k = \sum_{i=1}^{n} \sum_{j=1}^{m_k} x_{ijk}^2$ = sum of squared scores on all variables for all data units in the $k$th cluster

Then the error sum of squares for cluster $k$ may be written as

$$E_k = S_k - \sum_{i=1}^{n} T_{ik}^2 / m_k$$

The increase in the total error sum of squares due to the merger of clusters $p$ and $q$ to form the new cluster $t$ is

$$\Delta E_{pq} = E_t - E_p - E_q = S_p + S_q - \sum_{i=1}^{n} \frac{(T_{ip} + T_{iq})^2}{m_p + m_q} - E_p - E_q$$

Based on the above formulas, the pair of clusters with the least $\Delta E_{pq}$ is merged to form the new cluster.

Wishart (1969a), in his computer algorithm, indicates that the variables with a large variance have more influence on the joins than those with a small variance. The joining of units and sub-groups is decided on the basis of the contributions to the sum of squared deviations $E_t$. Although the joining of sub-groups $p$ and $q$ may result in a smaller sum of squared deviations, a link between $p$ and $r$ is decided because the increase in the error sum of squares $\Delta E_{pr}$ is less than $\Delta E_{pq}$. This occurrence disrupts the homogeneity of the groups of entities. Alterations to both the sub-group structure and the changes in error sum of squares are required in formulating this algorithm.
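The quantities defined above can be illustrated with a short sketch. This is only a minimal illustration in Python, not the program used in this study; the function names (`cluster_stats`, `error_ss`, `ward_increase`, `best_merge`) and the sample clusters are hypothetical. Each cluster is assumed to be a list of coordinate tuples, and the sketch recomputes $E_t$ from the merged membership rather than updating it incrementally.

```python
def cluster_stats(points):
    """Return (m_k, [T_ik for each variable i], S_k) for one cluster."""
    m = len(points)
    n = len(points[0])
    totals = [sum(p[i] for p in points) for i in range(n)]   # T_ik
    sum_sq = sum(x * x for p in points for x in p)           # S_k
    return m, totals, sum_sq


def error_ss(points):
    """Error sum of squares: E_k = S_k - sum_i T_ik^2 / m_k."""
    m, totals, sum_sq = cluster_stats(points)
    return sum_sq - sum(t * t for t in totals) / m


def ward_increase(p, q):
    """Increase in total error sum of squares: E_t - E_p - E_q, with t = p + q."""
    return error_ss(p + q) - error_ss(p) - error_ss(q)


def best_merge(clusters):
    """Return (increase, a, b) for the pair of clusters with the least increase."""
    best = None
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            inc = ward_increase(clusters[a], clusters[b])
            if best is None or inc < best[0]:
                best = (inc, a, b)
    return best


# Hypothetical two-variable example with three clusters.
clusters = [[(0, 0), (0, 2)], [(10, 0)], [(1, 1)]]
increase, a, b = best_merge(clusters)
```

In this example the first and third clusters are merged, since their union raises the error sum of squares least. The increase equals $m_p m_q / (m_p + m_q)$ times the squared Euclidean distance between the two centroids, which is a convenient check on the arithmetic.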
The Ward method is designed for a similarity matrix of Euclidean distances computed in any chosen representation space. Although this method may or may not give the minimum possible error sum of squares over all possible sets of $h$ clusters from the $m$ data units, the solution is usually very good even if it is not optimal on the criterion.

3.5  Summary

Many of the above mentioned hierarchical clustering methods, using a distance measure between groups as the criterion, can be represented as a recurrence formula for the distance between a group $k$ and a group $(ij)$ formed by the fusion of groups $i$ and $j$. This formula can be written as:

$$d_{k(ij)} = \alpha_i d_{ki} + \alpha_j d_{kj} + \beta d_{ij} + \gamma\, |d_{ki} - d_{kj}|$$

where $d_{ij}$ is the distance between groups $i$ and $j$, and $\alpha_i$, $\alpha_j$, $\beta$ and $\gamma$ are parameters related to the different methods as shown in Table 2.

Single Linkage:    $\alpha_i = \alpha_j = \tfrac{1}{2}$;  $\beta = 0$;  $\gamma = -\tfrac{1}{2}$

Complete Linkage:  $\alpha_i = \alpha_j = \tfrac{1}{2}$;  $\beta = 0$;  $\gamma = \tfrac{1}{2}$

Centroid:          $\alpha_i = n_i/(n_i + n_j)$;  $\alpha_j = n_j/(n_i + n_j)$;  $\beta = -\alpha_i \alpha_j$;  $\gamma = 0$

Median:            $\alpha_i = \alpha_j = \tfrac{1}{2}$;  $\beta = -\tfrac{1}{4}$;  $\gamma = 0$

Ward's Method:     $\alpha_i = (n_k + n_i)/(n_k + n_i + n_j)$;  $\alpha_j = (n_k + n_j)/(n_k + n_i + n_j)$;  $\beta = -n_k/(n_k + n_i + n_j)$;  $\gamma = 0$

Table 2.  Parameter Values of the Recurrence Formula for Five Hierarchical Techniques

This recurrence relationship is given by Lance & Williams (1967a) and by Wishart (1969c), and it is not suitable for methods using correlation measures.

On the whole, hierarchical clustering techniques all have their merits and limitations. There is no single method that would solve all types of clustering problems, and it is necessary for the user to examine the suitability of these techniques for grouping the data sets.

CHAPTER IV

NONHIERARCHICAL CLUSTERING TECHNIQUES

For a data set of n entities, the hierarchical methods give n nested classifications ranging from n clusters of one member each to one cluster of n members. Contrary to this, the nonhierarchical techniques introduced in this chapter are designed to cluster data units into a single classification of k clusters, where k is either specified prior to the procedures or determined as part of the clustering method. These methods may be used with much larger problems than the hierarchical methods because it is not necessary to calculate and store the similarity or distance matrix; it is not even necessary to store the data set. In general, the data units are processed serially and can be read from tape or disk as needed; this characteristic allows clustering of larger collections of data units.

In this study, only three of the nearest centroid sorting methods with a fixed number of clusters are examined in detail.

4.1  Elements of Nonhierarchical Methods

Most nonhierarchical procedures can start with initial partitions or initial seed points of the data units. In using initial partitioning, the algorithms change the cluster memberships into "better" partitions.
The broad concept for these methods is very similar to that underlying the steepest descent algorithms used for unconstrained optimization in nonlinear programming. These methods start with initial seed points and then generate a sequence of moves from one point to another, each giving an improved value of the objective function, until a local optimum is found. The seed points and the initial partition are, therefore, important to the nonhierarchical methods. These initial configurations can be chosen randomly or methodically as discussed in the following sub-sections.

4.1.1  Seed Points

Various approaches are used in choosing a set of seed points that are adopted as cluster nuclei around which the set of n data units can be grouped. Some methods use data units themselves as seed points, whereas others use more sophisticated methodology in arriving at the nuclei. The simpler methods choose (1) k data units randomly from the set (McRae, 1971); (2) the first k data units in the data set (MacQueen, 1967); (3) the data units labeled n/k, 2n/k, ..., (k-1)n/k and n, where the data units are initially tagged as the 1st to the nth data units; or (4) k units from the data set subjectively. More calculated approaches for selecting the seed points use centroids of initial partitions (Forgy, 1965), densities of initial groups (Astrahan, 1970) or mean vectors of the data set (Ball and Hall, 1967).
These approaches, like any other elements of clustering techniques, have a substantially different influence on the results of the clustering procedure.

4.1.2  Initial Partitions

In lieu of seed points, some clustering methods emphasize the generation of an initial partition of the data units into mutually exclusive clusters. However, a set of initial seed points is required to generate the initial partition in some of the partitioning procedures.

Forgy (1965) uses a given set of seed points as the nuclei to initiate a partition formed by assigning the data units to the nearest seed point. The seed points remain stationary throughout the assignment of the full data set and consequently the resulting set of clusters is independent of the sequence in which data units are assigned. In two dimensional problems, these clusters are separated by pairwise linear boundaries which are equidistant from the cluster nuclei.

MacQueen (1967) assigns data units one at a time to the cluster with the nearest centroid, starting from the single point clusters pre-defined by the seed points; the centroids, as the true mean vectors of all the data units in the clusters, are updated as the clusters' sizes grow. In this method the cluster centroids migrate, so the distance between a given data unit and the centroid of a particular cluster may alter widely during the assignment process; as a result, the set of initial clusters is dependent on the order in which data units are assigned.
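The difference between the two partitioning procedures can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the function names (`forgy_partition`, `macqueen_partition`), the use of squared Euclidean distance, and the one-dimensional sample values are all hypothetical and are not taken from Forgy (1965) or MacQueen (1967). The `data` argument is assumed to hold the data units remaining after the seed points have been set aside.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(point, centres):
    """Index of the centre closest to point (ties go to the lowest index)."""
    return min(range(len(centres)), key=lambda c: sq_dist(point, centres[c]))


def forgy_partition(data, seeds):
    """Seed points stay fixed for the whole pass, so data order is irrelevant."""
    return [nearest(x, seeds) for x in data]


def macqueen_partition(data, seeds):
    """Each assignment moves the gaining centroid, so data order matters."""
    centroids = [list(s) for s in seeds]
    sizes = [1] * len(seeds)          # each seed starts as a one-point cluster
    labels = []
    for x in data:
        c = nearest(x, centroids)
        sizes[c] += 1
        # Running-mean update of the centroid of the gaining cluster.
        centroids[c] = [m + (xi - m) / sizes[c] for m, xi in zip(centroids[c], x)]
        labels.append(c)
    return labels


# Hypothetical one-variable example: two seeds and two remaining data units.
seeds = [(0.0,), (10.0,)]
forward = macqueen_partition([(4.0,), (5.5,)], seeds)
backward = macqueen_partition([(5.5,), (4.0,)], seeds)
```

With the seeds held fixed, the unit at 4.0 always joins the first cluster and the unit at 5.5 always joins the second. With migrating centroids, both units join the first cluster when 4.0 is presented first, and both join the second cluster when 5.5 is presented first.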
Wolfe (1970) uses Ward's hierarchical clustering method to provide an initial set of clusters for his algorithm. This approach, however, involves immense computational effort in setting up the partitions, thus limiting the size of the problem tremendously. Similar to Wolfe's approach, Lance and Williams (1967b) suggest using hierarchical methods on one or more subsets of convenient size and then using the resulting groups as nuclei for assignment of the remaining entities.

Random assignment of the partition, of course, is the simplest approach to use. However, this approach would cluster entities without considering their homogeneity and thus is not an attractive alternative.

4.2  Nearest Centroid Sorting With Fixed Number of Clusters

Of all the nonhierarchical clustering techniques, the simplest iterative methods merely consist of two basic processes: (1) a set of seed points is computed as the centroids of a set of clusters, and (2) a set of clusters is constructed by assigning each data unit to the cluster with the nearest seed point. These two processes are repeated alternately until the procedure converges to a stable configuration, a critical condition for completing clustering algorithms.

4.2.1  Convergence Properties

In using nearest centroid sorting algorithms, the convergence of the processes is critical and is expected in grouping data units. Proofs of convergence are mostly rigorous and generally difficult to understand. On the whole, the total within group error sum of squares is the key to the convergence. Referring to the notation used in section 3.4, the total within group error sum of squares E can be formulated as:

$$E = \sum_{k=1}^{h} \sum_{j=1}^{m_k} \sum_{i=1}^{n} (x_{ijk} - \bar{x}_{ik})^2$$

where $\sum_{i=1}^{n} (x_{ijk} - \bar{x}_{ik})^2$ is the squared Euclidean distance between the centroid of cluster $k$ and the $j$th data unit in that cluster.

Another characteristic of the algorithms that ensures convergence is that the number of different ways a data set of n data units may be clustered into h clusters is finite if n is finite. This indicates that any method that generates each partition at most once is finitely convergent, because there are only finitely many different partitions.

The criterion chosen for deciding convergence of the nearest centroid sorting method is the stability of cluster membership; an alternative criterion is the stability of the cluster seed points. In most methods, the seed points are the cluster centroids, which are dependent only on the cluster membership.

4.2.2  Forgy's Method

The simple algorithm suggested by Forgy (1965) consists of basically three steps:

(1)  Start with the desired initial configuration. If the configuration is a set of seed points, go to step 2; otherwise go to step 3.

(2)  Assign data units to the clusters with the nearest seed point. The seed points remain intact for a full cycle through the entire data set.
(3)  Compute new seed points as the centroids of the clusters of data units.

Steps 2 and 3 are repeated alternately until the process converges; that is, iterate until no data units change their cluster membership at step 2.

In view of the various initial configurations (seed points or partitions), it is difficult to estimate how many iterations of the steps are required to achieve convergence in any particular problem. Empirical evidence reveals that five repetitions or less will be sufficient for small problems. A total of nk distance computations and n(k-1) comparisons of distances are required in assigning n data units to k clusters at each repetition of the two steps. A relatively limited number of iterations is actually necessary if the number of clusters is much smaller than the number of data units. This approach allows users to try several variations of the number of clusters at a lower computational cost than for a full hierarchical analysis.

4.2.3  Jancey's Variant

Jancey (1966) suggests a method similar to Forgy's with a modified step 3. The first set of cluster seed points is either given or computed as the centroids of the clusters in the initial partition; at all succeeding stages each new seed point is formed by reflecting the old seed point through the new centroid for the cluster. This technique presumably will accelerate convergence and possibly lead to a better overall solution through bypassing inferior local minima.
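The alternation of steps 2 and 3, together with Jancey's reflected seed points, can be sketched as follows. This Python is a minimal sketch rather than the implementation used in this study; the names (`nearest_centroid_sort`, `reflect`, `max_iter`), the squared Euclidean distance, the rule that an empty cluster keeps its old seed point, and the sample data are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def assign(data, seeds):
    """Step 2: label each data unit with the index of its nearest seed point."""
    return [min(range(len(seeds)), key=lambda c: sq_dist(x, seeds[c])) for x in data]


def centroids_of(data, labels, old_seeds):
    """Step 3: centroid of each cluster; an empty cluster keeps its old seed
    (an assumption, since the text does not cover that case)."""
    result = []
    for c, old in enumerate(old_seeds):
        members = [x for x, lab in zip(data, labels) if lab == c]
        if members:
            dim = len(members[0])
            result.append(tuple(sum(x[i] for x in members) / len(members)
                                for i in range(dim)))
        else:
            result.append(tuple(old))
    return result


def nearest_centroid_sort(data, seeds, reflect=False, max_iter=100):
    """Forgy's method; with reflect=True the seed update follows Jancey's variant."""
    seeds = [tuple(s) for s in seeds]
    labels = assign(data, seeds)
    for _ in range(max_iter):
        cents = centroids_of(data, labels, seeds)
        if reflect:
            # New seed = old seed reflected through the new centroid: 2c - s.
            seeds = [tuple(2 * ci - si for ci, si in zip(c, s))
                     for c, s in zip(cents, seeds)]
        else:
            seeds = cents
        new_labels = assign(data, seeds)
        if new_labels == labels:       # membership is stable: converged
            break
        labels = new_labels
    return labels, seeds


# Hypothetical data: two well separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
forgy_labels, forgy_seeds = nearest_centroid_sort(data, [(0, 0), (1, 0)])
jancey_labels, _ = nearest_centroid_sort(data, [(0, 0), (1, 0)], reflect=True)
```

On this example both variants reach the same membership; the convergence test is the stability of cluster membership, as stated in section 4.2.1.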
This approach, similar to Forgy's, implicitly minimizes the within group error function. The boundaries of the clusters are equidistant from each centroid, and the result of this method is not affected by the sequence of data units within the data set.

4.2.4  Convergent K-Mean Method

Unlike Forgy's or Jancey's approach of assigning members to old centroids, MacQueen (1967) uses a "K-mean" process in allocating each data unit to the cluster with the nearest centroid computed on the basis of the cluster's current membership. The convergent clustering method (Wishart, 1969b; McRae, 1971) using MacQueen's K-mean process can be implemented through the following sequence of steps.

(1)  Begin with an initial partition of the data units into clusters. The partition could be constructed using any of the approaches described in section 4.1.2.

(2)  Take each data unit in sequence and compute the distances to all cluster centroids; if the nearest centroid is not that of the data unit's parent cluster, then reassign the data unit and update the centroids of the losing and gaining clusters.

(3)  Repeat step 2 until convergence is achieved; that is, until the membership in each cluster is stabilized.

4.3  Summary

There are still several nonhierarchical methods described in the literature (MacQueen, 1967; Ball and Hall, 1965). Most other methods use procedures similar to the three described, but have different grouping criteria and updating procedures for the cluster elements.
There are, undoubtedly, limitations and merits in each of the other methods, and since these methods were developed for different fields, results from these techniques on the same set of data are expected to be different.

The three techniques mentioned in the previous sections all have the three distinct procedures of initiating, allocating and reallocating until convergence occurs within the data set. In using these methods to cluster a given data set, the groupings are expected to be relatively similar. Forgy's and the Convergent K-mean methods will probably give similar results because of the resemblance in their procedures. Jancey's method, because of its different procedure in allocating new seed points, could produce unmatched clusters for the same given data set. The suitability of a method for a given data set, therefore, cannot be determined without trying all three methods, perhaps with different seed points or initial partitions and various numbers of clusters. Interpretation and subjective opinion on the results would be the only assets in evaluating the compatibility of these clustering methods.

CHAPTER V

COMPARATIVE EVALUATION OF CLUSTERING TECHNIQUES

Cluster analysis has always been an exploratory tool for generating hypotheses about the data or discerning fundamental facts previously not apparent.
Interpretation of the results using various tools is needed to justify hypotheses or simply to evaluate the suitability of the techniques. This stage of judgement is subjective, intuitive, and heuristic. Comparisons of results generated by different techniques would indicate not only the relationship between entities but also the validity of the desired results. This chapter describes the evaluation process used in this study by examining the appropriateness of each of the twelve clustering methods to group four sets of two-dimensional data, and discusses the results so obtained.

5.1  Approach to the Evaluation Process

The above chapters have stressed that each element of a cluster analysis has its own importance to the actual clustering procedure. Pertinent information on the data set is by far the most critical to the grouping procedures: the number of variables, the variable scales, and the association of the variables of a data unit to each other are the key inputs to a clustering algorithm. The association measure between data units is another essential element of the clustering analysis. The number of clusters, the cluster key and the clustering technique to be used are the other vital elements of a grouping analysis.

The scheduling problem of the Post Office has two unique characteristics: the locations of the boxes are fixed, and the travelling speed of the trucks is standardized to be 15 m.p.h. These two conditions emphasize the need for an efficient technique to assign an appropriate number of call points to each truck. The objective of this study is to find, if possible, such a technique. The comparative evaluation process is, therefore, constructed to investigate the appropriateness of several clustering techniques on data sets, measures and other elements that are pertinent to this scheduling problem.

5.1.1  Data Set

In order to examine the applicability of different clustering methods to data sets of various spatial characteristics, four sets of data pertinent to the Post Office scheduling problem are used. All the data units of these four sets have only two variables, representing the x and y coordinates of a two dimensional Cartesian space. The use of this dimensional space is based on the assumption that the distances between data units are computable by using two variables. Both variables, in this case, are of interval type, and there is no need to use scale conversions for generating conformity of variables. These four data sets can be identified as:

(1)  evenly distributed contrived data;

(2)  unevenly distributed contrived data;

(3)  empirical data for North Burnaby area; and

(4)  empirical data for South Burnaby area.

The first set of data was constructed by arbitrarily allocating 80 data points at random, each representing the locality of a mail box. These data points are fairly evenly distributed, and there is no outstanding grouping which can be spotted visually (Figure 1).

The second set of contrived data is essentially a collection of data points falling into three visually identifiable groups (Figure 2). This set also contains 80 data points, and is designed to test the ability of each technique in outlining the visually feasible boundaries of the three groups.

Both the North Burnaby and the South Burnaby data sets represent the localities of mail boxes in the Municipality of Burnaby. The boundary dividing North and South Burnaby is an arbitrarily set limit to separate the routes of the mail runs. These two sets of data have no visually detectable clusters (Figures 3 and 4), and the present routes of trucks serving these areas do not conform to any grouping system. There are 87 and 113 box locations in North and South Burnaby respectively. The x and y coordinates of these localities are all measured with respect to the grid system adopted by the Post Office.

Figure 1.  Location Map of Evenly Distributed Contrived Data Set (DATA1)

Figure 2.  Location Map of Unevenly Distributed Contrived Data Set (DATA2)

Figure 3.  Location Map of North Burnaby Mail Boxes (NBDATA)

All four sets of data have their own spatial characteristics. The contrived data sets are designed to test the ability of different clustering techniques to outline visually detectable group boundaries. On the other hand, the empirical data sets are used to examine the validity of the cluster methods as a tool for grouping actual non-patterned locations of boxes for the Post Office. Comparisons of the groupings resulting from different techniques on these four data sets would depict the "natural" groupings, if any, inherent in each of these data sets.

5.1.2  Association Measure

The two-dimensional Cartesian data sets tested in this study are actually localities of data points with two unrelated variables. Because the variables of any given data point in the set are mathematically unrelated, the use of a correlation measure as the clustering criterion is ruled out. The other alternative is a distance measure. Among the various types of distance measures, the Euclidean one is probably the most suitable measure for clustering algorithms.

Traditionally, the true distance between two points i and j can be expressed, in terms of their x and y coordinates, as:

$$d_{ij} = k\,\bigl(|x_i - x_j| + |y_i - y_j|\bigr)^{p} \qquad (1)$$

where k and p are derived from analysis of the data set. The values of k and p both reflect the actual distance travelled from point i to point j. A study of the scheduling problem (Tse, 1975) related to the Post Office mail runs indicates that the values of k and p depend on the road pattern and the accessibility from one point to another. It is generally acceptable to use k and p as 1, and formula (1) will become

$$d_{ij} = |x_i - x_j| + |y_i - y_j| \qquad (2)$$

In this study, the latter form is used because:

(1)  the two axes of the city road grid are mostly at right angles to each other;

(2)  most mail boxes are located at the corner of a street where the roads intersect; and

(3)  the travelling distance from one location to another would be the summation of the horizontal distance along the E-W direction and the distance along the N-S direction.

This distance measure is referred to as "City-Block" distance in this study.

5.1.3  Inputs to Clustering Methods

As described in Chapters III and IV, the inputs required by the various hierarchical and nonhierarchical methods differ in many aspects. Basically, the similarity matrix and the data variables are the two major inputs. In this study, the stored matrix approach is used for six hierarchical methods, and data variables are inputs to the other six clustering runs.

The
elements of the symmetric similarity matrices containing distance measures of the four data sets are computed by a small computer program, MATRIX (Appendix A). This program stores the values of the variables of each data point in a set in the computer memory, and the distances, d_ij, from point i to j are computed using formula (2) in the previous section. These distances are later scaled to values less than 1.0 as required by some of the clustering programs. The elements of the lower triangle of the symmetric matrix are then arranged in rows of 10 elements and written into a file for the clustering procedures.

Raw data variables are all punched on cards in predetermined formats for the various clustering algorithms. All these variables are in units of millimeters, conforming to the system used by the Post Office.

5.1.4 The Number of Clusters

The number of clusters is an important key in the nonhierarchical methods used in this study. The four data sets are arbitrarily grouped into either two or three groups. The numbers of clusters for both contrived data sets are predetermined: there are 2 clusters for the evenly distributed data set and 3 clusters for the other. On the contrary, the numbers of clusters for the North and South Burnaby areas are not predetermined; it is only logical to use the present numbers of routes, 2 and 3 for North and South Burnaby respectively, and these data sets are grouped into 2 and 3 clusters accordingly.

There is no requirement to define the number of clusters for hierarchical methods. These methods group data into one single cluster at the final stage, thus it is necessary to predefine the number of groups. In order to compare results of both hierarchical and nonhierarchical methods, the numbers of clusters for hierarchical methods are set equal to those stated in the above paragraph. The groups for each data set are identified from the dendrogram, the tree diagram representing the clustering results.

5.1.5 What to Cluster

Obviously there is no unanimous choice of what to cluster. The key for clustering, in this case, is the data unit. The two totally unrelated variables would make the clustering of variables unfeasible and meaningless. The independence of the variables among data points further discourages the use of variables as the basic cluster unit.

5.1.6 Clustering Techniques

The nine hierarchical and three nonhierarchical clustering methods described in Chapters III and IV are the 12 better known ones and are probably more applicable to distance measures than other complex computational methods. In this study, various computer programs for these clustering methods were used.

Fortran programs for single linkage (using minimum Euclidean distance as criterion), complete linkage, average linkage within new group, average linkage between merged groups, centroid and median methods are all modified from Anderberg's (1973) Appendix E (pp. 275-305). The inputs for these programs include the number of entities and the lower triangle of the symmetric similarity matrix. This set of programs is made up of the main program DRIVER; subroutines CNTRL, CLSTR, MTXIN, TREE, and METHOD; and function LFIND (Appendix B). The use of different versions of METHOD would generate results for the above six methods. The result of each cluster procedure is presented in a horizontal tree diagram (Appendix B). From these tree diagrams, groups are identified and plotted accordingly onto figures.

The UBC:BMDP2M program is basically a single linkage clustering routine. In this program, Engelman and Fu (1970) use either the square root of the sum of squares of differences (Euclidean distance) or the chi-square of the data points as the distance measure. Both these criteria give drastically different results from that of single linkage using simple Euclidean distance as a criterion measure. Data variables are the inputs to this program, and a vertical tree diagram (Appendix C) recording the clustering sequences is output from this computer package program.

Another UBC package program, CGROUP (Patterson and Whitaker, 1973), uses Ward's error sum of squares grouping technique to cluster data points with variables. Similar to the BMDP2M program, the inputs to CGROUP include the set of variables for each data unit and the options for running the program as well as outputting the results. The output contains a detailed sequence of the clustering procedure, the group membership at each step, and a vertical tree diagram. An optional output is the plot of the error sum of squares versus the number of groups (Appendix D).
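The stored-matrix input described above, the number of entities followed by the lower triangle of the symmetric matrix, can be illustrated with a short sketch. The Python below is not the MATRIX program of Appendix A; it is a minimal illustration under stated assumptions. The function names, the sample coordinates and the scaling divisor (slightly above the largest distance) are hypothetical; only the City-Block formula (2), the scaling to values below 1.0 and the rows of 10 elements are taken from Section 5.1.3.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def city_block(p: Point, q: Point) -> float:
    """Formula (2): sum of the absolute x and y differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def lower_triangle(points: Sequence[Point]) -> List[float]:
    """Distances d_ij for i > j, listed row by row (d_21, d_31, d_32, ...)."""
    return [city_block(points[i], points[j])
            for i in range(1, len(points))
            for j in range(i)]


def scale_below_one(values: Sequence[float]) -> List[float]:
    """Scale so that every value is strictly less than 1.0.

    Assumption: divide by a number slightly above the maximum; the text
    only states that the distances are scaled to values less than 1.0.
    """
    top = max(values)
    if top == 0:
        return list(values)
    divisor = top * 1.001
    return [v / divisor for v in values]


def rows_of(values: Sequence[float], width: int = 10) -> List[List[float]]:
    """Split the flattened triangle into rows of `width` elements."""
    return [list(values[k:k + width]) for k in range(0, len(values), width)]


if __name__ == "__main__":
    # Made-up box coordinates, for illustration only.
    boxes = [(0.0, 0.0), (30.0, 40.0), (60.0, 0.0), (60.0, 80.0)]
    triangle = lower_triangle(boxes)
    for row in rows_of(scale_below_one(triangle)):
        print(" ".join(f"{v:.4f}" for v in row))
```

For n entities the triangle holds n(n-1)/2 distances, so the 113 South Burnaby boxes would give 6328 values, or 633 rows of up to 10 elements.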
A program for the three nonhierarchical methods is modified from Anderberg's (1973) Appendix F (pp. 306-325). This program is designed to implement the three nearest centroid sorting techniques described in Chapter IV. Forgy's and Jancey's grouping methods are options in one version of subroutine KMEAN, and the Convergent K-Mean method is implemented in another version of KMEAN. The whole program is composed of DRIVER, the main program, and 3 subroutines: EXEC, RESULT and KMEAN (Appendix E). Either seed points or initial partitions can be used to initiate the clusters in this program. The other inputs for this program include the number of entities, the number of variables for each entity, the number of clusters for this set of data, optional output features and the actual variables for each data point. The output is essentially a list of membership within each resulting cluster. The number of entities moved in the iterative steps is also output (Appendix E).

All the above programs are used to generate different outputs for this study. The single linkage methods (Euclidean, sum of squares of differences and chi-squares) are used just to test the effect of other measures on grouping results. The other nine methods are straightforward implementations of the methods described in Chapters III and IV. The outputs from these trials are then compared, and evaluations of the results of these techniques are discussed in the following sections.

5.2 Tool for Interpretation of Results

The outputs from all the computer programs give various representations of the clustering results. A tree diagram together with the clustering sequence is commonly the hierarchical output. On the other hand, only a membership list is output from nonhierarchical methods. This inconsistent representation of outputs presents a problem in comparing the results effectively.

A tree diagram is actually a very effective tool for interpreting the clustering results. However, if there are more than 50 points in the data set, the tree becomes complex and it is difficult to trace the tree without a step by step follow-up of the sequence at the same time.

A membership list is not useful at all as a tool for interpretation unless frequent referrals to the data unit inputs are made. Multi-dimensional data are therefore very difficult to interpret if the representation space has more than 3 dimensions. The 2-dimensional data used in this case, however, can be plotted onto maps according to the values of the data variables. The results from both hierarchical and nonhierarchical methods, that is, the linking of entities and the grouping lists, can be plotted onto maps of the data units. These plots of the results are the keys to interpretation and comparisons.

5.3 Results

The results from the different clustering techniques, as output by the computer programs, are plotted onto maps for comparisons. The links of entities are plotted as straight lines between points in the graph. The sequential links of all the points as output by the hierarchical programs are charted using the lowest indexed point as the link to another entity or group lead point (the lowest indexed point of the group). This results in diagrams of many, if not confusing, links between data points. Recognizing the last few links among groups or entities in the diagrams allows the user to identify the group boundaries fairly easily.

The group boundaries for nonhierarchical results are easier to handle. The data point index on the graph is identical to that on the membership output list, thus boundaries of the clusters can be easily plotted onto the diagram.

In view of the different spatial characteristics of the four sets of data, the results of the 12 methods for each data set are presented in the following sub-sections for easy identification. The discussion of results is also included in these sub-sections.

5.3.1 Evenly Distributed Contrived Data (DATA1)

A total of 80 data points exists in this data set (Figure 1 and Appendix F).
Different randomly chosen initial seed points and initial partitions were input for the three nonhierarchical methods, and the resulting groupings are identical in all these trials (Table 3, Figures 14 and 15).

Table 3. Summary of Nonhierarchical Runs for DATA1

                             Group sizes
                       Trial 1     Trial 2     Trial 3
Group                  1     2     1     2     1     2
Seed Points           26    59    20    66    15    [illegible]
  Jancey's            36    44    44    36    36    [illegible]
  Forgy's             36    44    44    36    36    [illegible]
  Convergent K-mean   36    44    44    36    36    [illegible]
Initial Partition     40    40    35    45    30    [illegible]
  Jancey's            44    36    36    44    36    [illegible]
  Forgy's             44    36    36    44    36    [illegible]
  Convergent K-mean   44    36    36    44    36    [illegible]

The results from the different clustering techniques are plotted as shown in Figures 5 to 15. Undoubtedly the different methods used give various results in the number of entities within the groups and the membership lists of these groups. Table 4 summarizes the number of entities for the two groups for this data set. The lists of memberships of the two groups resulting from each method are included in Appendix G.

Table 4. Results of 12 Clustering Methods for DATA1

                                          Group sizes
Methods                                   Group 1   Group 2
Hierarchical
  Single Linkage
    City-Block                              67        13
    Euclidean Distance                      67        13
    Chi-Squares                             22        58
  Complete Linkage                          35        45
  Avg. Linkage between Merged Group         27        53
  Avg. Linkage within New Group             27        53
  Centroid Method                           49        31
  Median Method                             49        31
  Ward's Method                             43        37
Nonhierarchical
  Jancey's Method                           36        44
  Forgy's Method                            36        44
  Convergent K-mean Method                  36        44

[Figure 5. Linkages Outlined by Single Linkage - "City-Block" Method for DATA1]
[Figure: Linkages Outlined by Single Linkage - Euclidean Distance Method for DATA1]
[Figure: Linkages Outlined by Single Linkage - Chi Squares Method for DATA1]
[Figure 9. Linkages Outlined by Avg. Linkage between Merged Groups Method for DATA1]
[Figure 11. Linkages Outlined by Centroid Method for DATA1]
[Figure 12. Linkages Outlined by Median Method for DATA1]
[Figure 13. Linkages Outlined by Ward's Method for DATA1]
[Figure 14. Group Boundaries Defined by 3 Nonhierarchical Methods Using Seed Points as Inputs for DATA1]
[Figure 15. Group Boundaries Defined by 3 Nonhierarchical Methods Using Initial Partitions for DATA1]

The results presented in the diagrams and tables for this evenly distributed contrived data set give various group sizes as well as group memberships. These differences in results are the end products of different clustering criteria and algorithms. Among the hierarchical methods, the number of entities in groups 1 and 2 is perhaps best balanced in Ward's method (group sizes of 43 and 37 respectively). This, however, does not necessarily indicate that Ward's error sum of squares method is the best for grouping an evenly distributed data set. The evaluation approach in section 5.5 gives a more comprehensive judgement on the superiority of the different clustering methods. It is interesting to notice that the Centroid and Median methods both have the same clustering sequences and the same group members in the two defined clusters. The slight variations of the group members in the two average linkage methods reflect the similarity of these two algorithms. The same applies to the single linkage methods using "City-Block" and Euclidean distance as measures. The Chi-square single linkage method results, however, do not resemble those of any other hierarchical method, and it is doubtful that this method is suitable for clustering this evenly distributed data set.
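The nonhierarchical runs of Table 3 start from different seed points yet arrive at the same two groups, with only the group labels exchanged (36 and 44 against 44 and 36). A minimal sketch of nearest centroid sorting in the style of Forgy's method shows how this can happen. It is not the KMEAN subroutine of Appendix E: the function names and the sample coordinates are hypothetical, and the use of squared Euclidean distance for the allocation step is an assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest(p: Point, centres: Sequence[Point]) -> int:
    """Index of the centre closest to p (squared Euclidean distance, assumed)."""
    d2 = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centres]
    return d2.index(min(d2))


def centroid(members: Sequence[Point]) -> Point:
    """Mean x and mean y of a non-empty group."""
    n = len(members)
    return (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)


def forgy(points: Sequence[Point], seeds: Sequence[Point],
          max_iter: int = 100) -> List[int]:
    """Allocate every point to its nearest centre, recompute the centroids,
    and repeat until no point changes group. Returns one label per point."""
    centres = list(seeds)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(p, centres) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for g in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # an empty group keeps its previous centre
                centres[g] = centroid(members)
    return labels


def group_sizes(labels: Sequence[int], k: int) -> List[int]:
    """Number of points carrying each label 0..k-1."""
    return [sum(1 for lab in labels if lab == g) for g in range(k)]


if __name__ == "__main__":
    # Made-up coordinates: four points on the left, three on the right.
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
           (10.0, 0.0), (11.0, 0.0), (10.0, 1.0)]
    print(group_sizes(forgy(pts, [(0.0, 0.0), (11.0, 0.0)]), 2))
    print(group_sizes(forgy(pts, [(10.0, 1.0), (1.0, 1.0)]), 2))
```

In this toy case the two seed choices recover the same partition; only the order in which the groups are numbered follows the order of the seeds.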
All the nonhierarchical results are identical, and they are fairly evenly balanced in group sizes. These nonhierarchical methods have a unique characteristic that distinguishes them from the hierarchical ones: the clusters are grouped laterally instead of radially as defined by most hierarchical methods. This characteristic would definitely affect the within-group inter-unit travel times and distances; the distribution of distances between units would also be different from that of the other methods. These results, however, do not indicate the superiority of one class of method over the other. Further evaluation of these methods is included in section 5.5.

5.3.2 Unevenly Distributed Contrived Data (DATA2)

This data set has 80 data units located in 3 visually distinguishable groupings (Figure 2 and Appendix F). Similar to the trials carried out for the evenly distributed data set, trials using different seed points and initial partitions for the nonhierarchical methods were conducted. The results of these trials differ slightly with different seed points or initial partitions (Table 5 and Figures 25, 26).
This is probably the result of the inclusion of in-between data points for the three visually identifiable clusters. The variance of group sizes is relatively small, but it indicates the importance of the initial seed points or centroids in using nonhierarchical methods. Resembling the results for DATA1, the clusters defined by these methods extend laterally instead of radially as in hierarchical methods.

Table 5. Summary of Nonhierarchical Runs for DATA2

                                    Group sizes
                         Trial 1          Trial 2          Trial 3
Group                   1    2    3      1    2    3      1    2    3
Seed Points             8   35   67     10   40   60      7   30   70
  Jancey's             16   25   39     23   34   23     23   23   34
  Forgy's              23   29   28     16   41   23     15   29   35
  Convergent K-mean    23   23   34     16   41   23     16   29   35
Initial Partition      20   34   26     27   27   26     23   23   34
  Jancey's             15   25   39     23   23   34     16   39   25
  Forgy's              23   23   34     23   23   34     23   34   23
  Convergent K-mean    23   23   34     23   23   34     23   34   23

The results from the nine hierarchical methods differ greatly from one another (Table 6 and Figures 16-24). In this set of results, the Centroid and Median methods have identical clustering sequences as well as membership. Contrary to the results for the evenly distributed data set, the single linkage method using the "City-Block" measure does not resemble any other linkage method or clustering method. The existence of a 1-member group among the 3 clusters in this method leads to the conclusion that the single linkage - "City-Block" method is inappropriate for grouping an unevenly distributed data set.
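The 1-member group noted above is a familiar behaviour of single linkage: groups are merged on their closest pair of members, so closely spaced points chain together while an isolated point is left on its own until the last links. The following is a minimal sketch of that effect with City-Block distances, not the Anderberg-derived program of Appendix B. The function names and coordinates are hypothetical; each group is labelled by its lowest indexed member, following the lead-point convention of Section 5.3.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def city_block(p: Point, q: Point) -> float:
    """Formula (2): sum of the absolute x and y differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def single_linkage(points: Sequence[Point], k: int) -> List[int]:
    """Merge the two groups with the closest pair of members until k groups
    remain. Each point is labelled with the lowest index in its group."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # All pairs in ascending order of City-Block distance.
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda ij: city_block(points[ij[0]], points[ij[1]]))
    groups = len(points)
    for i, j in pairs:
        if groups == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)  # lowest index leads the group
            groups -= 1
    return [find(i) for i in range(len(points))]


if __name__ == "__main__":
    # Made-up data: a chain of 7 points, one isolated point, one close pair.
    pts = [(float(x), 0.0) for x in range(0, 14, 2)]
    pts += [(0.0, 9.0), (20.0, 0.0), (22.0, 0.0)]
    print(single_linkage(pts, 3))
```

With three groups requested, the chain of seven points forms one group, the close pair forms another, and the isolated point remains a group of one.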
The other two single linkage approaches, on the other hand, give identical memberships to the 3 groups, though the sequences of grouping are different. The average linkage methods also give identical memberships as well as grouping sequences to the 3 groups. The intended group sizes are 19, 34 and 27, and the results from the complete linkage method are identical to these predetermined group sizes. The average linkage methods also give results very similar to these group sizes.

Table 6. Results of 12 Clustering Methods for DATA2

                                          Group sizes
Methods                                   Group 1   Group 2   Group 3
Hierarchical
  Single Linkage
    City-Block                              53         1        26
    Euclidean Distance                      17        25        38
    Chi-Squares                             17        25        38
  Complete Linkage                          19        34        27
  Avg. Linkage between Merged Group         20        33        27
  Avg. Linkage within New Group             20        33        27
  Centroid Method                           20        37        23
  Median Method                             20        37        23
  Ward's Method                             42        14        24
Nonhierarchical
  Jancey's Method                           23        23        34
  Forgy's Method                            16        29        35
  Convergent K-mean Method                  16        29        35

[Figure 16. Linkages Outlined by Single Linkage - "City-Block" Method for DATA2]
[Figure 17. Linkages Outlined by Single Linkage - Euclidean Distance Method for DATA2]
[Figure 20. Linkages Outlined by Avg. Linkage between Merged Groups Method for DATA2]
[Figure 21. Linkages Outlined by Avg. Linkage within New Group Method for DATA2]
[Figure 22. Linkages Outlined by Centroid Method for DATA2]
[Figure 23. Linkages Outlined by Median Method for DATA2]
[Figure 24. Linkages Outlined by Ward's Method for DATA2]
[Figure 25. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Seed Points as Inputs for DATA2]
[Figure 26. Group Boundaries Defined by Jancey's Method Using Initial Partitions for DATA2]

On the whole, it is clear that visually identifiable groups in an unevenly distributed data set can be detected more easily than those of an evenly distributed data set. Further evaluations are discussed in section 5.5.

5.3.3 North Burnaby Empirical Data (NBDATA)

There are a total of 87 box locations for two routes in the North Burnaby area (Figure 3 and Appendix F). These locations are situated in an area of approximately 34 square miles. This scattered, ungrouped data set has no visually identifiable clusters. These points are presently routed as two groups of 46 and 41 respectively (Figure 27), and these routes have little implication on the desired grouping of the box locations.

The hierarchical and nonhierarchical results are presented in Table 7 and Figures 28-38. The group sizes vary from 3 to 39 for group 1 and from 48 to 84 for the other. The two dissimilar group sizes in the five hierarchical methods (single linkage with "City-Block" and Euclidean distance; average linkage between merged groups; and the centroid methods) show that a potential cluster (points 1, 2 and 3) located at a distance from the rest of the locations would temper the effectiveness of these clustering algorithms. The only method
that groups localities into clusters of 39 and 48 members respectively is the average linkage within the new group algorithm. This, however, does not indicate that this method is most suitable for this set of data. The evaluation approach in latter sections would give an in-depth examination of these results.

Table 7. Results of 12 Clustering Methods for NBDATA

                                          Group sizes
Methods                                   Group 1   Group 2
Hierarchical
  Single Linkage
    City-Block                               3        84
    Euclidean Distance                       3        84
    Chi-Squares                             29        58
  Complete Linkage                          37        50
  Avg. Linkage between Merged Group          3        84
  Avg. Linkage within New Group             39        48
  Centroid Method                           12        75
  Median Method                             12        75
  Ward's Method                             35        52
Nonhierarchical
  Jancey's Method                           35        52
  Forgy's Method                            31        56
  Convergent K-mean Method                  31        56

[Figure 28. Linkages Outlined by Single Linkage - "City-Block" Method for NBDATA (scale 1" = 0.62 mile)]
[Figure 29. Linkages Outlined by Single Linkage - Euclidean Distance Method for NBDATA]
[Figure 30. Linkages Outlined by Single Linkage - Chi Squares Method for NBDATA (scale 1" = 0.62 mile)]
[Figure 32. Linkages Outlined by Avg. Linkage between Merged Groups Method for NBDATA (scale 1" = 0.62 mile)]
[Figure 37. Group Boundaries Defined by Jancey's Method Using Seed Points as Inputs for NBDATA]
[Figure 38. Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Initial Partition for NBDATA (scale 1" = 0.62 mile)]

Three sets of seed points and
i n i t i a l partitions  i n i n i t i a l c l u s t e r on  There are i n d i c a t i o n s t h a t i n i t i a l  parti-  t i o n s have l e s s i n f l u e n c e than seed p o i n t s on the data These v a r i a n c e s  i n r e s u l t i n g group s i z e s , however, do  i n d i c a t e the s u p e r i o r i t y o f one the  set. not  n o n h i e r a r c h i c a l method over  others.  5.3.4  South Burnaby E m p i r i c a l Data T h i s s e t of data d e p i c t s the  boxes which are s e r v i c e d by South Burnaby a r e a . t h a t are v i s u a l l y this set.  (SBDATA) l o c a t i o n s of 113  3 truck routes  There i s no d i s t i n c t  identifiable  T h i s probably  used p r e v i o u s l y f o r the other grouping t h i s data  set.  (Figure 39)  mail i n the  group boundaries  ( F i g u r e 4 and Appendix F) f o r  represents  of m a i l boxes i n other a r e a s .  for  the  The  the t y p i c a l  distribution  12 c l u s t e r i n g methods  t h r e e s e t s are a l s o employed  Group S i z e s Trial  1  Group  2  3  1  2  1  2  -1  2  5  62  10  70  15  55  Jancey's  34  53  35  52  34  53  Forgy's  35  52  31  56  31  56  35  52  31  56  31  56  40  47  43  44  35  52  Jancey's  35  52  34  53  35  52  Forgy's  35  52  35  52  35  52  Convergent K-mean  35  52  35  52  35  52  Seed Points Methods  Convergent . K-mean  Initial Partition Methods  Table  8.  Summary o f N o n h i e r a r c h i c a l R u n s  f o r NBDATA  Ill  Scale  Figure  39.  Present S t r e e t  1  M  L e t t e r Box C o l l e c t i o n  Routes of South Burnaby A r e a  —  0.62  aile  The r e s u l t s of these p l o t t e d i n Table '" C i t y 111.  - Block  9  and F i g u r e s 40-50.  The s i n g l e l i n k a g e -  " . method i d e n t i f i e s group s i z e s of 1, 1 and  These group s i z e s a r e t o t a l l y unacceptable  s c h e d u l i n g problems. unbalanced r e s u l t s . 
and  12 methods a r e t a b u l a t e d and  The c e n t r o i d methods have s i m i l a r The complete l i n k a g e , the Ward's method  the three n o n h i e r a r c h i c a l a l g o r i t h m s  balanced  as a i d s t o  give  number of members i n each group.  reasonably  However, a l l the  r e s u l t a n t group s i z e s from these c l u s t e r i n g methods do not resemble the a c t u a l number o f c a l l p o i n t s 40 and 36). cal  (group s i z e s 38,  A more e x t e n s i v e e v a l u a t i o n examining the c r i t i -  group s i z e s w i l l be d i s c u s s e d  i n S e c t i o n 5.5.  The n o n h i e r a r c h i c a l methods' r e s u l t s presented i n Table  9  a r e the more d i s t i n c t i v e group s i z e s of s i x t r i a l s  u s i n g d i f f e r e n t seed p o i n t s and p a r t i t i o n s trials  5.4  show l i t t l e  (Table  10). These  s i g n i f i c a n t d i f f e r e n c e s i n the group s i z e s .  Tools f o r Evaluation Two methods were developed t o e v a l u a t e  r e s u l t i n g from v a r i o u s techniques  the groups  on the f o u r s e t s of d a t a .  Both methods a r e i n c l u d e d i n the computer program ROUTE (Appendix H) doing  s t a t i s t i c s and r o u t i n g on the groups  l i n e d by the d i f f e r e n t c l u s t e r i n g methods.  out-  Group Sizes 1  1  3  C i t y - Block  1  1  111  Euclidean Distance  21  26  66  Chi-Squares  34  36  43  Complete Linkage  20  45  48  Avg. Linkage between Merged Group  21  26  66  Avg. Linkage within New Group  22  36  55  Centroid Method  2  18  93  Median Method  2  18  93  Ward* s Method  45  28  40  Jancey's Method  34  45  34  Forgy's Method  33  46  34  Convergent K-mean Method  33  46  34  Group Methods Hierarchical Single Linkage  Nonhierarchical  T a b l e 9.  
Results of 12 Clustering Methods for SBDATA

                              Trial 1                 Trial 2                 Trial 3
                       Group 1 Group 2 Group 3 Group 1 Group 2 Group 3 Group 1 Group 2 Group 3

Seed Points                 20      44      87      15      40      93      20      39      92
Methods
  Jancey's                  35      44      34      34      45      34      34      45      34
  Forgy's                   33      46      34      33      46      34      34      49      30
  Convergent K-mean         33      46      34      34      49      30      34      49      30

Initial Partition           35      40      38      38      38      37      35      44      34
Methods
  Jancey's                  34      45      34      35      44      34      33      46      34
  Forgy's                   36      46      31      36      47      30      36      46      31
  Convergent K-mean         35      44      34      35      44      34      35      44      34

Table 10. Summary of Nonhierarchical Runs for SBDATA

Figure 40. Linkages Outlined by Single Linkage-"City-Block" Method for SBDATA (Scale: 1" = 0.62 mile)
Figure 41. Linkages Outlined by Single Linkage-Euclidean Distance Method for SBDATA
Figure 42. Linkages Outlined by Single Linkage-Chi-Squares Method for SBDATA (Scale: 1" = 0.62 mile)
Figure 44. Linkages Outlined by Avg. Linkage between Merged Groups Method for SBDATA (Scale: 1" = 0.62 mile)
Figure 47. Linkages Outlined by Median Method for SBDATA (Scale: 1" = 0.62 mile)
Figure 48. Linkages Outlined by Ward's Method for SBDATA (Scale: 1" = 0.62 mile). Legend: Box Location, Cluster Link, 3rd Last Link, 2nd Last Link.
Figure 49. Group Boundaries Defined by Jancey's Method Using Seed Points as Inputs for SBDATA. Legend: Box Location, Seed Point, Initial Partition, Centroid, Group Boundary.
Figure 50.
Group Boundaries Defined by Forgy's and Convergent K-mean Methods Using Initial Partitions for SBDATA (Scale: 1" = 0.62 mile)

The first part of the program records the data points, calculates the distances among the data points within each group, computes the mean and standard deviation of these distances, and plots a histogram of their distribution (Appendix H). The distributions of distances within the groups are plotted because they are related to the route distance within each group. Christofides (1969) indicates that the associated distance for an optimized truck route (D_Q) is a function of the radial distances (D_i) and the average value of the maximum number of customers that can be serviced by one route (C). These statistics are then compared and used as an evaluation key for the methods.

The second part of the program was modified from ROUTPLOT, a computer program developed for plotting routes using maximum distance saving as the criterion (Chance et al, 1975). This routine gives the travelling distances, the times required and the order of call points for the maximum saving route within the defined groups (Appendix H). Since timing and travelling distances are critical in the scheduling problems of the Post Office, these outputs are useful in judging the usefulness of the clustering methods.
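As a rough illustration of the first part of ROUTE described above, the following Python sketch computes the inter-point distances within one group, their mean and standard deviation, and a text histogram of their distribution. It is not the ROUTE program itself (which is listed in Appendix H): the function names, the use of Euclidean distance, the sample standard deviation, the number of histogram intervals and the example coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def within_group_distances(points):
    """Return the distance between every pair of points in one group.

    `points` is a list of (x, y) coordinates in feet. Euclidean distance
    is an assumption; the text does not state the metric used by ROUTE.
    """
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def summarize(distances):
    """Mean and sample standard deviation of the pairwise distances."""
    return mean(distances), stdev(distances)


def histogram(distances, n_intervals=10):
    """Count the distances falling in equal-width intervals.

    Returns a list of (interval_start, count) pairs. The number of
    intervals is a hypothetical parameter.
    """
    low, high = min(distances), max(distances)
    width = (high - low) / n_intervals or 1.0  # guard against zero width
    counts = [0] * n_intervals
    for d in distances:
        # Clamp so that the largest distance falls in the last interval.
        index = min(int((d - low) / width), n_intervals - 1)
        counts[index] += 1
    return [(low + i * width, c) for i, c in enumerate(counts)]


def print_histogram(bins):
    """Print one row per interval: start (feet), frequency, asterisks."""
    for start, count in bins:
        print(f"{start:10.2f} {count:4d} {'*' * count}")


if __name__ == "__main__":
    # Hypothetical group of four mail boxes (coordinates in feet).
    group = [(0, 0), (3000, 0), (3000, 4000), (0, 4000)]
    d = within_group_distances(group)
    m, s = summarize(d)
    print(f"mean = {m:.1f} ft, std. deviation = {s:.1f} ft")
    print_histogram(histogram(d, n_intervals=4))
```

Running the same three functions once per group, for each clustering method, would give the kind of per-group means, standard deviations and distance distributions that are compared in Section 5.5.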
5.5 Evaluation

The discussion of the results in the previous sections has indicated that subjective evaluation of the results needs to be supported by some objective measures. The last section introduced two appropriate tools for evaluating the group sizes quantitatively. Since each of the four sets of data has spatial characteristics of its own, it is best to evaluate the methods for each data set separately. In the following subsections, the methods will be evaluated using the outputs from the two analysing tools.

5.5.1 Evenly Distributed Contrived Data (DATA1)

As discussed in Section 5.3.1, the complete linkage, the Ward's and the three nonhierarchical methods each group this data set into two groups of comparable sizes (Table 4). An examination of the plots of the groups defined by these methods (Figures 9 and 13-15), however, does not indicate any superiority of one method over the others. Both the radially grouping linkage of the hierarchical methods and the laterally grouping of the nonhierarchical methods have their own distinct boundaries. These groupings of comparable sizes are indifferent from a theoretical standpoint, but in an actual case, if there are restrictions imposed on the travelling directions, then the resulting group boundaries would be critical to the scheduling problem.

The outputs from the program ROUTE for this set of data as grouped by the various methods are summarized in Tables 11 and 12.
These outputs are affected by the group sizes and the distribution of the data units within each group. Contrary to the group size comparisons, the complete linkage groupings do not have comparable means of distances or travel times and distances. This is the result of grouping scattered data units into clusters. The Ward's method groupings, though they give fairly similar means, do not have similar travelling distances or timings because of the uneven scattering of data units within the defined groups. The three nonhierarchical methods give identical results because of the similarity of their groupings. The travelling distances and times for the groupings of these methods, in this case, vary slightly and are the best matches among the 12 sets of results.

                                       Mean (feet)       Std. Deviation
Methods                               Group 1  Group 2  Group 1  Group 2

Hierarchical
  Single Linkage
    "City-Block"                        20332     6791    10629     3413
    Euclidean Distance                  20332     6791    10629     3413
    Chi-Squares                         13281    19494     6499    10577
  Complete Linkage                      13926    16008     6757     7863
  Avg. Linkage between Merged Group     16747    12115     8178     5977
  Avg. Linkage within New Group         16747    12115     8178     5977
  Centroid Method                       16233    12800     7973     6008
  Median Method                         16233    12800     7973     6008
  Ward's Method                         15574    15004     7962     7671
Nonhierarchical
  Jancey's Method                       18396    17811    10240    10353
  Forgy's Method                        18396    17811    10240    10353
  Convergent K-mean Method              18396    17811    10240    10353

Table 11. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for DATA1

                                     Travel Time (min) Travel Dist. (mi)  Total Time (min)
Methods                               Group 1  Group 2  Group 1  Group 2  Group 1  Group 2

Hierarchical
  Single Linkage
    "City-Block"                       202.82    27.92    50.70     6.98   263.12    39.62
    Euclidean Distance                 202.82    27.92    50.70     6.98   263.12    39.62
    Chi-Squares                         77.07   168.93    19.27    42.23    96.87   221.13
  Complete Linkage                     107.85   136.96    26.96    34.24   139.35   177.46
  Avg. Linkage between Merged Group    153.19    78.02    40.44    19.51   199.09   102.32
  Avg. Linkage within New Group        153.19    78.02    40.44    19.51   199.09   102.32
  Centroid Method                      145.31   127.16    36.33    24.82   189.41   127.16
  Median Method                        145.31   127.16    36.33    24.82   189.41   127.16
  Ward's Method                        137.20   113.58    34.20    28.39   175.90   146.88
Nonhierarchical
  Jancey's Method                      120.50   118.83    30.12    29.71   152.90   158.43
  Forgy's Method                       120.50   118.83    30.12    29.71   152.90   158.43
  Convergent K-mean                    120.50   118.83    30.12    29.71   152.90   158.43

Notes: travel time and travel distance are measured from the first to the last box; total time includes stopping time.

Table 12. Travel Times and Distances of Groups Defined by 12 Cluster Methods for DATA1

[Figures 51-53: histograms of the distribution of distances within Groups 1 and 2 (intervals in feet versus frequency). Only one caption is recoverable:]
Figure 52. Distribution of Distances Within Groups Defined by Ward's Method for DATA1

The distribution patterns of the inter-data unit distances within groups are critical in the determination of travel distances and times. The patterns of the groups of the above five clustering methods, as shown in Figures 51-53, definitely reflect the similarity between the two groups defined by these methods. The patterns of the two groups defined by the complete linkage method are distinctly dissimilar. The Ward's patterns have a better resemblance, whereas those of the nonhierarchical methods are almost identical. The degree of resemblance not only affects the means and standard deviations of the distances, but also influences the travel distances and times.
Based on the above evaluations, the nonhierarchical methods' groupings definitely show better correlations in means, travel distances and times, and distribution patterns, although there is little similarity in the group sizes. If there were travel restrictions on lateral movements, then the Ward's method would probably be the best choice among these methods. On the whole, this quantitative evaluation of the grouping results indicates that the five methods mentioned in this sub-section are favoured for this set of data, based on the assumption that there are no restrictions in travel directions.

5.5.2 Unevenly Distributed Contrived Data (DATA2)

                                            Mean (feet)             Standard Deviation
Methods                               Group 1  Group 2  Group 3  Group 1  Group 2  Group 3

Hierarchical
  Single Linkage
    "City-Block"                         N.A.     N.A.     N.A.     N.A.     N.A.     N.A.
    Euclidean Distance                   5252     6471    14692     2414     3375     9592
    Chi-Squares                          5252     6471    14692     2414     3375     9592
  Complete Linkage                       6085     8237     8434     3131     4281     4574
  Avg. Linkage between Merged Group      6642     8124     8434     3672     4285     4575
  Avg. Linkage within New Group          6642     8124     8434     3672     4285     4575
  Centroid Method                        6642     9358     6754     3672     5040     3273
  Median Method                          6642     9358     6754     3672     5040     3273
  Ward's Method                         11668     7569     7033     7257     4573     3468
Nonhierarchical
  Jancey's Method                       10424    14040    13645     8433     9152     8625
  Forgy's Method                         6955    13900    14644     4730     8609     9671
  Convergent K-mean                      6955    13900    14644     4730     8609     9671

Table 13. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for DATA2

                                        Travel Time (min)       Travel Dist. (mi)       Total Time (min)
Methods                              Group 1 Group 2 Group 3 Group 1 Group 2 Group 3 Group 1 Group 2 Group 3

Hierarchical
  Single Linkage
    "City-Block"                        N.A.    N.A.    N.A.    N.A.    N.A.    N.A.    N.A.    N.A.    N.A.
    Euclidean Distance                 25.29   34.85   74.45    6.32    8.71   18.61   40.59   57.34  108.65
    Chi-Squares                        25.29   34.85   74.45    6.32    8.71   18.61   40.59   57.34  108.65
  Complete Linkage                     30.06   50.58   50.82    7.52   12.65   12.71   47.16   81.18   75.12
  Avg. Linkage between Merged Group    32.45   48.20   50.82    8.11   12.05   12.71   50.45   77.90   75.12
  Avg. Linkage within New Group        32.45   48.20   50.82    8.11   12.05   12.71   50.45   77.90   75.12
  Centroid Method                      32.45   65.38   40.09    8.11   16.34   10.02   50.45   98.68   60.79
  Median Method                        32.45   65.38   40.09    8.11   16.34   10.02   50.45   98.68   60.79
  Ward's Method                        69.67   26.72   42.47   17.42    6.68   10.62  107.47   39.32   64.07
Nonhierarchical
  Jancey's Method                      45.27   60.61   69.67   11.39   15.15   17.42   66.27   81.31  100.27
  Forgy's Method                       31.97   66.81   67.05    7.99   16.70   16.76   46.37   92.91   98.55
  Convergent K-mean                    31.97   66.81   67.05    7.99   16.70   16.76   46.37   92.91   98.55

Notes: travel time and travel distance are measured from the first to the last box; total time includes stop time.

Table 14. Travel Distances and Times of Groups Defined by 12 Cluster Methods for DATA2

The plots of the results of the 12 clustering methods indicate that the complete linkage and the two average linkage methods give results resembling the intended group sizes (19, 34 and 27). A study of the summarized results (Tables 13 and 14) from the program ROUTE for these sets of different group sizes and memberships also indicates that these three methods have very similar means of distances and travel times. This is probably the result of the varying density of data units within each group. Group 1, the smaller and more scattered cluster, has fewer stops but longer distances between stops in travelling from one location to another.
Groups 2 and 3 have denser populations within the groups, thus requiring more stops in traversing the shorter inter-point distances. These characteristics are reflected in the distance distribution patterns (Figures 54-55) of the above three linkage methods.

The intended group sizes for the three clusters are not of a comparable number of members, and this makes the evaluation of these results difficult. Undoubtedly, the evaluation of these outcomes must be subjective. If similar travel distances among the three groups are required, then the Jancey's nonhierarchical method has groups of this nature. On the other hand, if both the travel times and the means of distances are to be comparable, then the complete linkage and the two average linkage methods would be more appropriate in grouping this set of data.

One definite conclusion that can be drawn in the evaluation of these clustering techniques, however, is that single linkage using the "City-Block" measure is not an appropriate method for this set of data. Among the other seven methods, the two centroid methods also have group sizes similar to the intended ones. However, the results from ROUTE for the groupings of these methods do not indicate any distinct characteristics that are superior to those of the three linkage methods and the Jancey's method.
The other five methods neither group the data units into clusters with sizes resembling the intended groupings nor have comparable means of distances and travel distances within the defined groups.

On the whole, there are methods that are more suitable for grouping this set of data. The superiority of any method, in this case, cannot be judged solely by the comparisons of the means of distances, travel distances and times; it requires subjective evaluation. This subjective decision process involves the consideration of appropriate group sizes, the weighing of the quantitative measures and, most important of all, the experience of the user.

[Figures 54-55: histograms of the distribution of distances within Groups 1, 2 and 3 (intervals in feet versus frequency). Only one caption is recoverable:]
Figure 55. Distribution of Distances Within Groups Defined by Avg. Linkage Methods for DATA2

5.5.3 North Burnaby Empirical Data (NBDATA)

The complete linkage, the average linkage within the new group, the Ward's and the Jancey's methods all have similar group sizes though different memberships. The uneven distribution of data points, especially points 1, 2 and 3, in this set not only discourages the use of group sizes as an evaluation criterion, it also indicates the importance of the distribution pattern in the judgement of appropriate groupings.

The application of the evaluation tools to the 12 sets of groupings defined by the different clustering methods on this data set indicates that there is conflicting evidence on the superiority of one method over the others (Tables 15 and 16).

                                       Mean (feet)       Std. Deviation
Methods                               Group 1  Group 2  Group 1  Group 2

Hierarchical
  Single Linkage
    "City-Block"                          944    11637      476     7241
    Euclidean Distance                    944    11637      476     7241
    Chi-Squares                          9562     7705     5875     4112
  Complete Linkage                       9824     6501     5609     3215
  Avg. Linkage between Merged Group       944    11637      476     7241
  Avg. Linkage within New Group          8614    10087     8105     6125
  Centroid Method                        9489     9531     7226     5293
  Median Method                          9489     9531     7226     5293
  Ward's Method                         10269     7622     6187     4201
Nonhierarchical
  Jancey's Method                        9749     6688     5553     3332
  Forgy's Method                         9576     7091     5487     3588
  Convergent K-mean Method               9576     7091     5487     3588

Table 15. Means and Standard Deviations of Groups Defined by 12 Cluster
Methods for NBDATA

                                     Travel Time (min) Travel Dist. (mi)  Total Time (min)
Methods                               Group 1  Group 2  Group 1  Group 2  Group 1  Group 2

Hierarchical
  Single Linkage
    "City-Block"                         1.00   159.91     0.25    39.98     3.70   235.51
    Euclidean Distance                   1.00   159.91     0.25    39.98     3.70   235.51
    Chi-Squares                         50.06    63.25    12.51    15.81    76.16   115.45
  Complete Linkage                      66.81    49.73    16.70    12.43   100.11    94.73
  Avg. Linkage between Merged Group      1.00   159.91     0.25    39.98     3.70   235.51
  Avg. Linkage within New Group         70.68    66.19    17.67    16.55   105.78   109.39
  Centroid Method                       21.21    87.83     5.30    21.96    32.01   155.33
  Median Method                         21.21    87.83     5.30    21.96    32.01   155.33
  Ward's Method                         68.19    57.55    17.05    14.39    99.69   104.35
Nonhierarchical
  Jancey's Method                       65.00    54.07    16.25    13.52    96.50   100.87
  Forgy's Method                        59.46    60.15    14.87    15.04    87.36   110.55
  Convergent K-mean                     59.46    60.15    14.87    15.04    87.36   110.55

Notes: travel time and travel distance are measured from the first to the last box; total time includes stopping time.

Table 16. Travel Distances and Times of Groups Defined by 12 Cluster Methods for NBDATA

The centroid methods' groupings have the most similar means of distances, whereas the Forgy's and the Convergent K-mean nonhierarchical methods' clusters have almost identical travel times and distances. This contrast in the quantitative measures is the result of the differences in the group sizes as well as in the distribution pattern of the data points within the groups.

In this case, the most similar group sizes (39, 48), as defined by the average linkage within the new group method, do not have the best matched travel times or distances.
The inclusion of a potential cluster of 3 points (1, 2 and 3) in the groupings distorts the distribution pattern of the distances (Figures 56-58). The resultant travel time required for traversing a smaller, but fairly scattered, group is relatively similar to that required for travelling through the points of a denser population. The two nonhierarchical methods (Forgy's and Convergent K-mean) have groupings of comparable distance distributions, thus the resulting travel times and distances are similar. These two clustering methods are probably the most suitable methods among the 12 for this data set.

Figure 56. Distribution of Distances Within Groups Defined by Forgy's and Convergent K-mean Methods for NBDATA
Figure 57. Distribution of Distances Within Groups Defined by Average Linkage Methods for NBDATA
Figure 58. Distribution of Distances Within Groups Defined by Ward's Method for NBDATA

Similar to the conclusion drawn for DATA1, the single linkage methods do not group the data points into groups of comparable sizes, nor of similar travel distances. The clusters of the average linkage between merged groups method have neither similar sizes nor similar travel times.
These methods are inappropriate because they tend to isolate the potential 3-point cluster from the rest of the data units. This isolation yields drastically different group sizes and, of course, totally incompatible travel times. The two centroid methods also have similar clustering properties, though less severe in the degree of isolation of the three data units. The results from the other four methods are generally acceptable but not recommended for clustering this set of data.

5.5.4 South Burnaby Empirical Data (SBDATA)

The oddly clustered groups of the single linkage "City-Block" method (group sizes 1, 1 and 111) and of the centroid methods (2, 18 and 93) were not examined in this evaluation process. These methods have failed to cluster the data set into groups of similar sizes and are ruled out as aids for clustering this set of data for the scheduling problem. The summary table of the group sizes (Table 9) indicates that the nonhierarchical methods and the single linkage Chi-squares technique cluster the data set into three comparable groups.

An examination of the results from ROUTE on the twelve sets of groupings as defined by the various methods indicates that only three methods' groupings actually have comparable means of distances, travel times and distances (Tables 17 and 18). The Jancey's method groups have distinctly similar travel times as well as total times for the traverse within the groups, a result of grouping the data points into regions of almost identical areas (Figure 49).
The similarity of the distribution patterns of the inter-unit distances for the groups as defined by this method (Figure 59) also reflects the superiority of this method over the others.

Among the hierarchical methods, only the single linkage Chi-squares method and the Ward's method have comparable means of distances as well as travel times within the defined groups. The group sizes for the Ward's method are 45, 28 and 40 respectively (Figure 48). Though the scattered distributions of all the data units within these groups resemble each other, the distances between points for group 2 (28 points) are slightly longer than those of the other two groups. Thus its distribution of distances (Figure 61) has a wider spread than the others. The number of stops is, in this case, critical for the total travel and stop time.

The single linkage Chi-squares method has its distinct linkage pattern (Figure 42). Although the group sizes are 34, 36 and 43 respectively, the defined boundaries are very different from those of any other method examined in this study. The distribution pattern of the distances is fairly similar to the Ward's. These distributions are slightly skewed to the left, as shown in Figure 60. The skewness parallels the differences in the means and standard deviations of the distances. The resulting route travel times and distances also differ slightly (Table 18), because of the different degrees of scattering of the data units within the different sized groups.

The difference in population density of the groups as defined by the above three methods plays an integral part in the calculation of distances and the choice of route. The nonhierarchical (Jancey's) method is apparently the best clustering technique among the twelve methods for grouping this set of data.

                                            Mean (feet)             Standard Deviation
Methods                               Group 1  Group 2  Group 3  Group 1  Group 2  Group 3

Hierarchical
  Single Linkage
    "City-Block"                         N.A.     N.A.     N.A.     N.A.     N.A.     N.A.
    Euclidean Distance                   5009     6937     7766     2280     3541     4016
    Chi-Squares                          7919     7712     7784     5011     4896     3697
  Complete Linkage                       4738     6566     8309     2107     3231     4029
  Avg. Linkage between Merged Group      5009     6037     7766     2280     3541     4016
  Avg. Linkage within New Group          5146     6230     8283     2350     3232     4132
  Centroid Method                        N.A.     5603     9949     N.A.     2993     4975
  Median Method                          N.A.     5603     9949     N.A.     2993     4975
  Ward's Method                          7039     7346     6333     3818     3847     3348
Nonhierarchical
  Jancey's Method                        7099     6345     6465     3328     3180     3336
  Forgy's Method                         7169     6470     6269     3363     3205     3210
  Convergent K-mean                      7169     6470     6269     3363     3205     3210

Table 17. Means and Standard Deviations of Groups Defined by 12 Cluster Methods for SBDATA

                                        Travel Time (min)       Travel Dist. (mi)       Total Time (min)
Methods                              Group 1 Group 2 Group 3 Group 1 Group 2 Group 3 Group 1 Group 2 Group 3

Hierarchical
  Single Linkage
    "City-Block"                        N.A.    N.A.    N.A.    N.A.    N.A.    N.A.    N.A.    N.A.    N.A.
    Euclidean Distance                 27.75   39.23   77.76    6.94    9.81   19.44   46.65   62.63  137.16
    Chi-Squares                        38.65   50.43   61.97    9.66   12.67   15.49   69.25   82.83  100.67
  Complete Linkage                     24.19   50.14   71.48    6.05   12.54   17.87   42.19   90.64  114.68
  Avg. Linkage between Merged Group    27.75   39.23   77.76    6.94    9.81   19.44   46.65   62.63  137.16
  Avg. Linkage within New Group        28.91   40.84   75.88    7.23   10.21   18.97   48.71   73.24  125.38
  Centroid Method                       N.A.   27.97  190.16    N.A.    6.99   47.75    N.A.   44.17  273.86
  Median Method                         N.A.   27.97  190.16    N.A.    6.99   47.75    N.A.   44.17  273.86
  Ward's Method                        57.97   44.24   48.74   14.49   11.06   12.18   98.47   69.44   84.74
Nonhierarchical
  Jancey's Method                      49.22   54.19   48.96   12.30   13.55   12.17   78.92   95.39   79.29
  Forgy's Method                       50.78   56.12   35.90   12.70   14.02    8.97   98.47   69.44   62.90
  Convergent K-mean                    50.78   56.12   35.90   12.70   14.02    8.97   98.47   69.44   62.90

Notes: travel time and travel distance are measured from the first to the last box; total time includes stopping time.

Table 18. Travel Distances and Times of Groups Defined by 12 Cluster Methods for SBDATA

Figure 59. Distribution of Distances Within Groups Defined by Jancey's Method for SBDATA
Figure 60. Distribution of Distances Within Groups Defined by Chi-Squares Method for SBDATA

5.6 Summary

As mentioned in the above sections, evaluation of the applicability of the twelve clustering methods is difficult and has to be very subjective. In Section 5.5, the quantitative evaluation approaches are designed to aid the evaluation process. It is found that not all the methods are suitable for all kinds of data sets with various spatial characteristics. This conclusion parallels that drawn in Section 5.3.
sets with various  spatial  T h i s c o n c l u s i o n i s p a r a l l e l t o t h a t drawn  CROUP 1  GROUP 3  -CGROUP—< *A ROt-ftTMOOOROUP 2 a u a.  t  o I  <c o  » «T  I -O  —  r -  c j { co  2 f, -2| £  Kl •O  KI «  ro  tn  ry ro  •r  fi  4* w  to  ^  r ** - r -  in  e»  • •  i  ro «  • •  Kl I -O O K l | tO cp in  Kl  C  rj  <c  K!  r» « o  o O  «r LO i o : «»  . • ._ r-  . •  - F i g u r e 61. . D i s t r i b u t i o n of. -Distances w i t h i n Groups Defined" by Ward 's Method f o r SBDATA'\  The c h o i c e o f c l u s t e r i n g method f o r grouping data u n i t s depends h i g h l y on the c h a r a c t e r i s t i c s o f t h e data set.  A table  (Table 19) summarizing the p r e f e r e n c e o f method  based  on the group s i z e s , the data u n i t d i s t r i b u t i o n s w i t h i n  the groups, and the t r a v e l times and d i s t a n c e s r e q u i r e d f o r t r a v e r s i n g the data p o i n t s w i t h i n the groups i n d i c a t e s t h a t v a r i o u s methods a r e favoured f o r c l u s t e r i n g d i f f e r e n t sets.  Methods t h a t g i v e a p p r o p i a t e groupings  data  f o r the evenly  d i s t r i b u t e d data s e t s (DATAl, .NBDATA and SBDATA) do not n e c e s s a r i l y give s a t i s f a c t o r y c l u s t e r s f o r other data  sets.  Among these methods t h a t a r e c o n s i d e r e d a p p l i c a b l e t o the s c h e d u l i n g problem, some a r e h i g h e r ranked  than the o t h e r s  f o r one s e t o f d a t a , and v i c e - v e r s a .  The s p a t i a l c h a r a c t e r i s t i c s o f the data u n i t s w i t h i n a group, perhaps, d e c i s i o n process.  
i s the most c r i t i c a l element i n the  DATAl, a data s e t without any p o t e n t i a l  c l u s t e r , i s best grouped by the t h r e e n o n h i e r a r c h i c a l methods The r e s u l t s o f these methods a l l have s i m i l a r  characteristic:  the data u n i t s a r e a l l c l u s t e r e d i n t o l a t e r a l groups.  This  c h a r a c t e r i s t i c c o u l d be one o f the d e c i d i n g f a c t o r i n the evaluation process.  The i n c l u s i o n o f a p o t e n t i a l  3-point  c l u s t e r i n the n o r t h east c o r n e r o f the a r e a i n NBDATA r e duces the e f f e c t i v e n e s s o f some methods.  Only two h i e r a r c h i -  c a l methods a r e favoured f o r t h e i r r e s u l t i n g t r a v e l d i s t a n c e s  156  Ranking C r i t e r i o n  Group  2  1  " ^ M e ^ d s - ^ ^  Travel Time & D i s t .  Sizes  1  3  2  3  Hierarchical Single Linkage C i t y - Block . Euclidean Distance  1  Chi-Squares  3  Complete Linkage Avg. Linkage between Merged Group  1  2  3 3  2  2  Avg. Linkage H i t h i n New Group  1  2  2  1  2  Centroid Method Median Method  2  Ward's Method  2  3  3  2  Nonhierarchical  1  Jancey's Method  1  3  1  Forgy's Method  1  2  1  1  Convergent K-mean  1  2  1  1  3  1 Legend 1  —  best method  2  —  second best method  T a b l e 19.  3  —  t h i r d best method  Summary o f Method P r e f e r e n c e s f o r t h e Four Sets o f Data  157  and times.  The Ward's method i s c o n s i d e r e d t o be the a l t e r n a -  t i v e technique f o r grouping t h i s data s e t . SBDATA, a d a t a s e t s i m i l a r t o DATAl, i s best grouped method.  by Jancey's  nonhierarchical  On the whole, the t h r e e n o n h i e r a r c h i c a l methods are-  genera l l y ranked as a p p l i c a b l e methods f o r grouping evenly distributed  data s e t s .  
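The mechanism shared by the three nonhierarchical methods, repeated reallocation of each data unit to the nearest group centroid, can be sketched as follows. This is a minimal illustration in Python and not the program used in this study: the function name `nearest_centroid_grouping`, its parameters and the four sample points are hypothetical, and the sketch follows the general Forgy-style scheme (seed points, allocation, centroid update) rather than the exact rules of Jancey's or the convergent K-mean variants.

```python
import math


def nearest_centroid_grouping(points, seeds, max_iter=100):
    """Group 2-D points by repeated reallocation to the nearest centroid.

    points   -- list of (x, y) tuples (hypothetical mail box coordinates)
    seeds    -- list of (x, y) seed points, one per desired group
    max_iter -- safety limit on the number of reallocation passes
    """
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Allocation step: each point joins the group with the closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships unchanged, so the grouping has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its current members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an emptied group keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Four sample points forming a left pair and a right pair.
sample_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels, centroids = nearest_centroid_grouping(sample_points, [(0, 0), (10, 0)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 0.5)]
```

With one seed placed near each pair, the sample points split into the left and right pairs after a single reallocation pass. Other seed choices can converge to a different partition on so small an example, so any grouping obtained this way would still need to be checked against the plotted boundaries and the resulting travel distances.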
The highest ranked hierarchical method is Ward's technique, though the results from this method are less appealing than those of the nonhierarchical ones.

The unevenly distributed data set with intended groupings is found to be best grouped by three linkage methods: the complete linkage and the two average linkage methods. Unfortunately, no empirical data resembling this DATA2 set were available, and the conclusions concerning the clustering of this type of data distribution can only be drawn from the trials on DATA2. It is difficult to state that these three methods are the best ones for grouping this type of distribution, and supporting evidence is required to substantiate the validity of this subjective evaluation.

In evaluating the clustering methods for grouping mail boxes for the scheduling problem, the ranking system based on the similarity of travel distances and times is probably the best approach. In most cases, the schedules for the truck services are set up according to the time allocated to each driver and not to the number of boxes each driver has to service. The Postal Union ruling on the maximum number of stops for each route, however, could hinder the use of the above scheduling approach. For this study, it is assumed that the schedules are prepared for each driver based on the time slot assigned to each type of service. If the assumption is correct, then the three nonhierarchical methods and Ward's method are probably the better clustering techniques for grouping data sets for the Burnaby area.

CHAPTER VI

CONCLUSIONS

This study, in investigating the applicability and characteristics of the twelve clustering techniques, indicates the following:

1. The spatial characteristics of the data set, or the locations of the mail boxes, have a significant influence on the outcomes of the grouping methods. A set of fairly evenly distributed data units is best grouped by the nonhierarchical methods and not the others. Conversely, these methods are not suitable for grouping unevenly distributed data units. There is no straightforward rule, however, for choosing among the clustering methods, and test runs are required to prove the suitability of a clustering method for a particular data set.

2. The three single linkage methods using different distance measures give different results for the same set of data. The method using the simple "City-Block" association measure tends to group closely located data units into a cluster and isolate the more distant data points. This grouping characteristic generates dissimilar group sizes as well as dissimilar travel distances and times for traverses within the defined groups. The square root of the sum of squares of differences (Euclidean) method does not always isolate the distant data units from the others, and this approach generally gives fairly acceptable groupings of the natural clusters. The Chi-squares method clusters data units with distinctive links, and the group sizes outlined by this method are usually acceptable, but not as good as those of other methods. On the whole, single linkage methods are not suitable techniques for grouping these four sets of data.

3. The two average linkage methods and the complete linkage technique are considered to be suitable only for the unevenly distributed data set. The criteria used in these methods differ from those of the single linkage methods, and these grouping criteria generally link data units into groups of fairly comparable sizes. These algorithms tend to link the potential clusters together prior to the linking of more scattered points. Their ability to group potential clusters is much less effective if there is an absence of potential groupings. The results from these methods for evenly distributed data sets are therefore not as satisfactory as those of the nonhierarchical methods and Ward's method. On the other hand, intended groupings are readily clustered by these methods, as shown in the trials for DATA2.

4. The results generated by the centroid and median methods are always identical for these four data sets.
The differences in weighting the inter-unit or inter-cluster distances in these two methods have little influence on the linking sequence or the linkage pattern. It is apparent that only one of these methods should be used in further studies of the applicability of cluster analysis. The results produced by these methods do not conform to those of any of the better ranked methods, and they are not suitable for grouping the mail boxes for the scheduling problem.

5. Ward's method is by far the best ranked among the seven hierarchical methods for grouping evenly distributed data sets. The error sum of squares criterion adopted in this method links the variables with large variances first, and thus gives a good representation of both the closely located and the distant data units in the groupings. The results of Ward's method for the unevenly distributed data set, however, are unacceptable.

6. The results from the nonhierarchical techniques are distinctly satisfactory for grouping evenly distributed data sets. The location of the seed points or initial partitions for these methods, contrary to the findings of other authors, is found to be of minor importance in the grouping of the evenly distributed data units. The continuous allocation and reallocation of the data units to the nearest centroid apparently reduces the importance of the seed points and the partitions. In this study, randomly selected seed points and initial partitions are seemingly valid. This not only reduces the complexity of using nonhierarchical methods, it also reduces the time used in data preparation.

7. The distribution of inter-unit distances within groups identifies the scattering of the data units. The similarity of distribution patterns would indicate the compatibility of travel distances and times required for the groups defined. This conclusion is similar to the relationship between travel distances and the area in which the call points are located described by Christofides (1969). The comparison of the distribution patterns is definitely a useful aid in the selection of clustering methods.

8. The best interpretation tool for both the hierarchical and nonhierarchical clustering results is the representation of the links between entities or the group boundaries on a 2-dimensional graph. Tree diagrams are also useful, but it requires more time to trace a tree than to inspect the linkage or boundary plots. The plot of data units on a graph also helps in understanding the degree of scattering of the call points. On the whole, visual aids are useful in the preparation of schedules.

9. The travel distances and times are critical measures in evaluating the groupings defined by various methods. These elements are actually the most vital information in the preparation of schedules as well as of the specific routes for the trucks. The optimal route as outlined by the maximum distance saving routing method for each group can also be used as a decision criterion in the selection of clustering method.

10. The evaluation of clustering methods, whether qualitative or quantitative in nature, has to be very subjective and heuristic.

The seemingly apparent choice of methods for clustering mail boxes should be Ward's and the three nonhierarchical methods for data points distributed similarly to those of North and South Burnaby. This, however, does not imply that all the groupings of the boxes into clusters should be performed by these approaches. It is expected that some areas, such as the North Shore, would warrant the use of other clustering methods, e.g. the complete linkage and the two average linkage methods, in order to give satisfactory groupings of the mail boxes. The tests on the unevenly distributed data set DATA2 have shown that nonhierarchical methods are not suitable for this type of data point distribution.

The tools developed for this study for clustering data sets, calculating statistics, and estimating the route distance and timing could be coordinated into a single program for scheduling purposes. These tools are efficient means to help analyse the route structure as well as the spatial relationships of call points. The histogram showing the distribution of distances between points is also a useful tool for analysing the data points and evaluating the groupings.

This study also points out that although computerized clustering methods can help the schedulers in determining the assignment of call points, they do not over-rule the superiority of the groupings outlined manually by inspection, as carried out by experienced planners. An interesting study related to this clustering method investigation would be a study of the applicability of the five more suitable methods in grouping Vancouver's mail or bundle boxes into clusters. This trial would further prove the feasibility of using cluster analysis as an aid to the Post Office scheduling problem.

FOOTNOTE

1.  Unpublished Special Vehicle Utilization Study Report, Vancouver Post Office, 1975.
2.  M.R. Anderberg. Cluster Analysis for Applications (New York: Academic Press, 1973). pp.25-29.
3.  B. Everitt. Cluster Analysis (Toronto: Heinemann Educational Books, 1974). pp.7-9.
4.  M.R. Anderberg. pp.132-133.
5.  B.S. Duran and P.L. Odell. Cluster Analysis, a Survey (New York: Springer-Verlag, 1974). pp.6-7.
6.  M.R. Anderberg. pp.136-7.
7.  Ibid. p.134.
8.  Ibid. p.140.
9.  Ibid. p.156.
10. Ibid. p.163.

REFERENCES

Anderberg, M.R., Cluster Analysis for Applications, Academic Press, N.Y., 1973, 354p.
Astrahan, M.M., "Speech Analysis by Clustering, or the Hyperphoneme Method", Stanford Artificial Intelligence Project, Stanford University, Stanford, Calif., 1970, 25p.
Ball, G.H. and Hall, D.J., "A Clustering Technique for Summarizing Multivariate Data", Behavioral Sciences, vol.12, no.2, 1967, pp.153-55.
Bijnen, E.J., Cluster Analysis: Survey and Evaluation of Techniques, Tilburg University Press, The Netherlands, 1973, 112p.
Bonner, R.E., "On Some Clustering Techniques", IBM Journal of Research and Development, vol.8, 1964, pp.22-32.
Bridges, C.C., "Hierarchical Cluster Analysis", Psychological Reports, vol.18, 1966, pp.851-54.
Carmichael, J.W., George, J.A. and Julius, R.S., "Finding Natural Clusters", Syst. Zool., vol.17, 1968, pp.144-150.
Cattell, R.B., Factor Analysis, Harper, N.Y., 1952, p.355.
Chance, R., Dyke, B. and Wong, S., Unpublished Report on Special Vehicle Utilization Study, Vancouver Post Office, Vancouver, 1975, 40p.
Christofides, N. and Eilon, S., "Expected Distances in Distribution Problems", Op. Res. Q., vol.20, no.4, 1969, pp.437-43.
Cochran, W.G. and Hopkins, C.E., "Some Classification Problems with Multivariate Qualitative Data", Biometrics, vol.17, no.1, 1961, pp.10-32.
Cole, A.J. and Wishart, D., "An Improved Algorithm for the Jardine-Sibson Method of Generating Overlapping Clusters", The Computer Journal, vol.13, 1970, pp.156-163.
Cormack, R.M., "A Review of Classification", Jour. R. Statist. Soc. Series A, v.134, no.3, 1971, pp.321-367.
Duran, B.S. and Odell, P.L., Cluster Analysis, a Survey, Springer-Verlag, N.Y., 1974, 137p.
Edwards, A.W.F. and Cavalli-Sforza, L.L., "A Method for Cluster Analysis", Biometrics, v.21, 1965, pp.362-375.
Engelman, L. and Fu, S., BMDP2M program, UCLA BMD Documentation, UCLA, Calif., 1970, 7p.
Everitt, B., Cluster Analysis, Heinemann Educational Books, London, 122p.
Fleiss, J.L. and Zubin, J., "On the Methods and Theory of Clustering", Multivariate Behavioral Research, v.4, 1969, pp.235-250.
Forgy, E., "Cluster Analysis of Multivariate Data: Efficiency vs Interpretability of Classifications", Biometrics, v.21, 1965, p.758.
Gower, J.C., "Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis", Biometrika, v.53, 1966, pp.325-338.
________, "A Comparison of Some Methods of Cluster Analysis", Biometrics, v.23, no.4, 1967, pp.623-37.
________, "Minimum Spanning Trees and Single Linkage Cluster Analysis", Appl. Statist., v.18, no.1, pp.54-64.
Green, P.E. and Rao, V.R., "A Note on Proximity Measures and Cluster Analysis", Jour. Mark. Res., vol.VI, 1969, pp.359-64.
Harris, B., Farhi, A. and Dufour, J., Aspects of a Problem in Clustering, University of Pennsylvania, 1972, 28p.
Hartigan, J.A., "Representation of Similarity Matrices by Trees", J. Amer. Statist. Assoc., v.62, 1967, pp.1140-1158.
________, "Direct Clustering of a Data Matrix", J. Amer. Statist. Assoc., v.67, 1972, pp.123-129.
________, Clustering Algorithms, John Wiley, N.Y., 1975, 351p.
Jardine, N. and Sibson, R., "The Construction of Hierarchical and Non-hierarchical Classifications", Comp. J., v.11, 1968, pp.177-184.
________, "Choice of Methods for Automatic Classifications", Comp. J., v.14, 1971, pp.404-406.
Jarvis, R.A. and Patrick, E.A., "Clustering Using a Similarity Measure Based on Shared Near Neighbours", IEEE Trans. Comp., v.22, no.11, 1973, pp.1025-1034.
Jensen, R.E., "A Dynamic Programming Algorithm for Cluster Analysis", Op. Res., v.17, 1969, pp.1034-57.
Johnson, R.M., "Q-Analysis of Large Samples", Jour. Mark. Res., v.VII, 1970, pp.104-5.
Johnson, S.C., "Hierarchical Clustering Schemes", Psychometrika, v.32, no.3, 1967, pp.241-254.
Jones, K.S. and Jackson, D., "Current Approaches to Classification and Clump-finding at the Cambridge Language Research Unit", Comp. J., v.10, 1967, pp.29-37.
King, B.F., "Stepwise Clustering Procedures", J. Amer. Statist. Assoc., v.62, 1967, pp.86-101.
Koontz, W.L. et al, "A Branch and Bound Clustering Algorithm", IEEE Tran. Comp., v.24, no.9, 1975, pp.908-914.
Kruskal, Jr. J.B., "On the Shortest Spanning Subtree of a Graph and the Travelling Salesman Problem", Proc. Amer. Math. Soc., no.7, 1956, pp.48-50.
Kruskal, J.B., "Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis", Psychometrika, v.29, 1964, pp.1-28.
Lance, G.N. and Williams, W.T., "Computer Programs for Hierarchical Polythetic Classification ('Similarity Analysis')", Comp. J., v.9, no.1, 1966, pp.60-64.
________, "A General Theory of Classificatory Sorting Strategies I, Hierarchical Systems", Comp. J., v.9, 1967, pp.373-380.
________, "A General Theory of Classificatory Sorting Strategies II, Clustering Systems", Comp. J., v.10, 1967, pp.271-277.
Ling, R.F., "On the Theory and Construction of K-Cluster", Comp. J., v.15, 1972, pp.326-332.
MacQueen, J.B., "Some Methods for Classification and Analysis of Multivariate Observations", Proc. Symp. Math. Statist. and Probability, 5th, Berkeley, v.1, 1967, pp.281-297.
McQuitty, L.L., "Hierarchical Linkage Analysis for the Isolation of Types", Educational and Psychological Measurement, v.20, 1960, pp.55-67.
________, "Hierarchical Syndrome Analysis", Educ. and Psycho. Measure., v.20, 1960, pp.293-304.
________, "Hierarchical Classification by Multiple Linkage", Educ. and Psycho. Measure., v.30, 1970, pp.3-19.
Mcrae, D.J., "MLRCA: A Fortran IV Iterative K-means Cluster Analysis Program", Behavioral Sci., v.16, no.4, 1971, pp.423-424.
Marriot, F.H.C., "Practical Problems in a Method of Cluster Analysis", Biometrics, v.27, no.3, 1971, pp.501-14.
Morrison, D.G., "Measurement Problems in Cluster Analysis", Management Sci., v.13, 1967, pp.B-775-780.
Parks, J.M., "Fortran IV Program for Q-Mode Cluster Analysis on Distance Function with Printed Dendogram", Comp. Contrib. 46, Stat. Geol. Survey, University of Kansas, Lawrence, Kansas, 1970, 36p.
Patterson, J.M. and Whitaker, R.A., "CGROUP: Hierarchical Grouping Analysis with Optimal Contigenity Constraint Program", UBC Computer Centre Program, UBC, Vancouver, 1973, 20p.
Rand, W.M., "Objective Criteria for the Evaluation of Clustering Methods", J. Amer. Statist. Assoc., v.66, 1971, pp.846-850.
Rohlf, F.J., "Adaptive Hierarchical Clustering Schemes", Syst. Zool., v.19, no.1, 1970, pp.58-83.
Sawrey, W.L. et al, "An Objective Method of Grouping Profiles by Distance Functions and its Relation to Factor Analysis", Educ. and Psycho. Measure., v.20, 1960, pp.651-673.
Shepard, R.N., "Analysis of Proximities: Multidimensional Scaling with an Unknown Distance Function I and II", Psychometrika, v.27, 1966, pp.125-140, 219-246.
Shepherd, M.J. and Willmott, A.J., "Cluster Analysis on the Atlas Computer", Comp. J., v.11, 1968, pp.57-62.
Sibson, R., "SLINK: An Optimally Efficient Algorithm for the Single Link Cluster Method", Comp. J., v.16, 1973, pp.30-34.
Sokal, R.R. and Michener, C.D., "A Statistical Method for Evaluating Systematic Relationships", Univ. Kansas Sci. Bull. 38, 1958, pp.1409-1438.
Sokal, R.R. and Sneath, P.H.A., Principles of Numerical Taxonomy, Freeman, San Francisco, 1963, 377p.
Tse, A., "Scheduling of Post Office Letter Box Collection Routes - A Case Study", Comm. 541 Project, UBC, Vancouver, 1975, 26p.
Ward, Jr. J.H., "Hierarchical Grouping to Optimise an Objective Function", J. Amer. Statist. Assoc., v.58, 1963, pp.236-244.
Ward, Jr. J.H. and Hook, M.E., "Application of an Hierarchical Grouping Procedure to a Problem of Grouping Profiles", Educ. and Psycho. Measurement, v.23, 1963, pp.69-83.
Wishart, D., "An Algorithm for Hierarchical Classifications", Biometrics, v.22, no.1, 1969, pp.165-170.
________, "Fortran II Programs for 8 Methods of Cluster Analysis (CLUSTRAN I)", Comp. Contrib. 38, State Geol. Survey, Univ. of Kansas, Lawrence, 1969, 47p.
Wolfe, J.H., "Pattern Clustering by Multivariate Mixture Analysis", Multivariate Behavioral Res., v.5, no.3, 1970, pp.329-350.
Wright, W.E., "An Axiomatic Specification of Euclidean Analysis", Comp. J., v.17, 1974, pp.355-364.
Zahn, C.T., "Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters", IEEE Trans. Comp., v.20, no.1, 1971, pp.68-86.

APPENDIX A

Listing of MATRIX: a Computer Program for Generating Symmetric Distance Matrix

C     THIS IS MATRIX: PROGRAM FOR CALCULATING DISTANCE MATRIX
C     FOR CLUSTERING PROGRAM INPUTS
      DIMENSION X(150),Y(150),DIST(150,150),ADIST(150,150)
      DIMENSION DMAT(7000)
C     READ THE NUMBER OF POINTS, THEN ONE (X,Y) PAIR PER CARD
      READ, N
      DO 10 I=1,N
      READ(5,100) X(I),Y(I)
      WRITE(6,101) I,X(I),Y(I)
   10 CONTINUE
C     SQUARED EUCLIDEAN DISTANCE BETWEEN EVERY PAIR OF POINTS
      DO 20 I1=1,N
      DO 30 I2=1,N
   30 DIST(I1,I2)=((X(I2)-X(I1))**2)+((Y(I2)-Y(I1))**2)
   20 CONTINUE
C     FIND THE LARGEST AND SMALLEST ENTRIES
      AMAX=0.
      AMIN=999999.9
      DO 40 I3=1,N
      DO 50 I4=1,N
      IF(DIST(I3,I4).GT.AMAX) AMAX=DIST(I3,I4)
   50 IF(DIST(I3,I4).LT.AMIN) AMIN=DIST(I3,I4)
   40 CONTINUE
C     SCALE EVERY ENTRY BY THE LARGEST ENTRY
      DO 60 I5=1,N
      DO 70 I6=1,N
   70 ADIST(I5,I6)=DIST(I5,I6)/AMAX
   60 CONTINUE
      DO 80 I7=1,N
   80 CONTINUE
C     PACK THE LOWER TRIANGLE BY ROWS INTO DMAT
      K=0
      DO 90 I8=2,N
      N2=I8-1
      DO 95 I9=1,N2
      K=K+1
   95 DMAT(K)=ADIST(I8,I9)
   90 CONTINUE
C     WRITE DMAT TEN VALUES PER RECORD ON UNITS 7 AND 6
      COUNT=K/10.
      KOUNT=K/10
      K1=KOUNT
      IF(COUNT.GT.KOUNT) K1=KOUNT+1
      DO 96 II=1,K1
      KBEG=(II*10)-9
      KEND=II*10
      IF(II.EQ.K1) KEND=K
      WRITE(7,113) (DMAT(K3),K3=KBEG,KEND)
   96 WRITE(6,113) (DMAT(K2),K2=KBEG,KEND)
  100 FORMAT(5X,2F7.0)
  101 FORMAT(I5,2F10.2)
  113 FORMAT(10F7.4)
      STOP
      END

APPENDIX B

Listing and Sample Outputs from HIER: a Computer Program for Six Hierarchical Clustering Techniques

C     PROGRAM DRIVER --- MAIN PROGRAM
      DIMENSION X(10000)
      LIMIT = 10000
      CALL CNTRL(X,LIMIT)
      WRITE(6,100)
  100 FORMAT('1',15X,'END OF OUTPUT')
      STOP
      END
      SUBROUTINE CNTRL(X,LIMIT)
C
C     THIS SUBROUTINE ALLOCATES STORAGE, READS INPUT AND CONTROLS
C     EXECUTION FOR A HIERARCHICAL CLUSTERING JOB BASED ON A PROVIDED
C     SIMILARITY MATRIX.
C
C     INPUT SPECIFICATIONS
C
C     CARD 1  TITLE CARD
C     CARD 2  INFORMATION FOR SUBROUTINES CLSTR AND TREE
C       COLS  1- 3  NE=NUMBER OF ENTITIES (DATA UNITS OR VARIABLES) TO BE
C                   CLUSTERED
C       COLS  4- 5  ISIGN=OPTION FOR SIMILARITY FUNCTION
C                   ISIGN=+1, DISTANCE MEASURE
C                   ISIGN=-1, CORRELATION MEASURE
C       COLS  6- 7  NTSV=TAPE UNIT ON WHICH CLSTR RESULTS ARE SAVED
C                   NTSV=7, PUNCH RESULTS ON CARDS
C                   NTSV.LE.0, DO NOT SAVE RESULTS
C       COLS  8- 9  NTIN=UNIT FROM WHICH SIMILARITY MATRIX IS READ
C                   NTIN=5, CARD READER
C                   NTIN.NE.5, DISK OR TAPE
C       COLS 10-12  INOPT=INPUT OPTION FOR SIMILARITY MATRIX
C                   INOPT.LE.0, EACH RECORD IS ONE ROW OF A LOWER TRIANG-
C                               ULAR MATRIX
C                   INOPT.GT.0, THE LOWER TRIANGULAR MATRIX IS CONSIDERED
C                               TO BE STORED BY ROWS IN ONE LONG LINEAR
C                               ARRAY AND IS READ IN BLOCKS *INOPT* LONG.
C       COLS 13-14  KOUT=OUTPUT OPTION
C                   KOUT=+2, STANDARD OUTPUT
C                   KOUT=-2, STANDARD OUTPUT PLUS PUNCHED SEQUENCE LIST
C                            FROM SUBROUTINE *TREE*
C
C***  ANY PREPOSITIONING OF THE I/O UNITS NTSV AND NTIN MUST BE
C     ACCOMPLISHED IN PROGRAM DRIVER OR THROUGH USE OF CONTROL CARDS.
C
C     CARD 3     INPUT FORMAT FOR SIMILARITY MATRIX (20A4 FORMAT)
C     CARD(S) 4  SIMILARITY MATRIX
C     CARD 5     END OF RECORD CARD (7/8/9)
C
C     INCLUDE CARDS 4 AND 5 ONLY IF THE SIMILARITY MATRIX IS ON CARDS***
C
C     CARD(S) 6  LABEL CARDS FOR ENTITIES. THERE ARE TWO OPTIONS
C       1. INCLUDE 1 CARD WITH THE 4-CHARACTERS *NOLB* IN COLUMNS 1-4
C          UNDER THIS OPTION LABELS ARE NOT PRINTED ON THE TREE OUTPUT.
C       2. INCLUDE NE CARDS, COLUMNS 1 TO 20 CONTAINING A LABEL FOR ONE
C          ENTITY. ORDER THE LABEL CARDS IN THE SAME SEQUENCE AS THE
C          ENTITIES ARE REPRESENTED IN THE SIMILARITY MATRIX.
C
C     DECK SETUP SPECIFICATIONS
C
C     THE USER PROVIDES PROGRAM DRIVER WHICH PERFORMS THE FOLLOWING TASKS.
C       1. ASSIGNS INPUT/OUTPUT UNITS
C       2. ESTABLISHES THE DIMENSION OF THE X ARRAY AND SETS THIS
C          DIMENSION EQUAL TO LIMIT.
C       3. CALLS SUBROUTINE CNTRL.
C     THE FOLLOWING EXAMPLE WILL SUFFICE IN MOST CASES.
C
C           PROGRAM DRIVER(INPUT,OUTPUT,PUNCH,TAPE5=INPUT,TAPE6=OUTPUT,
C          ATAPE7=PUNCH,TAPE1,TAPE2)
C           DIMENSION X(7000)
C           LIMIT=7000
C           CALL CNTRL(X,LIMIT)
C           END
C
C     A SECOND JOB DEPENDENT SEGMENT IS SUBROUTINE METHOD. THE USER
C     SELECTS AMONG THE SEVERAL ALTERNATIVE VERSIONS OF THIS SUBROUTINE TO
C     IMPLEMENT THE DESIRED CLUSTERING TECHNIQUE.
C     THE SUBPROGRAMS CNTRL, CLSTR, MTXIN, LFIND AND TREE GO IN EVERY
C     JOB.
C
C     THE X ARRAY IS PARTITIONED FOR STORAGE AS FOLLOWS
C     STORAGE FOR ARRAYS NEEDED AT ALL STAGES OF THE JOB
C       X(N1) TO X(N2-1)  NE WORDS --STORAGE OF THE II ARRAY
C       X(N2) TO X(N3-1)  NE WORDS --STORAGE OF THE JJ ARRAY
C       X(N3) TO X(N4-1)  NE WORDS --STORAGE OF THE SS ARRAY
C       X(N4) TO X(N5-1)  NE WORDS --STORAGE OF THE IL ARRAY
C       X(N5) TO X(N6-1)  NE WORDS --STORAGE OF THE JL ARRAY
C       X(N6) TO X(N7-1)  NE WORDS --STORAGE OF THE NEXT ARRAY
C     STORAGE FOR ARRAYS NEEDED IN SUBROUTINE CLSTR
C       M1=N7
C       X(M1) TO X(M2-1)  (NE*(NE-1))/2 WORDS --STORAGE OF THE S ARRAY
C       X(M2) TO X(M3-1)  NE WORDS --STORAGE OF THE LAST ARRAY
C       X(M3) TO X(M4-1)  NE WORDS --STORAGE OF THE NEAR ARRAY
C       X(M4) TO X(M5-1)  NE WORDS --STORAGE OF THE SREF ARRAY
C       X(M5) TO X(M6-1)  NE WORDS --STORAGE OF THE LIST ARRAY
C       X(M6) TO X(M7-1)  NE WORDS --STORAGE OF THE A ARRAY
C       X(M7) TO X(M8)    NE WORDS --STORAGE OF THE B ARRAY
C     STORAGE FOR ARRAYS NEEDED IN SUBROUTINE TREE (OVERLAY ARRAYS
C     NEEDED IN SUBROUTINE CLSTR)
C       L1=N7
C       X(L1) TO X(L2-1)  25*NE WORDS --STORAGE OF THE A ARRAY
C       X(L2) TO X(L3-1)  5*NE WORDS --STORAGE OF THE LABEL ARRAY
C       X(L3) TO X(L4-1)  NE WORDS --STORAGE OF THE LCLNO ARRAY
C       X(L4) TO X(L5-1)  NE WORDS --STORAGE OF THE LINE ARRAY
C       X(L5) TO X(L6-1)  NE WORDS --STORAGE OF THE IS ARRAY
C       X(L6) TO X(L7)    NE WORDS --STORAGE OF THE LAST ARRAY
C
      INTEGER FIRST
      DIMENSION X(1),FMT(20),TITLE(20),EPS(25)
      DATA RLB/'NOLB'/
      READ(5,1000) TITLE
      READ(5,1100) NE,ISIGN,NTSV,NTIN,INOPT,KOUT
      WRITE(6,2500) TITLE
      WRITE(6,2200) NE,ISIGN,NTSV,NTIN,INOPT,KOUT
C     PARTITION THE STORAGE ARRAY
      N1=1
      N2=N1+NE
      N3=N2+NE
      N4=N3+NE
      N5=N4+NE
      N6=N5+NE
      N7=N6+NE
      M2=N7+(NE*(NE-1))/2
      M3=M2+NE
      M4=M3+NE
      M5=M4+NE
      M6=M5+NE
      M7=M6+NE
      M8=M7+NE-1
      L2=N7+25*NE
      L3=L2+5*NE
      L4=L3+NE
      L5=L4+NE
      L6=L5+NE
      L7=L6+NE-1
C     CHECK FOR SUFFICIENT STORAGE
      MAX=M8
      IF(L7.GT.MAX) MAX=L7
      WRITE(6,2300) MAX,LIMIT
      IF(MAX.GT.LIMIT) STOP
C     READ THE SIMILARITY MATRIX
      READ(5,1000) FMT
      WRITE(6,2100) FMT
      CALL MTXIN(X(N7),INOPT,NE,NTIN,FMT)
C     READY TO CLUSTER
      CALL CLSTR(X(N1),X(N2),X(N3),X(N4),X(N5),X(N6),X(N7),X(M2),X(M3),
     AX(M4),X(M5),X(M6),X(M7),TITLE,NE,ISIGN,NTSV)
C     READ LABEL CARD(S)
      FIRST=L2
      LAST=L2+4
      READ(5,1000) (X(I),I=FIRST,LAST)
      IF(X(FIRST).EQ.RLB) GO TO 80
C     READ REMAINING LABELS
      DO 70 K=2,NE
      FIRST=LAST+1
      LAST=LAST+5
   70 READ(5,1000) (X(I),I=FIRST,LAST)
C     DRAW THE TREE CORRESPONDING TO THE CLUSTERING
   80 MERGES=NE-1
      CALL TREE(X(N1),X(N2),X(N3),X(N4),X(N5),X(N6),X(N7),X(L2),X(L3),
     AX(L4),X(L5),X(L6),EPS,TITLE,MERGES,1,6,1,KOUT,NE)
      RETURN
 1000 FORMAT(20A4)
 1100 FORMAT(I3,3I2,I3,I2,I3)
 2100 FORMAT(7H FORMAT,20A4)
 2200 FORMAT(5H NE =,I8,/,8H ISIGN =,I5,/,7H NTSV =,I6,/,7H NTIN =,I6,
     A/,8H INOPT =,I5,/,7H KOUT =,I6)
 2300 FORMAT(19H REQUIRED STORAGE =,I5,6H WORDS,/,
     A 19H ALLOTTED STORAGE =,I5,6H WORDS,/)
 2500 FORMAT('1',//,20A4,//)
      END
      SUBROUTINE CLSTR(II,JJ,SS,IL,JL,NEXT,S,LAST,NEAR,SREF,LIST,A,B,
     ATITLE,N,ISIGN,NT)
C     IN THIS VERSION THE LOWER TRIANGULAR PORTION OF THE SIMILARITY
C     MATRIX IS STORED BY ROWS IN THE ONE-DIMENSIONAL
C     ARRAY S.
C
C     THE FOLLOWING VARIABLES ARE SPECIFIED IN THE CALLING PROGRAM AND
C     ARE PASSED THROUGH THE ARGUMENT LIST
C     N=NUMBER OF OBJECTS TO BE CLUSTERED
C     S(J)=J-TH ELEMENT IN LOWER TRIANGULAR SIMILARITY MATRIX
C     ISIGN=OPTION SPECIFYING TYPE OF SIMILARITY FUNCTION USED
C       ISIGN=+1=DISTANCE MEASURE (DECREASING FUNCTION OF SIMILARITY)
C       ISIGN=-1=CORRELATION MEASURE (INCREASING FUNCTION OF
C                SIMILARITY)
C     NT=TAPE UNIT ON WHICH THE RESULTS ARE SAVED
C       NT.LE.0=DO NOT SAVE RESULTS ON TAPE
C       NT=7=SAVE RESULTS ON PUNCHED CARDS
C     TITLE=IDENTIFYING TITLE FOR THIS RUN
C
C     THE FOLLOWING VARIABLES REPRESENT THE OUTPUT OF THE PROGRAM AND
C     ARE PASSED BACK THROUGH THE ARGUMENT LIST.  THESE RESULTS ARE
C     READY FOR SUBROUTINE TREE.
C     K=STAGE OF CLUSTERING
C     II(K)=LOWER NUMBERED CLUSTER MERGED AT STAGE K
C     JJ(K)=UPPER NUMBERED CLUSTER MERGED AT STAGE K
C     SS(K)=VALUE OF SIMILARITY FUNCTION ASSOCIATED WITH MERGE AT
C           STAGE K
C     IL(K)=PRECEDING STAGE AT WHICH II(K) WAS LAST IN A MERGE
C     JL(K)=PRECEDING STAGE AT WHICH JJ(K) WAS LAST IN A MERGE
C     NEXT(K)=NEXT STAGE AT WHICH II(K) IS IN A MERGE
C
C     IN ADDITION, THE FOLLOWING VARIABLES PLAY IMPORTANT ROLES IN THE
C     PROGRAM
C     NEAR(I)=ID NUMBER OF EXTREME ELEMENT IN ROW I OF THE LOWER
C             TRIANGULAR SIMILARITY MATRIX.
C     SREF(I)=SIMILARITY MEASURE FOR THE PAIR (I,NEAR(I))
C     LIST(I)=I-TH CLUSTER ID NUMBER IN SEQUENTIAL LIST OF CURRENT
C             CLUSTERS
C     NCL=NUMBER OF CLUSTERS AT CURRENT STAGE
C     LAST(I)=STAGE NUMBER AT WHICH CLUSTER I WAS LAST IN A MERGE
C     A=WORKING AREA FOR SUBROUTINE METHOD
C     B=WORKING AREA FOR SUBROUTINE METHOD
C
C     THIS SUBROUTINE USES FUNCTION LFIND(I,J) TO FIND THE ADDRESS IN S
C     FOR THE SIMILARITY MEASURE BETWEEN CLUSTERS I AND J
C
      DIMENSION S(1),II(1),JJ(1),SS(1),IL(1),JL(1),NEXT(1),NEAR(1),
     ASREF(1),LIST(1),LAST(1),A(1),B(1)
      DIMENSION TITLE(20)
C     INITIALIZE VARIABLES AND SET CONSTANTS
      NCL=N
      K=1
      SIGN=ISIGN
      BIG=SIGN*1.E50
      CALL METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,1)
C     INITIALIZE ARRAYS
      DO 10 J=1,N
      LAST(J)=0
      NEXT(J)=0
      LIST(J)=J
      SREF(J)=BIG
   10 CONTINUE
C     FIND EXTREME ENTRY IN EACH ROW
      L=0
      DO 30 I=2,N
      I1=I-1
      DO 30 J=1,I1
      L=L+1
C     IN EFFECT S(L)=S(I,J)
      IF(((S(L)-SREF(I))*SIGN).GT.0.) GO TO 30
      NEAR(I)=J
      SREF(I)=S(L)
   30 CONTINUE
C     MAIN LOOP.  FIND EXTREME VALUE IN SREF ARRAY
   40 SREFX=BIG
      DO 50 I=2,NCL
      LISTI=LIST(I)
      IF(((SREF(LISTI)-SREFX)*SIGN).GT.0.) GO TO 50
      IREF=I
      LREF=LISTI
      SREFX=SREF(LISTI)
   50 CONTINUE
C     LREF IS THE ROW NUMBER CONTAINING THE EXTREME ENTRY IN THE S
C     ARRAY.  IF THERE ARE TIES, THEN LREF IS THE HIGHEST NUMBERED ROW
C     WITH THIS EXTREME VALUE.  HENCE LREF.GT.NEAR(LREF).  IREF
C     IDENTIFIES THE PLACEMENT OF LREF IN THE LIST ARRAY.
      NREF=NEAR(LREF)
      CALL METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,2)
C     GENERATE MERGE DATA NEEDED FOR SUBROUTINE TREE
      II(K)=NREF
      JJ(K)=LREF
      SS(K)=SREFX
      IL(K)=LAST(NREF)
      JL(K)=LAST(LREF)
      LAST(NREF)=K
      IF(IL(K).EQ.0) GO TO 60
      ILK=IL(K)
      NEXT(ILK)=K
   60 IF(JL(K).EQ.0) GO TO 70
      JLK=JL(K)
      NEXT(JLK)=K
   70 K=K+1
C     TERMINATE IF N-1 MERGES HAVE OCCURRED
      IF(K.EQ.N) GO TO 140
C     UPDATE FOR THE NEXT CYCLE
      NCL=NCL-1
      IF(IREF.GT.NCL) GO TO 90
C     UPDATE LIST ARRAY BY REMOVING LREF AND PUSHING DOWN THE LIST
      DO 80 I=IREF,NCL
   80 LIST(I)=LIST(I+1)
C     UPDATE FOR NEXT CYCLE
   90 CALL METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,3)
      GO TO 40
C     CLUSTERING FINISHED AND ALL ANCILLARY INFORMATION GENERATED.
C     SAVE RESULTS AS DESIRED.
  140 K=K-1
  160 IF(NT.LE.0) RETURN
      WRITE(NT,2300) TITLE
      DO 170 I=1,K
  170 WRITE(NT,2200) I,II(I),JJ(I),SS(I),IL(I),JL(I),NEXT(I)
      RETURN
 2200 FORMAT(3I10,E16.8,3I10)
 2300 FORMAT(20A4)
      END
      FUNCTION LFIND(I,J)
C     IF THE LOWER TRIANGULAR PORTION OF A SYMMETRIC MATRIX IS STORED
C     BY ROWS IN A ONE-DIMENSIONAL ARRAY, THEN THE ELEMENT (I,J) IN THE
C     FULL MATRIX IS ELEMENT LFIND(I,J) IN THE LINEAR ARRAY
      IF(I.GT.J) GO TO 10
C     ROW J, COLUMN I
      LFIND=((J-1)*(J-2))/2+I
      RETURN
C     ROW I, COLUMN J
   10 LFIND=((I-1)*(I-2))/2+J
      RETURN
      END
      SUBROUTINE TREE(I,J,S,IL,JL,NEXT,A,LABEL,LCLNO,LINE,IS,LAST,EPS,
     ATITLE,N,KBEG,NT,INTRV,IPRNT,MAXIN)
C
C     DATA INPUT THROUGH CALLING SEQUENCE
C
C     N=HIGHEST STAGE NUMBER IN THE CLUSTER MERGE DATA (MUST BE EXACT)
C     KBEG=STAGE NUMBER AT WHICH THE TREE BEGINS, DEFAULT VALUE 1
C     NT=TAPE NUMBER FOR PRINTED OUTPUT, DEFAULT VALUE=6
C     INTRV=INTERVAL OPTION FOR SEGMENTATION
C       INTRV=1=DEFAULT VALUE.  CONSTRUCT EPS BY DIVIDING THE RANGE OF
C               S INTO 25 EQUAL SEGMENTS
C       INTRV=2=EPS IS PROVIDED AS PART OF THE ARGUMENT LIST
C       INTRV=3=THE IS ARRAY IS ALREADY CONSTRUCTED AND EPS IS PROVIDED
C               FOR INFO
C     IPRNT=PRINT OPTION FOR INPUT INFORMATION
C       IABS(IPRNT)=1.  PRINT ONLY TITLE AND *IS* ARRAY
C       IABS(IPRNT).NE.1.  IN ADDITION PRINT THE CLUSTER MERGE DATA
C       IPRNT.LE.0.  IN ADDITION, PUNCH THE SEQUENCE IN WHICH THE
C               ENTITIES APPEAR IN THE TREE (NEEDED FOR POST-ANALYSIS
C               OF DATA UNIT CLUSTERING IN SUBROUTINE *POSTDU*).
C     EPS(M)=RIGHT ENDPOINT FOR THE MTH INTERVAL USED FOR SEGMENTING S
C     LABEL(M,IJ)=MTH OF 5 WORDS IDENTIFYING THE IJTH OBJECT
C     TITLE=ARRAY OF 20 WORDS FOR IDENTIFYING THE RUN.
C     K=INDEX IDENTIFYING STAGE NUMBER IN THE CLUSTERING
C     I(K)=LOWER NUMBERED CLUSTER IDENTIFICATION NUMBER IN THE MERGE
C          AT THE KTH STAGE
C     J(K)=UPPER NUMBERED CLUSTER IDENTIFICATION NUMBER IN THE MERGE
C          AT THE KTH STAGE
C     S(K)=VALUE OF THE CRITERION FUNCTION FOR THE MERGE AT THE KTH
C          STAGE
C     IS(K)=CATEGORIZED VALUE OF S=INTEGER IN RANGE 1 TO 25
C     IL(K)=STAGE NUMBER WHEN I(K) WAS LAST IN A MERGE (0 FOR FIRST
C           MERGE FOR I(K))
C     JL(K)=STAGE NUMBER WHEN J(K) WAS LAST IN A MERGE (0 FOR FIRST
C           MERGE FOR J(K))
C     NEXT(K)=STAGE NUMBER WHEN I(K) IS NEXT IN A MERGE
C     MAXIN=HIGHEST CLUSTER ID NUMBER IN THE CLUSTER MERGE DATA
C
C     OTHER VARIABLES USED IN THE PROGRAM
C
C     LINE(I)=LINE NUMBER IN THE PRINTOUT AT WHICH I(K) IS CARRIED
C             (AFTER MOST RECENT MERGE)
C     LCLNO(L)=THE CLUSTER NUMBER TO BE PRINTED ON LINE L AT THE LEFT
C              OF THE TREE
C     A(M,L)=THE MTH SEGMENT (OF 25) IN THE LTH LINE OF THE PRINTOUT
C     LAST(L)=FARTHEST RIGHT SEGMENT IN LINE L WHICH IS NOT BLANK
C
      REAL*4 LABEL
      DIMENSION I(N),J(N),S(N),IS(N),IL(N),JL(N),NEXT(N),
     AA(25,MAXIN),LAST(MAXIN),LCLNO(MAXIN)
      DIMENSION LINE(MAXIN),LABEL(5,MAXIN)
      DIMENSION EPS(25),TITLE(20)
      DATA BARI,BLINK,BARS,BLANK/4H---I,4H   I,4H----,4H    /
      DATA RLB/'NOLB'/
C     DEFAULT VALUES
      IF(KBEG.LT.1) KBEG=1
      IF(INTRV.LT.1.OR.INTRV.GT.3) INTRV=1
      IF(NT.LE.0) NT=6
C     INITIALIZE ARRAYS
      NOBJ=N+1
      DO 10 K=1,NOBJ
      LINE(K)=0
      LCLNO(K)=0
      LAST(K)=0
      DO 10 L=1,25
      A(L,K)=BLANK
   10 CONTINUE
C     SEGMENT THE S ARRAY
      GO TO (20,40,120),INTRV
C     CONSTRUCT INTERVALS OF EQUAL LENGTH
   20 RANGE=S(N)-S(KBEG)
      DELTA=RANGE/25.
      EPS(1)=S(KBEG)+DELTA
      DO 30 K=2,24
   30 EPS(K)=EPS(K-1)+DELTA
      EPS(25)=S(N)
C     CONSTRUCT THE IS ARRAY
   40 IF(EPS(1).GT.EPS(2)) GO TO 70
C     S INCREASES WITH DISSIMILARITY (AS DOES A DISTANCE)
      KK=1
      DO 60 K=1,N
   50 IF(S(K).LE.EPS(KK)) GO TO 60
      IF(KK.EQ.25) GO TO 60
      KK=KK+1
      GO TO 50
   60 IS(K)=KK
      GO TO 120
C     S DECREASES WITH DISSIMILARITY (AS DOES A CORRELATION)
   70 KK=24
      KKK=25
      NN=N+1
      DO 90 K=1,N
      KCOMP=NN-K
   80 IF(S(KCOMP).LT.EPS(KK)) GO TO 90
      KKK=KK
      KK=KK-1
      IF(KK.EQ.0) GO TO 100
      GO TO 80
   90 IS(KCOMP)=KKK
  100 DO 110 K=1,KCOMP
  110 IS(K)=1
C     PRINT INPUT TO TREE
  120 WRITE(NT,2000) TITLE
      WRITE(NT,2100) KBEG,N
      WRITE(NT,2200)
      WRITE(NT,2300)
      M=1
      WRITE(NT,2400) M,S(KBEG),EPS(M)
      DO 130 M=2,25
      MM=M-1
  130 WRITE(NT,2400) M,EPS(MM),EPS(M)
      IF(IABS(IPRNT).EQ.1) GO TO 150
C     PRINT THE CLUSTER MERGE DATA
      WRITE(NT,2000) TITLE
      WRITE(NT,2500)
      DO 140 K=KBEG,N
      WRITE(NT,2600) K,I(K),J(K),S(K),IS(K),IL(K),JL(K),NEXT(K)
  140 CONTINUE
C     START TREE WITH THE MOST SIMILAR PAIR
  150 K=KBEG
      LNO=0
C     MERGE CLUSTERS I(K) AND J(K)
  160 IK=I(K)
      JK=J(K)
C     SET LINE NUMBERS FOR OUTPUT
      IF(IL(K).NE.0) GO TO 170
      LNO=LNO+1
      LINE(IK)=LNO
      LCLNO(LNO)=IK
  170 IF(JL(K).NE.0) GO TO 180
      LNO=LNO+1
      LINE(JK)=LNO
      LCLNO(LNO)=JK
C     FILL IN THE PRINT LINES
  180 ISK=IS(K)
      KT=0
      ITEM=IK
  190 LITEM=LINE(ITEM)
      IF(ISK-LAST(LITEM)-1) 225,200,210
C     ADD ONLY ONE MORE SEGMENT FOR LINE(ITEM)
  200 A(ISK,LITEM)=BARI
      LAST(LITEM)=ISK
      GO TO 225
C     ADD MORE THAN ONE SEGMENT
  210 LBEG=LAST(LITEM)+1
      LEND=ISK-1
      DO 220 L=LBEG,LEND
  220 A(L,LITEM)=BARS
      GO TO 200
C     REPEAT FOR CLUSTER J(K)
  225 KT=KT+1
      IF(KT.NE.1) GO TO 230
      ITEM=JK
      GO TO 190
C     TAKE CARE OF ANY LINES BETWEEN I(K) AND J(K)
  230 LIK=LINE(IK)
      LJK=LINE(JK)
      IF(LIK.GT.LJK) GO TO 240
      LBOT=LJK
      LTOP=LIK
      GO TO 250
  240 LBOT=LIK
      LTOP=LJK
  250 IF(LBOT.EQ.(LTOP+1)) GO TO 270
C     MUST FILL IN SOME VERTICAL CONNECTIONS
      LBEG=LTOP+1
      LEND=LBOT-1
      DO 260 L=LBEG,LEND
      IF(A(ISK,L).EQ.BARI) GO TO 260
      A(ISK,L)=BLINK
      LAST(L)=ISK
  260 CONTINUE
C     UPDATE LINE NUMBER FOR NEW CLUSTER
  270 LINE(IK)=(LINE(IK)+LINE(JK))/2
C     MERGE COMPLETE.  FIND NEXT STAGE
      KLAST=K
      K=NEXT(K)
      IF(K.GT.N.OR.K.LT.1) GO TO 400
      IF(IL(K).LE.0) GO TO 280
      IF(JL(K).LE.0) GO TO 290
      GO TO 300
  280 IL(K)=-IL(K)
      GO TO 160
  290 JL(K)=-JL(K)
      GO TO 160
C     THIS MERGE INVOLVES TWO CLUSTERS THAT EACH HAVE MORE THAN ONE
C     MEMBER.  BACKTRACK TO THE ROOT OF THE TREE ALONG THE UNEXPLORED
C     BRANCH.
  300 IF(IL(K).EQ.KLAST) GO TO 310
C     GO DOWN IL(K) BRANCH.  SET JL(K) SO WE KNOW NOT TO GO DOWN THAT
C     BRANCH AGAIN
      JL(K)=-JL(K)
      K=IL(K)
      GO TO 320
C     GO DOWN JL(K) BRANCH.  SET IL(K) SO WE KNOW NOT TO GO DOWN THAT
C     BRANCH AGAIN
  310 IL(K)=-IL(K)
      K=JL(K)
  320 IF(K.LT.1.OR.K.GT.N) GO TO 600
C     TEST TO SEE IF THE END HAS BEEN REACHED, IL(K)=JL(K) IFF BOTH
C     ZERO.
      IF(IL(K)-JL(K)) 330,160,350
  330 IF(IL(K).EQ.0) GO TO 360
  340 K=IL(K)
      GO TO 320
  350 IF(JL(K).EQ.0) GO TO 340
  360 K=JL(K)
      GO TO 320
C     PRINT THE TREE
  400 WRITE(NT,2000) TITLE
      IF(LABEL(1,1).EQ.RLB) GO TO 420
      WRITE(NT,3000) (K,K=1,25)
      DO 410 L=1,LNO
      LL=LCLNO(L)
  410 WRITE(NT,3100) (LABEL(K,LL),K=1,5),LL,(A(K,L),K=1,25)
      GO TO 440
C     LEAVE LABEL SPACES BLANK
  420 WRITE(NT,3010) (K,K=1,25)
      DO 430 L=1,LNO
      LL=LCLNO(L)
  430 WRITE(NT,3210) LL,(A(K,L),K=1,25)
C     TREE COMPLETE
  440 IF(IPRNT.GT.0) RETURN
C     PUNCH SEQUENCE LIST
      WRITE(7,3900) TITLE
      WRITE(7,4000) (LCLNO(L),L=1,LNO)
      RETURN
C     ERROR, PRINT AS MUCH OF THE TREE AS HAS BEEN CONSTRUCTED
  600 WRITE(NT,6000) KLAST,K
      GO TO 400
 2000 FORMAT(1H1,20X,20A4,//)
 2100 FORMAT(65H THIS RUN DEPICTS THE PORTION OF THE TREE GENERATED BETW
     AEEN STAGE,I5,11H AND STAGE ,I5,19H OF THE CLUSTERING.,/)
 2200 FORMAT(63H THE CRITERION VALUES ARE SEGMENTED INTO THE FOLLOWING C
     ALASSES.,/)
 2300 FORMAT(6H CLASS,5X,11HLOWER BOUND,5X,11HUPPER BOUND,//)
 2400 FORMAT(1X,I5,2E16.8)
 2500 FORMAT(1H ,9X,1HK,9X,1HI,9X,1HJ,15X,1HS,8X,2HIS,8X,2HIL,8X,2HJL,6X
     A,4HNEXT,//)
 2600 FORMAT(1X,3I10,E16.8,4I10)
 3000 FORMAT(10H ITEM NAME,12X,5HID NO,2X,25I4,//)
C     IF LOCAL CONVENTIONS PERMIT, RECOMMEND THAT THE CARRIAGE CONTROL
C     CHARACTER IN FORMATS 3100 AND 3210 ALLOW 66 LINES OF PRINT PER
C     PAGE.  THAT IS, THE MARGINS AT THE TOP AND BOTTOM OF THE PAGE ARE
C     SUPPRESSED AND PRINTING IS SINGLE SPACE.
 3100 FORMAT(1H ,5A4,I6,2X,25A4)
 3010 FORMAT(5X,5HID NO,2X,25I4,//)
 3210 FORMAT(5X,I6,2X,25A4)
 3900 FORMAT(20A4)
 4000 FORMAT(20I4)
 6000 FORMAT(37H ERROR. WHILE BACKTRACKING FROM KLAST,I6,27H K WAS FOUND
     A  OUT OF RANGE.,/,1X,3HK =,I20)
      END
      SUBROUTINE MTXIN(X,IOPT,NE,NTIN,FMT)
C     THIS SUBROUTINE READS A LOWER TRIANGULAR MATRIX *X* REPRESENTING
C     ASSOCIATION AMONG *NE* ENTITIES.  THE MATRIX IS READ FROM UNIT
C     *NTIN* IN FORMAT *FMT*.  THE MODE OF INPUT FOR THE MATRIX IS
C     DETERMINED BY THE *IOPT* PARAMETER AS FOLLOWS,
C     IOPT.LE.0, MATRIX IS READ IN LOWER TRIANGULAR FORM BY ROWS, EACH
C                ROW BEING A NEW RECORD.
C     IOPT.GT.0, MATRIX IS READ IN CONSTANT LENGTH BLOCKS, EACH *IOPT*
C                WORDS LONG.
      DIMENSION FMT(20),X(1)
      INTEGER FIRST
      IF(IOPT.LE.0) GO TO 30
C     READ THE SIMILARITY MATRIX IN BLOCKS IOPT LONG
      FIRST=1
      LAST=IOPT
   10 READ(NTIN,FMT,END=60) (X(I),I=FIRST,LAST)
C     USE THE END OF RECORD CARD TO SIGNIFY END OF THE SIMILARITY
C     MATRIX
   20 FIRST=FIRST+IOPT
      LAST=LAST+IOPT
      GO TO 10
C     READ THE SIMILARITY MATRIX AS ROWS OF A LOWER TRIANGULAR MATRIX,
C     EACH ROW A RECORD.
   30 FIRST=1
      LAST=1
      DO 50 K=2,NE
      READ(NTIN,FMT,END=200) (X(I),I=FIRST,LAST)
   40 FIRST=LAST+1
      LAST=LAST+K
   50 CONTINUE
C     PASS THE END OF FILE
      READ(NTIN,FMT,END=60) Z
  210 WRITE(6,2500)
      GO TO 999
   60 RETURN
C     ERROR MESSAGES
  200 WRITE(6,2400)
      GO TO 220
  220 WRITE(6,2600) K,FIRST,LAST,Z,(X(I),I=FIRST,LAST)
  999 STOP
 2400 FORMAT(36H EOF ENCOUNTERED WHEN NONE EXPECTED.)
 2500 FORMAT(30H NO EOF WHEN ONE WAS EXPECTED.)
 2600 FORMAT(1X,3I10,F10.7,/,(1X,12F10.7))
      END
$SIG
      SUBROUTINE METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,
     AJOB)
C
C     HIERARCHICAL CLUSTERING BY SINGLE LINKAGE.
C     THE ALGORITHM IS DERIVED FROM JOHNSON, S.C., HIERARCHICAL
C     CLUSTERING SCHEMES, PSYCHOMETRIKA, VOLUME 32, NUMBER 3,
C     SEPTEMBER 1967, PP 241-254.
C
      DIMENSION S(1),NEAR(1),SREF(1),LIST(1),A(1),B(1)
      GO TO (10,15,20),JOB
C     JOB=1.  INITIALIZATION
   10 WRITE(6,3000)
 3000 FORMAT(26H0SINGLE LINKAGE CLUSTERING)
      BIG=SIGN*1.E50
      RETURN
C     JOB=2, DUMMY ENTRY.
   15 RETURN
C     JOB=3, UPDATE FOR NEXT ROUND.
   20 CONTINUE
      DO 50 J=1,NCL
C     UPDATE ENTRIES IN S ARRAY ASSOCIATED WITH NREF
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 50
C     RECALL THAT LREF HAS BEEN REMOVED FROM LIST SO I NEED NOT BE
C     TESTED FOR EQUALITY WITH LREF
      LL=LFIND(I,LREF)
      LN=LFIND(I,NREF)
      IF(((S(LL)-S(LN))*SIGN).GE.0.) GO TO 35
      S(LN)=S(LL)
      IF(I.GT.NREF) GO TO 30
C     I.LT.NREF
C     CHECK WHETHER S(LN) HAS A BETTER VALUE THAN SREF(NREF)
      IF(((S(LN)-SREF(NREF))*SIGN).GT.0.) GO TO 50
      NEAR(NREF)=I
      SREF(NREF)=S(LN)
      GO TO 50
   30 IF(I.GT.LREF) GO TO 40
C     I.GT.NREF.AND.I.LT.LREF
C     CHECK WHETHER S(LN) HAS A BETTER VALUE THAN SREF(I)
      IF(((S(LN)-SREF(I))*SIGN).GE.0.) GO TO 50
      SREF(I)=S(LN)
      NEAR(I)=NREF
      GO TO 50
   35 IF(I.LT.LREF) GO TO 50
C     I.GT.LREF
C     UPDATE NEAR ARRAY FOR THOSE ROWS WHOSE EXTREME ELEMENT WAS LREF
   40 IF(NEAR(I).NE.LREF) GO TO 50
      NEAR(I)=NREF
      SREF(I)=S(LN)
   50 CONTINUE
      RETURN
      END
      SUBROUTINE METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,
     AJOB)
C
C     HIERARCHICAL CLUSTERING BY COMPLETE LINKAGE.  THE ALGORITHM IS
C     DERIVED FROM
C     JOHNSON, S.C., HIERARCHICAL CLUSTERING SCHEMES, PSYCHOMETRIKA,
C     VOLUME 32, NUMBER 3, SEPTEMBER 1967, PP 241-254.
C
      DIMENSION S(1),NEAR(1),SREF(1),LIST(1),A(1),B(1)
      GO TO (10,15,20),JOB
C     JOB=1.  INITIALIZATION
   10 WRITE(6,2000)
 2000 FORMAT(28H0COMPLETE LINKAGE CLUSTERING)
      BIG=SIGN*1.E50
      RETURN
C     JOB=2, DUMMY ENTRY.
   15 RETURN
C     JOB=3, UPDATE FOR NEXT ROUND.
   20 DO 30 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 30
C     RECALL THAT LREF HAS BEEN REMOVED FROM LIST SO I NEED NOT BE
C     TESTED FOR EQUALITY WITH LREF.
      LL=LFIND(I,LREF)
      LN=LFIND(I,NREF)
      IF(((S(LL)-S(LN))*SIGN).LE.0.) GO TO 30
      S(LN)=S(LL)
   30 CONTINUE
C     UPDATE THE NEAR AND SREF ARRAYS.  IF THE EXTREME ELEMENT IN ROW I
C     WAS EITHER LREF OR NREF, THEN IT IS NECESSARY TO FIND A NEW
C     EXTREME ELEMENT.  ROWS PRIOR TO NREF NEED NOT BE CONSIDERED.
   40 DO 50 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 55
   50 CONTINUE
   55 IF(J.EQ.1) GO TO 80
   60 SREF(I)=BIG
      J1=J-1
      DO 70 L=1,J1
      LISTL=LIST(L)
      LL=LFIND(I,LISTL)
      IF(((S(LL)-SREF(I))*SIGN).GE.0.) GO TO 70
      NEAR(I)=LISTL
      SREF(I)=S(LL)
   70 CONTINUE
   80 J=J+1
      IF(J.GT.NCL) RETURN
      I=LIST(J)
      IF(NEAR(I).EQ.LREF.OR.NEAR(I).EQ.NREF) GO TO 60
      GO TO 80
      END
      SUBROUTINE METHOD(S,NEAR,SREF,LIST,NUMBR,SUM,SREFX,SIGN,N,NCL,
     ALREF,NREF,JOB)
C
C     HIERARCHICAL CLUSTERING BY MINIMIZING THE AVERAGE DISTANCE OR
C     MAXIMIZING THE AVERAGE CORRELATION BETWEEN THE MERGED GROUPS.
C
C     THE ALGORITHM IS DERIVED FROM THE *GROUP AVERAGE* METHOD
C     DESCRIBED IN LANCE, G.N. AND W.T. WILLIAMS, A GENERAL THEORY OF
C     CLASSIFICATORY SORTING STRATEGIES, 1. HIERARCHICAL SYSTEMS, THE
C     COMPUTER JOURNAL, VOLUME 9, NUMBER 4, FEBRUARY 1967, PP 373-380.
C
      DIMENSION S(1),NEAR(1),SREF(1),LIST(1),NUMBR(1),SUM(1)
      GO TO (10,25,30),JOB
C     JOB=1, INITIALIZE.
C     NUMBR(I)=NUMBER OF ENTITIES CURRENTLY IN THE I-TH CLUSTER
   10 WRITE(6,2000)
 2000 FORMAT(42H0AVERAGE LINKAGE BETWEEN THE MERGED GROUPS)
      DO 20 J=1,N
   20 NUMBR(J)=1
      BIG=SIGN*1.E50
      RETURN
C     JOB=2, DUMMY ENTRY.
   25 RETURN
C     JOB=3, UPDATE FOR NEXT ROUND.
C     UPDATE THE NEW CLUSTER
   30 NUMBR(NREF)=NUMBR(NREF)+NUMBR(LREF)
C     UPDATE ENTRIES IN THE REDUCED SIMILARITY MATRIX.  THE ENTRIES ARE
C     THE SUM TOTAL OF SIMILARITY VALUES ASSOCIATED WITH ALL
C     PAIRWISE LINKS BETWEEN THE ELEMENTS OF THE TWO CLUSTERS.
      DO 40 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 40
C     RECALL THAT LREF HAS BEEN REMOVED FROM LIST AND THEREFORE I NEED
C     NOT BE TESTED FOR EQUALITY WITH LREF.
      LL=LFIND(I,LREF)
      LN=LFIND(I,NREF)
      S(LN)=S(LN)+S(LL)
   40 CONTINUE
C     UPDATE THE NEAR AND SREF ARRAYS.  IF THE EXTREME ELEMENT IN ROW I
C     WAS EITHER LREF OR NREF, THEN IT IS NECESSARY TO FIND A NEW
C     EXTREME ELEMENT.  ROWS PRIOR TO NREF NEED NOT BE CONSIDERED.
      DO 50 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 55
   50 CONTINUE
   55 IF(J.EQ.1) GO TO 80
   60 SREF(I)=BIG
      J1=J-1
      DO 70 L=1,J1
      LISTL=LIST(L)
      LL=LFIND(I,LISTL)
      SREFX=S(LL)/(NUMBR(I)*NUMBR(LISTL))
      IF(((SREFX-SREF(I))*SIGN).GE.0.) GO TO 70
      NEAR(I)=LISTL
      SREF(I)=SREFX
   70 CONTINUE
   80 J=J+1
      IF(J.GT.NCL) RETURN
      I=LIST(J)
      IF(NEAR(I).EQ.LREF.OR.NEAR(I).EQ.NREF) GO TO 60
      GO TO 80
      END
      SUBROUTINE METHOD(S,NEAR,SREF,LIST,NUMBR,SUM,SREFX,SIGN,N,NCL,
     ALREF,NREF,JOB)
C
C     HIERARCHICAL CLUSTERING BY MINIMIZING THE AVERAGE DISTANCE OR
C     MAXIMIZING THE AVERAGE CORRELATION WITHIN THE NEW GROUP.  THAT
C     IS, FOR EACH POTENTIAL MERGE THE AVERAGE OF ALL LINKAGES WITHIN
C     THE NEW GROUP IS CALCULATED.
C
      DIMENSION S(1),NEAR(1),SREF(1),LIST(1),NUMBR(1),SUM(1)
      GO TO (10,25,30),JOB
C     JOB=1, INITIALIZE.
C     NUMBR(I)=NUMBER OF ENTITIES CURRENTLY IN THE I-TH CLUSTER
C     SUM(I)=SUM OF ALL PAIRWISE SIMILARITIES AMONG ENTITIES IN THE
C            I-TH CLUSTER
   10 WRITE(6,2000)
 2000 FORMAT(37H0AVERAGE LINKAGE WITHIN THE NEW GROUP)
      DO 20 J=1,N
      NUMBR(J)=1
   20 SUM(J)=0.
      BIG=SIGN*1.E50
      RETURN
C     JOB=2, DUMMY ENTRY.
   25 RETURN
C     JOB=3, UPDATE FOR NEXT ROUND.
C     UPDATE THE NEW CLUSTER
   30 NUMBR(NREF)=NUMBR(NREF)+NUMBR(LREF)
      LN=LFIND(LREF,NREF)
      SUM(NREF)=SUM(NREF)+SUM(LREF)+S(LN)
C     UPDATE ENTRIES IN THE REDUCED SIMILARITY MATRIX.  THE ENTRIES ARE
C     THE SUM TOTAL OF SIMILARITY VALUES ASSOCIATED WITH ALL
C     PAIRWISE LINKS BETWEEN THE ELEMENTS OF THE TWO CLUSTERS.
      DO 40 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 40
C     RECALL THAT LREF HAS BEEN REMOVED FROM LIST AND THEREFORE I NEED
C     NOT BE TESTED FOR EQUALITY WITH LREF.
      LL=LFIND(I,LREF)
      LN=LFIND(I,NREF)
      S(LN)=S(LN)+S(LL)
   40 CONTINUE
C     UPDATE THE NEAR AND SREF ARRAYS.  IF THE EXTREME ELEMENT IN ROW I
C     WAS EITHER LREF OR NREF, THEN IT IS NECESSARY TO FIND A NEW
C     EXTREME ELEMENT.  ROWS PRIOR TO NREF NEED NOT BE CONSIDERED.
      DO 50 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 55
   50 CONTINUE
   55 IF(J.EQ.1) GO TO 80
   60 SREF(I)=BIG
      J1=J-1
      DO 70 L=1,J1
      LISTL=LIST(L)
      LL=LFIND(I,LISTL)
      NTOT=NUMBR(I)+NUMBR(LISTL)
      NTOT=((NTOT)*(NTOT-1))/2
      SREFX=(SUM(I)+SUM(LISTL)+S(LL))/NTOT
      IF(((SREFX-SREF(I))*SIGN).GE.0.) GO TO 70
      NEAR(I)=LISTL
      SREF(I)=SREFX
   70 CONTINUE
   80 J=J+1
      IF(J.GT.NCL) RETURN
      I=LIST(J)
      IF(NEAR(I).EQ.LREF.OR.NEAR(I).EQ.NREF) GO TO 60
      GO TO 80
      END
      SUBROUTINE METHOD(S,NEAR,SREF,LIST,NUMBR,SUM,SREFX,SIGN,N,NCL,
     ALREF,NREF,JOB)
C
C     HIERARCHICAL CLUSTERING BY CENTROID SORTING
C
C     THE PARTICULAR ALGORITHM USED HERE IS DESCRIBED IN LANCE, G.N.
C     AND W.T. WILLIAMS, A GENERAL THEORY OF CLASSIFICATORY SORTING
C     STRATEGIES, 1. HIERARCHICAL SYSTEMS, THE COMPUTER JOURNAL,
C     VOLUME 9, NUMBER 4, FEBRUARY 1967, PP 373-380.
C
      DIMENSION S(1),NEAR(1),SREF(1),LIST(1),NUMBR(1),SUM(1)
      GO TO (10,25,30),JOB
C     JOB=1, INITIALIZE.
C     NUMBR(I)=NUMBER OF ENTITIES CURRENTLY IN THE I-TH CLUSTER
   10 WRITE(6,2000)
 2000 FORMAT(42H0CENTROID CLUSTERING.  BEWARE OF REVERSALS)
      DO 20 J=1,N
   20 NUMBR(J)=1
      BIG=SIGN*1.E50
      RETURN
C     JOB=2, DUMMY ENTRY.
   25 RETURN
C     JOB=3, UPDATE FOR NEXT ROUND.
C     UPDATE THE NEW CLUSTER
   30 NTOT=NUMBR(NREF)+NUMBR(LREF)
      TOT=NTOT
C     ALL AND ALN ARE THE WEIGHTS OF LREF AND NREF IN THE NEW CENTROID
      ALL=NUMBR(LREF)/TOT
      ALN=NUMBR(NREF)/TOT
      PROD=ALN*ALL
C     RECORD THE SIZE OF THE MERGED CLUSTER, WHICH IS CARRIED UNDER NREF
      NUMBR(NREF)=NTOT
      LBET=LFIND(LREF,NREF)
      DO 40 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 40
C     RECALL THAT LREF HAS BEEN REMOVED FROM LIST AND THEREFORE I NEED
C     NOT BE TESTED FOR EQUALITY WITH LREF.
      LL=LFIND(I,LREF)
      LN=LFIND(I,NREF)
      S(LN)=ALL*S(LL)+ALN*S(LN)-PROD*S(LBET)
   40 CONTINUE
C     UPDATE THE NEAR AND SREF ARRAYS.  IF THE EXTREME ELEMENT IN ROW I
C     WAS EITHER LREF OR NREF, THEN IT IS NECESSARY TO FIND A NEW
C     EXTREME ELEMENT.  ROWS PRIOR TO NREF NEED NOT BE CONSIDERED.
      DO 50 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 55
   50 CONTINUE
   55 IF(J.EQ.1) GO TO 80
   60 SREF(I)=BIG
      J1=J-1
      DO 70 L=1,J1
      LISTL=LIST(L)
      LL=LFIND(I,LISTL)
      IF(((S(LL)-SREF(I))*SIGN).GE.0.) GO TO 70
      NEAR(I)=LISTL
      SREF(I)=S(LL)
   70 CONTINUE
   80 J=J+1
      IF(J.GT.NCL) RETURN
      I=LIST(J)
      IF(NEAR(I).EQ.LREF.OR.NEAR(I).EQ.NREF) GO TO 60
      GO TO 80
      END
$SIG
      SUBROUTINE METHOD(S,NEAR,SREF,LIST,A,B,SREFX,SIGN,N,NCL,LREF,NREF,
     AJOB)
C
C     HIERARCHICAL CLUSTERING BY THE MEDIAN METHOD OF GOWER, J.C.,
C     A COMPARISON OF SOME METHODS OF CLUSTER ANALYSIS, BIOMETRICS,
C     VOLUME 23, NUMBER 4, DECEMBER 1967, PP 623-637.
C
      DIMENSION S(1),NEAR(1),SREF(1),LIST(1),A(1),B(1)
      GO TO (10,15,20),JOB
C     JOB=1.  INITIALIZATION
   10 WRITE(6,2000)
 2000 FORMAT(44H0MEDIAN METHOD OF GOWER, BEWARE OF REVERSALS)
      BIG=SIGN*1.E50
      RETURN
C     JOB=2, DUMMY ENTRY.
   15 RETURN
C     JOB=3, UPDATE FOR NEXT ROUND.
   20 LBET=LFIND(LREF,NREF)
      DO 30 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 30
C     RECALL THAT LREF HAS BEEN REMOVED FROM LIST SO I NEED NOT BE
C     TESTED FOR EQUALITY WITH LREF.
      LL=LFIND(I,LREF)
      LN=LFIND(I,NREF)
C     IF S IS A DECREASING FUNCTION OF SIMILARITY (E.G. DISTANCE) THEN
C     S(LN)=(S(LN)+S(LL))/2.-S(LBET)/4.
C     IF S IS AN INCREASING FUNCTION OF SIMILARITY (E.G. CORRELATION)
C     THEN S(LN)=(S(LN)+S(LL))/2.+(1.-S(LBET))/4.
      S(LN)=(S(LN)+S(LL))/2.-S(LBET)/4.
   30 CONTINUE
C     UPDATE THE NEAR AND SREF ARRAYS.
C     IF THE EXTREME ELEMENT IN ROW I WAS EITHER LREF OR NREF, THEN IT
C     IS NECESSARY TO FIND A NEW EXTREME ELEMENT. ROWS PRIOR TO NREF
C     NEED NOT BE CONSIDERED.
   40 DO 50 J=1,NCL
      I=LIST(J)
      IF(I.EQ.NREF) GO TO 55
   50 CONTINUE
   55 IF(J.EQ.1) GO TO 80
   60 SREF(I)=BIG
      J1=J-1
      DO 70 L=1,J1
      LISTL=LIST(L)
      LL=LFIND(I,LISTL)
      IF(((S(LL)-SREF(I))*SIGN).GE.0.) GO TO 70
      NEAR(I)=LISTL
      SREF(I)=S(LL)
   70 CONTINUE
   80 J=J+1
      IF(J.GT.NCL) RETURN
      I=LIST(J)
      IF(NEAR(I).EQ.LREF.OR.NEAR(I).EQ.NREF) GO TO 60
      GO TO 80
      END

CLUSTER TRIAL RUN

NE = 80
ALLOTTED STORAGE = 7000 WORDS

AVERAGE LINKAGE WITHIN THE NEW GROUP

[Table: the criterion values are segmented into 25 classes of equal width, each listed with its lower and upper bound. The lower bound of class 1 is 0.21200000E-01 and the upper bound of class 25 is 0.36708240E 00. The printed tree depicts the portion of the tree generated between stage 1 and stage 79 of the clustering.]

[Table: the 79 clustering stages, giving the merged clusters, the criterion value and its class at each stage. The criterion value rises from 0.21200000E-01 at stage 1 to 0.36708240E 00 at stage 79; the remaining entries are not recoverable from this copy.]
[Figure: tree diagram printed by the clustering program for the 80 data units of the trial run. The plotted tree is not recoverable from this copy.]

APPENDIX C

Sample Outputs from Program UBC:BMDP2M

BMDP2M - CLUSTER ANALYSIS OF CASES
HEALTH SCIENCES COMPUTING FACILITY
UNIVERSITY OF CALIFORNIA, LOS ANGELES

PROGRAM REVISED FEBRUARY 26, 1973
WRITEUP REVISED SEPTEMBER, 1971

PROBLEM CONTROL CARDS
PROB TITLE IS 'CLUSTER TRIAL RUN - TDATA1 - 2M'./
INPUT VARIABLE=2. CASE=80. FORMAT='(5X,2F7.6)'./
PROC SUMOFSQ. STAND./
PRINT DATA. DISTANCE. VERTICAL./
END /

PROBLEM TITLE . . . . . . . . . . . . . . . . . . CLUSTER TRIAL RUN - TDATA1 - 2M
NUMBER OF VARIABLES TO READ IN . . . . . . . . . 2
NUMBER OF VARIABLES ADDED BY TRANSFORMATIONS . . 0
TOTAL NUMBER OF VARIABLES . . . . . . . . . . . . 2
NUMBER OF CASES TO READ IN . . . . . . . . . . . 80
CASE LABELING VARIABLES . . . . . . . . . . . . .
LIMITS AND MISSING VALUE CHECKED BEFORE TRANSFORMATIONS
INPUT TAPE NUMBER . . . . . . . . . . . . . . . . 5
REWIND INPUT TAPE PRIOR TO READING DATA . . . . . NO
INPUT FORMAT . . . . . . . . . . . . . . . . . . (5X,2F7.6)
PRINT DISTANCE MATRIX . . . . . . . . . . . . . . YES
TYPE OF TREE PRINTED . . . . . . . . . . . . . . VERTICAL
CALCULATING PROCEDURE . . . . . . . . . . . . . . SUM-SQR
STANDARDIZATION ON INPUT DATA . . . . . . . . . . YES
PRINT INPUT DATA MATRIX AFTER STANDARDIZATION . . YES

[Table: STANDARDIZED INPUT DATA, listing the case number, name and the two standardized variables for each of the 80 cases. The columns are not recoverable from this copy.]

[Table: the 79 amalgamation steps, giving the amalgamation distance and the values of the variables of the clusters formed. The entries are not recoverable from this copy.]

APPENDIX D

Sample Outputs from Program UBC:CGROUP

PROBLEM NAME: CLUSTER TRIAL RUN / DATA1
NUMBER OF ITEMS TO BE GROUPED = 80
START PRINTING WHEN THERE ARE 10 GROUPS
DATA FORMAT: (5X,2F7.6)

[The remaining option settings (standardize, print a grouping tree, store grouping keys, contiguity constraint, item identification names to be read, plot group membership graph, number of format cards, transpose data matrix, error terms) cannot be paired with their printed values in this copy.]

720 BYTES OF CORE ARE ACQUIRED TO TRANSPOSE THE DATA MATRIX
EXECUTION TIME FOR TRANSPOSING = 0.09 SECONDS

[Table: grouping history. For each of the 79 steps the program prints the number of groups remaining, the two groups joined with their sizes (N), the error of the step, the cumulative error and an index; group membership lists are printed once 10 groups remain. The entries are not recoverable from this copy.]
[Group membership listings for the final grouping steps, ending with a single group of 80 items, followed by the execution time for grouping. The entries are not recoverable from this copy.]

[Figure: plotted output titled "CLUSTER TRIAL RUN / DATA1".]

APPENDIX E

Listing and Sample Outputs from NONHIER: a Computer Program for Three Nonhierarchical Clustering Techniques

      DIMENSION X(7500)
      LIMIT=7500
      CALL EXEC(X,LIMIT)
      STOP
      END
      SUBROUTINE EXEC(X,LIMIT)
C
C     THIS SUBROUTINE READS PARAMETERS, COMPUTES STORAGE AND CALLS
C     MAJOR PROGRAM SEGMENTS NEEDED FOR A NON-HIERARCHICAL CLUSTERING
C     JOB USING ONE OF THE METHODS PROGRAMMED AS A VERSION OF
C     SUBROUTINE *KMEAN*.
C
C     EVERY JOB REQUIRES THREE USER SUPPLIED DECK SEGMENTS,
C
C     1. PROGRAM *DRIVER* PERFORMS THE FOLLOWING TASKS.
C        A. ASSIGNS INPUT/OUTPUT UNITS.
C        B. ESTABLISHES THE DIMENSION OF THE *X* ARRAY AND SETS THIS
C           DIMENSION TO *LIMIT*
C        C. CALLS SUBROUTINE *EXEC*.
C        THE FOLLOWING EXAMPLE WILL SUFFICE IN MOST CASES.
C
C        PROGRAM DRIVER(INPUT,OUTPUT,PUNCH,TAPE5=INPUT,TAPE6=OUTPUT,
C       ATAPE7=PUNCH,TAPE1,TAPE2)
C        DIMENSION X(5000)
C        LIMIT=5000
C        CALL EXEC(X,LIMIT)
C        END
C
C     2. SUBROUTINE *USER* IS EMPLOYED TO READ THE COMPLETE SET OF
C        SCORES ON THE VARIABLES FOR ONE DATA UNIT. THE FOLLOWING
C        EXAMPLE ILLUSTRATES VARIOUS POSSIBILITIES FOR MERGING FILES
C        AND TRANSFORMING VARIABLES AS THEY ARE READ.
C
C        SUBROUTINE USER(X)
C        DIMENSION X(8)
C        READ(1,100) X(7),Y
C        READ(2) (X(I),I=1,6)
C        READ(5,200) X(8),Z
C        X(3)=.5*X(3)
C        X(7)=3.6*X(7)
C        X(8)=.4*X(8)+.35*Y+.25*Z*X(8)
C        RETURN
C    100 FORMAT(2F11.3)
C    200 FORMAT(F8.1,F6.3)
C        END
C
C     3. FUNCTION *DIST* COMPUTES THE DISTANCE BETWEEN TWO DATA UNITS
C        OR BETWEEN A DATA UNIT AND A CLUSTER CENTROID. THE USER CAN
C        SPECIFY ANY DESIRED DISTANCE FUNCTION AND WEIGHT THE
C        VARIABLES IN ANY MANNER. THE FOLLOWING EXAMPLE ILLUSTRATES A
C        WEIGHTED SQUARED EUCLIDEAN DISTANCE BETWEEN TWO DATA UNITS
C        DENOTED AS X AND Y. THE PROBLEM INVOLVES 8 VARIABLES AND THE
C        WEIGHTS ARE IN THE *W* ARRAY.
C
C        FUNCTION DIST(X,Y)
C        DIMENSION X(1),Y(1),W(8)
C        DATA (W(I),I=1,8)/3*1.,3.,4.5,2.,2*1./
C        DIST=0.
C        DO 10 I=1,8
C     10 DIST=DIST+W(I)*((X(I)-Y(I))**2)
C        RETURN
C        END
C
C     NOTE THAT SCALING AND TRANSFORMATION OF VARIABLES CAN BE
C     ACCOMPLISHED EITHER IN SUBROUTINE *USER* OR IN SUBROUTINE *DIST*.
C
C     INPUT SPECIFICATIONS
C     CARD 1  TITLE
C     CARD 2  PARAMETER CARD
C       COLS  1- 5  NE=NUMBER OF ENTITIES (DATA UNITS)
C       COLS  6-10  NV=NUMBER OF VARIABLES
C       COLS 11-15  NC=NUMBER OF CLUSTERS
C       COLS 16-20  NTIN=INPUT UNIT FOR THE DATA SET
C                   NTIN=5, CARD READER
C                   NTIN.NE.5, TAPE OR DISK FILE
C       COLS 21-25  NTOUT=OUTPUT UNIT FOR SAVING CLUSTER MEMBERSHIP
C                   LISTS
C                   NTOUT=7, PUNCH
C                   NTOUT.LE.0, DO NOT SAVE MEMBERSHIP LISTS
C       COLS 26-30  MINREL=TERMINATION PARAMETER. CLUSTERING ENDS WHEN
C                   A CYCLE THROUGH THE DATA SET RESULTS IN *MINREL*
C                   OR FEWER CHANGES IN CLUSTER MEMBERSHIPS
C                   MINREL.LE.0, ITERATE TO COMPLETE CONVERGENCE
C       COLS 31-35  IPART=INITIAL PARTITION PARAMETER
C                   IPART=1, SEED POINTS ARE SELECTED FROM THE DATA
C                     UNITS. READ THE SEQUENCE NUMBERS FOR THE CHOSEN
C                     DATA UNITS FROM CARD(S) 3 IN 20I4 FORMAT. IF THE
C                     DATA SET IS NOT STORED IN CORE, THE LIST OF
C                     SEQUENCE NUMBERS MUST BE IN ASCENDING ORDER
C                   IPART=2, THE DATA UNITS ARE GROUPED INTO AN
C                     INITIAL PARTITION IN THE INPUT SEQUENCE WITH THE
C                     FIRST *NUMBR(1)* IN CLUSTER 1, THE NEXT
C                     *NUMBR(2)* IN CLUSTER 2 ETC. READ THE *NUMBR*
C                     ARRAY FROM CARD(S) 3 IN 20I4 FORMAT.
C                   IPART=3, THE SCORE VECTORS FOR THE SEED POINTS ARE
C                     READ FROM CARD(S) 4 IN FORMAT *FMT* WHICH IS
C                     READ FROM CARD 3.
C       COLS 36-40  METHOD=PARAMETER FOR CHOOSING THE ALGORITHM IN ONE
C                   VERSION OF SUBROUTINE *KMEAN*.
C                   METHOD=1, JANCEY ALGORITHM
C                   METHOD.NE.1, FORGY ALGORITHM
C
C***CARDS 3 AND 4 ARE READ IN SUBROUTINE *KMEAN* ACCORDING TO THE
C***PROCEDURES SPECIFIED BY THE CHOSEN VALUE OF *IPART*. NOTE THAT THE
C***BASIC K-MEANS METHOD OF MACQUEEN SIMPLY USES THE FIRST *NC* DATA
C***UNITS AS CLUSTER SEED POINTS AND THEREFORE IGNORES THE *IPART*
C***PARAMETER.
C
L E e 0 , DO N O T S A V E M E M B E R S H I P LISTS MINREL=TERM INATION PARAMETER. C L U S T E R I N G E N D S WHEN A C Y C L E T H R O U G H T H E D A T A S E T R E S U L T S IN *MINREL* OR F E W E R C H A N G E S I N C L U S T t R E M B E R S H I PS MINREL.Lc.O, I T E R A T E TO C O M P L E T E C O N V E R G E N C E IPART=IN!TIAL PARTITION PARAMETER IPART=1, S E E D P O I N T S ARE S E L E C T E D FROM THE D A T A U N I T S . R E A D T H E S E Q U E N C E N U M 3 E R S FOR T H E C H O S E N D A T A U N I T S FROM C A R D ( S ) 3 IN 2 0 1 4 F O R M A T . IF THE DATA SET IS NOT S T O R E D IN C J R E , THE L I S T OF O F S E Q U E N C E N U M B E R S MUST B E I N A S C E N D I N G ORDER IPART=2i THE DATA UNITS ARE G R O U P E D I N T O AN I N I T I A L PARTITION IN T H E I N P U T S E Q U E N C E W I T H T H E FIRST *NUMBR(1)* IN C L U S T E R 1, T H E N E X T *NUMBR12)« IN C L U S T E R 2 E T C . READ THE *NUMBR* A R R A Y FROM C A R D ( S ) 3 IN 2 0 1 4 F O R M A T . IPART=3 T H E S C O R E V E C T O R S FOR T H E S E E D P O I N T S ARE  COLS  26-30  COLS  31-35  -  '  C  PUNCH  f  C C C C  CARD  UNITS)  . COLS  36-40  READ READ  FROM FROM  CARDIS) CARD 3 .  4  M E T H Q D = P A R A M E T E R FOR C H O O S I N G VERSION OF S U B R O U T I N E  METHQD=lf  JANCEY  ALGORITHM  IN  FORMAT  *FMT*  THE ALGORITHM *KMEAN*.  IN  WHICH ONE  IS  213  C M E T H O D . N E . l t FORGY A L G O R I T H M C C * * * C A R D S 3 AND 4 ARE READ I N S U B R O U T I N E * K M E A N * A C C O R D I N G TO THE C * * * P R 3 C £ D U R S S P E C I F I E D BY THE CHOSEN V A L U E OF * I P A R T * * NOTE THAT THE C * * * 3 A S I C K - M E A N S METHOD OF MAC QUE EM S I M P L Y U S E S THE F I R S T * N C * DATA C * * * U N I T S AS C L U S T E R SEED P O I N T S AND T H E R E F O R E IGNORES THE * I P A R T * C***PARAMETER. r  C  c c c c c c c c  .  
:  STORAGE A L L O C A T I O N S I N THE * X * ARRAY X ( N 1 ) TO X I N 2 - U N C * N V WORDS- STORAGE OF THE CENTR ARRAY NC W O R D S — S T O R A G E OF THE NUM3R ARRAY X I N 2 ) TO X ( N 3 - ' i ) NE W O R D S — S T O R A G E OF THE MEMBR ARRAY X ( N 3 ) TO X ( N 4 - 1 ) NC*NV W O R D S — S T O R A G E OF THE TOTAL ARRAY X ( N 4 ) TO X ( N 5 - 1 ) NV OR N E * N V W O R D S — S T O R A G E OF THE DAT4 ARRAY X ( N 5 ) TO X ( N 6 ) NE W O R D S — S T O R A G E OF THE L I S T ARRAY I N * R E S U L T * X ( N 4 ) TO X I N 7 )  DIMENSION X U ) , T I T L E ( 2 0 ) READ(5,10GO) TITLE RE A D ( 5 , 1 1 O O ) N E , N V , N C , N T I N , N T O U T , M I N R E L , I P A R T , M E T H O D WRITE(6,2000) TITLE W R I T E ( 6 , 2 1 0 0 ) N E , N V , N C , N T I N , N T O U T , M I N R E L , I PART,METHOD Nl=l N2=N1+NC*NV N3=N2+NC 'N4=N3+NE N5=N4+NC*NV C * N 6 * MAY BE I N C R E A S E D IN *KM£AN*. N6=N5+NV-1 N7=N4+NE-1 MAX=N6 IF(N7.GT.MAX) MAX=N7 WRITE(6,2200) MAX,LIMIT I F ( M A X o G T . L I M I T ) STOP CALL K M E A N ( X I N l ) , X ( N 2 ) , X ( N 3 ) , X ( N 4 ) , X t N 5 ) , N 5 , N E , N V , N C , N T IN,MINREL, A I P A R T , METHOD, L I M I T ) C A L L R E S U L T ( X ( N i ) , X { N 2 ) , X ( N 3 ) , X ( N 4 ) » T I T L E » N c » NV»NC » NTOUT) RETURN 1 0 0 0 FO R M A T ( 2 0 A 4 ) 1100 F0RMAT(8I5) 2 0 0 0 FORMAT( 1 H 1 , 2 0 A 4 ) 2 1 0 0 F O R M A T ( 5 H O N E = , I 3 , / , 5 H NV = , I 8 , / , 5 H N C . = , 1 8 , / , 7 H N T I N = , 1 6 , / , =,14) A8H NTOUT = , I 5 , / , 9 H M I N R E L = , I 4 , / , 8 H I PART = , I 5 , / , 9 H METHOD W O R D S , / , 2 2 0 0 F O R M A T ( 1 9 H 0 R E Q U I R E D STORAGE = , I 5 , 6 H A 1 9 H 0 A L L 0 T T E D STORAGE = , I 5 , 6 H WORDS) END C SUBROUTINE R E S U L T { C E N T R , N U M 3 R , M E M B R , L I S T , T I T L E , N E , N V . N C , N T O U T ) C T H I S S U B R O U T I N E P R I N T S THE R E S U L T S FROM A C L U S T E R I N G J O B BASED C ON ANY V E R S I O N OF S U B R O U T I N E * K M E A N * .  214  C .. 
DIMENSION C C  C C C  C E N T R ( l ) ,NUMBR<1) , M E M B R { 1 ) , L I ST ( 1 ) , T I T L £ 12_»)  AS A CONTINGENCY P R E C A U T I O N WRITE OUT THE RAW M E M 8 c R S H I P L I S T WRITE ( 6 , 2 0 0 0 ) T I T L E WRIT: ( 6 , 2100} (MEM3R(K ) , K = l , N l = l WRIT!. ( 6 , 2 2 0 0 ) ( N U M B P . t J ) , J = 1 , N C ) INVERT THE * M E M B R * ARRAY AND PUT THE R E S U L T I N THE * L I S T * ARRAY, F I R S T R E V I S E T H E * N U M B R * ARRAY TO C O N T A I N START P O I N T S I N THE * L I S T * ARRAY FOR EACH C L U S T E R NU M BR (N C ) = N E-NUM bR { N C ) +1 • J J = NC JJ1=JJ-1 DO 10 J = 2 N C NUMBR(JJl)=NUMBR(JJ)-NUMBR(JJ1) JJ=JJ1 JJI=JJ-1 B U I L D * L I S T * ARRAY DO 2 0 K = l N E MEMBRK=MEMBR(K) N J = NU M B R { M E M B RK ) LISTiNJ)=K NUMBR(MEM3RK)=NUMBR(MEMBRK)+1 CONTINUE SAVE THE SORTED M E M B E R S H I P L I S T I F D E S I R E D I F ( N T O U T o L E o O ) GO TO 3 0 WRITE(NTQUT,3000) TITLE WRITE{N TOUT13100) l L I ST(K) ,K = l , N E ) RESTORE T H E * N U M B R * ARRAY JJ=NC DO 4 0 J = 2 , N C NUMBR{J J ) = N U M 8 R ( J J ) - N U M B R { J J - 1 ) JJ=JJ-1 NUMBR(1)=NUMBR(1)-1 P R I N T R E S U L T S FOR E A C H C L U S T E R WRITE(6,2000) TITLE WRITE(8,2000) TITLE t  10 C  f  20 C  C 30  40 C  Kl = l  DO 5 0 J = l , N C WRITE(6,2300) J NUMBR(J) WRITE(8,2301) J,NUMBR(J) J1=(J-1)*MV W R I T E ( 6 , 2 4 0 0 ) {C E N T R ( J 1 + 1 ) , 1 = 1 , N V ) K2=K1+NUMBR(J)-1 WRITE(6,2500) (LIST(K),K=K1,K2) WRITE(8 ,2501 ) ( L I S T ( K F ) , K F = K 1 , K 2 ) K1=K2+1 CONTINUE WRITE(6,3500) RETURN f  50  215  2000 2100 2200 2300 2301 2400 2500 2501 3000 3100 3500  FDRMAT(1H1,20A4) F O R M A T ( 2 0 H 0 R A W M E M B E R S H I P L I S T , / , ( I X t 25 I 5 J ) FORMATi14H0CLUSTER S I Z E S , / , ( L X , 2 5 1 5)) F O R M A T { 3 H 0 C L U S T E R , I ? , 9 H C D M T A I N S , I 5 , 1 1 H DATA U N I T S ) F0RMAT12I4) F0RMAT(21H0CENTP.0ID C O O R D I N A T E S , / , ( 1 X » 1 0 E 1 2 « 4 ) ) FORMAT ( 1 6 H 0 M E M b E F . 
S H l P L I ST , / , { I X , 25 I 5 ) ) F0RMAT115I5) FORMAT(20A4) F0RMAT(20I4) F O R M A T ( ' 1 * » 1 5 X , ' E N D OF O U T P U T ' , / / / ) END  C  100  C  10  $SIG  SUBROUTINE U S E R ( X ) DIMENSION X ( 2 ) READ(5,100) X(1),X12) RETURN F O R M A T ( 5 X , 2 F 1 0 o 2) END FUNCTION D I S T ( X , Y ) DIMENSION X ( l ) , Y ( 1 ) DIST=0. DO 10 1=1,2 DIST=DIST+{(X(I)-YII))**2) RETURN END  216  SUBROUTINE KMEAN(CENTR,NUMBR,MEMBR,TOTAL,DATA,N5,NE,NV,NC,NT IN, AMI N R E L , I P A R T , M E T H O D , L I M I T . )  C C C C  • VERSION  lc  THE  DATA  SET  IS  STORED  IN  CENTRAL  MEMORY.  _.  __  C C C  THIS SUBROUTINE ITERATIVELY SORTS * N E * USING THE ALGORITHM OF (METHOD.NE.1)  DATA  UNITS  INTO  *NC*  CLUSTERS  C C C C C  FORGY, E.W., CLUSTER ANALYSIS OF M U L T I V A R I A T E D A T A . EFFICIENCY VERSUS IN7ERPRETA8IL!TY OF C L A S S I F I C A T I O N S , P A P E R P R E S E N T E D AT 3IOMETRIC SOCIETY IWNAR) M E E T I N G S , RIVERSIDE, CALIFORNIA, JUNE 1965. A B S T R A C T IN B I O M E T R I C S , V O L U M E 2 1 , NUMBER 3, P 7 6 8 .  C  OR  THE  C THE  ALGORITHM  C C  JANCEY,  C  OF  R.C.,  BOTANY,  OF  (METHOD=l)  MULTIDIMENSIONAL  VOLUME  14,  NUMBER  GROUP 1,  APRIL  ANALYSIS, 1966,  PP  AUSTRALIAN  JOURNAL  127-130.  C C C C C C C  CENTR(NV*{J-1)+I)=SC0RE ON I - T H V A R I A B L E FOR J - T H C L U S T E R CENTROID TOTAL(NV*(J-1)+I)=TOTAL S C O R E ON I - < H V A R I A B L E FOR DATA U N I T S THUS FAR A L L O C A T E D TO T H E J - T H C L U S T E R NUMBR{J)=NUMBER OF D A T A U N I T S T H U S F A R A L L O C A T E D TO T H E J - T H C L U S T E R MEM3RIK)=CLUSTER TO WHICH T H E K - T H DATA UNIT C U R R E N T L Y BELONGS DATA(NV*{K-i)+I)=SCORE ON I - T H V A R I A B L E FOR K - T H D A T A U M I T  C  C  C C  D I M E N S I O N C ' E N T R ( 1 ) , T 0 T A L ( 1 ) , N U M B R ( 1 ) , M E M B R ( 1 ) , D A T A {1) A,NAME(4) D A T A I N A M E ( I ) » I = 1, 4 ) / 4 H F,4H0RGY,4H JA,4HNCEY/ 1= 1 I F ( M E T H O D . E Q . 
1) 1=3 WRITE(6,2000) MAMS.I),NAME(1+1) WRITE 1 3 , 2 0 0 1 ) NAMECI),NAME{I+1) C H E C K FOR S U F F I C I E N T STORAGE N6=N5+NE*NV-1 WRITE(6f2100) N6iLIMIT IF(N6.GT.LIMIT) STOP ESTABLISH INITIAL PARTITION IF( IPARToME.3) GO T O 2 0 S E E D P O I N T S ARE R E A D D I R E C T L Y FROM CARDS READ ( 5 , 1 0 0 0 ) FMT WRITE(6,2200) FMT WRITE(6,2300) J1 = 0  DO  10  10  J=1,NC  READC 5 , F M T ) (CENTR{J1+I),I=1,NV) WRITE(6 ,2400) (CENTR<J1+1),I=1,NV) J1=J1+NV  ,FMT(20)  217  GO TO 3 0 I P A R T = 1 OR 2 WRITE(6,2500) I PART READ(5,1100) (NUMBR(J),J=1,NC) WRITE(6,2600) (NUMBRCJ),J^1,NC) C READ THE DATA SET INTO C E N T R A L MEMORY 30 Kl=l DO 4 0 K = 1 , N E C A L L USER ( D A T A ( K 1 ) ) 40 K1=K1+NV IF(IPART.EQ.3) GO TO 1 0 0 C I F * I P A R T * IS 1 OR 2 SET UP THE SEED P O I N T S I F U P A R T o E Q . 2 ) GO TO 6 0 C IPART=1. THE DATA U N I T WITH SEQUENCE NUMBER * N U M B R ( J ) * I S USED AS C THE J - T H S E E D POINT DO 5 0 J = 1 , N C NJ=(NUMBR( J ) - U * N V Jl=(J-l)*NV DO 5 0 I = 1 , N V C E N T R ( J 1 + 1) = D A T A ( N J + 1) 50 CONTINUE GO TO 1 0 0 C IPART=2» THE DATA U N I T S ARE GROUPED INTO C L U S T E R S WITH THE J - T H C CLUSTER HAVING * N U M B R ( J ) * MEMBERS. 60 K=0 J1=-NV C ACCUMULATE THE T O T A L SCORE ON E A C H V A R I A B L E FOR EACH C L U S T E R DO 80 J = 1 , N C NJ=NUM8R(J) J1=J1+NV DO 70 1 = 1 , N V 70 T O T A L ( J 1 + 1)=0 o DO 80 K J = l i ! M J K=K + 1 MEMBR(K)=J K1=(K-1)*NV DO 80 1 = 1 , N V J2=J1+I - T 0 T A L ( J 2 ) = T 0 T A L ( J 2 ) + D A T A ( K 1 +1 ) 80 • CONTINUE C COMPUTE THE C E N T R O I D S J1 = 0 00 90 J = 1 , N C DO 90 I = 1 , N V J1=J1+1 CENTRIJ1)=T0TAL( JD/NUM3RC J ) 90 CONTINUE GO TO 115 C I N I T I A L I Z E ARRAYS 100 DO 1 1 0 K = 1 , N E C 20  218  110 MEM3R(K)=0 115 NPASS=1 C B E G I N N I N G OF MAIN LOOP 120 J1=0 DO 1 3 0 J = 1 , N C NUMBR{J)=0 00 1 3 0 1 = 1 , N V J1=J1+1 130 TQTAL(J1)=0. 
      MOVES=0
      TDIST=0.
C     ALLOCATE EACH DATA UNIT TO THE NEAREST CLUSTER CENTROID
      K1=0
      DO 160 K=1,NE
      K2=K1+1
      J2=1
C     COMPUTE DISTANCE TO FIRST CLUSTER CENTROID
      DREF=DIST(DATA(K2),CENTR(J2))
      JREF=1
C     TEST DISTANCES TO REMAINING CLUSTER CENTROIDS
      DO 140 J=2,NC
      J2=J2+NV
      DTEST=DIST(DATA(K2),CENTR(J2))
      IF(DTEST.GE.DREF) GO TO 140
      DREF=DTEST
      JREF=J
  140 CONTINUE
C     ALLOCATE DATA UNIT *K* TO CLUSTER *JREF*
      NUMBR(JREF)=NUMBR(JREF)+1
      TDIST=TDIST+DREF
      IF(JREF.EQ.MEMBR(K)) GO TO 150
C     THE DATA UNIT CHANGES ITS MEMBERSHIP
      MOVES=MOVES+1
      MEMBR(K)=JREF
  150 J1=(JREF-1)*NV
      DO 160 I=1,NV
      J1=J1+1
      K1=K1+1
      TOTAL(J1)=TOTAL(J1)+DATA(K1)
  160 CONTINUE
C     ALL DATA UNITS ALLOCATED.  TEST FOR CONVERGENCE
      WRITE(6,2700) MOVES,NPASS,TDIST
      NPASS=NPASS+1
      JREF=0
      IF(MOVES.GT.MINREL) GO TO 185
      IF(METHOD.NE.1.AND.MOVES.EQ.0) RETURN
      JREF=1
C     COMPUTE TRUE CLUSTER CENTROIDS -- FORGY UPDATE
  170 J1=0
      DO 180 J=1,NC
      DO 180 I=1,NV
      J1=J1+1
  180 CENTR(J1)=TOTAL(J1)/NUMBR(J)
      IF(JREF.EQ.1) RETURN
      GO TO 120
  185 IF(METHOD.NE.1) GO TO 170
C     JANCEY UPDATE
  190 J1=0
      DO 200 J=1,NC
      DO 200 I=1,NV
      J1=J1+1
  200 CENTR(J1)=2.*TOTAL(J1)/NUMBR(J)-CENTR(J1)
      GO TO 120
 1000 FORMAT(20A4)
 1100 FORMAT(20I4)
 2000 FORMAT(1H0,2A4,
     A 53H METHOD OF CLUSTER ANALYSIS.  DATA SET STORED IN CORE)
 2001 FORMAT(20A4)
 2100 FORMAT(19H0REQUIRED STORAGE =,I5,6H WORDS,/,
     A       19H0ALLOTTED STORAGE =,I5,6H WORDS)
 2200 FORMAT(7H0FORMAT,20A4)
 2300 FORMAT(43H1INITIAL CLUSTER CENTERS READ IN AS FOLLOWS///)
 2400 FORMAT(1X,10E12.4)
 2500 FORMAT(9H1 IPART =,I2,30H,  NUMBR ARRAY READ AS FOLLOWS///)
 2600 FORMAT(1X,10I7)
 2700 FORMAT(1H0,I5,37H DATA UNITS MOVED ON ITERATION NUMBER,I3,/,
     A 38H SUMMED DEVIATIONS ABOUT SEED POINTS =,E16.8)
      END

      SUBROUTINE KMEAN(CENTR,NUMBR,MEMBR,TOTAL,DATA,N5,NE,NV,NC,NTIN,
     AMINREL,IPART,METHOD,LIMIT)
C
C     VERSION 1.  THE DATA SET IS STORED IN CENTRAL MEMORY.
C
C     THIS SUBROUTINE ITERATIVELY SORTS *NE* DATA UNITS INTO *NC* CLUSTERS
C     USING THE CONVERGENT K-MEANS METHOD DESCRIBED IN SECTION 7.2.2.
C
C     CENTR(NV*(J-1)+I)=SCORE ON I-TH VARIABLE FOR J-TH CLUSTER CENTROID
C     TOTAL(NV*(J-1)+I)=TOTAL SCORE ON I-TH VARIABLE FOR DATA UNITS THUS
C        FAR ALLOCATED TO THE J-TH CLUSTER
C     NUMBR(J)=NUMBER OF DATA UNITS THUS FAR ALLOCATED TO THE J-TH CLUSTER
C     MEMBR(K)=CLUSTER TO WHICH THE K-TH DATA UNIT CURRENTLY BELONGS
C     DATA(NV*(K-1)+I)=SCORE ON I-TH VARIABLE FOR K-TH DATA UNIT
C
      DIMENSION CENTR(1),TOTAL(1),NUMBR(1),MEMBR(1),DATA(1),FMT(20)
      WRITE(6,2000)
      WRITE(8,2000)
C     CHECK FOR SUFFICIENT STORAGE
      N6=N5+NE*NV-1
      WRITE(6,2100) N6,LIMIT
      IF(N6.GT.LIMIT) STOP
C     ESTABLISH INITIAL PARTITION
      IF(IPART.NE.3) GO TO 20
C     SEED POINTS ARE READ DIRECTLY FROM CARDS
      READ (5,1000) FMT
      WRITE(6,2200) FMT
      WRITE(6,2300)
      J1=0
      DO 10 J=1,NC
      READ(5,FMT) (CENTR(J1+I),I=1,NV)
      WRITE(6,2400) (CENTR(J1+I),I=1,NV)
   10 J1=J1+NV
      GO TO 30
C     IPART=1 OR 2
   20 WRITE(6,2500) IPART
      READ(5,1100) (NUMBR(J),J=1,NC)
      WRITE(6,2600) (NUMBR(J),J=1,NC)
C     READ THE DATA SET INTO CENTRAL MEMORY
   30 K1=1
      DO 40 K=1,NE
      CALL USER (DATA(K1))
   40 K1=K1+NV
      IF(IPART.EQ.3) GO TO 51
C     IF *IPART* IS 1 OR 2 SET UP THE SEED POINTS
      IF(IPART.EQ.2) GO TO 60
C     IPART=1.  THE DATA UNIT WITH SEQUENCE NUMBER *NUMBR(J)* IS USED AS
C     THE J-TH SEED POINT
      DO 50 J=1,NC
      NJ=(NUMBR(J)-1)*NV
      J1=(J-1)*NV
      DO 50 I=1,NV
      CENTR(J1+I)=DATA(NJ+I)
   50 CONTINUE
C     THE INITIAL CONFIGURATION IS GIVEN IN TERMS OF SEED POINTS.
C     CONSTRUCT AN INITIAL PARTITION BY ASSIGNING EACH DATA UNIT TO THE
C     NEAREST SEED POINT.  SEED POINTS REMAIN FIXED THROUGHOUT ASSIGNMENT
C     OF THE FULL DATA SET.
   51 DO 52 K=1,NE
   52 MEMBR(K)=0
      J1=0
      DO 53 J=1,NC
      NUMBR(J)=0
      DO 53 I=1,NV
      J1=J1+1
   53 TOTAL(J1)=0.
C     ALLOCATE EACH DATA UNIT TO THE NEAREST SEED POINT
      K1=0
      DO 55 K=1,NE
      K2=K1+1
      J2=1
C     COMPUTE DISTANCE TO FIRST SEED POINT
      DREF=DIST(DATA(K2),CENTR(J2))
      JREF=1
C     TEST DISTANCES TO REMAINING SEED POINTS
      DO 54 J=2,NC
      J2=J2+NV
      DTEST=DIST(DATA(K2),CENTR(J2))
      IF(DTEST.GE.DREF) GO TO 54
      DREF=DTEST
      JREF=J
   54 CONTINUE
C     ALLOCATE DATA UNIT *K* TO CLUSTER *JREF*
      NUMBR(JREF)=NUMBR(JREF)+1
      MEMBR(K)=JREF
      J1=(JREF-1)*NV
      DO 55 I=1,NV
      J1=J1+1
      K1=K1+1
      TOTAL(J1)=TOTAL(J1)+DATA(K1)
   55 CONTINUE
      GO TO 85
C     IPART=2.  THE DATA UNITS ARE GROUPED INTO CLUSTERS WITH THE J-TH
C     CLUSTER HAVING *NUMBR(J)* MEMBERS.
   60 K=0
      J1=-NV
C     ACCUMULATE THE TOTAL SCORE ON EACH VARIABLE FOR EACH CLUSTER
      DO 80 J=1,NC
      NJ=NUMBR(J)
      J1=J1+NV
      DO 70 I=1,NV
   70 TOTAL(J1+I)=0.
      DO 80 KJ=1,NJ
      K=K+1
      MEMBR(K)=J
      K1=(K-1)*NV
      DO 80 I=1,NV
      J2=J1+I
      TOTAL(J2)=TOTAL(J2)+DATA(K1+I)
   80 CONTINUE
C     COMPUTE THE CENTROIDS
   85 J1=0
      DO 90 J=1,NC
      DO 90 I=1,NV
      J1=J1+1
      CENTR(J1)=TOTAL(J1)/NUMBR(J)
   90 CONTINUE
C     INITIALIZE ARRAYS
  100 NPASS=1
C     BEGINNING OF MAIN LOOP
  120 MOVES=0
      TDIST=0.
C     ALLOCATE EACH DATA UNIT TO THE NEAREST CLUSTER CENTROID
      K1=0
      DO 160 K=1,NE
      K2=K1+1
      J2=1
C     COMPUTE DISTANCE TO FIRST CLUSTER CENTROID
      DREF=DIST(DATA(K2),CENTR(J2))
      JREF=1
C     TEST DISTANCES TO REMAINING CLUSTER CENTROIDS
      DO 140 J=2,NC
      J2=J2+NV
      DTEST=DIST(DATA(K2),CENTR(J2))
      IF(DTEST.GE.DREF) GO TO 140
      DREF=DTEST
      JREF=J
  140 CONTINUE
      TDIST=TDIST+DREF
      IF(JREF.NE.MEMBR(K)) GO TO 155
      K1=K1+NV
      GO TO 160
C     REALLOCATE DATA UNIT *K* FROM CLUSTER *MEMBR(K)* TO CLUSTER *JREF*
  155 MOVES=MOVES+1
      J2=MEMBR(K)
      NUMBR(J2)=NUMBR(J2)-1
      NUMBR(JREF)=NUMBR(JREF)+1
      MEMBR(K)=JREF
      J1=(JREF-1)*NV
      J3=(J2-1)*NV
      DO 150 I=1,NV
      J1=J1+1
      J3=J3+1
      K1=K1+1
      TOTAL(J1)=TOTAL(J1)+DATA(K1)
      CENTR(J1)=TOTAL(J1)/NUMBR(JREF)
      TOTAL(J3)=TOTAL(J3)-DATA(K1)
      CENTR(J3)=TOTAL(J3)/NUMBR(J2)
  150 CONTINUE
  160 CONTINUE
C     ALL DATA UNITS ALLOCATED.  TEST FOR CONVERGENCE
      WRITE(6,2700) MOVES,NPASS,TDIST
      NPASS=NPASS+1
      IF(MOVES.LE.MINREL) RETURN
      GO TO 120
 1000 FORMAT(20A4)
 1100 FORMAT(20I4)
 2000 FORMAT(46H0CONVERGENT K-MEANS METHOD OF CLUSTER ANALYSIS,/,
     A       24H DATA SET STORED IN CORE)
 2100 FORMAT(19H0REQUIRED STORAGE =,I5,6H WORDS,/,
     A       19H0ALLOTTED STORAGE =,I5,6H WORDS)
 2200 FORMAT(7H0FORMAT,20A4)
 2300 FORMAT(43H1INITIAL CLUSTER CENTERS READ IN AS FOLLOWS///)
 2400 FORMAT(1X,10E12.4)
 2500 FORMAT(9H1 IPART =,I2,30H,  NUMBR ARRAY READ AS FOLLOWS///)
 2600 FORMAT(1X,10I7)
 2700 FORMAT(1H0,I5,37H DATA UNITS MOVED ON ITERATION NUMBER,I3,/,
     A 38H SUMMED DEVIATIONS ABOUT SEED POINTS =,E16.8)
      END
$SIG

CLUSTER TRIAL RUN - NON-HIERARCHICAL - DATA1
(printout only partly legible in the source; legible items follow)

NE = 80   NV = 2   NC = 2   NTIN = 5   NTOUT = -1
MINREL = 1   IPART = 2   METHOD = 3

REQUIRED STORAGE =  186 WORDS
ALLOTTED STORAGE =  250 WORDS

   FORGY METHOD OF CLUSTER ANALYSIS.  DATA SET STORED IN CORE

 IPART = 2,  NUMBR ARRAY READ AS FOLLOWS

   37 DATA UNITS MOVED ON ITERATION NUMBER  1
    1 DATA UNITS MOVED ON ITERATION NUMBER  2
 SUMMED DEVIATIONS ABOUT SEED POINTS = 0.685 (remaining digits illegible)

RAW MEMBERSHIP LIST
 1 2 2 2 2 1 2 1 ? 1 2 1 1 1 1 1 2   2 2 ? 12   1 1 2   2 1 2

CLUSTER SIZES   44   36

CLUSTER 1 CONTAINS 44 DATA UNITS
CENTROID COORDINATES  0.7315E 03  0.2164E 06
MEMBERSHIP LIST  1 6 7 45 46 47  8 50

CLUSTER 2 CONTAINS
CENTROID COORDINATES  0.7400E 03  0.5739E
MEMBERSHIP LIST  2 59  3 60  4
61 10 52  36 DATA  13 57  14 58  16 64  17 65  18 66  19 67  20 69  22 72  23 73  21 74  26 80  28  30  UNITS  06  5 62  11 54  9 63  <2 68  15 71

APPENDIX F

Coordinates of Data Points Used in This Study

INPUT DATA FOR EVENLY DISTRIBUTED CONTRIVED DATA - DATA1

  i     x     y
  1    10    20
  2    10    45
  3    10    60
  4    10    71
  5    15    52
  6    15    35
  7    20    15
  8    20    23
  9    22    59
 10    22    17
 11    25    11
 12    25    70
 13    27    33
 14    30    20
 15    30    46
 16    32    12
 17    32    23
 18    35    15
 19    38    35
 20    40    10
 21    39    20
 22    40    52
 23    40    65
 24    40    76
 25    45    15
 26    45    43
 27    50    30
 28    50    56
 29    52    34
 30    55    73
 31    55    23
 32    55    12
 33    61    44
 34    60    50
 35    62    64
 36    65    15
 37    67    22
 38    68    57
 39    70    35
 40    75    10
 41    75    25
 42    75    47
 43    75    65
 44    77    12
 45    80    15
 46    85    30
 47    83    39
 48    85    52
 49    82    60
 50    90    20
 51    90    65
 52    96    28
 53    94    46
 54   100    36
 55   100    55
 56   100    70
 57   105    15
 58   110    18
 59   108    41
 60   110    60
 61   110    72
 62   112    68
 63   115    52
 64   115    24
 65   115    20
 66   120    15
 67   120    27
 68   122    47
 69   125    17
 70   125    22
 71   123    74
 72   125    65
 73   130    55
 74   127    43
 75   130    29
 76   130    15
 77   135     7
 78   140    20
 79   143    33
 80   138    46

INPUT DATA FOR UNEVENLY DISTRIBUTED CONTRIVED DATA - DATA2

  i     x     y
  1     5    65
  2     7    67
  3    10    55
  4    10    65
  5    10    70
  6    10    75
  7    15    60
  8    15    65
  9    15    77
 10    15    80
 11    20    55
 12    20    65
 13    20    70
 14    22    58
 15    25    45
 16    25    55
 17    25    65
 18    25    75
 19    30    50
 20    35    45
 21    40    50
 22    45    45
 23    45    50
 24    45    55
 25    47    52
 26    50    40
 27    50    45
 28    50    55
 29    50    60
 30    53    48
 31    55    60
 32    55    55
 33    55    45
 34    55    40
 35    57    36
 36    60    33
 37    60    50
 38    60    45
 39    62    41
 40    63    36
 41    63    38
 42    65    41
 43    55    28
 44    59    28
 45    59    23
 46    64    22
 47    62    20
 48    58    15
 49    55    20
 50    50    10
 51    45    10
 52    40    15
 53    45    20
 54    75    25
 55    85    30
 56    90    25
 57    90    35
 58   100    35
 59   100    40
 60   105    30
 61   105    45
 62   105    50
 63   110    28
 64   110    36
 65   110    43
 66   115    32
 67   115    40
 68   115    45
 69   115    43
 70   115    55
 71   119    36
 72   120    43
 73   122    51
 74   125    47
 75   125    40
 76   130    33
 77   133    25
 78   128    27
 79   125    30
 80   125    25

INPUT DATA FOR NORTH BURNABY AREA (STREET LETTER BOX LOCATIONS)

ID          x        y     location
NB001    401.23   223.53   SFU MALL
NB002    393.23   229.73   SFU FRONT OF SHELL STATION
NB003    397.93   223.73   SFU FRONT OF STUDENT RES.
NB004    365.23   220.63   DUTHIE AND CURTIS
NB005    361.03   214.33   CLIFF AND WINCH
NB006    365.13   212.03   DUTHIE AND HALIFAX
NB007    365.63   211.63   1800 DUTHIE
NB008    369.93   209.33   PHILLIPS AND CORONADO
NB009    365.33   204.63   DUTHIE AND BROADWAY
NB010    372.13   204.83   CAMROSE AND BROADWAY
NB011    376.43   193.33   LAKE CITY AND ENTERPRISE
NB012    384.93   199.43   UNDERHILL AND ENTER
NB013    391.63   198.63   PRODUCTION WAY AND THUNDERBIRD CRES.
NB014    385.83   193.23   LAKEDALE AND GOVERNMENT
NB015    378.93   193.23   PIPER AND GOVERNMENT
NB016    373.93   193.23   LOZELLES AND GOVERNMENT
NB017    368.93   193.23   PHILLIPS AND GOVERNMENT
NB018    373.93   190.03   LOZELLES AND WINSTON
NB019    368.13   191.63   7342 WINSTON
NB020    363.43   202.03   BAINBRIDGE AND LOUGHEED
NB021    361.13   205.93   CLIFF AND BROADWAY
NB022    356.83   204.83   SPERLING AND BROADWAY
NB023    356.83   208.43   SPERLING AND ADAIR
NB024    356.33   216.53   SPERLING AND KITCHENER
NB025    356.83   220.83   SPERLING AND CURTIS
NB026    356.83   226.53   SPERLING AND HASTINGS (SUB LOCHDALE)
NB027    362.93   226.73   BARNET AND HASTINGS
NB028    364.33   223.83   BARNET AND PANDORA
NB029    364.73   232.33   INLET AND SIERRA
NB030    352.73   226.43   KENSINGTON AND HASTINGS
NB031    348.63   216.53   FELL AND KITCHENER
NB032    345.53   215.33   HOLDOM AND GRANT
NB033    343.53   211.43   FELL AND BUCHANAN
NB034    352.73   212.33   KENSINGTON AND HALIFAX
NB035    350.73   207.33   6265 EAST BROADWAY
NB036    346.73   208.93   5901 EAST BROADWAY (SUB 113)
NB037    344.43   209.93   HOLDOM AND BROADWAY (SUB 48)
NB038    332.23   210.43   BETA AND LOUGHEED
NB039    329.93   203.53   ALPHA AND DAWSON
NB040    327.03   210.83   4477 LOUGHEED
NB041    325.63   212.53   ROSSER AND HALIFAX
NB042    323.53   208.53   MADISON AND DAWSON
NB043    320.13   214.53   GILMORE AND GRAVELEY
NB044    313.33   213.83   3765 EAST 1ST
NB045    314.23   220.33   DOUGLAS AND ESMOND
NB046    318.03   226.03   MCDONALD AND PENDER (NB STAT.)
NB047    323.73   222.13   MADISON AND UNION
NB048    319.43   222.13   GILMORE AND VENABLES
NB049    319.43   220.03   GILMORE AND NAPIER
NB050    323.73   218.43   1265 MADISON (SUB 67)
NB051    323.73   214.93   MADISON AND GRAVELEY
NB052    327.73   214.93   WILLINGDON AND BRENTLAWN
NB053    330.73   216.03   FAIRLAWN AND MIDLAWN
NB054    336.13   219.03   DELTA AND FAIRLAWN
NB055    336.13   221.03   DELTA AND PARKER
NB056    342.53   223.13   HOWARD AND UNION
NB057    339.33   217.23   1381 SPRINGER
NB058    336.13   212.83   DELTA AND BRENTLAWN
NB059    327.73   210.83   WILLINGTON AND LOUGHEED
NB060    330.23   211.53   EATONS BRENTWOOD (SUB 121)
NB061    330.43   213.43   REAR BRENTWOOD SHOP CTR.
NB062    327.73   221.03   WILLINGTON AND PARKER
NB063    327.73   227.03   WILLINGTON AND HASTINGS (SUB 45)
NB064    332.03   227.03   BETA AND HASTINGS
NB065    336.13   226.13   DELTA AND HASTINGS
NB066    334.13   233.23   EMPIRE DR (GAMMA) AND CAMBRIDGE
NB067    336.13   233.23   DELTA AND CAMBRIDGE
NB068    337.43   231.03   HYTHE AND DUNDAS
NB069    341.43   231.03   GROSVENOR AND DUNDAS
NB070    344.53   231.03   HOLDOM AND DUNDAS
NB071    346.73   231.03   WARWICK AND DUNDAS
NB072    347.73   226.43   STRATFORD AND HASTINGS
NB073    343.53   226.43   ELLESMERE AND HASTINGS (SUB 132)
NB074    339.33   226.43   SPRINGER AND HASTINGS
NB075    332.03   223.13   BETA AND UNION
NB076    323.73   227.03   MADISON AND HASTINGS
NB077    323.73   230.73   MADISON AND TRIUMPH
NB078    319.43   227.03   GILMORE AND HASTINGS
NB079    319.43   230.73   GILMORE AND TRIUMPH
NB080    319.43   233.23   GILMORE AND CAMBRIDGE
NB081    319.43   236.33   GILMORE AND TRINITY
NB082    316.03   238.23   INGLETON AND EDINBURGH
NB083    314.33   235.83   ESMOND AND MCGILL
NB084    314.33   232.73   ESMOND AND OXFORD
NB085    312.53   230.73   BOUNDARY AND TRIUMPH
NB086    316.03   227.03   INGLETON AND HASTINGS
NB087    318.03   227.03   MACDONALD AND HASTINGS (SUB 106)

INPUT DATA FOR SOUTH BURNABY AREA (STREET LETTER BOX LOCATIONS)

ID          x        y     location
SB001    375.16   162.27   CANADA WAY AND WEDGEWOOD
SB002    372.36   165.67   CANADA WAY AND GOODLAD
SB003    367.66   171.27   CANADA WAY AND STANLEY
SB004    362.16   172.77   BUCKINGHAM AND BURRIS
SB005    357.66   178.97   CANADA WAY AND SPERLING
SB006    352.16   183.77   CANADA WAY AND LEDGER
SB007    349.86   185.57   4916 CANADA WAY
SB008    348.06   190.67   GODWIN AND SPROTT
SB009    344.86   187.17   MAHON AND SPRUCE
SB010    344.76   181.37   MAHON AND GILPIN
SB011    339.46   181.37   ROYAL OAK AND GILPIN
SB012    333.66   181.17   GARDEN GROVE AND MOSCROP
SB013    342.16   189.07   5325 KINCAID
SB014    345.16   191.57   DOUGLAS AND WOODSWORTH
SB015    338.03   206.93   2210 DOUGLAS ROAD
SB016    344.06   200.17   DOUGLAS AND REGENT
SB017    339.66   198.57   ROYAL OAK AND MANOR
SB018    339.66   194.57   4694 CANADA WAY
SB019    3?3.86   196.37   GARDNER COURT AND CANADA WAY
SB020    328.66   191.47   3700 WILLINGTON (FRONT BCIT)
SB021    330.26   190.17   3700 WILLINGTON (SAC BCIT)
SB022    323.06   196.17   WILLINGDON AND CANADA WAY
SB023    324.36   196.17   SUMNER AND CANADA WAY
SB024    324.36   197.97   SUMNER AND DOMINION
SB025    326.36   203.37   GILMORE AND STILL CREEK
SB026    316.36   202.17   SMITH AND MYRTLE
SB027    315.16   196.17   3737 CANADA WAY
SB028    321.21   192.07   KALYK AND NITHSDALE
SB029    316.86   190.67   3815 SUNSET (SUB 93)
SB030    317.96   190.67   BURNABY GEN HOSPITAL
SB031    316.16   187.97   SMITH AND SPRUCE
SB032    316.06   182.37   SMITH AND MOSCROP
SB033    320.86   182.87   PATTERSON AND MOSCROP
SB034    320.86   178.17   PATTERSON AND HAZELWOOD
SB035    322.56   174.57   BARKER AND BOND
SB036    327.06   176.17   GILPIN CRES AND BURKE
SB037    324.66   179.62   BARKER AND GILPIN
SB038    324.86   182.87   DARWIN AND MOSCROP
SB039    323.76   167.77   MCKAY AND KINGSWAY
SB040    326.26   164.07   MCKAY AND BERESFORD
SB041    321.36   167.57   PATTERSON AND BERESFORD
SB042    323.46   161.47   CASSIE AND MAYWOOD
SB043    328.36   162.77   BERESFORD AND TELFORD
SB044    326.46   159.17   SUSSEX AND IMPERIAL
SB045    324.26   155.07   MCKAY AND VICTORY
SB046    321.36   151.17   PATTERSON AND RUMBLE
SB047    321.36   148.07   PATTERSON AND PORTLAND
SB048    326.46   143.17   SUSSEX AND PORTLAND
SB049    330.36   146.77   STRATHEARN AND MCKEE
SB050    326.46   151.07   SUSSEX AND RUMBLE
SB051    329.76   153.12   FREDERICK AND WITLING
SB052    335.36   151.07   NELSON AND RUMBLE
SB053    335.36   146.77   NELSON AND MCKEE
SB054    335.36   155.07   NELSON AND VICTORY
SB055    336.36   159.17   DUNBLANE AND IMPERIAL
SB056    339.61   156.07   ROYAL OAK AND BERESFORD
SB057    343.66   155.17   MCPHERSON AND BERESFORD
SB058    347.86   153.77   BULLER AND BERESFORD
SB059    347.36   150.97   BULLER AND RUMBLE
SB060    347.86   147.97   BULLER AND PORTLAND
SB061    347.86   144.37   BULLER AND CARSON
SB062    347.56   140.47   GILLEY AND MARINE
SB063    352.16   147.87   GILLEY AND PORTLAND
SB064    352.16   151.92   7542 GILLEY
SB065    343.66   150.97   MCPHERSON AND RUMBLE
SB066    343.66   146.87   MCPHERSON AND MCKEE
SB067    339.86   150.97   ROYAL OAK AND RUMBLE (SUB 9
SB068    339.36   143.97   ROYAL OAK AND CLINTON
SB069    339.86   145.97   ROYAL OAK AND EWART
SB070    339.36   141.97   ROYAL OAK AND MARINE
SB071    326.96   144.92   SUSSEX AND MARINE
SB072    321.36   144.97   PATTERSON AND MARINE
SB073    317.06   143.77   GREENALL AND MARINE
SB074    315.46   146.77   JOFFRE AND CARSON
SB075    314.86   151.12   JOFFRE AND RUMBLE (SUB 109)
SB076    314.36   154.07   JOFFRE AND ARBOR
SB077    314.86   158.07   JOFFRE AND DUBOIS
SB078    332.76   176.17   SUSSEX AND BURKE
SB079    332.76   171.92   SUSSEX AND SARDIS
SB080    335.26   173.07   NELSON AND BUXTON
SB081    339.36   168.02   ROYAL OAK AND DOVER
SB082    344.21   163.77   ELGIN AND IRVING
SB083    348.81   165.57   WALTHAM AND BERWICK
SB084    352.06   163.57   GILLEY AND BURNS
SB085    355.26   164.97   BRANTFORD AND STANLEY
SB086    357.56   163.37   SPERLING AND WALKER
SB087    357.56   161.47   SPERLING AND BURFORD
SB088    360.26   165.07   WALKER AND STANLEY (SUB 97)
SB089    363.26   161.37   WALKER AND IMPERIAL
SB090    372.16   159.77   MARY AND VISTA
SB091    369.76   160.77   HUMPHRIES AND ELWELL
SB092    366.56   159.07   LINDEN AND ELWELL
SB093    367.76   155.07   ESMONDS AND KINGSWAY
SB094    364.46   155.27   7155 KINGSWAY (SUB 120)
SB095    363.26   155.17   SALISBURY AND KINGSWAY
SB096    352.96   159.17   COLBOURNE AND IMPERIAL
SB097    357.56   155.17   SPERLING AND KINGSWAY
SB098    352.06   156.87   GILLEY AND KINGSWAY
SB099    346.36   159.17   KINGSWAY AND IMPERIAL
SB100    342.76   160.32   DENBIGH AND KINGSWAY
SB101    339.36   162.57   ROYAL OAK AND KINGSWAY (SUB
SB102    333.26   159.17   JUBILEE AND IMPERIAL
SB103    330.76   164.42   SIMPSON SEARS (SUB 65)
SB104    332.26   164.37   SIMPSON SEARS
SB105    335.26   164.52   NELSON AND KINGSWAY
SB106    333.76   165.37   MCMURRAY AND KINGSWAY
SB107    331.36   166.47   SUSSEX AND KINGSWAY
SB108    331.86   167.47   6025 SUSSEX
SB109    329.51   169.37   PIONEER AND GRANGE
SB110    326.96   168.87   4429 KINGSWAY (SUB 122)
SB111    323.50   169.67   4211 KINGSWAY
SB112    318.11   171.47   JERSEY AND KINGSWAY (SUB 88)
SB113    316.06   173.57   SMITH AND THURSTON

APPENDIX G

Membership Lists of Groupings Defined by Various Clustering Methods

M E M B E R S H I P L I S T OF GROUPS DEFINED BY EVENLY DISTRIBUTED CONTRIVED OATA  : EUCLIDEAN SINGLE LINKAGE 67 MEMBERS GROUP 1 2 3 4 5 1 20 17 18 19 16 35 32 33 34 31 50 47 43 49 46 63 GROUP 57  68 2 - 1 3 58  71 72 MEMBERS 64 65  DISTANCE 6 21 36 51  SUM  GF  DIFFER 10 25 40 55  24  29 49  31 50  3 3  73  t>4 79  65 80  32 51 66  73  79  12  13  15  19  20 45  21 46  25 47  60 75  61 76  62 77  27 Ho 6i  8 43  9  18 44 .59 74  69  23  77  75  67  22  76  70  66  '62  5o  53  74  14 29 44 61  13 28 43 60  9 24 39 54  73  HcTHODS 12 Z7 42 59  8 23 38  7 22 37 52 30  LINKAGE : CHI - S Q U A R E S SINGLE MEMBERS GROUP 1 - 2 2 6 2 3 4 5 1 35 28 30 34 38 26 MEMBERS GROUP 2 - 5 8 16 17 10 11 14 7 41 42 37 39 40 36 57 " 58 54 55 56 53 73 72 68 69 70 " 71  +  VARIOUS METHODS SET DATAl  11 Hi  15 30 45  METHOD  52 67  METHOD COMPLc TE LINKAGE MEM3ERS GROUP 1 - 3 5 1 16 33 GROUP 31 51 66  AVG.
GROUP 1 16 31 46 GROUP 52 67  -2 17  3 18  4 19  34 35 38 2 - 4 5 MEMBERS 32 36 37 52 53 54 67 68 69  LINKAGE 1 - 5 1 2  BETWEEN MEMBERS 3 4  5 20  6 21  7 22  8  9  i i 2o  J.2 27  13 28  15  24  10 25  14  23  29  30  43 58  44 59  45 60  4o  Hi  48  49  50  t»i  73  74  75  7t>  62 77  63 78  64 79  65 80  42 39 55  40 56  70  71  MERGED  41 5 7 72  GROUP 7  8  9  10  18  19  5 20  6  17  21  22  23  24  25  i i 2o  12 27  13 28  14 29  15 30  32 47  33 4d  34 49  35 50  36 51  37  38  39  40  41  42  43  44  45  56 71  57 72  • 58 73  59 74  60 75  61 76  o2 77  63 78  64 79  65 80  66  2 - 2 9 MEMBERS 53 54 55 68 69 70  235  AVG. 1 16  LINKAGE 2  31 46 GROUP GROUP 52 69  17  W I T H I N NEW GROUP 5 6 3 4 20 21 18 19  7 22  8 23  9 24  10 25  li. 2o  1<: 27  13 28  14 29  15 30  36 51  37 53  38 55  3 9  40  41  4<i  43  44  45  59 74  60  61 76  62 77  63 78  6*  o5  66  67  63  71  30  8 23  10 25 40  Ix 2t>  12 2/  13 28  14 29  38  9 24 39  4i  42  43  44  15 30 45  57 72  58 73  59 74  60  71  61 76  62 77  63 73  o4 79  12 27 50  13 28  14 29  15 30  32 47 1 -  35 33 34 48 49 50 53 MEMEdERS  2  27  54 70  MEMBERS 56 57 71 72  50 73  CENTROID AND M E D I A N M E T H O D S MEMBERS GROUP 1 - 4 9 6 2 3 4 5 1 20 21 16 17 18 19 36 35 32 33 34 31 49 50 51 43 MEMBERS GROUP 2 - 3 1 54 55 4 7 5 2 53 46 6 9 70 6 6 6 7 6 S 65  75  7 22 37  56  75  80  WARD' S GROUP 1 16  METHOD 1 - 4 3 2 17 32  31 GROUP 35  59 74  JANCEY'S,  60 75  FORGY'S  1 - 3 6 3 38  2 35 68  :OUP  33  34  2 - 3 7 MEMBERS 38 39 42  58 73  GROUP  MEMBERS 3 4 18 19  71 2  - 44 6  1 27  29  54  57  61 76  AND  5 20 36  6 21  7 22  8 23  9 24  10 25  i i 26  37  40  41  44  45  4o  43  47  48  49  53  54  57  63 78  67  66  69  55 70  56  65 80  51 66  52  64 79  71  72  23 55  24 56  2o 39  28 60  30 61  33 62  63  16  17 44 75  li> 4J  19  20  46  4 7  21 50  76  77  78  79  62 77  CONVERGENT  K-MEAN  52  METHODS  MEMBERS 4 42 72  5 43 73  9 48  12 49  74  30  10 36  11 37  65  66  15 51  22  13 
39  14 40  6 7  69  53  34  MEMBERS 7  3  31 58  32 64  '  41 70  25 52  236  MEMBERSHIP L I S T OF GROUPS DEFINED BY VARIOUS METHODS UNEVENLY DISTRIBUTED CONTRIVED DATA SET - DATA2  SINGLE LINKAGE : EUCLIDEAN DISTANCE METHOD MEMBERSHIP L I S T NOT ANALYSED  SINGLE LINKAGE J SUM OF DIFFER ENCES AND CHI-SQUARES Mi: THUDS GROUP 1 - 1 7 MEMBERS l i 1 2 3 4 5 6 7 8 9 10 12 17 18 GROUP 2 - 2 5 MEMBERS 19 20 21 22 23 24 25 26 27 29 15 34 35 36 37 38 39 40 41 42 33 GROUP 3 - 3 8 MEMBERS 43 44 45 46 47 43 49 50 51 52 i»3 54 58 59 60 61 62 63 64 65 66 67 68 69 73 74 75 76 77 78 79 80  13  14  16  30  31  32  55 70  56 71  57 72  -  COMPLETE LINKAGE METHOD GROUP 1 - 1 9 MEMBERS 2 3 4 5 1 16 17 18 19 GROUP 2 - 3 4 MEMBERS 21 22 23 24 20 35 36 37 38 39 50 51 52 53 GROUP 3 - 2 7 MEMBERS 55 56 57 58 54 69 70 71 72 73  AVG. LINKAGE METHODS GROUP 1 - 2 0 MEMEBERS 2 3 4 1 17 18 19 16 GROUP 2 - 3 3 MEMBERS 21 22 23 24 36 37 38 39 51 52 53 GROUP 3 - 2 7 MEMBERS 54 55 56 57 69 70 71 72.  6  7  8  9  10  l i  JL ti  13  14  15  25 40  26 41  27 42  23 43  29 44  30 45  3A 46  32 47  33 48  34 49  59 74  60 75  61 76  62 77  63 73  64 79  65 30  66  67  68  5 20  6  7  8  9  10  l i  12  13  14  15  25 40  26 41  27 42  28 43  29 44 "  30 45  31 HO  32 47  33 48  34 49  35 50  58 73  59 74  60 75  61 76  62 77  63 78  64 79  60  65  66  67  68  237  CENTROID AND MEDIAN METHODS GROUP 1 - 2 0 MEMBERS 1 2 3 4 16 18 19 17 GROUP 2 - 37 MEMBERS 21 23 24 22 36 37 38 39 5i 52 53 54 GROUP 3 - 23 MEMBERS 58 59 60 61 74 75 76 73  S MET'HOD 1 - 42 MEMBERS 2 3 4 17 18 19 32 33 34 2 - 14 MEMBERS ,44 45 46 3 - 2 4 MEMBERS 58 59 60 73 74 75  WARD' GROUP 1 16 31 GROUP 43 GROUP 57 72  JANCEY S METHOD GROUP i - 23 MEMBERS  3 4 1 2 18 17 24 28 GROUP 2 - 23 MEMBERS 36 43 44 45 60 63 66 76 GROUP 3 - 34 MEMBERS 19 15 20 21 40 39 41 42 73 74 75 72  FORGY GROUP 1 31 GROUP 35 55 GROUP 3 32 70  1  5 20  6  7  8  9  10  1A  12  13  14  15  25 40 55  26 41 56  27 42 57  28 43  29 44  30 45  31 4b  32 -*/  33 48  
34 49  35 50  62 77  63 73  64 79  65 30  66  67  od  69  70  71  72  5 20 35  6 21 36  7 22 37  8 23 38  9 24 39  10 25 40  ii 26 41  12 27 42  13 28  14 29  15 30  47  48  49  50  51  52  53  54  55  56  61 76  62 77  63 73  64 79  65 80  66  67  68  69  70  71  5 29  6 31  7 32  8 70  9  10  li  12  13  14  .16  46 77  47 78  43 79  49 80  50  51  52  53  54  55  56  22 57  23 58  25 59  26 61  27 62  30 64  33 65  3<* 67  35 68  37 69  38 71  S AND CONVERGENT K-MEAN METHODS 1 - 16 MEMBERS 2  4  5  2 - 29 MEMBERS 36 40 43 56 57 58 3 - 35 MEMBERS 15 16 11 33 34 37 72 73 74  6  7  8  9  10  12  li  it  17  18  29  44 t>0  45 63  46 64  47 66  48 71  49 76  50 77  51 76  52 79  53 80  54  19 38 75  20 39  21 41  22 42  23 59  24 61  25 o2  26  27 67  28 6a  30 69  238  MEMBERSHIP L I S T OF GROUPS DEFINED BY VARIOUS METHODS EMPIRICAL DATA - NORTH BURNABY AREA - NBDATA SINGLE LINKAGE: EUCLIDEAN DISTANCE AND SUM OF DIFFERENCES METHODS + AVG. LINKAGE B E T W E E N MERGED GROUP METhOD GROUP 1 - 3 MEMBERS 1 2 3 GROUP 2 - 84 MEMBERS 4 5 6 7 8 9 10 11 12 13 14 15 16 17 19 20 21 22 23 24 25 26 27 28 29 30 31 32 34 35 36 37 38 39 40 41 42 43 44 4-5 46 47 49 50 51 52 53 54 55 56 57 58 59 60 61 62 64 65 66 67 68 69 70 71 72 73 74 75 76 77 79 80 81 82 03 34 85 86 87 SINGLE LINKAGE : CHI -SQUARES METHOD GROUP 1 - 29 MEMBERS 1 2 3 4 5 6 7 16 17 IS 19 20 21 22 GROUP 2 - 58 MEMBERS 28 25 26 27 29 30 31 45 46 47 48 49 50 51 60 61 62 63 64 65 66 75 76 77 78 79 ' 80 81 COMPL ETE LINKAGE METHOD GROUP 1 - 37 MEMBERS 1 2 3 4 5 16 17 18 19 20 31 32 35 34 35 GROUP 2 - 50 MEMBERS 38 39 40 41 42 53 54 55 56 57 68 69 70 71 72 83 84 85 86 87  13 33 48 63 78  8 23  9 24  10 33  li 34  35  13 36  14 37  15  32 52 67 82  33 53 68 83  39 54 69 84  40 55 70 85  41 5o 71 86  42 57 72 87  43 58 73  44 59 74  6 21 36  7 22 37  8 23  9 24  10 25  26  li  12 27  13 28  14 29  15 30  43 58 73  44 59 J4  45 60 75  46 61 76  47 62 77  48 63 78  49 64 79  50 65 80  51 66 8i  52 67 82  43 62 87  44 63  45 
64  46 75  47 76  48 77  49 78  12 27 65  13 28 66  ±4 29 67  15 3,0 66  16 31 69  17 32 70  13 33 71  AVG. LINKAGE WITHIN NEW GROUP METHOD GROUP 1 - 39 MEMBERS 1 2 3 38 39 40 41 42 50 51 52 53 58 59 60 61 79 80 81 32 84 85 86 83 GROUP 2 - 48 MEMBERS 4 5 6 7 8 9 10 11 19 20 21 22 23 24 25 26 34 35 36 37 54 55 56 57 72 73 74  239  CENTROID AND MEDIAN METHODS GROUP 1 - 1 2 MEMBERS L 2 3 11 12 13 lOUP 2 - 75 MEMBERS 4 5 6 7 8 9 28 29 30 31 32 33 43 44 45 46 47 48 58 59 60 61 62 63 73 74 75 76 77 78  WARD'S METHOD GROUP 1 - 3 5 MEMBERS 1 2 3 4 16 17 18 19 36 37 54 55 10UP 2 - 52 ME MBER.S 26 27 28 29 48 49 50 51 66 67 63 69 81 82 83 84  J A N C E Y ' S METHOD • GROUP 1 - 3 5 MEMBERS 1 2 3 4 16 17 13 19 31 33 34 35 OUP 2 - 52 MEMBERS 32 37 33 39 53 54 51 52 66 67 63 69 81 82 83 84  14  15  16  17  18  19  10 34 49 64 79  20 35 50 65 80  21 36 51 66 81  22 37 52 67 82  23 38 5J  6d 85  2t 39 34 69 34  25 40 55 70 85  26 41 56 71 86  27 42 57 72 87  5 20 57  6 21  7 22  8 23  9 24  10 25  ii 3i  12 3^  13 33  14 34  15 35  30 52 70 85  38 53 71 86  39 56 72 87  40 58 73  41 59 74  42 60 75  43 6l 7o  44 o2 77  45 63 78  46 64 79  47 65 80  5 20 36  6 21  7 22  8 23  9 24  10 25  11 26  12 27  13 28  14 29  15 30  40 55 70 85  41 56 71 86  42 57 72 87  43 58 73  44 59 74  45 60 75  46 . 61 76  47 o2 7/  48 63 78  49 64 79  50 65 80  9 24  10 25  11 2o  12 27  13 28  14 29  15 34  40 55 70 85  41 56 71 86  42 57 72 87  43 58 73  44 59 74  45 60 75  46 61 76  FORGY'S AND CONVERGENT K-MEAN METHODS GROUP 1 - 31 MEMBERS 1 2 3 4 5 6 7 8 16 18 19 20 21 22 23 17 35 ;OUP 2 - 56 MEMB• ERS 30 31 32 33 36 37 33 39 47 48 49 50 51 52 53 54 62 63 64 65 66 67 68 69 77 78 79 80 81 82 83 84  240  MEMBERSHIP L I S T OF GROUPS DEFINED BY VARIOUS METHODS EMPIRICAL DATA - SOUTH BURNABY AREA - SBDATA  SINGLE LINKAGE : SUM OF DIFFERENCES METHOD + AVG. 
LINKAGE BETWEEN MERGED GROUP METHOD GROUP 1 - 21 MEMBERS 1 2 3 4 5 83 84 85 86 93 94 96 95 97 98 GROUP 2 - 26 ME MBERS o 7 8 9 10 13 11 14 15 23 25 26 22 24 27 28 29 30 GROUP 3 - 66 MEMBERS 12 33 34 35 36 37 38 39 40 47 48 49 50 51 52 53 54 55 63 64 62 65 66 67 68 69 70 78 77 79 80 81 99 32 100 101 103 1C9 110 111 112 113  SINGLE LINKAGE : CHI SQUARES METHOD GROUP 1 - 34 MEMBERS 48 49 1 2 52 53 57 66 67 68 69 70 71 72 95 96 97 98 GROUP 2 - 36 MEMBERS 3 4 5 40 43 44 42 74 75 76 77 31 83 104 103 105 106 103 107 GROUP 3 - 43 MEMBERS 6 7 8 9 10 12 11 23 24 25 26 27 21 22 36 38 39 37 41 78 79  COMPLETE LINKAGE METHOD GROUP 1 - 20 MEMBERS 1 2 3 4 83 94 95 96 97 98 GROUP 2 - 45 MEMBERS 45 46 47 48 49 61 60 62 63 64 75 76 77 81 82 GROUP 3 - 48 MEMBERS 5 6 7 8 9 20 21 22 23 24 35 36 37 38 39 112 113 111  87  63  69  90  91  92  16 31  17 32  id  19  20  21  41 56 71 102  42 57 72 103  43 5o 73 i.04  44 59 74 105  45 60 75 106  46 61 76 107  58 73  59 37  60 39  6x 90  62 9i  63 92  64 93  65 94  45 84  46 85  47 86  50 88  5 i 99  54 100  55 101  56 102  13 28 80  14 29 109  15 30 110  16 3A H i  17 32 ii2  18 33 113  19 34  20 35  84  85  66  87  83  39  90  91  92  93  50 65 99  51 60 100  52 67 .01  53 68 102  54 69 103  55 70 104  56 71 105  57 72 106  58 73 107  59 74 108  10 25 40  11 26 41  12 27 42  13 28 *3  14 29 44  15 30 78  16 -il 79  17 32 80  18 33 109  19 34 110  241  AVG.  
LINKAGE WITHIN NEW GROUP

GROUP 1 - 22 MEMBERS
   1   2   3   4   5  83  84  85  86  87  88  89  90  91  92
  93  94  95  96  97  98  99
GROUP 2 - 36 MEMBERS
  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
  76  77  82 100 101 102
GROUP 3 - 55 MEMBERS
   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
  36  37  38  39  40  41  42  43  44  45  78  79  80  81 103
 104 105 106 107 108 109 110 111 112 113

CENTROID AND MEDIAN METHODS

GROUP 1 - 2 MEMBERS
  62  70
GROUP 2 - 18 MEMBERS
  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29
  30  31  32
GROUP 3 - 93 MEMBERS
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  33
  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48
  49  50  51  52  53  54  55  56  57  58  59  60  61  63  64
  65  66  67  68  69  71  72  73  74  75  76  77  78  79  80
  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110
 111 112 113

WARD'S METHOD

GROUP 1 - 45 MEMBERS
   1   2   3   4  52  53  54  55  56  57  58  59  60  61  62
  63  64  65  66  67  68  69  70  81  82  83  84  85  86  87
  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102
GROUP 2 - 28 MEMBERS
   5   6   7   8   9  10  11  13  14  15  16  17  18  19  20
  21  22  23  24  25  26  27  28  29  30  31  32  33
GROUP 3 - 40 MEMBERS
  12  34  35  36  37  38  39  40  41  42  43  44  45  46  47
  48  49  50  51  71  72  73  74  75  76  77  78  79  80 103
 104 105 106 107 108 109 110 111 112 113

JANCEY'S METHOD

GROUP 1 - 33 MEMBERS
   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  21  22  23  24  25  26  27  28  29  30  31  32  33  34  36
  37  38  78
GROUP 2 - 46 MEMBERS
  35  39  40  41  42  43  44  45  46  47  48  49  50  51  52
  53  54  55  56  67  68  69  70  71  72  73  74  75  76  77
  79  80  81 101 102 103 104 105 106 107 108 109 110 111 112
 113
GROUP 3 - 34 MEMBERS
   1   2   3   4   5  57  58  59  60  61  62  63  64  65  66
  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
  97  98  99 100

FORGY'S AND CONVERGENT K-MEAN METHODS

GROUP 1 - 34 MEMBERS
   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
  36  37  38  78
GROUP 2 - 49 MEMBERS
  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
  54  55  56  57  65  66  67  68  69  70  71  72  73  74  75
  76  77  79  80  81 100 101 102 103 104 105 106 107 108 109
 110 111 112 113
GROUP 3 - 30 MEMBERS
   1   2   3   4   5  58  59  60  61  62  63  64  82  83  84
  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99


APPENDIX H

Listing and Sample Outputs from ROUTE: a Computer Program Designed for Evaluating Clusters

      DIMENSION X(100),Y(100),D(100,100),S(100,100),DDO(100),
     1 IT(100),NI(5000),NJ(5000),L(100),SN(100,10),
     2 LR(50,100),LLR(10,100),LK(100),PROTIT(15)
     3 ,FSA(2),TIME(100),LAB(100),DISTZ(101,101),
     4 AREA(5),NLR(100),SOT(3,5000),SNAME(100,10),
     5 LABI(100),TITLE(20),WAY(20)
C
C     THE FOLLOWING ARE IMPORTANT VARIABLES:-
C
C     PROTIT  = PROBLEM TITLE
C     AREA(I) = GEOGRAPHIC LOCATION
C     ICAP    = CAPACITY OF TRUCK (NO. OF STOPS)
C     ITIM    = TIME CONSTRAINT OF TRUCK ROUTE
C     M       = TOTAL NUMBER OF TRUCKS
C     INUM    = TOTAL NUMBER OF BOXES IN THIS AREA
C     ISTIME  = STOPPING TIME AT EACH BOX
C     AVGSPD  = AVERAGE TRUCK SPEED IN M.P.H.
C     SCALE   = MAP SCALE (1 MM = XX MILES)
C     NPLOT   = PLOTTING CONTROL NUMBER
C               NPLOT=1     EXISTING ROUTE PLOTTED
C               NPLOT.GT.1  ONLY NEW ROUTE PLOTTED
C     XORD    = X-COORDINATE OF STATION IN MM
C     YORD    = Y-COORDINATE OF STATION IN MM
C     N       = NO OF BOXES FOR THIS ROUTE
C     FSA     = FORWARD SORTATION AREA
C     TYPE    = SLB/BR
C     XORG    = X-COORDINATE FOR PLOT ORIGIN (INCHES)
C     YORG    = Y-COORDINATE FOR PLOT ORIGIN (INCHES)
C
      READ (5,11) PROTIT
   11 FORMAT (15A4)
      WRITE (6,190) PROTIT
  190 FORMAT ('1',20(/),25X,'MULTIPLE ROUTE SCHEDULING BY SAVINGS ',
     1'ALGORITHM ',/26X,15A4)
      READ(5,15) (AREA(I),I=1,5),ICAP,ITIM,M,INUM,ISTIME,NPLOT,AVGSPD,
     ASCALE
   15 FORMAT(5A4,6I4,2F10.6)
      WRITE (6,60) (AREA(I),I=1,5),ICAP,ITIM,AVGSPD,M,INUM,SCALE
   60 FORMAT('1',10(/),15X,'THE FOLLOWING ROUTING PRINT-OUTS FOR ',
     15A4,' AREA ARE BASED ON :-'//,17X,
     2'CAPACITY CONSTRAINT OF EACH TRUCK ROUTE =',I6,2X,'BOXES'//
     3,17X,'TIME CONSTRAINT OF EACH ROUTE',11X,'=',I6,2X,'MINUTES'//,
     417X,'AVERAGE SPEED OF TRUCKS EN ROUTE',8X,'=',F6.2,2X,'M.P.H.'/
     5/,17X,'TOTAL NUMBER OF ROUTES IN THE AREA',6X,'=',I6,2X,'ROUTES'/
     6/,17X,'TOTAL NUMBER OF BOXES IN THE AREA',7X,'=',I6,2X,'BOXES'/
     7/,17X,'MAP SCALE',31X,'=',2X,'1 MM =',F10.6,'MILES',/)
      WRITE(6,61) ISTIME
   61 FORMAT(17X,'STOPPING TIME FOR RESIDENTIAL BOXES',5X,'=',5X,I2,2X,
     1'SECONDS',//,17X,'STOPPING TIME FOR BUSINESS BOXES',6X,'=',5X,
     2'AS SPECIFIED',//)
C
      JC=0
      IR=0
      TDIST=0.
      READ(5,18) FSA,TYPE,XORD,YORD,XORG,YORG,IPLOT,ILIST
   18 FORMAT(5X,2A4,4F8.2,2I4)
    7 DO 987 IL=1,100
  987 IT(IL)=2
C     X(.)    - X-COORDINATE OF CALL POINT (.)
C     Y(.)
C             - Y-COORDINATE OF CALL POINT (.)
C     L(.)    - LABEL OF CALL POINT (.)
C     TIME(.) - STOPPING TIME REQUIRED AT POINT (.)
C     D(.,.)  - DISTANCE BETWEEN CALL POINTS (.,.)
C     DDO(.)  - DISTANCE OF CALL POINT (.) FROM (0,0)
C     S(.,.)  - SAVINGS BY JOINING CALL POINTS (.,.)
C     LR(I,J) - LABEL OF JTH ELT. IN SUBROUTE I
C     NLR(.)  - NO. OF ELT. IN SUBROUTE (.)
C     NR      - NO. OF CURRENT SUBROUTES
C
C
C     READ LABEL AND LOCATION OF EACH BOX
C
      CALL SELECT(L,X,Y,TIME,SNAME,INUM,TITLE,WAY,N)
C
C     CALCULATE DISTANCE BETWEEN BOX AND ORIGIN
C
   92 DO 1 I=1,N
    1 DDO(I)=(ABS(X(I)-XORD)+ABS(Y(I)-YORD))
      NM1=N-1
C
C     CALCULATE DISTANCE SAVED MATRIX AND DISTANCE MATRIX
C     D=DISTANCE MATRIX   S=DISTANCE SAVED
      DO 3 I=1,NM1
      K=I+1
      DO 3 J=K,N
      D(I,J)=(ABS(X(I)-X(J)))+(ABS(Y(I)-Y(J)))
      S(I,J)=DDO(I)+DDO(J)-D(I,J)
      S(J,I)=S(I,J)
      D(J,I)=D(I,J)
    3 CONTINUE
      NA=N+1
      DO 5 I=1,NA
      D(I,I)=0.
    5 S(I,I)=0.
      K=0
      CALL COUNT(D,N,TITLE,WAY,SCALE)
C
C     STORE DISTANCE SAVED MATRIX IN A VECTOR
C
C     SOT(1,.)=FROM-MAILBOX, SOT(2,.)=TO-MAILBOX, SOT(3,.)=DIST-SAVED
      DO 4 I=1,NM1
      IP1=I+1
      DO 4 J=IP1,N
      K=K+1
      SOT(1,K)=I
      SOT(2,K)=J
      SOT(3,K)=S(I,J)
    4 CONTINUE
C
C     SORT VECTOR BY DISTANCE SAVED VALUES
C
      NN=((N*N)-N)/2
      CALL ISORT(SOT,3,3000,1,NN,3,3,-1)
      DO 10 I=1,NN
      NI(I)=SOT(1,I)
      NJ(I)=SOT(2,I)
   10 CONTINUE
C
C     BEGIN ROUTE CHOICE HEURISTIC.
C
C     SUBROUTES ARE FORMED BY SEARCHING IN THE LIST FOR MAX. SAVINGS
C     AND ARE LINKED MULTIPLY.
C
   20 NR=0
      DO 12 I=1,NN
C
C     IF SOT(3,.) NEGATIVE, IT IS ALREADY IN ROUTE
C
      IF (SOT(3,I).LT.0.) GO TO 12
C
C     TEST TO SEE IF ONLY ONE IS END POINT OF SUBROUTE
C
      IF (IT(NJ(I)).NE.IT(NI(I))) GO TO 21
C
C     TEST TO SEE IF IN SUBROUTE
C
      IF (IT(NJ(I)).EQ.1) GO TO 22
C
C     BEGIN NEW SUBROUTE
C
      IT(NI(I))=1
      IT(NJ(I))=1
C     NR= # OF NEW ROUTES, LR(NR,1)=LABEL.1ST.BOX, LR(NR,2)=LABEL.2.BOX
C     NLR(NR) = NO.OF.LABELS = 2
      NR=NR+1
      LR(NR,1)=NI(I)
      LR(NR,2)=NJ(I)
      NLR(NR)=2
      GO TO 12
C
C     ADD NI(I) TO SUBROUTE
C
   22 DO 71 J=1,NR
C     EG ROUT1=12-14-17  ROUT2=18-20-12  TRY TO MATCH AND MAKE 1 ROUTE
C     1 ROUTE=18-20-12-14-17
C     FIND SUBROUTE(S) AND END CHECK TO SEE IF EITHER HAS BEEN JOINED
C     TOGETHER BEFORE SO THAT A CLOSED LOOP IS NOT COMPLETED TOO EARLY
      IF (LR(J,1).EQ.NI(I).OR.LR(J,NLR(J)).EQ.NI(I)) K1=J
      IF (LR(J,1).EQ.NJ(I).OR.LR(J,NLR(J)).EQ.NJ(I)) K2=J
   71 CONTINUE
      IF (K1.EQ.K2) GO TO 12
C     TEST FOR BOTH AT BEGINNING OF SUBROUTE
C     IF NEITHER LABEL OCCURRED BEFORE STORE IN NEW ROW OF MATRIX BUT
C     DO NOT CHECK FOR ANYTHING
      IF (LR(K1,1).EQ.NI(I).AND.LR(K2,1).EQ.NJ(I)) GO TO 73
C
C     TEST FOR ROUTES FITTING TOGETHER
C
      IF (LR(K1,NLR(K1)).EQ.NI(I).AND.LR(K2,NLR(K2)).EQ.NJ(I)) GO TO 79
C     IF OLD BOX LABEL=NEWBOX LABEL GO TO 72
      IF (LR(K1,NLR(K1)).EQ.NI(I)) GO TO 72
      GO TO 99
C
C     REVERSE ORDER OF SUBROUTE K1 & PLACE INTO LLR
C
   73 NK1=NLR(K1)
      DO 75 J=1,NK1
      LLR(K1,NLR(K1)-J+1)=LR(K1,J)
   75 CONTINUE
C     STORE NEW LABEL IN NEW ROW OF MATRIX
      DO 88 J=1,NK1
      LR(K1,J)=LLR(K1,J)
   88 CONTINUE
C     PLACE BOTH SUBROUTES INTO ROUTE K1
C     NK1P2=SUM.TOTAL.NO.LABELS   NK1=NO.OF.BOXES.IN.K1.ROUTE
   72 NK1P2=NLR(K1)+NLR(K2)
      NK1=NLR(K1)+1
C     JOIN UP 2 SUBROUTES TOGETHER
      DO 76 J=NK1,NK1P2
      LR(K1,J)=LR(K2,J-NLR(K1))
      LR(K2,J-NLR(K1))=0
   76 CONTINUE
      NLR(K1)=NK1P2
C
C     TURN LABELS AROUND
C
      IF(K2.GT.K1) GO TO 78
   89 K2P1=K2+1
C     INCORPORATE NEW ROUTE AND RENUMBER OLD ROUTE SO OLD ONE FITS
C     NLR=# OF BOXES IN ROUTE COUNTER   LR=NEW LABELS
      DO 77 J=K2P1,NR
      NLRJ=NLR(J)
      NLR(J-1)=NLR(J)
      DO 77 K=1,NLRJ
      LR(J-1,K)=LR(J,K)
   77 CONTINUE
      NR=NR-1
      GO TO 100
   78 IF(K2.EQ.NR) GO TO 67
      NNR=NR-1
      DO 66 K=K2,NNR
      JAK=K+1
      NNUM=NLR(JAK)
      DO 66 J=1,NNUM
      LR(K,J)=LR(JAK,J)
      LR(JAK,J)=0
      NDUMP=NLR(K)
      NLR(K)=NLR(JAK)
   66 CONTINUE
      NLR(NR)=0
   67 NR=NR-1
      GO TO 100
C
C     FIT SECOND ROUTE INTO LLR
C
   79 NK2=NLR(K2)
      DO 81 J=1,NK2
      LLR(K2,NLR(K2)-J+1)=LR(K2,J)
   81 CONTINUE
      DO 80 J=1,NK2
      LR(K2,J)=LLR(K2,J)
   80 CONTINUE
      GO TO 72
   99 KK1=K1
      K1=K2
      K2=KK1
      GO TO 72
C
C     PLACE NEW LOCATIONS ON SUBROUTE
C
   21 IF (IT(NI(I)).EQ.2) GO TO 52
C     SWITCH LABELS FOR NEW BOXES  THE OLD BOX HAS ALREADY JOINED W
C     ANOTHER NOW IT BECOMES A "TO" BOX RATHER THAN A "FROM" BOX
      NII=NI(I)
      NI(I)=NJ(I)
      NJ(I)=NII
   52 L1=0
      L2=0
      DO 53 J=1,NR
C     IF LABELS MATCH L1=OLD.LABEL   CHECK 2ND LABEL WITH NEW LABEL
      IF (LR(J,1).EQ.NJ(I)) L1=J
      IF (LR(J,NLR(J)).EQ.NJ(I)) L2=J
   53 CONTINUE
      IF (L1.GT.0.) GO TO 55
      NLR(L2)=NLR(L2)+1
      LR(L2,NLR(L2))=NI(I)
      GO TO 100
   55 NL1=NLR(L1)
C     RELABEL NEW BOX TO OLD 1ST BOX. AND RELABEL 1ST OLD BOX TO 2ND
C     BOX NO.
      DO 56 J=1,NL1
      LR(L1,NL1-J+2)=LR(L1,NL1-J+1)
   56 CONTINUE
      LR(L1,1)=NI(I)
      NLR(L1)=NLR(L1)+1
C
C     CALCULATE STATISTICS AND UPDATE RECORDS
C
  100 IP1=I+1
C     NLR=# OF BOXES JOINED TOGETHER
      IT(NI(I))=IT(NI(I))-1
      IT(NJ(I))=IT(NJ(I))-1
      IF (IT(NI(I)).EQ.0) GO TO 120
C     GOTO ELIMINATE THE POSS OF JOINING AN OLD BOX
  135 IF (IT(NJ(I)).EQ.0) GO TO 128
      GO TO 12
C     ELIMINATE POSS OF JOINING OLD BOXES BY MAKING NEGATIVE
  120 DO 125 J=IP1,NN
      IF (NI(J).EQ.NI(I)) SOT(3,J)=-1.
      IF (NJ(J).EQ.NI(I)) SOT(3,J)=-1.
  125 CONTINUE
      GO TO 135
  128 DO 126 J=IP1,NN
      IF (NI(J).EQ.NJ(I)) SOT(3,J)=-1.
      IF (NJ(J).EQ.NJ(I)) SOT(3,J)=-1.
  126 CONTINUE
   12 CONTINUE
C
C     PRINT RESULTS.
C
      JC=JC+1
      IR=IR+1
      DIST=0.
      IFLAG=1
      DO 35 I=1,NM1
   35 DIST=DIST+D(LR(1,I),LR(1,I+1))
C     DIST=THE DISTANCE IS SUMMED FROM 1ST BOX TO LAST BOX
C     DISS=THE COMPLETE ROUTE DISTANCE   BOTH ARE NECESSARY CAUSE THE
C     ROUTE CAN BE STARTED AFTER SLB OR BR OR IN MIDDLE OF SLB
      DISS=DIST+DDO(LR(1,1))+DDO(LR(1,N))
C     ADIST  = ACTUAL DISTANCE (STN TO STN, IN MILES)
C     BDIST  = ACTUAL DISTANCE (1ST TO LAST BOX, IN MILES)
C     STIME  = TOTAL TIME REQUIRED FOR STOPPING AT BOXES
C     TRTIME = TOTAL TRAVEL TIME REQUIRED (STN TO STN)
C     BTTIME = TOTAL TRAVEL TIME (1ST TO LAST BOX)
C     TOTIME = TOTAL TIME PER ROUTE INCLUDING STIME AND TRTIME
C              (STN TO STN)
C     BTIME  = TOTAL TIME FOR TRAVELLING AND STOPPING (1ST TO LAST BOX)
C     DIST   = DISTANCE OF THE ROUTE FROM FIRST BOX TO LAST BOX
C     DISS   = DISTANCE OF THE ROUTE FROM ORIGIN (STATION) TO ORIGIN
C     CHANGE LR,LAK,ETC TO LAB OR MNEMONICS FOR LABEL
      DO 34 I=1,N
   34 LAB(I)=LR(1,I)
   44 LAB(N+1)=0
      STIME=0.0
      DO 32 I=1,N
      LK(I)=L(LAB(I))
   32 CONTINUE
      DO 36 IN=1,N
      DO 36 INO=1,10
   36 SN(IN,INO)=SNAME(LAB(IN),INO)
      DO 33 IK=1,N
   33 STIME=(STIME+TIME(LR(1,IK)))
      BDIST=DIST*SCALE
      BTTIME=(BDIST/AVGSPD)*60.
      ADIST=DISS*SCALE
      TRTIME=(ADIST/AVGSPD)*60.
      TOTIME=TRTIME+STIME
      BTIME=BTTIME+STIME
      WRITE(6,41) IR,(AREA(IJ1),IJ1=1,5),(FSA(IJ2),IJ2=1,2),TYPE
   41 FORMAT('1',//,110X,'PAGE',I3,//,15X,'MULTIPLE ROUTE SCHEDULING',
     1/,15X,'USING TIME-SAVING METHOD',///,15X,5A4,' AREA',/,
     215X,'F.S.A.',2X,2A4,/,15X,A4,' ROUTES',////)
      IF(IFLAG.EQ.2) GO TO 45
      WRITE(6,40)
   40 FORMAT(15X,'** PRELIMINARY ROUTE **',////)
      GO TO 46
   45 WRITE(6,47)
   47 FORMAT(15X,'** IMPROVED ROUTE **',////)
   46 WRITE(6,42) JC,N,STIME,BDIST,BTTIME,BTIME,ADIST,TRTIME,TOTIME
   42 FORMAT(15X,'ROUTE NO.',I4,//,
     115X,'NO. OF BOXES EN ROUTE',16X,'=',I5,3X,'BOXES',//,
     215X,'TOTAL STOPPING TIME',18X,'=',3X,F6.2,' MINUTES',//,
     315X,'DISTANCE TRAVELLED(1ST TO LAST BOX)  =',3X,F6.2,' MILES',//,
     415X,'TRAVEL TIME REQUIRED(1ST TO LAST BOX)=',3X,F6.2,' MINUTES',/
     5/,15X,'TOTAL TIME REQUIRED(1ST TO LAST BOX) =',3X,F6.2,2X,'MINUTES
     6',//,15X,'TOTAL DISTANCE TRAVELLED(STN TO STN) =',3X,F6.2,
     7' MILES',//,15X,'TOTAL TRAVEL TIME(STN TO STN)',3X,'=',
     83X,F6.2,' MINUTES',//,15X,'TOTAL TIME REQUIRED FOR THIS ',
     9'ROUTE =',3X,F6.2,2X,'MINUTES',/)
      NP1=N+1
      LR(1,NP1)=0
      LK(NP1)=0
      WRITE(6,43) (LK(J),J=1,NP1)
   43 FORMAT(///,15X,'ORDER OF CALL POINTS:',/,15X,'STATION ',
     112(I5,'-'),/,6(25X,12(I5,'-'),/),' STATION',/)
      IF (IFLAG.EQ.2) GO TO 805
      DISTZ(1,1)=0.
      DO 801 IK=1,N
      DISTZ(IK+1,1)=DDO(IK)
  801 DISTZ(1,IK+1)=DDO(IK)
      DO 802 IM=1,N
      DO 802 IJ=1,N
  802 DISTZ(IM+1,IJ+1)=D(IM,IJ)
      XTM=1000
      CALL IMPROT(LAB,N,XTM,DISTZ,DIST,DISS)
      DO 809 IH=1,N
  809 LABI(IH)=L(LAB(IH))
      IFLAG=2
      IR=IR+1
      GO TO 44
  805 IF (BTIME.GT.ITIM) GO TO 299
  901 IF (ILIST.NE.1) GO TO 902
  803 NUM=N/20
      ANUM=N/20.
      IF (ANUM.GT.NUM) NUM=NUM+1
      ICNT=0
      NON=0
  804 IR=IR+1
      ICNT=ICNT+1
      WRITE(6,808) IR,JC,(AREA(IR1),IR1=1,5),ICNT,NUM
  808 FORMAT('1',//,110X,'PAGE',I3,/,27X,'ROUTE NO.',I5,//,
     124X,5A4,//,23X,'ORDER OF CALL POINTS',//,13X,60('-'),//,
     213X,'BOX NO.',10X,'LOCATION',20X,'SHEET',I2,' OF',I2,//,
     313X,60('-'),//)
      NAN=NON+1
      NON=ICNT*20
      IF(NON.GT.N) NON=N
      DO 806 IW=NAN,NON
  806 WRITE(6,807) LABI(IW),(SN(IW,IS),IS=1,10)
  807 FORMAT(15X,I5,5X,10A4,/)
      IF (ICNT.LT.NUM) GO TO 804
  902 IF (IPLOT.GT.1) GO TO 810
      CALL PLIK(DIST,NP1,LAB,X,Y,XORG,YORG,SCALE,JC,LABI,
     1NPLOT)
  810 TDIST=TDIST+DIST
      ATDIST=TDIST*SCALE
      GO TO 997
  299 IR=IR+1
      WRITE(6,298) IR,JC
  298 FORMAT('1',110X,'PAGE',I3,15(/),15X,'** WARNING **',////,15X,
     1'TIME REQUIRED FOR ROUTE NO.',I3,
     2' GREATER THAN TIME ALLOWED',//,15X,
     3'TRY LESSER NUMBER OF BOXES IN THIS ROUTE',////)
      GO TO 901
  997 IF (JC.LT.M) GO TO 7
      WRITE (6,170) M,ATDIST
  170 FORMAT ('1',10X,'TOTAL NUMBER OF ROUTES',2X,I2,/11X,'TOTAL LENGTH
     1OF ROUTES',2X,F7.2,' MILES')
  999 STOP
      END
      SUBROUTINE IMPROT(NMROUT,MMR,TMM,DM,DIST,DISS)
      DIMENSION NMROUT(100),NEWRT(100),DM(101,101),NLR(100),LR(10,100)
C
C     THIS SUBROUTINE IMPROVES THE ROUTES BY REMOVING A CUSTOMER IN A
C     ROUTE AND REINSERTING IT IN THE BEST POSITION IN THIS ROUTE.
C
C     MMR       = # OF BOXES EN ROUTE
C     LM        = NUMBER OF THE ROUTE
C     DM(.,.)   = DISTANCE FROM (.) TO (.)
C     TMM       = DISTANCE REMAINING
C     I         = ROUTE INDEX
C     K         = BOX INDEX
C     ITSTCS    = TEST BOX LABEL
C     NMROUT(.) = NUMBER OF (.) IN CALL ORDER
C     DIST      = TRAVEL DISTANCE OF NEW ROUTE
C
C     STEP THRU ROUTES
C
      KM1=1
C
C     STEP THRU CUSTOMERS OF ROUTE I
C
      DO 90 K=1,MMR
C
C     STORE ROUTE WITHOUT TEST CUSTOMER IN NEWRT(.)
C
      DO 10 KK1=1,MMR
      IF(KK1.LT.K) NEWRT(KK1)=NMROUT(KK1)
      IF(KK1.GT.K) NEWRT(KK1-1)=NMROUT(KK1)
   10 CONTINUE
      NEWRT(MMR)=0
      NLR(MMR)=0
C
C     TEST BOX IS K'TH BOX OF NMROUT(.)
C
      ITSTCS=NMROUT(K)+1
      KP1=NEWRT(K)+1
      IF(K.GT.1) KM1=NEWRT(K-1)+1
      IF(ITSTCS.EQ.KM1) GOTO 90
C
C     CALCULATE DISTANCE OF NEWRT
C
      TEMP=TMM-DM(KM1,ITSTCS)-DM(ITSTCS,KP1)+DM(KM1,KP1)
      ISVKK=0
      SVTM=TMM
C
C     KMP IS KM-PREVIOUS
C
      KMP=1
C
C     TEST CUSTOMER IN EACH POSITION IN NEWRT AND SAVE LOCATION (ISVKK)
C     AND DISTANCE ON NEWRT (SVTM).
C
      DO 20 KK2=1,MMR
      KMO=NEWRT(KK2)+1
      TEMPTM=TEMP+DM(KMP,ITSTCS)+DM(ITSTCS,KMO)-DM(KMP,KMO)
      KMP=KMO
      IF(TEMPTM.GE.SVTM) GOTO 20
      ISVKK=KK2
      SVTM=TEMPTM
   20 CONTINUE
C
C     NO CHANGE IN NEWRT  TRY NEXT BOX
C
      IF(ISVKK.LE.1) GOTO 90
C
C     STORE NEW ROUTE
C
      TMM=SVTM
      ISVM=ISVKK-1
      DO 30 KK3=1,ISVM
      IF(KK3.LT.ISVKK) NMROUT(KK3)=NEWRT(KK3)
      IF(KK3.GT.ISVKK) NMROUT(KK3)=NEWRT(KK3-1)
   30 CONTINUE
      NMROUT(ISVKK)=ITSTCS-1
   90 CONTINUE
      DIST=0.0
      MNR=MMR-2
      DO 35 I=1,MNR
   35 DIST=DIST+DM(NMROUT(I)+1,NMROUT(I+1)+1)
      DISS=DIST+DM(1,NMROUT(1)+1)+DM(NMROUT(MMR),1)
      RETURN
      END
      SUBROUTINE PLIK(SCLT,NACCST,S,X,Y,XORG,YORG,
     1 SCALE,JC,LABI,NPLOT)
      DIMENSION JJJ(2),IDATE(3),X(100),Y(100),LABI(100)
      INTEGER S(100)
      CALL TIME(5,0,IDATE)
C
C     PLOT AXIS
C
      CALL NUMBER(1.85,0.35,0.1,XORG,0.0,1)
      CALL PLOT(2.0,1.0,3)
      CALL PLOT(2.0,1.0,2)
      JJJ(1)=13
      YB=1.0
      YB1=YB-0.25
      DO 10 I=1,13
      AK=(XORG+(I*25))
      XB=(I*1.969)+2.0
      CALL SYMBOL(XB,YB,0.1,JJJ(1),0.0,-1)
   10 CALL NUMBER(XB,YB1,0.1,AK,0.0,1)
      CALL NUMBER(1.25,1.0,0.1,YORG,0.0,1)
      CALL PLOT(2.0,1.0,2)
      XB=2.0
      XB1=XB-0.75
      DO 20 I=1,10
      AY=(YORG+(I*25))
      YB=(I*1.969)+1.0
      CALL SYMBOL(XB,YB,.1,JJJ(1),90.0,-1)
   20 CALL NUMBER(XB1,YB,0.1,AY,0.0,1)
      CALL PLOT(2.0,1.0,2)
C
C     CHANGE COORDINATES TO MAP SCALE
C
      NN=NACCST-1
      DO 100 J=1,NN
      SX=X(J)
      X(J)=(SX-XORG)/2.54*2
      SY=Y(J)
  100 Y(J)=(SY-YORG)/2.54*2
C
C     PLOT ROUTES
C
      JJJ(2)=2
      XB=X(S(1))+2.0
      YB=Y(S(1))+1.0
      CALL PLOT(XB,YB,3)
      CALL SYMBOL(XB,YB,.1,JJJ(2),0.0,-2)
      X1=XB-0.25
      Y1=YB-0.20
      BLAB=LABI(1)
      CALL NUMBER(X1,Y1,0.1,BLAB,0.0,-1)
      CALL PLOT(XB,YB,3)
      DO 50 J=2,NN
      IK=S(J)
      XB=X(IK)+2.0
      YB=Y(IK)+1.0
      X2=XB-0.25
      Y2=YB-0.20
      ALAB=LABI(J)
      CALL SYMBOL(XB,YB,.1,JJJ(2),0.0,-2)
      CALL NUMBER(X2,Y2,0.1,ALAB,0.0,-1)
   50 CALL PLOT(XB,YB,3)
      IF(NPLOT.NE.1) GO TO 52
C
C     PLOT PRESENT ROUTE
C
      XB=X(1)+5.1
      YB=Y(1)+3.1
      CALL PLOT(XB,YB,3)
      CALL DASHLN(0.1,0.05,0.1,0.05)
      DO 51 J=2,NN
      XB=X(J)+5.1
      YB=Y(J)+3.1
   51 CALL PLOT(XB,YB,4)
C
C     PLOT HEADINGS
C
   52 SCALD=SCALE*2
      SCAL=SCALD*25.4
      CALL PLOT(15.0,5.0,3)
      CALL SYMBOL(15.0,1.25,.1,23HTOTAL TRAVEL DISTANCE  ,0.,23)
      CALL NUMBER(17.0,1.25,.1,SCLT,0.,1)
      CALL SYMBOL(15.0,1.50,.1,16HNUMBER OF BOXES ,0.,16)
      ANN=NN
      CALL NUMBER(17.0,1.50,.1,ANN,0.,-1)
      CALL SYMBOL(15.0,1.0,.1,17HSCALE: 1 MM =    ,0.,17)
      CALL NUMBER(17.0,1.0,.1,SCALD,0.,2)
      CALL SYMBOL(18.0,1.0,.1,5HMILES,0.,5)
      CALL SYMBOL(15.5,0.75,.1,10H(1 INCH = ,0.,10)
      CALL NUMBER(17.5,0.75,.1,SCAL,0.,2)
      CALL SYMBOL(18.5,0.75,.1,6HMILES),0.,6)
      CALL PLOT(19.0,0.0,-3)
      RETURN
      END
      SUBROUTINE SELECT(L,X,Y,TIME,SNAME,NUM,TITLE,WAY,IGP)
      DIMENSION TITLE(20),WAY(20),LGP(100),L(100),X(100),Y(100),
     ASNAME(100,10),TIME(100),ID(10),D(10)
      READ(5,100) TITLE,WAY
  100 FORMAT(20A4,/,20A4)
      READ(5,110) INO,IGP
  110 FORMAT(2I4)
      READ(5,120) (LGP(K),K=1,IGP)
  120 FORMAT(15I5)
      ATIME=.9
      J=1
      DO 190 II=1,NUM
      READ(7,140) IA,B,C,(D(K),K=1,10)
      IF(IA.EQ.LGP(J)) GO TO 200
      GO TO 190
  200 L(J)=IA
      X(J)=B
      Y(J)=C
      TIME(J)=ATIME
      DO 201 IC=1,10
  201 SNAME(J,IC)=D(IC)
      J=J+1
      IF(J.GT.IGP) GO TO 99
  190 CONTINUE
  140 FORMAT(2X,I3,2F10.2,3X,10A4)
   99 RETURN
      END
      SUBROUTINE COUNT(D,N,TITLE,WAY,SCALE)
      DIMENSION XLIST(5000),D(100,100),ICONT(11),INT(11),STOR(150)
      DIMENSION TITLE(20),WAY(20)
      REAL INT,INTER
      DATA STAR,BLNK/'*',' '/
      AMIN=99999.99
      AMAX=0.0
      NUMK=1
      SUM=0.
      ASUM=0.
      SCAL1=SCALE*5280
      DO 14 I=2,N
      K=I-1
      DO 14 J=1,K
      XLIST(NUMK)=D(I,J)
      IF (XLIST(NUMK).GT.AMAX) AMAX=XLIST(NUMK)
      IF (XLIST(NUMK).LT.AMIN) AMIN=XLIST(NUMK)
      SUM=SUM+XLIST(NUMK)
   14 NUMK=NUMK+1
      NUMK=NUMK-1
      AVG=SUM/NUMK
      DO 16 I1=1,NUMK
      DSQ=(XLIST(I1)-AVG)**2
   16 ASUM=ASUM+DSQ
      VAR=ASUM/(NUMK-1)
      STD=SQRT(VAR)
      TSUM=SUM
      SUM=TSUM*SCAL1
      TAVG=AVG
      AVG=TAVG*SCAL1
      TSTD=STD
      STD=TSTD*SCAL1
      BMAX=AMAX
      AMAX=BMAX*SCAL1
      BMIN=AMIN
      AMIN=BMIN*SCAL1
      RANGE=AMAX-AMIN
      INTER=RANGE/10
      DO 10 I2=1,10
   10 ICONT(I2)=0
      INT(1)=AMIN*1
      DO 17 IG=2,11
   17 INT(IG)=INT(IG-1)+INTER
      CALL SSORT(XLIST,NUMK,3)
      IKONT=0
      J1=2
      DO 19 JK=1,NUMK
      VALUE=XLIST(JK)*SCAL1
      IF(VALUE.GT.INT(J1)) GO TO 22
      IKONT=IKONT+1
      GO TO 19
   22 ICONT(J1)=IKONT
      IKONT=1
      J1=J1+1
   19 CONTINUE
      WRITE(6,100) TITLE,N,WAY
  100 FORMAT('1',//,5X,20A4,//,5X,'STATISTICS OF DISTANCE MATRIX OF',
     A' GROUP',I4,' USING ',20A4,//)
      WRITE(6,101) SUM,AVG,STD
      WRITE(6,50) NUMK
   50 FORMAT(20X,'TOTAL NUMBER OF ELEMENT IN DIST. MATRIX =',I10)
      WRITE (6,51) AMIN,AMAX
   51 FORMAT(20X,'MINIMUM DISTANCE = ',F10.2,'FEET',/,20X,'MAXIMUM',
     A' DISTANCE = ',F10.2,'FEET',//)
  101 FORMAT(20X,'TOTAL DISTNCE = ',F16.2,'FEET',/,20X,'AVERAGE',
     A' DISTANCE = ',F14.2,'FEET',/,20X,'STANDARD ',
     B'DEVIATION = ',F12.2,'FEET',///)
      WRITE(6,102)
  102 FORMAT(30X,'**** FREQUENCY PLOT ****',///,20X,'INTERVALS',20X,
     A' FREQUENCY',/)
      M=0
      DO 150 IJ=1,10
      IF(ICONT(IJ).EQ.0) GO TO 70
      M=ICONT(IJ)/4
      IF(M.EQ.0) M=1
      DO 9 JY=1,M
    9 STOR(JY)=STAR
   70 MN=M+1
      DO 8 JX=MN,150
    8 STOR(JX)=BLNK
      WRITE(6,103) INT(IJ),ICONT(IJ),(STOR(K),K=1,60)
  103 FORMAT(20X,F9.2,/,32X,I3,3X,60A1)
  150 CONTINUE
      WRITE(6,152) INT(11)
  152 FORMAT(20X,F9.2)
      WRITE(6,151) (NUMBE,NUMBE=40,240,40)
  151 FORMAT(/,37X,6('I'),'I',/,37X,'0',6(7X,I3))
      RETURN
      END
$SIG


MULTIPLE ROUTE SCHEDULING BY "SAVINGS" ALGORITHM
CLUSTER TRIAL RUN - NON-HIERARCHICAL - TDATA1

THE FOLLOWING ROUTING PRINT-OUTS FOR EVENLY DIST. DATA AREA ARE BASED ON :-

  CAPACITY CONSTRAINT OF EACH TRUCK ROUTE =     50  BOXES
  TIME CONSTRAINT OF EACH ROUTE           =     90  MINUTES
  AVERAGE SPEED OF TRUCKS EN ROUTE        =  15.00  M.P.H.
  TOTAL NUMBER OF ROUTES IN THE AREA      =      2  ROUTES
  TOTAL NUMBER OF BOXES IN THE AREA       =     80  BOXES
  MAP SCALE                               =  1 MM =  0.059652MILES
  STOPPING TIME FOR RESIDENTIAL BOXES     =     45  SECONDS
  STOPPING TIME FOR BUSINESS BOXES        =     AS SPECIFIED


CLUSTER INTERPRETATION - TDATA1 - GROUP 1

STATISTICS OF DISTANCE MATRIX OF GROUP  49 USING MEDIAN(GOWER) METHOD & CENTROID SORTING

  TOTAL DISTNCE      =  19090196.00FEET
  AVERAGE DISTANCE   =  [illegible]FEET
  STANDARD DEVIATION =      7973.69FEET

  TOTAL NUMBER OF ELEMENT IN DIST. MATRIX =   1176
  MINIMUM DISTANCE =    1259.85FEET
  MAXIMUM DISTANCE =   41260.07FEET

          **** FREQUENCY PLOT ****

  INTERVALS      FREQUENCY
    1259.85
                   0
    5259.87
                  95   ***********************
    9259.89
                 157   ***************************************
   13259.91
                 210   ****************************************************
   17259.93
                 195   ************************************************
   21259.95
                 215   *****************************************************
   25259.97
                 139   **********************************
   29259.99
                  86   *********************
   33260.01
                  50   ************
   37260.03
                  21   *****
   41260.05

                       0       40       80      120      160      200      240


MULTIPLE ROUTE SCHEDULING
USING TIME-SAVING METHOD

EVENLY DIST. DATA    AREA
F.S.A.  UNDEFINE
SLB  ROUTES

** PRELIMINARY ROUTE **

ROUTE NO.   1

  NO. OF BOXES EN ROUTE                 =     49    BOXES
  TOTAL STOPPING TIME                   =    44.10  MINUTES
  DISTANCE TRAVELLED(1ST TO LAST BOX)   =    38.65  MILES
  TRAVEL TIME REQUIRED(1ST TO LAST BOX) =   154.62  MINUTES
  TOTAL TIME REQUIRED(1ST TO LAST BOX)  =   198.72  MINUTES
  TOTAL DISTANCE TRAVELLED(STN TO STN)  =    39.61  MILES
  TOTAL TRAVEL TIME(STN TO STN)         =   158.44  MINUTES
  TOTAL TIME REQUIRED FOR THIS ROUTE    =   202.54  MINUTES

  ORDER OF CALL POINTS:
  STATION - (sequence of 49 box numbers illegible in the scanned output) - STATION


MULTIPLE ROUTE SCHEDULING
USING TIME-SAVING METHOD

EVENLY DIST. DATA    AREA
F.S.A.  UNDEFINE
SLB  ROUTES

** IMPROVED ROUTE **

ROUTE NO.   1

  NO. OF BOXES EN ROUTE                 =     49    BOXES
  TOTAL STOPPING TIME                   =    44.10  MINUTES
  DISTANCE TRAVELLED(1ST TO LAST BOX)   =    36.33  MILES
  TRAVEL TIME REQUIRED(1ST TO LAST BOX) =   145.31  MINUTES
  TOTAL TIME REQUIRED(1ST TO LAST BOX)  =   189.41  MINUTES
  TOTAL DISTANCE TRAVELLED(STN TO STN)  =    38.95  MILES
  TOTAL TRAVEL TIME(STN TO STN)         =   155.81  MINUTES
  TOTAL TIME REQUIRED FOR THIS ROUTE    =   199.91  MINUTES

  ORDER OF CALL POINTS:
  STATION - (sequence of 49 box numbers illegible in the scanned output) - STATION
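
The savings step of ROUTE (the DDO, D and S arrays and the sorted SOT vector in the listing of this appendix) can be restated compactly. What follows is only a minimal sketch in Python, not the FORTRAN of the original program. The names `manhattan` and `savings_list` and the sample coordinates are hypothetical, and the descending sort order is an assumption taken from the listing's comment that subroutes are formed by searching the list for maximum savings.

```python
from itertools import combinations


def manhattan(p, q):
    """Rectangular (city-block) distance, as in the D and DDO arrays."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def savings_list(station, boxes):
    """Return (saving, i, j) triples, largest saving first.

    saving(i, j) = d(station, i) + d(station, j) - d(i, j),
    i.e. the distance saved by serving boxes i and j on one route
    instead of on two separate out-and-back trips.
    """
    # Distance of every box from the station (DDO in the listing).
    ddo = [manhattan(station, b) for b in boxes]
    triples = []
    for i, j in combinations(range(len(boxes)), 2):
        d_ij = manhattan(boxes[i], boxes[j])
        triples.append((ddo[i] + ddo[j] - d_ij, i, j))
    # Assumed descending order; Python's sort is stable, so ties keep
    # their (i, j) generation order.
    triples.sort(key=lambda t: -t[0])
    return triples


if __name__ == "__main__":
    # Hypothetical coordinates in map millimetres.
    for saving, i, j in savings_list((0, 0), [(1, 0), (2, 0), (0, 3)]):
        print(saving, i, j)
```

Boxes lying in the same direction from the station give a positive saving, while boxes in perpendicular directions give none under the rectangular metric, which is why pairs near the top of the list are the first candidates for linking.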

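
The idea behind subroutine IMPROT (remove one box from the route and reinsert it where the route becomes shortest) can be illustrated in the same way. This is a simplified sketch under stated assumptions, not a transcription of IMPROT: the names `route_length` and `reinsert_improve` are hypothetical, the distance table is indexed with the station at position 0, and the whole station-to-station length is recomputed for every trial position instead of being updated incrementally as the FORTRAN does.

```python
def route_length(route, dist, station=0):
    """Station-to-station length of a route given as a list of box indices."""
    stops = [station] + list(route) + [station]
    return sum(dist[a][b] for a, b in zip(stops, stops[1:]))


def reinsert_improve(route, dist, station=0):
    """One pass of remove-and-reinsert over every box of the route.

    Each box is taken out in turn and tried in every position of the
    remaining sequence; a move is kept only if it strictly shortens
    the route.
    """
    route = list(route)
    best = route_length(route, dist, station)
    for box in list(route):
        k = route.index(box)
        rest = route[:k] + route[k + 1:]   # route without the test box
        for pos in range(len(rest) + 1):
            cand = rest[:pos] + [box] + rest[pos:]
            length = route_length(cand, dist, station)
            if length < best:
                best, route = length, cand
    return route, best


if __name__ == "__main__":
    # Hypothetical example: station at x=0 and three boxes on a line.
    xs = [0, 1, 2, 3]
    dist = [[abs(a - b) for b in xs] for a in xs]
    print(reinsert_improve([2, 1, 3], dist))
```

A single pass of this kind cannot lengthen the route, which is consistent with the sample output above, where the improved route is shorter than the preliminary one.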