Open Collections

UBC Theses and Dissertations

A simulation study comparing the reliability and validity of methods of scoring ratings Phillips, Norman 1984

Full Text

A SIMULATION STUDY COMPARING THE RELIABILITY AND VALIDITY OF METHODS OF SCORING RATINGS

by

Norman Phillips
B.A., McGill University, 1978

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS

in

THE FACULTY OF GRADUATE STUDIES
Department of Psychology

We accept this thesis as conforming to the required standard

UNIVERSITY OF BRITISH COLUMBIA
January 1984
© Norman Phillips, 1984

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Abstract

Simulated rating data were generated according to a uni-factor model under varying conditions of: number of judges; number of targets; discrepancies in judges' scales of measurement; and mean and variance in distributions of individual judges' reliabilities. Burt's (1936) method of standardizing ratings, estimating judges' individual reliabilities from the rating data, and weighting ratings by a function of the judges' reliabilities resulted in true score estimates with higher correlations with true scores than did the simple consensus. A method of scaling true score estimates to an estimated optimal absolute scale resulted in reduced mean square deviations from the true scores. Several proposed methods of estimating individual judges' reliabilities were tested. The maximum likelihood factor loading estimate appeared to give the best estimate overall. Burt's estimates showed close resemblance to maximum likelihood under some conditions. Only Cronbach's performed poorly. The alpha coefficient was found to be a much poorer estimate of the reliability of the sum (or mean) of a group of judges than another estimate which involved estimating judges' individual reliabilities.

Table of Contents

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
I. Introduction
    A. Purpose
    B. Rating Scales
    C. Scale Discrepancies
    D. Factorial Validity
    E. Rater Reliability
    F. Differential Weighting
    G. Summary
    H. Model
    I. Objections to Burt's Method
II. Method
    A. Data Generation
    B. Experimental Conditions
    C. Rater Reliabilities
    D. Reliability of Sum
    E. Weighting for Maximum Reliability
    F. True Score Variance
    G. True Scores
III. Results
    A. Rater Reliabilities
    B. Reliability of Sum
    C. Weighting for Maximum Reliability
    D. True Score Variance
    E. True Scores
    F. Uniform Distribution
IV. Discussion
REFERENCES
APPENDIX

List of Tables

Tables 1-12: Means (over replications) of means, SD's, correlations with actual reliabilities and mean square deviations from actual reliabilities of rater reliability estimates
Table 13: Average over replications of reliability estimate means
Table 14: Mean standard deviations of reliability estimates
Table 15: Mean correlation between reliability estimates and actual reliabilities
Table 16: Correlations between reliability estimates and actual reliabilities averaged over rater bias
Table 17: Correlations between reliability estimates and actual reliabilities averaged over sample size
Table 18: Average mean square deviations of reliability estimates from actual reliabilities
Table 19: Mean square deviations of reliability estimates from actual reliabilities averaged over rater bias
Table 20: Mean square deviations of reliability estimates from actual reliabilities averaged over sample size
Tables 21-32: Comparison of estimates of reliability of sums of raters; population values of reliability of composites weighted by two methods; comparison of estimates of true score variance
Table 33: Expected reliability of the sum of raters
Table 34: Estimates of reliability of the sum
Tables 35-46: Means (over replications) of means, SD's, correlations with actual and mean square deviations from actual (MSE1) of true score estimates; also includes mean square deviations from actual of estimates standardized to mean equal to estimated true score mean and variance equal to the product of estimates of true score variance and reliability (MSE2)
Table 47: Mean correlation between true score estimates and actual true scores
Table 48: Means for absolute scale true score estimates of mean square deviations from actual true scores
Table 49: Mean MSE of true score estimates standardized to estimated optimal scale from actual true scores
Table 50: Mean squared deviations averaged over rater bias
Table 51: Mean squared deviations averaged over sample size
Table 52: Means (over replications) of means, SD's, correlations with actual reliabilities and mean square deviations from actual reliabilities of rater reliability estimates
Table 53: Comparison of estimates of reliability of sums of raters; population values of reliability of composites weighted by two methods; comparison of estimates of true score variance
Table 54: Means (over replications) of means, SD's, correlations with actual and mean square deviations from actual (MSE1) of true score estimates; also includes mean square deviations from actual of estimates standardized to mean equal to estimated true score mean and variance equal to the product of estimates of true score variance and reliability (MSE2)

List of Figures

Figure 1: Reliability of the sum averaged over rater bias
Figure 2: Reliability of the sum averaged over sample size
Figure 3: Reliability of consensus and weighted composite averaged over rater bias
Figure 4: Reliability of consensus and weighted composite averaged over sample size
Figure 5: Mean square deviations averaged over rater bias
Figure 6: Mean square deviations averaged over sample size

Acknowledgement

I would like to express my gratitude to the following people: Jerry Wiggins for his advice and moral support; Jim Steiger for teaching me all of the techniques that made this project possible; Del Paulhus for his attention and expertise; Dimitri Papageorgis for his generous attention; and most of all to Diane for everything.

I.
INTRODUCTION

    ...subjectivity is a possible source of error in observations in personality and related research fields. A science can advance by identifying sources of error, so that error can then be measured and controlled. As La Barre puts it: 'All the natural sciences have sought to become exact through the discernment and measurement of the "probable error" inherent in the process of observation itself, as exemplified by analysis of chromatic distortion in the microscopic lens and the like' (1967, p. vii). In the behavioral sciences, we have long since recognized that subjectivity contributes error to many kinds of observations, but we have been unable to measure the magnitude of that error with any precision. ... (Fiske, 1978).

Suppose that each of a panel of judges rates each of a pool of targets on a variable for which there exists no available criterion of greater validity than the ratings themselves. The usual procedure is to estimate the targets' "true" scores by the simple average of the judges' ratings. Burt (1936) proposed a method of combining ratings which gives true score estimates which are more reliable and
been  sufficient  (Lawshe & N a g e l ,  1952),  and  of s t a n d a r d i z e d  by a f u n c t i o n  B u r t ' s method  to j u s t i f y  reliability  results  of the  in increased  o f improvement was n o t  the computational  labour  and hence t h e method h a s  ignored. It  i s argued  the problem of  rating  in  validity  primary of  weighted  b u t , a p p a r e n t l y the degree  considered involved  ratings  The method  judges'  t r u e s c o r e s by t h e c o m p o s i t e  (within  validity,  individual  method u s i n g  here,  however, t h a t  of computation,  data  and c o n s i d e r i n g  eliminate  the importance  f o r many a r e a s o f r e s e a r c h , any improvement  or r e l i a b i l i t y  aim of t h i s  ought  t o be h i g h l y  s t u d y was t o i n v e s t i g a t e  B u r t ' s method a s compared  varying  computers  v a l u e d . The the performance  to simple averaging  under  numbers o f j u d g e s and t a r g e t s and v a r y i n g  distributions  of judges'  scales  reliability.  The most  investigated  was a c o m p a r i s o n  estimating  individual  important  judges'  o f measurement and secondary  topic  w h i c h was  o f v a r i o u s p r o p o s e d methods o f reliabilities  from  rating  data.  A.  PURPOSE There  object  a r e many s i t u a t i o n s  or event  i n which e v a l u a t i o n  c a n o n l y be p e r f o r m e d  judgment. E x a m p l e s  include  by human  such d i v e r s e  o f an  subjective  areas as j u d g i n g  3 livestock,  a r t , wine, a t h l e t i c s ,  p e r f o r m a n c e , and these  areas  between  some c a s e s  value  i s always a c e r t a i n  no m a t t e r  common p r a c t i c e In  p e r s o n a l i t y . B e c a u s e of  there  judges  i t may  be  true  1979;  of e x p e r t  & Gleser,  Landy  very  they  & Farr,  nature  a r e . Hence of  "true"  t h e c o n s e n s u s of  1963;  (Cook,  Horowitz,  1980). T h i s  it is  judges.  
to d e f i n e the  judges  a  1979; Inouye, &  is especially  of p e r s o n a l i t y v a r i a b l e s : ...  the  better or  impulsiveness d e f i n e d than  score, given  competent Fiske average  by  of a p e r s o n  as an  equal  can  s c a r c e l y be  t o the average  a d e q u a t e number of  to rate.... ( K e l l e y ,  (1978) b e l i e v e s t h a t  rating,  those  1947).  "the  conceptual  value"  r a t i n g s i s q u e s t i o n a b l e . He. g i v e s t h e f o l l o w i n g  analogy: Ralph  Gerard  used t o r e l a t e  squirrel-hunter a  squirrel  passing hunter shot the  go  Hunters  one  he  f o o t to the  fires one  the  s t o r y of  his double-barreled  in a tree,  fires left  one  of  second b a r r e l ,  f o o t to the  squirrel  central  and  the  falls  right  down d e a d ,  the only  of  the gun.  barrel,  the  killed  Seeing  the  squirrel. t o have  shot The  the  squirrel. by  the  But  law  tendency.  firing  a t a t a r g e t seems l i k e  an  of  disagreement  r a t i n g s from a p a n e l  appropriate  large panel  the  job  amount of  "expert"  ( o r q u a l i t y ) as  Cronbach, Rajaratnam, Siegelman,  how  t o combine  of a q u a n t i t y  sufficiently  academic or  appropriate  of  of  4 a n a l o g y , but of  the a p p r o p r i a t e c o n c l u s i o n  the t a r g e t  can  be e s t i m a t e d by  i s that  the  location  the average  location  of  include  performance  the s h o t s . A r e a s where r a t i n g s a r e u s e d (Borman, (Berg  1979;  Landy & F a r r ,  & Adams,  Blaney,  1962;  assessment validate  which  Burisch,  that  i n 72%  used  P s y c h o l o g y between  "ratings  by p e e r  were u s e d as  ratings  to  and  validation 1965  the  of t h e most  questionnaires,  s c a l e s are only  studies  namely  Jackson's  (Jackson,  are peer  suggest  that  one  1974;  form o f p e e r a s s e s s m e n t .  
But  more i n f o r m a t i v e  r a t i n g s may  advantage  r a t i n g s are e a s i e r (Kane of  Two  n o m i n a t i o n s and p e e r r a n k i n g . t o be  t h a n n o m i n a t i o n s o r r a n k i n g s (Kane  1981).  alleged  are often  of t h e c a s e s " . One  validated  p o p u l a r forms  valid  Love,  of  SCALES  Rating  and  to  1978).  B. RATING  Recent  or f e a s i b l e  r e v i e w of  of A p p l i e d  regarded p e r s o n a l i t y in part  ratings  a literature  revealed  criterion  was  other  Knight &  i n s t r u m e n t s . F o r example, Landy  i n the J o u r n a l  primary  PRF,  assessment  1973;  hence o t h e r forms  a r e d e v i s e d . But p e e r  the proposed  1975  highly  i t i s not p r a c t i c a l  j u d g e s , and  (1980) r e p o r t  studies and  clinical  1977).  a p a n e l of  Trumbo  and  Flemenbaum & Zimmerman,  F o r many p u r p o s e s obtain  1976)  job  & Lawler,  to obtain  1978).  less  reliable  & Lawler, and  generally  Some b e l i e v e  informativeness i s i l l u s o r y  1978;  the  because  of  5  the  unreliability  1980). R a t i n g t h e y can  be  and  invalidity  scales  used  also  tend  f o r group,  of  the  t o be  results (Brief,  more f l e x i b l e  individual, self,  or  in  that  other  reporting. Rating formats  scales  can  t a k e on  ( G u i l f o r d , 1954;  Wiggins,  form  i s that  of  five  to nine  response c a t e g o r i e s  The  scales  the  many d i f f e r e n t p h y s i c a l  generally  and.sometimes at of  the  Likert scale  each p o i n t  descriptions  the  obtained  the  rated  on  attribute is also  1978). The  is  the  at  a l o n g a pseudo-continuum  are  s c a l e . The  r e l a t e d t o the 1970). The  very  the  important  a numeric v a l u e representing  the  from  continuum. 
endpoints  explicitness reliability  of  concreteness for  to  of  reliability  e s s e n t i a l c h a r a c t e r i s t i c of  judge a s s i g n s  common  arranged along a  the  (Cronbach,  most  i n which t h e r e  descriptions  is highly  ratings  (Fiske, that  include  1973). The  rating  the  scales  target  dimension  of  interest. Rating relatively Lawler and  the  has  u n r e l i a b l e and the  average v a l i d i t y found  for  to  (Love, rank a an  1981; set  been m e n t i o n e d , a r e invalid.  average around  r a t i n g data  r e l a t i v e f r e e d o m of  judge way  as  ( 19*78) r e p o r t  typically the  scales,  .3.  there  are  infinite  which  r e s u l t i n the  The  poor  t o be  around  reliability  which they p e r m i t 1978). T h e r e  ways of  same o r d e r i n g .  This  assigning freedom  of  to  the  i s only  in a p a r t i c u l a r order,  number of  and  i s sometimes a t t r i b u t e d  expression  targets  example, Kane  reliability  Kane & L a w l e r , of  For  generally  one  but ratings  .6  6 expression about  e n a b l e s the  the t a r g e t s  C.  measurements, e t c . ) but i t  amount of  response b i a s  on  the part  of  SCALE DISCREPANCIES  recent  Guilford  i n judgment has years  (1954) e n u m e r a t e d  s c a l e s comment  on t h e t e n d e n c y  t o t h e mean and  Atkins,  Briar,  Cronbach,  & Gleser,  1968;  mean and  of t h e i r  & Tripodi.,  F a r e n h e i t . As  to average  one  Paulhus,  commonly  rating  ratings 1966;  (Bieri,  Burt,  be  of C e l s i u s  (1936) p o i n t e d  1936;  Cronbach,  1960;  1981;  that  j u d g e may  i n the s c a l e  Burt  on  1972;  & Cox,  of measurement u s e d by  ratings  of b i a s which  a l i writers  Fiske  1947;  attention  1982).  
Grozz  Taylor,  of a j u d g e ' s r a t i n g s c a n be  of temperature,  assigning  sense  Kelley,  variance  as t h e s c a l e  1963;  of  f o r judges t o vary with  variance  Leaman, M i l l e r ,  deal  & Tversky,  G l e s e r , Nanda, & R a j a r a t n a m ,  Rajaratnam, Grossman,  a great  many forms  data. V i r t u a l l y  respect  analogy  received  (Kahneman, S l o v i c ,  occur with r a t i n g  The  information  judge.  Bias in  t o s u p p l y more  (eg. i n t e r v a l  also allows a certain the  judge  1968).  thought  j u d g e . To thought  &  use  of  the  as  while another  uses  out, -it  doesn't  i n two  different  scales  r e s e a r c h e r s have t r i e d  to reduce  the  scores expressed  of  make much of  measurement. One  way  discrepancy rater  that  between  training  judges' s c a l e s  o f measurement  (Bernardin,  1978;  Borman,  Latham, Wexley, & P u r s e l l ,  1975;  Pursell,  1979;  i s with  Crow,  1957;  D o s s e t t , & Latham,  7  1980; S p o o l , out  their  (Borman, quite  1978). Thus j u d g e s  ratings'  or t o 'provide  1979). R a t e r  training  w e l l b u t i n some c a s e s  training. effect  may be i n s t r u c t e d fewer h i g h  ratings'  when p o s s i b l e seems  i t i s not f e a s i b l e  T r a i n i n g may a l s o have an u n f o r e s e e n  on t h e j u d g e s  (Knight  t o 'spread  & Blaney,  t o work  to provide biasing  1977).  A n o t h e r common method o f e l i m i n a t i n g d i s c r e p a n c y between  s c a l e s o f measurement  ratings  t o a common s c a l e  1974). Some  researchers  i s t o s t a n d a r d i z e a l l judges'  (Burt,  find  1936; K e l l e y , 1947; S m i t h ,  this  procedure  ( C r o h b a c h e t a l . , 1972). T y p i c a l l y standardized of  t o have a mean o f z e r o  judge's  artificial  ratings are  and a s t a n d a r d d e v i a t i o n  one. 
There are two reasons why this standardization procedure is not the optimal one. In the first place the standard deviation of a judge's ratings is not an appropriate index of the unit of measurement since variance is composed of "true" score variance as well as error variance. A better index of the unit of measurement would be the regression coefficient of ratings on true scores (Cureton, 1958) (see the discussion concerning a model for rating data below).

A second reason why standardization to a mean of zero and a standard deviation of one is suboptimal is that the resulting scores provide information for only the relative standing of the targets. The scores cannot be related to the original scale in any sense. Hence the information contained in the scale descriptions cannot be applied to the resulting scores. It would be preferable to obtain estimates of the mean and variance of the targets' "true" scores in an absolute, as opposed to an arbitrary, scale of measurement.

But how is the absolute scale of the true score estimates to be determined? This study adopts the position taken by Cronbach et al. (1963), namely, that for quantities which are most suitably defined by consensual agreement, the true score for a target is the expected value (average) of ratings over the population of potential judges. The true score mean and standard deviation are defined in the usual manner as expected value functions of true scores.

In terms of the scale of standardization for true score estimates, it is clear that the appropriate mean is the observed sample mean of all ratings since this is the best estimate of the true score mean. Suppose that the targets' true scores are estimated by some linear combination f(x) of the judges' ratings. And suppose the reliability of this estimate is given by r.
Further, suppose that the true score standard deviation is estimated by s. It can be shown that the least squares criterion of estimation implies that the standard deviation of f(x) should be set at s*(r)**1/2 (a single asterisk represents multiplication; a double asterisk represents exponentiation). Hence ratings could be combined in standard score form and the resulting composite converted to the desired mean and standard deviation. This procedure presupposes a method of estimating true score variance and reliability of a composite. The former can be estimated from the average covariance between judges as suggested by Cronbach et al. (1963). Formulas for estimating the latter are easy to derive, given estimates of the reliabilities of individual judges (Green, 1950). Methods of estimating the reliabilities of individual judges are discussed below.

D. FACTORIAL VALIDITY

Another source of bias in rating data which was pointed out by Burt (1936) and Cronbach et al. (1963) is the tendency for different judges to focus on different aspects of the target in assigning scores. Thus, if the quantity to be measured can be thought of as made up of different components or factors, then one judge may emphasize different factors from another. This phenomenon can be described as the factorial validity of the judges as measuring instruments (Allen & Yen, 1979). This study will follow the example of Burt (1936) and Cronbach et al. (1963) in making the simplifying assumption that the quantity being measured is sufficiently salient that each judge's ratings can be thought of as resulting from one general factor and random error.
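To make the simplifying assumption concrete, the sketch below generates ratings from one general factor plus random error. It is an illustration under stated assumptions, not the generating procedure the thesis actually used: the function names, the normal distributions, the seed, the 2000 targets, and the parametrization (each judge's standardized rating is sqrt(rel)*true + sqrt(1 - rel)*error, then moved to that judge's own mean and standard deviation) are all hypothetical choices.

```python
import random
from math import sqrt

def simulate_ratings(n_targets, reliabilities, scales, seed=0):
    """Return (true_scores, ratings) with ratings[j][t] for judge j, target t.

    Assumed model: one general factor plus independent random error,
    followed by a judge-specific scale of measurement (mean_j, sd_j).
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n_targets)]
    ratings = []
    for rel, (mean_j, sd_j) in zip(reliabilities, scales):
        row = []
        for t in true_scores:
            z = sqrt(rel) * t + sqrt(1.0 - rel) * rng.gauss(0.0, 1.0)
            row.append(mean_j + sd_j * z)
        ratings.append(row)
    return true_scores, ratings

def correlation(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

true_scores, ratings = simulate_ratings(
    n_targets=2000,
    reliabilities=[0.8, 0.5],
    scales=[(5.0, 1.0), (3.0, 2.0)],
)

# Under this model a judge's reliability equals the squared correlation
# between that judge's ratings and the true scores.
for rel, row in zip([0.8, 0.5], ratings):
    print(rel, round(correlation(row, true_scores) ** 2, 3))
```

With a large number of targets the printed squared correlations should fall close to the reliabilities that were fed in, whatever scale each judge uses.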
It would be of interest, however, to generalize the techniques proposed by Burt (1936) to the case of more complex factorial structures.

E. RATER RELIABILITY

The final source of rater bias to be considered, one which is recognized by virtually every writer on the subject of rating data, is the differential reliability of judges. The majority of recent studies on rater reliability have focused attention on the average reliability of a panel of judges. An intraclass correlation coefficient (or generalizability coefficient) is a suitable index of average rater reliability (Bartko, 1966; Burt, 1936; Cronbach et al., 1972; Ebel, 1951; Haggard, 1958; Jackson, 1939; Jackson & Ferguson, 1941; Maxwell & Pilliner, 1968; Paulhus, 1981; Shrout & Fleiss, 1979). One procedure which has been advocated is to compute intraclass correlations with various combinations of judges in order to weed out the unreliable judges (Burdock, Fleiss, & Hardesty, 1963; Strahan, 1980).

Following a suggestion of Kelley (1947), Cronbach et al. (1972) proposed a method of estimating targets' scores which uses the average reliability of judges. True scores are estimated by the sum of the mean rating for a given target multiplied by the alpha coefficient, plus the overall mean of all targets multiplied by one minus alpha. Thus if the average rater reliability is low, each target's true score is estimated by a value near the overall mean. On the other hand, if alpha is close to one, each target's score is estimated by a value near to the mean rating assigned to that target.

The earliest reported method of estimating reliabilities of individual judges from rating data was that of Shen (1925) who relied considerably on formulas and methods developed by Truman Kelley. Shen actually proposed two methods, one of which he considered to be theoretically more accurate but computationally unfeasible. It relied, as do a number of results to be discussed, on an identity which holds whenever the data are assumed to result from one general factor plus random error. Writing the correlation between judge i and judge j as r(ij), and the reliability of judge i as r(ii), the identity is r(ij)=(r(ii)*r(jj))**1/2. The proof uses Spearman's formula for the correction for attenuation, namely that the correlation which would be expected between judges i and j in the absence of random error is r(ij)/(r(ii)*r(jj))**1/2. The so-called uni-factor assumption implies that there would be a perfect correlation between judges in the absence of random error. Hence r(ij)/(r(ii)*r(jj))**1/2 = 1. And hence r(ij) = (r(ii)*r(jj))**1/2 (Burt, 1936; Cureton, 1931; Kelley, 1947; Overall, 1965; Shen, 1925). Similarly r(ik) = (r(ii)*r(kk))**1/2, etc. Hence multiplying and dividing appropriate terms and cancelling, one obtains r(ii) = r(ij)*r(ik)/r(jk) (Burt, 1936; Cureton, 1931; Shen, 1925).

This gives a formula for estimating the reliability of judge i in terms of the correlations between judges i, j & k. But if there are more than three judges then there is more than one way of estimating the reliability of any given judge. It will not do to take the average of all estimates since, for one thing, a particular estimate may be much too large if the term r(jk) in the denominator happens to be near zero. Shen (1925) proposed the following:

    The theoretically best method conceivable is to obtain an average from individual determinations weighted inversely as the squares of their standard errors.

Shen used Kelley's logarithmic differential technique to obtain estimates for the standard error of a reliability coefficient and observed from the resulting formula that reliability estimates that are spuriously large because of a near-zero denominator are given near-zero weight in the weighted average. The procedure of weighting inversely by squared standard error results in minimum standard error estimates:
a n d i s u n d o u b t e d l y reliability Kelley  the best value f o r a  coefficient  (Shen,  (1947) m e n t i o n e d  t h i s method o f e s t i m a t i n g t h e  individual  judges' r e l i a b i l i t i e s .  considered  a similar  the  however,  availability Shen many fewer estimated  of  Burt  (1936)  also  p r o c e d u r e . But a l l o f them d i s m i s s e d  procedure as i n v o l v i n g  This,  1925)  t o o much c o m p u t a t i o n a l l a b o u r .  i s not a s e r i o u s  drawback w i t h t h e  computers.  (1925) p r o p o s e d calculations. by a f u n c t i o n  another procedure which The r e l i a b i l i t y  of the average  of judge  correlation  involved iis between  13 judge  i and  a l l other  between a l l p a i r s of  this  of  judges  and  the average  judges. K e l l e y  correlation  (1947) a d v o c a t e d  reliabilities  u s i n g the concepts  been p r o p o s e d .  Burt  first  centroid  and  first  principal  axis  by  centroid  the  judge's  factor  but  analytic  during  the d i s c u s s i o n  Another  and  computing  loading  on  the use  the  reliability  of the proposed  the c o r r e l a t i o n s  o f t h e s e and (Smith,  Cronbach  B u r t and  a  loading  (1965) principal  rationale  will  model of  be  1974).  technique  clarified  rating  data.  reliabilities of  and  Once a g a i n , B u r t since with  of c o r r e l a t i o n  Smith,  behind  then  c o n v e r t i n g back t o the  a v e r a g i n g d i d not c o n f o r m  1936).  the  et a l .  between a l l p a i r s  arithmetic  the chance d i s t r i b u t i o n s  on  the  suggested  c o n v e r t i n g t h e s e by F i s h e r ' s - Z f u n c t i o n ,  of c o r r e l a t i o n s  the  unities  first  method o f e s t i m a t i n g j u d g e s '  the average  One  l o a d i n g s on  instead  m a t r i x . The  using a similar  (Burt,  l o a d i n g s on  w i t h the squared  considered  of  such methods.  sample v a r i a n c e . 
O v e r a l l  e s t i m a t e s of  i n v o l v e s computing  have  of t h e c o v a r i a n c e m a t r i x  of the c o r r e l a t i o n  factor  scale  matrix  reiability  u s i n g the squared  factor  judges  the  judges'  analysis  f o r e s t i m a t i n g the c o m m u n a l i t i e s .  first  proposed  from  component. B u r t d i d not  (1963) e s t i m a t e d r a t e r  divided  factor  t h e o t h e r method u s e d  d i a g o n a l s of the c o r r e l a t i o n procedure  of  (1936) c o n s i d e r e d two  method e s t i m a t e d r e l i a b i l i t i e s  the  use  estimate.  A number o f methods o f e s t i m a t i n g i n d i v i d u a l  on  the  (1936)  t h e method of "...  what  i s known  coefficients"  however, b o t h d i s m i s s t h e  14 procedure again  as  requiring  t o o much c o m p u t a t i o n a l  the a v a i l a b i l i t y  of c o m p u t e r s e l i m i n a t e s t h i s  C r o n b a c h e t a l . (1963) o f f e r e d estimating square  of  judges, all  individual  pairs  of  is  by  made of  c o v a r i a n c e of  and  judge  of  judge  namely by  i with  slightly  e x p l a i n why  s u b j e c t to extreme unless n  h i g h , but  the v a r i a b l e  no  error  mentioned  Probably reliabilities with  this  bias  (Cronbach  i s t o use the  easy  are  sum  and  there  mention  i s unit-rank tend  Burt  uni-factor  by  from  sum.  sum.  rater  judge's  Burt  (1936)  and  t h i s method as p r o v i d i n g p o i n t e d out estimates  s i n c e the  judge's  Hence a more  o m i t t i n g the the  of one  ratings.  reliability  inflated  i n c l u d e d i n the  estimated  to  method of e s t i m a t i n g  of a l l j u d g e s '  t o be  is  be  e t a l . , 1963).  the c o r r e l a t i o n  (1936) t h e  i s obtained  to  i s small r e l a t i v e  a c c u r a t e e s t i m a t e . As  estimate being  their  earlier.  
t h e most p o p u l a r  t h i s method t e n d ratings  of  sampling  i s l a r g e or  C r o n b a c h e t a l . (1963) recommended  (1974) and  covariance  further  u n i t - r a n k c o n d i t i o n i s e q u i v a l e n t to the  a quick,  other  t h e p e r f o r m a n c e of  among c o n d i t i o n s . S e c o n d , t h e e s t i m a t e s  ratings  the  the  i ' s v a r i a n c e . Cronbach et a l .  - w h i c h may  are  fluctuations  assumption  for  i t i n subsequent w r i t i n g s :  Estimates  The  problem.  formula  the average  very pleased with  estimate  another  reliabilities,  the p r o d u c t  judges  (1963) were not proposed  judges'  the average  divided  l a b o u r . Once  judge  by  Smith  obtained  by  own  accurate  whose  reliability  15 If  the  variance,  r a t i n g s are  then  have g r e a t e r be  obtained  weight by  standardized  F.  a rater  not with  i n the  will  tend  to  Thus a b e t t e r e s t i m a t e  judges with  the  sum  might  of  scores.  arising  from t h e  i s more d i f f i c u l t  scales  of measurement  correct  greater  judges with  to d e a l with  ought  lesser  the  researchers  to provide  an  there  individual  Kelley's  observed  general  the  scores  relative  Kelley  next  general  above, and  that  argument  r(ii)**1/2  factor  the  expressed factor using  to  judges  line  with than  /  observed  known and  the  the  scores  i n t e r m s of r(ij)=  s u b s t i t u t e d these  their  regressed  the  that or  the  factor  scores.  determinant.  between a  judge  and  respective  (r(ii)*r(jj))**1/2 into  Kelley  "are a l l  m i n o r s of a  correlation  in  observations,  h y p o t h e t i c a l " t r u e " or by  judges  (l-r(ii)).  are  underlying  given  of  for weighting  same t h i n g " . 
K e l l e y t h e n on  from  p o s s i b l e to  more w e i g h t  in a long  reliabilities  weights are  reliabilities  given  first  (1927) words, t h e  measures of  bias a r i s i n g  analogous  intuitively  t o the q u a n t i t y  i s one  than  of  reliability.  (1927) was  assumes t h e  way  t o be  Kelley  proportion  reliability  1979). I t i s not  i n any  feels  reliability  differential  (Borman,  f o r random e r r o r  s t a n d a r d i z a t i o n . One  the  large variance  sum.  correlating  judges  The  a  t o a common  DIFFERENTIAL WEIGHTING Error  in  standardized  as  determinant.  discussed And  16 finally, readily with  following  l e d t o the e v a l u a t i o n  the r e s u l t  proportion Burt lines,  a s u g g e s t i o n by H a r o l d H o t e l l i n g  that  judges'  t o t h e above  demonstration  different simple  of a c o m p o s i t e .  by Thomson  1947b) and  Green  problem  (1940,  common d e f i n i t i o n  weights  w1,  to  + w2b'  w1a'  of  w3  v1, v2, and  The  had  in obtaining  Mosier  a  the to the  e v o l v e d over a s e r i e s (1943), P e e l  the problem  between p a r a l l e l (Allen  of (1947a,  aim  w1a  v3  forms  a',  + w2b  is to'find  correlation, such  that  one  the case t h a t  eigenvalue solution  forms--one 1979). b',  + w3c  c'.  Given  is parallel  weights  which  sums. U s i n g  find  between  i s m a x i m i z e d . Now, u1=v1, u2=v2 and  t h e maximum to canonical  the  weights u l ,  the c o r r e l a t i o n  +v3c'  satisfy  can  by  & Yen,  between t h e s e two  v i a ' + v2b'  i t must be  The  but w i t h a  eigenvector solution  the composite  hence t h e s e w e i g h t s  problem.  An  reliability  + w3c'.  method o f c a n o n i c a l  symmetry  weights.  would maximize  a, b, c have p a r a l l e l  w2,  + u3c  which  similar  (1947). 
p r o v i d e d y e t  interested  approached  maximize the c o r r e l a t i o n  and  was  1947),  the c o r r e l a t i o n  Suppose t e s t s  in  (1950) .  Thomson o r i g i n a l l y considering  along  of these r e g r e s s i o n  i n m i n d . He  maximum r e l i a b i l i t y  and  t h e same w e i g h t s  formula f o r weights  papers  weighted  (1965) o b t a i n e d t h e same w e i g h t s ,  purpose  reliability,  + u2b  s c o r e s whould be  up  weights.  (1936) d e r i v e d  Overall  u3  d e t e r m i n a n t " , came  apparently independently. Kelley  another  u2,  of t h i s  "which  u1a  by u3=v3,  reliability correlations  17 (Morrison, weighting (Green,  1976) c a n be seen f o r maximum  reliability  to the  of composite  problem  1950).  Overall the  t o be e q u i v a l e n t  familiar  variables  (1965) u t i l i z e d expression  i n terms  substitution  into  a s s u m p t i o n and  f o r the c o r r e l a t i o n  of t h e i r  respective  the e i g e n v e c t o r  weights which maximized proportional  the u n i - f a c t o r  reliabilities,  solution  the r e l i a b i l i t y  between two  deduced  and on  that the  of the composite a r e  t o t h e w e i g h t s o b t a i n e d by K e l l e y  and B u r t .  G. SUMMARY To  summarize,  t h e p r o c e d u r e t o be recommended,  was s u g g e s t e d by B u r t ratings  into  individual mentioned  unit  (1936),  standard score  reliability  results  s c o r e s . That absolute  s,  Burt advocated s c a l i n g the  s t a n d a r d form a s w e l l . But  i s t o attempt  i s , t o attempt  t o e x p r e s s ' them  to estimate true  ( C r o n b a c h e t a l . 1963). T h i s  t h e mean t o t h e o v e r a l l  and e s t i m a t i n g  and t h e r e l i a b i l i t y  scaling  one o f t h e  i n t h e minimum s q u a r e d d e v i a t i o n  scale  converting ratings,  form, e s t i m a t e t h e  judge u s i n g  scores.  
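As a rough illustration of the differential weighting discussed above, the following Python sketch computes weights proportional to r(ii)**1/2 / (1-r(ii)) and compares the reliability of the weighted composite with that of a unit-weighted composite. It assumes standardized ratings that follow the uni-factor model exactly and reliabilities that are known rather than estimated. The function names and the two example reliabilities are hypothetical and are not taken from the thesis program.

```python
import math

def reliability_weights(reliabilities):
    """Weights proportional to sqrt(r) / (1 - r), normalised to sum to one.

    Assumes every reliability is strictly between 0 and 1.
    """
    raw = [math.sqrt(r) / (1.0 - r) for r in reliabilities]
    total = sum(raw)
    return [w / total for w in raw]

def composite_reliability(weights, reliabilities):
    """Reliability of a weighted sum of standardized ratings (uni-factor model).

    Assumption: each standardized rating loads sqrt(r) on the common factor
    and carries independent error variance 1 - r.
    """
    true_part = sum(w * math.sqrt(r) for w, r in zip(weights, reliabilities)) ** 2
    error_part = sum(w * w * (1.0 - r) for w, r in zip(weights, reliabilities))
    return true_part / (true_part + error_part)

rels = [0.5, 0.8]  # hypothetical reliabilities for two judges
unit = composite_reliability([0.5, 0.5], rels)
weighted = composite_reliability(reliability_weights(rels), rels)
print(unit, weighted)
```

In this two-judge example the unit-weighted composite is less reliable than the better judge alone, while the weighted composite is more reliable than either judge.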
H. MODEL

The present discussion has developed out of consideration of sources of rater bias. Three main factors were considered: scale of measurement (mean and variance), factorial validity, and reliability (or random error). The question of factorial validity has been essentially shelved in making the simplifying assumption that all ratings are the result of one general factor and random error. Hence ratings generated by a given judge are characterized by a scale of measurement and a reliability index. That is, a judge's ratings can be expressed as a linear function of true scores plus random error. The scaling factor corresponds to the regression coefficient of a linear regression equation, which is not the same as the standard deviation of the rating. It has been argued that the regression coefficient corresponds to the judge's unit of measurement (Cureton, 1958). This model of ratings has been advocated by Kelley (1924), Cureton (1931), Burt (1936), Cronbach (1963) and Overall (1968), although Cronbach also allowed for the possibility of a multi-factorial model.

The model can be written in the form:

X(ij) = YBAR + A(i) + B(i)*(Y(j)-YBAR) + E(ij)

where X(ij) represents the observed rating of judge i for target j. Y(j) is target j's true score. YBAR is the population true score mean. A(i) and B(i) are judge i's scaling parameters. A(i) is an additive constant and B(i) is a scaling factor or unit of measurement. E(ij) is the random error component. The reliability of judge i is given by B(i)**2*Var(Y)/Var(X). That is, the "true" score variance divided by the total variance.

It is assumed that the average bias, A(i), over the population of judges is zero and that the average scaling factor B(i) is one (Cronbach et al. 1963). With these assumptions satisfied, the consensual definition of true scores is derived, namely for a fixed target j, the expected value (or mean) of ratings over all judges is equal to the true score level for target j.

The factor analytic estimates of reliability are derived from the common factor model as follows: The reliability of a variable can be defined as the squared correlation with true scores (Allen & Yen, 1979). The correlation between a variable with unit variance and a factor is equal to the corresponding factor loading (Gorsuch, 1974).
Hence, since true scores are represented by the factor scores, the reliability of a variable is given by the square of its loading on the common factor.

The common factor model also suggests another method of estimating targets' true scores, which, as just mentioned, are represented by factor scores. If the factor structure is estimated by the method of principal components, then factor scores are given by linear combinations of the ratings using the eigenvector elements as weights. If the method of maximum likelihood factor analysis is used, then factor scores are generally estimated by Thomson's (1951) multiple regression technique (Morrison, 1976; Nunnally, 1978).

I. OBJECTIONS TO BURT'S METHOD

There doesn't appear to be a great deal of empirical evidence with respect to the improved validity or reliability resulting from Burt's method. Burt gives some theoretically derived estimates of improvement for some particular cases, as do Lawshe and Nagle (1952) in the context of composite reliability. The general consensus appears to be that the method results in relatively small gains in reliability and validity. The feeling seems to have been that it is not worth the computational effort.

One objection to Burt's method relates to the current controversy about the merits of differential versus unit weighting (Wainer, 1976). Wiggins (1981) reported on the conclusions reached by Dawes and Corrigan (1974) and Einhorn and Hogarth (1975):

These authors suggest, basically, that unit weighting may be superior to linearly optimal weighting in a variety of situations involving relatively small sample sizes. The fact that the sample sizes they are talking about are those that are typically employed in personality assessment lends a certain immediacy to their conclusions. But their arguments rest on the well-known instability of regression weights in relatively small samples. Wainer (1976), on the other hand, takes a somewhat more radical position. He presents a proof that under fairly general circumstances, coefficients in multiple regression models can be replaced with equal weights with almost no loss in accuracy in the original sample. Further he shows that equal weights will have greater robustness than least squares regression coefficients in new samples of subjects (Wiggins, 1981).

The ultimate aim of the present study was to compare the performances of various methods of estimating true scores under different conditions of sample size and rater bias. The data to be tested were simulated for the present study. Simulation has three desirable properties. First, it enables the experimental conditions to be varied systematically. Second, it allows any particular condition to be replicated any desired number of times, thus reducing the effect of sampling error. And third, the actual values of the quantities estimated are known and hence the performances of the estimates can be measured exactly. It is possible that analytic expressions may be obtainable for some or perhaps all of the quantities estimated in this study. But, in general, it is difficult to obtain analytic expressions for the expected value of complicated functions of random variables. Moreover, with the availability of computers, it is not essential (Diaconis & Efron, 1983).
II. METHOD

A. DATA GENERATION

The rating data were simulated with a Fortran program (see Appendix) on the University of British Columbia Amdahl 470/V8 computer. For the generation of data, each judge i was considered to be a "linear filter" (in Overall's (1968) words) with a characteristic mean A(i), a scaling factor B(i), and a reliability r(ii). In all cases, the true scores Y(j) were obtained from a unit normal random number generator (RANDN - see UBC Documentation). The rater bias values A(i) were obtained from a random normal distribution with mean zero (in accordance with the model assumptions) and a standard deviation as specified by the experimental condition. The rater scaling factors were, for the most part, obtained from a random normal distribution with a mean of one and a standard deviation as specified by the experimental condition. Values below zero were truncated in accordance with the assumption that no rater would systematically reverse the order of the true scores (ie. negative regression coefficients were excluded). To preserve the mean of one (so that the expected value of the B(i) would equal one) values above two were also truncated. Thus rater scaling factors B(i) were obtained from a normal distribution truncated below zero and above two with a mean of one and a standard deviation as specified by the experimental condition. Truncation decreases the standard deviation of a distribution. The standard deviation after truncation can be calculated from formulas given in Johnson and Kotz (1970). The observed value was compared with the calculated value. Since the shape of the distribution of the B(i) does not appear to be known, the normal distribution was believed to be a safe approximation, but for the sake of comparison the B(i) were also obtained from an appropriate uniform distribution for selected experimental conditions.

The rater reliability values were obtained in much the same way as the scaling factors except that the mean was under experimental control and truncation was such that the distribution was symmetric and did not take on values greater than one or less than zero. The standard deviation for the reliabilities was specified by the experimental condition. Rater reliabilities were obtained from an appropriate uniform distribution in selected experimental conditions as well.

Hence for a given experimental condition, the relevant data were generated by the following steps:

1. True scores Y(j) were assigned to targets from a random unit normal distribution.
2. Each judge was assigned a level bias value A(i) from a normal distribution with a mean of zero and a standard deviation as specified by the experimental condition.
3. Each judge was assigned a scaling bias B(i) from a normal distribution with a mean of one and a standard deviation as specified by the experimental condition and 0 < B(i) < 2.
4. Each judge was assigned a reliability r(ii) from a normal distribution with mean and standard deviation as specified by the experimental condition, symmetric about the mean and with 0 < r(ii) < 1.
5. An "observed" rating for target j by judge i was generated by the following function:

X(ij) = A(i) + B(i) * Y(j) + (B(i)**2/r(ii)-B(i)**2)**1/2 * E(ij)

where E(ij) was obtained from a random unit normal distribution. This formula results in judge i having level bias A(i), scaling bias B(i), and reliability r(ii). Moreover, since the expected value of A(i) is zero and the expected value of B(i) is one, it follows that for a fixed target, the expected value of the average rating assigned is the correct value (ie. the target's "true" score). It should be commented that this method of generating the ratings is unrealistic in the sense that the values are theoretically unbounded whereas real ratings usually have bounds. This was not expected to have a significant effect on the outcome.
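The five generation steps can be sketched in Python as follows. This is only an illustrative translation and not the Fortran program of the Appendix: the function names are hypothetical, truncation is implemented by rejection sampling (the thesis does not say how it was done), and the symmetric truncation interval for the reliabilities is assumed to extend min(mean, 1 - mean) on either side of the mean.

```python
import math
import random

def truncated_normal(rng, mean, sd, low, high):
    """Draw from a normal distribution, rejecting values outside (low, high)."""
    while True:
        value = rng.gauss(mean, sd)
        if low < value < high:
            return value

def simulate_ratings(n_judges, n_targets, sd_a, sd_b, mean_r, sd_r, seed=0):
    """Return (true scores, judge parameters, ratings) for one replication."""
    rng = random.Random(seed)
    # Step 1: unit normal true scores Y(j).
    y = [rng.gauss(0.0, 1.0) for _ in range(n_targets)]
    # Assumed half-width of the symmetric truncation interval for r(ii).
    half = min(mean_r, 1.0 - mean_r)
    params, ratings = [], []
    for _ in range(n_judges):
        a = rng.gauss(0.0, sd_a)                            # Step 2: level bias A(i)
        b = truncated_normal(rng, 1.0, sd_b, 0.0, 2.0)      # Step 3: scaling bias B(i)
        r = truncated_normal(rng, mean_r, sd_r,
                             mean_r - half, mean_r + half)  # Step 4: reliability r(ii)
        error_sd = math.sqrt(b * b / r - b * b)             # Step 5: error scale
        ratings.append([a + b * yj + error_sd * rng.gauss(0.0, 1.0) for yj in y])
        params.append((a, b, r))
    return y, params, ratings

# One replication with 5 judges, 10 targets, SD(A)=.5, SD(B)=.5, MEAN(r)=.8, SD(r)=.2.
y, params, x = simulate_ratings(5, 10, 0.5, 0.5, 0.8, 0.2, seed=1)
```

The error scale in step 5 is chosen so that B(i)**2 divided by the total variance of judge i's ratings equals r(ii) when the true scores have unit variance.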
B. EXPERIMENTAL CONDITIONS

The experimental conditions which were manipulated in this study were: the number of judges, the number of targets, the standard deviations of level, scale and reliability bias, and the mean reliability. The following four sets of judge and target sample sizes were tested:

1. 5 judges, 10 targets
2. 10 judges, 5 targets
3. 10 judges, 10 targets
4. 20 judges, 20 targets

Each of these judge and target combinations was tested with each of the following three combinations of rater bias:

1. SD(A)=.5, SD(B)=.5, MEAN(r)=.8, SD(r)=.2
2. SD(A)=.5, SD(B)=.5, MEAN(r)=.6, SD(r)=.4
3. SD(A)=.0, SD(B)=.0, MEAN(r)=.6, SD(r)=.4

The first condition, for example, assigns level bias values A(i) to judges from a distribution with a standard deviation of .5 so that about 2/3 of the judges will have level bias values between -.5 and .5, since the mean is zero. Similarly scaling factors were assigned from a distribution with a standard deviation of .5, etc. These choices of bias parameters allow the comparison of the effects of high levels of reliability in the population of judges with the effects of low reliability. In addition the effect of the presence of differences in judges' scales of measurement can be examined since the last condition specifies that the values of A(i) and B(i) be equal to the same value (namely zero and one respectively) for all judges.

The four sample size conditions were fully crossed with the three rater bias conditions to give a total of twelve conditions for comparison. One further condition involving ten judges and ten targets, with level and scale standard deviations of .5 and reliability mean and standard deviation of .6 and .4 respectively, was tested in which the scaling factors B(i) and the reliabilities r(ii) were obtained from a uniform distribution instead of a truncated normal distribution.

Each condition combination was replicated 150 times. On each replication target true scores, rater bias indices, and "observed" scores were generated according to the specifications of the experimental condition. For each replication the following data were obtained:
C. RATER RELIABILITIES

The reliabilities of the individual judges were estimated by seven of the methods described in the introduction. The results from each method were represented by the mean and standard deviations of the rater reliability estimates, the correlation between the estimates and the actual rater reliability values, and finally, the mean squared deviation of the estimates from the actual value. It should be stressed that the actual population reliability of each judge was available for comparison with the various estimates. The seven methods tested were the following:

1. Shen: Shen's "impractical" method of weighting estimates of the form r(ii) = r(ij)*r(ik)/r(jk) by the inverse of the squared standard error of estimate.
2. Cronbach: Cronbach et al.'s (1963) formula which divides the square of the average covariance between judge i and the other judges by the product of the average covariance between all judges and the variance of judge i.
3. PC: The squared loading on the first principal component of the correlation matrix for judges. The UBC system subroutine SYMAL (see UBC Documentation) was used to compute the principal components.
4. ML: The squared loadings on the first maximum likelihood factor of the judges' correlation matrix. The maximum likelihood factor analysis program was based on an algorithm given in Morrison (1973). The method is based on a technique due to Rao.
5. Avg Fisher-z: The average of the Fisher-z transform of the correlations of a given judge with the other judges was converted back to a correlation coefficient by the functional inverse of the Fisher-z transform.
6. r with Sum: The ratings by each judge were correlated with the sum over all judges excluding the given judge being assessed.
7. r with z-Sum: This is the same as r with Sum except all ratings are converted to standard scores prior to summation.

Finally, the mean and standard deviations of the four measures (mean, SD, correlation with actual, and mean square deviation from actual) over 150 replications were tabulated.

D. RELIABILITY OF SUM

The true reliability of a sum (or mean) of judges when the values A(i), B(i), and r(ii) of the model equation are known can be computed exactly by the ratio of true score variance to total variance. The expected value of the reliability of a sample of judges from a specified population was estimated by computing the mean reliability of the sum for 1,000 samples of judges.

Two sample estimates of the reliability of a sum were tested:

1. Alpha: Cronbach's alpha is a common measure of the expected reliability of a sum of raters. It has been proposed by several writers (Hoyt, 1941; Ebel, 1951; Cronbach et al., 1972). Cronbach's alpha is known to be a lower bound for the reliability of a sum (Hunter, 1968) with equality occurring when the raters are equivalent (ie. equal means, variances and reliabilities).
2. Green: Green's measure utilizes estimates of individual rater reliabilities. Maximum likelihood factor analysis reliability estimates were used. The formula is given in Green (1950).

The means and standard deviations of the estimates obtained over the 150 replications were calculated and compared with each other and with the estimated population values. As well, the mean square deviation between estimates and actual values was computed.

E. WEIGHTING FOR MAXIMUM RELIABILITY

Given known bias values A(i), B(i) and r(ii) for each judge, the actual reliability of a weighted composite can be determined using a formula given in Green (1950). If population values are used in calculating the composite reliability then the resulting value is the population value corresponding to the set of weights used. It doesn't matter whether the weights were obtained from sample estimates.

Two methods of determining weights for maximizing the composite reliability were tested.

1. PC: The first was the method developed by Thomson (1940, 1947), Mosier (1943), Peel (1947a, 1947b) and finally expressed in terms of principal components by Green (1950). The UBC system routine SYMAL was used to compute the principal components.
2. Overall: The second method tested was one proposed by Overall (1965).

Both methods utilize estimates of rater reliability. The maximum likelihood factor analysis estimates were used. The results of this section are of interest in that the performance of Overall's simplified formulas can be tested as well as the gain in reliability resulting from the weights. The means and standard deviation for both estimates over 150 replications were reported.
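The two quantities compared in the section on the reliability of a sum can be sketched in Python. The first function computes the exact reliability of an unweighted sum from known B(i) and r(ii) as the ratio of true score variance to total variance, assuming unit true score variance and independent errors as in the generating model. The second computes Cronbach's alpha from a judges-by-targets table of ratings. Function names and the small example table are hypothetical.

```python
from statistics import variance

def reliability_of_sum(b, r):
    """Exact reliability of the unweighted sum of judges for known B(i), r(ii).

    Assumes Var(Y) = 1 and error variance B(i)**2 / r(ii) - B(i)**2 for judge i.
    Level bias A(i) is constant over targets and so does not enter.
    """
    true_var = sum(b) ** 2
    error_var = sum(bi * bi / ri - bi * bi for bi, ri in zip(b, r))
    return true_var / (true_var + error_var)

def cronbach_alpha(ratings):
    """Cronbach's alpha, where ratings[i][j] is judge i's rating of target j."""
    k = len(ratings)
    sums = [sum(column) for column in zip(*ratings)]
    judge_var = sum(variance(row) for row in ratings)
    return k / (k - 1) * (1.0 - judge_var / variance(sums))

# Four equivalent judges with reliability .5 each (hypothetical values).
exact = reliability_of_sum([1.0] * 4, [0.5] * 4)
# Two judges in perfect agreement over three targets (hypothetical ratings).
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3]])
```

For equivalent judges the exact value agrees with the Spearman-Brown formula, which is the case in which alpha attains the reliability of the sum rather than bounding it from below.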
F. TRUE SCORE VARIANCE

The population variance of the true scores is in all cases equal to one. Two methods of estimating true score variance from rating data were tested:

1. Avg Cov: Cronbach et al. (1963) defined true score variance as the average covariance between all judges.
2. Avg b=1: True score variance can be estimated from rater variance and reliability estimates by setting the mean value of B(i) equal to one.

The means and standard deviations for both methods over the 150 replications were reported as well as the mean square deviation of each estimate from unity (the actual true score variance).

G. TRUE SCORES

Finally, the estimates of primary concern were true score estimates. Of these, seven methods were tested.

1. Consensus: This is the simple average of "observed" scores.
2. Weighted Consensus: Observed scores are weighted using Burt's weights designed to maximize the reliability of the composite without first standardizing the ratings.
3. Standardized: True scores were estimated by the average of standardized ratings.
4. Weighted Standardized: Standardized ratings were combined using Kelley's, Burt's and Overall's weights.
5. Cronbach-Kelley: True scores were estimated by the raw score consensus weighted by the estimated reliability of the sum plus the overall mean weighted by one minus the reliability of the sum. Green's estimate of the reliability of the sum was used as it was found from preliminary investigations to be more accurate than alpha.
6. PC Scores: True scores were estimated by the linear combination of judges given by the first eigenvector of the correlation matrix. Scores were standardized to have unit variance.
7. ML Scores: True scores were estimated by multiple regression (Thomson, 1951) from the results of a maximum likelihood factor analysis.

For each replication the mean and standard deviation of the estimates given by each method were calculated as well as the mean square deviation from the actual true scores. In the case of the standardized, weighted standardized, and factor scores, the scale of the estimates is arbitrary, hence the mean square deviation from true score is also arbitrary. Finally, the estimates resulting from each method were rescaled to have mean equal to the overall observed mean and variance equal to the product of estimated true score variance and estimated reliability of the particular true score estimate. Each method of estimation is a linear combination of either ratings or standardized ratings. Hence the general formula for the reliability of a weighted composite can be used to estimate the reliability of each method of estimating true scores.
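The Cronbach-Kelley estimate listed among the true score methods is a simple shrinkage of each target's consensus toward the overall mean. A minimal Python sketch follows, assuming the reliability of the sum has already been estimated by some other means (the thesis used Green's estimate, which is not reproduced here). The function name and example values are hypothetical.

```python
def cronbach_kelley(ratings, sum_reliability):
    """Regress each target's raw-score consensus toward the overall mean.

    ratings[i][j] is judge i's rating of target j; sum_reliability is an
    externally supplied estimate of the reliability of the sum of judges.
    """
    n_judges = len(ratings)
    consensus = [sum(column) / n_judges for column in zip(*ratings)]
    overall_mean = sum(consensus) / len(consensus)
    return [sum_reliability * c + (1.0 - sum_reliability) * overall_mean
            for c in consensus]

# Two judges, three targets, and an assumed reliability of .5 for the sum.
estimates = cronbach_kelley([[1, 2, 3], [1, 2, 3]], 0.5)
```

With a reliability of one the estimates equal the consensus, and with a reliability of zero every target is assigned the overall mean.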
RESULTS  RATER R E L I A B I L I E S Tables  1 through  12 p r e s e n t  the r e s u l t s  methods o f e s t i m a t i n g  individual  rater  tables correspond indicated The  to d i f f e r e n t  results  of the r a t e r size  m e a s u r e s o f mean, s t a n d a r d actual  and mean s q u a r e d  Tables  13 t h r o u g h 17.  Table  13 c o n t a i n s  mean r e l i a b i l i t y  reliability  and r a t e r  t h e mean o v e r  estimate  given  represented  f o r each of the  distribution  and  the average  of r a t e r  bias  The r a t e r  bias  Sum and t h e r w i t h apparent size  conditions  s u c h a s .5.5.8.2. T h i s  in level  judge's r e l i a b i l i t y  are that  of t h e  by e a c h method o f e s t i m a t i o n  by n u m e r i c a r r a y s  of the r e l i a b i l i t i e s  observations  with  150 r e p l i c a t i o n s  the c o n d i t i o n i n which the s t a n d a r d  the  d e v i a t i o n s of  and s c a l e a r e .5,  i s .8 a n d t h e s t a n d a r d  i s .2. N o t e w o r t h y  t h e mean e s t i m a t e s  given  by t h e r w i t h  Z-Sum methods a r e c o n s i s t e n t l y h i g h  increase  i n bias corresponding  with  to the l a r g e r  c o n d i t i o n s . The PC and C r o n b a c h methods show  h i g h means i n t h e s m a l l a p p e a r s t o be r e d u c e d for  bias  d e v i a t i o n from a c t u a l s e p a r a t e l y i n  are  represents  estimates are  deviation, correlation  each c o n d i t i o n c o m b i n a t i o n .  sample  The  c o n d i t i o n c o m b i n a t i o n s as  for  an  reliabilities.  a t the t o p of the t a b l e s .  b r o k e n down by sample  deviation  f o r the v a r i o u s  sample c o n d i t i o n s b u t t h i s  a s sample  size  i n c r e a s e s . The means  t h e Shen, ML and Avg F i s h e r - z methods a p p e a r  33  bias  t o be  34 roughly  unbiased.  
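The four summary measures used for the rater reliability estimates (mean, standard deviation, correlation with actual, and mean squared deviation from actual) can be illustrated for a single replication. The following is a minimal sketch, not the code used in the study: the function name `summarize_replication` and the example values are hypothetical, and the use of the n - 1 denominator for the standard deviation is an assumption, since the text does not state which denominator was used.

```python
import math
import statistics


def summarize_replication(estimated, actual):
    """Summarize one replication of rater reliability estimates.

    estimated: reliability estimates for each judge from one method
    actual:    the actual (generating) reliabilities of the same judges
    """
    n = len(estimated)
    mean_est = statistics.fmean(estimated)
    # Sample SD with n - 1 denominator (an assumption, see lead-in).
    sd_est = statistics.stdev(estimated)
    mean_act = statistics.fmean(actual)
    # Pearson correlation between estimated and actual reliabilities.
    cov = sum((e - mean_est) * (a - mean_act) for e, a in zip(estimated, actual))
    ss_est = sum((e - mean_est) ** 2 for e in estimated)
    ss_act = sum((a - mean_act) ** 2 for a in actual)
    r = cov / math.sqrt(ss_est * ss_act)
    # Mean squared deviation of the estimates from the actual values.
    mse = sum((e - a) ** 2 for e, a in zip(estimated, actual)) / n
    return {"mean": mean_est, "sd": sd_est, "r": r, "mse": mse}


# Hypothetical values for five judges (not taken from the tables).
estimated = [0.70, 0.85, 0.90, 0.60, 0.95]
actual = [0.65, 0.80, 0.95, 0.70, 0.90]
summary = summarize_replication(estimated, actual)
```

Averaging each of these four quantities over the 150 replications of a condition combination would give one row of the kind reported for each method.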
Table 14 contains the means over the 150 replications of the sample standard deviation of the reliability estimates of each method and each condition combination. For example, in a given replication under a given condition combination, the Shen estimate, say, gives reliability estimates for each judge. The standard deviation of these estimates was computed and the corresponding standard deviations were averaged over the 150 replications. Hence the table contains estimates of the expected standard deviation of each method under each condition. The average sample standard deviations of the actual reliabilities are contained in Tables 1 through 12. The actual standard deviations are less than the standard deviations specified under the experimental conditions (.2 and .4) because of truncation. The resulting standard deviations can be estimated using a formula from Johnson and Kotz (1970). The reliability distribution corresponding to a normal with mean .8 and standard deviation .2 truncated at .6 and 1.0 results in having a standard deviation of about .11. The reliabilities with mean .6 and standard deviation .4 with truncation at .2 and 1.0 result in a standard deviation of about .22. Hence the results of Table 14 are to be compared with the values .11 and .22.

Inspection of Table 14 suggests the following observations: First, the SD for the Cronbach method is considerably inflated at the smaller sample sizes.
This bias appears to decrease as the sample size increases. The Avg Fisher-z method has the smallest SDs, followed closely by the r with Sum and the r with Z-Sum, with SDs generally smaller than the actual SDs except in the condition of 10 judges and 5 targets, where they tended to be slightly higher than the actual. The Shen and ML methods had similar SDs. Both were slightly higher than those of the PC method. Finally, the SDs for the Cronbach method were generally largest.

Table 15 contains for each method of estimation under each condition combination the average correlation between estimated and actual rater reliabilities, averaged over 150 replications. These results were averaged over rater bias conditions and presented in Table 16. The size of correlation for all methods appears to be a function most importantly of the number of targets and secondarily of the number of raters. The ML method had the highest average correlation in the 5-10, 10-10 and 20-20 conditions, while the Cronbach estimate was superior in the 10-5 condition.

Table 17 presents the correlations between estimates and actual averaged over sample sizes. In general the correlations are not greatly affected by the distribution of rater bias. The ML method generally performs the best except in the .0.0.6.4 condition, where the Cronbach estimate showed a correlation considerably larger than the other estimates.

Table 18 presents the mean over 150 replications of the mean square deviation between the rater reliability estimates and the actual reliabilities for each method of estimation and each experimental condition.

Table 19 presents the average mean square deviation for each method averaged over the rater bias conditions. As with the correlations, the mean squared deviations improve with sample size. The most noteworthy result was the excessive mean square deviation of the Cronbach estimate in the small sample size conditions.

Table 20 presents the average mean square deviations averaged over sample sizes. Once again, the Cronbach method had mean square deviations which were considerably larger than the other methods in the .5.5.6.4 and .0.0.6.4 conditions. The PC method had the smallest mean square deviation.

B. RELIABILITY OF SUM

Tables 21 through 32 present the results for each of the combinations of conditions with respect to comparisons of estimates of the reliability of the sum over judges, two methods of weighting judges for maximum reliability, and two estimates of true score variance. These results are discussed separately. With respect to the reliability of the sum over raters, the population value for each condition combination was estimated by the average value computed over a thousand replications. The results are presented in Table 33.

The estimates given by the alpha and Green methods for all conditions are given in Table 34.

Figure 1 gives the mean estimates of the reliability of sum of the alpha and Green methods averaged over rater bias conditions, along with the expected reliability of sum. Both estimates improve with increased sample size. The Green method of estimation gives a considerably better estimate of the actual reliability of sum. Figure 2 gives the actual and estimated reliabilities of the sum of judges averaged over sample sizes. The Green method of estimation again performs better than the alpha method under all conditions.

[Figure 1. Reliability of the sum averaged over rater bias. Series: actual, Green, alpha. Horizontal axis: sample size (raters-targets): 5-10, 10-5, 10-10, 20-20.]

[Figure 2. Reliability of the sum averaged over sample size. Series: actual, Green, alpha. Horizontal axis: rater bias.]

C. WEIGHTING FOR MAXIMUM RELIABILITY

With respect to the two methods of weighting for maximum reliability, it is evident from Tables 21-32 that the two methods give virtually identical results. Neither method performs well under the 10-judges 5-targets condition. In all other conditions the weights result in increased reliability. The increase is greatest when the unweighted reliability is low and the sample size is large.

Figure 3 graphs the actual reliabilities of the unweighted sum as determined by the 1000 replications, as well as the actual reliability of the composite, using Overall's (or Burt's) weights to maximize reliability, averaged over 150 replications. These values are averaged over rater bias conditions. The weighted composite is in general greater than the consensus by about .03. The only exception to this relation is in the 10-judges 5-targets condition, where this difference is reversed.

[Figure 3. Reliability of consensus and weighted composite averaged over rater bias. Series: consensus, weighted composite. Horizontal axis: sample size (raters-targets).]

Figure 4 graphs the same values averaged over sample sizes. The effect appears to be reduced. There is no difference between the two methods in the .5.5.8.2 condition.

[Figure 4. Reliability of consensus and weighted composite averaged over sample size. Series: consensus, weighted composite. Horizontal axis: rater bias.]

D. TRUE SCORE VARIANCE

The two methods of estimating true score variance as indicated in Tables 21 through 32 show very similar patterns of performance. The population variance is one for each replication and each condition combination. The estimates give means which appear to show an average near the correct value of one, but with a considerable amount of variance between replications, as indicated by the large standard deviations and mean square deviations. The variance of the estimates also appears to be slightly reduced when the average rater reliability is high and when there are no differences in the scales of measurement between judges.

E. TRUE SCORES

The results for the true score estimates are presented in Tables 35 through 46.

With respect to the means, only the consensus, weighted consensus and Cronbach-Kelley methods involve non-arbitrary means. The consensus and Cronbach-Kelley methods give identical means and generally appear to provide good estimates of the actual mean (which always has a population value of zero). The weighted consensus mean appears to be somewhat unstable.

The same estimates are the only ones with non-arbitrary standard deviations. The standard deviations for the weighted consensus method tend to be larger than the standard deviation of the true scores (the population value is one). The standard deviations of the consensus estimate tend to be larger still. The Cronbach-Kelley estimate, on the other hand, consistently shows a standard deviation which is always less than the standard deviation of the true scores.

The mean correlations between estimates and true scores under all condition combinations are presented in Table 47. The mean correlations for all methods of estimation depend primarily on the number of targets and secondarily on the number of judges. All methods perform very well in terms of their correlations. All methods also show the highest correlations under the .5.5.8.2 condition. Discrepancy of scale bias does not appear to affect any of the methods to a great extent. The highest correlations are obtained by the weighted standardized and maximum likelihood methods, but the simple standardized scores show the highest correlations in the 10-judges 5-targets condition.

The MSE1 values in Tables 35 to 46 represent the mean square deviations (averaged over 150 replications) between true score estimates and the actual true scores. Once again only the consensus, weighted consensus and Cronbach-Kelley methods have nonarbitrary values. The MSE1 values for these estimates under all condition combinations are presented in Table 48.

The weighted consensus shows a considerably higher MSE1 than the other two methods. The MSE1 values for the Cronbach-Kelley method are slightly smaller than those of the consensus method. Both methods show smaller MSE1's in the large sample size condition. They seem to be particularly affected by the number of judges.

The MSE2 values reported in Tables 35 through 46 represent the mean squared deviations (averaged over 150 replications) of true score estimates (converted to the scale of mean equal to overall sample mean and variance equal to the product of estimated true score variance and reliability of the true score estimate) from the actual true scores.

Table 49 presents the MSE2 values for each method of estimation under each condition combination. It is clear that none of the methods performs exceptionally poorly. In general the MSE's for the estimates decreased as the number of judges and targets increased. Only the 'absolute' estimates (consensus and Cronbach-Kelley) diverged from this pattern, in that the MSEs of the 10-judges 5-targets condition were slightly lower than those of the 10-judges 10-targets condition.

All rater estimates effects increased of absence of rater reliability. Cronbach-Kelley systematic The the .5.5.6.4 improved of average the the these from estimates, .6 to in a greater improvement. The weighted weighted standardized and resulted in MSE's for the 5-targets equal .0.0.6.4 condition except condition. Finally results except under for the the estimates under the smaller 20-judges condition, the worst was: weighted consensus and The results 5-targets better than weighted a close PC the for the others. standardized second, But in the ML standardized and tied for first, fourth, This conditions. and 10-judges the 10-5 slightly condition, methods came out third the from best seemed to perform 5-10 the 10-judges consistent. For PC and .0.0.6.4 conditions 10-targets and almost standardized rater bias as 10-judges gave Comparing tied 5-judges lowest condition where ML .8 consensus, the performance and Cronbach-Kelley standardized the the third, conditions were not condition, estimate MSE.

of standardized for each of under 20-targets order weighted consensus second, held PC 10-judges 5-targets 10-targets ranking the .5.5.8.2 and .5.5.8.2 condition gave a and estimates and of resulted ML and MSE's under absence performance of rater reliability the standardized methods consistently gave lowest bias increase rater bias consensus, .5.5.8.2 condition. Thus, although systematic the MSEs under bias condition. There were however, variations in relative the showed highest the ahead with consensus and PC Cronbach-Kelley It for last. should be observed the Cronbach-Kelley virtually (ie. method of estimation unchanged under Table 50 before presents the MSE1 and estimation averaged over represents these it is virtually because it is of no from rater graph standardized, standardized same as ML, and and methods gave better true score estimates methods. The method performed the others their worst under this results slightly greater improvement method under other 10-5 10-10 superiority over estimates standardized method) under than consensus spread else. but The the others the was ML (as did better followed basically same pattern. an 20-20 condition. method gave slightly the consensus, very condition more because of showed a clear the performed the condition. The anything Cronbach-Kelley of 5-judges estimates method than The form. standardized in the in the ML the weighted either methods.

All methods gave the in the omitted standardized condition, especially and the Cronbach-Kelley similar the 5 omitted, in graph than consensus or Cronbach-Kelley 10-targets condition, while PC significance) that the ML better than for the components methods of (with weighted the consensus bias conditions. Figure particular the remained for the the MSE2 values principal values because is clear values standardized, weighted maximum likelihood, deviations rescaling. rescaling) and Cronbach-Kelley, It that the mean squared

Both methods appeared to be affected predominantly by the number of judges as opposed to the number of targets. The other methods, on the other hand, showed improvement with increases in both the number of judges and the number of targets.

[Figure 5. Mean square deviations averaged over rater bias.]

Table 51 refers to the same variables as Table 50 but averaged over sample size instead of rater bias. Again the results for selected methods are graphed in Figure 6. All methods indicated similar values for the .5.5.8.2 condition. Decreasing the average rater reliability resulted in a spreading out of the methods. Presence of scale of measurement bias seemed to affect the size of the MSEs but not the spread between methods. The best MSEs were given by the ML estimates, closely followed by the weighted standardized (i.e. Burt's 1936 rescaled) estimates. PC followed closely behind these, then standardized and finally the Cronbach-Kelley and consensus estimates.

[Figure 6. Mean square deviations averaged over sample size. Series: consensus, Cronbach-Kelley, standardized, weighted standardized. Horizontal axis: rater bias.]

F. UNIFORM DISTRIBUTION

Tables 52 to 54 contain the results for estimates of: reliability (Table 52); reliability of sum, maximum reliability and true score variance (Table 53); and true scores (Table 54) under the sample size condition 10-judges 10-targets and the rater bias condition .5.5.6.4. The only difference between this experimental condition and a previous one is that the scaling factor B(i) and the reliability r(ii) values associated with each judge were selected at random from a uniform distribution instead of a normal distribution. The intervals of the uniform distribution were chosen such that the distributions had the same means and variances as the normal counterparts. The corresponding normal distribution results are contained in Tables 8, 28 and 42.

in the uniform distribution of all results rater reliability correlations of with estimates appear The actual are slightly condition, lower lower. had slightly reliabilities. estimates was pattern (although these estimates standard lower. for the uniform the relative order condition as well, as for the normal reliability The true score variance, the same. The mean square for the uniform The of the true score estimates lower but again to be estimates slightly correlations tended In particular, to be unstable). The corresponding were also preserved in various directions. were considerably lower deviations were condition but there the actual sums were slightly lower The patterns of results shifts all distribution. with the distribution of the various deviations were but showed the same

Table 1
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

                 N
Raters           5
Targets         10
Replications   150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.793 (0.05)    0.102 (0.03)
Shen            0.795 (0.11)    0.152 (0.07)    0.612 (0.30)    0.026 (0.03)
Cronbach        0.836 (0.12)    0.254 (0.12)    0.449 (0.39)    0.063 (0.06)
PC              0.823 (0.08)    0.107 (0.06)    0.610 (0.31)    0.016 (0.02)
ML              0.787 (0.10)    0.152 (0.07)    0.628 (0.31)    0.022 (0.02)
Avg Fisher-Z    0.792 (0.10)    0.068 (0.04)    0.614 (0.31)    0.015 (0.02)
r with Sum      0.841 (0.08)    0.086 (0.05)    0.584 (0.35)    0.016 (0.01)
r with Z-Sum    0.848 (0.08)    0.086 (0.05)    0.609 (0.32)    0.016 (0.01)

Note. Standard deviations are given in parentheses.

Table 2
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

                 N
Raters           5
Targets         10
Replications   150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.594 (0.09)    0.206 (0.06)
Shen            0.571 (0.20)    0.239 (0.08)    0.646 (0.37)    0.057 (0.04)
Cronbach        0.657 (0.16)    0.323 (0.14)    0.599 (0.37)    0.086 (0.10)
PC              0.675 (0.11)    0.206 (0.09)    0.638 (0.35)    0.045 (0.03)
ML              0.619 (0.13)    0.257 (0.09)    0.663 (0.35)    0.046 (0.03)
Avg Fisher-Z    0.593 (0.16)    0.128 (0.07)    0.647 (0.35)    0.040 (0.03)
r with Sum      0.666 (0.15)    0.169 (0.09)    0.623 (0.35)    0.049 (0.04)
r with Z-Sum    0.687 (0.14)    0.172 (0.09)    0.642 (0.35)    0.048 (0.03)

Note. Standard deviations are given in parentheses.

Table 3
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

                 N
Raters           5
Targets         10
Replications   150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.613 (0.10)    0.211 (0.06)
Shen            0.562 (0.21)    0.251 (0.10)    0.663 (0.32)    0.065 (0.06)
Cronbach        0.653 (0.33)    0.353 (0.60)    0.768 (0.23)    0.440 (3.85)
PC              0.668 (0.12)    0.223 (0.09)    0.656 (0.31)    0.043 (0.03)
ML              0.613 (0.13)    0.273 (0.09)    0.667 (0.34)    0.050 (0.04)
Avg Fisher-Z    0.580 (0.18)    0.143 (0.07)    0.685 (0.26)    0.045 (0.04)
r with Sum      0.655 (0.17)    0.188 (0.09)    0.645 (0.31)    0.051 (0.04)
r with Z-Sum    0.673 (0.16)    0.190 (0.09)    0.681 (0.26)    0.048 (0.04)

Note. Standard deviations are given in parentheses.

Table 4
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

                 N
Raters          10
Targets          5
Replications   150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.800 (0.03)    0.107 (0.02)
Shen            0.798 (0.17)    0.202 (0.11)    0.463 (0.27)    0.065 (0.08)
Cronbach        0.807 (0.13)    0.238 (0.13)    0.485 (0.26)    0.070 (0.13)
PC              0.793 (0.12)    0.180 (0.09)    0.503 (0.25)    0.042 (0.04)
ML              0.778 (0.13)    0.199 (0.09)    0.507 (0.26)    0.048 (0.05)
Avg Fisher-Z    0.779 (0.16)    0.131 (0.09)    0.489 (0.25)    0.047 (0.08)
r with Sum      0.831 (0.13)    0.157 (0.12)    0.491 (0.24)    0.048 (0.07)
r with Z-Sum    0.836 (0.13)    0.156 (0.12)    0.494 (0.24)    0.047 (0.07)

Note. Standard deviations are given in parentheses.

Table 5
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

                 N
Raters          10
Targets          5
Replications   150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.599 (0.07)    0.214 (0.04)
Shen            0.624 (0.20)    0.294 (0.09)    0.503 (0.28)    0.103 (0.05)
Cronbach        0.750 (0.55)    0.424 (0.51)    0.535 (0.28)    0.677 (3.79)
PC              0.667 (0.12)    0.266 (0.07)    0.526 (0.27)    0.074 (0.04)
ML              0.644 (0.13)    0.288 (0.08)    0.528 (0.28)    0.079 (0.04)
Avg Fisher-Z    0.595 (0.22)    0.217 (0.10)    0.529 (0.26)    0.091 (0.08)
r with Sum      0.672 (0.20)    0.281 (0.14)    0.513 (0.26)    0.119 (0.10)
r with Z-Sum    0.687 (0.19)    0.280 (0.14)    0.534 (0.26)    0.115 (0.11)

Note. Standard deviations are given in parentheses.
Table 6

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N:  Raters 10   Targets 5   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.599 (0.07)    0.212 (0.04)
Shen            0.601 (0.24)    0.276 (0.10)    0.448 (0.31)    0.115 (0.06)
Cronbach        0.684 (0.24)    0.359 (0.34)    0.551 (0.27)    0.261 (1.16)
PC              0.662 (0.14)    0.263 (0.08)    0.479 (0.30)    0.080 (0.04)
ML              0.641 (0.15)    0.285 (0.08)    0.467 (0.33)    0.088 (0.05)
Avg Fisher-Z    0.591 (0.23)    0.210 (0.10)    0.500 (0.25)    0.094 (0.08)
r with Sum      0.676 (0.20)    0.274 (0.14)    0.477 (0.26)    0.118 (0.09)
r with Z-Sum    0.685 (0.19)    0.273 (0.14)    0.503 (0.24)    0.113 (0.08)

Note. Standard deviations are given in parentheses.

Table 7

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N:  Raters 10   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.805 (0.03)    0.107 (0.02)
Shen            0.826 (0.09)    0.148 (0.07)    0.675 (0.18)    0.022 (0.02)
Cronbach        0.827 (0.08)    0.175 (0.06)    0.623 (0.23)    0.026 (0.02)
PC              0.823 (0.08)    0.129 (0.06)    0.683 (0.18)    0.017 (0.02)
ML              0.807 (0.08)    0.147 (0.06)    0.694 (0.18)    0.019 (0.02)
Avg Fisher-Z    0.816 (0.09)    0.079 (0.04)    0.680 (0.18)    0.014 (0.02)
r with Sum      0.875 (0.06)    0.091 (0.05)    0.671 (0.19)    0.017 (0.01)
r with Z-Sum    0.878 (0.06)    0.091 (0.05)    0.676 (0.19)    0.017 (0.01)

Note. Standard deviations are given in parentheses.

Table 8

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N:  Raters 10   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.594 (0.07)    0.215 (0.04)
Shen            0.581 (0.16)    0.266 (0.06)    0.727 (0.18)    0.053 (0.04)
Cronbach        0.625 (0.11)    0.279 (0.07)    0.730 (0.18)    0.046 (0.03)
PC              0.638 (0.10)    0.238 (0.06)    0.724 (0.18)    0.038 (0.02)
ML              0.608 (0.11)    0.263 (0.05)    0.744 (0.18)    0.039 (0.03)
Avg Fisher-Z    0.593 (0.13)    0.155 (0.05)    0.716 (0.18)    0.036 (0.02)
r with Sum      0.701 (0.11)    0.197 (0.07)    0.689 (0.18)    0.051 (0.03)
r with Z-Sum    0.716 (0.10)    0.198 (0.07)    0.709 (0.18)    0.051 (0.03)

Note. Standard deviations are given in parentheses.

Table 9

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N:  Raters 10   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.605 (0.07)    0.213 (0.04)
Shen            0.604 (0.15)    0.264 (0.07)    0.703 (0.22)    0.052 (0.03)
Cronbach        0.626 (0.11)    0.258 (0.07)    0.760 (0.19)    0.036 (0.02)
PC              0.653 (0.10)    0.233 (0.06)    0.707 (0.21)    0.038 (0.02)
ML              0.626 (0.11)    0.255 (0.06)    0.715 (0.24)    0.039 (0.02)
Avg Fisher-Z    0.614 (0.13)    0.150 (0.05)    0.695 (0.20)    0.035 (0.02)
r with Sum      0.723 (0.10)    0.190 (0.07)    0.675 (0.22)    0.051 (0.03)
r with Z-Sum    0.732 (0.10)    0.190 (0.07)    0.689 (0.20)    0.051 (0.03)

Note. Standard deviations are given in parentheses.

Table 10

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N:  Raters 20   Targets 20   Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.797 (0.02)    0.107 (0.01)
Shen            0.808 (0.05)    0.130 (0.04)    0.813 (0.08)    0.009 (0.01)
Cronbach        0.801 (0.06)    0.137 (0.03)    0.793 (0.09)    0.010 (0.01)
PC              0.802 (0.05)    0.121 (0.03)    0.818 (0.08)    0.008 (0.01)
ML              0.793 (0.05)    0.131 (0.03)    0.826 (0.08)    0.009 (0.01)
Avg Fisher-Z    0.800 (0.05)    0.069 (0.02)    0.812 (0.09)    0.007 (0.01)
r with Sum      0.878 (0.03)    0.077 (0.02)    0.804 (0.09)    0.012 (0.01)
r with Z-Sum    0.880 (0.03)    0.077 (0.02)    0.808 (0.09)    0.013 (0.01)

Note. Standard deviations are given in parentheses.

Table 11

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N:  Raters 20   Targets 20   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.601 (0.04)    0.216 (0.02)
Shen            0.579 (0.11)    0.256 (0.04)    0.862 (0.06)    0.027 (0.02)
Cronbach        0.602 (0.08)    0.250 (0.04)    0.868 (0.06)    0.021 (0.01)
PC              0.610 (0.08)    0.232 (0.04)    0.862 (0.06)    0.019 (0.01)
ML              0.595 (0.08)    0.243 (0.03)    0.871 (0.06)    0.019 (0.01)
Avg Fisher-Z    0.584 (0.09)    0.141 (0.03)    0.842 (0.07)    0.023 (0.01)
r with Sum      0.724 (0.07)    0.178 (0.04)    0.828 (0.07)    0.035 (0.02)
r with Z-Sum    0.732 (0.07)    0.178 (0.04)    0.834 (0.07)    0.036 (0.02)

Note. Standard deviations are given in parentheses.

Table 12

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N:  Raters 20   Targets 20   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.608 (0.05)    0.216 (0.02)
Shen            0.601 (0.10)    0.251 (0.04)    0.868 (0.06)    0.022 (0.01)
Cronbach        0.611 (0.08)    0.239 (0.04)    0.882 (0.05)    0.017 (0.01)
PC              0.626 (0.08)    0.227 (0.04)    0.867 (0.06)    0.017 (0.01)
ML              0.611 (0.08)    0.240 (0.04)    0.879 (0.06)    0.017 (0.01)
Avg Fisher-Z    0.604 (0.09)    0.138 (0.03)    0.849 (0.06)    0.020 (0.01)
r with Sum      0.741 (0.06)    0.170 (0.04)    0.836 (0.06)    0.035 (0.02)
r with Z-Sum    0.746 (0.06)    0.171 (0.04)    0.840 (0.06)    0.036 (0.02)

Note. Standard deviations are given in parentheses.
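The four summary columns in Tables 6 to 12 (Mean, SD, R, MSE) can be illustrated for a single replication with the sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, the input values are made up, and the n - 1 denominator for the SD is an assumption, since the tables do not state which denominator was used. Each tabled entry would then be the average of such per-replication values over the 150 replications.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    # Sample SD with an n - 1 denominator (an assumption, see lead-in).
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def pearson_r(xs, ys):
    # Pearson correlation between estimated and actual reliabilities.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def mean_square_deviation(estimates, actual):
    # Mean squared difference between each estimate and its actual value.
    return mean([(e - a) ** 2 for e, a in zip(estimates, actual)])

def summarize_replication(estimates, actual):
    # One replication's contribution to the Mean, SD, R and MSE columns.
    return {
        "mean": mean(estimates),
        "sd": sd(estimates),
        "r": pearson_r(estimates, actual),
        "mse": mean_square_deviation(estimates, actual),
    }

# Made-up reliabilities for five raters (not taken from the thesis).
actual = [0.9, 0.8, 0.7, 0.5, 0.3]
estimates = [0.85, 0.9, 0.6, 0.55, 0.2]
summary = summarize_replication(estimates, actual)
```

Averaging the `summary` dictionaries over replications, key by key, would give one table row; the parenthesized values in the tables correspond to the standard deviation of each statistic across replications.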
Table 13

Average Over Replications of Reliability Estimate Means

                              Sample Size (judges-targets)
Estimate        Rater Bias    5-10    10-5    10-10   20-20
Shen            .5.5.8.2      .795    .798    .826    .808
                .5.5.6.4      .571    .624    .581    .579
                .0.0.6.4      .562    .601    .604    .601
Cronbach        .5.5.8.2      .836    .807    .827    .801
                .5.5.6.4      .657    .750    .625    .602
                .0.0.6.4      .653    .684    .626    .611
PC              .5.5.8.2      .823    .793    .823    .802
                .5.5.6.4      .675    .667    .638    .610
                .0.0.6.4      .668    .662    .653    .626
ML              .5.5.8.2      .787    .778    .807    .793
                .5.5.6.4      .619    .644    .608    .595
                .0.0.6.4      .613    .641    .626    .611
Avg Fisher-z    .5.5.8.2      .792    .779    .816    .800
                .5.5.6.4      .593    .595    .593    .584
                .0.0.6.4      .580    .591    .614    .604
r with Sum      .5.5.8.2      .841    .831    .875    .878
                .5.5.6.4      .666    .672    .701    .724
                .0.0.6.4      .655    .676    .723    .741
r with Z-Sum    .5.5.8.2      .848    .836    .878    .880
                .5.5.6.4      .687    .687    .716    .732
                .0.0.6.4      .673    .685    .732    .746

Table 14

Mean Standard Deviations of Reliability Estimates

                              Sample Size (judges-targets)
Estimate        Rater Bias    5-10    10-5    10-10   20-20
Shen            .5.5.8.2      .152    .202    .148    .130
                .5.5.6.4      .239    .294    .266    .256
                .0.0.6.4      .251    .276    .264    .251
Cronbach        .5.5.8.2      .254    .238    .175    .137
                .5.5.6.4      .323    .424    .279    .250
                .0.0.6.4      .353    .359    .258    .239
PC              .5.5.8.2      .107    .180    .129    .121
                .5.5.6.4      .206    .266    .238    .232
                .0.0.6.4      .223    .263    .233    .227
ML              .5.5.8.2      .152    .199    .147    .131
                .5.5.6.4      .257    .288    .263    .243
                .0.0.6.4      .273    .285    .255    .240
Avg Fisher-z    .5.5.8.2      .068    .131    .079    .069
                .5.5.6.4      .128    .217    .155    .141
                .0.0.6.4      .143    .210    .150    .138
r with Sum      .5.5.8.2      .086    .157    .091    .077
                .5.5.6.4      .169    .281    .197    .178
                .0.0.6.4      .188    .274    .190    .170
r with Z-Sum    .5.5.8.2      .086    .156    .091    .077
                .5.5.6.4      .172    .280    .198    .178
                .0.0.6.4      .190    .273    .190    .171

Table 15

Correlations Between Reliability Estimates and Actual Reliabilities

                              Sample Size (judges-targets)
Estimate        Rater Bias    5-10    10-5    10-10   20-20
Shen            .5.5.8.2      .612    .463    .675    .813
                .5.5.6.4      .646    .503    .727    .862
                .0.0.6.4      .663    .448    .703    .868
Cronbach        .5.5.8.2      .449    .485    .623    .793
                .5.5.6.4      .599    .535    .730    .868
                .0.0.6.4      .768    .551    .760    .882
PC              .5.5.8.2      .610    .503    .683    .818
                .5.5.6.4      .638    .526    .724    .862
                .0.0.6.4      .656    .479    .707    .867
ML              .5.5.8.2      .628    .507    .694    .826
                .5.5.6.4      .663    .528    .744    .871
                .0.0.6.4      .667    .467    .715    .879
Avg Fisher-z    .5.5.8.2      .614    .489    .680    .812
                .5.5.6.4      .647    .529    .716    .842
                .0.0.6.4      .685    .500    .695    .849
r with Sum      .5.5.8.2      .584    .491    .671    .804
                .5.5.6.4      .623    .513    .689    .828
                .0.0.6.4      .645    .477    .675    .836
r with Z-Sum    .5.5.8.2      .609    .534    .709    .834
                .5.5.6.4      .642    .534    .709    .834
                .0.0.6.4      .681    .503    .689    .840

Table 16

Correlations Between Reliability Estimates and Actual Reliabilities Averaged Over Rater Bias

                Sample Size (judges-targets)
Estimate        5-10    10-5    10-10   20-20
Shen            .640    .471    .702    .848
Cronbach        .605    .524    .704    .848
PC              .635    .503    .705    .848
ML              .653    .501    .718    .859
Avg Fisher-z    .649    .506    .697    .834
r with Sum      .617    .494    .678    .823
r with Z-Sum    .644    .510    .691    .827

Table 17

Correlations Between Reliability Estimates and Actual Reliabilities Averaged Over Sample Size

                Rater Bias (SD(A),SD(B),Mean(r),SD(r))
Estimate        .5.5.8.2    .5.5.6.4    .0.0.6.4
Shen            .641        .685        .671
Cronbach        .588        .683        .740
PC              .654        .688        .677
ML              .664        .702        .682
Avg Fisher-z    .649        .684        .682
r with Sum      .638        .663        .658
r with Z-Sum    .647        .680        .678

Table 18

Average Mean Square Deviations from Actual Reliabilities of Reliability Estimates

                              Sample Size (judges-targets)
Estimate        Rater Bias    5-10    10-5    10-10   20-20
Shen            .5.5.8.2      .026    .065    .022    .009
                .5.5.6.4      .057    .103    .053    .027
                .0.0.6.4      .065    .115    .052    .022
Cronbach        .5.5.8.2      .063    .070    .026    .010
                .5.5.6.4      .086    .667    .046    .021
                .0.0.6.4      .440    .261    .036    .017
PC              .5.5.8.2      .016    .042    .017    .008
                .5.5.6.4      .045    .074    .038    .019
                .0.0.6.4      .043    .080    .038    .017
ML              .5.5.8.2      .022    .048    .019    .009
                .5.5.6.4      .046    .079    .039    .019
                .0.0.6.4      .050    .088    .039    .017
Avg Fisher-z    .5.5.8.2      .015    .047    .014    .007
                .5.5.6.4      .040    .091    .036    .023
                .0.0.6.4      .045    .094    .035    .020
r with Sum      .5.5.8.2      .016    .048    .017    .012
                .5.5.6.4      .049    .119    .051    .035
                .0.0.6.4      .051    .118    .051    .035
r with Z-Sum    .5.5.8.2      .016    .047    .017    .013
                .5.5.6.4      .048    .115    .051    .036
                .0.0.6.4      .048    .113    .051    .036

Table 19

Mean Square Deviations of Reliability Estimates from Actual Reliabilities Averaged Over Rater Bias

                Sample Size (judges-targets)
Estimate        5-10    10-5    10-10   20-20
Shen            .049    .094    .042    .019
Cronbach        .196    .333    .036    .016
PC              .035    .065    .031    .015
ML              .039    .072    .032    .015
Avg Fisher-z    .033    .077    .028    .017
r with Sum      .039    .095    .040    .027
r with Z-Sum    .037    .092    .040    .028

Table 20

Mean Square Deviations of Reliability Estimates from Actual Reliabilities Averaged Over Sample Size

                Rater Bias (SD(A),SD(B),Mean(r),SD(r))
Estimate        .5.5.8.2    .5.5.6.4    .0.0.6.4
Shen            .031        .060        .064
Cronbach        .042        .205        .189
PC              .021        .044        .045
ML              .025        .046        .049
Avg Fisher-z    .021        .048        .049
r with Sum      .023        .064        .064
r with Z-Sum    .023        .063        .062

Table 21

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 5   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.881    0.066    0.008
Green           0.931    0.047    0.002

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.946    0.035
Overall         0.946    0.035

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         1.029    0.733    0.534
Avg b = 1       1.076    0.747    0.560

Table 22

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 5   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.766    0.134    0.021
Green           0.825    0.119    0.014

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.872    0.099
Overall         0.872    0.098

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         0.995    0.667    0.442
Avg b = 1       1.062    0.690    0.477

Table 23

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 5   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.788    0.161    0.029
Green           0.820    0.128    0.017

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.878    0.105
Overall         0.882    0.091

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         0.943    0.545    0.299
Avg b = 1       0.970    0.544    0.295

Table 24

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.
N:  Raters 10   Targets 5   Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.915    0.098    0.012
Green           0.951    0.074    0.006

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.944    0.057
Overall         0.945    0.055

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         0.899    0.782    0.617
Avg b = 1       0.924    0.794    0.632

Table 25

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 10   Targets 5   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.786    0.252    0.076
Green           0.866    0.157    0.026

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.860    0.158
Overall         0.873    0.127

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         0.968    0.792    0.625
Avg b = 1       1.018    0.804    0.642

Table 26

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 10   Targets 5   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.836    0.201    0.046
Green           0.887    0.126    0.016

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.848    0.154
Overall         0.858    0.134

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         1.022    0.792    0.624
Avg b = 1       1.049    0.790    0.623

Table 27

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 10   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.943    0.031    0.002
Green           0.968    0.022    0.000

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.979    0.015
Overall         0.979    0.015

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         1.013    0.528    0.277
Avg b = 1       1.037    0.536    0.286

Table 28

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 10   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.862    0.079    0.008
Green           0.893    0.066    0.004

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.941    0.071
Overall         0.944    0.047

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         1.036    0.647    0.417
Avg b = 1       1.067    0.660    0.437

Table 29

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.
N:  Raters 10   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.900    0.061    0.004
Green           0.913    0.052    0.003

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.947    0.048
Overall         0.947    0.048

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         1.038    0.476    0.227
Avg b = 1       1.050    0.476    0.227

Table 30

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 20   Targets 20   Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.973    0.008    0.000
Green           0.983    0.007    0.000

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.993    0.004
Overall         0.993    0.004

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         0.966    0.320    0.103
Avg b = 1       0.976    0.323    0.104

Table 31

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 20   Targets 20   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.928    0.029    0.001
Green           0.941    0.028    0.001

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.985    0.009
Overall         0.985    0.009

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         0.950    0.399    0.161
Avg b = 1       0.962    0.402    0.162

Table 32

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 20   Targets 20   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Reliability of Sum
Estimate        Mean     SD       MSE
Alpha           0.951    0.020    0.000
Green           0.953    0.019    0.000

Weighting for Maximum Reliability
Estimate        Mean     SD
PC              0.986    0.009
Overall         0.986    0.009

True Score Variance
Estimate        Mean     SD       MSE
Avg Cov         1.009    0.333    0.110
Avg b = 1       1.011    0.333    0.110

Table 33

Expected Reliability of the Sum of Raters

                    Rater Bias (SD(A),SD(B),Mean(r),SD(r))
Number of Raters    .5.5.8.2    .5.5.6.4    .0.0.6.4
5                   .9404       .8236       .8422
10                  .9687       .8992       .9123
20                  .9839       .9463       .9540

Table 34

Estimates of Reliability of the Sum

                              Sample Size (judges-targets)
Estimate        Rater Bias    5-10    10-5    10-10   20-20
Alpha           .5.5.8.2      .881    .915    .943    .973
                .5.5.6.4      .766    .786    .862    .928
                .0.0.6.4      .788    .836    .900    .951
Green           .5.5.8.2      .931    .951    .968    .983
                .5.5.6.4      .825    .866    .893    .941
                .0.0.6.4      .820    .887    .913    .953

Table 35

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).
N:  Raters 5   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

True Scores

Estimate        Mean             SD              R               MSE1            MSE2
Actual          -0.014 (0.30)    0.963 (0.22)
Consensus       -0.015 (0.37)    1.018 (0.32)    0.964 (0.03)    0.151 (0.11)    0.143 (0.11)
Wconsensus      -0.028 (0.44)    1.014 (0.37)    0.970 (0.03)    0.231 (0.24)    0.134 (0.11)
Standardized    -0.000 (0.00)    0.904 (0.05)    0.972 (0.02)    0.163 (0.14)    0.133 (0.10)
Wstandardize    -0.000 (0.00)    0.955 (0.03)    0.972 (0.03)    0.166 (0.14)    0.132 (0.11)
Cronb-Kelley    -0.015 (0.37)    0.955 (0.33)    0.964 (0.03)    0.143 (0.11)    0.143 (0.11)
PC Scores       -0.000 (0.00)    1.000 (0.00)    0.973 (0.02)    0.175 (0.14)    0.131 (0.10)
ML Scores       -0.000 (0.00)    0.987 (0.01)    0.972 (0.03)    0.171 (0.15)    0.132 (0.11)

Note. Standard deviations are given in parentheses.

Table 36

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 5   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

True Scores

Estimate        Mean             SD              R               MSE1            MSE2
Actual          -0.005 (0.30)    0.993 (0.22)
Consensus       -0.014 (0.36)    1.078 (0.30)    0.899 (0.08)    0.301 (0.19)    0.265 (0.16)
Wconsensus       0.013 (0.44)    1.117 (0.35)    0.932 (0.07)    0.340 (0.29)    0.221 (0.15)
Standardized     0.000 (0.00)    0.803 (0.09)    0.928 (0.05)    0.247 (0.15)    0.229 (0.14)
Wstandardize     0.000 (0.00)    0.924 (0.05)    0.938 (0.07)    0.226 (0.16)    0.213 (0.15)
Cronb-Kelley    -0.014 (0.36)    0.901 (0.32)    0.899 (0.08)    0.263 (0.16)    0.265 (0.16)
PC Scores        0.000 (0.00)    1.000 (0.00)    0.934 (0.05)    0.242 (0.14)    0.217 (0.14)
ML Scores        0.000 (0.00)    0.973 (0.02)    0.937 (0.07)    0.230 (0.16)    0.214 (0.15)

Note. Standard deviations are given in parentheses.

Table 37

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 5   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

True Scores

Estimate        Mean             SD              R               MSE1            MSE2
Actual           0.035 (0.33)    0.952 (0.23)
Consensus        0.028 (0.36)    1.038 (0.25)    0.904 (0.08)    0.197 (0.14)    0.169 (0.10)
Wconsensus       0.000 (0.36)    1.087 (0.26)    0.928 (0.09)    0.180 (0.24)    0.129 (0.11)
Standardized     0.000 (0.00)    0.794 (0.09)    0.928 (0.06)    0.251 (0.18)    0.142 (0.09)
Wstandardize     0.000 (0.00)    0.922 (0.05)    0.931 (0.09)    0.242 (0.18)    0.125 (0.10)
Cronb-Kelley     0.028 (0.36)    0.868 (0.29)    0.904 (0.08)    0.167 (0.10)    0.169 (0.10)
PC Scores        0.000 (0.00)    1.000 (0.00)    0.924 (0.11)    0.270 (0.21)    0.132 (0.10)
ML Scores        0.000 (0.00)    0.972 (0.02)    0.929 (0.10)    0.254 (0.19)    0.126 (0.10)

Note. Standard deviations are given in parentheses.

Table 38

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 10   Targets 5   Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

True Scores

Estimate        Mean             SD              R               MSE1            MSE2
Actual           0.054 (0.46)    0.912 (0.33)
Consensus        0.046 (0.50)    0.904 (0.36)    0.975 (0.05)    0.069 (0.06)    0.070 (0.06)
Wconsensus       0.075 (0.63)    0.953 (0.43)    0.973 (0.05)    0.264 (0.28)    0.070 (0.06)
Standardized     0.000 (0.00)    0.867 (0.10)    0.979 (0.04)    0.288 (0.29)    0.067 (0.06)
Wstandardize     0.000 (0.00)    0.978 (0.04)    0.973 (0.05)    0.324 (0.30)    0.070 (0.06)
Cronb-Kelley     0.046 (0.50)    0.874 (0.37)    0.975 (0.05)    0.069 (0.06)    0.070 (0.06)
PC Scores        0.000 (0.00)    1.000 (0.00)    0.978 (0.05)    0.330 (0.31)    0.066 (0.06)
ML Scores        0.000 (0.00)    0.998 (0.00)    0.973 (0.05)    0.334 (0.31)    0.070 (0.06)

Note. Standard deviations are given in parentheses.
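The difference between MSE1 and MSE2 described in the table captions can be sketched as follows. This is a minimal illustration, not the thesis's program: the function names and the input values are hypothetical, and the n - 1 variance denominator is an assumption. MSE1 compares the raw true score estimates with the actual true scores; MSE2 first rescales the estimates linearly so that their mean equals the estimated true score mean and their variance equals the estimated true score variance times the estimated reliability.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Sample variance with an n - 1 denominator (an assumption, see lead-in).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def mse(estimates, actual):
    # Mean squared deviation of estimates from actual true scores.
    return mean([(e - a) ** 2 for e, a in zip(estimates, actual)])

def rescale(xs, target_mean, target_var):
    # Linear rescaling to a chosen mean and variance; correlations are unchanged.
    m, s = mean(xs), sqrt(variance(xs))
    t = sqrt(target_var)
    return [target_mean + t * (x - m) / s for x in xs]

def mse1_and_mse2(estimates, actual, est_true_mean, est_true_var, est_reliability):
    mse1 = mse(estimates, actual)
    rescaled = rescale(estimates, est_true_mean, est_true_var * est_reliability)
    mse2 = mse(rescaled, actual)
    return mse1, mse2

# Made-up example: the estimates are on the wrong scale (twice the actual scores).
actual = [-1.0, 0.0, 1.0, 2.0]
estimates = [-2.0, 0.0, 2.0, 4.0]
mse1, mse2 = mse1_and_mse2(
    estimates, actual,
    est_true_mean=0.5,        # hypothetical estimate of the true score mean
    est_true_var=5.0 / 3.0,   # hypothetical estimate of the true score variance
    est_reliability=1.0,      # hypothetical reliability, chosen for clean arithmetic
)
```

In this contrived case the estimates correlate perfectly with the actual scores but sit on the wrong scale, so MSE1 is large while MSE2 is essentially zero. This mirrors the pattern in the tables where methods with different MSE1 values converge once they are put on a common scale.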
Table 39

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 10   Targets 5   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

True Scores

Estimate        Mean             SD              R               MSE1            MSE2
Actual          -0.012 (0.43)    0.949 (0.35)
Consensus       -0.018 (0.49)    0.981 (0.37)    0.926 (0.15)    0.155 (0.13)    0.150 (0.12)
Wconsensus      -0.006 (0.61)    1.105 (0.48)    0.945 (0.10)    0.384 (0.68)    0.135 (0.11)
Standardized    -0.000 (0.00)    0.748 (0.15)    0.951 (0.12)    0.305 (0.28)    0.131 (0.11)
Wstandardize    -0.000 (0.00)    0.956 (0.07)    0.940 (0.14)    0.326 (0.30)    0.135 (0.11)
Cronb-Kelley    -0.018 (0.49)    0.885 (0.41)    0.926 (0.15)    0.150 (0.12)    0.150 (0.12)
PC Scores       -0.000 (0.00)    1.000 (0.00)    0.941 (0.16)    0.343 (0.32)    0.131 (0.11)
ML Scores       -0.000 (0.00)    0.996 (0.00)    0.928 (0.19)    0.350 (0.32)    0.136 (0.11)

Note. Standard deviations are given in parentheses.
Table 40

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 10   Targets 5   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

True Scores

Estimate        Mean             SD              R               MSE1            MSE2
Actual           0.017 (0.42)    0.943 (0.35)
Consensus        0.026 (0.45)    0.992 (0.37)    0.946 (0.08)    0.091 (0.08)    0.088 (0.07)
Wconsensus       0.014 (0.48)    1.176 (0.40)    0.926 (0.15)    0.246 (0.41)    0.088 (0.08)
Standardized     0.000 (0.00)    0.747 (0.15)    0.960 (0.05)    0.290 (0.32)    0.078 (0.06)
Wstandardize     0.000 (0.00)    0.960 (0.05)    0.927 (0.15)    0.323 (0.31)    0.087 (0.08)
Cronb-Kelley     0.026 (0.45)    0.911 (0.40)    0.946 (0.08)    0.087 (0.07)    0.088 (0.07)
PC Scores        0.000 (0.00)    1.000 (0.00)    0.938 (0.16)    0.334 (0.33)    0.080 (0.09)
ML Scores        0.000 (0.00)    0.996 (0.00)    0.912 (0.22)    0.354 (0.34)    0.092 (0.10)

Note. Standard deviations are given in parentheses.
Table 41

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 10   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

True Scores

Estimate        Mean             SD              R               MSE1            MSE2
Actual          -0.014 (0.31)    0.994 (0.22)
Consensus        0.012 (0.37)    1.000 (0.25)    0.985 (0.01)    0.072 (0.05)    0.072 (0.05)
Wconsensus       0.003 (0.41)    1.013 (0.33)    0.990 (0.01)    0.139 (0.13)    0.064 (0.05)
Standardized    -0.000 (0.00)    0.902 (0.05)    0.988 (0.01)    0.152 (0.14)    0.067 (0.05)
Wstandardize    -0.000 (0.00)    0.963 (0.02)    0.990 (0.01)    0.151 (0.14)    0.063 (0.05)
Cronb-Kelley     0.012 (0.37)    0.972 (0.26)    0.985 (0.01)    0.071 (0.05)    0.072 (0.05)
PC Scores       -0.000 (0.00)    1.000 (0.00)    0.989 (0.01)    0.158 (0.14)    0.066 (0.05)
ML Scores       -0.000 (0.00)    0.995 (0.00)    0.990 (0.01)    0.155 (0.14)    0.063 (0.05)

Note. Standard deviations are given in parentheses.
Table 42

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 10   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

True Scores

Estimate        Mean             SD              R               MSE1            MSE2
Actual           0.003 (0.32)    0.981 (0.22)
Consensus        0.004 (0.37)    1.037 (0.31)    0.939 (0.05)    0.174 (0.11)    0.161 (0.10)
Wconsensus       0.019 (0.44)    1.082 (0.37)    0.970 (0.03)    0.215 (0.25)    0.114 (0.09)
Standardized     0.000 (0.00)    0.773 (0.08)    0.960 (0.03)    0.221 (0.18)    0.131 (0.09)
Wstandardize     0.000 (0.00)    0.920 (0.07)    0.972 (0.03)    0.181 (0.17)    0.110 (0.08)
Cronb-Kelley     0.004 (0.37)    0.937 (0.31)    0.939 (0.05)    0.161 (0.10)    0.161 (0.10)
PC Scores        0.000 (0.00)    1.000 (0.00)    0.967 (0.03)    0.201 (0.17)    0.120 (0.08)
ML Scores        0.000 (0.00)    0.988 (0.01)    0.972 (0.04)    0.188 (0.17)    0.110 (0.08)

Note. Standard deviations are given in parentheses.
Table 43

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 10   Targets 10   Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

True Scores

Estimate        Mean             SD              R               MSE1            MSE2
Actual           0.028 (0.31)    0.986 (0.21)
Consensus        0.026 (0.34)    1.042 (0.21)    0.953 (0.04)    0.097 (0.05)    0.089 (0.05)
Wconsensus       0.033 (0.33)    1.097 (0.23)    0.972 (0.03)    0.091 (0.14)    0.061 (0.05)
Standardized     0.000 (0.00)    0.785 (0.08)    0.965 (0.02)    0.203 (0.18)    0.073 (0.04)
Wstandardize     0.000 (0.00)    0.930 (0.04)    0.974 (0.03)    0.174 (0.16)    0.059 (0.04)
Cronb-Kelley     0.026 (0.34)    0.960 (0.23)    0.953 (0.04)    0.089 (0.05)    0.089 (0.05)
PC Scores        0.000 (0.00)    1.000 (0.00)    0.972 (0.02)    0.184 (0.16)    0.063 (0.04)
ML Scores        0.000 (0.00)    0.990 (0.01)    0.974 (0.03)    0.177 (0.16)    0.059 (0.04)

Note. Standard deviations are given in parentheses.
Table 44

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 20    Targets 20    Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

True Score Estimates

Estimate        Mean             SD              R               MSE1            MSE2
Actual          -0.004 (0.21)    0.983 (0.15)
Consensus       -0.012 (0.25)    0.982 (0.16)    0.991 (0.01)    0.035 (0.02)    0.035 (0.02)
Wconsensus      -0.042 (0.31)    0.980 (0.22)    0.996 (0.00)    0.083 (0.09)    0.027 (0.02)
Standardized     0.000 (0.00)    0.892 (0.03)    0.993 (0.00)    0.078 (0.08)    0.032 (0.02)
Wstandardize     0.000 (0.00)    0.951 (0.02)    0.996 (0.00)    0.070 (0.07)    0.026 (0.02)
Cronb-Kelley    -0.012 (0.25)    0.966 (0.16)    0.991 (0.01)    0.034 (0.02)    0.035 (0.02)
PC Scores        0.000 (0.00)    1.000 (0.00)    0.994 (0.00)    0.077 (0.07)    0.031 (0.02)
ML Scores        0.000 (0.00)    0.997 (0.00)    0.996 (0.00)    0.072 (0.07)    0.026 (0.02)

Note. Standard deviations are given in parentheses.
Table 45

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 20    Targets 20    Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

True Score Estimates

Estimate        Mean             SD              R               MSE1            MSE2
Actual          -0.005 (0.24)    0.968 (0.17)
Consensus       -0.000 (0.27)    0.988 (0.20)    0.971 (0.02)    0.078 (0.04)    0.075 (0.04)
Wconsensus       0.018 (0.35)    1.004 (0.27)    0.990 (0.01)    0.107 (0.10)    0.044 (0.03)
Standardized     0.000 (0.00)    0.760 (0.06)    0.982 (0.01)    0.136 (0.10)    0.059 (0.03)
Wstandardize     0.000 (0.00)    0.913 (0.04)    0.992 (0.01)    0.094 (0.09)    0.042 (0.03)
Cronb-Kelley    -0.000 (0.27)    0.933 (0.20)    0.971 (0.02)    0.075 (0.04)    0.075 (0.04)
PC Scores        0.000 (0.00)    1.000 (0.00)    0.986 (0.01)    0.109 (0.09)    0.053 (0.03)
ML Scores        0.000 (0.00)    0.994 (0.00)    0.992 (0.01)    0.097 (0.09)    0.042 (0.03)

Note. Standard deviations are given in parentheses.

Table 46

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 20    Targets 20    Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

True Score Estimates

Estimate        Mean             SD              R               MSE1            MSE2
Actual          -0.005 (0.21)    0.986 (0.16)
Consensus       -0.001 (0.22)    1.015 (0.16)    0.975 (0.01)    0.050 (0.02)    0.048 (0.02)
Wconsensus      -0.003 (0.22)    1.020 (0.15)    0.992 (0.01)    0.018 (0.01)    0.019 (0.01)
Standardized     0.000 (0.00)    0.773 (0.05)    0.982 (0.01)    0.127 (0.10)    0.036 (0.01)
Wstandardize     0.000 (0.00)    0.919 (0.04)    0.992 (0.01)    0.081 (0.09)    0.018 (0.01)
Cronb-Kelley    -0.001 (0.22)    0.970 (0.17)    0.975 (0.01)    0.048 (0.02)    0.048 (0.02)
PC Scores       -0.000 (0.00)    1.000 (0.00)    0.985 (0.01)    0.095 (0.08)    0.030 (0.01)
ML Scores        0.000 (0.00)    0.995 (0.00)    0.992 (0.01)    0.082 (0.08)    0.018 (0.01)

Note. Standard deviations are given in parentheses.
Table 47

Mean Correlation Between True Score Estimates and Actual True Scores

                                        Sample Size (judges-targets)
Estimate                 Rater Bias     5-10     10-5     10-10    20-20
Consensus                .5.5.8.2       .964     .975     .985     .991
                         .5.5.6.4       .899     .926     .939     .971
                         .0.0.6.4       .904     .946     .953     .975
Weighted Consensus       .5.5.8.2       .970     .973     .990     .996
                         .5.5.6.4       .932     .945     .970     .990
                         .0.0.6.4       .928     .926     .972     .992
Standardized             .5.5.8.2       .972     .979     .988     .993
                         .5.5.6.4       .928     .951     .960     .982
                         .0.0.6.4       .928     .960     .965     .982
Weighted Standardized    .5.5.8.2       .972     .973     .990     .996
                         .5.5.6.4       .938     .940     .972     .992
                         .0.0.6.4       .931     .927     .974     .992
Cronbach-Kelley          .5.5.8.2       .964     .975     .985     .991
                         .5.5.6.4       .899     .926     .939     .971
                         .0.0.6.4       .904     .946     .953     .975
PC                       .5.5.8.2       .973     .978     .989     .994
                         .5.5.6.4       .934     .941     .967     .986
                         .0.0.6.4       .924     .938     .972     .985
ML                       .5.5.8.2       .972     .973     .990     .996
                         .5.5.6.4       .937     .928     .972     .992
                         .0.0.6.4       .929     .912     .974     .992

Table 48

Means for Absolute Scale True Score Estimates of Mean Square Deviations from Actual True Scores

                                        Sample Size (judges-targets)
Estimate                 Rater Bias     5-10     10-5     10-10    20-20
Consensus                .5.5.8.2       .151     .069     .072     .035
                         .5.5.6.4       .301     .155     .174     .078
                         .0.0.6.4       .197     .091     .097     .050
Weighted Consensus       .5.5.8.2       .231     .264     .139     .083
                         .5.5.6.4       .340     .384     .215     .107
                         .0.0.6.4       .180     .246     .091     .018
Cronbach-Kelley          .5.5.8.2       .143     .069     .071     .034
                         .5.5.6.4       .263     .150     .161     .075
                         .0.0.6.4       .167     .087     .089     .048

Table 49

Mean MSE of True Score Estimates Standardized to Estimated Optimal Scale from Actual True Scores

                                        Sample Size (judges-targets)
Estimate                 Rater Bias     5-10     10-5     10-10    20-20
Consensus (adjusted)     .5.5.8.2       .143     .070     .072     .035
                         .5.5.6.4       .265     .150     .161     .075
                         .0.0.6.4       .169     .088     .089     .048
Weighted Consensus       .5.5.8.2       .134     .070     .064     .027
                         .5.5.6.4       .221     .135     .114     .044
                         .0.0.6.4       .129     .088     .061     .019
Standardized             .5.5.8.2       .133     .067     .067     .032
                         .5.5.6.4       .229     .131     .131     .059
                         .0.0.6.4       .142     .078     .073     .036
Weighted Standardized    .5.5.8.2       .132     .070     .063     .026
                         .5.5.6.4       .213     .135     .110     .042
                         .0.0.6.4       .125     .087     .059     .018
Cronbach-Kelley          .5.5.8.2       .143     .070     .072     .035
                         .5.5.6.4       .265     .150     .161     .075
                         .0.0.6.4       .169     .088     .089     .048
PC                       .5.5.8.2       .131     .066     .066     .031
                         .5.5.6.4       .217     .131     .120     .053
                         .0.0.6.4       .132     .080     .063     .030
ML                       .5.5.8.2       .132     .070     .063     .026
                         .5.5.6.4       .214     .136     .110     .042
                         .0.0.6.4       .126     .092     .059     .018

Table 50

Mean Squared Deviations Averaged Over Rater Bias

                           Sample Size (judges-targets)
Estimate                   5-10     10-5     10-10    20-20
Consensus (unadjusted)     .216     .105     .114     .054
Cronbach-Kelley            .191     .102     .107     .052
Standardized               .110     .092     .090     .042
Weighted-standardized      .157     .097     .077     .029
ML                         .156     .099     .077     .029
PC                         .160     .092     .083     .038

Table 51

Mean Squared Deviations Averaged Over Sample Size

                           Rater Bias (SD(A),SD(B),Mean(r),SD(r))
Estimate                   .5.5.8.2    .5.5.6.4    .0.0.6.4
Consensus (unadjusted)     .082        .177        .109
Cronbach-Kelley            .079        .162        .098
Standardized               .075        .138        .082
Weighted Standardized      .073        .125        .072
ML                         .073        .126        .074
PC                         .074        .130        .076

Table 52

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N:  Raters 10    Targets 10    Replications 150

                Level     Scale      Reliability
Mean            0         1          .60
SD              .50       .50        .40
Distribution    Normal    Uniform    Uniform

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.593 (0.07)    0.218 (0.04)
Shen            0.538 (0.18)    0.267 (0.06)    0.694 (0.16)    0.065 (0.04)
Cronbach        0.602 (0.13)    0.293 (0.10)    0.686 (0.17)    0.061 (0.09)
PC              0.613 (0.11)    0.248 (0.06)    0.693 (0.16)    0.042 (0.02)
ML              0.582 (0.12)    0.270 (0.06)    0.714 (0.18)    0.045 (0.03)
Avg Fisher-Z    0.557 (0.15)    0.164 (0.05)    0.674 (0.16)    0.044 (0.03)
r with Sum      0.669 (0.13)    0.211 (0.08)    0.638 (0.17)    0.054 (0.03)
r with Z-Sum    0.688 (0.11)    0.216 (0.08)    0.664 (0.16)    0.053 (0.03)

Note. Standard deviations are given in parentheses.

Table 53

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N:  Raters 10    Targets 10    Replications 150

                Level     Scale      Reliability
Mean            0         1          .60
SD              .50       .50        .40
Distribution    Normal    Uniform    Uniform

Reliability of Sum

Estimate        Mean      SD        MSE
Alpha           0.836     0.098     0.013
Green           0.871     0.085     0.008

Weighting for Maximum Reliability

Estimate        Mean      SD
PC              0.943     0.042
Overall         0.943     0.042

True Score Variance

Estimate        Mean      SD        MSE
Avg Cov         0.844     0.533     0.307
Avg b = 1       0.874     0.543     0.309

Table 54

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N:  Raters 10    Targets 10    Replications 150

                Level     Scale      Reliability
Mean            0         1          .60
SD              .50       .50        .40
Distribution    Normal    Uniform    Uniform

True Score Estimates

Estimate        Mean             SD              R               MSE1            MSE2
Actual           0.014 (0.29)    0.915 (0.22)
Consensus        0.039 (0.35)    0.949 (0.27)    0.927 (0.06)    0.161 (0.10)    0.152 (0.09)
Wconsensus       0.040 (0.40)    0.998 (0.34)    0.962 (0.04)    0.193 (0.14)    0.111 (0.08)
Standardized     0.000 (0.00)    0.750 (0.09)    0.954 (0.04)    0.181 (0.14)    0.122 (0.08)
Wstandardize     0.000 (0.00)    0.912 (0.05)    0.966 (0.04)    0.162 (0.14)    0.106 (0.08)
Cronb-Kelley     0.039 (0.35)    0.841 (0.29)    0.927 (0.06)    0.152 (0.09)    0.152 (0.09)
PC Scores        0.000 (0.00)    1.000 (0.00)    0.963 (0.03)    0.187 (0.14)    0.111 (0.08)
ML Scores       -0.000 (0.00)    0.986 (0.01)    0.966 (0.04)    0.177 (0.14)    0.106 (0.08)

Note. Standard deviations are given in parentheses.

IV. DISCUSSION

In general, the results of this study support the opinion that Burt's (1936) method gives modest gains in reliability and validity over the simple consensus. The results are useful in that they may suggest conditions under which maximum benefit is to be expected. Some light was also shed on issues which are tied up with Burt's procedure.

Some clear patterns emerged, for example, from the comparison of methods of estimating individual judges' reliabilities. The mean reliability estimates for the r with Sum and r with Z-Sum methods were consistently inflated, even though the judge being estimated was excluded from the sum. The Cronbach method also gave inflated estimates, but the bias appears to decrease when the number of judges and/or targets increases. Whether or not one is concerned with the accuracy of the mean depends on the purpose to which the estimate is being put. If one is interested in the relative standing of a set of judges, then the mean is irrelevant. But it may be relevant when the estimate is used as part of a multiplicative factor, as in the case of Burt's weights.

The standard deviation of the reliability estimates of a given method relates to the stability of the mean of the estimates as well as to the squared deviation from the actual reliabilities. The average Fisher-z, r with Sum and r with Z-Sum methods had standard deviations less than the standard deviations of the actual reliabilities. The other methods had standard deviations slightly larger than the actual, except the Cronbach method, which had considerably larger standard deviations under the small sample conditions. This results in unstable mean estimates. Thus the data support Cronbach et al.'s (1963) observation that the estimate does not perform well under small sample conditions.

In terms of the relative standing of rater reliabilities, the most important index is the correlation between estimated and actual. All methods tested gave quite similar correlations. The highest correlations were generally given by the maximum likelihood factor analysis method, although the Cronbach method may be superior in the absence of rater scale differences. Thus it may be of interest to try the Cronbach method with standardized ratings. This would involve estimating a judge's reliability by the square of the correlation (as opposed to covariance) between that judge's ratings and all other judges' ratings, divided by the average correlation between all pairs of judges' ratings.

The size of the correlation between estimates and actual was strongly affected by the number of judges and the number of targets present, and less so by the types of rater bias, with the exception of the Cronbach method.

The least overall mean square deviations from actual were given by the PC and Avg Fisher-z estimates, followed by the ML method. Thus in terms of least squares estimation, which is a function of mean, variance and correlation, the best methods of estimation appear to be the principal components and average Fisher-z methods. The method which performed the worst in terms of mean square was the Cronbach method, although it seems to improve with increased sample size. Hence, if absolute as opposed to relative agreement between estimate and actual is desired, the PC method is recommended and the Cronbach method should be avoided unless the sample size is large.

One of the most interesting results of this study concerned the two estimates of the reliability of the sum of judges (which is equivalent to the reliability of the mean of judges), which is an estimate of the reliability of the consensus method of estimating true scores.
Surprisingly, the method described as Green's method, which used squared maximum likelihood factor loadings as estimates of the individual judges' reliabilities, performed much better than Cronbach's alpha under all conditions. Note that alpha is an exact estimate when conditions (judges) are equivalent, i.e., when they have equal means, variances and reliabilities (Cronbach et al., 1972). Judges were not equivalent in any of the conditions tested in this study.

The Green estimate was only slightly lower than the actual reliability of the sum for practically all conditions. The only condition where it seemed to slip a little was the 10-judges 5-targets condition. It has been observed that the maximum likelihood estimates of individual rater reliabilities are poorer under this condition, which may explain why the Green estimate did not perform as well under this condition as under the other conditions. It still performed much better than alpha, however. A better estimate of the reliability of the sum under conditions like the 10-judges 5-targets condition may be obtained by estimating the judges' individual reliabilities by the average Fisher-z or by the principal components method.

The performance of alpha improved considerably under the 20-judges 20-targets condition. Hence alpha may be considered an adequate estimate under large sample sizes. The apparent improvement in the alpha estimate as the number of judges increases may be due to a ceiling effect on reliability. The performance of alpha appears to be poorest under the .5.5.6.4 condition. Thus it may be that the performance of alpha is directly related to the equivalence of judges with respect to their means, variances, or reliabilities. Hence alpha should not be used under small sample sizes, or if there is reason to believe that the judges differ or that the average judge's reliability is low.

The results relating to weighting for maximum reliability verify the validity of Overall's formula for the weights, since the two estimates agree very closely. It is not surprising that Overall's formula works, because the basic assumption involved was the uni-factor assumption, and the simulated data in the study satisfied this assumption. Weighting for maximum reliability (which is just Burt's weighting scheme) generally resulted in higher reliabilities than the unweighted sum (which is just the usual consensus). Weighting produced substantial improvement in even the large sample condition. The only condition for which weighting did not increase reliability was the 10-judges 5-targets condition, where the weighted composite actually performed worse than the unweighted sum. The results from this condition probably attenuated the differences indicated by averaging over sample sizes (see Figure 4). The poor performance of the weights under the 10-5 condition can probably be explained by the fact that the weights involved the squared maximum likelihood factor loadings as estimates of the individual judges' reliabilities. But, as pointed out previously, the ML method did not provide very good estimates of rater reliability for the 10-5 condition. None of the methods of estimating rater reliability performed very well under this condition. Thus it is not advisable to use reliability weights under conditions of small numbers of targets (5 or less). The sampling error involved is too large, and consequently the estimated weights perform worse than unit weights. Hence unit weights should be used when there is a risk of large standard errors for the estimates, such as when there are five or fewer targets. The number of judges does not seem to matter. It should be observed, however, that it is just as unadvisable not to use weights if there are 10 or more targets.

The results concerning estimates of true score variance were disappointing. Neither of the two estimates seemed to work very well, in that the standard deviations of the estimates seemed excessive. The standard deviation of the sample variance is given by $(2/(n-1))^{1/2}$ times the population variance (Hoel, 1971), where n is the number of sampled targets. Substituting the values n = 5, 10, 20 indicates that the very large standard deviations of the estimates of true score variance are not attributable to the standard deviation of the sampled true score variance. It would be desirable, in the interest of expressing true score estimates in an absolute scale, to obtain a better method of estimating the true score variance. Perhaps the factor analytic techniques can be used in this respect.

Concerning the main topic of interest, true score estimates, the results of the Cronbach-Kelley estimates for mean and correlation with actual are identical to those of the consensus. This is expected from the definition of the Cronbach-Kelley estimates: it is merely a linear transformation of the consensus. The standard deviations of the Cronbach-Kelley estimates, however, were much smaller than those of the consensus, since scores are regressed towards the mean. This has the effect of reducing the mean squared deviation from actual true scores.

With respect to the correlation between estimates and actual, the first thing to be noted is that all methods gave high correlations under all conditions (.899 and above). This was somewhat surprising. It is probably true that rating data could be 'messier' in some real life situations. For example, the data used in this study satisfied the uni-factor assumption. Hence all raters gave valid ratings (notwithstanding the influences of scale differences and biases). The poorest condition had a mean reliability of .6 and a standard deviation of .22 (since out-of-range reliabilities were truncated). It may be even worse than this in some real life situations. It has been observed by many writers that some judges have been known to actually have negative reliabilities (Burt, 1936; Hunter, 1965; Cronbach et al., 1972). This would amount (in a scaling sense) to having a negative scaling factor (or a negative biasing factor).

As a follow-up to this study, it would be interesting to investigate the possible forms of the distributions of the scaling factors. The normal and uniform distributions with mean one used in this study seemed to be somewhat artificial. Since the scale is arbitrary, symmetry was felt to require a geometric distribution, such that the number of judges multiplying the true scale by one half would balance the number of judges multiplying the scale by two. It was not intuitively clear how such a distribution could be defined which also satisfied the requirement that the expected value of the scaling factor be one (Cronbach et al., 1963). Perhaps this requirement is unnecessary. This would require re-examining the question of the absolute true score scale.

Although all methods produced acceptable correlations between estimated and actual true scores, the maximum likelihood and weighted standardized (i.e., Burt's) methods gave consistently higher correlations, except, again, in the condition of 10-judges and 5-targets, where simply standardizing produced the highest correlations. This last result reflects the previously discussed result concerning the estimation of judges' reliabilities in the 10-5 condition.

With respect to absolute as opposed to relative estimates, the most interesting measure is the mean square deviation between estimates and actual scores. The consensus and Cronbach-Kelley estimates are expressed in absolute scale and hence can be deviated from true scores directly. In addition, all estimates (including the consensus and Cronbach-Kelley) were converted to a scale defined by a mean equal to the overall sample mean of the ratings (this was the same for all estimates) and a variance equal to the product of the estimated true score variance (which was the same for all methods) and the estimated reliability of the particular true score estimate (this varied between methods).

The mean square deviation for the simple consensus was consistently the worst of all the methods, followed closely by the Cronbach-Kelley method. The maximum likelihood and weighted standardized (Burt's) methods generally performed much better than the others, except, again, in the 10-judges 5-targets condition, in which case the principal components and unweighted standardized methods performed slightly better than the others.

One interesting result is that the mean square deviations for the Cronbach-Kelley estimate remained virtually unchanged after the change of scale, which indicates that it was already expressed in the appropriate scale.
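The rescaling and deviation measure described above can be sketched as follows. This is a minimal illustration in Python, not the program used in the study: the function names, the use of the population (rather than sample) standard deviation, and the example numbers are all assumptions made for the sketch.

```python
from statistics import fmean, pstdev


def rescale(estimates, target_mean, target_variance):
    """Linearly transform estimates to a given mean and variance.

    Assumption: the population SD (pstdev) is used; the source does not
    say which variance formula was applied.
    """
    mean = fmean(estimates)
    sd = pstdev(estimates)
    if sd == 0:
        raise ValueError("estimates have zero variance and cannot be rescaled")
    factor = target_variance ** 0.5 / sd
    return [target_mean + factor * (x - mean) for x in estimates]


def mean_square_deviation(estimates, actual):
    """Mean squared difference between estimated and actual true scores."""
    if len(estimates) != len(actual):
        raise ValueError("estimates and actual must have the same length")
    return fmean((e - a) ** 2 for e, a in zip(estimates, actual))


# Hypothetical example: standardized estimates for four targets.
estimates = [-1.2, -0.3, 0.4, 1.1]
actual = [2.1, 2.9, 3.4, 4.2]
grand_mean = 3.1            # overall sample mean of the ratings (made up)
true_score_variance = 0.9   # estimated true score variance (made up)
reliability = 0.95          # estimated reliability of this estimate (made up)

rescaled = rescale(estimates, grand_mean, true_score_variance * reliability)
print(mean_square_deviation(rescaled, actual))
```

In the terms of the tables, MSE1 would correspond to `mean_square_deviation` applied to the estimates as they stand, and MSE2 to the same measure applied after `rescale`.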
It the  should  estimates  estimated out,  Hence  scores  true  conclusion  w h i c h a s has been  score  variance  Burt's  obtained  to estimate  squared  variance greater  a s t h e maximum  maximum  individual  method o f e s t i m a t i n g  true  s c o r e has  likelihood  factor  by Thomson's r e g r e s s i o n method, a n d when  a 30 t o 50% r e d u c t i o n  from a c t u a l t r u e Cronbach-Kelley  which case  score  show an even  methods a r e r e s c a l e d t o an a b s o l u t e  generalization  pointed  with, much  of t r u e  i s that, using  loadings  t h e same a c c u r a c y  about  t o which  t h e c o n s e n s u s method.  factor  reliabilities, about  the v a r i a n c e  method would p r o b a b l y  over  final  likelihood  these  variance,  i f a better estimate  found, Burt's  The  in  score  does n o t e s t i m a t e  improvement  that  were s c a l e d was i n p a r t d e f i n e d by t h e  true  accuracy. is  a l s o be o b s e r v e d  scores  over  estimates.  i s less  result  i n t h e mean s q u a r e d e v i a t i o n  the simple  c o n s e n s u s and  The e x c e p t i o n  i s when t h e r e a r e fewer  there  s c a l e , they  discrepancy  to this  than  between  five  targets, in  the estimates  1 14 and  the unweighted  accurate  than  With  standardized estimates  a r e s l i g h t l y more  the others.  5 t a r g e t s or l e s s ,  one would n o t be a d v i s e d  a weighted consensus;  the unweighted  standardized  optimal.  cases,  improve b o t h t h e  In a l l o t h e r  reliability  and the v a l i d i t y of t h e e s t i m a t e s ,  enormously. 
But i f the data anyway, i t c a n ' t If  hurt  of r a t e r  work b e t t e r t h a n 10-targets  reliabilities,  possible  t h e ML It  in obtaining  another  consideration i s  rater  estimates  might  10-targets  reliability  seems t o  method works a s  methods under t h e o t h e r  I t was n o t t e s t e d i n t h i s  i n the 5-judges  Fischer-Z  by a computer  methods under t h e 5 - j u d g e s  the other  that Burt's  a l t h o u g h not  estimates.  c o n d i t i o n . The maximum l i k e l i h o o d  conditions.  scores are  The a v e r a g e F i s h e r - z method  the other  or b e t t e r than  better  scored  to obtain better  number o f j u d g e s .  well  are being  one i s s p e c i f i c a l l y i n t e r e s t e d  estimates the  weights  t o use  study,  but i t  have p e r f o r m e d  is even  c o n d i t i o n had t h e a v e r a g e  estimates  been u s e d  i n s t e a d of  estimates. was m e n t i o n e d  make s e n s e t o combine  i n the i n t r o d u c t i o n that  s c o r e s which a r e expressed i n  different  scales.  variance.  But i t was a l s o a r g u e d  measurement  Hence,  i s given  i t does n o t  scores are standardized  t o a common  that the u n i t of  by t h e r e g r e s s i o n c o e f f i c i e n t  on t r u e  s c o r e s . Thus t o e q u a l i z e u n i t s o f measurement, s c o r e s be  d i v i d e d by t h e i r  rather  than  their  true score  standard  regression  should  coefficients  d e v i a t i o n s . But B u r t ' s  weights  115 come out  the  same i n e i t h e r  standardizing ratings  by  to unit  any  case  variance,  constant  will  since and  not  they  involve  hence d i v i d i n g  a f f e c t the  the  resulting  weights. For  future  investigations,  i t would be  interesting  explore  other  possible  and  rater  r e l i a b i l i t y c o e f f i c i e n t s . 
It would also be interesting and more realistic to investigate factorially more complex true score models. That is, the attributes judged may be factorially complex, and judges may stress different factors in their ratings. The uni-factor model, however, is probably a reasonable approximation, at least for most rating situations.

Other possible developments of the present study obviously include some empirical tests. In particular, it would be interesting to obtain rating data in a situation which permits exact knowledge of the true scores. One example might be to have subjects rate the temperatures of a number of different rooms. The same sorts of comparisons could be performed as in the present study. One difference, however, is that room temperature is not defined by consensual agreement. The only way that one could get an 'exact' measure of a consensually defined concept would be to obtain ratings from an infinitely large number of judges.

Since the data in this study were generated in accordance with a specific model, it is evident that the results have only hypothetical significance. This is one of the hazards of all simulation studies. Clearly it would be desirable to test the effects of various violations of the assumptions. The simplicity of the model does not necessarily detract from its significance. Although real life data would doubtless violate the model requirements in any number of ways, it is likewise the case that any two rating situations will possess different characteristics. And yet the general aim is to develop a procedure which applies to a class of rating situations. The elements of that class should, however, be similar in the essential properties represented in the model.

The present study was concerned only with estimating optimal true score estimates for a particular set of ratings. Obtaining information about the judges which could be used in future rating studies is another possibility. Cronbach et al. (1963) considered using information on judges extracted from the ratings in a similar context. It is conceivable that estimates of the scale and reliability parameters associated with a given judge might be used to predict and optimize future ratings by that judge. Such a procedure would require stability over time of the judges' characteristics, and some kind of cross-validation of the estimates would be required. This proposal is similar to Goldberg's (1970) 'bootstrapping' approach to clinical prediction (Wiggins, 1981), where judges' ratings are regressed on predictor variables and the resulting regression weights are used to predict future ratings by a given judge. Goldberg (1970), Wiggins and Kohen (1971), and Dawes and Corrigan (1974) found that the regression equations outperformed the judges themselves in nearly every case.
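The regression step behind the bootstrapping idea can be sketched briefly: regress a judge's past ratings on a predictor, then let the fitted equation stand in for the judge on new cases. The sketch below is a minimal illustration with a single predictor and ordinary least squares; the data values and the function names are hypothetical and are not taken from any of the studies cited.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys on a single predictor xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def model_of_judge(slope, intercept):
    """Return a function that predicts the judge's rating from the predictor."""
    return lambda x: intercept + slope * x


# Hypothetical past data: one predictor value and one rating per target.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
judge_ratings = [2.9, 5.2, 6.8, 9.1, 11.0]

slope, intercept = fit_line(predictor, judge_ratings)
predict = model_of_judge(slope, intercept)
# The fitted equation replaces the judge for two new (hypothetical) cases.
predicted_new = [predict(x) for x in (6.0, 7.0)]
```

With several predictors the same idea applies, with multiple regression weights in place of the single slope.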
The model under consideration in this study represents a regression of ratings on hypothetical true scores. But the regression estimates for ratings on true scores can easily be converted to regression estimates of true scores on ratings. Hence, using regression estimates from past data for a particular judge, one can linearly predict true scores from the judge's future observed ratings. This procedure, however, assumes a certain amount of stability in a judge's mean, variance and reliability which is not necessary for the present study.

As a final general comment, it should be reiterated that the availability of computers makes feasible many statistical techniques, such as those proposed by Burt (1936) and Kelley (1947), which are not currently in use primarily because they involve a great deal of computational labor. With computers doing the work, these statistical techniques can be exploited to increase the reliability and validity of measurements in areas where improvement, no matter how small, is much needed.

REFERENCES

Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, California: Brooks/Cole.
Bartko, J.J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports, 19, 3-11.
Berg, I., & Adams, H. (1962). The experimental basis of personality assessment. In A.J. Bachrach (Ed.), Experimental foundations of clinical psychology. New York: Basic Books.
Bernardin, H.J. (1978). The effects of rater training on leniency and halo errors in student ratings of instructors. Journal of Applied Psychology, 63, 301-308.
Bieri, J., Atkins, L., Briar, S., Leaman, R., Miller, H., & Tripodi, T. (1966). Clinical and social judgment: The discrimination of behavioral information. New York: Wiley.
Brief, A.P. (1980). Peer assessment revisited: A brief comment on Kane and Lawler. Psychological Bulletin, 88, 78-79.
Borman, W.C. (1978). Exploring upper limits of reliability and validity in job performance rating. Journal of Applied Psychology, 63, 135-144.
Borman, W.C. (1979). Format and training effects on rating accuracy and rater errors. Journal of Applied Psychology, 64, 410-421.
Brown, E.M. (1968). Influence of training method and relationship on the halo effect. Journal of Applied Psychology, 52, 195-199.
Burdock, E.I., Fleiss, J.L., & Hardesty, A.S. (1963). A new view of inter-observer agreement. Personnel Psychology, 16, 373-384.
Burisch, M. (1978). Construction strategies for multiscale personality inventories. Applied Psychological Measurement, 2, 97-111.
Burt, C. (1936). The analysis of examination marks. In P. Hartog and E.C. Rhodes (Eds.), The marks of examiners. London: Macmillan.
Cook, M. (1979). Perceiving others: The psychology of interpersonal perception. London: Methuen.
Cronbach, L.J. (1970). Essentials of psychological testing. New York: Harper & Row.
Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
Cronbach, L.J., Rajaratnam, N., & Gleser, G.C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137-163.
Crow, W.J. (1957). The effect of training upon accuracy and variability in interpersonal perception. Journal of Abnormal and Social Psychology, 55, 355-359.
Cureton, E.E. (1931). Errors of measurement and correlation. Archives of Psychology, 125, 1-63.
Cureton, E.E. (1958). The definition and estimation of test reliability. Educational and Psychological Measurement, 18, 715-738.
Dawes, R.M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95-106.
Diaconis, P., & Efron, B. (1983). Computer intensive methods in statistics. Scientific American, 248, 116-130.
Ebel, R.L. (1951). Estimation of the reliability of ratings. Psychometrika, 16, 407-424.
Einhorn, H.J., & Hogarth, R.M. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13, 171-192.
Fiske, D.W. (1978). Strategies for Personality Research: The Observation Versus Interpretation of Behavior. San Francisco: Jossey-Bass.
Fiske, D.W., & Cox, J.A. (1960). The consistency of ratings by peers. Journal of Applied Psychology, 44, 11-17.
Fleiss, J.L. (1970). Estimating the reliability of interview data. Psychometrika, 35, 143-162.
Fleiss, J.L., Spitzer, R.L., & Burdock, E.I. (1965). Estimating accuracy of judgment using recorded interviews. Archives of General Psychiatry, 12, 562-567.
Flemenbaum, A., & Zimmerman, N. (1973). Inter- and intra-rater reliability of the brief psychiatric rating scale. Psychological Reports, 36, 783-792.
Goldberg, L.R. (1970). Man versus model of man: A rationale plus evidence for a method of improving on clinical inferences. Psychological Bulletin, 73, 422-432.
Gorsuch, R.L. (1974). Factor analysis. Toronto: W.B. Saunders.
Green, B.F. (1950). A note on the calculation of weights for maximum battery reliability. Psychometrika, 15, 57-61.
Grozz, H.J., & Grossman, K.G. (1968). Clinicians' response style: A source of variation and bias in clinical judgments. Journal of Abnormal Psychology, 73, 207-214.
Guilford, J.P. (1954). Psychometric methods. New York: McGraw-Hill.
Haggard, E.A. (1958). Intraclass correlation and the analysis of variance. New York: Dryden.
Hoel, P.G. (1971). Introduction to mathematical statistics. New York: Wiley.
Horowitz, L.M., Inouye, D., & Siegelman, E.Y. (1979). On averaging judges' ratings to increase their correlation with an external criterion. Journal of Consulting and Clinical Psychology, 47, 453-455.
Hoyt, C. (1941). Test reliability estimated by analysis of variance. Psychometrika, 6, 153-160.
Hunter, J.E. (1968). Probabilistic foundations for coefficients of generalizability. Psychometrika, 33, 1-18.
Jackson, D.N. (1974). Personality research form manual. Goshen, N.Y.: Research Psychologists Press.
Jackson, R.W.B. (1939). Reliability of mental tests. British Journal of Psychology, 29, 267-287.
Jackson, R.W.B., & Ferguson, G.A. (1941). Studies on the reliability of tests.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. Cambridge, Mass.: Cambridge University Press.
Kane, J.S., & Lawler, E.E. (1978). Methods of peer assessment. Psychological Bulletin, 85, 555-586.
Kelley, T.L. (1924). Note on the reliability of a test: A reply to Dr. Crum's criticism. Journal of Educational Psychology, 15, 193-204.
Kelley, T.L. (1927). Interpretation of educational measurements. New York: World Book.
Kelley, T.L. (1947). Fundamentals of statistics. Cambridge, Mass.: Cambridge University Press.
Knight, R.A., & Blaney, P.H. (1977). The interrater reliability of the Psychotic Inpatient Profile. Journal of Clinical Psychology, 33, 647-653.
Kusyszyn, I. (1968). A comparison of judgmental methods with endorsements in the assessment of personality traits. Journal of Applied Psychology, 52, 227-233.
Landy, F.J., & Farr, J.L. (1976). Police performance appraisal. JSAS Catalog of Selected Documents in Psychology, 6, 83.
Landy, F.J., & Farr, J.L. (1980). Performance ratings. Psychological Bulletin, 87, 72-107.
Landy, F.J., & Trumbo, D.A. (1980). The Psychology of work behavior. Homewood, Ill.: Dorsey Press.
Latham, G.P., Wexley, K.N., & Pursell, E.D. (1975). Training managers to minimize rating errors in the observation of behavior. Journal of Applied Psychology, 60, 550-555.
Lawshe, C.H., & Nagle, B.F. (1952). A note on the combination of ratings on the basis of reliability. Psychological Bulletin, 49, 270-273.
Lehman, H.E., Ban, T.A., & Verdin, M.D. (1965). Rating the rater. Archives of General Psychiatry, 13, 67-75.
Love, K.G. (1981). Comparison of peer assessment methods: Reliability, validity, friendship bias and user reaction. Journal of Applied Psychology, 66, 451-457.
Maxwell, A.E., & Pilliner, A.E.G. (1968). Deriving coefficients of reliability and agreement for ratings. British Journal of Mathematical and Statistical Psychology, 21, 105-116.
Morrison, D.J. (1976). Multivariate statistical methods. Toronto: McGraw-Hill.
Mosier, C.I. (1943). On the reliability of weighted composites. Psychometrika, 8, 161-168.
Nunnally, J.C. (1978). Psychometric theory. Toronto: McGraw-Hill.
Overall, J.E. (1965). Reliability of composite ratings. Educational and Psychological Measurement, 25, 1011-1022.
Overall, J.E. (1968). Estimating individual rater reliabilities from analysis of treatment effects. Educational and Psychological Measurement, 28, 255-264.
Paulhus, D. (1981). Assessment of inter-rater reliability. Unpublished manuscript. University of British Columbia.
Peel, E.A. (1947a). Prediction of a complex criterion and battery reliability. British Journal of Statistical Psychology, 1, 84-94.
Pursell, E.D., Dossett, D.L., & Latham, G.P. (1980). Obtaining valid predictors by minimizing rating errors in the criterion. Personnel Psychology, 33, 91-96.
Shen, E. (1925). The reliability coefficient of personal ratings. Journal of Educational Psychology, 16, 232-236.
Shrout, P.E., & Fleiss, J.L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
Smith, J.M. (1974). A new rater selection technique for use with behavioral rating scales. Journal of Clinical Psychology, 30, 40-43.
Spool, M.D. (1978). Training programs for observers of behavior: A review. Personnel Psychology, 31, 853-888.
Strahen, R.F. (1980). More on averaging judges' ratings: Determining the most reliable composite. Journal of Consulting and Clinical Psychology, 48, 587-589.
Thomson, G.H. (1940). Weighting for battery reliability and prediction. British Journal of Psychology, 30, 357-366.
Thomson, G.H. (1947). The maximum correlation of two weighted batteries: Hotelling's 'most predictable criterion'. British Journal of Statistical Psychology, 1, 84-94.
Thomson, G.H. (1951). The factorial analysis of human ability. London: University of London Press.
Tinsley, H.E., & Weiss, D.J. (1975). Interrater reliability and agreement of subjective judgments. Journal of Counselling Psychology, 22, 358-376.
Wainer, H. (1976). Estimating coefficients in linear models: It don't make no never mind. Psychological Bulletin, 83, 213-217.
Wiggins, J.S. (1973).
Personality and prediction: Principles of personality assessment. Reading, Mass.: Addison-Wesley.
Wiggins, J.S. (1981). Clinical and statistical prediction: Where are we and where do we go from here? Clinical Psychology Review, 1, 3-18.
Wiggins, N., & Kohen, E. (1971). Man versus model of man revisited: The forecasting of graduate school success. Journal of Personality and Social Psychology, 19, 100-106.

APPENDIX

      INTEGER DISTRB(3),CONVRG
      REAL LEVSD,SCALSD,RELBAR,RELSD,X(30,2),REL(30,8),R(30,30),
     > GLOB(8,4,2)/64*0./,TMINX(30,3),TMINZ(30,3),GLOB2(2,3)
     > /6*0./,GLOB3(2,2)/4*0./,GLOB4(2,3)/6*0./,GLOB5(8,5,2)
     > /80*0./,TRUE(50,8),B(30),SIGMAY(2),XDAT(30,50),ZDAT(30,
     > ERRVAR(30),TRUREL(8),UNITWT(30)/30*1./
      READ(5,1) NREPS,SEED
    1 FORMAT(I5,F5.2)
      Z=RANDN(SEED)
      Z=RAND(Z)
    2 CONTINUE
      READ(5,3,END=30) NRAT,NTAR,LEVSD,SCALSD,RELBAR,RELSD,
     > (DISTRB(I),I=1,3),TRUALF
    3 FORMAT(2I5,4F5.2,3I5,F5.4)
      CALL INIT(GLOB,GLOB2,GLOB3,GLOB4,GLOB5)
      REWIND 7
      DO 20 I=1,NREPS
      CALL DATA(NTAR,NRAT,LEVSD,SCALSD,RELBAR,RELSD,X,REL,R,
     > TMINX,TMINZ,TRUE,B,VARTOT,SUMTOT,XDAT,ZDAT,ERRV
     > DISTRB)
      CALL KELLEY(NRAT,REL,R,NTAR)
      CALL CRONB(NTAR,NRAT,REL,TMINX,X,VARTOT,SIGMAY)
      CALL OVRALL(NRAT,R,REL,NTAR,TRUE,ZDAT,TRUREL,UNITWT)
      CALL MAXLIK(NRAT,R,REL,TRUE,NTAR,ZDAT,CONVRG,TRUREL,UNITW
      IF (CONVRG.EQ.0) GOTO 5
      I=I-1
      GOTO 20
    5 CONTINUE
      CALL FISHER(NRAT,R,REL)
      DO 10 J=1,NRAT
      REL(J,6)=TMINX(J,3)/SQRT(TMINX(J,2)*X(J,2))
      REL(J,7)=TMINZ(J,3)/SQRT(TMINZ(J,2)*FLOAT(NTAR-1))
   10 CONTINUE
      CALL COMPAR(NRAT,REL,GLOB)
      CALL ROFSUM(NRAT,X,REL,B,NTAR,VARTOT,SUMTOT,SIGMAY,GLOB2,
     > TRUE,ERRVAR,TRUREL,UNITWT,R,TRUALF)
      CALL MAXWAT(NRAT,REL,B,GLOB3,X,R,TRUE,XDAT,ZDAT,NTAR,ERRV
     > TRUREL,UNITWT)
      CALL TRUVAR(NRAT,SIGMAY,REL,X,GLOB4,NTAR,VARTOT)
      CALL GETRUE(NTAR,TRUE,SUMTOT,SIGMAY,GLOB5,NRAT,TRUREL)
   20 CONTINUE
      CALL GLOBAL(GLOB,GLOB2,GLOB3,GLOB4,GLOB5,NREPS)
      CALL OUTPUT(GLOB,GLOB2,GLOB3,GLOB4,GLOB5,NRAT,RELBAR,NTAR,
     > LEVSD,SCALSD,RELSD,DISTRB(1),DISTRB(2),DISTRB(3),
     > NREPS)
      GOTO 2
   30 CONTINUE
      STOP
      END
************************************************************
      SUBROUTINE DATA(NTAR,NRAT,LEVSD,SCALSD,RELBAR,RELSD,X,REL,R,
     > TMINX,TMINZ,TRUE,B,VARTOT,SUMTOT,XDAT,ZDAT,ERRVAR,
     > DISTRB)
      INTEGER DISTRB(3)
      REAL X(30,2),XDAT(30,50),REL(30,8),TRUE(50,8),B(30),ERRVAR(3
     > R(30,30),ZDAT(30,50),TMINX(30,3),TMINZ(30,3),LEVSD
      DO 10 I=1,NTAR
      TRUE(I,8)=FRANDN(0.)
      TRUE(I,1)=0.
      TRUE(I,3)=0.
   10 CONTINUE
      RLOW=2.*RELBAR-1.
      DO 70 I=1,NRAT
      A=FRANDN(0.)*LEVSD
   20 CONTINUE
      IF (DISTRB(2).EQ.1) B(I)=FRANDN(0.)*SCALSD+1.
      IF (DISTRB(2).EQ.2) B(I)=FRAND(0.)*1.5242+.2379
      IF (B(I).LE.0..OR.B(I).GE.2.) GOTO 20
   30 CONTINUE
      IF (DISTRB(3).EQ.1) REL(I,8)=FRANDN(0.)*RELSD+RELBAR
      IF (DISTRB(3).EQ.2) REL(I,8)=FRAND(0.)*.7621+.21895
      IF (REL(I,8).LE.RLOW.OR.REL(I,8).GE.1.) GOTO 30
      ERRVAR(I)=B(I)**2/REL(I,8)-B(I)**2
      X(I,1)=0.
      X(I,2)=0.
      DO 40 J=1,NTAR
      XDAT(I,J)=A+B(I)*TRUE(J,8)+SQRT(ERRVAR(I))*FRANDN(0.)
      X(I,1)=X(I,1)+XDAT(I,J)
      X(I,2)=X(I,2)+XDAT(I,J)**2
      TRUE(J,1)=TRUE(J,1)+XDAT(I,J)
   40 CONTINUE
      X(I,2)=X(I,2)-X(I,1)**2/FLOAT(NTAR)
      R(I,I)=1.
      IMIN1=I-1
      DO 50 J=1,IMIN1
      R(I,J)=0.
   50 CONTINUE
      XBAR=X(I,1)/FLOAT(NTAR)
      XSD=SQRT(X(I,2)/FLOAT(NTAR-1))
      DO 60 J=1,NTAR
      ZDAT(I,J)=(XDAT(I,J)-XBAR)/XSD
      TRUE(J,3)=TRUE(J,3)+ZDAT(I,J)
      DO 60 K=1,IMIN1
      R(I,K)=R(I,K)+ZDAT(I,J)*ZDAT(K,J)
   60 CONTINUE
      DO 70 J=1,IMIN1
      R(I,J)=R(I,J)/FLOAT(NTAR-1)
      R(J,I)=R(I,J)
   70 CONTINUE
      R(1,1)=1.
      DO 100 I=1,NRAT
      DO 80 J=1,3
      TMINX(I,J)=0.
      TMINZ(I,J)=0.
   80 CONTINUE
      DO 90 J=1,NTAR
      TMX=TRUE(J,1)-XDAT(I,J)
      TMINX(I,1)=TMINX(I,1)+TMX
      TMINX(I,2)=TMINX(I,2)+TMX**2
      TMINX(I,3)=TMINX(I,3)+TMX*XDAT(I,J)
      TMZ=TRUE(J,3)-ZDAT(I,J)
      TMINZ(I,1)=TMINZ(I,1)+TMZ
      TMINZ(I,2)=TMINZ(I,2)+TMZ**2
      TMINZ(I,3)=TMINZ(I,3)+TMZ*ZDAT(I,J)
   90 CONTINUE
      TMINX(I,2)=TMINX(I,2)-TMINX(I,1)**2/FLOAT(NTAR)
      TMINX(I,3)=TMINX(I,3)-X(I,1)*TMINX(I,1)/FLOAT(NTAR)
      TMINZ(I,2)=TMINZ(I,2)-TMINZ(I,1)**2/FLOAT(NTAR)
  100 CONTINUE
      SUMTOT=0.
      VARTOT=0.
      DO 110 I=1,NTAR
      SUMTOT=SUMTOT+TRUE(I,1)
      VARTOT=VARTOT+TRUE(I,1)**2
      TRUE(I,1)=TRUE(I,1)/FLOAT(NRAT)
      TRUE(I,3)=TRUE(I,3)/FLOAT(NRAT)
  110 CONTINUE
      VARTOT=VARTOT-SUMTOT**2/FLOAT(NTAR)
      RETURN
      END
************************************************************
      SUBROUTINE KELLEY(NRAT,REL,R,NTAR)
      REAL REL(30,8),R(30,30)
      DO 20 I=1,NRAT
      REL(I,1)=0.
      W=0.
      DO 10 J=2,NRAT
      IF (J.EQ.I) GOTO 10
      JMIN1=J-1
      DO 10 K=1,JMIN1
      IF (K.EQ.I) GOTO 10
      RELIAB=R(I,J)*R(I,K)/R(J,K)
      SE=RELIAB**2/FLOAT(NTAR)*ABS(4.*RELIAB+2./RELIAB+
     > 1./R(J,K)**2+(1.-2.*RELIAB)/R(I,J)**2+(1.-2.*REL
     > /R(I,K)**2-5.)
      W=W+1./SE
      REL(I,1)=REL(I,1)+RELIAB/SE
   10 CONTINUE
      REL(I,1)=REL(I,1)/W
   20 CONTINUE
      RETURN
      END
************************************************************
      SUBROUTINE CRONB(NTAR,NRAT,REL,TMINX,X,VARTOT,SIGMAY)
      REAL REL(30,8),TMINX(30,3),X(30,2),SIGMAY(2)
      SUMVAR=0.
      DO 10 I=1,NRAT
      SUMVAR=SUMVAR+X(I,2)
   10 CONTINUE
      XN=FLOAT(NRAT)/(FLOAT(NRAT-1)*(VARTOT-SUMVAR))
      DO 20 I=1,NRAT
      REL(I,2)=XN*TMINX(I,3)**2/X(I,2)
   20 CONTINUE
      SIGMAY(1)=(VARTOT-SUMVAR)/FLOAT(NRAT*(NRAT-1)*(NTAR-1))
      SIGMAY(2)=FLOAT(NRAT)/FLOAT(NRAT-1)*(1.-SUMVAR/VARTOT)
      RETURN
      END
************************************************************
      SUBROUTINE OVRALL(NRAT,R,REL,NTAR,TRUE,ZDAT,TRUREL,UNITWT)
      REAL RARRAY(900),R(30,30),EIGEN(30),REL(30,8),TRUE(50,8),
     > ZDAT(30,50),TRUREL(8),UNITWT(30)
      DO 5 I=1,NTAR
      TRUE(I,6)=0.
    5 CONTINUE
      K=0
      DO 10 I=1,NRAT
      DO 10 J=1,I
      K=K+1
      RARRAY(K)=R(I,J)
   10 CONTINUE
      CALL SYMAL(RARRAY,NRAT,EIGEN,IERROR,1)
      N=NRAT*(NRAT-1)
      ICOUNT=0.
      DO 15 I=1,NRAT
      IF (RARRAY(N+I).LT.0.) ICOUNT=ICOUNT+1
   15 CONTINUE
      ICOUNT=ICOUNT*2
      F=1.
      IF (ICOUNT.GT.NRAT) F=-1.
      DO 20 I=1,NRAT
      REL(I,3)=EIGEN(NRAT)*RARRAY(N+I)**2
      RARRAY(N+I)=F*RARRAY(N+I)/SQRT(EIGEN(NRAT))
      DO 20 J=1,NTAR
      TRUE(J,6)=TRUE(J,6)+RARRAY(N+I)*ZDAT(I,J)
   20 CONTINUE
      CALL RCOMP(NRAT,RARRAY(N+1),R,REL,UNITWT,TRUREL(6))
      RETURN
      END
************************************************************
      SUBROUTINE COMPAR(NRAT,REL,GLOB)
      REAL RELDAT(8,4),REL(30,8),GLOB(8,4,2)
      DO 30 M=1,8
      I=M-1
      IF (I.EQ.0) I=8
      DO 10 J=1,4
      RELDAT(I,J)=0.
   10 CONTINUE
      DO 20 J=1,NRAT
      RELDAT(I,1)=RELDAT(I,1)+REL(J,I)
      RELDAT(I,2)=RELDAT(I,2)+REL(J,I)**2
      RELDAT(I,3)=RELDAT(I,3)+REL(J,I)*REL(J,8)
      RELDAT(I,4)=RELDAT(I,4)+(REL(J,I)-REL(J,8))**2
   20 CONTINUE
      RELDAT(I,2)=SQRT((RELDAT(I,2)-RELDAT(I,1)**2/FLOAT(NRAT))
     > /FLOAT(NRAT-1))
      RELDAT(I,3)=(RELDAT(I,3)-RELDAT(I,1)*RELDAT(8,1))
     > /(RELDAT(I,2)*RELDAT(8,2)*FLOAT(NRAT-1))
      RELDAT(I,1)=RELDAT(I,1)/FLOAT(NRAT)
      RELDAT(I,4)=RELDAT(I,4)/FLOAT(NRAT)
      DO 30 J=1,4
      GLOB(I,J,1)=GLOB(I,J,1)+RELDAT(I,J)
      GLOB(I,J,2)=GLOB(I,J,2)+RELDAT(I,J)**2
   30 CONTINUE
      RETURN
      END
************************************************************
      SUBROUTINE MAXLIK(NRAT,R,REL,TRUE,NTAR,ZDAT,CONVRG,TRUREL,UN
      INTEGER OLD,CONVRG
      REAL R(30,30),REL(30,8),TRUE(50,8),ZDAT(30,50),S(900),MAX,
     > EIGEN(30),L(2,30),P(30),CONST/.005/,W(30),TRUREL(8),
     > UNITWT(30)
      N=NRAT*(NRAT-1)
      ITER=0.
      CONVRG=0
      S(1)=R(1,1)
      INDEX=1
      DO 20 I=2,NRAT
      IMIN1=I-1
      DO 10 J=1,IMIN1
      INDEX=INDEX+1
      S(INDEX)=R(I,J)
   10 CONTINUE
      INDEX=INDEX+1
      S(INDEX)=R(I,I)
   20 CONTINUE
      CALL SYMAL(S,NRAT,EIGEN,IERROR,1)
      X=SQRT(EIGEN(NRAT))
      DO 30 I=1,NRAT
      L(2,I)=S(N+I)*X
   30 CONTINUE
      NEW=1
      OLD=2
   40 CONTINUE
      ITER=ITER+1
      P(1)=SQRT(R(1,1)-L(OLD,1)**2)
      S(1)=L(OLD,1)**2/P(1)**2
      INDEX=1
      DO 60 I=2,NRAT
      P(I)=SQRT(R(I,I)-L(OLD,I)**2)
      IMIN1=I-1
      DO 50 J=1,IMIN1
      INDEX=INDEX+1
      S(INDEX)=R(I,J)/(P(I)*P(J))
   50 CONTINUE
      INDEX=INDEX+1
      S(INDEX)=L(OLD,I)**2/P(I)**2
   60 CONTINUE
      CALL SYMAL(S,NRAT,EIGEN,IERROR,1)
      IF (IERROR.EQ.0) GOTO 65
      CONVRG=1
      RETURN
   65 CONTINUE
      MAX=0.
      XX=SQRT(EIGEN(NRAT))
      DO 70 I=1,NRAT
      L(NEW,I)=S(N+I)*XX*P(I)
      X=ABS(L(NEW,I)-L(OLD,I))
      IF (X.GT.MAX) MAX=X
   70 CONTINUE
      K=NEW
      NEW=OLD
      OLD=K
      IF (ITER.LT.2.OR.(MAX.GT.CONST.AND.ITER.LE.15)) GOTO 40
      IF (ITER.LE.15.OR.MAX.LE.CONST) GOTO 75
      CONVRG=1
      RETURN
   75 CONTINUE
      Z=0.
      DO 80 I=1,NRAT
      X=L(OLD,I)**2
      REL(I,4)=X
      P(I)=R(I,I)-X
      Z=Z+X/P(I)
   80 CONTINUE
      Z=1.+Z
      DO 90 I=1,NTAR
      TRUE(I,7)=0.
   90 CONTINUE
      ICOUNT=0.
      DO 95 I=1,NRAT
      IF (L(OLD,I).LT.0.) ICOUNT=ICOUNT+1
   95 CONTINUE
      ICOUNT=ICOUNT*2
      F=1.
      IF (ICOUNT.GT.NRAT) F=-1.
      DO 100 I=1,NRAT
      W(I)=F*L(OLD,I)/(P(I)*Z)
      DO 100 J=1,NTAR
      TRUE(J,7)=TRUE(J,7)+ZDAT(I,J)*W(I)
  100 CONTINUE
      CALL RCOMP(NRAT,W,R,REL,UNITWT,TRUREL(7))
      RETURN
      END
************************************************************
      SUBROUTINE FISHER(NRAT,R,REL)
      REAL R(30,30),REL(30,8),Z(30)
      DO 10 I=1,NRAT
      Z(I)=0.
   10 CONTINUE
      DO 20 I=2,NRAT
      IMIN1=I-1
      DO 20 J=1,IMIN1
      X=ALOG((1.+R(I,J))/(1.-R(I,J)))
      Z(I)=Z(I)+X
      Z(J)=Z(J)+X
   20 CONTINUE
      DO 30 I=1,NRAT
      X=2.7183**(Z(I)/FLOAT(NRAT-1))
      REL(I,5)=(X-1.)/(1.+X)
   30 CONTINUE
      RETURN
      END
************************************************************
      SUBROUTINE ROFSUM(NRAT,X,REL,B,NTAR,VARTOT,SUMTOT,SIGMAY,GLO
     > TRUE,ERRVAR,TRUREL,UNITWT,R,TRUALF)
      REAL X(30,2),REL(30,8),B(30),SIGMAY(2),GLOB2(2,3),TRUE(50,8)
     > RELSUM(3),ERRVAR(30),TRUREL(8),UNITWT(30),R(30,30)
      RELSUM(1)=SIGMAY(2)
      CALL RCOMP(NRAT,UNITWT,R,REL,X(1,2),TRUREL(1))
      CALL RCOMP(NRAT,UNITWT,R,REL,UNITWT,TRUREL(3))
      RELSUM(2)=TRUREL(1)
      TRUREL(5)=TRUREL(1)
      DO 20 I=1,2
      GLOB2(I,1)=GLOB2(I,1)+RELSUM(I)
      GLOB2(I,2)=GLOB2(I,2)+RELSUM(I)**2
      GLOB2(I,3)=GLOB2(I,3)+(RELSUM(I)-TRUALF)**2
   20 CONTINUE
      S=SUMTOT/FLOAT(NRAT*NTAR)*(1.-RELSUM(2))
      DO 30 I=1,NTAR
      TRUE(I,5)=RELSUM(2)*TRUE(I,1)+S
   30 CONTINUE
      RETURN
      END
************************************************************
      SUBROUTINE MAXWAT(NRAT,REL,B,GLOB3,X,R,TRUE,XDAT,ZDAT,NTAR,
     > ERRVAR,TRUREL,UNITWT)
      REAL ARRAY(900),REL(30,8),BSUM(2),WSUM(2),EIGEN(30),B(30),
     > GLOB3(2,2),X(30,2),R(30,30),W(30,3),TRUE(50,8),
     > ZDAT(30,50),XDAT(30,50),ERRVAR(30),TRUREL(8),UNITWT(30)
      ARRAY(1)=REL(1,4)/(1.-REL(1,4))
      INDEX=1
      DO 20 I=2,NRAT
      IMIN1=I-1
      DO 10 J=1,IMIN1
      INDEX=INDEX+1
      ARRAY(INDEX)=R(I,J)/SQRT((1.-REL(I,4))*(1.-REL(J,4)))
   10 CONTINUE
      INDEX=INDEX+1
      ARRAY(INDEX)=REL(I,4)/(1.-REL(I,4))
   20 CONTINUE
      DO 30 I=1,NTAR
      TRUE(I,2)=0.
      TRUE(I,4)=0.
   30 CONTINUE
      BSUM(1)=0.
      BSUM(2)=0.
      WSUM(1)=0.
      WSUM(2)=0.
      CALL SYMAL(ARRAY,NRAT,EIGEN,IERROR,1)
      N=NRAT*(NRAT-1)
      W1=0.
      DO 40 I=1,NRAT
      W(I,1)=ARRAY(N+I)/SQRT((1.-REL(I,4))*X(I,2))
      W(I,2)=SQRT(REL(I,4)*FLOAT(NTAR-1))/(SQRT(X(I,2))*(1.-
     > REL(I,4)))
      W(I,3)=SQRT(REL(I,4))/(1.-REL(I,4))
      W1=W1+W(I,3)
      DO 40 J=1,2
      BSUM(J)=BSUM(J)+W(I,J)*B(I)
      WSUM(J)=WSUM(J)+W(I,J)**2*ERRVAR(I)
   40 CONTINUE
      DO 60 I=1,2
      XR=BSUM(I)**2/(BSUM(I)**2+WSUM(I))
      GLOB3(I,1)=GLOB3(I,1)+XR
      GLOB3(I,2)=GLOB3(I,2)+XR**2
   60 CONTINUE
      DO 70 I=1,NTAR
      TRUE(I,2)=0.
      TRUE(I,4)=0.
   70 CONTINUE
      DO 80 I=1,NRAT
      W(I,3)=W(I,3)/W1
      DO 80 J=1,NTAR
      TRUE(J,2)=TRUE(J,2)+W(I,3)*XDAT(I,J)
      TRUE(J,4)=TRUE(J,4)+W(I,3)*ZDAT(I,J)
   80 CONTINUE
      CALL RCOMP(NRAT,W(1,3),R,REL,X(1,2),TRUREL(2))
      CALL RCOMP(NRAT,W(1,3),R,REL,UNITWT,TRUREL(4))
      RETURN
      END
************************************************************
      SUBROUTINE TRUVAR(NRAT,SIGMAY,REL,X,GLOB4,NTAR,VARTOT)
      REAL SIGMAY(2),REL(30,8),X(30,2),GLOB4(2,3)
      SIGMAY(2)=0.
      DO 10 I=1,NRAT
      SIGMAY(2)=SIGMAY(2)+X(I,2)*(1.-REL(I,4))
   10 CONTINUE
      SIGMAY(2)=(VARTOT-SIGMAY(2))/FLOAT(NRAT**2*(NTAR-1))
      DO 20 I=1,2
      GLOB4(I,1)=GLOB4(I,1)+SIGMAY(I)
      GLOB4(I,2)=GLOB4(I,2)+SIGMAY(I)**2
      GLOB4(I,3)=GLOB4(I,3)+(SIGMAY(I)-1.)**2
   20 CONTINUE
      RETURN
      END
************************************************************
      SUBROUTINE GETRUE(NTAR,TRUE,SUMTOT,SIGMAY,GLOB5,NRAT,TRUREL)
      REAL TRUDAT(8,5),TRUE(50,8),GLOB5(8,5,2),SIGMAY(2),TRUREL(8)
      YBAR=SUMTOT/FLOAT(NRAT*NTAR)
      DO 30 M=1,8
      I=M-1
      IF (I.EQ.0) I=8
      DO 10 J=1,5
      TRUDAT(I,J)=0.
   10 CONTINUE
      DO 20 J=1,NTAR
      TRUDAT(I,1)=TRUDAT(I,1)+TRUE(J,I)
      TRUDAT(I,2)=TRUDAT(I,2)+TRUE(J,I)**2
      TRUDAT(I,3)=TRUDAT(I,3)+TRUE(J,I)*TRUE(J,8)
      TRUDAT(I,4)=TRUDAT(I,4)+(TRUE(J,I)-TRUE(J,8))**2
   20 CONTINUE
      TRUDAT(I,2)=SQRT((TRUDAT(I,2)-TRUDAT(I,1)**2/FLOAT(NTAR))
     > /FLOAT(NTAR-1))
      TRUDAT(I,3)=(TRUDAT(I,3)-TRUDAT(I,1)*TRUDAT(8,1))/
     > (TRUDAT(I,2)*TRUDAT(8,2)*FLOAT(NTAR-1))
      TRUDAT(I,1)=TRUDAT(I,1)/FLOAT(NTAR)
      TRUDAT(I,4)=TRUDAT(I,4)/FLOAT(NTAR)
   30 CONTINUE
      TRUREL(8)=1.
      DO 50 I=1,8
      X2=TRUDAT(I,2)/SQRT(ABS(SIGMAY(1)*TRUREL(I)))
      X1=TRUDAT(I,1)-YBAR*X2
      DO 40 J=1,NTAR
      TRUDAT(I,5)=TRUDAT(I,5)+((TRUE(J,I)-X1)/X2-TRUE(J,8))*
   40 CONTINUE
      TRUDAT(I,5)=TRUDAT(I,5)/FLOAT(NTAR)
      DO 50 J=1,5
      GLOB5(I,J,1)=GLOB5(I,J,1)+TRUDAT(I,J)
      GLOB5(I,J,2)=GLOB5(I,J,2)+TRUDAT(I,J)**2
   50 CONTINUE
      RETURN
      END
************************************************************
      SUBROUTINE GLOBAL(GLOB,GLOB2,GLOB3,GLOB4,GLOB5,NREPS)
      REAL GLOB(8,4,2),GLOB2(2,3),GLOB3(2,2),GLOB4(2,3),GLOB5(8,5,
      CALL GDIM3(GLOB,8,4,2,NREPS)
      CALL GDIM2(GLOB2,2,3,NREPS)
      DO 10 I=1,2
      GLOB3(I,2)=SQRT((GLOB3(I,2)-GLOB3(I,1)**2/FLOAT(NREPS))/
     > FLOAT(NREPS-1))
      GLOB3(I,1)=GLOB3(I,1)/FLOAT(NREPS)
   10 CONTINUE
      CALL GDIM2(GLOB4,2,3,NREPS)
      CALL GDIM3(GLOB5,8,5,2,NREPS)
      RETURN
      END
************************************************************
      SUBROUTINE GDIM3(X,I1,I2,I3,NREPS)
      REAL X(I1,I2,I3)
      DO 10 I=1,I1
      DO 10 J=1,I2
      X(I,J,2)=SQRT((X(I,J,2)-X(I,J,1)**2/FLOAT(NREPS))/
     > FLOAT(NREPS-1))
      X(I,J,1)=X(I,J,1)/FLOAT(NREPS)
   10 CONTINUE
      RETURN
      END
************************************************************
      SUBROUTINE GDIM2(X,I1,I2,NREPS)
      REAL X(I1,I2)
      DO 10 I=1,I1
      X(I,2)=SQRT((X(I,2)-X(I,1)**2/FLOAT(NREPS))/FLOAT(NREPS-1
      X(I,1)=X(I,1)/FLOAT(NREPS)
      X(I,3)=X(I,3)/FLOAT(NREPS)
   10 CONTINUE
      RETURN
      END
************************************************************
      SUBROUTINE OUTPUT(GLOB,GLOB2,GLOB3,GLOB4,GLOB5,NRAT,RELBAR,
     > NTAR,LEVSD,SCALSD,RELSD,I1,I2,I3,NREPS)
      INTEGER TITLE(3)
      REAL GLOB(8,4,2),GLOB2(2,3),GLOB3(2,2),GLOB4(2,3),GLOB5(8,5,
     > LEVSD
      WRITE(6,999)
  999 FORMAT('1'/'-')
      WRITE(6,10)
   10 FORMAT('-',8X,'Table'//9X,'Means (over replications) of Mean
     > 'SD''s, Correlations with')
      WRITE(6,11)
   11 FORMAT('+',8X,59('_'))
      WRITE(6,12)
   12 FORMAT(9X,'Actual Reliabilities and Mean Square Deviations f
     > 'Actual')
      WRITE(6,9)
    9 FORMAT('+',8X,59('_'))
      WRITE(6,13)
   13 FORMAT(9X,'Reliabilities of Rater Reliability Estimates.')
      WRITE(6,14)
   14 FORMAT('+',8X,45('_'))
      CALL CONDIT(NRAT,RELBAR,NTAR,LEVSD,SCALSD,RELSD,I1,I2,I3,NRE
      WRITE(6,20)
   20 FORMAT(///25X,'Rater Reliability Estimates')
      WRITE(6,21)
   21 FORMAT('+',24X,27('_'))
      CALL OUT3(GLOB,8,4,2)
      WRITE(6,999)
      WRITE(6,22)
   22 FORMAT('-',8X,'Table'//9X,'Comparison of Estimates of Reliab
     > ity of Sums of Raters;')
      WRITE(6,23)
   23 FORMAT('+',8X,57('_'))
      WRITE(6,24)
   24 FORMAT(9X,'Population Values of Reliability of Composites
     > 'Weighted by')
      WRITE(6,25)
   25 FORMAT('+',8X,58('_'))
      WRITE(6,26)
   26 FORMAT(9X,'Two Methods; Comparison of Estimates of True Scor
     > 'Variance.')
      WRITE(6,27)
   27 FORMAT('+',8X,60('_'))
      CALL CONDIT(NRAT,RELBAR,NTAR,LEVSD,SCALSD,RELSD,I1,I2,I3,NRE
      WRITE(6,30)
   30 FORMAT(///29X,'Reliability of Sum')
      WRITE(6,31)
   31 FORMAT('+',28X,18('_'))
      CALL OUT2(GLOB2,2,3)
      WRITE(6,40)
   40 FORMAT(///22X,'Weighting for Maximum Reliability')
      WRITE(6,41)
   41 FORMAT('+',21X,33('_'))
      WRITE(6,50)
   50 FORMAT(//9X,'Estimate',10X,'Mean',9X,'SD')
      WRITE(6,55)
   55 FORMAT('+',8X,60('_'))
      DO 80 I=1,2
      READ(7,60) (TITLE(J),J=1,3)
   60 FORMAT(3A4)
      WRITE(6,70) (TITLE(J),J=1,3),(GLOB3(I,J),J=1,2)
   70 FORMAT(/9X,3A4,3X,2(F8.3,4X))
   80 CONTINUE
      WRITE(6,85)
   85 FORMAT(9X,60('_'))
      WRITE(6,90)
   90 FORMAT(///29X,'True Score Variance')
      WRITE(6,91)
   91 FORMAT('+',28X,19('_'))
      CALL OUT2(GLOB4,2,3)
      WRITE(6,999)
      WRITE(6,95)
   95 FORMAT('-',8X,'Table'//9X,'Means (over replications) of Mean
     > 'SD''s, Correlations with')
      WRITE(6,96)
   96 FORMAT('+',8X,59('_'))
      WRITE(6,97)
   97 FORMAT(9X,'Actual and Mean Square Deviations from Actual (MS
     > ' of True')
      WRITE(6,98)
   98 FORMAT('+',8X,60('_'))
      WRITE(6,99)
   99 FORMAT(9X,'Score Estimates; Also Includes Mean Square ',
     > 'Deviations from')
      WRITE(6,100)
  100 FORMAT('+',8X,58('_'))
      WRITE(6,101)
  101 FORMAT(9X,'Actual of Estimates Standardized to Mean Equal ',
     > 'to Estimated')
      WRITE(6,102)
  102 FORMAT('+',8X,59('_'))
      WRITE(6,103)
  103 FORMAT(9X,'True Score Mean and Variance Equal to the Product
     > 'of')
      WRITE(6,106)
  106 FORMAT('+',8X,52('_'))
      WRITE(6,107)
  107 FORMAT(9X,'Estimates of True Score Variance and ',
     > 'Reliability (MSE2).')
      WRITE(6,104)
  104 FORMAT('+',8X,56('_'))
      CALL CONDIT(NRAT,RELBAR,NTAR,LEVSD,SCALSD,RELSD,I1,I2,I3,NRE
      WRITE(6,110)
  110 FORMAT(///34X,'True Scores')
      WRITE(6,111)
  111 FORMAT('+',33X,11('_'))
      CALL OUT3(GLOB5,8,5,2)
      RETURN
      END
************************************************************
      SUBROUTINE OUT3(X,I1,I2,I3)
      INTEGER TITLE(8,3)
      REAL X(I1,I2,I3)
      READ(7,5) ((TITLE(I,J),J=1,3),I=1,I1)
    5 FORMAT(3A4)
      IF (I2.EQ.4) WRITE(6,10)
   10 FORMAT(//9X,'Estimate',11X,'Mean',6X,'SD',7X,'R',7X,
     > 'MSE')
      IF (I2.EQ.5) WRITE(6,20)
   20 FORMAT(//9X,'Estimate',11X,'Mean',6X,'SD',7X,'R',7X,
     > 'MSE1',5X,'MSE2')
      WRITE(6,15)
   15 FORMAT('+',8X,60('_'))
      WRITE(6,30) (TITLE(8,I),I=1,3),((X(8,I,J),I=1,2),J=1,2)
   30 FORMAT(/9X,3A4,3X,2(F8.3,1X)/24X,2(2X,'(',F5.2,')'))
      DO 60 I=1,7
      WRITE(6,40) (TITLE(I,J),J=1,3),(X(I,J,1),J=1,I2)
   40 FORMAT(/9X,3A4,3X,5(F8.3,1X))
      IF (I2.EQ.4) WRITE(6,45) (X(I,J,2),J=1,I2)
   45 FORMAT(24X,4(2X,'(',F5.2,')'))
      IF (I2.EQ.5) WRITE(6,50) (X(I,J,2),J=1,I2)
   50 FORMAT(24X,5(2X,'(',F5.2,')'))
   60 CONTINUE
      WRITE(6,70)
   70 FORMAT(9X,60('_')//9X,'Note. Standard deviations are given i
     > 'parentheses.')
      RETURN
      END
************************************************************
      SUBROUTINE OUT2(X,I1,I2)
      INTEGER TITLE(8,3)
      REAL X(I1,I2)
      READ(7,5) ((TITLE(I,J),J=1,3),I=1,I1)
    5 FORMAT(3A4)
      WRITE(6,11)
   11 FORMAT(//9X,'Estimate',10X,'Mean',9X,'SD',10X,'MSE')
      WRITE(6,15)
   15 FORMAT('+',8X,60('_'))
      DO 40 I=1,I1
      WRITE(6,30) (TITLE(I,J),J=1,3),(X(I,J),J=1,I2)
   30 FORMAT(/9X,3A4,3X,3(F8.3,4X))
   40 CONTINUE
      WRITE(6,50)
   50 FORMAT(9X,60('_'))
      RETURN
      END
************************************************************
      SUBROUTINE CONDIT(NRAT,RELBAR,NTAR,LEVSD,SCALSD,RELSD,I1,
     > I2,I3,NREPS)
      INTEGER DTYPE(2,2)/'Norm','al','Unif','orm'/
      REAL LEVSD
      WRITE(6,10)
   10 FORMAT(//24X,'N',18X,'Level',3X,'Scale',1X,'Reliability')
      WRITE(6,20)
   20 FORMAT('+',8X,16('_'),4X,40('_'))
      WRITE(6,30) NRAT,RELBAR
   30 FORMAT(9X,'Raters',7X,I3,4X,'Mean',12X,'0',7X,'1',6X,F3.2)
      WRITE(6,40) NTAR,LEVSD,SCALSD,RELSD
   40 FORMAT(9X,'Targets',6X,I3,4X,'SD',8X,3(5X,F3.
2 ) ) WRITE(6,50) N R E P S , ( D T Y P E ( I , 1 1 ) , 1 = 1 , 2 ) , ( D T Y P E ( I , 1 2 ) , 1 = 1 , 2 ) , > (DTYPE(I,13),1=1,2) 50 F O R M A T ( 9 X , ' R e p l i c a t i o n s ' , 1 X , I 3 , 4 X , ' D i s t r i b u t i o n ' , 2 X , 3 ( 2 A 4 ) ) RETURN END >  * *"* ************************************************ SUBROUTINE RCOMP(NRAT,W,R,REL,VAR,RELCOM) REAL W(30),R(30,30),NUM,DENOM,REL(30,8),VAR(30) NUM=W(1)**2*REL(1,4)*VAR(1) DENOM=W(1)**2*VAR(1) DO 20 I=2,NRAT IMIN1=1-1 DO 10 J=1,IMIN1 Z=2.*W(I)*W(J)*SQRT(VAR(I)*VAR(J))*R(I,J) NUM=NUM+Z DENOM=DENOM+Z 10 CONTINUE NUM=NUM+W(I)**2*REL(I,4)*VAR(I) DENOM=DENOM+W(I)* * 2 *VAR(I) 20 CONTINUE  137  RELCOM=NUM/DENOM RETURN END  *************************************  10  25 20 30 20  SUBROUTINE INIT(GLOB,GLOB2,GLOB3,GLOB4,GLOB5) REAL GLOB(8,4,2),GLOB2(2,3),GLOB3(2,2),GLOB4(2,3),GLOB5(8,5, DO 30 1=1,2 GL0B3(I,1)=0. GL0B3(I,2)=0. DO 10 J=1 ,3 GL0B2(I,J)=0. GL0B4(I,J)=0. CONTINUE DO 20 J=1,8 DO 25 K=1,4 GL0B(J,K,I)=0. GLOB5(J,K,I)=0. CONTINUE GLOB5(J,5,I)=0. CONTINUE • CONTINUE RETURN END CONTINUE  
