UBC Theses and Dissertations

A simulation study comparing the reliability and validity of methods of scoring ratings Phillips, Norman 1984

A SIMULATION STUDY COMPARING THE RELIABILITY AND VALIDITY OF METHODS OF SCORING RATINGS

by

Norman Phillips
B.A., McGill University, 1978

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS

in

THE FACULTY OF GRADUATE STUDIES
Department of Psychology

We accept this thesis as conforming to the required standard

UNIVERSITY OF BRITISH COLUMBIA
January 1984

© Norman Phillips, 1984

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Abstract

Simulated rating data were generated according to a uni-factor model under varying conditions of: number of judges; number of targets; discrepancies in judges' scales of measurement; and mean and variance in distributions of individual judges' reliabilities. Burt's (1936) method of standardizing ratings, estimating judges' individual reliabilities from the rating data, and weighting ratings by a function of the judges' estimated reliabilities resulted in higher correlations with true scores than did the simple consensus. A method of scaling true score estimates to an optimal absolute scale resulted in reduced mean square deviations from the true scores. Burt's estimates showed close resemblance to maximum likelihood factor scores. Several proposed methods of estimating individual judges' reliabilities were tested. Only Cronbach's performed poorly under some conditions. The maximum likelihood factor loading estimate appeared to give the best estimate overall. The alpha coefficient was found to be a much poorer estimate of the reliability of the sum (or mean) of a group of judges than another estimate which involved estimating judges' individual reliabilities.

Table of Contents

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
I. Introduction
   A. Purpose
   B. Rating Scales
   C. Scale Discrepancies
   D. Factorial Validity
   E. Rater Reliability
   F. Differential Weighting
   G. Summary
   H. Model
   I. Objections to Burt's Method
II. Method
   A. Data Generation
   B. Experimental Conditions
   C. Rater Reliabilities
   D. Reliability of Sum
   E. Weighting for Maximum Reliability
   F. True Score Variance
   G. True Scores
III. Results
   A. Rater Reliabilities
   B. Reliability of Sum
   C. Weighting for Maximum Reliability
   D. True Score Variance
   E. True Scores
   F. Uniform Distribution
IV. Discussion
REFERENCES
APPENDIX

List of Tables

Tables 1-12 Means (over replications) of means, SD's, correlations with actual reliabilities and mean square deviations from actual reliabilities of rater reliability estimates
Table 13 Average over replications of reliability estimate means
Table 14 Mean standard deviations of reliability estimates
Table 15 Mean correlation between reliability estimates and actual reliabilities
Table 16 Correlations between reliability estimates and actual reliabilities averaged over rater bias
Table 17 Correlations between reliability estimates and actual reliabilities averaged over sample size
Table 18 Average mean square deviations of reliability estimates from actual reliabilities
Table 19 Mean square deviations of reliability estimates from actual reliabilities averaged over rater bias
Table 20 Mean square deviations of reliability estimates from actual reliabilities averaged over sample size
Tables 21-32 Comparison of estimates of reliability of sums of raters; population values of reliability of composites weighted by two methods; comparison of estimates of true score variance
Table 33 Expected reliability of the sum of raters
Table 34 Estimates of reliability of the sum
Tables 35-46 Means (over replications) of means, SD's, correlations with actual and mean square deviations from actual (MSE1) of true score estimates; also includes mean square deviations from actual of estimates standardized to mean equal to estimated true score mean and variance equal to the product of estimates of true score variance and reliability (MSE2)
Table 47 Mean correlation between true score estimates and actual true scores
Table 48 Means for absolute scale true score estimates of mean square deviations from actual true scores
Table 49 Mean MSE of true score estimates standardized to estimated optimal scale from actual true scores
Table 50 Mean squared deviations averaged over rater bias
Table 51 Mean squared deviations averaged over sample size
Table 52 Means (over replications) of means, SD's, correlations with actual reliabilities and mean square deviations from actual reliabilities of rater reliability estimates
Table 53 Comparison of estimates of reliability of sums of raters; population values of reliability of composites weighted by two methods; comparison of estimates of true score variance
Table 54 Means (over replications) of means, SD's, correlations with actual and mean square deviations from actual (MSE1) of true score estimates; also includes mean square deviations from actual of estimates standardized to mean equal to estimated true score mean and variance equal to the product of estimates of true score variance and reliability (MSE2)

List of Figures

Figure 1 Reliability of the sum averaged over rater bias
Figure 2 Reliability of the sum averaged over sample size
Figure 3 Reliability of consensus and weighted composite averaged over rater bias
Figure 4 Reliability of consensus and weighted composite averaged over sample size
Figure 5 Mean square deviations averaged over rater bias
Figure 6 Mean square deviations averaged over sample size

Acknowledgement

I would like to express my gratitude to the following people: Jerry Wiggins for his advice and moral support; Jim Steiger for teaching me all of the techniques that made this project possible; Del Paulhus for his attention and expertise; Dimitri Papageorgis for his generous attention; and most of all to Diane for everything.

I. INTRODUCTION

    ... subjectivity in observations is a source of error for most research in personality and related fields. A science can advance by identifying sources of error, so that error can be measured and possibly controlled. As La Barre puts it: 'All the natural sciences long since have sought to become exact sciences, first through the discernment of the possibility and the nature, and then through analysis and measurement of the magnitude of "probable error" inherent in the very process of observation and measurement, as exemplified by chromatic and other distortion in the microscopic lens itself, and the like' (1967, p. vii). In the behavioral sciences, we have recognized that subjectivity contributes error to many kinds of observations, but we have been unable to measure the magnitude of that error with any precision. The error is not always systematic and fixed, as in the distortion introduced by a lens ... (Fiske, 1978).

Suppose that each of a panel of judges rates each of a pool of targets on a variable for which there exists no available criterion of greater validity than the ratings themselves. The usual procedure is to estimate targets' "true" scores with the simple average of the judges' ratings. Burt (1936) proposed a method of combining ratings which, using only information contained in the ratings, gives true score estimates that are more reliable and more valid than those given by the consensus method. The method involves estimating each individual judge's reliability and estimating true scores by the composite of standardized (within judges) ratings weighted by a function of the judges' reliabilities.
Burt's method results in increased validity, but apparently the degree of improvement was not considered sufficient to justify the computational labour involved (Lawshe & Nagel, 1952), and hence the method has been ignored. It is argued here, however, that computers eliminate the problem of computation, and considering the importance of rating data for many areas of research, any improvement in validity or reliability ought to be highly valued.

The primary aim of this study was to investigate the performance of Burt's method as compared to simple averaging under varying numbers of judges and targets and varying distributions of judges' scales of measurement and reliability. The most important secondary topic which was investigated was a comparison of various proposed methods of estimating individual judges' reliabilities from rating data.

A. PURPOSE

There are many situations in which evaluation of an object or event can only be performed by human subjective judgment. Examples include such diverse areas as judging livestock, art, wine, athletics, academic or job performance, and personality. Because of the very nature of these areas there is always a certain amount of disagreement between judges no matter how "expert" they are. Hence it is common practice to combine ratings from a panel of judges. In some cases it may be appropriate to define the "true" value of a quantity (or quality) as the consensus of a sufficiently large panel of expert judges (Cook, 1979; Cronbach, Rajaratnam, & Gleser, 1963; Horowitz, Inouye, & Siegelman, 1979; Landy & Farr, 1980).
This is especially true of personality variables:

    ... the impulsiveness of a person can scarcely be better defined than as equal to the average rating, or score, given by an adequate number of those competent to rate.... (Kelley, 1947).

Fiske (1978) believes that "the conceptual value" of average ratings is questionable. He gives the following analogy:

    Ralph Gerard used to relate the story of the squirrel-hunter and his double-barreled gun. Seeing a squirrel in a tree, he fires one barrel, the shot passing one foot to the left of the squirrel. The hunter fires the second barrel, only to have the shot go one foot to the right of the squirrel. But the squirrel falls down dead, killed by the law of central tendency.

Hunters firing at a target seems like an appropriate analogy, but the appropriate conclusion is that the location of the target can be estimated by the average location of the shots.

Areas where ratings are used include job performance (Borman, 1979; Landy & Farr, 1976) and clinical assessment (Berg & Adams, 1962; Flemenbaum & Zimmerman, 1973; Knight & Blaney, 1977). For many purposes it is not practical or feasible to obtain a panel of judges, and hence other forms of assessment are devised. But peer ratings are often used to validate the proposed instruments. For example, Landy and Trumbo (1980) report a literature review of validation studies in the Journal of Applied Psychology between 1965 and 1975 which revealed that "ratings were used as the primary criterion in 72% of the cases".
One of the most highly regarded personality questionnaires, namely Jackson's PRF, was in part validated by peer ratings (Jackson, 1974; Burisch, 1978).

B. RATING SCALES

Rating scales are only one form of peer assessment. Two other popular forms are peer nominations and peer ranking. Recent studies suggest that ratings may be less reliable and valid than nominations or rankings (Kane & Lawler, 1978; Love, 1981). But ratings are easier to obtain and generally more informative (Kane & Lawler, 1978). Some believe the alleged advantage of informativeness is illusory because of the unreliability and invalidity of the results (Brief, 1980). Rating scales also tend to be more flexible in that they can be used for group, individual, self, or other reporting.

Rating scales can take on many different physical formats (Guilford, 1954; Wiggins, 1973). The most common form is that of the Likert scale in which there are from five to nine response categories arranged along a continuum. The scales generally include descriptions at the endpoints and sometimes at each point on the scale. The explicitness of the descriptions is highly related to the reliability of the obtained ratings (Cronbach, 1970). The concreteness of the rated attribute is also very important for reliability (Fiske, 1978). The essential characteristic of rating scales is that the judge assigns a numeric value to the target along a pseudo-continuum representing the dimension of interest.
Rating scales, as has been mentioned, are generally relatively unreliable and invalid. For example, Kane and Lawler (1978) report the average reliability to be around .6 and the average validity around .3. The poor reliability typically found for rating data is sometimes attributed to the relative freedom of expression which they permit the judge (Love, 1981; Kane & Lawler, 1978). There is only one way to rank a set of targets in a particular order, but there are an infinite number of ways of assigning ratings which result in the same ordering. This freedom of expression enables the judge to supply more information about the targets (e.g., interval measurements, etc.) but it also allows a certain amount of response bias on the part of the judge.

C. SCALE DISCREPANCIES

Bias in judgment has received a great deal of attention in recent years (Kahneman, Slovic, & Tversky, 1982). Guilford (1954) enumerated many forms of bias which commonly occur with rating data. Virtually all writers on rating scales comment on the tendency for judges to vary with respect to the mean and variance of their ratings (Bieri, Atkins, Briar, Leaman, Miller, & Tripodi, 1966; Burt, 1936; Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Cronbach, Rajaratnam, & Gleser, 1963; Fiske & Cox, 1960; Grozz & Grossman, 1968; Kelley, 1947; Paulhus, 1981; Taylor, 1968). The mean and variance of a judge's ratings can be thought of as the scale of measurement used by that judge.
To use the analogy of temperature, one judge may be thought of as assigning ratings in the scale of Celsius while another uses Fahrenheit. As Burt (1936) pointed out, it doesn't make much sense to average scores expressed in two different scales of measurement.

One way that researchers have tried to reduce the discrepancy between judges' scales of measurement is with rater training (Bernardin, 1978; Borman, 1979; Crow, 1957; Latham, Wexley, & Pursell, 1975; Pursell, Dossett, & Latham, 1980; Spool, 1978). Thus judges may be instructed to 'spread out their ratings' or to 'provide fewer high ratings' (Borman, 1979). Rater training, when possible, seems to work quite well, but in some cases it is not feasible to provide training. Training may also have an unforeseen biasing effect on the judges (Knight & Blaney, 1977).

Another common method of eliminating discrepancy between scales of measurement is to standardize all judges' ratings to a common scale (Burt, 1936; Kelley, 1947; Smith, 1974). Some researchers find this procedure to be artificial (Cronbach et al., 1972). Typically each judge's ratings are standardized to have a mean of zero and a standard deviation of one. There are two reasons why this procedure of standardization is not the optimal one. In the first place the standard deviation of a judge's ratings is not an appropriate index of the unit of measurement since variance is composed of "true" score variance as well as error variance.
A better index of the unit of measurement would be the regression coefficient of ratings on true scores (Cureton, 1958) (see the discussion concerning a model for rating data below).

A second reason why standardization to a mean of zero and a standard deviation of one is suboptimal is that the resulting scores provide information for only the relative standing of the targets. The scores cannot be related to the original scale in any sense. Hence the information contained in the scale descriptions cannot be applied to the resulting scores. It would be preferable to obtain estimates of the mean and variance of the targets' "true" scores in an absolute, as opposed to an arbitrary, scale of measurement.

But how is the absolute scale of the true score estimates to be determined? This study adopts the position taken by Cronbach et al. (1963), namely, that for quantities which are most suitably defined by consensual agreement, the true score for a target is the expected value (average) of ratings over the population of potential judges. The true score mean and standard deviation are defined in the usual manner as expected value functions of true scores.

In terms of the scale of standardization for true score estimates, it is clear that the appropriate mean is the observed sample mean of all ratings since this is the best estimate of the true score mean. Suppose that the targets' scores are estimated by some linear combination f(x) of the judges' ratings. And suppose the reliability of this estimate is given by r.
Further, suppose that the true score standard deviation is estimated by s. It can be shown that the least squares criterion of estimation implies that the standard deviation of f(x) should be set at s*(r)**1/2 [1], that is, s times the square root of r. Hence ratings could be combined in standard score form and the resulting composite converted to the desired mean and standard deviation. This procedure presupposes a method of estimating true score variance and reliability of a composite. The former can be estimated from the average covariance between judges as suggested by Cronbach et al. (1963). Formulas for estimating the latter are easy to derive, given estimates of the individual judges' reliabilities (Green, 1950). Methods of estimating the reliabilities of individual judges are discussed below.

[1] Single asterisk represents multiplication. Double asterisk represents exponentiation.

D. FACTORIAL VALIDITY

Another source of bias in rating data which was pointed out by Burt (1936) and Cronbach et al. (1963) is the tendency for different judges to focus on different aspects of the target in assigning scores. Thus, if the quantity to be measured can be thought of as made up of different components or factors, then one judge may emphasize different factors from another. This phenomenon can be described as the factorial validity of the judges as measuring instruments (Allen & Yen, 1979). This study will follow the example of Burt (1936) and Cronbach et al. (1963) in making the simplifying assumption that the quantity being measured is sufficiently salient that each judge's ratings can be thought of as resulting from one general factor and random error. It would be of interest, however, to generalize the techniques proposed by Burt (1936) to the case of more complex factorial structures.

E. RATER RELIABILITY

The final source of rater bias to be considered, one which is recognized by virtually every writer on the subject of rating data, is the differential reliability of judges. The majority of recent studies on rater reliability have focused attention on the average reliability of a panel of judges. An intraclass correlation coefficient (or generalizability coefficient) is a suitable index of average rater reliability (Bartko, 1966; Burt, 1936; Cronbach et al., 1972; Ebel, 1951; Haggard, 1958; Jackson, 1939; Jackson & Ferguson, 1941; Maxwell & Pilliner, 1968; Paulhus, 1981; Shrout & Fleiss, 1979). One procedure which has been advocated is to compute intraclass correlations with various combinations of judges in order to weed out the unreliable judges (Burdock, Fleiss, & Hardesty, 1963; Strahan, 1980).

Following a suggestion of Kelley (1947), Cronbach et al. (1972) proposed a method of estimating targets' scores which uses the average reliability of judges. True scores are estimated by the sum of the mean rating for a given target multiplied by the alpha intraclass correlation coefficient, plus the overall mean of all targets multiplied by one minus alpha.
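The weighted-average estimate just described can be sketched in a few lines. This is a minimal illustration only, not the thesis's own code: the function name `shrunken_scores`, the targets-by-judges layout of `ratings`, and the example numbers are all assumptions, and alpha is taken as a value computed elsewhere and passed in.

```python
from statistics import mean


def shrunken_scores(ratings, alpha):
    """Estimate each target's score as alpha * target mean + (1 - alpha) * overall mean.

    ratings[t][j] is judge j's rating of target t (hypothetical layout).
    alpha is the intraclass reliability coefficient, assumed to be
    computed elsewhere and supplied by the caller.
    """
    target_means = [mean(row) for row in ratings]
    # With every judge rating every target, the mean of the target means
    # equals the mean of all ratings.
    overall_mean = mean(target_means)
    return [alpha * m + (1 - alpha) * overall_mean for m in target_means]


# Made-up data: three targets rated by three judges.
ratings = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
low = shrunken_scores(ratings, 0.2)   # low alpha: estimates sit near the overall mean
high = shrunken_scores(ratings, 0.9)  # high alpha: estimates sit near each target's mean
```

With alpha equal to zero every target receives the overall mean, and with alpha equal to one every target receives its own mean rating.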
Thus if the average rater reliability is low, each target's true score is estimated by a value near the overall mean. On the other hand, if alpha is close to one, each target's score is estimated by a value near to the mean rating assigned to that target.

The earliest reported method of estimating the reliabilities of individual judges from rating data was that of Shen (1925), who relied considerably on formulas and methods developed by Truman Kelley. Shen actually proposed two methods, one of which he considered to be computationally unfeasible but theoretically more accurate. It relied, as do a number of results to be discussed, on an identity which holds whenever the data are assumed to result from one general factor plus random error. This is the so-called uni-factor assumption. Writing the reliability of judge i as r(ii) and the correlation between judges i and j as r(ij), the identity is r(ij) = (r(ii)*r(jj))**1/2. The proof uses Spearman's formula for the correction for attenuation, namely that the correlation which would be expected in the absence of random error is r(ij)/(r(ii)*r(jj))**1/2. But the uni-factor assumption implies perfect correlation in the absence of random error. Hence r(ij)/(r(ii)*r(jj))**1/2 = 1. And hence r(ij) = (r(ii)*r(jj))**1/2 (Burt, 1936; Cureton, 1931; Kelley, 1947; Overall, 1965; Shen, 1925). Similarly r(ik) = (r(ii)*r(kk))**1/2, etc.
Hence by multiplying and dividing the appropriate terms and cancelling, one obtains r(ii) = r(ij)*r(ik)/r(jk) (Burt, 1936; Cureton, 1931; Shen, 1925). Written out, the product r(ij)*r(ik) equals r(ii)*(r(jj)*r(kk))**1/2, and dividing by r(jk) = (r(jj)*r(kk))**1/2 leaves r(ii). This gives a formula for estimating the reliability of judge i in terms of the correlations between judges i, j and k. But if there are more than three judges then there is more than one way of estimating the reliability of any given judge. It will not do to take the average of all estimates since, for one thing, a particular estimate may be much too large if the term r(jk) in the denominator happens to be near zero. Shen (1925) proposed the following:

    The theoretically best method conceivable is to obtain an average from individual determinations weighted as the squares of their standard errors.

Shen used Kelley's logarithmic differential technique to obtain estimates for the standard error of a reliability coefficient and observed from the resulting formula that reliability estimates that are spuriously large because of a near-zero denominator are given near-zero weight in the weighted average. The procedure of weighting inversely by the squared standard error results in minimum standard error estimates:

    ... and is undoubtedly the best value for a reliability coefficient (Shen, 1925)

Kelley (1947) mentioned this method of estimating the individual judges' reliabilities. Burt (1936) also considered a similar procedure. But all of them dismissed the procedure as involving too much computational labour.
This, however, is not a serious drawback with the availability of computers.

Shen (1925) proposed another procedure which involved many fewer calculations. The reliability of judge i is estimated by a function of the average correlation between judge i and all other judges and the average correlation between all pairs of judges. Kelley (1947) advocated the use of this estimate.

A number of methods of estimating individual judges' reliabilities using the concepts of factor analysis have been proposed. Burt (1936) considered two such methods. One method estimated reliabilities from the loadings on the first centroid and the other method used the loadings on the first principal component. Burt did not use unities on the diagonals of the correlation matrix but instead suggested a procedure for estimating the communalities. Cronbach et al. (1963) estimated rater reliability with the squared loading on the first centroid factor of the covariance matrix divided by the judge's sample variance. Overall (1965) proposed using the squared loading on the first principal axis factor of the correlation matrix. The rationale behind factor analytic estimates of reliability will be clarified during the discussion of the proposed model of rating data.
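The uni-factor identity given earlier in this section, r(ii) = r(ij)*r(ik)/r(jk), can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `triad_estimates` and the four reliabilities are made up, and the correlation matrix is built to satisfy the uni-factor identity exactly, with no sampling error. It does not implement Shen's standard-error weighting, whose formula is not given in this section.

```python
from itertools import combinations
from math import sqrt


def triad_estimates(corr, i):
    """Return every estimate r(ij)*r(ik)/r(jk) of judge i's reliability.

    corr is a square matrix (list of lists) of between-judge correlations.
    Each pair (j, k) of other judges yields one estimate. With sample data
    an estimate becomes unstable when r(jk) is near zero.
    """
    others = [j for j in range(len(corr)) if j != i]
    return [corr[i][j] * corr[i][k] / corr[j][k]
            for j, k in combinations(others, 2)]


# Hypothetical reliabilities for four judges.
rel = [0.81, 0.64, 0.49, 0.36]
# Error-free uni-factor correlations: r(ij) = (r(ii)*r(jj))**1/2.
corr = [[1.0 if a == b else sqrt(rel[a] * rel[b]) for b in range(4)]
        for a in range(4)]

estimates = triad_estimates(corr, 0)  # three estimates, one per pair of other judges
```

In this error-free case every triad recovers the same reliability for judge 0. With sampled correlations the estimates would differ from triad to triad, which is the situation the weighting schemes discussed above are meant to address.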
Another method of estimating judges' reliabilities involves computing the correlations between all pairs of judges and converting these by Fisher's z function, and then computing the average of these and converting back to the scale of correlations (Smith, 1974). Once again, Burt (1936) considered using a similar technique since the method of arithmetic averaging did not conform with "... what is known of the chance distributions of correlation coefficients" (Burt, 1936). Burt and Smith, however, both dismiss the procedure as requiring too much computational labour. Once again the availability of computers eliminates this problem.

Cronbach et al. (1963) offered another formula for estimating individual judges' reliabilities, namely the square of the average covariance of judge i with the other judges, divided by the product of the average covariance of all pairs of judges and judge i's variance. Cronbach et al. (1963) were not very pleased with the performance of their proposed estimate, which may explain why no further mention is made of it in subsequent writings: "Estimates are subject to extreme sampling fluctuations unless n is large or there is unit-rank among conditions. Second, the estimates tend to be slightly high, but this bias is small relative to the variable error" (Cronbach et al., 1963). The unit-rank condition is equivalent to the uni-factor assumption mentioned earlier.
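The two estimators just described can be written down directly from their verbal definitions. The sketch below is a reading of those definitions, not code from the thesis; the function names and the example covariance matrix are hypothetical.

```python
import math


def fisher_z_average(correlations):
    """Average correlations on the Fisher z scale and convert back."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


def cronbach_reliability(cov, i):
    """Squared average covariance of judge i with the other judges, divided by
    (average covariance over all pairs of judges) * (variance of judge i)."""
    n = len(cov)
    with_others = [cov[i][j] for j in range(n) if j != i]
    all_pairs = [cov[j][k] for j in range(n) for k in range(j + 1, n)]
    avg_i = sum(with_others) / len(with_others)
    avg_all = sum(all_pairs) / len(all_pairs)
    return avg_i ** 2 / (avg_all * cov[i][i])


# Averaging on the z scale gives more weight to the larger correlation.
averaged = fisher_z_average([0.2, 0.9])

# Hypothetical covariance matrix for three judges with unit scaling factors,
# unit true score variance, and reliabilities .8, .5 and .5
# (variance of judge i = 1 / reliability of judge i).
cov = [
    [1.25, 1.00, 1.00],
    [1.00, 2.00, 1.00],
    [1.00, 1.00, 2.00],
]
estimate = cronbach_reliability(cov, 0)
```

Because the example covariances are exact population values with equal scaling factors, the covariance formula recovers the first judge's reliability of .8; with sample covariances from few targets the estimate would fluctuate, as the quoted passage warns.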
Probably the most popular method of estimating rater reliabilities is to use the correlation of one judge's ratings with the sum of all judges' ratings. Burt (1936) and Cronbach et al. (1963) recommended this method as providing a quick, easy and accurate estimate. As pointed out by Smith (1974) and Burt (1936) the reliability estimates obtained by this method tend to be inflated since the judge's own ratings are included in the sum. Hence a more accurate estimate is obtained by omitting the judge whose reliability is being estimated from the sum. If the ratings are not standardized to a common variance, then a rater with a large variance will tend to have greater weight in the sum. Thus a better estimate might be obtained by correlating judges with the sum of standardized scores.

F. DIFFERENTIAL WEIGHTING

Error arising from the differential reliability of judges is more difficult to deal with than bias arising from scales of measurement (Borman, 1979). It is not possible to correct for random error in any way analogous to standardization. One feels intuitively that judges with greater reliability ought to be given more weight than judges with lesser reliability. Kelley (1927) was the first in a long line of researchers to provide an argument for weighting judges in proportion to the quantity r(ii)**(1/2) / (1 - r(ii)).
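The weighting quantity above is simple to compute once reliabilities are available. The following sketch assumes the reliabilities are known and strictly less than one; rescaling the weights to sum to one is a convenience added here, and the function name and example values are hypothetical.

```python
import math


def kelley_weights(reliabilities):
    """Weights proportional to r(ii)**(1/2) / (1 - r(ii)), rescaled to sum to one."""
    raw = [math.sqrt(r) / (1.0 - r) for r in reliabilities]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical reliabilities for three judges.
weights = kelley_weights([0.9, 0.6, 0.3])
```

The weights rise steeply as reliability approaches one: in this example the judge with reliability .9 receives several times the weight of the judge with reliability .6.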
Kelley assumes the individual reliabilities are known and that there is one general factor underlying the observations, or in Kelley's (1927) words, the observed scores "are all measures of the same thing". Kelley then regressed the observed scores on the hypothetical "true" or factor scores. The relative weights are given by minors of a determinant. Kelley next expressed the correlation between a judge and the general factor in terms of their respective reliabilities using r(ij) = (r(ii) * r(jj))**(1/2) as discussed above, and substituted these into the determinant. Finally, following a suggestion by Harold Hotelling "which readily led to the evaluation of this determinant", he arrived at the result that judges' scores should be weighted in proportion to the above weights. Burt (1936) derived the same weights along similar lines, apparently independently. Kelley (1947) provided yet another demonstration of these regression weights.

Overall (1965) obtained the same weights, but with a different purpose in mind. He was interested in obtaining a simple formula for weights which would maximize the reliability of a composite. An eigenvector solution to the maximum reliability problem had evolved over a series of papers by Thomson (1940, 1947), Mosier (1943), Peel (1947a, 1947b) and Green (1950). Thomson originally approached the problem by considering the correlation between parallel forms, one common definition of reliability (Allen & Yen, 1979). Suppose tests a, b, c have parallel forms a', b', c'.
Given weights w1, w2, w3 the composite w1a + w2b + w3c is parallel to w1a' + w2b' + w3c'. The aim is to find weights which maximize the correlation between these two sums. Using the method of canonical correlation, one can find weights u1, u2, u3 and v1, v2, v3 such that the correlation between u1a + u2b + u3c and v1a' + v2b' + v3c' is maximized. Now, by symmetry it must be the case that u1=v1, u2=v2 and u3=v3, and hence these weights satisfy the maximum reliability problem. The eigenvalue solution to canonical correlations (Morrison, 1976) can be seen to be equivalent to the weighting for maximum reliability of composite problem (Green, 1950). Overall (1965) utilized the uni-factor assumption and the familiar expression for the correlation between two variables in terms of their respective reliabilities, and on substitution into the eigenvector solution deduced that the weights which maximized the reliability of the composite are proportional to the weights obtained by Kelley and Burt.

G. SUMMARY

To summarize, the procedure to be recommended, which was suggested by Burt (1936), is to convert each judge's ratings into unit standard score form, estimate the individual reliability of each judge using one of the mentioned methods, and to use these estimates to weight the judges to estimate true scores. Burt advocated scaling the true score estimates to unit standard form as well. But another possibility is to attempt to express them in a scale which results in the minimum squared deviation from true scores. That is, to attempt to estimate true scores in an absolute scale (Cronbach et al., 1963).
This involves converting the mean to the overall sample mean of the ratings, estimating the true score standard deviation, s, and the reliability of the composite estimate, r, and scaling the standard deviation of the estimates to s * r**(1/2), so that their variance equals the product of the true score variance and the reliability.

H. MODEL

The present discussion has developed out of consideration of sources of rater bias. Three main factors were considered: scale of measurement (mean and variance), factorial validity, and reliability (or random error). The question of factorial validity has been essentially shelved in making the simplifying assumption that all ratings are the result of one general factor and random error. Hence ratings generated by a given judge are characterized by a scale of measurement and a reliability index. That is, a judge's ratings can be expressed as a linear function of true scores plus random error. The scaling factor corresponds to the regression coefficient of a linear regression equation, which is not the same as the standard deviation of the rating. It has been argued that the regression coefficient corresponds to the judge's unit of measurement (Cureton, 1958). This model of ratings has been advocated by Kelley (1924), Cureton (1931), Burt (1936), Cronbach (1963) and Overall (1968), although Burt and Cronbach also allowed for the possibility of a multi-factorial model. The model can be written in the form:

    X(ij) = YBAR + A(i) + B(i) * (Y(j) - YBAR) + E(ij)

where X(ij) represents the observed rating of judge i for target j.
Y(j) is target j's true score. YBAR is the population true score mean. A(i) and B(i) are judge i's scaling parameters. A(i) is an additive constant and B(i) is a scaling factor or unit of measurement. E(ij) is the random error component. The reliability of judge i is given by B(i)**2 * Var(Y) / Var(X), that is, the "true" score variance divided by the total variance. It is assumed that the average level bias, A(i), over the population of judges is zero and that the average scaling factor B(i) is one (Cronbach et al., 1963). With these assumptions the consensual definition of true scores is satisfied, namely for a fixed target j, the expected value (or mean) of ratings over all judges is equal to the true score for target j.

The factor analytic estimates of reliability are derived from the common factor model as follows: The reliability of a variable can be defined as the squared correlation with true scores (Allen & Yen, 1979). The correlation between a variable with unit variance and a factor is equal to the corresponding factor loading (Gorsuch, 1974). Hence since true scores are represented by factor scores, the reliability of a variable is given by the square of its loading on the common factor.

The common factor model also suggests another method of estimating targets' true scores which, as just mentioned, are represented by factor scores.
If the factor structure is estimated by the method of principal components, then factor scores are given by linear combinations of the ratings using the eigenvector elements as weights. If the method of maximum likelihood factor analysis is used, then factor scores are generally estimated by Thomson's (1951) multiple regression technique (Morrison, 1976; Nunnally, 1978).

I. OBJECTIONS TO BURT'S METHOD

There doesn't appear to be a great deal of empirical evidence with respect to the improved validity or reliability resulting from Burt's method. Burt gives some theoretically derived estimates of improvement for some particular cases, as do Lawshe and Nagle (1952) in the context of composite reliability. The consensus appears to be that the method results in relatively small gains in reliability and validity. The general feeling seems to have been that it is not worth the computational effort.

One objection to Burt's method relates to the current controversy about the merits of differential versus unit weighting (Wainer, 1976). Wiggins (1981) reported on the conclusions reached by Dawes and Corrigan (1974) and Einhorn and Hogarth (1975): These authors suggest that unit weighting may be superior to linearly optimal weighting in a variety of situations involving relatively small sample sizes. The fact that the relatively small sample sizes they are talking about are those that are typically employed in personality assessment lends a certain immediacy to their conclusions.
But basically, their arguments rest on the well-known instability of regression weights in relatively small samples. Wainer (1976) on the other hand takes a somewhat more radical position. He presents a proof that under fairly general circumstances, coefficients in multiple regression models can be replaced with equal weights with almost no loss of accuracy in the original sample. Further he shows that equal weights will have greater robustness than least squares regression coefficients in new samples of subjects (Wiggins, 1981).

The ultimate aim of the present study was to compare the performances of various methods of estimating true scores under different conditions of sample size and rater bias. The rating data tested were simulated for the present study. Simulation has three desirable properties. First, it enables the experimental conditions to be varied systematically. Second, it allows any particular condition to be replicated any desired number of times, thus reducing the effect of sampling error. And third, the actual values of the quantities to be estimated are known and hence the performances of the estimates can be measured exactly. It is possible that analytic expressions may be obtainable for some or perhaps all of the quantities estimated in this study. But, in general, it is difficult to obtain analytic expressions for the expected value of complicated functions of random variables.
Moreover, with the availability of computers, it is not essential (Diaconis & Efron, 1983).

II. METHOD

A. DATA GENERATION

The rating data were simulated with a Fortran program (see Appendix) on the University of British Columbia Amdahl 470/V8 computer. For the generation of data, each judge i was considered to be a "linear filter" (in Overall's (1968) words) with a characteristic mean A(i), a scaling factor B(i), and a reliability r(ii). In all cases, the true scores Y(j) were obtained from a unit normal random number generator (RANDN - see UBC Documentation). The rater mean bias values A(i) were obtained from a random normal distribution with mean zero (in accordance with the model assumptions) and a standard deviation as specified by the experimental condition. The rater scaling factors were, for the most part, obtained from a random normal distribution, with a mean of one and a standard deviation as specified by the experimental condition. Values below zero were truncated in accordance with the assumption that no rater would systematically reverse the order of the true scores (i.e. negative regression coefficients were excluded). To preserve the mean of one (so that the expected value of the B(i) would equal one) values above two were also truncated. Thus the rater scaling factors B(i) were obtained from a normal distribution truncated below zero and above two with a mean of one and a standard deviation as specified by the experimental condition. Truncation decreases the standard deviation of a distribution.
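For a normal distribution truncated symmetrically about its mean, the reduced standard deviation has a closed form. The sketch below uses the standard variance formula for a symmetrically truncated normal; it is offered as an illustration and is not claimed to be the particular formula the thesis takes from its cited source. The function names are hypothetical.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def symmetric_truncated_sd(sigma, half_width):
    """SD of a normal with SD sigma truncated to mean +/- half_width."""
    a = half_width / sigma                    # cut-off in SD units
    mass = 2.0 * std_normal_cdf(a) - 1.0      # probability kept after truncation
    variance_ratio = 1.0 - 2.0 * a * std_normal_pdf(a) / mass
    return sigma * math.sqrt(variance_ratio)


# Scaling factors: mean one, nominal SD .5, truncated below zero and above two.
scaling_sd = symmetric_truncated_sd(0.5, 1.0)
```

For the scaling factors with a nominal standard deviation of .5, truncation at zero and two gives a standard deviation of roughly .44 under this formula.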
The standard deviation after truncation can be calculated from formulas given in Johnson and Kotz (1970). The observed value was compared with the calculated value. Since the shape of the distribution of the B(i) does not appear to be known, the normal distribution was believed to be a safe approximation, but for the sake of comparison the B(i) were also obtained from an appropriate uniform distribution for selected experimental conditions.

The rater reliability values were obtained in much the same way as the scaling factors except the mean was under experimental control and truncation was such that the distribution was symmetric and did not take on values greater than one or less than zero. Rater reliabilities were obtained from an appropriate uniform distribution in selected conditions as well. The standard deviation for reliabilities was specified by the experimental condition.

Hence for a given experimental condition, the relevant data were generated by the following steps:

1. True scores Y(j) were assigned to targets from a random unit normal distribution.
2. Each judge was assigned a level bias value A(i) from a normal distribution with a mean of zero and a standard deviation as specified by the condition.
3. Each judge was assigned a scaling bias B(i) from a normal distribution with a mean of one and a standard deviation as specified by the experimental condition and 0 < B(i) < 2.
4. Each judge was assigned a reliability value r(ii) from a normal distribution with mean and standard deviation as specified by the experimental condition, truncated symmetrically about the mean so that 0 < r(ii) < 1.
5. An "observed" rating for target j by judge i was generated by the following function:

    X(ij) = A(i) + B(i) * Y(j) + (B(i)**2 / r(ii) - B(i)**2)**(1/2) * E(ij)

   where E(ij) was obtained from a random unit normal distribution.

This formula results in judge i having level bias A(i), scaling bias B(i), and reliability r(ii). Moreover since the expected value of A(i) is zero and the expected value of B(i) is one it follows that for a fixed target, the average rating assigned to that target is the correct value (i.e. the target's "true" score). It should be commented that this method of generating ratings is unrealistic in the sense that the values are theoretically unbounded whereas real ratings usually have bounds. This was not expected to have a significant effect on the outcome.

B. EXPERIMENTAL CONDITIONS

The experimental conditions which were manipulated in this study were: the number of judges, the number of targets, the standard deviations of level, scale and reliability bias, and the mean reliability bias. The following four sets of judge and target sample sizes were tested:

1. 5 judges, 10 targets
2. 10 judges, 5 targets
3. 10 judges, 10 targets
4. 20 judges, 20 targets

Each of these judge and target combinations was tested with each of the following three combinations of rater bias:

1. SD(A)=.5, SD(B)=.5, MEAN(r)=.8, SD(r)=.2
2. SD(A)=.5, SD(B)=.5, MEAN(r)=.6, SD(r)=.4
3. SD(A)=.0, SD(B)=.0, MEAN(r)=.6, SD(r)=.4

The first condition, for example, assigns level bias values A(i) to judges from a distribution with a standard deviation of .5 so that about 2/3 of the judges will have level bias values between -.5 and .5, since the mean is zero. Similarly the judges in the first condition were assigned scaling factors from a distribution with a standard deviation of .5, etc. These choices of bias parameters allow the comparison of the effects of high levels of reliability in the population of judges to the effects of low reliability. In addition the effect of the presence of differences in judges' scales of measurement can be examined since the last condition specifies that the values of A(i) and B(i) be equal to the same value (namely zero and one respectively) for all judges. The four sample size conditions were fully crossed with the three rater bias conditions to give a total of twelve conditions for comparison.
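The generation steps and bias conditions above can be sketched as a short program. This is not the thesis's Fortran program: it is a Python illustration in which truncation is implemented by redrawing out-of-range values (an assumption, since the source does not say how truncated values were handled), and the function names, argument names and seed are hypothetical.

```python
import math
import random


def truncated_normal(rng, mean, sd, low, high):
    """Draw from a normal distribution, redrawing until low < x < high."""
    if sd == 0:
        return mean
    while True:
        x = rng.gauss(mean, sd)
        if low < x < high:
            return x


def simulate_ratings(n_judges, n_targets, sd_a, sd_b, mean_r, sd_r, seed=0):
    rng = random.Random(seed)
    # Step 1: unit normal true scores Y(j).
    y = [rng.gauss(0.0, 1.0) for _ in range(n_targets)]
    # Symmetric truncation bounds for reliabilities, kept inside (0, 1).
    half = min(mean_r, 1.0 - mean_r)
    judges = []
    for _ in range(n_judges):
        a = rng.gauss(0.0, sd_a) if sd_a > 0 else 0.0                 # step 2
        b = truncated_normal(rng, 1.0, sd_b, 0.0, 2.0)                # step 3
        r = truncated_normal(rng, mean_r, sd_r,
                             mean_r - half, mean_r + half)            # step 4
        judges.append((a, b, r))
    # Step 5: X(ij) = A(i) + B(i)*Y(j) + (B(i)**2/r(ii) - B(i)**2)**(1/2) * E(ij)
    ratings = [
        [a + b * yj + math.sqrt(b * b / r - b * b) * rng.gauss(0.0, 1.0)
         for yj in y]
        for a, b, r in judges
    ]
    return y, judges, ratings


# Second rater bias condition with 10 judges and 10 targets.
true_scores, judges, ratings = simulate_ratings(10, 10, 0.5, 0.5, 0.6, 0.4, seed=1)
```

Each call produces one replication; repeating it with different seeds would give the replications over which the estimates are averaged.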
One further condition involving ten judges and ten targets with level and scale standard deviations of .5 and reliability mean and standard deviation of .6 and .4 respectively, was tested in which the scaling factors B(i) and the reliabilities r(ii) were obtained from a uniform distribution instead of a truncated normal distribution.

Each condition combination was replicated 150 times. On each replication target true scores, rater bias indices, and "observed" scores were generated according to the specifications of the experimental condition. For each replication the following data were obtained:

C. RATER RELIABILITIES

The reliabilities of the individual judges were estimated by seven of the methods described in the introduction. The results from each method were represented by the mean and standard deviations of the rater reliability estimates, the correlation between the estimates and the actual rater reliability values, and finally, the mean squared deviation of the estimates from the actual value. It should be stressed that the actual population reliability of each judge was available for comparison with the various estimates. The seven methods tested were the following:

1. Shen: Shen's "impractical" method of weighting estimates of the form r(ii) = r(ij) * r(ik) / r(jk) by the inverse of the squared standard error of estimate.
2. Cronbach: Cronbach et al.'s (1963) formula which divides the square of the average covariance between judge i and the other judges by the product of the average covariance between all judges and the variance of judge i.
3. PC: The squared loading on the first principal component of the correlation matrix for judges. The UBC system subroutine SYMAL (see UBC Documentation) was used to compute the principal components.
4. ML: The squared loadings on the first maximum likelihood factor of the judges' correlation matrix. The maximum likelihood factor analysis program was based on an algorithm given in Morrison (1973). The method is based on a technique due to Rao.
5. Avg Fisher-z: The average of the Fisher-z transform of the correlations of a given judge with the other judges was converted back to a correlation coefficient by the functional inverse of the Fisher-z transform.
6. r with Sum: The ratings by each judge were correlated with the sum over all judges excluding the given judge being assessed.
7. r with z-Sum: This is the same as r with Sum except all ratings are converted to standard scores prior to summation.

Finally, the mean and standard deviations of the four measures (mean, SD, correlation with actual, and mean square deviation from actual) over 150 replications were tabulated.

D. RELIABILITY OF SUM

The true reliability of a sum (or mean) of judges when the values A(i), B(i), and r(ii) of the model equation are known can be computed exactly by the ratio of true score variance to total variance.
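The ratio of true score variance to total variance for a sum of judges follows directly from the model. In the sketch below, the true part of the sum has variance (sum of B(i))**2 * Var(Y), and judge i contributes error variance B(i)**2 * Var(Y) * (1/r(ii) - 1), which is the error term implied by the judge reliability B(i)**2 * Var(Y) / Var(X) with independent errors across judges. The function name and example values are hypothetical.

```python
def reliability_of_sum(scaling, reliabilities, true_variance=1.0):
    """True score variance of the summed ratings divided by their total variance."""
    true_part = sum(scaling) ** 2 * true_variance
    error_part = sum(
        b * b * true_variance * (1.0 / r - 1.0)
        for b, r in zip(scaling, reliabilities)
    )
    return true_part / (true_part + error_part)


# Four equivalent judges with unit scaling factors and reliability .5.
example = reliability_of_sum([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5])
```

The level biases A(i) do not enter, since adding a constant to a judge's ratings changes no variance. For equivalent judges the expression reduces to the Spearman-Brown value, which is .8 for the four judges in the example.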
The expected value of the reliability of a sample of judges from a specified population was estimated by computing the mean reliability of the sum for 1,000 samples of judges. Two sample estimates of the reliability of a sum were tested:

1. Alpha: Cronbach's alpha is a common measure of the expected reliability of a sum of raters. It has been proposed by several writers (Hoyt, 1941; Ebel, 1951; Cronbach et al., 1972). Cronbach's alpha is known to be a lower bound for the reliability of a sum (Hunter, 1968) with equality occurring when the raters are equivalent (i.e. equal means, variances and reliabilities).
2. Green: Green's measure utilizes estimates of individual rater reliabilities. Maximum likelihood factor analysis reliability estimates were used. The formula is given in Green (1950).

The means and standard deviations of the estimates obtained over the 150 replications were calculated and compared with each other and with the estimated population values. As well, the mean square deviation between estimates and actual values was computed.

E. WEIGHTING FOR MAXIMUM RELIABILITY

Given known bias values A(i), B(i) and r(ii) for each judge the actual reliability of a weighted composite can be determined using a formula given in Green (1950). If population values are used in calculating the composite reliability then the resulting value is the population value corresponding to the set of weights used. It doesn't matter whether the weights were obtained from sample estimates. Two methods of determining weights for maximizing the composite reliability were tested.

1. PC: The first was the method developed by Thomson (1940, 1947), Mosier (1943), Peel (1947a, 1947b) and finally expressed in terms of principal components by Green (1950). The UBC system routine SYMAL was used to compute the principal components.
2. Overall: The second method tested was one proposed by Overall (1965).

Both methods utilize estimates of rater reliability. The maximum likelihood factor analysis estimates were used. The results of this section are of interest in that the performance of Overall's simplified formulas can be tested as well as the gain in reliability resulting from the weights. The means and standard deviations for both estimates over 150 replications were reported.

F. TRUE SCORE VARIANCE

The population variance of the true scores is in all cases equal to one. Two methods of estimating true score variance from rating data were tested:

1. Avg Cov: Cronbach et al. (1963) defined true score variance as the average covariance between all judges.
2. Avg b=1: True score variance can be estimated from estimated rater variance and reliability by setting the mean value of B(i) equal to one.

The means and standard deviations for both methods over the 150 replications were reported as well as the mean square deviation of each estimate from unity (the actual true score variance).

G. TRUE SCORES

Finally, the estimates of primary concern were true score estimates. Of these, seven methods were tested.

1. Consensus: This is the simple average of "observed" scores.
2. Weighted Consensus: Observed scores are weighted by weights designed to maximize the reliability of the composite using Burt's weights without first standardizing the ratings.
3. Standardized: True scores were estimated by the average of standardized ratings.
4. Weighted Standardized: Standardized ratings were combined using Kelley's, Burt's and Overall's weights.
5. Cronbach-Kelley: True scores were estimated by the raw score consensus weighted by the estimated reliability of the sum plus the overall mean weighted by one minus the reliability of the sum. Green's estimate of the reliability of the sum was used as it was found from preliminary investigations to be more accurate than alpha.
6. PC Scores: True scores were estimated by the linear combination of judges given by the first eigenvector of the correlation matrix. Scores were standardized to have unit variance.
7. ML Scores: True scores were estimated by multiple regression (Thomson, 1951) from the results of a maximum likelihood factor analysis.

For each replication the mean and standard deviation of estimates given by each method were calculated as well as the correlation with the actual true scores. In the case of standardized, weighted standardized, and factor scores, the scale of the true score estimates is arbitrary, hence the mean square deviation from true scores is also arbitrary.
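Several of the simpler true score estimators listed above can be sketched compactly. This is an illustration under stated assumptions rather than the thesis's implementation: ratings are held as one list per judge, standardization uses the sample standard deviation, weights are rescaled to sum to one, and the weights and the reliability of the sum are taken as given inputs. All function names and example values are hypothetical.

```python
import statistics


def weighted_column_means(rows, weights=None):
    """Weighted average across judges (rows) for each target (column)."""
    if weights is None:
        weights = [1.0] * len(rows)
    total = sum(weights)
    return [
        sum(w * row[j] for w, row in zip(weights, rows)) / total
        for j in range(len(rows[0]))
    ]


def standardize(row):
    mean = statistics.mean(row)
    sd = statistics.stdev(row)  # sample SD; an assumption of this sketch
    return [(value - mean) / sd for value in row]


def consensus(ratings):
    return weighted_column_means(ratings)


def weighted_standardized(ratings, weights):
    return weighted_column_means([standardize(row) for row in ratings], weights)


def cronbach_kelley(ratings, reliability_of_sum):
    """Consensus weighted by the reliability of the sum, plus the overall mean
    weighted by one minus that reliability."""
    means = consensus(ratings)
    overall_mean = statistics.mean(means)
    return [
        reliability_of_sum * m + (1.0 - reliability_of_sum) * overall_mean
        for m in means
    ]


# Hypothetical ratings: two judges, three targets; the second judge uses a
# scale twice as wide as the first.
ratings = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
simple = consensus(ratings)
standardized = weighted_standardized(ratings, [0.8, 0.2])
shrunk = cronbach_kelley(ratings, 0.8)
```

In this example the two judges agree perfectly after standardization, so the weighted standardized estimate is the same whatever weights are supplied, while the Cronbach-Kelley estimate pulls each consensus value toward the overall mean of 3.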
Finally, the estimates resulting from each method were rescaled to have mean equal to the overall mean and variance equal to the product of estimated true score variance and the estimated reliability of the particular true score estimate. Each method of estimation is a linear combination of either observed ratings or standardized ratings. Hence the general formula for the reliability of a weighted composite can be used to estimate the reliability of each method of estimating true scores.

III. RESULTS

A. RATER RELIABILITIES

Tables 1 through 12 present the results for the various methods of estimating individual rater reliabilities. The tables correspond to different condition combinations as indicated at the top of the tables. The results of the rater reliability estimates are broken down by sample size and rater bias for each of the measures of mean, standard deviation, correlation with actual and mean squared deviation from actual separately in Tables 13 through 17.

Table 13 contains the mean over 150 replications of the mean reliability estimate given by each method of estimation for each condition combination. The rater bias conditions are represented by numeric arrays such as .5.5.8.2. This represents the condition in which the standard deviations of the distribution of rater bias in level and scale are .5, and the average judge's reliability is .8 and the standard deviation of the reliabilities is .2.
Noteworthy observations are that the mean estimates given by the r with Sum and the r with Z-Sum methods are consistently high, with an apparent increase in bias corresponding to the larger sample size conditions. The PC and Cronbach methods show high means in the small sample conditions but this bias appears to be reduced as sample size increases. The means for the Shen, ML and Avg Fisher-z methods appear to be roughly unbiased.

Table 14 contains the means over the 150 replications of the sample standard deviation of the reliability estimates of each method and each condition combination. For example, in a given replication under a given condition combination, the Shen estimate, say, gives reliability estimates for each judge. The standard deviation of these estimates was computed and the corresponding standard deviations were averaged over the 150 replications. Hence the table contains estimates of the expected standard deviation of each method under each condition. The average sample standard deviations of the actual reliabilities are contained in Tables 1 through 12. The actual standard deviations are less than the standard deviations specified under the experimental conditions (.2 and .4) because of truncation. The resulting standard deviations can be estimated using a formula from Johnson and Kotz (1970). The reliability distribution corresponding to a normal with mean .8 and standard deviation .2 truncated at .6 and 1.0 has a standard deviation of about .11.
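The effect of truncation on the standard deviation can be checked with the standard expression for the variance of a doubly truncated normal distribution. The sketch below is an illustration under that assumption; it is not claimed to reproduce the exact form of the formula as printed in Johnson and Kotz (1970), and the function names are hypothetical.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def truncated_normal_sd(mu, sigma, lower, upper):
    """SD of a normal(mu, sigma) distribution truncated to [lower, upper]."""
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    mass = normal_cdf(b) - normal_cdf(a)           # probability kept after truncation
    shift = (normal_pdf(a) - normal_pdf(b)) / mass  # standardized shift of the mean
    variance = sigma ** 2 * (
        1.0 + (a * normal_pdf(a) - b * normal_pdf(b)) / mass - shift ** 2
    )
    return sqrt(variance)

sd_high = truncated_normal_sd(0.8, 0.2, 0.6, 1.0)  # about 0.108
sd_low = truncated_normal_sd(0.6, 0.4, 0.2, 1.0)   # about 0.216
```

Both cases truncate at one standard deviation on either side of the mean, so the truncated standard deviation is roughly 0.54 times the specified one, consistent with the rounded values .11 and .22 used as benchmarks for Table 14.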
The reliabilities with mean .6 and standard deviation .4 with truncation at .2 and 1.0 result in a standard deviation of about .22. Hence the results of Table 14 are to be compared with the values .11 and .22.

Inspection of Table 14 suggests the following observations: First, the SD for the Cronbach method is considerably inflated at the smaller sample sizes. This bias appears to decrease as the sample size increases. The Avg Fisher-z method has the smallest SDs, followed closely by the r with Sum and the r with Z-Sum, with SDs generally smaller than the actual SDs except in the condition of 10 judges and 5 targets, where they tended to be slightly higher than the actual. The Shen and ML methods had similar SDs. Both were slightly higher than those of the PC method. Finally, the SDs for the Cronbach method were generally largest.

Table 15 contains for each method of estimation under each condition combination the average correlation between estimated and actual rater reliabilities, averaged over 150 replications. These results were averaged over rater bias conditions and presented in Table 16. The size of correlation for all methods appears to be a function most importantly of the number of targets and secondarily of the number of raters. The ML method had the highest average correlation in the 5-10, 10-10 and 20-20 conditions, while the Cronbach estimate was superior in the 10-5 condition. Table 17 presents the correlations between estimates and actual averaged over sample sizes.
In general the correlations are not greatly affected by the distribution of rater bias. The ML method generally performs the best except in the .0.0.6.4 condition, where the Cronbach estimate showed a correlation considerably larger than the other estimates.

Table 18 presents the mean over 150 replications of the mean square deviation between the rater reliability estimates and the actual reliabilities for each method of estimation and each experimental condition. Table 19 presents the average mean square deviation for each method averaged over the rater bias conditions. As with the correlations, the mean squared deviations improve with sample size. The most noteworthy result indicated was the excessive mean square deviation of the Cronbach estimate in the small sample size conditions. Table 20 presents the average mean square deviations averaged over sample sizes. Once again, the Cronbach method had mean square deviations which were considerably larger than the other methods in the .5.5.6.4 and .0.0.6.4 conditions. The PC method had the smallest mean square deviation.

B. RELIABILITY OF SUM

Tables 21 through 32 present the results for each of the combinations of conditions with respect to comparisons of estimates of the reliability of the sum over judges, two methods of weighting judges for maximum reliability, and two estimates of true score variance. These results are discussed separately.
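For reference, coefficient alpha, one of the estimates of the reliability of the sum over judges considered in this study, can be computed from a targets-by-judges table of ratings as sketched below. The function name and the small data set are hypothetical; the Green estimate is not shown because its formula is not given in this section.

```python
from statistics import pvariance

def coefficient_alpha(ratings):
    """Coefficient alpha for the sum over judges.

    ratings[t][j] is the rating given to target t by judge j.
    """
    n_judges = len(ratings[0])
    # Variance across targets of each judge's ratings.
    judge_variances = [
        pvariance([row[j] for row in ratings]) for j in range(n_judges)
    ]
    # Variance across targets of the sum over judges.
    total_variance = pvariance([sum(row) for row in ratings])
    return n_judges / (n_judges - 1) * (1.0 - sum(judge_variances) / total_variance)

# Two judges who differ only in level agree perfectly on the ordering and spacing
# of targets, so alpha reaches its maximum of 1.
alpha_consistent = coefficient_alpha([[1, 2], [2, 3], [3, 4]])
```

Because alpha depends only on variances and covariances, a constant difference in level between judges does not lower it.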
With respect to the reliability of the sum over raters, the population value for each condition combination was estimated by the average value computed over a thousand replications. The results are presented in Table 33. The estimates given by the alpha and Green methods for all conditions are given in Table 34. Figure 1 gives the expected reliability of sum along with the mean reliability of sum of the alpha and Green estimates averaged over rater bias conditions. The Green method of estimation gives a considerably better estimate of the actual reliability of sum. Both estimates improve with increased sample size. Figure 2 gives the actual and estimated reliabilities of the sum of judges averaged over sample sizes. The Green method of estimation again performs better than the alpha method under all conditions.

C. WEIGHTING FOR MAXIMUM RELIABILITY

With respect to the two methods of weighting for maximum reliability, it is evident from Tables 21-32 that the two methods give virtually identical results. Neither method performs well under the 10-judges 5-targets sample size condition. In all other conditions the weights result in increased reliability. The increase is greatest when the unweighted reliability is low and the sample size is large. Figure 3 graphs the actual reliabilities of the unweighted sum as determined by the 1000 replications as well as the actual reliability of the composite, using Overall's (or Burt's) weights to maximize reliability, averaged over 150 replications.
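Why weighting can raise reliability is easiest to see when the judges' reliabilities are known rather than estimated. The sketch below assumes a one-factor model with uncorrelated errors, unit true score variance and standardized ratings, so that a judge with reliability r has loading sqrt(r) and error variance 1 - r. Under that model the reliability of a weighted sum is maximized by weights proportional to loading divided by error variance. This is the textbook result for the model, not necessarily the exact form of Burt's or Overall's weights, which must be estimated from the data; all names and values are illustrative.

```python
def composite_reliability(weights, loadings, error_variances):
    """Reliability of a weighted sum under a one-factor model."""
    true_part = sum(w * l for w, l in zip(weights, loadings)) ** 2
    error_part = sum(w * w * e for w, e in zip(weights, error_variances))
    return true_part / (true_part + error_part)

def max_reliability_weights(loadings, error_variances):
    """Weights proportional to loading / error variance."""
    return [l / e for l, e in zip(loadings, error_variances)]

# Hypothetical judges with known reliabilities.
reliabilities = [0.9, 0.6, 0.3]
loadings = [r ** 0.5 for r in reliabilities]
error_variances = [1.0 - r for r in reliabilities]

equal = composite_reliability([1.0, 1.0, 1.0], loadings, error_variances)
weighted = composite_reliability(
    max_reliability_weights(loadings, error_variances), loadings, error_variances
)
```

With known reliabilities the weighted composite can never be less reliable than the equally weighted sum. In small samples the weights are themselves estimated with error, which is one plausible reason for the poor showing in the 10-judges 5-targets condition.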
These values are averaged over rater bias conditions. The reliability of the weighted composite is in general greater than that of the consensus by about .03. The only exception to this is in the 10-judges 5-targets condition, where this relation is reversed. Figure 4 graphs the same values averaged over sample sizes. The effect appears to be reduced. There is no difference between the two methods in the .5.5.8.2 condition.

[Figure 1. Reliability of the sum averaged over rater bias. Series: actual, Green, alpha. Horizontal axis: sample size (raters-targets).]

[Figure 2. Reliability of the sum averaged over sample size. Series: actual, Green, alpha. Horizontal axis: rater bias.]

[Figure 3. Reliability of consensus and weighted composite averaged over rater bias. Series: consensus, weighted composite. Horizontal axis: sample size (raters-targets).]

D. TRUE SCORE VARIANCE

The two methods of estimating true score variance as indicated in Tables 21 through 32 show very similar patterns of performance. The population true score variance is one for each replication and each condition combination. The estimates give means which appear to show an average near the correct value of one, but with a considerable amount of variance between replications as indicated by the large standard deviations and mean square deviations.
The variance also appears to be slightly reduced when the average rater reliability is high and when there are no differences in the scales of measurement of the judges.

E. TRUE SCORES

The results for the true score estimates are presented in Tables 35 through 46. With respect to the means, only the consensus, weighted consensus and Cronbach-Kelley methods involve non-arbitrary means. The consensus and Cronbach-Kelley methods give identical means and generally appear to provide good estimates of the actual mean (which always has a population value of zero). The weighted consensus mean appears to be somewhat unstable.

[Figure 4. Reliability of consensus and weighted composite averaged over sample size. Series: consensus, weighted composite. Horizontal axis: rater bias.]

The same estimates are the only ones with non-arbitrary standard deviations. The standard deviations for the consensus method tend to be larger than the standard deviations of the true scores, and the standard deviations of the weighted consensus estimate tend to be larger still. The Cronbach-Kelley estimate, on the other hand, consistently shows a standard deviation which is less than the true scores standard deviation (the population value of which is always one).

The mean correlations between estimates and true scores for all condition combinations are presented in Table 47. The mean correlations for all methods of estimation depend primarily on the number of targets and secondarily on the number of judges. All methods perform very well in terms of their correlations.
All methods also show the highest correlations under the .5.5.8.2 condition. Discrepancy in scale bias does not appear to affect any of the methods to a great extent. The highest correlations are obtained by the weighted standardized and maximum likelihood methods, but the simple standardized scores show the highest correlations in the 10-judges 5-targets condition.

The MSE1 values in Tables 35 to 46 represent the mean square deviations (averaged over 150 replications) between the true score estimates and the actual true scores. Once again only the consensus, weighted consensus and Cronbach-Kelley methods have non-arbitrary values. The MSE1 values for these estimates under all condition combinations are presented in Table 48. The weighted consensus shows a considerably higher MSE1 than the other two methods. The MSE1 values for the Cronbach-Kelley method are slightly smaller than those of the consensus method. Both methods show smaller MSE1s under the large sample size condition. They seem to be particularly affected by the number of judges.

The MSE2 values reported in Tables 35 through 46 represent the mean squared deviations (averaged over 150 replications) of true score estimates (converted to the scale of mean equal to overall sample mean and variance equal to the product of estimated true score variance and estimated reliability of the true score estimate) from actual true scores. Table 49 presents the MSE2 values for each method of estimation and each condition combination. It is clear that none of the methods performs exceptionally poorly.
In general the MSEs of the estimates decreased as the number of judges and targets increased. Only the 'absolute' estimates (consensus and Cronbach-Kelley) diverged from this pattern, in that the MSEs for the 10-judges 5-targets condition were slightly lower than those of the 10-judges 10-targets condition.

All estimates showed highest MSEs under the .5.5.6.4 rater bias condition. There were, however, variations in the relative effects of absence of systematic rater bias and increased rater reliability. The consensus, standardized and Cronbach-Kelley methods consistently gave lowest MSEs under the .5.5.8.2 condition. Thus, although the absence of systematic bias improved the performance of these estimates, the increase of average rater reliability from .6 to .8 resulted in a greater improvement. The weighted consensus, weighted standardized and ML estimates resulted in lowest MSEs for the .0.0.6.4 condition except under the 10-judges 5-targets condition. Finally, the PC estimate gave almost equal results for the .5.5.8.2 and the .0.0.6.4 conditions except under the 10-judges 5-targets condition, where the .5.5.8.2 condition gave a smaller MSE.

Comparing the estimates under the 20-judges 20-targets and 10-judges 10-targets conditions, the order of performance from best to worst was: weighted standardized and ML tied for first, weighted consensus second, PC third, standardized fourth, and consensus and Cronbach-Kelley tied for last. This ranking held for each of the rater bias conditions.
The results for the 5-judges 10-targets and 10-judges 5-targets conditions were not as consistent. For the 10-5 condition, standardized and PC seemed to perform slightly better than the others. But in the 5-10 condition, the weighted standardized and ML methods came out ahead with PC a close second, standardized third and consensus and Cronbach-Kelley last. It should be observed that the mean squared deviations for the Cronbach-Kelley method of estimation remained virtually unchanged under rescaling.

Table 50 presents the MSE1 values for the consensus (i.e. before rescaling) and the MSE2 values for the Cronbach-Kelley, standardized, weighted standardized, maximum likelihood, and principal components methods of estimation averaged over rater bias conditions. Figure 5 represents these values (with weighted standardized omitted, because it is virtually the same as ML, and PC omitted because it is of no particular significance) in graph form. It is clear from the graph that the ML and standardized methods gave better true score estimates than either of the consensus or Cronbach-Kelley methods. The standardized method performed better than the others in the 5-judges 10-targets condition, while the other estimates performed their worst under this condition, especially the consensus and the Cronbach-Kelley methods. All methods gave very similar results under the 10-5 condition. The spread was slightly greater in the 10-10 condition, more because of an improvement in the ML method than anything else.
The ML method showed a clear superiority over the others (as did the weighted standardized method) under the 20-20 condition. The Cronbach-Kelley method gave slightly better estimates than the consensus, but followed basically the same pattern. Both methods appeared to be affected predominantly by the number of judges as opposed to the number of targets. The other methods, on the other hand, showed improvement with increases in both the number of judges and the number of targets.

[Figure 5. Mean square deviations averaged over rater bias.]

Table 51 refers to the same variables as Table 50 but averaged over sample size instead of rater bias. Again the results for selected methods are graphed in Figure 6. All methods indicated similar values for the .5.5.8.2 condition. Decreasing the average rater reliability resulted in a spreading out of the methods. Presence of scale of measurement bias seemed to affect the size of the MSEs but not the spread between methods. The best MSEs were given by the weighted standardized (i.e. Burt's 1936 estimates rescaled), followed closely by the ML estimates. PC followed closely behind these, then standardized and finally the Cronbach-Kelley and consensus estimates.

F. UNIFORM DISTRIBUTION

Tables 52 to 54 contain the results for estimates of: reliability (Table 52); reliability of sum, maximum reliability and true score variance (Table 53); and true scores (Table 54) under the sample size condition 10-judges 10-targets and the rater bias condition .5.5.6.4.
The only difference between this experimental condition and a previous one is that the scaling factor B(i) and the reliability values r(ii) associated with each judge were selected at random from a uniform distribution instead of a normal distribution. The intervals of the uniform distribution were chosen such that the distributions had the same means and variances as the normal counterparts. The corresponding normal distribution results are contained in Tables 8, 28 and 42.

[Figure 6. Mean square deviations averaged over sample size. Legible legend entries: consensus, Cronbach-Kelley, standardized, weighted standardized. Horizontal axis: rater bias.]

The patterns of results were preserved in the uniform distribution condition but there tended to be shifts of all results in various directions. In particular, all rater reliability estimates had slightly lower correlations with the actual reliabilities. The reliabilities of the sum were slightly lower. The true score variance estimates were considerably lower (although these estimates appear to be unstable). The corresponding standard deviations were also slightly lower. The correlations of the true score estimates with the actual are slightly lower for the uniform distribution condition, but again the relative order of the various estimates was the same. The mean square deviations were lower for the uniform condition as well, but showed the same pattern as for the normal distribution.
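Matching a uniform distribution to a given mean and variance fixes its interval, since a uniform distribution on [a, b] has mean (a + b)/2 and variance (b - a)^2/12. The sketch below illustrates that relation only; the function name is hypothetical, and the section does not state whether the specified or the truncated normal means and variances were the ones matched.

```python
from math import sqrt

def uniform_interval(mean, sd):
    """Endpoints of the uniform distribution with the given mean and SD."""
    half_width = sqrt(3.0) * sd  # from variance = (b - a)**2 / 12
    return mean - half_width, mean + half_width

# Illustration with the scale factor parameters of the .5.5.6.4 condition
# (mean 1, standard deviation .5).
low, high = uniform_interval(1.0, 0.5)
```

For a mean of 1 and a standard deviation of .5 the interval runs from roughly 0.13 to 1.87.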
Table 1
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 5, Targets 10, Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.793 (0.05)    0.102 (0.03)
Shen            0.795 (0.11)    0.152 (0.07)    0.612 (0.30)    0.026 (0.03)
Cronbach        0.836 (0.12)    0.254 (0.12)    0.449 (0.39)    0.063 (0.06)
PC              0.823 (0.08)    0.107 (0.06)    0.610 (0.31)    0.016 (0.02)
ML              0.787 (0.10)    0.152 (0.07)    0.628 (0.31)    0.022 (0.02)
Avg Fisher-Z    0.792 (0.10)    0.068 (0.04)    0.614 (0.31)    0.015 (0.02)
r with Sum      0.841 (0.08)    0.086 (0.05)    0.584 (0.35)    0.016 (0.01)
r with Z-Sum    0.848 (0.08)    0.086 (0.05)    0.609 (0.32)    0.016 (0.01)

Note. Standard deviations are given in parentheses.

Table 2
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 5, Targets 10, Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.594 (0.09)    0.206 (0.06)
Shen            0.571 (0.20)    0.239 (0.08)    0.646 (0.37)    0.057 (0.04)
Cronbach        0.657 (0.16)    0.323 (0.14)    0.599 (0.37)    0.086 (0.10)
PC              0.675 (0.11)    0.206 (0.09)    0.638 (0.35)    0.045 (0.03)
ML              0.619 (0.13)    0.257 (0.09)    0.663 (0.35)    0.046 (0.03)
Avg Fisher-Z    0.593 (0.16)    0.128 (0.07)    0.647 (0.35)    0.040 (0.03)
r with Sum      0.666 (0.15)    0.169 (0.09)    0.623 (0.35)    0.049 (0.04)
r with Z-Sum    0.687 (0.14)    0.172 (0.09)    0.642 (0.35)    0.048 (0.03)

Note. Standard deviations are given in parentheses.

Table 3
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 5, Targets 10, Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.613 (0.10)    0.211 (0.06)
Shen            0.562 (0.21)    0.251 (0.10)    0.663 (0.32)    0.065 (0.06)
Cronbach        0.653 (0.33)    0.353 (0.60)    0.768 (0.23)    0.440 (3.85)
PC              0.668 (0.12)    0.223 (0.09)    0.656 (0.31)    0.043 (0.03)
ML              0.613 (0.13)    0.273 (0.09)    0.667 (0.34)    0.050 (0.04)
Avg Fisher-Z    0.580 (0.18)    0.143 (0.07)    0.685 (0.26)    0.045 (0.04)
r with Sum      0.655 (0.17)    0.188 (0.09)    0.645 (0.31)    0.051 (0.04)
r with Z-Sum    0.673 (0.16)    0.190 (0.09)    0.681 (0.26)    0.048 (0.04)

Note. Standard deviations are given in parentheses.

Table 4
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 10, Targets 5, Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.800 (0.03)    0.107 (0.02)
Shen            0.798 (0.17)    0.202 (0.11)    0.463 (0.27)    0.065 (0.08)
Cronbach        0.807 (0.13)    0.238 (0.13)    0.485 (0.26)    0.070 (0.13)
PC              0.793 (0.12)    0.180 (0.09)    0.503 (0.25)    0.042 (0.04)
ML              0.778 (0.13)    0.199 (0.09)    0.507 (0.26)    0.048 (0.05)
Avg Fisher-Z    0.779 (0.16)    0.131 (0.09)    0.489 (0.25)    0.047 (0.08)
r with Sum      0.831 (0.13)    0.157 (0.12)    0.491 (0.24)    0.048 (0.07)
r with Z-Sum    0.836 (0.13)    0.156 (0.12)    0.494 (0.24)    0.047 (0.07)

Note. Standard deviations are given in parentheses.

Table 5
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 10, Targets 5, Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.599 (0.07)    0.214 (0.04)
Shen            0.624 (0.20)    0.294 (0.09)    0.503 (0.28)    0.103 (0.05)
Cronbach        0.750 (0.55)    0.424 (0.51)    0.535 (0.28)    0.677 (3.79)
PC              0.667 (0.12)    0.266 (0.07)    0.526 (0.27)    0.074 (0.04)
ML              0.644 (0.13)    0.288 (0.08)    0.528 (0.28)    0.079 (0.04)
Avg Fisher-Z    0.595 (0.22)    0.217 (0.10)    0.529 (0.26)    0.091 (0.08)
r with Sum      0.672 (0.20)    0.281 (0.14)    0.513 (0.26)    0.119 (0.10)
r with Z-Sum    0.687 (0.19)    0.280 (0.14)    0.534 (0.26)    0.115 (0.11)

Note. Standard deviations are given in parentheses.

Table 6
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 10, Targets 5, Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.599 (0.07)    0.212 (0.04)
Shen            0.601 (0.24)    0.276 (0.10)    0.448 (0.31)    0.115 (0.06)
Cronbach        0.684 (0.24)    0.359 (0.34)    0.551 (0.27)    0.261 (1.16)
PC              0.662 (0.14)    0.263 (0.08)    0.479 (0.30)    0.080 (0.04)
ML              0.641 (0.15)    0.285 (0.08)    0.467 (0.33)    0.088 (0.05)
Avg Fisher-Z    0.591 (0.23)    0.210 (0.10)    0.500 (0.25)    0.094 (0.08)
r with Sum      0.676 (0.20)    0.274 (0.14)    0.477 (0.26)    0.118 (0.09)
r with Z-Sum    0.685 (0.19)    0.273 (0.14)    0.503 (0.24)    0.113 (0.08)

Note. Standard deviations are given in parentheses.

Table 7
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 10, Targets 10, Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.805 (0.03)    0.107 (0.02)
Shen            0.826 (0.09)    0.148 (0.07)    0.675 (0.18)    0.022 (0.02)
Cronbach        0.827 (0.08)    0.175 (0.06)    0.623 (0.23)    0.026 (0.02)
PC              0.823 (0.08)    0.129 (0.06)    0.683 (0.18)    0.017 (0.02)
ML              0.807 (0.08)    0.147 (0.06)    0.694 (0.18)    0.019 (0.02)
Avg Fisher-Z    0.816 (0.09)    0.079 (0.04)    0.680 (0.18)    0.014 (0.02)
r with Sum      0.875 (0.06)    0.091 (0.05)    0.671 (0.19)    0.017 (0.01)
r with Z-Sum    0.878 (0.06)    0.091 (0.05)    0.676 (0.19)    0.017 (0.01)

Note. Standard deviations are given in parentheses.

Table 8
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 10, Targets 10, Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.594 (0.07)    0.215 (0.04)
Shen            0.581 (0.16)    0.266 (0.06)    0.727 (0.18)    0.053 (0.04)
Cronbach        0.625 (0.11)    0.279 (0.07)    0.730 (0.18)    0.046 (0.03)
PC              0.638 (0.10)    0.238 (0.06)    0.724 (0.18)    0.038 (0.02)
ML              0.608 (0.11)    0.263 (0.05)    0.744 (0.18)    0.039 (0.03)
Avg Fisher-Z    0.593 (0.13)    0.155 (0.05)    0.716 (0.18)    0.036 (0.02)
r with Sum      0.701 (0.11)    0.197 (0.07)    0.689 (0.18)    0.051 (0.03)
r with Z-Sum    0.716 (0.10)    0.198 (0.07)    0.709 (0.18)    0.051 (0.03)

Note. Standard deviations are given in parentheses.

Table 9
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 10, Targets 10, Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .0        .0        .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.605 (0.07)    0.213 (0.04)
Shen            0.604 (0.15)    0.264 (0.07)    0.703 (0.22)    0.052 (0.03)
Cronbach        0.626 (0.11)    0.258 (0.07)    0.760 (0.19)    0.036 (0.02)
PC              0.653 (0.10)    0.233 (0.06)    0.707 (0.21)    0.038 (0.02)
ML              0.626 (0.11)    0.255 (0.06)    0.715 (0.24)    0.039 (0.02)
Avg Fisher-Z    0.614 (0.13)    0.150 (0.05)    0.695 (0.20)    0.035 (0.02)
r with Sum      0.723 (0.10)    0.190 (0.07)    0.675 (0.22)    0.051 (0.03)
r with Z-Sum    0.732 (0.10)    0.190 (0.07)    0.689 (0.20)    0.051 (0.03)

Note. Standard deviations are given in parentheses.

Table 10
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 20, Targets 20, Replications 150

                Level     Scale     Reliability
Mean            0         1         .80
SD              .50       .50       .20
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.797 (0.02)    0.107 (0.01)
Shen            0.808 (0.05)    0.130 (0.04)    0.813 (0.08)    0.009 (0.01)
Cronbach        0.801 (0.06)    0.137 (0.03)    0.793 (0.09)    0.010 (0.01)
PC              0.802 (0.05)    0.121 (0.03)    0.818 (0.08)    0.008 (0.01)
ML              0.793 (0.05)    0.131 (0.03)    0.826 (0.08)    0.009 (0.01)
Avg Fisher-Z    0.800 (0.05)    0.069 (0.02)    0.812 (0.09)    0.007 (0.01)
r with Sum      0.878 (0.03)    0.077 (0.02)    0.804 (0.09)    0.012 (0.01)
r with Z-Sum    0.880 (0.03)    0.077 (0.02)    0.808 (0.09)    0.013 (0.01)

Note. Standard deviations are given in parentheses.

Table 11
Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N: Raters 20, Targets 20, Replications 150

                Level     Scale     Reliability
Mean            0         1         .60
SD              .50       .50       .40
Distribution    Normal    Normal    Normal

Rater Reliability Estimates

Estimate        Mean            SD              R               MSE
Actual          0.601 (0.04)    0.216 (0.02)
Shen            0.579 (0.11)    0.256 (0.04)    0.862 (0.06)    0.027 (0.02)
Cronbach        0.602 (0.08)    0.250 (0.04)    0.868 (0.06)    0.021 (0.01)
PC              0.610 (0.08)    0.232 (0.04)    0.862 (0.06)    0.019 (0.01)
ML              0.595 (0.08)    0.243 (0.03)    0.871 (0.06)    0.019 (0.01)
Avg Fisher-Z    0.584 (0.09)    0.141 (0.03)    0.842 (0.07)    0.023 (0.01)
r with Sum      0.724 (0.07)    0.178 (0.04)    0.828 (0.07)    0.035 (0.02)
r with Z-Sum    0.732 (0.07)    0.178 (0.04)    0.834 (0.07)    0.036 (0.02)

Note. Standard deviations are given in parentheses.
Table 12

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N Raters 20    Targets 20    Replications 150

              Mean   SD    Distribution
Level         0      .0    Normal
Scale         1      .0    Normal
Reliability   .60    .40   Normal

Rater Reliability Estimates

Estimate       Mean           SD             R              MSE
Actual         0.608 (0.05)   0.216 (0.02)
Shen           0.601 (0.10)   0.251 (0.04)   0.868 (0.06)   0.022 (0.01)
Cronbach       0.611 (0.08)   0.239 (0.04)   0.882 (0.05)   0.017 (0.01)
PC             0.626 (0.08)   0.227 (0.04)   0.867 (0.06)   0.017 (0.01)
ML             0.611 (0.08)   0.240 (0.04)   0.879 (0.06)   0.017 (0.01)
Avg Fisher-Z   0.604 (0.09)   0.138 (0.03)   0.849 (0.06)   0.020 (0.01)
r with Sum     0.741 (0.06)   0.170 (0.04)   0.836 (0.06)   0.035 (0.02)
r with Z-Sum   0.746 (0.06)   0.171 (0.04)   0.840 (0.06)   0.036 (0.02)

Note. Standard deviations are given in parentheses.
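
The four summary columns reported for each estimator in the tables above (Mean, SD, R with the actual reliabilities, and MSE) can be illustrated for a single replication. The following is a minimal sketch and not the program used in the thesis: the function name `summarize_estimates` and the example values are hypothetical, and the SD uses the population form (divisor n), which the source does not specify.

```python
from math import sqrt


def summarize_estimates(estimates, actual):
    """Summarize one replication of rater reliability estimates.

    Returns (mean, sd, r, mse): the mean and SD of the estimates, their
    Pearson correlation with the actual reliabilities, and their mean
    square deviation from the actual reliabilities.
    Assumption: the SD uses divisor n (population form).
    """
    n = len(estimates)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")

    mean_e = sum(estimates) / n
    mean_a = sum(actual) / n

    # Sums of squares and cross-products about the respective means.
    ss_e = sum((e - mean_e) ** 2 for e in estimates)
    ss_a = sum((a - mean_a) ** 2 for a in actual)
    sp = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimates, actual))

    sd_e = sqrt(ss_e / n)
    # The correlation is undefined when either variable is constant.
    r = sp / sqrt(ss_e * ss_a) if ss_e > 0 and ss_a > 0 else float("nan")
    mse = sum((e - a) ** 2 for e, a in zip(estimates, actual)) / n
    return mean_e, sd_e, r, mse


if __name__ == "__main__":
    # Hypothetical values for four raters, for illustration only.
    actual = [0.9, 0.7, 0.5, 0.3]
    estimates = [0.8, 0.8, 0.4, 0.4]
    print(summarize_estimates(estimates, actual))
```

Averaging each of the four returned values over the 150 replications would give one table row; the parenthesized entries would be the standard deviations of those values across replications.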
Table 13

Average Over Replications of Reliability Estimate Means

                             Sample Size (judges-targets)
Estimate       Rater Bias    5-10    10-5    10-10   20-20
Shen           .5.5.8.2      .795    .798    .826    .808
               .5.5.6.4      .571    .624    .581    .579
               .0.0.6.4      .562    .601    .604    .601
Cronbach       .5.5.8.2      .836    .807    .827    .801
               .5.5.6.4      .657    .750    .625    .602
               .0.0.6.4      .653    .684    .626    .611
PC             .5.5.8.2      .823    .793    .823    .802
               .5.5.6.4      .675    .667    .638    .610
               .0.0.6.4      .668    .662    .653    .626
ML             .5.5.8.2      .787    .778    .807    .793
               .5.5.6.4      .619    .644    .608    .595
               .0.0.6.4      .613    .641    .626    .611
Avg Fisher-z   .5.5.8.2      .792    .779    .816    .800
               .5.5.6.4      .593    .595    .593    .584
               .0.0.6.4      .580    .591    .614    .604
r with Sum     .5.5.8.2      .841    .831    .875    .878
               .5.5.6.4      .666    .672    .701    .724
               .0.0.6.4      .655    .676    .723    .741
r with Z-Sum   .5.5.8.2      .848    .836    .878    .880
               .5.5.6.4      .687    .687    .716    .732
               .0.0.6.4      .673    .685    .732    .746

Table 14

Mean Standard Deviations of Reliability Estimates

                             Sample Size (judges-targets)
Estimate       Rater Bias    5-10    10-5    10-10   20-20
Shen           .5.5.8.2      .152    .202    .148    .130
               .5.5.6.4      .239    .294    .266    .256
               .0.0.6.4      .251    .276    .264    .251
Cronbach       .5.5.8.2      .254    .238    .175    .137
               .5.5.6.4      .323    .424    .279    .250
               .0.0.6.4      .353    .359    .258    .239
PC             .5.5.8.2      .107    .180    .129    .121
               .5.5.6.4      .206    .266    .238    .232
               .0.0.6.4      .223    .263    .233    .227
ML             .5.5.8.2      .152    .199    .147    .131
               .5.5.6.4      .257    .288    .263    .243
               .0.0.6.4      .273    .285    .255    .240
Avg Fisher-z   .5.5.8.2      .068    .131    .079    .069
               .5.5.6.4      .128    .217    .155    .141
               .0.0.6.4      .143    .210    .150    .138
r with Sum     .5.5.8.2      .086    .157    .091    .077
               .5.5.6.4      .169    .281    .197    .178
               .0.0.6.4      .188    .274    .190    .170
r with Z-Sum   .5.5.8.2      .086    .156    .091    .077
               .5.5.6.4      .172    .280    .198    .178
               .0.0.6.4      .190    .273    .190    .171

Table 15

Reliabilities

                             Sample Size (judges-targets)
Estimate       Rater Bias    5-10    10-5    10-10   20-20
Shen           .5.5.8.2      .612    .463    .675    .813
               .5.5.6.4      .646    .503    .727    .862
               .0.0.6.4      .663    .448    .703    .868
Cronbach       .5.5.8.2      .449    .485    .623    .793
               .5.5.6.4      .599    .535    .730    .868
               .0.0.6.4      .768    .551    .760    .882
PC             .5.5.8.2      .610    .503    .683    .818
               .5.5.6.4      .638    .526    .724    .862
               .0.0.6.4      .656    .479    .707    .867
ML             .5.5.8.2      .628    .507    .694    .826
               .5.5.6.4      .663    .528    .744    .871
               .0.0.6.4      .667    .467    .715    .879
Avg Fisher-z   .5.5.8.2      .614    .489    .680    .812
               .5.5.6.4      .647    .529    .716    .842
               .0.0.6.4      .685    .500    .695    .849
r with Sum     .5.5.8.2      .584    .491    .671    .804
               .5.5.6.4      .623    .513    .689    .828
               .0.0.6.4      .645    .477    .675    .836
r with Z-Sum   .5.5.8.2      .609    .534    .709    .834
               .5.5.6.4      .642    .534    .709    .834
               .0.0.6.4      .681    .503    .689    .840

Table 16

Correlations Between Reliability Estimates and Actual Reliabilities Averaged Over Rater Bias

               Sample Size (judges-targets)
Estimate       5-10    10-5    10-10   20-20
Shen           .640    .471    .702    .848
Cronbach       .605    .524    .704    .848
PC             .635    .503    .705    .848
ML             .653    .501    .718    .859
Avg Fisher-z   .649    .506    .697    .834
r with Sum     .617    .494    .678    .823
r with Z-Sum   .644    .510    .691    .827

Table 17

Correlations Between Reliability Estimates and Actual Reliabilities Averaged Over Sample Size

               Rater Bias (SD(A),SD(B),Mean(r),SD(r))
Estimate       .5.5.8.2   .5.5.6.4   .0.0.6.4
Shen           .641       .685       .671
Cronbach       .588       .683       .740
PC             .654       .688       .677
ML             .664       .702       .682
Avg Fisher-z   .649       .684       .682
r with Sum     .638       .663       .658
r with Z-Sum   .647       .680       .678

Table 18

Average Mean Square Deviations of Reliability Estimates from Actual Reliabilities

                             Sample Size (judges-targets)
Estimate       Rater Bias    5-10    10-5    10-10   20-20
Shen           .5.5.8.2      .026    .065    .022    .009
               .5.5.6.4      .057    .103    .053    .027
               .0.0.6.4      .065    .115    .052    .022
Cronbach       .5.5.8.2      .063    .070    .026    .010
               .5.5.6.4      .086    .667    .046    .021
               .0.0.6.4      .440    .261    .036    .017
PC             .5.5.8.2      .016    .042    .017    .008
               .5.5.6.4      .045    .074    .038    .019
               .0.0.6.4      .043    .080    .038    .017
ML             .5.5.8.2      .022    .048    .019    .009
               .5.5.6.4      .046    .079    .039    .019
               .0.0.6.4      .050    .088    .039    .017
Avg Fisher-z   .5.5.8.2      .015    .047    .014    .007
               .5.5.6.4      .040    .091    .036    .023
               .0.0.6.4      .045    .094    .035    .020
r with Sum     .5.5.8.2      .016    .048    .017    .012
               .5.5.6.4      .049    .119    .051    .035
               .0.0.6.4      .051    .118    .051    .035
r with Z-Sum   .5.5.8.2      .016    .047    .017    .013
               .5.5.6.4      .048    .115    .051    .036
               .0.0.6.4      .048    .113    .051    .036

Table 19

Mean Square Deviations of Reliability Estimates from Actual Reliabilities Averaged Over Rater Bias

               Sample Size (judges-targets)
Estimate       5-10    10-5    10-10   20-20
Shen           .049    .094    .042    .019
Cronbach       .196    .333    .036    .016
PC             .035    .065    .031    .015
ML             .039    .072    .032    .015
Avg Fisher-z   .033    .077    .028    .017
r with Sum     .039    .095    .040    .027
r with Z-Sum   .037    .092    .040    .028

Table 20

Mean Square Deviations of Reliability Estimates from Actual Reliabilities Averaged Over Sample Size

               Rater Bias (SD(A),SD(B),Mean(r),SD(r))
Estimate       .5.5.8.2   .5.5.6.4   .0.0.6.4
Shen           .031       .060       .064
Cronbach       .042       .205       .189
PC             .021       .044       .045
ML             .025       .046       .049
Avg Fisher-z   .021       .048       .049
r with Sum     .023       .064       .064
r with Z-Sum   .023       .063       .062

Table 21

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 5    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .80    .20   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.881   0.066   0.008
Green       0.931   0.047   0.002

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.946   0.035
Overall     0.946   0.035

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     1.029   0.733   0.534
Avg b = 1   1.076   0.747   0.560

Table 22

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 5    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .60    .40   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.766   0.134   0.021
Green       0.825   0.119   0.014

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.872   0.099
Overall     0.872   0.098

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     0.995   0.667   0.442
Avg b = 1   1.062   0.690   0.477

Table 23

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 5    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .0    Normal
Scale         1      .0    Normal
Reliability   .60    .40   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.788   0.161   0.029
Green       0.820   0.128   0.017

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.878   0.105
Overall     0.882   0.091

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     0.943   0.545   0.299
Avg b = 1   0.970   0.544   0.295

Table 24

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.
N Raters 10    Targets 5    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .80    .20   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.915   0.098   0.012
Green       0.951   0.074   0.006

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.944   0.057
Overall     0.945   0.055

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     0.899   0.782   0.617
Avg b = 1   0.924   0.794   0.632

Table 25

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 10    Targets 5    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .60    .40   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.786   0.252   0.076
Green       0.866   0.157   0.026

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.860   0.158
Overall     0.873   0.127

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     0.968   0.792   0.625
Avg b = 1   1.018   0.804   0.642

Table 26

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 10    Targets 5    Replications 150

              Mean   SD    Distribution
Level         0      .0    Normal
Scale         1      .0    Normal
Reliability   .60    .40   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.836   0.201   0.046
Green       0.887   0.126   0.016

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.848   0.154
Overall     0.858   0.134

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     1.022   0.792   0.624
Avg b = 1   1.049   0.790   0.623

Table 27

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 10    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .80    .20   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.943   0.031   0.002
Green       0.968   0.022   0.000

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.979   0.015
Overall     0.979   0.015

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     1.013   0.528   0.277
Avg b = 1   1.037   0.536   0.286

Table 28

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 10    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .60    .40   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.862   0.079   0.008
Green       0.893   0.066   0.004

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.941   0.071
Overall     0.944   0.047

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     1.036   0.647   0.417
Avg b = 1   1.067   0.660   0.437

Table 29

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.
N Raters 10    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .0    Normal
Scale         1      .0    Normal
Reliability   .60    .40   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.900   0.061   0.004
Green       0.913   0.052   0.003

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.947   0.048
Overall     0.947   0.048

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     1.038   0.476   0.227
Avg b = 1   1.050   0.476   0.227

Table 30

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 20    Targets 20    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .80    .20   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.973   0.008   0.000
Green       0.983   0.007   0.000

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.993   0.004
Overall     0.993   0.004

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     0.966   0.320   0.103
Avg b = 1   0.976   0.323   0.104

Table 31

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 20    Targets 20    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .60    .40   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.928   0.029   0.001
Green       0.941   0.028   0.001

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.985   0.009
Overall     0.985   0.009

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     0.950   0.399   0.161
Avg b = 1   0.962   0.402   0.162

Table 32

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 20    Targets 20    Replications 150

              Mean   SD    Distribution
Level         0      .0    Normal
Scale         1      .0    Normal
Reliability   .60    .40   Normal

Reliability of Sum

Estimate    Mean    SD      MSE
Alpha       0.951   0.020   0.000
Green       0.953   0.019   0.000

Weighting for Maximum Reliability

Estimate    Mean    SD
PC          0.986   0.009
Overall     0.986   0.009

True Score Variance

Estimate    Mean    SD      MSE
Avg Cov     1.009   0.333   0.110
Avg b = 1   1.011   0.333   0.110

Table 33

Expected Reliability of the Sum of Raters

Number      Rater Bias (SD(A),SD(B),Mean(r),SD(r))
of Raters   .5.5.8.2   .5.5.6.4   .0.0.6.4
5           .9404      .8236      .8422
10          .9687      .8992      .9123
20          .9839      .9463      .9540

Table 34

Estimates of Reliability of the Sum

                         Sample Size (judges-targets)
Estimate   Rater Bias    5-10    10-5    10-10   20-20
Alpha      .5.5.8.2      .881    .915    .943    .973
           .5.5.6.4      .766    .786    .862    .928
           .0.0.6.4      .788    .836    .900    .951
Green      .5.5.8.2      .931    .951    .968    .983
           .5.5.6.4      .825    .866    .893    .941
           .0.0.6.4      .820    .887    .913    .953

Table 35

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 5    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .80    .20   Normal

True Scores

Estimate       Mean            SD             R              MSE1           MSE2
Actual         -0.014 (0.30)   0.963 (0.22)
Consensus      -0.015 (0.37)   1.018 (0.32)   0.964 (0.03)   0.151 (0.11)   0.143 (0.11)
Wconsensus     -0.028 (0.44)   1.014 (0.37)   0.970 (0.03)   0.231 (0.24)   0.134 (0.11)
Standardized   -0.000 (0.00)   0.904 (0.05)   0.972 (0.02)   0.163 (0.14)   0.133 (0.10)
Wstandardize   -0.000 (0.00)   0.955 (0.03)   0.972 (0.03)   0.166 (0.14)   0.132 (0.11)
Cronb-Kelley   -0.015 (0.37)   0.955 (0.33)   0.964 (0.03)   0.143 (0.11)   0.143 (0.11)
PC Scores      -0.000 (0.00)   1.000 (0.00)   0.973 (0.02)   0.175 (0.14)   0.131 (0.10)
ML Scores      -0.000 (0.00)   0.987 (0.01)   0.972 (0.03)   0.171 (0.15)   0.132 (0.11)

Note. Standard deviations are given in parentheses.

Table 36

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 5    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .60    .40   Normal

True Scores

Estimate       Mean            SD             R              MSE1           MSE2
Actual         -0.005 (0.30)   0.993 (0.22)
Consensus      -0.014 (0.36)   1.078 (0.30)   0.899 (0.08)   0.301 (0.19)   0.265 (0.16)
Wconsensus      0.013 (0.44)   1.117 (0.35)   0.932 (0.07)   0.340 (0.29)   0.221 (0.15)
Standardized    0.000 (0.00)   0.803 (0.09)   0.928 (0.05)   0.247 (0.15)   0.229 (0.14)
Wstandardize    0.000 (0.00)   0.924 (0.05)   0.938 (0.07)   0.226 (0.16)   0.213 (0.15)
Cronb-Kelley   -0.014 (0.36)   0.901 (0.32)   0.899 (0.08)   0.263 (0.16)   0.265 (0.16)
PC Scores       0.000 (0.00)   1.000 (0.00)   0.934 (0.05)   0.242 (0.14)   0.217 (0.14)
ML Scores       0.000 (0.00)   0.973 (0.02)   0.937 (0.07)   0.230 (0.16)   0.214 (0.15)

Note. Standard deviations are given in parentheses.
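
The distinction between the MSE1 and MSE2 columns follows from the table caption: MSE1 compares the raw true score estimates with the actual true scores, while MSE2 first rescales the estimates to a target mean and a target variance. The following is a minimal sketch of that rescaling under stated assumptions, not the thesis's own code: the function names, the use of the population variance (divisor n), and the example values are all hypothetical, and the way the thesis obtained the estimated true score mean, variance and reliability is not shown here.

```python
from math import sqrt


def mean_square_deviation(estimates, actual):
    """Mean of squared differences between estimates and actual scores."""
    return sum((e - a) ** 2 for e, a in zip(estimates, actual)) / len(actual)


def rescale(scores, target_mean, target_variance):
    """Linearly rescale scores to the given mean and variance.

    Assumption: variance is computed with divisor n.
    """
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / n
    if variance == 0:
        raise ValueError("cannot rescale constant scores")
    factor = sqrt(target_variance / variance)
    return [target_mean + factor * (s - mean) for s in scores]


def mse1_and_mse2(estimates, actual, est_true_mean, est_true_var, est_reliability):
    """Return (MSE1, MSE2) for one replication.

    MSE1: deviation of the raw estimates from the actual true scores.
    MSE2: deviation after rescaling the estimates to mean est_true_mean
          and variance est_true_var * est_reliability, as in the caption.
    """
    mse1 = mean_square_deviation(estimates, actual)
    rescaled = rescale(estimates, est_true_mean, est_true_var * est_reliability)
    mse2 = mean_square_deviation(rescaled, actual)
    return mse1, mse2


if __name__ == "__main__":
    # Hypothetical data: estimates perfectly ordered but on an inflated scale.
    actual = [-1.0, 0.0, 1.0]
    estimates = [-2.0, 0.0, 2.0]
    print(mse1_and_mse2(estimates, actual, 0.0, 1.0, 2.0 / 3.0))
```

In this hypothetical case the estimates are correct up to a scale factor, so the rescaling removes nearly all of the error. This mirrors the pattern in the tables, where MSE2 is usually no larger than MSE1 for estimates that are not already on the true score scale.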
Table 37

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 5    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .0    Normal
Scale         1      .0    Normal
Reliability   .60    .40   Normal

True Scores

Estimate       Mean           SD             R              MSE1           MSE2
Actual         0.035 (0.33)   0.952 (0.23)
Consensus      0.028 (0.36)   1.038 (0.25)   0.904 (0.08)   0.197 (0.14)   0.169 (0.10)
Wconsensus     0.000 (0.36)   1.087 (0.26)   0.928 (0.09)   0.180 (0.24)   0.129 (0.11)
Standardized   0.000 (0.00)   0.794 (0.09)   0.928 (0.06)   0.251 (0.18)   0.142 (0.09)
Wstandardize   0.000 (0.00)   0.922 (0.05)   0.931 (0.09)   0.242 (0.18)   0.125 (0.10)
Cronb-Kelley   0.028 (0.36)   0.868 (0.29)   0.904 (0.08)   0.167 (0.10)   0.169 (0.10)
PC Scores      0.000 (0.00)   1.000 (0.00)   0.924 (0.11)   0.270 (0.21)   0.132 (0.10)
ML Scores      0.000 (0.00)   0.972 (0.02)   0.929 (0.10)   0.254 (0.19)   0.126 (0.10)

Note. Standard deviations are given in parentheses.

Table 38

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).
N Raters 10    Targets 5    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .80    .20   Normal

True Scores

Estimate       Mean           SD             R              MSE1           MSE2
Actual         0.054 (0.46)   0.912 (0.33)
Consensus      0.046 (0.50)   0.904 (0.36)   0.975 (0.05)   0.069 (0.06)   0.070 (0.06)
Wconsensus     0.075 (0.63)   0.953 (0.43)   0.973 (0.05)   0.264 (0.28)   0.070 (0.06)
Standardized   0.000 (0.00)   0.867 (0.10)   0.979 (0.04)   0.288 (0.29)   0.067 (0.06)
Wstandardize   0.000 (0.00)   0.978 (0.04)   0.973 (0.05)   0.324 (0.30)   0.070 (0.06)
Cronb-Kelley   0.046 (0.50)   0.874 (0.37)   0.975 (0.05)   0.069 (0.06)   0.070 (0.06)
PC Scores      0.000 (0.00)   1.000 (0.00)   0.978 (0.05)   0.330 (0.31)   0.066 (0.06)
ML Scores      0.000 (0.00)   0.998 (0.00)   0.973 (0.05)   0.334 (0.31)   0.070 (0.06)

Note. Standard deviations are given in parentheses.

Table 39

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 10    Targets 5    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .60    .40   Normal

True Scores

Estimate       Mean            SD             R              MSE1           MSE2
Actual         -0.012 (0.43)   0.949 (0.35)
Consensus      -0.018 (0.49)   0.981 (0.37)   0.926 (0.15)   0.155 (0.13)   0.150 (0.12)
Wconsensus     -0.006 (0.61)   1.105 (0.48)   0.945 (0.10)   0.384 (0.68)   0.135 (0.11)
Standardized   -0.000 (0.00)   0.748 (0.15)   0.951 (0.12)   0.305 (0.28)   0.131 (0.11)
Wstandardize   -0.000 (0.00)   0.956 (0.07)   0.940 (0.14)   0.326 (0.30)   0.135 (0.11)
Cronb-Kelley   -0.018 (0.49)   0.885 (0.41)   0.926 (0.15)   0.150 (0.12)   0.150 (0.12)
PC Scores      -0.000 (0.00)   1.000 (0.00)   0.941 (0.16)   0.343 (0.32)   0.131 (0.11)
ML Scores      -0.000 (0.00)   0.996 (0.00)   0.928 (0.19)   0.350 (0.32)   0.136 (0.11)

Note. Standard deviations are given in parentheses.

Table 40

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 10    Targets 5    Replications 150

              Mean   SD    Distribution
Level         0      .0    Normal
Scale         1      .0    Normal
Reliability   .60    .40   Normal

True Scores

Estimate       Mean           SD             R              MSE1           MSE2
Actual         0.017 (0.42)   0.943 (0.35)
Consensus      0.026 (0.45)   0.992 (0.37)   0.946 (0.08)   0.091 (0.08)   0.088 (0.07)
Wconsensus     0.014 (0.48)   1.176 (0.40)   0.926 (0.15)   0.246 (0.41)   0.088 (0.08)
Standardized   0.000 (0.00)   0.747 (0.15)   0.960 (0.05)   0.290 (0.32)   0.078 (0.06)
Wstandardize   0.000 (0.00)   0.960 (0.05)   0.927 (0.15)   0.323 (0.31)   0.087 (0.08)
Cronb-Kelley   0.026 (0.45)   0.911 (0.40)   0.946 (0.08)   0.087 (0.07)   0.088 (0.07)
PC Scores      0.000 (0.00)   1.000 (0.00)   0.938 (0.16)   0.334 (0.33)   0.080 (0.09)
ML Scores      0.000 (0.00)   0.996 (0.00)   0.912 (0.22)   0.354 (0.34)   0.092 (0.10)

Note. Standard deviations are given in parentheses.
Table 41

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 10    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .80    .20   Normal

True Scores

Estimate       Mean            SD             R              MSE1           MSE2
Actual         -0.014 (0.31)   0.994 (0.22)
Consensus       0.012 (0.37)   1.000 (0.25)   0.985 (0.01)   0.072 (0.05)   0.072 (0.05)
Wconsensus      0.003 (0.41)   1.013 (0.33)   0.990 (0.01)   0.139 (0.13)   0.064 (0.05)
Standardized   -0.000 (0.00)   0.902 (0.05)   0.988 (0.01)   0.152 (0.14)   0.067 (0.05)
Wstandardize   -0.000 (0.00)   0.963 (0.02)   0.990 (0.01)   0.151 (0.14)   0.063 (0.05)
Cronb-Kelley    0.012 (0.37)   0.972 (0.26)   0.985 (0.01)   0.071 (0.05)   0.072 (0.05)
PC Scores      -0.000 (0.00)   1.000 (0.00)   0.989 (0.01)   0.158 (0.14)   0.066 (0.05)
ML Scores      -0.000 (0.00)   0.995 (0.00)   0.990 (0.01)   0.155 (0.14)   0.063 (0.05)

Note. Standard deviations are given in parentheses.

Table 42

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).
N Raters 10    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .60    .40   Normal

True Scores

Estimate       Mean           SD             R              MSE1           MSE2
Actual         0.003 (0.32)   0.981 (0.22)
Consensus      0.004 (0.37)   1.037 (0.31)   0.939 (0.05)   0.174 (0.11)   0.161 (0.10)
Wconsensus     0.019 (0.44)   1.082 (0.37)   0.970 (0.03)   0.215 (0.25)   0.114 (0.09)
Standardized   0.000 (0.00)   0.773 (0.08)   0.960 (0.03)   0.221 (0.18)   0.131 (0.09)
Wstandardize   0.000 (0.00)   0.920 (0.07)   0.972 (0.03)   0.181 (0.17)   0.110 (0.08)
Cronb-Kelley   0.004 (0.37)   0.937 (0.31)   0.939 (0.05)   0.161 (0.10)   0.161 (0.10)
PC Scores      0.000 (0.00)   1.000 (0.00)   0.967 (0.03)   0.201 (0.17)   0.120 (0.08)
ML Scores      0.000 (0.00)   0.988 (0.01)   0.972 (0.04)   0.188 (0.17)   0.110 (0.08)

Note. Standard deviations are given in parentheses.

Table 43

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 10    Targets 10    Replications 150

              Mean   SD    Distribution
Level         0      .0    Normal
Scale         1      .0    Normal
Reliability   .60    .40   Normal

True Scores

Estimate       Mean           SD             R              MSE1           MSE2
Actual         0.028 (0.31)   0.986 (0.21)
Consensus      0.026 (0.34)   1.042 (0.21)   0.953 (0.04)   0.097 (0.05)   0.089 (0.05)
Wconsensus     0.033 (0.33)   1.097 (0.23)   0.972 (0.03)   0.091 (0.14)   0.061 (0.05)
Standardized   0.000 (0.00)   0.785 (0.08)   0.965 (0.02)   0.203 (0.18)   0.073 (0.04)
Wstandardize   0.000 (0.00)   0.930 (0.04)   0.974 (0.03)   0.174 (0.16)   0.059 (0.04)
Cronb-Kelley   0.026 (0.34)   0.960 (0.23)   0.953 (0.04)   0.089 (0.05)   0.089 (0.05)
PC Scores      0.000 (0.00)   1.000 (0.00)   0.972 (0.02)   0.184 (0.16)   0.063 (0.04)
ML Scores      0.000 (0.00)   0.990 (0.01)   0.974 (0.03)   0.177 (0.16)   0.059 (0.04)

Note. Standard deviations are given in parentheses.

Table 44

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 20    Targets 20    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .80    .20   Normal

True Scores

Estimate       Mean            SD             R              MSE1           MSE2
Actual         -0.004 (0.21)   0.983 (0.15)
Consensus      -0.012 (0.25)   0.982 (0.16)   0.991 (0.01)   0.035 (0.02)   0.035 (0.02)
Wconsensus     -0.042 (0.31)   0.980 (0.22)   0.996 (0.00)   0.083 (0.09)   0.027 (0.02)
Standardized    0.000 (0.00)   0.892 (0.03)   0.993 (0.00)   0.078 (0.08)   0.032 (0.02)
Wstandardize    0.000 (0.00)   0.951 (0.02)   0.996 (0.00)   0.070 (0.07)   0.026 (0.02)
Cronb-Kelley   -0.012 (0.25)   0.966 (0.16)   0.991 (0.01)   0.034 (0.02)   0.035 (0.02)
PC Scores       0.000 (0.00)   1.000 (0.00)   0.994 (0.00)   0.077 (0.07)   0.031 (0.02)
ML Scores       0.000 (0.00)   0.997 (0.00)   0.996 (0.00)   0.072 (0.07)   0.026 (0.02)

Note. Standard deviations are given in parentheses.
Table 45

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 20    Targets 20    Replications 150

              Mean   SD    Distribution
Level         0      .50   Normal
Scale         1      .50   Normal
Reliability   .60    .40   Normal

True Scores

Estimate       Mean            SD             R              MSE1           MSE2
Actual         -0.005 (0.24)   0.968 (0.17)
Consensus      -0.000 (0.27)   0.988 (0.20)   0.971 (0.02)   0.078 (0.04)   0.075 (0.04)
Wconsensus      0.018 (0.35)   1.004 (0.27)   0.990 (0.01)   0.107 (0.10)   0.044 (0.03)
Standardized    0.000 (0.00)   0.760 (0.06)   0.982 (0.01)   0.136 (0.10)   0.059 (0.03)
Wstandardize    0.000 (0.00)   0.913 (0.04)   0.992 (0.01)   0.094 (0.09)   0.042 (0.03)
Cronb-Kelley   -0.000 (0.27)   0.933 (0.20)   0.971 (0.02)   0.075 (0.04)   0.075 (0.04)
PC Scores       0.000 (0.00)   1.000 (0.00)   0.986 (0.01)   0.109 (0.09)   0.053 (0.03)
ML Scores       0.000 (0.00)   0.994 (0.00)   0.992 (0.01)   0.097 (0.09)   0.042 (0.03)

Note. Standard deviations are given in parentheses.

Table 46

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).
N Raters 20     Targets 20     Replications 150

               Mean    SD     Distribution
Level          0       .0     Normal
Scale          1       .0     Normal
Reliability    .60     .40    Normal

True Scores

Estimate        Mean      SD        R         MSE1      MSE2
Actual         -0.005     0.986
               (0.21)    (0.16)
Consensus      -0.001     1.015     0.975     0.050     0.048
               (0.22)    (0.16)    (0.01)    (0.02)    (0.02)
Wconsensus     -0.003     1.020     0.992     0.018     0.019
               (0.22)    (0.15)    (0.01)    (0.01)    (0.01)
Standardized    0.000     0.773     0.982     0.127     0.036
               (0.00)    (0.05)    (0.01)    (0.10)    (0.01)
Wstandardize    0.000     0.919     0.992     0.081     0.018
               (0.00)    (0.04)    (0.01)    (0.09)    (0.01)
Cronb-Kelley   -0.001     0.970     0.975     0.048     0.048
               (0.22)    (0.17)    (0.01)    (0.02)    (0.02)
PC Scores      -0.000     1.000     0.985     0.095     0.030
               (0.00)    (0.00)    (0.01)    (0.08)    (0.01)
ML Scores       0.000     0.995     0.992     0.082     0.018
               (0.00)    (0.00)    (0.01)    (0.08)    (0.01)

Note. Standard deviations are given in parentheses.

Table 47

Mean Correlation Between True Score Estimates and Actual True Scores

                                      Sample Size (judges-targets)
Estimate                Rater Bias    5-10     10-5     10-10    20-20
Consensus               .5.5.8.2      .964     .975     .985     .991
                        .5.5.6.4      .899     .926     .939     .971
                        .0.0.6.4      .904     .946     .953     .975
Weighted Consensus      .5.5.8.2      .970     .973     .990     .996
                        .5.5.6.4      .932     .945     .970     .990
                        .0.0.6.4      .928     .926     .972     .992
Standardized            .5.5.8.2      .972     .979     .988     .993
                        .5.5.6.4      .928     .951     .960     .982
                        .0.0.6.4      .928     .960     .965     .982
Weighted Standardized   .5.5.8.2      .972     .973     .990     .996
                        .5.5.6.4      .938     .940     .972     .992
                        .0.0.6.4      .931     .927     .974     .992
Cronbach-Kelley         .5.5.8.2      .964     .975     .985     .991
                        .5.5.6.4      .899     .926     .939     .971
                        .0.0.6.4      .904     .946     .953     .975
PC                      .5.5.8.2      .973     .978     .989     .994
                        .5.5.6.4      .934     .941     .967     .986
                        .0.0.6.4      .924     .938     .972     .985
ML                      .5.5.8.2      .972     .973     .990     .996
                        .5.5.6.4      .937     .928     .972     .992
                        .0.0.6.4      .929     .912     .974     .992

Table 48

Means for Absolute Scale True Score Estimates of Mean Square Deviations from Actual True Scores

                                      Sample Size (judges-targets)
Estimate                Rater Bias    5-10     10-5     10-10    20-20
Consensus               .5.5.8.2      .151     .069     .072     .035
                        .5.5.6.4      .301     .155     .174     .078
                        .0.0.6.4      .197     .091     .097     .050
Weighted Consensus      .5.5.8.2      .231     .264     .139     .083
                        .5.5.6.4      .340     .384     .215     .107
                        .0.0.6.4      .180     .246     .091     .018
Cronbach-Kelley         .5.5.8.2      .143     .069     .071     .034
                        .5.5.6.4      .263     .150     .161     .075
                        .0.0.6.4      .167     .087     .089     .048

Table 49

Mean MSE of True Score Estimates Standardized to Estimated Optimal Scale from Actual True Scores

                                      Sample Size (judges-targets)
Estimate                Rater Bias    5-10     10-5     10-10    20-20
Consensus (adjusted)    .5.5.8.2      .143     .070     .072     .035
                        .5.5.6.4      .265     .150     .161     .075
                        .0.0.6.4      .169     .088     .089     .048
Weighted Consensus      .5.5.8.2      .134     .070     .064     .027
                        .5.5.6.4      .221     .135     .114     .044
                        .0.0.6.4      .129     .088     .061     .019
Standardized            .5.5.8.2      .133     .067     .067     .032
                        .5.5.6.4      .229     .131     .131     .059
                        .0.0.6.4      .142     .078     .073     .036
Weighted Standardized   .5.5.8.2      .132     .070     .063     .026
                        .5.5.6.4      .213     .135     .110     .042
                        .0.0.6.4      .125     .087     .059     .018
Cronbach-Kelley         .5.5.8.2      .143     .070     .072     .035
                        .5.5.6.4      .265     .150     .161     .075
                        .0.0.6.4      .169     .088     .089     .048
PC                      .5.5.8.2      .131     .066     .066     .031
                        .5.5.6.4      .217     .131     .120     .053
                        .0.0.6.4      .132     .080     .063     .030
ML                      .5.5.8.2      .132     .070     .063     .026
                        .5.5.6.4      .214     .136     .110     .042
                        .0.0.6.4      .126     .092     .059     .018

Table 50

Mean Squared Deviations Averaged Over Rater Bias

                          Sample Size (judges-targets)
Estimate                  5-10     10-5     10-10    20-20
Consensus (unadjusted)    .216     .105     .114     .054
Cronbach-Kelley           .191     .102     .107     .052
Standardized              .110     .092     .090     .042
Weighted-standardized     .157     .097     .077     .029
ML                        .156     .099     .077     .029
PC                        .160     .092     .083     .038

Table 51

Mean Squared Deviations Averaged Over Sample Size

                          Rater Bias (SD(A),SD(B),Mean(r),SD(r))
Estimate                  .5.5.8.2    .5.5.6.4    .0.0.6.4
Consensus (unadjusted)    .082        .177        .109
Cronbach-Kelley           .079        .162        .098
Standardized              .075        .138        .082
Weighted Standardized     .073        .125        .072
ML                        .073        .126        .074
PC                        .074        .130        .076

Table 52

Means (over replications) of Means, SD's, Correlations with Actual Reliabilities and Mean Square Deviations from Actual Reliabilities of Rater Reliability Estimates.

N Raters 10     Targets 10     Replications 150

               Mean    SD     Distribution
Level          0       .50    Normal
Scale          1       .50    Uniform
Reliability    .60     .40    Uniform

Rater Reliability Estimates

Estimate        Mean             SD               R                MSE
Actual          0.593 (0.07)     0.218 (0.04)
Shen            0.538 (0.18)     0.267 (0.06)     0.694 (0.16)     0.065 (0.04)
Cronbach        0.602 (0.13)     0.293 (0.10)     0.686 (0.17)     0.061 (0.09)
PC              0.613 (0.11)     0.248 (0.06)     0.693 (0.16)     0.042 (0.02)
ML              0.582 (0.12)     0.270 (0.06)     0.714 (0.18)     0.045 (0.03)
Avg Fisher-Z    0.557 (0.15)     0.164 (0.05)     0.674 (0.16)     0.044 (0.03)
r with Sum      0.669 (0.13)     0.211 (0.08)     0.638 (0.17)     0.054 (0.03)
r with Z-Sum    0.688 (0.11)     0.216 (0.08)     0.664 (0.16)     0.053 (0.03)

Note. Standard deviations are given in parentheses.
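The four summary columns of Table 52 (Mean, SD, R and MSE) can be illustrated for a single replication with the minimal Python sketch below. The function names `pearson_r` and `summarize_estimates`, the example values, and the choice of the sample (n - 1) standard deviation are assumptions made for illustration only; the thesis does not state which divisor was used, and this is not the original simulation code.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def summarize_estimates(estimated, actual):
    """Summarize one replication of rater reliability estimates.

    estimated, actual: one value per judge (hypothetical inputs).
    """
    return {
        "mean": mean(estimated),                 # level of the estimates
        "sd": stdev(estimated),                  # spread across judges (n - 1 divisor, assumed)
        "r": pearson_r(estimated, actual),       # relative agreement with actual
        "mse": mean((e - a) ** 2 for e, a in zip(estimated, actual)),  # absolute agreement
    }


# Hypothetical example: every estimate is .10 too high, so the
# correlation is perfect while the mean square deviation is not zero.
summary = summarize_estimates([0.5, 0.7, 0.9], [0.4, 0.6, 0.8])
print(summary)
```

The example shows why the table reports both R and MSE: an estimator can rank judges perfectly and still be biased in level.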
Table 53

Comparison of Estimates of Reliability of Sums of Raters; Population Values of Reliability of Composites Weighted by Two Methods; Comparison of Estimates of True Score Variance.

N Raters 10     Targets 10     Replications 150

               Mean    SD     Distribution
Level          0       .50    Normal
Scale          1       .50    Uniform
Reliability    .60     .40    Uniform

Reliability of Sum

Estimate     Mean     SD       MSE
Alpha        0.836    0.098    0.013
Green        0.871    0.085    0.008

Weighting for Maximum Reliability

Estimate     Mean     SD
PC           0.943    0.042
Overall      0.943    0.042

True Score Variance

Estimate     Mean     SD       MSE
Avg Cov      0.844    0.533    0.307
Avg b = 1    0.874    0.543    0.309

Table 54

Means (over replications) of Means, SD's, Correlations with Actual and Mean Square Deviations from Actual (MSE1) of True Score Estimates; Also Includes Mean Square Deviations from Actual of Estimates Standardized to Mean Equal to Estimated True Score Mean and Variance Equal to the Product of Estimates of True Score Variance and Reliability (MSE2).

N Raters 10     Targets 10     Replications 150

               Mean    SD     Distribution
Level          0       .50    Normal
Scale          1       .50    Uniform
Reliability    .60     .40    Uniform

True Scores

Estimate        Mean      SD        R         MSE1      MSE2
Actual          0.014     0.915
               (0.29)    (0.22)
Consensus       0.039     0.949     0.927     0.161     0.152
               (0.35)    (0.27)    (0.06)    (0.10)    (0.09)
Wconsensus      0.040     0.998     0.962     0.193     0.111
               (0.40)    (0.34)    (0.04)    (0.14)    (0.08)
Standardized    0.000     0.750     0.954     0.181     0.122
               (0.00)    (0.09)    (0.04)    (0.14)    (0.08)
Wstandardize    0.000     0.912     0.966     0.162     0.106
               (0.00)    (0.05)    (0.04)    (0.14)    (0.08)
Cronb-Kelley    0.039     0.841     0.927     0.152     0.152
               (0.35)    (0.29)    (0.06)    (0.09)    (0.09)
PC Scores       0.000     1.000     0.963     0.187     0.111
               (0.00)    (0.00)    (0.03)    (0.14)    (0.08)
ML Scores      -0.000     0.986     0.966     0.177     0.106
               (0.00)    (0.01)    (0.04)    (0.14)    (0.08)

Note. Standard deviations are given in parentheses.

IV. DISCUSSION

In general, the results of this study support the opinion that Burt's (1936) method gives modest gains in reliability and validity over the simple consensus. The results are useful in that they may suggest conditions under which maximum benefit is to be expected.

Some light was shed on issues which are tied up with Burt's procedure. Some clear patterns emerged, for example, from the comparison of methods of estimating individual judges' reliabilities. The mean reliability estimates for the r with Sum and r with Z-Sum were consistently inflated, even though the judge being estimated was excluded from the sum. The Cronbach method also gave inflated estimates but the bias appears to decrease when the number of judges and/or targets increases. Whether or not one is concerned with the accuracy of the mean depends on the purpose to which the estimate is being put. If one is interested in the relative standing of a set of judges, then the mean is irrelevant. But it may be relevant when the estimate is used as part of a multiplicative factor as in the case of Burt's weights.

The standard deviation of the reliability estimates of a given method relates to the stability of the mean of the estimates as well as to the squared deviation from the actual estimates. The average Fisher-z, r with Sum and r with Z-Sum methods had standard deviations less than the standard deviations of the actual reliabilities.
The other methods had standard deviations slightly larger than the actual except the Cronbach method which had considerably larger standard deviations under the small sample conditions. This results in unstable mean estimates. Thus the data support Cronbach et al.'s (1963) observation that the estimate does not perform well under small sample conditions.

In terms of the relative standing of rater reliabilities, the most important index is the correlation between estimated and actual. All methods tested gave quite similar correlations. The highest correlations were generally given by the maximum likelihood factor analysis method, although the Cronbach method may be superior in the absence of rater scale differences. Thus it may be of interest to try the Cronbach method with standardized ratings. This would involve estimating a judge's reliability by the square of the correlation (as opposed to covariance) between that judge's ratings and all other judges' ratings divided by the average correlation between all pairs of judges' ratings. The size of the correlation between estimates and actual was strongly affected by the number of judges and the number of targets and less so by the types of rater bias present with the exception of the Cronbach method.

The least overall mean square deviations from actual were given by the PC and Avg Fisher-z estimates, followed by the ML method.
Thus in terms of least squares estimation, which is a function of mean, variance and correlation, the best methods of estimation appear to be the principal components and average Fisher-z methods. The method which performed the worst in terms of mean square was the Cronbach method, although it seems to improve with increased sample size. Hence, if absolute as opposed to relative agreement between estimate and actual is desired, the PC method is recommended and the Cronbach method should be avoided unless the sample size is large.

One of the most interesting results of this study concerned the two estimates of the reliability of the sum of judges (which is equivalent to the reliability of the mean of judges) which is an estimate of the reliability of the consensus method of estimating true scores. Surprisingly, the method described as Green's method, which used maximum likelihood squared factor loadings as estimates of the individual judges' reliabilities, performed much better than Cronbach's alpha under all conditions. Note that alpha is an exact estimate when conditions (judges) are equivalent, i.e., when they have equal means, variances and reliabilities (Cronbach et al., 1972). Judges were not equivalent in any of the conditions tested in this study. The Green estimate was only slightly lower than the actual reliability of the sum for practically all conditions. The only condition where it seemed to slip a little was the 10-judges 5-targets condition.
It has been observed that the maximum likelihood estimates of individual rater reliabilities are poorer under this condition, which may explain why the Green estimate did not perform as well under this condition as under the other conditions. It still performed much better than alpha, however. A better estimate of the reliability of the sum under conditions like the 10-judges 5-targets condition may be obtained by estimating individual judges' reliabilities by the average Fisher-Z or by the principal components method.

The performance of alpha improved considerably under the 20-judges 20-targets condition. Hence alpha may be considered an adequate estimate under large sample sizes. The apparent improvement in the alpha estimate may be due to the ceiling effect of unit reliability as the number of judges increases. The performance of alpha was poorest under small sample sizes and under the .5.5.6.4 condition. Thus it appears that the accuracy of alpha is directly related to the equivalence of conditions. Hence alpha should not be used if there is reason to believe that the average judge's reliability is low and/or the judges differ with respect to their means, variances, or reliabilities.

The results relating to weighting for maximum reliability verify the validity of Overall's formula for the weights since the two estimates agree very closely. It is not surprising that Overall's formula works because the basic assumption involved was the uni-factor assumption, and all simulated data in the study satisfied this assumption.
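The earlier remarks on alpha and non-equivalent judges can be illustrated at the population level. The following is a minimal sketch, assuming standardized ratings that follow the uni-factor model, so that the correlation between judges j and k is sqrt(r_j * r_k) and each judge has unit variance. The function names and the example reliabilities are hypothetical, and the code does not reproduce the thesis's estimation procedures (in particular, it uses known rather than estimated reliabilities).

```python
import math


def unifactor_cov(reliabilities):
    """Population covariance matrix of standardized ratings under a
    one-factor model: unit variances, covariances sqrt(r_j * r_k)."""
    a = [math.sqrt(r) for r in reliabilities]
    k = len(a)
    return [[1.0 if i == j else a[i] * a[j] for j in range(k)] for i in range(k)]


def alpha_from_cov(cov):
    """Cronbach's alpha computed from a covariance matrix of k judges."""
    k = len(cov)
    total = sum(sum(row) for row in cov)          # variance of the sum
    trace = sum(cov[j][j] for j in range(k))      # sum of judge variances
    return k / (k - 1) * (1.0 - trace / total)


def reliability_of_sum(reliabilities):
    """Actual reliability of the unweighted sum under the same model:
    true variance (sum of loadings)^2 over total variance."""
    true_var = sum(math.sqrt(r) for r in reliabilities) ** 2
    error_var = sum(1.0 - r for r in reliabilities)
    return true_var / (true_var + error_var)


for rel in ([0.6, 0.6, 0.6], [0.9, 0.3, 0.3]):
    alpha = alpha_from_cov(unifactor_cov(rel))
    actual = reliability_of_sum(rel)
    print(rel, round(alpha, 3), round(actual, 3))
```

With equal reliabilities the two quantities coincide; with unequal reliabilities alpha falls below the actual reliability of the sum, which is in the same direction as the underestimation reported for alpha in this study.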
Weighting for maximum reliability (which is just Burt's weighting scheme) generally resulted in higher reliabilities than the unweighted sum (which is just the usual consensus). Weighting produced substantial improvement in even the large sample condition. The only condition for which weighting did not increase reliability was the 10-judges 5-targets condition where the weighted composite actually performed worse than the unweighted sum. The results from this condition probably attenuated the differences indicated by averaging over sample sizes (see figure 4). The poor performance of the weights for the 10-5 condition can probably be explained by the fact that the weights involved maximum likelihood squared factor loadings as estimates of individual judges' reliabilities. But, as pointed out previously, the ML method did not provide very good estimates of rater reliability under the 10-5 condition. None of the methods of estimating rater reliability performed very well under this condition. Thus it is not advisable to use weights under conditions of small numbers of targets (5 or less). The sampling error is too large, and consequently the estimated weights perform worse than unit weights. Hence unit weighting should be used when there is a risk of large standard errors for the rater reliability estimates, such as when there are five or fewer targets. The number of judges does not seem to matter. It should be observed, however, that it is just as unadvisable not to use weights if there are 10 or more targets.
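The recommendation above (unit weights with five or fewer targets, reliability-based weights otherwise) can be sketched as follows. This is only an illustration under stated assumptions: the weights are taken proportional to sqrt(r) / (1 - r), the standard maximum-reliability weights for standardized ratings under a one-factor model; the exact expression the thesis uses for Burt's weights is not restated in this section. The function names, the cutoff parameter `max_unit_targets`, and the clipping value `max_rel` are hypothetical.

```python
import math


def reliability_weights(reliabilities, n_targets, max_unit_targets=5, max_rel=0.99):
    """Weights for standardized ratings, normalized to sum to one.

    Falls back to unit (equal) weights when there are few targets.
    Assumes at least one reliability is positive.
    """
    k = len(reliabilities)
    if n_targets <= max_unit_targets:
        return [1.0 / k] * k
    # Clip to avoid division by zero for a (near) perfectly reliable judge.
    clipped = [min(r, max_rel) for r in reliabilities]
    raw = [math.sqrt(r) / (1.0 - r) for r in clipped]
    total = sum(raw)
    return [w / total for w in raw]


def composite_reliability(reliabilities, weights):
    """Population reliability of a weighted sum of standardized ratings
    under the one-factor model (loadings sqrt(r), uniquenesses 1 - r)."""
    true_sd = sum(w * math.sqrt(r) for w, r in zip(weights, reliabilities))
    error_var = sum(w * w * (1.0 - r) for w, r in zip(weights, reliabilities))
    return true_sd ** 2 / (true_sd ** 2 + error_var)


rel = [0.9, 0.3, 0.3]                       # hypothetical known reliabilities
unit = reliability_weights(rel, n_targets=5)
weighted = reliability_weights(rel, n_targets=10)
print(composite_reliability(rel, unit), composite_reliability(rel, weighted))
```

Note that the sketch uses known reliabilities, so weighting can only help; the poor showing of weights in the 10-5 condition arises because the reliabilities must themselves be estimated from few targets, which this population-level example does not capture.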
The results concerning the true score variance estimates were disappointing. Neither of the two estimates seemed to work very well in that the standard deviations seemed excessive. The standard deviation of the sample variance is given by $(2/(n-1))^{1/2}$ (Hoel, 1971) where n is the number of targets. Substituting the values n = 5, 10, 20 gives approximately .71, .47 and .32, which indicates that the large standard deviations of the estimates of the true score variance are not attributable to the sampled standard deviation of the true score variance. It would be very desirable, in the interest of expressing the true score estimates in an absolute scale, to obtain a better method of estimating true score variance. Perhaps the factor analytic techniques can be used in this respect.

Concerning the main topic of interest, true score estimates, the consensus and Cronbach-Kelley methods gave identical results for mean and correlation with actual values. This is expected from the definition of the Cronbach-Kelley estimates. It is merely a linear transformation of the consensus. The standard deviations of the Cronbach-Kelley estimates, however, were much smaller than those of the consensus since scores are regressed towards the mean. This has the effect of reducing the mean squared deviation from actual true scores.

With respect to the correlation between estimates and actual, the first thing to be noted is that all methods gave high correlations under all conditions (.899 and above). This was somewhat surprising.
It is probably true that rating data could be 'messier' in some real life situations. For example, the data used in this study satisfied the uni-factor assumption. Hence all raters gave valid ratings (notwithstanding scale differences). The only biasing influences were scale and reliability biases. The poorest rater condition had a mean reliability of .6 and a standard deviation of .22 (since reliabilities below .2 and above 1 were truncated). It may be the case that in some real life situations the reliabilities can get even worse than this. It has been observed by many writers that some judges have been known to actually reverse the order of the targets (Burt, 1936; Hunter, 1965; Cronbach et al., 1972). This would amount (in a sense) to having a negative reliability (or a negative scaling factor).

As a follow-up to this study it would be interesting to investigate possible forms of the distribution of the scaling factors. The distributions used in the study, namely normal and uniform with mean 1 and truncated to preserve symmetry, seemed to be somewhat artificial and arbitrary. It was felt intuitively that one would like the scaling factors to follow a geometric distribution such that the number of judges multiplying the true scale by one half would balance with the number of judges multiplying the scale by a factor of two.
It was not clear how such a distribution could be defined which also satisfied the requirement that the expected value of the scaling factor be one (Cronbach et al., 1963). Perhaps this requirement is unnecessary. This would require re-examining the question of the absolute true score scale.

Although all methods produced acceptable correlations between estimated and actual true scores, the maximum likelihood and weighted standardized (i.e., Burt's) methods gave consistently higher correlations, except, again, in the condition of 10-judges and 5-targets, where simply standardizing produced the highest correlations. This last result reflects the previously discussed result concerning the estimation of judges' reliabilities in the 10-5 condition.

With respect to absolute as opposed to relative scores, the most interesting measure is the mean square deviation between estimates and actual scores. The consensus and Cronbach-Kelley estimates are expressed in absolute scale and hence their deviations from true scores can be computed directly. In addition, all estimates (including the consensus and Cronbach-Kelley) were converted to a scale defined by a mean equal to the overall sample mean of the ratings (this was the same for all estimates) and variance equal to the product of estimated true score variance (which was the same for all methods) and the estimated reliability of the particular true score estimate (this varied between methods).
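The conversion to a common absolute scale described above can be sketched as a simple linear rescaling. This is a minimal illustration, not the thesis's program: the function names, the example numbers, and the use of the population (divide by n) standard deviation when standardizing are all assumptions.

```python
from statistics import mean, pstdev


def rescale(scores, target_mean, target_variance):
    """Linearly rescale scores to a given mean and variance.

    Uses the population SD (divisor n) of the scores; this divisor is
    an assumption made for the sketch.
    """
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        raise ValueError("scores have zero variance and cannot be rescaled")
    target_sd = target_variance ** 0.5
    return [target_mean + target_sd * (x - m) / s for x in scores]


def mean_square_deviation(estimates, actual):
    """Mean squared deviation of estimates from actual true scores."""
    return sum((e - a) ** 2 for e, a in zip(estimates, actual)) / len(actual)


# Hypothetical inputs: factor-score-like estimates on an arbitrary scale,
# an overall mean of the ratings, an estimated true score variance, and
# an estimated reliability of this particular true score estimate.
estimates = [-1.2, -0.3, 0.1, 0.5, 0.9]
overall_mean = 0.04
true_var_estimate = 0.85
reliability_estimate = 0.90

rescaled = rescale(estimates, overall_mean, true_var_estimate * reliability_estimate)
actual = [-1.0, -0.4, 0.2, 0.3, 1.1]        # hypothetical true scores
print(mean_square_deviation(rescaled, actual))
```

Because the target variance is the product of the estimated true score variance and the estimated reliability, any error in the variance estimate carries over directly into the rescaled scores.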
The mean square deviation for the simple consensus was consistently the worst of all the methods, followed closely by the Cronbach-Kelley method. The maximum likelihood and weighted standardized (Burt's) methods generally performed much better than the others, except again, in the 10-judges 5-targets condition, in which case the principal components and unweighted standardized methods performed slightly better than the others. One interesting result is that the mean square deviations for the Cronbach-Kelley estimate remained virtually unchanged after the change of scale, which indicates that it was already expressed in the appropriate scale. It should also be observed that the variance to which the estimates were scaled was in part defined by the estimated true score variance, which, as has been pointed out, does not estimate true score variance with much accuracy. Hence if a better estimate of true score variance is found, Burt's method would probably show an even greater improvement over the consensus method.

The final conclusion is that, using squared maximum likelihood factor loadings to estimate individual reliabilities, Burt's method of estimating true score has about the same accuracy as the maximum likelihood factor scores obtained by Thomson's regression method, and when these methods are rescaled to an absolute scale, they result in about a 30 to 50% reduction in the mean square deviation from actual true scores over the simple consensus and Cronbach-Kelley estimates.
The exception to this generalization is when there are five or fewer targets, in which case there is less discrepancy between the estimates and the unweighted standardized estimates are slightly more accurate than the others. With 5 targets or less, one would not be advised to use a weighted consensus; the unweighted standardized scores are optimal. In all other cases, weights improve both the reliability and the validity of the estimates, although not enormously. But if the data are being scored by a computer anyway, it can't hurt to obtain better estimates.

If one is specifically interested in obtaining estimates of rater reliabilities, another consideration is the number of judges. The average Fisher-z method seems to work better than the other methods under the 5-judges 10-targets condition. The maximum likelihood method works as well as or better than the other methods under the other conditions. It was not tested in this study, but it is possible that Burt's estimates might have performed even better in the 5-judges 10-targets condition had the average Fisher-Z rater reliability estimates been used instead of the ML estimates.

It was mentioned in the introduction that it does not make sense to combine scores which are expressed in different scales. Hence, scores are standardized to a common variance. But it was also argued that the unit of measurement is given by the regression coefficient on true scores.
Thus to equalize units of measurement, scores should be divided by their true score regression coefficients rather than their standard deviations. But Burt's weights come out the same in either case since they involve standardizing to unit variance, and hence dividing the ratings by any constant will not affect the resulting weights.

For future investigations, it would be interesting to explore other possible distributions for the scaling factors and the rater reliability coefficients. It would also be interesting and more realistic to investigate the case of factorially more complex rater and true score models. That is, attributes to be judged may be factorially complex, and judges may stress different factors in their ratings. The uni-factor model, however, is probably a reasonable approximation, at least for most rating situations.

Other possible developments of the present study should obviously also include some empirical tests. In particular, it would be interesting to obtain rating data in a situation which permits exact knowledge of the true scores. One example might be to have subjects rate the temperatures of a number of different rooms. The true temperature in this case is known. The same sorts of comparisons could be performed as in the present study. One difference however is that room temperature is not defined by consensual agreement.
The only way that one could get an 'exact' measure of a consensually defined concept would be to obtain ratings from an infinitely large number of judges.

Since the data for this study were generated in accordance with a specific model, it is evident that the results have only hypothetical significance. This is one of the hazards of all simulation studies. Clearly it would be desirable to test the effects of various violations of the assumptions. The simplicity of the model does not however necessarily detract from its significance. Although real life data would doubtless violate the model requirements in any number of ways, it is likewise the case that any two rating situations will possess different characteristics. And yet the aim is to develop a procedure which applies to a general class of situations. The essential properties of the elements of that class should be represented in the model.

The present study was concerned with optimizing the true score estimates for a particular set of ratings using only information which could be extracted from those ratings. Obtaining optimal estimates involved estimating the judges' scales of measurement and reliability. Cronbach et al. (1963) considered the possibility of using this information about judges in future studies. Such a proposal would require information on the stability of the estimates over time and context. Some kind of cross-validation procedure would be required.
It is conceivable that the estimated scale and reliability parameters associated with a given judge might be used to predict future ratings by that judge. This is similar to Goldberg's (1970) 'bootstrapping' approach to clinical prediction (Wiggins, 1981), where judges' ratings are regressed on predictor variables and the resulting regression weights are used to predict future ratings by a given judge. Goldberg (1970), Wiggins and Kohen (1974), and Dawes and Corrigan (1974) found that the regression equations outperformed the judges themselves in nearly every case. The model under consideration in this study represents a regression of ratings on hypothetical true scores. But the regression estimates for ratings on true scores can easily be converted to regression estimates of true scores on ratings. Hence using regression estimates from past data for a particular judge, one can linearly predict true scores from the judge's future observed ratings. This procedure, however, assumes a certain amount of stability in a judge's mean, variance and reliability which is not necessary for the present study.

As a final general comment, it should be reiterated that the availability of computers makes feasible many statistical techniques such as those proposed by Burt (1936) and Kelley (1947) which are not currently in use primarily because they involve a great deal of computational labor.
With computers doing the work, these statistical techniques can be exploited to increase the reliability and validity of measurements in areas where improvement, no matter how small, is much needed.

REFERENCES

Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, California: Brooks/Cole.
Bartko, J.J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports, 19, 3-11.
Berg, I., & Adams, H. (1962). The experimental basis of personality assessment. In A.J. Bachrach (Ed.), Experimental foundations of clinical psychology. New York: Basic Books.
Bernardin, H.J. (1978). The effects of rater training on leniency and halo errors in student ratings of instructors. Journal of Applied Psychology, 63, 301-308.
Bieri, J., Atkins, L., Briar, S., Leaman, R., Miller, H., & Tripodi, T. (1966). Clinical and social judgment: The discrimination of behavioral information. New York: Wiley.
Brief, A.P. (1980). Peer assessment revisited: A brief comment on Kane and Lawler. Psychological Bulletin, 88, 78-79.
Borman, W.C. (1978). Exploring upper limits of reliability and validity in job performance rating. Journal of Applied Psychology, 63, 135-144.
Borman, W.C. (1979). Format and training effects on rating accuracy and rater errors. Journal of Applied Psychology, 64, 410-421.
Brown, E.M. (1968). Influence of training method and relationship on the halo effect. Journal of Applied Psychology, 52, 195-199.
Burdock, E.I., Fleiss, J.L., & Hardesty, A.S. (1963). A new view of inter-observer agreement. Personnel Psychology, 16, 373-384.
Burisch, M. (1978). Construction strategies for multiscale personality inventories. Applied Psychological Measurement, 2, 97-111.
Burt, C. (1936). The analysis of examination marks. In P. Hartog and E.C. Rhodes (Eds.), The marks of examiners. London: Macmillan.
Cook, M. (1979). Perceiving others: The psychology of interpersonal perception. London: Methuen.
Cronbach, L.J. (1970). Essentials of psychological testing. New York: Harper & Row.
Cronbach, L.J., Gleser, G.C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
Cronbach, L.J., Rajaratnam, N., & Gleser, G.C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137-163.
Crow, W.J. (1957). The effect of training upon accuracy and variability in interpersonal perception. Journal of Abnormal and Social Psychology, 55, 355-359.
Cureton, E.E. (1931). Errors of measurement and correlation. Archives of Psychology, 125, 1-63.
Cureton, E.E. (1958). The definition and estimation of test reliability. Educational and Psychological Measurement, 18, 715-738.
Dawes, R.M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95-106.
Diaconis, P., & Efron, B. (1983). Computer intensive methods in statistics. Scientific American, 248, 116-130.
Ebel, R.L. (1951). Estimation of the reliability of ratings. Psychometrika, 16, 407-424.
Einhorn, H.J., & Hogarth, R.M. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13, 171-192.
Fiske, D.W. (1978). Strategies for Personality Research: The Observation Versus Interpretation of Behavior. San Francisco: Jossey-Bass.
Fiske, D.W., & Cox, J.A. (1960). The consistency of ratings by peers. Journal of Applied Psychology, 44, 11-17.
Fleiss, J.L. (1970). Estimating the reliability of interview data. Psychometrika, 35, 143-162.
Fleiss, J.L., Spitzer, R.L., & Burdock, E.I. (1965). Estimating accuracy of judgment using recorded interviews. Archives of General Psychiatry, 12, 562-567.
Flemenbaum, A., & Zimmerman N. (1973). Inter- and intra-rater reliability of the brief psychiatric rating scale. Psychological Reports, 36, 783-792.
Goldberg, L.R. (1970). Man versus model of man: A rationale plus evidence for a method of improving on clinical inferences. Psychological Bulletin, 73, 422-432.
Gorsuch, R.L. (1974). Factor analysis. Toronto: W.B. Saunders.
Green, B.F. (1950). A note on the calculation of weights for maximum battery reliability. Psychometrika, 15, 57-61.
Grozz, H.J., & Grossman, K.G. (1968). Clinicians' response style: A source of variation and bias in clinical judgments. Journal of Abnormal Psychology, 73, 207-214.
Guilford, J.P. (1954). Psychometric methods. New York: McGraw-Hill.
Haggard, E.A. (1958). Intraclass correlation and the analysis of variance. New York: Dryden.
Hoel, P.G. (1971). Introduction to mathematical statistics. New York: Wiley.
Horowitz, L.M., Inouye, D., & Siegelman, E.Y. (1979). On averaging judges' ratings to increase their correlation with an external criterion. Journal of Consulting and Clinical Psychology, 47, 453-455.
Hoyt, C. (1941). Test reliability estimated by analysis of variance. Psychometrika, 6, 153-160.
Hunter, J.E. (1968). Probabilistic foundations for coefficients of generalizability. Psychometrika, 33, 1-18.
Jackson, D.N. (1974). Personality research form manual. Goshen, N.Y.: Research Psychologists Press.
Jackson, R.W.B. (1939). Reliability of mental tests. British Journal of Psychology, 29, 267-287.
Jackson, R.W.B., & Ferguson, G.A. (1941). Studies on the reliability of tests.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. Cambridge, Mass.: Cambridge University Press.
Kane, J.S., & Lawler, E.E. (1978). Methods of peer assessment. Psychological Bulletin, 85, 555-586.
Kelley, T.L. (1924). Note on the reliability of a test: A reply to Dr. Crum's criticism. Journal of Educational Psychology, 15, 193-204.
Kelley, T.L. (1927). Interpretation of educational measurements. New York: World Book.
Kelley, T.L. (1947). Fundamentals of statistics. Cambridge, Mass.: Cambridge University Press.
Knight, R.A., & Blaney, P.H. (1977). The interrater reliability of the Psychotic Inpatient Profile.
J o u r n a l  of ; C l i n i c a l Psychology, 33, 647-653. Kusyszyn, I. (1968). A comparison of judgmental methods with endorsements i n the assessment of p e r s o n a l i t y t r a i t s . J o u r n a l of A p p l i e d Psychology, 52, 227-233. Landy, F . J . , & F a r r , J.L. (1976). P o l i c e performance a p p r a i s a l . JSAS C a t a l o g of S e l e c t e d Documents i n  Psychology, 6, 83. Landy, F . J . , & F a r r , J.L. (1980). Performance r a t i n g s . P s y c h o l o g i c a l B u l l e t i n , 87, 72-107. Landy, F . J . , & Trumbo, D.A. (1980). The Psychology of work  behavior. Homewood, 111.: Dorsey Press. Latham, G.P., Wexley, K.N., & P u r s e l l , E.D. (1975). T r a i n i n g managers to minimize r a t i n g e r r o r s i n the o b s e r v a t i o n of behavior. J o u r n a l of A p p l i e d Psychology, 60, 550-555. Lawshe, C.H., & Nagle, B.F. (1952). A note on the combination of r a t i n g s on the b a s i s of r e l i a b i l i t y . P s y c h o l o g i c a l B u l l e t i n , 49, 270-273. Lehman, H.E., Ban, T.A., & V e r d i n , M.D. (1965). Rating the r a t e r . A r c h i v e s of General P s y c h i a t r y , 13, 67-75. Love, K.G. (1981). Comparison of peer assessment methods: R e l i a b i l i t y , v a l i d i t y , f r i e n d s h i p b i a s and user r e a c t i o n . J o u r n a l of A p p l i e d Psychology, 66, 451-457. Maxwell, A.E., & P i l l i n e r , A.E.G. (1968). D e r i v i n g c o e f f i c i e n t s of r e l i a b i l i t y and agreement f o r r a t i n g s . B r i t i s h J o u r n a l of Mathematical and S t a t i s t i c a l  Psychology, 21, 105-116. Morrison, D.J. (1976). M u l t i v a r i a t e s t a t i s t i c a l methods. Toronto: McGraw-Hill. Mosier, C.I. (1943). On the r e l i a b i l i t y of weighted, composites. Psychometrika, 8, 161-168. Nunnally, J.C. (1978). Psychometric theory. Toronto: McGraw-Hill. O v e r a l l , J.E. (1965). R e l i a b i l i t y of composite r a t i n g s . 
122 E d u c a t i o n a l and P s y c h o l o g i c a l Measurement, 25, 1011-1022. O v e r a l l , J.E. (1968). E s t i m a t i n g i n d i v i d u a l r a t e r r e l i a b i l i t i e s from a n a l y s i s of treatment e f f e c t s . E d u c a t i o n a l and P s y c h o l o g i c a l Measurement, 28, 255-264. Paulhus, D. (1981). Assessment of i n t e r - r a t e r r e l i a b i l i t y . Unpublished manuscript. U n i v e r s i t y of B r i t i s h Columbia. P e e l , E.A. (1947a). P r e d i c t i o n of a complex c r i t e r i o n and b a t t e r y r e l i a b i l i t y . B r i t i s h J o u r n a l of S t a t i s t i c a l  Psychology, j_, 84-94. P u r s e l l , E.D., Do s s e t t , D.L., & Latham, G.P. (1980). O b t a i n i n g v a l i d p r e d i c t o r s by minimizing r a t i n g e r r o r s i n the c r i t e r i o n . Personnel Psychology, 33, 91-96. Shen, E. (1925). The r e l i a b i l i t y c o e f f i c i e n t of personal r a t i n g s . J o u r n a l of E d u c a t i o n a l Psychology, 16, 232-236. Shrout, P.E., & F l e i s s , J.L. (1979). I n t r a c l a s s c o r r e l a t i o n s : Uses i n a s s e s s i n g r a t e r r e l i a b i l i t y . P s y c h o l o g i c a l B u l l e t i n , 86, 420-428. Smith, J.M. (1974). A new r a t e r s e l e c t i o n technique f o r use with b e h a v i o r a l r a t i n g s c a l e s . J o u r n a l of C l i n i c a l  Psychology, 30, 40-43. Spool, M.D. (1978). T r a i n i n g programs f o r observers of behavior: A review. Personnel Psychology, 31, 853-888. Strahen, R.F. (1980)..More on averaging judges' r a t i n g s : Determining the most r e l i a b l e composite. J o u r n a l of  C o n s u l t i n g and C l i n i c a l Psychology, 48, 587-589. Thomson, G.H. (1940). Weighting f o r b a t t e r y r e l i a b i l i t y and p r e d i c t i o n . B r i t i s h J o u r n a l of Psychology, 30, 357-366. Thomson, G.H. (1947). 
The maximum c o r r e l a t i o n of two weighted b a t t e r i e s : H o t e l l i n g ' s 'most p r e d i c t a b l e c r i t e r i o n ' . B r i t i s h J o u r n a l of S t a t i s t i c a l Psychology, 1 , 84-94. Thomson, G.H. (1951). The f a c t o r i a l a n a l y s i s of human  a b i l i t y . London: U n i v e r s i t y of London Press. T i n s l e y , H.E., & Weiss, D.J. (1975). I n t e r r a t e r r e l i a b i l i t y and agreement of s u b j e c t i v e judgments. J o u r n a l of  C o u n s e l l i n g Psychology, 22, 358-376. Wainer, H. (1976). E s t i m a t i n g c o e f f i c i e n t s i n l i n e a r models: It don't make no never mind. P s y c h o l o g i c a l B u l l e t i n , 83, 213-217. 1 23 Wiggins, J.S. (1973). P e r s o n a l i t y and p r e d i c t i o n : P r i n c i p l e s  of p e r s o n a l i t y assessment. Reading, Mass.: Addison-Wesley. Wiggins, J.S. (1981). C l i n i c a l and s t a t i s t i c a l p r e d i c t i o n : Where are we and where do we go from here? C l i n i c a l  Psychology Review, j_, 3-18. Wiggins, N., & Kohen, E. (1971). Man versus model of man r e v i s i t e d : The f o r e c a s t i n g of graduate school success. J o u r n a l of P e r s o n a l i t y and S o c i a l Psychology, 19, 100-106. 

APPENDIX

      INTEGER DISTRB(3),CONVRG
      REAL LEVSD,SCALSD,RELBAR,RELSD,X(30,2),REL(30,8),R(30,30),
     > GLOB(8,4,2)/64*0./,TMINX(30,3),TMINZ(30,3),GLOB2(2,3)
     > /6*0./,GLOB3(2,2)/4*0./,GLOB4(2,3)/6*0./,GLOB5(8,5,2)
     > /80*0./,TRUE(50,8),B(30),SIGMAY(2),XDAT(30,50),ZDAT(30,
     > ERRVAR(30),TRUREL(8),UNITWT(30)/30*1./
      READ(5,1) NREPS,SEED
    1 FORMAT(I5,F5.2)
      Z=RANDN(SEED)
      Z=RAND(Z)
    2 CONTINUE
      READ(5,3,END=30) NRAT,NTAR,LEVSD,SCALSD,RELBAR,RELSD,
     > (DISTRB(I),I=1,3),TRUALF
    3 FORMAT(2I5,4F5.2,3I5,F5.4)
      CALL INIT(GLOB,GLOB2,GLOB3,GLOB4,GLOB5)
      REWIND 7
      DO 20 I=1,NREPS
      CALL DATA(NTAR,NRAT,LEVSD,SCALSD,RELBAR,RELSD,X,REL,R,
     > TMINX,TMINZ,TRUE,B,VARTOT,SUMTOT,XDAT,ZDAT,ERRV
     > DISTRB)
      CALL KELLEY(NRAT,REL,R,NTAR)
      CALL CRONB(NTAR,NRAT,REL,TMINX,X,VARTOT,SIGMAY)
      CALL OVRALL(NRAT,R,REL,NTAR,TRUE,ZDAT,TRUREL,UNITWT)
      CALL MAXLIK(NRAT,R,REL,TRUE,NTAR,ZDAT,CONVRG,TRUREL,UNITW
      IF (CONVRG.EQ.0) GOTO 5
      I=I-1
      GOTO 20
    5 CONTINUE
      CALL FISHER(NRAT,R,REL)
      DO 10 J=1,NRAT
      REL(J,6)=TMINX(J,3)/SQRT(TMINX(J,2)*X(J,2))
      REL(J,7)=TMINZ(J,3)/SQRT(TMINZ(J,2)*FLOAT(NTAR-1))
   10 CONTINUE
      CALL COMPAR(NRAT,REL,GLOB)
      CALL ROFSUM(NRAT,X,REL,B,NTAR,VARTOT,SUMTOT,SIGMAY,GLOB2,
     > TRUE,ERRVAR,TRUREL,UNITWT,R,TRUALF)
      CALL MAXWAT(NRAT,REL,B,GLOB3,X,R,TRUE,XDAT,ZDAT,NTAR,ERRV
     > TRUREL,UNITWT)
      CALL TRUVAR(NRAT,SIGMAY,REL,X,GLOB4,NTAR,VARTOT)
      CALL GETRUE(NTAR,TRUE,SUMTOT,SIGMAY,GLOB5,NRAT,TRUREL)
   20 CONTINUE
      CALL GLOBAL(GLOB,GLOB2,GLOB3,GLOB4,GLOB5,NREPS)
      CALL OUTPUT(GLOB,GLOB2,GLOB3,GLOB4,GLOB5,NRAT,RELBAR,NTAR,
     > LEVSD,SCALSD,RELSD,DISTRB(1),DISTRB(2),DISTRB(3),
     > NREPS)
      GOTO 2
   30 CONTINUE
      STOP
      END
****************************************
      SUBROUTINE DATA(NTAR,NRAT,LEVSD,SCALSD,RELBAR,RELSD,X,REL,R,
     > TMINX,TMINZ,TRUE,B,VARTOT,SUMTOT,XDAT,ZDAT,ERRVAR,
     > DISTRB)
      INTEGER DISTRB(3)
      REAL X(30,2),XDAT(30,50),REL(30,8),TRUE(50,8),B(30),ERRVAR(3
     > R(30,30),ZDAT(30,50),TMINX(30,3),TMINZ(30,3),LEVSD
      DO 10 I=1,NTAR
      TRUE(I,8)=FRANDN(0.)
      TRUE(I,1)=0.
      TRUE(I,3)=0.
   10 CONTINUE
      RLOW=2.*RELBAR-1.
      DO 70 I=1,NRAT
      A=FRANDN(0.)*LEVSD
   20 CONTINUE
      IF (DISTRB(2).EQ.1) B(I)=FRANDN(0.)*SCALSD+1.
      IF (DISTRB(2).EQ.2) B(I)=FRAND(0.)*1.5242+.2379
      IF (B(I).LE.0..OR.B(I).GE.2.) GOTO 20
   30 CONTINUE
      IF (DISTRB(3).EQ.1) REL(I,8)=FRANDN(0.)*RELSD+RELBAR
      IF (DISTRB(3).EQ.2) REL(I,8)=FRAND(0.)*.7621+.21895
      IF (REL(I,8).LE.RLOW.OR.REL(I,8).GE.1.) GOTO 30
      ERRVAR(I)=B(I)**2/REL(I,8)-B(I)**2
      X(I,1)=0.
      X(I,2)=0.
      DO 40 J=1,NTAR
      XDAT(I,J)=A+B(I)*TRUE(J,8)+SQRT(ERRVAR(I))*FRANDN(0.)
      X(I,1)=X(I,1)+XDAT(I,J)
      X(I,2)=X(I,2)+XDAT(I,J)**2
      TRUE(J,1)=TRUE(J,1)+XDAT(I,J)
   40 CONTINUE
      X(I,2)=X(I,2)-X(I,1)**2/FLOAT(NTAR)
      R(I,I)=1.
      IMIN1=I-1
      DO 50 J=1,IMIN1
      R(I,J)=0.
   50 CONTINUE
      XBAR=X(I,1)/FLOAT(NTAR)
      XSD=SQRT(X(I,2)/FLOAT(NTAR-1))
      DO 60 J=1,NTAR
      ZDAT(I,J)=(XDAT(I,J)-XBAR)/XSD
      TRUE(J,3)=TRUE(J,3)+ZDAT(I,J)
      DO 60 K=1,IMIN1
      R(I,K)=R(I,K)+ZDAT(I,J)*ZDAT(K,J)
   60 CONTINUE
      DO 70 J=1,IMIN1
      R(I,J)=R(I,J)/FLOAT(NTAR-1)
      R(J,I)=R(I,J)
   70 CONTINUE
      R(1,1)=1.
      DO 100 I=1,NRAT
      DO 80 J=1,3
      TMINX(I,J)=0.
      TMINZ(I,J)=0.
   80 CONTINUE
      DO 90 J=1,NTAR
      TMX=TRUE(J,1)-XDAT(I,J)
      TMINX(I,1)=TMINX(I,1)+TMX
      TMINX(I,2)=TMINX(I,2)+TMX**2
      TMINX(I,3)=TMINX(I,3)+TMX*XDAT(I,J)
      TMZ=TRUE(J,3)-ZDAT(I,J)
      TMINZ(I,1)=TMINZ(I,1)+TMZ
      TMINZ(I,2)=TMINZ(I,2)+TMZ**2
      TMINZ(I,3)=TMINZ(I,3)+TMZ*ZDAT(I,J)
   90 CONTINUE
      TMINX(I,2)=TMINX(I,2)-TMINX(I,1)**2/FLOAT(NTAR)
      TMINX(I,3)=TMINX(I,3)-X(I,1)*TMINX(I,1)/FLOAT(NTAR)
      TMINZ(I,2)=TMINZ(I,2)-TMINZ(I,1)**2/FLOAT(NTAR)
  100 CONTINUE
      SUMTOT=0.
      VARTOT=0.
      DO 110 I=1,NTAR
      SUMTOT=SUMTOT+TRUE(I,1)
      VARTOT=VARTOT+TRUE(I,1)**2
      TRUE(I,1)=TRUE(I,1)/FLOAT(NRAT)
      TRUE(I,3)=TRUE(I,3)/FLOAT(NRAT)
  110 CONTINUE
      VARTOT=VARTOT-SUMTOT**2/FLOAT(NTAR)
      RETURN
      END
****************************************
      SUBROUTINE KELLEY(NRAT,REL,R,NTAR)
      REAL REL(30,8),R(30,30)
      DO 20 I=1,NRAT
      REL(I,1)=0.
      W=0.
      DO 10 J=2,NRAT
      IF (J.EQ.I) GOTO 10
      JMIN1=J-1
      DO 10 K=1,JMIN1
      IF (K.EQ.I) GOTO 10
      RELIAB=R(I,J)*R(I,K)/R(J,K)
      SE=RELIAB**2/FLOAT(NTAR)*ABS(4.*RELIAB+2./RELIAB+
     > 1./R(J,K)**2+(1.-2.*RELIAB)/R(I,J)**2+(1.-2.*REL
     > /R(I,K)**2-5.)
      W=W+1./SE
      REL(I,1)=REL(I,1)+RELIAB/SE
   10 CONTINUE
      REL(I,1)=REL(I,1)/W
   20 CONTINUE
      RETURN
      END
****************************************
      SUBROUTINE CRONB(NTAR,NRAT,REL,TMINX,X,VARTOT,SIGMAY)
      REAL REL(30,8),TMINX(30,3),X(30,2),SIGMAY(2)
      SUMVAR=0.
      DO 10 I=1,NRAT
      SUMVAR=SUMVAR+X(I,2)
   10 CONTINUE
      XN=FLOAT(NRAT)/(FLOAT(NRAT-1)*(VARTOT-SUMVAR))
      DO 20 I=1,NRAT
      REL(I,2)=XN*TMINX(I,3)**2/X(I,2)
   20 CONTINUE
      SIGMAY(1)=(VARTOT-SUMVAR)/FLOAT(NRAT*(NRAT-1)*(NTAR-1))
      SIGMAY(2)=FLOAT(NRAT)/FLOAT(NRAT-1)*(1.-SUMVAR/VARTOT)
      RETURN
      END
****************************************
      SUBROUTINE OVRALL(NRAT,R,REL,NTAR,TRUE,ZDAT,TRUREL,UNITWT)
      REAL RARRAY(900),R(30,30),EIGEN(30),REL(30,8),TRUE(50,8),
     > ZDAT(30,50),TRUREL(8),UNITWT(30)
      DO 5 I=1,NTAR
      TRUE(I,6)=0.
    5 CONTINUE
      K=0
      DO 10 I=1,NRAT
      DO 10 J=1,I
      K=K+1
      RARRAY(K)=R(I,J)
   10 CONTINUE
      CALL SYMAL(RARRAY,NRAT,EIGEN,IERROR,1)
      N=NRAT*(NRAT-1)
      ICOUNT=0.
      DO 15 I=1,NRAT
      IF(RARRAY(N+I).LT.0.) ICOUNT=ICOUNT+1
   15 CONTINUE
      ICOUNT=ICOUNT*2
      F=1.
      IF (ICOUNT.GT.NRAT) F=-1.
      DO 20 I=1,NRAT
      REL(I,3)=EIGEN(NRAT)*RARRAY(N+I)**2
      RARRAY(N+I)=F*RARRAY(N+I)/SQRT(EIGEN(NRAT))
      DO 20 J=1,NTAR
      TRUE(J,6)=TRUE(J,6)+RARRAY(N+I)*ZDAT(I,J)
   20 CONTINUE
      CALL RCOMP(NRAT,RARRAY(N+1),R,REL,UNITWT,TRUREL(6))
      RETURN
      END
****************************************
      SUBROUTINE COMPAR(NRAT,REL,GLOB)
      REAL RELDAT(8,4),REL(30,8),GLOB(8,4,2)
      DO 30 M=1,8
      I=M-1
      IF (I.EQ.0) I=8
      DO 10 J=1,4
      RELDAT(I,J)=0.
   10 CONTINUE
      DO 20 J=1,NRAT
      RELDAT(I,1)=RELDAT(I,1)+REL(J,I)
      RELDAT(I,2)=RELDAT(I,2)+REL(J,I)**2
      RELDAT(I,3)=RELDAT(I,3)+REL(J,I)*REL(J,8)
      RELDAT(I,4)=RELDAT(I,4)+(REL(J,I)-REL(J,8))**2
   20 CONTINUE
      RELDAT(I,2)=SQRT((RELDAT(I,2)-RELDAT(I,1)**2/FLOAT(NRAT))
     > /FLOAT(NRAT-1))
      RELDAT(I,3)=(RELDAT(I,3)-RELDAT(I,1)*RELDAT(8,1))
     > /(RELDAT(I,2)*RELDAT(8,2)*FLOAT(NRAT-1))
      RELDAT(I,1)=RELDAT(I,1)/FLOAT(NRAT)
      RELDAT(I,4)=RELDAT(I,4)/FLOAT(NRAT)
      DO 30 J=1,4
      GLOB(I,J,1)=GLOB(I,J,1)+RELDAT(I,J)
      GLOB(I,J,2)=GLOB(I,J,2)+RELDAT(I,J)**2
   30 CONTINUE
      RETURN
      END
****************************************
      SUBROUTINE MAXLIK(NRAT,R,REL,TRUE,NTAR,ZDAT,CONVRG,TRUREL,UN
      INTEGER OLD,CONVRG
      REAL R(30,30),REL(30,8),TRUE(50,8),ZDAT(30,50),S(900),MAX,
     > EIGEN(30),L(2,30),P(30),CONST/.005/,W(30),TRUREL(8),
     > UNITWT(30)
      N=NRAT*(NRAT-1)
      ITER=0.
      CONVRG=0
      S(1)=R(1,1)
      INDEX=1
      DO 20 I=2,NRAT
      IMIN1=I-1
      DO 10 J=1,IMIN1
      INDEX=INDEX+1
      S(INDEX)=R(I,J)
   10 CONTINUE
      INDEX=INDEX+1
      S(INDEX)=R(I,I)
   20 CONTINUE
      CALL SYMAL(S,NRAT,EIGEN,IERROR,1)
      X=SQRT(EIGEN(NRAT))
      DO 30 I=1,NRAT
      L(2,I)=S(N+I)*X
   30 CONTINUE
      NEW=1
      OLD=2
   40 CONTINUE
      ITER=ITER+1
      P(1)=SQRT(R(1,1)-L(OLD,1)**2)
      S(1)=L(OLD,1)**2/P(1)**2
      INDEX=1
      DO 60 I=2,NRAT
      P(I)=SQRT(R(I,I)-L(OLD,I)**2)
      IMIN1=I-1
      DO 50 J=1,IMIN1
      INDEX=INDEX+1
      S(INDEX)=R(I,J)/(P(I)*P(J))
   50 CONTINUE
      INDEX=INDEX+1
      S(INDEX)=L(OLD,I)**2/P(I)**2
   60 CONTINUE
      CALL SYMAL(S,NRAT,EIGEN,IERROR,1)
      IF (IERROR.EQ.0) GOTO 65
      CONVRG=1
      RETURN
   65 CONTINUE
      MAX=0.
      XX=SQRT(EIGEN(NRAT))
      DO 70 I=1,NRAT
      L(NEW,I)=S(N+I)*XX*P(I)
      X=ABS(L(NEW,I)-L(OLD,I))
      IF (X.GT.MAX) MAX=X
   70 CONTINUE
      K=NEW
      NEW=OLD
      OLD=K
      IF (ITER.LT.2.OR.(MAX.GT.CONST.AND.ITER.LE.15)) GOTO 40
      IF (ITER.LE.15.OR.MAX.LE.CONST) GOTO 75
      CONVRG=1
      RETURN
   75 CONTINUE
      Z=0.
      DO 80 I=1,NRAT
      X=L(OLD,I)**2
      REL(I,4)=X
      P(I)=R(I,I)-X
      Z=Z+X/P(I)
   80 CONTINUE
      Z=1.+Z
      DO 90 I=1,NTAR
      TRUE(I,7)=0.
   90 CONTINUE
      ICOUNT=0.
      DO 95 I=1,NRAT
      IF (L(OLD,I).LT.0.) ICOUNT=ICOUNT+1
   95 CONTINUE
      ICOUNT=ICOUNT*2
      F=1.
      IF (ICOUNT.GT.NRAT) F=-1.
      DO 100 I=1,NRAT
      W(I)=F*L(OLD,I)/(P(I)*Z)
      DO 100 J=1,NTAR
      TRUE(J,7)=TRUE(J,7)+ZDAT(I,J)*W(I)
  100 CONTINUE
      CALL RCOMP(NRAT,W,R,REL,UNITWT,TRUREL(7))
      RETURN
      END
****************************************
      SUBROUTINE FISHER(NRAT,R,REL)
      REAL R(30,30),REL(30,8),Z(30)
      DO 10 I=1,NRAT
      Z(I)=0.
   10 CONTINUE
      DO 20 I=2,NRAT
      IMIN1=I-1
      DO 20 J=1,IMIN1
      X=ALOG((1.+R(I,J))/(1.-R(I,J)))
      Z(I)=Z(I)+X
      Z(J)=Z(J)+X
   20 CONTINUE
      DO 30 I=1,NRAT
      X=2.7183**(Z(I)/FLOAT(NRAT-1))
      REL(I,5)=(X-1.)/(1.+X)
   30 CONTINUE
      RETURN
      END
****************************************
      SUBROUTINE ROFSUM(NRAT,X,REL,B,NTAR,VARTOT,SUMTOT,SIGMAY,GLO
     > TRUE,ERRVAR,TRUREL,UNITWT,R,TRUALF)
      REAL X(30,2),REL(30,8),B(30),SIGMAY(2),GLOB2(2,3),TRUE(50,8)
     > RELSUM(3),ERRVAR(30),TRUREL(8),UNITWT(30),R(30,30)
      RELSUM(1)=SIGMAY(2)
      CALL RCOMP(NRAT,UNITWT,R,REL,X(1,2),TRUREL(1))
      CALL RCOMP(NRAT,UNITWT,R,REL,UNITWT,TRUREL(3))
      RELSUM(2)=TRUREL(1)
      TRUREL(5)=TRUREL(1)
      DO 20 I=1,2
      GLOB2(I,1)=GLOB2(I,1)+RELSUM(I)
      GLOB2(I,2)=GLOB2(I,2)+RELSUM(I)**2
      GLOB2(I,3)=GLOB2(I,3)+(RELSUM(I)-TRUALF)**2
   20 CONTINUE
      S=SUMTOT/FLOAT(NRAT*NTAR)*(1.-RELSUM(2))
      DO 30 I=1,NTAR
      TRUE(I,5)=RELSUM(2)*TRUE(I,1)+S
   30 CONTINUE
      RETURN
      END
****************************************
      SUBROUTINE MAXWAT(NRAT,REL,B,GLOB3,X,R,TRUE,XDAT,ZDAT,NTAR,
     > ERRVAR,TRUREL,UNITWT)
      REAL ARRAY(900),REL(30,8),BSUM(2),WSUM(2),EIGEN(30),B(30),
     > GLOB3(2,2),X(30,2),R(30,30),W(30,3),TRUE(50,8),
     > ZDAT(30,50),XDAT(30,50),ERRVAR(30),TRUREL(8),UNITWT(30)
      ARRAY(1)=REL(1,4)/(1.-REL(1,4))
      INDEX=1
      DO 20 I=2,NRAT
      IMIN1=I-1
      DO 10 J=1,IMIN1
      INDEX=INDEX+1
      ARRAY(INDEX)=R(I,J)/SQRT((1.-REL(I,4))*(1.-REL(J,4)))
   10 CONTINUE
      INDEX=INDEX+1
      ARRAY(INDEX)=REL(I,4)/(1.-REL(I,4))
   20 CONTINUE
      DO 30 I=1,NTAR
      TRUE(I,2)=0.
      TRUE(I,4)=0.
   30 CONTINUE
      BSUM(1)=0.
      BSUM(2)=0.
      WSUM(1)=0.
      WSUM(2)=0.
      CALL SYMAL(ARRAY,NRAT,EIGEN,IERROR,1)
      N=NRAT*(NRAT-1)
      W1=0.
      DO 40 I=1,NRAT
      W(I,1)=ARRAY(N+I)/SQRT((1.-REL(I,4))*X(I,2))
      W(I,2)=SQRT(REL(I,4)*FLOAT(NTAR-1))/(SQRT(X(I,2))*(1.-
     > REL(I,4)))
      W(I,3)=SQRT(REL(I,4))/(1.-REL(I,4))
      W1=W1+W(I,3)
      DO 40 J=1,2
      BSUM(J)=BSUM(J)+W(I,J)*B(I)
      WSUM(J)=WSUM(J)+W(I,J)**2*ERRVAR(I)
   40 CONTINUE
      DO 60 I=1,2
      XR=BSUM(I)**2/(BSUM(I)**2+WSUM(I))
      GLOB3(I,1)=GLOB3(I,1)+XR
      GLOB3(I,2)=GLOB3(I,2)+XR**2
   60 CONTINUE
      DO 70 I=1,NTAR
      TRUE(I,2)=0.
      TRUE(I,4)=0.
   70 CONTINUE
      DO 80 I=1,NRAT
      W(I,3)=W(I,3)/W1
      DO 80 J=1,NTAR
      TRUE(J,2)=TRUE(J,2)+W(I,3)*XDAT(I,J)
      TRUE(J,4)=TRUE(J,4)+W(I,3)*ZDAT(I,J)
   80 CONTINUE
      CALL RCOMP(NRAT,W(1,3),R,REL,X(1,2),TRUREL(2))
      CALL RCOMP(NRAT,W(1,3),R,REL,UNITWT,TRUREL(4))
      RETURN
      END
****************************************
      SUBROUTINE TRUVAR(NRAT,SIGMAY,REL,X,GLOB4,NTAR,VARTOT)
      REAL SIGMAY(2),REL(30,8),X(30,2),GLOB4(2,3)
      SIGMAY(2)=0.
      DO 10 I=1,NRAT
      SIGMAY(2)=SIGMAY(2)+X(I,2)*(1.-REL(I,4))
   10 CONTINUE
      SIGMAY(2)=(VARTOT-SIGMAY(2))/FLOAT(NRAT**2*(NTAR-1))
      DO 20 I=1,2
      GLOB4(I,1)=GLOB4(I,1)+SIGMAY(I)
      GLOB4(I,2)=GLOB4(I,2)+SIGMAY(I)**2
      GLOB4(I,3)=GLOB4(I,3)+(SIGMAY(I)-1.)**2
   20 CONTINUE
      RETURN
      END
****************************************
      SUBROUTINE GETRUE(NTAR,TRUE,SUMTOT,SIGMAY,GLOB5,NRAT,TRUREL)
      REAL TRUDAT(8,5),TRUE(50,8),GLOB5(8,5,2),SIGMAY(2),TRUREL(8)
      YBAR=SUMTOT/FLOAT(NRAT*NTAR)
      DO 30 M=1,8
      I=M-1
      IF (I.EQ.0) I=8
      DO 10 J=1,5
      TRUDAT(I,J)=0.
   10 CONTINUE
      DO 20 J=1,NTAR
      TRUDAT(I,1)=TRUDAT(I,1)+TRUE(J,I)
      TRUDAT(I,2)=TRUDAT(I,2)+TRUE(J,I)**2
      TRUDAT(I,3)=TRUDAT(I,3)+TRUE(J,I)*TRUE(J,8)
      TRUDAT(I,4)=TRUDAT(I,4)+(TRUE(J,I)-TRUE(J,8))**2
   20 CONTINUE
      TRUDAT(I,2)=SQRT((TRUDAT(I,2)-TRUDAT(I,1)**2/FLOAT(NTAR))
     > /FLOAT(NTAR-1))
      TRUDAT(I,3)=(TRUDAT(I,3)-TRUDAT(I,1)*TRUDAT(8,1))/
     > (TRUDAT(I,2)*TRUDAT(8,2)*FLOAT(NTAR-1))
      TRUDAT(I,1)=TRUDAT(I,1)/FLOAT(NTAR)
      TRUDAT(I,4)=TRUDAT(I,4)/FLOAT(NTAR)
   30 CONTINUE
      TRUREL(8)=1.
      DO 50 I=1,8
      X2=TRUDAT(I,2)/SQRT(ABS(SIGMAY(1)*TRUREL(I)))
      X1=TRUDAT(I,1)-YBAR*X2
      DO 40 J=1,NTAR
      TRUDAT(I,5)=TRUDAT(I,5)+((TRUE(J,I)-X1)/X2-TRUE(J,8))*
   40 CONTINUE
      TRUDAT(I,5)=TRUDAT(I,5)/FLOAT(NTAR)
      DO 50 J=1,5
      GLOB5(I,J,1)=GLOB5(I,J,1)+TRUDAT(I,J)
      GLOB5(I,J,2)=GLOB5(I,J,2)+TRUDAT(I,J)**2
   50 CONTINUE
      RETURN
      END
****************************************
      SUBROUTINE GLOBAL(GLOB,GLOB2,GLOB3,GLOB4,GLOB5,NREPS)
      REAL GLOB(8,4,2),GLOB2(2,3),GLOB3(2,2),GLOB4(2,3),GLOB5(8,5,
      CALL GDIM3(GLOB,8,4,2,NREPS)
      CALL GDIM2(GLOB2,2,3,NREPS)
      DO 10 I=1,2
      GLOB3(I,2)=SQRT((GLOB3(I,2)-GLOB3(I,1)**2/FLOAT(NREPS))/
     > FLOAT(NREPS-1))
      GLOB3(I,1)=GLOB3(I,1)/FLOAT(NREPS)
   10 CONTINUE
      CALL GDIM2(GLOB4,2,3,NREPS)
      CALL GDIM3(GLOB5,8,5,2,NREPS)
      RETURN
      END
****************************************
      SUBROUTINE GDIM3(X,I1,I2,I3,NREPS)
      REAL X(I1,I2,I3)
      DO 10 I=1,I1
      DO 10 J=1,I2
      X(I,J,2)=SQRT((X(I,J,2)-X(I,J,1)**2/FLOAT(NREPS))/
     > FLOAT(NREPS-1))
      X(I,J,1)=X(I,J,1)/FLOAT(NREPS)
   10 CONTINUE
      RETURN
      END
****************************************
      SUBROUTINE GDIM2(X,I1,I2,NREPS)
      REAL X(I1,I2)
      DO 10 I=1,I1
      X(I,2)=SQRT((X(I,2)-X(I,1)**2/FLOAT(NREPS))/FLOAT(NREPS-1
      X(I,1)=X(I,1)/FLOAT(NREPS)
      X(I,3)=X(I,3)/FLOAT(NREPS)
   10 CONTINUE
      RETURN
      END
****************************************
      SUBROUTINE OUTPUT(GLOB,GLOB2,GLOB3,GLOB4,GLOB5,NRAT,RELBAR,
     > NTAR,LEVSD,SCALSD,RELSD,I1,I2,I3,NREPS)
      INTEGER TITLE(3)
      REAL GLOB(8,4,2),GLOB2(2,3),GLOB3(2,2),GLOB4(2,3),GLOB5(8,5,
     > LEVSD
      WRITE(6,999)
  999 FORMAT('1'/'-')
      WRITE(6,10)
   10 FORMAT('-',8X,'Table'//9X,'Means (over replications) of Mean
     > 'SD''s, Correlations with')
      WRITE(6,11)
   11 FORMAT('+',8X,59('_'))
      WRITE(6,12)
   12 FORMAT(9X,'Actual Reliabilities and Mean Square Deviations f
     > 'Actual')
      WRITE(6,9)
    9 FORMAT('+',8X,59('_'))
      WRITE(6,13)
   13 FORMAT(9X,'Reliabilities of Rater Reliability Estimates.')
      WRITE(6,14)
   14 FORMAT('+',8X,45('_'))
      CALL CONDIT(NRAT,RELBAR,NTAR,LEVSD,SCALSD,RELSD,I1,I2,I3,NRE
      WRITE(6,20)
   20 FORMAT(///25X,'Rater Reliability Estimates')
      WRITE(6,21)
   21 FORMAT('+',24X,27('_'))
      CALL OUT3(GLOB,8,4,2)
      WRITE(6,999)
      WRITE(6,22)
   22 FORMAT('-',8X,'Table'//9X,'Comparison of Estimates of Reliab
     > 'ity of Sums of Raters;')
      WRITE(6,23)
   23 FORMAT('+',8X,57('_'))
      WRITE(6,24)
   24 FORMAT(9X,'Population Values of Reliability of Composites
     > 'Weighted by')
      WRITE(6,25)
   25 FORMAT('+',8X,58('_'))
      WRITE(6,26)
   26 FORMAT(9X,'Two Methods; Comparison of Estimates of True Scor
     > 'Variance.')
      WRITE(6,27)
   27 FORMAT('+',8X,60('_'))
      CALL CONDIT(NRAT,RELBAR,NTAR,LEVSD,SCALSD,RELSD,I1,I2,I3,NRE
      WRITE(6,30)
   30 FORMAT(///29X,'Reliability of Sum')
      WRITE(6,31)
   31 FORMAT('+',28X,18('_'))
      CALL OUT2(GLOB2,2,3)
      WRITE(6,40)
   40 FORMAT(///22X,'Weighting for Maximum Reliability')
      WRITE(6,41)
   41 FORMAT('+',21X,33('_'))
      WRITE(6,50)
   50 FORMAT(//9X,'Estimate',10X,'Mean',9X,'SD')
      WRITE(6,55)
   55 FORMAT('+',8X,60('_'))
      DO 80 I=1,2
      READ(7,60) (TITLE(J),J=1,3)
   60 FORMAT(3A4)
      WRITE(6,70) (TITLE(J),J=1,3),(GLOB3(I,J),J=1,2)
   70 FORMAT(/9X,3A4,3X,2(F8.3,4X))
   80 CONTINUE
      WRITE(6,85)
   85 FORMAT(9X,60('_'))
      WRITE(6,90)
   90 FORMAT(///29X,'True Score Variance')
      WRITE(6,91)
   91 FORMAT('+',28X,19('_'))
      CALL OUT2(GLOB4,2,3)
      WRITE(6,999)
      WRITE(6,95)
   95 FORMAT('-',8X,'Table'//9X,'Means (over replications) of Mean
     > 'SD''s, Correlations with')
      WRITE(6,96)
   96 FORMAT('+',8X,59('_'))
      WRITE(6,97)
   97 FORMAT(9X,'Actual and Mean Square Deviations from Actual (MS
     > ' of True')
      WRITE(6,98)
   98 FORMAT('+',8X,60('_'))
      WRITE(6,99)
   99 FORMAT(9X,'Score Estimates; Also Includes Mean Square ',
     > 'Deviations from')
      WRITE(6,100)
  100 FORMAT('+',8X,58('_'))
      WRITE(6,101)
  101 FORMAT(9X,'Actual of Estimates Standardized to Mean Equal ',
     > 'to Estimated')
      WRITE(6,102)
  102 FORMAT('+',8X,59('_'))
      WRITE(6,103)
  103 FORMAT(9X,'True Score Mean and Variance Equal to the Product
     > 'of')
      WRITE(6,106)
  106 FORMAT('+',8X,52('_'))
      WRITE(6,107)
  107 FORMAT(9X,'Estimates of True Score Variance and ',
     > 'Reliability (MSE2).')
      WRITE(6,104)
  104 FORMAT('+',8X,56('_'))
      CALL CONDIT(NRAT,RELBAR,NTAR,LEVSD,SCALSD,RELSD,I1,I2,I3,NRE
      WRITE(6,110)
  110 FORMAT(///34X,'True Scores')
      WRITE(6,111)
  111 FORMAT('+',33X,11('_'))
      CALL OUT3(GLOB5,8,5,2)
      RETURN
      END
****************************************
      SUBROUTINE OUT3(X,I1,I2,I3)
      INTEGER TITLE(8,3)
      REAL X(I1,I2,I3)
      READ(7,5) ((TITLE(I,J),J=1,3),I=1,I1)
    5 FORMAT(3A4)
      IF (I2.EQ.4) WRITE(6,10)
   10 FORMAT(//9X,'Estimate',11X,'Mean',6X,'SD',7X,'R',7X,
     > 'MSE')
      IF (I2.EQ.5) WRITE(6,20)
   20 FORMAT(//9X,'Estimate',11X,'Mean',6X,'SD',7X,'R',7X,
     > 'MSE1',5X,'MSE2')
      WRITE(6,15)
   15 FORMAT('+',8X,60('_'))
      WRITE(6,30) (TITLE(8,I),I=1,3),((X(8,I,J),I=1,2),J=1,2)
   30 FORMAT(/9X,3A4,3X,2(F8.3,1X)/24X,2(2X,'(',F5.2,')'))
      DO 60 I=1,7
      WRITE(6,40) (TITLE(I,J),J=1,3),(X(I,J,1),J=1,I2)
   40 FORMAT(/9X,3A4,3X,5(F8.3,1X))
      IF (I2.EQ.4) WRITE(6,45) (X(I,J,2),J=1,I2)
   45 FORMAT(24X,4(2X,'(',F5.2,')'))
      IF (I2.EQ.5) WRITE(6,50) (X(I,J,2),J=1,I2)
   50 FORMAT(24X,5(2X,'(',F5.2,')'))
   60 CONTINUE
      WRITE(6,70)
   70 FORMAT(9X,60('_')//9X,'Note. Standard deviations are given i
     > 'parentheses.')
      RETURN
      END
****************************************
      SUBROUTINE OUT2(X,I1,I2)
      INTEGER TITLE(8,3)
      REAL X(I1,I2)
      READ(7,5) ((TITLE(I,J),J=1,3),I=1,I1)
    5 FORMAT(3A4)
      WRITE(6,11)
   11 FORMAT(//9X,'Estimate',10X,'Mean',9X,'SD',10X,'MSE')
      WRITE(6,15)
   15 FORMAT('+',8X,60('_'))
      DO 40 I=1,I1
      WRITE(6,30) (TITLE(I,J),J=1,3),(X(I,J),J=1,I2)
   30 FORMAT(/9X,3A4,3X,3(F8.3,4X))
   40 CONTINUE
      WRITE(6,50)
   50 FORMAT(9X,60('_'))
      RETURN
      END
****************************************
      SUBROUTINE CONDIT(NRAT,RELBAR,NTAR,LEVSD,SCALSD,RELSD,I1,
     > I2,I3,NREPS)
      INTEGER DTYPE(2,2)/'Norm','al ','Unif','orm '/
      REAL LEVSD
      WRITE(6,10)
   10 FORMAT(//24X,'N',18X,'Level',3X,'Scale',1X,'Reliability')
      WRITE(6,20)
   20 FORMAT('+',8X,16('_'),4X,40('_'))
      WRITE(6,30) NRAT,RELBAR
   30 FORMAT(9X,'Raters',7X,I3,4X,'Mean',12X,'0',7X,'1',6X,F3.2)
      WRITE(6,40) NTAR,LEVSD,SCALSD,RELSD
   40 FORMAT(9X,'Targets',6X,I3,4X,'SD',8X,3(5X,F3.2))
      WRITE(6,50) NREPS,(DTYPE(I,I1),I=1,2),(DTYPE(I,I2),I=1,2),
     > (DTYPE(I,I3),I=1,2)
   50 FORMAT(9X,'Replications',1X,I3,4X,'Distribution',2X,3(2A4))
      RETURN
      END
****************************************
      SUBROUTINE RCOMP(NRAT,W,R,REL,VAR,RELCOM)
      REAL W(30),R(30,30),NUM,DENOM,REL(30,8),VAR(30)
      NUM=W(1)**2*REL(1,4)*VAR(1)
      DENOM=W(1)**2*VAR(1)
      DO 20 I=2,NRAT
      IMIN1=I-1
      DO 10 J=1,IMIN1
      Z=2.*W(I)*W(J)*SQRT(VAR(I)*VAR(J))*R(I,J)
      NUM=NUM+Z
      DENOM=DENOM+Z
   10 CONTINUE
      NUM=NUM+W(I)**2*REL(I,4)*VAR(I)
      DENOM=DENOM+W(I)**2*VAR(I)
   20 CONTINUE
      RELCOM=NUM/DENOM
      RETURN
      END
****************************************
      SUBROUTINE INIT(GLOB,GLOB2,GLOB3,GLOB4,GLOB5)
      REAL GLOB(8,4,2),GLOB2(2,3),GLOB3(2,2),GLOB4(2,3),GLOB5(8,5,
      DO 30 I=1,2
      GLOB3(I,1)=0.
      GLOB3(I,2)=0.
      DO 10 J=1,3
      GLOB2(I,J)=0.
      GLOB4(I,J)=0.
   10 CONTINUE
      DO 20 J=1,8
      DO 25 K=1,4
      GLOB(J,K,I)=0.
      GLOB5(J,K,I)=0.
   25 CONTINUE
      GLOB5(J,5,I)=0.
   20 CONTINUE
   30 CONTINUE
      RETURN
      END
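
The data-generation step in SUBROUTINE DATA can be restated as a short Python sketch for readers who do not have the library routines called in the listing. This is an illustration under stated assumptions, not the thesis program: it covers only the normal-distribution case (DISTRB values of 1), it assumes that FRANDN(0.) returns a standard normal deviate, and the names simulate_ratings, level_sd, scale_sd, rel_mean and rel_sd are hypothetical stand-ins for LEVSD, SCALSD, RELBAR and RELSD. The floor of zero on the reliability is an added guard that does not appear in the listing.

```python
import math
import random


def simulate_ratings(n_raters, n_targets, level_sd, scale_sd,
                     rel_mean, rel_sd, rng):
    """Sketch of the uni-factor rating model in SUBROUTINE DATA (normal case)."""
    # One standard normal true score per target (TRUE(J,8) in the listing).
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n_targets)]
    # Lower truncation point for reliabilities (RLOW). The floor at 0 is an
    # added guard (assumption), not part of the original listing.
    rel_low = max(2.0 * rel_mean - 1.0, 0.0)

    raters = []
    for _ in range(n_raters):
        level = rng.gauss(0.0, 1.0) * level_sd            # A
        while True:                                       # redraw until 0 < B < 2
            scale = rng.gauss(0.0, 1.0) * scale_sd + 1.0
            if 0.0 < scale < 2.0:
                break
        while True:                                       # redraw until RLOW < REL < 1
            rel = rng.gauss(0.0, 1.0) * rel_sd + rel_mean
            if rel_low < rel < 1.0:
                break
        # Error variance chosen so that scale^2 / (scale^2 + err_var) == rel,
        # given unit true-score variance (ERRVAR(I) in the listing).
        err_var = scale ** 2 / rel - scale ** 2
        ratings = [level + scale * t + math.sqrt(err_var) * rng.gauss(0.0, 1.0)
                   for t in true_scores]
        raters.append({"level": level, "scale": scale, "reliability": rel,
                       "error_variance": err_var, "ratings": ratings})
    return true_scores, raters


if __name__ == "__main__":
    scores, judges = simulate_ratings(5, 20, 1.0, 0.25, 0.6, 0.1,
                                      random.Random(1984))
    for judge in judges:
        print(round(judge["scale"], 3), round(judge["reliability"], 3))
```

Passing an explicit random.Random instance keeps a run reproducible, much as the SEED card does for the listing. The sketch stops at the raw ratings; the standardization to ZDAT and the accumulation of the correlation matrix R are left to the listing above.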
