UBC Theses and Dissertations


The group measurement of generalizing ability at the grade six level Filmer-Bennett, Gordon Thomas 1946

THE GROUP MEASUREMENT OF GENERALIZING ABILITY AT THE GRADE SIX LEVEL

by

Gordon Thomas Filmer-Bennett

A Thesis submitted in Partial Fulfilment of The Requirements for the Degree of MASTER OF ARTS in the Department of Philosophy and Psychology

THE UNIVERSITY OF BRITISH COLUMBIA
October 1946.

ACKNOWLEDGMENT

I wish to extend thanks to Mr. R. Straight, Superintendent of the Vancouver City Schools, and to the principals and teachers who co-operated so willingly in the execution of this project. I am also indebted to Mr. T. Robinson and more especially to Miss T. Combolos for assistance in test administration and scoring. Credit is also due Mr. H. Parker who shared in the construction of these tests.

TABLE OF CONTENTS

INTRODUCTION

CHAPTER I. THE PROBLEM
1. Review of the Literature
2. Summary of the Literature
3. The Problem Defined
4. The Problem in Outline

CHAPTER II. GENERAL PROCEDURE, APPARATUS, AND SUBJECTS
1. General Procedure
2. Apparatus
3. Subjects

CHAPTER III. THE EXPERIMENTS
1. The D Score as a Basis for Analysis
2. Sex Differences
3. Group Differences
4. High and Low I.Q. Groups Compared
5. Summary of Chapter III

CHAPTER IV. TEST RELIABILITY AND VALIDITY
1. Test Reliability
2. An Aspect of Test Validity: Correlations with Intelligence and Other Variables
3. Summary of Chapter IV

CHAPTER V. ITEM VALIDITY AND ANALYSIS
1. Item-Difficulty
2. Validity and the Diagnostic Value of an Item
3. Items Analyzed
4. Another Aspect of Validity
5. Summary of Chapter V

CHAPTER VI. GROUP REACTION TO SUCCESSIVE TEST PRESENTATIONS
1. Improved and Unimproved Scores
2. Perfect Scores
3. Mean Scores
4. Positive and Negative Component Scores
5. Summary of Chapter VI

CHAPTER VII. FINAL EVALUATION AND CRITICISM

CHAPTER VIII. CONCLUSIONS, IMPLICATIONS, AND SUGGESTIONS FOR FUTURE RESEARCH
1. Conclusions
2. Implications for Educational or Applied Psychology
3. Suggestions for Future Research

APPENDIX I. The Sets of Directions
A. Instructions Issued Subjects in Experiment I.
B. Instructions Issued Subjects in Experiment II.
C. Instructions Issued Subjects in Experiment III.

APPENDIX II. Supplementary Tables.

BIBLIOGRAPHY.

LIST OF TABLES AND PLATES

A. TABLES

I. Average Intelligence of Matched Groups. Standard Deviations.
II. Average Chronological Age of Matched Groups. Standard Deviations. (Expressed in Months)
III. Average Intelligence of Sub-Groups. Standard Deviations.
IV. Average Chronological Age of Sub-Groups. Standard Deviations. (Expressed in Months)
V. Correlations Between D Score and Total Score. Standard Errors.
VI. Critical Ratios of Sex Differences in Mean Scores and Variability on the D Test.
VII. Group Differences in Mean Scores and Variability on The D Test. Critical Ratios.
VIIIa. Differences in Mean Scores and Variability Between High I.Q. Groups, Based Upon the Combined Total of A, B, C, and D Scores. Critical Ratios.
VIIIb. Differences in Mean Scores and Variability Between Low I.Q. Groups, Based Upon the Combined Total of A, B, C, and D Scores. Critical Ratios.
IX. Differences in Mean Scores between High and Low I.Q. Groups, Based Upon the Combined Total of A, B, C, and D Scores. Critical Ratios.
X. Correlations Between Test-Halves, their Standard Errors, and Reliability Coefficients for the D Test.
XI. Zero-Order Correlations of D Test with Intelligence, Reading, and Arithmetic Reasoning, and of Intelligence with Reading and Arithmetic Reasoning. Standard Errors.
XII. Correlations of Table XI Corrected for Attenuation.
XIII. The Relationship Of D Test Performance And School Achievement In Terms Of Probability (P), Contingency Coefficients (C) And Their Standard Errors (Approximate).
XIV. Order Of Item-Difficulty In Terms Of Percentages Of Maximum Possible Scores And The Number of Failures (F).
XV. Superiority Of Upper Over Lower Groups In Performance On Test Items, Expressed In Terms Of Critical Ratios Of The Differences In Mean Score And In Variability.
XVI. Percentage Accuracy Of Positive And Negative Component Scores On The Total Of Tests A, B, C, and D For Six Test Items.
XVII. Number Of Correct Responses To Individual Positive Instances Of Pog, Mib, And Wez On Each Of Tests A, B, C, And D. (Boys' Groups only)
XVIII. Intercorrelation Of Combined A, B, C, And D Scores On Test Items.
XIX. Number of Unimproved Scores Out Of A Possible 120 For Each Sub-Group, Classified In Terms Of Declination, Reversal Of Judgment, And Perseveration.
XX. Number Of Items Solved By Sub-Groups Together With The Total Number Of Solutions Out Of A Possible 360 For Whole Groups.
XXI. Group Performance On Total Of Each Of Tests A, B, C, and D. Mean Scores, Standard Deviations, And Standard Errors.
XXII. Critical Ratios Of The Differences Between Mean Test Scores Within Each Group (Bracketed Letters Designate Test Having Highest Score).
XXIII. Intercorrelations Of Average A, B, C, and D Test Scores.
XXIV. Percentage Accuracy Of Positive And Negative Component Scores Achieved By Whole And Sub-Groups On Tests A, B, C, and D.
A. Performance Of High IQ Groups On Total Of Tests A, B, C, and D. Mean Scores, Standard Deviations, And Standard Errors.
B. Performance of Low IQ Groups On Total Of Tests A, B, C, And D. Mean Scores, Standard Deviations, And Standard Errors.
C. Order Of Item-Difficulty For High IQ Groups In Terms of Percentage Of Maximum Possible Score.
D. Order Of Item-Difficulty For Low IQ Groups In Terms of Percentage Of Maximum Possible Score.
E. Comparison Of Number Of Perfect Scores In Test A With Number of Perfect Scores Continuing Throughout Tests A, B, C, and D. (Sub-Groups Only).
F. Critical Ratios Of The Differences In Variability Between A, B, C, and D Scores Within Each Group.
G. High Group Performance On Total of Each Of Tests A, B, C, and D. Mean Scores, Standard Deviations, And Standard Errors.
H. Low Group Performance On Total of Each of Tests A, B, C, and D. Mean Scores, Standard Deviations, And Standard Errors.
I. Critical Ratios Of The Differences Between Mean Scores Within High Groups.
J. Critical Ratios Of The Differences Between Mean Scores Within Low Groups.

B. PLATES

Figures 1-10. Teaching And Test Instances.
Figure 11. Group Progress On Individual Test Items.

INTRODUCTION

The demands of a constantly changing world coupled with the increasing emphasis upon scientific techniques bespeak the need for systematic thinking of the highest order. An inherent part of such thinking, the abstraction of meaning and the formation of concepts represent, according to Sherman¹, the acme of intelligence. As he points out, there is general agreement that "an accurate measure of a person's intelligence is possible only when his capacity to form and express concepts (abstract thinking) can be estimated." The fact that conceptual thinking is of such undisputed importance in directing human activity stimulates interest in developing ways and means toward its analysis and measurement.
1. Sherman, M., Intelligence And Its Deviations, New York, The Ronald Press Co., 1945, p. 15.

Since conceptual ability falls within the realm of thinking normally designated as reasoning, the search for adequate definitions might well commence with the latter. According to Billings¹, reasoning consists in "the solving of a practical or theoretical problem or difficulty by the use of or through the relating of past experiences." Extending this definition, Gates² classifies reasoning as "a form of learning", not vastly unlike trial-and-error learning, in which pertinent facts are recalled and combined with those perceived at the time. The popularity of this view is indicated by the frequent reference to rational learning, concept-learning, and the like in psychological journals and textbooks. While not in fundamental disagreement with this conception, McGeoch³ prefers to regard learning and reasoning as separate but closely related processes operating from near-opposite ends of a continuum, with overt response in the one giving way to symbolic response in the other. Following upon a series of fact-finding endeavours, an interpretation offered by Maier⁴ regards reasoning as spontaneous adjustment, a type of integrative response which depends for success upon the removal of persistent and old-established tendencies. It is the ease and readiness with which these persistencies of habit are sidetracked that distinguishes between able and poor reasoners. Furthermore, reasoning and learning ability are not necessarily highly correlated; a clever reasoner may be a poor learner, and vice-versa. Learning "furnishes us data with which to solve problems", but it "also furnishes us with habitual directions and so interferes with new adaptations." That is why age and "excess" learning are more often than not productive of stereotyped ways of thinking which becloud the possibility of new approaches toward the solution of a problem. "By regarding reasoning as a new combination of past experiences," Maier concludes, "we designate a mechanism which differs from learning and yet utilizes what has been learned."

1. Billings, M.L., "Problem-Solving In Different Fields of Endeavor", American Journal of Psychology, vol. XLVI, 1934, p. 260.
2. Gates, A.I., Psychology for Students of Education, New York, The MacMillan Co., 1930, pp. 386-393.
3. McGeoch, J.A., Psychology of Human Learning, New York, Longmans, Green and Co., 1942, p. 517.
4. Maier, N.R.F., "Reasoning in Rats and Human Beings", Psychological Review, vol. XLIV, 1937, pp. 365-378.

On the whole, the apparent controversy over the interrelation of reasoning and learning seems to spring mainly from differences in the definition of learning. As to what constitutes reasoning, there is general acknowledgment of the presence of a problem requiring solution, of the need for recall, and of the importance of past experience and the relating and reorganization of pertinent parts of this experience.

Turning to a quantitative analysis of reasoning, Thurstone¹ found that, instead of being highly specific, reasoning appeared divisible into merely two factors which, for want of further study, he tentatively labelled "I" and "D". Factor "I" is linked to the discovery of a rule or to the formulation of a hypothesis, and is therefore representative of induction; factor "D" is associated with the application of a general rule or principle to particulars, thus symbolizing deduction.

1. Thurstone, L.L., "Primary Mental Abilities", Psychometric Monographs, #1, 1938, pp. v, 86-89.
Recent experiments by Holzinger and others have suggested the possibility, however, that these may not actually exist as separate factors at all¹. Whatever the facts, induction and deduction would seem to be intimately related.

Aware of the probable overlap of inductive and deductive thinking, Woodworth² regards these terms as more aptly describing problems than thought processes. He uses "induction" synonymously with "concept formation" in reference to problems which call forth classificatory or generalized responses. Early though not wholly representative examples of this type of problem were those used by Hull and Kuo³, in which the abstraction of common elements from a series of patterns was considered an aspect of concept formation. In neither of these cases, however, was there provision for generalization in the sense in which Smoke⁴ employs the word. For Smoke, generalization or concept formation is something more than the mere abstracting of elements; it is a search for common relationships. Response is no longer to a single element within the stimulus pattern, but to a "dynamic whole". Elements, according to Smoke, may not even enter into the concept.

1. Wolfe, P., "Factor Analysis to 1940", Psychometric Monographs, #3, 1940, p. 33.
2. Woodworth, R.S., Experimental Psychology, London, Methuen and Co. Ltd., 1938, p. 801.
3. Hull, C.L., "Quantitative Aspects of the Evolution of Concepts; An Experimental Study", Psychological Monographs, vol. XXVIII, No. 1, 1920; Kuo, Z.Y., "A Behavioristic Experiment on Inductive Inference", Journal of Experimental Psychology, vol. VI, 1923, pp. 247-293. Both cited in Smoke, K.L., "An Objective Study of Concept Formation".
4. Smoke, K.L., "An Objective Study of Concept Formation", Psychological Monographs, vol. XLII, No. 4, 1932, pp. 2-8, 42.
But Tyler¹ contends that generalization "involves both elements and relations between these elements". It is indeed difficult to conceive of "relationship" as an entity in itself, for the very term implies "things related". However, this fact does not detract from Smoke's definition of generalization or concept formation as a "process whereby an organism develops a symbolic response (usually but not necessarily linguistic) which is made to the members of a class of stimuli patterns, but not to other stimuli"². Since response in this sense involves the formulation of a rule or principle, generalization or concept formation may be regarded as nothing less than an expression of reasoning ability. This definition of generalization, therefore, will be applied in the present study.

1. Tyler, F.T., Generalizing Ability of Junior High School Pupils: An Experimental Study of Rule Induction, unpublished Ph.D. Thesis, University of California, 1939.
2. Smoke, K.L., op. cit., p. 8.

CHAPTER I. THE PROBLEM

1. Review of the Literature.

Hull and Kuo pioneered the way toward an objective analysis of generalizing ability, but it remained for others to expand the technique. An inventory of such experiments to date indicates the development of several sub-types which, for immediate purposes, will be reduced to those emphasizing simple sorting tests and those favouring other means for the study of generalizing ability. Typical of the former are the experiments of Hanfmann and Kasanin¹. Their tests, administered individually, required the classification of geometric solids according to the possession of certain common properties. They outlined three significant characteristics of conceptual thinking, namely, "the importance of the attitude of looking for categories, the recognition of many possibilities rather than merely the first one to occur, and the consideration of the total system". Conducting an experiment along almost identical lines, Thompson² found quantitative and qualitative differences between the generalizing ability of 6- to 8-year-olds and that of 9- to 11-year-olds, the latter exhibiting less rigidity in their attack upon the problems. Her results are closely allied to those of Long and Welch³ in showing that classification on the basis of form is probably one of the lowest levels of generalization. On the whole, these experiments involved a relatively small number of subjects and were sparing in their use of statistical analysis.

1. Hanfmann, E., & Kasanin, J., "A Method for the Study of Concept Formation", Journal of Psychology, vol. III, 1937, pp. 524-529.
2. Thompson, J., "The Ability of Children of Different Grade Levels to Generalize on Sorting Tests", Journal of Psychology, vol. XI, 1941, pp. 119-126.
3. Long, L., and Welch, L., "A Preliminary Investigation of Some Aspects of the Hierarchical Development of Concepts", Journal of General Psychology, vol. XXII, 1940, pp. 359-388.

The other type of study is exemplified by the individual experiment of Ewart and Lambert⁴ wherein the subject advanced toward a solution of the problem through the perception of a complexity of positional relationships. Generalization was found to be highly correlated with intelligence, and to benefit from verbal instruction. Conclusions were based upon a small, select group and made no reference to reliabilities or sex differences.

A group experiment by Peterson⁵ required the derivation of a general rule or principle (physical law of the lever) operating in each of a series of 20 problems. Performance was rated according to the number of problems solved and a correct statement of the underlying principle involved. The ability to solve problems in this setting bore little or no relation to intelligence. It also appeared that success in solving the problems was adversely affected by a reduction in the amount of instruction forthcoming.

4. Ewart, P.H., & Lambert, J.F., "The Effect of Verbal Instructions Upon the Formation of a Concept", Journal of General Psychology, vol. VI, 1932, pp. 400-413.
5. Peterson, G.M., "An Empirical Study of the Ability to Generalize", Journal of General Psychology, vol. VI, 1932, pp. 90-114.

Tyler¹ conducted an investigation of generalizing ability, using a combination light and switch panel. The problem was to discover from patterns arranged thereon, the switch which turned out all the lights. This was an individual experiment. Correlations with intelligence were significant and substantial. Sex differences favouring the boys were probably linked to the mechanical nature of the apparatus. Results suggested that solutions were achieved with the aid of both positive and negative instances, where "positive" was used to describe those examples which illustrate the rule governing solution, and "negative" to examples which violate this rule. Tyler also found that solutions were not always accompanied by the ability to verbalize the rule or principle concerned.

Sidestepping the need for overt manipulation, Smoke² required his subjects to discern common relationships between elements contained within a series of geometric patterns.
As in the preceding experiment, successful generalization did not imply ability to define the concept verbally. The negative teaching example (defined as in Tyler's experiment) promoted greater accuracy, but had a less decided effect upon rate of performance; a majority preference lay with the use of both positive and negative teaching examples. Among factors observed to be characteristic of concept formation were (1) grouping, (2) insightful behavior, and (3) formulation, testing, and acceptance or rejection of hypotheses. Correlations computed for one experimental group showed intelligence and speed of generalizing to be significantly related. This relationship was not determined for the other groups, nor were comparisons made with performance on other reasoning tests. As Tyler has previously noted, no study was made of sex differences nor of the effect of order of presentation upon item-difficulty. The subjects were tested individually, and together formed a relatively small and highly select group.

1. Tyler, F.T., op. cit.
2. Smoke, K.L., op. cit.; Smoke, K.L., "Negative Instances in Concept Learning", Journal of Experimental Psychology, vol. XVI, 1933, pp. 583-8.

Foreseeing the possibilities behind Smoke's technique, Tyler suggested the need for its further application. To this end Wood¹ made numerous changes in Smoke's tests to permit their adaptation to a lower age level; the method of presentation was altered somewhat and most of the items were completely redefined. The tests were administered individually to 50 Grade VI boys, half of whom were subjected to instruction by means of positive examples, while the remainder were taught by both positive and negative examples. In each case the teaching examples were presented in cumulative fashion and were allowed to remain before the subjects throughout the testing period. On the basis of his results Wood concluded that the negative teaching example greatly assists generalization, especially among those of lesser intelligence and in those cases involving more complex items. Recognition was a more reliable measure of generalizing ability than verbalization, though no one of recognition, verbalization, or reproduction could be depended upon to precede the others in order of appearance. In spite of the limited size of the groups, correlations with other variables might have been computed. No provision was made for the study of sex differences, nor were reliabilities listed. Performance being rated solely according to the number of perfect scores, there was no differentiation between individuals of unequal ability who were both capable of obtaining the solution to an item. Had a time limit been imposed and all test trials been made compulsory, a composite score made up of perfect scores and number of trials required to reach a solution would have yielded a more accurate measurement of the ability under consideration.

1. Wood, J.E., The Relative Role of Positive and Negative Instances In Concept Formation, unpublished Master's Thesis, Vancouver, University of British Columbia, 1943.

The first to apply Smoke's principle in a group experiment, Dickinson¹ subjected the test items to still further changes and modified procedure in accordance with several of Wood's findings by reducing the number of teaching and test instances. With 160 Grade II children for subjects, she made a comprehensive study of the effect upon generalizing ability of instruction utilizing successive and cumulative presentation of both positive and negative examples and of positive examples only. Teaching examples successively presented were removed during the testing period, hence involving the need of recall; under cumulative presentation they were continuously exposed as in Wood's experiment. Subjects were selected to form four groups, of 20 boys and 20 girls each, matched on the basis of intelligence and chronological age. Achievement, registered in terms of mean scores rather than number of perfect scores, was most successful under successive presentation and was more impaired than aided by the introduction of negative teaching examples, though these trends were not statistically significant. Boys showed superior ability when instructed by positive and negative examples, and girls when instructed by only positive examples; again, however, these differences were not dependable. Test reliabilities were high. Correlations with intelligence and reading ability were, in general, low and negligible, but the relationship of test performance to scholastic achievement was not determined. These results should be confirmed by employing larger samples embodying a wider intelligence range.

1. Dickinson, A.E., An Investigation Into The Generalizing Ability Of Grade Two Pupils; Master's Thesis, Vancouver, University of British Columbia, 1943, published in abstract in Journal of Educational Psychology, vol. XXXV, 1944, pp. 432-441.

2. Summary of the Literature

From this condensed account of related studies in generalizing ability emerge the following conclusions:

1. Differences in experimental results, which are probably attributable to group differences as well as to differences in test material and methods of procedure, illustrate the need for more repetition and follow-up of experiments previously undertaken.
2. Wherever possible, individual experiments should be repeated as group experiments, and vice versa.
3. Verbalization of the rule or principle governing solution of a problem is a doubtful criterion of generalizing ability or concept formation.
4. The question of sex differences and the value of the negative teaching example demand closer study.
5. Consideration of test validity was usually restricted to correlations with various criteria; no reference was made to item validity.
6. No attempt was made to analyze reaction to positive and negative test instances.

3. The Problem Defined

Generalizing ability may be estimated by any one of the experimental methods previously described, but that utilized by Smoke appears most suited to both individual and group testing at any level. Smoke's technique permits as close an approach to the study of the ordinary everyday process of conceptual thinking as any yet devised. The present problem may be broadly defined as the group measurement of generalizing ability at the Grade VI level, where "generalization" is used synonymously with "concept formation" to designate the process whereby a common relationship is abstracted from a series of geometric patterns. Its basic assumptions are that

(a) Generalizing ability, if possessed by children at the Grade VI level, can be measured by the method described.
(b) Generalizing ability is more accurately represented by scores on a recognition test than by verbalization of the rule involved in the solution.

4. The Problem in Outline

Specific questions which this study will attempt to investigate may be outlined in brief.

1. What is the effect upon generalizing ability of group instruction which calls for the exposure one by one of patterns representative of the rule or principle to be deduced and which requires their removal during the testing period?
2. What is the effect upon generalizing ability of group instruction which calls for the alternate exposure one by one of patterns representative and of patterns not representative of the rule or principle to be deduced and which requires their removal during the testing period?
3. What is the effect upon generalizing ability of group instruction which calls for the alternate exposure by cumulative presentation of patterns representative and of patterns not representative of the rule or principle to be deduced and which permits their continued exposure during the testing period?
4. To what extent, if any, do sex differences govern generalizing ability in this setting?
5. With what reliability and accuracy can the group measurement of generalizing ability at the Grade VI level be accomplished? How closely related to other forms of mental achievement is the ability to abstract spatial relationships?
6. What are some of the factors of difficulty which impede successful generalization of this type?
7. Are test stimuli which exemplify the rule or principle governing solution and those which do not illustrate this rule identified with equal accuracy?

CHAPTER II. GENERAL PROCEDURE, APPARATUS, AND SUBJECTS

1. General Procedure.
Since this study was conducted with a view to retesting several hypotheses advanced by Smoke and Wood, and to determining the extent to which their results pertain to group situations, it was desirable that the general conditions surrounding concept formation in the present investigation parallel closely those of the previous studies. The nine different geometric symbols or "concepts" constituting the present tests were borrowed from Wood who, in turn, designed them from Smoke's. A nonsense syllable was used in both cases to designate a given series of geometric patterns exhibiting a common relationship between certain elements contained within them. These "concepts" and their accompanying definitions are listed below in the order in which they were presented to the subjects.

Concept   Definition
Dax:      A triangle containing a dot.
Mef:      A circle, half black and half white.
Vec:      A straight line, at one end of which is a dot in direct line with it.
Mib:      A circle touching a square.
Zum:      A circle with one dot inside and one dot outside it.
Tov:      A square and four crosses, one near each of its four sides.
Pog:      Two lines (straight or otherwise) of unequal length.
Wez:      A circle touching the shortest side of a triangle.
Zif:      A circle inside a rectangle, and touching its two longest sides but not touching either end.

Sets of teaching and test instances, identical to those used by Wood, with the exception of a slight change in the order of presentation and in the number of teaching and test instances employed, were prepared. Instead of eight teaching examples as in Wood's experiment, four such examples of a given concept were presented. This procedure, already adopted by Dickinson, is in conformity with Wood's findings, namely, that performance showed little or no improvement beyond the fourth presentation. Likewise, the number of test instances was reduced from sixteen to ten. Hereafter the terms "example" and "instance" will be used to distinguish patterns comprising the teaching series and those comprising the test series, respectively. When referring collectively to teaching and test patterns, the term "instances" will be applied.

[Figs. 1-10: Teaching and test instances (plates not reproduced in this text).]

The teaching and test instances related to each of the nine "concepts" included both positive and negative instances. Positive examples or instances of a given concept referred to those patterns embodying the relationship which defined the concept in question, while negative examples or instances referred to patterns in which this relationship was absent. Positive examples or instances differed from one another in the size and position of the elements, and in heaviness of outline; negative examples or instances, besides being dissimilar in these respects, violated one or more of the conditions demanded by the concept. A clearer conception of the material employed in this study may be had by reference to Figs. 1 - 10. As regards the tests, positive and negative instances were arranged in chance order, one test containing as many as six positive instances, several containing five such instances, while the remainder had but four.

This study was divided into three experiments, each of which may be outlined briefly.
In Experiment I four positive teaching examples were serially presented one at a time. Each was exposed for a study-period of 8 seconds, after which it was removed and followed immediately by the test bearing a time-limit of 25 seconds. This manner of introducing and exposing the teaching examples will be designated by the term "successive presentation". In Experiment II the procedure was identical with that just described except that the four teaching examples, instead of being positive throughout, included an equal number of positive and negative examples. Presented alternately, each positive and negative example was submitted for study, and upon its removal was followed by the test. Experiment III employed the same teaching examples as in Experiment II, but differed from the latter in the method of presentation. Each example was presented together with those which preceded it, and, in addition to the prescribed 8 seconds of exposure, was permitted to remain before the subjects during the whole of the testing period, thereby greatly reducing the effect of memory upon the learning of the concepts. This system of presenting the teaching examples will be referred to as "cumulative presentation".

The three experiments were distinguished from one another, therefore, in respect to the teaching method applied. In point of similarity, however, all experiments employed precisely the same tests, each test being repeatedly presented subsequent to the study of the teaching examples. Also, in accordance with the need to control all factors likely to influence test procedure, a standard set of instructions accompanied each experiment. In each case, the subjects were introduced to the problems by an illustrative example, the concept "Dax", through a series of steps comparable to those to be employed in learning the concepts comprising the test. Finally, the subjects were warned not to change their answers to an item after the next test item appeared. No further assistance beyond the preliminary instructions was given at any point in the experiments.

In conclusion, the three experiments may be classified as follows:

Experiment I: A group study of the effect upon concept formation of the successive presentation of positive teaching examples.

Experiment II: A group study of the effect upon concept formation of the successive presentation of alternate positive and negative teaching examples.

Experiment III: A group study of the effect upon concept formation of the cumulative presentation of alternate positive and negative teaching examples.

2. Apparatus.

In contrast to the presentation methods used in the previous studies, the stimuli were submitted to groups of subjects by means of lantern slides. Two projectors were employed, one to flash on the teaching examples, the other the test instances. The experiments were conducted in the schools, a classroom or small auditorium being set aside for the purpose and partially darkened, but permitting of sufficient light for the recording of answers. Total time required for administering the tests was approximately 35 minutes.

In making up the film slides the necessary patterns were drawn on plain white cards. Six teaching cards, four positive and two negative, and ten test cards were drawn up for each concept, the total number of cards being 144. These were then numbered and labelled, arranged in the desired order, and photographed. The negative film was used in making up the slides, with the result that all figures were projected upon the screen as white against a black background.

Response was recorded in triple-page booklets, the first page of which is shown in Fig. 10.
Space was allotted in which the subject was required to fill in his (or her) name, sex, and school, further provision being made for additional data to be inserted by the experimenter. On the first page, as on the two succeeding pages, space was provided for responding to three concepts, each in order of presentation. Under each item number the first, second, third, and fourth presentations of the test were labelled A, B, C, and D, respectively. Henceforth throughout this study it will be found convenient so to designate the several presentations of the test. The numbers 1, 2, 3, ... 10 corresponded to the ten instances of the concept, positive and negative, which constituted the test. The subject's task was to determine whether a given test instance was or was not representative of a particular concept, and to draw a circle around either "yes" or "no" accordingly. This procedure is outlined in detail in Appendix I.

In scoring the results, use was made of the three columns at the right. These columns were labelled "+", "-", and "T", from left to right, and were reserved in that order for scores based upon the number of correct recognitions of positive instances, of negative instances, and of the total of positive and negative instances. In the following chapters, except where specific mention is made of positive and negative component scores, discussion of test performance will have reference solely to "T" scores.

3. Subjects.

The present study was conducted with the collaboration of the Superintendent and of the principals and teachers of nine Vancouver schools. The tests were administered in June 1942 to Grade VI children of white extraction. Selection of schools was such as to provide a fair distribution of socio-economic factors.
In order to insure a suitable level of difficulty for each test item and to determine the adequacy of the instructions, three trial experiments were conducted with 60 Grade VI subjects. The experimental groups upon which the analysis is based were limited to 270 of a total of 440 subjects originally tested. This reduction arose from the need for making up three comparable groups, one for each of the three experiments. Each group was numbered according to the experiment in which it participated; thus, those subjects engaged in Experiment I formed Group I, those engaged in Experiment II formed Group II, and so on. These groups were composed by matching individuals on the basis of sex, chronological age, and I.Q. as measured by the Otis Self-Administering Intermediate Examination. Boys and girls involved in one experiment were matched with one another and with boys and girls taking part in each of the two remaining experiments. Since sex was among the factors determining classification, it is obvious that there must be altogether 6 experimental groups, each containing 45 subjects.

The average intelligence of the 6 groups thus formed is shown in Table I. Critical ratios of the differences between means and standard deviations of any two groups did not exceed .38. The average chronological age of each of these groups is listed in Table II. Here again differences were statistically negligible.

TABLE I. AVERAGE INTELLIGENCE OF MATCHED GROUPS. STANDARD DEVIATIONS.

          GROUP I            GROUP II           GROUP III
          Boys     Girls     Boys     Girls     Boys     Girls
A.M.      110.87   110.54    111.08   111.08    110.60   110.54
S.D.        9.78    10.35      9.90     9.60     10.29     9.78

TABLE II. AVERAGE CHRONOLOGICAL AGE OF MATCHED GROUPS. STANDARD DEVIATIONS. (EXPRESSED IN MONTHS)

          GROUP I            GROUP II           GROUP III
          Boys     Girls     Boys     Girls     Boys     Girls
A.M.      145.56   145.62    145.62   145.34    145.34   145.62
S.D.        5.58     5.44      5.10     5.38      4.62     5.40

In matching it was found impracticable to employ a range smaller than ±5 I.Q. points and 5 months chronological age. For example, a boy in Group I who possessed an IQ rating of 112 and a chronological age of 12 years 3 months was matched with subjects in each of the other 5 groups whose IQ's fell within the range 107 - 117, the further requirement being that the difference between the chronological ages of any two of the subjects thus matched not exceed 5 months. Intelligence ratings were obtained from tests administered to all subjects earlier in the year; chronological ages were listed as of June 30, 1942.

Since the matter of distinguishing between the performances of subjects of high intelligence rating and those of low intelligence rating was of considerable significance for this study, it was also decided to subdivide each of the 6 experimental groups into high, medium, and low IQ groups, as indicated in Table III.

TABLE III. AVERAGE INTELLIGENCE OF SUB-GROUPS. STANDARD DEVIATIONS.

                      GROUP I            GROUP II           GROUP III
                      Boys     Girls     Boys     Girls     Boys     Girls
HIGH GROUP    A.M.    121.10   121.77    121.63   121.23    121.90   121.50
              S.D.      4.74     4.44      4.64     3.92      4.52     4.00
MED. GROUP    A.M.    111.50   110.83    111.37   111.77    110.57   110.70
              S.D.      3.66     3.90      2.12     3.78      3.92     3.56
LOW GROUP     A.M.    100.17    99.23    100.03   100.57     99.37    99.90
              S.D.      4.42     5.74      4.64     4.66      5.44     4.88

Thus, the resulting groups each involved 15 subjects. The maximum difference in average intelligence between like groups had a critical ratio of .82. Differences in variability of IQ between like groups or between like and unlike groups were somewhat greater, though none was significant. Average chronological ages pertaining to the groups in question are listed in Table IV.
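The matching rule described above can be sketched in a few lines of code. This is only an illustration, assuming each subject is recorded with an I.Q. rating and an age in months; the names `Subject` and `is_match` are hypothetical, and the tolerances are the 5 I.Q. points and 5 months stated in the text. The thesis does not describe how candidate matches were searched, so only the acceptance test is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    iq: int          # Otis I.Q. rating
    age_months: int  # chronological age in months


def is_match(a: Subject, b: Subject, iq_tol: int = 5, age_tol: int = 5) -> bool:
    """Return True when two subjects fall within the stated matching ranges."""
    return (abs(a.iq - b.iq) <= iq_tol
            and abs(a.age_months - b.age_months) <= age_tol)


# The worked example from the text: a boy with I.Q. 112, aged 12 years 3 months.
reference = Subject(iq=112, age_months=12 * 12 + 3)

# Hypothetical candidates drawn from another group (invented for illustration).
candidates = [
    Subject(iq=107, age_months=150),  # inside both ranges
    Subject(iq=118, age_months=147),  # I.Q. outside 107 - 117
    Subject(iq=115, age_months=141),  # age differs by 6 months
]

matches = [c for c in candidates if is_match(reference, c)]
print(matches)
```

Only the first hypothetical candidate satisfies both conditions, which mirrors the 107 - 117 range quoted for the boy with an I.Q. of 112.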
In each instance it will be observed that the greatest differences in average chronological age were to be found between the high and low groups, though again these differences were not statistically significant. Considered horizontally and vertically, differences in variability between groups yielded a maximum critical ratio of 1.23. For present purposes these sub-groups are sufficiently well equated to provide some indication of group performance in relation to intelligence, though the inconstancy of the age factor, together with the small number of subjects involved in each case, renders any conclusions based thereon merely suggestive of certain trends in performance.

TABLE IV. AVERAGE CHRONOLOGICAL AGE OF SUB-GROUPS. STANDARD DEVIATIONS. (EXPRESSED IN MONTHS)

                      GROUP I            GROUP II           GROUP III
                      Boys     Girls     Boys     Girls     Boys     Girls
HIGH GROUP    A.M.    142.77   144.37    144.10   143.30    143.70   144.83
              S.D.      4.80     4.80      5.00     4.00      3.80     4.60
MED. GROUP    A.M.    145.97   144.63    144.50   144.77    144.63   144.63
              S.D.      6.00     5.80      5.00     5.60      4.60     5.40
LOW GROUP     A.M.    147.97   147.83    148.83   147.97    147.70   147.97
              S.D.      4.60     5.40      4.80     5.40      4.40     5.40

CHAPTER III.

THE EXPERIMENTS

1. The D Score As A Basis For Analysis.

In conducting a quantitative analysis of the data, the first question concerns the particular score that is to serve as a basis for interpreting results. Is it possible to find an approach to the study of test performance which combines maximum validity with minimum computation? For example, is the sum of the scores on the four tests for all eight concepts to provide the basis from which our conclusions derive, or is there some other standard equally acceptable, but which lends itself more readily to calculation?
In an effort to provide a satisfactory answer to the problem, it was decided to compute the correlations between the total of the A, B, C, and D scores and the D score*. The results are set forth in Table V. It may be noted that slightly higher correlations were found in the case of Group I in which the negative teaching examples were absent, but in general it appears that this study may well be based upon an analysis of the D score.

TABLE V. CORRELATIONS BETWEEN D SCORE AND TOTAL SCORE. STANDARD ERRORS.

          GROUP I          GROUP II         GROUP III
          Boys    Girls    Boys    Girls    Boys    Girls
r         .94     .92      .89     .90      .91     .90
SE        .02     .02      .03     .03      .03     .03

* [...] of scores on the A test for eight concepts, unless otherwise stated. The same applies in regard to B, C, and D scores.

2. Sex Differences.

With the establishment of the D score as the basis for analysis, the next step calls for a comparison of sex groups to determine the advisability of continuing to treat these as separate units or of combining their results for each experiment. The answer to the question of sex differences is provided by the critical ratios of Table VI in which mean scores and standard deviations are compared. In both cases involving successive presentation boys showed only a slight tendency to exceed the girls, this tendency being most evident in Group I. On the other hand, in the case of cumulative presentation the girls achieved the highest mean score. Though Table VI makes no mention of the fact, satisfactory significance* (critical ratio of 2.02) characterized the difference between achievement of boys and of girls, to the advantage of the latter.

TABLE VI. CRITICAL RATIOS OF SEX DIFFERENCES IN MEAN SCORES AND VARIABILITY ON THE D TEST.

          GROUP I          GROUP II         GROUP III
          Boys    Girls    Boys    Girls    Boys    Girls
A.M.      53.57   52.86    48.50   47.52    51.43   57.12
C.R.           .80              .35             2.21
S.D.       7.08    8.40    13.48   13.24    14.40    9.48
C.R.          1.14              .12             2.70

* Henceforth, critical ratios of 1.65, 2.35, and 3.00 will constitute the lower arbitrary limits for satisfactory, high, and virtual statistical significance, respectively. See Peters and Van Voorhis, Statistical Procedures and Their Mathematical Bases, pp. 138, 17b.

Thus, an overall comparison of boys with girls indicates that while the girls made the lowest average score (Group II), they also attained the highest average (Group III) on the D Test. However, while the results offer no conclusive evidence of marked sex differences in the handling of concepts, the extent of the differences in mean scores and variability between boys and girls in Group III justifies treating the sexes separately throughout the remainder of this study.

3. Group Differences.

Group differences are next examined to determine the effect of variations in the method of instruction. The necessary data for this purpose are furnished by Table VII. The groups whose differences are under study in each case are indicated in the column at the extreme left, the remaining columns containing the actual differences in mean scores and variability, together with the critical ratios of these differences.

TABLE VII. GROUP DIFFERENCES IN MEAN SCORES AND VARIABILITY ON THE D TEST. CRITICAL RATIOS.

             BOYS                              GIRLS
             Mean            S.D.              Mean            S.D.
GROUPS       DIFF.   C.R.    DIFF.   C.R.      DIFF.   C.R.    DIFF.   C.R.
I - II       5.07   +2.23    6.40   -3.98      5.34   +2.28    4.84   -2.92
I - III      2.14   + .89    7.32   -4.33      4.26   -2.27    1.08   - .81
II - III     2.93   -1.00     .92   - .44      9.60   -3.95    3.76   +2.19
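The critical ratios of Table VII can be checked against the means and standard deviations of Table VI. The sketch below assumes the usual formula for independent groups (the difference divided by the standard error of that difference) and N = 45 per group, as given in Chapter II; the thesis does not print its formula, so this is a reconstruction, and the function names are illustrative. A positive ratio means the first-named group had the higher mean or the greater variability.

```python
import math

N = 45  # subjects per experimental group, as stated in the text


def cr_means(m1: float, sd1: float, m2: float, sd2: float, n: int = N) -> float:
    """Difference between two means divided by its standard error."""
    se_diff = math.sqrt(sd1 ** 2 / n + sd2 ** 2 / n)
    return (m1 - m2) / se_diff


def cr_sds(sd1: float, sd2: float, n: int = N) -> float:
    """Difference between two standard deviations divided by its standard
    error, taking SE(sd) = sd / sqrt(2n) for each group (assumed formula)."""
    se_diff = math.sqrt(sd1 ** 2 / (2 * n) + sd2 ** 2 / (2 * n))
    return (sd1 - sd2) / se_diff


def significance_label(cr: float) -> str:
    """Classify a critical ratio by the thesis's limits of 1.65, 2.35, 3.00."""
    size = abs(cr)
    if size >= 3.00:
        return "virtual"
    if size >= 2.35:
        return "high"
    if size >= 1.65:
        return "satisfactory"
    return "not significant"


# Boys, Group I against Group II (values from Table VI).
mean_cr = cr_means(53.57, 7.08, 48.50, 13.48)
sd_cr = cr_sds(7.08, 13.48)

print(round(mean_cr, 2), significance_label(mean_cr))
print(round(sd_cr, 2), significance_label(sd_cr))
```

Under these assumptions the two ratios agree with the tabulated +2.23 and -3.98 to within about .01, which suggests, without proving, that this was the formula applied.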
A positive critical ratio indicates that the first-named group in the extreme left-hand column attained the higher mean score or greater variability, as the case may be; on the other hand, a negative ratio points to the first-named group as possessing the lower mean score or as being the less variable of the two.

A comparison of the mean scores of all boys' groups revealed Group I as the most successful, Group II as the least successful of the three groups. Difference in mean scores between Groups I and II approached high statistical significance, while that between Groups I and III was considerably less.

Of equal interest is the manner in which the scores were distributed about the mean in the above groups. Differences between standard deviations showed that boys in Group III were scarcely more variable in performance than those in Group II; on the other hand, boys in Groups II and III showed promise of always displaying greater variability on the tests than boys in Group I.

The foregoing results indicate, in the case of the boys, a tendency toward higher mean scores and greater uniformity of response from successive presentation involving only positive examples than from either of the two remaining methods.

Turning now to the girls, those in Group III achieved the highest mean score on the D Test, those in Group II the lowest. The difference between scores in Group I and scores in Groups II and III approached high significance, while the difference between scores in Groups II and III was virtually significant.

As regards variability, Group II girls showed themselves more variable than Group III girls and decidedly more so than Group I girls, a satisfactory and a high significance attaching to the respective differences.
There was little difference in variability between Groups I and III.

Considering only mean scores and disregarding differences in variability, the results suggest the advantage to the girls of the method employing cumulative presentation of both positive and negative examples in the teaching series. The boys, on the other hand, seemed to derive most benefit from successive presentation in which negative examples of the concept were excluded. For both boys and girls successive presentation utilizing the negative example appeared as the least favorable mode of instruction, and in both cases involving positive and negative examples the method of cumulative presentation held the advantage.

While the recommendation of any particular method of presentation would be rather presumptuous at this stage in the analysis, at least one or two facts are worth noting: Group instruction as herein provided leads to a lower average score and to a greater spread in achievement when assisted by negative examples than when only positive examples are presented. These findings are in sharp contrast to those of Wood in which the presence of negative examples within the teaching series boosted performance and produced a closer grouping of the individual scores about the mean. Dickinson's results yielded no clear-cut tendencies in either direction, though they offered some evidence of a decrease in variability accompanying presentation of the negative example.

Dickinson's study and the present one advance contradictory claims regarding the effect upon performance of varying only the memory factor. Thus, in the former higher achievement accompanied successive presentation, while in the latter a reverse trend favored cumulative presentation where the negative example was concerned.

4. High And Low IQ Groups Compared.
In concluding this phase of our study, an attempt should be made to determine the relationship between intelligence and concept formation. An insight into relative performance by subjects differing widely in intelligence may be gained by reference to the sub-groups mentioned in the last chapter. We shall find it convenient at this time to limit ourselves to a study of high and low IQ groups, utilizing the combined results of the A, B, C, and D tests for each group.

TABLE VIIIa. DIFFERENCES IN MEAN SCORES AND VARIABILITY BETWEEN HIGH IQ GROUPS, BASED UPON THE COMBINED TOTAL OF A, B, C, AND D SCORES. CRITICAL RATIOS.

             BOYS                              GIRLS
             Mean            S.D.              Mean            S.D.
GROUPS       DIFF.   C.R.    DIFF.   C.R.      DIFF.   C.R.    DIFF.   C.R.
I - II        1.33   - .12    8.10   -1.07      4.00   + .34    7.60   - .90
I - III       7.33   - .58   17.10   -1.90     16.67   -1.98   12.00   +2.02
II - III      6.00   - .43    9.00   - .92     20.67   -2.02   19.60   +2.71

TABLE VIIIb. DIFFERENCES IN MEAN SCORES AND VARIABILITY BETWEEN LOW IQ GROUPS, BASED UPON THE COMBINED TOTAL OF A, B, C, AND D SCORES. CRITICAL RATIOS.

             BOYS                              GIRLS
             Mean            S.D.              Mean            S.D.
GROUPS       DIFF.   C.R.    DIFF.   C.R.      DIFF.   C.R.    DIFF.   C.R.
I - II       28.67   +2.78   17.50   -2.40     16.67   +2.08     .70   - .12
I - III      29.33   +2.78   18.50   -2.48     22.66   -2.38    8.10   -1.20
II - III       .66   + .05    1.00   - .11     39.33   -4.08    7.40   -1.09

Since interest again lies with obtained differences between scores rather than with the actual scores themselves*, only these differences and their critical ratios are tabulated above. As with Table VII, no provision is made for a direct comparison of boys with girls.

Excluding for a moment all comparisons involving Group III girls, it will appear that score differences between high IQ groups were negligible, while those between low groups were highly in favor of Group I.
In general, an increase in variability accompanied the introduction of the negative example. These indications suggest that, while the negative example has little effect upon the group performance of bright children, it may actually prove detrimental to those of lesser intelligence under conditions similar to those which prevailed in these experiments.

* Mean scores and standard deviations for high and low IQ groups are provided in Table A-B, Appendix II.

Table IX presents these differences from another angle by directly comparing the mean performance of high and low groups, with the following results: Scores in Low Groups II were significantly lower than scores in High Groups I, while the differences between scores in Low Groups I and High Groups II were of only satisfactory or negligible significance. All this suggests that the negative example serves merely to re-emphasize the difference in intelligence between high and low groups; that is, the lower the average intelligence of the group, the more inhibitive may become the effect of the negative example upon test performance. In other words, the evidence offers nothing to substantiate Wood's claims for the instructional advantage of the negative example to those of lesser intelligence.

TABLE IX. DIFFERENCES IN MEAN SCORES BETWEEN HIGH AND LOW IQ GROUPS, BASED UPON THE COMBINED TOTAL OF A, B, C, AND D SCORES. CRITICAL RATIOS.

                    BOYS               GIRLS
GROUPS              DIFF.    C.R.      DIFF.    C.R.
I(H) - II(L)        38.67    3.44      43.33    4.65
I(L) - II(H)        11.33   -1.56      22.66   -2.08
II(H) - III(L)      40.66    3.18       0.0     0.0
II(L) - III(H)      46.00   -3.21      60.00   -8.49
A similar comparison of high and low IQ groups in Groups II and III suggests that a reduction in the memory factor had a negligible effect upon the relative performance of high and low boys' groups; the performance of high and low girls' groups within Group III, however, was far superior to that of other sub-groups on the same intelligence level.

The singular performance of Group III girls as a whole lends a certain inconsistency to the general pattern which is unexplainable in terms of intelligence, arithmetic reasoning, or reading, insofar as can be determined. The possibility that behavior was actuated by certain motivational factors peculiar to one experimental setting is minimized by the fact that the subjects comprising this particular group represented three different schools. It may be that this was a select group in terms of an ability or abilities ignored by previous measurement.

5. Summary of Chapter III.

Following is a condensation and restatement of findings up to this point.

1. Average performance on the D Tests correlated highly with average performance on the combined A, B, C, and D tests.

2. Instructions attended by successive presentation performed their function more satisfactorily when negative teaching examples were excluded. For the presentation of both positive and negative examples the cumulative method was the more effective.

3. In general, dispersion or scatter of scores was augmented by the presence of the negative example in the teaching series.

4. Boys showed a tendency to benefit most from instruction by successive presentation of positive examples, girls from instruction by cumulative presentation of positive and negative examples.
Successive presentation favoured the boys and cumulative presentation the girls, though no decided sex differences were manifested.

5. Results suggest that group instruction utilizing the negative example had little effect upon the response of bright children, while adversely affecting that of normal children.

6. The conflicting evidence of Wood's findings and of those of the present study points to possible inherent differences which distinguished performance in each of the two settings, and suggests that care must be exercised in attempting to generalize from one to the other.

CHAPTER IV.

TEST RELIABILITY AND VALIDITY

1. Test Reliability.

Of paramount importance in test evaluation is the degree to which consistency of performance characterizes the two halves of a test or is maintained through several presentations of the test or its equivalent. The split-half technique being the only means available for an estimate of reliability in this case, the test was divided into two equal parts, each containing four items corresponding in difficulty to the four items in the other half, in accordance with the underlying assumptions governing this method. To this end it became necessary to reassemble the test items for each of the whole groups I, II, and III, though the same item-arrangement held for boys' and girls' groups within each. Reliabilities were obtained by correlating the two halves so formed, and then applying the Spearman-Brown formula. The resulting values are tabulated below.

TABLE X. CORRELATIONS BETWEEN TEST-HALVES, THEIR STANDARD ERRORS, AND RELIABILITY COEFFICIENTS FOR THE D TEST.

                    GROUP I          GROUP II         GROUP III
                    Boys    Girls    Boys    Girls    Boys    Girls
r (test-halves)     .62     .69      .79     .71      .84     .68
SE                  .09     .08      .06     .07      .04     .08
r11                 .77     .82      .88     .83      .91     .81

The fact that these values may be higher than might obtain from the use of equivalent forms constitutes no criticism of the Spearman-Brown formula, according to Jackson and Ferguson¹, but is simply attributable to "the process of splitting the test." In any case, insofar as an estimate is possible, indications point to a fairly high degree of reliability for the type of test under consideration. It is noteworthy that maximum reliability pertained to the two boys' groups subjected to the negative teaching example.

In line with the assumption that the greater the element of chance, the lower the reliability, Symonds² contends that response on the basis of the No-Yes choice tends to reduce test reliability. Since reliability is largely a function of variability, the effect upon the former of added opportunity for guesswork is obvious. But in spite of this claim, the relatively high value of the coefficients obtained is justification for concluding that the role of chance has received no undue emphasis in the present tests.

A possible explanation for present reliabilities being slightly lower than those of Dickinson's tests³ may focus upon the distractive influences resulting from exposure of the two projectors during the testing period, and from the frequent need for conducting the experiments outside the familiar classroom surroundings. Furthermore, had present reliabilities been based upon correlations between totals of A, B, C, and D scores for each test-half instead of upon correlations between D scores, the values would probably have been somewhat higher.

1. Jackson, R.B., and Ferguson, G.A., Studies on the Reliability of Tests, Bulletin No. 12 of the Department of Educational Research, University of Toronto, 1941, p. 11.
2. Symonds, P.M., "Factors Influencing Test Reliability", Journal of Educational Psychology, vol. XIX, 1928, p. 79.
3. Dickinson, A.E., op. cit., p. 47.

2. An Aspect of Test Validity: Correlations With Intelligence And Other Variables.

Closely allied to test reliability is the matter of validity. With what success do the present tests accomplish the segregation and measurement of generalizing ability? While the answer to any such question relating to mental tests is necessarily but an estimate of the facts, several means exist for deriving conclusions. These consist in computing correlations between the test concerned and some criterion, in studying the validity and intercorrelations of the test items themselves, in applying the index of reliability, or in using any of the other direct or indirect methods for estimating test validity. The first-named, which has found wide application, was the one employed in this study, supplemented later (Chap. V) by an investigation of item validity.

The availability of performance ratings on standard tests of intelligence, reading, and arithmetic reasoning made it desirable to compute correlations between each of these and the tests of generalizing ability to determine whether the latter are measuring abilities covered by the other tests or whether they are measuring something quite different.

At this point a brief description of the arithmetic reasoning test is in order. In this test* all items entail reading and memory, and a clear demand is placed upon relational and numerical ability.
Typical of problems on the arithmetic reasoning test are the following:

(a) Alice has filled 48 pages of her 64-page exercise book. How many pages of her exercise book are still blank?
(b) The discount at 5% on a bill was $20.00. How much was the bill before it was discounted?
(c) A box which has a volume of 24 cubic feet is 4 feet long, 3 feet wide. How deep is it?
(d) A newsboy made 1½ cents on each paper he sold. This was 60% of the cost. What was the selling price of each paper?
(e) A man rows down stream 6 miles in 2 hours and, returning against the current, takes 6 hours. Find his rate of rowing and the rate at which the stream flows.

In each of these problems the role of memory is seen in the recall of certain fundamental rules related to areas, volumes, percentages, subtraction, multiplication, division, and so on. Accuracy in dealing with numbers is also a factor in reaching a solution. Possession of these two factors, memory and accuracy, seems sufficient to produce the desired result at the Grade VI level in the case of easier problems, such as (a), (b) and (c), which merely require a mechanical application of some simple arithmetic rule. Where more difficult problems are concerned, of which problem (e) is an example, there must be added some relational or integrative process tentatively acknowledged as arithmetic reasoning. It appears, therefore, that some of the items on the arithmetic reasoning test draw upon reasoning ability.

* Vancouver Tests: Reasoning in Arithmetic, Form A.

Correlations between each of these three tests and those of generalizing ability are assembled in Table XI, together with correlations between intelligence and each of reading and arithmetic reasoning. Considering the degree of error involved, the nature of the test material, and the size and selectivity of the groups, the results suggest a relationship between D Test scores and performance on the Otis Test. The correlations of the D Test with each of the remaining variables were somewhat lower, for the most part. Intelligence seemed most closely associated with reading ability and least with generalizing ability.

TABLE XI. ZERO-ORDER CORRELATIONS OF D TEST WITH INTELLIGENCE, READING, AND ARITHMETIC REASONING, AND OF INTELLIGENCE WITH READING AND ARITHMETIC REASONING. STANDARD ERRORS.

                    GROUP I                 GROUP II                GROUP III
                Boys       Girls        Boys       Girls        Boys       Girls
                r    SE    r    SE      r    SE    r    SE      r    SE    r    SE
D & I.Q.       .30  .14   .45  .12     .36  .13   .37  .13     .41  .12   .37  .13
D & R.         .26  .14   .34  .13     .23  .14   .26  .14     .44  .12   .30  .14
D & A.R.       .32  .13   .51  .11     .27  .14   .19  .14     .25  .14   .13  .14
I.Q. & R.      .66  .08   .66  .08     .64  .09   .70  .08     .71  .07   .74  .07
I.Q. & A.R.    .30  .14   .69  .08     .64  .09   .56  .10     .44  .12   .55  .10

By correcting for attenuation and so cancelling the distortive effect of chance errors in correlated tests, the coefficients appear as in Table XII.

TABLE XII. CORRELATIONS OF TABLE XI CORRECTED FOR ATTENUATION*

                 GROUP I          GROUP II         GROUP III
               Boys   Girls     Boys   Girls     Boys   Girls
D & I.Q.       .35    .51       .40    .42       .45    .43
D & R.         .31    .40       .86    .30       .49    .35
D & A.R.       .39    .59       .30    .22       .28    .15

Results show that D Test performance, under the influence of changes in methods of instruction, exhibited a practically constant relationship with intelligence. This fact becomes even more apparent upon combining and averaging values for boys' and girls' groups within each of the three major groups.
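The standard errors attached to the coefficients of Table XI, and the kind of correction applied in Table XII, may be illustrated by a brief sketch. It assumes the classical large-sample formula SE_r = (1 - r²)/√N and the usual Spearman correction for attenuation, r_xy/√(r_xx · r_yy); neither formula is stated explicitly in the text. N = 45 is taken as the size of each boys' or girls' group. The reliability of .90 is that reported for the reading and arithmetic reasoning tests, whereas the value of .80 assigned to the D Test is purely hypothetical. The function names are illustrative only.

```python
import math


def standard_error_of_r(r, n):
    """Large-sample standard error of a correlation: (1 - r^2) / sqrt(N).

    Assumed formula; the thesis does not state which one was used.
    """
    return (1.0 - r * r) / math.sqrt(n)


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Spearman's correction: divide r by the root of the two reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


N = 45                       # subjects in each boys' or girls' group
HYPOTHETICAL_D_REL = 0.80    # hypothetical D Test reliability, illustration only
READING_REL = 0.90           # reliability reported for the reading test

# Standard error for a raw coefficient of .30 (as for D & I.Q., Group I Boys).
print(round(standard_error_of_r(0.30, N), 2))

# A raw coefficient of .26 corrected under the assumed reliabilities.
print(round(correct_for_attenuation(0.26, HYPOTHETICAL_D_REL, READING_REL), 2))
```

Because the correction divides by a quantity smaller than one, a corrected coefficient can never be lower in absolute value than the raw coefficient from which it is derived.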
All in all, interrelations with reading ability and arithmetic reasoning may be similarly described, although there is some indication that the correlations of D scores with arithmetic reasoning were lower in Groups II and III than in Group I. If, therefore, the arithmetic reasoning test be accepted as an adequate means for measuring generalizing ability, it appears that the introduction of the negative example impairs the validity of the D Test as a measure of this ability.

* Reliability of Reading and Arithmetic Reasoning Tests was .90.

From Table XI it would appear that, for Grade VI children, the reading factor in the Otis Test is an important determinant behind intelligence ranking. Reading may therefore supply the reason for the low intercorrelations of D Test scores with intelligence, for language forms an integral part of each of the 75 items on the Otis Test but is confined to the preliminary instructions in the tests of generalizing ability. This explanation applies also to the low relationship between the D and reading tests. On the other hand, the fact that the correlations between these two tests were positive might indicate the presence of a common reasoning factor in each. Or, equally probable, the existence of a verbal factor common to both tests may account for the positive correlations, particularly since comprehension of the verbal instructions at the outset was prerequisite to a successful manipulation of the concepts in the generalizing tests.

A full treatment of this aspect of test validity should explore the possibility of a relationship with estimated classroom performance. D Test scores were graded according to the system used in rating school achievement.
Those subjects among the best five percent received a grade of A, the next ten percent a grade of B, and so forth. By applying the chi-square test, positive evidence of a relationship between school achievement and D Test performance was established and revealed to be largely, though not entirely, independent of chance factors. These data, together with their expression in terms of the contingency coefficient (C), are contained in Table XIII. Quite a high degree of association is indicated by the coefficients, but these values must be accepted with certain reservations. Firstly, school achievement rating, instead of being wholly objective in nature, is in part the product of personal judgment. And secondly, since an A grade at one school might carry only B credit at another, and since each experimental group included subjects drawn from a number of schools, it can afford but a rough measure of a subject's standing within that group. Therefore, while the facts support the probability of a positive relationship, its actual extent is problematical. Quite apart from other considerations, a set of low correlations would not have been surprising in view of the large number of abilities governing school work.

TABLE XIII. THE RELATIONSHIP OF D TEST PERFORMANCE AND SCHOOL ACHIEVEMENT IN TERMS OF PROBABILITY (P), CONTINGENCY COEFFICIENTS (C) AND THEIR STANDARD ERRORS (APPROXIMATE).

             GROUP I             GROUP II            GROUP III
           Boys     Girls      Boys     Girls      Boys     Girls
P         .1190    .3611      .1685    .3594      .3256    .1314
C          .61      .69        .62      .65        .64      .71
SE C       .15      .15        .15      .15        .15      .15
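The relation between the chi-square statistic and the contingency coefficient reported in Table XIII may be sketched as follows. The sketch assumes Pearson's definition, C = √(χ² / (N + χ²)), which the text does not spell out, and the cross-tabulation used is hypothetical; the actual grade-by-grade frequencies are not reproduced in this chapter. The function names are illustrative.

```python
import math


def chi_square(table):
    """Return (chi-square, N) for a table of observed frequencies.

    Expected frequencies are row total * column total / N.
    Every row and column total must be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (N + chi2))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (n + chi2))


# Hypothetical cross-tabulation: rows are D Test grades (high, low),
# columns are school achievement grades (high, low).
hypothetical_table = [[10, 5],
                      [5, 10]]

chi2, n = chi_square(hypothetical_table)
print(round(chi2, 2), n, round(contingency_coefficient(hypothetical_table), 2))
```

Since the upper limit of C depends upon the number of categories in the table, coefficients obtained from tables of different dimensions are not directly comparable.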
A comparative study of performance on the D Test and on various criteria has thus demonstrated that the relationship between scores on the first-named and those on each of the other tests exhibits a certain semblance of consistency, and in so doing yields some proof of the validity of the tests of generalizing ability. While the results for validity are not entirely conclusive, particularly as regards the form in which the tests were administered to Groups II and III, much of the evidence implies that the present tests were measuring certain qualities beyond the range of the other tests considered.

3. Summary of Chapter IV.

1. Indications suggest that measurement of group performance by these tests is attended by a fairly high degree of reliability.
2. In general, D Test performance exhibited a positive, though not significant, relationship with intelligence (as measured by the Otis Test). On the other hand, intelligence seemed to have more in common with reading ability and arithmetic reasoning than with performance on the D Tests.
3. Correlations of D Test performance with reading ability and arithmetic reasoning were generally positive but low. D Test performance displayed most in common with scholastic achievement, although no accurate measurement of this relationship was possible.

CHAPTER V. ITEM VALIDITY AND ANALYSIS

Transferring from validation techniques of the type used in the foregoing chapter to an application of "indirect" methods which restrict analysis to details within the test, the next point of consideration is that of item validity, for a test is no more valid than the items which comprise it.
In this study test items must not be confused with test instances; the term "test item" is herein used to designate the whole battery of test instances, positive and negative, of a given concept.

1. Item-Difficulty.

Thurstone¹, summarizing the results of several experiments with Grade VI children, claims that tests embodying items with a difficulty range extending from approximately 30 percent to 70 percent successes and averaging about 50 percent successes probably carry a higher validity value than tests whose ranges of item-difficulty vary from 80 to 100 percent successes. To obtain an over-all picture of the difficulty-order held by items in our study, rank order of difficulty was determined for the total of A, B, C, and D scores within each item rather than for the D score alone. This step was deemed desirable, since the interrelation of D and total scores was known only for the test as a whole, and not for the individual items.

1. Thurstone, T.G., "The Difficulty of a Test and Its Diagnostic Value", Journal of Educational Psychology, vol. XXXII, 1932, pp. 341-2.

TABLE XIV. ORDER OF ITEM-DIFFICULTY IN TERMS OF PERCENTAGES OF MAXIMUM POSSIBLE SCORES AND THE NUMBER OF FAILURES (F)*.

            GROUP I                      GROUP II                     GROUP III
      Boys          Girls          Boys          Girls          Boys          Girls
  Item   %   F   Item   %   F   Item   %   F   Item   %   F   Item   %   F   Item   %   F
  Zum   84   0   Zum   82   0   Zum   74   4   Zum   76   2   Zum   77   4   Zum   88   0
  Vec   81   0   Vec   73   2   Vec   68  10   Vec   66  12   Vec   69   8   Vec   80   1
  Mef   68   4   Mef   71   3   Zif   63  11   Wez   64   2   Wez   68   3   Zif   73   6
  Zif   63   7   Mib   62   5   Wez   62   5   Zif   61  12   Mef   67   8   Mef   72   2
  Wez   62   2   Zif   61  13   Mef   61  16   Mef   59  14   Zif   66  16   Wez   70   3
  Mib   59   7   Wez   60   3   Mib   56  11   Mib   58  13   Pog   58   7   Pog   62   3
  Tov   57   6   Tov   60   4   Tov   53  14   Pog   56   6   Mib   56   8   Mib   60   7
  Pog   56   5   Pog   55   4   Pog   53  11   Tov   54   7   Tov   52  15   Tov   56   6

* Failure: a score below 50 percent of the possible score.
Table XIV lists these items in order of difficulty, least to greatest, together with the percentage of maximum possible score and the number of subjects receiving less than a 50 percent score on each. Errors ranged from 12 to 48 percent of the possible score, depending upon the item and the conditions under which it was presented. Varying the method of instruction did not materially change the rank-order; Zum and Vec retained their positions throughout as the easiest items, while Mib, Pog, and Tov for the most part were the hardest of the series. There was no evidence to show that a given item was learned more effectively by one method than by another, although in no case did the highest average score on an item occur under positive-negative successive presentation.

Analysis of similar data for sub-groups (see Appendix II, Tables C and D) reveals little beyond the fact that there occurred among the high groups a greater spread between average scores on the easiest and most difficult items. Then, too, the closer similarity between orders of item-difficulty among high groups is suggestive of the more predictable manner in which members of these groups may have attacked the problems.

2. Validity And the Diagnostic Value of An Item.

In order to determine the validity of items composing a test, Kelley¹ recommends selecting two groups made up of the 27 percent of the subjects who received the highest scores on the test and the 27 percent who received the lowest scores. For purposes of the present study upper and lower groups were selected from the boys in Groups I and III on the basis of combined A, B, C, and D scores.
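The selection of upper and lower groups, and the comparison of their item scores by a critical ratio, may be sketched as follows. The sketch assumes that the critical ratio is the difference between the two group means divided by the standard error of that difference for independent groups, and that the population form of the standard deviation is used; the text does not specify either point. All scores shown are hypothetical, and the function names are illustrative.

```python
import math
from statistics import mean, pstdev


def upper_lower_groups(total_scores, fraction=0.27):
    """Return index lists for the top and bottom `fraction` of subjects."""
    k = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    return order[:k], order[-k:]


def critical_ratio(upper_item_scores, lower_item_scores):
    """Difference in means over the standard error of the difference.

    Assumes independent groups; at least one group must show some spread.
    """
    se_upper = pstdev(upper_item_scores) / math.sqrt(len(upper_item_scores))
    se_lower = pstdev(lower_item_scores) / math.sqrt(len(lower_item_scores))
    se_difference = math.sqrt(se_upper ** 2 + se_lower ** 2)
    return (mean(upper_item_scores) - mean(lower_item_scores)) / se_difference


# Hypothetical total scores for a group of 45 subjects.
hypothetical_totals = list(range(45))
upper, lower = upper_lower_groups(hypothetical_totals)
print(len(upper), len(lower))

# Hypothetical scores on one item for the two groups.
print(round(critical_ratio([8, 9, 10, 9], [4, 5, 6, 5]), 2))
```

With 45 subjects the 27 percent rule yields twelve subjects in each extreme group, so that critical ratios based upon them rest upon quite small samples.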
The results of this analysis (Table XV) suggest in general greater discriminatory properties for those items listed in Table XIV which are more remote from the 50 percent difficulty level than for items such as Mib, Tov, and Pog which closely approach this level. Rank order correlations between order of difficulty and diagnostic value were .74 and .79 for Groups I and III, respectively, indicating that, within the range of difficulty of the items in this study, the easier the item, the greater is likely to be its discriminatory value.

1. Kelley, T.L., cited in Long, J.A. and Sandiford, P., "The Validation of Test Items", Bulletin No. 3 of the Department of Educational Research, University of Toronto, 1935, p. 94.

TABLE XV. SUPERIORITY OF UPPER OVER LOWER GROUPS IN PERFORMANCE ON TEST ITEMS, EXPRESSED IN TERMS OF CRITICAL RATIOS OF THE DIFFERENCES IN MEAN SCORE AND IN VARIABILITY.

     GROUP I BOYS             GROUP III BOYS
  C.R. (M)   C.R. (σ)      C.R. (M)   C.R. (σ)
    4.30       .47           6.77       .39
    8.85      2.62           5.86      1.17
    1.00      1.95           3.10       .98
    5.48       .78           7.88      2.65
    1.21       .09           2.47      2.21
    1.40      2.24           4.65      1.41
    5.20      2.87           7.05      1.15
    6.67      1.98           7.83      1.52

It may therefore be that the optimum difficulty-level approximates 75 percent successes for the material of these experiments. This does not necessarily imply a contradiction of Thurstone's results, for the present tests are not unlike achievement tests of the true-false type in which the medium difficulty level approaches 75 percent of the possible score. Nevertheless, it would be fallacious to presume the general application of present findings without calling attention to their limitations as defined by the size and selectivity of the groups involved.
For, as Long and Sandiford caution, "...the validity values obtained from data gathered on a particular group of subjects are not highly reliable indications of their validities for another and widely different group."¹ In consequence, evidence in Table XV of the greater validity of items listed under Group III may be more apparent than real.

1. Long, J.A. and Sandiford, P., op. cit., p. 107.

TABLE XVI. PERCENTAGE ACCURACY OF POSITIVE AND NEGATIVE COMPONENT SCORES

            GROUP I                GROUP II               GROUP III
         Boys     Girls         Boys     Girls         Boys     Girls
         +    -    +    -       +    -    +    -       +    -    +    -
Mef     73   64   68   75      57   63   54   62      62   70   64   77
Vec     75   88   60   86      57   78   60   73      66   72   76   82
Zum     72   92   60   95      62   83   63   83      68   84   79   93
Tov     33   76   34   78      21   74   21   76      21   73   27   74
Pog     28   82   25   85      26   80   26   86      31   85   32   94
Zif     71   49   57   67      63   63   60   62      69   62   74   70

3. Items Analyzed.

Associated with this whole conception of test validity is the need for an investigation of the various factors which combine to make an item easy or difficult. Examination of Table XVI, which translates averaged "no" and "yes" scores for six of the eight test items into percentages based upon accuracy of response to positive and negative test instances, reveals a preponderant tendency to respond more accurately to negative than to positive test instances. This behavior characterized all groups and was especially magnified in the case of the more difficult items. In Mef and Zif only was there any evidence of an equal or greater percentage of "yes" scores; the reason for this lies not so much in the fact that the positive instances of Mef and Zif were more easily identified than were those of other concepts but rather in the fact that relatively fewer negative instances were recognized.
In all other items, however, positive instances offered greater difficulty.

Group reaction to specific positive instances in a number of items merits some attention at this point. Since space does not permit a complete study of the material at hand, consideration will be restricted to several outstanding features of the more difficult items as applied to boys in all three groups.

TABLE XVII. NUMBER OF CORRECT RESPONSES TO INDIVIDUAL POSITIVE INSTANCES OF POG, MIB, AND WEZ ON EACH OF TESTS A, B, C, AND D

                      GROUP I               GROUP II              GROUP III
POG   Instance:    1   2   4   7   8     1   2   4   7   8     1   2   4   7   8
      A           32   2   2   4   4    30   3   2   3   2    41   2   2   0   3
      B           19   4   9  24   4    22   9   6  12   7    31   7   4   7   4
      C           17  17  32  12   3    21   5   7  20   6    32   5  17  27   8
      D           19  15  15  12  12    23   7  12  21   8    32   8  12  21  10

MIB   Instance:    1   4   6   9         1   4   6   9         1   4   6   9
      A           39   6   1   3        23   4   1   4        20   3   2   5
      B           30   6   3   6        23  12   9   7        19   7  10  10
      C           19  16   2  19        31   8   1   6        40   9   2   5
      D           22  16 [?]  13        25  10   5   9        35  11   3  10

WEZ   Instance:    1   3   6   8   9     1   3   6   8   9     1   3   6   8   9
      A           33  10  38  25  30    38   7  39  26  25    44   5  41  25  28
      B           19  35  19  15  16    30  12  28  21  21    34  10  32  20  25
      C           36  21  34  27  28    26  37  25  19  22    34  42  29  22  24
      D           21  33  22  19  21    28  22  25  21  22    35  31  29  19  21

Table XVII lists the number of correct responses to positive instances of Pog, Mib, and Wez on each of tests A, B, C, and D. Turning first to Pog, results reveal that recognition was confined almost solely to instance 1 in Test A but dropped somewhat and spread to other instances upon succeeding presentations. This trend may be explained by the close similarity of the first teaching example and test instance 1. Both are unique in displaying a horizontal line and a short arc, the only difference being in the relative position of the two elements.
In all groups, a reduction in response to instance 1 followed presentation of the second teaching example and was accompanied by a more accurate recognition of instance 7, particularly when assisted by the second positive teaching example. Here again similarity between teaching and test instances is in the form of unequal straight lines, thereby suggesting that a few individuals may have concentrated equally upon elements and relations, or even upon elements alone. The comparatively poor response to instances 2 and 4 (elements intersecting) until relieved by the third positive teaching example shows that "intersection" was a source of distraction to some.

In the case of Mib test instance 1 was more readily identified than were the other positive instances. This reaction was most pronounced following presentation of the first and second positive teaching examples, both of which display a circle touching on the outside of a square, in common with the test instance. The third positive example (circle within a square) lowered response to instance 1, but favored instances 4 and 9 in which one element is enclosed within the other. It appears, therefore, that the presence of this extra relationship of "insidedness" and "outsidedness" in some cases had a share in monopolizing attention and so delimiting perception of the relationship defining Mib. Undoubtedly test instance 6, the most radical of the series in its departure from the teaching examples, was also the most difficult. But failure to identify this instance suggests that perhaps response was also to the figure or pattern as a whole, and that accentuation of the relation "larger than, smaller than" in such a case constituted a distortion.

In Wez are found some of the properties described for Mib.
As before, those instances bearing closest resemblance to the immediate teaching example in point of relation, size, and shape elicited the greatest response. In Test A those positive instances which, like the teaching example, contain a circle on the outside of a triangle, were readily recognized, but not so with instance 3, in which the circle is inside the triangle. Difficulty with the latter was greatly alleviated by the introduction of the second positive example displaying a circle inside the triangle. The dominance of this relationship of "insidedness" and "outsidedness" is reemphasized by the fact that initial recognition of instance 3 was usually extended to include negative instance 5 (circle inside, but not touching the triangle).

Much of what has been said of Mib and Wez finds repetition in the results for Tov. With Zif, however, there is no question of "insidedness" or "outsidedness", and all positive instances evoke similar response. But one of the negative test instances, namely instance 10 (circle touching both sides and an end of the rectangle), was of unparalleled difficulty. In all groups there was a better-than-chance tendency to regard this instance as a Zif. Inability of the second negative teaching example (circle touching one side and an end of the rectangle) to shift the response shows that the necessary relationship was only partially perceived.
In the light of the teaching examples used, the validity of this particular test instance is questionable, for the former make it clear that a circle must touch both sides of a rectangle but they fail to specify that these must be the only two points of contact.* When it is remembered that instance 10 represents one quarter of the total number of negative test instances for Zif, the discrepancies in Table XVI are more easily understood.

* The fact that deduction of the rule governing Zif is impossible from positive examples alone constitutes an argument in favor of the use of negative examples.

Analysis also discloses that the difficulty experienced by many boys in Group I in identifying negative instances of Mef sprang from their loosely defining Mef as "a circle, partly black, partly white", and rejecting only those three instances (4, 7, 10) in which these qualities were absent. But with the advent of the first negative instance (circle, partly black, partly white) in Groups II and III, this hypothesis suffered a set-back.

An exhaustive analysis calls for a study of reaction to all test instances, both positive and negative. But the foregoing, coupled with a further inspection of Table XVII, yields the following tentative conclusions:

A. Negative instances are more easily identified than positive instances.
B. Item-difficulty is largely an inverse function of the similarity between teaching and test instances. This similarity effect is most apparent when the particular teaching and test instances are in juxtaposition, but diminishes somewhat upon interference by succeeding teaching examples.
C. The presence of relations incidental to the concept impedes solution. Items in which the definition is fulfilled irrespective of whether or not one of the elements involved is enclosed within the other are more difficult than items in which these added relations are an integral part of the necessary or defining relation. This special tendency may derive from the use of Dax as a demonstrative example.
D. Opportunity for hypothesis appears as a factor in item-difficulty. This conclusion supports Tyler's contention.¹
E. The value of positive and negative teaching examples varies with different test instances within a given item. A particular example may assist one subject but not another on a given test instance, or it may help a subject with one test instance but hinder him with another.

1. Tyler, F. T., op. cit.

If little else, these results clearly depict how serious would be the consequences of a rearrangement or reconstruction of the teaching and test instances upon item-validity values. At the same time, they provide clues toward the improvement of validity in a number of cases.

4. Another Aspect of Validity.

It has been pointed out that the determination of whether a test is fulfilling the purpose for which it was constructed must consider not merely the validity of the test items, but the degree of correlation between those items. Thus, other conditions being satisfied, the lower the correlations, the higher the validity of the test as a whole¹.

To supplement the material of the preceding sections, Table XVIII lists intercorrelations of successively presented items for boys in Groups I and III. From the standpoint of validity it is apparent that the values for Group I most nearly fit the demands for a low intercorrelation of test items.
A careful study of the coefficients found in this latter group discloses several interesting facts:

1. Long, J.A., and Sandiford, P., op. cit., p. 119.

TABLE XVIII. INTERCORRELATION OF COMBINED A, B, C, AND D SCORES ON TEST ITEMS.

               GROUP I BOYS       GROUP III BOYS
                r     SE r          r     SE r
               .39    .13          .45    .12
               .16    .14          .34    .13
              -.04    .15          .50    .11
ZUM-TOV       -.26    .14          .30    .14
TOV-POG       -.08    .15          .28    .14
               .11    .15          .52    .11
               .55    .10          .67    .08
               .55    .10          .43    .12
               .14    .15          .09    .15

A virtual or significant relationship existed between performances on easier items, but there was little or no connection between performances on the more difficult items. Also, scores on easy and difficult items were practically unrelated. One possible explanation for these low correlations is that scores on the difficult items may have been more dependent upon chance factors in contrast to scores on easy items. Again, the influence of transfer cannot be entirely ignored. Ruger¹ in his experiment with mechanical puzzles was well aware of the significant effect of order of presentation upon problem-solving. He observed that some subjects were apt to generalize from one item to another as regards some detail of similarity which actually had no bearing upon its solution. Such behavior might explain the negative intercorrelation of Zum and Tov in Group I. Because of the large number of perfect scores in Zum, there was probably a distinct tendency to carry over the perceived relationship of "insidedness" and "outsidedness" to an attempted solution of Tov. Similarly, transfer effects, either positive or negative, may partly account for the presence or absence of interrelationships among the other test items.

1. Ruger, H.A., "The Psychology of Efficiency", Teachers College Educational Reprints, No. 5, 1926 (a reprint of Archives of Psychology, No. 15, 1910), pp. 29-30.

5. Summary of Chapter V.

1. In the case of whole groups (N = 45) the test items were all of less than 50% difficulty, with errors ranging from 12 percent to 48 percent of the possible score.
2. Order of item difficulty correlated highly with diagnostic value, the most difficult items possessing a lower diagnostic value than the easier items.
3. Positive test instances presented greater difficulty than negative instances.
4. Other factors upon which item difficulty appeared dependent were
(a) the extent of similarity between teaching and test instances.
(b) the presence of extraneous relations within the stimulus pattern.
(c) the number of likely methods of solution which suggested themselves.
5. The intercorrelations of test items in Group I Boys were inclined to be low and negligible; those in Group III Boys were considerably higher.
6. In Group I easier items were significantly interrelated, while difficult items correlated low both with one another and with easier items.

CHAPTER VI. GROUP REACTION TO SUCCESSIVE TEST PRESENTATIONS

Up to this point attention has been given over almost exclusively to a study of general test performance and the overall effectiveness of the several instructional methods. To estimate more fully the efficacy of the negative teaching example, the preceding study must be succeeded by a detailed analysis of step-by-step performance and an inquiry into the relative merits of the different modes of presentation as they affect perseveration and the formulation and rejection of hypotheses. Was progress toward solution gradual or rapid? How closely did progress on one item parallel that on another?
With what degree of consistency did individuals respond to instruction by negative examples? This chapter will be devoted to answering these and other similar questions.

1. Improved and Unimproved Scores.

One approach to a study of the development of group reaction to repeated presentations of test stimuli is through a numerical consideration of improved and unimproved scores within each group. Unimproved scores, assembled in Table XIX, are subdivided three ways to include instances of fluctuating unimproved scores, reversals of judgment, and perseveration, each defined as in the table.

TABLE XIX. NUMBER OF UNIMPROVED SCORES OUT OF A POSSIBLE 120 FOR EACH SUB-GROUP, CLASSIFIED IN TERMS OF DECLINATION¹, REVERSAL OF JUDGMENT² AND PERSEVERATION³

                                                                            TOTAL PERCENTAGE
                             DECLINATION  REV. OF JUDG.  PERSEVERATION  TOTAL  OF UNIMPROVED SCORES
GROUP I     Boys    H             27            2              18
                    M             43            3               8
                    L             44            1               5
                    Total        114            6              31        151          42
            Girls   H             28            3              15
                    M             40            2               5
                    L             42            1              14
                    Total        110            6              34        150          42
GROUP II    Boys    H             28            3               9
                    M             39            3               6
                    L             43            2               7
                    Total        110            8              22        140          39
            Girls   H             31            6               8
                    M             33            5               8
                    L             44            8               5
                    Total        108           19              21        148          41
GROUP III   Boys    H             30            3               8
                    M             38            3               7
                    L             32            6               2
                    Total        100           12              17        129          36
            Girls   H             21            1              19
                    M             28            5              13
                    L             32            4               6
                    Total         81           10              38        129          36

1. includes all fluctuating scores which fail to improve beyond the A score.
2. a perfect A score terminating in an imperfect D score.
3. an unchanging score which is not a perfect score.

Of a total of 360 items, 36 to 42 percent displayed a lack of improvement in attempts beyond the initial or A Test, though in 96 percent of all such cases A-scores were of 50% grade or better. Results were inconclusive in ascribing greater progress to one method than to another, though there is some evidence that the absence of memory hastened improvement among low IQ groups. According to further observation, reversals of judgment were relatively infrequent.
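The three-way subdivision of unimproved scores may be made concrete by a short sketch. The rules below are one possible reading of the definitions accompanying Table XIX, and the order in which they are applied is an assumption: in particular, "fail to improve beyond the A score" is taken to mean that the D score does not exceed the A score. The function names and the sample score sequences are hypothetical; only the totals passed to the percentage function are drawn from Table XIX.

```python
def classify(scores, perfect):
    """Classify one subject's A, B, C, D scores on a single item.

    One possible reading of the Table XIX definitions (assumed, not
    confirmed by the text); `perfect` is the maximum possible score.
    """
    a_score, d_score = scores[0], scores[-1]
    if all(s == perfect for s in scores):
        return "perfect throughout"
    if len(set(scores)) == 1:
        return "perseveration"            # unchanging, imperfect score
    if a_score == perfect and d_score < perfect:
        return "reversal of judgment"     # perfect A, imperfect D
    if d_score <= a_score:
        return "declination"              # fluctuates without net gain
    return "improved"


def percent_unimproved(declination, reversal, perseveration, possible=360):
    """Unimproved scores as a whole-number percentage of all item scores."""
    return round(100 * (declination + reversal + perseveration) / possible)


# Hypothetical score sequences on an item with a maximum score of 10.
for sequence in [(6, 6, 6, 6), (10, 8, 9, 7), (6, 8, 5, 6), (5, 6, 7, 8)]:
    print(sequence, classify(sequence, perfect=10))

# Totals for Group I Boys as given in Table XIX.
print(percent_unimproved(114, 6, 31))
```

Applied to the whole-group totals of Table XIX, the percentage function returns the figures shown in the last column of that table.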
Perseveration occurred in a greater number of cases and offered limited support for the theory that the negative teaching example tends to interfere with mental inertia. However, because of the restricted definition of perseveration, imposed by the very nature of the experiment, any further conclusions would be misleading. For example, the indication that the high groups were equally or more often subject to mental inertia than other groups quite overlooks the fact that perseverative response by the former normally occurred on a higher score level, the only obstacle to a perfect score sometimes being a probable perceptual oversight or a flaw in the test itself. This fact leads to an inescapable admission of the manifold difficulties besetting an objective analysis of this type of behavior.

Isolation of perseveration in all its aspects entails a recognition of such possible forms of reaction as unswerving response to detail or to the whole pattern, concentration fixed upon similarities or upon differences, and unyielding emphasis upon elements rather than upon relations. Yet it is very doubtful if more than a minority of such cases could be covered by an account which regards unaltered responses to individual test instances as the sole outward manifestations of perseveration.

2. Perfect Scores.

Group progress may be further analyzed by a consideration of perfect scores. Reference has already been made to instances embodying reversals of judgment wherein a perfect A score is coupled with a subsequent decline in achievement. Table XX furnishes additional data and provides for more extensive conclusions. These may be stated in brief:

1. Of all scores displaying improvement beyond the initial or A score, 18 to 28 percent were solutions in the sense in which this term applies to perfect D scores.
2. Under successive presentation, the negative teaching example appeared to have a neutral, if not reductive, effect upon the number of solutions achieved.
3. Both boys and girls solved more concepts under cumulative than under successive presentation involving the negative example.
4. High groups were credited with more than 40 percent of all solutions and gave evidence of a better-than-chance tendency that a perfect score occurring in Test A would maintain itself throughout all tests.**

TABLE XX. NUMBER OF ITEMS SOLVED* BY SUB-GROUPS TOGETHER WITH THE TOTAL NUMBER OF SOLUTIONS OUT OF A POSSIBLE 360 FOR WHOLE GROUPS

                     HIGH   MEDIUM   LOW   TOTAL
GROUP I     Boys      24      16      17     57
            Girls     24      14       6     44
GROUP II    Boys      21      12       8     41
            Girls     22      11       6     39
GROUP III   Boys      29      20       8     57
            Girls     30      18      17     65

* perfect D score.
** Appendix II, Table E.

3. Mean Scores.

Probably the most adequate method for estimating the extent of group advancement beyond the initial test is by a comparison of average A, B, C, and D scores. The "learning" curve described for each of the major groups (Table XXI) indicates that A scores were little bettered in succeeding tests, the gain nowhere exceeding ten percent of the A score.

TABLE XXI. GROUP PERFORMANCE ON TOTAL OF EACH OF TESTS A, B, C, AND D. MEAN SCORES, STANDARD DEVIATIONS, AND STANDARD ERRORS (STANDARD ERRORS IN PARENTHESES)

                      GROUP I                        GROUP II                       GROUP III
                 Boys          Girls            Boys          Girls            Boys          Girls
TEST A   A.M.   50.90 (.93)   50.19 (1.06)     49.21 (1.13)  51.34 (1.05)     51.17 (1.40)  53.48 (1.01)
         S.D.    6.24 (.66)    7.08 (.75)       7.60 (.80)    7.04 (.74)       9.40 (.99)    6.80 (.72)
TEST B   A.M.   53.12 (.88)   52.68 (1.15)     45.83 (1.77)  43.88 (1.79)     48.41 (1.83)  54.46 (1.24)
         S.D.    5.88 (.62)    7.68 (.81)      11.84 (1.25)  12.0  (1.26)     12.24 (1.29)   8.28 (.87)
TEST C   A.M.   53.30 (1.06)  53.12 (1.15)     53.74 (1.25)  55.43 (1.31)     53.83 (1.63)  58.90 (1.58)
         S.D.    7.12 (.75)    7.68 (.81)       8.40 (.89)    8.80 (.93)      10.90 (1.15)  10.56 (1.11)
TEST D   A.M.   53.57 (1.06)  52.86 (1.25)     48.50 (2.01)  47.52 (1.98)     51.43 (2.15)  57.12 (1.41)
         S.D.    7.08 (.75)    8.40 (.89)      13.48 (1.42)  13.24 (1.40)     14.40 (1.52)   9.48 (1.00)

In general outline, developmental reaction assumed some of the characteristics displayed by Dickinson's groups.(1) For example, Group I demonstrated most gain from Test A to B, with improvement thereafter becoming almost imperceptible. In Groups II and III performance was marked by decided fluctuations. Immediately following presentation of the first negative example there occurred a drop in response, attended by an increase in variability. In Test C scores rose sharply and displayed greater uniformity, only to suffer a relapse in Test D. Of these groups only girls in Group III failed to conform to this general pattern.

Group progress has been measured statistically by considering vertical differences between the mean scores of Table XXI. These differences, interpreted in terms of critical ratios (Table XXII), indicate a most "rapid" change in scores from Tests B to C for Groups II and III, the drop in scores thereafter displaying significance only within Group II. Another noteworthy fact is that this latter group in its progress from A to C actually surpassed by a small margin the progress achieved by Group I.

Measurement of progress of dull and bright sub-groups* calls for an amendment to earlier conclusions which stressed the unfavorable effect of the negative teaching example upon low-group achievement. Thus, a comparison of differences credits the system of positive-negative presentation with promoting the greatest gain from A to C among low groups.
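As a worked illustration of the critical ratios referred to above, the following minimal sketch recomputes one entry of Table XXII from the figures reported for Group I boys. The thesis does not state its formula at this point, so the sketch assumes the conventional critical ratio for the difference between two correlated means; under that assumption the tabled value of 2.20 for Tests A and B is reproduced. The function names are hypothetical, the group size of 45 is the one reported for each boys' or girls' group, and the correlation of .38 is the A and B intercorrelation reported in Table XXIII.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of a mean: S.D. divided by the square root of N."""
    return sd / math.sqrt(n)


def critical_ratio(mean_1, se_1, mean_2, se_2, r):
    """Critical ratio for the difference between two correlated means.

    Assumed formula (not stated in the thesis):
        C.R. = |M1 - M2| / sqrt(SE1^2 + SE2^2 - 2 * r * SE1 * SE2)
    """
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2 - 2.0 * r * se_1 * se_2)
    return abs(mean_1 - mean_2) / se_diff


# Group I boys, Table XXI: Test A mean 50.90 (S.D. 6.24), Test B mean 53.12 (S.D. 5.88).
N = 45                                   # pupils in the group
se_a = standard_error_of_mean(6.24, N)   # tabled as .93
se_b = standard_error_of_mean(5.88, N)   # tabled as .88

# Tabled standard errors and the A and B correlation (.38) from Table XXIII.
cr_ab = critical_ratio(50.90, 0.93, 53.12, 0.88, 0.38)

print(f"S.E. of mean, Test A: {se_a:.2f}")
print(f"S.E. of mean, Test B: {se_b:.2f}")
print(f"Critical ratio, A and B: {cr_ab:.2f}")
```

The same function applied to the other pairs of means in Table XXI gives values close to those in Table XXII, which suggests, without proving, that a correlated-means formula of this kind was used.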
Stated differently, there is some evidence that for the low groups positive teaching examples are generally more effective when immediately anteceded by a negative example than when given in continuous series with positive examples.

* Appendix II, Tables G to J.

TABLE XXII. CRITICAL RATIOS OF THE DIFFERENCES* BETWEEN MEAN TEST SCORES WITHIN EACH GROUP (BRACKETED LETTERS DESIGNATE TEST HAVING HIGHEST SCORE)

                  GROUP I                GROUP II               GROUP III
              Boys      Girls        Boys      Girls        Boys      Girls
A and B     2.20(B)    2.62(B)     2.24(A)    4.78(A)     1.66(A)    1.54(B)
B and C      .23(C)     .70(C)     4.88(C)    6.80(C)     3.76(C)    4.04(C)
C and D      .48(D)     .39(C)     3.03(C)    4.32(C)     1.63(C)    1.58(C)
A and C     2.82(C)    3.02(C)     4.19(C)    3.47(C)     2.31(C)    4.34(C)
A and D     3.18(D)    2.49(D)      .39(A)    2.11(A)      .13(D)    2.91(D)
B and D      .59(D)     .29(D)     2.59(D)    4.04(D)     2.96(D)    3.59(D)

* For differences in variability see Appendix II, Table F.

With the high groups improvement from A to C was more pronounced under the influence of positive successive presentation as against positive-negative successive presentation, suggesting that in this case positive examples may possibly have greatest value when immediately preceded by other positive examples rather than by negative examples. A precise measurement of these effects was difficult to obtain for the reason that the positive examples studied immediately prior to Test C in Groups I and II were not identical, the second teaching example in the former group serving as the third example in the latter group.

Finally, differences between A and D Test scores, though not statistically computed, were of sufficient size to suggest that low groups progressed most "rapidly" when faced with cumulative presentation; of the high groups the girls gave evidence of greatest progress from Tests A to D under cumulative presentation, while the boys achieved the greatest advance under positive successive presentation.

[Fig. 11. Group progress on individual test items.]

In order to comprehend more fully the nature of the "learning" gradient, graphical outlines of progress on individual test items were drawn for each Group (Fig. 11), revealing a number of opposing trends in performance. For instance, concerning boys in Group I it was found that the second positive teaching example was immediately followed by a sharp rise in Mef scores and an equally sharp decline in Wez scores, with a repetition of such performance succeeding presentation of the fourth positive example. The contrasting effects of different teaching examples were also reflected in Group I girls' scores for Zum and Wez. With the boys in Group II a sharp drop in Vec scores immediately accompanied presentation of the negative examples, in opposition to the imperceptible changes in corresponding Tov scores. The "learning" curves for Group II girls likewise indicate that some of the easier items were characterized by greater fluctuations in performance than were the most difficult items. Among Group III boys, a decline in achievement from Test A to D was associated with one of the easiest items, namely Vec, whereas considerable progress was made on one of the most difficult items, Pog. In the girls' group a similar difference in trend occurred in two items of almost equal difficulty.

This lack of parallelism between performance on test items prompts an inquiry into how closely total performance on one test was associated with that on another.
Table XXIII reveals a significant relationship between test scores, with correlations varying from low to high. Among the conclusions that may be drawn are the following:

1. Initial performance afforded but a rough estimate of final achievement.
2. Average individual reaction to the first and second negative teaching examples displayed a high degree of consistency.
3. Performance aided by positive-negative successive presentation appeared slightly more irregular and unpredictable than when assisted by either of the two remaining methods, as suggested by the lower correlations in Group II.

TABLE XXIII. INTERCORRELATIONS OF AVERAGE A, B, C, AND D TEST SCORES

                     A and B   B and C   C and D   A and C   A and D   B and D
GROUP I     Boys       .38       .68       .86       .64       .65       .71
            Girls      .63       .85       .85       .62       .58       .87
GROUP II    Boys       .53       .47       .52       .59       .42       .86
            Girls      .50       .43       .44       .52       .41       .89
GROUP III   Boys       .50       .66       .73       .72       .44       .88
            Girls      .53       .72       .72       .61       .51       .85

4. Positive and Negative Component Scores.

In the last chapter brief reference was made to the accuracy with which negative test instances of a concept were identified. A broader picture of group performance in this respect is afforded by Table XXIV, which converts average positive and negative component scores into percentages for high, medium, and low I.Q. groups. It expresses an unmistakable tendency by these groups to score higher on every test in their recognition of negative instances, the only two exceptions attaching to C and D scores of High Group I. Also observable is the disparity between high and low group accuracy in responding to positive instances and the closer similarity of their negative scores.
TABLE XXIV. PERCENTAGE ACCURACY OF POSITIVE (+) AND NEGATIVE (-) COMPONENT SCORES ACHIEVED BY WHOLE AND SUB-GROUPS ON TESTS A, B, C, AND D

                      GROUP I               GROUP II              GROUP III
                  Boys      Girls       Boys      Girls       Boys      Girls
                  +   -     +   -       +   -     +   -       +   -     +   -
TEST A   H       57  70    57  76      57  74    57  78      61  80    61  80
         M       48  72    41  80      43  74    46  81      48  81    46  81
         L       48  80    35  81      35  78    43  74      41  65    52  76
         ΣHML    50  74    43  80      46  76    48  78      50  76    52  80
TEST B   H       65  72    61? 76      57  72    54  78      57  78    65  83
         M       50  76    46  85      35  70    33  65      43  72    43  80
         L       48  81    37  81      39  65    26  65      37  67    50  80
         ΣHML    54  76    48  81      41  69    37  69      46  72    52  81
TEST C   H       74  69    65  78      67  76    63  80      70  76    76  80
         M       5?  74    46  83      54  81    57  81      57  81?   57  80
         L       52  76    41  80      37  81    54  76      52  67    65  81
         ΣHML    59  74    50  80      52  80    57  80      59  74    65  80
TEST D   H       72  69    63  80      67  74    59  80      63  78    74  83
         M       50  80    43  85      37  70    39  69      52  76    50  80
         L       50  80    35  83      39  69    37  69      46  67    59  80
         ΣHML    57  76    48  83      48  70    43  72      54  74    61  81

[? = entry partly illegible in the source copy]

Comparisons between high and low groups based on the above table show that in 19 out of 24 cases the difference between positive component scores was fifteen points or more, to the disadvantage of the low group; but in 17 out of 24 cases low group negative scores were greater than or no more than five points below high group negative scores. In other words, the low I.Q. groups fell far behind in the detection of positive instances, but demonstrated closer ability with the high groups in handling negative instances, especially in Group I where they actually surpassed the high groups in this respect. The medium groups, instead of retaining a position midway between these two extreme groups, behaved more nearly like the low group.
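The two counts quoted above (19 of 24 and 17 of 24) can be checked directly against the high and low rows of Table XXIV. The sketch below is only an illustration of the two criteria as worded in the text; the variable and function names are hypothetical, the figures are as read from the table, and one high-group entry (Group I girls, Test B, positive) is partly illegible in the source and is assumed here to be 61. That assumption does not affect either count.

```python
# Column order in every list: Group I boys, Group I girls,
# Group II boys, Group II girls, Group III boys, Group III girls.
# Values are percentages as read from the H and L rows of Table XXIV.
HIGH_POS = {"A": [57, 57, 57, 57, 61, 61], "B": [65, 61, 57, 54, 57, 65],  # 61 is assumed
            "C": [74, 65, 67, 63, 70, 76], "D": [72, 63, 67, 59, 63, 74]}
LOW_POS = {"A": [48, 35, 35, 43, 41, 52], "B": [48, 37, 39, 26, 37, 50],
           "C": [52, 41, 37, 54, 52, 65], "D": [50, 35, 39, 37, 46, 59]}
HIGH_NEG = {"A": [70, 76, 74, 78, 80, 80], "B": [72, 76, 72, 78, 78, 83],
            "C": [69, 78, 76, 80, 76, 80], "D": [69, 80, 74, 80, 78, 83]}
LOW_NEG = {"A": [80, 81, 78, 74, 65, 76], "B": [81, 81, 65, 65, 67, 80],
           "C": [76, 80, 81, 76, 67, 81], "D": [80, 83, 69, 69, 67, 80]}


def count_cases(high, low, criterion):
    """Count the test-by-subgroup cells (24 in all) that satisfy a criterion."""
    return sum(
        criterion(h, l)
        for test in high
        for h, l in zip(high[test], low[test])
    )


# Criterion 1: high group leads the low group by fifteen points or more (positive scores).
positive_gap_cases = count_cases(HIGH_POS, LOW_POS, lambda h, l: h - l >= 15)

# Criterion 2: low group is above, or no more than five points below, the high group
# (negative scores).
negative_close_cases = count_cases(HIGH_NEG, LOW_NEG, lambda h, l: l >= h - 5)

print(positive_gap_cases, "of 24 cases: positive gap of fifteen points or more")
print(negative_close_cases, "of 24 cases: low negative score within five points or higher")
```

Reading "fifteen points or more" and "no more than five points below" as inclusive limits is itself an assumption; with that reading the counts agree with those given in the text.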
This line of demarcation distinguishing the performance of high groups from that of medium and low groups suggests that it is not altogether impossible that there may be a point along the intelligence scale at which the ability to score reasonably high in the identification of positive instances suddenly makes itself felt. There is need for further experimentation of this nature, involving larger and more representative groups.

Several studies have examined the effect of positive and negative instances insofar as they concern the simple recognition of different materials. Achilles(1), experimenting with geometric forms, words, nonsense syllables, and such like, first presented to the subjects a number of items and then required that they respond to a recognition test comprised of these and numerous new or unfamiliar items. It was found that greater accuracy characterized response to the unfamiliar than to the familiar, but there is no record of the statistical reliability of this trend. The author concluded that "the new make a distinct impression and the subject responds with more certainty.....This strangeness or newness appears to be a positive thing."

1. Achilles, E.M., "Experimental Studies in Recall and Recognition", Archives of Psychology, vol. VI, Sept. 1920, pp.

A similar experiment by Seward(2) involved the presentation of a series of papers bearing different designs and colors, followed by an interpolated task, and then the recognition test. The correctness of the positive response was observed to be directly proportional, that of the negative response inversely proportional, to the degree of identity between the presentation and the test stimuli.
That is, identity between original and immediate stimuli gave rise to a more accurate positive response, while dissimilarity between the two favored the negative response. Tendency differences were regarded as being highly reliable for the select group used; correlations with intelligence were inconclusive.

The close congruency between these results and those issuing from the present study raises the possibility that the confronting problem, oversimplified by some, may have resolved itself into one of mere recognition. Born of a misunderstanding of the preliminary instructions, there may have developed a strong tendency to seek in a test facsimiles of the teaching examples, with an eye to exactness of size, shape, position, and number of elements. Because of the wide diversity of detail between most teaching and test stimuli, with the resultant emphasis upon dissimilarity rather than upon similarity, subjects may have been more readily able to identify negative than positive instances. This theory finds a measure of support in material of the preceding chapter.

2. Seward, G.H., "Recognition Time As A Measure of Confidence (An Experimental Study of Redintegration)", Archives of Psychology, Vol. XVI, 1928, pp. 1-54.

5. Summary of Chapter VI.

1. A majority of scores showed some improvement beyond Test A.
2. Testing conditions made difficult a complete and objective analysis of perseveration.
3. Accurate prediction of final achievement was impossible on the basis of initial performance.
4. The apparent inhibitory effect of the negative teaching example, manifested by an adverse change in both mean score and variability, was immediate rather than of prolonged duration.
5. Negative teaching examples were highly consistent in their over-all immediate effect upon individual performance.
6. Among low groups there were indications of the greater effectiveness of positive examples when immediately preceded by negative examples. For the high groups the positive teaching example seemed most effective when preceded only by other positive examples, at least as far as successive presentation was concerned. (These tendencies were not statistically reliable.)
7. The negative teaching example failed to augment the number of complete generalizations and may actually have hampered their development.
8. The study of positive and negative component scores offers a practical approach to performance-analysis.

CHAPTER VII. FINAL EVALUATION AND CRITICISM

The contrasting results of the various experimental studies in concept formation underscore the supreme importance of reducing and removing the possibility of inaccuracies inflicted by the presence of uncontrolled variables of one kind or another. For the successful control of these variables in individual experiments more adequate facilities are at hand than in group experiments, where increased complexity adds to the difficulty of the situation.

One of the chief prerequisites in the experimental control of test performance is a complete and comprehensive set of instructions. Without this proper guidance the isolation and measurement of special abilities becomes virtually impossible, for, as Thurstone(1) writes, "the fact that a person has a high rating in a particular ability does not help him to superior performance in a task unless the task involves the ability in question."
If, in these tests, language deficiencies and a consequent inability to take full advantage of the instructions obstruct the most complete expression of generalizing ability of which the subject is capable, then test validity is seriously impaired.

1. Thurstone, L.L., op. cit., p. 3.

In Wood's experiment this difficulty was countered by a certain flexibility(1) in the instructions which permitted their adaptation to individual needs. Furthermore, the essential difference between positive and negative examples was thoroughly "stamped in" by having the subjects place each in separate piles, thus supplementing visual and auditory with kinaesthetic cues. These arrangements provided at the outset a reasonable assurance of the subject's full acquaintance with the demands of the task before him to a degree not possible in group experiments such as the present one. Therein probably lies one of the principal reasons why the present results differed so markedly from those obtained under a system of individual testing. In other words, there is the possibility that generalizing ability, as related to non-verbal material, was one among several abilities evoked by the present group tests, and that, if so, inadequacy of the instructions may have been partly responsible for this state of affairs. This supposition finds some basis in the high percentage of unimproved scores. Also, the sharp drop in average scores on Tests B and D, coupled with the high degree of relationship between such performance, is supporting testimony that the value of the negative example was appreciated by only a limited number. This may signify lack of proper guidance or it could mean that the negative example was detrimental in itself. The probability of the former is suggested by a careful re-examination of the directions accompanying the tests in the light of various findings, a matter which later will be discussed in greater detail.

1. Delivery of the instructions concluded with the words, "If you do not understand any part of what you are to do, please ask me about it now." Wood, J., op. cit., Appendix

Of course, differences between experimental results cannot be justified in terms of differences in test administration alone. In analyzing reaction to individual test instances in an earlier chapter, it was observed that the whole character of the tests could be made to undergo considerable change by slightly altering or reshuffling the test material. Because practical necessity in the present case demanded that such changes be made in Wood's test material, already a modification of Smoke's original, our tests may have greater or lesser potentialities for measuring generalizing ability at a given age level than those tests from which they were constructed. All comparisons must take into account this fact; especially is this true where Dickinson's experiment is involved, for here it was deemed advisable to carry these changes even further by completely redefining several test items to fit the needs of a still younger group.

One need hardly continue further to realize that it is difficult enough to classify one form of a test as "more valid" or "less valid" than some other, but much more difficult to make a pronouncement as to its ultimate validity. The problem must be subject to attack, not from one fixed point of view, but from all sides.
The danger of resting judgment upon mere statistical formulae in analyses of this sort is cited by Kuhlmann, who charges that statistical method "puts its main faith in the possibility of what in effect amounts to correcting error made in observations after they have been made, of supplementing or supplying observations where none exist."(1) For example, application of the index of reliability to determine the validity of the tests under consideration would bestow upon them high values which are unsubstantiated by the results of a more extended examination. This possibility went largely unnoticed by Dickinson(2), who concluded that her tests were valid solely on the basis of the index of reliability and a highly subjective analysis of the process of concept formation.

1. Kuhlmann, F., "Our Changing Fashions in Methods of Research", American Journal of Psychology, vol. 55, 1942, p. 572-3.
2. Dickinson, A.E., op. cit., p. 51-2.

And so, in reviewing present findings associated with test validity, the only deductions that can be reliably made must take the form of recommendations for the more effective control of test procedure. Concerning the actual validity of these group tests, a final verdict must await further experiment, for a close study of individual performance and of positive and negative component scores has made it appear not unlikely that recognitive ability rather than generalizing ability was frequently being tested.

CHAPTER VIII. CONCLUSIONS, IMPLICATIONS, AND SUGGESTIONS FOR FUTURE RESEARCH

1. Conclusions

The expanding role of conceptual thinking in human endeavor pointed to the need of developing tests for the accurate measurement of such ability.
Concept formation was used synonymously with generalization, and was defined according to Smoke's usage of the term as "a process whereby an organism develops a symbolic response (usually but not necessarily linguistic) which is made to the members of a class of stimuli patterns, but not to other stimuli."(1) It is primarily a process of responding to common relationships, though elements would appear to constitute a necessary part thereof. A brief outline of relevant studies suggested that more repetition and continuation of previous experiments would help satisfy a need for the perfecting of techniques and the establishment of a basis for more extensive generalizations. Accordingly, it was decided to check hypotheses advanced by previous experimenters utilizing Smoke's technique of gauging conceptual ability in terms of the ability to perceive an inter-element relationship common to a series of geometric patterns. Since Wood(2) had already applied this technique to the individual study of generalizing ability in Grade VI boys, its application to a group study at the same level of educational attainment appeared worthy of investigation.

1. Smoke, K. L., op. cit., p. 8.
2. Wood, J. A., op. cit.

To permit analysis of the varied effects of instruction upon success in generalizing, arrangements were made to study performance under three sets of conditions. For this purpose, subjects were selected and matched with one another according to sex, chronological age, and I.Q. to form three experimental groups (exclusive of trial groups), each comprising 45 boys and 45 girls. These in turn were subdivided into groups representing children of high, medium, and low intelligence.
The general procedure required the presentation, by means of film slides, of a series of teaching and test instances for nine different concepts. The study-time for each of the four teaching examples was 8 seconds, while the total time required for response to the 10 test instances was 25 seconds. The three experiments were alike in their use of a fore-test and all employed the same tests, each made up of an almost equal number of positive and negative instances of a given concept; they differed, however, in regard to the type of teaching examples employed and to their manner of presentation. The first group was subjected to instruction by the successive presentation of positive examples; the second and third groups were instructed by means of both positive and negative examples, involving successive presentation and cumulative presentation, respectively. Cumulative presentation, as opposed to successive presentation, provided for the continued exposure of the examples during the period of testing. The test was taken immediately following the study of each teaching example, the four presentations of the same test being designated as A, B, C, and D. A set of standardized directions accompanied test administration in all groups.

Owing to the high correlations which defined the interrelation of the total of D scores with the sum total of A, B, C, and D scores, the former was regarded as a suitable criterion of test performance.

Among the more important findings of this study were the following:

1. These tests lend themselves to group measurement with a reasonably high degree of reliability.
2. Generally speaking, test performance under the influence of successive presentation was more satisfactory where only positive examples were employed. Where both positive and negative examples were involved, cumulative presentation appeared the better method.
3. While girls were credited with maximum achievement, there was no conclusive evidence for the existence of sex differences.
4. Test performance, while showing some positive relationship to intelligence, reading, and arithmetic reasoning, appeared also to be measuring an ability or abilities beyond the scope of these classifications. Test performance seemed most closely associated with scholastic achievement.
5. Negative teaching examples, to presentations of which average individual reaction displayed high consistency, were usually accompanied by an immediate decline and spread in group achievement.
6. A comparison of high and low I.Q. groups on the basis of all four test performances suggested that the negative example was of little advantage to bright children, while a handicap to those of more or less average intelligence. Among the latter, on the other hand, there were indications that the negative teaching example enlivened and intensified the didactic effect of the positive example immediately following. This effect was not duplicated in the case of the high groups.
7. The value of the negative teaching example varied with the individual and with the particular test instances employed.
8. Negative test instances were identified by all groups with greater accuracy than were positive instances. Low I.Q. groups demonstrated close ability with high I.Q. groups in identifying negative test instances, but were much less capable in regard to positive instances.
9. Item-difficulty was contingent upon:
   a. The similarity and dissimilarity between teaching and test instances.
   b. The presence of relations within the stimulus pattern which were irrelevant to a solution.
   c. The number of approaches which appeared likely to lead to a solution.
10. Analysis of test validity was confined to two boys' groups, the one instructed by successive presentation of positive examples, the other by cumulative presentation of positive and negative examples, and revealed that:
   a. Order of difficulty was closely related to the diagnostic value of an item, the easier items differentiating more effectively between able and poor performers.
   b. Intercorrelations of test items were generally low and negligible where performance was uninfluenced by the negative example, but were higher for the group instructed by positive-negative cumulative presentation.

The results caution against too great reliance upon any one statistical formula or technique for the determination of test validity.

2. Educational Implications

The implications of this study for Educational or Applied Psychology may be briefly summarized: Where conceptual thinking is involved at the Grade VI level, use of the negative example in group instruction is apt to be more confusing than beneficial unless painstaking care is exercised. In many cases it is probably not so much the negative example itself which provokes confusion, but rather an erroneous conception of the problem to be solved. For instance, to grasp the full import of the negative example, one must first understand the significance of the positive example; and to understand the positive example requires, in the present tests, awareness that some sort of relationship is involved.
It would appear, therefore, that the value of the negative example is governed not only by a familiarity with the demands of the problem but also by the type of material which forms the object of generalization.

3. Suggestions for Future Research

As often pointed out, the value of many a psychological investigation has been sacrificed by the all-too-frequent tendency to abandon a project at a certain stage of development and before some practical and worthwhile contribution to knowledge has been realized. In the study of generalizing ability (concept formation) recent attempts, notably those of Long and Welch, have sought to remedy this situation. In keeping with this trend, a set of similar studies is being currently conducted at the University of British Columbia, of which the present one is the third in the series. As an inducement toward the continuation of this endeavor, the following suggestions and recommendations are offered for the improvement of testing techniques:

Foremost among the factors demanding revision is the preliminary guidance which is intended to introduce the subject to the problem situation. In remodeling the instructions special emphasis should dwell upon three objectives:

1. The Subject must be impressed with the fact that relations are involved, and relations only.
2. He must understand the difference between positive and negative teaching examples, and the significance of each.
3. He must understand that all positive teaching examples of a given item contain one relation in common and must be warned that no one example is unique in this respect.

As a first step toward accomplishing these ends, Dax might well be replaced by another "concept" which, after the pattern of Vec, does not involve a closed figure. The effect of this substitution upon the solution of items which include "insidedness" or "outsidedness" as either incidental or essential relationships could then be analyzed and comparisons made. In this way the influence of the fore-test upon subsequent performance and item-difficulty can more easily be judged.

If these tests are to provide even a rough measure of generalizing ability, great care must be applied in formulating the instructions. The importance of this requirement can not be exaggerated. Among several changes that should be made in the present set of directions, at least two of these bear mentioning. For example, accompanying the showing of the second teaching example, the words "we might guess ... that a Dax is a dot and a triangle" were intended to lead the subjects in their individual efforts toward the correct generalization, "a dot inside a triangle". But the possibility that this suggestion may also have misled thinking toward elements and away from relationships is not denied by the facts. Another shortcoming reflected in test performance is to be found in the part-statement that "... the position of the dot does not matter," referring to its position within the triangle. Even though the correct relationship is later defined, this statement is not sufficiently explicit and leaves too much room for confusion in the mind of the subject.

Keen judgment should govern the selection and arrangement of the teaching and test instances, in the interest of validity. All approach toward identity of like teaching and test instances in respect to shape, size, and position of elements should be avoided, particularly in the case of items of lesser difficulty.

In preparing the tests for administration at higher age levels, validity might be best served by an adjustment of the time factor rather than by changes in the test material itself.¹

1. Thurstone, T.G., op. cit., p. 335.

Group response to positive and negative test instances should be studied, and any trends noted. In the event that response follows a pattern similar to that observed in this study, the possibilities of the positive component score as a suitable criterion of generalizing ability should be considered by computing its relationship to other variables and by analyzing its diagnostic capacity.

Where possible it would be of interest to compute correlations between generalizing ability and intelligence as measured by an individual test such as the Stanford-Binet in which reading has a more limited role.
Dickinson has proposed the time-saving measure of deferring testing until all four teaching examples are presented, thus eliminating Tests A, B, and C. While the practical worth of such an arrangement is attested by the high inter-relationship of D scores with the total of A, B, C, and D scores, its adoption at this stage of development would hamper the study of validity and circumscribe all efforts to probe the true nature of individual and group performance.

The advantages claimed by Thurstone¹ for the projector method of test administration are, firstly, maximum control over exposure-time, and secondly, facility for capturing and holding attention. He points out that "the attention value of the visual projector method can be regarded as one of its principal features". To ensure that this statement applies in any given situation, care should be taken to minimize distractive influences by placing the projectors as far to the rear of the room as possible. Use of a portable screen frequently makes this impossible owing to its limited size. A better substitute would be a large white sheet or, if a portable screen must be used, the same effect could be produced by contracting the size of the slide-images or by employing a different type of projector-lens.

1. Thurstone, L.L., "A Micro-Film Projector Method for Psychological Tests", Psychometrika, vol. VI, #4, August 1941, p. 240.

BIBLIOGRAPHY

Achilles, E.M., "Experimental Studies in Recall and Recognition", Archives of Psychology, vol. VI, Sept. 1920.

Billings, M.L., "Problem-Solving in Different Fields of Endeavor", American Journal of Psychology, vol. XLVI, 1934. pp. 259-272.
Dickinson, A.E., An Investigation Into The Generalizing Ability of Grade Two Pupils, Master's Thesis, Vancouver, University of British Columbia, 1943, published in abstract in Journal of Educational Psychology, vol. XXXV, 1944. pp. 432-441.

Ewart, P.H. & Lambert, J.F., "The Effect of Verbal Instructions Upon the Formation of a Concept", Journal of General Psychology, vol. VI, 1932. pp. 400-413.

Garrett, H.E., Statistics in Psychology and Education, Toronto, Longmans, Green and Co., 1940.

Hanfmann, E. & Kasanin, J., "A Method for the Study of Concept Formation", Journal of Psychology, vol. III, 1937. pp. 521-540.

Hull, C.L., "Quantitative Aspects of the Evolution of Concepts; An Experimental Study", Psychological Monographs, vol. XXVIII, No. 1, 1920.

Jackson, R.B. & Ferguson, G.A., Studies on the Reliability of Tests, Bulletin No. 12 of the Department of Educational Research, University of Toronto, 1941.

Kuhlmann, F., "Our Changing Fashions in Methods of Research", American Journal of Psychology, vol. 55, 1942. pp. 569-573.

Kuo, Z.Y., "A Behavioristic Experiment on Inductive Inference", Journal of Experimental Psychology, vol. VI, 1923. pp. 247-293.

Long, J.A. & Sandiford, P., "The Validation of Test Items", Bulletin No. 3 of the Department of Educational Research, University of Toronto, 1935.

Long, L. & Welch, L., "A Preliminary Investigation of Some Aspects of the Hierarchical Development of Concepts", Journal of General Psychology, vol. XXII, 1940. pp. 359-378.

McGeoch, J.A., Psychology of Human Learning, New York, Longmans, Green and Co., 1942.

Maier, N.R.F., "Reasoning in Rats and Human Beings", Psychological Review, vol. XLIV, 1937. pp. 365-378.

Peterson, G.M., "An Empirical Study of the Ability to Generalize", Journal of General Psychology, vol. VI, 1932. pp. 90-114.

Ruger, H.A., "The Psychology of Efficiency", Teachers College Educational Reprints, No. 5, 1926 (a reprint of Archives of Psychology, No. 15, 1910).

Seward, G.H., "Recognition Time as a Measure of Confidence (An Experimental Study of Redintegration)", Archives of Psychology, vol. XVI, 1928. pp. 1-54.

Sherman, M., Intelligence And Its Deviations, New York, The Ronald Press Co., 1945.

Smoke, K.L., "An Objective Study of Concept Formation", Psychological Monographs, vol. XLII, No. 4, 1932.

Smoke, K.L., "Negative Instances in Concept Learning", Journal of Experimental Psychology, vol. XVI, 1933. pp. 583-8.

Symonds, P.M., "Factors Influencing Test Reliability", Journal of Educational Psychology, vol. XIX, 1928. pp. 73-87.

Thompson, J., "The Ability of Children of Different Grade Levels To Generalize on Sorting Tests", Journal of Psychology, vol. XI, 1941. pp. 119-126.

Thurstone, L.L., "Primary Mental Abilities", Psychometric Monographs, No. 1, 1938.

Thurstone, L.L., "A Micro-Film Projector Method For Psychological Tests", Psychometrika, vol. VI, No. 4, August 1941.

Thurstone, T.G., "The Difficulty of a Test and Its Diagnostic Value", Journal of Educational Psychology, vol. XXIII, 1932. pp. 335-343.

Tyler, F.T., Generalizing Ability of Junior High School Pupils: An Experimental Study of Rule Induction, unpublished Ph.D. Thesis, University of California, 1939.

Wolfe, D., "Factor Analysis to 1940", Psychometric Monographs, No. 3, 1940.

Wood, J.E., The Relative Role of Positive and Negative Instances in Concept Formation, unpublished Master's Thesis, Vancouver, University of British Columbia, 1943.

Woodworth, R.S., Experimental Psychology, London, Methuen and Co. Ltd., 1938.

APPENDIX I

A. Instructions Issued Subjects In Experiment I.

Today your teacher has suggested that you help us work out some picture-puzzles.
We think you will enjoy doing these puzzles. You have never seen them before. To make it more interesting we shall score your results. Here on the black-board you see the first part of your answer sheet: Fill in your name, whether a boy or girl, your age, birthday, school, and the name of your teacher. Pay no attention to the other blanks.

First on the screen we are going to show you a picture of a thing called a Dax (spell). You will study this picture for a few moments to discover the idea of what a Dax is. Then you will be shown the puzzle made up of 10 pictures. You are to tell which of these 10 pictures contain the idea of what a Dax is, and which do not. To make this clear, let us look at the first example.

(Dax flashed on screen). Here is a picture of a Dax. Now study it and see if you can decide what a Dax is.

(Pause. Then Dax replaced by test). Now look at the puzzle. Look at number one. Do you think it contains the idea of a Dax? If you do then draw a circle around "yes" in row A under "1". If you think it does not contain the idea of a Dax, circle the "no" in row A under "1". (Illustrating) Then look at picture Number 2. Do you think it contains the idea of a Dax? If you do, then draw a circle around "yes" in row A under "2". If you think it does not contain the idea of a Dax, then circle the "no" in row A under "2". (Illustrating) Now you do the same for the other pictures. Draw the circles around "yes" or "no" in row A, because this is the first puzzle. Pay no attention to these columns on the right. Be sure that you put your answers under "Dax" on your sheet. (Pause) Raise your hand when you have finished.

(Test replaced by second Dax). Now let us look at the second example on the screen. This is also a Dax. Now you must remember what the first Dax was like, and see how this one is like the first one. You remember that the other example had a triangle and a dot. So has this one. Remember that the shape of the other triangle was not the same as this one, so that the shape of the triangle does not matter. You remember also that the dot in the other Dax was in a different position, so that the position of the dot does not matter. We might guess, then, that a Dax is a dot and a triangle. Now let us look at the second puzzle.

(Dax replaced by test) Look at picture number one. Do you think it contains the idea of a Dax? If you do, then circle the "yes" in row B under 1, because this is the second puzzle. If you think picture number one does not contain the idea of a Dax, then circle the "no". (Illustrating) Now look at picture number two. Do you think it is a Dax? If you do, then circle the "yes" in row B under 2. If you think it is not a Dax, circle the "no". (Illustrating) Now you do the rest of the puzzle.

(Pause. Test replaced by third Dax) Now let us look at the third example of a Dax. Do you think that a Dax is a triangle and a dot? Well, I am going to tell you what a Dax really is. A Dax is a triangle with a dot inside it. You remember that in each case the dot was inside the triangle. Now you do the third puzzle.

(Pause. Then Dax replaced by test) Do you think picture number one is a Dax? Yes, it is, because it has a triangle with a dot inside it. So draw a circle around "yes" in row C under 1. Now is picture number two a Dax? No, it is not, because the dot is outside the triangle. So you draw a circle around the "no" in row C under 2. Now you go ahead and do the rest.

(Pause. Test replaced by fourth Dax) Let us look at the fourth example. Again, we see that a Dax is a triangle with a dot inside it. Alright now, you do the fourth puzzle.

(Dax replaced by test) Now since there is a time limit on our puzzles, we are going to give you just the amount of time you will have for the other problems. You will then have an idea of how fast you must work.

(25 second interval) Now I am going to tell you the answers to this fourth puzzle, and you see if you had them right. Number one is a Dax; number two is not a Dax; etc.

Alright now. The Dax was only a practice puzzle. The puzzles you try from now on will be counted. You will work at each one just as you did with the Dax. There is just one more rule in solving these puzzles: Once a new picture has been shown do not go back and make any changes in answers that you have already made. If you do, those answers will be counted wrong. For example, in this next puzzle, let us say you have been shown the first picture of a Mef and that you have already done the first puzzle, that is, you have finished row A. When the second picture of a Mef is shown you must not go back and make any changes in that first row. Now, everyone ready.

B. Instructions Issued Subjects In Experiment II.

Today your teacher has suggested that you help us work out some picture-puzzles. We think you will enjoy doing these puzzles. You have never seen them before. To make it more interesting we shall score your results.
Here on the black-board you see the first part of your answer sheet: Fill in your name, whether a boy or girl, your age, birthday, school, and the name of your teacher. Pay no attention to the other blanks.

First on the screen we are going to show you a picture of a thing called a Dax (Spell). You will study this picture for a few moments to discover the idea of what a Dax is. Then you will be shown the puzzle made up of 10 pictures. You are to tell which of these 10 pictures contain the idea of what a Dax is, and which do not. To make this clear, let us look at the first example.

(Dax flashed on screen) Here is a picture of a Dax. Now study it and see if you can decide what a Dax is.

(Pause. Then Dax replaced by test) Now look at the puzzle. Look at number one. Do you think it contains the idea of a Dax? If you do then draw a circle around "yes" in row A under 1. If you think it does not contain the idea of a Dax, circle the "no" in row A under 1. (Illustrating) Then look at picture number two. Do you think it contains the idea of a Dax? If you do then draw a circle around "yes" in row A under 2. If you think it does not contain the idea of a Dax, then circle the "no" in row A under 2. (Illustrating) Now you do the same for the other pictures. Draw the circles around "yes" or "no" in row A, because this is the first puzzle. Pay no attention to these columns on the right. Be sure that you put your answers under "DAX" on your sheet. (Pause) Raise your hand when you have finished.

(Test replaced by non-Dax) Now let us look at the second example on the screen. Here we have something that is not a Dax. Now you must remember what the Dax was like, and see how this example is different from it. You remember that the other example had a triangle and a dot. But this one has only a triangle. Now let us look at the second puzzle.

(Non-Dax replaced by test) Look at picture number one. Do you think it contains the idea of a Dax? If you do, then circle the "yes" in row B under 1, because this is the second puzzle. If you think picture number one does not contain the idea of a Dax, then circle the "no". (Illustration) Now look at picture number two. Do you think it is a Dax? If you do, then circle the "yes" in row B under 2. If you think it is not a Dax, circle the "no". (Illustrating) Now you do the rest of the puzzle.

(Pause. Test replaced by second Dax) Let us look at the third example on the screen. Now this one is a Dax. You remember the first example of a Dax that you saw had a triangle and a dot. So has this one. Remember that the shape of the triangle in the other Dax was not the same as this one, so that the shape of the triangle does not matter. You remember also that the dot in the other Dax was in a different position, so that the position of the dot does not matter. Now study this Dax closely. Do you think that a Dax is a triangle and a dot? Well, I am going to tell you what a Dax really is. A Dax is a triangle with a dot inside it. You remember that in the first Dax the dot was also inside the triangle. But the second example had no dot, so that it was not a Dax. Now you do the third puzzle.

(Pause. Dax replaced by test) Do you think picture number one is a Dax? Yes, it is, because it has a triangle with a dot inside it. So draw a circle around "yes" in row C under 1. Now is picture number two a Dax? No, it is not, because the dot is outside the triangle. So you draw a circle around the "no" in row C under 2. Now you go ahead and do the rest.

(Pause. Test replaced by non-Dax) Let us look at the fourth example. Now this is not a Dax, because the dot is outside the triangle. Alright now, you do the fourth puzzle.

(Non-Dax replaced by test) Now since there is a time limit on our puzzles, we are going to give you just the amount of time you will have for the other problems. You will then have an idea of how fast you must work.

(35 second interval) Now I am going to tell you the answers to this fourth puzzle, and you see if you had them right. Number one is a Dax; number two is not a Dax; etc.

Alright now. The Dax was only a practice puzzle. The puzzles you try from now on will be counted. You will work at each one just as you did with the Dax. There is just one more rule in solving these puzzles: Once a new picture has been shown do not go back and make any changes in answers that you have already made. If you do, those answers will be counted wrong. For example, in this next puzzle, let us say you have been shown the first picture of a Mef and that you have already done the first puzzle, that is, you have finished row A. When the second picture of a Mef is shown you must not go back and make any changes in that first row. Now, everyone ready.

C. Instructions Issued Subjects In Experiment III.

Today your teacher has suggested that you help us work out some picture-puzzles. We think you will enjoy doing these puzzles. You have never seen them before.
To make it more interesting we shall score your results. Here on the black-board you see the first part of your answer sheet: Fill in your name, whether a boy or girl, your age, birthday, school, and the name of your teacher. Pay no attention to the other blanks.

First on the screen we are going to show you a picture of a thing called a Dax (Spell). You will study this picture for a few moments to discover the idea of what a Dax is. Then you will be shown the puzzle made up of 10 pictures. You are to tell which of these 10 pictures contain the idea of what a Dax is, and which do not. To make this clear, let us look at the first example.

(Dax on) Here is a picture of a Dax. Now study it and see if you can decide what a Dax is.

(Pause. Then test also flashed on) Now look at the puzzle. Look at Number One. Do you think it contains the idea of a Dax? If you do then draw a circle around "yes" in row A under "1". If you think it does not contain the idea of a Dax, circle the "no" in row A under "1". (Illustrating) Then look at picture Number 2. Do you think it contains the idea of a Dax? If you do then draw a circle around "yes" in row A under "2". If you think it does not contain the idea of a Dax, then circle the "no" in row A under "2". Now you do the same for the other pictures. Draw the circles around "yes" or "no" in row A, because this is the first puzzle. Pay no attention to these columns on the right. Be sure that you put your answers under "Dax" on your sheet. (Pause) Raise your hand when you have finished.

(Test off. First positive Dax supplemented by a negative Dax) Let us look at the two examples on the screen. The top example is the Dax that you just studied. Now, below it is an example of something that is not a Dax. You will notice that the Dax (indicating) has a triangle and a dot, while the example below has only a triangle. We might guess, then, that a Dax is a dot and a triangle. Now let us look at the second puzzle.

(Test flashed on) Look at picture Number One. Do you think it contains the idea of a Dax? If you do, then circle the "yes" in row B under "1", because this is the second puzzle. If you think picture Number 1 does not contain the idea of a Dax, then circle the "no". (Illustrating) Now look at picture Number 2. Do you think it is a Dax? If you do, then circle the "yes" in row B under "2". If you think it is not a Dax, circle the "no". Now you do the rest of the test.

(Pause. Test off, and two preceding examples supplemented by a third example). Let us look at the three examples on the screen. You have already studied the first two (indicating). Below them is another example of a Dax. You will notice that the first Dax (indicating) has a triangle and a dot. So has this one. Notice also that the shape of the triangle in the first Dax is not the same as this one, so that the shape of the triangle does not matter. You can see, too, that the dot in the first Dax is in a different position, so that the position of the dot does not matter. Now study this Dax closely (referring to third example). Do you think that a Dax is a triangle and a dot? Well, I am going to tell you what a Dax really is. A Dax is a triangle with a dot inside it. Note that in the first Dax the dot is also inside the triangle. But the second example has no dot, so that it is not a Dax. Now you do the third puzzle.

(Test flashed on). Do you think picture Number One is a Dax? Yes, it is, because it has a triangle with a dot inside it. So draw a circle around "yes" in row C under "1". Now is picture Number 2 a Dax?
No, it is not, because the dot is outside the triangle. So you draw a circle around the "no" in row C under "2". Now you go ahead and do the rest.

(Pause. Test off, and a fourth example added) Now let us study the four examples on the screen. You are familiar with the first three. But if you look at the last example you can see that it is not a Dax, because the dot is outside the triangle. Alright, now, you do the fourth puzzle.

(Test flashed on) Now since there is a time limit on our puzzles, we are going to give you just the amount of time you will have for the other problems. You will then have an idea of how fast you must work.

(25 second interval) Now I am going to tell you the answers to this fourth puzzle, and you see if you had them right. Number 1 is a Dax. Number 2 is not a Dax. Etc., etc.

Alright now. The Dax was only a practice puzzle. The puzzles you try from now on will be counted. You will work at each one just as you did with the Dax. There is just one more rule in solving these puzzles: Once a new picture has been shown do not go back and make any changes in answers that you have already made. If you do, those answers will be counted wrong. For example, in this next puzzle, let us say you have been shown the first picture of a Mef and that you have already done the first puzzle, that is, you have finished row A. When the second picture of a Mef is shown you must not go back and make any changes in that first row. Alright now. Everyone ready.

APPENDIX II

TABLE A. PERFORMANCE OF HIGH IQ GROUPS ON TOTAL OF TESTS A, B, C, AND D. MEAN SCORES, STANDARD DEVIATIONS, AND STANDARD ERRORS.

                     A.M.               S.D.
GROUP I    Boys      219.60 (6.39)      23.90 (4.52)
           Girls     224.83 (7.33)      27.40 (5.48)
GROUP II   Boys      220.83 (8.56)      32.00 (6.05)
           Girls     220.83 (9.36)      35.00 (6.62)
GROUP III  Boys      226.83 (10.96)     41.00 (7.75)
           Girls     241.50 (4.12)      15.40 (2.91)

TABLE B. PERFORMANCE OF LOW IQ GROUPS ON TOTAL OF TESTS A, B, C, AND D. MEAN SCORES, STANDARD DEVIATIONS, AND STANDARD ERRORS.

                     A.M.               S.D.
GROUP I    Boys      209.50 (4.57)      17.10 (3.23)
           Girls     198.17 (5.56)      20.80 (3.93)
GROUP II   Boys      180.83 (9.25)      34.60 (6.54)
           Girls     181.50 (5.75)      21.50 (4.06)
GROUP III  Boys      180.17 (9.52)      35.60 (6.73)
           Girls     220.83 (7.73)      28.90 (5.46)

TABLE C. ORDER OF ITEM-DIFFICULTY FOR HIGH IQ GROUPS IN TERMS OF PERCENTAGE OF MAXIMUM POSSIBLE SCORE.

      GROUP I                GROUP II               GROUP III
  Boys       Girls       Boys       Girls       Boys       Girls
Item  %    Item  %     Item  %    Item  %     Item  %    Item  %
Zum  85    Zum  92     Zum  82    Zum  82     Zum  85    Zum  93
Vec  84    Vec  83     Vec  80    Vec  78     Vec  81    Vec  88
Mef  72    Mef  75     Zif  72    Mef  72     Zif  78    Zif  79
Zif  69    Zif  70     Wez  71    Wez  70     Wez  73    Mef  79
Wez  64    Mib  66     Mef  70    Zif  68     Mef  73    Wez  76
Tov  59    Wez  64     Mib  60    Mib  65     Pog  63    Pog  66
Mib  59    Tov  60     Pog  59    Pog  61     Mib  62    Mib  63
Pog  56    Pog  53     Tov  52    Tov  55     Tov  52    Tov  55

TABLE D. ORDER OF ITEM-DIFFICULTY FOR LOW IQ GROUPS IN TERMS OF PERCENTAGE OF MAXIMUM POSSIBLE SCORE.

      GROUP I                GROUP II               GROUP III
  Boys       Girls       Boys       Girls       Boys       Girls
Item  %    Item  %     Item  %    Item  %     Item  %    Item  %
Zum  83    Zum  74     Zum  66    Zum  69     Zum  70    Zum  86
Vec  83    Mef  67     Wez  59    Vec  61     Wez  60    Vec  79
Mef  70    Vec  63     Zif  58    Wez  60     Zif  58    Mef  71
Wez  59    Tov  60     Vec  57    Zif  56     Mef  57    Zif  68
Zif  58    Mib  59     Tov  53    Pog  54     Vec  56    Wez  66
Tov  57    Wez  57     Mef  53    Tov  51     Pog  54    Pog  62
Mib  57    Pog  55     Mib  52    Mef  50     Tov  49    Mib  60
Pog  56    Zif  54     Pog  52    Mib  50     Mib  46    Tov  56

TABLE E. COMPARISON OF NUMBER OF PERFECT SCORES IN TEST A WITH NUMBER OF PERFECT SCORES CONTINUING THROUGHOUT TESTS A, B, C, AND D. (SUB-GROUPS ONLY).

                                   GROUP I               GROUP II              GROUP III
                               Boys      Girls       Boys      Girls       Boys      Girls
                              H  M  L   H  M  L     H  M  L   H  M  L     H  M  L   H  M  L
PERFECT SCORES IN TEST A     17  8  7  17 10  9    12 11 12  16  8  3    24 14  7  20 12 13
PERFECT SCORES THROUGHOUT
ALL FOUR TESTS               11  4  4   9  3  0    10  6  8  11  4  -    18  8  0  18  5  9

TABLE F. CRITICAL RATIOS OF THE DIFFERENCES IN VARIABILITY BETWEEN A, B, C, AND D SCORES WITHIN EACH GROUP.
                     A & B    B & C    C & D    A & C    B & D
GROUP I    Boys       .43     1.72      .07     1.14     1.74
           Girls      .70      .00     1.13      .69     1.20
GROUP II   Boys      3.31     2.51     3.48      .82     1.67
GROUP II Girls and GROUP III Boys:  3.84  2.25  2.92  1.73  1.43  2.00  1.03  2.63  1.42  2.23
GROUP III  Girls     1.54     2.30     1.04     3.48     1.69

TABLE G. HIGH GROUP PERFORMANCE ON TOTAL OF EACH OF TESTS A, B, C, AND D. MEAN SCORES, STANDARD DEVIATIONS, AND STANDARD ERRORS.

                       GROUP I                      GROUP II                     GROUP III
                 Boys          Girls          Boys          Girls          Boys          Girls
TEST A  A.M.   51.43 (1.64)  52.50 (1.79)   53.83 (1.59)  55.17 (1.94)   56.77 (1.49)  55.97 (1.60)
        S.D.    6.12 (1.16)   6.68 (1.26)    5.96 (1.13)   7.24 (1.37)    5.56 (1.05)   6.00 (1.13)
TEST B  A.M.   55.17 (1.59)  55.70 (2.11)   52.23 (2.86)  53.83 (2.55)   55.17 (3.09)  58.90 (1.18)
        S.D.    5.96 (1.13)   7.88 (1.49)   10.68 (2.02)   9.52 (1.80)   11.56 (2.19)   4.40 ( .83)
TEST C  A.M.   56.77 (1.94)  58.10 (2.24)   57.57 (2.19)  57.03 (2.59)   58.90 (3.07)  62.10 (1.09)
        S.D.    7.24 (1.37)   8.36 (1.58)    8.20 (1.55)   9.68 (1.83)   11.48 (2.17)   4.08 ( .77)
TEST D  A.M.   56.23 (2.01)  57.30 (2.18)   57.30 (2.95)  55.17 (3.36)   56.50 (4.35)  62.37 ( .94)
        S.D.    7.52 (1.42)   8.16 (1.54)   11.04 (2.09)  12.56 (2.37)   16.28 (3.08)   3.52 ( .67)

TABLE H. LOW GROUP PERFORMANCE ON TOTAL OF EACH OF TESTS A, B, C, AND D. MEAN SCORES, STANDARD DEVIATIONS, AND STANDARD ERRORS.

                       GROUP I                      GROUP II                     GROUP III
                 Boys          Girls          Boys          Girls          Boys          Girls
TEST A  A.M.   51.70 (1.80)  48.23 (1.89)   46.10 (1.95)  47.17 (1.39)   43.70 (2.72)  53.03 (2.02)
        S.D.    6.72 (1.27)   7.08 (1.34)    7.28 (1.38)   5.20 ( .98)   10.16 (1.92)   7.56 (1.43)
TEST B  A.M.   52.77 (1.33)  49.03 (1.35)   42.37 (2.95)  37.57 (2.09)   42.90 (2.24)  53.83 (2.33)
        S.D.    4.96 ( .94)   5.04 ( .95)   11.02 (2.08)   7.80 (1.47)    8.36 (1.58)   8.72 (1.65)
TEST C  A.M.   51.97 (1.51)  49.30 (1.47)   48.50 (2.11)  52.77 (2.09)   47.43 (2.48)  58.37 (1.74)
        S.D.    5.64 (1.07)   5.48 (1.04)    7.88 (1.49)   7.80 (1.47)    9.28 (1.75)   6.52 (1.23)
TEST D  A.M.   52.50 (1.61)  49.30 (1.83)   43.97 (3.47)  43.43 (2.51)   46.10 (3.30)  56.23 (2.91)
        S.D.    6.04 (1.14)   6.84 (1.29)   12.98 (2.45)   9.40 (1.78)   12.36 (2.34)  10.88 (2.06)

TABLE I. CRITICAL RATIOS OF THE DIFFERENCES BETWEEN MEAN SCORES WITHIN HIGH GROUPS. (BRACKETED LETTERS DESIGNATE HIGHEST SCORES).

              GROUP I               GROUP II              GROUP III
          Boys      Girls       Boys      Girls       Boys      Girls
A & B    3.20(B)   1.83(B)      .86(A)    .68(A)      .66(A)   2.64(B)
B & C    1.72(C)   2.73(C)     2.37(C)   3.95(C)     1.59(C)   5.08(C)
A & C    4.49(C)   2.84(C)     2.09(C)    .87(C)     1.01(C)   4.41(C)
B & D    1.13(D)   2.13(D)     2.41(D)   1.09(D)      .91(D)   4.34(D)

TABLE J. CRITICAL RATIOS OF THE DIFFERENCES BETWEEN MEAN SCORES WITHIN LOW GROUPS. (BRACKETED LETTERS DESIGNATE HIGHEST SCORES).

              GROUP I               GROUP II              GROUP III
          Boys      Girls       Boys      Girls       Boys      Girls
A & B     .47(B)    .63(B)     2.10(A)   4.28(A)      .03(A)    .04(B)
B & C     .61(B)    .24(C)     3.56(C)   5.78(C)     2.38(C)   2.27(C)
A & C     .17(C)    .75(C)     1.32(C)   2.83(C)     2.02(C)   4.60(C)
B & D     .18(B)    .23(D)      .82(D)   3.64(D)     1.99(D)   2.12(D)
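The bracketed standard errors in Tables A, B, G, and H can be checked against the tabulated standard deviations. The following is a minimal sketch only: the sub-group size N = 14 is not stated in this section but is inferred from the figures themselves (most bracketed values agree with S.D./√N for the mean and S.D./√(2N) for the standard deviation), and the function names are hypothetical. The critical-ratio function gives the general form for a difference between two means; the within-group ratios of Tables I and J rest on correlated scores whose coefficients are not reported here, so they are not reproduced.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean: S.D. divided by the square root of N."""
    return sd / math.sqrt(n)


def se_sd(sd, n):
    """Standard error of a standard deviation: S.D. / sqrt(2N)."""
    return sd / math.sqrt(2 * n)


def critical_ratio(mean_1, mean_2, se_1, se_2, r=0.0):
    """Difference between two means divided by its standard error.

    r is the correlation between the two sets of scores; r = 0 treats the
    groups as independent. The values of r behind Tables I and J are not
    given in this section, so r is an input, not something taken from it.
    """
    variance = se_1 ** 2 + se_2 ** 2 - 2 * r * se_1 * se_2
    if variance <= 0:
        raise ValueError("standard error of the difference is not positive")
    return (mean_1 - mean_2) / math.sqrt(variance)


if __name__ == "__main__":
    N_ASSUMED = 14  # inferred from the tables, not stated in the text

    # Table A, Group I Boys: S.D. = 23.90, tabulated errors (6.39) and (4.52)
    print(round(se_mean(23.90, N_ASSUMED), 2))  # 6.39
    print(round(se_sd(23.90, N_ASSUMED), 2))    # 4.52

    # Illustration only, not a figure reported in the thesis: High versus
    # Low IQ Boys of Group I on the total score, treated as independent.
    print(round(critical_ratio(219.60, 209.50, 6.39, 4.57), 2))  # 1.29
```

The correlation term is kept as an explicit parameter because successive tests taken by the same pupils are not independent samples; setting r to zero for such comparisons would overstate the standard error of the difference whenever the scores are positively correlated.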
