UBC Theses and Dissertations


A reliability analysis of the American Association for Health, Physical Education and Recreation Youth… Field, Arthur Edward James 1964

Full Text

A RELIABILITY ANALYSIS OF THE AMERICAN ASSOCIATION FOR HEALTH, PHYSICAL EDUCATION AND RECREATION YOUTH FITNESS TEST ITEMS

By Arthur E. J. Field
B.P.E., The University of British Columbia, 1960

A Thesis Submitted in Partial Fulfilment of The Requirements For The Degree of Master of Physical Education in the School of Physical Education and Recreation

We accept this thesis as conforming to the required standard.

The University of British Columbia
August, 1964

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of PHYSICAL EDUCATION AND RECREATION
The University of British Columbia, Vancouver 8, Canada
Date: AUGUST 1964

ABSTRACT

A complete reliability analysis of the AAHPER Test has not been reported in the physical education literature. Previous reports have dealt only with the test-retest reliability coefficients of one or more items. The purpose of this study was to provide a comprehensive reliability analysis of the AAHPER Test items.
More specifically the problems of this study were (1) to determine the average and best test-retest reliability coefficients of the test items; (2) to determine the standard error of measurement (absolute accuracy) of the test items as computed by the standard correlation formula method and the analysis of variance technique; (3) to determine if the practice effect is significant for each test item; (4) to determine if the test items measure with an accuracy sufficient to distinguish between the subjects tested; and (5) to determine for each test item if subject differences (differences between subjects) are significantly larger than practice differences (differences between trials).

Fifty-seven untrained male students enrolled in the Required Physical Education Programme at the University of British Columbia were tested once a week for four consecutive weeks with the AAHPER Test. The items administered were pull-ups, sit-ups, standing broad jump, shuttle run, 50-yard dash, softball throw and 600-yard run-walk.

The data from each test item were analyzed in order to obtain (a) means and standard deviations for each of four trials, (b) between trials correlation coefficients and an average reliability coefficient, (c) standard errors of measurement computed by the standard correlation formula method and the analysis of variance technique, and (d) three F ratios (analysis of variance).
It was concluded on the basis of the reliability analysis of the data collected that (1) the average test-retest reliability coefficients of the test items were pull-ups .938, sit-ups .861, standing broad jump .899, shuttle run .776, 50-yard dash .792, softball throw .940 and 600-yard run-walk .759; (2) the standard errors of measurement computed by the standard correlation formula method and the analysis of variance technique were pull-ups (correlation formula method 0.794 and analysis of variance technique 0.834), sit-ups (6.250 and 6.934), standing broad jump (3.124 and 3.353 inches), shuttle run (0.227 and 0.239 seconds), 50-yard dash (0.194 and 0.190 seconds), softball throw (9.100 and 9.170 feet), and 600-yard run-walk (5.000 and 5.660 seconds); (3) analysis of variance results showed a significant practice effect over four trials for all items except the softball throw; (4) analysis of variance results showed that the AAHPER Test items measure with an accuracy sufficient to distinguish between the subjects tested; and (5) analysis of variance results showed that for each test item subject differences are not significantly larger than practice differences; and since they usually are, it can be concluded that the practice effect must have been severe.

The findings of this study showed that the pull-ups and softball throw variables were highly reliable. Thus when using these items it seems reasonable to accept first trial scores as sufficiently accurate for both survey and experimental purposes. The standing broad jump, 50-yard dash and the 600-yard run-walk items had relatively high reliability; however, results showed that several preliminary practice trials are probably necessary before scores become sufficiently reliable for research purposes. The sit-ups and shuttle run were the least reliable items of the AAHPER Test. These items seem to require at least four preliminary practice trials before a satisfactory level of reliability can be attained.

ACKNOWLEDGMENT

The writer wishes to thank Dr. S. R. Brown, of the University of British Columbia, School of Physical Education and Recreation, for his stimulating advice and continued assistance in the writing of this study. Sincere thanks are extended to the members of my thesis committee for their guidance and encouragement; to Mr. Harry Walters who aided in the administration of the tests; to Mr. R. W. B. Jackson, Statistician, University of Toronto; and to Mr. H. Dempster and the staff of the University of British Columbia computer center. Appreciation is extended to those fifty-seven students who served as subjects and produced the necessary data which made this study possible. Finally, the writer wishes to thank his wife Joan and Mrs. E. Kane for their help in completing this study.
TABLE OF CONTENTS

CHAPTER
I STATEMENT OF THE PROBLEM
    The Problem
    Delimitations
    Limitations
II JUSTIFICATION OF THE PROBLEM
    Importance of the Study
    Definitions of Terms Used
III REVIEW OF LITERATURE
    History and Development of the AAHPER Test
    Advantages of the AAHPER Test
    Factors That Influence Test Reliability
    Methods of Estimating Relative Precision (Reliability)
    Methods of Estimating Absolute Precision (Errors of Measurement)
    Related Reliability Studies
IV METHODS AND PROCEDURE
    Subjects
    Test Administration
    Testing Team
    Equipment and Facilities
    Statistical Treatment of Data
V RESULTS AND DISCUSSION
    Reliability
    Standard Errors of Measurement
    Analysis of Variance
VI SUMMARY AND CONCLUSIONS
BIBLIOGRAPHY
APPENDICES
    A. AAHPER RESEARCH PROGRAMME
    B. HISTORY AND USE OF THE AAHPER TEST
    C. AAHPER TEST ADMINISTRATION INSTRUCTIONS
    D. AAHPER TEST SCORE CARD

LIST OF TABLES
I AAHPER Youth Fitness Test Data From Four Trials
II AAHPER Youth Fitness Test Standard Errors of Measurement
III AAHPER Youth Fitness Test Analysis of Variance Data

CHAPTER I
STATEMENT OF THE PROBLEM

If we are to attach any significance to the results of the American Association for Health, Physical Education and Recreation (AAHPER) Youth Fitness Test, we must first be assured that the test items are accurate and reliable measuring instruments. We cannot justify the use of the AAHPER Test as a measuring instrument of youth fitness if the test items are inaccurate and unreliable.
The AAHPER Youth Fitness Test was developed by the American Association for Health, Physical Education and Recreation to test the physical fitness of the American youth. It is now being widely used in schools and universities throughout the world. It came into existence to supply the need for a suitable test to be used in conjunction with the "Fitness Movement" in the United States. This movement was initiated by President Eisenhower, and continued by the late President Kennedy's Council on Youth Fitness.

The AAHPER National Youth Fitness Test provides national norms of physical performance for boys and girls, ages 10-17, and college age youth and young adults, ages 18-30. The test is especially useful for determining individual weaknesses, program quality and changes in physical performance. The items of this test are pull-ups on the horizontal bar for measuring arm strength, sit-ups for measuring abdominal endurance, shuttle run for measuring speed and agility, standing broad jump for measuring muscular power of the legs, 50-yard dash for determining speed, softball throw for measuring arm power, and the 600-yard run-walk for determining endurance during sustained activity. The test may be administered in a gymnasium or outside without special equipment.

I. THE PROBLEM

The object of this study is to determine by a reliability analysis if the AAHPER Youth Fitness Test items are accurate and reliable measuring instruments. More specifically the problems of this study were:

1. to determine the average and best test-retest reliability coefficients of the test items;
2. to determine the standard errors of measurement (absolute accuracy) of the test items as computed by the standard correlation formula method and the analysis of variance technique;
3. to determine if practice effect is significant for each test item;
4. to determine if the test items measure with an accuracy sufficient to distinguish between the subjects tested; and
5. to determine for each test item if subject differences (differences between subjects) are significantly larger than practice differences (differences between trials).

II. DELIMITATIONS

1. This study deals with a group of fifty-seven untrained male first and second year University of British Columbia students.

III. LIMITATIONS

1. A larger sample would be more desirable in this type of investigation.
2. The weather conditions during the second outdoor session were not ideal.
3. Some subjects did not participate to the best of their ability.
4. No attempt was made to obtain a random sample. All subjects volunteered for the testing program.

CHAPTER II
JUSTIFICATION OF THE PROBLEM

The function of measurement in physical education is to assess status or capacity in a given quality or skill at a given time.[1] In order to attain the goals of physical education it is necessary to use measuring devices such as the AAHPER Test. The AAHPER Youth Fitness Test is one of the latest, most valid physical fitness tests to be constructed. The supposition is that this test is an accurate and reliable measuring instrument of the most important aspects of physical fitness.

Reliability coefficients of the individual items included in the American Association for Health, Physical Education and Recreation Youth Fitness Test have been reported from time to time by various investigators. No previous report appears to have been made of the reliability of the AAHPER Test items when done in successive order by the same subjects under conditions which have been officially recommended for the administration of the whole test. The test-retest reliability coefficients reported by various other investigators describe only one aspect of reliability. Other aspects of reliability are equally or more important and these have been generally ignored by test designers and research workers in physical education.
Moreover, most reports of reliability using correlation coefficients have been based on only two trials. The possibility of further improvement in reliability by repeated testing has thus been virtually ignored. The object of this study was to determine by a comprehensive reliability analysis if the American Association for Health, Physical Education and Recreation Youth Fitness Test items are accurate and reliable measuring instruments.

[1] Carlton R. Meyers and T. Erwin Blesh, Measurement in Physical Education, New York, The Ronald Press Company, 1962, p. 11.

I. IMPORTANCE OF THE STUDY

Complete and detailed information concerning the reliability of the AAHPER Test as a measuring device of youth fitness is of considerable importance in the field of physical education. This information will provide valuable assistance for the physical educator in determining the test's usefulness when dealing with a specific problem and in interpreting test results.[2] In discussing the importance of reliability in educational measurement, Thorndike[3] stresses the following important concepts which are applicable to measurement and evaluation in physical education.

[2] Robert W. B. Jackson and George A. Ferguson, Studies on the Reliability of Tests, University of Toronto, Toronto, Department of Educational Research, Bulletin No. 12, 1941, p. 101.
[3] Robert L. Thorndike, "Reliability," Chapter 15 in Educational Measurement, E. F. Lindquist, ed., Washington, D.C., American Council on Education, 1961, pp. 562-563.

In physical education we obtain a score for an individual on a test in order to arrive at some conclusion concerning him or to map a program of action with regard to him. When selecting a physical performance test for a specific testing project and when interpreting the test results, the physical educator is primarily concerned with the reliability and accuracy of the measuring instrument.
Assuming all other considerations to be equal (particularly validity), the physical educator will choose that test which will give the most accurate and reliable estimate of the characteristic being studied. An individual score of questionable reliability resulting from the application of a physical performance test is distressing to the research worker in physical education. It would be unwise to consider this score as an indication of true individual performance. Further action or judgement which may be based on an unreliable score can only be tentative.

II. DEFINITIONS OF TERMS USED

Reliability

It is apparent that there is some disagreement among the authorities as to the definition of reliability. This disagreement stems in part from an early connection between reliability and the correlation coefficient which has tended to confuse rather than clarify the issue.[4] This confusion can best be explained by examining some of the definitions of reliability which are in common use.

[4] Jackson and Ferguson, Reliability of Tests, p. 23.

Most writers in physical education define reliability as consistency. Clarke defines reliability as "... the degree of consistency with which a measuring device may be applied."[5] Weiss and Scott give a similar definition of reliability. They define reliability as the "... consistency with which a test can be administered by the same tester."[6] Meyers and Blesh, in agreement with Clarke and Weiss and Scott, summarize by defining reliability as "... the degree of consistency of results obtained by a measuring instrument upon repeated application... ."[7]

[5] H. Harrison Clarke, Application of Measurement to Health and Physical Education, 3rd ed., New York, Prentice-Hall, Inc., 1959, p. 35.
[6] Raymond A. Weiss and M. Gladys Scott, "Construction of Tests," Chapter 8 in Research Methods in Health and Physical Education, 2nd ed., M. Gladys Scott, ed., Washington, D.C., American Association for Health, Physical Education and Recreation, 1959, p. 243.
[7] Meyers and Blesh, Measurement in Physical Education, p. 62.

Authorities in educational research give additional definitions of reliability. Thorndike and Hagen state that "Reliability has to do with accuracy and precision of a measurement procedure."[8] Kientzle defines reliability of a test as "... the extent to which it agrees with itself."[9] Garrett states "A test score is called reliable when we have reasons for believing the score to be stable and trustworthy."[10] Guilford reports that "Under the concept of reliability we are concerned about the accuracy with which a score represents the status of an individual in whatever aspect the test measures him."[11] Guilford also points out that reliability has several meanings. "Common synonyms for reliability include: dependability, consistency, and stability. Each means something somewhat different as applied to measurements. Even the same term has slightly different meanings as applied to different measurement operations."[12]

[8] Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and Education, New York, John Wiley and Sons, Inc., 1959, p. 108.
[9] Mary J. Kientzle, "Statistics in Psychology and Education," Pullman, Washington, The State College of Washington, Department of Psychology, 1951, p. 133. (Mimeographed).
[10] Henry E. Garrett, Statistics in Psychology and Education, 5th ed., New York, Longmans, Green and Co., 1958, p. 337.
[11] J. P. Guilford, Psychometric Methods, 2nd ed., New York, McGraw-Hill, Inc., 1954, p. 349.
[12] Guilford, Psychometric Methods, p. 342.

Jackson and Ferguson[13] state that much of the disagreement among the various writers as to the definition of reliability stems from the fact that some authorities place emphasis on accuracy or errors of measurement while other authorities place emphasis on consistency or relative stability (reliability). They suggest that the issue would be clarified if the two terms absolute accuracy and relative accuracy were used in the place of the blanket term reliability.

[13] Jackson and Ferguson, Reliability of Tests, p. 24.

Gulliksen[14] reports that it has become customary in the last forty years to assess tests in terms of the reliability coefficient rather than the error of measurement. He suggests, in agreement with Jackson and Ferguson, that since the reliability coefficient and the standard error of measurement each have advantages and disadvantages, both should always be given when making a complete assessment of a test.

[14] Harold Gulliksen, Theory of Mental Tests, New York, John Wiley and Sons, Inc., 1950, p. 194.

Thorndike,[15] in collaboration with Cronbach, Cureton, Kelly, Kurtz, Richardson, and Thurstone, defines reliability as the tendency towards consistency. He states, in agreement with Jackson and Ferguson and Gulliksen, that the consistency of a set of measures may be determined by two different methods. In the first, one is concerned with the amount of variation we would find in a set of repeated measurements of the same specimen. This is called the standard error of measurement. The second method of determining consistency in measurement may be made in terms of the consistency with which an individual maintains his ranking in a group when a set of measures is reproduced a second time. An index of this consistency of measurements is the correlation between two sets of scores, which may be called the reliability coefficient.

[15] Thorndike, "Reliability," Chapter 15 in Educational Measurement, E. F. Lindquist, ed., pp. 560-561.

Lyman,[16] in agreement with Jackson and Ferguson, Gulliksen, and Thorndike, reports that reliability refers to consistency of measurement and may be assessed in either an absolute or relative sense.
Lyman states that absolute consistency refers to the variability we could expect in a person's score if he were examined repeatedly with the same test, while relative consistency refers to the ability of the test to reproduce scores that place examinees in the same position relative to each other.

[16] Howard B. Lyman, Test Scores and What They Mean, Englewood Cliffs, N. J., Prentice-Hall, Inc., 1963, p. 31.

It appears that reliability is a somewhat difficult term to define. This difficulty stems in part from the fact that reliability refers to a series of concepts that are often confused with one another. Authorities, however, seem to agree that any manual that accompanies a test should provide, as statistical indices of precision of measurement, measures expressed both in relative (correlation coefficient) and absolute (standard error of measurement) terms.

Standard Error of Measurement

The standard error of measurement indicates how much we would expect a person's score to vary if he were examined repeatedly with the same test. It expresses the reliability of a test in an absolute sense. This statistic estimates how accurate our obtained scores are and how much confidence we can place on getting a reasonably accurate measurement with the variable concerned. Thus one can have 68.26 percent confidence that the obtained score lies within ±1 standard error of measurement of the true score, or 99 percent confidence that the obtained score lies within ±2.58 standard errors of measurement of the true score.

Practice Effect

The practice effect refers to the improvement in performance from test to retest. The practice effect is also called the initial educational adjustment and may be attributed to learning rather than to physiological changes.
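To make the standard error of measurement defined above concrete, the following minimal Python sketch computes the 68.26 percent and 99 percent limits around an obtained score. The pull-ups value of 0.794 is the figure reported in the abstract; the obtained score of 8 pull-ups and the function names are hypothetical and chosen only for illustration. The helper `sem_from_reliability` uses the conventional formula SD × sqrt(1 − r), on the assumption that this is what the text means by the "standard correlation formula method"; the formula itself is not printed in this section.

```python
import math


def sem_from_reliability(sd, r):
    """Standard error of measurement by the conventional formula SD * sqrt(1 - r).

    Assumption: this is the 'standard correlation formula method' named in the
    text; the section does not print the formula.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must lie in [0, 1]")
    return sd * math.sqrt(1.0 - r)


def sem_band(score, sem, z=1.0):
    """Return (low, high) limits of score -/+ z standard errors of measurement."""
    half_width = z * sem
    return score - half_width, score + half_width


# Standard error of measurement for pull-ups reported in the abstract
# (correlation formula method).
PULL_UP_SEM = 0.794

# Hypothetical obtained score, used only for illustration.
obtained = 8

band_68 = sem_band(obtained, PULL_UP_SEM, z=1.0)   # 68.26 percent limits
band_99 = sem_band(obtained, PULL_UP_SEM, z=2.58)  # 99 percent limits

print(f"68.26 percent limits: {band_68[0]:.3f} to {band_68[1]:.3f}")
print(f"99 percent limits:    {band_99[0]:.3f} to {band_99[1]:.3f}")
```

The limits are centred on the obtained score, which is the usual practical reading of the statement that obtained and true scores lie within a given number of standard errors of one another.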
Factors which contribute to the practice effect are increased motivation, competition with other subjects, improvement in form, preliminary practice, coaching, and changes in interest, desire and attitude.

CHAPTER III
REVIEW OF LITERATURE

Much has been written on the American Association for Health, Physical Education and Recreation Youth Fitness Test and the concept of reliability; however, this review will present only a brief summary of the AAHPER Test and a survey of the literature closely related to the problem at hand.

I. HISTORY AND DEVELOPMENT OF THE AAHPER TEST

During the war years there was a great interest in fitness but during times of peace this interest fell off. Much of the interest during the last decade can no doubt be traced to the large draftee rejection rate of the Korean War. The results of the Kraus-Weber Test did much to revitalize interest in fitness.

In 1946 Dr. Hans Kraus, Associate Professor at the Institute of Rehabilitation, New York University, developed the Kraus-Weber Test for Minimum Muscular Fitness. The Kraus-Weber Test is composed of five strength items and one flexibility item. Each movement is performed once on a pass or fail basis. The Kraus-Weber Test was originally designed to determine if polio patients were sufficiently rehabilitated to leave the hospital. The test was later applied to posture cases and as a measure of progress for those afflicted with disorders of the spine.

The tests were later administered to 4,264 American and 2,870 European children between the ages of 6 and 16 by Kraus and Hirschland.[1] Results showed that 57.9 percent of the American children and only 8.7 percent of the European children failed.

Immediately following the release of the Kraus-Weber Test results, leading newspapers and magazines carried articles pertaining to the fitness of America's youth. These articles aroused the nation's interest in youth and adult fitness.
Consequently, individuals began to ask: what is wrong with our fitness; what can be done and what must be done?

[1] Hans Kraus and Ruth P. Hirschland, "Minimum Muscular Fitness Tests in School Children," AAHPER Research Quarterly, Vol. 25 (May 1954), pp. 178-188.

This great interest in fitness led former President Eisenhower to call a conference on the fitness of American youth. This conference, the first fitness conference held under the auspices of the Federal Government, was held at the United States Naval Academy, Annapolis, June 18-19, 1956. The Conference recommended that:

    Official recognition be taken of the fact that our adult citizens and our youth have little appreciation of the existence of a problem pertaining to the fitness of American youth.

    The public generally, and parents, church leaders, and educators in particular, be alerted to the facts that (a) in this age of automation, the fitness of our youth cannot be taken for granted, (b) indifferences to the softness which comes from lack of participation in health-giving activities will bring erosion of our strength, and (c) physical fitness goes hand-in-hand with moral, mental, and emotional fitness.

    Intensive, continual, and co-operative research be conducted to supply the factual base for formulating fitness policies, plans, and programs.[2]

As a result of this conference President Eisenhower created the President's Council of Youth Fitness on September 6th, 1956.

The results of the Kraus-Weber Tests and the findings of the National Conference on Youth Fitness prompted the American Association for Health, Physical Education and Recreation to hold a conference on September 12-13. It was decided at this conference that a meeting of selected members of the Research Council would be held in February 1957 in order to formulate a National Fitness Project. The initial planning of the AAHPER Test took place at this follow-up meeting in Chicago.
At the end of two day's discussion the test battery was decided upon. The test included items which attempted to judge the individual's proficiency i n running, throwing, endurance, strength, a g i l i t y and swimming. Specifically, the items were pull-ups for measuring arm-shoulder girdle strength; sit-ups for measuring strength and endurance of abdominal and hip flexor muscles; shuttle run for measuring speed, co-ordination and a g i l i t y , standing broad jump for measuring explosive power 2 "Report of the President's Conference on the Fitness of American Youth. June 1956 ... Highlights of Conference Findings and Recommendations," Journal of the American Association of Health, Physical Education and Recreation, Vol. 28 (March 1957J, p. 33. 15 of the leg extensors; 50-yard d a s h for measuring speed, S o f t b a l l throw for measuring s k i l l , strength a n d c o - o r d i n a t i o n ; and the 600-yard run-walk for measuring endurance. Shortly after the Chicago meeting a Physical Fitness Research Committee was formed with Paul Hunsicker as director. Under the direction of Dr. Hunsicker the AAHPER Test was administered to a total of 8,500 school children in grades 5 to 12. The results were analyzed and national norms were established. In addition, a test manual was prepared by the American Association for Health, Physical Education and Recreation describing the test battery and its administration. II. ADVANTAGES OF THE AAHPER TEST Hunsicker^ reports five reasons why items of the AAHPER Test were chosen by the Advisory Committee of the AAHPER Youth Fitness Testing Project. 1. The Tests are Reasonably Familiar The AAHPER Youth Fitness Test items have been used for some time and are familiar to, or readily understood by a l l physical education personnel. Even young boys and girls and those individuals without 3 Paul Hunsicker, "The Youth Fitness Project," Journal of the Canadian  Association for Health, Physical Education and Recreation, Vol. 
30 (February-March 1964J, pp. 15-16, 31. 16 specialized training i n physical education should have l i t t l e or no d i f f i c u l t y understanding the test items. 2. The Tests Require L i t t l e or No Equipment The equipment required to administer the AAHPER Test is of a low cost and is usually found i n most schools and recreation areas. 3. The Tests Can Be Administered to Both Sexes The test items can be administered to both boys and g i r l s ; however, the pull-up item is modified for g i r l s . 4. The Tests Can Be Administered to Grades 5 to 12 The tests can easily be given to children i n the age range o£ grades 5 to 12. 5. The Tests Measure Different Aspects of Fitness The test items measure the different components of fitness and many muscle groups of the body. III. FACTORS THAT INFLUENCE TEST RELIABILITY The r e l i a b i l i t y of a test is influenced by a wide variety of causes. The more common factors that affect test r e l i a b i l i t y are l i s t e d below: 17 1. Increasing Test Length Increasing the length of a test will increase reliability providing of course the group is the same and the new items are as good as those on the shorter test. 2. Repeating a Test Averaging the scores from several applications of a test or from parallel forms will increase reliability. 3. Variability of the Group The size of variance is important in determining reliability. A wide distribution of group scores is more likely to yield a higher reliability coefficient than a narrow restricted distribution. 4. Time In a test-retest situation the reliability is higher when the time interval between the two testing periods is short. 5. Item Difficulty Reliability is higher when test items are about f i f t y percent difficult. Reliability tends to decrease when the difficulty of the items departs from the f i f t y percent level. 6. Practice Any factor such as practice which influences the difficulty 18 value of the test items will also influence reliability. 7. 
Irregularities Irregular testing conditions which affect some scores more than others tend to cause lower reliability coefficients. 8. Item;.. Intercorrelations Reliability is highest where the items of the test a l l inter-correlate highly. 9. Range of Difficulty The more nearly equal are the difficulties of the test items, the higher the reliability. IV. METHODS OF ESTIMATING RELATIVE PRECISION (RELIABILITY) There are five procedures in use for computing the reliability of a test. These methods are as follows: 1. Alternate or Parallel Forms Method (Equivalent Forms) When duplicate forms of a test are administered to the same group of subjects, the correlation between Form A and Form B may be used as the reliability of the test. When developing two parallel forms of a test care must be taken to ensure similar test content and administration instructions. The parallel 19 forms method is the most commonly used estimate of reliability for written 4 standardized tests* 2. Test-Retest Method (Stability) When a test is administered to the same group of subjects on two occasions, the correlation between the scores earned on the two administrations may be used as the reliability of the test. The test-retest method is most often used when determining the reliability of s k i l l and physical performance tests. Thorndike states that "... for those types of tests in which sampling of items and memory of previous responses are not an issue and for which comparability of motivation seems likely, a second application of the same test at a later date, and correlation of the two sets of scores provides an adequate set of operations for reliability estimation." Garrett points out that "The test-retest method will estimate less accurately the reliability of a test which contains novel features and is highly susceptible to practice than i t will estimate the reliability of test scores which involve familiar and well-learned operations l i t t l e affected by practice. 
Owing to difficulties in controlling conditions which influence scores on retest, the test-retest method is generally less useful than other methods."⁶

4 Henry E. Garrett, Statistics in Psychology and Education, 5th ed., New York, Longmans, Green and Co., 1958, p. 339.
5 Robert L. Thorndike, "Reliability," Chapter 15 in Educational Measurement, E. F. Lindquist, ed., Washington, D. C., American Council on Education, 1961, pp. 578-579.

3. Split-Half Method (Internal Consistency)
In this method of determining reliability the test is divided into two equivalent halves. Scores from the two halves are then correlated. The correlation obtained, however, represents the reliability coefficient of only one half of the test, so to obtain the reliability of the whole test the Spearman-Brown Prophecy Formula must be used. This formula is:

$$r_{xx} = \frac{2 r_{hh}}{1 + r_{hh}}$$

where $r_{xx}$ is the reliability of the whole test and $r_{hh}$ is the reliability of a half test.

The question of how a test may be divided into two halves is an important one. The procedures include:

(a) Putting alternate items in each half test (odd-even split).
(b) Using the first half of the test as one half test and using the second half as the other (first versus second halves).
(c) Selecting items for the two half tests which are equivalent in content and difficulty.
(d) Putting alternate groups of test items in each half test.

The split-half method of determining reliability is used when it is not possible to repeat the test or to construct parallel forms of the test. One of the advantages of the split-half method is that all the data required for computing reliability can be obtained at one sitting.

6 Garrett, Statistics in Psychology and Education, p. 338.

4. Method of Rational Equivalence
The formulas used in this method of calculating reliability are somewhat different from those used in the calculation of reliability discussed so far.
The method of rational equivalence does not require the calculation of a correlation coefficient. This method determines the internal consistency of a test through an analysis of the individual test items. The formulas used to calculate this method of reliability are known as the Kuder-Richardson formulas.⁷ Garrett gives a simple approximation of one of the Kuder-Richardson formulas which is useful in determining reliability quickly. It reads:

$$r_{11} = \frac{N\sigma_t^2 - M(N - M)}{\sigma_t^2 (N - 1)}$$

Where:
$r_{11}$ = reliability of the whole test
$N$ = number of items on the test
$\sigma_t$ = standard deviation of the test scores
$M$ = the mean of the test scores

The Kuder-Richardson rational equivalence formulas give a lower estimate of test reliability than would be obtained by the other methods described. Thus, using this method of determining reliability eliminates the danger of making an overestimation.

7 Garrett, Statistics in Psychology and Education, pp. 340-342.

5. Analysis of Variance Methods
Hoyt⁸ has developed a formula for estimating test reliability by analysis of variance techniques. He showed that the reliability of a test could be estimated from the formula

$$r = \frac{\dfrac{S_{ind}}{K-1} - \dfrac{S_{err}}{(K-1)(N-1)}}{\dfrac{S_{ind}}{K-1}}$$

Where:
$S_{ind}/(K-1)$ = among individuals mean square (variance)
$S_{err}/[(K-1)(N-1)]$ = error mean square (variance)

Jackson and Ferguson⁹ applied analysis of variance techniques and the methods of testing statistical hypotheses to estimate the reliability of two forms of an intelligence test. To determine the relative accuracy they introduced a new statistic called the sensitivity coefficient. It is defined as the ratio of the standard deviation of the true scores to the standard deviation of the distribution of errors of measurement. In other words, the sensitivity coefficient expresses the differences between the individuals tested in terms of the errors of measurement of the test. If the sensitivity coefficient is small, then the errors of measurement will be large in comparison with the differences in individual ability tested, and hence the scores obtained by an individual will have a large error component.

To find a unique estimate of the sensitivity (relative accuracy) from an analysis of variance, proceed as follows:

(a) Subtract the error mean square from the between subjects mean square.
(b) Divide the resulting difference by twice the error mean square.
(c) Find the square root of the quotient as an estimate of the sensitivity.

Alexander,¹⁰ using the methods of estimation provided by analysis of variance, provides estimates of reliability for a block of data consisting of several trials by the same individuals. He also discussed the conditions under which each estimation of reliability is considered valid.

8 Cyril Hoyt, "Test Reliability Estimated by Analysis of Variance," Psychometrika, Vol. 6 (June 1941), pp. 153-160.
9 Robert W. B. Jackson and George A. Ferguson, Studies on the Reliability of Tests, University of Toronto, Toronto, Department of Educational Research, Bulletin No. 12, 1941, pp. 39-40.
10 Howard W. Alexander, "The Estimation of Reliability When Several Trials are Available," Psychometrika, Vol. 12 (June 1947), pp. 79-99.

V. METHODS OF ESTIMATING ABSOLUTE PRECISION (ERRORS OF MEASUREMENT)

Reliability can be expressed in either absolute or relative terms. This became evident when reviewing the various definitions of reliability. Absolute precision takes the form of the standard error of measurement. The standard error of measurement is the standard deviation of a distribution of scores all of which are estimates of the same true score. The standard error of measurement may be estimated by the standard correlation formula method or the analysis of variance technique.

1.
Standard Formula Method
The reliability coefficient by itself does not give an estimate of the absolute accuracy of our measurements. However, we can calculate the standard error of measurement using the reliability coefficient in the formula

$$SE_M = SD\sqrt{1 - r}$$

Where:
$SE_M$ = the standard error of measurement
$SD$ = the standard deviation of the test scores (test I)
$r$ = the reliability coefficient of test I

2. Analysis of Variance Technique
The standard error of measurement may also be calculated by analysis of variance by taking the square root of the error mean square. This gives a direct estimate of the absolute accuracy of our measurements.

VI. RELATED RELIABILITY STUDIES

A review of the literature was made to discover the reported reliability of the AAHPER Test items.

Barrow,¹¹ in an attempt to develop an easily administered test of general motor ability for college men, calculated the test-retest reliability of twenty-nine test items. He found the following test-retest reliabilities for items similar to those used on the AAHPER Test: standing broad jump .895, softball throw .928, and 60-yard dash .828.

Kane and Meredith,¹² in determining ability in the standing broad jump of elementary school children 7, 9, and 11 years of age, calculated reliability coefficients for boys and girls in each age group. Pearson product moment correlation coefficients were computed for the best record with the second best record of twelve trials. For all three ages and both sexes, the reliability coefficients approximated .980.

In 1952, McCraw and Tolbert¹³ conducted a study comparing different methods of scoring, such as one trial, average of three trials, median of three trials, best one of three trials, average of two trials and best one of two trials. Six tests of physical ability were administered to 128 junior high school boys. The tests of physical ability used were 50-yard dash, standing broad jump, softball throw, jump and reach, a wall volley and speed shooting. Subjects were allowed three trials for each test on each of two administrations. The following coefficients of correlation were reported.

Trials                                   50-Yard Dash   Standing Broad Jump   Softball Throw
1st Trial - 2nd Trial, First Adm.            .886              .818               .933
1st Trial - 3rd Trial, First Adm.            .845              .831               .914
2nd Trial - 3rd Trial, First Adm.            .905              .858               .908
1st Trial - 2nd Trial, Second Adm.           .855              .902               .930
1st Trial - 3rd Trial, Second Adm.           .882              .877               .920
2nd Trial - 3rd Trial, Second Adm.           .893              .913               .944
First Trial - First Trial                    .832              .797               .892
Average of Three - Average of Three          .931              .918               .940
Median of Three - Median of Three            .861              .899               .905
Best of Three - Best of Three                .876              .916               .927
Average of Two - Average of Two              .886              .882               .905
Best of Two - Best of Two                    .864              .845               .915

Brown¹⁴ revealed the reported range of reliability of twenty-eight selected test items including six which are similar to the items on the AAHPER Test. He also determined the reliability of the test items himself.

11 Harold M. Barrow, "Test of Motor Ability for College Men," AAHPER Research Quarterly, Vol. 25 (October 1954), pp. 253-260.
12 Robert J. Kane and Howard V. Meredith, "Ability in the Standing Broad Jump of Elementary School Children 7, 9, and 11 Years of Age," AAHPER Research Quarterly, Vol. 23 (May 1952), pp. 198-208.
13 L. W. McCraw and J. W. Tolbert, "A Comparison of the Reliabilities of Methods of Scoring Tests of Physical Ability," AAHPER Research Quarterly, Vol. 23 (March 1952), pp. 73-81.
14 Howard S. Brown, "A Comparative Study of Motor Fitness Tests," AAHPER Research Quarterly, Vol. 25 (March 1954), pp. 8-19.
Test Items               Range of Previously Reported Reliabilities   Reliability Found By Brown
Pull-Ups or Chins                      .86-.99                                  .915
60-Yard Dash                           .84-.97                                  .823
Standing Broad Jump                    .66-.98                                  .905
Softball Throw                         .89-.97                                  .916
Sit-Ups (2 minutes)                    .71-.99                                  .823
60-Yard Shuttle Run                    no report                                .819
Dodge Run                              .68-.97                                  .875

Willgoose, Askew, and Askew¹⁵ estimated the reliability of the 600-yard run-walk at the grade eight level. They tested seventy grade eight boys at an interval of one week under identical testing conditions. A rank order correlation coefficient of .92 was computed.

15 Carl E. Willgoose, Nathaniel R. Askew, and Mildred Askew, "Reliability of the 600-Yard Run-Walk at the Junior High School Level," AAHPER Research Quarterly, Vol. 32 (May 1961), pp. 264-266.

CHAPTER IV

METHODS AND PROCEDURE

This chapter presents a detailed description of the methods and procedures used in administering and analyzing the seven items of the AAHPER Test.

I. SUBJECTS

One hundred and twenty University of British Columbia first and second year male students registered in the Physical Education Research Programme in the Fall of 1961. These students agreed to become subjects for a number of physical fitness tests to be administered by fourth year students and graduate students in physical education. Registration in the Physical Education Research Programme enabled students to satisfy one-half of the Physical Education Service Programme requirement.

During the month of March, 1962, students who had enrolled in the Research Programme received a letter from Dr. Stanley R. Brown, Associate Professor and Director of Research, informing them of the testing procedure (see Appendix A). Each subject was required to attend two testing sessions a week for four weeks. Testing began on Monday, March 12th at the University of British Columbia. Indoor sessions were held on Mondays and Wednesdays in the University of British Columbia War Memorial Gymnasium.
Outdoor sessions were held Wednesdays and Fridays on the Varsity Stadium playing field.

One hundred and twenty letters were sent to students originally enrolled in the Research Programme; however, only seventy-nine of these one hundred and twenty students began the testing programme. At the end of the fourth week fifty-seven subjects had completed the testing programme.

II. TEST ADMINISTRATION

Before the first testing session each subject was given a summary of the history and use of the AAHPER Test (see Appendix B) and a summary of test administration instructions (see Appendix C).

All subjects made four complete runs through the seven item American Association for Health, Physical Education and Recreation Youth Fitness Test. The seven items administered were pull-ups, sit-ups, standing broad jump, shuttle run, 50-yard dash, softball throw and 600-yard run-walk. Pull-ups, sit-ups, standing broad jump and shuttle run items were done indoors, and 50-yard dash, softball throw, and 600-yard run-walk items were done out-of-doors. Sit-ups were done on a gymnasium wood floor, the 50-yard dash and softball throw were done on a grass football field, and the 600-yard run-walk was conducted on a 440-yard cinder track.

Subjects were given a demonstration by their instructors of the proper method of performing the test items. They were given similar motivation throughout the experiment and were tested by the same group of testers each week. The testing instructions specified were carefully followed in all cases. For the purpose of this study it was necessary to emphasize before testing began that no warm-up or practice was allowed.

Each participant was given an AAHPER Youth Fitness Test score card for recording results (see Appendix D). These cards were to be filled out after the completion of each test item and carried to the next testing station.
Subjects were instructed to follow the pre-established order of performance until they completed the test items. Students worked in pairs to observe and record each other's performance under supervision.

Weather conditions were uniformly good during the first, third and fourth trials, but during the second outdoor testing period there was a continuous light drizzle with no wind. As it was not possible to postpone the outdoor portion of the second testing period, it was considered worthwhile to record the results to see what happens to performance and reliability measurements under conditions which are common to the west coast of British Columbia.

III. TESTING TEAM

The testing team was made up of graduate students and qualified physical education instructors. All testers took part in a training session before the first trial was administered. At this time, specific instructions were given, testing procedures were demonstrated and a practice session was conducted. All members of the testing team were familiar with the testing routine before the testing programme began.

IV. EQUIPMENT AND FACILITIES

The following equipment and facilities were used for the administration of the test.

1. Gymnasium area
2. Outdoor playing field
3. High bar
4. Two gymnasium mats
5. Four small blocks 2" x 2" x 4"
6. Two tape measures
7. Two stop watches
8. Six Voit rubber softballs
9. Football yard markers, one to three hundred yards.

V. STATISTICAL TREATMENT OF DATA

Four sets of data for each of the AAHPER Test items were gathered by the investigator. These data were punched on I.B.M. cards and were analyzed to obtain (1) test item means and standard deviations on each of four trials, (2) an intercorrelation matrix and (3) three F ratios (analysis of variance) for each test item.

Average Reliability Coefficients
In order to calculate an average reliability coefficient for each variable, the between trials reliability coefficients obtained from the intercorrelation matrix were transformed into Fisher's Z function, summed and divided by their number to obtain the arithmetic mean of the Z's. The mean Z was then converted into an equivalent reliability coefficient.

Standard Errors of Measurement
Standard errors of measurement (absolute accuracy) for each variable were calculated by the analysis of variance technique and the standard correlation formula method. The standard errors of measurement using the analysis of variance technique were calculated for each variable by taking the square root of the error mean square. The standard errors of measurement for each variable using the standard correlation method were calculated by incorporating the average standard deviation and the average reliability coefficient into the standard error of measurement formula

$$SE_M = SD_A\sqrt{1 - r_A}$$

(where A = average). For the three variables (50-yard dash, softball throw and 600-yard run-walk) measured out-of-doors under wet conditions during the second testing period, standard errors of measurement were calculated with trial two data eliminated.
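The two calculations described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the program used in the study: the function names are hypothetical, and the pull-up figures are copied from Table I of this study instead of being recomputed from raw scores.

```python
import math

def average_reliability(coefficients):
    """Average correlation coefficients by way of Fisher's Z transformation."""
    z_values = [math.atanh(r) for r in coefficients]  # each r -> Z
    mean_z = sum(z_values) / len(z_values)             # arithmetic mean of the Z's
    return math.tanh(mean_z)                           # mean Z -> equivalent r

def standard_error_of_measurement(sd_average, r_average):
    """Standard correlation formula: SE_M = SD_A * sqrt(1 - r_A)."""
    return sd_average * math.sqrt(1.0 - r_average)

# Between-trials coefficients and average standard deviation for pull-ups (Table I).
pull_up_coefficients = [0.917, 0.938, 0.928, 0.945, 0.932, 0.960]
pull_up_sd_average = 3.196

r_average = average_reliability(pull_up_coefficients)
se_m = standard_error_of_measurement(pull_up_sd_average, r_average)
print(f"average reliability = {r_average:.3f}")
print(f"standard error of measurement = {se_m:.4f}")
```

With these inputs the average coefficient rounds to the .938 reported for pull-ups in Table I, and the standard error falls close to the 0.794 reported in Table II. Averaging through Z rather than averaging the coefficients directly is the procedure the text describes; the two approaches differ most when coefficients are high and unequal.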
Analysis of Variance
To determine the usefulness of the AAHPER Test items as measuring instruments, the testing results for each test item were analyzed by analysis of variance techniques advocated by Jackson and Ferguson¹ and Garrett.² The data from each of the outdoor test items (50-yard dash, softball throw and 600-yard run-walk) were analyzed by analysis of variance with trial two data eliminated. This procedure was followed because the means of the outdoor items were adversely affected by wet weather conditions which prevailed during the second outdoor testing session.

1 Robert W. B. Jackson and George A. Ferguson, Studies on the Reliability of Tests, University of Toronto, Toronto, Department of Educational Research, Bulletin No. 12, 1941, pp. 31-38.
2 Henry E. Garrett, Statistics in Psychology and Education, 5th ed., New York, Longmans, Green and Co., 1958, p. 295.

The analysis of variance methods used in this study presuppose that the subjects tested are not all of the same ability and that subject scores, on the average, progressively increase from trial to trial. From the data collected for each test item it was quite obvious that the subjects varied considerably in ability. In most cases, it also appeared that, on the average, subject scores improved from trial to trial. Where this apparent improvement was tested statistically and found to be significant, it could properly be described as practice effect.

Analysis of variance techniques were used in this study for treating three different problems: (1) the determination of a significant practice effect, (2) the determination of whether or not the test items measure differences between subjects tested, and (3) the determination of whether or not subject differences are significantly greater than practice differences.
In order to investigate these problems it was necessary to obtain a measure of the amount that the practice effect, the differences between subjects, and error contributed to the total variance. The total variance was divided into components which were assigned to practice effect (trials), differences between subjects (individuals), and error (residual). This provided a means by which to test the significance of the practice effect and the differences between subjects, and to determine whether or not subject differences are significantly greater than practice differences.

Practice Effect (Differences Between Trials)
In order to determine whether or not there was a significant practice effect for each test item, it was necessary to calculate an F ratio for trials. The F ratio for trials was calculated by dividing the trials mean square (variance) by the error mean square (variance). If the practice effect is not significant (no significant differences between the means), the between trials mean square will be of the same order as the error mean square. However, if the between trials mean square is significantly greater than the error mean square, then it may be concluded that the practice effect is significant.

Differences Between Subjects
In order to determine if a test item measures with an accuracy sufficient to distinguish between subjects tested, it was necessary to calculate an F ratio for subjects. The F ratio for subjects was calculated by dividing the subjects mean square (variance) by the error mean square (variance). In most groups individuals will not be of the same ability in any test of motor performance. If the accuracy with which any test measures performance is not sufficient to distinguish between the abilities of the subjects tested, then the differences between the scores made by the subjects will be small and due entirely to errors of measurement.
This means that the between subjects mean square will not be significantly larger than the error mean square. However, if the between subjects mean square is significantly larger than the error mean square, it may be assumed that the differences between subject scores are too large to be caused solely by errors of measurement. Thus, it may be concluded that the test measures performance with an accuracy sufficient to distinguish between the abilities of the subjects tested. In other words, the test can be relied upon to produce consistent results.

Subject Differences Versus Practice Differences
Through a further analysis of variance it is possible to determine whether or not subject differences (differences between subjects) are significantly greater than practice differences (differences between trials). This method of analysis was used to calculate a third F ratio for each variable. This ratio was obtained by dividing the subjects mean square (variance) by the trials mean square (variance). If the subjects mean square is not significantly larger than the trials mean square, it may be concluded that subject differences are no larger than practice differences. Subject differences are, however, usually much larger than practice differences, but when they are not, it may be assumed that there has been a practice effect.

CHAPTER V

RESULTS AND DISCUSSION

The results given in this study refer to the scores made by fifty-seven subjects on four repetitions of the seven item AAHPER Youth Fitness Test. Results are presented and discussed in terms of reliability, errors of measurement and analysis of variance (F ratios).

I. RELIABILITY

In experimental studies employing motor fitness (physical performance) items, it is desirable to repeat pre-training trials until satisfactory reliability standards are met.
This seems necessary whether control groups are used or are not used, since failure to show differences between experimental and control groups may be a result of data being unreliable. Also there is the practical question of how much improvement might have resulted solely from practice of the test item. If practice alone can produce changes comparable with training, what real knowledge has been gained about the value of the training method in improving fitness?

Strictly reliable motor fitness data show (a) relatively high correlation coefficients, (b) means of similar size, (c) standard deviations of comparable size, (d) differences between paired scores which have the appearance of being random errors and (e) small standard errors of measurement.

Test item performance means and standard deviations for the four trials are presented in Table I together with between trials reliability coefficients and an average reliability coefficient.

TABLE I

AAHPER YOUTH FITNESS TEST DATA FROM FOUR TRIALS (N = 57)

Variable and Trials             Mean Scores   Standard Deviations   Between Trials   Test-Retest Reliability Coefficients

1. Pull-Ups (No.)                                                       1 v 2              .917
   1                               6.245            3.465               1 v 3              .938
   2                               6.105            3.216               1 v 4              .928
   3                               6.403            2.939               2 v 3              .945
   4                               6.614            3.143               2 v 4              .932
                                                                        3 v 4              .960
   Average                         6.342            3.196                                  .938

2. Sit-Ups (No.)                                                        1 v 2              .825
   1                              31.438           13.779               1 v 3              .853
   2                              31.807           15.370               1 v 4              .757
   3                              34.543           15.025               2 v 3              .943
   4                              39.596           21.543               2 v 4              .829
                                                                        3 v 4              .890
   Average                        34.346           16.703                                  .861

3. Standing Broad Jump (Ins.)                                           1 v 2              .823
   1                              84.508            9.300               1 v 3              .822
   2                              86.543           10.390               1 v 4              .820
   3                              87.333            9.474               2 v 3              .937
   4                              87.877            9.939               2 v 4              .941
                                                                        3 v 4              .951
   Average                        86.565            9.785                                  .899

4. Shuttle Run (Secs.)                                                  1 v 2              .761
   1                               9.696             .557               1 v 3              .686
   2                               9.484             .449               1 v 4              .721
   3                               9.464             .469               2 v 3              .806
   4                               9.292             .439               2 v 4              .878
                                                                        3 v 4              .754
   Average                         9.484             .481                                  .776

5. 50-Yard Dash (Secs.)                                                 1 v 2              .797
   1                               6.812             .437               1 v 3              .804
   a2                              6.899             .428               1 v 4              .762
   3                               6.712             .427               2 v 3              .770
   4                               6.750             .436               2 v 4              .755
                                                                        3 v 4              .850
   Average                         6.793             .432                                  .792

6. Softball Throw (Ft.)                                                 1 v 2              .960
   1                             164.917           34.538               1 v 3              .931
   a2                            157.631           32.717               1 v 4              .938
   3                             165.157           35.735               2 v 3              .935
   4                             163.631           36.004               2 v 4              .938
                                                                        3 v 4              .933
   Average                       162.833           34.772                                  .940

7. 600-Yard Run-Walk (Secs.)                                            1 v 2              .764
   1                             110.438            9.888               1 v 3              .659
   a2                            111.561           11.946               1 v 4              .860
   3                             107.175           11.138               2 v 3              .656
   4                             106.350           10.797               2 v 4              .803
                                                                        3 v 4              .814
   Average                       108.881           10.967                                  .759

a Trial two scores were adversely affected by wet weather conditions.

Pull-Ups
For the group tested, improvement in mean pull-up performance over four trials was approximately one-half pull-up. The between trials reliability coefficients showed a progressive improvement from .917 to .960. The variability remained relatively constant. It would appear, therefore, from these data that only one or two practice trials are necessary before test results can be used in experimental studies.

Sit-Ups
Mean sit-up performance increased from 31.438 to 39.596 over four trials. This was an increase of approximately eight sit-ups. The standard deviations increased from 13.779 to 21.543. The between trials reliability coefficients varied considerably with no particular pattern evident.
The large improvement in mean performance and the increased variance over four trials suggest that four or more trials are required before reliable results can be used in experimental studies. This may not be a suitable item for motor fitness tests since motivation and learning appear to have a marked influence on performance.

Standing Broad Jump
The standing broad jump variable showed an increase in mean performance of 3.369 inches. The variability remained relatively constant and the between trials reliability coefficients increased from .823 to .951. It is obvious from these data that good reliability can only be achieved after three or four practice trials.

Shuttle Run
There was considerable reduction in shuttle run mean performance times and variabilities over four trials, and one can only speculate how much more improvement might have resulted from further practice. It is obvious this item needs plenty of practice before the skills involved in changing direction and picking up and placing blocks can be learned adequately.

50-Yard Dash
Trial two scores were obviously affected by wet weather conditions; however, the means and standard deviations of the other trials remained relatively constant. These results indicate that this item requires little learning and few practice trials are necessary before reliable results can be achieved. The event is much simpler than the shuttle run and much less learning is involved.

Softball Throw
Poor weather conditions caused a reduction in the mean and standard deviation of trial two scores.
The event was otherwise highly reliable as subjects showed very little improvement in performance over four trials. From these results it is reasonable to assume that trial one scores are sufficiently reliable for experimental purposes.

600-Yard Run-Walk
The small improvement in mean performance (.820 seconds) and the reasonably large reliability coefficient (.814) between trials three and four indicate stable results and suggest that at least two preliminary practice trials are necessary before recording representative times.

II. STANDARD ERRORS OF MEASUREMENT

Table II contains the standard errors of measurement of the test items as calculated by the standard correlation formula method and the analysis of variance technique.

TABLE II

AAHPER YOUTH FITNESS TEST STANDARD ERRORS OF MEASUREMENT

Variables                     Units    Correlation Formula Method   Analysis of Variance Technique
 1  Pull-Ups                  No.              0.794                        0.834
 2  Sit-Ups                   No.              6.250                        6.934
 3  Standing Broad Jump       Ins.             3.124                        3.353
 4  Shuttle Run               Secs.            0.227                        0.239
a5  50-Yard Dash              Secs.            0.194                        0.190
a6  Softball Throw            Ft.              9.100                        9.170
a7  600-Yard Run-Walk         Secs.            5.000                        5.660

a Standard errors of measurement were calculated with trial two data eliminated.

The standard error of measurement indicates the amount of variation one could expect in a person's score if he were examined repeatedly with the same test. When interpreting a person's score it is advisable to think in terms of the standard error of measurement and to avoid making any definite conclusions concerning that score. It is important to understand that scores obtained for an individual are not true scores but only estimates of the individual's true score. The standard error of measurement gives us a clear idea of how accurate our measurements are and how much confidence we can place on getting a reasonably accurate measurement.
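One way to act on this advice is to report a score as a band rather than as a single point. The sketch below is illustrative only: the function name is hypothetical, the observed score of 6 pull-ups is invented, and the choice of one standard error on either side is an assumption, not a rule given in the text. The standard error used is the pull-up value from the correlation formula column of Table II.

```python
def score_band(observed_score, se_m, width=1.0):
    """Return the (low, high) limits of observed_score +/- width * SE_M."""
    margin = width * se_m
    return observed_score - margin, observed_score + margin

# Hypothetical subject who performs 6 pull-ups; SE_M = 0.794 (Table II).
low, high = score_band(6, 0.794)
print(f"pull-up score band: {low:.3f} to {high:.3f}")
```

Read this way, a recorded score of 6 pull-ups is better treated as "roughly 5 to 7" than as an exact figure. If errors of measurement are assumed to be normally distributed, which the text does not state, a band of one standard error would be expected to contain about two-thirds of repeated measurements.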
In the present study the standard errors of measurement calculated by the standard correlation formula method compared closely with those calculated by the analysis of variance technique. The differences between errors of measurement in Table II are probably the result of rounding errors. A choice between the two methods of determining the standard error of measurement will depend to a large extent upon the nature of the problem under consideration.

III. ANALYSIS OF VARIANCE

Each variable was treated by analysis of variance to answer the following questions:

1. Is the practice effect significant?
2. Does the test item measure with an accuracy sufficient to distinguish between the subjects tested?
3. Are subject differences significantly greater than practice differences?

Table III contains an analysis of variance for each variable. Three F ratios have been calculated for each variable (F1, F2, F3). F1 refers to question one, F2 refers to question two, and F3 refers to question three.

TABLE III
AAHPER YOUTH FITNESS TEST ANALYSIS OF VARIANCE DATA

F1 = V. Trials / V. Error;  F2 = V. Subjects / V. Error;  F3 = V. Subjects / V. Trials

Variable and Source                      DF    Sum of Squares    Mean Square (Variance)

1. Pull-Ups
   Between Trials (Practice Effect)       3          8.15789           2.71929
   Between Subjects (Individuals)        56       2172.31000          38.79130
   Error (Residual)                     168        116.84200           0.69549
   Total                                227       2297.31000
   F1 = 3.91 (a)    F2 = 55.77 (c)    F3 = 14.26 (d)

2. Sit-Ups
   Between Trials (Practice Effect)       3       2605.10            868.36
   Between Subjects (Individuals)        56      53988.60            964.08
   Error (Residual)                     168       8077.90             48.08
   Total                                227      64671.60
   F1 = 18.06 (a)   F2 = 20.05 (c)    F3 = 1.11 (d)

3. Standing Broad Jump
   Between Trials (Practice Effect)       3        372.820           124.2730
   Between Subjects (Individuals)        56      19561.200           349.3080
   Error (Residual)                     168       1887.930            11.2376
   Total                                227      21822.000
   F1 = 11.05 (a)   F2 = 31.08 (c)    F3 = 2.81 (d)

4. Shuttle Run
   Between Trials (Practice Effect)       3          4.67416           1.5580500
   Between Subjects (Individuals)        56         42.31370           0.7556030
   Error (Residual)                     168          9.60834           0.0571925
   Total                                227         56.59620
   F1 = 27.24 (a)   F2 = 13.21 (c)    F3 = 0.48 (d)

5. 50-Yard Dash
   Between Trials (Practice Effect)       2          0.29              0.150
   Between Subjects (Individuals)        56         27.54              0.490
   Error (Residual)                     112          4.11              0.036
   Total                                170         31.94
   F1 = 4.16 (a)    F2 = 13.61 (c)    F3 = 3.27 (d)

6. Softball Throw
   Between Trials (Practice Effect)       2        117.93             58.965
   Between Subjects (Individuals)        56     202402.28           3614.330
   Error (Residual)                     112       9420.74             84.110
   Total                                170     211940.95
   F1 = 0.701 (b)   F2 = 42.97 (c)    F3 = 61.29 (d)

7. 600-Yard Run-Walk
   Between Trials (Practice Effect)       2        532.71            266.35
   Between Subjects (Individuals)        56      16115.98            287.79
   Error (Residual)                     112       3589.29             32.05
   Total                                170      20237.98
   F1 = 8.31 (a)    F2 = 8.97 (c)     F3 = 1.08 (d)

(a) Significant at the 0.05 level of confidence.
(b) Not significant at the 0.05 level of confidence.
(c) Significant at the 0.01 level of confidence.
(d) Not significant at the 0.01 level of confidence.

NOTE: F ratios for items 5, 6 and 7 were calculated with trial two data eliminated.
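The quantities in Table III are those of a subjects-by-trials analysis of variance with one score per cell. The Python sketch below shows one way such a decomposition and the three F ratios could be computed. It is an illustration under that assumption, not the procedure actually used in the study; the names `AnovaTable`, `subjects_by_trials_anova` and `f_ratios` and the three-subject data set are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnovaTable:
    """Mean squares and degrees of freedom for a subjects-by-trials layout."""
    ms_trials: float
    ms_subjects: float
    ms_error: float
    df_trials: int
    df_subjects: int
    df_error: int


def f_ratios(ms_trials: float, ms_subjects: float, ms_error: float) -> tuple[float, float, float]:
    """Return (F1, F2, F3): trials/error, subjects/error, subjects/trials."""
    return (ms_trials / ms_error, ms_subjects / ms_error, ms_subjects / ms_trials)


def subjects_by_trials_anova(scores: list[list[float]]) -> AnovaTable:
    """Decompose a table of scores (one row per subject, one column per trial)."""
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of trials
    if any(len(row) != k for row in scores):
        raise ValueError("every subject needs a score on every trial")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    trial_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_trials = n * sum((m - grand_mean) ** 2 for m in trial_means)
    ss_error = ss_total - ss_subjects - ss_trials  # residual

    df_trials, df_subjects = k - 1, n - 1
    df_error = df_trials * df_subjects
    return AnovaTable(
        ms_trials=ss_trials / df_trials,
        ms_subjects=ss_subjects / df_subjects,
        ms_error=ss_error / df_error,
        df_trials=df_trials,
        df_subjects=df_subjects,
        df_error=df_error,
    )


if __name__ == "__main__":
    # Hypothetical scores: three subjects, three trials each.
    toy_scores = [[1, 2, 3], [3, 5, 4], [5, 8, 5]]
    table = subjects_by_trials_anova(toy_scores)
    print(table)
    print(f_ratios(table.ms_trials, table.ms_subjects, table.ms_error))
    # Mean squares reported for pull-ups in Table III.
    print(f_ratios(2.71929, 38.79130, 0.69549))
```

With 57 subjects and 4 trials the same degrees-of-freedom rule gives 3, 56 and 168, and with 3 trials it gives 2, 56 and 112, matching the DF column of Table III.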
Practice Effect (F1)

If a true hypothesis that there was no practice effect over four trials was rejected, there is an error of the first kind (alpha error). If a false hypothesis that there was no practice effect over four trials was accepted, there would be an error of the second kind (beta error). Logic and experience suggest that there would be practice effect over four trials in most motor fitness tests performed by untrained subjects. It is considered conservative procedure in motor fitness testing to train subjects in test performance before recording results. One would be reluctant to make the error that there was no practice effect over four trials if in truth this effect did occur. One would not be so reluctant to make the error that there was practice effect over four trials if in truth there was no real practice effect. For the above reasons a level of confidence of 0.05 rather than 0.01 was set for F tests of practice effect.

Analysis of variance results indicate a significant practice effect for all test items except the softball throw at the 0.05 level of confidence.
The value of F for the indoor testing items (pull-ups, sit-ups, standing broad jump and shuttle run) from the F Table with 3 and 168 degrees of freedom at the 5 percent level of confidence is 2.66. The value of F for the outdoor testing items (50-yard dash, softball throw and 600-yard run-walk) from the F Table with 2 and 112 degrees of freedom at the 5 percent level of confidence is 3.08. The values of F1 (practice effect) obtained for each variable were pull-ups 3.91, sit-ups 18.06, standing broad jump 11.05, shuttle run 27.24, 50-yard dash 4.16, softball throw 0.701, and 600-yard run-walk 8.31.

As the value of F1 (0.701) obtained for the softball throw is smaller than the value of F (3.08) from the F Table at the 5 percent level of confidence with 2 and 112 degrees of freedom, it can be concluded that the practice effect is not significant. Since the values of F1 obtained for the indoor test items are larger than the value of F (2.66) at the 5 percent level of confidence with 3 and 168 degrees of freedom and since the values of F1 obtained for the 50-yard dash and the 600-yard run-walk are larger than the value of F (3.08) at the 5 percent level of confidence with 2 and 112 degrees of freedom, it can be concluded that the practice effect is significant for all test items except the softball throw. The existence of practice effect is of considerable importance and should be considered in all research studies.
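The comparison just described amounts to checking each obtained F1 against the tabled value for its degrees of freedom. A minimal Python sketch of that check follows, using only the figures quoted in the text. The names are illustrative, and the critical values are copied from the thesis rather than computed from the F distribution.

```python
# Critical F values at the 5 percent level, as quoted in the text:
# indoor items use 3 and 168 degrees of freedom, outdoor items 2 and 112.
CRITICAL_F = {"indoor": 2.66, "outdoor": 3.08}

# Obtained F1 (practice effect) values, as quoted in the text.
F1_OBTAINED = {
    "Pull-Ups": ("indoor", 3.91),
    "Sit-Ups": ("indoor", 18.06),
    "Standing Broad Jump": ("indoor", 11.05),
    "Shuttle Run": ("indoor", 27.24),
    "50-Yard Dash": ("outdoor", 4.16),
    "Softball Throw": ("outdoor", 0.701),
    "600-Yard Run-Walk": ("outdoor", 8.31),
}


def practice_effect_significant(group: str, f_obtained: float) -> bool:
    """True when the obtained F exceeds the tabled value for its group."""
    return f_obtained > CRITICAL_F[group]


def items_without_significant_practice_effect() -> list[str]:
    """List the items whose F1 does not exceed the tabled value."""
    return [
        item
        for item, (group, f_obtained) in F1_OBTAINED.items()
        if not practice_effect_significant(group, f_obtained)
    ]


if __name__ == "__main__":
    for item, (group, f_obtained) in F1_OBTAINED.items():
        verdict = "significant" if practice_effect_significant(group, f_obtained) else "not significant"
        print(f"{item:22s} F1 = {f_obtained:6.3f}  ({verdict})")
```

Run against these figures, the only item returned as not significant is the softball throw, in agreement with the conclusion stated above.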
The significant practice effect in this study indicates that much learning has taken place and that all test items, with the possible exception of the softball throw, should be practiced before stable results can be recorded and interpreted.

Differences Between Subjects (F2)

A level of confidence of 0.01 was set for the between subjects F ratios. A motor performance test item should be able to show consistent discrimination between individual differences in ability from trial to trial and it should do this despite random fluctuation in scores due to errors of measurement. The high level of confidence of 0.01 was chosen so that there should be as little doubt as possible about the accuracy of the test item in distinguishing between the abilities of the subjects tested.

Analysis of variance results indicate that the test items measure with an accuracy sufficient to distinguish between the subjects tested. The value of F for the indoor test items (pull-ups, sit-ups, standing broad jump and shuttle run) from the F Table with 56 and 168 degrees of freedom at the 1 percent level of confidence is 1.62. The value of F for the outdoor test items (50-yard dash, softball throw, and 600-yard run-walk) from the F Table with 56 and 112 degrees of freedom at the 1 percent level of confidence is 1.69. The values of F2 (subjects) obtained for each variable from the study were pull-ups 55.77, sit-ups 20.05, standing broad jump 31.08, shuttle run 13.21, 50-yard dash 13.61, softball throw 42.97, and 600-yard run-walk 8.97.
Since the values of F2 obtained for the indoor test items are larger than the value of F (1.62) at the 1 percent level of confidence with 56 and 168 degrees of freedom and since the values of F2 obtained for the outdoor items are larger than the value of F (1.69) at the 1 percent level of confidence with 56 and 112 degrees of freedom, it can be concluded that for all items the between subjects mean square (variance) is significantly larger than the error mean square (variance). Since the subject mean square for each test item was significantly larger than the error mean square, it can be assumed that the differences between subject scores were so large that they could not be due entirely to errors of measurement. Thus, it can be further concluded that the test items measure with an accuracy sufficient to distinguish between the abilities of the subjects tested.

Subject Differences Versus Practice Differences (F3)

The 0.01 level of confidence was chosen for the F ratio of subject variance over practice (between trials) variance, in order to guard against the possibility of rejecting a true hypothesis that practice effect was severe. Normally subject variance is much larger than practice variance in repeated trials of motor performance tests. A non-significant F ratio would indicate that this was not so and that the practice effect is severe. If practice effect is severe it would be desirable to show this as a warning to test administrators that adequate preliminary practice is needed before performances could be considered sufficiently stable for use.

Analysis of variance results indicate that subject differences are not significantly greater than practice differences. The value of F for the indoor test items (pull-ups, sit-ups, standing broad jump and shuttle run) from the F Table with 56 and 3 degrees of freedom at the 1 percent level of confidence is 26.32.
The value of F for the outdoor items (50-yard dash, softball throw and 600-yard run-walk) from the F Table with 56 and 2 degrees of freedom at the 1 percent level of confidence is 99.49. The values of F3 (subject variance over practice variance) obtained for each variable from the study were pull-ups 14.26, sit-ups 1.11, standing broad jump 2.81, shuttle run .48, 50-yard dash 3.27, softball throw 61.29, and 600-yard run-walk 1.08. Since the values of F3 obtained for the indoor test items are smaller than the value of F (26.32) at the 1 percent level of confidence with 56 and 3 degrees of freedom and since the values of F3 obtained for the outdoor items are smaller than the value of F (99.49) at the 1 percent level of confidence with 56 and 2 degrees of freedom, it can be concluded that for all test items subject differences are not significantly larger than practice differences (differences between trials).

As the between subjects mean square (variance) is not significantly larger than the between trials mean square (variance), and since it usually is, the implication is that the practice effect as a result of learning and other factors must have been severe. This fact is verified by the significant between trials (practice effect) F ratios for all items except the softball throw. The nonsignificant F3 ratio (between subject variance over between trials variance) for the softball throw indicates some practice has occurred.
However, since the size of this ratio closely approaches the one percent level of confidence, it can be regarded as almost significant. Hence, the amount of practice that has occurred is very small and certainly not large enough to show significant differences between trial means.

CHAPTER VI

SUMMARY AND CONCLUSIONS

The object of this study was to determine by a reliability analysis if the American Association for Health, Physical Education and Recreation Youth Fitness Test items are accurate and reliable measuring instruments.

This study was made possible through the use of fifty-seven untrained male first and second year students in the Required Physical Education Programme at the University of British Columbia who were tested once a week for four consecutive weeks with the AAHPER Test.

The data from each test item were analyzed in order to obtain (a) means and standard deviations on each of four trials, (b) between trials correlation coefficients and an average reliability coefficient, (c) standard errors of measurement computed by the standard correlation formula method and the analysis of variance technique, and (d) three F ratios (analysis of variance).

The findings of this study based on the reliability analysis of the data collected may be summarized as follows:

1. Test items with average test-retest reliability coefficients over four trials of .85 or better were pull-ups (.938), sit-ups (.861), standing broad jump (.899), and softball throw (.940). Items with the poorest average test-retest reliability coefficients over four trials were shuttle run (.776), 50-yard dash (.792), and the 600-yard run-walk (.759).

2. The best between trial reliability coefficient for each variable was pull-ups (T3-T4 = .960), sit-ups (T[?]-T[?] = .943), standing broad jump (T[?]-T4 = .951), shuttle run (T[?]-T4 = .878), 50-yard dash (T3-T4 = .850), softball throw (T1-T[?] = .960), and 600-yard run-walk (T[?]-T[?] = .860).

3. The standard errors of measurement of the test items computed by the standard correlation formula method compared closely with those computed by the analysis of variance technique.

4. The standard errors of measurement computed by the standard correlation formula method and the analysis of variance technique were pull-ups (correlation formula method 0.794 and analysis of variance technique 0.834), sit-ups (6.250 and 6.934), standing broad jump (3.124 and 3.353 inches), shuttle run (0.227 and 0.239 seconds), 50-yard dash (0.194 and 0.190 seconds), softball throw (9.100 and 9.170 feet), and 600-yard run-walk (5.000 and 5.660 seconds).

5. Analysis of variance results showed a significant practice effect over four trials for all test items except the softball throw.

6. Analysis of variance results showed that the AAHPER Test items measure with an accuracy sufficient to distinguish between the subjects tested.

7. Analysis of variance results showed that for each test item subject differences are not significantly larger than practice differences. Since subject differences are usually much larger than practice differences, it can be concluded that practice effect in this study must have been severe.

In accordance with the above findings the following recommendations are made to anyone contemplating administering the AAHPER Test.

1. The pull-ups and softball throw variables appear to be highly reliable measuring instruments. Thus for these items it seems reasonable to accept first trial scores as sufficiently accurate for both survey and experimental purposes.

2.
The standing broad jump, 50-yard dash and 600-yard run-walk items are apparently not as reliable as the pull-ups and softball throw items, and several practice trials should be given before results can be accepted as reasonably accurate estimates of true individual performance.

3. The sit-ups and shuttle run test items are apparently the least reliable items of the AAHPER Test. These items seem to require at least four, perhaps more, preliminary practice trials before a satisfactory level of reliability can be attained.

APPENDIX A

AAHPER RESEARCH PROGRAMME

THE UNIVERSITY OF BRITISH COLUMBIA
School of Physical Education and Recreation
Required Physical Education Program, P. E. 64

March 5, 1962

During the final weeks of this term the School of Physical Education and Recreation will be administering physical fitness tests several times to students enrolled in P. E. 64. One test is that of the American Association for Health, Physical Education and Recreation which has been widely used in the schools of the U.S. and other countries in recent years. It has also been used in universities and scoring scales for university students are available.

The items of this test are pull-ups on horizontal bar, sit-ups (up to 100), standing broad jump, shuttle run (4 x 30 feet) - all done indoors, and 50-yard sprint, softball throw, 600-yard run - all done outdoors.
This test will be administered four times. The indoor part will be administered once during the first part of each week and the outdoor part will be administered once in the second part of each week. You have been scheduled to attend testing sessions twice a week until the end of term. Testing begins Monday 12th March.

It is absolutely necessary that you do not miss a session unless absence is completely unavoidable. Absences must be explained in writing or an excuse will not be recorded. If a scheduled session is missed for a very special reason, it will be possible to "make it up" in the same week. "Make-up" times are as follows:

Indoor part - Wednesdays, from 12:30 to 1:30 or 4:30 to 6:30 p.m.
Outdoor part - Fridays, from 4:30 to 6:30 p.m.

If you are scheduled to attend Saturday but will have to miss, you must attend the day before - i.e. on Friday at 4:30.

Be certain to change into shorts, shirt and tennis or basketball shoes for every session. It is advisable also to have a sweat suit for the outdoor sessions. If you forget to bring equipment, attendance is still necessary.

On arrival at the War Memorial Gymnasium on testing days, consult the blackboard in the foyer for directions about what to do or where to go when changed. Any cancellations will be posted on the blackboard.

Unfortunately, the weather has not permitted testing to be carried out earlier in the term. Your consistent attendance and co-operation during the next few weeks will be greatly appreciated and will ensure that your physical education requirement will be completed satisfactorily.

You have been scheduled to attend testing sessions at the War Memorial Gymnasium during the weeks beginning Monday March 12th, 19th, 26th, and April 2nd, on

Mondays at ________        Fridays at ________
Wednesdays at ________     Saturdays at ________

Stanley R. Brown,
Assistant Professor

APPENDIX B

HISTORY AND USE OF THE AAHPER TEST

THE UNIVERSITY OF BRITISH COLUMBIA
School of Physical Education and Recreation

The AAHPER Physical (Motor) Fitness Test

The AAHPER Test is being widely used in schools and universities of the North American Continent and Overseas. It came into existence to satisfy the need for a suitable test to be used in conjunction with the "fitness movement" which started with the support of President Eisenhower and has continued with the enthusiastic promotion of President Kennedy. Recently the Government of Canada voted an annual grant of five million dollars to promote fitness and sport among the youth and adults of this country.

By "motor fitness" we mean the capacity for efficient performance in the basic requirements of running, jumping, dodging, falling, climbing, swimming, lifting weights (including one's own body), carrying loads and enduring under sustained effort in a variety of situations. The AAHPER Test is designed to measure a number of these attributes.

The project in which you are now a participant is designed to find out some important facts about the AAHPER Test in the event it is put into general use in this University. It is most important that you should do your best on all occasions.
Only in this way will it be possible for you to find out what your true capabilities are.

Percentile scoring scales for college men have been published recently. When the tests are over you will be given a record of your results and percentile ratings.

Full attendance and genuine effort over the four weeks of the project will be greatly appreciated and will ensure that you will complete your physical education requirement for the year.

APPENDIX C

AAHPER TEST ADMINISTRATION INSTRUCTIONS

Pull-Ups: This test measures the strength and endurance of the arm and shoulder muscles.

Use the overhand grip - palms away from the body. From a full hang, with arms and legs fully extended, raise the body until the chin can be placed over the bar. Lower to a full hang. Repeat continuously without pause as many times as possible. When tired do not drop from the bar from the "chinning" position. Lower to a full hang and attempt to pull up again. If unsuccessful lower to a full hang and then drop from the bar.

Cautions: Swinging the body, raising or kicking the knees or legs is not permitted.

Sit-Ups: This test measures the strength and endurance of the hip flexor and abdominal muscles.

Position: Lie on the back, legs extended, feet about two feet apart, hands behind the neck, fingers interlocked, elbows retracted and touching the floor. A partner holds the ankles down, the heels being on the floor at all times.

Execution: Sit up, turning the trunk to the left and touch the right elbow to the left knee. Return to the starting position. Sit up, turning the trunk to the right and touch the left elbow to the right knee. Return to the starting position. Repeat the exercise continuously without pause alternating to the left and right.

Cautions: Fingers must remain in contact behind the neck throughout.
The knees must be on the floor during the sit-ups but may be slightly bent when touching elbow to knee. When returning to the starting position, the elbows must be flat on the floor before sitting up again.

Partner: Do not count the sit-up if the finger tips do not remain in contact behind the head throughout or if the knees are bent when the subject begins to sit up. Count the number of correct sit-ups. Stop the person you are holding at 100 sit-ups or whenever he is unable to continue without resting longer than two seconds in the starting or sit-up position.

Standing Broad Jump: This test measures ability to move the body "explosively". It reflects speed and strength in propelling the bodyweight.

Position: Stand with the feet slightly apart and toes just behind the take-off line.

Execution: Preparatory to jumping swing the arms backward and bend the knees. Jump as far forward as possible by simultaneously extending the knees and swinging the arms forward. Three trials are allowed. Distance is measured from take-off line to heel or other part of the body that touches the floor nearest the take-off line.

Shuttle Run: This test measures the ability to move quickly and change direction, i.e. speed plus agility.

Two lines are drawn 30 feet apart. Two blocks of wood are placed just behind one line and the start takes place behind the other line.

Position: Take up a standing start position with the front foot just behind the starting line.

Execution: On the signal "Ready? Go," run to the opposite line, pick up one block, return and place it just over the starting line. Return to pick up the other block and race over the starting line, block in hand.
Two trials are allowed, with a rest in between.

Caution: Perform the test at full speed. When finishing do not slow down but run through the finish as fast as possible. Do not place the second block down but keep it in your hand. Throwing or dropping the first block is not allowed.

50-Yard Dash: This test measures speed in running.

Position: Take up a standing start position with the front foot just behind the starting line.

Execution: The starter will use the commands "Are you ready?" and "Go." The latter will be accompanied by a downward sweep of the arm to give the timekeeper a visual signal. Run as fast as you can, maintaining full speed past the finishing line. Two trials are allowed, with a rest in between.

Softball Throw for Distance: This test measures co-ordination as well as arm strength and speed.

Directions: The test consists of three consecutive throws for distance. The throwing area is confined within two parallel lines, six feet apart, at right angles to the direction of the throw. Stepping on or over the lines during a throw or on following through constitutes a foul. Only overhand throws are permitted.

600-Yard Run: This test measures running endurance and reflects will-power, muscular and circulatory-respiratory fitness and pace judgement.
Directions: Before this event you will be paired with a partner who will hold your score card during the event and will place himself opposite the finish line, ready to record your time as it is called out by the timer. When you have finished your run, return to the finish line (a) ready to perform the same function for your partner if he has not already run or (b) to collect your score card.

Execution: At the signal "Ready? Go!" begin running from a standing start and maintain the best pace possible throughout the event to finish in the best possible time.

APPENDIX D

AAHPER TEST SCORE CARD

Test Dates/Times:  Mon. ____  Wed. ____  Fri. ____  Sat. ____
Name ____  Faculty/Year ____
Age ____  Height ____  Weight ____
High School/City ____

Test Item              Test 1.   Test 2.   Test 3.   Test 4.
Pull-Ups
Sit-Ups
Standing Broad Jump
Shuttle Run
50-Yard Dash
Softball Throw
600-Yard Run

[Numbered trial spaces (1. 2. 3.) follow in the original card.]
