Open Collections

UBC Theses and Dissertations

Constrained scaling : calibrating individual subjects in magnitude estimation West, Robert Lawrence 1996

CONSTRAINED SCALING: CALIBRATING INDIVIDUAL SUBJECTS IN MAGNITUDE ESTIMATION

by

ROBERT LAWRENCE WEST

B.A., The University of British Columbia, 1990
M.A., Simon Fraser University, 1992

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Department of Psychology)

We accept the thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1996
© Robert Lawrence West

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

ABSTRACT

Magnitude estimation, although a very valuable technique for the study of sensory systems, suffers from the problems of excessive intersubject variability and interlab variability (Marks, 1974). Assuming that healthy normal subjects experience approximately the same level of perceived stimulus magnitude when presented with the same stimulus under the same conditions, the seemingly excessive intersubject variability revealed by magnitude estimation techniques must be due to factors left free to vary during the magnitude estimation procedure. In this dissertation I explore a methodology called constrained scaling, which is an attempt to establish a methodology for magnitude estimation that exerts greater control over the scaling process.
Constrained scaling consists of using feedback to train subjects to respond to a set of stimuli according to a power function with a particular exponent and then asking them to respond to a different set of stimuli using the response scale they have learned, but without feedback. Thus this dissertation was an investigation of the degree to which, and under what conditions, subjects could extend a learned scale to novel stimuli. The results indicate that under the right conditions subjects can perform this task with a level of precision sufficient to significantly reduce intersubject variability as compared to standard magnitude estimation results. The consequences of constraining subjects to answer according to a predetermined function are discussed in terms of the type of scale that is produced. Also, the conceptual implications of constrained scaling for modeling sensory systems and conscious perception are discussed.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgment
Introduction
    Measurement
    A Brief History of Psychophysical Scaling
    Problems With The Power Law
    Constrained Scaling
    Previous Empirical Results
    Research Goals
    Overview of Experiments
Part 1: Achieving the Phenomena
    Experiment 1
    Experiment 2
    Experiment 3
    Experiment 4
Part 2: Cross Modal Applications
    Experiment 5
Part 3: Stimulus Range Effects
    Experiment 6
    Experiment 7
Part 4: Different Exponents
    Experiment 8
    Experiment 9
Part 5: Nonperceptual Stimuli
    Experiment 10
Discussion
    Psychophysical Models
    Scaling Issues
    The Systems Perspective
    Future Directions: Beyond the Power Law
References

List of Tables

Table 1. Variability in a convenient sample of ME and CMM experiments
Table 2. Test trial results for Experiment 1
Table 3. Test trial results for Experiment 2
Table 4. Test trial results for Experiment 3
Table 5. Test trial results for Experiment 4
Table 6. Test trial results for Experiment 5
Table 7. Test trial results for Experiment 6
Table 8. Test trial results for Experiment 7
Table 9. Test trial results for Experiment 8
Table 10. Test trial results for Experiment 9
Table 11. Test trial results for Experiment 10
Table 12. 1000 Hz test trial results for Experiment 3
Table 13. 65 Hz test trial results for Experiment 3

List of Figures

Figure 1. Distribution of individual loudness exponents from Hellman and Meiselman (1988)
Figure 2. Distribution of average loudness exponents, taken from Marks (1974)
Figure 3. A cartoon reproduction, taken from Poulton (1989)
Figure 4. Experiment 1: Fitted power functions for an individual subject
Figure 5. Experiment 2: Fitted functions for the 100 Hz test trials
Figure 6. Nonlinear and linear fitted functions for the 1000 Hz, no-feedback trials of subject 2
Figure 7. Average median error for 10 trial blocks for the learning trials of Experiment 2 (line A) and Experiment 3 (line B)
Figure 8. Experiment 3: Raw 65 Hz data fitted according to log(R)=B*log(S-T)+log(K)
Figure 9. Experiment 3: Individual exponent values
Figure 10. Experiment 4: Fitted functions for 1000 Hz tones
Figure 11. Experiment 4: Individual exponent values
Figure 12. Experiment 5: Fitted functions for the light stimulus
Figure 13. Experiment 5: Individual exponent values
Figure 14. Experiment 6: Fitted functions for the 65 Hz test trials
Figure 15. Experiment 6: Individual exponent values
Figure 16. Experiment 7: Fitted functions for the 65 Hz test trials
Figure 17. Experiment 7: Individual exponent values
Figure 18. Experiment 8: Fitted functions for the 65 Hz test trials
Figure 19. Experiment 8: Individual exponent values
Figure 20. Experiment 9: Fitted functions for the 65 Hz test trials
Figure 21. Experiment 9: Individual exponent values
Figure 22. Experiment 10: Fitted functions for happiness
Figure 23. Three Psychophysical Models

Acknowledgment

I would like to thank my wife, Fay, for her years of patience and support, without which I could not have completed this thesis. I would also like to thank my advisor, Dr. Lawrence Ward, for being an exceptional mentor.

"Decide on standard weights and measures after careful consideration."
Confucius (551 to 479 B.C.)

Introduction

This dissertation is concerned with the measurement of psychological magnitude by means of magnitude estimation, a technique devised by S.S. Stevens (see Stevens, 1975, for a detailed historical account of the creation of magnitude estimation). The process of magnitude estimation involves presenting stimuli to subjects and having them match the magnitude of a particular psychological dimension mediated by the stimuli to their subjective impressions of the magnitude of numbers. For example, various tones could be presented and subjects could be required to match the loudness of each tone to a number so that the subjective magnitude of the number equaled the subjective magnitude of the tone. Although this technique is simple, a storm of controversy has ensued as to the details of implementing the technique and what exactly the results mean. However, more on this later. Before proceeding to the issues specific to magnitude estimation it will be necessary to digress briefly and consider the issue of measurement in general.

Measurement

Without measurement it would not be possible to link mathematics to empirical observations, an enterprise central to scientific inquiry. But despite the obvious importance of this activity, measurement theory occupies only a minor place within the philosophy of science. With only a few exceptions the issue of measurement has been dismissed as "neither interesting nor important" (Ellis, 1968).
However, because some of these issues are relevant to this dissertation, a brief, selective review of measurement theory is in order.

Measurement is "the assignment of numerals to objects or events according to rules" (a paraphrase of Campbell, 1940, by Stevens, 1946). In order to be clear concerning terminology and the classification of different types of measurement I will use Ellis's (1968) classification scheme, as it is arguably the most comprehensive. Ellis's classification scheme distinguishes between direct measurement, which occurs when a quantity is measured without reference to any other quantities, and indirect measurement, which occurs when a quantity is measured with reference to one or more other quantities.

Fundamental scales (Campbell, 1940) are a form of direct measurement. To achieve this type of scale there must be a way of combining two systems, both with a quantity p (e.g., p could be weight), such that the combination also possesses p. Also, the combining operation, o, must satisfy at least the following conditions, where x and y are different amounts of quantity p (e.g., x and y could be the weights of two different objects):

o(x,y) = o(y,x)
o(x,y) > x
o(x,y) = o(a,b) when x = a and y = b
o(o(x,y),z) = o(x,o(y,z))
(Ellis, 1968)

The classic example of a fundamental measurement is using a pan balance to measure mass. Another example is the measurement of length by overlapping two objects to see if one is greater than, less than, or equal to the other. However, not all direct scales are fundamental scales. When a combining operation is not possible the resulting scales are referred to as elemental scales (e.g., hardness can be measured by smashing different substances together, but there is no way to combine hardnesses).

Associative scales are a form of indirect measurement.
An associative scale can be constructed when there is a measurable quantity, q, associated with the quantity, p, to be measured, such that, under specified conditions, if the quantities are ordered according to q they are also ordered according to p. The classic example of an indirect scale is the measurement of temperature by measuring the volume of various substances (e.g., mercury, alcohol) assumed to expand and contract in response to temperature differences. Derived scales are another type of indirect scale; they are derived from already known scale values (e.g., scales of density). Magnitude estimation is a form of associative measurement in which subjective magnitude is assumed to vary systematically with subjects' numerical responses.

The primary advantage that fundamental scales have over associative scales is that, by definition, the addition of fundamental scale values directly reflects the physical process of addition. With associative scales the addition of scale values may or may not reflect physical or psychological reality. For example, if tone A is perceived to be twice as loud as tone B, then in order to reflect this a subject should match tone A to a number twice as large as the number tone B was matched to. If a scale directly reflects physical or psychological reality in this manner then I will refer to it as linear. Scales that do not meet this criterion, or for which it cannot be proven that they meet this criterion, I will refer to as nonlinear.

It is important to note that if an associative scale is nonlinear very little practical advantage is lost. Consider two associative scales describing the same quantity, one linear and one nonlinear. All other things being equal, the nonlinear scale will be systematically related to the linear scale and all lawful relationships existing for the linear scale will also exist for the nonlinear scale (Ellis, 1968; Stevens, 1951).
Philosophically, the linearity issue is complex and controversial (see Ellis, 1968). Of particular relevance to this dissertation is the role of a physical means of concatenating the quantity to be measured. As Ellis (1968), Stevens (1951) and Luce (1972) have pointed out, it is possible to construct fundamental scales without using a measurement procedure involving physical concatenation or addition. Finally, both Ellis (1968) and Stevens (1951) have argued that considerations of additivity (i.e., concatenation) and linearity should be secondary to considerations of the mathematical desirability of the scale. In other words, they maintain that the main criterion for creating a scale should be to create simple and elegant mathematical models.

To achieve a meaningful associative scale three conditions must be met:

1) there must be a process (invented or naturally occurring) that transforms one variable into another (Luce, 1972);
2) the same, specified process must be used for each measurement (Ellis, 1966);
3) the relevant variables must be related by an interlocking algebraic structure (Ellis, 1966; Luce, 1972).

The second point in this list is often taken for granted in the physical sciences. It simply asserts that different methods of measurement (i.e., methods that result in different units of measurement) cannot be mixed. For example, if length A is 3 feet and length B is 47 centimeters, B cannot be said to be greater than A simply because 47 is greater than 3 (i.e., you cannot compare numbers derived using different measurement processes unless, of course, you have a formal means of equating the units). This point is trivial for physical measurement, where well defined units of measurement are common, but it is arguably one of the greatest impediments to creating valid associative scales of psychological quantities or magnitudes. This is because the responses of individuals, or the average responses of naturally occurring groups (e.g.,
male/female, different cultural groups, etc.) can only be compared if the associative functions mapping responses to psychological states are equivalent across the individuals or groups being compared. For example, if person A reports a level of happiness equal to 7 on a 10 point scale, and person B reports a level of happiness equal to 8 on the same scale, we are unable to conclude that person B is happier than person A because we cannot be sure that the associative functions mapping responses to psychological states were the same for A and B. Essentially, the problem is one of calibrating individual subjects (or groups) to each other. The situation is analogous to the period when temperature was measured using different types of wine and alcohol, resulting in a confusing array of scale values (Ward, 1992). In the following section I will examine how this issue has affected psychophysical scaling in general and magnitude estimation in particular.

One last note concerning terminology. Sensations have different aspects or attributes that are available to consciousness. For example, a subject can report how long, how clear, how intense, or how pleasant a sensation is. However, this dissertation will be concerned primarily with the magnitude (i.e., intensity) aspect of sensations evoked by experimentally controlled stimuli of differing physical magnitudes. In all cases, unless otherwise noted, any reference to perception should be taken as referring specifically to this situation. Also, in all cases, unless otherwise noted, consciousness is defined as the awareness of the magnitude aspect of a sensation, "so that a verbal or nonverbal description of it can be provided, or a voluntary response, equivalent to the description, may be produced" (Ladavas, Cimatti, Pesce, and Tuozzi, 1993).
A Brief History of Psychophysical Scaling

When Fechner created psychophysics it was based on his insight that conscious perceptions of stimulus magnitudes could be measured if a consistent relationship existed between perceived magnitude and subjects' overt responses. Since Fechner assumed that perceptions of stimulus magnitudes were lawfully related to physical magnitudes, a lawful relationship between physical magnitudes and response magnitudes would be a sign that a lawful relation existed between perceptual magnitudes and response magnitudes. Weber's Law,

$$\Delta S = W S \qquad (1)$$

which states that the amount by which a stimulus magnitude (S) needs to be increased ($\Delta S$) in order to create a just noticeable difference (JND) is a constant proportion (W) of the stimulus magnitude (S), provided Fechner with the lawful relationship that he needed. Fechner assumed an additive relationship between JNDs and units of conscious perception, so that if stimulus threshold equaled 0, one JND above threshold would equal 1, one JND above 1 would equal 2, and so on. By further assuming that JNDs reflected a continuous function, as opposed to the discrete function suggested by the summing procedure, Fechner was able to derive what has come to be known as Fechner's Law,

$$P = (1/W)\log(S/T) \qquad (2)$$

where P is conscious perception, T is threshold, and 1/W is a scaling constant. Note, however, that for Fechner to claim that his scale was linearly related to the true psychological scale, it was necessary for him to assume that JNDs could be summed to measure conscious perception; to claim a nonlinear scale he needed only to assume that JNDs were systematically related to perceived magnitude.

[1] With only two relevant variables the interlocking algebraic structure requirement is redundant.
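The step from equation (1) to equation (2) can be checked numerically. The following is only a minimal sketch under stated assumptions: the function names and the example values of T and W are hypothetical, and the logarithm in equation (2) is taken to be the natural logarithm. It compares the discrete count of JND steps above threshold with the continuous form.

```python
import math


def jnd_steps(stimulus, threshold, weber_fraction):
    """Number of discrete JND steps from threshold up to stimulus.

    Weber's law makes each JND step multiply the intensity by (1 + W),
    so the step count n solves threshold * (1 + W)**n == stimulus.
    """
    return math.log(stimulus / threshold) / math.log(1.0 + weber_fraction)


def fechner_magnitude(stimulus, threshold, weber_fraction):
    """Continuous form, P = (1/W) * log(S/T), with log assumed to be natural log."""
    return math.log(stimulus / threshold) / weber_fraction


if __name__ == "__main__":
    T, W = 1.0, 0.05  # hypothetical threshold and Weber fraction
    for S in (2.0, 10.0, 100.0):
        print(S, jnd_steps(S, T, W), fechner_magnitude(S, T, W))
```

Under these assumptions the two quantities differ only by the constant factor W / ln(1 + W), which approaches 1 as W becomes small. In that sense the continuous law is a rescaled version of the JND count rather than a different scale.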
Despite the fact that consciousness is, by definition, composed of mental phenomena that we have access to, it was widely believed at the time that people could not accurately report conscious perceptions of magnitude (Stevens, 1975). Because Fechner subscribed to this view it did not occur to him to test his assumption by asking subjects to report their conscious perceptions. Instead, he measured JNDs and derived subjects' perceptions of magnitude.

The idea that people can report conscious perceptions of stimulus magnitudes with a reasonable degree of accuracy was first introduced by Stevens in a series of studies (see Stevens, 1975, for a review). Essentially, what Stevens did was to demonstrate that people could systematically match their perceptions of stimulus magnitudes from one modality to their perceptions of stimulus magnitudes from another modality. This general approach was termed cross-modality matching (CMM) and involved presenting a series of stimuli in one modality and having subjects match the magnitude of each one by adjusting the magnitude of a stimulus from a different modality. For example, subjects could be instructed to adjust the intensity of a light so that their perceptions of brightness matched the perceived loudness of various tones presented by the experimenter. By assuming that subjects' perceptions of numerosity, or numbers, were linearly related to the properties of actual numbers (see Poulton, 1989b, for a discussion of the validity of this assumption), and also that the mapping of subjective number magnitude to the subjective experience of other stimulus magnitudes was linear, Stevens argued that the matching of numbers to the magnitudes of perceived stimuli produced a linear scale of conscious perception (note, I shall refer to the combination of these two assumptions as the linearity assumption). Stevens referred to this special case of CMM (Luce, 1972) as magnitude estimation (ME).
Using ME and the linearity assumption, Stevens found that Fechner's law did not hold for perceptions of prothetic sensory continua, such as loudness and brightness [2]. Instead he found that the perception of magnitude obeyed a power law,

$$R = K S^{B} \qquad (3)$$

where both B and K are constants. According to this function, each modality is characterized by an exponent value, B, that describes the relationship between the physical intensity of a stimulus and the perceptions of magnitude evoked by it. K was a multiplicative constant representing the unit of the scale and was generally considered less interesting as it was thought to be arbitrary (however, see Borg & Marks, 1983, for a review of the possible meanings of K). Note also that the value of the exponent is entirely dependent on the choice of the scale used to quantify the stimuli. For example, for 1000 Hz tones Stevens' results (see Stevens, 1975) indicate that the exponent is approximately 0.30 if sound power is used to quantify the stimuli, and 0.60 if sound pressure is used (in this dissertation sound pressure is used). Thus the value of the exponent has no absolute meaning in relationship to the physical world; its meaning is always relative to the measurement unit of the stimulus.

[2] Prothetic refers to differences in quantity whereas metathetic refers to differences in quality (Stevens, 1975). For example, brightness is a prothetic dimension whereas color is a metathetic dimension. Fechner's law did hold for some metathetic continua such as pitch, apparent position, apparent inclination, and apparent proportion. Fechner's law was also thought to hold for visual saturation and visual hue, although some evidence suggests otherwise (Indow & Stevens, 1966). The problem in distinguishing log functions from power functions is that sometimes both fit the data reasonably well (Norwich, 1993).

In terms of psychophysical scaling, the most important feature of the power law is that it exhibits an internal algebraic consistency. Although it raises interesting questions, this dissertation will not be concerned with relating the power law to external, physical measurements except in so far as they can be used to reveal an inner, psychological consistency. For example, CMM exponents are predictable from the ME exponents of the two modalities involved, according to the derived relation

$$R = K S^{A/B} \qquad (4)$$

where A is the ME exponent associated with S and B is the ME exponent associated with R (Stevens, 1975). The power law results are also systematically related to other psychological phenomena related to magnitude perception (Luce, 1972), such as intensity-duration exchange (Stevens, 1966; Stevens & Hall, 1966), adaptation level (Stevens & Stevens, 1963), reaction time (Vaughan, Costa, & Gilden, 1966), Weber's law (Teghtsoonian, 1971), and signal detection theory (Norwich, 1995; Ward, 1995). Furthermore, several comprehensive theories from which these empirical relationships can be derived have recently been developed (Norwich, 1993; Zwislocki, 1994; Link, 1992). This sort of consistency makes ME much more than a mere curve fitting exercise. In fact, Stevens did not need his linearity assumption to claim valid (nonlinear) scales; the internal and external consistency of the power law were enough.

Stevens' linearity assumption rested primarily on face validity. Other methods of indirect scaling also produced legitimate scales different from, but systematically related to, Stevens' ME scales. For example, category scaling, which is the same as ME except that the range of allowable responses is restricted to a limited number of categories, systematically produces lower exponents than ME (on average about half the size; Ward, 1971).
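Equations (3) and (4) can be illustrated with a short numerical sketch. This is not the fitting procedure used in the experiments. It is a minimal example that assumes noise-free data, unit scale constants, and illustrative exponent values, and the function names are hypothetical. It estimates B by least squares on log(R) = B*log(S) + log(K), then checks that simulated cross-modality matches follow the exponent A/B.

```python
import math


def fit_power_law(stimuli, responses):
    """Least-squares fit of log(R) = B*log(S) + log(K); returns (B, K)."""
    xs = [math.log(s) for s in stimuli]
    ys = [math.log(r) for r in responses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    k = math.exp(mean_y - b * mean_x)
    return b, k


def cross_modal_match(stimulus, a, b):
    """Intensity on the response continuum whose sensation equals that of stimulus.

    Assumes sensation = S**a on the stimulus continuum and R**b on the
    response continuum (unit constants), so that R = S**(a / b).
    """
    return (stimulus ** a) ** (1.0 / b)


if __name__ == "__main__":
    A, B = 0.6, 0.33  # illustrative ME exponents, not measured values
    tones = [1.0, 2.0, 4.0, 8.0, 16.0]
    matches = [cross_modal_match(s, A, B) for s in tones]
    exponent, unit = fit_power_law(tones, matches)
    print(exponent, A / B)
```

Fitting in log-log coordinates is chosen here only because it turns the power law into a straight line whose slope is the exponent. With real, noisy responses the choice of fitting method would itself affect the estimate.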
Also, Fechner's JND scales are different from but systematically related to Stevens' ME scales, since the Weber fraction (W) for any modality can be predicted from the ME exponent (B) associated with that modality according to the formula

$$W = 1.03^{1/B} - 1 \qquad (5)$$

where 1.03 is a fixed constant (Teghtsoonian, 1971; note that this law applies only to Stevens' ME data and a specially selected set of Weber fractions and should be considered only in this context; see later for a discussion of the extent to which findings associated with Stevens' ME results can be generalized). In fact, in theory, it is possible to concoct many different scales from the interlocking results of psychophysics, all of which would meet the criterion for a nonlinear associative scale. Stevens' assumption that his ME scale was linearly related to conscious perception can be justified only in so far as ME gives subjects the most freedom in reporting their perceptions, and therefore could be considered to have the greatest face validity. A separate argument for the linearity assumption is that linearity is a desirable mathematical property to assume (Stevens, 1946; Zwislocki, 1991); however, this argument has no bearing on the issue of validity.

Stevens also believed, as is common in psychology and physiology, that healthy, normal individuals should experience similar sensations of magnitude when confronted with the same stimuli. Because of this Stevens attributed individual differences in ME exponents to biases in reporting magnitudes. Therefore, Stevens' implicit scaling model consisted of two parts, a stimulus input function and a response output function (which could be linear or nonlinear, i.e., biased). This general approach has been termed by Marks (1991) the "canonical model of psychophysics."
It asserts that subjects' responses are directly proportional to the sensation magnitudes that gave rise to them, provided the responses have been collected using a method that does not bias the response output function (Stevens, 1975; Zwislocki, 1983) [4].

    The problem of coupling numbers to subjective impressions in such experiments (ME and CMM) has been solved in principle, but work should continue on minimizing biases and variability associated with the coupling. (Zwislocki, 1991, p. 25)

Problems with the Power Law

Stevens' highly consistent results promised great things. However, cracks soon appeared in the edifice he had built. Although a considerable amount of evidence indicated that subjects do obey the power law (see Stevens, 1975; and Bolanowski and Gescheider, 1985, for reviews), the specific exponent values that Stevens found could not be reliably replicated. At the level of the individual, the value of the exponent for the same modality varied considerably across individuals in the same experiment (e.g., Algom and Marks, 1984; Luce and Mo, 1965; Marks and J. C. Stevens, 1965; Rule and Markley, 1971; Wanschura and Dawson, 1974; Logue, 1976) and also across time within individuals (Logue, 1976; Marks, 1991; Teghtsoonian and Teghtsoonian, 1983). Figure 1, taken from Hellman and Meiselman (1988), illustrates the intersubject variability found for loudness exponents within a single experiment. Note, however, that this graph represents a best case scenario as Hellman is noted for achieving unusually low levels of intersubject variability. Clearly these results were at odds with the general expectation that normal individuals are similar to each other and that individuals are consistent across time. Stevens also found individual differences but he argued that response biases vary randomly across individuals and that therefore the data could be averaged to get the true exponent value (see Stevens, 1975, for a detailed account of Stevens' position). However, Marks (1974) reviewed the literature and found that the average value of the exponent varied across experiments done in different labs, even when the same method was used. Figure 2, taken from Marks (1974), illustrates the interlab variability found for loudness exponents (note, in Figure 2, interval scaling is the same as category scaling). Thus, even at the group level, Stevens' exponent values could not be reliably replicated.

The central problem with the inconsistencies in exponents is that it is unclear whether they represent biases in reporting or actual differences in perception due to context effects or individual differences. This is because the stimulus input and response output functions are not uniquely defined due to our inability to directly measure conscious perception. For example, a stimulus/response power function could arise if the stimulus input function is a power function and the response output function is a linear function (Stevens, 1959), or if both functions are power functions (Attneave, 1962), or if both functions are log functions (MacKay, 1963), depending on the theorized intervening mechanism. Ascertaining whether individual differences occur in the stimulus input function or in the response output function is critical for the canonical model since, as Luce & Mo (1965) pointed out, real individual differences in the stimulus input function would mean that the average exponent values associated with different modalities have no meaning at the level of the individual. The same problem also prevents proponents of the canonical model from ascertaining which labs produce legitimate, unbiased exponents.
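The identifiability problem described above can be made concrete for the case in which both functions are power functions. The symbols a, b, k_s and k_r below are introduced here only for illustration and do not appear in the original text.

```latex
% Hypothetical symbols: a and b are exponents; k_s and k_r are unit constants.
\begin{align*}
\text{stimulus input:}\quad  & p = k_s\, x^{a} \\
\text{response output:}\quad & y = k_r\, p^{b} \\
\text{observed:}\quad        & y = k_r \left(k_s\, x^{a}\right)^{b}
                               = \left(k_r\, k_s^{\,b}\right) x^{ab}
\end{align*}
```

Only the product ab is observable as the fitted exponent. A subject with a = 0.6 and a linear output function (b = 1) produces the same stimulus/response exponent as a subject with a = 0.3 and b = 2, so the observed exponent alone cannot show where an individual difference arises.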
[4] Both Stevens and Zwislocki admit that the linearity assumption cannot be proven, but go on to argue that it is a practical and mathematically desirable assumption. Over all, their position seems to be that the linearity assumption is probably correct, but even if it is not correct we should still use it.

[Figure 1. Distribution of individual loudness exponents, taken from Hellman and Meiselman (1988). Histogram titled "Measured loudness exponents"; horizontal axis: exponent (n), from .25 to 1.15.]

[Figure 2. Distribution of average loudness exponents, taken from Marks (1974). Horizontal axis: power-function exponent.]

Overall, these difficulties indicate that the problem of demonstrating the validity of the linearity assumption is currently intractable due to our inability to differentiate the stimulus input function from the response output function, and that there is currently insufficient interindividual, intraindividual, and interlab consistency to create a nonlinear associative scale. As noted above, a necessary condition for the creation of an associative scale is that the same processes (including the response output function) are used to make each measurement. Assuming that normal individuals are not vastly different from each other, the highly variable results of ME suggest that this condition is not being met, as ME is currently implemented.

Constrained Scaling

Due to the problems created by the excessive intersubject variability found using ME, there has been considerable concern over eliminating biases from current scaling procedures in order to reveal the true stimulus input exponent (see Poulton, 1989, for a review). However, Ward (1992, 1993) has argued that this view is consistent with a model of the mind as unitary and static, as opposed to distributed and dynamic.
Pursuing a view consistent with Minsky's (1986) position that the mind is composed of specialized mental subunits which in turn are composed of smaller specialized mental subunits, and so on down to the level of individual neurons, Ward (1992) proposed that psychophysical scales should be defined and that observers should be taught how to use these scales "in situations that have been studied and analyzed so as to engage a known and consistent subset of mental agents." In theory, it is possible that such scales could then be applied to novel stimuli without altering the set of mental subunits responsible for matching response outputs to stimulus inputs (Ward, 1992). From the point of view of creating valid associative scales this situation would be highly desirable since it would satisfy the condition that the same, specified process be used for each measurement (see above).

In this dissertation I explore a method called constrained scaling (West & Ward, 1994), which is an attempt to establish a method for ME consistent with Ward's (1992) position. Constrained scaling consists of using feedback to train subjects to respond to a set of stimuli according to a power function with a particular exponent and then asking them to respond to a different set of stimuli using the response scale they have learned, but without feedback. Provided the stimulus input function and the response output function do not interact, magnitude scaling tasks can be conceptualized in the following way:

$$r(s(x)) = y \qquad (6)$$

(Marks, 1991), where x is the physical intensity, y is the subject's response, r is the response output function, and s is the stimulus input function (assumed to occur before r). By training subjects to relate x to y by a single function, call it m, we can make r(s(x)) the same function across subjects, although r and s would not necessarily need to be the same across subjects for this to occur.
r(s(x)) = m(x) = y (7)

After this calibration procedure, by altering the stimuli in a manner predicted to alter the exponent (e.g., in the case of loudness, altering the frequency of the sound stimuli should change the exponent value), we can alter s to a different function, call it s'. If r(s'(x)) remains calibrated across subjects it implies that the function describing the transformation from s to s', call it q, was the same across subjects. For example, if we assume a power function for the stimulus input system and another power function for the response output system (Attneave, 1962; Curtis, Attneave & Harrington, 1968; Rule, Curtis and Markley, 1970), then transforming s to s' by raising s to some power q would maintain the calibration between subjects even if they differed across r and s, provided that q was the same for all subjects. Thus, if subjects are calibrated to respond the same way to standard stimuli, and their responses remain calibrated when presented with novel stimuli, it would imply that q is the same.

However, it would be overly conservative to limit ourselves to discussing q. Putting aside the question of conscious perception for the moment, the m function can be decomposed into an initial hard-wired, fixed function (f) that is the same across healthy, normal individuals, followed by an unfixed function (u) that can be cognitively altered.

m(x) = u(f(x)) = y (8)

Given that f is the same across subjects, calibrating subjects to the same m function would also mean that their u functions are the same. In this view, constrained scaling exponents will systematically reflect the initial, fixed function (Baird, Kreindler, and Jones, 1971).
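The power-function example given in the passage above (the transformation q) can be checked numerically. The following is a minimal sketch under stated assumptions: s(x) = x^alpha and r(p) = p^beta with unit multiplicative constants, and the subject parameters and the value of q are hypothetical, chosen only for illustration. Two simulated subjects differ in s and r but share the trained combined exponent (alpha * beta = 0.6); applying the same q to both leaves them with a common exponent.

```python
import math

def fitted_exponent(respond, x_lo=1.0, x_hi=10.0):
    """Slope of log(response) against log(intensity) between two intensities."""
    rise = math.log(respond(x_hi)) - math.log(respond(x_lo))
    run = math.log(x_hi) - math.log(x_lo)
    return rise / run

def make_subject(alpha, beta, q=1.0):
    """Return x -> r(s'(x)), with s(x) = x**alpha, s' = s**q, r(p) = p**beta."""
    return lambda x: ((x ** alpha) ** q) ** beta

# Hypothetical subjects: different input and output exponents, same product 0.6.
subjects = {"A": (0.3, 2.0), "B": (1.2, 0.5)}
q = 1.5  # hypothetical transformation, assumed identical for both subjects

trained = {name: fitted_exponent(make_subject(a, b))
           for name, (a, b) in subjects.items()}
tested = {name: fitted_exponent(make_subject(a, b, q))
          for name, (a, b) in subjects.items()}

print("trained exponents:", trained)  # both subjects share the trained exponent
print("tested exponents:", tested)    # both subjects share q times that exponent
```

The sketch addresses exponents only. With non-unit multiplicative constants in s, the constant of r(s'(x)) could still differ between subjects even when the exponents agree.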
Furthermore, because people are not computers that can be programmed in arbitrary ways, it is reasonable to assume that fixing the u function, through training and the constraints of the task, will create a situation in which subjects use the same set of mental subunits, in the same way, to perform the task. And, since the task involves matching response magnitudes to consciously perceived stimulus magnitudes, this process would include the mental subunits responsible for conscious perception of the stimuli. Following from this we would expect the relationship between response magnitudes and consciously perceived stimulus magnitudes to be the same across subjects.

Therefore, if it is possible to teach subjects a scale and for them to extend it to novel stimuli, it should result in a true associative scale. Such a scale would be characterized by reduced intersubject and interlab variability (i.e. relative to ME) and would allow for the replication of specific quantitative results (i.e. specific exponent values and laws relating them). Based on the support the power law has received from ME and CMM results (e.g., see Stevens, 1975; Bolanowski and Gescheider, 1991), the use of a power function for the scale to be learned would seem the most appropriate choice as it should make the task seem natural to subjects. However, as no one has ever attempted to get subjects to extend a learned psychophysical scale to novel stimuli, the best approach will ultimately be determined empirically.

In general, this approach of creating scales rather than discovering them (as in standard ME) has profound consequences for how we interpret the results of ME experiments. Essentially, what looks like a reduction in response biases according to the discovery approach is, according to the creation approach, actually the setting up of strong informal constraints so as to engage a consistent set of mental subunits.
This can explain how some labs can get very consistent results across experiments (i.e. average exponent values) that cannot be replicated in other labs. In fact, there is evidence that Stevens used such informal constraints. The cartoon in Figure 3 is taken from Poulton (1989) and is a reproduction of a drawing by one of Stevens' graduate students illustrating his or her perception of Stevens' methods. Similar reports were obtained from R. Teghtsoonian (1994), a former visitor in Stevens' lab, who reported that Stevens interrupted his scaling session to inform him that his results were in error. Stevens also trained subjects on ME of line length before letting them judge other stimuli (Stevens, 1975) and tended to use the same, highly practiced subjects over and over (Teghtsoonian & Teghtsoonian, 1994).

Figure 3. A cartoon reproduction, taken from Poulton (1989).

However, there is also evidence that constraining the ME task can have undesirable results. Stevens originally included a standard stimulus with an assigned numerical value as part of the ME procedure, but found it produced less satisfactory results than unconstrained ME, or free ME as he called it (Stevens, 1975). The following quote, reported in Stevens (1975), is from one of Stevens' ME subjects.

I felt freer to use numbers over a wide range. I liked the idea that I could just relax and contemplate the tones. When there was a fixed standard I felt more constrained to try to multiply and divide loudnesses, which is hard to do; but with no standard I could just place the tone where it seemed to belong. (p. 28)

Therefore, it could be argued that constrained scaling will result in contrived, artificial, and/or confused scales. However, if the arguments presented above in favor of constrained scaling are correct, then it should be possible to reliably recreate past ME and CMM findings.
If this is possible, it would be very difficult to argue that constrained scaling is not as legitimate an approach as ME or CMM.

Previous Empirical Results

Several studies have already been performed indicating the feasibility of training subjects to respond to a standard scale. King & Lockhead (1983), Koh & Meyer (1991), Koh (1993), West & Ward (1994), and Marks, Galanter, & Baird (1995) have all provided evidence that, given feedback, subjects can learn to respond according to power functions of a given exponent, quickly and with a high degree of accuracy. Of these, only West & Ward (1994) and Marks, Galanter, & Baird (1995) employed the learned scale on novel stimuli. In terms of the chronology of events leading up to this dissertation it is important to note that Marks, Galanter, & Baird (1995) based their method on West & Ward (1994), which was an initial report of the results of Experiment 1 of this thesis. Therefore the results of West & Ward (1994) are reported later, in Experiment 1.

Marks, Galanter, & Baird (1995) trained their subjects to respond to 500 Hz monaural tones according to power functions with exponents of 0.3, 0.6 and 1.2 and then removed the feedback and presented subjects with the same monaural 500 Hz tones alternated with binaural 500 Hz tones, both without feedback. Looking at the data averaged across subjects, their finding was that loudness summation was unaffected by the training, although the exponent value for both the monaural and binaural tones decreased when feedback was removed, by roughly 25% in all cases. Although they did not report individual exponents directly, it is possible to estimate the range of individual exponents from their graph of individual exponents versus individual binaural summation ratios.
For subjects trained to respond according to an exponent of 0.3 the ratio of highest to lowest exponent was about 1.50:1, for an exponent of 0.6 it was about 1.58:1, and for an exponent of 1.2 it was about 1.44:1, somewhat better than the usual ME results (see Table 1, later). These results indicate that standard psychophysical results can be obtained using a learned scale, and suggest that intersubject variability is reduced. However, having subjects extend a learned, 500 Hz, monaural scale to 500 Hz binaural tones might constitute a special case in that, for purposes of scaling, subjects might experience monaural and binaural 500 Hz tones in a very similar way. In order to establish the viability of constrained scaling, stimuli that were clearly different would need to be used.

Research Goals

The primary research goal of this dissertation was to demonstrate that constrained scaling can be used to calibrate subjects sufficiently to produce a meaningful, nonlinear (in the sense that it cannot be established if it is linear or not), associative scale of psychological magnitude, and to understand the psychological processes by which this is accomplished. The program of research investigated subjects' abilities to extend magnitude scales, learned using the loudness of 1000 Hz pure tones, to other stimuli. Several different approaches were attempted, the success and failure of each shedding light on the psychological mechanisms involved in this task. The ability to extend the learned scales was investigated for intramodal stimuli (other sound frequencies), intermodal stimuli (brightness), and extramodal stimuli (i.e. cognitively generated magnitudes, in this case the magnitude of happiness expected if various amounts of money were to be received). In each case the pattern of results was compared to established ME and CMM findings.
Also, the range of the response scales subjects were trained on (and hence the value of the exponent they were trained on), and the range of the stimuli that subjects were exposed to, were examined separately in terms of their effect on the constrained scaling process. In both cases the effects of these variables were found to be related to the constrained scaling process.

In order to demonstrate that constrained scaling produces less individual variation than ME scaling, several benchmarks were employed in this dissertation. According to Marks (1974), individual ME experiments can produce highest to lowest exponent ratios of at least 2:1. However, a sampling of the literature (see Table 1) reveals that the variance is typically higher than this and can, in fact, be much higher. An exception to Marks' 2:1 rule is study 8 by Hellman & Meiselman (1993; see footnote 5), which has a highest to lowest exponent ratio of 1.60:1. However, I would argue that these results were achieved by using and improving on Stevens' informal constraints. In fact, during loudness scaling Hellman monitors the process and if she feels the subject is getting confused she presents them with a very loud tone (Hellman, 1994).

Table 1. Variability in a convenient sample of ME and CMM experiments.
#   mean/sd  high/low  method  N   stimulus                 study
1   3.50     2.75      ME      11  loudness                 Stevens & Guirao, 1964
2   2.30     NA        ME      32  loudness                 Teghtsoonian & Teghtsoonian, 1983
3   2.58     NA        ME      35  loudness                 Teghtsoonian & Teghtsoonian, 1983
4   3.45     3.95      ME      8   loudness                 Algom & Marks, 1990
5   3.41     2.30      ME      11  loudness                 Algom & Marks, 1990
6   2.25     3.32      ME      8   loudness                 Ward, 1982
7   2.24     6.00      ME      8   brightness               Ward, 1982
8   5.38     1.60      ME      10  loudness                 Hellman & Meiselman, 1988
9   3.65     2.27      ME      6   heaviness                Luce & Mo, 1965
10  4.33     1.75      ME      6   loudness                 Luce & Mo, 1965
11  3.05     3.37      CMM     20  duration to loudness     Lilienthal & Dawson, 1976
12  2.88     3.81      CMM     20  loudness to duration     Lilienthal & Dawson, 1976
13  2.55     2.44      CMM     5   loudness to line length  Zwislocki, 1983
14  3.07     2.40      CMM     10  duration to loudness     Ward, 1975

high/low is the ratio of the highest to lowest individual exponent

Most importantly, nobody is able to replicate the low levels of variation achieved in Hellman's lab. The most relevant experiments for this study are studies 6 and 7 in Table 1, which were performed in the same lab and used the same equipment as the experiments reported in this dissertation. The higher variability in these studies was likely due to the use of inexperienced subjects and the relative absence of informal constraints. Likewise the experiments in this study all involved inexperienced subjects. Taking Hellman and Meiselman (1993) as a rough guideline for what is possible when informal constraints are imposed, it was predicted that constrained scaling should always result in a ratio of highest to lowest exponent of 1.60:1 or lower and that the mean exponent divided by the standard deviation of exponents should always be 5.50 or higher.

5 Although Hellman and Meiselman tested 51 subjects they reported individual exponent values for only 10.
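The two benchmark indices described above (mean exponent divided by the standard deviation of exponents, and the ratio of highest to lowest exponent) are simple to compute from a list of individual exponents. The sketch below is illustrative rather than the procedure actually used: the function names and the example exponents are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption.

```python
from statistics import mean, stdev

MAX_HIGH_LOW = 1.60  # benchmark: highest/lowest exponent ratio at or below this
MIN_MEAN_SD = 5.50   # benchmark: mean/sd of exponents at or above this

def variability_indices(exponents):
    """Return (mean/sd, highest/lowest) for a list of individual exponents."""
    # Sample standard deviation (n - 1 denominator) is an assumption here.
    mean_sd = mean(exponents) / stdev(exponents)
    high_low = max(exponents) / min(exponents)
    return mean_sd, high_low

def meets_benchmarks(exponents):
    """True when both indices satisfy the benchmark values."""
    mean_sd, high_low = variability_indices(exponents)
    return mean_sd >= MIN_MEAN_SD and high_low <= MAX_HIGH_LOW

# Hypothetical exponents for illustration only, not data from any experiment.
tight_group = [0.55, 0.60, 0.62, 0.66]
loose_group = [0.30, 0.60, 0.90]

for label, group in (("tight", tight_group), ("loose", loose_group)):
    mean_sd, high_low = variability_indices(group)
    print(label, round(mean_sd, 2), round(high_low, 2), meets_benchmarks(group))
```

Note that the two indices move in opposite directions: lower intersubject variability raises mean/sd and lowers the high/low ratio.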
Also, although it is common in ME experiments for some subjects to produce reversals (exponent values that violate expected directional predictions), if constrained scaling is to produce valid results at the level of the individual, reversals should ideally be absent.

In terms of creating an associative scale of conscious perception, there is no set criterion as to the level of consistency required between inputs and outputs. Given that subjects will probably not learn the standard scale perfectly, some individual variation should be expected when novel stimuli are presented, even if large numbers of trials are used. However, with any measurement, noise due to the imperfection of the measuring instrument is a factor. What I hope to show in this dissertation is that, through the use of constrained scaling, individual psychophysical differences can be consistently reduced to an unprecedented level (i.e. below the benchmarks), without compromising established psychophysical laws. Whether or not the variation is reduced sufficiently to create a stable indirect scale will be open to interpretation. Should these experiments be successful, further research can be undertaken to further reduce individual differences.

What strikes the scientist as a reasonable degree of accuracy varies widely from field to field, and even from problem to problem. Approximations become the rule at the forward edge of any advancing science, and the accepted notion of what degree of accuracy will qualify as reasonable may alter as the art of measurement evolves. (S.S. Stevens, 1975, p. 268)

Overview of Experiments

In addition to the main experiments, numerous pilot studies were done and are referred to when they were the basis for various decisions. All experiments were analyzed according to two criteria: 1) Did constrained scaling produce results equal to or better than the benchmarks for intersubject variability for the novel stimuli?
2) Did constrained scaling produce results consistent with established ME and CMM findings?

Part 1: Achieving the Phenomenon: Intramodal Constrained Scaling

Experiments 1 to 4 were attempts to get constrained scaling to work in the auditory domain, using 1000 Hz tones, an exponent of 0.6, and a response range of 1 to 100 as the training continuum. The testing continuum consisted of tones with frequencies known to produce exponents that differ from those of the 1000 Hz training tones. There were a number of reasons, which are discussed in Experiment 1, why these conditions seemed favorable for constrained scaling. In the end, a successful application of constrained scaling was attained. Several important findings with relevance to constrained scaling and to ME were uncovered.

Part 2: Cross-Modality Applications: Intermodal Constrained Scaling

Experiment 5 was an attempt to apply constrained scaling cross-modally. In this case the same training regimen as in Experiments 1 to 3 was used (i.e. 1000 Hz tones) but the test continuum was brightness instead of loudness. This experiment was critical for establishing that constrained scaling could be used to create the interlocking set of cross-modality results necessary to establish legitimate scales. Also, if it is assumed that Stevens was successful at informally constraining his scales at the aggregate level, then fixing subjects on the exponent that Stevens attained for loudness (i.e. 0.60) should also lock them onto the exponent that Stevens attained for brightness (i.e. 0.30). Therefore, it was possible to test the hypothesis that constrained scaling eliminates interlab differences.

Part 3: Stimulus Range Effects

For ME, Stevens always used a range of stimulus intensities that was close to the full range, but not uncomfortably intense or too difficult to detect for the subject (Stevens, 1975).
This type of procedure, which is very common in ME, has the effect of making the bottom and top of the stimulus range close to the bottom and top of the subject's perceptual range. An important effect of anchoring the stimulus range to the subject's own subjective range may be to provide the subject with a familiar context within which to make judgments. Experiments 6 and 7 examined the importance of this highly familiar context for the constrained scaling process by using constrained scaling to test subjects on various stimulus intensity ranges.

Part 4: Different Exponents (i.e. Different Response Ranges)

Marks, Galanter & Baird (1995) demonstrated that the exponent value of the function that subjects were trained on had no effect on loudness summation at the group level. However, because their subjects did not remain calibrated, it is still unclear if certain exponents are more natural to use, as suggested by the Canonical Model. Experiments 8 and 9 investigated this question by using the auditory paradigm of Experiment 3 but training subjects on exponents of 0.30 and 0.90, instead of 0.60.

Part 5: Nonperceptual Stimuli: Extramodal Constrained Scaling

Experiment 9 extends constrained scaling beyond the perceptual domain and into the social, cognitive domain, by examining the magnitude of happiness resulting from winning various amounts of money. Specifically, subjects were trained using the standard auditory scale (i.e. 1000 Hz tones) and then asked to use that scale to rate how happy they would be if they won various amounts of money in a lottery. The social domain differs from the perceptual domain in that we should expect some legitimate individual differences. If constrained scaling can be extended beyond the perceptual domain it may be possible to detect these differences in a reliable way.
This would have far-reaching consequences for social, personality, and clinical psychology, which currently lack the ability to clearly differentiate real individual differences from differences in response styles.

Part 1: Achieving the phenomena

Experiment 1

In this experiment observers were trained using 1000 Hz pure tones and then tested on tones of 65 Hz, 100 Hz, 1000 Hz, and 8000 Hz without feedback. The auditory modality was chosen because research has indicated that it tends to produce less intrasubject variability compared to other modalities (Teghtsoonian & Teghtsoonian, 1983). According to previous research establishing equal loudness contours for these frequencies (see Ward, 1990, for a summary), 65, 100, and 8000 Hz tones should be perceived as less loud than 1000 Hz tones of the same physical intensity. Moreover, the exponents of the psychophysical functions describing the 65, 100, and 8000 Hz tones should be higher than the exponent of the psychophysical function describing the 1000 Hz tones. It was predicted that, using the constrained scaling method, all subjects would produce this pattern of results. Furthermore, it was also predicted that the intersubject variability for each frequency would be below the benchmark levels set out in the introduction.

Subjects

Four volunteers were paid to participate. All claimed to have normal hearing and there was no evidence of hearing abnormalities during the task. None had participated in a scaling experiment before.

Apparatus

Subjects were seated in a dimly lit sound attenuation chamber and the tones, which were 1 second in duration with a 2.5 msec rise and fall time, were monaurally presented to them through high quality headphones. Subjects estimated the loudness of the tones by entering a number on a computer keyboard. The number was displayed on a two-digit LED display for verification or correction by the subject.
Following verification the estimate would be replaced by either feedback indicating the correct response or a code indicating no feedback. The tones were produced by a custom-built sound generator controlled by a computer.

Procedure

Observers were first trained to respond to different intensities of a 1000 Hz pure tone according to the power function

R = 16.6 S^{0.6} (9)

where R is the correct response and S is the sound pressure in dynes/cm^2. The exponent was set to 0.60 to make the scale consistent with Stevens's (1975) sone scale, which is an excellent candidate for a standard scale of loudness, and the multiplicative constant was set to 16.60 so that R=100 would result from an S equivalent to approximately 100 dB SPL. Before the training, observers were instructed that they would be learning to use a particular number scale to judge the loudnesses of pure tones. After each tone observers estimated its loudness by entering a number on a computer keyboard. The entered number was then displayed on the LED display for verification or correction by the observer. Following this, the correct response (feedback) was presented on the same display. At a signal from the observer (using the keyboard) the next tone was presented and the process began again. Observers were able to respond using whole numbers from 1 to 99. The training tones always corresponded to a whole number between 1 and 99 on the observers' response scale, resulting in a stimulus range from approximately 33.3 dB SPL (response "1") to slightly below 100 dB SPL (response "99"). The sound pressure of the tones was varied by randomly selecting values of R (i.e. randomly selecting the feedback) and then presenting the corresponding sound pressure level for judgment. This was done to make the pattern of sound pressures random with respect to the scale that observers were learning. Because individual stimuli were selected at random there were seldom repeats of particular sound pressures.
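The training function and the stimulus selection just described can be sketched in a few lines. This is a minimal illustration, not the software used in the experiment. It reads Equation 9 as R = 16.6 S^0.6, which reproduces the stated range (response 1 near 33.3 dB SPL, response 99 just under 100 dB SPL), and it assumes the conventional reference pressure of 0.0002 dynes/cm^2 for 0 dB SPL. Function names and the random seed are hypothetical.

```python
import math
import random

K = 16.6        # multiplicative constant of the training function (Equation 9)
EXPONENT = 0.6  # training exponent (Equation 9)
P_REF = 0.0002  # dynes/cm^2 at 0 dB SPL; conventional reference, assumed here

def correct_response(pressure):
    """Feedback value R for a sound pressure S in dynes/cm^2."""
    return K * pressure ** EXPONENT

def pressure_for_response(response):
    """Invert the training function: the pressure whose feedback is `response`."""
    return (response / K) ** (1.0 / EXPONENT)

def db_spl(pressure):
    """Convert a sound pressure in dynes/cm^2 to dB SPL."""
    return 20.0 * math.log10(pressure / P_REF)

def training_trial(rng):
    """Pick the feedback at random (whole number 1..99), then derive the tone."""
    feedback = rng.randint(1, 99)
    return pressure_for_response(feedback), feedback

rng = random.Random(0)  # hypothetical seed, only to make the sketch repeatable
pressure, feedback = training_trial(rng)
print(feedback, round(db_spl(pressure), 1))
print(round(db_spl(pressure_for_response(1)), 1),
      round(db_spl(pressure_for_response(99)), 1))
```

Sampling the feedback value uniformly, rather than the pressure, spaces the stimuli according to the power function itself, which is the property the following discussion contrasts with conventional logarithmic spacing.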
This procedure contrasts with the usual procedure in psychophysical scaling in which relatively few stimulus intensity levels are selected for repeated presentation and the spacing of the stimuli is related to the physical scale by a log function, rather than a power function as in this case. However, it was not possible to use a few selected intensities since their identity would have been revealed through the feedback. Observers were each given 210 trials to learn the function.

In the second part of the experiment observers were instructed to use the scale they had learned to judge the loudnesses of the test tones. These tones were presented in 30-trial blocks of no-feedback trials in which both the frequency (65, 100, 1000, or 8000 Hz) and sound pressure were selected at random. The random selection of sound pressure was done using the same method as in the learning trials so that the sound pressure spanned the same range for all frequencies. Therefore, according to the research on equal loudness contours, some of the 65, 100, and 8000 Hz tones should have been perceived as below the scale value associated with a response of 1, and could also occasionally occur below threshold. Because of this, subjects were instructed to respond with a 1 to anything equal to or less than 1, including 0. The test blocks were alternated with blocks of 30 feedback trials in which observers judged 1000 Hz tones of random sound pressure followed by feedback (these blocks were identical to the learning trials). The purpose of the feedback blocks was to refresh the observers' memory of the scale at regular intervals (Mori and Ward, 1995). Observers were given a break after 210 trials and then repeated the process for a total of 420 test trials, 210 (7 blocks of 30) with feedback and 210 without.
Results and Discussion

Because of the stimulus spacing it was more appropriate to fit power functions directly to the raw data rather than using the more traditional method of linear regression on the logarithms of the stimulus and response amplitudes. The simplex algorithm available in the Systat 5.02 for Windows statistical package was used to fit power functions directly to the raw data. Also, before analyzing the data all responses equal to 1 were discarded, due to the ambiguous nature of this response. During the learning phase the four subjects produced exponents of 0.50, 0.55, 0.47, and 0.45, respectively. The average exponent was 0.50, the mean/sd was 11.25, and the ratio of highest to lowest exponent was 1.23:1. These figures indicate that the subjects were well calibrated across the learning phase: the mean/sd and the highest to lowest exponent ratio both indicated a level of intersubject variability much lower than the benchmark figures (mean/sd=5.50, hi/low exponent ratio=1.60:1), although this should be no surprise as subjects were receiving feedback on every trial.

Table 2 gives the results of the simplex estimation procedure for the second part of the experiment. For the blocks of refresher trials (the 1000 Hz tones, with-feedback trials), subjects seemed less able to maintain the exponent of 0.6 but interestingly remained well calibrated to each other at a lower exponent. The mean exponent was 0.45, the mean/sd was 12.82, and the highest to lowest exponent ratio was 1.20:1. On the 1000 Hz tones without feedback the mean exponent dropped even further and subjects appeared to be less calibrated. The mean exponent was 0.37, the mean/sd was 5.83, and the highest to lowest ratio was 1.45:1.

Table 2. Test trial results for Experiment 1.
Analysis   S                    S                    S
Block      F                    NF                   NF
Subject    Exponent  Corrected  Exponent  Corrected  Exponent  Corrected
           1000 Hz   R^2        1000 Hz   R^2        8000 Hz   R^2
1          0.46      0.77       0.38      0.77       0.39      0.62
2          0.43      0.71       0.31      0.57       0.45      0.63
3          0.41      0.71       0.33      0.65       0.46      0.69
4          0.49      0.75       0.45      0.73       0.52      0.62
mean       0.45                 0.37                 0.46
sd         0.04                 0.06                 0.05
m/sd       12.83                5.83                 8.92
h/l        1.20                 1.45                 1.31

Analysis   S                    S
Block      NF                   NF
Subject    Exponent  Corrected  Exponent  Corrected
           100 Hz    R^2        65 Hz     R^2
1          0.30      0.50       0.32      0.37
2          0.42      0.67       0.37      0.59
3          0.33      0.74       0.41      0.70
4          0.44      0.61       0.55      0.68
mean       0.37                 0.41
sd         0.07                 0.10
m/sd       5.33                 4.14
h/l        1.48                 1.73

S: R = K S^B   F: feedback block   NF: no feedback block

Comparing the other frequencies to the 1000 Hz no-feedback results produced mixed results. As predicted, all subjects gave higher exponents for the 8000 Hz tones. Also, subjects were well calibrated on the 8000 Hz tones relative to the benchmarks. The mean/sd was 8.92, and the highest to lowest ratio was 1.31:1. However, for the 65 and 100 Hz tones some subjects gave exponents lower than the 1000 Hz no-feedback exponents, while others gave higher exponents (i.e. there were reversals, see Table 2). Subjects also appeared to be less calibrated on these frequencies and, except for the highest to lowest exponent ratio for the 100 Hz tones, failed to meet the benchmarks set out in the introduction, although they did come closer than is typical for ME (compare to the sample ME results in Table 1). For the 100 Hz tones the mean/sd was 5.36, and the highest to lowest ratio was 1.47:1. For the 65 Hz tones, the mean/sd was 4.14, and the highest to lowest ratio was 1.73:1.

Although subjects did not always provide exponents in the predicted directions, their responses nevertheless indicated that they consistently found the 65, 100, and 8000 Hz tones to be less intense than the 1000 Hz tones at the same sound pressures.
For individual subjects, t-tests revealed that responses to the 1000 Hz tones were significantly higher than responses to the 65, 100, and 8000 Hz tones (p<.001 except for the difference between the 100 Hz responses and the 1000 Hz responses of subject 3, which was significant at p=0.014). Thus this first attempt at constrained scaling did reproduce the basic finding that 1000 Hz tones are perceived as more intense than 65, 100, and 8000 Hz tones of the same sound pressure amplitudes, in all of the subjects. Figure 4 displays the fitted functions for the 65, 100, 1000, and 8000 Hz tones for subject 3, who produced the highest R^2 values (as Experiment 1 was more in the nature of a pilot study, only the data from one subject are presented).

Figure 4. Experiment 1: Fitted Power Functions for an Individual Subject. [Panels include 65 Hz and 100 Hz.]

Experiment 2

Experiment 2 emended the procedure used in Experiment 1 in several ways to make the task easier for subjects and improve the results. In particular, the number of frequencies used in the test blocks was reduced from four (8000, 1000, 100, and 65 Hz) to two (1000 Hz and 100 Hz). This was done to reduce any confusion arising from having to keep track of multiple scales.

Subjects

Four volunteers were paid to participate. All claimed to have normal hearing and there was no evidence of hearing abnormalities during the task. None had participated in a scaling experiment before.

Procedure

The procedure was the same as in Experiment 1 except that only a 1000 Hz tone and a 100 Hz tone were used during the test phase. This was done to reduce any confusion that subjects might experience when trying to keep track of multiple frequencies. Also the apparatus was altered so that subjects were able to respond using whole numbers from 1 to 99, or 0 in the event they heard nothing, or the letter L to indicate a judgment of less than 1 but greater than 0.
This was done in order to make it clear to subjects that the tones could go below 1 and to retain the responses equal to 1 in the analysis. Subjects were instructed that the 100 Hz tones would go below a scale value of 1 and might be inaudible.

Results and Discussion

Once again the simplex algorithm (on Systat 5.02 for Windows) was used to fit power functions directly to the raw data. Of course, before analyzing the data all responses below 1 were discarded (either L or 0) as they carried no information regarding the power function. During the learning phase the four subjects produced exponents of 0.50, 0.53, 0.61, and 0.59. In terms of the intersubject variability the results were almost identical to those found in the learning trials of Experiment 1. The mean/sd was 11.14 and the ratio of highest to lowest exponent was 1.24:1.

Table 3 gives the results from the second part of the experiment. As can be seen from the 1000 Hz with-feedback refresher trials, this time subjects were able to stay quite close to the learned scale as long as they had feedback (mean exponent = 0.59, see Table 3), indicating that the apparent drop in the exponent which occurred in this condition in Experiment 1 may have been due to the overall greater complexity of the task. The results from this condition also indicate an extremely low level of intersubject variability. The mean/sd was 36.88 and the ratio of highest to lowest exponent was 1.06:1. On the 1000 Hz without-feedback trials subjects dropped to a mean exponent of 0.47, although they remained remarkably well calibrated. The mean/sd was 31.00 and the ratio of highest to lowest exponent was 1.08:1. The fact that subjects remained closely calibrated indicates that the drop was probably not caused by subjects returning to their own idiosyncratic response preferences. Instead it would appear that subjects were affected in the same way by the removal of feedback and the additional 100 Hz tone.
Of course, four subjects is a small sample and it is possible that these results could have occurred by chance. However, it is interesting to note that Marks, Galanter, & Baird (1995) found that when feedback was removed, subjects who had been trained to respond to 500 Hz tones according to a power function with an exponent of 0.60 fell to an average exponent of 0.45 as measured by monaural magnitude estimation, and 0.44 as measured assuming loudness summation for binaural stimuli (they did not report the exponents of individual subjects for this task). From a black box perspective (i.e. the m function, see Equation 7), exactly the same process occurred in this experiment as in Marks, Galanter, & Baird (1995), suggesting a common cause for this result. This issue will be pursued further in the discussion section of this dissertation.

Table 3. Test trial results for Experiment 2.

Analysis   S                    R                    S                    R
Block      F                    F                    NF                   NF
Subject    Exponent  Corrected  Exponent  Corrected  Exponent  Corrected  Exponent  Corrected
           1000 Hz   R^2        1000 Hz   R^2        1000 Hz   R^2        1000 Hz   R^2
1          0.57      0.82       0.63      0.82       0.48      0.74       0.62      0.77
2          0.60      0.85       0.60      0.83       0.49      0.82       0.47      0.74
3          0.61      0.85       0.59      0.84       0.45      0.83       0.49      0.78
4          0.58      0.84       0.62      0.85       0.47      0.80       0.64      0.85
mean       0.59                 0.61                 0.47                 0.55
sd         0.02                 0.02                 0.01                 0.09
m/sd       36.88                37.94                31.00                6.44
h/l        1.06                 1.06                 1.08                 1.35

Analysis   S                    R
Block      NF                   NF
Subject    Exponent  Corrected  Exponent  Corrected
           100 Hz    R^2        100 Hz    R^2
1          0.41      0.43       0.58      0.46
2          0.37      0.70       0.55      0.72
3          0.39      0.71       0.67      0.68
4          0.57      0.57       0.77      0.72
mean       0.43                 0.64
sd         0.09                 0.10
m/sd       4.81                 6.43
h/l        1.53                 1.40

S: R = K S^B   R: log(R) = B*log(S) + log(K)   F: feedback block   NF: no feedback block

Unfortunately, although subjects remained calibrated with each other on the 1000 Hz no-feedback tones, albeit to a lower exponent, the simplex estimates of the exponents for the 100 Hz tones were substantially more variable. Figure 5 displays the fitted functions for the 100 Hz trials.
The mean/sd was 4.81 and the ratio of highest to lowest exponent was 1.53:1 (benchmarks: mean/sd=5.50, hi/low exponent ratio=1.60:1). Also, the simplex estimates for the 100 Hz exponents were all lower than the 1000 Hz exponents, the opposite of what was predicted. However, when the log of the responses was linearly regressed against the log of the stimulus intensities according to the function,

log(R) = B log(S) + log(K)    (10)

except for subject 1, the differences in exponents were in the predicted direction (see Table 3). Also the intersubject variability for the 100 Hz tones was reduced. The mean/sd was 6.43 and the ratio of highest to lowest exponent was 1.40:1. For the 1000 Hz tones the intersubject variability was substantially increased. The mean/sd was 6.44 and the ratio of highest to lowest exponent was 1.35:1. However, the intersubject calibration results for the 1000 Hz with-feedback trials were unaffected by the change in analysis (see Table 3).

Overall, an increase in intersubject variability was expected. Figure 6 illustrates the problem using the 1000 Hz no-feedback trials of subject 2. It displays the simplex and regression functions plotted as response against stimulus and as log response against log stimulus. As can be seen, a likely reason that the regression analysis failed to show the same level of calibration on the 1000 Hz without-feedback tones is that the lower end of the stimulus range was no longer sufficiently sampled when both axes were logged (this can also explain the aberrant results of subject 1). However, the failure of the simplex estimates to produce the expected relationship between exponents is more problematic. It may be the case, at least under these conditions, that the relationship between the exponents is method dependent. It was also interesting that the regression approach seemed, if anything, to lower the intersubject variability for the 100 Hz tones.
This would suggest that subjects were better calibrated on the less intense sound pressure levels, which were more heavily weighted in the regression analysis. Similarly, the fact that the regression approach raised the intersubject variability for the 1000 Hz tones without feedback but not for the 1000 Hz tones with feedback suggests that feedback is particularly important for the less intense tones.

Figure 5. Experiment 2: Fitted Functions for the 100 Hz Test Trials. [Panels: Subjects 1 and 2; Subjects 3 and 4. Horizontal axis: sound pressure (dynes/cm^2).]

Figure 6. Nonlinear and linear fitted functions for the 1000 Hz, no-feedback trials of subject 2. [Panels: nonlinear simplex estimate; linear regression estimate. Horizontal axes: sound pressure (dynes/cm^2) and log sound pressure (log dynes/cm^2).]

Experiment 3

In Experiment 3 the same general approach as in Experiments 1 and 2 was employed, but with several important changes in the method. Because the regression analysis in Experiment 1 produced the predicted relationship between exponents, whereas the nonlinear curve fitting approach did not, Experiment 3 returned to the more traditional psychophysical practice of spacing the stimuli according to a logarithmic function and estimating the exponents according to Equation 10. To eliminate the drop in the 1000 Hz exponent found in Experiments 1 and 2 when feedback was removed, feedback was given more frequently, on every second trial. The apparatus was also improved so that subjects could enter responses below 1, and 65 Hz tones were used instead of 100 Hz tones on the test trials in order to create a stronger contrast with the standardized 1000 Hz tones. Additionally, all subjects were run in a baseline condition in which the 65 Hz tones were replaced with 1000 Hz tones.
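The estimation step adopted for Experiment 3, fitting Equation 10 by ordinary least squares in log-log coordinates, can be sketched as follows. The function name and the example data are hypothetical (the data are generated noise-free from R = 10 * S^0.60 purely for illustration), and the sketch is not the Systat routine used in the dissertation. Responses of 0 cannot be logged and would have to be excluded or handled separately before calling it.

```python
import math


def fit_power_function(stimuli, responses):
    """Least-squares fit of log(R) = B*log(S) + log(K); returns (B, K)."""
    xs = [math.log10(s) for s in stimuli]
    ys = [math.log10(r) for r in responses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope = exponent B
    k = 10 ** (mean_y - b * mean_x)      # intercept = log10(K)
    return b, k


# Hypothetical, noise-free data: sound pressures in dynes/cm^2 and
# responses generated from R = 10 * S**0.60.
stimuli = [0.01, 0.1, 1.0, 10.0]
responses = [10 * s ** 0.60 for s in stimuli]
exponent, constant = fit_power_function(stimuli, responses)
print(f"B = {exponent:.2f}, K = {constant:.2f}")
```

Because the fit is carried out on logged values, log-equally spaced stimuli receive equal weight, whereas fitting R = K*S^B directly to raw data weights the more intense stimuli more heavily.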
Subjects

Six volunteers participated for pay. All claimed to have normal hearing and there was no evidence of hearing abnormalities during the task. None had participated in a scaling experiment before.

Apparatus

The apparatus was the same as Experiment 1 except that the LED was replaced with a computer screen and the keyboard was replaced with a mouse. A program written in Visual Basic for DOS was used to allow subjects to enter their responses by using a mouse to manipulate a specially designed scroll bar that appeared on a computer monitor screen. The scroll bar (which was 25 cm long) allowed subjects to move the scroll bar cursor by 1, 10, or 0.1, and to access all numbers from 0 to 99.9 with a precision of one decimal place. A text box on the screen displayed the digital value the scroll bar was set at. To receive a tone the subject used the mouse to click on a button marked PLAY TONE. After subjects had indicated their response by using the scroll bar they used the mouse to click on a screen button marked OK, at which point their response was replaced with feedback. The feedback was accurate to five digits, as many as would fit in the box, to encourage subjects to use the full three digits available to them for their responses.

Procedure

The learning phase was the same as in Experiments 1 and 2, except that the number of learning trials was reduced to 100 trials. In the second part of the experiment, without-feedback trials were alternated with with-feedback trials and only a single frequency was used for the test stimuli. Thus subjects would receive a test stimulus, provide a response, receive no feedback, receive a trained 1000 Hz stimulus, provide a response, receive feedback, and so on. The method for selecting the sound pressures was to randomly select dB values between 33 dB and 99 dB, so some of the 65 Hz tones would have been perceived as below the scale value corresponding to 1 and a few would have occurred below threshold.
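The stimulus selection just described can be illustrated with a short sketch. Two things here are assumptions rather than details given in the text: that the dB values were drawn uniformly, and that they are dB SPL relative to the conventional reference pressure of 0.0002 dynes/cm^2 (under which 100 dB corresponds to 20 dynes/cm^2). The function names are hypothetical.

```python
import random

# Assumption: levels are dB SPL re 0.0002 dynes/cm^2 (the conventional reference).
REFERENCE_PRESSURE = 0.0002  # dynes/cm^2


def db_to_pressure(level_db):
    """Convert a sound level in dB to sound pressure in dynes/cm^2."""
    return REFERENCE_PRESSURE * 10 ** (level_db / 20)


def sample_levels(n_trials, low_db=33.0, high_db=99.0, seed=None):
    """Draw n_trials levels uniformly in dB (uniform sampling is an assumption)."""
    rng = random.Random(seed)
    return [rng.uniform(low_db, high_db) for _ in range(n_trials)]


levels_db = sample_levels(5, seed=1)
pressures = [db_to_pressure(level) for level in levels_db]
for level, pressure in zip(levels_db, pressures):
    print(f"{level:5.1f} dB -> {pressure:.4f} dynes/cm^2")
```

Sampling uniformly in dB yields stimuli that are evenly spread on a logarithmic pressure axis, which is the log spacing that Equation 10 assumes.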
Subjects were told that the 1000 Hz tones would always fall in the response range of 1 to 100, while the 65 Hz tones could require responses as low as 0 and as high as 99.9. The same subjects were also run in a baseline condition in which the 65 Hz tones were replaced with 1000 Hz tones. Other than this change the procedure was identical, including the use of 100 training trials prior to testing. Subjects completed the experimental condition and the baseline condition on different days. The order of testing was counterbalanced: half of the subjects received the baseline condition first while the other half received the experimental condition first.

Results and Discussion

For all power function analyses the log of the response was linearly regressed against the log of the stimulus according to Equation 10. Combining both learning sessions, the individual exponents were 0.56, 0.60, 0.59, 0.57, 0.589, and 0.60. The level of intersubject variability for this phase was substantially reduced from Experiments 1 and 2. The mean/sd was 36.63 and the ratio of the highest to lowest exponent was 1.03:1 (Experiment 1: mean/sd=11.25, hi/low exponent=1.23:1; Experiment 2: mean/sd=11.14, hi/low exponent=1.24:1). This improvement can be accounted for by subjects learning to exploit the logarithmic spacing of the stimuli. Traditionally, ME has employed log-equally spaced stimuli; however, it is rarely noted that this stimulus spacing should appear uneven (i.e. a preponderance of less intense stimuli) to subjects, if they truly experience sensation magnitude according to the power law. Figure 7 displays the learning curves for subjects in the learning phase in Experiment 2 (line A) and subjects in their initial learning phase in Experiment 3 (line B). Each point in the figure is the mean of the median errors of each subject in blocks of 10 trials.
The graph illustrates that extensive training with feedback had no discernible benefit beyond the first 10 trials when the stimuli were spaced according to a power function with an exponent of 0.60 (line A), but a substantial effect over the first 30 trials when the stimuli were spaced according to a log-equal function (line B). These results suggest that naive subjects initially respond with the expectation that the stimuli will be evenly spaced according to a power function, as in Experiments 1 and 2.

For the baseline test trials and the actual test trials there was no discernible effect of order. For the baseline test trials, the 1000 Hz with-feedback trials produced a mean exponent value of 0.54. The mean/sd was 22.54 and the highest to lowest exponent ratio was 1.11:1. For the 1000 Hz without-feedback trials, the mean exponent value was 0.54, the mean/sd was 14.13 and the highest to lowest exponent ratio was 1.21:1. Although an F-test for variance revealed no significant differences between the with-feedback and without-feedback trials, it was interesting to note that the intersubject variability for the without-feedback trials appeared higher. One might have expected this condition to produce less variable results since subjects had the benefit of just having received feedback on the previous 1000 Hz tone, whereas for the with-feedback trials subjects did not receive feedback on the previous tone. These results suggest that subjects were not sensitive to this potential advantage.

The results of the test trials are displayed in Table 4. The mean exponent values were 0.55 for the 1000 Hz trials and 0.83 for the 65 Hz trials. For all subjects, as predicted, the 65 Hz tones produced exponents higher than the exponents produced by the 1000 Hz tones with which they were alternated. A t-test revealed that the difference in exponent values was significant at p<.001.
In terms of intersubject variability both the 1000 Hz trials and the 65 Hz trials exceeded the intersubject variability benchmarks (mean/sd=5.50, hi/low exponent ratio=1.60:1). For the 1000 Hz tones the mean/sd was 15.31, and the ratio of highest to lowest exponent was 1.21:1. For the 65 Hz tones the mean/sd was 8.22 and the ratio of highest to lowest exponent was 1.36:1.

Table 4. Test trial results for Experiment 3.

Analysis     R                          R
Subject #    Exponent  Corrected R^2    Exponent  Corrected R^2
             1000 Hz                    65 Hz
1            0.54      0.83             0.82      0.89
2            0.59      0.87             0.96      0.85
3            0.49      0.84             0.71      0.87
4            0.57      0.89             0.70      0.92
5            0.58      0.83             0.93      0.89
6            0.54      0.88             0.88      0.93
mean         0.55                       0.83
sd           0.04                       0.11
m/sd         15.31                      8.22
h/l          1.09                       1.36

Analysis     RT                                               RC
Subject #    Exponent  Corrected R^2    Estimated Threshold   Exponent  Corrected R^2
             65 Hz                                            65 Hz
1            0.78      0.90             0.01                  0.70      0.87
2            0.87      0.87             0.01                  0.73      0.75
3            0.64      0.89             0.01                  0.56      0.84
4            0.66      0.92             0.01                  0.67      0.90
5            0.85      0.90             0.01                  0.75      0.87
6            0.85      0.93             0.00                  0.80      0.85
mean         0.77                                             0.70
sd           0.10                                             0.08
m/sd         7.58                                             8.75
h/l          1.36                                             1.42

R: log(R)=B*log(S)+log(K)
RT: log(R)=B*log(S-T)+log(K)
RC: log(R)=B*log(S)+log(K); R>=1

The 65 Hz tones also exhibited a tendency for the slope to increase below a response of 1, as is typical of near threshold responses (e.g. see Stevens, 1975). Figure 8 shows the raw data of individual subjects for the 65 Hz tones with best fitting lines according to the function,

log(R) = B log(S - T) + log(K)    (11)

which Stevens used to account for the deviation from the power law near threshold. The reasoning behind this function is that subjects use threshold (T) as zero, which causes a distortion since power law functions must pass through the zero point associated with the physical absence of the stimulus. Subtracting T from S corrects for this problem and, in general, accounts for the deviation near threshold (Stevens, 1975). From Figure 8 it is clear that this function does a reasonably good job of fitting the data.
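One way to fit Equation 11 is sketched below: for each candidate threshold T, regress log(R) on log(S - T) and keep the T with the smallest squared error. This grid search is an illustrative substitute and not the fitting procedure used in the dissertation; the function names, the grid resolution, and the example data (generated noise-free with B = 0.77, K = 10, T = 0.01) are all hypothetical.

```python
import math


def loglog_fit(xs, ys):
    """Least squares on already-logged data; returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def fit_with_threshold(stimuli, responses, steps=200):
    """Grid-search T in [0, min(S)) for log(R) = B*log(S - T) + log(K).

    Returns (B, K, T) for the candidate T with the smallest squared error.
    """
    ys = [math.log10(r) for r in responses]
    smallest = min(stimuli)
    best = None
    for i in range(steps):
        t = smallest * i / steps          # candidates stay below min(S)
        xs = [math.log10(s - t) for s in stimuli]
        slope, intercept, sse = loglog_fit(xs, ys)
        if best is None or sse < best[3]:
            best = (slope, 10 ** intercept, t, sse)
    return best[0], best[1], best[2]


# Hypothetical noise-free data generated with B = 0.77, K = 10, T = 0.01.
stimuli = [0.02, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0]
responses = [10 * (s - 0.01) ** 0.77 for s in stimuli]
b_hat, k_hat, t_hat = fit_with_threshold(stimuli, responses)
print(f"B = {b_hat:.2f}, K = {k_hat:.2f}, T = {t_hat:.3f}")
```

Setting T to 0 reduces the fit to Equation 10, so comparing the two fits shows how much of the steeper slope below a response of 1 is absorbed by the threshold term.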
For these results the mean exponent was 0.77 and according to a t-test the exponents were significantly different from the 1000 Hz exponents at p<.001. In terms of intersubject variability the mean/sd was 7.39, and the highest to lowest exponent ratio was 1.33:1. Both of these figures indicate a level of intersubject variability below that indicated by the benchmarks.

Yet another way to analyze the data is to discard subjects' responses below response=1, on the basis that subjects were not trained to respond below 1 and therefore might exhibit idiosyncratic tendencies in this range, especially considering the fact that the near threshold deviation from the power law occurred in this range. Therefore, the 65 Hz data were reanalyzed, excluding responses below 1, according to Equation 10. The mean 65 Hz exponent was 0.70, the mean/sd was 8.75, and the highest to lowest exponent ratio was 1.42:1. Once again, a t-test revealed that these 65 Hz exponents were significantly different from the 1000 Hz exponents at p<.001.

The intersubject variability across the different types of analyses used in this experiment is illustrated in Figure 9. As can be seen, the different methods of analysis are highly consistent, although there was a general tendency for the difference between the 1000 Hz exponents and the 65 Hz exponents to become smaller as the higher slope for the near threshold 65 Hz tones was either accounted for or left out of the analysis.

Following Ward (1990) and Marks (1974), the exponents for different sound frequencies can be approximated by the following equations,

F <= 400 Hz: B = 2(H + G(400 - F))    (12)

F > 400 Hz: B = 2H    (13)

where F is frequency, B is the exponent, and H and G are constants.
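Equations 12 and 13 can be written as a small function, and Equation 12 can be solved for G once an exponent above 400 Hz (which fixes 2H) and an exponent at a lower frequency are known. The sketch below uses the mean exponents reported above for Experiment 3 (0.55 at 1000 Hz; 0.83, 0.77 and 0.70 at 65 Hz under the three analyses). The function names are hypothetical.

```python
def exponent_for_frequency(freq_hz, h, g):
    """Equations 12 and 13: exponent B as a function of frequency F."""
    if freq_hz <= 400:
        return 2 * (h + g * (400 - freq_hz))
    return 2 * h


def solve_g(exponent_above_400, exponent_low, low_freq_hz):
    """Solve Equation 12 for G, taking 2H from an exponent above 400 Hz."""
    return (exponent_low - exponent_above_400) / (2 * (400 - low_freq_hz))


# Mean exponents from Experiment 3: 0.55 at 1000 Hz and three 65 Hz estimates.
for exponent_65 in (0.83, 0.77, 0.70):
    g = solve_g(0.55, exponent_65, 65)
    print(f"65 Hz exponent {exponent_65:.2f} -> G = {g:.4f}")

# Predicted 65 Hz exponent for a given G, holding the 1000 Hz exponent at 0.55.
h = 0.55 / 2
print(f"G = 0.0009 -> 65 Hz exponent {exponent_for_frequency(65, h, 0.0009):.2f}")
```

Since 400 - 65 = 335, each change of 0.0001 in G moves the predicted 65 Hz exponent by 0.067, so small differences in G correspond to visibly different exponents.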
Equation 12 describes a linear approximation of the relationship between frequency and exponent for frequencies <= 400 Hz, and can be rearranged into the more familiar Y=MX+B form,

F <= 400 Hz: B = (-2G)F + (2H + 2G(400))    (14)

As Equation 14 illustrates, -2G describes the slope of the increase in exponents for frequencies <= 400 Hz. Ward (1990) found a value for G equal to 0.0004. Using the exponents found in this experiment it was determined that G was equal to 0.0004 when the mean 65 Hz exponent was 0.83 (the first analysis), 0.0003 when the mean 65 Hz exponent was 0.77 (the second analysis), and 0.0002 when the mean 65 Hz exponent was 0.70 (the third analysis). These estimates are very close to Ward's (1990) estimates, illustrating that constrained scaling produces results consistent with established psychophysical methods. However, this is not to say that estimates of G do not suffer from the excessive variability that plagues psychophysics. Marks (1974), using equal loudness contours and the results of standard ME experiments from various labs (see Marks, 1974), estimated G to be approximately 0.0009 which, based on the 1000 Hz exponent found in this experiment (i.e. 0.55), predicts a 65 Hz exponent of 1.15.

Figure 7. Average median error for 10 trial blocks for the learning trials of Experiment 2 (line A) and Experiment 3 (line B).

Figure 8. Experiment 3: Raw 65 Hz data fitted according to log(R)=B*log(S-T)+log(K). [Panels by subject pair. Horizontal axis: log stimulus (log dynes/cm^2).]

Figure 9. Experiment 3: Individual exponent values. [Horizontal axis: frequency/analysis.]

Experiment 4

Experiment 4 employed the same apparatus and the same stimuli as in the learning condition of Experiment 3 to perform a standard ME experiment. This was done in order to ensure that the apparent difference in intersubject variability between a successful application of constrained scaling (i.e.
Experiment 3) and standard ME results (as represented in Table 1) was real.

Subjects

Six volunteers participated for pay. All claimed to have normal hearing and there was no evidence of hearing abnormalities during the task. Two of the subjects (subject numbers 3 and 4) had previously participated in short pilot studies (one each) involving training similar to that given in Experiment 3. None of the others had participated in a scaling experiment before.

Apparatus

The apparatus was the same as Experiment 3 except that the scroll bar was removed and subjects were provided with a keyboard. By using the mouse to click on the response text-box subjects were able to key in a response of any length.

Procedure

Subjects were instructed that they would hear 1000 Hz tones presented at a variety of intensities and that they should match the intensities of the tones to their own subjective impression of the intensity of number. Special care was taken to ensure that each subject fully understood the ME instructions. Examples were given and subjects were questioned in order to detect any misconceptions. The motivation of subjects was also noted; all seemed highly motivated and positive about the task. Each subject completed 100 trials.

Results and Discussion

Figure 10 shows the raw data of each subject in log-log coordinates with the best fitting line according to Equation 10 (note that the range of the response axis varies from subject to subject in order to accommodate the wide variety of response ranges). Individual exponent values are displayed in Table 5 and Figure 11. Figure 11 also includes the 1000 Hz with-feedback results of Experiment 3 as a reference. While the mean exponent for the ME results was close to 0.6 (mean exponent = 0.638), little meaning can be attached to this figure due to the high level of individual variability and deviations from the power law itself (see subjects 3 and 5, Figure 10).
In terms of intersubject variability, the mean/sd was 2.44, and the highest to lowest exponent ratio was 3.05:1. These results were typical of ME (see Table 3) and failed to come close to the benchmarks for intersubject variability (mean/sd=5.50, hi/low exponent ratio=1.60:1). Comparing these results to the Equation 10 analysis of the 65 Hz results from Experiment 3 (i.e. the no-feedback results), an F-test for variability revealed that constrained scaling produced significantly less intersubject variability (p=0.04) than standard, free ME.

Table 5. Test trial results for Experiment 4.

Analysis     R
Subject #    Exponent  Corrected R^2
             1000 Hz
1            0.71      0.88
2            0.33      0.83
3            0.75      0.62
4            1.00      0.80
5            0.34      0.66
6            0.71      0.75
mean         0.64
sd           0.26
m/sd         2.44
h/l          3.05

R: log(R)=B*log(S)+log(K)

Figure 10. Experiment 4: Fitted functions for 1000 Hz tones. [Panels: Subjects 1 and 2; Subjects 3 and 4; Subjects 5 and 6. Horizontal axis: log sound pressure (log dynes/cm^2).]

Figure 11. Experiment 4: Individual exponent values.

Part 2: Cross-Modality Applications

Experiment 5

Experiment 5 was done to demonstrate the feasibility of using constrained scaling to perform cross-modality matching. In this case the same training regimen as in Experiments 1 to 3 was used (i.e. 1000 Hz tones) but the test continuum was brightness instead of loudness. Also, if it is assumed that Stevens was successful at informally constraining his scales at the aggregate level, then fixing subjects on the exponent that Stevens attained for loudness (i.e. 0.60) should also lock them onto the exponent that Stevens attained for brightness. Based on this assumption it was predicted that each subject in this experiment would replicate Stevens' average result of an exponent of 0.30 for brightness.
If obtained, this result would imply that interlab differences in exponent values can be minimized by using constrained scaling.

Subjects

Eight volunteers participated for pay. All claimed to have normal hearing and there was no evidence of hearing abnormalities during the task. None of the subjects reported any visual system abnormalities other than those corrected for by wearing glasses. Subjects who wore glasses wore them during the experiment. Unlike Experiments 1 to 4, several subjects participated in more than one of Experiments 5 to 10; however, their results were indistinguishable from those of novice subjects.

Apparatus

The apparatus was the same as in Experiment 3 except that the colors of the computer monitor screen were altered (primarily to dark red on a black background) and the overall luminance of the monitor was reduced to make the interior of the sound attenuation chamber as dark as possible without making it too difficult for the subject to see the screen. The light stimuli were produced in the form of a dot, 6.5 centimeters in diameter and of approximately uniform luminance. It was positioned at eye level, directly in front of the subjects, and approximately 60 cm from their eyes. The intensities of the light stimuli were six equal-log spaced levels of luminance: 0.013, 0.760, 0.430, 2.400, 13.800, and 79.400 footlamberts. They were produced by a 565 nm wavelength LED embedded in diffusing plastic and controlled by varying the voltage across the LED. Light stimuli were presented for 1 sec with rise and fall times of less than 1 microsecond.

Procedure

The learning procedure was identical to the procedure in Experiment 3 except that subjects were only given 50 learning trials. This was based on the finding from Experiment 3 that subjects exhibited no significant evidence of further learning beyond approximately 20 to 30 trials (see Figure 7).
The testing procedure was also the same as in Experiment 3 except that the 65 Hz tones were replaced by the luminance levels, which were also presented randomly. Subjects performed 100 test trials alternating between sounds and lights (approximately 8 or 9 presentations for each luminance level).

Results and Discussion

Except for the light stimuli, the results were analyzed in the same way as Experiment 3. For the learning trials the mean exponent was 0.56. In terms of individual variability the mean/sd was 10.67 and the highest to lowest exponent ratio was 1.29:1 (these figures will be relevant for comparisons to later experiments). The results of the test trials are displayed in Table 6. The responses to the six luminance levels were analyzed by taking the mean response values for each luminance level and regressing the best fitting line through them. These fitted functions are displayed in Figure 12. Figure 13 displays subjects' individual exponents for loudness and brightness. For both modalities, the mean divided by the standard deviation and the highest to lowest exponent ratio were better than the benchmarks for intersubject variability (see Table 6). Additionally, a t-test revealed that the difference between the loudness exponents (mean = 0.534) and the brightness exponents (mean = 0.328) was significant at p<.001.

Table 6. Test trial results for Experiment 5.

Analysis     R                          R
Subject #    Exponent  Corrected R^2    Exponent  Corrected R^2
             1000 Hz                    lights
1            0.56      0.87             0.36      0.89
2            0.61      0.91             0.34      0.99
3            0.55      0.86             0.40      0.89
4            0.48      0.87             0.27      0.92
5            0.60      0.90             0.25      0.91
6            0.50      0.84             0.36      0.94
7            0.46      0.74             0.31      0.93
8            0.52      0.83             0.35      0.97
mean         0.53                       0.33
sd           0.06                       0.05
m/sd         9.37                       6.56
h/l          1.34                       1.59

R: log(R)=B*log(S)+log(K)

This experiment clearly demonstrates that subjects can extend the use of a learned scale across modalities.
Furthermore, by fixing subjects to the average exponent value Stevens typically found for loudness (approximately 0.60) it was possible to come very close to the average exponent value Stevens typically found for brightness (approximately 0.30, see Stevens, 1975), suggesting that Stevens was successful at informally constraining subjects at the aggregate level. Note also that, individually, each subject came close to the predicted exponent values of 0.60 for loudness and 0.30 for brightness.

Figure 12. Experiment 5: Fitted Functions for the Light stimulus. [One panel per subject. Horizontal axis: log brightness (footlamberts).]

Figure 13. Experiment 5: Individual exponent values.

Part 3: Range Effects

Experiments 6 and 7

For ME, Stevens always used a stimulus range that was close to the full range possible, but not uncomfortably intense or too difficult for subjects to detect (Stevens, 1975). This type of procedure, which is very common in ME, has the effect of making the bottom and top of the stimulus range close to the bottom and top of the subject's perceptual range. An important effect of anchoring the stimulus range to the subject's own subjective range may be to provide the subject with a familiar context within which to make judgments. Experiments 6 and 7 examined the importance of this highly familiar context for constrained scaling by using constrained scaling to test subjects on subranges of the stimulus continuum. On average, within a modality, as the range of the stimuli decreases, the estimated exponent of the power function increases (Poulton, 1968, 1989, Teghtsoonian and Teghtsoonian, 1978).
In an attempt to eliminate this range effect subjects were trained to respond according to an exponent of 0.60 using the same subranges to be later employed in the test phase. If the range effect was approximately the same size for the stimuli subjects were trained on and for the novel stimuli in the test phase, then training subjects to "undo" the range effect in the learning phase should also cause them to "undo" the range effect in the test phase. However, if the range effect for the novel stimuli was larger, then the training would be insufficient to undo it. Or, if the range effect for the novel stimuli was smaller, then the training would result in an overcompensation and an effect in the opposite direction.

Subjects

In each experiment seven volunteers participated for pay. All claimed to have normal hearing and there was no evidence of hearing abnormalities during the task.

Apparatus

The apparatus was the same as in Experiment 3.

Procedure

The procedure was identical to Experiment 3 (i.e. exponent = 0.60) except that subjects received only 50 learning trials and performed only 100 alternating test trials, similar to Experiment 5. Also the range of the 1000 Hz tones was reduced by changing the most intense tone from 100 dB to 90 dB in Experiment 6, and to 80 dB in Experiment 7. This was done for both the training trials and the test trials. Likewise the top of the range of the 65 Hz test tones was reduced to 90 dB (Experiment 6) and 80 dB (Experiment 7). The bottom of the stimulus range of the 65 Hz tones was also altered from 33 dB to 40 dB in both experiments in order to subjectively equalize the bottom of the 65 Hz range with the bottom of the 1000 Hz range. This was done in order to focus on subjective range differences occurring at the tops of the ranges and also to avoid the added complication of having subjects respond over a subjective range they had not been trained on (i.e. below response=1) and that deviates from the power law (see Experiment 3).
The range equalization was based on the results of a pilot study and was consistent with the research on equal loudness contours (e.g. see Ward, 1990). Subjects responded on the same 0 to 100 response scale but a marker was placed at response=50 in Experiment 6 and response=25 in Experiment 7. Subjects were told that the 1000 Hz tones would not be louder than indicated by the marker. Subjects were also told that the lower range of the 65 Hz tones would be around response=1, but to use their best judgment. This was similar to the procedure in Experiment 3 in which subjects were told that the bottom of the 65 Hz range could require a response as low as response=0. As in Experiment 3, subjects were free to respond below response=1.

Results

The results were analyzed in the same way as those of Experiment 3. Taking Experiment 6 first, for the learning trials the mean exponent value was 0.56, the mean/sd was 8.93 and the highest to lowest exponent ratio was 1.35:1. These results were very similar to the results from the learning trials of Experiment 5, which used a greater stimulus range but was otherwise the same, suggesting that the difference in range did not adversely affect intersubject variability as long as feedback was supplied. The results of the test trials are displayed in Table 7. The exponent values were derived by fitting Equation 10 to the raw data. The fitted functions for the 65 Hz tones are displayed in Figure 14. Both the 1000 and 65 Hz test trial results indicate a level of intersubject variability less than the benchmark criteria. The individual 1000 and 65 Hz exponents are displayed in Figure 15. As predicted, the 65 Hz exponents were higher than the 1000 Hz exponents for all subjects. Across subjects this was significant at p=.001, as indicated by a t-test. For the 65 Hz test trial results the most appropriate comparison was to the Experiment 3, 65 Hz results analyzed excluding responses below 1.
Comparing the results we find that in this experiment the mean 65 Hz exponent was lower and so was the intersubject variability. However, a t-test comparing exponent values revealed no significant difference (p=0.20 one tailed, p=0.40 two tailed). Likewise, an F-test for variance revealed no significant difference in intersubject variability (p=0.30).

Table 7. Test trial results for Experiment 6.

Analysis     R                          R
Subject #    Exponent  Corrected R^2    Exponent  Corrected R^2
             1000 Hz                    65 Hz
1            0.57      0.91             0.68      0.89
2            0.55      0.84             0.70      0.81
3            0.60      0.83             0.69      0.64
4            0.57      0.89             0.72      0.72
5            0.59      0.91             0.67      0.71
6            0.51      0.67             0.68      0.82
7            0.49      0.78             0.52      0.61
mean         0.56                       0.67
sd           0.04                       0.06
m/sd         13.53                      10.39
h/l          1.23                       1.37

R: log(R)=B*log(S)+log(K)

Figure 14. Experiment 6: Fitted Functions for the 65 Hz Test Trials. [One panel per subject. Horizontal axis: log stimulus (log dynes/cm^2).]

Figure 15. Experiment 6: Individual exponent values. [Horizontal axis: frequency.]

Examining the results for Experiment 7, for the 1000 Hz learning trials the mean exponent was 0.53, the mean/sd was 5.29, and the highest to lowest exponent ratio was 1.75:1. These results were quite poor compared to the learning trial results of previous experiments and the mean divided by the standard deviation did not meet the benchmark criterion. The results of the test trials are displayed in Table 8. As in Experiment 6 the exponent values were derived by fitting Equation 10 to the raw data. The fitted functions for the 65 Hz tones are displayed in Figure 16. The individual 1000 and 65 Hz exponents are displayed in Figure 17.
Similar to the 1000 Hz learning trials, both the 1000 Hz and 65 Hz test trials resulted in intersubject calibration levels substantially lower than those found in Experiments 3 and 6. Except for the mean divided by the standard deviation for the 1000 Hz test trials, the calibration indicators failed to meet the benchmarks (mean/sd >= 5.50, high/low <= 1.60). Also, subject 2 produced the reverse of the predicted pattern of exponents (i.e. the 65 Hz exponent was lower than the 1000 Hz exponent). F-tests for variance revealed that the intersubject exponent variability was significantly higher in Experiment 7 (top of range=80 dB) compared to Experiment 6 (top of range=90 dB), for both the 1000 Hz test trials (p=0.05) and the 65 Hz test trials (p=0.04), indicating that the reduction in range resulted in a decrement in subjects' abilities to maintain the learned scale. Furthermore, a glance at Figure 17 reveals that subjects' 1000 and 65 Hz test trial results were not correlated (r = -0.31, p=0.50), indicating that subjects were not simply exhibiting idiosyncratic tendencies to use either lower or higher exponents across both frequencies.

Table 8. Test trial results for Experiment 7.

Analysis     R                          R
Subject #    Exponent  Corrected R^2    Exponent  Corrected R^2
             1000 Hz                    65 Hz
1            0.49      0.75             0.59      0.64
2            0.52      0.81             0.43      0.59
3            0.63      0.81             0.64      0.70
4            0.55      0.70             0.82      0.73
5            0.55      0.90             0.69      0.75
6            0.44      0.65             0.69      0.70
7            0.37      0.49             0.84      0.76
mean         0.51                       0.67
sd           0.09                       0.14
m/sd         5.97                       4.88
h/l          1.72                       1.95

R: log(R)=B*log(S)+log(K)

Although subjects were less well calibrated in Experiment 7, the mean 65 Hz exponent of 0.67 was the same as the 65 Hz exponent of 0.67 found in Experiment 6 (at this level of precision). Also, both of these were close to, and not significantly different from, the mean 65 Hz exponent of 0.70 found in Experiment 3. The mean 65 Hz exponent across all three experiments was 0.68, which at plus or minus 0.02 includes the lowest and highest mean exponent values.
Thus it can be concluded that for all practical purposes the range effect was either eliminated or controlled by the procedure used in these experiments.

Figure 16. Experiment 7: Fitted Functions for the 65 Hz Test Trials. [Seven panels, Subjects 1 to 7; horizontal axis: log stimulus (log dynes/cm^2).]

Figure 17. Experiment 7: Individual exponent values. [Horizontal axis: frequency.]

Part 4: Different Exponents

Experiments 3, 5 and 6 of this dissertation demonstrated that subjects can learn a loudness scale based on an exponent of 0.60 and use it to judge sensory magnitudes associated with other unlearned stimuli. The exponent of 0.60 was chosen because of the claim by Stevens and his supporters that the stimulus input function for the loudness of 1000 Hz tones is characterized by this exponent. If this is true, then the learning portion of constrained scaling may operate not by teaching subjects a specific exponent, but by teaching them to relax and cease to bias what comes naturally. Marks, Galanter, and Baird (1995) demonstrated that the exponent of the power function that subjects were trained on had no effect on loudness summation at the group level. However, it remains unclear whether subjects can extend a learned 1000 Hz scale to novel stimuli if the learned exponent is other than 0.60.

Experiment 8

Experiment 8 tested whether constrained scaling would work if subjects were trained to respond to 1000 Hz tones according to an exponent of 0.30. Similar to Experiment 3, the novel stimuli were 65 Hz tones.
In terms of learning the scale, Marks, Galanter, and Baird (1995) used feedback to train subjects to respond according to an exponent of 0.30 to 500 Hz tones (according to Stevens, 500 Hz tones are also characterized by an exponent of approximately 0.60). Taking pooled geometric means within and across subjects, they found a group exponent of 0.31. Between subjects the mean/sd was 14.71, and the highest to lowest exponent ratio was 1.27:1. These figures were very close to the results obtained when Marks, Galanter, and Baird (1995) trained subjects to respond to 500 Hz tones according to an exponent of 0.60. In this case the group exponent was 0.58, between subjects the standard deviation for exponents was 0.03, the mean/sd was 16.67, and the highest to lowest exponent ratio was 1.38:1. In addition to the Marks, Galanter, and Baird (1995) results, King and Lockhead (1981) were able to use feedback to train a single naive subject to respond to 1000 Hz tones according to an exponent of 0.33 with remarkable accuracy (exponent value not reported, see graph in King and Lockhead, 1981).

In terms of responding without feedback, Baird, Kreindler, and Jones (1971) constrained subjects to respond to line length according to different exponents by assigning two moduli. Specifically, they showed all subjects a line length which was to have a response value equal to 1 and also assigned a response value to the longest line length. Once any two response values have been set, the exponent of the power function passing through those two points is determined. Thus, Baird, Kreindler, and Jones (1971) were able to constrain subjects to respond according to various exponents without providing any feedback. Baird, Kreindler, and Jones (1971) do not report individual exponents; however, at the group level subjects were able to respond more accurately the lower the exponent was.
Specifically, subjects constrained to respond according to exponents of 0.33, 0.50, 0.75, 1.00, 1.33, 2.0, and 3.0 responded with exponents of 0.31, 0.48, 0.71, 0.90, 1.24, 1.71, and 2.55 (according to Stevens, the correct exponent for line length is equal to 1).

Based on these past research results, it was predicted that subjects would find learning and applying an exponent of 0.30 no more difficult than subjects found learning and applying an exponent of 0.60. Also, if the learned exponent is altered, then the most straightforward prediction for the responses to the novel stimuli would be that they would be altered in the same way. If this were the case, subjects trained on different exponents could be calibrated by simply dividing or multiplying one set of exponents by the ratio of the exponents the two groups were trained on. Therefore, since 0.30 is half of 0.60, it was predicted that the average 65 Hz exponent from Experiment 8 should equal 0.42, which is half of the average 65 Hz exponent found in Experiment 3 (using all responses and fitting the best fitting line according to Equation 10).

Subjects

Seven volunteers participated for pay. All claimed to have normal hearing and there was no evidence of hearing abnormalities during the task.

Apparatus

The apparatus was the same as in Experiment 3 except that the response scale went from 0 to 10 and subjects could respond to two decimal places of accuracy.

Procedure

The procedure was identical to that of Experiment 3 except that subjects received only 50 learning trials and performed only 100 alternating test trials, similar to Experiments 5, 6, and 7. Also, instead of learning the 0.60 scale, subjects learned a scale based on an exponent of 0.30, with a response range of 1 to 10 for the 1000 Hz tones. It is important to note that once one stimulus value is matched to a particular response value by a particular exponent, the response values for all other stimuli are fixed.
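The last point can be sketched in a few lines, assuming the power function R = K * S^B that underlies the fits reported in the tables. The first function fixes K from a single anchor (one stimulus matched to one response) and then returns the response implied for any other stimulus; the second recovers the exponent implied by two anchor points, which is the logic of the two-moduli procedure of Baird, Kreindler, and Jones (1971). The function names, the arbitrary pressure units, and the stimulus range `TOP_TO_BOTTOM_RATIO` are hypothetical: the ratio of 10^(10/3) in sound pressure is only an inference from the stated response ranges (1 to 10 at an exponent of 0.30, 1 to 100 at 0.60), not a value given in the text.

```python
import math

# Hypothetical ratio of the most to the least intense training tone, in sound
# pressure. Inferred from the stated response ranges; not given in the text.
TOP_TO_BOTTOM_RATIO = 10 ** (10 / 3)


def response_for(stimulus, anchor_stimulus, anchor_response, exponent):
    """Response implied by R = K * S**B once one anchor point fixes K."""
    k = anchor_response / anchor_stimulus ** exponent
    return k * stimulus ** exponent


def exponent_from_two_points(s1, r1, s2, r2):
    """Exponent of the power function passing through (s1, r1) and (s2, r2)."""
    return math.log(r2 / r1) / math.log(s2 / s1)


if __name__ == "__main__":
    bottom = 1.0  # least intense tone, arbitrary pressure units, response = 1
    top = bottom * TOP_TO_BOTTOM_RATIO
    for b in (0.30, 0.60):
        print(b, round(response_for(top, bottom, 1.0, b), 2))
    # Two moduli: shortest line -> 1, a line 100 times longer -> 10.
    print(round(exponent_from_two_points(1.0, 1.0, 100.0, 10.0), 2))
```

Under these assumptions, halving the exponent takes the square root of every response on the scale, which is why the top of the range moves from 100 to 10 while the bottom stays at 1.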
In this case the response scale was determined by choosing an exponent of 0.30 and by assigning the least intense 1000 Hz tone a value of 1, as was the case when an exponent of 0.60 was used. By doing this the upper bound of the response continuum was fixed at approximately 10. The value of 0.30 was chosen to produce a response scale compatible with the base 10 number system that subjects prefer (similar to the response scale associated with an exponent of 0.60, i.e. 1 to 100). The information content, or fidelity, of the 0.30 response scale was equalized to the previous 0.60 response scales by restricting subjects to three digit responses for the 0.30 response scale, as was the case for the 0.60 response scale. For example, the number just below the maximum response value was 9.99 for the 0.30 response scale, and 99.9 for the 0.60 response scale.

Results and Discussion

The learning trials were analyzed by fitting the best line through the raw data according to Equation 10. The mean exponent was 0.31, the mean/sd was 8.94, and the highest to lowest exponent ratio was 1.39:1. These results were similar to the test trial results from Experiments 5 and 6, suggesting no difference in the learning process for power functions based on the exponents of 0.60 and 0.30.

The results of the test trials are displayed in Table 9. The exponent values were again derived by fitting Equation 10 to the raw data. The fitted functions for the 65 Hz trials are displayed in Figure 18. Both the 1000 and 65 Hz test trial results indicate a level of intersubject variability less than the benchmark, although the 65 Hz results appeared to be more variable than the 65 Hz results in past experiments. The individual 1000 and 65 Hz exponents are displayed in Figure 19. As predicted, the 65 Hz exponents were higher than the 1000 Hz exponents for all subjects. Across subjects this was significant at p = .04, as indicated by a t-test.
Also, for the 65 Hz tones the mean exponent was 0.42, which was equal to the predicted value.

As in Experiment 3, the 65 Hz results were also analyzed excluding responses below response = 1, where subjects had not been trained and where the near threshold deviation from the power law would have occurred. Similar to Experiment 3, this analysis had the effect of lowering the estimated exponent values as well as the intersubject variability (see Table 9). Although the difference between the 1000 Hz exponents and the 65 Hz exponents was reduced, it was actually more significant due to the reduction in intersubject variability. According to a t-test the difference was significant at p < .001. However, the mean 65 Hz exponent of 0.31 was not so close to the predicted value of 0.35 (i.e. half of the mean 65 Hz exponent found using the same analysis procedure in Experiment 3). To test the difference, the exponents for individual subjects in Experiment 3 were divided in half and contrasted with the individual 65 Hz exponents found in this experiment. A t-test revealed a marginal difference, significant at p = 0.07. Thus it may be the case that, under certain conditions, dividing the novel stimuli exponents of subjects trained on an exponent of 0.60 by two does not make them calibrated to subjects trained on an exponent of 0.30.

Table 9. Test trial results for Experiment 8.

Analysis   R                          R                          RC
Subject*   Exponent   Corrected R^2   Exponent   Corrected R^2   Exponent   Corrected R^2
           1000 Hz                    65 Hz                      65 Hz
1          0.27       0.74            0.41       0.76            0.34       0.81
2          0.28       0.86            0.47       0.81            0.29       0.82
3          0.30       0.86            0.56       0.75            0.35       0.86
4          0.30       0.83            0.42       0.76            0.33       0.79
5          0.26       0.87            0.36       0.87            0.28       0.77
6          0.27       0.69            0.37       0.69            0.24       0.63
7          0.26       0.74            0.36       0.77            0.32       0.77
mean       0.28                       0.42                       0.31
sd         0.02                       0.07                       0.04
m/sd       13.75                      5.81                       7.87
h/l        1.19                       1.57                       1.49

R: log(R) = B*log(S) + log(K)
RC: log(R) = B*log(S) + log(K); R >= 1

Figure 18. Experiment 8: Fitted Functions for the 65 Hz Test Trials.
[Seven panels, Subjects 1 to 7; horizontal axis: log stimulus (log dynes/cm^2).]

Figure 19. Experiment 8: Individual exponent values. [Horizontal axis: frequency/analysis.]

Experiment 9

Having demonstrated that constrained scaling, based on a 1000 Hz training regime, can work for exponents less than the canonical value of 0.60, we turn to exponents greater than 0.60. Marks, Galanter, and Baird (1995) trained subjects to respond to 500 Hz tones according to an exponent of 1.20 and found that even with feedback subjects produced a group exponent that was significantly different from 1.20 (the exponent was 1.11). In addition, compared to when the same subjects were trained on exponents of 0.30 and 0.60, the mean/sd was lower (9.09 compared to 14.71 and 16.67 respectively) and the highest to lowest exponent ratio was higher (1.60:1 compared to 1.27:1 and 1.38:1 respectively). Baird, Kreindler, and Jones (1971) also constrained subjects to respond with unusually high exponents and found a greater tendency to fall short of the exponent they were trained on as the exponent was increased (see Experiment 8, Introduction section). Furthermore, they found substantial deviations from the power law for the highest exponents. King and Lockhead (1981), however, were able to use feedback to train two naive observers to respond to 1000 Hz tones according to an exponent of 1.00 with an extremely high level of accuracy (exponent value not reported, see the graph in King and Lockhead, 1981).

This experiment examined whether constrained scaling would work if subjects were trained to respond to 1000 Hz tones according to an exponent of 0.90.
An exponent of 0.90 was chosen because it results in a scale of 1 to 1000 when the lowest 1000 Hz training tone is set to 1. Thus, similar to the 0.60 response scale used in Experiments 1 to 7 (1000 Hz response range = 1 to 100) and the 0.30 response scale used in Experiment 8 (1000 Hz response range = 1 to 10), this 0.90 scale would also be compatible with the base 10 number system that subjects are familiar with. The information content, or fidelity, of the 0.90 response scale was made equal to the previous 0.60 and 0.30 response scales by similarly restricting subjects to three digit responses. For example, the number just below the maximum response value was 9.99 for the 0.30 response scale, 99.9 for the 0.60 scale, and 999 for the 0.90 response scale. However, in this case the three digit accuracy meant that subjects could only respond to tones below response = 1 with a 1 or a 0. In Marks, Galanter, and Baird (1995) the same problem resulted in subjects being required to respond with four digit accuracy when trained on the exponent of 1.20, whereas they were restricted to three digit accuracy for the exponents of 0.60 and 0.30. Baird, Kreindler, and Jones (1971) also encountered this problem. King and Lockhead (1981) do not report enough information to ascertain if this was a problem; however, if they did hold the fidelity constant, the decrease in accuracy found in the other studies could be attributed to the greater cognitive load created by requiring subjects to learn responses with more digits. To avoid this confound, it was decided to maintain a three digit response precision in Experiment 9 and to eliminate the 65 Hz tones below response = 1 by subjectively equalizing the bottom of the 65 Hz range to the bottom of the 1000 Hz range, as in Experiments 6 and 7.

Subjects

Seven volunteers participated for pay. All claimed to have normal hearing and there was no evidence of hearing abnormalities during the task.
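The three digit restriction described in the introduction to this experiment can be illustrated with a short sketch. It assumes that "three digit responses" means two decimal places on the 1 to 10 scale, one on the 1 to 100 scale, and none on the 1 to 1000 scale, which is how the examples 9.99, 99.9, and 999 read; the function names are hypothetical and the code is not the software used to collect responses.

```python
import math


def decimals_for(scale_max):
    """Decimal places that give three digit responses just below scale_max."""
    return 3 - round(math.log10(scale_max))


def quantize(response, scale_max):
    """Round a response to the three digit grid of the given scale."""
    return round(response, decimals_for(scale_max))


def largest_below_max(scale_max):
    """Largest response on the three digit grid that is below scale_max."""
    d = decimals_for(scale_max)
    return round(scale_max - 10 ** -d, d)


if __name__ == "__main__":
    for exponent, scale_max in ((0.30, 10), (0.60, 100), (0.90, 1000)):
        print(exponent, scale_max, decimals_for(scale_max), largest_below_max(scale_max))
    # On the 0.90 scale a tone below response = 1 can only be reported as 0 or 1,
    # whereas the 0.30 scale still resolves hundredths.
    print(quantize(0.4, 1000), quantize(0.7, 1000), quantize(0.437, 10))
```

This makes the confound concrete: holding the number of digits constant removes all resolution below response = 1 on the 0.90 scale, which is why the bottom of the 65 Hz range was equalized to the bottom of the 1000 Hz range.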
Apparatus

The apparatus was the same as in Experiment 3.

Procedure

Experiment 9 was identical to Experiment 8 except that subjects were trained on an exponent of 0.90 and the bottom of the 65 Hz range was subjectively equalized to the bottom of the 1000 Hz range using the same method as in Experiments 6 and 7. Subjects were told that the lowest 65 Hz tone would be around response = 1 and to use a response of 1 for any tones equal to or less than 1.

Results and Discussion

The learning trials were analyzed by fitting the best line through the raw data according to Equation 10. The mean exponent was 0.700, the mean/sd was 5.63, and the highest to lowest ratio was 1.71:1. These results differed from King and Lockhead (1981) in that subjects were unable to learn to respond accurately according to the 0.90 exponent (the closest subject produced an exponent of 0.81; the worst subject produced an exponent of 0.48). At present the only explanation I can offer for the King and Lockhead (1981) results is that, had they used more than two subjects, their results would be more in line with Baird, Kreindler, and Jones (1971), Marks, Galanter, and Baird (1995), and the results of this experiment.

The results of the test trials are displayed in Table 10. The exponent values were derived by fitting Equation 10 to the raw data. The fitted functions for the 65 Hz trials are displayed in Figure 20. In terms of intersubject variability the results were quite poor and failed to surpass the benchmarks, except for the 1000 Hz mean/sd, which narrowly met its benchmark. Also, a t-test indicated that the difference between the 65 Hz exponents and the 1000 Hz exponents was not significant. The individual 1000 and 65 Hz exponents are displayed in Figure 21. Overall the level of variability was very similar to that found in Experiment 7, in which the top of the stimulus range was reduced to 80 dB. However, closer inspection of Figure 21 reveals an important difference between these results and the results of Experiment 7. In Experiment 7 there was a negative, nonsignificant correlation between the 1000 Hz test trial exponents and the 65 Hz test trial exponents, while in this experiment the correlation was r = +0.69, which was marginally significant at p = .08. This correlation suggests that while subjects deviated from the 0.90 exponent in idiosyncratic ways, there was a tendency to respond consistently across the 1000 and 65 Hz test tones.

Table 10. Test trial results for Experiment 9.

Analysis   R                          R
Subject*   Exponent   Corrected R^2   Exponent   Corrected R^2
           1000 Hz                    65 Hz
1          0.79       0.84            0.68       0.54
2          0.76       0.80            0.77       0.67
3          0.74       0.81            0.97       0.85
4          0.74       0.79            0.71       0.55
5          0.91       0.95            1.01       0.82
6          0.48       0.44            0.60       0.54
7          0.82       0.74            1.01       0.74
mean       0.75                       0.82
sd         0.13                       0.17
m/sd       5.66                       4.78
h/l        1.90                       1.68

R: log(R) = B*log(S) + log(K)

Overall, considering the results of Experiments 8 and 9 as well as previous research, it would seem that exponents greater than 0.60 for 1000 Hz tones are less natural in that they are more difficult to learn and harder to maintain without feedback, while there is no significant evidence that exponents less than 0.60 are any less easy to learn, maintain, or generalize to other stimuli. This pattern of results suggests that the exponent values describing the perception of loudness are not completely arbitrary, but rather are limited to a certain range. However, another possibility is that higher exponent values are more difficult because they require a larger response range in absolute terms. Thus, using a scale of 1 to 1000 might seem more daunting and less familiar than using a scale of 1 to 10, even though both require the same degree of accuracy (i.e. three digits in this case).

Figure 20. Experiment 9: Fitted Functions for the 65 Hz Test Trials.
[Seven panels, Subjects 1 to 7; horizontal axis: log stimulus (log dynes/cm^2).]

Figure 21. Experiment 9: Individual exponent values. [Horizontal axis: frequency.]

Part 5: Nonperceptual Stimuli

Experiment 10

It has long been known that the scaling of social stimuli, such as the seriousness of crimes (Ekman, 1962) or the desirability of various wrist watches (Indow, 1961), produces the same general pattern of results as the scaling of perceptual stimuli (see Stevens, 1975 for a review). However, very little has been made of this fact. Experiment 10 extends constrained scaling beyond the perceptual domain and into the social, cognitive domain by examining the magnitude of happiness resulting from winning various amounts of money. Specifically, subjects were trained using the standard auditory scale (i.e. 1000 Hz tones) and then asked to rate how happy they would be if they won various amounts of money in a lottery (i.e. to rate the perceived utility under these specific conditions).

The social domain differs from the perceptual domain in that we should expect legitimate individual differences. If constrained scaling can be extended into the social domain, it may be possible to detect these differences in a reliable way, which could have far reaching consequences for social, personality, cross-cultural, and clinical psychology, which currently lack the ability to clearly differentiate real individual differences from differences in response styles. Thus, it was predicted that subjects would remain well calibrated on the 1000 Hz test trial tones, but exhibit substantial intersubject variability on the hypothetical lottery winnings.
In terms of predicting the mean exponent for money, values range from 0.17 (Sellin & Wolfgang, 1964, as cited in Stevens, 1975), based on ME scaling of the seriousness of stealing various amounts of money, to 0.5 (G. Cramer, 1728, as cited in Stevens, 1975), calculated based on a gambling game. In possibly the most thorough study, 2000 Canadian university students provided an average ME exponent of 0.25 for the seriousness of money thefts (Akman & Normandeau, 1967, as cited in Stevens, 1975). It is not possible to tell if the differences between these exponents were due to differences in the task (e.g. gambling versus theft), differences in the groups (e.g. cultural or economic differences), or differences in the informal constraints contained in the ME procedure. However, overall it seemed reasonable to predict that the average exponent for money would be less than 0.6.

Subjects

Seven volunteers participated for pay. All claimed to have normal hearing and there was no evidence of hearing abnormalities during the task.

Apparatus

The apparatus was the same as in Experiment 3 except a text box was placed above the response scale to display amounts of money.

Procedure

The procedure was the same as in Experiment 5 except that instead of luminance levels, subjects were presented with amounts of money that were generated by selecting random numbers between 17 and 60, dividing by 10, and raising 10 to the resulting value. This resulted in a log-equally spaced scale from approximately $50 to $1,000,000. The money values were displayed to two decimal places, as is common practice.

Results and Discussion

The learning trials were analyzed by fitting the best line through the raw data according to Equation 10. The results were typical for this phase: the mean exponent was 0.57, the mean/sd was 10.46, and the highest to lowest ratio was 1.36:1.

The results of the test trials are displayed in Table 11.
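The stimulus-generation rule from the Procedure and the straight-line fit used throughout can be sketched together. The sketch below draws money amounts as 10^(k/10) and fits log(R) = B*log(S) + log(K), the form given in the table legends, by ordinary least squares. Several details are assumptions rather than facts from the text: that the random numbers were integers from 17 to 60 inclusive, the function names, and the simulated noise-free responses with an exponent of 0.25 and K = 2, which are purely hypothetical and serve only to show that the fit recovers a known exponent.

```python
import math
import random


def money_amounts(n, rng):
    """Draw n amounts as 10 ** (k / 10), k a random integer from 17 to 60 (assumed)."""
    return [round(10 ** (rng.randint(17, 60) / 10), 2) for _ in range(n)]


def fit_power_law(stimuli, responses):
    """Least-squares fit of log(R) = B*log(S) + log(K); returns (B, K)."""
    xs = [math.log10(s) for s in stimuli]
    ys = [math.log10(r) for r in responses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log coordinates = exponent
    k = 10 ** (mean_y - b * mean_x)    # intercept gives log(K)
    return b, k


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the illustration repeatable
    amounts = money_amounts(100, rng)
    # Hypothetical noise-free subject: R = 2 * S ** 0.25
    responses = [2.0 * a ** 0.25 for a in amounts]
    exponent, constant = fit_power_law(amounts, responses)
    print(round(min(amounts), 2), round(max(amounts), 2))
    print(round(exponent, 3), round(constant, 3))
```

Because 10^1.7 is about 50.12 and 10^6 is 1,000,000, the generated amounts span the range stated in the Procedure, in steps that are equally spaced on a log scale.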
The exponent values were derived by fitting Equation 10 to the raw data. The 1000 Hz test trial results produced a level of intersubject variability well below the benchmark criteria, indicating that subjects remained calibrated during the test phase (see Table 11). For the money stimuli the mean exponent was 0.24, the mean/sd was 2.53, and the highest to lowest exponent ratio was 6.16:1, indicating substantial individual differences. However, although the R^2 values for money were all above 0.80, it was clear from examining graphs of the raw data that many subjects did not conform to the straight line in log/log coordinates predicted by the power law (see Figure 22).

The data were reanalyzed according to Equation 11 in Experiment 3 to see if the deviations were consistent with the deviations from the power law found near threshold (see Experiment 3). These results are also displayed in Table 11. As can be seen, subjects 3 and 4 produced unusual threshold estimates (i.e. because they were negative). These values were probably due to the fact that estimates of the threshold values in Equation 11 can be unstable, due to undesirable trade offs between the parameters. In order to get a second estimate, the data were reanalyzed according to the equation

R = K(S - T)^B    (12)

which is equivalent to Equation 11, but does not treat the stimuli or responses as log spaced. These results are also displayed in Table 11. Using this procedure, subject 3 produced a reasonable threshold estimate, although subjects 1, 2, and 4 did not. Subject 4 was the only subject to fail to produce a reasonable threshold estimate using either procedure. Therefore, subject 4's data were reanalyzed using Equation 12 and fixing the value of T at 50. This produced an exponent of 0.20. The corrected R^2 value resulting from this procedure (corrected R^2 = 0.88) was only marginally lower than the corrected R^2 value using Equation 12 (corrected R^2 = 0.89) or Equation 11 with T free to vary (corrected R^2 = 0.89). The results of these analyses are displayed in Figure 22. When plausible, the results from Equation 11 were used to fit the function; for subject 3 the results from Equation 12 were used, and for subject 4 the results from Equation 12 with T fixed at 50 were used.

Table 11. Test trial results for Experiment 10.

Analysis   R                          R
Subject*   Exponent   Corrected R^2   Exponent   Corrected R^2
           1000 Hz                    money
1          0.59       0.93            0.33       0.82
2          0.55       0.87            0.06       0.87
3          0.48       0.79            0.38       0.84
4          0.60       0.93            0.18       0.88
5          0.60       0.86            0.30       0.85
6          0.55       0.90            0.24       0.93
7          0.61       0.93            0.21       0.64
8          0.57       0.93            0.26       0.84
mean       0.57                       0.24
sd         0.04                       0.10
m/sd       13.50                      2.53
h/l        1.27                       6.16

Analysis   RT                                      ST
Subject*   Exponent   Corrected R^2   Threshold    Exponent   Corrected R^2   Threshold
           money                                   money
1          0.29       0.84            47.00        0.31       0.75            -0.39
2          0.06       0.90            71.01        0.06       0.86            -46.89
3          0.39       0.85            -41.04       0.27       0.89            46.94
4          0.23       0.89            -1140.47     0.22       0.89            -872.68
5          0.29       0.87            44.28        0.24       0.89            49.20
6          0.22       0.95            44.96        0.18       0.95            49.61
7          0.19       0.72            49.66        0.13       0.67            50.10
8          0.23       0.87            49.44        0.19       0.89            50.07
mean       0.24                                    0.20
sd         0.09                                    0.08
m/sd       2.50                                    2.55
h/l        4.91                                    5.19

R: log(R) = B*log(S) + log(K)
RT: log(R) = B*log(S - T) + log(K)
ST: R = K(S - T)^B

Overall, it is equivocal whether subjects 1 to 4 produced a near-threshold distortion in the power law. However, subjects 5 to 8 produced remarkably stable estimates of T, both across subjects and across methods of analysis, clearly indicating the presence of the near threshold distortion. Also, in all cases, the estimate of T is remarkably close to $50, which subjects were told would be the least amount that could be won (Equation 12 estimates = 49.20, 49.61, 50.10, 50.07; Equation 11 estimates = 44.28, 44.96, 49.66, 49.44). Thus subjects 5 to 8, and possibly subjects 1 to 4, used the stimulus they were told would be the least intense as threshold, rather than the actual informational threshold (i.e. $0). This indicates that when judging the magnitude of nonperceptual stimuli, context and knowledge of the situation can be accounted for in a very precise manner.

In terms of individual differences, in Experiments 3, 5, 6, and 8 subjects remained calibrated on novel stimuli under conditions very similar to this experiment. Furthermore, when subjects failed to remain calibrated on the novel stimuli (i.e. Experiments 7 and 9) they were also less calibrated on the 1000 Hz test trial tones. However, in this experiment subjects remained highly calibrated on the 1000 Hz tones. Therefore, it seems reasonable to assume that the higher variability found for the money exponents reflects real individual differences. For example, using the Equation 11 results (which accounted for the most variance) we find that to double the happiness of subject 5 (exponent = 0.29) the amount of money would need to be increased by approximately 11 times, while for subject 7 (exponent = 0.19) the amount of money would need to be increased by approximately 39 times. However, because constrained scaling cannot claim to produce linear scales (i.e. what would the money exponents have been if subjects were trained on an exponent of 0.30?), the importance of this measure lies in what it reveals about the relative sensitivity of subjects to the money stimulus.

Figure 22. Experiment 10: Fitted Functions for happiness. [Eight panels, Subjects 1 to 8; horizontal axis: log stimulus, tick values 1 to 6.]

Discussion

Psychophysical Models

Fechner originally intended psychophysics as a way of studying consciousness, while Stevens was primarily interested in using psychophysics to study perceptual systems.
In reality, both were studying the path, or possible paths, from the perceptual system, through the extraction of a conscious perception of magnitude, to a response based on that conscious perception. The issue of response bias has already been raised in the introduction, so we turn here to the issue of consciousness. Although the canonical model of psychophysics assumes that conscious perception is driven directly by the sensory system (Stevens, 1975; Marks, 1991), others disagree. The logical alternative to the canonical model is Marks' (1991) contextual model. According to this model, the context of the situation can cognitively alter the perception of magnitude before it reaches consciousness. For example, the range effect discussed in Experiment 6 is thought by proponents of this model to occur before conscious perception (e.g. Marks and Warner, 1991; Schneider and Parker, 1994). The difficulty in distinguishing between the contextual model and the canonical model lies in the fact that there is no definitive criterion for distinguishing whether an effect was due to a preconscious cognitive factor or a response bias factor. However, in both models the implied model of consciousness is as defined in the introduction, that is, as an awareness of an essentially static representation of stimulus magnitude. Both the canonical model and the contextual model are illustrated in Figure 23.

Constrained scaling has different implications depending on which model is adopted. Under the assumptions of the canonical model it effectively circumvents the problem of idiosyncratic response biases, in effect, by assuring that all subjects have the same response bias.

[Figure 23: diagrams of the psychophysical models. Recoverable labels: stimulus, perceptual mechanism, consciousness, response mechanism, response; panel labels: single process, multiple processes, multiple processes with feedback.]
According to the canonical model, properly executed ME experiments will produce a linear scale of consciousness, whereas constrained scaling will produce a linear scale only if subjects are trained on the true exponent value in the training phase. However, this is not a serious problem as, should the proponents of the canonical model discover a means of divining the true exponents, subjects could be trained on them. It is nevertheless incumbent on the proponents of the canonical model to come up with a means of identifying the true exponent.

In terms of the contextual model, the training involved in constrained scaling offers greater support for the assumption that the response function does not change across tasks (i.e. because of the response training), a critical assumption if we are to assume that context-induced response differences are due to effects upstream of consciousness. However, it is still not possible to absolutely rule out the possibility that a change in context could systematically bias subjects' response functions such that they remained calibrated but produced different results under different contexts. For example, the systematic drop in exponent value found in Experiment 2 when the feedback was removed could indicate that providing feedback sets up a context that alters perceptions of loudness. In memory psychophysics (see Algom, 1992, for a review), a similar drop in the exponent value with the passage of time is taken as a real effect on perceptions of stimulus magnitude. On the other hand, the drop could also represent a highly systematic effect on subjects' response functions. Although a single constrained scaling experiment cannot rule out the possibility that an effect was due to a highly systematic response bias, such biases should be quite rare if they exist at all.
Indeed, one might rule them out all together on the basis that the response output system has never been conceived of as a precise system, identical 96 across subjects, but rather as a highly variable and idiosyncratic system (eg. based on personal experience with numbers and arbitrary decisions). However, even with the caveat that constrained scaling cannot completely rule out response bias, the effect of making the response function empirically penetrable is highly desirable. For example, Algom and Marks (1990) found that stimulus range affected the binaural gain 6 and attributed it to a change in the gain of the auditory system (see Schneider and Parker, 1990, 1994), however, the effect could also be accounted for by assuming that subjects maintained the same binaural ratio7 across a response bias resulting in higher exponents for the smaller stimulus range (see Marks, Galanter and Baird, 1995). By . using the constrained scaling approach, Marks, Galanter, and Baird (1995) were able to significantly reduce the plausibility of the second argument by demonstrating that, at the group level, training subjects to respond according to different exponents had no effect on the binaural gain (i.e. in order to maintain the second; argument, one would have to argue that there is something fundamentally different about altering the exponent through feedback and altering it through the use of the range effect response bias). A third psychophysical model, proposed by Ward (1992, 1993), also exists. This model, which I will refer to as the "dynamic model", views the mind as a highly dynamic system and is consistent with cognitive models of consciousness such as those put forward by Dennet (1991), Minsky(1986), and Hofstadter(1979). These models all differ from the view of consciousness employed so far in this dissertation, in that conscious perceptions are viewed as fluid and constantly changing, rather than static representations which can be accessed and reported. 
⁶ The binaural gain is the amount (in dB) that must be added to a monaural stimulus to make it as loud as a binaural stimulus.
⁷ The binaural ratio is the ratio of the log of binaural responses to the log of monaural responses, generally found to be 2:1.

For example, according to Dennett's (1991) multiple drafts model of consciousness, conscious perceptions are constantly being rewritten, edited, and recombined so that there is no static state that can be labeled as the conscious perception. An example of this is the phonetic restoration phenomenon (Warren, 1970), which can be demonstrated by presenting subjects with a burst of white noise, followed by a word stem, followed by a sentence stem. Under these conditions, instead of hearing the word stem, subjects report that they hear a whole word; however, the word that they hear is determined by the sentence stem, which occurs after they have already heard the word stem. For example, if the word stem was "eel," subjects would report having heard "peel" if the sentence stem was "the orange," and "heel" if the sentence stem was "of the shoe." Similarly, according to the dynamic model, conscious perceptions of magnitude will undergo various revisions, including retroactive revisions, up to and even beyond the giving of a response. Thus the dynamic model postulates a plethora of conscious representations, constantly in flux. This is not to say that a jet engine will ever sound quieter than the chirping of a cricket, but that within certain parameters perceptions of sensory magnitudes are not fixed but fluid.

The dynamic model is also illustrated in Figure 23. As can be seen, as one moves towards increasing complexity (i.e., from the canonical model towards the dynamic model) it becomes increasingly pragmatic to think in terms of particular sets of mental processes or pathways from the stimulus to the response, rather than in terms of characterizing a single mental process (i.e., the logical goal of the canonical model).
Therefore, under the assumptions of the dynamic model, the goal of psychophysical scaling becomes one of engaging the same mental subunits in the same order for each subject (Ward, 1992), a task for which constrained scaling was specifically designed.

Note also that the dynamic model is the only model in which consciousness plays an important role. In the canonical model and the contextual model, consciousness is merely a transparent window through which perceptual processes can be viewed. The consciousness part of these two models could be eliminated without changing the interpretation of experimental results. This is not the case with the dynamic model.

Another interesting point is that under the assumptions of the dynamic model, response bias does not necessarily have to be postulated. The responses that subjects provide may be considered representations of magnitude, just as the neurologically encoded representations preceding them are (moreover, we know that subjects are generally conscious of their responses). In this view a subject's response is just one in a stream (or possibly several parallel streams) of representations evoked by the stimulus⁸. Anecdotally, it was interesting to note that in the research for this dissertation, for the same maximum stimulus value, subjects trained on an exponent of 0.30 and a maximum response value of 10 complained the least about the volume of the loudest tone, while subjects trained on an exponent of 0.60 and a maximum response value of 100 complained more, and subjects trained on an exponent of 0.90 and a maximum response value of 1000 complained quite a bit. Therefore, although highly speculative, it is possible that the values or patterns of responses can retroactively influence subjects' impressions of stimulus magnitudes in a manner similar to the phonetic restoration phenomenon.
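The three training regimes mentioned in the anecdote can be checked for internal consistency with a little arithmetic. The following is only a sketch: it assumes that the trained function is a power law of the form R = a * S**B with a scaling constant a = 1 and arbitrary stimulus units, neither of which is stated in this passage, and the function name is hypothetical. Under those assumptions, each (exponent, maximum response) pair implies a maximum stimulus value, and the three implied values should coincide if the regimes really shared one maximum stimulus.

```python
import math


def implied_max_stimulus(exponent, max_response, scale_constant=1.0):
    """Invert R = a * S**B for S at the maximum trained response.

    scale_constant (a) defaults to 1.0, which is an assumption made
    for illustration only.
    """
    return (max_response / scale_constant) ** (1.0 / exponent)


# (exponent, maximum response) pairs quoted in the text.
regimes = [(0.30, 10.0), (0.60, 100.0), (0.90, 1000.0)]

implied = [implied_max_stimulus(b, r_max) for b, r_max in regimes]

for (b, r_max), s_max in zip(regimes, implied):
    print(f"exponent {b:.2f}, max response {r_max:>6.0f} -> implied S_max {s_max:.2f}")

# True when all three regimes imply the same maximum stimulus value.
same_maximum = all(math.isclose(s, implied[0], rel_tol=1e-9) for s in implied)
print("same implied maximum stimulus:", same_maximum)
```

With a unit scaling constant the three regimes imply one and the same maximum stimulus, so the differences in complaints cannot be attributed to differences in the physical stimulus; only the numbers attached to it differed.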
⁸ Dennett (1991) and Hofstadter (1979) have both argued that, from a functionalist perspective, there is no compelling reason not to extend the definitions of cognitive systems to include representations outside of the brain.

Scaling Issues

The primary goal of this dissertation was to demonstrate that constrained scaling can be used to calibrate subjects sufficiently to produce a meaningful, nonlinear (in the sense that it cannot be established if it is linear or not), associative scale of psychological magnitude. According to the benchmarks set out in the introduction, this has been accomplished. Constrained scaling has been demonstrated to provide meaningful results intramodally, within the auditory domain; intermodally, between audition and vision; and extramodally, between audition and cognitively generated estimates of the expected utility of money. Using constrained scaling for sensory continua, subjects who receive the same level of stimulus magnitude report approximately the same level of subjective magnitude. Assuming that subjects with the same perceptual mechanisms (i.e., healthy and normal) experience the same magnitudes when exposed to the same sensory stimuli under the same conditions, this indicates that constrained scaling provides calibrated results. In other words, when successfully applied, constrained scaling causes subjects to respond using the same unit of psychological magnitude. Therefore, just as many different physical objects can be measured for length using a single unit (e.g., centimeters, feet, cubits), so too, in theory, any psychological magnitude can be measured using the common unit provided by constrained scaling. The idea of a single unit to describe the magnitudes of such diverse phenomena as the brightness of a light and the utility of money might at first seem strange, as such phenomena are studied in different areas of psychology.
However, as Norwich (1993) points out, the moment energy is detected by a perceptual organ it becomes information, much the same as an amount of money is coded as information. Thus the common unit provided by constrained scaling can be thought of as a measure of information.

Several points are pertinent to the drawback that constrained scaling produces nonlinear scales. The first is that there is no strong evidence that other forms of ME produce linear scales. The second is that it is debatable whether linear scales are necessarily superior to nonlinear scales, since equivalent interlocking systems of mathematical laws can be derived from either (see Stevens, 1951; Ellis, 1968). The third is that, according to the dynamic model (Ward, 1991, 1993), there are no "true" scales to be measured. Rather than worrying about linearity, a more practical approach is to develop reliable scales and to attach meaning to the scale values empirically. As in physics, once a systematic, reliable means of measurement has been achieved, any underlying structure will be revealed through experimentation.

In terms of the claim that constrained scaling eliminates idiosyncratic response biases from the resulting scale, it is interesting to compare the results of constrained scaling to exponents derived using a method that avoids the use of a response continuum, such as Shepard's (1966) nonmetric approach. Specifically, it can be argued that avoiding the use of a response continuum avoids response biases altogether, and that any remaining individual differences must be due to real individual differences in sensory systems (Schneider, 1980, 1988). However, according to the dynamic model, avoiding the use of a response continuum would not necessarily cause subjects to lock onto the same cognitive pathway, and therefore would not necessarily reduce idiosyncratic differences to a minimum.
In order to eliminate the response continuum, Schneider (1980) employed the nonmetric approach and, in a separate study (Schneider, 1988), the conjoint measurement approach (Luce and Tukey, 1964). In both cases subjects were required only to make binary judgments of "greater than" or "less than" for paired stimuli. For example, in Schneider (1980) subjects judged which of two pairs of tones displayed the greater loudness difference. Similarly, in Schneider (1988) subjects were presented with two-tone complexes, each made up of two simultaneous tones of different frequency and intensity, and asked to judge which was louder. In both cases subjects satisfied all the conditions for the construction of a scale and exponent values were calculated. In Schneider (1980), for 1,200 Hz tones, the mean/sd was 3.25, and the highest to lowest exponent ratio was 2.55. In Schneider (1988), for 2 kHz tones, the mean/sd was 7.51, and the highest to lowest exponent ratio was 1.36; for 5 kHz tones, the mean/sd was 5.67, and the highest to lowest exponent ratio was 1.57. Using the conjoint measurement approach (Schneider, 1988), intersubject variability was reduced to about the same level found using constrained scaling, while the nonmetric approach (Schneider, 1980) produced results similar to ME in this regard (see Table 1).

However, a closer examination of the conjoint measurement results reveals problems. As in constrained scaling, Schneider (1988) presented both tone frequencies within the same experiment. Examining the ratios of the 2 kHz tones to the 5 kHz tones for individual subjects, we find that the mean ratio was 0.91, the standard deviation was 0.19, and the mean/sd was 4.93. By comparison, in Experiment 6 of this dissertation (which used 65 and 1000 Hz tones) the mean ratio was 0.83, the standard deviation was 0.07, and the mean/sd was 12.16, the latter more than double that found using conjoint measurement.
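For readers who wish to compute the same variability indices on their own data, the two summary figures used above (the mean/sd and the highest to lowest exponent ratio) can be sketched as follows. The exponent values in the example are hypothetical and are not taken from Schneider (1980, 1988) or from this dissertation; the use of the sample standard deviation (n - 1 denominator) is also an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def variability_indices(exponents):
    """Summarize intersubject variability of individual exponents.

    Returns the mean, the sample standard deviation, the mean/sd
    (larger means less relative variability), and the ratio of the
    highest to the lowest individual exponent.
    """
    m = mean(exponents)
    sd = stdev(exponents)  # sample sd; an assumption, see lead-in
    return {
        "mean": m,
        "sd": sd,
        "mean_over_sd": m / sd,
        "max_over_min": max(exponents) / min(exponents),
    }


# Hypothetical individual exponents for five subjects.
hypothetical_exponents = [0.50, 0.55, 0.60, 0.65, 0.70]

indices = variability_indices(hypothetical_exponents)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Both indices are unit free, which is why they can be compared across methods that yield exponents of different absolute size.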
If individual differences were due only to idiosyncratic response biases and real sensory system differences then, in theory, the methodologies used by Schneider (1980, 1988) should have produced lower or equally low levels of individual variability, particularly for the exponent ratios between frequencies. Of course it is possible, but unlikely, that in both studies Schneider selected subjects with unusually high individual differences compared to the subjects in Experiment 6 of this dissertation. A second possibility is that the models he used were simply wrong, although this too seems unlikely as the models produced results very similar to those found using ME and the data satisfied the rigorous requirements of the models (Schneider, 1980, 1988). The most likely explanation is that subjects have some cognitive latitude in the combining operations required by these techniques (i.e., subtracting and adding magnitudes), which is consistent with the assumptions of the dynamic model.

The Systems Perspective

Viewed from a strictly cognitive, artificial intelligence perspective, the process leading from stimulus input to response output can be understood as a formal system. Whenever one system, given the same inputs, always produces the same outputs as another system, the two are isomorphically the same, even if the actual transducing mechanisms are quite different (Hofstadter, 1979). For example, if the same program is run on a Turing machine and on a von Neumann machine, the two resulting systems would be isomorphically identical, even though the physical and computational processes involved in each system would be quite different. Constrained scaling attempts to make subjects isomorphically the same. From a systems perspective this goal is highly desirable as it means the system could be studied independently of considerations of individual subjects.
As Luce (1972) pointed out, psychophysics cannot be like physics unless the object of study can be successfully abstracted from its instantiation within individual subjects. If the object of study is a formal system that subjects can become (i.e., with training) then it may be possible to achieve this goal.

In fact, it is trivially easy to make subjects isomorphically the same. For example, subjects could be instructed to take whatever number they are given and respond with the sum of the number plus 7. Of course this is quite uninteresting, but if the system were one that could process many different types of inputs then these inputs would be related by a fixed interlocking mathematical structure, provided that subjects stayed locked onto the system. The results of this dissertation indicate that this is possible, at least to an approximation. A related point is that it was clear that some subjects were better than others at locking onto the system. If a criterion were imposed (i.e., use only subjects who have demonstrated an expertise at locking on) the results could probably be substantially improved. Also, as noted in the introduction, this research is only a starting point. As more is learned about what factors enable subjects to "lock on" (e.g., Experiments 6 to 9), results should become increasingly precise.

Future Directions: Beyond the Power Law

This dissertation has focused on calibrating subjects to power functions with particular exponent values. However, the power law, although extremely popular, may not be the optimal way to characterize ME results. Competing with the power law are various forms of what Norwich (1993) has termed the complete law of sensation (see Norwich, 1993, for a historical review). Norwich's version of this law, contained within the entropic theory of perception, has proven particularly powerful in that many of the empirically discovered laws of psychophysics have been derived from it (see Norwich, 1993, for a review).
According to the entropic theory, ME results should follow the function

$R = \frac{1}{2} K \ln(1 + Y S^{N})$   (15)

where Y and N are constants, with N approximating the exponent (B) from the power law (Equation 3). When $Y S^{N}$ is small, Equation 15 is approximated by the power law (see Norwich, 1993, for the derivation). However, as $Y S^{N}$ grows larger (i.e., so that the relative contribution of the +1 term becomes small), Equation 15 is better characterized by a version of Fechner's log law (i.e., dropping the +1 term, $R = \frac{1}{2} K \ln(Y S^{N})$). In log-log coordinates this would predict a deviation from a straight line that would appear as a slight downward curve at higher stimulus magnitudes (Norwich, 1993).

If the entropic theory is correct then at high stimulus intensities the power law should have seemed unnatural to subjects. A visual inspection of the raw data presented in this dissertation reveals that subjects who visibly deviated from the power law did so in the manner predicted by the entropic theory (except for the standard ME results in Experiment 4). As a test, the data from the test trials of Experiment 3 were reanalyzed excluding stimuli below ln(S) = 1, a somewhat arbitrary cutoff point for "high" stimulus magnitudes⁹. Functions were fit in both log-log coordinates and log-linear coordinates. The corrected $R^2$ values are displayed in Tables 12 and 13. As can be seen, even for the with-feedback tones, subjects, with only a few exceptions, responded in a way more consistent with Fechner's log law than with Stevens' power law (i.e., log-linear $R^2$ > log-log $R^2$) at high stimulus magnitudes. Also, a paired-comparison, two-tailed t-test revealed that the difference in the $R^2$ values was significant at p < .001. Given that subjects were specifically taught to use a power function, these results can be considered a strong indication that Equation 15 constitutes a better description of subjects' natural tendencies.
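The two limiting cases of Equation 15, and the logic of comparing log-log with log-linear fits, can be illustrated with a short sketch. Everything numerical here is an assumption made for illustration: the parameter values K = 1, Y = 1 and N = 0.6, the stimulus range, and the noise-free synthetic responses are hypothetical and are not the data of Experiment 3. The "corrected" $R^2$ is taken to be the usual adjusted $R^2$ for a straight-line fit, which is also an assumption about how the values in Tables 12 and 13 were computed.

```python
import math


def entropic_response(s, k, y, n):
    """Equation 15: R = (1/2) K ln(1 + Y S^N)."""
    return 0.5 * k * math.log1p(y * s ** n)


def power_law_limit(s, k, y, n):
    """Small Y S^N limit, using ln(1 + x) ~ x: R ~ (1/2) K Y S^N."""
    return 0.5 * k * y * s ** n


def log_law_limit(s, k, y, n):
    """Large Y S^N limit, dropping the +1 term: R ~ (1/2) K ln(Y S^N)."""
    return 0.5 * k * math.log(y * s ** n)


def adjusted_r2(xs, ys):
    """Adjusted R^2 of an ordinary least-squares line ys = a + b * xs."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (m - 1) / (m - 2)


K, Y, N = 1.0, 1.0, 0.6  # hypothetical parameter values

# Twenty log-spaced "high" stimulus magnitudes from 50 to 5000 (arbitrary units).
stimuli = [50.0 * 100.0 ** (i / 19) for i in range(20)]
responses = [entropic_response(s, K, Y, N) for s in stimuli]
ln_s = [math.log(s) for s in stimuli]

# Power law: ln(R) is linear in ln(S). Log law: R is linear in ln(S).
r2_loglog = adjusted_r2(ln_s, [math.log(r) for r in responses])
r2_loglinear = adjusted_r2(ln_s, responses)

print("log-log adjusted R^2:   ", r2_loglog)
print("log-linear adjusted R^2:", r2_loglinear)
```

For responses generated from Equation 15 over a range where $Y S^{N}$ is well above 1, the log-linear fit is the better of the two, which is the pattern the reanalysis looks for in subjects' data. Real responses contain noise, so the margin between the two fits would be far smaller than in this noise-free illustration.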
⁹ Traditionally the power law is described using logs, whereas Norwich prefers lns to describe the entropic theory.

The mystery of the method-dependent results found in Experiments can also be resolved if it is assumed that subjects were responding according to Equation 15. Specifically, it was found that a nonlinear curve fitting approach (i.e., on the raw data) resulted in exponent values significantly lower than 0.60 for the no-feedback, 1000 Hz tones, while fitting a straight line through the log of the stimuli and the log of the responses (Equation 10) resulted in exponent values approximately equal to 0.60 (see Experiment 2). If subjects were responding according to Equation 15, then using the log of the stimuli would compress the higher stimulus values, where Equation 15 is not well approximated by the power law, and expand the stimulus range in which Equation 15 is well approximated by the power law. Thus it may only be meaningful to fit the power law according to log-stimulus values.

Overall, these results suggest that training subjects on Equation 15 may be an avenue to further reduce intersubject variability. However, there are problems associated with fitting Equation 15 to the data. Specifically, the values for K and Y trade off and the estimates are nonrobust (Norwich, 1993). Estimates of N are more robust but still vary significantly, although in practice they are quite close to power law estimates of B for the same data (Norwich, 1993). Future research will need to address these technical problems. One possibility is that training subjects on Equation 15 would make it possible to assume the value of one of the three parameters (K, Y and N). If the number of parameters to be estimated could be reduced to two, the curve fitting results should become more stable.

Table 12. 1000 Hz test trial results for Experiment 3.
Frequency    1000 Hz        1000 Hz        1000 Hz        1000 Hz
Feedback     NF             NF             F              F
Analysis     log-log        log-linear     log-log        log-linear
Subject #    Corrected R^2  Corrected R^2  Corrected R^2  Corrected R^2
1            0.561          0.655          0.718          0.759
2            0.474          0.555          0.632          0.733
3            0.763          0.784          0.645          0.588
4            0.754          0.837          0.708          0.715
5            0.327          0.680          0.843          0.847
6            0.729          0.776          0.759          0.816
mean         0.601          0.715          0.718          0.743

F: feedback; NF: no feedback

Table 13. 65 Hz test trial results for Experiment 3.

Frequency    65 Hz          65 Hz          1000 Hz        1000 Hz
Feedback     NF             NF             F              F
Analysis     log-log        log-linear     log-log        log-linear
Subject #    Corrected R^2  Corrected R^2  Corrected R^2  Corrected R^2
1            0.520          0.573          0.718          0.844
2            0.363          0.461          0.729          0.791
3            0.247          0.222          0.569          0.660
4            0.371          0.380          0.618          0.655
5            0.694          0.781          0.840          0.885
6            0.538          0.499          0.776          0.844
mean         0.456          0.486          0.708          0.780

F: feedback; NF: no feedback

References

Algom, D., & Marks, L. E. (1984). Individual differences in loudness processing and loudness scales. Journal of Experimental Psychology: General, 113(4), 571-593.
Attneave, F. (1962). Perception and related areas. In S. Koch (Ed.), Psychology: A study of a science. Vol. 4. New York: McGraw-Hill.
Baird, J. C., Kreindler, M., & Jones, K. (1971). Generation of multiple ratio scales with a fixed stimulus attribute. Perception and Psychophysics, 9, 399-403.
Bolanowski, S. J., & Gescheider, G. A. (1991). Ratio scaling of psychological magnitude: In honor of the memory of S. S. Stevens. Hillsdale, New Jersey: Lawrence Erlbaum Associates, Publishers.
Borg, G. A. V., & Marks, L. E. (1983). Twelve meanings of the measure constant psychophysical power functions. Bulletin of the Psychonomics Society, 21(1), 73-75.
Curtis, D. W., Attneave, F., & Harrington, T. L. (1968). A test of a two stage model of magnitude judgment. Perception and Psychophysics, 3, 25-31.
Dennett, D. C. (1991). Consciousness Explained. Toronto: Little, Brown and Company.
Ekman, G. (1962). Measurement of moral judgment: A comparison of scaling methods. Perception and Motor Skills, 15, 3-9.
Ellis, B. (1968). Basic concepts of measurement. Cambridge: Cambridge University Press.
Hellman, R. P. (personal communication to L. Ward, August, 1994).
Hellman, R. P., & Meiselman, C. H. (1988). Prediction of individual loudness exponents from cross modality matching. Journal of Speech and Hearing Research, 31, 605-615.
Hofstadter, D. R. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. New York: Random House.
Indow, T. (1961). An example of motivation research applied to product design. Published in Japanese in Chosa To Gijutsu, 102, 45-60.
Indow, T., & Stevens, S. S. (1966). Scaling of saturation and hue. Perception and Psychophysics, 1, 253-272.
King, M. C., & Lockhead, G. R. (1981). Response scales and sequential effects in judgment. Perception and Psychophysics, 30(6), 599-603.
Koh, K. (1993). Induction of combination rules in two dimensional function learning. Memory and Cognition, 21(5), 573-590.
Koh, K., & Meyer, D. E. (1991). Function learning: Induction of continuous stimulus-response relations. Journal of Experimental Psychology: Learning, Memory and Cognition, 17(5), 811-836.
Ladavas, E., Cimatti, D., Del Pesce, M., & Tuozzi, G. (1993). Emotional evaluation with and without conscious stimulus identification: Evidence from a split-brain patient. Cognition and Emotion, 7(1), 95-144.
Lilienthal, M. G., & Dawson, W. E. (1976). Inverse cross-modality matching: A test of ratio judgment consistency for group and individual data. Perception and Psychophysics, 19(3), 252-260.
Logue, A. W. (1976). Individual differences in magnitude estimation of loudness. Perception and Psychophysics, 19(3), 279-280.
Luce, D. R. (1972). What sort of measurement is psychophysical measurement? American Psychologist, Feb, 96-106.
Luce, D. R., & Mo, S. S. (1965). Magnitude estimation of heaviness and loudness by individual subjects: A test of a probabilistic response theory. The British Journal of Mathematical and Statistical Psychology, 18(2), 159-174.
MacKay, D. M. (1963). Psychophysics of perceived intensity: A theoretical basis for Fechner's and Stevens' Laws. Science, 139, 1213-1216.
Marks, L. E. (1974). Sensory Processes: The New Psychophysics. New York: Academic Press.
Marks, L. E. (1974). On scales of sensation: Prolegomena to any future psychophysics that will be able to come forth as science. Perception and Psychophysics, 16(2), 358-376.
Marks, L. E. (1991). Reliability of magnitude matching. Perception and Psychophysics, 49(1), 31-37.
Marks, L. E., Galanter, E., & Baird, J. C. (1995). Binaural summation after learning psychophysical functions for loudness. Perception and Psychophysics, in press.
Marks, L. E., & Warner, E. (1965). Slippery context effect and critical bands. Journal of Experimental Psychology: Human Perception and Performance, 17(4), 986-996.
Marks, L. E., & Stevens, J. C. (1965). Individual brightness functions. Perception and Psychophysics, 1, 17-24.
Minsky, M. (1986). The Society of Mind. New York: Simon and Schuster, Inc.
Mori, S., & Ward, L. M. (1995). Pure feedback effects in absolute identification. Perception and Psychophysics, 57(7), 1065-1079.
Norwich, K. H. (1993). Information, sensation, and perception. San Diego: Academic Press, Inc.
Poulton, E. C. (1989). The new psychophysics: Six models for magnitude estimation. Psychological Bulletin, 60, 65-77.
Poulton, E. C. (1989). Bias in quantifying judgments. London: Lawrence Erlbaum Associates, Publishers.
Rule, S. J., Curtis, D. W., & Markley, R. P. (1970). Input and output transformation from magnitude estimation. Journal of Experimental Psychology, 86(3), 343-349.
Rule, S. J., & Markley, R. P. (1971). Subject differences in cross-modality matching. Perception and Psychophysics, 9(1b), 115-117.
Schneider, B. (1980). Individual loudness functions determined from direct comparisons of loudness intervals. Perception and Psychophysics, 28(6), 493-503.
Schneider, B. (1988). The additivity of loudness across critical bands: A conjoint measurement approach. Perception and Psychophysics, 43, 211-222.
Stevens, S. S. (1946). On the theory of scales and measurement. Science, 103, 677-80.
Stevens, S. S. (1959). Crossmodality validation of subjective scales for loudness, vibration, and electric shock. Journal of Experimental Psychology, 1, 96-100.
Stevens, S. S. (1966). Duration, luminance and the brightness exponent. Perception and Psychophysics, 1, 96-100.
Stevens, S. S. (1975). Psychophysics: Introduction to its perceptual, neural and social prospects. New York: A Wiley-Interscience Publication.
Stevens, S. S., & Guirao, M. (1975). Subjective scaling of length and area and the matching of length to loudness and brightness. Journal of Experimental Psychology, 66(2), 177-186.
Stevens, J. C., & Hall, J. W. (1966). Brightness and loudness as functions of stimulus duration. Perception and Psychophysics, 1, 319-327.
Stevens, J. C., & Stevens, S. S. (1966). Brightness function: Effects of adaptation. Journal of the Optical Society of America, 53, 375-385.
Teghtsoonian, M., & Teghtsoonian, R. (1983). Consistency of individual exponents in cross-modal matching. Perception and Psychophysics, 33(3), 203-214.
Teghtsoonian, M., & Teghtsoonian, R. (personal communication, October, 1994).
Teghtsoonian, R. (1971). On the exponents in Stevens' law and the constants in Ekman's law. Psychological Review, 78, 71-80.
Teghtsoonian, R. (1994). (personal communication, October, 1994).
Teghtsoonian, R., & Teghtsoonian, M. (1994). Range and regression effects in magnitude scaling. Perception and Psychophysics, 24(4), 305-314.
Vaughan, H. G., Costa, L. D., & Gilden, L. (1966). The functional relation of visual evoked response and reaction time to stimulus intensity. Vision Research, 6, 645-656.
Ward, L. M. (1975). Sequential dependencies and response range in cross-modality matches of duration to loudness. Perception and Psychophysics, 18(3), 217-223.
Ward, L. M. (1982). Mixed modality psychophysical scaling: Sequential dependencies and other properties. Perception and Psychophysics, 31(1), 53-62.
Ward, L. M. (1990). Critical bands and mixed frequency scaling: Sequential dependencies, equal loudness contours, and power function exponents. Perception and Psychophysics, 47(6), 551-562.
Ward, L. M. (1991). Associative measurement of psychological magnitude. In S. J. Bolanowski & G. A. Gescheider (Eds.), Ratio Scaling of Psychological Magnitude: In Honor of the Memory of S. S. Stevens (pp. 79-100). Hillsdale, New Jersey: Lawrence Erlbaum Associates, Publishers.
Ward, L. M. (1992). Who Knows? In G. Borg & N. Neely (Eds.), Fechner Day 92. Stockholm: International Society for Psychophysics.
Ward, L. M. (1993). Mind in Psychophysics. In D. Algom (Ed.), Psychophysical Approaches to Cognition (pp. 187-250). North-Holland: Elsevier Science Publishers B. V.
Ward, L. M. (1995). On the role of discriminability in psychophysics. Talk given at Fechner Day 95. Cassis, France: International Society for Psychophysics.
Warren, R. M. (1970). Perception restoration of missing speech sounds. Science, 167, 393-395.
Wanschura, R. G., & Dawson, W. E. (1974). Regression effect and individual power functions over sessions. Journal of Experimental Psychology, 102(5), 806-812.
West, R. L., & Ward, L. M. (1994). Constrained Scaling. In L. M. Ward (Ed.), Fechner Day 94. Vancouver: International Society for Psychophysics.
Zwislocki, J. J. (1983). Group and individual relations between sensation magnitudes and their numerical estimates. Perception and Psychophysics, 33(5), 460-468.
Zwislocki, J. J. (1983). Natural measurement. In S. J. Bolanowski & G. A. Gescheider (Eds.), Ratio Scaling of Psychological Magnitude: In Honor of the Memory of S. S. Stevens (pp. 18-26). Hillsdale, New Jersey: Lawrence Erlbaum Associates, Publishers.
