UBC Theses and Dissertations

Perception of rhythm in sequences of clicks and of syllables D’Arcy, Janet Mary 1984

PERCEPTION OF RHYTHM IN SEQUENCES OF CLICKS AND OF SYLLABLES

by

Janet Mary D'Arcy
B.A., Simon Fraser University, 1983

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, Faculty of Medicine, School of Audiology and Speech Sciences

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1984
© Janet Mary D'Arcy, 1984

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

DE-6 (3/81)

ABSTRACT

The perception of speech rhythm may be affected by two factors, one being a tendency, by speakers, to lengthen syllables as the utterance progresses, the other being a tendency, by listeners, to impose rhythmic structure on speech sequences. Three tests were constructed in which each stimulus consisted of a sequence of six sounds and each test contained seven different time-altered stimuli. The time alteration was of a nonlinear progressive nature. Test I consisted of sequences of six clicks; Tests II and III consisted of sequences of six [ta] and six [na] syllables, respectively.
Native speakers of English, French and Japanese were asked to rate the regularity of the sequence as accelerating, regular, or decelerating. Results indicate that a stimulus may have values of the alteration parameter which cover a large range and yet still be perceived as regular. However, in all cases, the stimulus that would be produced using the values corresponding to a subject's perception of regularity would be acoustically anisochronous: the timing of this stimulus would always be decelerated. No significant difference in performance was found between English, French and Japanese subjects, nor between click, [ta], and [na] stimuli, nor between the six test orders. The DL for irregularity could not be measured accurately, but it was estimated. A further experiment will be necessary to determine this DL more accurately.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENT
DEDICATION
Chapter 1 INTRODUCTION
Chapter 2 LITERATURE REVIEW
  2.1 Introduction
  2.2 Timing and Rhythm
  2.3 Timing in Language and Isochrony
Chapter 3 AIMS OF THE INVESTIGATION
Chapter 4 EXPERIMENTAL PROCEDURES AND APPARATUS
  4.1 Pilot Study
  4.2 Pilot Test I
    4.21 Stimulus Preparation
    4.22 Subjects
    4.23 Test Procedure
    4.24 Results
  4.3 Pilot Test II
    4.31 Stimulus Preparation
    4.32 Subjects
    4.33 Test Procedure
    4.34 Results
  4.4 Main Study
    4.41 Stimulus Preparation
    4.42 Subjects
    4.43 Test Procedure
Chapter 5 RESULTS
  5.1 Data Storage and Sorting
  5.2 Pseudo-normalization
  5.3 Perception of Regularity
  5.4 Subject Evaluation Criteria
  5.5 Estimation of DL for Irregularity
Chapter 6 DISCUSSION
BIBLIOGRAPHY
APPENDIX A - SUBJECT INFORMATION FORM
APPENDIX B - INSTRUCTIONS

LIST OF TABLES

I Durational Structure (in samples) for the Ten Stimuli of Pilot Test I
II Durational Structure (in samples) for the [ta] Stimuli of Pilot Test IIa
III Durational Structure (in samples) for the [na] Stimuli of Pilot Test IIb
IV Typical Response Matrix for Subject 2 in Pilot Test IIa
V Typical Averaged Response for All Subjects of Pilot Test IIa
VI Durational Structure (in samples) for the Click Stimuli of Main Test I
VII Durational Structure (in samples) for the Speech Stimuli of Main Tests II and III
VIII Typical Response Matrix for Subject E2 in Main Test II
IX Typical Averaged Response for All English Subjects of Main Test II
X Typical Averaged Response for All Tests for Subject E2
XI Normalized Response Matrix for Subject E2 in Main Test II
XII Stimulus Number Corresponding to Perception of Regularity
XIII Results of ANOVA Performed On 18-Subject Group
XIV Estimated values of Δa for DL Using the Three Best Subjects

LIST OF FIGURES

1 Block Diagram for Digital-to-Analog Conversion
2 Block Diagram for Dubbing of Original Tape and Recording of Item Numbers on Test Tape
3 Mean Score and 2-s.d. Range for Each Stimulus of Pilot Test I, Averaged Over Nine Subjects
4 Block Diagram for Recording of CV Syllables
5 Block Diagram for Analog-to-Digital Conversion
6 Standard Deviations for Subject 3 in Pilot Test IIa
7 Curve Produced by Plotting a-values as a Function of their Corresponding Stimulus Numbers
8 Mean Score and 2-s.d. Range for Each Stimulus of Main Test I, Averaged over All Subjects
9 Mean Score and 2-s.d. Range for Each Stimulus of Main Test II, Averaged over All Subjects
10 Mean Score and 2-s.d. Range for Each Stimulus of Main Test III, Averaged over All Subjects
11 Typical Averaged Response Means for Test I
12 Typical Averaged Response Means for Test II
13 Typical Averaged Response Means for Test III
14 Typical Averaged s.d.'s for Test I
15 Typical Averaged s.d.'s for Test II
16 Typical Averaged s.d.'s for Test III

ACKNOWLEDGMENT

I would like to thank all those who have had a part in this thesis. In particular, I thank:
- Andre-Pierre Benguerel for giving so freely of his time and advice during the research and writing of this thesis.
- John Gilbert for serving on my committee.
- Malcolm Greig, Chinh Le, Brenda Morrison, and Ronnie Sizto for statistical advice.
- All my subjects for their time and attention.
- Maryalyce McDonald for reviewing a portion of the manuscript and for encouragement.
- Trish Batch and Rebecca D'Arcy for their help in preparation of the manuscript.
- All my friends and family, especially Paul, Rebecca, Jack, Roy, and my Mother for their understanding and support.

This thesis is dedicated to the memory of my father, Owen D'Arcy (1907-1970), who believed in the value of further education.

CHAPTER 1 INTRODUCTION

Time is not a thing that, like an apple, may be perceived. Stimuli and patterns of stimuli occupy physical time; and we react to such stimuli by perceptions, judgments, comparisons, estimates, etc. (Woodrow, 1951)

In speech perception, the ability of the auditory system to determine the order and time of occurrence of serial events is important. It is well known that an acoustic signal has three parameters: frequency, amplitude, and time. These physical parameters are generally highly correlated with various attributes of auditory sensation: pitch, loudness, and perceptual timing; and these, in turn, correlate with the prosodic features of speech: intonation, stress, and rhythm. Traditionally, psychoacousticians have studied pitch and loudness phenomena extensively, but have devoted less effort to the study of temporal phenomena.
This neglect may be partially due to the difficulty encountered in measurement: only recently has instrumentation, including computers, provided capabilities to handle all the necessary experimental constraints conveniently. As audition is often considered to be the temporal modality (the ear has a much better time resolution than other sense modalities have), additional attention should be paid to the temporal organization of sound perception and - more importantly for humans - of speech perception.

It has been noted that the suprasegmental patterns of language are always patterns in time (Lehiste, 1971). Suprasegmental features such as stress, intonation, and rhythm depend highly on the acoustic parameters of intensity, fundamental frequency, and duration, respectively. This study is concerned with rhythm, the more or less regular recurrence in time of auditory or of phonetic events possibly arranged in similar patterns of duration. One may rate how rhythmic (or regular) a series of sounds is by assessing the regularity of patterns of durations. If these durations are equal or near equal, the group of sounds will be perceived as regular or rhythmic. This has led many authors to propose the existence of some type of isochrony - that is, the rhythmic organization of speech into more or less equal intervals (Lehiste, 1977) - for the various languages they have studied. It appears that many of these unequal (or near equal) intervals are perceived as equal because their difference from the standard is below perceptual threshold. A tendency toward isochrony is thus found in the signal and the signal is perceived as rhythmic, despite certain irregularities in temporal intervals. However, if one or several of these intervals differ by an amount exceeding this perceptual threshold, the series will be heard as irregular.

It must be recalled that duration, intensity, and fundamental frequency interact in audition.
For example, experiments dealing with minimal audible duration have indicated that the detection of a brief tone may depend on its intensity: the more intense the signal, the shorter the signal duration required to reach threshold. Other studies have indicated that a sound with higher intensity or a sound with higher pitch will appear longer than one with lower intensity or with lower pitch; this effect decreases as duration increases (Fraisse, 1982). Furthermore, it has been shown that when two tonal stimuli are presented with the same intensity level but one with a greater duration, the stimulus with the greater duration may be perceived as being louder. The frequency of the tone affects the period over which the loudness increases: the lower the frequency of the tone, the greater the period of loudness summation (Fraisse, 1982).

Depending on the particular language, linguistic stress may be coded into any of the acoustic prosodic parameters (duration, intensity, and fundamental frequency). Fry (1955) carried out an investigation to assess the influence of the physical cues of duration and intensity on the perception of linguistic stress in English. His results showed that both duration and intensity ratios are cues for stress; however, the former is a more efficient cue than the latter. Allen (1975) noted that fundamental frequency is even more important for stress perception. He also noted that in English the occurrence of duration, intensity, and fundamental frequency is highly correlated: the syllables that are higher in fundamental frequency are also usually greater in duration and in intensity.

Rhythm may be inferred either from the durations of individual sounds, or from the intervals between the beats elicited by these sounds, depending on the stimuli. For example, in a train of clicks, the equality or near-equality of the intervals between clicks may result in the perception of a regular rhythm.
However, in a sequence of syllables, it is unclear whether it is the equality or near-equality of the durations of the syllables, or the equality or near-equality of the interstress intervals which results in the perception of a regular rhythm. Since speech is a continuous process and not merely a succession of discrete sounds produced by independent motor activities, the acoustic cues in the speech signal overlap in time and change with context as do articulatory activities. It is well known that the auditory system appears to be efficient in detecting change and it would appear that the perception of the rhythm of speech must take place in spite of - or perhaps because of - the constantly changing signal.

There are probably several levels of time control in speech production. Allen (1973) proposed two levels, one for long-term control, the other for short-term control. Long-term control would monitor speech rate (the average rate at which syllables, words, and phrases are spoken), and short-term control would monitor phonetic duration (segment duration within the syllable and possibly syllable duration within the rhythmic phrase). These two levels are not totally independent, for there is evidence that the rate of an utterance may affect segment and/or syllable durations as well as pause durations. A judgment of the accuracy of either one of these two levels of control must take into account the influence of the other level. For example, when subjects produce speech as stimuli for a segment duration study, they should be instructed to attempt to hold their speech rate as constant as possible in order to reduce variability from an unwanted source (i.e. from long-term control). For the purpose of this study, the level of interest is the one for short-term control.
Since the term 'irregularity' has at least two clearly distinct meanings and uses in the context of rhythm, it is important to clarify the sense in which this term is used in the present study. Whereas the term 'irregularity' can refer to a randomly located timing alteration in an otherwise regular sequence, it can also refer to a timing alteration which would progressively shorten or progressively lengthen intervals or durations along the time dimension. In his experiment, Hibi (1983) used the term 'irregularity' in the former sense; in this study, the term will be used in the latter. The present study is not primarily concerned with the production ability of subjects, but rather with their perceptual ability, more specifically their ability to perceive whether certain types of stimuli are regular or irregular.

CHAPTER 2 LITERATURE REVIEW

2.1 Introduction

Timing is the organization of events according to a schedule which may or may not involve periodicity. For instance, bus routes are organized according to a schedule. The schedule may be periodic; in such a case, the buses would arrive regularly once per hour. The schedule may also be nonperiodic; in this case, the buses would arrive at irregular times depending on the expected volume of passenger traffic.

The events of interest to this study are sounds or, more specifically, sounds which are important, in the sense that they have salience. One can say that accented or lengthened sounds have salience. Important sounds may be said to contain the 'beat' - that is, the emphasis or the main pulse - of the (sequentially) organized events. For example, if the event of interest is a syllable, the important sound may be the 'syllable beat'. If, on the other hand, the event of interest is a metric foot - a group of accented and unaccented syllables forming a rhythmic unit - the important sound may be the 'stress beat'.
The perception of timing implies an organization (or a mental construction) of the events. The events may be perceived as a number of elements (i.e. component parts) which are organized into one or more groups (or units). Four possible types of organization can be envisaged:
1.a periodic, with one salient element (or beat), and one element only per period.
1.b periodic, with one salient element and one or more non-salient elements per period.
2. scheduling (pre-established, but nonperiodic timing).
3. random (no pre-established timing).

Interested in the temporal organization of speech events, Ohala (1973) investigated three hypotheses (or models) which correspond to the organizational types listed above:
1. Speech units are uttered according to an underlying regular rhythm. This model has periodicity (thus scheduling) and corresponds to types (1.a) and (1.b) above, depending on whether all or only some of the elements are salient.
2. Speech units are produced according to an underlying pre-established time schedule which is basically nonperiodic. This model - called the comb model - has scheduling but no periodicity and corresponds to type (2) above.
3. There is no underlying time programme (or rhythm). Speech elements would simply be strung together, one event being initiated only once the previous one has been completed. This model - called the chain model - has neither periodicity nor scheduling and corresponds to type (3) above.

In his study, Ohala mainly discussed the latter two hypotheses; his results will be discussed later. The present study investigates the first hypothesis, namely types (1.a) and (1.b).

There are several levels of organization in speech at which one can speak of rhythmic organization. Two such levels are:
- the discourse level, where the beats of rhythm are associated with the pauses which separate the successive breath groups. This is a higher level which may also be referred to as 'long-term'.
- the phrase level, where the beats of rhythm are the beats associated with each foot or each syllable. This is a lower level which may also be referred to as 'short-term'.

The discourse level of rhythmic organization is not within the scope of this study. At the phrase level of organization, language rhythm has been said to be determined by the recurrence, succession and coordination of the stress and syllable beats (Abercrombie, 1967). The two earliest types of speech rhythm proposed (Pike, 1945; Abercrombie, 1967) were:
1. syllable-timed language rhythm: the intervals between the syllable beats are isochronous.
2. stress-timed language rhythm: the intervals between the stress beats are isochronous.

If one of the beat sequences is isochronous, the other cannot be: in a stress-timed language, the syllable beats are unevenly spaced and in a syllable-timed language, there may be no stress beats (e.g. in French) and if they exist, the stress beats are unevenly spaced. Other types of language rhythm are also possible and have been proposed (e.g. tone-timing, mora-timing), but will not be elaborated on here.

Since the major interest of the present investigation is the rhythmic aspect of the perception of 'short-term' sound sequences, this study deals with the phrase level of organization and endeavours to investigate various aspects related to both types of periodic organization mentioned above, that is, types (1.a) and (1.b). These two types of periodic organization correspond to syllable-timed and stress-timed languages.

As previously stated, events are organized into temporal patterns, or groups, and these patterns may be perceived as having rhythm. In a discussion of rhythm, Fraisse (1982) stated

The task of those who study rhythm is a difficult one, because a precise, generally accepted definition of rhythm does not exist.
This difficulty derives from the fact that rhythm refers to a complex reality in which several variables [or aspects] are fused. (Fraisse, 1982, p.149)

In a prior study, Fraisse (1963) discussed two important aspects of rhythm, which are also of interest to the present study. One such aspect, which has already been discussed to some extent, is that of periodicity: the recurrence of a beat at regular intervals of time. The second aspect is that of structure: the recurring units all have a pattern with an analogous structure. The rhythm of some phenomena appears to be construed primarily through an awareness of one or the other of these two aspects. For example, the perception of the rhythm of day and night alternation or of a military march is through an awareness of the periodicity of these phenomena. On the other hand, the perception of the rhythm of seasons during the year, of a waltz, or of the feet of a verse is through an awareness of the previously learned structure of recurring structures.

The perception of periodicity and of structure implies the perception of duration. Fraisse (1963, p.76) noted that "The perception of duration is that of the duration of an organization." Duration perception requires (among other things) the comparison of 'segments' or of 'intervals'. However,

...a succession is characterized not by its elements, either stimuli or intervals, but by the scheme of its durations:...the identity of the temporal patterns was what made it possible for rhythms composed of different 'elements' to appear similar. (Fraisse, 1963, p.77)

The durations of segments or of intervals may be manipulated in order to investigate any effects on the perception of timing. For example, experimenters have used duration manipulation in various experimental designs in order to assess the 'psychological present',

...the physical time over which stimuli may be spread and yet all be perceived as present.
This time has sometimes been termed the temporal span of attention. (Woodrow, 1951, p.1230),

and/or the 'indifference interval', a time span within which temporal discrimination is most accurate. Similar manipulations and their effects on the perception of timing in language are currently being investigated extensively. Discussion of the above effects occurs in the remainder of this chapter.

2.2 Timing and Rhythm

In an article on rhythm, Fraisse (1982, p.150) explained that "...rhythm is a perceptual quality specifically linked to certain successions." That is to say, one imposes a rhythm on various elements, but such an imposition may be influenced by certain objective factors. Indeed, subjective factors, individually or in conjunction with objective ones, may intentionally or involuntarily determine the nature of the perceived rhythm.

The perception of the most simple rhythm may occur when the subjective factors entirely determine the grouping.

The most easily perceived rhythm is one that is produced by the simple repetition of the same stimulus at a constant frequency. In the rest of this article, we will call this a 'cadence'. The simplest examples are the beating of a clock or of a metronome. But the most important fact is that these rhythms are characteristic of some very fundamental activities such as walking, swimming, and flying. Both animals and people move about with rhythmic movements characteristic of their species. The first rhythmic movement found in the human newborn is sucking, with periods that follow at intervals of from 600 to 1200 msec. This regularity is interrupted by spontaneous pauses, but sucking movements occur at a cadence that seems to be characteristic for each infant.
(Fraisse, 1982, p.151)

The fundamental activities mentioned above, including those of man, are periodic activities which have their own 'spontaneous' rate of sequence, the natural or voluntary rhythm that results when we are allowed to perform an activity at our own rate. The range of spontaneous rates is 0.5 to 2.0 per second: this frequency depends on the muscular mass concerned (Fraisse, 1982). Spontaneous rate is often measured by the natural speed of tapping. The interval length between two taps varies from 200 to 1400 msec with the most representative interval being 600 msec. There is great interindividual variability of this rate but little intra-individual variability: the intra-individual variability is only 3 to 5%, which is in the range of the differential threshold for durations of this type. The small amount of intra-individual variability may indicate that spontaneous rate is characteristic of the individual (Fraisse, 1982).

Spontaneous rate is often compared with 'preferred' rate, a regular succession judged as being neither too slow nor too fast. If a sequence of sounds, such as clicks uniformly spaced and equally loud, is presented, the preferred rate of sequence increases with the number of elements in the group. The range of preferred rates, indicated by the time between successive clicks, is 0.2 to 1.3 seconds (Woodrow, 1951). The upper limit of our capacity for perception of succession is five to six elements. However, if these identical sounds are grouped or form a unit of significance (e.g. in two's, three's, four's, or five's), far more elements can be perceived: one may perceive four or five groups of these sounds (Fraisse, 1963). This grouping of groups is most likely related to the 7 ± 2 of Miller (1967) and to the notion of 'chunking'.
Spontaneous rate, which pertains to a motoric phenomenon, and preferred rate, which pertains to a perceptual phenomenon, have comparable frequencies and are often associated; a subject generally finds it fairly easy to accompany a regular series of sounds with a motor act.

This accompaniment tends to be a synchronization between the sound and a tap - that is to say, that the stimulus and the response occur simultaneously. [...] A similar behavior is possible only if the motor command is anticipated in regard to the moment when the stimulus is produced. More precisely, the signal for the response is not the sound stimulus but the temporal interval between successive signals. (Fraisse, 1982, p.154)

The perception of a more complex rhythm may occur when objective characteristics determine the grouping. The characteristics of the stimulus sequence may largely determine the nature of the perceived grouping: the relative intensities of the group members, their absolute and relative durations, and their temporal spacing are probably the most important characteristics. Experimenters may manipulate any of these three characteristics and then attempt to answer two types of questions:
(a) What is the nature of the effects principally produced by manipulating these characteristics?
(b) What are the possible durations of rhythmic groups (Fraisse, 1982)?

The first of these two areas experimenters have investigated is the nature of the effects produced by manipulation of various objective characteristics. Although any differentiation in an isochronous sequence of identical elements may serve as a basis for grouping, each one does not have the same effects as far as the organization of the temporal sequence is concerned. An element of greater intensity, for example, may determine the nature of a grouping.
The objective accent is situated most spontaneously at the beginning of the pattern: a regular succession of a strong and of a weak sound of equal duration is perceived as a succession of trochees (strong-weak) 60% of the time and as a succession of iambs (weak-strong) 40% of the time (Fraisse, 1982). An element of greater duration may also determine the nature of a grouping.

In general, a noteworthy lengthening of the duration of a sound or of the interval between two sounds determines the end of a group; this longer duration allows one to distinguish between two successive patterns. (Fraisse, 1982, p.159)

Such a lengthening creates a rupture between two groups. Woodrow (1951) summed up the effects of these two characteristics: a regularly recurring, relatively greater intensity exerts a group-beginning effect, while a regularly recurring, relatively greater duration exerts a group-ending effect (if the temporal spacing is equal). This latter effect is particularly important to the discussion of the present study.

The objective characteristic of temporal spacing may be best investigated in conjunction with intensity. For example, when the temporal spacing and the relative intensities (referred to as loudness in the following quotation) are considered, Woodrow explains:

It is possible to determine at how slow a rate certain factors producing rhythmical grouping lose their force. One such factor is a difference in the relative loudness of sounds composing a rhythmical series. For example, if every second sound in a series of equally spaced sounds is louder than the others, and if the rate of sequence of the sounds is moderately slow, the sounds tend to be perceived in trochaic rhythm, that is, in groups of two with the louder sound the first member of the group. However, when the sounds succeed each other at a very slow rate, this segregating effect of loudness may vanish: the subject no longer reports any grouping at all.
(Woodrow, 1951, p.1230)

According to Woodrow, this grouping seemed to disappear if the temporal spacing - the interval between the sounds - was greater than 3.5 seconds. Fraisse (1963) also observed a limiting value for the temporal spacing of two stimuli, but stated that the rhythmic structure disappeared when the interval between sounds was approximately 2 seconds. This value would be expected to vary depending on the stimuli and the conditions.

Again, experimenters may manipulate objective characteristics in order to investigate a second area: the possible durations of rhythmic groups. As already alluded to above, there are certain limits to the durations of the perceived groups. According to Woodrow (1951), the study of rhythm - the main interest of this paper - affords an indirect way of attempting to determine the upper limit of the psychological present. When a subject perceives a sequence of elements as a sequence of groups of elements (e.g. trochees), the perception is described as

what is meant by rhythm in the psychological sense. The successive groups are ordinarily of similar pattern and experienced as repetitive. Each group is perceived as a whole and therefore has a length lying within the psychological present. (Woodrow, 1951, p.1232)

Experiments on the perception of timing have attempted to estimate the limits (both the upper limit - maximum duration - and the lower limit - minimum duration) of the psychological present.

The question may be better phrased, perhaps, as that of the maximal physical time over which may extend a temporal stimulus pattern, the successive parts of which are perceived as a whole possessing a unitary property of duration. There is also a minimal time which constitutes the lower duration threshold. (Woodrow, 1951, p.1230)

As already stated, the nature of the stimuli may affect the size of this time span, the upper limit of which was found to lie between 2.3 and 12 seconds (Kastenholz, 1922, cited in Woodrow, 1951).
At the lower limit of this time span, a stimulus of short duration results in a unitary experience but since the stimulus is perceived as instantaneous, it is felt to lack the quality of duration. Therefore, there must exist a minimum time limit for duration: this minimum amount of time appears to vary between 0.01 and 0.12 seconds, again depending on the stimulus and the conditions (Woodrow, 1951).

If the temporal length of the stimulus lies within the psychological present, then the timing may be said to correspond to the short-term level. It is the perception of this short-term level in which this study is interested. It has been noted that a rhythmical measure (e.g. syllable, foot) is assumed to constitute a perceptual unity and that this notion seems to be in agreement with results obtained through direct introspections of temporal unity (Woodrow, 1951).

In addition to investigating the limits to the durations of rhythmic groups, one may also investigate the accuracy with which a duration is estimated. The perception of time has been studied by many researchers attempting to discover the size of the 'indifference' interval. The indifference interval was investigated by Vierordt in 1868 (cited by Woodrow, 1951, p.1225) who first noted that "...short intervals are overestimated and long ones are underestimated." A large range of indifference intervals has been reported, with the intervals of 0.5 to 0.7 seconds perhaps being reported most frequently. Since it is for the indifference interval that the most accurate temporal discriminations are found, the use of this temporal interval appears most appropriate when one attempts to measure the just noticeable difference (jnd), or difference limen (DL) for duration, namely the smallest temporal difference one is able to detect.
In 1933, Blakely (cited in Woodrow, 1951) used empty intervals bounded by clicks to measure relative jnd's of interval length and found that discrimination was most accurate at intervals of 0.6 to 0.8 seconds, with the relative jnd being just under 8% of the standard. After reviewing the literature, Woodrow (1951) noted that the relative jnd for durations was almost the same as for intervals. He stated that the differential threshold for sound interval and sound duration had been determined:

In general, the just noticeable difference is smallest at some relatively short magnitude or range of magnitudes and increases both above and below this middle region. (Woodrow, 1951, p.1224)

The relative jnd was slightly greater with shorter and longer standards; however, from 0.2 to 1.5 seconds, it was still less than 10%. Using even longer standards, from 2 to 30 seconds, the jnd rose from 16% to 30%. The accuracy for reproduction of these intervals and durations (for instance, reproduction of an interval by taps on a reaction key) was similar to the accuracy for discrimination, with the greatest reproduction accuracy extending from 0.2 to 2.0 seconds (Woodrow, 1951).

The perception of temporal phenomena is not fully understood. However, any experimental design must consider the two types of factors discussed above, which affect temporal judgment of intervals and of durations. Some of the objective factors are intensity, quality, length of preparatory interval (e.g. amount of time between item number and stimulus), length of bounding elements (in the case of intervals), range of times used, and practice effects. Some of the subjective factors are judgment strategies, attention, interest, motivation, and emotional state. Since time perception is affected by many factors, some of which (e.g.
emotional state) are not easily kept under control and often make the study of time estimation using a strict experimental approach difficult, care must be taken in the experiments proposed for this study in order to minimize any undesired effects caused by the above factors.

Temporal stimuli are ordinarily of two sorts: (1) what are called empty intervals, for example, two flashes of light or two short sounds separated by a period of time; and (2) sounds or lights lasting continuously over a period of time. (Woodrow, 1951, p.1224)

Both sorts of stimuli are of interest in this study. Hereafter, the first type is referred to as an 'interval' and the second type is referred to as a 'duration'. When reference is made to either type in the present paper, it is to be assumed that the stimuli are composed of sounds unless stated otherwise. Furthermore, the types of sounds to be used as stimuli in the experiments carried out for this study are nonspeech sounds for intervals and speech sounds for durations.

Studies dealing with the perception of timing have used both 'multiple' and 'single' temporal stimulus paradigms; however, it is the 'single' stimulus (composed of six elements) paradigm which has particular importance for this paper, since it is the type to be used in the present experiments. In a 'multiple' stimulus (which may be single- or multi-element) paradigm, two time intervals or durations are presented before a subject makes a judgment. For example, the experimenter presents two intervals; the subject then compares them and makes a judgment as to their relative length. In a 'single' stimulus (which also may be single- or multi-element) paradigm, one time interval or duration is presented without the presentation of a second interval or duration with which to compare the first. For example, the experimenter presents a single interval and the subject makes a judgment as to its length.
It should be noted that Woodrow (1951) reviewed various studies and concluded that when judging a single temporal stimulus

...the terms short and long probably always carry the implication of a comparison with a standard even though the subject might be unaware of it. This standard is largely a function of recent temporal experiences, or of a composite trace left by them. (Woodrow, 1951, p.1229)

As already stated, the experiments for this paper will use the single stimulus paradigm; however, the subject will be given the terms 'regular' and 'irregular' for comparison (instead of 'short' and 'long'), with 'irregular' being subdivided into 'accelerating' and 'decelerating'. It may be assumed that these terms also carry the implication of a comparison with a standard whether the subject is aware of it or not.

In a similar study on the rhythmic perception of repetitive sound sequences, Hibi (1983) dealt with the occurrence of one or two random irregularities in a group of clicks. He investigated the amount of temporal distortion which must intervene in a sequence of uniformly spaced sounds in order for a listener to be able to report an irregularity:

...the ability of listeners to report whether there was a distortion in a sequence as a function of the rate of succession did not vary in the regions of from 1 to 2.5 times per second and from 4 to 7 times per second. (Hibi, 1983)

It should be noted, however, that the difference in detection values between these two regions was significant.

The picture emerging from these results, then, is of the subjects' accommodating to the temporal sequences which have different rates of succession ranging from 0.7 to 7.0 times per second. (Hibi, 1983)

As indicated in Chapter 1, this random type of irregularity must be differentiated from the nonrandom type of irregularity such as the gradual lengthening or shortening of intervals that may occur in a grouping of clicks or other sounds.
Again, it is this latter type of irregularity which is investigated in the current study, as such irregularities seem to be important and observable in most types of speech. It is well known that a trend toward lengthening of syllables as the utterance progresses has been found in most languages. Despite the presence of this nonrandom type of irregularity, and despite hesitations and other pauses which tend at times to conceal or to disguise the rhythm, speech rhythm has important implications for perception.

However, Hibi's finding regarding the difference in detection values between a low and a high rate of succession is interesting in that the finding led him to propose two different types of processes which may occur in rhythmic perception. One type of process appears to be an 'ongoing' or 'one-by-one' processing which takes place when the rate of succession is low (slower than three times per second). In this type of processing, a listener would expect the next element at a predictable time and then compare the perception of this next element with his expectation; each element is processed separately. This mechanism would account for the fact that negative compensation among neighbouring time intervals results in the perception of a regular rhythm.

The other type of process appears to be a 'holistic', 'global', or 'Gestalt-like' processing which takes place when the rate of succession is high (faster than three times per second). In this sort of processing, there would not be sufficient time to process in a one-by-one manner; thus the regularity of the timing pattern would be evaluated in two steps:

First, it is necessary to postulate a regular temporal pattern, that is, the pattern of time intervals that would yield an even tempo, and then to detect a departure from these regular values in the observed intervals by pattern-matching or similar routine. (Hibi, 1983)

In this way, the sequence is processed as a whole.
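The two steps of the holistic process described above can be illustrated with a minimal sketch. It is not Hibi's procedure: the even tempo is taken here to be the mean of the observed intervals, the departure is measured as a relative deviation per interval, and the 10% threshold is a hypothetical value chosen only for the example.

```python
def even_tempo_deviations(onsets_s):
    """Step 1: postulate the even-tempo interval for the whole sequence.
    Step 2: express each observed interval as a relative departure from it."""
    intervals = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    # Equal intervals spanning the same total time as the observed sequence.
    regular = (onsets_s[-1] - onsets_s[0]) / len(intervals)
    deviations = [(iv - regular) / regular for iv in intervals]
    return regular, deviations


def is_irregular(onsets_s, threshold=0.10):
    """Flag the sequence if any interval departs from the even tempo
    by more than the (hypothetical) threshold."""
    _, deviations = even_tempo_deviations(onsets_s)
    return max(abs(d) for d in deviations) > threshold


# Six onsets at four per second, then the same sequence with one onset delayed.
print(is_irregular([0.0, 0.25, 0.5, 0.75, 1.0, 1.25]))
print(is_irregular([0.0, 0.25, 0.5, 0.85, 1.0, 1.25]))
```

The point of the sketch is that the whole sequence must be available before the regular pattern can be postulated, which is what distinguishes this process from the one-by-one comparison of each element with an expectation.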
Both of these processes could occur at the short-term level. However, it is the latter process which is of particular interest to the present study since our stimuli are based on a syllabic rate which is greater than three times per second.

2.3 Timing in Language and Isochrony

Since language appears to have the same rhythmic constraints as other motor behaviours, limits on the types of language rhythms to be expected are set by these constraints. Allen (1975) suggested that:

...[language rhythms] should be simple in structure, confined largely to successions and alternations, depending on the relationship between syllables and stress accent in the language; the rate of succession of syllables and rhythmic groups should be in or near the range of 0.2 to 1.0 per second; and the variability with which the time program of the rhythm is realized should conform to the variability of other skilled motor acts. (Allen, 1975)

In addition, these constraints impose more subtle, unspecified limitations for a particular language rhythm, which may be due to certain aspects of articulatory timing which must be learned.

A claim made by some authors (Allen, 1975; Fraisse, 1982) is that the natural rhythms of our movements reinforce the tendency to impose rhythm on language and other human acts. These authors often cite a study by Miyake (1902) which looked at the rhythmic structure of various motor behaviours. Miyake came to two conclusions: 1. it is impossible not to act rhythmically and 2. simple successions and alternations are most prevalent in our movements. Allen (1975) and Fraisse (1982) further cited studies which showed that when we are performing a motor task at our spontaneous rate, for speech and other motor activities, the average limits are approximately 0.2 and 2.0 seconds between acts.
The difference in these spontaneous rates of succession depends on who we are, what we are doing, and whether we are reproducing a given interval or producing our own interval (variability is less if the rate is our own). These figures would be comparable to the preferred rate discussed in Section 2.2.

Classe (1939) was one of the first to propose that isochrony was present in the rhythm of English prose. He claimed that isochrony was probably the essential characteristic of rhythm. However, Classe appreciated that perfect isochrony could be realized only under very special and definite conditions and that these conditions were seldom fulfilled in ordinary speech.

Due to the variation in interstress intervals, many studies have rejected the idea of strict isochrony at the acoustic level, or they have simply stated that perfect isochrony cannot be found (Classe, 1939; Shen and Peterson, 1962; Allen, 1968; Lea, 1974; etc.). Hoequist (1983a) claimed that, due to other constraints, speakers do not usually attain their goal of altering production in order to move toward isochrony. Indeed, strict isochrony in the acoustic signal does not appear to hold for any of the languages examined. The results of Hoequist's study support a weaker version of isochrony: a tendency toward isochrony seems apparent, but isochrony is affected by other processes such as prepausal lengthening (lengthening before a pause, for instance, at the end of a word, phrase, or utterance), changes in rate, etc.

In 1969, Crystal proposed a theory of 'subjective isochrony', namely that rhythmic regularity is 'read into' the perceived utterance. Lehiste (1977), in her review of the literature on isochrony, reached the same basic conclusion. She concluded that isochrony is primarily a perceptual phenomenon imposed on acoustic anisochrony.
Lehiste further concluded that isochrony is integrated into the grammar of English at the syntactic level: large enough deviations from the expected isochrony may serve to signal a syntactic boundary. These conclusions would agree with Fraisse's (1982) explanation of rhythm, in general, being a perceptual quality (see Section 2.2). Lehiste (1977) writes that many durational differences in interstress intervals may be below perceptual threshold; furthermore, listeners tend to impose a rhythmic structure on speech and therefore may perceive isochrony even in sequences with differences above perceptual threshold.

Ladefoged (1975) noted that different languages had different rhythms due to their variation in use of stress. For example, the rhythm in French has a great evenness, as only the phrase-final syllable has a stress different from the other syllables. From this viewpoint, French is a syllable-timed language with syllables tending to recur at regular intervals. However, in English, stress has a dual role: it carries grammatical meaning and organizes rhythm. Therefore, the syntactic and rhythmic patterns interact, often making the stress patterns difficult to describe.

With regard to these different rhythms, the durational characteristics of speech are often proposed as a basis for classifying any language into a particular rhythm or timing category. Three of the categories proposed most often are stress-timing, syllable-timing, and mora-timing, which would have a trend toward equal durations of feet, syllables, and morae, respectively. Much has been written regarding the possible existence of these categories and it is often assumed that English, French, and Japanese are examples of stress-, syllable-, and mora-timed languages, respectively.

When discussing the temporal patterns of speech, some authors (e.g. Lehiste, 1977) use the term isochrony to refer to stress-timed languages only, while others (e.g.
Hoequist, 1983a; 1983b) use the term to refer to all languages, regardless of the timing category. This paper intends to use the term isochrony to refer to all language timing categories.

In a study on the location of rhythmic stress beats in English, Allen (1972a; 1972b) gave evidence for English being a stress-timed language: both the actual beat locations and the correlation of accuracy with degree of stress supported this notion. As noted earlier in this section, Lea (1974) found evidence against strict isochrony in English: stressed syllables show a tendency to occur at intervals of 400 to 500 ms; however, the number of unstressed syllables between stresses very much affects these intervals and causes large interval variation. In his study on prosody, he concluded that

...the concept of English being a stress-timed language is not simply exhibited by exact equality of interstress intervals, or even by an unquestionable 'tendency toward equality' of interstress intervals regardless of other factors. We found that, contrary to several published hypotheses, the average interstress interval increases about linearly with the number of unstressed syllables between the stresses. A tendency toward stressed-unstressed alternation was exhibited, and it is probably this tendency, plus the somewhat uniform durations expected for unstressed syllables, that yields the tendency for interstress intervals to cluster somewhat near an average of 400 ms or so. (Lea, 1974)

This stressed-unstressed alternation may be supported by evidence from speech production. Since an economical model of speech predicts a parallel rhythm pattern for perception and production, Stone (1981) measured jaw movement during speech production of nonsense syllables to see if a rhythm pattern was present. Her attempt to find a physiological correlate of stress level in the jaw displacement signal was not successful, as displacement was affected by nonrhythmic variables.
However, the opening velocity of vowels gave cues for three levels of stress: unstressed, stressed, and 'beat' syllables. Furthermore, a rhythm pattern due to the alternation of these stress levels (i.e. alternating beat and non-beat syllables) was present.

The presence of the 'compression effect' (or of temporal compensation) at the foot level - the shortening of syllable durations as the number of syllables in a foot increases - has been suggested by various investigators (Allen, 1968; Lehiste, 1970; Huggins, 1972; and Klatt, 1976). In a cross-linguistic study, Hoequist (1983a) investigated whether duration was altered differentially depending on the timing category to which a particular language belonged. He suggested that a stress foot in a stress-timed language consists of a foot-initial syllable which carries the stress 'beat', followed by zero, one, or more less-stressed syllables, up to but not including the next stress 'beat'. He referred to the compression effect at the foot level as evidence for the existence of stress feet. This compression effect allows maintenance of even or near-even foot durations.

In a study on hierarchical (rhythmic) versus serial structure in behaviour, Martin (1972) noted that syllables in a string are timed relative to the whole pattern rather than simply concatenated. This would provide evidence for preset timing and would favour central timing mechanisms during production, rather than peripheral timing mechanisms. He hypothesized that the accented syllables in English are planned first, since the accented elements dominated the temporal organization of an utterance. Lower level, less-stressed syllables are planned later. The accented syllables would appear to be the main targets in such a programme. This view would tend to support the existence of the compression effect in that the timing of the stressed syllables is most important and therefore takes priority.
The timing of the less-stressed syllables is less important and, therefore, susceptible to compression in order to maintain the timing of the stressed syllables.

The compression effect should not occur at the foot level in a syllable-timed language, as presumably no foot structure is present and syllables will therefore tend toward equal durations. Nor should this compression effect occur in a mora-timed language, as the morae will also tend toward equal durations. In another study, Hoequist (1983b) had subjects produce sentences containing multisyllable target words (with stress occurring in most positions in these target words) in order to test durational characteristics of these same three timing categories. His conclusion was that morae and stress feet influenced syllable duration in Japanese and English, respectively; however, Spanish showed no language-specific durational characteristics resulting from syllable-timing (assuming that Spanish is a syllable-timed language). In addition, stress-timed languages showed a small trend toward medial syllable shortening - perhaps due to stress context. These results led Hoequist to suggest that Japanese may be a duration-controlling language; English may be a duration-compensating language; and Spanish may be neither. He noted that other results supported these differential durational characteristics according to the timing category of the particular language involved (e.g. Delattre, 1966, cited in Hoequist, 1983a).

In a discussion of unemphatic stress in French - a prominence of some kind which falls on the last syllable of an isolated word or the last syllable of each group of words in a longer utterance - Benguerel (1970) noted that the prevailing factor was vowel duration.
Referring to these groups of words as rhythmic groups, sense groups, or breath groups, he showed that in his data, obtained for French subjects, the last vowel of the rhythmic group had an average length 50% greater than that of the next-to-last vowel. In a study of English, Klatt (1975) found this phrase-final lengthening phenomenon to be an average of 30%. This unemphatic stress, or prepausal lengthening (a tendency toward longer durations as one gets to the end of an utterance, e.g. word, phrase, sentence), does not appear to be an exclusive characteristic of a particular timing-type. However, the amount of final lengthening that occurs in a particular timing-type could indicate different degrees of durational control. This may produce indirect evidence with regard to the timing category of a particular language.

Allen (1975) indicated that different rhythmic patterns result in differences in phonetic 'shape' between accented syllables and unaccented ones. In English, unstressed syllables lose their phonetic shape as they are 'reduced' in both quality and quantity. However, in French and Japanese, the unstressed syllables or morae do not lose their phonetic shape. This would indicate that the rhythmic pattern is tied to syllables, morae, and accents. The claim has been made that English appears to have mainly a rhythm of alternation since it usually has one or more unaccented syllables per foot (Allen, 1975). On the other hand, French and Japanese appear to have mainly a rhythm of succession since stress plays a weaker role in these two languages and they tend to group the syllables or morae into sequences of equals (Allen, 1975).

Abercrombie (1967) noted that it was the way stressed and unstressed syllables succeeded each other that produced the rhythm of a language.
This rhythm is one of the most fundamental aspects of the language, for it is one of the earliest learned by an infant, one of the last to disappear in various forms of aphasia, and one of the most difficult for an adult to modify when learning a foreign language.

It has been suggested that identical rhythmic sequences are perceived differently by speakers of different languages as a consequence of the rhythmic pattern of their particular language (Jakobson, Fant, and Halle, 1963). Allen (1975) added that different experimental situations might result in a subject's language having different effects on his perceptions, perhaps as a function of whether or not the subject was listening in the 'speech mode'.

As discussed in Section 2.1, it appears that when attempting to decide whether a particular language has a rhythm of succession or a rhythm of alternation, one must specify the hierarchical rhythmic level being considered, for the type of rhythm may be a function of this level. For instance, at the foot level, English may be said to have a rhythm of succession; however, at the syllable level, English may be said to have a rhythm of alternation. These two levels (foot and syllable) appear to be sub-levels of the phrase level of organization discussed earlier.

It must be emphasized that there is evidence that the phrase (or short-term) level and the long-term level, which is not within the scope of the present paper, interact: languages may differ in the way in which rate affects segment duration (Lehiste, 1970). In some languages (e.g. English), an increase in speech rate may be achieved by shortening unstressed syllables, while in other languages (e.g. French), an increase in rate may be achieved by shortening each syllable more or less proportionately over the entire speeded up utterance.
In addition to the two hierarchical, central components, namely short-term and long-term control, Allen's (1973) and Ohala's (1973) models for durational control had peripheral sources of imprecision: neuromuscular transmission, articulator and aerodynamic constraints, and measurement error. Allen noted that the effects of the peripheral sources of imprecision on speech duration could account for much of the individual variation that is encountered in the measurement of the various language rhythm categories.

To compensate for individual variation, the listener appears to impose a rhythmic structure on the speech signal, as long as the timing is not too distorted. The goal of the speaker appears to be the production of a sound pattern which fits an auditory perceptual target. The fulfillment of this goal results in speech being perceived as regular by the listener. Since it is the 'important' sounds or 'beats' of the organized speech events which listeners appear to use in order to impose a rhythm on language, various studies which endeavour to determine rhythmic beat location are discussed below.

In an attempt to validate various findings related to 'units of perceptual processing' (e.g. segment, syllable, word, and foot level), Barry (1983) used a judgment task in which subjects were required to indicate the location of a click in a string of nonsense syllables. With particular regard to stress-timed languages, he sought evidence for perceptual units below the major syntactic group level. His working hypothesis was that

Clicks located objectively 'within' a certain unit of perceptual processing will tend to be placed subjectively within that unit, but will not be differentiated as to their location within the unit. (Barry, 1983)

Since the location of click placement was random for segments, he concluded that segments were unlikely to be perceptual processing units.
However, localization within the syllable and the foot resulted in fewer placement errors, supporting the claim that both the syllable and the foot were possible processing units, with the syllable being perhaps the basic unit.

In an investigation of the perception of timing in conversational speech (English sentences), Huggins (1972) previously had reached a conclusion similar to Barry's. Huggins proposed a 'vowel onset' hypothesis where compensation occurs between the onset of stressed vowels and, therefore, these stressed vowel onsets tend toward invariance.

Huggins' vowel onset hypothesis would be compatible with the 'syllable beat' hypothesis proposed by Allen (1972a; 1972b). To examine the perceived location of rhythmic stress beats in English, Allen used click placement tasks and English sentences. Subjects performed three different tasks:

1. tapped their fingers in time to the beat of a given syllable in a sentence.
2. moved a click to match the beat of a given syllable in a sentence.
3. judged whether a click 'hit the beat' of a given syllable in a sentence.

Allen concluded that the beats were closely associated with stressed-syllable vowel onset but preceded it by an amount positively correlated with the degree of stress.

In a paper on speech perception, Morton, Marcus, and Frankish (1976) introduced the concept of the 'perceptual center' (P-center) of a word as "its psychological moment of occurrence", that is, a point which must be temporally equidistant from corresponding points in surrounding words in order for the sequence to be perceptually isochronous. They realized that this point would not necessarily be the acoustic onset of the word (or of the vowel). By presenting pairs of digits, they found that sequences which were acoustically isochronous were not perceptually isochronous (i.e. sequences with regular acoustic onsets were not perceptually regular).
When subjects adjusted the sequence timing until they perceived isochrony, they consistently introduced acoustic anisochrony.

In a follow-up study on the P-center, Fowler (1979) found "...that listeners judge isochrony based on acoustic information about 'articulatory' timing rather than on some articulation-free acoustic basis." The manner class of the initial consonant had acoustic consequences on the articulatory onset (Fowler, 1979). The 'P-center' of a syllable appears to correspond to the 'beat' of a syllable and both of these locations are affected by the syllable-initial consonant. It would seem that the P-center and the beat are measurements of the same phenomenon.

According to Allen (1975), the perceived rhythm of English may be due to two tendencies:

1. a tendency to impose a rhythmic structure on any sequence of intervals.
2. a tendency to adjust the perception of interstress intervals toward some average or central duration.

These two tendencies would lead to perceptual isochrony, despite the existence of acoustic anisochrony.

In his paper on temporal regulation of speech, Ohala (1973) looked for support for either the chain or the comb model in speech timing. These models appear to correspond to Kozhevnikov and Chistovich's (1966) closed and open loop models, respectively. Ohala concluded that the speaker must adhere to a pre-programmed time schedule for short spans of speech (e.g. one or two syllables) but that such is not the case for longer stretches. This would correspond to his proposed hybrid model: the comb model for short-term timing and the chain model for long-term timing.

Speech rhythm is often said to be experienced as a rhythm of motor movement (Abercrombie, 1967). Inasmuch as the present study is primarily concerned with the perception of a regular rhythm, not the production, it must be assumed, at least in part, that the two can be separated, or at least observed independently.
If rhythm is perceived through neural interactions between areas of the brain which govern production and perception, perhaps there is a common mechanism underlying both the production and the perception of a temporal sequence. Such an idea is very akin to the motor theory of speech perception and most theories seem to be based on this premise. The motor theory hypothesizes that auditory patterns are interpreted in terms of the articulatory patterns that would elicit them. The process is not presumed to be a conscious process, but a rapid, computer-like cross-check performed subconsciously by the sections of the nervous system that deal with auditory, speech-motor, and linguistic functions.

Two phenomena (already described in this chapter) have important implications for the experiments to be carried out in this study:

1. the trend toward lengthening of syllables as an utterance progresses.
2. the tendency of acoustic isochrony of speech sequences not to result in perceptual isochrony.

In other terms, speech may have timing irregularities which are perceived as regular. Since this study is interested in the difference limen (DL) for timing irregularity, both of the above phenomena must be taken into consideration in order to discover: (a) what degree of irregularity is required for a stimulus, which contains elements of progressive irregularity, to be perceived as regular; (b) what increment or decrement in the degree of irregularity (for accelerating and decelerating stimuli) is required for the detection of an irregularity.

CHAPTER 3
AIMS OF THE INVESTIGATION

This thesis proposes to study the ability of an individual to perceive a progressive irregularity in the rhythm of a repeated nonspeech or speech sound sequence. For this purpose, two types of experiments will be carried out. In the first type, each stimulus will consist of a train of clicks.
In the second type, each stimulus will consist of a sequence of concatenated CV (consonant-vowel) syllables. These speech stimuli will use a basic CV syllable, monotone and of constant amplitude, in an attempt to control some of the variables (mainly fundamental frequency and intensity) which can affect the perceived quality of speech sounds. Since this study deals with aspects of rhythmic structure that do not involve pitch and loudness, we are manipulating only the time dimension, and although this type of manipulation may affect pitch and/or loudness, any such effect is assumed to be minimal.

Experiments have been carried out (Hibi, 1983) to assess the degree of temporal distortion that is required in a repetitive sound sequence (trains of clicks) in order to perceive a randomly located irregularity. In the experiments proposed here, stimuli will be altered to varying degrees by either shortening or lengthening gradually the intervals between clicks or between syllable beats. These experiments have two main objectives, based primarily on two tendencies discussed in Chapter 2, namely (1) the tendency, by speakers of most languages, to lengthen durations (of segments, syllables, words) as their utterance (word, phrase, sentence) progresses; (2) the tendency, by listeners, to impose a rhythmic structure on sound sequences.

The first objective of the study is to determine the degree of irregularity required for the listener to perceive a repetitive sequence (of clicks or of syllables) as regular. Since there is a trend toward lengthening of syllables as the utterance progresses and since listeners tend to impose a rhythmic structure on sound sequences, the stimulus which is perceived as most regular is not expected to be the one which is acoustically regular, but one which is decelerated. Hence, the intention is to discover the degree of deceleration that a subject perceives as regular.
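The gradual shortening or lengthening of intervals described above can be sketched as follows. The alteration rule used here, in which each interval is the previous one multiplied by a constant ratio, is an assumption made purely for illustration, as are the function names, the 250-ms base interval, and the ratios; the rule actually used to construct the stimuli is not stated in this chapter.

```python
def progressive_intervals(base_ms, ratio, n_sounds=6):
    """Intervals between successive sounds, each one `ratio` times the previous.
    ratio > 1 lengthens the intervals, ratio < 1 shortens them, ratio == 1 leaves
    them equal. (Hypothetical rule, not the one used for the thesis stimuli.)"""
    return [base_ms * ratio ** k for k in range(n_sounds - 1)]


def describe(intervals):
    """Label a sequence of intervals by its physical trend."""
    pairs = list(zip(intervals, intervals[1:]))
    if all(b == a for a, b in pairs):
        return "regular"
    if all(b > a for a, b in pairs):
        return "decelerating"
    if all(b < a for a, b in pairs):
        return "accelerating"
    return "mixed"


# Illustrative values only: six sounds, hence five intervals per stimulus.
for ratio in (0.9, 1.0, 1.1):
    print(ratio, describe(progressive_intervals(250.0, ratio)))
```

A set of stimuli differing only in such a ratio would range from accelerating through acoustically regular to decelerating, which is the kind of continuum along which the perceived-regular point is sought.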
The second objective of the study is to determine, with respect to the above perceived regularity, how large an increment or decrement in the degree of irregularity is required for the listener to perceive that a repetitive sequence (of clicks or of syllables) is no longer regular. In other terms, we are interested in a measure of the degree of irregularity required for the perception of regularity and also in a measure of the DL for timing irregularity.

In addition to the two main objectives of these experiments, there are two others. A third one is to determine if the nature of the stimulus has an effect on either of the two measures of rhythm discussed above. A fourth objective is to determine if the language background of the listener has an effect on either of these two measures. This study will use native speakers of English, French, and Japanese as subjects in both types of experiments. Since these three languages are alleged to differ by having, respectively, stress-timing, syllable-timing, and mora-timing, an attempt will be made, in analyzing the results, to discover any correlation that may exist between the type of timing (and the particular language) and either of the two measures of rhythm.

The subjects in this study will, in most cases, have some background in linguistics and hence will not be considered linguistically naive. This may make them more aware of durational patterns in speech than naive subjects would be, but should help rather than prejudice the study. In a paper on the rhythmic perception of seven languages, Miller (1984), using both linguistically naive and non-naive subjects, found that the non-naive subjects demonstrated greater discrimination ability than the naive subjects did.
This supports the decision to utilize linguistically non-naive subjects, as it is desirable to obtain the best possible discrimination ability and, thereby, the best estimate of the difference limen for irregularity.

CHAPTER 4

EXPERIMENTAL PROCEDURES AND APPARATUS

4.1 Pilot Study

In order to determine the stimulus structure and the parameter values of the stimuli for the main study, a pilot study was carried out. The stimuli used in Pilot Test I were sequences of clicks, while those used in Pilot Test II were sequences of CV syllables. The two syllable types used in Pilot Test II were [ta] and [na]; however, only one syllable type was heard during any particular sitting.

Three test tapes were produced in order to estimate the values of timing distortion necessary for subjects to detect that the rhythm of a sound sequence was not regular. It was decided that the values with the greater amount of distortion should produce close to (but slightly less than) 100% detection so that the intermediate steps would be small enough to estimate the DL for irregularity. The goal, therefore, was to produce stimuli with extreme accelerated and decelerated parameter values which could correspond to anchor points between which the DL's for rhythmic irregularity would lie. The pilot tapes were also used to verify that the quality of recording was adequate and to make any necessary improvements.

4.2 Pilot Test I

4.21 Stimulus Preparation

This test used only click stimuli, each stimulus [1] consisting of six 1-ms clicks and each click being separated from its neighbour by a specific silent interval.

[1] In the remainder of this paper, the term 'stimulus' will be used interchangeably with 'stimulus type', whereas the term 'test item' will be used interchangeably with 'stimulus token'.

Each click consisted of 10 samples (1 ms), that is, one cycle of a 1-kHz sinewave; this cycle always had the same (zero) starting phase.
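The click construction just described can be sketched in a few lines. This is only an illustration, not the PDP-12 programmes used in the study: the function names are hypothetical, amplitudes are left unscaled, and the example train assumes five equal silent intervals of 1270 samples (127 ms), as for an acoustically regular stimulus.

```python
import math

FS = 10_000          # sampling rate in Hz, as stated in the text
CLICK_SAMPLES = 10   # 10 samples = 1 ms at 10 kHz

def make_click(freq=1000, fs=FS, n=CLICK_SAMPLES):
    """One cycle of a 1-kHz sinewave starting at zero phase (hypothetical helper)."""
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]

def click_train(intervals):
    """Six clicks separated by the given silent intervals (in samples)."""
    click = make_click()
    samples = list(click)
    for gap in intervals:
        samples.extend([0.0] * gap)   # silent interval
        samples.extend(click)         # next click
    return samples

# Assumed example: a regular train with five 1270-sample intervals.
regular = click_train([1270] * 5)
duration_ms = 1000 * len(regular) / FS
```

Changing the list passed to `click_train` is all that is needed to obtain an accelerated or decelerated train.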
The length of the intervals depended on the timing of the series, which was either speeded up, regular, or slowed down: ten different temporal sequencing patterns were used. Table I shows the number of samples for each stimulus and illustrates the timing alterations. Stimulus 5 was acoustically regular in timing: each interval consisted of 1270 samples (127 ms). Stimuli 1 to 4 and 6 to 9 also had the first four intervals equal, while the last interval was either shorter or longer than the previous intervals. Stimulus 10 had each interval (after the first) longer than the previous interval by a constant amount of 256 samples (25.6 ms).

TABLE I
Durational Structure (in samples) for the Ten Stimuli of Pilot Test I

Stimulus No.   C1    I1   C2    I2   C3    I3   C4    I4   C5    I5   C6   Length of Final Interval
     1         10  1270   10  1270   10  1270   10  1270   10   246   10    24.6 ms
     2         10  1270   10  1270   10  1270   10  1270   10   502   10    50.2 ms
     3         10  1270   10  1270   10  1270   10  1270   10   758   10    75.8 ms
     4         10  1270   10  1270   10  1270   10  1270   10  1014   10   101.4 ms
     5         10  1270   10  1270   10  1270   10  1270   10  1270   10   127.0 ms
     6         10  1270   10  1270   10  1270   10  1270   10  1526   10   152.6 ms
     7         10  1270   10  1270   10  1270   10  1270   10  1782   10   178.2 ms
     8         10  1270   10  1270   10  1270   10  1270   10  2038   10   203.8 ms
     9         10  1270   10  1270   10  1270   10  1270   10  2294   10   229.4 ms
    10         10  1270   10  1526   10  1782   10  2038   10  2294   10   229.4 ms

Cn = Number of samples per nth click
In = Number of samples per nth interval

A distinction should be made here between three types of irregularity that can occur in a sequence of sounds:

(1) A single departure from regularity, where just one interval is altered, as is the case for stimuli 1 to 4 and 6 to 9 of Pilot Test I.
(2) A progressive irregularity, increasing or decreasing linearly in duration, as is the case for stimulus 10 in Pilot Test I.
(3) A progressive irregularity, increasing or decreasing according to some nonlinear function, as will be the case in Pilot Test II and in the main study.

The stimuli were prepared on a PDP-12 digital computer, using a set of computer programmes written by Lloyd Rice at the U.C.L.A. Phonetics Laboratory. A 1-kHz tone from a sine wave generator was sampled into the computer at a rate of 10 kHz. Fifty test items (click sequences) were recorded in a pseudo-random order ensuring that there were five repetitions of each stimulus type, with no stimulus appearing twice in succession. The test items were first run through the D/A converter, lowpass filtered at the Nyquist frequency (5 kHz) with an RC lowpass filter having a 48 dB/octave skirt slope, and recorded onto the first track of an audio tape, using a two-track Revox A77 tape recorder, as shown in Figure 1. In order to allow the first five items on the tape to serve as practice items, these items were repeated at the end of the tape. Only items 6 to 55 were considered in the analysis.

The actual test tape was obtained by dubbing the original tape onto the first track of a second audio tape (with another Revox A77 tape recorder) as shown in Figure 2. Item numbers were recorded onto the second track of the test tape and the two tracks were electronically mixed for the actual testing.

4.22 Subjects

Nine adults (7 females and 2 males), none of whom were linguistically naive, were used as subjects for Pilot Test I. Eight were native speakers of English and one was a native speaker of French. None of the subjects had any known speech or hearing problems.

4.23 Test Procedure

Subjects were seated individually in a quiet room and informed that they would hear trains of clicks.
They were told they would perhaps notice that the timing was altered in some of the trains: in some of the stimuli, the timing might be speeded up, while in others it might be slowed down. They responded on a five-point scale, as suggested by the columns on the response form: column 1 was for 'speeded up', column 3 was for 'regular', and column 5 was for 'slowed down'. Subjects were further encouraged to use columns 2 and 4 whenever they felt the timing was intermediate. The first 10 items were presented to the subject, the tape stopped, and any questions the subject had were answered. The tape was then restarted at item 1. The test was presented through high-fidelity headphones at a comfortable listening level (approximately 70 dB SPL).

[Figure 1: Block Diagram for Digital-to-Analog Conversion. Components: PDP-12 computer (fs = 10 kHz); D/A converter; RC lowpass filter (48 dB/octave, fc = 5 kHz); Revox A77 (19 cm/s).]

[Figure 2: Block Diagram for Dubbing of Original Tape and Recording of Item Numbers on Test Tapes. Components: Revox A77 (19 cm/s); RC lowpass filter (48 dB/octave, fc = 5 kHz); microphone in quiet room; headphones; second Revox A77 (19 cm/s).]

4.24 Results

To analyze the data, each response was coded with a number from 1 to 5, as suggested above. With the aid of a computer programme, a 'response matrix' was computed for each of the subjects. Table IV in Section 4.34 shows a typical example of one of these matrices. In the table, the stimulus types are listed in a column; the response data, after sorting, are listed in a 7 x 5 matrix; and the mean and its standard deviation (s.d.), for each stimulus type, are listed in the two rightmost columns. Table V in Section 4.34 shows a typical example of the averaged response for all subjects in a particular test. These data were obtained by averaging all of the subjects' means and s.d.'s for each stimulus type.
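As a minimal sketch of the per-stimulus summary in such a response matrix, the snippet below computes the mean and s.d. of the five coded responses to each stimulus type. The function name is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption; it appears to agree with the tabulated values. The input rows are the first three rows of Table IV.

```python
from statistics import mean, stdev

def summarize(responses_by_stimulus):
    """Return {stimulus: (mean, s.d.)}, both rounded to two decimals.

    Assumes the s.d. is the sample standard deviation (n - 1 denominator).
    """
    return {
        stimulus: (round(mean(codes), 2), round(stdev(codes), 2))
        for stimulus, codes in responses_by_stimulus.items()
    }

# Coded responses taken from the first three rows of Table IV.
matrix = {
    "TA-1": [1, 1, 1, 2, 1],
    "TA-2": [2, 3, 2, 3, 2],
    "TA-3": [3, 4, 4, 5, 4],
}
summary = summarize(matrix)
```

Averaging these per-subject means and s.d.'s over subjects, stimulus by stimulus, would then give a table of the kind shown in Table V.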
The averaged data are displayed in Table V, by stimulus type.

Figure 3 shows the mean and two s.d.'s, averaged for nine subjects, for each stimulus type of Pilot Test I. This figure gives an indication of the variability in response for each stimulus type. The results indicate that stimulus 5, which had equal inter-click intervals (cf. Table I), was perceived as having regular timing by all but one subject, who twice reported it as speeded up. However, stimulus 6, whose final interval was lengthened by 25.6 ms, was also perceived as regular by all but one subject. In addition, responses to stimulus 6 had a smaller mean (over nine subjects) standard deviation than did those to neutral stimulus 5, as can be seen in Figure 3.

[Figure 3: Mean Score and 2-s.d. Range for Each Stimulus of Pilot Test I, Averaged over Nine Subjects. Ordinate: response; abscissa: stimulus number (1 to 10), with accelerating and decelerating stimuli marked.]

This figure also shows that the variability is far greater for the decelerated stimuli (6 to 9) than for the accelerated stimuli (1 to 4). The reduced variability of the accelerated stimuli indicates that less of an irregularity in the sequence of clicks (and therefore smaller changes in timing structure) may be required for the shortened stimuli if the steps are to be small enough to estimate the DL for accelerated irregularity. These changes in the amount of irregularity would be necessary because most of the corresponding timing structures used in Pilot Test I were too easily detected and therefore may be too extreme to be close to the DL. The large variability of the decelerated stimuli indicates that more of an irregularity in the sequence of clicks (and therefore larger changes in timing structure) may be required for the lengthened stimuli if the steps are to be large enough to estimate the DL for decelerated irregularity.
These changes in the amount of irregularity would be necessary because most of the corresponding timing structures used in Pilot Test I were too difficult to detect and therefore may not include the DL. Stimulus 10, which underwent progressive lengthening, was easily detected as irregular by most subjects. Therefore, less of an irregularity in the sequence of clicks would be necessary to estimate the DL for irregularity in a linearly progressive lengthening situation.

4.3 Pilot Test II

4.31 Stimulus Preparation

The stimuli for this test consisted of manipulated natural speech, each stimulus consisting of a six-syllable sequence where the syllables were phonetically identical, of the same amplitude, and monotone, but where, as in Pilot Test I, the durations were varied. In Pilot Test IIa, the syllable used was [ta]; in Pilot Test IIb, the syllable was [na]. As alluded to above and explained below, the stimuli used in Pilot Tests IIa and IIb differed from those used in Pilot Test I in that they all underwent nonlinear progressive timing alteration, instead of being identical in all but the last interval (stimuli 1 to 9).

A trained phonetician (native speaker of French), seated in a sound-treated booth, produced the desired syllables, which were recorded onto a Revox A77 tape recorder. In order to help the speaker to produce these syllables on a monotone, he heard a 100-Hz pure tone while producing the syllables, as shown in the schematic diagram in Figure 4. He also endeavoured to produce them in such a way that both of the segments (C and V) would be long enough for the intended purpose. The syllables were then sampled at a frequency of 5 kHz, playing the tape at half-speed (9.5 cm/s) through a lowpass filter at the Nyquist frequency (in this case, 2.5 kHz) into a PDP-12 digital computer, using the same computer programmes as before; sampling at 5 kHz from a tape played at half-speed is equivalent to sampling the original recording at 10 kHz. The instrumental layout is diagrammed in Figure 5.
In order to estimate typical values of timing irregularities encountered in natural speech and likely to be perceived as regular, the production data of Oller (1973) were recompiled and plotted, and values were obtained for various segment and syllable duration ratios. The ratios utilized are shown below. These values were also compared to the data of Nooteboom (1973) and Klatt (1975; 1976) and found to be fairly similar. It was decided to use a function which would generate syllable duration values that would most closely match the corresponding median values obtained from Oller's (1973) data. The following three ratios were considered for the match:

$$R_1 = \frac{C_5 + V_5}{C_4 + V_4}, \qquad R_2 = \frac{C_5 + V_5}{C_{123} + V_{123}}, \qquad R_3 = \frac{C_4 + V_4}{C_{123} + V_{123}}$$

In these formulae, $C_j$ and $V_j$ are the durations of the jth consonant and jth vowel, respectively.

The function finally arrived at to compute the successive durations of the six repeated syllables was:

$$D = N_s \exp\left(a \, x^{n} \, z\right)$$

where
$D$ = syllable duration in samples
$x$ = a parameter depending on the position of the syllable in the utterance, which varies linearly and discretely between 0 (first syllable) and 1 (last syllable)
$N_s$ = number of samples in the first syllable
$a$ = scale factor for the exponent of the exponential
$n$ = exponent of $x$ (when $n = 2$, $D$ has the form of the Gaussian function)
$z$ = a factor which adjusts the value of the exponential in such a way that, for $x = 1$ (i.e. the last syllable), it will always have the same value for a given a-value, regardless of the value of $n$

For negative values of $a$, the value $D'(a) = 1 - [D(-a) - 1]$ was used instead of $D(a)$.

According to Gerber (1974), the average syllable rate of conversational speech is slightly above seven syllables per second. Thus, the number of samples, $N_s$, in the first syllable of each stimulus was obtained by dividing the rate at which the digitized sound sequences were sampled (10 kHz) by seven; hence, $N_s$ = 1430 samples (10 000 / 7, rounded).
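The duration function can be tried out numerically. The sketch below is an illustration under stated assumptions, not the original PDP-12 programme: the text gives no numerical value for z, and z = ln 2 is assumed here because, with n = 4, it seems to reproduce the tabulated [ta] durations (Table II) to within about one sample. It is also assumed that the rule for negative a is applied to the duration expressed as a multiple of N_s, i.e. D'(a)/N_s = 1 - [D(-a)/N_s - 1]. The function name is hypothetical.

```python
import math

def syllable_durations(a, n=4, n_s=1430, count=6, z=math.log(2)):
    """Durations (in samples) of `count` syllables for a given a-value.

    Assumptions (not stated numerically in the text): z = ln 2, and the
    negative-a rule is applied to the duration normalized by N_s.
    """
    durations = []
    for k in range(count):
        x = k / (count - 1)                       # 0 for first syllable, 1 for last
        growth = math.exp(abs(a) * x ** n * z)    # D(|a|) / N_s
        if a < 0:
            growth = 1 - (growth - 1)             # mirrored, for acceleration
        durations.append(round(n_s * growth))
    return durations

neutral_candidate = syllable_durations(0.53)    # compare with row TA-4 of Table II
most_accelerated = syllable_durations(-0.20)    # compare with row TA-1 of Table II
```

Under these assumptions the last syllable has duration N_s · 2^a, so a = 1 would correspond to a doubling of the first syllable's duration.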
As stated previously, the median values obtained from Oller (1973) for the three syllable duration ratios defined above were used to estimate a 'neutral' timing sequence. Minimum and maximum values (that is, the extreme accelerated and decelerated stimulus, respectively) for parameter a were then computed in an attempt to find the stimulus extremes which are perceived as irregular most of the time, the idea being that the DL for progressive irregularity should be between the perceived neutral timing sequence and the anchor points.

[Figure 4: Block Diagram for Recording of CV Syllables. Components: sinewave generator (100 Hz); Revox A77 (19 cm/s); microphone and headphones in sound-treated room.]

[Figure 5: Block Diagram for Analog-to-Digital Conversion. Components: Revox A77 (9.5 cm/s); RC lowpass filter (48 dB/octave, fc = 2.5 kHz); A/D converter; PDP-12 computer (fs = 5 kHz).]

For Pilot Test IIa, using the [ta] syllable, the best fit to the median R1, R2, and R3, estimated from Oller's data, was found for the values n = 4 and a = 0.53. These figures were chosen for TA-4, the 'neutral' stimulus (cf. Table II). The two extreme values chosen for parameter a were -0.20 and 1.20; they yielded the ratio values used for stimuli TA-1 and TA-7 and shown in Table II. Two more values of a were selected between the 'neutral' and each of the two extreme values, corresponding to stimuli TA-2, TA-3, TA-5, and TA-6. The proportion used for the duration of the consonantal part of all of the [ta] stimuli was chosen as 47.5% of the total syllable duration. The stimuli were prepared, edited, and recorded as described in Section 4.21. However, as Pilot Test IIa contained only seven different stimuli, the test tape contained only forty test items, the first five serving as practice items and items 6 to 40 being those used in the analysis.

In Pilot Test IIb, the [na] stimuli underwent similar preparation to the [ta] stimuli in Pilot Test IIa.
However, due to the results of the Pilot Test IIa data analysis (discussed in the following section), only four parameter values were used, mainly in order to determine the values of a corresponding to stimuli which are perceived as irregular close to 100% of the time. As shown in Table III, the a-values varied between -0.20 and 0.88.

TABLE II
Durational Structure (in samples) for the [ta] Stimuli of Pilot Test IIa

Stimulus No.    S1     S2     S3     S4     S5     S6       a
TA-1           1430   1430   1425   1404   1346   1217   -0.20
TA-2           1430   1430   1431   1435   1446   1470    0.04
TA-3           1430   1430   1437   1466   1548   1736    0.28
TA-4           1430   1431   1444   1500   1662   2065    0.53
TA-5           1430   1431   1449   1530   1769   2405    0.75
TA-6           1430   1432   1455   1562   1889   2820    0.98
TA-7           1430   1432   1461   1593   2010   3285    1.20

TABLE III
Durational Structure (in samples) for the [na] Stimuli of Pilot Test IIb

Stimulus No.    S1     S2     S3     S4     S5     S6       a
NA-1           1430   1430   1425   1404   1346   1217   -0.20
NA-2           1430   1430   1428   1420   1397   1348   -0.08
NA-3           1430   1431   1449   1531   1774   2421    0.76
NA-4           1430   1431   1452   1548   1836   2631    0.88

Sn = Number of samples per nth syllable

When preparing the [na] stimuli, some difficulty was encountered: in attempting to make the nasal consonant long enough for the syllable duration required, the consonant amplitude was too great in comparison to the vowel amplitude. Hence, several new recordings were necessary for this syllable. The most satisfactory way of overcoming this difficulty was to omit, prior to the preparation of the test stimuli, the two periods of the CV-transition which were immediately before the first period of the vowel. The syllable with shortened transition produced a more natural sounding stimulus than the syllable with full transition. The proportion finally selected for the consonant segment of the [na] stimuli was 40% of the total syllable duration.
Five repetitions of each of these stimuli were recorded in a similar manner to that in Pilot Test IIa, with an additional five items acting as practice at the beginning of the tape and items 6 to 25 being those used for the analysis.

For both Pilot Test IIa and Pilot Test IIb, the actual test tape was obtained by dubbing the original tape onto the first track of a second audio tape (with another Revox A77 tape recorder) as shown in Figure 2. Item numbers were recorded onto the second track of the test tape and the two tracks were electronically mixed.

4.32 Subjects

Three adults (two females and one male) served as subjects for Pilot Test IIa. None of them were linguistically naive. The two female subjects were native speakers of English and the male subject was a native speaker of French. He was also the speaker for the speech material. For Pilot Test IIb, only the male subject and one of the female subjects participated. The subjects had no known speech or hearing problems.

4.33 Test Procedure

The subjects were seated individually in a quiet room. They were informed that they would hear stimuli consisting of sequences of six syllables and that they might notice that the timing was altered (either speeded up or slowed down) in some of the sequences. They were asked to put a check in the appropriate column of the seven-point, forced-choice response form, after listening to each test item. A seven-point scale was used in order to give the possibility of a more graded response; it also allowed the experimenter to potentially split the responses for the intermediate columns more conveniently, for example with a grouping 1-2/3-4-5/6-7. Columns 1, 4, and 7 were labeled 'speeded up', 'regular', and 'slowed down', respectively. Regarding the intermediate columns, the same instructions were given as in Pilot Test I. The first seven items were presented to the subject, the tape stopped, and any questions the subject had were answered.
The tape was then restarted at item 1. The test was presented through high-fidelity headphones at a comfortable listening level (approximately 70 dB SPL).

4.34 Results

In order to analyze the results of Pilot Test IIa (and of IIb), each response was coded with a number from one to seven, according to the column used for the response. The responses and the computed means and s.d.'s were arranged in 7 x 7 matrix form, as described earlier (Section 4.24). Typical results are shown in Tables IV and V. By examining Table V, it can be noted that stimulus TA-7, the most slowed down [ta] stimulus, was perceived as irregular by all subjects in all presentations. Therefore, it was decided that this stimulus type (with the a-parameter value of 1.20) would not be used in the main study, since it resulted in 100% detection of irregularity. Inasmuch as the s.d. of the averaged (for all subjects) mean (cf. Table V) was small for stimuli TA-1 (one s.d. = 0.30) and TA-6 (one s.d. = 0.78), the parameter values for these two stimuli (a = -0.20 and a = 0.98) were chosen as the two anchor points in the main study. The results of Pilot Test IIb confirmed the choice of these parameter values for the two anchor points.

As can be observed in Table V, stimulus TA-3 had an averaged (for all subjects) mean of 4.00 and a corresponding s.d. of 0.24: this stimulus appeared to be perceived as regular most of the time in the pilot study. Hence, a stimulus with the same parameter values as TA-3 (a = 0.28 and n = 4, corresponding to values of R1, R2, and R3 of 1.121, 1.214, and 1.082, respectively) was chosen as the 'neutral' stimulus in the main study.

For each subject, the s.d. values were plotted on a linear ordinate scale as a function of their corresponding stimulus number, itself plotted on a linear abscissa scale, as exemplified in Figure 6. When examining this figure, a 'camel-back' shape may be seen to emerge.
In order to obtain this 'camel-back' shape, there should be minimum variability for three stimuli (the most accelerating, the neutral, and the most decelerating) and larger variability for the stimuli between these three. It is felt that the values of a corresponding to the DL for irregularity should fall between the three minimum variability points, that is, in the areas of maximum variability.

TABLE IV
Typical Response Matrix for Subject 2 in Pilot Test IIa

SUBJECT 2
Stimulus    Responses        Mean    Standard Deviation
TA-1        1  1  1  2  1    1.20    0.45
TA-2        2  3  2  3  2    2.40    0.55
TA-3        3  4  4  5  4    4.00    0.71
TA-4        5  6  5  6  5    5.40    0.55
TA-5        6  6  6  6  6    6.00    0.00
TA-6        6  6  7  6  7    6.40    0.55
TA-7        7  7  7  7  7    7.00    0.00

TABLE V
Typical Averaged Response for All Subjects of Pilot Test IIa

Stimulus    Mean    Standard Deviation
TA-1        1.13    0.30
TA-2        2.60    0.91
TA-3        4.00    0.24
TA-4        5.13    0.59
TA-5        6.27    0.48
TA-6        6.53    0.78
TA-7        7.00    0.00

[Figure 6: Standard Deviations for Subject 3 in Pilot Test IIa. Abscissa: stimulus number.]

4.4 Main Study

The stimulus structure and the parameter values of the stimuli to be used for the main study were chosen on the basis of the results of the pilot study (cf. Sections 4.24 and 4.34). Editing and recording of the stimuli were performed as described in detail in Section 4.2.

4.41 Stimulus Preparation

The timing for all of the stimuli in the main study was computed with the same function (D) used for the speech stimuli in the pilot study. In addition, the intervals between the clicks or between the syllable beats were the same for each set of stimuli. Since the click was placed where the syllable beat was thought to occur, all click stimuli were shorter than their corresponding speech stimuli by approximately one syllable (more precisely 45% of the first syllable, plus 55% of the last one).
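The relation between syllable durations and inter-click intervals can be illustrated with a short sketch. It assumes that each click sits at the assumed beat, 45% of the way into its syllable, so that each interval is 55% of the current syllable plus 45% of the next one; the function name is hypothetical. The two syllable-duration rows used are those of the main-study speech stimuli for a = -0.20 and a = 0.98 (Table VII), and the resulting intervals can be compared with the corresponding rows of Table VI.

```python
def click_intervals(syllables, consonant_pct=45):
    """Beat-to-beat intervals (in samples) from syllable durations.

    Assumes the beat lies `consonant_pct` percent into each syllable,
    so interval k = remainder of syllable k + onset part of syllable k+1.
    """
    vowel_pct = 100 - consonant_pct
    return [
        round((vowel_pct * current + consonant_pct * following) / 100)
        for current, following in zip(syllables, syllables[1:])
    ]

most_accelerated = [1430, 1430, 1425, 1404, 1346, 1217]   # a = -0.20
most_decelerated = [1430, 1432, 1455, 1562, 1889, 2820]   # a = 0.98

intervals_acc = click_intervals(most_accelerated)
intervals_dec = click_intervals(most_decelerated)

# The click train spans less time than the syllable sequence by roughly
# 45% of the first syllable plus 55% of the last one.
shortfall = sum(most_accelerated) - sum(intervals_acc)
```

Six syllables yield five intervals, which is why the click stimuli come out about one syllable shorter than their speech counterparts.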
Based on the results of the pilot study, the values of a to be used for the most accelerating, the neutral, and the most decelerating stimuli were selected as -0.20, 0.28, and 0.98, respectively. These three values were then plotted as a function of their corresponding stimulus number, itself plotted on a linear abscissa scale, as displayed in Figure 7. The values of a for the intermediate points were obtained by quadratic interpolation between these three points. This produced the values for the three sets of tests to be used in the main study, namely a = -0.20, -0.07, 0.10, 0.28, 0.50, 0.72, and 0.98. It is expected that the DL values (measured by a) for the accelerated and decelerated irregularities will be between the neutral stimulus and each extreme.

[Figure 7: Curve Produced by Plotting a-values as a Function of Their Corresponding Stimulus Numbers.]

As in the pilot study, the first syllable of each stimulus had 1430 samples, based on a syllabic rate of 7 per second. Three sets of stimuli, each using seven stimulus-types, were prepared: the set for Test I consisted of click stimuli, whereas the sets for Test II and Test III consisted of [ta] and [na] speech stimuli, respectively. It should be recalled that all stimuli in the main study underwent nonlinear progressive timing alteration.

Each stimulus in Test I consisted of a sequence of six clicks, separated by a specific interval calculated with the same function D (cf. Section 4.31), and the clicks were identical to those in the pilot study. The same computer programmes that computed the syllable durations for Pilot Test II computed the interval durations for Test I. The place of vowel onset was computed (i.e. consonant proportion = 45%, vowel proportion = 55%) and the click was placed at this location. The durational structure for the click stimuli was as shown in Table VI.
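The quadratic interpolation of a-values mentioned above can be sketched as follows. The sketch assumes that the three anchor values sit at stimulus numbers 1, 4, and 7 and fits a parabola in stimulus number through them (Lagrange form); the function name is hypothetical. The result is close to, but not identical with, the reported series: the parabola gives roughly -0.06 and 0.49 where the text reports -0.07 and 0.50, a difference of about 0.01 that might reflect values read from the plotted curve (this is conjecture).

```python
def quadratic_through(points):
    """Return the parabola passing through three (x, y) points (Lagrange form)."""
    (x0, y0), (x1, y1), (x2, y2) = points

    def curve(x):
        return (
            y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
            + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
            + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
        )

    return curve

# Assumed abscissa: stimulus numbers 1 (most accelerating), 4 (neutral), 7 (most decelerating).
anchors = [(1, -0.20), (4, 0.28), (7, 0.98)]
curve = quadratic_through(anchors)

interpolated = [round(curve(stimulus), 2) for stimulus in range(1, 8)]
reported = [-0.20, -0.07, 0.10, 0.28, 0.50, 0.72, 0.98]
largest_gap = max(abs(i - r) for i, r in zip(interpolated, reported))
```

Because the steps between successive a-values grow from the accelerating to the decelerating end, a straight line through the three anchors would not fit, which is presumably why a quadratic was used.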
The stimuli for Test II and Test III were prepared in the same manner, and from the same [ta] and [na] syllables, as those used in Pilot Test II. For a detailed description, see Section 4.31. The durational structure for the speech stimuli was as shown in Table VII. The stimuli were then prepared, edited, and recorded as described in Section 4.21. As each of the three tests in the main study contained five repetitions of each of the seven stimulus-types, plus seven practice items at the beginning and three buffer items at the end of each test, each of the tests contained 45 items, with items 8 to 42 being used for the analysis. The pseudo-random order imposed on the test items of the main study included the constraint that no stimulus-type could be repeated until at least three other stimulus-types had intervened.

A thirty-second, 1-kHz calibration tone was recorded on the actual test tape in order to assure presentation of the stimuli at a repeatable level (approximately 70 dB SPL). Then Tests I, II, and III were dubbed from the original tape onto the first track of a second audio tape (see Section 4.21 for details).

TABLE VI
Durational Structure (in samples) for the Click Stimuli of Main Test I

Stimulus No.   C1    I1   C2    I2   C3    I3   C4    I4   C5    I5   C6      a
CLICK1         10  1430   10  1428   10  1416   10  1378   10  1288   10   -0.20
CLICK2         10  1430   10  1429   10  1425   10  1412   10  1382   10   -0.07
CLICK3         10  1430   10  1431   10  1437   10  1456   10  1499   10    0.10
CLICK4         10  1430   10  1433   10  1450   10  1503   10  1633   10    0.28
CLICK5         10  1430   10  1436   10  1467   10  1564   10  1816   10    0.50
CLICK6         10  1431   10  1439   10  1483   10  1628   10  2025   10    0.72
CLICK7         10  1431   10  1442   10  1503   10  1709   10  2308   10    0.98

TABLE VII
Durational Structure (in samples) for the Speech Stimuli of Main Tests II and III

Stimulus No.    S1     S2     S3     S4     S5     S6       a
TA1, NA1       1430   1430   1425   1404   1346   1217   -0.20
TA2, NA2       1430   1430   1428   1421   1401   1359   -0.07
TA3, NA3       1430   1430   1433   1443   1471   1533    0.10
TA4, NA4       1430   1430   1437   1466   1548   1736    0.28
TA5, NA5       1430   1431   1443   1496   1648   2022    0.50
TA6, NA6       1430   1431   1448   1526   1754   2355    0.72
TA7, NA7       1430   1432   1455   1562   1889   2820    0.98

4.42 Subjects

Twenty-four adults, none of whom were linguistically naive, were used as subjects for the main study. These subjects were divided into three groups according to their language background, as indicated below.

Group I consisted of twelve subjects (6 females and 6 males) who were native speakers of English and whose age varied between 23 and 43 years.
Group II consisted of six subjects (3 females and 3 males) who were native speakers of French and whose age varied between 25 and 50 years.
Group III consisted of six subjects (3 females and 3 males) who were native speakers of Japanese and whose age varied between 24 and 37 years.

All of the subjects had at least a working knowledge of English. None of the subjects had any known speech or hearing problems: any of the subjects who had not undergone audiometric testing in the recent past had their hearing screened prior to the commencement of the experiment. All of the subjects who underwent audiometric screening had hearing within normal limits in the frequencies considered most important for speech.

4.43 Test Procedure

Subjects were seated individually in a quiet room which met the standards required for a hearing screening situation. They were informed that they would hear sequences of nonspeech and speech sounds: in one test (Test I), they would hear stimuli consisting of sequences of clicks; in another test (Test II), they would hear stimuli consisting of sequences of [ta] syllables; and in another test (Test III), they would hear stimuli consisting of sequences of [na] syllables.
They were further informed that they might notice that the timing was altered (either speeding up or slowing down) in some of the sequences. They were asked to put a check in the appropriate column of the seven-point, forced-choice response form, after listening to each test item. A seven-point scale was used for reasons similar to those given in Section 4.33. Columns 1, 4, and 7 were labeled 'speeded up', 'regular', and 'slowed down', respectively. Regarding the intermediate columns, the same instructions were given as in Pilot Test I.

The test tape was played on a Revox A77 tape recorder and presented over TDH-39 headphones at a level of 60-70 dB SPL, as measured with the calibration tone on a Bruel and Kjaer 2203 precision sound level meter with a Bruel and Kjaer 6 cc 4152 artificial ear.

Each of the subjects took the tests in a particular order, as explained below. The order of test presentation varied within each group. The three main test orders were: I, II, III; II, III, I; and III, I, II. One female and one male in each group were arbitrarily assigned to take the tests in one of each of these three orders. Since there were twice as many subjects in Group I as in Groups II and III, one female and one male from Group I were arbitrarily assigned to take the tests in one of each of the following three orders: I, III, II; II, I, III; and III, II, I.

For each of the tests, the first seven items were presented to the subject, the tape stopped, and any questions the subject had were answered. The tape was then restarted at item 1. The subject was allowed to rest for 1 to 3 minutes between tests while the experimenter set up the equipment for the next test.

CHAPTER 5

RESULTS

5.1 Data Storage and Sorting

In order to analyze the results of the perception tests, each listener response was assigned a number from 1 to 7, corresponding to the number of the column in which the listener had registered his answer.
For instance, an answer of 'speeded up' was assigned a 1; an answer of 'regular' was assigned a 4; and an answer of 'slowed down' was assigned a 7. Listener responses were then stored on computer tape so that the figures would be available for data manipulation.

Responses to items 8 to 42 were sorted in matrix form, one matrix per test and per subject. This sorting produced three matrices per subject, each matrix containing 35 items. Each 7 x 5 response matrix was incorporated into a table which included three additional columns: one for the stimulus number, one for the response mean, and one for the corresponding standard deviation (s.d.). Table VIII presents an example of one of these tables for Test II and Subject E2. The response matrices and corresponding tables are described in more detail in Section 4.24.

In addition, the responses were cumulated by stimulus type, and the corresponding means and s.d.'s were computed and displayed in table form, as exemplified in Table IX (all English subjects for Test II) and Table X (all tests for Subject E2).

Figures 8, 9, and 10 show for Test I, II, and III, respectively, the response mean and two-s.d. range for each stimulus type, averaged for all subjects. The two-s.d. ranges indicate the variability in response for each stimulus type in each test. It can be seen that the variability does not show any areas of minimum variability. Absence of areas of minimum variability for the whole group may be due to the fact that the large intersubject variability obscures the location of these areas when the entire subject population is considered. That is, the location of the areas of minimum variability may differ from subject to subject; hence, when the s.d.'s are averaged over all subjects, the areas of minimum and of maximum variability may get smoothed out.

TABLE VIII
Typical Response Matrix for Subject E2 in Main Test II

SUBJECT E2
Stimulus    Responses        Mean    Standard Deviation
TA-1        1  1  1  1  1    1.00    0.00
TA-2        1  1  1  1  4    1.60    1.34
TA-3        4  4  4  4  4    4.00    0.00
TA-4        4  4  4  4  5    4.20    0.45
TA-5        5  4  6  6  6    5.40    0.89
TA-6        7  7  6  6  6    6.40    0.55
TA-7        7  7  7  7  7    7.00    0.00

TABLE IX
Typical Averaged Response for All English Subjects of Main Test II

Stimulus    Mean    Standard Deviation
TA1         1.67    0.80
TA2         2.33    0.90
TA3         3.63    0.74
TA4         4.07    0.76
TA5         5.37    0.88
TA6         5.90    0.80
TA7         6.68    0.47

TABLE X
Typical Averaged Response of All Tests for Subject E2

Stimulus            Mean    Standard Deviation
CLICK1, TA1, NA1    1.07    0.26
CLICK2, TA2, NA2    1.73    1.10
CLICK3, TA3, NA3    3.73    0.70
CLICK4, TA4, NA4    4.07    0.26
CLICK5, TA5, NA5    5.33    0.72
CLICK6, TA6, NA6    6.33    0.49
CLICK7, TA7, NA7    7.00    0.00

Figure 8. Mean Score and 2-s.d. Range for Each Stimulus of Main Test I, Averaged over All Subjects. (Abscissa: stimulus number; ordinate: response.)

Figure 9. Mean Score and 2-s.d. Range for Each Stimulus of Main Test II, Averaged over All Subjects. (Abscissa: stimulus number; ordinate: response.)

Figure 10. Mean Score and 2-s.d. Range for Each Stimulus of Main Test III, Averaged over All Subjects. (Abscissa: stimulus number; ordinate: response.)

5.2 Pseudo-normalization

Pseudo-normalization of the data was attempted since some subjects consistently did not utilize the extremes of the seven-point scale, while others consistently did not utilize the intermediate steps between 'regular' and the two extremes.
All the data were transformed by mapping the responses of the three response groups 1-2, 3-4-5, and 6-7 onto the pseudo-normalized responses of 1, 4, and 7, respectively. Table XI shows the transformed responses, the corresponding response mean, and s.d. for the Test II responses of Subject E2. The s.d.'s for the transformed scores of the constructed stimuli could be smaller or larger than the s.d.'s for the non-transformed scores, since the transformation used may reduce or increase the variability depending on whether the non-transformed scores varied within one of the three response groups or between the response groups. Such manipulation of the data did not, in the end, prove to be useful.

TABLE XI
Normalized Response Matrix for Subject E2 in Main Test II

SUBJECT E2
Stimulus    Responses        Mean    Standard Deviation
TA-1        1  1  1  1  1    1.00    0.00
TA-2        1  1  1  1  4    1.60    1.34
TA-3        4  4  4  4  4    4.00    0.00
TA-4        4  4  4  4  4    4.00    0.00
TA-5        4  4  7  7  7    5.80    1.64
TA-6        7  7  7  7  7    7.00    0.00
TA-7        7  7  7  7  7    7.00    0.00

5.3 Perception of Regularity

In order to estimate typical values of timing irregularities likely to be perceived consistently as regular by a particular subject, the perception data of each subject were plotted separately for each test, with the stimulus number plotted linearly on the abscissa and the response mean plotted linearly on the ordinate. A value for the stimulus number which would correspond to a response of four was then obtained by fourth-degree polynomial interpolation on these particular data points. Table XII displays the stimulus number values thus obtained for each subject and each test.

A number of factors influenced the choice of the degree of the polynomial used for interpolation purposes. The largest degree that could be justified was the sixth, since there were only seven data points and since there could have been, at most, four inflection points; and the smallest degree that could be selected was the second, since a straight-line fit was not desirable.
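The pseudo-normalization of Section 5.2 and the summary columns of Tables VIII and XI can be sketched in a few lines of Python. This is only an illustration, not the procedure actually run for the thesis: the function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, chosen because it reproduces the tabulated values for Subject E2.

```python
from statistics import mean, stdev

# Responses of Subject E2 in Main Test II, copied from Table VIII
# (five repetitions of each of the seven stimuli).
RESPONSES_E2_TEST_II = {
    "TA-1": [1, 1, 1, 1, 1],
    "TA-2": [1, 1, 1, 1, 4],
    "TA-3": [4, 4, 4, 4, 4],
    "TA-4": [4, 4, 4, 4, 5],
    "TA-5": [5, 4, 6, 6, 6],
    "TA-6": [7, 7, 6, 6, 6],
    "TA-7": [7, 7, 7, 7, 7],
}


def pseudo_normalize(response):
    """Map response groups 1-2, 3-4-5, and 6-7 onto 1, 4, and 7."""
    if response in (1, 2):
        return 1
    if response in (3, 4, 5):
        return 4
    return 7


def summarize(responses):
    """Return (mean, sample s.d.) rounded to two decimals.

    Assumption: the n - 1 form of the standard deviation is used.
    """
    return round(mean(responses), 2), round(stdev(responses), 2)


raw_summary = {
    stim: summarize(resp) for stim, resp in RESPONSES_E2_TEST_II.items()
}
normalized = {
    stim: [pseudo_normalize(r) for r in resp]
    for stim, resp in RESPONSES_E2_TEST_II.items()
}
normalized_summary = {stim: summarize(resp) for stim, resp in normalized.items()}
```

For TA-5, for example, the raw responses 5, 4, 6, 6, 6 become 4, 4, 7, 7, 7 after the mapping, which illustrates how the transformation can increase the s.d. when the raw scores straddle two response groups.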
In a low-degree fit, a large error on one response mean could be detrimental even to remote points which themselves may not be in error. In a high-degree fit, a close fit may be illusory. The number of inflection points allowed on a curve by a polynomial is at most equal to the degree of the polynomial minus two. Both fourth- and sixth-degree polynomial fits were calculated. The results obtained led to similar conclusions, and only those of the fourth-degree fit are reported here. The figures obtained from the fourth-degree interpolation were used in the analysis of variance (ANOVA) performed for this study and described later in this section.

The values of a for the stimulus that would correspond to a response of four were then determined by quadratic interpolation, using the same curve used to obtain the stimulus parameter values in the main study. This curve is displayed in Figure 7 and described in Section 4.41.

TABLE XII
Stimulus Number Corresponding to Perception of Regularity

Group I
Subject    Test I    Test II    Test III
E1         3.44      3.39       3.96
E2         3.53      3.51       3.91
E3         4.02      3.65       3.80
E4         3.89      3.03       3.72
E5         4.52      3.35       4.01
E6         6.04      4.92       4.73
E7         3.67      3.58       3.90
E8         3.59      3.41       3.73
E9         4.36      4.15       3.94
E10        3.90      3.33       4.49
E11        5.41      4.21       5.04
E12        3.84      3.27       3.34

Group II
Subject    Test I    Test II    Test III
F1         3.73      3.55       4.05
F2         3.56      3.68       3.79
F3         4.15      4.89       4.05
F4         3.91      3.87       3.49
F5         4.69      5.35       5.17
F6         4.15      3.33       4.21

Group III
Subject    Test I    Test II    Test III
J1         3.42      5.46       4.53
J2         4.68      3.56       4.27
J3         4.52      3.33       3.16
J4         4.52      4.37       4.14
J5         5.84      4.73       3.84
J6         3.42      3.64       4.43

A three-way analysis of variance (ANOVA) was performed, with native language of subject (L), test order (O), and test type (T) as the factors (test order was not completely balanced). In this design, sex of subject was nested within language and order, and was crossed with type: a repeated measures design in that all subjects respond to all three test types.
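The estimation step described above (fit a fourth-degree polynomial to the seven response means, then read off the stimulus number at which the fitted curve equals 4) can be sketched as follows. The thesis does not state which fitting routine was used; this sketch assumes an ordinary least-squares fit solved through the normal equations and a bisection search over the stimulus range, and all function names are hypothetical. Bisection also assumes that the fitted curve crosses the target response once within the range.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations: A[i][j] = sum of x^(i+j), b[i] = sum of y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def stimulus_for_response(means, target=4.0, degree=4):
    """Stimulus number (1-based) at which the fitted curve equals target."""
    centre = (len(means) + 1) / 2.0
    # Centre the stimulus numbers on zero to keep the fit well conditioned.
    xs = [i + 1 - centre for i in range(len(means))]
    coeffs = polyfit(xs, means, degree)
    lo, hi = xs[0], xs[-1]
    f_lo = evaluate(coeffs, lo) - target
    f_hi = evaluate(coeffs, hi) - target
    if f_lo * f_hi > 0:
        raise ValueError("fitted curve does not cross the target response")
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2.0
        f_mid = evaluate(coeffs, mid) - target
        if f_lo * f_mid > 0:
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0 + centre


# Response means of Subject E2 in Test II (Table VIII).
e2_test_ii = stimulus_for_response([1.00, 1.60, 4.00, 4.20, 5.40, 6.40, 7.00])
```

Because the original fitting procedure is not specified, the value returned for Subject E2 need not match the Table XII entry (3.51) exactly; the sketch is meant to show the shape of the computation only.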
Table XIII displays a summary of an ANOVA performed on a balanced subset of 18 subjects. It was decided not to utilize the unbalanced design, as it produced results which may have been an artifact of the design. The data displayed in Table XII were used as input for this ANOVA. They were the values estimated for each test as corresponding to each subject's perception of regularity.

This ANOVA indicates that none of the results reached statistical significance. Specifically, the language background of a subject did not have an effect on the results, although such an effect might have been expected. However, the absence of this effect may have been due to the large intra- and intersubject variabilities; the degree of irregularity required for a perception of regularity clearly differs from subject to subject. Similarly, neither the nature of the stimulus nor the test order had an effect on the results. This lack of effect may have been due to the use of nonsense speech stimuli rather than meaningful speech: the subjects may have been listening in the 'nonspeech mode' rather than in the 'speech mode' when listening to the speech stimuli.

TABLE XIII
Results of ANOVA Performed on 18-Subject Group

Source       Sum of Squares    DF    Mean Square    F-ratio
L            2.0594             2    1.0297         2.2641     n.s.
O            0.52218            2    0.26109        0.57407    n.s.
L x O        3.0572             4    0.76431        1.6805     n.s.
S(L,O)       4.0932             9    0.45480        1.7084     n.s.
T            0.085144           2    0.042572       0.15992    n.s.
L x T        0.50178            4    0.12544        0.47121    n.s.
O x T        0.61938            4    0.15484        0.58165    n.s.
L x O x T    2.4377             8    0.30472        1.1446     n.s.
Residual     4.7919            18    0.26622
Total        18.168            53

L = language group
O = test order
S = sex of subject
T = test type

5.4 Subject Evaluation Criteria

Each subject's responses were tabulated as in Tables VIII, X, and XI and examined individually. Since some of the subjects appeared to have performed better and more consistently than others, it was decided to try to determine criteria which would select the best subjects and to compare their results with the results of the entire subject population. This decision seemed particularly appropriate in the context of the present experiments because, in order to obtain the DL for irregularity, the intention was to determine the best performance possible.

Utilizing the responses to all five repetitions of each stimulus as well as the corresponding pseudo-normalized responses, the following criteria, based on the ability to discriminate between accelerating, neutral, and decelerating sound sequences, were established.

Criterion 1: For each of the three tests of a particular subject, the response means contain no reversals (i.e. as the stimulus number increases, so does the response mean), with the response mean to stimulus i never being less than the response mean to stimulus i-1, or more than the response mean to stimulus i+1.

Criterion 2: For each test, the s.d.'s of the response means were first averaged over all stimulus types in that test, then rank ordered using all 24 subjects. Any subject must have all three of these s.d.'s in the bottom half of the distribution.

Various other criteria were considered, but they were eventually discarded as unsuitable.

Four subjects passed both Criteria 1 and 2, namely subjects E1, E2, E4, and E5. These four subjects were then grouped together and their responses as a group were compared with the responses of the entire subject population as a group. The responses were 'averaged' (in the remainder of this paper, when 'averaged' is enclosed in single quotation marks, it means averaged over all subjects in a particular group) in the 4-subject group and in the 24-subject group separately, and the corresponding s.d.'s were computed.
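The two selection criteria of Section 5.4 can be expressed compactly in code. The sketch below is one possible reading, not the procedure used in the thesis: the function names are hypothetical, and 'bottom half of the distribution' is interpreted here as the half of the subjects with the lowest averaged s.d., which is an assumption.

```python
def no_reversals(response_means):
    """Criterion 1: response means never decrease as stimulus number rises."""
    return all(
        later >= earlier
        for earlier, later in zip(response_means, response_means[1:])
    )


def bottom_half(avg_sd_by_subject):
    """Subjects whose averaged s.d. ranks in the lower half for one test.

    Assumption: 'bottom half' means the len // 2 lowest values.
    """
    ranked = sorted(avg_sd_by_subject, key=avg_sd_by_subject.get)
    return set(ranked[: len(ranked) // 2])


def passes_both(subject, means_by_test, avg_sd_by_test):
    """Apply Criteria 1 and 2 to one subject across all tests.

    means_by_test:  {test: {subject: [seven response means]}}
    avg_sd_by_test: {test: {subject: s.d. averaged over stimulus types}}
    """
    criterion_1 = all(
        no_reversals(means_by_test[test][subject]) for test in means_by_test
    )
    criterion_2 = all(
        subject in bottom_half(avg_sd_by_test[test]) for test in avg_sd_by_test
    )
    return criterion_1 and criterion_2


# Response means of Subject E2 in Test II (Table VIII) satisfy Criterion 1.
e2_ok = no_reversals([1.00, 1.60, 4.00, 4.20, 5.40, 6.40, 7.00])
```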
The results of these computations are displayed in Figures 11 to 16. Figures 11 to 13 show for Test I, II, and III, respectively, the 'averaged' response means for both the 4-subject and the 24-subject group. Similarly, Figures 14 to 16 show for Test I, II, and III, respectively, the 'averaged' s.d.'s for both the 4-subject and the 24-subject group. Each graph utilizes a dashed line to represent the 4-subject group and a solid line to represent the 24-subject group.

In Figures 11 to 13, the curves representing the 'averaged' responses for the 4-subject group and for the 24-subject group appear to have basically the same shape. The main difference is that the curve for the 24-subject group covers a slightly smaller range of response means than that of the 4-subject group.

In Figures 14 to 16, the curves representing the 'averaged' s.d.'s for the 4-subject group and for the 24-subject group do differ in their basic shape. As would be expected, the 24-subject group shows greater variability than the 4-subject group. However, the main difference is that the 24-subject group tends not to exhibit a camel-back shape, whereas the 4-subject group does exhibit this shape. As already explained in Chapter 4, it is felt that the DL for irregularity should be in the areas of maximum variability.

Figure 11. Typical 'Averaged' Response for Test I.

Figure 12. Typical 'Averaged' Response for Test II.

Figure 13. Typical 'Averaged' Response for Test III. (Abscissa: stimulus number; ordinate: response.)

Figure 14. Typical 'Averaged' s.d.'s for Test I. (Abscissa: stimulus number; ordinate: s.d.)

Figure 16. Typical 'Averaged' s.d.'s for Test III.

5.5 Estimation of DL for Irregularity

In order to estimate typical values of timing irregularities likely to be perceived 50% of the time as accelerating or decelerating, two curves were plotted for each one of the better subjects.
The first curve was obtained by plotting the response means for each subject, as a function of the stimulus number, for each test on a separate graph. Both ordinates and abscissae were plotted linearly. Stimulus-number values were then obtained for response-mean values of 2.5, 4, and 5.5, using the same fourth-degree interpolation function as previously. The stimulus-number value corresponding to a response of 4 was chosen as corresponding to a stimulus a subject would perceive as regular, as explained in Section 5.3. The hypothesis was presented earlier that the responses 1, 4, and 7 constitute, by design, anchor points. The stimulus-number values corresponding to a response of 2.5 and 5.5 were chosen as the stimuli thought to be in the area of maximum uncertainty, i.e. the stimuli with the smallest degree of irregularity a subject would perceive 50% of the time as accelerating and decelerating, respectively.

The second curve was obtained by plotting the s.d.'s corresponding to the above response means for each subject and for each test, as a function of the stimulus number, using linear abscissa and ordinate scales. The stimulus-number value corresponding to a given s.d. in the area of minimum variability (that is, in the trough of some fitted curve) was then obtained by visual inspection. This value was assumed to correspond to the stimulus the subject would perceive as most regular. Two stimulus-number values corresponding to ordinates in the two areas of maximum variability (that is, at the peaks of some fitted curve) were then obtained by visual inspection. The lower stimulus-number value was taken to represent the stimulus corresponding to the DL for accelerated rhythm, the upper one corresponding to the DL for decelerated rhythm.

For all of the stimulus numbers just described, the values of a were then determined by using the same curve used to obtain the stimulus parameter values in the main study.
This curve is displayed in Figure 7 and described in Section 4.41.

When the values of a which correspond to maximum uncertainty of the subject are subtracted from the value of a which corresponds to a perceptually regular stimulus, the resulting (absolute) Δa values represent the best estimate of DL for accelerating (Δa₁) irregularity or for decelerating (Δa₂) irregularity. Changes in the degree of irregularity required for perception of irregularity were investigated in both directions, since there was no reason to expect symmetry around the neutral stimulus, especially as measured by a.

The values of Δa thus obtained do not appear significantly different for direction of change, nor for the two curves described above. Therefore, Δa₁ and Δa₂, the measures used here to estimate the DL for acceleration and deceleration, have values distributed over a fairly narrow range.

Since the second curve (described above) for subject E1 did not exhibit a camel-back shape, his data were not considered further; henceforth, only subjects E2, E4, and E5 are considered. The values of Δa obtained for the three remaining subjects, listed in Table XIV, can be seen to range from approximately 0.20 to 0.50, with both the mean and median Δa at approximately 0.30. Hence, for our purpose of obtaining a first estimate for progressive irregularity, the mean Δa of 0.30 will be satisfactory.

Since the a-values corresponding to the perception of regularity have a mean of a=0.20 for these subjects, the a-values corresponding to the most uncertain stimuli above and below regularity are at a=-0.10 and a=0.50, for accelerating and decelerating stimuli, respectively. The durations for the six successive syllables, which would correspond to the three a-values given above, are as follows: 143.0, 143.0, 142.7, 141.7, 138.9, and 132.7 ms for the accelerated case; 143.0, 143.0, 143.5, 145.6, 151.4, and 164.3 ms for the perceptually regular case; and 143.0, 143.1, 144.3, 149.6, 164.8, and 202.2 ms for the decelerated case.

TABLE XIV
Estimated Values of Δa for DL Using the Three Best Subjects

Based on Mean                    Based on s.d.
Accelerating    Decelerating     Accelerating    Decelerating
0.64            0.40             0.30            0.47
0.46            0.37             0.28            0.44
0.30            0.37             0.28            0.41
0.27            0.34             0.25            0.34
0.27            0.31             0.22            0.33
0.27            0.30             0.22            0.30
0.25            0.29             0.18            0.28
0.21            0.27             0.16            0.21
0.20            0.23             0.16            0.21

CHAPTER 6

DISCUSSION

The main purpose of this study was to attempt to determine two important factors pertaining to the perception of rhythm:

1. The degree of irregularity required for the listener to perceive a repetitive sequence of elements as regular.

2. The increment or decrement in the degree of irregularity required for the listener to detect that a repetitive sequence of elements is no longer perceptually regular.

In addition, the following questions were of interest:

a. Does the nature of the stimulus (speech versus nonspeech) affect the above two factors?

b. Does the language background of a subject (in particular, English, French, or Japanese) affect the above two factors?

Three listening tests including clicks or CV syllables were prepared and presented to native speakers of English, French, and Japanese. Stimuli consisted of sequences of six sounds (clicks or syllables) which underwent progressive timing alterations according to some predetermined function (D) utilized in all three tests.

Test I contained five repetitions of seven time-altered sequences of clicks as well as seven practice and three buffer items.

Test II contained five repetitions of seven time-altered sequences of [ta] syllables as well as seven practice and three buffer items.
Test III contained five repetitions of seven time-altered sequences of [na] syllables as well as seven practice and three buffer items.

Subjects recorded their answers to each stimulus on a seven-point scale, with 1 corresponding to 'speeded up', 4 corresponding to 'regular', and 7 corresponding to 'slowed down'. Subjects' responses were sorted into tables according to stimulus type and order of a particular repetition (see Tables VIII, X, and XI). The subjects who performed best were selected according to criteria based on consistency and low variability. Four subjects passed both of the criteria, and the responses of these subjects were considered separately.

The 'averaged' responses of the 4-subject group were compared to those of the entire subject population. For each test type (cf. Figures 11 to 13), curves representing response mean (judgment of regularity) versus stimulus number have the same basic shape, regardless of group size. However, when the variability of these judgments of regularity is considered (cf. Figures 14 to 16), the curves for the two groups do not appear to have the same shape. As expected, the larger group exhibits greater variability than the smaller group, but, more interestingly, the larger group does not tend to exhibit a camel-back shape, whereas the smaller group does. It may be assumed that the 'regular' stimulus should occur in the area of minimum variability and should correspond to a response mean of 4, and that the DL for irregularity should be determined by the interval between this last point and a point in the area of maximum variability (i.e. between the trough and the peak of the variability curve). This appears to be approximately so for the smaller group, but not for the other.
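As a rough check on the DL estimate referred to here, the Δa values of Table XIV (Chapter 5) can be pooled and summarised, and the a-values bracketing the perceptually regular stimulus derived from them. The sketch below is illustrative only: the values are those of the table as transcribed in this text, pooling all four columns is an assumption, and the variable names are hypothetical.

```python
from statistics import mean, median

# Delta-a estimates from Table XIV (three best subjects, three tests each),
# keyed by (curve the estimate is based on, direction of irregularity).
DELTA_A = {
    ("mean", "accelerating"): [0.64, 0.46, 0.30, 0.27, 0.27, 0.27, 0.25, 0.21, 0.20],
    ("mean", "decelerating"): [0.40, 0.37, 0.37, 0.34, 0.31, 0.30, 0.29, 0.27, 0.23],
    ("s.d.", "accelerating"): [0.30, 0.28, 0.28, 0.25, 0.22, 0.22, 0.18, 0.16, 0.16],
    ("s.d.", "decelerating"): [0.47, 0.44, 0.41, 0.34, 0.33, 0.30, 0.28, 0.21, 0.21],
}

# Assumption: all 36 estimates are pooled into a single summary.
all_values = [value for column in DELTA_A.values() for value in column]
overall_mean = mean(all_values)
overall_median = median(all_values)

# a-values bracketing the perceptually regular stimulus, using the rounded
# figures quoted in Section 5.5 (regular at a = 0.20, DL of 0.30).
A_REGULAR = 0.20
DL = 0.30
a_accelerating = round(A_REGULAR - DL, 2)
a_decelerating = round(A_REGULAR + DL, 2)
```

With the values as transcribed, the pooled mean rounds to 0.30 and the pooled median is 0.28, in line with the 'approximately 0.30' quoted in Chapter 5; the bracketing a-values come out at -0.10 and 0.50.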
Before discussing the DL for irregularity, it is necessary to comment on the perception of regularity and to examine the results of the ANOVA performed on the data which correspond to the subjects' perception of regularity. Table XII displays, for the 24 subjects and for the three tests, the predicted stimulus numbers which correspond to timing irregularities likely to be perceived as regular by each subject; these cover a large, quite evenly distributed range. Such a large range in the stimulus-number values corresponding to a perception of regularity is a possible reason for the absence of a camel-back shape for the 24-subject group in Figures 14 to 16: acoustic isochrony is not identical to perceptual isochrony and, in addition, there is large intersubject variability in the perception of regularity. Given the large range of values corresponding to 'regular', there may not be any single best value of a at which regularity is perceived.

The ANOVA performed for the balanced, 18-subject group produced no significant results for either the main effects or the interactions (cf. Table XIII). There are several possible reasons for the absence of significant results.

The first possible reason is that the large intra- and intersubject variabilities may mask the significance of any one factor in an ANOVA. Perhaps a future experiment could attempt to use more homogeneous and more strictly selected subgroups.

A second possible reason for not obtaining any significant differences in this ANOVA is that the level of linguistic knowledge differed between language groups. For instance, when the experimenter was recruiting subjects, there was a large subject population from which to recruit English subjects; a smaller population from which to recruit French subjects; and an even smaller population from which to recruit Japanese subjects.
The English subjects were recruited partly for their linguistic knowledge and, therefore, greater linguistic awareness may be attributed to the English group than to the other two language groups. Analysis of only the English group's data gives results in which the factor 'T' (test type) tends toward significance. This tendency may be due to the greater linguistic awareness of the English group. A future experiment could try to include a larger population base from which to recruit subjects; in this way, language groups could be matched for linguistic knowledge.

A third potential reason for no factor reaching significance in the ANOVA is familiarity with the materials and with the experimental situation: the English subjects were all involved to varying degrees with an audiology and speech science programme and most had experience with materials and situations similar to those of the experiment, whereas few of the French and probably none of the Japanese subjects had had this same experience. Any future experiment should ensure a more homogeneous subject population in this respect.

The perception of regularity may also be compared to the production data of Oller (1973), referred to in Chapter 4. Differences in the syllable duration ratios for the Oller data and those of the current study were apparent: the median ratio of the last to next-to-last syllable of Oller's production data was 1.245, whereas the same ratio corresponding to the perception of regularity in the current study was 1.112 for all subjects and 1.085 for the best three subjects. Together, the two sets of data indicate that acoustic isochrony is not present in the speech signal and that acoustic isochrony is not equal to perceptual isochrony. However, the amount of departure from acoustic isochrony which resulted in perceptual isochrony in the present experiments does not agree closely with the corresponding amount found in the production data of Oller.
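The last to next-to-last syllable ratio quoted for the best three subjects can be recovered directly from the syllable durations listed at the end of Chapter 5. The short sketch below does this for the three cases given there; the dictionary keys and the helper name are hypothetical labels introduced for illustration.

```python
# Durations (ms) of the six successive syllables, as listed in Chapter 5 for
# a = -0.10 (accelerated), a = 0.20 (perceptually regular), a = 0.50 (decelerated).
DURATIONS_MS = {
    "accelerated": [143.0, 143.0, 142.7, 141.7, 138.9, 132.7],
    "regular": [143.0, 143.0, 143.5, 145.6, 151.4, 164.3],
    "decelerated": [143.0, 143.1, 144.3, 149.6, 164.8, 202.2],
}


def final_ratio(durations):
    """Ratio of the last syllable duration to the next-to-last one."""
    return durations[-1] / durations[-2]


ratios = {
    label: round(final_ratio(durations), 3)
    for label, durations in DURATIONS_MS.items()
}
```

For the perceptually regular case this gives 164.3 / 151.4, or about 1.085, the figure quoted above for the best three subjects and noticeably smaller than the median production ratio of 1.245 reported by Oller.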
Several variables in the experiments undertaken for the present study may have contributed to these disagreements. The first variable is the listening mode of the subjects: the subjects may not have been listening in the speech mode and, hence, may not have judged these stimuli as they would have if they had been listening in the speech mode. There is no adequate way to ensure that subjects listen in a particular mode; however, the use of meaningful speech is more likely to achieve this objective than is the use of only clicks and nonsense speech stimuli. The second variable is the linguistic naivete of the subjects. For example, some of the less linguistically naive subjects were aware of the phenomenon of prepausal lengthening in speech and may have consciously or subconsciously tried to take this phenomenon into account during the test situation. A future experiment could ensure that one subgroup knew and understood such phenomena, while ensuring that another subgroup was linguistically naive, and then compare the results of the two subgroups.

After considering the degree of irregularity which would correspond to a subject's perception of regularity, the estimation of a DL for progressive irregularity may now be considered. The characterization of such a DL is very complex: various parameters can be used to estimate this DL. When preparing the stimuli, using the function (D), all but one parameter were held constant and only the parameter a, which measures the degree of acceleration or of deceleration, was manipulated. Therefore, this parameter is used to estimate the DL, or, more specifically, the least noticeable increment or decrement in the degree of irregularity.
Since there is a tendency for speakers to lengthen syllables as their utterance progresses and for listeners to impose rhythmic structure on sequences of speech sounds, and since there is no reason to expect symmetry, especially as measured by a, it was thought necessary to measure separately the DL for acceleration and that for deceleration.

As indicated in Chapter 5 (cf. Table XIV), the estimated DL's for acceleration and deceleration ranged approximately from Δa=0.20 to Δa=0.50, with the median being approximately Δa=0.30. The DL's for acceleration were not significantly different from those for deceleration. However, most subjects commented that the accelerating stimuli were much more difficult to detect than the decelerating stimuli. This observation could be due to the fact that the degree of irregularity in the most accelerated stimulus was not as distant (as measured with a) from the degree of irregularity in the neutral stimulus as was the degree of irregularity in the most decelerated stimulus.

The results of this experiment are to be used in a future experiment currently being designed. The purpose will be to obtain a DL for accelerating and decelerating irregularity, using the figures obtained here which correspond to the perception of regularity and to the estimated DL's. The best subjects, namely E2, E4, and E5, will also be subjects in the future experiment, should they be available.

The discovery of the DL for irregularity may have important practical implications, since rhythm is so important in speech intelligibility.

It is obvious that temporal factors play a vital part in speech. Speech whose temporal pattern is grossly abnormal may be almost incomprehensible, as is often true of the speech of the deaf.
(Huggins, 1968)

Rhythm may be a more important factor in aural rehabilitation than generally acknowledged, for instance, in rehabilitation with cochlear implants, speech compressors, and deaf speech, as well as in strategies used in foreign language teaching. In 1952, Guberina (cited in Northern and Downs, 1978), for example, developed the Verbotonal Method, designed to improve foreign language learning by emphasizing the spoken rhythm of the foreign language. He later applied this method to the teaching of speech to the deaf population, still emphasizing the rhythm of the spoken language.

One of the main conclusions to be drawn from the current experiments is that acoustic isochrony does not result in perceptual isochrony, whether the stimulus consists of speech or nonspeech sounds. The difference between physical and perceptual isochrony, in speech, may be due to two tendencies discussed in Chapters 2 and 3, namely the tendency, by speakers, to lengthen syllables as their utterance progresses, and the tendency, by listeners, to impose a rhythmic structure on speech sequences. Nevertheless, these tendencies do not explain such a phenomenon in nonspeech sounds.

Since there were no significant differences between the different types of stimuli, perhaps one may assume the existence of an underlying regular rhythm which governs the listener's perception of regularity, at least at the short-term level. However, this assumption should be further assessed in future experiments, particularly since a review of the literature indicates a (claimed) tendency toward isochrony in the languages of the world and concludes that more of a timing alteration may be necessary for detection of an irregularity in speech stimuli than in click stimuli (Lehiste, 1973; 1979).
As neither the degree of acoustic irregularity necessary for a subject to perceive regularity nor the estimated DL for timing irregularity is greater for speech stimuli than for nonspeech stimuli, these results do not lend support to the hypothesis that the listener's perception of rhythm is altered by the expectancy of isochrony in speech. If correct, this hypothesis would imply that the acoustic signal is transmitted and perceived as only quasi-regular by the peripheral nervous system, but that at some higher or more central level of processing, due to the nature of the stimuli (namely speech stimuli), the signal would be interpreted as regular, whereas this would not be expected to be the case for nonspeech stimuli.

Summary

The results of this investigation provided information regarding the following:

1. The degree of irregularity; when measured by the parameter a, there is a relatively large range of a-values perceived as 'regular'; there is also great intersubject variability. For all subjects, the perceptually regular stimulus was decelerated.

2. An estimate of DL, measured by Δa, for accelerating and decelerating irregularity; the required just noticeable increment or decrement in the degree of irregularity was estimated utilizing only the results of the best three subjects. There is a relatively narrow range of Δa values for this estimate.

Due to the intra- and intersubject variabilities, it was not possible to ascertain whether the language background of a subject or the nature of the sound stimulus affects the perception of rhythm. Similarly, it was not possible to ascertain whether the direction of irregularity, increment versus decrement, affects the DL for irregularity.

The present research is seen as a first step in obtaining the DL for progressive temporal irregularity at the short-term level in speech and nonspeech auditory events.
A continuation of this experiment has been proposed in order to refine the results of the present experiments, as well as to control possible confounding variables. Previous studies have demonstrated that rhythm at the short-term level may have important implications for speech perception; the present study is a step toward discovering more about the perception of rhythm.

BIBLIOGRAPHY

Abercrombie, D. (1967) Elements of General Phonetics. Chicago: Aldine Publishing Co.

Allen, G.D. (1968) Experiments on the rhythm of English speech. Working Papers in Phonetics No. 10, U.C.L.A., 42.

Allen, G.D. (1972a) The location of rhythmic stress beats in English: an experimental study I. Language and Speech, 15, 72-100.

Allen, G.D. (1972b) The location of rhythmic stress beats in English: an experimental study II. Language and Speech, 16, 179-195.

Allen, G.D. (1973) Segmental timing control in speech production. J. of Phonetics, 1, 219-237.

Allen, G.D. (1975) Speech rhythm: its relation to performance universals and articulatory timing. J. of Phonetics, 3, 75-86.

Barry, W.J. (1983) Click Placement and Units of Perceptual Processing. Phonetica, 40, 247-268.

Benguerel, A.P. (1970) Some physiological aspects of stress in French. Phonetics Laboratory Natural Language Studies No. 4. The University of Michigan, Ann Arbor, Michigan.

Blakely, W. (1933) The discrimination of short empty temporal interval. Ph.D. dissertation, University of Illinois Library.

Classe, A. (1939) The rhythm of English prose. Oxford: Basil Blackwell and Mott.

Crystal, D. (1969) Prosodic systems and intonation in English. Cambridge: Cambridge University Press.

Delattre, P. (1966) A comparison of syllable length conditioning among languages. IRAL, IV/3.

Fowler, C.A. (1979) "Perceptual centers" in speech production and perception. Perception & Psychophysics, 25, 375-388.

Fraisse, P. (1963) The psychology of time. New York: Harper and Row.

Fraisse, P. (1982) Rhythm and tempo. In: The psychology of music. (D. Deutsch, Editor). New York: Academic Press, 149-180.

Fry, D.B. (1955) Duration and intensity as physical correlates of linguistic stress. J. Acoust. Soc. Amer., 27(4), 765-768.

Gerber, S.E. (1974) Introductory Hearing Science: physical and psychological concepts. Philadelphia: W.B. Saunders Co.

Hibi, S. (1983) Rhythm perception in repetitive sound sequence. J. Acoust. Soc. Jpn. (E), 4, 2, 83-95.

Hoequist, C. Jr. (1983a) Durational correlates of linguistic rhythm categories. Phonetica, 40, 19-31.

Hoequist, C. Jr. (1983b) Syllable duration in stress-, syllable-, and mora-timed languages. Phonetica, 40, 203-237.

Huggins, A.W.F. (1968) How accurately must a speaker time his articulations? IEEE Transactions on Audio and Electroacoustics, AU-16, No. 1, 112-117.

Huggins, A.W.F. (1972) On the perception of temporal phenomena in speech. J. Acoust. Soc. Amer., 51, 1279-90.

Jakobson, R., C.G.M. Fant, and M. Halle (1963) Preliminaries to speech analysis: the distinctive features and their correlates. Cambridge, Mass.: M.I.T. Press.

Kastenholz, J. (1922) Untersuchungen zur Psychologie der Zeitauffassung. Arch. ges. Psychol., 43, 171-228.

Klatt, D.H. (1975) Vowel lengthening is syntactically determined in a connected discourse. J. of Phonetics, 3, 129-140.

Klatt, D.H. (1976) Linguistic uses of segmental duration in English: acoustic and perceptual evidence. J. Acoust. Soc. Amer., 59, 1208-1221.

Kozhevnikov, V.A. and L.A. Chistovich. (1966) Rech artikulyatsya i vospriyatie. Moscow-Leningrad, 1965. Translated as Speech: Articulation and Perception. Springfield, Va.: Joint Publications Research Service, U.S. Dept. of Commerce.

Ladefoged, P. (1975) A course in phonetics. New York: Harcourt Brace Jovanovich, Inc.

Lea, W.A. (1974) Prosodic aids to speech recognition: V. A summary of results to date. Sperry Univac, St. Paul, Minn., Defense Systems Div., No. AD-A003-931.

Lehiste, I. (1970) Suprasegmentals. Cambridge, Mass.: M.I.T. Press.
Lehiste, I. (1971) Temporal organization of spoken language. In: Form and substance, (L.L. Hammerich, R. Jakobson, and E. Zwirner, Editors). Copenhagen: Akademisk Forlag, 159-169.
Lehiste, I. (1973) Rhythmic units and syntactic units in production and perception. J. Acoust. Soc. Amer., 54, 1228-1234.
Lehiste, I. (1977) Isochrony reconsidered. J. of Phonetics, 5, 253-263.
Lehiste, I. (1979) The perception of duration within sequences of four intervals. J. of Phonetics, 7, 313-316.
Martin, J.G. (1972) Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psych. Rev., 79, 487-509.
Miller, G.A. (1967) The psychology of communication: seven essays. New York: Basic Books, Inc., 14-44.
Miller, M. (1984) On the perception of rhythm. J. Phon., 12, 75-83.
Miyake, I. (1902) Researches on rhythmic activity. Studies from the Yale Psychological Laboratory, 10, 1-48.
Morton, J., S. Marcus, and C. Frankish. (1976) Perceptual centers (P-centers). Psychological Review, 83, 405-408.
Nooteboom, S.B. (1973) The perceptual reality of some prosodic durations. J. Phon., 1, 25-45.
Northern, J.L. and Downs, M.P. (1978) Hearing in children, 2nd Edition. Baltimore: Williams & Wilkins Co.
Ohala, J.J. (1973) The temporal regulation of speech. Proceedings of the Symposium on Auditory Analysis and Perception of Speech, Leningrad (H. Fujisaki, Editor), 516-538.
Oller, D.K. (1973) The effect of position in utterance on speech segment duration in English. J. Acoust. Soc. Amer., 54, 1235-1247.
Pike, K. (1945) The intonation of American English. Ann Arbor, Mich.: University of Michigan Press.
Shen, Y. and Peterson, G.G. (1962) Isochronism in English. Stud. in Ling. Occasional Papers, University of Buffalo, 9, 1-36.
Stone, M. (1981) Evidence for a rhythm pattern in speech production: observations of jaw movement. J. of Phonetics, 9, 109-120.
Vierordt, K. (1868) Der Zeitsinn nach Versuchen. Tübingen: H. Laupp.
Woodrow, H. (1951) Time perception. In: Handbook of experimental psychology (S.S. Stevens, Editor). New York: Wiley, 1224-1236.

APPENDIX A

SUBJECT INFORMATION FORM

Identities of all subjects will remain confidential.

Name: ________________  Phone: ________________
Age: ______  Sex: ______
Native Language: ________________
Additional languages spoken: ________________  Fluency: ________
                             ________________  Fluency: ________
                             ________________  Fluency: ________
Have you completed any linguistics courses? ______
If yes, approximately how many? ______
Do you have any hearing problems? ______
If yes, describe problem: ________________

APPENDIX B

INSTRUCTIONS

You are going to hear sound stimuli, each one of which consists of a sequence of six clicks, or of six repeated syllables. I would like you to rate their degree of regularity or of irregularity. If the sequence of clicks (or syllables) sounds speeding up, check the left column; if it sounds regular, check the center column; and if it sounds slowing down, check the right column. If you feel it is in-between these categories, feel free to use the intermediate columns. For example, if it is just slightly speeding up, you may want to check one of the columns between regular and speeded up. Please mark only one column for each item, but give an answer for each item.

In these tests, there is no right or wrong answer; your subjective evaluation of the items is what matters, and it is all I am interested in.

Before starting to answer, you will be able to listen to the first seven items, including some examples of the most irregular stimuli you will hear. I will then stop the tape to see if you have any questions. When you are ready, the tape will be restarted at item one and you should then start recording your answers.
