Redundancy of temporal envelope information across frequency bands in normal hearing individuals

by

Adrian Mark Lister

B.Eng., The University of Victoria, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Audiology and Speech Sciences)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2009

© Adrian Mark Lister 2009

Abstract

Much research has focused on the role of different spectral aspects of the speech signal in speech perception. To date there has been little quantification of the temporal envelope of the speech signal for speech perception. The goal of this study was to examine the role of temporal envelope information in speech recognition by quantifying the shared temporal envelope information extracted from neighbouring high frequency 1/3-octave bands. The Envelope Difference Index (EDI) was used in the current study to calculate similarity of temporal envelope information between adjacent bands. There was significant shared information, but also novel information. In general, for listeners identifying nonsense words from limited temporal information, speech recognition performance increased as the gap between two bands increased. Furthermore, the amount of shared information between 1/3-octave bands decreased as a function of gap size. The acoustic measure of redundancy between pairs of bands did not predict intelligibility. Based on these results, models of speech intelligibility must account for this redundancy rather than assuming that bands can be analyzed and weighted individually.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

1 Introduction
1.1 Spectral and temporal features of speech
1.2 Definition of the temporal domain
1.3 Behavioural evidence
1.4 Neural evidence
1.5 Minimum required acoustic information for successful speech understanding
1.6 Quantifying redundancy and synergy in both acoustic and perceptual domains
1.7 Limitation of acoustic methods
1.8 Experiment development
1.8.1 Specific aims

2 Methods
2.1 Participants
2.2 Stimuli
2.3 Procedure
2.4 Analysis

3 Results
3.1 Treatment of data
3.2 Results
3.2.1 Performance on token identification
3.2.2 Performance for feature-based scoring
3.2.3 Redundancy
3.2.4 Predicted performance based on acoustic information

4 Discussion
4.1 Performance
4.1.1 Overall performance on token identification
4.1.2 Overall performance based on feature scoring
4.2 Redundancy
4.2.1 Voice
4.2.2 Manner
4.2.3 Place
4.2.4 Summary of redundancy scores
4.3 Acoustics
4.4 Limitations of the study
4.5 Directions for further research
4.6 Conclusions

References

Appendices A–G

List of Tables

3.1 95% Scheffé confidence intervals for place of articulation contrasts
3.2 95% Obtained and critical t values for identification of significant redundancy by place of articulation
3.3 Accuracy of performance classification pre logistic regression
3.4 Accuracy of data classification post logistic regression

List of Figures

3.1 Proportion correct scores for token identification by gap size
3.2 Proportion correct scores for feature identification by gap size
3.3 Redundancy values for voiced and voiceless stimuli as a function of gap size
3.4 Redundancy values for manner of articulation by gap size
3.5 Redundancy values for place of articulation as a function of gap size
4.1 Scatterplot of performance data as a function of EDI values

Acknowledgements

A thank you to my supervisor, Dr. Lorienne Jenstad, for her wisdom and patience during the last few years, as well as to my committee, Drs. Navid Shahnaz and Valter Ciocca, for their expertise and insightful comments surrounding this thesis. I am also fortunate to have had the support of friends and family throughout my Masters programme, and I am grateful to NSERC for funding.
1  Chapter 1  Introduction 1.1  Spectral and temporal features of speech  Recognition of speech involves the translation of acoustic features of the speech signals into meaningful perceptual units. This process occurs in real time in spite of the multiple and variable cues to identify each speech segment. The process by which the auditory system decodes an acoustic signal into meaningful speech is still, after decades of research, not fully understood; in part, this is because acoustic cues overlap with each other and carry redundant information. It is therefore difficult to isolate individual contributions of specific acoustic features to meaningful perception. The acoustic features within the speech signal can be classified by their spectral and temporal components. Spectral components, or the frequency spectrum, include the static spectral components of speech, such as the frequencies of the formants in a vowel or the multi-harmonic nature of voiced speech. The contour of the changes in amplitude, however, refer to the temporal characteristics of the speech signal. When normal hearing listeners have redundant cues available; that is, both temporal and spectral, they place a greater perceptual weight on spectral cues (Hedrick & Jesteadt, 1996; Hedrick, 1995). Listeners with hearing loss, in contrast, have been 2  1.2. Definition of the temporal domain shown to place more perceptual reliance on temporal components for speech recognition than do listeners with normal hearing (Hedrick & Jesteadt, 1996; Hedrick, 1995). Current implementation of hearing aid processing necessarily causes alteration of the slow changes in amplitude (i.e., the temporal envelope) because the provision of audibility in hearing aid processing primarily uses wide dynamic range compression (WDRC), which adversely affects the temporal envelope of the speech signal (Jenstad & Souza 2005, Verschuure et al., 1995). The goal of WDRC processing is to ensure that the low-level components of speech are audible; that is, amplified above the threshold of the hearing-impaired listener. This process effectively compresses the dynamic range of normal hearing individuals into the limited dynamic range of a hearing impaired individual, with the necessary consequence of altering the temporal envelope. Because of the necessary trade-off between audibility and distortion of the signal, it is important to quantify how listeners use the information contained within the temporal domain. This will allow us to develop hearing aid processing that can better preserve or enhance the cues important for speech recognition and determine how much compromise to the temporal domain can be made for the sake of achieving maximum audibility.  1.2  Definition of the temporal domain  The temporal dimension of the speech signal has been divided by Rosen (1992) into three features: envelope, periodicity, and fine structure, based  3  1.2. Definition of the temporal domain on the dominant temporal fluctuation rates of each feature. Rosen (1992) defines the slow fluctuations in amplitude rates between 2 and 50 Hz as the temporal envelope of the signal. He posits that the envelope’s low frequency variation conveys four major types of linguistic information: segmental cues to manner of articulation, segmental cues to voicing, segmental cues to vowel identity, and prosodic cues. Periodicity, the second temporal feature described by Rosen (1992), relates to the distinction between periodic / aperiodic stimuli as well as the rate of periodic stimulation. 
Periodic speech stimuli have fluctuations of 50500 Hz whereas aperiodic stimuli typically fluctuate up to 5-10 kHz. Rosen posits that periodicity assists listeners in segmental information about voicing and manner. Periodicity also supplies prosodic information that relates intonation and stress. In particular, since the fundamental frequency of a speech signal reflects vocal fold vibration and is the acoustic correlate of voice pitch, it plays an important role in accenting syllables; for example, the verbal versus nominal function of the word rebel vs. rebel. Lastly, Rosen (1992) describes the fine structure of the speech signal or the variations of wave shape within single periods of periodic sounds. This fast temporal feature is suggested to be important to a listener for identifying place of articulation and vowel quality information. In particular, stop gaps are thought to be differentiated based on the fine structure of their initial release burst; for example, bun, done, gun. Temporal fine structure is also thought to play a role in voicing and manner distinctions; for example, all voiced phonemes have a low frequency (<1 kHz) component whereas voiceless sounds have no such component. 4  1.3. Behavioural evidence Support for Rosen’s emphasis on the temporal aspects of speech comes from both behavioural and neural evidence showing that temporal features are processed and used in the auditory system.  1.3  Behavioural evidence  The importance of the temporal aspect of the speech signal as an auditory process is initially derived from the auditory phenomenon of the missing fundamental. The missing fundamental is the ability of listeners to match a pitch of a complex tone to a lower frequency which is not present within the complex tone. For example, if a high-frequency tone is interrupted periodically, then the listener will perceive a pitch corresponding to the frequency whose period is equal to the interruption rate. If the high-frequency tone is interrupted every 25 ms, then subjects will match the pitch of the interrupted high-frequency tone to that of a 400 Hz tone. Similarly, if subjects are presented with an increasing harmonic series without the fundamental, they match the tone to the fundamental frequency even though it is not physically present within the stimuli signal. These examples are evidence which indicates that the auditory system is responding to the temporal pattern of the complex periodic sound’s waveform not only the frequency components. These examples of behavioural responses to tonal stimuli indicate that the human auditory system is encoding temporal aspects of auditory signals; however, speech recognition relies on much more complex signals than pure tones. The importance of various components of the temporal envelope for speech intelligibility has  5  1.3. Behavioural evidence been demonstrated for normal hearing listeners by, for example, Apoux and Bacon, (2004), Greenberg et al., (1998); Shannon et al., (1995); and Xu et al., (2005). Research performed by Greenberg et al. (1998) examined the minimum temporal and spectral information normal hearing listeners needed for successful speech recognition. They found that listeners were able maintain high speech intelligibility with a signal degraded to provide only sparse spectral information. The limited information listeners did have was 4 1/3-octave wide frequency bands, each separated by an octave, with the temporal envelope preserved in each band. 
Because listeners were able to achieve such high intelligibility with these limited cues, this demonstrates that the temporal envelope carries useable information. In the second part of the same study, subsequent degradation of different aspects of the temporal envelope showed that listeners rely heavily on the amplitude and phase components of the 3-8 Hz modulation spectrum for speech intelligibility. When normal hearing listeners have redundant cues available; that is, both temporal and spectral, they place a greater perceptual weight on spectral cues (Hedrick & Jesteadt, 1996; Hedrick, 1995). In light of this reliance on the spectral components of the speech signal, research has historically placed greater emphasis on understanding listeners’ use of spectral cues in speech, and so the relative importance of temporal envelope information is not nearly as well understood. Although normal hearing listeners rely more on spectral information, when forced to rely on temporal aspects of the signal they are able to extract meaningful cues, as was shown by Greenberg et al. (1998). This suggests that normal hearing listeners can be trained to switch their perceptual attention between acoustic cues. Lorenzi et al. 6  1.4. Neural evidence (2006) confirmed this hypothesis by examining normal-hearing and hearingimpaired listeners’ use of the temporal fine structure (the fast temporal cues) of a speech signal in modulated background noise. One aspect of their study examined how normal hearing listeners’ and hearing-impaired listeners’ ability to recognize speech consisting of temporal fine structure alone improved with practice. Normal hearing subjects’ speech intelligibility performance improved significantly as the number of trials increased. Hearing-impaired individuals, in contrast, did not improve their speech intelligibility scores with more trials. This suggests that hearing-impaired individuals do not have access to acoustic cues provided by temporal fine structure. Had their scores improved, albeit at a slower rate than normal hearing listeners, it would suggest that the hearing impaired individuals were able to use that acoustic feature, though perhaps degraded. Lorenzi and colleagues posit that neural damage reduces a listener’s ability to track changes in the temporal fine structure within an auditory filter. This demonstrates that there is a need to further quantify which aspects of temporal information in speech are useable by hearing impaired listeners.  1.4  Neural evidence  In addition to behavioural evidence of the importance of temporal information, it has been demonstrated that the auditory system encodes temporal cues from the cochlea to the cortex (Joris et al., 2004). The firing patterns of auditory nerve fibres indicate how information is coded and transmitted in the auditory system. A neural example of these temporal aspects within  7  1.4. Neural evidence the auditory system is phase locking. Phase locking refers to a clear and fixed relationship between some aspect of the response and the phase (or time) aspect of the stimulus. Specifically, the discharge rates of auditory nerve fibres are time-locked to tonal stimuli up to 4000 – 5000 Hz. Furthermore, neural discharge rates cluster at a number of relatively discrete latencies, with fewer spikes at successively higher multiples of the period. The locations of the peaks closely correspond to integral multiples of the period or each stimulus frequency up to 1100 Hz. 
At higher frequencies, the period of the peaks become as low as 625 µs. This minimum period reflects the fibre’s refractory period. Together these results show that pure tones are encoded in the neural system temporally. With more complex speech signals the amplitude modulations are coded by the discharge rate of a neuron. More intense amplitudes are encoded with faster discharge rates. The synchrony between the modulation of the input signal and a neuron’s firing rate is characterized by the modulation transfer function (MTF) (Joris et al., 2004). The modulation transfer functinon of the human auditory system is the neural response modulation relative to input modulation as a function of modulation frequency. By examining the MTF along ascending levels of the auditory system, the ability of the auditory system to preserve the fidelity of a signal’s temporal envelope can be quantified. It is a reasonable assumption that if temporal synchrony is not preserved at the neural level, then temporal information is not accessible for decoding speech. For example, the output of the cochlea is relayed along the afferent nerve 8  1.5. Minimum required acoustic information for successful speech understanding fibers of the auditory nerve as input to the central nervous system. At this level the MTF is close to unity as inner hair cells stimulated at their characteristic frequency show responses at both the modulation and carrier frequencies of the input signal (Joris, et al., 2004). This is distinctively different from the frequency spectrum of the input signal, which exhibits spectral peaks at the carrier frequency and at the carrier frequency plus and minus the modulation frequency. This indicates that the auditory system is not simply performing a Fourier transform of the incoming signal; rather, the auditory system specifically encodes the modulation rate of a signal, that is a temporal feature (Joris et al. 2004). Similar results are found throughout the auditory system.  1.5  Minimum required acoustic information for successful speech understanding  Because the above evidence shows that amplitude cues are encoded and used throughout the auditory system, but normal hearing listeners place greater weight on the spectral cues when they are available (Hedrick & Jesteadt, 1996; Hedrick, 1995), there are still questions about how listeners use the spectral versus temporal acoustic information in speech. Previous research (e.g., Crouzet & Ainsworth, 2001; Greenberg et al., 1998; Shannon 1995, 1998; Warren 1995, 2002) has tried to isolate acoustic cues by distorting one or more dimensions of speech. Such studies have found that listeners with normal hearing were able to achieve high speech intelligibility scores for a signal with either spectrally or temporally degraded information, demon9  1.5. Minimum required acoustic information for successful speech understanding strating that listeners can use temporal or spectral cues to accurately decode speech. The information contained within either dimension of the speech signal is sufficient for recognition, indicating a high amount of redundancy within the information carried by different acoustic cues of speech. This redundancy indicates that multiple acoustic cues map to the same perceptual cue (i.e., manner, place, voicing). Such redundancy is evident not just between the spectral and temporal domains but also within each domain. For example, Steeneken & Houtgast (1999) found that octave bands of acoustic information were mutually dependent. 
They demonstrated this by presenting to listeners stimuli that varied by the number of octave-wide frequency bands, their relative spacing, and their signal to noise ratio. Their results identified the mutual dependence of octave bands on predictions of speech intelligibility, suggesting that adjacent low (125 – 250 Hz) or high (4000 – 8000 Hz) frequency bands exhibit high redundancy whereas the mid frequency bands (from 500 to 4000 Hz) show low redundancy, indicating that these bands carry unique information. Redundancy across acoustic cues makes it difficult to isolate the unique information carried within each cue. This is particularly problematic for experimental paradigms and models that assume statistical independence of acoustic cues. Examples of widely-used models for speech intelligibility predictions are the Speech Transmission Index (STI) and the Speech Intelligibility Index (SII). Both models assume statistical independence of cues and will be discussed in detail later. The opposite violation of the assumption of independence is synergy. Synergy refers to the contribution of information to speech intelligibility 10  1.6. Quantifying redundancy and synergy in both acoustic and perceptual domains which exceeds predictions based on cue independence. Within the spectral domain, research by Warren and colleagues (1995, 1999) as well as M¨ usch and Buus (2001a, 2001b, 2004) examined the contribution of distant spectral bands on speech intelligibility. As bands moved farther apart, having less spectral similarity, speech intelligibility scores increased. The intelligibility scores with two distantly-spaced bands presented in combination exceeded predicted scores from the same two independent bands presented in isolation, indicating that the distant bands were interacting synergistically. A similar pattern of synergy and redundancy is expected within the temporal domain. Acoustic analysis of the modulation spectrum of temporal envelopes in different bands suggests that temporal envelopes extracted from distant frequency regions are only partially correlated, whereas adjacent bands are highly correlated (Crouzet & Ainsworth, 2001). Because of the high acoustic correlation, in order to further quantify the importance of specific temporal cues and the perceptual weights that listeners give to individual cues, we must first quantify the amount of redundant and synergistic information contained across the temporal cues under study.  1.6  Quantifying redundancy and synergy in both acoustic and perceptual domains  To quantify mutual dependence (i.e., redundancy or synergy) behaviourally, an estimate of intelligibility for 2 bands presented simultaneously can be derived based on the probability of error for each of the two bands presented individually (Warren 1995) (i.e., p1 and p2 ). Actual obtained scores can 11  1.7. Limitation of acoustic methods be compared to estimated scores to assess the amount of synergy or redundancy in those bands. If obtained scores are lower than predicted scores, then the two combined bands are providing similar or redundant information. In contrast, if obtained scores exceed predicted scores, then synergistic information exists between the two bands. We can quantify the amount of redundancy / synergy as So /Se − 1 where So refers to the obtained score and Se is the estimated or predicted score. The perceptual redundancy and synergy is likely related to acoustic measures of the same. 
Therefore, it is important to have an accurate quantification of acoustic similarity within the temporal domain.  1.7  Limitation of acoustic methods  It has been difficult to quantify the temporal envelope for real speech signals; previous estimates of the temporal envelope, such as the Speech Transmission Index (STI) or the Modulation Transfer Function (MTF), have been limited in their ability to accurately describe temporal envelope perturbations because these methods were developed for measures of simple signals (e.g., tones) instead of complex signals (e.g. speech) (Fortune et al., 1994). In particular, the MTF, which measures the ability of a system to preserve temporal modulations, uses 100%-modulated speech-weighted noise at typical speech frequencies as an input signal (Humes, 1993). By comparing the system’s output modulations to the input signal, an estimate of temporal changes introduced by the system can be calculated. Accurate predictions about speech intelligibility from modulated noise, however, are not possible  12  1.7. Limitation of acoustic methods since the human auditory system is nonlinear. The behaviour of any nonlinear system for complex signals such as speech cannot be predicted from its behaviour for simple signals such as tones or steady noise. Furthermore, both the STI and MTF assess only long term changes to the speech signal. Both the MTF and the STI need to be measured over a 3 minute time window (Steeneken & Houtgast, 1984). A shorter 15 second time window can be used to estimate the Rapid STI (RASTI) (van Doorn , 1984), but this time window is still much longer than individual speech segments. As a result, neither the MTF nor the STI is able to provide information about specific and brief speech components that may be of interest; for example, stop gaps are thought to be differentiated based on the fine structure of their initial release burst, an event that is typically around 10–35 ms (Fortune et al., 1994; Raphael, Borden, & Harris, 2007). In order to circumvent the limitations of the STI and MTF in providing real speech analysis, Fortune et al. (1994) describe a method of directly assessing the temporal correspondence between two complex signals of any length, instead of trying to predict the change from simple stimuli of long duration. This method, the Envelope Difference Index, (EDI) is calculated by directly quantifying envelopes of signals. By measuring differences across individual samples, a single value of the temporal difference between two signals is calculated. As originally described, the EDI was calculated across a single, wide frequency band. Application of this single channel EDI calculation by Jenstad and Souza (2005) to quantify the effect of compression hearing aid release time on speech acoustics and intelligibility found that even when both temporal and spectral cues are fully available, measures of 13  1.8. Experiment development envelope change on the EDI scale account for a significant amount of variance in speech recognition data. The EDI will be used in the current study to calculate similarity of temporal envelope information between adjacent bands.  1.8  Experiment development  Previous research by Apoux and Bacon (2004) has laid some foundation for the development the current study. They examined listeners’ perceptual weighting for a speech signal’s temporal characteristics across frequencies by observing subjects’ performance on nonsense syllable recognition with various randomly degraded frequency bands. 
By correlating signal degradation with response they obtained a metric of perceptual weight, to quantify the importance of the temporal envelope in each band. In their study, they used a limited number of relatively wide bands: firstly: 1120 Hz and below, 1120-2250 Hz, 2250-3500 Hz, and 3500-10000 Hz; secondly: 800-1500 Hz, 1500-2500 Hz, 2500-5000 Hz. Apoux and Bacon’s results suggested that temporal cues within the high frequency bands, i.e., above 3500Hz, carry the most perceptual weight; that is, the high frequency bands carry the most information-rich temporal envelope cues. However, wide bands such as the ones used in the Apoux and Bacon (2004) study provide only a coarse analysis of the weighting of the temporal envelope. These wide bands do not allow for the quantification of temporal envelope information within critical bands, which is one of the  14  1.8. Experiment development commonly accepted units of analysis within the auditory system1 . Furthermore, Apoux and Bacon’s study (2004) did not quantify redundancy within the temporal domain. It is important to quantify redundancy prior to quantifying perceptual weighting of cues, since any attempt to quantify temporal weight would be confounded by redundancies in the speech signal. Any speech intelligibility models based on these temporal weights without redundancy factors would provide inaccurate predictions of listener responses. The data from the proposed study will be used as a baseline for further studies on hearing impaired individuals’ weighting of temporal cues. Ultimately, this line of research will be used to refine models of temporal processing changes that occur with hearing impairment.  1.8.1  Specific aims  The goal of this study is to examine the role of temporal envelope information in speech recognition by quantifying the redundant temporal envelope information extracted from neighbouring high frequency 1/3-octave bands.  1 For ease of signal processing 1/3 octave wide bands are often implemented as an acceptable approximation to critical bands.  15  Chapter 2  Methods 2.1  Participants  Fifteen individuals with normal hearing, ranging from 20-35 years of age, participated in this experiment. For the purposes of this study normal hearing was defined as having pure-tone air conduction thresholds <20dB HL ANSI (1996) at octave frequencies from 250 to 8000 Hz. All participants had Type A tympanograms and no significant history of noise exposure or ototoxic medications. All participants were fluent speakers of Canadian English and were recruited from the Vancouver, B.C. area.  2.2  Stimuli  Speech stimuli consisted of the University of Western Ontario’s version of the Distinctive Features Differences (UWO-DFD) test. The UWO-DFD uses a closed set format with 21 English consonant targets. All of the consonant sounds are spoken by four talkers (two men and two women) in a fixed “aCil” environment (e.g., abil, akil, atil, afil) (Cheesman & Jamieson, 1996). Subjects were required to identify which of the 21 words they heard. The stimuli were prepared in a similar fashion to those presented by  16  2.3. Procedure Apoux & Bacon (2004). In particular, seven one-third octave high frequency bands ranging from 1800 Hz to 9000 Hz were selected for study based on Apoux and Bacon’s (2004) study which, suggested that temporal information above 3.5 kHz contains more perceptual information than other bands. See Table 1 for band specific cut-off frequencies. 
MATLAB scripts were developed so that the UWO-DFD stimuli were passed through a 1/3-octave IIR filter-bank (3rd order Butterworth) providing 21 bands. Envelope extraction of these bands was performed by full wave rectification and low-pass filtering at 50 Hz. The resulting envelopes were used to modulate a white noise. Finally, the modulated noise was frequency limited by filtering with the same band pass filter used in the original analysis band. In all conditions, a low frequency band spanning from 71 Hz to 450 Hz was present to avoid floor effects associated with presenting only single 1/3-octave bands of temporal information. Because the low-frequency band was constant across all listening conditions, it should not contribute to calculations of between the bands of interest. No acoustic information was presented between 450 Hz and 1800 Hz. This two-octave gap ensured that no upward spread of masking affected the 1800 Hz to 2240 Hz band. Stimuli were calibrated at an overall level of 66 dB SPL.  2.3  Procedure  Participants were tested with two listening conditions: single band and dual band conditions. In the single band condition, a single item was randomly selected from one of the seven bands of interest. All twenty-one nonsense  17  2.4. Analysis words, spoken by each of the four talkers, from each of the seven bands, were randomized. In the dual band condition, all possible pairs of the bands were randomly presented to the listener, again across all nonsense words and talkers. Listeners were tested individually in a double-walled soundattenuating booth using a single-interval 21-alternative procedure that is computer controlled using scripts written in MATLAB, with response alternatives presented on a computer touchscreen monitor. Stimuli were routed through a TDT System 3 consisting of an RP 2.1 Enhanced real-time processor, a PA5 Programmable attenuator and a HB7 Headphone Driver. Stimuli were played to the listeners binaurally through Sennheiser HD 265 Linear II circumaural headphones.  2.4  Analysis  Redundancy and synergy were quantified in both the acoustic and perceptual domains. In the acoustic domain the EDI provided an index of similarity between pairs of bands. In the perceptual domain differences between actual and predicted scores provided a measure of synergy and redundancy. It was expected that acoustic similarity and perceptual redundancy would be highly correlated.  18  Chapter 3  Results 3.1  Treatment of data  Subjects’ responses were scored in several ways, first on their ability to correctly identify the whole target stimulus and second on their ability to correctly identify specific features; that is, manner, place, or voicing. In addition, scores were calculated as either performance scores or as redundancy scores (i.e., the amount of shared information between different listening conditions). Thus, there were four dependent variables. All proportion correct scores were transformed into rationalized arc-sine units (RAUs) to make them suitable for statistical analysis. The data was transformed according to Studebaker’s (1985) transform. R = (146/π)θ − 23 and θ = arcsin X/(N + 1) + arcsin (X + 1)/(N + 1) where R refers to the score in RAU , θ to the arcsin-transformed score in radians, N to the number of test items, and X refers to the number of correct responses. The independent variables were frequency band, gap size (i.e., number of bands separating pairs of frequency bands), and token identity. 
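As an aside on implementation, the band-vocoding steps described in Section 2.2 (band-pass filtering, full-wave rectification, 50 Hz low-pass envelope extraction, noise modulation, and re-filtering) can be sketched roughly as follows. This is a Python/SciPy analogue of the original MATLAB processing, offered only as an illustration; the sampling rate, filter design details, and the example band edges are assumptions rather than the study's exact parameters.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def envelope_vocode_band(x, fs, f_lo, f_hi, env_cutoff=50.0):
    """Replace one analysis band of signal x with envelope-modulated noise."""
    # 3rd-order Butterworth band-pass analysis filter (one 1/3-octave band)
    sos_band = butter(3, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos_band, x)
    # Envelope extraction: full-wave rectification, then 50 Hz low-pass filtering
    sos_env = butter(3, env_cutoff, btype="lowpass", fs=fs, output="sos")
    env = np.maximum(sosfilt(sos_env, np.abs(band)), 0.0)
    # Modulate white noise with the extracted envelope
    noise = np.random.default_rng(0).standard_normal(len(x))
    # Band-limit the modulated noise with the same band-pass filter
    return sosfilt(sos_band, env * noise)

fs = 22050                                         # assumed sampling rate, for illustration
x = np.random.default_rng(1).standard_normal(fs)   # stand-in for one speech token
y = envelope_vocode_band(x, fs, 1800.0, 2240.0)    # lowest analysis band mentioned in the text
```

Similarly, Studebaker's (1985) transform referred to above can be written as a small function. As published, the transform applies the arcsine to the square roots of the two proportions; the plain-text rendering of the equation above omits the radicals.

```python
import math

def rau(x_correct, n_items):
    """Rationalized arcsine transform of X correct responses out of N items."""
    theta = (math.asin(math.sqrt(x_correct / (n_items + 1)))
             + math.asin(math.sqrt((x_correct + 1) / (n_items + 1))))
    return (146.0 / math.pi) * theta - 23.0

print(rau(38, 56))   # e.g., 38/56 correct maps to roughly 67 RAU
```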
One subject’s data were excluded from analysis as it was noted that during testing this person fell asleep multiple times and the scores were more than 2 standard deviations from the group mean. 19  3.1. Treatment of data The amount of information shared across bands was determined by calculating redundancy scores. Single band error probabilities were multiplied together to give a predicted dual band score based on the assumption that single bands contribute independently and uniquely to speech intelligibility when the two single bands are presented together (dual band condition) using the equation Se = 1 − ((1 − p1 ) × (1 − p2 )) where Se is the predicted or estimated score and p1 is the probability of error with one band and p2 is the probability of error with the other band. The predicted score is compared to actual obtained scores by dividing the obtained score by the predicted score. The resulting fraction represents how much information is redundant between the two bands. R =  So Se  − 1. The maximum redundancy score (-0.5)  occurs when the two bands share the same information. In this case there is not an increase in performance when two bands are presented together as opposed to when they are presented alone. By contrast, in the situation when the bands contain completely unique information the redundancy score is 0.0. Synergy is indicated when scores are greater than 0.0. Predicted scores for the dual band condition were then transformed to RAUs. The RAU calculation depends on the number of scoreable items. For the predicted scores, no actual items were presented, but N (number of scoreable items) was considered to be twice the number of items in each single band; that is, there were 56 presentations in each single band so the predicted scores used 112 as the number of test items for the RAU calculation. Redundancy was also calculated for feature-based scores using the same procedures described above but with differing Ns. For feature based scoring 20  3.1. Treatment of data the Ns above were multiplied by a factor of 3 to represent the increase in the number of scorable items. Redundancy data were pooled across subjects and tokens resulting in 1176 data points per band for the single band condition and 2352 data points per band for the dual band condition. This pooling was necessary because without it, predicted scores were often 0, resulting in uninterpretable data points. Because these were all listeners with normal hearing, their scores were sufficiently similar to pool the data for this one calculation. Specifically, all individual subjects’ mean scores fell within the 95% confidence interval calculated by the binomial distribution where the group mean was 67% and the lower bound of the confidence interval was 57% and the upper bound 78%. There are, however, significant differences among subjects for their performance on feature and token identification. Refer to Appendix C for specifics of the ANOVA. Results were also collapsed across bands to allow for the examination of gap size. Previous work by Apoux and Bacon (2004) had indicated that the spectral region from which a temporal envelope was extracted was significant. Specifically, they found that temporal envelopes extracted above 3500 Hz provided more perceptual information than envelopes extracted below 3500 Hz. Because of their findings we performed a one-way ANOVA on this study’s data to examine the effect of frequency band on listener performance. 
The ANOVA indicated a significant difference across the frequency bands for place of articulation and no significant differences for manner of articulation, voicing, or overall token recognition. Refer to Appendix ?? for specifics. The significant effect of band for listener recognition of place of articulation showed a 6% increase in performance from band 1 up to band 21  3.2. Results 7. In light of this very small but significant linear increase in scores for only place of articulation, data were collapsed across bands.  3.2 3.2.1  Results Performance on token identification  The performance data was explored for differences as a function of band separation. A univariate analysis of variance for gap size (7 levels, from 0 (single band) to 6 (furthest separation between any two bands) revealed significant main effect for gap size [F(6,78)=16.726, p<0.001]. The main effect of gap size was further examined via a series of repeated measures contrasts comparing each of the dual band conditions to the single band condition. Performance in each of the dual band conditions was significantly greater than performance in the single band condition [Gap size 1, F(1,13)=15.554, p=0.002; Gap size 2, F(1,13)=19.957, p=0.001; Gap size 3, F(1,13)=22.922, p<0.001; Gap size 4, F(1,13)=38.280, p<0.001; Gap size 5, F(1,13)=67.576, p<0.001; Gap size 6, F(1,13)=25.922, p<0.001]. Finally, a set of polynomial contrasts revealed a linear increase of performance as gap size increased [Linear, F(1,13)=13.715, p=0.003]. However, there are also significant quadratic and cubic components of this trend [Quadratic, F(1,13)=6.648, p=0.023; Cubic, F(1,13)=7.930, p=0.015]. The mean performance data are shown in Figure 3.1, as a function of gap size. Gap size represents the number of 1/3 octave wide spacing between the presentation of the two bands. The zero gap size condition represents single band presentation. The vertical bars denote the 0.95 confidence inter22  3.2. Results  .50  .40  Proportion Correct  .30  .20  .10  0.00 0  1  2  3  4  5  6  GAP  Figure 3.1: Proportion correct scores for token identification by gap size vals (CI) for the stimuli. Performance is represented as a RAU proportion value, where 1.16 indicates that all of the tokens were correctly identified in the single band condition and 1.18 indicates that all tokens were correctly identified in the dual band condition.  3.2.2  Performance for feature-based scoring  Page 1 Performance was examined further for feature identification (manner, place  and voicing) as a function of gap size. This scoring creates more score-able units per token, which is more sensitive to small changes between conditions. A univariate analysis of variance for gap size (7 levels, from 0 (single band) 23  3.2. Results to 6 (furthest separation between any two bands) revealed significant main effect for gap size [F(6,78)=13.488, p<0.001]. The main effect of gap size was further examined via a series of repeated measures contrasts comparing each of the dual band conditions to the single band condition. Performance for the dual band conditions was significantly greater than performance in the single band condition when gap size was four 1/3 octaves or greater [Gap size 1, F(1,13)=3.721, p=0.076; Gap size 2, F(1,13)=4.343, p=0.057; Gap size 3, F(1,13)=3.681, p=0.077; Gap size 4, F(1,13)=13.109, p=0.003; Gap size 5, F(1,13)=37.122, p<0.001; Gap size 6, F(1,13)=7.714, p<0.001]. 
Finally, a set of polynomial contrasts revealed a linear increase of performance as gap size increased [Linear, F(1,13)=38.188, p<0.001]. However, there are significant cubic and quartic components of this trend resulting in an Sshaped function [Cubic, F(1,13)=4.699, p=0.049; Quartic, F(1,13)=5.621, p=0.034]. The mean of proportion correct of feature identification data is shown in Figure 3.2 as a function of gap size. Gap size represents the number of 1/3 octave wide spacings between the presentation of the two bands. The zero gap size condition represents single-band presentations. The vertical bars denote the 95% confidence intervals (CI) for the performance values in each listening condition. Performance of feature recognition is represented as a proportion value in arcsine units, where 1.23 indicates that all of the tokens were correctly identified.  24  3.2. Results  1.00  .90  Proportion Correct  .80  .70  .60  .50 0  1  2  3  4  5  6  GAP  Figure 3.2: Proportion correct scores for feature identification by gap size  25 Page 1  3.2. Results  3.2.3  Redundancy  Data were then analyzed based on the amount of shared information contained across different 1/3 octave bands of the speech stimuli. The benefit of assessing shared information rather than performance allows the quantification of redundant information by the addition of a second band. Performance alone does not assess how much perceptual information overlaps between bands. As previously mentioned, the redundancy calculation involves the use of subjects’ predicted scores, determined from listener performance in single band conditions, compared to their obtained score in the dual band condition. These following analyses yield the levels of shared information about the speech signal from differing frequency bands.  Redundancy for voicing Redundancy values, that is, the amount of information shared between two bands regarding phoneme voicing, were explored using a two factor ANOVA of voice (2 levels) and gap size (6 levels). The ANOVA revealed a significant effect of voicing [F(1,429)=12.194, p=0.001]. There was no interaction between voice and gap [F(5,429)=2.175, p=0.056] nor a main effect of gap size [F(5,429)=1.050, p=0.388]. The redundancy values for voiceless stimuli were closer to zero than for voiced stimuli indicating more unique information across bands for voiceless stimuli. However, both voiced and voiceless stimuli were significantly redundant; that is, they both had redundancy values significantly different from zero as shown in the calculated 95% con-  26  3.2. Results fidence intervals. None of the values were greater than 0, indicating no synergy. The mean redundancy values for the correct identification of voicing are shown in Figure 3.3 for voiced and voiceless stimuli as a function of gap size. The vertical bars denote 95% confidence intervals for the redundancy of voicing identification. Gap size represents the number of 1/3 octave wide spacings between the presentation of the two bands. Redundancy of voicing identification is represented where 0.0 indicates that there is no redundant information for correct voicing identification and -0.5 indicates complete redundancy between bands.  Redundancy for manner The degree of shared information for manner of articulation was explored using a two factor ANOVA of manner (5 levels) and gap size (6 levels). 
The ANOVA revealed a significant interaction of manner of articulation by gap size [F(20,411)=3.429, p<0.001] as well as a significant main effect of manner of articulation [F(4,411)=57.208, p<0.01]. The effect of gap size, however, was not significant [F(5,411)=0.317, p=0.903]. The interaction of manner of articulation and gap was further examined via a series of post-hoc tests using calculated Scheff´e-adjusted critical values. The first part of exploring the interaction of gap size and manner of articulation was polynomial fits of redundancy values by gap size for each manner of articulation. For stops this polynomial analysis revealed a linear increase of redundancy as gap size increased [Linear, 95% CI ranged from –2.301 to –0.071]. 27  3.2. Results  0  -0.1  Redundancy  -0.2  -0.3 Voiced Voiceless -0.4  -0.5  -0.6  -0.7 1  2  3  4  5  6  Gap Size  Figure 3.3: Redundancy values for voiced and voiceless stimuli as a function of gap size  28  3.2. Results However, there is a significant quadratic component of this trend resulting in an inverted U-shaped function [Quadratic, 95% CI ranged from -4.62 to –1.434]. For fricatives the polynomial analysis indicated a quadratic trend of redundancy as gap size increased [Quadratic, 95% CI ranged from -3.324 to –0.516]. However, there is a significant cubic component of this trend resulting in a U-shaped function [Cubic, 95% CI ranged from 0.346 to 4.906]. For nasals the polynomial analysis indicated a quartic trend of redundancy as gap size increased resulting in an S-shaped function [Quartic, 95% CI ranged from -9.460 to –1.980]. There was no significant polynomial trends of redundancy scores as a function of gap size for either affricates or approximants. Appendix A provides a listing of all the 95% Scheff´e confidence intervals for polynomial fits to each manner of articulation. The second part of exploring the interaction of gap size and manner of articulation was determining whether the redundancy values that are shown in Figure 3.4 were significantly different from zero. If the values are significantly and negatively different from zero it indicates that the amount of shared information between bands is significant. A single sample t-test comparing each redundancy score to zero was used with a critical t value adjusted using the Scheff´e adjustment. For all manners of articulation and all band combinations scores were negative and significantly different from 0 indicating redundancy. Appendix B provides a listing of all obtained t values and critical t values for gap size and manner of articulation 29  3.2. Results  0 -0.1  Redundancy  -0.2 Stop Fricative Affricate Approximant Nasal  -0.3 -0.4 -0.5 -0.6 -0.7 1  2  3  4  5  6  Gap Size  Figure 3.4: Redundancy values for manner of articulation by gap size Mean redundancy values for manner of articulation are shown in Figure 3.4 as a function of gap size for stops, fricatives, affricates, approximants, and nasals. Gap size represents the number of 1/3 octave wide gaps between the presentation of the two bands. The redundancy of feature recognition is represented where 0.0 indicates that there is no redundant information for correct manner identification and -0.5 indicates complete redundancy between bands.  30  3.2. 
Results Comparison Front to Back Mid to Back Front to Mid  95% CI Lower Bound -0.96968487 -1.25746751 -0.61670055  95% CI Upper Bound 1.380125284 1.224972552 1.05963592  Significant No No No  Table 3.1: 95% Scheff´e confidence intervals for place of articulation contrasts Redundancy for place of articulation Redundancy scores for place of articulation were explored using a two factor ANOVA of place of articulation (3 levels) and gap size (6 levels). The ANOVA revealed a significant effect of place of articulation [F(2,423)=8.449, p<0.001]. The interaction between place of articulation and gap was not significant [F(10,423)=1.716, p=0.075] and there was no main effect of gap size [F(5,423)=1.045, p=0.391]. The effect of place of articulation on redundancy was explored using Scheff´e simultaneous 95% confidence intervals for all pairs. No significant differences between places of articulation were discovered The second part of exploring the effect of place of articulation was determining if the redundancy values that are shown in Figure 3.5 were significantly different from zero. If the values are significantly different from zero it indicates that the amount of shared information between bands is significant. A single sample t-test comparing each redundancy score to zero was used with a critical t value adjusted fro multiple comparisons using the Scheff´e adjustment. For all places of articulation scores were negative and significantly different from zero, indicating significant redundancy. The mean redundancy values for the correct identification of place of articulation is shown in Figure 3.5 for front, mid and back places of articu31  3.2. Results Place Front Mid Back  t Value -17.308 -20.403 -5.269  t Critical 2.300294 2.300294 2.300294  Significant * p<0.05 * p<0.05 * p<0.05  Table 3.2: 95% Obtained and critical t values for identification of significant redundancy by place of articulation lation. The vertical bars denote 95% confidence intervals for the redundancy of place identification. Redundancy of place identification is represented on a scale from 0.0 to -0.5 where 0.0 indicates that there is no redundant information for place of articulation identification and -0.5 indicates complete redundancy between bands.  3.2.4  Predicted performance based on acoustic information  It was hypothesized that acoustic information contained within a stimulus could be used to predict the level of shared information within the perceptual domain. This prediction was assessed by examining the correlation between each stimulus’s EDI value and the redundancy score calculated for each subject. Because the EDI is a measure of similar acoustic information between two stimuli, it should be related to similar perceptual information. However, redundancy scores could not be calculated for individual stimuli for individual subjects, because for small numbers of tokens, the predicted score was often zero. As a reminder, the redundancy value is derived from the obtained score divided predicted score, resulting in a denominator of zero for many calculations. In order to circumvent the issue of a calculation with 0 as a denominator, a compromise was made for this analysis. The  32  3.2. Results  0  -0.1  Redundancy  -0.2  -0.3 Front Mid Back  -0.4  -0.5  -0.6  -0.7 1  2  3  4  5  6  Gap Size  Figure 3.5: Redundancy values for place of articulation as a function of gap size  33  3.2. 
Results  Incorrect Correct  Incorrect 15564 9132  Correct 0 0  Table 3.3: Accuracy of performance classification pre logistic regression  Incorrect Correct  Incorrect 15009 7615  Correct 555 1517  Table 3.4: Accuracy of data classification post logistic regression assumption was made that recognition scores would increase with greater amounts of acoustic information. Thus, as the EDI score increased, showing greater amounts of unique acoustic information, it was expected that performance would increase. Thus, recognition scores for individual tokens for each subject were used as the dependent variable, rather than calculated redundancy scores. The role of acoustic information in predicting performance was explored using logistic regression analysis, in which measured EDI values were used to predict recognition. The amount of unique acoustic information does not appear to predict performance because using the EDI as a predictor variable in the logistic regression does not improve the accuracy of predicted outcome over having no predictor variable in the model. Table 3.3 lists the accuracy of data classification (correct or incorrect) for no predictor variables in the logistic regression and Table 3.4 lists the accuracy of data classification (correct or incorrect) when EDI value is used as a predictor variables. There was no significant improvement in classification of response, as correct classification changed from 63% to 66.9%.  34  Chapter 4  Discussion Several studies have provided information about the redundancy of the spectral speech information across frequency bands, but there has been relatively little published data regarding the redundant information contained across the temporal dimension of the speech signal. The purpose of this study was to examine the role of temporal information in speech recognition by quantifying the redundant temporal information extracted from neighbouring high frequency 1/3 –octave bands. It was hypothesized that neighbouring bands would contain similar information, and as band separation increased, redundancy would decrease. The overall findings of this study partially confirm this hypothesis. Specifically, the results of this study indicate that there is significant redundancy across bands; however, there was not a clear function relating redundancy to gap size. The was no evidence of synergy across bands within this data set. The data do support previous research findings such as those results by Healy and colleagues (2003) that the temporal patterns can be integrated across wide frequency gaps, as evidenced by the linear increase in listener performance as gap size increased. Furthermore the acoustic measures of  35  Chapter 4. Discussion 120  100  80  Performance (RAU)  60  40  20  0 .1  .2  .3  .4  .5  EDI Value  Figure 4.1: Scatterplot of performance data as a function of EDI values the temporal envelope (EDI) agree with Crouzet et al. (2001) results that temporal envelopes in adjacent bands are highly correlated as evidence by the scatter plot of EDI and performance (Figure 4.1). Many of the EDI values are low, clustered around 0.2, indicating a high degree of similar temporal envelope information2 . 2  Recall that higher EDI values indicate greater differences between two stimuli.  36  4.1. 
Performance  4.1  Performance  Performance was assessed on both overall token scoring and more sensitive feature-based scoring in order to assess whether listeners were obtaining more perceptual information with two bands of temporal envelopes over a single band and furthermore whether listeners’ performance would improve with larger gap separation between bands.  4.1.1  Overall performance on token identification  There are two key findings when examining how subjects were able to identify tokens. Firstly, the performance in all the dual band conditions is significantly greater than the single band condition. Secondly, as gap size increases, performance scores increase cubically. The first finding suggests that listeners are getting more useable information from the dual band condition than from the single band condition. This means that there exists unique temporal information in each of the two bands. This fits with the hypothesis that there would be at least some usable, unique information in different temporal bands. In addition, as gap size increases, performance increases linearly with quadratic and cubic overlays; that is, there is evidence that by separating the bands, more unique information is provided. The overall trend for the curve is that as gap size increases performance increases indicating that as the gap between bands increases so too does performance. This trend is in agreement with the hypothesis that spectrally distant bands would contribute more information to listeners.  37  4.2. Redundancy The performance findings support the Warren and Healy’s 2005 study, in which they concluded that normal hearing listeners, when forced to rely on spectrally distant temporal aspects of the speech signal are able to extract meaningful cues.  4.1.2  Overall performance based on feature scoring  Results were further scored based on the number of features (manner of articulation, voicing, and place of articulation) correctly identified. This scoring method increased the scoreable units and resulted in more sensitive analysis of performance. For feature based scoring if the gap size was greater than four 1/3-octave bands the subjects’ performance was significantly better than the single band condition. Moreover, examining the means of performance change with gap size it is noted that scores increase as a cubic function of gap size. The interpretation of results for feature–based scoring is essentially the same as for whole–token scoring. In this case, whole token scoring was sufficiently sensitive to differences.  4.2  Redundancy  Within the context of this study shared information between bands is quantified by a redundancy calculation that examines predicted versus obtained score, where the predicted score is based on the assumption that the two bands contribute independently to intelligibility. If there is an overlap of information contained within the two bands presented, the obtained score is less than the predicted score resulting in a negative redundancy. In general,  38  4.2. Redundancy it was expected that redundancy would decrease as band spacing increased because the amount of shared information between bands was expected to decrease. The amount of shared information between bands is thought to be high when the gap size is small because the temporal envelopes of adjacent 1/3-octave bands are highly correlated and decrease in correlation with increasing gap size separation (Crouzet & Ainsworth, 2001). 
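For concreteness, the predicted-versus-obtained comparison defined in Section 3.1 can be sketched as follows. The prediction is written here in terms of single-band proportion-correct scores, which is equivalent to one minus the product of the single-band error probabilities (p1 and p2 in the text); the numbers in the example are purely illustrative and are not results from the study.

```python
def redundancy(obtained_dual, correct_band1, correct_band2):
    """R = So/Se - 1, where Se assumes the two bands fail independently."""
    predicted = 1.0 - (1.0 - correct_band1) * (1.0 - correct_band2)  # Se
    return obtained_dual / predicted - 1.0

# Single-band scores of 0.20 and 0.25 predict a dual-band score of 0.40 under
# independence; an obtained dual-band score of 0.30 gives R = 0.30/0.40 - 1 = -0.25,
# i.e., partial redundancy (0 indicates independent information, values near -0.5
# near-complete redundancy, and positive values synergy).
print(redundancy(0.30, 0.20, 0.25))
```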
Redundancy was examined separately for each feature (voice, manner of articulation, and place of articulation).

4.2.1 Voice

Rosen (1992) has suggested that the temporal envelope provides weak perceptual cues to voicing; for example, voiced approximants and nasals have greater amplitudes than voiceless stops, affricates, and fricatives. Furthermore, the existence and duration of silent intervals for voiceless stops is also useful for identification of voicing. Since voiced and voiceless tokens have different acoustic characteristics within the temporal domain, it was at least possible to have some unique information across bands.

Based on the voicing redundancy results from this study, the information pertaining to voicing identification in any two-band condition is redundant. This means that listeners' obtained scores for voicing are always significantly less than their predicted voicing scores. Information about voiceless stimuli was less redundant across bands than information about voiced stimuli. This finding is expected since the presence of voicing increases the energy within the temporal domain.

4.2.2 Manner

Rosen (1992) suggests that the temporal envelope provides perceptual cues to manner of articulation. Specifically, he posits that rise time, overall duration, and release burst (all envelope features) are influential in the distinction of manner of articulation. It is expected, therefore, that each manner of articulation will affect the temporal envelope of the speech signal differently, resulting in possible variation in redundancy values as a function of manner of articulation. For example, nasals and approximants differ primarily spectrally, so they were expected to show high redundancy; that is, there is little usable information to begin with, and high redundancy was expected for these items since very little perceptual information about their identity exists within the temporal envelope.

In our study, there were significant differences in redundancy patterns across manners. This suggests that different manners affect the temporal envelope differently, even across different bands for the same manner of articulation, supporting Rosen's (1992) hypothesis that the temporal envelope provides perceptual cues to manner of articulation. Across all manners of articulation there are significant differences in redundancy values as a function of gap size. However, when each manner of articulation is examined separately, there are only significant differences in redundancy as a function of gap size for stops, fricatives, and nasals. These findings are not surprising, since nasals, with their low resonant frequency, affect the temporal envelope. Likewise fricatives, with their long overall duration, have marked differences in their temporal envelopes. Stops too, with their release burst and silent gap, affect the temporal envelope. Approximants, in contrast, carry largely spectral information and therefore contribute little information within the temporal domain.

4.2.3 Place

Place information is not thought to be conveyed by the temporal envelope. Place of articulation is typically linked to changes in the resonant (formant) frequencies, which can be modeled using a two-tube model: the place of articulation (i.e., tongue position) changes the lengths of the tubes, thus changing the resonant frequencies. These changes in frequency are not reflected as changes in the temporal envelope.
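As a rough illustration of why tongue position changes the resonant frequencies rather than the temporal envelope, the following Python sketch estimates vocal tract resonances from a decoupled two-tube approximation (back cavity treated as closed at both ends, front cavity as closed at the constriction and open at the lips). The tube lengths, the speed of sound, and the neglect of coupling between the cavities are all simplifying assumptions made for illustration.

def decoupled_two_tube_resonances(back_len_m, front_len_m, c=350.0, n_per_tube=2):
    """Very rough resonance estimates for a two-tube vocal tract model.
    Back cavity (closed-closed): f = k * c / (2 * L_back).
    Front cavity (closed-open):  f = (2k - 1) * c / (4 * L_front).
    c ~ 350 m/s is an assumed speed of sound in warm, moist air."""
    back = [k * c / (2.0 * back_len_m) for k in range(1, n_per_tube + 1)]
    front = [(2 * k - 1) * c / (4.0 * front_len_m) for k in range(1, n_per_tube + 1)]
    return sorted(back + front)

# Moving the constriction forward (longer back cavity, shorter front cavity)
# shifts the resonances, i.e., the formant pattern, while the overall
# temporal envelope of the signal is largely unaffected.
print(decoupled_two_tube_resonances(0.09, 0.08))
print(decoupled_two_tube_resonances(0.12, 0.05))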
The results from this study indicate that for all places of articulation there is significant redundancy across bands. However, there were no significant differences in redundancy between places of articulation, nor as a function of gap size. This indicates that there is overlap of information in the temporal domain; however, since it does not change with place or gap size, it is most likely due to the very limited amount of perceptual information about place that is present within the temporal envelope.

4.2.4 Summary of redundancy scores

The redundancy values found within this study indicate that multiple acoustic cues map to the same perceptual cue (i.e., manner, place, voicing). Such redundancy has been shown within each of the spectral and temporal domains by Steeneken and Houtgast (1999), Warren and colleagues (1995, 1999), as well as Müsch and Buus (2001a, 2001b, 2004). For example, Steeneken and Houtgast (1999) found that octave bands of acoustic information were mutually dependent. They demonstrated this by presenting to listeners stimuli that varied by the number of octave-wide frequency bands, their relative spacing, and their signal-to-noise ratio. Their results identified the mutual dependence of octave bands for predictions of speech intelligibility, suggesting that adjacent low (125-250 Hz) or high (4000-8000 Hz) frequency bands exhibit high redundancy whereas the mid-frequency bands (from 500 to 4000 Hz) show low redundancy, indicating that these bands carry unique information. The results of the current study support those findings and extend them to show that even for adjacent bands as narrow as one-third octaves, there can be unique information contained within the temporal envelope.

4.3 Acoustics

There does not appear to be a strong correlation between the measure of acoustic information in the temporal domain and behavioural performance, based on the quantification method used in this study. It is clear that listeners are extracting information from the stimuli provided to them, because their scores are well above chance levels; however, the aspects of the signal they are using for recognition are not known.

The EDI measurement was not a predictor of behavioural results. The EDI calculation that was used for this study quantified the unique temporal envelope information between two bands using the same envelope procedure as was used for the stimuli. The use of another temporal measure may provide better predictive results; however, other currently available measures of temporal information are not appropriate for this quantification. The majority of measures of the temporal envelope use the modulation transfer function (MTF) as their basis. As described earlier, the MTF relies on a long portion of a non-speech signal and thus could not provide an estimate of the temporal information contained within a signal for this study. It was thought that the EDI would capture the entire waveform and all its temporal information. The EDI may in fact capture all of the temporal information; however, because the analysis required performance data rather than redundancy data, the EDI may not be as limited as an acoustic predictor as these results suggest. Refinement of the redundancy calculation should be attempted, as well as further refinement of the EDI. Refinement of the EDI analysis could be performed by using steeper filters for band filtering or through incorporation of some spectral information.
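As one example of what steeper band filtering could look like in practice, the following Python sketch band-passes a signal into a single 1/3-octave band with a higher-order Butterworth filter. The eighth-order, zero-phase design is an illustrative choice, not the filtering actually used in this study.

from scipy.signal import butter, sosfiltfilt

def third_octave_band(x, fs, fc, order=8):
    """Band-pass one 1/3-octave band centred at fc (Hz). A higher order
    gives steeper skirts and therefore less envelope leakage from
    neighbouring bands; order=8 is an illustrative value."""
    lo = fc * 2.0 ** (-1.0 / 6.0)  # lower 1/3-octave band edge
    hi = fc * 2.0 ** (1.0 / 6.0)   # upper 1/3-octave band edge
    sos = butter(order, [lo / (fs / 2.0), hi / (fs / 2.0)],
                 btype="band", output="sos")
    return sosfiltfilt(sos, x)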
These refinements may provide greater insight into what temporal aspects of the signal are important for successful speech recognition.

It is important to note that the information presented to the listeners is not purely temporal in nature. The temporal envelope requires a carrier to modulate. In this study, one Gaussian noise band centred at the middle of the low frequency band (71 - 450 Hz) was modulated by the temporal envelope of that frequency region. Each of the high frequency envelopes was then used to modulate a 1/3-octave wide Gaussian noise with cut-offs equivalent to those of the extracted temporal envelope. This method of presenting temporal information ensures that limited spectral information is available for the listener to use. However, studies by Greenberg et al. (1998) have shown that even with limited spectral information listeners can achieve some level of speech intelligibility, suggesting that listeners in this study may have relied on both temporal and spectral components of the stimuli for intelligibility. While the method we used limited spectral information, it is difficult to remove it completely. Further work in this aspect of the study is proposed. In order to remove any spectral contribution, the temporal envelope from one band could be used to modulate a different Gaussian noise band. This approach creates auditory chimeras, as described by Delgutte, Smith, and Oxenham (2002). Properly counterbalanced, this approach would provide information regarding the use of spectral cues with primarily temporal information, and would allow us to remove the contribution of spectral information from this analysis.

4.4 Limitations of the study

One limitation of this study was the unequal number of trials obtained across listening conditions. With 7 bands, there were more trials for pairs of adjacent bands than for pairs separated by six bands. Adjacent bands were available by pairing bands 1 & 2, 2 & 3, 3 & 4, and so on, giving six pairs for the one-band gap, whereas the maximum separation was available only by pairing bands 1 & 7, giving a single comparison for the six-band gap. The experiment setup could be modified so that an equal number of presentations of each gap size was presented to each listener.

Another issue is that the low frequency band (71 to 450 Hz) may be affecting the redundancy values. This low frequency band was included in the stimuli to avoid floor effects; without this band, subjects would not have been able to respond above chance levels. One way that future studies could examine this is to also test only the low frequency band as a zero-band gap condition. That way, the redundancy / synergy could be assessed in each of the single band conditions to determine the actual contribution of the constant low-frequency band. It should be made clear that even though the low-frequency band was constant in all conditions, the synergy / redundancy function is nonlinear, making it difficult to numerically remove the contribution of the low-frequency band. In the current study, the band may have affected the amount of shared information in a way that cannot be quantified.

4.5 Directions for further research

The limitations of this study create an opportunity to refine this preliminary research. The development of a scaling factor to redistribute the redundancy scores along a non-hyperbolic curve would enable comparison of redundancies across bands.
Furthermore, the development of a new formula for calculating redundancy would increase the validity of this study. Because of the large number of conditions in which the redundancy calculation would have a value of zero as the denominator, many presentations had to be averaged together, resulting in a loss of detail within the data.

This study supports the notion that listeners are able to rely on temporal information for successful speech understanding, but this could not be quantified using the EDI measure. A refinement of temporal quantification in the acoustic domain will enable an understanding of which aspects of the temporal envelope are necessary for successful speech decoding.

This research provides the basis for a temporal weighting function that quantifies the relative contributions of individual bands of temporal envelope information to intelligibility. Such a weighting function is essential for improving models of speech recognition and for developing algorithms within hearing aid processing. Current hearing aid processing necessarily alters the slow changes in amplitude (i.e., the temporal envelope) because the provision of audibility primarily relies on wide dynamic range compression (WDRC), which can adversely affect the temporal envelope of the speech signal (Jenstad & Souza, 2005; Verschuure et al., 1995). The goal of WDRC processing is to ensure that the low-level components of speech are audible; that is, amplified above the threshold of the hearing-impaired listener. This process effectively compresses the dynamic range available to normal hearing individuals into the limited dynamic range of a hearing impaired individual, with the necessary consequence of altering the temporal envelope. Because of the necessary trade-off between audibility and distortion of the signal, the development of a temporal weighting function allows for quantifying the trade-off between intelligibility derived from intact temporal information and intelligibility derived from increased audibility.

4.6 Conclusions

In general, for listeners identifying nonsense words from limited temporal information, speech recognition performance increases as the gap between two bands increases. Furthermore, the amount of shared information between 1/3-octave bands also decreases as the gap between bands increases. Within the bands tested here, there was no evidence of the bands combining synergistically. In the acoustic domain, the level of redundancy, as measured with the EDI tool, did not predict intelligibility.

These findings suggest that there is useable information contained within the temporal envelope, as evidenced by subject scores above chance. Moreover, there is significant redundancy across the temporal envelopes extracted from different frequency bands. This indicates that different bands share information about phoneme identity. This redundancy, however, is not complete, meaning that the temporal envelope differs across these frequency bands and provides different acoustic cues to stimulus identity. The amount of shared information decreases as the gap size increases. This is expected since neighbouring bands are more correlated acoustically than widely separated bands. Based on these results, models of speech intelligibility must account for this redundancy rather than assuming that bands can be analyzed and weighted individually.

References

Apoux, F., & Bacon, S.
(2004). Relative importance of temporal information in various frequency regions for consonant identification in quiet and in noise. The Journal of the Acoustical Society of America, 116(3), 1671.

Berg, B. (1989). Analysis of weights in multiple observation tasks. The Journal of the Acoustical Society of America, 86(5), 1743.

Cheesman, M., & Jamieson, D. (1996). Development, evaluation and scoring of a nonsense word test suitable for use with speakers of Canadian English. Canadian Acoustics, 24(1), 3.

Crouzet, O., & Ainsworth, W. (2001). On the various influences of envelope information on the perception of speech in adverse conditions: An analysis of between-channel envelope correlation.

Doherty, K., & Lutfi, T. (1996). Use of a correlation method to estimate a listener's weighting function for speech. The Journal of the Acoustical Society of America, 100(6), 3769.

Doorn, A. van, Grind, W. van de, Bouman, M., & Koenderink, J. (1984). Limits in perception: Essays in honour of Maarten A. Bouman.

Fortune, T., Woodruff, B., & Preves, D. (1994). A new technique for quantifying temporal envelope contrasts. Ear and Hearing, 15, 93.

Greenberg, S., Arai, T., & Silipo, R. (1998). Speech intelligibility derived from exceedingly sparse spectral information.

Healy, E., & Warren, R. (2003). The role of contrasting temporal amplitude patterns in the perception of speech. The Journal of the Acoustical Society of America, 113(3), 1676.

Hedrick, M. (1995). Effect of relative and overall amplitude on perception of voiceless stop consonants by listeners with normal and impaired hearing. The Journal of the Acoustical Society of America, 98(3), 1292.

Hedrick, M., & Jesteadt, W. (1996). Effect of relative amplitude, presentation level, and vowel duration on perception of voiceless stop consonants by normal and hearing-impaired listeners. The Journal of the Acoustical Society of America, 100(5), 3398.

Jenstad, L., & Souza, P. (2005). Quantifying the effect of compression hearing aid release time on speech acoustics. Journal of Speech, Language, and Hearing Research, 48, 651.

Jenstad, L., Souza, P., & Lister, A. (2006). Development of a metric for quantifying the temporal envelope of speech. Poster presented at the International Hearing Aid Research Conference, Lake Tahoe, CA, August 2006.

Joris, P., Schreiner, C., & Rees, A. (2004). Neural processing of amplitude-modulated sounds. Physiological Reviews, 84, 451.

Lorenzi, C., Gilbert, G., Carn, H., Garnier, S., & Moore, B. (2006). Speech perception problems of the hearing impaired reflect inability to use temporal fine structure. Proceedings of the National Academy of Sciences of the United States of America, 103(49), 18866.

Müsch, A., & Buus, S. (2001a). Using statistical decision theory to predict speech intelligibility. II. Measurement and prediction of consonant-discrimination performance. The Journal of the Acoustical Society of America, 109(6), 2910.

Müsch, A., & Buus, S. (2001b). Using statistical decision theory to predict speech intelligibility. I. Model structure. The Journal of the Acoustical Society of America, 109(6), 2896.

Müsch, A., & Buus, S. (2004). Using statistical decision theory to predict speech intelligibility. III. Effect of audibility on speech recognition sensitivity. The Journal of the Acoustical Society of America, 116(4), 2223.

Raphael, L., Borden, G., & Harris, K. (2007). Speech science primer: Physiology, acoustics, and perception of speech.
Remez, R., Rubin, P., Pison, D., & Carrell, T. (1992). Speech perception without traditional speech cues. Science, 212 (4497), 947. Rosen, S. (1992). Temporal information in speech: acoustic, auditory and linguistic aspects. Phil. Trans. R Soc. Lond, 336, 367. Shannon, R., Zeng, F., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270 (5234), 303. 50  Shannon, R., Zeng, F., & Wygonski, J. (1998). Speech recognition with altered spectral distribution of envelope cues. The Journal of the Acoustical Society of America, 270 (5234), 303. Verschuure, K., Maas, A., Stikvoort, E., & Jong, R. de.(1996). Compression and its effect on the speech signal. Ear and Hearing, 17 (2), 162. Warren, R. (1995). Spectral redundancy: Intelligibility of sentences heard through narrow spectral slits. Perception Psychophysics, 57 (2), 162. Xu, L., Thompson, C., & Pfingst, B. (2005). Relative contributions of spectral and temporal cues for phoneme recognition. The Journal of the Acoustical Society of America, 117, 3255.  51  Appendix A  95% simultaneous Scheff´e confidence intervals for polynomial fits of data for manner of articulation Manner  Lower CI  Mean  Upper CI  Order  significant  Stops  -2.30085  -1.18599  -0.07113  Linear  *  Stops  -4.06179  -2.74789  -1.43399  Quadratic  *  Stops  -0.67644  1.365326  3.407095  Cubic  Stops  -0.71717  0.133976  0.985125  Quartic  Stops  -1.7591  0.931208  3.621516  Quintic  Fricatives  -2.31447  -1.11444  0.085578  Linear  Fricatives  -3.3239  -1.9202  -0.51649  Quadratic  *  Fricatives  0.345773  2.626103  4.906433  Cubic  *  Fricatives  -0.83569  0.124004  1.083703  Quartic  Fricatives  -3.20401  -0.23139  2.741236  Quintic  Affricates  -1.91039  0.09447  2.099329  Linear  Affricates  -2.82126  -0.61052  1.600208  Quadratic  Affricates  -2.96317  -0.2433  2.476566  Cubic  Affricates  -1.42942  -0.4536  0.522209  Quartic  Affricates  -3.68935  -0.52374  2.64187  Quintic  Approximants  -1.3785  0.701903  2.782305  Linear  Approximants  -3.19125  -1.22975  0.731754  Quadratic 52  Appendix A.  Manner  Lower CI  Mean  Upper CI  Order  Approximants  -6.10122  -2.44799  1.205235  Cubic  Approximants  -1.14458  0.329843  1.804268  Quartic  Approximants  -3.44216  0.078664  3.599488  Quintic  Nasals  -3.75442  -0.43912  2.876181  Linear  Nasals  -9.4604  -5.72037  -1.98035  Quadratic  Nasals  -4.03695  0.809318  5.65559  Cubic  Nasals  -2.50466  -0.67206  1.160539  Quartic  Nasals  -8.62948  -2.64328  3.342908  Quintic  significant  *  Significance is based on the adjusted Scheff´e F statistic  53  Appendix B  Single t-test statistics for manner of articulation for determining if any manner and gap combination has significant redundancy Manner  Gap  t value  t critical  Significant  Stop  1  -18.356  2.566102  *  Stop  2  -15.728  2.566102  *  Stop  3  -15.879  2.566102  *  Stop  4  -15.211  2.566102  *  Stop  5  -10.655  2.566102  *  Stop  6  -8.875  2.566102  *  Fricative  1  -10.09  2.566102  *  Fricative  2  -8.259  2.566102  *  Fricative  3  -7.944  2.566102  *  Fricative  4  -6.266  2.566102  *  Fricative  5  -7.182  2.566102  *  Fricative  6  -3.734  2.566102  *  Affricate  1  -4.119  2.566102  *  Affricate  2  -4.151  2.566102  *  Affricate  3  -3.152  2.566102  *  Affricate  4  -3.25  2.566102  *  Affricate  5  -4.362  2.566102  * 54  Appendix B.  
Manner  Gap  t value  t critical  Significant  Affricate  6  -9.665  2.566102  *  Approximant  1  -14.407  2.566102  *  Approximant  2  -13.266  2.566102  *  Approximant  3  -13.462  2.566102  *  Approximant  4  -12.091  2.566102  *  Approximant  5  -8.166  2.566102  *  Approximant  6  -3.907  2.566102  *  Nasal  1  -20.431  2.566102  *  Nasal  2  -20.38  2.566102  *  Nasal  3  -9.937  2.566102  *  Nasal  4  -8.761  2.566102  *  Nasal  5  -8.801  2.566102  *  Nasal  6  -5.445  2.566102  *  Significance is determined on a Scheff´e adjusted critical t value  55  Appendix C  Analysis of Variance (ANOVA) tables for differences across listener scores in the single band condition Sum of Squares Place  Manner  Overall  Mean Square  649.282  6  108.214  1849.047  91  20.319  Total  2498.329  97  1089.250  6  181.542  8589.693  91  94.392  9678.943  97  Between Groups Within Groups Total  Voice  df  Between Groups Within Groups  Between Groups Within Groups  388.061  6  64.677  3194.123  91  35.100  Total  3582.183  97  Between Groups Within Groups  271.865  6  45.311  2743.363  91  30.147  Total  3015.228  97  F  Sig.  5.326  .000  1.923  .085  1.843  .099  1.503  .186  There is a significant effect of frequency band for listener recognition of place of articulation. This trend is plotted below  56  Appendix C.  Score (in RAU) for Place Identification  100 90 80 70 60 50 40 30 20 10 0 1  2  3  4  5  6  7  Frequency Band  Increase in listener score for identification of place of articulation as a function of band frequency.  57  Appendix C. Analysis of Variance (ANOVA) tables for differences across bands in listener scores in the single band condition  Sum of Squares Place  Manner  13  62.979  1679.598  84  19.995  Total  2498.329  97  6703.508  13  515.654  2975.435  84  35.422  9678.943  97  1991.754  13  153.212  1590.429  84  18.934  3582.183  97  2122.633  13  163.279  892.594  84  10.626  3015.228  97  Between Groups Within Groups Between Groups Within Groups Total  Overall  Mean Square  818.730  Total Voice  df  Between Groups Within Groups  Between Groups Within Groups Total  F  Sig.  3.150  .001  14.558  .000  8.092  .000  15.366  .000  Significant effects are seen for feature recognition as well as token recognition. However, as previously mentioned, listeners’ mean scores fall within the binomial confidence intervals.  58  Appendix D  Single band data for all subjects showing band presented as well as proportion correct scores for each feature(Voice, Manner, Place) and Overall proportion. Overall proportion correct refers to the proportion of features a listener is able to correctly identify. Subject  Band  Voice  Manner  Place  Overall  1  1  0.869  0.774  0.393  0.679  1  2  0.893  0.726  0.393  0.671  1  3  0.821  0.571  0.500  0.631  1  4  0.845  0.607  0.476  0.643  1  5  0.774  0.619  0.500  0.631  1  6  0.857  0.714  0.524  0.698  1  7  0.857  0.810  0.548  0.738  2  1  0.905  0.833  0.488  0.742  2  2  0.917  0.798  0.512  0.742  2  3  0.905  0.857  0.560  0.774  2  4  0.905  0.750  0.583  0.746  2  5  0.869  0.857  0.524  0.750  2  6  0.905  0.821  0.536  0.754  2  7  0.857  0.750  0.500  0.702  3  1  0.905  0.738  0.417  0.687 59  Appendix D.  
Subject  Band  Voice  Manner  Place  Overall  3  2  0.929  0.738  0.440  0.702  3  3  0.881  0.702  0.476  0.687  3  4  0.869  0.667  0.583  0.706  3  5  0.893  0.798  0.524  0.738  3  6  0.940  0.833  0.595  0.790  3  7  0.869  0.738  0.524  0.710  4  1  0.917  0.810  0.476  0.734  4  2  0.929  0.786  0.452  0.722  4  3  0.905  0.786  0.560  0.750  4  4  0.857  0.679  0.583  0.706  4  5  0.893  0.762  0.583  0.746  4  6  0.952  0.762  0.560  0.758  4  7  0.905  0.774  0.571  0.750  5  1  0.905  0.774  0.452  0.710  5  2  0.917  0.774  0.512  0.734  5  3  0.869  0.726  0.560  0.718  5  4  0.869  0.655  0.524  0.683  5  5  0.905  0.762  0.560  0.742  5  6  0.929  0.821  0.536  0.762  5  7  0.869  0.798  0.548  0.738  6  1  0.881  0.655  0.512  0.683  6  2  0.881  0.607  0.476  0.655  6  3  0.893  0.560  0.440  0.631 60  Appendix D.  Subject  Band  Voice  Manner  Place  Overall  6  4  0.810  0.619  0.452  0.627  6  5  0.881  0.738  0.571  0.730  6  6  0.869  0.690  0.464  0.675  6  7  0.881  0.762  0.583  0.742  7  1  0.714  0.440  0.417  0.524  7  2  0.655  0.452  0.393  0.500  7  3  0.631  0.452  0.429  0.504  7  4  0.631  0.429  0.429  0.496  7  5  0.667  0.560  0.452  0.560  7  6  0.643  0.500  0.381  0.508  7  7  0.643  0.571  0.452  0.556  8  1  0.905  0.690  0.440  0.679  8  2  0.869  0.595  0.464  0.643  8  3  0.893  0.619  0.476  0.663  8  4  0.917  0.714  0.524  0.718  8  5  0.857  0.655  0.512  0.675  8  6  0.869  0.702  0.488  0.687  8  7  0.845  0.655  0.536  0.679  9  1  0.833  0.667  0.440  0.647  9  2  0.833  0.583  0.429  0.615  9  3  0.738  0.512  0.429  0.560  9  4  0.810  0.583  0.512  0.635  9  5  0.810  0.619  0.464  0.631 61  Appendix D.  Subject  Band  Voice  Manner  Place  Overall  9  6  0.821  0.655  0.488  0.655  9  7  0.845  0.726  0.524  0.698  10  1  0.893  0.560  0.440  0.631  10  2  0.821  0.560  0.464  0.615  10  3  0.869  0.548  0.488  0.635  10  4  0.845  0.476  0.429  0.583  10  5  0.917  0.560  0.452  0.643  10  6  0.857  0.607  0.488  0.651  10  7  0.845  0.679  0.464  0.663  11  1  0.893  0.762  0.452  0.702  11  2  0.929  0.798  0.429  0.718  11  3  0.917  0.762  0.571  0.750  11  4  0.869  0.726  0.500  0.698  11  5  0.905  0.798  0.560  0.754  11  6  0.929  0.833  0.548  0.770  11  7  0.857  0.702  0.571  0.710  12  1  0.833  0.762  0.440  0.679  12  2  0.905  0.607  0.512  0.675  12  3  0.821  0.702  0.393  0.639  12  4  0.857  0.738  0.417  0.671  12  5  0.821  0.750  0.452  0.675  12  6  0.881  0.786  0.488  0.718  12  7  0.881  0.738  0.488  0.702 62  Appendix D.  Subject  Band  Voice  Manner  Place  Overall  13  1  0.905  0.750  0.429  0.694  13  2  0.940  0.774  0.571  0.762  13  3  0.869  0.786  0.583  0.746  13  4  0.857  0.702  0.524  0.694  13  5  0.845  0.762  0.512  0.706  13  6  0.881  0.786  0.548  0.738  13  7  0.869  0.786  0.548  0.734  14  1  0.833  0.452  0.452  0.579  14  2  0.726  0.440  0.381  0.516  14  3  0.798  0.476  0.393  0.556  14  4  0.714  0.429  0.405  0.516  14  5  0.810  0.571  0.524  0.635  14  6  0.774  0.536  0.429  0.579  14  7  0.786  0.619  0.536  0.647  15  1  0.940  0.750  0.429  0.706  15  2  0.905  0.571  0.417  0.631  15  3  0.821  0.619  0.452  0.631  15  4  0.881  0.619  0.464  0.655  15  5  0.833  0.643  0.488  0.655  15  6  0.857  0.643  0.476  0.659  15  7  0.869  0.726  0.488  0.694  63  Appendix E  Dual band data for all subjects showing band combinations as well as proportion correct scores for each feature (Voice, Manner, Place) and Overall proportion. 
Overall proportion correct refers to the proportion of features a listener is able to correctly identify. Subject  Band 1  Band 2  Voice  Manner  Place  Overall  1  1  2  0.857  0.774  0.512  0.714  1  1  3  0.905  0.702  0.488  0.698  1  1  4  0.869  0.786  0.595  0.750  1  1  5  0.845  0.679  0.536  0.687  1  1  6  0.929  0.798  0.643  0.790  1  1  7  0.881  0.714  0.643  0.746  1  2  3  0.905  0.702  0.488  0.698  1  2  4  0.845  0.631  0.500  0.659  1  2  5  0.833  0.643  0.524  0.667  1  2  6  0.881  0.702  0.631  0.738  1  2  7  0.857  0.762  0.536  0.718  1  3  4  0.774  0.571  0.464  0.603  1  3  5  0.833  0.595  0.583  0.671  1  3  6  0.786  0.488  0.476  0.583  1  3  7  0.881  0.631  0.536  0.683 64  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  1  4  5  0.786  0.571  0.429  0.595  1  4  6  0.810  0.607  0.524  0.647  1  4  7  0.786  0.655  0.512  0.651  1  5  6  0.833  0.774  0.548  0.718  1  5  7  0.845  0.726  0.560  0.710  1  6  7  0.821  0.679  0.464  0.655  2  1  2  0.917  0.774  0.536  0.742  2  1  3  0.881  0.786  0.571  0.746  2  1  4  0.881  0.750  0.583  0.738  2  1  5  0.917  0.810  0.571  0.766  2  1  6  0.881  0.798  0.560  0.746  2  1  7  0.905  0.833  0.536  0.758  2  2  3  0.869  0.810  0.512  0.730  2  2  4  0.881  0.798  0.595  0.758  2  2  5  0.881  0.786  0.583  0.750  2  2  6  0.905  0.774  0.536  0.738  2  2  7  0.940  0.786  0.524  0.750  2  3  4  0.893  0.774  0.560  0.742  2  3  5  0.869  0.821  0.571  0.754  2  3  6  0.833  0.798  0.560  0.730  2  3  7  0.845  0.810  0.488  0.714  2  4  5  0.893  0.810  0.440  0.714  2  4  6  0.869  0.833  0.548  0.750  65  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  2  4  7  0.857  0.774  0.488  0.706  2  5  6  0.869  0.798  0.524  0.730  2  5  7  0.845  0.774  0.464  0.694  2  6  7  0.857  0.726  0.488  0.690  3  1  2  0.905  0.738  0.464  0.702  3  1  3  0.917  0.798  0.512  0.742  3  1  4  0.905  0.774  0.619  0.766  3  1  5  0.893  0.774  0.571  0.746  3  1  6  0.929  0.821  0.726  0.825  3  1  7  0.869  0.821  0.560  0.750  3  2  3  0.869  0.726  0.500  0.698  3  2  4  0.869  0.690  0.536  0.698  3  2  5  0.917  0.714  0.476  0.702  3  2  6  0.929  0.774  0.548  0.750  3  2  7  0.881  0.845  0.512  0.746  3  3  4  0.833  0.702  0.500  0.679  3  3  5  0.893  0.702  0.548  0.714  3  3  6  0.857  0.738  0.512  0.702  3  3  7  0.917  0.786  0.548  0.750  3  4  5  0.869  0.762  0.536  0.722  3  4  6  0.881  0.714  0.536  0.710  3  4  7  0.929  0.845  0.524  0.766  3  5  6  0.881  0.798  0.548  0.742  66  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  3  5  7  0.917  0.774  0.476  0.722  3  6  7  0.893  0.762  0.524  0.726  4  1  2  0.917  0.714  0.536  0.722  4  1  3  0.857  0.810  0.536  0.734  4  1  4  0.929  0.821  0.607  0.786  4  1  5  0.893  0.810  0.595  0.766  4  1  6  0.952  0.786  0.631  0.790  4  1  7  0.869  0.738  0.607  0.738  4  2  3  0.881  0.762  0.548  0.730  4  2  4  0.869  0.774  0.583  0.742  4  2  5  0.929  0.810  0.643  0.794  4  2  6  0.929  0.833  0.667  0.810  4  2  7  0.917  0.786  0.560  0.754  4  3  4  0.929  0.821  0.548  0.766  4  3  5  0.905  0.762  0.536  0.734  4  3  6  0.893  0.786  0.667  0.782  4  3  7  0.964  0.786  0.655  0.802  4  4  5  0.869  0.690  0.548  0.702  4  4  6  0.881  0.857  0.619  0.786  4  4  7  0.929  0.774  0.571  0.758  4  5  6  0.964  0.821  0.631  0.806  4  5  7  0.929  0.798  0.655  0.794  4  6  7  0.905  0.798  0.560  0.754  67  Appendix E.  
Subject  Band 1  Band 2  Voice  Manner  Place  Overall  5  1  2  0.917  0.798  0.488  0.734  5  1  3  0.881  0.774  0.595  0.750  5  1  4  0.857  0.786  0.619  0.754  5  1  5  0.893  0.786  0.643  0.774  5  1  6  0.905  0.845  0.607  0.786  5  1  7  0.917  0.786  0.619  0.774  5  2  3  0.940  0.857  0.524  0.774  5  2  4  0.857  0.750  0.607  0.738  5  2  5  0.833  0.738  0.571  0.714  5  2  6  0.893  0.798  0.726  0.806  5  2  7  0.940  0.833  0.583  0.786  5  3  4  0.845  0.738  0.583  0.722  5  3  5  0.810  0.702  0.595  0.702  5  3  6  0.881  0.738  0.607  0.742  5  3  7  0.905  0.881  0.655  0.813  5  4  5  0.845  0.750  0.595  0.730  5  4  6  0.857  0.679  0.500  0.679  5  4  7  0.881  0.810  0.560  0.750  5  5  6  0.917  0.821  0.607  0.782  5  5  7  0.905  0.762  0.583  0.750  5  6  7  0.845  0.774  0.571  0.730  6  1  2  0.893  0.774  0.500  0.722  6  1  3  0.881  0.690  0.583  0.718  68  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  6  1  4  0.857  0.786  0.524  0.722  6  1  5  0.845  0.667  0.548  0.687  6  1  6  0.929  0.679  0.571  0.726  6  1  7  0.905  0.786  0.643  0.778  6  2  3  0.833  0.690  0.524  0.683  6  2  4  0.869  0.714  0.524  0.702  6  2  5  0.810  0.571  0.488  0.623  6  2  6  0.845  0.619  0.536  0.667  6  2  7  0.881  0.738  0.607  0.742  6  3  4  0.810  0.726  0.595  0.710  6  3  5  0.881  0.619  0.452  0.651  6  3  6  0.869  0.560  0.500  0.643  6  3  7  0.881  0.738  0.476  0.698  6  4  5  0.810  0.655  0.536  0.667  6  4  6  0.845  0.571  0.500  0.639  6  4  7  0.893  0.738  0.524  0.718  6  5  6  0.845  0.655  0.464  0.655  6  5  7  0.869  0.786  0.488  0.714  6  6  7  0.857  0.726  0.571  0.718  7  1  2  0.738  0.500  0.345  0.528  7  1  3  0.643  0.393  0.393  0.476  7  1  4  0.690  0.464  0.429  0.528  7  1  5  0.643  0.452  0.393  0.496  69  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  7  1  6  0.655  0.440  0.452  0.516  7  1  7  0.714  0.464  0.464  0.548  7  2  3  0.690  0.357  0.393  0.480  7  2  4  0.679  0.393  0.440  0.504  7  2  5  0.512  0.333  0.393  0.413  7  2  6  0.631  0.417  0.345  0.464  7  2  7  0.726  0.583  0.345  0.552  7  3  4  0.667  0.476  0.381  0.508  7  3  5  0.655  0.452  0.381  0.496  7  3  6  0.726  0.393  0.357  0.492  7  3  7  0.702  0.500  0.417  0.540  7  4  5  0.619  0.452  0.417  0.496  7  4  6  0.655  0.369  0.417  0.480  7  4  7  0.631  0.536  0.452  0.540  7  5  6  0.714  0.488  0.393  0.532  7  5  7  0.726  0.548  0.381  0.552  7  6  7  0.667  0.512  0.440  0.540  8  1  2  0.917  0.619  0.476  0.671  8  1  3  0.893  0.619  0.500  0.671  8  1  4  0.893  0.500  0.440  0.611  8  1  5  0.905  0.619  0.512  0.679  8  1  6  0.929  0.631  0.500  0.687  8  1  7  0.881  0.655  0.500  0.679  70  Appendix E.  
Subject  Band 1  Band 2  Voice  Manner  Place  Overall  8  2  3  0.905  0.571  0.512  0.663  8  2  4  0.893  0.607  0.476  0.659  8  2  5  0.893  0.619  0.488  0.667  8  2  6  0.881  0.571  0.512  0.655  8  2  7  0.905  0.702  0.500  0.702  8  3  4  0.845  0.583  0.512  0.647  8  3  5  0.821  0.488  0.488  0.599  8  3  6  0.881  0.571  0.476  0.643  8  3  7  0.893  0.655  0.500  0.683  8  4  5  0.929  0.595  0.488  0.671  8  4  6  0.845  0.607  0.464  0.639  8  4  7  0.905  0.714  0.500  0.706  8  5  6  0.869  0.679  0.512  0.687  8  5  7  0.905  0.702  0.500  0.702  8  6  7  0.881  0.667  0.476  0.675  9  1  2  0.929  0.750  0.548  0.742  9  1  3  0.905  0.750  0.583  0.746  9  1  4  0.881  0.643  0.595  0.706  9  1  5  0.917  0.762  0.583  0.754  9  1  6  0.905  0.762  0.631  0.766  9  1  7  0.929  0.786  0.560  0.758  9  2  3  0.869  0.714  0.631  0.738  9  2  4  0.857  0.762  0.643  0.754  71  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  9  2  5  0.857  0.643  0.595  0.698  9  2  6  0.881  0.714  0.536  0.710  9  2  7  0.905  0.738  0.548  0.730  9  3  4  0.833  0.679  0.631  0.714  9  3  5  0.833  0.702  0.607  0.714  9  3  6  0.869  0.714  0.655  0.746  9  3  7  0.881  0.702  0.607  0.730  9  4  5  0.845  0.714  0.607  0.722  9  4  6  0.810  0.762  0.607  0.726  9  4  7  0.833  0.667  0.607  0.702  9  5  6  0.821  0.667  0.560  0.683  9  5  7  0.929  0.786  0.619  0.778  9  6  7  0.929  0.833  0.655  0.806  10  1  2  0.881  0.512  0.429  0.607  10  1  3  0.881  0.607  0.452  0.647  10  1  4  0.881  0.607  0.464  0.651  10  1  5  0.833  0.583  0.429  0.615  10  1  6  0.869  0.548  0.452  0.623  10  1  7  0.905  0.702  0.452  0.687  10  2  3  0.881  0.655  0.476  0.671  10  2  4  0.845  0.583  0.500  0.643  10  2  5  0.798  0.512  0.464  0.591  10  2  6  0.869  0.571  0.429  0.623  72  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  10  2  7  0.833  0.631  0.417  0.627  10  3  4  0.881  0.560  0.476  0.639  10  3  5  0.833  0.536  0.476  0.615  10  3  6  0.893  0.548  0.452  0.631  10  3  7  0.881  0.607  0.464  0.651  10  4  5  0.857  0.583  0.500  0.647  10  4  6  0.905  0.631  0.500  0.679  10  4  7  0.905  0.607  0.464  0.659  10  5  6  0.869  0.607  0.440  0.639  10  5  7  0.857  0.643  0.429  0.643  10  6  7  0.845  0.726  0.476  0.683  11  1  2  0.940  0.869  0.536  0.782  11  1  3  0.917  0.845  0.655  0.806  11  1  4  0.893  0.869  0.655  0.806  11  1  5  0.940  0.845  0.583  0.790  11  1  6  0.929  0.833  0.702  0.821  11  1  7  0.893  0.774  0.679  0.782  11  2  3  0.857  0.810  0.571  0.746  11  2  4  0.940  0.881  0.667  0.829  11  2  5  0.893  0.833  0.619  0.782  11  2  6  0.929  0.869  0.595  0.798  11  2  7  0.940  0.833  0.643  0.806  11  3  4  0.857  0.857  0.679  0.798  73  Appendix E.  
Subject  Band 1  Band 2  Voice  Manner  Place  Overall  11  3  5  0.869  0.774  0.690  0.778  11  3  6  0.845  0.810  0.679  0.778  11  3  7  0.917  0.917  0.667  0.833  11  4  5  0.881  0.821  0.619  0.774  11  4  6  0.893  0.869  0.524  0.762  11  4  7  0.905  0.905  0.607  0.806  11  5  6  0.917  0.821  0.583  0.774  11  5  7  0.917  0.833  0.631  0.794  11  6  7  0.893  0.762  0.512  0.722  12  1  2  0.881  0.774  0.476  0.710  12  1  3  0.905  0.738  0.512  0.718  12  1  4  0.869  0.679  0.464  0.671  12  1  5  0.869  0.738  0.500  0.702  12  1  6  0.893  0.762  0.488  0.714  12  1  7  0.869  0.810  0.548  0.742  12  2  3  0.869  0.738  0.476  0.694  12  2  4  0.881  0.774  0.536  0.730  12  2  5  0.905  0.667  0.512  0.694  12  2  6  0.929  0.762  0.512  0.734  12  2  7  0.893  0.810  0.524  0.742  12  3  4  0.881  0.690  0.405  0.659  12  3  5  0.833  0.702  0.500  0.679  12  3  6  0.881  0.738  0.500  0.706  74  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  12  3  7  0.881  0.845  0.548  0.758  12  4  5  0.881  0.726  0.476  0.694  12  4  6  0.905  0.774  0.488  0.722  12  4  7  0.893  0.798  0.512  0.734  12  5  6  0.893  0.762  0.488  0.714  12  5  7  0.905  0.798  0.536  0.746  12  6  7  0.881  0.762  0.452  0.698  13  1  2  0.893  0.786  0.548  0.742  13  1  3  0.881  0.762  0.583  0.742  13  1  4  0.869  0.738  0.607  0.738  13  1  5  0.869  0.667  0.607  0.714  13  1  6  0.881  0.762  0.702  0.782  13  1  7  0.929  0.738  0.595  0.754  13  2  3  0.786  0.690  0.512  0.663  13  2  4  0.833  0.690  0.512  0.679  13  2  5  0.833  0.655  0.571  0.687  13  2  6  0.845  0.690  0.548  0.694  13  2  7  0.905  0.798  0.655  0.786  13  3  4  0.810  0.619  0.571  0.667  13  3  5  0.798  0.619  0.607  0.675  13  3  6  0.845  0.571  0.619  0.679  13  3  7  0.893  0.690  0.583  0.722  13  4  5  0.833  0.631  0.571  0.679  75  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  13  4  6  0.857  0.643  0.583  0.694  13  4  7  0.810  0.607  0.524  0.647  13  5  6  0.857  0.702  0.607  0.722  13  5  7  0.869  0.714  0.607  0.730  13  6  7  0.798  0.726  0.548  0.690  14  1  2  0.821  0.536  0.476  0.611  14  1  3  0.810  0.560  0.512  0.627  14  1  4  0.774  0.476  0.476  0.575  14  1  5  0.833  0.476  0.476  0.595  14  1  6  0.798  0.583  0.607  0.663  14  1  7  0.857  0.702  0.560  0.706  14  2  3  0.810  0.512  0.500  0.607  14  2  4  0.810  0.536  0.488  0.611  14  2  5  0.786  0.488  0.512  0.595  14  2  6  0.738  0.500  0.536  0.591  14  2  7  0.798  0.571  0.548  0.639  14  3  4  0.833  0.536  0.548  0.639  14  3  5  0.690  0.488  0.512  0.563  14  3  6  0.774  0.548  0.524  0.615  14  3  7  0.798  0.524  0.548  0.623  14  4  5  0.798  0.571  0.488  0.619  14  4  6  0.833  0.536  0.500  0.623  14  4  7  0.798  0.595  0.464  0.619  76  Appendix E.  
Subject  Band 1  Band 2  Voice  Manner  Place  Overall  14  5  6  0.869  0.690  0.583  0.714  14  5  7  0.750  0.595  0.524  0.623  14  6  7  0.798  0.702  0.571  0.690  15  1  2  0.940  0.810  0.571  0.774  15  1  3  0.893  0.655  0.464  0.671  15  1  4  0.917  0.714  0.560  0.730  15  1  5  0.952  0.750  0.619  0.774  15  1  6  0.905  0.774  0.548  0.742  15  1  7  0.917  0.786  0.571  0.758  15  2  3  0.881  0.679  0.560  0.706  15  2  4  0.881  0.667  0.500  0.683  15  2  5  0.881  0.679  0.583  0.714  15  2  6  0.881  0.667  0.524  0.690  15  2  7  0.917  0.798  0.631  0.782  15  3  4  0.881  0.667  0.512  0.687  15  3  5  0.881  0.595  0.524  0.667  15  3  6  0.881  0.619  0.560  0.687  15  3  7  0.893  0.690  0.571  0.718  15  4  5  0.881  0.702  0.548  0.710  15  4  6  0.857  0.690  0.595  0.714  15  4  7  0.893  0.738  0.560  0.730  15  5  6  0.881  0.702  0.488  0.690  15  5  7  0.845  0.798  0.548  0.730  77  Appendix E.  Subject  Band 1  Band 2  Voice  Manner  Place  Overall  15  6  7  0.869  0.714  0.488  0.690  78  Appendix F  Ethics Approval  79  Appendix F.  The University of British Columbia Office of Research Services Behavioural Research Ethics Board Suite 102, 6190 Agronomy Road, Vancouver, B.C. V6T 1Z3  CERTIFICATE OF APPROVAL - MINIMAL RISK PRINCIPAL INVESTIGATOR:  INSTITUTION / DEPARTMENT:  Lorienne Jenstad  UBC/Medicine, Faculty of/Audiology & H08-01832 Speech Sciences  UBC BREB NUMBER:  INSTITUTION(S) WHERE RESEARCH WILL BE CARRIED OUT: Institution  Site  UBC  Vancouver (excludes UBC Hospital)  Other locations where the research will be conducted:  N/A  CO-INVESTIGATOR(S): Adrian M. Lister  SPONSORING AGENCIES: Natural Sciences and Engineering Research Council of Canada (NSERC) PROJECT TITLE: Quantifying Hearing-Impaired Listeners' Use of Acoustic Information in Speech CERTIFICATE EXPIRY DATE: September 11, 2009 DOCUMENTS INCLUDED IN THIS APPROVAL:  DATE APPROVED: September 11, 2008  Document Name  Consent Forms: Consent Advertisements: Recruitment poster  Version  Date  1  August 20, 2008  1  August 20, 2008  The application for ethical review and the document(s) listed above have been reviewed and the procedures were found to be acceptable on ethical grounds for research involving human subjects.  Approval is issued on behalf of the Behavioural Research Ethics Board and signed electronically by one of the following: Dr. M. Judith Lynam, Chair Dr. Ken Craig, Chair Dr. Jim Rupert, Associate Chair Dr. Laurie Ford, Associate Chair Dr. Daniel Salhani, Associate Chair Dr. Anita Ho, Associate Chair  80  Appendix G  Consent form for all participants  81  Appendix G.  Quantifying Hearing-Impaired Listeners' Use of Acoustic Information in Speech Principal Investigator: Lorienne Jenstad, PhD, Aud(C), Assistant Professor 604-827-3338 (lab); ljenstad@audiospeech.ubc.ca (email) Co-investigator: Adrian Lister, Master of Science Student This project is for a Master of Science thesis. We are asking you to be in a research study because you have indicated that you are interested in participating. The purpose of this consent form is to give you the information you will need to help you decide whether or not to be in the study. Please read the form carefully. You may ask questions about the purpose of the research, what we would ask you to do, the possible risks and benefits, your rights as a volunteer and anything else about the research or this form that is not clear. When all your questions have been answered, you can decide if you want to be in the study or not. 
This process is called “informed consent”. PURPOSE AND BENEFITS This is a research study about what information listeners use to understand speech. You will not benefit directly from taking part in this study. The information obtained in this study will help us understand how speech is understood, and how hearing loss changes that process. The information will ultimately be used to design better hearing aids. PROCEDURES If you choose to be in this study, we would like you to come in for 1 study session. This session will last about 2 hours. We will ask you a general question about your health. We will also ask questions that test your memory and attention. For example, we ask you to repeat number sequences. You do not have to answer every question. If you do not qualify for this study, we will destroy this information. Then we will ask you to sit comfortably in a small room and listen to recorded speech, some of which will sound distorted. We want to know how much of the speech you understand, so we will ask you to repeat it back to us or point at your choice on a computer screen. We encourage you to take breaks, and you can take a break at any time. Most people take 1 two-hour session to complete the study procedures. Because everyone works at a different pace, it is hard to predict exactly how long it will take for you to complete the study.  Page 1 of 2  Version 1 Date: August 20, 2008  82  Appendix G.  RISKS, STRESS, OR DISCOMFORT There are no known physical risks for these study procedures. It is possible that you may find the ear phones uncomfortable. The researchers can adjust them for you. We have addressed concerns about your privacy in the following section of this consent form. OTHER INFORMATION Being in this study is voluntary, and you may decline to enter, or withdraw from the study at any time without any consequences to treatment, medical care, or class standing. Information about you is confidential. We will code study records. The link between the code and your name will be kept at a secured location, separate from the study information. Only lab employees, all of whom have been trained in privacy and confidentiality, will have access to the link. We will keep the link between the study records and your name for five years from the data of participation in the study, and then we will destroy the link. If we publish the results of this study, we will not use your name. If you begin the study, you will be compensated by the amount of $15. This research has been sponsored by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the University of British Columbia (UBC) to the Principal Investigator. We may want to recontact you about taking part in future related studies. Please indicate below whether or not you give your permission to re-contact you. Giving the research team permission to re-contact you does not obligate you in any way. Subject’s statement The study described above has been explained to me, and I voluntarily consent to participate, as indicated by my signature below. I have had an opportunity to ask questions. I understand that future questions I may have about the research will be answered by the investigator listed above. If I have questions about my treatment or rights as a subject, I may call the Research Subject Information Line in the UBC Office of Research Services at the University of British Columbia, at 604-822-8598. I acknowledge that I received a copy of this consent form. 
Please check one box below: “I give my permission for the researchers to re-contact me for future related research.” ___________ Yes  Signature of subject  Page 2 of 2  ________________ No  Printed name  Date  Version 1 Date: August 20, 2008  83  

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0067084/manifest

Comment

Related Items