OPTIMIZING ACOUSTICAL CONDITIONS FOR SPEECH INTELLIGIBILITY IN CLASSROOMS

by

WONYOUNG YANG
B.Sc., Hanyang University, Korea, 1998
M.Sc., The Pennsylvania State University, USA, 2002

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Occupational and Environmental Hygiene)

THE UNIVERSITY OF BRITISH COLUMBIA
October 2006
© Wonyoung Yang, 2006

ABSTRACT

High speech intelligibility is imperative in classrooms, where verbal communication is critical. However, the optimal acoustical conditions for achieving a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal-hearing and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings.

The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, which compared auralization results with those of speech-intelligibility tests in real classrooms, found that speech-intelligibility tests using auralization are valid provided the room to be auralized is not very absorptive or noisy. The speech-intelligibility tests were done in two different auralized sound fields, approximately diffuse and non-diffuse, using the Modified Rhyme Test with both normal-hearing and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work; it and a 1/8-scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors.

For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with SNS = 4 dB, and increased to 0.8 and 1.2 s with decreased SNS = 0 dB, for both normal-hearing and hearing-impaired listeners. Hearing-impaired listeners required more early energy than normal-hearing listeners. Reflective ceiling barriers and ceiling reflectors, in particular parallel front-back rows of semi-circular reflectors, achieved the goal of decreasing reverberation with the least reduction in speech level.
TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
LIST OF ABBREVIATIONS
ACKNOWLEDGMENTS
DEDICATION
1 INTRODUCTION
  1.1 Background
  1.2 Literature Review
    1.2.1 Speech-intelligibility measures
    1.2.2 Effects of noise and reverberation on speech intelligibility
    1.2.3 Speech intelligibility in virtual environments
    1.2.4 Room-acoustical modeling methods
    1.2.5 Summary
  1.3 Objectives of the Thesis
  1.4 Approach
  1.5 Thesis Outline
  References
2 VALIDATION OF THE AURALIZATION TECHNIQUE: SPEECH INTELLIGIBILITY TESTS IN VIRTUAL AND REAL CLASSROOMS
  2.1 Introduction
  2.2 Methods
    2.2.1 Subjects and test materials
    2.2.2 Classrooms and acoustical measurements
    2.2.3 Auralization and listening tests
  2.3 Results and Analysis
    2.3.1 Room acoustical parameters
    2.3.2 Speech intelligibility tests
  2.4 Discussion
  2.5 Summary and Conclusions
  References
3 OPTIMUM REVERBERATION TIMES FOR SPEECH INTELLIGIBILITY FOR NORMAL AND HEARING-IMPAIRED LISTENERS IN IDEALIZED CLASSROOMS WITH DIFFUSE SOUND FIELDS
  3.1 Introduction
  3.2 Theoretical Considerations
  3.3 Experimental Methodology
    3.3.1 Classroom and sound-field simulation procedures
    3.3.2 Subjects
  3.4 Results
    3.4.1 Speech intelligibility
    3.4.2 Best predicting early-time limit
  3.5 Discussion
  3.6 Conclusions
  References
4 OPTIMUM REVERBERATION FOR SPEECH INTELLIGIBILITY FOR NORMAL AND HEARING-IMPAIRED LISTENERS IN REALISTIC VIRTUAL CLASSROOMS USING AURALIZATION
  4.1 Introduction
  4.2 Methods
    4.2.1 Subjects
    4.2.2 Classroom configurations
    4.2.3 Sound-field simulation and speech-intelligibility test procedure
    4.2.4 Data analysis
  4.3 Results
    4.3.1 Speech intelligibility
    4.3.2 Useful-to-detrimental ratio and Speech Transmission Index
  4.4 Discussion
  4.5 Conclusions
  References
5 CEILING BARRIERS AND REFLECTORS TO OPTIMIZE LECTURE-ROOM SOUND FOR SPEECH INTELLIGIBILITY
  5.1 Introduction
  5.2 Methods
    5.2.1 Lecture-room configurations
    5.2.2 Ceiling barrier and reflector configurations
    5.2.3 Physical scale-modeling
    5.2.4 Computer simulation
  5.3 Results
    5.3.1 Comparison of measurement and prediction
    5.3.2 Ceiling barriers
    5.3.3 Ceiling reflectors
  5.4 Discussion
  5.5 Conclusions
  References
6 CONCLUSION
  6.1 Contributions
  6.2 Limitations
  6.3 Future Work
  References
APPENDIX A - Ethical Certificate
APPENDIX B - Consent Forms
APPENDIX C - 300 MRT Words in a Response Sheet

LIST OF TABLES

Table 2.1. Individual and average absorption coefficients (a, ā) and diffusion coefficients (d, d̄)
Table 2.2. Free-field sound levels at 1 m from the source, and power levels
Table 2.3. Difference between prediction and measurement parameters
Table 2.4. Mean speech-intelligibility scores and standard deviations
Table 2.5. Difference between predicted and measured MRT results
Table 2.6. Equations and coefficients of determination (R2) associated with linear regression of C50, U50, and speech-to-noise level difference with mean speech-intelligibility score
Table 3.1. Received speech-to-noise level differences (SNR in dB) for all test sound-field configurations
Table 3.2. Coefficients of determination (R2) associated with third-order-polynomial regression fits for each Ut value, for both normal-hearing and hearing-impaired subjects. The highest values are in bold
Table 4.1. Absorption (a) and diffusion (d) coefficients of the surface materials
Table 4.2. Octave-band average surface-absorption coefficients (ā) and diffusion coefficients (d̄) of the six classroom configurations
Table 4.3. Coefficients of determination (R2) associated with third-order-polynomial regression fits for various Ut values, for both normal-hearing (NH) and hearing-impaired (HoH) subjects

LIST OF FIGURES

Figure 1.1. Flow chart of the fully-computed auralization procedure
Figure 2.1. 3D model of Room A showing source and receiver positions
Figure 2.2. Directivity patterns of the speech sources
Figure 2.3. Measured and predicted reverberation metrics
Figure 2.4. Measured and predicted speech and babble-noise levels and speech-to-noise level differences
Figure 2.5. Measured and predicted speech-intelligibility test scores and 95% confidence intervals
Figure 2.6. Variation of speech intelligibility with acoustical parameters
Figure 2.7. Room impulse responses at r2
Figure 3.1. Floor plan and elevation of the virtual classroom, showing the speaker, listener and noise-source positions. All coordinates are in metres
Figure 3.2. Variation of mean speech-intelligibility score, and 95% confidence interval, with RT
Figure 3.3. Variation of mean speech-intelligibility score with useful-to-detrimental ratio using the best-fit early-time limit, for all sound-field configurations, and the best-fit third-order polynomial regression curves
Figure 4.1. Classroom configurations with sound-absorbing areas shown in grey
Figure 4.2. Speech-to-noise ratio (SNR) at the listener's location
Figure 4.3. Variation of mean speech-intelligibility score, and 95% confidence interval, with RT
Figure 4.4. Variation of useful-to-detrimental ratios with reverberation time for different early-time limits (from 20 ms to 100 ms)
Figure 4.5. Variation of mean speech-intelligibility score with useful-to-detrimental ratio using the best-fit early-time limit, for all sound-field configurations, and the best-fit quadratic regression curves
Figure 4.6. Variation of mean speech-intelligibility score with Speech Transmission Index using the best-fit early-time limit, for all sound-field configurations, and the best-fit quadratic regression curves (equations in text)
Figure 4.7. The initial 0.2 s of the impulse response
Figure 5.1. Lecture-room floor plan showing the speech-source and receiver positions
Figure 5.2. Photographs of the 1/8-scale model without ceiling barriers or reflectors
Figure 5.3. Measured octave-band, horizontal-plane directivity factors of the 1:8-scale-model speech source; levels are normalized to 0 dB at 0°
Figure 5.4. Scale-model ceiling barrier and reflector configurations
Figure 5.5. Computer models of the lecture room
Figure 5.6. Variation with frequency of speech levels and early-decay times at three central positions in a lecture room without ceiling barriers or reflectors, as measured in a scale model and as predicted by CATT-Acoustic
Figure 5.7. Variation with position of speech levels and early-decay times in a lecture room without and with reflective and absorptive ceiling barriers, as measured in a scale model and as predicted by CATT-Acoustic, along the centre line with the centre speaker
Figure 5.8. Variation with position of speech levels and early-decay times with the centre speaker, without and with reflective and absorptive ceiling barriers, as measured in a scale model
Figure 5.9. Variation with position of speech levels and early-decay times with the right speaker, without and with reflective ceiling barriers, as measured in a scale model
Figure 5.10. Variation with position of speech levels and early-decay times in the scale-model lecture room at central positions, without and with ceiling barriers and reflectors
Figure 5.11. Variation with position of speech levels and early-decay times with the centre speaker, without and with reflective ceiling reflectors, as measured in a scale model
Figure 5.12. Variation with position of speech levels and early-decay times with the right speaker, without and with ceiling reflectors, as measured in a scale model
Figure 5.13. Variation of percentage decrease of early-decay time with sound-level decrease at three central positions with the centre speaker in a lecture room with reflective and absorptive ceiling barriers, as measured in a scale model and as predicted, with linear trendlines
Figure 5.14. Variation of percentage decrease of early-decay time with sound-level decrease at three side positions with the centre speaker in a lecture room with reflective and absorptive ceiling barriers, as measured in a scale model and as predicted, with linear trendlines

LIST OF SYMBOLS

a     Sound absorption coefficient
ā     Room-average sound absorption coefficient
A     Absorptive ceiling barrier
C     Early-to-late sound energy ratio (clarity factor)
CC    Semicircular ceiling reflector with curved side down
CF    Semicircular ceiling reflector with flat side down
d     Sound diffusion coefficient
d̄     Room-average sound diffusion coefficient
Ed    Direct energy of sound signal
Ee    Early-arriving reflected energy of sound signal
El    Late-arriving energy of sound signal
En    Noise energy
HoH   Hearing impaired
L     Listener
Leq   Equivalent sound pressure level
Lnf   Long-term anechoic level at 1 m directly in front of the noise source in a free field
Lsf   Long-term anechoic level at 1 m directly in front of the speech source in a free field
Lw    Sound power level
N     Noise source
Q     Sound directivity
R     Reflective ceiling barrier
r1C   Receiver at front on the centre line
r1L   Receiver at front on the left-side line
r1R   Receiver at front on the right-side line
R2    Coefficient of determination
r2C   Receiver at middle on the centre line
r2L   Receiver at middle on the left-side line
r2R   Receiver at middle on the right-side line
r3C   Receiver at back on the centre line
r3L   Receiver at back on the left-side line
r3R   Receiver at back on the right-side line
rn    Distance from the noise source to the listener
rs    Distance from the speech source to the listener
rh    Reverberation radius (critical distance)
S     Speech source
SNR   Speech-to-noise level difference at the receiver
SNS   Relative output power levels of the speech and noise sources
U     Useful-to-detrimental sound energy ratio

LIST OF ABBREVIATIONS

%ALcons  Articulation Loss for Consonants
AI       Articulation Index
ANOVA    Analysis of Variance
BRIR     Binaural Room Impulse Response
CWH      Central West Health
EDT      Early Decay Time
FRT      Fairbanks Rhyme Test
FS       Full-Scale value
HL       Hearing Loss
HRTF     Head-Related Transfer Function
JND      Just Noticeable Differences
KEMAR    Knowles Electronics Mannequin for Acoustics Research
MLS      Maximum Length Sequences
MLSSA    Maximum Length Sequence System Analyser
MRT      Modified Rhyme Test
mSTI     modified Speech Transmission Index
MTF      Modulation Transfer Function
NH       Normal Hearing
NLA      A-weighted Noise Level
RASTI    Rapid Speech Transmission Index
RT       Reverberation Time
SAI      Speech Audibility Index
SD       Standard Deviation
SI       Speech Intelligibility
SII      Speech Intelligibility Index
SLA      A-weighted Speech Level
SLAN     A-weighted Speech Level for a 'Normal' voice
SLN      Unweighted Speech Level for a 'Normal' voice
STI      Speech Transmission Index
UBC      University of British Columbia
FEM      Finite Element Method
BEM      Boundary Element Method
FDTD     Finite-Difference Time-Domain method
SEA      Statistical Energy Analysis
RTC      Randomized Tail-corrected Cone-tracing

ACKNOWLEDGMENTS

Among the many people whose support and help I wish to acknowledge, I would particularly like to thank all of the volunteers who participated in the series of speech-intelligibility tests critical to my research. A special thank you goes to those with hearing impairment; this study could not have been done without them.

I am greatly indebted to my Supervisory Committee members, Hugh Davies, Janet Jamieson, Jeff Small and Sydney Fels, and my Examination Committee members, Eric Vatikiotis-Bateson, Susan Kennedy and Michael Vorlaender, for their support and guidance. I am more than grateful to my supervisor and mentor, Murray Hodgson, for his generous support and full confidence in me, for teaching me what a real researcher's role is, and for shepherding me through the doctoral program. His encouragement and patience gave me the power to overcome my limitations and to continue the work.

So many people have helped in numerous ways, by being right there at the right times for me. I particularly wish to thank my lab buddies forever after, Ann Nakashima, Galen Wong, Zohreh Razabi, Katrina Scherebnyj, Gary Chen, Owen Cousins (and 'SSARAH' with black lips), for their cheerful encouragement and academic discussion whenever I felt that this work was never going to end. Very special thanks to Maki Uemae for her data collection and support for this work. I would also like to express my gratitude to all of the staff, faculty and students of the School of Occupational and Environmental Hygiene for making a friendly working environment in this interdisciplinary program. Special thanks to Melissa Friesen, who was always in step with me on this lonely journey, for helping me become familiar with the department.

I sincerely appreciate all my friends in Vancouver for their support and concern, and for acting as a family away from home. My friends in Korea, who kept me in their hearts and thoughts during these years, also gave me energy for living in a whole new world. More than thanks to my family for your never-ending love, patience and support, and for reminding me of the most important things in life. Dear mother and father, you are the root on which I stand.

It has been four years since I came to Vancouver to study. The moment I looked at the aerial photos of Vancouver and UBC, which my supervisor, Murray, showed me at an acoustical conference in Pittsburgh, I realized that I was going to fall in love with this beautiful city.
Last but not least, my deep gratitude goes to this wonderful city, Vancouver, which instilled in me the beauty of nature and the preciousness of life.

October 2006
Wonyoung Yang

DEDICATION

To Moon Jung, in the memory of your sacrifice and love.

1 INTRODUCTION

1.1 Background

People now spend the majority of their time in indoor environments, so appropriate room acoustics are imperative for human health, productivity and comfort. This is particularly important in acoustically sensitive spaces such as classrooms, where verbal communication is critical. The acoustics of such rooms should achieve a high degree of speech intelligibility for listeners: speech signals should be transmitted without distortion and be clearly audible. Since subjective speech-intelligibility tests were first proposed for the evaluation of telephonic intelligibility [1,2,3], speech intelligibility has been considered important [4] in a wide variety of fields. In speech science, speech intelligibility is used for clinical purposes [5]; in audiology, it is important in developing and evaluating hearing aids [6]; in classroom acoustics, it is critical to improving conditions for teaching and learning [7]. In this thesis, speech intelligibility will be studied in relation to room-acoustical parameters.

Speech intelligibility can be defined as the percentage of speech material correctly recognized by a listener; it can be measured by specially designed speech-intelligibility tests or predicted by way of speech-intelligibility metrics. Speech-intelligibility metrics have been developed based on fundamental room-acoustical parameters. The room-acoustical parameters affecting speech intelligibility are known to be the speech-to-noise level difference (often referred to in the literature as the signal-to-noise ratio) and reverberation. In general, speech intelligibility tends to increase with increased speech-to-noise level difference and to decrease with increased reverberation. The speech-to-noise level difference is considered to be the more dominant factor controlling speech intelligibility in rooms [8]. However, in rooms the situation is complicated by the fact that reverberation and steady-state levels interact. Increased reverberation increases both speech and noise levels by increasing the reverberant sound energy; this increases or decreases the speech-to-noise level difference at a listener position, depending on the listener's relative distances to the speech and noise sources [9]. In order to achieve a high degree of speech intelligibility, the room-acoustical parameters - speech-to-noise level difference and reverberation - should be neither maximized nor minimized, but optimized. How to do this is the topic of this thesis.

1.2 Literature Review

This section introduces fundamental concepts related to the speech-intelligibility measures used in this thesis (Section 1.2.1), and reviews current issues related to optimal acoustical conditions for speech intelligibility, pointing out problems with current experimental methods in speech-intelligibility studies (Section 1.2.2). It also reviews the feasibility and fundamentals of auralization for speech-intelligibility testing (Section 1.2.3), and presents the room-acoustical modeling methods chosen for this work (Section 1.2.4).

1.2.1 Speech-intelligibility measures

Subjective speech-intelligibility tests

Subjective speech-intelligibility tests measure the human ability to recognize speech signals.
These tests present test speech materials to appropriate listeners, who record what they hear; the accuracy of their responses is then scored. In general, the choice of test is related to the purpose of the study. There are good overviews focused on the assessment of subjective speech-intelligibility tests [10,11]. In this section, the origin of speech-intelligibility tests and their application in room acoustics are discussed.

Speech-intelligibility tests frequently used in room acoustics are phoneme-level rhyme tests. Black [12] developed the first closed-set discrimination tests. Fairbanks [13] developed the Fairbanks Rhyme Test (FRT), using lists of rhyming monosyllabic words guided by the data of Thorndike and Lorge [14]. Modifications of the Fairbanks Rhyme Test were made by House et al. [15] and became known as the Modified Rhyme Test (MRT). Kreul et al. [16] altered the MRT to make it more clinically useful; adaptations were made in the areas of timing consistency, carrier phrase, test instructions and forms used, with both male and female talkers. A further revision of the MRT was made by Griffiths [17], who added new items to examine phonemic confusions in the responses. The Fairbanks Rhyme Test and the Modified Rhyme Test have been widely used: for example, Latham [18] used the FRT to validate the useful-to-detrimental sound-energy ratio; Bradley [19,20,8] has used the FRT in his studies of speech intelligibility; and Nabelek et al. [21,22,23] have used the MRT in studies of speech perception.

Objective speech-intelligibility metrics

Subjective speech-intelligibility tests are by far the most accurate and reliable methods of intelligibility testing with respect to subjective perception [24]. However, they are complicated to set up, time-consuming to conduct, and require extensive statistical analysis to interpret. Hence, various objective metrics have been developed to predict intelligibility scores. One of the most prominent early efforts in speech-intelligibility prediction was the development of the Articulation Index (AI) by French and Steinberg [25]; it is computed from the intensities of speech and noise received by the ear, both as a function of frequency. The AI was developed as an acoustical metric that could be used to predict the speech-recognition ability of speech-transmission systems. The method was reconsidered by Kryter [26,27], who increased its accessibility by introducing a calculation scheme, work sheets and tables. Later, Peutz [28] published a new method for predicting speech intelligibility - the articulation loss for consonants (%ALcons) - which is computed from measurements of the direct-to-reverberant ratio, the signal-to-noise ratio and the reverberation time. Subsequently, the Speech Transmission Index (STI) was developed using the Modulation Transfer Function (MTF) of a transmission channel [29,30,31,32,33]. The STI combines the two major phenomena that affect speech intelligibility, reverberation and noise, to extract a single index that correlates well with subjective perception. The Rapid Speech Transmission Index (RASTI) was developed as a simplified version of the STI [34]; RASTI is measured in only two octave bands, centered at 500 Hz and 2 kHz. The Speech Intelligibility Index (SII) [35] is a modified version of the AI and includes reverberation, noise and distortion, all of which are accounted for in the modulation transfer function.
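To make the modulation-transfer-function idea behind the STI concrete, a minimal numerical sketch is given below. It assumes an ideal diffuse field with exponential decay, for which the Houtgast-Steeneken formulation reduces the modulation index at modulation frequency F to m(F) = [1 + (2*pi*F*T/13.8)^2]^(-1/2) multiplied by a noise factor 1/(1 + 10^(-SNR/10)). The Python sketch uses a single broadband channel rather than the standard seven octave bands with band-importance weights, and the numbers in the example call are purely illustrative, not values from this thesis.

import numpy as np

def sti_sketch(rt_sec, snr_db, mod_freqs=None):
    """Simplified STI-style index for a diffuse field with reverberation
    time rt_sec (s) and broadband signal-to-noise ratio snr_db (dB).
    Illustrative only: the full STI uses several octave bands, 14
    modulation frequencies per band and band-importance weights."""
    if mod_freqs is None:
        mod_freqs = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                              3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])
    # Modulation reduction due to exponential reverberant decay ...
    m_rev = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * mod_freqs * rt_sec / 13.8) ** 2)
    # ... and due to steady background noise.
    m_noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    m = m_rev * m_noise
    # Apparent (effective) SNR per modulation frequency, clipped to +/-15 dB.
    snr_app = np.clip(10.0 * np.log10(m / (1.0 - m)), -15.0, 15.0)
    # Transmission index in [0, 1], averaged over modulation frequencies.
    return float(np.mean((snr_app + 15.0) / 30.0))

# Example: RT = 0.8 s and SNR = +5 dB at the listener.
print(round(sti_sketch(0.8, 5.0), 2))

Even in this simplified form, the index reproduces the expected trends: it falls as the reverberation time grows or as the signal-to-noise ratio at the listener drops.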
Recently, the Speech Audibility Index (SAI) [36] was defined as the proportion of the useful speech signal (direct speech and early reverberation) that is above the level of the effective noise (actual noise and late reverberation). The SAI is similar to the STI in that it accounts for both noise and reverberation in terms of changes in the amplitude envelope of speech.

Speech intelligibility can also be predicted from impulse responses. Early-arriving sound reflections (reflections arriving within some early-energy time limit of the direct sound) make the direct sound louder, so that the integrated early-arriving reflections increase the intelligibility of speech [37]. Late-arriving reflections are not integrated with the direct sound and degrade speech intelligibility by causing one speech sound to blur into the next. The ratio of early- to late-arriving sound has been proposed as an indicator of the effects of room acoustics (i.e. reverberation) on the clarity and intelligibility of speech [38]. Lochner and Burger [39] developed their Signal-to-Noise Ratio, which provided a measure of useful and detrimental reflected speech energy according to the integration and masking characteristics of hearing, by adding a background-noise component to the early-to-late-arriving sound-ratio concept. The useful-to-detrimental ratio was extended by Latham [18] to account for the effect of fluctuating ambient background noise on speech intelligibility. The useful-to-detrimental ratio is considered to be a suitable metric for predicting speech intelligibility for normal-hearing listeners [19,20]. Typically, the time limit for the early-arriving reflections has been taken to be 50 ms for speech sounds [8,20].

1.2.2 Effects of noise and reverberation on speech intelligibility

Experimental studies

Experimental studies have been used to examine speech intelligibility in different acoustical conditions consisting of combinations of various signal-to-noise ratios and reverberation times. Nabelek and Robinson [23] found that speech intelligibility was inversely proportional to the reverberation time in the absence of noise, but their results did not explain what happens when noise sources are introduced. Nabelek and Pickett [21,22] found that speech intelligibility decreases with increased reverberation time; however, they had only two conditions, with reverberation times of 0.3 s and 0.6 s. Finitzo-Hieber and Tillman [40] extended Nabelek and Pickett's tests to include a reverberation time of zero, and found that reverberation degraded speech discrimination for both normal-hearing and hearing-impaired children. Yacullo and Hawkins [41] also examined the effects of noise and reverberation time on monaural speech recognition using sentence materials, and concluded that speech intelligibility was adversely affected by noise and reverberation. Because of the fragmentary character of each study, it is difficult to draw systematic conclusions from them. However, these early experimental studies consistently found an optimal reverberation time of zero. They used fixed signal-to-noise ratios at the listener's position with different reverberation times, and placed the speech and noise sources at the same distance from the listener.

Experimental studies of speech intelligibility for hearing-impaired people have also been done using subjective speech tests. Listeners with even mild sensorineural hearing loss may have greater difficulty when listening in noisy environments than do normal-hearing listeners [42,43,44,45].
Reverberation has a particularly detrimental effect on speech intelligibility for hearing-impaired listeners [46,47]. Nabelek and Dagenais [48] examined both noise and reverberation; although the mean speech-test scores for the noisy and reverberant conditions were not significantly different, the patterns of errors in the two conditions were, and they concluded that temporal smearing in the reverberant condition should be considered. Duquesnoy and Plomp [49] investigated the applicability of the Speech Transmission Index (STI) to hearing-impaired subjects with added reverberation and noise, and proposed reduced reverberation times to improve communication for elderly people in rooms [50]. Humes et al. [51] found that the Articulation Index (AI) and the Speech Transmission Index (STI) had significant shortcomings for hearing-impaired listeners, and proposed the modified Speech Transmission Index (mSTI) as an alternative speech-intelligibility index for hearing-impaired listeners. In attempts to predict the speech-recognition performance of listeners with sensorineural hearing impairment more accurately, several correction factors have been applied to the AI [52]. Payton, Uchanski and Braida [53] used the AI and the STI; they found that the AI was unable to represent the reduction in intelligibility scores due to reverberation for hearing-impaired listeners. Neither the Articulation Index (AI) nor the Speech Intelligibility Index (SII) can accurately predict the speech intelligibility observed in many hearing-impaired listeners [54,55,56].

In all of these experimental studies, noise was incorporated in an unrealistic manner. In reality, reverberation increases speech levels as well as noise levels [57]: in real rooms, the speech level at the listener's position is higher with reverberation, as is the noise level. In the presence of noise, it is important to account for the spatial relationship between the speaker and the noise source, since the locations of the noise sources determine their sound levels in reverberant rooms. Therefore, using a fixed signal-to-noise ratio at the listener in these experiments prevented the effect of reverberation on speech intelligibility from being represented accurately.

Theoretical studies

Theoretical studies have generally predicted non-zero optimal reverberation times. Bradley [20], using both subjective test results and objective intelligibility measures, suggested optimum reverberation times for classrooms of 0.4 to 0.5 s with a background noise level of 30 dBA; the effects of reverberation on signal-to-noise ratio were taken into account by calculating an overall A-weighted signal-to-noise ratio for each speech level at each listener. Plomp and Mimpen [58] found non-zero optimum reverberation times for a variety of room sizes and signal-to-noise ratios by applying the image method; in their study, the effects of reverberation on noise were incorporated by considering the audience as a collection of individual noise sources. Bistafa and Bradley [59] predicted non-zero optimal reverberation times using a number of metrics. They found that increased reverberation increased early energy and intelligibility; however, too much reverberation decreased speech intelligibility. Increases of speech level with increasing reverberation were considered in their study, but the effects of reverberation on noise levels were ignored. Hodgson and Nosal [9] addressed how to incorporate noise in a realistic manner.
In their work, optimum reverberation times depended on the signal-to-noise level difference at the listener's position, the positions and orientations of the speaker and the noise source, and the number of noise sources. They found that if the speech source was farther from the listener than the noise source, then speech levels increased more with reverberation than did noise levels, and the level difference increased with reverberation, tending to increase intelligibility. If, on the other hand, the noise source was farther from the listener than the speech source, then noise levels increased more with reverberation than did speech levels, tending to decrease intelligibility. Thus, the effect of reverberation on intelligibility depended on the distances of both the speech source and the noise source from the listener.

1.2.3 Speech intelligibility in virtual environments

Subjective speech-intelligibility tests give more accurate results than objective metrics. However, subjective speech-intelligibility tests in real rooms have the limitation that they are difficult to perform with a large number of subjects. Auralization offers a solution to these limitations, providing essentially unlimited capability to reproduce realistic listening environments and making it possible for subjective speech intelligibility to be assessed in a room before it is actually built. Auralization, or acoustical virtual reality, is "the process of rendering audible, by physical or mathematical modeling, the sound field of a source in a space, in such a way as to simulate the binaural listening experience at a given position in the modelled space" [60]. Historical reviews of auralization are presented in Kleiner et al. (1993) [60] and Lokki (2002) [61]. To perform auralization, room impulse responses are recorded or calculated, and then convolved with source signals recorded in an anechoic chamber. Figure 1.1 shows the auralization procedure. The image-source and ray-tracing methods are well-known room-acoustical prediction algorithms used in auralization [62,63,64,65]. The influence of the absorption, scattering and diffraction of surfaces has been incorporated into room-acoustical prediction [66,67]. Currently, calculated room impulse responses are widely used in auralization, which in that case is called 'fully-computed' auralization [65,68,69,70].

Figure 1.1. Flow chart of the fully-computed auralization procedure (sound-field simulation from environmental data and a 3D room model yields the impulse response; binaural simulation with HRTFs yields the binaural impulse response; convolution with anechoic test signals yields the auralized signals for replay).

Auditory perception is sensitive to the listening environment [71]. Because of the auditory characteristics of humans, headphone playback systems need to compensate for sound localization and for the transmission of sound from the free field to a point in the ear canal, and must account for the frequency response of the headphones used. The head-related transfer functions (HRTFs) are modeled as a filter that accounts for the effects of reflections from the pinnae and shoulders, as well as the shadowing effect of the head itself. The HRTFs not only vary in a complex way with azimuth, elevation, range and frequency, but also vary significantly from person to person [72,73]. Wightman and Kistler [72,74] developed and validated synthesized stimuli presented over headphones using head-related transfer functions (HRTFs) measured from each subject.
Judgments of azimuth were quite accurate, and there were no obvious differences between judgments made with free-field signals and those made with the synthesized ones; judgments of elevation, however, tended to be more variable with the synthesized signals [72]. Wenzel et al. [75] extended Wightman and Kistler's work to sixteen untrained subjects with non-individual HRTFs. The results suggested that most listeners could obtain useful directional information in the azimuth dimension from an auditory display without requiring the use of individual HRTFs. However, they found a high rate of front-to-back confusions, asymmetric around the interaural axis, with non-individual HRTFs. Virtual stimuli processed using non-individual HRTFs have been cited in the literature as degrading localization accuracy, decreasing externalization and increasing reversal errors [76,77]. Although non-individual HRTFs have drawbacks, measurement of individual HRTFs is impractical in an application developed for a large number of people [78,79]. Bronkhorst [80] found no significant effect of using individualized HRTFs on reversals with two different noise spectra. When virtual speech stimuli were incorporated, no significant main effects were found for head tracking or for the use of individual HRTFs [81]. Whether or not individualized cues are required may depend upon the nature of the task [79].

Subjective speech-intelligibility tests using auralization techniques have been used for various purposes. Nordlund, Kihlman and Lindblad [82] developed a method for speech-intelligibility testing using an artificial speaker with human voice directivity and an artificial head constructed for stereophonic recording in an auditorium. Kleiner [83] used more advanced auralization techniques for speech-intelligibility tests, including calculated and measured echograms, to synthesize sound fields. Ricard and Meirs [84] incorporated head-related transfer functions (HRTFs) [72,73] into their Modified Rhyme Test (MRT) phrases for both recognition and localization tasks. Besing and Koehnke [85,86] used source-to-eardrum transfer functions to develop speech-intelligibility tests in noise. Peng [87] found that auralization with simulated binaural room impulse responses (BRIRs) was more accurate for testing speech intelligibility than auralization with monaural room impulse responses. A detailed review of speech intelligibility in virtual environments is presented in Chapter 2.

It is important to know how reliable auralization is, and to what extent acoustical details are actually simulated [88]. Although a number of auralization systems have been developed, few studies validating the quality of these systems have been reported. This might be a consequence of the fact that such evaluation is laborious and that absolute quality is hard to define. Comparison of recordings with auralized test material has been used to evaluate the quality of auralization. Kleiner [83] compared direct listening and three different auralization techniques - binaural recordings made in the theatre, convolution with a calculated impulse response, and convolution with a measured impulse response - using speech-intelligibility tests. Lokki and Jarvelainen [89] used a recording made in a real room to evaluate auralization quality. Rindel and Christensen [88] compared three auralization techniques to validate the quality of their auralization.
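As a concrete illustration of the fully-computed auralization chain of Figure 1.1, the short Python sketch below convolves an anechoic test signal with a binaural room impulse response and, for reference, evaluates the early-to-late energy ratio C50 defined in Section 1.2.1. The impulse response and test signal here are synthetic stand-ins (exponentially decaying noise and white noise); in this thesis the impulse responses were generated by the prediction software and the anechoic signals were MRT recordings, so the sketch shows only the generic processing steps, not the actual data.

import numpy as np
from scipy.signal import fftconvolve

fs = 44100                       # sample rate (Hz), illustrative
rt = 0.8                         # assumed reverberation time (s)

# Synthetic "binaural" impulse response: exponentially decaying noise,
# one channel per ear (a stand-in for a computed, HRTF-filtered BRIR).
t = np.arange(int(1.5 * rt * fs)) / fs
rng = np.random.default_rng(0)
decay = 10.0 ** (-3.0 * t / rt)             # -60 dB of energy after rt seconds
brir = rng.standard_normal((2, t.size)) * decay

# Anechoic test signal (stand-in for an anechoically recorded MRT word).
speech = rng.standard_normal(fs)            # 1 s of noise as a placeholder

# Convolve the anechoic signal with each ear's impulse response.
auralized = np.stack([fftconvolve(speech, brir[ch]) for ch in range(2)])
auralized /= np.max(np.abs(auralized))      # normalize for headphone replay

# Early-to-late energy ratio C50 (dB) of the left-ear impulse response.
n50 = int(0.050 * fs)
c50 = 10.0 * np.log10(np.sum(brir[0, :n50] ** 2) / np.sum(brir[0, n50:] ** 2))
print(f"C50 = {c50:.1f} dB, auralized length = {auralized.shape[1]} samples")

The same convolution step is what turns a predicted or measured impulse response into a listening-test stimulus; only the source of the impulse response and the playback equalization differ between systems.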
Another way of comparing real and auralized sound fields is to compare the just-noticeable differences (JNDs) for various parts of the binaural impulse response. Prodi and Velecka [90] used JNDs to evaluate binaural playback systems for virtual sound fields. They concluded that the required quality features should be tested for each specific application, so that it can be determined whether the performance of an auralization system is adequate for that application.

1.2.4 Room-acoustical modeling methods

For this work it was necessary to predict the acoustical conditions in rooms and to obtain auralized sound fields using room-acoustical modeling methods. Two different methods were used to model room acoustics in this work: computational modeling and physical scale-modeling.

Computational modeling

There are three approaches to computational modeling in room acoustics: wave-based modeling, ray-based modeling and statistical modeling [91]. The wave-based modeling methods (methods that account for wave phase), such as the finite-element method (FEM), the boundary-element method (BEM) and the finite-difference time-domain method (FDTD), are suitable only for small enclosures and low frequencies because of their heavy computational requirements [92]. The statistical modeling methods, such as statistical energy analysis (SEA), are used for the prediction of noise levels in coupled rooms where sound transmission by structures is important [93]. Another statistical approach is diffuse-field theory, in which the theoretical assumption is that the reverberant sound field is perfectly diffuse [94]; in this thesis it was used to calculate absorption coefficients and reverberation times from data measured in scale-model rooms. Since the Sabine formula was derived, diffuse-field theory has been developed further [95]. The Sabine formula is applicable to fairly reverberant rooms with uniform surface absorption. Eyring improved upon the Sabine formula to make it applicable to less reverberant spaces, by treating the waves as being absorbed only at the surfaces [96]. Fitzroy improved upon Sabine by allowing the absorbent material to be distributed unevenly [97]. Although a diffuse sound field is an idealization, diffuse-field theory can be applicable in real rooms when the room absorption is uniformly distributed and the room shape is quasi-cubic, especially if the room surfaces are diffusely reflecting [98].

The ray-based modeling methods, such as the ray-tracing and image-source methods, are the most commonly used for room-acoustical modeling [99,100,101,62]. There are also hybrid models, which combine the ray-tracing and image-source methods [64,102]. In hybrid models, early reflections are calculated with the image-source method, because of its accuracy in finding reflection paths, and later reflections are calculated with the ray-tracing method, because of its computational efficiency. One such hybrid acoustical prediction and auralization software system, CATT-Acoustic v8.0 [103], was used in this work. For early sound reflections, the image-source method was utilized, with added first-order diffuse reflection: the direct sound, first-order diffuse and specular reflections, and second-order specular reflections are handled by the image-source method. For fully-detailed calculation, a randomized tail-corrected cone-tracing (RTC) method was used; RTC combines features of specular cone-tracing, standard ray-tracing and the image-source method [104].
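As a brief numerical illustration of the diffuse-field relations referred to above, the Python sketch below evaluates the Sabine and Eyring reverberation times and the classical diffuse-field steady-state level (a direct term Q/(4*pi*r^2) plus a reverberant term 4/Rc, with room constant Rc = S*alpha/(1 - alpha)), and uses them to show how the received speech-to-noise level difference changes as the surface absorption is varied when the speech and noise sources lie at different distances from the listener. The room dimensions, distances and source levels are arbitrary illustrative values, not data from this thesis.

import numpy as np

def sabine_rt(volume, surface, alpha_bar):
    """Sabine reverberation time (s); volume in m^3, surface area in m^2."""
    return 0.161 * volume / (surface * alpha_bar)

def eyring_rt(volume, surface, alpha_bar):
    """Eyring reverberation time (s), better suited to absorptive rooms."""
    return 0.161 * volume / (-surface * np.log(1.0 - alpha_bar))

def diffuse_level(lw, r, surface, alpha_bar, q=1.0):
    """Steady-state level (dB) at distance r (m) from an omnidirectional
    source of power level lw (dB), from classical diffuse-field theory."""
    rc = surface * alpha_bar / (1.0 - alpha_bar)   # room constant (m^2)
    return lw + 10.0 * np.log10(q / (4.0 * np.pi * r ** 2) + 4.0 / rc)

# Illustrative classroom: 7.5 x 6.5 x 3 m; speech source at 2 m from the
# listener, noise source at 4 m, equal source power levels (SNS = 0 dB).
V = 7.5 * 6.5 * 3.0
S = 2.0 * (7.5 * 6.5 + 7.5 * 3.0 + 6.5 * 3.0)
for a in (0.1, 0.2, 0.4):
    snr = diffuse_level(70.0, 2.0, S, a) - diffuse_level(70.0, 4.0, S, a)
    print(f"alpha={a:.1f}  RT_Sabine={sabine_rt(V, S, a):.2f} s  "
          f"RT_Eyring={eyring_rt(V, S, a):.2f} s  SNR={snr:.1f} dB")

In this particular example the speech source is closer than the noise source, so adding absorption (shortening the reverberation time) increases the received level difference, consistent with the distance-dependent trends discussed in Section 1.2.2.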
Physical scale-modeling

Physical scale-modeling methods are now well proven, and there is considerable knowledge regarding their ability to represent the acoustical conditions of real rooms [105,106,107,108,109]. Acoustical scale modeling was first undertaken by Spandock in 1934 [110]. The principle behind acoustical scale modeling is simple: to create a 1/n scale model, all dimensions are scaled by 1/n. If the wavelength is similarly reduced, then wavelength-to-dimension ratios remain unchanged; and if the wavelength is reduced, the frequency must be increased. Because the propagation medium (generally air) in the model is the same as in the real room, for a 1/n scale model the frequency should be increased n times (possibly involving ultrasonic model frequencies). At these higher model frequencies, the absorption and diffusion properties of the model surfaces and of the air should be the same as in the real room at the corresponding full-scale frequencies.

A major constraint on the choice of scale factor relates to the transducers, which tend to have a high-frequency limit of around 100 kHz, in the case of both loudspeakers and microphones. Air absorption is also a major problem in scale-model measurement: air absorption increases approximately with the square of the frequency [111], whereas for correct scaling it should increase only in proportion to frequency, so the air-absorption effect is excessive in a scale model and cannot be neglected. The air absorption can be calculated at the model test frequencies for each temperature and relative humidity measured in the scale-model room [111].
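As a small worked example of these scaling rules, the sketch below maps full-scale octave-band centre frequencies to their test frequencies in a 1:8 scale model and estimates, using only the rough frequency-squared growth of air absorption quoted above (not the full ISO 9613-1 calculation used in this work), by how much the air-absorption effect is excessive.

def model_frequency(f_full, n=8):
    """A full-scale frequency f_full (Hz) maps to n * f_full in a 1:n model."""
    return n * f_full

def excess_air_absorption_factor(n=8):
    """With air absorption growing roughly as frequency squared, attenuation
    per metre at the model frequency is about n^2 times the full-scale value;
    over paths that are 1/n as long, the total excess is therefore about n."""
    return n

for f in (500, 1000, 2000, 4000):
    print(f"{f} Hz full scale -> {model_frequency(f)} Hz in the 1:8 model")
print("approximate excess air-absorption factor:", excess_air_absorption_factor())

This factor-of-n estimate is only a rule of thumb; the actual correction depends on the measured temperature and relative humidity, as noted above.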
1.2.5 Summary

Previous studies of the optimal acoustical conditions for speech intelligibility were based on two major approaches: a series of experimental and theoretical studies has been carried out to understand the acoustical conditions important for speech intelligibility. Different types of studies, with different ways of incorporating noise, resulted in different apparent optimal reverberation times for speech. In the experimental studies, the use of fixed signal-to-noise ratios at the listener prevented the effect of reverberation on speech intelligibility from being accurately included. In the theoretical predictions, it was found that the effect of reverberation on intelligibility depended on the distances of both the speech source and the noise source from the listener. In order to test the theoretical predictions, a more accurate experimental study is imperative. Subjective speech-intelligibility tests widely used in room acoustics, objective metrics for predicting speech intelligibility, and the possibility of using the auralization technique were reviewed for the experimental study of speech intelligibility. Applying auralization to speech-intelligibility tests was deemed plausible because of the repeatability of its test signals, as long as auralization was first validated for speech-intelligibility testing. The two room-modeling methods chosen for this work, CATT-Acoustic and physical scale-modeling, were introduced.

1.3 Objectives of the Thesis

The primary objective of this thesis is to find the optimal reverberation for speech intelligibility by an experimental method using more realistic room-acoustical models. The interaction between reverberation and steady-state level will be incorporated into the room-acoustical models. The effect of doing so is to increase the realism and accuracy of speech-intelligibility tests, resulting in more valuable data for the acoustical design of rooms for speech.

The second objective is to find the optimal reverberation for speech intelligibility for both normal-hearing and hearing-impaired listeners. The optimal room-acoustical conditions for speech intelligibility for hearing-impaired people are not well known, despite there being a number of audiological studies of speech intelligibility for hearing-impaired listeners. Considering that the hearing-impaired population requires acoustically more elaborate listening environments, knowledge of the optimal room-acoustical conditions is imperative.

The third objective is to propose an optimal architectural design for speech intelligibility based on the findings of this thesis. Ceiling barriers and reflectors are used as a design variable in classrooms. The results provide a design concept for high-quality classroom acoustics.

1.4 Approach

The auralization technique was employed as the fundamental method for reproducing sound fields, and its application to speech-intelligibility testing was validated in this thesis. The first step in finding the optimal reverberation for speech intelligibility using auralization was to create idealized virtual rooms with an approximately diffuse sound field, in order to obtain results under the same conditions as the previous experimental studies. For more accurate modeling of the interaction between reverberation and speech-to-noise ratio, two clearly defined quantities are used in this thesis: the speech-to-noise level difference at the listener's position (SNR) and the relative output power levels of the speech and noise sources (SNS). The next step applied the results of the first step to more realistic virtual rooms with non-diffuse sound fields, modeled on existing classrooms; this step repeats the first, and confirms and extends its results. Finally, a novel, practicable method for achieving high speech intelligibility in classrooms is proposed based on the results of the previous steps: ceiling barriers and reflectors for optimizing classroom acoustics for speech intelligibility were evaluated using a physical scale model and computer prediction.

1.5 Thesis Outline

This thesis is organized as follows. Chapter 2 discusses the validation of speech-intelligibility testing in virtual classrooms by comparison with real classrooms. Chapter 3 investigates the optimum reverberation for speech intelligibility when noise is incorporated in a realistic manner in a simplified virtual room, for both normal-hearing and hearing-impaired listeners. Chapter 4 extends Chapter 3 to the case of realistic virtual rooms. Chapter 5 reports tests of ceiling barriers and reflectors as an architectural solution for improving speech intelligibility, based on the optimum acoustical conditions found in Chapters 3 and 4. Chapter 6 concludes the thesis and discusses future work.

References

[1] G. A. Campbell, "Telephonic intelligibility," Phil. Mag. 19, 152-159 (1910).
[2] H. Fletcher, "The nature of speech and its interpretation," J. Franklin Inst. 193 (6), 729-747 (1922).
[3] H. Fletcher and J. C. Steinberg, "Articulation testing methods," J. Acoust. Soc. Am. 1 (1), 1-48 (1930).
[4] P. Pratt, in Sound & Vibration Control (1996), pp. 49-54.
[5] A. A. Tyler and L. C. Tolbert, "Speech-language assessment in the clinical setting," Am. J. Speech-Lang. Pat. 11 (4), 215-220 (2002).
[6] D. Noffsinger, G. B. Haskell, V. D. Larson et al., "Quality rating test of hearing aid benefit in the NIDCD/VA Clinical Trial," Ear Hear. 23 (4), 291-300 (2002).
[7] A. C. Neuman and I. Hochberg, "Children's perception of speech in reverberation," J. Acoust. Soc. Am. 73 (6), 2145-2149 (1983).
[8] J. S. Bradley, R. D. Reich, and S. G. Norcross, "On the combined effects of signal-to-noise ratio and room acoustics on speech intelligibility," J. Acoust. Soc. Am. 106 (4), 1820-1828 (1999).
[9] M. Hodgson and E.-M. Nosal, "Effect of noise and occupancy on optimal reverberation times for speech intelligibility in classrooms," J. Acoust. Soc. Am. 111 (2), 931-939 (2002).
[10] H. J. M. Steeneken, "Quality evaluation of speech processing systems," in Digital speech processing: speech coding, synthesis, and recognition, edited by I. A. Nejat (Kluwer Academic Publishers, Boston, 1992), pp. 127-160.
[11] J. P. Penrod, "Speech threshold and word recognition/discrimination testing," in Handbook of clinical audiology, edited by J. Katz (Williams & Wilkins, Baltimore, 1994), pp. 147-164.
[12] J. W. Black, "Multiple-choice intelligibility tests," J. Speech Hear. Dis. 22 (2), 213-235 (1957).
[13] G. Fairbanks, "Test of phonemic differentiation: The rhyme test," J. Acoust. Soc. Am. 30 (7), 596-600 (1958).
[14] E. L. Thorndike and I. Lorge, The teacher's word book of 30,000 words (Teachers College, Columbia University, New York, 1944).
[15] A. S. House, C. E. Williams, M. H. L. Hecker et al., "Articulation-testing methods: Consonantal differentiation with a closed-response set," J. Acoust. Soc. Am. 37 (1), 158-166 (1965).
[16] E. J. Kreul, J. C. Nixon, K. D. Kryter et al., "A proposed clinical test of speech discrimination," J. Speech Hear. Res. 11 (3), 536-552 (1968).
[17] J. D. Griffiths, "Rhyming minimal contrasts: A simplified diagnostic articulation test," J. Acoust. Soc. Am. 42 (1), 236-241 (1967).
[18] H. G. Latham, "The signal-to-noise ratio for speech intelligibility - An auditorium acoustics design index," Appl. Acoust. 12 (4), 253-320 (1979).
[19] J. S. Bradley, "Predictors of speech intelligibility in rooms," J. Acoust. Soc. Am. 80 (3), 837-845 (1986).
[20] J. S. Bradley, "Speech intelligibility studies in classrooms," J. Acoust. Soc. Am. 80 (3), 846-854 (1986).
[21] A. K. Nabelek and J. M. Pickett, "Reception of consonants in a classroom as affected by monaural and binaural listening, noise, reverberation, and hearing-aids," J. Acoust. Soc. Am. 56 (2), 628-639 (1974).
[22] A. K. Nabelek and J. M. Pickett, "Monaural and binaural speech-perception through hearing-aids under noise and reverberation with normal and hearing-impaired listeners," J. Speech Hear. Res. 17 (4), 724-739 (1974).
[23] A. K. Nabelek and P. K. Robinson, "Monaural and binaural speech perception in reverberation for listeners of various ages," J. Acoust. Soc. Am. 71 (5), 1242-1248 (1982).
[24] A. K. Nabelek and I. V. Nabelek, "Room acoustics and speech perception," in Handbook of clinical audiology, edited by J. Katz (Williams & Wilkins, Baltimore, 1994), pp. 624-637.
[25] N. R. French and J. C. Steinberg, "Factors governing the intelligibility of speech sounds," J. Acoust. Soc. Am. 19 (1), 90-119 (1947).
[26] K. D. Kryter, "Methods for the calculation and use of the articulation index," J. Acoust. Soc. Am. 34 (11), 1689-1697 (1962).
[27] K. D. Kryter, "Validation of the articulation index," J. Acoust. Soc. Am. 34 (11), 1698-1702 (1962).
[28] V. M. A. Peutz, "Articulation loss of consonants as a criterion for speech transmission in a room," J. Audio Eng. Soc. 19 (11), 915-919 (1971).
[29] T. Houtgast and H. J. M. Steeneken, "The modulation transfer function in room acoustics as a predictor of speech intelligibility," Acustica 28, 66-73 (1973).
[30] T. Houtgast and H. J. M. Steeneken, "Evaluation of speech transmission channels by using artificial signals," Acustica 25, 355-367 (1971).
[31] T. Houtgast, H. J. M. Steeneken, and R. Plomp, "Predicting speech intelligibility in rooms from the modulation transfer function. I. General room acoustics," Acustica 46 (1), 60-72 (1980).
[32] T. Houtgast, H. J. M. Steeneken, and R. Plomp, "Predicting speech intelligibility in rooms from the modulation transfer function. II. Mirror image computer model applied to rectangular rooms," Acustica 46 (1), 73-81 (1980).
[33] H. J. M. Steeneken and T. Houtgast, "A physical method for measuring speech-transmission quality," J. Acoust. Soc. Am. 67 (1), 318-326 (1980).
[34] T. Houtgast and H. J. M. Steeneken, "A multi-language evaluation of the RASTI-method for estimating speech intelligibility in auditoria," Acustica 54 (4), 185-199 (1984).
[35] American National Standards Institute, "ANSI S3.5-1997 Methods for calculation of the speech intelligibility index," 1997.
[36] A. Boothroyd, "Room acoustics and speech perception," Semin. Hear. 25 (2), 155-166 (2004).
[37] H. Haas, "The influence of a single reflection on the audibility of speech," J. Audio Eng. Soc. 20, 145-159 (1972).
[38] W. Reichardt, O. Abdelalim, and W. Schmidt, "Definition and basis of making an objective evaluation to distinguish between useful and useless clarity defining musical performances," Acustica 32 (3), 126-137 (1975).
[39] J. P. A. Lochner and J. F. Burger, "The influence of reflections on auditorium acoustics," J. Sound Vib. 1 (4), 426-448 (1964).
[40] T. Finitzo-Hieber and T. W. Tillman, "Room acoustics effects on monosyllabic word discrimination ability for normal and hearing-impaired children," J. Speech Hear. Res. 21 (3), 440-458 (1978).
[41] W. S. Yacullo and D. B. Hawkins, "Speech recognition in noise and reverberation by school-age children," Audiology 26, 235-246 (1987).
[42] R. Plomp and A. M. Mimpen, "Speech-reception threshold for sentences as a function of age and noise level," J. Acoust. Soc. Am. 66 (5), 1333-1342 (1979).
[43] J. R. Dubno, D. D. Dirks, and D. E. Morgan, "Effects of age and mild hearing loss on speech recognition in noise," J. Acoust. Soc. Am. 76 (1), 87-96 (1984).
[44] R. Plomp, "A signal-to-noise ratio model for the speech-reception threshold of the hearing-impaired," J. Speech Hear. Res. 29 (2), 146-154 (1986).
[45] C. C. Crandell, "Speech recognition in noise by children with minimal degrees of sensorineural hearing loss," Ear Hear. 14 (3), 210-216 (1993).
[46] A. K. Nabelek and L. Robinette, "Reverberation as a parameter in clinical testing," Audiology 17 (3), 239-259 (1978).
[47] A. K. Nabelek and T. R. Letowski, "Vowel confusions of hearing-impaired listeners under reverberant and nonreverberant conditions," 50 (2), 126-131 (1985).
[48] A. K. Nabelek and P. A. Dagenais, "Vowel errors in noise and in reverberation by hearing-impaired listeners," J. Acoust. Soc. Am. 80 (3), 741-748 (1986).
[49] A. J. Duquesnoy and R. Plomp, "Effect of reverberation and noise on the intelligibility of sentences in cases of presbycusis," J. Acoust. Soc. Am. 68 (2), 537-544 (1980).
[50] R. Plomp and A. J. Duquesnoy, "Room acoustics for the aged," J. Acoust. Soc. Am. 68 (6), 1616-1621 (1980).
[51] L. E. Humes, D. S. D. Dirks, T. Bell et al., "Application of the articulation index and the speech transmission index to the recognition of speech by normal-hearing and hearing-impaired listeners," J. Speech Hear. Res. 29 (4), 447-462 (1986).
[52] C. V. Pavlovic, G. A. Studebaker, and R. L. Sherbecoe, "An Articulation Index based procedure for predicting the speech recognition performance of hearing-impaired individuals," J. Acoust. Soc. Am. 80 (1), 50-57 (1986).
[53] K. L. Payton, R. M. Uchanski, and L. D. Braida, "Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing," J. Acoust. Soc. Am. 95 (3), 1581-1592 (1994).
[54] C. A. Kamm, D. D. Dirks, and T. S. Bell, "Speech recognition and the articulation index for normal and hearing-impaired listeners," J. Acoust. Soc. Am. 77 (1), 281-288 (1985).
[55] L. E. Humes, "Factors underlying the speech-recognition performance of elderly hearing-aid wearers," J. Acoust. Soc. Am. 112 (3), 1112-1132 (2002).
[56] D. J. Schum, L. J. Matthews, and F. S. Lee, "Actual and predicted word-recognition performance of elderly hearing-impaired listeners," J. Speech Hear. Res. 34 (3), 636-642 (1991).
[57] W. Yang and M. Hodgson, "Acoustical evaluation of preschool classrooms," Noise Control Eng. J. 53 (2), 43-52 (2005).
[58] R. Plomp and A. M. Mimpen, "Improving the reliability of testing the speech reception threshold for sentences," Audiology 18 (1), 43-52 (1979).
[59] S. R. Bistafa and J. S. Bradley, "Reverberation time and maximum background-noise level for classrooms from a comparative study of speech intelligibility metrics," J. Acoust. Soc. Am. 107 (2), 861-875 (2000).
[60] M. Kleiner, B. I. Dalenback, and P. Svensson, "Auralization - An overview," 41 (11), 861-875 (1993).
[61] T. Lokki, Physically-based auralization: Design, implementation, and evaluation, D.Tech. thesis, Helsinki University of Technology, 2002.
[62] J. Borish, "Extension of the image model to arbitrary polyhedra," J. Acoust. Soc. Am. 75 (6), 1827-1836 (1984).
[63] D. R. Begault, "Challenges to the successful implementation of 3-D sound," 39 (11), 864-870 (1991).
[64] G. M. Naylor, "ODEON - Another hybrid room acoustical model," 38 (2-4), 131-143 (1993).
[65] K. H. Kuttruff, "Auralization of impulse responses modeled on the basis of ray-tracing results," J. Audio Eng. Soc. 41 (11), 876-880 (1993).
[66] J. H. Rindel, "Modelling the angle-dependent pressure reflection factor," Appl. Acoust. 38 (2-4), 223-234 (1993).
[67] R. R. Torres, U. P. Svensson, and M. Kleiner, "Computation of edge diffraction for more accurate room acoustics auralization," J. Acoust. Soc. Am. 109 (2), 600-610 (2001).
[68] W. Ahnert and R. Feistel, "Ears auralization software," J. Audio Eng. Soc. 41 (11), 894-904 (1993).
[69] B. I. Dalenback, M. Kleiner, and P. Svensson, "Audibility of changes in geometric shape, source directivity, and absorptive treatment - Experiments in auralization," 41 (11), 905-913 (1993).
[70] A. Mochimaru, "A study of the practicality and accuracy of impulse-response calculations for the auralization of sound-system design," J. Audio Eng. Soc. 41 (11), 881-893 (1993).
[71] J. Blauert, Spatial hearing: the psychophysics of human sound localization, Rev. ed. (MIT Press, Cambridge, Mass., 1997), pp. xiii, 494.
[72] F. L. Wightman and D. J. Kistler, "Headphone simulation of free-field listening.
II: Psychophysical validation," J. Acoust. Soc. Am. 85 (2), 868-878 (1989). [73] D. J. Kistler and F. L. Wightman, "A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction," J. Acoust. Soc. Am. 91 (3), 1637-1647(1992). [74] F. L . Wightman and D. J. Kistler, "Headphone simulation of free-field listening. I: Stimulus synthesis," J. Acoust. Soc. Am. 85 (2), 858-867 (1989). [75] E. M . Wenzel, M . Arruda, D. J. Kistler et a l , "Localization using nonindividualized head-related transfer functions," J. Acoust. Soc. Am. 94 (1), 111-123 (1993). [76] H . Moller, M . F. Sorensen, C. B. Jensen et al., "Binaural technique: Do we need individual recordings?" J. Audio Eng. Soc. 44 (6), 451-469 (1996). [77] J. Sodnik, R. Susnik, M . Stular et a l , "Spatial sound resolution of an interpolated HRIR library," Appl. Acoust. 66 (11), 1219-1234 (2005). [78] E. M . Wenzel, P. K . Stone, S. S. Fisher et a l , "A system for three-dimensional acoustic 'visualization' in a virtual environment workstation," Proc. 1st IEEE Conference on Visualization '90, San Francisco, C A , U S A (1990). [79] E. M . Wenzel and D. R. Begault, "Are individualized head-related transfer functions required for auditory information displays?" J. Acoust. Soc. Am. 105 (2), 1035 (1999). [80] A . W. Bronkhorst, "Localization of real and virtual sound sources," J. Acoust. Soc. Am. 98 (5), 2542-2553 (1995). [81] D. R. Begault and E. M . Wenzel, "Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual speech source," J. Audio Eng. Soc. 49 (10), 904-916 (2001). [82] B . Nordlund, T. Kihlman, and S. Lindblad, "Use of articulation tests in auditorium studies," J. Acoust. Soc. Am. 44 (1), 148-156 (1968). [83] M . Kleiner, "Speech-intelligibility In real and simulated sound fields," Acustica 47 (2), 55-71 (1981). [84] G. L. Ricard and S. L . Meirs, "Intelligibility and localization of speech from virtual directions," Human Factors 36 (1), 120-128 (1994). [85] J. Besing and J. M . Koehnke, "A test of virtual auditory localization," Ear Hear. 16 (2), 220-229(1995). 18 [86] J. Koehnke and J. M . Besing, "A procedure for testing speech intelligibility in a virtual listening environment.," Ear Hear. 17 (3), 211-217 (1996). [87] J. Peng, "Feasibility of subjective speech intelligibility assessment based on auralization," Appl. Acoust. 66, 591-601 (2005). [88] J. H . Rindel and C. L. Christensen, "Room acoustic simulation and auralization - How close can we get to the real room?" Proc. 8th WESPAC, Melbourne (2003). [89] T. Lokki and H . Jarvelainen, "Subjective evaluation of auralization of physics-based room acoustics modeling," Proc. 7th International Conference on Auditory Display, Espoo, Finland (2001). [90] N . Prodi and S. Velecka, "The evaluation of binaural playback systems for virtual sound fields," Appl. Acoust. 64 (2), 147 (2003). [91] L . Savioja, Modeling techniques for virtual acoustics, D.Tech., Helsinki University of Technology, 2000. [92] D. Botteldooren, "Finite-difference time-domain simulation of low-frequency room acoustic problems," J. Acoust. Soc. Am. 98 (6), 3302-3308 (1995). [93] C. B . Burroughs, R. W. Fischer, and F. R. Kem, "An introduction to statistical energy analysis," J. Acoust. Soc. Am. 101 (4), 1779-1789 (1997). [94] H . Kuttruff, Room acoustics, 4th ed. (Spoon Press, London New York, 2000), pp. 116-122. [95] W. C. Sabine, Collected papers on acoustics. 
(Dover Publications, New York, 1964). [96] C. F. Eyring, "Reverberation Time in "Dead" Rooms," J. Acoust. Soc. Am. 1 (2A), 168 (1930). [97] D. Fitzroy, "Reverberation formula which seems to be more accurate with nonuniform distribution of absorption," J. Acoust. Soc. Am. 31 (7), 893-897 (1959). [98] M . Hodgson, "When is diffuse-field theory applicable," Appl. Acoust. 49 (3), 197-207 (1996). [99] A . Kulowski, "Algorithmic representation of the ray tracing technique," Appl. Acoust. 18 (6), 449-469(1985). [100] K. H . Kuttruff, "Auralization of impulse responses modeled on the basis of ray-tracing results," J. Audio Eng. Soc. 41 (11), 876-880 (1993). [101] J. B . Allen and D. A . Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Am. 65 (4), 943-950 (1979). [102] M . Vorlander, "Simulation of the transient and steady-state sound-propagation in rooms using a new combined ray-tracing image-source algorithm," J. Acoust. Soc. Am. 86 (1), 172-178 (1989). [103] B. -I. Dalenback, CATT-Acoustic v8.0 (Gothenburg, 2004). [104] CATT, in User's Manual (Gothenburg, 2002), pp. 2-69-84. [105] P. S. Veneklasen, "Model techniques in architectural acoustics," J. Acoust. Soc. Am. 47 (2A), 419-423 (1970). [106] H . D. Harwood and A . N . Burd, "Acoustic scaling of studios and concert halls," Acustica 28 (6), 330-340(1973). [107] M . Barron and C. B . Chinoy, "1:50 scale acoustic models for objective testing of auditoria," Appl. Acoust. 12 (5), 361-375 (1979). [108] M . Barron, "Auditorium acoustic modelling now," Appl. Acoust. 16 (4), 279-290 (1983). [109] M . Hodgson and R. J. Orlowski, "Acoustic scale modeling of factories, Part I: Principles, instrumentation and techniques," J. Sound Vib. 113 (1), 29-46 (1987). 19 [110] F. Spandock, "Akustische Modellversuche," Annalen der Physik 20, 345-360 (1934). [ I l l ] International Organisation for Standardisation (ISO), "ISO 9613-1 Acoustics - Attenuation of Sound During Propagation Outdoors - Part 1: Calculation of the Absorption of Sound by the Atmosphere First Edition," 1993. 20 2 VALIDATION OF THE AURALIZATION TECHNIQUE: SPEECH INTELLIGIBILITY TESTS IN VIRTUAL AND REAL CLASSROOMS * 2.1 Introduction Subjective speech-intelligibility tests give more realistic results than do the measurement of prediction of objective metrics. However, performing such tests in real rooms has limitations - for example, they are difficult to perform with a large number of subjects. Auralization offers a solution to these limitations, having an unlimited capability to reproduce realistic listening environments, and making it possible for speech intelligibility to be assessed in a room before it is built. However, before it can be used with confidence, the auralization technique must be validated in comparison with direct listening in real rooms. There are several ways to perform subjective speech intelligibility tests, as follows: (1) on-site listening tests: - a person speaks speech material in the room; - a loudspeaker plays speech material, recorded in an anechoic chamber, into the room. (2) off-site listening tests: - speech material recorded in the room are reproduced by loudspeakers in an anechoic chamber or via headphones; - speech material recorded in an anechoic chamber are convolved with room impulse responses measured in the room, and are reproduced by loudspeakers in an anechoic chamber or via headphones. * A version of this chapter has been submitted for publication to Acustica. Yang, W . and Hodgson, M. 
(2006) Validation of the auralization technique: comparison of speech intelligibility tests in virtual and real classrooms. 21 - speech material recorded in an anechoic chamber are convolved with calculated room impulse responses and are reproduced by loudspeakers in an anechoic chamber or via headphones ('fully-computed' auralization). For off-site listening methods, auralization techniques are required. The quality of speech-intelligibility tests depends on how well the speech source is modeled and how speech sound is transmitted to listeners - that is, the accurate modeling of the room and the listener. This is what auralization seeks to achieve. Speech-intelligibility tests using auralization methods have been used for acoustical evaluation of rooms, for validation of the auralization technique, and for clinical purposes. Nordlund, Kihlman, and Lindblad [1] developed a method for studying the correlation between the acoustical features and speech intelligibility of the room. They made recordings of nonsense monosyllables using an artificial speaker having human voice directivity and an artificial head constructed for stereophonic recording in an auditorium, and played the recordings through headphones. This method laid the groundwork for an auralization study on speech intelligibility. Kleiner [2] used more advanced auralization techniques for speech intelligibility testing. He used calculated and measured echograms to synthesize sound fields. Sound fields simulated in an anechoic chamber were recorded using a dummy-head and presented to listeners using headphones. The speech-intelligibility scores in on-site listening tests were compared with off-site listening-test results using dummy recordings, measured echograms, and calculated echograms. In general, the three different off-site speech-intelligibility scores were correlated with the on-site speech-intelligibility scores; however, the work revealed errors using the auralization technique for speech intelligibility. The speech-intelligibility scores in on-site listening tests in a theatre with a reverberation time os 0.8 s in the 1-kHz octave band were higher than those in off-site listening tests using auralization techniques. Ricard and Meirs [3] studied speech intelligibility and listener ability to localize speech in a virtual sound field. They employed head-related transfer functions (HRTFs) to condition Modified Rhyme Test (MRT) phrases for both recognition and localization tasks. Localization of azimuth was accurate, but front-back confusions in the range of other 22 localization studies that used HRTFs were found. Besing and Koehnke [4,5] used source-to-eardrum transfer functions to develop speech intelligibility tests for virtual auditory localization. The source-to-eardrum transfer functions were measured using a broadband stimulus in actual sound fields for each source location in each environment using the K E M A R [6] manikin. They concluded that the virtual localization test enabled potential problems inherent in free-field localization tests to be avoided. Peng [7] conducted subjective speech-intelligibility tests using auralization with binaural room impulse responses (BRTRs). He found that auralization with simulated BRTRs is a better means of testing speech intelligibility than that with monaural room impulse responses. 
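To make the 'fully-computed' off-site procedure described above concrete, the following is a minimal sketch of the convolution step (not the thesis's actual CATT-Acoustic processing chain), using NumPy/SciPy with synthetic placeholder signals; in practice the binaural room impulse response and the anechoic recording would be loaded from files with matched sampling rates, and the speech/noise calibration applied before playback.

```python
import numpy as np
from scipy.signal import fftconvolve

def auralize(anechoic, brir_left, brir_right):
    """Convolve a dry (anechoic) speech signal with a binaural room
    impulse response to obtain a two-channel virtual-listening signal."""
    left = fftconvolve(anechoic, brir_left)
    right = fftconvolve(anechoic, brir_right)
    binaural = np.stack([left, right], axis=1)
    # Normalize to avoid clipping on playback; any relative speech/noise
    # level calibration must be applied before a step like this.
    peak = np.max(np.abs(binaural))
    return binaural / peak if peak > 0 else binaural

# Example with synthetic placeholders (white noise standing in for speech,
# a decaying noise tail standing in for a measured or predicted BRIR).
fs = 44100
rng = np.random.default_rng(0)
speech = rng.standard_normal(fs)                                # 1 s of "speech"
t = np.arange(int(0.5 * fs)) / fs
brir_l = rng.standard_normal(t.size) * np.exp(-6.9 * t / 0.4)   # decay ~ RT 0.4 s
brir_r = rng.standard_normal(t.size) * np.exp(-6.9 * t / 0.4)
out = auralize(speech, brir_l, brir_r)
print(out.shape)
```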
In summary, research has been done to develop and validate the auralization technique, but there is a lack of literature supporting the validity of subjective speech-intelligibility testing in auralized sound fields. The objective of the present work was to validate the fully-computed auralization technique for use in a subsequent speech-intelligibility study. The results of the real-classroom measurements are compared with the results of tests in virtual classrooms. The similarities and differences between the results for the real and virtual classrooms are explained. The process of verification demonstrates the reliability of the auralization technique and, in particular, the fidelity of a virtual classroom model as a representation of a realistic listening environment.

2.2 Methods

2.2.1 Subjects and test materials

Twelve university students (eight females and four males) completed the listening tests in both the real classrooms and the virtual classrooms. All of the subjects had normal hearing (thresholds of 20 dB HL or better) at octave frequencies from 250 to 8000 Hz. The Modified Rhyme Test [8] was used for the speech-intelligibility test. A test list consists of 50 words, which are common one-syllable words for consonant identification - 25 words tested initial consonants and 25 tested final consonants. The 300 words of the MRT were recorded in an anechoic chamber by a male talker who spoke standard Canadian English. Each word was presented in the carrier phrase, "Say (test word)", at an average rate of two syllables per second. The word order of each list was randomized to obtain the 12 lists of the MRT, each containing 50 words. The r.m.s. amplitude of each word list was normalized to the same value. As a noise signal, four-talker babble by AudiTec of St. Louis [9] was presented with the MRT lists.

2.2.2 Classrooms and acoustical measurements

Speech-intelligibility tests were performed in two existing, medium-sized university classrooms in order to evaluate the fidelity of the 'fully-computed' auralized sound fields. The classrooms were architecturally identical, with volumes of 400 m3, but had different acoustical characteristics. One (Room A) had acoustical treatment to reduce reverberation; the other (Room B) had no acoustical treatment. Three listening positions (r1, r2, and r3) were chosen along the centre line of the classrooms. The speech source was positioned at the front, where the instructor might typically stand. The noise source was at the back of the classroom between r2 and r3 (see Figure 2.1).

Figure 2.1. 3D model of Room A showing sources (Speaker and Noise) and receiver positions (r1, r2 and r3).

Acoustical measurements were made in both classrooms. Impulse responses and speech levels were measured using the Maximum Length Sequence System Analyser (MLSSA) at each listening position. Reverberation Time (RT), Early Decay Time (EDT), early-to-late energy ratio (C50), and useful-to-detrimental energy ratio (U50) were calculated from the measured and predicted impulse responses. Mid-frequency (500, 1000, and 2000 Hz octave-band average) values were used for the RT and the EDT. C50 and U50 were calculated from the measured and predicted unfiltered impulse responses by Eqs. (1) and (2):

C_{50} = 10\log_{10}\left(\frac{E_d + E_{d-50\,\mathrm{ms}}}{E_{50\,\mathrm{ms}-\infty}}\right)\ \mathrm{dB}, \qquad (1)

U_{50} = 10\log_{10}\left(\frac{E_d + E_{d-50\,\mathrm{ms}}}{E_{50\,\mathrm{ms}-\infty} + E_n}\right)\ \mathrm{dB}. \qquad (2)
Here, Ed is the direct energy from the source, Ed.50ms is the total early speech energy arriving up to 50 ms after the direct sound, ESoms-«> is the total late speech energy arriving after 50 ms, and the noise energy is E„. 2.2.3 Auralization and listening tests The classrooms were modeled, and acoustical prediction and auralization were performed, using CATT-Acoustic v8.0 [10]. Each classroom had 153 surfaces; sources emitted 13508 rays. The classroom surfaces were discretized according to the distribution of materials in Room A, which was acoustically treated. For predicting Room B, the 3D model for Room A was also used, to maintain the same surface numbers and discretization, though 25 with different materials. Figure 2.1 shows the speech source, noise source and receiver positions for Room A . The absorption coefficients and diffusion coefficients of the surface materials in the classrooms were the only acoustical quantities that could be varied to model the virtual classrooms. Table 2.1 shows absorption and diffusion coefficients of each surface material, and the average absorption and diffusion coefficients used in the room modeling. The absorption coefficients were chosen based on published absorption coefficients of the materials [11]. Room B consisted of a floor made of polished concrete, walls made of painted plywood, a ceiling made of textured plaster, and a blackboard. Room A had acoustical panels on the walls and the ceiling, with absorption coefficients 0.20, 0.57, 0.90, 0.98, 0.98, 0.97 in the 125 to 4000 Hz octave bands. The diffusion coefficients were chosen based on previous work [12], then adjustments were made to achieve a best fit to the measured RTs. Table 2.1. Individual and average absorption coefficients (a, a ) and diffusion coefficients (d,d). Material Area (m2) Hz 125 250 500 l k 2k 4k Glazed 78.08 a 0.02 0.02 0.02 0.01 0.01 0.01 concrete d 0.05 0.05 0.05 0.05 0.05 0.05 Painted Room A : 72.39 a 0.28 0.07 0.06 0.07 0.07 0.06 plywood RoomB:132.57 d 0.03 0.03 0.02 0.01 0.01 0.01 Textured Room A : 66.37 a 0.28 0.07 0.06 0.07 0.07 0.06 plaster RoomB: 124.24 d 0.03 0.03 0.02 0.01 0.01 0.01 Wood 44.14 a 0.14 0.10 0.06 0.08 0.10 0.10 d 0.60 0.45 0.32 0.38 0.30 0.30 Blackboard 19.32 a d 0.15 0.10 0.25 0.05 0.20 0.04 0.12 0.04 0.10 0.03 0.05 0.03 Seat area 60.77 a 0.13 0.16 0.15 0.13 0.18 0.25 d 0.62 0.72 0.80 0.80 0.85 0.85 Absorption Room A only 118.05 a d 0.20 0.39 0.57 0.42 0.90 0.90 0.98 0.98 0.98 0.95 0.97 0.95 Room A 459.12 a d 0.18 0.26 0.21 0.27 0.29 0.39 0.31 0.41 0.31 0.40 0.31 0.40 Room B 459.12 a 0.20 0.08 0.07 0.07 0.08 0.08 d 0.17 0.16 0.16 0.16 0.16 0.16 26 Table 2. 2. Freefield sound levels (L e q) at lm from source and power levels (Lw). Sources (dB) 125 250 500 l k 2k 4k unweighted M L S Leq L w 43.7 53.5 46.5 55.5 50.4 57.9 57.8 64.1 60.6 64.9 61.7 64.8 65.3 69.9 M R T L e q L^v 51.5 61.3 57.6 66.6 59.2 66.7 56.3 62.6 54.4 58.7 48.8 51.9 63.7 71.2 Noise L e q L w 50.4 61.3 59.4 70.3 62.5 73.4 • 51.5 62.4 43.4 54.3 33.4 44.3 64.7 75.6 The impulse responses predicted by CATT-Acoustic were convolved with the M R T lists recorded in the anechoic chamber and with the babble noise signals. Since the Maximum Length Sequences (MLS) were used to measure the room impulse responses, and a speech signal (MRT) was used to conduct the listening tests, two sets of predictions were made to represent each case. 
In order to compare predictions with the results measured using the M L S S A system, the M L S spectrum was used as the speech signal in the prediction of the acoustical parameters. However, for auralization, the M R T spectrum was used as the speech signal. Free-field sound levels at 1 m from the source were measured in the anechoic chamber using both M L S and M R T sources, and are presented in Table 2.2. For both real and virtual classroom tests, loudspeakers were used to reproduce the speech and noise. In order to account for a human speaker's directivity pattern, a loudspeaker designed for human voice testing, which had directional characteristics similar to that of the human voice, was used as the speech source. Figure 2.2 shows the measured directivities. An omnidirectional loudspeaker was used as the babble-noise source. The presentation levels of the speech-intelligibility tests were calibrated using the one-minute-average sound level (Leq.imin) of the M R T list 1, based on Table 2.2. Subjects completed the tests at all three receiver positions ( r l , r2, r3) for each configuration. For the virtual classroom test, head related-transfer functions (HRTFs) [13] and headphone transfer functions were included in the simulations to provide more realistic sound reproduction. In order to take into account the effects of head, shoulder, and pinna, the 27 front 0 back Figure 2. 2. Directivity patterns of the speech sources (a: loudspeaker radiating an MRT signal for speech intelligibility test; b: loudspeaker radiating am MLS signal for acoustical measurements and prediction. ... 125 Hz, -•- 250 Hz, — 500 Hz, — 1000 Hz, — 2000 Hz, — 4000 Hz). 28 average HRTF provided by CATT-Acoustic was used, along with headphone transfer function compensation for Beyer DT 990 Pro headphones [beyerdynamic, Heilbronn]. The computed binaural impulse responses were convolved with the M R T lists and the babble noise using CATT-Acoustic. CATT-Acoustic does not provide absolute calibration of auralization levels [14]. Thus, the sound levels at the three receiver positions for the convolved M R T and the convolved babble signals were calibrated relative to one another using the calibration function in CATT-Acoustic. The speech signal and the babble noise for each combination were combined; all of the sound processing was conducted using CATT-Acoustic. Since the difference between the 1-m free-field sound levels of the MRT, and of the babble noise was 1.0 dB (see Table 2.2), which is perceptually insignificant [15], the calibrated M R T and babble noise were added without sound-level scaling. The final auralization test materials were transferred to a compact disc for presentation via a CD player. Each virtual test was performed individually in a soundproof booth using the Beyer DT 990 Pro headphones. The headphone output levels were set by measuring the equivalent sound-pressure levels of the M R T signal at the drive unit. 2.3 Results and Analysis 2.3.1 Room acoustical parameters Figures 2.3a and b show the measured and predicted RTs and EDTs; Table 2.3 shows their differences. RT and EDT showed good agreement between measurement and prediction in both Room A and Room B. The differences were less than 10 %, except at r2 and r3 in Room A (see Table 2.3). A 10 % accuracy can be considered to be an indicator of the minimum practically significant difference [16]. 
Although the actual difference was not great in the high absorption classroom (Room A), the difference in percentage was greater because the measured RT and EDT were very low compared to Room B. C 5 0 showed disagreements between measurement and prediction for both Room A and Room B (see Figure 2.3c). The differences in the C 5 0 s varied from -3.7 dB to 5.6 dB, as shown in Table 2.3. The just noticeable difference (IND) in C 5 0 values is known to be 1.1 dB 29 Seats Seats Figure 2. 3. Measured and predicted reverberation metrics (a: Reverberation times; b: Early decay times; c: C 5 0 without babble; d: f/50 with babble. • : Predicted Room A; • : Measured Room A; • : Predicted Room B; o: Measured Room B). 30 Figure 2. 4. Measured and predicted speech and babble noise levels and speech-to-noise level differences (a: speech levels; b: babble levels; c: speech-to-noise ratio with background noise, d: speech-to-noise ratio with babble noise. • : Virtual Room A; • : Real Room A; • : Virtual Room B; o: Real Room B). 31 Table 2.3. Difference between prediction and measurement parameters. Predicted - Measured Room r l r2 r3 RT(s) A 0.02 (4.4 %) 0.13 (26.5 %) 0.16(29.1 %) B 0.03(1.6%) 0.01 (0.5 %) 0.00 (0.0 %) EDT(s) A 0.02(10.0%) -0.03 (-7.5 %) 0.00 (0.0 %) B -0.07 (-3.7 %) -0.11 (-5.7%) 0.03 (1.7%) C 5 0 ( d B ) A 2.1 5.6 1.3 B -0.9 -2.2 -3.7 SPEECH (dBA) A " 1.7 0.8 0.6 B 0.6 0.9 1.5 Uso (dB) A 3.4 1.8 -0.2 B 1.2 -0.7 -1.8 B A B B L E (dBA) A -0.2 1.4 1.0 B -0.3 0.8 0.3 [17]. The measured C 5 0 at r2 was lower than at r3 in both classrooms. In the high-absorption classroom (Room A), the predicted C 5 0 values were higher than the measured C 5 0 ; however, in the low-absorption classroom (Room B), the measured Cso values were in general higher than the predicted C 5 0 s. In Room A, C 5 0 was higher if EDT was lower; however, in Room B, the relationship between C 5 0 and EDT was not clear. It may be due to the large amount of late sound energy compared to the direct sound and early sound energies. Uso showed better agreement than C 5 0 . The differences in the USoS varied from -1.8 dB to 3.4 dB, as shown in Table 2.3. The JND for U50 might be higher than that for C 5 0 (no reference was found for the JND for U50), because U50 includes a noise component, and the noise might desensitize the hearing ability. Speech levels and babble noise levels were both measured and predicted. Figure 2.4 shows the speech levels, noise levels and speech-to-noise level difference, with background noise and with babble noise. Their differences are shown in Table 2.3. In most cases, the predicted sound levels were slightly higher than the measured levels. 32 2.3.2 Speech intelligibility tests The mean percentage speech-intelligibility scores and 95 % confidence intervals are shown in Figure 2.5. In the high absorption classroom (Room A) without babble noise, the difference between the virtual and real classrooms was greater than in the low-absorption classroom (Room B). Contrastingly, in Room B with the babble noise, the difference between the virtual and real classrooms was much greater than in Room A . The difference in the M R T scores between the virtual and real classrooms for Room B with the babble was more than 15 %, with the virtual classroom having lower speech intelligibility relative to the real classroom. Table 2.4 presents the mean speech-intelligibility scores and standard deviations. 
As expected, the standard deviations were generally greater when the mean speech-intelligibility scores were lower with babble noise and vice versa. When the babble noise was not present, the standard deviations in the real classrooms, both Room A and Room B, were greater than in the virtual classrooms, except at r3 in Room A and at r l in Room B. This suggests that the virtual classrooms are reliable for speech-intelligibility tests without significant noise signals. Contrastingly, when the babble noise was present, the standard deviations in the virtual classrooms were greater than in the real classrooms. This suggests that auralization is sensitive to babble noise. Table 2. 4. Mean speech-intelligibility scores and standard deviations. Room A Room B Without babble With babble Without babble With babble Virtual Real - Virtual Real Virtual Real Virtual Real Mean 99.17 91.83 92.93 92.33 94.67 97.00 80.83 95.50 SD 1.34 3.35 4.63 5.25 3.55 3.46 5.49 4.04 Mean 99.33 91.00 84.83 85.5 93.17 95.17 67.00 87.50 SD 1.30 4.05 7.26 4.52 2.76 4.71 7.21 6.88 Mean 96.33 94.00 77.17 80.33 91.67 94.33 59.33 80.00 SD 2.67 2.09 8.96 5.90 2.93 4.66 6.17 6.27 33 Seats Figure 2. 5. Measured and predicted speech-intelligibility test scores and 95% confidence interval (a: without babble; b: with babble. • : Virtual Room A; • : Real Room A; • : Virtual Room B; o: Real Room B). 3 4 Table 2. 5. Difference between predicted and measured MRT results. M R T Without Babble With Babble Room r l r2 r3 r l r2 r3 ^ A 7.34 8.33 2.33 0.50 -0.67 -3.16 Difference A (7.99) (9.15) (2.48) (0.54) (-0.78) (-3.93) (%) B -2.33 -2.00 -2.66 -14.67 -20.50 -20.67 (-2.40) (-2.10) (-2.82) (-15.36) (-23.43) (-25.84) A 8.16 6.30 2.55 0.25 -0.21 -0.98 Paired f-test A (0.000) (0.000) (0.027) (0.810) (0.836) (0.347) (p-value) B -2.38 -1.54 -2.4 -10.32 -6.54 -11.60 (0.036) (0.153) (0.035) (0.000) (0.000) (0.000) These results were interpreted by statistical analysis using the paired Mest, as shown in Table 2.5. Without babble, there was a highly significant difference in the speech intelligibility scores between the real and virtual classrooms in Room A at r l and r2, but no statistically significant difference in Room B. With babble, there was no statistical difference between the real and virtual classrooms in Room A, but in Room B there was a highly statistical difference. The mean speech-intelligibility scores were compared with C50, U50, and speech-to-noise level differences in Figure 2.6. Without babble noise, the speech intelligibility was highly correlated with C 5 0 and speech-to-noise level difference in Room B (see Figure 2.6a and b). However, there was no clear relationship between speech intelligibility and C 5 0 or speech-to-noise level difference in the high absorption classroom (Room A) . With babble noise, the speech intelligibility depends on U5o and speech-to-noise level difference in both the high and the low absorption classrooms (Room A and Room B) (see Figures 2.6c and d). The equations and coefficients of determination (R2) associated with the linear regression lines are presented in Table 2.6. Thus, in general, the mean speech-intelligibility scores are highly correlated with the speech-intelligibility metrics in both the low and high absorption classrooms, and in both real and virtual classrooms, with babble noise. 
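As an illustration of how regressions of the kind reported in Table 2.6 can be computed, the sketch below fits a straight line of mean speech-intelligibility score against a metric such as U50 and reports the coefficient of determination; the values used are invented placeholders, not the measured data.

```python
import numpy as np

def linear_fit(metric, si):
    """Least-squares line SI = a*metric + b, plus R^2 as a percentage."""
    metric = np.asarray(metric, dtype=float)
    si = np.asarray(si, dtype=float)
    a, b = np.polyfit(metric, si, deg=1)
    predicted = a * metric + b
    ss_res = np.sum((si - predicted) ** 2)
    ss_tot = np.sum((si - si.mean()) ** 2)
    return a, b, 100.0 * (1.0 - ss_res / ss_tot)

# Hypothetical U50 values (dB) and mean MRT scores (%) at three receivers.
u50 = [-1.0, -4.0, -7.0]
si = [92.0, 85.0, 77.0]
a, b, r2 = linear_fit(u50, si)
print(f"SI = {a:.4f}*U50 + {b:.3f}  (R^2 = {r2:.2f} %)")
```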
In quite absorptive classrooms, the difference in the mean speech-intelligibility scores between the virtual classroom and the real classroom was large, and the relationship between the mean speech-intelligibility scores and the speech-intelligibility metrics was not clearly seen.

Figure 2.6. Variation of speech intelligibility with acoustical parameters (a: C50; b: Speech-to-noise level difference without babble; c: U50; d: Speech-to-noise level difference with babble. •: Virtual Room A; •: Real Room A; •: Virtual Room B; o: Real Room B).

Table 2.6. Equations and coefficients of determination (R2) associated with linear regression of C50, U50, and speech-to-noise level difference with mean speech-intelligibility score.

Without babble, SI vs C50:
  Virtual Room A: SI = 0.2697·C50 + 94.777 (R2 = 32.76 %)
  Real Room A:    SI = 0.0393·C50 + 91.884 (R2 = 1.39 %)
  Virtual Room B: SI = 0.4971·C50 + 92.448 (R2 = 86.34 %)
  Real Room B:    SI = 0.7356·C50 + 92.762 (R2 = 74.18 %)
Without babble, SI vs SNR:
  Virtual Room A: SI = 0.2248·SNR + 93.996 (R2 = 64.66 %)
  Real Room A:    SI = -0.193·SNR + 95.753 (R2 = 46.22 %)
  Virtual Room B: SI = 0.5551·SNR + 80.931 (R2 = 97.43 %)
  Real Room B:    SI = 0.4409·SNR + 86.216 (R2 = 98.80 %)
With babble, SI vs U50:
  Virtual Room A: SI = 0.8324·U50 + 86.976 (R2 = 90.24 %)
  Real Room A:    SI = 0.7738·U50 + 89.252 (R2 = 88.57 %)
  Virtual Room B: SI = 2.3001·U50 + 82.477 (R2 = 96.79 %)
  Real Room B:    SI = 2.1998·U50 + 99.515 (R2 = 83.15 %)
With babble, SI vs SNR:
  Virtual Room A: SI = 0.8868·SNR + 88.978 (R2 = 90.68 %)
  Real Room A:    SI = 0.8408·SNR + 90.117 (R2 = 96.63 %)
  Virtual Room B: SI = 3.3649·SNR + 82.708 (R2 = 96.43 %)
  Real Room B:    SI = 2.4374·SNR + 99.329 (R2 = 97.87 %)

2.4 Discussion

Differences between the speech intelligibility in the real and virtual classrooms can be explained by one or a combination of the potential reasons listed below:
- the early-to-late energy ratio was not correctly estimated;
- surface diffusion was not sufficiently taken into account;
- the localization of the speech and noise sources in the virtual classrooms was not accurately modeled;
- sound reproduction via headphones was not the same as natural listening;
- the calibration of the speech signals and babble signals after convolution was not correctly done;
- the distortion of the electroacoustic equipment was too high.
Each of these points will be discussed in detail in this section.

Information about the early-to-late energy ratio can be obtained by considering the room impulse responses. As shown in Figures 2.7a and b, in Room A the predicted impulse response had a stronger direct sound compared to the measured impulse response. This made the predicted C50 values higher than the measured C50 in this configuration. In contrast, the predicted impulse response had a relatively long reverberant tail in the reverberant room, Room B. This can also be understood by comparing the reverberation times and early decay times shown in Figures 2.3a and b. In Room B, the predicted RTs and EDTs did not change much with receiver position in the room; however, the measured RTs were lower than the measured EDTs. Thus, the measured impulse response had more early energy than the predicted one. Even though the measured and predicted reverberation times were similar, the ratios of early energy to late energy differed.

Figure 2.7. Room impulse responses at r2 (a: Predicted Room A; b: Predicted Room B; c: Measured Room A; d: Measured Room B).

The room impulse responses (Figure 2.7) provide information about sound-field diffuseness as well as the early-to-late energy ratio. A number of strong individual peaks appear in the predicted room impulse responses in both Room A and Room B. Thus, the predicted rooms have more specular reflections than the measured rooms.
This suggests that the predicted rooms should have had more diffusion (i.e., higher diffusion coefficients). For the reverberant classroom, Room B, the real classroom speech-intelligibility scores were higher than the virtual classroom test scores, as in Kleiner's results [2]. He found that speech-intelligibility results using simulated sound fields with RTs 0.8 to 1.0 s were worse than for direct listening in the theatre tested. The standard deviation of the mean speech-intelligibility scores also supported the conclusion that the auralization technique was adversely influenced by the reverberation and noise, as shown in Table 2.4. It is difficult to localize sound in a reverberant room or in noise. Sound localization is adversely affected by reverberation [18,19]. Since Good and Gilkey [20] found adverse effects of noise in localization, it has been studied by many researchers [21,22,23,24]. However, this issue remains unsolved. It may be also caused by experimental errors associated with on-site listening testing in real classrooms. For example, in the real-classroom tests, listeners can move their heads to change the signal levels they receive, whereas in the virtual-classroom tests using headphones, the test-signal levels were kept constant. HRTFs are important to enhance spatial reproduction in an auralized sound field [25]. Wenzel et al. [26,27,28] compared individual HRTFs and non-individual HRTFs. The data showed that, while the interaural cues to horizontal location were robust, the spectral cues were distorted by a synthesis process that used non-individualized HRTFs. However, many listeners were able to obtain at least some useful directional information from an auditory display, without requiring the use of individualized HRTFs. The relatively small standard deviations of the mean speech-intelligibility scores in the virtual classrooms suggest that considering differences in the subjects' HRTFs may not be necessary, and that the effect of headphone distortion is negligible. It was necessary to perform convolution twice for each configuration, since we had two different sound signals — the speech and the babble noise. Calibration of the convolved 39 sound levels relied on CATT-Acoustic in this study. Absolute calibration was difficult to achieve due to the various scaling factors involved in the calculation [14]. It should be possible to improve the accuracy of the sound-level calibration using the relative level difference between the speech and the babble noise. In the high-absorption classroom, Room A, the differences between the real and virtual classrooms may not only be due to limitations in the auralization techniques. It may be caused by experimental errors in real-classroom measurements due to distortion added by the electro-acoustic equipment. As shown in Figures 2.6a and b, the speech intelligibility in Room A without noise did not correlate with the two objective speech-intelligibility metrics, C 5 0 and speech-to-noise level difference. Contrastingly, in the configuration which had differences between the real and virtual classrooms (Room B with babble noise), the speech-intelligibility scores were well correlated with both C 5 0 and speech-to-noise level difference. 2.5 S u m m a r y a n d C o n c l u s i o n s This comparison study showed that, i f the room to be auralized is not too absorptive (e.g., RT= 0.4 s at the center of the room) or noisy (e.g., SNS = 0 dB), speech-intelligibility tests using auralization are reliable. 
The fundamental assumption of the auralization technique is that the room impulse response characterizes the room. The calculated impulse response is convolved with sound signals to obtain the sounds to be auralized. How close the predicted impulse response is to the measured impulse response is the key factor determining the accuracy of the auralization procedure. In this study, the results for the virtual classroom demonstrated limitations of current auralization techniques in certain conditions. Auralizing reverberant sound fields with noise is challenging. The low-absorption virtual classroom (Room B) did not agree well with the real classroom in terms of the speech-intelligibility test scores. 40 Mixing two different sound signals - the M R T signals and the babble noise - turned out to be another problem. In real sound fields, a listener can distinguish the sound information which he/she wants to listen to, from other noises. Sound directivity and spatial hearing can help this hearing procedure. However, in virtual classrooms, when the babble noise was presented in the low-absorption classroom (Room B), the M R T words were completely 'mixed up' with the babble noise. In general, the prediction results showed good agreement for most room-acoustical parameters, except for C 5 0 , even though the M R T test scores showed some disagreement. References [I] B. Nordlund, T. Kihlman, and S. Lindblad, "Use of articulation tests in auditorium studies," J. Acoust. Soc. Am. 44 (1), 148-156 (1968). [2] M . Kleiner, "Speech-intelligibility In real and simulated sound fields," Acustica 47 (2), 55-71 (1981). [3] G. L . Ricard and S. L . Meirs, "Intelligibility and localization of speech from virtual directions," Human Factors 36 (1), 120-128 (1994). [4] J. Besing and J. Koehnke, "A Test of Virtual Auditory Localization," Ear Hear. 16 (2), 220-229(1995). [5] J. Koehnke and J. M . Besing, "A procedure for testing speech intelligibility in a virtual listening environment.," Ear Hear. 17 (3), 211-217 (1996). [6] W. G. Gardner and K . D. Martin, "HRTF measurements of a K E M A R , " J. Acoust. Soc. Am. 97(6), 3907-3908 (1995). [7] J. Peng, "Feasibility of subjective speech intelligibility assessment based on auralization," Appl. Acoust. 66, 591-601 (2005). [8] E. J. Kreul, J. C. Nixon, K . D. Kryter et al., " A proposed clinical test of speech discrimination," J. Speech Hear. Res. 11 (3), 536-552 (1968). [9] AuditecofSt. Louis (St. Louis, MO, 2003). [10] B . -I. Dalenback, CATT-Acoustic v8.0 Users' Manual (Gothenburg, 2004). [II] M . D. Egan, Architectural acoustics. (McGraw-Hill, New York, 1988), pp.52-53. [12] M . Hodgson and E . - M . Nosal, "Effect of noise and occupancy on optimal reverberation times for speech intelligibility in classrooms," J. Acoust. Soc. Am. I l l (2), 931-939 (2002). [13] F. L . Wightman and D. J. Kistler, "Headphone simulation of free-field listening. II: Psychophysical validation," J. Acoust. Soc. Am. 85 (2), 868-878 (1989). [14] C A T T , in User's Manual (Gothenburg, 2002), pp. 7-38-39. 41 [15] E. Zwicker and H . Fasti, Psycho-acoustics: Facts and models, 2nd ed. (Springer-Verlag, Heidelberg, 1999), pp. 175-182. [16] S. R. Bistafa and J. S. Bradley, "Predicting reverberation times in a simulated classroom," J. Acoust. Soc. Am. 108 (4), 1721-1731 (2000). [17] J. S. Bradley, R. D . Reich, and S. G. Norcross, "On the combined effects of signal-to-noise ratio and room acoustics on speech intelligibility," J. Acoust. Soc. Am. 106 (4), 1820-1828 (1999). [18] C. Giguere and S. 
M . Abel, "Sound localization: Effects of reverberation time, speaker array, stimulus frequency, and stimulus rise/decay," J. Acoust. Soc. Am. 94 (2), 769-776 (1993). [19] B . G. Shinn-Cunningham, N . Kopco, and T. J. Martin, "Localizing nearby sound sources in a classroom: Binaural room impulse response," J. Acoust. Soc. Am. 117 (5), 3100-3115 (2005). [20] M . D . Good and R. H . Gilkey, "Sound localization in noise: The effect of signal-to-noise ratio," J. Acoust. Soc. Am. 99 (2), 1108-1117 (1996). [21] K . S. Abouchacra, D . C. Emanuel, I. M . Blood et al., "Spatial perception of speech in various signal to noise ratios," Ear Hear. 19 (4), 298-309 (1998). [22] C. Lorenzi, S. Gatehouse, and C. Lever, "Sound localization in noise in normal-hearing listeners," J. Acoust. Soc. Am. 105 (3), 1810-1820 (1999). [23] C. Lorenzi, S. Gatehouse, and C. Lever, "Sound localization in noise in hearing-impaired listeners," J. Acoust. Soc. Am. 105 (6), 3454-3463 (1999). [24] E. H . A . Langendijk, D . J. Kistler, and F. L. Wightman, "Sound localization in the presence of one or two distracters," J. Acoust. Soc. Am. 109 (5), 2123-2134 (2001). [25] J. Blauert, Spatial hearing: the psychophysics of human sound localization, Revised ed. (Hirzel Verlag, Stuttgart, 1996), pp.372-392. [26] E. M . Wenzel, F. L . Wightman, and D . J. Kistler, "Localization with non-individualized virtual display cues," Proc. SIGCHI (1991). [27] E. M . Wenzel, M . Arruda, D . J. Kistler et al., "Localization using nonindividualized head-related transfer functions," J. Acoust. Soc. Am. 94 (1), 111-123 (1993). [28] E. M . Wenzel and D . R. Begault, "Are individualized head-related transfer functions required for auditory information displays?" J. Acoust. Soc. Am. 105 (2), 1035 (1999). 42 3 OPTIMUM REVERBERATION TIMES FOR SPEECH INTELLIGIBILITY FOR NORMAL AND HEARING-IMPAIRED LISTENERS IN IDEALIZED CLASSROOMS WITH DIFFUSE SOUND FIELDS * 3.1 Introduction Verbal communication is one of the most important acoustical activities in rooms, from small meeting rooms and classrooms to larger auditoria and conference rooms. The acoustical designs of such rooms should therefore achieve a high degree of speech intelligibility for listeners. Speech intelligibility is directly related to speech-to-noise level difference, and is inversely related to reverberation time [1]. However, in rooms the situation is complicated by the fact that reverberation and steady-state levels of speech and noise interact. Increased reverberation increases both speech and noise levels by increasing the reverberant sound energy; this increases or decreases the speech-to-noise level difference at a listener position depending on the listener's relative distances to the speech and noise sources. Here we consider rooms with approximately diffuse sound fields, for which reverberation can be accurately described by the reverberation time (RT), so that the results can be related to previous experimental research. The literature reports a number of experimental and theoretical studies which investigated the relationship between the prevailing acoustical conditions and resulting speech intelligibility. These studies accounted for noise and the interaction between reverberation and speech-to-noise level difference, with varying degrees of realism. A brief overview of the literature is presented here - see Ref. 2 for a full review and discussion. * A version of this chapter has been published in J. Acoust. Soc. Am. 120(2) 801-807. Yang, W . and Hodgson, M. 
(2006) Auralization study of optimum reverberation times for speech intelligibility for normal and hearing-impaired listeners in classrooms with diffuse sound fields. 43 Nabelek and Robinson [3] showed that, in the absence of noise, speech intelligibility was inversely related to reverberation time - that is, the optimal reverberation time for speech intelligibility was zero. Nabelek and Pickett [4,5] and Finitzo-Hieber and Tillman [6] performed speech-intelligibility tests with normal-hearing and hearing-impaired subjects, for various fixed speech-to-noise ratios in rooms with various fixed reverberation times, again finding that speech intelligibility decreased with increased reverberation time. Hearing-impaired people were more sensitive to reverberation than normal-hearing people. However, these experimental studies were unrealistic in effectively assuming a diffuse sound field by involving exponential sound decays, and in not accounting for the interaction between reverberation and sound levels. The theoretical studies, on the other hand, were based on speech-intelligibility metrics which are considered to be good predictors of speech intelligibility, and which account for the interaction. Bradley [1], using both subjective test results and theoretical prediction of the USo useful-to-detrimental energy metric in diffuse sound fields, found optimum reverberation times of 0.4 to 0.5 s for classrooms with a uniform background-noise level of 30 dBA. Houtgast et al. [7] used a numerical model and found non-zero optimum reverberation times for a variety of rooms with non-diffuse sound fields, and for various speech-to-noise level differences. The effects of reverberation on noise were incorporated by considering the audience as a collection of individual noise sources. Bistafa and Bradley [8] used a theoretical model and predicted non-zero optimal reverberation times in diffuse sound fields using a number of metrics. They found that increased reverberation increased early energy and intelligibility, but that too much reverberation decreased intelligibility. However, noise levels were again unrealistically assumed to be uniform throughout the room. Hodgson and Nosal [2] incorporated noise into a theoretical model in a realistic manner, by including noise sources in the rooms, which had diffuse sound fields. Predicted optimum reverberation times depended on the source directivities, the speech-to-noise level difference at the listener's position, the positions and orientations of the speaker and the noise source, and the number of noise sources. They found that i f the speech source was farther from the listener than, the noise source, then speech levels increased more with reverberation than did noise levels, and the level difference increased with reverberation, tending to increase intelligibility. If, on the other hand, the noise source was farther from the listener than the speech source, then noise levels increased 44 more with reverberation than did speech levels, tending to decrease intelligibility. Thus, the effect of reverberation on intelligibility depended on the relative distances from the listener to the speech and the noise sources. The effect increases with source/receiver distance and, therefore, is greater when source/receiver distances are larger than the reverberation radius (or critical distance). 
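The interaction described above can be illustrated numerically with classical diffuse-field theory. The sketch below is an idealization, not the prediction model used in this thesis: it computes steady-state levels from L_p = L_w + 10·log10(Q/(4·pi·r^2) + 4/A), with total absorption A approximated by 0.161·V/RT, using assumed source powers, directivities and distances, and shows how the received speech-to-noise level difference changes with RT when the noise source is closer to the listener than the speech source.

```python
import numpy as np

def diffuse_level(lw, q, r, volume, rt):
    """Steady-state SPL (dB) in an approximately diffuse room:
    direct term Q/(4*pi*r^2) plus reverberant term 4/A, A = 0.161*V/RT."""
    a = 0.161 * volume / rt                     # total absorption, m^2 (Sabine)
    return lw + 10 * np.log10(q / (4 * np.pi * r**2) + 4 / a)

volume = 385.0                                  # m^3, as in the virtual classroom
lw_speech, lw_noise = 65.0, 65.0                # assumed source power levels, dB
q_speech, q_noise = 2.0, 1.0                    # assumed directivity factors
r_speech, r_noise = 6.0, 3.0                    # noise source closer than the speaker

for rt in (0.2, 0.4, 0.8):
    ls = diffuse_level(lw_speech, q_speech, r_speech, volume, rt)
    ln = diffuse_level(lw_noise, q_noise, r_noise, volume, rt)
    print(f"RT = {rt:.1f} s:  speech {ls:.1f} dB, noise {ln:.1f} dB, "
          f"SNR {ls - ln:+.1f} dB")
```

With these assumed positions the received level difference improves slightly as RT increases, consistent with the argument that reverberation helps when the speech source is the more distant of the two.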
In this paper, auralization techniques are used to identify optimal reverberation times in an idealized classroom with speech and babble-noise sources and an approximately diffuse sound field, in order to validate theoretical prediction. Considering individual speech and noise sources is more realistic. Subjective testing has the potential to be more accurate than theoretical prediction, since it directly reflects listeners' perception. Optimal reverberation times are found by performing speech-intelligibility tests with normal-hearing and hearing-impaired adult subjects. Considering the Ut speech-intelligibility metric, which has been shown to be well-suited to the prediction of speech intelligibility in classrooms [1], the early/late energy time t that best predicts speech intelligibility is found for both subject groups. Involving both normal-hearing and hearing-impaired subjects allows similarities and differences between these two subject groups to be determined.

3.2 Theoretical Considerations

Ut is a metric based on the useful-to-detrimental energy-ratio concept. This concept divides the acoustical energy received after the arrival of the direct sound into useful and detrimental parts. The useful part consists of the direct energy from the speaker, Ed, and the early-arriving, reflected energy from the speaker, Ee. The remaining reflected, or late-arriving, energy, El, is considered detrimental. In addition to the late-arriving reflected energy, noise energy, En, is detrimental. Thus, the useful-to-detrimental ratio calculated from measured data is defined as

U_{t,\mathrm{m}} = 10\log_{10}\left(\frac{E_d + E_e}{E_l + E_n}\right)\ \mathrm{dB}. \qquad (1)

The Ut useful-to-detrimental ratio can also be predicted based on diffuse-field theory [2]:

U_{t,\mathrm{p}} = 10\log_{10}\left[\frac{r_h^2/r_s^2 + 1 - e^{-kt}}{e^{-kt} + 10^{(L_{n\mathrm{fl}} - L_{s\mathrm{fl}})/10}\left(r_h^2/r_n^2 + 10^{(q_s - q_n)/10}\right)}\right]\ \mathrm{dB}, \qquad (2)

where the subscripts 'm' and 'p' refer to measured and predicted data, respectively. Here k = ln(10^6)/RT, where RT is the reverberation time, rs is the distance from the speech source to the listener, rn is the distance from the noise source to the listener, and rh is the reverberation radius (or critical distance) associated with the speech source. Lsfl and Lnfl are the long-term anechoic levels at 1 m directly in front of the speech and noise sources, in a free field, respectively; qs and qn are their directivity indices.
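As a concrete illustration of Eq. (1), the sketch below computes Ut from a schematic impulse response and a given received speech-to-noise level difference. It is a simplified stand-in for the octave-band processing actually used: the direct sound is taken as the energy peak of the impulse response, and the noise energy is set from an assumed SNR relative to the total speech energy.

```python
import numpy as np

def useful_to_detrimental(ir, fs, t_early_ms, snr_db):
    """Ut per Eq. (1): 10*log10[(E_d + E_e) / (E_l + E_n)], with the
    early/late split t_early_ms after the direct sound, and E_n derived
    from the received speech-to-noise level difference."""
    energy = ir.astype(float) ** 2
    i_direct = int(np.argmax(energy))                  # arrival of the direct sound
    i_split = i_direct + int(fs * t_early_ms / 1000.0)
    e_useful = energy[i_direct:i_split].sum()          # E_d + E_e
    e_late = energy[i_split:].sum()                    # E_l
    e_speech = e_useful + e_late
    e_noise = e_speech / (10 ** (snr_db / 10.0))       # E_n from the SNR
    return 10 * np.log10(e_useful / (e_late + e_noise))

# Schematic exponentially decaying impulse response (decay ~ RT 0.8 s).
fs = 44100
t = np.arange(int(1.0 * fs)) / fs
rng = np.random.default_rng(1)
ir = rng.standard_normal(t.size) * np.exp(-6.9 * t / 0.8)
ir[0] = 5.0                                            # emphasized direct sound
print(f"U50 = {useful_to_detrimental(ir, fs, 50, snr_db=0.0):.1f} dB")
print(f"U80 = {useful_to_detrimental(ir, fs, 80, snr_db=0.0):.1f} dB")
```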
3.3 Experimental Methodology

3.3.1 Classroom and sound-field simulation procedures

In this study, one of the objectives was to model an idealized room with an approximately diffuse sound field and exponential sound decay. Thus, the design was based on previous research into the factors that relate to a diffuse sound field [9]. The virtual classroom was based on a real 95-seat classroom of simple, rectangular geometry. It was 11 m long, 7 m wide and 5 m high (volume = 385 m3). Predictions were made in octave bands from 125 to 4000 Hz. The same absorption coefficients were used for all octave bands and for all surfaces. The absorption coefficient was varied to achieve different reverberation times. Values of 1.0, 0.68, 0.40 and 0.21 were used, respectively, to obtain reverberation times of 0.0, 0.2, 0.4 and 0.8 s. The corresponding reverberation radii varied from 3 to 1 m. In order to avoid strong specular reflections from the surfaces, and to promote diffuse fields with exponential sound decays, all surfaces were defined to be 30 % diffusely reflecting [10]. CATT-Acoustic v8.0 [11] was used to predict and auralize the sound fields. The number of rays and the truncation time were 10,088 and 1.0 s, respectively, for both prediction and auralization. In order to verify the diffuseness of the simulated sound fields, predicted EDTs and RTs were compared; these should be very similar in a diffuse sound field with an exponential sound decay. The differences were always less than 0.05 s at mid frequencies (500, 1000, and 2000 Hz).

The classroom contained a speech source, a noise source and a virtual listener, all located at least 2 m from surfaces. Two noise-source positions were considered - one between the speech source and the listener, such that the noise source was closer to the listener than the speech source (Noise 1), and one farther from the speech source than the listener (Noise 2). Figure 3.1 shows the floor-plan of the virtual room, with the relative positions of the listener, the speaker, and the noise sources. The speech source had human-like directivity; the noise source was omnidirectional. The listener and the speech source faced each other. The relative output power levels of the speech and noise sources were chosen to give differences (SNS) of 0 and 5 dB. Note that SNS is different from the difference in the speech and noise levels at a receiver location (SNR). Table 3.1 lists the SNR for both SNS values. The values of SNS and RT were selected on the basis of preliminary listening tests, which covered a wide range of SNS and RT values and allowed the more limited ranges used here to be identified - ranges resulting in realistic SNR values and expected to contain the optimal RT values. The test RT values were additionally chosen to cover a range including zero and the optimal values specified in classroom standards [1].

Binaural impulse responses between the listener and the speech and noise sources were predicted. In order to take into account the effects of the head, shoulder and external auditory systems of the virtual listener, the head-related transfer functions (HRTFs) provided with the CATT-Acoustic system were used. Headphone playback without equalization, and diffuse-field HRTF data, were used in this work. The resulting sound fields were auralized using the CATT-Acoustic software. The speech-signal level was chosen to approximate typical classroom levels. The noise-source levels were set relative to the input speech level to achieve the two test SNS values. A total of 16 different sound-field configurations were created, consisting of all combinations of the two speech- and noise-source relative output levels (SNS = 0 and 5 dB), the four reverberation times (RT = 0, 0.2, 0.4, and 0.8 s), and the two positions of the noise source (Noise 1 and Noise 2). As shown in Table 3.1, the speech-to-noise level differences received at the listener position (SNR), corresponding to the two SNS values, varied in the various configurations from -6 dB to +8.5 dB.

Table 3.1. Received speech-to-noise level differences (SNR in dB) for all test sound-field configurations.

            SNS = 0 dB              SNS = 5 dB
RT (s)      Noise 1    Noise 2      Noise 1    Noise 2
0.0         -6.0       3.5          -1.0       8.5
0.2         -4.5       2.1           0.4       7.2
0.4         -2.8       1.2           2.1       6.2
0.8         -1.5       0.6           3.3       5.7

Figure 3.1. Floor-plan and elevation of the virtual classroom, showing the speaker, listener and noise-source positions. All coordinates are in metres.

The Modified Rhyme Test (MRT) [12] was used as the speech-intelligibility test method.
Twelve, fifty-word M R T word lists recorded by a male, native-Canadian talker in an anechoic chamber were combined through the CATT-Acoustic system with four-talker babble noise (available from A U D I T E C H [13]). The MRT-speech and babble-noise signals for each test configuration were mixed together using the Goldwave v5.1 sound-editing program [14], at levels corresponding to the predicted sound-pressure levels at the listener position. The resulting, final auralization test materials were transferred to a compact disc for presentation to subjects using a CD player. The test material was replayed through Sony M D R V600 headphones in a soundproof room. Each subject was tested individually. Each listened to a complete list of 50 words for each of the 16 different sound-field configurations. In order to avoid score inflation caused by the closed-set method1 used here, subjects were instructed not to guess the answer. The tests were presented in randomized order. The presentation levels were set by the predicted levels of the sound-field configurations, and used for both normal and hearing-impaired groups. 3.3.2 Subjects Hearing-screening tests were done prior to the speech-intelligibility testing, to identify the hearing categories of the subjects. Subject groups for the study were normal-hearing adults, and hearing-impaired adults with a mild to moderate sensorineural hearing loss, whose first language was English. The hearing-loss criteria for the hearing-impaired subjects were lower than 25 dBHL (HL = Hearing Loss) between 250 Hz and 1 kHz, and between 30 and 55 dBHL from 2 to 8 kHz, with no more than 15 dB difference between the two ears at any two frequencies. This represents a typical frequency response for sensorineural hearing loss [15]. The hearing-impaired subjects in this work did not use hearing aids in their everyday lives. Data collection was done at two different sites: the University of British Columbia, Vancouver, B C ( 'UBC' ) and Central West Health, Grand Falls-Windsor, N L ( 'CWH'). Forty-three normal-hearing and twenty-eight hearing-impaired subjects, with mean ages of 26 and 48 years, respectively, completed the tests. For the normal-hearing subjects, the difference between the U B C and C W H groups was not statistically significant. For the 49 hearing-impaired subjects, the difference was statistically significant (p < 0.05), with the C W H hearing-impaired group showing a lower average M R T score than the U B C group. However, the subject groups at the two locations showed similar variations of scores with the different test configuration (i.e. for different SNS and RT); thus the results for the two groups were combined. The exact reason for the difference in the results for the hearing-impaired subjects at the two test sites is not known. 3.4 Results 3.4.1 Speech Intelligibility Mean speech-intelligibility scores for the sixteen sound-field configurations were calculated separately for the normal-hearing and hearing-impaired subject groups. Results are presented as the percentage of correct responses. Figure 3.2 shows the variations of the mean speech-intelligibility score (with 95 % confidence interval) with reverberation time and noise-source position. For normal-hearing subjects, when the noise source was farther from the listener than the speaker (Noise 2), with either SNS = 5 dB or 0 dB, the mean speech-intelligibility scores exceeded 85 % for all reverberation times. 
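The mean scores and 95 % confidence intervals of the kind plotted in Figure 3.2 can be computed from per-subject percentage-correct scores as in the sketch below; the values shown are invented placeholders, not the study data.

```python
import numpy as np
from scipy import stats

def mean_and_ci(scores, confidence=0.95):
    """Mean MRT score (%) and a t-based confidence-interval half-width."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    sem = stats.sem(scores)                        # standard error of the mean
    half_width = sem * stats.t.ppf(0.5 + confidence / 2.0, df=scores.size - 1)
    return mean, half_width

# Hypothetical per-subject scores for one sound-field configuration.
scores = [88.0, 92.0, 84.0, 90.0, 86.0, 94.0, 82.0, 90.0]
m, hw = mean_and_ci(scores)
print(f"mean = {m:.1f} %, 95 % CI = +/- {hw:.1f} %")
```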
Analysis of variance (ANOVA) was employed to compare the sensitivity of the scores to variations in the speech- and noise-source output-level difference (SNS) and reverberation time (RT). The mean scores for the four different RTs were found to be statistically different (p < 0.0005). The mean scores at each RT were ranked using Tukey's paired comparison test [16] (α = 0.05). The rank orders of the mean speech-intelligibility scores varied in inverse relation to the RT, except for two pairs for which the scores were not statistically different: RT = 0.0 and 0.2 s with SNS = 5 dB, and RT = 0.4 and 0.8 s with SNS = 0 dB. For RTs of 0.2 and 0.4 s, the mean speech-intelligibility scores with SNS = 5 dB were significantly higher than those with SNS = 0 dB (p < 0.001). However, the difference at RT = 0.8 s was not significant, although the mean speech-intelligibility scores with SNS = 5 dB were lower than those with SNS = 0 dB. When RT was 0.0 s there was no statistically significant difference between SNS = 5 dB and SNS = 0 dB (p = 0.554).

For hearing-impaired subjects, with Noise 2, the mean speech-intelligibility scores were between 68.3 and 80.0 % with SNS = 5 dB, and between 64.3 and 72.5 % with SNS = 0 dB. The differences between the scores for the normal-hearing and hearing-impaired subjects were 13.3 to 19.5 % with SNS = 5 dB, increasing to 17.8 to 22.7 % with SNS = 0 dB. The ANOVA results for the difference in the mean speech-intelligibility test scores between the normal-hearing and hearing-impaired subjects indicated a statistically significant difference (p < 0.0005). For the hearing-impaired subjects in this case, the 95 % confidence intervals overlapped at the different RTs; therefore, statistical confidence in the rank order was lower. The peak (i.e. the locally highest value, possibly not statistically significant) occurred at RT = 0.2 s with SNS = 5 dB, and its confidence interval was relatively narrow; with SNS = 0 dB, the highest mean speech-intelligibility score was at RT = 0.0 s. There were no significant differences between the scores with SNS = 5 dB and 0 dB, except in the case of RT = 0.2 s (p < 0.05).

For normal-hearing subjects, when the noise source was positioned between the listener and the speaker (Noise 1), with SNS = 5 dB, mean speech-intelligibility scores varied from 80.4 to 88.3 %. There were two peaks in the score, at RT = 0.0 and 0.4 s; the difference between them was not statistically significant according to their rank order by Tukey's pairwise comparison test (α = 0.05). For the hearing-impaired subjects in the same conditions, mean speech-intelligibility scores varied from 61.7 to 67.6 %. Of the values tested, the highest score occurred at RT = 0.8 s. The differences between the scores for the normal-hearing and hearing-impaired subjects varied from 12.8 to 22.3 % with SNS = 5 dB, and the differences were statistically significant.

In the case of Noise 1, with SNS = 0 dB, mean speech-intelligibility scores increased with increasing reverberation time for both the normal-hearing and hearing-impaired groups. Of the RTs tested, the mean speech-intelligibility score had its highest value at RT = 0.8 s for both subject groups. The mean scores for the four RTs were again ranked using Tukey's paired comparison test (α = 0.05), and were found to vary directly with RT, with 99.9 % confidence.
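The ranking procedure described above can be reproduced with standard statistical tooling. The sketch below runs a one-way ANOVA across the four RT conditions and Tukey's pairwise comparison on invented placeholder scores; statsmodels' pairwise_tukeyhsd (which must be installed separately) stands in for the specific test implementation used in the thesis.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(2)
rts = [0.0, 0.2, 0.4, 0.8]
# Hypothetical per-subject scores for each RT (one SNS/noise configuration).
groups = [np.clip(rng.normal(90 - 5 * i, 4, size=20), 0, 100)
          for i, _ in enumerate(rts)]

f_stat, p_val = stats.f_oneway(*groups)            # one-way ANOVA across RTs
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4f}")

scores = np.concatenate(groups)
labels = np.repeat([f"RT={rt}s" for rt in rts], [len(g) for g in groups])
print(pairwise_tukeyhsd(scores, labels, alpha=0.05))  # pairwise ranking of RTs
```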
SNS = 0 dB, Noise 1. ( ) normal-hearing; ( ) hearing-impaired. The lowest mean speech-intelligibility scores measured in the various test configurations varied between 66.2 and 75.3 % for normal-hearing subjects, and between 36.7 and 55.1 % for hearing-impaired subjects. The differences between the scores for the two subject groups varied from 20.2 to 29.5 %, the biggest differences seen among the four cases. The difference between the results for SNS = 0 and 5 dB was statistically significant (p < 0.0005) for both subject groups. 3.4.2 Best predicting early-time limit For each sound-field configuration, useful-to-detrimental ratios were calculated from the predicted impulse responses and the applicable speech and noise levels, according to Eq. (1). They were also predicted using Eq. (2) for comparison with theory. Early-time limits of t = 20, 30, ..., 120 ms were used. In order to identify the early-time limits which best predicted the measured speech intelligibility, regression analyses were performed on the mean speech-intelligibility scores for each sound-field configuration. Since the relationships were clearly not linear or quadratic, and following Bradley [17], third-order polynomials were fit. Table 3.2 shows the strengths of the relationships - quantified by the goodness-of-fit measure, R² - between each measure and speech intelligibility, for both the normal-hearing and hearing-impaired results.

Table 3.2. Coefficients of determination (R²) associated with third-order-polynomial regression fits for each U_t value, for both normal-hearing and hearing-impaired subjects. The highest values are in bold.

        Normal-hearing           Hearing-impaired
        Measured   Predicted     Measured   Predicted
U20     74.7       72.6          70.2       66.4
U30     81.1       80.0          81.4       78.6
U40     83.8       84.3          88.6       86.6
U50     85.2       86.0          92.6       91.0
U60     84.7       86.4          93.9       93.3
U70     83.9       86.2          94.5       94.5
U80     83.2       85.7          94.5       95.0
U90     82.4       85.1          94.5       95.3
U100    81.9       84.4          94.4       95.3
U110    81.3       83.8          94.3       95.2
U120    80.8       83.3          94.1       95.1

Since the form of the trendline and the number of data points were the same in every case, the success of each measure can be compared through the corresponding R² values. The trends in the calculated and predicted results were similar. U50,m and U60,p were most accurate at predicting the speech-intelligibility results for the normal-hearing subjects. However, for the hearing-impaired subjects, U70,m, U80,m and U90,m, and U90,p and U100,p, predicted the results best; that is, with early-time limits 20 - 40 ms higher than for the normal-hearing group. That the limit was higher in prediction than in measurement may be due to the fact that the sound fields in the virtual rooms were not perfectly diffuse, as assumed in prediction. The U50,m (for normal-hearing, 'NH') and U80,m (for hearing-impaired, 'HoH') regression curves are shown in Figure 3.3. The corresponding equations are as follows: Normal-hearing: SI_NH,m = 85.2 + 1.94·U50,m - 0.167·U50,m² + 0.00911·U50,m³ (R² = 85.2 %); SI_NH,p = 85.6 + 1.77·U60,p - 0.176·U60,p² - 0.012·U60,p³ (R² = 86.4 %). Hearing-impaired: SI_HoH,m = 64.1 + 2.86·U80,m - 0.232·U80,m² + 0.010·U80,m³ (R² = 94.5 %); SI_HoH,p = 64.7 + 2.66·U100,p - 0.245·U100,p² - 0.013·U100,p³ (R² = 95.3 %). Figure 3.3. Variation of mean speech-intelligibility score with useful-to-detrimental ratio using the best-fit early-time limit for all sound-field configurations, and the best-fit third-order-polynomial regression curves: a. normal-hearing, U50,m; b. hearing-impaired, U80,m.
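For concreteness, the sketch below shows how a useful-to-detrimental ratio of the kind tabulated above can be computed from an impulse response and a received speech-to-noise level difference. It follows the common Bradley-style formulation rather than the exact Eqs. (1) and (2) of this chapter (which are not reproduced in this extract), and the synthetic exponential-decay impulse response is only an assumption for demonstration.

```python
import numpy as np

def useful_to_detrimental(ir, fs, snr_db, t_early_ms=50):
    """Common formulation (after Bradley): U_t = 10*log10(E_early /
    (E_late + E_noise)), with the noise energy expressed relative to the
    total speech energy through the received speech-to-noise level
    difference SNR (dB)."""
    e = ir.astype(float) ** 2                      # squared impulse response
    n_early = int(round(t_early_ms * 1e-3 * fs))
    e_total = e.sum()
    e_early = e[:n_early].sum()
    e_late = e[n_early:].sum()
    e_noise = e_total * 10.0 ** (-snr_db / 10.0)
    return 10.0 * np.log10(e_early / (e_late + e_noise))

# Example: sweep the early-time limit from 20 to 120 ms for one configuration
# (synthetic exponential-decay impulse response, RT = 0.8 s, SNR = 0 dB).
fs = 44100
t = np.arange(0, int(1.5 * fs)) / fs
rng = np.random.default_rng(0)
ir = rng.standard_normal(t.size) * np.exp(-6.91 * t / 0.8)   # 60 dB decay in 0.8 s
for t_ms in range(20, 130, 10):
    print(t_ms, round(useful_to_detrimental(ir, fs, snr_db=0.0, t_early_ms=t_ms), 2))
```

In this formulation, lengthening the early-time limit moves energy from the detrimental to the useful term, which is why the best-fitting limit can differ between listener groups.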
3.5 Discussion The speech-intelligibility results for the normal-hearing and hearing-impaired subjects obtained with the Modified Rhyme Tests revealed some basic differences in perception by the two groups, and also some similarities. Normal-hearing subjects in our study showed decreased speech intelligibility with increased reverberation time when the speech source was closer to the listener than the noise source (Noise 2). When the noise source was between the speech source and the listener (Noise 1), of the RTs tested, the optimal reverberation time varied from 0.4 to 0.8 s as the SNS varied from 5 to 0 dB, except for normal-hearing subjects with SNS = 5 dB, for whom the highest score was obtained with RT = 0 s. Apart from this result, the results are in good agreement with those of Hodgson and Nosal [2] (who also proposed a detailed explanation for their results). In general, hearing-impaired subjects in our study showed similar trends to the normal-hearing subjects. This is consistent with recent work by Bradley, Sato and Picard [18]. Of course, the hearing-impaired listeners were more adversely affected by reduced speech-to-noise level difference. Increased reverberation time generally resulted in increased speech intelligibility when the noise source was closer than the speech source to the listener (Noise 1). Increased reverberation time above RT = 0.2 s decreased speech intelligibility when the speech source was closer to the listener than the noise source (Noise 2). When the noise source was farther from the listener than the speech source, the optimal reverberation time included zero and low, non-zero values. The results for the hearing-impaired subjects had relatively large standard deviations on their mean speech-intelligibility scores. The greatest standard deviations always occurred at RT = 0 s among the four reverberation times. The smallest standard deviations always occurred at the RTs resulting in the peak mean speech-intelligibility scores - i.e. at the optimal reverberation times. When the noise source was farther from the listener than the speech source, the difference in score resulting from the two different speech- and noise-source output-level differences (SNS) was smaller than when the noise source was between the speech source and the listener, for both the normal-hearing and hearing-impaired subject groups. 3.6 Conclusions The results of this work generally support previous theoretical predictions [2]. With the noise source incorporated in a realistic manner, the optimal reverberation times depended on its position relative to the speaker and the listener in a room. The optimal reverberation time was zero or near zero when the noise source was farther from the listener than the speaker; zero and non-zero reverberation times were found to be optimal when the noise source was between the listener and the speaker. If the speech-to-noise level difference is adverse for a subject group, some reverberation is required to increase the speech signal. The best early-time limit in the useful-to-detrimental energy ratio was 50 - 60 ms for normal-hearing subjects. For hearing-impaired subjects, U70, U80 or U90 were the most accurate predictors of the mean speech-intelligibility score. Hearing-impaired subjects apparently require more early energy than normal-hearing subjects over this range of speech-to-noise level differences.
In this study, a simple, idealized classroom with an approximately diffuse sound field and exponential sound decay was studied, as was the case in previous experimental work reported in the literature. Thus, the results depended on the overall reverberation in the room. They also depend on the source and receiver locations involved; Ref. 2 contains further discussion of the effect of varying these parameters. Ignored in this study is the influence of detailed room-acoustical factors such as individual reflections from the walls, floor or ceiling; these relate to the exact room geometry and surface-absorption distribution, and exist in realistic rooms. It would be interesting to repeat the study using a more realistic model of a classroom to improve the current work. The optimal reverberation might be found to vary from room to room depending, for example, on details of the arrival of reflections at the receiver. It would also be interesting to include replay-headphone equalization and angularly-varying HRTF data in the simulations, though the relatively small standard deviations of the mean speech-intelligibility scores among the normal-hearing subjects suggest that considering differences in the subjects' HRTFs may not be necessary, and that the effect of headphone distortion is negligible. References [1] J. S. Bradley, "Speech intelligibility studies in classrooms," J. Acoust. Soc. Am. 80(3), 846-854 (1986). [2] M. R. Hodgson and E.-M. Nosal, "Effect of noise and occupancy on optimal reverberation times for speech intelligibility in classrooms," J. Acoust. Soc. Am. 111(2), 931-939 (2002). [3] A. K. Nabelek and P. K. Robinson, "Monaural and binaural speech perception in reverberation for listeners of various ages," J. Acoust. Soc. Am. 71(5), 1242-1248 (1982). [4] A. K. Nabelek and J. M. Pickett, "Monaural and binaural speech perception through hearing aids under noise and reverberation with normal and hearing-impaired listeners," J. Speech Hear. Res. 17(4), 724-739 (1974). [5] A. K. Nabelek and J. M. Pickett, "Reception of consonants in a classroom as affected by monaural and binaural listening, noise, reverberation, and hearing aids," J. Acoust. Soc. Am. 56(2), 628-639 (1974). [6] T. Finitzo-Hieber and T. W. Tillman, "Room acoustics effects on monosyllabic word discrimination ability for normal and hearing-impaired children," J. Speech Hear. Res. 21(3), 441-458 (1978). [7] T. Houtgast, H. J. M. Steeneken and R. Plomp, "Predicting speech intelligibility in rooms from the modulation transfer function. II. Mirror image computer model applied to rectangular rooms," Acustica 46(1), 73-81 (1980). [8] S. R. Bistafa and J. S. Bradley, "Reverberation time and maximum background-noise level for classrooms from a comparative study of speech intelligibility metrics," J. Acoust. Soc. Am. 107(2), 861-875 (2000). [9] M. R. Hodgson, "When is diffuse-field theory applicable?," Appl. Acoust. 49(3), 197-207 (1996). [10] M. Hodgson, "Evidence of diffuse surface reflections in rooms," J. Acoust. Soc. Am. 89(2), 765-771 (1991). [11] B.-I. Dalenback, CATT-Acoustic v8.0 Users' Manual (Gothenburg, 2004). [12] E. J. Kreul, J. C. Nixon, K. D. Kryter et al., "A proposed clinical test of speech discrimination," J. Speech Hear. Res. 11(3), 536-552 (1968). [13] Auditec of St. Louis (St. Louis, MO, 2003). [14] GoldWave Inc., GoldWave v5.0 (St. John's, NL, 2004). [15] D. Henderson, R. J. Salvi, F. A. Boettcher, and A. E.
Clock, "Neurophysiologic correlates of sensory-neural hearing loss," J. Katz, Handbook of Clinical Audiology, Chapter 4, 4th Ed. Williams & Wilkins (1994). [16] W. Mendenhall, R. J. Beaver, and B. M . Beaver, Introduction to Probability and Statistics, 10th ed. (Duxbury Press, Belmont, C A , 1999). [17] J. S. Bradley, "Predictors of speech intelligibility in rooms," J. Acoust. Soc. Am. 80(3), 837-845 (1986). [18] J. S. Bradley, H . Sato and M . Picard, "On the importance of early reflections for speech in rooms," J. Acoust. Soc. Am. 113(6), 3233-3244 (2003). 58 4 OPTIMUM REVERBERATION FOR SPEECH INTELLIGIBILITY FOR NORMAL AND HEARING-IMPAIRED LISTENERS IN REALISTIC VIRTUAL CLASSROOMS USING AURALIZATION * 4.1 Introduction Conceptually, speech intelligibility is directly related to signal-to-noise level difference and inversely related to the amount of reverberation [1]. However, situations involving rooms become complicated, since reverberation and steady-state levels interact so that they are not independent. Increased reverberation increases speech and noise levels by increasing the reverberant sound energy. Hodgson and Nosal [2] reviewed the effect of reverberation on speech and noise levels, and using theoretical prediction found that the effect of reverberation on speech intelligibility depends on the relative distances of the listener to the speech and the noise sources. Regarding the optimization of reverberation for speech intelligibility, early reflections have been considered since Lochner and Burger [3] applied the concept of the useful-to-detrimental energy ratio. The useful-to-detrimental ratio was extended to account for the effect of fluctuating ambient background noise on speech intelligibility by Latham [4] for more accurate prediction. The useful-to-detrimental ratio is considered to be a suitable intelligibility metric for predicting speech intelligibility for normal-hearing listeners [1,5,6]. Typically, the time limit for the early-arriving reflections has been taken to be 50 ms for speech sounds [1,7]. The useful early-time limit for hearing-impaired listeners for speech intelligibility was unknown. Bradley, Sato and Picard [8] studied the -effect of early reflection * A version of this chapter has been submitted for publication to Ear and Hearing. Yang, W . and Hodgson, M. (2006) Optimum reverberation for speech intelligibility for normal and hearing-impaired listeners in realistic virtual classrooms using auralization. 59 for speech intelligibility for hearing-impaired listeners. They found that increased early-reflection energy has the same effect on speech-intelligibility scores as does an equal increase in the direct-sound energy for both normal and hearing-impaired groups. In a previous paper [9], the authors used auralization to experimentally confirm Hodgson and Nosal's [2] prediction results, incorporating realistic noise sources into rooms with approximately diffuse sound fields. Subjective speech intelligibility testing with human subjects showed that hearing-impaired listeners require more early sound energy to achieve the benefit of increased speech level by reverberation than do normal-hearing listeners. The objective of the present work was to expand and validate the previous work on speech intelligibility with diffuse sound fields into more realistic rooms with non-diffuse sound fields. This was done using speech-intelligibility tests with both normal and hearing-impaired listeners in auralized sound fields of existing classrooms and their variations. 
The test reverberation times were also extended to correspond to those of the existing classrooms, and to confirm the optimal reverberation times for speech intelligibility for both normal-hearing and hearing-impaired listeners in the case when the noise source was positioned between the talker and the listener; this was not clearly elucidated in the previous work, due to the limited range of test reverberation times used. The second purpose of this work was to examine and compare widely used speech-intelligibility metrics. Useful-to-detrimental ratios and the Speech Transmission Index (STI) were correlated with the speech-intelligibility test scores. The best-predicting early-time limits for useful sound energy were identified for both normal-hearing and hearing-impaired listeners. 4.2 Methods 4.2.1 Subjects Subjects for the study were normal-hearing adults, and hearing-impaired adults with mild to moderate sensorineural hearing loss, whose first language was English. Twenty-five normal-hearing and thirteen hearing-impaired subjects participated in the tests; they varied in age from 18 to 64 years, with mean ages of 24 and 41 years, respectively. Hearing-screening tests were done prior to the speech-intelligibility testing, to categorize the hearing of the subjects. The subjects in the normal-hearing group had pure-tone auditory thresholds less than 25 dBHL (HL = Hearing Loss) at octave frequencies from 250 to 8000 Hz [10]. The hearing-loss criteria for the hearing-impaired subjects were thresholds less than 25 dBHL between 250 Hz and 1 kHz, and between 30 and 55 dBHL from 2 to 8 kHz, in either ear. This represents a typical frequency response for sensorineural hearing loss [11]. The hearing-impaired subjects in this study did not wear hearing aids in their everyday lives. 4.2.2 Classroom configurations In this study, realistic virtual classrooms were created with various reverberation times. The design was based on previous research and the acoustical conditions of the selected existing classrooms [9,12]. A configuration with zero reverberation time was excluded, since it is unrealistic. Six classroom configurations were defined to have reverberation times varying from 0.3 s to 1.9 s, by changing the amount of surface absorption. Two typical medium-sized university classrooms were selected as models for the virtual classrooms. The two classrooms are architecturally identical, but have different acoustical characteristics. One has acoustical treatment to reduce the reverberation (RTmid = 0.6 s at the listener position, L), and the other has no acoustical treatment (RTmid = 1.9 s at L). Figure 4.1 shows a three-dimensional model of the virtual classroom. It has 96 seats, length = 15 m, width = 8.5 m, and height = 4 m (volume = 400 m³, total surface area = 464 m², volume-to-surface-area ratio = 0.86 m), with a sloped floor in the seating area. There was a speech source, a noise source and a virtual listener in the classroom. The speech source (S) was positioned at the front of the classroom, where the instructor might typically stand. The listener was in the middle of the classroom (source/receiver distance = 5.0 m). Speech-intelligibility tests were auralized in six classroom sound-field configurations having different reverberation times from 0.3 to 1.9 s. The corresponding reverberation radii of the speech source varied from approximately 4 to 1 m. The noise source (N) was positioned between the speech source and the listener, at 2.5 m from both.
The speech source, the noise source, and the listener were on the center line of the classroom (see Figure 4.1). Figure 4.1 also shows the six classroom configurations with different applications of absorbing material to vary the reverberation time. The sound-absorbing material used corresponded to 60-mm-thick glass fibre, with the following sound-absorption coefficients in the six octave frequency bands from 125 Hz to 4 kHz: 0.20, 0.57, 0.90, 0.98, 0.98, and 0.97. The absorption and diffusion coefficients of the surface materials are shown in Table 4.1. Configurations (0.6) and (1.9) correspond to the two existing classrooms; configuration (0.3) has more absorbing material on the ceiling and the walls; configuration (0.8) has no ceiling absorption, but the same amount of wall absorption as configuration (0.6); configurations (1.2) and (1.5) have absorbing material only on the side walls. Each configuration name indicates the predicted mid-frequency reverberation time at the listener position in the unoccupied classroom. The surface-diffusion coefficients were set based on the work of Hodgson [13]. Table 4.2 lists the octave-band average surface-absorption coefficients (ᾱ) and diffusion coefficients (d̄) of the six classroom configurations. Two different speech- and noise-source output-level differences (SNS) were tested: SNS = 0 and 4 dB, selected on the basis of preliminary predictions. The test values of SNS and RT covered a wider range of RTs than in the previous study [9], and allowed the more limited ranges used here, which give realistic SNR values and were expected to contain the optimal RTs, to be identified.

Table 4.1. Absorption (a) and diffusion (d) coefficients of the surface materials.

Material          Coeff.  125   250   500   1000  2000  4000
Glazed concrete   a       0.02  0.02  0.02  0.01  0.01  0.01
                  d       0.05  0.05  0.05  0.05  0.05  0.05
Carpet            a       0.03  0.09  0.25  0.31  0.33  0.44
                  d       0.06  0.07  0.10  0.15  0.15  0.20
Audience          a       0.13  0.16  0.15  0.13  0.18  0.25
                  d       0.62  0.72  0.80  0.80  0.85  0.85
Painted plywood   a       0.28  0.07  0.06  0.06  0.06  0.05
                  d       0.03  0.03  0.02  0.01  0.01  0.01
Wood              a       0.14  0.10  0.06  0.08  0.10  0.10
                  d       0.60  0.45  0.32  0.38  0.30  0.30
Blackboard        a       0.15  0.25  0.20  0.12  0.10  0.05
                  d       0.10  0.05  0.04  0.04  0.03  0.03
Fibreglass        a       0.20  0.57  0.90  0.98  0.98  0.97
                  d       0.39  0.42  0.90  0.98  0.95  0.95

Table 4.2. Octave-band average surface-absorption coefficients (ᾱ) and diffusion coefficients (d̄) of the six classroom configurations.

RT (s)  Coeff.  125   250   500   1000  2000  4000
0.3     ᾱ       0.17  0.35  0.54  0.58  0.59  0.61
        d̄       0.32  0.34  0.60  0.65  0.64  0.65
0.6     ᾱ       0.18  0.22  0.29  0.31  0.32  0.32
        d̄       0.26  0.27  0.39  0.42  0.41  0.41
0.8     ᾱ       0.19  0.15  0.18  0.19  0.20  0.20
        d̄       0.22  0.22  0.28  0.29  0.28  0.28
1.2     ᾱ       0.19  0.12  0.13  0.13  0.14  0.14
        d̄       0.19  0.19  0.22  0.23  0.22  0.22
1.5     ᾱ       0.20  0.10  0.09  0.09  0.10  0.10
        d̄       0.18  0.17  0.18  0.18  0.18  0.18
1.9     ᾱ       0.20  0.08  0.07  0.07  0.08  0.08
        d̄       0.17  0.16  0.16  0.16  0.16  0.16

4.2.3 Sound-field simulation and speech-intelligibility test procedure CATT-Acoustic v8.0 [14] was used to predict and auralize the test sound fields. The number of rays used in the predictions was 22,854 in each configuration, and the truncation times varied from 1.0 s to 2.2 s depending on the target reverberation times in the classrooms. Human speech directivity was assigned to both the speech source and the noise source. The listener and the speech source faced each other, as in a real classroom situation. The noise source faced the front of the room. Predictions were made in six octave bands from 125 to 4000 Hz.
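As a rough consistency check on the configuration names, diffuse-field (Sabine) theory relates the average absorption coefficients of Table 4.2 to the reverberation time through RT = 0.161·V/(S·ᾱ). The sketch below is a minimal illustration under that assumption, ignoring air absorption and using the 1-kHz coefficients from Table 4.2 with the room dimensions given above.

```python
# Minimal sketch: Sabine estimate of mid-frequency RT from the average
# absorption coefficients in Table 4.2 (air absorption neglected).
V, S = 400.0, 464.0        # room volume (m^3) and total surface area (m^2)

# 1-kHz average absorption coefficient for each configuration (Table 4.2)
alpha_bar = {"0.3": 0.58, "0.6": 0.31, "0.8": 0.19,
             "1.2": 0.13, "1.5": 0.09, "1.9": 0.07}

for name, a in alpha_bar.items():
    rt_sabine = 0.161 * V / (S * a)
    print(f"config {name}: Sabine RT ~ {rt_sabine:.2f} s")
```

Because the predicted RTs quoted in the configuration names come from ray tracing at a particular receiver in a non-diffuse field, they will generally differ somewhat from this diffuse-field estimate.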
Binaural impulse responses between the listener and both the speech and the noise sources were predicted. The head-related transfer functions (HRTFs) provided with the CATT-Acoustic system were used; these take into account the effects of reflections from the pinnae and shoulders, as well as the shadowing effect of the head itself. Playback used Beyer DT 990 Pro headphones with headphone-transfer-function compensation. Figure 4.2. Speech-to-noise ratio (SNR) at the listener's location, for SNS = 4 dB and SNS = 0 dB. The predicted sound fields were auralized using the CATT-Acoustic software. The speech-signal level was chosen to approximate typical classroom levels in the six octave frequency bands from 125 Hz to 4 kHz: 54.5, 60.6, 62.2, 59.3, 57.4, and 51.8 dB [15]. The noise-source levels were set relative to the input speech level to achieve the two test SNS values. A total of 12 different sound-field configurations were created, consisting of all combinations of the two speech- and noise-source relative output levels (SNS = 0 and 4 dB) and the six reverberation times (RT = 0.3, 0.6, 0.8, 1.2, 1.5 and 1.9 s). As shown in Figure 4.2, the speech-to-noise level differences received at the listener position (SNR), corresponding to the two SNS values, varied from -4.1 dB to +3.3 dB across the configurations. Modified Rhyme Test (MRT) [16] word lists were used as the speech stimuli. The MRT consists of 300 words embedded in a carrier phrase, "Say /word/.", arranged in six lists of fifty words each: twenty-five differ by the initial consonant and twenty-five by the final consonant. The six lists were recorded by a male, native-Canadian talker in an anechoic chamber, and the 50-word orders in each list were randomized to obtain twelve lists for the twelve classroom configurations. The r.m.s. amplitudes of the lists were normalized to the same value. The MRT lists were combined through the CATT-Acoustic system with four-talker babble noise (available from AudiTech, St. Louis, MO [17]). The MRT-speech and babble-noise signals for each test configuration were mixed together using the GoldWave v5.1 [18] sound-editing program, at levels corresponding to the predicted sound-pressure levels at the listener position. The final auralization test materials were transferred to a compact disc for presentation to subjects using a CD player. The test material was replayed through Beyer DT 990 Pro headphones in a soundproof room. Each subject was tested individually. Subjects listened to a complete list of 50 words for each of the 12 different sound-field configurations. In order to avoid score inflation caused by the closed-set method used here, subjects were instructed not to guess the answer. 4.2.4 Data analysis Descriptive statistics for the speech-intelligibility test scores for each of the twelve sound-field configurations were calculated for both the normal-hearing and the hearing-impaired listeners. Results are presented as the percentage of correct responses. Analysis of variance (ANOVA) was employed to compare the sensitivity of the scores to variations in the speech- and noise-source output-level difference (SNS) and reverberation time (RT). Tukey's multiple-comparison test was used for post hoc tests of individual mean differences [19]. A significance level of 0.05 was used to evaluate the statistical outcomes of the speech-intelligibility tests.
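As an illustration of this analysis pipeline (not the authors' actual scripts), the following minimal sketch runs a one-way ANOVA across RTs followed by Tukey's multiple-comparison test; the per-subject scores are randomly generated placeholders, and the scipy and statsmodels dependencies are assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical per-subject MRT scores (% correct), one row per subject/configuration.
rng = np.random.default_rng(1)
rts = [0.3, 0.6, 0.8, 1.2, 1.5, 1.9]
df = pd.DataFrame([{"RT": rt, "score": rng.normal(80 - 5 * abs(rt - 0.6), 6)}
                   for rt in rts for _ in range(25)])

# One-way ANOVA: do the mean scores differ across the six RTs?
groups = [g["score"].values for _, g in df.groupby("RT")]
f, p = stats.f_oneway(*groups)
print(f"ANOVA: F = {f:.2f}, p = {p:.4g}")

# Post hoc Tukey HSD at alpha = 0.05 on the individual RT means.
print(pairwise_tukeyhsd(df["score"], df["RT"].astype(str), alpha=0.05))
```

The Tukey output pairs each RT with every other RT, which is how statements such as "the peak at RT = 0.6 s was (or was not) significantly different from the other RTs" are obtained.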
For each sound-field configuration, useful-to-detrimental ratios were calculated from the predicted impulse responses and the applicable speech and noise levels, as described in Ref. 9. Early-time limits at 10-ms intervals from 20 to 120 ms were used. 4.3 Results 4.3.1 Speech intelligibility Figure 4.3 shows the variations of the mean speech-intelligibility score (with 95 % confidence interval) with reverberation time. For normal-hearing subjects, with SNS = 4 dB, the mean speech-intelligibility scores were between 77.1 % and 90.1 %. The mean scores for the six different RTs were statistically different (p < 0.0005). The highest mean speech-intelligibility score occurred at RT = 0.6 s, and the difference from the other RTs was statistically significant by Tukey's multiple-comparison test (α = 0.05). For hearing-impaired subjects, with SNS = 4 dB, the mean speech-intelligibility scores were between 72.9 % and 83.5 %, and were also statistically different (p < 0.024). The highest mean speech-intelligibility score also occurred at RT = 0.6 s; however, it was not statistically significant by Tukey's multiple-comparison test (α = 0.05). For normal-hearing subjects with SNS = 0 dB, mean speech-intelligibility scores varied from 69.2 % to 78.5 %. There was a significant difference in the mean scores of the six RTs tested (p < 0.0005). The highest mean speech-intelligibility score, 78.5 %, occurred at both RT = 0.8 s and 1.2 s; the difference between these two was not statistically significant according to Tukey's multiple-comparison test (α = 0.05). For the hearing-impaired subjects in the same conditions, mean speech-intelligibility scores varied from 64.0 to 71.1 %. Of the values tested, the highest score occurred at RT = 0.8 s. The differences between the scores for the normal-hearing and hearing-impaired subjects were, in general, statistically significant. When each configuration was examined individually, more detailed results were obtained: the mean speech-intelligibility scores for the normal-hearing and hearing-impaired subjects were statistically significantly different, except for the case of RT = 1.5 s, for both SNS = 0 dB and 4 dB. Figure 4.3. Variation of mean speech-intelligibility score, and 95 % confidence interval, with RT: a. SNS = 4 dB; b. SNS = 0 dB. ( ) normal-hearing; ( ) hearing-impaired. 4.3.2 Useful-to-detrimental ratio and Speech Transmission Index Figure 4.4 shows U_t values with SNS = 0 dB and 4 dB using both methods. For t = 30 and 40 ms, the highest U_t values occurred at RT = 0.6 s with both SNS values. With SNS = 4 dB, the highest U_t values occurred at RT = 0.8 s for t = 50 ms and higher. When SNS decreased to 0 dB, the RT giving the highest U_t value increased to 1.2 s. Considering that the RT having the highest mean speech-intelligibility score was 0.6 s for SNS = 4 dB and 0.8 to 1.2 s for SNS = 0 dB, the variation of the U_t predictions with RT showed fairly good agreement with the mean speech-intelligibility results. In order to identify the early-time limits which best predicted speech intelligibility, regression analyses were performed on the mean speech-intelligibility scores for each sound-field configuration [20]. Table 4.3 shows the strengths of the relationships - quantified by the coefficient of determination, R² - between each measure and speech intelligibility, for both the normal-hearing and hearing-impaired results.
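The search for the best-predicting early-time limit amounts to fitting a polynomial of mean score against U_t for each candidate limit and comparing the resulting R² values. The sketch below illustrates that sweep; the U_t values and scores are hypothetical placeholders, not the data of this study.

```python
import numpy as np

def best_early_time_limit(u_by_t, scores, degree=3):
    """For each candidate early-time limit t, fit a polynomial of the given
    degree to (U_t, mean score) and return the limit with the highest R^2.
    `u_by_t` maps t (ms) -> array of U_t values, one per configuration."""
    best = None
    for t_ms, u in u_by_t.items():
        coeffs = np.polyfit(u, scores, degree)
        fitted = np.polyval(coeffs, u)
        ss_res = np.sum((scores - fitted) ** 2)
        ss_tot = np.sum((scores - np.mean(scores)) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        if best is None or r2 > best[1]:
            best = (t_ms, r2, coeffs)
    return best

# Hypothetical inputs: 12 configurations, U_t for t = 20..120 ms, mean scores.
rng = np.random.default_rng(2)
scores = rng.uniform(60, 90, 12)
u_by_t = {t: rng.uniform(-5, 10, 12) for t in range(20, 130, 10)}
t_best, r2_best, _ = best_early_time_limit(u_by_t, scores)
print(f"best early-time limit: {t_best} ms (R^2 = {r2_best:.3f})")
```

The polynomial degree is a free parameter; Chapter 3 used third-order fits, while the curves reported below in this chapter are quadratic.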
Since the form of the regression curves and the number of data points were the same in every case, the success of each measure can be compared through the corresponding R² values. U30 was most accurate at predicting the speech-intelligibility results for the normal-hearing subjects. For the hearing-impaired subjects, U50 predicted the results best - that is, with an early-time limit 20 ms higher than for the normal-hearing group. The limit found using CATT-Acoustic was lower in this study than in the study of rooms with approximately diffuse sound fields [9]. This may be because the sound fields in the virtual classrooms contained relatively large numbers of specular reflections, so that the useful early-time limits decreased. The U30 (for normal-hearing, 'NH') and U50 (for hearing-impaired, 'HoH') regression curves are shown in Figures 4.5a and b. The resulting best-fit regression equations are as follows: Normal-hearing: SI_NH = 89.1 + 2.73685·U30 - 0.01553·U30² (R² = 91.3 %). Hearing-impaired: SI_HoH = 79.5 + 1.63113·U50 - 0.26769·U50² (R² = 91.2 %). Table 4.3. Coefficients of determination (R²) associated with the regression fits for the various U_t values, for both normal-hearing (NH) and hearing-impaired (HoH) subjects. Figure 4.5. Variation of mean speech-intelligibility score with useful-to-detrimental ratio using the best-fit early-time limit for all sound-field configurations, and the best-fit quadratic regression curves: a. normal-hearing, U30; b. hearing-impaired, U50. The Speech Transmission Index was calculated from the RT at 1 kHz and the A-weighted signal-to-noise ratio for each classroom configuration, using a simplified version of the procedure developed by Steeneken and Houtgast based on the modulation transfer function, as described in Ref. 21. Figure 4.6 shows the mean speech-intelligibility scores versus STI. The resulting best-fit regression equations are: Normal-hearing: SI_NH = 47.456 + 89.616·STI - 0.4005·STI² (R² = 87.0 %). Hearing-impaired: SI_HoH = 38.82 + 121.67·STI - 88.535·STI² (R² = 54.5 %). STI predicted speech intelligibility with R² = 87.0 % for the normal-hearing subjects; the speech intelligibility converged asymptotically toward 100 % as STI increased beyond 0.6. For the hearing-impaired subjects, STI could not describe the speech intelligibility with high accuracy; that is, STI was unable to predict speech intelligibility for hearing-impaired listeners. This is consistent with previous results reported in the literature [22,23,24]. Figure 4.6. Variation of mean speech-intelligibility score with Speech Transmission Index, for all sound-field configurations, and the best-fit quadratic regression curves (equations in text). (•) normal-hearing; (o) hearing-impaired. 4.4 Discussion The results for the normal-hearing and hearing-impaired subjects, obtained with the MRT, revealed some basic differences in perception by the two groups, and also some similarities. In this study, the range of RTs examined was broadened in order to investigate speech intelligibility for the case of a noise source between the talker and the listener, which was expected to have non-zero optimal reverberation times [2], over a higher range of RTs than in the previous study [9].
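The simplified STI used above combines, at each modulation frequency, the reduction of the modulation transfer function due to reverberation and due to steady noise. The sketch below is a generic textbook-style version of that idea, not necessarily the exact variant of Ref. 21; the modulation frequencies and the clipping/normalization constants are the commonly used ones.

```python
import numpy as np

def sti_simplified(rt_s, snr_db):
    """Rough MTF-based STI sketch: m(F) combines an exponential-decay
    reverberation term with a steady-noise term, is converted to an
    apparent SNR clipped to +/-15 dB, then averaged and normalized.
    A textbook simplification, not the exact procedure of Ref. 21."""
    mod_freqs = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                          3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])  # Hz
    m_rev = 1.0 / np.sqrt(1.0 + (2 * np.pi * mod_freqs * rt_s / 13.8) ** 2)
    m_noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    m = m_rev * m_noise
    snr_app = np.clip(10.0 * np.log10(m / (1.0 - m)), -15.0, 15.0)
    return float((np.mean(snr_app) + 15.0) / 30.0)

print(round(sti_simplified(rt_s=0.8, snr_db=0.0), 2))
```

Because both reverberation and noise only ever reduce m(F), a single STI number cannot separate the benefit of early reflections from the penalty of late ones, which is consistent with its poorer performance for the hearing-impaired group here.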
When the noise source was between the speech source and the listener, of the RTs tested, the optimal reverberation time varied from 0.6 s to 0.8 s. The results are in good agreement with those of the previous study [9]. Figure 4.7. The initial 0.2 s of the impulse response: a. RT = 0.8 s in the realistic classroom; b. RT = 0.8 s in the idealized classroom of Ref. 9. In general, hearing-impaired subjects in our study showed similar trends to the normal-hearing subjects under varied room-acoustical conditions. This is consistent with recent work by Bradley, Sato and Picard [8]. As expected, the hearing-impaired listeners were more adversely affected by reduced speech-to-noise level difference. Increased reverberation time generally resulted in increased speech intelligibility when the noise source was closer than the speech source to the listener; above the optimal reverberation time, increased reverberation time decreased speech intelligibility. The useful early-time limits found here using CATT-Acoustic were shorter in the realistic virtual classrooms than in the approximately diffuse sound fields of the previous study [9]. This can be explained by examining the impulse responses. Figure 4.7 shows the initial 0.2 s of the impulse responses for the realistic classroom and for the idealized classroom with an approximately diffuse sound field, both with RT = 0.8 s. Figure 4.7a clearly shows strong specular reflections in the realistic virtual classroom. This indicates that optimum reverberation times may vary from room to room according to the details of the reflection arrivals in the impulse response. 4.5 Conclusions Extending previous work in classrooms with approximately diffuse sound fields, this study used auralization and speech-intelligibility tests to find the optimum reverberation time for speech intelligibility in more realistic rooms with non-diffuse sound fields. For both normal-hearing and hearing-impaired subjects, the results of this study agreed with previous results [9]. The optimal reverberation time was not zero when the noise source was between the listener and the speaker, with the received speech-to-noise level difference varying from -4 to +3 dB. The optimal reverberation time increased with decreasing speech-to-noise level difference. If the speech-to-noise level difference is adverse for a subject group, some reverberation is required to increase the speech signal. Hearing-impaired subjects required more early energy than normal-hearing subjects for this range of speech-to-noise level differences. In this study, a typical medium-sized classroom was the model for the virtual classroom. It would be interesting to repeat the study using various types of classroom to improve the current work. While the hearing-impaired listeners in this study benefited from early reflections more than the normal-hearing listeners, it would also be interesting to verify the result for elderly listeners, or for more severely impaired listeners who use hearing aids in their everyday lives, to investigate optimal reverberation for hearing-aid users. References [1] J. S. Bradley, "Speech intelligibility studies in classrooms," J. Acoust. Soc. Am. 80(3), 846-854 (1986). [2] M. Hodgson and E.-M. Nosal, "Effect of noise and occupancy on optimal reverberation times for speech intelligibility in classrooms," J. Acoust. Soc. Am. 111(2), 931-939 (2002). [3] J. P.
A. Lochner and J. F. Burger, "The influence of reflections on auditorium acoustics," J. Sound Vib. 1(4), 426-448 (1964). [4] H. G. Latham, "The signal-to-noise ratio for speech intelligibility - an auditorium acoustics design index," Appl. Acoust. 12(4), 253-320 (1979). [5] J. S. Bradley, "Predictors of speech intelligibility in rooms," J. Acoust. Soc. Am. 80(3), 837-845 (1986). [6] J. S. Bradley, "Relationships among measures of speech intelligibility in rooms," J. Audio Eng. Soc. 46(5), 396-405 (1998). [7] J. S. Bradley, R. D. Reich, and S. G. Norcross, "On the combined effects of signal-to-noise ratio and room acoustics on speech intelligibility," J. Acoust. Soc. Am. 106(4), 1820-1828 (1999). [8] J. S. Bradley, H. Sato, and M. Picard, "On the importance of early reflections for speech in rooms," J. Acoust. Soc. Am. 113(6), 3233-3244 (2003). [9] W. Yang and M. Hodgson, "Auralization study of optimum reverberation times for speech intelligibility for normal and hearing-impaired listeners in classrooms with diffuse sound fields," J. Acoust. Soc. Am. 120(2), 801-807 (2006). [10] J. G. Clark, "Uses and abuses of hearing loss classification," ASHA 23(7), 493-500 (1981). [11] D. Henderson, R. J. Salvi, F. A. Boettcher et al., "Neurophysiologic correlates of sensory-neural hearing loss," in Handbook of Clinical Audiology, edited by J. Katz (Williams & Wilkins, Baltimore, 1994), pp. 37-55. [12] W. Yang and M. Hodgson, "Speech intelligibility tests in virtual and real classrooms," Proc. Forum Acusticum 2005, Budapest (2005). [13] M. Hodgson, "Evidence of diffuse surface reflections in rooms," J. Acoust. Soc. Am. 89(2), 765-771 (1991). [14] B.-I. Dalenback, CATT-Acoustic v8.0 Users' Manual (Gothenburg, 2004). [15] American National Standards Institute, "ANSI S3.5-1997 Methods for calculation of the speech intelligibility index," 1997. [16] E. J. Kreul, J. C. Nixon, K. D. Kryter et al., "A proposed clinical test of speech discrimination," J. Speech Hear. Res. 11(3), 536-552 (1968). [17] Auditec of St. Louis (St. Louis, MO, 2003). [18] GoldWave Inc., GoldWave v5.1 (St. John's, NL, 2005). [19] F. J. Gravetter and L. B. Wallnau, Essentials of Statistics for the Behavioral Sciences, 4th ed. (Wadsworth, Pacific Grove, CA, 2002), pp. 327-330. [20] J. S. Bradley and S. R. Bistafa, "Relating speech intelligibility to useful-to-detrimental sound ratios (L)," J. Acoust. Soc. Am. 112(1), 27-29 (2002). [21] M. Hodgson, "Rating, ranking, and understanding acoustical quality in university classrooms," J. Acoust. Soc. Am. 112(2), 568-575 (2002). [22] K. L. Payton, R. M. Uchanski, and L. D. Braida, "Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing," J. Acoust. Soc. Am. 95(3), 1581-1592 (1994). [23] L. E. Humes, S. Boney, and F. Loven, "Further validation of the Speech Transmission Index (STI)," J. Speech Hear. Res. 30(3), 403-410 (1987). [24] L. E. Humes, D. D. Dirks, T. Bell et al., "Application of the articulation index and the speech transmission index to the recognition of speech by normal-hearing and hearing-impaired listeners," J. Speech Hear. Res. 29(4), 447-462 (1986). 5 CEILING BARRIERS AND REFLECTORS TO OPTIMIZE LECTURE-ROOM SOUND FOR SPEECH INTELLIGIBILITY* 5.1 Introduction The importance of classroom acoustics and of speech intelligibility is well recognized.
The room-acoustical parameters affecting speech intelligibility are known; generally, speech intelligibility tends to increase with increased speech-to-noise level difference and to decrease with increased reverberation. Reverberation for speech intelligibility is best quantified by the clarity factor C50, based on the early-to-late energy fraction. C50 is usually highly correlated with early-decay time and reverberation time in classrooms [1]. The effect of speech-to-noise level difference on speech intelligibility dominates that of reverberation [2]. Other studies [1,3,4] have confirmed that the values of these parameters are often non-optimal. Thus, the question remains as to how to achieve the optimal values of the parameters in real classrooms in a practical, cost-effective way. Many newly designed or renovated classrooms use absorptive materials to reduce reverberation and late-arriving energy. However, these absorptive materials also decrease speech levels and may reduce beneficial early reflections. The purpose of the study reported here was to find an effective way to design lecture rooms - i.e. larger classrooms with an instructor at the front of the room, speaking to a group of students in front of him/her - and to control sound to achieve optimum reverberation and adequate speech levels, especially at the back of the room, for speech intelligibility. The effectiveness of a novel system of ceiling barriers and reflectors for optimizing speech intelligibility is investigated, using a room-prediction model and physical scale-modeling. (* A version of this chapter has been submitted for publication to J. Acoust. Soc. Am.: Yang, W. and Hodgson, M. (2006) Ceiling barriers and reflectors to optimize lecture-room sound for speech intelligibility.) A room consists of a floor, walls and a ceiling. Among these three room components, the ceiling was chosen to be modified because it offers large flexibility compared to the walls and floor, and it can help reflect a teacher's voice toward the back of the room. Various ceiling-barrier and ceiling-reflector configurations were designed. Each design was incorporated into computer and scale-model lecture-room models, and the effects of the barriers and reflectors on the sound field were predicted, to optimize the designs. Figure 5.1. Lecture-room floor plan showing the speech-source (S) and receiver (r1L, r2L, r3L, r1C, r2C, r3C, r1R, r2R, r3R) positions. 5.2 Methods 5.2.1 Lecture-room configurations A typical medium-sized university lecture room was selected as the model for tests of the effects of the ceiling barriers and reflectors. Figure 5.1 shows the floor plan. The lecture room has 96 seats, length = 15 m, width = 8.5 m, and height = 4 m (volume = 510 m³, total surface area = 443 m², volume-to-surface-area ratio = 1.15 m). The room surfaces, and the seats with writing tablets, are sound-reflective. The mid-frequency reverberation time measured in the unoccupied classroom was about 2 s. Average octave-band surface-absorption coefficients calculated using diffuse-field theory varied from 125 to 2000 Hz as follows: 0.22, 0.12, 0.08, 0.08, 0.08.
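The diffuse-field estimate quoted above follows from inverting Sabine's formula, ᾱ = 0.161·V/(S·RT), band by band. The sketch below is a minimal illustration under that assumption; the octave-band reverberation times are illustrative placeholders chosen to be consistent with the coefficients quoted in the text (only the roughly 2 s mid-frequency value is stated explicitly).

```python
# Minimal sketch: average absorption coefficient from measured reverberation
# time via diffuse-field (Sabine) theory, alpha_bar = 0.161*V/(S*RT).
V, S = 510.0, 443.0                      # lecture-room volume (m^3) and area (m^2)
rt_measured = {125: 0.85, 250: 1.5, 500: 2.0, 1000: 2.0, 2000: 2.0}  # s, placeholders

for f, rt in rt_measured.items():
    alpha_bar = 0.161 * V / (S * rt)
    print(f"{f} Hz: alpha_bar ~ {alpha_bar:.2f}")
```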
A speech source was positioned either at the front-center or at the front-right of the lecture room, and three listening positions were located in the front (r1C, source/receiver distance = 2.0 m), middle (r2C, 5.5 m), and back (r3C, 9.0 m) seats on the center line of the lecture room. Six additional listening positions were used when predicting the acoustical conditions in the side areas of the lecture room (see Figure 5.1). 5.2.2 Ceiling barrier and reflector configurations Two basic types of ceiling barriers and reflectors were studied. The first involved parallel ceiling barriers, projecting down from the ceiling and running front-to-back in the classroom. They were expected to absorb reverberant sound incident on the ceiling from many angles, reducing late-arriving energy, while leaving speech signals, reflected from the (reflective) ceiling between the barriers to the back of the room, unaffected. Both sound-absorptive and sound-reflective barriers were considered. Different shapes, materials, spacings and depths of the ceiling-barrier configurations were considered in pilot studies. Six configurations were selected for detailed study using computer prediction and scale-model measurement. These were reflective (configuration R) and absorptive (A) barriers, 0.6 m deep, separated by either 0.3 m (R1/A1), 0.6 m (R2/A2) or 1.2 m (R4/A4). Based on preliminary barrier results, ceiling reflectors involving lengths of obstacles of semicircular cross-sectional shape (configuration C), suspended from the ceiling with either the flat (F) or curved (C) side down, were evaluated in the scale model. These were expected to reflect and scatter speech sounds like 'fittings' in an industrial workshop (see below). The shapes were based on common suspended light fixtures. The diameter of the semicircular reflectors was 0.3 m, and the distance from the ceiling to the bottom of the reflectors was 0.6 m. They were 7 m long, and ran front-to-back in the lecture room, separated by 1.2 m, with either the flat side down without absorption on the curved side (CF), the flat side down with absorption on the upper curved side (CFA), or the curved side down without absorption (CC). Alternatively, either thirty (configuration CF30) or sixty (CF60) reflectors, 0.8 m long, were hung randomly from the ceiling with the flat side down, without absorption. 5.2.3 Physical scale-modeling The lecture room was studied without and with ceiling barriers and reflectors using a 1:8-scale model. According to the fundamental principle of the scale-modeling technique [5], in which all dimensions are scaled down by the scaling factor, the lecture room with length = 15 mFS (FS = full-scale value), width = 8.5 mFS, and height = 4 mFS became a 1.88 m x 1.06 m x 0.50 m 1/8-scale model. The floor of the model was of polished concrete. The walls, ceiling, the partition at the back of the room, and the rows of seats were of varnished plywood. Figure 5.2a is a photograph of the model showing the rows of seats and the rear partition. In order to investigate how the effects of ceiling barriers and reflectors vary with room occupancy, two different occupancies (unoccupied, and 37 % occupied with 26 students) were used in the scale model. Air absorption is a major problem in scale-model measurement. Air absorption increases approximately with the square of the frequency [6].
In a scale model, wavelength-to-dimension ratios are maintained, so wavelengths are scaled down by the scale factor, resulting in scaled-up model test frequencies. Since the test frequencies are high, air absorption is excessive in a scale model and cannot be neglected. For prediction, air-absorption exponents were calculated at the model test frequencies for the temperature and relative humidity measured in the scale model, as described in Ref. 6. Figure 5.2. Photographs of the 1/8-scale model without ceiling barriers or reflectors: a. showing the seats and rear partition; b. showing the model speech source and microphone. Speech sources in lecture rooms are mainly human talkers. Source directivity can strongly influence speech levels in lecture rooms. For accurate scale modeling of the lecture room, a model speech source is required which radiates with the directional characteristics of human speech. Such a source was created using a 1:8-scale head made of modeling clay, formed around the end of a hollow cone driven by a 'tweeter' loudspeaker, which narrowed down to a 3-mm-diameter opening as the mouth, to represent human speech directivity in the scale model (see Figure 5.2b). The power levels and directivities of the model speech source were measured in an anechoic chamber. Figure 5.3 shows the measured horizontal-plane directivity. The ceiling barriers were made of varnished plywood, and could be covered with thin industrial carpet to make them absorbent. The absorption of the carpet was estimated from the change in reverberation time that occurred when a sample of it was introduced into the empty model. The absorption was similar to that of 50-mm-thick glass fibre at full-scale frequencies. The semicircular ceiling reflectors were made of painted wood; the same carpet was used to make them absorbent. The carpet was also used to cover the seats to simulate occupied absorption. The corresponding occupancy was estimated from the reduction in EDT at 1 kHz [7]. Figure 5.4 shows photographs of some of the scale-model ceiling barriers and reflectors. Acoustical measurements were made using the Maximum Length Sequence System Analyzer (MLSSA), which measured the impulse response between the model speech source and a Bruel & Kjaer 4135, 1/4-inch-diameter microphone used to receive the sound signals. All measurements were made after pre-calibration of the equipment. Early-decay times (EDT, in s) and steady-state levels (with the speech-source output level always kept constant) were measured. Measurements were made in octave bands from 1 to 16 kHz (125 to 2000 HzFS) at all three receiver positions. Average mid-frequency EDTmid values, most relevant to speech intelligibility, were calculated by averaging the octave-band EDTs at 500, 1000 and 2000 HzFS. In the case of the model without ceiling barriers or reflectors, average surface-absorption coefficients were calculated from the measured octave-band EDTs using diffuse-field theory; values increased with frequency from 0.06 to 0.1, close to those in the full-scale room. The octave-band steady-state levels were converted to total A-weighted 'speech' levels SLAN, corresponding to a typical adult talking at a normal voice level, using the relative output power levels of such a talker [8] and of the model speech source. 5.2.4 Computer simulation The ceiling barriers were also studied using CATT-Acoustic v8.0 [9] computer simulation.
The lecture room and ceiling-barrier configurations were modeled, octave-band EDTs and speech levels were predicted, and corresponding values of EDTmid and SLAN were calculated. The lecture-room configuration was exactly the same as in the scale-model measurements. A sound source and nine receivers were positioned as shown in Figure 5.1. The output level and the directivity of the sound source were identical to the values used in the scale-model measurements. Unoccupied seats were modeled as one large 1-m-deep seat block (see Figure 5.5). Figure 5.4. Scale-model ceiling barrier and reflector configurations in the unoccupied lecture room: a. R1o; b. A1u; c. CFu; d. CF60u. Figure 5.5 shows computer models of the virtual lecture room without and with ceiling barriers. Ceiling reflectors were not studied by computer prediction, as it was not clear exactly how to model them. The absorption coefficients of the room surfaces used in the simulation were the average values measured in the scale model without ceiling barriers or reflectors. Diffuse-reflection coefficients of the surfaces were set to increase with frequency from 0.1 to 0.3, based on previous research [10]. Figure 5.5. Computer models of the lecture room with: a. no barriers; b. R1/A1 barriers; c. R2/A2 barriers; d. R4/A4 barriers. 5.3 Results 5.3.1 Comparison of measurement and prediction In order to confirm that the scale-model and virtual lecture rooms were reasonably similar, comparisons were made between measured and predicted speech levels and early-decay times for the three central receiver positions. Figure 5.6 shows the octave-band results. Predicted speech levels were somewhat lower than those measured - by about 4 dB at low frequency, decreasing with frequency to about 1.5 dB at high frequency. Predicted early-decay times varied negligibly with position; measured times showed much more variation. Predicted EDTs tended to be lower than those measured, by up to about 0.7 s (25 %). Figure 5.6. Variation with frequency of (a.) speech levels SLN and (b.) early-decay times EDT at the three central positions with the centre speaker in the unoccupied lecture room without ceiling barriers or reflectors, as measured in the scale model and as predicted by CATT-Acoustic (measured and predicted curves for r1C, r2C and r3C). The imperfect agreement between measurement and prediction is interesting, given that the average absorption coefficients involved in the virtual and scale models were very similar to one another. It can partly be explained by uncertainties in the scale-model measurements, differences in the values of important room parameters (e.g. the diffuse-reflection coefficients), and possible limitations of the computer simulation (for example, the seat block). In any case, it was concluded that the scale-model and virtual lecture-room models, while not identical, are sufficiently similar that both can be used to study the effects of ceiling barriers and reflectors. The two techniques have their advantages and disadvantages. The scale model has the advantage of physical realism (for example, including modal effects), while prediction has the advantage that the input data defining the virtual room are precisely known.
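Both the MLSSA measurements and the CATT-Acoustic predictions yield impulse responses from which EDT is derived. As a reference for how that is typically done (a generic sketch, not MLSSA's or CATT's exact implementation), the following uses Schroeder backward integration and extrapolates the 0 to -10 dB slope to a 60 dB decay.

```python
import numpy as np

def edt_from_impulse_response(ir, fs):
    """Early-decay time: Schroeder backward-integrate the squared impulse
    response, fit the 0 to -10 dB portion of the decay, and extrapolate
    the slope to a 60 dB decay."""
    e = ir.astype(float) ** 2
    sch = np.cumsum(e[::-1])[::-1]                  # Schroeder integral
    db = 10.0 * np.log10(sch / sch[0])
    t = np.arange(len(db)) / fs
    mask = db >= -10.0                              # 0 to -10 dB region
    slope, _ = np.polyfit(t[mask], db[mask], 1)     # dB per second (negative)
    return -60.0 / slope

# Synthetic exponential decay with a nominal RT of 2.0 s as a sanity check.
fs = 50000
t = np.arange(0, 3 * fs) / fs
ir = np.random.default_rng(3).standard_normal(t.size) * np.exp(-6.91 * t / 2.0)
print(round(edt_from_impulse_response(ir, fs), 2))   # ~2.0 expected
```

Because EDT is governed by the first 10 dB of decay, it is more sensitive than RT to strong early reflections, which is one reason the measured values vary more with position than the predicted ones.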
5.3.2 Ceiling barriers Figure 5.7 compares the speech-level and early-decay-time results for the six ceiling-barrier configurations with the results for no barriers, for the three positions along the center line in the unoccupied room. The measured and predicted levels are highest with no barriers, for which speech levels decrease by about 2 dBA from the front to the back of the lecture room. With the reflective ceiling barriers, as the number of barriers increased, predicted speech levels remained virtually unchanged at the front of the room, but decreased by up to about 1.5 dBA at the back. Measured levels showed more variability, but similar trends, especially at r3. With the absorptive ceiling barriers, and increasingly with the number of barriers, levels decreased more rapidly with distance than with the reflective barriers; speech levels decreased by up to between 4 and 6 dBA at the front and back of the room, respectively. Ceiling barriers decreased the early-decay times in all cases. With the reflective ceiling barriers, the predicted EDTmid progressively decreased with increased number of barriers, from about 1.6 s to about 0.8 s; the decreases were similar at the three receiver positions. With absorptive barriers, EDTmid varied little with barrier spacing and was very low (around 0.5 s). Again, measured results were similar, but showed less clear trends, and barriers resulted in smaller decreases in EDTmid. The absorptive ceiling barriers do not achieve the objective, since they decrease speech levels significantly, as well as reducing early-decay times. Reflective barriers, on the other hand, do achieve the objective, reducing early-decay times significantly while reducing speech levels little. Therefore, the study focused on the reflective ceiling barriers. The effects of occupancy in the lecture room, and of speaker position, on the performance of the reflective ceiling barriers were investigated. Figure 5.7. Variation with position of speech levels SLAN and early-decay times EDTmid in the unoccupied lecture room without and with reflective and absorptive ceiling barriers, as measured in the scale model and as predicted by CATT-Acoustic, along the center line with the centre speaker: a. SLAN, measured; b. EDTmid, measured; c. SLAN, predicted; d. EDTmid, predicted. (Curves: no barriers, R1, R2, R4, A1, A2, A4.) Figure 5.8.
Variation with position of speech levels SLAN and early-decay times EDTmid with the centre speaker, without and with reflective ceiling barriers, as measured in the scale model: a. SLAN; b. EDTmid. Figure 5.8 shows the measured A-weighted speech levels for a 'Normal' voice (SLAN) and the EDTmid values with reflective ceiling barriers for the six receiver positions with the centre speaker in the scale model. In both the occupied and unoccupied rooms, speech levels did not decrease significantly, and speech levels in the side seats were slightly lower than in the center-line seats. In the occupied room, speech levels remained constant or increased slightly with barriers. Early-decay times decreased with barriers in both the occupied and unoccupied rooms. Figure 5.9. Variation with position of speech levels SLAN and early-decay times EDTmid with the right speaker, without and with reflective ceiling barriers, as measured in the scale model: a. SLAN; b. EDTmid; c. C50,mid. Figure 5.9 shows the measured A-weighted speech levels for a 'Normal' voice (SLAN) and the EDTmid values with reflective ceiling barriers for the nine receiver positions with the right speaker in the scale model. As in Figure 5.8, in both the occupied and unoccupied rooms, speech levels with reflective ceiling barriers did not decrease significantly, and speech levels in the side seats were slightly lower than in the center-line seats. In the occupied room, speech levels increased slightly with barriers at the left side seats (r1L, r2L, r3L). Reflective ceiling barriers decreased early-decay times by between 0.2 and 0.8 s in the occupied room, and by between 0.1 and 1.0 s in the unoccupied room. The ceiling-barrier results are reminiscent of those that occur when reflective scattering obstacles ('fittings') are introduced into an industrial workshop [11]: reverberation times decrease sharply; steady-state levels from a sound source increase slightly close to the source, due to back-scattering, and decrease farther from the source. Therefore, the alternative ceiling-reflector concept, consisting of reflective scattering obstacles suspended from the ceiling, was tested.
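The SLAN values reported in Figures 5.8 and 5.9 combine octave-band levels into a single A-weighted number. The sketch below shows that conversion; the A-weighting corrections are the standard values, while the band levels are illustrative only and are not the talker spectrum of Ref. 8.

```python
import math

# Minimal sketch: combine octave-band 'speech' levels into a single A-weighted
# level. The band levels below are illustrative placeholders; the A-weighting
# corrections are the standard octave-band values.
A_WEIGHT = {125: -16.1, 250: -8.6, 500: -3.2, 1000: 0.0, 2000: 1.2}

def a_weighted_total(band_levels_db):
    """Apply A-weighting per octave band and sum on an energy basis."""
    return 10.0 * math.log10(sum(10.0 ** ((band_levels_db[f] + A_WEIGHT[f]) / 10.0)
                                 for f in band_levels_db))

example_levels = {125: 52.0, 250: 56.0, 500: 58.0, 1000: 55.0, 2000: 50.0}
print(round(a_weighted_total(example_levels), 1), "dBA")
```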
5.3.3 Ceiling reflectors

Figure 5.10 shows the speech levels and early-decay times for the semicircular ceiling reflectors at the three central positions with the centre speaker in the unoccupied room. Also shown are the results without barriers or reflectors, and for the R4 ceiling-barrier configuration, which had the same spacing as the CF and CC reflectors. The CF reflectors decreased early-decay times by between 0.1 and 0.6 s, with the largest decreases occurring at r2C; the early-decay times were very close to those for barrier configuration R4. In this configuration, speech levels decreased by about 1 dBA at r1C, and increased slightly at r2C and r3C. The CC reflectors also decreased early-decay times, though much less than the CF reflectors or R4 barriers.

The short, randomly-distributed ceiling reflectors (configurations CF30 and CF60) showed somewhat different results from those for the longer reflectors. Both CF30 and CF60 had little effect at r1C and r2C, but increased early-decay times at r3C. The CF30 reflectors had little effect on speech levels; the CF60 reflectors increased levels slightly at r1C, decreased them slightly at r2C, and left them unchanged at r3C. Clearly, the CF reflectors were effective at decreasing early-decay times with the least reduction of speech levels.

Further measurements were made with semi-circular ceiling reflectors which had sound-absorptive material on the upper curved surfaces (configuration CFA). Figures 5.11 and 5.12 show the measured speech levels and early-decay times in both the occupied and unoccupied rooms. When the speaker was at front-centre, speech levels at the side seats were lower than those without reflectors; however, when the speaker was at right-centre, speech levels at the left side seats (r1L, r2L, r3L) were slightly higher than those without reflectors. In the occupied classroom, speech levels decreased by between 2.4 and 4.5 dBA due to the added absorption. The CFA reflectors decreased both speech levels and early-decay times more than the CF reflectors did.

Figure 5.10. Variation with position of (a) speech levels SLAN and (b) early-decay times EDTmid in the scale-model lecture room at central positions without and with ceiling barriers and reflectors (no reflectors, R4, CF, CC, CF60, CF30).

Figure 5.11. Variation with position of speech levels SLAN and early-decay times EDTmid with the centre speaker, without and with ceiling reflectors, as measured in a scale model: a. SLAN; b. EDTmid.
Figure 5.12. Variation with position of speech levels SLAN and early-decay times EDTmid with the right speaker, without and with ceiling reflectors, as measured in a scale model: a. SLAN; b. EDTmid.

5.4 Discussion

To reduce reverberation while minimizing decreases of the sound levels in lecture rooms, reflective ceiling barriers can be used. Making the barriers sound-absorptive reduces reverberation, but also reduces speech levels, which is detrimental to speech intelligibility. Long, parallel obstacles of semicircular cross-section, with their flat sides facing downward, running front-to-back in the lecture room, can also be effective. They provide early reflections and increase early energy, increasing speech intelligibility. The benefit would be expected to increase as the number of reflectors increases (i.e., as their spacing decreases).

As for the ceiling barriers, Figure 5.13 shows the predicted and measured percentage decreases of the early-decay times plotted against the decibel decreases of the speech levels for the six ceiling-barrier configurations at the center-line seats r1C, r2C, and r3C. Each set of data has been best-fit with a linear-regression trendline. For speech intelligibility, the goal is to achieve the minimum speech-level decrease with a significant EDT decrease; in Figure 5.13, this corresponds to the upper-left areas of the graphs. Both in the physical scale model (Figures 5.13a, c, and e) and in the computer prediction (Figures 5.13b, d, and f), the trendlines for the reflective ceiling barriers are always positioned above the trendlines for the absorptive ceiling barriers for small decreases of the speech levels. This means that, even if the speech levels decrease a small amount, the decreases in the early-decay time are greater with the reflective ceiling barriers than with the absorptive ceiling barriers at the center-line seats. Thus, the early-decay times can be reduced effectively by the reflective ceiling barriers.
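As an aside, the trendlines in Figures 5.13 and 5.14 amount to simple least-squares fits of the percentage EDT decrease against the speech-level decrease for each barrier type and occupancy condition. A minimal sketch of such a fit follows; the numerical values are illustrative placeholders, not the thesis's measured data.

```python
import numpy as np

# Placeholder data for one condition (e.g. reflective barriers, unoccupied room):
# speech-level decreases (dB) and percentage EDT decreases at r1C, r2C and r3C.
sla_decrease_db = np.array([0.2, 0.5, 0.8, 1.0, 1.3, 1.5])
edt_decrease_pct = np.array([25.0, 32.0, 40.0, 44.0, 50.0, 55.0])

# Least-squares trendline: EDT decrease (%) = slope * SLA decrease + intercept
slope, intercept = np.polyfit(sla_decrease_db, edt_decrease_pct, 1)
r = np.corrcoef(sla_decrease_db, edt_decrease_pct)[0, 1]
print(f"EDT% = {slope:.1f} * dSLA + {intercept:.1f}  (R^2 = {r**2:.2f})")

# Comparing the intercepts (EDT decrease at zero speech-level decrease) of the
# reflective- and absorptive-barrier trendlines quantifies which treatment
# reduces reverberation with the smaller speech-level penalty.
```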
In the computer simulation, the source-receiver distance clearly affected the results: when the receiver was farther from the source, the predicted effect of the reflective ceiling barriers was less than that measured in the scale model. This may result from the way CATT-Acoustic models the direct sound and diffuse surface reflection. Based on the results here, the sound field predicted by the computer simulation seems to be less diffuse than that measured in the physical scale model.

Figure 5.14 shows the predicted percentage decreases of the early-decay times plotted against the decibel decreases of the speech levels for the six ceiling-barrier configurations at the side seats r1L, r2L, and r3L. For the absorptive ceiling barriers, the percentage decrease of the early-decay time remained nearly constant as the speech level changed at each receiver position. Thus, for the side seats, some absorption is required to reduce the specular reflections from the walls. The absorptive and reflective ceiling barriers could be used in combination to optimize lecture-room acoustics for speech intelligibility, depending on the room layout.

In practice, it is the occupied room that is of most interest, with the occupants contributing significant absorption in the lower floor/seating region of the room. This absorption reduces reverberation times and causes speech levels to decrease, especially at larger source-receiver distances [1]. Sound paths carrying energy from a talker at the front of the room to receivers via reflection from the ceiling would then contribute more to the sound field at the receivers. Thus, the potential effects of ceiling barriers and reflectors were expected to increase with occupancy. Figures 5.13 and 5.14 show the effects of the reflective ceiling barriers: the trendlines for the reflective ceiling barriers in the occupied room are always positioned above the trendlines for the reflective and absorptive ceiling barriers in the unoccupied room for small decreases or increases of the speech levels. This means that the reflective ceiling barriers can be effective even in an occupied classroom.

Figure 5.13. Variation of percentage decrease of early-decay time with sound-level decrease at three central positions with the centre speaker, with reflective and absorptive ceiling barriers in the unoccupied and occupied rooms, as measured in a scale model and as predicted, with linear trendlines: a. r1C, measured; b. r1C, predicted; c. r2C, measured; d. r2C, predicted; e. r3C, measured; f. r3C, predicted.

Figure 5.14. Variation of percentage decrease of early-decay time with sound-level decrease at three side positions with the centre speaker, with reflective and absorptive ceiling barriers in the unoccupied and occupied rooms, as predicted, with linear trendlines: a. r1L; b. r2L; c. r3L.

5.5 Conclusions

Reflective ceiling barriers achieved the goal of decreasing reverberation with the least speech-level reduction; the effect increases with barrier density.
Ceiling reflectors, in the form of long obstacles of semicircular cross-section, suspended below the ceiling in parallel, front-to-back lines with the flat side down, were also effective, though a closer spacing would be desirable. However, some amount of absorption was necessary to prevent specular reflections from the walls.

The shape of the semicircular ceiling reflectors was inspired by typical lighting fixtures. The results suggest that lighting fixtures could be effective at controlling lecture-room sound if they were made with flat, sound-reflecting (and, of course, optically transparent) bottoms, and arranged in long, parallel, front-to-back lines.

References
[1] M. R. Hodgson, "Experimental investigation of the acoustical characteristics of university classrooms," J. Acoust. Soc. Am. 106(4), 1810-1819 (1999).
[2] J. S. Bradley, R. Reich, and S. G. Norcross, "On the combined effects of signal-to-noise ratio and room acoustics on speech intelligibility," J. Acoust. Soc. Am. 106(4), 1820-1828 (1999).
[3] W. Yang and M. Hodgson, "Auralization study of optimum reverberation times for speech intelligibility for normal and hearing-impaired listeners in classrooms with diffuse sound fields," J. Acoust. Soc. Am. 120(2), 801-807 (2006).
[4] M. Picard and J. S. Bradley, "Revisiting speech interference in classrooms," Audiology 40(5), 221-244 (2001).
[5] M. Hodgson and R. J. Orlowski, "Acoustic scale modeling of factories, Part I: Principles, instrumentation and techniques," J. Sound Vib. 113(1), 29-46 (1987).
[6] H. E. Bass, L. C. Sutherland, A. J. Zuckerwar, D. T. Blackstock, and D. M. Hester, "Atmospheric absorption of sound: further developments," J. Acoust. Soc. Am. 97(1), 680-683 (1995).
[7] M. Hodgson, "Empirical prediction of speech levels and reverberation in classrooms," J. Build. Acoust. 8(1), 1-14 (2001).
[8] American National Standards Institute, "ANSI S3.5-1997 Methods for calculation of the speech intelligibility index," 1997.
[9] B.-I. Dalenbäck, CATT-Acoustic v8.0 (Gothenburg, 2004).
[10] M. Hodgson, "Evidence of diffuse surface reflections in rooms," J. Acoust. Soc. Am. 89(2), 765-771 (1991).
[11] M. Hodgson, "Measurements of the influence of fittings and roof pitch on the sound field in panel-roof factories," Appl. Acoust. 16(5), 369-391 (1983).

6 CONCLUSION

6.1 Contributions

This work comprised three main themes: the auralization technique was validated for use in speech-intelligibility testing; the optimum reverberation for speech intelligibility for normal-hearing and hearing-impaired listeners was investigated using auralization; and novel architectural-acoustical designs for optimizing classroom acoustics were developed.

Following the validation of auralization for speech-intelligibility testing, this work then expanded the application of auralization in speech-intelligibility research. Tests showed that, if the room to be auralized is not very sound-absorptive, or reverberant and noisy, speech-intelligibility tests using auralization are valid and reliable. In general, the prediction results showed good agreement for most room-acoustical parameters. Successful application of the auralization technique to speech-intelligibility testing can yield subjective speech-intelligibility tests that are more convenient [1] and reliable than on-site speech-intelligibility tests. These results were presented in Chapter 2. The fully computed auralization procedure used in this study employed CATT-Acoustic, a hybrid room-acoustical prediction and auralization program.
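For illustration only (the thesis used CATT-Acoustic's own auralization module, not this code), the core operation of such a fully computed auralization is the convolution of anechoic speech with a predicted binaural room impulse response, which is then played back over headphones. A minimal sketch follows; the file names are hypothetical placeholders, and the anechoic recording and two-channel BRIR are assumed to share a sample rate.

```python
# Minimal sketch: auralizing anechoic speech with a predicted binaural RIR.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs, speech = wavfile.read("speech_anechoic.wav")   # mono anechoic speech
fs2, brir = wavfile.read("brir.wav")               # 2-channel predicted BRIR
assert fs == fs2, "speech and BRIR must share a sample rate"

speech = speech.astype(np.float64)
brir = brir.astype(np.float64)

# Convolve the dry speech with each ear's impulse response.
left = fftconvolve(speech, brir[:, 0])
right = fftconvolve(speech, brir[:, 1])
binaural = np.stack([left, right], axis=1)

# Normalize to avoid clipping, then write a 16-bit file for headphone playback.
binaural /= np.max(np.abs(binaural))
wavfile.write("speech_auralized.wav", fs, (binaural * 32767).astype(np.int16))
```

In practice the playback level would also have to be calibrated so that the presented speech and noise levels match the intended values at the listener.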
This work was motivated by discrepancies between the results of experimental studies and theoretical predictions of the optimal reverberation time for speech intelligibility. In this work, the traditional concept of the signal-to-noise ratio, which is the output sound-level difference of the signal and noise sources, was not used. Instead, the concepts of the signal-to-noise level difference at the listener and the relative output power levels of the speech and noise sources were separated in the experimental setting; this had previously only been done in theoretical predictions. This made it possible to incorporate more realistic noise sources into the room models, and to eliminate the discrepancies between the results of experimental studies and theoretical predictions of the optimal reverberation time for speech intelligibility. In particular, the experiments generally confirmed the results of theoretical predictions. This work was presented in Chapter 3 and Chapter 4.

The effects of the locations of the speaker, receiver, and noise source on speech intelligibility in a room were confirmed by comparing two cases: when the noise source was closer than the speech source to the listener, and when the speech source was closer to the listener than the noise source (Chapter 3 and Chapter 4). Generally, the optimal reverberation time was zero or near zero when the noise source was farther away than the speaker; non-zero reverberation times were found to be optimal when the noise source was between the listener and the speaker, for both normal and hearing-impaired listeners. This finding agrees with the theoretical predictions of Hodgson and Nosal [2]. The optimum reverberation time in a room cannot be identified as a single defining value, because it depends on the relationship between the speakers, receivers, and noise sources, as well as on the room. When the speech-to-noise level difference is adverse, for both normal-hearing and hearing-impaired listeners, some reverberation is required to increase the speech level and thereby intelligibility. Within the tested range of speech-to-noise level differences (-6 dB to 8.5 dB), hearing-impaired listeners needed more early sound energy than normal-hearing listeners, in both approximately diffuse and non-diffuse sound fields. This was shown in Chapter 3 and Chapter 4.

Classroom noise levels are too high to achieve the ideal 15-dB speech-to-noise level difference [3] in many schools [4,5,6,7]. If it is hard to decrease existing noise levels (over 70 dBA) significantly, particularly in preschool classrooms where noise is frequently attributable to student activities [5,7,8], enhancing early reflections rather than minimizing reverberation can improve speech intelligibility in a room. This can also be applied to classrooms for hearing-impaired students who do not use hearing aids in their everyday lives. However, the priority in controlling sound for speech intelligibility is reducing noise levels.

The various types of ceiling barriers and reflectors proposed and evaluated in this work were effective at optimizing acoustical conditions for speech intelligibility. This novel system contains sound-reflective materials, in contrast with traditional sound-absorptive systems used to minimize classroom reverberance. Reflective ceiling barriers and ceiling reflectors (in particular, parallel front-back rows of semi-circular reflectors) achieved the goal of decreasing reverberation with the least speech-level reduction.
However, a small amount of absorption is necessary to prevent specular reflections from the walls. This was presented in Chapter 5. The novel system of ceiling barriers and reflectors can be applied to classrooms to optimize acoustical conditions for speech intelligibility. Specifically, ceiling reflectors inspired by the lighting fixtures already used in many classrooms could be installed directly, simply by taking the existing light-fixture arrays into account.

6.2 Limitations

Auralized sound fields were validated in Chapter 2 and used in Chapter 3 and Chapter 4 for the speech-intelligibility tests in this work. Although some limitations of the auralization technique were found in sound-absorptive rooms (RT = 0.4 s in the 1-kHz octave band) and in noisy, reverberant rooms (SNS = 0 dB and RT = 2.0 s), as shown in Chapter 2, in general the speech-intelligibility scores in the auralized classroom agreed with the scores in the real classroom. The discrepancy between the on-site and virtual speech-intelligibility tests in very sound-absorptive or noisy, reverberant classrooms is a weakness of the work. In very absorptive rooms, the speech-intelligibility scores with auralized sound fields were higher than the scores in actual rooms; in noisy, reverberant rooms, the scores with auralized sound fields were lower than the scores in actual rooms. Even considering these facts, the overall findings are unlikely to change, since the decreased speech-intelligibility scores in real absorptive rooms, and the increased speech-intelligibility scores in real noisy, reverberant rooms, support theoretical predictions in the literature that the optimum reverberation time is not zero with noise. Therefore, auralized sound fields can be used for speech-intelligibility studies, even if they do not perfectly represent the room sound field to be modeled.

The other limitation of this work is the relatively low statistical power associated with the number of hearing-impaired subjects in the speech-intelligibility tests. The tests were advertised using posters and email distribution and, specifically for recruiting hearing-impaired subjects, local networks helped to disseminate the advertisement: the Disability Resource Centre of the University of British Columbia, local audiology clinics, and the Workers' Compensation Board of BC. For the speech-intelligibility tests in Chapter 3 and Chapter 4, the target number of subjects was thirty for each hearing group; however, it proved very challenging to find hearing-impaired subjects to volunteer for the tests. In Chapter 3, forty-three normal-hearing and twenty-eight hearing-impaired subjects completed the speech-intelligibility tests. In Chapter 4, twenty-five normal-hearing and thirteen hearing-impaired subjects participated in the tests. For the validation tests in Chapter 2, twelve subjects participated in the speech-intelligibility tests. The small number of subjects can decrease confidence in the current results. However, only three subjects could participate in each real-classroom test, since there were three receiver positions (r1, r2, and r3) designed for the tests, as shown in Figure 2.1. Having more subjects in the real-classroom listening tests would have required enormous amounts of time and classroom scheduling, which was unrealistic.

6.3 Future Work

The optimal reverberation for speech intelligibility has been explored experimentally in this work.
However, this study opens some directions for future work in this area. In this study, a typical medium-sized classroom was used as the environment for the speech-intelligibility tests. There is a need for an experimental study using various types of classroom to extend the current work; e.g., small classrooms and large auditoria. As described in Chapter 3 and Chapter 4, the talker, listener, and noise sources were positioned on the centre line of the room for the speech-intelligibility tests. It would be interesting to observe whether other talker-listener-noise combinations affect the optimal reverberation for speech intelligibility, especially when the azimuthal angle is incorporated into the talker-listener-noise combination. This could be studied by considering different types of classroom activities, e.g. group activities, round-table discussions, lectures, etc.; classroom activities are important factors affecting talker-listener-noise combinations. The investigation of more noise sources would be another improvement, to understand speech intelligibility more accurately. Only one noise signal (babble noise, representing student noise) was used in this work. To obtain more realistic classroom sound fields, different types of classroom noise source could be studied - for example, a ventilation outlet, computer-fan noise, a projector and other visual aids, as well as student-activity noise. While the hearing-impaired listeners in this study benefited from early reflections more than the normal-hearing listeners, it would also be interesting to verify the results for elderly listeners, or for more severely impaired listeners who use hearing aids in their everyday lives, to investigate the optimum reverberation for hearing-aid users.

References
[1] J. Besing and J. M. Koehnke, "A test of virtual auditory localization," Ear Hear. 16(2), 220-229 (1995).
[2] M. Hodgson and E.-M. Nosal, "Effect of noise and occupancy on optimal reverberation times for speech intelligibility in classrooms," J. Acoust. Soc. Am. 111(2), 931-939 (2002).
[3] American National Standards Institute, "ANSI S12.60 Acoustical performance criteria, design requirements and guidelines for schools," 2002.
[4] M. Hodgson, "Experimental investigation of the acoustical characteristics of university classrooms," J. Acoust. Soc. Am. 106(4), 1810-1819 (1999).
[5] M. Picard and J. S. Bradley, "Revisiting speech interference in classrooms," Audiology 40(5), 221-244 (2001).
[6] H. A. Knecht, P. B. Nelson, G. M. Whitelaw et al., "Background noise levels and reverberation times in unoccupied classrooms: Predictions and measurements," Am. J. Audiol. 11(2), 65-71 (2002).
[7] B. Shield and J. E. Dockrell, "External and internal noise surveys of London primary schools," J. Acoust. Soc. Am. 115(2), 730-738 (2004).
[8] W. Yang and M. Hodgson, "Acoustical evaluation of preschool classrooms," Noise Control Eng. J. 53(2), 43-52 (2005).

APPENDIX A - Ethical Certificate

The University of British Columbia, Office of Research Services and Administration, Behavioural Research Ethics Board: Certificate of Approval. Principal Investigator: Hodgson, M. R., Occupational & Environmental Hygiene, UBC Campus; Co-Investigator: Yang, Wonyoung, Occupational & Environmental Hygiene. Sponsoring agency: Natural Sciences and Engineering Research Council. Title: Optimum Acoustical Conditions in Classrooms. Approval date: September 20, 2005. The protocol was reviewed by the Committee and the experimental procedures were found to be acceptable on ethical grounds for research involving human subjects. Approval of the Behavioural Research Ethics Board by one of the following: Dr.
Peter Suedfeld, Chair, or Dr. Susan Rowley, Associate Chair. This Certificate of Approval is valid for the above term provided there is no change in the experimental procedures.

APPENDIX B - Consent Forms

Chapter 2

THE UNIVERSITY OF BRITISH COLUMBIA, School of Occupational & Environmental Hygiene, 3rd Floor, 2206 East Mall, Vancouver, BC, Canada V6T 1Z3; www.soeh.ubc.ca; (604) 822-9595 tel; (604) 822-9588 fax

INFORMED CONSENT FORM
Speech Intelligibility in Real and Virtual Classrooms

Principal Investigator: Murray Hodgson, Ph.D., Professor, Occupational & Environmental Hygiene, UBC, 822-3073
Co-Investigator: Wonyoung Yang, MSc, Ph.D. Candidate, Occupational & Environmental Hygiene, UBC, 822-9590

This study will form the basis of a part of Wonyoung Yang's Ph.D. research project. This study has been funded by the Natural Sciences and Engineering Research Council of Canada.

Purpose: The purpose of this study is to validate the acoustical virtual-reality technique for speech intelligibility in classrooms. The project could result in improved designs for classrooms using the acoustical virtual-reality technique, auralization. This study poses no significant risks to subjects.

Study Procedures: For the purposes of this study we are recruiting normal-hearing participants. Your participation in the study involves consenting to a hearing screening (comparable to those conducted in audiology clinics for hearing loss). This study consists of two parts: a live test in real classrooms and a virtual test in a laboratory. For the live test, you will be seated in a real classroom and word samples will be presented by a loudspeaker. For the first part of the virtual test, like that for a conventional hearing screening, you will be asked to raise your hand when you hear a tone presented over the headset. For the second part of the virtual test, you will be seated at a separate listening station and word samples will be presented over the same type of headset. These word samples will be presented at the same level as in the live test. During the testing, you will be asked to choose what you hear on a response sheet. You will receive your hearing-screening results immediately after the listening test. The total amount of your time needed for participating is 60 minutes per test.

Confidentiality: Your confidentiality will be respected. No information that discloses your identity will be released or published without your specific consent to the disclosure. No records which identify you by name or initials will be allowed to leave the Investigators' offices. Your rights to privacy are also protected by the 'Freedom of Information and Protection of Privacy Act' of British Columbia. This Act lays down rules for the collection, protection, and retention of your personal information by public bodies, such as the University of British Columbia. Further details about this Act are available upon request. Your results will be securely stored for possible future follow-up analyses.

Compensation: You will be compensated $10.00 for participating in this research study.

Contact for information about the study: If you have any questions or desire further information with respect to this study, please contact Prof. Murray Hodgson at 604-822-3073, or Won Yang at 604-822-9590.
Contact for information about the rights of research subjects: If you have any concerns about your treatment or rights as a research subject, please contact the Research Subject Information Line in the University of British Columbia Office of Research Services at 604-822-8598.

Consent: Your participation in this study is entirely voluntary and you may refuse to participate or withdraw from the study at any time without adversely affecting your relationship with the investigators, any of your health care providers, or the University of B.C. Your signature below indicates that you have received a copy of this consent form for your own records. Your signature indicates that you consent to participate in this study.

Printed Name: Signature of Participant: Date:

Chapter 3, Chapter 4

INFORMED CONSENT FORM
Optimum acoustical conditions for normal and hard of hearing people in classrooms

Principal Investigator: Murray Hodgson, Ph.D., Professor, Occupational & Environmental Hygiene, UBC, 822-3073
Co-Investigator: Wonyoung Yang, MSc, Ph.D. Candidate, Occupational & Environmental Hygiene, UBC, 822-9575

This study will form a part of Wonyoung Yang's Ph.D. research project. This study has been funded by the Natural Sciences and Engineering Research Council of Canada.

Purpose: The purpose of this study is to determine the optimal reverberation for normal and hard of hearing people in classrooms. Reverberation, which affects how speech is perceived, is regarded as an important issue with regard to speech intelligibility. The project could result in improved designs for classrooms for both normal and hard of hearing people. This study poses no significant risks to subjects.

Study Procedures: For the purposes of this study we are recruiting both normal and hard of hearing participants. Testing for normal and hard of hearing participants is identical. Your participation in the study involves consenting to a hearing screening (comparable to those conducted in audiology clinics for hearing loss). For the first part of the survey, like that for a conventional hearing screening, you will be asked to raise your hand when you hear a tone presented over the headset. For the second part of the survey, you will be seated at a separate listening station and word samples will be presented over the same type of headset. These word samples will be presented at a moderate listening level. During this portion of the testing, you will be asked to select what you hear on a response sheet. The word samples will be presented in such a way as to simulate a classroom setting. Various reverberations will be incorporated into the test using a specialized computer program. You will receive your hearing-screening results immediately after the listening test. The total amount of your time needed for participating is 50 minutes, on one occasion only.

Confidentiality: Your confidentiality will be respected. No information that discloses your identity will be released or published without your specific consent to the disclosure. No records which identify you by name or initials will be allowed to leave the Investigators' offices. Your rights to privacy are also protected by the 'Freedom of Information and Protection of Privacy Act' of British Columbia.
This Act lays down rules for the collection, protection, and retention of your personal information by public bodies, such as the University of British Columbia. Further details about this Act are available upon request. Your results will be securely stored for possible future follow-up analyses.

Compensation: You will be compensated $10.00 for participating in this research study.

Contact for information about the study: If you have any questions or desire further information with respect to this study, please contact Prof. Murray Hodgson at 604-822-3073, or Won Yang at 604-822-9575.

Contact for information about the rights of research subjects: If you have any concerns about your treatment or rights as a research subject, please contact the Research Subject Information Line in the University of British Columbia Office of Research Services at 604-822-8598.

Consent: Your participation in this study is entirely voluntary and you may refuse to participate or withdraw from the study at any time without adversely affecting your relationship with the investigators, any of your health care providers, or the University of B.C. Your signature below indicates that you have received a copy of this consent form for your own records. Your signature indicates that you consent to participate in this study.

Printed Name: Signature of Participant: Date:

APPENDIX C - 300 MRT Words in a Response Sheet

[The appendix reproduces the scanned response sheet listing the 300 Modified Rhyme Test words in numbered six-word rhyming sets; the scanned table is not legible in this transcription and is not reproduced here.]
