A MODEL FOR AUDITORY LATERALIZATION IN NON-STATIONARY MULTI-SOURCE ENVIRONMENTS

by

ZHENGJIN SHU

M.A.Sc., Beijing Institute of Technology, 1987
B.A.Sc., Beijing Institute of Technology, 1984

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Department of Electrical Engineering)

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA

October 1995

© Zhengjin Shu, 1995

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada

Date: October 17, 1995

ABSTRACT

The problem addressed in this thesis is the determination of the positions of sound sources in non-stationary multi-source environments. This problem is approached by developing models that mimic the processing of sounds by the auditory system. It is well known that in the localization process the auditory system utilizes interaural intensity and time differences (IID and ITD) and the interaural envelope delay (IED). However, the way such cues are estimated and organized by the auditory system in non-stationary multi-source situations is not known.

It is argued in this thesis that the auditory localization process can be divided into three serial processing stages: decomposition, localization, and integration (DLI). Specifically, the signals detected by the two ears are first decomposed into their spectro-temporal distributions as represented in the neural activities of the auditory nerve fibers. Short-time spatial attributes, in terms of the localization cues, are then determined from energy concentrations in these distributions. A spatial scene of acoustic events is finally built by integrating the energy concentrations according to their spatial attributes.

A unique DLI model is proposed in which short-time cue estimation is realized by a process of pattern recognition and comparison using neural networks, and the spatial scene is represented by short-time cue distributions. Three implementations of the DLI model, which model separate auditory pathways responsible for three different types of cue sensitivity (IID, ITD, and IED) observed in the auditory system, are developed, and their performance in estimating short-time cue distributions is studied by computer simulation.

It is shown that there are unexplored patterns in the neural signals carried by the auditory nerve fibers that are important for auditory localization. These patterns are shown to contain good indications of interaural differences, and can be used to obtain robust short-time cue estimates by training neural networks that have relatively simple structures. Furthermore, while such networks can be trained using the simplest types of inputs, they show the ability to generalize and perform well with more complex stimuli.
It is demonstrated that the model works well in noise and in non-stationary multi-source situations. The same model structure can be trained to estimate different localization cues, suggesting that the underlying structure of the different pathways responsible for different types of cue sensitivities in the auditory system may not necessarily be different. The receptive connection patterns of the hidden neurons in the model indicate that the spectro-temporal response properties of binaural neurons in the auditory system may play an important role in auditory localization, and that excitatory and inhibitory inputs to the binaural neurons play equally important roles in localization.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgments
INTRODUCTION
1 AUDITORY LOCALIZATION
  §1.1 Introduction
  §1.2 Approaches Used in the Study of Auditory Localization
  §1.3 A Brief Review of the Structure of the Auditory System
  §1.4 Binaural Hearing
  §1.5 Interaural Intensity Differences (IIDs)
  §1.6 Interaural Time Differences (ITDs)
  §1.7 Free Field Auditory Localization
  §1.8 Interaural Envelope Delay
  §1.9 Models of Auditory Localization
  §1.10 Cross-correlation Based Models
  §1.11 Models of Cue Sensitive Neurons
2 MODELING AUDITORY LOCALIZATION IN COMPLEX ACOUSTIC ENVIRONMENTS
3 A DLI MODEL FOR AUDITORY LOCALIZATION
  §3.1 A DLI Model
  §3.2 Three Parallel Implementations of the DLI Model
  §3.3 Modeling the Auditory Periphery
  §3.4 Measuring Short-Time Interaural Differences via Neural Networks
    3.4.1 Spectro-Temporal Patterns at the Inputs of the Networks
    3.4.2 Structure of the Networks
    3.4.3 Training of the Networks
  §3.5 Integrating Short-Time Cue Estimates
  §3.6 Summary of the DLI Model
4 SIMULATION METHODS
  §4.1 Simulation of the Peripheral Auditory System
    4.1.1 Time-Varying Narrow-Band Filter
    4.1.2 Feedback Control of the Filter Bandwidth
    4.1.3 Traveling Wave Delay
    4.1.4 Model of the Inner Hair Cell
    4.1.5 Modeling High-Frequency Auditory Nerve Fibers
  §4.2 Simulation of the Neural Networks
    4.2.1 Training Examples
    4.2.2 Specifications of Three Networks to be Tested
    4.2.3 The Training Process
    4.2.4 Short-Time Cue Estimates
  §4.3 Evaluation Methods of Trained Networks
    4.3.1 Parametric Description of Cue Estimates
    4.3.2 Statistical Evaluation of the Trained Networks
    4.3.3 Generalization Ability of the Networks
    4.3.4 Robustness Test of the Networks
    4.3.5 Testing in Multi-Source Situations
      A. A Model for Multi-Source Environments
      B. Orthogonal On-Off Modulations of Different Sound Sources
      C. Resolution Power for the Separation of Two Sound Sources
      D. Uncorrelated On-Off Modulations of Different Sound Sources
    4.3.6 Summary Table of Tests for the Evaluation of the Networks
5 TEST RESULTS OF ITD ESTIMATION
  §5.1 Pure-Tone Stimuli
  §5.2 Two-Tone Complex Stimuli
    Analysis of the Input Patterns of the ITD Network for Two-Tone Stimuli
  §5.3 Pink Noise Stimuli
  §5.4 Two Similar Sources from Different Directions
6 TEST RESULTS OF IID ESTIMATION
  §6.1 Pure-Tone Stimuli
  §6.2 Two-Tone Complex Stimuli
  §6.3 Pink Noise Stimuli
  §6.4 Two Similar Sources from Different Directions
7 TEST RESULTS OF IED ESTIMATION
  §7.1 AM Stimuli
  §7.2 Stimuli Modulated by Two-Tone Complex Signals
  §7.3 Two AM Sources from Different Directions
CONCLUSIONS
REFERENCES

LIST OF TABLES

Table 4.1 Networks (ITD, IID, IED) to be tested for the listed stimulus type and parameter combinations
Table 5.1 ITD estimates (mean ± standard deviation) and the error of the mean position estimate for pure-tone stimuli with a 250 μs ITD and a 20 dB SNR
Table 5.2 ITD estimates (mean ± standard deviation) and the error of the mean position estimate for pure-tone stimuli with a 250 μs ITD and a 0 dB SNR
Table 5.3 ITD estimates and the errors of the mean position estimates for pure-tone stimuli with different ITDs and a 20 dB SNR
Table 5.4 ITD estimates (mean ± standard deviation) and the error of the mean position estimate for two-tone complex stimuli with a 250 μs ITD and a 20 dB SNR
Table 5.5 ITD estimates (mean ± standard deviation) and the error of the mean position estimate for two-tone complex stimuli with a 250 μs ITD and a 0 dB SNR
Table 5.6 ITD estimates and the errors of the mean position estimates for two-tone complex stimuli with a 250 μs ITD and a 20 dB SNR. FD: frequency difference between the two components in the stimuli
Table 5.7 ITD estimates and the errors of the mean position estimates for pink noise stimuli with different bandwidths (BW). The SNR used in the tests was 20 dB, and the ITD of the stimuli was 250 μs
Table 5.8 ITD estimates and the error of the mean position estimate for pink noise stimuli with a 0 dB SNR and a 250 μs ITD. The bandwidth of the stimuli was 50 Hz
Table 5.9 ITD estimates and the errors of the mean position estimates for pink noise stimuli with different center frequencies (CF). The SNR used in the tests was 20 dB, and the ITD of the stimuli was 250 μs
Table 5.10 ITD estimates and the position errors for two-source cases with different on-off modulation periods (MP). The SNR was 20 dB, and the ITDs of the two sources were -250 μs and 250 μs, respectively
Table 5.11 ITD estimates and the position errors for a two-source case with a 0 dB SNR. The ITDs of the two sources were -250 μs and 250 μs, respectively
Table 5.12 ITD estimates and the position errors for two-source cases with different spatial separations (SS, in terms of the two ITDs). The SNR was 20 dB
Table 5.13 ITD estimates and the position errors for a two-source case with uncorrelated on-off modulations. The SNR was 20 dB
Table 6.1 IID estimates (mean ± standard deviation) and the error of the mean position estimate for pure-tone stimuli with a 3 dB IID and a 20 dB SNR
Table 6.2 IID estimates (mean ± standard deviation) and the error of the mean position estimate for pure-tone stimuli with a 3 dB IID and a 5 dB SNR
Table 6.3 IID estimates and the errors of the mean position estimates for pure-tone stimuli with different IIDs and a 20 dB SNR
Table 6.4 IID estimates (mean ± standard deviation) and the error of the mean position estimate for two-tone complex stimuli with a 3 dB IID and a 20 dB SNR
Table 6.5 IID estimates (mean ± standard deviation) and the error of the mean position estimate for two-tone complex stimuli with a 3 dB IID and a 5 dB SNR
Table 6.6 IID estimates and the errors of the mean position estimates for two-tone complex stimuli with a 3 dB IID and a 20 dB SNR. FD: frequency difference between the two components in the stimuli
Table 6.7 IID estimates and the errors of the mean position estimates for pink noise stimuli with different bandwidths (BW). The SNR used in the tests was 20 dB, and the IID of the stimuli was 3 dB
Table 6.8 IID estimates (mean ± standard deviation) and the error of the mean position estimate for pink noise stimuli with a 5 dB SNR and a 3 dB IID. The bandwidth of the stimuli was 500 Hz
Table 6.9 IID estimates and the errors of the mean position estimates for pink noise stimuli with different center frequencies (CF). The SNR used in the tests was 20 dB, and the IID of the stimuli was 3 dB
Table 6.10 IID estimates and the position errors for two-source cases with different on-off modulation periods (MP). The SNR was 20 dB, and the IIDs of the two sources were -3 dB and 3 dB, respectively
Table 6.11 IID estimates and the position errors for a two-source case with a 5 dB SNR. The IIDs of the two sources were -3 dB and 3 dB, respectively
Table 6.12 IID estimates and the position errors for two-source cases with different spatial separations (SS, in terms of the two IIDs). The SNR was 20 dB
Table 6.13 IID estimates and the position errors for a two-source case with uncorrelated on-off modulations. The SNR was 20 dB
Table 7.1 IED estimates (mean ± standard deviation) and the error of the mean position estimate for AM stimuli with a 0.83 ms IED and a 20 dB SNR
Table 7.2 IED estimates (mean ± standard deviation) and the error of the mean position estimate for AM stimuli with a 0.83 ms IED and a 5 dB SNR
Table 7.3 IED estimates and the errors of the mean position estimates for AM stimuli with different IEDs and a 20 dB SNR
Table 7.4 IED estimates (mean ± standard deviation) and the error of the mean position estimate for stimuli with two-tone complex modulation waveforms. The IED to be estimated was 0.83 ms. The SNR in the test was 20 dB
Table 7.5 IED estimates (mean ± standard deviation) and the error of the mean position estimate for stimuli with two-tone complex modulation waveforms. The IED to be estimated was 0.83 ms. The SNR in the test was 5 dB
Table 7.6 IED estimates and the errors of the mean position estimates for stimuli with two-tone complex modulation waveforms. The IED to be estimated was 0.83 ms. The SNR in the test was 20 dB. FD: frequency difference between the two components in the modulation waveforms
Table 7.7 IED estimates and the position errors for two-source cases with different on-off modulation periods (OOP). The SNR was 20 dB, and the IEDs of the two sources were ±0.83 ms, respectively
Table 7.8 IED estimates and the position errors for a two-source case with a 17 ms on-off modulation period (OOP). The SNR was 5 dB, and the IEDs of the two sources were ±0.83 ms, respectively
Table 7.9 IED estimates and the position errors for two-source cases with different spatial separations (SS, in terms of the two IEDs). The SNR was 20 dB
Table 7.10 IED estimates and the position errors for a two-source case with random on-off modulations. The SNR was 20 dB, and the IEDs of the two sources were ±0.83 ms, respectively

LIST OF FIGURES

Fig. 1.1 Three major parts of the auditory system
Fig. 1.2 Four parts of the auditory periphery
Fig. 1.3 The main ascending auditory pathways of the brainstem. For abbreviations see the main text. (After Pickles, 1988, Page 170.)
Fig. 1.4 A coordinate system for sound localization
Fig. 1.5 Interaural amplitude ratio α for tones as a function of the azimuth angle θ and the frequency f (Shaw, 1974a, b). (After Durlach and Colburn, 1978.)
Fig. 1.6 Interaural time difference τ for tones and clicks as a function of the azimuth angle θ (Shaw, 1974a, b). The solid lines represent data obtained by Firestone (1930), Mills (1958), Nordlund (1962), and Feddersen et al. (1957). The lower dashed curve is derived from the formula τ = (r/c)(θ + sin θ) and the upper dashed curve from the formula τ = (3r/c) sin θ, where r = 8.75 cm and c = 344 m/s (Woodworth, 1938; Rayleigh, 1945). (After Durlach and Colburn, 1978.)
Fig. 1.7 Lateral displacement (calibrated as follows: 0 corresponds to θ = 0°, -5 to θ = -90°, and 5 to θ = +90°) of the auditory event as a function of the interaural intensity difference (dB). (After Blauert, 1983, Page 158.)
Fig. 1.8 The perceived location of the pure tone sound image as a function of the interaural intensity difference. The perceived location is normalized across the studies with 0% corresponding to θ = 0° and 100% to θ = ±90°. (After Yost and Hafter, 1987.)
Fig. 1.9 The IID threshold for perceived change as a function of frequency for tones. The parameters (0, 9, and 15 dB) are the IIDs of the referent tones which serve to mark positions in lateral space. (After Yost and Hafter, 1987.)
Fig. 1.10 Effects of varying IIDs on the discharge rate of a neuron in the cat dorsal nucleus of the lateral lemniscus. (After Kuwada and Yin, 1987.)
Fig. 1.11 Responses of a superior colliculus neuron of cat to changes in IID. (After Kuwada and Yin, 1987.)
Fig. 1.12 Binaural coherence curves for tones of different frequencies. P denotes the percentage of judgments "to the left," and positive τ corresponds to the left ear leading. (After Durlach and Colburn, 1978.)
Fig. 1.13 The perceived location of the pure tone sound image as a function of the interaural time difference (in terms of degrees of interaural phase difference). The perceived location is normalized with 0% corresponding to θ = 0° and 100% to θ = ±90°. (After Yost and Hafter, 1987.)
Fig. 1.14 The perceived locations (calibrated as follows: 0 corresponds to θ = 0°, 5 to θ = 90°, and -5 to θ = -90°) of impulse sound images as a function of the interaural time difference. (After Blauert, 1983, Page 144.)
Fig. 1.15 The ITD (in terms of degrees of interaural phase difference) threshold for perceived change as a function of frequency for tones. The parameters are the ITDs (again, in terms of degrees of interaural phase difference) of the referent tones which serve to mark positions in lateral space. (After Yost and Hafter, 1987.)
Fig. 1.16 Responses (A and B) of two inferior colliculus neurons to changes in ITD, and waveforms (C, D, and E) of three types of binaural stimuli used in the stimulation of these two neurons. (After Kuwada and Yin, 1983.)
Fig. 1.17 Angle of just noticeable difference (JND) (Δθ) for tone bursts as a function of the tone frequency f and the angle θ between the sound source and the median plane. (After Durlach and Colburn, 1978.)
Fig. 1.18 Comparison made by Mills (1960, 1972) of the interaural phase JND and the interaural amplitude JND for ITD = 0 and IID = 0 with the changes in ITD (Δτ) and in IID (Δα) that occur when an actual source is moved a just noticeable angle from the median plane. (After Durlach and Colburn, 1978.)
Fig. 1.19 The coincidence network proposed by Jeffress in 1948 for the localization of low-frequency tones. (After Colburn and Durlach, 1978.)
Fig. 1.20 A block diagram of the cross-correlation model of Sayers and Cherry (1957). The exponential term in the figure corresponds to the first weighting factor, and the term AL or AR to the second weighting factor discussed in the main text. The term Av(t) refers to temporal average. (After Colburn and Durlach, 1978.)
Fig. 1.21 A block diagram of the auditory nerve based model by Colburn (1973, 1977). (After Colburn and Durlach, 1978.)
Fig. 1.22 Schematic diagram of the stereausis binaural network suggested by Shamma (1989). C_ij denotes the output of the (i,j)-th neuron in the matrix, which receives inputs from the ipsilateral fiber X_i and the contralateral fiber Y_j
Fig. 1.23 Location of the peaks of the multi-channel cross-correlation function for broad-band noise with an ITD of 1500 μs. The vertical axis indicates the center frequency of the different band-pass filter channels, while the horizontal axis indicates the argument (ITD) of the cross-correlation function. (After Stern et al., 1988.)
Fig. 1.24 Schematic diagram of the model by Sujaku et al. (1981) for ITD sensitive inferior colliculus neurons
Fig. 1.25 Schematic diagram of the connections from AVCN and MNTB to the LSO column in the model by Reed and Blum (1990)
Fig. 2.1 Simulated responses of a cochlear model (Shamma et al., 1989) to a 600 Hz sinusoidal stimulus. (After Shamma et al., 1989.)
Fig. 3.1 A schematic diagram of a DLI model for auditory localization in complex acoustic environments
Fig. 3.2 Schematic diagrams of three implementations of the generic DLI model (shown in Fig. 3.1): (a) ITD estimation; (b) IID estimation; and (c) IED estimation. ST-IID: short-time IID. ST-ITD: short-time ITD. ST-IED: short-time IED
Fig. 3.3 Example of neural activities of a group of nine (vertical axis) modeled auditory nerve fibers in responding to a pure tone stimulus of 900 Hz
Fig. 3.4 Frequency spectrum of the signal in Eq. 3.2
Fig. 3.5 High modulation rates. Example of neural activities of a group of auditory nerve fibers when the stimulus has relatively high modulation rates. Each square represents the activation of a particular auditory nerve fiber at a particular time slice. The white area in each square is proportional to the activation level. AN: auditory nerve
Fig. 3.6 Intermediate modulation rates. Example of neural activities of a group of auditory nerve fibers when the stimulus has intermediate modulation rates. Each square represents the activation of a particular auditory nerve fiber at a particular time slice. The white area in each square is proportional to the activation level. AN: auditory nerve
Fig. 3.7 Low modulation rates. Example of neural activities of a group of auditory nerve fibers when the stimulus has relatively low modulation rates. Each square represents the activation of a particular auditory nerve fiber at a particular time slice. The white area in each square is proportional to the activation level. AN: auditory nerve
Fig. 3.8 A schematic diagram of the structure of the networks for short-time cue estimations. The squares represent neurons in the network. The size of the white area in a square represents the amplitude of the neuron's activity, which is normalized to be from zero to one. The adjacent layers in the network are fully connected in a feed-forward fashion. See text for details on the connection between layers
Fig. 3.9 An example of the tuning curve of an output layer neuron. The arrowhead line at ITD = 0.1 ms represents the activity of the neuron for this particular ITD value
Fig. 3.10 An example of the coding of cue values by an array of output layer neurons. The arrowhead lines indicate the activities of the individual neurons. The mean of the Gaussian envelope is the encoded cue value
Fig. 3.11 An example of the ITD distribution function. The sound stimuli used in the simulation were 1 kHz pure tones to which white noise was added with a 20 dB signal-noise ratio. The ITD in the stimuli was 250 μs. The peak around 250 μs indicates the correct estimate of the true ITD value
Fig. 4.1 A model of the neural signals in auditory nerve fibers (de Boer and de Jongh, 1978)
Fig. 4.2 A schematic diagram of the auditory periphery model by Carney (1993). See text for descriptions of the different elements in the model. (Adapted from Carney, 1993.)
Fig. 4.3 Parameters associated with an ITD distribution peak
Fig. 4.4 Examples of the two square-wave signals, S1(t) and S2(t), which are orthogonal to each other
Fig. 5.1 ITD estimates shown as a function of the true ITDs
Fig. 5.2 Example input activation patterns of the ITD network for a two-tone complex stimulus whose two components have a 250 Hz difference
Fig. 5.3 An example ITD distribution function for a two-source case with a 25 ms on-off modulation period
Fig. 5.4 The ITD distribution histogram for the uncorrelated two-source case. The parameters that describe the two target sources are shown in Table 5.13
Fig. 6.1 IID estimates shown as a function of the true IIDs
Fig. 6.2 An example IID distribution function for a two-source case with a 35 ms on-off modulation period
Fig. 6.3 The IID distribution function for the uncorrelated two-source case. The parameters that describe the two target sources are shown in Table 6.13
Fig. 7.1 Estimated IEDs shown as a function of the true IEDs
Fig. 7.2 The resulting IED distribution function for the test with a 17 ms on-off modulation period
Fig. 7.3 The IED distribution function for the two-source case with random on-off modulations. Parameters associated with the two lateral peaks are shown in Table 7.10

LIST OF ABBREVIATIONS

AM    Amplitude Modulation
AN    Auditory Nerve
AVCN  Anteroventral Cochlear Nuclei
BF    Binaural Facilitation
BW    Bandwidth
CF    1. Characteristic Frequency; 2. Center Frequency
DC    Direct Current
DCN   Dorsal Cochlear Nuclei
DLI   Decomposition-Localization-Integration
EE    Excitatory-Excitatory
EI    Excitatory-Inhibitory
FD    Frequency Difference
IC    Inferior Colliculus
IED   Interaural Envelope Delay
IID   Interaural Intensity Differences
ITD   Interaural Time Differences
JND   Just Noticeable Difference
LL    Lateral Lemniscus
LSO   Lateral Superior Olive
MAA   Minimum Audible Angle
MGB   Medial Geniculate Body
ML    Maximum Likelihood
MNTB  Medial Nucleus of the Trapezoid Body
MP    Modulation Period
MSO   Medial Superior Olive
MTB   Medial Trapezoid Body
OOP   On-Off Modulation Period
PVCN  Posteroventral Cochlear Nuclei
SNR   Signal-Noise Ratio
SOC   Superior Olivary Complex
SPL   Sound Pressure Level
SS    Spatial Separation
ACKNOWLEDGMENTS

I would like to express my sincere gratitude and thanks to my supervisor in the Department of Electrical Engineering, Dr. Charles A. Laszlo; without his constant encouragement and invaluable guidance, this thesis would not have been possible. I am also deeply indebted to my supervisor in the Department of Ophthalmology, Dr. Max S. Cynader, for his invaluable support, inspiration, and guidance. I have learned a great deal from both of them during the course of this research.

Special thanks go to Dr. Nicholas V. Swindale, who has given me many suggestions over the past few years, and has commented on the draft of this thesis. Thanks also go to Dr. Robert M. Douglas for his help with computer programming and answers to questions about psychology, and to Dr. Pierre Zakarauskas for many inspiring discussions.

I would also like to take this opportunity to thank Doris Metcalf, graduate secretary in the Department of Electrical Engineering, and Barry Gibbs, secretary in the Ophthalmology Research Laboratory, for their help over the years of my study at UBC. Thanks also go to other staff members of the Departments of Electrical Engineering and Ophthalmology whose help did not go unnoticed.

To my parents

INTRODUCTION

The ability to localize sound sources enables us humans to obtain important information about what and where an event happens. It also contributes to our ability to understand speech. We could be very confused and sometimes in danger if we could not determine where a sound comes from. There is no doubt that auditory localization plays an important role in the survival of a large number of biological species. A question for us is how the auditory system solves the problem of sound localization.

The problem of localizing sound sources also arises in many engineering systems, and applications include underwater object detection, medical diagnosis, surveillance systems, robot sensing, and hearing aid devices. Passive sonar applications are particularly relevant to sound localization. Although much research has been conducted to improve the performance of such engineering systems, the physiological ear is still superior to artificial devices under similar conditions. In fact, the powerful ability of the auditory system to localize sound sources has stimulated curiosity among researchers in various disciplines for more than a century. In particular, physiologists and psychologists have been focusing their inquiries on why and how the auditory system works, while engineers and computer scientists have been interested in the duplication of auditory functions and in methods to accomplish the localization task. It is likely, however, that an integrated understanding of the problem of auditory localization would benefit from mutual interactions among research in different areas.
Such integrated understanding is necessary to meet the challenge of developing artificial devices that emulate auditory localization with only two sensors.

Auditory localization occurs in both the horizontal and vertical planes (Fig. 1.4). Horizontal localization is dominant, and under some conditions it is called lateralization. In this thesis, the term "localization" refers to horizontal localization, unless it is clear from the context that it is used in its more general meaning.

Previous studies have identified certain essential aspects of auditory localization. Cues used by the auditory system to localize sound sources, and possible neural mechanisms for the use of these cues, have been studied and documented for decades. Specifically, three binaural cues, the interaural time (ITD) and intensity (IID) differences and the interaural envelope delay (IED) of the sound signals detected by the two ears, play important roles in the sound localization process. Theoretical studies have been carried out to investigate how these cues may be processed in the brain in order to obtain estimates of the directions of sound sources. In addition, models have been created to integrate such hypotheses with experimental data. These efforts give rise to interesting ideas for the design of artificial devices that mimic the auditory localization process.

Despite the previous efforts in the study of auditory localization, the understanding of how such localization is achieved in noisy, dynamic, and multi-source environments, which are typical in nature, remains a challenging problem. In fact, this problem has not been specifically addressed with enough emphasis by published models, which are designed to fit data from experiments conducted in static acoustic environments with high signal-noise ratios (SNRs). However, the auditory system can localize sound sources with excellent performance under noisy and time-varying conditions. Investigations in this direction may provide an opportunity to discover new mechanisms of, or insights into, auditory localization that may not be so evident under more simplified and idealized assumptions about the acoustic environment. Another reason for the consideration of more complex and realistic acoustic environments is that any practical localization device must work in natural (noisy) situations. Thus, it is the objective of this work to study mechanisms for auditory localization in noisy and time-varying acoustic environments.

We argue that in order for the auditory system to localize sound sources under such conditions, it is essential for the system to estimate the location cues quickly and robustly. Thus, our approach is to develop and study models that estimate the well-known binaural cues in such fashion in noise and in short time durations. This modeling strategy is also crucial in developing models that can localize sound sources in multi-source environments.

Localizing multiple sound sources with only two ears is a difficult problem. However, the auditory system can localize multiple sound sources in rather complicated situations. One possible reason that the auditory system can do this is its ability to decompose the acoustic input signals into different frequency bands. By focusing on different frequency bands while performing the localization task, sources with different frequency contents can be separately localized.
When two sound sources have similar frequency contents, however, the problem of separating the directions of the two sources remains unsolved.

In order for the auditory system to solve the problem in such situations, the localization cues need to be estimated quickly and robustly in a changing acoustic environment. We note that, in a natural acoustic environment, there are many transient effects, in the sense that sounds come and go and the energies emitted from the sound sources change from time to time. Thus, we can model the acoustic environment by assuming that multiple sounds are organized in such a way that, in most small time intervals, there is always one sound that has significantly higher intensity than the other sounds in the environment. This loudest sound can then be treated as the signal in the corresponding time windows, and all other sounds as noise. In this way it is possible to pick up multiple sound sources that emit over the entire observation period but have changing relative intensity or energy.
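A minimal sketch of this windowed dominance assumption is given below: two on-off modulated sources are mixed, and each short analysis window is attributed to whichever source currently dominates in energy. The signals, window length, and dominance threshold here are illustrative assumptions for the sketch, not values used in the simulations of later chapters.

```python
import numpy as np

fs = 16000                                  # sampling rate (Hz), an assumption
t = np.arange(0, 1.0, 1.0 / fs)             # 1 s of signal

# Two tones with uncorrelated on-off modulation (cf. Section 4.3.5);
# frequencies and modulation rates are illustrative only.
s1 = np.sin(2 * np.pi * 500 * t) * (np.sin(2 * np.pi * 3 * t) > 0)
s2 = np.sin(2 * np.pi * 520 * t) * (np.sin(2 * np.pi * 7 * t + 1.0) > 0)

win = int(0.020 * fs)                       # 20 ms analysis windows
labels = []
for k in range(0, len(t) - win + 1, win):
    e1 = np.sum(s1[k:k + win] ** 2)         # per-window energies
    e2 = np.sum(s2[k:k + win] ** 2)
    if e1 > 4 * e2:                         # one source clearly dominant:
        labels.append(1)                    # treat it as the "signal" and
    elif e2 > 4 * e1:                       # the rest of the mixture as noise
        labels.append(2)
    else:
        labels.append(0)                    # no clearly dominant source

print(labels.count(1), labels.count(2), labels.count(0))
```

In this view, windows without a clear dominant source simply yield less consistent cue estimates, which the integration stage described later aggregates into cue distributions.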
In 1908, Mallock suggested that localization could bebased on interaural time differences.In 1907, Rayleigh formulated, for the first time, the duplex theory of localization,which states that the localization of low-frequency sounds depends on interaural timeI Stimuli presented through headphones are called dichotic stimuli if the two signals at the two earsare not identical.Chapter 1 Auditory Localization 6differences (ITDs) while that of high-frequency sounds depends on interauralintensity differences (liDs).During the past several decades, a large number of studies have been carried outto further investigate the mechanisms involved in auditory localization. These studiesgive us a much more complicated picture regarding how auditory localization isachieved. While the ideas of ITDs and ilDs being important localization cues remaintrue, the duplex theory of localization has proved to be only partially correct. In thischapter, we shall give a detailed review of the more recent studies on auditorylocalization.§1.2 Approaches Used in the Study of Auditory LocalizationAs described in the last section, research on auditory localization has a rather longhistory. Various approaches have been used to understand the possible mechanismsunderlying sound localization by different biological species. This diversity ofapproaches is a reflection on the complexity of the problem.At first, localization has been studied using psychophysical methods. Based onthe measurements of various thresholds of sensation generated by different acousticstimuli, functional mechanisms could be hypothesized. Indeed, research done usingsuch methods has the longest history (over a century) producing a major part of ourunderstanding of auditory localization.Second, physical acoustics has been used to study the physical aspects of soundstimuli (frequency, intensity, etc. ) that make localization possible. The relationshipbetween these aspects and the direction of a sound source has been investigated usingboth theoretical analysis and direct measurements. Theses studies have establishedthe physical foundation of directional hearing.Another important aspect of auditory localization research is the physiology ofauditory localization. By recording the electrical activities of neurons in differentChapter 1 Auditory Localization 7sites along the auditory pathways, this approach has been a powerful tool in revealingthe neural mechanisms of auditory localization.Finally, the theoretical approach has been used along with experimental studies.Although the majority of research has been experimental, theoretical analysis hasplayed an important role in interpreting the data, in hypothesizing the possiblemechanisms of the auditory system, and in integrating our understanding of theproblem. Moreover, the results of this aspect of the research, i.e. the models, are mostuseful in engineering applications§1.3 A Brief Review of the Structure of the Auditory SystemThe auditory system can roughly be divided into three parts: the auditory periphery,the brainstem auditory nuclei, and the auditory cortex. The primary information flowin the auditory system is from the periphery through the brainstem auditory nuclei tothe auditory cortex, as shown in Fig. 1.1. The auditory periphery is stimulated by thesound coming into the ear. The sound pressure signal is transformed in the auditoryperiphery into neural activities of the auditory nerve fibers, which link the auditoryperiphery to the brainstem auditory nuclei. 
The neural activities of the auditory nervefibers are relayed through a number of brainstem auditory nuclei, where certain typesof processing occur, to the auditory cortex. The pathways from the periphery throughthe brainstem to the cortex are called the ascending pathways, as opposed to thevarious feedback pathways (centrifugal pathways) from the cortex to the brainstemand to the periphery.The auditory periphery itself can in turn be divided into four parts: the outer ear,the middle ear, the cochlea, and the auditory nerve, as shown in the schematicdiagram in Fig. 1.2. The middle ear couples sound energy in the outer ear to thecochlea. It can be viewed as a mechanical transformer which matches the impedanceof the air in the outer ear with the much higher impedance of the fluids inside theChapter 1 Auditory Localization 8cochlea. The cochlea transforms mechanical vibration into neural signals carried bythe auditory nerve fibers. This transformation has been the subject of bothexperimental and modeling research for many decades (Allen, 1985).The neural signals carried by the auditory nerve fibers are processed and relayedto the auditory cortex through a number of “stations” in the brainstem, where neuralinformation converges and diverges. These “stations” are referred to as nuclei. Fig.1.3 shows a schematic diagram of the main ascending auditory pathways of thebrainstem (Pickles, 1988).The neural signals in the auditory nerve are first fed into the cochlear nucleus(CN) which consists of three parts, the dorsal (DCN), posteroventral (PVCN) andanteroventral (AVCN) cochlear nuclei. One part of the pathways from the cochlearnucleus leads to the superior olivary complex (SOC), while another part bypasses thisSoundStimulusSoundStimulusThe Auditory SystemFig. 1.1 Three major parts of the auditory system.NeuralActivitiesFig. 1.2 Four parts of the auditory periphery.Chapter] Auditory Localization 9complex and leads to the lateral lemniscus (LL) and its nucleus. The superior olivarycomplex consists of three parts, the lateral superior olive (LSO), the medial superiorolive (MSO), and the medial trapezoid body (MTB). Among them, two parts, theLSO and the MSO, receive information from both left and right cochlea nuclei.Signals from LSO and MSO are then fed into the lateral lemniscus, which in turnsends signals to the inferior colliculus (IC). Signals from IC are sent to the medialgeniculate body (MGB), which in turn sends signals to the auditory cortex.Soc1LSO IL_-_ _iFig. 1.3 The main ascending auditory pathways of thebrainstem. For abbreviations see the main text. (After Pickles,1988, Page 170.)§1.4 Binaural HearingFig. 1.4 shows a coordinate system for the study of sound localization. The stimulussignals detected by the two ears when a sound source is emitting energy from aspecific direction (0, 0) generally contain subtle differences. As mentioned in SectioncortexNLLChapter 1 Auditory Localization 101.1, the importance of such differences has long been recognized by early researchers(Rayleigh, 1907). The interaural differences caused by a single source have beenstudied by a number of people (e.g. Shaw, 1974a, b; Steinberg and Snow, 1934;Kuhn, 1979). Fig. 1.5 and Fig. 1.6 (After Durlach and Colburn, 1978) showsummaries made by Shaw (1 974a,b) of data obtained by several researchers(Firestone, 1930; Mills, 1958; Nordlund, 1962; Feddersen et al., 1957; Woodworth,1938; and Rayleigh, 1945).HORIZONTALPLANErOV VERTICAL (SAGITTAL)PLANEFig. 
1.4 A coordinate system for sound localization.Despite these differences between the two signals at the two ears, we usuallyperceive only one sound image coming from a direction in the vicinity of the source.0 =009 = —90°left0 = 90°right0 = 180°4’ = 90°0= 18004’ = —9OChapter] Auditory Localization 11We say that a sound image is fused if the image is spatially compact and unitary. Ifthe image is perceived to be outside the head, we say that the image is externalized.2015I000C2520—. ISS002 0.3 04 0.5 0.7 10 1.4 2 3f (kHz)Fig. 1.5 Interaural amplitude ratio a for tones as a function ofthe azimuth angle 0 and the frequency f (Shaw, 1974a, b).(After Durlach and Colburn, 1978.)In order to study the different effects of different parameters in the interauraldifferences, researchers often use headphones to deliver stimuli to the listener’s twoears with well defined interaural differences. Headphone stimulation often produces asound image inside the head. When the parameters in the interaural differences arechanged, the image can usually be moved around inside one’s head, and suchmovement is along the left-right axis. Thus, the task of judging the lateral positionsof such sound images is referred to as lateralization. By controlling the exactinteraural differences in the two signals at the listener’s two ears, the effects of theinteraural differences on the lateralization of the sound image can be studied. Weshall refer to binaural stimuli as diotic if there is no interaural difference between the(a)-1000 Hz,. — —0HzI I I30 60 90 120 50 1809 (degrees)Z-90il iI4 5 7 1012Chapter] Auditory Localization 12two signals at the two ears. When there are interaural differences between the twosignals, we refer to the stimuli as dichotic.0.9 -250 Hz0.8 -1000Hz0.7- //0.6— / CLICKS2000Hz / /%‘E /j0.4— / /I, j/0.3— i/Ii \ ‘I,0.2-1 \\‘0.10j- I II iii ii Ii0 30 60 90 120 150 1808 (degrees)Fig. 1.6 Interaural time difference ‘T for tones and clicks as afunction of the azimuth angle 0 (Shaw, 1974a, b). The solidlines represent data obtained by Firestone (1930), Mills (1958),Nordlund (1962), and Feddersen et al. (1957). The lowerdashed curve is derived from the formula t = r/c (0+ sin( 0))and the upper dashed curve from the formula r = r/c(3sin(0)),where r = 8.75 cm and c = 344 m/s (Woodworth, 1938;Rayleigh, 1945). (After Durlach and Colburn, 1978.)§1.5 Interaural Intensity Differences (liDs)Over the past several decades the effects of the lID on the perceived lateral positionof the sound image have been studied by many researchers. One question concerningsuch effects is: what value of the lID can cause complete lateralization of theChapter] Auditory Localization 13perceived sound image. Pinheiro and Tobin reported in 1969 that the 11D required fora noise burst to be perceived as from the side of the head was roughly 10 dB,independent of the level and duration of the burst. Similar results were reported byBekesy in 1959 for a click train of 100 pulses per second. Flanagan et al. (1964)found that the lID required for a click to be perceived from the side of the head is lessthan 10 dB. Others (e.g. Guttman, l962a; Moushegian and Jeffress, 1959; Sayers,1964; Whitworth and Jeffress, 1961) reported that the value of lID for completelateralization of clicks and tones was much greater than 10 dB.Another issue is the different effects of the 11D under different conditions. Bekesyin 1960 reported that the lateralization effect of the lID was approximately the samefor tones above 3 kHz, for noise, and for clicks. 
§1.5 Interaural Intensity Differences (IIDs)

Over the past several decades the effects of the IID on the perceived lateral position of the sound image have been studied by many researchers. One question concerning such effects is: what value of the IID can cause complete lateralization of the perceived sound image? Pinheiro and Tobin reported in 1969 that the IID required for a noise burst to be perceived as coming from the side of the head was roughly 10 dB, independent of the level and duration of the burst. Similar results were reported by Bekesy in 1959 for a click train of 100 pulses per second. Flanagan et al. (1964) found that the IID required for a click to be perceived from the side of the head is less than 10 dB. Others (e.g. Guttman, 1962a; Moushegian and Jeffress, 1959; Sayers, 1964; Whitworth and Jeffress, 1961) reported that the value of IID for complete lateralization of clicks and tones was much greater than 10 dB.

Another issue is the different effects of the IID under different conditions. Bekesy in 1960 reported that the lateralization effect of the IID was approximately the same for tones above 3 kHz, for noise, and for clicks. Moushegian and Jeffress (1964) and Feddersen et al. (1957) found that the lateralization of the sound image for a certain IID value was frequency dependent for tones.

Some later studies tried to quantify the effects of the IID on lateralization over a range of IID values, giving the perceived lateralization of the sound image as a function of the IID. Fig. 1.7 shows data from Blauert (1983), where the perceived lateral positions of both broad-band noise and 600 Hz tones are plotted as functions of the IID. Until full lateralization is attained, the function is more or less linear. Fig. 1.8 shows a comparison made by Yost and Hafter (1987) of data obtained in two studies (Watson and Mittler, 1965; and Yost, 1981). Yost and Hafter (1987) have concluded that the relationship between the lateral position and the IID is approximately the same over a considerable range of frequency, overall level, and duration.

Fig. 1.7 Lateral displacement (calibrated as follows: 0 corresponds to θ = 0°, -5 to θ = -90°, and 5 to θ = +90°) of the auditory event as a function of the interaural intensity difference (dB). (After Blauert, 1983, Page 158.)

Fig. 1.8 The perceived location of the pure tone sound image as a function of the interaural intensity difference. The perceived location is normalized across the studies with 0% corresponding to θ = 0° and 100% to θ = ±90°. (After Yost and Hafter, 1987.)
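Fig. 1.7 and Fig. 1.8 suggest a roughly linear position-versus-IID relationship up to full lateralization. The toy mapping below captures that shape; the 10 dB saturation point is an assumption drawn loosely from the studies cited above, not a fitted value from the thesis.

```python
def lateral_position_from_iid(iid_db, full_lateralization_db=10.0):
    """Illustrative piecewise-linear map from IID (dB) to lateral position.

    Position is normalized so that 0.0 is the centre of the head and
    +/-1.0 is full lateralization to one side. The 10 dB saturation
    point is only a rough figure, not a thesis parameter.
    """
    pos = iid_db / full_lateralization_db
    return max(-1.0, min(1.0, pos))

# A 3 dB IID maps to about a third of the way toward the ear receiving
# the more intense signal:
print(lateral_position_from_iid(3.0))    # 0.3
```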
Sensitivity of the auditory system to changes in the interaural intensity difference has also been studied by a number of researchers (Mills, 1960; Rowland and Tobias, 1967; Hershkowitz and Durlach, 1969; Grantham, 1984). The smallest change in the IID that leads to a change in the perceived lateral position of the sound image was measured in these studies. Fig. 1.9 shows a comparison made by Yost and Hafter (1987) of studies by Mills (1960) and Yost and Dye (1987). The different curves in Fig. 1.9 correspond to different IIDs around which the smallest perceived changes in IID were measured. Thus, the different curves demonstrate the sensitivity of the auditory system to changes in IID at different positions of the sound images. A 0 dB IID corresponds to a sound image in the center of the head; a 15 dB IID corresponds to a sound image fully lateralized to one side of the head; a 9 dB IID corresponds to a sound image somewhere between the positions of the images produced by a 0 dB IID and a 15 dB IID.

Fig. 1.9 The IID threshold for perceived change as a function of frequency for tones. The parameters (0, 9, and 15 dB) are the IIDs of the referent tones which serve to mark positions in lateral space. (After Yost and Hafter, 1987.)

The upward shifting of the curves from the 0 dB case to the 15 dB case indicates that the sensitivity to IID changes is highest when the sound image is perceived to be in the center of the head. The sensitivity becomes poorer when a sound image is perceived to be more lateral. Also note that the auditory system is less sensitive to changes in the IID for tones at frequencies around 1 kHz.

The effects of varying the IID on the responses of auditory neurons have also been studied for several decades. These studies have aimed at elucidating the functional roles of binaural neurons and the neural mechanisms of the IID sensitivity observed in psychophysical experiments. Neurons sensitive to the IID have been found in various sites along the auditory pathways, including the superior olivary complex (Boudreau and Tsuchitani, 1968; Goldberg and Brown, 1968; Caird and Klinke, 1983), dorsal nucleus of the lateral lemniscus (Brugge et al., 1970), inferior colliculus (Ross et al., 1966; Geisler et al., 1969; Stillman, 1972), superior colliculus (Schechter et al., 1981; Hirsh et al., 1985; Yin et al., 1985), medial geniculate body (Aitkin and Webster, 1972), and auditory cortex (Brugge et al., 1969; Brugge and Merzenich, 1973; Imig and Adrian, 1977; Phillips and Irvine, 1981). The characteristic frequencies (CFs) of IID sensitive neurons fall mostly in the higher region of the audible frequency range. This is consistent with the psychophysical observation that high-frequency tones can be lateralized if there is an IID in the stimuli.

Two major classes of IID sensitive neurons have been found. Neurons in the first class are excited by ipsilateral stimulation and inhibited by contralateral stimulation. These neurons are referred to as "EI" cells to reflect the separate excitatory-inhibitory influences from the two ears. Fig. 1.10 (after Kuwada and Yin, 1987) shows the IID response characteristics of such a neuron found in the nucleus of the lateral lemniscus (Brugge et al., 1970). In the experiment where the data shown in Fig. 1.10 were obtained, the stimulus intensity at the contralateral ear was held constant while the intensity at the ipsilateral ear was varied. The vertical axis in Fig. 1.10 shows the percentage of the number of spikes with respect to the number of spikes generated with contralateral stimulation alone. The stimulus used was a 6.4 kHz tone burst with a 200 ms duration.

Fig. 1.10 Effects of varying IIDs on the discharge rate of a neuron in the cat dorsal nucleus of the lateral lemniscus. (After Kuwada and Yin, 1987.)

The second class of IID sensitive neurons receive excitatory inputs from both ears. Thus, they are referred to as "EE" cells to reflect the excitatory influence from both ears. These neurons do not respond to monaural stimulation of either ear, but they do respond to binaural stimulation (Kitzes et al., 1980; Wise and Irvine, 1983; Yin et al., 1985). This phenomenon was referred to by Yin and Kuwada (1984) as binaural facilitation (BF). The relationship between the spike count and the IID for the neurons in the second class, different from that shown in Fig. 1.10, exhibits non-monotonic characteristics. Fig. 1.11 (after Kuwada and Yin, 1987) shows such a relationship for a neuron in the superior colliculus of the cat (Yin et al., 1985). The mean intensity of the stimuli at the two ears is 60 dB SPL. The stimulus frequency is 19 kHz.

Fig. 1.11 Responses of a superior colliculus neuron of cat to changes in IID. (After Kuwada and Yin, 1987.)

§1.6 Interaural Time Differences (ITDs)

The effects of the ITD on lateralization have been studied extensively over the past several decades. Different types of stimuli have been used in these studies. For sinusoidal stimuli, the perceived lateral position of the sound image is a periodic function of the ITD as the ITD is varied (Durlach and Colburn, 1978). The period of this function is equal to the period T of the sinusoidal stimuli. If the frequency of the stimuli is substantially above 1500 Hz, the perceived lateral position of the sound image no longer follows the change of the ITD in the stimuli. For low-frequency tones, as the ITD is increased from 0 to half the period of the tone, T/2, the perceived sound image moves from the center of the head towards the lead ear. When the ITD is in the vicinity of T/2, the sound image is no longer perceived as compact as it is when the ITD is 0, and it is difficult to tell where the image is. As the ITD is further increased from T/2 to T, the image can again be perceived clearly, and the position of the image returns to the center of the head from the lag ear. The maximum lateralization is reached at ITDs in the vicinity of T/2; thus, it depends on the frequency of the sinusoidal stimulus. When T/2 is larger than 700 μs, the maximum lateralization corresponds to full lateralization to the side.
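Because only the interaural phase matters for such tones, the ITD can be reduced to an equivalent value within one period. The following is a minimal sketch of this wrapping; it is illustrative and not a component of the model developed later.

```python
def effective_itd(itd_s, freq_hz):
    """Wrap an ITD into the interval [-T/2, T/2) for a tone of period T = 1/f.

    For sinusoidal stimuli only the ITD modulo the stimulus period is
    perceptually relevant, which is the periodicity described above.
    """
    T = 1.0 / freq_hz
    return (itd_s + T / 2.0) % T - T / 2.0

# A 500 Hz tone (T = 2 ms) presented with a 1.5 ms ITD behaves like a
# tone with a -0.5 ms ITD, i.e. it is heard toward the lagging ear:
print(effective_itd(1.5e-3, 500.0))       # -> -0.0005
```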
The periodic function between the perceived position of the sound image and the ITD is best illustrated via the binaural coherence curve (Durlach and Colburn, 1978). Fig. 1.12 shows some examples of such curves, which are derived from an experiment by Sayers and Cherry in 1957. In this experiment, the listener was asked to judge whether the image was to the left or right of the center of the head. The percentage of "left" responses with respect to the total number of stimulus presentation trials is plotted as a function of the ITD values tested. The positive ITDs in the plots correspond to the left ear leading. As we can see from the binaural coherence curves in Fig. 1.12, lateral perception of the sound image is a periodic function of the ITD, and the period of this function varies with the frequency of the sinusoidal stimuli.

Fig. 1.12 Binaural coherence curves for tones of different frequencies. P denotes the percentage of judgments "to the left," and positive τ corresponds to the left ear leading. (After Durlach and Colburn, 1978.)

The effects of the ITD on lateralization have also been studied using a method where the listener is asked to assign a position to the sound image on a linear scale (Sayers, 1964; Watson and Mittler, 1965; Yost, 1981). Fig. 1.13 from Yost and Hafter (1987) shows the results obtained by Yost (1981). This plot illustrates an approximately linear relationship between the perceived lateral position and the ITD. This relationship remains similar for tones with frequencies up to 1200 Hz. In the plot in Fig. 1.13 the ITD is measured in terms of interaural phase difference in the range from 0 to 180 degrees. This is because the relationship depicted in Fig. 1.13 is periodic with a period equal to the period of the stimuli, as demonstrated in Fig. 1.12.

Fig. 1.13 The perceived location of the pure tone sound image as a function of the interaural time difference (in terms of degrees of interaural phase difference). The perceived location is normalized with 0% corresponding to θ = 0° and 100% to θ = ±90°. (After Yost and Hafter, 1987.)
The effects of the ITD on the lateralization of the sound image have also been studied with broad-band noise stimuli (e.g. Blodgett et al., 1956). Results similar to those for pure tones have been obtained, except that the perception of the "side" of the sound image does not vary periodically with the ITD. Fig. 1.14 shows that, for a broad-band click stimulus, a relationship similar to that shown in Fig. 1.13 was observed between the perceived location of the sound image and the ITD (Blauert, 1983). Again, unlike the case for sinusoidal stimuli, this relationship is not periodic. The perception of a single image breaks down when the ITD is substantially larger than 1 ms (Durlach and Colburn, 1978).

Fig. 1.14 The perceived locations (calibrated as follows: 0 corresponds to θ = 0°, 5 to θ = 90°, and -5 to θ = -90°) of impulse sound images as a function of the interaural time difference. (After Blauert, 1983, Page 144.)

The dependence of the lateralization on the ITD for more complex stimuli has also been studied. Sayers (1964), Sayers and Cherry (1957), and Toole and Sayers (1965a) studied multiple-tone complexes. Sayers and Toole (1964) and Toole and Sayers (1965a, b) studied click trains. Cherry (1961), Cherry and Sayers (1956), Cherry and Taylor (1954), and Sayers and Cherry (1957) used speech signals. For speech, the lateralization judgments and coherence curves are similar to those for broad-band noise. For multiple-tone complexes and periodic click trains, multiple images may occur. Sophisticated listeners can often identify a variety of images and trajectories as the ITD is varied (Durlach and Colburn, 1978).

The sensitivity of the auditory system to small changes in the ITD for tones has been studied by Klumpp and Eady (1956), Zwislocki and Feldman (1956), Hershkowitz and Durlach (1969a), Domnitz (1973), Yost (1974), and Mills (1960). Fig. 1.15 shows data from a number of studies summarized by Yost and Hafter (1987). The data show that the binaural system is highly sensitive to changes in the ITD. The parameters in Fig. 1.15 are the ITDs around which the sensitivity of the auditory system to changes in ITD was measured. Thus, different parameters in Fig. 1.15 indicate different sensitivities of the auditory system when the sound image is perceived at different lateral positions. It is evident in Fig. 1.15 that the auditory system is more sensitive when the sound image is perceived to be in the center of the head than when the sound image is perceived to the side. Another observation is that the ITD threshold is approximately 2 degrees (in terms of interaural phase difference) at low frequencies, and the threshold increases as the frequency is increased. Above 1200 Hz, the binaural system is insensitive to changes in the ITD (Yost and Hafter, 1987).

Fig. 1.15 The ITD (in terms of degrees of interaural phase difference) threshold for perceived change as a function of frequency for tones. The parameters are the ITDs (again, in terms of degrees of interaural phase difference) of the referent tones which serve to mark positions in lateral space. (After Yost and Hafter, 1987.)
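The phase-expressed thresholds in Fig. 1.15 convert directly to time thresholds. A small worked conversion (illustrative only, not code from the thesis):

```python
def itd_jnd_from_phase_jnd(phase_jnd_deg, freq_hz):
    """Convert an interaural phase JND (degrees) to a time JND (seconds).

    delta_t = (delta_phi / 360) / f
    """
    return (phase_jnd_deg / 360.0) / freq_hz

# A 2-degree phase JND at 500 Hz corresponds to roughly 11 microseconds:
print(itd_jnd_from_phase_jnd(2.0, 500.0) * 1e6)   # ~11.1
```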
Above 1200 Hz, the binaural system is insensitive to changes in the ITD (Yost and Hafter, 1987).

The sensitivity of neurons from different sites along the auditory pathways to the ITD has also been studied by a number of researchers (Goldberg and Brown, 1969; Crow et al., 1978; Brugge et al., 1969; Brugge et al., 1970; Aitkin and Webster, 1972; Chan and Yin, 1984; Yin and Chan, 1990; Meissen et al., 1990; Kuwada and Yin, 1983; and Yin and Kuwada, 1983). Fig. 1.16 shows data from experiments by Yin and Kuwada (1983), demonstrating responses of two low-frequency neurons in the inferior colliculus of the cat to different ITD values in pure tone stimuli (after Kuwada and Yin, 1983). In their experiments, Kuwada and Yin (1983) used sinusoidal stimuli. Three types of delays were used, which are depicted in Fig. 1.16 (C, D, and E). The stimuli shown in Fig. 1.16C are labeled "phase compensated", reflecting the fact that, although there is an onset delay between the two stimuli at the two ears, there is no phase difference between the two tonal stimuli. The stimuli shown in Fig. 1.16D are labeled "delay curve", reflecting the fact that the entire waveform of the stimulus to one ear is delayed with respect to the stimulus to the other ear. The stimuli shown in Fig. 1.16E are labeled "phase curve", reflecting the fact that there is only a phase difference between the two stimuli, and no onset delay between them. Fig. 1.16A shows the responses of an inferior colliculus neuron to the two types of stimuli shown in Fig. 1.16C and D respectively. The neuron shows selective response to different delays in the stimuli shown in Fig. 1.16D, but not to the different onset delays in the stimuli shown in Fig. 1.16C. Fig. 1.16B shows responses of another inferior colliculus neuron to the two types of stimuli shown in Fig. 1.16D and E respectively. The neuron shows similar sensitivities to the phase differences in the two types of stimuli shown in Fig. 1.16D and E. The data in Fig. 1.16A and B indicate that, for tonal stimuli, the neurons are sensitive to the phase differences only.

Fig. 1.16 Responses (A and B) of two inferior colliculus neurons to changes in ITD, and waveforms (C, D, and E) of three types of binaural stimuli used in the stimulation of these two neurons. (After Kuwada and Yin, 1983.)

Consistent with the psychophysical observations for low-frequency sinusoids, neural sensitivity to the ITD decreases beyond certain frequencies. For the cat, this sensitivity is restricted to frequencies below 3 kHz, which is about the upper limit for phase-locking (i.e. firing at a particular phase in the periods of the sinusoidal waveform) of the cat's auditory nerve fibers (Kuwada and Yin, 1987). This observation suggests that the neural mechanism for ITD sensitivity is closely related to the phase-locking of the auditory nerve fibers.

§1.7 Free Field Auditory Localization

While dichotic stimulation is useful in the study of the effects of individual interaural parameters on the sensitivity of the auditory system, free field stimulation is important for studying the overall performance of the auditory system in localization.
In 1936, Stevens and Newman measured a listener's ability to localize pure tone stimuli in free field environments. A major finding of their study is that a listener's ability to localize pure tones in the frequency range below 1 kHz or above 4 kHz is better than the listener's ability to localize pure tones in the range between 1 kHz and 4 kHz. They concluded that in the mid-frequency region, the listener could not effectively use either the ITD or the IID cue because neither was particularly salient in this region. Another influential free field experiment was done by Mills in 1958, when he measured the smallest change that a listener could reliably report in the azimuth angle of the sound source. This smallest change in the azimuth angle is referred to as the minimum audible angle (MAA) in azimuth (Mills, 1958).

Fig. 1.17 shows the results of Mills' experiment (1958), where the MAA is plotted as a function of the frequency of the tone at different azimuth positions. These data show that the MAA in azimuth can be as small as 1 degree. The different curves in Fig. 1.17 correspond to different angles of the source with respect to the median plane at which the MAA is measured. As we can see from Fig. 1.17, the spatial resolution of the auditory system is frequency dependent. In the middle frequency range of 1-4 kHz, the auditory system shows poorer spatial sensitivity, which is consistent with the observation by Stevens and Newman (1936) cited above. Also evident from Fig. 1.17 is that as the source moves away from the median plane, the MAA increases, and the highest spatial resolution is observed when the sound is in front of the listener.

Fig. 1.17 Angle of just noticeable difference (JND) Δθ for tone bursts as a function of the tone frequency f (200-10,000 Hz) and the angle θ between the sound source and the median plane. (After Durlach and Colburn, 1978.)

A comparison (Fig. 1.18) made by Mills (1960, 1972) of the sensitivities of the auditory system to the ITD, IID, and spatial location shows that the curve for the just noticeable difference in ITD as a function of the stimulus frequency (PHASE JND in Fig. 1.18) and the curve for the actual phase change that occurs when the sound source is moved a just noticeable angle coincide in the frequency region below 1500 Hz. Similarly, the curve for the just noticeable difference in the IID (AMPLITUDE JND in Fig. 1.18) and the curve for the actual amplitude change that occurs when the sound source is moved a just noticeable angle coincide with each other in the frequency region between 1500 and 6000 Hz. These findings suggest that the localization of tones is determined by the ITD sensitivity of the auditory system in the frequency region below 1500 Hz, and by the IID sensitivity in the region of 1500-6000 Hz.

Fig. 1.18 Comparison made by Mills (1960, 1972) of the interaural phase JND and the interaural amplitude JND for ITD = 0 and IID = 0 with the changes in ITD Δt and in IID Δα that occur when an actual source is moved a just noticeable angle from the median plane. (After Durlach and Colburn, 1978.)

Free field stimulation is also important in the study of the localization cues other than the ITD and IID, especially in the study of how localization in elevation is made possible.
Many studies investigated the fundamental significance of the filtering effects of the outer ear (including the head, shoulder, pinna, and ear canal), which imposes systematic direction-dependent changes on the spectrum of the incoming sound (Blauert, 1983; Butler et al., 1990; Middlebrooks, 1992; Wenzel et al., 1993; Zakarauskas and Cynader, 1993; Musicant and Butler, 1982, 1984a, 1984b; Musicant et al., 1990; Neti et al., 1992; Wightman et al., 1987). A notable effect of the filtering characteristics of the outer ear is that notches and peaks are introduced to an originally flat sound spectrum. Such spectral notches and peaks prove to be perceptually important (Shu et al., 1993), and may be used by the auditory system to localize sound sources in both azimuth and elevation (Neti et al., 1992).

Free field stimuli have also been used in physiological studies of auditory localization. A notable finding is the evidence for the existence of a neural map of spatial location in the central auditory system of the barn owl (Knudsen and Konishi, 1978; Moiseff and Konishi, 1981). Maps of auditory space have been found in the deep and intermediate layers of the superior colliculus of the guinea pig (Palmer and King, 1982; King and Palmer, 1983) and in the auditory system of the cat (Middlebrooks and Knudsen, 1984). Other free-field studies of neural sensitivities include that by Rajan et al. (1990a,b), who studied the azimuthal sensitivity of the neurons in the primary auditory cortex of the cat.

§1.8 Interaural Envelope Delay

As reviewed in Section 1.6, the auditory system is insensitive to the ITDs in high-frequency sinusoidal stimuli. Also, as shown in Section 1.7, auditory localization of tones is determined by the IID sensitivity of the auditory system in the frequency region of 1500-6000 Hz. These observations have been interpreted as evidence supporting the duplex theory of Rayleigh (1907). However, experiments with more complex stimuli (such as amplitude-modulated high-frequency tones) reveal that the interaural time difference can be important for the localization of high-frequency stimuli under certain conditions. Specifically, the lateralization of high-frequency signals is influenced by the interaural time delay if the high-frequency signals have low-frequency envelopes.

Envelope delay sensitivity has been demonstrated for high-frequency noise and clicks, as well as amplitude-modulated high-frequency tones, by a number of researchers (Bekesy, 1960; David et al., 1958, 1959; Harris, 1960; Henning, 1974a; Klumpp and Eady, 1956; Leakey et al., 1958; Yost et al., 1971). Experiments by Leakey et al. (1958) and David et al. (1959) demonstrated that the lateralization of such stimuli depended principally on the interaural relationship of the envelopes, not the microstructures of the stimuli. This observation was further confirmed by more recent studies by Henning (1980), Henning and Ashton (1981), McFadden and Pasanen (1976), and Nuetzel and Hafter (1976, 1981). Furthermore, according to Henning (1974a,b), the sensitivity to the ITD for a sinusoidally amplitude-modulated high-frequency tone can be as high as that to the ITD for a pure tone with a frequency equal to the modulation frequency of the high-frequency tone.
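The stimuli used in such studies are easy to construct. The following Python fragment (a minimal sketch of ours, not taken from any of the cited studies; the sample rate and frequencies are illustrative) builds a sinusoidally amplitude-modulated high-frequency tone pair in which only the envelope is delayed between the ears, isolating the envelope cue from the carrier fine structure:

```python
# Minimal sketch (ours): a SAM high-frequency tone whose envelope, but
# not its carrier fine structure, is delayed at one ear, isolating the
# interaural envelope delay (IED) cue.
import numpy as np

fs = 48000.0             # sample rate (Hz), illustrative
fc, fm = 4000.0, 300.0   # carrier and modulation frequencies (Hz)
ied = 500e-6             # interaural envelope delay (s)

t = np.arange(0, 0.1, 1.0 / fs)
left = (1 + np.cos(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)
# Right ear: identical carrier phase, envelope shifted by the IED.
right = (1 + np.cos(2 * np.pi * fm * (t - ied))) * np.sin(2 * np.pi * fc * t)
```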
Results from experiments by Henning (1974a,b), Licklider and Webster (1950), and Yost et al. (1971) strongly suggest that the timing information carried solely by the auditory nerve fibers with high characteristic frequencies can be used to discriminate the ITD in high-frequency signals. Indeed, experiments by Yin et al. (1984), who used amplitude-modulated tones, provided physiological evidence for the above cited psychophysical observations. They found that some neurons with high characteristic frequencies in the inferior colliculus were sensitive to the interaural envelope delay. Moller (1974) found that high-frequency cochlear nucleus neurons could phase-lock to the envelopes of amplitude-modulated signals with high-frequency carriers. Kuwada and Yin (1987) have suggested that high-frequency neurons can be sensitive to the interaural envelope delay in a manner similar to low-frequency neurons being sensitive to the ITD in the microstructures of low-frequency stimuli.

§1.9 Models of Auditory Localization

As reviewed in the previous sections, a large amount of experimental work has been devoted to the study of auditory localization for more than a century. In an effort to explain the experimental data in terms of models or theories, modeling studies of auditory localization have paralleled the experimental work for many decades.

Some early models based their ideas on a model proposed by Bekesy in 1930 (Bekesy, 1960; Colburn and Durlach, 1978). In his model, Bekesy assumed a population of neurons that were innervated by fibers from both ears. Neurons in the population can become "tuned" to one of two excited states according to the source of the excitation. Specifically, a neuron is tuned left if its excitation comes from fibers originating from the left ear, or is tuned right if the excitation comes from fibers originating from the right ear. The lateralization of a stimulus is then determined by a comparison of the number of neurons tuned left with that of neurons tuned right.

In 1958, Matzker proposed a model which was essentially consistent with Bekesy's ideas, but was more plausible with respect to anatomical and physiological observations. The population of Bekesy's tunable cells is replaced with two symmetric auditory pathways, each of which is excited by stimulation of one ear. The lateralization of a stimulus is determined by a comparison between activity levels at some relatively central nuclei which receive information from the two pathways. Matzker (1958) also assumed that there were contralateral inhibitory pathways which block the excitatory pathways for a few milliseconds. No detailed assumption about the structure of the pathways was given in the model. Thus, the model describes only qualitatively how sound localization may be accomplished by the auditory system.

Bekesy's ideas were further elaborated by van Bergeijk in 1962. In his model, van Bergeijk assumed that the binaural interaction occurs at a pair of relatively peripheral nuclei. Lateralization of a stimulus is assumed to be determined by a comparison of the number of neural firings in the left nucleus with that in the right nucleus. In this model, the ipsilateral inputs to each nucleus are assumed to be inhibitory and the contralateral inputs excitatory. The image of the sound stimulus is assumed to be lateralized to the side opposite to the nucleus with the greater number of firings (Colburn and Durlach, 1978).
Fig. 1.19 The coincidence network proposed by Jeffress in 1948 for the localization of low-frequency tones. (After Colburn and Durlach, 1978.)

Another group of models (e.g. Licklider, 1959; Sayers and Cherry, 1957; Colburn, 1973, 1977) have their root in the ideas suggested by Jeffress in 1948. In his model, Jeffress described a neural network for the estimation of interaural time differences. This network is often referred to as the coincidence network, and this model as the coincidence model. Fig. 1.19 shows a schematic diagram of the coincidence network. Binaural neurons depicted in the diagram receive delayed neural signals via auditory fibers from both ears. The response of a binaural neuron is assumed to be maximum when the two excitations from the two ears coincide in time. Thus, a particular binaural neuron is tuned to respond maximally to a particular ITD depending on the delays the nerve fibers introduce before the excitation signals reach the binaural neuron. By systematically varying the delays in the nerve fibers leading to different binaural neurons, a neural map of different ITD values can be implemented in the network.

§1.10 Cross-correlation Based Models

Jeffress' model (1948) discussed in the last section is mathematically the same as cross-correlating the two signals detected by the two ears (Colburn and Durlach, 1978). Many other models may also be viewed as variations of a cross-correlation mechanism (Yost and Hafter, 1987; Colburn and Durlach, 1978). The basic idea of these models is that the cross-correlation function, defined as

$$R(\tau)=\int x_R(t)\,x_L(t-\tau)\,dt \qquad (1.1)$$

has a maximum at $\tau=\tau^*$ if $x_R(t)$ and $x_L(t)$ are identical except for a delay $\tau^*$, where $x_R(t)$ and $x_L(t)$ are the input signals to the right and left ears, respectively.

The first quantitative model based on the idea of the cross-correlation mechanism was proposed by Sayers and Cherry in 1957 to model the binaural phenomena of fusion and lateralization. Fig. 1.20 from Colburn and Durlach (1978) shows a block diagram of this model. In this model, the interaural cross-correlation function in Eq. (1.1) is modified by two weighting factors. One factor limits the influence of larger interaural delays and the other reflects the influence of interaural intensity differences. The output of the model is a judgment of whether the sound image is lateralized to the left or to the right. This arrangement is convenient because the model was used to fit the data from psychophysical experiments where listeners were asked to make such judgments.

A multiple-channel version of this model was given by Sayers (1964) and Toole and Sayers (1965b). In this later version, the spectra of the input signals are first obtained via two banks of band-pass filters. Pairs of outputs from two band-pass filters (one from each ear) with the same center frequency are used as the inputs to the cross-correlation device depicted in Fig. 1.20.

Fig. 1.20 A block diagram of the cross-correlation model of Sayers and Cherry (1957). The exponential delay-weighting term in the figure corresponds to the first weighting factor, and the terms A_L and A_R to the second weighting factor discussed in the main text. The term Av(t) refers to temporal averaging. (After Colburn and Durlach, 1978.)
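As a concrete illustration of Eq. (1.1), the following Python sketch (ours, not part of the thesis; the signal parameters are illustrative) estimates the ITD of a binaural noise pair as the lag that maximizes the interaural cross-correlation:

```python
# Minimal sketch (ours) of the basic mechanism in Eq. (1.1): the ITD is
# read off as the lag of the peak of the interaural cross-correlation.
import numpy as np

def itd_from_xcorr(x_left, x_right, fs, max_lag_s=1e-3):
    """Return the ITD (s) maximizing R(tau) over physiological lags."""
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    # R(tau) = sum_t xR(t) * xL(t - tau)
    r = [np.dot(np.roll(x_left, lag)[max_lag:-max_lag],
                x_right[max_lag:-max_lag]) for lag in lags]
    return lags[int(np.argmax(r))] / fs

fs = 48000
rng = np.random.default_rng(0)
src = rng.standard_normal(int(0.05 * fs))   # broad-band noise source
x_right = src
x_left = np.roll(src, int(500e-6 * fs))     # left ear lags by 500 us
print(itd_from_xcorr(x_left, x_right, fs))  # -> approximately -5e-4 s
```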
The idea of interaural cross-correlation has also been used in models for data from binaural signal detection experiments (Jeffress et al., 1952; Robinson and Jeffress, 1963; Dolan and Robinson, 1967; McFadden, 1968; Levitt and Lundry, 1966b; Osman, 1971). In such models, a decision variable is usually devised with the aim that the decision made by the model matches that made by the listeners in the binaural detection experiments.

In a series of papers, Colburn and his coworkers (Colburn, 1973, 1977; Colburn and Latimer, 1978) developed a model for binaural interaction based on an explicit quantitative description of physiological observations of auditory nerve fiber firing patterns. The description of the auditory nerve activity in the model is based on the physiological observations of the auditory nerve of the cat made by Kiang and his associates (Kiang, 1968; Kiang et al., 1965), and on the mathematical modeling work of Siebert (1965, 1968, 1970). The activity of the fibers is described in terms of statistically independent, non-homogeneous Poisson random processes. For a 500-Hz tonal stimulus described as

$$s(t)=A\cos[2\pi\,500\,(t-\tau)] \qquad (1.2)$$

the rate function $r_m(t)$, which was used to describe the instantaneous probability of firing of the synchronized fiber, was assumed to be of the form

$$r_m(t)=A_m\exp\{\delta_m\cos[2\pi\,500\,(t-\tau)]\} \qquad (1.3)$$

Fig. 1.21 shows a general block diagram used by Colburn and Durlach (1978) to present the basic ideas of Colburn's model (Colburn, 1973, 1977). In the model, it is assumed that an ideal decision maker uses an overall decision variable that is a combination of three decision variables: a purely monaural variable for each ear and a binaural variable that carries the interaural timing information. Central to this model is a binaural displayer (Fig. 1.21b), which can be considered a quantification and elaboration of Jeffress' coincidence network (1948). The outputs of the coincidence counter in Fig. 1.21b, $L_m(f_m,\tau_m)$, $m=1,\ldots,M$, are approximated by the following integral:

$$E[L_m(f_m,\tau_m)]=T_w\int_0^{T_s} r_{mL}(t-\tau_m)\,r_{mR}(t)\,dt \qquad (1.4)$$

for $m=1,\ldots,M$, where $T_w$ is the time window for coincidence of each fiber pair, $T_s$ is the duration of the stimulus, and $r_{mL}(t)$ and $r_{mR}(t)$ are the rate functions describing the instantaneous firing probability for the corresponding left and right fibers, respectively. The integral in Eq. (1.4) is an estimate of the cross-correlation function of the instantaneous firing rates of the corresponding fiber pairs.

Fig. 1.21 A block diagram (parts a and b) of the auditory nerve based model by Colburn (1973, 1977). (After Colburn and Durlach, 1978.)

Many of the models that use the idea of interaural cross-correlation are essentially equivalent in terms of their operations on the acoustic input waveforms (Colburn and Durlach, 1978). They differ mainly in the detailed implementation of the interaural correlation and in the detailed construction of the decision variables used for the prediction of psychophysical observations. These models are mainly designed to fit data from a large number of psychophysical experiments which measure the sensitivities of the auditory system to such parameters of the binaural stimuli as the ITD, IID, and IED. The models do not offer new ideas as to how the ITD cue is processed other than the coincidence mechanism first proposed in 1948 by Jeffress.
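A numerical reading of Eq. (1.4) makes the coincidence-counter idea concrete. The following Python sketch (ours, not from the thesis; the value of δ and the delay grid are illustrative) evaluates the expected count for a range of internal delays using the exponential-cosine rate functions of Eq. (1.3), showing that the count peaks at the internal delay matching the stimulus ITD:

```python
# Minimal sketch (ours) of the coincidence-counter approximation in
# Eq. (1.4): the expected count is the correlation of the left and right
# rate functions, here the exponential-cosine rates of Eq. (1.3).
import numpy as np

f0, fs = 500.0, 100000.0
t = np.arange(0, 0.2, 1 / fs)
itd = 400e-6                     # ITD imposed on the right-ear stimulus (s)

def rate(tau, delta=2.0):
    """Eq. (1.3)-style rate function for a fiber driven by a delayed tone."""
    return np.exp(delta * np.cos(2 * np.pi * f0 * (t - tau)))

r_left, r_right = rate(0.0), rate(itd)
internal_delays = np.linspace(-1e-3, 1e-3, 81)
counts = [np.trapz(np.interp(t - d, t, r_left) * r_right, t)
          for d in internal_delays]
best = internal_delays[int(np.argmax(counts))]
print(f"coincidence count peaks at internal delay {best * 1e6:.0f} us")
# -> approximately +400 us, the imposed ITD
```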
Jeffress' coincidence network makes use of neural delay lines that have systematic delay values. Shamma et al. (1989) proposed an alternative network, referred to as the stereausis binaural network, for ITD estimation without the use of neural delay lines. They argued that it was not necessary to explicitly make use of neural delay lines in order to estimate the ITD. Specifically, they noted that the traveling wave of the motion of the basilar membrane provides timing information that is sufficient for the estimation of the ITD. The phase of the waveform describing the motion of the basilar membrane at a particular point is delayed with respect to the phase of the waveform at another point that is nearer to the base of the basilar membrane. Thus, the correlation of the signals originating from different points on the basilar membrane can be used as the mechanism for ITD estimation. Fig. 1.22 shows a schematic diagram of this model. A matrix of binaural neurons is assumed to receive excitatory inputs from both ears through the auditory nerve fibers. Each neuron in the matrix receives neural signals from two, and only two, fibers (one from each ear), generating a measure of the correlation between the instantaneous activities of the two fibers. Due to the phase difference of the neural signals originating from fibers with different characteristic frequencies, a binaural neuron in the matrix shown in Fig. 1.22 responds maximally when the phase difference between the neural signals at the input of this neuron matches the interaural time delay between the stimuli at the two ears. The idea of cross-correlating signals from two auditory nerve fibers with different characteristic frequencies in order to estimate the ITD was also suggested earlier by Schroeder (1977) and Loeb et al. (1983).

Fig. 1.22 Schematic diagram of the stereausis binaural network suggested by Shamma et al. (1989). c_ij denotes the output of the (i, j)-th neuron in the matrix, which receives inputs from the ipsilateral fiber X_i and the contralateral fiber Y_j.

In terms of operations on, or of processing of, the neural signals in the auditory nerve fibers, the model of Shamma et al. (1989) is the same as that of Jeffress (1948). Binaural neurons in both models generate a measure of the cross-correlation of the two input signals. The uniqueness of the stereausis model, however, lies in the fact that the cross-correlation measure of the stereausis neuron also reflects the IID information in addition to the ITD information. In Jeffress' model, a separate mechanism for the IID sensitivity has to be assumed. A weakness of the stereausis model, however, is the difficulty in evaluating the model quantitatively. It is difficult to calibrate the ITD and the IID sensitivities of the network from the activity pattern of the binaural neuron matrix shown in Fig. 1.22.

In 1994 Bonham further extended the ideas of Shamma et al. (1989) in his Ph.D. thesis by allowing multiple fibers with different characteristic frequencies from both basilar membranes to innervate a single binaural neuron. Bonham (1994) showed that a weighted sum of the neural signals from multiple frequency bands from one ear could give rise to a waveform that was similar to the neural signals of single auditory nerve fibers but had a different phase delay.
Different sets of weights, which can be obtained using a Hebbian-type learning rule (the connection weight between two neurons is increased if they fire together), result in different phase delays in the combined signals. Such combined signals from both ears can thus be correlated to generate ITD estimates. This model still obtains the ITD sensitivity by means of cross-correlating two temporal waveforms. The only difference of this model from the stereausis model of Shamma et al. (1989) and the coincidence model of Jeffress (1948) lies in the origin of the temporal signals to be cross-correlated: in Jeffress' model the signals are delayed versions of the signals of single auditory nerve fibers; in the stereausis model the signals come directly from the auditory nerve fibers without any delay; and in Bonham's model the signals are weighted sums of neural signals from different auditory nerve fibers.

When applying the cross-correlation based models to complex stimuli, a common approach is to apply the cross-correlation mechanism in different frequency bands, and the results of individual bands are combined across the relevant frequency range (Sayers, 1964; Toole and Sayers, 1965b). In the more recent cross-correlation based models cited above (such as those of Colburn, 1973, 1977, Shamma et al., 1989, and Bonham, 1994), this multiple frequency channel scheme is inherent in the sense that the cross-correlation is carried out explicitly over neural signals of auditory nerve fibers. In these models, however, the emphasis is given to how the ITD and IID can be measured for individual frequency channels. In 1988, Stern et al. proposed a model that addresses explicitly how the individual frequency channels could be combined and weighted in order to model lateralization data for band-pass stimuli. They proposed a weighting strategy to predict the lateralization of low-frequency band-pass stimuli based on the outputs of multiple cross-correlation channels, which we refer to as the multi-channel cross-correlation function. Fig. 1.23 shows an example of the multi-channel cross-correlation function. The solid lines in Fig. 1.23 indicate the locations of the peaks of the cross-correlation function across frequency space. The predicted lateralization of the stimuli in terms of the interaural delay is a weighted average of the τ (ITD) coordinates of the solid lines. The weighting is done in such a way that if a solid line is straighter and closer to τ = 0, the corresponding τ values are given greater weights.

Fig. 1.23 Location of the peaks of the multi-channel cross-correlation function for broad-band noise with an ITD of 1500 μs. The vertical axis indicates the center frequency of the different band-pass filter channels, while the horizontal axis indicates the argument τ (ITD, −4000 to 4000 μs) of the cross-correlation function. (After Stern et al., 1988.)

The idea of weighting the multi-channel cross-correlation function has also been used by Shackleton et al. (1992). The weighting is carried out on the entire cross-correlation function with an implicit assumption that there is only one sound source in the acoustic environment. Thus, the model cannot deal with multi-source environments.
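A multi-channel cross-correlation function of the kind shown in Fig. 1.23 is straightforward to compute. The following Python sketch (ours, not from any of the cited models; the Butterworth band-pass filters merely stand in for a cochlear filter bank, and all parameters are illustrative) produces one cross-correlation row per frequency channel:

```python
# Minimal sketch (ours) of a multi-channel cross-correlogram as in
# Fig. 1.23: band-pass filter each ear's signal, then cross-correlate
# the filter outputs band by band.
import numpy as np
from scipy.signal import butter, filtfilt

def correlogram(x_left, x_right, fs, centers_hz, max_lag_s=2e-3):
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    rows = []
    for fc in centers_hz:
        band = [0.8 * fc / (fs / 2), 1.2 * fc / (fs / 2)]
        b, a = butter(2, band, btype="band")
        l, r = filtfilt(b, a, x_left), filtfilt(b, a, x_right)
        rows.append([np.dot(np.roll(l, k)[max_lag:-max_lag],
                            r[max_lag:-max_lag]) for k in lags])
    # One row per channel; ridges of peaks across rows correspond to
    # the solid lines of Fig. 1.23.
    return np.array(rows), lags / fs
```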
The idea of cross-correlation function weighting has been taken a step further by Arad et al. (1994) to address the problem of multiple sound source localization. Unlike the model by Stern et al. (1988), the weighting is done over selected frequency channels, and short-time cross-correlation functions are used. An overall short-time cross-correlation function is obtained by integrating, across frequency, the short-time cross-correlation functions of the selected channels. A series of such overall short-time correlation functions is obtained over a certain listening period. This series of correlation functions is then either averaged over time, or used to form a two-dimensional image where one dimension corresponds to the ITD and the other dimension is time. The values of the short-time correlation function are translated into gray-scale values in the image. Traces of peaks in the image along the temporal dimension indicate the presence of the sound sources. The ITD values corresponding to such traces of peaks in the image are the ITD estimates of the corresponding sources. Edge detection algorithms were used to locate such traces of peaks in the image.

In this model, multiple sources with fluctuating intensities are localized by means of estimating short-time cross-correlation functions. Arad et al. (1994) tested their model in both the one- and two-speaker cases. When there are multiple sound sources in the acoustic environment, short-time cross-correlation functions give rise to noisy and faulty peaks. Arad et al. (1994) reported that, in order to obtain reliable estimates of the cross-correlation functions, several (4-12) short-time (3-12 ms) correlation function estimates have to be averaged. Furthermore, in the two-speaker case, biased ITD estimates were obtained.

The idea of short-time cross-correlation is also used by Lyon (1983) in a model for binaural localization and source separation. In this model, short-time ITD estimates are first obtained via short-time cross-correlation. Frequency- and time-dependent weights are then chosen to weight the outputs of the left and right cochlear models according to the short-time ITD estimates. The final output of Lyon's model (1983) is a spectro-temporal representation of the sound stimuli based on the weighted outputs of the cochlear models. Such spectro-temporal representations are used to visualize sound stimuli. The model can display separate images for different sound sources, but the separation is limited to left from right or vice versa.

Another direction for the extension of the basic cross-correlation model (Sayers and Cherry, 1957) is the introduction of a lateral inhibition mechanism and monaural channels into the model. Lindemann (1986a, b) proposed a model in which correlation at one delay value inhibits correlation at other delay values. This produces the effect of sharpened cross-correlation functions. The model also provides a mechanism for the precedence effect. (The precedence effect is observed when two binaural sounds are presented with a brief time interval between them. The two sounds may be perceived as a single auditory event, and when this happens, the perceived lateralization of the single auditory event is determined by the directional cues carried by the first sound.)
The centroid of the laterally inhibited cross-correlation function is taken to be the predicted lateral position of the sound image. An interesting aspect of Lindemann's model is that the lateral inhibition mechanism allows the IID information to be processed as well as the ITD information, because the level of inhibition is assumed to be intensity dependent.

An interesting extension to Lindemann's (1986a, b) model was made by Gaik (1993), who imposed attenuation factors on the inhibited cross-correlation function defined in Lindemann's model. The attenuation factors are used to ensure that, for natural ITD-IID combinations, the intensities of the signals being correlated are equal when maximum correlation is reached. Natural ITD-IID combinations are the corresponding ITD and IID value pairs produced by free-field impulsive sound sources. Thus, the modified (with both attenuation and lateral inhibition) cross-correlation function has a single peak for natural ITD-IID combinations. Multiple peaks may occur for unnatural combinations, signaling the presence of multiple sound sources that can be identified by separate ITD and IID cues.

§1.11 Models of Cue Sensitive Neurons

Modeling studies have been closely coupled with experimental studies of neural sensitivities of the auditory system (Goldberg and Brown, 1969; Yin and Chan, 1990). The joint effort of theoretical modeling and experimental recording of neural activities in response to binaural localization cues has been aimed at illuminating the neural mechanism for auditory localization.

As reviewed in Section 1.3, the superior olivary complex is the first site in the auditory system that exhibits binaural interaction. In fact, it has been identified as the primary site of binaural processing (Kuwada and Yin, 1987; Goldberg and Brown, 1969; Yin and Chan, 1990), and most models of cue sensitive neurons are built with neurons in this complex in mind. There are two nuclei in the superior olivary complex that are thought to be involved in sound localization. In the lateral superior olive (LSO), high-frequency (>3 kHz) neurons are sensitive to IIDs (Boudreau and Tsuchitani, 1968; Caird and Klucke, 1983). In the medial superior olive (MSO), on the other hand, the majority of the neurons have low characteristic frequencies (<3 kHz), and these neurons are sensitive to ITDs (Goldberg and Brown, 1969; Yin and Chan, 1990; Guinan et al., 1972).

The hypothetical coincidence detector neurons in Jeffress' ITD network (1948) have been taken as the basic model for the ITD sensitive neurons in the MSO. Goldberg and Brown (1969) and Yin and Chan (1990) showed that some neurons in the MSO exhibited properties predicted by the coincidence detector in Jeffress' model (1948). Most other models are elaborations and refinements of the Jeffress model.

In 1990, Colburn et al. presented such a refined coincidence model of observed MSO responses. This model is based on an internal variable that is identified as the model membrane potential. There are two inputs to the model, one from each ear. The inputs are modeled as filtered sequences of pulses distributed as Poisson processes. The membrane potential is increased by a certain amount whenever there is an input pulse from either input. The modeled neuron outputs a firing pulse whenever the membrane potential reaches a threshold. The behavior of the model was compared with physiological data obtained by Goldberg and Brown (1969) and Yin and Chan (1990).
Good agreement between the data and the model was reported by Colburn et al. (1990).

The model of Colburn et al. (1990) was further refined by Han and Colburn (1993), who replaced the previous, more functional model of the membrane potential with a model neuron that has four conductance channels. One of the four channels responds to excitatory inputs, another responds to inhibitory inputs, a third represents the delayed potassium channel that opens in response to output action potentials, and the last channel represents the constant, residual conductance of the cell membrane (MacGregor, 1987). The model showed similar behavior to the previous model of Colburn et al. (1990).

The response properties of MSO neurons have also been modeled using the maximum likelihood (ML) estimation technique. By modeling the inputs of the MSO neurons as Poisson processes, Dabak and Johnson (1992) investigated what an ML estimator would do to obtain an ITD estimate, and then compared the input-output characteristics of the ML estimator with those of the MSO neurons. They found that an implementation of the ML estimator has a structure similar to that of Jeffress' coincidence detector model (1948), but the ML estimator differs from Jeffress' model in that it processes its inputs in a much more complicated fashion. A detailed implementation of this more complicated processing is not given in their paper.

Although the models just described do not contain any inhibitory input, evidence of inhibitory inputs to the MSO neurons has been found by Grothe and Sanes (1993), Adams and Mugnaini (1990), Schwartz (1992), and Cant and Hyson (1992). Few studies, however, have explored the roles of inhibition. Inhibition was seen in a model by Sujaku et al. (1981), which describes the ITD sensitivity of neurons in the inferior colliculus. The structure of the model is shown in Fig. 1.24. There are two inputs to the model, one from each ear, which make excitatory synaptic connections onto the modeled binaural cell. In this respect, the model is similar to a model of the type described above, i.e. a coincidence detector model. The uniqueness of the model, however, is the addition of the collateral presynaptic inhibition from each input. In other words, the input from one side decreases the effect of the input from the other side on the model neuron. D1 through D4 in Fig. 1.24 are four independent delays introduced in the four branches of the inputs. The model was shown to be able to simulate the observed neural responses in the inferior colliculus of the cat. The model was also shown to be able to simulate neurons sensitive to the direction of ITD changes.

Fig. 1.24 Schematic diagram of the model by Sujaku et al. (1981) for ITD sensitive inferior colliculus neurons, with contralateral and ipsilateral inputs, time delays D1 through D4, and excitatory and inhibitory synapses.

Another model of ITD sensitive IC neurons that incorporated both excitatory and inhibitory inputs is proposed by Colburn and Ibrahim (1993).
As mentioned in the beginning of this section, high-frequency neurons in the LSO are sensitive to the IIDs (Boudreau and Tsuchitani, 1968; Caird and Klucke, 1983). These neurons are excited by the ipsilateral input and inhibited by the contralateral input. The thresholds and tuning of the ipsilateral excitatory effect and those of the contralateral inhibitory effect are often comparable. This excitatory-inhibitory (EI) characteristic provides a neural mechanism for the IID sensitivity (Caird and Klucke, 1983; Pickles, 1988). Guinan et al. (1972) suggested a model for the IID sensitive LSO neurons. This model was later studied by Colburn and Moss (1981). The model neuron receives two inputs, one from each ear, and the neuron has a membrane potential which is characterized as the internal variable of the model. The input from each ear is a pulse train modeled as a Poisson process. An input pulse from the ipsilateral side causes a depolarization pulse to be added to the membrane potential, while an input pulse from the contralateral side causes a hyperpolarization pulse to be added to the membrane potential. The model neuron fires an output whenever the depolarization exceeds a threshold. Colburn and Moss (1981) were able to demonstrate that the overall pattern of the response of the modeled LSO neuron was similar to those measured in the LSO by Boudreau and Tsuchitani (1968). The model neuron showed sensitivity to both the IID and the overall intensity of the stimuli.

Johnson et al. (1990) developed a maximum likelihood (ML) estimator which was used to model the response behavior of the IID sensitive LSO neurons. The ML estimator takes as its inputs the input signals to the LSO neurons. Such input signals are modeled as Poisson processes. The intensity functions of these Poisson processes are formulated as functions of three variables: one is the lateral angle θ of the perceived sound image of the stimuli, and the other two are the intensities of the sound stimuli at the two ears. The output of the ML estimator is an estimate of the lateral angle θ. Johnson et al. (1990) found that the ML estimate of θ was based on estimation of the IID. In other words, the IID sensitive neurons can be thought of as part of an implementation of the ML estimator.

While some researchers have been concerned, as in the models just reviewed, with the response properties of individual LSO neurons, others have been concerned with the organization of the LSO neurons that have functional roles in sound localization. Reed and Blum (1990) have presented a structural model for the encoding of the azimuth angle by a hypothetical column of neurons in the LSO. The focus of the model is on the connections between the LSO neurons in the column and their inputs from the anteroventral cochlear nucleus (AVCN) and the medial nucleus of the trapezoid body (MNTB). Fig. 1.25 shows a schematic diagram of these connections. A basic assumption made in the model is that the thresholds of the AVCN neurons increase monotonically from the AVCN neurons connecting to one end of the LSO column to those connecting to the other end of the column. The thresholds of the MNTB neurons (which make inhibitory connections with the LSO column), however, decrease monotonically in the same direction along the LSO column. This arrangement results in the coding of the IIDs by the position of the neuron in the LSO column whose firing first goes to zero. The encoding of the IIDs in the model was demonstrated to be independent of the absolute sound level, and to vary linearly with the IID.

Fig. 1.25 Schematic diagram of the connections from the AVCN and MNTB to the LSO column in the model by Reed and Blum (1990).
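The EI mechanism studied by Colburn and Moss (1981) lends itself to a compact simulation. The sketch below is ours, not their implementation; the rates, decay constant, and threshold are illustrative stand-ins:

```python
# Minimal sketch (ours) of an excitatory-inhibitory (EI) LSO cell:
# ipsilateral Poisson pulses depolarize and contralateral pulses
# hyperpolarize a leaky membrane potential; the firing rate falls as
# the contralateral level rises, i.e. as the IID favors the far ear.
import numpy as np

def ei_rate(rate_ipsi, rate_contra, dur=0.5, dt=1e-4,
            tau=1e-3, bump=0.5, threshold=0.8, seed=0):
    rng = np.random.default_rng(seed)
    v, spikes = 0.0, 0
    for _ in range(int(dur / dt)):
        v *= np.exp(-dt / tau)                          # passive decay
        v += bump * (rng.random() < rate_ipsi * dt)     # excitation
        v -= bump * (rng.random() < rate_contra * dt)   # inhibition
        if v >= threshold:
            spikes += 1
            v = 0.0                                     # reset after firing
    return spikes / dur

# Sweep the contralateral drive at a fixed ipsilateral rate: the output
# rate decreases (on average) with increasing contralateral level.
for r_contra in (0.0, 200.0, 400.0, 800.0):
    print(r_contra, ei_rate(400.0, r_contra))
```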
In fact, those superior colliculus neuronsreceive excitatory inputs from both ears, as does the lTD sensitive neurons observedin both the medial superior olive and the superior colliculus (Goldberg and Brown,1969; Yin and Chan, 1986; Kuwada et a!. 1984). Yin et al. (1985) tested thesensitivity of these lID sensitive neurons to the ITDs, and found that the neurons areChapter] Auditoiy Localization 47also sensitive to the ITDs. Moreover, the neurons’ lID sensitivity functions aresimilar in shape to their corresponding lTD sensitivity functions. These observationsled Yin et al. (1985) to suggest a temporal mechanism for the lID sensitivity of theseneurons.Fig. 1.25 Schematic diagram of the connections from AVCNand MNTB to the LSO column in the model by Reed and Blum(1990).Such a mechanism has been proposed by Jeffress in 1984 in his temporalcoincidence model (see Fig. 1.19 on Page 31). Jeffress hypothesized that the neuralmechanisms for the lID sensitivity also involve temporal coincidence since the neuralresponse latency may change as a function of the stimulus intensity. This hypothesishas been referred to as the latency hypothesis (Kuwada and Yin, 1987).Electrophysiological recordings made by Yin et al. (1985) seem to have foundevidence that supports such a model. In fact, a computer simulation carried out byYin et al. (1985) showed that such a model could be used to describe the responses ofsome neurons in the superior colliculus. The sensitivity functions of the modeledChapter 1 Auditory Localization 48neuron to both the lTD and 11D resemble in general shape to that observed in theexperiment by Yin et al. (1985). The model, however, does not account for theobserved changes in the discharge rate in responding to the changes in the overallstimulus intensity (Kuwada and Yin, 1987).492 Modeling Auditory Localization in ComplexAcoustic EnvironmentsIn Chapter 1, we reviewed both experimental and theoretical studies of auditorylocalization. Theses studies have shown that the auditory system is very sensitive tothe spatial location as well as a number of parameters (ITD, lID, and TED) of a soundstimulus. Specifically, the smallest change in azimuth that the auditory system candetect for a pure tone source can be as small as 1/360 of the entire azimuthal range(Mills, 1958). Also, a listener can notice a 0.5 dB change in the lID under certainconditions (see Fig. 1.9 on Page 16). Perhaps the most impressive sensitivity of theauditory system is that to the ITDs. The owl can detect ITDs as short as 10 jis(Konishi 1993). The highest sensitivity to the lTD reported in human psychophysicalexperiments is also on the order of 10 jis (Yost and Hafter, 1987), which is very shortconsidering the fact that a neural impulse persists for as long as 1000 .ts.As reviewed in Chapter 1, much effort has been directed to the study of themechanisms for the sensitivities of the auditory system to localization cues. Manymodels have been proposed. A noticeable characteristics of these is that most modelsare designed to fit to the data observed in psychophysical and physiologicalexperiments where the sensitivity measurements were made under well controlledconditions (Sayers and Cherry, 1957; Colburn, 1973, 1977; Stern et al., 1988; Gaik,1993; Colburn et al., 1990; Han and Colburn, 1993; Sujaku et al., 1981; Colburn andMoss, 1981; Yin et al., 1985). 
This limits the validity of the models in more complex situations, which are typical of real acoustic environments.

Many quantitative models are centered around proposed mechanisms for the ITD sensitivity of the auditory system. Furthermore, many of the models are based on the coincidence detection mechanism first proposed by Jeffress in 1948. This mechanism has been implemented, in a number of models, in terms of the estimation of a variety of interaural cross-correlation functions (Sayers and Cherry, 1957; Sayers, 1964; Colburn, 1973, 1977; Shamma et al., 1989; Stern et al., 1988; Lindemann, 1986a, b; Gaik, 1993; Arad et al., 1994). In order to adapt these models to deal with the situations common in real acoustic environments (which are noisy, dynamic, and have multiple sound sources), short-time interaural cross-correlation functions have been used (Arad et al., 1994). However, as Arad et al. (1994) have shown, short-time cross-correlation function estimates are noisy and have faulty peaks when the time window is relatively small and when there are two or more sources emitting sounds at the same time. Thus, previous models have difficulties when applied to sound localization tasks in complex situations, which are more likely to occur in a real environment.

Natural acoustic environments are typically noisy, dynamic, and have multiple simultaneous sources. Nevertheless, the auditory system can localize sound sources in natural environments with good accuracy and reliability. Investigation into how auditory localization is achieved in such situations may provide an opportunity to discover new mechanisms of, or insights into, auditory localization that may not be so evident under simplified and idealized assumptions about the acoustic environment. Another reason for the consideration of a more complex and realistic acoustic environment is that any practical localization device must work in such situations. Furthermore, previous models could not be adapted in a straightforward fashion for applications in the development of localization devices that would work effectively in real environments. Thus, the problem we wish to consider is how auditory localization is achieved in noisy, dynamic, and multi-source environments.

We argue that in order for the auditory system to solve the problem of sound localization in natural environments, the localization cues must be estimated robustly and quickly. Since the natural environment is usually noisy, the estimation of the localization cues must be resistant to the effect of noise. Another expectation is that multiple sources will be present simultaneously in the majority of real-life situations. In such situations, the auditory system has been demonstrated to be able to concentrate on one source to the exclusion of others (Durlach and Colburn, 1978). Robust short-time cue estimation may provide a possible mechanism for such an ability.

First, we note that, in natural environments, sound sources are often dynamic in the sense that there are many transient effects, and the energies of the sounds emitted by different sources change from time to time. Thus, it is essential for the auditory system to estimate the location cues robustly in the relatively short time periods or windows during which some sound sources emit higher energy than others.
In this way, it is possible for the auditory system to pick up multiple sound sources that emit sound energies over the whole observation period, but have changing relative intensities. To model such a process, we propose a "decomposition-localization-integration" (DLI) scheme. In this scheme, the mixture of signals detected by the two ears is first decomposed into its spectro-temporal distributions. Spatial attributes (in terms of the ITD, IID, and IED cues) are then determined robustly over small spectro-temporal windows from energy concentrations in the spectro-temporal distributions. Finally, a spatial scene of the sound sources in the environment is built by integrating the short-time energy concentrations according to their spatial attributes. This integration process is needed because a sound source may spread over the entire observation period while the spatial attributes of the energy concentrations are estimated over small temporal windows.

There is physiological evidence that supports the above modeling concept. We note that all the subsequent processes for binaural as well as monaural information processing are based on the neural signals from the auditory nerve fibers (see Figs. 1.1 and 1.2), which in turn "map" in some form the parallel analysis of the sound stimulus encoded in the motion of the basilar membrane (Allen, 1985). Such parallel analysis is in fact a spectro-temporal decomposition of the stimulus signal (Shamma et al., 1989). Furthermore, Takahashi et al. (1993) conducted experiments to examine the ability of neurons in the inferior colliculus of the barn owl to respond to multiple sound sources. Using two noise bursts which were time reverses of each other, they found that the owl was able to "tell" that there were two sound sources, even though the spectra of the two noise bursts were the same and the two noise bursts were presented simultaneously. This observation suggests that the space map of the inferior colliculus relies on differences between noise bursts that exist over brief time spans, and that it builds a neural image of the acoustic environment from multiple samples gathered over time. It further suggests that the auditory system may estimate the location cues in short time windows, which corresponds to the "localization" process in our proposed DLI scheme. Thus, it is plausible that the auditory system detects sound sources in short time windows and accumulates short-time estimates to get a complete image of the acoustic environment. The accumulation of short-time cue estimates corresponds to the "integration" process in the DLI scheme.

The question for us now is: how can the auditory system estimate the location cues quickly and robustly? Most interestingly, how can the relatively slow nervous system measure the very small ITDs in a short time window? Our approach to the problem is to look at what we know about the system, and see how we can find a process that will be able to evaluate and measure the relatively small time differences. It is not obvious, by just observing the neurons and the nuclei in the auditory pathways, that the relatively slow system can actually measure very small time differences. Is there a process in the system, we wonder, that can indeed produce clear and robust time difference measures, or something that is on the order that the brain can deal with, yet is an indicator that the small time difference exists?
If we could find such aprocess, we can devise an algorithm that can be used in a machine, and give an ideaQOOO CIDNC,)C,).0CCl)CDCCl) Cl)C))-.0——CD-)=C-()Cl)C,)0‘<0CDCl)CDC)Cl)Cl)- — CD-CDCDo‘<Cl)0CD0eCDCDrCl)CDCDCl)000_q),CD‘0-Cl)Cl)C-0Cd).‘-a—CDcrq-Cl)CDz00CDCD0 .i.-.CDg5DCDCDCl),—.Q-00Cl)C0C-CDCd)00Cl)•-•Cl)‘C-C’CDCD.CDdCDCD -÷,.0CDCDC-•’C-t-—<-I-tIC‘-•ci)CCDCDCDC)Cl)CD—-NCl)-CD0•I-.C)-CDN00)00—CD—CD—‘CICl)0--)•_(r-•CD0CD0C-Cl)C-+.-P0--p—.0CDCDCl)-4C-::CDCDCD-4CD-.•-4-4CD0CDCD‘.••CD Cl)-—fCl)C/)Cl)r.fCDC-CD—iCDoCDz-•—-‘• QCl)‘0QtC)‘tCDCDCl)Cd)Chapter 2 Modeling Auditory Localization in Complex Acoustic Environments 54In Fig. 2.1, we can clearly see the structured pattern features (e.g. the curved linepatterns of the synchronized maximum response points across frequency) in the two-dimensional spectro-temporal image representing the neural signals of the auditorynerve fibers. It is a promising avenue to explore whether these patterns hold someindicators that can be used to evaluate the very small time differences between thetwo ears. We argue that these patterns, readily available in the auditory system,indicate to us that, indeed, it is possible to use the relatively long time scale events,i.e. the patterns shown in Fig. 2.1, as references for the measurement of very smallinteraural time differences. In other words, patterns from the two ears can becompared, and pattern comparison yields measures of the very small time differences.We do not know whether the neural system actually does that, but it is possiblebecause the patterns are readily available in the auditory system, and the neuralsystem is good at comparing patterns. This is a question for the physiologists todecide.The fundamental contribution of this idea is that there are unexplored patterns thatare generated by the system which may embed codes for the small interaural timedifferences. These patterns may be generated for the very purpose of enabling therelatively slow neural elements to evaluate small time differences. This can beviewed as a coding process where the patterns are used to carry the messagepertaining to time.Thus, a key task for us is to model the pattern recognition and comparison processfor the purpose of estimating short-time cues. The many techniques for solvingpattern recognition and comparison problems are based on either statistical or neuralnetwork concepts (Fukunaga, 1990; Pao, 1989; Nigrin, 1993). To choose the mostappropriate approach for use in our model, certain aspects of our modeling processshould be considered. Firstly, our objective is to develop a model that is based on asolid foundation of knowledge about the way the auditory system achievesChapter 2 Modeling Auditory Localization in Complex Acoustic Environments 55localization. The auditory system, like any other part of the nervous system, involvesparallel processing. Parallel processing is also important for practical applications asit has the potential to work in real time when implemented using current VLSItechnology (Mead et al., 1991). Secondly, it is important to be able to relate theprocesses involved in the model to the processes of the auditory system. Thirdly, aswe argued earlier in this chapter, estimates of different localization cues may beobtained from similar pattern recognition and comparison processes. Thus, flexibleprocessing structures are more desirable than ad hoc algorithms specifically designedfor the estimation of different cues. 
Fourthly, the ability to learn and to continue learning after initial training is important if the model is to be used for practical applications. Statistical techniques for pattern recognition are mostly not adaptive, but typically process all training data simultaneously before being used with new data (Lippmann, 1987). Moreover, as the acoustic environments under consideration are non-stationary, noisy, and have multiple sources, assumptions about the types of acoustic stimuli and their statistical distributions should be kept to a minimum. Thus, non-parametric pattern recognition techniques are preferable as they require weaker assumptions concerning the shapes of the underlying distributions of patterns than the traditional statistical pattern recognition techniques (Lippmann, 1987). Finally, robustness and fault tolerance in the model are also important for practical applications.

Artificial neural networks have proven to be powerful tools for solving pattern recognition and comparison problems (Nigrin, 1993; Pao, 1989). More importantly, they provide an integrated pattern recognition framework which potentially has all the desirable features discussed above. Therefore, artificial neural networks were chosen to model the pattern recognition and comparison process in our proposed DLI scheme.

It should be noted that a number of recent studies (Anderson et al., 1994; Backman and Karjalainen, 1993; Lim and Duda, 1994; Neti et al., 1992; Palmieri et al., 1991; Gelfand et al., 1988; and Wang and Denbigh, 1993) have used neural networks to model auditory localization. However, these studies explored different aspects of auditory localization from the one addressed in this thesis.

Anderson et al. (1994) proposed a hierarchical neural network structure for sound localization based on cross-correlations of the left and right sensory signals and on the IIDs. The IIDs were obtained from the FFTs of the two sensory signals. Backman and Karjalainen (1993) proposed another neural network model which also used cross-correlations and FFT-based IID estimates. In the model by Palmieri et al. (1991), the ITDs and IIDs are given pre-determined values rather than estimated ones, and neural networks were trained to perform the mapping from the ITDs and IIDs to the corresponding spatial locations. Although neural networks were used in these models, they were used to simulate different processes from what we are proposing to model. While we are proposing to use neural networks to obtain robust estimates of localization cues (e.g. ITDs and IIDs), the above models use neural networks to map estimated localization cues to spatial positions. Furthermore, our approach has the potential of extracting localization patterns that are more complex and more powerful than the traditional second-order statistics. Neural networks were also used by Neti et al. (1992) and Wang and Denbigh (1993) to model auditory localization, but these two studies focused on monaural localization cues rather than the binaural cues concerned in our work. Gelfand et al. (1988) proposed an artificial neural map based model for sensory fusion. An IID-frequency map and an ITD-frequency map were used in their model, but it is not clear how these two maps were obtained from sensory signals. No result was given for these two maps. The focus of the model was on sensory fusion rather than sound localization. Lim and Duda (1994) proposed a model for sound localization based on the output of a cochlear model.
They used auto- and cross-correlation based ITD and IID estimates to study how these cues vary with azimuth and elevation, and how well the azimuth and elevation of a single sound source can be estimated from the ITD and IID cues. It should be noted that all the above cited models examined only single-source situations. In contrast, our modeling effort is devoted to localization in multiple-source environments.

3 A DLI Model for Auditory Localization

§3.1 A DLI Model

In the last chapter, considering the characteristics of real acoustic environments, we proposed a modeling scheme in which the process of localizing multiple sound sources in a noisy non-stationary environment was divided into three stages in series: decomposition, localization, and integration (DLI). Furthermore, as discussed in the last chapter, the localization process in the DLI scheme, which corresponds to the estimation of short-time location cues, can be realized by looking for and comparing patterns in the spectro-temporal distributions of the sound stimuli. Here we present a DLI model, shown in Fig. 3.1, that uses the above-described idea of pattern recognition and comparison. The pattern recognition and comparison process is to be implemented with neural networks, which possess the demonstrated ability to learn from examples to perform such tasks (Rumelhart et al., 1986).

§3.2 Three Parallel Implementations of the DLI Model

As reviewed in Chapter 1, three binaural differences (ITDs, IIDs, and IEDs) play important roles in auditory localization. They all have profound effects on the perceived lateral position of the sound image when the ears are stimulated by dichotic stimuli, but experimental evidence indicates that separate pathways in the auditory system are responsible for the observed effects of the different cues (Yost and Hafter, 1987; Kuwada and Yin, 1987; Konishi, 1993; Durlach and Colburn, 1978).

Specifically, as has long been recognized in the duplex theory (Rayleigh, 1907), the localization of tones is determined by the sensitivity of the auditory system to different cues (ITDs and IIDs) in different frequency regions (Mills, 1960, 1972). The ITD sensitivity of the auditory system is limited to a frequency range bounded by a relatively low upper frequency (1200 Hz for humans), while the IID sensitivity extends beyond 10 kHz (Mills, 1960; Yost and Dye, 1987). Furthermore, physiological and anatomical experiments suggest that the processing of the IID and ITD cues occurs in different auditory nuclei (Boudreau and Tsuchitani, 1968; Caird and Klinke, 1983; Goldberg and Brown, 1969; Yin and Chan, 1990; Guinan et al., 1972).

Fig. 3.1 A schematic diagram of a DLI model for auditory localization in complex acoustic environments.

Although the auditory sensitivity to the ITDs is limited to low frequencies for tones, the auditory system is sensitive to the envelope delays of complex high-frequency stimuli, such as click trains and amplitude-modulated tones (Leakey et al., 1958; David et al., 1958, 1959; Yost et al., 1971; McFadden and Pasanen, 1976; Henning, 1980). Also, some auditory neurons with high characteristic frequencies are sensitive to the IEDs (Yin et al., 1984).
Thus, it seems that the processing of ITDs and IEDs also occurs in different neural circuitry originating from different parts of the basilar membrane.

Although, as discussed above, different binaural cues are processed in separate pathways in the auditory system, the ideas behind the generic DLI model, shown in Fig. 3.1, remain valid for all three binaural cues for the following reasons. First, the arguments made in the last chapter for the DLI modeling scheme apply equally to the different binaural cues. Second, although the idea of pattern recognition and comparison was introduced in the last chapter with the measurement of small time differences in mind, a different aspect of the interaural differences, namely the IID, may also be measured by way of such a pattern comparison process. Third, the structured pattern feature observed in Fig. 2.1 on Page 53 is a result of the phase-locking property of the neural activity of the auditory nerve fibers (Kuwada and Yin, 1987; Pickles, 1988) to the fine timing structure of the stimulus waveform. Although such phase-locking to the fine timing structure disappears for high-frequency fibers, there is evidence that high-frequency fibers are able to phase-lock to the envelopes of amplitude-modulated signals with high-frequency carriers (Moller, 1974). Moller (1974) found that high-frequency cochlear nucleus neurons, which are directly innervated by high-frequency auditory nerve fibers, exhibit phase-locking to the envelopes of amplitude-modulated stimulus signals. Thus, the patterns in the neural activities of the high-frequency auditory nerve fibers, which reflect the timing structures of the envelopes of the sound stimuli, can still be explored and compared in order to measure the IEDs. In conclusion, different implementations of the generic DLI model shown in Fig. 3.1 can be developed to model auditory localization based on different binaural cues. Figs. 3.2a through 3.2c show three such implementations, corresponding to the three binaural cues (ITD, IID, and IED), respectively.

Fig. 3.2 Schematic diagrams of three implementations of the generic DLI model (shown in Fig. 3.1): (a) ITD estimation; (b) IID estimation; and (c) IED estimation. ST-IID: short-time IID. ST-ITD: short-time ITD. ST-IED: short-time IED.

Corresponding to the different processing stages shown in the generic DLI model (Fig. 3.1), each of the three parallel models shown in Fig. 3.2 consists of three processing stages in series. Specifically, in each of the three diagrams shown in Fig. 3.2, the left and the right stimuli are first processed by the models of the left and right peripheral auditory systems, and are transformed into two sets of parallel signals representing the firing probabilities of parallel auditory nerve fibers.
These two sets of parallel signals are then processed by a parallel set of neural networks (one network of the set is shown in each diagram in Fig. 3.2), whose tasks are to look for patterns in the outputs of the peripheral auditory models, and to abstract the corresponding binaural localization cues (ITD in Fig. 3.2a, IID in Fig. 3.2b, IED in Fig. 3.2c) by comparing the patterns between the two sides.

By placing different networks on different parts of the model basilar membrane in the auditory periphery, patterns in different frequency regions can be compared. Thus, to cover the entire frequency range, it is necessary to construct a parallel set of such networks in each of the three implementations shown in Fig. 3.2.

The outputs of these networks are series of short-time cue estimates. These short-time estimates are fed into an information integration mechanism, which corresponds to the integration process shown in each of the diagrams in Fig. 3.2, producing histogram-type distribution functions of the short-time estimates. Peaks in these distribution functions indicate the most probable cue estimates in the acoustic environment over the integration time period. In the following sections we shall discuss the detailed implementation of the three serial processing stages we have just described.

Although the three diagrams shown in Fig. 3.2 have similar structures, the tasks for the neural networks in the different diagrams are different. They are required to estimate different binaural cues, namely the ITDs, IIDs, and IEDs, and they operate in parallel, corresponding to the different neural circuitry in the auditory system that processes the different cues. A question arises as to the interpretation of the outputs of the three implementations, which are distribution functions of different short-time cues. As all three types of binaural cues (ITDs, IIDs, and IEDs) indicate spatial locations of sound sources in the acoustic environment, the three types of distribution functions provide a complementary description of the overall picture of the acoustic environment. A peak in any distribution function at the outputs of the three diagrams in Fig. 3.2 indicates the presence of a sound source whose location is determined by the cue value corresponding to the peak. If consistent ITD, IID, and IED are received by the two ears, the overall picture of the acoustic environment, in terms of the spatial distribution of the sound sources, can always be obtained by combining the indications of sound sources from all three types of distribution functions through an "or" operation.

§3.3 Modeling the Auditory Periphery

As shown in both Figs. 3.1 and 3.2, the first process that needs to be modeled is the auditory periphery, which has been under study for more than a century (Helmholtz, 1862; Bekesy, 1960; Siebert, 1968; Allen, 1985). A rich body of knowledge of the physiology and anatomy, as well as the signal processing characteristics, of the auditory periphery has been gathered through the joint effort of researchers in various related fields. Although there are still some important questions about the system that have not been fully answered, a clear picture of the basic elements and their functions in the system has been obtained.
In particular, the peripheral processing of sound stimuli by the auditory system can be roughly divided into three stages: (a) the acoustical and mechanical filtering effect of the outer and middle ears; (b) mechanical filtering due to the motion of the basilar membrane in the cochlea; and (c) transduction of the mechanical movement of the basilar membrane to neural activity in the auditory nerve fibers, through the activation of the hair cells of the cochlea. Many theoretical models of these processing stages have been proposed over the past several decades (Siebert, 1968; Allen, 1985; Carney, 1993). Although different approaches have been used in different models, they give us a similar picture. Roughly speaking, the effect of the basilar membrane motion is similar to that of a parallel set of band-pass filters that give rise to a frequency-map representation of the sound stimuli. Such a representation is then sent to the higher processing centers of the auditory system in the form of neural activities through the auditory nerve fibers. The transduction from the mechanical movement of the basilar membrane to these neural activities is carried out by the hair cells in the cochlea, which have been modeled computationally as a rectifier plus a nonlinear transformation of the rectified signals (Allen, 1985; Carney, 1993).

We do not attempt to develop new models for the auditory periphery, since it has been studied extensively and many models exist. Rather, we selected the model reported by Carney (1993) for use in our work. This is an integrated assembly of models incorporating the different processing stages of the peripheral auditory system. The overall model describes a process that transforms a stimulus signal into the firing rates of model auditory nerve fibers. One of the reasons for the choice of this model is that it provides us with accurate simulations of the temporal response properties of single auditory nerve fibers for both single tones and complex sound stimuli. Since the temporal response characteristics of the auditory nerve fibers are the foundation for the ITD sensitivity observed in the auditory system, these response properties are especially important for the estimation of ITDs. Another reason why we selected this model is the computational simplicity of its simulation. This is important from the practical point of view: if our model of auditory localization were applied to practical applications, we would like to have a system that could work in real time. Thus, the amount of computation becomes an important factor, and should be minimized. We leave the detailed description and computer simulation of this model to the next chapter.

§3.4 Measuring Short-Time Interaural Differences via Neural Networks

A central process in our model is that undertaken by the neural networks, shown in the three diagrams in Fig. 3.2, whose tasks are to estimate short-time interaural differences (ITDs, IIDs, and IEDs) by looking for and comparing patterns between the two sides in the parallel neural activities of auditory nerve fibers. These tasks are learned by the networks through training processes.
In this section, we describe the detailed implementations and training of such networks.

3.4.1 Spectro-Temporal Patterns at the Inputs of the Networks

Physiological studies show that auditory nerve fibers with lower (below 3 kHz for the cat) characteristic frequencies exhibit phase-locking of their neural activities to the stimulus waveform (Kuwada and Yin, 1987; Pickles, 1988). As discussed in Chapter 1, this phase-locking property of the low-frequency nerve fibers is the foundation of the ITD sensitivity of the auditory system. The phase shift between the phase-locked neural activities from the two ears is used as a measure of the time delay of the stimuli detected by the two ears. Such a phase shift is estimated, in most previous models of the ITD sensitivity, by using the cross-correlation function of the neural activities of two corresponding single auditory nerve fibers, each from one ear (Sayers and Cherry, 1957; Sayers, 1964; Colburn, 1973, 1977; Shamma et al., 1989; Stern et al., 1988; Lindemann, 1986a, b; Gaik, 1993; Arad et al., 1994).

Another characteristic of the firing of the auditory nerve fibers is that their mean firing rates change as monotonic functions of the intensities of the sound stimuli (Pickles, 1988). Thus, the neural activation levels of the auditory nerve fibers provide a reference for the measurement of the IIDs. Indeed, as reviewed in Chapter 1, the so-called EI cells in the auditory pathways, which receive excitatory input from the ipsilateral ear and inhibitory input from the contralateral ear, are thought to be responsible for such measurements of the IIDs. Moreover, some previous models of the IID sensitivity (Colburn and Moss, 1981; Johnson et al., 1990; Reed and Blum, 1990) provide detailed mechanisms for the comparison of the activation levels of single auditory nerve fibers between the two ears in order to obtain estimates of the IIDs.

A unique aspect of our model (see Fig. 3.2) lies in that the location cues are estimated by pattern comparison between the group activities of sets of auditory nerve fibers, rather than by cross-correlating the two temporal waveforms (for the ITDs) or comparing the neural activation levels (for the IIDs) of two single auditory nerve fibers. Due to the phase-locking activity of individual nerve fibers, the spectro-temporal patterns are also phase-locked to the stimulus waveform for low-frequency fibers. For instance, from Fig. 3.3, which shows the activation of a group of auditory nerve fibers with adjacent characteristic frequencies responding to a 900 Hz pure tone, one can appreciate the clear temporal alignment of the neural signals of the auditory nerve fibers. The phase information is represented in the line patterns of the neural activation across frequency. These patterns provide excellent references for small time difference measurements. Furthermore, the activation levels of these patterns provide references for the measurement of IIDs. Both the temporal shift and the activation level difference of these spectro-temporal patterns can be measured between the neural activities of two sets (one from each ear) of auditory nerve fibers, which yield short-time ITD and IID estimates, respectively. These short-time estimates can be repeatedly generated in an on-going process as the ears continue to listen to the sound stimuli. Thus, in our proposed model (Fig. 3.2), the neural networks view an integrated group of auditory nerve fibers as a whole, as both the phase and intensity information is clearly encoded in the group activity patterns of the corresponding auditory fibers.
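For contrast, most of the earlier models cited above reduce, in computational terms, to finding the lag that maximizes the interaural cross-correlation of a single fiber pair. A minimal sketch of that classical baseline (in Python, with synthetic signals standing in for fiber activities; this is not part of the proposed model) makes the departure point concrete:

    import numpy as np

    def xcorr_itd(left, right, fs):
        # Classical baseline: the ITD is taken as the lag that maximizes the
        # interaural cross-correlation. A positive result means the right-ear
        # signal lags the left. For pure tones the estimate is ambiguous
        # modulo the tone period.
        xc = np.correlate(right, left, mode="full")
        lags = np.arange(-(len(left) - 1), len(right))
        return lags[np.argmax(xc)] / fs

    # Example: a 900 Hz tone delayed by 250 us at the right ear, 20 kHz sampling.
    fs = 20000.0
    t = np.arange(0.0, 0.05, 1.0 / fs)
    left = np.sin(2 * np.pi * 900.0 * t)
    right = np.sin(2 * np.pi * 900.0 * (t - 250e-6))
    print(xcorr_itd(left, right, fs))   # ~2.5e-4 s

The pattern-comparison approach described next replaces this single-pair correlation with a learned comparison of two-dimensional group activity patterns.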
An advantage of this group-pattern approach is that, in order to produce estimates of the location cues, a binaural neuron only needs to observe the input signals for a period of time that is on the order of the reciprocal of the neuron's characteristic frequency. This provides the foundation for cue estimation over short time spans, and for the integration of the short-time cue estimates to obtain an overall picture of the acoustic environment. Also, it provides an opportunity to handle fast time-varying situations. Another advantage is that the two-dimensional spectro-temporal patterns give a relatively robust representation of both the phase and intensity information of the stimulus waveform. If some single fibers' signals contain more noise than the signals of the rest of the fibers do, the image pattern will still be recognizable and allow pattern comparison.

Fig. 3.3 Example of neural activities of a group of nine (vertical axis) modeled auditory nerve fibers responding to a 900 Hz pure-tone stimulus.

Neural networks trained to estimate ITDs will not work for high-frequency stimuli because auditory nerve fibers with high characteristic frequencies cannot phase-lock to the fine timing structure of the stimulus waveform (Pickles, 1988; Kuwada and Yin, 1987). However, as reviewed in Chapter 1, Moller's experiment (1974) suggests that auditory nerve fibers with high characteristic frequencies show phase-locking to the envelopes of amplitude-modulated signals with high-frequency carriers. Thus, for complex high-frequency stimuli, the group activities of high-frequency auditory nerve fibers show spectro-temporal patterns that reflect the temporal structures of the envelopes of the stimulus signals. Such patterns can then be compared by neural networks (see Fig. 3.2c) to measure the temporal delays of the envelopes (i.e. the IEDs) between the two ears. A major difference between the ITD estimation and the IED estimation is that for the ITD case the patterns in the input to the neural networks reflect the fine timing structures of the sound stimuli, whereas for the IED case the patterns reflect the timing structures of the envelopes of the sound stimuli. Thus, a similarly structured network may be used to estimate IEDs. A unique feature of the envelope spectro-temporal patterns of the stimuli is that the patterns will not show up unless the stimuli are amplitude-modulated with a modulation rate falling in an appropriate range, as defined below.

Consider an amplitude-modulated sound stimulus signal of the following form:

    S(t) = A sin(ω_m t + φ_m) sin(ω_c t + φ_c),   (3.1)

where ω_m is the modulation frequency, ω_c >> ω_m is the carrier frequency, and φ_m and φ_c are the phases of the modulation and carrier signals, respectively. Eq. 3.1 can be rewritten as

    S(t) = (A/2)[cos(ω_1 t + φ_1) - cos(ω_2 t + φ_2)],   (3.2)

where ω_1 = ω_c - ω_m, ω_2 = ω_c + ω_m, φ_1 = φ_c - φ_m, and φ_2 = φ_c + φ_m. Thus, the Fourier spectrum of S(t) consists of two adjacent sinusoids. The distance in frequency space between the two sinusoids is twice the modulation frequency. The midpoint between the two sinusoids on the frequency axis is the carrier frequency, as shown in Fig. 3.4.
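As a quick numerical check of the identity behind Eq. 3.2, the product form and the two-sideband form agree to machine precision (a Python sketch; the carrier and modulation values are illustrative only, not parameters from the thesis):

    import numpy as np

    A, fm, fc, pm, pc = 1.0, 200.0, 3000.0, 0.3, 1.1   # illustrative values
    wm, wc = 2 * np.pi * fm, 2 * np.pi * fc
    t = np.linspace(0.0, 0.01, 2000)

    s_product = A * np.sin(wm * t + pm) * np.sin(wc * t + pc)          # Eq. 3.1
    s_sidebands = (A / 2) * (np.cos((wc - wm) * t + (pc - pm))         # Eq. 3.2
                             - np.cos((wc + wm) * t + (pc + pm)))
    print(np.max(np.abs(s_product - s_sidebands)))   # ~1e-16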
As reviewed in Section 3.3, the effect of the basilar membrane motion is similar to that of a parallel set of band-pass filters. If the two sinusoids in Fig. 3.4 are separated by such a distance that at most one sinusoid falls within the pass band of a particular filter (i.e. the modulation rate is too high), the output of that filter will contain at most one major sinusoid. Thus, the envelope of this output will be largely flat, with a major DC component. Fig. 3.5 shows such a case, where a window of 13 time slices of the neural activities of a set of nine auditory nerve fibers is presented as a matrix of small squares. The larger the white area in each square in the figure, the higher the activation of the corresponding fiber at the corresponding time slice.

Fig. 3.4 Frequency spectrum of the signal in Eq. 3.2.

When the two sinusoids in Fig. 3.4 are sufficiently close to each other that both of them fall within the pass bands of a set of auditory nerve fibers, the activation of these auditory nerve fibers will show patterns reflecting the amplitude modulation of the stimuli. Fig. 3.6 shows such a case.

If the two sinusoids in Fig. 3.4 are so close to each other that the modulation rate becomes very slow, the activation of the auditory nerve fibers will not show significant temporal variation within a relatively small temporal window. Fig. 3.7 shows such a case.

In summary, there are upper and lower limits of the modulation rate, within which the windowed activities of the auditory nerve fibers show spectro-temporal patterns that reflect the envelope of the stimulus signal.

Fig. 3.5 High modulation rates. Example of neural activities of a group of auditory nerve fibers when the stimulus has relatively high modulation rates. Each square represents the activation of a particular auditory nerve fiber at a particular time slice. The white area in each square is proportional to the activation level. AN: auditory nerve.

3.4.2 Structure of the Networks

It is possible that networks with different structures may be trained to perform the same task. Our objective here is to find a network structure that can be trained to perform the tasks required, rather than to investigate the differences in performance of the different network structures that may be used for the task. We have chosen the back-propagation network (Rumelhart et al., 1986) for the task. This type of network has been shown to be able to learn, from training examples, complex input-output transformations (Maren et al., 1990). More importantly, multi-layer back-propagation networks are able to form arbitrarily complex decision regions for the purpose of pattern classification and recognition (Lippmann, 1987). This ability is directly related to the non-linearity of the individual neurons and to the fact that the networks have multiple layers. In our simulation study, a three-layer structure was used.
Example of neuralactivities of a group of auditory nerve fibers when the stimulushas intermediate modulation rates. Each square represents theactivation of a particular auditory nerve fiber at a particulartime slice. The white area in each square is proportional to theactivation level. AN: auditory nerve.The three layers of the network shown in Fig. 3.8 are called the input, hidden, andoutput layers, respectively. The input layer is fully connected with the hidden layer,and the hidden layer with the output layer. Both of these connections are feedforward which means that each neuron in the receiving layer receives weighted inputfrom every neuron in the sending layer, and that the activation of a receiving neuronis governed by the following equation:Chapter 3 A DLI Modelfor Auditory Localization 75s = f[2w1u (3.7)where s is the output of the i-th neuron in the receiving layer, u ‘s are the outputs ofthe neurons in the sending layer, b is a bias constant, w,1 ‘s are the connection weightsbetween the two layers, N is the number of neurons in the sending layer, and thefunction fQ) is defined as a sigmoid nonlinearityf(x) = 1 (3.8)1+eThe connection weights and the bias constant are to be determined in a trainingprocess as will be described in the next subsection.Cl)a)0(I)a)SH4-0><a)-oCcrnrnrncuurnriflrn00000[LiL][JLJUUUUUucurnurnrnorncrnLJCUDflLJUULJCDrnornniDLflJflIndex of AN fibersFig. 3.7 Low modulation rates. Example of neural activities ofa group of auditory nerve fibers when the stimulus hasrelatively low modulation rates. Each square represents theactivation of a particular auditory nerve fiber at a particulartime slice. The white area in each square is proportional to theactivation level. AN: auditory nerve.Chapter 3 A DLI Model for Auditory Localization 76Output LayerHidden LayerInput LayerC)EInput from Right EarFig. 3.8 A schematic diagram of the structure of the networksfor short-time cue estimations. The squares represent neuronsin the network. The size of the white area in a squarerepresents the amplitude of the neuron’s activity, which isnormalized to be from zero to one. The adjacent layers in thenetwork are fully connected in a feed-forward fashion. See textfor details on the connection between layers.The input layer of the network consists of two parts, with each corresponding toone of the two ears. The two parts have the same structure in the form of arectangular matrix, as shown in the input layer in Fig. 3.8. The activity in the matrixof neurons represents the neural activation pattern of a group of auditory nerve fibersover a short-time interval. Each column of the matrix corresponds to one auditorynerve fiber in the group, and the rows correspond to the neural firing rates of theauditory nerve fibers at successive discrete time instants in the time interval shown.The sampling rate is minimized to achieve simulation efficiency without causing anyaliasing problem. This is possible because the neural activities of the auditory nervefibers correspond to outputs of the band-pass filters used to model the mechanicalfiltering effect of the basilar membrane.Auditory Nerve FibersInput from Left EarChapter 3 A DLI Modelfor Auditory Localization 77In particular, the activation of the neurons in the input matrix can be viewed as asmall patch of the overall neural spectro-temporal representation of the soundstimulus. An alternative view is that the matrix of the input neurons is a smallspectro-temporal window that can be used to scan the spectro-temporal input alongthe time axis. 
To determine the size of this spectro-temporal window, or the number of columns and rows in the input matrix, several considerations are relevant. First, the activation patterns in the window must be representative of the patterns exhibited in the overall spectro-temporal representation produced by the auditory peripheral model, which conveys the timing and intensity information. Accordingly, the window should not be too small to accommodate such patterns. On the other hand, in order to follow the possibly time-varying cues in non-stationary situations, the temporal dimension of the window should be as small as possible. Also, in order to differentiate between possibly multiple sound sources having different frequency contents, the spectral dimension of the window should also be small. Finally, a small window is also desirable from the point of view of computational convenience in the training and simulation of the network. For the simulations presented in Chapters 5, 6, and 7, a 13 by 9 input matrix is used. This corresponds to nine adjacent auditory nerve fiber models sampled at thirteen consecutive time instants. For low-frequency fibers, the sampling interval is determined by the upper limit of the frequency range covered by the relevant auditory nerve fibers. For high-frequency fibers, which cannot phase-lock to the fine temporal waveform of the sound stimulus, the sampling interval is determined by the upper limit of the relevant modulation frequency range described in the last section. The characteristic frequencies of the nine auditory nerve fibers at the input to the ITD network are chosen to be frequency values uniformly sampled on the equivalent-rectangular-bandwidth-rate (ERB-rate) scale (Moore and Glasberg, 1983). A sampling interval of 0.2 on the ERB-rate scale is used. This interval corresponds to a frequency interval of about 100 Hz at 1 kHz. In Fig. 3.8, where an example of the activation of the input matrices of neurons stimulated by a 900 Hz tone is shown, a 13 by 9 window contains a complete period of the periodic patterns (lines) in the neural spectro-temporal representation of the sound stimulus (see also Fig. 3.3).

In contrast to the input layer, the hidden layer is an array of neurons that do not have any particular order. Each hidden neuron receives a weighted sum of the inputs from both matrices of neurons in the input layer. Different neurons usually have different weighting vectors, abstracting different information from the input activation patterns. These weighting factors are determined by a training process which is designed to force the network to learn its tasks, as will be described in the next subsection. The number of neurons in the hidden layer is usually determined empirically, depending on the specific tasks of the network. A general guideline is to choose the smallest number of hidden neurons that is sufficient for the network to learn the specific tasks. This not only minimizes the computing resources required for the simulation, but also prevents the network from directly memorizing the training examples instead of learning to generalize to other stimuli.

The task of the network is to estimate the cues from the information available in the windowed neural activities of the model auditory nerve fibers, as represented by the two input matrices of neurons corresponding to the two ears. This task is realized by encoding the location cue values in the collective activities of the output layer neurons of the network.
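Putting Eqs. 3.7 and 3.8 together, the full forward computation from the two input matrices to the output-layer activities takes only a few lines. A sketch (Python; the layer sizes follow the 13 x 9 two-ear input and the 20-neuron hidden and output layers described in this chapter, but the weights shown are random placeholders rather than trained values):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))            # Eq. 3.8

    def layer(u, W, b):
        # Eq. 3.7: each receiving neuron sees a weighted sum of the sending
        # layer plus a bias, passed through the sigmoid nonlinearity.
        return sigmoid(W @ u + b)

    rng = np.random.default_rng(0)
    n_in = 2 * 13 * 9                # two 13-by-9 windows, one per ear (= 234)
    n_hidden, n_out = 20, 20

    W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
    W2, b2 = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

    u = rng.uniform(size=n_in)       # placeholder for windowed fiber activities
    activity = layer(layer(u, W1, b1), W2, b2)   # output-layer population code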
A convenient and biologically plausible way to encode the cues is to define tuning curves for the output layer neurons in response to different cue values (Lehky and Sejnowski, 1990). Specifically, the activity or response of an output layer neuron is defined as a non-monotonic function with a single peak at a particular cue value. For computational convenience, we choose a Gaussian function for the shape of the tuning curves. Fig. 3.9 shows such a tuning curve for the ITD cue, which specifies the response activity of a particular output layer neuron as a Gaussian function of the relevant ITDs.

Fig. 3.9 An example of the tuning curve of an output layer neuron. The arrowhead line at ITD = 0.1 ms represents the activity of the neuron for this particular ITD value.

There are two problems associated with the use of non-monotonic tuning curves (Lehky and Sejnowski, 1990). The first problem is that if the tuning curves are narrowly shaped and each neuron is used to encode only one particular cue value, a large number of neurons is needed to obtain relatively high resolution. Another problem is that a neuron may give the same response to different cue values when the tuning curves overlap. One way of resolving this problem is to encode a range of cue values by the collective responses of a group of neurons rather than by single-neuron responses to each cue value within the range. Fig. 3.10 shows an example of the activity pattern of an ordered array of output neurons corresponding to a specific cue value. In this particular example, these responses are determined by their Gaussian-shaped tuning curves, which are defined as

    R_i(θ) = (1 / (√(2π) σ)) e^(-(θ - μ_i)² / (2σ²)),   (3.9)

where θ corresponds to the encoded cue value and i is the index of the output neurons.

The maximum-response cue values (which correspond to the μ_i's in Eq. 3.9) of the neurons in the array change in an orderly fashion from the neuron at one end of the array to the one at the other end. Thus, the spatial arrangement of the neurons in the output array is just as important as it is for the neurons in the input matrix.

Fig. 3.10 An example of the coding of cue values by an array of output layer neurons. The arrowhead lines indicate the activities of the individual neurons. The mean of the Gaussian envelope is the encoded cue value.

The choice of the number of neurons in the output layer is also an empirical one. As a general guideline, more neurons in the layer will increase the precision of the coding scheme. However, because there will be noise in the responses of the individual neurons, the incremental improvement in precision will become smaller and smaller as the number of neurons increases. For the simulations presented in Chapters 5, 6, and 7, there are 20 neurons in the output layers of the corresponding networks.
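The population code of Eq. 3.9 is easy to make concrete. A sketch (Python) of the target activity pattern for a given ITD, using the 20-neuron ITD layer specified later in Subsection 4.2.2; the width σ is set from the ratio given in Subsection 4.2.1, and the constant normalization factor of Eq. 3.9 is dropped since the activities are normalized to the range zero to one:

    import numpy as np

    mu = -505.0 + 50.5 * np.arange(20)   # maximum-response ITDs, microseconds
    sigma = 1010.0 / 15.0                # 1/15 of the encoded cue range

    def encode(theta):
        # Eq. 3.9 evaluated at the encoded cue value theta (constant dropped).
        return np.exp(-(theta - mu) ** 2 / (2 * sigma ** 2))

    def decode(activity):
        # A simple read-out: centroid of the population activity. A
        # least-squares Gaussian fit (Subsection 4.2.4) is the fuller version.
        return np.sum(mu * activity) / np.sum(activity)

    pattern = encode(250.0)    # target pattern for a 250 us ITD
    print(decode(pattern))     # approximately 250, up to edge effects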
3.4.3 Training of the Networks

The tasks of the networks are to estimate the location cues, as encoded in their output activity patterns, upon scanning a section of the neural spectro-temporal representation of the sound stimuli. The performance of these tasks is learned through training processes in which the connection weights in the network are adjusted systematically so that the outputs of the networks approximate the desired outputs for the training examples. The desired output activity patterns are given, and they correspond to the known cue values of the training examples. After the training is completed, the networks are expected to give appropriate cue estimates for arbitrary sound stimuli.

As the training process can be formulated as an optimization problem, various optimization techniques can be used to design the training algorithm. The objective is to find a set of connection weights in the network that minimizes the difference between the desired outputs and the actual outputs of the network for all the training examples.

As the networks are trained only to produce the correct output encoding of the known cue values of the training examples, an important question arises as to how well the networks will encode the cue for arbitrary sound stimuli, or, how well the network will extrapolate and interpolate in the entire space of all possible stimuli. To achieve a good ability to generalize, the choice of training examples is crucial. This in turn depends on the structure of the input space. For all three types of networks shown in Fig. 3.2, there exist multiple parameter dimensions that effectively affect the input patterns of the networks. Our strategy in designing the training example sets is, thus, to uniformly sample the major relevant parameter dimensions.

In addition to the proper sampling of the stimulus parameter space for the selection of the training example sets, another aspect must also be considered. Although the shifts or the activation level differences of the spectro-temporal patterns between the two ears are constant for a given cue value, the absolute phase of the pattern in the spectro-temporal representation from each ear is changing continuously. Thus, the networks should be trained to measure the constant phase or activation level differences independent of the absolute phase of the spectro-temporal pattern from each ear. This is achieved by including, in the training set, examples that have different absolute phases of the same spectro-temporal patterns.

§3.5 Integrating Short-Time Cue Estimates

A simple way of integrating the short-time cue estimates (produced by the networks discussed in the last section) is to use histograms. Histograms of such short-time estimates are formed by dividing the corresponding range of cue values into a number of bins of relatively small interval. The short-time estimates are then put into the corresponding bins depending on their values. In a dynamic situation, the short-time estimates change from one time window to another. Thus, the resulting histograms are estimates of short-time cue distributions over the corresponding listening period. Such distributions give us direct indications of the spatial distribution of sound sources in the observation time period. Peaks in such distribution functions indicate the most likely estimates of the cue values that correspond to different sound sources. Fig. 3.11 shows such a distribution function for the ITD cue. The single peak in the distribution indicates a single source localized at a position that corresponds to the ITD value at the peak.

The idea of using histograms to integrate short-time cue estimates is biologically plausible. A possible neural implementation of the process of building a histogram is to have an array of neurons representing the different bins in the histogram. The firing rate of a neuron in the array corresponds to the height of the corresponding histogram bin, and this firing rate is driven by the short-time estimates that coincide with the cue value represented by the neuron.
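In simulation terms the integration stage is simply a running histogram update. A minimal sketch (Python, fed here with a fabricated stream of short-time ITD estimates in microseconds purely for illustration):

    import numpy as np

    edges = np.linspace(-505.0, 505.0, 41)   # 40 bins across the ITD range
    counts = np.zeros(len(edges) - 1)

    def integrate(short_time_estimates):
        # Each short-time estimate falls into one bin; peaks in `counts` then
        # mark the most probable cue values over the listening period.
        global counts
        counts += np.histogram(short_time_estimates, bins=edges)[0]

    fake_stream = np.random.default_rng(2).normal(250.0, 30.0, size=500)
    integrate(fake_stream)
    print(edges[np.argmax(counts)])   # near 250 us: the peak marks the source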
§3.6 Summary of the DLI Model

In summary, our model for auditory localization consists of several processing stages in series, with each stage having parallel processing pathways or modules. The stimulus signals detected by the two ears are first processed by models of the left and right peripheral auditory systems, respectively. These two models transform their incoming sound signals into parallel sets of band-pass signals with different center frequencies, forming two neural spectro-temporal images of the sound stimuli at the two ears. The second stage consists of three parallel processing pathways, each of which is arranged to generate short-time estimates of one of the three binaural cues: ITD, IID, and IED. Within each pathway, a parallel set of neural networks is trained to estimate the corresponding short-time cues based on information from small sections of the entire frequency range. In effect, each neural network in the set takes, as its input, two patches of the neural spectro-temporal images of the sound stimuli at the two ears, as represented in the neural activities of the model auditory nerve fibers. The ITD and IED networks are trained to measure the temporal shift of patterns in the neural images corresponding to the two ears, while the IID networks are trained to measure the activation level differences between the two neural images. The outputs of these networks are series of short-time estimates of the ITD, IID, and IED, which are fed into an information integration process, giving rise to distributions of the corresponding short-time cues. Peaks of the distribution functions correspond to sound sources appearing in the corresponding listening period.

Fig. 3.11 An example of the ITD distribution function. The sound stimuli used in the simulation were 1 kHz pure tones to which white noise was added at a 20 dB signal-to-noise ratio. The ITD in the stimuli was 250 μs. The peak around 250 μs indicates the correct estimate of the true ITD value.

4 Simulation Methods

In the last two chapters we introduced and described our model for auditory localization. The model is studied using computer simulations. In this chapter, we discuss the methods used in our simulations. We first discuss the detailed simulation of a model (by Carney, 1993) of the peripheral auditory system, which is used as the first module or processing stage in our DLI model shown in Fig. 3.1. Second, we discuss the simulation and training of the neural networks, which are the central processes in our model. As mentioned in the last chapter, only one network in each of the three parallel implementations (Fig. 3.2) of the DLI model (Fig. 3.1) needs to be simulated to gain a good understanding of the behavior of the model. Finally, we discuss the evaluation methods used to assess the performance of the trained networks.

§4.1 Simulation of the Peripheral Auditory System

As mentioned in Section 3.3, a model by Carney (1993) is used in our simulation. The model transforms a stimulus signal into the firing rates of the model auditory nerve fibers.
This transformation has previously been modeled as a cascade of a linear filter, a static nonlinearity, and a stochastic pulse generator, as shown in Fig. 4.1 (de Boer and Kuyper, 1968; de Boer and de Jongh, 1978). It has been found (Johannesma, 1972; de Boer, 1975; de Boer and Kruidenier, 1990) that a filter of the form

    g(t) = [(t - a)/τ]^γ e^(-(t - a)/τ) cos[ω_CF(t - a)],   for t ≥ a,   (4.1)
    g(t) = 0,   for t < a,

can be used in the model shown in Fig. 4.1 to describe the responses of auditory nerve fibers. The parameters in Eq. 4.1 are estimated by fitting the model in Fig. 4.1 to activities of auditory nerve fibers responding to wideband noise stimuli. The function in Eq. 4.1 provides an estimate of the linear aspects of the acoustical and mechanical filtering of the middle ear and the basilar membrane (de Boer and de Jongh, 1978).

Fig. 4.1 A model of the neural signals in auditory nerve fibers (de Boer and de Jongh, 1978).

A non-linear characteristic of basilar membrane mechanics, which is not considered in the model shown in Fig. 4.1 above, is that the bandwidths of the cochlear filters change as a function of the intensity of the sound stimulus (Rhode, 1971; Hall, 1974; Johnstone et al., 1986). Carney (1993) incorporated this nonlinearity into the model shown in Fig. 4.1 by introducing a feedback mechanism to control the bandwidth of the filter in Eq. 4.1 according to the intensity of the sound stimulus. A schematic diagram of Carney's model is shown in Fig. 4.2. Four processing elements can be identified in the diagram: (i) a time-varying narrow-band filter that describes the linear aspects of the mechanical tuning of the basilar membrane; (ii) a feedback mechanism for the control of the bandwidth of the narrow-band filter in (i); (iii) an overall time delay consisting of the traveling wave, acoustical, and synaptic delays; and (iv) a static nonlinearity that describes the conversion of mechanical movement to neural signals by the inner hair cells. In the following subsections (4.1.1 through 4.1.4), we give more detailed descriptions of these elements (Carney, 1993).

Fig. 4.2 A schematic diagram of the auditory periphery model by Carney (1993). See text for descriptions of the different elements in the model. (Adapted from Carney, 1993.)

4.1.1 Time-Varying Narrow-Band Filter

For the parameter ranges relevant to auditory nerve fiber responses, the function in Eq. 4.1 has a simple frequency-domain approximation (Patterson et al., 1988):

    G(ω) ∝ 1 / [1 + jτ(ω - ω_CF)]^γ.   (4.2)

The parameter τ has a strong influence on the bandwidth of the filter. The time delay a is included as part of the delay introduced after the narrow-band filter. The value of γ is set equal to 4 for fibers with relatively low characteristic frequencies (de Boer, 1975).

In our simulation, Eq. 4.2 is implemented through a cascade of γ digital filters of the following form:

    q_1(kT) = A[q_0(kT) + q_0(kT - T) + B(kT)q_1(kT - T)],   (4.3)

where q_0 and q_1 are the input and output of one filter stage, A = 1/(1.5τ_sp C + 1), and B(kT) = F(kT)C - 1, with C = 2/T. τ_sp represents the time constant of the exponential damping of the function in Eq. 4.1 at a 75 dB sound-pressure level. F(kT) is the feedback signal shown in Fig. 4.2, and will be described in the next subsection.
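A sketch of this cascade (Python) may help fix ideas. It uses a fixed time constant in place of the feedback signal F(kT), so it shows only the linear skeleton of the filter, not Carney's full level-dependent version; the band is shifted to the characteristic frequency by heterodyning, a standard way of realizing Eq. 4.2 with low-pass stages:

    import numpy as np

    def gammatone_cascade(x, fc, tau, fs, gamma=4):
        # Eq. 4.3 applied gamma times: first-order stages approximating
        # Eq. 4.2, with the resonance moved to fc by complex demodulation.
        T = 1.0 / fs
        C = 2.0 / T
        A = 1.0 / (1.5 * tau * C + 1.0)
        B = tau * C - 1.0            # constant here; F(kT) would vary it
        n = np.arange(len(x))
        z = x * np.exp(-2j * np.pi * fc * n * T)   # shift band down to DC
        for _ in range(gamma):
            y = np.zeros_like(z)
            for k in range(1, len(z)):
                y[k] = A * (z[k] + z[k - 1] + B * y[k - 1])   # Eq. 4.3
            z = y
        return np.real(z * np.exp(2j * np.pi * fc * n * T))   # shift back

    fs = 20000.0
    click = np.zeros(400); click[0] = 1.0
    resp = gammatone_cascade(click, fc=900.0, tau=0.002, fs=fs)  # ringing at fc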
4.1.2 Feedback Control of the Filter Bandwidth

An important aspect of the mechanical tuning of the basilar membrane is that the bandwidth of the tuning varies over time as a function of the sound-pressure level of the stimulus (Rhode, 1971; Hall, 1974; Robles et al., 1976; Patuzzi et al., 1984; Johnstone et al., 1986; Ruggero and Rich, 1991). When the amplitude of the stimulus is low, the filter is relatively sharply tuned. As the amplitude of the input signal increases, the bandwidth of the filter increases. This aspect of the nonlinearity of the basilar membrane tuning is modeled through a feedback control mechanism which includes a saturating nonlinearity and a low-pass filter. The saturating nonlinearity is modeled using a hyperbolic function of the following form:

    V_fb(t) = V_max [tanh(0.707 P(t)/D_fb - P_0) + tanh(P_0)] / (1 + tanh(P_0)),   (4.4)

where the parameter V_max determines the saturation value, P_0 determines the asymmetry of the nonlinearity, and D_fb sets the operating point of the nonlinearity. The low-pass filter in the feedback loop is implemented in the time domain as a digital filter of the form:

    V_o(kT) = C_1 V_o(kT - T) + C_2 [V_i(kT) + V_i(kT - T)],   (4.5)

where T is the sampling interval, and C_1 and C_2 are the filter coefficients determined by the cut-off frequency of the filter. The output of the low-pass filter is scaled and biased according to the following equation:

    F(t) = (τ_sp/2)(V_o(t)/V_max) + τ_sp/2,   (4.6)

where F(t) is the feedback signal to the time-varying narrow-band filter of the basilar membrane model, and τ_sp, again, represents the time constant of the exponential damping of the function in Eq. 4.1 at a 75 dB sound-pressure level.

4.1.3 Traveling Wave Delay

The responses of the auditory nerve fibers have specific latencies due to the acoustical, synaptic, and traveling wave delays. To align the latency of the output of the narrow-band filter with the measured latencies of the auditory nerve fibers, a time delay is introduced after the narrow-band filtering, as shown in Fig. 4.2. The time delay varies as a function of the characteristic frequency (ω_c) of the auditory nerve fiber, as described by the following equation:

    P_bm(t) = P_f(t - A_D e^(-x(ω_c)/A_L)),   (4.7)

where A_D and A_L are obtained from experimental measurements by Carney and Yin (1988), and x is the distance in mm from the apex of the basilar membrane.
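Taken together, Subsections 4.1.2 and 4.1.3 amount to a few operations per sample. A sketch of one update step (Python; the constants are placeholders for illustration, not Carney's fitted values):

    import numpy as np

    V_MAX, P0, D_FB, TAU_SP = 1.0, 0.5, 1.0, 0.001   # placeholder constants
    C1, C2 = 0.9, 0.05    # low-pass coefficients set by the cut-off frequency

    def saturate(p):
        # Eq. 4.4: asymmetric saturating nonlinearity driving the loop.
        return V_MAX * (np.tanh(0.707 * p / D_FB - P0)
                        + np.tanh(P0)) / (1 + np.tanh(P0))

    def feedback_step(p_now, v_in_prev, v_out_prev):
        v_in = saturate(p_now)
        v_out = C1 * v_out_prev + C2 * (v_in + v_in_prev)     # Eq. 4.5
        f = (TAU_SP / 2) * (v_out / V_MAX) + TAU_SP / 2       # Eq. 4.6
        return v_in, v_out, f   # f is handed back to Eq. 4.3 as F(kT)

    # The traveling wave delay of Eq. 4.7 is then a pure shift of the filtered
    # signal by a characteristic-frequency-dependent number of samples.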
4.1.4 Model of the Inner Hair Cell

The inner hair cell is modeled as a saturating nonlinearity, of the same form as Eq. 4.4, followed by two low-pass filters of the same form as Eq. 4.5. The asymmetrical saturating nonlinearity of Eq. 4.4 was previously proposed as a model of the input/output characteristics of the inner hair cells by Russell and Sellick (1978). The two low-pass filters represent the electrical filtering of the inner hair cell membrane (Russell and Sellick, 1983). The cut-off frequency of the two low-pass filters is set to 1100 Hz, which is the same as that used for the low-pass filter in the feedback loop shown in Fig. 4.2.

4.1.5 Modeling High-Frequency Auditory Nerve Fibers

In order to model the fact that high-frequency auditory nerve fibers cannot phase-lock to the fine timing structure of the sound stimulus, we add an extra processing stage to the peripheral model described in the previous subsections. Specifically, the envelope of the output of the time-varying narrow-band filter in Fig. 4.2 is extracted and sent to the next processing stage. This model is used for the auditory nerve fibers at the inputs to both the IID and IED networks shown in Fig. 3.2.

§4.2 Simulation of the Neural Networks

We initially tried to write our own computer programs to facilitate the simulation studies of the neural networks in our model. A critical issue in the simulation of neural networks is the implementation of the training algorithms. It is important for us to implement different training algorithms in the simulation programs. This allows us to test different training methods in order to choose the most appropriate ones (in terms of the training speed and the ability to avoid local minima). Another issue concerns the user interface. A graphical user interface is desired, which allows us to plot the changes of important parameters during training, to view the activity patterns in different layers of the network during and after training, and to view the connection patterns between layers of the network. The interface should be interactive, in the sense that the user has full control of the processes taking place in the network, such as setting break points and stepping through the training process, changing parameters at any point during training, and testing a trained network with individual test stimuli.

In the early stage of our simulation study, we became aware of a neural network simulation package called Xerion (van Camp, 1993) that meets the requirements of our simulation. Xerion is a collection of neural network simulation tools developed by a research group in the Department of Computer Science at the University of Toronto. It runs on the UNIX platform, and allows easy creation of different simulators for different types of neural networks. Thus, instead of continuing to write our own simulation programs, we have since used the back-propagation network simulator, which comes with the Xerion package, to train the networks in our model. Most of the popular training algorithms for back-propagation networks are implemented in the simulator. Moreover, the simulator has an excellent graphical user interface that allows the user to visualize and manipulate the networks in the way discussed above. It also has a UNIX-like script shell that allows automation of the training and manipulation of the networks.

4.2.1 Training Examples

To limit the number of training examples, only the simplest types are used. It is also interesting to know whether a network trained with the simplest type of stimuli can "generalize" and perform well with more complex stimuli. For the ITD and IID networks, only pure-tone stimuli are used. For the IED networks, sinusoidally amplitude-modulated (AM) tones are used. In order to train the networks to perform their tasks (estimating short-time cues) in noisy situations, white noise is added to the training stimuli.

As mentioned in Subsection 3.4.3, the choice of training examples is to uniformly sample the relevant parameter dimensions of the stimulus space. For the ITD and IID networks (Fig. 3.2a, b), the relevant parameters of the stimuli are frequency, intensity, signal-to-noise ratio (SNR), ITD, and IID. For the IED networks (Fig. 3.2c), the modulation frequencies of the AM stimuli are also relevant.

A training example for a network consists of two parts. One part is the input pattern to the network, and the other part is the corresponding output pattern the network is expected to produce upon being presented with the input pattern.
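A sketch of how one such input-target pair could be assembled for the ITD network (Python; the periphery() and encode() stand-ins below are simplified placeholders for the Carney front end of §4.1 and the Gaussian coding of Eq. 3.9, and the parameter values are illustrative rather than the thesis's exact sampling grid):

    import numpy as np

    rng = np.random.default_rng(3)
    fs = 20000.0
    mu = -505.0 + 50.5 * np.arange(20)

    def encode(itd_us, sigma=67.0):
        # Target pattern per Eq. 3.9 (constant factor dropped).
        return np.exp(-(itd_us - mu) ** 2 / (2 * sigma ** 2))

    def periphery(x):
        # Stand-in for the front end of Section 4.1: here, simple half-wave
        # rectification copied to nine "fibers".
        return np.maximum(x, 0.0)[None, :].repeat(9, axis=0)

    def make_pair(freq, itd_us, snr_db, dur=0.005):
        # One training example: noisy binaural tone in, target ITD code out.
        t = np.arange(0.0, dur, 1.0 / fs)
        left = np.sin(2 * np.pi * freq * t)
        right = np.sin(2 * np.pi * freq * (t - itd_us * 1e-6))  # delayed copy
        noise_rms = 10 ** (-snr_db / 20.0) / np.sqrt(2.0)  # tone RMS = 1/sqrt 2
        left = left + rng.normal(0.0, noise_rms, t.shape)
        right = right + rng.normal(0.0, noise_rms, t.shape)
        x = np.concatenate([periphery(left).ravel(), periphery(right).ravel()])
        return x, encode(itd_us)

    x, target = make_pair(freq=900.0, itd_us=250.0, snr_db=20.0)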
As discussed in Subsection 3.4.2, the output patterns of the networks in our model shown in Fig. 3.2 encode the corresponding localization cues. Such encoding is defined by the tuning curves of the output layer neurons in response to different cue values. These tuning curves have Gaussian shapes whose maximum-response points vary in an orderly fashion from one end of the output layer to the other end, covering a specific cue value range. The ratio between the standard deviation of the Gaussian curves and the cue value range covered by the output layer neurons is 1/15. By choosing such a ratio, the activities of the output layer neurons show a Gaussian-shaped pattern that occupies one-fifth of the entire output layer, and the position of this pattern shifts along the output layer depending on the encoded cue value. For an ITD network, the cue value range is determined by the maximum characteristic frequency of the auditory nerve fibers at the input to the network. For IID networks, the encoded cue value range is from -12 dB to 12 dB. This value (12 dB) of IID corresponds to a sound image completely lateralized to one ear, as reviewed in Chapter 1. For an IED network, the cue value range is determined by the upper limit of the modulation frequency discussed in Subsection 3.4.1.

4.2.2 Specifications of Three Networks to be Tested

As reviewed in Chapter 1, the ITD sensitivity of the auditory system is limited to frequencies below 1200 Hz (Yost and Hafter, 1987), whereas the IED sensitivity is only observed for high-frequency stimuli (Bekesy, 1960; Henning, 1980; Yin et al., 1984). As for the IID sensitivity, although it is observed over the entire frequency range (see Fig. 1.9 on Page 16), free-field stimuli can only induce significant IIDs in the high frequency range (see Fig. 1.5). Thus, different location cues are estimated not only in separate pathways in the auditory system, as discussed in Section 3.2, but also in different frequency regions. Simulation of the ITD networks will therefore be limited to the lower frequency region (below 1200 Hz), whereas that of the IID and IED networks will be limited to the higher frequency region.

From the model of the peripheral auditory processing, it follows (see Eq. 4.1) that if we normalize the frequency tuning curves that correspond to different points along the basilar membrane according to their corresponding characteristic frequencies, they have similar shapes. This suggests that, within the same frequency region in which different networks of the same type are constructed, the input patterns to these networks have similar characteristics. As these networks rely on such input patterns to abstract the interaural differences, they are expected to have similar behaviors. Thus, only one network of each of the three types (ITD, IID, and IED; see Fig. 3.2) will be studied, and the choice of the frequency ranges covered by these networks is somewhat arbitrary.

For the chosen ITD network (to be presented in Chapter 5), nine fibers from each of the two ears are used at the input of the network. The characteristic frequencies of the fibers cover the frequency range of 800-1000 Hz. This range is adjacent to the upper limit of the frequency region of ITD sensitivity. Thirteen consecutive samples (at a sampling interval of 200 μs) of each signal from these fibers are sent to the network. The ITD range covered by the output layer neurons is from -505 μs to 505 μs. There are 20 neurons in the output layer, which respond maximally at ITD values of μ_i = -505 + 50.5(i - 1) μs, i = 1, ..., 20, respectively. These μ_i's are uniform samples over the range from -505 μs to 505 μs.
5(i — 1) J.ts, i = l,• •, 20, respectively. These ‘s are uniform samplesover the range from -505 jis to 505 I.Is.For the chosen lID network (to be presented in Chapter 6), another set of nineauditory nerve fibers from each ear is used at the input of the network. The model ofthe auditory periphery used in our simulation is derived largely from the observedbehavior of the auditory periphery at low frequencies (below 5000 Hz), as mostauditory periphery models are (Carney, 1993). Thus, we have chosen an intermediatefrequency range within the region of 1200-5000 Hz for the simulation of the lIDnetwork. Specifically, the characteristic frequencies of the lID network cover therange of 2500-3500 Hz. The activities of these auditory nerve fibers, which reflectthe timing structure of the envelope of the stimulus signals, are sampled at 500 jisintervals. Thirteen consecutive samples from each fiber are sent to the network.There are also twenty neurons in the output layer of the network which respondmaximally at 11D values of u1 = —12 + 1.26(i —1) dB, i = 1,.. .,20, respectively. Thesep, ‘s are uniform samples over the range of -12 dB to 12 dB.The same fibers as those in the lID network described above are used at the inputto the trained lED network to be presented in Chapter 7. However, the TED networkis trained differently: rather than measuring the activation level differences betweenthe two ears (as is the lID network trained to do), the TED network is trained tomeasure the temporal delay between the envelopes of the stimulus signals at the twoears. The effective modulation frequency (Subsection 3.4.1), at which the activationpatterns of the relevant auditory nerve fibers reflect the temporal structure of themodulation, ranges from 154 to 305 Hz. The same number of output layer neurons asChapter 4 Simulation Methods 95that used in the lID network are used in the lED network, but they are trained toencode a range of TEDs rather than lIDs. The TED range encoded by the TED networkis from -1.64 to 1.64 ms, which is determined by the effective modulation frequencyrange. The i-th neuron in the output layer responds maximally at an TED value of—1.64 + 0.1 64(i — 1) ms. These /i ‘s are uniform samples over the range of -1.64to 1.64ms.4.2.3 The Training ProcessThe training example set for a network consists of a series of input-output pairs thatthe network is to be trained to associate. The training process consists of a series ofiterations. In each iteration, the input patterns of the training examples are presentedto the network, and the actual outputs of the network are compared to thecorresponding output patterns in the examples which are the desired outputs of thenetwork. The weights of the network are adjusted to minimize the mean squarederror between the actual and the desired outputs. The training process is terminatedwhen a local minimum is reached. The process is repeated several times usingdifferent random initial weights of the network. The final weights that correspond tothe smallest local minima are taken to be the final weights of the network.The minimization problem in the training process is solved using the conjugate-gradient method well known in optimization theory (Hestenes, 1980). This method issuitable for effective handling of large-scale problems with hundreds of variables(Fletcher, 1975; Powell, 1977; Hestenes, 1980), which is the case in our simulation.A typical minimization process consists of a series of iterations of local search in theweight space. 
The minimization problem in the training process is solved using the conjugate-gradient method, well known in optimization theory (Hestenes, 1980). This method is suitable for effectively handling large-scale problems with hundreds of variables (Fletcher, 1975; Powell, 1977; Hestenes, 1980), which is the case in our simulation. A typical minimization process consists of a series of iterations of local search in the weight space. In each iteration, the search starts with the weights resulting from the last iteration, and a search direction is chosen. The error function that is to be minimized is then evaluated in the neighborhood of the starting point along the chosen direction. The point along the direction that gives rise to the smallest error is chosen to be the search result of the iteration. This process is called line search. In the conjugate-gradient method, the search directions of successive iterations are chosen to be conjugate to each other (Hestenes, 1980). It is well known in optimization theory that if the error function is quadratic with respect to the weights, the exact minimum of the error function can be found in a limited number of iterations if the search directions are conjugate to each other (Fletcher, 1975; Powell, 1977; Hestenes, 1980). As any error function can be approximated by a quadratic function in the vicinity of a starting point, conjugate direction search will lead to a point in the weight space where the error function has the smallest value in the vicinity of the starting point. When such a point is reached, the search is repeated with another set of conjugate directions in the vicinity of the new starting point. The search ends when a local minimum of the error function is reached.

The error functions of the networks in our model are far more complex than a simple quadratic function due to the non-linearity (see Eq. 3.8) of the transformation from a neuron's inputs to its output. It is difficult to predict how many iterations are needed before a local minimum is reached. Furthermore, as a line search is conducted in each iteration to find the minimum point along the search direction of the iteration, it is also difficult to estimate how long each iteration will take. Our experience with the training of the networks presented in this thesis is that the search for a local minimum needs a number of iterations on the order of 1000, and takes about two weeks on the Sun workstations (SparcStation 2) available.

Several factors contribute to the lengthy training of the networks in our model. The first factor is the size of the networks. There are 234 neurons in the input layer, and 20 in each of the hidden and output layers of a network. A network thus contains 4700 connection weights between the input and the hidden layers, and 420 weights between the hidden and output layers. The second factor is the size of the training example sets. The training examples are saved in computer files in ASCII format; such an example file typically contains about 20,000,000 bytes of data. It takes the network simulator several hours to read all these data into computer memory before the actual training process starts. The limited computer resources available for our research make it necessary to terminate a training process before it is finished, and restart the process from the termination point at a later time. This considerably increases the overall time required for the training, because the entire training example set needs to be re-loaded into computer memory before the training process can be restarted. The third factor concerns the demand the network simulator places on computer memory. The memory requirement of our simulation exceeds the available memory in a workstation, so the computer must constantly swap data between its memory and hard disk, slowing down the computer operations in the training process.
Because of the enormous computational resource requirements, our simulations are limited to one narrow band of frequencies in the audible spectrum for each of the three networks. Thus, the problem of across-frequency integration (Stern et al., 1988) is not addressed in this work.

4.2.4 Short-Time Cue Estimates

A trained network produces short-time cue estimates upon being presented with stimulus signals. As discussed in Subsection 3.4.2, such short-time cue estimates are encoded in the activation patterns of the output layer neurons (see Fig. 3.10). To decode these activation patterns and obtain the corresponding short-time cue estimates, we fit Gaussian curves to the patterns. The means of the fitted Gaussian functions give the short-time cue estimates.

§4.3 Evaluation Methods of Trained Networks

After a network is trained, some important issues arise concerning the performance of the trained network when it carries out its tasks. Since the network is only trained on a limited number of training examples of the simplest type, one such issue is its generalization ability. Another issue concerns the robustness of the network in noisy environments. Finally, how well the network works in multi-source situations needs to be tested.

4.3.1 Parametric Description of Cue Estimates

As discussed in Subsection 4.2.4, the trained networks produce short-time cue estimates. The actual cue estimate for a relatively long listening period is an integration of these short-time cue estimates, generated through an integration process that follows the network processing stage, as shown in Fig. 3.2. The result of such an integration process is a cue distribution function, an example of which is shown in Fig. 4.3. Roughly speaking, peaks in such distribution functions indicate the estimated cue values. More specifically, however, certain characteristics of the "hump" surrounding a peak should be considered when evaluating the cue estimates. The relevant parameters are the position and width of the hump. The position of a hump is defined as the centroid of the area under the hump. The width of the hump is defined as the difference between the upper and lower 6 dB cut-off positions. For the example shown in Fig. 4.3, the width is defined to be

d = Δt_upper - Δt_lower    (4.8)

where Δt_upper is the ITD position above which the height of the distribution function in the vicinity of the peak is at least 6 dB lower than the height of the peak, and Δt_lower is the ITD position below which the height of the distribution function in the vicinity of the peak is at least 6 dB lower than the height of the peak.

The two parameters (position and width), as defined above, are measures of how well the cue estimates reflect the true cue values: the position gives an estimate of the true cue value, and the width gives an indication of the resolution of the estimate. Thus, the performance of the trained networks will be evaluated in terms of these two parameters.

Fig. 4.3 Parameters associated with an ITD distribution peak.
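These two parameters can be read off a sampled cue distribution directly, as in the following Python sketch (an illustration only; treating the 6 dB criterion as an amplitude ratio of 10^(6/20) ≈ 2 is an assumption):

```python
import numpy as np

def hump_position_and_width(cue_axis, dist, peak_idx):
    """Centroid position and 6 dB width of the hump around dist[peak_idx].

    cue_axis gives the cue value (e.g. ITD in μs) at each bin of the
    sampled cue distribution function dist.
    """
    cutoff = dist[peak_idx] / 10 ** (6 / 20)    # 6 dB below the peak height
    lo = peak_idx
    while lo > 0 and dist[lo - 1] > cutoff:     # walk down to the lower cut-off
        lo -= 1
    hi = peak_idx
    while hi < len(dist) - 1 and dist[hi + 1] > cutoff:   # and up to the upper one
        hi += 1
    hump, axis = dist[lo:hi + 1], cue_axis[lo:hi + 1]
    position = np.sum(axis * hump) / np.sum(hump)   # centroid of the hump area
    width = axis[-1] - axis[0]                      # d = Δt_upper − Δt_lower (Eq. 4.8)
    return position, width
```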
4.3.2 Statistical Evaluation of the Trained Networks

Since white noise is added to the stimuli, the performance of the networks is stochastic in nature, i.e., there will be variation in the resulting cue estimates for stimuli with similar characteristics. Also, since the networks are only trained on a limited number of training examples, there will be variation in the cue estimates for different stimuli that have the same cue value. Furthermore, there is an unlimited number of possible stimuli, and we cannot test all of them. Thus, the performance of the trained networks should be evaluated from a statistical point of view.

Specifically, in a typical test presented in the following chapters, a random sample of 120 stimuli is first selected from the entire assembly of all possible stimuli that are relevant to the test and have the same specific cue value. The network under examination is then tested with all 120 stimuli in the random sample, and the sample mean (X) and standard deviation (S) of the cue estimates resulting from the test stimuli are calculated. According to the central limit theorem of statistics (Romano, 1977), this sample mean (X) has an approximately normal distribution. The 95% confidence interval for the true mean of the cue estimates is between about X - 0.18S and X + 0.18S, where X and S are the sample mean and standard deviation, respectively.
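The 0.18S factor follows from the normal approximation, as the following Python check illustrates (the estimates here are placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
estimates = rng.normal(250.0, 10.0, size=120)   # placeholder cue estimates (μs)

n = len(estimates)
z = stats.norm.ppf(0.975)           # ≈ 1.96 for a 95% confidence level
print(round(z / np.sqrt(n), 3))     # ≈ 0.179, i.e. the 0.18 factor quoted above

xbar, s = estimates.mean(), estimates.std(ddof=1)
ci = (xbar - z * s / np.sqrt(n), xbar + z * s / np.sqrt(n))
```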
4.3.3 Generalization Ability of the Networks

As mentioned in Subsection 4.2.1, only a limited number of stimuli of the simplest type are used as training examples. An important evaluation of the trained networks is, then, to test how well the networks work with stimuli not seen in the training example sets. A first test for this purpose is to examine the performance of the networks for stimuli that are of the same type as those used in training, but have different stimulus parameters. Tests with other representative types of sound stimuli are also necessary. For the ITD and IID networks, a second test is to examine the networks with stimuli that contain two frequency components. The purpose of this test is to assess the ability of the networks to "generalize", and how well they perform with stimuli that have discrete frequency spectra. For the IED network, sinusoidal signals modulated with two-tone-complex waveforms are used for the corresponding test. A third test for the ITD and IID networks is to examine the networks with stimuli that have continuous frequency spectra. Band-pass filtered noise (pink noise) samples are used in this test. Since a broad-band stimulus is subject to band-pass filtering by the cochlea, the test results with pink noise stimuli give us some concrete idea of how well the networks work for stimuli more general than pink noise.

4.3.4 Robustness Test of the Networks

Robustness of the cue estimates generated by the trained networks is an important aspect of performance, because one of our goals is to develop models that can localize sound sources in noisy situations. Thus, in all the tests discussed in the last subsection, both low and high signal-to-noise ratios (SNRs) are used in order to obtain an idea of how the performance of the networks changes from high SNRs to low SNRs.

4.3.5 Testing in Multi-Source Situations

In the tests described in the last two subsections, we are concerned with how accurate and robust the cue estimation made by the trained networks is; thus, only one target sound source is used in those tests. As discussed in Chapter 2, an important task for our model is to localize multiple sound sources. In this subsection, we discuss the testing methods used to assess the ability of the networks to localize multiple sound sources.

A. A Model for Multi-Source Environments

It is an important characteristic of natural acoustic environments that different sources appear at different times and have time-varying acoustic powers. The DLI modeling scheme (see Fig. 3.1) is proposed to take advantage of these characteristics in order to resolve multiple sound sources that have similar frequency contents.

To emphasize the above temporal characteristics of an acoustic environment, we model the environment as having n independent sound sources and add white noise to the signals reaching the two ears. The target sources may have the same or similar frequency contents, but have time-varying intensities. In mathematical terms, the two signals detected by the two ears have the following general form:

s_L(t) = S_1(t)u_1(t) + S_2(t)u_2(t) + ... + S_n(t)u_n(t) + w_L(t)    (4.9)

s_R(t) = S_1(t - τ_1)B_1u_1(t - τ_1) + S_2(t - τ_2)B_2u_2(t - τ_2) + ... + S_n(t - τ_n)B_nu_n(t - τ_n) + w_R(t)    (4.10)

where S_i(t), i = 1, ..., n, describe the time-varying modulation of the n sources u_i(t), i = 1, ..., n; B_i, i = 1, ..., n, reflect the IIDs; and τ_i, i = 1, ..., n, are the ITDs. w_L(t) and w_R(t) are independent white noise samples.

B. Orthogonal On-Off Modulations of Different Sound Sources

To test the trained networks for the separation of the directions of multiple sound sources, we first consider an idealized situation where there are two target sound sources from two different directions. The intensities of the two sounds are on-off modulated, with their modulation functions being square waves that are orthogonal to each other. In this case, the sound signals detected by the two ears can be described as follows:

s_L(t) = S_1(t)u_1(t) + S_2(t)u_2(t) + w_L(t)    (4.11)

s_R(t) = S_1(t - τ_1)B_1u_1(t - τ_1) + S_2(t - τ_2)B_2u_2(t - τ_2) + w_R(t)    (4.12)

where u_1(t) and u_2(t) are the sound waveforms of the two target sources, τ_1 and τ_2 are two time delays, B_1 and B_2 are two interaural amplitude ratios corresponding to the two sources, and S_1(t) and S_2(t) are two orthogonal square-wave signals that satisfy the following equation:

S_2(t) = S_1(t - p/2)    (4.13)

where p is the period of the square-wave signals. Fig. 4.4 shows examples of the two square-wave signals in Eq. 4.13.

Fig. 4.4 Examples of the two square-wave signals, S_1(t) and S_2(t), which are orthogonal to each other.

In the actual simulation, we drop the time-delay terms in the on-off modulation signals. By doing so, the only cue that makes the separation of the two sources possible is the differential on-off modulation of the two sources. The two signals at the two ears then become:

s_L(t) = S_1(t)u_1(t) + S_2(t)u_2(t) + w_L(t)    (4.14)

s_R(t) = S_1(t)B_1u_1(t - τ_1) + S_2(t)B_2u_2(t - τ_2) + w_R(t)    (4.15)

In one extreme case of Eq. 4.14 and Eq. 4.15, if the rate of the on-off modulation is so high that the two signals from the different directions are mixed in most of the short-time windows in which the short-time cues are estimated, the two sources are not separable by the networks. In the other extreme case, if the rate of the modulation is so slow that only one source is turned on during the entire listening period, the situation degenerates to a single-target-source case. Thus, an important parameter that will affect the results of the networks is the on-off period of the modulation functions.
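A minimal Python sketch of these test signals (an illustration of Eqs. 4.13-4.15, not the thesis's simulator; the sampling rate, tone frequency, and noise level are assumed values):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 20_000                           # assumed sampling rate (Hz)
t = np.arange(0, 0.5, 1 / fs)
p = 0.025                             # on-off modulation period (s)
f = 900.0                             # common frequency of the two tonal sources

# Orthogonal square-wave modulators satisfying S2(t) = S1(t - p/2) (Eq. 4.13).
S1 = ((t % p) < p / 2).astype(float)
S2 = 1.0 - S1

tau1, tau2 = -250e-6, 250e-6          # ITDs of the two sources (s)
B1, B2 = 1.0, 1.0                     # interaural amplitude ratios (0 dB IIDs)
sigma = 0.05                          # noise level; would be set from the target SNR

def u(tau):
    """A copy of the source tone delayed by tau seconds."""
    return np.sin(2 * np.pi * f * (t - tau))

# Eqs. 4.14 and 4.15: the delays apply to the carriers, not to the modulators.
sL = S1 * u(0.0) + S2 * u(0.0) + sigma * rng.standard_normal(len(t))
sR = S1 * B1 * u(tau1) + S2 * B2 * u(tau2) + sigma * rng.standard_normal(len(t))
```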
C. Resolution Power for the Separation of Two Sound Sources

Another important aspect of the performance of the networks is their resolution power in separating the directions of two closely placed targets. Although the width parameter corresponding to a cue distribution peak gives us a measure of the resolution for single-target cases, it is not clear how close two target sources may be placed before the distribution peaks that correspond to the two targets merge into one. This is tested in situations where two orthogonally on-off modulated tonal sources are separated by different spatial distances.

D. Uncorrelated On-Off Modulations of Different Sound Sources

In the test described in B earlier in this subsection, the two target sources are completely temporally segregated by means of orthogonal on-off modulations. However, in more realistic situations, different sound sources are not correlated with each other. To model a more general situation, the networks are tested with two independently on-off modulated target sources. More specifically, the on-off modulation intervals of the two sources have random lengths, and the random on-off modulations of the two targets are not correlated with each other. In this case, in any small interval of the listening period, there are four possible combinations of the two target signals: (i) target A is on, but target B is off; (ii) target A is off, but target B is on; (iii) both targets are on; and finally (iv) both targets are off. Thus, in effect, the two targets are only partly segregated. This test is designed to see if the networks are still able to pick up the two spatially separated targets in the first two (i and ii) combinations.
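Such uncorrelated modulators can be generated as in the sketch below (a Python illustration; the sampling rate and the uniform distribution of interval lengths over [0, 35] ms follow the tests reported later):

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 20_000                     # assumed sampling rate (Hz)
n = int(fs * 1.0)               # one second of signal

def random_on_off(max_interval=0.035):
    """On-off modulator whose on and off intervals have independent random
    lengths drawn uniformly from [0, max_interval] seconds."""
    m = np.zeros(n)
    i, on = 0, True
    while i < n:
        length = max(1, int(rng.uniform(0.0, max_interval) * fs))
        if on:
            m[i:i + length] = 1.0
        i += length
        on = not on
    return m

SA, SB = random_on_off(), random_on_off()   # uncorrelated modulators for A and B
# At any instant one of the four combinations holds: A on/B off, A off/B on,
# both on, or both off; only the first two fully segregate the targets.
```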
4.3.6 Summary Table of Tests for the Evaluation of the Networks

Table 4.1 summarizes the planned tests described in this section.

Table 4.1 Networks (ITD, IID, IED) to be tested for the listed combinations of stimulus type and varied parameter.

  Stimulus type                        Low SNR (0 or 5 dB)   High SNR (20 dB)   Frequency difference   Center frequency   Bandwidth   Spatial separation
  Pure tone                            ITD, IID              ITD, IID
  Two-tone complex                     ITD, IID              ITD, IID           ITD, IID
  Pink noise                           ITD, IID              ITD, IID                                  ITD, IID           ITD, IID
  AM stimuli                           IED                   IED
  Two-tone complex as modulation       IED                   IED                IED
  Orthogonal two-source case           ITD, IID, IED         ITD, IID, IED                                                            ITD, IID, IED
  Random-modulation two-source case                          ITD, IID, IED

5 Test Results of ITD Estimation

As mentioned in Subsection 4.2.1, three networks (ITD, IID, and IED), each in one of the three implementations (Fig. 3.2) of the DLI model shown in Fig. 3.1, were trained and evaluated following the simulation methods described in the last chapter. In this chapter, we present the test results of the trained ITD network. Results for the trained IID and IED networks will be presented in Chapters 6 and 7, respectively.

§5.1 Pure-Tone Stimuli

Following the evaluation strategy described in Section 4.3, the ITD network was first tested with pure-tone stimuli having fixed values for the ITD and signal-to-noise ratio (SNR). A random sample of 120 pure-tone stimuli having a certain ITD value was selected to test the estimation accuracy for that particular ITD value. Several parameters (frequency, intensity, phase, and IID) are particularly relevant to the random sampling of stimuli. The intensities of the test stimuli were randomly selected from the range of 30-60 dB SPL, which covers a moderate intensity range; for lower intensity values, our model generated ITD distribution functions that did not show clear peaks, indicating that the model was not stimulated sufficiently to report the locations of the sound sources, if any. The IIDs of the test stimuli were randomly selected from the range of -9 to 9 dB. The phases of the pure tones were randomly selected from the range of 0 to 2π. The frequencies were randomly selected from the range of 800-1000 Hz, which covers the characteristic frequencies of the auditory nerve fibers at the input of the ITD network. Since the neural signals of the auditory nerve fibers are modeled as outputs of a process that involves narrow band-pass filters, a sound stimulus with its frequency outside the range adjacent to the characteristic frequencies of the fibers will be so attenuated that the ITD network will not be stimulated sufficiently to give valid ITD estimates. As discussed in Subsection 4.2.2, the range of ITD values encoded by the output layer neurons of the ITD network is from -505 μs to 505 μs. We used a 250 μs ITD in the test, which is an intermediate value between 0 μs and the upper limit (505 μs) of the encoded ITD range. White noise was added to all test stimuli, resulting in a 20 dB SNR.

ITD estimates (in terms of the position and width defined in Subsection 4.3.1) were obtained for the above-described random sample of stimuli, and the mean and standard deviation of these estimates were calculated, as shown in Table 5.1. It is clear that accurate ITD estimates were obtained.

Table 5.1 ITD estimates (mean ± standard deviation) and the error of the mean position estimate for pure-tone stimuli with a 250 μs ITD and a 20 dB SNR.

  Position: p (μs)   Width: d (μs)   Position Error (μs)
  250.6 ± 10.4       42.9 ± 16.7     0.6

The robustness of the ITD estimates was tested by repeating the test shown in Table 5.1 using a 0 dB SNR. All other aspects of the test remained unchanged. Table 5.2 shows the results.

Table 5.2 ITD estimates (mean ± standard deviation) and the error of the mean position estimate for pure-tone stimuli with a 250 μs ITD and a 0 dB SNR.

  Position: p (μs)   Width: d (μs)   Position Error (μs)
  250.4 ± 10.6       86.9 ± 35.3     0.4

Data in Table 5.2 are similar to those in Table 5.1 except for a decrease in the resolution of the estimates (a width of roughly 50 μs versus roughly 100 μs). This indicates that the ITD estimates are robust in the presence of noise for pure-tone stimuli.

The test shown in Table 5.1 was repeated for a small range of ITD values around 0 μs. The results are tabulated in Table 5.3, where the first column shows the tested ITD values, and each row shows the characteristics of the corresponding ITD estimates. The numbers after "±" are standard deviations of the corresponding estimates.

Table 5.3 ITD estimates and the errors of the mean position estimates for pure-tone stimuli with different ITDs and a 20 dB SNR.

  ITD: Δt (μs)   Position: p (μs)   Width: d (μs)   Position Error (μs)
  -300           -304.0 ± 35.1      60.0 ± 23.6     -4.0
  -200           -194.2 ± 24.6      55.3 ± 20.0     5.8
  -100           -91.6 ± 10.5       53.5 ± 19.5     8.4
  0              12.9 ± 19.1        41.3 ± 20.4     12.9
  100            106.8 ± 17.3       51.1 ± 20.9     6.8
  200            206.0 ± 11.1       48.4 ± 18.3     6.0
  300            294.1 ± 8.7        46.9 ± 17.7     -5.9

The estimated ITDs (in the second column of Table 5.3) are plotted as a function of the true ITDs (in the first column of Table 5.3), as shown in Fig. 5.1.
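For concreteness, one draw from the random sample described at the start of this section could be generated as follows (a Python sketch; the sampling rate and the sign conventions for ITD and IID are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 20_000
t = np.arange(0, 0.2, 1 / fs)

f     = rng.uniform(800, 1000)       # frequency (Hz)
phase = rng.uniform(0, 2 * np.pi)    # phase (rad)
iid   = rng.uniform(-9, 9)           # IID (dB)
itd   = 250e-6                       # the fixed 250 μs ITD under test (s)
snr   = 20.0                         # target SNR (dB)

sL = np.sin(2 * np.pi * f * t + phase)
sR = 10 ** (iid / 20) * np.sin(2 * np.pi * f * (t - itd) + phase)

# Add independent white noise scaled to give the target SNR at each ear.
for s in (sL, sR):
    noise = rng.standard_normal(len(t))
    noise *= np.sqrt(np.mean(s ** 2) / np.mean(noise ** 2)) * 10 ** (-snr / 20)
    s += noise
```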
These results clearly show that the trained ITD network is able to produce accurate ITD estimates with high resolution.

Fig. 5.1 ITD estimates shown as a function of the true ITDs.

§5.2 Two-Tone Complex Stimuli

Next, the network was tested with stimuli that have two frequency components. Again, a random sample of stimuli was used. An important parameter of this type of stimulus is the frequency spacing of the two components. Thus, we first conducted a test using a random sample of stimuli that have a fixed frequency spacing. Specifically, one frequency component was randomly selected from the range of 700-1100 Hz, while the other was chosen to have a 100 Hz difference from the first component. The intensities of the stimuli in the sample were randomly selected from the range of 30-60 dB SPL. The IIDs of the stimuli were randomly selected from the range of -9 to 9 dB. White noise was added to the stimuli, resulting in a 20 dB SNR. The ITD of the stimuli was 250 μs. Table 5.4 shows the test results.

Table 5.4 ITD estimates (mean ± standard deviation) and the error of the mean position estimate for two-tone complex stimuli with a 250 μs ITD and a 20 dB SNR.

  Position: p (μs)   Width: d (μs)   Position Error (μs)
  246.3 ± 17.3       45.2 ± 15.7     -3.7

The robustness of the ITD estimation for two-tone complex stimuli was tested by repeating the test shown in Table 5.4 with a 0 dB SNR. The results are shown in Table 5.5.

Table 5.5 ITD estimates (mean ± standard deviation) and the error of the mean position estimate for two-tone complex stimuli with a 250 μs ITD and a 0 dB SNR.

  Position: p (μs)   Width: d (μs)   Position Error (μs)
  239.4 ± 19.6       93.2 ± 41.1     -10.6

Data shown in Tables 5.4 and 5.5 are similar to the corresponding data (shown in Tables 5.1 and 5.2) for the pure-tone case presented in the last section. This demonstrates the ability of the ITD network to "generalize" and perform well with two-tone complex stimuli. Furthermore, although there is a two-fold decrease in the resolution of the ITD estimates when the SNR is reduced from 20 dB (Table 5.4) to 0 dB (Table 5.5), the otherwise comparable data between Table 5.4 and Table 5.5 indicate that the ITD estimates are robust in the presence of noise for two-tone complexes.

The effect of the frequency spacing of the two components on the performance of the ITD network was studied by repeating the test shown in Table 5.4 for different values of the frequency difference between the two components. The results are shown in Table 5.6, where the first column shows the tested frequency differences, and the subsequent columns show the corresponding ITD estimates. The true ITD in the test stimuli was 250 μs.
Table 5.6 ITD estimates and the errors of the mean position estimates for two-tone complex stimuli with a 250 μs ITD and a 20 dB SNR. FD: frequency difference between the two components in the stimuli.

  FD: Δf (Hz)   Position: p (μs)   Width: d (μs)   Position Error (μs)
  25            247.2 ± 21.9       47.8 ± 24.4     -2.8
  50            246.3 ± 20.6       45.7 ± 19.6     -3.7
  100           246.3 ± 17.3       45.2 ± 15.7     -3.7
  150           244.5 ± 15.2       47.0 ± 16.5     -5.5
  200           238.6 ± 21.0       51.7 ± 22.9     -11.4
  250           234.6 ± 19.4       57.8 ± 28.4     -15.4

Data in Table 5.6 indicate that the frequency spacing between the two components in the stimuli has little effect on the estimation results.

Analysis of the Input Patterns of the ITD Network for Two-Tone Stimuli

We have seen that although the ITD network was trained using pure-tone stimuli only, it also works with two-tone complex stimuli. Why is this the case? As the ITD is measured by comparing input patterns to the network, these patterns may hold the key to the answer. With a single-tone stimulus, the activities of the auditory nerve fibers form line patterns (see Fig. 3.3) which provide references for the estimation of the ITDs. What would the input patterns look like if the stimulus contains two sinusoidal components?

Consider a stimulus of the following form:

y(t) = A sin(ω_1 t + φ_1) + B sin(ω_2 t + φ_2)    (5.1)

Using the sum-to-product identities, it can be rewritten as

y(t) = (A + B) cos(ω_m t + φ_m) sin(ω_c t + φ_c) + (A - B) sin(ω_m t + φ_m) cos(ω_c t + φ_c)    (5.2)

where ω_c = (ω_1 + ω_2)/2 and φ_c = (φ_1 + φ_2)/2 are the carrier frequency and phase, and ω_m = (ω_1 - ω_2)/2 and φ_m = (φ_1 - φ_2)/2 are the modulating frequency and phase. From Eq. 5.2, it is clear that the stimulus is a linear combination of two amplitude-modulated tones which have the same carrier frequency (ω_1 + ω_2)/2 and the same modulating frequency (ω_1 - ω_2)/2. In addition, the carrier tones are π/2 out of phase.

The stimulus can be analyzed in three cases: (i) the two frequencies ω_1 and ω_2 in Eq. 5.1 are very close to each other; (ii) the two frequencies are very far from each other; and (iii) the two frequencies have an intermediate distance in the frequency domain. In the first case, as ω_1 and ω_2 are very close, the modulating frequency in Eq. 5.2 is very small. Therefore, the amplitude-modulated tones in Eq. 5.2 can be treated as two pure tones of the same frequency in a short time window. Thus, over short time windows, Eq. 5.2 reduces to

y(t) ≈ C sin(ωt + ψ)    (5.3)

a single-tone stimulus, where ω ≈ ω_1 ≈ ω_2, and the amplitude C and phase ψ are fixed by A, B, φ_1, and φ_2 over the window.

In the second case, where ω_1 and ω_2 are very far from each other, the band-pass filtering effect of the basilar membrane will filter out the frequency component that is further away from the center frequency of the range covered by the ITD network. Thus, this case also reduces to the single-tone stimulus case.

The last case, where ω_1 and ω_2 have an intermediate distance, is somewhat more complicated than the first two. In this case, both tones in Eq. 5.2 are amplitude-modulated. The highest modulation rate occurs when ω_1 and ω_2 have the largest distance at which neither of them is filtered out by the model auditory nerve fibers that activate the ITD network. However, because the ITDs are estimated in very short time windows, as long as the modulation rate is slow in the sense that the amplitude of the carrier tones does not change much in these short time windows, the two-tone case can still be reduced to the single-tone case.
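The identity in Eq. 5.2 is easy to confirm numerically, as in this short Python check (the particular amplitudes, frequencies, and phases are arbitrary):

```python
import numpy as np

A, B = 0.7, 1.1
w1, w2 = 2 * np.pi * 950.0, 2 * np.pi * 850.0
p1, p2 = 0.3, 1.9
t = np.linspace(0.0, 0.02, 2000)

y = A * np.sin(w1 * t + p1) + B * np.sin(w2 * t + p2)        # Eq. 5.1

wc, wm = (w1 + w2) / 2, (w1 - w2) / 2                        # carrier and modulator
pc, pm = (p1 + p2) / 2, (p1 - p2) / 2
y2 = ((A + B) * np.cos(wm * t + pm) * np.sin(wc * t + pc)    # Eq. 5.2
      + (A - B) * np.sin(wm * t + pm) * np.cos(wc * t + pc))

assert np.allclose(y, y2)   # the two-tone stimulus is the sum of two AM tones
```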
In fact, the activation of the auditory nerve fibers for the worst cases of two-tone stimuli still shows straight patterns, as shown in Fig. 5.2.

Fig. 5.2 Example input activation patterns of the ITD network (inputs from the left and right ears, plotted against the auditory nerve fibers) for a two-tone complex stimulus whose two components have a 250 Hz difference.

§5.3 Pink Noise Stimuli

Following the evaluation strategy discussed in Section 4.3, the trained ITD network was also tested with pink noise stimuli. A pink noise is characterized by its bandwidth and center frequency. Two series of tests were conducted to assess the effects of these two parameters on the performance of the ITD network. In both series of tests, the bandwidths of the pink noise tested were no larger than 150 Hz. Tests with wider bandwidths are not necessary because (i) the characteristic frequencies of the input auditory nerve fibers to the ITD network cover a narrow band of frequencies, and (ii) the models of the auditory nerve fibers are band-pass filters. If the bandwidth of a stimulus is significantly wider than the frequency band covered by the ITD network, the energy outside the covered band will be filtered out by the group of auditory nerve fibers at the input to the ITD network.

In the first series of tests, the effect of the bandwidth was studied. More specifically, in each experiment in the series, a random sample of 120 pink noise stimuli was selected. The bandwidth of these stimuli was the same, but other stimulus parameters were randomly selected from uniform distributions. The ITD estimates for these stimuli were then obtained. The means and standard deviations of these ITD estimates (in terms of the corresponding position and width parameters) are tabulated in Table 5.7, where the first column shows the tested values of the stimulus bandwidth. In the results presented in Table 5.7, the center frequencies of the stimuli were randomly selected from the range of 800-1000 Hz. To make sure that the stimuli contained energy in the frequency range covered by the ITD network, we did not use center frequencies outside this range; otherwise, the ITD network would not be excited sufficiently by the stimuli and would not be able to tell which directions the stimuli were coming from. The intensities of the stimuli were randomly selected from the range of 30-60 dB. The IIDs were randomly selected from the range of -9 to 9 dB. All test stimuli had a 250 μs ITD, with white noise added, resulting in a 20 dB SNR.
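One simple way to produce such stimuli is to band-pass filter white noise, as in the Python sketch below (the thesis does not specify the filter used, so the fourth-order Butterworth filter here is an assumption):

```python
import numpy as np
from scipy.signal import butter, lfilter

rng = np.random.default_rng(3)
fs = 20_000                      # assumed sampling rate (Hz)
t = np.arange(0, 0.5, 1 / fs)

def bandpass_noise(center, bandwidth):
    """Band-pass filtered white noise with the given center frequency and
    bandwidth (both in Hz) -- the 'pink noise' stimulus of the text."""
    lo = (center - bandwidth / 2) / (fs / 2)
    hi = (center + bandwidth / 2) / (fs / 2)
    b, a = butter(4, [lo, hi], btype='band')
    return lfilter(b, a, rng.standard_normal(len(t)))

x = bandpass_noise(center=900.0, bandwidth=50.0)
```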
Data shown in Table 5.7 are similar to those obtained in the pure-tone case (Table 5.1), demonstrating the ability of the network to "generalize" and perform well with pink noise stimuli. Furthermore, the bandwidth of the pink noise stimuli shows no effect on the performance of the network.

Table 5.7 ITD estimates and the errors of the mean position estimates for pink noise stimuli with different bandwidths (BW). The SNR used in the tests was 20 dB, and the ITD of the stimuli was 250 μs.

  BW: w (Hz)   Position: p (μs)   Width: d (μs)   Position Error (μs)
  25           249.6 ± 11.8       51.4 ± 22.0     -0.4
  50           250.7 ± 10.2       50.7 ± 17.7     0.7
  100          251.0 ± 8.9        49.2 ± 15.3     1.0
  150          248.3 ± 9.0        48.7 ± 13.3     -1.7

The robustness of the ITD estimates for pink noise stimuli was tested by repeating the tests presented in Table 5.7 using stimuli that had a 50 Hz bandwidth and to which white noise was added so that the SNR was 0 dB. The results are shown in Table 5.8. Again, except for a two-fold decrease in resolution, the estimates shown in Table 5.8 are similar to those obtained in the higher (20 dB) SNR case. Thus, the ITD estimates for pink noise stimuli are also robust in the presence of white noise.

Table 5.8 ITD estimates and the error of the mean position estimate for pink noise stimuli with a 0 dB SNR and a 250 μs ITD. The bandwidth of the stimuli was 50 Hz.

  Position: p (μs)   Width: d (μs)   Position Error (μs)
  235.7 ± 21.5       84.8 ± 31.3     -14.3

Parallel to the first series of experiments presented in Table 5.7, in the second series of experiments we tested the effect of the center frequencies of pink noise stimuli on the performance of the ITD network. A different center frequency was tested in each experiment in the series, so that a function of performance versus stimulus center frequency was obtained. In each experiment in the series, a random sample of 120 stimuli that had the same center frequency was first selected, and the ITD estimates for these stimuli were obtained. The means and standard deviations of these ITD estimates were then calculated. In all of the test stimuli in the random samples used in this series of experiments, the bandwidths were randomly selected from the range of 25-150 Hz, the intensities from the range of 30-60 dB, and the IIDs from the range of -9 to 9 dB. White noise was added to the stimuli, resulting in a 20 dB SNR. The ITD of the stimuli was 250 μs. The test results are tabulated in Table 5.9, where the first column shows the different center frequencies tested in the series. These results show that the center frequency of a pink noise stimulus has little effect on the performance of the ITD network.

Table 5.9 ITD estimates and the errors of the mean position estimates for pink noise stimuli with different center frequencies (CF). The SNR used in the tests was 20 dB, and the ITD of the stimuli was 250 μs.

  CF: f (Hz)   Position: p (μs)   Width: d (μs)   Position Error (μs)
  800          239.9 ± 11.8       70.9 ± 28.2     -10.1
  850          249.4 ± 8.9        52.7 ± 13.0     -0.6
  900          252.2 ± 7.8        48.7 ± 11.5     2.2
  950          251.5 ± 7.0        44.4 ± 10.1     1.5
  1000         254.4 ± 8.6        43.3 ± 11.0     4.4
§5.4 Two Similar Sources from Different Directions

Following the testing strategy outlined in Subsection 4.3.5, the ITD network was also tested in cases where there were two sound sources. A first test was carried out using two pure-tone sources that were periodically on-off modulated with orthogonal modulation functions (Subsection 4.3.5 B). The two pure tones had the same 50 dB SPL intensity and the same 900 Hz frequency. The IID for both tone sources was 0 dB. The ITDs for the two sources were -250 μs and 250 μs, respectively. White noise was added to the signals from the two targets, resulting in a 20 dB SNR. Different on-off modulation periods were tested, as shown in the first column of Table 5.10. There are two peaks in each of the ITD distribution functions generated in the tests, as shown in Fig. 5.3, where the ITD distribution function that corresponds to a 25 ms on-off modulation period is plotted. The position and width parameters associated with the two peaks are listed in the second through fifth columns of Table 5.10. These data demonstrate that correct estimates of the two ITDs (-250 and 250 μs) associated with the two pure-tone sources were obtained. We found that the smallest on-off modulation period at which the ITD network could still separate the two sources was 15 ms.

Fig. 5.3 An example ITD distribution function for a two-source case with a 25 ms on-off modulation period; the two peaks lie at the ITDs of the two targets.

Table 5.10 ITD estimates and the position errors for two-source cases with different on-off modulation periods (MP). The SNR was 20 dB, and the ITDs of the two sources were -250 μs and 250 μs, respectively. For each peak: position (μs), width (μs), position error (μs).

  MP (ms)   First peak              Second peak
  15        -288.6, 101.0, -38.6    274.8, 50.5, 24.8
  20        -259.8, 67.3, -9.8      244.7, 33.7, -5.3
  25        -236.3, 50.5, 13.7      236.5, 33.7, -13.5
  30        -234.7, 84.2, 15.3      236.6, 50.5, -13.4
  35        -250.0, 67.3, 0.0       257.3, 33.7, 7.3

The robustness of the ITD network in separating two pure-tone sources was tested by repeating the test shown in Table 5.10 with a 0 dB SNR. Only the smallest (15 ms) on-off modulation period was used. The results, which are similar to those shown in Table 5.10, are shown in Table 5.11. The comparable results between the two tests shown in Tables 5.10 and 5.11 demonstrate the robust ability of the network to separate two similar sources originating from different directions.

Table 5.11 ITD estimates and the position errors for a two-source case with a 0 dB SNR. The ITDs of the two sources were -250 μs and 250 μs, respectively. For each peak: position (μs), width (μs), position error (μs).

  MP (ms)   First peak              Second peak
  15        -289.4, 117.8, -39.4    268.7, 67.3, 18.7

The resolution power of the network in separating the directions of two sources was tested by repeating the test in Table 5.10 for different spatial separations of the two pure-tone sources. A 15 ms on-off modulation period was used in the test. Table 5.12 shows the results, where the first column lists the spatial separations (in terms of the two ITD values) of the two pure-tone sources. The smallest resolvable ITD difference was found to be about 200 μs.

Table 5.12 ITD estimates and the position errors for two-source cases with different spatial separations (SS, in terms of the two ITDs). The SNR was 20 dB. For each peak: position (μs), width (μs), position error (μs).

  SS (μs)       First peak              Second peak
  (-100, 100)   -80.1, 84.2, 19.9       80.2, 84.2, -19.8
  (-125, 125)   -118.1, 50.5, 6.9       117.6, 50.5, -7.4
  (-188, 188)   -180.4, 50.5, 7.6       217.1, 33.7, 29.1
  (-250, 250)   -288.6, 101.0, -38.6    274.8, 50.5, 24.8

Finally, as described in Subsection 4.3.5 D, the network was tested in a case with two uncorrelated pure-tone targets. The two targets were also on-off modulated, but with on-off intervals of random lengths selected from the range of [0, 35] ms. The two tones had the same 50 dB SPL intensity and the same 900 Hz frequency. The IID of both sources was 0 dB. The ITDs of the two sources were -250 and 250 μs, respectively. White noise was added to the signals from the two targets, resulting in a 20 dB SNR. Table 5.13 shows the test results. The corresponding ITD distribution histogram is shown in Fig. 5.4. Two peaks in this distribution function are identifiable that correspond to the two targets.
This indicates that the network is able to pick up the individual targets even when they are not completely segregated. The estimated parameters (position and width) associated with the two peaks are comparable to those for the corresponding orthogonal two-target case presented earlier in this section (Table 5.10). The unequal sizes of the "humps" corresponding to the two peaks reflect the unequal "show-up" times of the two targets, which in turn result from the independent random on-off modulations of the two targets. The third (middle) peak in the distribution function shown in Fig. 5.4 corresponds to short-time ITD estimates obtained in time periods during which the two sources are both present.

Table 5.13 ITD estimates and the position errors for a two-source case with uncorrelated on-off modulations. The SNR was 20 dB. For each peak: position (μs), width (μs), position error (μs).

  True ITDs (μs)   First peak           Second peak
  (-250, 250)      -247.7, 67.3, 2.3    241.5, 33.7, -8.5

Fig. 5.4 The ITD distribution histogram for the uncorrelated two-source case. The parameters that describe the two target sources are shown in Table 5.13.

6 Test Results of IID Estimation

In this chapter, we present test results for the trained IID network that parallel those of the ITD network presented in the last chapter. The evaluation methods for the IID network, which are the same as those for the ITD network, are outlined in Section 4.3 and will not be repeated here. Only aspects that are unique to the IID network will be discussed.

§6.1 Pure-Tone Stimuli

The IID network was first tested with pure-tone stimuli. A random sample of 120 pure-tone stimuli was used in the test. Parameters relevant to the random sampling of the stimuli are the frequency, intensity, phase, and ITD. The intensities of the test stimuli were randomly selected from the range of 30-60 dB SPL. The ITDs were randomly selected from the range of -1.64 to 1.64 ms. The phases of the pure tones were randomly selected from the range of 0 to 2π. The frequencies were randomly selected from the range of 2500-3500 Hz, which covers the characteristic frequencies of the auditory nerve fibers at the input of the IID network. A 3 dB IID was used in the test. With 10-12 dB IIDs corresponding to sound images completely lateralized to one ear (see Figs. 1.7 and 1.8), a 3 dB IID corresponds to a sound image perceived at a position slightly to the side. White noise was added to all sample stimuli, resulting in a 20 dB SNR. The resulting IID estimates (in terms of the associated "position" and "width" parameters defined in Subsection 4.3.1) were obtained for the above-described random sample of stimuli. The mean and standard deviation of these estimates are shown in Table 6.1.

The robustness of the IID estimates was tested by repeating the test shown in Table 6.1 using a 5 dB SNR.
Table 6.2 shows the results. Despite the poorer resolution of the IID estimates than in the 20 dB SNR case (Table 6.1), the network gave IID estimates similar to those shown in Table 6.1, indicating that the IID estimates are robust in the presence of noise for pure-tone stimuli.

Table 6.1 IID estimates (mean ± standard deviation) and the error of the mean position estimate for pure-tone stimuli with a 3 dB IID and a 20 dB SNR.

  Position: p (dB)   Width: d (dB)   Position Error (dB)
  3.9 ± 0.87         0.8 ± 0.27      0.9

Table 6.2 IID estimates (mean ± standard deviation) and the error of the mean position estimate for pure-tone stimuli with a 3 dB IID and a 5 dB SNR.

  Position: p (dB)   Width: d (dB)   Position Error (dB)
  3.71 ± 0.92        3.56 ± 1.86     0.71

The test shown in Table 6.1 was repeated for nine IID values in the range of -12 to 12 dB. The results are tabulated in Table 6.3, where the first column shows the tested IID values.

Table 6.3 IID estimates and the errors of the mean position estimates for pure-tone stimuli with different IIDs and a 20 dB SNR.

  IID: ΔI (dB)   Position: p (dB)   Width: d (dB)   Position Error (dB)
  -12            -11 ± 1.6          0.51 ± 0.21     1
  -9             -9.7 ± 1.9         0.78 ± 0.42     -0.7
  -6             -6.8 ± 1.6         0.75 ± 0.44     -0.8
  -3             -3.6 ± 1.2         0.75 ± 0.33     -0.6
  0              0.26 ± 1.1         0.66 ± 0.22     0.26
  3              3.9 ± 0.87         0.8 ± 0.27      0.9
  6              7.5 ± 1.7          0.66 ± 0.22     1.5
  9              10 ± 0.83          0.65 ± 0.36     1
  12             11 ± 2.2           0.46 ± 0.25     -1

The estimated IIDs (in the second column of Table 6.3) are plotted as a function of the true IIDs (in the first column of Table 6.3), as shown in Fig. 6.1. This clearly shows that the IID network is able to produce accurate IID estimates.

Fig. 6.1 IID estimates shown as a function of the true IIDs.

§6.2 Two-Tone Complex Stimuli

The network was then tested with stimuli that have two frequency components. We first conducted a test using a random sample of stimuli that have a fixed frequency difference between the two components. Specifically, the two frequency components in a test stimulus were randomly selected from the range of 2000-4000 Hz, except that the difference between the two components was always 500 Hz. The intensities of the stimuli were randomly selected from the range of 30-60 dB SPL. The ITDs were randomly selected from the range of -1.64 to 1.64 ms. White noise was added to the stimuli in the sample, resulting in a 20 dB SNR. The IID of the test stimuli was 3 dB. Table 6.4 shows the test results.

Table 6.4 IID estimates (mean ± standard deviation) and the error of the mean position estimate for two-tone complex stimuli with a 3 dB IID and a 20 dB SNR.

  Position: p (dB)   Width: d (dB)   Position Error (dB)
  3.5 ± 0.94         1.35 ± 0.81     0.5

The robustness of the IID estimates for two-tone complex stimuli was tested by repeating the test shown in Table 6.4 with a 5 dB SNR. The results are shown in Table 6.5.

Table 6.5 IID estimates (mean ± standard deviation) and the error of the mean position estimate for two-tone complex stimuli with a 3 dB IID and a 5 dB SNR.

  Position: p (dB)   Width: d (dB)   Position Error (dB)
  3.9 ± 1.3          4 ± 1.7         0.9

Data shown in Tables 6.4 and 6.5 are similar to the corresponding data (shown in Tables 6.1 and 6.2) for the pure-tone case presented in the last section. This demonstrates the ability of the IID network to "generalize" and perform well with two-tone complex stimuli.
Furthermore, although the resolution of the lID estimates-5 0 5True lID (dB)Chapter 6 Test Results of lID Estimation 126becomes poorer (4 dB versus 1.35 dB) when the SNR is reduced from 20 dB (Table6.4) to 5 dB (Table 6.5), the network is able to give out similar “position” estimates atthe two noise levels.Table 6.5 lID estimates (mean ± standard deviation) and theerror of the mean position estimate for two-tone complexstimuli with a 3 dB lID and a 5 dB SNR.Position: p (dB) Width: d (dB) Position Error (dB)3.9±1.3 4±1.7 0.9The test shown in Table 6.4 was repeated for different frequency differencesbetween the two components in the test stimuli. The results are shown in Table 6.6,where the first column show the tested frequency differences, and the subsequentcolumns show the corresponding lID estimates. The true lID in the test stimuli was 3dB.Table 6.6 11D estimates and the errors of the mean positionestimates for two-tone complex stimuli with a 3 dB UD and a20 dB SNR. FD: frequency difference between the twocomponents in the stimuli.FD: zf (Hz) Position: p (dB) Width: d (dB) Position Error (dB)100 3.77 ±0.97 1.99± 1.27 0.77300 3.4 ± 1.13 1.28 ± 0.73 0.4500 3.5 ± 0.94 1.35 ± 0.81 0.5700 3.77±0.89 1.02±0.65 0.77900 3.79 ± 0.72 0.7 ± 0.33 0.79Chapter 6 Test Results oflID Estimation 127Data in Table 6.6 indicate that the frequency difference between the twocomponents of the stimuli has little effect on the estimation results.§6.3 Pink Noise StimuliThe network was also tested with pink noise stimuli in two series of experiments.In the first series, the effect of the stimulus bandwidth was tested. In each test in theseries, the test stimuli had the same bandwidth. The center frequencies of the stimuliwere randomly selected from the range of 2500-3500 Hz. The intensities wererandomly selected from the range of 30-60 dB. The ITDs were randomly selectedfrom the range of -1.64 to 1.64 ms. All test stimuli had the same 3 dB lID, with whitenoise added, resulting in a 20 dB SNR. The test results are tabulated in Table 6.7,where the first column shows the values of the stimulus bandwidth used.Table 6.7 lID estimates and the errors of the mean positionestimates for pink noise stimuli with different bandwidths(BW). The SNR used in the tests was 20 dB, and the lID of thestimuli was 3 dB.BW: w (Hz) Position: p (dB) Width: d (dB) Position Error (dB)100 3.9±1 3.7±1.4 0.9300 3.3 ± 0.5 3 ± 0.97 0.3500 2.9 ± 0.69 2.9 ± 0.92 -0.1700 2.6 ± 0.4 3 ± 0.9 -0.4900 2.5 ± 0.62 3.1 ± 1 -0.5The results shown in Table 6.7 are similar to that obtained in the pure-tone case(Table 6.1), demonstrating the ability of the network to “generalize” and perform wellwith pink noise stimuli. Furthermore, the bandwidth of the pink noise stimuli showsChapter 6 Test Results oflID Estimation 128only a small effect on the performance of the network: there is a small downwardshift of the estimated lID (the second column of Table 6.7) as the stimulus bandwidthincreases from 100 to 900 Hz.The robustness of the lID estimates for pink noise stimuli was tested by repeatingthe tests shown in Table 6.7 using stimuli that have a 500 Hz bandwidth and a lower(5 dB) SNR. The results are shown in Table 6.8. There is a two fold decrease in theresolution of the lID estimates when the SNR is reduced from 20 dB to 5 dB SNR.Otherwise, the data in Table 6.8 are similar to the corresponding data entry in Table6.7.Table 6.8 lID estimates (mean ± standard deviation) and theerror of the mean position estimate for pink noise stimuli with a5 dB SNR and a 3 dB lID. 
Table 6.8 IID estimates (mean ± standard deviation) and the error of the mean position estimate for pink noise stimuli with a 5 dB SNR and a 3 dB IID. The bandwidth of the stimuli was 500 Hz.

  Position: p (dB)   Width: d (dB)   Position Error (dB)
  2.9 ± 1.2          6.1 ± 1.6       -0.1

In the second series of experiments with pink noise stimuli, we tested the effect of the center frequencies of the stimuli on the performance of the network. Different center frequencies were used to obtain a function of performance versus frequency. In each experiment in the series, the bandwidths of the test stimuli were randomly selected from the range of 50-1000 Hz. The intensities were randomly selected from the range of 30-60 dB. The ITDs were randomly selected from the range of -1.64 to 1.64 ms. All test stimuli had the same 3 dB IID, with white noise added at a 20 dB SNR. The test results are shown in Table 6.9, where the first column shows the different center frequencies tested. The center frequency of the stimuli shows little effect on the performance of the network.

Table 6.9 IID estimates and the errors of the mean position estimates for pink noise stimuli with different center frequencies (CF). The SNR used in the tests was 20 dB, and the IID of the stimuli was 3 dB.

  CF: f (Hz)   Position: p (dB)   Width: d (dB)   Position Error (dB)
  2500         2.9 ± 0.76         2.6 ± 0.8       -0.1
  2750         4.2 ± 0.89         3.1 ± 1.3       1.2
  3000         2.9 ± 0.53         2.8 ± 0.82      -0.1
  3250         2.9 ± 0.74         2.5 ± 0.86      -0.1
  3500         2.4 ± 0.38         3 ± 0.94        -0.6

§6.4 Two Similar Sources from Different Directions

Finally, the network was tested in cases where there were two sound sources. In the first test in this category, two pure-tone sources (both with a 50 dB SPL intensity and a 3000 Hz frequency) were periodically on-off modulated with orthogonal modulation functions. The ITD for both tone sources was 0 s. The IIDs for the two sources were -3 dB and 3 dB, respectively. The SNR used in the test was 20 dB. Different on-off modulation periods, which are listed in the first column of Table 6.10, were tested. The test results are listed in the other columns of the table. Two peaks in the IID distribution functions generated in the tests are identifiable that correspond to the IIDs of the two sources. Fig. 6.2 shows such an IID distribution function resulting from the test with a 35 ms on-off modulation period. We found that the smallest on-off modulation period for two-source separation was about 20 ms.

Table 6.10 IID estimates and the position errors for two-source cases with different on-off modulation periods (MP). The SNR was 20 dB, and the IIDs of the two sources were -3 dB and 3 dB, respectively. For each peak: position (dB), width (dB), position error (dB).

  MP (ms)   First peak          Second peak
  20        -3.4, 0.4, -0.4     2.8, 0.8, -0.2
  25        -3.4, 0.4, -0.4     3.2, 0.8, 0.2
  30        -3.4, 0.4, -0.4     2.9, 0.8, -0.1
  35        -3.4, 0.4, -0.4     3, 0.4, 0.0

Fig. 6.2 An example IID distribution function for a two-source case with a 35 ms on-off modulation period.

The above test was then repeated with a 5 dB SNR. The results, as shown in Table 6.11, are similar to those for the 20 dB case shown in Table 6.10, except for a decrease (the width parameter changes from 0.8 to 2.8) in the resolution of the IID estimates.
Table 6.11 IID estimates and the position errors for a two-source case with a 5 dB SNR. The IIDs of the two sources were -3 dB and 3 dB, respectively. For each peak: position (dB), width (dB), position error (dB).

  MP (ms)   First peak        Second peak
  30        -3, 2.8, 0.0      4.3, 2.4, 1.3

Next, the test shown in Table 6.10 was repeated again, but for different spatial separations of the two pure-tone sources. A 20 ms on-off modulation period was used in the test. Table 6.12 shows the results, where the first column lists the spatial separations (in terms of the two IID values) of the two pure-tone sources. The smallest resolvable IID difference was found to be about 3 dB.

Table 6.12 IID estimates and the position errors for two-source cases with different spatial separations (SS, in terms of the two IIDs). The SNR was 20 dB. For each peak: position (dB), width (dB), position error (dB).

  SS (dB)       First peak          Second peak
  (-1, 1)       -1.3, 2, -0.3       0.7, 1.2, -0.3
  (-1.5, 1.5)   -2.4, 0.8, -0.9     2.0, 0.8, 0.5
  (-2, 2)       -3.4, 0.4, -1.4     2.8, 0.8, 0.8

The network was finally tested in a case where the two target sources were modulated with on-off intervals having random lengths selected from the range of [0, 35] ms. The two tones had the same 50 dB SPL intensity and the same 3000 Hz frequency. The ITD of both sources was 0 s. The IIDs of the two sources were -3 dB and 3 dB, respectively. The SNR in the test was 20 dB. Table 6.13 shows the test results. The corresponding IID distribution function is shown in Fig. 6.3. Two peaks in this distribution function are identifiable that correspond to the two targets. The third (middle) peak corresponds to short-time IID estimates generated in time periods during which the two sources are both present.

Fig. 6.3 The IID distribution function for the uncorrelated two-source case. The parameters that describe the two target sources are shown in Table 6.13.

Table 6.13 IID estimates and the position errors for a two-source case with uncorrelated on-off modulations. The SNR was 20 dB. For each peak: position (dB), width (dB), position error (dB).

  First peak          Second peak
  -3.4, 0.4, -0.4     3, 0.4, 0.0

7 Test Results of IED Estimation

Like the ITD and IID networks presented in the last two chapters, the trained IED network was tested following the methods outlined in Section 4.3. The test results parallel those of the ITD and IID networks, and are presented in the following sections.

§7.1 AM Stimuli

As mentioned in Subsection 4.2.1, the IED network was trained using sinusoidally amplitude-modulated tonal stimuli, which we refer to as the AM stimuli. Thus, the first test for the IED network was carried out using a random sample of 120 AM stimuli to which white noise was added with a 20 dB SNR. The intensities of the test stimuli were randomly selected from the range of 30-60 dB SPL. The IIDs were randomly selected from the range of -12 to 12 dB. The carrier frequencies were randomly selected from the range (2500-3500 Hz) that covers the characteristic frequencies of the auditory nerve fibers at the input of the IED network. The modulation frequencies were randomly selected from the effective range (154-305 Hz) discussed in Subsections 3.4.1 and 4.2.1. As mentioned in Subsection 4.2.2, the IED values encoded by the output layer neurons range from -1.64 to 1.64 ms. We used a 0.83 ms IED in the test, which is an intermediate value between 0 ms and the upper bound (1.64 ms) of the encoded IED range.
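As an illustration of the AM stimuli just described, the sketch below builds one binaural pair (a Python illustration; whether the carrier is also delayed in the thesis's stimuli is not stated, so delaying only the envelope, which isolates the IED cue, is an assumption here):

```python
import numpy as np

rng = np.random.default_rng(4)
fs = 40_000                        # assumed sampling rate (Hz)
t = np.arange(0, 0.2, 1 / fs)

fc = rng.uniform(2500, 3500)       # carrier frequency (Hz)
fm = rng.uniform(154, 305)         # modulation frequency (Hz)
ied = 0.83e-3                      # interaural envelope delay under test (s)

def env(delay):
    """Sinusoidal envelope, delayed by the given time in seconds."""
    return 0.5 * (1.0 + np.sin(2 * np.pi * fm * (t - delay)))

sL = env(0.0) * np.sin(2 * np.pi * fc * t)
sR = env(ied) * np.sin(2 * np.pi * fc * t)   # only the envelope is delayed
```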
The resulting IED estimates (in terms of the associated "position" and "width" parameters defined in Subsection 4.3.1) are shown in Table 7.1.

Table 7.1 IED estimates (mean ± standard deviation) and the error of the mean position estimate for AM stimuli with a 0.83 ms IED and a 20 dB SNR.

  Position: p (ms)   Width: d (ms)   Position Error (ms)
  0.85 ± 0.06        0.24 ± 0.12     0.02

The above test was repeated using a lower (5 dB) SNR. Table 7.2 shows the results. Despite an approximately two-fold decrease in resolution (as compared with the data in Table 7.1), correct IED estimates were obtained.

Table 7.2 IED estimates (mean ± standard deviation) and the error of the mean position estimate for AM stimuli with a 0.83 ms IED and a 5 dB SNR.

  Position: p (ms)   Width: d (ms)   Position Error (ms)
  0.84 ± 0.12        0.54 ± 0.34     0.01

The test shown in Table 7.1 was also repeated for different IED values around 0 ms. The results are tabulated in Table 7.3, where the first column shows the tested IED values.

Table 7.3 IED estimates and the errors of the mean position estimates for AM stimuli with different IEDs and a 20 dB SNR.

  IED: Δt (ms)   Position: p (ms)   Width: d (ms)   Position Error (ms)
  1.25           1.2 ± 0.08         0.25 ± 0.12     -0.05
  0.83           0.85 ± 0.06        0.24 ± 0.12     0.02
  0.42           0.42 ± 0.09        0.24 ± 0.11     0.0
  0              0.004 ± 0.09       0.19 ± 0.06     0.004
  -0.42          -0.42 ± 0.07       0.2 ± 0.07      0.0
  -0.83          -0.82 ± 0.07       0.27 ± 0.15     0.01
  -1.25          -1.2 ± 0.08        0.3 ± 0.32      0.05

The estimated IEDs (in the second column of Table 7.3) are plotted as a function of the true IEDs (in the first column of Table 7.3), as shown in Fig. 7.1. It is clear from this plot that the IED network is able to produce accurate IED estimates for AM stimuli.

Fig. 7.1 Estimated IEDs shown as a function of the true IEDs.

§7.2 Stimuli Modulated by Two-Tone Complex Signals

To test how well the model works for stimuli with more complex modulation waveforms, we used modulation signals that contain two sinusoids of different frequencies while keeping the carrier signal sinusoidal. In this category, we first conducted a test using a random sample of 120 stimuli that have a fixed frequency difference between the two components in the modulation signals. Specifically, the two frequency components of the modulation signals were randomly selected from the range of 150-300 Hz, except that the difference between the two components was always 50 Hz. Other relevant parameters were randomly selected from the same corresponding ranges as those used in the tests discussed in the last section. The SNR was 20 dB. The IED to be estimated was 0.83 ms. Table 7.4 shows the test results.

Table 7.4 IED estimates (mean ± standard deviation) and the error of the mean position estimate for stimuli with two-tone complex modulation waveforms. The IED to be estimated was 0.83 ms. The SNR in the test was 20 dB.

  Position: p (ms)   Width: d (ms)   Position Error (ms)
  0.85 ± 0.06        0.25 ± 0.11     0.02

The network was also tested with a lower (5 dB) SNR while keeping other aspects of the test unchanged. The results are shown in Table 7.5.
The results shown in Tables 7.4 and 7.5 are similar to the corresponding results (shown in Tables 7.1 and 7.2) for the AM stimuli case presented in the last section. This demonstrates the ability of the IED network to "generalize" and perform well with stimuli modulated by more complex waveforms than simple sinusoidal ones. Moreover, although the resolution of the IED estimates shows a two-fold decrease when the SNR is reduced from 20 dB (Table 7.4) to 5 dB (Table 7.5), the network produces similar "position" estimates at the two noise levels.

The test shown in Table 7.4 was further repeated for different frequency differences between the two frequency components in the modulation waveforms. The results are shown in Table 7.6, where the first column shows the tested frequency differences, and the subsequent columns show the corresponding IED estimates. The true IED in the test stimuli was 0.83 ms. The SNR was 20 dB. It is evident that the frequency difference between the two components in the modulation waveforms has little effect on the estimation results.

Table 7.6  IED estimates and the errors of the mean position estimates for stimuli with two-tone complex modulation waveforms. The IED to be estimated was 0.83 ms. The SNR in the test was 20 dB. FD: frequency difference between the two components in the modulation waveforms.

FD: Δf (Hz)    Position: p (ms)    Width: d (ms)    Position Error (ms)
25             0.87 ± 0.07         0.24 ± 0.11      0.04
50             0.85 ± 0.06         0.25 ± 0.11      0.02
75             0.85 ± 0.08         0.31 ± 0.17      0.02
100            0.85 ± 0.10         0.33 ± 0.11      0.02
125            0.86 ± 0.06         0.39 ± 0.18      0.03

§7.3 Two AM Sources from Different Directions

Finally, the IED network was tested in situations where there were two sources emitting AM signals from different directions. In a first series of tests in this category, the two sources were periodically on-off modulated with orthogonal functions (see Subsection 4.3.5 B). The IID of both sources was 0 dB. The IEDs of the two sources were ±0.83 ms, respectively. The SNR in the tests was 20 dB. Table 7.7 shows the test results. The first column of Table 7.7 shows the different on-off modulation periods tested. The other columns of the table show the corresponding IED estimates for the two sources. Fig. 7.2 shows the resulting IED distribution function for a specific on-off modulation period (17 ms). The two peaks in the distribution indicate the two IED estimates. We found that the smallest on-off modulation period for two-source separation by the network was about 17 ms.

Table 7.7  IED estimates and the position errors for two-source cases with different on-off modulation periods (OOP). The SNR was 20 dB, and the IEDs of the two sources were ±0.83 ms, respectively.

              First Peak                                    Second Peak
OOP (ms)   Position (ms)  Width (ms)  Pos. Error (ms)   Position (ms)  Width (ms)  Pos. Error (ms)
17         -0.86          0.16        -0.03             0.92           0.38        0.09
23         -0.82          0.11        0.01              0.88           0.11        0.05
33         -0.85          0.16        -0.02             0.87           0.11        0.04
67         -0.82          0.11        0.01              0.85           0.05        0.02

The above test was repeated with a lower (5 dB) SNR and a 17 ms on-off modulation period. The results are shown in Table 7.8, and they demonstrate the network's ability to separate the directions of two AM sources in noise.
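Subsection 4.3.5 B's "orthogonal on-off modulation" is not restated here; the sketch below assumes it means complementary periodic gating, in which only one of the two sources is on at any instant. Both the interpretation and the function name are assumptions for illustration:

    import numpy as np

    def complementary_gates(n_samples, period_ms=17.0, fs=48000):
        # Periodic on-off gates for two sources: source A is on during the first
        # half of each period, source B during the second half, so the gating
        # functions are orthogonal (their pointwise product is always zero).
        period = int(period_ms * 1e-3 * fs)           # samples per on-off period
        phase = np.arange(n_samples) % period
        gate_a = (phase < period // 2).astype(float)  # source A on in first half
        gate_b = 1.0 - gate_a                         # source B on in second half
        return gate_a, gate_b

With such gating, every short analysis window (except those straddling a switching instant) falls within a single-source interval, which is what lets the two peaks of the IED distribution remain separated down to a period of about 17 ms.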
The test shown in Table 7.7 was also repeated for different spatial separations of the two AM sources. A 17 ms on-off modulation period was used in these tests. Table 7.9 shows the results, where the first column lists the spatial separations (in terms of the two IED values) of the two AM sources. The smallest resolvable IED difference between the two sources was found to be about 0.7 ms.

Fig. 7.2  The resulting IED distribution function for the test with a 17 ms on-off modulation period (horizontal axis: IED, s x 10^-3).

Table 7.8  IED estimates and the position errors for a two-source case with a 17 ms on-off modulation period (OOP). The SNR was 5 dB, and the IEDs of the two sources were ±0.83 ms, respectively.

              First Peak                                    Second Peak
OOP (ms)   Position (ms)  Width (ms)  Pos. Error (ms)   Position (ms)  Width (ms)  Pos. Error (ms)
17         -0.78          0.33        0.05              0.78           0.27        -0.05

A last test for the IED network was carried out in a case where the two AM sources were modulated with on-off intervals having random lengths selected from the range of [0, 33] ms. Again, the IEDs of the two sources were ±0.83 ms, respectively. The SNR in the test was 20 dB. Table 7.10 shows the test results. Fig. 7.3 shows the corresponding IED distribution function. The two lateral peaks in the distribution correspond to the two sources. The middle peak corresponds to short-time IED estimates produced by the network in time periods during which the two sources are both present. Such overlap periods are a result of the random on-off modulation of the two sources.

Table 7.9  IED estimates and the position errors for two-source cases with different spatial separations (SS, in terms of the two IEDs). The SNR was 20 dB.

                   First Peak                                    Second Peak
SS (ms)         Position (ms)  Width (ms)  Pos. Error (ms)   Position (ms)  Width (ms)  Pos. Error (ms)
(-0.33, 0.33)   -0.3           0.16        0.03              0.28           0.22        -0.05
(-0.5, 0.5)     -0.51          0.16        -0.01             0.5            0.11        0.0
(-0.67, 0.67)   -0.63          0.16        0.04              0.66           0.11        -0.01

Table 7.10  IED estimates and the position errors for a two-source case with random on-off modulations. The SNR was 20 dB, and the IEDs of the two sources were ±0.83 ms, respectively.

First Peak                                    Second Peak
Position (ms)  Width (ms)  Pos. Error (ms)   Position (ms)  Width (ms)  Pos. Error (ms)
-0.81          0.11        0.02              0.85           0.16        0.02

Fig. 7.3  The IED distribution function for the two-source case with random on-off modulations (horizontal axis: IED, s x 10^-3). Parameters associated with the two lateral peaks are shown in Table 7.10.
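The distribution functions of Figs. 7.2 and 7.3 are accumulations of short-time cue estimates; as noted in the Conclusions below, the "integration" step of the model is implemented by histogram accumulation. A minimal sketch of that step follows, with the bin count and cue range chosen for illustration:

    import numpy as np

    def cue_distribution(short_time_estimates, lo=-1.64, hi=1.64, n_bins=41):
        # Accumulate short-time cue estimates (here IEDs, in ms) into a
        # histogram; peaks in the histogram indicate sources. Bin count and
        # range are illustrative, not the thesis' settings.
        hist, edges = np.histogram(short_time_estimates, bins=n_bins, range=(lo, hi))
        centers = 0.5 * (edges[:-1] + edges[1:])
        return centers, hist

In the random-modulation test, estimates taken during single-source intervals pile up at ±0.83 ms, while estimates taken during overlap intervals fall between them, producing the middle peak of Fig. 7.3.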
Conclusions

This work is concerned with the determination of the positions of sound sources in multi-source acoustic environments. Solutions to this problem have many applications in such areas as speech recognition, robot sensing, tele-operation, and man-machine interfaces. The ability to localize sounds is essential for animals and humans.

Sound localization is a non-trivial problem, especially in natural environments, which are often noisy, dynamic, and have multiple simultaneous sources at different locations in space. Nevertheless, the human auditory system is able to localize multiple sound sources with ease in such complicated situations. It is a great challenge to develop machines that can mimic the processes that make auditory localization possible. This is the motivation for our work.

To approach this problem, in Chapter 2 we proposed the DLI modeling scheme. In this scheme the signals detected by the two ears are first decomposed into their spectro-temporal distributions as represented in the neural activities of the auditory nerve fibers. Spatial attributes, or localization cues, are then determined from energy concentrations in the distributions. A spatial scene of acoustic events is finally built by integrating the short-time energy concentrations according to their spatial attributes. While much work has been done previously on modeling how the peripheral auditory system decomposes the stimulus signals into spectro-temporal distributions, our work has focused on modeling the latter two processes in the DLI scheme: brief-interval cue estimation and integration. We have proposed, in Chapter 3, a DLI model in which the task of brief-interval cue estimation is realized by a process of pattern recognition and comparison.

We have further proposed that three parallel implementations of the DLI model can be realized by placing, in each implementation, a set of neural networks along the basilar membrane. As discussed in Chapter 3, these separate sets of networks are used to model the separate processing of different binaural cues (ITD, IID, and IED) along separate pathways in the auditory system. Moreover, the ITD networks are placed in the lower frequency region (below 1200 Hz), and the IID and IED networks are placed in the higher region. Although the IID and the IED networks are placed in the same frequency region, they are trained to abstract different interaural differences (delay or intensity).

We have only presented results for one network of each type (ITD, IID, or IED). As discussed in Subsection 4.2.2, if we normalize the frequency tuning curves of the auditory periphery corresponding to different points along the basilar membrane with respect to their characteristic frequencies, they turn out to be similar in shape (see Eq. 4.2). Thus, the input patterns to different networks of the same type placed at different parts of the basilar membrane have similar characteristics. This in turn suggests that the test results of one such network are representative of those of other networks of the same type. In fact, we have trained several different ITD networks that cover different frequency ranges within the lower frequency region (from about 100-1200 Hz). Similar results were obtained for these networks. As mentioned in Subsection 4.2.3, it takes a relatively long time (weeks) to train just one network, and resource limitations have prevented us from training more complete sets of networks to cover the entire frequency region. Nevertheless, we have shown that robust short-time cue estimates can be obtained by training the relatively simple networks used in our model.

A rough estimate made from Fig. 1.14, which gives the perceived locations of impulse sound images as a function of the ITD (Blauert, 1983), suggests that a 10 μs ITD difference corresponds approximately to a one-degree difference in direction. Corresponding to the estimated width parameter shown in Table 5.1, the ITD network is able to localize sound sources to within about 4.5 degrees in direction, which is about the just noticeable angle difference measured in psychophysical experiments for sounds coming from an angle between 60 and 75 degrees from the median plane (see Fig. 1.17). In terms of interaural phase difference, the estimated width parameter shown in Table 5.1 corresponds to about 15 phase degrees, which is about the human ITD threshold for perceived change for a lateralized sound image, as shown in Fig. 1.15.
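A back-of-envelope check of these two conversions is sketched below. The ~45 μs width is inferred here from the 4.5-degree figure at roughly 10 μs per degree (Table 5.1 itself is not restated in this chapter), and the frequency is an illustrative value within the ITD networks' band:

    width_s = 45e-6             # assumed ITD width parameter, in seconds (inferred)
    print(width_s / 10e-6)      # -> 4.5, degrees of azimuth at ~10 us per degree
    f = 926.0                   # Hz, illustrative frequency below the 1200 Hz limit
    print(360.0 * f * width_s)  # -> ~15.0, interaural phase difference in degrees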
As reviewed in Chapter 1, an important observation evident from both Fig. 1.15 and Fig. 1.17 is that the ITD threshold of perceived change is at a minimum when the sound image is perceived in the center of the head (0 second ITD), and increases when the sound image becomes lateralized (larger absolute ITDs). In other words, the auditory system is more sensitive to ITD changes when the absolute ITD value is smaller. This phenomenon of differential ITD sensitivity is scarcely evident, if present at all, in the test results of the ITD network.

In the results of IID estimation presented in Table 6.3, the estimated values of the width parameter are generally smaller than (about half) the variation of the estimated values of the position parameter. Thus, we compare the standard deviations of the position estimates (rather than the width) with IID sensitivities measured in psychophysical experiments. The position standard deviations shown in Table 6.3 vary from about 1 to 2 dB. They are generally comparable to the measured IID threshold of perceived change for lateralized sound images, although the IID network in our model is slightly less sensitive (0.5 dB versus 1 dB) than the observed human performance for centralized sound images (see Fig. 1.9 on Page 16). As reviewed in Chapter 1, the IID threshold of perceived change is smaller (0.5 versus 1.5 dB) when the sound image is centralized than when it is lateralized (see also Fig. 1.9 on Page 16). This phenomenon is also observable in the test results of the IID network: from Table 6.3, the standard deviation of position varies from about 2 dB for the fully lateralized image case to about 1 dB for the centralized image case.

A characteristic of our model is that the binaural neurons in the three types of networks "listen" to multiple auditory nerve fibers with different, but adjacent, characteristic frequencies. Such multiple-fiber innervation is also seen in Bonham's (1994) convergence model reviewed in Chapter 1. In Bonham's model, however, the signals carried by multiple nerve fibers from a single ear are first combined to obtain a composite signal, which is then used to estimate ITDs via a coincidence detection mechanism (Jeffress, 1948). In effect, the binaural neurons do not "see" the spectro-temporal patterns seen by neurons in our model. The convergence of auditory nerve fibers with different characteristic frequencies from different ears onto the same binaural neurons is also seen in Shamma's (1989) stereausis model. But in Shamma's model the neurons receive only two inputs, namely a single fiber from each ear. Again, the binaural neuron does not see the spectro-temporal profile of the signals carried by the nerve fibers.
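To make the contrast concrete, the sketch below assembles the kind of input a binaural (hidden) unit receives in our model: a small spectro-temporal patch from each ear, spanning several adjacent characteristic-frequency channels and time slices, rather than a single composite signal or a single fiber pair. The function name and patch dimensions are illustrative; the thesis' window sizes are not restated in this section.

    import numpy as np

    def binaural_input_pattern(act_l, act_r, ch, t, n_ch=5, n_slices=10):
        # act_l, act_r: 2-D arrays of auditory-nerve activity, indexed as
        # (characteristic-frequency channel, time slice). Returns the flattened
        # pair of spectro-temporal patches presented to one binaural unit.
        patch_l = act_l[ch:ch + n_ch, t:t + n_slices]  # left-ear fibers x time
        patch_r = act_r[ch:ch + n_ch, t:t + n_slices]  # right-ear fibers x time
        return np.concatenate([patch_l.ravel(), patch_r.ravel()])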
Fig. C.1 shows some examples of the connection patterns between the hidden layer neurons and the input layer neurons in the trained ITD network. Two observations can be made: (i) the hidden neurons can be viewed as spectro-temporal filters; and (ii) there are inhibitory inputs from the auditory nerve fibers to the hidden layer neurons (black squares shown in Fig. C.1). The first observation suggests to physiologists that the spectro-temporal response properties of binaural neurons may be important in the study of the neural mechanisms of auditory localization. The second observation provides us with a possible explanation for the usefulness of the inhibitory inputs to binaural neurons observed in the physiological experiments reviewed in Chapter 1 (Grothe and Sanes, 1993; Adams and Mugnaini, 1990; Schwartz, 1992; Cant and Hyson, 1992). The connection patterns in our model differ from those of many previous models that have no inhibitory input at all (e.g. Jeffress, 1948; Colburn et al., 1990; Dabak and Johnson, 1992). Inhibitory inputs to the binaural neurons are seen in a model by Sujaku et al. (1981), a schematic diagram of which is shown in Fig. 1.24. That model can be seen as a degenerate case of the connection patterns observed in our model.

Fig. C.1 (next page)  Connection patterns between hidden layer neurons and input layer neurons in the trained ITD network. Each larger box in the figure corresponds to one hidden neuron. Inside each larger box are two matrices of connection weights (corresponding to the two matrices of input layer neurons) and a bias. The weights and the bias are represented by a white (positive weight value) or black (negative weight value) area in a small gray square. The sizes of the white or black areas in the small squares indicate the magnitudes of the weights. (Figure axis label: time slices in window.)
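A rendering in the style of Fig. C.1 can be produced with a Hinton-type diagram; the sketch below is a generic recipe for such a plot, not the plotting code used for the thesis:

    import numpy as np
    import matplotlib.pyplot as plt

    def hinton(weights, ax=None):
        # Draw each weight as a white (positive) or black (negative) square on
        # a gray background; the square's area reflects the weight's magnitude.
        ax = ax or plt.gca()
        ax.set_facecolor("gray")
        vmax = np.abs(weights).max()
        for (i, j), w in np.ndenumerate(weights):
            side = np.sqrt(abs(w) / vmax)  # side length proportional to sqrt(|w|)
            ax.add_patch(plt.Rectangle((j - side / 2, i - side / 2), side, side,
                                       color="white" if w > 0 else "black"))
        ax.set_xlim(-1, weights.shape[1])
        ax.set_ylim(-1, weights.shape[0])
        ax.invert_yaxis()

    # e.g., hinton(W) for one hidden unit's left-ear weight matrix W
    # (channels x time slices), then likewise for the right-ear matrix.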
To conclude this thesis, the following specific contributions are claimed. First, we have proposed the DLI scheme for models of auditory localization in non-stationary multi-source acoustic environments, and have accordingly developed a unique DLI model. The signals detected by the two ears are superpositions of the sound signals emitted from the spatially distributed sound sources in the acoustic environment. The ultimate task of sound localization is to decompose the sensory signals into a spatial map of the original sound sources. While it is very difficult to map the sensory signals directly to their spatial distribution, the idea behind the DLI modeling scheme is to find a "bridge", or an intermediate representation, between the sensory signals and the spatial distribution of sound sources. For such a bridge to be helpful in solving our problem, it must have the following property: it should consist of elements that can be mapped to no more than one source in the spatial distribution. We argue that the spectro-temporal distribution of the stimulus signal, which is manifested in the activity of parallel auditory nerve fibers, can serve as such a bridge. Small "patches" of the spectro-temporal distribution can be assumed to correspond to no more than one source in space. As it is very likely that different elements of the intermediate representation (the bridge) having the same short-time cues correspond to the same sources in space, the final spatial distribution of sound sources can be obtained by integrating, or grouping, these elements according to these cues.

Secondly, we have shown that there are unexplored patterns in the neural signals carried by parallel auditory nerve fibers that are important for sound localization. Spectro-temporal patterns are traditionally used in signal recognition tasks (Lippmann, 1989; Dror et al., 1995). Our work is the first to use such patterns for the purpose of passive sound localization.

Thirdly, we have trained a simple type of neural network to show that such patterns are good indications of interaural differences, and can be used to obtain robust short-time location cue estimates.

Fourthly, we have shown that the simplest types of stimuli, pure tones (for ITD and IID estimation) or AM stimuli (for IED estimation), are adequate for the training of the neural networks in our model. The trained networks are shown to be able to "generalize" and perform well when the stimuli are more complex signals. Furthermore, we have shown that our model works at low SNRs and in non-stationary multi-source situations.

Finally, we have demonstrated that the same model structure can be used and trained to estimate different localization cues. This suggests that there is no real difference between the ITD and IED pathways except that the spectro-temporal representations of the sound stimuli are different: for low-frequency stimuli, the neural activity in the auditory nerve fibers reflects the fine timing structure of the stimuli, whereas for high-frequency stimuli, the neural activity reflects the envelope timing structure of the stimuli. In other words, the difference between the ITD and the IED pathways follows naturally from the differential peripheral processing of low-frequency and high-frequency stimuli.

Although our model is computationally intensive and inherently parallel, as most brain models are, it is not far from practical implementation. Hardware chips (Lyon and Mead, 1988) already exist that mimic the peripheral processing of the auditory system. There are other chips (Lazzaro and Mead, 1989; Mead et al., 1991) that take the output of a cochlear chip as their input and mimic certain types of binaural processing. Our model provides an effective new computational structure, or algorithm, that can be implemented in practical applications.

There are several directions in which further research may improve our model. One limitation of our model is that the networks in the model are trained and tested using stimuli that simulate anechoic environments. However, room reverberation and echoes may lead to biased location estimates, or to multiple estimates corresponding to both an original sound source and its echoes. The precedence effect (Zurek, 1987) is thought to be related to the robust ability of the auditory system to localize in normal rooms, and the onset transients of sound stimuli play an important role in this effect. For our model to localize in room environments, methods that are more sophisticated than histogram accumulation should be used in the implementation of the "integration" step of the DLI modeling scheme. Specifically, a refractory mechanism that involves lateral inhibition may be developed to reduce the effect of echoes and room reverberation. In such a mechanism, the short-time cue estimates obtained immediately after the onsets would be inhibited if they do not agree with the estimates obtained from the onset transients.
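The refractory mechanism just described is proposed here but not implemented in this work; the sketch below is one hedged reading of it, with the onset test and all thresholds chosen purely for illustration:

    import numpy as np

    def onset_weighted_distribution(estimates, energies, lo=-1.64, hi=1.64,
                                    n_bins=41, onset_ratio=2.0, tol=0.1):
        # Histogram accumulation that trusts estimates taken at energy onsets
        # and suppresses later estimates that disagree with the most recent
        # onset estimate (treated as likely echoes or reverberation).
        hist = np.zeros(n_bins)
        last_onset_estimate = None
        for i, (est, e) in enumerate(zip(estimates, energies)):
            is_onset = i == 0 or e > onset_ratio * energies[i - 1]
            if is_onset:
                last_onset_estimate = est
            elif last_onset_estimate is not None and abs(est - last_onset_estimate) > tol:
                continue  # inhibit: post-onset estimate disagrees with the onset
            b = int((est - lo) / (hi - lo) * n_bins)
            if 0 <= b < n_bins:
                hist[b] += 1
        return hist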
Another limitation of our model is that the networks in the model are trained using stimuli whose loudness levels are restricted to 30-60 dB. This is a relatively narrow range considering the large dynamic range of the sensitivity of the ear. Due to the non-linearity of the cochlear model, networks trained with stimuli in a limited range may not generalize or perform well for stimuli that have loudness levels below or above this range.

We have made an assumption about the acoustic environments which states that in most short (on the order of several milliseconds) time intervals only one sound has significant intensity. This assumption is likely to hold in such situations as a conversation between two or three people, but not for many environmental sounds or for most music. Thus, the solution provided by our model to the multi-source problem is relatively limited. To overcome this limitation, networks that "cover" frequency bands much wider than those used in our current simulation (or the entire audible frequency range) may be used. The task of such networks would be to recognize and then localize spectro-temporal patterns that are more complex than those limited to narrow frequency bands. Such more complex short-time spectro-temporal patterns may help distinguish simultaneous complex sounds. Another alternative, which may provide a satisfactory solution to the problem, is to combine the outputs from parallel networks over different narrow frequency bands.

The training of the networks in our model takes a relatively long time. Further effort may be devoted to developing more efficient ways of training, and to ways of improving the performance of the networks after the initial training. We have used simple multi-layer feed-forward neural networks for the task of pattern recognition and comparison, but have not investigated other possible network structures that may be more effective in performing similar tasks. A future project is to develop alternative network structures and to compare their performance. Examination of these alternatives may also help to simplify or speed up the training process. Finally, it should be worthwhile to explore methods for implementing the present model using current integrated circuit technologies.

References

Adams, J.C. and E. Mugnaini (1990) "Immunocytochemical evidence for inhibitory and disinhibitory circuits in the superior olive," Hearing Research, Vol. 49, pp. 281-298.

Aitkin, L.M. and W.R. Webster (1972) "Medial geniculate body of the cat: organization and responses to tonal stimuli of neurons in the ventral division," J. Neurophysiol., Vol. 35, pp. 365-380.

Allen, J.B. (1985) "Cochlear modeling," IEEE ASSP Magazine, January 1985, pp. 3-29.

Anderson, T.R., J.A. Janko, and R.H. Gilkey (1994) "Modeling human sound localization with hierarchical neural networks," Proceedings of the IEEE International Conference on Neural Networks, June 27-29, 1994, Orlando, Florida, Vol. 7, pp. 4502-4507.

Arad, N., E.L. Schwartz, Z. Wollberg, and Y. Yeshurun (1994) "Acoustic binaural correspondence used for localization of natural acoustic signals," Neural Networks, Vol. 7, No. 3, pp. 441-447.

Backman, J. and M. Karjalainen (1993) "Modeling of human directional and spatial hearing using neural networks," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1993, Vol. 1, pp. 125-128.

Bekesy, G. von (1959) "Neural funneling along the skin and between the inner and outer hair cells of the cochlea," Journal of the Acoustical Society of America, Vol. 31, pp. 1236-1249.

Bekesy, G. von (1960) Experiments in Hearing, McGraw-Hill, 1960, New York.
Blauert, J. (translated by J.S. Allen) (1983) Spatial Hearing: The Psychophysics of Human Sound Localization, The MIT Press, 1983, Cambridge, Massachusetts.

Blodgett, H.C., W.A. Wilbanks, and L.A. Jeffress (1956) "Effect of large interaural time differences upon the judgment of sidedness," Journal of the Acoustical Society of America, Vol. 28, pp. 639-643.

Bonham, B.H. (1994) "Self-organization and properties of receptive fields in the auditory brainstem - a model," Ph.D. Thesis, University of California, Berkeley.

Boudreau, J.C. and C. Tsuchitani (1968) "Binaural interaction in the cat superior olive S segment," J. Neurophysiol., Vol. 31, pp. 442-454.

Brugge, J.F. and M.M. Merzenich (1973) "Responses of neurons in the auditory cortex of the macaque monkey to monaural and binaural stimulation," J. Neurophysiol., Vol. 36, pp. 1138-1158.

Brugge, J.F., D.J. Anderson, and L.M. Aitkin (1970) "Response of neurons in the dorsal nucleus of the lateral lemniscus of the cat to binaural stimuli," J. Neurophysiol., Vol. 33, pp. 441-458.

Brugge, J.F., L.M. Dubrovsky, L.M. Aitkin, and D.J. Anderson (1969) "Sensitivity of single neurons in the auditory cortex of cat to binaural tone stimulation: Effects of varying interaural time and intensity," J. Neurophysiol., Vol. 32, pp. 1005-1024.

Caird, D. and R. Klinke (1983) "Processing of binaural stimuli by cat superior olivary complex neurons," Experimental Brain Research, Vol. 52, pp. 385-399.

Cant, N.B. and R.L. Hyson (1992) "Projections from the lateral nucleus of the trapezoid body to the medial superior olivary nucleus in the gerbil," Hearing Research, Vol. 58, pp. 26-34.

Carney, L.H. (1993) "A model for the responses of low-frequency auditory-nerve fibers in cat," J. Acoust. Soc. Am., Vol. 93, No. 1, January 1993, pp. 401-417.

Carney, L.H. and T.C.T. Yin (1988) "Temporal coding of resonance by low-frequency auditory nerve fibers: single fiber responses and a population model," J. Neurophysiol., Vol. 60, pp. 1653-1677.

Carterette, E.C. and M.P. Friedman (Eds.) (1978) Handbook of Perception, Vol. IV, Hearing, Academic Press, New York.

Carterette, E.C. (1978) "Historical notes on research in hearing," in E.C. Carterette and M.P. Friedman (Eds.), Handbook of Perception, Vol. IV, Hearing, Academic Press, 1978, New York.

Chan, J.C.K. and T.C.T. Yin (1984) "Interaural time sensitivity in the medial superior olive of the cat: comparisons with the inferior colliculus," Soc. Neurosci. Abstract, Vol. 10, p. 844.

Cherry, E.C. (1961) "Two ears - but one world," in W.A. Rosenblith (Ed.), Sensory Communication, pp. 99-117, The MIT Press, Cambridge, Massachusetts, 1961.

Cherry, E.C. and B.McA. Sayers (1956) "Human cross-correlator - A technique for measuring certain parameters of speech perception," Journal of the Acoustical Society of America, Vol. 28, pp. 889-895.

Cherry, E.C. and W.K. Taylor (1954) "Some further experiments upon the recognition of speech, with one and two ears," Journal of the Acoustical Society of America, Vol. 26, pp. 554-559.

Colburn, H.S. (1973) "Theory of binaural interaction based on auditory-nerve data: I. General strategy and preliminary results on interaural discrimination," Journal of the Acoustical Society of America, Vol. 54, No. 6, pp. 1458-1470.

Colburn, H.S. (1977) "Theory of binaural interaction based on auditory-nerve data: II. Detection of tones in noise," Journal of the Acoustical Society of America, Vol. 61, No. 2, pp. 525-533.
Colburn, H.S. and H. Ibrahim (1993) "Modeling of precedence-effect behavior in single neurons and in human listeners," Journal of the Acoustical Society of America, Vol. 93, p. 2293.

Colburn, H.S. and J.S. Latimer (1978) "Theory of binaural interaction based on auditory-nerve data: III. Joint dependence on interaural time and amplitude differences in discrimination and detection," Journal of the Acoustical Society of America, Vol. 64, No. 1, pp. 95-106.

Colburn, H.S. and N.J. Durlach (1978) "Models of binaural interaction," in E.C. Carterette and M.P. Friedman (Eds.), Handbook of Perception, Vol. IV, Hearing, Academic Press, New York, 1978, pp. 467-518.

Colburn, H.S. and P.J. Moss (1981) "Binaural interaction models and mechanisms," in J. Syka and L. Aitkin (Eds.), Neuronal Mechanisms of Hearing, Plenum Press, New York, 1981, pp. 283-288.

Colburn, H.S., Y. Han, and C.P. Culotta (1990) "Coincidence model of MSO responses," Hearing Research, Vol. 49, pp. 335-346.

Crow, G., A.L. Rupert, and G. Moushegian (1978) "Phase-locking in monaural and binaural medullary neurons: implications for binaural phenomena," Journal of the Acoustical Society of America, Vol. 64, pp. 493-501.

Dabak, A.G. and D.H. Johnson (1992) "Functional-based modeling of binaural processing: Interaural phase," Hearing Research, Vol. 58, pp. 200-212.

David, E.E., Jr., N. Guttman, and W.A. van Bergeijk (1958) "On the mechanism of binaural fusion," Journal of the Acoustical Society of America, Vol. 30, pp. 801-802.

David, E.E., Jr., N. Guttman, and W.A. van Bergeijk (1959) "Binaural interaction of high-frequency complex stimuli," Journal of the Acoustical Society of America, Vol. 31, pp. 774-782.

de Boer, E. (1975) "Synthetic whole nerve action potentials for the cat," J. Acoust. Soc. Am., Vol. 58, pp. 1030-1045.

de Boer, E. and C. Kruidenier (1990) "On ringing limits of the auditory periphery," Biol. Cybernet., Vol. 63, pp. 433-442.

de Boer, E. and H.R. de Jongh (1978) "On cochlear encoding: potentialities and limitations of the reverse correlation technique," J. Acoust. Soc. Am., Vol. 63, pp. 115-135.

de Boer, E. and P. Kuyper (1968) "Triggered correlation," IEEE Trans. Biomedical Eng., Vol. 15, pp. 169-179.

Dolan, T.R. and D.E. Robinson (1967) "Explanation of masking-level differences that result from interaural intensitive disparities of noise," Journal of the Acoustical Society of America, Vol. 42, pp. 977-981.

Domnitz, R. (1973) "The interaural time JND as a simultaneous function of interaural time and interaural amplitude," Journal of the Acoustical Society of America, Vol. 53, pp. 1549-1552.

Dror, I.E., M. Zagaeski, and C.F. Moss (1995) "Three-dimensional target recognition via sonar: a neural network model," Neural Networks, Vol. 8, No. 1, pp. 149-160.

Durlach, N.J. and H.S. Colburn (1978) "Binaural Phenomena," in E.C. Carterette and M.P. Friedman (Eds.), Handbook of Perception, Vol. IV, Hearing, Academic Press, 1978, New York.

Feddersen, W.E., T.T. Sandel, D.C. Teas, and L.A. Jeffress (1957) "Localization of high frequency tones," Journal of the Acoustical Society of America, Vol. 29, pp. 260-270.

Firestone, F.A. (1930) "The phase difference and amplitude ratio at the ears due to a source of pure tone," Journal of the Acoustical Society of America, Vol. 2, pp. 260-270.

Flanagan, J.L., E.E. David, Jr., and B.J. Watson (1964) "Binaural lateralization of cophasic and anti-phasic clicks," Journal of the Acoustical Society of America, Vol. 36, pp. 2184-2193.

Fletcher, R. (1975) Practical Methods of Optimization, John Wiley & Sons, New York.
Fukunaga, K. (1990) Introduction to Statistical Pattern Recognition (2nd ed.), Academic Press, 1990, Boston.

Gaik, W. (1993) "Combined evaluation of interaural time and intensity differences: Psychoacoustic results and computer modeling," Journal of the Acoustical Society of America, Vol. 94, pp. 98-110.

Geisler, C.D., W.S. Rhode, and D.W. Hazelton (1969) "Response of inferior colliculus neurons in the cat to binaural acoustic stimuli having wide-band spectra," J. Neurophysiol., Vol. 32, pp. 960-974.

Gelfand, G., J.C. Pearson, C.D. Spence, and W.E. Sullivan (1988) "Multisensor integration in biological systems," Proceedings of the IEEE International Symposium on Intelligent Control, pp. 147-153, 1988.

Goldberg, J.M. and P.B. Brown (1968) "Functional organization of the dog superior olivary complex: an anatomical and electrophysiological study," J. Neurophysiol., Vol. 31, pp. 639-656.

Goldberg, J.M. and P.B. Brown (1969) "Response of binaural neurons of dog superior olivary complex to dichotic tonal stimuli: some physiological mechanisms of sound localization," J. Neurophysiol., Vol. 32, pp. 613-636.

Grantham, D.W. (1984) "Interaural intensity discrimination: insensitivity at 1000 Hz," Journal of the Acoustical Society of America, Vol. 75, No. 4, pp. 1191-1194.

Grothe, B. and D.H. Sanes (1993) "Inhibition influences time difference coding by MSO neurons - an in vitro study," Assoc. Res. Otolaryngology Abstract, Vol. 16, p. 108.

Guinan, J.J., B.E. Norris, and S.S. Guinan (1972) "Single auditory units in the superior olivary complex: II. Location of unit categories and tonotopic organization," International Journal of Neuroscience, Vol. 4, pp. 147-166.

Guttman, N.A. (1962) "A mapping of binaural click lateralization," Journal of the Acoustical Society of America, Vol. 34, pp. 87-92.

Hall, J.L. (1974) "Two-tone distortion products in a nonlinear model of the basilar membrane," J. Acoust. Soc. Am., Vol. 56, pp. 1818-1828.

Han, Y. and H.S. Colburn (1993) "Point-neuron model for binaural interaction in MSO," Hearing Res., Vol. 68, pp. 115-130.

Harris, G.G. (1960) "Binaural interactions of impulsive stimuli and pure tones," Journal of the Acoustical Society of America, Vol. 32, pp. 685-692.

Helmholtz, H.L.F. (1862) On the Sensation of Tone, Dover Publications (1954), New York, pp. 406-410.

Henning, G.B. (1974a) "Detectability of interaural delay in high-frequency complex waveforms," Journal of the Acoustical Society of America, Vol. 55, pp. 84-90.

Henning, G.B. (1974b) "Lateralization and the binaural masking-level difference," Journal of the Acoustical Society of America, Vol. 55, pp. 1259-1262.

Henning, G.B. (1980) "Some observations on the lateralization of complex waveforms," Journal of the Acoustical Society of America, Vol. 68, pp. 446-454.

Henning, G.B. and J. Ashton (1981) "The effect of carrier and modulation frequency on lateralization based on interaural phase and interaural group delay," Hearing Research, Vol. 4, pp. 185-194.

Hershkowitz, R.M. and N.J. Durlach (1969) "Interaural time and amplitude jnd's for a 500 Hz tone," Journal of the Acoustical Society of America, Vol. 46, pp. 1464-1467.

Hestenes, M. (1980) Conjugate Direction Methods in Optimization, Springer-Verlag, New York, 1980.

Hirsch, J.A., J.C.K. Chan, and T.C.T. Yin (1985) "Responses of neurons in the cat's superior colliculus to acoustic stimuli: I. Monaural and binaural response properties," J. Neurophysiol., Vol. 53, pp. 726-745.

Imig, T.J. and H.O. Adrian (1977) "Binaural columns in the primary field (AI) of the cat auditory cortex," Brain Research, Vol. 138, pp. 241-257.
Jeffress, L.A. (1948) "A place theory of sound localization," Journal of Comparative and Physiological Psychology, Vol. 41, pp. 35-39.

Jeffress, L.A., H.C. Blodgett, and B.H. Deatherage (1952) "The masking of tones by white noise as a function of the interaural phases of both components: I. 500 cycles," Journal of the Acoustical Society of America, Vol. 24, pp. 523-527.

Johannesma, P.I.M. (1972) "The pre-response stimulus ensemble of neurons in the cochlear nucleus," in Proceedings of the Symposium on Hearing Theory, Eindhoven, The Netherlands, pp. 58-69.

Johansson, E.M., F.U. Dowla, and D.M. Goodman (1991) "Back-propagation learning for multi-layer feed-forward neural networks using the conjugate gradient method," International Journal of Neural Systems, Vol. 2, No. 4, pp. 291-302.

Johnson, D.H., A. Dabak, and C. Tsuchitani (1990) "Function-based modeling of binaural processing: Interaural level," Hearing Research, Vol. 49, pp. 301-320.

Johnstone, B.M., R. Patuzzi, and G.Y. Yates (1986) "Basilar membrane measurements and the traveling wave," Hear. Res., Vol. 22, pp. 147-153.

Kiang, N.Y.S. (1968) "A survey of recent developments in the study of auditory physiology," Annals of Otology, Rhinology, and Laryngology, Vol. 77, pp. 656-675.

Kiang, N.Y.S., T. Watanabe, E.C. Thomas, and L.F. Clark (1965) "Discharge patterns of single fibers in the cat's auditory nerve," Research Monograph 35, MIT Press, Cambridge, Massachusetts, 1965.

King, A.J. and A.R. Palmer (1983) "Cells responsive to free-field auditory stimuli in guinea-pig superior colliculus: distribution and response properties," J. Physiol. (London), Vol. 342, pp. 361-381.

Kitzes, L.M., K.S. Wrege, and J.M. Cassady (1980) "Patterns of responses of cortical cells to binaural stimulation," J. Comp. Neurol., Vol. 192, pp. 455-472.

Klumpp, R.G. and H.R. Eady (1956) "Some measurements of interaural time difference thresholds," Journal of the Acoustical Society of America, Vol. 28, pp. 859-860.

Knudsen, E.I. and M. Konishi (1978) "A neural map of auditory space in the owl," Science, Vol. 200, pp. 795-797.

Konishi, M. (1993) "Listening with two ears," Scientific American, April 1993, pp. 66-73.

Kuwada, S. and T.C.T. Yin (1983) "Binaural interaction in low frequency neurons in inferior colliculus of the cat: I. Effects of long interaural delays, intensity and repetition rate on the interaural delay function," J. Neurophysiol., Vol. 50, pp. 981-999.

Kuwada, S. and T.C.T. Yin (1987) "Physiological studies of directional hearing," in W.A. Yost and G. Gourevitch (Eds.), Directional Hearing, Springer-Verlag, New York, 1987.

Kuwada, S., T.C.T. Yin, J. Syka, T. Buunen, and R.E. Wickesberg (1984) "Binaural interaction in low frequency neurons in inferior colliculus of the cat: IV. Comparison of monaural and binaural response properties," J. Neurophysiol., Vol. 51, pp. 1306-1352.

Lazzaro, J. and C.A. Mead (1989) "A silicon model of auditory localization," Neural Computation, Vol. 1, pp. 47-57.

Leakey, D.M., B.McA. Sayers, and C. Cherry (1958) "Binaural fusion of low- and high-frequency sounds," Journal of the Acoustical Society of America, Vol. 30, p. 222.

Lehky, S.R. and T.J. Sejnowski (1990) "Neural network model of visual cortex for determining surface curvature from images of shaded surface," Proc. R. Soc. Lond., Vol. B240, pp. 251-278, 1990.

Levitt, H. and E.A. Lundry (1966) "Binaural vector model: Relative interaural time differences," Journal of the Acoustical Society of America, Vol. 40, p. 1251.

Licklider, J.C.R. (1959) "Three auditory theories," in E.S. Koch (Ed.), Psychology: A Study of a Science, Study 1, Vol. 1, McGraw-Hill, New York, 1959.
Licklider, J.C.R. and J.C. Webster (1950) "The discriminability of interaural phase relations in two-component tones," Journal of the Acoustical Society of America, Vol. 22, pp. 191-195.

Lim, C. and R.O. Duda (1994) "Estimating the azimuth and elevation of a sound source from the output of a cochlear model," Conference Record of the 28th Asilomar Conference on Signals, Systems, and Computers, Oct. 30 - Nov. 2, 1994, Pacific Grove, California, Vol. 1, pp. 309-403.

Lindemann, W. (1986a) "Extension of a binaural cross-correlation model by contralateral inhibition: I. Simulation of lateralization for stationary signals," Journal of the Acoustical Society of America, Vol. 80, pp. 1608-1622.

Lindemann, W. (1986b) "Extension of a binaural cross-correlation model by contralateral inhibition: II. The law of the first wave front," Journal of the Acoustical Society of America, Vol. 80, pp. 1623-1630.

Lippmann, R.P. (1989) "Review of neural networks for speech recognition," Neural Computation, Vol. 1, pp. 1-38.

Loeb, G., M. White, and M. Merzenich (1983) "Spatial cross-correlation: A proposed mechanism for acoustic pitch perception," Biol. Cybern., 1983, pp. 149-163.

Lyon, R.F. (1983) "A computational model of binaural localization and separation," Proc. ICASSP, Vol. 83, pp. 1148-1151.

Lyon, R.F. and C. Mead (1988) "An analog electronic cochlea," IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. 36, No. 7, July 1988, pp. 1119-1134.

McFadden, D. and E. Pasanen (1976) "Lateralization at high frequencies based on interaural time differences," Journal of the Acoustical Society of America, Vol. 59, pp. 634-639.

MacGregor, R.J. (1987) Neural and Brain Modeling, Academic Press, New York, 1987.

Mallock, A. (1908) "Note on the sensibility of the ear to the direction of explosive sound," Proceedings of the Royal Society (London), Series A, Vol. 80, pp. 110-112.

Maren, A.J., C.T. Harston, and R.M. Pap (1990) Handbook of Neural Computing Applications, Academic Press, San Diego, 1990.

McFadden, D. (1968) "Masking-level differences determined with and without interaural disparities in masker intensity," Journal of the Acoustical Society of America, Vol. 44, pp. 212-223.

Mead, C.A., X. Arreguit, and J. Lazzaro (1991) "Analog VLSI model of binaural hearing," IEEE Trans. Neural Networks, Vol. 2, No. 2, March 1991, pp. 230-236.

Melssen, W.J., W.J.M. Epping, and I.H.M. van Stokkum (1990) "Sensitivity for interaural time and intensity difference of auditory midbrain neurons in the grassfrog," Hearing Research, Vol. 47, pp. 235-256.

Middlebrooks, J.C. and E.I. Knudsen (1984) "A neural code for auditory space in the cat's superior colliculus," J. Neurosci., Vol. 4, pp. 2621-2634.

Mills, A.W. (1958) "On the minimum audible angle," Journal of the Acoustical Society of America, Vol. 30, pp. 237-246.

Mills, A.W. (1960) "Lateralization of high-frequency tones," Journal of the Acoustical Society of America, Vol. 32, pp. 132-134.

Mills, A.W. (1972) "Auditory localization," in J.V. Tobias (Ed.), Foundations of Modern Auditory Theory, Vol. 2, Academic Press, New York, 1972, pp. 303-348.

Moiseff, A. and M. Konishi (1981) "Neuronal and behavioral sensitivity to binaural time differences in the owl," J. Neuroscience, Vol. 1, pp. 40-48.

Moller, A.R. (1974) "Responses of units in the cochlear nucleus to sinusoidally amplitude-modulated tones," Exp. Neurol., Vol. 45, pp. 104-117.
Moore, B.C.J. and B.R. Glasberg (1983) "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns," J. Acoust. Soc. Am., Vol. 74, No. 3, September 1983, pp. 750-753.

Moushegian, G. and L.A. Jeffress (1959) "Role of interaural time and intensity differences in the lateralization of low-frequency tones," Journal of the Acoustical Society of America, Vol. 31, pp. 1441-1445.

Neti, C., E.D. Young, and M.H. Schneider (1992) "Neural network models of sound localization based on directional filtering by the pinna," J. Acoust. Soc. Am., Vol. 92, No. 6, 1992, pp. 3140-3156.

Nigrin, A. (1993) Neural Networks for Pattern Recognition, MIT Press, 1993, Cambridge, Mass.

Nordlund, B. (1962) "Physical factors in angular localization," Acta Oto-Laryngologica, Vol. 54, pp. 75-93.

Nuetzel, J.M. and E.R. Hafter (1976) "Lateralization of complex waveforms: effects of fine structure, amplitude, and duration," Journal of the Acoustical Society of America, Vol. 60, pp. 1339-1346.

Nuetzel, J.M. and E.R. Hafter (1981) "Lateralization of complex waveforms: spectral effects," Journal of the Acoustical Society of America, Vol. 69, pp. 1112-1118.

Osman, E. (1971) "A correlation model of binaural masking level differences," Journal of the Acoustical Society of America, Vol. 50, pp. 1494-1511.

Palmer, A.R. and A.J. King (1982) "The representation of auditory space in the mammalian superior colliculus," Nature, Vol. 299, pp. 248-249.

Palmieri, F., M. Datum, A. Shah, and A. Moiseff (1991) "Sound localization with a neural network trained with the multiple extended Kalman algorithm," Proceedings of the International Joint Conference on Neural Networks, 1991, Vol. 1, pp. 125-131.

Pao, Y.-H. (1989) Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, 1989, Reading, Mass.

Patterson, R., I. Nimmo-Smith, J. Holdsworth, and P. Rice (1988) "Implementing a gammatone filter bank," SVOS Final Report: The Auditory Filter Bank.

Patuzzi, R., P.M. Sellick, and B.M. Johnstone (1984) "The modulation of the sensitivity of the mammalian cochlea by low frequency tones: III. Basilar membrane motion," Hearing Research, Vol. 13, pp. 19-27.

Phillips, D.P. and D.R.F. Irvine (1981) "Responses of single neurons in physiologically defined area AI of the cat cerebral cortex: Sensitivity to interaural intensity differences," Hearing Research, Vol. 4, pp. 299-307.

Pickles, J.O. (1988) An Introduction to the Physiology of Hearing (2nd ed.), Academic Press, 1988, London.

Pinheiro, M.L. and H. Tobin (1969) "Interaural intensity differences in intracranial lateralization," Journal of the Acoustical Society of America, Vol. 46, pp. 1482-1487.

Powell, M. (1977) "Restart procedures for the conjugate gradient method," Mathematical Programming, Vol. 12, No. 2, pp. 241-254.

Rajan, R., L.M. Aitkin, and D.R. Irvine (1990b) "Azimuthal sensitivity of neurons in primary auditory cortex of cats: II. Organization along frequency-band strips," J. Neurophysiol., Vol. 64, No. 3, pp. 888-902.

Rajan, R., L.M. Aitkin, D.R. Irvine, and J. McKay (1990a) "Azimuthal sensitivity of neurons in primary auditory cortex of cats: I. Types of sensitivity and the effects of variations in stimulus parameters," J. Neurophysiol., Vol. 64, No. 3, pp. 872-887.

Rayleigh (J.W. Strutt, Lord Rayleigh) (1877) Theory of Sound, Macmillan, London (2nd ed., revised, 1894); reprinted by Dover, New York.

Rayleigh (J.W. Strutt, Lord Rayleigh) (1907) "On our perception of sound direction," Philosophical Magazine, Vol. 13, pp. 214-232.

Rayleigh (J.W. Strutt, Lord Rayleigh) (1945) The Theory of Sound, Vol. 2 (2nd ed.), Dover, 1945, New York.
Reed, M.C. and J.J. Blum (1990) "A model for the computation and encoding of azimuthal information by the lateral superior olive," Journal of the Acoustical Society of America, Vol. 88, No. 3, pp. 1442-1453.

Rhode, W.S. (1971) "Observations of the vibration of the basilar membrane using the Mossbauer technique," J. Acoust. Soc. Am., Vol. 49, pp. 1218-1231.

Robinson, D.E. and L.A. Jeffress (1963) "Effect of varying the interaural noise correlation on the detectability of tonal signals," Journal of the Acoustical Society of America, Vol. 35, pp. 1947-1952.

Robles, L., W.S. Rhode, and C.D. Geisler (1976) "Transient response of the basilar membrane measured in squirrel monkeys using the Mossbauer effect," J. Acoust. Soc. Am., Vol. 59, pp. 926-939.

Romano, A. (1977) Applied Statistics for Science and Industry, Allyn and Bacon, Inc., Boston, 1977.

Rose, J.E., N.B. Gross, C.D. Geisler, and J.E. Hind (1966) "Some neural mechanisms in the inferior colliculus of the cat which may be relevant to the localization of a sound source," J. Neurophysiol., Vol. 29, pp. 288-314.

Rowland, R.C. and J.F. Tobias (1967) "Interaural intensity difference limen," Journal of Speech and Hearing Research, Vol. 10, pp. 745-756.

Ruggero, M.A. and N.C. Rich (1991) "Application of a commercially-manufactured Doppler-shift laser velocimeter to the measurements of basilar-membrane vibration," Hear. Res., Vol. 51, pp. 215-230.

Rumelhart, D.E., G.E. Hinton, and R.J. Williams (1986) "Learning internal representations by error propagation," in D.E. Rumelhart and J.L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, pp. 318-362, MIT Press/Bradford Books, Cambridge, MA, 1986.

Russell, I.J. and P.M. Sellick (1978) "Intracellular studies of hair cells in the mammalian cochlea," J. Physiol., Vol. 284, pp. 261-290.

Russell, I.J. and P.M. Sellick (1983) "Low-frequency characteristics of intracellularly recorded receptor potentials in guinea-pig cochlear hair cells," J. Physiol., Vol. 338, pp. 179-206.

Sayers, B.McA. (1964) "Acoustic-image lateralization judgments with binaural tones," Journal of the Acoustical Society of America, Vol. 36, pp. 923-926.

Sayers, B.McA. and E.C. Cherry (1957) "Mechanism of binaural fusion in the hearing of speech," J. Acoust. Soc. Am., Vol. 29, pp. 973-986.

Sayers, B.McA. and F.E. Toole (1964) "Acoustic-image lateralization judgments with binaural transients," Journal of the Acoustical Society of America, Vol. 36, pp. 1199-1205.

Schechter, B.P., J.A. Hirsch, and T.C.T. Yin (1981) "Auditory input to cells in the deep layers of the cat superior colliculus," Soc. Neurosci. Abstracts, Vol. 7, 20.13.

Schroeder, M.R. (1977) "New viewpoints in binaural interactions," in E.F. Evans and J.P. Wilson (Eds.), Psychophysics and Physiology of Hearing, Academic Press, New York, 1977, pp. 455-567.

Schwartz, I.R. (1992) "The superior olivary complex and lateral lemniscal nuclei," in D.B. Webster, A.N. Popper, and R.R. Fay (Eds.), The Mammalian Auditory Pathway: Neuroanatomy, Springer, New York, 1992, pp. 117-167.

Shackleton, T.M., R. Meddis, and M.J. Hewitt (1992) "Across frequency integration in a model of lateralization," Journal of the Acoustical Society of America, Vol. 91, pp. 2276-2279.

Shamma, S.A., N. Shen, and P. Gopalaswamy (1989) "Stereausis: Binaural processing without neural delays," Journal of the Acoustical Society of America, Vol. 86, pp. 989-1006.

Shaw, E.A.G. (1974a) "Transformation of sound pressure level from the free field to the eardrum in the horizontal plane," Journal of the Acoustical Society of America, Vol. 56, pp. 1848-1861.
Shaw, E.A.G. (1974b) "The external ear," in W.D. Keidel and W.D. Neff (Eds.), Handbook of Sensory Physiology, Vol. V/1, Springer-Verlag, 1974, New York.

Shu, Z.J., N.V. Swindale, and M.S. Cynader (1993) "Spectral motion produces an auditory after-effect," Nature, Vol. 364, No. 6439, pp. 721-723.

Siebert, W.M. (1965) "Some implications of the stochastic behavior of primary auditory neurons," Kybernetik, Vol. 2, pp. 206-215.

Siebert, W.M. (1968) "Stimulus transformations in the peripheral auditory system," in P.A. Kolers and M. Eden (Eds.), Recognizing Patterns: Studies in Living and Automatic Systems, The MIT Press, Cambridge, Mass., 1968, pp. 104-133.

Siebert, W.M. (1970) "Frequency discrimination in the auditory system: Place and periodicity mechanisms?" Proceedings of the IEEE, Vol. 58, pp. 723-730.

Steinberg, J.C. and W.B. Snow (1934) "Physical factors in auditory perspective," Bell System Technical Journal, Vol. 13, pp. 245-258.

Stern, R.M., A.S. Zeiberg, and C. Trahiotis (1988) "Lateralization of complex binaural stimuli: A weighted-image model," Journal of the Acoustical Society of America, Vol. 84, No. 1, pp. 156-165.

Stevens, S.S. and E.B. Newman (1936) "Localization of actual sources of sound," Am. J. Psychol., Vol. 48, pp. 297-306.

Stillman, R.D. (1972) "Responses of high-frequency inferior colliculus neurons to interaural intensity differences," Experimental Neurology, Vol. 36, pp. 118-126.

Sujaku, Y., S. Kuwada, and T.C.T. Yin (1981) "Binaural interaction in the cat inferior colliculus: Comparison of the physiological data with a computer simulated model," in J. Syka and L. Aitkin (Eds.), Neuronal Mechanisms of Hearing, Plenum Press, New York, 1981, pp. 233-238.

Takahashi, T.T., C.H. Keller, and P. Janata (1993) "Resolving multiple sound sources in the owl's midbrain," Society for Neuroscience Abstracts, Vol. 19, 221.6.

Thomson, S.P. (1879) "The pseudophone," Philosophical Magazine, Vol. 8, pp. 385-390.

Toole, F.E. and B.McA. Sayers (1965a) "Lateralization judgments and the nature of binaural acoustic images," Journal of the Acoustical Society of America, Vol. 37, pp. 319-324.

Toole, F.E. and B.McA. Sayers (1965b) "Inferences of neural activity associated with binaural acoustic images," Journal of the Acoustical Society of America, Vol. 38, pp. 769-779.

van Bergeijk, W.A. (1962) "Variation on a theme of von Bekesy: A model of binaural interaction," Journal of the Acoustical Society of America, Vol. 34, pp. 1431-1437.

van Camp, D. (1993) Xerion Neural Network Simulator User's Guide, Dept. of Computer Science, University of Toronto.

Wang, L. and P.N. Denbigh (1993) "Monaural localization using combination of TDQ with back-propagation," Proceedings of the IEEE International Conference on Neural Networks, 1993, Vol. 1, pp. 187-190.

Watson, C.S. and B.T. Mittler (1965) "Time-intensity equivalence in auditory lateralization: A graphical method," Psychon. Science, Vol. 2, pp. 219-220.

Whitworth, R.H. and L.A. Jeffress (1961) "Time versus intensity in the localization of tones," Journal of the Acoustical Society of America, Vol. 33, pp. 925-929.

Wightman, F.L., D.J. Kistler, and M.E. Perkins (1987) "A new approach to the study of human sound localization," in W.A. Yost and G. Gourevitch (Eds.), Directional Hearing, Springer-Verlag, New York, 1987, pp. 26-48.

Wilson, H.A. and C.S. Myers (1908) "The influence of binaural phase difference in the localization of sound," British Journal of Psychology, 1908, pp. 363-385.
Wise, L.Z. and D.R.F. Irvine (1983) "Auditory response properties of neurons in deep layers of cat superior colliculus," J. Neurophysiol., Vol. 49, pp. 674-685.

Woodworth, R.S. (1938) Experimental Psychology, Holt, 1938, New York.

Yin, T.C.T. and J.C.K. Chan (1986) "Neural mechanisms underlying interaural time sensitivity to tones and noise," in G.M. Edelman and W.E. Gall (Eds.), Functions of the Auditory System, John Wiley and Sons, New York, 1986.

Yin, T.C.T. and J.C.K. Chan (1990) "Interaural time sensitivity in medial superior olive of cat," J. Neurophysiol., Vol. 64, pp. 465-488.

Yin, T.C.T. and S. Kuwada (1983) "Binaural interaction in low frequency neurons in inferior colliculus of the cat: II. Effects of changing rate and direction of interaural phase," J. Neurophysiol., Vol. 50, pp. 1000-1019.

Yin, T.C.T. and S. Kuwada (1984) "Neuronal mechanisms of binaural interaction," in G.M. Edelman, W.C. Cowan, and W.E. Gall (Eds.), Dynamic Aspects of Neocortical Function, pp. 263-313, John Wiley and Sons, 1984, New York.

Yin, T.C.T., J.A. Hirsch, and J.C.K. Chan (1985) "Responses of neurons in the cat's superior colliculus to acoustic stimuli: II. A model of interaural intensity sensitivity," J. Neurophysiol., Vol. 53, pp. 746-758.

Yin, T.C.T., S. Kuwada, and Y. Sujaku (1984) "Interaural time sensitivity of high-frequency neurons in the inferior colliculus," Journal of the Acoustical Society of America, Vol. 76, pp. 1401-1410.

Yost, W.A. (1974) "Discrimination of interaural phase differences," Journal of the Acoustical Society of America, Vol. 55, pp. 1299-1303.

Yost, W.A. (1981) "Lateral position of sinusoids presented with interaural intensive and temporal differences," Journal of the Acoustical Society of America, Vol. 70, No. 2, pp. 397-409.

Yost, W.A. and E.R. Hafter (1987) "Lateralization," in W.A. Yost and G. Gourevitch (Eds.), Directional Hearing, Springer-Verlag, 1987, New York, pp. 49-84.

Yost, W.A. and G. Gourevitch (Eds.) (1987) Directional Hearing, Springer-Verlag, 1987, New York.

Yost, W.A. and R.H. Dye (1987) "Discrimination of interaural differences of level as a function of frequency," Journal of the Acoustical Society of America.

Yost, W.A., F.L. Wightman, and D.M. Green (1971) "Lateralization of filtered clicks," Journal of the Acoustical Society of America, Vol. 50, pp. 1526-1531.

Zakarauskas, P. and M.S. Cynader (1993) "A computational theory of spectral cue localization," J. Acoust. Soc. Am., Vol. 94, No. 3, Pt. 1, pp. 1323-1331.

Zurek, P.M. (1987) "The precedence effect," in W.A. Yost and G. Gourevitch (Eds.), Directional Hearing, Springer-Verlag, 1987, New York, pp. 85-105.

Zwislocki, J. and R.S. Feldman (1956) "Just noticeable differences in dichotic phase," Journal of the Acoustical Society of America, Vol. 28, pp. 860-864.
