UBC Theses and Dissertations


A warning signal identification system (WARNSIS) for the hard of hearing and the deaf 1989


A WARNING SIGNAL IDENTIFICATION SYSTEM (WARNSIS) FOR THE HARD OF HEARING AND THE DEAF

Kwok Wing Chau
B.A.Sc., University of Windsor

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF ELECTRICAL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July 1989
© Kwok Wing Chau, 1989

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada
Date: July 31, 1989

Abstract

The objective of this project has been to design a reliable warning sound recognition system for hard of hearing and deaf people. Commercially available auditory warning devices use simple technologies, which are not able to produce the performance required. The demand for a versatile WARNing Signal Identification System (WARNSIS) that satisfies the needs of hard of hearing and deaf individuals has been well established. This WARNSIS must be "teachable" in order to cope with the many different sounds, and diverse noisy environments. Relevant sounds are telephone rings, sirens, and smoke and fire alarms, and noise includes all other sounds including radio-music, conversation, machinery, etc.
In the absence of published data, we studied extensively both timing and spectral characteristics of warning sounds. We found that the average short-time absolute amplitude of warning sounds is useful in providing timing information, and that the short-time spectra yield characteristic patterns for signal classification.

The WARNSIS operates in real-time, and embodies two parts: the timing analyzer and the spectral recognizer. The timing analyzer continuously monitors the variations of environmental sounds, from which important timing features are derived. If a potential warning sound is detected, the spectral recognizer is activated to analyze its spectral patterns. When these patterns match one of the learned and pre-stored templates, a warning sound is identified with the known warning sound associated with that template. An advantage of such a recognition scheme is that it avoids unnecessary and computationally intensive spectral analysis work when only noise is present.

Evaluation results show that the WARNSIS can reliably recognize warning sounds in random noise with no false alarms. In loud music and conversation backgrounds the WARNSIS can still achieve a high recognition rate, but more false alarms are generated. In household environments where conditions are less demanding than our evaluation criteria, our system is expected to produce very satisfactory results. Since the WARNSIS can be taught to learn and recognize new warning sounds, it may be used in other applications such as noisy industrial sites and traffic light control.
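The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the non-overlapping framing, and the fixed `level_threshold` used to flag a "potential warning sound" are assumptions made for illustration, and the spectral stage is passed in as a placeholder callable.

```python
def staaa(samples, frame_len):
    """Short-time average absolute amplitude (STAAA) of consecutive,
    non-overlapping frames (the framing is an assumption of this sketch)."""
    n_frames = len(samples) // frame_len
    return [
        sum(abs(x) for x in samples[k * frame_len:(k + 1) * frame_len]) / frame_len
        for k in range(n_frames)
    ]


def identify(samples, frame_len, level_threshold, spectral_recognizer):
    """Run the costly spectral stage only when the timing stage flags a candidate."""
    contour = staaa(samples, frame_len)
    # Assumed criterion: any frame whose STAAA exceeds a fixed level is a candidate.
    if not any(level > level_threshold for level in contour):
        return None  # only low-level noise: spectral analysis is skipped
    return spectral_recognizer(samples)
```

The point of the gating is the one stated in the abstract: when the timing stage sees only noise, the spectral recognizer is never invoked.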
Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement

1 Introduction
1.1 Background
1.2 Auditory Warning Aids for Hearing Impaired Persons
1.2.1 Hard-wired Systems
1.2.2 Threshold Detector Systems
1.2.3 Hearing Ear Dogs
1.3 Project Objectives
1.4 Thesis Outline

2 Warning Sounds and Generating Devices
2.1 Types of Warning Signal Generating Devices
2.2 Industrial Standards for Warning Devices
2.2.1 Sound Output Power
2.2.2 Frequency Specification
2.3 Literature on Warning Sound Characteristics
2.3.1 Telephone Rings
2.3.2 Smoke Detector Alarm Sounds
2.3.3 Warning and Alarm Sounds Generated by Vehicles and Traffic Control Devices
2.4 The Emerging Scientific Basis for Generating Warning Sounds
2.4.1 A Generic Warning Sound Generating Scheme

3 Measurement and Analysis of Timing & Spectral Characteristics
3.1 Timing Characteristics
3.1.1 A PC-Based Data Acquisition System
3.1.2 Data Collection
3.1.3 Timing Features of Different Warning Sounds
3.2 Spectral Characteristics
3.2.1 Comparison of Parametric and Nonparametric Spectral Estimation Methods
3.2.2 Welch's Non-overlapping Spectral Estimation Method
3.2.3 Implementation of Welch's Method
3.2.4 Data Collection
3.2.5 Spectra of Warning Sounds Generated by Various Warning Devices
3.2.6 Summary

4 Solutions to the Recognition Problem
4.1 Pattern-Recognition Model for Signal Identification
4.2 Review & Evaluation of Signal Recognition Techniques
4.2.1 Analyzing & Utilizing Timing Features
4.2.2 Feature Extraction by Filter Banks
4.2.3 The LPC/AR Model
4.2.4 LPC-derived Cepstral Coefficients
4.2.5 The Hidden Markov Model (HMM) Approach
4.3 Overview of the Recognition Scheme for WARNSIS
4.4 Extracting & Classifying Timing Information
4.4.1 A Scheme to Extract Timing Features
4.5 Extracting Spectral Information
4.5.1 Feature Extraction
4.5.2 Dynamic Time Warping (DTW)

5 Design & Implementation
5.1 Timing Analyzer
5.1.1 Microphone
5.1.2 Analog Signal Conditioner
5.1.3 Control & Timing Processor (CTP)
5.2 Spectral Recognizer (SR)
5.2.1 The Hybrid Analog Processor (MC4760)
5.2.2 Feature Extraction and Pattern Matching Processor (µPD7761)
5.2.3 The Control Processor (µPD7762)
5.2.4 Pattern Memory
5.3 Software Program
5.3.1 The Command Set of the Spectral Recognizer
5.3.2 Initialization Stage
5.3.3 Training Stage
5.3.4 Recognition Stage

6 Evaluation
6.1 Average Recognition Accuracies
6.2 False-alarm Rates
6.3 Discussion
6.3.1 Average Recognition Accuracies
6.3.2 False-alarm Rates

7 Conclusions and Recommendations
7.1 Summary & Conclusions
7.2 Recommendations for Future Directions of Research

References

Appendices
A Formulation of Relationship between SNR and SPL measurements
B Format of the command set of the SR
C Software Operating Manual of The WARNSIS
C.1 Program Files
C.2 Interactive Operations
C.2.1 Initialization Stage
C.2.2 Training/Recognition Stage
D Evaluation Results
D.1 The Complete WARNSIS
D.1.1 Recognition Results with Background Steady Noise
D.1.2 Recognition Results with Background of FM Broadcast plus Steady Noise
D.1.3 Recognition Results with Background of AM Broadcast plus Steady Noise
D.1.4 Results of phone ring recognition with minimum burst duration (MBD) set to 1.024 sec
D.1.5 Results of the False-alarm Tests for the complete WARNSIS
D.2 Timing Analyzer Part Alone
D.2.1 Recognition Results with Background Steady Noise
D.2.2 Recognition Results with Background of FM Broadcast Plus Steady Noise
D.2.3 Recognition Results with Background of AM Broadcast Plus Steady Noise
D.3 False-alarm Results for the Timing Analyzer Alone
D.4 Spectral Recognizer Part Alone
D.4.1 Recognition Results with Background Steady Noise
D.4.2 Recognition Results with Background of FM Broadcast plus Steady Noise
D.4.3 Recognition Results with Background of AM Broadcast plus Steady Noise
D.4.4 Results of false-alarm tests for the spectral recognizer part alone
E Specifications

List of Tables

2.1 Spectral analysis results for different smoke detectors [13]
2.2 Summary of spectral analysis results for traffic alarm sounds [14]
3.3 Instantaneous and short-time signal amplitudes
5.4 Parameters used for the Timing Analyzer
6.5 A summary of recognition results with MBD set to 0.1024 sec
6.6 A summary of recognition results with MBD set to 1.024 sec
6.7 Results of the false-alarm test with MBD set to 0.1024
6.8 Results of false-alarm test with MBD set to 1.024 sec
A.9 Tabulation of SPL reading difference and SNR
B.10 Format of command set of SR
B.11 Legal Values for parameters of the command set
B.12 Interpretation of status output codes from µPD7762
D.13 "Numbers" assigned for different warning sounds
D.14 Confusion matrix for recognition results generated by the complete WARNSIS in the presence of steady noise
D.15 Recognition rates of burst-type sounds under steady noise condition
D.16 Recognition rates of steady sounds generated by the complete WARNSIS under steady noise condition
D.17 Confusion matrix for phone ring recognition generated by the complete WARNSIS under steady noise condition
D.18 Recognition rates of phone ring generated by the complete WARNSIS under steady noise condition
D.19 Confusion matrix for recognition results generated by the complete WARNSIS in the presence of FM broadcast plus steady noise
D.20 Recognition rates of burst-type sounds produced by the complete WARNSIS under FM broadcast plus steady noise condition
D.21 Recognition rates of steady sounds generated by the complete WARNSIS under FM broadcast plus steady noise condition
D.22 Confusion matrix for recognition results generated by the complete WARNSIS in AM broadcast plus steady noise background
D.23 Recognition rates of burst-type sounds generated by the complete WARNSIS in AM broadcast plus steady noise environment
D.24 Recognition rates of steady sounds generated by the complete WARNSIS in AM broadcast plus steady noise background
D.25 Confusion matrix for phone ring recognition generated by the complete WARNSIS under the condition of FM broadcast and the steady noise with MBD set to 1.024 sec
D.26 Results of recognition rates of phone rings generated by the complete WARNSIS in FM broadcast plus the steady noise background
D.27 Confusion matrix for the results of phone ring recognition generated by the complete WARNSIS in the presence of AM broadcast plus the steady noise with MBD set to 1.024 sec
D.28 Results of phone ring recognition rates generated by the complete WARNSIS in the presence of AM broadcast plus steady noise with MBD set to 1.024 sec
D.29 Results of the false-alarm tests for the complete WARNSIS with MBD set to 0.1024 sec
D.30 Results of the false-alarm tests for the complete WARNSIS with MBD set to 1.024 sec
D.31 Confusion matrix for warning sound recognition generated by the timing analyzer alone in the presence of steady noise
D.32 Recognition rates of the timing analyzer part alone in the presence of steady noise
D.33 Confusion matrix for warning sound recognition generated by the timing analyzer part alone in the presence of FM broadcast plus steady noise
D.34 Recognition rates of the timing analyzer part alone in the presence of FM broadcast plus steady noise
D.35 Confusion matrix for warning sound recognition generated by the timing analyzer part alone in the presence of AM broadcast plus steady noise
D.36 Recognition rates of the timing analyzer part alone in the presence of AM broadcast plus steady noise
D.37 False-alarm test results of the timing analyzer part alone with MBD set to 0.1024 sec
D.38 False-alarm test results of the timing analyzer part alone with MBD set to 1.024 sec
D.39 Confusion matrix for warning sound recognition generated by the spectral recognizer part alone in the presence of steady noise
D.40 Results of steady sound recognition rate generated by the spectral recognizer part alone in steady noise background
D.41 Results of burst-type sound recognition rates produced by the spectral recognizer part alone in steady noise background
D.42 Results of phone ring recognition rate produced by the spectral recognizer part alone in steady noise background
D.43 Confusion matrix for the results of warning sound recognition generated by the spectral recognizer part alone in FM broadcast and steady noise background
D.44 Results of steady sound recognition rate produced by the spectral recognizer part alone in FM broadcast plus steady noise background
D.45 Results of burst-type sound recognition rates produced by the spectral recognizer part alone in FM broadcast plus steady noise background
D.46 Results of phone ring recognition rates produced by the spectral recognizer part alone under FM broadcast plus steady noise condition
D.47 Confusion matrix for warning sound recognition generated by the spectral recognizer part alone under AM broadcast plus steady noise condition
D.48 Results of steady sound recognition rates produced by the spectral recognizer part alone under AM broadcast plus steady noise condition
D.49 Results of burst-type sound recognition rate produced by the spectral recognizer part alone in the presence of AM broadcast plus steady noise
D.50 Results of phone ring recognition rate produced by the spectral recognizer part alone in the presence of AM broadcast plus steady noise
D.51 False-alarm tests for the spectral analyzer part alone

List of Figures

2.1 Auditory Warning Sound Components [17,21,23]
3.2 Signal acquisition and derivation of instantaneous absolute signal amplitudes
3.3 Flowchart of procedure to accumulate and store 1000 samples
3.4 Experimental set-up for data collection
3.5 Short-time average absolute amplitudes (STAAA) of siren sounds: a) J1: Burglar alarm (JDS-100); b) J2: MPI-11; c) J3: JDS-100 I; and d) J4: HI-LO
3.6 Short-time average absolute amplitudes (STAAA) of siren sounds: a) J5: High steady sound; b) J6: Pulser; c) J7: Steady horn; and d) J8: Electronic Synthesized Bell sound
3.7 Short-time average absolute amplitudes (STAAA) of telephone rings and smoke alarm sound: a) Electro-mechanical Ringer; b) Electronic Ringer; and c) Smoke alarm sound
3.8 Short-time average absolute amplitudes (STAAA) of radio broadcasts: a) Pop music; b) Speech; and c) Rock music
3.9 Short-time average absolute amplitudes (STAAA) of siren sounds with radio-broadcast as background: a) J1; b) J2; c) J3; and d) J4
3.10 Short-time average absolute amplitudes (STAAA) of different siren sounds with same background noise: a) J5; b) J6; c) J7; and d) J8
3.11 Spectrogram of the minimum 4-sample Blackman-Harris window, where PSD denotes power spectral density
3.12 Flowchart of the spectral analysis program
3.13 Short-time spectra of an electromechanical ringer
3.14 (a): Spectra of an electromechanical ringer with seven loudness settings
3.14 (b): Spectra of another electromechanical ringer with seven loudness settings
3.15 Long-time averaged spectra of five electromechanical ringers
3.16 Short-time averaged spectra of a multiple-line telephone
3.17 Effects of steady fan noise on telephone ring spectra
3.18 (a): Short-time spectra of electronic rings with pitch set at position one
3.18 (b): Short-time spectra of electronic rings with pitch set at position two
3.18 (c): Short-time spectra of electronic rings with pitch set at position three
3.18 (d): Short-time spectra of electronic rings with pitch set at position four
3.19 Spectra of Rapid Yelp sound
3.20 Spectra of Conventional Yelp sound
3.21 Spectra of Low-Hi sweep sound
3.22 Spectra of European Hi-Low sound
3.23 Spectra of Hi-Frequency Steady sound
3.24 Spectra of Pulsating Horn sound
3.25 Spectra of Steady Horn sound
3.26 Spectra of Electronic Synthesized Bell sound
4.27 Classic Signal Recognition Scheme [37,38]
4.28 The 'hybrid' recognition scheme for WARNSIS
4.29 Block diagram of the Timing Feature Extractor
4.30 Relationships between the instantaneous energy and the instantaneous absolute amplitudes of a sequence, x(n). (a): the plot of x(n); (b): the plot of |x(n)|; and (c): the plot of x^2(n)
4.31 (a): The STAAA contour of a steady sound; (b): The STAAA contour of a burst-type sound
4.32 Two typical examples of how the dynamic amplitude threshold adapts to acoustic energy variations of the environment. (a): sudden decrease in signal levels; (b): sudden increase in signal levels
4.33 (a): Detection of a steady sound; (b): An illustration of how the scheme rejects a non-steady sound
4.34 A demonstration of the use of the MBD and MIAT to refine the basic warning sound analysis scheme
4.35 Flowchart of the Timing Feature Extraction Scheme
4.36 Filter-bank analysis of Warning sounds
4.37 An example of pattern matching between a reference template and an unknown pattern
4.38 Local path constraints for DTW
5.39 The building blocks of WARNSIS
5.40 Block diagram of MC4760
5.41 Block diagram of the functional operation of µPD7761
5.42 Timing relationships associated with the synchronization of the spectral recognizer to burst-type warning signals, where STAAA is the short-time average absolute amplitude of signal; RP is the repetition period; ASBW is the average signal burst width, and SR is the spectral recognizer
5.43 Flowchart of the training scheme for steady sounds
5.44 Flowchart of training procedures for burst-type warning sounds
5.45 Flowchart of the recognition procedure
6.46 An example of a phone ring sequence added with nonstationary background noise

Acknowledgement

I would like to thank my supervisor, Dr. C. A. Laszlo for his patience, encouragement, and input during this project. I am greatly indebted to my colleagues, Darrell Wong and Sammy Yick for their invaluable discussions and advice. Special thanks are due to Angela Choi and Michael Slawnych for their comments and suggestions to improve the presentation of this thesis.
Finally, very deep gratitude is directed to my family for their generous financial support. This project was funded by Natural Sciences and Engineering Research Council of Canada grant A67012.

Chapter 1 Introduction

1.1 Background

Auditory communication is vital to normal life. Such communication often focuses on speech which is one of the most effective means of conveying ideas, opinions or information among people. Auditory communication also plays an important role in associating people with their environment. In particular, auditory warnings are of great importance. Such warnings include baby cries, telephone rings, doorbells, door knocks, fire or smoke alarm bells, burglar alarms, car horns, sirens, and electronic buzzers commonly used in household appliances and office equipment.

Generally, auditory warnings are achieved by special sounds. Firstly, warning sounds are usually loud, strident and insistent to effectively cut through speech and background noises, and to command people's attention. Secondly, different warning sounds convey different "messages" which demand responses of varying urgency. Some warning sounds are used to "announce" a condition, or an event; for example, an incoming telephone call, or a visitor at a door. Other warning sounds alert people to potential life-threatening situations such as a fire, or intruders inside a house. Failure to respond to these warning sounds may result in serious harm.

Unfortunately, hearing-disabled people have difficulty in hearing warning sounds and in many cases cannot hear even very loud alarms. This problem extends to many different situations of everyday life. For such individuals, many common household sounds go undetected (sounds produced by oven buzzers, bathroom fans, stove hood fans, or running water) causing inconvenience and occasional danger in homes.
In noisy environments, hearing-disabled individuals cannot discriminate different types of sounds. For example, they cannot hear the sounds that indicate automobile malfunctions such as worn brakes, bad wheel bearings, or noisy mufflers.

In addition, hard of hearing individuals who wear hearing aids can only detect warning signals if their hearing aids are operating and are sensitive enough. Specifically, unless the hearing aid is worn during sleep, hard of hearing people usually cannot hear the sound of burglar alarms, or fire and smoke alarm bells. Furthermore, in tornado-prone states of the U.S. (Kansas, Texas and Arkansas), the general public is usually alerted of approaching tornadoes by loud siren sounds. Missing such warning sounds can be fatal! But hearing-disabled people often cannot hear such sounds, and their utmost concern and their urgent need for special devices to warn of such impending disasters have been forcefully stated [1].

Indeed, the invisible disability of deaf and hard of hearing people creates serious inconveniences, frustrations, fears, and hazards in their daily life. In particular, the vulnerability to missing auditory warnings contributes significantly to the lack of mobility, independence, and security of hearing-disabled persons. In response to the obvious need to help hearing-disabled people to cope with this problem, a number of special alert aids have been designed and marketed.

1.2 Auditory Warning Aids for Hearing Impaired Persons

A range of systems, signalling and wake-up devices are currently available to alert hearing impaired individuals to telephone rings, doorbells, door knocks, fire or smoke alarm bells, and general emergency signals in diverse environments [1,2,3,4]. Some systems are simple sound amplitude amplification devices, which increase the volume of warning sounds to a level detectable by hearing aid wearers.
Other, more sophisticated systems, are capable of driving external visual modules and tactile actuators. Three major types of auditory warning aids for the hearing-disabled are in use: directly activated hard-wired systems, acoustic threshold detector systems, and hearing ear dogs.

1.2.1 Hard-wired Systems

Such systems require direct electrical connection to sound generating sources. They are reliably activated by the electric signal that drives the warning sound generator, and alert the hearing-disabled by either flashing lights, or by vibratory actuators. To increase the operational range, and to eliminate the need for long cables, an intermediate AM or FM transmitter can be integrated into such systems. Single or multiple remote receivers distributed throughout the home or office can pick up the transmitted signal, and subsequently turn on actuators.

A characteristic example of such systems is the Sonic Alert, which will produce light flashes to alert the hearing impaired to telephone calls. The device can be used with any telephone, and is easily installed by plugging it into any modular telephone jack and electrical outlet. Both the plug-in and a remote radio-transmitter version are available from the Special Needs Department of the British Columbia Telephone Company.

Some hard-wired devices are simple enough to be installed by users without extensive electronic skills (e.g., Sonic Alert). Other, more sophisticated devices, are custom designed, and require permanent installation by a technician at a considerable cost. As reported, these custom designed devices often must be left behind when hearing-disabled individuals move from house to house [1]. In addition, as the number of sound generating devices increases in homes or offices, the cost of hard-wired systems escalates due to both the wiring required, and the increased complexity.
Finally, before any remote warning device is installed, hearing-disabled people have to check if there are similar remote systems installed in neighboring houses. Due to "cross-talk", such systems in close proximity are very prone to generating false warnings.

1.2.2 Threshold Detector Systems

Since warning devices produce sounds that are louder than normal environmental sound levels, threshold detector systems are designed to respond to changes in loudness. Instead of direct connection to sound generating sources, threshold devices employ a microphone, or special electromagnetic field sensor for signal acquisition. With sensitivity adjustment, a threshold device can be adapted to operate with various types of alarms, for example horns, sirens, and telephones, under different acoustic conditions. When the signal level from any source exceeds the preset threshold value of the system, such a device will automatically activate the actuator to alert a hearing impaired individual.

Since these devices cannot selectively identify the sources of the loud sounds, in acoustic systems the microphone is positioned in close proximity to the warning sound generator for maximum system sensitivity and selectivity to the desired inputs. A hearing impaired person can adjust the device sensitivity according to the acoustic background noise level. Such a device is simple to operate, and is used to monitor crying babies, telephones, doorbells, and burglar or smoke alarms.

While threshold devices are generally more flexible than hard-wired systems, proper setting of the device sensitivity is frustrating to many users. Adjusted too high, the device is likely to miss the occurrence of warning sounds. A low threshold setting makes the device vulnerable to false triggering.
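The sensitivity trade-off of an acoustic threshold detector can be made concrete with a minimal Python sketch. The function name and the level readings are hypothetical values chosen for illustration; they are not measurements from this work.

```python
def threshold_triggers(levels_dba, threshold_dba):
    """Return the indices of level readings that exceed the preset threshold."""
    return [i for i, level in enumerate(levels_dba) if level > threshold_dba]


# Hypothetical readings (dBA): quiet room, door slam, quiet, smoke alarm, quiet.
readings = [45, 78, 47, 85, 46]

too_high = threshold_triggers(readings, 90)  # threshold above the alarm level
too_low = threshold_triggers(readings, 70)   # threshold below the door slam level
print(too_high)  # [] : the alarm at index 3 is missed
print(too_low)   # [1, 3] : the door slam triggers as well as the alarm
```

A setting between the two loud events would work for these particular readings, but only because the door slam happens to be quieter than the alarm; a detector that looks at loudness alone has no way to tell the two sources apart.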
Threshold detection systems using electromagnetic field sensing detect only warning sounds emitted by electromechanical actuators, for example telephones and doors equipped with electromechanical bells. When an electromechanical bell is activated, a strong time-varying electromagnetic field is produced to activate an internal electromechanical vibrating system. Consequently, this vibration generates a loud sound. For the purpose of warning sound detection, the stray electromagnetic field emitted by many devices may be utilized. For example, electromagnetic field pickup coils may be attached with a suction cup to the telephone or bell housing to intercept part of the time-varying magnetic field. The output of the pickup coil is amplified and fed to an appropriate threshold detection circuit. To alert hearing disabled individuals, such systems provide outlets for lamps and external vibratory actuators.

Since some warning devices are usually installed out of reach inside houses and offices (for example, fire alarms), the installation of the field pickup coils may be difficult. Due to low signal levels, special care is needed in handling the wiring connection between the pick-up coil and the threshold detector circuit. In addition, many newer appliances use solid-state buzzers which do not generate any magnetic field. Nevertheless, electromagnetic field sensing is a reliable method if employed under the appropriate circumstances.

1.2.3 Hearing Ear Dogs

While a Hearing Ear dog is not a technological device, it is included here to underscore the seriousness of the problem, and the complex and expensive solutions that are being offered. The Hearing Ear dog program was originally funded by the U.S. Government to meet the special needs of hard of hearing and deaf people. An affiliated program was established in Ontario, Canada and is named the Hearing Ear Dogs of Canada.
Only mature hearing-disabled individuals are qualified recipients of Hearing Ear dogs. In the U.S., the expenditure involved in dog selection, veterinary care, housing, training, and placement are fully subsidized by the U.S. Government. Hearing Ear dogs are trained to alert their owners to warning sounds commonly found in the living environment. Dogs chosen from pet adoption offices are extensively screened prior to the rigorous four to five months of training. During this training, the Hearing Ear dog learns obedience, and how to respond to sounds emitted by household appliances and warnings. The Hearing Ear dogs can reliably recognize warning sounds they are trained for, and will skillfully alert their owners. In addition, the Hearing Ear dog usually is an ideal companion for elderly people.

The Hearing Ear dog approach to the problem also has some negative aspects. The training and dog placement processes are lengthy and costly, and the program often has a very long list of applicants wanting dogs. Moreover, since the training of Hearing Ear dogs requires special skills, once a placement is made recipients cannot teach their dogs to learn new warning sounds. The maintenance of the dogs is a costly proposition, and their transportation also creates problems. Furthermore, the presence of animals is not always tolerated in offices, hotels and other public places.

1.3 Project Objectives

Existing auditory warning aids for hearing-disabled people suffer from various functional deficiencies. Such deficiencies include lack of portability, lack of flexibility in recognizing warning sounds, and the propensity for false-alarms. In a recent survey [1], hearing-disabled people have expressed their desire for personal warning sound recognition systems which are easy to operate, and which are able to distinguish different household warning and emergency sounds.
The demand for a versatile WARNing Signal Identification System (WARNSIS) which satisfies such needs is well established.

Motivated by this demand, by recent advances in speech recognition technology, and by the availability of specialized VLSI processors, it has been our objective to develop a real-time, adaptive WARNSIS which meets the following design criteria:

1. To be "teachable", which means that the device must be able to learn new warning sounds, and recognize them after a training procedure;
2. Have a recognition performance that is similar to that of normally hearing adults in very noisy environments; and
3. To produce acceptable positive and negative false-alarm rates in use.

In order to achieve this goal, work was undertaken to:

1. Investigate the characteristics of the warning sounds commonly used in office and living environments;
2. Utilize the results obtained in 1. to develop a recognition technique which has high reliability under noisy conditions;
3. Implement a prototype WARNSIS embodying the recognition technique developed in 2; and
4. Evaluate its overall performance in different noisy environments.

1.4 Thesis Outline

In Chapter 2 the literature on the various warning devices is reviewed. Industrial standards for the output power and spectral characteristics of warning sound generators are also discussed. Chapter 3 investigates the timing and spectral features of some common auditory warning sounds. Chapter 4 reviews different speech recognition techniques, with detailed discussion of the filter-bank approach used in this work. The details of our WARNSIS implementation are presented in Chapter 5, and the evaluation of the system performance is contained in Chapter 6. Chapter 7 gives the conclusion and recommendations for further improvement in system performance.
Chapter 2

Warning Sounds and Generating Devices

2.1 Types of Warning Signal Generating Devices

Devices which generate audible warning signals employ either electro-mechanical or solid-state transducers. Electro-mechanical warning devices generally include a metallic gong, hammer and coil assembly. To activate such a device, its coil is electrically energized, causing the hammer to vibrate and to strike the gong. The tonal quality and loudness of these devices depend upon the various components in the electro-mechanical assembly, such as the shape and size of the gong(s), the force with which the hammer strikes the gong(s), and the mounting and housing enclosure. In addition, in the manufacturing process, the mechanical components are assembled with fairly large tolerances. Therefore, the characteristics of the sound generated by such devices vary significantly, even for different units of the same model.

In the devices which employ solid-state transducers, warning sounds are produced by applying electric voltage waveforms to these components. The tonal quality and loudness of such devices depend on the characteristics of these waveforms, and on the frequency response of the transducers. The waveforms are produced by electronic circuits, and therefore their characteristics can be easily manipulated. Since transducers are manufactured to close tolerances, the characteristics of the sounds generated by these electronic warning devices vary very little, even for different units of the same model.

2.2 Industrial Standards for Warning Devices

2.2.1 Sound Output Power

Conceptually, warning sounds should be sufficiently loud to attract the attention of people in the vicinity of the warning device.
Based on this concept, various standards organizations¹ have established recommendations for the sound output power of smoke alarm detectors [5], household fire warning and burglar alarm systems [6], vehicle alarm systems [7], telephone rings [8,9,10] and general audible signalling devices used for life safety and property protection [11]. In general, it is recommended that in non-industrial environments an auditory warning device operated at rated voltage, and mounted in its intended position, be capable of providing an output sound pressure level (SPL) of at least 85 dBA (with reference to 20 µPa) measured at a distance of 10 feet from the device [12].

More specifically, the minimum recommended SPL for warning devices depends on the environment where these devices are installed. If the warning devices are used in public places, an SPL at least 15 dBA above the average ambient sound level is required. If the devices are intended to be used in private residences, they should produce an SPL at least 10 dBA above the average ambient sound level [11].

2.2.2 Frequency Specification

Our survey of the publications of five major standards associations led us to conclude that no specific guidelines on the frequency content of general warning sounds have been established. The only exception is the telephone, whose required acoustic output power and frequency content are specified by the CSA, EIA, ANSI and Bell Laboratories.

¹ The Canadian Standards Association (CSA), the Electronic Industries Association (EIA), the Underwriters Laboratories Incorporated (UL), the American National Standards Institute (ANSI), and the National Fire Protection Association (NFPA) of the U.S.

2.3 Literature on Warning Sound Characteristics

2.3.1 Telephone Rings

Telephone ringers are designed to produce easily recognizable alerting sounds.
The available standards are applicable to telephones with electro-mechanical, or bell-type, alerting ringers, and with modern electronic tone ringers [8,9,10]. The important performance characteristics specified by these standards are summarized for our purposes as follows:

1. The alerting signal of a telephone with an electro-mechanical alerting device shall contain two or more major frequency components (f1 and f2) in the 500 - 6000 Hz range, with at least one having a mean power level of > 73 dB, relative to 1 pW. The second major component shall have a mean sound power level of > 68 dB, relative to 1 pW;
2. The total mean acoustic power level shall be > 80 dBA, relative to 1 pW. These power levels apply with the volume control set for maximum volume;
3. At least one of the major components (f1) shall be below 2000 Hz. The nominal frequency of the higher major frequency component (f2) shall be equal to or greater than 5/4 of the lower major frequency component (f1), i.e., f2 ≥ 5/4 f1;
4. The alerting signal of a telephone with an electronic alerting device that does not produce an acoustic spectrum rich in overtones shall meet the criteria in 1), with the exception that f1 and f2 shall each have a mean power level of > 73 dB, relative to 1 pW;
5. A telephone shall have a loudness adjustment accessible to the user that produces at least a 6 dBA total attenuation when operated from its high to low volume position; and
6. With regard to ringing cycles, the ringing current supplied by the telephone company central office shall belong to one of the following sequences:
   • Repetitive bursts of 2 seconds out of every 6 seconds, where an individual burst may be as short as 0.8 second;
   • Repetitive bursts of 1 second out of every 4 seconds, where an individual burst may be as short as 0.6 second; or
   • Repetitive bursts of at least one ringing burst of a minimum 0.5 second duration in any 4 second period.

2.3.2 Smoke Detector Alarm Sounds

Smoke alarms are used to alert people to the presence of smoke and to the potential of fire. Generally, this warning sound is very strident and insistent. In a study of alarm sound attenuation inside residential buildings, Halliwell and Sultan [13] investigated the spectral content of the sounds produced by a number of smoke detectors. Using a 2-channel FFT analyzer connected to two microphones, they obtained the short-time spectra, and for each sound 64 of these short-time spectra were averaged to give the spectrum. The narrow-band spectrum was subsequently converted to a third-octave spectrum by simply summing the energy within the third-octave bands. Their results for various smoke detectors show two or more strong spectral components in all computed spectra (Table 2.1). Unfortunately, this work did not include an investigation of the variation of the short-time spectra obtained from consecutive samples.

Table 2.1: Spectral analysis results for different smoke detectors [13]

Detector                 1/3 Octave Frequency Bands (kHz)
Type†      0.5   0.63   0.8   1.0   1.25   1.6   2.0   2.5   3.15   4.0   5.0
A1         38‡   39     39    39    63     57    73    96    84     63    50
A2         37    38     38    38    44     56    70    98    92     67    56
B1         82    82     60    71    74     81    79    95    95     95    88
B2         79    81     66    72    76     81    77    93    94     96    92
C1         44    44     44    45    45     50    61    79    102    90    69
C2         44    44     44    45    45     50    62    79    102    91    70
D1         46    46     46    46    47     52    63    80    103    93    71
D2         44    44     44    45    45     50    62    80    102    88    68
E1         84    70     69    85    76     92    88    96    92     91    80
E2         76    83     63    69    80     87    85    97    100    91    89
F1         61    60     72    70    70     74    86    75    83     90    82
F2         58    61     69    70    72     77    90    81    82     89    82
G1         37    37     37    38    39     50    63    88    95     69    55
G2         38    38     38    38    39     48    61    84    95     71    56

†: Detectors with the same letter denote an identical model.
‡: Maximum Sound Power Output in dB.

2.3.3 Warning and Alarm Sounds Generated by Vehicles and Traffic Control Devices

Miyazaki and Ishida [14] have studied the spectral characteristics of traffic alarm sounds commonly used in Japan. These include sounds produced by electric horns used in passenger cars and in small and middle-size buses and trucks; air horns used in large buses, heavy duty trucks, and trailers; sirens used in emergency vehicles; horns used at railroad crossings; and traffic noises.

Their observations have only limited value for us, since they neither describe the techniques used nor specify the type (short-time or long-time average) of the spectra obtained. Table 2.2 summarizes their results. They conclude that traffic alarm sounds have sharp line spectra, whereas ambient traffic noise is wide-band random noise.
Table 2.2: Summary of spectral analysis results for traffic alarm sounds [14]

Traffic Alarm Device     Installed Vehicles                              Major Frequency Features
Electric horn            Passenger cars; small and middle-size           basic resonant frequency at 300 - 500 Hz,
                         buses and trucks                                dominant harmonics at 2.0 - 4.0 kHz
Air horn                 Large buses, heavy duty trucks, trailers        dominant peaks at 300 - 500 Hz
Siren                    Emergency vehicles                              dominant peaks at 700 - 2000 Hz
Railroad crossing                                                        2 - 3 dominant peaks at 2.0 - 4.0 kHz
Ambient traffic noise                                                    broadband noise below 300 Hz

In British Columbia, and typically in North America, three types of emergency vehicle siren sounds are used: the "hi/lo" sound, the "yelp" sound, and the "wail" sound.

The hi/lo sound is found on most ambulances. It consists of two alternating tones, with the pattern repeating about once per second. Two commonly used tone pairs are 690/920 Hz and 520/1520 Hz.

The wail sound is a slowly changing tone between two preset tone frequencies. A typical example is the wail sound used by police motorcycle sirens, with preset tone frequencies at 500 Hz and 1460 Hz, and a repetition rate of 10 cycles per minute [15].

The yelp sound is a fast changing tone between two preset tone frequencies. A typical example is the electronic siren produced by Southern Vehicle Products Inc., which provides a yelp sound with preset tone frequencies at 600 Hz and 1350 Hz, and a repetition rate of 3 to 5 cycles per second [16]. The yelp and wail sounds are used by both fire trucks and police cars.

2.4 The Emerging Scientific Basis for Generating Warning Sounds

While warning sounds have been used for a long time, many of these are based on subjective opinions as to what is "best". Only recently was any scientific work done to determine what sound characteristics will elicit optimal responses under varying circumstances.
Such work is particularly relevant for us, since in the future warning devices may follow a more systematic approach to sound generation than has been the case until now.

2.4.1 A Generic Warning Sound Generating Scheme

According to the work of Patterson and his colleagues, a warning sound need not be excessively loud, but its amplitude must depend on the background noise level. They have demonstrated that, in order for sounds to be heard reliably in noise, some spectral components must be between 15 dB and 25 dB above the masked threshold [17,18]. Lower and Wheeler [19] have developed a desk-top computer program to estimate this background threshold. With the estimated background threshold, the spectral component amplitudes of the warning sound can be determined. This approach has been used to study the intense background noise of military helicopters in the U.K. [20]. With regard to the frequency content of the warning sound, Patterson [17] limits it to the range between 0.5 kHz and 5.0 kHz.

Based on these spectral amplitude and frequency limits of the warning sounds, a pattern of pulsative sounds which is distinctive and resistant to undesirable noise contamination was constructed by Patterson [17,22,23]. As shown in Fig. 2.1, this prototype warning sound basically consists of a sequence of bursts, each of which is made up of a sequence of pulses. Different degrees of perceived urgency can be produced by simply varying the characteristics of the pulse sequences.

In Patterson's work, the pulse design starts with measurement of the ambient noise spectrum. Then, the warning signal spectrum is determined by setting all its components 15 - 25 dB above the corresponding ambient noise spectral values. In order to avoid excessive peak factors in the signal waveform, sine or cosine phase is assigned to the spectrum.
The pulses are then generated by applying the Inverse Fast Fourier Transform. These pulses vary in duration from 75 msec to 200 msec in accordance with the guidelines set down by Patterson [17,23]. Also, the pulses are gated with sinusoidal ramps at both ends in order to avoid uncontrollable transients. At this stage, by varying the fundamental frequency, and the relative weight of high and low frequencies of the pulses, any degree of perceived urgency can be designed. Usually, greater urgency is signalled by higher fundamentals, and by relatively more high frequency energy.

A burst is produced by assembling three to nine copies of the basic pulse. By changing the elapsed time between the start of one pulse and the start of the next, distinct pitch and temporal patterns may be created. By varying the amplitude of the pulses, different loudness patterns may be obtained.

The perceived urgency is generated by changing the overall pitch, the speed and the loudness pattern of the pulses. In general, a burst with a high pulse rate will convey greater urgency than a burst with a low pulse rate. A rising pitch contour can produce a more urgent burst than a falling pitch contour. Additionally, an urgent burst will remain at, or near, the maximum loudness, while a less urgent burst will decrease in loudness towards the end of the burst.

Such bursts serve as templates from which warning sounds may be synthesized. The amplitude variations and spacing of the bursts are determined experimentally. The criterion is that the resulting warning sound should effectively convey the desired specific warning message to personnel in the vicinity without activating their startle reflex. Patterson successfully implemented this scheme on warning systems of commercial aircraft and military helicopters [17].
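The pulse and burst construction described above can be illustrated with a minimal sketch. It is not Patterson's implementation: the sample rate `FS`, the function names `make_pulse` and `make_burst`, the 20 dB margin, the per-harmonic threshold list, and the alternating cosine/sine phase assignment are all assumptions made for illustration, and the masked-threshold values would in practice come from a measured ambient noise spectrum.

```python
import numpy as np

FS = 16_000  # sample rate in Hz (assumed for illustration)

def make_pulse(f0, threshold_db, margin_db=20.0, duration=0.1, ramp=0.01):
    """Build one pulse from a harmonic line spectrum via the inverse FFT.

    threshold_db[i] is a hypothetical masked threshold (dB) at harmonic i+1
    of f0; each component is placed margin_db above it (15-25 dB in the text).
    """
    if not 0.075 <= duration <= 0.2:
        raise ValueError("pulse duration should lie between 75 and 200 msec")
    n = int(round(duration * FS))
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    for i, thr in enumerate(threshold_db):
        freq = f0 * (i + 1)
        if not 500.0 <= freq <= 5000.0:   # keep content within 0.5-5 kHz
            continue
        k = int(round(freq * n / FS))     # nearest FFT bin
        amp = 10.0 ** ((thr + margin_db) / 20.0)
        # Assumed phase rule: alternate cosine and sine phase per harmonic.
        phase = 0.0 if i % 2 == 0 else -np.pi / 2
        spectrum[k] = amp * np.exp(1j * phase)
    pulse = np.fft.irfft(spectrum, n)
    # Gate both ends with quarter-sine ramps to avoid abrupt transients.
    r = int(round(ramp * FS))
    gate = np.ones(n)
    gate[:r] = np.sin(np.linspace(0.0, np.pi / 2, r))
    gate[-r:] = gate[:r][::-1]
    return pulse * gate

def make_burst(pulse, onsets, gains):
    """Assemble a burst from copies of one pulse.

    onsets are pulse start times in seconds; gains set the loudness pattern.
    """
    if not 3 <= len(onsets) <= 9 or len(onsets) != len(gains):
        raise ValueError("a burst uses three to nine pulses, one gain each")
    starts = [int(round(t * FS)) for t in onsets]
    burst = np.zeros(max(starts) + len(pulse))
    for s, g in zip(starts, gains):
        burst[s:s + len(pulse)] += g * pulse
    return burst
```

In this sketch, shortening the spacing between `onsets` raises the pulse rate, and a decreasing `gains` pattern produces the falling loudness contour that the text associates with lower urgency.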
A slight modification of this scheme was also adopted for medical equipment used in intensive-care units and operating theatres of hospitals in the U.K. [22,23].

Figure 2.1: Auditory Warning Sound Components [17,21,23]. (Horizontal axis: time in seconds.)

Chapter 3

Measurement and Analysis of Timing & Spectral Characteristics

As we have seen in Chapter 2, the literature on warning sounds yields little useful information on their timing and short-time spectral characteristics. Since it is the purpose of this work to apply timing and short-time spectral analysis techniques to systematically extract the unique identifying characteristics of these warning sounds in real-life environments, such information is essential for us. Specifically, the detailed knowledge of warning sound characteristics provides the basis for the exploration of different signal recognition schemes.

3.1 Timing Characteristics

The objective of this part of our work was to derive useful information on the timing of warning sounds from measurements of signal waveforms. For this purpose we used telephone rings, siren sounds, and smoke alarm sounds. Telephone rings were generated by both electro-mechanical and electronic ringers; siren sounds were produced by an electronic siren driver; and the smoke alarm sounds were obtained from a commercial smoke alarm.

3.1.1 A PC-Based Data Acquisition System

To obtain quantitative data, a PC-based data acquisition system was designed and constructed. This system accepts the instantaneous absolute amplitude waveform of the signal, and transforms it into the short-time average absolute amplitude (STAAA) waveforms. Then, the transformed waveforms are stored for plotting.
The instantaneous and short-time average amplitudes of the signal, and their absolute-value counterparts, are defined in Table 3.3, where $x(n)$ represents the discrete instantaneous signal amplitudes, and $N$ denotes the number of samples accumulated.

Table 3.3: Instantaneous and short-time signal amplitudes

                       signal amplitudes                     absolute signal amplitudes
instantaneous          $x(n)$                                $|x(n)|$
short-time average     $\frac{1}{N}\sum_{n=1}^{N} x(n)$      $\frac{1}{N}\sum_{n=1}^{N} |x(n)|$

The instantaneous absolute signal amplitudes are generated by hardware, while the derivation of the short-time average absolute signal amplitudes, and the storage of these derived samples, are accomplished by software. Fig. 3.2 shows the block diagram of the method used to generate the discrete instantaneous absolute signal amplitudes. Basically, sounds are collected by a suitable microphone, pre-amplified by a low-noise voltage amplifier, and low-pass filtered prior to input to a full-wave rectifier. The output from the full-wave rectifier gives the instantaneous absolute amplitude of the waveform. Then, an 8-bit A/D converter samples this waveform at 10 kHz. The digitized sample is stored temporarily in an output buffer until the microprocessor (INTEL 8088, with an 8-bit external data bus) is ready to accept the data via a bi-directional bus. In addition, an LED bar graph is used to display the variations in the instantaneous absolute amplitudes of the signal waveforms.

Figure 3.2: Signal acquisition and derivation of instantaneous absolute signal amplitudes. (Blocks: microphone, signal pre-processor, full-wave rectifier, A/D converter, CPU.)

In this implementation, the short-time average absolute signal amplitudes are derived from 12.8 msec accumulation of the instantaneous absolute signal amplitude samples (A/D converted data).
With these instantaneous signals sampled at 10 kHz, one sample of the short-time average absolute signal amplitude is obtained from 128 of the instantaneous signal samples. In order to avoid the problem of overflow during the accumulation process, a 16-bit register is used to accumulate the sum of these 128 samples. A sample of the short-time average absolute signal amplitude is then obtained by dividing the 16-bit register content by the total number of accumulated samples (i.e., 128 in this case). The resulting quotient is rounded to eight bits to provide the short-time average absolute signal amplitude sample, which is transferred to a designated file. This file stores 1000 bytes. These data manipulation and transfer procedures are repeated until the data file is completely filled with 1000 samples (equivalent to 12.8 sec of the signal waveform). The program to handle this data manipulation and transfer in real time was written in INTEL 8088/8086 assembly language. A flowchart of these operations is shown in Fig. 3.3.

3.1.2 Data Collection

With this data acquisition system, we collected data on the absolute amplitudes of warning sounds in the normal acoustic environment of our laboratory. Fig. 3.4 shows the experimental set-up. The siren horn produced siren sounds, and a radio cassette player provided the pre-recorded telephone rings and smoke alarm sounds. A sound pressure level (SPL) meter placed beside the microphone measured the SPL variations of the environment throughout the data collection process.
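The block averaging of Section 3.1.1 can be sketched in a few lines. This is a minimal illustration rather than the thesis's 8088 assembly program: the function name `staaa` is hypothetical, the input is assumed to be the already rectified 8-bit A/D samples, and round-to-nearest is assumed for the final 8-bit quotient.

```python
import numpy as np

def staaa(rectified, block=128):
    """Short-time average absolute amplitude of rectified 8-bit samples.

    Each output value averages `block` consecutive input samples
    (128 samples at 10 kHz correspond to 12.8 msec).
    """
    samples = np.asarray(rectified, dtype=np.uint16)
    n_blocks = len(samples) // block
    frames = samples[: n_blocks * block].reshape(n_blocks, block)
    # 128 * 255 = 32640 fits in 16 bits, so a 16-bit accumulator cannot overflow.
    sums = frames.sum(axis=1, dtype=np.uint16)
    # Divide by the block length with rounding (assumed), then keep eight bits.
    return ((sums + block // 2) // block).astype(np.uint8)
```

Under these assumptions, 128 000 input samples (12.8 sec at 10 kHz) yield the 1000 one-byte values that fill the data file.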
The SPL meter was set to "C" weighting and "SLOW" response, because the "C" weighting network of the SPL meter has a flat frequency response similar to that of the signal processing circuit of the data acquisition system, and the "SLOW" response provides an average over 1.0 sec of the acoustic energy variations of the environment. Based on the SPL measurements in the absence and in the presence of warning sounds, the signal-to-noise ratio (SNR) could be deduced. SNR, in this work, is defined as the ratio of peak signal power to peak noise power. Noises, in this thesis, are defined as all sounds other than warning sounds. Such unwanted sounds may include steady and transient random noises, radio broadcasts, or surrounding conversations. A detailed derivation of the relationship between the SNR and SPL measurements is given in Appendix A.

Figure 3.3: Flowchart of procedure to accumulate and store 1000 samples. (Steps: poll the end-of-conversion (EOC) status from the A/D, accumulate the converted sample into ACC, scale ACC and save it as X_j, increment j, and finally store X to a specified file.)

Figure 3.4: Experimental set-up for data collection. (Sound sources: cassette player and radio.)

Data on absolute amplitudes of warning sounds were collected in two different background environments. The first set of data was collected in a steady random noise background which originated from the ventilation fan of a PC. Such noise is typical for office environments. A value of 60-62 dBC was recorded throughout the data collection process. This set of data is referred to as "clean", because the SNR was kept above 20 dB.
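As a rough illustration of how an SNR can be deduced from two SPL readings, the sketch below assumes that the warning sound and the background noise are uncorrelated, so that their powers add. This assumption and the function name `snr_db_from_spl` are ours; the sketch works with the averaged readings of the meter and is not the peak-power derivation of Appendix A, which may differ.

```python
import math

def snr_db_from_spl(spl_noise_db, spl_total_db):
    """Estimate SNR in dB from SPL readings without and with the warning sound.

    Assumes uncorrelated signal and noise, so total power = signal + noise.
    """
    if spl_total_db <= spl_noise_db:
        raise ValueError("the reading with the warning sound must be higher")
    # Power ratio (signal + noise) / noise from the difference of the readings.
    ratio = 10.0 ** ((spl_total_db - spl_noise_db) / 10.0)
    # Remove the noise contribution to leave signal / noise.
    return 10.0 * math.log10(ratio - 1.0)
```

Under this assumption, a 61 dBC background and an 81 dBC reading with the warning sound present correspond to an SNR of about 20 dB.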
To study the effect of more complex background noises on warning sounds, the second set of data was collected at an SNR of 10 dB. The background noise sources consisted of steady random noise, radio music broadcasts, and speech. To establish the short-time average absolute amplitude profiles of the various noise sounds (without warning sounds present), a third set of data was also collected. This included all the noise sources used above, and the noise SPL was the same as that used in the SNR measurements.

3.1.3 Timing Features of Different Warning Sounds

The plots of the first set of data are shown in Fig. 3.5, Fig. 3.6, Fig. 3.7 and Fig. 3.8. Since the purpose of these measurements is to establish the time variations of the short-time average absolute signal amplitudes, the actual value of these amplitudes is of no particular interest. Therefore, the vertical axes show a relative scale without units. The following observations may be drawn from these figures:

1. Fig. 3.5, Fig. 3.6(b) & (d) (siren sounds), and Fig. 3.7(a) & (b) (telephone rings) show on-off type repetitive patterns of warning signal bursts; Fig. 3.6(a) & (c) (siren sounds), and Fig. 3.7(c) (smoke alarm sound) display steady sounds;
2. Fig. 3.5(a) and (b) show devices which produce sounds with very similar temporal structures, but with different repetition rates;
3. Fig. 3.5(d) is a two-tone siren sound, and its amplitude contour can be characterized by i) a transition from background level amplitudes, and ii) a repetitive on-off pattern representing two tones of different intensities (for other siren sounds or telephone rings, the off-patterns represent the background noise levels);
4. The width of the bursts of these waveforms varies from 102.4 msec to 3.24 sec;
5. The repetition period of the on-off patterns ranges from 140 msec to 5.86 sec;
6. Steady sounds are characterized by a signal level transition to a higher steady amplitude level; and
7. Contours of the short-time average absolute signal amplitude of radio broadcasts (Fig. 3.8) consist of random, nonrepetitive sequences of signal bursts.

The plots of the second set of data are shown in Fig. 3.9 and Fig. 3.10. Comparative examination of these plots yields the following observations:

1. For short-burst signals, such as (a) and (b) in Fig. 3.9, and (d) in Fig. 3.10, the introduction of a radio broadcast background alters the baseline levels, and smooths out the weak peaks of the "clean" signals; however, it produces no significant change in the relative timing between consecutive amplitude peaks of the waveforms;
2. For signals with long silence intervals (> 400 msec), such as (c) and (d) in Fig. 3.9, and (b) in Fig. 3.10, spurious small peaks appear randomly during these intervals; and
3. The repetition rate of the on-off patterns of burst-type sounds is unchanged by variations in background noise.

In summary, we can conclude from these measurements that the short-time average absolute amplitude contours provide unique timing information on both steady and burst-type sounds.

Figure 3.5: Short-time average absolute amplitudes (STAAA) of siren sounds: a) J1: Burglar alarm (JDS-100); b) J2: MPI-11; c) J3: JDS-100 I; and d) J4: HI-LO. (Horizontal axes: time in sec.)

Figure 3.6: Short-time average absolute amplitudes (STAAA) of siren sounds: a) J5: High steady sound; b) J6: Pulser; c) J7: Steady horn; and d) J8: Electronic Synthesized Bell sound. (Horizontal axes: time in sec.)

Figure 3.7: Short-time average absolute amplitudes (STAAA) of telephone rings and smoke alarm sound: a) Electro-mechanical Ringer; b) Electronic Ringer; and c) Smoke alarm sound.

Figure 3.8: Short-time average absolute amplitudes (STAAA) of radio broadcasts: a) Pop music; b) Speech; and c) Rock music.

Figure 3.9: Short-time average absolute amplitudes (STAAA) of siren sounds with radio broadcast as background: a) J1; b) J2; c) J3; and d) J4.

Figure 3.10: Short-time average absolute amplitudes (STAAA) of different siren sounds with the same background noise: a) J5; b) J6; c) J7; and d) J8. (Horizontal axes: time in sec.)

3.2 Spectral Characteristics

Based on the assumption that the short data records deduced from the observed time sequences are ergodic, and that their estimated spectra are slowly time-varying, spectral estimation techniques provide an insight into the frequency content of the observed time sequences. Generally, spectral estimation methods use either the parametric or the nonparametric approach. A detailed exposition of many different algorithms used for obtaining waveform spectra was given by Kay and Marple [24].

In general, parametric spectral analysis involves three steps. The first step is to select a time series model, with assumed model order, for the observed data record. Time series models such as the autoregressive (AR) model, the moving-average (MA) model, or the autoregressive-moving average (ARMA) model are the most common choices for practical applications. For example, linear prediction coding (LPC), or the AR model with a model order of 10-16, has been proven to be a very suitable choice for speech analysis and synthesis [25,26].

The second step is to estimate the model parameters using the available data samples [24]. Depending on the specific time series model selected, different algorithms may be applied for such parameter extraction.

The third step is to compute the estimated spectra by substituting the specific parameter values derived in the second step into the theoretical power spectral density function of the model used.

The nonparametric spectral estimation approach assumes that the observed data record is produced from a set of sinusoidal components governed by the Fourier Series model of signals. Two popular and conventional spectral estimation techniques are the Blackman-Tukey [27] and the Welch periodogram [28] methods. Both of these techniques employ the computationally efficient Fast Fourier Transform (FFT). A new, unified, FFT-based spectral estimation method, capable of producing more statistically stable spectra with better frequency resolution than the conventional methods, has been proposed by Nuttall and Carter [29].

3.2.1 Comparison of Parametric and Nonparametric Spectral Estimation Methods

With relatively short data sequences recorded under high signal-to-noise ratio (SNR) conditions, the parametric technique can produce smoother spectra with finer frequency resolution. Unfortunately, the parametric spectral estimation approach is susceptible to noise interference. Such degradation in performance of the AR model has been extensively investigated by Lim [30] and Kay [31].

The nonparametric spectral estimation approach is implemented in practice by the Discrete Fourier Transform (DFT). Since the DFT considers every data sequence to be periodic, such periodic extensions of the original data sequence exhibit discontinuities at the boundaries of the observed time interval. In the subsequent numerical analysis, these boundary discontinuities result in spectral leakage over the entire frequency spectrum. Harris [32] discussed the use of various windows with nonuniform weighting to reduce this spectral leakage. This can be accomplished only at the expense of frequency resolution in the spectrum. Finally, to obtain a statistically stable spectrum, averaging of short-time spectra is required [28].

In general, the frequency resolution of spectra obtained by the nonparametric spectral estimation approach is limited by the data duration, and is independent of the SNR of the signals. Theoretically, the frequency resolution of spectra is inversely proportional to the duration of the original data sequence.
Since zero-padding of the data sequence before transformation appears to increase the signal duration, it is a common misconception that zero-padding improves the frequency resolution of the resultant spectra. As demonstrated in [24], zero-padding is useful only for 1) smoothing the appearance of the resultant spectra via interpolation, 2) resolving potential ambiguities of computed spectra, and 3) reducing the "quantization" error in estimating the frequencies of spectral peaks. It is common procedure to apply windowing prior to the zero-padding of the data sequence.

3.2.2 Welch's Non-overlapping Spectral Estimation Method

For this work, we selected the conventional Welch non-overlapping spectral estimation approach to investigate different warning sounds. The rationale behind this choice has four aspects. First, most warning sounds maintain a regular rhythm, so continuous, long data records can be obtained. This allows spectral averaging, which results in statistically stable computed spectra. Secondly, Welch's spectral estimation technique is robust with respect to noise corruption of the signals, because the frequency resolution and the stability of the computed spectra are independent of the SNR. Thirdly, no a priori knowledge of a signal model for the various warning sounds is needed. Finally, the limitations inherent in Welch's spectral estimation method have been thoroughly studied, and techniques used to reduce discrepancies have been well explored [32].

Welch's non-overlapped spectral estimation technique may be described in four steps:

1. Consider a data sequence $x(n)$ of length $N$, where $n \in [0, N-1]$, and divide it into $K$ non-overlapped segments, each of integral length $M = N/K$, denoted $X_k(m)$, where $m \in [0, M-1]$ and $k \in [0, K-1]$.

2. Select a "DFT-even" window sequence $w(m)$ (see footnote 1) with length identical to that of $X_k(m)$, and multiply each segment by this window, giving $x_k(m)$ as follows,

$$x_k(m) = X_k(m)\, w(m) \tag{3.1}$$

3. Take the squared magnitude of the DFT of the windowed sequence to obtain the $k$th segment discrete Fourier spectrum (often called the modified periodogram), denoted $S_k(l)$,

$$S_k(l) = \frac{1}{MU} \left| \sum_{m=0}^{M-1} X_k(m)\, w(m)\, e^{-j 2\pi l m / M} \right|^2 = \frac{1}{MU} \left| \sum_{m=0}^{M-1} x_k(m)\, e^{-j 2\pi l m / M} \right|^2 \tag{3.2}$$

where $U$ is the window average power given by

$$U = \frac{1}{M} \sum_{m=0}^{M-1} w^2(m) \tag{3.3}$$

4. Compute $S_k(l)$ for $k \in [0, K-1]$, and obtain the average spectrum $S_{avg}(l)$,

$$S_{avg}(l) = \frac{1}{K} \sum_{k=0}^{K-1} S_k(l) \tag{3.4}$$

Footnote 1: A DFT-even window is a conventional even window sequence with the right-end point missing [32].

Welch demonstrated that the variance of the spectral estimates can be reduced by dividing a long original data sequence into finer segments. However, he also cautioned that the statistical bias generated by the estimation process increases linearly with an increasing number of segments [28]. Therefore, the trade-off between the size of the data segments and the amount of spectral variance reduction must be determined by the user.

3.2.3 Implementation of Welch's Method

Since the DFT accepts complex input quantities, we may use this feature to establish an efficient scheme for computing the average spectrum of two real data sequences. The scheme is implemented with the FFT algorithm, and involves only a single pass of the FFT computation. The three steps of the calculation are summarized as follows. The first step is to fill the real and imaginary parts of a complex input data sequence with two non-overlapped real data segments. Then we take the DFT of this complex sequence, and after further calculations we obtain the average spectrum of the two non-overlapped data segments.
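The four steps of Section 3.2.2 can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the program used in this work: the function names are hypothetical, and the window coefficients are the commonly quoted minimum 4-term Blackman-Harris values from Harris [32], assumed here rather than taken from the thesis text.

```python
import numpy as np

def dft_even_blackman_harris(M):
    """Minimum 4-term Blackman-Harris window in DFT-even (periodic) form.

    Coefficients are the commonly quoted values (assumption, see lead-in).
    """
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    n = np.arange(M)
    return (a0 - a1 * np.cos(2 * np.pi * n / M)
            + a2 * np.cos(4 * np.pi * n / M)
            - a3 * np.cos(6 * np.pi * n / M))

def welch_nonoverlapping(x, K, window=None):
    """Average of K non-overlapped modified periodograms, Eqs. (3.1)-(3.4)."""
    x = np.asarray(x, dtype=float)
    M = len(x) // K                          # step 1: segment length M = N / K
    segments = x[:K * M].reshape(K, M)       # X_k(m), one segment per row
    if window is None:
        w = dft_even_blackman_harris(M)
    else:
        w = np.asarray(window, dtype=float)
    U = np.mean(w ** 2)                      # window average power, Eq. (3.3)
    X = np.fft.fft(segments * w, axis=1)     # step 2 (windowing) and DFT
    S_k = np.abs(X) ** 2 / (M * U)           # step 3: modified periodograms
    return S_k.mean(axis=0)                  # step 4: average spectrum

if __name__ == "__main__":
    # A tone placed exactly on bin 40 of a 512-point segment.
    n = np.arange(4 * 512)
    tone = np.sin(2 * np.pi * 40 * n / 512)
    S = welch_nonoverlapping(tone, K=4)
    print("peak bin:", int(np.argmax(S[:256])))
```

Dividing by $MU$ keeps the spectral level independent of the window chosen, which is why a rectangular window and the Blackman-Harris window give comparable noise-floor levels in this sketch.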
The detailed mathematical derivations are given in [33], with the major steps summarized below:

1. Consider now $g(m)$ being a complex input data sequence whose real and imaginary parts are substituted by the two non-overlapped real data segments $x_1(m)$ and $x_2(m)$. Then $g(m)$ can be expressed by

$$g(m) = x_1(m) + j\, x_2(m) \tag{3.5}$$

where $m \in [0, M-1]$.

2. The DFT of $g(m)$, which is denoted as $G(k)$, is expressed by

$$G(k) = \sum_{m=0}^{M-1} g(m)\, w(m)\, e^{-j 2\pi k m / M} = G_R(k) + j\, G_I(k) \tag{3.6}$$

where
$G_R(k)$ = real part of $G(k)$
$G_I(k)$ = imaginary part of $G(k)$
$w(m)$ = "DFT-even" window sequence
$k \in [0, M-1]$

3. Now we take into consideration that, given two real data sequences $x_1(m)$ and $x_2(m)$ and a DFT-even window sequence $w(m)$, for $m \in [0, M-1]$, the DFTs of these windowed data sequences, denoted as $X_1(k)$ and $X_2(k)$ respectively, can be represented by their real and imaginary parts:

$$X_1(k) = X_{1R}(k) + j\, X_{1I}(k) \tag{3.7}$$
$$X_2(k) = X_{2R}(k) + j\, X_{2I}(k) \tag{3.8}$$

where
$X_{1R}(k)$ = real part of the DFT of $x_1(m)$
$X_{1I}(k)$ = imaginary part of the DFT of $x_1(m)$
$X_{2R}(k)$ = real part of the DFT of $x_2(m)$
$X_{2I}(k)$ = imaginary part of the DFT of $x_2(m)$
$k \in [0, M-1]$

It can be shown that

$$X_{1R}(M-k) = X_{1R}(k) \tag{3.9}$$
$$X_{2R}(M-k) = X_{2R}(k) \tag{3.10}$$
$$X_{1I}(M-k) = -X_{1I}(k) \tag{3.11}$$
$$X_{2I}(M-k) = -X_{2I}(k) \tag{3.12}$$

Using expression (3.5) for $g(m)$ in Eq. (3.6), we can express $G_R(k)$ and $G_I(k)$ in terms of the real and imaginary parts of $X_1(k)$ and $X_2(k)$:

$$G_R(k) = X_{1R}(k) - X_{2I}(k) \tag{3.13}$$
$$G_I(k) = X_{1I}(k) + X_{2R}(k) \tag{3.14}$$

If we substitute $k$ by $(M-k)$ in Eqs. (3.13)-(3.14) and utilize the results obtained from Eqs. (3.9)-(3.12), we obtain

$$G_R(M-k) = X_{1R}(k) + X_{2I}(k) \tag{3.15}$$
$$G_I(M-k) = -X_{1I}(k) + X_{2R}(k) \tag{3.16}$$

4. The average spectrum $P_{avg}(k)$ for $x_1(m)$ and $x_2(m)$ is given by

$$P_{avg}(k) = \frac{1}{2MU} \left\{ |X_1(k)|^2 + |X_2(k)|^2 \right\} \tag{3.17}$$
where

$$U = \frac{1}{M} \sum_{m=0}^{M-1} w^2(m) \tag{3.18}$$

Therefore, by using the results of Eqs. (3.13)-(3.16) to solve for $X_{1R}(k)$, $X_{1I}(k)$, $X_{2R}(k)$, and $X_{2I}(k)$ in terms of $G_R(k)$, $G_R(M-k)$, $G_I(k)$, and $G_I(M-k)$, we can derive $P_{avg}(k)$ from the real and imaginary parts of $G(k)$. Thus, we can show that

$$P_{avg}(k) = \frac{1}{4MU} \left\{ G_R^2(k) + G_R^2(M-k) + G_I^2(k) + G_I^2(M-k) \right\} \tag{3.19}$$

In this work, warning signals were sampled at a rate of 20 kHz with 12-bit resolution. The non-overlapped data segment length was a multiple of 12.8 msec, that is, of 256 data samples. To reduce spectral leakage, each non-overlapped data segment was multiplied by the minimum 4-sample Blackman-Harris window (Fig. 3.11), which has a highest sidelobe level of -92 dB, a sidelobe fall-off rate of -6 dB/octave, and an equivalent noise bandwidth of two frequency bins (see footnote 2) [32]. The actual spectral calculations were performed on a VAX 750 general-purpose computer. The flowchart of the program is given in Fig. 3.12.

Footnote 2: A bin is a basis frequency for a spectrum and is derived from the ratio of the signal sampling frequency to the total number of data points used in the spectrum.

Figure 3.11: Spectrogram of the minimum 4-sample Blackman-Harris window, where PSD denotes power spectral density

Figure 3.12: Flowchart of the spectral analysis program. The steps shown are: 1) read data in RX02 format from magnetic tape; 2) unscramble data to ASCII format; 3) multiply data sequence by DFT-even window; 4) form Z = X + iY, where X = sequence 1, Y = sequence 2, and Z = complex sequence; 5) compute DFT of Z by using FFT; 6) unscramble FFT output to obtain averaged spectrum using eqt. (12).

3.2.4 Data Collection

In order to explore the variations of warning sound spectral characteristics, the sounds emitted by 1) electromechanical ringers of five rotary dial phones, 2) a multiple-line push-button telephone, 3) an electronic ringer of a touch-tone telephone, and 4) an electronic siren driver (used in timing feature measurement) were used. These sounds were recorded on a tape recorder in various noisy ambient environments in order to investigate the effects of background noise on warning sound spectra.

The recorded warning sounds were fed to an A/D conversion system, and the digitized samples were stored on magnetic tape for further processing. To suppress aliasing in the sampling process, a Krohn-Hite electronic filter was used to remove the spectral components of the analog signals beyond the 10 kHz frequency bandwidth. The filtered signal was then fed to a 12-bit MINC/DECLAB-23 A/D converter with selectable sampling frequency under the master control of a PDP-11 computer. In our work, the sampling frequency was set to 20 kHz. Subsequently, each 6.5-second digitized sound record was transferred from the PDP-11 computer to the VAX-750 general-purpose computer for spectral analysis.

3.2.5 Spectra of Warning Sounds Generated by Various Warning Devices

Unless otherwise stated, the short-time spectra were obtained by averaging the spectra of four consecutive 25.6 msec segments. We assume that within this 102.4 msec interval the signals are slowly varying, and that the average spectrum provides a statistically stable representation of the frequency content of the signals.
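The single-pass scheme of Section 3.2.3 and the four-segment averaging described above can be sketched as follows. This is a hedged illustration, not the VAX program of Fig. 3.12: the function names are hypothetical, the window coefficients are the commonly quoted minimum 4-term Blackman-Harris values (assumed), and the segment length of 512 samples corresponds to 25.6 msec at the 20 kHz sampling rate.

```python
import numpy as np

def blackman_harris_dft_even(M):
    """DFT-even minimum 4-term Blackman-Harris window (assumed coefficients)."""
    n = np.arange(M)
    return (0.35875 - 0.48829 * np.cos(2 * np.pi * n / M)
            + 0.14128 * np.cos(4 * np.pi * n / M)
            - 0.01168 * np.cos(6 * np.pi * n / M))

def pair_average_spectrum(x1, x2, w):
    """Average spectrum of two real segments from ONE complex FFT, Eq. (3.19)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    M = len(w)
    U = np.mean(w ** 2)                       # Eq. (3.18)
    G = np.fft.fft((x1 + 1j * x2) * w)        # Eqs. (3.5) and (3.6)
    G_rev = G[(-np.arange(M)) % M]            # G(M - k), with G(M) taken as G(0)
    # |G(k)|^2 = G_R^2(k) + G_I^2(k), so this is Eq. (3.19) term by term.
    return (np.abs(G) ** 2 + np.abs(G_rev) ** 2) / (4 * M * U)

def short_time_spectrum(x, M=512):
    """Average of four consecutive M-sample segments using two FFT passes."""
    w = blackman_harris_dft_even(M)
    segs = np.asarray(x[:4 * M], dtype=float).reshape(4, M)
    return 0.5 * (pair_average_spectrum(segs[0], segs[1], w)
                  + pair_average_spectrum(segs[2], segs[3], w))

if __name__ == "__main__":
    fs = 20000.0                              # sampling rate used in this work
    M = 512                                   # 25.6 msec at 20 kHz
    print("bin spacing in Hz:", fs / M)       # 39.0625
    n = np.arange(4 * M)
    spectrum = short_time_spectrum(np.sin(2 * np.pi * 40 * n / M), M)
    print("peak bin:", int(np.argmax(spectrum[:M // 2])))
```

Packing two real segments into one complex sequence halves the number of FFT passes; the price is the extra "unscrambling" step that combines bins $k$ and $M-k$.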
3.2.5.1 Spectra of Telephone Rings Generated by Electro-mechanical Ringers

Although frequency specifications for telephone rings are provided by various standards associations, the acceptable variations of the short-time spectra of telephone rings have not been published. In addition, there is no information on how the spectra vary with the loudness level adjustments that can be made on electro-mechanical ringers equipped with loudness controls. Similarly, there is no mention in the standards (or in the literature) of the effect of the pitch setting of electronic ringers on the spectra of the emitted sounds. The measurements reported here were made to obtain this missing information. Five different aspects were examined:

Short-time averaged spectra of an electro-mechanical ringer

Fig. 3.13 gives a typical example of short-time spectra of telephone rings with the loudness level set to one. (The loudness adjustment control is found on the bottom panel of some rotary dial telephones.) These rings were recorded in an ordinary office environment. Two regions of spectra are identified: the transient and the steady-state regions. During the first 600 msec of the ringing period (transient), the short-time spectra are very similar and are rich in harmonic content (dominated by three to five major spectral peaks in the 10 kHz frequency bandwidth). This is followed by the steady-state region of the ringing period, in which only two or three dominant peaks are retained.

Long-time averaged spectra of an electro-mechanical ringer at seven different loudness levels

The next two figures show how telephone ring spectra vary with changes in the loudness level adjustment. The same telephone was used as in the previous measurement. These spectra were obtained by averaging 256 record segments, each 25.6 msec long (6.55 sec in total). Fig. 3.14 (a) shows that two major peaks always occur in the spectra at each of the seven loudness settings. However, for another rotary dial telephone, Fig. 3.14 (b) shows dramatic changes in spectral characteristics when the loudness adjustment is altered from level two to level three. The disappearance of the dominant peaks is caused by some change in the internal ringing mechanism. These figures clearly illustrate the unpredictability of the effect of varying loudness settings on telephone ring spectral characteristics.

Long-time averaged spectra of five electro-mechanical ringers

Five rotary dial phones of the same model were used in this measurement. To provide a general view of their spectral variations, Fig. 3.15 gives an example of long-time averaged spectra of the five electromechanical ringers with a preset loudness level. In Fig. 3.15, the dominant spectral peaks produced by phone samples 1, 2 and 3 do not appear in the spectra generated by phone samples 4 and 5. This indicates that rings generated by telephones of the same model do not necessarily have similar spectral characteristics.

Short-time averaged spectra of a multiple-line telephone

Fig. 3.16 depicts another set of short-time spectra, for a multiple-line push-button telephone. Since this telephone is not equipped with a loudness adjustment control, the effect of varying the loudness adjustment on the short-time spectra was not studied. Compared to the other telephone ring spectra, the spectra in Fig. 3.16 have peaks at different frequencies: 1.6 kHz, 3.2 kHz, 5.9 kHz and 9.2 kHz.

Short-time spectra of an electro-mechanical ringer in steady noise background

To demonstrate how steady fan noise affected the short-time spectra of telephone rings, we used the same phone as in the first two measurements.
These telephone rings were recorded inside a computer room where an air-ventilation system was operating. Compared to Fig. 3.13, in Fig. 3.17 the amplitudes of the dominant peaks decreased, the number of dominant peaks was reduced, and the transient regions of the spectra have largely disappeared. This may be caused by the spectral flattening effect of the background noise. However, two of the dominant peaks of successive spectra are still retained.

Conclusions

Spectra of telephone rings produced by electro-mechanical ringers a) consist of two distinct regions (transient and steady-state) of short-time spectra, and b) always contain spectral peaks in the 1.6 - 2.5 kHz and 4.7 - 6.2 kHz bands. Details of the spectral characteristics vary with loudness, with the model, and with individual units of the same model.

In general, it is difficult to predict the spectral distortion caused by background noise, because such distortion is highly dependent on the characteristics of the noise, which vary in both time and space. Since real environmental noise situations are very variable, there is very little practical value in further study of the effect of noise on the spectra.

Figure 3.13: Short-time spectra of an electromechanical ringer

Figure 3.14 (b): Spectra of another electromechanical ringer with seven loudness settings

Figure 3.16: Short-time averaged spectra of a multiple-line telephone

Figure 3.17

3.2.5.2 Spectra of Telephone Rings Generated by an Electronic Ringer

Since solid state transducers are manufactured to close tolerances, and the control circuits generate very consistent tone frequencies, electronic ringers of the same type will produce sounds with very similar features. In addition, the different types all conform to applicable standards. Therefore, only one electronic ringer unit was examined in detail. Since the telephone we examined was equipped with pitch adjustment controls, the effects of different pitch settings on the spectra were also studied.

Each of the short-time spectra was obtained by averaging two consecutive 102.4 msec long spectra. The 102.4 msec segment length was selected to provide a frequency resolution of 19.6 Hz for the separation of the two dominant tones generated by the electronic tone ringer. Fig. 3.18 (a), (b), (c), and (d) show that changing the pitch setting results in more high-energy peaks appearing. Although it is difficult to see in these plots, the pitch setting also shifts the dominant lowest-frequency peaks. The tabulated numbers indicate that, for this particular electronic ringer, one tone frequency varies from 468 Hz to 546 Hz, and the other varies from 546 Hz to 683 Hz.

Figure 3.18 (a): Short-time spectra of electronic rings with pitch set at position one

Figure 3.18 (b): Short-time spectra of electronic rings with pitch set at position two
Figure 3.18 (c): Short-time spectra of electronic rings with pitch set at position three

Figure 3.18 (d): Short-time spectra of electronic rings with pitch set at position four

3.2.5.3 Spectra of Siren Sounds

Lastly, we studied the spectral characteristics of different warning siren sounds produced by an electronic siren driver. Eight different siren sounds can be produced with this device. In all of these spectra, note that the peaks located at 7.0 kHz are produced by ambient noise, which was monitored independently with the sound pressure level meter.

Rapid-Yelp. The short-time spectra of this sound consist of a band of frequencies varying from 1400 Hz to 3000 Hz (Fig. 3.19).

Conventional Yelp. Fig. 3.20 shows the variation of the short-time spectra of this sound, which consist of a varying band of frequencies ranging from 666 Hz to 1333 Hz.

Low-high Sweep. Fig. 3.21 shows a very interesting "chirp-signal" type of short-time spectra. The spectra consist of peaks varying from 820 Hz to 4.0 kHz.

European Hi-low. Fig. 3.22 shows spectra which consist of a fundamental spectral component at 1093 Hz, along with its harmonics at 1640 Hz and 3164 Hz.

Hi-frequency Steady. Fig. 3.23 gives the spectra, which consist of a fundamental spectral component at 833 Hz, together with its harmonics at 1640 Hz and 3200 Hz.

Pulsating Siren. Fig. 3.24 gives the spectra of a "Pulsating Horn" siren sound, which consist of a poorly defined peak at 1600 Hz and a distinct peak at 2400 Hz.

Steady Horn. Fig. 3.25 shows spectra which consist of two major bands of frequencies, at 500 - 700 Hz and 1200 - 1400 Hz.

Electronic Synthesized Bell. Fig. 3.26 shows the spectra of a bell sound, which consist of four peaks at 700 Hz, 1406 Hz, 2070 Hz, and 2812 Hz.

3.2.6 Summary

Summing up the spectral analysis results, we reached the following conclusions [34]:

• dominant spectral features of warning signals generally appear within the frequency range from 300 Hz to 5.0 kHz,

• warning signal spectra may consist of a single spectral peak, or of regular clusters of spectral peaks and valleys, and

• in general, the spectral features of warning signals are simpler than those of speech signals with regard to:

1. the absence of nonstationary segments in the short-time spectra (whereas an isolated speech utterance may contain nonstationary short-time spectra caused by weak fricatives at the utterance boundaries), and

2. the repeatability of the spectral features of warning sounds (whereas, owing to the variable utterance rate of a word, nonlinear time distortion occurs in the spectral features of speech).

Figure 3.21: Spectra of Low-Hi sweep sound

Figure 3.22: Spectra of European Hi-Low sound

Figure 3.23: Spectra of Hi-Frequency Steady sound
Figure 3.24: Spectra of Pulsating Horn sound

Figure 3.25: Spectra of Steady Horn sound

Figure 3.26: Spectra of Electronic Synthesized Bell sound

Chapter 4

Solutions to the Recognition Problem

4.1 Pattern-Recognition Model for Signal Identification

The classic pattern-recognition scheme for signal identification is shown in Fig. 4.27. This scheme consists of feature extraction, pattern matching (similarity tests), and decision making blocks. It forms the basis of many applications, because it places no restrictions on the choice of feature sets, similarity algorithms, and decision rules, and it can be implemented in a wide range of circumstances [37].

The function of the feature extraction stage is to convert the signal into parameters or feature sets. This reduces, and sometimes eliminates, the redundancies that exist in the original signal. Such signal reduction procedures provide a manageable number of signal features, making practical machine recognition feasible. Extractable signal features include timing information, short-time spectra, Linear Prediction Coding (LPC) parameters, LPC-derived cepstral coefficients, and statistical parameters derived from the Hidden Markov Model (HMM).

For pattern comparison, the signal features must either be known a priori, or the system must "learn" them. Such learning may be accomplished by training the system with the signal(s).
This involves the extraction of features and their subsequent storage in template memory. The signal feature sets are obtained from consecutive short-time segments of the signals. To recognize a specific signal, the features of the unknown signal are compared with the different sets of pre-stored reference signal features.

Figure 4.27: Classic Signal Recognition Scheme [37,38]. Blocks shown: signal, feature extraction, unknown pattern, pattern comparison, template memory, distance score, decision rule, recognized signal.

The matching of the unknown signal features to the templates is generally complicated by the non-linear time misalignment between the short-time feature segments of the unknown signal and those of the reference templates. To solve this matching problem, the well-known dynamic time warping (DTW) algorithm is employed [39]. With this algorithm, an optimum match between the unknown signal and the reference features is sought for each reference template. In these pattern comparisons, distance calculations are performed on the short-time segments of the signal feature sets in order to provide a measure of similarity between the unknown signal and the reference templates. The literature offers several distance measures [40,41,42].

One of two decision rules is used in most practical systems: the nearest neighbor rule (NN rule), and the K-nearest neighbor rule (KNN rule). The NN rule is applied when there is a unique reference template for each possible signal. In comparing an unknown signal with the reference templates, the unknown signal is identified as the pre-stored template that lies at the smallest distance from it. The KNN rule is applied when multiple reference templates are learned from each possible signal, giving several template sets representing different signals.
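The DTW matching and the NN rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the recognizer developed in this work: the Euclidean frame distance, the symmetric three-way local path constraint, the normalization by total path length, and all function and label names are choices made here for the example.

```python
import numpy as np

def dtw_distance(test, ref):
    """Normalized DTW distance between two feature sequences.

    test, ref: 2-D arrays, one row of features per short-time segment.
    Assumptions: Euclidean frame distance, steps (1,0), (0,1), (1,1).
    """
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n, m = len(test), len(ref)
    D = np.full((n + 1, m + 1), np.inf)      # accumulated distance table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(test[i - 1] - ref[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)                 # normalize by sequence lengths

def nearest_neighbor(test, templates):
    """NN rule: return the label of the template at the smallest distance."""
    scores = {label: dtw_distance(test, ref) for label, ref in templates.items()}
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    # Hypothetical one-dimensional feature tracks for two stored templates.
    templates = {
        "ring": np.array([[0.0], [1.0], [2.0], [3.0]]),
        "siren": np.array([[5.0], [5.0], [5.0]]),
    }
    # Same shape as "ring" but stretched in time.
    unknown = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [3.0]])
    label, scores = nearest_neighbor(unknown, templates)
    print(label, scores)
```

The time-stretched test sequence still matches the "ring" template with zero distance, which is the kind of non-linear time misalignment that DTW is meant to absorb.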
The unknown signal is associated with the template set for which the minimum average distance is computed.

4.2 Review & Evaluation of Signal Recognition Techniques

In the following sections, an overview of previous research in signal recognition is presented. Emphasis is placed on speech signal (isolated utterance) recognition techniques, because 1) these recognition schemes fit the pattern-recognition model well, 2) the recognition performance of each applicable technique has been reported [35,36], and 3) warning sounds have acoustic features (i.e., pitch and formant) similar to those of speech signals. Based on this survey, and on our signal analysis results, the most suitable recognition method will be selected for the WARNSIS.

4.2.1 Analyzing & Utilizing Timing Features

Timing information may be extracted from signals using autocorrelation coefficients, zero-crossing measurements, energy waveform analysis, and peak detection. Such information has been used in signal recognition in a variety of ways.

4.2.1.1 Auto-correlation Coefficients

Purton [43] used speech signal autocorrelation coefficients in his speaker-dependent recognition experiments. Specifically, these autocorrelation coefficients were derived from the outputs of two bandpass filters used to capture the formants of speech signals. He achieved an average recognition accuracy of 90 % for a vocabulary of 10 words.

Sondhi [44] applied autocorrelation analysis to speech signals that had been pre-processed by a center-clipping technique which removed the formant structure. The signal pitch was then extracted from the autocorrelation function. The combination of formant structure removal and autocorrelation analysis provided a robust pitch estimation method.
The effects on pitch estimation of different degrees of formant structure removal prior to autocorrelation analysis were reported by Rabiner [45]. A real-time hardware implementation of a pitch estimation scheme based on a combination of center-clipping and peak-clipping methods, followed by autocorrelation analysis, was reported by Dubnowski [46].

To use a correlator-bank approach similar to that of Purton [43] in our work, the number of correlators and the number of terms (autocorrelation function coefficients) retained to form a signal feature need to be determined. This may be achieved by spectral analysis, and for our signals a multiple-band correlator would be needed. In addition, if the autocorrelation function coefficients generated with zero delay are used, this is equivalent to utilizing the short-time signal spectral information. Hence, such a correlator-based recognizer produces a large feature set, making it difficult and uneconomical to design and implement a real-time recognizer based on this concept.

4.2.1.2 Zero-crossing

Rabiner and Sambur [47] analyzed energy and zero-crossing measurements of pre-recorded speech signals to determine the endpoint locations of isolated utterances. First, the energy contour of an utterance was generated and studied to provide a crude boundary. Zero-crossing measurements were then used to refine this utterance boundary. At an SNR of 30 dB or better, this endpoint detection algorithm worked very well over all tested conditions.

To develop a low-cost, microprocessor-based, speaker-dependent recognizer, Whitaker and Angus [48] employed zero-crossing measurements to track two formant variations of speech sounds. The zero-crossing counts were obtained from the outputs of two filters (one a low-pass filter with a cut-off frequency at 800 Hz, the other a high-pass filter with a 3 dB corner frequency at 1000 Hz). In order to optimize storage, they used a variable rate encoding technique to reduce redundancies in the signal features. With a vocabulary of 10 - 20 words, they attained an average recognition accuracy of 95 % - 99 %, depending upon the formant structure of the utterances.

The use of zero-crossing detectors for warning signal recognition is attractive. However, the accuracy of zero-crossing measurements depends on the amplitude of the dominant frequency relative to the other frequency components within each frequency band, and also on the spectral spacing of the components [48]. In addition, zero-crossing analysis is very prone to noise interference. Although zero-crossing detectors can be implemented easily and economically, the inconsistency of their operational performance in noisy environments makes this approach unsuitable for WARNSIS.

4.2.1.3 Energy Waveform

To counter the effect of nonstationary background noise added to the signal during transmission over telephone lines, Lamel et al. [49] developed a hybrid endpoint detection scheme for isolated utterances. This detector derives one or more endpoint pair estimates from the energy contours of the utterances. In order to determine the best endpoint pair, word recognition is performed using each possible set of endpoint pairs. The selection of the best pair is based on the best match achieved by the recognition process. The authors call this detector "hybrid" because 1) sets of possible endpoint pairs are obtained, and 2) the decision to select the best endpoint pair depends on feedback from the recognition scores. Using the best endpoint pairs corresponding to different utterances, the hybrid endpoint detector produces recognition results close to those obtained from hand-edited endpoints. A real-time implementation of this endpoint location scheme was given in [50].

It should be noted that energy contours are easily derived in practice. Since the energy contour "waveform" contains information on energy level changes occurring in time, it is potentially useful in our application.

4.2.1.4 Peak Detection

Gold and Rabiner [51] analyzed the relative timing relationships between the peaks of low-pass filtered speech signals, and reported a reliable pitch estimation method for speech signals with a pitch frequency of less than 300 Hz, even in a high level of white-noise background. An extension of this technique was developed to detect periodic and nonperiodic signals [52]. This method is especially susceptible to transient noise, such as that commonly occurring in the everyday acoustic environment. Therefore, this approach is not suitable for our application.

4.2.2 Feature Extraction by Filter Banks

Conceptually, the simplest way to extract spectral information from a signal is to pass it through a set of parallel bandpass filters tuned to different mid-frequencies. These mid-frequencies, and the filter bandwidths, would be selected to cover the frequency range of interest. The output of each filter is a measure of the average spectral intensity within the filter band.

White and Neely [53] implemented their broadband speech signal recognizer using a bank of 20 one-third octave bandpass filters. These overlapping filters spanned the frequency range from 100 Hz to 10 kHz. Using a list of multisyllabic words from a North American dictionary, they achieved a recognition accuracy of 99.6 % in their experiments.
Another filter-bank based speech recognizer was developed by Kwok, Tai and Fung [54] for the identification of the monophonemic Cantonese digits zero to ten. With 12 eight-pole overlapping filters, this recognizer provided an average recognition accuracy of 96.8 %.

In industry, NEC has developed an integrated, filter-bank based, isolated word recognition LSI chip set [55]. The feature extraction processor of this chip set consists of eight biquad digital bandpass filters spanning the frequency range from 100 Hz to 5.0 kHz. The chip set employs a specific data compression algorithm to remove redundancy in the signal spectral features, and carries in firmware a dynamic programming algorithm that performs the dynamic time warping calculations for signal recognition. A recognition accuracy of more than 98 % was reported.

Miyazaki and Ishida [14] developed a traffic alarm sound monitor for aurally handicapped drivers. This monitor consists of seven bandpass filters followed by seven line spectrum detectors. In order to reduce false-alarm triggering due to the squeaking noises of brakes and tires, engine noise at high revolutions, wind noise at high driving speeds, human voice, and music, an error suppression circuit was designed to detect a sudden rise in the SPL of the input signal. The successful detection of traffic alarm sounds depends on the outputs of both the seven line spectrum detectors and the error suppression circuit. During field tests of this monitor on moderately crowded downtown roads in Tokyo, an average of one false alarm per three minutes was observed.

For our application, the filter bank approach offers the advantages of robustness, noise resistance, and straightforward implementation at a low cost. These will be discussed in more detail in Section 4.5.
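As a rough illustration of filter-bank feature extraction, the sketch below approximates the outputs of a bank of bandpass filters by summing the short-time power spectrum inside each band. It is a frequency-domain stand-in chosen for brevity, not any of the recognizers cited above: the bands here are contiguous rather than overlapping, and the band layout, the Hann window, and the function names are assumptions made for the example.

```python
import numpy as np

def band_edges(f_lo, f_hi, n_bands):
    """Logarithmically spaced band edges in Hz (hypothetical layout)."""
    return np.geomspace(f_lo, f_hi, n_bands + 1)

def filter_bank_features(frame, fs, edges):
    """Band energies in dB for one short-time frame.

    Each band energy is the sum of the windowed power spectrum between two
    adjacent edges, standing in for the output of one bandpass filter.
    """
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    energies = np.array([
        power[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return 10.0 * np.log10(energies + 1e-12)   # small offset avoids log(0)

if __name__ == "__main__":
    fs = 20000.0                                # sampling rate used in Chapter 3
    edges = band_edges(100.0, 10000.0, 8)       # eight bands, 100 Hz to 10 kHz
    t = np.arange(512) / fs                     # one 25.6 msec frame
    features = filter_bank_features(np.sin(2 * np.pi * 1200.0 * t), fs, edges)
    print(np.round(edges), int(np.argmax(features)))
```

Each frame is reduced to a handful of band energies, which is the kind of compact feature vector that a template-matching recognizer can store and compare.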
Solutions to the Recognition Problem 76 4.2.3 T h e L P C / A R M o d e l The L P C / A R model assumes that signals can be parametrically modeled as the outputs of a linear, t ime-varying system excited by either quasi-periodic pulse trains, or random noise. T h e L P C / A R signal analysis technique has been widely applied to seismic and speech signal processing. To discriminate between earthquakes and underground nu- clear explosions, Tjos theim [56] employed a third-order autoregressive model to analyze short per iod seismic events. The extracted A R parameters produced two discernible clusters characterizing earthquakes and explosions, respectively. So far L P C / A R parameters have been proven to give the most effective charac- terization of speech signals. These L P C / A R coefficients represent the combined in - formation about the formant frequencies, their bandwidth , and the glottal waveforms [57]. Therefore, during the past decade, considerable effort was directed at the study of the performance of L P C / A R - b a s e d isolated word recognizers. Ackenhusen and O h [58] implemented an eighth-order L P C - b a s e d isolated word recognizer using an A T & T D S P - 2 0 processor. Th i s recognizer has also been used i n research for 1) statistically clustered templates for speaker-independent word recognition, 2) recognition based vec- tor quantizat ion, and 3) recognition based hidden M a r k o v Mode l ing ( H M M ) of speech signals. Dau t r i ch et a l . [59] demonstrated that i n high S N R environments and for signals t ransmitted v ia telephone lines, L P C - b a s e d recognizers can perform several percentages better than filter-bank based recognizers. In considering an L P C / A R approach for W A R N S I S , we must deal w i th two prob- lems inherent to this technique. F i rs t , the order, 'p ' , of the L P C / A R signal analysis has to be estimated. 
Different criteria exist for estimating 'p' for the LPC/AR analysis, but these criteria are signal dependent [24]. Second, the LPC/AR signal analysis is very vulnerable to noise interference [60]. Since the LPC/AR model tends to fit spectral peaks more accurately than the valleys [26], it is logical to compensate for those spectral peaks caused by noise interference by increasing 'p' in noisy environments. Unfortunately, for a practical recognizer, 'p' must always be fixed and independent of the varying unknown signals received. Tierney [61] showed that noise reduction should be applied prior to the analysis to ensure the best LPC/AR based recognition system performance in noisy backgrounds. To compensate for the LPC/AR parameter variations due to different noise sources, the derived LPC-cepstral coefficients with different weighting factors were adopted as signal features. Improvement in system recognition performance was reported in [63,64].

To implement a real-time LPC/AR based recognizer with "intelligent" noise pre-filtering for our application, a complex multiple-processor based system would be required. Such complexity makes this approach undesirable for WARNSIS.

4.2.4 LPC-derived Cepstral Coefficients

Pioneering work investigating the effectiveness of different speech parameters for speaker identification and verification was done by Atal [62]. He concluded that LPC-derived cepstrum coefficients provided better identification performance than LPC coefficients, signal autocorrelation coefficients, or the impulse response coefficients of an all-pole filter derived from the estimated LPC/AR coefficients.

Recently, the use of LPC-derived coefficients for speech signal recognition has been reconsidered by Juang et al. [63] who applied bandpass liftering in speech recognition.
They showed that bandpass liftering of the LPC-derived cepstral coefficients (equivalent to applying a smoothing window) tends to reduce undesirable spectral sensitivity by smoothing the spectral peaks without distorting the fundamental formant structure. Such undesirable spectral sensitivity may be caused by the presence of spectral notches or zeros in the signal spectrum, introduced during signal transmission, by filtering, or by improper preemphasis. Smoothing transforms the original LPC-derived cepstral coefficients into more reliable parameters. Juang's recognition results showed that the bandpass liftering process produced one percent less error than a process using standard cepstral coefficients.

Hanson and Wakita [64] used "root-power sums" or weighted cepstral coefficients as spectral distortion measures for speaker-dependent isolated word recognition in different noise environments. They showed that for white noise interference, a gain of 16 % in recognition accuracy may be achieved by using weighted rather than standard cepstral coefficients.

This method suffers from the same limitations of complexity and computational requirements as the LPC/AR approach. Therefore, it is equally unsuitable for our application.

4.2.5 The Hidden Markov Model (HMM) Approach

One application of HMM for signal recognition is speaker-independent isolated word recognition. The left-to-right topology of HMM is generally adopted in practice. Such an HMM has N states, and each state corresponds to a set of temporal events in the speech signals. The HMM is characterized by a state transition matrix, and a statistical characterization of the acoustic vectors within each state. A detailed exposition on the application of HMM to speech recognition is given in [65]. Rabiner et al.
[66] showed that the HMM based recognizer requires ten times less storage, and about 17 times less computation for recognizing a test utterance, than does an equivalent recognizer using LPC coding and DTW. This is at the expense of a slight increase in error rate, and of extensive computation while training the model with a reasonably large ensemble of utterance samples. The improvement of HMM performance in different noisy environments has received considerable attention in the last few years [67].

Considering HMM for our application, we must concern ourselves with the topology of the model. Based on such a topology, the Baum-Welch algorithm could be employed to extract the statistical parameters of the model [65]. To evaluate these probabilistic model parameters, scaling of temporary results must be performed with great care to avoid underflow problems, which occur even when mainframes are used [66]. Therefore, HMM appears to be unattractive for a hardware implementation that uses integer arithmetic amenable to real-time operation.

4.3 Overview of the Recognition Scheme for WARNSIS

In the selection of the recognition scheme for WARNSIS the following criteria must be considered:

1. reliability and robust recognition performance in different noise environments;
2. real-time operation;
3. portability; and
4. reasonable cost.

Our preliminary experiments have shown that neither timing nor short-time spectral information is sufficient on its own for reliable recognition performance (see Chapter 6 for performance results). Since both timing and spectral information contribute unique identifiers, a "hybrid" recognition scheme, utilizing both timing and short-time spectral information, was designed for WARNSIS (Fig. 4.28). In particular, our design uses timing features as "tokens" to assign sounds to various groups (steady, on-off, variable, etc).
Spectral analysis is then used to correlate the spectra of the unknown sound with the spectra of the warning sounds belonging to that group. We have designed a unique analyzer which produces timing information and obtains spectra using the filter bank approach.

Figure 4.28: The 'hybrid' recognition scheme for WARNSIS

Operationally, the system works as follows. In the training stage, the warning sounds of interest are analyzed, and relevant timing information is derived and stored in the timing pattern memory. Subsequently, short-time spectra of these sounds are generated by the spectral analyzer. The short-time spectra of warning sounds are classified and stored in the spectral pattern memory according to the group classification determined earlier by timing analysis.

In the recognition stage, two types of pattern comparisons are performed sequentially before a decision is reached to declare a successful recognition for a specific warning sound. The first stage involves the timing pattern comparison between the timing features of an unknown signal and the pre-stored timing patterns. If the matching criteria are not satisfied for any of these patterns, no spectral analysis is performed on the incoming signal, and the timing analysis resumes for the next sample. If a match is found with one of the timing patterns, the signal is assigned to the corresponding "group", and spectral extraction and pattern comparisons are performed on it. Based on the minimum distance score computed for the pre-stored templates, the unknown signal is recognized as the corresponding warning sound. The details of the design are given in Sections 4.4 and 4.5.
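The two-stage decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Template` record, the `recognize` function, the group labels, and the simple `l1_distance` placeholder are hypothetical names introduced here, and the prototype itself performs the spectral comparison with dynamic time warping in hardware rather than with this frame-by-frame sum.

```python
from dataclasses import dataclass

@dataclass
class Template:
    name: str      # warning sound label (hypothetical examples below)
    group: str     # timing group assigned during training, e.g. "steady"
    pattern: list  # sequence of spectral feature vectors

def l1_distance(ref, test):
    """Placeholder distance: sum of absolute differences over aligned frames.
    A time-warping distance would take its place in a real recognizer."""
    return sum(abs(a - b)
               for r, t in zip(ref, test)
               for a, b in zip(r, t))

def recognize(timing_group, test_pattern, templates, distance=l1_distance):
    """Stage 1: the timing analysis either yields a group or None.
    Stage 2: only templates of that group are compared spectrally."""
    if timing_group is None:
        return None  # no timing match, so no spectral work is done
    candidates = [t for t in templates if t.group == timing_group]
    if not candidates:
        return None
    best = min(candidates, key=lambda t: distance(t.pattern, test_pattern))
    return best.name

# Hypothetical template store with two timing groups
templates = [
    Template("telephone", "burst", [[1, 1]]),
    Template("smoke alarm", "burst", [[9, 9]]),
    Template("siren", "steady", [[1, 1]]),
]
result = recognize("burst", [[8, 9]], templates)
```

Restricting the candidates to one timing group is what reduces the number of spectral comparisons, and returning early when no group is found mirrors the rule that spectral analysis is skipped when the timing criteria are not met.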
Since pattern comparisons involve the most intensive computations in producing a set of distance measures (similarity measures), any possible reduction in the number of comparisons between the unknown signal and the pre-stored templates enhances the real-time performance of recognizers. In our recognition scheme this is achieved by making use of the timing features to group warning sounds. An additional use of timing information is to prevent unnecessary spectral and pattern analysis work when only noise is present.

4.4 Extracting & Classifying Timing Information

One or two signal processing steps may be needed to extract timing features from steady or burst-type warning sounds (Fig. 4.29). The first step classifies warning sounds according to the features derived from signal waveforms. For steady sounds, timing feature extraction terminates after this processing; for burst-type sounds, the processing proceeds to the next step, which estimates the repetition period.

Figure 4.29: Block diagram of the Timing Feature Extractor

In real life, warning sounds are modified acoustically by the environment, and by the addition of unwanted sounds. These background sounds may be either continuous or transient. In addition, what a microphone receives from a source depends on the paths between the two, their orientation with respect to each other, and the sound modification characteristics of the environment.

Extracting timing features from distorted and noisy signals has not been addressed by other workers in the literature. Compelled by the demands of real-life circumstances, we developed the algorithm presented here to deal with this problem.
This development was inspired by the work of Gold and Rabiner [51], and Lamel [49].

4.4.1 A Scheme to Extract Timing Features

We have demonstrated in Chapter 3 that the contour characteristics of the short-time average absolute amplitudes (STAAA) of warning sounds are distinctively defined for steady and burst-type sounds. Working with the short-time average absolute amplitude is more attractive for us than the average energy used in Lamel's work because the short-time average absolute amplitude: 1) is a simple measurement which preserves the essential features of the corresponding energy contours, 2) requires no multiplication operations, and 3) has a smaller dynamic range which can be coded in 8 bits. The relationships between the short-time average absolute amplitudes and the average energy of a discrete sequence x(n) are shown in Fig. 4.30.

Since the short-time average absolute amplitude is obtained from an 8-bit A/D conversion, and is coded in integer arithmetic, its dynamic variations are limited to 256 levels. The value of the short-time average absolute amplitude is zero when the environmental noise level falls below the threshold value of the A/D conversion system. In order to compress the dynamic variations of the short-time average absolute amplitudes for plotting purposes, we adopted a logarithmic measure to readjust these values. This logarithmic measure is:

$$\mathrm{STAAA}_{\log} = 10 \log_{10}(\mathrm{STAAA} + 1) \tag{4.20}$$

Figure 4.30: Relationships between the instantaneous energy and the instantaneous absolute amplitudes of a sequence, x(n). (a): the plot of x(n); (b): the plot of |x(n)|; and (c): the plot of x²(n). Horizontal axis: time (in sec).

Note that the value of the short-time average absolute amplitude is incremented by one to prevent the argument of the logarithm from taking on the value of zero.
The error introduced by this is not relevant since the essential features of the contour are not affected.

From the STAAA contours of warning sounds, the break-points or transitions (rising and falling) in these waveforms are located. Timing features of warning sounds are thus derived from the timing relationships between these transitions, similarly to the method of Gold and Rabiner. Fig. 4.31 (a) gives the STAAA contour of a steady sound, whereas Fig. 4.31 (b) shows the STAAA contour of a burst-type sound. With reference to Fig. 4.31 (a), a steady sound is identified if a rising transition of the STAAA waveform is detected, and the new STAAA value is then maintained for at least four seconds. For burst-type sounds two rising and two falling transitions must be detected ($T_1$, $T_3$ and $T_2$, $T_4$, respectively, as shown in Fig. 4.31 (b)). The repetition period (RP) and the average width of signal bursts (AWSB) can then be obtained according to the following equations:

$$RP = \frac{(T_3 - T_1) + (T_4 - T_2)}{2} \tag{4.21}$$

$$AWSB = \frac{(T_2 - T_1) + (T_4 - T_3)}{2} \tag{4.22}$$

To detect these transitions, a signal amplitude threshold is derived from the short-time average absolute amplitude of the acoustic background. This threshold is dynamically updated every 12.8 msec to accommodate the acoustic energy variations of the environment. This dynamic amplitude threshold (DAT) provides the baseline level of the background, and is used for transition (rising and falling) detection.

Figure 4.31: (a): The STAAA contour of a steady sound; (b): The STAAA contour of a burst-type sound. Horizontal axis: time (in sec).

When the detection scheme starts, the dynamic amplitude threshold is assigned the maximum value. Then, the incoming short-time average absolute amplitude is compared to the dynamic amplitude threshold. If the incoming short-time average absolute amplitude is less than the dynamic amplitude threshold, the threshold is updated by averaging the two values:

$$\mathrm{DAT}_{\text{updated}} = \frac{\mathrm{STAAA} + \mathrm{DAT}}{2} \tag{4.23}$$

Updating ensures that the dynamic amplitude threshold follows the amplitude level changes due to background noise. This method continuously adjusts the dynamic amplitude threshold downwards until a rising transition is detected. Such a transition may be due either to a warning signal or to a sudden increase in background noise. If no rising transition is detected for a period of four seconds, the dynamic amplitude threshold is reset to its initial value, and the search for a rising transition resumes. Fig. 4.32 shows examples of how the dynamic amplitude threshold adapts to acoustic energy variations in the environment.

Figure 4.32: Two typical examples of how the dynamic amplitude threshold adapts to acoustic energy variations of the environment. (a): sudden decrease in signal levels; (b): sudden increase in signal levels

Since the dynamic amplitude threshold and the short-time average absolute amplitudes are expressed in integer arithmetic, the minimum detectable difference between them is one. To avoid the false detection of a rising transition due to random noise disturbance, we set the threshold for detecting this transition to two. If the short-time average absolute amplitude is larger than the dynamic amplitude threshold by this preset threshold, a rising transition is detected and a reference time marker ($T_1$) is set. A corresponding falling transition will be detected and marked ($T_2$) as soon as an incoming short-time average absolute amplitude falls below the dynamic amplitude threshold.

However, if no falling transition is detected in a period of four seconds (the maximum allowable burst width), this sound may be a steady sound. To confirm this, the dynamic amplitude threshold is reset to its initial value, and if no rising transition is detected in the one-second period following, the sound is declared to be a steady sound, and the timing feature extraction process terminates. If a rising transition is detected within one second, the search for its corresponding falling transition continues, and the hypothesis of a steady sound is rejected. Assuming a burst-type signal, this detection process continues until a second transition pair is detected and marked with $T_3$ and $T_4$ for the rising and falling transitions, respectively. Subsequently the RP and AWSB are computed and the timing feature extraction process terminates. A typical example of the detection of a siren sound is illustrated in Fig. 4.33 (a), and Fig. 4.33 (b) demonstrates how the steady sound detection scheme rejects non-steady sounds.

This scheme works well for warning sounds in backgrounds with steady noises. To deal with nonstationary noises such as radio broadcasts, and transient sounds due to door slamming or movement of chairs, additional parameters and conditional tests are included in the scheme. These are: 1) the minimum burst duration (MBD), and 2) the maximum inter-arrival time (MIAT) between two consecutive signal bursts. As shown in Fig. 4.34, any signal with duration less than the MBD is declared as an unwanted transient.
Furthermore, if the signal shows pulsative variations that last longer than the MIAT, the hypothesis of a burst-type sound is rejected.

These conditional tests were incorporated into the basic scheme as follows. When any signal burst is detected, its width is calculated and compared to the MBD. If the computed width is less than the MBD, the detected burst is treated as transient noise, and the search continues. If the burst is longer than the MBD, the system waits until a second transition is detected. The time difference between following transitions is computed, and compared to the MIAT. If this time is longer than the MIAT, the hypothesis of a burst-type sound is rejected, the dynamic amplitude threshold is reset, and the timing feature extraction process is reset and restarted.

Figure 4.33: (a): Detection of a steady sound; (b): An illustration of how the scheme rejects a non-steady sound. Plotted quantities: signal level and DAT against time (in sec).

Figure 4.34: A demonstration of the use of the MBD and MIAT to refine the basic warning sound analysis scheme

A flowchart of the complete scheme for timing feature extraction is shown in Fig. 4.35. The program was written in INTEL 8088/8086 assembly language for real-time operation. The hardware developed in Chapter 3 for timing parameter measurement is employed here to generate the instantaneous absolute amplitudes of warning sounds.
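The core of the burst-detection logic can be illustrated with a short Python sketch. It is a simplified reading of the scheme, not the thesis's assembly program: it covers the threshold tracking, the rising-transition margin of two, the four-second reset, and the MBD test, and it omits the steady-sound and MIAT branches. The function names and the MBD of four frames are hypothetical. One deliberate deviation is labeled in the code: a falling transition is declared when the amplitude no longer exceeds the threshold by the margin, an assumption made so that a return to a background level exactly equal to the threshold is registered. RP and AWSB are taken as the averages of the two transition intervals and of the two burst widths, which is how this sketch reads Eqs. 4.21 and 4.22.

```python
FRAME_SEC = 0.0128   # update interval stated in the text (12.8 msec)
DAT_MAX = 255        # 8-bit maximum, used as the initial threshold
RISE_MARGIN = 2      # rising-transition margin stated in the text
RESET_FRAMES = round(4.0 / FRAME_SEC)  # four-second search timeout

def extract_bursts(staaa, mbd_frames=4, bursts_wanted=2):
    """Return (rise, fall) frame indices of bursts lasting >= mbd_frames.
    mbd_frames is a hypothetical value chosen for illustration."""
    dat = DAT_MAX
    bursts = []
    rise = None   # frame index of the pending rising transition
    idle = 0      # frames searched without a rising transition
    for i, e in enumerate(staaa):
        if rise is None:
            if e - dat >= RISE_MARGIN:
                rise, idle = i, 0          # rising transition detected
                continue
            if e < dat:
                dat = (dat + e) // 2       # integer average, as in Eq. 4.23
            idle += 1
            if idle >= RESET_FRAMES:
                dat, idle = DAT_MAX, 0     # reset and resume the search
        elif e - dat < RISE_MARGIN:        # ASSUMPTION: falling criterion
            if i - rise >= mbd_frames:     # shorter bursts are transients
                bursts.append((rise, i))
                if len(bursts) == bursts_wanted:
                    break
            rise, idle = None, 0
    return bursts

def timing_features(bursts):
    """RP and AWSB in seconds from two (rise, fall) pairs."""
    (t1, t2), (t3, t4) = bursts
    rp = ((t3 - t1) + (t4 - t2)) / 2 * FRAME_SEC
    awsb = ((t2 - t1) + (t4 - t3)) / 2 * FRAME_SEC
    return rp, awsb

# Synthetic contour: background level 10, two 30-frame bursts at level 100
contour = [10] * 20 + [100] * 30 + [10] * 20 + [100] * 30 + [10] * 10
bursts = extract_bursts(contour)
rp, awsb = timing_features(bursts)
```

With this synthetic contour the threshold settles at the background level within the first few frames, after which the two bursts are located at their true edges. A two-frame spike would be discarded by the MBD test.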
Figure 4.35: Flowchart of the Timing Feature Extraction Scheme

4.5 Extracting Spectral Information

As shown in Fig. 4.28, timing analysis is followed by spectral analysis. The latter is initiated only if the timing analysis indicates the possibility of the presence of one of the recognizable warning sounds. Since timing analysis of warning sounds gives the time markers for the rising and falling transitions of sound bursts, it is equivalent to the end-point detection of isolated utterances [49]. Thus, the timing analyzer conveniently provides the on/off control for the spectral analyzer.

4.5.1 Feature Extraction

In our review of methods of obtaining spectral information from signals in real time we have already indicated our preference for the filter-bank approach. Firstly, the filter-bank method works well for simple speech signals, and the warning signal spectra are simpler than the spectra of speech. In particular, Dautrich et al. [59] demonstrated that for spoken digits the performance of a filter-bank recognizer was equal to the performance of the more complicated LPC recognizer. Secondly, as shown by Lim [60], in noisy environments filter-bank recognizers are less error prone than LPC-based recognizers. This is a very important criterion for us, since our specific goal is to recognize warning signals in low SNR situations. Thirdly, filter-bank recognizers are fast, are relatively simple, and are commercially available at a reasonable cost.

Fig. 4.36 gives the block diagram of our spectral analyzer which uses a filter-bank. Signals pass through a bank of eight bandpass filters covering frequency bands from 100 Hz to 5.0 kHz.
The output of each bandpass filter is passed through a full-wave rectifier, and low-pass filtered to give a value related to the energy of the incoming warning sounds in each band. The channel outputs are sampled (typical rate 50 - 100 Hz) to give a segment of a feature set.

Figure 4.36: Filter-bank analysis of warning sounds (BP_k: kth bandpass filter; FW_k: kth full-wave rectifier; LP_k: kth low-pass filter)

At a time index k, a segment of parallel outputs $\{x_1(k), x_2(k), \ldots, x_8(k)\}$ defines an 8th-order feature vector $\mathbf{X}(k)$ as,

$$\mathbf{X}(k) = \{x_1(k), x_2(k), \ldots, x_8(k)\} \tag{4.24}$$

A complete spectral pattern of a warning sound is given as,

$$\mathbf{R} = \{\mathbf{X}(1), \mathbf{X}(2), \ldots, \mathbf{X}(k), \ldots, \mathbf{X}(N)\} \tag{4.25}$$

In the recognition stage these reference patterns are compared to the spectral pattern, T, of an unknown signal. Dynamic time warping is employed to provide a quantitative similarity measure between reference and unknown patterns.

4.5.2 Dynamic Time Warping (DTW)

The basic idea of DTW is to provide an optimum similarity measure between two patterns of different time durations. DTW can compensate for the nonlinear time misalignment of patterns which may be caused by noise giving rise to errors in the detection of endpoints.

Conceptually, matching between these patterns involves the search for a time warping function for which the segment-to-segment comparison is optimal according to some distance criteria. Fig. 4.37 gives an example of the optimum match between a reference template and an unknown pattern whose feature sets consist of letters of the alphabet.

Mathematically, the problem can be stated in the following manner.
Consider

$$R(n),\ T(m) \quad \forall\, n \in [1, N],\ m \in [1, M]$$

where $N \neq M$ (in general), and $R(n)$, $T(m)$ are the reference and the test pattern at time indices n, m, respectively. DTW is to find an optimum time warping function $w(n)$ to minimize the accumulated distance ($D_A^*$) between these two patterns, with $D_A^*$ given by

$$D_A^* = \min_{\{w(n)\}} \sum_{n=1}^{N} d\,[R(n), T(w(n))] \tag{4.26}$$

where $d\,[R(n), T(w(n))]$ is defined as the frame-by-frame (segment-by-segment) distance measure. Several possible distance measures can be used, depending on the form of the feature sets [37]. In this discussion, the absolute magnitude difference is used as a distance measure. Thus, $d\,[R(n), T(w(n))]$ is expressed by,

$$d\,[R(n), T(w(n))] = \sum_{k=1}^{L} \left| R_k(n) - T_k(w(n)) \right| \tag{4.27}$$

where $R_k(n)$ is the kth bandpass filter output at time index n of a reference spectral pattern, $T_k(w(n))$ is the kth bandpass filter output at time index $w(n)$ of a test spectral pattern, and $L$ is the total number of bandpass filters of the filter-bank used.

Figure 4.37: An example of pattern matching between a reference template and unknown pattern

Since one would expect the optimum warping path to be close to a straight line, most of the computations at the beginning and the end of this path can be reduced by establishing boundary conditions for the search. In general, the optimum warping path function can be obtained by Dynamic Programming [39,53,69]. Rewriting the original path searching equation, a recursive accumulated distance function, denoted as $D_A(n, m)$, is defined as

$$D_A(n, m) = d\,[R(n), T(m)] + \min_{l \le m} \left[ D_A(n-1, l) \right] \tag{4.28}$$

The above equation defines the minimum accumulated distance to grid point $(n, m)$, and consists of the local distance between feature sets $R(n)$ and $T(m)$, plus the minimum accumulated distance to the grid point $(n-1, l)$, where $l$ takes the possible values of m constrained by a given set of local paths. As an example, Fig.
4.38 shows one of the possible sets consisting of three paths leading to the grid point $(n, m)$: from $(n-1, m)$, $(n-1, m-1)$, and $(n-1, m-2)$. To ensure that the time warping function is monotonically increasing, an additional path constraint is applied. Specifically, if the best path to grid point $(n-1, m)$ came from grid point $(n-2, m)$, then no path can lead from grid point $(n-1, m)$ to grid point $(n, m)$.

Formulating these path constraints mathematically, we obtain

$$w(n) - w(n-1) = \begin{cases} 0, 1, 2 & \text{if } w(n-1) \neq w(n-2) \\ 1, 2 & \text{if } w(n-1) = w(n-2) \end{cases} \tag{4.29}$$

Therefore, substituting the above constraint equations into Eq. 4.28, we have the DP recursive solution to the DTW,

$$D_A(n, m) = d\,[R(n), T(m)] + \min \{ D_A(n-1, m)\, g(n-1, m),\ D_A(n-1, m-1),\ D_A(n-1, m-2) \} \tag{4.30}$$

where

$$g(n-1, m) = \begin{cases} 1 & \text{if } w(n-1) \neq w(n-2) \\ \infty & \text{if } w(n-1) = w(n-2) \end{cases} \tag{4.31}$$

with boundary conditions governed by,

$$w(1) = 1 \tag{4.32}$$

$$w(N) = M \tag{4.33}$$

and continuity criterion for $w(n)$ expressed by,

$$w(n) \ge w(n-1) \tag{4.34}$$

This iteration is carried out over all valid m, for each n sequentially from n = 1 to N. The constraint of Eq. (4.33) means that the last segments of the template and the test signal must coincide, and the distance function is $D_A(N, M)$. When the last segment is reached, the warping path $w(n)$ is completely defined.

Figure 4.38: Local path constraints for DTW

The complexity involved in DTW implementation depends on the boundary conditions, the local path constraints, and the distance measure. Both Sakoe and Chiba [39], and Myers [70], have investigated the effects of varying these factors on both the speed and the performance of the DTW algorithm in speech-recognition systems. They have shown that only small differences are found in performance for a fairly wide range of variations of these parameters.
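The recursion of Eq. 4.30, with the local paths of Fig. 4.38 and the boundary conditions of Eqs. 4.32 and 4.33, can be sketched in Python as follows. This is a plain software illustration with hypothetical function names; the prototype relies on a firmware implementation, and details such as tie-breaking between equal-cost paths are assumptions of this sketch.

```python
import math

def frame_distance(r, t):
    """Absolute magnitude difference between two feature vectors (Eq. 4.27)."""
    return sum(abs(a - b) for a, b in zip(r, t))

def dtw_distance(ref, test):
    """Accumulated distance D_A(N, M); math.inf if no valid path exists."""
    n_ref, n_test = len(ref), len(test)
    inf = math.inf
    # dist[n][m]: minimum accumulated distance to grid point (n, m)
    dist = [[inf] * n_test for _ in range(n_ref)]
    # flat[n][m]: True if the best path into (n, m) came from (n-1, m)
    flat = [[False] * n_test for _ in range(n_ref)]
    dist[0][0] = frame_distance(ref[0], test[0])      # w(1) = 1
    for n in range(1, n_ref):
        for m in range(n_test):
            best, best_flat = inf, False
            # zero-slope step, forbidden after a previous zero-slope step
            if not flat[n - 1][m] and dist[n - 1][m] < best:
                best, best_flat = dist[n - 1][m], True
            if m >= 1 and dist[n - 1][m - 1] < best:
                best, best_flat = dist[n - 1][m - 1], False
            if m >= 2 and dist[n - 1][m - 2] < best:
                best, best_flat = dist[n - 1][m - 2], False
            if best < inf:
                dist[n][m] = best + frame_distance(ref[n], test[m])
                flat[n][m] = best_flat
    return dist[n_ref - 1][n_test - 1]                # w(N) = M

# A test pattern that is a time-stretched copy of the reference
reference = [[0], [5], [0]]
stretched = [[0], [5], [5], [0]]
d_same = dtw_distance(reference, stretched)
```

The `flat` table plays the role of g(n-1, m) in Eq. 4.31: it blocks a second consecutive horizontal step. Pattern pairs whose lengths cannot satisfy both boundary conditions under these local paths yield an infinite distance.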
If the reference and test patterns are dissimilar, the distance measures will be consistently large. Therefore an accumulative distance limit must be established to stop unnecessary computation. Whenever an accumulated minimum distance is obtained, it is compared to the distance limit. If it is larger than the limit value, the matching process between this reference and the test pattern terminates, and another reference pattern is used to compare to the test pattern.

Chapter 5

Design & Implementation

Utilizing the methodologies discussed in the previous chapters, we designed and implemented a WARNSIS prototype. Fig. 5.39 shows the four main hardware building blocks of our device: the microphone, the signal conditioner, the control & timing processor (CTP), and the spectral recognizer (SR).

5.1 Timing Analyzer

5.1.1 Microphone

A microphone is used as the transducer that receives environmental sounds and produces the electrical input for the WARNSIS. The characteristics of the microphone play a crucial role in determining the quality of the signal that is fed to the analog signal conditioner.

We selected a SONY model directional microphone which has a frequency response of 100 - 15000 Hz, and a sensitivity of -70 ± 3 dB (with reference to 0 dB = 1 V/μbar) at 1000 Hz. It is an electret-condenser microphone with two selectable angles (90° and 120°) of reception. A microphone with a narrower angle of reception may provide better spatial separation between the signal and the background noise when the sources are separated and the microphone is oriented in the direction of the signal source. On the other hand, when such a microphone is not oriented in the direction of the signal source, the signal quality may be degraded substantially.
Figure 5.39: The building blocks of WARNSIS

5.1.2 Analog Signal Conditioner

The function of the analog signal conditioner is to: 1) pre-process the microphone output to generate an analog input for the spectral recognizer, and 2) calculate the instantaneous amplitudes of the signal for use by the control & timing processor. Correspondingly, the signal conditioner consists of an audio pre-amplifier, a low-pass filter, an automatic gain controller (AGC), two solid-state analog switches, an SPDT manual switch, a full-wave rectifier, and a 1 kHz calibrating tone generator.

The voltage produced by the directional microphone is fed to an audio pre-amplifier. Since the noise characteristics of an audio pre-amplifier system depend primarily on the noise generated by its first stage, we used a low-noise audio operational amplifier (with noise characteristic of 9 nV²/Hz). This pre-amplifier provides a voltage gain of 58.3 dB at a 100 Hz - 8.0 kHz bandwidth.

To reduce the unwanted high frequency content of the signal, the pre-amplified signal is fed to a 6th-order Chebyshev low-pass filter, with a cut-off frequency at 6.4 kHz. This 6th-order filter was constructed from three cascaded second-order filters. The overall voltage gain of the filter chain is 11.2 dB. The filtered signal is subsequently branched into two signal processing modules: the full-wave rectifier and the AGC.
We used the same full-wave rectifier module as the one described in Chapter 3. The AGC is employed to maintain the signal level at values that prevent signal clipping. This AGC limits output signal amplitude variations to 3 dB when the incoming signal varies by 60 dB.

The analog switch, SW1, provides a windowed segment of the signal from the AGC output. This switch is controlled by the control & timing processor, and the gating window duration is set to 470 msec. This gating duration can easily be altered by an external timing resistance. The control & timing processor will activate SW1 according to the timing information extracted from the instantaneous amplitudes of the signal. The output of the AGC module is then fed to an SPDT manual switch (SW3).

The 1 kHz calibrating tone has a peak-to-peak voltage of three volts. The tone generator is connected to another analog switch (SW2) whose output is tied to the second input of SW3. The function of the 1 kHz tone is to calibrate the input signal level of the hybrid analog processor of the spectral recognizer during the initialization of the WARNSIS. In this prototype, the user has to flip the switch manually to determine which of the two signals (the processed signal from the microphone, or the 1 kHz calibration tone) is fed to the hybrid analog processor.

5.1.3 Control & Timing Processor (CTP)

The control & timing processor consists of decoding circuits, a software programmable port, and a microprocessor. The port (INTEL 8255, software programmable) allows parallel communication between the microprocessor and the spectral recognizer to monitor the step-by-step operation of the recognizer logic, and is the gateway for the control signal that operates the switch in the analog signal conditioner. The microprocessor is an INTEL 8088, housed in a personal computer.
The first function of the control & timing processor is to perform 'real-time' timing analysis as described in Chapter 4. Its second function is to initiate the spectral recognition process.

5.2 Spectral Recognizer (SR)

The spectral recognizer hardware consists of an NEC LSI speech chip set. This set has three processors, as shown in Fig. 5.39: 1) the hybrid analog processor (MC4760), 2) the feature extraction and pattern matching processor (μPD7761), and 3) the control processor (μPD7762) [55]. We selected this speech recognition chip set since it has the features required by our method:

• filter-bank based recognizer;
• signal frequency bandwidth of 100 Hz to 5.0 kHz;
• allowable windowed signal duration from 0.2 sec to 2.0 sec;
• supports a maximum storage of 512 signal templates;
• uses syntax numbers in grouping signal templates;
• pattern comparison using DTW via a "firmwared" DP method;
• simple set of twelve macro commands to operate the chip set; and
• average recognition time of 0.5 sec.

This chip set, coupled with external memory for signal template storage, constitutes the spectral recognizer of our WARNSIS.

5.2.1 The Hybrid Analog Processor (MC4760)

The hybrid analog processor performs signal equalization and digital sampling of input signals. Fig. 5.40 gives a simplified block diagram of the MC4760. The signal is applied to the equalization amplifier, whose voltage gain can be altered by varying an external resistance. Since sufficient voltage gain is provided by the signal conditioner, the voltage gain of the equalization amplifier is set to the minimum possible gain (0.59 dB). The gain of the input signal can be adjusted further by a digital programmable attenuator under the control of the control processor. For speech applications, this attenuator compensates for signal level variations due to microphone position. However, in our application signal level equalization is performed by an external AGC circuit, and thus the attenuator gain is permanently set to unity. The attenuated signal is then low-pass filtered by an anti-aliasing filter (5 kHz bandwidth), and is input to a built-in 8-bit A/D converter. The converter samples the signal at a rate of 10 kHz, and the sampled data are converted into inverted μ-law PCM codes. Subsequently, this output is serially transmitted to a dedicated serial input port of the feature extraction processor at a 2 MHz clock rate.

Figure 5.40: Block diagram of MC4760
(Signal path shown: from analog switch; equalizer amplifier; programmable attenuator; anti-aliasing filter; A/D converter; serial port; digitized samples to μPD7761.)

5.2.2 Feature Extraction and Pattern Matching Processor (μPD7761)

The μPD7761 is an NMOS device optimized for single instruction cycle arithmetic operation. It runs at a clock rate of 8 MHz, and operates in either of two modes (analysis or pattern matching) as selected by the control processor (μPD7762). A block diagram of the functional operation of the μPD7761 is shown in Fig. 5.41.

Figure 5.41: Block diagram of the functional operation of μPD7761
(Blocks shown: serial port receiving digitized samples from the MC4760; 8-biquad digital filters; full-wave rectifier; dynamic programming matching; 8-bit parallel ports to the μPD7762 and to the pattern memory.)

In the analysis mode, the μPD7761 accepts digitized data samples from the MC4760 via a dedicated built-in serial port. Data transfer timing is controlled by an input clock at 2 MHz, which is the rate at which data is fed from the MC4760. These samples are analyzed by an 8-channel biquad filter bank firmwared into the on-chip ROM.
This filter bank spans the frequency spectrum from 100 Hz to 5.0 kHz. The output of each bandpass filter is full-wave rectified. The rectified outputs are sampled once per 12 msec frame, and sent to the control processor via an 8-bit parallel port. This process is repeated for successive frames until the entire windowed segment of the signal is analyzed.

In the pattern matching mode, the μPD7761 compares the features of the unknown signal with the pre-stored signal templates using the DTW approach. The algorithm is firmwared onto the chip to perform the computationally intensive distance calculations. Each comparison with a pre-stored template takes an average of 5 ms. Upon completion, the recognition result is transferred to the control processor and subsequent templates are compared, until all templates have been checked.

5.2.3 The Control Processor (μPD7762)

The control processor provides the only communication link between the control & timing processor and the spectral recognizer. In addition, it performs two important functions. First, it serves as a system controller for the MC4760 and μPD7761 by providing the necessary control signals to synchronize all operations. Such control signals include the communication protocols with the control & timing processor, the memory selection, read and write signals, the reset signal for the MC4760 and μPD7761, and the specific command codes that initiate the feature extraction and pattern matching operations of the μPD7761. Secondly, it functions as a spectral feature compressor, by retaining only one of a set of vectors whose values are close to each other [55]. Pattern compression is important because it allows a significant amount of reference memory to be saved, and it speeds up the calculations involved in pattern matching.
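
The text does not specify how the μPD7762 decides that feature vectors are "close to each other". The sketch below is therefore only one plausible reading, offered as an illustration: a frame is kept only when its city-block distance to the most recently kept frame exceeds a threshold. The distance measure, the threshold, and the function name are all assumptions introduced here, not details of the chip.

```python
from typing import List, Sequence


def compress_frames(frames: Sequence[Sequence[int]], threshold: int) -> List[List[int]]:
    """Keep one representative of each run of similar frames.

    Assumed rule (not from the chip documentation): a frame is dropped when
    its city-block distance to the last kept frame is at most `threshold`.
    """
    kept: List[List[int]] = []
    for frame in frames:
        if kept:
            distance = sum(abs(a - b) for a, b in zip(frame, kept[-1]))
            if distance <= threshold:
                continue  # close to the previous representative, so skip it
        kept.append(list(frame))
    return kept


# Four hypothetical 8-channel frames: two near-duplicates of a quiet frame,
# followed by two near-duplicates of a loud frame.
example_frames = [
    [10, 10, 10, 10, 10, 10, 10, 10],
    [10, 11, 10, 10, 10, 10, 10, 10],
    [40, 40, 40, 40, 40, 40, 40, 40],
    [40, 40, 41, 40, 40, 40, 40, 40],
]
compressed = compress_frames(example_frames, threshold=4)
print(len(example_frames), "frames reduced to", len(compressed))
```

Whatever the exact rule, the benefit is the one the text names: fewer stored vectors per template, and fewer frame-to-frame distance evaluations during DTW matching.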
When a specific operation code is sent from the control & timing processor to the spectral recognizer, decoding is performed by the μPD7762, which provides the necessary control signals for execution. The μPD7762 also reports the result(s) obtained from the execution of the code to the control & timing processor. For example, if a training command code is received by the μPD7762, the following series of events occurs:

• the μPD7762 decodes the command;
• it activates the μPD7761 to extract the spectral contents from the digitized input signal samples fed from the MC4760;
• the spectral information is sent to the μPD7762 for feature compression;
• the compressed spectral features are stored in the external pattern memory; and
• a successful training status flag is sent to the control & timing processor when all training procedures are completed. Otherwise, an error status is reported to the control & timing processor.

5.2.4 Pattern Memory

The chip set supports at most 64 kbyte of pattern memory, which stores 512 signal templates. This pattern memory is divided into four banks, each of which consists of 16 kbyte of memory and can be randomly selected by the spectral recognizer in the training and recognition stages. In our prototype we used 32 kbyte of static RAM.

5.3 Software Program

The software program co-ordinates the functional operations of the timing analyzer and the spectral recognizer. Basically, it consists of different program modules which are responsible for various operational stages of the system. Such stages include the initialization of the system (the timing analyzer and the spectral recognizer), the signal timing analysis, and the signal training and recognition.
The program module for the timing analysis is a direct implementation of the algorithm developed in Chapter 4, and the program module for the signal training and recognition was developed by using the specific set of commands provided by the chip-set manufacturer.

We start the detailed description of the software with a summary of the most important commands of the spectral recognizer control language. Then we present the three major modules of the program. These modules correspond to the three modes of operation of the system: initialization, training, and recognition.

5.3.1 The Command Set of the Spectral Recognizer

Twelve commands are provided to operate the spectral recognizer. These commands are sent by the control & timing processor to initiate specific operations. Each command consists of a command code (8 bits), the required parameter(s), and a termination code marking the end of each command character string. Upon completion of the execution of the command, the status of the operation is reported to the control & timing processor by the μPD7762. A detailed description of the format of each command is given in Appendix B.

One of the special features of the spectral recognizer is the use of syntax numbers to group the reference signal templates. Such syntax numbers can be specified in the training and recognition stages. A valid syntax number can range from 0 to 127 [55]. If no syntax number is specified, the default value of zero is assumed. When the spectral recognizer learns the spectral features of a warning sound, this reference template will be assigned to the group of templates which have the same syntax number. Similarly, in the recognition stage, one or more syntax number(s) will be assigned to the unknown signal.
To minimize useless comparisons, the spectral recognizer will use only the reference templates which have the same syntax number(s) as the unknown signal being examined.

In this work the syntax number is derived from the timing features of warning signals. The repetition period of the burst-type signal is obtained from the timing analyzer. Then, the syntax number of this warning signal is evaluated by dividing its repetition period by eight, in order to ensure that the computed syntax number is bounded within the allowable range. However, steady sounds have no repetition period. Therefore, the syntax number of 110 is assigned arbitrarily to this group of signal templates. Furthermore, since telephone rings have by far the longest repetition period of all warning signals considered, any sound with a repetition period of about six seconds will be given the syntax number of the telephone group (101).

5.3.2 Initialization Stage

In the initialization stage the parallel port (INTEL 8255) is reset and configured to mode 0 operation (i.e. port A = bidirectional port, port B is set to output port for this implementation, four pins of port C are for handshaking signals and two other pins are for output control signals). Then, the three processors of the spectral recognizer are also reset, and the pattern memory is tested. If any I/O hardware interfacing problem occurs during the memory testing process, a failure status from the μPD7762 will be reported to the control & timing processor. Subsequently, the 1 kHz tone is fed to the MC4760 for signal level adjustment. After level adjustment, the experimentally determined distance threshold is set to constrain the distance calculations between an unknown signal and the reference patterns. Then the user is prompted for any prestored template(s) to be transferred from permanent storage to the active pattern memory.
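
The syntax-number rule of Section 5.3.1 can be sketched as below. The section does not state the unit in which the repetition period is expressed before it is divided by eight; the sketch assumes counts of the 12.8 msec amplitude-averaging interval listed in Table 5.4, which keeps the result inside 0 to 127 for the periods discussed. The 10 % tolerance used to decide that a period is "about six seconds" is likewise an assumption, as are all names.

```python
from typing import Optional

STEADY_SYNTAX = 110       # assigned arbitrarily to steady sounds (from the text)
PHONE_SYNTAX = 101        # telephone group (from the text)
MAX_SYNTAX = 127          # upper end of the valid range (from the text)
TICK_MS = 12.8            # ASSUMED unit of the repetition period
PHONE_PERIOD_MS = 6000.0  # "about six seconds"
PHONE_TOLERANCE = 0.10    # ASSUMED meaning of "about"


def syntax_number(repetition_period_ms: Optional[float]) -> int:
    """Map a measured repetition period to a syntax number (illustrative only)."""
    if repetition_period_ms is None:  # steady sounds have no repetition period
        return STEADY_SYNTAX
    if abs(repetition_period_ms - PHONE_PERIOD_MS) <= PHONE_TOLERANCE * PHONE_PERIOD_MS:
        return PHONE_SYNTAX
    ticks = round(repetition_period_ms / TICK_MS)
    number = ticks // 8               # "dividing its repetition period by eight"
    if not 0 <= number <= MAX_SYNTAX:
        raise ValueError("repetition period maps outside the valid syntax range")
    return number


for period in (None, 140.0, 3200.0, 6000.0):
    print(period, "->", syntax_number(period))
```

With these assumed units, burst-type periods up to a few seconds map to small syntax numbers, well clear of the reserved values 101 and 110.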
Table 5.4 shows the parameters used in the timing analysis and their initial values.

Table 5.4: Parameters used for the Timing Analyzer

Timing Analysis Parameter                             Designated Value
Minimum burst duration                                102.4 msec
Maximum burst duration                                4000.0 msec
Minimum detectable transition level                   2
Starting DAT                                          255
Duration to average the absolute signal amplitudes    12.8 msec

5.3.3 Training Stage

In the training stage we employ the "training-by-recognition" strategy to learn the characteristics of warning sounds. In brief, this strategy comprises three steps: 1) learning the timing features of warning sounds, 2) extracting their spectral features, and 3) verifying the learned spectral features.

First, the timing information of warning sounds is provided by the timing feature extraction program (cf. Section 4.4). With this information, warning sounds are classified into two groups: steady and burst-type sounds.

Following the timing analysis, the spectral recognizer learns the spectral patterns of these sounds. For steady sounds, the spectral recognizer immediately learns the spectral features and subsequently stores them in the pattern memory under syntax number 110. For burst-type sounds, the spectral extraction process must be synchronized with the rising transition of the burst. As shown in Fig. 5.42, if the spectral recognizer idling time is known, this synchronization can be accomplished by activating the spectral recognizer prior to the expected beginning of the burst. With the learned timing information (i.e., repetition period and average signal burst width) of a burst-type warning sound, the idling time is obtained by subtracting the average signal burst width from the repetition period. Subsequently, the spectral patterns are stored in the pattern memory under the syntax number derived from the detected repetition period.

To verify the learned spectral patterns of warning sounds, the process described above is repeated.
If the results of the two sets of recognition procedures are identical, the training procedure is completed. Otherwise, the training procedure repeats until the sound is "learned". If the spectral recognizer cannot successfully learn the spectral features of the signal, the user can interrupt the spectral recognizer and restart the training procedure. Fig. 5.43 and Fig. 5.44 show the flowcharts of the training procedures for steady and burst-type warning sounds, respectively.

Figure 5.42: Timing relationships associated with the synchronization of the spectral recognizer to burst-type warning signals, where STAAA is the short-time average absolute amplitude of the signal; RP is the repetition period; ASBW is the average signal burst width; and SR is the spectral recognizer

Specific information relevant to each warning signal is stored for identification. This information includes the syntax number, the pattern registration number which is automatically generated for each warning sound, the signal type (steady or burst-type), and an identifier (name) of the warning sound assigned by the user during training.

5.3.4 Recognition Stage

Signal recognition consists of two stages: 1) warning signal detection by the timing analyzer, and 2) signal recognition by the spectral recognizer. The system continuously monitors the variations of the short-time average absolute amplitude of sound in the environment. If a steady sound is detected, the spectral recognizer identifies the sound twice. If the two recognition results identify the presence of a known warning sound, the unknown sound is declared to be that warning sound. If a potential burst-type sound is detected, its repetition period, burst width, and syntax number are derived.
Based on these measurements, the spectral recognizer attempts to recognize the warning sound at the rising transition of the signal burst. If any spectral reference template can be matched to the unknown signal, the warning signal is identified with the known warning sound associated with that template. A flowchart of this recognition scheme is given in Fig. 5.45.

Upon completion of the recognition process, a summary of the signal timing analysis and recognition results is displayed on the screen. These results include the syntax number, the signal type, the sound identifier, and the distance score from the matching calculations. A system operating manual has been written for users (Appendix C).

Figure 5.43: Flowchart of the training scheme for steady sounds
(Steps shown: obtain sound samples; steady sound identification; assign syntax # 110; spectral analysis; get sound samples; training completed.)

Figure 5.44: Flowchart of training procedures for burst-type warning sounds
(Steps shown: start; obtain sound samples; burst-type sound identification; compute RP and AWSB; compute SR idling time (DELAY), setting DELAY to RP - AWSB; count DELAY down by one per step, with a yes/no test; assign syntax # RP/8; spectral analysis; get sound samples; training completed. RP: repetition period; AWSB: average width of signal burst; SR: spectral recognizer.)

Figure 5.45: Flowchart of the recognition procedure

Chapter 6

Evaluation

Experiments were conducted to evaluate the performance of the WARNSIS under different noisy situations. The performance criteria were the average recognition rate and the false-alarm rate. Three noise backgrounds were used: 1) steady fan noise, 2) fan noise plus FM-radio broadcasts, and 3) fan noise plus AM-radio broadcasts.

In view of the variations of spectra with loudness and noise contamination (cf. Section 3.2.5), three templates were prepared for the spectral recognizer at different SNRs (i.e. 10 dB, 20 dB, and 30 dB) with the steady fan as a noise source.

Peterson demonstrated that in order to hear sounds reliably in the presence of noise, their spectral components have to be 15 dB to 25 dB above the background SPL [17,18]. Furthermore, current standards demand that the audible warning devices used in private residences produce a minimum of 10 dBA SPL above the average ambient level [11]. Therefore, we adopted the stricter criterion, which was to maintain the average SPL of the noisy background at a minimum of 10 dBC below the SPL of the warning sounds.

Throughout the experiments, a value of 62 dBC SPL was measured for steady noise. When radio broadcast was introduced into the steady noise background, the variations in the SPL of the environment were monitored for five minutes in order to provide the average SPL estimate of the noisy background. This estimate was obtained by averaging the SPL variations within the observed time interval. More specifically, this value was maintained at approximately 65 dBC. Note that the 3 dBC SPL increase was caused by acoustically adding two signals of equal strength (i.e. steady noise and radio-broadcast signal). Then, we activated an auditory warning device, and adjusted the loudness of the emitted sound so that the SPL reading was on average 10 dBC above the noisy background.

The set-up for these experiments was similar to the one used for the measurement of the average short-time absolute amplitude of warning sounds in Chapter 3. Siren sounds were emitted from a siren horn; the pre-recorded telephone rings and smoke alarm sounds were produced by a tape recorder; and the radio broadcasts originated from a radio-cassette player.
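
The 3 dB increase noted above follows from adding the powers of two uncorrelated sources of equal level. The short sketch below works the arithmetic for the 62 dBC figure quoted in the text; it assumes the fan noise and the radio broadcast are incoherent, and the function name is illustrative.

```python
import math
from typing import Iterable


def combine_spl(levels_db: Iterable[float]) -> float:
    """Combine sound pressure levels of incoherent sources by summing their powers."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


steady_noise_db = 62.0     # measured steady fan noise (from the text)
radio_db = 62.0            # radio broadcast of equal strength (from the text)

background_db = combine_spl([steady_noise_db, radio_db])
warning_target_db = background_db + 10.0   # warning sound set 10 dB above background

print(f"background: {background_db:.1f} dB, warning target: {warning_target_db:.1f} dB")
```

Two equal sources raise the level by 10 log10(2), about 3.01 dB, which matches the roughly 65 dBC background reported, and puts the warning sounds at roughly 75 dBC.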
To explore the contribution of the timing and spectral recognizer parts to the performance of the WARNSIS, we also evaluated the recognition rate and the false-alarm rate using these subsystems separately. Specifically, for the timing analyzer part alone, the repetition period was our prime feature for warning sound recognition. Since steady sounds have no repetition period, their recognition accuracy rate cannot be found under these circumstances. In the training stage, the timing analyzer learned the repetition periods from the warning sounds. To recognize a warning sound, the repetition period of an unknown sound was extracted and compared to the values of the pre-stored repetition periods. If the absolute difference was less than 10 % of the pre-stored repetition period used in the comparison, the unknown sound was assigned to the corresponding reference warning sound.

For the spectral recognizer part alone, the environmental sounds were continuously monitored. Under the steady noise background, the spectral recognizer learned the signal templates using the 'training-by-recognition' scheme. For signal recognition, only spectral features were used, without any timing information.

6.1 Average Recognition Accuracies

For each warning sound the recognition rate was derived by dividing the number of times the correct sound was identified by the total number of times the sound was present. The average accuracy for each of the three types of warning sounds is the average of the recognition rates calculated for all sounds belonging to the type. The detailed calculations may be found in Appendix D.

Table 6.5 shows the summary of recognition results for the complete WARNSIS, the timing analyzer part alone, and the spectral recognizer part alone.
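
The 10 % repetition-period test used for the timing analyzer alone can be sketched as follows. The acceptance rule is the one stated above; choosing the closest reference when more than one passes the test is an assumption added here, and the reference names and periods are made-up values for illustration, not measurements from the thesis.

```python
from typing import Dict, Optional


def match_repetition_period(measured_ms: float,
                            references_ms: Dict[str, float],
                            tolerance: float = 0.10) -> Optional[str]:
    """Return the reference whose stored period lies within `tolerance` of the
    measured period, or None when no reference qualifies."""
    best_name: Optional[str] = None
    best_diff: Optional[float] = None
    for name, stored_ms in references_ms.items():
        diff = abs(measured_ms - stored_ms)
        # Rule from the text: difference below 10 % of the pre-stored period.
        if diff < tolerance * stored_ms:
            # ASSUMED tie-break: prefer the closest stored period.
            if best_diff is None or diff < best_diff:
                best_name, best_diff = name, diff
    return best_name


# Hypothetical reference periods (illustrative only).
stored = {"alarm A": 500.0, "siren B": 3200.0}

print(match_repetition_period(520.0, stored))    # within 10 % of 500 msec
print(match_repetition_period(1000.0, stored))   # matches nothing
```

Because only the period is compared, any noise burst sequence whose spacing happens to fall inside a tolerance band is accepted, which is consistent with the high false-alarm rates later reported for the timing analyzer alone.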
The first column gives the three types of noisy backgrounds in which the experiments were conducted; the second column shows the types of warning sounds used: 1) 'burst', denoting burst-type sounds, 2) 'steady', denoting steady sounds, and 3) 'phone', denoting telephone rings; the third, fourth, and fifth columns give the average recognition accuracies (ARA) achieved by the complete WARNSIS, the timing analyzer part alone, and the spectral recognizer part alone, respectively. The recognition results for the spectral recognizer part alone in a steady noise background were reported in [71].

Table 6.5: A summary of recognition results with MBD set to 0.1024 sec

Background      Type of          Complete     Timing Analyzer    Spectral Recognizer
Noise           Warning Sound    WARNSIS      Alone              Alone
                                 ARA (%)      ARA (%)            ARA (%)
Steady noise    Burst            100.0        100.0              100.0
                Steady           100.0        N/A                97.6
                Phone            100.0        100.0              95.8
FM +            Burst            98.0         97.7               65.6
steady noise    Steady           100.0        N/A                91.1
                Phone            0.0          0.0                70.0
AM +            Burst            99.3         98.3               67.2
steady noise    Steady           100.0        N/A                91.1
                Phone            0.0          0.0                69.2

ARA: Average Recognition Accuracy in %
Minimum Burst Duration (MBD): 0.1024 sec
N/A: Not Applicable

In the steady noise background, the complete WARNSIS produced 100 % average recognition accuracy for all three types of warning sounds. The timing analyzer part alone yielded perfect recognition scores for burst-type sounds and phone rings, and the spectral recognizer part alone gave more than 95 % average recognition accuracy in all cases. As mentioned previously, the timing analyzer can detect the presence of steady sounds, but cannot distinguish any particular steady sound. Therefore, no average recognition accuracy can be given for steady sounds in the column for the timing analyzer part alone.

With the addition of FM broadcast to the steady noise, the complete WARNSIS could still reliably identify burst-type and steady sounds. As shown in Table 6.5, the recognition accuracies were measured as 98.0 % for burst-type sounds, and 100 % for steady sounds. However, the complete WARNSIS failed to recognize the telephone rings. Under the same noisy conditions the timing analyzer could recognize burst-type sounds with a 97.7 % average recognition accuracy, but failed to detect the presence of telephone rings. For the spectral recognizer part alone the average accuracy dropped from 100 % to 65.6 % for burst-type sounds, and from 95.8 % to 70 % for phone rings. However, this subsystem could still achieve a 91.1 % average recognition accuracy for steady sounds.

These results indicate that the complete WARNSIS consistently obtains higher recognition accuracy rates for burst-type and steady sounds than either of its subsystems separately. On close examination, with the background of FM broadcast plus steady fan noise, the recognition accuracy of the complete WARNSIS for burst-type sounds is 0.3 percentage points better than that of the timing analyzer part. In the same situation, the complete WARNSIS outperforms the spectral recognizer by 32.4 percentage points in identifying burst-type sounds, and by 8.9 percentage points in correctly recognizing different steady sounds. Similar results were also obtained when AM-radio broadcast plus steady noise was used as background.
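
The margins between the complete system and its subsystems can be recomputed directly from the FM rows of Table 6.5. The sketch below only restates table values and subtracts them; the dictionary layout and names are illustrative.

```python
from typing import Dict, Optional, Tuple

# (complete WARNSIS, timing analyzer alone, spectral recognizer alone), in %.
# Values copied from the "FM + steady noise" rows of Table 6.5; None means N/A.
TABLE_6_5_FM: Dict[str, Tuple[float, Optional[float], float]] = {
    "burst": (98.0, 97.7, 65.6),
    "steady": (100.0, None, 91.1),
}


def margin(complete: float, subsystem: Optional[float]) -> Optional[float]:
    """Advantage of the complete system over a subsystem, in percentage points."""
    if subsystem is None:
        return None
    return round(complete - subsystem, 1)


margins = {
    sound: (margin(full, timing), margin(full, spectral))
    for sound, (full, timing, spectral) in TABLE_6_5_FM.items()
}
for sound, (vs_timing, vs_spectral) in margins.items():
    print(f"{sound}: vs timing analyzer = {vs_timing}, vs spectral recognizer = {vs_spectral}")
```

These are differences in percentage points rather than relative improvements; the table values give 0.3 and 32.4 points for burst-type sounds and 8.9 points for steady sounds.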
With the background of radio broadcast, both the complete WARNSIS and the timing analyzer failed to detect the presence of phone rings. Analysis showed that this is due to the value of the minimum burst duration (MBD) selected. Setting the MBD to 1.024 sec greatly improves phone ring recognition. Table 6.6 gives the recognition results with this MBD value.

Table 6.6: A summary of recognition results with MBD set to 1.024 sec

Background      Type of          Complete     Timing Analyzer    Spectral Recognizer
Noise           Warning Sound    WARNSIS      Alone              Alone
                                 ARA (%)      ARA (%)            ARA (%)
Steady noise    Burst            0            0                  100.0
                Steady           100.0        N/A                97.6
                Phone            100.0        100.0              95.8
FM +            Burst            0            0                  65.6
steady noise    Steady           100.0        N/A                91.1
                Phone            92.5         100.0              70.0
AM +            Burst            0            0                  67.2
steady noise    Steady           100.0        N/A                91.1
                Phone            94.2         100.0              69.2

ARA: Average Recognition Accuracy in %
MBD: 1.024 sec
N/A: Not Applicable

Over 92 % recognition accuracy for phone rings is achieved by the complete WARNSIS, and the timing analyzer can always correctly identify the presence of phone rings in radio-broadcast backgrounds. According to the timing analysis algorithm, the modification of the minimum burst duration has no effect on the performance of the complete WARNSIS in steady sound recognition, or on that of the spectral recognizer alone in all noise situations. Therefore, we reproduced those average recognition accuracies from Table 6.5 in Table 6.6. The effect of different MBDs on the performance of the WARNSIS is discussed in detail in Section 6.3.1.

6.2 False-alarm Rates

Since the occurrence of warning sounds in real-life environments is quite infrequent, it is essential for the WARNSIS not only to achieve an acceptable recognition accuracy for various sounds, but also to operate with a low false-alarm rate.
With the same experimental set-up as used before, we recorded the number of false alarms over a long period of time. The false-alarm rates for the complete WARNSIS, the timing analyzer part alone, and the spectral recognizer part alone were determined.

Table 6.7 shows that in steady noise situations the WARNSIS produces no false alarms. With a radio-broadcast background the false-alarm rate may be as high as 2.33 per hour. Interestingly, phone ring false alarms are never produced.

For the timing analyzer alone the 'worst' false-alarm rate is 144.59 mis-recognitions per hour, of which 113 belong to burst-type, 31 to steady, and 0.59 to phone ring sounds. In the two radio-broadcast backgrounds, over 99 % of mis-recognitions are classified as burst-type and steady sounds.

For the spectral recognizer alone, the 'worst' false-alarm rate is 1848 mis-recognitions per hour, of which 21 belong to burst-type, 200 to steady, and 1627 to phone ring sounds. In two noisy conditions, over 80 % of mis-recognitions are classified as phone rings.

With the MBD set to 1.024 sec, the WARNSIS gave no false phone indications no matter what the noise conditions were (Table 6.8). Since the different MBDs have no effect on the performance of the spectral recognizer, the false-alarm rates for the spectral recognizer in Table 6.7 are reproduced in Table 6.8.

Although it is very difficult to quantify, experience has shown that the false-alarm rate is highly dependent on the type of music played.

Table 6.7: Results of the false-alarm test with MBD set to 0.1024 sec

Background      Mis-recognized   Complete     Timing Analyzer    Spectral Recognizer
Noise           As               WARNSIS      Alone              Alone
                                 FAR          FAR                FAR
                                 (#/hour)     (#/hour)           (#/hour)
Steady noise    Burst            0            0                  0
                Steady           0            0                  0
                Phone            0            0                  0
                Total            0            0                  0
FM +            Burst            1.33         49                 21
steady noise    Steady           1.0          35                 200
                Phone            0            0.76               1627
                Total            2.33         84.76              1848
AM +            Burst            0.5          113                153
steady noise    Steady           0.5          31                 296
                Phone            0            0.59               1270
                Total            1.0          144.59             1719

FAR: False-alarm Rate
MBD: 0.1024 sec

Table 6.8: Results of the false-alarm test with MBD set to 1.024 sec

Background      Mis-recognized   Complete     Timing Analyzer    Spectral Recognizer
Noise           As               WARNSIS      Alone              Alone
                                 FAR          FAR                FAR
                                 (#/hour)     (#/hour)           (#/hour)
Steady noise    Burst            0            0                  0
                Steady           0            0                  0
                Phone            0            0                  0
                Total            0            0                  0
FM +            Burst            0            2.67               21
steady noise    Steady           1.0          36                 200
                Phone            0            4                  1627
                Total            1.0          42.67              1848
AM +            Burst            0            4.67               153
steady noise    Steady           0.5          26                 296
                Phone            0            9.33               1270
                Total            0.5          40.0               1719

FAR: False-alarm Rate
MBD: 1.024 sec

6.3 Discussion

6.3.1 Average Recognition Accuracies

Table 6.5 shows that the combined use of timing and spectral characteristics of warning sounds gives a better recognition scheme for burst-type and steady sounds than any scheme using only one of them. In particular, for these two types of warning sounds in radio broadcast backgrounds, the average recognition accuracy of the complete WARNSIS is at least 0.3 percentage points better than that of the timing analyzer alone, and at least 8 percentage points better than that of the spectral recognizer part alone.

The explanation for the failure of the complete WARNSIS and the timing analyzer to recognize phone rings is as follows. Fig. 6.46 gives an example of a phone ring sequence with added nonstationary background noise. The phone ring sequence comprises two 2-second bursts (B1 and B3) separated by 4 seconds of silence. After the first phone ring, the burst, B1, is detected by the timing analyzer, and the time markers for both rising and falling transitions are located.
Without storing the detected burst waveform, the timing analyzer continues to monitor the environmental sounds. During the silence interval, a burst B2, which may be caused by radio music or conversation, is also detected by the timing analyzer. Unfortunately, the two criteria for a successful detection of a potential repetitive burst sequence are satisfied (i.e. W2 > MBD, and the burst interarrival time > MIAT). Therefore, the repetition period for these bursts is calculated and compared to the prestored template values. If this value matches any one of the prestored values, the sequence is mis-recognized as one of the warning sounds. Otherwise, the timing analyzer considers the burst sequence to be caused by random noise, clears the time markers, and restarts its search for another potential burst sequence. Similarly, the timing analyzer decides on either mis-recognition or random-noise rejection for the following phone bursts (i.e. B3 in Fig. 6.46). As a result, the timing analyzer fails to detect the presence of phone rings. If the timing analyzer cannot provide the timing information on a phone ring sequence, the WARNSIS has no timing analysis result to use, and it too cannot identify the presence of phone rings.

Figure 6.46: An example of a phone ring sequence added with nonstationary background noise (horizontal axis: time)

In Table 6.6 we find that the timing analyzer performs better than the complete WARNSIS in phone ring recognition. An explanation for this observation is as follows. For the timing analyzer part alone, the repetition period is the only feature used to detect the presence of phone rings. Since the repetition periods of the phone ring sequences used are all approximately six seconds, the timing analyzer cannot identify the sounds emitted from a specific telephone ringer.
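The detection logic described in this section (two acceptance criteria, followed by a comparison of the repetition period with prestored template values) can be illustrated with a short sketch. It is a minimal Python sketch under stated assumptions, not the assembler implementation used in the WARNSIS: the names Burst and scan, the rise-to-rise definition of the repetition period, the 5 % matching tolerance, and the MIAT value of 0.1 sec are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Burst:
    rise: float  # time marker of the rising transition (sec)
    fall: float  # time marker of the falling transition (sec)

    @property
    def width(self) -> float:
        return self.fall - self.rise


def scan(bursts: List[Burst], templates: Dict[str, float],
         mbd: float, miat: float, tol: float = 0.05) -> List[str]:
    """Return the template names matched while scanning bursts in time order."""
    matches: List[str] = []
    first: Optional[Burst] = None         # stored time markers, if any
    for b in bursts:
        if b.width <= mbd:                # criterion 1: burst width > MBD
            continue
        if first is None:                 # start of a potential sequence
            first = b
            continue
        period = b.rise - first.rise      # assumed: rise-to-rise interval
        if period <= miat:                # criterion 2: interarrival time > MIAT
            continue
        hit = next((name for name, p in templates.items()
                    if abs(period - p) <= tol * p), None)
        if hit is None:
            first = None                  # judged random noise: clear the markers
        else:
            matches.append(hit)
            first = b
    return matches


# Hypothetical template: a phone ring with a six-second repetition period.
templates = {"phone ring": 6.0}
b1, b2, b3 = Burst(0.0, 2.0), Burst(3.0, 3.5), Burst(6.0, 8.0)

clean = scan([b1, b3], templates, mbd=0.1024, miat=0.1)
noisy = scan([b1, b2, b3], templates, mbd=0.1024, miat=0.1)
```

In this sketch the clean sequence is matched, whereas the spurious burst B2 yields a 3-second period that matches no template; the markers are cleared, B3 merely starts a new search, and the ring goes undetected, which mirrors the failure described above.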
The complete WARNSIS, however, goes further: based on the timing information derived from a phone ring sequence, it examines the spectral content of the phone ring and compares it to the pre-stored spectral patterns belonging to the group of telephone rings. Thus, the complete WARNSIS not only identifies the sound as a phone ring, but also provides additional information on the specific ringer. As deduced from Tables D.25 and D.27 in Appendix D (a complete set of evaluation results), the recognition rate decreases even though the WARNSIS identifies the correct ringer, because it chooses the incorrect loudness or pitch template. This is because of the similar spectral characteristics between templates with adjacent settings (cf. Section 3.2). In a practical system, however, this would not matter as long as the "phone is ringing" event is detected.

The repetition period of burst-type sounds ranges from 140 msec to 3.2 sec. With the value of the minimum burst duration changed from 0.1024 sec to 1.024 sec, the timing analyzer is prevented from extracting timing features of those burst-type sounds with repetition periods less than 1.024 sec. However, the modification has no effect on the steady sound recognition performance of the complete WARNSIS, because steady sounds require a minimum burst duration of four seconds.

6.3.2 False-alarm Rates

The false-alarm results indicate that the combined use of timing and spectral features to characterize warning sounds provides an effective scheme to eliminate false recognitions triggered by environmental noise. For random noise there are no false alarms. In the presence of FM broadcasts, the complete WARNSIS gives a false-alarm rate of about 2.33 false recognitions per hour, which we consider to be unacceptably high for a practical recognition system operating in real-life environments.
It should be remembered, however, that the measurements presented here represent the 'worst-case' false-alarm performance of the WARNSIS. Real-life performance should be better, since SNRs are usually higher than the 10 dB used in our measurements. Evaluation of performance in use will require field testing beyond the scope of this work. The specifications for the WARNSIS are given in Appendix E.

Chapter 7

Conclusions and Recommendations

7.1 Summary & Conclusions

This work was divided into two major parts: 1) the analysis of warning sounds, and 2) the design of a prototype recognition device based on (1).

An extensive search for existing warning sound characteristics yielded only a limited amount of timing and spectral information. Therefore, we used various timing and spectral analysis techniques to study the warning sounds emitted by telephones, smoke alarms, and electronic siren drivers.

First, the short-time average absolute amplitudes of warning sounds were analyzed to provide timing features. Results show that warning sounds can be categorized into either steady or burst-type sounds.

Secondly, Welch's nonoverlapping spectral estimation method was used to analyze the short-time spectra of warning sounds. Our findings indicate that the spectra of telephone rings produced by electromechanical ringers of dial phones of the same model may vary significantly. These spectral characteristics also depend on the setting of the loudness adjustments provided. Typically, the short-time spectra of a two-second telephone ring consist of two discernible parts: the transient region and the steady-state region. Analyses were also performed on telephone rings emitted from an electronic ringer. Results indicate that by varying the pitch setting, the two tones generated from the ringer change accordingly.
For siren sounds, the short-time spectra can be divided into two groups: 1) spectra with rich harmonics, and 2) spectra with frequency clusters.

Based on the timing and spectral analysis results, a 'hybrid' prototype recognition device (WARNSIS) was developed and constructed using commercially available components. This device utilizes a combination of timing and spectral features of warning sounds as signal patterns. A 'real-time' algorithm is used to extract timing features in noisy environments. Warning sounds are classified according to the relative timing characteristics of these features. Then, the incoming signals are passed on for spectral analysis. A filter-bank approach is employed to analyse the short-time spectra of warning sounds. To categorize these spectral patterns, the timing information of warning sounds is used to group the patterns with sounds of similar timing features. This grouping technique greatly reduces the amount of computation involved in the recognition stage.

The real-time program to extract timing features was written in assembler language. The spectral recognizer was constructed with commercial electronic components. A software operating system was developed to co-ordinate the timing analyzer and the spectral recognizer. Our device consists of 79 chips, and the software program comprises 2490 lines of assembler source code.

Experiments were conducted to investigate the performance of the WARNSIS in noisy environments. For burst-type and steady sounds, the WARNSIS provides average recognition accuracies over 98 %. With regard to the false-alarm rates, the complete WARNSIS gives much lower values than its separate timing and spectral subsystems.

In this work, we designed, constructed, and evaluated a warning sound recognition system.
The evaluation results indicate that the WARNSIS operates satisfactorily in real environments, where it can be taught to learn new sounds and to recognize them. This system will reliably recognize warning sounds in random noise with no false alarms. In very loud music and conversation the recognition is still good, although more false alarms are created. Considering that our evaluation criteria have been very stringent, the performance of the system in real-life situations is expected to be satisfactory.

7.2 Recommendations for Future Directions of Research

To improve the performance of the complete WARNSIS in noisy environments with an SNR lower than 10 dB, future work should be directed towards the following:

1. The improvement of the transition or break-point detection scheme and implementation: In the present design none of the short-time average amplitudes are stored for analysis. It is feasible to store these amplitude values, and then use a fast CPU to analyze the stored signal amplitude samples. A faster CPU than the one presently employed will permit more elaborate analysis of these amplitude samples, so that the timing analyzer becomes more intelligent in rejecting unwanted transient noises. A possible extension of this work is to use the shape of the amplitude contours of burst-type sounds to provide additional signal features.

2. Exploration of the adaptive noise cancellation (ANC) technique: Since noise in this work consists of music, speech signals and transient noises, cancellation of these noises in real-life environments leads us into unexplored territory. We then need to find a suitable ANC algorithm and explore its implementation for optimum performance. For real-time operation, a compromise may exist between the SNR improvement and the complexity of the algorithm.

3.
Use of a microphone array to provide better spatial separation between the warning sound source and background noise: A microphone array can provide a much sharper directional beam, and hence a better quality warning sound, than a single directional microphone. Research in this area should involve the selection of the microphone array structure, its orientation, and a signal processing algorithm that analyzes the outputs from the microphone array to yield the desired output. A possible extension is the combined use of adaptive noise cancellation and a multi-microphone array system for sound tracking capability and enhanced noise removal. Research in this area will require a multiple digital signal processing (DSP) system to facilitate real-time operation in nonstationary noise environments.

Appendix A

Formulation of Relationship between SNR and SPL measurements

In this work SNR is defined as the ratio of the peak power of the signal to the peak power of the background noise. To calculate the SNR directly, we need to obtain both signal and noise power. From the SPL measurement of the acoustic background (noise), the noise power can be derived. We found, however, that the measurement of the SPL of the warning sound alone in any real acoustic environment is impossible, since there is always some background noise present. Here we show how the SNR is related to the noise SPL and to the SPL of the warning sound plus noise.

The following notation will be used:

$I_{ref}$ = reference sound intensity
$I_a$ = peak acoustic intensity of background noise
$I_s$ = peak warning sound intensity
$I_{a+s}$ = peak acoustic intensity of a warning sound plus background noise
$P_a$ = peak sound pressure of background noise
$P_{a+s}$ = peak sound pressure of a warning sound plus background noise
$SI_a$ = $I_a$ expressed in dB
$SI_s$ = $I_s$ expressed in dB
$SI_{a+s}$ = $I_{a+s}$ expressed in dB
$SPL_a$ = SPL measurement of background noise
$SPL_{a+s}$ = SPL measurement of a warning sound plus background noise
$P_{ref}$ = the reference sound pressure (20 µPa)

From the definition of SNR,

\[ SNR = \frac{I_s}{I_a} \tag{A.35} \]

and

\[ SNR(dB) = 10 \log_{10}\left\{ \frac{I_s}{I_a} \right\} \tag{A.36} \]

Also,

\[ SI_{a+s} = 10 \log_{10}\left\{ \frac{I_{a+s}}{I_{ref}} \right\} \tag{A.37} \]

\[ SI_a = 10 \log_{10}\left\{ \frac{I_a}{I_{ref}} \right\} \tag{A.38} \]

But $(SI_{a+s} - SI_a)$ is the difference in sound intensity level in dB, and using equations (A.37) and (A.38) gives

\[ SI_{a+s} - SI_a = 10 \log_{10}\left\{ \frac{I_{a+s}}{I_{ref}} \right\} - 10 \log_{10}\left\{ \frac{I_a}{I_{ref}} \right\} = 10 \log_{10}\left\{ \frac{I_{a+s}}{I_a} \right\} \tag{A.39} \]

Since $I_{a+s} = I_s + I_a$ (without resonance), we have

\[ SI_{a+s} - SI_a = 10 \log_{10}\left\{ \frac{I_a + I_s}{I_a} \right\} = 10 \log_{10}\left\{ 1 + \frac{I_s}{I_a} \right\} \tag{A.40} \]

To find $(SI_{a+s} - SI_a)$ by measurement, consider

\[ I_a = K P_a^2, \qquad I_{a+s} = K P_{a+s}^2 \tag{A.41} \]

where $K$ is a constant. Then we can express $(SI_{a+s} - SI_a)$ in terms of $P_a$ and $P_{a+s}$, which can be measured by a commercially available SPL meter:

\[ SI_{a+s} - SI_a = 10 \log_{10}\left\{ \frac{P_{a+s}^2}{P_a^2} \right\} = 20 \log_{10}\left\{ \frac{P_{a+s}}{P_a} \right\} \tag{A.42} \]

Rewriting equation (A.42) using $P_{ref}$ gives

\[ SI_{a+s} - SI_a = 20 \log_{10}\frac{P_{a+s}}{P_{ref}} - 20 \log_{10}\frac{P_a}{P_{ref}} = SPL_{a+s} - SPL_a = 10 \log_{10}\left( 1 + \frac{I_s}{I_a} \right) \tag{A.43} \]

Equation (A.43) indicates that the difference in sound intensity level can be expressed in terms of a measurable physical quantity: the difference between the SPL measurements taken in the absence and in the presence of a warning sound. Hence, we have

\[ SPL_{a+s} - SPL_a = 10 \log_{10}\left( 1 + \frac{I_s}{I_a} \right) = 10 \log_{10}(1 + SNR) \tag{A.44} \]

Hence

\[ SNR = \frac{I_s}{I_a} = 10^{\,2 \log_{10}(P_{a+s}/P_a)} - 1 = 10^{\,(SPL_{a+s} - SPL_a)/10} - 1 \tag{A.45} \]

or

\[ SNR(dB) = 10 \log_{10}(SNR) \tag{A.46} \]

When the difference in SPL readings is more than 10 dB, the SNR in dB is very close to the SPL difference in dB (Table A.9).
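Equations (A.44) to (A.46) can be checked numerically. The following Python sketch converts a pair of SPL readings into the SNR; the function name snr_from_spl and the example readings of 70 dB and 60 dB are assumptions introduced here for illustration and do not come from the measurements in this work.

```python
import math
from typing import Tuple


def snr_from_spl(spl_total_db: float, spl_noise_db: float) -> Tuple[float, float]:
    """Return (SNR as a power ratio, SNR in dB) from two SPL readings.

    spl_total_db: SPL of the warning sound plus background noise (dB)
    spl_noise_db: SPL of the background noise alone (dB)
    """
    diff = spl_total_db - spl_noise_db
    if diff <= 0:
        # With no positive difference the signal intensity would be zero or
        # negative, so the logarithm in (A.46) is undefined.
        raise ValueError("SPL with the warning sound must exceed the noise SPL")
    snr = 10 ** (diff / 10) - 1          # equation (A.45)
    return snr, 10 * math.log10(snr)     # equation (A.46)


# Hypothetical readings: 70 dB with the warning sound, 60 dB without it.
snr, snr_db = snr_from_spl(70.0, 60.0)
```

For a 10 dB difference in SPL readings this gives an SNR of 9, or about 9.5 dB, so the SPL difference slightly overstates the SNR; the gap shrinks as the difference grows.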
Table A.9: Tabulation of SPL reading difference and SNR

(SPLa+s - SPLa) (in dB)    SNR (in dB)    SNR
 0.5                       -9.14          0.12
 1.0                       -5.9           0.26
 1.5                       -3.8           0.41
 2.0                       -2.43          0.59
 2.5                       -1.1           0.78
 3.0                        0.0           1.0
 3.5                        0.9           1.2
 4.0                        1.8           1.5
 4.5                        2.6           1.8
 5.0                        3.3           2.2
 5.5                        4.1           2.6
 6.0                        4.7           3.0
 6.5                        5.4           3.5
 7.0                        6.0           4.0
 8.0                        7.2           5.3
 9.0                        8.4           6.9
10.0                        9.5           9.0
11.0                       10.6          11.6
12.0                       11.7          14.8
13.0                       12.8          19.0
14.0                       13.8          24.1
15.0                       14.8          30.6
16.0                       15.9          38.8
17.0                       16.9          49.1
18.0                       17.9          62.0
19.0                       18.9          78.4
20.0                       20.0          99.0
21.0                       21.0         125
22.0                       22.0         158
23.0                       23.0         199

Appendix B

Format of the command set of the SR

The format of the twelve commands used to control the operation of the SR is given in Table B.10. Correspondingly, Table B.11 shows the legal values for the memory bank, the bank reject value, the signal reject value, the syntax, and the registration.

In response to a specified command, one or more of the following status output codes are reported from the µPD7762 to the control & timing processor. The interpretations of these status output codes are given in Table B.12.

Table B.10: Format of command set of SR

No.   Command                        Length        Code format
1.    Initialize                     2 bytes       00H, 0FFH (termination code)
2.    Level_adjust                   3-6 bytes     01H, [memory bank], [memory bank], [memory bank], [memory bank], 0FFH
3.    Recognition                    2-32 bytes    003H, [syntax # (S)], [..., S ...], 0FFH
4.    Training                       3-5 bytes     002H, registration [syntax #], [signal rejected value], 0FFH
5.    Second Decision                2 bytes       004H, 0FFH
6.    Hot start                      2 bytes       005H, 0FFH
7.    Down load                      3 bytes       006H, # of patterns, 0FFH
8.    Up load                        2 bytes       007H, 0FFH
9.    Change memory reject value     3 bytes       008H, bank reject value, 0FFH
10.   Memory test                    2 bytes       009H, 0FFH
11.   Select memory bank             3 bytes       00AH, bank #, 0FFH
12.   Change signal reject value     3 bytes       00CH, registration signal reject value, 0FFH

In each code, the digits give the command code and the suffix H denotes hexadecimal.

Table B.11: Legal values for parameters of the command set

Parameter                               Legal value
1) Memory bank value (B)                0 < B < 03
2) Bank reject value (BRV)              0 < BRV < 0FEH
3) Signal reject value (SRV)            0 < SRV < 080H
4) Pattern registration value (PRV)     0 < PRV < 080H

Table B.12: Interpretation of status output codes from µPD7762

Code    Interpretation
000H    Normal completion of a command
001H    Input signal level too high
002H    Input signal level too low
003H    Input signal longer than 2.0 sec
004H    Request signal level adjustment
005H    Specified syntax # non-existing
006H    Registered pattern does not exist
007H    The distance value is greater than BRV
008H    Specified memory bank does not exist
009H    Command format error
00AH    The distance is greater than PRV, but less than BRV
00BH    Signal duration is less than 200 msec
00CH    Memory test error or hardware I/O error

Appendix C

Software Operating Manual of The WARNSIS

C.1 Program Files

This manual guides the user through the operating procedure developed for the signal recognition software. The software was designed to provide an interactive dialogue between the user and the device. Messages are displayed on the monitor throughout, both to ask the user to input the requested parameter values and to indicate the status of the device. In this manual, such messages are shown in bold-face.

The software was saved on a PC, in the sub-directory \simon\nec\. To enter this sub-directory, the user needs to type the following statements:

type : cd simon
displayed on the monitor : d:\simon
type : cd nec
displayed on the monitor : d:\simon\nec

Once the user has entered the sub-directory \simon\nec\, he/she can find the programs necessary to run this software. These programs are:
Software Operating Manual of The WARNSIS 149 • nec.asm : the source program of the system operating software in assembly language, • nec.exe : the executable file of nec.asm, • nec_dat.asm : the data file consisting of constants and variables for nec.exe and, • enec.bat : the batch file used to automatically assemble nec.asm to produce its object codes, to l ink its object code file (nec.obj) to yield the executable file (nec.exe), and to delete the redundant object file to optimize memory storage on the hard-disk. Th i s batch file is activated only when modification(s) has been made to the nec.asm. Execut ion of this batch job is accomplished by typing E N E C . A signal template file was stored at the directory of \ s i m o n \ n e c \ t e m p \ . Th i s data file is called as 50_warn.dat, and consists of 50 templates of various warning sounds. Such warnings include siren sounds emitted from an electronic siren driver, telephone rings and smoke a la rm sounds. C.2 Interactive Operations To execute the system software, the user types N E C . B y executing the nec.exe, the user enters the interactive operation mode, and is prompted to answer a number of questions. There are two stages i n this mode of operation, namely, the ini t ia t izat ion stage, and the t raining/recognit ion stage. Appendix C. Software Operating Manual of The WARNSIS 150 C . 2 . 1 Initialization Stage Once the program is executed, the following events occur. They are: 1. System Initialization in Progress 2. System Hardware Checking: i f everything is O K , these statements are dis- played on the monitor: . M E M O R Y C H E C K O K !! • M E M O R Y C H E C K O K !! Otherwise, error statements are reported, and they are : • Invalid C o m m a n d , or • M E M O R Y error or H A R D W A R E I / O error ! Under such circumstances, the user must exit the program by pressing C T R L - C , and shut off the power supply for 20 seconds, tu rn on the power supply, and re-run the program. 3. 
The user is prompted to flip a manual switch before the system begins the process of signal level adjustment.
   • Please, flip the switch to LEVEL_ADJUST,
   • If ready, Please press ENTER key.

After the ENTER key is pressed, the system starts the signal level adjustment.
   • Level adjustment in PROGRESS

Upon completion of the level adjustment, the system asks whether the user wants to transfer any pre-stored signal template(s) to the template memory of the device.
   • Do you want to download signal templates from host CPU ? (y/n)

If the answer is 'y', the user needs to provide the template file name and the total # of prestored templates.
   • Please, input the file name consisting of the templates - *.dat. (d:\simon\nec\temp\*.dat)

A file opening statement is then shown on the monitor.
   • SUCCESSFUL open data file !!
   • Please, input # of templates for downloading

After this number is entered, the data transfer begins. Upon completion of the data transfer, these statements are shown on the monitor:
   • Signal file HAS BEEN CLOSED !!
   • SUCCESSFUL data downloading !!
   • Do you want another downloading? (y/n)

If 'y' is entered, the preceding steps repeat. Otherwise, the user enters the second stage of this software.

C.2.2 Training/Recognition Stage

Once in this stage, the user has to flip the manual switch to the training/recognition position.
   • Please, flip the switch to signal TRAINING/RECOGNITION

Training Procedure

Then, the user is asked whether the system should learn a new sound.
   • Do you want to train the system to learn a new sound? (y/n)

If the answer is 'n', the user proceeds to the recognition stage.
If the answer is 'y', the user needs to provide an identification for the new sound, and then presses the ENTER key to start the training procedure. The interactive statements on the monitor are:
   • Please, specify an identification for input signal = ,
   • template # = (the value is generated automatically by the system software),
   • Please, input SIGNAL for Training.
   • If ready, Please press ENTER key.
   • Signal template training in PROGRESS

For successful training, a summary of the template information is shown:
   • SUCCESSFUL TRAINING
   • Burst signal !! (for burst signal), or Steady sound !! (for continuous, steady sound)
   • SYNTAX # =
   • Template # =
   • Signal template identification =

Subsequently, the user is asked whether the device should learn a new sound or recognize another new sound. If the training mode is selected, the aforementioned training steps repeat. If the recognition mode is selected, the user enters the recognition stage.

Recognition Procedure

The statement displayed on the monitor is
   • Do you want the system to recognize the signal ? (y/n)

If the answer is 'n', the prompt about signal template uploading is displayed on the monitor. If the user wants the device to recognize the signal, the monitor shows the following statements and the signal recognition process starts.
   • Start to recognize the input signal !
   • Signal recognition in PROGRESS

For a successful recognition, a summary of the recognition results appears on the monitor:
   • SUCCESSFUL RECOGNITION
   • The closest distance measured =
   • Burst sound, or Steady sound
   • SYNTAX # =
   • Template # =
   • Signal template identification =

The user is then prompted for another signal recognition. If the response is 'y', the preceding recognition steps repeat.
If the response is 'n', the user is asked whether to perform a signal template uploading process.
   • Do you want to save memory templates ?? (y/n)

If the response is not 'y', the user needs to select one of the following options.
   • What do you want to do next? (please, select one of the following choices)
   • r : another signal recognition
   • d : another template file downloading
   • t : another signal training
   • e : exit the program

Otherwise, for the memory uploading, the user provides a template file name to identify the stored signal templates. The data transfer is then performed transparently. The interactive statements are:
   • # of template for uploading =
   • Please, enter the file name
   • Successful open file
   • Successful uploading
   • File closed
   • Do you want another signal memory uploading? (y/n)

If the answer is 'y', the uploading steps repeat. Otherwise, the user has to select one of the previously mentioned options (r; d; t; e).

Appendix D

Evaluation Results

In this work, confusion matrices are used to present the recognition results produced by the complete WARNSIS, the timing analyzer part alone, and the spectral recognizer part alone. To simplify the notation for the confusion matrices given in the following sections, the different warning sounds are each assigned a "number", as shown in Table D.13. Each assigned number in the first horizontal row indicates the specific warning sound identified by a recognition system, and each assigned number in the first vertical column indicates the warning sound that was present in the environment. Each element of the confusion matrix gives the number of times that the sound present in the environment (row) was identified as a given warning sound (column). Based on these results, the recognition rates for each warning sound are derived.
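The derivation of a recognition rate from a confusion matrix can be sketched as follows. This is only an illustration: it assumes that the rate for a sound is the diagonal count divided by the total number of trials in that row, which agrees with the tabulated values (for example, 29 correct out of 30 trials gives 96.7 %). The function names, the dictionary layout and the example counts are hypothetical and are not taken from the thesis software.

```python
def recognition_rates(confusion):
    """Return the recognition rate (%) of each emitted sound.

    `confusion` maps each emitted sound to a dict of
    {identified sound: number of trials}. This layout is an assumption
    made for the sketch, not the data format used in the thesis.
    """
    rates = {}
    for emitted, row in confusion.items():
        trials = sum(row.values())       # total presentations of this sound
        correct = row.get(emitted, 0)    # diagonal element of the matrix
        rates[emitted] = round(100.0 * correct / trials, 1)
    return rates


def average_rate(rates):
    """Unweighted mean of the per-sound recognition rates."""
    return round(sum(rates.values()) / len(rates), 1)


# Hypothetical counts, 30 trials per sound (not taken from the tables).
example = {
    "A": {"A": 30},
    "B": {"B": 29, "C": 1},
    "C": {"B": 2, "C": 28},
}
rates = recognition_rates(example)
print(rates)                # {'A': 100.0, 'B': 96.7, 'C': 93.3}
print(average_rate(rates))  # 96.7
```

The average here is the unweighted mean of the per-sound rates, which is also how the "Average" rows of the rate tables appear to be formed.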
Unless otherwise stated, the results presented here assume that the MBD value is set to 0.1024 sec. TE(L1) represents telephone rings generated by an electromechanical ringer with the loudness level set at one. ETE(P1) represents telephone rings produced by an electronic ringer with the pitch adjustment preset at a specific position.

Table D.13: "Numbers" assigned for different warning sounds

  Type of Sound      Assigned Number
  J1 (B)             1
  J2 (B)             2
  J3 (B)             3
  J4 (B)             4
  J5 (S)             5
  J6 (B)             6
  J7 (S)             7
  J8 (B)             8
  smoke alarm (S)    9
  TE(L1) (PH)        10
  TE(L3) (PH)        11
  TE(L5) (PH)        12
  TE(L7) (PH)        13
  ETE(P1) (PH)       14
  ETE(P2) (PH)       15
  ETE(P3) (PH)       16
  ETE(P4) (PH)       17

  B: Burst-type Sound   S: Steady Sound   PH: Phone Ring

D.1 The Complete WARNSIS

D.1.1 Recognition Results with Background Steady Noise

Table D.14: Confusion matrix for recognition results generated by the complete WARNSIS in the presence of steady noise

         1    2    3    4    5    6    7    8    9
    1   30    .    .    .    .    .    .    .    .
    2    .   30    .    .    .    .    .    .    .
    3    .    .   30    .    .    .    .    .    .
    4    .    .    .   30    .    .    .    .    .
    5    .    .    .    .   30    .    .    .    .
    6    .    .    .    .    .   30    .    .    .
    7    .    .    .    .    .    .   30    .    .
    8    .    .    .    .    .    .    .   30    .
    9    .    .    .    .    .    .    .    .   30

Table D.15: Recognition rates of burst-type sounds under steady noise condition

  Burst-type Sound   Assigned Value   Recognition Rate (%)
  J1                 1                100
  J2                 2                100
  J3                 3                100
  J4                 4                100
  J6                 6                100
  J8                 8                100
  Average            -                100

Table D.16: Recognition rates of steady sounds generated by the complete WARNSIS under steady noise condition

  Steady Sound       Assigned Value   Recognition Rate (%)
  J5                 5                100
  J7                 7                100
  smoke alarm        9                100
  Average            -                100
Table D.17: Confusion matrix for phone ring recognition generated by the complete WARNSIS under steady noise condition

        10   11   12   13   14   15   16   17
   10   30    .    .    .    .    .    .    .
   11    .   30    .    .    .    .    .    .
   12    .    .   30    .    .    .    .    .
   13    .    .    .   30    .    .    .    .
   14    .    .    .    .   30    .    .    .
   15    .    .    .    .    .   30    .    .
   16    .    .    .    .    .    .   30    .
   17    .    .    .    .    .    .    .   30

Table D.18: Recognition rates of phone rings generated by the complete WARNSIS under steady noise condition

  Phone Ring   Assigned Value   Recognition Rate (%)
  TE(L1)       10               100
  TE(L3)       11               100
  TE(L5)       12               100
  TE(L7)       13               100
  ETE(P1)      14               100
  ETE(P2)      15               100
  ETE(P3)      16               100
  ETE(P4)      17               100
  Average      -                100

D.1.2 Recognition Results with Background of FM Broadcast plus Steady Noise

Table D.19: Confusion matrix for recognition results generated by the complete WARNSIS in the presence of FM broadcast plus steady noise

  Columns (identified as): 1 2 4 5 6 7 8 9

  Row (sound present)   Nonzero entries, in column order
  1                     30
  2                     30
  4                     29  1
  5                     30
  6                     2  28
  7                     30
  8                     30
  9                     30

Table D.20: Recognition rates of burst-type sounds produced by the complete WARNSIS under FM broadcast plus steady noise condition

  Burst-type Sound   Assigned Value   Recognition Rate (%)
  J1                 1                100
  J2                 2                100
  J4                 4                96.7
  J6                 6                93.3
  J8                 8                100
  Average            -                98.0

Table D.21: Recognition rates of steady sounds generated by the complete WARNSIS under FM broadcast plus steady noise condition

  Steady Sound       Assigned Value   Recognition Rate (%)
  J5                 5                100
  J7                 7                100
  smoke alarm        9                100
  Average            -                100

D.1.3 Recognition Results with Background of AM Broadcast plus Steady Noise
Table D.22: Confusion matrix for recognition results generated by the complete WARNSIS in AM broadcast plus steady noise background

  Columns (identified as): 1 2 4 5 6 7 8 9

  Row (sound present)   Nonzero entries, in column order
  1                     30
  2                     30
  4                     30
  5                     30
  6                     29  1
  7                     30
  8                     30
  9                     30

Table D.23: Recognition rates of burst-type sounds generated by the complete WARNSIS in AM broadcast plus steady noise environment

  Burst-type Sound   Assigned Value   Recognition Rate (%)
  J1                 1                100
  J2                 2                100
  J4                 4                96.7
  J6                 6                100
  J8                 8                100
  Average            -                99.3

Table D.24: Recognition rates of steady sounds generated by the complete WARNSIS in AM broadcast plus steady noise background

  Steady Sound       Assigned Value   Recognition Rate (%)
  J5                 5                100
  J7                 7                100
  smoke alarm        9                100
  Average            -                100

D.1.4 Results of phone ring recognition with minimum burst duration (MBD) set to 1.024 sec

Table D.25: Confusion matrix for phone ring recognition generated by the complete WARNSIS under the condition of FM broadcast and the steady noise with MBD set to 1.024 sec

  Columns (identified as): 10 11 12 13 14 15 16 17

  Row (sound present)   Nonzero entries, in column order
  10                    28  2
  11                    29  1
  12                    26  4
  13                    1  29
  14                    26  4
  15                    3  27
  16                    1  29
  17                    2  28

Table D.26: Results of recognition rates of phone rings generated by the complete WARNSIS in FM broadcast plus the steady noise background

  Phone Ring   Assigned Value   Recognition Rate (%)
  TE(L1)       10               93.3
  TE(L3)       11               96.7
  TE(L5)       12               86.7
  TE(L7)       13               96.7
  ETE(P1)      14               86.7
  ETE(P2)      15               90.0
  ETE(P3)      16               96.7
  ETE(P4)      17               93.3
  Average      -                92.5

Table D.27: Confusion matrix for the results of phone ring recognition generated by the complete WARNSIS in the presence of AM broadcast plus the steady noise with MBD set to 1.024 sec

  Columns (identified as): 10 11 12 13 14 15 16 17

  Row (sound present)   Nonzero entries, in column order
  10                    29  1
  11                    2  28
  12                    27  3
  13                    1  29
  14                    27  3
  15                    2  28
  16                    29  1
  17                    1  29

Table D.28: Results of phone ring recognition rates generated by the complete WARNSIS in the presence of AM broadcast plus steady noise with MBD set to 1.024 sec

  Phone Ring   Assigned Value   Recognition Rate (%)
  TE(L1)       10               96.7
  TE(L3)       11               93.3
  TE(L5)       12               90.0
  TE(L7)       13               96.7
  ETE(P1)      14               90.0
  ETE(P2)      15               93.3
  ETE(P3)      16               96.7
  ETE(P4)      17               96.7
  Average      -                94.2

D.1.5 Results of the False-alarm Tests for the complete WARNSIS

Table D.29: Results of the false-alarm tests for the complete WARNSIS with MBD set to 0.1024 sec

  (FM and AM broadcast backgrounds)

  Mis-recognized            Heavy   Pop     Soft    Speech   Soft    Soft
  as                        Rock    Music   Music            Music   Rock
  J1                        1       0       1       1        0       0
  J2                        0       0       1       0        0       0
  J3                        0       0       0       0        0       0
  J4                        0       0       0       0        0       0
  J5                        2       2       2       1        2       0
  J6                        0       0       0       0        0       0
  J7                        0       2       0       0        0       0
  J8                        2       0       1       0        1       1
  Smoke Alarm               0       0       0       0        0       0
  TE(L1)                    0       0       0       0        0       0
  TE(L3)                    0       0       0       0        0       0
  TE(L5)                    0       0       0       0        0       0
  TE(L7)                    0       0       0       0        0       0
  ETE(P1)                   0       0       0       0        0       0
  ETE(P2)                   0       0       0       0        0       0
  ETE(P3)                   0       0       0       0        0       0
  ETE(P4)                   0       0       0       0        0       0
  Total # of recognitions   5       4       5       2        3       1
  Duration (hours)          2       2       2       2        2       2

Table D.30: Results of the false-alarm tests for the complete WARNSIS with MBD set to 1.024 sec

  (FM and AM broadcast backgrounds)

  Mis-recognized            Heavy   Pop     Soft    Speech   Soft    Soft
  as                        Rock    Music   Music            Music   Rock
  J1                        0       0       0       0        0       0
  J2                        0       0       0       0        0       0
  J3                        0       0       0       0        0       0
  J4                        0       0       0       0        0       0
  J5                        1       0       0       1        1       0
  J6                        0       0       0       0        0       0
  J7                        0       0       0       0        0       0
  J8                        0       0       0       0        0       0
  Smoke Alarm               1       1       1       0        0       1
  TE(L1)                    0       0       0       0        0       0
  TE(L3)                    0       0       0       0        0       0
  TE(L5)                    0       0       0       0        0       0
  TE(L7)                    0       0       0       0        0       0
  ETE(P1)                   0       0       0       0        0       0
  ETE(P2)                   0       0       0       0        0       0
  ETE(P3)                   0       0       0       0        0       0
  ETE(P4)                   0       0       0       0        0       0
  Total # of recognitions   2       1       1       1        1       1
  Duration (hours)          2       1       1       0.5      0.5     0.5

D.2 Timing Analyzer Part Alone

D.2.1 Recognition Results with Background Steady Noise
Table D.31: Confusion matrix for warning sound recognition generated by the timing analyzer alone in the presence of steady noise

  Type of Sound   J1   J2   J3   J4   J6   J8   Phone Ring
  J1              30    .    .    .    .    .    .
  J2               .   30    .    .    .    .    .
  J3               .    .   30    .    .    .    .
  J4               .    .    .   30    .    .    .
  J6               .    .    .    .   30    .    .
  J8               .    .    .    .    .   30    .
  Phone Ring       .    .    .    .    .    .   30

Table D.32: Recognition rates of the timing analyzer part alone in the presence of steady noise

  Burst-type Sound   Assigned Value   Recognition Rate (%)
  J1                 1                100
  J2                 2                100
  J3                 3                100
  J4                 4                100
  J6                 6                100
  J8                 8                100
  Average            -                100
  Phone Ring         -                100

D.2.2 Recognition Results with Background of FM Broadcast Plus Steady Noise

Table D.33: Confusion matrix for warning sound recognition generated by the timing analyzer part alone in the presence of FM broadcast plus steady noise

  Columns (identified as): J1 J2 J3 J4 J6 J8 Phone Ring

  Type of Sound   Entries, in column order
  J1              30
  J2              30
  J3              4  26
  J4              30
  J6              30
  J8              30
  Phone Ring      12  3  5  10  0

Table D.34: Recognition rates of the timing analyzer part alone in the presence of FM broadcast plus steady noise

  Burst-type Sound   Assigned Value   Recognition Rate (%)
  J1                 1                100
  J2                 2                100
  J3                 3                86.6
  J4                 4                100
  J6                 6                100
  J8                 8                100
  Average            -                97.7
  Phone Ring         -                0

D.2.3 Recognition Results with Background of AM Broadcast Plus Steady Noise

Table D.35: Confusion matrix for warning sound recognition generated by the timing analyzer part alone in the presence of AM broadcast plus steady noise

  Columns (identified as): J1 J2 J3 J4 J6 J8 Phone Ring

  Type of Sound   Entries, in column order
  J1              30
  J2              30
  J3              3  27
  J4              30
  J6              30
  J8              30
  Phone Ring      6  5  7  12  0

Table D.36: Recognition rates of the timing analyzer part alone in the presence of AM broadcast plus steady noise

  Burst-type Sound   Assigned Value   Recognition Rate (%)
  J1                 1                100
  J2                 2                100
  J3                 3                90
  J4                 4                100
  J6                 6                100
  J8                 8                100
  Average            -                98.3
  Phone Ring         -                0
D.3 False-alarm Results for the Timing Analyzer Alone

Table D.37: False-alarm test results of the timing analyzer part alone with MBD set to 0.1024 sec

  (FM and AM broadcast backgrounds)

  mis-recognized                Pop     Rock    Classical   Speech +   Pop    Speech
  as                            Music   Music   Music       Music
  J1                            16      19      13          34         16     40
  J2                            16      13      10          18         26     25
  J3                            0       0       0           0          0      0
  J4                            0       2       1           0          2      1
  J6                            12      19      7           9          5      16
  J8                            0       1       0           0          0      0
  Steady Sound                  23      23      45          3          50     0
  Phone                         0       1       1           0          0      1
  Total # of Mis-recognitions   67      78      77          64         99     83
  Duration (minutes)            56      46      56          19         59     24

Table D.38: False-alarm test results of the timing analyzer part alone with MBD set to 1.024 sec

  (FM and AM broadcast backgrounds)

  mis-recognized                Pop     Rock    Classical   Speech +   Pop    Speech
  as                            Music   Music   Music       Music
  J1                            0       0       0           0          0      0
  J2                            0       0       0           0          0      0
  J3                            1       2       1           2          3      2
  J4                            0       0       0           0          0      0
  J6                            0       0       0           0          0      0
  J8                            0       0       0           0          0      0
  Steady Sound                  20      10      24          15         10     14
  Phone                         3       2       1           6          6      2
  Total # of Mis-recognitions   24      14      26          23         19     18
  Duration (minutes)            30      30      30          30         30     30

D.4 Spectral Recognizer Part Alone

D.4.1 Recognition Results with Background Steady Noise

Table D.39: Confusion matrix for warning sound recognition generated by the spectral recognizer part alone in the presence of steady noise

  Columns (identified as): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

  Row (sound present)   Nonzero entries, in column order
  1                     30
  2                     30
  3                     30
  4                     30
  5                     30
  6                     30
  7                     30
  8                     30
  9                     28  1  1
  10                    2  27  1
  11                    1  1  28
  12                    2  28
  13                    30
  14                    30
  15                    30
  16, 17                1  1  29  1  28

Table D.40: Results of steady sound recognition rate generated by the spectral recognizer part alone in steady noise background

  Steady Sound       Assigned Value   Recognition Rate (%)
  J5                 5                100
  J7                 7                100
  smoke alarm        9                93.3
  Average            -                97.6
Table D.41: Results of burst-type sound recognition rates produced by the spectral recognizer part alone in steady noise background

  Burst-type Sound   Assigned Value   Recognition Rate (%)
  J1                 1                100
  J2                 2                100
  J3                 3                100
  J4                 4                100
  J6                 6                100
  J8                 8                100
  Average            -                100

Table D.42: Results of phone ring recognition rate produced by the spectral recognizer part alone in steady noise background

  Phone Ring   Assigned Value   Recognition Rate (%)
  TE(L1)       10               90.0
  TE(L3)       11               93.3
  TE(L5)       12               93.3
  TE(L7)       13               100
  ETE(P1)      14               100
  ETE(P2)      15               100
  ETE(P3)      16               96.7
  ETE(P4)      17               93.3
  Average      -                95.8

D.4.2 Recognition Results with Background of FM Broadcast plus Steady Noise

Table D.43: Confusion matrix for the results of warning sound recognition generated by the spectral recognizer part alone in FM broadcast and steady noise background

  Columns (identified as): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

  Row (sound present)   Nonzero entries, in column order
  1                     29  1
  2                     22  1  7
  3                     5  6  1  2  5  6  1  1  1
  4                     30
  5                     8  22
  6                     2  1  14  10
  7                     30
  8                     30
  9                     30
  10                    1  2  1  24  1  1
  11                    1  4  4  21
  12                    1  1  1  3  22  1  1
  13                    1  3  2  1  2  20  1
  14                    2  3  3  2  19  1
  15                    1  5  22  2
  16                    2  2  1  4  21
  17                    1  1  3  5  20

Table D.44: Results of steady sound recognition rate produced by the spectral recognizer part alone in FM broadcast plus steady noise background

  Steady Sound       Assigned Value   Recognition Rate (%)
  J5                 5                73.3
  J7                 7                100
  smoke alarm        9                100.0
  Average            -                91.1
Table D.45: Results of burst-type sound recognition rates produced by the spectral recognizer part alone in FM broadcast plus steady noise background

  Burst-type Sound   Assigned Value   Recognition Rate (RR) (%)
  J1                 1                96.7
  J2                 2                73.3
  J3                 3                20.0
  J4                 4                100
  J6                 6                3.3
  J8                 8                100
  Average            -                65.6

Table D.46: Results of phone ring recognition rates produced by the spectral recognizer part alone under FM broadcast plus steady noise condition

  Phone Ring   Assigned Value   Recognition Rate (%)
  TE(L1)       10               80.0
  TE(L3)       11               70.0
  TE(L5)       12               73.3
  TE(L7)       13               66.7
  ETE(P1)      14               63.3
  ETE(P2)      15               73.3
  ETE(P3)      16               70.0
  ETE(P4)      17               63.3
  Average      -                70.0

D.4.3 Recognition Results with Background of AM Broadcast plus Steady Noise

Table D.47: Confusion matrix for warning sound recognition generated by the spectral recognizer part alone under AM broadcast plus steady noise condition

  Columns (identified as): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

  Row (sound present)   Nonzero entries, in column order
  1                     29  1
  2                     23  6  1
  3                     4  7  1  5  5  5  3
  4                     30
  5                     24  6
  6                     4  2  1  6  16  1
  7                     29  1
  8                     30
  9                     30
  10                    1  25  4
  11                    1  1  3  20  5
  12                    1  3  3  2  19  3
  13                    2  4  3  4  17
  14                    1  1  4  22
  15                    2  3  1  3  21
  16                    3  1  3  20
  17                    2  1  1  1  1  2  22

Table D.48: Results of steady sound recognition rates produced by the spectral recognizer part alone under AM broadcast plus steady noise condition

  Steady Sound       Assigned Value   Recognition Rate (%)
  J5                 5                80.0
  J7                 7                93.3
  smoke alarm        9                100.0
  Average            -                91.1
Table D.49: Results of burst-type sound recognition rate produced by the spectral recognizer part alone in the presence of AM broadcast plus steady noise

  Burst-type Sound   Assigned Value   Recognition Rate (RR) (%)
  J1                 1                96.7
  J2                 2                76.7
  J3                 3                23.3
  J4                 4                100
  J6                 6                6.7
  J8                 8                100
  Average            -                67.2

Table D.50: Results of phone ring recognition rate produced by the spectral recognizer part alone in the presence of AM broadcast plus steady noise

  Phone Ring   Assigned Value   Recognition Rate (%)
  TE(L1)       10               83.3
  TE(L3)       11               66.7
  TE(L5)       12               63.3
  TE(L7)       13               56.7
  ETE(P1)      14               73.3
  ETE(P2)      15               70.0
  ETE(P3)      16               66.7
  ETE(P4)      17               73.3
  Average      -                69.2

D.4.4 Results of false-alarm tests for the spectral recognizer part alone

Table D.51: False-alarm tests for the spectral analyzer part alone

  (FM and AM broadcast backgrounds)

  mis-recognized                Heavy   Pop     Soft    Speech   Soft    Soft
  as                            Rock    Music   Music            Music   Rock
  J1                            0       0       0       0        0       0
  J2                            0       0       0       0        0       0
  J3                            0       1       1       6        1       1
  J4                            0       0       0       0        0       0
  J5                            0       0       0       0        0       0
  J6                            0       0       1       24       0       0
  J7                            1       30      4       0        17      4
  J8                            0       0       1       0        0       0
  Smoke Alarm                   0       0       4       31       10      0
  TE(L1)                        28      10      30      2        3       7
  TE(L3)                        30      7       24      9        0       36
  TE(L5)                        9       29      9       48       79      23
  TE(L7)                        50      43      40      0        10      46
  ETE(P1)                       1       0       4       0        0       0
  ETE(P2)                       0       0       0       0        0       0
  ETE(P3)                       0       0       0       0        0       1
  ETE(P4)                       1       0       2       0        0       2
  Total # of Mis-recognitions   120     120     120     120      120     120
  Duration (minutes)            3.42    4       4.27    4.8      4.2     3.57

Appendix E

Specifications

1. Power Supply:
   • +5 V : 700 mA
   • +12 V : 64.3 mA
   • -12 V : 51.6 mA

2. Signal Features: Timing and short-time spectral patterns

3. The WARNSIS: a 'hybrid' system consisting of the timing analyzer part and the spectral analyzer part

4.
Timing Analyzer Part Alone:
   • Function : the classification of warning sounds based on the absolute short-time average signal amplitudes
   • Short-time Duration : 12.8 msec
   • Timing Features : for burst-type sounds, the repetition period and the average signal burst width; for steady sounds, a rising signal amplitude transition and a new signal amplitude

5. Spectral Recognizer Part Alone:
   • Function : extraction of short-time spectral features from warning sounds by the filter-bank approach
   • Short-time Duration : 12 msec
   • # of filters : 8
   • Type of Filter : digital biquad
   • Frequency Span : 100 Hz to 5.0 kHz
   • Implementation : software
   • Pattern Matching : Dynamic Time Warping

6. Modes of Operations:
   • burst-type and steady warning sound recognition
   • phone ring recognition

7. Recognition Accuracy:
   • 98 % for steady and burst-type warning sounds for an SNR of over 10 dB
   • 93 % for phone rings for an SNR of over 10 dB

8. False-alarm Rate:
   • one false recognition per 90 minutes (worst-case) for burst-type and steady sounds
   • no false ring indications

9. Recognition Time: 0.5 sec to 10 sec, depending on the type of warning sound
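Item 5 above names Dynamic Time Warping as the pattern-matching method. As a rough illustration only, the following sketch shows the textbook form of the technique applied to two sequences of feature frames (for instance, one 8-element vector per short-time frame). The Euclidean local distance, the unconstrained step pattern and the function name are assumptions made for this sketch. The WARNSIS software and recognition hardware may use a different local distance, slope constraints or normalization.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of feature vectors.

    `a` and `b` are lists of equal-length numeric vectors (one per frame).
    Local distance: Euclidean (an assumption for this sketch).
    Allowed steps: insertion, deletion, match, with no slope constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimum accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical one-dimensional "templates" for illustration.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same shape, slower
print(dtw_distance(template, stretched))         # 0.0
print(dtw_distance([[0.0], [1.0]], [[0.0], [2.0]]))  # 1.0
```

A recognizer built on this idea would compare the input pattern against every stored template and report the template with the smallest distance, which corresponds to the "closest distance measured" message in Appendix C.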
