UBC Theses and Dissertations


A warning signal identification system (WARNSIS) for the hard of hearing and the deaf Chau, Kwok Wing 1989

Full Text

A WARNING SIGNAL IDENTIFICATION SYSTEM (WARNSIS) FOR THE HARD OF HEARING AND THE DEAF

Kwok Wing Chau
B.A.Sc., University of Windsor

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF ELECTRICAL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
July 1989
© Kwok Wing Chau, 1989

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada
Date: July 1989

Abstract

The objective of this project has been to design a reliable warning sound recognition system for hard of hearing and deaf people. Commercially available auditory warning devices use simple technologies, which are not able to produce the performance required. The demand for a versatile WARNing Signal Identification System (WARNSIS) that satisfies the needs of hard of hearing and deaf individuals has been well established. This WARNSIS must be "teachable" in order to cope with the many different sounds, and diverse noisy environments.
Relevant sounds are telephone rings, sirens, and smoke and fire alarms, and noise includes all other sounds including radio-music, conversation, machinery, etc. In the absence of published data, we studied extensively both timing and spectral characteristics of warning sounds. We found that the average short-time absolute amplitude of warning sounds is useful in providing timing information, and that the short-time spectra yield characteristic patterns for signal classification. The WARNSIS operates in real-time, and embodies two parts: the timing analyzer and the spectral recognizer. The timing analyzer continuously monitors the variations of environmental sounds, from which important timing features are derived. If a potential warning sound is detected, the spectral recognizer is activated to analyze its spectral patterns. When these patterns match one of the learned and pre-stored templates, a warning sound is identified with the known warning sound associated with that template. An advantage of such a recognition scheme is that it avoids unnecessary and computationally intensive spectral analysis work when only noise is present.

Evaluation results show that the WARNSIS can reliably recognize warning sounds in random noise with no false alarms. In loud music and conversation backgrounds the WARNSIS can still achieve a high recognition rate, but more false alarms are generated. In household environments where conditions are less demanding than our evaluation criteria, our system is expected to produce very satisfactory results. Since the WARNSIS can be taught to learn and recognize new warning sounds, it may be used in other applications such as noisy industrial sites and traffic light control.
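The two-part scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the fixed `level_threshold`, the non-overlapping framing, and the `spectral_matcher` callback are assumptions introduced only for illustration (the thesis's own timing analyzer is more elaborate). The sketch shows how a short-time average absolute amplitude contour can gate a costlier spectral stage so that the latter is skipped when only low-level noise is present.

```python
def staaa(samples, frame_len):
    """Short-time average absolute amplitude.

    Returns the mean of |x| over consecutive, non-overlapping frames;
    a trailing incomplete frame is dropped.
    """
    return [
        sum(abs(x) for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def recognize(samples, frame_len, level_threshold, spectral_matcher):
    """Two-stage sketch: call the spectral stage only for candidate sounds.

    level_threshold and spectral_matcher are hypothetical stand-ins for
    the timing analyzer's decision and the template-matching recognizer.
    """
    contour = staaa(samples, frame_len)
    if not any(level > level_threshold for level in contour):
        return None  # timing stage saw nothing unusual: skip spectral work
    return spectral_matcher(samples)


# Usage: a quiet input never reaches the matcher; a loud burst does.
calls = []

def toy_matcher(samples):
    calls.append(len(samples))
    return "candidate warning sound"

print(recognize([1, -1, 1, -1], 2, 5, toy_matcher))   # None
print(recognize([1, -1, 8, -8], 2, 5, toy_matcher))   # candidate warning sound
```

The design point is the one the abstract makes: the inexpensive amplitude contour runs continuously, and the expensive analysis runs only on demand.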
Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement

1 Introduction
  1.1 Background
  1.2 Auditory Warning Aids for Hearing Impaired Persons
    1.2.1 Hard-wired Systems
    1.2.2 Threshold Detector Systems
    1.2.3 Hearing Ear Dogs
  1.3 Project Objectives
  1.4 Thesis Outline

2 Warning Sounds and Generating Devices
  2.1 Types of Warning Signal Generating Devices
  2.2 Industrial Standards for Warning Devices
    2.2.1 Sound Output Power
    2.2.2 Frequency Specification
  2.3 Literature on Warning Sound Characteristics
    2.3.1 Telephone Rings
    2.3.2 Smoke Detector Alarm Sounds
    2.3.3 Warning and Alarm Sounds Generated by Vehicles and Traffic Control Devices
  2.4 The Emerging Scientific Basis for Generating Warning Sounds
    2.4.1 A Generic Warning Sound Generating Scheme

3 Measurement and Analysis of Timing & Spectral Characteristics
  3.1 Timing Characteristics
    3.1.1 A PC-Based Data Acquisition System
    3.1.2 Data Collection
    3.1.3 Timing Features of Different Warning Sounds
  3.2 Spectral Characteristics
    3.2.1 Comparison of Parametric and Nonparametric Spectral Estimation Methods
    3.2.2 Welch's Non-overlapping Spectral Estimation Method
    3.2.3 Implementation of Welch's Method
    3.2.4 Data Collection
    3.2.5 Spectra of Warning Sounds Generated by various Warning Devices
    3.2.6 Summary

4 Solutions to the Recognition Problem
  4.1 Pattern-Recognition Model for Signal Identification
  4.2 Review & Evaluation of Signal Recognition Techniques
    4.2.1 Analyzing & Utilizing Timing Features
    4.2.2 Feature Extraction by Filter Banks
    4.2.3 The LPC/AR Model
    4.2.4 LPC-derived Cepstral Coefficients
    4.2.5 The Hidden Markov Model (HMM) Approach
  4.3 Overview of the Recognition Scheme for WARNSIS
  4.4 Extracting & Classifying Timing Information
    4.4.1 A Scheme to Extract Timing Features
  4.5 Extracting Spectral Information
    4.5.1 Feature Extraction
    4.5.2 Dynamic Time Warping (DTW)

5 Design & Implementation
  5.1 Timing Analyzer
    5.1.1 Microphone
    5.1.2 Analog Signal Conditioner
    5.1.3 Control & Timing Processor (CTP)
  5.2 Spectral Recognizer (SR)
    5.2.1 The Hybrid Analog Processor (MC4760)
    5.2.2 Feature Extraction and Pattern Matching Processor (µPD7761)
    5.2.3 The Control Processor (µPD7762)
    5.2.4 Pattern Memory
  5.3 Software Program
    5.3.1 The Command Set of the Spectral Recognizer
    5.3.2 Initialization Stage
    5.3.3 Training Stage
    5.3.4 Recognition Stage

6 Evaluation
  6.1 Average Recognition Accuracies
  6.2 False-alarm Rates
  6.3 Discussion
    6.3.1 Average Recognition Accuracies
    6.3.2 False-alarm Rates

7 Conclusions and Recommendations
  7.1 Summary & Conclusions
  7.2 Recommendations for Future Directions of Research

References

Appendices

A Formulation of Relationship between SNR and SPL measurements
B Format of the command set of the SR
C Software Operating Manual of The WARNSIS
  C.1 Program Files
  C.2 Interactive Operations
    C.2.1 Initialization Stage
    C.2.2 Training/Recognition Stage
D Evaluation Results
  D.1 The Complete WARNSIS
    D.1.1 Recognition Results with Background Steady Noise
    D.1.2 Recognition Results with Background of FM Broadcast plus Steady Noise
    D.1.3 Recognition Results with Background of AM Broadcast plus Steady Noise
    D.1.4 Results of phone ring recognition with minimum burst duration (MBD) set to 1.024 sec
    D.1.5 Results of the False-alarm Tests for the complete WARNSIS
  D.2 Timing Analyzer Part Alone
    D.2.1 Recognition Results with Background Steady Noise
    D.2.2 Recognition Results with Background of FM Broadcast Plus Steady Noise
    D.2.3 Recognition Results with Background of AM Broadcast Plus Steady Noise
  D.3 False-alarm Results for the Timing Analyzer Alone
  D.4 Spectral Recognizer Part Alone
    D.4.1 Recognition Results with Background Steady Noise
    D.4.2 Recognition Results with Background of FM Broadcast plus Steady Noise
    D.4.3 Recognition Results with Background of AM Broadcast plus Steady Noise
    D.4.4 Results of false-alarm tests for the spectral recognizer part alone
E Specifications

List of Tables

2.1 Spectral analysis results for different smoke detectors [13]
2.2 Summary of spectral analysis results for traffic alarm sounds [14]
3.3 Instantaneous and short-time signal amplitudes
5.4 Parameters used for the Timing Analyzer
6.5 A summary of recognition results with MBD set to 0.1024 sec
6.6 A summary of recognition results with MBD set to 1.024 sec
6.7 Results of the false-alarm test with MBD set to 0.1024 sec
6.8 Results of false-alarm test with MBD set to 1.024 sec
A.9 Tabulation of SPL reading difference and SNR
B.10 Format of command set of SR
B.11 Legal Values for parameters of the command set
B.12 Interpretation of status output codes from µPD7762
D.13 "Numbers" assigned for different warning sounds
D.14 Confusion matrix for recognition results generated by the complete WARNSIS in the presence of steady noise
D.15 Recognition rates of burst-type sounds under steady noise condition
D.16 Recognition rates of steady sounds generated by the complete WARNSIS under steady noise condition
D.17 Confusion matrix for phone ring recognition generated by the complete WARNSIS under steady noise condition
D.18 Recognition rates of phone ring generated by the complete WARNSIS under steady noise condition
D.19 Confusion matrix for recognition results generated by the complete WARNSIS in the presence of FM broadcast plus steady noise
D.20 Recognition rates of burst-type sounds produced by the complete WARNSIS under FM broadcast plus steady noise condition
D.21 Recognition rates of steady sounds generated by the complete WARNSIS under FM broadcast plus steady noise condition
D.22 Confusion matrix for recognition results generated by the complete WARNSIS in AM broadcast plus steady noise background
D.23 Recognition rates of burst-type sounds generated by the complete WARNSIS in AM broadcast plus steady noise environment
D.24 Recognition rates of steady sounds generated by the complete WARNSIS in AM broadcast plus steady noise background
D.25 Confusion matrix for phone ring recognition generated by the complete WARNSIS under the condition of FM broadcast and the steady noise with MBD set to 1.024 sec
D.26 Results of recognition rates of phone rings generated by the complete WARNSIS in FM broadcast plus the steady noise background
D.27 Confusion matrix for the results of phone ring recognition generated by the complete WARNSIS in the presence of AM broadcast plus the steady noise with MBD set to 1.024 sec
D.28 Results of phone ring recognition rates generated by the complete WARNSIS in the presence of AM broadcast plus steady noise with MBD set to 1.024 sec
D.29 Results of the false-alarm tests for the complete WARNSIS with MBD set to 0.1024 sec
D.30 Results of the false-alarm tests for the complete WARNSIS with MBD set to 1.024 sec
D.31 Confusion matrix for warning sound recognition generated by the timing analyzer alone in the presence of steady noise
D.32 Recognition rates of the timing analyzer part alone in the presence of steady noise
D.33 Confusion matrix for warning sound recognition generated by the timing analyzer part alone in the presence of FM broadcast plus steady noise
D.34 Recognition rates of the timing analyzer part alone in the presence of FM broadcast plus steady noise
D.35 Confusion matrix for warning sound recognition generated by the timing analyzer part alone in the presence of AM broadcast plus steady noise
D.36 Recognition rates of the timing analyzer part alone in the presence of AM broadcast plus steady noise
D.37 False-alarm test results of the timing analyzer part alone with MBD set to 0.1024 sec
D.38 False-alarm test results of the timing analyzer part alone with MBD set to 1.024 sec
D.39 Confusion matrix for warning sound recognition generated by the spectral recognizer part alone in the presence of steady noise
D.40 Results of steady sound recognition rate generated by the spectral recognizer part alone in steady noise background
D.41 Results of burst-type sound recognition rates produced by the spectral recognizer part alone in steady noise background
D.42 Results of phone ring recognition rate produced by the spectral recognizer part alone in steady noise background
D.43 Confusion matrix for the results of warning sound recognition generated by the spectral recognizer part alone in FM broadcast and steady noise background
D.44 Results of steady sound recognition rate produced by the spectral recognizer part alone in FM broadcast plus steady noise background
D.45 Results of burst-type sound recognition rates produced by the spectral recognizer part alone in FM broadcast plus steady noise background
D.46 Results of phone ring recognition rates produced by the spectral recognizer part alone under FM broadcast plus steady noise condition
D.47 Confusion matrix for warning sound recognition generated by the spectral recognizer part alone under AM broadcast plus steady noise condition
D.48 Results of steady sound recognition rates produced by the spectral recognizer part alone under AM broadcast plus steady noise condition
D.49 Results of burst-type sound recognition rate produced by the spectral recognizer part alone in the presence of AM broadcast plus steady noise
D.50 Results of phone ring recognition rate produced by the spectral recognizer part alone in the presence of AM broadcast plus steady noise
D.51 False-alarm tests for the spectral analyzer part alone

List of Figures

2.1 Auditory Warning Sound Components [17,21,23]
3.2 Signal acquisition and derivation of instantaneous absolute signal amplitudes
3.3 Flowchart of procedure to accumulate and store 1000 samples
3.4 Experimental set-up for data collection
3.5 Short-time average absolute amplitudes (STAAA) of siren sounds: a) J1: Burglar alarm (JDS-100); b) J2: MPI-11; c) J3: JDS-100 I; and d) J4: HI-LO
3.6 Short-time average absolute amplitudes (STAAA) of siren sounds: a) J5: High steady sound; b) J6: Pulser; c) J7: Steady horn; and d) J8: Electronic Synthesized Bell sound
3.7 Short-time average absolute amplitudes (STAAA) of telephone rings and smoke alarm sound: a) Electro-mechanical Ringer; b) Electronic Ringer; and c) Smoke alarm sound
3.8 Short-time average absolute amplitudes (STAAA) of radio broadcasts: a) Pop music; b) Speech; and c) Rock music
3.9 Short-time average absolute amplitudes (STAAA) of siren sounds with radio-broadcast as background: a) J1; b) J2; c) J3; and d) J4
3.10 Short-time average absolute amplitudes (STAAA) of different siren sounds with same background noise: a) J5; b) J6; c) J7; and d) J8
3.11 Spectrogram of the minimum 4-sample Blackman-Harris window, where PSD denotes power spectral density
3.12 Flowchart of the spectral analysis program
3.13 Short-time spectra of an electromechanical ringer
3.14 (a): Spectra of an electromechanical ringer with seven loudness settings
3.14 (b): Spectra of another electromechanical ringer with seven loudness settings
3.15 Long-time averaged spectra of five electromechanical ringers
3.16 Short-time averaged spectra of a multiple-line telephone
3.17 Effects of steady fan noise on telephone ring spectra
3.18 (a): Short-time spectra of electronic rings with pitch set at position one
3.18 (b): Short-time spectra of electronic rings with pitch set at position two
3.18 (c): Short-time spectra of electronic rings with pitch set at position three
3.18 (d): Short-time spectra of electronic rings with pitch set at position four
3.19 Spectra of Rapid Yelp sound
3.20 Spectra of Conventional Yelp sound
3.21 Spectra of Low-Hi sweep sound
3.22 Spectra of European Hi-Low sound
3.23 Spectra of Hi-Frequency Steady sound
3.24 Spectra of Pulsating Horn sound
3.25 Spectra of Steady Horn sound
3.26 Spectra of Electronic Synthesized Bell sound
4.27 Classic Signal Recognition Scheme [37,38]
4.28 The 'hybrid' recognition scheme for WARNSIS
4.29 Block diagram of the Timing Feature Extractor
4.30 Relationships between the instantaneous energy and the instantaneous absolute amplitudes of a sequence, x(n). (a): the plot of x(n); (b): the plot of |x(n)|; and (c): the plot of x²(n)
4.31 (a): The STAAA contour of a steady sound; (b): The STAAA contour of a burst-type sound
4.32 Two typical examples of how the dynamic amplitude threshold adapts to acoustic energy variations of the environment. (a): sudden decrease in signal levels; (b): sudden increase in signal levels
4.33 (a): Detection of a steady sound; (b): An illustration of how the scheme rejects a non-steady sound
4.34 A demonstration of the use of the MBD and MIAT to refine the basic warning sound analysis scheme
4.35 Flowchart of the Timing Feature Extraction Scheme
4.36 Filter-bank analysis of Warning sounds
4.37 An example of pattern matching between a reference template and an unknown pattern
4.38 Local path constraints for DTW
5.39 The building blocks of WARNSIS
5.40 Block diagram of MC4760
5.41 Block diagram of the functional operation of µPD7761
5.42 Timing relationships associated with the synchronization of the spectral recognizer to burst-type warning signals, where STAAA is the short-time average absolute amplitude of signal; RP is the repetition period; ASBW is the average signal burst width, and SR is the spectral recognizer
5.43 Flowchart of the training scheme for steady sounds
5.44 Flowchart of training procedures for burst-type warning sounds
5.45 Flowchart of the recognition procedure
6.46 An example of a phone ring sequence added with nonstationary background noise

Acknowledgement

I would like to thank my supervisor, Dr. C. A. Laszlo, for his patience, encouragement, and input during this project. I am greatly indebted to my colleagues, Darrell Wong and Sammy Yick, for their invaluable discussions and advice.
Special thanks are due to Angela Choi and Michael Slawnych for their comments and suggestions to improve the presentation of this thesis. Finally, very deep gratitude is directed to my family for their generous financial support. This project was funded by Natural Sciences and Engineering Research Council of Canada grant A67012.

Chapter 1

Introduction

1.1 Background

Auditory communication is vital to normal life. Such communication often focuses on speech which is one of the most effective means of conveying ideas, opinions or information among people. Auditory communication also plays an important role in associating people with their environment. In particular, auditory warnings are of great importance. Such warnings include baby cries, telephone rings, doorbells, door knocks, fire or smoke alarm bells, burglar alarms, car horns, sirens, and electronic buzzers commonly used in household appliances and office equipment.

Generally, auditory warnings are achieved by special sounds. Firstly, warning sounds are usually loud, strident and insistent to effectively cut through speech and background noises, and to command people's attention. Secondly, different warning sounds convey different "messages" which demand responses of varying urgency. Some warning sounds are used to "announce" a condition, or an event; for example, an incoming telephone call, or a visitor at a door. Other warning sounds alert people to potential life-threatening situations such as a fire, or intruders inside a house. Failure to respond to these warning sounds may result in serious harm.

Unfortunately, hearing-disabled people have difficulty in hearing warning sounds and in many cases cannot hear even very loud alarms.
This problem extends to many different situations of everyday life. For such individuals, many common household sounds go undetected (sounds produced by oven buzzers, bathroom fans, stove hood fans, or running water), causing inconvenience and occasional danger in homes. In noisy environments, hearing-disabled individuals cannot discriminate different types of sounds. For example, they cannot hear the sounds that indicate automobile malfunctions such as worn brakes, bad wheel bearings, or noisy mufflers.

In addition, hard of hearing individuals who wear hearing aids can only detect warning signals if their hearing aids are operating and are sensitive enough. Specifically, unless the hearing aid is worn during sleep, hard of hearing people usually cannot hear the sound of burglar alarms, or fire and smoke alarm bells. Furthermore, in tornado-prone states of the U.S. (Kansas, Texas and Arkansas), the general public is usually alerted to approaching tornadoes by loud siren sounds. Missing such warning sounds can be fatal! But hearing-disabled people often cannot hear such sounds, and their utmost concern and their urgent need for special devices to warn of such impending disasters have been forcefully stated [1].

Indeed, the invisible disability of deaf and hard of hearing people creates serious inconveniences, frustrations, fears, and hazards in their daily life. In particular, the vulnerability to missing auditory warnings contributes significantly to the lack of mobility, independence, and security of hearing-disabled persons. In response to the obvious need to help hearing-disabled people to cope with this problem, a number of special alert aids have been designed and marketed.
1.2 Auditory Warning Aids for Hearing Impaired Persons

A range of systems, signalling and wake-up devices are currently available to alert hearing impaired individuals to telephone rings, doorbells, door knocks, fire or smoke alarm bells, and general emergency signals in diverse environments [1,2,3,4]. Some systems are simple sound amplitude amplification devices, which increase the volume of warning sounds to a level detectable by hearing aid wearers. Other, more sophisticated systems, are capable of driving external visual modules and tactile actuators. Three major types of auditory warning aids for the hearing-disabled are in use: directly activated hard-wired systems, acoustic threshold detector systems, and hearing ear dogs.

1.2.1 Hard-wired Systems

Such systems require direct electrical connection to sound generating sources. They are reliably activated by the electric signal that drives the warning sound generator, and alert the hearing-disabled by either flashing lights, or by vibratory actuators. To increase the operational range, and to eliminate the need for long cables, an intermediate AM or FM transmitter can be integrated into such systems. Single or multiple remote receivers distributed throughout the home or office can pick up the transmitted signal, and subsequently turn on actuators.

A characteristic example of such systems is the Sonic Alert, which will produce light flashes to alert the hearing impaired to telephone calls. The device can be used with any telephone, and is easily installed by plugging it into any modular telephone jack and electrical outlet.
Both the plug-in and a remote radio-transmitter version are available from the Special Needs Department of the British Columbia Telephone Company.

Some hard-wired devices are simple enough to be installed by users without extensive electronic skills (e.g., Sonic Alert). Other, more sophisticated devices, are custom designed, and require permanent installation by a technician at a considerable cost. As reported, these custom designed devices often must be left behind when hearing-disabled individuals move from house to house [1]. In addition, as the number of sound generating devices increases in homes or offices, the cost of hard-wired systems escalates due to both the wiring required, and the increased complexity. Finally, before any remote warning device is installed, hearing-disabled people have to check if there are similar remote systems installed in neighboring houses. Due to "cross-talk", such systems in close proximity are very prone to generating false warnings.

1.2.2 Threshold Detector Systems

Since warning devices produce sounds that are louder than normal environmental sound levels, threshold detector systems are designed to respond to changes in loudness. Instead of direct connection to sound generating sources, threshold devices employ a microphone, or special electromagnetic field sensor for signal acquisition. With sensitivity adjustment, a threshold device can be adapted to operate with various types of alarms, for example horns, sirens, and telephones, under different acoustic conditions. When the signal level from any source exceeds the preset threshold value of the system, such a device will automatically activate the actuator to alert a hearing impaired individual.
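The behaviour of a fixed-threshold device can be sketched in a few lines. This is only an illustration: the sound levels below are hypothetical readings invented for the example, not measurements from this work, and the function name is not taken from any real product. The sketch shows that the device reacts to any source that exceeds the preset level, and that the outcome depends entirely on where that level is set.

```python
def threshold_detector(levels, threshold):
    """Return the indices at which the measured level exceeds a fixed threshold."""
    return [i for i, level in enumerate(levels) if level > threshold]


# Hypothetical sound levels in dB SPL, one reading per second.
# Index 2 is a doorbell close to the microphone; index 4 is a slammed door.
levels = [52, 55, 88, 54, 79, 53]

print(threshold_detector(levels, 70))  # [2, 4]: the door slam also triggers
print(threshold_detector(levels, 85))  # [2]: only the doorbell triggers
print(threshold_detector(levels, 95))  # []: the doorbell is missed
```

Because the detector compares loudness only, no setting can separate a warning sound from an equally loud irrelevant sound.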
Since these devices cannot selectively identify the sources of the loud sounds, in acoustic systems the microphone is positioned in close proximity to the warning sound generator for maximum system sensitivity and selectivity to the desired inputs. A hearing impaired person can adjust the device sensitivity according to the acoustic background noise level. Such a device is simple to operate, and is used to monitor crying babies, telephones, doorbells, and burglar or smoke alarms.

While threshold devices are generally more flexible than hard-wired systems, proper setting of the device sensitivity is frustrating to many users. With the threshold set too high, the device is likely to miss the occurrence of warning sounds. A low threshold setting makes the device vulnerable to false triggering.

Threshold detection systems using electromagnetic field sensing detect only warning sounds emitted by electromechanical actuators, for example telephones and doors equipped with electromechanical bells. When an electromechanical bell is activated, a strong time-varying electromagnetic field is produced to activate an internal electromechanical vibrating system. Consequently, this vibration generates a loud sound. For the purpose of warning sound detection, the stray electromagnetic field emitted by many devices may be utilized. For example, electromagnetic field pickup coils may be attached with a suction cup to the telephone or bell housing to intercept part of the time-varying magnetic field. The output of the pickup coil is amplified and fed to an appropriate threshold detection circuit. To alert hearing disabled individuals, such systems provide outlets for lamps and external vibratory actuators.
Since some warning devices are usually installed out of reach inside houses and offices (for example, fire alarms), the installation of the field pickup coils may be difficult. Due to low signal levels, special care is needed in handling the wiring connection between the pick-up coil and the threshold detector circuit. In addition, many newer appliances use solid-state buzzers which do not generate any magnetic field. Nevertheless, electromagnetic field sensing is a reliable method if employed under the appropriate circumstances.

1.2.3 Hearing Ear Dogs

While a Hearing Ear dog is not a technological device, it is included here to underscore the seriousness of the problem, and the complex and expensive solutions that are being offered. The Hearing Ear dog program was originally funded by the U.S. Government to meet the special needs of hard of hearing and deaf people. An affiliated program was established in Ontario, Canada and is named the Hearing Ear Dogs of Canada. Only mature hearing-disabled individuals are qualified recipients of Hearing Ear dogs. In the U.S., the expenditure involved in dog selection, veterinary care, housing, training, and placement are fully subsidized by the U.S. Government.

Hearing Ear dogs are trained to alert their owners to warning sounds commonly found in the living environment. Dogs chosen from pet adoption offices are extensively screened prior to the rigorous four to five months of training. During this training, the Hearing Ear dog learns obedience, and how to respond to sounds emitted by household appliances and warnings. The Hearing Ear dogs can reliably recognize warning sounds they are trained for, and will skillfully alert their owners.
In addition, the Hearing Ear dog usually is an ideal companion for elderly people.

The Hearing Ear dog approach to the problem also has some negative aspects. The training and dog placement processes are lengthy and costly, and the program often has a very long list of applicants wanting dogs. Moreover, since the training of Hearing Ear dogs requires special skills, once a placement is made recipients cannot teach their dogs to learn new warning sounds. The maintenance of the dogs is a costly proposition, and their transportation also creates problems. Furthermore, the presence of animals is not always tolerated in offices, hotels and other public places.

1.3 Project Objectives

Existing auditory warning aids for hearing-disabled people suffer from various functional deficiencies. Such deficiencies include lack of portability, lack of flexibility in recognizing warning sounds, and the propensity for false alarms. In a recent survey [1], hearing-disabled people have expressed their desire for personal warning sound recognition systems which are easy to operate, and which are able to distinguish different household warning and emergency sounds. The demand for a versatile WARNing Signal Identification System (WARNSIS) which satisfies such needs is well established.

Motivated by this demand, by recent advances in speech recognition technology, and by the availability of specialized VLSI processors, it has been our objective to develop a real-time, adaptive WARNSIS which meets the following design criteria:

1. To be "teachable", which means that the device must be able to learn new warning sounds, and recognize them after a training procedure;
2. To have a recognition performance that is similar to that of normally hearing adults in very noisy environments; and
3. To produce acceptable positive and negative false-alarm rates in use.

In order to achieve this goal, work was undertaken to:

1. Investigate the characteristics of the warning sounds commonly used in office and living environments;
2. Utilize the results obtained in 1 to develop a recognition technique which has high reliability under noisy conditions;
3. Implement a prototype WARNSIS embodying the recognition technique developed in 2; and
4. Evaluate its overall performance in different noisy environments.

1.4 Thesis Outline

In Chapter 2 the literature on the various warning devices is reviewed. Industrial standards for the output power and spectral characteristics of warning sound generators are also discussed. Chapter 3 investigates the timing and spectral features of some common auditory warning sounds. Chapter 4 reviews different speech recognition techniques, with detailed discussion of the filter-bank approach used in this work. The details of our WARNSIS implementation are presented in Chapter 5, and the evaluation of the system performance is contained in Chapter 6. Chapter 7 gives the conclusion and recommendations for further improvement in system performance.

Chapter 2

Warning Sounds and Generating Devices

2.1 Types of Warning Signal Generating Devices

Devices which generate audible warning signals employ either electro-mechanical or solid-state transducers. Electro-mechanical warning devices generally include a metallic gong, hammer and coil assembly.
To activate such a device, its coil is electrically energized, causing the hammer to vibrate and to strike the gong. The tonal quality and loudness of these devices depend upon the various components in the electromechanical assembly, such as the shape and size of the gong(s), the force with which the hammer strikes the gongs, and the mounting and housing enclosure. In addition, in the manufacturing process, the mechanical components are assembled with fairly large tolerances. Therefore, the characteristics of the sound generated by such devices vary significantly, even for different units of the same model.

In the devices which employ solid-state transducers, warning sounds are elicited by applying electric voltage waveforms to these components. The tonal quality and loudness of such devices depend on the characteristics of these waveforms, and on the frequency response of the transducers. The waveforms are produced by electronic circuits, and therefore their characteristics can be easily manipulated. Since transducers are manufactured to close tolerances, the characteristics of the sounds generated by these electronic warning devices vary very little, even for different units of the same model.

2.2 Industrial Standards for Warning Devices

2.2.1 Sound Output Power

Conceptually, warning sounds should be sufficiently loud to be effective in attracting the attention of people in the vicinity of the warning device. Based on this concept, various standards organizations (see footnote 1) established recommendations for the sound output power of smoke alarm detectors [5], household fire warning and burglar alarm systems [6], vehicle alarm systems [7], telephone rings [8,9,10], and general audible signalling devices used for life safety and property protection [11].

Footnote 1: The Canadian Standards Association (CSA), the Electronic Industries Association (EIA), the Underwriters Laboratories Incorporated (UL), the American National Standards Institute (ANSI), and the National Fire Protection Association (NFPA) of the U.S.

In general, it is recommended that in non-industrial environments an auditory warning device operated at rated voltage, and mounted in its intended position, be capable of providing an output sound pressure level (SPL) of at least 85 dBA (with reference to 20 µPa) measured at a distance of 10 feet from the device [12]. More specifically, the minimum recommended SPL for warning devices depends on the environment where these devices are installed. If the warning devices are used in public places, a minimum of 15 dBA SPL above the average ambient sound level is required. If the devices are intended to be used in private residences, they should produce a minimum of 10 dBA SPL above the average ambient sound level [11].

2.2.2 Frequency Specification

Our survey of the publications of five major standards associations led us to conclude that no specific guidelines on the frequency content of general warning sounds have been established. The only exception is the telephone, whose required acoustic output power and frequency content are specified by the CSA, EIA, ANSI and Bell Laboratories.

2.3 Literature on Warning Sound Characteristics

2.3.1 Telephone Rings

Telephone ringers are designed to produce easily recognizable alerting sounds. The available standards are applicable to telephones with electromechanical, or bell-type, alerting ringers, and with modern electronic tone ringers [8,9,10].
The important performance characteristics specified by these standards are summarized for our purposes as follows:

1. The alerting signal of a telephone with an electro-mechanical alerting device shall contain two or more major frequency components (f1 and f2) in the 500 - 6000 Hz range, with at least one having a mean power level of > 73 dB, relative to 1 pW. The second major component shall have a mean sound power level of > 68 dB, relative to 1 pW;
2. The total mean acoustic power level shall be > 80 dBA, relative to 1 pW. These power levels apply with the volume control set for maximum volume;
3. At least one of the major components (f1) shall be below 2000 Hz. The nominal frequency of the higher major frequency component (f2) shall be equal to or greater than 5/4 of the lower major frequency component (f1), i.e., f2 ≥ (5/4) f1;
4. The alerting signal of a telephone with an electronic alerting device that does not produce an acoustic spectrum rich in overtones shall meet the criteria in 1), with the exception that f1 and f2 shall each have a mean power level of > 73 dB, relative to 1 pW;
5. A telephone shall have a loudness adjustment accessible to the user that produces at least 6 dBA of total attenuation when operated from its high to low volume position; and
6. With regard to ringing cycles, the ringing current supplied by the telephone company central office shall belong to one of the following sequences:
   • Repetitive bursts of 2 seconds out of every 6 seconds, where an individual burst may be as short as 0.8 second;
   • Repetitive bursts of 1 second out of every 4 seconds, where an individual burst may be as short as 0.6 second; or
   • Repetitive bursts of at least one ringing burst of a minimum 0.5 second duration in any 4 second period.
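The frequency and level criteria in items 1, 3 and 4 above can be expressed as a simple check. The following is a minimal sketch only: the function name, argument names and messages are hypothetical, and the strict inequalities on the power levels follow the summary as printed here rather than the text of the standards themselves.

```python
def ringer_violations(f1_hz, f2_hz, level1_db, level2_db, plain_electronic=False):
    """Return a list of violated criteria for the two major ringer components.

    f1_hz, f2_hz         : frequencies of the two major components (Hz)
    level1_db, level2_db : their mean power levels (dB re 1 pW)
    plain_electronic     : True for an electronic ringer whose spectrum is
                           not rich in overtones (item 4 of the summary)
    """
    violations = []
    low_f, high_f = sorted((f1_hz, f2_hz))

    # Item 1: both major components lie in the 500 - 6000 Hz range.
    if low_f < 500.0 or high_f > 6000.0:
        violations.append("component outside 500-6000 Hz")
    # Item 3: at least one major component is below 2000 Hz ...
    if low_f >= 2000.0:
        violations.append("no major component below 2000 Hz")
    # ... and the higher component is at least 5/4 of the lower one.
    if high_f < 1.25 * low_f:
        violations.append("f2 below 5/4 of f1")

    # Items 1 and 4: level limits for the stronger and weaker component.
    strong = max(level1_db, level2_db)
    weak = min(level1_db, level2_db)
    weak_limit_db = 73.0 if plain_electronic else 68.0
    if strong <= 73.0:
        violations.append("strongest component not above 73 dB")
    if weak <= weak_limit_db:
        violations.append("second component not above %.0f dB" % weak_limit_db)
    return violations
```

For example, components at 1000 Hz and 1300 Hz with levels of 75 dB and 70 dB pass the bell-type criteria, but the same pair fails item 4 when `plain_electronic=True`, because the weaker component must then also exceed 73 dB.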
2.3.2 Smoke Detector Alarm Sounds

Smoke alarms are used to alert people to the presence of smoke and to the potential of fire. Generally, this warning sound is very strident and insistent. In a study of alarm sound attenuation inside residential buildings, Halliwell and Sultan [13] investigated the spectral content of the sounds produced by a number of smoke detectors. Using a 2-channel FFT analyzer connected to two microphones, they obtained the short-time spectra, and for each sound 64 of these short-time spectra were averaged to give the spectrum. The narrow-band spectrum was subsequently converted to a third-octave spectrum by simply summing the energy within the third-octave bands. Their results for various smoke detectors show two or more strong spectral components in all computed spectra (Table 2.1). Unfortunately, this work did not include the investigation of the variation of the short-time spectra obtained from consecutive samples.

Table 2.1: Spectral analysis results for different smoke detectors [13]

Detector   1/3 Octave Frequency Bands (kHz)
Type       0.5  0.63   0.8   1.0  1.25   1.6   2.0   2.5  3.15   4.0   5.0
A1          38    39    39    39    63    57    73    96    84    63    50
A2          37    38    38    38    44    56    70    98    92    67    56
B1          82    82    60    71    74    81    79    95    95    95    88
B2          79    81    66    72    76    81    77    93    94    96    92
C1          44    44    44    45    45    50    61    79   102    90    69
C2          44    44    44    45    45    50    62    79   102    91    70
D1          46    46    46    46    47    52    63    80   103    93    71
D2          44    44    44    45    45    50    62    80   102    88    68
E1          84    70    85    69    76    92    88    96    92    91    80
E2          76    83    63    69    80    87    85    97   100    91    89
F1          61    60    72    70    70    74    86    75    83    90    82
F2          58    61    69    70    72    77    90    81    82    89    82
G1          37    37    37    38    39    50    63    88    95    69    55
G2          38    38    38    38    39    48    61    84    95    71    56

Notes: Detectors with the same letter denote identical models. Entries are the maximum sound power output in dB.
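The conversion from a narrow-band spectrum to a third-octave spectrum by summing the energy within each band can be sketched as follows. This is an illustrative sketch, not the procedure of [13]: the band edges are assumed to lie one sixth of an octave on either side of each nominal centre frequency, and the function and argument names are hypothetical.

```python
import math

def third_octave_levels(freqs_hz, powers, centres_hz):
    """Sum narrow-band power into third-octave bands and return levels in dB.

    freqs_hz   : centre frequency of each narrow-band (FFT) bin, in Hz
    powers     : power in each bin, in linear (not dB) units
    centres_hz : nominal third-octave centre frequencies, in Hz
    """
    levels_db = []
    for fc in centres_hz:
        # Assumed band edges: fc * 2^(-1/6) up to fc * 2^(+1/6).
        lower = fc / 2 ** (1 / 6)
        upper = fc * 2 ** (1 / 6)
        band_power = sum(p for f, p in zip(freqs_hz, powers) if lower <= f < upper)
        if band_power > 0:
            levels_db.append(10 * math.log10(band_power))
        else:
            levels_db.append(float("-inf"))  # no energy fell into this band
    return levels_db

# Three bins fall in the 1 kHz band, one bin in the 2 kHz band.
example = third_octave_levels([900.0, 1000.0, 1100.0, 2000.0],
                              [1.0, 1.0, 1.0, 10.0],
                              [1000.0, 2000.0])
```

Because the powers are added in linear units before taking the logarithm, a band containing several moderate narrow-band components can show a higher third-octave level than any single component in it.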
2.3.3 Warning and Alarm Sounds Generated by Vehicles and Traffic Control Devices

Miyazaki and Ishida [14] have studied the spectral characteristics of traffic alarm sounds commonly used in Japan. These include sounds produced by electric horns used in passenger cars, small and middle-size buses, and trucks; air horns used in large buses, heavy-duty trucks, and trailers; sirens used in emergency vehicles; horns used at railroad crossings; and traffic noises. Their observations have only limited value for us, since they neither describe the techniques used nor specify the type (short-time or long-time average) of the spectra obtained. Table 2.2 summarizes their results. They conclude that traffic alarm sounds have sharp line spectra, whereas ambient traffic noise is wide-band random noise.

Table 2.2: Summary of spectral analysis results for traffic alarm sounds [14]

Traffic Alarm Devices    Installed Vehicles                                     Major Frequency Features
Electric horn            Passenger cars, small and middle-size buses, trucks    Basic resonant frequency at 300 - 500 Hz; dominant harmonics at 2.0 - 4.0 kHz
Air horn                 Large buses, heavy-duty trucks, trailers               Dominant peaks at 300 - 500 Hz
Siren                    Emergency vehicles                                     Dominant peaks at 700 - 2000 Hz
Railroad crossing        -                                                      2 - 3 dominant peaks at 2.0 - 4.0 kHz
Ambient traffic noise    -                                                      Broadband noise below 300 Hz

In British Columbia, and typically in North America, three types of emergency vehicle siren sounds are used: the "hi/lo" sound, the "yelp" sound, and the "wail" sound.

The hi/lo sound is usually found on most ambulances. It consists of two alternating tones, with the pattern repeating about once per second. Two commonly used tone pairs are 690/920 Hz and 520/1520 Hz. The wail sound is a slowly changing tone between two preset tone frequencies.
A typical example is the wail sound used by police motorcycle sirens, with preset tone frequencies at 500 Hz and 1460 Hz, and a repetition rate of 10 cycles per minute [15]. The yelp sound is a rapidly changing tone between two preset tone frequencies. A typical example is the electronic siren produced by Southern Vehicle Products Inc., which provides a yelp sound with preset tone frequencies at 600 Hz and 1350 Hz, and a repetition rate of 3 to 5 cycles per second [16]. The yelp and wail sounds are used by both fire trucks and police cars.

2.4 The Emerging Scientific Basis for Generating Warning Sounds

While warning sounds have been used for a long time, many of these are based on subjective opinions as to what is "best". Only recently has scientific work been done to determine what sound characteristics will elicit optimal responses under varying circumstances. Such work is particularly relevant for us, since in the future warning devices may follow a more systematic approach to sound generation than has been the case until now.

2.4.1 A Generic Warning Sound Generating Scheme

According to the work of Patterson and his colleagues, a warning sound need not be excessively loud, but its amplitude must depend on the background noise level. They have demonstrated that, in order to hear sounds reliably in noise, some spectral components must be between 15 dB and 25 dB above the masked threshold [17,18]. Lower and Wheeler [19] have developed a desk-top computer program to estimate this background threshold. With the estimated background threshold, the spectral component amplitude of the warning sound can be determined. This approach has been used to study the intense background noise of military helicopters in the U.K. [20]. With regard to the frequency content of the warning sound, Patterson [17] limits it to the range between 0.5 kHz and 5.0 kHz.

Based on these spectral amplitude and frequency limits of the warning sounds, a pattern of pulsative sounds which is distinctive and resistant to undesirable noise contamination was constructed by Patterson [17,22,23]. As shown in Fig. 2.1, this prototype warning sound basically consists of a sequence of bursts, each of which is made up of a sequence of pulses. Different degrees of perceived urgency can be conveyed by simply varying the characteristics of the pulse sequences.

In Patterson's work, the pulse design starts with measurement of the ambient noise spectrum. Then, the warning signal spectrum is determined by setting all its components 15 - 25 dB above the corresponding ambient noise spectral values. In order to avoid excessive peak factors in the signal waveform, sine or cosine phase is assigned to the spectrum. The pulses are then generated by applying the Inverse Fast Fourier Transform. These pulses vary in duration from 75 msec to 200 msec in accordance with the guidelines set down by Patterson [17,23]. Also, the pulses are gated with sinusoidal ramps at both ends in order to avoid uncontrollable transients. At this stage, by varying the fundamental frequency, and the relative weight of high and low frequencies of the pulses, any degree of perceived urgency can be designed. Usually, greater urgency is signalled by higher fundamentals, and by relatively more high frequency energy.

A burst is produced by assembling three to nine copies of the basic pulse.
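The construction of a gated pulse and its assembly into a burst can be sketched as follows. This is a simplified illustration and not Patterson's procedure: instead of an inverse FFT of a designed spectrum, the pulse is formed by directly summing harmonics of a fundamental in sine phase, and all function names, parameter names and numerical values are assumptions chosen for the example.

```python
import math

def make_pulse(f0_hz, harmonic_amps, duration_s, ramp_s, fs_hz):
    """Sum harmonics of f0_hz (sine phase) and gate both ends with sinusoidal ramps."""
    n_total = int(round(duration_s * fs_hz))
    n_ramp = int(round(ramp_s * fs_hz))
    pulse = []
    for n in range(n_total):
        t = n / fs_hz
        # harmonic_amps[0] is the fundamental, harmonic_amps[1] the 2nd harmonic, ...
        s = sum(a * math.sin(2 * math.pi * (h + 1) * f0_hz * t)
                for h, a in enumerate(harmonic_amps))
        if n < n_ramp:                       # rising ramp
            gate = 0.5 * (1 - math.cos(math.pi * n / n_ramp))
        elif n >= n_total - n_ramp:          # falling ramp
            gate = 0.5 * (1 - math.cos(math.pi * (n_total - 1 - n) / n_ramp))
        else:
            gate = 1.0
        pulse.append(gate * s)
    return pulse

def make_burst(pulse, n_pulses, onset_interval_s, fs_hz):
    """Place n_pulses copies of the pulse at a fixed onset-to-onset interval."""
    step = int(round(onset_interval_s * fs_hz))
    burst = [0.0] * (step * (n_pulses - 1) + len(pulse))
    for i in range(n_pulses):
        for j, v in enumerate(pulse):
            burst[i * step + j] += v
    return burst

# A 100 msec pulse (500 Hz fundamental plus a weaker 2nd harmonic),
# repeated five times with 150 msec between pulse onsets.
pulse = make_pulse(500.0, [1.0, 0.5], 0.100, 0.010, 10000.0)
burst = make_burst(pulse, 5, 0.150, 10000.0)
```

Shortening `onset_interval_s` raises the pulse rate of the burst, and scaling successive copies of the pulse would give a loudness contour; neither refinement is included in this sketch.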
By changing the elapsed time between the start of one pulse and the start of the next, distinct pitch and temporal patterns may be created. By varying the amplitude of the pulses, different loudness patterns may be obtained. The perceived urgency is generated by changing the overall pitch, the speed and the loudness pattern of the pulses. In general, a burst with a high pulse rate will convey greater urgency than a burst with a low pulse rate. A rising pitch contour can produce a more urgent burst than a falling pitch contour. Additionally, an urgent burst will remain at, or near, the maximum loudness, while a less urgent burst will decrease in loudness towards the end of the burst.

Such bursts serve as templates from which warning sounds may be synthesized. The amplitude variations and spacing of the bursts are determined experimentally. The criterion is that the resulting warning sound should effectively convey the desired specific warning message to personnel in the vicinity without activating their startle reflex. Patterson successfully implemented this scheme on warning systems of commercial aircraft and military helicopters [17]. A slight modification of this scheme was also adopted for medical equipment used in intensive-care units and operating theatres of hospitals in the U.K. [22,23].

Figure 2.1: Auditory Warning Sound Components [17,21,23]. (Horizontal axis: time in seconds.)

Chapter 3

Measurement and Analysis of Timing & Spectral Characteristics

As we have seen in Chapter 2, the literature on warning sounds yields little useful information on their timing and short-time spectral characteristics.
Since it is the purpose of this work to apply timing and short-time spectral analysis techniques to systematically extract the unique identifying characteristics of these warning sounds in real-life environments, such information is essential for us. Specifically, the detailed knowledge of warning sound characteristics provides the basis for the exploration of different signal recognition schemes.

3.1 Timing Characteristics

The objective of this part of our work was to derive useful information on the timing of warning sounds from measurements of signal waveforms. For this purpose we used telephone rings, siren sounds, and smoke alarm sounds. Telephone rings were generated by both electro-mechanical and electronic ringers; siren sounds were produced by an electronic siren driver; and the smoke alarm sounds were obtained from a commercial smoke alarm.

3.1.1 A PC-Based Data Acquisition System

To obtain quantitative data, a PC-based data acquisition system was designed and constructed. This system accepts the instantaneous absolute amplitude waveform of the signal, and transforms it into the short-time average absolute amplitude (STAAA) waveforms. Then, the transformed waveforms are stored for plotting. The instantaneous amplitude and the short-time average variations in absolute amplitudes of the signal are given in Table 3.3, where x(n) represents the discrete instantaneous signal amplitudes, and N denotes the number of samples accumulated.
Table 3.3: Instantaneous and short-time signal amplitudes

                      signal amplitudes                     absolute signal amplitudes
instantaneous         $x(n)$                                $|x(n)|$
short-time average    $\frac{1}{N}\sum_{n=1}^{N} x(n)$      $\frac{1}{N}\sum_{n=1}^{N} |x(n)|$

The instantaneous absolute signal amplitudes are generated by hardware, while the derivation of the short-time average absolute signal amplitudes, and the storage of these derived samples, are accomplished by software.

Fig. 3.2 shows the block diagram of the method used to generate the discrete instantaneous absolute signal amplitudes. Basically, sounds are collected by a suitable microphone, pre-amplified by a low-noise voltage amplifier, and low-pass filtered prior to input to a full-wave rectifier. The output from the full-wave rectifier gives the instantaneous absolute amplitude of the waveform. Then, an 8-bit A/D converter samples this waveform at 10 kHz. The digitized sample is stored temporarily in an output buffer until the microprocessor (Intel 8088, which has an 8-bit external data bus) is ready to accept the data via a bi-directional bus. In addition, an LED bar graph is used to display the variations in the instantaneous absolute amplitudes of the signal waveforms.

Figure 3.2: Signal acquisition and derivation of instantaneous absolute signal amplitudes. (Block diagram: microphone, signal preprocessor, full-wave rectifier, A/D converter, CPU.)

In this implementation, the short-time average absolute signal amplitudes are derived from 12.8 msec accumulations of the instantaneous absolute signal amplitude samples (A/D converted data). With these instantaneous signals sampled at 10 kHz, a sample of the short-time average absolute signal amplitudes is derived from the sum of 128 consecutive instantaneous signal samples.
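The short-time average of Table 3.3 can be sketched in a few lines. This is an illustration only and not the 8088 assembly program of this work: the rectification is done here in software (in the system described above it is done by hardware before the A/D converter), and the function name, the rounded integer division and the clipping to 255 are assumptions of the sketch.

```python
def staaa(samples, frame_len=128):
    """Short-time average absolute amplitude, one output value per frame.

    samples   : sequence of integer signal amplitudes
    frame_len : samples per frame (128 samples = 12.8 msec at 10 kHz)
    """
    contour = []
    # Only complete frames are processed; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        acc = 0
        for x in samples[start:start + frame_len]:
            acc += abs(x)                      # full-wave rectification
        # Rounded division by the frame length, limited to the 8-bit range.
        contour.append(min(255, (acc + frame_len // 2) // frame_len))
    return contour
```

With 8-bit inputs the largest possible frame sum is 128 x 255 = 32640, so the running sum always fits in 16 bits.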
In order to avoid overflow during the accumulation process, a 16-bit register is used to accumulate the 128-sample sum. A sample of the short-time average absolute signal amplitudes is then obtained by dividing the 16-bit register content by the total number of accumulated samples (i.e., 128 in this case). The resulting quotient is rounded to eight bits to provide the short-time average absolute signal amplitude sample, which is transferred to a designated file. This file stores 1000 bytes. These data manipulation and transfer procedures are repeated until the data file is completely filled with 1000 samples (equivalent to 12.8 sec of the signal waveform). The program to handle this data manipulation and transfer in real time was written in Intel 8088/8086 assembly language. A flowchart of these operations is shown in Fig. 3.3.

3.1.2 Data Collection

With this data acquisition system, we collected data on the absolute amplitudes of warning sounds in the normal acoustic environment of our laboratory. Fig. 3.4 shows the experimental set-up. The siren horn produced siren sounds, and a radio cassette player provided the pre-recorded telephone rings and smoke alarm sounds. A sound pressure level (SPL) meter placed beside the microphone measured the SPL variations of the environment throughout the data collection process. The SPL meter was set to "C" weighting and "SLOW" response, because the "C" weighting network of the SPL meter has a flat frequency response similar to that of the signal processing circuit of the data acquisition system, and the "SLOW" response provides an average over 1.0 sec of the acoustic energy variations of the environment.
Based on the SPL measurements in the absence and during the presence of warning sounds, the signal-to-noise ratio (SNR) could be deduced. SNR, in this work, is defined as the ratio of peak signal power to peak noise power. Noises, in this thesis, are defined as all sounds other than warning sounds. Such unwanted sounds may include steady and transient random noises, radio broadcasts, or surrounding conversations. A detailed derivation of the relationship between the SNR and SPL measurements is given in Appendix A.

Data on absolute amplitudes of warning sounds were collected in two different background environments. The first set of data was collected in a steady random noise background which originated from the ventilation fan of a PC. Such noise is typical for office environments. A value of 60-62 dBC was recorded throughout the data collection process. This set of data is referred to as "clean", because the SNR was maintained at 20 dB or higher. To study the effect of more complex background noises on warning sounds, the second set of data was collected at a SNR of 10 dB. The background noise sources consisted of steady random noise, radio music broadcast, and speech.

Figure 3.3: Flowchart of procedure to accumulate and store 1000 samples. (Flowchart steps: read the end-of-conversion (EOC) status from the A/D converter, accumulate each converted sample, divide the accumulated sum, and store the result to a specified file.)

Figure 3.4: Experimental set-up for data collection. (Sound sources include a cassette player and radio.)
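One common way to deduce an SNR from two SPL readings is sketched below. It is an assumption of this sketch, and not necessarily the relationship derived in Appendix A, that signal and noise add as uncorrelated powers, so that the signal power is the difference between the power measured with the warning sound present and the power measured with noise alone. The function and argument names are hypothetical.

```python
import math

def snr_from_spl(spl_with_signal_db, spl_noise_only_db):
    """Estimate SNR in dB from SPL readings taken with and without the warning sound.

    Assumes the two readings use the same weighting and response settings and
    that signal and noise powers add (uncorrelated sources).
    """
    total_power = 10 ** (spl_with_signal_db / 10)   # signal + noise, relative units
    noise_power = 10 ** (spl_noise_only_db / 10)    # noise alone, relative units
    if total_power <= noise_power:
        raise ValueError("reading with signal must exceed the noise-only reading")
    return 10 * math.log10((total_power - noise_power) / noise_power)
```

Under this assumption, a reading of 80 dBC with the warning sound present against a 60 dBC background corresponds to an SNR of just under 20 dB, while a reading only 3 dB above the background corresponds to an SNR of about 0 dB.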
To establish the short-time average absolute amplitude profiles of the various noise sounds (without warning sounds present), a third set of data was also collected. This included all the noise sources used above, and the noise SPL was the same as that used in the SNR measurements.

3.1.3 Timing Features of Different Warning Sounds

The plots of the first set of data are shown in Fig. 3.5, Fig. 3.6, Fig. 3.7 and Fig. 3.8. Since the purpose of these measurements is to establish the time variations of the short-time average absolute signal amplitudes, the actual value of these amplitudes is of no particular interest. Therefore, the vertical axes show a relative scale without units. The following observations may be drawn from these figures:

1. Fig. 3.5, Fig. 3.6(b) & (d) (siren sounds), and Fig. 3.7(a) & (b) (telephone rings) show on-off type repetitive patterns of warning signal bursts; Fig. 3.6(a) & (c) (siren sounds), and Fig. 3.7(c) (smoke alarm sound) display steady sounds;
2. Fig. 3.5(a) and (b) show devices which produce sounds with very similar temporal structures, but with different repetition rates;
3. Fig. 3.5(d) is a two-tone siren sound, and its amplitude contour can be characterized by i) a transition from background level amplitudes, and ii) a repetitive on-off pattern representing two tones of different intensities (for other siren sounds or telephone rings, the off-patterns represent the background noise levels);
4. The width of the bursts of these waveforms varies from 102.4 msec to 3.24 sec;
5. The repetition period of on-off patterns ranges from 140 msec to 5.86 sec;
6. Steady sounds are characterized by a signal level transition to a higher steady amplitude level; and
7. Contours of the short-time average absolute signal amplitude of radio broadcasts (Fig. 3.8) consist of random, nonrepetitive sequences of signal bursts.
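Burst widths and repetition periods of the kind listed above can be read off an STAAA contour by simple thresholding. The following is a minimal sketch under stated assumptions: the threshold value is chosen by hand, each contour sample spans 12.8 msec, and the function name is hypothetical. It is not the measurement procedure used to obtain the figures quoted in this section.

```python
FRAME_MS = 12.8  # duration represented by one STAAA sample

def burst_timing(contour, threshold):
    """Return (widths_ms, periods_ms) for the on-off bursts in an STAAA contour.

    A burst starts when the contour reaches the threshold and ends when it
    drops below it. A burst still in progress at the end of the record is ignored.
    """
    onsets, widths_ms = [], []
    start = None
    for i, v in enumerate(contour):
        if v >= threshold and start is None:
            start = i                                   # burst onset
        elif v < threshold and start is not None:
            onsets.append(start)
            widths_ms.append((i - start) * FRAME_MS)    # burst width
            start = None
    # Repetition period: time between consecutive burst onsets.
    periods_ms = [(b - a) * FRAME_MS for a, b in zip(onsets, onsets[1:])]
    return widths_ms, periods_ms

# Synthetic contour: three bursts of 8 frames separated by 4 quiet frames.
widths, periods = burst_timing(([0] * 4 + [200] * 8) * 3 + [0], threshold=100)
```

In this synthetic example each burst is 8 frames (102.4 msec) wide and the onsets are 12 frames (153.6 msec) apart. With real contours the result depends strongly on the threshold, which has to sit above the background level.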
The plots of the second set of data are shown in Fig. 3.9 and Fig. 3.10. Comparative examination of these plots yields the following observations:

1. For short-burst signals, such as (a) and (b) in Fig. 3.9, and (d) in Fig. 3.10, the introduction of radio broadcast background alters the baseline levels, and smooths out the weak peaks of the "clean" signals; however, it produces no significant change in relative timing between consecutive amplitude peaks of the waveforms;
2. For signals with long silence intervals (> 400 msec), such as (c) and (d) in Fig. 3.9, and (b) in Fig. 3.10, spurious small peaks appear randomly during these intervals; and
3. The repetition rate of the on-off patterns of burst-type sounds is unchanged by variations in background noise.

In summary, we can conclude from these measurements that the short-time average absolute amplitude contours provide unique timing information on both steady and burst-type sounds.

Figure 3.5: Short-time average absolute amplitudes (STAAA) of siren sounds: a) J1: Burglar alarm (JDS-100); b) J2: MPI-11; c) J3: JDS-100 I; and d) J4: HI-LO. (Horizontal axes: time in sec.)

Figure 3.6: Short-time average absolute amplitudes (STAAA) of siren sounds: a) J5: High steady sound; b) J6: Pulser; c) J7: Steady horn; and d) J8: Electronic Synthesized Bell sound. (Horizontal axes: time in sec.)
Figure 3.7: Short-time average absolute amplitudes (STAAA) of telephone rings and smoke alarm sound: a) Electro-mechanical Ringer; b) Electronic Ringer; and c) Smoke alarm sound.

Figure 3.8: Short-time average absolute amplitudes (STAAA) of radio broadcasts: a) Pop music; b) Speech; and c) Rock music.

Figure 3.9: Short-time average absolute amplitudes (STAAA) of siren sounds with radio broadcast as background: a) J1; b) J2; c) J3; and d) J4.

Figure 3.10: Short-time average absolute amplitudes (STAAA) of different siren sounds with the same background noise: a) J5; b) J6; c) J7; and d) J8. (Horizontal axes: time in sec.)

3.2 Spectral Characteristics

Based on the assumption that the short data records deduced from the observed time sequences are ergodic, and that their estimated spectra are slowly time-varying, spectral estimation techniques provide an insight into the frequency contents carried by the observed time sequences. Generally, spectral estimation methods use either the parametric or the nonparametric approach. A detailed exposition of many different algorithms used for obtaining waveform spectra was given by Kay and Marple [24].

In general, parametric spectral analysis involves three steps. The first step is to select a time series model, with assumed model order, for the observed data record.
Time series models such as the autoregressive (AR) model, the moving-average (MA) model, or the autoregressive moving-average (ARMA) model are the most common choices for practical applications. For example, linear prediction coding (LPC), or the AR model with a model order of 10-16, has been proven to be a very suitable choice for speech analysis and synthesis [25,26]. The second step is to estimate the model parameters using the available data samples [24]. Depending on the specific time series model selected, different algorithms may be applied for such parameter extraction. The third step is to compute the estimated spectra by substituting the specific parameter values derived in the second step into the theoretical power spectral density function of the model used.

The nonparametric spectral estimation approach assumes that the observed data record is produced from a set of sinusoidal components governed by the Fourier series model of signals. Two popular and conventional spectral estimation techniques are the Blackman-Tukey [27] and Welch's periodogram [28] methods. Both of these techniques employ the computationally efficient Fast Fourier Transform (FFT). A new, unified, FFT-based spectral estimation method, capable of producing more statistically stable spectra with better frequency resolution than the conventional methods, has been proposed by Nuttall and Carter [29].

3.2.1  Comparison of Parametric and Nonparametric Spectral Estimation Methods

With relatively short data sequences recorded under high signal-to-noise ratio (SNR) conditions, the parametric technique can produce smoother spectra with finer frequency resolution. Unfortunately, the parametric spectral estimation approach is susceptible to noise interference.
Such degradation in performance of the AR model has been extensively investigated by Lim [30] and Kay [31].

The nonparametric spectral estimation approach is implemented in practice by the Discrete Fourier Transform (DFT). Since the DFT considers every data sequence to be periodic, such periodic extensions of the original data sequence exhibit discontinuities at the boundaries of the observed time interval. In the subsequent numerical analysis, these boundary discontinuities result in spectral leakage over the entire frequency spectrum. Harris [32] discussed the application of various windows with nonuniform weighting to reduce this spectral leakage. This can be accomplished only at the expense of frequency resolution in the spectrum. Finally, to obtain a statistically stable spectrum, averaging of short-time spectra is definitely required [28].

In general, the frequency resolution of spectra obtained by the nonparametric spectral estimation approach is limited by the data duration, and is independent of the SNR of the signals. Theoretically, the frequency resolution of spectra is inversely proportional to the duration of the original data sequence. Since zero-padding of the data sequence before transformation appears to increase the signal duration, it is a common misconception that such a zero-padding procedure will improve the frequency resolution of the resultant spectra. As demonstrated in [24], zero padding is useful only for 1) smoothing the appearance of the resultant spectra via interpolation, 2) resolving potential ambiguities of computed spectra, and 3) reducing the "quantization" error in the accuracy of estimating the frequencies of spectral peaks.
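Points 1) and 3) can be checked numerically with a small sketch. It uses a naive DFT written with the Python standard library; the 16-sample test tone placed between bins 3 and 4 and the padding factor of four are hypothetical values chosen only for illustration.

```python
import cmath
import math


def dft(x, n_points=None):
    """Naive DFT of x, zero-padded to n_points when n_points is given."""
    n = len(x) if n_points is None else n_points
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(padded[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


N = 16
# Hypothetical tone whose frequency lies between bins 3 and 4 (at "bin 3.4").
x = [math.cos(2 * math.pi * 3.4 * m / N) for m in range(N)]

X = dft(x)                # 16-point DFT of the raw record
X_padded = dft(x, 4 * N)  # same record, zero-padded to 64 points

# Interpolation only: every fourth padded sample is an original DFT sample,
# so padding adds no information that the 16 samples did not already carry.
max_diff = max(abs(X_padded[4 * k] - X[k]) for k in range(N))

# Peak location on the coarse and on the interpolated frequency grid
# (expressed in units of the original bin spacing).
peak_coarse = max(range(N // 2), key=lambda k: abs(X[k]))
peak_fine = max(range(2 * N), key=lambda k: abs(X_padded[k])) / 4
```

The padded spectrum passes through the original DFT samples unchanged, and its peak lies closer to the true tone frequency only because the frequency grid is finer. The width of the spectral main lobe, and hence the ability to separate two closely spaced tones, is still set by the 16-sample record.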
It is common procedure to apply windowing prior to the zero-padding of the data sequence.

3.2.2  Welch's Non-overlapping Spectral Estimation Method

For this work, we selected the conventional Welch's non-overlapping spectral estimation approach to investigate different warning sounds. The rationale behind this choice has four aspects. First, most warning sounds maintain a regular rhythm, and continuous, long data records can be obtained. This allows spectral averaging, and results in the statistical stability of the computed spectra. Secondly, Welch's spectral estimation technique is robust with respect to noise corruption of the signals, because the frequency resolution and the stability of the computed spectra are independent of the SNR. Thirdly, no a priori knowledge of a signal model for the various warning sounds is needed. Finally, limitations inherent in Welch's spectral estimation method have been thoroughly studied, and techniques used to reduce discrepancies have been well explored [32].

Welch's non-overlapped spectral estimation technique may be described in four steps:

1. Consider a data sequence x(n) of length N, where n ∈ [0, N − 1], and divide it into K non-overlapped segments, each of which has an integral length M = N/K and is denoted as x_k(m), where m ∈ [0, M − 1] and k ∈ [0, K − 1].

2. Select a "DFT-even" window sequence w(m) (see footnote 1), with length identical to x_k(m), and multiply this window sequence onto x_k(m), giving the windowed segment as follows,

\[ \tilde{x}_k(m) = x_k(m)\, w(m) \tag{3.1} \]

3. Take the squared magnitude of the DFT of the windowed sequence to obtain the k-th segment discrete Fourier spectrum (often called the modified periodogram), denoted as S_k(l),

\[ S_k(l) = \frac{1}{MU} \left| \sum_{m=0}^{M-1} x_k(m)\, w(m)\, e^{-j 2\pi l m / M} \right|^2 = \frac{1}{MU} \left| \sum_{m=0}^{M-1} \tilde{x}_k(m)\, e^{-j 2\pi l m / M} \right|^2 \tag{3.2} \]

where U is the window average power given by,

\[ U = \frac{1}{M} \sum_{m=0}^{M-1} w^2(m) \tag{3.3} \]

4. Compute S_k(l) for k ∈ [0, K − 1], and obtain the average spectrum,

\[ S_{avg}(l) = \frac{1}{K} \sum_{k=0}^{K-1} S_k(l) \tag{3.4} \]

Footnote 1: A DFT-even window is a conventional even window sequence with the right-end point missing [32].

Welch demonstrated that the variance of spectral estimates can be reduced by dividing a long original data sequence into finer segments. However, he also cautioned that the statistical bias generated by the estimation process increases linearly with increasing number of segments [28]. Therefore, the trade-off between the size of data segments and the amount of spectral variance reduction is to be determined by the user.

3.2.3  Implementation of Welch's Method

Since the DFT can accept complex input quantities, we may make use of this feature to establish an efficient scheme for the computation of the average spectrum from two real data sequences. Such a scheme is implemented by the use of the FFT algorithm, and involves only a single pass of the FFT computation. The calculations are summarized as follows.

The first step is to substitute two non-overlapped real data segments for the real and imaginary parts of a complex input data sequence. Then, we take the DFT of this complex sequence, and after further calculations we can obtain the average spectra of the two non-overlapped data segments.
The detailed mathematical derivations are given in [33], with the major steps summarized below:

1. Let g(m) be a complex input data sequence whose real and imaginary parts are the two non-overlapped real data segments x_1(m) and x_2(m). Then g(m) can be expressed by,

\[ g(m) = x_1(m) + j\, x_2(m) \tag{3.5} \]

where m ∈ [0, M − 1].

2. The DFT of the windowed g(m), which is denoted as G(k), is expressed by,

\[ G(k) = \sum_{m=0}^{M-1} g(m)\, w(m)\, e^{-j 2\pi k m / M} = G_R(k) + j\, G_I(k) \tag{3.6} \]

where
G_R(k) = real part of G(k)
G_I(k) = imaginary part of G(k)
w(m) = "DFT-even" window sequence
k ∈ [0, M − 1]

3. Given two real data sequences, x_1(m) and x_2(m), and a DFT-even window sequence, w(m), for m ∈ [0, M − 1], the DFTs of these windowed data sequences, denoted as X_1(k) and X_2(k) respectively, can be represented by their real and imaginary parts as given below:

\[ X_1(k) = X_{1R}(k) + j\, X_{1I}(k) \tag{3.7} \]
\[ X_2(k) = X_{2R}(k) + j\, X_{2I}(k) \tag{3.8} \]

where
X_{1R}(k) = real part of the DFT of x_1(m)
X_{1I}(k) = imaginary part of the DFT of x_1(m)
X_{2R}(k) = real part of the DFT of x_2(m)
X_{2I}(k) = imaginary part of the DFT of x_2(m)
k ∈ [0, M − 1]

Because x_1(m) and x_2(m) are real, it can be shown that,

\[ X_{1R}(M-k) = X_{1R}(k) \tag{3.9} \]
\[ X_{2R}(M-k) = X_{2R}(k) \tag{3.10} \]
\[ X_{1I}(M-k) = -X_{1I}(k) \tag{3.11} \]
\[ X_{2I}(M-k) = -X_{2I}(k) \tag{3.12} \]

Using expression (3.5) for g(m) in Eq. (3.6), we can express G_R(k) and G_I(k) in terms of the real and imaginary parts of X_1(k) and X_2(k):

\[ G_R(k) = X_{1R}(k) - X_{2I}(k) \tag{3.13} \]
\[ G_I(k) = X_{1I}(k) + X_{2R}(k) \tag{3.14} \]

If we substitute (M − k) for k in Eq. (3.13-3.14) and utilize the results obtained from Eq. (3.9-3.12), we obtain,

\[ G_R(M-k) = X_{1R}(k) + X_{2I}(k) \tag{3.15} \]
\[ G_I(M-k) = -X_{1I}(k) + X_{2R}(k) \tag{3.16} \]

4. The average spectrum, P_avg(k), for x_1(n) and x_2(n) is given by,

\[ P_{avg}(k) = \frac{1}{2MU} \left\{ |X_1(k)|^2 + |X_2(k)|^2 \right\} \tag{3.17} \]

where

\[ U = \frac{1}{M} \left\{ \sum_{m=0}^{M-1} |w(m)|^2 \right\} \tag{3.18} \]

Therefore, by making use of the results obtained from Eq. (3.13-3.16) to solve for X_{1R}(k), X_{1I}(k), X_{2R}(k), and X_{2I}(k) in terms of G_R(k), G_R(M − k), G_I(k), and G_I(M − k), we can subsequently derive P_avg(k) from the real and imaginary parts of G(k). Thus, we can show that,

\[ P_{avg}(k) = \frac{1}{4MU} \left\{ G_R^2(k) + G_R^2(M-k) + G_I^2(k) + G_I^2(M-k) \right\} \tag{3.19} \]

In this work, warning signals were sampled at a rate of 20 kHz with 12 bit resolution. The non-overlapped data segment length was a multiple of 12.8 msec, or of 256 data samples. To reduce spectral leakage, each non-overlapped data segment was multiplied by the minimum 4-sample Blackman-Harris window (Fig. 3.11), which has a highest sidelobe level of −92 dB, a sidelobe fall-off rate of −6 dB/octave, and an equivalent noise bandwidth of two frequency bins (see footnote 2) [32]. The actual spectral calculations were performed on a VAX 750 general computer. The flowchart of the program is given in Fig. 3.12.

Footnote 2: A bin is a basis frequency for a spectrum and is derived from the ratio of the signal sampling frequency to the total number of data points used in the spectrum.

Figure 3.11: Spectrogram of the minimum 4-sample Blackman-Harris window, where PSD denotes power spectral density

Figure 3.12: Flowchart of the spectral analysis program. The steps are, in order: read data in RX02 format from magnetic tape; unscramble data to ASCII format; multiply data sequence by DFT-even window; form Z = X + iY (X = sequence 1, Y = sequence 2, Z = complex sequence); compute DFT of Z by using FFT; unscramble FFT output to obtain averaged spectrum using eqt. (12).

3.2.4  Data Collection

In order to explore the variations of warning sound spectral characteristics, the sounds emitted by 1) electromechanical ringers of five rotary dial phones, 2) a multiple-line push-button telephone, 3) an electronic ringer of a touch-tone telephone, and 4) an electronic siren driver (used in timing feature measurement) were used. These sounds were recorded on a tape recorder in various noisy ambient environments in order to investigate the effects of background noises on warning sound spectra.

The recorded warning sounds were fed to an A/D conversion system, and the digitized samples were stored on magnetic tape for further processing. To suppress the aliasing effect of the sampling process, a Krohn-Hite electronic filter was used to remove the spectral components of the analog signals beyond the 10 kHz frequency bandwidth. Then, the filtered signal was fed to a 12-bit MINC/DECC AB-23 A/D converter with selectable data sampling frequency under the master control of a PDP-11 computer. In our work, the sampling frequency was set to 20 kHz. Subsequently, each 6.5 seconds of the digitized sound record was transferred from the PDP-11 computer to a VAX-750 general computer for spectral analysis.

3.2.5  Spectra of Warning Sounds Generated by Various Warning Devices

Unless otherwise stated, most of the short-time spectra were obtained by averaging the spectra of four consecutive 25.6 msec segments. We assume that within this 102.4 msec the signals are slowly-varying, and that the average spectrum provides a statistically stable representation of the frequency content of the signals.
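The averaging of four consecutive 25.6 msec segments, together with the scheme of Section 3.2.3 that obtains the spectra of two real segments from one complex FFT, can be sketched as follows. This is only an illustrative reconstruction in plain Python, not the VAX program used in this work. The two-tone test signal is hypothetical, and the function names are assumptions. The window coefficients are those tabulated by Harris [32] for the minimum 4-sample Blackman-Harris window, evaluated in DFT-even (periodic) form.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def blackman_harris_min4(M):
    """Minimum 4-sample Blackman-Harris window, DFT-even form."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    return [
        a0 - a1 * math.cos(2 * math.pi * m / M)
        + a2 * math.cos(4 * math.pi * m / M)
        - a3 * math.cos(6 * math.pi * m / M)
        for m in range(M)
    ]


def modified_periodogram(x, w):
    """One real segment per FFT: the S_k(l) of the modified periodogram."""
    M = len(x)
    U = sum(v * v for v in w) / M
    X = fft([xi * wi for xi, wi in zip(x, w)])
    return [abs(c) ** 2 / (M * U) for c in X]


def packed_average(x1, x2, w):
    """Two real segments per FFT: average spectrum from G(k) and G(M - k)."""
    M = len(x1)
    U = sum(v * v for v in w) / M
    G = fft([complex(p, q) * wi for p, q, wi in zip(x1, x2, w)])
    # |G|^2 = G_R^2 + G_I^2, so this is the four-term sum divided by 4MU.
    return [
        (abs(G[k]) ** 2 + abs(G[(M - k) % M]) ** 2) / (4 * M * U)
        for k in range(M)
    ]


fs = 20000.0   # sampling rate in Hz
M = 512        # 25.6 msec at 20 kHz
# Hypothetical test signal: tones at 1600 Hz and 5900 Hz.
signal = [
    math.sin(2 * math.pi * 1600.0 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 5900.0 * n / fs)
    for n in range(4 * M)
]
w = blackman_harris_min4(M)
segs = [signal[i * M:(i + 1) * M] for i in range(4)]

# Reference: four separate FFTs, then average.
direct = [sum(col) / 4 for col in zip(*(modified_periodogram(s, w) for s in segs))]
# Packed scheme: two FFTs cover the same four segments.
p01 = packed_average(segs[0], segs[1], w)
p23 = packed_average(segs[2], segs[3], w)
packed = [(a + b) / 2 for a, b in zip(p01, p23)]

max_err = max(abs(a - b) for a, b in zip(direct, packed))
peak_hz = max(range(M // 2), key=lambda k: packed[k]) * fs / M
```

Both routes give the same averaged spectrum to within rounding error, while the packed route needs half as many FFT passes. The strongest peak falls in the bin nearest 1600 Hz.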
3.2.5.1  Spectra of Telephone Rings Generated by Electro-mechanical Ringers

Although frequency specifications on telephone rings are provided by various standards associations, the acceptable variations of the short-time spectra of telephone rings have not been published. In addition, there is no information on the effect on spectral variations of the different loudness level adjustments that can be made on electromechanical ringers equipped with loudness controls. Similarly, there is no mention in the standards (or in the literature) of the effect of the pitch setting of electronic ringers on the spectra of emitted sounds. The measurements reported here were made to obtain this missing information. Five different aspects were examined:

Short-time averaged spectra of an electro-mechanical ringer

Fig. 3.13 gives a typical example of short-time spectra of telephone rings with the loudness level set to one. (The loudness adjustment control is found at the bottom panel of some rotary dial telephones.) These rings were recorded in an ordinary office environment. Two regions of spectra are identified: the transient and the steady-state regions. During the first 600 msec of the ringing period (transient), these short-time spectra are very similar, and are rich in harmonic content (dominated by three to five major spectral peaks in the 10 kHz frequency bandwidth). Following this is the steady state of the ringing period, with only two or three dominant peaks retained.

Long-time averaged spectra of an electro-mechanical ringer at seven different loudness levels

The next two figures show how telephone ring spectra vary with respect to changes in loudness level adjustments. The same telephone was used as in the previous measurement.
These spectra were obtained by averaging 256 record segments, each 25.6 msec long (6.55 sec in total). Fig. 3.14 (a) shows that two major peaks always occur in the spectra at each of the seven loudness settings. However, for another rotary dial telephone, Fig. 3.14 (b) shows the dramatic changes in spectral characteristics when the loudness adjustment is altered from level two to three. The disappearance of these dominant peaks is caused by some change in the internal ringing mechanism. These figures clearly illustrate the unpredictability of the effect of varying loudness settings on telephone ring spectral characteristics.

Long-time averaged spectra of five electro-mechanical ringers

Spectra from five rotary dial phones of the same model were used in this measurement. To provide a general view of their spectral variations, Fig. 3.15 gives an example of long-time averaged spectra of five electromechanical ringers with a preset loudness level. In Fig. 3.15, the dominant spectral peaks produced by phone samples 1, 2 and 3 do not appear in the spectra generated by phone samples 4 and 5. This indicates that phone rings generated by telephones of the same model do not necessarily produce similar spectral characteristics.

Short-time averaged spectra of a multiple-line telephone

Fig. 3.16 depicts another set of short-time spectra, for a multiple-line push-button telephone. Since this telephone is not equipped with a loudness adjustment control, our study on the effect of varying loudness adjustment on short-time spectra was not performed. Compared to the other telephone ring spectra, the spectra in Fig. 3.16 have peaks at different frequency locations: 1.6 kHz, 3.2 kHz, 5.9 kHz and 9.2 kHz.
Short-time spectra of an electro-mechanical ringer in steady noise background

To demonstrate how steady fan noise affected the short-time spectra of telephone rings, we used the same phone as in the first two measurements. These telephone rings were recorded inside a computer room where an air-ventilation system was operating. Compared to Fig. 3.13, in Fig. 3.17 the amplitudes of the dominant peaks decreased, the number of dominant peaks was reduced, and the transient regions of the spectra have largely disappeared. This may be caused by the spectral flattening effect of the background noise. However, two of the dominant peaks of successive spectra are still retained.

Conclusions

Spectra of telephone rings produced by electro-mechanical ringers exhibit a) two distinct regions (transient and steady) of short-time spectra, and b) spectral peaks that are always located in the 1.6 - 2.5 kHz and 4.7 - 6.2 kHz bands. Details of the spectral characteristics vary with loudness, with the model, and with individual units of the same model.

In general, it is difficult to predict the spectral distortion caused by background noise because such distortion is highly dependent on the characteristics of the noise. Such characteristics vary in both time and space. Since real environmental noise situations are very variable, there is very little practical value in further study of the effect of noise on the spectra.

Figure 3.13: Short-time spectra of an electromechanical ringer. Horizontal axis: FREQUENCY IN kHz.
Figure 3.14 (b): Spectra of another electromechanical ringer with seven loudness settings

Figure 3.16: Short-time averaged spectra of a multiple-line telephone. Horizontal axis: FREQUENCY IN kHz.

Figure 3.17. Horizontal axis: FREQUENCY IN kHz.

3.2.5.2  Spectra of Telephone Rings Generated by an Electronic Ringer

Since solid state transducers are manufactured to close tolerance, and the control circuits generate very consistent tone frequencies, electronic ringers of the same type will produce sounds with very similar features. In addition, the different types all conform to applicable standards. Therefore, only one electronic ringer unit was examined in detail. Since the telephone we examined was equipped with pitch adjustment controls, the effects of different pitch settings on the spectra were also studied.

Each of the short-time spectra was obtained by averaging two consecutive 102.4 msec long spectra. The reason for selecting 102.4 msec segments was to provide a frequency resolution of 19.6 Hz for the separation of the two dominant tones generated by the electronic tone ringer. Fig. 3.18 (a), (b), (c), and (d) show that the change of pitch setting results in more high energy peaks appearing. Although it is difficult to see in these plots, the pitch setting also results in the shifting of the dominant lowest frequency peaks.
The tabulated numbers indicate that for this particular electronic ringer, one tone frequency varies from 468 Hz to 546 Hz, and the other varies from 546 Hz to 683 Hz.

Figure 3.18 (a): Short-time spectra of electronic rings with pitch set at position one

Figure 3.18 (b): Short-time spectra of electronic rings with pitch set at position two

Figure 3.18 (c): Short-time spectra of electronic rings with pitch set at position three

Figure 3.18 (d): Short-time spectra of electronic rings with pitch set at position four. Horizontal axes: FREQUENCY IN kHz.

3.2.5.3  Spectra of Siren Sounds

Lastly, we studied the spectral characteristics of different warning siren sounds produced by an electronic siren driver. Eight different siren sounds can be produced with this device. In all of these spectra, note that the peaks located at 7.0 kHz are produced by ambient noise, which was monitored independently with the sound pressure level meter.

Rapid-Yelp

The short-time spectra of this sound consist of a band of frequencies varying from 1400 Hz to 3000 Hz (Fig. 3.19).

Conventional Yelp

Fig. 3.20 shows the variation of the short-time spectra of this sound, which consist of a varying band of frequencies ranging from 666 Hz to 1333 Hz.

Low-high Sweep

Fig. 3.21 shows a very interesting 'chirp-signal' type of short-time spectra. The spectra consist of peaks varying from 820 Hz to 4.0 kHz.

European Hi-low

Fig. 3.22 shows spectra which consist of a fundamental spectral component at 1093 Hz, along with its harmonics at 1640 Hz and 3164 Hz.

Hi-frequency Steady

Fig. 3.23 gives the spectra, which consist of a fundamental spectral component at 833 Hz, together with its harmonics at 1640 Hz and 3200 Hz.

Pulsating Siren

Fig. 3.24 gives the spectra of a 'Pulsating Horn' siren sound, which consist of a poorly defined peak at 1600 Hz and a distinct peak at 2400 Hz.

Steady Horn

Fig. 3.25 shows spectra which consist of two major bands of frequencies at 500 - 700 Hz and 1200 - 1400 Hz.

Electronic Synthesized Bell

Fig. 3.26 shows the spectra of a bell sound, which consist of four peaks at 700 Hz, 1406 Hz, 2070 Hz, and 2812 Hz.

3.2.6  Summary

Summing up the spectral analysis results, we reached the following conclusions [34]:

• dominant spectral features of warning signals generally appear within the frequency range from 300 Hz to 5.0 kHz,

• warning signal spectra may consist of a single spectral peak, or regular clusters of spectral peaks and valleys, and

• in general, the spectral features of warning signals are simpler than those of speech signals with regard to:

1. absence of nonstationary segments of the short-time spectrum (while an isolated speech utterance may consist of nonstationary short-time spectra caused by weak fricatives at the utterance boundaries), and

2. repeatability of spectral features of warning sounds (while, due to the variable utterance rate of a word, nonlinear time distortion in spectral features occurs in speech).
Figure 3.21: Spectra of Low-Hi sweep sound

Figure 3.22: Spectra of European Hi-Low sound

Figure 3.23: Spectra of Hi-Frequency Steady sound

Figure 3.24: Spectra of Pulsating Horn sound

Figure 3.25: Spectra of Steady Horn sound

Figure 3.26: Spectra of Electronic Synthesized Bell sound. Horizontal axes: FREQUENCY IN kHz.

Chapter 4

Solutions to the Recognition Problem

4.1  Pattern-Recognition Model for Signal Identification

The classic pattern-recognition scheme for signal identification is shown in Fig. 4.27. This scheme consists of feature extraction, pattern matching (similarity tests), and decision making blocks.
It forms the basis of many applications, because it places no restrictions on the use of different feature sets, similarity algorithms, and decision rules, and it is possible to implement it in a wide range of circumstances [37].

The function of the feature extraction stage is to convert the signal into parameters or feature sets. This results in the reduction, and sometimes elimination, of redundancies that exist in the original signal. Such signal reduction procedures provide a manageable number of signal features, making practical machine recognition feasible. Extractable signal features include timing information, short-time spectra, Linear Prediction Coding (LPC) parameters, LPC-derived cepstral coefficients, or statistical parameters derived from the Hidden Markov Model (HMM).

For pattern comparison, the signal features must be either known a priori, or the system must "learn" them. Such learning may be accomplished by training the system with the signal(s). This involves the extraction of features, and their subsequent storage in template memory. The signal feature sets are obtained from consecutive short-time segments of the signals. To recognize a specific signal, the features of the unknown signal are compared with the different sets of pre-stored reference signal features.

Figure 4.27: Classic Signal Recognition Scheme [37,38]. Blocks shown: unknown signal, feature extraction, pattern comparison (fed by the template memory), distance score, decision rule, and recognized signal.

The matching of the unknown signal features to the templates is generally complicated by the non-linear time mis-alignment of the short-time feature segments of the unknown signal and of the reference templates.
To solve this matching problem, the well-known dynamic time warping (DTW) algorithm is employed [39]. Based on this algorithm, for each reference template an optimum match between the unknown signal and the reference features is sought. In these pattern comparisons, distance calculations are performed on the short-time segments of signal feature sets in order to provide a measure of similarity between the unknown signal and the reference templates. The literature offers several distance measures [40,41,42].

One of two decision rules is used in most practical systems: the nearest neighbor rule (NN rule), or the K-nearest neighbor rule (KNN rule). The NN rule is applied when there is a unique reference template for each possible signal. In comparing an unknown signal with the reference templates, the pre-stored template which has the smallest distance from the signal is recognized to be the unknown signal. The KNN rule is applied when multiple reference templates are learned from each possible signal, giving several template sets representing different signals. The unknown signal is associated with the template set for which the minimum average distance is computed.

4.2  Review & Evaluation of Signal Recognition Techniques

In the following sections, an overview of previous research in signal recognition is presented. Emphasis is placed on speech signal (isolated utterance) recognition techniques, because 1) these recognition schemes fit well to the pattern-recognition model, 2) the recognition performance of each applicable technique has been reported [35,36], and 3) warning sounds have acoustic features (i.e., pitch and formant) similar to speech signals.
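Before turning to the individual techniques, the template-matching scheme of Section 4.1 can be made concrete with a short sketch. It assumes scalar features per short-time segment, an absolute-difference local distance, and the basic DTW recursion with unit steps, which is not necessarily the variant of [39]. The template values and the function names are hypothetical. The second decision function follows the description given above for the KNN rule (minimum average distance over a template set).

```python
def dtw_distance(a, b, dist=lambda u, v: abs(u - v)):
    """Accumulated local distance along the best monotonic alignment
    of feature sequences a and b (basic DTW, unit step pattern)."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]
            )
    return D[n][m]


def nn_rule(unknown, templates):
    """One template per signal: pick the label with the smallest distance."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))


def knn_rule(unknown, template_sets):
    """Several templates per signal: pick the label whose template set
    has the smallest average distance to the unknown."""
    def average_distance(label):
        ds = [dtw_distance(unknown, t) for t in template_sets[label]]
        return sum(ds) / len(ds)
    return min(template_sets, key=average_distance)


# Hypothetical feature contours (e.g. coarse amplitude levels per segment).
templates = {
    "ring": [0, 1, 1, 0, 0, 1, 1, 0],
    "siren": [0, 1, 2, 3, 2, 1, 0],
}
template_sets = {
    "ring": [[0, 1, 1, 0, 0, 1, 1, 0], [0, 1, 0, 0, 1, 0]],
    "siren": [[0, 1, 2, 3, 2, 1, 0], [0, 2, 3, 2, 0]],
}
# A time-stretched version of the "ring" pattern.
unknown = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]

ring_distance = dtw_distance(unknown, templates["ring"])
nn_label = nn_rule(unknown, templates)
knn_label = knn_rule(unknown, template_sets)
```

Because DTW may repeat elements of either sequence, the stretched pattern aligns with the "ring" template at zero accumulated distance, whereas any alignment with the "siren" template must pay for the levels that the unknown never reaches.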
Based on this survey, and on our signal analysis results, the most suitable recognition method will be selected for the WARNSIS.

4.2.1  Analyzing & Utilizing Timing Features

Timing information may be extracted from signals using autocorrelation coefficients, zero-crossing measurements, energy waveform analysis, and peak detection. Such information has been used in signal recognition in a variety of ways.

4.2.1.1 Auto-correlation coefficients

Purton [43] used speech signal autocorrelation coefficients in his speaker-dependent recognition experiments. Specifically, these autocorrelation coefficients were derived from the outputs of two bandpass filters used to capture the formants of speech signals. He achieved an average of 90 % recognition accuracy for a vocabulary size of 10 words.

Sondhi [44] applied autocorrelation analysis to speech signals which were preprocessed by a center-clipping technique that removed the formant structure. The signal pitch was then extracted from the autocorrelation function. The combination of formant structure removal and autocorrelation analysis provided a robust pitch estimation method. The effects of different degrees of formant structure removal prior to autocorrelation analysis on pitch estimation of speech signals were given by Rabiner [45]. A real-time hardware implementation of a pitch estimation scheme based on a combination of center-clipping and peak-clipping methods, followed by autocorrelation analysis, was reported by Dubnowski [46].

To use a correlator-bank approach similar to that of Purton [43] in our work, the number of correlators used and the number of terms (autocorrelation function coefficients) retained to formulate a signal feature need to be determined.
This may be achieved by spectral analysis, and for our signals a multiple-band correlator would be needed. In addition, if the autocorrelation function coefficients generated with zero delay are used, this is equivalent to utilizing the short-time signal spectral information. Hence, such a correlator-based recognizer produces a large feature set, making it difficult and uneconomical to design and implement a real-time recognizer based on this concept.

4.2.1.2 Zero-crossing

Rabiner and Sambur [47] analyzed energy and zero-crossing measurements of prerecorded speech signals in determining the endpoint locations of isolated utterances. First, the energy contour of an utterance was generated and studied to provide a crude boundary. To refine this utterance boundary, zero-crossing measurements were used. At an SNR of 30 dB or better, this endpoint detection algorithm worked very well over all tested conditions.

To develop a low-cost, microprocessor-based speaker-dependent recognizer, Whitaker and Angus [48] employed zero-crossing measurements to track two formant variations of speech sounds. The zero-crossing counts were obtained from the outputs of two filters (one a low-pass filter with a cut-off frequency at 800 Hz, and the other a high-pass filter with a 3 dB corner frequency at 1000 Hz). In order to optimize storage, they used the variable rate encoding technique to reduce redundancies in signal features. With a vocabulary size of 10 - 20 words, they attained an average recognition accuracy of 95 % - 99 %, depending upon the formant structure of the utterances.

The use of zero-crossing detectors for warning signal recognition is attractive.
However, the accuracy of zero-crossing measurements depends on the relative amplitude of the dominant frequency compared to other frequency components within each frequency band, and also on the spectral spacing of the components [48]. In addition, zero-crossing analysis is very prone to noise interference. Although zero-crossing detectors can be implemented easily and economically, the inconsistency of their operational performance in noisy environments makes this approach unsuitable for WARNSIS.

4.2.1.3 Energy Waveform

To counter the effect of nonstationary background noise added to the signal during transmission over telephone lines, Lamel et al. [49] developed a hybrid endpoint detection scheme for isolated utterances. This detector derives one or more endpoint pair estimates from the energy contours of the utterances. In order to determine the best endpoint pair, word recognition is performed using each possible set of endpoint pairs. The selection of the best pair is based on the best match achieved by the recognition process. The authors call this detector "hybrid" because 1) sets of possible endpoint pairs are obtained, and 2) the decision to select the best endpoint pair depends on feedback from the recognition scores. Using the best endpoint pairs corresponding to different utterances, the hybrid endpoint detector produces recognition results close to those obtained from hand-edited endpoints. A real-time implementation of this endpoint location scheme was given in [50].

It should be noted that energy contours are easily derived in practice. Since the energy contour "waveform" contains information on energy level changes occurring in time, it is potentially useful in our application.
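As a rough illustration of how easily an energy contour can be derived, the following Python sketch computes a frame-by-frame energy contour. The function name, the non-overlapping framing, and the toy signal are assumptions made for illustration only; they are not taken from the endpoint detector of [49] or from the WARNSIS hardware.

```python
def short_time_energy(samples, frame_len):
    """Return the energy (sum of squares) of each non-overlapping frame."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    contour = []
    # Step through the signal one full frame at a time; a trailing
    # partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        contour.append(sum(s * s for s in frame))
    return contour


# Hypothetical signal: silence, a short burst, then silence again.
signal = [0] * 8 + [3, -3, 3, -3, 3, -3, 3, -3] + [0] * 8
print(short_time_energy(signal, 4))  # [0, 0, 36, 36, 0, 0]
```

The rise and fall of the printed contour mark where the burst starts and ends, which is the kind of energy level change in time that the text refers to.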
4.2.1.4 Peak Detection

Gold and Rabiner [51] analyzed the relative timing relationships between the peaks of low-pass filtered speech signals, and reported a reliable pitch estimation method for speech signals of pitch frequency less than 300 Hz, even in a high level of white-noise background. An extension of this technique was developed to detect periodic and nonperiodic signals [52]. This method is especially susceptible to transient noise, such as that commonly occurring in the everyday acoustic environment. Therefore, this approach is not suitable for us.

4.2.2 Feature Extraction by Filter Banks

Conceptually, the simplest way to extract spectral information from a signal is to pass it through a set of parallel bandpass filters tuned to different mid-frequencies. These mid-frequencies, and the filter bandwidths, would be selected to cover the frequency range of interest. The output of each filter is a measure of the average spectral intensity within the filter band.

White and Neely [53] implemented their broadband speech signal recognizer using a bank of 20 one-third octave bandpass filters. These overlapping filters spanned the frequency range from 100 Hz to 10 kHz. Using a list of multisyllabic words from a North American dictionary, they achieved a recognition accuracy of 99.6 % in their experiments. Another filter-bank based speech recognizer was developed by Kwok, Tai and Fung [54] for the identification of the monophonemic Cantonese digits zero to ten. With 12 eight-pole overlapping filters, this recognizer provided an average recognition accuracy of 96.8 %.

In industry, NEC has developed its integrated filter-bank based isolated word recognition LSI chip set [55].
The feature extraction processor of this chip set consists of eight biquad digital bandpass filters spanning the frequency range from 100 Hz to 5.0 kHz. This chip set employs a specific data compression algorithm to remove redundancy in signal spectral features, and is "firm-wared" with a dynamic programming algorithm for dynamic time warping calculations for signal recognition. A recognition accuracy of more than 98 % was reported.

Miyazaki and Ishida [14] developed a traffic alarm sound monitor for aurally handicapped drivers. This traffic alarm sound monitor consists of seven bandpass filters followed by seven line spectrum detectors. In order to reduce the false-alarm triggering due to the squeaking noises of brakes, tires, engine noise at high revolutions, wind noise at high-speed driving, human voice, and music, an error suppression circuit was designed to detect the sudden rise of the SPL of the input signal. The successful detection of traffic alarm sounds depends on both the outputs from the seven line spectrum detectors, and the error suppression circuit. During field tests of this monitor on moderately crowded downtown roads in Tokyo, on the average one false alarm per three minutes was observed.

For our application the filter bank approach offers the advantages of robustness, noise-resistance, and straightforward implementation at a low cost. These will be discussed in more detail in Section 4.5.

4.2.3 The LPC/AR Model

The LPC/AR model assumes that signals can be parametrically modeled as the outputs of a linear, time-varying system excited by either quasi-periodic pulse trains, or random noise.
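For reference, a model of this kind is commonly written as a difference equation. The form below is the standard all-pole (autoregressive) formulation from the linear prediction literature, not an equation quoted from this thesis; the symbols $a_k$, $G$ and $u(n)$ are introduced here only for illustration, and sign conventions for $a_k$ vary between authors.

\[ x(n) = \sum_{k=1}^{p} a_k \, x(n-k) + G \, u(n) \]

Here $x(n)$ is the signal, $p$ is the model order, $a_k$ are the predictor coefficients (assumed constant within each short analysis frame), $G$ is a gain, and $u(n)$ is the excitation, either a quasi-periodic pulse train or random noise.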
The LPC/AR signal analysis technique has been widely applied to seismic and speech signal processing. To discriminate between earthquakes and underground nuclear explosions, Tjostheim [56] employed a third-order autoregressive model to analyze short period seismic events. The extracted AR parameters produced two discernible clusters characterizing earthquakes and explosions, respectively.

So far LPC/AR parameters have been proven to give the most effective characterization of speech signals. These LPC/AR coefficients represent the combined information about the formant frequencies, their bandwidths, and the glottal waveforms [57]. Therefore, during the past decade, considerable effort was directed at the study of the performance of LPC/AR-based isolated word recognizers. Ackenhusen and Oh [58] implemented an eighth-order LPC-based isolated word recognizer using an AT&T DSP-20 processor. This recognizer has also been used in research on 1) statistically clustered templates for speaker-independent word recognition, 2) recognition based on vector quantization, and 3) recognition based on hidden Markov modeling (HMM) of speech signals. Dautrich et al. [59] demonstrated that in high SNR environments and for signals transmitted via telephone lines, LPC-based recognizers can perform several percent better than filter-bank based recognizers.

In considering an LPC/AR approach for WARNSIS, we must deal with two problems inherent to this technique. First, the order, 'p', of the LPC/AR signal analysis has to be estimated. Different criteria exist for estimating 'p' for the LPC/AR analysis, but these criteria are signal dependent [24]. Second, the LPC/AR signal analysis is very vulnerable to noise interference [60].
Since the LPC/AR model tends to fit spectral peaks more accurately than the valleys [26], it is logical to compensate for those spectral peaks caused by noise interference by increasing 'p' in noisy environments. Unfortunately, for a practical recognizer, 'p' must always be fixed and independent of the varying unknown signals received. Tierney [61] showed that noise reduction should be applied prior to the analysis to ensure the best LPC/AR based recognition system performance in noisy backgrounds. To compensate for the LPC/AR parameter variations due to different noise sources, the derived LPC-cepstral coefficients with different weighting factors were adopted as signal features. Improvement in system recognition performance was reported in [63,64].

To implement a real-time LPC/AR based recognizer with "intelligent" noise prefiltering for our application, a complex multiple-processor based system would be required. Such complexity makes this approach undesirable for WARNSIS.

4.2.4 LPC-derived Cepstral Coefficients

Pioneering work investigating the effectiveness of different speech parameters for speaker identification and verification was done by Atal [62]. He concluded that LPC-derived cepstrum coefficients provided better identification performance than LPC coefficients, signal autocorrelation coefficients, or the signal impulse response coefficients of an all-pole filter derived from the estimated LPC/AR coefficients. Recently, the use of LPC-derived coefficients for speech signal recognition has been reconsidered by Juang et al. [63], who applied bandpass liftering in speech recognition.
They showed that bandpass liftering of the LPC-derived cepstral coefficients (equivalent to applying a smoothing window) tends to reduce undesirable spectral sensitivity by smoothing the spectral peaks without distorting the fundamental formant structure. Such undesirable spectral sensitivity may be caused by the presence of spectral notches or zeros in the signal spectrum, introduced during signal transmission, by filtering, or by improper preemphasis. Smoothing transforms the original LPC-derived cepstral coefficients into more reliable parameters. Juang's recognition results showed that the bandpass liftering process produced one percent less error than a process using standard cepstral coefficients.

Hanson and Wakita [64] used "root-power sums" or weighted cepstral coefficients as spectral distortion measures for speaker-dependent isolated word recognition in different noise environments. They showed that for white noise interference, a gain of 16 % in recognition accuracy may be achieved by using weighted rather than standard cepstral coefficients.

This method suffers from the same limitations of complexity and computational requirements as the LPC/AR approach. Therefore, it is equally unsuitable for our application.

4.2.5 The Hidden Markov Model (HMM) Approach

One application of HMM for signal recognition is speaker-independent isolated word recognition. The left-to-right topology of HMM is generally adopted in practice. Such an HMM has N states, and each state corresponds to a set of temporal events in the speech signals. The HMM is characterized by a state transition matrix, and a statistical characterization of the acoustic vectors within each state.
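As an illustration, and assuming the usual left-to-right convention in which a state can only be kept or left for a higher-numbered state, the state transition matrix of a hypothetical three-state model (N = 3) has the form

\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & 1 \end{pmatrix}, \qquad a_{ij} = 0 \ \text{for} \ j < i, \qquad \sum_{j} a_{ij} = 1 . \]

The symbols $a_{ij}$ (the probability of moving from state $i$ to state $j$) are introduced here for illustration only; a particular recognizer may restrict the allowed jumps further.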
A detailed exposition on the application of HMM to speech recognition is given in [65]. Rabiner et al. [66] showed that the HMM based recognizer requires ten times less storage, and about 17 times less computation for recognizing a test utterance than does an equivalent recognizer using LPC coding and DTW. This is at the expense of a slight increase in error rate, and of extensive computation while training the model with a reasonably large ensemble of utterance samples. The improvement of the HMM performance in different noisy environments has received considerable attention in the last few years [67].

Considering HMM for our application, we must concern ourselves with the topology of the model. Based on such a topology, the Baum-Welch algorithm could be employed to extract the statistical parameters of the model [65]. To evaluate these probabilistic model parameters, scaling of temporary results must be performed with great care to avoid underflow problems which occur even when mainframes are used [66]. Therefore, HMM appears to be unattractive for hardware implementation using integer arithmetic amenable to real-time operation.

4.3 Overview of the Recognition Scheme for WARNSIS

In the selection of the recognition scheme for WARNSIS the following criteria must be considered:

1. reliability and robust recognition performance in different noise environments;
2. real-time operation;
3. portability; and
4. reasonable cost.

Our preliminary experiments have shown that neither timing nor short-time spectral information is sufficient on its own for reliable recognition performance (see Chapter 6 for performance results).
Since both timing and spectral information contribute unique identifiers, a "hybrid" recognition scheme, utilizing both timing and short-time spectral information, was designed for WARNSIS (Fig. 4.28). In particular, our design uses timing features as "tokens" to assign sounds to various groups (steady, on-off, variable, etc.). Spectral analysis is then used to correlate the spectra of the unknown sound with the spectra of the warning sounds belonging to that group. We have designed a unique analyzer which produces timing information and obtains spectra using the filter bank approach.

[Figure 4.28: The 'hybrid' recognition scheme for WARNSIS. Blocks shown: signal, timing analyzer, solid state switch, spectral analyzer, pattern memory, pattern comparison, dynamic time warping, decision rule, recognized sound.]

Operationally, the system works as follows. In the training stage, the warning sounds of interest are analyzed, and relevant timing information is derived and stored in the timing pattern memory. Next, short-time spectra of these sounds are generated by the spectral analyzer. The short-time spectra of warning sounds are classified and stored in the spectral pattern memory according to the group classification determined earlier by timing analysis.

In the recognition stage, two types of pattern comparisons are performed sequentially before a decision is reached to declare a successful recognition for a specific warning sound. The first stage involves the timing pattern comparison between the timing features of an unknown signal and the pre-stored timing patterns. If the matching criteria are not satisfied for any of these patterns, no spectral analysis is performed on the incoming signal, and the timing analysis resumes for the next sample. If a match is found with one of the timing patterns, the signal is assigned to the corresponding "group", and spectral extraction and pattern comparisons are performed on it. Based on the minimum distance score computed for the pre-stored templates, the unknown signal is recognized as the corresponding warning sound. The details of the design are given in Sections 4.4 and 4.5.

Since pattern comparisons involve the most intensive computations in producing a set of distance measures (similarity measures), any possible reduction in the number of comparisons between the unknown signal and the pre-stored templates enhances the real-time performance of recognizers. In our recognition scheme this is achieved by making use of the timing features to group warning sounds. An additional use of timing information is to prevent unnecessary spectral and pattern analysis work when only noise is present.

4.4 Extracting & Classifying Timing Information

One or two signal processing steps may be needed to extract timing features from steady or burst-type warning sounds (Fig. 4.29). The first step classifies warning sounds according to the features derived from signal waveforms. For steady sounds, timing feature extraction terminates after this processing; for burst-type sounds, the processing proceeds to the next step, which estimates the repetition period.
[Figure 4.29: Block diagram of the Timing Feature Extractor. Blocks shown: warning signal, signal classification, steady sound, burst-type sound, repetition period calculation.]

In real life, warning sounds are modified acoustically by the environment, and the addition of unwanted sounds. These background sounds may be either continuous, or transient. In addition, what a microphone receives from a source depends on the paths between the two, their orientation with respect to each other, and the sound modification characteristics of the environment.

Extracting timing features from distorted and noisy signals has not been addressed by other workers in the literature. Compelled by the demands of real-life circumstances, we developed the algorithm presented here to deal with this problem. This development was inspired by the work of Gold and Rabiner [51], and Lamel [49].

4.4.1 A Scheme to Extract Timing Features

We have demonstrated in Chapter 3 that the contour characteristics of the short-time average absolute amplitudes (STAAA) of warning sounds are distinctively defined for steady and burst-type sounds. Working with the short-time average absolute amplitude is more attractive for us than the average energy used in Lamel's work because the short-time average absolute amplitude: 1) is a simple measurement which preserves the essential features of the corresponding energy contours, 2) requires no multiplication operations, and 3) has a smaller dynamic range which can be coded in 8 bits. The relationships between the short-time average absolute amplitudes and the average energy of a discrete sequence x(n) are shown in Fig. 4.30. Since the short-time average absolute amplitude is obtained from an 8-bit A/D conversion, and is coded in integer arithmetic, its dynamic variations are limited to 256 levels.
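The multiplication-free character of this measurement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the samples are taken to be signed integers centred on zero, the frames are non-overlapping, and the frame length is assumed to be a power of two so that the average reduces to a right shift. The function name and the toy samples are hypothetical; in WARNSIS the absolute amplitudes are produced by dedicated hardware.

```python
def staaa_contour(samples, shift):
    """Short-time average absolute amplitude of each frame of 2**shift samples.

    Only additions and a right shift are used, so no multiplication is needed.
    """
    frame_len = 1 << shift
    contour = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        total = 0
        for s in samples[start:start + frame_len]:
            total += abs(s)
        contour.append(total >> shift)  # integer average over the frame
    return contour


# Hypothetical samples: silence, a burst, then low-level noise.
samples = [0, 0, 0, 0, 10, -10, 12, -12, 1, -1, 0, 0]
print(staaa_contour(samples, 2))  # [0, 11, 0]
```

The last frame shows how integer arithmetic maps very low levels to zero, in line with the limited number of levels available to the contour.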
The value of the short-time average absolute amplitudes is zero when the environmental noise level falls below the threshold value of the A/D conversion system. In order to compress the dynamic variations of the short-time average absolute amplitudes for plotting purposes, we adopted a logarithmic measure to readjust these short-time average absolute amplitude values. This logarithmic measure is:

\[ \mathrm{STAAA}_{\log} = 10 \log_{10} (\mathrm{STAAA} + 1) \tag{4.20} \]

where the left-hand side denotes the readjusted value.

[Figure 4.30: Relationships between the instantaneous energy and the instantaneous absolute amplitudes of a sequence, x(n). (a): the plot of x(n); (b): the plot of |x(n)|; and (c): the plot of x^2(n). Horizontal axes: time (in sec).]

Note that the value of the short-time average absolute amplitude is incremented by one to prevent the argument of the logarithm from taking on the value of zero. The error introduced by this is not relevant since the essential features of the contour are not affected.

From the STAAA contours of warning sounds, the break-points or transitions (rising and falling) in these waveforms are located. Timing features of warning sounds are thus derived from the timing relationship between these transitions, similarly to the method of Gold and Rabiner. Fig. 4.31 (a) gives the STAAA contour of a steady sound, whereas Fig. 4.31 (b) shows the STAAA contour of a burst-type sound.

With reference to Fig. 4.31 (a), a steady sound is identified if a rising transition of the waveform of short-time average absolute signal amplitude is detected, and a new value of short-time average absolute signal amplitude is then maintained for at least four seconds. For burst-type sounds two rising and two falling transitions must be detected ($T_1$, $T_3$, and $T_2$, $T_4$, respectively, are shown in Fig. 4.31 (b)).
The repetition period (RP) and the average width of signal bursts (AWSB) can then be obtained according to the following equations:

\[ RP = \frac{(T_3 - T_1) + (T_4 - T_2)}{2} \tag{4.21} \]

\[ AWSB = \frac{(T_2 - T_1) + (T_4 - T_3)}{2} \tag{4.22} \]

To detect these transitions, a signal amplitude threshold is derived from the short-time average absolute amplitude of the acoustic background. This short-time average absolute amplitude is dynamically updated every 12.8 msec to accommodate the acoustic energy variations of the environment. This dynamic amplitude threshold (DAT) provides the baseline level of the background, and is used for transition (rising and falling) detection.

[Figure 4.31: (a): The STAAA contour of a steady sound; (b): The STAAA contour of a burst-type sound. Horizontal axes: time (in sec).]

When the detection scheme starts, the dynamic amplitude threshold is assigned the maximum value. Then, the incoming short-time average absolute amplitude is compared to the dynamic amplitude threshold. If the incoming short-time average absolute amplitude is less than the dynamic amplitude threshold, the dynamic amplitude threshold is updated by averaging the short-time average absolute amplitude and the dynamic amplitude threshold:

\[ \mathrm{DAT}\,(\text{updated}) = \frac{\mathrm{DAT} + \mathrm{STAAA}}{2} \tag{4.23} \]

Updating ensures that the dynamic amplitude threshold follows the amplitude level changes due to background noise. This method continuously adjusts the dynamic amplitude threshold downwards until a rising transition is detected.
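The timing relations of Equations (4.21) to (4.23) can be sketched in Python as follows. This is a minimal illustration, not the thesis program (which was written in assembly language): the function names are hypothetical, RP and AWSB are read here as pairwise averages over the two transition pairs, and the initial threshold of 255 assumes the maximum of an 8-bit range.

```python
def burst_timing(t1, t2, t3, t4):
    """Return (RP, AWSB) from rising times t1, t3 and falling times t2, t4."""
    rp = ((t3 - t1) + (t4 - t2)) / 2    # average spacing of the two bursts
    awsb = ((t2 - t1) + (t4 - t3)) / 2  # average width of the two bursts
    return rp, awsb


def update_dat(dat, staaa):
    """Lower the threshold toward the incoming STAAA when it is below the DAT."""
    if staaa < dat:
        return (dat + staaa) // 2  # integer average, as in integer arithmetic
    return dat


# Hypothetical transition times in seconds for two bursts.
print(burst_timing(0.0, 0.5, 2.0, 2.5))  # (2.0, 0.5)

# Hypothetical background level of 40, starting from an assumed maximum of 255.
dat = 255
trace = []
for level in [40, 40, 40, 40]:
    dat = update_dat(dat, level)
    trace.append(dat)
print(trace)  # [147, 93, 66, 53]
```

The trace shows the threshold halving its distance to the background level at each update, which is how it settles onto the baseline within a few frames.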
Such a transition may be either due to a warning signal, or due to a sudden increase in background noise. If no rising transition is detected for a period of four seconds, the dynamic amplitude threshold is reset to its initial value, and the search for a rising transition resumes. Fig. 4.32 shows an example of how the dynamic amplitude threshold adapts to acoustic energy variations in the environment.

[Figure 4.32: Two typical examples of how the dynamic amplitude threshold adapts to acoustic energy variations of the environment. (a): sudden decrease in signal levels; (b): sudden increase in signal levels.]

Since the dynamic amplitude threshold and short-time average absolute amplitudes are expressed in integer arithmetic, the value of the minimum detectable difference between them is one. To avoid the false detection of a rising transition due to random noise disturbance, we set the value of the threshold for detecting this transition to two. If the short-time average absolute amplitude is larger than the dynamic amplitude threshold by this preset threshold, a rising transition is detected and a reference time marker ($T_1$) is set. A corresponding falling transition will be detected and marked ($T_2$) as soon as an incoming short-time average absolute amplitude falls below the dynamic amplitude threshold. However, if no falling transition is detected in a period of four seconds (maximum allowable burst width), this sound may be a steady sound. To confirm this, the dynamic amplitude threshold is reset to its initial value, and if no rising transition is detected in the one second period following, the sound is declared to be a steady sound, and the timing feature extraction process terminates. If a rising transition is detected within one second, the search for its corresponding falling transition continues, and the hypothesis of a steady sound is rejected. Assuming a burst-type signal, this detection process continues until a second transition pair is detected and marked with $T_3$ and $T_4$ for rising and falling transitions, respectively. Consequently the RP and AWSB are computed and the timing feature extraction process terminates. A typical example of the detection of a siren sound is illustrated in Fig. 4.33 (a), and Fig. 4.33 (b) demonstrates how the steady sound detection scheme rejects non-steady sounds.

This scheme works well for warning sounds in backgrounds with steady noises. To deal with nonstationary noises such as radio broadcasts, and transient sounds due to door slamming or movement of chairs, additional parameters and conditional tests are included in the scheme. These are: 1) the minimum burst duration (MBD), and 2) the maximum inter-arrival time (MIAT) between two consecutive signal bursts. As shown in Fig. 4.34, any signal with duration less than the MBD is declared an unwanted transient. Furthermore, if the signal shows pulsative variations that last longer than the MIAT, the hypothesis of a burst-type sound is rejected.

These conditional tests were incorporated into the basic scheme as follows. When any signal burst is detected, its width is calculated and compared to the MBD.
If the c o m p u t e d w i d t h is less t h a n the M B D , the detected burst is treated as transient noise, a n d the search continues. If the burst is longer t h a n the M B D , the system waits u n t i l a second t r a n s i t i o n is detected. T h e t i m e difference between following transitions  Chapter 4. Solutions to the Recognition  Problem  90  LEGEND SIGNAL LEVEL DAT  i—  8.4  TIME ( in  sec)  (a)  5 12  TIME ( in  7 68  12.8  —\  9.6  10.24  12.80  sec)  (b) Figure 4.33: (a) : Detection of a steady sound; (b): A n illustration of how the scheme rejects a non-steady sound  Chapter 4. Solutions to the Recognition  91  Problem  MIAT T1 < M I A T T1 —  W3  MBD-  W3 > MBD  W2 < M B D TRANSIENT NOtSE  CO W1 > M B D  CONFIRMED BURST S E Q U E N C E 031.B2)  W1  B2  TIME  F i g u r e 4.34: A demonstration of the use of the M B D and M I A T to refine the basic w a r n i n g sound analysis scheme  Chapter 4. Solutions to the Recognition Problem  92  is computed, a n d compared to the M I A T . If this time is longer t h a n the M I A T , the hypothesis of a burst-type sound is rejected, d y n a m i c a m p l i t u d e threshold is reset, a n d the t i m i n g feature extraction process is reset a n d restarted. A flowchart of the complete scheme for t i m i n g feature extraction is shown i n F i g . 4.35.  T h e p r o g r a m was w r i t t e n i n I N T E L 8088/8086 assembly language for real-time  operation. T h e hardware developed i n C h a p t e r 3 for t i m i n g parameter measurement is employed here to generate the instantaneous absolute amplitudes of w a r n i n g sounds.  Chapter 4.  Solutions to the Recognition  Problem  93  (START"]  I INPUT E DAT <— DAT+EO/2  ZZJ INCREASE BURST WIDTH  "  COUNTER  I INPUT Ei  RECORD ITS LOCATIONS YES  YES YES RECORD THE SECOND TRANSITION PAIR LOCATIONS DECLARE STEADY SOUND  COMPUTE  RP. A W S 6  X  END F i g u r e 4.35: F l o w c h a r t of the T i m i n g Feature E x t r a c t i o n Scheme  Chapter 4. 
Solutions to the Recognition Problem  4.5  94  E x t r a c t i n g S p e c t r a l Information  As shown in Fig. 4.28, timing analysis is followed by spectral analysis. The latter is initiated only if the timing analysis indicates the possibility of the presence of one of the recognizable warning sounds.  Since timing analysis of warning sounds gives the  time markers for the rising and falling transition of sound bursts, it is equivalent to the end-point detection of isolated utterances [49]. Thus, the timing analyzer conveniently provides the on/off control for the spectral analyzer.  4.5.1  Feature Extraction  In our review of methods of obtaining spectral information from signals in real-time we have already indicated our preference for the filter-bank approach. Firstly, the filterbank method works well for simple speech signals, and the warning signal spectra are simpler than the spectra of speech. In particular, Dautrich et. al. [59] demonstrated that for spoken digits the performance of a filter-bank recognizer was equal to the performance of the more complicated L P C recognizer.  Secondly, as shown by L i m  [60], in noisy environments filter-bank recognizers are less error prone than the L P C based recognizers. This a very important criterion for us, since our specific goal is to recognize warning signals in low SNR situations. Thirdly, filter-bank recognizers are fast, are relatively simple, and are commercially available at a reasonable cost. Fig. 4.36 gives the block diagram of our spectral analyzer which uses a filter-bank. Signals pass through a bank of eight bandpass filters covering frequency bands from 100 Hz to 5.0 kHz. The output of each bandpass filter is passed through a full-wave rectifier, and low-pass filtered to give a value related to the energy of the incoming warning sounds in each band. The outputs of bandpass filters are sampled (typical rate 50 - 100 Hz) to give a segment of a feature set. 
At a time index $k$, a segment of 8 parallel outputs $\{x_1(k), x_2(k), \ldots, x_8(k)\}$ defines an 8th order feature vector $X(k)$ as,

\[ X(k) = \{x_1(k), x_2(k), \ldots, x_8(k)\} \tag{4.24} \]

A complete spectral pattern of a warning sound is given as,

\[ R = \{X(1), X(2), \ldots, X(k), \ldots, X(N)\} \tag{4.25} \]

In the recognition stage these reference patterns are compared to the spectral pattern, $T$, of an unknown signal. Dynamic time warping is employed to provide a quantitative similarity measure between reference and unknown patterns.

Figure 4.36: Filter-bank analysis of warning sounds (BP: bandpass filter; FW: full-wave rectifier; LP: low-pass filter)

4.5.2  Dynamic Time Warping (DTW)

The basic idea of DTW is to provide an optimum similarity measure between two patterns of different time durations. DTW can compensate for the nonlinear time misalignment of patterns which may be caused by noise giving rise to errors in the detection of endpoints.

Conceptually, matching between these patterns involves the search for a time warping function for which the segment-to-segment comparison is optimal according to some distance criterion. Fig. 4.37 gives an example of the optimum match between a reference template and an unknown pattern whose feature sets consist of letters of the alphabet.

Mathematically, the problem can be stated in the following manner. Consider $R(n)$, $T(m)$ for all $n \in [1, N]$, $m \in [1, M]$, where $N \neq M$ (in general), and $R(n)$, $T(m)$ are the reference and the test pattern at time indices $n$, $m$, respectively.
The task of DTW is to find an optimum time warping function $w(n)$ that minimizes the accumulated distance $D_A^*$ between these two patterns, with $D_A^*$ given by

\[ D_A^* = \min_{\{w(n)\}} \sum_{n=1}^{N} d[R(n), T(w(n))] \tag{4.26} \]

where $d[R(n), T(w(n))]$ is defined as the frame-by-frame (segment-by-segment) distance measure. Several possible distance measures can be used, depending on the form of the feature sets [37]. In this discussion, the absolute magnitude difference is used as a distance measure. Thus, $d[R(n), T(w(n))]$ is expressed by,

\[ d[R(n), T(w(n))] = \sum_{k=1}^{L} \left| x^{R}_{n}(k) - x^{T}_{w(n)}(k) \right| \tag{4.27} \]

where
$x^{R}_{n}(k)$ = the $k$th bandpass filter output at time index $n$ of a reference spectral pattern,
$x^{T}_{w(n)}(k)$ = the $k$th bandpass filter output at time index $w(n)$ of a test spectral pattern, and
$L$ = the total number of bandpass filters of the filter-bank used.

Figure 4.37: An example of pattern matching between a reference template and unknown pattern

Since one would expect the optimum warping path to be close to a straight line, most of the computations at the beginning and the end of this path can be reduced by establishing boundary conditions for the search. In general, the optimum warping path function can be obtained by Dynamic Programming [39,53,69]. Rewriting the original path searching equation, a recursive accumulated distance function, denoted as $D_A(n, m)$, is defined as

\[ D_A(n, m) = d[R(n), T(m)] + \min_{l \le m} \left[ D_A(n-1, l) \right] \tag{4.28} \]

The above equation defines the minimum accumulated distance to grid point $(n, m)$, and consists of the local distance between feature sets $R(n)$ and $T(m)$, plus the minimum accumulated distance to the grid point $(n-1, l)$, where $l$ ranges over the possible values of $m$ constrained by a given set of local paths. As an example, Fig.
4.38 shows one of the possible sets consisting of three paths leading to the grid point $(n, m)$: $(n-1, m)$, $(n-1, m-1)$, and $(n-1, m-2)$. To ensure that the time warping function is monotonically increasing, an additional path constraint is applied. Specifically, if the best path to grid point $(n-1, m)$ came from grid point $(n-2, m)$, then no path can lead from the grid point $(n-1, m)$ to the grid point $(n, m)$.

Formulating these path constraints mathematically, we obtain

\[ w(n) - w(n-1) = \begin{cases} 0, 1, 2 & \text{if } w(n-1) \neq w(n-2) \\ 1, 2 & \text{if } w(n-1) = w(n-2) \end{cases} \tag{4.29} \]

Therefore, substituting the above constraint equations into Eq. 4.28, we have the DP recursive solution to the DTW,

\[ D_A(n, m) = d[R(n), T(m)] + \min \left\{ D_A(n-1, m)\, g(n-1, m),\; D_A(n-1, m-1),\; D_A(n-1, m-2) \right\} \tag{4.30} \]

where

\[ g(n-1, m) = \begin{cases} 1 & \text{if } w(n-1) \neq w(n-2) \\ \infty & \text{if } w(n-1) = w(n-2) \end{cases} \tag{4.31} \]

with boundary conditions governed by,

\[ w(1) = 1 \tag{4.32} \]

\[ w(N) = M \tag{4.33} \]

and continuity criterion for $w(n)$ expressed by,

\[ w(n) \ge w(n-1) \tag{4.34} \]

This iteration is carried out over all valid $m$, for each $n$ sequentially from $n = 1$ to $N$. The constraint of Eq. (4.33) means that the last segments of the template and the test signal must coincide, and the distance function is $D_A(N, M)$. When the last segment is reached, the warping path $w(n)$ is completely defined.

Figure 4.38: Local path constraints for DTW

The complexity involved in DTW implementation depends on the boundary conditions, the local path constraints, and on the distance measure. Both Sakoe and Chiba [39], and Myers [70] have investigated the effects of varying these factors on both speed and performance of the DTW algorithm in speech-recognition systems.
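The recursion of Eqs. 4.30 to 4.34 can be illustrated with a small Python sketch. It is a plain software rendering written for this discussion, not the firmware of the recognizer chip; the function names are hypothetical, patterns are assumed to be lists of equal-length feature vectors, and ties between candidate paths are resolved in favour of the diagonal steps, which is a choice of this sketch rather than something the text specifies.

```python
import math


def frame_distance(r, t):
    """Absolute magnitude difference between two feature vectors (Eq. 4.27)."""
    return sum(abs(a - b) for a, b in zip(r, t))


def dtw_distance(ref, test):
    """Accumulated distance D_A(N, M) under the local path constraints.

    Returns math.inf when no allowed warping path joins the two end points.
    """
    n_ref, n_test = len(ref), len(test)
    prev = [math.inf] * n_test      # D_A(n-1, m) for every m
    prev_flat = [False] * n_test    # True if (n-1, m) was reached from (n-2, m)
    prev[0] = frame_distance(ref[0], test[0])  # boundary condition w(1) = 1
    for n in range(1, n_ref):
        cur = [math.inf] * n_test
        cur_flat = [False] * n_test
        for m in range(n_test):
            best, flat = math.inf, False
            if m >= 1 and prev[m - 1] < best:      # path from (n-1, m-1)
                best, flat = prev[m - 1], False
            if m >= 2 and prev[m - 2] < best:      # path from (n-1, m-2)
                best, flat = prev[m - 2], False
            # Path from (n-1, m) is allowed only if that point was not itself
            # reached by a flat step: g(n-1, m) of Eq. 4.31.
            if not prev_flat[m] and prev[m] < best:
                best, flat = prev[m], True
            if best < math.inf:
                cur[m] = frame_distance(ref[n], test[m]) + best
                cur_flat[m] = flat
        prev, prev_flat = cur, cur_flat
    return prev[n_test - 1]         # boundary condition w(N) = M


reference = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
stretched = [[1.0, 0.0], [1.0, 0.0], [2.0, 1.0], [2.0, 1.0], [3.0, 0.0]]
print(dtw_distance(reference, stretched))
```

In this example the test pattern is a time-stretched copy of the reference, so a warping path with zero accumulated distance exists. Because each step may advance the test index by at most two, a test pattern longer than roughly twice the reference cannot be reached at all, and the sketch then returns infinity.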
These studies have shown that only small differences are found in performance for a fairly wide range of variations of these parameters.

If the reference and test patterns are dissimilar, the distance measures will be consistently large. Therefore an accumulated distance limit must be established to stop unnecessary computation. Whenever an accumulated minimum distance is obtained, it is compared to the distance limit. If it is larger than the limit value, the matching process between this reference and the test pattern terminates, and another reference pattern is compared to the test pattern.

Chapter 5

Design & Implementation

Utilizing the methodologies discussed in the previous chapters, we designed and implemented a WARNSIS prototype. Fig. 5.39 shows the four main hardware building blocks of our device: the microphone, the signal conditioner, the control & timing processor (CTP), and the spectral recognizer (SR).

5.1  Timing Analyzer

5.1.1  Microphone

A microphone is used as the transducer that receives environmental sounds and produces the electrical input for the WARNSIS. The characteristics of the microphone play a crucial role in determining the quality of the signal that is fed to the analog signal conditioner. We selected a SONY model directional microphone which has a frequency response of 100 - 15000 Hz, and a sensitivity of -70 ± 3 dB (with reference to 0 dB = 1 V/μbar) at 1000 Hz. It is an electret-condenser microphone with two selectable angles (90° and 120°) of reception. A microphone with a narrower angle of reception may provide better spatial separation between the signal and the background noise when the sources are separated and the microphone is oriented in the direction of the signal source. On the other hand, when such a microphone is not oriented in the direction of the signal source, the signal quality may be degraded substantially.
Figure 5.39: The building blocks of WARNSIS

5.1.2  Analog Signal Conditioner

The function of the analog signal conditioner is 1) to pre-process the microphone output to generate an analog input for the spectral recognizer, and 2) to calculate the instantaneous amplitudes of the signal for use by the control & timing processor. Correspondingly, the signal conditioner consists of an audio preamplifier, a low-pass filter, an automatic gain controller (AGC), two solid-state analog switches, an SPDT manual switch, a full-wave rectifier, and a 1 kHz calibrating tone generator.

The voltage produced by the directional microphone is fed to an audio pre-amplifier. Since the noise characteristics of an audio pre-amplifier system depend primarily on the noise generated by its first stage, we used a low-noise audio operational amplifier (with a noise characteristic of 9 nV/√Hz). This pre-amplifier provides a voltage gain of 58.3 dB over a 100 Hz - 8.0 kHz bandwidth.

To reduce the unwanted high frequency content of the signal, the pre-amplified signal is fed to a 6th order Chebyshev low-pass filter with a cut-off frequency at 6.4 kHz. This 6th order filter was constructed from three cascaded second order filters. The overall voltage gain of the filter chain is 11.2 dB.

The filtered signal is then branched into two signal processing modules: the full-wave rectifier and the AGC. We used the same full-wave rectifier module as the one described in Chapter 3.
The AGC is employed to maintain the signal level at values that prevent signal clipping. This AGC limits output signal amplitude variations to 3 dB when the incoming signal varies by 60 dB.

The analog switch, SW_1, provides a windowed segment of the signal from the AGC output. This switch is controlled by the control & timing processor, and the gating window duration is set to 470 msec. This gating duration can easily be altered by an external timing resistance. The control & timing processor will activate SW_1 according to the timing information extracted from the instantaneous amplitudes of the signal. The output of the AGC module is then fed to an SPDT manual switch (SW_S).

The 1 kHz calibrating tone has a peak-to-peak voltage of three volts. The tone generator is connected to another analog switch (SW_2) whose output is tied to the second input of SW_S. The function of the 1.0 kHz tone is to calibrate the input signal level of the hybrid analog processor of the spectral recognizer during the initialization of the WARNSIS. In this prototype, the user has to manually flip the switch to determine which one of the two signals (the processed signal from the microphone, or the 1 kHz calibration tone) is fed to the hybrid analog processor.

5.1.3  Control & Timing Processor (CTP)

The control & timing processor consists of decoding circuits, a software programmable port, and a microprocessor.
The port (INTEL 8255, software programmable) allows parallel communication between the microprocessor and the spectral recognizer to monitor the step-by-step operation of the recognizer logic, and is the gateway for the control signal that operates the switch in the analog signal conditioner. The microprocessor is an INTEL 8088, housed in a personal computer.

The first function of the control & timing processor is to perform 'real-time' timing analysis as described in Chapter 4. Its second function is to initiate the spectral recognition process.

5.2  Spectral Recognizer (SR)

The spectral recognizer hardware consists of an NEC LSI speech chip set. This set has three processors as shown in Fig. 5.39: 1) the hybrid analog processor (MC4760), 2) the feature extraction and pattern matching processor (μPD7761), and 3) the control processor (μPD7762) [55]. We selected this speech recognition chip set since it has the features required by our method:

• filter-bank based recognizer;
• signal frequency bandwidth of 100 Hz to 5.0 kHz;
• allowable windowed signal duration from 0.2 sec to 2.0 sec;
• supports a maximum storage of 512 signal templates;
• uses syntax numbers in grouping signal templates;
• pattern comparison using DTW via "firmwared" DP method;
• simple set of twelve macro commands to operate the chip set; and
• average recognition time of 0.5 sec.

This chip set, coupled with external memory for signal template storage, constitutes the spectral recognizer of our WARNSIS.

5.2.1  The Hybrid Analog Processor (MC4760)

The hybrid analog processor performs signal equalization and digital sampling of input signals. Fig. 5.40 gives a simplified block diagram of the MC4760.
The signal is fed to the equalization amplifier, whose voltage gain can be altered by varying an external resistance. Since sufficient voltage gain is provided by the signal conditioner, the voltage gain of the equalization amplifier is set to the minimum possible gain (0.59 dB). The gain of the input signal can further be adjusted by a digital programmable attenuator under the control of the control processor. For speech applications, this attenuator compensates for signal level variations due to microphone position. However, in our application signal level equalization is performed by an external AGC circuit, and thus the attenuator gain is permanently set to unity.

Figure 5.40: Block diagram of MC4760

The attenuated signal is then low-pass filtered by an anti-aliasing filter (5 kHz bandwidth), and is input to a built-in 8-bit A/D converter. The converter samples the signal at a rate of 10 kHz, and the sampled data are converted into inverted μ-law PCM codes. Subsequently, this output is serially transmitted to a dedicated serial input port of the feature extraction processor at a 2 MHz clock rate.

5.2.2  Feature Extraction and Pattern Matching Processor (μPD7761)

The μPD7761 is an NMOS device optimized for single instruction cycle arithmetic operation. It runs at a clock rate of 8 MHz, and operates in either of two modes (analysis or pattern matching) as selected by the control processor (μPD7762). A block diagram of the functional operation of the μPD7761 is shown in Fig. 5.41.
Figure 5.41: Block diagram of the functional operation of μPD7761

In the analysis mode, the μPD7761 accepts digitized data samples from the MC4760 via a dedicated built-in serial port. Data transfer timing is controlled by an input clock at 2 MHz, which is the rate at which data is fed from the MC4760. These samples are analyzed by an 8-channel biquad filter bank firmwared into the on-chip ROM. This filter bank spans the frequency spectrum from 100 Hz to 5.0 kHz. The output of each bandpass filter is full-wave rectified. The rectified outputs are sampled once per 12 msec frame, and sent to the control processor via an 8-bit parallel port. This process is repeated for successive frames until the entire windowed segment of the signal is analyzed.

In the pattern matching mode, the μPD7761 compares the features of the unknown signal with the pre-stored signal templates using the DTW approach. The algorithm is firmwared onto the chip to perform the computationally intensive distance calculations. Each comparison with a pre-stored template takes an average of 5 ms. Upon completion, the recognition result is transferred to the control processor and subsequent templates are compared, until all templates have been checked.

5.2.3  The Control Processor (μPD7762)

The control processor provides the only communication link between the control & timing processor and the spectral recognizer. In addition, it performs two important functional operations.
First, it serves as a system controller for the MC4760 and μPD7761 by providing the necessary control signals to synchronize all operations. Such control signals include the communication protocols with the control & timing processor, the memory selection, read and write signals, the reset signal for the MC4760 and μPD7761, and the specific command codes that initiate the feature extraction and pattern matching operations of the μPD7761. Secondly, it functions as a spectral feature compressor, by retaining only one of a set of vectors whose values are close to each other [55]. Pattern compression is important because it allows a significant amount of reference memory to be saved, and it speeds up the calculations involved in pattern matching.

When a specific operation code is sent from the control & timing processor to the spectral recognizer, decoding is performed by the μPD7762, which provides the necessary control signals for execution. The μPD7762 also reports the result(s) obtained from the execution of the code to the control & timing processor. For example, if a training command code is received by the μPD7762, the following series of events occurs:

• the μPD7762 decodes the command;
• it activates the μPD7761 to extract spectral contents from the digitized input signal samples fed from the MC4760;
• the spectral information is sent to the μPD7762 for feature compression;
• the compressed spectral features are stored into the external pattern memory; and
• a successful training status flag is sent to the control & timing processor when all training procedures are completed. Otherwise, an error status is reported to the control & timing processor.
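The text does not say how the μPD7762 decides that feature vectors are "close to each other", so the following Python sketch shows only one plausible reading of the compression step: a frame is kept only when it differs from the last retained frame by more than a threshold. The function name, the use of the absolute magnitude difference as the closeness measure, and the threshold value are all assumptions made for illustration.

```python
def compress_frames(frames, threshold):
    """Keep a frame only if it differs enough from the last frame kept.

    frames:    list of feature vectors (one per analysis frame).
    threshold: hypothetical closeness limit on the summed absolute difference.
    """
    kept = []
    for frame in frames:
        if not kept:
            kept.append(frame)  # always keep the first frame
            continue
        difference = sum(abs(a - b) for a, b in zip(frame, kept[-1]))
        if difference > threshold:
            kept.append(frame)  # sufficiently different: retain it
    return kept


frames = [[10, 10], [10, 11], [11, 10], [20, 5], [20, 6], [3, 3]]
print(compress_frames(frames, threshold=2))
# [[10, 10], [20, 5], [3, 3]]
```

In this example six frames are reduced to three, which illustrates how compression of this kind saves template memory and shortens the subsequent pattern matching.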
5.2.4  Pattern Memory

The chip set supports at most 64 kbyte of pattern memory, which can store 512 signal templates. This pattern memory is divided into four banks, each of which consists of 16 kbyte of memory and can be randomly selected by the spectral recognizer in the training and recognition stages. In our prototype we used 32 kbyte of static RAM.

5.3  Software Program

The software program co-ordinates the functional operations of the timing analyzer and the spectral recognizer. Basically, it consists of different program modules which are responsible for various operational stages of the system. Such stages include the initialization of the system (the timing analyzer and the spectral recognizer), the signal timing analysis, and the signal training and recognition. The program module for the timing analysis is a direct implementation of the algorithm developed in Chapter 4, and the program module for the signal training and recognition was developed by using the specific set of commands provided by the chip-set manufacturer.

We start the detailed description of the software with a summary of the most important commands of the spectral recognizer control language. Then we present the three major modules of the program. These modules correspond to the three modes of operation of the system: initialization, training, and recognition.

5.3.1  The Command Set of the Spectral Recognizer

Twelve commands are provided to operate the spectral recognizer. These commands are sent by the control & timing processor to initiate specific operations.
Each command consists of a command code (8 bits), the required parameter(s), and a termination code marking the end of each command character string. Upon completion of the execution of the command, the status of the operation is reported to the control & timing processor by the μPD7762. A detailed description of the format of each command is given in Appendix B.

One of the special features of the spectral recognizer is the use of syntax numbers to group the reference signal templates. Such syntax numbers can be specified in the training and recognition stages. A valid syntax number can range from 0 to 127 [55]. If no syntax number is specified, the default value of zero is assumed. When the spectral recognizer learns the spectral features of a warning sound, this reference template will be assigned to the group of templates which have the same syntax number. Similarly, in the recognition stage, one or more syntax number(s) will be assigned to the unknown signal. To minimize useless comparisons, the spectral recognizer will use only the reference templates which have the same syntax number(s) as the unknown signal being examined.

In this work the syntax number is derived from the timing features of warning signals. From the timing analyzer the repetition period of the burst-type signal is obtained. Then, the syntax number of this warning signal is evaluated by dividing its repetition period by eight, in order to assure that the computed syntax number is bound within the allowable range. However, steady sounds have no repetition period. Therefore, the syntax number of 110 is assigned arbitrarily to this group of signal templates.
Furthermore, since telephone rings have by far the longest repetition period of all warning signals considered, any sound with a repetition period of about six seconds will be given the syntax number of the telephone group (101).

5.3.2  Initialization Stage

In the initialization stage the parallel port (INTEL 8255) is reset and configured to mode 0 operation (i.e. port A = bidirectional port, port B is set to output port for this implementation, four pins of port C are for handshaking signals and two other pins are for output control signals). Then, the three processors of the spectral recognizer are also reset, and the pattern memory is tested. If any I/O hardware interfacing problem occurs during the memory testing process, a failure status from the μPD7762 will be reported to the control & timing processor. Subsequently, the 1 kHz tone is fed to the MC4760 for signal level adjustment. After level adjustment, the experimentally determined distance threshold is set to constrain the distance calculations between an unknown signal and the reference patterns. Then the user is prompted for any pre-stored template(s) to be transferred from permanent storage to the active pattern memory. Table 5.4 shows the parameters used in the timing analysis and their initial values.

Table 5.4: Parameters used for the Timing Analyzer

Timing Analysis Parameter                             Designated Value
Minimum burst duration                                102.4 msec
Maximum burst duration                                4000.0 msec
Minimum detectable transition level                   2
Starting DAT                                          255
Duration to average the absolute signal amplitudes    12.8 msec

5.3.3  Training Stage

In the training stage we employ the "training-by-recognition" strategy to learn the characteristics of warning sounds. In brief, this strategy is achieved by three steps: 1) learning the timing features of warning sounds, 2) extracting their spectral features, and 3) verifying the learned spectral features.
First, the timing information of warning sounds is provided by the timing feature extraction program (cf. Section 4.4). With this information, warning sounds are classified into two groups: steady and burst-type sounds.

Following the timing analysis, the spectral recognizer will learn the spectral patterns of these sounds. For steady sounds, the spectral recognizer immediately learns the spectral features and subsequently stores them in the pattern memory under syntax number 110. For burst-type sounds, the spectral extraction process must be synchronized with the rising transition of the burst. As shown in Fig. 5.42, if the spectral recognizer idling time is known, this synchronization can be accomplished by activating the spectral recognizer prior to the expected beginning of the burst. With the learned timing information (i.e., repetition period and average signal burst width) of a burst-type warning sound, the idling time is obtained by subtracting the average signal burst width from the repetition period. The spectral patterns are then stored in the pattern memory under the syntax number derived from the detected repetition period.

To verify the learned spectral patterns of warning sounds, the process described above is repeated. If the results of the two sets of recognition procedures are identical, the training procedure is completed. Otherwise, the training procedure repeats until the sound is "learned". If the spectral recognizer cannot successfully learn the spectral features of the signal, the user can interrupt the spectral recognizer and restart the training procedure. Fig. 5.43 and Fig. 5.44 show the flowcharts of the training procedures for steady and burst-type warning sounds, respectively.

Figure 5.42: Timing relationships associated with the synchronization of the spectral recognizer to burst-type warning signals, where STAAA is the short-time average absolute amplitude of signal; RP is the repetition period; ASBW is the average signal burst width, and SR is the spectral recognizer

Specific information relevant to each warning signal is stored for identification. This information includes the syntax number, the pattern registration number which is automatically generated for each warning sound, the signal type (steady or burst-type), and an identifier (name) of the warning sound assigned by the user during training.

5.3.4  Recognition Stage

Signal recognition consists of two stages: 1) warning signal detection by the timing analyzer, and 2) signal recognition by the spectral recognizer. The system continuously monitors the variations of the short-time average absolute amplitude of sound in the environment. If a steady sound is detected, the spectral recognizer identifies the sound twice. If both recognition results identify the same known warning sound, the unknown sound is declared to be that warning sound.

If a potential burst-type sound is detected, its repetition period, burst width, and syntax number are derived. Based on these measurements, the spectral recognizer attempts to recognize the warning sound at the rising transition of the signal burst. If any spectral reference template can be matched to the unknown signal, the warning signal is identified with the known warning sound associated with that template. A flowchart of this recognition scheme is given in Fig. 5.45.

Upon completion of the recognition process, a summary of signal timing analysis and recognition results is displayed on the screen.
These results include the syntax  number, the signal type, the sound identifier, a n d the distance score from the m a t c h i n g calculations. A system operating m a n u a l has been w r i t t e n for users ( A p p e n d i x C ) .  Chapter 5. Design ic Implementation  115  OBTAIN SOUND SAMPLES  STEADY SOUND IDENTIFICATION  ASSIGN SYNTAX # 110  SPECTRAL ANALYSIS  GET SOUND SAMPLES  TRAINING COMPLETED F i g u r e 5.43: Flowchart of the training scheme for steady sounds  Chapter 5. Design iz Implementation  116  (START)  08TAIN SOUND SAMPLES BURST-TYPE SOUND IDENTIFICATION COMPUTE RP.AWSB  COMPUTE SR IDLING TIME DELAY) DELAY <- (RP-AWS8)  ELAY <— DELAY-1  NO  YES ASSIGN SYNTAX 1 RP/8  RP: REPETTTION PERIOD AWSB: AVERAGE WIDTH Of SIGNAL BURST Sft SPECTRAL RECOGNIZER  SPECTRAL ANALYSIS  GET SOUND SAMPLES  TRAINING COMPLETED  Figure 5.44: Flowchart of training procedures for burst-type warning sounds  Chapter 5 .  Design k Implementation  117  SUCCESS; A  Figure 5.45: Flowchart of the recognition procedure  Chapter 6  Evaluation  E x p e r i m e n t s were conducted to evaluate the performance of the W A R N S I S under different noisy situations. Performance criteria were the average recognition rate a n d the false-alarm rate. T h r e e noise backgrounds were used: 1) steady fan noise, 2) fan noise plus F M - r a d i o broadcasts, a n d 3) fan noise plus A M - r a d i o broadcasts. In view of the variations of spectra w i t h loudness and noise c o n t a m i n a t i o n (cf. Section 3.2.5), three templates were prepared for the spectral recognizer at different S N R s (i.e. l O d B , 20 d B , a n d 30 d B ) w i t h the steady fan as a noise source. Peterson demonstrated that i n order to hear sounds reliably i n the presence of noise, their spectral components have to be 15 d B to 25 d B above the background S P L level [17,18]. 
Furthermore, current standards demand that audible warning devices used in private residences produce an SPL at least 10 dBA above the average ambient level [11]. Therefore, we took the stricter criterion, which was to maintain the average SPL of the noisy background at a minimum of 10 dBC below the SPL of the warning sounds.

Throughout the experiments, a value of 62 dBC SPL was measured for steady noise. When a radio broadcast was introduced into the steady noise background, the variations in SPL of the environment were monitored for five minutes in order to provide the average SPL estimate of the noisy background. This estimate was obtained by averaging the SPL variations within the observed time interval. More specifically, this value was maintained at approximately 65 dBC. Note that the 3 dBC SPL increase was caused by acoustically adding two signals of equal strength (i.e. steady noise and radio-broadcast signal). Then, we activated an auditory warning device, and adjusted the loudness of the emitted sound so that the SPL reading was on average 10 dBC above the noisy background.

The set-up for these experiments was similar to the one used for the measurement of the average short-time absolute amplitude of warning sounds in Chapter 3. Siren sounds were emitted from a siren horn; the pre-recorded telephone rings and smoke alarm sounds were produced by a tape recorder; and the radio broadcasts originated from a radio-cassette player.

To explore the contribution of the timing and spectral recognizer parts to the performance of the WARNSIS, we also evaluated the recognition rate and the false-alarm rate using these subsystems separately. Specifically, for the timing analyzer part alone, the repetition period was our prime feature for warning sound recognition.
Since steady sounds have no repetition period, their recognition accuracy cannot be determined under these circumstances. In the training stage, the timing analyzer learned the repetition periods from the warning sounds. To recognize a warning sound, the repetition period of an unknown sound was extracted and compared to the values of the pre-stored repetition periods. If the absolute difference was less than 10 % of the pre-stored repetition period used in the comparison, the unknown sound was assigned to the corresponding reference warning sound.

For the spectral recognizer part alone, the environmental sounds were continuously monitored. Under the steady noise background, the spectral recognizer learned the signal templates using the 'training-by-recognition' scheme. For signal recognition, only spectral features were used, without any timing information.

6.1  Average Recognition Accuracies

For each warning sound the recognition rate was derived by dividing the number of times the correct sound was identified by the total number of times the sound was present. The average accuracy for each of the three types of warning sounds is the average of the recognition rates calculated for all sounds belonging to the type. The detailed calculations may be found in Appendix D. Table 6.5 shows the summary of recognition results for the complete WARNSIS, the timing analyzer part alone, and the spectral recognizer part alone.
The first column gives the three types of noisy backgrounds in which the experiments were conducted; the second column shows the types of warning sounds used: 1) 'burst', denoting burst-type sounds, 2) 'steady', denoting steady sounds, and 3) 'phone', denoting telephone rings; the third, fourth, and fifth columns give the average recognition accuracies (ARA) achieved by the complete WARNSIS, the timing analyzer part alone, and the spectral recognizer part alone, respectively. The recognition results for the spectral recognizer part alone in a steady noise background were reported in [71].

Table 6.5: A summary of recognition results with MBD set to 0.1024 sec

Background          Type of          Complete    Timing      Spectral
noise               warning sound    WARNSIS     analyzer    recognizer
                                                 alone       alone
                                     ARA (%)     ARA (%)     ARA (%)

Steady noise        Burst            100.0       100.0       100.0
                    Steady           100.0       N/A          97.6
                    Phone            100.0       100.0        95.8

FM + steady noise   Burst             98.0        97.7        65.6
                    Steady           100.0       N/A          91.1
                    Phone              0.0         0.0        70.0

AM + steady noise   Burst             99.3        98.3        67.2
                    Steady           100.0       N/A          91.1
                    Phone              0.0         0.0        69.2

ARA: average recognition accuracy in %
MBD: minimum burst duration, 0.1024 sec
N/A: not applicable

In steady noise background, the complete WARNSIS produced 100 % average recognition accuracy for all three types of warning sounds. The timing analyzer part alone yielded perfect recognition scores for burst-type sounds and phone rings, and the spectral recognizer part alone gave more than 95 % average recognition accuracy in all cases. As mentioned previously, the timing analyzer can detect the presence of steady sounds, but cannot distinguish any particular steady sound. Therefore, no average recognition accuracy can be given for steady sounds in the column for the timing analyzer part alone.

With the addition of FM broadcast to the steady noise, the complete WARNSIS could still reliably identify burst-type and steady sounds. As shown in Table 6.5, the recognition accuracies were measured as 98.0 % for burst-type sounds, and 100 % for steady sounds. However, the complete WARNSIS failed to recognize the telephone rings. Under the same noisy conditions the timing analyzer could recognize burst-type sounds with a 97.7 % average recognition accuracy, but failed to detect the presence of telephone rings. For the spectral recognizer part alone the average accuracy dropped from 100 % to 65.6 % for burst-type sounds, and was reduced from 95.8 % to 70 % for phone rings. However, this subsystem could still achieve a 91.1 % average recognition accuracy for steady sounds.

These results indicate that the complete WARNSIS consistently obtains higher recognition accuracies for burst-type and steady sounds than its subsystems do separately. On close examination, the complete WARNSIS gives a recognition accuracy 0.3 percentage points higher than that of the timing analyzer part for burst-type sounds with the background of FM broadcast plus steady fan noise. In the same situation, the complete WARNSIS outperforms the spectral recognizer by 32.4 percentage points in identifying burst-type sounds (98.0 % against 65.6 % in Table 6.5), and by 8.9 percentage points in correctly recognizing different steady sounds.
Similar results were also obtained when AM-radio broadcast plus steady noise was used as background.

With the background of radio broadcast, both the complete WARNSIS and the timing analyzer failed to detect the presence of phone rings. Analysis showed that this is due to the value of the minimum burst duration (MBD) selected. Setting the MBD to 1.024 sec greatly improves phone ring recognition. Table 6.6 gives the recognition results with this MBD value. Over 92 % recognition accuracy for phone rings is achieved by the complete WARNSIS, and the timing analyzer can always correctly identify the presence of phone rings in radio-broadcast backgrounds. According to the timing analysis algorithm, the modification of the minimum burst duration has no effect on the performance of the complete WARNSIS in steady sound recognition, or on that of the spectral recognizer alone in all noise situations. Therefore, we reproduced those average recognition accuracies from Table 6.5 in Table 6.6. The effect of different MBDs on the performance of the WARNSIS is discussed in detail in Section 6.3.1.

Table 6.6: A summary of recognition results with MBD set to 1.024 sec

Background          Type of          Complete    Timing      Spectral
noise               warning sound    WARNSIS     analyzer    recognizer
                                                 alone       alone
                                     ARA (%)     ARA (%)     ARA (%)

Steady noise        Burst              0           0         100.0
                    Steady           100.0       N/A          97.6
                    Phone            100.0       100.0        95.8

FM + steady noise   Burst              0           0          65.6
                    Steady           100.0       N/A          91.1
                    Phone             92.5       100.0        70.0

AM + steady noise   Burst              0           0          67.2
                    Steady           100.0       N/A          91.1
                    Phone             94.2       100.0        69.2

ARA: average recognition accuracy in %
MBD: 1.024 sec
N/A: not applicable
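The recognition rate and the average recognition accuracy (ARA) defined at the start of this section can be sketched in a few lines. This is only an illustration of the arithmetic: the function names and the trial counts are hypothetical and are not taken from Appendix D.

```python
def recognition_rate(correct, presented):
    """Percentage of presentations in which the correct sound was identified."""
    if presented <= 0:
        raise ValueError("presented must be positive")
    return 100.0 * correct / presented


def average_recognition_accuracy(trials):
    """Mean of the per-sound recognition rates for one type of warning sound.

    `trials` maps a sound name to a (correct, presented) pair.
    """
    rates = [recognition_rate(c, n) for c, n in trials.values()]
    return sum(rates) / len(rates)


# Hypothetical counts for three burst-type sounds (illustration only).
burst_trials = {
    "siren_a": (50, 50),
    "siren_b": (48, 50),
    "smoke_alarm": (49, 50),
}
print(round(average_recognition_accuracy(burst_trials), 1))  # 98.0
```

Note that the ARA is an average of per-sound rates, so each sound carries equal weight regardless of how many times it was presented.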
6.2  False-alarm Rates

Since the occurrence of warning sounds in real-life environments is quite infrequent, it is essential for the WARNSIS not only to achieve an acceptable recognition accuracy for various sounds, but also to operate with a low false-alarm rate.

With the same experimental set-up as used before, we recorded the number of false alarms over a long period of time. The false-alarm rates for the complete WARNSIS, the timing analyzer part alone, and the spectral recognizer part alone were determined.

Table 6.7 shows that in steady noise situations the WARNSIS produces no false alarms. With radio-broadcast background the false-alarm rate may be as high as 2.33 per hour. Interestingly, phone ring false alarms are never produced.

For the timing analyzer alone the 'worst' false-alarm rate is 144.59 mis-recognitions per hour, of which 113 belong to burst-type, 31 to steady, and 0.59 to phone ring sounds. In the two radio-broadcast backgrounds, over 99 % of mis-recognitions are classified as burst-type or steady sounds.

For the spectral recognizer alone, the 'worst' false-alarm rate is 1848 mis-recognitions per hour, of which 21 belong to burst-type, 200 to steady, and 1627 to phone ring sounds. In the two radio-broadcast backgrounds, most of these mis-recognitions (about 88 % with FM and about 74 % with AM, from Table 6.7) are classified as phone rings.

With the MBD set to 1.024 sec, the WARNSIS gave no false phone indications whatever the noise conditions were (Table 6.8). Since the different MBDs have no effect on the performance of the spectral recognizer, the false-alarm rates for the spectral recognizer in Table 6.7 are reproduced in Table 6.8.

Although it is very difficult to quantify, experience has shown that the false-alarm rate is highly dependent on the type of music played.
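The false-alarm figures in this section are rates per hour, and the percentage breakdowns quoted above are shares of those rates. A minimal sketch of both calculations follows; the function names are hypothetical, the rates are copied from the FM rows of Table 6.7, and the count and duration in the last line are invented for illustration because the observation time is not stated here.

```python
def false_alarm_rate(false_alarms, hours):
    """False recognitions per hour of observation."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return false_alarms / hours


def share_percent(rates, kinds):
    """Percentage of the total rate contributed by the listed kinds."""
    total = sum(rates.values())
    return 100.0 * sum(rates[k] for k in kinds) / total


# Rates (#/hour) from Table 6.7, FM + steady noise background.
timing_fm = {"burst": 49, "steady": 35, "phone": 0.76}
spectral_fm = {"burst": 21, "steady": 200, "phone": 1627}

print(round(share_percent(timing_fm, ["burst", "steady"]), 1))  # 99.1
print(round(share_percent(spectral_fm, ["phone"]), 1))          # 88.0

# Hypothetical count and duration, for illustration only.
print(round(false_alarm_rate(7, 3), 2))                         # 2.33
```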
Table 6.7: Results of the false-alarm test with MBD set to 0.1024 sec

Background          Mis-recognized   Complete    Timing      Spectral
noise               as               WARNSIS     analyzer    recognizer
                                                 alone       alone
                                     FAR         FAR         FAR
                                     (#/hour)    (#/hour)    (#/hour)

Steady noise        Burst              0           0            0
                    Steady             0           0            0
                    Phone              0           0            0
                    Total              0           0            0

FM + steady noise   Burst              1.33       49           21
                    Steady             1.0        35          200
                    Phone              0           0.76      1627
                    Total              2.33       84.76      1848

AM + steady noise   Burst              0.5       113          153
                    Steady             0.5        31          296
                    Phone              0           0.59      1270
                    Total              1.0       144.59      1719

FAR: false-alarm rate
MBD: 0.1024 sec

Table 6.8: Results of the false-alarm test with MBD set to 1.024 sec

Background          Mis-recognized   Complete    Timing      Spectral
noise               as               WARNSIS     analyzer    recognizer
                                                 alone       alone
                                     FAR         FAR         FAR
                                     (#/hour)    (#/hour)    (#/hour)

Steady noise        Burst              0           0            0
                    Steady             0           0            0
                    Phone              0           0            0
                    Total              0           0            0

FM + steady noise   Burst              0           2.67        21
                    Steady             1.0        36          200
                    Phone              0           4         1627
                    Total              1.0        42.67      1848

AM + steady noise   Burst              0           4.67       153
                    Steady             0.5        26          296
                    Phone              0           9.33      1270
                    Total              0.5        40.0       1719

FAR: false-alarm rate
MBD: 1.024 sec

6.3  Discussion

6.3.1  Average Recognition Accuracies

Table 6.5 shows that the combined use of timing and spectral characteristics of warning sounds gives a better recognition scheme for burst-type and steady sounds than any scheme using only one of them. In particular, for these two types of warning sounds in radio-broadcast backgrounds, the complete WARNSIS gives an average recognition accuracy at least 0.3 percentage points better than that of the timing analyzer alone, and at least 8 percentage points better than that of the spectral recognizer part alone.

The explanation for the failure of the complete WARNSIS and the timing analyzer to recognize phone rings is as follows. Fig. 6.46 gives an example of a phone ring sequence with added nonstationary background noise. The phone ring sequence consists of two 2-second bursts (B1 and B3) separated by 4 seconds of silence. After the first phone ring, the burst B1 is detected by the timing analyzer, and the time markers for both rising and falling transitions are located. Without storing the detected burst waveform, the timing analyzer continues to monitor the environmental sounds. During the silence interval, a burst B2, which may be caused by radio music or conversation, is also detected by the timing analyzer. Unfortunately, the two criteria for a successful detection of a potential repetitive burst sequence are satisfied (i.e. W2 > MBD, and the burst interarrival time > MIAT). Therefore, the repetition period for these bursts is calculated and compared to the pre-stored template values. Mis-recognition as one of the warning sounds occurs if this value matches any one of the pre-stored values. Otherwise, the timing analyzer considers this burst sequence to be caused by random noise, and the time markers are cleared as it restarts the search for another potential burst sequence. Similarly, the timing analyzer decides on either mis-recognition or random noise rejection for the following phone bursts (i.e. B3 in Fig. 6.46). As a result, the timing analyzer fails to detect the presence of phone rings. If the timing analyzer cannot provide the timing information on the phone ring sequence, the WARNSIS cannot utilize this timing analysis result, and eventually it also cannot identify the presence of phone rings.

Figure 6.46: An example of a phone ring sequence added with nonstationary background noise

In Table 6.6 we find that the timing analyzer performs better than the complete WARNSIS in phone ring recognition. An explanation for this observation is as follows. For the timing analyzer part alone the repetition period is the only feature used to detect the presence of phone rings. Since the repetition periods of the phone ring sequences used are approximately six seconds, the timing analyzer cannot identify the sounds emitted from a specific telephone ringer. However, based on the timing information derived from a phone ring sequence, the complete WARNSIS then examines the spectral content of a phone ring and compares it to the pre-stored spectral patterns belonging to the group of telephone rings. Thus, the complete WARNSIS not only identifies the sound as a phone ring, but also provides additional information on the specific ringer. As deduced from Tables D.25 and D.27 in Appendix D (a complete set of evaluation results), the decreased recognition rate occurs even though the WARNSIS identifies the correct ringer, because it chooses the incorrect loudness or pitch template. This is because of the similar spectral characteristics between templates with adjacent settings (cf. Section 3.2). In a practical system, however, this would not matter as long as the "phone is ringing" event is detected.

The repetition period of burst-type sounds ranges from 140 msec to 3.2 sec. With the value of the minimum burst duration changed from 0.1024 sec to 1.024 sec, the timing analyzer is prevented from extracting timing features of those burst-type sounds with repetition periods less than 1.024 sec.
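The two timing criteria discussed in this section, together with the 10 % repetition-period tolerance used by the timing analyzer, can be illustrated with a small sketch. It is not the thesis implementation (which was written in assembler). The text names MBD and MIAT but does not give a MIAT value here, so the reading of MIAT as a minimum interarrival time, its value, the template periods, and all function names are assumptions made for illustration.

```python
MBD_SHORT = 0.1024  # seconds, MBD used for Table 6.5
MBD_LONG = 1.024    # seconds, MBD used for Table 6.6


def accept_burst(width, interarrival, mbd, miat):
    """Return True when a burst meets both timing criteria."""
    return width > mbd and interarrival > miat


def match_repetition_period(period, templates, tolerance=0.10):
    """Return the first template whose stored period differs from `period`
    by less than `tolerance` times the stored period, else None."""
    for name, stored in templates.items():
        if abs(period - stored) < tolerance * stored:
            return name
    return None


# Hypothetical templates: repetition periods in seconds.
templates = {"burst_example": 0.5, "phone": 6.0}
MIAT = 0.05  # hypothetical minimum interarrival time in seconds

print(accept_burst(0.25, 0.5, MBD_SHORT, MIAT))  # True
print(accept_burst(0.25, 0.5, MBD_LONG, MIAT))   # False
print(accept_burst(2.0, 6.0, MBD_LONG, MIAT))    # True
print(match_repetition_period(5.7, templates))   # phone
print(match_repetition_period(4.0, templates))   # None
```

With the longer MBD, a 0.25-second burst is rejected before its repetition period is ever measured, which is consistent with the zero burst-type accuracies in Table 6.6, while a 2-second phone burst still passes.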
However, the modification has no effect on the steady sound recognition performance of the complete WARNSIS because steady sounds require a minimum burst duration of four seconds.

6.3.2  False-alarm Rates

The false-alarm results indicate that the combined use of timing and spectral features to characterize warning sounds provides an effective scheme to reduce false recognitions triggered by environmental noise. For random noise there are no false alarms. In the presence of FM broadcasts, the complete WARNSIS gives a false-alarm rate of about 2.33 false recognitions per hour, which we consider to be unacceptably high for a practical recognition system operating in real-life environments. It should be remembered, however, that the measurements presented here represent the 'worst-case' false-alarm performance of the WARNSIS. Real-life performance should be better, since SNRs are usually higher than the 10 dB used in our measurements. Evaluation of performance in use will require field testing beyond the scope of this work.

The specifications for the WARNSIS are given in Appendix E.

Chapter 7

Conclusions and Recommendations

7.1  Summary & Conclusions

This work was divided into two major parts: 1) the analysis of warning sounds, and 2) the design of a prototype recognition device based on (1). An extensive search for existing warning sound characteristics yielded only a limited amount of timing and spectral information. Therefore, we used various timing and spectral analysis techniques to study the warning sounds emitted by telephones, smoke alarms, and electronic siren drivers.

First, the short-time average absolute amplitudes of warning sounds were analyzed to provide timing features.
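The short-time average absolute amplitude mentioned above can be sketched as follows. This is a minimal illustration rather than the procedure of Chapter 3: the non-overlapping framing, the frame length, the fixed threshold, and the function names are all assumptions.

```python
def short_time_avg_abs(samples, frame_len):
    """Mean absolute amplitude of each consecutive, non-overlapping frame."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    n_frames = len(samples) // frame_len
    return [
        sum(abs(x) for x in samples[k * frame_len:(k + 1) * frame_len]) / frame_len
        for k in range(n_frames)
    ]


def label_frames(envelope, threshold):
    """Mark each frame as 'burst' or 'silence' by comparing it with a threshold."""
    return ["burst" if a > threshold else "silence" for a in envelope]


# Synthetic signal: a burst, a gap, and a second burst (arbitrary units).
signal = [1, -1] * 4 + [0] * 8 + [1, -1] * 4
envelope = short_time_avg_abs(signal, frame_len=4)
print(envelope)                     # [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
print(label_frames(envelope, 0.5))
```

The rising and falling transitions of such an envelope are what yield burst widths and repetition periods; a sound whose envelope stays high throughout would fall into the steady category.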
Results show that warning sounds can be categorized into either steady or burst-type sounds.

Secondly, Welch's nonoverlapping spectral estimation method was used to analyze the short-time spectra of warning sounds. Our findings indicate that the spectra of telephone rings produced from electromechanical ringers of dial phones of the same model may vary significantly. These spectral characteristics also depend on the setting of the loudness adjustments provided. Typically, the short-time spectra of a two-second telephone ring consist of two discernible parts: the transient region and the steady-state region. Analyses were also performed on telephone rings emitted from an electronic ringer. Results indicate that by varying the pitch setting, the two tones generated from the ringer change accordingly. For siren sounds, the short-time spectra can be divided into two groups: 1) spectra with rich harmonics, and 2) spectra with frequency clusters.

Based on the timing and spectral analysis results, a 'hybrid' prototype recognition device (WARNSIS) was developed and constructed using commercially available components. This device utilizes a combination of timing and spectral features of warning sounds as signal patterns. A 'real-time' algorithm is used to extract timing features in noisy environments. According to the relative timing characteristics of these features, warning sounds are classified. Then, the incoming signals are passed on for spectral analysis. A filter-bank approach is employed to analyse the short-time spectra of warning sounds. To categorize these spectral patterns, the timing information of warning sounds is used to group these patterns with sounds of similar timing features.
This grouping technique greatly reduces the amount of computation involved in the recognition stage.

The real-time program to extract timing features was written in assembler language. The spectral recognizer was constructed with commercial electronic components. A software operating system was developed to co-ordinate the timing analyzer and the spectral recognizer. Our device consists of 79 chips, and the software program comprises 2490 lines of assembler source code.

Experiments were conducted to investigate the performance of the WARNSIS in noisy environments. For burst-type and steady sounds, the WARNSIS provides average recognition accuracies over 98 %. With regard to the false-alarm rates, the complete WARNSIS gives much lower values than the false-alarm rates of its separate timing and spectral subsystems.

In this work, we designed, constructed, and evaluated a warning sound recognition system. The evaluation results indicate that the WARNSIS operates satisfactorily in real environments, where it can be taught to learn new sounds and to recognize them. This system will reliably recognize warning sounds in random noise with no false alarms. In very loud music and conversation the recognition is still good, although more false alarms are created. Considering that our evaluation criteria have been very stringent, the performance of the system in real-life situations is expected to be satisfactory.

7.2  Recommendations for Future Directions of Research

To improve the performance of the complete WARNSIS in noisy environments with an SNR lower than 10 dB, future work should be directed towards the following:

1. The improvement of the transition or break-point detection scheme and implementation: In the present design none of the short-time average amplitudes are stored for analysis. It is feasible to store these amplitude values, and then use a fast CPU to analyze the stored signal amplitude samples. A faster CPU than the one presently employed will permit more elaborate analysis of these amplitude samples, so that the timing analyzer becomes more intelligent in rejecting unwanted transient noises. A possible extension of this work is to use the shape of the amplitude contours of burst-type sounds to provide additional signal features.

2. Exploration of the adaptive noise cancellation (ANC) technique: Since noise in this work consists of music, speech signals and transient noises, cancellation of these noises in real-life environments leads us into unexplored territory. We then need to find a suitable ANC algorithm and explore its implementation for optimum performance. For real-time operation, a compromise may exist between the SNR improvement and the complexity of the algorithm.

3. Use of a microphone array to provide better spatial separation between the warning sound source and background noise: A microphone array can provide a much sharper directional beam, and hence a better quality warning sound, than a single directional microphone. Research in this area should involve the selection of the microphone array structure, its orientation, and a signal processing algorithm to analyze the outputs from the microphone array to yield the desired output. A possible extension is the combined use of adaptive noise cancellation and a multi-microphone array system for sound tracking capability and noise removal enhancement.
Research in this area will require a multiple digital signal processing (DSP) system to facilitate the real-time operation in nonstationary noise environments.

References

[1] J. E. Harkins and C. J. Jensema, Focus-group discussions with deaf and severely hard of hearing people on needs for sensory devices, Gallaudet Research Institute, Technology Assessment Program, Washington D.C., 1987.
[2] J. Hurvitz and R. Carmen, Special Devices for Hard of Hearing, Deaf, and Deaf-Blind Persons, Little, Brown and Company, Boston, 1981.
[3] T. Hustak, Directory of Technical Aids Available to Hearing Impaired Persons, Services for Hearing Impaired Persons, Inc., Regina, Saskatchewan, 1984.
[4] J. E. Harkins, C. J. Jensema and H. Ryland, "Toward Emergency Vehicle Detection: Systemic Considerations", Proceedings of ICARRT at Montreal, pp.228-229, 1988.
[5] Underwriters Laboratories Inc. Standard for Safety UL217: "Single and Multiple Station Smoke Detectors", Oct., 1985.
[6] Underwriters Laboratories Inc. Standard for Safety UL985: "Household Fire Warning System Units", June, 1985.
[7] Underwriters Laboratories Inc. Standard for Safety UL904: "Vehicle Alarm Systems and Units", July, 1982.
[8] Canadian Standards Association, National Standard of Canada, CAN/CSA-T510-M87, "Performance and Compatibility Requirements for Telephone Sets", March, 1987.
[9] Electronic Industries Association, EIA-470-A, "Telephone Instruments with Loop Signalling for Voiceband Applications", 1988.
[10] Bell System Voice Communications Technical Reference, PUB 48005, "Functional Product Class Criteria: Telephones", Jan., 1980.
[11] National Fire Protection Association, NFPA 72G, "Guide for the Installation, Maintenance and Use of Notification Appliances for Protective Signalling Systems", 1985.
[12] National Fire Protection Association, NFPA 72A, "Standard for Installation, Maintenance and Use of Local Protective Signalling Systems for Guard's Tour, Fire Alarm and Supervisory Service", 1985.
[13] R. E. Halliwell and M. A. Sultan, "Attenuation of Smoke Detector Alarm Signals in Residential Buildings", National Research Council Canada, Institute for Research in Construction, NRCC 25897.
[14] S. Miyazaki and A. Ishida, "Traffic-alarm Sound Monitor for Aurally Handicapped Drivers", J. of Medical & Computer, Vol.25, pp.68-74, Jan., 1987.
[15] Installation and Service Instructions for Model MCS-1 Motor Signal, Federal Signal Corporation.
[16] Installation Manual for Electronic Siren, Model SA 400-63, Southern Vehicle Products, Inc.
[17] R. D. Patterson, CAA Paper 82017, Civil Aviation Authority, London, U.K., 1982.
[18] J. Edworthy and R. D. Patterson, "Ergonomic Factors in Auditory Warnings", Ergonomics International 85, edited by I. D. Brown, R. Goldsmith, K. Coombes and M. A. Sinclair, pp.232-235, 1985.
[19] Lower and Wheeler, "Design of Auditory Warnings for Aircraft, Industry and Hospitals", Ergonomics International 85, edited by I. D. Brown, R. Goldsmith, K. Coombes and M. A. Sinclair, pp.226-228, 1985.
[20] G. M. Rood, J. A. Chillery and J. B. Collister, "Requirements and Application of Auditory Warnings to Military Helicopters", Ergonomics International 85, edited by I. D. Brown, R. Goldsmith, K. Coombes and M. A. Sinclair, pp.169-170, 1985.
[21] M. J. Shailer and R. D. Patterson, "Pulse generation for Auditory Warning Systems", Ergonomics International 85, edited by I. D. Brown, R. Goldsmith, K. Coombes and M. A. Sinclair, pp.229-231, 1985.
[22] J. H. Kerr, "Warning Devices", Br. J. Anaesth., 57, pp.696-708, 1985.
[23] R. D. Patterson, J. Edworthy and M. J. Shailer, "Alarm sounds for Medical Equipment in Intensive Care Areas and Operation Theatres", Institute of Sound and Vibration Research Paper AC598, 1986.
[24] S. M. Kay and S. L. Marple, Jr., "Spectral Analysis: A Modern Perspective", Proceedings of IEEE, Vol.69, No.11, pp.1380-1419, Nov., 1981.
[25] B. S. Atal and M. R. Schroeder, "Linear Prediction Analysis of Speech based on a Pole-zero Representation", Journal of Acoust. Soc. of Amer., Vol.64, No.5, pp.1310-1318, Nov., 1978.
[26] J. Makhoul, "Linear Prediction: A tutorial Review", Proceedings of IEEE, Vol.63, pp.561-580, Apr., 1975.
[27] R. B. Blackman and J. W. Tukey, "The Measurement of Power Spectra from the point of view of Communication Engineering", New York, Dover, 1959.
[28] P. D. Welch, "The Use of fast Fourier transform for the estimation of Power Spectra: A method based on Time Averaging over Short, Modified Periodograms", IEEE Trans. on Audio Electroacoust., Vol.AU-15, pp.70-73, June, 1967.
[29] G. C. Carter and A. H. Nuttall, "On the Weighted Overlapped Segment Averaging Method for Power Spectral Estimation", Proc. of the IEEE, Vol.68, No.10, pp.1352-1353, Oct., 1980.
[30] J. S. Lim, "All Pole Modelling of Degraded Speech", IEEE Trans. on ASSP, Vol.ASSP-26, pp.197-209, June, 1978.
[31] S. M. Kay, "The Effects of Noise on the Autoregressive Spectral Estimator", IEEE Trans. on ASSP, Vol.ASSP-27, pp.478-485, Oct., 1979.
[32] F. J. Harris, "On the Use of Windows for Harmonic Analysis with the Discrete Fourier transform", Proceedings of IEEE, Vol.66, No.1, pp.51-83, Jan., 1978.
[33] D. N. Romalo, "An Interference Monitor for a Radio Observatory", M.A.Sc. Thesis, Dept. of Electrical Engineering, University of British Columbia, pp.42-44, April, 1988.
[34] Simon Chau and Charles Laszlo, "Spectra of Telephone Rings and Annunciating Signals used in an Aid for Hearing Impaired", Proceedings of the 13th CMBEC, pp.147-148, Halifax, June, 1987.
[35] B. S. Atal and L. R. Rabiner, "Speech Research Directions", AT&T Technical Journal, Vol.62, No.5, Sept/Oct., pp.75-88, 1986.
[36] S. E. Levinson, "Structural Methods in Automatic Speech Recognition", Proceedings of IEEE, Vol.73, No.11, Nov., pp.1625-1650, 1985.
[37] L. R. Rabiner and S. E. Levinson, "Isolated and Connected Word Recognition Theory and Selected Applications", IEEE Trans. on Communications, Vol.COM-29, No.5, pp.621-659, May, 1981.
[38] D. O'Shaughnessy, "Speech Recognition", IEEE ASSP Magazine, pp.4-17, Oct., 1986.
[39] H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", IEEE Trans. on ASSP, Vol.ASSP-26, No.1, pp.43-49, Feb., 1978.
[40] A. H. Gray, Jr. and J. D. Markel, "Distance Measures for Speech Processing", IEEE Trans. on ASSP, Vol.ASSP-24, No.5, pp.380-391, Oct., 1976.
[41] N. Nocerino, F. K. Soong, L. R. Rabiner and D. H. Klatt, "Comparative study of Several Distortion Measures for Speech Recognition", Proc. ICASSP, pp.25-28, 1985.
[42] H. Matsumoto and H. Imai, "Comparative Study of Variable Spectrum Matching Measures on Noise Robustness", Proc. ICASSP, pp.769-772, 1986.
[43] R. F. Purton, "Speech Recognition Using Autocorrelation Analysis", IEEE Trans. on Audio and Electroacoustics, Vol.AU-16, No.2, pp.235-239, June, 1968.
[44] M. M. Sondhi, "New Methods for Pitch Detection", IEEE Trans. on Audio Electro., Vol.AU-16, pp.262-266, June, 1968.
[45] L. R. Rabiner, "On the Use of Autocorrelation Analysis for Pitch Detection", IEEE Trans. on ASSP, Vol.ASSP-25, No.1, pp.24-33, Feb., 1977.
[46] J. J. Dubnowski, R. W. Schafer and L. R. Rabiner, "Real-time digital Hardware pitch detector", IEEE Trans. on ASSP, Vol.ASSP-24, pp.2-8, Feb., 1976.
[47] L. R. Rabiner and M. R. Sambur, "An Algorithm for determining the Endpoints of Isolated Utterances", The Bell System Technical Journal, Vol.54, No.2, pp.297-315, Feb., 1975.
[48] M. T. Whitaker and J. A. S. Angus, "A Low Cost Continuous Word Speech Recognizer", International Conf. on Speech Input/Output Techniques and Applications, IEE Conf. Publication # 258, pp.119-123, March, 1986.
[49] L. F. Lamel, L. R. Rabiner, A. E. Rosenberg and J. G. Wilpon, "An Improved Endpoint Detector for Isolated Word Recognition", IEEE Trans. on ASSP, Vol.ASSP-29, No.4, August, pp.777-785, 1981.
[50] J. G. Ackenhusen and L. R. Rabiner, "Microprocessor implementation of an LPC-based isolated word recognizer", in Proc. 1980 BTL/WE Microprocessor Symp., Sept., pp.35-42, 1980.
[51] B. Gold and L. R. Rabiner, "Parallel Processing Technique for Estimation Pitch Periods of Speech in the Time Domain", J. Acoust. Soc. Amer., Vol.46, pp.442-448, Aug., 1969.
[52] B. Gold, "Note on buzz-hiss detection", J. Acoust. Soc. Amer., Vol.36, pp.1659-1661, 1964.
[53] G. M. White and R. B. Neely, "Speech Recognition Experiments with Linear Prediction, Bandpass Filtering and Dynamic Programming", IEEE Trans. on ASSP, Vol.ASSP-24, No.2, pp.183-188, April, 1976.
[54] H. L. Kwok, L. C. Tai, and Y. M. Fung, "Machine Recognition of the Cantonese Digits Using Bandpass Filters", IEEE Trans. on ASSP, Vol.ASSP-31, No.1, pp.220-222, Feb., 1983.
[55] NEC Speech Recognition LSI Set Manual, June, 1985.
[56] D. Tjostheim, "Recognition of Waveforms Using Autoregressive Feature Extraction", IEEE Trans. on Computer, Vol.C-26, No.3, pp.268-270, March, 1977.
[57] B. S. Atal and M. R. Schroeder, "Adaptive Predictive Coding of Speech Signals", Bell System Tech. Journal, Vol.49, pp.1973-1986, 1971.
[58] J. G. Ackenhusen and Y. H. Oh, "Single-chip Implementation of Feature Measurement for LPC-based Speech Recognition", AT&T Technical Journal, Vol.64, No.8, pp.1787-1805, Oct., 1985.
[59] B. A. Dautrich, L. R. Rabiner and T. B. Martin, "On the Effects of Varying Filter Bank Parameters in Isolated Word Recognition", IEEE Trans. on ASSP, Vol.ASSP-31, No.4, pp.793-806, August, 1983.
[60] J. S. Lim, "Estimation of LPC coefficients from speech waveforms degraded by additive random noise", Proc. ICASSP 78, pp.599-601.
[61] J. Tierney, "A Study of LPC Analysis of Speech in Additive Noise", IEEE Trans. on ASSP, Vol.ASSP-28, No.4, pp.389-397, August, 1980.
[62] B. S. Atal, "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification", J. Acoust. Soc. Am., Vol.55, No.6, pp.1304-1312, June, 1974.
[63] B. H. Juang, L. R. Rabiner and J. G. Wilpon, "On the use of Bandpass Liftering in Speech Recognition", IEEE Trans. on ASSP, Vol.ASSP-35, No.7, pp.947-953, July, 1987.
[64] B. A. Hanson and H. Wakita, "Spectral Slope Distance Measures with Linear Prediction Analysis for Word Recognition in Noise", IEEE Trans. on ASSP, Vol.ASSP-35, No.7, pp.968-973, July, 1987.
[65] S. E. Levinson, L. R. Rabiner, and M. M.
Sondhi, " A n I n t r o d u c t i o n to the A p p l i cation of the T h e o r y of P r o b a b i l i s t i c Functions of a M a r k o v Process to A u t o m a t i c Speech R e c o g n i t i o n " , T h e B e l l System Technical J o u r n a l , Vol.62, pp.1035-1074, A p r i l , 1983. [66] L . R . R a b i n e r , S. E . Levinson, and M . M . S o n d h i , " O n the A p p l i c a t i o n of Vector Q u a n t i z a t i o n a n d H i d d e n M a r k o v M o d e l s to Speaker-Independent, Isolated W o r d R e c o g n i t i o n " , T h e B e l l S y s t e m Technical J o u r n a l , Vol.62, N o . 4 , pp.1075-1105, A p r i l , 1983. [67] A . V a r g a , R . M o o r e , J . B r i d l e , K . P o n t i n g , a n d M . Russell, "Noise C o m p e n s a t i o n A l g o r i t h m s for use w i t h H i d d e n M a r k o v M o d e l based Speech R e c o g n i t i o n " , P r o c . of I E E E Conf., pp.481-484, 1988. [68] R . W . Schafer a n d L . R . R a b i n e r , " D i g i t a l Representation of Speech Signals", Proceedings of I E E E , Vol.63, N o . 4 , pp.662-677, A p r i l , 1 9 7 5 . [69] L . R . R a b i n e r , A . E . Rosenberg and S. E . L e v i n s o n , "Considerations i n D y n a m i c T i m e W a r p i n g A l g o r i t h m s for Discrete W o r d R e c o g n i t i o n " , I E E E Trans, on A S S P , V o l . A S S P - 2 6 , N o . 6 , pp.575-582, D e c , 1978. [70] C . M y e r s , L . R . R a b i n e r and A . E . Rosenberg, "Performance Tradeoffs i n D y n a m i c T i m e W a r p i n g A l g o r i t h m s for Isolated W o r d R e c o g n i t i o n " , I E E E Trans, on A S S P , V o l . A S S P - 2 8 , N o . 6 , pp.623-635, D e c , 1980. [71] S i m o n C h a u a n d Charles Laszlo, " A W a r n i n g Signal Identification System ( W A R N S I S ) for H a r d of H e a r i n g I n d i v i d u a l s " , Proceedings of the 14 CMBEC, pp.145-146, M o n t r e a l , June, 1988. 
Appendix A

Formulation of Relationship between SNR and SPL measurements

In this work, SNR is defined as the ratio of the peak power of the signal to the peak power of the background noise. To calculate the SNR directly, we need both the signal power and the noise power. The noise power can be derived from the SPL measurement of the acoustic background (noise). We found, however, that measuring the SPL of the warning sound alone in any real acoustic environment is impossible, since some background noise is always present. Here we show how the noise SPL and the SPL of the warning sound plus noise are related to the SNR. The following notation is used:

$I_{ref}$ = reference sound intensity
$I_a$ = peak acoustic intensity of background noise
$I_s$ = peak warning sound intensity
$I_{a+s}$ = peak acoustic intensity of a warning sound plus background noise
$P_a$ = peak SPL of background noise
$P_{a+s}$ = peak SPL of a warning sound plus background noise
$SI_a$ = $I_a$ in dB
$SI_s$ = $I_s$ in dB
$SI_{a+s}$ = $I_{a+s}$ expressed in dB
$SPL_a$ = SPL measurement of background noise
$SPL_{a+s}$ = SPL measurement of a warning sound plus background noise
$P_{ref}$ = the reference sound pressure level (20 µPa)

From the definition of SNR,

$$SNR = \frac{I_s}{I_a} \tag{A.35}$$

and

$$SNR(dB) = 10 \log_{10} \left\{ \frac{I_s}{I_a} \right\} \tag{A.36}$$

Also,

$$SI_{a+s} = 10 \log_{10} \left\{ \frac{I_{a+s}}{I_{ref}} \right\} \tag{A.37}$$

$$SI_a = 10 \log_{10} \left\{ \frac{I_a}{I_{ref}} \right\} \tag{A.38}$$

But $(SI_{a+s} - SI_a)$ is the difference in sound intensity level in dB, and using equations (A.37) and (A.38), it gives

$$SI_{a+s} - SI_a = 10 \log_{10} \left( \frac{I_{a+s}}{I_{ref}} \right) - 10 \log_{10} \left( \frac{I_a}{I_{ref}} \right) = 10 \log_{10} \left\{ \frac{I_{a+s}}{I_a} \right\} \tag{A.39}$$

Since $I_{a+s} = I_a + I_s$ (without resonance), we have

$$SI_{a+s} - SI_a = 10 \log_{10} \left\{ \frac{I_a + I_s}{I_a} \right\} = 10 \log_{10} \left\{ 1 + \frac{I_s}{I_a} \right\} \tag{A.40}$$

To find $(SI_{a+s} - SI_a)$ by measurement, consider

$$I_a = K P_a^2, \qquad I_{a+s} = K P_{a+s}^2 \tag{A.41}$$

where $K$ is a constant. Then we can express $(SI_{a+s} - SI_a)$ in terms of $P_a$ and $P_{a+s}$, which can be measured by a commercially available SPL meter:

$$SI_{a+s} - SI_a = 10 \log_{10} \left\{ \frac{K P_{a+s}^2}{K P_a^2} \right\} = 20 \log_{10} \left\{ \frac{P_{a+s}}{P_a} \right\} \tag{A.42}$$

Rewriting equation (A.42) using $P_{ref}$ gives

$$SI_{a+s} - SI_a = 20 \log_{10} \frac{P_{a+s}}{P_{ref}} - 20 \log_{10} \frac{P_a}{P_{ref}} = SPL_{a+s} - SPL_a = 10 \log_{10} \left( 1 + \frac{I_s}{I_a} \right) \tag{A.43}$$

Equation (A.43) indicates that the difference in sound intensity level can be expressed in terms of two measurable physical quantities: the SPL measurements in the absence and in the presence of a warning sound. Hence, we have

$$SPL_{a+s} - SPL_a = 10 \log_{10} \left( 1 + \frac{I_s}{I_a} \right) = 10 \log_{10} (1 + SNR) \tag{A.44}$$

Hence

$$SNR = \frac{I_s}{I_a} = 10^{(SPL_{a+s} - SPL_a)/10} - 1 \tag{A.45}$$

or

$$SNR(dB) = 10 \log_{10} (SNR) \tag{A.46}$$

When the difference in SPL readings is more than 10 dB, the SNR in dB is very close to the SPL difference in dB (Table A.9).

Table A.9: Tabulation of SPL reading difference and SNR

(SPL_{a+s} - SPL_a) (in dB)    SNR (in dB)    SNR
0.5                            -9.14          0.12
1.0                            -5.9           0.26
1.5                            -3.8           0.41
2.0                            -2.43          0.59
2.5                            -1.1           0.78
3.0                            0.0            1.0
3.5                            0.9            1.2
4.0                            1.8            1.5
4.5                            2.6            1.8
5.0                            3.3            2.2
5.5                            4.1            2.6
6.0                            4.7            3.0
6.5                            5.4            3.5
7.0                            6.0            4.0
8.0                            7.2            5.3
9.0                            8.4            6.9
10.0                           9.5            9.0
11.0                           10.6           11.6
12.0                           11.7           14.8
13.0                           12.8           19.0
14.0                           13.8           24.1
15.0                           14.8           30.6
16.0                           15.9           38.8
17.0                           16.9           49.1
18.0                           17.9           62.0
19.0                           18.9           78.4
20.0                           20.0           99.0
21.0                           21.0           125
22.0                           22.0           158
23.0                           23.0           199

Appendix B

Format of the command set of the SR

The format of the twelve commands used to control the operation of the SR is given in Table B.10. Correspondingly, Table B.11 shows the legal values for the memory bank, the bank rejected value, the signal rejected value, the syntax and the registration. In response to a specified command, one or more of the following status output codes are reported from the µPD7762 to the control & timing processor. The interpretations of these status output codes are given in Table B.12.

Table B.10: Format of command set of SR

Command Code                                    Format
1. Initialize (2 byte code)                     00H (hex code), 0FFH (termination code)
2. Level_adjust (3 - 6 bytes)                   01H, [memory bank], [memory bank], [memory bank], [memory bank], 0FFH
3. Recognition (2 - 32 bytes)                   003H, [syntax # (S)], [..., S ...], 0FFH
4. Training (3 - 5 bytes)                       002H, registration [syntax #], [signal rejected value], 0FFH
5. Second Decision (2 bytes)                    004H, 0FFH
6. Hot start (2 bytes)                          005H, 0FFH
7. Down load (3 bytes)                          006H, # of patterns, 0FFH
8. Up load (2 bytes)                            007H, 0FFH
9. Change memory reject value (3 bytes)         008H, bank reject value, 0FFH
10. Memory test (2 bytes)                       009H, 0FFH
11. Select memory bank (3 bytes)                00AH, bank #, 0FFH
12. Change signal reject value (3 bytes)        00CH, registration signal reject value, 0FFH

Table B.11: Legal values for parameters of the command set

Parameter                                Legal Value
1) Memory bank value (B)                 0 < B < 03
2) Bank reject value (BRV)               0 < BRV < 0FEH
3) Signal reject value (SRV)             0 < SRV < 080H
4) Pattern registration value (PRV)      0 < PRV < 080H

Table B.12: Interpretation of status output codes from µPD7762

Code    Interpretation
000H    Normal completion of a command
001H    Input signal level too high
002H    Input signal level too low
003H    Input signal longer than 2.0 sec
004H    Request signal level adjustment
005H    Specified syntax # non-existing
006H    Registered pattern does not exist
007H    The distance value is greater than BRV
008H    Specified memory bank does not exist
009H    Command format error
00AH    The distance is greater than PRV, but less than BRV
00BH    Signal duration is less than 200 msec
00CH    Memory test error or hardware I/O error

Appendix C

Software Operating Manual of The WARNSIS

C.1  Program Files

This manual guides the user through the operating procedure developed for the signal recognition software. The software was designed to provide an interactive dialogue between the user and the device. Messages are displayed on the monitor to prompt the user for the requested parameter values and to indicate the status of the device. In this manual, such messages are shown in bold-face. The software was saved on a PC-computer, and was located at the sub-directory called \simon\nec\.
To enter this sub-directory, the user types the following statements:

type : cd simon
displayed on the monitor : d:\simon
type : cd nec
displayed on the monitor : d:\simon\nec

Once in the sub-directory \simon\nec\, the user can find the programs necessary to run this software. These programs are:

• nec.asm : the source program of the system operating software, in assembly language,

• nec.exe : the executable file of nec.asm,

• nec_dat.asm : the data file consisting of constants and variables for nec.exe, and

• enec.bat : the batch file used to automatically assemble nec.asm to produce its object code, to link the object code file (nec.obj) to yield the executable file (nec.exe), and to delete the redundant object file to save storage space on the hard disk. This batch file is run only when modifications have been made to nec.asm. The batch job is executed by typing ENEC.

A signal template file was stored in the directory \simon\nec\temp\. This data file is called 50_warn.dat and consists of 50 templates of various warning sounds. These warnings include siren sounds emitted from an electronic siren driver, telephone rings and smoke alarm sounds.

C.2  Interactive Operations

To execute the system software, the user types NEC. By executing nec.exe, the user enters the interactive operation mode and is prompted to answer a number of questions. There are two stages in this mode of operation, namely the initialization stage and the training/recognition stage.

C.2.1  Initialization Stage

Once the program is executed, the following events occur:

1. System Initialization in Progress

2. System Hardware Checking: if everything is OK, these statements are displayed on the monitor:

• MEMORY CHECK OK !!
• MEMORY CHECK OK !!

Otherwise, error statements are reported:

• Invalid Command, or
• MEMORY error or HARDWARE I/O error !

Under such circumstances, the user must exit the program by pressing CTRL-C, shut off the power supply for 20 seconds, turn the power supply back on, and re-run the program.

3. The user is prompted to flip a manual switch before the system begins the process of signal level adjustment.

• Please, flip the switch to LEVEL_ADJUST,
• If ready, Please press ENTER key.

After the ENTER key is pressed, the system starts the signal level adjustment.

• Level adjustment in PROGRESS

Upon completion of the level adjustment, the system asks whether the user wants to transfer any pre-stored signal template(s) to the template memory of the device.

• Do you want to download signal templates from host CPU? (y/n)

If the answer is 'y', the user needs to provide the template file name and the total # of pre-stored templates.

• Please, input the file name consisting of the templates - *.dat. (d:\simon\nec\temp\*.dat)

A file opening statement is then shown on the monitor.

• SUCCESSFUL open data file !!
• Please, input # of templates for downloading

After this number is entered, the data transfer takes place. Upon completion of the data transfer, these statements are shown on the monitor:

• Signal file HAS BEEN CLOSED !!
• SUCCESSFUL data downloading !!
• Do you want another downloading? (y/n)

If 'y' is entered, the preceding steps repeat.
Otherwise, the user enters the second stage of this software.

C.2.2  Training/Recognition Stage

On entering this stage, the user has to flip the manual switch to the training/recognition position.

• Please, flip the switch to signal TRAINING/RECOGNITION

Training Procedure

The user is then asked whether the system should learn a new sound.

• Do you want to train the system to learn a new sound? (y/n)

If the answer is 'n', the user proceeds to the recognition stage. If the answer is 'y', the user provides an identification for the new sound and then presses the ENTER key to start the training procedure. The interactive statements on the monitor are:

• Please, specify an identification for input signal = ,
• template # = (value generated automatically by the system software),
• Please, input SIGNAL for Training.
• If ready, Please press ENTER key.
• Signal template training in PROGRESS

For successful training, a summary of the template information is shown:

• SUCCESSFUL TRAINING
• Burst signal !! (for burst signal), or Steady sound !! (for continuous, steady sound)
• SYNTAX # =
• Template # =
• Signal template identification =

Subsequently, the user is asked whether the device should learn a new sound or recognize another new sound. If the training mode is selected, the aforementioned training steps repeat. If the recognition mode is selected, the user enters the recognition stage.

Recognition Procedure

The statement displayed on the monitor is

• Do you want the system to recognize the signal ? (y/n)

If the answer is 'n', the prompt for signal template uploading is displayed on the monitor. If the user wants the device to recognize the signal, the monitor shows the following statements and the signal recognition process starts.

• Start to recognize the input signal !
• Signal recognition in PROGRESS

For a successful recognition, a summary of the recognition results appears on the monitor:

• SUCCESSFUL RECOGNITION
• The closest distance measured =
• Burst sound, or Steady sound
• SYNTAX # =
• Template # =
• Signal template identification =

The user is then prompted for another signal recognition. If the response is 'y', the preceding recognition steps repeat. If the response is 'n', the user is asked whether to perform a signal template uploading process.

• Do you want to save memory templates ?? (y/n)

If the response is not 'y', the user needs to select one of the following options.

• What do you want to do next? (please, select one of the following choices)
• r : another signal recognition
• d : another template file downloading
• t : another signal training
• e : exit the program

Otherwise, for the memory uploading, the user provides a template file name to identify the stored signal templates. The data transfer is then performed transparently. The interactive statements are:

• # of template for uploading =
• Please, enter the file name
• Successful open file
• Successful uploading
• File closed
• Do you want another signal memory uploading? (y/n)

If the answer is 'y', the uploading steps repeat. Otherwise, the user has to select one of the previously mentioned options (r; d; t; e).
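The dialogue described in this manual drives the command set of Appendix B. As a rough illustration only, and not the thesis software (which is the assembly program nec.asm), the Python sketch below shows how a host program might frame a command from Table B.10 and interpret a status code from Table B.12. The function names, the dictionary keys, and the reading of every command as one opcode byte, optional parameter bytes, and the 0FFH termination code are assumptions made for this example; the parameter range check is likewise an assumption, chosen so that a parameter can never be mistaken for the termination code.

```python
TERMINATOR = 0xFF  # termination code closing every command in Table B.10

# Opcodes as listed in Table B.10 (the key names are illustrative labels).
OPCODES = {
    "initialize": 0x00,
    "level_adjust": 0x01,
    "training": 0x02,
    "recognition": 0x03,
    "second_decision": 0x04,
    "hot_start": 0x05,
    "download": 0x06,
    "upload": 0x07,
    "change_memory_reject_value": 0x08,
    "memory_test": 0x09,
    "select_memory_bank": 0x0A,
    "change_signal_reject_value": 0x0C,
}

# Status output codes and interpretations as listed in Table B.12.
STATUS = {
    0x00: "Normal completion of a command",
    0x01: "Input signal level too high",
    0x02: "Input signal level too low",
    0x03: "Input signal longer than 2.0 sec",
    0x04: "Request signal level adjustment",
    0x05: "Specified syntax # non-existing",
    0x06: "Registered pattern does not exist",
    0x07: "The distance value is greater than BRV",
    0x08: "Specified memory bank does not exist",
    0x09: "Command format error",
    0x0A: "The distance is greater than PRV, but less than BRV",
    0x0B: "Signal duration is less than 200 msec",
    0x0C: "Memory test error or hardware I/O error",
}


def build_command(name, params=()):
    """Return the byte sequence: opcode, parameter bytes, termination code."""
    for p in params:
        # Assumed rule: parameters stay below 0FFH so they cannot be
        # confused with the termination code.
        if not 0 <= p < TERMINATOR:
            raise ValueError(f"parameter out of range: {p}")
    return bytes([OPCODES[name], *params, TERMINATOR])


def describe_status(code):
    """Map a status output code to its interpretation in Table B.12."""
    return STATUS.get(code, "Unknown status code")


if __name__ == "__main__":
    print(build_command("select_memory_bank", [1]).hex())
    print(describe_status(0x0B))
```

Keeping the two tables as plain dictionaries makes it easy to compare the sketch line by line with Tables B.10 and B.12.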
Appendix D

Evaluation Results

In this work, confusion matrices are used to present the recognition results produced by the complete WARNSIS, the timing analyzer part alone, and the spectral recognizer part alone. To simplify the notation for the confusion matrices given in the following sections, each warning sound is assigned a "number" as shown in Table D.13. Each assigned number in the first horizontal row indicates the specific warning sound which was identified by a recognition system, and each assigned number in the first vertical column indicates the warning sound which was present in the environment. Each element of the confusion matrix gives the number of times that a warning sound was identified as the emitted sound in the environment. Based on these results, the recognition rates for each warning sound are derived. Unless otherwise stated, the results presented here assume that the MBD value is set to 0.1024 sec. TE(L1) represents telephone rings generated by an electromechanical ringer with the loudness level set at one. ETE(P1) represents telephone rings produced by an electronic ringer with the pitch adjustment preset at a specific position.

Table D.13: "Numbers" assigned for different warning sounds

Type of Sound        Assigned Number
J1 (B)               1
J2 (B)               2
J3 (B)               3
J4 (B)               4
J5 (S)               5
J6 (B)               6
J7 (S)               7
J8 (B)               8
smoke alarm (S)      9
TE(L1) (PH)          10
TE(L3) (PH)          11
TE(L5) (PH)          12
TE(L7) (PH)          13
ETE(P1) (PH)         14
ETE(P2) (PH)         15
ETE(P3) (PH)         16
ETE(P4) (PH)         17

B: Burst-type Sound
S: Steady Sound
PH: Phone Ring

D.1  The Complete WARNSIS

D.1.1  Recognition Results with Background Steady Noise
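As noted at the start of this appendix, the recognition rates are derived from the confusion matrices. A minimal Python sketch of that step is given here. It assumes the rate for a sound is the diagonal count divided by the row total, which is consistent with the tabulated values (for example, 29 correct out of 30 presentations gives 96.7 %). The function name and the small two-sound matrix are hypothetical and are not data from the thesis.

```python
def recognition_rates(confusion, labels):
    """Return {label: recognition rate in %} from a confusion matrix.

    confusion[i][j] is the number of times the sound labels[i] was present
    and the system reported labels[j]. The rate for labels[i] is the
    diagonal entry divided by the row total (assumed definition).
    """
    rates = {}
    for i, label in enumerate(labels):
        row_total = sum(confusion[i])
        if row_total == 0:
            rates[label] = None  # sound never presented, rate undefined
        else:
            rates[label] = round(100.0 * confusion[i][i] / row_total, 1)
    return rates


if __name__ == "__main__":
    # Hypothetical 2 x 2 example with 30 presentations per sound.
    example = [
        [28, 2],
        [1, 29],
    ]
    print(recognition_rates(example, ["sound A", "sound B"]))
```

Rounding to one decimal place matches the precision used in the rate tables of this appendix.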
Table D.14: Confusion matrix for recognition results generated by the complete WARNSIS in the presence of steady noise

       1    2    3    4    5    6    7    8    9
1     30    .    .    .    .    .    .    .    .
2      .   30    .    .    .    .    .    .    .
3      .    .   30    .    .    .    .    .    .
4      .    .    .   30    .    .    .    .    .
5      .    .    .    .   30    .    .    .    .
6      .    .    .    .    .   30    .    .    .
7      .    .    .    .    .    .   30    .    .
8      .    .    .    .    .    .    .   30    .
9      .    .    .    .    .    .    .    .   30

Table D.15: Recognition rates of burst-type sounds under steady noise condition

Burst-type Sound    Assigned Value    Recognition Rate (%)
J1                  1                 100
J2                  2                 100
J3                  3                 100
J4                  4                 100
J6                  6                 100
J8                  8                 100
Average             -                 100

Table D.16: Recognition rates of steady sounds generated by the complete WARNSIS under steady noise condition

Steady Sound        Assigned Value    Recognition Rate (%)
J5                  5                 100
J7                  7                 100
smoke alarm         9                 100
Average             -                 100

Table D.17: Confusion matrix for phone ring recognition generated by the complete WARNSIS under steady noise condition

      10   11   12   13   14   15   16   17
10    30    .    .    .    .    .    .    .
11     .   30    .    .    .    .    .    .
12     .    .   30    .    .    .    .    .
13     .    .    .   30    .    .    .    .
14     .    .    .    .   30    .    .    .
15     .    .    .    .    .   30    .    .
16     .    .    .    .    .    .   30    .
17     .    .    .    .    .    .    .   30

Table D.18: Recognition rates of phone ring generated by the complete WARNSIS under steady noise condition

Phone Ring          Assigned Value    Recognition Rate (%)
TE(L1)              10                100
TE(L3)              11                100
TE(L5)              12                100
TE(L7)              13                100
ETE(P1)             14                100
ETE(P2)             15                100
ETE(P3)             16                100
ETE(P4)             17                100
Average             -                 100

D.1.2  Recognition Results with Background of FM Broadcast plus Steady Noise

Table D.19: Confusion matrix for recognition results generated by the complete WARNSIS in the presence of FM broadcast plus steady noise

Row and column labels: 1 2 4 5 6 7 8 9
Matrix entries (extraction order): 30, 30, 29, 1, 30, 2, 28, 30, 30, 30

Table D.20: Recognition rates of burst-type sounds produced by the complete WARNSIS under FM broadcast plus steady noise condition

Burst-type Sound    Assigned Value    Recognition Rate (%)
J1                  1                 100
J2                  2                 100
J4                  4                 96.7
J6                  6                 93.3
J8                  8                 100
Average             -                 98.0

Table D.21: Recognition rates of steady sounds generated by the complete WARNSIS under FM broadcast plus steady noise condition

Steady Sound        Assigned Value    Recognition Rate (%)
J5                  5                 100
J7                  7                 100
smoke alarm         9                 100
Average             -                 100

D.1.3  Recognition Results with Background of AM Broadcast plus Steady Noise

Table D.22: Confusion matrix for recognition results generated by the complete WARNSIS in AM broadcast plus steady noise background

Row and column labels: 1 2 4 5 6 7 8 9
Matrix entries (extraction order): 30, 30, 30, 30, 29, 1, 30, 30, 30

Table D.23: Recognition rates of burst-type sounds generated by the complete WARNSIS in AM broadcast plus steady noise environment

Burst-type Sound    Assigned Value    Recognition Rate (%)
J1                  1                 100
J2                  2                 100
J4                  4                 96.7
J6                  6                 100
J8                  8                 100
Average             -                 99.3

Table D.24: Recognition rates of steady sounds generated by the complete WARNSIS in AM broadcast plus steady noise background

Steady Sound        Assigned Value    Recognition Rate (%)
J5                  5                 100
J7                  7                 100
smoke alarm         9                 100
Average             -                 100

D.1.4  Results of phone ring recognition with minimum burst duration (MBD) set to 1.024 sec

Table D.25: Confusion matrix for phone ring recognition generated by the complete WARNSIS under the condition of FM broadcast and the steady noise with MBD set to 1.024 sec

Row and column labels: 10 11 12 13 14 15 16 17
Matrix entries (extraction order): 28, 2, 29, 26, 3, 4, 27, 1, 29, 2, 28, 1, 26, 1, 4, 29

Table D.26: Results of recognition rates of phone rings generated by the complete WARNSIS in FM broadcast plus the steady noise background

Phone Ring          Assigned Value    Recognition Rate (%)
TE(L1)              10                93.3
TE(L3)              11                96.7
TE(L5)              12                86.7
TE(L7)              13                96.7
ETE(P1)             14                86.7
ETE(P2)             15                90.0
ETE(P3)             16                96.7
ETE(P4)             17                93.3
Average             -                 92.5

Table D.27: Confusion matrix for the results of phone ring recognition generated by the complete WARNSIS in the presence of AM broadcast plus the steady noise with MBD set to 1.024 sec

Row and column labels: 10 11 12 13 14 15 16 17
Matrix entries (extraction order): 29, 2, 1, 28, 27, 1, 3, 29, 29, 1, 1, 29, 27, 2, 3, 28

Table D.28: Results of phone ring recognition rates generated by the complete WARNSIS in the presence of AM broadcast plus steady noise with MBD set to 1.024 sec

Phone Ring          Assigned Value    Recognition Rate (%)
TE(L1)              10                96.7
TE(L3)              11                93.3
TE(L5)              12                90.0
TE(L7)              13                96.7
ETE(P1)             14                90.0
ETE(P2)             15                93.3
ETE(P3)             16                96.7
ETE(P4)             17                96.7
Average             -                 94.2

D.1.5  Results of the False-alarm Tests for the complete WARNSIS

Table D.29: Results of the false-alarm tests for the complete WARNSIS with MBD set to 0.1024 sec

                           FM                                    AM
Mis-recognized as          Heavy Rock  Pop Music  Soft Music     Speech  Soft Music  Soft Rock
J1                         1           0          1              1       0           0
J2                         0           0          1              0       0           0
J3                         0           0          0              0       0           0
J4                         0           0          0              0       0           0
J5                         2           2          2              1       2           0
J6                         0           0          0              0       0           0
J7                         0           2          0              0       0           0
J8                         2           0          1              0       1           1
Smoke Alarm                0           0          0              0       0           0
TE(L1)                     0           0          0              0       0           0
TE(L3)                     0           0          0              0       0           0
TE(L5)                     0           0          0              0       0           0
TE(L7)                     0           0          0              0       0           0
ETE(P1)                    0           0          0              0       0           0
ETE(P2)                    0           0          0              0       0           0
ETE(P3)                    0           0          0              0       0           0
ETE(P4)                    0           0          0              0       0           0
Total # of recognitions    5           4          5              2       3           1
Duration (hours)           2           2          2              2       2           2
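The totals and durations in Table D.29 can be turned into a false-alarm rate per hour, which makes tests of different lengths easier to compare. The Python sketch below is a convenience calculation only. The per-hour rate is not a quantity reported in the thesis, and the function name and dictionary keys are hypothetical; the counts and durations are the "Total # of recognitions" and "Duration (hours)" rows of Table D.29.

```python
def false_alarms_per_hour(total_misrecognitions, duration_hours):
    """Return the number of false alarms per hour of background programme."""
    if duration_hours <= 0:
        raise ValueError("duration must be positive")
    return total_misrecognitions / duration_hours


# (total # of recognitions, duration in hours) per programme type, Table D.29.
TABLE_D29 = {
    "FM heavy rock": (5, 2),
    "FM pop music": (4, 2),
    "FM soft music": (5, 2),
    "AM speech": (2, 2),
    "AM soft music": (3, 2),
    "AM soft rock": (1, 2),
}

RATES = {
    programme: false_alarms_per_hour(count, hours)
    for programme, (count, hours) in TABLE_D29.items()
}

if __name__ == "__main__":
    for programme, rate in RATES.items():
        print(f"{programme}: {rate:.1f} false alarms per hour")
```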
Table D.30: Results of the false-alarm tests for the complete WARNSIS with MBD set to 1.024 sec

                           FM                                    AM
Mis-recognized as          Heavy Rock  Pop Music  Soft Music     Speech  Soft Music  Soft Rock
J1                         0           0          0              0       0           0
J2                         0           0          0              0       0           0
J3                         0           0          0              0       0           0
J4                         0           0          0              0       0           0
J5                         1           0          0              1       1           0
J6                         0           0          0              0       0           0
J7                         0           0          0              0       0           0
J8                         0           0          0              0       0           0
Smoke Alarm                1           1          1              0       0           1
TE(L1)                     0           0          0              0       0           0
TE(L3)                     0           0          0              0       0           0
TE(L5)                     0           0          0              0       0           0
TE(L7)                     0           0          0              0       0           0
ETE(P1)                    0           0          0              0       0           0
ETE(P2)                    0           0          0              0       0           0
ETE(P3)                    0           0          0              0       0           0
ETE(P4)                    0           0          0              0       0           0
Total # of recognitions    2           1          1              1       1           1
Duration (hours)           2           1          1              0.5     0.5         0.5

D.2  Timing Analyzer Part Alone

D.2.1  Recognition Results with Background Steady Noise

Table D.31: Confusion matrix for warning sound recognition generated by the timing analyzer alone in the presence of steady noise

Type of Sound    J1    J2    J3    J4    J6    J8    Phone Ring
J1               30    .     .     .     .     .     .
J2               .     30    .     .     .     .     .
J3               .     .     30    .     .     .     .
J4               .     .     .     30    .     .     .
J6               .     .     .     .     30    .     .
J8               .     .     .     .     .     30    .
Phone Ring       .     .     .     .     .     .     30

Table D.32: Recognition rates of the timing analyzer part alone in the presence of steady noise

Burst-type Sound    Assigned Value    Recognition Rate (%)
J1                  1                 100
J2                  2                 100
J3                  3                 100
J4                  4                 100
J6                  6                 100
J8                  8                 100
Average             -                 100
Phone Ring          -                 100

D.2.2  Recognition Results with Background of FM Broadcast Plus Steady Noise

Table D.33: Confusion matrix for warning sound recognition generated by the timing analyzer part alone in the presence of FM broadcast plus steady noise

Row and column labels: J1 J2 J3 J4 J6 J8 Phone Ring
Matrix entries (extraction order): 30, 26, 30, 30, 10, 0, 30, 4, 30, 12, 3, 5

Table D.34: Recognition rates of the timing analyzer part alone in the presence of FM broadcast plus steady noise

Burst-type Sound    Assigned Value    Recognition Rate (%)
J1                  1                 100
J2                  2                 100
J3                  3                 86.6
J4                  4                 100
J6                  6                 100
J8                  8                 100
Average             -                 97.7
Phone Ring          -                 0

D.2.3  Recognition Results with Background of AM Broadcast Plus Steady Noise

Table D.35: Confusion matrix for warning sound recognition generated by the timing analyzer part alone in the presence of AM broadcast plus steady noise

Row and column labels: J1 J2 J3 J4 J6 J8 Phone Ring
Matrix entries (extraction order): 30, 30, 3, 30, 30, 12, 27, 30, 6, 5, 7, 0

Table D.36: Recognition rates of the timing analyzer part alone in the presence of AM broadcast plus steady noise

Burst-type Sound    Assigned Value    Recognition Rate (%)
J1                  1                 100
J2                  2                 100
J3                  3                 90
J4                  4                 100
J6                  6                 100
J8                  8                 100
Average             -                 98.3
Phone               -                 0

D.3  False-alarm Results for the Timing Analyzer Alone

Table D.37: False-alarm test results of the timing analyzer part alone with MBD set to 0.1024 sec

                               FM                                   AM
Mis-recognized as              Pop Music  Rock Music  Classical     Speech + Music  Pop Music  Speech
J1                             16         19          13            34              16         40
J2                             16         13          10            18              26         25
J3                             0          0           0             0               0          0
J4                             0          2           1             0               2          1
J6                             12         19          7             9               5          16
J8                             0          1           0             0               0          0
Steady Sound                   23         23          45            3               50         0
Phone                          0          1           1             0               0          1
Total # of Mis-recognitions    67         78          77            64              99         83
Duration (minutes)             56         46          56            19              59         24
Table D.38: False-alarm test results of the timing analyzer part alone with MBD set to 1.024 sec

                               FM                                   AM
Mis-recognized as              Pop Music  Rock Music  Classical     Speech + Music  Pop Music  Speech
J1                             0          0           0             0               0          0
J2                             0          0           0             0               0          0
J3                             1          2           1             2               3          2
J4                             0          0           0             0               0          0
J6                             0          0           0             0               0          0
J8                             0          0           0             0               0          0
Steady Sound                   20         10          24            15              10         14
Phone                          3          2           1             6               6          2
Total # of Mis-recognitions    24         14          26            23              19         18
Duration (minutes)             30         30          30            30              30         30

D.4  Spectral Recognizer Part Alone

D.4.1  Recognition Results with Background Steady Noise

Table D.39: Confusion matrix for warning sound recognition generated by the spectral recognizer part alone in the presence of steady noise

Row and column labels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Matrix entries (as extracted, including row and column labels): 1 30  2 . 30  3  4  5  6  7  8  9  10  11  12  28 2 1  1 27 1  1 1 28 2  28  13  14  15  16  17  30 1 1  29 1  28  30 30 30 30 30 30  30 30

Table D.40: Results of steady sound recognition rate generated by the spectral recognizer part alone in steady noise background

Steady Sound        Assigned Value    Recognition Rate (RR) in %
J5                  5                 100
J7                  7                 100
smoke alarm         9                 93.3
Average             -                 97.6

Table D.41: Results of burst-type sound recognition rates produced by the spectral recognizer part alone in steady noise background

Burst-type Sound    Assigned Value    Recognition Rate in (%)
J1                  1                 100
J2                  2                 100
J3                  3                 100
J4                  4                 100
J6                  6                 100
J8                  8                 100
Average             -                 100

Table D.42: Results of phone ring recognition rate produced by the spectral recognizer part alone in steady noise background

Phone Ring          Assigned Value    Recognition Rate (%)
TE(L1)              10                90.0
TE(L3)              11                93.3
TE(L5)              12                93.3
TE(L7)              13                100
ETE(P1)             14                100
ETE(P2)             15                100
ETE(P3)             16                96.7
ETE(P4)             17                93.3
Average             -                 95.8
D.4.2  Recognition Results with Background of FM Broadcast plus Steady Noise

Table D.43: Confusion matrix for the results of warning sound recognition generated by the spectral recognizer part alone in FM broadcast and steady noise background

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17  1 29  2  3  4  5  6  7  8  9  10  .  22 5  11  . 6  2  1  12  13  14  15  .  1 5  7. 6  1 21  1  16 1  17  .  1  1  1 20 2  1 1 19  1  30  .  8  22 2  1  14  10  30 30 30 1 1 1 1 2  4 1  2 1  2 4 1 3  1  24 3 1  2 3 1 2 1  22 2 3  .  5  1 -  1 22 4  3  2 21 5  20

Table D.44: Results of steady sound recognition rate produced by the spectral recognizer part alone in FM broadcast plus steady noise background

Steady Sound        Assigned Value    Recognition Rate (%)
J5                  5                 73.3
J7                  7                 100
smoke alarm         9                 100.0
Average             -                 91.1

Table D.45: Results of burst-type sound recognition rates produced by the spectral recognizer part alone in FM broadcast plus steady noise background

Burst-type Sound    Assigned Value    Recognition Rate (%)
J1                  1                 96.7
J2                  2                 73.3
J3                  3                 20.0
J4                  4                 100
J6                  6                 3.3
J8                  8                 100
Average             -                 65.6

Table D.46: Results of phone ring recognition rates produced by the spectral recognizer part alone under FM broadcast plus steady noise condition

Phone Ring          Assigned Value    Recognition Rate (%)
TE(L1)              10                80.0
TE(L3)              11                70.0
TE(L5)              12                73.3
TE(L7)              13                66.7
ETE(P1)             14                63.3
ETE(P2)             15                73.3
ETE(P3)             16                70.0
ETE(P4)             17                63.3
Average             -                 70.0

D.4.3  Recognition Results with Background of AM Broadcast plus Steady Noise
Table D.47: Confusion matrix for warning sound recognition generated by the spectral recognizer part alone under AM broadcast plus steady noise condition

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17  1 29  2  3  4  .  23 4  5 1  6  7  7  8  9  10  .  .  6  1  11  12  1  13  14  15  16  5  5  5  3  16  1  17  30 24 4  2  6 6  1  1  29 30 30 1 1 .  2 1 2 3 2  .  1 4 .  3 1 1  3 3  1 3 3 1  1  1  25  4 20 2  5 19 4 4  3 17  1 3 1  22 3  21 20 2  22

Table D.48: Results of steady sound recognition rates produced by the spectral recognizer part alone under AM broadcast plus steady noise condition

Steady Sound        Assigned Value    Recognition Rate (%)
J5                  5                 80.0
J7                  7                 93.3
smoke alarm         9                 100.0
Average             -                 91.1

Table D.49: Results of burst-type sound recognition rate produced by the spectral recognizer part alone in the presence of AM broadcast plus steady noise

Burst-type Sound    Assigned Value    Recognition Rate (%)
J1                  1                 96.7
J2                  2                 76.7
J3                  3                 23.3
J4                  4                 100
J6                  6                 6.7
J8                  8                 100
Average             -                 67.2

Table D.50: Results of phone ring recognition rate produced by the spectral recognizer part alone in the presence of AM broadcast plus steady noise

Phone Ring          Assigned Value    Recognition Rate (%)
TE(L1)              10                83.3
TE(L3)              11                66.7
TE(L5)              12                63.3
TE(L7)              13                56.7
ETE(P1)             14                73.3
ETE(P2)             15                70.0
ETE(P3)             16                66.7
ETE(P4)             17                73.3
Average             -                 69.2

D.4.4  Results of false-alarm tests for the spectral recognizer part alone

Table D.51: False-alarm tests for the spectral analyzer part alone

                                              FM                                       AM
Mis-recognized as              Heavy Rock  Pop Music  Soft Music  Speech  Soft Music  Soft Rock
J1                             0           0          0           0       0           0
J2                             0           0          0           0       0           0
J3                             0           1          1           6       1           1
J4                             0           0          0           0       0           0
J5                             0           0          0           0       0           0
J6                             0           0          1           24      0           0
J7                             1           30         4           0       17          4
J8                             0           0          1           0       0           0
Smoke Alarm                    0           0          4           31      10          0
TE(L1)                         28          10         30          2       3           7
TE(L3)                         30          7          24          9       0           36
TE(L5)                         9           29         9           48      79          23
TE(L7)                         50          43         40          0       10          46
ETE(P1)                        1           0          4           0       0           0
ETE(P2)                        0           0          0           0       0           0
ETE(P3)                        0           0          0           0       0           1
ETE(P4)                        1           0          2           0       0           2
Total # of Mis-recognitions    120         120        120         120     120         120
Duration (minutes)             3.42        4          4.27        4.8     4.2         3.57

Appendix E

Specifications

1. Power Supply:

• +5 V: 700 mA
• +12 V: 64.3 mA
• -12 V: 51.6 mA

2. Signal Features: timing and short-time spectral patterns

3. The WARNSIS: a 'hybrid' system consisting of the timing analyzer part and the spectral analyzer part

4. Timing Analyzer Part Alone:

• Function: classification of warning sounds based on the absolute short-time average signal amplitudes
• Short-time Duration: 12.8 msec
• Timing Features: the repetition period and the average signal burst width for burst-type sounds; a rising signal amplitude transition and a new signal amplitude for steady sounds

5. Spectral Recognizer Part Alone:

• Function: extraction of short-time spectral features from warning sounds by the filter-bank approach
• Short-time Duration: 12 msec
• # of Filters: 8
• Type of Filter: digital biquad
• Frequency Span: 100 Hz to 5.0 kHz
• Implementation: software
• Pattern Matching: Dynamic Time Warping

6. Modes of Operation:

• burst-type and steady warning sound recognition
• phone ring recognition

7. Recognition Accuracy:

• 98 % for steady and burst-type warning sounds for an SNR of over 10 dB
• 93 % for phone rings for an SNR of over 10 dB

8. False-alarm Rate:

• one false recognition per 90 minutes (worst case) for burst-type and steady sounds
• no false ring indications

9. Recognition Time: 0.5 sec to 10 sec, depending on the type of warning sound
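Two of the techniques named in the specifications, the short-time average amplitude used by the timing analyzer and Dynamic Time Warping used for pattern matching, can be illustrated with a small sketch. This is not the WARNSIS implementation: the function names, the non-overlapping framing, the Euclidean frame distance, and the unconstrained warping path are all assumptions made for the example. The frame length in samples is left as a parameter because the sampling rate is not stated in this appendix.

```python
import math


def short_time_average_magnitude(samples, frame_len):
    """Mean absolute amplitude of each full, non-overlapping frame (assumed framing)."""
    return [
        sum(abs(x) for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def frame_distance(a, b):
    """Euclidean distance between two feature frames (e.g. 8 filter outputs)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Classic dynamic-programming DTW cost between two sequences of frames."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(features, templates):
    """Return the name of the stored template with the smallest DTW cost."""
    return min(templates, key=lambda name: dtw_distance(features, templates[name]))


# Hypothetical one-channel templates, for illustration only.
templates = {"rising": [[0.0], [1.0]], "falling": [[1.0], [0.0]]}
print(classify([[0.0], [0.0], [1.0], [1.0]], templates))
```

The example input is a time-stretched copy of the "rising" template, so DTW aligns it at zero cost even though the two sequences differ in length. That tolerance to timing variation is the usual reason for choosing DTW over a frame-by-frame comparison.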
