Open Collections

UBC Theses and Dissertations


Speech amplitude and zero crossing for automated identification of human speakers Wasson, Douglas Arnold 1974


SPEECH AMPLITUDE AND ZERO CROSSING FOR AUTOMATED IDENTIFICATION OF HUMAN SPEAKERS

by

Douglas Arnold Wasson
B.Sc.E., University of New Brunswick, 1972

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE

in the Department of Electrical Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
May 1974

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department
The University of British Columbia
Vancouver 8, Canada

ABSTRACT

This thesis involves an investigation of the usefulness of speech amplitude, low pass zero crossing rate (ZCR) and high pass zero crossing rate for speaker recognition.

Speech samples were recorded from ten speakers and preprocessed to yield time quantized preprocessed waveforms of SPEECH AMPLITUDE, LOW PASS ZCR and HIGH PASS ZCR. These preprocessed waveforms were then averaged for each speaker to form three sets of averaged waveforms. These waveforms were then expanded in an n-dimensional feature space, where n is the number of speakers used in testing the system. The orthogonal functions describing the feature space were derived from the average preprocessed waveforms by using the Gram Schmidt orthogonalization technique. The average preprocessed waveforms were expanded in the feature space to provide a reference feature vector for each speaker.

The recognition procedure was based on measuring the Euclidean distance between the feature vector derived from a sample preprocessed waveform for the unknown speaker and the reference feature vectors. The speaker whose reference vector was closest to the test vector was chosen as the speaker of that utterance. A variety of combinations of the preprocessed waveforms were tested. The best results showed an average percentage of incorrect decisions of 3.4% when one of ten speakers was identified on the basis of a single spoken sentence. The results indicate that speech amplitude and zero crossing rate are useful for speaker identification.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgement

I    INTRODUCTION
II   SPEAKER RECOGNITION SYSTEM
     2.1  Basic System
     2.2  Preprocessing
     2.3  Feature Extraction
     2.4  Decision Making
III  DATA ACQUISITION AND INITIAL PROCESSING
     3.1  Speech Recording
     3.2  Digitizing of the Speech Samples
     3.3  Locating the Speech Boundaries
     3.4  Preprocessing Algorithm
IV   SYSTEM TESTS AND RESULTS
     4.1  Test Procedure
     4.2  Test Results
          4.2.1  Individual Preprocessed Waveforms
          4.2.2  Concatenated Preprocessed Waveforms
          4.2.3  Concatenated Feature Vectors
     4.3  Measurement of the Variances of the Preprocessed Waveforms
     4.4  Discussion of Results
V    CONCLUSION
REFERENCES

LIST OF TABLES

Table
4.1  Measurements F, G, and F/G made on the feature space when individual preprocessed waveforms were used
4.2  Results of all possible combinations of three preprocessed waveforms
4.3  Measurements F, G and F/G made on the feature space when concatenated preprocessed waveforms were used
4.4  Measurements F, G, and F/G made on the feature space when concatenated feature vectors were used
4.5  Normalized variances for all three preprocessed waveforms

LIST OF FIGURES

Block Diagram of Speaker Recognition System
Typical Preprocessor Output Waveforms
Speech Boundary Determination
Typical preprocessed waveforms for a speech sample of candidate CS
Two typical 2-dimensional feature vector plots for 2 candidates
Performance results for the individual preprocessed waveforms
Performance results for the concatenated preprocessed waveforms tested
Performance results for the concatenated feature vectors tested

ACKNOWLEDGEMENT

I wish to thank my supervisor, Dr. R. W. Donaldson, for the valuable suggestions and constant encouragement which I received during the course of my project.

I would also like to thank Mr. M. Koombes for the technical assistance provided in the digitization of the data.

The financial support received from the National Research Council in the form of a 1967 Science Scholarship and under Grant 67-3308 is gratefully acknowledged.

CHAPTER 1
INTRODUCTION

The ability of human listeners to identify speakers from their voices has long been known. Virtually everyone has recognized friends on the telephone solely on the basis of the word "Hello". Studies to obtain estimates on the reliability with which human listeners can recognize human speakers [1,2] show that listeners can recognize speakers with accuracies of 94-98%. The human listeners' speaker recognition capabilities motivate attempts to design automatic (computer based) systems for speaker recognition.

Many different uses exist for a speaker recognition system. It could provide controlled access to a facility or information to selected individuals. A reliable speaker identification system could supply clues to a speaker's identity when other clues are either missing or highly ambiguous.

It has been shown that automatic speaker recognition on populations of 10-20 persons is feasible [3-10]. In most previous studies,
In most p r e v i o u s  studies,  the speech s i g n a l was transformed i n t o s p e c t r a l form and the  resulting  time-frequency-energy  identify  or v e r i f y the s p e a k e r .  (spectrographic)  p a t t e r n s were used to  S p e c t r a l t r a n s f o r m a t i o n s y i e l d l a r g e amounts of  d a t a which has to be c a l c u l a t e d and a n a l y z e d u s i n g c o m p u t a t i o n a l l y expensive  techniques. B.S.  A t a l developed a system  tours to r e c o g n i z e s p e a k e r s .  [9]  i n which he used p i t c h c o n -  P i t c h contours were used to a v o i d the  dependency of s p e c t r a l data on v a r i a b l e t r a n s m i s s i o n The purpose of t h i s  characteristics.  study was to determine the u s e f u l n e s s of  speech a m p l i t u d e , low pass z e r o - c r o s s i n g r a t e and h i g h pass s i n g r a t e i n speaker r e c o g n i t i o n . [11]  others  These f e a t u r e s  f o r r e c o g n i t i o n of speech sounds.  t h a t z e r o - c r o s s i n g r a t e measurements cessing,  have been used by  I t has been noted  v i r t u a l l y independent of speaker volume?  [113.  these f e a t u r e s  [11]  a r e w e l l s u i t e d to d i g i t a l p r o and a l s o r e s u l t  much l e s s d a t a p e r speech sample than when s p e c t r a a r e u s e d . ately,  zero-cros-  may be l e s s speaker-dependent  than i s  in  Unfortuns p e c t r a l data  2  The t a s k o f speaker r e c o g n i t i o n c o u l d i n v o l v e e i t h e r  ident-  i f i c a t i o n o f a person from a p o p u l a t i o n of s e v e r a l known speakers verification  of a person's  conducted to see dependent  if  claimed i d e n t i t y .  the f e a t u r e s  to be u s e f u l ,  chosen were  the s p e a k e r ' s  a l l y o f two p a r t s . ital  Since t h i s  study was  sufficiently  speaker  i d e n t i f i c a t i o n t a s k was  f o r c o n d u c t i n g t h e speaker r e c o g n i t i o n t e s t .  
or  selected  The study c o n s i s t s  One was the c o l l e c t i o n o f d a t a , the  gener-  analog-to-dig-  c o n v e r s i o n and the e x t r a c t i o n of the p r e p r o c e s s e d waveforms.  The o t h e r p a r t i n v o l v e d the e v a l u a t i o n o f the f e a t u r e s  for  speaker  recognition. In Chapter 2 the b a s i c system f o r speaker r e c o g n i t i o n outlined.  A d e s c r i p t i o n o f how the p r e p r o c e s s e d waveforms are com-  b i n e d and expanded i n f e a t u r e process  is  space i s  given.  The d e c i s i o n making  outlined. The r e c o r d i n g of speech samples,  the a n a l o g y t o - d i g i t a l c o n -  v e r s i o n and p r e p r o c e s s i n g i s d e s c r i b e d i n Chapter 3. t e s t s o f the speaker r e c o g n i t i o n system and r e s u l t s 4.  5.  is  The performance  appear i n Chapter  C o n c l u s i o n s and i d e a s f o r f u r t h e r r e s e a r c h are o u t l i n e d i n Chapter  3  CHAPTER 2 SPEAKER RECOGNITION 2.1  B a s i c System  The Figure 2.1. The  SYSTEM  speaker r e c o g n i t i o n system can be r e p r e s e n t e d as shown i n First  the analog speech sample i s  d i g i t i z e d speech sample i s  sampled and d i g i t i z e d .  then p r e p r o c e s s e d to y i e l d t h r e e wave-  forms, which are then operated on by the f e a t u r e e x t r a c t o r to y i e l d a feature v e c t o r .  This f e a t u r e v e c t o r i s  then handed on t o the  decision  making a l g o r i t h m which produces a d e c i s i o n as to the s p e a k e r ' s  2.2  identity.  Preprocessing The p r e p r o c e s s o r operates  on the d i g i t i z e d speech sample to  produce t i m e - q u a n t i z e d waveforms o f speech a m p l i t u d e , low pass c r o s s i n g and h i g h pass z e r o - c r o s s i n g r a t e .  
zero-  Time q u a n t i z a t i o n r e s u l t s  from a v e r a g i n g each p r e p r o c e s s e d waveform o v e r a d j a c e n t 10 msec time windows.  F i g u r e 2.2  shows the p r e p r o c e s s o r output waveforms which are  time n o r m a l i z e d to T=1600 msec.  2.3  Feature E x t r a c t i o n Feature e x t r a c t i o n r e s u l t s  of  from the expansion of one o r more  the p r e p r o c e s s e d output waveforms i n n d i m e n s i o n a l space where n  the number o f speakers i n the e x p e r i m e n t . waveforms i n F i g u r e 2.2  is  I f a l l three preprocessed  are used i n the r e c o g n i t i o n system t h e r e are  s e v e r a l ways i n which the o r t h o g o n a l expansion can be done: One  approach i s  to concatenate  two o r three p r e p r o c e s s e d wave-  forms to form a composite waveform o f l e n g t h 2T o r 3T which i s  then  ex-  panded i n the n d i m e n s i o n a l o r t h o g o n a l space  to y i e l d a f e a t u r e  of  to expand each p r e p r o c e s s e d  dimensionality of n .  waveform i n i t s efficients  Another approach i s  own n d i m e n s i o n a l o r t h o g o n a l s p a c e ;  f o r each o f the p r e p r o c e s s e d waveforms  to form an o v e r a l l f e a t u r e v e c t o r . vector is  the v e c t o r s  can then be  The d i m e n s i o n a l i t y o f the  vector  of  co-  concatenated feature  2n or 3n depending whether 2 or 3 f e a t u r e waveforms were u s e d . Next i s  a d e s c r i p t i o n of how to d e r i v e the f u n c t i o n s which  Feature  Incoming speech sample.  
Decision  Extractor Preprocessed Waveforms  F i g u r e 2.1  (speaker i s candidate i ) Feature Vectors  B l o c k Diagram of Speaker R e c o g n i t i o n  System  SPEECH RMPUTUDE  ccg-  D-l 0.0  -1—  0.75  t.O  TIME  0.25  (SEC)  LOW PASS ZCR  -1 0.5  —1 0.75  TIME  1 1.0  (SEC)  I—' 1.25  I— 1.5  2.0  HIGH PASS ZCR  O.S  0.2S  Figure  2.2  —1 0.75  TIME  1—  \.0  (SEC)  l'.S  T y p i c a l P r e p r o c e s s o r Output  — I — 1.7S  —1 2.0  Wavefprms  6  d e s c r i b e the n d i m e n s i o n a l o r t h o g o n a l s p a c e . of  several  samples  the sentence to be used i n the speaker r e c o g n i t i o n system are r e -  corded from each s p e a k e r . of  First,  these samples  Then the p r e p r o c e s s e d waveforms  are d e r i v e d .  waveform o f the j ^  Let  f o r each one  r e p r e s e n t the speech amplitude  u t t e r a n c e o f the i*"* s p e a k e r ,  represent  1  pass z e r o - c r o s s i n g r a t e waveform f o r the same u t t e r a n c e ,  the  and z^_. r e p r e s e n t  the h i g h pass z e r o - c r o s s i n g r a t e waveform f o r the same u t t e r a n c e ; v a r y from 1 to n where n i s from 1 to m, where m i s speaker. l a t e d as  the t o t a l number of s p e a k e r s ,  can  can v a r y  r e c o r d e d per  f o r each p e r s o n are c a l c u -  x. l  1 m = - Z m j  y.  =  i  m  —  E  ' i  m  z. l  1 m = - E mj  j  x.. ±2  2.1  y. .  2.2  =  1  1  ' i j  =  =  z. . IJ  1  2.3  i = l , n is  formed, where  the average o f the p r e p r o c e s s e d waveform o r concatenated  to be t r a n s f o r m e d ,  = x.  or y .  or  s  or  1 ^ = ($ ,y ,z )  ±  = (x ,y ) ±  ±  The  s^  waveforms  i.e.  s.  i  ±  2.4  or z.  
or  (x^z^  or  (Yj^yZ^  2.5  2.6  ±  n p r e p r o c e s s e d waveforms s^ are now transformed i n t o an  independent o r t h o g o n a l set technique  i  follows:  Next the p r e p r o c e s s e d waveform s_^, is  and j  the t o t a l number o f speech samples  The average p r e p r o c e s s e d waveforms  low-  of functions  112J given by the e q u a t i o n s  •l  =  S  l  <f>^, i = l ,  n u s i n g the Gram Schmidt  below  2.7  7  J" - Z C <). 1  *j = s  kj  k  j=2,3,...,n  2.8  where C, ,•> k  J  ,1 ;<{.2 dt S  d  2.9  t  "  k In u s i n g the Gram Schmidt technique the assumption i s made t h a t the p r e p r o c e s s e d waveforms s^ are independent.  If  the p r e p r o c e s s e d  waveforms s^ are not independent then one o f the f u n c t i o n s ,  cb^, w i l l  be  zero. Now that the o r t h o g o n a l f u n c t i o n s have been d e r i v e d , the p r e p r o c e s s e d waveforms s . .  where  s. . = x . . ij ij or  s..  s..  y  = (x..,y..)  ±2  or  or  i  ]  i  . ij  or  z. . ij  2.10  o r ( x . . , z . .) o r ( y . . , z . . )  j  i  j  i  j  i  j  i  2.11  j  = (x. . , y . . , z . . )  2.12  can be expanded i n the o r t h o g o n a l space d e s c r i b e d by the f u n c t i o n s The p r e p r o c e s s e d waveforms s ^  used correspond to the average  cessed waveforms s^ used i n g e n e r a t i n g the o r t h o g o n a l s e t . tions  <JK .  prepro-  The e q u a -  d e s c r i b i n g the expansion i n the f e a t u r e space are n s. . = E u . . (k) cf> J k=l iJ k  2.13  u . . (k) =  2.14  X  where  s. . h d t / A  2.2  and  u^^ (k)  is  - o o  A = J"*"" qb, (t) 2  -oo  ^  i  j  k  dt  2.15  the c o o r d i n a t e v a l u e o f the p r e p r o c e s s e d waveform s  the a x i s d e s c r i b e d by the f u n c t i o n  along  (j^. til  til  The f e a t u r e v e c t o r f o r the j u t t e r a n c e o f the i speaker i s xjt.. = {u. . C l ) »• • • ,VL. . (k) , . . . , u . . 
Cn) } w h i c h i s j u s t the c o o r d i n a t e s o f l j ij Ij ij  S _ J , J expanded i n the n d i m e n s i o n a l o r t h o g o n a l s p a c e . cessed waveform s.  When the p r e p r o -  i s ..expanded i n the h d i m e n s i o n a l space i t  the f e a t u r e u = { i J L ( i ) » • • . » u ^ ( k ) , . . . ( n ) } .  results  The f e a t u r e  in  represents  the average l o c a t i o n of candidate i i n the n d i m e n s i o n a l s p a c e . In the second method o f u s i n g two o r t h r e e p r e p r o c e s s e d waveforms, each p r e p r o c e s s e d waveform i s space. tions  expanded i n i t s  own n d i m e n s i o n a l  T h i s method r e q u i r e s two or t h r e e s e t s o f the o r t h o g o n a l f u n c <JK to be generated depending upon the p r e p r o c e s s e d waveforms  In t h i s  c a s e , f o r each set '  of <j>.'s, s. = x . or s. = y . o r s . = z. x ' x x x ' x x x  pending upon which waveform i s b e i n g c o n s i d e r e d .  used. de-  As b e f o r e the p r e p r o -  cessed waveforms s . . , i n t h i s case s . . = x. . o r s . . = y . . o r s . . = z . . , ij ij ij iJ ij ij ij are expanded i n t h e i r c o r r e s p o n d i n g n d i m e n s i o n a l s p a c e . T h i s i s done f o r each p r e p r o c e s s e d waveform used. T h i s r e s u l t s i n the c o r r e s p o n d i n g f e a t u r e v e c t o r s u . ? o r u . ^ o r u. .. The f e a t u r e v e c t o r u . represents ij 13 • ij ij the c o e f f i c i e n t s o f the speech amplitude p r e p r o c e s s e d waveform x . . e x -*• y ^ panded i n i t s own n - d i m e n s i o n a l s p a c e , s i m i l a r l y f o r u . . and low pass z e r o - c r o s s i n g r a t e and u^* and h i g h pass z e r o - c r o s s i n g r a t e . The s e t s x  z  of  p r e p r o c e s s e d waveforms s. can a l s o be expanded i n t h e i r c o r r e s p o n d i n g 1 ± ± ± n d i m e n s i o n a l space which r e s u l t s i n the f e a t u r e v e c t o r s u . 
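The orthogonalization of equations 2.7 to 2.9 and the expansion of equation 2.14 can be illustrated for sampled waveforms. The following is a minimal sketch, not the thesis program: it assumes each waveform is a list of equally spaced samples, replaces the integrals by sums over samples, and uses the hypothetical names `inner`, `gram_schmidt` and `expand` and an arbitrary tolerance `tol` for detecting dependent waveforms.

```python
def inner(a, b):
    """Discrete stand-in for the integral of a(t) * b(t) dt."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(averages, tol=1e-12):
    """Build orthogonal functions phi_1..phi_n from the average waveforms.

    phi_1 = s_1, and each later phi_j is s_j minus its projections onto
    the earlier phi_k (equations 2.7 to 2.9). The tolerance is an assumption.
    """
    phis = []
    for s in averages:
        phi = list(s)
        for p in phis:
            c = inner(s, p) / inner(p, p)          # coefficient C_kj
            phi = [v - c * q for v, q in zip(phi, p)]
        if inner(phi, phi) < tol:
            raise ValueError("average waveforms are not independent")
        phis.append(phi)
    return phis


def expand(s, phis):
    """Coordinates u(k) of waveform s along each phi_k (equation 2.14)."""
    return [inner(s, p) / inner(p, p) for p in phis]
```

For example, with the two toy average waveforms `[1.0, 1.0, 0.0]` and `[1.0, 0.0, 1.0]`, the second orthogonal function is `[0.5, -0.5, 1.0]`, and expanding the second average waveform gives the coordinates `[0.5, 1.0]`.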
, u . ^ , u . . z  U s i n g t h i s method the new composite where u . .  feature vector i s  u..  cart be  u.. = (u.^,u.y)={ .?(l),...,u. (n),u.?(l),...,u.?(n)} xj x j ' ±2 ij 13 ij ij  2.16  X  u  or  or  u.. = (u..,u..) 13 13  2.17  u. . = ( u . ? , ^ )  2.18  u.. = (u. ,u.^,u.?) xj x j ' x j ' xj  2.19  X  ->-  There i s  also  the average composite f e a t u r e v e c t o r u^ Which equals  c o r r e s p o n d i n g combination of u * , u ^ and u? as u\_. has i n  (2.16-2.19).  For the o p e r a t i o n of the system the o r t h o g o n a l f u n c t i o n s ing u\  the n d i m e n s i o n a l s p a c e ( s ) have to be s t o r e d . ( i = l , n ) , the average l o c a t i o n o f the c a n d i d a t e i  s p a c e , has  to be r e t a i n e d .  the  A l s o the  describ-  features  i n the n d i m e n s i o n a l  9  The in  Gram Schmidt o r t h o g o n a l ! z a t i o n technique has  t h a t i f more speakers  orthogonal functions  advantage  are added to the system thus r e q u i r i n g more  to be g e n e r a t e d ,  only the a d d i t i o n a l o r t h o g o n a l  f u n c t i o n s have to be c a l c u l a t e d , w i t h the e x i s t i n g unaltered.  the  The new o r t h o g o n a l f u n c t i o n s  functions remaining  are generated u s i n g the new  independent p r e p r o c e s s e d waveforms and a l l the p r e v i o u s o r t h o g o n a l f u n c tions  (see  eqn.  Another advantage i s  (2.8)).  that the average  v e c t o r s have to be c a l c u l a t e d f o r the new speakers o n l y . feature vectors  f o r the e a r l i e r speakers  are e a s i l y  zeroes to the end of the f e a t u r e v e c t o r to g i v e i t ality.  The above statement  follows  feature  The average  a l t e r e d by adding the c o r r e c t dimension-  from the expansion t e c h n i q u e .  Equa-  t i o n 2.8 can be w r i t t e n as  i  s  +  *i  =  1  ° k A  =1  -  + c .* 2  + ::.+  2  Comparing (2.20) w i t h ( 2 . 
1 3 )  u  U  2  u  i  it  = (1,0,0,  ±  =  (  =  c  (c  ,c  2.20  can be seen t h a t  0)  1 2 '  1 1  c.^.*.., +  2 1  1  ,  0  ' '  0  ,....,c _  l j l  ,  c  i  )  ,l,0,..,0)  > . u  = (c. n  In  ,c  0  2n'  Each v e c t o r has n c o o r d i n a t e s . values  .  ,1)  n-l,n' The v e c t o r u ^ , 1 £ i < n , has n o n - z e r o  f o r the c o o r d i n a t e s between 1 and i - 1 ,  the c o o r d i n a t e i has  v a l u e 1, and the c o o r d i n a t e s between i+1 and n are z e r o .  the  2.4  D e c i s i o n Making In system o p e r a t i o n an u t t e r a n c e i s  A set  o f p r e p r o c e s s e d waveforms are chosen and the u t t e r a n c e  r e s u l t i n g i n a feature vector u . minimum d i s t a n c e between t h i s each speaker i s  The d i s t a n c e d^'s  generated  the s m a l l e s t .  k=l,n  are then s o r t e d to l o c a t e  t h a t the u t t e r a n c e  come from the k  feature u i s  the  calculated using  k  then decides  Once the  processed  f e a t u r e and the average f e a t u r e v e c t o r o f  cL^ = | | u - u" | |  is  r e c o r d e d from a s p e a k e r .  2.21,  the s m a l l e s t  came from the k'*  A c o r r e c t d e c i s i o n i s made i f t n  speaker chosen by the  system.  1  one.  The system  speaker f o r which d^  the u t t e r a n c e d i d a c t u a l l y  CHAPTER 3 DATA ACQUISITION AND INITIAL PROCESSING 3.1  Speech R e c o r d i n g The  testing  f i r s t s t e p i n the p r o j e c t was to c o l l e c t  o f the speaker r e c o g n i t i o n system.  recorded i n a quiet  data for  The speech samples  tape  A q u i e t room was used f o r r e c o r d i n g to s i m u l a t e a f a v o u r a b l e  atmosphere i n which a r e a l system would o p e r a t e . in  were  room u s i n g a h i g h q u a l i t y audio r e c o r d i n g system  c o n s i s t i n g o f a AKG D200-E microphone and a S c u l l y Model 280 recorder.  the  a l l , seven male  and three  There were ten  speakers  female.  
Each speaker read a set of five sentences written on a card. Only the third sentence of the five, "We were away a year ago", was used in the speaker recognition experiment. This sentence is an entirely voiced sentence whose duration varied from 1.1 to 2.1 seconds. The other sentences were read in order to avoid any special emphasis on the key sentence. The speakers were not given any instructions concerning the manner in which they should read the sentences. They were told the recordings would be used in future speaker recognition tests.

The recording sessions were held once in the morning and once in the afternoon on five different days. A total of 100 different samples of the set of five sentences were recorded, 10 samples per speaker.

3.2  Digitizing of the Speech Samples

The analog speech signal was prefiltered, digitized and stored on digital magnetic tape. The prefilters used included low pass filters of 8 kHz and 1 kHz bandwidth and a band-pass filter having a 1-8 kHz pass band. The analog speech signal was sampled at 16 kHz and quantized uniformly to 10 bits. This process was accomplished using a Data General Super Nova computer. The low-pass and band-pass filters were Krohn-Hite #3342 R filters, which have a 48 dB/octave cutoff rate. The remainder of the processing of the digitized speech samples was done on an IBM Model 360/67 duplex computer.
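As an illustration of what uniform quantization to 10 bits means, the following minimal sketch maps an input value to one of 1024 equally spaced levels. The function name `quantize_uniform`, the signed integer code, the full-scale range and the clipping behaviour are all assumptions made for the example; the text does not describe the converter beyond its sampling rate and word length.

```python
def quantize_uniform(x, bits=10, full_scale=1.0):
    """Map x in [-full_scale, full_scale) to a signed integer code.

    Assumed converter model: equal step sizes, inputs outside the
    range clipped to the extreme codes.
    """
    levels = 1 << bits                  # 1024 levels for 10 bits
    step = 2.0 * full_scale / levels    # width of one quantization step
    code = int(x // step)               # index of the step containing x
    return max(-levels // 2, min(levels // 2 - 1, code))
```

With these assumptions the codes run from -512 to 511, and at a 16 kHz sampling rate one second of speech is represented by 16000 such codes.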
For convenience, the 8 kHz low pass filtered speech will be referred to as normal speech, the 1 kHz low pass filtered speech as low pass speech and the 1-8 kHz band pass filtered speech as high pass speech.

3.3  Locating the Speech Boundaries

Initially the speech samples were digitized and stored on magnetic tape. Before the preprocessed waveforms could be extracted from the speech sample the beginning and ending of the speech sample had to be located. The method used was one in which a measure on the speech was calculated; when this measure exceeded a threshold the beginning of the speech sample was said to be found, and when the measure later dropped below a threshold the ending of the speech sample was said to be located.

The measure used was the energy passed through a time window. The energy was simply the sum of the squares of the speech samples seen through the window, which was moved along the speech samples with the energy being calculated at each move. When the energy exceeded a threshold the beginning of the speech sample had been found. Once the beginning of the speech sample was found, scanning for the end began. When the energy coming through the window dropped below a threshold the ending of the speech sample had been located. This technique was used earlier by B. Gold [13].

Figure 3.1  Speech Boundary Determination (speech value versus time, showing the sentence beginning and ending and the 10 msec and 100 msec scanning windows)

The width of the window used for scanning for the beginning of the speech sample was chosen as 10 msec.
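The energy measure and threshold scan described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis program: the names `window_energy` and `find_boundaries` are hypothetical, the window is advanced one sample at a time (the step size is not given in the text), and the window widths and thresholds are left as parameters.

```python
def window_energy(samples, start, width):
    """Sum of the squared sample values seen through the window."""
    return sum(v * v for v in samples[start:start + width])


def find_boundaries(samples, start_width, end_width,
                    start_threshold, end_threshold):
    """Return (begin, end) window positions, or None if no speech is found.

    begin is the first window position whose energy exceeds
    start_threshold; end is the first later position whose energy
    falls below end_threshold.
    """
    begin = None
    for i in range(len(samples) - start_width + 1):
        if window_energy(samples, i, start_width) > start_threshold:
            begin = i
            break
    if begin is None:
        return None
    for i in range(begin + 1, len(samples) - end_width + 1):
        if window_energy(samples, i, end_width) < end_threshold:
            return begin, i
    return begin, len(samples)          # energy never dropped below threshold
```

Recomputing the full sum at every position keeps the sketch short; a running sum that adds the entering sample and subtracts the leaving one would give the same energies with far less arithmetic.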
Since speakers tend to speak more softly at the end of the sentence than at the beginning, as illustrated in Figure 3.1, a wider window was used to locate the ending of the speech sample. The window width for locating the ending of the speech sample was 100 msec.

The two thresholds, one for locating the beginning of the speech sample and one for locating the ending of the speech sample, were determined experimentally. The same thresholds were used for locating the beginning and ending of normal speech samples and low pass speech samples. Another set of energy thresholds was determined for locating the beginning and ending of the high pass speech samples.

One problem arose in locating the ending of the speech samples. Some speakers paused long enough between the words "away" and "a" in the sentence "We were away a year ago" to cause the algorithm to place the ending of the speech sample following "away". This problem was solved by taking into consideration the fact that the minimum speech sample length was 1.1 seconds. The algorithm was altered such that the scanning for the ending of the speech sample did not start until 1 second of the speech sample had passed since the beginning of the speech sample had been located. This change resulted in the ending of the speech sample being located correctly in all cases.

3.4  Preprocessing Algorithm

The next step was the preprocessing of the speech samples prior to feature extraction. The preprocessor yielded speech amplitude, low pass zero-crossing rate and high pass zero-crossing rate waveforms. As the first step in preprocessing, the digitized speech sample was divided into adjacent 10 msec intervals.

The amplitude A assigned to any 10 msec interval was obtained by averaging the absolute value of the amplitudes of the 160 samples in the interval. Thus the speech amplitude waveform is represented by the amplitudes of all the 10 msec intervals in the speech sample.

The zero-crossing rate (ZCR) was determined by the number of zero-crossings in a 10 msec interval. ZCR was calculated using

$$\mathrm{ZCR} = \sum_{k=1}^{n} \frac{1 - \operatorname{sgn}(x_{k+1})\, \operatorname{sgn}(x_k)}{2} \tag{3.1}$$

where $x_k$ is the amplitude of sample k in the interval and n=160. Therefore the low pass ZCR waveform is represented by all the ZCR's calculated using the low pass speech and the high pass ZCR waveform is represented by all the ZCR's calculated using the high pass speech.

Durations of the extracted preprocessed waveforms varied from 1.1 seconds to 2.1 seconds. The average length was 1.6 seconds. Therefore all the preprocessed waveforms were time normalized to 1.6 seconds using linear scaling. The amplitude waveform was also amplitude normalized such that its total energy equalled a preselected constant value. A typical example of the three normalized waveforms appears in Figure 3.2.

Figure 3.2  Typical preprocessed waveforms for a speech sample of candidate CS

CHAPTER 4
SYSTEM TESTS AND RESULTS

4.1  Test Procedure

In this chapter a description of the different tests performed on the speaker recognition system and their results is presented. The same test procedure was used to test all combinations of the preprocessed waveforms used. The test procedure was to test the system with 2, 4, 6, 8 and 10 candidates separately using five different random groupings for each number of candidates, providing a total of 25 tests in all. The candidate grouping remained the same for all tests on the different preprocessed waveform combinations. For the case of ten candidates the different groupings were merely different orderings of the same candidates (different orderings affect the orthogonal functions used).

The performance measure for the system was the percentage of incorrect decisions (i.e. decisions that identified the wrong person) in the total number of decisions made. As described in Section 2.3, to set up the system the preprocessed waveforms from all ten speech samples for each candidate to be used in testing the system were averaged to form average preprocessed waveforms, which in turn were transformed into orthogonal functions which describe the feature space.
To test the system, the preprocessed waveform(s) from each speech sample of each candidate used in testing the system were in turn expanded in the feature space. Using each resulting feature vector a decision was made as to the identity of the speaker of that particular speech sample. The decision was then checked for its validity, keeping track of the incorrect decisions. The total number of decisions made in a system test equalled ten times the number of candidates used in the test.

There were also three other measurements made on the feature vectors in the feature space. These were the quantities F, G and F/G defined as follows:

$$F = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \left| \bar{\mathbf{u}}_i - \bar{\mathbf{u}}_j \right|}{N(N-1)} \qquad (4.1)$$

$$G = \frac{\sum_{j=1}^{N} \sum_{i=1}^{10} \left| \mathbf{u}_{ij} - \bar{\mathbf{u}}_j \right|}{10N} \qquad (4.2)$$

where $\bar{\mathbf{u}}_j$ is the average feature vector for the j-th speaker, $\mathbf{u}_{ij}$ is the feature vector derived from the i-th utterance of the j-th speaker and N is the number of speakers used in the system test. F is a measure of the average distance between the average feature vectors of any two candidates in the feature space. G is a measure of the average distance between the individual feature vectors and their average feature vector for all vectors in the feature space. F/G is a figure of merit for the system; the larger F/G, the better the performance of the system should be.

The results of the different system tests are presented in graphical form with percentage of incorrect decisions versus number of candidates.
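The measures F, G and F/G defined in equations 4.1 and 4.2 can be illustrated with a minimal sketch. The function names are hypothetical, the distance is assumed to be the Euclidean norm, and the divisor in G is the total number of feature vectors, which equals 10N when every speaker has ten utterances as in the tests described here.

```python
import math
from itertools import permutations


def distance(u, v):
    """Euclidean distance between two feature vectors (assumed norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_vector(vectors):
    """Component-wise average of a list of feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def separation_measures(features):
    """Return (F, G, F/G); features[j] holds the feature vectors of speaker j."""
    means = [mean_vector(vs) for vs in features]
    n = len(means)
    # F: average distance between the mean vectors over all ordered
    # pairs of distinct speakers, as in equation 4.1.
    f = sum(distance(a, b) for a, b in permutations(means, 2)) / (n * (n - 1))
    # G: average distance of every feature vector from the mean vector
    # of its own speaker, as in equation 4.2.
    total = sum(len(vs) for vs in features)
    g = sum(distance(u, m)
            for vs, m in zip(features, means) for u in vs) / total
    return f, g, f / g
```

A large F with a small G means the speakers' clusters are far apart relative to their spread, which is why a larger F/G is expected to go with fewer incorrect decisions.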
For a fixed number of candidates the test results varied from one group to another. The variation of results is illustrated on the graph by a bar at that number of candidates covering the range of results. A circle on the bar is used to denote the average value of the performance of the system for that number of candidates.

As stated in Section 2.3 the dimensionality of the feature space equalled the number of candidates used in the speaker recognition tests. Therefore for the case of two speakers the feature vectors can be plotted to show how the vectors group in the feature space. These plots were made for each group of two candidates. Two examples of the plots are shown in Figure 4.1.

Figure 4.1  Two typical 2-dimensional feature vector plots for 2 candidates. First plot: candidates ND (X) and GB (O), M = mean; concatenated preprocessed waveform used: HIGH PASS ZCR, SPEECH AMPLITUDE. Second plot: candidates DW (X) and JM (O), M = mean; preprocessed waveform used: LOW PASS ZCR.

4.2  Test Results

4.2.1  Individual Preprocessed Waveforms

First the individual preprocessed waveforms were evaluated separately with the speaker recognition system. The results are presented in Figure 4.2. As can be seen from the curves the individual preprocessed waveform LOW PASS ZCR performed best, with an average error rate of 8.4% with ten candidates. The preprocessed waveform SPEECH AMPLITUDE ranked second with an average error rate of 12.5% under the same conditions. The preprocessed waveform HIGH PASS ZCR was last with an average error rate of 21.9% in the same situation. In all cases the error increased approximately linearly with the number of candidates.

Figure 4.2(a)  Performance results of the preprocessed waveform SPEECH AMPLITUDE (percentage of incorrect decisions versus number of candidates)

Figure 4.2(b)  Performance results of the preprocessed waveform LOW PASS ZCR (percentage of incorrect decisions versus number of candidates)

Figure 4.2(c)  Performance results of the preprocessed waveform HIGH PASS ZCR (percentage of incorrect decisions versus number of candidates)

The measurements F, G and F/G made on the feature vectors derived from the preprocessed waveforms are recorded in Table 4.1. The values are averaged over the five different groups for each number of candidates used in the system test. The preprocessed waveform LOW PASS ZCR showed the best range of the figure of merit F/G, ranging from 6.4 to 2.1 as the number of candidates increased from 2 to 10. The range of F/G for the preprocessed waveform SPEECH AMPLITUDE was from 4.9 to 1.6 and for the preprocessed waveform HIGH PASS ZCR from 4.3 to 1.5.

4.2.2  Concatenated Preprocessed Waveforms

The speaker recognition system was used to evaluate concatenated preprocessed waveforms as explained in Section 2.3.
Since there are two ways to concatenate two preprocessed waveforms A and B, that is A,B and B,A, and six ways to concatenate three preprocessed waveforms A, B and C, it was decided to test all possible combinations of two and three preprocessed waveforms using the same set of ten candidates. The results are presented in Table 4.2.

The four distinct sets of concatenated preprocessed waveforms which performed the best under the above test were chosen for complete evaluation. The sets that were chosen were HIGH PASS ZCR, SPEECH AMPLITUDE; LOW PASS ZCR, SPEECH AMPLITUDE; HIGH PASS ZCR, LOW PASS ZCR; and LOW PASS ZCR, HIGH PASS ZCR, SPEECH AMPLITUDE. The results of the above sets are presented in Figure 4.3. The best performance was obtained from the concatenated preprocessed waveform LOW PASS ZCR, HIGH PASS ZCR, SPEECH AMPLITUDE with an average error rate of 5.2% with ten candidates. This performance was only slightly better than that of the concatenated preprocessed waveform HIGH PASS ZCR, SPEECH AMPLITUDE with an average error rate of 5.4% under the same conditions. The concatenated preprocessed waveform LOW PASS ZCR, SPEECH AMPLITUDE ranked next with an average error rate of 9.8% with ten candidates and HIGH PASS ZCR, LOW PASS ZCR was last with an average error rate of 18.6% under similar conditions.

Table 4.1  Measurements F, G, and F/G made on the feature space when individual preprocessed waveforms were used

Preprocessed waveform: SPEECH AMPLITUDE

No. of candidates      F        G        F/G
        2            1.002    .2041     4.91
        4            1.183    .3536     3.35
        6            1.263    .5988     2.12
        8            1.378    .7233     1.91
       10            1.436    .8899     1.62

Preprocessed waveform: LOW PASS ZCR

No. of candidates      F        G        F/G
        2            1.012    .1576     6.42
        4            1.156    .3439     3.36
        6            1.247    .4790     2.60
        8            1.311    .5702     2.35
       10            1.395    .6686     2.09

Preprocessed waveform: HIGH PASS ZCR

No. of candidates      F        G        F/G
        2             .986    .2306     4.28
        4            1.156    .5630     2.05
        6            1.302    .7644     1.70
        8            1.430    .9431     1.52
       10            1.557   1.044      1.49

Table 4.2  Results of all possible combinations of the three preprocessed waveforms

Number of candidates: 10

Concatenated Preprocessed Waveform                   % error
SPEECH AMPLITUDE, LOW PASS ZCR                        10.0
LOW PASS ZCR, SPEECH AMPLITUDE                         7.0
SPEECH AMPLITUDE, HIGH PASS ZCR                        7.0
HIGH PASS ZCR, SPEECH AMPLITUDE                        5.0
LOW PASS ZCR, HIGH PASS ZCR                           19.0
HIGH PASS ZCR, LOW PASS ZCR                           18.0
SPEECH AMPLITUDE, LOW PASS ZCR, HIGH PASS ZCR         10.0
SPEECH AMPLITUDE, HIGH PASS ZCR, LOW PASS ZCR          7.0
LOW PASS ZCR, SPEECH AMPLITUDE, HIGH PASS ZCR          6.0
HIGH PASS ZCR, SPEECH AMPLITUDE, LOW PASS ZCR          6.0
LOW PASS ZCR, HIGH PASS ZCR, SPEECH AMPLITUDE          6.0
HIGH PASS ZCR, LOW PASS ZCR, SPEECH AMPLITUDE          6.0

Figure 4.3(a)  Performance results of the concatenated preprocessed waveform HIGH PASS ZCR, SPEECH AMPLITUDE (percentage of incorrect decisions versus number of candidates)

Figure 4.3(b)  Performance results of the concatenated preprocessed waveform LOW PASS ZCR, SPEECH AMPLITUDE (percentage of incorrect decisions versus number of candidates)

Figure 4.3(c)  Performance results of the concatenated preprocessed waveform HIGH PASS ZCR, LOW PASS ZCR (percentage of incorrect decisions versus number of candidates)

Figure 4.3(d)  Performance results of the concatenated preprocessed waveform LOW PASS ZCR, HIGH PASS ZCR, SPEECH AMPLITUDE (percentage of incorrect decisions versus number of candidates)

The measurements F, G and F/G for these four sets of concatenated preprocessed waveforms are presented in Table 4.3. The two concatenated waveforms LOW PASS ZCR, HIGH PASS ZCR, SPEECH AMPLITUDE and HIGH PASS ZCR, SPEECH AMPLITUDE have approximately the same F/G values, of 6.02 to 2.00 and 5.98 to 1.97 respectively, as the number of candidates increases from 2 to 10. The F/G values for the last two concatenated waveforms, LOW PASS ZCR, SPEECH AMPLITUDE and HIGH PASS ZCR, LOW PASS ZCR, range from 5.68 to 1.72 and 4.60 to 1.68 respectively, as the number of candidates increases from 2 to 10.

4.2.3  Concatenated Feature Vectors

In this section the evaluation results of the concatenated features derived by concatenating the feature vectors derived from individual preprocessed waveforms are presented. Since it was the feature vectors that were concatenated and not the preprocessed waveforms as in the last section, it made no difference in which order the feature vectors were concatenated. The results for the concatenated feature vectors using different preprocessed waveforms are presented in Figure 4.4.
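The difference between the two ways of combining measurements can be made concrete by counting the cases. The short sketch below is illustrative only (the variable names are hypothetical): it lists the ordered arrangements that matter when waveforms are concatenated before the expansion, and the unordered sets that matter when feature vectors are concatenated after it.

```python
from itertools import combinations, permutations

WAVEFORMS = ["SPEECH AMPLITUDE", "LOW PASS ZCR", "HIGH PASS ZCR"]

# Concatenated preprocessed waveforms: order matters, so every ordered
# arrangement of two or three waveforms is a separate case.
ordered = [p for k in (2, 3) for p in permutations(WAVEFORMS, k)]

# Concatenated feature vectors: order does not matter, so only the
# unordered sets of two or three waveforms are separate cases.
unordered = [c for k in (2, 3) for c in combinations(WAVEFORMS, k)]

print(len(ordered), len(unordered))  # prints "12 4"
```

The twelve ordered cases correspond to the rows of Table 4.2, and the four unordered cases to the four concatenated feature vectors evaluated in this section.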
The concatenated feature vector derived from the preprocessed waveforms SPEECH AMPLITUDE, LOW PASS ZCR and HIGH PASS ZCR performed best with an average error rate of 3.4% with ten candidates. The concatenated feature vector derived from the preprocessed waveforms SPEECH AMPLITUDE, LOW PASS ZCR was close behind with an average error rate of 3.6% with ten candidates. The other two concatenated feature vectors, derived from the preprocessed waveforms SPEECH AMPLITUDE, HIGH PASS ZCR and LOW PASS ZCR, HIGH PASS ZCR, performed approximately the same, each having an average error rate of 8.6% with ten candidates.

Table 4.4 contains average values for the measurements F, G and F/G made on the feature space created using the concatenated feature vectors. Note that the concatenated feature vector derived from the preprocessed waveforms SPEECH AMPLITUDE, LOW PASS ZCR had F/G values ranging from 4.86 to 1.77 as the number of candidates increases from 2 to 10, which is better than the F/G values for the concatenated feature vector derived from all three preprocessed waveforms, which range from 4.27 to 1.59 under the same conditions. The F/G values for the two concatenated feature vectors derived from SPEECH AMPLITUDE, HIGH PASS ZCR and LOW PASS ZCR, HIGH PASS ZCR range from 4.17 to 1.49 and 4.42 to 1.65 respectively as the number of candidates increases from 2 to 10.

Table 4.3  Measurements F, G and F/G made on the feature space when concatenated preprocessed waveforms were used

Concatenated waveform: HIGH PASS ZCR, SPEECH AMPLITUDE

No. of candidates      F        G       F/G
        2            1.001    .167     5.98
        4            1.159    .365     3.48
        6            1.233    .512     2.41
        8            1.323    .605     2.19
       10            1.385    .702     1.97

Concatenated waveform: LOW PASS ZCR, SPEECH AMPLITUDE

No. of candidates      F        G       F/G
        2            1.001    .176     5.68
        4            1.164    .425     2.73
        6            1.251    .604     2.07
        8            1.362    .708     1.92
       10            1.448    .841     1.72

Concatenated waveform: HIGH PASS ZCR, LOW PASS ZCR

No. of candidates      F        G       F/G
        2            1.002    .218     4.60
        4            1.151    .492     2.34
        6            1.291    .663     1.95
        8            1.429    .824     1.73
       10            1.530    .906     1.68

Concatenated waveform: LOW PASS ZCR, HIGH PASS ZCR, SPEECH AMPLITUDE

No. of candidates      F        G       F/G
        2            1.001    .166     6.02
        4            1.158    .361     3.21
        6            1.232    .506     2.43
        8            1.322    .597     2.21
       10            1.384    .693     2.00

Figure 4.4(a)  Performance results of concatenated feature vectors derived from the preprocessed waveforms SPEECH AMPLITUDE and LOW PASS ZCR (percentage of incorrect decisions versus number of candidates)

Figure 4.4(b)  Performance results of concatenated feature vectors derived from the preprocessed waveforms SPEECH AMPLITUDE and HIGH PASS ZCR (percentage of incorrect decisions versus number of candidates)

Figure 4.4(c)  Performance results of concatenated feature vectors derived from the preprocessed waveforms LOW PASS ZCR and HIGH PASS ZCR (percentage of incorrect decisions versus number of candidates)

Figure 4.4(d)  Performance results of concatenated feature vectors derived from the preprocessed waveforms SPEECH AMPLITUDE, LOW PASS ZCR and HIGH PASS ZCR (percentage of incorrect decisions versus number of candidates)

Table 4.4  Measurements F, G, and F/G made on the feature space when concatenated feature vectors were used

Preprocessed waveforms used: SPEECH AMPLITUDE, LOW PASS ZCR

No. of candidates      F        G       F/G
        2            1.426    .293     4.86
        4            1.638    .626     2.61
        6            1.770    .858     2.06
        8            1.900    .973     1.95
       10            2.023   1.142     1.77

Preprocessed waveforms used: SPEECH AMPLITUDE, HIGH PASS ZCR

No. of candidates      F        G       F/G
        2            1.416    .339     4.17
        4            1.643    .783     2.10
        6            1.818   1.055     1.72
        8            2.003   1.263     1.59
       10            2.144   1.440     1.49

Preprocessed waveforms used: LOW PASS ZCR, HIGH PASS ZCR

No. of candidates      F        G       F/G
        2            1.426    .323     4.42
        4            1.631    .695     2.35
        6            1.811    .924     1.96
        8            1.971   1.130     1.74
       10            2.105   1.274     1.65

Preprocessed waveforms used: SPEECH AMPLITUDE, LOW PASS ZCR and HIGH PASS ZCR

No. of candidates      F        G       F/G
        2            1.743    .408     4.27
        4            2.006    .892     2.25
        6            2.206   1.195     1.85
        8            2.402   1.414     1.70
       10            2.565   1.613     1.59

4.3  Measurement of the Variances of the Preprocessed Waveforms

In the initial processing ten preprocessed waveforms of each type were generated for each candidate. Each set of ten preprocessed waveforms was averaged to form three average preprocessed waveforms for each candidate to be used in the calculation of the basis functions in the Gram-Schmidt orthogonalization. The variance between each individual preprocessed waveform and its average was calculated and then normalized by the root mean square (RMS) value of the average preprocessed waveform.
The normalized variances, the average normalized variance for each set of waveforms and the RMS value of the average preprocessed waveform for all the speakers and for all the waveform types of SPEECH AMPLITUDE, LOW PASS ZCR and HIGH PASS ZCR are presented in Table 4.5. The overall average normalized variances, that is averaged over all the preprocessed waveforms of one type of all the speakers, of the preprocessed waveforms SPEECH AMPLITUDE, LOW PASS ZCR, and HIGH PASS ZCR are .414, .299 and .257 respectively.

4.4  Discussion of Results

The speaker recognition system was tested using individual preprocessed waveforms, concatenated preprocessed waveforms and concatenated feature vectors derived from individual preprocessed waveforms. The best performance resulted from the cases in which concatenated feature vectors were used. The lowest average error rate obtained was 3.4% with ten candidates, achieved with concatenated feature vectors derived from all three preprocessed waveforms. In all cases the error rate increased approximately linearly with the number of candidates. In using either concatenated preprocessed waveforms or concatenated feature vectors, the improvement achieved from using all three preprocessed waveforms seems minimal since a reduction of only 0.2% in error rate was achieved with ten candidates in each case.
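To put the 0.2% figure in perspective, a short calculation under the test procedure of Section 4.1 shows how many decisions it corresponds to. It assumes the reported averages for ten candidates are taken over all five groupings, each contributing ten times ten decisions.

$$5 \times 100 = 500 \ \text{decisions}, \qquad 0.002 \times 500 = 1 \ \text{decision}$$

On that reading, the combinations of all three preprocessed waveforms made one fewer incorrect decision than the best combinations of two over the whole set of ten-candidate tests.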
The method of using concatenated feature vectors has the advantage that the dimensionality of the feature space is increased to 2n or 3n, where n is the number of candidates used in the tests, depending upon whether 2 or 3 preprocessed waveforms are used. For individual and concatenated preprocessed waveforms the dimensionality of the feature space is n. The method of concatenating feature vectors requires 2 or 3 times more storage to store the coefficients of the average feature vectors and the orthogonal functions. When the concatenated feature vectors were used there was also an increase in the time to arrive at a decision, due to the increase in calculations.

Table 4.5  Normalized variances for all three preprocessed waveforms

Preprocessed waveform: SPEECH AMPLITUDE

          Normalized waveform variances (samples 1 to 10)                          Mean       RMS value of
Speaker     1      2      3      4      5      6      7      8      9     10     variance   mean waveform
ND        0.281  0.325  0.291  0.289  0.333  0.385  0.325  0.366  0.355  0.217    0.317        50.8
BC        0.293  0.138  0.393  0.318  0.187  0.389  0.908  0.279  0.381  0.121    0.131        53.8
AS        0.387  0.312  0.015  0.552  0.133  0.571  0.301  0.351  0.167  0.169    0.126        53.8
BA        0.180  0.133  0.373  0.279  0.297  0.287  0.386  0.387  0.367  1.820    0.511        50.7
JD        0.510  0.295  0.517  0.397  0.390  0.528  0.353  0.131  0.193  0.172    0.138        53.9
DW        0.126  0.183  0.395  0.355  0.367  0.381  0.122  1.890  0.376  0.396    0.519        50.0
CS        0.103  0.196  0.317  0.187  0.277  0.335  0.310  0.318  0.168  0.302    0.371        51.1
GE        0.117  0.319  0.323  0.332  0.361  0.188  0.381  0.338  0.331  0.112    0.371        51.3
JM        0.119  0.337  0.302  0.297  0.331  0.252  0.290  0.311  0.392  0.305    0.330        51.6
GB        0.281  0.118  0.511  0.399  0.526  0.260  0.399  0.157  0.320  0.283    0.392        51.2

Preprocessed waveform: LOW PASS ZCR

          Normalized waveform variances (samples 1 to 10)                          Mean       RMS value of
Speaker     1      2      3      4      5      6      7      8      9     10     variance   mean waveform
ND        0.203  0.212  0.300  0.233  0.315  0.265  0.320  0.273  0.265  0.237    0.265         8.0
BC        0.221  0.210  0.296  0.227  0.188  0.201  0.191  0.201  0.212  0.180    0.217         8.7
AS        0.191  0.350  0.110  0.181  0.393  0.387  0.102  0.371  0.111  0.129    0.115         6.5
BA        0.360  0.308  0.258  0.219  0.301  0.306  0.299  0.319  0.271  0.263    0.291         6.7
JD        0.326  0.350  0.396  0.298  0.251  0.121  0.373  0.351  0.353  0.351    0.317         7.5
DW        0.295  0.281  0.319  0.319  0.251  0.251  0.372  0.215  0.215  0.278    0.286         8.7
CS        0.275  0.291  0.266  0.301  0.222  0.235  0.192  0.218  0.250  0.255    0.251         6.8
GE        0.289  0.255  0.236  0.300  0.303  0.331  0.283  0.313  0.271  0.276    0.286         7.3
JM        0.122  0.305  0.293  0.277  0.318  0.353  0.311  0.138  0.112  0.338    0.317         6.7
GB        0.262  0.286  0.357  0.277  0.316  0.263  0.263  0.238  0.262  0.279    0.280         7.3

Preprocessed waveform: HIGH PASS ZCR

          Normalized waveform variances (samples 1 to 10)                          Mean       RMS value of
Speaker     1      2      3      4      5      6      7      8      9     10     variance   mean waveform
ND        0.236  0.321  0.215  0.281  0.201  0.301  0.228  0.220  0.187  0.203    0.213        39.2
BC        0.291  0.221  0.225  0.221  0.106  0.299  0.353  0.211  0.256  0.209    0.271        38.5
AS        0.503  0.158  0.117  0.659  0.399  0.560  0.359  0.378  0.101  0.315    0.118        10.6
BA        0.182  0.219  0.200  0.181  0.207  0.163  0.123  0.167  0.181  0.181    0.181        32.9
JD        0.390  0.618  0.112  0.291  0.269  0.311  0.321  0.350  0.368  0.317    0.371        31.0
DW        0.179  0.231  0.233  0.212  0.195  0.321  0.211  0.161  0.231  0.190    0.223        37.8
CS        0.231  0.207  0.225  0.205  0.328  0.208  0.202  0.197  0.196  0.207    0.221        39.0
GE        0.219  0.158  0.211  0.258  0.251  0.162  0.210  0.167  0.263  0.239    0.211        31.1
JM        0.183  0.198  0.265  0.179  0.271  0.179  0.111  0.180  0.150  0.117    0.190        38.0
GB        0.156  0.179  0.200  0.225  0.193  0.187  0.209  0.221  0.217  0.221    0.201        37.3

The order in which the preprocessed waveforms were concatenated had some effect on the system performance, probably due to the discontinuity at the point where the two preprocessed waveforms were joined. For example, the concatenated preprocessed waveform LOW PASS ZCR, SPEECH AMPLITUDE performed better than the concatenated waveform SPEECH AMPLITUDE, LOW PASS ZCR, as can be seen in Table 4.2. The preprocessed waveform SPEECH AMPLITUDE usually starts with a large value (40) and ends with a smaller value (15-20), whereas LOW PASS ZCR starts and ends around the same value (4-8). A similar effect was observed when 3 preprocessed waveforms were concatenated.

The ordering of the candidates also had some effect on the system performance in the tests with ten candidates. The reason is that the Gram-Schmidt orthogonalization procedure, which transforms a set of n independent signals $x_1(t), x_2(t), \ldots, x_n(t)$ into a set of n orthogonal functions $y_1(t), y_2(t), \ldots, y_n(t)$ using equations (2.7 - 2.9), generates a different orthogonal set depending upon the order in which the $x_i(t)$'s are processed. As shown in Section 2.3 the $x_i(t)$'s are the average preprocessed waveforms and the $y_i(t)$'s are the orthogonal functions which describe the n dimensional feature space. Each different set of orthogonal functions generates a different set of transformation coefficients $C_{ij}$. The locations of the average feature vectors for each candidate are dependent upon the $C_{ij}$'s. Therefore the distances between candidates vary from one ordering of the candidates to the next, which results in some candidates being closer together in some cases than in others. In all cases the variation in performance was a matter of only 3 or 4 incorrect decisions out of a total of 100 decisions.

The performance curve for the preprocessed waveform HIGH PASS ZCR has a drop in percent error from eight to ten candidates, as a result of the fact that the system made no more errors for ten candidates than for eight. The effect is also noticeable in the performance curve of the concatenated preprocessed waveform HIGH PASS ZCR, LOW PASS ZCR.

The preprocessed waveforms resulted in much better system performance when they were combined than when they were used alone.

The preprocessed waveforms had large variances, with SPEECH AMPLITUDE having the largest average normalized variance of 0.414. SPEECH AMPLITUDE also had three or four samples with normalized variances in the range 0.9 to 1.8, which meant these samples were very atypical. The HIGH PASS ZCR preprocessed waveform had the lowest variance although it performed the worst.
Generally the normalized variance of each sample in a set for one candidate was quite close to the average normalized variance of the set. The large variances of the preprocessed waveforms indicate that there is considerable variation in the preprocessed waveforms derived from two different speech samples from the same speaker. The preprocessed waveforms from different speakers seem quite similar, as was noticed when only two candidates were used to test the system. Considering (2.7 - 2.9) with only two candidates the equations become

$$\phi_1 = S_1$$

$$\phi_2 = S_2 - c_{11}\phi_1 = S_2 - c_{11}S_1$$

$$c_{11} = \frac{\int S_2\,\phi_1\,dt}{\int \phi_1^2\,dt} = \frac{\int S_2\,S_1\,dt}{\int S_1^2\,dt}$$

Therefore the more similar $S_1$ and $S_2$ are, the closer $c_{11}$ approaches one. $S_i$ represents the average preprocessed waveform and $\phi_i$ represents the orthogonal functions. In many cases $c_{11}$ equalled approximately .95, which indicates the two average preprocessed waveforms only differ by 5%.

For individual and concatenated preprocessed waveforms the values of F and G generally fell in the range of 1 to 1.5 and 0.2 to 1.0 respectively as the number of candidates increased from 2 to 10. In all cases G increases faster than F as the dimensionality of the feature space increases, which results in F/G decreasing. For the case of concatenated feature vectors the values of F and G are equal to the square root of the sum of the squared values of F and G for the individual feature vectors used. The figure of merit, F/G, indicates general trends; for example, as F/G decreases the error rate increases. In some cases F/G was smaller than in other cases but the error rate was also smaller. Thus, the relationship between F/G and % error is, strictly speaking, not monotonic over small intervals, but tends to be monotonic for large changes in F/G.

CHAPTER 5

CONCLUSION

The purpose of this thesis was to investigate the usefulness of SPEECH AMPLITUDE, LOW PASS ZCR and HIGH PASS ZCR for identifying different speakers. A system as described in Chapter 2 was simulated and different combinations of the preprocessed waveforms were evaluated for recognition. It was shown that the preprocessed waveforms could be combined such that an average error rate as low as 3.4% could be achieved with ten candidates. The system error rate seemed to increase linearly with the number of candidates in all cases. The results suggest that the three measurements made on the speech are very effective in identifying speakers.

The preprocessed waveforms used had the advantages that they were easily calculated and required relatively small amounts of storage to characterize the speech samples. The major disadvantage was less speaker dependence than one might wish. Nonetheless these preprocessed waveforms yielded encouraging results. From the point of view of variances and system performance the preprocessed waveform LOW PASS ZCR seemed the best for separating speakers. It also has to be noted that the sentence used was entirely voiced. If a sentence with unvoiced segments had been used, the preprocessed waveform HIGH PASS ZCR might well have been more useful than was the case in the present study.

The orthogonal functions that described the n dimensional feature space were derived from the average preprocessed waveforms. The technique used to obtain the orthogonal functions has the advantage that additional speakers are easily added to the system by only having to calculate the additional orthogonal functions and average feature vectors.

Further investigation might include use of a sentence that contains unvoiced segments in order to improve the performance of the preprocessed waveform HIGH PASS ZCR. The system could also be examined using more speakers in order to determine how the error rate depends on the number of candidates when this number exceeds ten. More samples from each speaker could be obtained, to see if the variance and the average of the preprocessed waveforms change. The measurements made on the speech sample appeared quite effective for speaker recognition.
Perhaps the system performance could be improved if the speaker was required to speak more than one sentence, particularly if the initial sentence did not yield results having a sufficient level of confidence.
