UBC Theses and Dissertations


Efficient speech storage via compression of silence periods Gan, Cheong Kuoon 1984


EFFICIENT SPEECH STORAGE VIA COMPRESSION OF SILENCE PERIODS

by

CHEONG KUOON GAN

B.Sc. (Hons.), University Of Malaya, 1973

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE STUDIES
Department Of Electrical Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
December 1984
© Cheong Kuoon Gan, 1984

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.
Department of Electrical Engineering
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date: January 22, 1985

Abstract

An adaptive optimal silence detector is designed and implemented in four speech coding schemes: N-bit PCM (N = 5 to 12), N-bit A-law PCM (N = 4 to 8), N-bit ADPCM (N = 3 to 8) and ADM (Adaptive Delta Modulation) for bit-rates of 16Kps, 24Kps and 32Kps. The amount of compression is approximately 35% for voice recordings such as radio newscasts, highly active conversations and readings from prepared texts. Subjective evaluation shows that the silence-edited versions (silence played back as absolute silence) have acceptability scores 1.07 lower than the unedited versions with respect to a specific coding scheme, for a score range of 1 to 5. With noise-edited versions (silence replaced by random noise during playback) the score degradation is 0.5.

Table of Contents

Abstract
List of Tables
List of Figures
List of Special Terms
Acknowledgements
Chapter 1. INTRODUCTION
  1.1 Speech Compression: An Overview
  1.2 Silence Intervals
  1.3 Characteristics Of A Silence Detector
  1.4 Thesis Outline
Chapter 2. DEVELOPMENT SYSTEM
  2.1 System Set-up
  2.2 Experiments On SD50 Silence Detector
Chapter 3. AN OPTIMAL SILENCE DETECTOR, SD52
  3.1 Some Definitions
  3.2 The SD52 Silence Detector
  3.3 The SD52 Algorithm
  3.4 Determination Of Optimal Parameters
Chapter 4. SILENCE DETECTOR IMPLEMENTATION
  4.1 Silence-Speech Coder
  4.2 N-bit PCM Implementation
  4.3 N-bit A-law PCM Implementation
  4.4 ADPCM Implementation
  4.5 ADM Implementation
Chapter 5. EVALUATION AND CONCLUSIONS
  5.1 Subjective Evaluation Of Silence Detector SD52
  5.2 Conclusions
BIBLIOGRAPHY
APPENDIXES

List of Tables

1. SD50: Silence/Speech Intervals
2. SD50: Distribution of Silence Intervals
3. Statistics for Silence Portion of Speech Sample No.1
4. Segregated Energy and Zcr Statistics
5. Typical Duration of Silence Intervals
6. Performance of SD50 for Various Parameters
7. Silence Energy Statistics
8. Parameter Ranges for SD52
9. A Test Domain
10. Performance of SD52 for Various Admissible Parameters
11. Optimal Parameters
12. Occurrence Frequencies of Zero Silence Code for PCM
13. Occurrence Frequencies of Mid-Range Silence Code for PCM
14. Results of SS Coder Implementation for PCM
15. The 13-Segment and 15-Segment Coding Schemes
16. The 9-Segment A-law Approximation
17. Results of SS Coder Implementation for A-law PCM
18. Step Size Ladder Specification
19. Results of SS Coder Implementation for ADPCM
20. Results of SS Coder Implementation for CVSD
21. Silence Detector Evaluation Combinations
22. Cassette Recording Ordering
23. Evaluation Consistency Results
24. Silence Detector Evaluation Scores

List of Figures

1. Development System
2. Pdf of Speech Waveform
3. Pdf of Short-Time Energy
4. Pdf of Silence Waveform
5. Pdf of Silence Energy
6. Six "Energy-grams"
7. Distributions of Silence Energy
8. SS Coder
9. 3-bit Mid-Riser Uniform Quantizer
10. Piecewise Linear Approximation to the A-law
11. ADPCM Block Diagram
12. Stepping Functions
13. ADPCM Adaptation Discontinuity
14. CVSD Modulation
15. Tracking Performance of CVSD Modulation
16. Oversampling Process
17. ADPCM Step Size Error

List of Special Terms

(G,p)
AVG
Concatenation protection
D1(G,p), Silence Domain
D2(G,p), Speech Domain
Dt, Test Domain
Dt-admissible
Dt-optimal
Dummy code
e-crit
E-INIT
EOFF
EON
MIN-SP
noise-edited
SEQ-L
SD50
SD52
SIL-L
Silence code
Silence code-word
Silence-edited
Silence intervals
SS codes
Step size ladder
Stepping function
Z-SIL

Acknowledgements

The thesis topic was suggested by Dr. R. W. Donaldson, whose NSERC grant provided the financial support in the form of a Research Assistantship. My thanks and appreciation also go to the ten conscientious subjects who participated in the listening test.

1. INTRODUCTION

1.1 Speech Compression: An Overview

The digitization of speech offers many advantages, including ease of time division multiplexing and availability to a whole range of digital signal processing techniques. Its readiness for digital signal processing is perhaps the single most important advantage.
Not only can design, analysis and testing be done with a digital computer, speech manipulation like storage, editing, encryption, indexing, cataloguing and referencing can be accomplished with considerable speed and ease. Digitization of speech also leads to exciting new frontiers in man-machine interfacing for applications such as human voice response, speaker identification and speech recognition. In 1970 the D2 Channel Bank digital switching network was added to the Bell System with much success, notwithstanding the existing analogue systems [HP,DMM].

The major setbacks of speech digitization are large storage and increased bandwidth requirements. Speech sampled at a Nyquist rate of 8KHz and digitized to 12 bits per sample requires a bit rate of 96,000 bits/sec, which implies excessive bandwidth requirements for transmission of unprocessed digitized speech. For storage, memory requirements are also excessive. For instance, a 10MB disc can store less than 3 minutes of raw speech at 64kps.
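The 96,000 bits/sec figure follows directly from multiplying the sampling rate by the number of bits per sample. A minimal sketch of that arithmetic is given below; the function name `raw_bit_rate` is a hypothetical helper introduced for illustration and does not come from the thesis.

```python
def raw_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate in bits/sec of unprocessed, uniformly quantized speech."""
    return sample_rate_hz * bits_per_sample


# 8 kHz sampling at 12 bits per sample, as quoted in the text.
print(raw_bit_rate(8000, 12))
```

Reducing the word length N lowers the bit rate in direct proportion, which is why the choice of bits per sample matters so much for storage.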
Even with the cost of memory decreasing at a rate of 20-30% per year, speech digitization is still not attractive because the storage requirement remains large, especially when duplication is undertaken for reliability. These and other considerations have led many researchers to consider speech compression.

The success of digital speech is intimately linked to that of speech compression, which involves reduction of bit-rate, possibly at the expense of fidelity. When working in the time domain, where the speech waveform is being approximated by discrete-time samples, we can reduce the number of bits per sample, N, at the expense of quality. However, this reduction is bounded by listener acceptability. As N is reduced, quantization noise is increased. For example, it can be shown that the signal-to-quantization noise ratio SQNR(dB) = 6N - 7.2 [Jayl], giving a 6 dB degradation for a 1-bit reduction.

Further speech compression is done by detection and removal of redundancies in speech. Speech redundancies manifest themselves in the following ways:

1). Sample-to-sample correlation: Speech sampled at the Nyquist rate typically shows a relation between two adjacent samples that can be expressed as the first order autocorrelation function

$$A(1) = \frac{1}{L} \sum_{n=1}^{L} x(n)\, x(n+1), \quad L \text{ large}$$

with typical values of A(1) lying between 0.75 and 0.90 for 8KHz speech. This high autocorrelation implies sample-to-sample redundancy. When such redundancy is exploited, as in differential coding schemes, speech compression is achieved with minimum sacrifice of quality in the reconstruction phase.

2). Cycle-to-cycle redundancy: Periodicity exhibits itself frequently, especially in a voiced sound, and may last for 100ms or more. However, detection of this cyclic nature of speech involves complexity and extra overhead, even though one can save some unnecessary coding on successive cycles.

3). Pitch Interval: Experiments with speech waveforms have shown that most voiced sounds exhibit pitch intervals of between 2.5 and 20ms that may last for 5 to 40 periods. Elimination of these repetitions at the encoder can give rise to speech compression without loss of quality. The problem here is difficulty in pitch detection.

4). Speech Reproduction Features: From the speech reproduction point of view, the speech waveform itself is not necessary for reproduction of the same utterance. The necessary ingredients are speech features such as parameters from a speech production model, fundamental frequencies, pitch periods, vocal tract resonances, etc. While retaining only these reproduction features can lead to a dramatic saving factor of 20 to 30, this method of speech compression is complex to implement, lacks naturalness, has low listener acceptability, and makes speaker recognition virtually impossible.

5). Silence Intervals: One of the most fruitful approaches to speech compression is through silence detection and deletion. Brady [Bra] is the first to report that for a normal casual 2-person telephone conversation the speech and silence (pauses and listening) periods are about 40% and 60% respectively. Removal of this redundancy will result in substantial savings, and at the reasonable cost associated with the detection of speech/silence and silence/speech boundaries.

The third route in speech compression is via efficient speech coding techniques that exploit one or more of the redundancies cited above. Companding law PCM, ADPCM and ADM are a few examples. Furthermore, any speech code can be optimized by means of Huffman entropy coding, which can result in a further bit-rate reduction of 11-25% without any penalty on the speech production quality whatsoever [BL].

1.2 Silence Intervals

Silence intervals (pauses when no speech is present) have been the object of interest in many situations encountered by many researchers.
One of the first results of such interest was TASI (Time Assignment Speech Interpolation), in which voice channel capacity can be increased virtually by deallocation of channels during silence intervals. The performance of TASI or DSI (Digital Speech Interpolation) systems is necessarily affected by the sensitivity or proper functioning of a silence interval detector [MS]. Cohen [Coh] pointed out that for Packet Voice Communication, silence detection, which can result in a reduction factor of 2 or 3, reduces the transmission duty cycle. Moreover, this reduction can be obtained almost for free, needing at most one IC. In an integrated voice-data network, silence intervals can be utilized for other voice traffic or for data traffic [WF,Yat]. In speech recognition or any speech segmentation scheme, silence detection or endpoint detection is the first processing block, which eliminates much redundant computation [Das,BMM,LRRW,NMW,Ney].

As memory cost decreases, digital speech storage becomes more and more attractive, particularly with speech compression through silence deletion. Experimental speech storage systems have been presented recently both on microprocessors and on main frame computers. In the latter case various speech manipulations may be carried out [FJU,Nus,Max].
It is evident that silence intervals play a central role in all of the above applications. The actual definition of silence has a significant effect on system performance. In a conversation, voice message or speech segment, silence intervals encompass (a) silence (or background noise) preceding or trailing an utterance and (b) inter-sentence, inter-phrase, inter-word and intra-word silences. Speech may be regarded as the complement of silence, encompassing both the intended intelligent utterance and the unintended utterances such as sighs, inhalations and other sounds that are inadvertently produced vocally or otherwise in the course of speaking. Having made such a distinction, we contend that a speech detector and a silence detector are the same functionally, since the outcomes are complementary.

1.3 Characteristics Of A Silence Detector

Silence interval detection in the presence of noise is a non-trivial problem. Analogue voltage comparators (also called voice operated switches) are early and simple versions of silence detectors with a single fixed threshold. These are applicable only in good and well-controlled recording environments [MS,Bra]. The sensitivity of a silence detector is often controlled by the input power or energy threshold.
A high threshold renders a silence detector more sensitive to silence and less sensitive to speech. Since the background or channel noise level in many applications is not fixed, the setting of the threshold, and thereby the sensitivity of the silence detector, becomes subject to the background noise level.

The sensitivity of a silence detector is also governed by its application. For TASI or DSI, detection of short silence intervals causes excessive channel switching, and an overly sensitive silence detector causes degradation in system performance in this case. Similarly, for speech recognition or segmentation, a silence detector usually has a lowered sensitivity to avoid excessive speech clipping. On the other hand, for speech storage, the goal is maximal silence deletion with minimal speech clipping, which implies a silence detector which is as sensitive as possible.

The design of a silence detector often puts constraints on the type of environments in which it can function at an acceptable quality level. The design also determines the ease with which the best threshold can be ascertained; some designs may require repeated trial and error procedures for each application.
We point out here that the 40% talkspurts derived by Brady is very much dependent on that particular analogue speech detector and its energy threshold. Putting Brady's result in better focus, we infer that during a two-person telephone conversation a person is listening 40% of the time and actively vocalising 40%, and that the remaining 20% comprises silence intervals. These figures were obtained using the best threshold chosen from among a range used experimentally. We also infer that the time when a person is speaking consists of 74% talkspurts and 26% silence. The latter figure is substantially below the 37% obtained from our adaptive optimal silence detector (Section 3.4) operating on a prepared speech.

Hangover is used in many single-threshold energy-based silence detectors to account for the falling energy level frequently encountered at the end of an utterance [MS,Bra,Far]. Typical values of 120-200ms are used to prolong the utterance after a silence interval is detected. The same effect can be achieved by having two thresholds, one to detect speech and one for silence. For instance in [FJU], the latter was set at 6-12dB below the former.
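The hangover and dual-threshold ideas described above can be sketched as follows. This is an illustrative sketch only, not the detector of any cited work: the function names, the per-frame energy input and the numeric thresholds are all assumptions made for the example.

```python
from typing import List, Sequence


def detect_with_hangover(energies: Sequence[float], threshold: float,
                         hangover_frames: int) -> List[bool]:
    """Single-threshold detector with hangover (True = speech).

    A frame is speech if its energy reaches the threshold, or if it lies
    within `hangover_frames` frames after the last frame that did.
    """
    labels = []
    remaining = 0  # hangover frames still available
    for e in energies:
        if e >= threshold:
            labels.append(True)
            remaining = hangover_frames
        elif remaining > 0:
            labels.append(True)
            remaining -= 1
        else:
            labels.append(False)
    return labels


def detect_with_two_thresholds(energies: Sequence[float],
                               speech_threshold: float,
                               silence_threshold: float) -> List[bool]:
    """Dual-threshold detector: enter speech at or above speech_threshold,
    return to silence only when energy drops below silence_threshold."""
    labels = []
    in_speech = False
    for e in energies:
        if in_speech:
            if e < silence_threshold:
                in_speech = False
        elif e >= speech_threshold:
            in_speech = True
        labels.append(in_speech)
    return labels


# Hypothetical frame energies with a decaying tail at the end of an utterance.
energies = [5, 30, 25, 12, 8, 3, 2, 1]
print(detect_with_hangover(energies, threshold=20, hangover_frames=2))
print(detect_with_two_thresholds(energies, speech_threshold=20,
                                 silence_threshold=5))
```

With 10ms frames, a hangover of 120-200ms would correspond to 12 to 20 frames. In this toy input both functions keep the two low-energy tail frames (12 and 8) as speech, which illustrates why the two approaches can have the same effect.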
Neuburg [Neu] used dual thresholds to realise a 15% improvement in making the voicing/non-voicing decision.

The most persistent problem in silence detection is the positive detection of weak fricatives such as "s" and "z" which have low energy levels. This problem becomes critical in the presence of noise. Most researchers have acknowledged the loss of weak fricatives [FJU]. As a rule, all silence detectors which do not measure ZCR (zero-crossing-rate) or its equivalent suffer from losses in unvoiced fricatives and stop sounds which are perceptually important. The use of short-time ZCR itself is subject to some shortcomings. For instance, a high frequency component does not manifest itself well in the ZCR when there is a low frequency component of comparable magnitude. Fariello [Far] used a sign sequence detector to identify weak fricatives and semi-vowels with much success. But noise infiltration will corrupt such detection by sign sequence in the same way ZCR measurement is corrupted by noise. The effect of noise can be minimized by subband separation or a.c. bias [BB].

Realising the importance of these weak fricatives in audio perception, several researchers have developed more sophisticated silence detectors.
In [DMV] speech is modelled as a high frequency carrier (ZCR) modulated by a low frequency envelope (energy), resulting in better performance than the conventional silence detectors for low input level. Yatsuzuka [Yat] detailed perhaps the most elaborate scheme, using short-time energy, ZCR and sign-bit sequence, and introduced two transition states between speech and silence as interim decisions to be rectified or nullified later. A figure of 36.2% talkspurts was obtained for a two-way conversation, as compared to 40% by Brady.

Background noise has been regarded by many as stationary except for impulse noise. This point of view is valid only in a very restricted and controlled environment. Noise such as background chattering, footsteps, paper shuffling, hums from office machines and so on is not likely to be eliminated or controlled in a practical situation. To be realistic, background noise levels should be considered time varying, and as such fixed threshold detectors like all the ones mentioned so far would perform rather poorly in a non-controlled situation.

Souza [Sou] is the first to design an adaptive self-normalizing silence detector for the purpose of isolated-word recognition. Because F tests were used based on five-variable statistics derived from the initial 5-second silence (a pre-condition), the detector was also speaker- and script-independent.
The adaptiveness was assured by updating the silence statistics every 500 milliseconds.

1.4 Thesis Outline

The objective of this thesis is to describe the design of a silence detector and its implementation in each of the four speech coding schemes: PCM, log-PCM, ADPCM and ADM, for the purpose of speech storage. As well, we show that through silence deletion, speech storage can be achieved with more than 35% saving in storage capacity for all the above-mentioned coding schemes.

We first describe in Chapter 2 the VAX-11/750 experimental system and some preliminary experiments which motivate the design considerations of our silence detector. Since we want the maximum silence detection possible for speech storage applications, we explore the notion of optimality of a silence detector in Chapter 3. The algorithm for our adaptive optimal silence detector SD52 is given here together with the determination of its optimal parameters. Chapter 4 gives the implementation of SD52 in each of the four speech coding schemes. Results and error analysis are given. Finally, Chapter 5 provides the results of a subjective evaluation of SD52, recapitulates the main results and recommends some areas for further research.

2. DEVELOPMENT SYSTEM

2.1 Experimental System Configuration

Our experiments concerning speech/silence discrimination were done on the VAX-11/750 system having, among other features, the following:

Hardware:
  10MB removable disk (RL02)
  VT125 graphics terminal
  16-channel A/D
  4-channel D/A

Software:
  VMS operating system
  Fortran compiler
  RGL graphic libraries

External hardware included a microphone, a breadboard consisting of a Butterworth bandpass filter and an op-amp, and a headphone or an 8-ohm speaker (see Fig. 1). The speech sampling process is represented by the following block diagram:

[Figure 1 - Development System: microphone -> BPF (f_l = 75Hz, f_h = 3.5KHz) -> op-amp (adjustable gain, x 300) -> A/D (8 KHz, 12-bit) -> VAX-11/750]

After speech was passed through the dynamic microphone, it was band-passed for the stated frequency range with 12dB/octave roll-off. An op-amp with adjustable gain up to 300 brought the speech signal to the ±5V range required by the A/D. After sampling, short-time energy and zero-crossing-rate (zcr) were computed every 10ms (80 samples) using a rectangular window of length 100 samples (12.5ms). Explicitly,

short-time energy: $$E = \frac{1}{100} \sum_{i=1}^{100} |X_i|$$

short-time zcr: $$Z = \frac{1}{2} \sum_{i=1}^{100} \left( 1 - \frac{X_i X_{i-1}}{|X_i X_{i-1}|} \right), \quad |X_i X_{i-1}| \neq 0$$

Because the buffer size for handling I/O transfers was 4000 samples, the window length at both ends of the buffer was truncated to 90 samples. This truncation is not significant since energy and zcr remain relatively constant within 10ms for human speech.

To derive some initial empirical observations, a simple silence detector algorithm called SD50 was used to segment a recorded speech into silence and speech intervals. This detector was based solely on short-time energy with a preset fixed threshold. The 30 sec. test speech, labelled here as speech sample no. 1, was recorded and stored in memory using the system in Fig. 1. The sample was extracted from a prepared lecture tape delivered by an American male. The recording included a fairly high noise level due to poor recording on a low-quality cassette.

Speech Sample No. 1:

It is claimed that young children up to the age of about seven or eight years are incapable of grasping the abstract fundamental that number and volume remain constant even through changes in the outward appearance of the object. For thirty years the work of Piaget and his colleagues in Geneva has profoundly influenced the education of the young child.
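The short-time energy and zcr measurements of Section 2.1, together with a fixed-threshold decision in the spirit of SD50, can be sketched as below. This is a hedged illustration, not the thesis code (which ran in Fortran on the VAX): all function names are hypothetical, the window is assumed to be centred on each 80-sample frame (which reproduces the 90-sample truncation at the buffer ends), truncated windows are assumed to be averaged over their actual length, and the strict `>` comparison against the threshold is also an assumption.

```python
from typing import List, Sequence, Tuple


def short_time_energy(window: Sequence[int]) -> float:
    """Average magnitude of the samples in the window (E in the text)."""
    return sum(abs(x) for x in window) / len(window)


def zero_crossing_count(window: Sequence[int]) -> int:
    """Number of sign changes between adjacent samples (Z in the text).

    Each pair contributes (1 - sign(product)) / 2, i.e. 1 for a sign
    change and 0 otherwise; pairs whose product is zero are skipped.
    """
    return sum(1 for prev, cur in zip(window, window[1:]) if prev * cur < 0)


def frame_features(samples: Sequence[int], hop: int = 80,
                   window_len: int = 100) -> List[Tuple[float, int]]:
    """Compute (energy, zcr) every `hop` samples over one buffer.

    Assumption: the window overhangs each frame by (window_len - hop) / 2
    samples on either side and is clipped at the buffer boundaries.
    """
    pad = (window_len - hop) // 2
    features = []
    for start in range(0, len(samples), hop):
        lo = max(0, start - pad)
        hi = min(len(samples), start + hop + pad)
        window = samples[lo:hi]
        features.append((short_time_energy(window), zero_crossing_count(window)))
    return features


def fixed_threshold_labels(energies: Sequence[float],
                           threshold: float) -> List[str]:
    """Energy-only decision with a preset fixed threshold."""
    return ["speech" if e > threshold else "silence" for e in energies]


# A 4000-sample buffer of a hypothetical alternating-sign test signal.
buffer = [100, -100] * 2000
feats = frame_features(buffer)
print(len(feats), feats[0], feats[1], feats[-1])
```

For a 4000-sample buffer this yields 50 measurement points, with the first and last windows covering 90 samples and all others 100 samples.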
The results and some statistics of the SD50 silence detector are reproduced in Tables 1 and 2. In Table 1, the short-time energy E is computed every 10ms. Each of the 3000 sample points is classified as silence or speech based only on an energy threshold of 20. The actual speech begins at sample point 728, or 7.28s from the beginning of the recording. All the "speech" intervals before this are of duration less than or equal to 30ms and are made up of click sounds from the microphone slide switch or the cassette recorder push-button. In Table 2, the preceding and trailing silence intervals are excluded from statistical compilation. In this example, because the SD50 algorithm detected a false start at sample no. 73, the total speech duration was computed as 2939-73+1 or 2867cs, where cs denotes the number of 10ms sample points. If the statistics were adjusted to disregard the false start, the total duration would be 2212cs, resulting in 64.78% speech and 35.22% silence, rather than 49.98% and 50.02%, respectively, as shown.

Table 1 - SD50: Silence/Speech Intervals
Speech Sample No. 1; Energy Threshold = 20

Each entry is [first last] followed by the interval length in sample points; intervals run in time order across each row.

    Silence             Speech              Silence             Speech
    [   1   72]  72     [  73   74]   2     [  75  117]  43     [ 118  120]   3
    [ 121  128]   8     [ 129  129]   1     [ 130  130]   1     [ 131  131]   1
    [ 132  394] 263     [ 395  395]   1     [ 396  497] 102     [ 498  499]   2
    [ 500  627] 128     [ 628  628]   1     [ 629  631]   3     [ 632  632]   1
    [ 633  639]   7     [ 640  641]   2     [ 642  727]  86     [ 728  738]  11
    [ 739  739]   1     [ 740  752]  13     [ 753  753]   1     [ 754  756]   3
    [ 757  758]   2     [ 759  808]  50     [ 809  812]   4     [ 813  814]   2
    [ 815  816]   2     [ 817  817]   1     [ 818  820]   3     [ 821  874]  54
    [ 875  875]   1     [ 876  899]  24     [ 900  921]  22     [ 922  922]   1
    [ 923  923]   1     [ 924  939]  16     [ 940  940]   1     [ 941  944]   4
    [ 945  949]   5     [ 950  985]  36     [ 986  988]   3     [ 989  998]  10
    [ 999 1002]   4     [1003 1009]   7     [1010 1013]   4     [1014 1033]  20
    [1034 1043]  10     [1044 1044]   1     [1045 1049]   5     [1050 1050]   1
    [1051 1051]   1     [1052 1057]   6     [1058 1058]   1     [1059 1105]  47
    [1106 1113]   8     [1114 1142]  29     [1143 1147]   5     [1148 1150]   3
    [1151 1185]  35     [1186 1186]   1     [1187 1189]   3     [1190 1194]   5
    [1195 1200]   6     [1201 1226]  26     [1227 1227]   1     [1228 1229]   2
    [1230 1231]   2     [1232 1282]  51     [1283 1285]   3     [1286 1312]  27
    [1313 1329]  17     [1330 1344]  15     [1345 1351]   7     [1352 1377]  26
    [1378 1379]   2     [1380 1380]   1     [1381 1384]   4     [1385 1422]  38
    [1423 1429]   7     [1430 1454]  25     [1455 1455]   1     [1456 1459]   4
    [1460 1466]   7     [1467 1489]  23     [1490 1496]   7     [1497 1497]   1
    [1498 1511]  14     [1512 1512]   1     [1513 1525]  13     [1526 1585]  60
    [1586 1589]   4     [1590 1590]   1     [1591 1651]  61     [1652 1663]  12
    [1664 1668]   5     [1669 1721]  53     [1722 1730]   9     [1731 1731]   1
    [1732 1732]   1     [1733 1747]  15     [1748 1756]   9     [1757 1807]  51
    [1808 1821]  14     [1822 1834]  13     [1835 1835]   1     [1836 1848]  13
    [1849 1861]  13     [1862 1883]  22     [1884 1897]  14     [1898 1903]   6
    [1904 1904]   1     [1905 1909]   5     [1910 1971]  62     [1972 1972]   1
    [1973 1989]  17     [1990 2000]  11     [2001 2001]   1     [2002 2012]  11
    [2013 2013]   1     [2014 2015]   2     [2016 2021]   6     [2022 2022]   1
    [2023 2024]   2     [2025 2036]  12     [2037 2041]   5     [2042 2046]   5
    [2047 2047]   1     [2048 2067]  20     [2068 2068]   1     [2069 2083]  15
    [2084 2090]   7     [2091 2096]   6     [2097 2097]   1     [2098 2119]  22
    [2120 2127]   8     [2128 2138]  11     [2139 2139]   1     [2140 2144]   5
    [2145 2151]   7     [2152 2153]   2     [2154 2155]   2     [2156 2156]   1
    [2157 2158]   2     [2159 2178]  20     [2179 2181]   3     [2182 2182]   1
    [2183 2183]   1     [2184 2184]   1     [2185 2191]   7     [2192 2197]   6
    [2198 2205]   8     [2206 2229]  24     [2230 2235]   6     [2236 2238]   3
    [2239 2239]   1     [2240 2242]   3     [2243 2243]   1     [2244 2244]   1
    [2245 2256]  12     [2257 2257]   1     [2258 2319]  62     [2320 2320]   1
    [2321 2340]  20     [2341 2341]   1     [2342 2343]   2     [2344 2347]   4
    [2348 2351]   4     [2352 2362]  11     [2363 2363]   1     [2364 2400]  37
    [2401 2414]  14     [2415 2436]  22     [2437 2443]   7     [2444 2444]   1
    [2445 2445]   1     [2446 2452]   7     [2453 2461]   9     [2462 2462]   1
    [2463 2465]   3     [2466 2482]  17     [2483 2483]   1     [2484 2484]   1
    [2485 2486]   2     [2487 2488]   2     [2489 2490]   2     [2491 2520]  30
    [2521 2530]  10     [2531 2532]   2     [2533 2533]   1     [2534 2561]  28
    [2562 2564]   3     [2565 2566]   2     [2567 2573]   7     [2574 2576]   3
    [2577 2581]   5     [2582 2583]   2     [2584 2585]   2     [2586 2592]   7
    [2593 2594]   2     [2595 2619]  25     [2620 2659]  40     [2660 2660]   1
    [2661 2665]   5     [2666 2674]   9     [2675 2676]   2     [2677 2678]   2
    [2679 2686]   8     [2687 2697]  11     [2698 2703]   6     [2704 2704]   1
    [2705 2706]   2     [2707 2752]  46     [2753 2762]  10     [2763 2763]   1
    [2764 2766]   3     [2767 2789]  23     [2790 2800]  11     [2801 2801]   1
    [2802 2802]   1     [2803 2812]  10     [2813 2817]   5     [2818 2821]   4
    [2822 2826]   5     [2827 2827]   1     [2828 2831]   4     [2832 2842]  11
    [2843 2843]   1     [2844 2847]   4     [2848 2848]   1     [2849 2849]   1
    [2850 2852]   3     [2853 2854]   2     [2855 2855]   1     [2856 2856]   1
    [2857 2858]   2     [2859 2866]   8     [2867 2873]   7     [2874 2892]  19
    [2893 2895]   3     [2896 2896]   1     [2897 2902]   6     [2903 2932]  30
    [2933 2933]   1     [2934 2939]   6     [2940 3000]  61

Table 2 - SD50: Distribution of Silence Intervals
Speech Sample No. 1; Energy Threshold = 20

    Duration (x 10ms)      Silence    Speech
      1                      31         35
      2                      15         13
      3                      11          6
      4                       7          5
      5                       9          4
      6                       5          5
      7                      11          3
      8                       5          1
      9                       3          1
     10                       3          2
     11 -  15                 8         15
     16 -  20                 3          6
     21 -  30                 1         16
     31 -  40                 2          3
     41 -  60                 1          8
     61 -  80                 3          0
     81 - 120                 2          0
    121 - 160                 1          0
    161 -                     1          0
    Total Duration         1434       1433
    Percentage Duration    50.02%     49.98%

2.2 Experiments Using Silence Detector SD50

Based on speech sample no. 1, the probability distribution of the speech waveform amplitude is shown in Fig. 2. As suggested by Paez and Glisson [PG] the distribution approximates a gamma function of the form:

$$p(x) = k \, (\sigma |x|)^{-1/2} \exp\left( -\frac{\sqrt{3}\,|x|}{2\sigma} \right), \qquad k = \left( \frac{\sqrt{3}}{8\pi} \right)^{1/2}$$

We note that the 30 sec. of speech from which the distribution is obtained contains about 48% of "speech" and 52% of "silence" in the sense defined by the SD50 silence detector algorithm with the energy threshold set at 20. The silence part contributes to the prominent peak at the origin. The distribution of short-time energy E for the same speech sample is shown in Fig. 3, showing the dominance of low silence energy. In contrast to the waveform amplitude for total speech, the waveform amplitude for the silence portion has an approximately normal distribution with mean (m) and deviation (d) given in Fig. 4. Fig.
5 shows that the distribution of short-time energy E for the silence part extracted from the recorded speech is clearly not Normal (Gaussian). That the silence or background noise is time varying even for a short recording of 30s is illustrated in Table 3, based on the SD50 silence detector. We observed that the mean silence energy in time segment 3 is 55% higher than that in time segment 1. This indicates that for any robust silence detector the silence threshold must be adaptive.

[Fig. 4 - Pdf of Silence Waveform; horizontal axis: Amplitude]

Table 3 - Statistics for Silence Portion of Speech Sample No. 1

                       Energy E                      Zcr Z
    Time Segment     mean    deviation  d/m        mean    deviation  d/m
                     (m)     (d)        (A)        (m)     (d)        (A)
    1.   0 -  5s      9.56   2.06       0.22       22.93   5.57       0.24
    2.   5 - 10s     11.61   2.51       0.22       24.74   4.35       0.18
    3.  10 - 15s     14.84   2.79       0.19       25.57   7.97       0.31
    4.  15 - 20s     11.90   3.16       0.27       22.62   5.42       0.24
    5.  20 - 25s     12.39   3.92       0.32       24.50   8.33       0.34
    6.  25 - 30s     13.15   3.79       0.29       23.56   7.94       0.34
         0 - 30s     11.56   3.34       0.29       23.74   6.43       0.27

Using SD50 to segregate silence and speech portions from the recorded speech sample yields the statistics for the respective portions as shown in Table 4. The signal-to-noise ratio is SNR = 20 log(116.65/11.56) = 20dB.
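The segregated statistics and the signal-to-noise figure quoted above can be reproduced with a few lines of Python. In this sketch `class_stats` and `snr_db` are hypothetical helper names, the deviation is taken to be the population standard deviation, and the logarithm is taken to base 10. The text states neither convention explicitly, although base 10 reproduces the quoted 20dB.

```python
import math
import statistics


def class_stats(values):
    """Return (mean m, deviation d, ratio d/m) for one class of energy values."""
    m = statistics.mean(values)
    d = statistics.pstdev(values)  # population deviation (an assumption)
    return m, d, d / m


def snr_db(mean_speech, mean_silence):
    """Signal-to-noise ratio in dB from the two mean short-time energies."""
    return 20.0 * math.log10(mean_speech / mean_silence)


if __name__ == "__main__":
    # Mean energies of the speech and silence portions quoted in the text.
    print(round(snr_db(116.65, 11.56), 1))  # 20.1
```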
If we define the absolute deviation A as deviation/mean, we see that the absolute deviation for silence is very much lower than that of speech.

Table 4 - Segregated Energy and Zcr Statistics

                 Energy                              Zcr
                 mean(m)   deviation(d)   d/m        mean     deviation
    Silence       11.56      3.34         0.29       23.74    6.43
    Speech       116.65    124.23         1.06       17.47    7.92
    Both          63.90    102.23         1.60       20.62    7.89

Tables 3 and 4 show that the short-time silence energy has an absolute deviation of less than 0.32. It is easy to show that the absolute deviation A of the short-time energy E of a recorded speech sample is independent of the recording level: scaling the waveform by a gain g scales both m and d by g, so d/m is unchanged. This leads to a simple decision criterion which is independent of recording level. From Fig. 5, P{E < 16} = 0.95, but in terms of mean (m) and deviation (d),

$$16 = m + 1.33\,d = m\,(1 + 1.33\,d/m) = m\,(1 + 1.33\,A).$$

Furthermore, since A < 0.32 for silence energy, we see that P{E < 1.43m} > 0.95. In other words, during silence the probability of E > 1.43m is less than 0.05. This assertion is true regardless of recording level, for our test speech sample.

Inserting absolute silence during playback causes two problems apart from the abruptness caused by editing silences of short duration.
First, there are losses of weak fricatives, exemplified in the following two instances:

a. "s" in "years", 1151-1155; the corresponding energy and zcr are

       E   19  17  16  17  14
       Z   45  51  40  54  50

b. "ce" in "appearance", 2188-2190; the corresponding vectors are

       E    9  11  11
       Z   39  49  31

Second, because the threshold is fixed at 20, there are instances of clipped speech evident in the playback time interval 20-23 sec., when the average silence energy falls to 7.5 (compared to the overall mean of 11.56). A threshold which is appropriate for one part of a speech sample may be detrimental in another part because of the non-stationary nature of background noise.

[Figure 6 - Six "Energy-grams"; recoverable panel titles: c. 'Volume': 1757-1807; d. 'Young children': 821-899; e. 'Number': 1669-1721; f. 'Child': 2903-2932]

In spoken English, the beginning of an utterance is usually stressed more than the ending, which tends to trail off. In fact good intonation and articulation call for such an emphasis. Six "energy-grams" of utterances extracted from the recorded speech are shown in Fig. 6. In each case it is easy to spot the beginning of the utterance within 20ms. However, in most cases the trail-off falls into the silence threshold in a gradual manner.
This suggests that, for optimality, different thresholds should be used for speech-on and speech-off.

From listening to playback of specific intervals in the recorded speech, we observed that speech intervals of duration 30ms or less do not constitute intelligible speech, and that deletion of these does not degrade the quality of the playback. This deletion will reduce the speech content of the recording by about 3.5% (see Table 6c). A reduction of speech content without sacrificing quality of playback is the goal of speech compression by means of silence deletion.

Some characteristics of silence intervals as they occurred in speech sample no. 1 are given in Table 5 below. The performance of SD50 under various parameters is given in Table 6, from which we note the following:

a. The speech content decreases as the energy threshold e-sil increases (Table 6a). For the purpose of speech compression we would want the highest e-sil without giving away too much in playback quality.

b. Comparing Tables 6a and 6b, there is a 2.5% - 3.5% saving if the minimum speech allowable is set at 40ms instead of 10ms.
Because intelligible speech has durations greater than or equal to 40ms, the improved performance will not degrade playback quality. This assertion is in agreement with our subjective listening to the corresponding silence-edited playbacks.

c. Table 6c shows that speech content decreases if the minimum speech allowance, min-sp, is increased. However, any increase in min-sp beyond 40ms is likely to cause further speech clipping since some weak fricatives and stop consonants have durations as short as 40ms.

Table 5 - Typical Duration of Silence Intervals

                              Duration (ms)
    Intraword silence           10 -   50
    Interword silence            0 -  100
    Interphrase silence        100 -  900
    Intersentence silence      800 - 2500

Table 6 - Performance of SD50 for Various Parameters

    min-sp   Minimum speech duration
    e-sil    Silence threshold for Energy
    %sp      speech as a percentage of recording duration
    cs       number of 10ms samples

    (a)  min-sp    e-sil    speech     %sp
         10ms       14      1726cs    78.03%
         10         15      1673      75.63
         10         16      1599      72.29
         10         17      1549      70.03
         10         18      1514      68.44
         10         19      1475      66.68
         10         20      1433      64.78
         10         21      1403      63.43
         10         22      1366      61.75
         10         23      1337      60.44
         10         24      1308      59.13
         10         25      1284      58.05
         10         26      1262      57.05

    (b)  min-sp    e-sil    speech     %sp
         40ms       14      1664cs    75.23%
         40         15      1607      72.65
         40         16      1521      68.76
         40         17      1469      66.41
         40         18      1432      64.74
         40         19      1389      62.79
         40         20      1354      61.21
         40         21      1323      59.81
         40         22      1283      58.00
         40         23      1254      56.69
         40         24      1230      55.61
         40         25      1209      54.66
         40         26      1187      53.66

    (c)  min-sp    e-sil    speech     %sp
         10ms       20      1433cs    64.78%
         20         20      1398      63.20
         30         20      1372      62.03
         40         20      1354      61.21
         50         20      1334      60.31
         60         20      1314      59.40
         70         20      1284      58.05

3. AN OPTIMAL SILENCE DETECTOR, SD52

Associated with each silence detector is the definition of silence/speech thresholds. As the degree of silence detection is increased there is a point beyond which any further increase will result in too much speech clipping (speech loss). There thus exists a concept of optimality inherent in all silence detectors. Operation of a silence detector with adjustable thresholds near its optimal point often requires tedious fine-tuning for each application. It may prove to be impossible for a non-adaptive detector to operate continually near its optimal level. In this chapter we present an adaptive silence detection algorithm.

3.1 Some Definitions

To pursue the concept of optimality we make the following definitions. Let a recording interval be labelled as an interval I = [1,N], where N is the number of sample points in I.
Then for a silence detector G and its parameter vector p, the silence detector algorithm (G,p), when applied to a test speech sample, gives rise to a sequence of disjoint intervals $I^1_k$ which are classified as silence and a complementary set of disjoint intervals $I^2_k$ which are classified as speech (c.f. Table 1).

a). Definition: Silence Domain, $D^1(G,p) = \bigcup_k I^1_k$.

b). Definition: Speech Domain, $D^2(G,p) = \bigcup_k I^2_k$.

Clearly, $D^1(G,p) \cup D^2(G,p) = I$ and $D^1(G,p) \cap D^2(G,p) = \emptyset$.

c). A Test Domain $D_t$ is any subset of I = [1,N].

d). A silence detector algorithm $(G_1,p_1)$ is said to be better than $(G_2,p_2)$ with respect to a test domain $D_t$ if $|D^2(G_1,p_1)| < |D^2(G_2,p_2)|$ and $D_t \subset D^2(G_i,p_i)$, i = 1,2, where $|D^2(G,p)|$ denotes the number of points in the speech domain. By using the same speech sample this definition allows two different silence detectors to be compared. For usefulness we want to choose the test domain $D_t$ to include as many relevant speech points as possible from the speech sample. One way to choose the test domain is to identify within the speech sample a few weak fricatives or stop consonants and include these points in the test domain.

e). A set of parameters p for a silence detector G is said to be $D_t$-admissible if $D_t \subset D^2(G,p)$, i.e.
if the speech domain due to the algorithm (G,p) contains the test domain.

f). A silence detector algorithm (G,p*) is said to be $D_t$-optimal if (G,p*) is the best among all $D_t$-admissible parameters for a given silence detector G.

We illustrate the definitions above with an example. Let G = SD50 and p = e-sil; then (SD50, e-sil) is the silence detector algorithm discussed in Section 2.1. Define two test domains as follows: $D_1$ = {1151-1154, 2188-2190}; $D_2$ = {1151-1154}. Then with respect to speech sample no. 1, (SD50,20) is neither $D_1$-admissible nor $D_2$-admissible because the speech domain of (SD50,20) does not contain either domain. Whereas (SD50,15) is $D_2$-admissible but not $D_1$-admissible, (SD50,8) is both $D_1$-admissible and $D_2$-admissible. From Table 6a, $|D^2(SD50,15)|$ = 1673 and $|D^2(SD50,14)|$ = 1726, so (SD50,15) is better than (SD50,14) with respect to the test domain $D_t$ given in Table 6.

3.2 The SD52 Silence Detector

With the insights gained in Section 2.2, we want to construct a silence detector algorithm (G,p) which has the following characteristics.

[C1] Adaptive to the background noise.
[C2] Separate thresholds (EON, EOFF) for speech-on and speech-off, and a simple non-statistical decision criterion.
[C3] Utilizes a zcr threshold, Z-SIL, to recover the otherwise lost weak fricatives.
[C4] Minimum speech duration specification, MIN-SP.
[C5] First part of recorded speech need not be silence.
[C6] Optimal with respect to some test domain for some test speech samples.

These characteristics include 4 parameters: $p_1$ = Z-SIL, $p_2$ = MIN-SP, $p_3$ = EON, $p_4$ = EOFF.

One of the characteristics of human speech is that some speech features such as energy E and zero-crossing rate Z remain relatively constant within a 10ms frame. Another characteristic of active speech is that E is never constant (within certain limits) for more than 70ms. So a simplistic way to characterize silence or background noise is to say that if E is relatively constant for 100ms then that 100ms is silence. To adapt to the background noise we let AVG be the average short-time energy of the 10 previous silence points (equivalent to 100ms). This is enough to track the varying nature of background noise. For this purpose a FIFO stack of 10 points is kept for updating AVG.
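The running average AVG over the last 10 silence points can be kept with a fixed-length queue. The sketch below is illustrative only: the class name `SilenceAverage` is hypothetical, and pre-filling the queue with a single starting energy is an assumption about how the average is seeded.

```python
from collections import deque


class SilenceAverage:
    """Mean short-time energy of the most recent `size` silence points."""

    def __init__(self, first_energy, size=10):
        # Assumption: the queue starts out filled with one seed value.
        self.fifo = deque([first_energy] * size, maxlen=size)

    @property
    def avg(self):
        return sum(self.fifo) / len(self.fifo)

    def update(self, energy):
        # Appending to a full deque silently drops the oldest point.
        self.fifo.append(energy)


if __name__ == "__main__":
    tracker = SilenceAverage(10.0)
    tracker.update(20.0)
    print(tracker.avg)  # 11.0
```

With 10 points of 10ms each, a step change in the background level is fully absorbed after 100ms of silence.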
Characteristic [C2] is achieved by utilizing the simple decision criterion given in Section 2.2. Let EON, EOFF be the speech-on and speech-off factors. Then a change of context from silence to speech occurs if E > EON*AVG, and a change of context from speech to silence occurs if E is less than or equal to EOFF*AVG. From the discussion in Section 2.2, EOFF < EON and EON > 1.4. Incorporating [C3], the decision rule for change of context is for E and Z to satisfy the following conditions:

    Speech   if E > EON*AVG  or  Z > Z-SIL
    Silence  if E < EOFF*AVG and Z < Z-SIL.

To allow [C4], when there is a possible context switch from silence to speech, that point is included in the speech domain only if the next few points (depending on MIN-SP) also satisfy the same decision condition, with AVG held constant in this look-ahead scheme. Hence a point can be classified as speech but not included in the speech domain because of the minimum speech duration constraint.

We set an initial threshold E-INIT for E at 50, for a range of 0 - 2048, to take care of initial speech/silence discrimination.
All initial points greater than E-INIT are deemed to be speech, but on the first point that is less than or equal to E-INIT, we begin the adaptive process by setting AVG and all 10 points in the FIFO stack to E. Some possible loss of integrity of the adaptation process may occur before a contiguous silence of 100ms is encountered. However, this initial marginal loss of integrity will not last for more than 1 or 2 seconds in normal speech. A case in point is given in Table 2 for speech sample no. 1, where there are 25 silence intervals of duration 100ms or longer in a 30-second recording, implying an average spacing of 1.2 sec. for such silence intervals.

As for [C6], the optimality has to be established experimentally with respect to some well chosen test domain and given test speech samples.

There are two potential pitfalls in implementing an adaptive, dual-threshold scheme which calls for updating of the average silence energy whenever a silence point is detected. First, consider the problem of two thresholds. The idea behind two thresholds EON, EOFF with EON > EOFF is supported in Section 2.2 and Fig. 6.
If we activate the EON threshold after a silence point is detected we may not detect the likes of stop consonants, which are often preceded by 10 to 30ms of intraword silence and exhibit low energy. One way to circumvent this is to activate EON only if the preceding 6 points (= 60ms) are silence.

Secondly, we are likely to corrupt the average if we update AVG indiscriminately. Because of the EON factor (which can be in the neighbourhood of 2.0), the rising edge of an utterance, and frame alignment, it is likely that the first one or two points of an utterance are judged as silence. As these boundary points can be nearly twice the average silence energy, we are in fact corrupting the average by using these points to update AVG. Hence we want to update AVG only if a silence point has E < e-crit*AVG for some fixed critical value e-crit. In Section 2.2 it was shown that during silence, P[E < 1.43m] > 0.95, where m is the mean silence energy for the whole speech sample. So an intuitive value for e-crit would be 1.43. But this still would lead to inclusion of many points close to 1.43*AVG in the adaptive process (as many as the occurrences of the rising edge of an utterance). Fig. 7 shows the distributions of silence energy for the whole duration of speech and two selected silence segments. Since silence energy (i.e.
background noise) is time varying it is not surprising to see that the long term deviation is much larger than the short term localized deviation. The table below shows the difference.

Table 7 - Silence Energy Statistics

    Time segment        mean (m)   deviation (d)   A = d/m
    a. 0   - 30.0s       11.56        3.34          0.29
    b. 1.5 -  3.5s        9.64        1.06          0.11
    c. 5.0 -  6.0s       10.07        0.97          0.10

From Figures 7b and 7c, P[E > (1+2.23A)m] = P[E > 1.25m] = 0.015 and P[E > (1+1.99A)m] = P[E > 1.20m] = 0.021, suggesting the value of e-crit to be 1.25 or 1.20. For optimal performance e-crit should be between 1.25 and 1.35, as substantiated by experimentation. In this thesis we set e-crit to be 1.28.

In conclusion, we incorporate the following considerations in implementing the adaptive dual-threshold silence detector algorithm SD52.

(a) AVG is held constant in a look-ahead scheme of detecting speech-on with the minimum speech duration requirement.
(b) EON is activated only if the preceding 6 points are silence, otherwise EOFF is in effect.
(c) AVG is updated by a silence point only if E < 1.28*AVG, otherwise AVG remains unaltered.

[Figure 7 - Distributions of Silence Energy]

3.3 The SD52 Algorithm

    Input Parameters: MIN-SP, Z-SIL, EON, EOFF
    Input Data      : Feature Vectors E, Z of 8KHz speech, Test Domain D
    Output          : K1, K2, type, where [K1,K2] - interval, type = 1 (silence) or 2 (speech)

Main program (flowchart boxes, in order of appearance):

    Accept Parameters
    Initialization: K1=TYPE=1, AVG=-1
    CALL INTERVAL(K1,K2,TYPE)
    Output K1,K2,TYPE
    Branch on TYPE (TYPE=2 / TYPE=1)
    K1=K2+1
    TYPE=1: [K1,K2] intersects any point in D?
    (branch labels: n leads to TYPE=2; y leads to END)

"y" implies the input parameters are not D-admissible.

SUBROUTINE INTERVAL(K1,K2,TYPE)

    Input  : K1, TYPE
    Output : K1, K2, TYPE; K1, TYPE remain unchanged
    Data   : E and Z, the feature vectors for the speech sample
    Update : AVG, average silence energy if E < 1.28*AVG
    Comment: This subroutine, given the input, returns the largest interval [K1,K2] of the same TYPE.

Subroutine (flowchart boxes, in order of appearance):

    I=K1
    AVG=-1?
    E(I)<E-INIT?
    Start adaptive process for AVG
    Set appropriate threshold EON or EOFF
    E(I)<1.28*AVG?
    Update AVG
    Test change of context: if type=1, next MIN-SP pts speech? if type=2, point is silence?
    I=I+1
    I=end?
    K2=I-1
    RET

3.4 Determination Of Optimal Parameters

The set of possible parameters for SD52 and their useful ranges are tabulated below.

Table 8 - Parameter Ranges for SD52

    Parameter       Possible Range     Useful Range
    p1. Z-SIL       1 - 100            30 - 50
    p2. MIN-SP      10 - 1000ms        10 - 100ms
    p3. EON         1.0 - 100.0        1.4 - 4.0
    p4. EOFF        1.0 - 100.0        1.0 - 4.0

We see that even the space of useful parameters is larger than we can handle, keeping in mind that each point in the useful range gives rise to a silence detector algorithm. To bring order into this parameter space we want to further restrict our attention to a set of admissible parameters with respect to some well-chosen test domain of a test speech sample. Let $D_t$ be the test domain defined by all the test points in Table 9.
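The SD52 decision rules of Sections 3.2 and 3.3 and the admissibility test of Section 3.1 can be sketched in Python as follows. This is one possible reading of the procedure, not the original program: the names `sd52_labels`, `is_admissible` and `quiet_run` are hypothetical, MIN-SP is expressed in 10ms points, the default values are only illustrative picks from the ranges quoted in the text, and the exact order of operations in the flowchart is not reproduced.

```python
from collections import deque

SILENCE, SPEECH = 1, 2  # same codes as TYPE in Section 3.3


def sd52_labels(E, Z, z_sil=36, min_sp=4, eon=2.0, eoff=1.8,
                e_init=50.0, e_crit=1.28, quiet_run=6):
    """Label every 10 ms point SILENCE or SPEECH (illustrative sketch)."""
    labels, fifo, state = [], None, SPEECH
    for i in range(len(E)):
        if fifo is None:                      # adaptation not yet started
            if E[i] > e_init:
                labels.append(SPEECH)
            else:                             # first quiet point seeds AVG
                fifo = deque([E[i]] * 10, maxlen=10)
                state = SILENCE
                labels.append(SILENCE)
            continue
        avg = sum(fifo) / len(fifo)           # held fixed while looking ahead
        if state == SILENCE:
            recent = labels[-quiet_run:]
            settled = (len(recent) == quiet_run
                       and all(r == SILENCE for r in recent))
            factor = eon if settled else eoff  # EON only after 60 ms of silence
            # Speech-on needs min_sp consecutive points passing the test;
            # too few remaining points means no switch (an assumption).
            if i + min_sp <= len(E) and all(
                    E[k] > factor * avg or Z[k] > z_sil
                    for k in range(i, i + min_sp)):
                state = SPEECH
        elif E[i] <= eoff * avg and Z[i] <= z_sil:
            state = SILENCE
        labels.append(state)
        if state == SILENCE and E[i] < e_crit * avg:
            fifo.append(E[i])                 # guarded update of AVG
    return labels


def is_admissible(labels, test_domain):
    """True if every point of every (first, last) interval is SPEECH (1-based)."""
    return all(labels[k - 1] == SPEECH
               for first, last in test_domain
               for k in range(first, last + 1))
```

A parameter set would then be rejected as soon as `is_admissible` returns False for the chosen test domain, and among the surviving sets the one with the fewest SPEECH labels would be preferred.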
Dt contains many weak fricatives that any Dt-admissible parameter must include in its speech domain. It is reasonable to say that any Dt-admissible parameter would yield a silence detector algorithm which has good playback quality because it has included the more elusive detection points from the recorded speech sample.

Table 9 - A Test Domain

     1.  852 - 858     "ch" in "children"
     2.  891 - 895     "ren" in "children"
     3.  954 - 957     "th" in "the"
     4.  988 - 990     "ge" in "age"
     5.  1053 - 1057   "se" in "seven"
     6.  1150 - 1155   "s" in "years"
     7.  1287 - 1288   "ble" in "incapable"
     8.  1457 - 1458   "s" in "abstract"
     9.  1899 - 1903   "s" in "constant"
    10.  2486 - 2489   "ge" in "Piaget"
    11.  2829 - 2849   "ation" in "education"
    12.  2903 - 2906   "ch" in "child"

Table 10 shows the performance of SD52 on speech sample no. 1 for various Dt-admissible parameters. Tables 10a - 10f all show that speech content decreases as EON, EOFF, MIN-SP or Z-SIL increases. Table 10 also gives the first incremental parameter which is not admissible and displays the failed test points. The strategic test points in Table 10 define the extent of the parameter range where playback quality is retained. For instance, we see that in setting MIN-SP = 50ms (Table 10d), we fail to include as speech the "s" in "abstract" (1430-1489).
In this case "s" lasts for 40ms and is preceded and followed by one or more silence points. For speech sample no. 1 and test domain Dt as defined, a set of Dt-admissible parameters is given by

    Z-SIL       MIN-SP      EON             EOFF
    [34, 38]    [10, 40]    [1.00, 2.60]    [1.00, 1.96]

and all these parameters do produce good quality playback based on our subjective listening. Within this set of parameters the best algorithm in the sense of definition 3.1(d) is given by the extremum point, Z-SIL = 38, MIN-SP = 40, EON = 2.60, EOFF = 1.96 with corresponding speech content = 1388cs or 62.81%. This turns out to be the Dt-optimal parameter with respect to the test domain and the test speech sample.

Table 10 - Performance of SD52 for Various Admissible Parameters

Test Domain: [852 858] [891 895] [954 957] [988 990] [1053 1057] [1150 1155] [1287 1288] [1457 1458] [1899 1903] [2486 2489] [2829 2849] [2903 2906]

    Z-SIL   MIN-SP   EON    EOFF   SPEECH    %SP

    36      10ms     1.80   1.80   1561cs    70.57%
    36      10       1.90   1.80   1549      70.03
    36      10       2.00   1.80   1547      69.94
    36      10       2.10   1.80   1541      69.67
    36      10       2.20   1.80   1540      69.62
    36      10       2.30   1.80   1533      69.30
    36      10       2.40   1.80   1531      69.21
    36      10       2.50   1.80   1526      68.99
    36      10       2.60   1.80   1522      68.81
    36      10       2.70   1.80   1520      68.72

    36      40ms     1.80   1.80   1480cs    66.91%
    36      40       1.90   1.80   1477      66.77
    36      40       2.00   1.80   1473      66.59
    36      40       2.10   1.80   1471      66.50
    36      40       2.20   1.80   1470      66.46
    36      40       2.30   1.80   1466      66.27
    36      40       2.40   1.80   1466      66.27
    36      40       2.50   1.80   1462      66.09
    36      40       2.60   1.80   1457      65.87
    36      40       2.70   1.80   ** fail [1899 1903]

    36      10ms     2.40   1.40   1662cs    75.14%
    36      10       2.40   1.50   1631      73.73
    36      10       2.40   1.60   1586      71.70
    36      10       2.40   1.70   1562      70.61
    36      10       2.40   1.80   1531      69.21
    36      10       2.40   1.90   1492      67.45
    36      10       2.40   2.00   ** fail [2829 2849]

    36      10ms     2.40   1.80   1531cs    69.21%
    36      20       2.40   1.80   1502      67.90
    36      30       2.40   1.80   1478      66.82
    36      40       2.40   1.80   1466      66.27
    36      50       2.40   1.80   ** fail [1457 1458]

    30      40ms     2.40   1.80   1521cs    68.76%
    32      40       2.40   1.80   1489      67.31
    34      40       2.40   1.80   1482      67.00
    36      40       2.40   1.80   1466      66.27
    38      40       2.40   1.80   1457      65.87
    40      40       2.40   1.80   ** fail [988 990]

Speech sample no. 1 was recorded at a rather low level; the dynamic range for E is 0 - 500 and can be considered a low end for the recording level. Speech sample no. 1a, re-recorded from the same cassette-recorded speech, was at a somewhat high recording level with a dynamic range of 0 - 1700 and can be considered a high end for the recording level. By comparing the respective energy vectors we determined that speech begins at 7.29s for no. 1 and 2.08s for no. 1a, showing an offset of 5.21s.
Knowing this offset and examining the energy and zcr vectors due to speech sample no. 1a, the corresponding and equivalent test domain for no. 1a was obtained. Following the procedure as before, a Dt-optimal parameter set was derived and is shown in Table 11.

Table 11 - Optimal Parameters

    Sample   Silence Detector Algorithm         Speech       %sp      mean AVG
    no. 1    (SD52,38,40,2.60,1.96) optimal     1388/2210    62.81%   11.56
    no. 1    (SD52,36,40,2.60,1.96)             1398/2210    63.26%   11.56
    no. 1a   (SD52,36,40,3.02,2.57) optimal     1412/2230    63.32%   45.89

Note that in speech sample no. 1a the cassette recorder was playing about 1% slower than in sample no. 1. The tabulation shows that while the detector algorithm is adaptive to changes in background noise, the Dt-optimal EON and EOFF depend on the recording level. The optimal performance in terms of percentage speech is within 0.5%. For speech with an arbitrary recording level the optimal parameters should lie in the following ranges:

    Z-SIL  : [36, 38]
    MIN-SP : [40, 40]
    EON    : [2.60, 3.02]
    EOFF   : [1.96, 2.57]

The actual optimum point will be somewhat dependent on the recording level and the Signal-to-Noise Ratio.

4. SILENCE DETECTOR IMPLEMENTATION

We are interested in implementing our silence detector with a speech coding scheme. We use speech sample no. 1a and the silence detector algorithm SD52 with the optimal parameter set: Z-SIL = 36, MIN-SP = 40ms, EON = 3.02, EOFF = 2.57. As shown in Section 3.4 this algorithm is optimal for the 12-bit PCM speech sample with respect to the chosen test domain. It results in 63.32% speech and 36.68% silence from 22.3 sec. of speech, excluding the leading and trailing silence periods. The compression achieved by this silence detector in this 12-bit PCM coding scheme is 36.68%, not counting the overhead associated with silence interval encoding.

4.1 Silence-Speech Coder

The two major applications of a silence detector are speech storage and voice communication through a bandwidth-limited channel. The desired information in both cases is not simply the speech codes but silence-speech codes or silence-encoded speech codes (SS codes). The main considerations in encoding the silence intervals are the various aspects relating to the detection of silence intervals at the decoder, including complexity, reliability and adaptability.
Silence intervals can be encoded in one of the following methods:
(a) Run-length coding of speech and silence intervals,
(b) Unique (or special) code or header,
(c) Unique (or improbable) sequence of codes.

In (a) each interval of speech or silence is preceded by a header which indicates the length of the interval. The header code must be distinguishable from the message codes and can be coded as in (b) or (c). For (b) a silence interval is coded as a unique code followed by the code for interval length. We can use an improbable or an inconsequential code as the silence code and alter the improbable or inconsequential code when it occurs in the message. For a coding scheme that uses fewer than 4 bits per sample, we may have to append an extra bit for each sample for the purpose of distinguishing between the silence code and the message. In (c), a silence interval is signalled by a highly improbable sequence of codes of a specific length followed by the interval length. For example, a string of five "0000"s can be the silence code-word in a 4-bit ADPCM SS coding scheme with the next code or codes specifying the length, provided of course such a sequence never occurs in the speech context or has very negligible probability of occurring. In this thesis we use method (c) to encode silence because of its simplicity and flexibility.
By increasing the length of the silence code-word, which is a sequence of silence codes, the probability of the sequence occurring in a speech coding can be made to approach zero. The block diagrams for the silence-speech coder and decoder are given in Figure 8. The algorithm for the SS coder and decoder is straightforward if silence detection is done off-line, i.e. using a two-pass approach. While this approach rules out real-time applications there are advantages gained in software/hardware implementation and overall job scheduling when a main-frame computer is involved. Moreover for speech storage purposes, the real-time requirement is not critical and in a practical implementation such as the Voice Storage System for telephone messages [Nus], duplication is mandatorily carried out for the sake of reliability.

Figure 8 - SS Coder (block diagram; blocks shown: voice input as PCM, A-PCM, ADPCM or ADM codes, PCM converter (decoder), silence detector SD52, silence encoder, SS coder producing SS codes; silence decoder and speech decoder forming the SS decoder)

The basic ideas in our silence encoding are:
(a) silence intervals are marked by a unique sequence of silence codes of a specific length SEQ-L,
(b) the length of a silence interval follows this silence code-word and is encoded in the next SIL-L codes, and
(c) the end of message is encoded by a silence interval of zero length.

The algorithm is as follows. The output of silence detector SD52 is a series of silence intervals, thus the detector conceptually segments the incoming speech codes into speech and silence domains. Source codes are sent unaltered in the speech domain.
When a silence frame is encountered, a silence code-word consisting of SEQ-L silence codes is sent, followed by the interval length code-word consisting of SIL-L codes. All the source codes in this silence interval are discarded. This silence-speech coding continues until the end of message, when the silence code-word corresponding to the zero-length silence interval is sent.

At the decoder, a non-silence code is deemed to be a speech code and is decoded and released for output. When a silence code is detected it is held in a buffer and a test for the chosen silence code is carried out for the next (SEQ-L - 1) codes. If this test is negative, all the silence codes encountered thus far are decoded and released for output. If the test is positive, the next SIL-L codes are interpreted as the interval length, and a train of silence as long as the specified length is released for output.

In practice, the sequence of silence codes, i.e. the silence code-word, may be preceded by a silence code that occurs in the speech domain, resulting in confusion at the decoder. Using the 4-bit ADPCM coding scheme as an example, when we designate "0000" as the silence code and a string of five such codes as the silence code-word, it may happen that the preceding speech segment ends with the speech code "0000". To protect against this concatenation, a non-silence code is added to the chosen sequence of silence codes, resulting for this example in a sequence that may look like: 0001,0000,0000,0000,0000,0000.
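The coding and decoding steps just described can be sketched as below, using the 4-bit example values from the text (silence code 0000, guard code 0001, SEQ-L = 5). This is only an illustrative sketch under several assumptions that are not in the thesis: the segmented input is a list of ('speech', codes) and ('silence', length) pairs, the interval length is packed most significant code first into SIL-L = 2 codes, lengths are clamped to the largest value that fits, and the decoder drops the guard code (the "additional logic" is assumed here rather than specified). It also assumes the speech codes never contain SEQ-L consecutive silence codes.

```python
SIL = 0b0000    # silence code (4-bit example from the text)
DUMMY = 0b0001  # non-silence guard code sent before each silence code-word
SEQ_L = 5       # silence codes per silence code-word
SIL_L = 2       # codes used for the interval length (assumed value)
NBITS = 4       # bits per code


def _length_codes(n):
    """Split n into SIL_L codes of NBITS bits, most significant first."""
    mask = (1 << NBITS) - 1
    return [(n >> (NBITS * i)) & mask for i in reversed(range(SIL_L))]


def _silence_word(n):
    return [DUMMY] + [SIL] * SEQ_L + _length_codes(n)


def ss_encode(segments):
    """segments: list of ('speech', [codes]) or ('silence', length_in_units)."""
    max_len = (1 << (NBITS * SIL_L)) - 1
    out = []
    for kind, payload in segments:
        if kind == 'speech':
            out.extend(payload)                      # sent unaltered
        else:
            out.extend(_silence_word(max(1, min(payload, max_len))))
    out.extend(_silence_word(0))                     # zero length = end of message
    return out


def ss_decode(codes):
    """Return speech codes and ('silence', length) markers in order."""
    out, i, word = [], 0, [SIL] * SEQ_L
    while i < len(codes):
        if codes[i:i + SEQ_L] == word:
            length = 0
            for c in codes[i + SEQ_L:i + SEQ_L + SIL_L]:
                length = (length << NBITS) | c
            if out and out[-1] == DUMMY:
                out.pop()                            # drop the guard code
            if length == 0:
                break                                # end of message
            out.append(('silence', length))
            i += SEQ_L + SIL_L
        else:
            out.append(codes[i])                     # ordinary speech code
            i += 1
    return out
```

Comparing a window of SEQ-L codes at each position is equivalent to the buffer-and-test procedure in the text: isolated silence codes fail the window test and are released as speech.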
In the absence of additional logic at the decoder, this additional dummy code will be perceived as a message code. This poses no performance problem for PCM or log-PCM at all. It may cause some loss of adaptability in an adaptive scheme, but as we shall see later the major problem here is the deletion of all the codes in the silence interval. In any specific SS coding scheme, a dummy non-silence code, a silence code, the length of the silence code-word (SEQ-L) and the number of codes (SIL-L) required to encode the interval length have to be determined so that the entire sequence of this silence encoding would not occur as part of a speech coding.

4.2 N-bit PCM Implementation

A mid-riser uniform quantizer N-bit PCM encoder is considered here. For N bits per sample, the number of steps is 2^N and the step size Δ = 4096/2^N, 4096 being the range for a 12-bit A/D converter. If m_i(k) is the incoming speech waveform sample the encoder is described by:

\[
c(k) =
\begin{cases}
\lfloor m_i(k)/\Delta \rfloor, & m_i(k) \ge 0 \\
\lfloor -m_i(k)/\Delta \rfloor + 2^{N-1}, & m_i(k) < 0
\end{cases}
\]

The decoder generates m_o(k), the speech output sample, as follows:

\[
m_o(k) =
\begin{cases}
c(k)\,\Delta + \Delta/2, & c(k) < 2^{N-1} \\
-\{c(k) - 2^{N-1}\}\,\Delta - \Delta/2, & c(k) \ge 2^{N-1}
\end{cases}
\]

For N = 3, Δ = 512 and the encoding and decoding steps are depicted in Figure 9.
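The encoder and decoder rules for the mid-riser quantizer, with step size Δ = 4096/2^N, can be checked with a few lines of code. This is a minimal sketch assuming integer input samples in the 12-bit range -2048 to 2047; the function names are illustrative, and the clamp to the outermost step (needed for the single input -2048) is an assumption not stated in the text.

```python
def pcm_encode(m, n_bits):
    """Mid-riser uniform quantizer: map a 12-bit sample m to an n_bits code."""
    delta = 4096 // (1 << n_bits)        # step size
    half = 1 << (n_bits - 1)             # 2^(N-1), first negative code
    if m >= 0:
        return min(m // delta, half - 1)
    # Negative samples use codes 2^(N-1) .. 2^N - 1; clamp is an assumption.
    return min((-m) // delta, half - 1) + half


def pcm_decode(c, n_bits):
    """Reconstruct the mid-step output level for code c."""
    delta = 4096 / (1 << n_bits)
    half = 1 << (n_bits - 1)
    if c < half:
        return c * delta + delta / 2
    return -(c - half) * delta - delta / 2
```

For N = 3 this gives Δ = 512, so an input of 1500 falls in code 010 and is reconstructed as 1280, while -700 falls in code 101 and is reconstructed as -768.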
Figure 9 - 3-Bit Mid-Riser Uniform Quantizer (m_o(k) plotted against m_i(k) over -2048 to 2048, with codewords 000 to 011 on the positive steps and 100 to 111 on the negative steps)

In choosing a suitable silence code, it is tempting to use the zero code corresponding to the first positive step range. However due to the nature of PCM and normal speech with intermittent silences, the zero code will occur most frequently in this coding scheme. Thus, to designate a sequence of zero codes as the silence code-word, one has to use an exceedingly long sequence in order to avoid the problem of mis-interpretation. As an illustration, we use speech sample no. 1a encoded with N-bit PCM and determine the number of occurrences of silence code-words of various lengths. The results appear in Table 12a (for encoded speech without silence deletion) and Table 12b (for encoded speech with optimal silence deletion). From the tables, which represent the two extremes of silence detection, the sequence length needed for ensuring non-occurrence of the same sequence in the SS codes is dependent on the degree of silence detection. Such dependency renders the task of sequence length assignment difficult. Moreover for speech with a low recording level such as speech sample no. 1 there may not be a reliable sequence of zero codes of economical sequence length. Hence the zero code should not be used as a silence code.

A better choice of silence code in an N-bit PCM speech coding scheme would be the code corresponding to the positive mid-range, in this case 2^(N-2). Because of the nature of speech, the single occurrence of this code word will be no more frequent than any other and consecutive occurrences will be more unlikely. That silence detection does not affect the sequence length assignment in this case can be concluded from Tables 13a and 13b where the only difference is in the respective N = 5 columns.

Table 12 - Occurrence Frequencies of Zero Silence Code for PCM

Speech Sample no. 1a encoded with N-bit PCM
Code = 00..0 (N times)
n(i) = number of occurrences of string of i codes

a). Without silence deletion:

    N =       12     11     10      9      8      7      6      5
    n(1)     876   1593   2980   5278   8743  12312  11898   9528
    n(2)      13     31    107    401   1208   3003   4323   4005
    n(3)       0      0      4     32    188    911   2197   2642
    n(4)       0      0      0      3     42    287   1040   1541
    n(5)       0      0      0      0      7    102    587    945
    n(6)       0      0      0      0      0     40    336    573
    n(7)       0      0      0      0      0     10    198    397
    n(8)       0      0      0      0      0      3    126    421
    n(9)       0      0      0      0      0      2     79    306
    n(10)      0      0      0      0      0      0     31    218
    n(11)      0      0      0      0      0      0     27    115
    n(12)      0      0      0      0      0      0     20     63
    n(13)      0      0      0      0      0      0     11     51
    n(14)      0      0      0      0      0      0      6     54
    n(15)      0      0      0      0      0      0      8     54
    n(>15)     0      0      0      0      0      0      9    125

b). With optimal silence deletion:

    N =       12     11     10      9      8      7      6      5
    n(1)     157    285    515   1035   1903   3195   4523   4707
    n(2)       0      2     11     31    108    326    713    961
    n(3)       0      0      1      1      3     62    200    322
    n(4)       0      0      0      0      1      8     59    158
    n(5)       0      0      0      0      0      1     19     65
    n(6)       0      0      0      0      0      0      9     21
    n(7)       0      0      0      0      0      0      4     21
    n(8)       0      0      0      0      0      0      1     13
    n(9)       0      0      0      0      0      0      1      8
    n(10)      0      0      0      0      0      0      0      8
    n(11)      0      0      0      0      0      0      0      3
    n(12)      0      0      0      0      0      0      0      0
    n(>12)     0      0      0      0      0      0      0      0

Table 13 - Occurrence Frequencies of Mid-range Silence Code for PCM

Speech Sample no. 1a encoded with N-bit PCM
Code = mid-range code
n(i) = number of occurrences of string of i codes

a). Without silence deletion:

    N =       12     11     10      9      8      7      6      5
    code = '1024'  '512'  '256'  '128'   '64'   '32'   '16'    '8'
    n(1)      15     24     46     82    162    338    627   1053
    n(2)       0      0      0      0      5     11     39    105
    n(3)       0      0      0      0      0      0      0     12
    n(4)       0      0      0      0      0      0      0      1
    n(5)       0      0      0      0      0      0      0      0
    n(>5)      0      0      0      0      0      0      0      0

b). With optimal silence deletion:

    N =       12     11     10      9      8      7      6      5
    code = '1024'  '512'  '256'  '128'   '64'   '32'   '16'    '8'
    n(1)      15     24     46     82    162    338    627   1052
    n(2)       0      0      0      0      5     11     39    105
    n(3)       0      0      0      0      0      0      0     12
    n(4)       0      0      0      0      0      0      0      1
    n(5)       0      0      0      0      0      0      0      6
    n(>5)      0      0      0      0      0      0      0      0

Based on Table 13, it is now reasonable to designate a string of 5 mid-range codes as the silence code-word and have some assurance of reliability. Because the window frame used in the feature extraction is 10ms, the silence interval length is also quantized to 10ms. For N = 8 the maximum interval length that can be specified by one code is 2.56 sec., which is also used as the maximum silence period, since truncation of any longer duration to 2.56 sec. does not affect intelligibility.
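Counts of the kind reported as n(i) in Tables 12 and 13 can be gathered with a short routine such as the one below. This is a sketch only: it assumes that n(i) counts maximal runs of exactly i consecutive occurrences of the candidate silence code, which is one reading of "number of occurrences of string of i codes", and the function name is illustrative.

```python
from collections import Counter
from itertools import groupby


def run_length_histogram(codes, target):
    """Count maximal runs of `target` in `codes`, keyed by run length."""
    hist = Counter()
    for value, group in groupby(codes):
        if value == target:
            hist[sum(1 for _ in group)] += 1
    return hist
```

A candidate silence code is usable with a code-word of SEQ-L codes when the histogram is empty for every run length of SEQ-L or more, which is how the choice of 5 mid-range codes is justified above.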
We therefore use one code to specify the length of a silence interval for N = 8 to 12. For N = 5 to 7 silence intervals can be encoded in two codes. The results of using SEQ-L = 5 and SIL-L = 1 for N = 8 to 12 or SIL-L = 2 for N = 5 to 7 in encoding silence for N-bit PCM are tabulated in Table 14. The actual length of a complete silence encoding is 7 (for N = 8 to 12) or 8 (for N = 5 to 7) due to one extra non-silence dummy code needed for concatenation protection (Section 4.1).

Table 14 - Results of SS Coder Implementation for PCM

          Compression     Silence       Compression
          w/o silence     encoding      w silence       SS code
    N     encoding        overhead      encoding        bit-rate    Quality
    12    36.91%          0.28% ¹       36.63%          60835bps    good ²
    11    37.13           0.28          36.85           55572       good
    10    37.13           0.28          36.85           50520       good
     9    37.22           0.28          36.94           45403       good
     8    37.49           0.28          37.21           40186       good
     7    37.49           0.31          37.21           35162       good
     6    44.44           0.31          44.16           26803       ³
     5    50.58           0.31          50.30           19880       ⁴

¹ Overhead is calculated as follows: Total # of codes = (# of silence intervals + 1) x 7 = 70 x 7 = 490 codes. Overhead percentage = 490/(22.3 x 8000) = 0.00275.
² All utterances are clearly enunciated, speaker recognition possible. All test points in Table 9 are retained.
³ Quality is good in the above sense except for loss of "tant" in "constant".
⁴ A slight quality degradation and the following losses: "and" in "number and volume", "ject" in "object", "tant" in "constant".

These results were derived from one person's audio perception. For each speech coding scheme the original N-bit codes and the SS codes were decoded and played back repeatedly for more careful discrimination. All the wordings were listened for and any incomplete words or phrases were played back in isolation to positively identify the losses.

We conclude that the implementation of the silence detector in N-bit PCM coding for speech storage can lead to a saving of at least 35% of memory. The main sources of quality degradation are the reduction of bits per sample and the performance of the silence detector algorithm. Silence encoding itself, adding about 0.3% overhead, does not have any appreciable negative effect on the decoder playback quality. Although silence encoding introduces issues such as silence length quantization, duration truncation and zero-level silence playback, the first two are hardly perceivable by a casual listener while the last issue at worst causes some abruptness when there is rapid on-off switching. This abruptness is caused by the background hum during speech and absolute silence during silence intervals and can be compensated by playback of a typical background noise segment (see results of subjective testing done in Section 5.1).

4.3 N-bit A-Law PCM Implementation

Because speech signals invariably have a dynamic range of over 40dB, there is a need for non-uniform quantization. Logarithmic companding of the speech signal provides a solution for overcoming the large dynamic range and achieves a roughly constant SQNR over a wide range. Experiments conducted by Jayant [Jay1] have shown that 7-bit μ-law PCM can achieve the same toll-quality waveform representation of speech as that obtained by 11-bit PCM. Moreover, a single chip codec for implementing logarithmic companding is now available at low cost.

This non-uniform quantization, also called instantaneous companding, involves compression by a logarithmic function. The incoming speech signal x(n) is first compressed by

\[
y(n) = \ln|x(n)| \cdot \operatorname{sign}[x(n)] \tag{1}
\]

and then quantized uniformly for digital transmission. In analogue implementation, (1) is approximated by

\[
y = \frac{\ln(1 + \mu x)}{\ln(1 + \mu)} \qquad \{\mu\text{-law}\} \tag{2}
\]

or

\[
y =
\begin{cases}
\dfrac{A x}{1 + \ln A}, & 0 \le x \le A^{-1} \\[2ex]
\dfrac{1 + \ln A x}{1 + \ln A}, & A^{-1} \le x \le 1
\end{cases}
\qquad \{A\text{-law}\} \tag{3}
\]

where A and μ take typical values of 100 and 255. Digitally (2) and (3) are in turn approximated by 8 piecewise linear segments that closely match the curves for (2) and (3).
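The two analogue compression curves (2) and (3) can be evaluated directly. The sketch below assumes the input is normalized to the range -1 to 1 and extends both laws to negative inputs by odd symmetry, as in (1); the default values μ = 255 and A = 100 are the typical values quoted above, and the function names are illustrative.

```python
import math


def mu_law(x, mu=255.0):
    """Equation (2), extended to negative x by odd symmetry."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def a_law(x, a=100.0):
    """Equation (3): linear below 1/A, logarithmic above, odd-symmetric."""
    ax = abs(x)
    if ax < 1.0 / a:
        y = a * ax / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)
```

Both curves map full scale to full scale and boost small amplitudes, so a uniform quantizer applied to y spends proportionally more levels on low-level speech. The two branches of the A-law meet at x = 1/A, where both give 1/(1 + ln A).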
Figure 10 shows how the A-law is approximated by linear segments. Each of the 8 segments is coded with an 8-bit code as follows: 1 sign bit, 3 segment identification bits and 4 intra-segment quantization bits corresponding to 16 quantization levels. For μ-law this results in 16 segments and because the first positive and the first negative segments are collinear this approximation is also called the 15-segment μ-law approximation. As for A-law, the first 4 segments are collinear, resulting in the 13-segment A-law approximation.

Figure 10 - Piecewise Linear Approximation to the A-law

Table 15 gives the segmentation and encoding scheme for the above two linear approximations, assuming an input range of -2048 to 2047 (12-bit ADC) and 8 encoding bits. In coding directly from analogue signals the segment ranges shown in Table 15 should be read as fractions of the full positive range; for instance, 16 - 31 should be read as 16/2048 - 31/2048 of Vmax. Both the 15-segment and the 13-segment attain 12-bit resolution for small signals (less than 31/2048 Vmax) but have the equivalent of 6-bit precision for mid- to upper-range signals.
It is evident that the 15-segment μ-law gives better idle channel noise performance but suffers from a slightly poorer SQNR for mid- to upper-range signals due to larger step sizes in the outer segments. The 15-segment has been in use in the Bell System's D2 Channel Bank commercial telephone exchange network [HP,DMM].

Table 15 - The 13-Segment and 15-Segment Coding Schemes (μ = A = 255)

    sign   segment   segment range                step     # of     code
    bit    code      13-segment     15-segment    size     steps    precision
           000       0 - 15         0 - 8         1/0.5    16       12/13
           001       16 - 31        8 - 23        1        16       12
           010       32 - 63        24 - 55       2        16       11
           011       64 - 127       56 - 119      4        16       10
           100       128 - 255      120 - 247     8        16       9
           101       256 - 511      248 - 505     16       16       8
           110       512 - 1023     506 - 1015    32       16       7
           111       1024 - 2047    1016 - 2039   64       16       6

We propose here a modification of the 13-segment A-law approximation to be implemented in conjunction with our silence detector SD52. This modification, given in Table 16, results in a 9-segment A-law approximation. With 8 encoding bits it has at worst 7-bit precision and at best 11-bit precision as compared to 6-bit and 12-bit precision for the 13-segment.
Table 16 - The 9-Segment A-law Approximation

    sign   segment   segment         step     number      code
    bit    code      range           size     of steps    precision
           000       0 - 31          2        16          11
           001       32 - 63         2        16          11
           010       64 - 127        4        16          10
           011       128 - 255       8        16          9
           100       256 - 511       16       16          8
           101       512 - 1023      32       16          7
           110       1024 - 1535     32       16          7
           111       1536 - 2047     32       16          7

Because the eventual coding will be devoid of silence intervals, and the idle channel condition is therefore insignificant in this case, there is reason to believe that such a modification will result in a better SQNR.

Having chosen this particular 9-segment A-law coding scheme, we need to consider an appropriate silence code to be used in SS coding (see Section 4.1). As in the case of PCM, a mid-range silence code (1 110 0...0) is used and we have determined that a sequence of 8 such silence codes is sufficient to delineate silence with minimal danger of mis-interpretation. Since we can truncate silence intervals to 2.56 sec. or 256 units (1 unit = 10ms), we can encode the silence interval length in 8 bits or 2 codes for the case N = 4. We thus assign SIL-L = 2.

The results of using SEQ-L = 8 and SIL-L = 2 in SS coding with the 9-segment A-law PCM are tabulated in Table 17. They were obtained in the same manner as in the PCM implementation.
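The 9-segment layout of Table 16 translates into a small table-driven coder for the 8-bit case. The sketch below is illustrative only: the bit order (sign, then 3 segment bits, then 4 step bits) follows the description of the 8-bit code, but the sign-bit polarity (1 for non-negative samples), the mid-step reconstruction level and the function names are assumptions rather than details given in the text.

```python
# (segment start, step size) for segment codes 000 .. 111, from Table 16
SEGMENTS = [(0, 2), (32, 2), (64, 4), (128, 8),
            (256, 16), (512, 32), (1024, 32), (1536, 32)]


def encode9(sample):
    """Encode a 12-bit sample (-2047..2047) as sign | segment | step."""
    sign = 0 if sample < 0 else 1          # polarity is an assumption
    mag = min(abs(sample), 2047)
    for seg in range(7, -1, -1):           # find the highest segment reached
        start, step = SEGMENTS[seg]
        if mag >= start:
            return (sign << 7) | (seg << 4) | ((mag - start) // step)


def decode9(code):
    """Reconstruct the mid-step level of an 8-bit 9-segment code."""
    sign = 1 if code & 0x80 else -1
    start, step = SEGMENTS[(code >> 4) & 0x07]
    return sign * (start + (code & 0x0F) * step + step // 2)
```

With these conventions an input of 1024 maps to sign 1, segment 110, step 0000, and the reconstruction error never exceeds half of the largest step size, i.e. 16.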
The actual length of a complete silence encoding is 11 codes because one extra non-silence dummy code is needed for concatenation protection (Section 4.1). This A-law implementation agrees with the long established finding that a saving of 4 bits per sample over PCM coding can be obtained for the same quality of speech playback.

Table 17 - Results of SS Coder Implementation for A-law PCM

     Compression      Silence      Compression
     w/o silence      encoding     w silence      SS code
N    encoding (%)     overhead     encoding       bit-rate    Quality
8    36.86            0.44         36.42          40691       excellent (1)
7    37.17            0.44         36.73          35431       excellent (1)
6    36.68            0.44         36.24          30605       good (2)
5    36.68            0.44         36.24          25504       good (2)
4    37.26            0.41         36.82          20218       good (2)

(1) All the words are clearly enunciated, and the speaker can be recognized from his voice, more so for 8-bit than for 7-bit. In each case, many of the acoustic qualities are retained.
(2) Good in the sense that all the words are distinctly audible, including all the weak fricatives, though some fuzziness begins to set in as N decreases. The speaker may not be readily recognizable. For N = 4 speaker identification through listening becomes a problem.

4.4 ADPCM Implementation

The numerous ADPCM schemes that have been proposed involve variations in the following components:
(a). Predictor filter and its parameters,
(b). Adaptation (if any) of predictor parameters,
(c). Step size adaptation and step size multipliers.

Of the above, (c) is the most significant, and indeed much effort has gone into various adaptation strategies and their optimization. Jayant [Jay2] recognized that the multipliers are not crucial, but the adaptation should adhere to the philosophy that when slope overload occurs the step size should be increased by a factor of 2.0 or 3.0, whereas in the granular region the step size need only be decreased by a factor of 0.75 to 0.90. The rationale here is that quantization error during overload can be very detrimental to the tracking performance of the adaptive differential scheme, whereas granular error, whose magnitude is at most half of the step size, can only cause incidental or transient noise.

We detail here an ADPCM scheme (Figure 11) which has a first-order feedback predictor filter $P(z) = 1/(1 - a z^{-1})$ with a fixed predictor coefficient a. The step size adaptation is by means of a step size ladder and a stepping function.
The step size ladder has a fixed number of rungs, given in Table 18, depending on N, the number of bits used in the ADPCM scheme. The maximum step size is chosen so that at the highest quantization level the quantized value spans half of the positive input range (in the case of 12-bit digital representation the maximum quantized value is 0.5 x 2048 = 1024). Because we are dealing with the difference waveform, slope overload will occur less frequently than otherwise, and when it occurs the adaptation strategy ensures that it will not continue for long.

Figure 11 - ADPCM Block Diagram

Table 18 - Step Size Ladder Specification

                   step size             # of quantization
N    # of rungs    minimum    maximum    levels
3    25            1          256        4
4    22            1          128        8
5    19            1          64         16
6    16            1          32         32
7    13            1          16         64
8    10            1          8          128

The rungs between the bottom and top rungs are spaced equally and the step size doubles every 3 rungs. In fact the step size Z(i) corresponding to rung i is given by $Z(i) = 2^{i/3}$. For N = 8 the step size ladder is

rung no.     0      1      2      3      4      5      6      7      8      9
step size    1.00   1.26   1.59   2.00   2.52   3.17   4.00   5.04   6.36   8.00

A stepping strategy determines whether the next step size should increase or decrease, thus moving up or down the ladder, based solely on the present quantization level. If the quantization level is at the highest value then there is probable slope overload, which can be made less likely to occur at the next instant if the step size is increased by stepping up 3 rungs (equivalent to doubling the step size). On the other hand, if the quantization level is at the lowest value there is a likelihood of granular noise, which can be made less severe subsequently if the step size is decreased by stepping down one rung (equivalent to a factor of 0.8). The stepping functions we used for various N are depicted in Figure 12.

The predictor coefficient a (0 < a < 1) relates to the significance we place on the past sample in predicting the next sample. For input signals of high correlation we want a to be as close to 1 as possible. Since the sample-to-sample correlation is decreased by considering the difference waveform instead of the original, we choose a = 0.8, which also ensures a fast decay of any incidental error such as a transmission error or an error due to lost samples.
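The ladder and the two extreme stepping rules described above can be sketched as follows. The rung-count expression `34 - 3 * n_bits` is only a fit to the six rows of Table 18, and holding the rung for intermediate quantization levels is an assumption, since the actual stepping functions are given graphically in Figure 12. The function names are hypothetical.

```python
def ladder(n_bits):
    """Step size ladder Z(i) = 2**(i/3) for an N-bit coder.

    The rung count 34 - 3*N is a fit to Table 18 (25, 22, ..., 10 rungs
    for N = 3..8), not a formula stated in the text.
    """
    rungs = 34 - 3 * n_bits
    return [2 ** (i / 3) for i in range(rungs)]


def next_rung(rung, level, n_bits):
    """Choose the next rung from the magnitude level just coded (0 = lowest)."""
    top_rung = 34 - 3 * n_bits - 1
    top_level = 2 ** (n_bits - 1) - 1
    if level == top_level:
        # Probable slope overload: step up 3 rungs, i.e. double the step size.
        return min(rung + 3, top_rung)
    if level == 0:
        # Probable granular noise: step down 1 rung, a factor of about 0.8.
        return max(rung - 1, 0)
    # Assumption: intermediate levels leave the step size unchanged.
    return rung
```

With this ladder the top rung times the number of quantization levels equals 1024 for every N in Table 18, which is the half-range condition stated for the maximum step size.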
As in other implementations, we need to encode silence intervals derived from our silence detector SD52 in a way that avoids misinterpretation. The zero code '00..0' (N times) is chosen as the silence code because it corresponds to the case when the difference between the present and previous samples is negative and the quantization level is lowest. A sequence of such zero codes cannot continue for long, because the step size is decreased at each occurrence until either the quantization level is no longer the lowest or the difference becomes positive. Our experiments show that a sequence of 7 zero codes is adequate to encode silence. Using SEQ-L = 7 and SIL-L = 3, the complete silence encoding consists of a dummy concatenation protection code ('00..01'), 7 silence codes ('00..0') and 3 N-bit interval length codes.

Figure 12 - Stepping Functions

Because the adaptation is based upon the last sample, there is a loss of continuity in the adaptation process at the outset of any speech interval. Figure 13 illustrates a typical situation. [A] shows the original waveform in solid lines and the reconstructed waveform from the SS decoder in dotted lines. [B] gives the 4-bit ADPCM codes and [C] the SS codes. At time = 2.89 s, just after sample number 23040, the predictor value p and the step size take the values shown. At time = 3.01 s, just before sample number 24001, the respective values are also shown, assuming no silence deletion takes place. The equal step sizes are purely accidental.

Figure 13 - ADPCM Adaptation Discontinuity
(Annotations: sample no. 23040, time = 2.89 s, p = 42.59, step size = 20.16; sample no. 24001, time = 3.01 s, p = 168.67, step size = 20.16; values assigned at the start of the speech interval, p = 0.0, step size = 32)

However, due to silence deletion, at the outset of any speech interval p and the step size are indeterminate as far as the decoder is concerned. When we make the conservative decision to assign p = 0.0 and step size = 32.0, it is as though some samples have been lost, resulting in prediction error and step size error. Therefore the beginning of each speech interval corresponds to an error recovery process which can take as long as 200 samples before the errors dissipate to zero. Fortunately our experiments indicate that it typically takes about 10 to 20 samples to recover, and in the worst case, when it takes 200 samples or 25 ms (at a sampling rate of 8 kHz), we could not detect any ill effect.

Of the two errors, step size error can be chronic and the recovery depends on the statistical distribution of the step size. If the initial step size at the outset of the speech interval is not well chosen, serious muting or waveform distortion can result (see Appendix 1).
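The prediction error behaves more simply than the step size error: with a fixed first-order predictor, an error in the decoder's predictor value is multiplied by a at every sample. A minimal sketch of that geometric decay is given below. It assumes a worst-case initial error of 2048 (the full positive range of the 12-bit input) and treats the error as gone once it falls below one unit; both figures are illustrative assumptions, not values taken from Appendix 1.

```python
def samples_to_decay(a, initial_error, threshold=1.0):
    """Count samples until an error scaled by `a` each sample drops below threshold."""
    n = 0
    err = float(initial_error)
    while err >= threshold:
        err *= a
        n += 1
    return n


# Assumed worst case: a full-scale error of 2048 with a = 0.8.
worst_case = samples_to_decay(0.8, 2048)
# Duration in milliseconds at the 8 kHz sampling rate used in this chapter.
duration_ms = worst_case / 8000 * 1000
```

Under these assumptions the count comes to 35 samples, a little over 4 ms at 8 kHz.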
Prediction error is more deterministic, and the speed of decay of this error depends on the prediction coefficient a. For a = 0.8 it can be shown that the error dies out after 35 samples, or about 4 ms, in the worst case (Appendix 1).

Table 19 tabulates the results of implementing our N-bit ADPCM speech coder in conjunction with our silence detector SD52. With better than 35% compression, SD52 is successful in producing good quality speech at less than 16 Kbits per second.

Table 19 - Results of SS Coder Implementation for ADPCM

     Compression      Silence      Compression
     w/o silence      encoding     w silence      SS code
N    encoding (%)     overhead     encoding       bit-rate    Quality
8    36.14            0.43         35.71          41146       excellent (1)
7    36.32            0.43         35.89          35902       excellent (1)
6    35.87            0.43         35.44          30989       excellent (1)
5    36.10            0.43         35.67          25732       excellent (1)
4    36.37            0.43         35.94          20499       good (2)
3    36.54            0.43         36.11          15334       good (2)

(1) All the words are clearly enunciated. The speaker can be recognized from his voice, more so for larger N. The quality can be compared to the original 12-bit PCM recording.
(2) All the words are distinctly audible, including all the weak fricatives, though some fuzziness begins to set in as N decreases. The speaker may not be readily recognizable.

4.5 ADM Implementation

Among the various ADM schemes, Continuously Variable Slope Delta Modulation (CVSD) is chosen for our implementation because it is the least complex, has been the most widely used, and is commercially available in single-chip IC form. More importantly, it can be made less sensitive to predictor and step size errors. The tradeoff, however, is that the SQNR becomes dependent on the input signal level.

The CVSD scheme is given in Figure 14, in which the step size logic is given by:

$$z(n) = \begin{cases} b\,z(n-1) + d_2 & \text{if } c(n) = c(n-1) = c(n-2) \\ b\,z(n-1) + d_1 & \text{otherwise} \end{cases}$$

where $0 < b < 1$ and $d_1 \ll d_2$. The idea is that during slope overload, as determined by 3 consecutive "0"s or "1"s, the step size is increased, and conversely the step size is decreased during granularity.

Figure 14 - CVSD Modulation

Computationally the algorithms are as follows, where $\hat{x}(n)$ denotes the reconstructed sample and $d_n$ is $d_1$ or $d_2$ as selected by the step size logic:

Encoder
1. $s(n) = \mathrm{sign}\{x(n) - p(n-1)\}$
2. $z(n) = b\,z(n-1) + d_n$
3. $\hat{x}(n) = s(n)\,z(n) + p(n-1)$
4. $p(n) = a\,\hat{x}(n)$

Decoder
1. obtain c(n), s(n)
2. $z(n) = b\,z(n-1) + d_n$
3. $\hat{x}(n) = s(n)\,z(n) + p(n-1)$
4. $p(n) = a\,\hat{x}(n)$

Note that the step size used is the current, updated step size based on the current sign bit. This is unlike the case for ADPCM, where the updated step size cannot be used for the adaptive process because at the encoder the quantization level has to be based on the previous step size.

The function of the prediction filter coefficient a is the same as in ADPCM, namely to control the speed of prediction error decay and the speed of adaptation. For closely correlated samples, better SQNR performance is obtained by setting a close to 1. Since ADM samples have first-order correlation tending to 1 as the over-sampling factor increases, we choose a = 0.95, which ensures good SQNR performance and fast adaptation to waveform variations at the expense of a slower rate of predictor error decay.

The step size logic is specified by b, d1 and d2. Using the fact that during slope overload or granularity the sequence z(k) is monotonic and bounded, we can show that $z_{min} = d_1/(1-b)$ and $z_{max} = d_2/(1-b)$. Hence d1 and d2 are determined by the zmin and zmax specifications, and conversely. The choices of b, d1 and d2 control the speed of step size change and the dynamic range.
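The four encoder and decoder steps listed above can be sketched as a small codec. The parameter values are those quoted for our CVSD coder in this section; the initial state (p = 0, z = zmin, no bit history), the mapping of bit 1 to a positive sign, and the treatment of a zero difference as positive are assumptions for illustration, and the class and function names are hypothetical.

```python
A, B, D1, D2 = 0.95, 0.7, 0.6, 115.2   # a, b, d1, d2 as quoted in this section


class CvsdState:
    """State shared by encoder and decoder; start values are assumptions."""

    def __init__(self):
        self.p = 0.0               # predictor value p(n-1)
        self.z = D1 / (1 - B)      # step size, started at zmin
        self.bits = []             # the two previous code bits

    def update(self, bit):
        """Steps 2 to 4: adapt the step size, reconstruct, then predict."""
        run = len(self.bits) == 2 and self.bits[0] == self.bits[1] == bit
        self.z = B * self.z + (D2 if run else D1)
        x_hat = (1 if bit else -1) * self.z + self.p
        self.p = A * x_hat
        self.bits = (self.bits + [bit])[-2:]
        return x_hat


def encode(samples):
    """Step 1 for each sample: code the sign of x(n) - p(n-1) as one bit."""
    state, out = CvsdState(), []
    for x in samples:
        bit = 1 if x >= state.p else 0   # assumption: zero difference counts as positive
        out.append(bit)
        state.update(bit)
    return out


def decode(bits):
    """Rebuild the waveform by running the same state update on the received bits."""
    state = CvsdState()
    return [state.update(b) for b in bits]
```

Because the encoder and decoder run the same update on the same bits, their step sizes and predictor values stay in step as long as no bits are lost. In this sketch an all-zero input produces an alternating bit pattern with the step size resting at zmin, while a sustained run of equal bits drives the step size towards zmax.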
If b is close to 1, the rate of step size change is slow (since d2 is small), implying slow, syllabic adaptation with channel errors less pronounced. It also means less granular noise (better idle channel performance) and slow recovery from step size errors (see Appendix 2). For b considerably less than 1 the converse is true. Since zmax specifies d2, it also controls the speed of step size adaptation. A high zmax gives better waveform tracking during high signal levels but results in a poorer SQNR at low input levels. Figure 15 shows the tracking performance for a few choices of b and zmax.

Figure 15 - Tracking Performance of CVSD Modulation
(Panels: input waveform, and reconstructions for 1. zmax = 128, b = 0.8; 2. zmax = 256, b = 0.8; 3. zmax = 256, b = 0.7; 4. zmax = 512, b = 0.7)

Jayant and Rosenberg [JR] have concluded that slope overload noise is less objectionable than granular noise. The reason may well be that, while the reconstructed waveform fails to track the input waveform during slope overload (see Figure 15-1), the overall perception as far as the listener is concerned is that of muting rather than of distortion, as is the case for granular noise. In other words, intelligibility is preserved during slope overload but may not be preserved during granularity.
We implemented the SS coder with the CVSD modulation scheme using bit rates of 16 Kbps, 24 Kbps and 32 Kbps. The parameters for the CVSD coder were:

a = 0.95    d1 = 0.6      zmin = 2
b = 0.7     d2 = 115.2    zmax = 384

for an input range of -2048 to 2047. Since speech sample no. 1a was sampled at 8 kHz, to obtain a higher sampling rate we over-sampled by interpolation, using a third-order polynomial derived from one backward sample and two forward samples, as shown in Figure 16 for an over-sampling factor of 3.

Figure 16 - Oversampling Process
(Amplitude against time, showing original samples and new samples)

After the equivalent of speech sample no. 1a at the desired sampling rate is produced by the above process, the CVSD coding is performed, resulting in a compressed speech code at a bit rate equal to the over-sampling rate. The next processing block in the SS coder (see Figure 8) is the PCM converter, for the purpose of silence detection. The PCM converter here consists of a CVSD decoder and a digital low-pass filter, which is required because quantization noise at frequencies higher than 4 kHz is introduced in the oversampling process. The rest of the implementation follows as shown in Figure 8.
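The oversampling step can be sketched as cubic interpolation through four neighbouring samples. The text only states that a third-order polynomial is derived from one backward and two forward samples; the Lagrange form used here, the clamping of indices at the two ends of the record, and the function name are assumptions for illustration.

```python
def oversample(x, factor=3):
    """Insert factor-1 new samples between neighbours of a non-empty list x.

    Each new sample is read off the cubic through x[k-1], x[k], x[k+1], x[k+2]
    (Lagrange form, an assumption); indices are clamped at the ends.
    """
    last = len(x) - 1

    def at(i):
        return x[min(max(i, 0), last)]

    out = []
    for k in range(last):
        xm, x0, x1, x2 = at(k - 1), at(k), at(k + 1), at(k + 2)
        out.append(x0)                       # original sample is kept as is
        for j in range(1, factor):
            t = j / factor                   # position between x[k] and x[k+1]
            out.append(
                -t * (t - 1) * (t - 2) / 6 * xm
                + (t + 1) * (t - 1) * (t - 2) / 2 * x0
                - (t + 1) * t * (t - 2) / 2 * x1
                + (t + 1) * t * (t - 1) / 6 * x2
            )
    out.append(x[last])
    return out
```

For an 8 kHz record, `oversample(samples, 3)` would give the 24 kHz equivalent; factors of 2 and 4 correspond to the 16 Kbps and 32 Kbps cases.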
Using the zero silence code ('0'), a sequence of 15 such codes is enough to delineate silence. In fact Appendix 3 shows that the theoretical maximum length for our CVSD scheme is 16. Table 20 is obtained using SEQ-L = 15 and SIL-L = 8.

Table 20 - Results of SS Coder Implementation for CVSD

            Compression     Silence      Compression
CVSD        w/o silence     encoding     w silence     SS code                  Speech
bit rate    encoding        overhead     encoding      bit rate    Quality      losses (2)
32K         41.31%          0.23%        41.08%        18854       good (1)     a, b, c
24K         42.29           0.33         41.96         13930       good         b, c
16K         43.27           0.49         42.78         9155        good         b, c

(1) Except for the losses noted, the perception is pleasing, smooth and certainly intelligible.
(2) Speech losses among the test points in the test domain (Table 9) are:
    a. "s" in "years"
    b. "s" in "abstract"
    c. "stant" in "constant"

The results bring a few points to light.

a). A speech compression of about 42% is obtained for CVSD modulation, as compared to the original optimal deletion of 37%, N-bit PCM 37%, A-law PCM 36% and ADPCM 36%. This proves to be excessive.

b). At 9.2 Kbits per second, the utterance is clear and can be considered good except for the two instances of speech loss.
These speech losses are due to the silence detector operating at a level which is optimal for the 12-bit PCM speech sample but proves to be inappropriate for CVSD modulated speech.

c). The CVSD coding and decoding process introduces higher noise levels than the original 12-bit PCM speech. For instance, the average energy in the silence domain for 12-bit PCM is 50.22, whereas that for 32 Kbps CVSD decoded PCM is 57.54. This higher noise level causes the loss of "tant" in "constant" due to a higher energy threshold set by the adaptive silence detector.

d). This higher noise level, which is the cost of obtaining better tracking performance at high input levels, also has the effect of lowering the zero-crossing rate for low input levels at high frequencies. This accounts for the loss of fricatives shown in the Table.

e). The poorer SQNR at low input levels is unfortunate, for one expects that with silence deletion the idle channel condition would be insignificant. But this expectation is only valid at the output of the SS decoder, at which silence intervals are played back at zero level. The problem is at the front end of the silence detector, which utilises energy and zero-crossing rate at low input levels.
Better SS coder performance can be obtained by changing the parameters at either the CVSD coding level or the silence detection level. At the CVSD level we want less distortion at low input levels, permitting more instances of slope overload than before. Setting b = 0.8 and zmax = 256 will achieve the desired result. At the silence detector level we can set the silence detector parameters at a sub-optimal level.

5. EVALUATION AND CONCLUSIONS

5.1 Subjective Evaluation Of Silence Detector SD52

The quality of a reconstructed speech output from silence encoded speech codes (SS codes) is primarily dependent on the coding scheme used and the effects of the silence detector. One of the major effects of silence deletion during speech storage, and its subsequent insertion as absolute silence during speech playback, is speech clipping. Another is the abrupt transition in background noise from the levels during speech presence to the artificial noise-free silence inserted during playback. Other effects, such as quantization and truncation of silence intervals to within 2.5 sec., and the temporary loss of the adaptive process at the outset of a speech segment, are hardly noticeable even by an attentive listener.
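The effect of the suggested CVSD setting (b = 0.8, zmax = 256) on the step size logic of Section 4.5 can be worked out from the relations zmin = d1/(1-b) and zmax = d2/(1-b). The sketch below assumes that zmin = 2 is kept for the new setting, which the text does not state, and it counts only consecutive overload steps in which d2 is applied. The function names are hypothetical.

```python
def step_increments(b, zmin, zmax):
    """Invert zmin = d1/(1-b) and zmax = d2/(1-b) to obtain (d1, d2)."""
    return zmin * (1 - b), zmax * (1 - b)


def overload_steps_to_reach(b, zmin, zmax, fraction=0.9):
    """Consecutive overload steps for z to climb from zmin to fraction*zmax."""
    _, d2 = step_increments(b, zmin, zmax)
    z, n = zmin, 0
    while z < fraction * zmax:
        z = b * z + d2
        n += 1
    return n


# Parameters used in Section 4.5, and the suggested setting (zmin = 2 assumed).
used = step_increments(0.7, 2, 384)        # approximately (0.6, 115.2)
proposed = step_increments(0.8, 2, 256)    # approximately (0.4, 51.2)
```

Under these assumptions the step size needs 7 overload steps to reach 90% of zmax with the original parameters and 11 with the suggested ones, so the suggested setting adapts more slowly and to a lower ceiling, which is the intended trade of tracking ability for less noise at low input levels.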
We attempt to evaluate the effects of our silence detector on the reconstructed SS codes for various coding schemes and recording environments. Three different speech passages, each of about 30 seconds duration, were recorded on a cassette. The first was from a prepared lecture tape. The second was from a radio newscast, and the third was from a telephone conversation, recorded from the radio, between an American male and a male speaker with a heavy foreign accent. The first passage was recorded with a mono hand-held Sony cassette recorder without the Dolby system, while the second and third passages were recorded off an Akai stereo component system with a tape deck having the Dolby system. Using our system set-up shown in Figure 1, three corresponding 12-bit PCM speech samples, each a 10 sec. excerpt, were obtained and numbered 1, 2 and 3.

Three coding schemes were used, each with three different coding levels specified by the number of bits per sample N, or the bit-rate, as indicated in Table 21. In the silence-edited versions, silence intervals were played back as absolute silence (D/A output level of 0 Volt). In addition, 3 combinations from Table 21 were randomly chosen and noise-edited by playing back a copy of their background noise during the silence intervals.
Table 21 - Silence Detector Evaluation Combinations

Speech Sample                   Coding Scheme    Level              Version
1. High noise Lecture Tape      2. A-law PCM     N = 4, 6, 8        Unedited
2. Low noise Newscast           3. ADPCM         N = 3, 5, 7        or
3. Telephone Conversation       4. ADM/CVSD      16K, 24K, 32K      Silence-edited

The test consisted of the 54 possible combinations in Table 21, three noise-edited samples and four duplicate samples for testing consistency. Thus there were 61 test samples plus six original 12-bit PCM samples for orientation purposes. A cassette recording was arranged as shown in Table 22. The test sample number was announced before each 10-second recording, which was followed by a slight pause. All the silence-edited and noise-edited samples were obtained with our silence detector SD52 with fixed parameters: Z-SIL = 36, MIN-SP = 40, EON = 3.0, EOFF = 2.5.

Table 22 - Cassette Recording Ordering

Sample no.   SIDE 1               Sample no.   SIDE 2
1            Speech Sample 1      34           Speech Sample 1
2            Speech Sample 2      35           Speech Sample 2
3            Speech Sample 3      36           Speech Sample 3
4 ...        Random Ordering      37 ...       Random Ordering
22           Duplicate of 37      55           Duplicate of 4
...          Random Ordering      ...          Random Ordering
33           Duplicate of 4       67           Duplicate of 37

A total of 10 subjects participated in the unsupervised evaluation. All except one were graduate students or faculty members in the Electrical Engineering Department at the University of British Columbia.
Each was given a Sony Walkman cassette player Mod. WM-4 with stereo headphones, a score sheet and the following printed instructions:

"The purpose of this subjective listening test is to assess the effect of silence deletion and the subsequent insertion on speech. Please state the degree of acceptability of test samples as recorded messages, on a scale of 1 to 5. 1 means that a sample is unacceptable and 5 denotes the highest degree of acceptance. Rate each sample on its own rather than by comparison with other samples. The test consists of two 10-minute segments recorded on sides 1 and 2 of the accompanying cassette. You are advised to take a 5-minute break between the two segments. There are 67, 10-second test samples recorded and numbered in sequence with a slight pause after each for recording your score. For your information and orientation, the first 3 samples of each side of cassette are original speech samples; the remainder being a random ordering of various processed speech samples".

Four of the ten subjects were asked to begin with side 2 instead of side 1.
The results in Table 23 indicated that while individual subjects displayed some inconsistencies with regard to the same speech samples, the group as a whole was consistent, with the average scores for the same samples differing by at most 0.5 on a scale of 1 to 5. That the subjects were reliable in spite of the tedious test repetitions can be concluded from the fact that of the 270 unedited and silence-edited pairings only 18 pairs (< 7%) are contra-consistent in the sense that the silence-edited version had a higher score than the unedited version.

Table 23 - Evaluation Consistency Results

  Sample  Sample             Subject no.
  no.     id.      1  2  3  4  5  6  7  8  9  10    Average  Variance
   1      1        4  4  5  4  5  3  4  4  2  3     3.8      0.9
  34      1        4  4  5  4  3  4  4  4  2  4     3.8      0.7
   2      2        5  4  5  4  5  4  4  5  3  3     4.2      0.7
  35      2        5  3  5  4  4  5  4  5  3  4     4.2      0.7
   3      3        3  2  4  3  4  2  3  2  2  1     2.6      0.9
  36      3        5  4  4  3  2  3  3  4  2  2     3.1      1.0
   4      324s     2  1  3  2  4  1  2  2  3  1     2.1      0.9
  33      324s     3  1  3  3  2  1  2  3  1  2     2.1      0.8
  55      324s     3  2  4  3  3  2  2  3  2  2     2.6      0.7
  22      237s     3  1  5  4  3  4  3  3  1  2     2.9      1.2
  37      237s     4  2  3  3  2  4  3  2  2  3     2.8      0.7
  67      237s     3  2  4  2  2  1  3  3  2  3     2.5      0.8

The average degree of acceptability dropped by 1.07 between the unedited samples and the silence-edited ones (Table 24a). This drop was judged to result from the background noise transitions rather than from speech clipping.
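
The consistency figures of Table 23 can be re-derived from the raw scores. The sketch below does so for the two original samples that were each played twice (sample nos. 1/34 and 2/35), and also evaluates the contra-consistent fraction of 18 out of 270 pairings. The names are illustrative. For these four rows, the tabulated "Variance" entries coincide with the population standard deviation of the ten scores, so that is the spread measure computed here.

```python
from statistics import mean, pstdev

# Scores of the ten subjects, copied from Table 23 (sample no. -> scores).
scores = {
    1:  [4, 4, 5, 4, 5, 3, 4, 4, 2, 3],
    34: [4, 4, 5, 4, 3, 4, 4, 4, 2, 4],
    2:  [5, 4, 5, 4, 5, 4, 4, 5, 3, 3],
    35: [5, 3, 5, 4, 4, 5, 4, 5, 3, 4],
}

# (average, population standard deviation), each rounded to one decimal.
summary = {no: (round(mean(s), 1), round(pstdev(s), 1))
           for no, s in scores.items()}

# 27 unedited/silence-edited pairs x 10 subjects = 270 pairings, 18 contra-consistent.
contra_fraction = 18 / (27 * 10)

for no, (avg, spread) in summary.items():
    print(no, avg, spread)
print(f"{contra_fraction:.1%}")
```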
The scores here can be regarded as a measure of naturalness and pleasantness. Among the three recording environments the negative effect of the silence detector was most pronounced in a noisy background, as in Sample no.1, and was least evident in a conversational recording, as in Sample no.3. Sample no.3 included numerous on-off speaker-to-speaker transitions (Table 24b).

Table 24 - Silence Detector Evaluation Scores

                                 Average    Average          Average        Score
                                 Unedited   Silence-edited   Noise-edited   Difference
  (a) Overall Average            3.51       2.44                            -1.07
  (b) By Speech Sample
        no.1                     3.59       2.32                            -1.27
        no.2                     3.92       2.67                            -1.25
        no.3                     3.51       2.32                            -0.70
  (c) By Coding Scheme
        A-law PCM                3.53       2.52                            -1.01
        ADPCM                    3.33       2.42                            -0.91
        ADM                      3.67       2.37                            -1.30
  (d) Noise- vs Silence-edited
        Sample no.1              4.00       2.30                            -1.70
          32Kbps ADM             4.00                        3.40           -0.60
        Sample no.2              3.80       2.90                            -0.90
          7-bit ADPCM            3.80                        3.40           -0.40
        Sample no.3              3.00       3.00                             0.00
          6-bit A-Law            3.00                        3.30           +0.30

Among the three coding schemes, ADM showed the worst effects due to silence-editing. This result occurred because the parameters for CVSD were set for good waveform tracking ability rather than for a better SQNR (Section 4.5c).
There were improvements of at least 0.3 in the acceptability scores of the noise-edited samples over the silence-edited ones (Table 24d). A significant improvement of 1.1 was achieved for Sample no.1, a result which supports the conclusion that the playback transitions from high background noise to artificial absolute silence were a major cause of score degradation. The results show that the drop in the scores was approximately 0.5 for noise-editing as compared to 1.07 for silence-edited samples.

The average amount of compression achieved for the three test samples by the fixed SD52 parameters was, respectively, 31%, 35% and 31% over a chosen 10 sec. duration without leading or trailing silence. The compression could be higher for similar samples over longer durations. For example, for the original 30 sec. recording (called Sample no.1a in 3.4) from which test sample 1 was extracted, the compression using the same parameters was 36.0% (cf. its optimal compression of 36.7%).

Although the results in Table 24 show a significant decrease in acceptability scores due to silence-editing, we contend that proper functioning of our silence detector SD52 was maintained. It is evident that much of the score degradation was attributable to the background noise transitions rather than to speech clipping (if that was detected at all by the subjects).
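
The improvements quoted for noise-editing over silence-editing follow directly from the averages in Table 24(d). A minimal sketch of that arithmetic is given here, with dictionary keys and variable names chosen purely for illustration:

```python
# Average scores from Table 24(d): (unedited, silence-edited, noise-edited).
table_24d = {
    "Sample no.1, 32Kbps ADM":  (4.00, 2.30, 3.40),
    "Sample no.2, 7-bit ADPCM": (3.80, 2.90, 3.40),
    "Sample no.3, 6-bit A-law": (3.00, 3.00, 3.30),
}

# Gain of the noise-edited version over the silence-edited version.
improvement = {
    name: round(noise - silence, 2)
    for name, (_unedited, silence, noise) in table_24d.items()
}
smallest_gain = min(improvement.values())

for name, gain in improvement.items():
    print(f"{name}: +{gain:.1f}")
```

The largest gain belongs to the high-noise lecture recording, which is the case the text singles out.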
Ignoring the effects of background noise transitions, our silence detector implementation in various speech coding schemes was successful in the sense that the silence-edited versions (with or without noise injection) were only slightly (0.5) less acceptable as recorded messages than the unedited versions while achieving about 35% compression for highly active speeches or conversations. For less active conversations, higher compression ratios would be achieved.

5.2 Conclusions

In this thesis we have explored the concept of optimality in a silence detector. We were led to the design of one silence detector (SD52) whose parameters can be optimized by a test speech sample while retaining the perceptually important unvoiced fricatives and stop sounds. In the implementation of SD52 in various speech coding schemes using SS codes (Speech-Silence codes), the overhead associated with silence interval encoding was negligible and the compression was about 35% for voice recordings such as radio newscasts, highly active conversations and readings from prepared texts.

Some error recovery issues were discussed in connection with adaptive coding schemes. In the implementation of SS codes, we assumed no transmission errors, which is appropriate for on-site speech storage.
However, for wider applications of silence detectors in areas such as channel multiplexing or voice-data network integration, the effects of transmission errors during silence encoding have to be addressed and resolved, even though the channel errors during non-silence could be treated as in Appendixes 1 and 2.

No attempt was made to eliminate background noise during our speech recordings, and SD52 appeared to perform well in spite of such noise. However, the performance of SD52 in a very noisy environment or on noisy channels should be studied.

The comparison of various silence detectors is also suggested as an area of further research. Using the facilities in Section 3.1 it is now possible to compare SD52 with other detectors with respect to a common test domain, for speech storage applications, speech interpolation systems or voice-data integrated networks. The hardware realization of the SD52 algorithm can be investigated for real-time applications.

APPENDIX 1 - ERROR RECOVERY AT THE ADPCM DECODER

Given a speech waveform we can encode it with ADPCM starting with any arbitrarily chosen predictor value and step size, and in the absence of transmission error, the reconstructed speech will show that the initial error between the predictor value and the first speech sample dies out in less than 10 samples for a properly designed ADPCM coder.
This is because the coder strives to weed out any transient error and adapts its step size to fit the dynamic range of the incoming speech waveform. However, at the decoder, given a sequence of ADPCM codes c(1), c(2), c(3), ..., where c(k) = {s(k), l(k)}, s(k) = sign bit, l(k) = quantization level, the reconstructed speech is uniquely determined by the code sequence, having no incoming waveform to adapt to. At any instant the reconstructed sample x(k) depends on the step size z(k) and the predictor value p(k), both obtained recursively. Suppose at time m we replace p(m), z(m) by arbitrarily assigned $\hat{p}(m)$ and $\hat{z}(m)$; then even in the absence of transmission error it is an open question whether the sequences $x(m)$ and $\hat{x}(m)$ (the sequence obtained by the arbitrary replacements) can ever become the same through time.

In our ADPCM scheme the predictor error occurring alone poses no long-term ill effect. Suppose

$$x(m) = p(m) + d(m), \qquad d(m) = s(m)\,\{l(m) + 0.5\}\,z(m)$$

and

$$\begin{aligned}
\hat{x}(m) &= \hat{p}(m) + d(m) && \text{assuming } \hat{z}(m) = z(m) \\
           &= p_e + x(m)       && \text{where } p_e = \hat{p}(m) - p(m)
\end{aligned}$$

then

$$\begin{aligned}
\hat{x}(m+1) &= a\,\hat{x}(m) + d(m+1) \\
             &= a\,p_e + a\,x(m) + d(m+1) \\
             &= a\,p_e + x(m+1)
\end{aligned}$$

and by induction

$$\hat{x}(m+j) = a^{j} p_e + x(m+j), \qquad j = 1, 2, 3, \ldots$$

For a < 1 it is easy to see that $\hat{x}(m+j) \to x(m+j)$. For a = 0.8 and taking the worst case of $p_e = 2048$ we have $a^{j} p_e < 1$ for j > 34, i.e.

$$\hat{x}(m+j) = x(m+j) \quad \text{for } j > 34.$$

The foregoing shows that in the worst case, predictor error will disappear in 35 samples or 4.4 ms and can indeed be considered as transient error. On the other hand, the case for step size error and its recovery is non-deterministic and is signal dependent.

Let z(k) be the step size at time k; then z(k) corresponds to rung r(k) on the step size ladder with $z(k) = 2^{r(k)/3}$. At time k+1, the new rung is r(k+1) = r(k) + f{l(k+1)}, where f is the stepping function. Hence

$$z(k+1) = z(k)\,2^{f\{l(k+1)\}/3} \quad \text{for all } k$$

$$z(k+j) = z(k)\,2^{\sum_{i=1}^{j} f\{l(k+i)\}/3} \tag{1}$$

Equation (1) shows that the step size sequence depends only on an initial step size z(k) and the sequence l(k+i). Suppose at time $\tau$, $p(\tau)$ and $z(\tau)$ are replaced arbitrarily by $\hat{p}(\tau)$ and $\hat{z}(\tau)$; then

$$\hat{x}(\tau) = \hat{p}(\tau) + \hat{d}(\tau) = \hat{p}(\tau) + s(\tau)\,\{l(\tau) + 0.5\}\,\hat{z}(\tau)$$

$$\begin{aligned}
\hat{x}(\tau+1) &= a\,\hat{x}(\tau) + \hat{d}(\tau+1) \\
                &= a\,\hat{p}(\tau) + a\,q(\tau)\,\hat{z}(\tau) + q(\tau+1)\,\hat{z}(\tau+1) && \text{where } q(k) = s(k)\,\{l(k) + 0.5\} \\
                &= a\,\hat{p}(\tau) + \hat{z}(\tau)\left[a\,q(\tau) + q(\tau+1)\,2^{f\{l(\tau+1)\}/3}\right]
\end{aligned}$$

and by induction

$$\hat{x}(\tau+j) = a^{j}\,\hat{p}(\tau) + \hat{z}(\tau)\,h(\tau, j)$$

where

$$h(\tau, j) = a^{j} q(\tau) + a^{j-1} q(\tau+1)\,2^{f\{l(\tau+1)\}/3} + \cdots + q(\tau+j)\,2^{\sum_{i=1}^{j} f\{l(\tau+i)\}/3}.$$

Similarly

$$x(\tau+j) = a^{j}\,p(\tau) + z(\tau)\,h(\tau, j)$$

and

$$\frac{\hat{x}(\tau+j) - a^{j}\,\hat{p}(\tau)}{x(\tau+j) - a^{j}\,p(\tau)} = \frac{\hat{z}(\tau)}{z(\tau)} \tag{2}$$

Equation (2) is true provided the sequences $z(k)$ and $\hat{z}(k)$ have no bounds, and under this condition

$$\frac{\hat{x}(\tau+j)}{x(\tau+j)} \approx \frac{\hat{z}(\tau)}{z(\tau)}.$$

Figure 17 shows the cases $\hat{z}(\tau)/z(\tau) < 1$ and $\hat{z}(\tau)/z(\tau) > 1$ after the component $a^{j} p(\tau)$ becomes insignificant.

In our ADPCM scheme the step sizes are bounded by the lowest and highest rungs of the step size ladder, and the condition under which (2) is true will be violated by the occurrence of a large input signal, which tends to increase the step size to its maximum limit. It is only during these times that the step sizes $z(k)$ and $\hat{z}(k)$ can become equal, through the stagnation of one and the stepping up of the other. Thereafter the step sizes will be the same, as both step up or down by f{l(k+1)} rungs. After the step sizes become equal it will not be long before $x(k)$ and $\hat{x}(k)$ are equal again. Experiments have shown that during a speech utterance, the maximum step size occurs most frequently. Hence the occurrence of predictor error and step size error has only a transient effect on the reconstructed waveform during speech intervals.

The step size error can be corrected in several ways. One way is to send the step size explicitly at the outset of a speech interval, thereby increasing the complexity of the SS coder and decoder.
The other is to have a degenerative recursive step size relationship such as

$$z(k+1) = z(k)^{b}\,2^{f\{l(k+1)\}/3} \quad \text{for } b < 1;$$

the factor b will ensure a decay of initial step size error in the same way that predictor coefficient a ensures decay of prediction error. But in any case, complexity is the price to pay.

Figure 17 - ADPCM Step Size Error (curves labelled $\hat{z}(\tau)/z(\tau) > 1$, input waveform, and $\hat{z}(\tau)/z(\tau) < 1$)

APPENDIX 2 - STEP SIZE ERROR RECOVERY AT CVSD DECODER

In this Appendix we prove that with respect to the CVSD decoder (Figure 14) any step size error occurring at time m will dissipate to zero and the ensuing decoder output will be error free through time.

Proposition:
(1). If $e(m) = \hat{z}(m) - z(m)$ then $e(m+j) = b^{j} e(m)$.
(2). $\hat{x}(m+j) \to x(m+j)$.

Proof:
(1).

$$e(m+1) = \hat{z}(m+1) - z(m+1) = b\,\hat{z}(m) + \hat{d}(m+1) - b\,z(m) - d(m+1)$$

where d(m+1) = d1 or d2 as determined by c(m+1), c(m), c(m-1). In the absence of transmission error $\hat{d}(m+1) = d(m+1)$ and $e(m+1) = b\,e(m)$. By induction, $e(m+j) = b^{j} e(m)$.

(2). Suppose the step size error e(m) occurs at time m; then by (1) there exists n such that $e(m+n) = b^{n} e(m) \approx 0$. Then

$$\begin{aligned}
\hat{x}(m+n+1) - x(m+n+1) &= a\,\hat{x}(m+n) + s(m+n+1)\,\hat{z}(m+n+1) - a\,x(m+n) - s(m+n+1)\,z(m+n+1) \\
                          &= a\,\hat{x}(m+n) - a\,x(m+n) + s(m+n+1)\,e(m+n+1) \\
                          &\approx a\,[\hat{x}(m+n) - x(m+n)]
\end{aligned}$$

and

$$\hat{x}(m+n+j) - x(m+n+j) \approx a^{j}\,[\hat{x}(m+n) - x(m+n)] \to 0.$$

Q.E.D.

APPENDIX 3 - LONGEST STRING OF ZEROS IN CVSD SPEECH CODES

Proposition: For CVSD speech codes with b = 0.7, d1 = 0.6, d2 = 115 the maximum length of a string of zeros is 16.

Proof: Let a string of zeros commence at time m-2; then

$$\begin{aligned}
p(m)   &= a\,p(m-1) - a\,z(m) \\
p(m+1) &= a\,p(m) - a\,z(m+1) \\
       &= a^{2} p(m-1) - a^{2} z(m) - a\,z(m+1) \\
p(m+n) &= a^{n+1} p(m-1) - \sum_{j=0}^{n} a^{n-j+1} z(m+j)
\end{aligned}$$

But

$$\begin{aligned}
z(m+j) &= b\,z(m+j-1) + d_2 \\
       &= b^{2} z(m+j-2) + b\,d_2 + d_2 \\
       &= b^{j} z(m) + d_2\,[1 + b + b^{2} + \cdots + b^{j}] \\
       &= b^{j} z(m) + d_2\,(1 - b^{j+1})/(1 - b)
\end{aligned}$$

So

$$\begin{aligned}
p(m+n) &= a^{n+1} p(m-1) - a^{n+1} z(m) \sum_{j=0}^{n} (b/a)^{j} - \frac{d_2\,a^{n+1}}{1-b}\left[\sum_{j=0}^{n} a^{-j} - b \sum_{j=0}^{n} (b/a)^{j}\right] \\
       &= a^{n+1}\left[p(m-1) - z(m)\,\frac{1 - r^{n+1}}{1 - r} - \frac{d_2\,(a^{-n-1} - 1)}{(1-b)(a^{-1} - 1)} + \frac{b\,d_2\,(1 - r^{n+1})}{(1-b)(1-r)}\right]
\end{aligned}$$

where r = b/a.

For the string to stop at m+n we must have

$$x(m+n+1) - p(m+n) > 0 \tag{1}$$

The greatest lower bound (GLB) for the LHS of (1) corresponds to

$$x(m+n+1) = -2048; \qquad p(m-1) = 2048; \qquad z(m) = z_{min} = 2.$$

Substituting a = 0.95, b = 0.7 and d2 = 115.2,

$$\text{GLB of LHS of (1)} = 5248 - 10358\,(0.95)^{n+1} + 1014\,(0.7)^{n+1}$$

which is negative for n = 12 and positive for n = 13. Therefore the string stops at m+13, implying a maximum length of 16.
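
The two numerical claims in these appendices, that the ADPCM predictor error falls below one quantization unit after 35 samples (Appendix 1) and that the bound in Appendix 3 changes sign between n = 12 and n = 13, can be checked with a short script. This is only a sketch under stated assumptions: the function names are invented here, and the 8 kHz sampling rate is inferred from the text's equivalence of 35 samples with 4.4 ms rather than stated in this section.

```python
def predictor_error_samples(a=0.8, p_e=2048.0):
    """Smallest j for which a**j * p_e < 1 (Appendix 1, worst case p_e = 2048)."""
    j = 0
    while (a ** j) * p_e >= 1.0:
        j += 1
    return j


def glb(n, a=0.95, b=0.7, d2=115.2, p_prev=2048.0, z_min=2.0, x_next=-2048.0):
    """Greatest lower bound of x(m+n+1) - p(m+n), using the closed form of Appendix 3."""
    r = b / a
    geo_r = (1 - r ** (n + 1)) / (1 - r)          # sum of (b/a)**j for j = 0..n
    geo_a = (a ** -(n + 1) - 1) / (1 / a - 1)     # sum of a**-j for j = 0..n
    p = a ** (n + 1) * (p_prev
                        - z_min * geo_r
                        - d2 * geo_a / (1 - b)
                        + b * d2 * geo_r / (1 - b))
    return x_next - p


j_transient = predictor_error_samples()
duration_ms = j_transient / 8000 * 1000           # assumes 8 kHz sampling
first_positive_n = next(n for n in range(100) if glb(n) > 0)

print(j_transient, round(duration_ms, 1))
print(first_positive_n, round(glb(12), 1), round(glb(13), 1))
```

With the stated parameters the closed form reduces to approximately 5248 - 10358(0.95)^(n+1) + 1014(0.7)^(n+1), which matches the expression quoted in the proof.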
