UBC Theses and Dissertations

Speech synthesis by concatenation of Digital waveform fragments Chu, Thien-Ke 1978

Full Text

SPEECH SYNTHESIS BY CONCATENATION OF DIGITAL WAVEFORM FRAGMENTS

by

THIEN-KE CHU

B.Sc.A., Ecole Polytechnique, Universite de Montreal, 1968
M.A.Sc., University of British Columbia, 1970

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENT FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(DEPARTMENT OF ELECTRICAL ENGINEERING)

accept this thesis as conforming to the required standard

RESEARCH SUPERVISOR:
MEMBERS OF THE COMMITTEE:
HEAD OF THE DEPARTMENT:

THE UNIVERSITY OF BRITISH COLUMBIA
December, 1978
© Thien-Ke Chu, 1978

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5
Department

ABSTRACT

A method to rule-synthesize speech by concatenation of digital waveform fragments at a subphonemic level is presented. No special hardware is needed to implement this software synthesizer other than a D/A converter and an ordinary audio system. Computer software for an on-line analysis-by-synthesis process was developed.
Phonetic cues, such as characteristic waveform fragments, durations of each quasi-steady state and the transition motion of a certain number of phonemes, were extracted and stored. Classification of phonetic cues was possible and necessary to reduce the storage requirement and to obtain rules for synthesis. An interpolation scheme was developed to generate transient waveforms that eliminate the discontinuities at the concatenated junctions. Pitch variation was found to be the most influential factor for creating intonation in polysyllable utterances and was achieved by a pitch modification routine included in the synthesis program. Test procedures and results are reported; in the first test the vowel recognition rate for synthetic words was 93%, comparable to the 94% obtained for digitized natural words. Further studies are needed to generalize the method to synthesize unrestricted text. The findings on the phonetic cues could be applied to speech recognition in future work.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF ILLUSTRATIONS
LIST OF TABLES
LIST OF SYMBOLS
ACKNOWLEDGEMENTS

CHAPTER
1. INTRODUCTION
   1.1 Background
   1.2 Methodology
2. QUALITATIVE PRESENTATION
   2.1 Introduction
   2.2 Speech Synthesis Model
   2.3 Acoustics of Speech
   2.4 Analysis Process
   2.5 Synthesis Process
   2.6 Tests
       2.6.1 Test Materials
       2.6.2 Test Arrangements and Procedure
   2.7 Test Results and Discussions
       2.7.1 Vowel Identification
       2.7.2 Consonant Discrimination
       2.7.3 Alphabet Recognition
3. SPEECH ANALYSIS
   3.1 Data Processing Facilities
       3.1.1 Hardware facilities
       3.1.2 Software facilities
   3.2 Data Preparation
   3.3 Original Data Selection
   3.4 Data Analysis
       3.4.1 Vowel analysis
             3.4.1.a Structure of vowel waveform
             3.4.1.b Duration of vowels
             3.4.1.c Duration of pitch period
       3.4.2 Transient and consonant analysis
   3.5 Characteristic Phoneme Fragment Extraction
       3.5.1 Fragmentation criteria
       3.5.2 Selection criteria
       3.5.3 Filing system for Characteristic Phoneme Fragment Data
   3.6 Conclusions
4. SPEECH SYNTHESIS
   4.1 Command instructions
       4.1.1 Control Commands
       4.1.2 Functions
   4.2 Interpolation schemes
       4.2.1 Pitch contouring
             4.2.1.a Calculation of duration differences
             4.2.1.b Remove or add samples to the original CPF
       4.2.2 Amplitude and pitch contouring
   4.3 Memory organization
   4.4 Selected examples of synthetic waveforms
   4.5 Polysyllable words and sentences
   4.6 Conclusions
5. CONCLUSIONS
REFERENCES
APPENDICES
   APPENDIX A Word lists and phonetic symbols
   APPENDIX B Time and frequency plots of 8 original words
   APPENDIX C Test materials

LIST OF ILLUSTRATIONS

FIGURE
2.1 Black box representation of the synthesis system
2.2 Block diagram of the synthesis system
2.3 Block diagram of the data acquisition process
2.4 Block diagram of the analysis process
2.5 Time plot of the word: "BEAT"
2.6 Frequency plots of the word: "BEAT" (voiced segment)
2.7 Time and frequency plots of fragments from phoneme /i/ as in "BEAT"
2.8 Time plot of the word: "BEAT" - Synthetic
2.9 Frequency plots of the word: "BEAT" - Synthetic (voiced segment)
3.1 Time plot of the word: "BEAT" - male voice
3.2 Time plot of the word: "BEAT" - female voice
3.3 A simplified pitch period waveform
3.4 Discontinuity in concatenated waveform due to the differences in amplitude of end points
3.5 Difficulty in automatic pitch detection
3.6 Selected end points for pitch detection in the present work
3.7 Results from the fragmentation process of the voiced segment of /bæt/
3.8 Time plot of selected Characteristic Phoneme Fragments
3.9 Analysis process: organization of core memory and data files
4.1 Block diagram of the synthesizer at the articulatory stage
4.2 Notations for Pitch Contouring Description
4.3 Effects of Removing Points
4.4 Effects of Adding Points
4.5 Synthesizer at the Articulatory Stage: block diagram and memory organization
4.6 Time plot of the word /ni/ synthetic
4.7 Synthetic transients (A) between /b/ and /i/ (B) between /d/ and /i/ (C) between /g/ and /i/

LIST OF TABLES

TABLE
2.1 Subject identification and overall results
2.2 Confusion matrix for vowels preceded by /b/, Test No. 1, Natural Stimulus
2.3 Confusion matrix for vowels preceded by /b/, Test No. 1, Synthetic Stimulus
2.4 Confusion matrix for vowels preceded by /b/, Test No. 3, Natural Stimulus
2.5 Confusion matrix for vowels preceded by /b/, Test No. 3, Synthetic Stimulus
2.6 Confusion matrix for /b/, /d/, /g/, Test No. 2
2.7 Confusion matrix for /b/, /d/, /g/, Test No. 4
2.8 Confusion matrix for vowels preceded by /b/, Test No. 4
2.9 Confusion matrix for vowels preceded by /d/, Test No. 4
2.10 Confusion matrix for vowels preceded by /g/, Test No. 4
2.11 Confusion matrix for English alphabet, Test No. 5
3.1 Duration of pitch period and voiced segment in number of samples of 9 words "bit, beat, bait, boot, bet, boat, but, bought, bat"
3.2 Duration of pitch period and voiced segment in msec of 9 words "bit, beat, bait, boot, bet, boat, but, bought, bat"
3.3 Characteristic Phoneme Fragment Files
4.1 Control commands for /ni/
4.2 Concatenation and Interpolation Functions
4.3 Amplitude of interpolated fragments and value of ^N

LIST OF SYMBOLS

i      vowel number 1 (as in "beat")
I      vowel number 2 (as in "bit")
e      vowel number 3 (as in "bait")
ɛ      vowel number 4 (as in "bet")
æ      vowel number 5 (as in "bat")
ʌ      vowel number 6 (as in "but")
ɔ      vowel number 7 (as in "bought")
o      vowel number 8 (as in "boat")
u      vowel number 9 (as in "boot")
b      voiced bilabial plosive
d      voiced alveolar plosive
g      voiced velar plosive
f      voiceless labio-dental fricative
s      voiceless alveolar fricative
l      voiced alveolar lateral continuant
m      voiced bilabial nasal
n      voiced alveolar nasal
[ ]    phonetic transcription
/ /    phonemic interpretation
" "    alphabet transcription
A/D    Analog to Digital
D/A    Digital to Analog

ACKNOWLEDGEMENT

It is a pleasure to express my appreciation to the many people who have given sincere guidance, advice, criticism, and encouragement, without whom this work would not have been possible. In particular, I would like to thank my supervisor, Dr. Beddoes. Also, I am grateful to Mr. Mark Bunce for his help in preparing the hardware and some software facilities, and to my friends and colleagues Tony and Chris Smith, Doug Dean and Dao Le Giang, Mrs. Judy Piercey and Mr. Tino Varelas for their contributions to the project and to preparing the thesis.
This work was supported by the University of British Columbia and the National Research Council of Canada under grant 67-3290. Some equipment was provided by the Medical Research Council of Canada under grant no. MA-3971.

DEDICATION

To my family, who provides me with the essence of life, like a root to a green plant
and
To many, who relate to me in a manner beyond the words of thanks.

1. INTRODUCTION

A method to synthesize speech by concatenation of a limited number of stored digital waveform fragments has been developed, and is presented here. Acoustic features smaller than phoneme units and their interactions form the basis of this method. At the later stage of the project the synthesizer was used as a part of the analysis system, the results from which could be used to explore the speech recognition problem. The immediate goal however was to use the synthesizer as an automatic voice read out, a device which has many important applications. These applications range from providing information from computer stored data through the already existing telephone network, automatic intercept messages, self teaching, to aids for the handicapped [6,60].

1.1 Background

The first known realization of a speech synthesizer, a mechanical analog of the vocal tract, was accomplished in 1791 by Wolfgang von Kempelen [6,14]. An electrical analog version was initiated in 1922 by Stewart [13]. A sophisticated electrical synthesizer, the VODER, was designed by Dudley, Riesz, and Watkins [14], and demonstrated in 1939. A quantitative description of vocal tract acoustics based on the electrical transmission-line theory was independently and successfully formulated by Dunn, Chiba and Kajiyama [15,6]. The theory later was elaborated by Fant, Stevens, Kasowski and Rosen [6,16]. The vocal tract shape was studied by Stevens and House in 1954 [17].
The synthesis process using digital computers was implemented by Kelly and Lochbaum in the early 1960's [18]. Later, with the advent of digital computer technology and digital signal processing theory, Flanagan, Coker and Bird were able to simulate the synthesizer on a digital computer [19], and a real time hardware system was realized by Rabiner, Jackson, Schafer and Coker [20]. In recent years, the effort was concentrated on synthesis rules [36-51,59]. Synthesis rules were used to obtain more natural output speech.

Speech synthesis can be viewed also as a means for conserving channel capacity in communication engineering. The VOCODER was the first analysis-synthesis system that was in actual operation [14,22]. A linear model of speech production was developed by Fant in 1960 [23,5]. The linear prediction method for speech analysis-synthesis systems was introduced by Atal and Hanauer [24], and by Itakura and Saito [25-27]. Significant amounts of bandwidth compression were obtained with little degradation of speech output [5,59,66,72]. The success of linear prediction coding led to the development of a rule-synthesis scheme [59,66].

All speech synthesizers of the group mentioned above (except Kempelen's) used white noise generators and pitch pulse generators as a source of energy. The synthesis mechanism was a combination of electrical filters or recursive digital filters [5,6]. Control parameters and synthesis rules were entered manually (VODER) or through digital computers [12,68-70].

Speech synthesis by concatenation of prerecorded elements was studied by C.M. Harris in 1953 [28]. A number of "Building Blocks" were used.
Dyads, "the set of all segments involving a single articulatory sequence pair and all conditions of prosody associated with that sequence", were used by Peterson, Wang and Sivertsen, who estimated that about 8000 elements were necessary [29]. The enormous storage requirement and the discontinuities at the junctions of the output speech ended this work almost immediately [6,28,29]. Discontinuities were produced not only by the hardware problem but mainly by the fundamental problem of the time and context variance of speech [28,30,6].

Attempts were made to implement synthesis by concatenation of prerecorded elements on a digital computer [11]. A single element was chosen to represent each phoneme. Data compression was possible. However, the fundamental problem of time and context variance was not considered. Results were based on a training period with a small set of words, and a specific application was the goal. Despite the difference in the fundamental concept, that work is the starting point of the work reported here.

In the present work digital waveform fragments, with more than one fragment representing a phoneme, were used as the basic elements. Different combinations of fragments were used to obtain different allophones and generate phonemic variations. To overcome the problem of discontinuity, an interpolation scheme was developed to create transients at the junctions.

1.2 Methodology

The principal work areas are:
- A study of the context and variations of chosen utterances in both the time and frequency domains.
- Identification and classification of phonetic cues for each phoneme.
- Development of the interpolation scheme for the synthesis process.
- Testing of the synthesis system.

A qualitative presentation, an almost complete description of the project, is found in chapter 2.
Most sections in chapter 2 are amplified in subsequent chapters, though not in the same order. The descriptions will be of acoustics of speech, analysis process, data and analysis results, synthesis process, test design and results. Conclusions and suggestions for further work will be in the last chapter.

It should be noted that the work reported here is regarded as a small contribution to the research of speech science at the basic level in general, and to speech synthesis in particular. No attempts or intentions have been made to generalize the findings or to compare results with the previous methods. The idea of using waveform data at the subphonemic level and the success in overcoming the discontinuity at the junctions of concatenated fragments are the factors that contribute to the merit and innovative value of this approach and justify the motivation of this project.

2. QUALITATIVE PRESENTATION

2.1 Introduction

Before the detailed and technical problems of speech analysis and synthesis are discussed, an overview of the system is desirable. The supporting data for a number of conclusions and remarks in this chapter will be found in subsequent chapters. The synthesis model is described in section 2.2. This is followed by the studies of natural speech, its characteristic features and its complexity, which is the fundamental difficulty associated with the fabrication of artificial speech. The analysis, the synthesis scheme and the evaluation of the synthesis system precede a short discussion concluding this chapter. All descriptions presented in this chapter refer to the present system unless otherwise specified.
2.2 Speech synthesis model

The present speech synthesis by rule is a method of generating continuous acoustic speech waveforms as the response to the inputs which are ASCII codes retrieved from computers or derived from printed text (Figs. 2.1 and 2.2).

ASCII codes input -> synthesis system -> speech output
Fig. 2.1 Black box representation of the synthesis system.

The synthesis system is a two stage processor, an input analyzer and an output generator, linked by a command buffer. At the first stage, depending upon the output mode set by the operator using the output mode selector, ASCII code inputs are analyzed individually for spelling, or in a group for words or sentences. From the analysis results the syntactic and phonemic features such as related phoneme fragments, duration, transient motion, stress and pause are assigned, by associating control commands stacked in the command buffer, to each input string. Outputs from the command buffer control the synthesis processor which, upon receiving the control commands, transfers appropriate phoneme fragments to phoneme fragment buffers. Transitions are then generated by pitch and amplitude contouring of selected elements from the phoneme fragment buffers with the resulting discrete waveform then being stored in the output buffer. A D/A converter, a low pass filter, an amplifier and a loud speaker are used to obtain continuous speech output.

First stage: ASCII inputs, output mode selector, input analyser (control command generator, phonemic feature assignment). Command buffer. Second stage: stored phoneme fragments, synthesis processor, output buffer, D/A, speech output.
Fig. 2.2 Block diagram of the synthesis system.
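As an illustration only, the second stage described above might be sketched in Python as follows. The function names (`pitch_contour`, `transition`, `synthesize`), the command format of (fragment name, repetitions), the evenly spaced choice of samples to drop or repeat, and the toy fragments are all assumptions made for this sketch; they are not taken from the original PDP-12 software.

```python
from typing import Dict, List, Sequence, Tuple


def pitch_contour(fragment: Sequence[float], target_len: int) -> List[float]:
    """Change the length of one pitch-period fragment.

    Assumption: samples are dropped (target shorter) or repeated
    (target longer) at evenly spaced positions.
    """
    n = len(fragment)
    if n == 0 or target_len <= 0:
        raise ValueError("fragment and target length must be non-empty")
    # Output sample i is taken from source index floor(i * n / target_len).
    return [fragment[(i * n) // target_len] for i in range(target_len)]


def transition(a: Sequence[float], b: Sequence[float],
               steps: int) -> List[List[float]]:
    """Generate `steps` intermediate fragments between fragments a and b.

    Each intermediate fragment has an interpolated length (pitch
    contouring) and linearly weighted amplitudes (amplitude contouring).
    """
    out = []
    for k in range(1, steps + 1):
        w = k / (steps + 1)                      # weight of fragment b
        length = round(len(a) + w * (len(b) - len(a)))
        a_k = pitch_contour(a, length)
        b_k = pitch_contour(b, length)
        out.append([(1 - w) * x + w * y for x, y in zip(a_k, b_k)])
    return out


def synthesize(commands: Sequence[Tuple[str, int]],
               store: Dict[str, Sequence[float]],
               steps: int = 2) -> List[float]:
    """Fill an output buffer from (fragment name, repetitions) commands."""
    output: List[float] = []
    previous = None
    for name, repetitions in commands:
        fragment = store[name]
        if previous is not None:
            # Smooth the junction with generated transient fragments.
            for t in transition(previous, fragment, steps):
                output.extend(t)
        for _ in range(repetitions):
            output.extend(fragment)              # straight transfer
        previous = fragment
    return output


# Toy fragments (hypothetical values, not data from the thesis).
store = {"b": [0, 4, 0, -4], "i": [0, 8, 8, 0, -8, -8]}
buffer = synthesize([("b", 1), ("i", 2)], store, steps=1)
print(len(buffer))
```

In this sketch the command list plays the role of the command buffer, `store` stands in for the stored phoneme fragments, and the returned list corresponds to the output buffer that would be passed to the D/A converter.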
The synthesis system was organized to match that of a human reading process. The first stage simulates the function of the brain where input is analysed. Generated commands are then sent to the second stage, the articulatory mechanism. The first stage is not yet an automatic process; control commands are entered to the command buffer manually via a teletype keyboard and can be stored in designated files. A block diagram of the synthesis system is shown in Fig. 2.2. Further details of the synthesis model will be found in section 2.6 and in chapter 4.

2.3 Acoustics of speech

The production of speech could be described in a simplified three stage process [1]. The first stage is psychological, where the concept is formulated, linguistic features are assigned and commands are generated to control the speech mechanism. The second stage is physiological, where the movements of the organs of speech produce a certain speech sound according to the controls received from the first stage. Disturbances in the medium constitute the third stage, physical or acoustic.

Acoustic speech has been investigated in different levels: sentence, word, phoneme and subphonemic levels [7,10,32-50]. Each level has its own complexity and characteristic features to be studied. For example, the variation of duration, pitch, stress and the interaction at word boundaries of speech sounds are the main concerns in the study of sentences.
Monosyllable isolated words are often used for phonemic analyzers because of their relatively less complicated acoustic features and also because a phoneme by itself most often cannot be perceived.

Phoneme by traditional definition is "the smallest sound unit in a language that distinguishes one utterance from another" [86]. Even from one speaker phonemes are not invariant; the different phonetic realizations of a phoneme are known as its allophones [1,4]. The fluctuation of formant frequency value and the change in duration in different vowel-consonant combinations have been studied by many groups of researchers [3,30-50], among them Fant, Liberman, Delattre, Cooper et al. Classification of variations is one solution to the problem of context variance, and the study of Building Blocks by C.M. Harris [28] is a good example of this approach.

In speech analysis and synthesis, the interaction between phonemes is perhaps more difficult to understand and reproduce than the variations of utterances [3,6,9,28-31]. The long transient duration is one of many indications of this fact [42-46,59]. The difficulty is clearly described by C.M. Harris [28] as follows: "Experiments indicated that speech based upon one building block for each vowel and consonant not only sounds unnatural but is mostly unintelligible because the influences on vowel and consonants are missing which ordinarily occur between adjacent speech sounds". The use of dyads, in the time domain, is a simple way to overcome the transition problem. In the frequency domain approach, the problem is reflected by the study of formant motion [42,48,68] and its application in the interpolation of formant contours used in formant synthesis.
To understand the speech production process further, researchers have relied on the receiving end of the speech communication system, the perceptual stage [7,9,32-50]. It is clear that if the human perceptual stage can recognize certain phonemes despite their variations, there must be some characteristic features that are somewhat invariant. The characteristic features were referred to as "phonetic cues" [9]. Thus the choice to use phonetic cues at a subphonemic level in the present work is only logical. Techniques such as filtering, masking and segmenting have been used by others [32,43,44,79] in attempts to separate the different cues in order to study one cue at a time. Isolation of phonetic cues is by no means simple or possible to achieve with analysis and perceptual techniques alone [9]. In the present work, the synthesizer was used as part of the analysis system and only the relevant cues were defined and studied. The definition of phonetic cues is found in the next section.

2.4 Analysis process

The main steps of the analysis process include
- data acquisition
- preliminary interpretation of data by analysis and perception
- data analysis-by-synthesis

The extraction and storage of phonetic cues are the results of the analysis process.

Utterances from reading a list of chosen words were recorded using a high gain low noise transducer microphone and a reel to reel analog tape recorder. The speaker was a trained CBC female announcer. Analog signals from the tape recorder were filtered, then quantized.
Digital data were stored under designated "original files" with a small computer monitoring the process. The specifications and quantitative descriptions will be found in chapter 3. A block diagram of the acquisition process is shown in Fig. 2.3.

Natural speech -> analog tape recorder (Scully) -> filter (Krohn-Hite 3342R) -> A/D -> PDP-12 computer -> original files (LINC tape, RK08 disk)
Fig. 2.3 Block diagram of the data acquisition process

Blocks shown: analysis program, synthesis program, perception, audio-visual display, original data files, monitor, PDP-12 computer, frequency plots, time plots, feature extraction (control command), phoneme fragment files.
Fig. 2.4 Block diagram of the analysis process

Several programs, from simple ones such as sample and storage, plotting and semi-automatic segmenting to more complicated ones such as "analysis" and "synthesis" programs, were developed to process sampled data. Time and frequency plots were obtained and used in conjunction with the audio-visual devices from the computer to recognize and classify phonetic cues. Phonetic cues, as defined for this method, are all information concerning the phoneme fragments, the duration of the quasi-steady state, the duration of the transition, and the manner of interaction. The block diagram of the analysis process is shown in Fig. 2.4, while Figs. 2.5 and 2.6 show the time plot and frequency plot of the word "beat".

A synthesis program was developed, simple at first, to give some feedback information of the synthetic speech. The program was improved and updated with the improvement of the decision in extracting phonetic features.
The iterative feature of this process contributed a significant amount to the effectiveness and open-ended nature of the system.

The following are the results from the analysis process, presented in a qualitative form. The supporting data and quantitative discussion are in chapter 3.

- Time and context variance of phonemes was clearly indicated by the time plots of waveforms.
- The junction variations have also been found for each phoneme.
- Further findings
  - A small number of waveform fragments were found to be relatively context invariant. They were referred to as characteristic phoneme fragments (CPF).
  - While the CPF varied little for each phoneme, their combinations varied greatly from one allophone to another.
  - It was possible to classify junctions between phonemes, therefore reducing the storage required.
  - The pitch period (defined by the maxima and zero crossing criteria) of vowels and vowel-like consonants varied continuously and smoothly throughout the voiced part of the utterances.
  - Stress was the effect of the pitch period variation and amplitude variation. It was found that pitch variation was more dominant than amplitude variation for this method.

Fig. 2.5 Time plot of the word: "BEAT" (vertical axis: relative amplitude)

Fig. 2.6 Frequency plots of the word: "BEAT" (voiced segment) (horizontal axis: frequency, 1 to 5 kHz)

Based on the above findings, stored original waveforms were fragmented into pitch period fragments and the CPF were selected and stored in designated CPF files. The data related to each CPF, such as the number of repetitions and the duration of transience, were also recorded to be used later as part of the control parameters in the synthesis process. The time and frequency plots of four characteristic phoneme fragments of the phoneme /i/ as in "beat" are shown in Fig. 2.7.

Fig. 2.7 Time and frequency plots of fragments from phoneme /i/ as in "BEAT"

2.5 Synthesis process

A scheme to synthesize speech at the articulatory level was developed. The detailed discussion of the synthesis process is in chapter 4. Following are the functions included in this scheme.

- Modification: the CPF was modified in amplitude or in pitch period before the transient fragments were generated.
- Straight transfer: the CPF was transferred directly and unaltered to the output buffer. If the process was to be repeated, only one CPF was transferred and the number of repetitions was recorded.
- Pitch contouring: samples were removed or added in a comb-filter manner to each transition phoneme fragment, the resulting data were transferred to the output buffer, and the number of repetitions for each generated fragment was set to 1. The number of samples to be removed or added was dependent on the duration of the transition.
- Amplitude and pitch contouring: a linear eight level weighted interpolation function was added to the pitch contouring function.

The generated waveform was the direct source of excitation to obtain the audio output. The synthetic waveforms of the word "beat" in figures 2.8 and 2.9 are almost identical to those in figures 2.5 and 2.6.

2.6 Tests

Perceptual tests were an essential part in the development of a speech synthesis system.
Informal tests have been carried out continuously throughout many stages of the project to gather data in classifying original words, detecting phonetic cues, selecting characteristic phoneme fragments, checking the validity of the decisions and consequently improving the synthesis scheme. These informal tests were referred to as "developing tests". At the later stage formal tests were necessary for evaluating the synthesis system and the potential of the synthesis method. Formal tests, referred to as "synthesis evaluation tests", together with the results will be elaborated in this section.

2.6.1 Test materials

The synthesis system presented here is by no means complete, as all phonemes and most of their relevant combinations have to be analyzed and classified, tested and retested, and in a limited period of time this was not feasible. In order to optimize the use of the limited number of words that were available and to obtain some meaningful statistical data, a series of tests was designed based on previous research results.
It has been shown that the intelligibility of speech was a function of the context of the test materials [84,87], and the manner in which test words were presented [4,77,87,88]; for example the probability of correct identification was influenced by the probability of occurrence of the test words in a language, and the confusion rate was substantially higher when the words were presented in isolation than in grammatically correct meaningful sentences [87]. When the consonants in the initial position were classified by place and manner of articulation, it was found that confusion within groups was higher than between groups [3,79,89,90]. From these previous findings, the initial consonants of one group, the voiced stops [b], [d], [g], the consonant [t] and nine vowels [i], [I], [e], [ɛ], [æ], [ʌ], [ɔ], [o], [u], were chosen to construct the test materials. Utterances of 26 letters of the English alphabet were also synthesized and tested. The polysyllable stimulus "w" was given but was not taken into consideration when results were interpreted. In addition to 53 synthetic words, 9 original utterances in the form /b-t/ were also used to compare the natural and synthetic speech. All test words were D/A converted, filtered at 6 kHz (except for test number one where the filter was set at 2.5 kHz), then recorded on reel to reel analog tape.
2.6.2 Test arrangements and procedure

For a limited number of words as stimuli, perceptual testing and unconscious learning could not be mutually exclusive. To minimize that effect, subjects were asked to choose the response from the entire small set (1/9 or 1/27); furthermore, in the consonant discrimination test the vowel of the stimulus was not disclosed. For that reason the sub-sets designed by Nye and Gaitenby [81] or ABX tests could not be used. A male voice was used to announce the sequential order of the synthetic female voice stimuli to avoid the "physical surround referencing effect" described by Ladefoged and Broadbent [88].

Five tests were prepared and analyzed as follows:

test number 1 : vowel identification of original and synthetic words. Perceptual noise was added by setting the low-pass filter at 2.5 kHz.
test number 2 : consonant discrimination [b,d,g].
test number 3 : same as test number 1 but the filter was set at 6 kHz.
test number 4 : vowel and consonant identification.
test number 5 : intelligibility test - alphabet recognition.

Males and females whose ages ranged from 20 to 30, with no record of hearing difficulty, were asked to be subjects. The procedure was verbally explained and given in writing. A short break was taken after each test, and a token gift was promised and given, as an incentive for concentration, to the subject with the highest recognition score.
Test words were presented to subjects through earphones or loudspeakers set up in a small classroom; the frequency response of both devices extended above 6 kHz. The details of test arrangements, procedures, list of stimuli and response sheets are found in Appendix C, Nos. C.1, C.2, C.3.

2.7 Test results and discussions

Test results from 10 subjects varied from almost perfect scores to relatively poor ones. For the preliminary test analysis the percentage of total correct responses was used. Table 2.1 shows the characteristic identification of subjects and their results. The high percentage of recognition (99%) in region (1) for subject (M) could be attributed to his familiarity with synthetic speech and also to having English as his mother tongue. Subject (D) reported that his ability to memorize, concentrate and deduce logically were the main factors for his high score. The relatively low recognition rates in region (3) (84% and 85%) were attributed to the differences in the native languages and the overexpectations these subjects had of the quality of synthetic speech (subjects left out the unnatural words instead of following the "forced choice procedure" as required).

Table 2.1 SUBJECT IDENTIFICATION AND OVERALL RESULTS

Subject   Familiar with      English as       Percentage of        Region
          synthetic speech   mother tongue    correct responses
M         Y                  Y                99                   (1)
D         Y                                   99                   (1)
C         -                  Y                93                   (2)
T         -                  Y                92                   (2)
L         -                  Y                91                   (2)
G         -                  -                90                   (2)
B         -                  Y                89                   (2)
DA        -                  -                85                   (3)
P         Y                  -                85                   (3)
Y                                             84                   (3)
The data from subjects in region (2) were found to be productive since they reflect the system in a logical way, so that improvements could be made; these results are shown in the following confusion matrices.

TABLE NO. 2.2 CONFUSION MATRIX FOR VOWELS PRECEDED BY /b/ (n = 25; diagonal trace: 94%; cell entries not reproduced)

TABLE NO. 2.3 CONFUSION MATRIX FOR VOWELS PRECEDED BY /b/ (n = 25; diagonal trace: 93%; cell entries not reproduced)

TABLE NO. 2.4 CONFUSION MATRIX FOR VOWELS PRECEDED BY /b/ (n = 25; diagonal trace: 96%; cell entries not reproduced)

TABLE NO. 2.5 CONFUSION MATRIX FOR VOWELS PRECEDED BY /b/ (n = 25; diagonal trace: 95%; cell entries not reproduced)

Table No. 2.6 CONFUSION MATRIX FOR /b/, /d/, /g/ - RESULTS OF TEST No. 2 (n = 135)

Stimulus    Response
            /b/     /d/     /g/
/b/         131     3       1
/d/         1       132     2
/g/         2       2       131

Table No. 2.7 CONFUSION MATRIX FOR /b/, /d/, /g/ - RESULTS OF TEST No. 4 (n = 135)

Correct responses: /b/ 134, /d/ 134, /g/ 135; one confusion each was recorded for /b/ and /d/.

TABLE NO. 2.8 CONFUSION MATRIX FOR VOWELS PRECEDED BY /b/ (n = 15; diagonal trace: 95%; cell entries not reproduced)

TABLE NO. 2.9 CONFUSION MATRIX FOR VOWELS PRECEDED BY /d/ (n = 15; diagonal trace: 87%; cell entries not reproduced)

TABLE NO. 2.10 CONFUSION MATRIX FOR VOWELS PRECEDED BY /g/ (n = 15; diagonal trace: 73%; cell entries not reproduced)

TABLE No. 2.11 CONFUSION MATRIX FOR ENGLISH ALPHABET - RESULTS OF TEST No. 5 (diagonal trace, excluding "W": 85%; cell entries not reproduced)
* L, M, N: 4 stimuli instead of 3 per test.

2.7.1 Vowel identification

From Tables 2.2 to 2.5 and 2.8 to 2.10 the results show clearly that, for the particular voice selected in this work, the most difficult vowel to perceive was /I/. Errors were found for both natural digitized speech and synthetic speech, especially when /I/ was preceded by the /g/ stimulus. This combination was perceived either as /o/ or /u/. The learning effect was confirmed by the improvement of recognition rates of vowels preceded by /b/. The score increased from 93% for synthetic words and 94% for natural words in the first test to 95% and 96% respectively in test No. 3.
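The diagonal trace quoted for each confusion matrix is the share of all responses that fall on the main diagonal. The following Python sketch shows the calculation; the function name `diagonal_trace` is illustrative only, and the counts are those read from Table 2.6 (rows are the stimuli /b/, /d/, /g/, columns are the responses), which should be treated as an assumption about the table layout.

```python
def diagonal_trace(matrix):
    """Percentage of responses on the main diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Counts as read from Table 2.6 (test No. 2); rows: stimulus, columns: response.
table_2_6 = [
    [131, 3, 1],    # stimulus /b/
    [1, 132, 2],    # stimulus /d/
    [2, 2, 131],    # stimulus /g/
]

rate = diagonal_trace(table_2_6)
print(f"diagonal trace: {rate:.1f}%")
```

With these counts, 394 of 405 responses lie on the diagonal, which rounds to the 97% quoted for test No. 2.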
The lower recognition rate for vowels preceded by /d/ and /g/ could be attributed to three factors:
- The test words of the form /d-t/ and /g-t/ were of lower quality because they were made by projecting parameters, while /b-t/ words were made with parameters extracted from natural utterances.
- Most words did not exist in the English language.
- The procedure for obtaining the responses was different: the response subset was larger (27 vs. 9) and subjects were asked to identify consonants as well as vowels.

2.7.2 Consonant discrimination

The discrimination rates between the voiced stop consonants were high, 97% for the first try (test No. 2) and 99% for the second try (test No. 4), because the response sub-set was much smaller (3) and the phonemes [b], [d] and [g] were the same as the familiar graphemes b, d, g; also, the subjects were not confused by having to look up the list of codes (none of the subjects had any knowledge of phonetic symbols).

2.7.3 Alphabet recognition

The results from test No. 5 were as expected. The nasal phonemes /m/ and /n/ were known to have a high rate of confusion. Further work with an extensive list of original words that contain /m/ and /n/ would be needed. The high confusion rates for "C" perceived as "T" and "F" as "S" were due to the short duration of /s/ in "C" and the extensive duration of /f/ in "F". These errors have been corrected and informal tests have been done. The overall recognition rate for the alphabet was 85%. Without M, N, C and F the recognition rate for the rest of the alphabet increased to 94%.
("W" was excluded from the calculations.)

3. SPEECH ANALYSIS

In this chapter the qualitative description in section 2.4 will be elaborated and the quantitative information will also be presented.

3.1 Data Processing Facilities

3.1.1 Hardware facilities

Human voices reading from a list of chosen words were recorded in a model 1205A sound proof room from the Industrial Acoustic Company, using a high-gain, low-noise Bruel and Kjaer condenser microphone and a Scully analog tape recorder which was located outside the room. The distortion from the recording system, under normal operation, was specified to be less than 1%.

A PDP-12 computer was used to monitor the analysis process. At the start of the project the computer had 16,384 12-bit words of core memory with a cycle time of 1.6 µs; later another 16K words of core were added. The peripheral devices connected to the computer included 2 dual tape transports (TU55) and controls, a RK08 disk system with a data transfer rate of 4K words/80 ms, a teletype keyboard and sense switches, a type VR12 oscilloscope, and a 10-bit A/D and a 10-bit D/A converter linked to outside devices through 6 relay buffers and 8 analog inputs. An audio system (filter, amplifier, loudspeaker) and a Zeta 100 series digital plotter were added to complete the basic hardware requirement.

3.1.2 Software facilities

In conjunction with human interpretation, on-line, interactive digital processing of data was found to be necessary and a number of programs were developed to fulfill the needs.
In time-domain signal analysis, the basic DIAL-MS operating system was used and programs were written in assembly language to accommodate the limitations of the core memory and the requirements for real-time processing. For frequency analysis applications an OS8 operating system was used and programs were written in FORTRAN, with most subroutines, such as the FFT, provided by other researchers or supplied by the Digital Equipment Corporation. In both the time domain and the frequency domain, the programming method adopted was a standby monitor routine connected to numerous subroutines, which provided a flexible and powerful analysis processing tool.

The functions of the subroutines in the "analysis" program could be divided into three main groups: transfer data, modify data and display data.

Routines of the transfer data group:
- Read original data and name of file from any selected disk unit or tape to core memory.
- Read and write phoneme fragments from and to tape or disk unit #14.
- Transfer any segment of data from one core area to another; addresses were specified by the operator via a teletype keyboard or console switches.
- File and record essential parameters related to the selected phoneme fragments into a directory area which could be stored on disk unit #14 or magnetic tape; a hard copy could be printed on command.

Routines of the data modification group:
- Isolate or segment data in any specific length.
- Differentiate or integrate data.
- Modify amplitude or pitch period of a waveform fragment.
- Clip the derivative data.
- Create artificial waveforms manually via switches and keyboard.
- Average data subtractively or additively.
- Concatenate any waveform fragment to fragments of other original waveforms as specified by the operator via teletype keyboard.
- Reverse order of samples.

Routines for displaying data:
- Display any selected segment of data from one to 24K samples visually or audibly or both.
- Repeat the displaying process continuously from 1 to 512 times.
- Display two different segments in sequence for perceptual comparison in informal tests.

The same routines could be called one after another to perform different functions on the same selected segment of data. This feature increased the complication of the programming, especially in the addressing task in assembly language; however, the increased power and flexibility of the program was a valuable compensation for the effort.

3.2 Data preparation

The analog recordings of utterances were digitized and time and frequency plots were made; the process is described in this section.

A Krohn-Hite 3342R low-pass filter with skirts dropping off at a rate of 48 dB/octave [10] was used to limit the analog input frequency to 4.5 kHz. To avoid aliasing errors the sampling rate used was 12.5 kHz, with a quantization accuracy of 1/512 (9 bits). Nine-bit quantization was used to match the computer display facility.
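The digitizing parameters quoted above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the constant and function names are hypothetical, and the attenuation estimate assumes an idealized filter whose response falls at a constant 48 dB/octave above the cutoff, which a real filter only approximates.

```python
import math

SAMPLING_RATE_HZ = 12_500          # sampling rate quoted in the text
FILTER_CUTOFF_HZ = 4_500           # low-pass cutoff quoted in the text
FILTER_SLOPE_DB_PER_OCTAVE = 48    # skirt slope quoted in the text
QUANTIZATION_BITS = 9

nyquist_hz = SAMPLING_RATE_HZ / 2              # highest frequency represented without aliasing
quantization_levels = 2 ** QUANTIZATION_BITS   # 512 levels, i.e. an accuracy of 1/512
sample_period_ms = 1000 / SAMPLING_RATE_HZ     # time between successive samples

# Lowest input frequency that folds back onto the cutoff frequency after sampling.
lowest_aliasing_hz = SAMPLING_RATE_HZ - FILTER_CUTOFF_HZ


def attenuation_db(frequency_hz):
    """Idealized attenuation above the cutoff (assumes a constant dB/octave slope)."""
    if frequency_hz <= FILTER_CUTOFF_HZ:
        return 0.0
    octaves = math.log2(frequency_hz / FILTER_CUTOFF_HZ)
    return FILTER_SLOPE_DB_PER_OCTAVE * octaves


print(nyquist_hz, quantization_levels, sample_period_ms)
print(lowest_aliasing_hz, attenuation_db(lowest_aliasing_hz))
```

Under these assumptions the cutoff lies well below the 6.25 kHz Nyquist frequency, and components that could alias into the pass band are attenuated by roughly 40 dB before sampling.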
Digital waveform segments of 4096 samples each were stored under designated original files on LINC tape or RK08 disks. Each original file was referred to by a letter and a number. The ASCII code of the letter was converted by the calling program to the absolute address of the file in the LINC tape which was specified by the number. The original data was selected and verified by outputting via a D/A converter and audio devices.

To obtain the time plots, original data was converted to a suitable format for use with FORTRAN plotting programs. More steps were necessary to produce the short-time amplitude spectra or the frequency plots. Waveforms were segmented into individual pitch period fragments, and zero points were added at the end of each fragment, forming a series of records of equal length, 128 points (2^7), in order to reduce computing time and to ensure even spacing on the frequency axis for all curves. A DFFT routine was used to transform each pitch period waveform into the frequency domain. The relative amplitude spectra were plotted using the Zeta plotter. It should be noted that the time axis in Fig. 2.6 is not linearly scaled since pitch periods were continuously changing.

Unit 14 of the operating disk was reserved for phoneme fragment data. Each selected fragment was stored in a phoneme file which was specified by 2 letters; the first letter varied from A to Z, the second letter varied from A to H.
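The zero-padding of pitch-period fragments to 128-point records described in this section can be sketched in Python as follows. This is a minimal illustration and not the original FORTRAN routine: the function names are hypothetical, the test fragment is an artificial single-cycle sine wave rather than speech data, and a direct DFT is evaluated in place of the FFT for brevity.

```python
import cmath
import math

SAMPLING_RATE_HZ = 12_500
RECORD_LENGTH = 128  # 2**7 points, as stated in the text


def zero_pad(fragment, length=RECORD_LENGTH):
    """Append zeros so that every pitch-period fragment has the same length."""
    if len(fragment) > length:
        raise ValueError("fragment is longer than the record length")
    return list(fragment) + [0.0] * (length - len(fragment))


def amplitude_spectrum(record):
    """Magnitude of the DFT for bins 0 .. N/2 (direct O(N^2) evaluation)."""
    n = len(record)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(record))
        spectrum.append(abs(total))
    return spectrum


# Hypothetical 64-sample fragment: a single cycle of a sine wave.
fragment = [math.sin(2 * math.pi * i / 64) for i in range(64)]
record = zero_pad(fragment)
spectrum = amplitude_spectrum(record)

# Every record has 128 points, so all spectra share the same frequency axis.
bin_spacing_hz = SAMPLING_RATE_HZ / RECORD_LENGTH
peak_bin = spectrum.index(max(spectrum))
print(bin_spacing_hz, peak_bin, peak_bin * bin_spacing_hz)
```

Because every fragment is padded to the same record length, fragments with different pitch periods are all evaluated on the same frequency grid (about 97.7 Hz per bin at 12.5 kHz), which is what allows their curves to be drawn against a common frequency axis.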
The calling program converts the phoneme file name into the absolute address of the file on the disk and also into the absolute address in the directory where phoneme related parameters were stored.

3.3 Original data selection

After the preparation of hardware and software was complete, preliminary analysis was done to decide the original word list and original voice. The decision was based on a simple, logical, systematic approach which remained unaltered throughout the project.

Choice of original words:

It appeared in the developing stage of the project that the chosen words must fulfill certain systematic conditions to study the characteristics of acoustic data. The conditions are as follows:
- Words chosen must have some meaning to ensure the naturalness of the manner of articulation of the speaker.
- Words used to study the differences of vowels must only have vowels as variables, and surrounding phonemes must be fixed.
- A large number of words containing allophones of a certain phoneme were needed to study the context variance of the phoneme.

To satisfy the above conditions the most suitable words to study vowels were found to be of the form [b-t], which were also used to study the context variations of the initial phoneme /b/ and final phoneme /t/. Words containing the complex vowel /o/ were chosen to study the context variance of vowels. The choice was based on the simple reason that /o/ appeared to be a unique and unambiguous vowel.
The complexity of the phoneme /o/ was not known, due to inexperience and lack of knowledge in linguistics and acoustics of speech at the time. At a later stage of the study, the data perceived from the variations and complexity of phoneme /o/ were the factors that led to the simple interpolation scheme for speech synthesis. Other words that cover a wide range of phonemes were also recorded (Appendix A). The phonemically balanced words widely used for perceptual testing were neither suitable nor necessary at this stage.

Utterances from several speakers were recorded. Time plots and preliminary analysis data were obtained. The high pitch and low content in the high frequency range of the female voice relative to the male voice were observed. A few words were analyzed then resynthesized. Informal tests were made and it appeared that the degradation in the resynthesized male voice was less noticeable than that of a female voice. The reason was that the male voice was richer in harmonic content and had a lower fundamental frequency, consequently a longer pitch period, therefore more samples per fragment were stored. The trade-off between storage and quality was clearly indicated. The female voice was nevertheless chosen for this project, based on the following factors:
- Simpler waveforms because of the lower content in the high frequency range.
- It was easier to perceive degradation in the synthetic speech output.

The simple waveforms were essential for the developing stage of the project and the noticeable degradation was a vital factor for improving the system, which used both feedback and feedforward features. Time plots of the word "beat" from a male and a female speaker are shown in Fig. 3.1 and 3.2 respectively.

3.4 Data analysis

In the time-domain approach to speech processing, due to the complexity and time-variant characteristics of speech waveforms, human interpretation of data was used in conjunction with other processes such as short-time Fourier analysis and semi-automatic peak-and-zero-crossing detection. The frequency analysis was used extensively in a short period of the exploration stage of the project. Once the approximations in the time domain were found, only the Discrete Fast Fourier Transform was used to obtain the frequency plots of pitch-period amplitude spectra. The detailed description of the frequency analysis can be found in references [2], [52], and [56]. Since the analysis and synthesis methods are the main concerns, the acoustic features are simply considered as data; therefore only relevant facts concerning the methodology are given. The detailed descriptions of linguistic and physical aspects of speech can be found in references [1], [3], [4], [93]. In this section time-domain measurements, the back-up logic and the methods to extract phonetic features are described.

3.4.1 Vowel analysis

In the analysis of vowels the following parameters were investigated:
a) Structure of vowel waveform
b) Duration of vowel
c) Duration of pitch period

3.4.1.a Structure of vowel waveform

In the time domain, the structure of the vowel waveform is the identification of the vowel. The shape of the waveform by itself reflects information about the speaker's voice and can be used in speaker recognition. However, in speech synthesis vowel wave shape information by itself has little use. On the other hand, the variation and the relationship of structure, if there is any, between waveforms and within waveforms are the main points of interest.

Waveforms and waveform fragments from the same vowel situated in different contexts were observed and compared using the time plots. Similarities of waveform segments were noted and the frequency plots were used to verify the overall visual similarities, by using an on-line analysis. Afterward each similar fragment was extracted, D/A converted, and the audio/visual output was displayed. In order to have an audible sound output the fragments were repeated many times. Two different fragments were displayed sequentially for the purpose of comparison. From this process a number of possible characteristic phoneme fragments were stored to be investigated further. The time and frequency plots of the word "beat" that contains /U/ are shown in Fig. 2.5 and 2.6 respectively. Time and frequency plots of eight other words are in Appendix B.
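A single pitch-period fragment lasts only a few milliseconds, which is why it had to be repeated before it could be judged by ear. The Python sketch below illustrates that repetition and the sequential presentation of two fragments; all function names are hypothetical, the two fragments are artificial ramps standing in for stored waveform data, and the silent gap between the two bursts is an assumption not described in the text.

```python
SAMPLING_RATE_HZ = 12_500


def repeat_fragment(fragment, repetitions):
    """Concatenate a fragment with itself the requested number of times."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    return list(fragment) * repetitions


def duration_ms(samples):
    """Playback duration of a sample sequence at the 12.5 kHz sampling rate."""
    return len(samples) * 1000 / SAMPLING_RATE_HZ


def sequence_for_comparison(first, second, repetitions, gap_samples=0):
    """Two repeated fragments in sequence, optionally separated by silence."""
    silence = [0] * gap_samples
    return (repeat_fragment(first, repetitions)
            + silence
            + repeat_fragment(second, repetitions))


# Placeholder fragments (not speech data): 64 and 58 samples long.
fragment_a = [i - 32 for i in range(64)]
fragment_b = [i - 29 for i in range(58)]

single = duration_ms(fragment_a)                         # one pitch period
sustained = duration_ms(repeat_fragment(fragment_a, 100))
pair = sequence_for_comparison(fragment_a, fragment_b, 100, gap_samples=1250)
print(single, sustained, duration_ms(pair))
```

With a 64-sample fragment, one period lasts 5.12 ms, whereas 100 repetitions give a sustained sound of about half a second.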
3.4.1.b Duration of vowels

The duration of vowels has been investigated by many researchers, and attempts have been made to generalize the correlation between vowel environment and vowel duration [37,38]. The incompatibility in the methods used to measure vowel duration and the ambiguity in the start and end of the transient between phonemes made the generalization of this type of data very objectionable.

In this work, the investigation of duration was divided into two different categories, the perceptual duration of vowel and the defined duration of vowel. In the first category, durations were measured by informal perceptual testing and either time or number of samples were recorded as the results. In the second category, durations were derived from the data of the first category in parallel with time plots of the waveform and the positions of selected Characteristic Phoneme Fragments (CPF) of the selected vowel. The measurements were made for the quasi-steady-state of each CPF and the approximate transient duration between fragments. The data were not measured by time unit or number of samples but by number of pitch periods. The data derived from this process were used as parameters in the commands of the synthesis stage. It should be noted that duration measured by number of pitch periods did not reflect the absolute value of duration in time; this point will be elaborated later.

TABLE NO. 3.1 DURATION OF PITCH PERIOD AND VOICED SEGMENT IN NUMBER OF SAMPLES OF 9 WORDS "bit, beat, bait, boot, bet, boat, but, bought, bat."

No. of         Number of samples per pitch period
pitch period   bit     beat    bait    boot    bet     boat    but     bought  bat
CF             54      64      57      72      86      65      69      54      66
1              58      64      62      68      64      67      74      71      73
2              57      62      63      68      66      67      75      72      73
3              56      62      65      68      63      68      74      73      74
4              55      61      64      69      67      68      75      75      75
5              55      62      63      68      68      69      74      75      74
6              55      60      63      68      67      69      74      76      75
7              53      61      62      68      68      68      72      75      75
8              52      60      60      68      67      69      73      75      75
9              50      61      61      68      67      69      71      75      75
10             50      59      60      67      66      69      70      74      75
11             49      59      60      66      66      69      69      74      74
12             48      59      61      66      66      69      66      74      75
13             48      58      58      65      64      68      68      73      74
14             48      59      58      65      64      68      66      73      74
15             46      58      57      63      63      67      64      72      74
16             46      58      56      63      62      66      64      73      73
17             46      58      56      62      63      66      63      72      73
18             44      57      54      62      62      65      63      71      73
19             44      58      53      60      62      65      63      71      72
20             44      57      52      60      -       65      62      71      72
21             46      56      52      60      -       63      62      70      71
22             -       57      51      59      -       62      -       69      70
23             -       56      50      58      -       61      -       68      69
24             -       56      51      57      -       59      -       68      68
25             -       56      50      57      -       57      -       67      68
26             -       55      50      56      -       58      -       65      65
27             -       57      49      56      -       57      -       64      65
28             -       60      49      57      -       56      -       63      -
29             -       58      49      57      -       55      -       -       -
30             -       -       48      56      -       55      -       -       -
31             -       -       48      -       -       56      -       -       -
Total          1050    1704    1735    1885    1235    1990    1442    1999    1954

TABLE NO. 3.2 DURATION OF PITCH PERIOD AND VOICED SEGMENT IN M.SEC. OF 9 WORDS "bit, beat, bait, boot, bet, boat, but, bought, bat."

No. of         Duration of pitch period (m.sec.)
pitch period   bit     beat    bait    boot    bet     boat    but     bought  bat
CF             4.32    5.12    4.56    5.76    6.88    5.20    5.52    4.32    5.28
1              4.64    5.12    4.96    5.44    5.12    5.36    5.92    5.68    5.84
2              4.56    4.96    5.04    5.44    5.28    5.36    6.00    5.76    5.84
3              4.48    4.96    5.20    5.44    5.04    5.44    5.92    5.84    5.94
4              4.40    4.88    5.12    5.52    5.36    5.44    6.00    6.00    6.00
5              4.40    4.96    5.04    5.44    5.44    5.52    5.92    6.00    5.94
6              4.40    4.80    5.04    5.44    5.36    5.52    5.92    6.08    6.00
7              4.24    4.88    4.96    5.44    5.44    5.44    5.76    6.00    6.00
8              4.16    4.80    4.80    5.44    5.36    5.52    5.84    6.00    6.00
9              4.00    4.88    4.88    5.44    5.36    5.52    5.68    6.00    6.00
10             4.00    4.72    4.80    5.36    5.28    5.52    5.60    5.92    6.00
11             3.92    4.72    4.80    5.28    5.28    5.52    5.52    5.92    5.92
12             3.84    4.72    4.88    5.28    5.28    5.52    5.28    5.92    6.00
13             3.84    4.64    4.64    5.20    5.12    5.44    5.44    5.84    5.92
14             3.84    4.72    4.64    5.20    5.12    5.44    5.28    5.84    5.92
15             3.68    4.64    4.56    5.04    5.04    5.36    5.12    5.76    5.92
16             3.68    4.64    4.48    5.04    4.96    5.28    5.12    5.84    5.84
17             3.68    4.64    4.48    4.96    5.04    5.28    5.04    5.76    5.84
18             3.52    4.56    4.32    4.96    4.96    5.20    5.04    5.68    5.84
19             3.52    4.64    4.24    4.80    4.96    5.20    5.04    5.68    5.76
20             3.52    4.56    4.16    4.80    -       5.20    4.96    5.68    5.76
21             3.68    4.48    4.16    4.80    -       5.04    4.96    5.60    5.68
22             -       4.56    4.08    4.72    -       4.96    -       5.52    5.60
23             -       4.48    4.00    4.64    -       4.88    -       5.44    5.52
24             -       4.48    4.08    4.56    -       4.72    -       5.44    5.44
25             -       4.48    4.00    4.56    -       4.56    -       5.36    5.44
26             -       4.40    4.00    4.48    -       4.64    -       5.20    5.20
27             -       4.56    3.92    4.48    -       4.56    -       5.12    5.20
28             -       4.80    3.92    4.56    -       4.48    -       5.04    -
29             -       4.64    3.92    4.56    -       4.40    -       -       -
30             -       -       3.84    4.48    -       4.40    -       -       -
31             -       -       3.84    -       -       4.48    -       -       -
Total          84.00   136.32  138.80  150.80  98.80   159.20  115.36  159.92  156.32

3.4.1.c Duration of pitch period

The durations of pitch periods were noticeably different between vowels, and also changed continuously in time. Durations of pitch periods in sample points and in msec are shown in Tables 3.1 and 3.2 respectively. The durations of pitch periods were extracted from waveforms subjected to 1.5 kHz low-pass filtering using a semi-automatic pitch extraction program.
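The entries of Table 3.2 follow from those of Table 3.1 through the 12.5 kHz sampling rate, since each sample corresponds to 0.08 ms. The short Python sketch below illustrates the conversion; the function names are hypothetical, and the list of period lengths is the "bit" column of Table 3.1 as read from the table (periods 1 to 21, excluding the CF row).

```python
SAMPLING_RATE_HZ = 12_500


def samples_to_ms(sample_count):
    """Duration in milliseconds of a given number of samples at 12.5 kHz."""
    return sample_count * 1000 / SAMPLING_RATE_HZ


def fundamental_hz(period_samples):
    """Fundamental frequency corresponding to a pitch period given in samples."""
    return SAMPLING_RATE_HZ / period_samples


# Pitch periods 1..21 of the word "bit", in samples, as read from Table 3.1.
periods_bit = [58, 57, 56, 55, 55, 55, 53, 52, 50, 50, 49,
               48, 48, 48, 46, 46, 46, 44, 44, 44, 46]

voiced_samples = sum(periods_bit)           # total length of the voiced segment
voiced_ms = samples_to_ms(voiced_samples)   # the same length in milliseconds
first_period_ms = samples_to_ms(periods_bit[0])
first_f0_hz = fundamental_hz(periods_bit[0])
print(voiced_samples, voiced_ms, first_period_ms, round(first_f0_hz, 1))
```

The sum reproduces the total of 1050 samples (84 ms) listed for "bit", and the shortening of the period from 58 to 44 samples corresponds to a fundamental frequency rising from roughly 216 Hz to 284 Hz.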
Data from Table 3.1 indicated that the duration of pitch periods changed smoothly in time, with 2 to 4 steady-state regions for the vowels investigated. The duration of pitch period of the selected CPF was stored in the phoneme directory simultaneously with storage of the CPF.

3.4.2 Transient and consonant analysis

Special attention was paid to accurate measurement of the transient duration, one of the main elements of interest in this project. In informal tests for vowel duration measurements, a CVC monosyllabic word was transferred from original files to core memory. The complete play-back sound of the word (e.g. /bit/) was heard and switches were set to truncate the beginning segment until the output sound became ambiguous due to the gradual elimination of the perceivable information from the first consonant (/b/). The ambiguous region was recorded to be retested later for consistency. This process was continued until only the vowel and the last consonant (/it/) were perceived. The data was again recorded as the possible starting point of the vowel with no perceptual interference from the preceding consonant. To ensure that the perceptual data were accurate, the reverse procedure (from /it/ to /bit/) was used to test another subject.

The accuracy of transient duration was verified again by examining the vowel in the following manner: for the onset transient, the first pitch period of the vowel was repeated N times then concatenated to the remaining waveform. Output was obtained and the consonant again was perceived depending on the number of repetitions (N) of the connecting fragment.
If N was small the judgement of transition was inaccurate; larger N implied that less interference was carried over from the preceding consonant. For the offset transient the reverse procedure was used. The accuracy in obtaining transient durations was important only at the development stage. In later stages, when synthetic speech was fabricated, the choice of characteristic fragment of transient (CF) was found to be more critical than the duration of the transient waveform. The coupler fragment (CF) of transient from consonant /b/ and 9 vowels was extracted and used to generate the complete transient segment. The synthetic word was judged to be of high quality. Classification was made and the number of CF required was reduced to 4 for /b/, 4 for /g/, and 2 for /d/. In addition to the coupler fragments, a noise burst segment was stored for each consonant studied.

3.5 Characteristic phoneme fragment extraction

In this section, the criteria or rules for CPF extraction and the filing system for data storage and retrieval are described.

3.5.1 Fragmentation criteria

The compatibility of all CPF and CF chosen, the prime factor for smooth transients and high quality output, was achieved by following a set of rules to obtain uniformity and consistency throughout the fragmentation process. The set of rules is derived in this section. A simplified pitch period waveform fragment is shown in Fig. 3.3; the pitch-period onset and the ringings are indicated.

Fig. 3.3 A SIMPLIFIED PITCH PERIOD WAVEFORM

The period T can be defined as the time interval between corresponding points; consequently the starting and ending points for the pitch period shown in Fig. 3.3 can be chosen as $S_1$-$E_1$ or $S_2$-$E_2$ as indicated. If $P(t)$ represents the waveform function in time-domain, then $S_1$-$E_1$ are the corresponding zero-crossing points that precede the positive peak of the derivative $P'(t)$ of $P(t)$. The values of the pitch periods measured from the derivative $P'(t)$ are relatively consistent; however, the amplitude values $P(S_1)$ and $P(E_1)$ are different and are not suitable for the concatenation scheme (Fig. 3.4).

Fig. 3.4 DISCONTINUITY IN CONCATENATED WAVEFORM DUE TO THE DIFFERENCES IN AMPLITUDE OF END POINTS.

The amplitude of the end points could easily be modified to match each other if the same fragments are concatenated (repeated), but the problem still exists when two concatenating fragments are different. The problems of amplitude differences do not exist if $S_2$-$E_2$, the corresponding zero-crossing points of $P(t)$, are chosen; however, another problem arises because the waveform fragments are not always "ideal" like the one shown in this example. The amplitudes at the end points $S_1$-$E_1$ fluctuate and are not always negative, but the zero-crossing of $P'(t)$ occurs in nearly the same region; the crossing points $S_2$-$E_2$ of $P(t)$, however, change drastically in position, so consistency is not established. The problem is illustrated in Fig. 3.5.

Fig. 3.5 DIFFICULTY IN AUTOMATIC PITCH DETECTION (a) faulty detection of end point (b) example of natural waveform segment from /bæt/

Fig. 3.6 SELECTED END POINTS FOR PITCH DETECTION IN THE PRESENT WORK

Lengthy and elaborate algorithms have been developed by a number of researchers [55, 91, 92]. An easier method was found: for the voice used in this work, the problem can be eliminated by simply choosing the zero-crossing of $P(t)$ located immediately before the negative peak as the starting and ending points (Fig. 3.6). The crossing in this case always occurs and the end points are easily adjusted to zero for every stored Characteristic Phoneme Fragment. Even though the shape of the CPF obtained from this method is different from that of the conventional pitch period, the frequency content is the same in both cases, therefore the output sounds are identically perceived. The informal tests verified that fact and justified the choice. Work has been started to develop an automatic pitch detection program in time-domain, but more data with voices from different speakers would be needed to generalize and evaluate the results from this method, which is suggested as a topic for future work.

The fragmentation criteria are summarized as follows:
- All stored CPF should start and end with a negative slope.
- The starting point should be the zero-crossing point preceding the negative peak.
- The detected starting and ending points should be automatically modified to zero.
An example is shown in Fig. 3.7.

3.5.2 Selection criteria

The following steps were taken to select the stored characteristic phoneme fragments (CPF):
- isolating phonemes by eliminating the transient segment (section 3.4.2).
- detecting similar segments from different allophones of each phoneme.
The segment that occurred in the majority of allophones was fragmented and the middle fragment of each segment was used as the preliminary characteristic phoneme fragment; such selected fragments were used to synthesize a few utterances to check the validity of the choices. Alternative fragments were tested until the most pleasing and natural output speech was perceived in the informal perceptual tests. It was found that in many cases the CPF may look different but similar output sounds were perceived, therefore storage reduction was possible. Examples of some characteristic phoneme fragments of vowels are shown in Fig. 3.8.

3.5.3 Filing system for Characteristic Phoneme Fragment data

The filing system for data storage and retrieval used in the present project was designed as a research tool rather than as a final user product; therefore, the flexibility of the system was of higher priority than an unnecessary data-compression scheme. A full 12-bit word was used for each sample and 512 locations were reserved for each CPF file; the unused locations were filled with zero. The block diagram in Fig. 3.9 shows the organization of core memories and data files.
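The fragmentation criteria of section 3.5.1 and the 512-location record of section 3.5.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in this work: the positions of two successive negative peaks are assumed to be supplied by the operator (or by any pitch detector), samples are assumed to be signed 12-bit integers, and the names `crossing_before`, `cut_cpf`, `RECORD_WORDS`, `WORD_MIN` and `WORD_MAX` are hypothetical.

```python
import math

RECORD_WORDS = 512                 # locations reserved per CPF file (section 3.5.3)
WORD_MIN, WORD_MAX = -2048, 2047   # assumption: signed 12-bit samples


def crossing_before(p, peak):
    """Walk back from a negative peak to the zero-crossing that precedes it.

    Returns the index of the last non-negative sample before the waveform
    goes negative (index 0 if no crossing is found, a simplification).
    """
    i = peak
    while i > 0 and not (p[i - 1] >= 0 > p[i]):
        i -= 1
    return max(i - 1, 0)


def cut_cpf(p, peak, next_peak):
    """Cut one pitch period between two such crossings and pad it to a record."""
    start = crossing_before(p, peak)
    end = crossing_before(p, next_peak)
    frag = list(p[start:end])
    if not 0 < len(frag) <= RECORD_WORDS:
        raise ValueError("fragment does not fit in one CPF record")
    if any(not WORD_MIN <= s <= WORD_MAX for s in frag):
        raise ValueError("sample outside the assumed 12-bit range")
    frag[0] = 0     # starting point forced to zero
    frag[-1] = 0    # ending point forced to zero
    record = frag + [0] * (RECORD_WORDS - len(frag))  # unused locations zero-filled
    return record, len(frag)


# Synthetic test waveform with a 100-sample period whose negative peaks fall
# at samples 25, 125, 225 (hypothetical data, not a recorded voice).
wave = [round(-2047 * math.sin(2 * math.pi * n / 100)) for n in range(300)]
record, period = cut_cpf(wave, 25, 125)
```

Here `period` is the pitch period in samples that would be entered in the phoneme directory together with the record.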
Field 0 and the first half of field 1 were reserved for the analysis monitoring program and its projected expansion. The second half of field 1 was occupied by the CPF directory and the DIAL System monitor I/O Routines. DF3 was the data processing buffer. When a CPF was selected, all relevant information such as the duration of the pitch period, the exact location of the starting point in the original waveform, the original file name, and the CPF file name were transferred to the directory, then stored in blocks number 10 to 13 of the selected storage unit or device (LINC tape or Disk unit 14). Under the same command, the CPF data were transferred to the first 2 blocks of DF3, then stored in the same LINC tape or Disk unit as the directory. The address of the starting block was derived from the designated file name (example - AA : 20, AB : 22, ZH : 656). Table 3.3 shows some of the file names and their related phonemes.

3.6 Conclusions

In this chapter, the analysis system and the relevant results have been presented; topics included the description of the hardware and software facilities, the logic for the choice of original words and the female voice, the relevant acoustic data of phonetic cues, the fragmentation rules, the criteria for selecting CPF and the filing system. The names of some CPF and CF were also given because they will be referred to in the examples of the synthesis process described in the next chapter.

Fig. 3.9 ANALYSIS PROCESS: ORGANIZATION OF CORE MEMORY AND DATA FILES
(block diagram: ORIGINAL DATA FILES; CPF FILES; DF0: ANALYSIS MONITORING PROGRAM, SYS. BOOTSTRAP; DF1: PHONEME DIRECTORY, SYSTEM I/O ROUTINES; DF2: SELECTED FRAGMENT; DF3: PHONEME BUFFER)

TABLE No. 3.3 CHARACTERISTIC PHONEME FRAGMENT FILES

CPF File Name    Related Phoneme
AA - AD          æ
AE - AH          e
EA - ED          i
EE - EH          5
IA - ID          I
JE - JH          A
OA - OD          o
OE - OH          0
UA - UD          u
BA - BC          b
BD - BH          b couplers
DA - DC          d
DD - DH          d couplers
GA - GC          g
GD - GH          g couplers
NC               n

NOTE: For study purposes, 4 CPF were recorded for each vowel and coupler; however, in most cases, 2 were found to be sufficient.

4. SPEECH SYNTHESIS

In the synthesis process, the monitoring program "synthesis" is used as a central processor to interpret the instructions received from the command buffer and perform the required functions. The instructions, functions, memory organization and examples of processed output data are described in this chapter. The block diagram of the synthesizer at the articulatory stage of Fig. 2.2 is shown again in Fig. 4.1.

4.1 Command instructions

4.1.1 Control commands

The control commands were stacked in the command buffer in a group of four 2-6 bit words: the first word indicated the function, the second and the third words indicated the selected CPF, and the last word indicated the duration of the synthetic segment by number of pitch periods. Table 4.1 shows an example of control commands used in synthesizing the word /ni/.

4.1.2 Functions

The codes and descriptions of functions are found in table 4.2. The instruction command T is used to create the steady state segment while commands C and A are used to generate transients.
Generally, A is used to generate transients between phonemes and C within phonemes.

Fig. 4.1 BLOCK DIAGRAM OF THE SYNTHESIZER AT THE ARTICULATORY STAGE
(block diagram: STORED PHONEME FRAGMENTS; COMMAND BUFFER; SYNTHESIS PROCESSOR; OUTPUT BUFFER; D/A; SPEECH OUTPUT)

TABLE 4.1 CONTROL COMMANDS FOR /ni/

1    2    3    4
A    QA   NC   3037
A    NC   EA   3336
T    EA   EA   3313
C    EA   EB   3313
T    EB   EB   3312
A    EB   QA   3336

NOTE: column 1 = command function; column 2 = CPF 1; column 3 = CPF 2; column 4 = number of pitch periods.

TABLE 4.2 CONCATENATION AND INTERPOLATION FUNCTIONS

CODE   FUNCTION DESCRIPTION
T      Straight transfer: Data of the CPF indicated by the second word are transferred to the output buffer unaltered. The third word is ignored; the last word indicates the number of repetitions.
C      Pitch contouring: Second and third words indicate the CPF used to generate interpolated transient periods. The difference in number of samples of the two selected CPF, and the number of pitch periods indicated by the last word, govern the number of sample points to be removed from or added to each generated transient period.
A      Amplitude and pitch contouring: The transient periods are generated by first using function C to contour the pitch period, then the amplitude of each resulting fragment is submitted to an 8 level amplitude weighted interpolation function.
E      End: Termination of command instructions.
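The way a command group could be interpreted is sketched below in Python. It is a minimal illustration, not the monitoring program itself: the `Command` record, the `run` function and the `no_transients` placeholder are hypothetical names, the sample values of the fragments are invented, and the pitch-period counts are made up because the encoding of the fourth word in Table 4.1 is not decoded here. Only function T is carried out; functions C and A are delegated to a caller-supplied routine.

```python
from typing import Callable, Dict, List, NamedTuple

Fragment = List[int]


class Command(NamedTuple):
    function: str   # 'T', 'C', 'A' or 'E' (Table 4.2)
    cpf1: str       # second word: first CPF file name
    cpf2: str       # third word: second CPF file name (ignored by T)
    periods: int    # fourth word: number of pitch periods


def run(commands: List[Command],
        fragments: Dict[str, Fragment],
        interpolate: Callable[[str, Fragment, Fragment, int], List[Fragment]]) -> List[int]:
    """Build the output buffer by executing the commands in order."""
    out: List[int] = []
    for cmd in commands:
        if cmd.function == "E":            # end of command instructions
            break
        if cmd.function == "T":            # straight transfer, repeated
            out.extend(fragments[cmd.cpf1] * cmd.periods)
        elif cmd.function in ("C", "A"):   # transients come from the interpolator
            for frag in interpolate(cmd.function, fragments[cmd.cpf1],
                                    fragments[cmd.cpf2], cmd.periods):
                out.extend(frag)
        else:
            raise ValueError(f"unknown function code {cmd.function!r}")
    return out


def no_transients(kind, a, b, n):
    """Placeholder interpolator: generates nothing (stands in for C and A)."""
    return []


# Invented 4-sample fragments; real CPF files hold up to 512 samples.
fragments = {"QA": [0, 0, 0, 0], "NC": [0, -2, 1, 0],
             "EA": [0, -4, 3, 0], "EB": [0, -3, 2, 0]}

# Function and CPF columns follow Table 4.1; the counts are hypothetical.
commands = [Command("A", "QA", "NC", 2), Command("A", "NC", "EA", 2),
            Command("T", "EA", "EA", 3), Command("C", "EA", "EB", 2),
            Command("T", "EB", "EB", 2), Command("A", "EB", "QA", 2),
            Command("E", "", "", 0)]

output = run(commands, fragments, no_transients)
```

With the placeholder interpolator, `output` contains only the two steady state segments of /i/; supplying a real routine for C and A would insert the transients between them.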
4.2 Interpolation Schemes

4.2.1 Pitch contouring (function C)

The pitch contouring scheme was developed to generate interpolated fragments whose pitch periods changed smoothly from one end to the other. Two steps were necessary to complete the process:
- calculate the differences in duration between each interpolated fragment and the original fragments
- remove or add samples to the original CPF to generate new ones.

4.2.1.a Calculation of duration differences

Let:
A : the first original CPF to be used
SA : the duration of A in number of samples
B : the second original CPF (anchor fragment)
SB : the duration of B
$DS_n$ : the difference in duration between consecutive fragments
N : the number of interpolated fragments
INT(X/Y) : integer part of X/Y
MOD(X/Y) : the remainder of X/Y
|X-Y| : absolute value of X-Y
$IA_n$ : interpolated fragment generated from A
$IB_n$ : interpolated fragment generated from B
$DS_{A-IA_n}$ and $DS_{B-IB_n}$ are the desired differences in duration between original fragments A and B and the interpolated fragments $IA_n$ and $IB_n$.

Two cases may occur: N<|SB-SA| and N>|SB-SA|.

i. N<|SB-SA|

If
$$dS = \mathrm{INT}(|SB-SA|/N) \tag{1}$$
$$R = \mathrm{MOD}(|SB-SA|/N) \tag{2}$$
$$R1 = \mathrm{INT}((N-R+1)/2) \tag{3}$$
Then
$$DS_n = \begin{cases} dS & \text{for } n = 1,\dots,R1 \text{ and } n = R1+R+1,\dots,N \\ dS+1 & \text{for } n = R1+1,\dots,R1+R \end{cases} \tag{4}$$

The differences in duration between the interpolated fragment $IA_n$ and the original fragment A, and between $IB_n$ and B, are
$$DS_{A-IA_n} = \sum_{i=1}^{n} DS_i \tag{5}$$
$$DS_{B-IB_n} = \sum_{i=N}^{N-n} DS_i \tag{6}$$

The number of fragments generated from CPF A is N1, and the number of fragments generated from CPF B is N2 (Fig. 4.2). N1 and N2 are given as follows:
$$N1 = \mathrm{INT}((N+1)/2) \tag{7}$$
$$N2 = N - N1 \tag{8}$$

Fig. 4.2 NOTATIONS FOR PITCH CONTOURING DESCRIPTION
(diagram: fragments $IA_1$, $IA_2$, ..., $IA_{N1}$ generated from A, of duration SA, and $IB_{N-N2+1}$, ..., $IB_N$ generated from B, of duration SB, with duration differences $DS_1$, $DS_2$, ..., $DS_N$ between consecutive fragments)

ii. N>|SB-SA|

In this case the difference $DS_n$ is equal to 1 between each group of DN or DM fragments, and is equal to zero within groups. The values of DN and DM are calculated as follows:
$$DN = \mathrm{INT}(N/|SB-SA|) \tag{9}$$
$$DM = \mathrm{MOD}(N/|SB-SA|) \tag{10}$$

The differences in duration between each interpolated fragment and the original fragments are computed and entered in the duration table to be used in the process of adding or removing samples.

4.2.1.b Remove or add samples to the original CPF

Each pitch period is altered by repeating or removing one sample from each interval DTA or DTB, which are given as follows:
$$DTA = \frac{SA}{DS_{A-IA_n}} = \frac{SA}{\sum_{i=1}^{n} DS_i} \tag{11}$$
$$DTB = \frac{SB}{DS_{B-IB_n}} = \frac{SB}{\sum_{i=N}^{N-n} DS_i} \tag{12}$$

The effects of adding and removing samples are shown in figures 4.3 and 4.4.

Fig. 4.3 TIME AND FREQUENCY PLOTS OF FRAGMENTS FROM PHONEME /æ/ AS IN "BAT": EFFECTS OF REMOVING POINTS (HIGHER PITCH)
(axes: relative amplitude against time, 10.24 ms, and against frequency in kHz)

Fig. 4.4 TIME AND FREQUENCY PLOTS OF FRAGMENTS FROM PHONEME /æ/ AS IN "RAT": EFFECTS OF ADDING POINTS (LOWER PITCH)
(axes: relative amplitude against time, 10.24 ms, and against frequency in kHz)

4.2.2 Amplitude and pitch contouring (function A)

For amplitude and pitch contouring the new fragments are generated by first contouring both fragments A and B to the same appropriate pitch value; then the amplitudes are altered in groups of GN fragments derived as follows.

If N is the number of interpolated cycles,
$$GM = \mathrm{INT}(N/7) \tag{13}$$
$$M = \mathrm{MOD}(N/7) \tag{14}$$
Then:
$$GN = GM + \delta_N \tag{15}$$
where $\delta_N$ is 1 or 0 depending on the value of M as indicated in table 4.3.

Table 4.3 AMPLITUDE OF INTERPOLATED FRAGMENTS AND VALUE OF $\delta_N$

Group   Magnitude              $\delta_N$ (GM = 1), columns M=1 M=2 M=3 M=4 M=5 M=6
i       MI = (7MA+MB)/8        1 1
ii      MI = (3MA+MB)/4        1 1 1 1
iii     MI = (5MA+3MB)/8       1 1 1 1 1 1
iv      MI = (MA+MB)/2         1 1 1 1 1 1 1
v       MI = (3MA+5MB)/8       1 1 1 1 1
vi      MI = (MA+3MB)/4        1 1 1
vii     MI = (MA+7MB)/8        1

Notes: MI = amplitude of interpolated fragments; MA = amplitude of CPF (A); MB = amplitude of CPF (B).

Fig. 4.5 SYNTHESIZER AT THE ARTICULATORY STAGE: BLOCK DIAGRAM AND MEMORY ORGANIZATION
(block diagram: MONITORING PROGRAM; COMMAND BUFFER; OUTPUT CONTROL PARAMETER TABLE; SYS. BOOTSTRAP; CPF, CF DATA BUFFER; GENERATED TRANSIENTS BUFFER; PHONEME DIRECTORY; SYSTEM I/O ROUTINES; DF2 + DF3 OUTPUT BUFFER; OUTPUT WORD; OUTPUT BUFFER CONTROL; D/A; OUTPUT WAVEFORM)

4.3 Memory organization

The description in this section has a dual purpose: to present the organization of memory and to explain the operations of the synthesis process. The block diagram of the memory organization is shown in Fig. 4.5. The synthesis or monitoring program receives data from the command buffer and the phoneme directory, and is located in the first half of DF0.
Upon receiving the command instructions, data of the selected CPF are transferred from disk unit 14 to the first 4 blocks of DF1. If the function command is T, the data are transferred unaltered to DF2 and DF3; the starting address in DF2 (or DF3), the duration and the number of repetitions are recorded in the output control parameter table, which is located in DF0. If the function command is C or A, the data are used to generate interpolating pitch periods which are stored in the generated transients buffer (blocks 4 to 7 of DF1) before being transferred to DF2 (or DF3); the starting address, the duration, and the repetition (set to 1) for each generated fragment are recorded in the output control parameter table. The parameters of this table and the display routine of the monitoring program form the output control, which monitors the proper sequence of output samples.

4.4 Selected examples of synthetic waveforms

A silent fragment QA (digital zero), a vowel-like phoneme fragment NC (for /n/), and two CPF EA and EB (for /i/) are used to synthesize a simple version of the utterance /ni/ and to illustrate the interpolation functions and the operation of control command instructions (Fig. 4.6). The first row of fig. 4.6 shows 4 selected CPF, the second row shows the onset transient and the transient between /n/ and /i/, the third row shows the first steady state segment of /i/ and the transient between two CPF of /i/, and the last row shows the second steady state of /i/ and the offset segment. The corresponding functions, the selected CPF, and the number of repetitions are shown in table 4.1. The transients between /b/-/i/, /d/-/i/, and /g/-/i/ generated for testing materials are found in fig. 4.7; the noise bursts in the beginning and the steady state of /i/ are not shown.

Fig. 4.7 SYNTHETIC TRANSIENTS (A) BETWEEN /b/ and /i/ (B) BETWEEN /d/ and /i/ (C) BETWEEN /g/ and /i/
(panel labels: A (BD * EA) 7, A (GE * EA) 7; time axis: 10 ms/div)

4.5 Polysyllable words and sentences

Even though the study of polysyllable words and sentences was not included in the scope of this project, a few polysyllable utterances were generated in a heuristic manner to investigate the potential of the synthesizer. Following are the results from the preliminary studies, in the form of observational remarks rather than conclusive statements, since a statistical data base of linguistic and syntactic parameters of original sentences and formal tests were yet to be done.

- It was found that, in many cases, the phoneme fragments should be modified in pitch and/or amplitude before being used as anchor fragments (such as fragments A and B in fig. 4.2) to generate interpolated fragments; consequently a modification function (M) was added to T, C, A, E (table 4.2).
When M was used, the calling CPF was modified (by using function C or A, section 4.2), then the generated data were used as the anchor fragment in the subsequent operations.
- Transients between syllables and between words could be generated by contouring the adjacent phoneme fragments (the silent fragment QA, fig. 4.6, was considered as a phoneme fragment).
- Intonation could be generated by varying the duration of the pitch period of connected fragments. If two vowel fragments A and B were used in a polysyllable utterance, it was found that a rise in intonation was perceived if SA>SB and a fall in intonation was perceived if SA<SB, where SA and SB were the durations of pitch period fragments A and B. Function M was used to modify SA and SB prior to the interpolating process.

The context dependent parameters such as duration of phonemes, duration of silent intervals and different combinations of phoneme fragments were the prime factors that contributed to the naturalness of synthetic sentences. The study of such parameters should be of high priority in future work.

4.6 Conclusions

In this chapter the synthesizer at the articulatory stage and the interpolation schemes are described along with examples of synthetic transients and monosyllable words. A short discussion about polysyllable words and sentences, a topic for future work, is also presented. The illustrations in figures 4.6 and 4.7 and the perceptual tests indicate the simplicity and effectiveness of the interpolation functions described in this chapter.
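The pitch contouring of section 4.2.1 (case i, N<|SB-SA|) and the amplitude weights of Table 4.3 can be sketched in Python as follows. This is a hedged illustration, not the original program. The function names are hypothetical. Two assumptions go beyond the text: the n-th interpolated fragment is taken to have the duration SA changed by the cumulative sum of equation (5), including for the fragments generated from B, and the amplitude weighting is applied sample by sample to two fragments already contoured to the same length. The position of the repeated or removed sample inside each interval (its middle) is also a choice made here.

```python
from itertools import accumulate


def duration_steps(sa, sb, n):
    """DS_1..DS_N for case i (N < |SB-SA|), equations (1) to (4)."""
    diff = abs(sb - sa)
    if not 0 < n < diff:
        raise ValueError("this sketch covers only the case N < |SB-SA|")
    ds, r = divmod(diff, n)          # equations (1) and (2)
    r1 = (n - r + 1) // 2            # equation (3)
    return [ds + 1 if r1 < k <= r1 + r else ds for k in range(1, n + 1)]


def change_length(frag, delta):
    """Repeat (delta > 0) or remove (delta < 0) one sample per interval."""
    if delta == 0:
        return list(frag)
    step = len(frag) / abs(delta)    # interval length, as DTA or DTB
    marks = {int(k * step + step / 2) for k in range(abs(delta))}
    out = []
    for i, s in enumerate(frag):
        if i in marks:
            if delta > 0:
                out.extend((s, s))   # repeat this sample
            # delta < 0: the sample is dropped
        else:
            out.append(s)
    return out


def pitch_contour(a, b, n):
    """N interpolated fragments: the first N1 cut from A, the rest from B."""
    sa, sb = len(a), len(b)
    sign = 1 if sb > sa else -1
    targets = [sa + sign * c for c in accumulate(duration_steps(sa, sb, n))]
    n1 = (n + 1) // 2                # equation (7)
    result = []
    for k, target in enumerate(targets, start=1):
        src = a if k <= n1 else b
        result.append(change_length(src, target - len(src)))
    return result


def amplitude_mix(a, b, group):
    """Table 4.3 weights for group 1..7: ((8 - g) * MA + g * MB) / 8."""
    if not 1 <= group <= 7 or len(a) != len(b):
        raise ValueError("need group 1..7 and equal-length fragments")
    return [((8 - group) * x + group * y) / 8 for x, y in zip(a, b)]


steps = duration_steps(100, 90, 4)
lengths = [len(f) for f in pitch_contour(list(range(100)), list(range(90)), 4)]
```

For SA = 100, SB = 90 and N = 4 the steps are 2, 3, 3, 2 samples, so the extra samples of the remainder R fall on the middle fragments and the period shortens smoothly from A towards B.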
5. CONCLUSIONS

A method to synthesize speech by concatenation of waveform fragments is presented. The discontinuity at the junctions is eliminated by generating the transient waveform with a simple interpolation scheme having two main functions: pitch contouring and amplitude weighting.

Phonemes are found to possess certain characteristic fragments which are relatively context independent. These characteristic phoneme fragments (CPF) are the basic elements, the source of excitation, of the software synthesizer. The variation of the combinations of CPF corresponds to the variation of utterances or, in other words, to the formation of allophones. A maximum of 4 CPF is sufficient to represent a vowel and most vowels can be reconstructed by using only 2 fragments.

Results from the perceptual tests indicate the effectiveness of the synthesis algorithm. A recognition rate of 94% for digitized natural utterances and 93% for their synthetic versions was obtained for vowels preceded by /b/ from subjects who had no previous experience with synthetic speech.

The flexibility and total software dependence of the synthesizer lead to many possible applications, one example being an audio editor for an unsighted programmer. As each programming language has only a limited vocabulary, the input analyzer (Fig. 2.2) could be replaced by a table look up, and unused CPF could be deleted to reduce storage requirements. Since digital CPF data are the source of excitation, no special hardware is required other than a D/A converter and a simple audio system at the programmer end to convert any terminal into an audio editor. The synthesis system is simply a program and two data files and can therefore be treated as a special feature of an editor, a library program or a user program. The hardware realization can be simple, especially with the advent of microprocessor technology.

Further work on the analysis and synthesis of words containing nasal consonants /m/ and /n/ is required in order to improve the system. The detection of CPF and other phonetic cues of vowels leads to the suggestion that the findings from the analysis-by-synthesis system might be applied to the study of the recognition aspect of speech.

REFERENCES

Note : JASA = J. Acoustical Soc. of America

1. GIMSON, A.C., An Introduction to the Pronunciation of English, St. Martin's Press Inc., New York, 1970.
2. OPPENHEIM, A.V. and SCHAFER, R.W., Digital Signal Processing, Prentice-Hall, Englewood Cliffs, N.J., 1975.
3. FANT, G., Speech Sounds and Features, The MIT Press, Cambridge, Mass., 1973.
4. FLANAGAN, J.L., Speech Analysis Synthesis and Perception, Academic Press Inc., New York, 1965.
5. MARKEL, J.D. and GRAY, A.H. Jr., Linear Prediction of Speech, Springer-Verlag, Berlin Heidelberg, 1976.
6. FLANAGAN, J.L., RABINER, L.R., Eds., Speech Synthesis, Dowden, Hutchinson & Ross, Stroudsburg, Pa., 1973.
7. GILBERT, J.H., Ed., Speech and Cortical Functioning, Academic Press, New York, 1972.
8. OPPENHEIM, A.V., Ed., Applications of Digital Signal Processing, Prentice-Hall Inc., Englewood Cliffs, N.J., 1978.
9. FRY, D.B., Ed., Acoustic Phonetics, Cambridge University Press, New York, 1976.
10. ITO, M.R., Investigation of Time-domain Measurements for Analysis and Machine Recognition of Speech, Ph.D. Thesis, University of British Columbia, Canada, 1971.
11. YEUNG, J.M.C., Towards Spoken English : A Computer based Synthesizer for a Reading Machine for the Blind, Master Thesis, University of British Columbia, Canada, 1974.
12. DUDLEY, H., "The Carrier Nature of Speech", Bell System Tech. J., vol. 19, pp. 495-515, 1940.
13. STEWART, J.Q., "An Electrical Analogue of the Vocal Organs", Nature, vol. 110, pp. 311-312, 1922.
14. DUDLEY, H., RIESZ, R.R. and WATKINS, S.S.A., "A Synthetic Speaker", J. Franklin Inst., vol. 227, pp. 739-764, 1939.
15. DUNN, H.K., "The Calculation of Vowel Resonances, and an Electrical Vocal Tract", JASA, vol. 22, pp. 740-753, 1950.
16. STEVENS, K.N., KASOWSKI, S. and FANT, C.G.M., "An Electrical Analog of the Vocal Tract", JASA, vol. 25, pp. 734-742, 1953.
17. STEVENS, K.N. and HOUSE, A.S., "Development of a Quantitative Description of Vowel Articulation", JASA, vol. 27, pp. 484-493, 1955.
18. KELLY, J.L., Jr., and LOCHBAUM, C.C., "Speech Synthesis", Proc. Fourth Intern. Congr. Acoust., Paper G42, pp. 1-4, 1962.
19. FLANAGAN, J.L., COKER, C.H. and BIRD, C.M., "Digital Computer Simulation of a Formant-Vocoder Speech Synthesizer", 15th Ann. Meeting Audio Engr. Soc., Preprint 307, 1963.
20. RABINER, L.R., JACKSON, L.B., SCHAFER, R.W. and COKER, C.H., "A Hardware Realization of a Digital Formant Speech Synthesizer", IEEE Trans. Commun. Tech., vol. COM-19, pp. 1016-1020, 1971.
21. MERMELSTEIN, P., "Calculation of the Vocal-Tract Transfer Function for Speech Synthesis Applications", Proc. Seventh Intern. Congr. Acoust., Paper 23 C 13, pp. 173-176, 1971.
22. DUDLEY, H., "Fundamentals of Speech Synthesis", J. Audio Engr. Soc., vol. 3, pp. 170-185, 1955.
23. FANT, G., "The Acoustics of Speech", Proc. Third Intern. Congr. Acoust., pp. 188-201, 1959.
24. ATAL, B.S. and HANAUER, S.L., "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave", JASA, vol. 50, pp. 637-655, 1971.
25. ITAKURA, F. and SAITO, S., "Analysis Synthesis Telephony Based on the Maximum Likelihood Method", Proc. Sixth Intern. Congr. Acoust., Paper C-5-5, C17-20, 1968.
26. ITAKURA, F. and SAITO, S., "A Statistical Method for Estimation of Speech Spectral Density and Formant Frequencies", Electron. Commun. Japan, vol. 53-A, pp. 36-43, 1970.
27. ITAKURA, F. and SAITO, S., "On the Optimum Quantization of Feature Parameters in the PARCOR Speech Synthesizer", Proc. 1972 Conf. Speech Commun. Process., pp. 434-437, 1972.
28. HARRIS, C.M., "A Study of the Building Blocks in Speech", JASA, vol. 25, pp. 962-969, 1953.
29. PETERSON, G.E., WANG, W.S.Y., and SIVERTSEN, E., "Segmentation Techniques in Speech Synthesis", JASA, vol. 30, pp. 739-742, 1958.
30. LIBERMAN, A.M., INGEMANN, F., LISKER, L., DELATTRE, P., and COOPER, F.S., "Minimal Rules for Synthesizing Speech", JASA, vol. 31, pp. 1490-1499, 1959.
31. COOPER, F.S., DELATTRE, P.C., LIBERMAN, A.M., BORST, J.M., and GERSTMAN, L.J., "Some Experiments on the Perception of Synthetic Speech Sounds", JASA, vol. 24, pp. 597-606, 1952.
32. GAY, T., "Effects of Filtering and Vowel Environment on Consonant Perception", JASA, vol. 48, No. 4 (Part 2), pp. 993-998, 1970.
33. KLEIN, W., PLOMP, R., and POLS, L.C.W., "Vowel Spectra, Vowel Spaces, and Vowel Identification", JASA, vol. 48, No. 4 (Part 2), pp. 999-1009, 1970.
34. POLS, L.C.W., VAN DER KAMP, L.J.Th., and PLOMP, R., "Perceptual and Physical Space of Vowel Sounds", JASA, vol. 46, No. 2 (Part 2), pp. 458-467, 1969.
35. THOMAS, I.B., HILL, P.B., CARROLL, F.S., and GARCIA, B., "Temporal Order in the Perception of Vowels", JASA, vol. 48, No. 4 (Part 2), pp. 1010-1013, 1970.
36. BROAD, D.J., and HISASHI WAKITA, "Piecewise-Planar Representation of Vowel Formant Frequencies", JASA, vol. 62, No. 6, pp. 1467-1473, December 1977.
37. ALLEN, G.D., "Vowel duration measurement : A Reliability Study", JASA, vol. 63, No. 4, pp. 1176-1185, April 1978.
38. NORIKO UMEDA, "Vowel duration in American English", JASA, vol. 58, No. 2, pp. 434-445, August 1975.
39. NORIKO UMEDA, "Consonant duration in American English", JASA, vol. 61, No. 3, pp. 846-858, March 1977.
40. LEHISTE, I., "Role of duration in disambiguating syntactically ambiguous sentences", JASA, vol. 60, No. 5, pp. 1199-1202, November 1976.
41. MILLER, J.L., and EIMAS, P.D., "Studies on the Perception of place and manner of articulation : A Comparison of the labial-alveolar and nasal-stop distinctions", JASA, vol. 61, No. 3, pp. 835-845, March 1977.
42. MERMELSTEIN, P., "Difference limens for formant frequencies of steady-state and consonant-bound vowels", JASA, vol. 63, No. 2, pp. 572-580, February 1978.
43. MERMELSTEIN, P., "Automatic Segmentation of Speech into Syllabic Units", JASA, vol. 58, No. 4, pp. 880-883, October 1975.
44. NAKATANI, L.H., and DUKES, K.D., "Locus of segmental cues for Word juncture", JASA, vol. 62, No. 3, pp. 714-719, September 1977.
45. HUGGINS, A.W.F., "Just noticeable differences for segment duration in Natural Speech", JASA, vol. 51, No. 4 (Part 2), pp. 1270-1278, 1972.
46. ROSENTHAL, L.H., SCHAFER, R.W., and RABINER, L.R., "An Algorithm for locating the Beginning and End of an Utterance using ADPCM Coded Speech", Bell Sys. Tech. J., vol. 53, No. 6, pp. 1127-1135, July-August 1974.
47. COLE, R.A., and COOPER, W.E., "Perception of voicing in English affricates and fricatives", JASA, vol. 58, No. 6, pp. 1280-1287, December 1975.
48. GAY, T., "Articulatory movements in VCV sequences", JASA, vol. 62, No. 1, pp. 183-193, July 1977.
49. COLE, R.A., and COOPER, W.E., "Properties of friction analyzers for ([)']", JASA, vol. 62, No. 1, pp. 177-182, July 1977.
50. PLOMP, R., "Pitch of complex tones", JASA, vol. 41, No. 6, pp. 1526-1533, 1967.
51. NORIKO UMEDA, "Linguistic Rules for Text-to-Speech Synthesis", Proc. of the IEEE, vol. 64, No. 4, pp. 443-451, April 1976.
52. ALLEN, J.B., "Short term Spectral Analysis Synthesis, and Modification by Discrete Fourier Transform", IEEE Trans. on Acoust., Speech, Sig. Proc., vol. ASSP-25, No. 3, pp. 235-238, June 1977.
53. BLESSER, B., and KATES, J.M., "Digital Processing in Audio Signals", Applications of Digital Signal Processing, Alan V. OPPENHEIM, Editor, Prentice-Hall, Inc., Englewood Cliffs, N.J. 07632, Chapter 2, pp. 29-115, 1978.
54. GECKINLI, N.C., and YAVUZ, D., "Algorithm for Pitch Extraction using zero-crossing interval sequence", IEEE Trans. on Acoust., Speech, Sig. Proc., vol. ASSP-25, No. 6, pp. 559-564, December 1977.
55. GIBSON, J.D., MELSA, J.L., JONES, S.K., "Digital Speech Analysis using Sequential Estimation Techniques", IEEE Trans. on Acoust., Speech, Sig. Proc., vol. ASSP-23, No. 4, pp. 362-369, August 1975.
56. KOPEC, G.E., OPPENHEIM, A.V., TRIBOLET, J.M., "Speech Analysis by Homomorphic Prediction", IEEE Trans. on Acoust., Speech, Sig. Proc., vol. ASSP-25, No. 1, pp. 40-49, February 1977.
57. MAKHOUL, J., "Spectral Linear Prediction : Properties and Applications", IEEE Trans. on Acoust., Speech, Sig. Proc., vol. ASSP-23, No. 3, pp. 283-296, June 1975.
58. OPPENHEIM, A.V., "Digital Processing of Speech", Applications of Digital Signal Processing, Alan V. OPPENHEIM, Editor, Prentice-Hall Inc., Englewood Cliffs, New Jersey 07632, Chapter 3, pp. 117-168, 1978.
59. OLIVE, J.P., and SPICKENAGEL, N., "Speech Resynthesis from Phoneme-related Parameters", JASA, vol. 59, No. 4, pp. 993-996, April 1976.
60. ALLEN, J., "Synthesis of Speech from Unrestricted Text", Proc. of the IEEE, vol. 64, No. 4, pp. 433-442, April 1976.
61. COKER, C.H., UMEDA, N., BROWMAN, C.P., "Automatic Synthesis from Ordinary English Text", Proc. Sixth Intern. Congr. Acoust., Paper B-5-2, B 155-158, 1968.
62. ESTES, S.E., KERBY, H.R., MAXEY, H.D., and WALKER, R.M., "Speech Synthesis from stored data", IBM J. Res. Develop., vol. 8, pp. 2-12, 1964.
63. HARRIS, C.M., "A Speech-Synthesizer", JASA, vol. 25, No. 5, pp. 970-975, September 1953.
64. HOLMES, J.N., MATTINGLY, I.G., and SHEARME, J.N., "Speech Synthesis by Rule", Language and Speech, 7, pp. 127-143, 1964.
65. MARKEL, J.D., GRAY, A.H. Jr., "A Linear Prediction Vocoder Simulation Based upon the Autocorrelation Method", IEEE Trans. on Acoust., Speech, Sig. Proc., vol. ASSP-22, No. 2, pp. 124-134, April 1974.
66.
OLIVE, J.P., and NAKATANI, L.H., "Rule s y n t h e s i s of Speech by Word Concatenation : A F i r s t Step", JASA, v o l . 55, pp. 660-666, 1974. 67. PORTNOFF, M.R., "Implementation of the D i g i t a l Phase Vocoder u s i n g the F a s t F o u r i e r Transform", IEEE Trans, on Acoust., Speech, S i g . Proc., v o l . ASSP-24, No. 3, pp. 243-248, June 19 76. 68. RABINER, L.R., "Speech S y n t h e s i s by Rule : An A c o u s t i c Domain Approach", B e l l Sys. Tech. J . , v o l . 47, pp. 17-37, 1968. 84 69. RABINER, L.R., "A Model f o r S y n t h e s i z i n g Speech by Rule", IEEE Trans. Audio E l e c t r o a c o u s t . , v o l . AU-17, pp. 7-13, 1969. 70. RABINER, L.R., and HERRMANN, 0., "On the Design of Optimum FIR Low-Pass F i l t e r s with Even Impulse Response D u r a t i o n " , IEEE Trans. Audio E l e c t r o a c o u s t . , v o l . AU-21, No. 4, pp. 329-336, August 1973. 71. SAMBUR, M.R., "An e f f i c i e n t L i n e a r - P r e d i c t i o n Vocoder", B e l l Sys. Tech. J . , v o l . 54, No. 10, pp. 1693-1723, December 19 75. 72. SAMBUR, M.R., ROSENBERG, A.E., RABINER, L.R., McGONEGAL, C.A., "On Reducing the buzz i n LPC s y n t h e s i s " , JASA, v o l . 63, No. 3, pp. 918-924, March 1978. 73. SCHAFER, R.W., and RABINER, L.R., "Design and S i m u l a t i o n of a Speech A n a l y s i s - S y n t h e s i s System based on Short-Time F o u r i e r A n a l y s i s " , IEEE Trans. Audio E l e c t r o a c o u s t . , v o l . AU-21, No. 3, pp. 165-174, June 1973. 74. GOODMAN, D.J., McDERMOTT, B.J., andNAKATANI, L.H., "Subjec-t i v e E v a l u a t i o n of PCM Coded Speech", B e l l Sys. Tech. J . , v o l . 55, No. 8, pp. 1087-1109, October 1976. 75. GOODMAN, D.J., GOODMAN, J.S., and MUN CHEN, " I n t e l l i g i b i l i t y and Ratings of D i g i t a l l y Coded Speech", IEEE Trans, on  Acoust • , Speech, S i g . P r o c , v o l . ASSP-26, No. 5, pp. 403-409, October 1978. 76. 
HOWES, D., "On the R e l a t i o n between the I n t e l l i g i b i l i t y and Frequency of Occurence of E n g l i s h Words", JASA, v o l . 29, No. 2, pp. 296-305, February 1957. 77. LEHISTE, I . , and PETERSON, G.E., " L i n g u i s t i c c o n s i d e r a t i o n s i n the Study of Speech I n t e l l i g i b i l i t y " , JASA, v o l . 31, No. 3, pp. 280-286, March 1959. 78. LIBERMAN, A.M., "Some R e s u l t s of Research on Speech P e r c e p t i o n " . JASA, v o l . 29, No. 1, pp. 117-123, January 1957. 79. MILLER, G.A., and NICELY, P.E., "An A n a l y s i s of P e r c e p t u a l Confusions among some E n g l i s h Consonants", JASA, v o l . 27, No. 2, pp. 338-352, March 1955. 80. NOLL, P., "A Comparative Study of v a r i o u s Q u a n t i z a t i o n schemes f o r Speech Encoding", B e l l Sys. Tech. J . , v o l . 54, No. 9, pp. 1597-1636, November 1975. 81. NYE, P.W., and GAITENBY, J.H., "Consonant I n t e l l i g i b i l i t y i n S y n t h e t i c Speech and i n a N a t u r a l Speech C o n t r o l (Modified Rhyme Tes t R e s u l t s ) " , Haskins L a b o r a t o r i e s : Status Report on Speech Research SR-33, pp. 77-91, 1973. 85 82. PREUSSE, J.W., "Semi-automatic Speech I n t e l l i g i b i l i t y Measurements", IEEE Trans. Audio Electroacoust., v o l . AU-15, No. 4, pp. 188-191, December 1967. 83. ROTHAUSER, E.H., URBANEK, G.E., PACHL, W.P., "A Comparison of Preference Measurement Methods", JASA, v o l . 49, No. 4 (Part 2), 1971. 84. WANG, W.S.Y., and CRAWFORD, J . , "Frequency Studies of English Consonants", Language and Speech, v o l . 3, pp. 131-139, 1960. 85. WONG, D.Y., and MARKEL, J.D., "An I n t e l l i g i b i l i t y Evaluation of several Linear Prediction Vocoder Modifications", IEEE Trans, on Acoust., Speech, Sig. P r o c , v o l . ASSP-26, No. 5, pp. 424-435, October 1978. 86. THATCHER, V.S., McQUEEN, A., Eds, The New WEBSTER Encyclope-dic Dictionary of the English Language, Consolidated Book Publishers, Chicago, 1971. 87. 
MILLER, G.A., HEISE, G.A., and LICHTEN, W., "The I n t e l l i g i b i l i t y of Speech as a Function of the Context of the Test Materials" J. Exptl. Psychol, 41, pp. 329-335, 1951. 8 8. LADEFOGED, P., BROADBENT, D.E., "Information Conveyed by Vowels", JASA, v o l . 29, pp. 98-104, 1957. 89. DELATTRE, P.C., LIBERMAN, A.M., COOPER, F.S., "Acoustic Loci and Transitional cues for Consonants", JASA, v o l . 27, 7 pp. 769-773, 1955. 90. LIBERMAN, A.M., DELATTRE, P.C., COOPER, F.S., GERSTMAN, L.J., "The Role of Consonant-Vowel Transitions i n the Perception of the Stop and Nasal Consonants!*, Psychological Monographs, v o l . 68, No. 8, 1954. 91. McGONEGAL,C.A., RABINER,L.R.,ROSENBERG,A.E., " A Semiautomatic Pitch Detector (SAPD) ", IEEE Trans, on Acoust.,Speech,Sig. Proc., Vol. ASSP-23, No.6, pp. 570-574, December 1975 92. MILLER, N.J., "Pitch Detection by Data Reduction",IEEE Trans. Acoustic,Speech,and Signal Processing, Vol.ASSP-23,No.1, pp. 72-79, February 1975. 93. POTTER,R.K. ,KOPP,G.A. ,KOPP,H.R. , V i s i b l e Speech, Dover ' J .. Publication,Inc., New York, 1966 86 A P P E N D I C E S 87 APPENDIX A TABLE A . l WORD LIST AND PHONETIC SYMBOLS PHONETIC SYMBOL KEY WORD PHONETIC SYMBOL KEY WORD Simple vowels Plosives I b i t b i d beat did I bet * give 9t bat ? p i t A but t t i t a not K, kid 0 bought Nasal consonants IT book meet n- boot need S b i r d sing % bert u F r i c a t i v e s Complex vowels 2 zero e bate % v i s i o n 0 boat very bout t h i s ai b i t e heat Oi voice f f i t few Semivowels-1iquids i •0" / thing sheet you j A seat w i witch l i d J A f f r i c a t i v e s church r i d judge which APPENDIX A TABLE A.2 LIST OF RECORDED WORDS CONTAINING VOWEL "0" INITIAL MEDIAL FINAL o don 11 go o l d wrote low ohm c o l d mow o n l y mold no oat home tow oak note CO obey known bow open phone pose ocean boat sew those row r o l l 89 APPENDIX B Time and Frequency P l o t s of 8 O r i g i n a l Words B. 1 BIT B. 2 BAIT B. 3 BET B. 4 BAT B. 5 BUT B. 
6 BOUGHT B. 7 BOAT B. 8 BOOT a. time p l o t b. frequency p l o t s aannidwy BAiiyiaa F I G . B.2. a T I M E P L O T O F T H E UIOROs " B A T E iJ3 93 aarundwy 3 A i i y ~ i a a aannidwu a A i i u i a a 97 3 a r u i ~ i d w u B A i i y i a a <99 101 '.-I I I I 1 1 1 2 3 4 S F R E Q U E N C Y C K H Z J F I G . B . 6 . b : F R E Q U E N C Y P L O T S O F T H E U O R D : " B O U G H T " 102 aonindwu 3Aiiunaa I I 1 I I I 1 2 3 4 S F R E Q U E N C Y C K H Z J F I G . B.7.b : F R E Q U E N C Y P L O T S O F T H E U O R D : " B O A T " R E L A T I V E A M P L I T U D E I I I I I I 1 2 3 4 S F R E Q U E N C Y C K H Z D F I G . B.8 . b : F R E Q U E N C Y P L O T S O F T H E U O R D : " B O O T APPENDIX C TEST MATERIALS C . l Examples of t e s t words and command i n s t r u c t i o n s C.2 T e s t procedures C.3 S t i m u l i and response sheets TABLE C.l.a EXAMPLES OF TEST WORDS AND COMMAND INSTRUCTIONS B EA T T T T A A A A A T T E DEET A T T A T A T A A A T T E GEET T A T A A T A A T T • E BA BA 3313 BB BB 0001 BD BD 0331 BD EA 3307 EA EB 3335 EB EC 3320 EC ED 3307 ED QA 3303 QA - QA 3337 TA TA 3331 QA DA 3305 DB DB 3301 DE DE 3331 DE EA 3333 EA EA 0305 EA EB 3337 EB EB 0335 EB EC 3307 EC ED 3337 ED QA 3333 QA QA , 3007 TA TA 3331 GD GD 3331 GE EA 0337 EA EA 3335 EA EB 0337 EB EC 3337 EC EC 0337 EC ED 3337 ED QA 0303 QA QA 0337 TA TA 0031 BUT T BA BA T BB BB A BF JE T JE JE A JE JF T JF JF A JF JG A JG JH A JH QA T QA QA T TA TA E GHT T BA BA T B3 B3 T BC BC T BG BG A BG OE T OE OE A OE OF A OF OG A OG OH A OH QA T QA QA T TA TA E T T BA BA T B3 B3 A BG OA A OA OB T OB 03 A OB OC A OC OD A OD QA T QA QA T TA TA E r T BA BA T 3B BB A BH UA A UA UB A UB UC A UC UD A UD QA T QA QA T TA TA TAB.LIZ C.l.b B IT I i 3313 ! T 3331 T 3333 A 0335 | A 3333 ! A 3335 I A 3333 A 3333 T 3333 T 333 7 E 3331 BA IT T T 3313 T 333 i A 3031 A 3331 A 3335 A 3337 A 333 7 T 333 7' T 3337 E 3333 3335 BET 3331 T T T T 3313 A 3331 T 333 7 A 333 7 T 3333 T 833 7 E 3337 0303 BAT n. 
r\ n\ o <j yj tJ I 3331 T T T A T 3313 ; A a n a t X kj o >J I ; * 3337 I A 333 7 ! T 333 7 T 333 7 E 108 BA BA 3313 BB BB 3331 BC IA 3335 IA IB 333 7 IB IC 333 7 IC ID 3337 ID QA 0333 QA QA 3337 TA TA 3331 BA BA 3313 BB BB 3331 BE BE 333 I BE AE >'j <J f AE AF 3337 A F AG 333 7 AG AH 3337 AH QA 33 33 QA QA 333 7 TA TA 3331 BA BA 331 3 BB BB 3331 BE BE 3331 EE EE 330 7 EE EF 333 7 EF EF 333 7 EF QA 3335 QA QA 3337 TA TA 3331 BA BA 3313 BB BB 3331 3F BF 3331 BF A A 3333 A A AA 3312 AA AB 3335 AB AB 3337 AB QA 3333 QA QA 333 7 TA TA 3331 C.2 TEST PROCEDURES TEST No. I. and 3 Subject : Vowel i d e n t i f i c a t i o n . S t i m u l i 90 e n t r i e s from 9 d i f f e r e n t words. Response : Closed and f o r c e d . (1/9) Set up The order of the t e s t word w i l l be announced f i r s t by the n a t u r a l v o i c e of a male speaker. Each stimulus w i l l be giv e n twice. The r e p e t i t i o n s w i l l be about 5 seconds a p a r t . Procedure: Please w r i t e an a p p r o p r i a t e phonetic symbol or a number i n the i n d i c a t e d area. I f you are not sure what you have heard, p l e a s e guess. T e s t words, symbols and code numbers : I BEAT I BIT e BAIT 6 BET BAT A BUT 0 BOUGHT 0 BOAT CC BOOT 1 2 3 4 5 6 7 8 9 Example : Male announcers v o i c e : 99 Te s t word p e r c e i v e d : bat Response i i or 99 dt, 110 TEST No. 2 Subject : Consonant d i s c r i m i n a t i o n . S t i m u l i : 81 e n t r i e s from 27 d i f f e r e n t words. A l l words are of the form : / b - t / , /,d-t/, /g-t / Response : Closed and f o r c e d . (1/3) Set up : The order o f the t e s t word w i l l be announced f i r s t by the n a t u r a l v o i c e of a male speaker. Each stimulus w i l l be giv e n twice. The r e p e t i t i o n s w i l l be about 3 seconds a p a r t . Procedure : Please w r i t e the consonant p e r c e i v e d as b, d, or g i n the i n d i c a t e d area. When i n doubt ple a s e guess. 
Please note that some of the test words do not exist in the English language; they are made up to evaluate this system.

Example:
Announcer's voice: 99
Test word perceived: dæt
Response: 99 d

TEST No. 4

Subject: Consonant and vowel identification.
Stimuli: 81 entries from 27 different words. All words are of the form: /b-t/, /d-t/, /g-t/. Please note that some of the test words do not exist in the English language; they are made up to evaluate this system.
Response: Closed and forced. (1/27)
Set up: The order of the test word will be announced first by the natural voice of a male speaker. Each stimulus will be given twice. The repetitions will be about 3 seconds apart.
Procedure: Please write the consonant perceived and the appropriate phonetic symbol or the code number of the vowel following it in the indicated area.

Phonetic symbols and code numbers:

Symbol:  i     ɪ    e     ɛ    æ    ʌ    ɔ       o     u
Word:    BEAT  BIT  BAIT  BET  BAT  BUT  BOUGHT  BOAT  BOOT
Code:    1     2    3     4    5    6    7       8     9

Example:
Male announcer's voice: 99
Test word perceived: goat
Response: 99 g 8 or 99 g o

TEST No. 5

Subject: Alphabet recognition.
Stimuli: Alphabet characters in random order.
Response: Closed and forced. (1/26)
Set up: The order of the test letter will be announced first by the natural voice of a male speaker. Each stimulus will be given twice. The repetitions will be about 5 seconds apart.
Procedure: Please write the alphabet character perceived in the indicated area.

C.3 STIMULI AND RESPONSE SHEETS

TEST No. 1 and 3 STIMULI AND RESPONSE SHEET

Symbol:  i     ɪ    e     ɛ    æ    ʌ    ɔ       o     u
Word:    BEAT  BIT  BAIT  BET  BAT  BUT  BOUGHT  BOAT  BOOT
Code:    1     2    3     4    5    6    7       8     9

 1 [but] 9   2 [bɪt] 2   3 [bɛt] 4   4 [bot] 8   5 [bɛt] 4   6 [bot] 8   7 [but] 9   8 [bot] 8   9 [bɪt] 2
10 [bɔt] 7  11 [bet] 3  12 [but] 9  13 [bʌt] 6  14 [bɔt] 7  15 [bit] 1  16 [bot] 8  17 [bæt] 5  18 [bɔt] 7
19 [bɔt] 7  20 [bɛt] 4  21 [bɔt] 7  22 [but] 9  23 [bet] 3  24 [bɛt] 4  25 [bæt] 5  26 [bot] 8  27 [bit] 1
28 [bæt] 5  29 [bʌt] 6  30 [bɔt] 7  31 [bot] 8  32 [bæt] 5  33 [bʌt] 6  34 [bʌt] 6  35 [bet] 3  36 [bet] 3
37 [bɛt] 4  38 [bit] 1  39 [bɪt] 2  40 [bit] 1  41 [bet] 3  42 [bɛt] 4  43 [bot] 8  44 [bit] 1  45 [bɛt] 4
46 [bot] 8  47 [bɛt] 4  48 [bɔt] 7  49 [bet] 3  50 [but] 9  51 [bot] 8  52 [bit] 1  53 [bet] 3  54 [bʌt] 6
55 [bæt] 5  56 [bɔt] 7  57 [bit] 1  58 [bet] 3  59 [bɪt] 2  60 [bit] 1  61 [bɪt] 2  62 [but] 9  63 [bɪt] 2
64 [bɪt] 2  65 [bɔt] 7  66 [bæt] 5  67 [but] 9  68 [bot] 8  69 [bɪt] 2  70 [bʌt] 6  71 [but] 9  72 [bit] 1
73 [bit] 1  74 [bæt] 5  75 [bɪt] 2  76 [bʌt] 6  77 [bɛt] 4  78 [bʌt] 6  79 [bæt] 5  80 [bet] 3  81 [bʌt] 6
82 [bæt] 5  83 [but] 9  84 [bæt] 5  85 [bɪt] 2  86 [bʌt] 6  87 [bet] 3  88 [but] 9  89 [bɛt] 4  90 [bɔt] 7

TEST No. 2 and 4 RESPONSE SHEET AND STIMULI

Symbol:  i     ɪ    e     ɛ    æ    ʌ    ɔ       o     u
Word:    BEAT  BIT  BAIT  BET  BAT  BUT  BOUGHT  BOAT  BOOT
Code:    1     2    3     4    5    6    7       8     9

All words are of the form: /b-t/, /d-t/, /g-t/

 1 [gut] 9   2 [dot] 8   3 [bet] 3   4 [bɔt] 7   5 [got] 8   6 [bɪt] 2   7 [gɛt] 4   8 [det] 3   9 [dɪt] 2
10 [bɔt] 7  11 [gæt] 5  12 [det] 3  13 [dɔt] 7  14 [bot] 8  15 [bit] 1  16 [bɔt] 7  17 [gʌt] 6  18 [dɛt] 4
19 [gɔt] 7  20 [dɪt] 2  21 [dɔt] 7  22 [dæt] 5  23 [dit] 1  24 [dʌt] 6  25 [gut] 9  26 [gæt] 5  27 [gɪt] 2
28 [gɔt] 7  29 [dut] 9  30 [bʌt] 6  31 [gʌt] 6  32 [gɛt] 4  33 [git] 1  34 [dɛt] 4  35 [dʌt] 6  36 [bɛt] 4
37 [bet] 3  38 [bɛt] 4  39 [dot] 8  40 [dit] 1  41 [dæt] 5  42 [gɔt] 7  43 [bɛt] 4  44 [gɛt] 4  45 [bɪt] 2
46 [dot] 8  47 [get] 3  48 [bit] 1  49 [gut] 9  50 [dut] 9  51 [bʌt] 6  52 [dut] 9  53 [bot] 8  54 [but] 9
55 [gɪt] 2  56 [gɪt] 2  57 [bet] 3  58 [bʌt] 6  59 [bæt] 5  60 [bit] 1  61 [dʌt] 6  62 [dit] 1  63 [det] 3
64 [bɪt] 2  65 [gʌt] 6  66 [bot] 8  67 [bæt] 5  68 [git] 1  69 [get] 3  70 [dɪt] 2  71 [dɛt] 4  72 [got] 8
73 [git] 1  74 [but] 9  75 [got] 8  76 [bæt] 5  77 [but] 9  78 [get] 3  79 [gæt] 5  80 [dɔt] 7  81 [dɔt] 7

TEST No. 5 RESPONSE SHEET AND STIMULI

Please write the alphabet character perceived in the indicated area.

 1   2   3   4   5   6   7   8   9
 A   Q   C   G   Z   B   V   L   K
10  11  12  13  14  15  16  17  18
 G   W   L   P   H   N   G   X   M
19  20  21  22  23  24  25  26  27
 V   K   P   N   J   O   L   W   T
28  29  30  31  32  33  34  35  36
 Y   R   F   X   V   S   M   O   D
37  38  39  40  41  42  43  44  45
 C   D   Q   J   N   Y   D   V   B
46  47  48  49  50  51  52  53  54
 O   U   A   M   R   P   R   H   I
55  56  57  58  59  60  61  62  63
 T   T   C   F   E   A   O   J   L
64  65  66  67  68  69  70  71  72
 B   X   H   E   S   U   K   M   Z
73  74  75  76  77  78  79  80  81
 S   I   Z   E   I   U   W   F   P
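
The closed, forced-choice tests of C.2 are scored by comparing each listener response with the key printed on the stimuli sheets of C.3. The Python sketch below is not part of the original work; it only illustrates one way such a sheet might be scored. The function names `chance_level` and `score` are hypothetical, and the listener responses are invented for illustration. The key holds the first nine vowel codes of the Test No. 1 and 3 sheet.

```python
from collections import Counter


def chance_level(n_alternatives: int) -> float:
    """Guessing rate of a closed, forced-choice test with n alternatives.

    For the tests in C.2 this is 1/9, 1/3, 1/27 and 1/26.
    """
    return 1.0 / n_alternatives


def score(key, responses):
    """Return (fraction correct, Counter of (presented, perceived) pairs)."""
    if len(key) != len(responses):
        raise ValueError("key and responses must have the same length")
    confusions = Counter(zip(key, responses))
    correct = sum(n for (shown, heard), n in confusions.items() if shown == heard)
    return correct / len(key), confusions


# Vowel codes of stimuli 1 to 9 on the Test No. 1 and 3 sheet (C.3).
key = [9, 2, 4, 8, 4, 8, 9, 8, 2]

# Hypothetical listener responses, invented for this illustration only.
responses = [9, 2, 4, 8, 3, 8, 9, 7, 2]

accuracy, confusions = score(key, responses)
print(f"correct: {accuracy:.1%}  chance: {chance_level(9):.1%}")

# List the off-diagonal cells, i.e. the confusions between vowel codes.
for (shown, heard), n in sorted(confusions.items()):
    if shown != heard:
        print(f"code {shown} perceived as code {heard}: {n}")
```

Keeping the tally as (presented, perceived) pairs preserves the full confusion matrix, so the same function could serve the consonant and alphabet tests by passing letters instead of vowel codes.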
