UBC Theses and Dissertations

Computer simulation, development and evaluation of a high speed spelled speech code. Suen, Ching Yee, 1972.

COMPUTER SIMULATION, DEVELOPMENT AND EVALUATION OF A HIGH SPEED SPELLED SPEECH CODE

by

CHING YEE SUEN

B.Sc.(Eng.), University of Hong Kong, 1966
M.Sc.(Eng.), University of Hong Kong, 1968
M.A.Sc., University of British Columbia, 1970

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in the Department of Electrical Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
June, 1972

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver 8, Canada
July 28th 1972

ABSTRACT

A high speed spelled speech code has been developed for a reading machine for the blind. Within the constraints of high reading speed and high intelligibility, a main contribution of the work has been to minimize the memory size, and thus the cost, of the digital spelled speech reading machine.
In order to reduce the amount of memory required to generate the letter sounds of this code, redundant phonemes were eliminated and a selected set of 18 basic phonemes was extracted by a segmentation program. Letter sounds were then synthesized by concatenating these basic phonemes. Also, because vowels and vowel-like sounds have quasi-periodic waveforms, they were reproduced satisfactorily by repeating, over and over again, a single pitch period extracted from the original waveforms. A further reduction of digital memory storage was accomplished by providing each individual phoneme with the minimum number of bits per sample.

The segmentation program developed runs on a PDP-9 digital computer. It performs acquisition, graphic display, data print-out, auditory presentation, manipulation and extraction of speech samples. Graphic display of the amplitude-time waveforms of various segments of a speech sample provided an accurate and efficient method of extracting the basic phonemes. Six vowels extracted in this way were used in a discrimination test. It was found that even when these vowels were only 10 ms in duration, the subjects could learn to discriminate them.

The PDP-9 computer was also used to synthesize the letter sounds and to simulate a spelled speech machine. An experiment with three blind subjects indicated that they could read spelled sentences at between 60 and 70 words per minute with high intelligibility after only one hour of contact with this spelled speech code. A difference coding scheme was used to reduce further the amount of digital memory required to store the basic phonemes. An attempt was also made to find out whether memory storage could be reduced by lowering the sampling rate.
This was studied by reducing the bandwidth of the letter sounds in a subjective test using 16 blind students. This experiment also investigated the intelligibility of the letter sounds and the effects of presentation speed and of the pause between words. Experimental results confirmed that most blind subjects could learn to recognize all 26 synthesized letter sounds after a short period of training, and that they could read spelled sentences at between 65 and 75 words per minute with an intelligibility score of about 85% correct. Bandwidth reduction reduced the pleasantness and clarity of the letter sounds. It was concluded that, for the reduction of memory storage, the difference coding scheme was preferred and the original bandwidth of 6 kHz should be retained.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF ILLUSTRATIONS
LIST OF TABLES
ACKNOWLEDGEMENT
1. INTRODUCTION
2. EXTRACTION OF SPEECH SEGMENTS
   2.1 Computer System for Processing Speech
   2.2 Segmentation Program
3. DISCRIMINATION OF VOWELS OF VERY SHORT DURATION
   3.1 A Review of Vowel Perception
   3.2 Procedure
      3.2.1 Preparation of Vowels
      3.2.2 Experimental Design and Testing Procedure
   3.3 Results and Discussions
4. PREPARATION OF LETTER SOUNDS AND SPELLED SPEECH EXPERIMENT
   4.1.1 Basic Phonemes and Letter Sounds
   4.1.2 Savings in Memory Storage
   4.2.1 Preliminary Experiment with Blind Subjects
   4.2.2 Experimental Results
   4.3 An Upper Bound to Spelled Speech Assimilation
5. MINIMIZING THE AMOUNT OF MEMORY REQUIRED TO STORE THE BASIC PHONEMES
   5.1.1 Bit Reduction by Considering the Individual Basic Phonemes
   5.1.2 Magnitude Difference Encoding Scheme
   5.2 Bit Reduction by Decreasing the Sampling Rate
6. FURTHER EXPERIMENT TO STUDY INTELLIGIBILITY OF LETTER SOUNDS AND EFFECTS OF PRESENTATION SPEED, BANDWIDTH REDUCTION AND PAUSE TIME BETWEEN WORDS
   6.1 Experimental Design and Testing Procedure
   6.2 Experimental Results
      6.2.1 Letter Identification Before Training
      6.2.2 Letter Identification After Training
      6.2.3 Test on Sentence Reading
      6.2.4 Word Length and Intelligibility
7. SUMMARY, DISCUSSIONS AND SUGGESTIONS
   7.1 Summary
   7.2 Discussions
      7.2.1 Spelled Speech and Elderly Subjects
      7.2.2 Other Speech Aids for the Blind
   7.3 Suggestions: Contracted Spelled Speech
REFERENCES
APPENDIX I List of Phonetic Symbols Used
APPENDIX II Plan of Spelled Speech Experiment with 16 Blind Subjects
APPENDIX III Analysis of Variance: Data Analysis of Spelled Speech Experiment
APPENDIX IV Results of Newman-Keuls' Test of Spelled Speech Scores

LIST OF ILLUSTRATIONS

Figure
1 Block diagram of a digital spelled speech reading machine
2 Plan of study
3 Computer system for processing speech
4 Flow chart for the segmentation program
5 Waveform of the sound of the word 'SPLIT'
6 Segmenting the waveform into different parts
7 Presentation of the vowel stimuli
8 Discrimination scores for PT and UPT subjects as a function of test blocks
9 Discrimination scores of PT subjects for the six tested vowels
10 Discrimination scores of UPT subjects for the six tested vowels

LIST OF TABLES

Table
1 Means and standard deviations of the percent correct discrimination scores of PT and UPT subjects as a function of test blocks
2 Analysis of variance of discrimination scores
3 Confusion matrix of PT subjects for the last four blocks (144 vowel stimuli)
4 Confusion matrix of UPT subjects for the last four blocks (144 vowel stimuli)
5 List of basic phonemes
6 Letter sounds synthesized by concatenation of basic phonemes
7 Comparison of memory space required by spoken letter sounds and memory space required by the basic phonemes
8 Average % correctness scores of three blind subjects
9 Distributions of maxima of the basic phonemes
10 Example of magnitude difference encoding scheme for n = 10
11 Distributions of maxima of the basic phonemes after difference coding
12 Confusion matrix of letter sounds at the 3 kHz bandwidth
13 Confusion matrix of letter sounds at the 4 kHz bandwidth
14 Confusion matrix of letter sounds at the 5 kHz bandwidth
15 Confusion matrix of letter sounds at the 6 kHz bandwidth
16 Combined confusion matrix of letter sounds for the four bandwidths
17 Combined identification scores of letter sounds in descending order of correctness (perfect score: 16)
18 Combined confusion matrix of letter sounds for the four bandwidths after learning
19 Scores of spelled speech experiment in percent correctness
20 Summary data of spelled speech experiment in percent correctness
21 Overall percent correctness of words according to word length
22 List of words to be synthesized from basic phonemes

ACKNOWLEDGEMENT

The author wishes to express his gratitude to Dr. Michael P. Beddoes, supervisor of this project, for his assistance and interest throughout the course of the work. The author is indebted to Dr. John H. V. Gilbert of the Division of Audiology and Speech Sciences for encouragement and helpful suggestions. The author also benefited from discussion with Dr. Michael S. Humphreys of the Psychology Department. Special thanks are given to the author's wife, Ling, for encouragement and understanding throughout his graduate study. Thanks are also due to George Austin for good maintenance of the PDP-9 computer and Rodney George for equipment assistance.

The author also wishes to acknowledge financial support from the following organizations:

a) operating grants:
   A-3290 from the National Research Council of Canada
   MA-3971 from the Medical Research Council of Canada
   grant from the Vancouver Foundation
   grant from the Canadian National Institute for the Blind
b) capital equipment grants:
   ME-3782 from the Medical Research Council of Canada
   E-3291 from the National Research Council of Canada.

1 Introduction

There are now, either existing or in development, a variety of reading machines which enable the blind to obtain information from printed materials. These machines range from the simple direct translation type, which produces either an auditory or a tactile code for each letter, to the complex recognition type, which talks directly to the operator.
Simple machines, such as the Optophone, which generates buzz tones, and the Lexiphone, which produces musical tones modulated in frequency and amplitude, are portable and relatively cheap to produce. However, upwards of a year of training is required to master the code sounds, and the ultimate reading rate with these direct translation machines is also limited [1]. Comparable to these machines is the Optacon, which presents an image of a character on a 24-by-6 matrix of stimulators; much training is also required to master the tactile codes produced by this machine [2, 3].

Talking machines of various designs have been considered by a number of investigators. With this type of machine, speech is either generated from stored data or synthesized by a set of rules [4], and the least learning and effort are required on the part of the user. However, talking machines are very complex and costly [3] and are suitable only for library use in a time-shared mode.

A personal-type machine which strikes a compromise between expense and ease of use is the spelled speech reading machine, which produces the letter sounds of the alphabet. Although it is about two to three times more expensive than the Lexiphone, such a machine will appeal to many blind users for one principal reason: it can be used after a few contact hours, instead of the year or so needed to master the codes of the simple machines. One type of spelled speech reading machine, called the Cognodictor, has been developed by Mauch Laboratories [5]. Letter sounds of a fixed duration were recorded on different tracks of a film drum. This kind of letter storage has the disadvantages of bulkiness, cost, and fixed duration of the letter sounds.
A better method of storing letter data would be to substitute a digital memory for the drum. Fig. 1 shows the block diagram of a digital spelled speech reading machine under development at the University of British Columbia. It consists of a Lexiphone scanner to scan the printed text [1]. This scanner can be pulled either by hand or by a mechanical device along the line of print, and signals are obtained from an array of 54 photocells as they pass over each of the letters in turn.

[Fig. 1 Block diagram of a digital spelled speech reading machine. Blocks labelled: printed text; photocell scanner; letter features; letter recognizer; decision (letters); digital storage of letter sounds; letter sounds.]

The letter recognizer then uses these signals to extract sets of features [6] to identify the letter. After the letter has been recognized, the corresponding letter sound is triggered, and spelled speech utterances of the printed text are produced.

Although digital storage of letter sounds is preferable to a film drum memory unit, at first sight the number of samples required to produce letter sounds of good quality would appear very large and the cost prohibitive. For example, the cost can easily be estimated as follows. At 60 words per minute (wpm), each letter (roughly 5 letters to the word) will occupy 0.2 sec on average; at 12.5 kHz sampling using 9-bit samples and allowing 20% pause time between letters, 468,000 bits are required to store the entire set of 26 letter sounds. Read-only memory will cost about one cent a bit, so the memory will cost $4,680.
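The cost estimate above can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only: the function `letter_storage_bits` and its parameter names are hypothetical, and it assumes that the 20% pause is taken out of each 0.2 sec letter slot, which leaves 0.16 sec of sound per letter and reproduces the 468,000-bit figure.

```python
def letter_storage_bits(wpm=60, letters_per_word=5, sample_rate_hz=12_500,
                        bits_per_sample=9, pause_fraction=0.20, n_letters=26):
    """Bits needed to store every letter sound as raw samples.

    Assumption: the pause between letters is part of each letter's time slot,
    so only (1 - pause_fraction) of the slot holds sound.
    """
    seconds_per_letter = 60.0 / (wpm * letters_per_word)          # 0.2 s at 60 wpm
    sound_seconds = seconds_per_letter * (1.0 - pause_fraction)   # 0.16 s of sound
    samples_per_letter = round(sound_seconds * sample_rate_hz)    # 2,000 samples
    return n_letters * samples_per_letter * bits_per_sample


bits = letter_storage_bits()
cost_dollars = bits / 100  # read-only memory at about one cent per bit
print(bits, cost_dollars)  # 468000 4680.0
```

Every factor enters the product linearly, so halving either the sampling rate or the number of bits per sample halves the memory; these are the two quantities that the later chapters set out to reduce.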
In view of this, a synthesis process was sought, and a number of methods were investigated to reduce the amount of digital storage, and hence the cost, of constructing a digital spelled speech reading machine for the blind. In the synthesis of letter sounds, the acoustic properties of some letter sounds were also studied and used as guidelines in the development of a high speed digital spelled speech code. This investigation was carried out with the aid of a PDP-9 digital computer and its graphic display and interface accessories.

The plan is shown in Fig. 2. First, a segmentation computer program was developed to study the properties of letter sounds and to extract a set of basic phonemes from the letter sounds after sampling and digitization. An experiment was then conducted to study discrimination learning of six vowels extracted by the segmentation program. Computer software was also developed to synthesize letter sounds and to generate the digital spelled speech code by concatenation of the basic phonemes. A pilot experiment was conducted using three blind subjects.

[Fig. 2 Plan of study: a high speed code for the digital spelled speech reading machine. Blocks labelled: spoken letter sounds; speed up presentation rate; sampling and digitization; time compression; concatenation to form letter sounds; check confusions; evaluation by subjective experiments; variable word space; speed of presentation of spelled sentences; minimization of digital storage; variable bit length for individual phonemes; bandwidth reduction; difference encoding; difference coding preferred. Storage figures of 461.3, 64.8, 27.3 and 20.1 kilobits appear in the figure.]
Memory storage of the basic phonemes was reduced by providing variable bit lengths for the individual phonemes. Further reduction was accomplished by a difference coding scheme. The possibility of bandwidth reduction was studied in an intensive subjective experiment which also investigated the effects of presentation speed and of variable pause between words. The thesis concludes with a summary, discussions and suggestions.

2 Extraction of Speech Segments

One cannot spell out words at a rate much faster than 50 wpm. In order to speed up the presentation rate of spelled speech, letter sounds must be processed either by compression or by extraction of sound segments [7]. The compression process speeds up the delivery of speech by discarding alternate segments of the original speech sample. The remaining segments are then joined together to form compressed speech. Because the retained segments are played back at the original rate, the average frequencies and pitch of the original speech sample are unchanged. Unfortunately, this method discards segments irrespective of their importance to the original speech sample, and thus the intelligibility of compressed speech deteriorates, especially when the speed-up ratio is high. In the case of spelled speech, this compression process also introduces many confusions among some letters, in particular among B, D, G, P and T, between M and N, and between F and S [8].

In the segmentation process, a speech sample is first sampled, digitized and stored in the computer. The waveform of this speech sample can then be displayed on a screen, and various segments of it can be accurately and efficiently extracted and also joined together when required.
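The compression process described above, keeping one segment and discarding the next, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `compress_by_discarding` and the fixed segment length are hypothetical, and the compression systems cited in [7, 8] are not reproduced here. Each retained segment is copied unchanged, which is why the local waveform (and so the pitch) is preserved while the duration is halved. The loop never inspects what a discarded segment contains, so a short event such as a stop-consonant burst may simply be dropped, which is consistent with the confusions listed above.

```python
def compress_by_discarding(samples, segment_len):
    """Speed up speech 2:1 by keeping every other fixed-length segment.

    Segments 0, 2, 4, ... are kept and joined end to end; segments
    1, 3, 5, ... are discarded regardless of their content.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    kept = []
    for start in range(0, len(samples), 2 * segment_len):
        kept.extend(samples[start:start + segment_len])
    return kept


# Toy example with sample indices standing in for sample values.
print(compress_by_discarding(list(range(10)), segment_len=2))  # [0, 1, 4, 5, 8, 9]
```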
Since the segmentation process is far more precise than the compression process, it was adopted throughout this study to analyze speech sounds and to extract speech segments for the synthesis of letter sounds.

2.1 Computer System for Processing Speech

Fig. 3 shows the computer system used for processing speech signals. It consists of a PDP-9 computer as the central processing unit and a number of input and output accessories. The PDP-9 computer has a memory size of 16,000 (16 K) words, and each word contains 18 bits. The teletype is used for typing in and printing out program instructions, statements and data; it is also the chief means of commanding the digital computer. Occasionally, console switches are also used to control the computer. DEC tapes are magnetic tapes for program and data storage and retrieval. The paper-tape unit is used for reading in and punching out program instructions, statements and data. A precision display unit, with a display area 9.25 in. square, is used to display processed speech signals or data stored in the computer. Data points can be plotted on its screen in a waveform pattern in a square array of 1,024 by 1,024 (10 bits by 10 bits) points.

Low-pass filters are used to limit the bandwidth of the sound signal entering the analog-to-digital (A/D) converter and of the signal coming out of the digital-to-analog (D/A) converter. An A/D converter is required to interface the sound wave with the digital computer, because a digital computer cannot handle a sound wave (an analog signal) directly. Similarly, the D/A converter is used to convert the digitally processed information back into a useful sound wave.
Both the A/D converter and the D/A converter have 12 bits of resolution. The input signal to the low-pass filter can come from a tape recorder, a microphone or some other device, and the output signal of the other low-pass filter can be recorded on a tape recorder or fed directly to a loudspeaker, headphones, an x-y plotter, etc.

[Fig. 3 Computer system for processing speech. Blocks labelled: tape recorder; low-pass filter; 12-bit A/D converter; paper tape reader and punch; DEC tape unit; PDP-9 digital computer (16K 18-bit words); precision display scope; teletype (writer, printer); 12-bit D/A converter; low-pass filter.]

2.2 The Segmentation Program

Fig. 4 shows the flow chart of the segmentation program developed to process speech signals. The input signal is first sampled and then stored in the data buffer. The sampling frequency can be specified by typing in the desired value from the teletype. The amount of data that can be stored depends on the memory size of the computer. For a sampling period of 80 µs, a memory size of 5 K (i.e. 5,000) words is required to store a speech sound that lasts 400 ms (the average duration of a word spoken at 150 wpm). The PDP-9 computer has a memory size of 16 K words. Setting aside 8 K words for program instructions and subroutines, it still has 8 K words of memory for storing speech data. Thus, for the sampling period of 80 µs, it can store 640 ms of speech sounds. Making use of the long 18-bit word length of the computer, a software technique was also developed to enlarge the effective memory storage by a factor of two. This was done by packing two 9-bit data samples into each 18-bit word. The actual values of these samples were recovered afterwards by an unpacking subroutine.
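The two-samples-per-word packing can be sketched as follows. The thesis does not say how the sign of a sample was stored, so this Python version assumes two's-complement 9-bit fields (values from -256 to 255); `pack_pair` and `unpack_pair` are hypothetical names, not the original PDP-9 subroutines.

```python
SAMPLE_BITS = 9
MASK9 = (1 << SAMPLE_BITS) - 1  # 0x1FF, nine one-bits


def pack_pair(first, second):
    """Pack two signed 9-bit samples into one 18-bit word, first sample in the high half.

    Assumption: samples are two's-complement values in the range -256..255.
    """
    for s in (first, second):
        if not -256 <= s <= 255:
            raise ValueError(f"sample {s} does not fit in 9 bits")
    return ((first & MASK9) << SAMPLE_BITS) | (second & MASK9)


def _to_signed9(field):
    """Interpret a 9-bit field as a two's-complement value."""
    return field - (1 << SAMPLE_BITS) if field & (1 << (SAMPLE_BITS - 1)) else field


def unpack_pair(word):
    """Recover the two signed 9-bit samples stored in an 18-bit word."""
    return _to_signed9((word >> SAMPLE_BITS) & MASK9), _to_signed9(word & MASK9)


word = pack_pair(-1, 5)
print(word, unpack_pair(word))  # 261637 (-1, 5)
```

With 8 K data words and an 80 µs sampling period, unpacked storage holds 640 ms of speech, as stated above; packing two samples per word doubles this to about 1.28 s, at the price of limiting each stored sample to 9 of the converter's 12 bits.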
In sampling the incoming signal, a threshold was set up to detect the beginning of the speech signal automatically. This threshold was adjusted to a level just higher than the noise level inherent in the input system. Once the speech sample has been stored in core memory, its data are under complete control of the computer and can be processed in a number of ways. Some important features of this program are described below.

[Fig. 4 Flow chart for the segmentation program: set the sampling period; set a threshold to detect the beginning of sound; accept the input signal through the A/D converter and store it in the data buffer (5K); wait for selection 1, 2 or 3. 1. Display: set the segments wanted for display, then display their waveforms in a systematic way on the screen. 2. Listen: set the segments wanted for D/A conversion, then output them through the D/A converter. 3. Record: set the segments wanted to be recorded, then write the data samples onto a DEC tape. Then go back to selection 1, 2 or 3.]

1. Display. With the aid of a precision oscilloscope (DEC model 30), either the whole amplitude-time waveform of the speech sample or various portions of it can be displayed on the screen for visual inspection. One such waveform, of the word 'split', and some segments of it displayed on this oscilloscope are depicted in Fig. 5 and Fig. 6 respectively. Not shown in the flow chart is a scheme that can amplify and expand any part of the waveform. This yields a very accurate and efficient method of segmentation and enables the operator to examine any part of the waveform in greater detail.

2. Listen. The whole word, or any segment or combination of segments of it in any specified order, can be presented to a listener.
A recording in which various segments of the word 'split' were recombined to form other words was prepared and demonstrated to a number of listeners. There is also a subroutine which can repeat these sounds a number of times.

3. Record. Various segments of the speech sample can also be recorded on a DEC tape. This gives the flexibility of calling up the recorded segments when they are required later on.

Other features of the segmentation program include subroutines to command the teletype to print out data, or to punch data on paper tape so that the data can be transferred to another computer or device.

[Fig. 5 Waveform of the sound of the word 'SPLIT']

[Fig. 6 Segmenting the waveform into different parts]

3 Discrimination of Vowels of Very Short Duration [9]

One way of speeding up the presentation of spelled speech is to shorten the duration of the vowel sounds. This experiment investigated the effect of short duration on the discrimination of the six vowels to be used in the synthesis of letter sounds. The experiment was also conducted with the aim of gaining some experience in psychoacoustic experiments with speech stimuli.

In this experiment, the discrimination of six vowel sounds (/a/, /ɛ/, /e/, /o/, /u/ and /i/)* of 10 ms duration was studied. The question "Can we perceive a vowel if only 10 ms of it is heard?"
is of interest for two reasons: if the answer is yes, then with sufficient training we may actually be able to identify a vowel specified in this minimal way; and, alternatively, it may be permissible to produce a close replica of a natural vowel sound by repeating one period of the vowel, say ten to twenty times, and in this way economize on the storage of data needed for the vowel.

The entire experiment was controlled by the PDP-9 computer. Vowels of equal pitch and intensity level were generated. Both phonetically trained (PT) and untrained (UPT) subjects were used. Rapid learning took place, and the PT subjects showed much better discrimination than the UPT subjects. The results also indicated that subjects could be trained to discriminate these 10 ms vowels correctly. Confusion matrices of the last four learning blocks indicated that /i/ and /u/ sounded very much alike when they were short. The pattern of the test scores is discussed with reference to pure tone perception.

* A list of phonetic symbols and key words can be found in Appendix I.

3.1 A Review of Vowel Perception

Although there has been extensive study of the duration of vowels [10-18], relatively little work has been done relating the durations of vowels to their recognition [19-22]. Recently, an investigation was made to determine the recognition threshold of some vowels as a function of temporal segmentation [23]. It was found that the median recognition threshold varied from vowel to vowel, from about 10 to 30 ms. However, the results of this experiment were not in good agreement with those obtained previously by others [24-26], who reported that vowel fragments shorter than 10 ms could be recognized.
In other studies [21, 22], it was reported that vowels could be correctly identified at durations of the order of 30 ms. Despite these results, discrimination learning of vowels of short duration has been virtually neglected. Using the computer to generate the stimuli, this experiment shows that both PT and UPT subjects can learn to discriminate vowels of 10 ms. One reason for using two sets of subjects was to anticipate the effect that training might have on performance: the PT subjects represented one extreme, and the UPT subjects represented the novices. We tend to identify the performance of the PT subjects with that of blind people, who are characteristically acute listeners.

3.2 Procedure

3.2.1 Preparation of Vowels

Since the fundamental frequency limen is about 0.3 to 0.5% of the fundamental frequency [27], and intensity discrimination may be acute with short sounds [21], precise control of the intensity and fundamental frequency of the vowels is necessary; this was done by computer. The stimuli were obtained from six sustained vowels, /a/, /ɛ/, /e/, /o/, /u/ and /i/. Since irregularities of amplitude and pitch may occur when a vowel is sustained by a human speaker (e.g. see [28]), a computer-controlled method was employed to extract a basic segment (one pitch period of the voice) of the vowel waveform. This segment was then repeated a number of times to simulate the vowel waveform. The basic segment was extracted with the segmentation program described in Chapter 2 [7], which allowed the operator to detect and extract a desired segment of the vowel waveform accurately. The sustained vowels were first low-pass filtered at 8 kHz and sampled at a rate of 20 kHz.
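The idea of extracting one pitch period and repeating it can be illustrated in Python. This is a sketch, not the interactive procedure used in the thesis, where the operator chose the segment on the display: `find_period_start` and `repeat_period` are hypothetical helpers, the period length is assumed known in advance, and the test waveform is synthetic. The starting-point rule (the zero crossing before the major peak) and the figures of 20 kHz sampling and a 7.65 ms pitch period (153 samples) are those reported in this section.

```python
import math

RATE_HZ = 20_000                        # sampling rate used for the vowels
PERIOD_LEN = round(0.00765 * RATE_HZ)   # 7.65 ms pitch period, i.e. 153 samples


def find_period_start(samples, period_len):
    """Index of the upward zero crossing preceding the largest peak.

    The peak is searched only where a full period fits on either side,
    so the slice samples[start:start + period_len] is always complete.
    """
    peak = max(range(period_len, len(samples) - period_len),
               key=lambda i: samples[i])
    start = peak
    while start > 0 and not (samples[start - 1] < 0 <= samples[start]):
        start -= 1
    return start


def repeat_period(period, duration_s, rate_hz=RATE_HZ):
    """Synthesize a steady vowel of duration_s seconds by looping one period."""
    n = round(duration_s * rate_hz)
    return [period[i % len(period)] for i in range(n)]


# Synthetic "vowel": a fundamental plus its second harmonic, five periods long.
wave = [math.sin(2 * math.pi * (i + 0.3) / PERIOD_LEN)
        + 0.5 * math.sin(4 * math.pi * (i + 0.3) / PERIOD_LEN)
        for i in range(5 * PERIOD_LEN)]
start = find_period_start(wave, PERIOD_LEN)
period = wave[start:start + PERIOD_LEN]
vowel_10ms = repeat_period(period, 0.010)
print(start % PERIOD_LEN, len(period), len(vowel_10ms))  # 0 153 200
```

Starting each period at an upward zero crossing means that the end of one copy and the start of the next meet near zero, which avoids an audible click at each joint. Note that at this pitch a 10 ms stimulus spans only about 1.3 periods (200 samples against a 153-sample period).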
The waveforms were then displayed on the screen of the display unit. A pitch period of 7.65 ms duration (i.e. fundamental frequency = 131 Hz) occurred in all the sustained vowels. A basic segment of each vowel with this pitch period was then extracted. The starting point of this basic segment was taken to be the zero crossing before the major peak in the period of the vowel waveform. After these basic vowel segments had been extracted, synthesized vowels were generated and presented to both PT and UPT listeners for identification. When all these vowels were correctly identified, they were recorded on tape. Subsequently, these synthesized vowels were played back and their intensities were equalized by measurement with an rms voltmeter (Hewlett Packard 3400A). From these synthesized vowels, a second basic segment of each vowel was extracted; this formed the basic segment of the synthesized vowels used in this study. All the synthesized vowels were low-pass filtered at 8 kHz before presentation to the subjects. The first three formants of the resulting vowels were measured with a variable band-pass filter (Krohn-Hite model 3342R) and the rms voltmeter. The formant frequencies in Hz were: /a/, F1 = 760, F2 = 1050, F3 = 2500; /ɛ/, F1 = 580, F2 = 1900, F3 = 2450; /e/, F1 = 510, F2 = 2050, F3 = 2700; /o/, F1 = 530, F2 = 820, F3 = 2420; /u/, F1 = 270, F2 = 660, F3 = 2350; /i/, F1 = 260, F2 = 2200, F3 = 2950.

3.2.2 Experimental Design and Testing Procedure

To determine the duration of the vowels to be employed in this study, a pilot experiment was conducted. This pilot experiment was also used to find out the reaction of PT and UPT subjects to the experiment. The results indicated that it was possible to discriminate among the six 10 ms
vowels used in this study, and that there might be a large difference between PT and UPT subjects. As a result, vowels of 10 ms duration were used and two groups of subjects were employed. Six university students with no training in articulatory phonetics formed the group of UPT subjects. Four graduate students and two faculty members, all from the Division of Audiology and Speech Sciences, formed the group of PT subjects. The graduate students had had about one year of training in articulatory phonetics, and the faculty members had had about five years of teaching experience in phonetics and speech sciences.

The design of this experiment was similar to that of House et al. [29]. The test materials consisted of six different blocks of 36 vowel stimuli each. In each block the six tested vowels each occurred six times, in a constrained order such that each vowel was followed by itself and by every other vowel within the block. To minimize order effects, each of the six subjects in each group was assigned to a given row in a 6 × 6 Latin square. At the end of the sixth block, the first two blocks were presented to the subject again.

This experiment was conducted in a quiet room. Prior to the presentation of a vowel sound, a 100 ms "ready" signal of 1 kHz was generated (see Fig. 7). After hearing the vowel sound, the subject was required to associate it with one of six needle positions (indicated by the numbers 1 to 6) corresponding to different deflections of the meter in front of him. He was required to write the number down on the response sheet provided, within the response period of 4.5 sec. The feedback signal would then deflect the needle to the position with which the vowel was to be correctly associated.
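One way to obtain a 36-stimulus block with the property described above (each vowel followed by itself and by every other vowel) is a de Bruijn sequence of order 2 over the six vowels. The sketch below uses the standard Lyndon-word construction; whether House et al. [29] built their blocks this way is not stated here, so treat it as an illustrative assumption. The property holds cyclically: read as a linear list of 36 items, the sequence has only 35 transitions and therefore covers 35 of the 36 ordered pairs.

```python
def de_bruijn(k, n):
    """Cyclic sequence over symbols 0..k-1 containing every length-n word once.

    Standard construction by concatenating Lyndon words in lexicographic order.
    """
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


VOWELS = ["a", "ɛ", "e", "o", "u", "i"]
block = [VOWELS[s] for s in de_bruijn(6, 2)]   # 36 stimuli, 6 of each vowel
```

Further blocks with the same property can be obtained by relabelling, e.g. applying a different permutation of `VOWELS` for each of the six blocks.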
After this, the ready signal would be heard again before the next vowel was presented, and so on. To ensure uniformity of stimulus presentation, both indicating signals and all vowel stimuli were generated by the computer. The 1 kHz signal and the vowel stimuli were recorded on one channel of a stereo tape-recorder (Tandberg model 64); the signal which controlled the meter was recorded on the other channel. Both the 1 kHz signal and the vowel stimuli were presented to the subjects through a loudspeaker (Ampex F2044 speaker amplifier).

Fig. 7 Presentation of the vowel stimuli. (Timing diagram: READY SIGNAL, VOWEL, response period, FEEDBACK SIGNAL, READY SIGNAL; marked intervals 100 ms, 500 ms, 10 ms, 4.5 sec, 3.5 sec, 2.5 sec, 100 ms.)

Prior to the test, 6 to 10 stimuli of a block were presented to the subject to familiarize him with the testing procedure and to adjust the sound to a comfortable listening level. He was also told which six vowels were used in this experiment. For the UPT subjects, key words (father, set, chaotic, notation, pool and beet) were used to illustrate the vowels, and an explanation was provided when there was doubt. To avoid obvious relations between the stimulus and the number put on the meter, the numbers were changed from block to block following another 6 × 6 Latin square. Thus the deflection of the meter needle was the same for the same vowel throughout the whole test, but the numbers put on the meter changed from one block to another. There was a rest period of three to four minutes between blocks, and each subject spent about 1 hour and 20 minutes on this experiment.
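The 6 × 6 Latin squares used to counterbalance block order and meter numbering can be illustrated with the simplest (cyclic) construction. The thesis does not say which squares were used, so the cyclic form, the function names and the row/column interpretation in the comments are assumptions.

```python
def cyclic_latin_square(n):
    """Return an n x n Latin square: row r, column c holds (r + c) mod n."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


def is_latin_square(square):
    """True if every row and every column contains each symbol 0..n-1 once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


# Hypothetical use: subject s (0..5) takes the six test blocks in the order
# given by row s, so each block occupies each serial position exactly once.
block_orders = cyclic_latin_square(6)
for subject, order in enumerate(block_orders):
    print(f"subject {subject + 1}: blocks {[b + 1 for b in order]}")
```

A cyclic square balances serial position but not carry-over (block k is always followed by block k + 1), so a randomly permuted square is a common alternative.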
All subjects were paid, and they were encouraged to try their best by being offered a bonus for a good average percent correct discrimination.

3.3 Results and Discussions

The means and standard deviations of the percent correct discrimination scores for both groups of subjects are shown in Table 1, and graphical displays of the test scores are shown in Fig. 8. The large standard deviations indicate that initially there was quite a big spread among the test scores of the different subjects. Deviations among the scores of the PT group, however, were not great in later blocks of discrimination learning. It must be emphasized that even though the vowels were only 10 ms in length, they did sound like vowels to the PT subjects after several exposures. In fact, some of the vowels, particularly /a/ and /ɛ/, were recognized by most of the PT subjects on first hearing. During the test, they also mimicked the vowels and tried to map them into their own vowel system. To the UPT subjects, the vowels sounded like clicks.

          PT Subjects       UPT Subjects
Block    Mean     SD       Mean     SD
1        51.4   17.48      25.0   10.27
2        69.0   17.23      38.4   13.56
3        76.9   23.61      61.1   12.73
4        85.2   13.68      66.7   16.11
5        89.8    7.46      68.5   21.60
6        91.7    6.00      69.0   21.84
7        94.9    5.88      76.9   19.08
8        95.8    5.96      74.1   15.83

Table 1 Means and standard deviations of the percent correct discrimination scores of PT and UPT subjects as a function of test blocks.

Fig. 8 Discrimination scores for PT and UPT subjects as a function of test blocks.

Large differences were obtained in the performance of the two groups of subjects. The scores of the UPT group ranged from 15 to 25% below those of the PT group. Rapid learning took place in the first four learning blocks, after which the scores rose steadily.
The PT subjects approached perfect discrimination of the six vowels, and four subjects (including one UPT subject) reached 100% correct discrimination towards the end of the experiment. An analysis of variance was performed on the discrimination scores shown in Table 1; the results are shown in Table 2. Significant differences were obtained for learning blocks (p < 0.001) and groups (p < 0.05), but not for their interaction (p > 0.10).

An alternative way to examine the data is to see how the different vowels affected the test scores of the subjects. For this, discrimination learning curves for the different vowels were plotted; they are shown in Figs. 9 and 10. The PT subjects could learn the vowels /a/, /ɛ/, /e/ and /o/ to 100% correct discrimination. The UPT subjects also had high scores for these four vowels. Both groups of subjects had lower scores for /i/ and /u/. Confusion matrices for the last four blocks are shown in Tables 3 and 4 for the PT and UPT subjects respectively. These matrices reveal that /i/ and /u/ sound very much alike when they are short. This kind of confusion between /i/ and /u/ has also been observed by Powell and Tosi [23].

The scores of the different vowels can be clarified by reference to the perception of short tones. Bürck, Kotowski and Lichte (described in [30]) found that the duration of a tone required to produce the experience of a definite pitch decreased from about 50 ms to 11 ms as the frequency of the tone increased from 50 Hz to 2 kHz. Beyond 2 kHz, the required duration increased with frequency. For a tone of 250 Hz, about 20 ms was required to produce a definite pitch, and only about 12 ms
was required for a tone in the range of 1 to 4 kHz. These figures indicate that the first formant is the most severely affected formant when the vowels are shorter than 12 ms. Since intensity has a strong influence on the perception of short vowels, and the first formant is the most intense formant, the lower the frequency range of this formant, the more difficult it is to perceive the short vowels. Since /a/ and /ɛ/ have the highest first formant frequencies among the six tested vowels, their discrimination might be expected to be best; this is in agreement with the results. Both /e/ and /o/ have a first formant frequency lower than that of /a/ and /ɛ/, and their scores were correspondingly lower. The first formants of /i/ and /u/ (260 and 270 Hz respectively) lie in the lowest frequency range among the six vowels. As a result, the scores of /i/ and /u/ were the poorest, and these two vowels were easily confused because their first formants were not well perceived. This suggests that the discrimination of vowels of very short duration resembles the perception of short tones, and that, at short durations, vowels with a higher first formant frequency are better discriminated.

Source of Variation            df       SS        MS       F        p
Between Subjects               11   3192.3
  Group (G)                     1   1488.4   1488.40    8.74   <0.05
  Subjects within groups       10   1703.9    170.39
Within Subjects                84   4365.7
  Block (B)                     7   3069.0    438.43   24.86   <0.001
  Interaction (BG)              7     62.3      8.90    0.50   >0.10
  B × Subjects within groups   70   1234.4     17.63

df = degrees of freedom, SS = sum of squares, MS = mean square, F = F statistic, p = probability.

Table 2 Analysis of variance of discrimination scores.

Fig. 9 Discrimination scores of PT subjects for the six tested vowels.

Fig. 10 Discrimination scores of UPT subjects for the six tested vowels.

                 Response
Stimulus       a      ɛ      e      o      u      i
a            144
ɛ                   144
e                          142      1      1
o                                 141      3
u                                   3    108     33
i                                   2     17    125
Total        144    144    142    147    129    158

Table 3 Confusion matrix of PT subjects for the last four blocks (144 stimuli per vowel).

                 Response
Stimulus       a      ɛ      e      o      u      i
a            117     22      5
ɛ             16    115     10      2      1
e                     9    106     15      7      7
o              1      2     13    103     13     12
u                            3      7     91     43
i              3            12     12     26     91
Total        137    148    149    139    138    153

Table 4 Confusion matrix of UPT subjects for the last four blocks (144 stimuli per vowel).

4 Preparation of Letter Sounds and Spelled Speech Experiment*

In order to reduce the amount of memory required to generate letter sounds, redundant phonemes were eliminated and a selected set of basic phonemes was extracted by the segmentation process. Letter sounds were then synthesized by concatenation of these basic phonemes. In addition, vowels and vowel-like sounds have quasi-periodic waveforms, and these sounds were reproduced satisfactorily by repeating over and over again a very small segment (corresponding to the pitch period of the voice) extracted from the original waveforms. In this way, the amount of stored data can be drastically reduced. Through a subjective experiment, it was observed that spelled speech artificially generated according to the above algorithm was assimilated at a rate close to the upper bound calculated using the concepts of spoken speech.

* This work was presented at the International Conference on Speech Communication and Processing, Boston, April 24-26, 1972 [31].

4.1.1 Basic Phonemes and Letter Sounds

It is noted that the letter sounds of the alphabet contain many redundant phonemes, e.g. the phoneme /i/* in letters B, C, D, E, G, etc.
To minimize the number of samples required to represent letter sounds, and thus the memory storage, this kind of redundancy must be eliminated. Table 5 lists 18 basic phonemes, selected so that distinct sounds for the whole set of 26 letters of the alphabet could be generated by concatenating them.

Consonants:                   /b/ /s/ /d/ /dʒ/ /k/ /p/ /t/ /v/ /w/
Vowels, Liquid and Nasals:    /i/ /e/ /ɛ/ /a/ /o/ /u/ /l/ /m/ /n/

Table 5 List of basic phonemes.

The basic phonemes were extracted from naturally spoken letter sounds using the segmentation program described in Chapter 2. First, several samples (from 5 to 15) of each letter sound were recorded in succession in a quiet room, using a Tandberg (model 64) tape-recorder and a Brüel & Kjaer condenser microphone (model 4145 with preamplifier 2619). All materials were spoken by the author. Next, the best sample of each letter sound was chosen. Each best sample was then prefiltered by a low-pass filter (Krohn-Hite model 3342R), sampled by the computer, and stored in core for extraction by the segmentation program.

* A list of phonetic symbols and key words can be found in Appendix I.

In sampling sound, the higher the sampling rate, the more faithful the reproduced sound will be, and the number of bits/sample must also be high. But a higher sampling rate and a higher number of bits/sample give a proportionately higher number of bits required for storage. According to the sampling theorem, the sampling rate should be at least 2W for a signal with frequency components bandlimited to 0 to W Hz. In this study, a 12.5 kHz sampling rate was chosen because it has been shown that speech intelligibility deteriorates only slightly when speech is low-pass filtered at around 6 kHz
[32]. Experimenting with different numbers of bits/sample, it was observed that 9 bits/sample was quite adequate for good letter sounds. As a result, all phoneme data were low-pass filtered at 6 kHz and sampled at 12.5 kHz (i.e. at 80 µs intervals) with 9 bits/sample.

Synthesized letter sounds are presented in Table 6. In the assignment of letter sounds, some consideration was given to reducing confusions among them. /ɛv/ was assigned to letter F so that the voicing (/v/) at the end of the letter would make it more distinct from letter S (/ɛs/). For the same reason, the pitch of the phoneme /m/ in letter M was made higher than that of the phoneme /n/ in letter N. Distinctions among stop consonants deserve greater attention because of their high frequency of occurrence in the language and the ease with which they are confused. As a first step toward making the voiced phonemes (stop consonants /b/, /d/ and affricate /dʒ/) more distinct from the voiceless phonemes (stop consonants /k/, /p/ and /t/), a slightly stronger than normal aspiration was given to all voiceless stops. The next step was to find the proper silent intervals to be put before the bursts of these stop consonants. It was observed that, when short silent intervals were used, a voiced stop seemed to merge with the preceding sound and voiceless stops sounded like their voiced cognates. In an experiment to measure the silent intervals of stop consonants [33], it was found that the silent intervals of voiceless stop consonants were much longer than those of their voiced cognates. The average difference in silent interval between voiced and voiceless stop consonants was about 30 ms.
Based on this result, the silent interval of the voiceless stop consonants was made 30 ms longer than that of the voiced ones. In pilot experiments, this additional silent interval for voiceless stop consonants proved to be essential, especially when letters were presented at a high rate in spelled speech.

Letter   A    B    C    D    E    F    G     H     I
Sound    ei   bi   si   di   i    ɛv   dʒi   eidʒ  ai

Letter   J     K     L    M    N    O    P    Q    R
Sound    dʒei  kei   ɛl   ɛm   ɛn   ou   pi   kiu  a

Letter   S    T    U    V    W      X     Y    Z
Sound    ɛs   ti   iu   vi   dabiu  ɛks   wai  sɛ

Table 6 Letter sounds synthesized by concatenation of basic phonemes.

4.1.2 Savings in Memory Storage

In terms of the bits of digital samples required, this method of synthesizing letter sounds from the constructed set of basic phonemes gives a tremendous saving in memory space, and construction cost is thus minimized. Table 7 compares the number of bits required by naturally spoken letter sounds uttered at a rate of about 50 wpm with the number of bits required by the 18 basic phonemes. These figures show that the selected basic phonemes occupy only about one seventh of the memory storage required by naturally produced letter sounds.

Mode                        Spoken Letter Sounds   Basic Phonemes
Duration of sounds (sec)            4.1                0.576
Memory space (kilo-bits)          461.3                 64.8

Table 7 Comparison of the memory space required by spoken letter sounds and by the basic phonemes.

4.2.1 Preliminary Experiment with Blind Subjects

To test the acceptability of concatenated letter sounds and the speed at which blind subjects could listen to spelled speech, an experiment was conducted using a PDP-9 computer to simulate the digital spelled speech reading machine.
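The letter-sound synthesis such a simulation performs (concatenating stored basic phonemes according to Table 6, with 30 ms of extra silence before voiceless stop bursts as described in section 4.1.1) can be sketched as below. The phoneme store, the function names and the base silent interval for voiced stops are hypothetical; the text gives only the 30 ms difference.

```python
FS_HZ = 12_500                    # sampling rate of the stored phonemes
VOICED_STOPS = {"b", "d"}
VOICELESS_STOPS = {"p", "t", "k"}
EXTRA_VOICELESS_GAP_MS = 30       # voiceless stops get 30 ms more silence


def silence(ms):
    """Zero-valued samples lasting `ms` milliseconds."""
    return [0] * round(FS_HZ * ms / 1000)


def letter_sound(phonemes, store, voiced_gap_ms):
    """Concatenate stored basic phonemes into one letter sound.

    phonemes      -- phoneme symbols, e.g. ["p", "i"] for the letter P
    store         -- hypothetical dict mapping phoneme symbol -> samples
    voiced_gap_ms -- silent interval placed before a voiced stop burst
                     (an assumed parameter; its value is not given)
    """
    samples = []
    for ph in phonemes:
        if ph in VOICED_STOPS:
            samples += silence(voiced_gap_ms)
        elif ph in VOICELESS_STOPS:
            samples += silence(voiced_gap_ms + EXTRA_VOICELESS_GAP_MS)
        samples += store[ph]
    return samples


# Toy store with placeholder waveforms (not real phoneme data).
store = {"b": [5] * 40, "p": [7] * 40, "i": [3] * 100}
letter_b = letter_sound(["b", "i"], store, voiced_gap_ms=20)
letter_p = letter_sound(["p", "i"], store, voiced_gap_ms=20)
```

With these toy values, P is 375 samples (30 ms at 12.5 kHz) longer than B, and the difference lies entirely in the silence before the burst.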
Letter sounds were synthesized by concatenation of the proposed 18 basic phonemes. They were also low-pass filtered at 6 kHz to eliminate unwanted harmonics resulting from the sampling process. Three blind subjects aged between 14 and 19 were used; all had a knowledge of grade II Braille. Before the actual test, the subjects had a practice session of one hour to familiarize themselves with the letter sounds. The subjects were tested one at a time. During the first half hour or so, only letter sounds were presented. For the first fifteen minutes, the subjects were given control of the teletype keyboard and could listen to any letter sound they wanted by striking the corresponding key. After this, a quiz on the identification of letter sounds was given. This short quiz indicated that by that time they already had practically no difficulty in distinguishing the 26 letter sounds. When this was finished, single words and sentences were presented at various speeds. The presentation rate was varied by computer control of the silent intervals between letters (20-64 ms) and between words (88-282 ms), corresponding to 50-80 wpm. As different authors specify presentation rate in different ways, a standard sentence, "If you want to know reading speed divide twelve hundred by the time in seconds needed to read this sentence", was used to determine the reading rate. This sentence contains a reasonable distribution of letters and has an average word length of 4.4 letters. The maximum speed at which a subject could listen to spelled sentences was also determined in this practice session.
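The standard-sentence convention amounts to a simple formula: the sentence has 20 words, so for a presentation time of t seconds the rate is 20 × 60 / t = 1200 / t words per minute. The short sketch below (function name hypothetical) checks the word count and the 4.4-letter average word length quoted above.

```python
STANDARD_SENTENCE = ("If you want to know reading speed divide twelve hundred "
                     "by the time in seconds needed to read this sentence")


def reading_rate_wpm(seconds):
    """Reading rate implied by the time taken to present the standard sentence."""
    return 1200 / seconds


words = STANDARD_SENTENCE.split()
average_word_length = sum(len(w) for w in words) / len(words)
print(len(words), average_word_length)   # 20 words, 4.4 letters per word
print(reading_rate_wpm(20))              # 20 s for the sentence -> 60.0 wpm
```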
When this practice session was over, the subjects had a rest period of ten minutes, after which the actual test was given. The test material consisted of selected lists of phonetically balanced sentences [34]. Each list had an average length of 78 words contained in ten sentences. Each sentence was stopped at two places, corresponding to the middle and the end of the sentence, for the subjects to respond. Three lists were tested at each listening speed, and the subjects were instructed to omit any words they could not catch. The whole test took about one hour and twenty minutes.

4.2.2 Experimental Results

The intelligibility scores of this test are shown in Table 8. This table indicates that the blind subjects could read spelled sentences at between 60 and 70 wpm with a high average intelligibility score of 90% correct. Also, the subjects had no difficulty in recognizing these artificially generated letter sounds after a short period of training.

Subject                      A              B              C
Listening speed (wpm)     60     70      50     60      60     70
% correct              96.59  93.61   80.41  82.65   89.33  89.39

Table 8 Average % correct scores of three blind subjects.

4.3 An Upper Bound to Spelled Speech Assimilation

In order to estimate the maximum listening rate blind people could attain with spelled speech, a heuristic comparison with rapidly presented speech will be made. It is believed that, for compressed speech, 275 wpm is the rate beyond which comprehension begins to decline sharply [35]. When isolated phonetically balanced words were presented to listeners at this rate, Foulke and Sticht [36] found an intelligibility score of 91% correct. A speech rate of 275 wpm
corresponds to a syllable rate of 6.55 syllables/sec (275 × 1.43 / 60) when one word is counted as 1.43 syllables [37]. Each spelled speech letter contains about one syllable. If one word is taken to contain an average of 4.4 letters and each letter sound is represented by one syllable, then 6.55 syllables/sec is equivalent to a spelled speech rate of 89 wpm (6.55 × 60 / 4.4). However, letter sounds have to be strung together to form words before their meaning is perceived, and this will slow down the listening rate somewhat if spelled speech is to be intelligible. For similar test materials (phonetically balanced sentences), two blind subjects of the present experiment achieved a rate of 70 wpm with a comparable intelligibility score of 91.5% correct. Although listening rate can be increased with learning, an increase of more than 10 wpm is doubtful. Thus it is estimated that, for an average blind subject, the maximum acceptable listening rate for good intelligibility of spelled speech is around 80 wpm. To increase this rate still further, some form of contraction is necessary; this will be discussed later.

5 Minimizing the Amount of Memory Required to Store the Basic Phonemes

Two parameters, the sampling rate and the number of bits contained in a sample, were considered. When these parameters are reduced, the data storage is reduced accordingly. Reduction of the number of bits per sample was accomplished by providing only the minimum number of bits required by each individual phoneme. Further reduction was achieved by an encoding scheme. Reduction of the sampling rate was studied by varying the bandwidth of the basic phonemes.
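Storage for uniformly sampled data is simply duration × sampling rate × bits per sample, which is why both parameters are attacked in this chapter. The sketch below (function name hypothetical) reproduces the Table 7 figures at 12.5 kHz and 9 bits/sample and shows that halving the sampling rate halves the storage.

```python
def storage_bits(duration_s, fs_hz=12_500, bits_per_sample=9):
    """Bits needed to store `duration_s` seconds of uniformly sampled data."""
    return duration_s * fs_hz * bits_per_sample


spoken_letters = storage_bits(4.1)     # naturally spoken letter sounds (Table 7)
basic_phonemes = storage_bits(0.576)   # the 18 basic phonemes (Table 7)
print(f"{spoken_letters:.0f} bits vs {basic_phonemes:.0f} bits")
print(f"ratio {spoken_letters / basic_phonemes:.2f}")   # roughly one seventh

# Halving the bandwidth (e.g. 0-6 kHz -> 0-3 kHz) permits half the sampling rate.
half_rate = storage_bits(0.576, fs_hz=6_250)
```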
5.1.1 Bit Reduction by Considering the Individual Basic Phonemes

It is well known that intensities differ a great deal from one phoneme to another. Thus only a small number of bits per sample will be required by phonemes of relatively low intensity. The set of basic phonemes was examined by displaying the amplitude-time waveform of each individual phoneme on the screen of the precision display unit of the PDP-9 computer. A computer program was developed so that the magnitude and size of the waveforms could be modified. In the manipulation of the phoneme data, the maximum of the waveform was defined as the greatest amplitude, in bits per sample, occurring in the waveform. When the maximum was only slightly larger than a certain bit range, the data were modified to fit into this bit range so that one bit per sample could be saved. After each modification, a listening comparison was made to ensure that the modified phoneme sounded as good as the original one. In this manner, the resulting phoneme data amounted to 27.3 k-bits instead of 64.8 k-bits for 9 bits per sample. The distributions of maxima of the modified data of the basic phonemes are shown in Table 9.

Phoneme   Maxima (bits)   Phoneme   Maxima (bits)
b              5          i              4
s              3          e              5
d              5          ɛ              4
dʒ             4          a              5
k              5          o              5
p              5          u              5
t              5          l              4
v              4          m              5
w              5          n              4

Table 9 Distributions of maxima of the basic phonemes.

5.1.2 Magnitude Difference Encoding Scheme

When the sampling rate is high, the magnitude differences between successive samples of phoneme data are much smaller than the actual magnitudes of the data. Thus storing only the magnitude differences will save a lot of storage space. The original magnitudes of the data samples can be recovered simply by successively adding up the stored differences.
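A minimal sketch of the magnitude difference scheme, using Python integers in place of packed PDP-9 words (function names hypothetical). Encoding stores E_i = A_i - A_(i-1) with A_0 = 0, and decoding is a running sum. Applied to the sample data of Table 10, it reproduces the encoded row and shows the peak magnitude falling from 10 to 3. The `bits_needed` helper assumes a sign-magnitude word; that is one plausible reading of "bits per sample", not a format the thesis specifies.

```python
from itertools import accumulate


def encode_differences(samples):
    """E_i = A_i - A_(i-1), with A_0 = 0."""
    previous = [0] + samples[:-1]
    return [a - p for a, p in zip(samples, previous)]


def decode_differences(diffs):
    """C_i = E_i + C_(i-1), with C_0 = 0: a running sum that restores A_i."""
    return list(accumulate(diffs))


def bits_needed(values):
    """Bits per sample for a sign-magnitude word holding the largest |value|."""
    peak = max(abs(v) for v in values)
    return peak.bit_length() + 1   # magnitude bits plus one sign bit


actual = [0, 3, 6, 8, 10, 7, 4, 1, -1, 2]     # Table 10, actual data
encoded = encode_differences(actual)
print(encoded)                                # [0, 3, 3, 2, 2, -3, -3, -3, -2, 3]
print(bits_needed(actual), bits_needed(encoded))   # 5 3
```

Decoding adds integers only, so it reconstructs the samples exactly; the sketch does not, however, guard against a difference that exceeds the word size chosen for a phoneme.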
The operation of this scheme resembles differential pulse code modulation. The following equations and Table 10 illustrate how it works.

Actual data: $D_a = \sum_{i=1}^{n} A_i$, where $i = 1, 2, 3, \ldots, n$ and $n$ = number of data samples.

Encoded data: $D_e = \sum_{i=1}^{n} E_i$, where $E_i = A_i - A_{i-1}$ and $A_0 = 0$.

Decoded data: $D_d = \sum_{i=1}^{n} C_i = D_a$, where $C_i = E_i + C_{i-1} = A_i$ and $C_0 = 0$.

Samples (i)            1   2   3   4   5   6   7   8   9  10
Actual data (A_i)      0   3   6   8  10   7   4   1  -1   2
Encoded data (E_i)     0   3   3   2   2  -3  -3  -3  -2   3

Table 10 Example of magnitude difference encoding scheme for n = 10.

Computer software was also developed to encode the data and decode them afterwards. The results of these procedures were verified by data print-out and graphic display of the waveforms. When the encoded waveforms were compared with the original ones, it was found that, except for the phonemes /s/ and /dʒ/, which had strong high-frequency components, and /a/, which had a high intensity, this encoding scheme yielded a substantial reduction of the maxima of most of the other phonemes. The distributions of maxima of the encoded data are shown in Table 11.

Phoneme   Maxima (bits)   Phoneme   Maxima (bits)
b              4          i              2
s              4          e              3
d              4          ɛ              4
dʒ             4          a              5
k              4          o              3
p              3          u              3
t              3          l              3
v              2          m              3
w              5          n              2

Table 11 Distributions of maxima of the basic phonemes after difference coding.

The encoded data require a storage of 20.1 k-bits compared with 27.3 k-bits before encoding, a further saving of 7.2 k-bits. Since the decoding process merely involves simple addition, it is an error-free process. With this encoding scheme, basic phonemes of better quality could be extracted by raising the sampling rate.
This is because the magnitude differences between successive samples decrease as the sampling rate rises.

5.2 Bit Reduction by Decreasing the Sampling Rate

By reducing the bandwidth of the basic phonemes, bit reduction can be achieved by lowering the sampling rate. As it has been found [38] that certain phonemes occupy a smaller bandwidth than others, it is possible to reduce the bandwidth from 0-6 kHz to a smaller range, at least for a number of phonemes. Reduction in bandwidth was investigated experimentally, as described in the next chapter. That experiment was designed to study simultaneously the effects of some other factors which affect the intelligibility and reading speed of spelled speech. The bandwidths studied were 0-3 kHz, 0-4 kHz, 0-5 kHz and 0-6 kHz. If the range 0-3 kHz were found to be good enough for all the phonemes, the data storage would be cut by half. Unfortunately, this method cannot be combined with the magnitude difference encoding scheme, because a decrease in sampling rate results in an increase in the magnitude differences; otherwise bit storage could be cut down more effectively.

6 Further Experiment to Study Intelligibility of Letter Sounds and Effects of Presentation Speed, Bandwidth Reduction and Pause Time Between Words

Using three blind subjects, the preliminary experiment described in Chapter 4 indicated that spelled sentences presented at between 60 and 70 wpm were highly intelligible. It also indicated that the synthesized letter sounds could be learned easily. In order to confirm these results, a more intensive study was made, and an experiment employing 16 blind subjects was conducted.
This experiment investigated four factors, namely, the intelligibility of letter sounds, the presentation speed, the bandwidth of letter sounds, and the pause duration between words. The investigation of pause duration stemmed from the hypothesis that a pause proportional to the length of the word might be helpful in decoding long words. The results of this experiment agreed well with those of the preliminary one. The intelligibility of the constructed set of letter sounds proved to be strongly resistant to bandwidth reduction, and subjects could learn to recognize all 26 synthesized letter sounds of the alphabet even when the bandwidth was reduced by half. However, the intelligibility scores of spelled sentences decreased significantly with either a reduction in bandwidth or an increase in presentation speed. A pause proportional to the length of the preceding word did not give significantly better intelligibility than a pause of fixed duration.

6.1 Experimental Design and Testing Procedure

The investigated factors were divided into a number of treatment levels. Presentation speed was divided into four levels: 45, 55, 65 and 75 wpm. The bandwidths studied were 0-3 kHz, 0-4 kHz, 0-5 kHz and 0-6 kHz. Two variations of the pause duration after each word were studied: in one case the pause duration was fixed, and in the other it was made linearly proportional to the length of the preceding word. A Greco-Latin square design [39] was used in this experiment; the plan of this design is shown in Appendix II. This design eliminated order effects in presentation speed and bandwidth. Four lists of phonetically balanced sentences [34] were used.
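A 4 × 4 Greco-Latin square can be built from two orthogonal Latin squares over the finite field GF(4); a cyclic construction does not work for order 4, because cyclic squares of even order have no orthogonal mate. The sketch below is one standard construction, not the plan of Appendix II: the assignment of rows to subject groups, columns to test positions, and entries to speeds and bandwidths is hypothetical.

```python
def gf4_mul(a, b):
    """Multiply in GF(4) = GF(2)[x]/(x^2 + x + 1); elements coded 0..3."""
    product = 0
    for bit in range(2):
        if (b >> bit) & 1:
            product ^= a << bit        # carry-less (XOR) multiplication
    if product & 0b100:                # reduce using x^2 = x + 1
        product ^= 0b111
    return product


def greco_latin_4():
    """Cell (r, c) pairs L1 = r + c with L2 = x*r + c (field addition is XOR)."""
    return [[(r ^ c, gf4_mul(2, r) ^ c) for c in range(4)] for r in range(4)]


SPEEDS_WPM = [45, 55, 65, 75]
BANDWIDTHS_KHZ = [3, 4, 5, 6]

# Hypothetical reading: row = subject group, column = test position,
# first symbol = presentation speed, second symbol = bandwidth.
plan = greco_latin_4()
for group, row in enumerate(plan):
    cells = [f"{SPEEDS_WPM[s]} wpm/{BANDWIDTHS_KHZ[b]} kHz" for s, b in row]
    print(f"group {group + 1}: " + ", ".join(cells))
```

Every speed and every bandwidth appears once in each row and column, and each speed-bandwidth combination appears exactly once, which is what removes the order effects mentioned above.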
Each list had a length of 79 words contained in ten unrelated sentences. Sixteen blind subjects, aged between 14 and 27, were used and were divided into eight groups for the experiment. Most of the subjects were students of the University of British Columbia, and most had a good knowledge of Braille (reading speed above 100 wpm). Only those who had been blind for just a few years were poor in Braille (30-60 wpm). All subjects were individually trained and tested.

This experiment was divided into three sessions of about one and a half hours each. At the beginning of the first session, the entire set of letter sounds was presented in random order to the subject for identification. Four subjects were tested at each bandwidth. The results of this part of the experiment furnished the subjects' identification scores for the synthesized letter sounds before training. Following this, the subject was taught to recognize the letter sounds at 6 kHz bandwidth. Subsequently the subject was given control of the teletype keyboard and could listen to any letter sound by striking the corresponding key. In this way, the subject could compare and contrast letters which were easily confused. A short quiz of letter identification was given to the subject from time to time. As the experiment progressed, the bandwidth was gradually reduced to 3 kHz, and words and sentences were introduced at gradually increasing speeds. By the end of this session, most subjects could recognize all the letter sounds and had experienced all four bandwidths and all four speeds. The second session concentrated on sentence reading, with letter sounds reviewed from time to time.
At the end of the second session, the subject was given a letter test to evaluate letter distinctiveness and the learning effect. Sentence tests were conducted in the third session. Prior to each test, practice sentences were given to familiarize the subject with the test condition. Test sentences were presented in the same manner as described in Section 4.2.1 of Chapter 4. There was a rest period of five minutes between lists.

6.2 Experimental Results

6.2.1 Letter Identification before Training

In order to find out how natural and distinct the synthesized letter sounds are, letter sounds in random order were presented to the subjects for identification at the beginning of the first session. The results of this part of the experiment, in the form of confusion matrices, are shown in Tables 12-16. The total scores (out of 104, i.e. 26 letters × 4 subjects) for each bandwidth are: 57 at 3 kHz, 70 at 4 kHz, 64 at 5 kHz and 64 at 6 kHz. The average score is 63.75, i.e. 61.3% correct.

Table 12. Confusion matrix of letter sounds at the 3 kHz bandwidth (rows: stimulus; columns: response).

Table 13. Confusion matrix of letter sounds at the 4 kHz bandwidth (rows: stimulus; columns: response).
Table 14. Confusion matrix of letter sounds at the 5 kHz bandwidth (rows: stimulus; columns: response).

Table 15. Confusion matrix of letter sounds at the 6 kHz bandwidth (rows: stimulus; columns: response).

Table 16. Combined confusion matrix of letter sounds for the four bandwidths (rows: stimulus; columns: response; 16 presentations per letter).

The combined scores shown in Table 16 indicate that some letters could be identified more easily than others. This table also indicates that letters having the same phoneme at the beginning (such as /dʒ/ in letters G and J, and /ɛ/ in F, L, M, N, S, etc.) or the same phoneme at the end (such as /i/ in B, C, D, etc.) are more easily confused among themselves. Naturalness and distinctiveness can be measured by the identification scores of this test. Letters ranked in this way are shown in Table 17 below.

Score      Letters
16         H, Y
15         N, P
14         I, O, R
13         F, G, S, T, X
11         B, E
10 to 0    A, D, C, L, Z, W, J, K, V, U, M, Q (in descending order; the scores occurring are 10, 9, 8, 7, 6, 4, 3, 1 and 0)

Table 17. Combined identification scores of letter sounds in descending order of correctness (perfect score: 16).

From this table, it can be seen that the
synthesis (using /ate/) of the diphthong /aɪ/, which occurs in both Y and I, was very successful. The low scores of U and Q indicate that direct combination of /i/ and /u/ does not give a good sound of the diphthong /ju/ or /iu/, which occur in U and Q respectively. The low scores of some other letters were mainly due to confusions: e.g. the letter M was most of the time misidentified as N (see Table 16), the letter J was misidentified as G, the letter U was misidentified as E, etc. Although the identification scores of some letters (particularly M, U and Q) seem disappointingly low, it will be shown that after a short period of training the majority of the subjects could identify all the letters without error.

6.2.2 Letter Identification after Training

During the progress of the experiment, it was observed that many subjects could easily learn to recognize all the letters correctly. In the letter test towards the end of the second session, five sounds of each letter in random order were presented to the subject for identification. Results of this test are shown in Table 18. In this test, 12 subjects had perfect scores, 3 subjects made only 1 mistake (out of 130 responses) and 1 subject made 8 mistakes. Of the total of 11 mistakes made, 9 were made at 3 kHz and 2 at 6 kHz. The extremely high average correct identification score of 99.5% suggests that all letter sounds can be learned to perfect recognition after a short period of training. It also indicates that the intelligibility of the constructed set of letter sounds is highly resistant to bandwidth reduction.
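The 99.5% figure can be reproduced from the error counts. A minimal Python sketch, assuming the diagonal of Table 18 reads as transcribed here (the names IMPERFECT and PRESENTATIONS are illustrative): 26 letters × 80 presentations give 2080 responses, of which 11 were errors, which also matches the per-subject breakdown given in the text.

```python
# Table 18 after training: correct identifications per letter, out of
# 16 subjects x 5 presentations = 80. Letters not listed were always correct.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
PRESENTATIONS = 16 * 5
IMPERFECT = {"D": 77, "M": 78, "Q": 78, "S": 78, "T": 79, "V": 79}

correct = {ch: IMPERFECT.get(ch, PRESENTATIONS) for ch in LETTERS}
total = len(LETTERS) * PRESENTATIONS          # all responses in the letter test
errors = total - sum(correct.values())

# Cross-check against the per-subject report: 12 perfect, 3 with one error, 1 with eight.
per_subject_errors = [0] * 12 + [1] * 3 + [8]
assert errors == sum(per_subject_errors)

print(total, errors, f"{100 * (total - errors) / total:.1f}% correct")
```

It prints `2080 11 99.5% correct`.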
Correct identifications out of 80 presentations:
Letter     D    M    Q    S    T    V    all other letters
Correct    77   78   78   78   79   79   80

Table 18. Combined confusion matrix of letter sounds for the four bandwidths after learning.

6.2.3 Test on Sentence Reading

Results of this test are summarized in Tables 19 and 20. It can be seen that the scores vary widely from subject to subject. A detailed analysis of the data showed that the subjects who were poor in Braille (particularly those who became blind in their adulthood) were also relatively poor in spelled speech; their results resembled those obtained with sighted subjects. This indicates that those who can read Braille well will also be good at reading spelled speech. There were also a number of exceptional subjects who could read spelled speech comfortably at a speed of 80 wpm. Table 20 shows that the average scores of the blind subjects are very high, ranging from 83.00 to 94.15% correct. It must be borne in mind that although it was an intelligibility test, a certain level of comprehension had already taken place, because the subjects were tested on sentences rather than on single words. Occasionally the subjects were asked what the sentences were about, and from their answers it was clear that very often they understood the sentences even when one or two words were unintelligible.

A spelled speech experiment was conducted some time ago using a highly motivated group of sighted subjects [8]. The same testing procedure and the same type of testing materials (PB sentences) were used. For approximately the same presentation speed (55 wpm in the present experiment with blind subjects and 54 wpm
in the experiment with sighted subjects), the average percent correct score of the blind subjects is 12.93% higher than that of the sighted subjects.

Interval   Group   L1      L2      L3      L4
I1         G1      80.38   77.22   82.91   87.97
           G2      95.57   79.11   85.44   76.58
           G3      96.20   96.20   84.18   91.77
           G4      88.61   93.04   96.20   99.37
I2         G5      84.18   79.11   90.51   90.51
           G6      94.94   91.77   89.87   86.71
           G7      92.45   98.10   89.87   90.51
           G8      81.65   82.91   90.51   95.57

Table 19. Scores of spelled speech experiment in percent correctness.

List              L1        L2        L3        L4
I1                90.19     86.39     87.18     88.92
I2                88.31     87.97     90.19     90.82
Average           89.25     87.18     88.69     89.87

Speed (wpm)       a1 (45)   a2 (55)   a3 (65)   a4 (75)
I1                93.51     89.87     87.66     81.65
I2                94.78     91.30     86.87     84.34
Average           94.15     90.59     87.27     83.00

Bandwidth (kHz)   b1 (3)    b2 (4)    b3 (5)    b4 (6)
I1                85.60     87.34     90.19     89.56
I2                88.61     89.87     89.56     89.24
Average           87.11     88.61     89.88     89.40

Table 20. Summary data of spelled speech experiment in percent correctness.

An analysis of variance of the data is shown in Appendix III. This analysis indicates that bandwidth and list of testing materials are both significant (p<0.05) and the effect of speed of presentation is highly significant (p<0.01), but the effects of row position and of the interval of pause between words are not significant. The bandwidth effect is significant because lowering the bandwidth reduces the pleasantness and clarity of the letter sounds. Since different lists of testing materials have different sentences containing words that differ both in length and in familiarity, it is not surprising to see the list effect being statistically significant. As expected, presentation speed is highly significant because intelligibility decreases sharply with increase in speed of presentation.
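The F ratios in Appendix III are the effect mean squares divided by the appropriate error mean square. The Python sketch below reproduces them, assuming that within-subject effects are tested against the within-subject error (MS = 3.151, 24 df) and between-subject effects against subjects within groups (MS = 84.077, 8 df); the appendix itself reports F ratios only for within-subject effects. The critical values are standard F-table entries, and the names EFFECTS, F_CRIT and f_test are illustrative. The (AB)' terms are omitted because no F ratio is reported for them.

```python
# Mean squares from Appendix III.
MS_WITHIN_ERROR, DF_WITHIN_ERROR = 3.151, 24     # within-subject error
MS_BETWEEN_ERROR, DF_BETWEEN_ERROR = 84.077, 8   # subjects within groups

# Upper critical values of F from standard tables: {(df1, df2): (F_0.05, F_0.01)}.
F_CRIT = {(1, 8): (5.32, 11.26), (3, 8): (4.07, 7.59), (3, 24): (3.01, 4.72)}

# (source, df, mean square, error mean square, error df)
EFFECTS = [
    ("I", 1, 13.141, MS_BETWEEN_ERROR, DF_BETWEEN_ERROR),   # interval of pause
    ("R", 3, 138.270, MS_BETWEEN_ERROR, DF_BETWEEN_ERROR),  # row (group order)
    ("IR", 3, 82.474, MS_BETWEEN_ERROR, DF_BETWEEN_ERROR),
    ("a", 3, 225.890, MS_WITHIN_ERROR, DF_WITHIN_ERROR),    # speed of presentation
    ("b", 3, 14.682, MS_WITHIN_ERROR, DF_WITHIN_ERROR),     # bandwidth
    ("L", 3, 13.182, MS_WITHIN_ERROR, DF_WITHIN_ERROR),     # list of testing materials
    ("aI", 3, 5.182, MS_WITHIN_ERROR, DF_WITHIN_ERROR),
    ("bI", 3, 8.891, MS_WITHIN_ERROR, DF_WITHIN_ERROR),
    ("LI", 3, 11.224, MS_WITHIN_ERROR, DF_WITHIN_ERROR),
]


def f_test(ms, df, ms_error, df_error):
    """Return the F ratio and a significance mark from the tabulated critical values."""
    f = ms / ms_error
    f05, f01 = F_CRIT[(df, df_error)]
    mark = "**" if f > f01 else "*" if f > f05 else "n.s."
    return f, mark


results = {src: f_test(ms, df, e, de) for src, df, ms, e, de in EFFECTS}
for src, (f, mark) in results.items():
    print(f"{src:3s} F = {f:7.3f} {mark}")
```

The reported values 71.688, 4.659, 4.183 and 3.562 are recovered to three decimals, and the interval (I) and row (R) effects fall well below their critical values, in line with the conclusions drawn from Appendix III.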
The insignificance of the row effect suggests that the order effect of subject groups in this test is not significant. As far as the interval of pause between words is concerned, although a pause proportional to the length of the preceding word gives some improvement over a fixed interval (average score: variable interval, 89.32% correct; fixed interval, 88.17% correct), its effect is not significant at the 0.05 level.

Since presentation speed and bandwidth are the two factors of most interest in this experiment, further tests were made to probe the nature of the differences among the treatment means of these two factors. The results of Newman-Keuls tests are shown in Appendix IV and can be summarized schematically as follows:

Speed (wpm)   45    55    65    75
45            -     **    **    **
55                  -     **    **
65                        -     **
75                              -

Bandwidth (kHz)   5     6     4     3
5                 -     ns    ns    **
6                       -     ns    *
4                             -     ns
3                                   -

** p<0.01   * p<0.05   ns: not significant

Thus it can be concluded that the various speeds of presentation differ significantly from one another, while among the different bandwidths only 3 kHz differs significantly from 5 kHz and 6 kHz. During the progress of the experiment, the subjects also reported that the 3 kHz letter sounds were less pleasant than those with higher bandwidths.

6.2.4 Word Length and Intelligibility

In this spelled speech experiment, a large number of errors occurred in words which contained a large number of letters. In view of this, the percentage of correct responses was calculated separately for different word lengths (the number of letters contained in a word). The effect of this factor on the entire experiment is shown below.
Word length (letters)   4 or less   5       6       7       8
% correct               92.13       83.58   80.21   76.84   62.50

Table 21. Overall percent correctness of the words according to word length.

From Table 21, it can be seen that word length has a very great effect on the correctness of the response: the longer the word, the more likely it is that an error will be made. It must be pointed out that this is true even in the case where a longer pause was given to decode the long words in the test. In some cases, the subjects reported insufficient time for the perception of the long words from the letters, and while they were still pondering the long words, the subsequent words followed and disruption occurred. In other cases they forgot the letters which occurred at the beginning or in the middle of the long words.

There are two main reasons why long words were less intelligible. First, long words occur less frequently in the English language, so they were less familiar to the subjects. Second, some subjects perceived the long words letter by letter, so they forgot some of the letters once the word length exceeded their memory limit. This second factor will become less prominent as the subjects become more skilful in decoding spelled speech. It is expected that through practice they will eventually perceive chunks of letters (e.g. in the form of syllables) at a time, rather than one letter at a time.
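The Newman-Keuls comparisons of Section 6.2.3 and Appendix IV can be approximately reproduced from the Table 20 averages. This is a simplified Python sketch under stated assumptions: the Appendix IV entries are taken to be differences between treatment totals in words correct (each level was heard by 16 subjects on a 79-word list, i.e. 1264 words per level); the standard error of a total is taken as the square root of 16 × 3.151, the within-subject error mean square of Appendix III; and the studentized-range critical values q for 24 error degrees of freedom come from standard tables. The protection rule is also simplified: a comparison nested inside a range already found non-significant at p = 0.05 is declared non-significant. Function and variable names are illustrative.

```python
from math import sqrt

MS_ERROR = 3.151        # within-subject error mean square, Appendix III (df = 24)
N_PER_LEVEL = 16        # each speed or bandwidth level was heard once by each subject
WORDS_PER_LIST = 79

# Studentized range critical values q(alpha; r, 24) for r = 2, 3, 4 (standard tables).
Q_CRIT = {0.05: {2: 2.92, 3: 3.53, 4: 3.90}, 0.01: {2: 3.96, 3: 4.55, 4: 4.91}}


def newman_keuls(mean_percent):
    """Newman-Keuls comparisons on treatment totals (words correct over 16 subjects).

    Returns {(higher level, lower level): (difference in words, mark)}.
    """
    totals = {k: v / 100 * WORDS_PER_LIST * N_PER_LEVEL for k, v in mean_percent.items()}
    order = sorted(totals, key=totals.get, reverse=True)
    se_total = sqrt(N_PER_LEVEL * MS_ERROR)   # standard error of a treatment total
    nonsignificant = []                      # index ranges that failed at p = 0.05
    results = {}
    for r in range(len(order), 1, -1):       # widest ranges first
        for i in range(len(order) - r + 1):
            j = i + r - 1
            diff = totals[order[i]] - totals[order[j]]
            if any(lo <= i and j <= hi for lo, hi in nonsignificant):
                mark = "n.s."                # nested in a non-significant range
            elif diff > Q_CRIT[0.01][r] * se_total:
                mark = "**"
            elif diff > Q_CRIT[0.05][r] * se_total:
                mark = "*"
            else:
                mark = "n.s."
            if mark == "n.s.":
                nonsignificant.append((i, j))
            results[(order[i], order[j])] = (round(diff), mark)
    return results


# Average rows of Table 20 (percent correct).
SPEED = {"45 wpm": 94.15, "55 wpm": 90.59, "65 wpm": 87.27, "75 wpm": 83.00}
BANDWIDTH = {"3 kHz": 87.11, "4 kHz": 88.61, "5 kHz": 89.88, "6 kHz": 89.40}

for table in (SPEED, BANDWIDTH):
    for (hi, lo), (diff, mark) in newman_keuls(table).items():
        print(f"{hi} vs {lo}: {diff:4d} {mark}")
```

Under these assumptions the differences come out as 45, 42, 54, 87, 96 and 141 words for speed (all p<0.01) and 6, 10, 16, 19, 29 and 35 words for bandwidth, with only 6 kHz vs 3 kHz (p<0.05) and 5 kHz vs 3 kHz (p<0.01) significant, which matches Appendix IV.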
7 Summary, Discussions and Suggestion

7.1 Summary

With the aid of a PDP-9 digital computer and its graphic display and interface accessories, a segmentation program was developed to examine the properties of letter sounds, to extract various segments of a speech sample, to display their waveforms on a precision oscilloscope, and to present the processed speech signals to a listener. In order to cut down the amount of memory required to store the letter sounds, redundant parts of spoken letter sounds were eliminated and a set of 18 basic phonemes was chosen to synthesize the letter sounds. Further saving was accomplished by storing only one pitch period of each vowel or vowel-like phoneme. In this way a very large memory reduction was possible: the amount of memory needed for good-quality letter sounds synthesized by concatenation of the basic phonemes was 64.8 kilobits, compared with 461.3 kilobits required to store a set of naturally produced letter sounds. By taking into account the amplitudes of individual phonemes, the 18 basic phonemes could be stored in 27.3 kilobits.

Two other methods of further reducing the memory storage were investigated. The first method involved a difference coding scheme which stored only the differences between successive samples. This method brought the memory storage down to 20.1 kilobits. The second method reduced the bandwidth of the letter sounds. Subjective experiments indicated that there was a statistically significant difference between the intelligibility score at the 3 kHz bandwidth and the scores at the 5 and 6 kHz bandwidths. Also, the quality of the letter sounds at the 3 kHz bandwidth was reported to be worse than at 5 and 6 kHz.
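The two storage-saving ideas summarized above, keeping a single pitch period of each vowel and storing only the differences between successive samples, can be illustrated with a toy waveform. This is a minimal Python sketch, not the coder actually used in the thesis: the synthetic PERIOD, the 100-sample period length, the two's-complement word lengths and the helper names are all assumptions, since the actual sampling rate and code word lengths are not given in this section. For reference, the reported figures correspond to a reduction from 461.3 to 64.8 kilobits (about 7:1) by phoneme concatenation, and from 27.3 to 20.1 kilobits (about 26%) by difference coding.

```python
from math import pi, sin


def delta_encode(samples):
    """Keep the first sample, then only the differences between successive samples."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(codes):
    """Rebuild the samples by a running sum of the stored differences."""
    samples = [codes[0]]
    for d in codes[1:]:
        samples.append(samples[-1] + d)
    return samples


def bits_needed(values):
    """Smallest two's-complement word length that can hold every value."""
    lo, hi = min(values), max(values)
    bits = 1
    while lo < -(1 << (bits - 1)) or hi > (1 << (bits - 1)) - 1:
        bits += 1
    return bits


def sustain(period, n_samples):
    """Lengthen a vowel by repeating one stored pitch period up to n_samples."""
    return [period[k % len(period)] for k in range(n_samples)]


# Hypothetical stored pitch period: 100 integer samples of a smooth periodic waveform.
PERIOD = [round(80 * sin(2 * pi * k / 100) + 20 * sin(6 * pi * k / 100)) for k in range(100)]

codes = delta_encode(PERIOD)
assert delta_decode(codes) == PERIOD                 # the coding is lossless
direct_bits = bits_needed(PERIOD) * len(PERIOD)
# First sample stored at full width, the remaining ones as short differences.
delta_bits = bits_needed(PERIOD) + bits_needed(codes[1:]) * (len(codes) - 1)
vowel = sustain(delta_decode(codes), 350)           # 3.5 repetitions of the period

print(direct_bits, delta_bits, len(vowel))
```

Because successive samples of a smooth waveform differ by much less than the full amplitude range, the differences fit in a shorter word than the samples themselves, while decoding by running summation returns the original samples exactly.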
Thus it was concluded that the difference coding scheme should be chosen to preserve the quality of the synthesized letter sounds.

An experiment was conducted to study the discrimination of six vowels; it indicated that subjects could learn to discriminate these speech stimuli even though they were only 10 ms in duration.

In the synthesis of letter sounds, some effort was made to make the synthesized letter sounds easily distinguishable. The distinctness of the letter sounds was tested in a pilot experiment using three blind subjects. This test indicated that blind subjects could read spelled sentences at between 60 and 70 wpm with high intelligibility after only one hour of contact with the letter sounds. The subjects also demonstrated that they could learn to recognize all 26 synthesized letter sounds after a short period of training. This was confirmed in a more intensive experiment which was designed to investigate simultaneously the effects of bandwidth reduction and rapid presentation of the spelled speech code, and also the effect of giving a longer pause to compensate for the longer time needed to decode long words. Sixteen blind subjects were used in this experiment, and the PDP-9 computer was programmed to simulate the digital spelled speech reading machine. Test sentences were typed in, and spelled sentences were generated at the output and presented to the subjects. The results of this experiment indicated that an average young blind subject could read spelled sentences at between 65 and 75 wpm with an intelligibility score of about 85% correct. A longer pause did not substantially increase the intelligibility of long words.
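Appendix II defines the two pause conditions as I1 = 4.4 T_L and I2 = m L_W + 2 T_L with m = 2.4 T_L / 4.4 ≈ 0.545 T_L, where T_L is the pause between letters and L_W the word length. The short Python sketch below (function names are illustrative) evaluates both in units of T_L. The two intervals coincide at L_W = 4.4 letters; reading 4.4 as the assumed average word length of the test lists is an inference, not something stated in the text.

```python
def fixed_pause(t_l):
    """Interval I1 of Appendix II: a constant 4.4 letter-pauses after every word."""
    return 4.4 * t_l


def proportional_pause(word_length, t_l):
    """Interval I2 of Appendix II: m * L_W + 2 * T_L with m = 2.4 * T_L / 4.4."""
    m = 2.4 * t_l / 4.4
    return m * word_length + 2 * t_l


T_L = 1.0  # express both intervals in units of the pause between letters
for length in (2, 4, 5, 6, 7, 8):
    print(length, round(fixed_pause(T_L), 2), round(proportional_pause(length, T_L), 2))
```

For example, an 8-letter word receives about 6.36 T_L under I2 instead of 4.4 T_L, while a 2-letter word receives about 3.09 T_L.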
Also, reduction of the bandwidth reduced the intelligibility. Thus, to preserve good quality of the letter sounds, a bandwidth between 4 and 6 kHz should be used. Unfortunately, the difference encoding scheme and bandwidth reduction could not be combined to lower the memory storage, and it was therefore concluded that for this developed set of letter sounds the difference coding scheme was preferred and a 6 kHz bandwidth should be retained.

7.2 Discussions

7.2.1 Spelled Speech and Elderly Subjects

It is obvious that spelled speech is of no use to blind people who do not know the spelling of words. Although this would prevent young children from using the machine to full advantage, the machine can nevertheless be used to train their spelling. So far, the results on spelled speech were obtained from young blind subjects. In order to see whether elderly people can make good use of this machine, two elderly blind subjects (one 51 years old and the other 58 years old) were used to explore this possibility. They were trained in the same manner as the 16 subjects, except that each had only two sessions instead of three and only the 6 kHz bandwidth was used. This is because they (and possibly other elderly blind subjects) had problems in recognizing the letter sounds even at the 6 kHz bandwidth, which might have resulted from aging of their auditory system. When isolated words were presented, they could recognize them at up to 40 wpm. But when sentences were presented, they could read only at about 20 wpm. This is in great contrast with the results obtained with young blind subjects.
Although both elderly subjects found this mode of communication very interesting, it seems unlikely that they can be trained to read at a much faster rate, and certainly not at the rate attained by young blind subjects. Whether they would like to use this machine for pleasure reading will have to be explored, but this machine can definitely help them to read short sentences or words such as the denominations of paper money, bank accounts, bills, names and addresses on envelopes, telephone numbers, etc.

7.2.2 Other Speech Aids for the Blind

It has been shown that spelled sentences can be read at a rate much faster than can be achieved with the Lexiphone code. Also, only a short period of training is required to recognize all the letter sounds. Although synthesized speech would be a much better and more natural mode of machine-to-man communication, it remains to be seen whether the cost and complexity of speech synthesis are justified. How expensive, how convenient and how natural will synthesized speech be? Will it be more expensive than employing a paid reader or using Talking Books? It is very difficult to give definite answers to these questions at the present moment, and indeed more ingenious work in this field is yet to come.

One other type of speech aid for the blind now under development produces a language-like auditory code called "Spelltalk" [40, 41]. Similar to spelled speech, Spelltalk has one fixed sound for each printed letter. The phonetic system of this code is based on the frequency of sound occurrence in the English language.
Thus the phoneme /ɪ/ is chosen for the letter I because, in the English language, the letter I has the following frequency distribution of sounds: /ɪ/, 68%; /aɪ/, 26%; and others, 6%. Although this phonetic system will produce a number of intelligible words because of its resemblance to the English language, there are many words which will sound unexpectedly strange and be difficult to pronounce and guess at, especially when presented at a high rate (e.g. /koenge/ stands for the word change, /-^Iky/ for kick and /beeAtj/ for beauty, etc.). A fair amount of training is required to understand Spelltalk, and further evaluation is necessary to find out whether blind people can be trained to understand this kind of machine language at a high rate.

7.3 Suggestions: Contracted Spelled Speech

One way of increasing the reading rate of spelled speech is to include contractions of spelled speech (i.e. spoken words) like the contractions used in Braille. It is reckoned that including a large vocabulary of words would lead back to the problem faced with the talking machine, i.e. a big increase in construction cost and complexity. However, with a limited vocabulary of spoken words formed by concatenation of the basic phonemes used for the letter sounds of the alphabet, the cost would not be increased appreciably. A suggested vocabulary for contracted spelled speech is shown in Table 22. This vocabulary consists of some of the most frequently used words [37]. The way these words are formed from the basic phonemes is also presented in Table 22. These words were also synthesized by the computer and presented to several listeners. Most of the words were recognized after only a short period of training.
Most of these words also occur in Grade II contracted Braille. Based on the frequency of occurrence of these words, it has been estimated that a listening rate of about 100 wpm could be expected for a mixture of spelled speech and these spoken words. From Table 22, it can be seen that some spoken words do not sound exactly like natural speech because of the limitation of the selected basic phonemes. These words, however, can be improved by constructing more vowels (such as /i/, Idl and /o/) without appreciably increasing the amount of memory required.

Word    the   of   and   to   in   that   it   is   for   be
Sound   dɛ    ov   ɛnd   tu   in   dɛt    it   is   vo    bi

Word    was   as   you   with   on   by    at   this   are   we
Sound   wos   ɛs   iu    wid    on   baɪ   ɛt   dis    a     wi

Table 22. List of words to be synthesized from basic phonemes.

REFERENCES

1. Beddoes, M. P., and Suen, C. Y. "Evaluation and a method of presentation of the sound output from the Lexiphone - a reading machine for the blind" IEEE Trans. Bio-Med. Eng., 18, 85-91, 1971.
2. Linvill, J. G. "Development progress on a microelectronic tactile facsimile reading aid for the blind" IEEE Trans. Audio and Electroacoustics, 17, 271-274, 1969.
3. Nye, P. W., and Bliss, J. C. "Sensory aids for the blind: a challenging problem with lessons for the future" Proc. IEEE, 58, 1878-1898, 1970.
4. Cooper, F. S., Gaitenby, J. H., Mattingly, I. G., and Umeda, N. "Reading aids for the blind: a special case of machine-to-man communication" IEEE Trans. Audio and Electroacoustics, 17, 266-270, 1969.
5. Smith, G. C., and Mauch, H. A. "Summary report on the development of a reading machine for the blind" Mauch Laboratories Summary Report to the Prosthetic and Sensory Aids Service, Veterans Administration, July 1971.
6. Beddoes, M. P., Fletcher, T. R., and Suen, C. Y. "A spelled speech reading machine for the blind" Paper presented at the International Electrical, Electronics Conference and Exposition, Toronto, Oct. 4-6, 1971.
7. Suen, C. Y., and Beddoes, M. P. "Some applications of a small digital computer in speech processing" J. Acous. Soc. Am., 50, 107(A), 1971. An expanded version of this paper is now in press in "Time-compressed Speech: Anthology and Bibliography" by Sam Duker, Scarecrow Press, New Jersey.
8. Suen, C. Y. "Towards an improved method of presenting the Lexiphone code and spelled speech" M.A.Sc. Thesis, University of British Columbia, May 1970.
9. Suen, C. Y., and Beddoes, M. P. "Discrimination of vowels of very short duration" Perception & Psychophysics, 11, 417-419, 1972.
10. Parmenter, C. E., and Trevino, S. N. "The length of the sounds of a Middle Westerner" Amer. Speech, 10, 129-133, 1935.
11. Lehmann, W. P., and Heffner, R-M. S. "Notes on the length of vowels (VI)" Amer. Speech, 18, 208-215, 1943.
12. Black, J. W. "Natural frequency, duration, and intensity of vowels in reading" J. Speech Hear. Dis., 14, 216-221, 1949.
13. House, A. S., and Fairbanks, G. "The influence of consonantal environment upon the secondary acoustical characteristics of vowels" J. Acous. Soc. Am., 25, 105-113, 1953.
14. Zimmerman, S. A., and Sapon, S. M. "Note on vowel duration seen cross-linguistically" J. Acous. Soc. Am., 30, 152-153, 1958.
15. Tiffany, W. R. "Sources of variation in vowel quality" J. Speech Hear. Res., 2, 305-317, 1959.
16. Peterson, G. E., and Lehiste, I. "Duration of syllable nuclei in English" J. Acous. Soc. Am., 32, 693-703, 1960.
17. House, A. S. "On vowel duration in English" J. Acous. Soc. Am., 33, 1174-1178, 1961.
18. Sharf, P. J. "Vowel duration in whispered and in normal speech" Language and Speech, 7, 89-97, 1964.
19. Siegenthaler, B. M. "A study of the intelligibility of sustained vowels" Quart. J. Speech, 36, 202-208, 1950.
20. Tiffany, W. R. "Vowel recognition as a function of duration, frequency modulation and phonetic context" J. Speech Hear. Dis., 18, 289-301, 1953.
21. Schwartz, M. P. "A study of thresholds of identification for vowels as a function of their duration" J. Aud. Res., 3, 47-52, 1963.
22. Fujisaki, H., and Kawashima, T. "The influence of various factors on the identification and discrimination of synthetic speech sounds" The 6th International Congress on Acoustics, Tokyo, Japan, 1968.
23. Powell, R. L., and Tosi, O. "Vowel recognition threshold as a function of temporal segments" J. Speech Hear. Res., 13, 715-724, 1970.
24. Peterson, G. E. "The significance of various portions of the wave length in the minimum duration necessary for the recognition of vowel sounds" Ph.D. Dissertation, Department of Speech, Louisiana State University, 1939.
25. Gray, G. W. "Phonemic microtomy: the minimum duration of perceptible speech sounds" Speech Monographs, 9, 75-90, 1942.
26. Joos, M. "Acoustic phonetics" Language Monograph, 24, No. 2 Suppl., 77-78, 1948.
27. Flanagan, J. L. Speech analysis synthesis and perception. Academic Press Inc., New York (1965), p. 214.
28. Thomas, I. B., Hill, P. B., Carroll, F. S., and Garcia, B. "Temporal order in the perception of vowels" J. Acous. Soc. Am., 48, 1010-1013, 1970.
29. House, A. S., Stevens, K. N., Sandel, T. T., and Arnold, J. B. "On the learning of speechlike vocabularies" J. Verb. Learn. Verb. Behav., 1, 133-143, 1962.
30. Stevens, S. S., and Davis, H. Hearing: its psychology and physiology. John Wiley, New York (1937), p. 102.
31. Suen, C. Y., and Beddoes, M. P. "Output sounds of a digital spelled speech reading machine for the blind" Proceedings of the International Conference on Speech Communication and Processing, Boston, April 24-26, 1972.
32. Miller, G. A. Language and communication. McGraw Hill, New York (1951), p. 64.
33. Slis, I. H., and Cohen, A. "On the complex regulating the voiced-voiceless distinction I" Language and Speech, 12, 80-102, 1969.
34. "1965 revised list of phonetically balanced sentences (Harvard sentences)" IEEE Trans. Audio and Electroacoustics, 17, 239-246, 1969.
35. Foulke, E. "A review of research on time compressed speech" Proceedings of the Louisville Conference on Time Compressed Speech, pp. 3-20, Oct. 1966.
36. Foulke, E., and Sticht, T. G. "The intelligibility and comprehension of time compressed speech" Proceedings of the Louisville Conference on Time Compressed Speech, pp. 21-28, Oct. 1966.
37. Dewey, G. Relativ frequency of English speech sounds. Harvard University Press, Cambridge (1950), p. 45.
38. Fletcher, H. Speech and hearing in communication. D. Van Nostrand Co. Inc. (1953), p. 87.
39. Winer, B. J. Statistical principles in experimental design. McGraw Hill, New York (1962), pp. 575-577.
40. Bellavia, D. C. "A prosthetic reading aid for the blind" Ph.D. Dissertation, Bio-Medical Engineering, Carnegie-Mellon University, Pittsburgh, Pennsylvania, 1970.
41. Longini, R. L. "Spelltalk: a new approach to reading machine output for the blind" AFB Research Bulletin, No. 24, 153-157, March 1972.
Appendix I
List of Phonetic Symbols Used

Consonants                  Vowels, Liquid, Nasals and Diphthongs
Symbol   Key word           Symbol   Key word
b        bee                i        beet
s        see                e        chaotic
d        deed               ɛ        set
dʒ       jade               a        father
k        case               o        notation
p                           u        pool
t        tea                ɪ        it
v                           ə        the
w        wide               ɔ        for
f        fife               ʌ        up
tʃ       church             l        elder
g        get                m        empty
j                           n        end
ʒ        vision             aɪ
                            ju       you
                            ɪu       mute

Appendix II
Plan of Spelled Speech Experiment with 16 Blind Subjects

Interval   Group   L1      L2      L3      L4
I1         G1      a3b3    a4b1    a1b4    a2b2
           G2      a1b2    a2b4    a3b1    a4b3
           G3      a2b1    a1b3    a4b2    a3b4
           G4      a4b4    a3b2    a2b3    a1b1
I2         G5      a3b3    a4b1    a1b4    a2b2
           G6      a1b2    a2b4    a3b1    a4b3
           G7      a2b1    a1b3    a4b2    a3b4
           G8      a4b4    a3b2    a2b3    a1b1

I: Interval of pause between words.
   I1 = fixed interval = 4.4 T_L
   I2 = interval increasing linearly with word length = m L_W + 2 T_L
   where T_L = pause between letters, L_W = word length, and m = 2.4 T_L / 4.4 ≈ 0.545 T_L.
G: Subject group; there were two subjects per group.
L: List of testing materials; lists 13, 16, 22 and 45 of the PB sentences were used.
a: Speed of presentation; four speeds were used, viz. 45, 55, 65 and 75 wpm.
b: Bandwidth of letter sounds; four bandwidths were used, viz. 0-3, 0-4, 0-5 and 0-6 kHz.

Appendix III
Analysis of Variance: Data Analysis of Spelled Speech Experiment

Source                               df    MS         F
Between subjects                     15
  I    Interval of pause              1     13.141
  R    Row                            3    138.270
  IR                                  3     82.474
  Subjects within groups              8     84.077
Within subjects                      48
  a    Speed of presentation          3    225.890    71.688**
  b    Bandwidth                      3     14.682     4.659*
  L    List of testing materials      3     13.182     4.183*
  aI                                  3      5.182
  bI                                  3      8.891
  LI                                  3     11.224     3.562*
  (AB)'                               3     24.157
  (AB)'I                              3     26.990
  Error                              24      3.151

** p<0.01   * p<0.05

Appendix IV
Results of Newman-Keuls' Test of Spelled Speech Scores

Speed (wpm.)
              a1 (45)   a2 (55)   a3 (65)   a4 (75)
a1 (45)          -                  87**      141**
a2 (55)                    -        42**       96**
a3 (65)                              -         54**
a4 (75)                                         -

Bandwidth (kHz)

              b3 (5)    b4 (6)    b2 (4)    b1 (3)
b3 (5)           -         6        16        35**
b4 (6)                     -        10        29*
b2 (4)                               -        19
b1 (3)                                         -

** p < 0.01   * p < 0.05

PUBLICATIONS

Suen, C.Y. and M.P. Beddoes. "Some applications of a small digital computer in speech processing." In S. Duker, Time-Compressed Speech: Anthology and Bibliography. Scarecrow Press (in press).
Suen, C.Y. and M.P. Beddoes. "Output sounds for a digital spelled speech reading machine for the blind." Proc. 1972 International Conference on Speech Communication and Processing, 1972.
Suen, C.Y. and M.P. Beddoes. "Discrimination of vowel sounds of very short duration." Perception and Psychophysics, 11, 417-419, 1972.
Beddoes, M.P., T.R. Fletcher and C.Y. Suen. "A spelled speech reading machine for the blind." Proc. International Electrical, Electronics Conference, 1971.
Suen, C.Y. and M.P. Beddoes. "Some applications of a small digital computer in speech processing." Paper presented at the 81st Meeting of the Acoustical Society of America, Washington, 1971.
Beddoes, M.P. and C.Y. Suen. "Evaluation and a method of presentation of the sound output from the Lexiphone - a reading machine for the blind." IEEE Trans. Bio-Medical Engineering, Vol. BME-18, 85-91, 1971.
Suen, C.Y. "Derivation of harmonic equations in nonlinear circuits." J. Audio Engineering Soc., 18, 675-676, 1970.
Yu, P.K. and C.Y. Suen. "Analysis of the Darlington configurations." Electronic Engineering, 40, 38-39, 1968.
Suen, C.Y. "Characteristics of the Darlington composite transistor." Int. J. Electronics, 24, 373-380, 1968.
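The two word-pause schedules defined in Appendix II can be compared with a short Python sketch. It assumes all times are expressed in units of $T_L$ (the pause between letters) and that word length $W_L$ is counted in letters; the function names are hypothetical. With $m = 2.4\,T_L/4.4$, the linear schedule $I_2 = m W_L + 2T_L$ coincides with the fixed pause $I_1 = 4.4\,T_L$ for a word of 4.4 letters, so $I_2$ is shorter than $I_1$ for words of four letters or fewer and longer for words of five or more. Reading 4.4 letters as a typical word length is an interpretation, not a statement made in the appendix.

```python
# Sketch comparing the two word-pause schedules of Appendix II.
# Assumption: times are in units of T_L (pause between letters) and
# word length W_L is counted in letters. Names are hypothetical.

T_L = 1.0                 # pause between letters (unit of time)
M = 2.4 * T_L / 4.4       # slope m, approximately 0.545 T_L per letter


def fixed_interval(word_length: float) -> float:
    """I1: the same pause after every word, 4.4 T_L (independent of length)."""
    return 4.4 * T_L


def linear_interval(word_length: float) -> float:
    """I2: pause that grows linearly with word length, m * W_L + 2 T_L."""
    return M * word_length + 2 * T_L


if __name__ == "__main__":
    # Tabulate both schedules for word lengths of 1 to 8 letters.
    for w in range(1, 9):
        print(f"W_L = {w}: I1 = {fixed_interval(w):.3f}, I2 = {linear_interval(w):.3f}")
```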
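The F ratios reported in Appendix III are consistent with dividing each within-subject mean square by the error mean square (3.151 on 24 df). The Python sketch below re-derives them and assigns the thesis's significance markers. The critical values $F_{0.05}(3, 24) = 3.01$ and $F_{0.01}(3, 24) = 4.72$ are taken from standard F tables rather than from the thesis, and the names used are hypothetical.

```python
# Re-derive the within-subject F ratios of Appendix III as MS_effect / MS_error.
# Assumption: each reported F uses the error term (MS = 3.151, df = 24) as its
# denominator. Critical values come from standard F tables, not the thesis.

MS_ERROR = 3.151
F_CRIT = {0.05: 3.01, 0.01: 4.72}  # F(3, 24) critical values

# Effects for which Appendix III reports an F ratio: (MS, reported F).
REPORTED = {
    "a  speed of presentation": (225.890, 71.688),
    "b  bandwidth": (14.682, 4.659),
    "L  list of testing materials": (13.182, 4.183),
    "LI": (11.224, 3.562),
}


def f_ratio(ms_effect: float, ms_error: float = MS_ERROR) -> float:
    """F ratio of an effect mean square against the error mean square."""
    return ms_effect / ms_error


def stars(f: float) -> str:
    """Significance marker in the thesis convention: ** p < 0.01, * p < 0.05."""
    if f > F_CRIT[0.01]:
        return "**"
    if f > F_CRIT[0.05]:
        return "*"
    return ""


if __name__ == "__main__":
    for name, (ms, reported) in REPORTED.items():
        f = f_ratio(ms)
        print(f"{name:30s} F = {f:7.3f} (reported {reported}) {stars(f)}")
```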
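Each Newman-Keuls entry in Appendix IV is a difference between two ordered treatment scores, so a difference spanning several steps should equal the sum of the adjacent-step differences along the way. The Python sketch below checks that property for both tables. The a1-a2 speed cell is missing from this copy; the value the sketch uses for it is inferred as (a1-a3) - (a2-a3) = 87 - 42 = 45, an assumption rather than a figure read from the thesis, and no significance level is inferred for it. The helper names are hypothetical.

```python
from itertools import combinations

# Differences between ordered treatment scores, as tabulated in Appendix IV.
BANDWIDTH_ORDER = ["b3", "b4", "b2", "b1"]  # column order used in the table
BANDWIDTH_DIFF = {
    ("b3", "b4"): 6, ("b3", "b2"): 16, ("b3", "b1"): 35,
    ("b4", "b2"): 10, ("b4", "b1"): 29,
    ("b2", "b1"): 19,
}

SPEED_ORDER = ["a1", "a2", "a3", "a4"]
SPEED_DIFF = {  # the a1-a2 cell is absent from this copy
    ("a1", "a3"): 87, ("a1", "a4"): 141,
    ("a2", "a3"): 42, ("a2", "a4"): 96,
    ("a3", "a4"): 54,
}


def is_additive(order, diffs):
    """Return True if d(i, k) == d(i, j) + d(j, k) for every triple i < j < k
    (in the given order) whose three differences are all present."""
    for i, j, k in combinations(order, 3):
        if (i, j) in diffs and (j, k) in diffs and (i, k) in diffs:
            if diffs[(i, j)] + diffs[(j, k)] != diffs[(i, k)]:
                return False
    return True


def infer_missing(diffs, i, j, k):
    """Hypothetical helper: infer d(i, j) as d(i, k) - d(j, k)."""
    return diffs[(i, k)] - diffs[(j, k)]


if __name__ == "__main__":
    print("bandwidth additive:", is_additive(BANDWIDTH_ORDER, BANDWIDTH_DIFF))
    a1_a2 = infer_missing(SPEED_DIFF, "a1", "a2", "a3")  # 87 - 42
    completed = {**SPEED_DIFF, ("a1", "a2"): a1_a2}
    print("inferred a1-a2:", a1_a2)
    print("speed additive:", is_additive(SPEED_ORDER, completed))
```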
