A secure radio-frequency assistive listening device for hard of hearing people Chan, Paul S. K. 1993

A Secure Radio-Frequency Assistive Listening Device for Hard of Hearing People

by

Paul Sui-King Chan

B.Eng., Queen Mary College, University of London, 1988

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (DEPARTMENT OF ELECTRICAL ENGINEERING)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September, 1993
© Paul Sui-King Chan, 1993

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada

ABSTRACT

Assistive listening devices allow hearing impaired listeners to achieve higher comprehension than with hearing aids alone in difficult listening conditions, which occur during business meetings, lectures, and presentations. These devices minimize the acoustic pathway between the speaker and listener by using some form of electromagnetic transmission. A major problem with existing assistive listening devices is that they provide little security for the users, as the signal may be picked up by unauthorized listeners outside the confines of the meeting room or hall.

The design, implementation, and testing of a secure radio-frequency assistive listening device are described in this thesis.
The device combines digital voice encoding technology with the direct sequence spread spectrum technique to provide a secure digital voice communication channel for hearing impaired people. It is shown that digital voice encoding algorithms such as the CCITT G.722 are acceptable for hearing impaired listeners. A transmitter and a receiver based on the Arlan 650™ wireless network card are implemented. Digitized speech samples are grouped into packets and transmitted using the TCP/IP protocol. The packet error rates of the device, measured in typical environments, are presented.

The degradation of intelligibility under packet loss conditions is also presented. It is shown that speech intelligibility decreases as the number of lost speech segments increases, when the lost segments are replaced using two simple packet loss replacement methods.

Table of Contents

ABSTRACT
List of Tables
List of Figures
ACKNOWLEDGMENTS

Chapter 1   Introduction
  Section 1.1   Motivation
  Section 1.2   Assistive Listening Devices
  Section 1.3   Objective & Overall Design
  Section 1.4   Spread Spectrum Techniques
  Section 1.5   Outline of the Thesis

Chapter 2   Selection of a Speech Encoding Method
  Section 2.1   Speech Characteristics and Intelligibility
  Section 2.2   Subjective Measurement of Different Speech Encoding Techniques
  Section 2.3   Selecting a Speech Encoding Technique for Hearing Impaired Listeners
    2.3.1   Procedures of the SPIN Test
    2.3.2   Results in Quiet Conditions
    2.3.3   Results in Noisy Conditions
    2.3.4   Speech Encoding Techniques for the Hearing Impaired Listeners

Chapter 3   Hardware Design & Implementation
  Section 3.1   Overall Design
  Section 3.2   Spread Spectrum Modem
  Section 3.3   Arlan 650™ Characteristics
  Section 3.4   Codec Interface

Chapter 4   Software Design & Implementation
  Section 4.1   Overall Design
  Section 4.2   TCP/IP Protocol
  Section 4.3   Voice Packet Transmission in the Radio-Frequency ALD System
  Section 4.4   Voice Packet Size in the Radio-Frequency ALD
  Section 4.5   Lost Packet Replacement Strategies

Chapter 5   Evaluation of the Radio-Frequency ALD
  Section 5.1   Packet Error Rates of the Radio-Frequency ALD
  Section 5.2   Miller and Nicely Audiometric Test
    5.2.1   Miller and Nicely Test Procedures
    5.2.2   Results of the Miller and Nicely Test
      Results of the pilot subject
      Results using normal hearing subjects
      Results using hearing impaired subjects
    5.2.4   Effect of Packet Loss on Speech Intelligibility

Chapter 6   Conclusions
  Section 6.1   Summary
  Section 6.2   Suggestions for Further Work

Appendix A   Example of a Revised SPIN Test Form
Bibliography

List of Tables

Table 1.1   Total population and persons who report impaired hearing in Canada
Table 2.1   Frequency bands making equal (5 percent) contributions to articulation index when all bands are at their optimum levels. Composite data for men's and women's voices
Table 2.2   Five-point scales for quality and impairment, and associated number scores
Table 2.3   Revised SPIN Test Score (Mean)
Table 3.1   Known commercially available spread spectrum modems
Table 3.2   Radio frequency and data rate of channels in Arlan 650™
Table 3.3   Signal description of the TMS320C30 serial ports
Table 3.4   Signal description of the MC14402
Table 4.1   Categorization of packet loss distortions
Table 5.1   Mean packet error rate in measurements #1-7
Table 5.2   Classification of consonants used to analyze confusion
Table 5.3   Overall scores of the pilot subject
Table 5.4   Miller and Nicely test results for the normal hearing subjects
Table 5.5   Miller and Nicely test results for the hearing impaired subjects

List of Figures

Figure 1.1   Distribution of persons with impaired hearing aged 15 and over residing in households, Canada
Figure 1.2   Proposed Assistive Listening Device
Figure 2.1   Idealized speech spectrum measured at one meter from lips
Figure 2.2   Articulation index versus cut-off frequency. All bands are at their optimum levels. Curve is based on about equal numbers of men's and women's voices
Figure 2.3   Subjective speech quality of different speech encoding techniques versus encoding bit rate
Figure 2.4   Relation between AI and various measures of speech intelligibility
Figure 2.5   Scoring nomograph for the Revised SPIN test
Figure 2.6   Setups for applying speech encoding techniques to recordings of Revised SPIN test forms
Figure 2.7   Classification of hearing impairment in relation to handicap for speech recognition
Figure 2.8   Test setup in quiet conditions
Figure 2.9   Audiograms of subjects taking Revised SPIN test in quiet conditions
Figure 2.10  Revised SPIN test results in quiet conditions
Figure 2.11  Test setup in noisy conditions; babble is presented at 8 dB HL below the speech signal
Figure 2.12  Frequency response of the Realistic FM wireless microphone system (Model #32-1221)
Figure 2.13  Audiograms of subjects taking Revised SPIN test under noisy conditions
Figure 2.14  Revised SPIN test results for noisy conditions
Figure 3.1   Overall design of the radio-frequency ALD system
Figure 3.2   Radio frequency spectrum of Arlan 650™ Channels 7 to 9
Figure 3.3   Block diagram of the Codec Interface
Figure 3.4   TMS320C30 serial port timing
Figure 3.5   Transmit and receive timing diagram for MC14402
Figure 3.6   TMS320C30 Processor Board and MC14402 Interface signal timing
Figure 3.7   Clock generation circuit in the Codec Interface
Figure 3.8   MC14402 to TMS320C30 Processor Board interface circuit
Figure 4.1   Overall software design of the radio-frequency ALD system
Figure 4.2   The three levels of service provided by TCP/IP
Figure 4.3   TCP/IP reference model
Figure 4.4   Communication process in the TCP/IP protocols
Figure 4.5   Communication processes in the radio-frequency ALD system
Figure 4.6   Structure of an IP datagram
Figure 4.7   The five subfields that comprise the Type of Service field
Figure 4.8   Fragmentation of an IP datagram
Figure 4.9   IP address classes
Figure 4.10  Structure of an Ethernet frame in the radio-frequency ALD
Figure 4.11  Overhead accompanying each voice packet
Figure 4.12  Data transmission rate versus packet size
Figure 4.13  Packet replacement techniques for the radio-frequency ALD
Figure 5.1   Location of radio-frequency ALD transmitter and receiver in Measurements #1, 2, 3, 6, 7, 8, 9 and 10
Figure 5.2   Location of radio-frequency ALD transmitter and receiver in Measurements #4 and 5
Figure 5.3   Packet error measurement of the radio-frequency ALD
Figure 5.4   Packet error measurements under interference
Figure 5.5   Packet error measurement under interference
Figure 5.6   A confusion matrix
Figure 5.7   A confusion matrix grouped by voiced and voiceless consonants
Figure 5.8   Confusion matrix for voiced and voiceless consonants
Figure 5.9   Simulated packet loss in the Miller and Nicely syllables
Figure 5.10  Setup for the Miller and Nicely test
Figure 5.11  Audiograms of the hearing impaired subjects
Figure 5.12  Results of the Miller and Nicely tests
Figure 5.13  Overall scores for the hearing impaired subjects
Figure 5.14  Results of different articulatory scores in silence substitution for normal hearing subjects
Figure 5.15  Results of different articulatory scores in packet repetition for normal hearing subjects
Figure 5.16  Results of different articulatory scores in silence substitution for hearing impaired subjects
Figure 5.17  Scores of the 16 syllables under different packet loss conditions

ACKNOWLEDGMENTS

I would like to thank Dr. C.A. Laszlo and Dr. C.S.K. Leung for their patience, suggestions and help during the project.

Special thanks are extended to Dr. M.K. Pichora-Fuller in the Department of Audiology and Speech Sciences for her advice and use of equipment to carry out the intelligibility tests. I would also like to thank M. Frauendorf, J. Nicol, S. Rosenberg, and O.G. Gilbert for help in getting the software for the intelligibility listening tests to work. In addition, I would like to thank all volunteers who participated in the intelligibility listening tests.

Financial support from the B.C. Science Council and NSERC grant OGP001731 is gratefully acknowledged.

Chapter 1  Introduction

1.1  Motivation

According to a Statistics Canada 1992 survey [15], of 25,061,270 Canadians, 1,022,220 report they have difficulty in hearing. This means that more than 1 of every 25 Canadian adults (4.1%) report that their hearing is impaired to some degree. More detailed statistics are given in Table 1.1. The reported rates of impaired hearing increase markedly with age, from slightly less than 1% for persons under 25 years of age residing in households, to almost half (47.5%) for persons 85 years of age and older. For persons with impaired hearing aged 15 and over residing in households, aging is the primary cause of impaired hearing (29.14%). Disease or stroke is the second most frequent cause (20.98%). Almost 1 out of 5 persons (18.11%) indicates that the impairment resulted from something associated with work.
Accidents and injuries account for 8.59%, and 6.86% have impaired hearing present at birth.

Table 1.1  Total population and persons who report impaired hearing in Canada [15].

                                                   Hearing Impaired
  Residence                    Total Population    Number       Percentage
  Total                        25,061,270          1,022,220    4.1
    Under 15                    5,325,185             48,390    0.9
    15 years and over          19,736,085            973,830    4.9
  Residing in households       24,806,180            908,825    3.7
    Under 15                    5,322,315             47,970    0.9
    15 years and over          19,483,865            860,855    4.4
  Health-related institutions     255,090            113,395    44.5
    Under 15                        2,870                420    14.6
    15 years and over             252,220            112,975    44.8

The hearing impaired population consists of people of all ages, with varied occupations, interests, hearing losses and needs. Within the hearing impaired population there are two groups, the deaf and the hard of hearing. The needs and characteristics of these two groups are very different. The primary mode of communication for the deaf is sign language, while the primary mode of communication for the hard of hearing is speech. The overall impact of hearing loss is significant, causing retardation of educational progress and the socialization process for children. For those who acquire hearing loss later in life, the consequences include loss of self-esteem, tensions in inter-personal relationships, difficulties in the work place, and varied psychosocial problems.

Impaired hearing may refer to minor difficulties or to the complete inability to use hearing for conversation. Using the responses to questions about the ability to carry on a conversation, three categories of impaired hearing have been defined in the survey. Fig. 1.1 shows the distribution of the three categories of hearing impaired persons aged 15 and over residing in households in Canada.

Category II, the middle range of hearing difficulty, has the largest number of persons, with 587,065 (68.2%), while 211,930 (24.6%) are in Category I, and 45,575 (5.3%) are in Category III.
The fact that fewer cases of impaired hearing are found in Category I than would be anticipated is probably due to persons with mild hearing impairments not reporting them as often as those with more severe impairments, because they are unaware of them or do not regard them as limiting. The fact that the survey defines a hearing impairment as one that limits the individual in daily activities reinforces this explanation.

Category I:   persons who say they have no difficulty hearing one person but have at least partial difficulty hearing in groups;
Category II:  persons who say they have partial difficulty hearing one person and have at least partial difficulty hearing in groups;
Category III: persons who are completely unable to hear in one-person conversations;
IND.:         persons who have impaired hearing but whose degree of impairment cannot be determined because key answers are missing.

Fig. 1.1  Distribution of persons with impaired hearing aged 15 and over residing in households, Canada [15].

Dramatic changes have occurred in hearing health care during the last decade due to the development of new surgical interventions and the application of technology. Some communication problems of the hard of hearing population can now be solved effectively due to advances in hearing aid technology. Unfortunately, hearing aids are able to provide only a partial solution. There are many situations in which the hearing aid does not provide adequate intelligibility of speech. Usually such problems occur in noisy environments with low signal-to-noise ratio or in poor acoustic circumstances where reverberation reduces speech intelligibility. Some problem areas for the hard of hearing population are

1. auditoriums or meeting rooms where speaker-to-listener distance, reverberation and noise reduce speech intelligibility,
2. television, telephone, and radio listening where poor fidelity and interference from room noise affect intelligibility, and
3.
person-to-person communication in such noisy environments as restaurants, automobiles and parties.

Hearing aids, which amplify sounds regardless of their origin, become almost useless when the acoustical signal-to-noise ratio is low. While persons with normal hearing understand speech when the signal-to-noise ratio is 6 dB to 12 dB [31], many hard of hearing people require a signal-to-noise ratio of 15 dB to 20 dB [28] to function adequately.

1.2  Assistive Listening Devices

Assistive listening devices offer hearing impaired listeners better communication than hearing aids alone can provide in difficult listening conditions. These devices minimize the acoustical pathway between the speaker and listener using some form of electromagnetic transmission. When used properly, these systems provide the hard of hearing listener with a high-level acoustic signal of good quality. These systems are having a great impact on the ability of hard of hearing individuals to participate in professional and business meetings, and in educational and leisure activities. There are four types of systems available:

1. Hard-wire systems, consisting of a microphone, an amplifier and a number of earphones connected by a wire cord.
2. Magnetic induction loop systems, consisting of a microphone, an amplifier and a coil of wire placed around the room. The amplified electrical speech signal is fed to the coil of wire, producing a modulated magnetic field in the room. Hearing aids with built-in induction coils can pick up the speech signal in the magnetic field and convert it into sound.
3. Infrared systems, consisting of one or more microphones wired into an amplifier/driver that powers one or more infrared emission panels. The modulated infrared light travels to individual receivers which convert the light back into an electrical signal, and provide acoustical, magnetic or direct input to the hearing aid of the user.
4.
FM systems, consisting of a microphone with a transmitter worn by the speaker and a radio receiver worn by the listener. Current radio-frequency systems are allocated the 72-76 MHz radio band by the FCC in the US and the DOC in Canada. Within this band 32 separate channels are available.

Each of these systems has specific advantages and operational limitations. The hard-wire system is the simplest, but the mobility of the user is limited by the length of the cord. Systems based on magnetic induction loops require either permanent installation or placement of wires around the room each time the system is used. In addition, the strength of the magnetic field emitted from the coil varies as a function of distance and shows other unpredictable variations within the room. The physical orientation of the induction coil in the hearing aid will also markedly affect the signal received by the user. Most systems based on infrared technologies also require extensive installation, governed by the need to provide even illumination by placing the infrared emission panels in appropriate locations in the room. In addition, infrared systems cannot be used in environments where the infrared transmission may be swamped by sunlight or other hot light sources. The FM system requires minimum expertise to set up and offers maximum mobility for the user.

A major problem with the loop and FM systems is that they provide minimal security for the users, as the signal may be picked up by unauthorized listeners outside the confines of the room or meeting hall. In some applications this is not a problem, but in many business meetings, lectures, and presentations the possibility of being overheard is not acceptable.

1.3  Objective & Overall Design

The objective of this study is to develop a specialized radio-frequency assistive listening device (ALD) for the hearing impaired population that combines the best features of existing methodologies.
The design criteria for this device include transmission security, transmission quality that satisfies the needs of the hard of hearing user, and ease of use of the system.

Fig. 1.2 shows the basic arrangement of the proposed ALD system. Listeners receive transmitted speech via a light-weight headphone, or couple the ALD to their personal hearing aid directly or via a personal loop if special amplification or signal processing is required to enhance their listening ability. This flexible approach will make the technology not only useful to the hearing impaired population, but also attractive to users of other communication devices, such as walkie-talkies and cordless telephones, who want security in their systems.

Fig. 1.2  Proposed Assistive Listening Device.

In this thesis an investigation of the use of spread spectrum techniques to secure the voice signal transmitted between the speaker and hearing impaired listeners is presented. The process requires digitizing voice into bits of data. Each bit of data is encoded by a pseudorandom sequence for transmission. At the receiver end, decoding is accomplished by correlating the received signal with a synchronized replica of the same pseudorandom sequence which was used to transmit the data. The decoded digital signal is then converted back into voice signals. The idea behind this scheme is that while each authorized receiver can tune in to different transmitters by using their assigned pseudorandom code sequences, unauthorized receivers are "locked out". Given
The initial ap-plications have been in anti-jamming of military tactical communications, in missileguidance systems, and in anti-multipath systems. The spread-spectrum technique is ameans of transmission in which the signal occupies a bandwidth in excess of the mini-mum necessary to send the information. The band spread is accomplished by means ofa pseudorandom code which is independent of the data. Synchronized reception with thepseudorandom code at the receiver is used for de-spreading and subsequent data recov-ery. Although the use of the spread-spectrum technique means that each transmissionrequires the use of a wide band of the spectrum, many of the requirements demanded byour application can be satisfied simultaneously. These include•Low probability of intercept;•Anti-jamming;•Anti-interference;-Low-density power spectra for hiding the signal;•Message screening from eavesdroppers; and•Selective addressing capability.8Chapter 1 IntroductionThere are several means by which the spectrum of a signal can be spread.1. Modulation of a carrier by a digital pseudorandom code sequence whose bit rateis much higher than the information bit rate. Such a technique is known as "directsequence" modulation.2. Carrier frequency shifting in discrete increments in a pattern dictated by apseudorandom code sequence. The transmitter jumps from frequency to frequency withinsome predetermined set. This is called "frequency hopping".3. Conceptually similar to the "frequency hopping" technique is the "time hopping"technique in which the bursts of data signals are transmitted at pseudorandom times.4. Hybrid combinations of the above techniques are also frequently used.Direct sequence (DS) methods are the most common in spread spectrum systems.This is because of their relative simplicity and efficiency of the technique. For example,DS methods do not require a high speed frequency synthesizer as in the frequency hoppingsystems. 
Compared to DS systems, simple time-hopping systems offer little in the way of interference rejection, since a continuous carrier at the signal center frequency can block communications effectively. Because of this relative vulnerability to interference, time hopping techniques are usually combined with other spread spectrum techniques. The advantage of combining two spread spectrum techniques in a hybrid system is to obtain characteristics which are not available from a single spread spectrum technique. The specific construction of a hybrid system usually depends on its application, and its implementation is usually more complicated than that of systems using a single spread spectrum technique. Each of the spread spectrum techniques discussed can be used to achieve the desired spectrum spreading effect. Each technique is important in the sense that it has useful applications; the historical tendency has been to use each method mainly in a particular field of applications [26].

1.5  Outline of the Thesis

Chapter 2 examines speech encoding techniques suitable for this application. A discussion of the characteristics of speech signals and factors affecting speech intelligibility appears in Section 2.1. Section 2.2 is a review of some commonly used speech encoding techniques, with emphasis on their relative quality and bit rates. Section 2.3 presents the results of using the Speech Perception in Noise (SPIN) test to select a speech digitizing technique for the radio-frequency assistive listening device.

Chapter 3 deals with the hardware design of the radio-frequency assistive listening device. Section 3.1 discusses the overall design. A review of known spread-spectrum modems is presented in Section 3.2. Section 3.3 presents the features and characteristics of the Arlan 650™ spread-spectrum modem.
Section 3.4 discusses the hardware interfaceused to send the digitized voice data to the modem.Chapter 4 deals with the software implementation aspects of the radio frequencyassistive listening device. Section 4.1 presents the overall software design. TransmissionControl Protocol/Internet Protocol (TCP/IP), the protocols used to transport the digitizedvoice signal, is discussed in section 4.2. Section 4.3 deals with the transmission of thedigitized voice data in the TCP/IP hierarchy. Section 4.4 is an analysis of packet sizewith respect to the bit transmission rate. Section 4.5 explains how missing packets canbe replaced.Chapter 5 contains the results of tests carried out with the modem. Section 5.11 0Chapter 1 Introductiondescribes the packet error rates of the radio-frequency assistive listening device indifferent environments. Section 5.2 deals with applying the Miller and Nicely audiometrictest to determine the effect of missing packets.Chapter 6 contains the conclusions and suggestions for future work.11ao& 3530o 25a)20Chapter 2 Selection of a speech encoding methodChapter 2 Selection of a speech encoding method2.1 Speech Characteristics and intelligibilitySpeech consists of a succession of sounds varying rapidly from instant to instantboth in intensity and in frequency. The sounds of speech contain energy between at least100 and 8000 Hz. Spectral analyses of spoken English have been studied extensivelyin the past [17], [20], [8], and was found that the spectra of individual voices differconsiderably. For comparison and testing purposes an idealized speech spectrum (Fig.2.1), based on average measurement of a group of speakers, has been developed byFrench and Steinberg [20].Idealized speech spectrum10 2^10 3^104Frequency (Hz)Fig. 2.1 Idealized speech spectrum measured at one meter from lips 120].12Chapter 2 Selection of a speech encoding methodThis spectrum is measured at a distance of one meter from the lips in a soundfield free from reflections. 
The intensity of this spectrum, integrated over the entire frequency range, amounts to 65 dB relative to 10^-16 watt/cm^2. Audibility over the entire frequency range is not required for good intelligibility of speech. Fletcher [12] states that substantially complete fidelity for the transmission of speech is obtained by a system having a frequency range from 100 to 7000 cycles per second and a range of 40 decibels in amplitude.

The contribution of an individual speech sound to the comprehension of an utterance is a very complex matter. Frequency, intensity, temporal characteristics, speech context and speaker characteristics interact and contribute to the intelligibility of speech. The effective proportion of the speech signal available to a listener depends upon the intensity of the various sound components in their ears and the intensity of unwanted sounds that may be present.

The Articulation Index (AI) [1], [17], [20] has been used as a quantitative measure of the intelligibility of speech transmitted over communication systems. It is based on the concept that any narrow band of speech frequencies of a given intensity carries an independent contribution to the total index, and that the total contribution of all the bands is the sum of the contributions of the separate bands. The magnitude of this index is taken to vary between zero and unity, the former applying when the received speech is completely unintelligible, the latter to the condition of best intelligibility.

French and Steinberg [20] investigated the relationships between articulation index, speech intensity, and frequency response of communication systems using a group of listeners and talkers.
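The additive band model can be illustrated with a small numeric sketch. The per-band weights below are invented for illustration only; they are not the published data, which must be read from the speech and noise levels in each band.

```python
# Articulation index as a sum of independent band contributions.
# With 20 bands of equal importance, each band's maximum contribution
# (dA)_max is 0.05 (cf. Table 2.1).
dA_max = 0.05

# W_n: fraction of each band's maximum contribution actually delivered to the
# listener. Hypothetical values for a channel that degrades the upper bands.
W = [1.0] * 12 + [0.6] * 4 + [0.2] * 4

AI = sum(w * dA_max for w in W)
print(round(AI, 2))  # 0.76: below 1.0 because the upper bands are attenuated
```

A system delivering all 20 bands at their optimum levels would score 1.0; removing bands or adding noise lowers the sum band by band.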
They derived the following equation:

    A = Sum over n of  Wn * (dA)max

where

    A       = total articulation index,
    (dA)max = maximum contribution of any one band,
    Wn      = percent of the maximum contribution contributed by band n.

The relationship between Wn and the respective levels of speech and of acoustical noise in the ear is presented in [20]. Methods for the calculation of the articulation index are given in [1].

Fig. 2.2  Articulation index versus cut-off frequency. All bands are at their optimum levels. Curve is based on about equal numbers of men's and women's voices [20].

From their experimental data, French and Steinberg [20] derived a curve of articulation index versus cut-off frequency of low pass filters under the special condition of optimal loudness at the ear and a negligibly low noise level (Fig. 2.2). Two important points can be concluded from their studies:

1. Extending the frequency range of a communication system below 250 Hz or above 7000 Hz contributes almost nothing to the intelligibility of speech.
2. Each of the frequency bands in Table 2.1 makes a 5 percent contribution to the articulation index, provided that all bands are at their optimal levels.

Table 2.1  Frequency bands making equal (5 percent) contributions to articulation index when all bands are at their optimum levels. Composite data for men's and women's voices [20].

  Band  Frequencies (Hz)    Band  Frequencies (Hz)    Band  Frequencies (Hz)    Band  Frequencies (Hz)
  1     250-375             6     955-1130            11    1930-2140           16    3255-3680
  2     375-505             7     1130-1315           12    2140-2355           17    3680-4200
  3     505-645             8     1315-1515           13    2355-2600           18    4200-4860
  4     645-795             9     1515-1720           14    2600-2900           19    4860-5720
  5     795-955             10    1720-1930           15    2900-3255           20    5720-7000

It should be noted that the perceived naturalness of the speech is considered as a
It is known that although low frequency enhancement (50 Hz to200 Hz) does not contribute to intelligibility, it will increase the naturalness of speech.[211The required performance level of a given communication system can only beevaluated by the users of the system. Hirsh et al. [13] suggests that for low passfiltered speech, intelligibility dropped only slightly when frequencies above 1600 Hzwere removed. There is no single value of the articulation index which can be specified15Chapter 2"Selection of a speech encoding methodas a criterion for "acceptable" communication. Present-day commercial communicationsystems are usually designed for operation under conditions that provide articulationindexes in excess of 0.5 [1].2.2 Subjective Measurement of different speech encodingtechniquesWith the advent of digital communications, considerable interest has been focusedon the efficient encoding of speech. Over the past decade a large number of digitalcoding algorithms have been investigated for a wide variety of applications [24], [19],[14], [27]. Numerous signal processing techniques taking advantage of speech productionand perception properties have been proposed and studied for the purpose of reducingthe required transmission or storage rate for digitized speech. These techniques rangefrom low to high complexity in design, and offer a corresponding trade-off betweenperformance and complexity.The performances of different speech encoding techniques are usually evaluated byquality or intelligibility tests. The testing procedure will depend on whether the issueis the quality or the intelligibility of the digitized voice. In the high quality range,intelligibility is good and the perceived naturalness of the digitized speech is usuallyassessed by qualitative measurements. 
In the low quality range, quality is low anyway and intelligibility criteria will determine whether the coding technique is acceptable. Opinion rating on a subjective five-point scale is commonly used to assess the degree of speech quality or speech impairment (Table 2.2).

Table 2.2 Five-point scales for quality and impairment, and associated number scores.

Score  Quality Scale         Impairment Scale
5      Excellent             Imperceptible
4      Good                  (Just) Perceptible but not Annoying
3      Fair                  (Perceptible) Slightly Annoying
2      Poor                  Annoying (but not Objectionable)
1      Unsatisfactory (Bad)  Very Annoying (Objectionable)

The final result from these tests, in the simplest form, is the pooled average judgement called the Mean Opinion Score (MOS) for a group of listeners. A MOS of 5.0 implies perfect quality, but this is hardly ever attained, even by undigitized speech. Unimpaired and extremely high quality speech tends to get an MOS rating between 4 and 5. This is due to the fact that subjects may sometimes award a score such as 4 to a speech sample that ideally deserves a 5, or they may occasionally rank a slightly impaired stimulus higher than the original. A MOS approaching 4.5 signifies high-quality or near-transparent coding. A score of 3.5 on the MOS scale indicates that there is some detectable distortion but very little degradation of intelligibility. Lowest in the hierarchy of speech coding is synthetic speech with a MOS not exceeding 3.0; this quality is characterized by high intelligibility but an inadequate level of naturalness and speaker recognizability.

Fig. 2.3 Subjective speech quality of different speech encoding techniques versus encoding bit rate.

Fig. 2.3 is a quantitative description of speech quality with various commonly used speech encoding techniques as a function of their bit rates in kilobits per second (Kbps) [33], [24], [21].
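Pooling opinion scores into a MOS, as described above, is a plain average over the panel of listeners; a minimal sketch with a hypothetical panel:

```python
def mean_opinion_score(ratings):
    """Average the per-listener scores (five-point scale of Table 2.2)."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)

# Hypothetical panel of eight listeners rating one coder's output:
panel = [4, 5, 4, 4, 4, 5, 4, 4]
print(mean_opinion_score(panel))  # 4.25 -- between "good" and "excellent"
```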
The subjective quality of digitized speech tends to increase as the bit rate is increased. The complexity of different encoding techniques is also illustrated in Fig. 2.3. The level of complexity is quantified by implementation criteria such as the number of multiply/add operations involved per waveform sample. The high complexity encoding techniques are also characterized by the highest levels of encoding delay, making the more complex techniques less useful than others in real time applications. However, advances in digital technology tend to make complex digitization techniques more practical.

A brief description of the digital encoding techniques shown in Fig. 2.3 follows.

Pulse Code Modulation (PCM) used in telecommunications usually samples at a rate of 8 KHz and quantizes the amplitude of each sample by rounding off each sample value to a set of discrete values. Amplitude compression is typically used, following either the so-called μ-law or A-law standards. These are characterized by fine quantizing steps for the very frequently occurring low amplitude speech segments; coarser quantizing steps are used for the occasional large amplitude segments. [21]

Adaptive Differential Pulse Code Modulation (ADPCM) exploits the high amplitude correlations in adjacent speech samples to reduce the bit rate. It is based on the notion of adaptively quantizing the difference in amplitude of adjacent samples. [21]

Sub-band coding (SBC) decomposes the 0-4 KHz frequency band using band pass filters into the sub-bands 0-500, 500-1000, 1000-2000, 2000-3000 and 3000-4000 Hz. An ADPCM coder with fixed first-order prediction is employed for each band. [33]

The CCITT G.722 algorithm is a form of SBC that aims to provide a 7 KHz audio bandwidth at 64 Kbps. Two sets of filters are used to divide the audio signal, sampled at 16 KHz, into a high band and a low band. Two ADPCM coders are used to quantize the high and low band components to be transmitted. [24]

In Adaptive Transform Coding (ATC), the input speech is blocked into frames of data and transformed by a symmetric discrete Fourier transform [33]. Each frame is represented by a set of transform coefficients, which are separately quantized and transmitted. At the receiver, the quantized coefficients are inversely transformed to produce a replica of the original input frame. [14]

Code-Excited Linear Prediction (CELP) is one of a class of coders known as analysis-by-synthesis coders. In the actual encoding process, the encoder first buffers an input speech frame of about 20 ms, and then performs linear prediction analysis on the buffered speech. During the analysis stage, it attempts to find the best parameter values so that the error between the input speech frame and its synthesized output frame at the decoder is minimized. These values are encoded and sent to the decoder. The decoder decodes and reproduces speech frame-by-frame. [16]

The Adaptive Predictive Coder (APC) is an ADPCM coder which uses two stages of prediction: a high-order short-term predictor based on the spectral envelope, and a long-term predictor based on the pitch or periodicity information. The prediction results in a small prediction error which is quantized by an adaptive step-size quantizer. The coefficients of the long-term predictor are adapted every 10 ms, while the coefficients of the short-term predictor are adapted every frame or every alternate frame. [33], [14]

When low bit rate encoding techniques are used, especially under imperfect communication conditions (e.g. errors in transmitting digitized voice, listeners with hearing impairments), intelligibility can be a serious issue. In these situations, the intent will not be to measure speech quality, which will be quite low anyway, but rather to measure features that preserve information contrast (e.g. consonants in speech).
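The μ-law companding described for PCM above can be illustrated with the continuous μ-law curve (μ = 255). This sketch quantizes uniformly in the compressed domain rather than using the exact G.711 segment tables, so it is an approximation of a real codec, not a standards-conformant implementation:

```python
import math

MU = 255.0  # North American mu-law companding constant

def compress(x):
    """Map x in [-1, 1] onto the mu-law curve: fine steps near zero,
    coarse steps for the occasional large-amplitude segments."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode(x, bits=8):
    """Uniformly quantize the compressed value to 2**bits levels."""
    levels = 2 ** bits - 1
    return round((compress(x) + 1.0) / 2.0 * levels)

def decode(code, bits=8):
    levels = 2 ** bits - 1
    return expand(code / levels * 2.0 - 1.0)

# Quiet samples are reproduced much more precisely than loud ones:
for x in (0.01, 0.5):
    print(x, decode(encode(x)))
```

Running the loop shows the absolute round-trip error for the 0.01 sample is roughly an order of magnitude smaller than for the 0.5 sample, which is exactly the behaviour the companding law is designed to give.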
Various speech intelligibility tests have been developed by audiologists to estimate a person's ability to understand conversational speech. Three commonly used intelligibility tests are phonetically balanced (PB) word tests, syllable tests, and sentence tests.

The phonetically balanced word tests consist of word lists in which the phonetic composition of all words in the lists is equivalent and representative of everyday English speech. These tests utilize an open-set response format, i.e. the listener is not presented with a closed set of several alternatives of monosyllabic words for each test item. The set of possible responses to a test item is open and is limited only by the listener's vocabulary.

The syllable tests have a closed-set format and are offered as an alternative to the conventional open-set PB word tests. The advantages of the closed-set format include the elimination of examiner bias, ease of administration and simplified scoring techniques. Another advantage of syllable tests is the possibility of obtaining a somewhat detailed picture of the type of errors made by the listener and not just an indication of the total number of errors made.

In an attempt to approximate everyday speech more closely, several intelligibility tests have been developed using sentences as the basic items. An advantage of speech tests using sentences over other material is that sentences approximate the spectral and contextual characteristics of connected discourse.

Fig. 2.4 Relation between AI and various measures of speech intelligibility.

The relationship between articulation index and intelligibility score for a given group of talkers and listeners is presented in Fig. 2.4 [1]. These curves show that the intelligibility score, in percent correct, is highly dependent on the constraints placed upon the message being communicated.
The greater the constraints, the higher the percent intelligibility score for a given articulation index. The constraints here refer to grammatical structure, contextual information found in sentences, or limitations in vocabulary size and syllabic length of words.

2.3 Selecting a speech encoding technique for hearing impaired listeners

Based on the Mean Opinion Scores in Fig. 2.3, two relatively simple speech encoding techniques with high MOS, the μ-law PCM and the CCITT G.722 algorithm, were selected as candidates for the proposed radio-frequency ALD. The μ-law PCM coder has a bandwidth of 3.4 KHz and the CCITT G.722 algorithm has a bandwidth of 7 KHz. Under ideal listening conditions, according to Fig. 2.2, the bandwidths of the μ-law and the CCITT G.722 algorithms give articulation indexes of 0.8 and 1.0 respectively. These algorithms should thus produce a high intelligibility score for most of the intelligibility tests in Fig. 2.4.

For the hearing impaired population, intelligibility rather than quality is the important factor in deciding which speech coder is acceptable. A sentence-based intelligibility test, the Revised Speech Perception in Noise (SPIN) test, is used to determine whether the two encoding techniques are acceptable for the hard of hearing population.

The SPIN test was originally developed by Kalikow et al. [5] with the purpose of making a speech test which better reflects everyday listening conditions than the tests which use isolated words as stimuli. Each test item in the SPIN test is a sentence of five to eight words in length. There are ten forms in the SPIN test, and each form contains 50 sentences. Of the 50 sentences in a form, 25 are high predictability sentences, which provide the listener with linguistic clues about the final word (e.g. The watchdog gave a warning GROWL). The other 25 sentences are low predictability sentences, which provide little or no information about the final word (e.g.
I had not thought about the GROWL). Recordings of the test sentences on magnetic tape are provided by the developer of the test. The listener is asked to repeat the last word of each sentence. The last word of the sentence is always a monosyllabic noun. The nouns used have word frequencies of from 5 to 150 per million words in the Thorndike-Lorge lists [9]. Each of these nouns is used twice, once in a high predictability sentence and once in a low predictability sentence. The background noise in the test is a babble of 12 voices on a second sound track. Kalikow et al. generated the babble track by adding together tape recordings of 12 talkers reading aloud from continuous text. Each recorded test form is preceded by a calibration tone (1,000 Hz) to control the sound level output.

The SPIN test was revised by Bilger [25] in order to standardize its use for hard of hearing subjects. The test was administered to 128 hard of hearing subjects, with the purpose of equalizing the difficulty (i.e. score) for each form. From the original ten forms, eight equivalent forms known as the Revised SPIN test were constructed (Appendix A).

The score of the Revised SPIN test is known as the "percent hearing for speech" and is determined by the nomograph shown in Fig. 2.5. The construction of the nomograph was based on statistical analysis done on the test results of the 128 hard of hearing subjects [25]. Once the numbers of correctly identified high and low predictability items are determined, the nomograph can be used to determine the "percent hearing for speech" for that person (e.g. a high predictability score of 25 and a low predictability score of 22 will give a percent hearing for speech of 97.0). This percentage should fall in the acceptance region.
If it does not, then the score probably underestimates the subject's hearing for speech.

Fig. 2.5 Scoring nomograph for the Revised SPIN test [25]. The nomograph is indexed by the number of correct low-context items (columns) and high-context items (rows); test scores outside the acceptance region underestimate the subject's ability to hear speech, and rejected scores are indicated by X's.

The acceptance region is the region where the high and low predictability item scores of a Revised SPIN test are acceptable.
In general, scores on the diagonal (high predictability item score = low predictability item score) or below the diagonal on the nomograph are unacceptable; the exceptions are pairs of high-low scores such as 25-24, 24-24, 1-1 or 0-1, which indicate that the subject has extremely good or extremely poor hearing. Based on the high-low pair distributions of the 128 hard of hearing subjects, regions above the diagonal that fall far away from the distribution are rejected. The acceptance region is the region between the lines drawn through the matrix. Percentages not rejected, but outside the acceptance region, are left to the discretion of the audiologist.

The difference between a person's scores on the high and low predictability sentences can be used as an indicator of how well that individual makes use of the context of a sentence. The scores on the low predictability sentences should reflect how well the peripheral auditory system (outer ear, middle ear and inner ear) processes speech, while the score for the high predictability sentences should reflect how well the encoded information about speech is used by the individual.

2.3.1 Procedures of the SPIN test

The purpose of the experiment is to determine whether the intelligibility of a hearing impaired listener will be reduced when the speech material is processed by digital encoding methods under quiet and noisy conditions. A magnetic recording of the eight Revised SPIN test forms was obtained from the U.B.C. School of Audiology and Speech Sciences. Magnetic recordings of the Revised SPIN tests processed by the μ-law and the CCITT G.722 algorithms were made using the setup shown in Fig. 2.6.

Fig. 2.6 Setups for applying speech encoding techniques to recordings of Revised SPIN test forms.

Only the speech channel is processed by the speech coding methods.
It is assumed that only a small amount of ambient noise will get into the ALD system in a realistic application. Ambient noise reaching the listener's ear has a bandwidth which is not limited by the bandwidth of the ALD system.

Ten hearing impaired listeners participated in the experiment. The ten subjects were divided into two groups of five; one group was tested under quiet conditions and the other group under noisy conditions. In each group, unprocessed and processed sentences were randomly selected and given to each subject in a random order. The unprocessed sentences are from the original Revised SPIN test and the processed sentences are from the tapes processed by the μ-law and the CCITT G.722 algorithms. All the participating subjects had bilateral sensorineural hearing loss (hearing loss caused by damage to the inner ear or the auditory nerve). Prior to the Revised SPIN tests, each subject underwent a basic hearing screening test to determine their degree of hearing loss.

The basic tool for the assessment of the degree of hearing loss is the audiogram. An audiogram is a chart used to record graphically the hearing threshold of an individual at different frequencies. The hearing threshold is typically defined as the lowest (softest) sound level needed for a person to detect the presence of a signal approximately 50% of the time. The audiogram plots the signal frequency against the hearing level in decibels hearing level (dBHL). The horizontal line at 0 dB represents normal hearing sensitivity for the average young adult. Results plotted on the audiogram can be used to classify the extent of hearing loss. Classification schemes using the pure tone audiogram are based on the fact that there is a strong relationship between the thresholds for those frequencies known to be important for hearing speech (500, 1000, 2000 Hz) and the lowest level at which speech can be recognized accurately 50% of the time.
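The average of the thresholds at these three frequencies, often called the pure tone average, is simple arithmetic; the classification boundaries in this sketch are approximate readings of Fig. 2.7 and are illustrative only, not exact clinical values:

```python
def pure_tone_average(t500, t1000, t2000):
    """Mean hearing level (dBHL) at the three speech-important frequencies."""
    return (t500 + t1000 + t2000) / 3.0

def classify(pta):
    """Approximate category boundaries in the spirit of Fig. 2.7."""
    bands = [(25, "normal limits"),
             (40, "mild hearing loss"),
             (55, "moderate hearing loss"),
             (70, "moderately severe hearing loss"),
             (90, "severe hearing loss")]
    for upper, label in bands:
        if pta <= upper:
            return label
    return "profound hearing loss"

pta = pure_tone_average(35, 45, 55)   # hypothetical audiogram thresholds
print(pta, classify(pta))             # 45.0 moderate hearing loss
```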
Given the pure tone thresholds at 500, 1000, and 2000 Hz, the effect of the impairment can be estimated. This is accomplished by calculating the average (mean) loss for these three frequencies. A typical classification scheme is shown in Fig. 2.7 [10]. This scheme reflects the different classifications of hearing loss as well as the likely effects of the hearing loss on an individual's ability to hear speech.

Fig. 2.7 Classification of hearing impairment in relation to handicap for speech recognition [10]. The chart plots hearing level (dBHL) against frequency (Hz) and labels the following regions: normal limits (no significant difficulty with faint speech); mild hearing loss (difficulty only with faint speech); moderate hearing loss (frequent difficulty with normal speech); moderately severe hearing loss (frequent difficulty with loud speech); severe hearing loss (can understand only shouted or amplified speech); profound hearing loss (usually cannot understand even amplified speech).

Each subject was tested individually while seated in a soundproof room (Fig. 2.8). Sound was administered to the subject through Grason-Stadler GSI-16 speakers. Sound level control was accomplished using a Grason-Stadler GSI-16 audiometer and the calibration tone presented at the beginning of each Revised SPIN test form recording. Sound was presented at the subject's most comfortable loudness (MCL) and the subjects were allowed to use their personal hearing aid if they wished. In administering the Revised SPIN test, the subjects were asked to respond to every item in the test. The 25-item Revised SPIN Practice Tape was used to accustom subjects to the procedure and to determine the subject's MCL. Whenever a subject failed to respond promptly, the tape was stopped and the subject was asked to guess.
Also, whenever the subject gave an uncertain response, the tape was stopped and the subject was asked to repeat the response or to spell the word. Each test session was taped, so that the test items, the subjects' responses, and any interchange between the test administrator and the subject appeared on the tape. A second test administrator then scored all the tests.

2.3.2 Results in Quiet Conditions

In the quiet condition the babble track was not presented to the subjects. Three different recordings, one from the original Revised SPIN test and the other two from the tapes processed by the μ-law and the CCITT G.722 algorithms, were randomly selected and given to each subject in a random order.

Fig. 2.8 Test setup in quiet condition.

The subjects' audiograms and MCLs are shown in Fig. 2.9. Their hearing impairment ranged from mild hearing loss to severe hearing loss according to Fig. 2.7. Except for subject 1, all subjects used their hearing aids in all of the three tests.

Fig. 2.9 Audiograms of subjects taking the Revised SPIN test in quiet condition.

The results of the SPIN test in the quiet environment are shown in Fig. 2.10.

Fig. 2.10 Revised SPIN test in quiet conditions.

No perfect aggregate score was achieved by any subject, but most of them came close to 100% on high predictability items. As expected, scores on low predictability items were lower. Scores for the original Revised SPIN test recording and the two recordings processed by the two digital encoding methods varied for individual subjects. The means of the scores of the original Revised SPIN test recording and the two recordings processed by the two digital encoding methods are within ± 0.8% of each other (Table 2.3 on p. 37).

2.3.3 Results in Noisy Conditions

The noisy conditions were simulated by mixing the babble track with the speech track. The babble was set at 8 dBHL below the speech sound level, since this is the median speech to babble ratio encountered in a wide range of real life situations [25]. The setup shown in Fig. 2.11 was used. Voice output from the tape recorder was passed through an FM wireless microphone system (Model Realistic # 32-1221) before being input to the audiometer. All other experimental procedures were the same as for the experiment in quiet conditions.

The purpose of introducing the FM wireless microphone system in the experiment was to collect data for the evaluation of speech scrambling methods. As described in the Introduction, the speech scrambling aspect involves secure transmission of scrambled speech signals using the FM wireless microphone system.

Fig. 2.11 Test setup in noisy conditions; babble is presented at 8 dBHL below the speech signal.

Ideally, the setup of Fig. 2.8 should have been used for the evaluation of noisy conditions. However, we encountered difficulties in obtaining the agreement of hard of hearing subjects to participate in repeated experiments.
Realizing that experiments will be needed to evaluate the performance of the speech coding scheme and the FM wireless microphone system, we decided to combine the two experiments. The setup of Fig. 2.11 was used. Compared to the setup of Fig. 2.8, the setup of Fig. 2.11 adds the FM wireless microphone system.

The measured frequency response of the FM wireless microphone system has a bandwidth of approximately 7 KHz, as shown in Fig. 2.12.

Fig. 2.12 Frequency response of the Realistic FM wireless microphone system (Model # 32-1221).

The introduction of the FM wireless microphone system in the experiment limited all speech signal bandwidths to 7 KHz. This should not affect intelligibility test results, as frequencies of a speech signal below 250 Hz or above 7000 Hz contribute very little to speech intelligibility [12]. In addition, experiments in quiet conditions show that for hard of hearing people, limiting the speech material to 3 KHz by the μ-law coder does not affect the intelligibility scores significantly compared to the original Revised SPIN test recording. Any change in the results of the Revised SPIN tests under noisy conditions is mainly due to the introduction of the babble noise.

Fig. 2.13 Audiograms of subjects taking the Revised SPIN test under noisy conditions.

The subjects' audiograms and MCLs are shown in Fig. 2.13. Their hearing impairment ranged from mild hearing loss to severe hearing loss according to Fig. 2.7. Except for subject 10, all subjects used their hearing aids in all of the three tests. The results of the Revised SPIN test in noisy conditions are shown in Fig. 2.14.

Fig. 2.14 Revised SPIN test results for noisy conditions.

As expected, scores for low predictability items are lower than those for the high predictability items. Test scores of subject 6's CCITT G.722 recording (high score 19, low score 18) and subject 7's original Revised SPIN recording (high score 22, low score 17) fall outside the acceptance region of the Revised SPIN test nomograph [25], indicating that the percent-hearing-for-speech scores for these subjects probably underestimate their ability to hear speech. These two scores were not taken into account when the mean percent-hearing-for-speech was calculated.
Scores of the original Revised SPIN test and of the tests processed by the two digital encoding methods varied for individual subjects, but the means are within ± 2.3% of each other (Table 2.3).

2.3.4 Speech encoding techniques for the hearing impaired listeners

The results in Table 2.3 show that applying either the μ-law or the CCITT G.722 speech encoding technique to the Revised SPIN test forms does not reduce the intelligibility score significantly for hearing impaired persons.

Table 2.3 Revised SPIN Test Score (Mean)

                 Original SPIN   CCITT G.722   μ-law
Quiet Condition  96.3            95.9          96.7
Noisy Condition  89.1            89.5          91.4

Both encoding techniques produce a data stream at 64 Kbps. The logical choice would be the CCITT G.722 encoding technique, as it would give a larger bandwidth which could provide extra audio clues for hearing impaired listeners. However, a CCITT G.722 device in integrated circuit (IC) form is not currently available. In 1988, Philips offered the PCB 2322 chip that implements the CCITT G.722 algorithm. Unfortunately, this chip is no longer in production. Implementation of the G.722 algorithm is possible using digital signal processors, but this approach would result in a larger physical size and higher cost. Since our Revised SPIN test results show no significant difference in terms of intelligibility for hard of hearing persons between the two encoding techniques, and a μ-law codec in IC form is readily available, the μ-law codec was chosen for the radio-frequency ALD.

Chapter 3 Hardware Design & Implementation

3.1 Overall design

The development of the radio-frequency ALD is carried out with the aid of personal computers (PCs) as shown in Fig. 3.1.
The Codec Interface performs voice digitization and recovery, and provides the timing signals for the PC to read from and write to the Codec Interface.

Fig. 3.1 Overall design of the radio-frequency ALD system.

In the radio-frequency ALD transmitter, the voice input is sampled and digitized by the Codec Interface. The PC reads the digitized voice data from the Codec Interface, and performs error detection/correction coding to ensure reliable data transmission. The PC then coordinates the flow of encoded data to the wireless spread spectrum modem, which transmits the data using a spreading code known to authorized receivers.

In the radio-frequency ALD receiver, the received spread spectrum signal is demodulated. The PC performs error detection/correction on the demodulated data and coordinates the flow of decoded data to the Codec Interface. The Codec Interface performs voice recovery and reproduces the analog voice signal. The use of more than one radio-frequency ALD receiver enables a voice broadcast from the radio-frequency ALD transmitter to be received by a group of users with knowledge of the de-spreading codes. If there are several transmitters in operation, authorized receivers can select the appropriate de-spreading codes to receive transmissions from the desired transmitter only.

In this part of the thesis, the examination of the feasibility of transmitting voice using spread spectrum techniques is presented.

3.2 Spread Spectrum modem

The spread spectrum modem must be capable of transmitting the data from the Codec Interface of the transmitter to the receiver. Since the codec converts analog voice into a digital data stream at 64 Kbps, the spread spectrum modem must have a data rate of no less than 64 Kbps. As data are transmitted over the radio-frequency link, errors will inevitably occur due to interference or noise present in the transmission environment.
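The 64 Kbps figure follows directly from the codec parameters, and any per-frame error protection adds overhead on top of that payload. The frame and checksum sizes in this sketch are arbitrary illustrations, not the design values (those are derived in Chapter 4):

```python
SAMPLE_RATE_HZ = 8000   # telephone-bandwidth sampling
BITS_PER_SAMPLE = 8     # 8-bit companded mu-law samples

codec_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
print(codec_bps)  # 64000

def modem_rate_needed(payload_bps, payload_bits_per_frame, overhead_bits_per_frame):
    """Gross bit rate once each frame carries a header/checksum."""
    frames_per_second = payload_bps / payload_bits_per_frame
    return payload_bps + frames_per_second * overhead_bits_per_frame

# e.g. 512-bit frames each carrying a 32-bit checksum (hypothetical numbers):
print(modem_rate_needed(codec_bps, 512, 32))  # 68000.0
```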
These errors affect the quality of the analog voice signal reproduced by the radio-frequency ALD receiver. Error correction or detection capability can be implemented in the radio-frequency ALD system to minimize the effect of data loss. The degree of error detection or correction capability selected will depend on the channel error rate of the spread spectrum modem. An analysis of the error correction or detection capability required for the radio-frequency ALD system and the required transmission rate of the radio-frequency spread spectrum modem will be discussed in Chapter 4.

There are chip sets available for spreading and de-spreading direct sequence data (e.g. STEL-1032 and STEL-3310 from Stanford Telecom, OCI 100011-1 from O'Neill Communications Inc.). While relatively simple circuits can be used to digitally spread and de-spread the data, developing the radio frequency (RF) sections required for transmission at 902-928 MHz is considerably more involved. The 902-928 MHz band has been allocated for industrial, scientific and medical (ISM) applications in Canada and the USA, where license-free spread spectrum transmitters are allowed to operate. [4]

It was decided to investigate the feasibility of integrating commercially available spread spectrum modems in the design of the radio-frequency ALD system. In order to transmit the voice data output by the codec, the spread spectrum modem must be capable of transmitting and receiving data at a rate of at least 64 Kbps.

Table 3.1 Known commercially available spread spectrum modems.

Manufacturer                   Model       Spread spectrum technique  Data rate
NCR                            WaveLAN     Direct Sequence            2 Mbps
Intermec                       Model 9181  Direct Sequence            256 Kbps
Solid State Electronics Corp.  Model 5093  Direct Sequence            64 Kbps
Proxim                         RangeLAN    Direct Sequence            242 Kbps
O'Neill Communications Inc.    LAWN        Direct Sequence            19.2 Kbps
Telesystems                    Arlan 650   Direct Sequence            1.35 Mbps

A number of commercially available spread spectrum modems were reviewed, as shown in Table 3.1. None of them allows the user to change the spreading code directly, and some do not have the required data transmission rate. The Arlan 650, manufactured by Telesystems, was selected for use in the radio-frequency ALD design because of its data rate and software compatibility with the TCP/IP protocols.

3.3 Arlan 650 characteristics

The Arlan 650 Wireless Network Card from Telesystems SLW Inc. installs directly into the PC-bus of any IBM PC/AT or compatible. It contains an on-board spread spectrum radio transceiver. A TNC-type connector at the rear panel of the Arlan 650 Wireless Network Card connects to an attached 8-inch half wave dipole antenna.

In Canada, the Arlan 650 operates under DOC regulations allowing license-free use of spread spectrum transmitters in the 902-928 MHz band. The Arlan 650 has a maximum RF power output of 1 Watt. It can be set up to operate in one of 13 channels, and the data rates vary from 215 Kbps to 946 Kbps, as shown in Table 3.2. [30]

Table 3.2 Radio frequency and data rate of channels in the Arlan 650 [30].

Channel  Rate (Kbps)  Centre Frequency (MHz)
0        215          908
1        215          910
2        215          913
3        215          915
4        215          917
5        215          920
6        215          922
7        344          911
8        344          915
9        344          919
10       630          915
11       860          915
12       946          915
13*      1050         915
14*      1350         915

* Current Canadian DOC regulations prohibit the use of Radio Channels 13 and 14. [30]

Fig. 3.2 shows the RF spectrum of the Arlan 650 transmitting data using Radio Channels 7 to 9. The range of the spectrum shown is from 900 MHz to 930 MHz, with a horizontal scale of 3 MHz per division. The bandwidth of each channel overlaps and interferes with adjacent channels.
The direct sequence spread spectrum technique described in Section 1.4 is used to minimize interference between adjacent channels. Each channel uses a different spreading code to spread the bandwidth of the data to be transmitted. By using different radio channels, the spreading code of the system can be changed indirectly.

In addition to using the same radio channel, an Arlan 650™ must have the same System Identifier (SID) to communicate with another Arlan 650™. There are over 8 million possible SID settings, which provide a good degree of privacy and security for the radio-frequency ALD.

The range of the Arlan 650™ in a given indoor environment depends on the following factors:

• data rate (lower bit rate channels have an advantage over higher bit rate channels, since there is approximately a 6 to 7 dB decrease in receiver threshold as the data rate is increased from 200 Kbps to 1 Mbps [30]);
• the building material of the indoor environment and the number of obstacles (people, furniture, walls, partitions) in the direct path between the transmitter and the receiver. Floor to floor penetration also depends on the material used between the floors;
• type and placement of antenna (Telesystems has an optional high gain Yagi antenna for the Arlan 650™ wireless network card).

Typical range is 300 feet indoors and 1000 feet outdoors in line-of-sight operation [30].

Software drivers are supplied by the manufacturer for operating the Arlan 650™ under Novell Netware™ 286 and 386. A Packet Driver is also available for operation with third party TCP/IP protocols.

Fig. 3.2 RF spectrum of the Arlan 650™ transmitting on Radio Channels 7 to 9.

3.4 Codec Interface

The block diagram of the Codec Interface is shown in Fig. 3.3. The Codec Interface is designed so that it can be used in both the radio-frequency ALD transmitter and the radio-frequency ALD receiver. In the transmitter, the Codec Interface passes digitized voice data from the μ-law codec to the PC. In the receiver, the Codec Interface passes received digitized voice data from the PC to the codec.

Fig. 3.3 Block diagram of the Codec Interface.

As the conventional PC serial port cannot handle serial data at 64 Kbps, the serial digitized voice data is passed to and from the PC via a digital signal processor board. The digital signal processor board also acts as a buffer between the PC and the Codec Interface. A TMS320C30 Processor Board from Spectrum Signal Processing [29] was used in the design. This is an IBM PC™ compatible plug-in board with two synchronous serial ports capable of input/output of serial data at speeds of up to 8.3 Mbps. The PC reads the data received via the serial port of the TMS320C30 Processor Board, and writes data to this board for transmission via its serial port.

Serial port 0 and serial port 1 in the TMS320C30 Processor Board are totally independent. Each serial port can be configured to transmit and receive 8, 16, 24, or 32 bits of data per word.
The clock for each serial port can be either internal or external. The signal description of the TMS320C30 Processor Board serial ports is shown in Table 3.3.

Table 3.3 Signal description of the TMS320C30 serial ports [32].

Signal    Description
CLKX0/1   Serial port 0/1 transmit clock. This input clock signal determines the data output rate at DX0/1.
DX0/1     Serial port 0/1 data output.
FSX0/1    Frame synchronisation pulse for transmit. The input pulse initiates the data transmit process on DX0/1.
CLKR0/1   Serial port 0/1 receive clock. This input clock signal determines the data input rate at DR0/1.
DR0/1     Serial port 0/1 data input.
FSR0/1    Frame synchronisation pulse for receive. The input pulse initiates the data receive process on DR0/1.

In the radio-frequency ALD design, serial port 0 of the TMS320C30 Processor Board is configured for fixed data-rate, burst mode, external timing, 8-bit word operation. Transfers of data are separated by periods of inactivity on the serial port. Each transfer involves a single word, and is initiated by either a Frame Synchronization Transmit (FSX) pulse or a Frame Synchronization Receive (FSR) pulse, as shown in Fig. 3.4. In the receive operation, FSR must be low during the last bit, or another transfer will be initiated.

Fig. 3.4 TMS320C30 Serial Port Timing [32].

Implementation of the Codec Interface is based on the Motorola MC14402 Codec-Filter PCM Mono-Circuit [18]. Signal descriptions of the MC14402 that are important for interfacing with the TMS320C30 Processor Board are shown in Table 3.4.

Table 3.4 Signal description of the MC14402 [18].

Signal   Description
TDC      TDC is a clock signal input to the MC14402 which determines the transmit data bit rate at TDD.
TDE      TDE is an 8 KHz input signal. The leading edge of TDE initiates the shifting out of an 8-bit word at TDD at a rate determined by the TDC clock signal.
TDD      TDD is the digital data output.
RDC      RDC is the receive data clock. It operates in sync with RCE and RDD to produce all receive data timing.
RCE      RCE is an 8 KHz input signal. The leading edge of RCE initiates the receiving of an 8-bit word at RDD. The received word is shifted in at RDD at a rate determined by the RDC clock signal.
RDD      RDD is the digital data input.

In the radio-frequency ALD design, the MC14402 is configured to digitize and recover voice using the μ-law format. Transfers of digitized voice data into and out of the MC14402 are shown in Fig. 3.5. The leading edge of TDE initiates data transmission at TDD, the rate of data output being determined by the clock signal at TDC. The leading edge of RCE initiates data reception at RDD, the rate of data input being determined by the clock signal at RDC.

Fig. 3.5 Transmit and receive timing diagram for MC14402 [18].

In the radio-frequency ALD transmitter, input voice is sampled by the MC14402 at a rate of 8 KHz, and each sample is represented by an 8-bit word in the μ-law format. Every 125 μs, an 8-bit word is transferred from the MC14402 to serial port 0 in the TMS320C30 Processor Board at a rate of 128 Kbps. Transfers of the 8-bit words are separated by periods of inactivity of the serial port. The signal timing necessary for the data transfer is shown in Fig. 3.6.
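The μ-law digitization performed by the MC14402 follows the standard logarithmic companding rule (8 KHz sampling, 8 bits per sample). As an illustration only (this code is not from the thesis), the sketch below shows the classic bias-and-segment form of μ-law encoding and decoding; the constants BIAS = 0x84 and CLIP = 32635 are assumptions taken from the widely used 16-bit reference implementation, not from the MC14402 data sheet.

```python
def mulaw_encode(sample: int) -> int:
    """Encode a signed 16-bit linear PCM sample into one 8-bit mu-law word."""
    BIAS, CLIP = 0x84, 32635          # assumed reference-implementation constants
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):   # find the segment (chord)
        mask >>= 1
        exponent -= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F  # 4-bit step within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # mu-law words are inverted

def mulaw_decode(word: int) -> int:
    """Invert the encoding; the result is the quantized linear sample."""
    word = ~word & 0xFF
    sign = word & 0x80
    exponent = (word >> 4) & 0x07
    mantissa = word & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude
```

Each encoded word occupies one 125 μs frame, so 8000 words per second give exactly the 64 Kbps stream that the spread spectrum modem must carry.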
The data transfer rate is controlled by a 128 KHz clock signal, CK2, input to TDC of the MC14402 and to CLKX0 of serial port 0 in the TMS320C30 Processor Board. Data transfer is initiated by the leading edge of an 8 KHz clock, CK3, input to FSR0 of serial port 0 in the TMS320C30 Processor Board, which prepares the serial port to accept the 8-bit word at the next CK2 clock pulse. At that next CK2 clock pulse, the leading edge of CK1 (CK3 delayed by one clock cycle), input to TDE, initiates the 8-bit word transfer from the MC14402 to serial port 0 in the TMS320C30 Processor Board.

Fig. 3.6 TMS320C30 Processor Board and MC14402 Interface signal timing.

In the radio-frequency ALD receiver, received data are passed from the serial port to the MC14402 and converted back to an analog voice signal. Every 125 μs, an 8-bit word is transferred from serial port 0 of the TMS320C30 Processor Board to the MC14402. Data transfer occurs in a similar way as in the radio-frequency ALD transmitter. The data transfer rate is also controlled by the 128 KHz clock signal CK2, input to RDC of the MC14402 and to CLKR0 of serial port 0 in the TMS320C30 Processor Board.

Data transfer is initiated by the leading edge of the 8 KHz clock CK3, input to FSX0 of serial port 0 in the TMS320C30 Processor Board, which prepares the serial port to transmit the 8-bit word at the next CK2 clock pulse. At that next CK2 clock pulse, the leading edge of CK1, input to RCE, initiates the 8-bit word transfer from serial port 0 of the TMS320C30 Processor Board to the MC14402.

Implementation of the timing signals CK1, CK2, and CK3 is shown in Fig. 3.7.

Fig. 3.7 Clock generation circuit in the Codec Interface.

A crystal, CRY1, is used to generate a 4.096 MHz clock signal for the flip-flop U2:B. The flip-flop acts as a divide-by-2 device. The 2.048 MHz clock output by U2:B is input to U3, a 14-Stage Ripple-Carry Binary Counter/Divider. The Q4 output of the counter divides the 2.048 MHz input clock by 2^4, generating the 128 KHz signal CK2. The Q8 output of the counter divides the 2.048 MHz input clock by 2^8, generating the 8 KHz signal CK3. The flip-flop U2:A is used to delay CK3 by one CK2 clock cycle to generate CK1.

Connection of the MC14402 to the three clock signals and to serial port 0 of the TMS320C30 Processor Board is shown in Fig. 3.8. The analog and digital inputs and outputs of the Codec Interface in Fig. 3.3 are also shown in Fig. 3.8. The analog input voice is amplified by the operational amplifier U4:A before being passed to the MC14402 and converted to digital data. Received digitized voice data is converted back to analog voice by the MC14402.

Fig. 3.8 MC14402 to TMS320C30 Processor Board interface circuit.

Chapter 4 Software Design & Implementation

4.1 Overall design

The software design was divided into two parts, as shown in Fig. 4.1: one part controls the TMS320C30 Processor Board, and the other controls the PC.
Data exchanges between the PC and the TMS320C30 Processor Board are achieved using the dual-access memory on the TMS320C30 Processor Board. In the radio-frequency ALD transmitter, the software in the TMS320C30 Processor Board reads the voice data from the Codec Interface, groups the voice data into a fixed length voice packet, and puts it in the dual-access memory. The software in the PC reads the voice packet from the dual-access memory and encapsulates the voice packet with suitable protocol headers for the Arlan 650™ wireless network card to transmit.

Fig. 4.1 Overall software design of the radio-frequency ALD system.

In the radio-frequency ALD receiver, the software in the PC reads the data received from the Arlan 650™, extracts the voice packet, and puts it in the dual-access memory. The software in the TMS320C30 Processor Board reads the voice packet from the dual-access memory and writes the voice data to the Codec Interface.

The Arlan 650™ wireless network card supports the Novell SPX/IPX™ protocol and the Internet TCP/IP protocol. The TCP/IP and SPX/IPX™ protocols are rules that co-ordinate the exchange of information between connected computers. While SPX/IPX™ was developed by Novell Inc., TCP/IP is an open system and its specifications are freely available. More importantly, TCP/IP is designed to facilitate communication between machines with diverse hardware architectures, to use almost any packet switched network hardware, and to accommodate multiple computer operating systems. Thus, anyone can write the software needed to communicate across different computer networks.
For this reason, the TCP/IP protocol is used to transmit the voice packets in the radio-frequency ALD system.

4.2 TCP/IP protocol

TCP/IP is a set of protocols that allow computers from different vendors to share resources across connected networks, e.g. transferring files via the file transfer protocol (FTP), logging in remotely via the network terminal protocol (TELNET), or sending electronic mail. They are sets of rules that co-ordinate the exchange of messages between computers and make the exchange more efficient. The most accurate name for the set of protocols is the "Internet protocol suite". Transmission Control Protocol (TCP) and Internet Protocol (IP) are two protocols in the suite needed for many applications, e.g. TELNET and FTP. Because TCP and IP are the best known of the protocols, it is common to use the term TCP/IP to refer to the whole family of protocols.

The general concept of data transmission in the TCP/IP network is described in the following section to illustrate the software implementation of the RF ALD.

The Internet is a packet-switching network, where information in the network is transmitted in small segments, known as packets. The TCP/IP protocols define the format of these packets, including the origin of the packet, the length of the packet, and the type of packet, as well as the way computers on the network are to receive and re-transmit packets.

TCP/IP provides three sets of services, as shown in Fig. 4.2. At the lowest level, a connectionless delivery service provides a foundation on which everything rests. At the next level, a reliable transport service provides a higher level platform on which applications depend.

APPLICATION SERVICES (e.g. FTP)
RELIABLE TRANSPORT SERVICE (e.g. TCP)
CONNECTIONLESS PACKET DELIVERY SERVICE (e.g. IP)

Fig. 4.2 The three levels of service provided by TCP/IP.

Connectionless delivery is an abstraction of the service that most packet-switching networks offer.
This service routes packets from one node to another based on address information carried in the packet. (A node on the Internet is any device that understands the TCP/IP protocols.) Because the connectionless service routes each packet separately, it does not guarantee reliable, in-order delivery. Connectionless packet delivery is the basis for all Internet services and makes the TCP/IP protocols adaptable to a wide range of network hardware.

Most applications need more than just connectionless packet delivery, because they require the communication software to recover automatically from transmission errors, lost packets, or failures of intermediate switches along the path between the sender and receiver. The reliable transport service handles such problems. It allows an application on one node to establish a 'connection' with an application on another node, and then to send a large volume of data across the connection as if it were a permanent, direct hardware connection. In practice, the service divides the stream of data into small packets, makes use of the connectionless packet delivery service to send them one at a time, and waits for the receiving host (a small or large computer) to acknowledge reception of the packets.

The TCP/IP services can be modelled with four functional layers that build on a fifth layer of hardware. The model shown in Fig. 4.3 presents a framework to describe protocol characteristics and functions. Fig. 4.3 also shows some of the TCP/IP protocols in relation to the reference model.

Fig. 4.3 TCP/IP reference model.

Each layer in the model has various functions, which are independent of the other layers. Each layer, however, expects to receive certain services from the layer beneath it, and each layer provides certain services to the layer above it. Each layer on the transmitting host communicates with that same layer (peer layer) on the destination host.
The model is generally used as a framework to describe the functions and characteristics of the different protocols in the Internet.

At the highest level, users invoke application programs that access services available across the Internet. An application interacts with the transport level protocol(s) to send or receive data. Each application program chooses the style of transport needed, which can be either a sequence of individual messages or a continuous stream of bytes. The application program passes the data in the required form to the transport level for delivery.

The prime duty of the transport layer is to provide communication from one application program to another. The transport layer may regulate the flow of information. It may also provide a reliable transport service, ensuring that data arrives without error and in sequence. The transport layer may accept data from several user programs and send them to the next lower level. To do so, it adds additional information to each packet, including codes that identify which application program sent it and which application program should receive it. The receiving node uses the destination code to identify the application program to which the packet should be delivered.

The Internet layer handles communication from one node to another. It accepts a packet from the transport layer along with an IP address to which the packet should be sent. The basic transfer unit in the Internet is an Internet datagram, sometimes also known as an IP datagram. The Internet layer accepts a packet from the transport layer and encapsulates the packet in an IP datagram, fills in the datagram header, uses the routing algorithm to determine whether to deliver the datagram directly or send the datagram to a gateway (gateways are dedicated computers that are attached to two or more networks and forward data from one network to another), and passes the datagram to the appropriate network interface for transmission.
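A minimal sketch of this send-side routing decision is given below, assuming the classful network/host address split described later in this chapter. All names here are illustrative, not from the thesis.

```python
def network_portion(ip: str) -> str:
    """Classful (Class A/B/C) network portion of a dotted-decimal IP address."""
    octets = ip.split(".")
    first = int(octets[0])
    if first < 128:                      # leading bit 0   -> Class A (1 octet)
        return octets[0]
    if first < 192:                      # leading bits 10 -> Class B (2 octets)
        return ".".join(octets[:2])
    return ".".join(octets[:3])          # leading bits 110 -> Class C (3 octets)

def next_hop(dest_ip: str, local_network: str, gateways: dict) -> str:
    """Deliver directly when the destination shares the local network portion;
    otherwise hand the datagram to a gateway serving that network."""
    net = network_portion(dest_ip)
    return dest_ip if net == local_network else gateways[net]

# Hypothetical host on the Class C network 192.168.0 with one gateway to network 10:
gateways = {"10": "192.168.0.254"}
assert next_hop("192.168.0.7", "192.168.0", gateways) == "192.168.0.7"   # direct
assert next_hop("10.1.2.3", "192.168.0", gateways) == "192.168.0.254"    # via gateway
```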
The Internet layer also handles incoming datagrams, checks their validity, and uses the routing algorithm to determine whether each datagram should be processed locally or forwarded. For datagrams addressed to the local host, software in the Internet layer deletes the datagram header and chooses the appropriate transport protocol to handle the packet. The Internet layer does not limit datagrams to a small size, nor does it guarantee that large datagrams will be delivered without fragmentation. Fragmentation is a process, in the transmitting node, of dividing a large datagram into several pieces. The receiving node re-assembles the datagram based on the information in the fragments.

At the network interface layer, the nodes on a network communicate with other nodes on the network using physical addresses specific to that network. Each node has a unique physical address for the hardware device that connects it to the network. The IP address of a node is a logical address, which is independent of the physical address. Because IP addresses are not dependent on any particular network interface hardware, they can be used to send datagrams from one network to another. The network interface layer encapsulates an IP datagram in a network interface frame, maps the IP address of a node or a gateway on the same network into a physical address, and uses the network hardware interface to deliver the network interface frame.

An IP address is mapped onto a physical address using the Address Resolution Protocol (ARP). The sending node broadcasts an ARP packet containing an IP address. The node with that IP address sends its physical address back to the requesting node. To speed packet transmissions and reduce the number of ARP requests, each time a node broadcasts an ARP request and receives a response, it creates an entry in its address resolution cache. The entry maps the IP address to the physical address.
When the node needs to send another IP datagram, it looks up the IP address in its cache. If it finds that IP address, the node uses the corresponding physical address in its network interface frame. The node broadcasts an ARP request only if the IP address is not in its cache.

Fig. 4.4 shows how messages from a host are transferred across different networks. At the sending host, user data are presented by a user application at the upper (application) layer. Each layer adds its protocol control information (header) to the user data and passes its header and user data to the next lower layer, which repeats the process.

AH: Application Header
TH: Transport Header
IH: Internet Header
NH1: Network Interface Header in Physical Network 1
NT1: Network Interface Trailer in Physical Network 1
NH2: Network Interface Header in Physical Network 2
NT2: Network Interface Trailer in Physical Network 2

Fig. 4.4 Communication process in the TCP/IP protocols.

In the Internet layer, the sending host has to determine whether the destination IP address in the datagram is within the same physical network or on another network. If the destination IP address is not on the same network, the host sends the datagram to a gateway. The sending host has a table of IP addresses for one or more hosts that serve as gateways to other networks. It looks up the IP address of a gateway that leads to the destination network, maps the IP address of the gateway to a physical address using ARP, and sends a network interface frame containing the IP datagram to the gateway.
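The cache-then-broadcast ARP behaviour described above can be sketched as follows; `FakeNetwork` and its `arp_request` method are hypothetical stand-ins for the real broadcast, purely for illustration.

```python
class ArpCache:
    """Sketch of the address resolution cache described above (illustrative only)."""

    def __init__(self, network):
        self.network = network   # hypothetical object that can broadcast ARP requests
        self.cache = {}          # maps IP address -> physical (hardware) address

    def resolve(self, ip_address):
        # Broadcast an ARP request only when the IP address is not cached.
        if ip_address not in self.cache:
            self.cache[ip_address] = self.network.arp_request(ip_address)
        return self.cache[ip_address]

class FakeNetwork:
    """Stand-in network that counts broadcasts and returns a dummy physical address."""
    def __init__(self):
        self.broadcasts = 0
    def arp_request(self, ip_address):
        self.broadcasts += 1
        return "aa:bb:cc:00:00:01"

net = FakeNetwork()
arp = ArpCache(net)
arp.resolve("192.168.0.199")
arp.resolve("192.168.0.199")   # the second lookup is served from the cache
assert net.broadcasts == 1
```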
When the gateway receives the IP datagram, it uses the destination IP address in the datagram to send the message to its destination host using an appropriate network interface frame.

If the destination IP address is within the same network, the host can map the destination IP address into a physical address using ARP, and send the IP datagram in a network interface frame to its destination directly.

The fully encapsulated data are transported across the physical networks to the receiving host. Here the process is reversed. The data go from the lower layers to the upper layers, and the header created by each transmitting peer layer is used by the receiving peer layer to invoke a service function for the transmitting site and the upper layers of the receiving site. As the data go up through the layers, the headers are stripped away after they have been used.

4.3 Voice packet transmission in the radio-frequency ALD system

In the radio-frequency ALD design, the radio-frequency ALD transmitter and receivers are grouped in a simple physical network by the use of Arlan 650™ wireless network cards, over which TCP/IP operates. The Arlan 650™ wireless network card is the network interface hardware that connects the radio-frequency ALD transmitter and receivers to the same network. Access to the network is governed by the Institute of Electrical and Electronics Engineers 802.3 Media Access Control (IEEE 802.3 MAC) procedure [3] in the network interface layer, which uses an Ethernet frame for communication between different nodes.

The communication process between the radio-frequency ALD transmitter and the radio-frequency ALD receiver can be modelled as shown in Fig. 4.5.
IH: IP Header
EH: Ethernet Header
ET: Ethernet Trailer

Fig. 4.5 Communication processes in the radio-frequency ALD system.

The application layer reads the voice packet from the dual-access memory in the TMS320C30 Processor Board and passes it to the transport layer in the radio-frequency ALD transmitter. To reduce overhead in the packet, the transport layer adds no header to the voice packet. The transport layer ensures that voice packets from the application layer are passed to the Internet layer on a first-come-first-served basis. The time interval between transmissions depends on the length of the voice packet (e.g. if the voice packet size is 64 bytes, the transmitter needs to transmit once every 8 ms to keep up with the codec rate of 64 Kbps; if the voice packet size is 32 bytes, the transmitter needs to transmit once every 4 ms; etc.). The Internet layer creates an IP datagram with a data portion containing the voice packet and passes the IP datagram down to the network interface layer. The network interface layer encapsulates the IP datagram in an Ethernet frame and passes that to the Arlan 650™ network card for transmission.

At the radio-frequency ALD receiver, the network interface layer computes the checksum of the Ethernet frame. If the checksum contained in the Ethernet frame does not match the checksum computed by the network interface layer, it discards the frame. If the checksums match, the network interface layer passes the IP datagram to the Internet layer. The Internet layer computes the checksum of the datagram header.
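The packet-timing arithmetic quoted in the transmitter description above (64-byte packets every 8 ms, 32-byte packets every 4 ms) can be checked with a tiny sketch; the function name is mine, not the thesis's.

```python
def packet_interval_ms(packet_bytes: int, codec_rate_bps: int = 64_000) -> float:
    """Interval between voice packets needed to keep up with the codec output."""
    return packet_bytes * 8 * 1000 / codec_rate_bps

# 64-byte packets -> one packet every 8 ms; 32-byte packets -> one every 4 ms.
assert packet_interval_ms(64) == 8.0
assert packet_interval_ms(32) == 4.0
```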
If the checksumcontained in the header does not match the checksum computed by the internet layer, itdiscards the IF datagram. If the checksums match, the internet layer passes the voicepacket to the transport layer. The transport layer ensures a voice packet is passed to theapplication layer in a timely order. If the voice packet is lost during transmission, a simplepacket replacement strategy is used to recover the lost voice data. The application layerwrites the received voice packet in the dual-memory of the TMS320C30 Processor Board.The structure of the IP datagram is shown in Fig. 4.7. An IP datagram is dividedinto a header and a data area. The IP datagram does not specify the format of the dataarea, it can be used to transport any data of length up to 65,515 bytes.63VERS^HLEN^SERVICE TYPE TOTAL LENGTHIDENTIFICATION n D Ms' F F^FRAGMENT OFFSETTIME TO LIVE^PROTOCOL HEADER CHECKSUMSOURCE IP ADDRESSDESTINATION IP ADDRESSIP OPTIONS (IF ANY)^ PADDINGDATADATA4 16 24 1^MinimumHeader lengthCorrespondsto IH in Fig.4.6Chapter 4 Software Design & ImplementationFig. 4.6 Structure of an IP datagram.The fields in the IP header have the following meaningVersion Number ('VERS) is a 4—bit field which specifies the version number of theIP. It is used to verify that the sender, receiver, and any gateway in between themagree on the format of the datagram. The version used in the radio-frequency ALDimplementation is 4.Length (HLEN) specifies the length of the IP protocol header in 32-bit words. Theminimum IP protocol header contains five words (20 bytes). The length of the protocolheader may be increased by the addition of optional fields, but the exact length must beknown for the purpose of interpretation. In the radio-frequency ALD implementation,the minimum length (20 bytes) is used.Type of service (SERVICE TYPE) is a 8—bit field that specifies how the datagramshould be handled. It is broken down to five subfields as shown in Fig. 
4.8.64Chapter 4 Software Design & Implementation1^2^3^4^5^6^7^8PRECEDENCE UNUSEDFig. 4.7 The five subfields that comprise the Type of service field.Precedence indicates the relative importance of the datagram from 0 (routine) to 7(network control), allowing the sender to indicate the importance of each IF datagram.Bits D, T, and R specify the type of transport the IP datagram desires. When set (1), theD bit is a request for low delay, the T bit is a request for high throughput, and the R bit isa request for high reliability. In practice, the value 0 is always used since most host andgateway software ignore the Type of service field. A value of 0 is used in the field in theradio-frequency ALD implementation, and the receiver ignores the content in this field.Total Length contains the length of the datagram including the header in bytes. Thisentry is used to establish the data length. This field allows the length of a datagram to beup to 216 — 1 or 65,535 bytes. In the radio-frequency ALD implementation, the valuein this field depends on the size of the voice packet.Identification is a 16—bit value assigned by the sender to aid in assembling thefragments of a datagram. Since fragmentation of IP datagram is not expected in theradio-frequency ALD application, a value of 0 is used in this field.The field following Identification is a 3—bit value, which controls the handling ofdatagams in the case of fragmentation. The first bit is unused, the second and third bitsare DF (Don't fragment) and MF (More Fragments). If DF bit is set (1), the IF datagram65data 104 bytes01leI,Chapter 4 Software Design & Implementationis not fragmented under any circumstances, even if it can no longer be forwarded andmust be discarded. The MF bit shows whether or not the IP datagram is followed bymore sub-packets (0 indicates no more sub-packet, 1 indicates more sub-packets). 
Since fragmentation of IP datagrams is not expected in the radio-frequency ALD application, a value of 0 is used in both DF and MF.

Fig. 4.9 shows the content of the various fields of an IP datagram during the course of a fragmentation process in which the resulting IP datagrams are adapted to a network with a maximum packet size of 128 bytes.

[Fig. 4.9: Fragmentation of an IP datagram. Network maximum packet length = 128 bytes. Initial condition: IP header 20 bytes; data length 300 bytes; packet ID = 2345; DF = 0. Fragment 1: total length = 124 bytes (104 bytes of data); packet ID = 2345; offset = 0; MF = 1. Fragment 2: total length = 124 bytes (104 bytes of data); packet ID = 2345; offset = 13; MF = 1. Fragment 3: total length = 112 bytes (92 bytes of data); packet ID = 2345; offset = 26; MF = 0.]

Fig. 4.9 Fragmentation of an IP datagram.

If the MF bit is set, the fragment offset specifies the offset, within the original datagram, of the data being carried in the fragment. It is measured in units of 8 bytes, starting at offset zero for the first fragment. The receiving host can use this information to re-assemble the original message correctly. Since fragmentation of IP datagrams is not expected in the radio-frequency ALD application, a value of 0 is used in this field.

Time to Live specifies how long the datagram may remain in the network before it is discarded. The time to live is usually equal to the maximum number of gateways that a datagram may pass through. Each gateway along the path from source to destination is required to decrement the Time to Live field by 1 when it processes the datagram header. If this field reaches zero, the datagram must be discarded by the current gateway. This prevents a datagram from circulating endlessly in the network. In the radio-frequency ALD design, a value of 100 is used in this field.

Protocol contains the ID of the transport protocol to which the datagram has to be handed over.
An arbitrary value of 35, not used by other transport protocols, is used in this field for the radio-frequency ALD application.

Header Checksum contains the checksum for the protocol header fields. It prevents a node or host from working with false data. For efficiency, the user data in the IP datagram is not checked. The Internet checksum is formed by treating the entire header as a sequence of 16-bit integers (starting from the VERS field), adding them together using one's complement arithmetic, and then taking the one's complement of the result. For the purpose of computing the checksum, the Header Checksum field is initially assumed to contain zero.

Source and Destination Address are 32-bit Internet addresses which provide an unambiguous description of the access to a host in a network. Each IP address is divided into two parts: a network portion which identifies the network, and a host portion which identifies the node. This division can fall at one of three locations within the 32-bit address, corresponding to the three Internet address classes: Class A, Class B and Class C, as shown in Fig. 4.10. Regardless of the address class, all nodes on any single network share the same network portion, and each node has a unique host portion.

[Fig. 4.10: The three IP address classes within the 32-bit address. Class A: leading bit 0, then the network portion and the host portion. Class B: leading bits 10. Class C: leading bits 110.]

Fig. 4.10 IP address classes.

Given an Internet address, its class can be determined from the three high-order bits. Class A addresses devote 7 bits to the network portion and 24 bits to the host portion. Class B addresses devote 14 bits to the network portion and 16 bits to the host portion. Class C addresses devote 21 bits to the network portion and 8 bits to the host portion.

Internet addresses are usually written as four decimal integers separated by decimal points, where each integer gives the value of one byte of the Internet address.
Thus the 32-bit internet address

10001001 01010010 00111001 00100011

is written as 137.82.57.35.

The Internet addressing rules reserve the following types of Internet addresses for special purposes.

• Network addresses are Internet addresses in which the host portion is set to all zeros. These are addresses of networks rather than of nodes on a network. By convention, no node is ever assigned a host portion consisting of all zeros.

• Broadcast addresses are addresses in which the host portion is set to all ones. A broadcast is destined for every node on the network. By convention, no node is ever assigned a host portion consisting of all ones.

• Addresses in which the network portion consists of all zeros mean this network. Using network address 0 is important in those cases where a host wants to communicate over a network but does not know the network IP address. It allows the host to communicate temporarily, but once the host learns its correct network and IP address, it must not use network address 0.

• The class A network address 127 is reserved for "loopback" and is designed for testing inter-process communication on the local host. When any host uses the loopback address to send data, the protocol software in the host returns the datagram without sending it across any network.

In the radio-frequency ALD implementation, each node is assigned an arbitrary Internet address.
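As an illustration of the header checksum and the dotted-decimal notation described above, the following Python sketch folds the header into one's-complement 16-bit sums and formats the example address. The helper names are hypothetical; this is not the thesis implementation, which ran on a PC in the early 1990s.

```python
def internet_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented.
    The Header Checksum field is assumed to contain zero on input."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

def dotted_decimal(addr: bytes) -> str:
    """Write a 32-bit Internet address as four decimal integers."""
    return ".".join(str(byte) for byte in addr)

# The example address from the text:
addr = bytes([0b10001001, 0b01010010, 0b00111001, 0b00100011])
print(dotted_decimal(addr))  # 137.82.57.35
```

A receiver can verify a header by recomputing the sum with the transmitted checksum in place; the complemented result is zero exactly when the checksums match.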
A broadcast address is put in the datagram when a voice packet is transmitted from the radio-frequency ALD transmitter.

Rarely used fields such as Time stamp, Security, etc., are carried only as extensions in the IP options, so that the IP protocol header is kept as small as possible [2]. The length of an IP header in bytes is always a multiple of 4; padding characters must be inserted after the option field if necessary.

In the radio-frequency ALD transmitter, each voice packet is put in an IP datagram, and the IP datagram is encapsulated in an Ethernet frame and transmitted to the radio-frequency ALD receivers through the Arlan 650™ wireless network card.

The structure of an Ethernet frame is shown in Fig. 4.11. Ethernet frames are of variable length, with no frame smaller than 64 bytes or larger than 1518 bytes. The minimum frame size is specified by the IEEE 802.3 standard and is required for correct protocol operation.

[Fig. 4.11: Preamble (7 bytes), Start Frame Delimiter (1 byte), Destination Address (6 bytes), Source Address (6 bytes), Protocol Type (2 bytes), Frame Data (38-1492 bytes; the IP datagram in the RF ALD application), and Frame Check Sequence (4 bytes). The fields preceding the data correspond to the Ethernet Header (EH) in Fig. 4.6, and the Frame Check Sequence to the Ethernet Trailer (ET) in Fig. 4.6.]

Fig. 4.11 Structure of an Ethernet frame in the radio-frequency ALD.

The preamble to an Ethernet frame is a stream of 7 bytes of alternating 0s and 1s which serves to synchronize the receiving station.

The Start Frame Delimiter (SFD) field immediately follows the preamble pattern. It contains a fixed 8-bit sequence, 10101011, that indicates the start of a frame.

The source and destination address fields in the Ethernet frame are 48 bits long. The source address field specifies the Ethernet address of the sending network interface hardware.
Each Ethernet address is fixed in the Ethernet network interface hardware. Since the IEEE distributes numbers to the manufacturers of Ethernet network interface hardware, it is certain that every station world-wide has a unique address. An Ethernet frame can be sent to a specific Ethernet network interface or to all Ethernet network interfaces actively connected to the physical network. This is done by putting either a specific Ethernet address or the broadcast address in the destination address field. The 48-bit Ethernet address consisting of all ones is defined as the broadcast address.

In the radio-frequency ALD implementation, the radio-frequency ALD transmitter uses its own Ethernet address in the source address field, and the broadcast address is put in the destination address field when a voice packet is transmitted.

The Protocol type field contains a 2-byte integer that identifies the type of data carried in the frame. The protocol layers above the network interface layer can use this information to determine which protocol should be used to process the data in the frame. In the radio-frequency ALD implementation, the higher level protocol is IP, and 8 is used in that field.

The Frame data field contains an arbitrary sequence of data of length from 38 up to 1492 bytes. If less data is to be sent, the data field is extended by appending extra bytes (pads) to satisfy the minimum frame size requirement. In the radio-frequency ALD implementation, an IP datagram containing the voice packet is put in this field.

After the data section comes the Frame Check Sequence field. It contains a 32-bit cyclic redundancy check (CRC) value which helps the interface detect transmission errors. The value is computed as a function of the contents of the Destination Address, Source Address, Protocol type, Frame data and pads (if present).
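The frame check sequence computation can be sketched as follows. This is the standard bitwise, bit-reflected form of the IEEE 802.3 CRC-32, with a hypothetical function name; the all-ones initial register plays the role of complementing the first 32 message bits in the formal procedure given below.

```python
def crc32_802_3(frame: bytes) -> int:
    """CRC-32 of an Ethernet frame body (addresses, type, data, pads).
    Bits are processed least-significant first, so the generating
    polynomial G(x) appears in its reflected form 0xEDB88320."""
    POLY = 0xEDB88320
    crc = 0xFFFFFFFF            # complements the first 32 message bits
    for byte in frame:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF     # final complement of the remainder
```

The well-known check value for this CRC is crc32_802_3(b"123456789") == 0xCBF43926, which can be used to verify an implementation.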
The encoding is defined by the following generating polynomial:

G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

The CRC value corresponding to a given frame is computed by the following procedure:

1. The first 32 bits of the frame are complemented.
2. The n bits of the frame are then considered to be the coefficients of a polynomial M(x) of degree n-1. (The first bit of the Destination Address corresponds to the x^(n-1) term and the last bit of the data field corresponds to the x^0 term.)
3. M(x) is multiplied by x^32 and divided by G(x), producing a remainder R(x) of degree <= 31.
4. The coefficients of R(x) are considered to be a 32-bit sequence.
5. The bit sequence is complemented and the result is the CRC.

The sender computes the CRC as a function of the Ethernet frame, and the receiver recomputes the CRC to verify that the frame has been received intact.

4.4 Voice packet size in the radio-frequency ALD

The voice packet size plays a significant role in the radio-frequency ALD design. It has a significant effect on the end-to-end transmission delay of the system, and affects the quality of speech perceived by the listener when a packet loss occurs. Its choice is also influenced by a number of network parameters such as the codec data rate, the header overhead and the network data rate. Their effects ought to be considered in order to determine an optimal packet length for the radio-frequency ALD.

The transmission delay depends on the codec data rate, the packet size, the network data transmission rate, and the header overhead. As the network data transmission rate, header overhead and codec data rate are fixed in the radio-frequency ALD design, the transmission delay depends mainly on the voice packet size. This delay occurs because, before transmission can occur, the transmitter has to buffer the voice data until a voice packet is full.
When a speaker communicates with a listener through the radio-frequency ALD and expects a response from the listener, the speaker will expect the response to come within a time-width known as the expectation time window. If the transmission delay is larger than this expected time-width, the speaker will notice the delay. This could adversely affect the efficiency of information exchange between the speaker and the listener. Studies in the telephone industry have found that a transmission delay in the range of 100-200 ms is acceptable [6]. Given that the data rate of the μ-law codec is 64 Kbps, this makes 800 bytes (corresponding to 100 ms) the maximum voice packet size.

The effect of packet losses in terms of packet size is categorized by Jayant and Christensen [22] as shown in Table 4.1.

Table 4.1 Categorization of packet loss distortions.

Voice packet size (ms):   < 4        16-32      64
Nature of distortion:     crackles   glitches   phoneme losses

Jayant and Christensen [22] also suggest that the shortest possible phoneme (i.e. the smallest distinguishable unit of speech) is roughly 20 ms, and that when the voice packet length is greater than 20 ms, the probability of totally losing a phoneme increases rapidly. Based on their results, it is desirable to make the voice packet as short as possible in order to minimize the effects of packet loss and transmission delay.

However, as the voice packet size decreases, the number of voice packets transmitted per second must increase in order to sustain the 64 Kbps codec data rate. Since there is a fixed amount of protocol overhead associated with each voice packet, as the voice packet transmission rate increases, the minimum network transmission rate required will also increase. The overhead for each voice packet is shown in Fig.
4.12, where L is the voice packet size in bytes.

[Fig. 4.12: Ethernet Frame Header (22 bytes) | IP Header (20 bytes) | Voice Packet (L bytes) | Ethernet Frame Trailer (4 bytes).]

Fig. 4.12 Overhead accompanying each voice packet.

The packet overhead is 46 bytes in the radio-frequency ALD design.

For L >= 18, the minimum data transmission rate required, R, in bits per second is given by

R = frame size (bytes) x 8 x number of voice packets transmitted per second
  = (46 + L) x 8 x (64,000 / 8L)
  = (46 + L) x 64,000 / L (bps)

For L < 18, the frame size is restricted to the minimum Ethernet frame size (64 bytes), and R is given by

R = 64 x 8 x (64,000 / 8L)
  = 4,096,000 / L (bps)

The plot of minimum data transmission rate versus voice packet size is shown in Fig. 4.13. The data rate of the μ-law codec and the data transmission rate of Channels 0-6 of the Arlan 650™ wireless network card are also shown in the diagram.

[Fig. 4.13: Minimum data transmission rate versus voice packet size (bytes), with horizontal lines marking the Arlan 650 Channel 0-6 data rate, the minimum transmission rate for an 8 ms packet, and the data rate of the μ-law codec.]

Fig. 4.13 Data transmission rate versus packet size.

As the packet size decreases, the data rate required to deliver the voice packets increases. Channels 0-6 of the Arlan 650™ wireless network card have the lowest data transmission rate (215 Kbps) among the 13 available channels. Since it is desirable for the radio-frequency ALD to be able to transmit voice packets on all 13 available channels, the voice packet size selected should be greater than or equal to 20 bytes (2.5 ms), which requires a rate of 211.2 Kbps to transmit.
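The two rate expressions above can be combined into a single sketch (hypothetical function name) that accounts for the 46-byte per-packet overhead and the 64-byte minimum Ethernet frame:

```python
def min_rate_bps(L: int) -> float:
    """Minimum network data rate (bps) needed to carry the 64 Kbps
    codec stream in L-byte voice packets, given 46 bytes of header and
    trailer overhead per packet and a 64-byte minimum Ethernet frame."""
    packets_per_second = 64_000 / (8 * L)   # codec bits/s over packet bits
    frame_bytes = max(46 + L, 64)           # frames are padded below L = 18
    return frame_bytes * 8 * packets_per_second

print(min_rate_bps(20))   # 211200.0 bps for a 2.5 ms (20-byte) packet
print(min_rate_bps(64))   # 110000.0 bps for an 8 ms (64-byte) packet
```

The two printed values reproduce the 211.2 Kbps and 110 Kbps figures quoted in the text.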
Although the radio-frequency ALD should be capable of supporting voice transmission using 2.5 ms voice packets, this is highly inefficient, as the overhead (46 bytes) accompanying each voice packet is more than double the length of the voice packet.

In order to minimize both the transmission delay and the perceptual effect of lost packets on the voice quality at the receiver, packets should be as short as possible. On the other hand, in order to maintain high channel utilization, it is desirable to keep the size of the packet as large as possible.

As a compromise, the radio-frequency ALD uses an 8 ms (64-byte) voice packet. It is shorter than the shortest possible phoneme (20 ms) observed by Jayant and Christensen [22], but longer than the overhead of the packet (46 bytes). The required transmission rate is 110 Kbps, which is well within the capability of the Arlan 650™ wireless network card.

4.5 Lost packet replacement strategies

A common feature of packet-switching communication systems is that they cannot guarantee accurate and prompt delivery of every packet. In large networks, this can be due to network congestion or transmission impairments. In the radio-frequency ALD, it is mainly caused by transmission impairments, which lead to occasional, random packet losses.

Packet losses cause distortion by introducing gaps in the speech sequence. The perceptual effect of such losses depends on the voice encoding scheme, the packet size, the packet loss rate and the speech segment affected.
Distortion may vary from negligible during silent periods to unacceptable during high-level voiced speech.

Speech transmission in a packet-switching network can tolerate some loss of packets without an adverse effect on the quality of the received speech as perceived by the listener. Different techniques have been developed to replace lost PCM voice packets in packet-switching networks [23], [7], [22]. They all aim to increase the tolerance of the packet voice system to missing packets. The choice of a particular packet replacement technique will depend on the packet loss rate and the signal processing power of the system. Two relatively simple methods, shown in Fig. 4.14, were considered in the design of the radio-frequency ALD system.

[Fig. 4.14: Waveforms of the utterance "ga" (389 ms) with no packet loss; with loss of an 8 ms packet and silence substitution applied; and with loss of an 8 ms packet and packet repetition applied.]

Fig. 4.14 Packet replacement techniques for the radio-frequency ALD.

The simplest way of dealing with the gaps caused by packet losses is known as silence substitution, or zero-stuffing. It requires no signal processing at the transmitter or receiver.
Every lost packet is treated as a silent interval in the transmitted speech. There is no published information on the effect of silence substitution on 8 ms voice packets. For 16 ms voice packets, silence substitution is tolerable (MOS > 3.5) only for small packet loss rates (2 percent maximum) [23].

The next simplest technique is known as packet repetition. It requires the receiver to store the contents of the most recently received packet. When one or more subsequent packets are missing, the receiver sends this stored information to the PCM decoder. This approach is more attractive because it is likely that the missing packet will resemble the immediately preceding packet. Unfortunately, there is no published information on the effect of packet repetition on 8 ms voice packets. For 16 ms packets, this technique can extend the maximum tolerable packet loss rate from 2 percent to 5 percent [23].

Informal listening with the radio-frequency ALD system indicates that the effect of packet loss is not noticeable using either silence substitution or packet repetition. It is reported that the effect of a 1 percent packet loss rate with 16 ms packets using silence substitution is noticeable only to critical listeners [22]. For the radio-frequency ALD system, the measured packet loss rate under normal circumstances is only 3.05 x 10^-4 (see Table 5.1 on p. 84), and the radio-frequency ALD system also uses a shorter voice packet (8 ms instead of 16 ms). The effect of packet loss should therefore be minimal for most listeners.

There are other, more sophisticated techniques, requiring more complicated signal processing, that can extend the maximum tolerable packet loss rate to approximately 10 percent [23], [22].
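The two replacement strategies can be sketched as a playout loop over the received packet sequence. The names are hypothetical, and zero bytes stand in for silence here; the actual μ-law code for zero amplitude is a different byte value.

```python
def conceal(packets, packet_bytes=64):
    """Build playout streams for a packet sequence in which None marks
    a lost 8 ms packet: one stream using silence substitution, the
    other using packet repetition."""
    silence = bytes(packet_bytes)       # placeholder for a silent packet
    out_silence = bytearray()
    out_repeat = bytearray()
    last = silence                      # repeated until a packet arrives
    for p in packets:
        if p is None:                   # lost packet
            out_silence += silence      # substitute silence
            out_repeat += last          # repeat most recent packet
        else:
            out_silence += p
            out_repeat += p
            last = p
    return bytes(out_silence), bytes(out_repeat)
```

A 16 ms or 32 ms gap is simply two or four consecutive lost packets, in which case packet repetition plays the stored packet several times over, matching the longer-gap test conditions of Section 5.2.1.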
Application of these packet replacement techniques is not necessary given the low packet loss rate and the small packet size used.

Chapter 5 Evaluation of the radio-frequency ALD

5.1 Packet error rates of the radio-frequency ALD

The packet error rate of the radio-frequency ALD achieved in a given environment will depend on the transmission conditions, the location of the antenna, the type of building and the number of obstacles (people, furniture, walls, partitions, etc.) in the direct path between the transmitter and the receiver. Since these are variables, it is difficult to predict how well the radio-frequency ALD will operate in any specific situation.

To find the packet error rates of the radio-frequency ALD, we measured the packet loss rate in different environments. The tests were carried out in the Hector Macleod Building, a large multi-use building on the UBC campus.

In the packet error measurements, a radio-frequency ALD transmitter and receiver arrangement was simulated using Channel 12 of the Arlan 650™ wireless network card.¹ This channel has the highest data rate (946 Kbps) and is more susceptible to packet loss than the other, lower bit rate channels [30]. Instead of 64 bytes of voice data (corresponding to 8 ms of voice), 64-byte sequences of data known to the receiver were transmitted every 8 ms. The receiver compared the received sequences with the expected sequences, and recorded the packets lost during transmission.

¹ Note that current Canadian DOC regulations prohibit the use of Channels 13 and 14 [30].

The packet error measurements were carried out with the transmitter and the receiver placed in line-of-sight at four different locations in the Hector Macleod Building, as shown in Fig. 5.1 and Fig.
5.2.

• At either end of the short corridor on the 4th floor (outside Room 459 and Room 441).
• At either end of the long corridor on the 4th floor (outside Room 439 and Room 402).
• In the Communication Laboratory (Room 458; approx. 30' x 40').
• In a lecture room (Room 228; approx. 37' x 60').

Measurements at the different locations took place during the period of June 28 to July 17, 1993. Fig. 5.3 shows the cumulative number of erroneous packets plotted against the number of packets transmitted for seven measurements.

The packets were transmitted at 8 ms intervals, and the number of packets transmitted is related to the time elapsed since the start of the measurement by

Time elapsed = (number of packets transmitted) x 8 ms

The seven measurements in Fig. 5.3 lasted from 19 hrs 52 min (Measurement #6) to just over 48 hrs (Measurement #5). The packet loss rate varied from location to location but was more or less constant throughout each measurement.

[Fig. 5.1: Floor plan showing the location of the radio-frequency ALD transmitter and receiver in Measurements #1, 2, 3, 6, 7, 8, 9 and 10.]

Fig. 5.1 Location of radio-frequency ALD transmitter and receiver in Measurements #1, 2, 3, 6, 7, 8, 9 and 10.

[Fig. 5.2: Hector Macleod Building 2nd floor plan showing the location of the radio-frequency ALD transmitter and receiver in Measurements #4 and 5.]

Fig. 5.2 Location of radio-frequency ALD transmitter and receiver in Measurements #4 and 5.

[Fig. 5.3: Cumulative number of erroneous packets versus number of packets transmitted for measurements in the short corridor on the 4th floor (outside Rm 459 and Rm 441), the long corridor on the 4th floor (outside Rm 439 and Rm 402), a lecture room (Rm 228) and the Communication Lab. (Rm 458).]

Fig.
5.3 Packet error measurement of the radio-frequency ALD.

The mean packet error rate (number of lost packets / total packets transmitted), the mean number of correct packets transmitted between losses (1 / packet error rate) and the mean time between packet losses (mean number of correct packets transmitted between losses x 8 ms) are shown in Table 5.1.

Table 5.1 Mean packet error rate in Measurements #1-7.

Measurement #   Mean packet error rate   Mean correct packets between losses   Mean time between losses (s)
1               3.64 x 10^-4             2750.49                               22.00
2               3.50 x 10^-4             2858.67                               22.87
3               3.18 x 10^-4             3146.89                               25.18
4               3.10 x 10^-4             3228.91                               25.83
5               2.89 x 10^-4             3457.78                               27.66
6               2.59 x 10^-4             3861.59                               30.89
7               2.43 x 10^-4             4116.87                               32.93
mean            3.05 x 10^-4             3345.89                               26.77

Packet error rates fluctuated during some of the measurements. Measurements #8, #9 and #10 in Fig. 5.4 show large increases in packet error rates for brief periods.

The sudden rises in packet error rate could be the result of external interference or other transmission impairments. Measurement #10, shown in Fig. 5.4 and Fig. 5.5, was obtained in the Communication Laboratory. During the course of the measurement, another Arlan 650™ wireless network card was deliberately set to transmit data over the same data channel (Channel 12) as the radio-frequency ALD (i.e. two spread spectrum transmitters with the same spreading code, modulation method and transmitting frequency operating in the same area). The packet error rate increased under such interference.

[Fig. 5.4: Cumulative packet errors versus packets transmitted for measurements in the long corridor on the 4th floor (outside Rm 439 and Rm 402) and in the Communication Lab. (Rm 458), showing brief periods of sharply increased error rate.]

Fig.
5.4 Packet error measurements under interference.

There were several Arlan 650™ wireless network cards installed in the Communication Laboratory connecting PCs and workstations. They used the same data channel (Channel 12) as that used in the packet error measurements. If they were operating during the packet error measurements, the interference generated by them would account for the sudden increases in packet error rates in Measurements #8 and #9.

[Fig. 5.5: Packet error measurement in the Communication Lab. (Measurement #10). The mean packet error rate is approximately 2.7 x 10^-4 until a second Arlan 650 transmitter and receiver start data transmission in the same room using the same channel, rises to approximately 1.1 x 10^-2 during the interference, and returns to approximately 3.6 x 10^-4 after the second set stops transmitting.]

Fig. 5.5 Packet error measurement under interference.

The kind of interference experienced in Measurement #10 is unlikely to occur in the radio-frequency ALD application. If several radio-frequency ALDs were operating in the same area, each would use a different spreading code to prevent interference with one another.

5.2 Miller and Nicely audiometric test

Packet losses in the radio-frequency ALD cause distortions in the transmitted speech. The distortion depends on the packet size and the speech segments affected. A single 8 ms speech segment lost in a word or sentence would have little or no effect on the score of a word or sentence intelligibility test, because the lost information can most likely be reconstructed from the context of the word or the sentence.

The Miller and Nicely test [11] was used to investigate the effect of packet losses on the intelligibility of speech transmitted by the radio-frequency ALD.
This is a syllable test with little contextual information presented. The Miller and Nicely test is designed to give an articulatory analysis of the types of errors made by the listener. Sixteen consonants (/p/, /t/, /k/, /f/, /θ/, /s/, /ʃ/, /b/, /d/, /g/, /v/, /ð/, /z/, /ʒ/, /m/ and /n/) are used in the Miller and Nicely test. In the test, each of the 16 consonants is spoken before the vowel /a/ to form a syllable (pronounced "pa", "ta", "ka", "fa", "tha" as in "thank", "sa", "sha", "ba", "da", "ga", "va", "tha" as in "that", "za", "zha", "ma" and "na", respectively). These 16 consonants make up almost three quarters of the consonants in normal speech and about 40 percent of all phonemes [11].

Each test consists of a number of syllable presentations arranged in a random order. After each syllable presentation the listener is asked to indicate which of the 16 syllables was presented.

The 16 syllables can be classified according to the articulatory process used to generate the sounds. There are five articulatory features that serve to characterize and distinguish the different phonemes: voicing, nasality, affrication, duration and place of articulation. These features of speech production are reflected in certain specific acoustical characteristics which are used by the listener to differentiate the syllables. The following set of features is used as a basis for classification.

1. Voicing. In articulatory terms, the vocal cords do not vibrate when the consonants /p/, /t/, /k/, /f/, /θ/, /s/ and /ʃ/ are produced, but they do vibrate for /b/, /d/, /g/, /v/, /ð/, /z/, /ʒ/, /m/ and /n/. Acoustically, this means that the voiceless consonants are aperiodic or noisy in character, whereas a periodic or line-spectrum component is superimposed on the noise for voiced consonants.

2. Nasality. To articulate /m/ and /n/ the lips are closed and the pressure is released through the nose by lowering the soft palate at the back of the mouth.
The nasal resonance introduced in this way provides an acoustical clue.

3. Affrication. If the articulators (i.e. tongue and lips) close completely, the consonant may be a stop or a nasal, but if they are brought close together and air is forced between them, the result is a kind of turbulence or friction noise that distinguishes /f/, /θ/, /s/, /ʃ/, /v/, /ð/, /z/ and /ʒ/ from /p/, /t/, /k/, /b/, /d/, /g/, /m/ and /n/.

4. Duration. This is the name that designates the difference between /s/, /ʃ/, /z/ and /ʒ/ and the other 12 consonants. These four consonants are long, intense, high-frequency noises, but the most effective feature in setting them apart is their extra duration.

5. Place of Articulation. This feature has to do with where in the mouth the major constriction of the vocal passage occurs. The 16 consonants can be separated into three groups, with /p/, /b/, /f/, /v/ and /m/ as front (0); /t/, /d/, /θ/, /s/, /ð/, /z/ and /n/ as middle (1); and /k/, /g/, /ʃ/ and /ʒ/ as back (2) consonants.

The classification of the 16 syllables is summarized in Table 5.2.

Table 5.2 Classification of consonants used to analyze confusions.

Consonant   Voicing   Nasality   Affrication   Duration   Place
p           0         0          0             0          0
t           0         0          0             0          1
k           0         0          0             0          2
f           0         0          1             0          0
θ           0         0          1             0          1
s           0         0          1             1          1
ʃ           0         0          1             1          2
b           1         0          0             0          0
d           1         0          0             0          1
g           1         0          0             0          2
v           1         0          1             0          0
ð           1         0          1             0          1
z           1         0          1             1          1
ʒ           1         0          1             1          2
m           1         1          0             0          0
n           1         1          0             0          1

The results of the Miller and Nicely test are presented in a confusion matrix, as shown in Fig. 5.6. The syllables presented to the listener are indicated by the consonants listed vertically in the first column, and the responses of the listener are indicated horizontally across the top of the matrix. The number in each cell is the frequency with which each stimulus-response pair was observed. The number of correct responses can be obtained by totalling the entries along the main diagonal.
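The diagonal-sum scoring just described, together with the feature grouping applied to the confusion matrix later in this section, can be sketched as follows. The names are hypothetical, ASCII labels such as th and dh stand in for the phonetic symbols, and only the voicing column of Table 5.2 is shown.

```python
# Voicing values from Table 5.2 (1 = voiced), with ASCII stand-ins for
# the phonetic symbols: th = unvoiced 'th', dh = voiced 'th', and
# sh / zh for the two palatal fricatives.
VOICING = {'p': 0, 't': 0, 'k': 0, 'f': 0, 'th': 0, 's': 0, 'sh': 0,
           'b': 1, 'd': 1, 'g': 1, 'v': 1, 'dh': 1, 'z': 1, 'zh': 1,
           'm': 1, 'n': 1}

def articulation_score(confusions):
    """confusions maps (stimulus, response) pairs to observed counts;
    the score is the diagonal total over all presentations."""
    total = sum(confusions.values())
    correct = sum(n for (s, r), n in confusions.items() if s == r)
    return correct / total

def feature_score(confusions, feature):
    """Group the matrix by a feature: a response counts as correct
    when it shares the stimulus's feature value (e.g. voicing)."""
    total = sum(confusions.values())
    correct = sum(n for (s, r), n in confusions.items()
                  if feature[s] == feature[r])
    return correct / total
```

For example, a /p/ heard as /t/ lowers the overall articulation score but still counts as correct for voicing, since both consonants are voiceless; this is why the grouped score is never smaller than the overall score.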
The overall articulation score is obtained by dividing the number of correct responses by the number of syllables presented to the listener.

[Fig. 5.6: An example 16 x 16 confusion matrix for the consonants p, t, k, f, θ, s, ʃ, b, d, g, v, ð, z, ʒ, m, n. Most responses lie on the main diagonal (20 correct responses per consonant in most rows), with confusions such as k heard as t, θ and s confused with each other, b heard as f, and ð and z confused with each other. Overall articulation score = 272 / 320.]

Fig. 5.6 A confusion matrix.

In order to analyze the articulation score, the scores of different syllables in the confusion matrix can be combined according to their articulatory features. Combining syllables in the confusion matrix creates a smaller confusion matrix that shows the confusions between groups, and the sum along the diagonal of the smaller confusion matrix gives a new articulation score for the articulatory feature. The new score will be greater than the original score, since all the responses that were originally correct remain so, and in addition all the confusions within each group are now considered to be correct in the new score.

[Fig. 5.7: The confusion matrix of Fig. 5.6 partitioned into four quadrants by grouping the voiceless consonants (p, t, k, f, θ, s, ʃ) and the voiced consonants (b, d, g, v, ð, z, ʒ, m, n).]

Fig.
5.7 A confusion matrix grouped by voiced and voiceless consonantsUsing the classifications in Table 5.2, the confusion matrix in Fig. 5.6 can be groupedinto four portions dividing the voiced and voiceless consonant as shown in Fig. 5.7. Thescores of each portion can then be summed to form the confusion matrix for the voicingfeature as shown in Fig. 5.8. The probability that the voicing feature will be perceivedcorrectly can be calculated by summing the diagonal cell in the confusing matrix (i.e.the articulation score for voicing ).91Chapter 5 Evaluation of the radio-frequency ALDvoiceless^voicedvoiceless^140 0voiced 5 175articulation score for voicing = 315 / 320Fig. 5.8 confusion matrix for voiced and voiceless consonantUsing a similar method, a new set of articulation scores for each of the 'articulatoryfeature in Table 5.2 can be obtained by combining the syllables in the confusion matrix.This set of scores will indicate how well different articulatory features will be perceivedcorrectly.5.2.1 Miller and Nicely Test proceduresThe purpose of the experiment was to determine whether the intelligibility of trans-mitted speech will decrease when the speech material processed by the p-law encodingalgorithm is subjected to various lost packet conditions.Ten recordings of the 16 Miller and Nicely syllables by two Canadian Englishspeakers processed by the ti-law encoding algorithm were stored in a Nextml computer.From the ten recordings, the shortest and the longest sounds of the 16 syllables fromeach speaker were selected as the control set (a total of 64 syllables). The measuredlength of these syllables in the control set varies from 225 ms to 612 ms with a meanlength of 448 ms.Simulated packet losses of various packet sizes were applied to the 64 syllables inthe control set to make new sets of syllables. 
The speech segment chosen to correspondto the simulated packet loss was the segment in the syllable that contains the maximum92Consonant region11111111 111111111111 1111111111111111111111111111Chapter 5 Evaluation of the radio-frequency ALDamount of voice information (i.e. the region where the end of the consonant meets thestart of the vowel as shown in Fig. 5.9.).Vowel regionUtterance of 'ma'I11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111Speech segment affected by simulated packet lostFig. 5.9 Simulated packet loss in the Miller and Nicely syllablesThe following eight test conditions were used.1. Control Set (original syllables).2. 8 ms packet loss replaced by silence.3. 8 ms packet loss replaced by the last received 8—ms-packet.4. 16 ms packet loss replaced by silence.5. 16 ms packet loss replaced by the last received 8—ms-packet repeating two times tofill the 16 ms packet loss.6. 32 ms packet loss replaced by silence.7. 32 ms packet loss replaced by the last received 8—ms-packet repeating four times tofill the 32 ms packet loss.8. 64 ms packet loss replaced by silence.93Next ComputerSound-proof roomAudiometerChapter 5 Evaluation of the radio-frequency ALDThe arrangement shown in Fig. 5.10 was used to conduct the tests. The Nextcomputer outside a sound-proof room was used to present the syllables to the subjects.Sound output from the Next'rm computer was administered to the subjects throughheadphones at 50 dBHL. Sound level control was accomplished using the audiometer.The subject responded through the MacintoshTm Computer in the sound-proof room.After each syllable presentation, the NextTm computer waited for the subject to respondbefore presenting the next syllable.Fig. 5.10 Set up for the Miller and Nicely testFor each test condition, the 64 syllables in a set were repeated five times (i.e. 
eachtest had a total of 320 syllables) and presented to the subjects in a random order. Aftereach syllable presentation, the listener was asked to indicate (to guess, if necessary)which of the given 16 syllables was heard.The experiment was divided into three stages. In the first stage, a normal hearing pilotsubject was used to check out the experimental procedures and to obtain primarily resultsin the experiment. Results and feedbacks in the first stage were used as guidelines in the94Control 64 ms8 ms 16 ms 32 ms51.570.374.782.881.9overall scoreSilence substitutionChapter 5 Evaluation of the radio-frequency ALDsecond stage and third stage where normal and hearing impaired subjects were tested.5.2.2 Results of the Miller and Nicely TestResults of the pilot subjectA normal hearing subject was tested with the Control set; 8 ms, 16 ms, 32 ms and64 ms simulated packet lost replaced with silence substitution. The overall score of thetests are shown in Table 5.3.Table 5.3 Overall Scores of the pilot subjectThe results show that the score for silence substitution in 8ms packet lost has aslightly higher score ( 0.9 %) than the Control set, which indicates a single 8 ms packetloss has small or no effect to the subject in terms of the intelligibility. The results alsoindicate that as the length of speech segment lost increases, the overall score decreases.The pilot subject indicated that great concentration and effort is needed for the 64 mssilence substitution test, and there is a lot of uncertainty in getting the correct syllables.To avoid introducing any hardship on the subjects, it was decided that 64 ms silencesubstitution test will be dropped from the second and third stages of the experiment.95Chapter 5 Evaluation of the radio-frequency ALDResults using normal hearing subjectsSix young normal hearing subjects with training in phonetic pronunciation (studentsof the Department of Audiology and Speech Sciences) were tested. 
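The two lost-packet replacement strategies applied in these tests, silence substitution and packet repetition, can be sketched as follows. This is a hypothetical illustration; the packet size of 64 samples (8 ms at an assumed 8 kHz sampling rate) and the function names are assumptions, not code from the thesis:

```python
PACKET = 64  # samples per 8 ms packet, assuming an 8 kHz sampling rate

def silence_substitution(samples, lost_start, lost_len):
    """Replace the lost span of samples with zeros (silence)."""
    out = list(samples)
    out[lost_start:lost_start + lost_len] = [0] * lost_len
    return out

def packet_repetition(samples, lost_start, lost_len):
    """Fill the lost span by repeating the last received packet."""
    out = list(samples)
    last = out[lost_start - PACKET:lost_start]  # last good 8 ms packet
    for i in range(lost_len):
        out[lost_start + i] = last[i % PACKET]
    return out

# A 16 ms loss (two packets) repeats the last received packet twice,
# as in test condition 5 above.
speech = list(range(4 * PACKET))
repaired = packet_repetition(speech, 2 * PACKET, 2 * PACKET)
```

A 32 ms loss would likewise repeat the last packet four times, matching test condition 7.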
The reason for using normal hearing subjects is that they have more sensitive hearing than hearing impaired listeners for detecting the minor changes in the syllables resulting from the packet replacement strategies applied to small segments of the speech materials.

Seven tests (the control set; silence substitution of 8 ms, 16 ms and 32 ms packets; and packet repetition of 8 ms, 16 ms and 32 ms packets) were applied to each subject in random order. The overall scores of the tests are shown in Table 5.4.

Table 5.4 Miller and Nicely test results for the normal hearing subjects.

                      Silence substitution        Packet repetition
Subject   Control    8 ms    16 ms   32 ms      8 ms    16 ms   32 ms
1          86.9      90.0    84.7    80.6       89.1    85.3    76.6
2          88.8      89.7    86.9    78.4       88.4    86.6    81.3
3          85.0      80.6    82.8    75.9       81.6    83.4    75.6
4          82.2      82.5    81.9    75.3       86.6    83.4    74.3
5          91.3      90.0    89.4    74.7       88.1    87.2    83.1
6          87.8      81.6    82.2    74.1       84.4    80.6    72.5
Mean       87.0      85.7    84.7    76.5       86.4    84.4    77.2

The mean results show that the scores for silence substitution and packet repetition with an 8 ms packet loss are slightly lower (within 1.3%) than for the control set, which indicates that a single 8 ms packet loss has only a small effect in degrading the intelligibility of the syllables. The results also indicate that, in terms of intelligibility scores, the two packet replacement techniques are within 0.7% of each other.

Results using hearing impaired subjects

Since the results for the normal hearing subjects show that the two packet replacement techniques have similar effects, the tests on hearing impaired subjects concentrated on the silence substitution technique. Four tests (the control set and silence substitution of 8 ms, 16 ms and 32 ms packets) were applied to each subject in random order.

Three senior hearing impaired subjects with mild hearing loss at high frequencies were tested. Their audiograms are shown in Fig. 5.11. The overall scores of the tests are shown in Table 5.5.

Table 5.5 Miller and Nicely test results for the hearing impaired subjects.

                     Silence substitution
Subject   Control    8 ms    16 ms   32 ms
1          73.8      76.9    69.1    59.1
2          69.7      71.3    68.1    60.3
3          82.8      76.6    74.7    68.4
Mean       75.4      74.9    70.6    62.6

The mean results show that the score for silence substitution with an 8 ms packet loss is slightly lower (within 0.5%) than for the control set, which indicates that a single 8 ms packet loss has only a small degrading effect on intelligibility. The results also indicate that, as for the normal hearing subjects, the overall score decreases as the length of the lost speech segment increases.

[Fig. 5.11 Audiograms of the hearing impaired subjects. The original figure shows pure tone audiograms (hearing level in dB against frequency in Hz, right and left ears) for the three listeners.]

5.2.4 Effect of packet loss on speech intelligibility

The mean results of the experiment for the pilot subject, the normal hearing subjects, and the hearing impaired subjects are summarized in Fig. 5.12 (0 ms speech segment loss corresponds to the scores of the control set).

[Fig. 5.12 Results of the Miller and Nicely tests. The original figure plots overall scores against the length of the lost speech segment (0 to 64 ms) for silence substitution (pilot, normal hearing and hearing impaired subjects) and packet repetition (normal hearing subjects).]

For the normal hearing subjects, an 8 ms packet loss has a small effect on the overall score (within 1.3% of the control set) with either silence substitution or packet repetition. The hearing impaired subjects have lower mean scores than the normal hearing subjects.
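The articulatory-feature scores discussed below are obtained by collapsing the confusion matrix so that confusions within a feature group count as correct, as described earlier for the voicing feature. A minimal Python sketch; the dict layout and the four-consonant toy matrix are illustrative assumptions, not the thesis's analysis code:

```python
# Feature values follow Table 5.2 (voicing shown for four consonants).
VOICING = {"p": 0, "f": 0, "b": 1, "v": 1}

def feature_score(confusions, feature):
    """Articulation score after grouping consonants by feature value.

    `confusions[stim][resp]` counts responses; a confusion inside a
    group (same feature value) is treated as correct.
    """
    correct = total = 0
    for stim, row in confusions.items():
        for resp, count in row.items():
            total += count
            if feature[stim] == feature[resp]:
                correct += count
    return correct / total

# Toy matrix: /p/-/f/ and /b/-/v/ confusions stay within voicing groups,
# so the voicing score is perfect even though some syllables were missed.
toy = {"p": {"p": 18, "f": 2}, "f": {"f": 20},
       "b": {"b": 15, "v": 5}, "v": {"v": 20}}
print(feature_score(toy, VOICING))  # -> 1.0
```

Running the same collapse with the nasality, affrication, duration and place columns of Table 5.2 gives the per-feature scores plotted in Figs. 5.14 to 5.16.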
The overall scores of each hearing impaired subject are shown in Fig. 5.13. The overall score decreases as the length of the lost speech segment increases, but an 8 ms packet loss has a greater effect on subject 3 than on the other two subjects.

[Fig. 5.13 Overall scores for the hearing impaired subjects. The original figure plots each subject's overall score against the length of the lost speech segment (0 to 32 ms) under silence substitution.]

Fig. 5.14, Fig. 5.15 and Fig. 5.16 show the mean articulatory scores obtained under different packet loss conditions using the different packet replacement strategies for the two groups of listeners (0 ms speech segment loss corresponds to the scores of the control set). The results show that some of the articulatory features are better preserved than others when packets are lost. In general, as the length of the lost speech segment increases, the scores for the different articulatory features decrease.

[Fig. 5.14 Results of different articulatory scores in silence substitution for normal hearing subjects. The original figure plots the voicing, nasality, affrication, duration and place of articulation scores against the length of the lost speech segment.]

For the normal hearing subjects, the articulatory scores obtained using the silence substitution and packet repetition techniques are similar, except that the voicing articulatory feature is better preserved using the packet repetition technique.

[Fig. 5.15 Results of different articulatory scores in packet repetition for normal hearing subjects.]

The articulatory scores for the hearing impaired subjects are shown in Fig. 5.16. The scores of the different articulatory features vary from individual to individual. In general, the scores for the different articulatory features decrease as the length of the lost speech segment increases.

[Fig. 5.16 Results of different articulatory scores in silence substitution for hearing impaired subjects. The original figure plots the five feature scores against the length of the lost speech segment for each of the three subjects.]

The individual scores of the 16 syllables for the normal hearing subjects and the hearing impaired subjects under different packet loss conditions are shown in Fig. 5.17. The results indicate that some of the syllables, such as "pa", "ta", "ka", "fa", "Sha", "ga", "ma" and "na", are better preserved than others.

[Fig. 5.17 Scores of the 16 syllables under different packet loss conditions. The original figure plots the percentage correct for each syllable under the control, 8 ms, 16 ms and 32 ms packet loss conditions, for silence substitution (normal hearing and hearing impaired subjects) and packet repetition (normal hearing subjects).]

Syllable tests are the most difficult for intelligibility because no contextual information is presented in the stimuli.
When words or sentences are presented to the listener, as in a normal conversation, there is more contextual information, and the information lost in a small segment of speech is likely to be recovered from the contextual information in the rest of the utterance. In addition, the packet losses occur in a more or less random fashion, and do not always occur in the place that would affect the word or syllable most. The degrading effect of small speech segment losses on speech intelligibility should therefore be even smaller. The effect of a single (8 ms) or double (16 ms) packet loss in the radio-frequency ALD should be minimal for normal hearing or hearing impaired listeners.

Chapter 6 Conclusions

6.1 Summary

This thesis documents the design, implementation and testing of a secure radio-frequency Assistive Listening Device for hard of hearing listeners. The design combines spread spectrum and digital voice encoding technology to provide secure and high quality voice reception for hard of hearing users. The device offers hearing impaired listeners a better communication channel in difficult listening conditions than hearing aids alone can provide.

In spite of advances in digital voice encoding technology, it has not enjoyed widespread use in devices for the hard of hearing community. The results from our intelligibility tests indicate that, in terms of intelligibility, the CCITT G.722 and μ-law encoding algorithms are comparable to undigitized voice for hard of hearing listeners.

In order to investigate the feasibility of transmitting digitized voice using the direct sequence spread spectrum method, a radio-frequency ALD transmitter and receiver set was built based on the Arlan 650™ wireless network card.

The radio-frequency ALD was tested in different areas of a large multi-use building. The results of these tests indicate that the device performs well under normal circumstances, with a packet error rate around 3 x 10^-4.

Despite the low packet error rate, each lost packet results in the loss of an 8 ms speech segment. Two simple lost-packet replacement methods, silence substitution and packet repetition, were investigated to see if speech intelligibility would degrade substantially in the event of single and multi-packet losses. The results indicate that intelligibility loss is minimal for single or double packet losses. Intelligibility begins to drop when the lost speech segment is longer than 16 ms. We also found that packet repetition is slightly better than silence substitution in preserving speech intelligibility.

6.2 Suggestions for further work

The development of the radio-frequency ALD was accomplished on personal computers. The physical size of the device severely limits its mobility. Further work is needed to produce a device which could be used by the hard of hearing community.

Advances in digital voice processing have provided encoding algorithms that are more efficient, in terms of the bit rate required, than the μ-law algorithm used in this work. Application of such voice encoding algorithms in the radio-frequency ALD would reduce the data transmission rate and hence the cost of the final system.
Further study is needed to determine whether these algorithms are acceptable for hard of hearing listeners.

Appendix A Example of a Revised SPIN Test Form

Name ______________   Marker ______________   Date ______________
Form #4 of the Revised SPIN Test (12/83)
S/B ____   HIGH ____   LOW ____   ACCEPT? Y/N   Percent Hrg. ____

1. The doctor X-rayed his CHEST. H
2. Mary had considered the SPRAY. L
3. The woman talked about the FROG. L
4. The workers are digging a DITCH. H
5. Miss Brown will speak about the GRIN. L
6. Bill can't have considered the WHEEL. L
7. The duck swam with the white SWAN. H
8. Your knees and your elbows are JOINTS. H
9. Mr. Smith spoke about the AID. L
10. He hears she asked about the DECK. L
11. Raise the flag up the POLE. H
12. You want to think about the DIME. L
13. You've considered the SEEDS. L
14. The detectives searched for a CLUE. H
15. Ruth's grandmother discussed the BROOM. L
16. The steamship left on a CRUISE. H
17. Miss Smith considered the SCARE. L
18. Peter has considered the MAT. L
19. Tree trunks are covered with BARK. H
20. The meat from a pig is called PORK. H
21. The old man considered the KICK. L
22. Ruth poured herself a cup of TEA. H
23. We saw a flock of wild GEESE. H
24. Paul could not discuss the RIM. L
25. How did your car get that DENT? H
26. She made the bed with clean SHEETS. H
27. I've been considering the CROWN. L
28. The team was trained by their COACH. H
29. I've got a cold and a sore THROAT. H
30. We've spoken about the TRUCK. L
31. She wore a feather in her CAP. H
32. The bread was made from whole WHEAT. H
33. Mary could not discuss the TACK. L
34. Spread some butter on your BREAD. H
35. The cabin was made of LOGS. H
36. Harry might consider the BEEF. L
37. We're glad Bill heard about the ASH. L
38. The lion gave an angry ROAR. H
39. The sandal has a broken STRAP. H
40. Nancy should consider the FIST. L
41. He's employed by a large FIRM. H
42.
They did not discuss the SCREEN. L
43. Her entry should win first PRIZE. H
44. The old man thinks about the MAST. L
45. Paul wants to speak about the BUGS. L
46. The airplane dropped a BOMB. H
47. You're glad she called about the BOWL. L
48. A zebra has black and white STRIPES. H
49. Miss Black could have discussed the ROPE. L
50. I hope Paul asked about the MATE. L

Bibliography

[1] American National Standard Methods for the Calculation of the Articulation Index. ANSI S3.5-1969. American National Standards Institute, Inc., 1969.
[2] Internet Protocol. RFC 791. Defense Advanced Research Projects Agency, 1981.
[3] Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specification. ANSI/IEEE Std 802.3-1988. American National Standards Institute, Inc., 1988.
[4] Canada, Department of Communications. Table of frequency allocations 9 kHz to 275 GHz. Government Publications, 1991.
[5] D. N. Kalikow, K. N. Stevens, L. L. Elliott. Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. The Journal of the Acoustical Society of America, 61:1337-1351, 1977.
[6] Daniel Minoli. Optimal packet length for packet voice communication. IEEE Transactions on Communications, 27(3):607-611, 1979.
[7] David J. Goodman, Gordon B. Lockhart, Ondria J. Wasem, Wai-Choong Wong. Waveform substitution techniques for recovering missing speech segments in packet voice communications. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(6):1440-1448, December 1986.
[8] Denis Byrne. The speech spectrum: some aspects of its significance for hearing aid selection and evaluation. British Journal of Audiology, 11:40-46, 1977.
[9] E. L. Thorndike, I. Lorge. The Teacher's Word Book of 30,000 Words. Teachers College, Columbia University, New York, 1952.
[10] Fred H. Bess, Larry E. Humes. Audiology: The Fundamentals. Williams & Wilkins, 1990.
[11] George A. Miller, Patricia E. Nicely.
An analysis of perceptual confusions among some English consonants. The Journal of the Acoustical Society of America, 27:338-352, March 1955.
[12] H. Fletcher. Hearing, the determining factor for high-fidelity transmission. Proceedings of the Institute of Radio Engineers, 30:266-267, June 1942.
[13] I. Hirsh, E. Reynolds, M. Joseph. Intelligibility of different speech materials. The Journal of the Acoustical Society of America, 26:530-538, 1954.
[14] James L. Flanagan, Manfred R. Schroeder, Bishnu S. Atal, Ronald E. Crochiere, Nuggehally S. Jayant, Jose M. Tribolet. Speech coding. IEEE Transactions on Communications, 27(4):710-737, April 1979.
[15] Jerome D. Schein, David Peikoff. Canadians with Impaired Hearing. Publication Division, Statistics Canada, January 1992.
[16] Juin-Hwey Chen, Richard V. Cox, Yen-Chun Lin, Nikil Jayant, Melvin J. Melchner. A low-delay CELP coder for the CCITT 16 kb/s speech coding standard. IEEE Journal on Selected Areas in Communications, 10(5):830-849, June 1992.
[17] Leo L. Beranek. The design of speech communication systems. Proceedings of the Institute of Radio Engineers, 35:880-890, September 1947.
[18] Motorola Inc., P.O. Box 20912, Phoenix, Arizona 85036, U.S.A. Motorola Telecommunications Device Data, 1989.
[19] Nobuhiko Kitawaki, Hiromi Nagabuchi, Masahiro Taka, Kenzo Takahashi. Speech coding for ATM networks. IEEE Communications Magazine, pages 21-27, January 1990.
[20] N. R. French, J. C. Steinberg. Factors governing the intelligibility of speech sounds. The Journal of the Acoustical Society of America, 19:90-119, January 1947.
[21] N. S. Jayant. High-quality coding of telephone speech and wideband audio. IEEE Communications Magazine, pages 10-20, January 1990.
[22] Nuggehally S. Jayant, Susan W. Christensen. Effect of packet losses in waveform coded speech and improvements due to an odd-even sample-interpolation procedure. IEEE Transactions on Communications, 29(2):101-109, February 1981.
[23] Ondria J. Wasem, David J. Goodman, Charles A. Dvorak, Howard G. Page.
The effect of waveform substitution on the quality of PCM packet communications. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(3):342-348, March 1988.
[24] Paul Mermelstein. G.722, a new CCITT coding standard for digital transmission of wideband audio signals. IEEE Communications Magazine, pages 8-15, January 1988.
[25] Robert C. Bilger. Manual for the Clinical Use of the Revised SPIN Test. Department of Speech and Hearing Science, University of Illinois, Champaign, Illinois 61820, 1984.
[26] Robert C. Dixon. Spread Spectrum Systems. Wiley-Interscience, 1984.
[27] Ronald E. Crochiere, Richard V. Cox, James D. Johnston. Real-time speech coding. IEEE Transactions on Communications, 30(4):621-634, April 1982.
[28] Roy W. Gengel. Acceptable speech-to-noise ratios for aided speech discrimination by the hearing-impaired. The Journal of Auditory Research, XI:219-222, 1971.
[29] Spectrum Signal Processing Inc., 8525 Baxter Place, 100 Production Court, Burnaby, B.C., Canada V5A 4V7. TMS320C30 Processor Board User's Manual, March 1991.
[30] Telesystems SLW Inc., 85 Scarsdale Road, Suite 201, Don Mills, Ontario, Canada M3B 2R2. Arlan 650 Wireless Network Card User Guide, 1992.
[31] Terese Finitzo-Hieber, Tom W. Tillman. Room acoustics effects on monosyllabic word discrimination ability for normal and hearing-impaired children. Journal of Speech and Hearing Research, 21:440-458, September 1978.
[32] Texas Instruments, P.O. Box 1443, MS702, Houston, Texas 77001, U.S.A. TMS320C3x User's Guide, April 1990.
[33] William R. Daumer. Subjective evaluation of several efficient speech coders. IEEE Transactions on Communications, 30(4):655-662, April 1982.