UBC Theses and Dissertations

A subjective evaluation of the effects of digital channel errors in PCM and DPCM voice communication… Yan, James 1971

A SUBJECTIVE EVALUATION OF THE EFFECTS OF DIGITAL CHANNEL ERRORS IN PCM AND DPCM VOICE COMMUNICATION SYSTEMS
by
JAMES YAN
B.A.Sc., University of British Columbia, 1969
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE
We accept this thesis as conforming to the required standard
Research Supervisor
Members of Committee
Head of Department
Members of the Department of Electrical Engineering
THE UNIVERSITY OF BRITISH COLUMBIA
May, 1971
In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.
Department of Electrical Engineering
The University of British Columbia
Vancouver 8, Canada
Date: [illegible]
ABSTRACT
When the message sink of a speech communication system is a human, the ultimate criterion of system performance is the subjective quality of the output speech. Unfortunately, no tractable mathematical function has been found to adequately relate speech quality to the physical system parameters. For this reason, empirical methods must be utilized to assess the interplay between the subjective quality and the objective parameters of a speech communication system.
In this thesis the effects of transmission errors on the speech quality of pulse code modulation (PCM) and differential pulse code modulation (DPCM) voice communication systems are investigated. The subjective figure of merit adopted in this study is the listener's preference of the output speech with respect to a suitably chosen reference. Under some assumptions and restrictions, the system models of the two types of modulation systems of interest are formulated and simulated in an IBM System/360 Model 67 time-shared computer. With the aid of a special-purpose input/output interface, the simulated systems are used to process a recorded speech sample representative of English speech. The quality of the processed speech is then subjectively evaluated according to the isopreference method.
The results of the subjective evaluation are presented in the form of isopreference contours. These contours indicate that in both PCM and DPCM systems, the speech quality is dominantly influenced by quantization error when the channel is relatively error-free; whereas if the channel is relatively noisy, finer quantization offers no improvement in quality. Furthermore, encoding the quantizer's output by either natural or folded binary coding yields virtually identical speech quality. In a comparison with contours of constant system signal-to-distortion power ratio, the isopreference contours reveal that under some conditions, the system signal-to-distortion power ratio may be a reasonably adequate measure of human preference of speech. In terms of the minimum channel capacity required to achieve a desired speech quality, DPCM is found to perform better than PCM for three important channel models: the binary symmetric channel, the additive white Gaussian channel, and the Rayleigh fading channel with additive white Gaussian noise.
In the latter two cases, the performance improvement of DPCM over PCM increases with increasing desired speech quality for the range of speech quality considered in this study. Finally, the implications of the subjective tests' results in two suboptimal operations are discussed.
TABLE OF CONTENTS
I INTRODUCTION
1.1 Subjective Quality and Communication System Evaluation
1.2 Review of Related Research
1.3 Scope of the Thesis
II COMPUTER SIMULATION OF PCM AND DPCM VOICE COMMUNICATION SYSTEMS
2.1 Introduction
2.2 System Models
2.2.1 Model of a General Digital Communication System
2.2.2 PCM System Model
2.2.3 DPCM System Model
2.3 Digital Computer Simulation
III SPEECH QUALITY EVALUATION
3.1 Introduction
3.2 The Isopreference Method
3.3 Scaling of Isopreference Contours
3.4 Preparation of Speech Material
3.5 Further Details of Subjective Tests
3.5.1 Pilot Test
3.5.2 Test for Determining Isopreference Contours
3.5.3 Test for Rating Isopreference Contours
IV RESULTS OF SUBJECTIVE TESTS
4.1 Determination of Experimental Isopreference Contours
4.1.1 Normality Assumption
4.1.2 Determination of Isopreferent Signals by Least-Squares Fit
4.1.3 Plotting Isopreference Contours; Results of Rating Test
4.2 Comparison with the System Signal-to-Distortion Power Ratio
4.3 Discussion of the Subjective Tests' Results
4.3.1 Saturation Effect in Preference
4.3.2 The Minimum Required Channel Capacity of the BSC
4.3.3 The BSC Based on the Additive White Gaussian Channel
4.3.4 The BSC Based on the Rayleigh Fading Channel with AWG Noise
V CONCLUSION
APPENDIX A. FOLDED BINARY CODE
REFERENCES
LIST OF ILLUSTRATIONS
Figure
2.1 Model of a general digital communication system
2.2 A PCM communication system
2.3 Normalized frequency response of low-pass filter
2.4 Transition diagram of a binary symmetric channel
2.5 A DPCM communication system
2.6 Block diagram of digital recording and playback system. (a) Digital recording system. (b) Digital playback system
3.1 A typical isopreference contour
3.2 A typical psychometric curve
3.3 (a) Normalized amplitude probability density of speech. Symmetrical average of positive and negative data. (b) Power density spectrum of speech
4.1 (a) An experimental psychometric curve. (b) Preference in unit normal deviate plotted against stimulus. The line is fitted by a weighted least-squares method
4.2 PCM isopreference contours. The reference point of each contour is drawn solid. The number associated with each reference point is the subjective rating $(S/N)_{subj}$ of the reference. (a) Natural binary coding. (b) Folded binary coding
4.3 DPCM isopreference contours. See the caption of Figure 4.2 for more comments. (a) Natural binary coding. (b) Folded binary coding
4.4 Contours of constant system signal-to-distortion power ratio. The coding is natural binary. The curves in broken lines are the isopreference contours of the respective system. Also given are the signal-to-distortion power ratios in dB of the points marked "X". (a) PCM system. (b) DPCM system
4.5 Simulation results relating the system signal-to-distortion power ratio to the relative speech signal power. Transmission errors are neglected and natural binary coding is used. (a) PCM system. (b) DPCM system
4.6 Curves of constant channel capacity of the BSC. Superimposed are the contours of Figure 4.2(a)
4.7 Subjective quality ratings of Figures 4.2 - 4.3 plotted against their respective minimum required channel capacity when the digital channel is a BSC
4.8 Communication over an additive white Gaussian channel
4.9 Curves of constant capacity of the additive white Gaussian channel. Superimposed are the contours of Figure 4.2(a)
4.10 Subjective ratings of Figures 4.2 - 4.3 plotted against their respective minimum required channel capacity when the physical channel is AWG
4.11 Operating points for suboptimal speech quality in an AWG channel
4.12 Subjective ratings versus the channel SNR in a PCM and a DPCM system for N=5. The physical channel is an AWG channel
4.13 Communication over a Rayleigh fading channel with AWG noise
4.14 Optimum incoherent receiver for binary orthogonal signals
4.15 Curves of constant capacity of a Rayleigh fading channel with AWG noise. Superimposed are the contours of Figure 4.2(a)
4.16 Subjective ratings of Figures 4.2 - 4.3 versus their respective minimum required channel capacity when the physical channel is Rayleigh fading with AWG noise
4.17 Operating points for suboptimal speech quality in a Rayleigh fading channel with AWG noise
4.18 Subjective ratings versus the average received SNR in a PCM and a DPCM system for N=5. The physical channel is a Rayleigh fading channel with AWG noise
A.1 Folded and natural binary code
ACKNOWLEDGEMENT
I am grateful to the Defence Research Board of Canada and the National Research Council of Canada for support received under Grants DRB 2801-26 and NRC A-3308, respectively. Grateful acknowledgement is also given for the university fellowships and research assistantships received from 1969 to 1971.
I am deeply grateful to Dr. R.W. Donaldson, my research supervisor, for his helpful suggestions and constant encouragement. I would like to thank Dr. G.B. Anderson for reading the original draft and for his valuable comments. I am specially thankful to Dr. Donald Chan of Bell-Northern Research for his suggestions and for the many fruitful discussions we had. I also enjoyed the enlightening discussions with Mr. Ke-Yen Chang.
I wish to express my sincere appreciation to Messrs. A. MacKenzie and H.
Black for their technical assistance, to Miss Linda Morris for typing the manuscript, and to Messrs. K.Y. Chang, A.B.S. Hussain and G.T. Toussaint for proofreading the final copy.
I. INTRODUCTION
1.1 Subjective Quality and Communication System Evaluation
Since distortion is inevitable in a communication system, the prime objective of communication system design is to present to the message sink the least distorted replica of the input message. The significance of the phrase "least distorted" depends on how meaningful the distortion measure is. While quantitative distortion measures offer distinct advantages, no mathematical criterion can truly reflect all aspects of the performance of a communication system. The mean-square errors due to sampling, quantizing, random noise and cross-talk are subjectively different impairments. Indeed, in any communication system in which the message sink is a human, the ultimate test of acceptability is the subjective quality of the output message. Subjective quality may pertain to just one subjective factor such as intelligibility, or to a combination of factors such as one's preference of one system to another. Owing to the lack of a suitable mathematical function relating subjective factors to objective parameters, empirical methods must be utilized to assess the interplay between the subjective quality and the physical parameters of a communication system.
1.2 Review of Related Research
Theoretical investigations of the effects of channel errors in digital communication systems have been quite extensive. These investigations have been undertaken from the standpoint of different figures of merit. Among the various proposed figures of merit, the mean-square error (MSE) has, by far, been showered with the most attention.
Under some assumptions, the MSE has been derived for diverse configurations of digital communication systems operating with significant channel error rates [1]-[4]. A closely related performance measure, the output signal-to-noise ratio, was obtained by Yates-Fish and Fitch [4] and by Viterbi [5] for pulse code modulation (PCM) systems with channel errors, and by Wolf [6] for simple delta modulation (Δ-M) systems with a noisy channel. Other criteria, such as the mean magnitude error [7] and the information rate relative to the channel capacity [8], were also applied to binary PCM systems with transmission errors.
The optimization of the mathematical performance criterion of digital communication systems subject to sizable channel noise has also attracted numerous researchers. The minimization of the MSE of PCM systems under certain restrictions was reported by Donaldson [9], Palffy-Muhoray [10], and Wintz and Kurtenbach [11]. The minimization of the MSE of a differential PCM (DPCM) system with previous-sample feedback and logarithmic quantization was also studied in [10]. Chan and Donaldson [12] found the optimum pre- and postfilters in the mean-square sense for a general class of communication systems that include the typical digital modulation systems operating in the presence of channel noise. Kurtenbach and Wintz [13] determined the quantizer structure that would minimize the MSE of a PCM system with a given input probability density and a given channel matrix. Finally, combatting channel errors and thus improving system performance with the use of coding has been the fertile topic of a host of papers [14]-[22].
In contrast to the extensive theoretical works on the effects of channel errors, the available literature on digital communication systems evaluated on the basis of subjective figures of merit has been relatively limited.
The published results of subjective evaluations of digital communication systems have included both video systems [23]-[24] and voice systems. At this point, only the latter systems are of interest. In 1968, Donaldson and Chan [2] compared subjectively the performance of DPCM and PCM voice systems in the absence of channel noise. In a later work, Donaldson and Douville [25] used subjective tests to optimize and compare various analog and digital voice communication systems that were assumed to have negligible channel errors. The transmission quality of commercial telephone systems containing one or more PCM links was examined by Hashimoto and Saito [26] from the viewpoint of articulation score. Line noise and code errors were considered in [26]. Recently, Chan and Donaldson [27] studied the subjective effects of optimum pre- and post-filtering in pulse amplitude modulation (PAM) voice systems as well as in PCM and DPCM voice systems. Channel errors were ignored in the latter two systems. In the light of the significance of subjective evaluation and its lack of application to voice systems perturbed by random digital channel errors, efforts directed to fill this void are warranted.
1.3 Scope of the Thesis
The objective of this thesis is to investigate, on the basis of subjective quality, the performance of PCM and DPCM voice systems operating with significant transmission errors. Underlying this objective are two essential problems. The first is that the signal transformations inherent in a PCM or a DPCM communication system must be faithfully realized by some convenient method. The second problem is that the output speech of the voice systems must be properly evaluated in terms of a suitable subjective criterion. These two key problems are, respectively, the main motivation of the discussions presented in Chapters 2 and 3.
Because of its convenience, reproducibility and flexibility, digital computer simulation was chosen to realize the operations of PCM and DPCM voice communication systems. This choice is further justified in Chapter 2. Since the simulation of any system requires a system model, the PCM and DPCM system models adopted in this study are described in Section 2.2. The reasons guiding the selection of the system parameters and their values or range of values are also given. A description of the simulation facilities concludes the chapter.
As a technique for evaluating speech quality, the isopreference method, first proposed by Munson and Karlin [42], has evolved to be a widely used test procedure. In Chapter 3, the rationale of selecting the isopreference method is first presented. Following this is a detailed explanation of the method itself and the related important problem of scaling or rating speech quality. The relevant details of the subjective tests conducted in this study are documented in the last section of the chapter.
In Chapter 4 the results of the subjective tests are presented. These results are compared with the estimated system signal-to-distortion power ratios. They are also discussed in terms of the minimum channel capacity required to achieve a desired speech quality. In the discussions, the binary symmetric channel is viewed as a modulation scheme for some physical channels. Such a consideration reveals some interesting relations between subjective quality and some common system design parameters such as the transmitter power and the channel bandwidth.
A summary of the thesis and a brief indication of possible future related research are presented in Chapter 5.
II. COMPUTER SIMULATION OF PCM AND DPCM VOICE COMMUNICATION SYSTEMS
2.1 Introduction
In the study of the transmission and coding methods of speech, the instrumentation of the operating model of the system being considered must accommodate the objectives of the study. When subjective quality is of prime interest, as it is in this thesis, the operating model must not only be able to synthesize the transmission and coding scheme as it is conceived, but must also be able to satisfy the needs of subjective tests. Previous research [27]-[30] has demonstrated that digital computer simulation offers encouraging benefits when it is used to realize operating models suitable for subjective evaluation of speech quality. The application of digital simulation to research in various other communication systems has also been reported [31]-[33]. All these studies indicate that digital computer simulation yields the advantages summarized below.
(1) Considerable savings in time and money are achieved by eliminating the construction of a hardware model.
(2) Since the computer operates in a known fashion, precise control of the simulated system exists. This advantage lessens the uncertainty of whether to attribute unexpected results to the theory of the model, or to the imperfections in realizing the model.
(3) System parameters can be precisely specified and easily changed.
(4) Input and output data are exactly reproducible. This feature is crucial for performing quality evaluation tests.
(5) Isolation of each component or operation for specific study is possible.
The PCM and DPCM system models are described in Section 2.2. Included in the same section are the assumptions of and the restrictions on the models. The reasons for the choice of system parameters and their values or ranges of values are also given. In Section 2.3, both the hardware and the software simulation facilities are described.
2.2 System Models
2.2.1 Model of a General Digital Communication System
A model of a general digital communication system is represented by the block diagram in Figure 2.1. In this model, the source encoder maps the source's output $m(t)$, which may be continuous, into a sequence of vectors $X_k$, each generated at time $t = t_k$, $k = 0, 1, \ldots$ The elements of each $X_k$ are symbols chosen from a finite alphabet. The channel encoder then incorporates some error-control capability into the vector $X_k$ to produce another vector $Y_k$ whose dimensionality and alphabet may, in general, be different from those of $X_k$. The degree of error control is determined by various factors such as the desired error probability and the channel characteristics. The vector $Y_k$ is finally transformed by the modulator to a vector whose space is spanned by signals that are defined over $[t_k, t_k + T]$ and are suitable for transmission over the channel. At the receiving end, the inverse operations of modulation and encoding occur. Because the representation of $m(t)$ by the sequence $X_k$ is approximate, and also because noise is ever present in the channel, the output $\hat{m}(t)$ is a delayed and distorted version of the input message.
A unique digital communication system may be determined simply by specifying each block of Figure 2.1. In the case of PCM and DPCM, the only difference between the two systems lies in the source encoder and the source decoder. A comparative study of the performance of a PCM and a DPCM voice system must therefore require that all operations other than source encoding and decoding be common to both systems. This fundamental constraint is assumed in the rest of the thesis.
Fig. 2.1 Model of a general digital communication system (blocks: message source, source encoder, channel encoder, modulator, physical channel, demodulator, channel decoder, source decoder, message sink).
2.2.2 PCM System Model
Figure 2.2 shows in block diagram a PCM communication system.
It is clear that the system is uniquely defined if the transfer functions $F(f)$ and $G(f)$, the sampling rate $f_s$, the quantizer characteristics, the quantizer's output coding, and the digital channel matrix are all specified.
Fig. 2.2 A PCM communication system.
Fig. 2.4 Transition diagram of a binary symmetric channel.
For this study, the following assumptions and restrictions apply.
(1) Both the pre- and postfilters are constrained to be low-pass filters with a 3-dB cut-off frequency at 3.74 kHz. The filters provide an attenuation of at least 45 dB at 4.0 kHz. Figure 2.3 shows the filter's frequency response on a normalized scale.
(2) The sampling frequency $f_s$ is set at 8 kHz, which is the rate recommended by the Consultative Committee for International Telegraph and Telephone (CCITT).
(3) The quantizer is a logarithmic quantizer following the compression law
$$v_{out} = \frac{V}{\log(1+\mu)} \, \log\!\left(1 + \frac{\mu\,|v_{in}|}{V}\right) \operatorname{sgn} v_{in} \qquad (2.1)$$
where $V$ is the overload voltage and $\mu$ is the compression parameter. Smith [34] showed that with such a nonuniform quantization characteristic, the signal-to-quantizing-noise power ratio is relatively insensitive to the talker's volume. The parameter $\mu$ is fixed at the value of 100.
(4) The overload level $V$ is so chosen that a maximum of 1% peak-clipping is allowed. If the quantizer's input is filtered speech, $V$ may be set at 4 times the RMS value of the input signal. For a wide class of filters the instantaneous amplitude of filtered speech has a less than 1 percent probability of exceeding 4 times the input's RMS value [35]. Subjective tests [36] have also confirmed that a 1 percent probability of peak clipping is virtually undetectable.
(5) Two types of non-redundant, fixed-length binary codes are used to encode the quantizer's output. The first is the very common natural binary code. The second is the folded binary code. The latter code belongs to the class of symmetrical codes which appear to be suitable for speech, as speech waveforms tend to be symmetrical with respect to a quiescent level. Furthermore, Dostis [7] reported that in a logarithmically compandored PCM voice system, folded binary code produced, at low talker volumes, only error magnitudes much less than half of the full range amplitude. A brief description of folded binary coding is given in Appendix A.
(6) Channel encoding is not utilized.
(7) The digital channel model is the binary symmetric channel (BSC). The channel transition diagram of the BSC is given in Figure 2.4, in which $p$ is the bit error probability. This model assumes that the size of both the input and the output alphabets is two, and that the channel errors are statistically independent. Note that the BSC model assumes nothing about the nature of the physical channel, and imposes only limited restrictions on the modulation scheme and the decoding decision rule. For a lucid treatment of the BSC, see Wozencraft and Jacobs [37].
(8) The two parameters of each simulated system are the bit error probability $p$ and the number of quantization bits $N$. The range of $p$ is from 1.0 × 10^[exponent illegible] to 1.0 × 10^[exponent illegible], with $p$ incremented in steps of 0.5 on the logarithmic scale. The parameter $N$ is varied integrally from 2 to 6.
2.2.3 DPCM System Model
A model of a DPCM communication system is depicted in Figure 2.5. It is obvious that what distinguishes a DPCM system from a PCM system is the predictive feedback loop in the source encoder and the source decoder. The assumptions and restrictions imposed on the PCM system model are thus held valid for the DPCM system model. What remains to be specified is the structure of the predictor.
In this study, the predictor is restricted to be the optimum linear predictor based on the immediately previous sample. Thus, from Figure 2.5, the predictor output $\tilde{M}_k$ is the previous reconstructed sample $\hat{M}_{k-1}$ scaled by a coefficient $a$,
$$\tilde{M}_k = a\,\hat{M}_{k-1} \qquad (2.2)$$
such that $a$ is chosen to minimize $E[Z_k^2]$, where $Z_k$ is the prediction error and $E[\cdot]$ is the expectation operator. McDonald [38] showed that for negligible quantization error, $a = \rho_1$, where $\rho_1 = E[M_k M_{k-1}]/E[M_k^2]$. In the simulation the prediction coefficient $a$ is set at $\rho_1$. Note that to reconstruct the input signal, the predictor at the source decoder must be the same as that in the source encoder.
Fig. 2.5 A DPCM communication system.
While the input of the quantizer in the PCM system is the samples of the prefiltered speech, the corresponding input in a DPCM system consists of the prediction error $Z_k$. To determine the overload level $V$, $R_{z_k}(0) = E[Z_k^2]$ must be estimated. If the quantization error is ignored, $R_{z_k}(0)$ can be easily shown to be
$$R_{z_k}(0) = \sigma_{z_k}^2 = (1 - a^2)\,\sigma_{m_k}^2 \qquad (2.3)$$
where $\sigma_{m_k}^2 = R_{m_k}(0) = E[M_k^2]$ and $a = \rho_1$. Such an estimate is obviously poor for coarse quantization. In a trial simulation in which the overload level $V$ was set at four times $\sigma_{z_k}$ as estimated by (2.3), the estimation error was found to range from 29% for N=2 to 8.5% for N=6. In the same simulation the probability of exceeding $V$ is greater than 0.01 for $N$ varying from 2 to 6. In subsequent simulations of the DPCM system, $V$ was set at five times the experimental RMS value of $Z_k$ obtained in the trial simulation. For this choice of $V$ the percentage of overload is about 0.5 - 0.6%.
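The signal path of Sections 2.2.2 and 2.2.3 can be illustrated with a minimal Python sketch. It is not the FORTRAN IV and assembler software used in this study. The function names, the mid-point reconstruction rule in the compressed domain, the zero initial predictor state, and the use of natural binary code words only are all assumptions made for illustration. The sketch applies the compression law (2.1) with $\mu = 100$, passes the $N$-bit code words through a BSC with bit error probability $p$, and wraps the same quantizer in a previous-sample prediction loop with coefficient $a$.

```python
import numpy as np

MU = 100.0  # compression parameter of Eq. (2.1), fixed at 100 in Section 2.2.2


def mu_law_quantize(v, overload, n_bits):
    """Map input v to an integer level index in 0 .. 2**n_bits - 1."""
    x = np.clip(np.asarray(v, dtype=float) / overload, -1.0, 1.0)  # peak clipping at +/-V
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)       # Eq. (2.1), normalized by V
    levels = 2 ** n_bits
    idx = np.floor((y + 1.0) / 2.0 * levels).astype(int)           # uniform cells in compressed domain
    return np.clip(idx, 0, levels - 1)


def mu_law_reconstruct(idx, overload, n_bits):
    """Output voltage for a level index (assumed rule: mid-point of the compressed cell)."""
    y = (np.asarray(idx) + 0.5) / 2 ** n_bits * 2.0 - 1.0
    return overload * np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU


def bsc(idx, n_bits, p, rng):
    """Flip each bit of the natural-binary code words independently with probability p."""
    flips = rng.random((len(idx), n_bits)) < p
    noise = (flips * (1 << np.arange(n_bits))).sum(axis=1)  # one noise word per code word
    return np.asarray(idx) ^ noise                          # bitwise mod-2 addition


def pcm(m, overload, n_bits, p, rng):
    """PCM chain without filters: quantize, send over the BSC, reconstruct."""
    idx = mu_law_quantize(m, overload, n_bits)
    return mu_law_reconstruct(bsc(idx, n_bits, p, rng), overload, n_bits)


def dpcm(m, a, overload, n_bits, p, rng):
    """DPCM chain with a previous-sample predictor of coefficient a."""
    idx = np.empty(len(m), dtype=int)
    pred = 0.0                                   # assumed initial predictor state
    for k, mk in enumerate(m):
        idx[k] = mu_law_quantize(mk - pred, overload, n_bits)  # quantize the prediction error
        # The encoder predicts from its own reconstruction, as the decoder will.
        pred = a * (pred + mu_law_reconstruct(idx[k], overload, n_bits))
    rx = bsc(idx, n_bits, p, rng)
    out = np.empty(len(m))
    pred = 0.0
    for k in range(len(m)):
        out[k] = pred + mu_law_reconstruct(rx[k], overload, n_bits)
        pred = a * out[k]                        # same predictor as in the encoder
    return out
```

For example, `dpcm(m, 0.9, 0.5, 6, 1e-3, np.random.default_rng(0))` would process an array `m` with hypothetical settings $a = 0.9$, $V = 0.5$, $N = 6$ and $p = 10^{-3}$. With $p = 0$ the decoder tracks the encoder exactly, so the output error reduces to the quantization error of the prediction error.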
Note on the DPCM overload level of Section 2.2.3: in terms of the RMS value $\sigma$ of the speech data, $V \approx 3.5\,\sigma$.
2.3 Digital Computer Simulation
Effective simulation of speech communication systems requires (1) an input/output (I/O) facility fully capable of delivering speech to and recovering speech from a computer, (2) a computer with high-speed data-processing capability, and (3) a set of convenient and efficient data-handling subroutines. Each of the requirements as applied to this work is discussed in the following paragraphs.
The I/O facility utilized in our simulation is the digital recording and playback system described by Chan [39]. A block diagram of the system is given in Figure 2.6. The output of the recording system is digitized speech stored on 7-track digital magnetic tapes. The playback system converts digital speech samples to analog speech recorded on analog audio tapes.
Fig. 2.6 Block diagram of digital recording and playback system. (a) Digital recording system. (b) Digital playback system.
The simulation of the PCM and DPCM systems was accomplished in an IBM System/360 Model 67 time-shared computer. Since the 7-track format is not compatible with the FORTRAN I/O subroutines, a preliminary conversion from the 7-track format to the acceptable 9-track format is mandatory. The data on the 9-track digital tape are the input of the simulation programs.
The approach taken in programming the simulation was along "block diagram" lines. Each key operation of the system being simulated was realized with a subroutine. An entire system is simulated simply by calling the required subroutines simulating the operations that comprise the system. Such a programming strategy yields flexibility and simplicity.
Two programming languages were used to write the simulation software package. FORTRAN IV was used to handle I/O routines, supervisory chores, and the prediction needed in DPCM.
For logical and non-arithmetic data transformations such as simulating the BSC, the IBM System/360 assembler language was more suitable. In this connection, some remarks on the simulation of the BSC are in order. Another view [37] of the BSC is that if the channel input is the vector $X_k$, then the channel output vector $\hat{X}_k$ is
$$\hat{X}_k = X_k \oplus n_k \qquad (2.4)$$
where $\oplus$ is mod-2 addition and $n_k$ is a noise vector each element of which has probability $p$ to be a 1. All the vectors are binary-valued and have the same dimensionality. Also note that $p$ is the bit error probability.
To generate $n_k$ of dimension $M$, a subroutine [40] was called to generate $M$ pseudo-random numbers each uniformly distributed between 0 and 1. If a number was less than $p$, the corresponding element of $n_k$ was set to 1; otherwise the element would be 0. To avoid wasteful repetitions, the required number of bits of BSC noise for each $p$ of interest were generated and stored on magnetic tape prior to the actual simulation runs. In a simulation run, the noise data were read into core memory, blocked according to the right dimension, and finally used to simulate the degradation of the BSC. The processed digitized speech output was first stored on 9-track magnetic tapes, and later was converted to the proper 7-track format required for digital-to-analog conversion via the I/O hardware mentioned earlier.
III. SPEECH QUALITY EVALUATION
3.1 Introduction
The assessment of subjective factors believed to be important in the design of speech transmission systems has been extensively studied for the last sixty years. Over this period, various methods have been developed to solve system design problems with different states of the art and applications. Surveys of the methods for speech quality measurements may be found in the works of Swaffield and Richards [41], Munson and Karlin [42], and Hecker and Guttman [43].
In the past, intelligibility had been the primary subjective criterion for evaluating speech communication systems. However, because the intelligibility of the output speech of modern voice communication systems is frequently close to 100%, intelligibility has become an inadequate measure of speech quality. The concept of speech quality must now be extended to encompass the total auditory impression of speech on a listener. This impression includes, in addition to intelligibility, other factors such as loudness, preference, and speaker identification.
The parameter "preference" describes the average attitude of a listener towards a speech signal relative to another speech signal with reproducible characteristics. In this work, preference is assumed to be the dominant factor with respect to the over-all speech quality. Such an assumption is valid if (1) the speech signal is reasonably intelligible; (2) the speech signal is presented at the optimum loudness level, which is defined as the level at which the listener prefers to hear the signal; (3) the recognizability of the speaker is of little interest to the listener. Under these circumstances, which often apply in practical cases, the aspect of preference alone may represent speech quality for all practical purposes.
The recommended methods for preference measurements have been reviewed in an IEEE engineering practice [44]. The particular method chosen for this research is the isopreference method described in Section 3.2. In Section 3.3 the procedure for rating isopreference curves is discussed. Both Sections 3.4 and 3.5 give the necessary documentation of the subjective tests on which the results of this thesis are based.
3.2 The Isopreference Method

Originally proposed by Munson and Karlin [42], the isopreference method has been studied or applied by numerous researchers [2], [25], [27], [42], [44]-[46]. The basic assumption of the method is that speech quality may be characterized by a unidimensional scale based on the single subjective factor of preference. In this method a forced pair-comparison technique is utilized to compare directly a speech test signal with another speech signal called the reference. The speech test signal generally has only one varying parameter. The reference may be a speech output of some transmission system whose parameters under study are all held fixed at some values, or the reference may be a speech quality standard¹ used to rate preferences.

¹ The speech quality standard for preference rating is live speech or high-quality recorded speech that is artificially degraded to varying degrees in a measurable and reproducible manner.

The results of the various comparisons are normally shown in the form of isopreference contours on a plane whose coordinates may be the system parameters of interest or may be speech level versus noise level. A speech test signal is isopreferent to the reference if the scores averaged over all listeners show an equal preference (50%) for both signals. An isopreference contour is a curve that connects all points representing isopreferent speech signals.

To see how an isopreference contour is obtained, consider Figure 3.1. The parameters considered are labeled $\alpha$ and $\beta$. Suppose that the point A with coordinates $(\alpha_0, \beta_0)$ is chosen to be the reference, and that it is desired to find an isopreference contour through A. To determine the various points on the contour, different sets of speech test signals must be compared with A.
For example, to obtain the point B, signals with $\alpha$ fixed at its value for B and $\beta$ varying over a suitable range are each paired with A and presented in a random order to the listeners for a forced two-category judgment. The results, expressed in proportions, are then plotted against the parameter $\beta$. A smooth psychometric curve is drawn through the experimental points. A typical psychometric curve is shown in Figure 3.2. From this curve, the abscissa corresponding to a proportion of one half defines the value of $\beta$ at point B. The range of $\beta$ must be wide enough so that the proportions of judgments can vary from 0 to 1. The parameter chosen to vary depends on the expected shape of the contour, and some intuition is required. In Figure 3.1, for example, to obtain point C, $\alpha$ should be the varying parameter while $\beta$ is held fixed.

The procedure just described may be used to obtain more than one isopreference contour, provided the reference is redefined for each new contour. If, in one system, the interaction of various parameters is to be assessed, each possible pair of the parameters can define a distinct plane on which a unique set of isopreference contours is drawn. Similarly, different sets of isopreference contours can be obtained for different speech transmission systems that are evaluated on the basis of the same parameters. The isopreference method thus affords a convenient procedure for both intra- and intersystem subjective evaluation of speech communication systems.

Fig. 3.1 A typical isopreference contour

Fig. 3.2 A typical psychometric curve

3.3 Scaling of Isopreference Contours

Preference, by definition, is a relative measure of speech quality. In order that the isopreference contours of a single speech communication system or of different systems may be rated and compared, a common standard of quality is necessary. Various speech rating standards have been proposed and tested [2], [27], [41], [45], [47]-[49].
In this study the speech quality standard introduced by Schroeder [49] is adopted. The main advantages of Schroeder's standard are its easy reproducibility by digital means and its perceptual similarity to speech signals undergoing certain signal-dependent distortions such as quantizing and predictive coding. Consider a bandlimited signal s(t) whose Nyquist sample at time $t_k$ is $s(t_k)$. The family of sampled reference signals is then defined as follows:

$$v_a(t_k) = (1 + a^2)^{-1/2}\,[\,s(t_k) + a \cdot n(t_k)\,] \tag{3.1}$$

where a is a parameter determining the signal-to-noise ratio (SNR) of $v_a(t_k)$, and $n(t_k) = \epsilon(t_k) \cdot s(t_k)$. Here $n(t_k)$ is a noise sample obtained by multiplying $s(t_k)$ by a zero-mean discrete stochastic process $\epsilon(t_k) = \pm 1$ which is uncorrelated with the signal and whose samples form a sequence of uncorrelated random variables. It is seen that the SNR of $v_a(t_k)$ is given by

$$\mathrm{SNR} = a^{-2} \tag{3.2}$$

Observe that the SNR is uniquely defined on a sample-by-sample basis and is independent of time-averaging procedures or intermittency of the signal. Also, the one-parameter family of the reference signals $v_a(t)$ is consistent with the assumption in the isopreference method that speech quality is quantifiable on a unidimensional scale.

With a speech quality standard defined, the task of rating a speech test signal or an isopreference contour is now reduced to the determination, by subjective comparison tests, of which of the reference signals $v_a(t)$ is isopreferent to the test signal or to any point of the isopreference contour. The matching SNR is then labeled as the "subjective signal-to-noise ratio" $(S/N)_{subj}$ of the test signal or of the isopreference contour. Note that it is acceptable to regard the rating of a point on an isopreference contour as the rating of the contour itself. This is based on the implicit assumption of transitivity in the isopreference method.
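The reference family of (3.1) is simple to generate digitally, which is one of the advantages claimed for it. The following is a minimal sketch only, assuming the exponent in (3.1) is -1/2 (so that the reference has the same power as the speech) and that the ±1 sequence is drawn with equal probabilities. The function name `reference_signal` and its arguments are hypothetical and do not come from the thesis, whose simulation ran on an IBM System/360.

```python
import math
import random


def reference_signal(samples, snr_db, rng=None):
    """Return v_a(t_k) = (1 + a^2)^(-1/2) * [s(t_k) + a * eps_k * s(t_k)].

    samples : sequence of speech samples s(t_k)
    snr_db  : desired SNR in dB; since SNR = a^-2, a = 10^(-snr_db / 20)
    rng     : optional random.Random instance (hypothetical convenience argument)
    """
    rng = rng or random.Random()
    a = 10.0 ** (-snr_db / 20.0)
    gain = 1.0 / math.sqrt(1.0 + a * a)  # keeps the average power equal to that of s
    out = []
    for s in samples:
        # eps_k is a zero-mean +/-1 sequence, independent of the signal.
        eps = 1.0 if rng.random() < 0.5 else -1.0
        out.append(gain * (s + a * eps * s))
    return out
```

Because the noise sample is the speech sample with a random sign, the noise power tracks the signal power sample by sample, which is why the SNR in (3.2) does not depend on any time average.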
The transitivity property requires that if points A and B are equally preferred and point B is also isopreferent to point C, then A and C should be equally preferred. The three points may be on the same contour on one plane, or on contours on different planes.

3.4 Preparation of Speech Material

The speech material adopted for all the subjective tests in this study is the much used test sentence, "Joe took father's shoe bench out; she was waiting at my lawn." This sentence, which contains most of the phonemes found in the English language [50], was spoken by a 31-year-old male university professor with a Western Canadian accent. The sentence was recorded on low-noise Ampex 434 audio tape with a single-track Scully 280 tape recorder at 15 ips using an AKG D-200E low-impedance cardioid microphone. The recording was performed in an Industrial Acoustics Company Model 1205-A quiet room. Additional speech material was not employed because of the prohibitive increase in data processing and testing effort.

Statistics on the test sentence were obtained with the aid of the digital simulation facilities described in Section 2.3 [39]. The effective bandwidth of the digitized speech was limited to 6 kHz. This was accomplished in the analog-to-digital conversion by reducing the speed of the Scully tape recorder to 7.5 ips, low-pass filtering the speech at 3 kHz, and sampling at 6 kHz. Shown in Figure 3.3(a) is the amplitude probability density of the speech samples normalized with respect to their RMS value. Also shown are the Laplacian distribution [51] and the gamma distribution [52], both of which have been proposed as models of the amplitude probability density of speech. In Figure 3.3(b), the relative power spectrum of the digitized speech is presented. Also shown for comparison is the relative power spectrum reported by Benson and Hirsh [53]. Their speech material was the test sentence, "Joe...
lawn," and 90-second samples of news and technical materials. It was found that, based on the voices of five male speakers, the spectra of the three types of materials were not significantly different, and could be represented by the spectrum reproduced in Figure 3.3(b). From Figure 3.3 it can be gathered that although the test sentence used in this study is only about 5 seconds in duration, the sentence is reasonably representative of conversational speech.

Fig. 3.3 (a) Normalized amplitude probability density of speech. Symmetrical average of positive and negative data. (b) Power density spectrum of speech.

To prepare the speech samples required for the listening tests, the following steps were undertaken.

(1) The analog-recorded test sentence was bandlimited to an effective bandwidth of 4 kHz, sampled at an effective rate of 8 kHz, and digitally recorded on magnetic tape.

(2) The digitized speech was processed by a PCM or DPCM system suitably simulated in an IBM 360/Model 67 computer.

(3) The processed speech was converted to analog form and recorded on analog audio tapes. Only the speech samples that needed to be compared were converted. The choice of these samples was guided by the results of a pilot test described in Section 3.5.

(4) The analog tapes for the listening tests were produced by manually splicing the tapes obtained in step (3) into the desired format.

In step (3), loudness was controlled by monitoring the record amplifier output of the Scully tape recorder, and adjusting the record level so that the recorded speech samples sounded equally loud. It was observed earlier in the pilot test that close agreement existed between the listeners' and the experimenter's judgment of loudness. In step (4), every two samples, say A and B, that had to be compared were paired twice, once in the direct order AB and once in the reverse order BA. The test material was a randomized sequence of all the pairs.
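The pairing rule of step (4) can be sketched as follows. This is only an illustration of the counterbalanced, randomized ordering; in the study itself the tapes were spliced by hand, and the function name `presentation_order` and the sample labels are hypothetical.

```python
import random


def presentation_order(comparisons, rng=None):
    """Build a randomized test sequence from a list of (A, B) comparisons.

    Each comparison appears twice, once as AB and once as BA, so that any
    bias towards the first or second sample of a pair tends to cancel.
    """
    rng = rng or random.Random()
    pairs = []
    for a, b in comparisons:
        pairs.append((a, b))  # direct order AB
        pairs.append((b, a))  # reverse order BA
    rng.shuffle(pairs)        # randomized sequence of all the pairs
    return pairs


# Example: one reference compared against three test samples gives 6 pairs.
sequence = presentation_order([("ref", "t1"), ("ref", "t2"), ("ref", "t3")])
```

With this scheme a session of 100 pairs, for instance, corresponds to 50 distinct comparisons.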
The generation of the speech quality standard signals $v_a(t)$ followed almost the same procedure as outlined above. The difference was that in step (2), the digitized test sentence, "Joe... lawn," was processed according to (3.1) to produce $v_a(t)$. The parameter a was varied so that the SNR increased from -9 to 42 dB in steps of 3 dB. The random process $\epsilon(t_k) = \pm 1$ was generated by a fast pseudo-random number generation algorithm [40].

3.5 Further Details of Subjective Tests

3.5.1 Pilot Test

A pilot test was conducted for two reasons. The first was to find out which speech samples would serve as "reasonable" references in the final pair-comparison tests. The second was to determine the proper range of each set of samples that would have to be compared with one of the references chosen. The references and ranges used in the pilot test itself were products of intuition.

The pilot test consisted of two sessions spaced three days apart. In the first session, the outputs of the PCM system with natural coding were presented, while samples processed by the DPCM system with natural coding were given in the second session. Both sessions were held in a quiet room. The tapes were played back on the Scully 280 tape recorder. Sharpe HA-10-MK II stereo headphones, each with an external volume control, were used. At the start of each session, the listeners were allowed to adjust their volume controls. However, in the course of the listening session, no change of the volume control was permitted. Prior to the listening session, the listeners were asked to read the following instructions:

"In this listening test you will hear pairs of speech signals. Each pair is separated by a 5-second silent interval. After listening to a pair, indicate in the appropriate column which speech signal of the pair you would prefer to hear. If both speech signals sound equally good, make an arbitrary choice.
The first speech signal of each pair is designated as "A", and the second, as "B". In making your preference, ignore the "clicks" immediately before and after each speech signal and avoid placing too much emphasis on loudness.

The speech material used throughout the test is the sentence, 'Joe took father's shoe bench out; she was waiting at my lawn.'"

Each speech sample was slightly more than 5 seconds in duration, and a one-second interval separated the two samples in a pair. A total of 85 pairs was presented in each session, which lasted 30 minutes. A five-minute rest period was introduced after the fortieth comparison.

The listeners in the pilot test were four male graduate student volunteers whose age range was 23 to 27 years. The mean age was about 24 years. None had previous experience in listening tests, and none exhibited hearing abnormalities.

3.5.2 Test for Determining Isopreference Contours

Data needed for plotting isopreference contours were obtained in a test designed with the aid of the pilot test's results. The test consisted of four sessions, each of which was devoted to one of the four types of systems being considered: PCM with natural and folded binary coding, and DPCM with natural and folded binary coding. The first three sessions were held on alternate working days of the same week, while the fourth session was conducted on the Monday of the subsequent week. In each session, 100 pairs of speech samples were presented, and to minimize fatigue effects, a 5-minute rest period was given after every 25th pair. The total time of each session was about 45 minutes.

The listeners were 12 male and 6 female university students. The group's age range was 18 years to 27 years with a mean of about 22 years. Except for three male listeners who participated in the pilot test, the subjects had no previous experience with listening tests. All took part in each session, and all showed normal auditory perception.
The equipment and procedural instructions were the same as those employed in the pilot test (see Section 3.5.1). In Section 4.1, the derivation of isopreference contours from the raw scores obtained in the test is described.

3.5.3 Test for Rating Isopreference Contours

In Section 3.3 it is stated that transitivity is implicitly assumed in the isopreference method. To rate an isopreference contour, it should therefore be sufficient to rate a point on the contour. In this study, the reference point of each contour was rated. The rating was obtained by applying the isopreference method to the reference of each contour and to the one-parameter family of speech quality standard reference signals described in Section 3.3.

The rating test consisted of two sessions conducted two weeks apart. In the first session, the listeners were asked to compare 80 pairs of speech samples in order to rate the reference point of each contour. In the second session, a few selected points not on the experimental isopreference contours were also rated. While the first session lasted 35 minutes, the duration of the second session, with 120 comparisons, was 50 minutes. The listeners, test equipment, and procedures were identical to those of the test described in Section 3.5.2.

The motivation for the second session was to check the consistency of the listeners' rating judgment. The degree of consistency was also taken as an indicator of transitivity. While rating consistency is not as stringent a requirement as transitivity, rating consistency is deemed reasonably sufficient, especially in view of the impracticality of generating speech samples with a non-integral number of bits.

In the second session there was an attempt to rate samples of 6-bit and 7-bit PCM with no channel errors. However, the rating scores tended to become random at $(S/N)_{subj}$ = 33 dB to 36 dB.
This may be accounted for by the inability of the isopreference method to evaluate speech samples of very good or very poor quality. For in the extreme cases, randomness creeps in to obliterate any possible unequivocal preference in our auditory perception. Fortunately, for the parameter values considered in this work, the saturation effect in the listeners' rating ability does not come into play.

IV. RESULTS OF SUBJECTIVE TESTS

4.1 Determination of Experimental Isopreference Contours

The results of an isopreference test are shown either as isopreference contours through the references selected for the test, or as ratings of the test signals being scaled. In either case, isopreferent speech signals must be found. From the definition of isopreferent signals given in Section 3.2, the determination of such signals implies that the value of the varying parameter giving a 50% preference must be known. The technique used in this study to determine an isopreferent signal is described in Section 4.1.2. The key assumption underlying this technique is first discussed in Section 4.1.1. In Section 4.1.3, the details of the plotting and the ratings of the isopreference contours are presented.

4.1.1 The Normality Assumption

The scores obtained from the comparison between a reference and some chosen set of signals with one varying parameter are ordinarily first converted to proportions of all the listeners not preferring the reference¹. The proportions are then plotted on a plane whose ordinate is between 0 and 1, and whose abscissa is in terms of the varying parameter. The resulting smooth curve fitted through the points is known as a psychometric curve. A representative experimental psychometric curve is shown in Figure 4.1 (a). Such a curve bears close resemblance to a normal ogive. This observation agrees with that of other investigators [2], [25], [27].
Thus, it is assumed that all the scores, with respect to one reference and expressed in proportions, constitute a sample from a population with a normal cumulative distribution.

¹ With equal validity, the scores may be converted to proportions of listeners preferring the reference.

Fig. 4.1 (a) A representative experimental psychometric curve. (b) Ordinate: unit normal deviate of the proportion of listeners not preferring the reference.

The assumption of normality may be tested by a goodness-of-fit test. One such test is the Kolmogorov-Smirnov (K-S) test for goodness of fit. The K-S statistic $D_n$ is defined as the least upper bound of the difference between the experimental distribution $F_n(x)$ and the assumed theoretical distribution F(x), that is,

$$D_n = \sup_{\text{all } x} |F_n(x) - F(x)| \tag{4.1}$$

where n is the sample size. It can be shown [54] that $D_n$ is distribution-free. Furthermore, if the mean $\mu$ and the variance $\sigma^2$ are estimated by their maximum likelihood estimates, the K-S test yields a test for normality that is more powerful than the classical chi-square test [55].

With F(x) assumed to be normal, the K-S test was applied to all the sets of scores obtained in both the rating test and the test for determining isopreference contours. Except in a few isolated cases, the hypothesis of normality could not be rejected at the 1% significance level.

4.1.2 Determination of Isopreference Signals by Least-Squares Fit

With the normality assumption justified, the determination of the parameter value giving a 50% preference is equivalent to the problem of estimating, from a sample, the mean of a normal population. A common method is first to convert each proportion $p_i$ of listeners not preferring the reference to a unit normal deviate $y_i$, and then to fit a straight line to the data points by a weighted least-squares method. All $p_i$ with values of 0.00 or 1.00 must be converted to a non-zero or a non-unity number, respectively, before conversion to unit normal deviates. In this study, 0.00 was changed to 0.01 and 1.00 to 0.99.
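The line-fitting procedure of this section can be sketched in a few lines. The sketch below is illustrative rather than the program used in the study. It assumes that the normalized Muller-Urban weight of (4.2) has the form w = exp(-y^2) / [4p(1-p)], which equals 1 at p = 0.5, and it uses the standard closed-form weighted least-squares solution. The function name `fit_psychometric` is hypothetical.

```python
import math
from statistics import NormalDist


def fit_psychometric(x, proportions):
    """Fit y = a + b*x to the unit normal deviates of the proportions.

    Returns (mean, std): the parameter value giving a 50% preference,
    -a/b, and the standard deviation |1/b| of the fitted normal ogive.
    """
    std_normal = NormalDist()
    # Proportions of exactly 0 or 1 are moved to 0.01 and 0.99, as in the text.
    p = [0.01 if q <= 0.0 else 0.99 if q >= 1.0 else q for q in proportions]
    y = [std_normal.inv_cdf(q) for q in p]
    # Assumed normalized Muller-Urban weights (maximum of 1 at p = 0.5).
    w = [math.exp(-yj * yj) / (4.0 * pj * (1.0 - pj)) for yj, pj in zip(y, p)]
    sw = sum(w)
    xm = sum(wj * xj for wj, xj in zip(w, x)) / sw
    ym = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxy = sum(wj * (xj - xm) * (yj - ym) for wj, xj, yj in zip(w, x, y))
    sxx = sum(wj * (xj - xm) ** 2 for wj, xj in zip(w, x))
    b = sxy / sxx
    a = ym - b * xm
    return -a / b, abs(1.0 / b)
```

The weights de-emphasize proportions near 0 and 1, where a small change in the score produces a large change in the normal deviate.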
A typical line fitted by the weighted least-squares method is given in Figure 4.1 (b). The weights used in the line-fitting were the Muller-Urban weights, normalized to give a maximum weight of 1 when the proportion is 0.5. The normalized Muller-Urban weight $w_i$ attached to the unit normal deviate $y_i$ is defined as

$$w_i = \frac{e^{-y_i^2}}{4\,p_i\,(1 - p_i)} \tag{4.2}$$

The reasons for adopting the Muller-Urban weights may be found in [56]. From the best-fit line whose equation is

$$y = a + bx \tag{4.3}$$

the estimated mean $\bar{x}$ and the estimated variance $s_x^2$ are easily derived to be

$$\bar{x} = -\frac{a}{b} \tag{4.4}$$

$$s_x^2 = \left(\frac{1}{b}\right)^2 \tag{4.5}$$

For small samples, the confidence of $\bar{x}$ cannot be indicated by its approximate standard deviation $\sigma_{\bar{x}} \approx s_x/\sqrt{n}$ [57]. Instead, the confidence of $\bar{x}$ as an estimate of $\mu$ is set by the inequality

$$\left| \frac{(\bar{x} - \mu)\sqrt{n-1}}{s_x} \right| < t_\alpha \tag{4.6}$$

where $\alpha$ is the significance level¹. Put another way, the $100(1-\alpha)\%$ confidence interval of $\mu$ is given by the inequality

$$\bar{x} - t_\alpha \frac{s_x}{\sqrt{n-1}} < \mu < \bar{x} + t_\alpha \frac{s_x}{\sqrt{n-1}} \tag{4.7}$$

¹ If t obeys the t-distribution, the probability that $|t| > t_\alpha$ is $\alpha$.

4.1.3 Plotting Isopreference Contours; Results of Rating Test

A set of isopreference contours was obtained for each of the four types of systems being considered in this study. Illustrated in Figures 4.2 and 4.3, respectively, the four sets of contours were plotted on a plane whose ordinate is the logarithm of the channel bit error probability p in decreasing order, and whose abscissa is the number of quantization bits N per sample. The bar associated with each point of equal subjective quality indicates the size of the 95% confidence interval of the point's true value. The confidence interval was calculated according to (4.7). All the isopreference contours, which were based on the best visual fit to the data points, were drawn close to points of small variance and were constrained to have the same general shape as that of the neighboring curves.

Fig. 4.2 PCM isopreference contours. The reference point of each contour is drawn solid. The number associated with each reference point is the subjective rating $(S/N)_{subj}$ of the reference. (a) Natural binary coding. (b) Folded binary coding. Axes: bit error probability p versus number of quantization bits N.

Fig. 4.3 DPCM isopreference contours. Axes: bit error probability p versus number of quantization bits N.

Also given in Figures 4.2 and 4.3 are the estimate and the 95% confidence limits of the $(S/N)_{subj}$ of each reference. Since transitivity is assumed, the rating of each contour is taken to be just the $(S/N)_{subj}$ of its reference. To verify the consistency of the listeners' ratings, the points marked "x" in each of the four figures were also rated. The consistency in ratings is also considered as an approximate measure of transitivity. The results show that the listeners' judgments have good consistency.

4.2 Comparison with the System Signal-to-Distortion Power Ratio

A common objective measure of communication system performance is the system mean-square error $\sigma_D^2$, which can be computed as follows:

$$\sigma_D^2 = \frac{1}{N_s} \sum_{k=1}^{N_s} (M_k - \hat{M}_k)^2 \tag{4.8}$$

In (4.8), $N_s$ is the total number of samples, and $M_k$ and $\hat{M}_k$ are, respectively, the input and output speech samples. (See Figures 2.2 and 2.5.) An equivalent figure of merit is the inband system signal-to-distortion power ratio A defined as

$$A = \frac{\sigma_M^2}{\sigma_D^2} \tag{4.9}$$
where $\sigma_M^2$ is the power of the filtered input speech. In the simulation of the PCM and DPCM systems, A was estimated for each possible pair of values assumed by p and N. Based on these estimates, approximate contours of constant A were obtained by linear interpolation. In Figure 4.4 (a) the constant-A contours of the PCM system with natural coding are shown, while the same contours of the DPCM system with natural coding are seen in Figure 4.4 (b). The contours of systems with folded coding are almost identical to those of systems with natural coding, and so are not presented. Also drawn in Figure 4.4 are the isopreference contours of the corresponding systems.

Fig. 4.4 Contours of constant system signal-to-distortion power ratio. The coding is natural binary. The curves in broken lines are the isopreference contours of the respective system. Also given are the signal-to-distortion power ratios in dB of the points marked "X". (a) PCM system. (b) DPCM system. Abscissa: number of quantization bits N.

Inspection of Figure 4.4 reveals that contours with low A tend to bear close resemblance to isopreference contours with low subjective ratings. This implies that the figure of merit A may indicate with reasonable adequacy the preference of poor-quality speech. However, as the speech quality improves, a marked difference exists between the two types of contours. In particular, the isopreference contours are, for a given quality, more sensitive to the deterioration of the digital channel. On the other hand, both types of contours exhibit the same abrupt saturation effect as N is varied.
This observation suggests that when the channel is relatively error-free, A may be an acceptable measure of preference, but that for significant channel noise, A fails to reflect the ear's sensitivity to the impairment caused by transmission errors.

Some remarks on the relatively low numerical values of A are in order. Consider the curves presented in Figure 4.5. These curves were obtained experimentally by varying the ratio $C = \sigma_M^2 / V^2$ in the simulations of the PCM and DPCM voice systems. For simplicity, transmission errors were ignored with no loss of validity.

Fig. 4.5 Simulation results relating the system signal-to-distortion power ratio to the relative speech signal power. Transmission errors are neglected and natural binary coding is used. (a) PCM system. (b) DPCM system.

An interesting result of Figure 4.5 is that in both types of modulation systems, A exhibits, for each N, a maximum at a certain value $C = C_0$. For N > 2, A decreases rapidly as C becomes greater than $C_0$, but a gentle roll-off in A is seen as C becomes less than $C_0$. In the PCM system, an overload¹ exists for all C ≥ -18 dB. At C = -12 dB, i.e. $V = 4\sigma_M$, the overload is 0.81%. This conforms with the guideline for choosing V set out in Section 2.2.2, and confirms the data of Cramer [35] and Purton [36]. In the DPCM system, the overload is only 0.5 - 0.6% at C = -10 dB, and peak-clipping is absent for all C ≤ -18 dB and N > 2.

In the simulations for preparing the subjective evaluation's test material, C was set at -12 dB for PCM and at about -10 dB for DPCM (see Section 2.2.2). For this reason, and in view of Figure 4.5, the numerical values quoted in Figure 4.4 are consistent. As noted in Section 2.2.2, a 1% overload is virtually undetectable subjectively [36].
The strong immunity of speech to distortion due to peak-clipping has also been verified by other investigators [58]-[59]. While it may be of academic interest to operate at $C = C_0$ for each value of N, such a simulation is too idealized since peak-clipping must be non-existent. Overload is inevitable in real multi-user voice systems. In these practical systems [36], [60]-[61], 1% peak-clipping is a generally acceptable design criterion.

¹ The figures for the percentage of overload apply to the speech data used in the computer simulation.

4.3 Discussion of the Subjective Tests' Results

4.3.1 Saturation Effect in Preference

An examination of Figures 4.2 and 4.3 clearly shows that each of the experimental isopreference contours exhibits two distinct saturation regions. In one region, defined as region I, the bit error probability p is small; in the other, defined as region II, p is large. Both regions owe their existence to the counteracting contributions of the two main sources of distortion in the simulated systems: signal quantization and transmission errors. For a given preference level, quantization error dominates when the channel is relatively free of errors. On the other hand, when p is large, the frequent transmission errors mask out whatever is gained by finer quantization. Thus, to maintain a certain desired speech quality, both p and N must not be less than their respective lower bounds. For example, for the simulated PCM system with natural binary coding, the lower bounds of p and N are, respectively, about 1.1 x 10 [exponent illegible] and 4.5 bits/sample for a quality rated at $(S/N)_{subj}$ = 20 dB.

4.3.2 The Minimum Required Channel Capacity of the BSC

A main dividend of subjective evaluation is the comparison of the evaluated communication systems on the basis of the subjective test results. In this section, the PCM and DPCM systems are compared in terms of the minimum BSC channel capacity required to yield a given quality.
The channel capacity of a BSC per usage is given by [62]

$$C_{BSC} = 1 - H(p, 1-p) \tag{4.10}$$

where H(p, 1-p) is the entropy function defined as

$$H(p, 1-p) = -[\,p \log_2 p + (1-p) \log_2 (1-p)\,] \tag{4.11}$$

Since the number of usages per second is $f_s N$, then $C_{BSC}$ in bits/sec is

$$C_{BSC} = f_s N\,[\,1 - H(p, 1-p)\,] \tag{4.12}$$

Curves of constant channel capacity may be calculated from (4.12), and plotted on a log p-versus-N plane. These curves are found in Figure 4.6. For comparison purposes, the isopreference contours of the PCM system with natural coding are superimposed on the constant capacity curves.

Fig. 4.6 Curves of constant channel capacity of the BSC. Superimposed are the contours of Figure 4.2(a).

Inspection of Figure 4.6 indicates that the minimum required channel capacity of an isopreference contour is that of the constant-capacity curve which asymptotically coincides with the contour. In Figure 4.7 the subjective signal-to-noise ratio of each contour of all four systems considered in this study is plotted against the contour's minimum required channel capacity $C_{BSC,R}$. The vertical bar at each point represents the 95% confidence interval of the subjective rating at that point.

In Figure 4.7, the data points of each system are so located that they suggest a simple regression line may be drawn through them. However, in view of the closeness of all four sets of data points, it is valid to ask if some, or even all, of the four sets of points may be grouped and fitted with a single regression line. To answer this question, Kozak's [63] interactive sequential procedure for detecting parallelism and coincidence among regression lines was programmed and applied to Figure 4.7. It was found that at the 0.05 significance level, the four regression lines, one for each set of data points, may be parallel but not coincident. However, at the same significance level, a single regression line may be fitted to the data points of both types of DPCM systems. This same conclusion applies to the two types of PCM systems.
This finding is the justification for so drawing the two lines shown in Figure 4.7.

Fig. 4.7 Subjective quality ratings of Figures 4.2 - 4.3 plotted against their respective minimum required channel capacity when the digital channel is a BSC.

Subject to the assumptions and restrictions of the computer simulation, and within the statistical significance of the subjective evaluation, two conclusions may be inferred from Figure 4.7. The first is that for both PCM and DPCM, the speech quality is independent of whether natural or folded binary coding is used to encode the quantizer's output. The second conclusion is that, based on $C_{BSC,R}$, DPCM performs slightly better than PCM over the range of parameters considered in this work. The improvement in quality, in terms of $(S/N)_{subj}$ in dB, is almost consistently about 2.0 - 2.5 dB for $C_{BSC,R}$ ranging from 15.0 to 45.0 kbits/sec. An equivalent interpretation of the performance improvement is that, given a subjective quality, DPCM has a $C_{BSC,R}$ less than that of PCM. This view suggests that for a digital channel modelled by a BSC, DPCM is a more efficient source encoding scheme. Other researchers [38], [64] had previously reported that for a noiseless channel, DPCM with previous-sample feedback yields a higher signal-to-quantizing-noise ratio than that of PCM.

Although $C_{BSC,R}$ is acceptable as a yardstick for judging source encoding ability, it provides no information on some important parameters, such as transmitter power and channel bandwidth, which are usually associated with communication system design. Nor does $C_{BSC,R}$ offer any practical insight into the subjective test results.
In fact, since the bit error probability p is generally inversely related to the channel signal-to-noise ratio (SNR) $\gamma$ (see footnote 1), it is often undesirable to operate near the $C_{BSC,R}$ of the desired subjective quality. For to do so requires large $\gamma$, and the BSC is, for large $\gamma$, a very poor scheme to exploit the capacity of the physical channel on which the BSC is based.

Footnote 1: The channel SNR $\gamma$ is defined as the ratio of the physical channel's input signal power S to the mean power of some additive noise of the channel.

The foremost implication of the previous remarks is that a further qualification of the BSC is mandatory if more useful information is to be reaped from the results of the subjective tests. Such a qualification must jointly specify the mathematical model of the physical channel, the modulation scheme, the demodulator and the decoding algorithm. In each of the next two sections, the BSC is first defined in terms of a physical channel model; then some resulting fruitful interpretations are presented.

4.3.3 The BSC Based on the Additive White Gaussian Channel

A model of the additive white Gaussian (AWG) channel is given in Figure 4.8. The received signal r(t) is defined by

$$r(t) = s(t) + n_w(t) \qquad (4.13)$$

where s(t) is the transmitted signal and $n_w(t)$ is a sample function from a zero-mean white Gaussian process with spectral density $N_0/2$ watts/Hz. In this work, the alphabet size of X is two; each element of the alphabet is equally likely. Furthermore, an assumption on the M-component vectors $[x_{k1}, \ldots, x_{kM}]$ is made in equation (4.14) [equation illegible in the source text]. Thus, if the demodulator's output r is symmetrically quantized into two levels, a BSC results.

Fig. 4.8 Communication over an additive white Gaussian channel.

If the modulation is by binary
antipodal signalling and the receiver is a maximum likelihood detector, then the bit error probability p of the BSC is [37]

$$p = Q\left(\sqrt{\frac{2E_s}{N_0}}\right) \qquad (4.15)$$

where

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\alpha^2/2}\, d\alpha \qquad (4.16)$$

and $E_s$ is the energy of the signal s(t). If the signal s(t) is further constrained to be power-limited and band-limited so that its power S and bandwidth W satisfy

$$E_s = \int_0^T s^2(t)\, dt = ST \qquad (4.17)$$

and

$$W = \frac{1}{2T} \qquad (4.18)$$

then the channel capacity of the AWG channel is [62]

$$C_{AWG} = W \log_2\left(1 + \frac{\gamma}{W}\right) \qquad (4.19)$$

where $\gamma$ = channel SNR = $S/N_0$. In (4.17) and (4.18), T stands for the finite time duration such that s(t) = 0 for t < 0 and t > T. Since the function Q(x) is a monotonically decreasing function of x and since $T = 1/(f_s N)$, then applying (4.15), (4.17) and (4.18), (4.19) may be written as

$$C_{AWG} = \frac{f_s N}{2} \log_2\left[1 + \left(Q^{-1}(p)\right)^2\right] \qquad (4.20)$$

where

$$y = Q^{-1}(p) \quad \text{iff} \quad p = Q(y) \qquad (4.21)$$

Based on (4.20), curves of constant capacity for an AWG channel are plotted on a log p-versus-N plane. The curves are presented in Figure 4.9. For comparison purposes, the isopreference contours of the PCM system with natural coding are superimposed over the constant capacity curves.

Fig. 4.9 Curves of constant capacity of the additive white Gaussian channel. Superimposed are the contours of Figure 4.2(a).

It is obvious from Figure 4.9 that a minimum required channel capacity exists for each isopreference contour. Or equivalently, a maximum subjective quality exists for a given channel capacity. In Figure 4.10 the subjective ratings of all four systems being studied are plotted against their respective minimum required channel capacity. The vertical bar at each point represents the 95% confidence interval of the subjective rating at that point.
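The step from (4.15)-(4.19) to (4.20) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the 8 kHz default sampling rate is an assumption not stated in this section, and the channel SNR is taken as the ratio of signal power to noise spectral density, which is the reading under which (4.20) follows from (4.15), (4.17) and (4.18). The inverse of Q(x) is obtained from the standard library's normal distribution.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) of (4.16)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inverse(p: float) -> float:
    """Inverse of Q as in (4.21); requires 0 < p < 1."""
    return -_STD_NORMAL.inv_cdf(p)


def awg_capacity(p: float, n_bits: int, fs_hz: float = 8000.0) -> float:
    """AWG channel capacity in bits/sec, following (4.20)."""
    return 0.5 * fs_hz * n_bits * math.log2(1.0 + q_inverse(p) ** 2)


def channel_snr_db(p: float, n_bits: int, fs_hz: float = 8000.0) -> float:
    """Channel SNR in dB needed for bit error probability p (p < 0.5).

    With W = f_s*N/2, equations (4.15), (4.17) and (4.18) give
    (Q^-1(p))^2 = 2*E_s/N_0 = gamma/W, hence gamma = W*(Q^-1(p))^2.
    """
    bandwidth_hz = 0.5 * fs_hz * n_bits
    return 10.0 * math.log10(bandwidth_hz * q_inverse(p) ** 2)


if __name__ == "__main__":
    for p in (1e-2, 1e-3, 1e-4):
        print(f"p = {p:g}: C_AWG = {awg_capacity(p, 5) / 1000.0:.1f} kbits/sec, "
              f"SNR = {channel_snr_db(p, 5):.1f} dB")
```

Because the required SNR grows as p decreases while the capacity in (4.20) grows only logarithmically, the sketch also illustrates why operating at very small p exploits the physical channel poorly.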
Kozak's [63] test for coincidence was applied to the data points and it was found that a single regression line may represent the DPCM systems while another line may be fitted to the data points of the PCM systems.

Fig. 4.10 Subjective ratings of Figures 4.2 - 4.3 plotted against their respective minimum required channel capacity when the physical channel is AWG.

Two conclusions may be immediately drawn from Figure 4.10. The first is that speech quality is neither refined nor degraded when folded binary coding is used instead of natural binary coding. The second conclusion is that based on the minimum required channel capacity $C_{AWG,R}$ of the AWG channel, DPCM performs better than PCM. The improvement in quality, in terms of $(S/N)_{subj}$ in dB, ranges from about 1 dB for $C_{AWG,R}$ = 22.0 kbits/sec to about 3 dB at $C_{AWG,R}$ = 70.0 kbits/sec. The improvement is thus a function of $C_{AWG,R}$ and increases with $C_{AWG,R}$. Another view of the performance improvement of DPCM over PCM is in terms of the saving in the capacity requirement. For example, for $(S/N)_{subj}$ = 20 dB, a saving of about 8 kbits/sec is possible. Again the saving in capacity requirement increases as the quality increases.

When an AWG channel with a given capacity is available, there is a maximum achievable speech quality. It is of interest to inquire what happens if the quality actually realized is less than this maximum. Consider Figure 4.11, in which curve A is the curve of constant capacity, curve B is the isopreference contour with the maximum achievable quality, and curve C is the contour with the quality actually realized. It is obvious that points P and O are the operating points at which the latter quality is realized for the given channel capacity.
Now if the sampling rate $f_s$ and the dimensionality D of the signal space are both fixed, then for signal time duration $T = 1/(f_s N)$, the channel bandwidth $W_c$ is directly proportional to N. It follows that $W_c$ is greater at O than at P. On the other hand, it is straightforward to verify that the channel SNR $\gamma$ is smaller at O than at P. Thus, when the suboptimal quality is acceptable, a trade-off between channel bandwidth and transmitter power exists. However, with respect to the communication efficiency defined as the ratio $\eta = C_{BSC}/C_{AWG}$, point O is more efficient than point P. This can be checked by referring to Figure 4.10 and by noting that $C_{AWG}$ is the same for both operating points. This means that at suboptimal speech quality, the AWG channel is better exploited with lower transmitter power and larger channel bandwidth.

Fig. 4.11 Operating points for suboptimal speech quality in an AWG channel.

Fig. 4.12 Subjective ratings versus the channel SNR in a PCM and a DPCM system for N=5. The physical channel is an AWG channel.

Another suboptimal operation worth investigating is that in which the number of quantization bits N is set at some value. For convenience, N is taken as 5 in the following discussion. In Figure 4.12 the subjective SNR $(S/N)_{subj}$ is graphically related to the required channel SNR $\gamma$ for both the PCM and DPCM systems using natural binary coding. The $(S/N)_{subj}$ at saturation is obtained from Figure 4.7 with $C_{BSC,R}$ = 40 kbits/sec. A striking feature of Figure 4.12 is that the subjective SNR rises steeply and then levels off rather abruptly. The quality of both systems starts to saturate at about $\gamma$ = 55 dB while the difference between the subjective ratings at saturation is about 2.4 dB. Figure 4.12 also indicates the definite dependence of the improvement of DPCM over PCM on the channel SNR.
It is interesting to note that when the quality of the PCM system saturates at $\gamma$ = 55 dB, the $\gamma$ of the DPCM system providing the same quality is about 52.6 dB. A saving of 2.4 dB in the transmitter power is thus afforded.

4.3.4 The BSC Based on the Rayleigh Fading Channel with AWG Noise

As a model of a digital channel, the BSC assumes nothing about the physical channel, and has limited restrictions on the modulator-demodulator pair and the decoding rule. Such flexibility of the BSC is demonstrated in this section by deducing the BSC from another important physical channel model - the Rayleigh fading channel (RFC) with AWG noise. A block diagram model of a communication system operating over an RFC is shown in Figure 4.13. The received signal r(t) is

$$r(t) = r^{\circ}(t) + n_w(t) \qquad (4.22)$$

where $n_w(t)$ is a sample function from a zero-mean white Gaussian process with spectral density $N_0/2$ watts/Hz, and $r^{\circ}(t)$ is the output of the RFC.

Fig. 4.13 Communication over a Rayleigh fading channel with AWG noise.

The channel characteristics of an RFC are such that if the input signal is $\sqrt{2} \cos \omega_0 t$, the output waveform $r^{\circ}(t)$ is

$$r^{\circ}(t) = \sqrt{2}\, a \cos(\omega_0 t + \theta) \qquad (4.23)$$

where a and $\theta$ are statistically independent random variables with a joint probability density given by

$$p_{a,\theta}(\alpha, \phi) = \begin{cases} \dfrac{\alpha}{\pi b}\, e^{-\alpha^2/b}, & \alpha \ge 0,\ 0 \le \phi < 2\pi \\ 0, & \text{elsewhere} \end{cases} \qquad (4.24)$$

Thus, a is Rayleigh-distributed while $\theta$ has a uniform density. A detailed derivation of (4.23) and (4.24) is found in [37]. For an exhaustive treatment of fading dispersive channels, of which the RFC is the simplest, see [65].
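To make the fading model of (4.23)-(4.24) concrete, the amplitude and phase can be drawn by inverse-transform sampling. This is a hedged sketch rather than anything taken from the thesis simulation: `sample_fading` is a hypothetical helper, and the parameter b is read as the mean-square value of the amplitude a, which is what the density in (4.24) implies.

```python
import math
import random


def sample_fading(b: float, rng: random.Random) -> tuple[float, float]:
    """Draw one (a, theta) pair from the joint density (4.24).

    Integrating (4.24) over the phase gives the Rayleigh density
    (2*alpha/b)*exp(-alpha**2/b), whose distribution function is
    1 - exp(-alpha**2/b); inverting it yields the amplitude below.
    """
    u = 1.0 - rng.random()                 # uniform on (0, 1], avoids log(0)
    amplitude = math.sqrt(-b * math.log(u))
    phase = 2.0 * math.pi * rng.random()   # uniform on [0, 2*pi)
    return amplitude, phase


if __name__ == "__main__":
    rng = random.Random(1971)              # fixed seed for repeatability
    draws = [sample_fading(2.0, rng) for _ in range(50_000)]
    mean_square = sum(a * a for a, _ in draws) / len(draws)
    print(f"sample mean of a^2: {mean_square:.3f} (b = 2.0)")
```

The sample mean of a squared should settle near b, which is a quick sanity check on the normalization of (4.24).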
To complete the specification of the BSC, it is further assumed that the input and output alphabets are binary, that the modulation is by binary frequency-shift keying (FSK) orthogonal over the interval [0,T] and that the optimum incoherent receiver is realized according to Figure 4.14. The probability of error per usage can then be evaluated [37] as

$$p = \frac{1}{2 + \bar{E}/N_0} \qquad (4.25)$$

where $\bar{E}$ = average received energy. If the error in each usage of the channel is statistically independent of the errors of other usages, then a BSC model results with the bit error probability equal to p.

Fig. 4.14 Optimum incoherent receiver for binary orthogonal signals. (Each of the two signals is passed through a bandpass filter matched to it, followed by a square-law envelope detector; the detector outputs feed a maximum a posteriori probability detector.)

The channel capacity $C_{RFC}$ of the system of Figure 4.13 may be proved [62] to be

$$C_{RFC} = \frac{\bar{E}}{N_0 T} \log_2 e \qquad (4.26)$$

or, in nats/sec,

$$C_{RFC} \ln 2 = \gamma \qquad (4.27)$$

where $\gamma$ = average received signal-to-noise ratio. With $T = 1/(f_s N)$, (4.25) is rewritten as

$$p = \frac{1}{2 + \dfrac{\gamma}{f_s N}} \qquad (4.28)$$

Based on (4.27)-(4.28), curves of constant capacity are again calculated and plotted on a log p-versus-N plane. The curves are presented in Figure 4.15. The isopreference contours of the PCM system with natural binary coding are also shown for comparison. As in the case of the AWG channel, there exists a minimum channel capacity $C_{RFC,R}$ required by each contour. The $C_{RFC,R}$ of each contour of all four systems was thus estimated. In Figure 4.16, the subjective rating of each contour of each system is plotted against the logarithm of the contour's $C_{RFC,R}$.
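Equations (4.25)-(4.28) tie the bit error probability to the average received SNR and hence to the capacity of the fading channel. The sketch below evaluates them under stated assumptions: the function names are illustrative, the 8 kHz default sampling rate is assumed rather than quoted from this section, and the average received SNR is taken as the average received energy divided by the product of noise density and signalling interval, as (4.26)-(4.27) imply.

```python
import math


def rfc_error_probability(gamma: float, n_bits: int, fs_hz: float = 8000.0) -> float:
    """Bit error probability of (4.28) for average received SNR gamma."""
    return 1.0 / (2.0 + gamma / (fs_hz * n_bits))


def rfc_required_snr(p: float, n_bits: int, fs_hz: float = 8000.0) -> float:
    """Average received SNR needed for error probability p, by inverting (4.28)."""
    return fs_hz * n_bits * (1.0 / p - 2.0)


def rfc_capacity_bits(gamma: float) -> float:
    """Capacity in bits/sec implied by (4.26)-(4.27): gamma * log2(e)."""
    return gamma / math.log(2.0)


if __name__ == "__main__":
    for p in (1e-1, 1e-2, 1e-3):
        gamma = rfc_required_snr(p, 5)
        print(f"p = {p:g}: gamma = {10.0 * math.log10(gamma):.1f} dB, "
              f"C_RFC = {rfc_capacity_bits(gamma) / 1e6:.2f} Mbits/sec")
```

Since p falls only inversely with the SNR in (4.28), each tenfold reduction in p costs roughly a tenfold increase in capacity, in contrast with the much steeper dependence of Q(x) in the AWG case.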
After the application of Kozak's [63] test for parallelism and coincidence among regression lines to the data points of Figure 4.16, it was found that at the 0.05 significance level, a single line may be fitted to the DPCM data while another line may be fitted to the data of the PCM systems.

Fig. 4.15 Curves of constant capacity of a Rayleigh fading channel with AWG noise. Superimposed are the contours of Figure 4.2(a).

Fig. 4.16 Subjective ratings of Figures 4.2 - 4.3 versus their respective minimum required channel capacity when the physical channel is Rayleigh fading with AWG noise.

The two key conclusions that may be deduced from Figure 4.16 are similar to and consistent with those based on Figure 4.10. The first conclusion is that there is no effect on the speech quality if the folded code is used instead of the natural code to encode the quantizer's output. The second is that DPCM performs better than PCM in the sense that for a given physical channel capacity, the maximum achievable quality is higher for DPCM. Equivalently, DPCM requires less channel capacity for a desired subjective quality. As in the case of the AWG channel, the performance improvement of DPCM over PCM is a function of $C_{RFC,R}$ (or of the quality) and increases with increasing $C_{RFC,R}$ (or increasing quality).

However, a significant difference exists between the performance in an AWG channel and that in a Rayleigh fading channel. In the former, the subjective quality improves linearly with $C_{AWG,R}$, but in the latter, the speech quality increases linearly with the logarithm of $C_{RFC,R}$.
Since $C_{RFC,R} = \gamma$ is proportional to the transmitter power S, it is thus necessary to increase S exponentially in order to achieve a linear increase in subjective quality expressed as $(S/N)_{subj}$ in dB.

When the quality realized is less than the maximum achievable in a given RFC, the situation is quite different from its counterpart when the AWG channel is valid. It is true, as indicated in Figure 4.17, that two operating points P' and O' exist. But since $C_{RFC}$ equals the average received SNR, which is proportional to the transmitter power S, S is the same at both P' and O'. It follows that no trade-off between S and N exists. However, there is a trade-off between p and the communication efficiency $\eta$, as $\eta$ is lower at P' than at O'. This means that at a fixed quality level, the bit error probability may be decreased by increasing the time interval T of the signal at the expense of poorer communication efficiency.

Fig. 4.17 Operating points for suboptimal speech quality in a Rayleigh fading channel with AWG noise.

Fig. 4.18 Subjective ratings versus the average received SNR in a PCM and a DPCM system for N=5. The physical channel is a Rayleigh fading channel with AWG noise.

In Figure 4.18, the subjective rating is related to the average received SNR $\gamma$ for N = 5. Only PCM and DPCM systems with natural coding are considered. It can be seen that the rating is linearly related to $\gamma$ for relatively low $\gamma$ and then saturates at high $\gamma$. This observation is consistent with a similar result in the case of the AWG channel. There are, however, two notable differences between such a suboptimal operation in an RFC and the same operation in an AWG channel. The first is that to achieve the same quality, more transmitter power is required in the RFC. Secondly, the slope in the linear region is steeper in Figure 4.18 than it is in Figure 4.12.
For example, for the PCM system, the slope is about 1.1 in Figure 4.18 while that of Figure 4.12 equals 0.28. This suggests that for a given quality increment in the linear region, the corresponding increment in the average received power in the RFC is the fourth power of the power increment required in an AWG channel. Finally, it can be pointed out that the performance improvement of DPCM over PCM is again dependent on the average received SNR.

V. CONCLUSION

In this thesis, the effects of transmission errors on the speech quality of a PCM and a DPCM voice communication system are investigated on the basis of human preference. Models of the two modulation systems are simulated in a digital computer. The simulated systems, in conjunction with a special-purpose I/O interface, are used to process a recorded speech sample representative of English speech. The quality of the processed speech is then evaluated according to the isopreference method.

Four distinct types of systems are considered: PCM and DPCM systems with natural binary coding, and PCM and DPCM systems with folded binary coding. For each type of system, a set of isopreference contours is obtained to display the processed speech's preference variations with the bit error probability p of a BSC and the number of quantization bits N. The contours of all four types of systems indicate that quantization error dominates when the channel is relatively error-free, but that fine quantization offers no gain in quality if the channel error rate is significant. Furthermore, encoding the quantizer's output by either natural or folded binary coding yields virtually identical speech quality.
The results of the subjective evaluation are compared with the estimated system signal-to-distortion power ratios obtained in the simulation of the PCM and DPCM systems. It is found that for systems with poor-quality outputs or with relatively small p, the system signal-to-distortion power ratio may be a reasonably adequate measure of human preference of speech. However, as the channel deteriorates, the signal-to-distortion power ratio fails to reflect the auditory sensitivity to the impairment caused by transmission errors.

Comparison among contours is made possible by rating each contour on the basis of Schroeder's [49] speech quality standard signals. The ratings suggest that for a digital channel modelled by the BSC, DPCM is a more efficient source-encoding scheme than PCM. Furthermore, if the BSC is interpreted as a modulation scheme for a physical channel, each isopreference contour may have a unique minimum physical channel capacity $C_R$ required to achieve the contour's quality. The minimum required channel capacity $C_R$ of each contour is graphically found for two important physical channel models - the additive white Gaussian channel and the Rayleigh fading channel with additive white Gaussian noise. For these two channels, DPCM requires a capacity less than that of PCM to attain the same speech quality. The difference in the capacity requirement is found to increase with increasing quality.

Two suboptimal operations are also discussed. The first is the situation in which the speech quality actually realized is less than the maximum achievable in a given channel. Two operating points are observed.
In the AWG channel, a trade-off between the transmitter power and the channel bandwidth exists between the two operating points, although one point exhibits a possible better exploitation of the physical channel. In the Rayleigh fading channel, no similar trade-off between the transmitter power and the channel bandwidth exists. Instead, bit error probability may be decreased at the price of a less efficient use of the physical channel. In the second suboptimal operation, the subjective quality rating is related to the required channel signal-to-noise ratio $\gamma$ measured at the receiver's input. A prominent feature is that the rating, in dB, increases linearly with the required SNR $\gamma$ at low SNR, but saturates at high SNR. In the Rayleigh fading channel, the increment in $\gamma$ necessary to satisfy a given increment in quality is found to be exponentially related to the corresponding required incremental $\gamma$ in an AWG channel.

In spite of its limitation in judging speech of very good or very poor quality, the isopreference method has been demonstrated to be suitable for the subjective evaluation of speech communication systems with moderate but still significant distortions. If the results of the subjective tests are set within a proper theoretical framework, a host of useful information may be obtained. Indeed, the isopreference method can be envisaged to be applicable to the subjective study of some problems intimately related to what has been presented in this thesis. Among these problems are the subjective evaluations of low-bit quantization with noisy channel, of the efficacy of channel encoding in improving speech quality, and of the comparative performance of DPCM and PCM with channels with memory.

APPENDIX A.
FOLDED BINARY CODE

The folded binary code is a symmetrical code. Figure A.1 shows the four-bit folded binary code. For comparison, the natural binary code is also given. In the folded binary code, the first bit is used to denote the polarity, and the remaining digits to denote the magnitude, i.e., the departure from the quiescent level. The symmetry of the codewords is with respect to the quiescent level.

    Folded Binary Code      Natural Binary Code
           1111                    1111
           1110                    1110
           1101                    1101
           1100                    1100
           1011                    1011
           1010                    1010
           1001                    1001
           1000                    1000
    ------------- Quiescent Level -------------
           0000                    0111
           0001                    0110
           0010                    0101
           0011                    0100
           0100                    0011
           0101                    0010
           0110                    0001
           0111                    0000

Figure A.1 Folded and natural binary code.

The conversion of an N-bit natural binary codeword X to its corresponding N-bit folded binary codeword Y may be accomplished according to the following simple algorithm:

    START
    If the first bit of X is 1, then Y = X;
    otherwise Y = X ⊕ (2^{N-1} - 1)_2.
    STOP

In the flowchart, ⊕ represents mod-2 addition and $(2^{N-1} - 1)_2$ denotes the binary equivalent of the integer $2^{N-1} - 1$. The same algorithm may be used to perform the converse operation. Of course, in this case, X denotes an N-bit folded binary codeword while Y denotes an N-bit natural binary codeword.

REFERENCES

1. R.E. Totty and G.C. Clark, "Reconstruction error in waveform transmission," IEEE Trans. on Information Theory (Correspondence), vol. IT-13, pp. 336-338, April 1967.
2. R.W. Donaldson and D. Chan, "Analysis and subjective evaluation of differential pulse-code modulation voice communication systems," IEEE Trans. on Communication Technology, vol. COM-17, pp. 10-19, February 1969.
3. B.R.N. Murthy and P.A. Wintz, "Analysis of differential systems for PCM transmissions," 1970 IEEE Int'l Conf. on Communications Conf. Record, vol. 1, pp. 8.45-8.54, 1970.
4. N.L. Yates-Fish and E. Fitch, "Signal/noise ratio in pulse code modulation," Proc. IEE, vol. 102, part B, pp. 204-210, March 1955.
5. A.J.
Viterbi, "Lower bounds on maximum signal-to-noise ratios for digital communication over the Gaussian channel," IEEE Trans. on Communication Systems, vol. CS-12, pp. 10-17, March 1964.
6. J.K. Wolf, "Effects of channel errors on delta modulation," IEEE Trans. on Communication Technology, vol. COM-14, pp. 2-7, February 1966.
7. I. Dostis, "The effects of digital errors on PCM transmission of compandored speech," Bell Sys. Tech. J., vol. 44, pp. 2227-2243, December 1965.
8. A.R. Billings, "The rate of transmission of information in pulse code modulation systems," Proc. IEE, vol. 105, part C, no. 7, pp. 444-447, March 1958.
9. R.W. Donaldson, "Optimization of PCM systems which use natural binary codes," Proc. IEEE (Correspondence), vol. 56, pp. 1252-1253, July 1966.
10. P. Palffy-Muhoray, "Effect of channel transmission errors on DPCM systems," M.A.Sc. Thesis, Department of Electrical Engineering, University of British Columbia, Vancouver 8, B.C., Canada, March 1969.
11. P.A. Wintz and A.J. Kurtenbach, "Waveform error control in PCM telemetry," IEEE Trans. on Information Theory, vol. IT-14, pp. 650-661, September 1968.
12. D. Chan and R.W. Donaldson, "Optimum pre- and postfiltering of sampled signals, with application to pulse modulation and data compression systems," IEEE Trans. on Communication Technology, vol. COM-19, pp. 141-157, April 1971.
13. A.J. Kurtenbach and P.A. Wintz, "Quantizing for noisy channels," IEEE Trans. on Communication Technology, vol. COM-17, pp. 291-302, April 1969.
14. M.R. Aaron, "PCM transmission in the exchange plant," Bell Sys. Tech. J., vol. 41, pp. 99-141, January 1962.
15. G.G. Apple and P.A. Wintz, "BCH code performance for sampled data," 1970 Int'l Conf. on Communications Conf. Record, vol. 2, pp. 28.1-28.8, 1970.
16. A.J. Bernstein, K. Steiglitz, and J.E. Hopcroft, "Encoding of analog signals for binary symmetric channels," IEEE Trans. on Information Theory, vol. IT-12, pp. 425-430, October 1966.
17.
J. Bodo, "Coding for least RMS error in binary PCM channels," 1962 WESCON Convention Record, TR 62-3, May 1962.
18. M.M. Buchner, Jr., "A system approach to quantization and transmission error," Bell Sys. Tech. J., vol. 48, pp. 1219-1247, May-June 1969.
19. G.C. Clark and R.E. Totty, "PCM transmission with minimum mean-square error," Proc. of Int'l Telemetering Conf., vol. II, pp. 468-479, 1966.
20. T.S. Huang, "Optimum binary code," M.I.T. Electronics Research Lab., Cambridge, Mass., Quarterly Progress Report No. 82, pp. 223-225, July 1966.
21. Ye. V. Voronov, "Coding method for transmission of binary numbers over a noisy channel," Radio Eng'g and Electronics Physics, pp. 1273-1279, May 1963.
22. I.T. Young and J.C. Mott-Smith, "On weighted PCM," IEEE Trans. on Information Theory (Correspondence), vol. IT-11, pp. 596-597, October 1965.
23. R.C. Brainard, "Subjective evaluation of PCM noise-feedback coder for TV," Proc. IEEE, vol. 55, pp. 346-353, March 1967.
24. T.S. Huang and M.T. Chikhaoui, "The effect of BSC on PCM picture quality," IEEE Trans. on Information Theory, vol. IT-13, pp. 270-273, April 1967.
25. R.W. Donaldson and R.J. Douville, "Analysis, subjective evaluation, optimization, and comparison of the performance capabilities of PCM, DPCM, ΔM, AM and PM voice communication systems," IEEE Trans. on Communication Technology, vol. COM-17, pp. 421-431, August 1969.
26. K. Hashimoto and S. Saito, "Speech quality of PCM transmission system," Electronics and Communications in Japan, vol. 52-A, no. 2, pp. 20-26, 1969.
27. D. Chan and R.W. Donaldson, "Subjective evaluation of pre- and postfiltering in AM, PAM, PCM, DPCM and PM voice communication systems," submitted to the IEEE Trans. on Communication Technology for publication.
28. E.E. David, Jr., M.V. Mathews, and H.S. McDonald, "Description and results of experiments with speech using digital computer simulation," Proc. Nat'l Electronics Conf., Chicago, Illinois, pp.
766-775, October 1958.
29. M.V. Mathews, "Extremal coding for speech transmission," IRE Trans. on Information Theory, vol. IT-5, pp. 129-136, September 1959.
30. B.S. Atal and M.R. Schroeder, "Predictive coding of speech signals," IEEE Wescon Convention Record, Part 3, Paper No. 8/2, August 1968.
31. R.A. Gibby, "An evaluation of AM data system performance by computer simulation," Bell Sys. Tech. J., vol. 39, pp. 675-704, May 1960.
32. R.K. Kwan and W.F. McGee, "Digital computer simulation of a frequency-shift keying system," IEEE Trans. on Communication Technology, vol. COM-16, pp. 683-690, October 1968.
33. R.M. Golden, "Digital computer simulation of sampled-data communication systems using the block diagram compiler: BLODIB," Bell Sys. Tech. J., vol. 45, pp. 345-358, March 1966.
34. B. Smith, "Instantaneous companding of quantized signals," Bell Sys. Tech. J., vol. 36, pp. 653-709, May 1957.
35. B.G. Cramer, "Optimum linear filtering of analog signals in noisy channels," IEEE Trans. Audio and Electroacoustics, vol. AU-14, pp. 3-15, March 1966.
36. R.F. Purton, "A survey of telephone speech signal statistics and their significance in the choice of a PCM companding law," Proc. IEE, vol. 109B, pp. 60-66, January 1962.
37. J.M. Wozencraft and I.M. Jacobs, Principles of Communication Engineering, New York: Wiley, 1965.
38. R.A. McDonald, "Signal-to-noise and idle channel performance of differential pulse code modulation systems - particular applications to voice signals," Bell Sys. Tech. J., vol. 45, pp. 1123-1151, September 1966.
39. D. Chan, "Optimal pre- and postfiltering of noisy sampled signals - particular applications to PAM, PCM and DPCM communication systems," Ph.D. Thesis, Department of Electrical Engineering, University of British Columbia, Vancouver 8, B.C., Canada, August 1970.
40. D.S. Seraphin, "A fast random number generator for IBM 360," Commun. Assoc. Comput. Mach., vol. 12, p. 695, December 1969.
41. J.
Swaffield and D.L. Richards, "Rating of speech links and performance of telephone networks," Proc. IEE, vol. 106B, pp. 65-76, March 1959.
42. W.A. Munson and J.E. Karlin, "Isopreference method for evaluating speech-transmission circuits," J. Acoust. Soc. Am., vol. 34, pp. 762-774, June 1962.
43. M.M.L. Hecker and N. Guttman, "A survey of methods for measuring speech quality," J. Audio Eng. Soc., vol. 15, pp. 400-403, 1967.
44. "IEEE Recommended Practice for Speech Quality Measurements," IEEE Trans. Audio and Electroacoustics, vol. AU-17, pp. 227-246, September 1969.
45. E.H. Rothauser, G.E. Urbanek, and W.P. Pachl, "Isopreference method for speech evaluation," J. Acoust. Soc. Am., vol. 44, pp. 408-418, February 1968.
46. W.H. Tedford, Jr., and T.V. Frazier, "Further study of the isopreference method of circuit evaluation," J. Acoust. Soc. Am., vol. 39, pp. 645-649, April 1966.
47. J. Mickunas, Jr., "Preference scaling of vocoder speech," J. Acoust. Soc. Am., vol. 37, pp. 1199, 1965.
48. M.M.L. Hecker and C.E. Williams, "Choice of reference conditions for speech preference tests," J. Acoust. Soc. Am., vol. 39, pp. 946-952, May 1966.
49. M.R. Schroeder, "Reference signal for signal quality studies," J. Acoust. Soc. Am., vol. 44, pp. 1735-1736, June 1968.
50. H. Fletcher, Speech and Hearing in Communication, New York: D. Van Nostrand, 1953, p. 25.
51. W.B. Davenport, Jr., "An experimental study of speech-wave probability distributions," J. Acoust. Soc. Am., vol. 24, pp. 390-399, July 1952.
52. D.L. Richards, "Statistical properties of speech signals," Proc. IEE, vol. 111, pp. 941-949, May 1964.
53. R.W. Benson and I.J. Hirsh, "Some variables in audio spectrometry," J. Acoust. Soc. Am., vol. 25, pp. 499-505, May 1953.
54. B.W. Lindgren, Statistical Theory, New York: MacMillan Company, 1962, p. 300.
55. M. Kac, J. Kiefer, and J. Wolfowitz, "On tests of normality and other tests of goodness of fit based on distance methods," Ann.
Math. Stat., vol. 26, pp. 189-211, 1955.
56. J.P. Guilford, Psychometric Methods, New York: McGraw-Hill, 1954, p. 129.
57. P.G. Hoel, Introduction to Mathematical Statistics, New York: Wiley, 1954, Ch. 11.
58. J.C.R. Licklider and R. Held, "Effects of various types of nonlinear distortion upon the intelligibility of speech," J. Acoust. Soc. Am., vol. 24, p. 114, January 1952.
59. R.K. Saxe and R.E. Lacy, "Some aspects of clipped speech," Proc. IRE, vol. 42, p. 613, March 1954.
60. H. Mann, H.M. Straube, and C.P. Villars, "A companded coder for an experimental PCM terminal," Bell Sys. Tech. J., vol. 41, pp. 173-226, January 1962.
61. D.L. Richards, "Transmission performance of telephone networks containing PCM links," Proc. IEE, vol. 115, pp. 1245-1258, September 1968.
62. R.G. Gallager, Information Theory and Reliable Communication, New York: Wiley, 1968.
63. A. Kozak, "A simple method to test parallelism and coincidence for curvilinear, multiple linear and multiple curvilinear regressions," paper presented at the Third Conference of the Advisory Group of Forest Statisticians, Section 25, I.U.F.R.O., Jouy-en-Josas, France, September 9-11, 1970.
64. K. Nitadori, "Statistical analysis of ΔPCM," Electronics and Communication in Japan, vol. 48, pp. 17-26, February 1965.
65. R.S. Kennedy, Fading Dispersive Communication Channels, New York: Wiley, 1969.
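The natural-to-folded conversion described in Appendix A can be sketched in a few lines. This is an illustrative reading, not the author's program: the function name is hypothetical, codewords are represented as non-negative integers, and the rule that codewords with a leading 1 are left unchanged is inferred from Figure A.1, where the upper halves of the two codes coincide.

```python
def natural_to_folded(codeword: int, n_bits: int) -> int:
    """Convert an N-bit natural binary codeword to folded binary.

    Codewords whose first (most significant) bit is 1 are unchanged.
    Otherwise the N-1 magnitude bits are inverted by mod-2 addition
    (XOR) with the binary equivalent of 2**(N-1) - 1.
    Applying the function twice returns the original codeword, so the
    same routine also converts folded binary back to natural binary.
    """
    if not 0 <= codeword < (1 << n_bits):
        raise ValueError("codeword does not fit in n_bits")
    magnitude_mask = (1 << (n_bits - 1)) - 1   # e.g. 0b0111 for N = 4
    first_bit = (codeword >> (n_bits - 1)) & 1
    if first_bit == 1:
        return codeword
    return codeword ^ magnitude_mask


if __name__ == "__main__":
    # Reproduce the two columns of the four-bit code table, top level first.
    for natural in range(15, -1, -1):
        print(f"{natural_to_folded(natural, 4):04b}   {natural:04b}")
```

Representing codewords as integers keeps the mod-2 addition a single XOR; a bit-string representation would work equally well.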
