An Asynchronous Telephone Speech Scramblerwith a New Key Generation MethodbyRAYMOND W. M. WOOB. A. Sc. (Hons.), The University of British Columbia, 1991A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF APPLIED SCIENCEinTHE FACULTY OF GRADUATE STUDIESDEPARTMENT OF ELECTRICAL ENGINEERINGWe accept this thesis as conformingto the required standardTHE UNIVERSITY OF BRITISH COLUMBIASeptember 1993© Raymond W. M. Woo, 1993In presenting this thesis in partial fulfilment of the requirements for an advanceddegree at the University of British Columbia, I agree that the Library shall make itfreely available for reference and study. I further agree that permission for extensivecopying of this thesis for scholarly purposes may be granted by the head of mydepartment or by his or her representatives. It is understood that copying orpublication of this thesis for financial gain shall not be allowed without my writtenpermission.Electrical EngineeringDepartment ofThe University of British ColumbiaVancouver, CanadaDate ^October 13, 1993DE-6 (2/88)AbstractThe design of an asynchronous telephone speech scrambler is proposed and its perfor-mance experimentally studied. The scrambler transforms the speech signal into a scrambledform by permuting the frequency coefficients of the speech signal. These coefficients arepermuted in pairs instead of individually. Subjective tests indicate that such an arrangementresults in an improvement in descrambled speech quality but no increase in the residualintelligibility of the scrambled speech signal. A new criterion for selecting good scramblingpermutations known as the derangement criterion is introduced. Subjective tests based onthe number test confirm that keys satisfying this criterion are able to produce scrambledspeech with virtually no residual intelligibility. A new derangement generation algorithm isintroduced. This algorithm maps an integer g, in the range of 0 < g < (n —1)!, uniquelyto a derangement of n elements.Table of ContentsAbstract^ iiList of Tables viList of Figures^ viiiAcknowledgment x1 Introduction 11.1 Secure Speech Communications for Telephone Users ^ 11.2 An Overview of the Problem ^ 21.2.1^Basic Concepts 21.2.2^Special Issues of Designing a Speech Scrambling System ^ 41.3 System Requirements ^ 51.4 Scope of Thesis 61.5 Organization of Thesis^ 72 Review of Speech Scrambling Systems 82.1 Digital Speech Encryption Systems ^ 82.1.1^The Data Encryption Standard 92.1.2^The Rivest-Shamir-Adleman Exponentiation Cipher ^ 92.1.3^Stream Ciphers ^ 102.1.4^Comments 121112.2 Analog Speech Scrambling Systems ^ 142.2.1 Traditional Frequency Domain Speech Scramblers ^ 142.2.1.1^Frequency Inversion ^ 142.2.1.2^Band-Shift Inversion 152.2.1.3^Bandsplitting ^ 162.2.2 Time Domain Speech Scramblers ^ 172.2.2.1^Reverse Time Segmentation 172.2.2.2^Time Segment Scrambling ^ 172.2.3 Modern Frequency Domain Speech Scramblers ^ 192.2.3.1^The Synchronous FFT Speech Scrambling Method ^ 202.2.3.2^The Asynchronous FFT Speech Scrambling Method^ 232.2.3.2.1^Filter Bank Analysis and Synthesis ^ 232.2.3.2.2^The Asynchronous Speech Scrambling Method . 272.2.4 Comments 303 System Design 333.1 Delay Effects and Detectability Threshold ^ 333.2 Bandwidth of Permutable Sub-bands and Speech Quality^ 364 The Scrambling Key Generation Algorithm 404.1 The Key Selection Problem^ 404.2 Existing Schemes^ 414.2.1 Scheme 1 414.2.2 Scheme 2 ^ 44iv4.3 Comments^ 484.4 Proposed Key Generation Algorithm ^ 494.4.1 The Derangement Criterion 494.4.2 The Derangement Generation Algorithm ^ 504.4.2.1 Mapping a Permutation to an Index 504.4.2.2 Generating a Permutation from an Index ^ 524.4.2.3 Generating a Derangement from an Index 534.4.2.4 Comments ^ 555 Experimental Set-up and Results 595.1 Assessing the Efficiency of the Derangement Criterion ^ 595.1.1 The Number Test ^ 595.1.2 A Critique of the Number Test ^ 615.1.3 Test Results ^ 625.2 Comparison of the Efficiency of the Parameters ^ 675.3 The Sentence Test ^ 785.4 Speech Quality Comparison^ 796 Conclusions and Recommendations for Future Work^ 826.1 Conclusions ^ 826.2 Recommendations for Future Work ^ 82References^ 84VList of TablesTable 1^MOS scores for standard speech coders. ^ 13Table 2 Delay values for various values of L and N 35Table 3 Values of P(m) for n = 12. ^ 42Table 4 Values of average OD and MPD for derangements having various number ofelements. ^ 56Table 5 Distribution of the values of HD between derangements generated byAlgorithm D^ 57Table 6^Mean residual intelligibility scores and their variances forN = 256, L = 5, NsB = 1 at various values of OD. ^ 62Table 7^Mean residual intelligibility scores and their variances forN = 128, L = 5, NsB = 1 at various values of OD. ^ 62Table 8^Comparison of the mean residual intelligibility scores of keys meeting thederangement criterion for N = 256 and 128. ^ 64Table 9^Distribution of the subjects' entries for the ten digits for N = 128. ^ 65Table 10 Mean residual intelligibility scores and their variances forN = 64, L = 5, NSB = 1 at various values of HD ^ 69Table 11 Mean residual intelligibility scores and their variances forN = 64, L = 5, NsB = 1 at various values of OD ^ 70Table 12 Mean residual intelligibility scores and their variances forN = 64, L = 5, NsB = 1 at various values of WSP. ^ 71Table 13 Mean residual intelligibility scores and their variances forN = 64, L = 5, NsB = 1 at various values of MPD ^ 72viTable 14 Mean residual intelligibility scores and their variances forN = 64, L = 5, NsB = 2 at various values of HD. ^Table 15 Mean residual intelligibility scores and their variances forN = 64, L = 5, NsB = 2 at various values of OD. ^Table 16 Mean residual intelligibility scores and their variances forN = 64, L = 5, NsB = 2 at various values of WSP. ^Table 17 Mean residual intelligibility scores and their variances forN --= 64, L = 5, NsB =- 2 at various values of MPD. ^Table 18 Comparison of the mean residual intelligibility scores of keys meeting thederangement criterion for different values of N and NsB^Table 19 Normalized ranking values and their standard deviations for speech processedwith parameters N = 64, L = 5, NsB = 1 and 2. ^viiList of Figures^1.1 General model of a privacy system for communications 21.2 Spectrum of human speech^ 52.1 General block diagram of a digital speech scrambler 82.2 A digital speech scrambling system based on a synchronous stream cipher . . 112.3 A digital speech scrambling system based on a self-synchronous stream cipher. 112.4 General structure of an analog speech scrambler. ^ 142.5 Frequency inversion. (a) Original speech spectrum and (b) scrambledspectrum^ 152.6 Band-shift inversion. (a) Original speech spectrum and (b) scrambledspectrum 162.7^Bandsplitting. (a) Original order of sub-bands and (b) scrambled spectrum. . . 162.8 Block diagram of the synchronous FFT scrambling system ^ 212.9^The N ideal filters in a filter bank^ 232.10 Net on which vT, [k] is defined. 262.11 Performing interpolation on the net on which vn [k] is defined. ^ 272.12 Block diagram of the asynchronous FFT scrambling system 282.13 The operations of preparing a frame for FFT ^ 293.1^Delay detectability threshold for various kinds of conversations. ^ 343.2 Quality ranking of speech scrambled with different number of frequencycomponents per sub-band. ^ 374.1^Probability density function P(rn) for n = 12^ 434.2 Dividing the speech spectrum for assigning weighting values^ 454.3^Grouping four neighboring sub-bands into clusters. ^ 464.4 Distribution of the values of HD between derangements generated by AlgorithmD^ 58viii5.1^Residual Intelligibility versus Order of Displacement forN = 256, L = 5, NsB = 1^ 635.2^Residual Intelligibility versus Order of Displacement forN = 128, L = 5, NsB = 1 635.3^Distribution of the subjects' entries for the ten digits for N = 128^ 665.4 Residual Intelligibility versus Hamming Distance for N = 64, L = 5, NsB = 1. ^ 695.5^Residual Intelligibility versus Order of Displacement forN = 64, L = 5, NsB = 1 ^ 705.6 Residual Intelligibility versus Weighted Shift Parameter forN = 64, L = 5, NsB = 1. ^ 715.7 Residual Intelligibility versus Mean Permutation Distance forN = 64, L = 5, NsB = 1. 725.8 Residual Intelligibility versus Hamming Distance for N = 64, L = 5, NsB = 2. ^ 735.9^Residual Intelligibility versus Order of Displacement forN = 64, L = 5, NsB = 2. ^ 745.10 Residual Intelligibility versus Weighted Shift Parameter forN = 64, L = 5, No = 2. ^ 755.11 Residual Intelligibility versus Mean Permutation Distance forN = 64, L = 5, NsB = 2. ^ 76ixAcknowledgmentI would like to express my sincere gratitude to my supervisor Dr. C. Leung for hissupport and guidance throughout the course of this thesis.I also wish to express my appreciation to all those who participated in the subjectivetests for their interest and involvement in the project.A special thank you is extended to my family for their support and encouragementthroughout my graduate studies.Financial support in the form of a research assistantship from NSERC grant OGP0001731is gratefully acknowledged.xChapter 1 Introduction1.1 Secure Speech Communications for Telephone UsersSpeech is one of the most fundamental forms of human communications. It has been andremains one of the main methods used for the exchange of information and ideas. Today,due to technological advances in telecommunications, we can carry on conversations withfamily members, friends and colleagues even if there is a great geographic distance separatingus. In fact, we have gradually developed a great dependence on speech communication overdistance, and the telephone network, which is by far the most widely used telecommunicationsnetwork, has become an integral part of our modern society. But notwithstanding itsnumerous benefits, the telephone has also opened up new avenues for interceptors tohave unauthorized access to private communication. For example, whereas most businesstransactions used to be conducted either by personal contact or written correspondence, theyare now handled largely by the telephone. It is often easier for someone to intercept valuableor sensitive information from the telephone network. Within the telephone network, thereis little provision of privacy for the network traffic. Therefore, for some telephone users,such as stock brokers advising their clients on investment opportunities, government officialsdiscussing sensitive matters, or corporate executives conducting a tele-conference, there lurksthe constant peril of having confidentiality compromised. Providing privacy protection tomany telephone network users is clearly needed.The issue of speech communications security has long been addressed in military anddiplomatic areas. However, it is only recently that this issue has received attention in otherareas, especially for users of the telephone network [ l ].1Chapter 1. Introduction^ 21.2 An Overview of the Problem1.2.1 Basic ConceptsThe issue of data security involves two related but distinct cryptographic problems [2][3]:• Privacy problem, which is mainly concerned with preventing an interceptor fromextracting useful information from a communications channel. Such action ofinformation interception is known as passive wiretapping or eavesdropping.• Authentication problem, which deals with the detection of interceptor injectingfalse information or altering the contents of existing information in the channel.Here, the breakage into the system is known as active wiretapping or tampering.The problem of authentication is of concern in the issue of security for speech commu-nications. For example, someone may assume a false identity and pass misinformation to, orretrieve valuable information from an unsuspecting receiver. However, we will concentrateon the problem of privacy in this thesis, as it poses a greater threat due to the fact thateavesdropping is a passive action and may not even be detectable.A commonly used model of a privacy system is shown in Figure 1.1. Note that theinterceptor is not a part of the privacy system, and is included merely to illustrate the courseof information flow in the system.Figure 1.1 General model of a privacy system for communications.Chapter 1. Introduction 3In the nomenclature of cryptography, the message originating from the sender is knownas the plaintext, P and is to be transmitted to the intended recipient via the insecure channel.To conceal the information content of the message from the eavesdropper, an invertibletransformation E K, based on a certain key, K, is used to convert P into the ciphertext, C:C = EK(P).Upon receiving C, the recipient deciphers it back to the original message by applying theinverse transformation D K, with the same key K:DK(C) = DK(EK(P)) = P. (1.2)Together, the encryptor and the decryptor constitute the cryptographic system. When usedin speech communications, the cryptographic system is often known as a speech scramblingsystem, or a speech scrambler. A cryptographic system may be an algorithm implementedin the form of computer software, a piece of hardware, or a combination of these.Generally, privacy is not a result of keeping the underlying principles of a cryptographicsystem secret. In fact, in most cases, any information regarding the cryptographic systemis revealed to the public [2]. For example, in practice, the cryptographic system may bea commercial product which is available to the sender, the recipient, and the eavesdropperalike. Rather, the cryptographic system provides secrecy by the secret selection of a particularkey. Therefore, to achieve high security, it is of paramount importance that the key spaceof the cryptographic system be so large that it would be impractical for the eavesdropper toattempt breaking the system by simply trying out all possible keys.Chapter 1. Introduction^ 41.2.2 Special Issues of Designing a Speech Scrambling SystemWhen designing a speech scrambling system, there are many performance and operationalissues that need to be considered. Besides key related issues, level of security, quality ofrecovered speech, effect of channel impairments on performance, required channel bandwidth,and processing delay are among the most important ones [4].The level of security is certainly the most significant issue of all, as the need for securityis the very motivation for a speech scrambling system. Usually, the level of requiredsecurity mostly depends on user needs. Of course, high level of security is desired under allcircumstances. However, it is also clear that the security needed by average telephone userswill not be as high as that required for military or diplomatic purposes. Furthermore, thedegree of security is usually linked to recovered speech quality in an adverse relationship[4]. To many users, the quality of recovered speech is almost just as important as speechsecurity; therefore, they may be reluctant to accept a system that provides high level ofsecurity but has poor speech quality.As the telephone network is a very large communications system, it incorporates a greatvariety of transmission media and components, such as open wires, cables, relays, multi-plexers, repeaters, echo suppressors, etc. Consequently, channel conditions of a telephoneconnection have a very wide variance. Very noisy lines are not unusual. Therefore, it isimportant that the speech scrambling system be robust enough so that its performance has aminimal dependence on channel characteristics.Although speech frequency components range from 20 Hz to 20 kHz, most of the energyand intelligibility content reside in a much narrower band, as shown in Figure 1.2 (reproducedfrom [5]). In fact, the standard band of the telephone speech signal is 300-3400 Hz (CCITTChapter 1. Introduction^ 5500^1000 1500 2000 2500 3000 3500Frequency (Hz)Figure 1.2 Spectrum of human speech.Recs. G. 132 and G. 151), while the generally adopted band in North America is 200-3200Hz [6]. Clearly, any speech scrambling system intended for telephone users needs to work inthe permissible bandwidth. Any expansion from this bandwidth due to the speech scramblingprocess will possibly cause information to be attenuated by the channel and the recoveredspeech will no doubt suffer quality degradation.Most speech scramblers need to process the speech signal to achieve the scramblingeffect; therefore, processing delay is inevitable. In many cryptographic systems, processingdelay is not a big concern; however, this is not the case with a telephone speech scrambler,which needs to operate in real-time. If the processing delay is noticeable, it may affectthe efficiency of conversations, and the users may find the system unacceptable. Therefore,the processing delay should be kept at such a level that it will not be easily detected byaverage users.1.3 System RequirementsThe requirements of the telephone speech scrambler to be presented in this thesis canChapter 1. Introduction^ 6be briefly summarized as:• adequate security;• transparent operations;• reasonable cost.While the first and the third requirements are clear and needs no further explanation,the second one needs to be expanded. For a telephone user, transparent operations have thefollowing two implications: 1) processing delay should be tolerable if not indiscernible; and2) degradation of speech quality should be minimized. From the viewpoint of a designer,the second requirement also implies a third point: 3) the operations of the system shouldhave a minimal dependence on any change in channel conditions.1.4 Scope of ThesisAs there has been increased interest in speech scrambling algorithms, various newconcepts have been proposed. Therefore, instead of proposing a new algorithm with noapparent advantage over existing ones, we will first examine existing methods and adoptone that meets our requirements. Furthermore, we will study the problem of the generationof keys which will ensure security. We will propose a new method for the generation ofpermutation keys, plus a criterion that can be used as a measure of the secrecy level ofpermutation keys.The actual implementation of the proposed system, along with the new key generationmethod, is a straightforward hardware and firmware application of a Digital Signal Processing(DSP) chip, such as the Motorola DSP56000. Due to a lack of time, a hardware prototype ofthe system has not been built. Instead, the performance of the system has been experimentallyChapter 1. Introduction^ 7studied using two personal computers, each of which is host to a Motolora DSP56000 basedDSP board.1.5 Organization of ThesisAfter this introduction, Chapter 2 provides a review of existing speech scramblingtechniques. Various properties of these methods will be explored. In particular, the schemethat will be adopted in this thesis will be discussed in detail.Chapter 3 discusses how the scheme will be adapted to meet our requirements. Particularattention will be made to the problems of processing delay and recovered speech quality.Chapter 4 presents the new permutation key generation method.Chapter 5 describes the subjective tests performed to evaluate the performance of boththe system and the key generation method. The results of this experiment are presented.Chapter 6 contains a summary of the thesis and recommendation of topics for furtherresearch.Chapter 2 Review of Speech Scrambling SystemsMany types of speech scramblers exist. They may be classified into two groups: digitaland analog scramblers [4]. In this thesis, the classification is based on the followingdefinitions. A digital system digitizes and compresses the input speech to a bit rate suitablefor transmission in the telephone channel. The resulting bit stream is encrypted using a dataencryption technique. On the other hand, an analog system produces scrambled speech thatis an analog signal having the same bandwidth as the original speech signal. Analog ordigital signal processing techniques may be used in the analog system to generate this signal.2.1 Digital Speech Encryption SystemsFigure 2.1 shows the general block diagram of a digital speech encryption system. Thea (t)^{xk} ^ {Yk} ^—a(t)^{4}^{51-k}•^klemodulatorl•---Figure 2.1 General block diagram of a digital speech scrambler.original analog speech signal xa(t) is first digitized into a stream of data bits {xk } by ananalog-to-digital (A/D) converter. In most cases, the A/D converter would be a speech coderwhich encodes the speech at a low bit rate with reasonable speech quality. The convertedbit stream is then enciphered into a different bit stream fykl by some encryption algorithm7" with a particular key K:{Y1, Y2, • • • , Yz, • • •} = TK „x1,x2,•-•,x ,• • •}).^(2.1)8Chapter 2. Review of Speech Scrambling Systems 9The modulator/demodulator (modem) pair is responsible for transferring the encrypted speechbit stream {yk} over the telephone network. Since a digital speech encryption systemprocesses purely digital data, common methods used in data encryption can be adopted.There are many different techniques developed for data encryption. In the following sections,some methods which are representative are briefly discussed, followed by comments on themerits and shortcomings of digital speech encryption systems.2.1.1 The Data Encryption StandardA natural choice for the digital speech encryption algorithm T is the Data EncryptionStandard (DES) adopted by the National Bureau of Standards of the United States [3]. TheDES is a complex combination of substitution and transposition (permutation), two of thefundamental building blocks of encryption. Data input to the DES are organized as 64—bitblocks. The key is also 64 bits in length, although 8 of these are parity bits. A plaintextdata block undergoes 16 iterations of a combination of substitution and transposition, whichare controlled by the key, to produce 64 bits of ciphertext. Besides its high level of security,another characteristic of the DES is its ease of implementation. All the operations involvedin the algorithm are either binary arithmetic or based on table lookup. Furthermore, althoughcomplex, the algorithm is highly repetitive. Thus it can be implemented easily in softwareon a computer or in a chip.2.1.2 The Rivest-Shamir-Adleman Exponentiation CipherAnother well known encryption algorithm that can be used in the implementation ofa digital speech scrambler is the Rivest-Shamir-Adleman (RSA) [7] exponentiation cipher.This algorithm is able to provide a very high level of security. It is based on the idea thatwhile finding large prime numbers is relatively easy [3], factoring the product of two suchChapter 2. Review of Speech Scrambling Systems 10numbers appears to be computational infeasible. As opposed to the DES algorithm and otherconventional encryption methods, the RSA algorithm has not one but two keys. These twokeys are used for data encryption and decryption, respectively. The encryption key is madeup of two appropriately chosen integers N and E, while the decryption key consists of twointegers N and D. Data blocks, M, in the range 0 < M < N is encrypted by computingC = ME mod N. (2.2)The ciphertext C is decrypted by a similar exponentiation operation:M = CD mod N. (2.3)2.1.3 Stream CiphersBoth the DES and the RSA algorithms are known as block ciphers, as the speech datastream is broken into blocks and the same key is used to encrypt all the blocks. There isanother class of ciphers known as the stream ciphers which may also be used to implementdigital speech scramblers. Contrary to block ciphers, stream ciphers produce a key streambased on the input key, and combine the key stream with the data stream on an element-by-element basis. Usually, an element is either a character or a bit.A stream cipher may be referred to as synchronous or self-synchronous [3]. As shownin Figure 2.2, in a speech encryption system based on a synchronous stream cipher, thekey stream is generated independently of the ciphertext stream. If a character is lost duringtransmission, the sender and the receiver must resynchronize their key generators beforeproper decyphering can be resumed. Otherwise erroneous deciphering will occur. On theother hand, in a system based on a self-synchronous stream cipher as shown in Figure 2.3,Chapter 2. Review of Speech Scrambling Systems^ 11each key element depends on n preceding ciphertext elements. Therefore, if a transmissionFigure 2.3 A digital speech scrambling system based on a self-synchronous stream ciphererror occurs, the following n elements will be incorrectly deciphered. But the system willbe deciphering correctly again after receiving n correct ciphertext elements. The sacrificepaid for this self-synchronous property is the propagation of error which may be undesirablein some cases.A common implementation of stream ciphers is by the use of a Linear Feedback ShiftRegister (LFSR). A more sophisticated implementation may use DES to implement the keystream generator.Chapter 2. Review of Speech Scrambling Systems^ 122.1.4 CommentsThe advantages of digital speech scrambling systems over the analog counterparts arenumerous. The most significant is that digital speech scramblers generally are able to providemuch higher levels of security [4]. In fact, the RSA algorithm has been considered as oneof the most secure enciphering systems known [2]. Also, digital transmission is better thananalog transmission, because digital systems are inherently less likely to be affected bychannel noise or distortion. Usually, such impairments affect data transmission in the formof inducing bit errors which, up to a limit, can be corrected by forward error correctionschemes. On the other hand, any impairment present in the channel will have a definiteand irreversible impact on an analog signal, resulting in a degradation in speech quality.Furthermore, today's transmission technology is gradually evolving from analog towardsdigital, with the advent of digital networks such as Integrated Services Digital Network(ISDN). Undoubtedly, digital speech scramblers will be better at exploiting various newtechnologies such as packet switching and digital switching supported by the new digitalnetworks.Although they are attractive in many ways, digital speech scramblers are however notsuitable for our purposes. Despite recent advances in speech coding techniques, a veryhigh bit rate is still required to encode speech with acceptable decoded speech quality[8]. Researchers in the area of speech coding often rely on a subjective rating scale of1 to 5, known as the Mean Opinion Score (MOS), to quantify the level of digital speechquality. Based on the MOS, a score of at least 4 is required to have high-quality ornear-transparent coding, and decoded speech with a score of 3.5 is characterized by alevel of speech degradation that is easily detectable but not bad enough to impede naturalChapter 2. Review of Speech Scrambling Systems^ 13Table 1 MOS scores for standard speech coders.Coder MOS64-kb/s PCM 4.332-kb/s ADPCM 4.116-kb/s LD-CELP 4.08-kb/s CELP 3.74.8-kb/s CELP 3.02.4-kb/s LPC 2.5PCM: Pulse Code ModulationADPCM: Adaptive Differential PCMCELP: Code Excited Linear Predictive codingLD-CELP: Low-Delay CELPLPC: Linear Predictive Coder (vocoder)telephone communication. Normally, a score of at least 3.5 is considered to be acceptablefor transmitting speech across telecommunications networks. Table 1 presents the MOSscores for various speech coding methods. To transmit data over the telephone networkat reasonably low cost, low speed modems (typically at 2.4 to 4.8 kb/s) are generally used.However, as indicated in Table 1, low bit rate speech coders do not provide sufficient decodedspeech quality. As we noted in Section 1.3, the descrambled speech quality is very importantand any degradation thereof should be minimal. Furthermore, low speed modems, even at thetransmission rate of 9.6 kb/s, do not necessarily provide adequate bit error rate performanceover some analog channels [9]. To be able to have better speech quality, one can always usea higher speed speech coder and thus a higher speed modem; however, the high cost of suchmodems make it impractical to use them in our application'. Furthermore, low bit rate speechcoders are often expensive. Therefore, for our application, we resort to analog scramblers .I In recent months, the cost of 14.4 kb/s modems have come down dramatically. Their current retail price is around $300. Thisreduction in price certainly allows a speech encryption system with a fair speech quality to be built at a reasonable cost. However, such asystem may still be much higher than what the public is willing to pay, which is, say, below $200. Furthermore, the prices of DSP productshave also decreased dramatically recently; therefore, an analog speech scrambling system based on DSP still compares very favorably.Chapter 2. Review of Speech Scrambling Systems^ 142.2 Analog Speech Scrambling SystemsThe general block diagram of an analog speech scrambler is shown in Figure 2.4. Theanalog speech signal xa(t) is scrambled into a different analog waveform ya (t) by somescrambling algorithm T:ya(t) = Tic (xa(t)). (2.4)Analog scramblers operate in either the time or frequency domain. In the followingsubsections, we will describe and discuss some common analog scramblers. At the end,comments will be made on the suitability of each method for our application.2.2.1 Traditional Frequency Domain Speech ScramblersTraditional frequency domain speech scramblers operate on the analog speech signaldirectly. Examples include frequency inversion, band-shift inversion, and bandsplitting.These schemes generally suffer from a low level of security as well as other practicalproblems such as bandwidth expansion.2.2.1.1 Frequency InversionFrequency inversion is one of the earliest methods used in analog speech scrambling.It lowers the intelligibility of speech by inverting the spectrum of the speech, as illustratedChapter 2. Review of Speech Scrambling Systems 15in Figure 2.5. On its own, frequency inversion provides very little security. The reason issimple: rather than distorting the original speech signal, frequency inversion instead convertsit into a different signal, while retaining most of the characteristics of the original speech,such as pitch and the temporal order of the sound patterns. The net result is as thoughthe language has been transformed by mapping it using a different pronunciation scheme[4]. Therefore, it is possible to learn to listen to speech scrambled by frequency inversion.Another shortcoming of this method is that it effectively has only one key.Frequency^Frequency(a) ( b )Figure 2.5 Frequency inversion. (a) Original speech spectrum and (b) scrambled spectrum.Frequency inversion is not suitable to be used on its own as a means of providingprivacy; however, it can be combined with other methods to enhance security. It can easilybe implemented. For example, on a sample based analog system, frequency inversion canbe achieved by simply inverting the signs of alternating samples when the speech signal isencoded by the method of Pulse Code Modulation [4].2.2.1.2 Band-Shift InversionBand-shift inversion is an improvement on the frequency inversion method. The im-provement exhibits itself as a provision of a key. In this method, the spectrum is not simplyChapter 2. Review of Speech Scrambling Systems^ 16inverted, but is also shifted by a frequency L. Although it performs better than frequencyinversion, it still is simplistic [4].A^ fsFrequency^ Frequency(a) (b)Figure 2.6 Band-shift inversion. (a) Original speech spectrum and (b) scrambled spectrum.2.2.1.3 BandsplittingThe basic concept behind bandsplitting is to divide the speech spectrum into a numberof sub-bands. The order of these sub-bands are then rearranged. Some of these sub-bands may also be inverted to increase the level of security, as illustrated in Figure 2.7.Compared to the previous two methods, bandsplitting improves the level of security. Infact, the basic idea of permuting the speech spectrum found in bandsplitting is also usedin modern frequency domain speech scramblers which use Digital Signal Processing (DSP)1 2 3 4 5 6 Frequency^Frequency(a) (b)Figure 2.7 Bandsplitting. (a) Original order of sub-bands and (b) scrambled spectrum.Chapter 2. Review of Speech Scrambling Systems 17techniques to process the speech signal. However, a traditional bandsplitter usually relieson hardware for the division and permutation of the speech spectrum; therefore, due tosystem hardware limitations, the number of sub-bands is limited. This limitation has twoundesirable consequences. Firstly, the key space is restricted. Furthermore, as not all possiblepermutations produce good results, the space of useful keys is further reduced. Therefore,it may be possible to break the system by trying out all keys. The second limitation isthat the bandwidth of each sub-band is of a fairly large size. Experiments have indicatedthat usually, the wider the bandwidth, the higher the residual intelligibility. In fact, it hasbeen reported [10] that it is possible to obtain reasonably good intelligibility out of a voicetransmission which is band limited to a 300-500 Hz spectrum. Therefore, if the bandwidthof each sub-band falls in this or a higher range, the scrambled speech signal may still carryconsiderable intelligibility.2.2.2 Time Domain Speech Scramblers2.2.2.1 Reverse Time SegmentationReverse time segmentation is a simple example of a speech scrambler that operates in thetime domain. In this method, the speech signal is divided into time segments. The temporalorder of these segments is then reversed to achieve the scrambling effect. Similar to thefrequency inverter, this method offers very little security [4].2.2.2.2 Time Segment ScramblingThe method of time segment scrambling is analogous to the technique behind bandsplit-ting. The speech signal is first divided into frames of equal periods. Each frame is furthersubdivided into small equal time periods called segments. The speech signal is scrambledChapter 2. Review of Speech Scrambling Systems^ 18by permuting these segments within a frame. Both the frame length and the segment lengthhave to be chosen carefully.Permuting speech segments will inevitably cause discontinuities in the signal. Thesediscontinuities induce high frequency components in the scrambled signal and thus increaseits bandwidth, as well as degrade the speech quality. Since dicontinuities only occur at thesegment boundaries, the number of discontinuities is directly proportional to the number ofsegments. To minimize the effects of discontinuities, the number of segments will be keptpreferably as low as possible. Furthermore, if the length of segments were so short thatthose segments in the same vicinity have similar amplitudes, then the scrambling effect willbe reduced. On the other hand, a segment duration cannot be so long that it contains anintelligible portion of the speech. It is clear that in such a case, the scrambling effect willbe severely compromised.The frame length determines the total delay as well as the size of the key space. Fora fixed segment size, a longer frame length implies a larger number of segments in aframe, which in turn implies a larger key space. However, a large frame length meanslong processing delay. Let D be the one-way processing delay of the speech scramblingsystem. In a time segment scrambling system, D is given by:D = fs (2.5)where N is the frame length and f s is the speech sampling frequency, which is typically 8kHz. To keep the delay low, N has to be low also. However, experiments show that, unlessthere are a large number of segments per frame, most time segment permutation scramblersperform poorly [11], [4]. For example, Jayant et al. found that for a time segment scramblingChapter 2. Review of Speech Scrambling Systems^ 19system with a one-way processing delay as long as 260ms, the resulting speech still has anintelligibility as high as 45% [11].2.2.3 Modern Frequency Domain Speech ScramblersModern analog scrambling methods are based on sample value scrambling using DSPtechniques. The concept of sample value scrambling was introduced to elevate the levelof security achievable by an analog scrambler [12], [13], [14]. Instead of processing thecontinuous-time signal xa(t), sample value scrambling chooses to process the discrete-timesample sequence x[n] = xa(nT), where T is the sampling period. By doing so, not only canthe speech signal xa(t) be scrambled at a much higher level, but the size of key space is alsoincreased significantly. In practice, the sequence x[n] cannot simply be randomly permutedand then interpolated, as the bandwidth of the resulting signal will be increased due to thetime discontinuities introduced by scrambling. Usually, x[n] is scrambled in a domain inwhich scrambling will not result in bandwidth expansion, i.e.:X[n] = g (x[n])^ (2.6)Y [n] = P(X[n])^ (2.7)where g is a transformation, and P is a permutation operation. After being scrambled in thenew domain, Y[n] is converted back into the time domain for transmission:y[n] = g-i(Y[n]).^ (2.8)In other words, the transformation T is in fact a composite function:T=Sogopog-i 0 1^ (2.9)Chapter 2. Review of Speech Scrambling Systems^ 20where S is the sampling operation, while I is the interpolation operation and fog E f(g).Wyner [12] suggested the use of an orthogonal transform called the Prolate SpheroidalTransform (PST) for the transformation g. PST uses a set of real sequences called the ProlateSpheroidal Sequences as the orthogonal basis and is able to maintain the same bandwidthafter the speech samples are scrambled. However, it requires lengthy computations and istherefore not practical.Weinstein [13] and Lee et al. [14] proposed transforming the speech samples to thefrequency domain by the Discrete Fourier Transform (DFT). The bandwidth of the speechsignal scrambled by a DFT based system is expected somewhat expanded. However, the gainin computational efficiency is tremendous as DSP techniques such as Fast Fourier Transform(FFT) can be exploited for real-time processing. Note that both of these two methodsrequire synchronization for correct descrambling. By making use of the Filter Bank conceptdeveloped for short-time analysis of speech [15], [16], Lee et al. [17] subsequently proposedan FFT based speech scrambling technique that does not require any synchronization. Boththe synchronous and the asynchronous DFT scrambling methods will be described below.2.2.3.1 The Synchronous FFT Speech Scrambling MethodThe block diagram of the synchronous FFT scrambling system is shown in Figure 2.8.The synchronization generator embeds timing signals used to keep synchronization in thescrambled speech signal. Segments of length N, xr[m], where r is a time-wise index forthese segments, are extracted from the speech sequence by the technique of windowing, i.e.,x[n] is multiplied by a window w[n], which is non-zero for 0 < n < N:x , [in] = x[r N — N + m + l]w[N — in — 1], 0 < in < N.^(2.10)x ( ) • x [n]A/D [n] t )^ ii[n][ nND FFT IFFT D/Ax [n] ^)7(t0 •^•x(t)^,../speech)output FFT coefficient^[n]inverse perm.(speechinput ^ X [n]FFT --11111.^ Y InlFFT coefficientpermutation D/AsynchronizationgeneratorsendIFFTChapter 2. Review of Speech Scrambling Systems 21sample timingframe timing^receiveFigure 2.8 Block diagram of the synchronous FFT scrambling system.Normally, a rectangular window will suffice; however, windows of other types may alsobe used.The speech segments are transformed into frequency domain by taking the DFT ofxr[rid:^[m] = DFT (x,[m]).^ (2.11)The frequency coefficients in each segment are then permuted and subsequently transformedback to the time domain:Yr [m] = P(X,[mJ),^ (2.12)^yr [m] = DFT' (177.[m]).^ (2.13)The sequence yr [m] is then interpolated into the continuous waveform y(t) which is trans-mitted.On the receiver side, the received signal y(t) is processed by the same sequence ofoperations, except that P is replaced by the inverse permutation operation P-1 which reordersthe frequency coefficients 1 [ni] to their original places.Chapter 2. Review of Speech Scrambling Systems^ 22Special precautions must be taken when considering the permutation operation P. Firstly,since xr[m] is a real sequence, Xr[m] has the following special properties:IX,[m] l = IX,[N — mil,^ (2.14)‹{X,[m]l = —‹{X, [N — mil^(2.15)i.e., the frequency coefficients are symmetrical in amplitude and asymmetrical in phase. IfP is completely random, these properties may not be maintained, and the resulting sequenceyr [ni], and thus the scrambled speech signal y(t) will be complex in general. To ensure thatthe speech signal will be real after scrambling, P must be chosen so that amplitude and phasesymmetries are retained after the permutation operation. A simple solution is to randomlypermute only the lower half of the frequency coefficients. The upper half will be reorderedaccordingly so that symmetry is maintained.As was indicated in Section 1.2.2, the telephone channel is typically band-limited to200-3200 Hz. Therefore, it is desirable to permute only those frequency coefficients withinthis range and leave the rest intact. Otherwise, coefficients from this range may be displacedoutside of the range and will be attenuated by the channel. This may lead to degradation inthe speech quality. The frequency sub-bands that reside in the above frequency range arereferred to as the permutable sub-bands.Values of N, which is also known as the FFT frame length, typically range from 128 to5122. The sampling frequency is usually about 8 kHz.2^To facilitate the use of FFT, frame lengths are taken to be highly composite numbers, often powers of 2.Chapter 2. Review of Speech Scrambling Systems^ 23H k(ejw)27tFigure 2.9 The N ideal filters in a filter bank.2.2.3.2 The Asynchronous FFT Speech Scrambling MethodThe asynchronous DFT scrambling method is based on the concepts of filter bankanalysis and synthesis used in short-time analysis of signals with a varying spectrum suchas speech. These two concepts will be summarized first, followed by a general discussionof the algorithm.2.2.3.2.1 Filter Bank Analysis and SynthesisAn ideal filter bank is made up of one ideal lowpass filter and N — 1 ideal bandpass filters.Each of these filters has bandwidth 27r I N and is centered at wk = (27r I N)k in the spectrum,where k = 0, 1, N — 1. The frequency responses Hk(e3w) of all these filters are shown inFigure 2.9. Let hk[n] be the impulse response of the ideal filter 11k (e3''). Then we have:sin (V)hp [n] =^—oo <n < oo.^(2.16)n7rAnd the other impulse responses hk[n] may be expressed as:hk[n] = ho[n]e3.^ (2.17)L-1u[q] = n rN q]ho[—rN — q]^(2.20)Chapter 2. Review of Speech Scrambling Systems^ 24When the speech signal x[n] is filtered by the filter bank, the frequency componentsof x[n] in each of the N bands are obtained. These frequency components can be down-converted to the baseband by multiplying by a sequence e —lwkn for k = 1,2, ...,N — 1.Let X. [k] denote the down-converted baseband frequency components, where n is the timeindex, and k is the frequency index. Xn[k] can be expressed as:X[k] = (x[n] * hk[n])e—iwkn(cx)=^E x[m]hk[n — rni c—iwknm=-00(=^L x[m]ho[n_ Tnidwk(n_m) e_iwknm-0000x[m]ho[n — m]e-3(21÷TrOm^ (2.18)171=-00where * is the convolution operator.Although ho [n] has an infinite duration theoretically , it can be approximated by beingtruncated into a length 2LN,L >> 1. Therefore Equation 2.18 is now written as:n-FLNXn[k]',-:;^E x[m]ho[n — m]e-1(2711k)m^(2.19)m=n—LNWe observe that the term e-3(2/71)m is periodic in m with period N. This implies that,in the sequence x[m]ho[n — m], elements which are N places away from one another aremultiplied by the same value of e—J(*rk)m. These elements can be summed together firstto reduce the total number of multiplications. A new sequence with N elements is formedby this summation:Chapter 2. Review of Speech Scrambling Systems^ 25and Equation 2.19 becomes3:N-1X iz[k] =^un[q - n modN]e-3(27÷Trk)q^(2.21)q=0In the above equation, q — n mod N is a circular shift of the index q by n mod N. Thisshift is used to provide the necessary linear phase which is present due to the shift of timereference from m 0 in Equation 2.19 to q = 0 (which implies m = n) in Equation 2.21.Note that Equation 2.21 is in the form of a Discrete Fourier Transform, i.e.:{X [k], k = 0, 1, . . . , N - 11 = DFT fun[q - n mod N], q = 0, . . . , N -^(2.22)Because X7, [k] is a low-pass signal with bandwidth 'TIN, the sampling rate can bereduced by a factor of R without suffering from aliasing, as long as R < N (see pp. 102— 105 of [18]). When X„[k ] is down-sampled by a factor of R, the filter-bank analysisequation becomes:N —1X rR[Ic] = YT ttrR[q - rR modN]e-j(akrk)q^(2.23)q=0or:R[k], k = 0,1, . . . , N — 11 = DFT furR[q — r R mod N], q = 0, . . . , N — 11. (2.24)To recover the original speech signal x[n] from the down-sampled frequency sequenceXrR[k], the method of filter bank synthesis is applied. Portnoff [16] showed an easy wayThe approximation sign will be dropped from now on, though the fact that the analysis equation is still an approximation does notchange.Chapter 2. Review of Speech Scrambling Systems^ 26of performing the synthesis that makes use of the technique of FFT. First, the inverse DFTof down-sampled sequence X,R[k] is evaluated:{x,R[k], k = 0, 1, . . . , N — 1} = DFT- 1 {X,R[k], k = 0,^,N — 11.^(2.25)Then the sequence x,R[k] is interpolated to form the complete time sequence xn[k]. Theinterpolation operation is performed by first filling zeros in the sequence x,R[k]:v [k] = {x rR[lc], n = r R, r = 1, 2, • • •0^other n(2.26)and then filtering the sequence v[k] by a low-pass filter with cutoff frequency 1 N and atruncated length of 2Q R:n-FQR-1vm[n mod IV] f [n —^ (2.27)m=n—QRThe above operations may be graphically represented in Figures 2.10 and 2.11. Figurex (r-DR[k]^x rR [k]^x (r+nR[k]N-1^o^•00000•o oo o o•o oN-2^o^•000 oo•o oo oo•o oo^•00000•o oo oo•o oo^• o o o o o • o o o o o • o o3^a^• o o o o o • o o o o o • o oo^• o o o o o • o o o o o • o o1^a^• o o o o o • o o o o o • o o0^1^(r-1)R^ rR^ r+ /1/?Figure 2.10 Net on which vn [k] is defined.x(r-nR [k I• 0•0Chapter 2. Review of Speech Scrambling Systems^ 27interpolating windowx rR[k]^x (r+1)RridO 0 0 • 0 0 0 • 0 • 0 0O 0 0 • 0 0^0 • 0 0O 0 0 • 0 • 0 0 0 • 0 0O 0 • 0 0 0 0 0 • 0 0O 0^0 0 0 0 0 • 0o • • 0 0 0 0 0 • • 00 • 0 0 0 0 •^0 0ko o • o o o o • • o orR^ (r+1)RFigure 2.11 Performing interpolation on the net on which vr,[k] is defined.2.10 shows the net on which vr, [k] is defined. The shaded points represent the values ofX r R[k] while the hollow dots represent the fill-in zeros. Figure 2.11 shows the interpolationoperation, which is performed along the zig-zag line in the net. All the points inside therectangular window are used in the computation for one value of x[n].2.2.3.2.2 The Asynchronous Speech Scrambling MethodOperations of the asynchronous FFT scrambling scheme are very similar to those of thesynchronous method, as is illustrated in Figure 2.12. Call N the FFT frame length, andL the window length factor. Frames of length 2LN, yrR[n], are extracted from the speechsequence x[n] by the multiplication of an FIR low pass filter, h[n], of the same length:h[n] = ho[n]w[n]^ (2.28)PI- n0^1where h0 {n] is the sinc function as defined in Equation 2.16. For possible choices of w[n], itshould be noted that a tapering window such as the Hamming Window, the Hanning Window,Chapter 2. Review of Speech Scrambling Systems^ 28X [n]P1^FFT FFT coefficientpermutation x ( I ) x[n](speechinput NDyLnj ^YLnj^Y( )IFFT^P2^D/Afilter bank^ filter bankanalysis synthesissendFFT coefficientinverse permfilter bank^ filter bankanalysis synthesisreceiveFigure 2.12 Block diagram of the asynchronous FFT scrambling system.jr[n -f[n]ND^P1 -110. FFTspeech)output i[n] ^n]IFFT^P2^D/Aor the Blackman Window should be used to reduce the Gibbs phenomenon. The applicationof the windowed filter h[n] is repeated after every R samples, which is the down-samplingfactor. Once a frame is extracted, it is divided into 2L consecutive segments, each of lengthN. Elements in the same positions of these segments are then added together to form a newframe of length N. The above operations are illustrated in Figure 2.13. After the indexof this frame is shifted appropriately, according to Equation 2.20, the result is ur R[n]. Theabove operations of producing u R[n] is represented by the P1 block in Figure 2.12. u R[n]is transformed into the frequency domain sequence X, R[k] by FFT. X, R[k] is then permutedinto YR [k] which is converted into yr R[k] by Inverse FFT. Intermediate zeros are filled inr R[k] to produce vii[k], which is interpolated into the scrambled speech sequence y[n] asdictated by Equation 2.27. The value of Q can be chosen to be equal to L and a sinc function(Equation 2.16) can be used as the interpolation filter f[n]. The operations that constructy[n] from YR [k] is represented by the P2 block in Figure 2.12.The scrambled speech can be unscrambled by following the same procedures as inscrambling. An inverse permutation is performed on the frequency sequence to rearrangethem to the original order. For proper unscrambling, the system does not need to keep frameChapter 2. Review of Speech Scrambling Systems^ 29Figure 2.13 The operations of preparing a frame for FFI.synchronization. The reason why synchronization is not needed is given below.First, we will assume that there is perfect synchronization. Let fi[n] be the receivedscrambled speech sequence. It is easy to see that when we apply filter bank analysis on [n] ,we will get the frequency component sequence 17-„R [k], which we can apply descramblingto rearrange the frequency components to the original order. Then we apply filter banksynthesis to the rearranged frequency components to reconstruct the time sequence 32 [n] . Ifthe channel is ideal, i.e. fi [n] = y [n] , then ±'[n] will be identical to the original speechsequence x [n] . Now we consider the case when the system is out of synchronization by8 samples. This means that when we apply filter bank analysis on [n] , instead of takingChapter 2. Review of Speech Scrambling Systems 30frames of length 2LN starting at samples rR,r = 0, I, . . . , we start at samples rR s.Therefore we will end up with a frequency component sequence -17.R+5[k]. Note that thissequence is also down-sampled by the rate R. By the sampling theorem, 1->;.R+, [k] andf7rR[k] are equivalent representation of -Y[k], and thus i[n], as they are both down-sampledby the same rate. Therefore, for the cases when s 0, 9[n] will be resolved into the samefrequency components as the ideal case when = 0.We will now examine the issue of asynchronous operation from a more general viewpoint.Recall that in the filter bank analysis operation, frames of length 2LN at intervals ofR samples are taken from the speech sequence x[n] to be processed. In other words,these frames overlap with each other. One implication of this operation is that there issignificant redundancy in the processed signal. Subsequently, any point of the signal containsinformation of the points preceding it or following it. When the amount of redundancy issufficient, the need for synchronization is eliminated.2.2.4 CommentsModern frequency domain scrambling methods compare favorably with the traditionalfrequency domain as well as the time domain techniques. The level of security providedby the modern frequency domain methods is significantly better. The idea behind boththe synchronous and the asynchronous FFT scrambling methods is the same as that ofbandsplitting. However, the modern methods are able to divide the spectrum into muchfiner sub-bands. This not only allows the speech to be scrambled to a higher degree, butthe key space is also expanded greatly. On its own, the method of time segment scramblingis not able to provide low residual intelligibility [11]. When combined with the method offrequency inversion, its performance improves [11]; however, the residual intelligibility isChapter 2. Review of Speech Scrambling Systems 31still not as low as that achieved by an FFT scrambling method [14], [17]. Furthermore,the associated bandwidth expansion is undesirable for an application that is based on thetelephone channel.Cost may be a legitimate concern for the application of a modern frequency domainscrambling method. However, with the continuous breakthroughs in the technology ofintegrated circuit design, the cost of DSP products have decreased dramatically in recentyears. Therefore, cost is no longer an impeding factor in the use of a DSP based system.When compared to the asynchronous FFT scrambling method, the synchronous methodhas some advantages. First, since speech frames are synchronized between the transmitterand the receiver, it is possible to apply the technique of multiple frame permutation, i.e.,a different permutation is applied to each frame within a group of frames. This techniquedoes not necessarily lower the unintelligibility; however, it improves security provided by thesystem by increasing the difficulty of cryptanalysis. Another advantage of the synchronousmethod is a shorter processing delay. There are two sources of delay in a speech scramblingsystem, buffering delay and scrambling delay. Buffering delay is the time duration requiredby the system to accumulate enough speech samples before it can initiate the scramblingprocess. Scrambling delay refers to how long it takes to scramble the samples in the buffer.With the speed and power of today's DSP microprocessors, scrambling delay is usually shortenough and can be neglected. Buffering delay depends on the sampling rate and the numberof samples the system needs to accumulate. For a frame length of N, i.e., the speech spectrumis to be divided into N frequency components, the buffering delay of the synchronous methodis N samples and that of the asynchronous method is 2LN samples. Therefore, for the samevalue of N and sampling rate, the asynchronous method requires a longer buffering delay.Chapter 2. Review of Speech Scrambling Systems 32The most attractive feature of the asynchronous method is its ability to operate asyn-chronously. The need for synchronization adds complexity to the hardware, or even thesoftware of the system, and thus increases the total cost. It also forces the system to bedependent on channel conditions. This dependence weakens the robustness of the system.Most methods used for maintaining synchronization involves imbedding timing signals in theregular speech signal. Under bad channel conditions, these timing signals (along with thespeech signal), may be lost and the system may be out of synchronization temporarily. In arecent design of a synchronous scrambler [19], it has been reported that it may take as longas 2 seconds to regain synchronization. Such delay will certainly affect the users' efficiencyin a conversation; therefore, many users may find it unacceptable.As it was noted in Section 1.3, the speech scrambler for telephone users should preferablyhave transparent operations. The asynchronous method is best at meeting this requirementas it reduces the dependence of the operation of the system on channel conditions. Itsrequirement for a longer processing delay can be met by choosing an appropriate set ofvalues for L and N such that the resulting delay will be within tolerable limits. The trade-off is a slight increase in residual intelligibility as will be shown in Chapter 5.Chapter 3 System DesignAs was pointed out in Chapter 2, the asynchronous FFT speech scrambling method isbest at meeting our requirements for a telephone speech scrambler. However, its parametersneed to be chosen carefully so that there is an optimal compromise amongst the level ofsecurity, processing delay, and speech quality. In this chapter, we will examine the systemdesign issue of configuring the asynchronous FFT speech scrambling algorithm so that thesystem will operate with acceptable delay as well as speech quality at the optimal level.Specifically, we will examine the issue of keeping the processing delay to an tolerable limit.We will also study the relationship between the bandwidth of each permutable sub-band andspeech quality.3.1 Delay Effects and Detectability ThresholdStudies [20], [21], [22] have shown that the efficiency of conversations degrades with anincrease of delay in a telecommunications system. To avoid this degradation, delay needs tobe kept within a limit so that its presence is not easily detected. Delay perception is largelya subjective matter and it has great dependence on human factors and conversational modes.Telephone conversations are usually made up of on-off speech patterns, i.e., talk spurtsinterleaved with silence periods [23]. A talker usually expects a response from his talkingpartner within a certain silence period. If the response time caused by processing delayis consistently longer than this period, delay is perceptible. The length of the expectationperiod varies widely with the kind of conversation that is taking place. Kitawaki and Itoh[22] found that delay perception depends on two speech characteristics: 1) the duration ofthe talk spurts, and 2) the variance of the duration of the talk spurts. As the duration and its33Chapter 3. System Design 34variance increase, the delay effect becomes more and more difficult to be detected. Kitawakiand Itoh further conducted a subjective test to study the delay detectability threshold ofsubjects with different levels of training engaging in various kinds of conversations. Theresults are shown in Figure 3.1 which is reproduced from [22]. The detectability threshold1000CZ 500c—I^I^I^I^I^I^I ^/` ^,^/` ,^TASK1 TASK2^TASK3 TASK4 TASKS TASK6Task 1: Take turns reading random digits aloud as quickly as possible.Task 2: Take turns verifying random numbers as quickly as possible.Task 3: Words with missing letters are completed with letters supplied by the other talker.Task 4: Take turns verifying city names as quickly as possible.Task 5: Determine the shape of a figure described verbally.Task 6: Free conversation.Figure 3.1 Delay detectability threshold for various kinds of conversations.ranged from 90 ms for Task 1 as perceived by the trained crews to 1120 ms for Task 6 asperceived by the untrained businessmen, housewives, and students. The results confirmedthat there is a wide variance in the detectability threshold depending on the nature of theconversation, and the talkers' technical background, as well as the level of training theyreceived in detecting processing delay. In designing our system, we need to be conservativeso that the processing delay will not be easily perceived in most situations. On the otherhand, we have to be realistic. Using the result of Task 1 to establish our reference point isC: Trained (Crews)E: Untrained (Lab. employees)U: Untrained (Businessmen,Housewives, Students)Chapter 3. System Design 35unreasonable because of the stringent nature of the task and the technical background of thesubjects involved. For Task 2, the detectability threshold of one way delay as perceived bythe trained crews and the untrained laboratory employees range approximately from 100 msto 175ms. This range would be a reasonably conservative reference point used to measurewhether the delay produced by our system can be easily perceived.As indicated in Chapter 2, the processing delay in the asynchronous FFT speechscrambling system depends on two parameters: L, which is the window length factor, andN, which is the FFT frame length. To process the speech signal, either the scrambler orthe descrambler needs to accumulate 2LN samples; therefore, the total one-way processingdelay, D, can be approximated by the following relationship:4ND = Lfs(3.1)where f, is the speech sampling frequency. A typical value for h is 8 kHz. Table 2 givesdelay values for various combinations of values of L and N.Table 2 Delay values for various values of L and N.L N .1'5 (kHz) D (ms)5 128 8 3203 128 8 1925 64 8 1603 64 8 96Clearly, for N = 64 and L = 5 or 3, the processing delay will be within the limit rangeestablished above. It has been reported [24] that for the combination of L = 3 and N = 64,the speech quality is good. For the combination of L = 5 and N = 64, the speech quality isexpected to be better, since the approximation of the ideal filters will be more precise in thisChapter 3. System Design^ 36case. Thus these values can be adopted to provide a good compromise between processingdelay and speech quality.3.2 Bandwidth of Permutable Sub-bands and Speech QualityThe crux of the idea behind sample based scrambling techniques, when compared tothe traditional bandsplitter, is that the bandwidth of permutable sub-bands can be reducedsignificantly. This will allow the spectrum to be scrambled at a much higher level. Thenormal practice in a FFT speech scrambler is to treat each frequency component as anindividual sub-band. However, it should be clear that it is also possible to group a pairof neighboring components as one sub-band. The reason for such a practice may not beapparent at first, as it will reduce the key space by half. The explanation is given below.In Chapter 2, it has been pointed out that when the speech signal is scrambled in thetime domain, distortion will occur in the form of high frequency components. Distortionalso exists when the speech spectrum is scrambled. The cause of the distortion also stemsfrom discontinuities resulting from scrambling. Naturally, the more discontinuities there are,the more severe the distortion becomes.A preliminary subjective test was set up to study the relationship between the bandwidthof the permutable sub-bands and the descrambled speech quality, for a fixed value of FFTframe length (N = 128). Let NsB represent the number of neighboring frequency componentsper sub-band. Five speech segments were scrambled and descrambled for the following fourcases: a) NsB = 1, b) NSB = 2, c) NsB = 4, and d) NsB = 8. Ten subjects participatedin the subjective test. Each person was asked to first listen to the four versions of eachsegment, and then to rank them in terms of speech quality. The average of the ranks ispresented in Figure 3.2. As is shown in the graph, the speech quality is best when fourChapter 3. System Design^ 37frequency components are permuted together as a sub-band. Speech quality gets worse whena sub-band consists of either more or fewer components. The reason for this is as follows.Distortion in speech due to spectral discontinuities is directly proportional to the totalnumber of discontinuities. Let NDis be the total number of discontinuities. NDis may beexpressed as:NDis PDis(SBi) (3.2)where SBi is the ith sub-band and PDis(SBi) is the probability that this sub-band gives riseto a discontinuity in its new location in the spectrum. Note that PDis(SBi) decreases with anincrease in N. The reason is that, as the number of sub-bands increases, the division of thespeech spectrum becomes finer; therefore, there will be an increase in the probability thattwo sub-bands will have similar energy level. Thus placing two such sub-bands together inthe scrambled spectrum will not necessarily induce a discontinuity.The number of sub-bands N is inversely related to the number of frequency componentsper sub-band. Thus with an increase in the number of frequency components in a sub-band,4.002 3.00-0 2.00G)coE 1.0080.001^2^4^8No. of Frequency Components in a Sub-band (N so )Figure 3.2 Quality ranking of speech scrambled with different number of frequency components per sub-band.Chapter 3. System Design 38two trends prevail in the change of the number of discontinuities. The increase in N willinduce an increase in Nms; however, the decrease in PDis(SBi) will decrease NDis. Thenet effect result is that NDis will first decrease and then rise again. Therefore, speech qualitywill be relatively worse either when the bandwidth of the permutable sub-bands is either toosmall or too big. The optimal value for the bandwidth will be somewhere in the middle,as is indicated in Figure 3.2.For N = 64, a similar trend exists. Permuting two frequency components together as asub-band (i.e., NsB = 2) gives better speech quality than permuting individual componentsalone (i.e., NsB = 1). A subjective test confirming this result is presented in Chapter 5.One concern in having multiple frequency components in one sub-band is the decrease insecurity. This decrease in security may be reflected in two ways: 1) a potential reduction inresidual unintelligibility of the scrambled speech; 2) the reduction in the key space. As willbe shown in Chapter 5, subjective tests show that there is no apparent reduction in residualunintelligibility when comparing the case of NsB = 2 to the case of NsB = 1 for N = 64.On the other hand, the reduction in key space is unavoidable. Whether this reduction has asignificant effect on the security of the system is considered below.In Chapter 2, it was pointed out that the allowable permutations of sub-bands haveto satisfy two constraints: 1) the scrambled signal has to remain real; 2) the telephonechannel is band-limited to 200-3200 Hz; therefore, only this frequency range in the speechspectrum should be permuted. Since the spectrum of the speech samples spans the range0-4000 Hz, the second constraint implies that only three quarters of the spectrum is to bepermuted. Together, these two constraints imply that, for N = 64, the number of sub-bandsfor permutation is 24. Therefore, the total number of available keys is 24!. Of course, notChapter 3. System Design 39all these keys are useful. The number of useful keys that are generated by the algorithm tobe discussed in Chapter 4 is (24 — 1)! = 23!. When two components are grouped as onesub-band, this number is reduced to 11! = 3.99 x 107. A key space of this size should meetthe security requirement of most users of the public telephone network.Chapter 4 The Scrambling Key Generation AlgorithmIt is well known that there are n! ways to permute the speech spectrum if it consistsof n permutable sub-bands. However, not all the permutations result in a scrambled speechsignal with acceptably low residual intelligibility. Only a subset of them are fit for use askeys for speech scrambling. Therefore, the problem of key selection is an important one inthe design of a speech scrambling system.4.1 The Key Selection ProblemLet S represent the set of useful permutation keys, and S-1, the set of inverse permutationkeys. Ideally, S should satisfy the following two requirements:1. All keys in S must produce unintelligible speech.2. For every key, Pi, in S, there exists one and only one inverse, Pi-1, in S-1 suchthat only P-1 can be used to descramble the speech scrambled by P. Using anypermutation key other than Pi-1 should produce unintelligible speech.Let I represent the identity permutation, i.e., the permutation with all its elements stayingat their original places. It may be hypothesized that the residual unintelligibility producedby a permutation, Pi, may be related to a parameter, D(P, , I), which measures its distancefrom I. The greater the value of D(P, , I), the more unintelligible is the speech scrambledusing P2. Consequently, the first requirement may be transformed into the following:D(Pi, I) > D th^for all Pi in S.^ (4.1)where Dth is a threshold value chosen to limit the residual intelligibility of the scrambledspeech signal to an acceptably low level.40Chapter 4. The Scrambling Key Generation Algorithm 41The second requirement calls for comparisons between keys. By the above distancehypothesis, it demands that the distance between any pair of keys has to be at least of athreshold value, i.e., it may be transformed into the following:D(Pi, Pj) > Dth for all Pi and Pi, i j, in S (4.2)It is easy to see the second requirement is far more stringent than the first one, as itdemands that a certain distance be kept amongst all the keys themselves, not just from thesingle identity permutation I. However, even the first requirement is difficult to meet. It isdifficult to establish an algorithm to construct the theoretical set S from the n! permutationsbecause speech perception is quite a subjective matter; therefore, it is very difficult to quantifythe parameter D analytically. Instead, researchers have tried to use other parameters whichcan be analytically derived to approximate the effect produced by the idealized parameter D.Two such schemes which have recently been proposed are described below. One of thesescheme is proposed by Sakurai et al. [19] and will be referred to as the Scheme 1 in thisthesis. The second scheme, proposed by Del Re et al. [24], will be referred to as Scheme 2.4.2 Existing Schemes4.2.1 Scheme 1Sakurai et al. [19] proposed the use of Hamming Distance (HD) of a permutation keyas a measure of the residual intelligibility of the speech scrambled by the permutation key.The HD of a permutation key is defined as the number of elements that are permuted out oftheir natural order. Roughly speaking, the larger the HD of a permutation key F, the lessintelligible is the speech signal scrambled by P. Specifically, Matsunaga et al. [9] usedChapter 4. The Scrambling Key Generation Algorithm^ 42the HD value of 90% of the total number of permutable sub-bands as the threshold value intheir synchronous FFT speech scrambling system.Clearly, the use of HD as an assessment of the permutation keys mainly aims atattempting to satisfy the first requirement, although how close HD is to the theoreticalparameter D in measuring the effectiveness of the permutation keys is not known. Howwell this scheme satisfies the second requirement is difficult to determine. Nevertheless,Matsunaga et al. noted that, for a random permutation of n elements, the probability P(m)that it has an HD of in is:P(m) = 1 ^[ 1^11^+ +(n — m)!^1!^2!( - 1 )m 1in! (4.3)The values of P(m), for n = 12, are listed in Table 3 and plotted in Figure 4.1.^TheTable 3 Values of P(m) for n = 12.m P(m)2 1.4x 10-73 9.2x1074 9.3x10-65 7.3x10-56 5.1x1047 3.1x1038 1.5x1029 6.1x10210 1.8x10-111 3.7x10-112 3.7x101average and the variance of all possible HD values are n — 1 and 1, respectively. Thereforeg =^bk+12 ' • + bk+j-1 +1,^for i > 3—2Chapter 4. The Scrambling Key Generation Algorithm^ 43Figure 4.1 Probability density function P(m) for n = 12.it is asserted that the closeness between a pair of permutation is very small if the permutationkeys, in terms of HD, are generated randomly.Sakurai et al. also proposed a way for the random generation of permutation keys. Thealgorithm is summarized below:I. A permutation key is represented by a bit sequence of 20 bits, which is used asthe initial state of a 20—stage PN sequence generator.II. Let bi, i = 0, 1, 2, . . . , be the ith output bit of the PN sequence generator. n —1integers gi, i = 1, 2, . . . , n-1, are generated from the PN sequence as follows:gl = 1iig2 — PO • 2 + b1) 22^+ 1Chapter 4. The Scrambling Key Generation Algorithm^ 44= [log2 ij + 1, k =s=2 { [log2 s^1}) (4.4)III. A permutation is generated from the set of n — 1 integers by applying theTompkins-Paige algorithm4.The total number of scrambling keys generated by this method is 220 — 1 1 x 106.This algorithm is able to transform an integer input to a unique permutation key.However, it should be noted that the permutation keys generated are not guaranteed toautomatically meet the 90% HD criterion, although it was reported that test results of severalhundred keys indicated that neither the scrambled signal nor the descrambled signal decodedby a wrong key was intelligible.4.2.2 Scheme 2Del Re et al. [24] considered using two parameters to assess the efficiency of apermutation key: a weighted shift parameter (WSP) for measuring a key's efficiency inproducing unintelligible speech, and a weighted mobile average (WMA) for indicating thesecurity of the key. Here, permutation security is intended to represent the level of difficultyin finding the unknown key and the unlikelihood that intelligible speech results when ascrambled speech signal is descrambled by a different key.The permutable sub-bands of the spectrum is divided into two groups of equal numberof sub-bands, A and B. Furthermore, in group A, the sub-bands occupying the lower halfof the spectrum in the group are designated as significant (AS) while the rest insignificant(ANS). All the sub-bands in B are insignificant. Figure 4.2 shows how the speech spectrumThe Tompkins-Paige algorithm is one that uniquely maps an integer, which is represented by the n — 1 digits gi to a permutation.It is presented in [25].Chapter 4. The Scrambling Key Generation Algorithm^ 45is divided. Weighting is determined based on how each sub-band is shifted from one groupto another.200^3200Frequency (Hz)Figure 4.2 Dividing the speech spectrum for assigning weighting values.WSP is defined as the sum of sub-band shifts (neglecting the shift direction) divided bythe total number of sub-bands and multiplied by the shift weights. Shift weights are assignedfor the following cases of sub-band shifting:• Weight P1 is assigned to shifts of1) a sub-bands in B within B,2) an ANS sub-band within A,3) an AS to an ANS sub-band.• Weight P2 (P2 < P1) is assigned to a shift of one AS sub-band to another ASsub-band.The shift weights are correlated with the number of permutable sub-bands by the followingrelationships:P1 = (1/a) - NNS2 43S3S2 S4Si S3S2 S4• •^•1first clusterSisecond cluster• •Si S2 S3 S4Chapter 4. The Scrambling Key Generation Algorithm^ 46P2 = P1/4 (4.5)where a is an experimentally derived weight of value 150 and NNS is the number ofinsignificant sub-bands. In other words, NNS is equal to 3/4 of the total number of permutablesub-bands.To determine the WMA of a permutation, every four neighboring sub-bands are groupedinto a cluster (see Figure 4.3). For all such clusters, the mean distance of the sub-bandsN permutable sub -bandslast clusterFigure 4.3 Grouping four neighboring sub-bands into clusters.with respect to the first sub-band is evaluated. Let the Si, 82, 83, and S4 represent the foursub-bands in a cluster. Then the mean distance is defined as- 82 1 + IS1 - S31 + !St - S4 / 3^(4.6)This mean distance is weighted according to the number of significant sub-bands present inthe cluster. Let NS be the number of significant sub-bands present in the cluster and NAbe the number of significant sub-bands which are adjacent in the cluster, then the weight Wis given by the following relationship:1 if NS < 1W-Eh=^if NS > 2 and NA = 0°^2NS-1 (4.7)w°+—°---i+wNA^if NS > 2 and NA > 01+ NA=Chapter 4. The Scrambling Key Generation Algorithm^ 47where WK is a constant equal to 0.5. The WMA of a permutation is defined to be theminimum value of these mean distances.Permutation keys are chosen on the basis that their WSP and WMA are higher thanthreshold values, which are determined from experimental results. Furthermore, these keysmust also meet the following five constraints:1) No significant sub-band must remain in its original position.2) A maximum of two significant sub-bands can remain in the AS cluster.3) A minimum of 8 (and a maximum of 16) sub-bands must change group (4 or 8from A, 4 or 8 from B).4) At least two of the four sub-bands in A must be significant.5) No more than two significant sub-bands must exist in a cluster of four.Del Re et al. also performed some analysis on the arrangements of the sub-bands inkeys Pi and PI, i j, so that a speech signal scrambled using Pi and descrambled usingP, will remain unintelligible. The analysis was done for the case that the spectrum consistsof 16 permutable sub-bands. It was hypothesized that Pi and 13 must be different in thefollowing way:1. If a,b,c,d, e and f represent the positions occupied by the significant sub-bandsin Pi, then the same positions in permutation 13 maybe occupied by a maximumof two significant sub-bands, neither of which is in the same position as in Pi.2. At least two-thirds of the remaining positions occupied by the insignificant sub-bands in Pi must be modified in permutation pi, so that the insignificant sub-bandscoincide in a maximum of six positions.Chapter 4. The Scrambling Key Generation Algorithm^ 48When a speech signal is scrambled by P, and subsequently descrambled by /), the net effectmay be regarded as equivalent to scrambling the original speech by a permutation Pk, wherePk = Pi-1 • Pz^ (4.8)It was pointed out that when the above two conditions are satisfied, the permutation P k hasthe following characteristics:• No significant sub-band regains its original position, and a maximum of two ofthe first six sub-bands are significant.• At most six insignificant sub-bands regain their original positions.It should be noted that when translated into the HD measure, the above conditions implythat Pk will have a Hamming Distance of at least 75% of the total number of permutablesub-bands. Also it is clear that the use of the WSP and WMA parameters and the fiveconstraints alone is not enough to guarantee that the above two conditions are met.4.3 CommentsScheme 1 is simple as it concentrates on only meeting the first requirement. On theother hand, in attempting to satisfy both the requirements, Scheme 2 is much more complex,with two parameters and five constraints. However, it is difficult to determine how muchimprovement is brought about by the added complexity in Scheme 2 in meeting the secondrequirement. The efficiency of the parameters HD and WSP in measuring the residualintelligibility of the speech signal is best evaluated by subjective tests. The results of thesesubjective tests are presented in the next chapter.A common shortcoming of both schemes is the lack of an algorithm for generating permu-tation keys that will automatically satisfy the respective key selection criteria. Consequently,Chapter 4. The Scrambling Key Generation Algorithm 49permutation keys must be randomly generated and tested to see if the criteria are satisfied. Ifnot, more permutations must be generated until one that satisfies the criteria is found. Thisbrute force way of key generation is inefficient. The time needed for the generation of eachkey is variable. Furthermore, since there is not a unique correlation between an integer inputand a permutation key, it is possible that one permutation key may be generated from morethan one integer input. This will no doubt undermine the security of the system.In the following sections, a new key selection and generation scheme is proposed. Thecriterion in this scheme is very simple yet efficient in discriminating good keys from badones. The new key generation method has the advantages that all the keys generated satisfythe new criterion, and that each key is uniquely mapped to an integer.4.4 Proposed Key Generation Algorithm4.4.1 The Derangement CriterionThe criterion for each permutation key in the new scheme is based on the hypothesisthat each sub-band in a key must be displaced by a minimum number of places, or distancefrom its natural position, to ensure that the speech processed by this key will have littleresidual intelligibility. This minimum distance is to be called the order of displacement(OD) of a permutation. The new criterion for each permutation key is that its order ofdisplacement must be of a certain value. When the order of displacement of a permutationis one, it implies that all the elements in the permutation are displaced out of their naturalplaces. Such a permutation is known as a derangement. Preliminary tests suggested thatall derangements will provide sufficiently low residual intelligibility for most applications.Note that all derangements have an HD of n, or 100% of the total number of permutableChapter 4. The Scrambling Key Generation Algorithm^ 50sub-bands. Thus the derangement criterion is supported by the findings of Matunaga et al.,who suggested the 90% criterion.4.4.2 The Derangement Generation AlgorithmIn this subsection, an algorithm for generating derangements is presented. We willbegin by discussing an algorithm [26] used to index all permutations of n elements, and analgorithm to reconstruct the permutation from a given index. The derangement generationalgorithm is presented as an extension to these two algorithms.4.4.2.1 Mapping a Permutation to an IndexAlgorithm I (Indexing a permutation) [p. 64 in [26]]: Let (U1, ..., Un) represent a givenpermutation of n integers, whose values ranging from 1 to n. An integer f(Ui, Un) iscomputed such that^0 < f(Ui,^Un) < n!,^ (4.9)and AU', •••, Un) = f(14,•••,Vn) iff U = 14 for 1 < i < n.• [Initialize.] Set i 4— n, f^0.• [Find maximum.] Find the maximum of U1, ..., Un, and suppose that Urn is themaximum. Set di=m — 1 and f 4- i•f+di.• [Exchange.] Exchange Ui and Urn.• [Decrease i.] Decrease i by one. If i > 1, return to step b.Several observations can be made on Algorithm I. Firstly, as i decrements from n to 2,the value of f is renewed in the following way:fi =^xi^ (4.10)Chapter 4. The Scrambling Key Generation Algorithm^ 51with the initial value fn = 0. Therefore, the final value of f can be expressed in termsof di and i:f(th,...,Urz)^(...((d • (n — 1) + dn-1) (n —2) + dn-2) • (n^3) • • • + d3) • 2 + d2. (4.11)The above equation can be rewritten as:f(Ui,^, Un) = d„, (n — 1)! 4_1 (n — 2)! + •^2! + d2 • 1!.^(4.12)Another observation is that each permutation has a unique set of values for the integersdi, as these integers are directly linked to the new positions of the permuted elements. Thusa unique ordering of a permutation implies a unique set of integers di.A third observation is that the range of values that di takes gradually gets smaller andsmaller as i decrements from n to 2. When i = n, the value of m is in the range of 1 and n,since Urn can be anyone of the n elements. As i decrements by one each time, the range form, and likewise that of d, decrements by one also. In general, d, satisfy the following:0 < di <^for 1 < i < rt.^ (4.13)The above observations indicate that Equation 4.12 is in fact equivalent to the expressionof the integer f in the factorial number system [26], which is defined below:Factorial number system [p. 64 in [26]]: Every integer in the range 0 < f < t! canbe uniquely written in the formf = (- • (ct—i x (t — 1) et_2) x (t — 2) + • • • + c2) x += (t —^+ (t — 2)!c_2^• + 2!c2 +^ (4.14)Chapter 4. The Scrambling Key Generation Algorithm^ 52where the factorial digits, cj, are integers satisfyingfor 1 < j < t.^ (4.15)The uniqueness property in the factorial number system implies that Algorithm I is able tomap each permutation of n elements uniquely to one, and only one index f in the range0 < f < n!.4.4.2.2 Generating a Permutation from an IndexTo construct the corresponding permutation for a given integer f, the factorial digits d,of f needs to be computed first. Note that Equation 4.11 can be expressed as:f (th, • • . , Un)^h • 2^d2where f2 = (• • • ((dy,^(n^+ 4_1) • (n — 2) + di,_2) • (n — 3). •^• + d3). (4.16)Since d2 < 2, it can be evaluated as:d2 = f mod 2. (4.17)Also, 12 can be evaluated from f as:f2 = [f /2] • (4.18)Similarly, 12 can be expressed as:12 = f3 • 3 + d3where f3 = (• • • ((d„• (n —1) + 4_1) • (n — 2) + di,_2) • (n — 3) c14). (4.19)Chapter 4. The Scrambling Key Generation Algorithm^ 53Therefore, we have the following:d3 = f2 mod 3,f3 = Lf2/3i •^ (4.20)The rest of the digits can be evaluated similarly. Consequently the algorithm for generatingthe corresponding permutation of n elements from an integer f, 0 < f < n!, is as follows:Algorithm P (Generating a permutation) [p. 64 in [26[]: Given an integer f in therange 0 < f < n!, a permutation of n elements (U1, . . . , U7) is generated such that theelements (U1, . . . , Un) have a unique ordering for each value of f.1. Initialize the sequence (U1, . . . , Un) in ascending order.2. For i = 2 to n do the following:a. set ni = (f mod i) + 1; f 4- [f/ii.b. exchange Urn and U,.4.4.2.3 Generating a Derangement from an IndexIn the process of evaluating the factorial digits di in Algorithm I, consider the case whendr, is maximum, which implies that m = n and dr, = n — 1. It is easy to see that thishappens only when Un has the highest value amongst all elements, i.e., the element Un staysat its natural position. Therefore, if dr, < n — 1, it implies that Un must be residing in aposition other than its natural position. Now consider the case when all the digits d, satisfiesthe following condition:0 < di < i — 1,^for 2 < i < n.^ (4.21)This implies that none of the elements U, resides in its natural position in the permutation.Thus the permutation is in fact a derangement. It therefore follows that, if the factorialChapter 4. The Scrambling Key Generation Algorithm 54digits d, of an integer f satisfy the above condition, a derangement will be generatedwhen Algorithm P is applied to f. Condition 4.21 implies that the total number of uniquederangements that can be generated in this manner is (n — 1)!.It is now desirable to establish a way to map an integer g in the range 0 < g < (n — 1)!to an integer f in the range 0 < f < n! and whose factorial digits di satisfy Condition 4.21.Let cj , 1 < j < n — 1, be the factorial digits of g. Then c satisfies:0 < < j, for 1 < j < n —1. (4.22)Now let d, = cz_2, for 3 < i < n, and assign d2 = 0. Clearly di, 2 < i < n, satisfiesCondition 4.21 and therefore may be used to generate a derangement.The basic idea in the derangement generation algorithm is first to resolve an integer g,in the range 0 < g < (n — 1)!, into its factorial digits c3 . These digits are then shifted upone order, with the digit of the first order being assigned to 0, to form the factorial digits,di, of a new integer f, 0 < f < n!. These new digits, which satisfy Condition 4.21, areused to generate a derangement.In practice, the digits di may be computed directly from the integer g, without theneed for the initial computation of the digits c 3 . Recall that the digits c are computed bysuccessively dividing g by consecutive integers that range from 2 to n — 1, and taking theremainders. Observe that if g is instead successively divided by consecutive integers in therange of 1 to n — 1, the resulting remainders form a sequence whose first element is 0,followed by the factorial digits of g, i.e., this sequence contains all the digits di which arerequired for the generation of a derangement.Algorithm D (Generating a derangement): Given an integer g in the range0 < g < (n — 1)!, a permutation of n elements (U1, • • • , (in) is generated such that all^Chapter 4. The Scrambling Key Generation Algorithm^ 55the elements (U1, . . . , Un) are displaced from their natural positions, and that they have aunique ordering for each value of g.1. Initialize the sequence (U1, . . . , Um) in ascending order.2. For i = 2 to n do the following:a. set m = (g mod (i — 1)) -I- 1; gb. exchange Urn and4.4.2.4 CommentsThe total number of derangements for n elements, Dii, is given by:1^1^1 ^1Di, = n!(1 —^+^—^' • • + (-1) 77). (4.23)Note that Di, > (n-1)!. Algorithm D does not generate all possible derangements for largevalues of n. It is therefore necessary to determine if the derangements generated by thealgorithm is biased in any way, when compared with the set of all derangements.Two parameters will be used for the comparison. One is the average of the values ofOD for all the derangements; the other is the average of the values of the mean permutationdistance (MPD) of all the derangements. Let (Ui, . . . , Un) represent a permutation P inthat order, then the mean permutation distance of P is defined as:nMPD(P) =n j=i^-j.^ (4.24)The values for the average OD and average MPD for derangements of elements ranging from5 to 12 are presented in Table 45• These values are computed for two set of derangements:The derangements were generated using Algorithm P and analyzed on a computer.Chapter 4. The Scrambling Key Generation Algorithm^ 56those generated by Algorithm D and the total set of derangements. Except for the caseTable 4 Values of average OD and MPD for derangements having various number of elements.NAverage OD Average MPDalgorithm all algorithm all5 1.09 1.09 2.02 2.016 1.08 1.11 2.34 2.337 1.09 1.11 2.67 2.678 1.10 1.12 3.00 3.009 1.11 1.12 3.33 3.3310 1.11 1.13 3.67 3.6711 1.12 1.14 4.00 4.0012 1.12 1.14 4.33 4.33of 5 elements, the average order of displacement of the set of derangements generated byAlgorithm D is consistently lower than that of the total set of derangements; however, thedifference is very small. On the other hand, the average permutation distance is the samefor both sets of derangements for the larger values of N. It is estimated that the trend ofthe closeness of the parameters average OD and average MPD for the two sets will prevailfor larger values of N. The conclusion of the comparison is that there is no bias in thosederangements generated by Algorithm D when compared to the total set.Similar to the HD parameter, OD is mainly used as an attempted indication of howwell a particular meets the first requirement. It is of interest to determine how close thederangements generated by Algorithm D are from each other. The parameter used for thismeasure is the HD, because a) it can be evaluated easily, and b) it provides a fairly goodassessment of the residual intelligibility of a scrambled signal, as will be shown in Chapter5. It is difficult to analytically evaluate the HD between derangements, especially thosegenerated by Algorithm D. Computer simulations are used to estimate these HD values.Chapter 4. The Scrambling Key Generation Algorithm^ 57Similar to the HD parameter, the OD parameter is used mainly as an assessment of howwell a permutation key satisfies the first requirement. It is of interest to evaluate how closethe derangements generated by Algorithm D are from each other. The HD parameter isused as the distance measure because a) it can be evaluated easily, and b) it offers a fairindication of residual intelligibility produced by a permutation, as will be shown in Chapter 5.The HD between two permutations is the number of elements occupying different positionsin these two permutations. A formal analysis of the distribution of the HD between thederangements generated by Algorithm D is difficult. The distribution for n --= 12 is estimatedby computer simulations in the following way. In each computer simulation, 1,000,000 pairsof derangements are randomly generated by Algorithm D. The values of HD between thesepairs of derangements are computed. The estimation of the distribution of HD betweenderangements is based on forty such computer simulations and is shown in Table 5 and inFigure 4.4. The distribution of HD between the derangements generated by Algorithm D isTable 5 Distribution of the values of HD between derangements generated by Algorithm D.HDProportion ofDerangements (%)Std. Dev.2 0 03 4.9 x 1 0-6 2.1 x 10-64 1.3 x 10-5 3.8 x 10-65 1.6x104 9.2 x 10-66 8.3 x 10-4 3.2 x 10-57 4.5 x 10-3 6.4 x 1 0-58 2.0 x 10-2 1.3 x 10-49 7.3 x 10-2 3.3 x 10-410 2.0 x 10-1 3.9 x 10-411 3.7 x 10-1 4.8 x 10-412 3.4 x 10-1 5.9 x 10-4Chapter 4. The Scrambling Key Generation Algorithm^ 58Hamming DistanceFigure 4.4 Distribution of the values of HD between derangements generated by Algorithm D.very similar to that between normal permutations. The HD between close to 90% of thesederangements is over 80% of the total number of sub-bands. It can be concluded that theHD between two derangements that are randomly generated by Algorithm D is large.The operations involved in Algorithm D include n — 1 mod operations, divisions, andswap operations, respectively. Therefore, the computational effort it takes is insignificant.Chapter 5 Experimental Set-up and ResultsThe scrambling key generation scheme proposed in Chapter 4 has a key selection criterionthat is effective in selecting good permutation keys, and supported by an efficient keygeneration algorithm. Preliminary tests indicated that the speech scrambled by derangementshas virtually no residual intelligibility. Subjective tests6 were conducted a) to confirmthis efficiency; b) to compare the efficiency of the following parameters in measuring theresidual intelligibility of scrambled speech samples: Hamming Distance (HD), Weighted ShiftParameter (WSP), Order of Displacement (OD), and Mean Permutation Distance (MPD);and c) to compare the speech quality for different values of NsB (the number of frequencycomponents per sub-band) for N = 64.5.1 Assessing the Efficiency of the Derangement CriterionA test similar to one used by Jayant et al. [11] was used to assess the residual intelli-gibility of speech scrambled by keys satisfying the derangement criterion. The experimentis described below.5.1.1 The Number TestIn the number test, the ten digits, 0 through 9, constitute the whole vocabulary. Thespeech samples used in the test are four-digit numbers such as 2108, spoken as two-one-zero-eight. The ten digits are evenly distributed in a set of 10 speech samples so that eachdigit occurs four times and each occurrence takes up a unique place in a number.6^In accordance with university regulations, an ethical review was performed on the test procedures used in the subjective testsdescribed in this thesis. An approval of the test procedures was obtained from the University of British Columbia Behavioural SciencesScreening Committee for Research and Other Studies Involving Human Subjects prior to the start of the subjective tests.59Chapter 5. Experimental Set-up and Results 60Two tests were conducted for two values of the FFT frame length N, with fixed values forthe other parameters — the window length factor L, and the number of frequency componentsper sub-band NsB:Test I. Number test with the following values of system parameters:N = 256, L = 5, NsB = 1;Test II. Number test with the following values of system parameters:N = 128, L = 5, NsB = 1.For each test, four sets of keys are randomly generated to satisfy the following conditions.There are twenty different keys in each set, all with the same value of OD. The four valuesof OD are 1, 13, 25, 37 for Test I, and 1, 7, 13, 19 for Test II. There is no special reason forchoosing these particular values of OD other than that they range evenly from 1 to roughly30% of the total number of permutable bands in each case. Each set of these keys areused to scrambled two sets of ten speech samples, spoken by a male and a female speaker,respectively.Scrambling of the speech samples was carried out using a personal computer which ishost to a Spectrum DSP56001 Board. The scrambled speech samples are stored in the harddrive of the computer. Before the subjective test, which takes place in a sound-proof room,each subject is given a sheet of instructions to read and to be understood before the testbegins. During the experiment, speech samples are presented to the subject over earphones.The volume can be adjusted by the subject to a comfortable level. After listening to eachspeech sample, the subject has to identify the four digits. The subject is allowed to listento the speech sample as many times as he/she finds necessary. When attempting to identifythe digits, if the subject feels that a particular digit is so unintelligible that he/she cannotChapter 5. Experimental Set-up and Results^ 61recognize it, he/she may use the character `-' in its place. However, the subject is urgedto use `-' only as a last resort.To avoid possible fatigue, the subject can exit the test at anytime. Information is savedso that the test can be resumed at the point of exit at a later time.The number test described above provides more conservative results then the one adoptedby Jayant et al. In their tests, the subjects are to finish a subjective test at one time and areonly allowed to listen to the speech samples once. Their results may be affected by possiblefatigue or temporary lack of concentration by the subjects.Since the listener is only trying to distinguish between ten possible sounds, it is aconsiderably more demanding test of the scrambler than asking listeners to understandscrambled sentences. Therefore, the resulting digit unintelligibility scores are expected, ingeneral, to provide upper bounds on the intelligibility of more general text because of thesmall size of the vocabulary.5.1.2 A Critique of the Number TestThere is a significant difference between the residual intelligibility for text and numbers.Number tests, particularly for numbers in the range of 0 to 9, do not provide a fair andtrue assessment of the intelligibility of the scrambled speech [4]. They are only a measureof the effect of the scrambler on these particular ten signals: zero, one, two, three, four,five, six, seven, eight, and nine. The subject is merely trying to differentiate between tensounds which, with the possible exceptions of five and nine, are totally dissimilar priorto scrambling. Therefore the number test is an exceptionally severe test of a scrambler.However, it is often favored by researchers because of its simplicity and the difficulty insetting up a comprehensive test based on normal conversational sentences.Chapter 5. Experimental Set-up and Results^ 625.1.3 Test ResultsSixteen subjects participated in the tests, nine of whom performed Test I, and the otherseven, Test II. All of subjects who participated in this and other subjective tests describedin this thesis have university level education; most are graduate students in ElectricalEngineering.The residual intelligibility (RI) score for each value of OD is the percent of the digitsthat were correctly identified, averaged across subjects. In the specific task of recognizingone of ten possible digits, a purely random response corresponds to an intelligibility scoreof 10%. Any character `-' in a subject's response is taken to represent a purely randomguess. The mean intelligibility scores and their variances ($2) for the two tests are presentedin Tables 6 and 7. The mean intelligibility scores are also shown in Figures 5.1 and 5.2.Table 6 Mean residual intelligibility scores and their variancesfor N 7= 256, L =-- 5, NsB = 1 at various values of OD.ODFemale Samples Male Samples All SamplesRI (%) s2 (%) RI (%) s2 (%) RI (%) s2 (%)1 11.4 0.4 12.5 0.7 12.0 0.213 9.3 0.5 15.7 1.4 12.5 0.625 7.8 0.1 13.1 0.3 10.4 0.0137 13.2 0.3 8.6 0.1 10.9 0.1Table 7 Mean residual intelligibility scores and their variancesfor N =-- 128, L -= 5, NsB -= 1 at various values of OD.ODFemale Samples Male Samples All SamplesRI (%) s2 (%) RI (%) s2 (%) RI (%) s2 (%)1 9.9 0.1 16.8 1.0 13.3 0.27 10.4 0.1 10.4 0.4 10.4 0.213 13.2 0.4 14.5 0.1 13.9 0.219 13.4 0.5 11.3 0.5 12.4 0.4Chapter 5. Experimental Set-up and Results^ 63 1.00.90.80.70.60.50.40.30.20. 10.0all samples —4—male samples - - 4- -female samplesOrder of DisplacementFigure 5.1 Residual Intelligibility versus Order of Displacement for N = 256, L = 5, Ns^1.1 .00.90.80.70.60.50.40.30.2all samples^•male samples - - 4- -female samples — —•0. 10.01 3^19Order of DisplacementFigure 5.2 Residual Intelligibility versus Order of Displacement for N = 128, L = 5, NsB = 1.Chapter 5. Experimental Set-up and Results^ 64The mean residual intelligibility scores for all values of OD are computed for the twovalues of N and are listed in Table 8.Table 8 Comparison of the mean residual intelligibility scores ofkeys meeting the derangement criterion for N = 256 and 128.NMean Residual Intelligibility Scores (%)Female Samples Male Samples All Samples128 11.7 13.3 12.5256 10.4 12.5 11.5The mean residual intelligibility scores computed based on all samples are very close to10% across the range of values of OD in both cases. This suggests that for these values of N,derangements are able to produce scrambled speech with virtually no residual intelligibility.The variances s2 of the mean residual intelligibility indicate the extent of the variabilityamong the individual subject scores. These values are fairly low in both cases. This impliesthat the mean residual intelligibility scores were consistently low for all subjects. This furthervalidates the suggestion that the derangement criterion is efficient in selecting keys whichare able to scramble speech to low intelligibility.The mean residual intelligibility scores obtained in both tests have roughly the same lowvalues across the ranges of values of OD. This implies that having an OD value of at least1, i.e., meeting the derangement criterion will allow a key to reduce the intelligibility of aspeech signal to the maximum extent possible.The mean residual intelligibility scores for the case of N = 256 is lower than those forthe case of N = 128. The reason for this is that the bandwidth of the sub-bands for thecase of N = 256 is smaller; thus the speech signal is scrambled to a higher level. However,Chapter 5. Experimental Set-up and Results^ 65the very small difference (1% when the comparison is based on results computed from allsamples) in scrambling efficiency does not have any real impact in practical usage.It is of interest to see how the subjects respond when attempting to identify a particulardigit. Table 9 shows the frequency distribution of digit selection, computed based on datafrom all subjects and all samples for Test II where N = 128. The same information isTable 9 Distribution of the subjects' entries for the ten digits for N^128.Subjects'EntriesOriginal Digits0 1 2 3 4 5 6 7 8 90 19 13.6 17.5 19.5 18.8 14.4 8.4 14.3 12.3 15.41 2.8 3.2 1.0 6.5 2.4 5.2 6.5 2.0 6.6 8.12 7.8 10.9 12.6 7.0 9.4 2.9 1.9 4.8 5.2 4.13 9.5 15.8 6.3 6.0 9.0 8.6 5.2 4.1 6.6 8.54 12.8 6.3 10.7 14.0 9.0 9.8 3.9 4.8 11.4 8.95 5.6 1.8 2.9 8.0 3.7 7.5 0.6 1.4 5.2 3.36 4.5 5.9 15.5 5.5 11.4 8.6 32.3 6.1 15.2 4.57 5.6 6.3 4.9 5.5 7.8 5.2 9.0 34.7 1.9 8.58 6.7 6.8 4.4 6.0 6.1 6.9 5.8 3.4 11.4 4.59 3.4 1.4 2.4 3.0 3.3 5.2 5.2 1.4 4.7 8.1- 22.3 28.0 21.8 19.0 19.1 25.9 21.3 23.1 19.4 26.0also shown in the pie charts in Figure 5.3. In each pie chart, the slice that is shifted outrepresents the frequency of a digit being correctly identified. It is clear that for most digits,the subjects' response is randomly distributed across the digits, when the character `-' isnot chosen. However, two digits seem to be exceptions. Compared to other digits, bothsix and seven seem to have an unusually high percentage of being correctly identified. Asimilar result was also observed by Jayant et al. [11], although they reported the unusually7^Due to round-off errors, the percentages in each column may not necessarily add up to exactly 100%.Chapter 5. Experimental Set-up and Results^ 66Figure 5.3 Distribution of the subjects' entries for the ten digits for N = 128.Chapter 5. Experimental Set-up and Results^ 67high intelligibility of only the digit six. Another observation is that, when a spoken digit isincorrectly identified, the digit zero was chosen more often than other digits in most cases.Similar results were observed for the case of N = 256. As suggested by Jayant et al.,these phenomena are best studied at the phoneme level and are beyond the scope of thisthesis. A possible explanation for the unusually high residual intelligibility of the digits sixand seven is that the phoneme(s) associated with the character `s' (or 'x') has a higher thanaverage recognizability, and that the subjects are able to distinguish six from seven by thenumber of syllables.5.2 Comparison of the Efficiency of the ParametersTwo tests based on the number test were conducted to compare the efficiency of theparameters HD, OD, WSP, and MPD:Test III. Number test with the following values of system parameters:N = 64, L = 5, NsB = 1;Test IV. Number test with the following values of system parameters:N = 64, L = 5, NsB = 2.For Test III, 20 sets of 20 keys are generated, with each set satisfying a different HD orOD requirement. Specifically, 14 sets of the keys have an HD value of two through 15,respectively; the other six sets of keys have an OD value of one through six, respectively.For Test IV, 18 sets of 20 keys are generated, 11 sets of which have the even HD valuesranging from ten to 30, respectively. The other seven sets of keys have following OD values:1, 3, 5, 6, 9, 11, 12, respectively. To facilitate the comparison, WSP and MPD of each ofthese keys are computed. The values taken by both WSP and MPD are decimal and roundedto the nearest integers during the compilation of test results.Chapter 5. Experimental Set-up and Results 68The two subjective tests were carried out in the same manner as Test I and II. The meanresidual intelligibility scores and their variances are computed for each value of the fourparameters. The results are listed in Tables 10 through 17. The residual intelligibility scoresare also plotted in the graphs shown in Figures 5.4 through 5.11.As shown by the graphs, all four parameters follow the general trend that an increasein value lowers the residual intelligibility. However, only the OD parameter provides aconsistent monotonically decreasing trend in both tests. Therefore, based on the results ofTest III and IV, the OD parameter seems to be the best of the four to use as a basis to selectkeys which are better at reducing the residual intelligibility of the speech signal.In contrast to the cases of N = 128 and 256, there is a decreasing trend in residualintelligibility scores when the value of OD increases. This indicates that in the case ofN = 64, setting the OD requirement of a key to be 1 may not necessarily scramble a speechsignal to the maximum extent. For applications with very stringent security requirements,the OD may have to be set at a value higher than 1.1.00.90.80.70.60.50.4all samples -4--male samples - - 4- -female samples0.30.20.1Chapter 5. Experimental Set-up and Results^ 69Table 10 Mean residual intelligibility scores and their variancesfor N = 64, L = 5, NsB = 1 at various values of HD.HDFemale Samples Male Samples All SamplesRI (%) s2 (%) RI (%) s2 (%) RI (%) s2 (%)10 100 0 97.5 0.12 98.8 0.0312 100 0 100 0 100 014 88.1 2.9 92.3 2.1 90.2 2.416 79.5 12.9 87.8 2.0 83.7 1.318 77.9 2.2 85.2 1.5 81.6 1.720 66.4 1.1 83.2 1.4 74.7 1.122 56.5 0.9 60.1 1.4 58.3 0.924 61.1 1.2 69.3 2.5 65.2 1.126 45.8 3.6 66.0 1.1 55.9 1.928 44.3 2.9 39.1 1.6 41.7 1.130 32.9 1.6 46.9 3.5 39.9 1.932 32.4 0.3 31.1 0.7 31.8 0.30.0 ^10 1^14^16^1^2b^22^4^26^2^30Hamming DistanceFigure 5.4 Residual Intelligibility versus Hamming Distance for N = 64, L = 5, NsB = 1.all samples --ID--male samples - - .- -female samples --A--Chapter 5. Experimental Set-up and Results^ 70Table 11 Mean residual intelligibility scores and their variancesfor N = 64, L = 5, NsB = 1 at various values of OD.ODFemale Samples Male Samples All SamplesRI (%) s2 (%) RI (%) s2 (%) RI (%) s2 (%)0 62.0 0.7 70.1 0.8 66.0 0.71 38.7 1.4 49.8 1.6 44.3 1.03 38.8 1.7 35.3 0.7 37.1 0.95 34.1 0.5 30.4 1.9 32.2 1.17 30.1 0.2 31.2 1.1 30.7 0.59 28.4 0.6 30.9 1.5 29.6 0.311 29.1 0.5 26.2 1.4 27.6 0.612 28.6 0.7 16.9 0.9 22.8 0.5Order of DisplacementFigure 5.5 Residual Intelligibility versus Order of Displacement for N = 64, L = 5, NsB = 1.1.00.90.80.7all samples --4--male samples - - 4- -female samples0.60.50.40.30.20.1Chapter 5. Experimental Set-up and Results^ 71Table 12 Mean residual intelligibility scores and their variancesfor N = 64, L = 5, Ns = 1 at various values of WSP.WSPFemale Samples Male Samples All SamplesRI (%) s2 (%) RI (%) s2 (%) RI (%) s2 (%)1 86.4 0.9 93.7 1.3 90.0 1.02 82.1 1.0 88.6 1.0 85.3 1.03 70.1 2.0 83.7 1.8 76.9 1.24 62.1 1.0 62.6 1.6 62.3 0.75 45.9 0.8 55.9 0.9 50.9 0.96 46.5 1.4 51.4 0.1 49.0 0.97 36.9 0.7 34.7 0.8 35.8 0.58 28.0 0.08 30.5 0.6 29.3 0.19 35.4 1.0 30.0 1.4 32.7 0.910 24.7 1.2 23.6 0.2 24.1 1.40.0 1^2^3^4^5^6^7 A^9^10Weighted Shift ParameterFigure 5.6 Residual Intelligibility versus Weighted Shift Parameter for N = 64, L = 5, Ns = 1.1.00.90.8all samples -40--male samples - - 4- -female samples - --0.10.00.70.60.50.40.30.2Chapter 5. Experimental Set-up and Results^ 72Table 13 Mean residual intelligibility scores and their variancesfor N = 64, L =- 5, NsB = 1 at various values of MPD.MPDFemale Samples Male Samples All SamplesRI (%) s2 (%) RI (%) s2 (%) RI (%) s2 (70)2 100 0 100 0 100 03 92.7 1.3 94.3 1.7 93.5 1.24 85.4 1.7 91.3 1.9 88.3 1.75 80.6 1.5 88.4 1.3 84.5 1.36 77.8 0.8 87.8 0.7 82.8 0.67 64.5 1.4 82.7 1.4 73.6 1.28 56.4 1.4 61.7 2.7 59.1 1.99 54.6 2.5 48.6 2.2 51.6 2.210 43.8 1.4 43.1 2.5 43.5 1.411 34.0 1.1 51.7 1.9 42.8 0.812 26.6 1.0 32.3 0.6 29.5 0.513 53.4 1.2 42.5 0.9 48.0 0.814 25.6 1.1 34.9 1.2 35.3 0.315 25.4 0.3 23.6 0.8 24.5 0.416 31.3 0.9 26.9 1.1 29.1 0.7Mean Permutation DistanceFigure 5.7 Residual Intelligibility versus Mean Permutation Distance for N = 64, L = 5, NsB = 1.1.00.90.80.70.60.50.40.30.20.1Chapter 5. Experimental Set-up and Results^ 73Table 14 Mean residual intelligibility scores and their variancesfor N = 64, L = 5, NH; = 2 at various values of HD.HDFemale Samples Male Samples All SamplesRI (%) s2 (%) RI (%) s2 (%) HI (%) s2 (%)2 100 0 100 0 100 03 95 0 100 0 97.5 04 96.2 0.03 97.5 1.2 96.9 0.0075 89.2 2.8 81.2 15.9 84.2 15.66 83.8 5.2 85.9 3.6 84.8 4.37 81.3 4.6 80.8 4.3 81.3 4.48 81.7 5.1 83.4 4.3 82.5 4.69 73.9 4.9 76.5 5.9 75.2 5.310 61.8 3.1 67.3 3.4 64.6 2.811 64.0 3.7 72.3 4.6 68.1 3.812 53.8 3.4 68.3 4.6 61.1 3.813 46.8 2.8 55.2 3.9 51.0 2.914 42.7 1.8 51.8 4.8 47.2 2.415 38.0 4.6 30.8 1.5 34.4 2.316 30.7 1.5 29.0 1.1 29.8 1.2all samplesmate samples - - -female samples0.02^3^4^5^6^7^8^9^10^11^12^13^14Hamming DistanceFigure 5.8 Residual Intelligibility versus Hamming Distance for N = 64, L = 5, NsB = 2.81 4all samples^•male samples - - •- -female samples -0.70.60.50.40.3Chapter 5. Experimental Set-up and Results^ 74Table 15 Mean residual intelligibility scores and their variancesfor N = 64, L = 5, NsB = 2 at various values of OD.ODFemale Samples Male Samples All SamplesRI (%) s2 (%) RI (%) s2 (%) RI (%) s2 (%)0 63.4 3.3 67.2 3.4 65.3 3.31 44.4 3.0 43.7 3.1 44.0 2.72 36.3 1.9 35.7 1.8 36.0 1.33 26.8 2.6 30.1 1.5 28.4 1.84 28.1 1.9 27.9 1.3 28.0 1.25 22.3 1.3 21.6 1.2 22.0 1.16 28.2 1.1 18.7 0.7 23.4 0.8Order of DisplacementFigure 5.9 Residual Intelligibility versus Order of Displacement for N^64, L = 5, NsB = 2.all samples -0--male samples - - •- -female samples --*-- -1^ 4Chapter 5. Experimental Set-up and Results^ 75Table 16 Mean residual intelligibility scores and their variancesfor N = 64, L = 5, Ns B =-- 2 at various values of WSP.WSPFemale Samples Male Samples All SamplesRI (%) s2 (%) RI (%) s2 (%) RI (%) s2 (%)0 86.6 4.0 86.7 3.7 86.6 3.81 71.9 6.3 74.2 5.7 73.0 5.92 62.5 2.6 69.1 4.0 65.8 3.23 40.2 2.7 42.3 2.1 41.2 2.34 29.8 1.3 26.0 1.1 27.9 1.15 42.9 2.8 45.7 4.3 44.3 2.51.00.90.80.70.60.50.40.30.20.10 .0Weighted Shift ParameterFigure 5.10 Residual Intelligibility versus Weighted Shift Parameter for N = 64, L = 5, Ns B^2.Chapter 5. Experimental Set-up and Results^ 76Table 17 Mean residual intelligibility scores and their variancesfor N = 64, L = 5, NsB = 2 at various values of MPD.WSPFemale Samples Male Samples All SamplesRI (%) s2 ( % ) RI ( %) s2 ( % ) RI (%) s2 (%)0 100 0 100 0 100 01 80.3 8.8 88.0 9.6 84.1 8.92 79.4 9.6 79.7 7.2 79.6 8.43. 78.6 3.4 80.5 3.5 79.6 3.34 57.1 3.1 64.0 4.4 60.6 3.65 43.0 2.6 49.4 1.5 46.2 1.86 39.4 1.1 39.2 1.5 39.3 1.17 28.3 1.9 30.0 1.8 29.1 1.78 27.6 1.1 19.5^_ 0.9 23.6 0.8Mean Permutation DistanceFigure 5.11 Residual Intelligibility versus Mean Permutation Distance for N = 64, L = 5, NsB = 2.Chapter 5. Experimental Set-up and Results^ 77Table 18 lists the residual intelligibility scores, computed based on all speech samplesscrambled by keys meeting the derangement criterion, for four cases.Table 18 Comparison of the mean residual intelligibility scores of keysmeeting the derangement criterion for different values of N and NSB •NMean Residual Intelligibility Scores (%)Female Samples Male Samples All Samples64 (NsB = 1) 32.4 31.1 31.864 (NsB = 2) 31.3 26.9 29.1128 (NsB = 1) 11.7 13.3 12.5256 (NsB = 1) 10.4 12.5 11.5Results listed in Table 18 indicate that there is no increase of residual intelligibility whenthe value of the parameter NsB is increased from 1 to 2 for N = 64. However, for the keyssatisfying the derangement criterion, there is a significant increase in residual intelligibilitywhen the value of N decreases to 64 from 128 or 256. The residual intelligibility scores for thecase of N = 64 around 30%. These scores are high and are in contrast to the results obtainedin the preliminary tests, which indicated that speech segments scrambled by derangementshad virtually no residual intelligibility, for all the cases of N = 64, 128, and 256. A speechscrambling system which produces scrambled speech with residual intelligibility as high as30% is unacceptable. But the scores obtained in these tests are based on the number testand may provide a deceptive indication of the performance of a speech scrambling systemin actual usage, in terms of the level of security provided by the system. The reason for thesignificant increase in residual intelligibility scores for the case of N = 64 as compared tothe cases of N = 128 and 256 deserves to be examined closely and is provided below.In a frequency domain scrambling system based on the bandsplitting concept, thespectrum of the speech signal may be scrambled, but the signal energy is preserved.Chapter 5. Experimental Set-up and Results 78Therefore, although the resulting scrambled speech signal may be unintelligible, some speechcharacteristics such as the talk spurts and the intonation of the original speech are not altered.When the value of N is as high as 128 or 256, such characteristics are more difficult to detect.However, they are more easily detected when n = 64. In a number test, since there are onlyten digits to identify, the subjects can take advantage of such information as the numberof syllables in a digit and the digit's intonation pattern, to make intelligent guesses whenattempting to identify a particular digit. In such a case, the number test fails to provide afair performance assessment of the scrambling system, because when real life conversationsare involved, the subjects will not be able to take advantage of such speech characteristicsbecause of the large size of the vocabulary. Therefore, as pointed out in section 5.1.2 ofthis chapter, the results of the number test provides a conservative upper bound on theperformance of the system in actual usage. An asynchronous frequency domain speechscrambling system based on N = 64 and the derangement criterion for key selection willproduce no intelligibility when the speech signal contains normal text, as will be shown inthe test presented in the next section.5.3 The Sentence TestFor the case of N = 64, the number test indicates that the keys satisfying the derange-ment criterion still leave a residual intelligibility as high as 30% in the scrambled speechsignal. However this is not a true reflection of the performance of the system when actualconversations are involved. To test how well a speech signal containing sentences insteadof just digits is scrambled, the following test was set up:Test V. Sentence test with the following values of system parameters:N = 64, L = 5, NsB 2.Chapter 5. Experimental Set-up and Results 79Five keys, each of which has a different OD value ranging from one to five, are generatedrandomly. These keys are used to scrambled five speech samples, each of which containstwo sentences, spoken by a male and a female speaker, respectively. The sentences usedare Harvard sentences, which are a list of phonetically balanced sentences recommended byIEEE for use in speech quality measurements [27].The procedures in this test are similar to those in Test I and II. During the test, a subjectis asked to identify the words in the speech samples. Any word identified by the subjectis to be written in one of four spaces provided, to reflect the relative position of the wordin the speech sample. This measure is taken to prevent random guess of common wordssuch as "the", "a", etc. The subject may listen to the same speech sample as many timesas he/she thinks is necessary.Sixteen subjects participated in this test. None was able to correctly identify any wordin the scrambled speech samples. The conclusion to be drawn from this test is that anasynchronous speech scrambler based on the system parameters N = 64, L -= 5, NSB = 2and the derangement criterion for key selection will be able to meet the security requirementsof most users of the public telephone network.5.4 Speech Quality ComparisonIt was discussed in Section 3.2 that there is an improvement of speech quality whengroups of frequency components, instead of individual frequency components, are permuted,due to a reduction in distortion. A subjective test was set up to compare the speech qualitywhen speech samples are processed (scrambled and descrambled) for the following two cases:• N = 64, L = 5, NsB = 1;• N = 64, L = 5, NsB = 2.Chapter 5. Experimental Set-up and Results^ 80Other values of NsB are not considered because they will render the key space too smallfor practical purposes.For each value of NsB, five keys, all satisfying the derangement criterion, are randomlyselected. The two sets of keys are used to process five speech samples, each of which aremade up of two Harvard sentences, each spoken by a male and a female speaker, respectively.The Harvard sentences used in this test are different from those used in Test V. Let the fivespeech samples be represented by the letters A to E. Let the keys be represented by Ki-j,i = NsB, 1 j 5. In the subjective test, the ten processed speech samples are presentedto the subjects in the following order:1. speech sample A, processed by key K1-1;2. speech sample A, processed by key K2-1;3. speech sample B, processed by key K2-2;4. speech sample B, processed by key K1-25. speech sample C, processed by key K2-3;6. speech sample C, processed by key K1-3;7. speech sample D, processed by key K1-4;8. speech sample D, processed by key K2-4;9. speech sample E, processed by key K1-5;10. speech sample E, processed by key K2-5.These speech samples are not presented in any special order which can influence thesubject's response. After listening to every two of these speech samples, the subject has topick one that, in her opinion, has the better quality. The ranking of 1 is assigned to thespeech sample picked by the subject, while the other speech sample gets the ranking of 2.Chapter 5. Experimental Set-up and Results^ 81Sixteen subjects participated in this test. For each subject, the ranking for a value ofNsB is the average of the rankings for the five speech samples associated with the samevalue of NsB. The normalized ranking is computed by averaging the rankings of each valueof NsB across all subjects. Table 19 lists the values of the normalized rankings and theassociated standard deviations(S).Table 19 Normalized ranking values and their standard deviations forspeech processed with parameters N = 64, L = 5, NsB = 1 and 2.NSB Ranking S1 1.975 0.0682 1.025 0.068Clearly for NSB = 2, the descrambled speech has better quality. Therefore, by usingthe value of NsB = 2, there is no increase of residual intelligibility found in the scrambledspeech signal, but an improvement in speech quality.Chapter 6 Conclusions and Recommendationsfor Future Work6.1 ConclusionsTo meet the privacy needs of users of the public telephone network, the design of atelephone speech scrambler has been studied. The scrambler is based on an asynchronousfrequency domain speech scrambling algorithm to provide an adequate level of security anduser transparent operation at a reasonable cost.The relationship between descrambled speech quality and the bandwidth of the permuta-tion sub-bands has been examined. It has been found that speech quality is often improvedwhen more than one neighboring frequency components are grouped together to be per-muted as one sub-band.A new parameter is introduced to measure a key's efficiency in reducing the residualintelligibility of a speech signal. A simple criterion based on this parameter is proposedfor use in selecting permutations which are fit for use as scrambling keys. Subjective testsconfirm the efficiencies of both this parameter and the criterion, as speech samples scrambledby the keys meeting this criterion have little residual intelligibility.An algorithm is proposed for generating keys which satisfy the proposed key selectioncriterion. This algorithm is able to uniquely map an integer g in the range of 0 < g < (n — 1)!to a key of n elements.6.2 Recommendations for Future WorkThe filter used in the implementation of the filter bank analysis, Hr , does not have tobe an ideal low pass filter [16]. The necessary condition for Hg- is that it be a finite impulse82Chapter 6. Conclusions and Recommendations for Future Work^ 83response (FIR) filter satisfying the first Nyquist criterion:ho [0] = 1,ho[n] = 0^for n = ±N, +2N, +3N,• • •^(6.1)An FIR filter satisfying the above condition can be designed to provide optimal speech quality.In a recent paper, Goldburg et al. [28] suggest that the Discrete Cosine Transform(DCT) is also an efficient transform to be used for analog speech scrambling. It would beof interest to evaluate the derangement criterion's ability to select keys that are fit for usein a speech scrambling system based on the DCT.84References[1] A. Gersho and R. Steele, "Guest Editorial: Encryption of Analog Signals — APerspective," IEEE Journal on Selected Areas in Communications, vol. SAC-2, no.3, pp. 423-425, May 1984.[2] W. Diffie and M. E. Hellman, "Privacy and Authentication: An Introduction toCryptography," Proceedings of the IEEE, vol. 67, no. 3, pp. 397-427, Mar. 1979.[3] D. E. Denning, Cryptography and Data Security. Reading, Massachusetts, USA:Addison-Wesley Publishing Company Inc., 1982.[4] H. J. Beker and F. C. Piper, Secure Speech Communications. London, England:Academic Press, 1985.[5] R. L. Freeman, Telecommunications Transmission Handbook. New York, USA: JohnWiley and Sons, Inc., 3rd ed., 1991.[6] Transmission Systems for Communications. Holmdel, New Jersey, USA: Bell TelephoneLaboratories, Inc, 5th ed., 1982.[7] R. L. Rivest, A. Shamir, and L. Adleman, "A Method for Obtaining Digital Signaturesand Public-Key Cryptosystems," Comm. ACM, vol. 21(2), pp. 120-126, Feb. 1978.[8] N. S. Jayant, "High-Quality Coding of Telephone Speech and Wideband Audio," IEEECommun. Mag., pp. 10-20, Jan. 1990.[9] A. Matsunaga, K. Koga, and M. Ohkawa, "An Analog Speech Scrambling SystemUsing the FFT Technique with High-level Security," IEEE Journal on Selected Areasin Communications, vol. 7, no. 4, pp. 540-547, May 1989.[10] A. M. McCalmont, "Communications Security for Voice Techniques, Systems andOperations," Telecommun., pp. 35-42, Apr. 1973.[11] N. S. Jayant, B. J. McDermott, S. W. Christensen, and A. M. Quinn, "A Comparisonof Four Methods for Analog Speech Privacy," IEEE Trans. Commun., vol. COM-29,pp. 18-23, Jan. 1981.85[12] A. D. Wyner, "An Analog Scrambling Scheme Which Does Not Expand Bandwidth,Part I: Discrete Time," IEEE Trans. Inform. Theory, vol. IT-25, pp. 261-274, May1979.[13] S. B. Weinstein, "Sampling Based Techniques For Voice Scrambling," Proc. Int. Conf.Commun., pp. 16.2.1-16.2.6, June 1980.[14] L. S. Lee, Y. P. Ham, and Y. C. Chen, "A Simple Sample Value Scrambler UsingFFT Algorithms for Secure Voice Communications," Proc. Nat. Telecommun. Conf,pp. 49.4.1-49.4.5, Dec. 1980.[15] R. W. Schafer and L. R. Rabiner, "Design and Simulation of a Speech Analysis-synthesis System Based on Short-time Fourier Analysis," IEEE Trans. Audio Electroa-coust., vol. AU-21, pp. 165-174, June 1973.[16] M. R. Portnoff, "Implementation of the Digital Phase Vocoder Using the Fast FourierTransform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 243—248, June 1976.[17] L. S. Lee, G. C. Chou, and C. S. Chang, "A New Frequency Domain Speech ScramblingSystem Which Does not Require Frame Synchronization," IEEE Trans. Commun.,vol. COM-32, pp. 444-456, Apr. 1984.[18] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. EnglewoodCliffs, New Jersey, USA: Prentice-Hall, Inc., 1989.[19] A. Sakurai, K. Koga, and T. Muratani, "A Speech Scrambler Using the Fast FourierTransform Technique," IEEE Journal on Selected Areas in Communications, vol. SAC-2, no. 3, pp. 434-442, May 1984.[20] R. M. Kraus and P. D. Bricker, "Effects of Transmission Delay and Access Delay onthe Efficiency of Verbal Communication," The Journal of the Acoustical Society ofAmerica, vol. 41, pp. 286-292, 1967.[21] J. G. Gruber and L. Strawczynski, "Subjective Effects of Variable Delay and SpeechClipping in Dynamically Managed Voice Systems," IEEE Trans. Commun., vol. COM-33, pp. 801-808, Aug. 1985.86[22] N. Kitawaki and K. Itoh, "Pure Delay Effects on Speech Quality in Telecommunica-tions," IEEE Journal on Selected Areas in Communications, vol. 9, no. 4, pp. 586-593,May 1991.[23] P. T. Brady, "A Statistical Analysis of On-off Patterns in 16 Conversations," Bell Syst.Tech. J., vol. 47, pp. 73-91, 1967.[24] E. Del Re, R. Fantacci, and D. Maffucci, "A New Speech Signal Scrambling Methodfor Secure Communications: Theory, Implementation, and Security Evaluation," IEEEJournal on Selected Areas in Communications, vol. 7, no. 4, pp. 474-480, May 1989.[25] E. F. Beckenbach, ed., Applied Combinatorial Mathematics. New York, USA: JohnWiley and Sons, Inc., 1964.[26] D. E. Knuth, The Art of Computer Programming, vol. 2. Reading, Massachusetts, USA:Addison-Wesley Publishing Company Inc., 2nd ed., 1973.[27] "IEEE Recommended Practice for Speech Quality Measurements," IEEE Transactionson Audio and Electroacoustics, pp. 227-246, Sept. 1969.[28] B. Goldburg, S. Sridharan, and E. Dawson, "Design and Cryptanalysis of Transform-Based Speech Scramblers," IEEE Journal on Selected Areas in Communications,vol. 11, no. 5, pp. 735-743, June 1993.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- An asynchronous telephone speech scrambler with a new...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
An asynchronous telephone speech scrambler with a new key generation method Woo, Raymond W. M. 1993
pdf
Page Metadata
Item Metadata
Title | An asynchronous telephone speech scrambler with a new key generation method |
Creator |
Woo, Raymond W. M. |
Date Issued | 1993 |
Description | The design of an asynchronous telephone speech scrambler is proposed and its performance experimentally studied. The scrambler transforms the speech signal into a scrambled form by permuting the frequency coefficients of the speech signal. These coefficients are permuted in pairs instead of individually. Subjective tests indicate that such an arrangement results in an improvement in descrambled speech quality but no increase in the residual intelligibility of the scrambled speech signal. A new criterion for selecting good scrambling permutations known as the derangement criterion is introduced. Subjective tests based on the number test confirm that keys satisfying this criterion are able to produce scrambled speech with virtually no residual intelligibility. A new derangement generation algorithm is introduced. This algorithm maps an integer g, in the range of 0 ≤ g < (n - 1)!, uniquely to a derangement of n elements. |
Extent | 3911299 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2008-08-27 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0065101 |
URI | http://hdl.handle.net/2429/1544 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 1993-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_1993_fall_woo_raymond.pdf [ 3.73MB ]
- Metadata
- JSON: 831-1.0065101.json
- JSON-LD: 831-1.0065101-ld.json
- RDF/XML (Pretty): 831-1.0065101-rdf.xml
- RDF/JSON: 831-1.0065101-rdf.json
- Turtle: 831-1.0065101-turtle.txt
- N-Triples: 831-1.0065101-rdf-ntriples.txt
- Original Record: 831-1.0065101-source.json
- Full Text
- 831-1.0065101-fulltext.txt
- Citation
- 831-1.0065101.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0065101/manifest